Exporting Feature Matrix

In this example, we’re working with a mock customer behavior dataset.

In [1]: import featuretools as ft

In [2]: es = ft.demo.load_mock_customer(return_entityset=True)

In [3]: es
Out[3]: 
Entityset: transactions
  Entities:
    customers [Rows: 5, Columns: 3]
    sessions [Rows: 35, Columns: 4]
    products [Rows: 5, Columns: 2]
    transactions [Rows: 500, Columns: 5]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

Run Deep Feature Synthesis

A minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

In [4]: feature_matrix, features = ft.dfs(entityset=es,
   ...:                                   target_entity="customers",
   ...:                                   verbose=True)
   ...: 
Built 69 features

Elapsed: 00:00 | Remaining: ? | Progress:   0%|          | Calculated: 0/1 chunks
Elapsed: 00:00 | Remaining: 00:00 | Progress: 100%|##########| Calculated: 1/1 chunks

In [5]: feature_matrix
Out[5]: 
            zip_code  COUNT(transactions)  COUNT(sessions)                   ...                     STD(sessions.MAX(transactions.amount)) NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                  ...                                                                                                                                            
1              60091                  131               10                   ...                                                   6.174849                                       1                                -0.468837
2              02139                  122                8                   ...                                                   7.932827                                       1                                -0.516770
3              02139                   78                5                   ...                                                  14.017082                                       1                                -0.734662
4              60091                  111                8                   ...                                                  13.618016                                       1                                -0.543557
5              02139                   58                4                   ...                                                   6.465431                                       1                                -0.442066

[5 rows x 69 columns]

Save as CSV

The feature matrix is a pandas DataFrame that we can save to disk.

In [6]: feature_matrix.to_csv("feature_matrix.csv")

We can also read it back in as follows:

In [7]: import pandas as pd
   ...: saved_fm = pd.read_csv("feature_matrix.csv", index_col="customer_id")
   ...: 

In [8]: saved_fm
Out[8]: 
             zip_code  COUNT(transactions)  COUNT(sessions)                   ...                     STD(sessions.MAX(transactions.amount)) NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                   ...                                                                                                                                            
1               60091                  131               10                   ...                                                   6.174849                                       1                                -0.468837
2                2139                  122                8                   ...                                                   7.932827                                       1                                -0.516770
3                2139                   78                5                   ...                                                  14.017082                                       1                                -0.734662
4               60091                  111                8                   ...                                                  13.618016                                       1                                -0.543557
5                2139                   58                4                   ...                                                   6.465431                                       1                                -0.442066

[5 rows x 69 columns]
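
Note that the zip codes read back from disk have lost their leading zeros (02139 became 2139), because read_csv infers the column as integers by default. One possible way to avoid this is to pass an explicit dtype for that column when reading. The sketch below is illustrative only: it uses a small hand-built DataFrame as a stand-in for the real feature matrix (an assumption, so that the snippet runs without Featuretools) and writes to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the feature matrix; values copied from the output above.
fm = pd.DataFrame(
    {"zip_code": ["60091", "02139", "02139"], "COUNT(sessions)": [10, 8, 5]},
    index=pd.Index([1, 2, 3], name="customer_id"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feature_matrix.csv")
    fm.to_csv(path)

    # Default parsing infers zip_code as integers, which drops leading zeros.
    as_int = pd.read_csv(path, index_col="customer_id")

    # Forcing a string dtype keeps the values exactly as they were written.
    as_str = pd.read_csv(path, index_col="customer_id", dtype={"zip_code": str})

print(as_int["zip_code"].tolist())
print(as_str["zip_code"].tolist())
```

The same dtype argument should work with the full feature_matrix.csv written above. Other columns whose types matter (for example, categorical or datetime features) may need similar treatment, since CSV does not store type information.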