Exporting Feature Matrix

In this example, we're working with a mock customer behavior dataset.

In [1]: import featuretools as ft

In [2]: es = ft.demo.load_mock_customer(return_entityset=True)

In [3]: es
Out[3]: 
Entityset: transactions
  Entities:
    transactions [Rows: 500, Columns: 5]
    products [Rows: 5, Columns: 2]
    sessions [Rows: 35, Columns: 4]
    customers [Rows: 5, Columns: 3]
  Relationships:
    transactions.product_id -> products.product_id
    transactions.session_id -> sessions.session_id
    sessions.customer_id -> customers.customer_id

Run Deep Feature Synthesis

The minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

In [4]: feature_matrix, features = ft.dfs(entityset=es,
   ...:                                   target_entity="customers",
   ...:                                   verbose=True)
   ...: 
Built 69 features

Elapsed: 00:00 | Remaining: ? | Progress:   0%|          | Calculated: 0/1 chunks
Elapsed: 00:00 | Remaining: 00:00 | Progress: 100%|##########| Calculated: 1/1 chunks

In [5]: feature_matrix
Out[5]: 
            zip_code  COUNT(sessions)  NUM_UNIQUE(sessions.device)                  ...                   MODE(sessions.YEAR(session_start))  MODE(sessions.MONTH(session_start))  MODE(sessions.WEEKDAY(session_start))
customer_id                                                                         ...                                                                                                                                 
1              60091               10                            3                  ...                                                 2014                                    1                                      2
2              02139                8                            3                  ...                                                 2014                                    1                                      2
3              02139                5                            2                  ...                                                 2014                                    1                                      2
4              60091                8                            3                  ...                                                 2014                                    1                                      2
5              02139                4                            3                  ...                                                 2014                                    1                                      2

[5 rows x 69 columns]

Save as CSV

The feature matrix is a pandas DataFrame, so we can save it to disk with the usual pandas methods.

In [6]: feature_matrix.to_csv("feature_matrix.csv")

We can also read it back in as follows:

In [7]: import pandas as pd
   ...: saved_fm = pd.read_csv("feature_matrix.csv", index_col="customer_id")
   ...: 

In [8]: saved_fm
Out[8]: 
             zip_code  COUNT(sessions)  NUM_UNIQUE(sessions.device)                  ...                   MODE(sessions.YEAR(session_start))  MODE(sessions.MONTH(session_start))  MODE(sessions.WEEKDAY(session_start))
customer_id                                                                          ...                                                                                                                                 
1               60091               10                            3                  ...                                                 2014                                    1                                      2
2                2139                8                            3                  ...                                                 2014                                    1                                      2
3                2139                5                            2                  ...                                                 2014                                    1                                      2
4               60091                8                            3                  ...                                                 2014                                    1                                      2
5                2139                4                            3                  ...                                                 2014                                    1                                      2

[5 rows x 69 columns]
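
Note that zip_code comes back as 2139 rather than 02139 in the output above: read_csv infers an integer type for the column by default, so the leading zero is dropped. The following is a minimal sketch of one way to avoid this, by passing an explicit dtype for that column. It uses a small hypothetical stand-in DataFrame and an in-memory buffer rather than the real feature matrix and file, so it runs without featuretools; the same dtype argument should apply when reading "feature_matrix.csv".

```python
import io

import pandas as pd

# Hypothetical stand-in for the feature matrix: a text zip_code column
# indexed by customer_id, mirroring the first columns shown above.
feature_matrix = pd.DataFrame(
    {"zip_code": ["60091", "02139", "02139"], "COUNT(sessions)": [10, 8, 5]},
    index=pd.Index([1, 2, 3], name="customer_id"),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
feature_matrix.to_csv(buffer)

# Default read: zip_code is inferred as an integer column,
# so the leading zero of "02139" is lost.
buffer.seek(0)
inferred = pd.read_csv(buffer, index_col="customer_id")

# Explicit dtype: zip_code is read as text, leading zeros intact.
buffer.seek(0)
preserved = pd.read_csv(buffer, index_col="customer_id", dtype={"zip_code": str})

print(inferred["zip_code"].tolist())
print(preserved["zip_code"].tolist())
```

CSV stores no type information, so any column whose text form is ambiguous (such as identifiers that look like numbers) may need an explicit dtype when read back.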