Deployment

Deploying machine learning models requires repeating the feature engineering steps on new data. In some cases, these steps need to be performed in near real-time. Featuretools eases this by letting you save feature definitions and recalculate them on new data.

Saving Features

First, let’s generate some training and test data in the same format. We pass a different random seed so that the test data differs from the training data.

In [1]: import featuretools as ft

In [2]: es_train = ft.demo.load_mock_customer(return_entityset=True)

In [3]: es_test = ft.demo.load_mock_customer(return_entityset=True, random_seed=33)

Now let’s build some feature definitions using DFS. Because we have categorical features, we also one-hot encode them based on the values in the training data.

In [4]: feature_matrix, feature_defs = ft.dfs(entityset=es_train,
   ...:                                       target_entity="customers")
   ...: 

In [5]: feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)

In [6]: feature_matrix_enc
Out[6]: 
             zip_code = 02139  zip_code = 60091  zip_code = unknown                   ...                     STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                           ...                                                                                                                                             
1                           0                 1                   0                   ...                                                   6.174849                                        1                                -0.468837
2                           1                 0                   0                   ...                                                   7.932827                                        1                                -0.516770
3                           1                 0                   0                   ...                                                  14.017082                                        1                                -0.734662
4                           0                 1                   0                   ...                                                  13.618016                                        1                                -0.543557
5                           1                 0                   0                   ...                                                   6.465431                                        1                                -0.442066

[5 rows x 102 columns]
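To make the idea of encoding "based on the values in the training data" concrete, here is a minimal plain-Python sketch. It is not how Featuretools implements encode_features(); the helper names fit_one_hot and one_hot, the top_n argument, and the sample zip codes are all hypothetical and chosen only for illustration. The point is that the category list is learned once from the training values and then reused, so a value seen only in new data falls into the "unknown" column instead of creating a new column.

```python
from collections import Counter


def fit_one_hot(train_values, top_n=10):
    """Learn the categories to encode from the training values only.

    Hypothetical helper: keeps the `top_n` most frequent training values.
    """
    counts = Counter(train_values)
    return [value for value, _ in counts.most_common(top_n)]


def one_hot(values, categories, column="zip_code"):
    """Encode values against a fixed category list.

    Any value that is not in `categories` is flagged in the "unknown" column.
    """
    rows = []
    for value in values:
        row = {f"{column} = {cat}": value == cat for cat in categories}
        row[f"{column} = unknown"] = value not in categories
        rows.append(row)
    return rows


# Made-up sample data; "13244" never appears in the training values.
train_zip = ["60091", "02139", "02139", "60091", "02139"]
test_zip = ["02139", "60091", "13244"]

categories = fit_one_hot(train_zip)           # learned from training data only
encoded_test = one_hot(test_zip, categories)  # same columns for any new data

for row in encoded_test:
    print(row)
```

Because the columns depend only on the stored category list, the encoded training and test data always have the same columns, which is what a model trained on the training matrix expects.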

Now, we can use featuretools.save_features() to save a list of features.

In [7]: ft.save_features(features_enc, "feature_definitions")

Calculating Feature Matrix for New Data

We can use featuretools.load_features() to read in a list of saved features to calculate for our new entity set.

In [8]: saved_features = ft.load_features('feature_definitions')

After we load the features back in, we can calculate the feature matrix.

In [9]: feature_matrix = ft.calculate_feature_matrix(saved_features, es_test)

In [10]: feature_matrix
Out[10]: 
             zip_code = 02139  zip_code = 60091  zip_code = unknown                   ...                     STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                           ...                                                                                                                                             
1                       False              True               False                   ...                                                  12.244963                                        1                                -0.212977
2                        True             False               False                   ...                                                   8.841551                                        1                                -0.232112
3                       False              True               False                   ...                                                  13.900748                                        1                                -0.718584
4                       False              True               False                   ...                                                   8.452658                                        1                                -0.722550
5                       False              True               False                   ...                                                  28.580732                                        1                                -0.542573

[5 rows x 102 columns]

As you can see above, we have the exact same features as before, but calculated using our test data.
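Before passing a new feature matrix to a trained model, it can be useful to confirm programmatically that its columns match the training matrix. The following is a small sketch of such a check in plain Python. The helper compare_columns is hypothetical and not part of Featuretools, and the column names are a shortened sample taken from the output above. With the pandas DataFrames from this page, you would presumably pass list(feature_matrix_enc.columns) and list(feature_matrix.columns).

```python
def compare_columns(train_columns, test_columns):
    """Compare two lists of column names.

    Returns a tuple (missing, extra, same_order):
      missing    -- columns in the training list but not in the test list
      extra      -- columns in the test list but not in the training list
      same_order -- True only if both lists are identical, including order
    """
    train_set, test_set = set(train_columns), set(test_columns)
    missing = [c for c in train_columns if c not in test_set]
    extra = [c for c in test_columns if c not in train_set]
    same_order = list(train_columns) == list(test_columns)
    return missing, extra, same_order


# Shortened sample of the column names shown in the outputs above.
train_columns = ["zip_code = 02139", "zip_code = 60091", "zip_code = unknown"]
test_columns = ["zip_code = 02139", "zip_code = 60091", "zip_code = unknown"]

missing, extra, same_order = compare_columns(train_columns, test_columns)
print(missing, extra, same_order)
```

An empty missing list, an empty extra list, and same_order equal to True together indicate that the two matrices expose the same features in the same positions.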