Deployment

Deploying a machine learning model requires repeating the same feature engineering steps on new data, sometimes in near real time. Featuretools eases this by letting you save feature definitions and reuse them to calculate features on new data.

Saving Features

First, let’s generate some training and test data in the same format. We pass a different random seed so that the test data differs from the training data.

In [1]: import featuretools as ft

In [2]: es_train = ft.demo.load_mock_customer(return_entityset=True)

In [3]: es_test = ft.demo.load_mock_customer(return_entityset=True, random_seed=33)
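The idea of data that shares a format but differs only by seed can be sketched without Featuretools. The generator below, `make_mock_customers`, is hypothetical and only mimics the shape of a mock-customer loader; it uses a private `random.Random(seed)` generator so that a given seed always reproduces the same records.

```python
import random


def make_mock_customers(n_customers=5, random_seed=0):
    """Hypothetical stand-in for a mock-data loader.

    Every call returns records with the same fields; only the values
    depend on the seed, and the same seed always yields the same records.
    """
    rng = random.Random(random_seed)  # private generator, global state untouched
    zip_codes = ["02139", "60091"]
    return [
        {
            "customer_id": i,
            "zip_code": rng.choice(zip_codes),
            "n_sessions": rng.randint(1, 10),
        }
        for i in range(1, n_customers + 1)
    ]


train = make_mock_customers(random_seed=0)
test = make_mock_customers(random_seed=33)
```

Because the seed fully determines the output, the test set can be regenerated exactly when needed, while still differing from the training data.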

Now let’s build some feature definitions using DFS. Because we have categorical features, we also one-hot encode them, using the values seen in the training data.

In [4]: feature_matrix, feature_defs = ft.dfs(entityset=es_train,
   ...:                                       target_entity="customers")
   ...: 

In [5]: feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)

In [6]: feature_matrix_enc
Out[6]: 
             zip_code = 02139  zip_code = 60091                       ...                         MODE(sessions.WEEKDAY(session_start)) = 2  MODE(sessions.WEEKDAY(session_start)) = unknown
customer_id                                                           ...                                                                                                                   
1                           0                 1                       ...                                                                 1                                                0
2                           1                 0                       ...                                                                 1                                                0
3                           1                 0                       ...                                                                 1                                                0
4                           0                 1                       ...                                                                 1                                                0
5                           1                 0                       ...                                                                 1                                                0

[5 rows x 102 columns]
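Encoding "based on the values in the training data" means the set of output columns is fixed at training time. A minimal stdlib sketch of that idea follows; `fit_one_hot`, `transform_one_hot` and the `top_n` limit are hypothetical names chosen for illustration, not the implementation behind `ft.encode_features`. The extra `= unknown` column mirrors the one visible in the output above, which appears to flag values outside the learned categories.

```python
from collections import Counter


def fit_one_hot(train_values, top_n=10):
    """Learn which categories to encode, using the training data only.

    Keeps at most top_n of the most frequent training values.
    """
    return [value for value, _ in Counter(train_values).most_common(top_n)]


def transform_one_hot(values, categories, name):
    """Encode values against the categories learned at fit time.

    Anything not seen in training goes to a single '<name> = unknown'
    column, so every row has exactly the same columns.
    """
    rows = []
    for value in values:
        row = {f"{name} = {c}": int(value == c) for c in categories}
        row[f"{name} = unknown"] = int(value not in categories)
        rows.append(row)
    return rows


train_zip = ["60091", "02139", "02139", "60091", "02139"]
test_zip = ["60091", "02139", "13244"]  # "13244" never appears in training

categories = fit_one_hot(train_zip)
encoded_test = transform_one_hot(test_zip, categories, "zip_code")
```

Fitting on training data alone keeps the feature matrix shape stable no matter which categories show up in new data.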

Now we can use featuretools.save_features() to save the list of features.

In [7]: ft.save_features(features_enc, "feature_definitions")

Calculating Feature Matrix for New Data

We can use featuretools.load_features() to read the saved list of features back in, so that we can calculate them for our new entity set.

In [8]: saved_features = ft.load_features('feature_definitions')
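What gets saved is the recipe for each feature, not its computed values, which is why the same file can be reused on new data with the same structure. As an illustration only, the save/load round trip can be sketched with JSON; the dictionary layout and the `save_definitions` / `load_definitions` helpers are hypothetical and do not reproduce the file format that `ft.save_features` actually writes.

```python
import json
import os
import tempfile

# Hypothetical, simplified feature definitions: each entry records which
# aggregation to apply to which column, plus the resulting feature name.
feature_defs = [
    {"name": "COUNT(transactions)", "primitive": "count", "column": "amount"},
    {"name": "MEAN(transactions.amount)", "primitive": "mean", "column": "amount"},
]


def save_definitions(defs, path):
    """Write the definitions (not any computed values) to disk as JSON."""
    with open(path, "w") as f:
        json.dump(defs, f, indent=2)


def load_definitions(path):
    """Read the definitions back so they can be applied to new data."""
    with open(path) as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "feature_definitions.json")
save_definitions(feature_defs, path)
saved_features = load_definitions(path)
```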

After we load the features back in, we can calculate the feature matrix.

In [9]: feature_matrix = ft.calculate_feature_matrix(saved_features, es_test)

In [10]: feature_matrix
Out[10]: 
             zip_code = 02139  zip_code = 60091                       ...                         MODE(sessions.WEEKDAY(session_start)) = 2  MODE(sessions.WEEKDAY(session_start)) = unknown
customer_id                                                           ...                                                                                                                   
1                       False              True                       ...                                                              True                                            False
2                        True             False                       ...                                                              True                                            False
3                       False              True                       ...                                                              True                                            False
4                       False              True                       ...                                                              True                                            False
5                       False              True                       ...                                                              True                                            False

[5 rows x 102 columns]

As you can see above, we get exactly the same features as before, but calculated using our test data.
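The reason the columns match is that they come from the definitions, while only the values come from the data. The sketch below applies a hypothetical list of definitions to made-up training and test transactions; `calculate_matrix` and `PRIMITIVES` are illustrative names, not Featuretools functions.

```python
from statistics import mean

# Hypothetical definitions, as they might look after loading from disk.
feature_defs = [
    {"name": "COUNT(transactions)", "primitive": "count", "column": "amount"},
    {"name": "MEAN(transactions.amount)", "primitive": "mean", "column": "amount"},
]

# Aggregation functions keyed by primitive name; the mean returns None for
# empty input so customers without transactions still get a row.
PRIMITIVES = {
    "count": len,
    "mean": lambda xs: mean(xs) if xs else None,
}


def calculate_matrix(defs, transactions, customer_ids):
    """Apply every definition to each customer's transactions."""
    matrix = {}
    for cid in customer_ids:
        rows = [t for t in transactions if t["customer_id"] == cid]
        matrix[cid] = {
            d["name"]: PRIMITIVES[d["primitive"]]([r[d["column"]] for r in rows])
            for d in defs
        }
    return matrix


# Made-up training and test transactions with the same schema.
train_tx = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 5.0},
]
test_tx = [
    {"customer_id": 1, "amount": 7.0},
    {"customer_id": 2, "amount": 3.0},
    {"customer_id": 2, "amount": 9.0},
]

train_fm = calculate_matrix(feature_defs, train_tx, [1, 2])
test_fm = calculate_matrix(feature_defs, test_tx, [1, 2])
```

In a deployment setting, the definitions are produced once at training time; only the calculate step runs when new data arrives.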