Deployment

Deployment of machine learning models requires repeating feature engineering steps on new data. In some cases, these steps need to be performed in near real-time. Featuretools has capabilities to ease the deployment of feature engineering.

Saving Features

First, let’s generate some training and test data in the same format. We pass a different random seed when loading the test data so that its values differ from the training data.

In [1]: import featuretools as ft

In [2]: es_train = ft.demo.load_mock_customer(return_entityset=True)

In [3]: es_test = ft.demo.load_mock_customer(return_entityset=True, random_seed=33)

Now let’s build some feature definitions using DFS. Because we have categorical features, we also encode them with one-hot encoding based on the values in the training data.

In [4]: feature_matrix, feature_defs = ft.dfs(entityset=es_train,
   ...:                                       target_entity="customers")
   ...: 

In [5]: feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)

In [6]: feature_matrix_enc
Out[6]: 
             zip_code = 02139  zip_code = 60091  zip_code = unknown  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount)  MODE(sessions.device) = desktop  MODE(sessions.device) = tablet  MODE(sessions.device) = mobile  MODE(sessions.device) = unknown                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id
1                           0                 1                   0                  131               10                  10236.77                                1                               0                               0                                0                   ...                                                     169.77                                 0.610052                                   41.95                               791.976505                              175.939423                                 9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2                           1                 0                   0                  122                8                   9118.81                                0                               0                               1                                0                   ...                                                     114.85                                 0.492531                                   42.96                               596.243506                              230.333502                                10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3                           1                 0                   0                   78                5                   5758.24                                1                               0                               0                                0                   ...                                                      64.98                                 0.645728                                   21.77                               369.770121                              471.048551                                 9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4                           0                 1                   0                  111                8                   8205.28                                1                               0                               0                                0                   ...                                                      83.53                                 0.516262                                   17.27                               584.673126                              322.883448                                13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5                           1                 0                   0                   58                4                   4571.37                                0                               1                               0                                0                   ...                                                      73.09                                 0.830112                                   27.46                               313.448942                              198.522508                                 8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 102 columns]
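
To make "based on the values in the training data" concrete, here is a minimal, library-independent sketch of the idea in plain Python. It is not how ft.encode_features is implemented. The helper names fit_one_hot and apply_one_hot, the top_n parameter, and the zip code "13244" are assumptions made up for this illustration. The categories are chosen once from the training values, and any value not seen in training falls into the "unknown" column.

```python
def fit_one_hot(train_values, top_n=10):
    """Pick the categories to encode, using the training values only."""
    counts = {}
    for value in train_values:
        counts[value] = counts.get(value, 0) + 1
    # Most frequent first; ties broken by value for a stable column order.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    return ordered[:top_n]


def apply_one_hot(values, categories, name):
    """Encode values against a fixed category list, plus an 'unknown' column."""
    rows = []
    for value in values:
        row = {f"{name} = {c}": int(value == c) for c in categories}
        row[f"{name} = unknown"] = int(value not in categories)
        rows.append(row)
    return rows


# Training zip codes as they appear in the encoded matrix above.
train_zip = ["60091", "02139", "02139", "60091", "02139"]
# Hypothetical new data: "13244" never appears in training.
test_zip = ["60091", "02139", "13244"]

categories = fit_one_hot(train_zip)
encoded_test = apply_one_hot(test_zip, categories, "zip_code")
for row in encoded_test:
    print(row)
```

Because the category list is fixed at training time, the encoded columns stay the same no matter which values appear in new data.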

Now, we can use featuretools.save_features() to save a list of features.

In [7]: ft.save_features(features_enc, "feature_definitions")
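
What gets saved is the list of feature definitions, not the computed values. The toy sketch below illustrates that distinction with plain Python and JSON. It makes no claim about the file format Featuretools actually uses; the PRIMITIVES registry, the helper functions, and the file name are all hypothetical.

```python
import json
import os
import tempfile

# Toy registry mapping a primitive name to a plain Python function.
PRIMITIVES = {"COUNT": len, "SUM": sum, "MAX": max}


def save_definitions(definitions, path):
    """Write feature definitions (not feature values) to a JSON file."""
    with open(path, "w") as f:
        json.dump(definitions, f)


def load_definitions(path):
    """Read feature definitions back from a JSON file."""
    with open(path) as f:
        return json.load(f)


def calculate(definitions, amounts_by_customer):
    """Apply every definition to each customer's list of amounts."""
    return {
        customer: {d["name"]: PRIMITIVES[d["primitive"]](amounts) for d in definitions}
        for customer, amounts in amounts_by_customer.items()
    }


definitions = [
    {"name": "COUNT(transactions)", "primitive": "COUNT"},
    {"name": "SUM(transactions.amount)", "primitive": "SUM"},
]

# Save the recipes, then reload them as a separate deployment step would.
path = os.path.join(tempfile.mkdtemp(), "toy_feature_definitions.json")
save_definitions(definitions, path)
reloaded = load_definitions(path)

# Made-up new data: the same recipes yield the same columns with new values.
new_data = {1: [10.0, 5.5], 2: [3.0]}
toy_matrix = calculate(reloaded, new_data)
print(toy_matrix)
```

Storing the recipes rather than the results is what allows the same features to be recomputed on data that did not exist at training time.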

Calculating Feature Matrix for New Data

We can use featuretools.load_features() to read in a list of saved features for our new entity set.

In [8]: saved_features = ft.load_features('feature_definitions', es_test)

After we load the features back in, we can calculate the feature matrix.

In [9]: feature_matrix = ft.calculate_feature_matrix(saved_features)

In [10]: feature_matrix
Out[10]: 
             zip_code = 02139  zip_code = 60091  zip_code = unknown  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount)  MODE(sessions.device) = desktop  MODE(sessions.device) = tablet  MODE(sessions.device) = mobile  MODE(sessions.device) = unknown                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id
1                       False              True               False                  108                7                   8298.18                            False                           False                            True                            False                   ...                                                     145.67                                 0.888409                                   40.48                               541.452307                              264.820242                                11.560551                                 -0.989418                               11.336633                                        1                                -0.193705
2                        True             False               False                   73                5                   5615.36                             True                           False                           False                            False                   ...                                                     106.27                                 0.471924                                   34.93                               380.553253                              420.418805                                 3.513896                                  1.030220                                7.908124                                        1                                -0.191482
3                       False              True               False                   96                7                   8135.65                            False                            True                           False                            False                   ...                                                     160.04                                 0.114599                                   48.71                               581.583008                              377.210618                                12.120119                                  0.130497                               12.869592                                        1                                -0.655836
4                       False              True               False                  140                9                  11240.85                             True                           False                           False                            False                   ...                                                     159.64                                 0.129480                                   29.87                               731.382339                              211.918894                                11.642241                                 -0.271928                                7.969242                                        1                                -0.652966
5                       False              True               False                   83                7                   6781.33                            False                           False                            True                            False                   ...                                                     149.95                                 0.587567                                   60.29                               527.818923                              535.839994                                19.134789                                 -1.195453                               26.460616                                        1                                -0.435026

[5 rows x 102 columns]

As you can see above, we have the exact same features as before, but calculated using our test data.
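
Rather than comparing the two outputs by eye, the column names can be checked programmatically. The helper below is a small sketch, not part of Featuretools; compare_columns is a hypothetical name, and the column lists in the demo are shortened, partly made-up examples.

```python
def compare_columns(train_columns, test_columns):
    """Return (missing, extra, same_order) for two sequences of column names."""
    train_columns = list(train_columns)
    test_columns = list(test_columns)
    # Columns present in training but absent from the new matrix.
    missing = [c for c in train_columns if c not in test_columns]
    # Columns present in the new matrix but never seen in training.
    extra = [c for c in test_columns if c not in train_columns]
    same_order = train_columns == test_columns
    return missing, extra, same_order


# Demo with a deliberately mismatched pair of column lists.
train_cols = ["zip_code = 02139", "zip_code = 60091", "COUNT(sessions)"]
test_cols = ["zip_code = 02139", "COUNT(sessions)", "zip_code = 13244"]

missing, extra, same_order = compare_columns(train_cols, test_cols)
print("missing:", missing)
print("extra:", extra)
print("same order:", same_order)
```

With the matrices from this page, the call would be compare_columns(feature_matrix_enc.columns, feature_matrix.columns), which should report no missing or extra columns if the features were loaded and calculated as shown.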