What is Featuretools?

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [1]: import featuretools as ft

Load Mock Data

In [2]: data = ft.demo.load_mock_customer()

Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

  • customers: unique customers who had sessions
  • sessions: unique sessions and associated attributes
  • transactions: list of events in each session

In [3]: customers_df = data["customers"]

In [4]: customers_df
Out[4]: 
   customer_id zip_code           join_date date_of_birth
0            1    60091 2011-04-17 10:48:33    1994-07-18
1            2    13244 2012-04-15 23:31:04    1986-08-18
2            3    13244 2011-08-13 15:42:34    2003-11-21
3            4    60091 2011-04-08 20:08:14    2006-08-15
4            5    60091 2010-07-17 05:27:50    1984-07-28

In [5]: sessions_df = data["sessions"]

In [6]: sessions_df.sample(5)
Out[6]: 
    session_id  customer_id   device       session_start
13          14            1   tablet 2014-01-01 03:28:00
6            7            3   tablet 2014-01-01 01:39:40
1            2            5   mobile 2014-01-01 00:17:20
28          29            1   mobile 2014-01-01 07:10:05
24          25            3  desktop 2014-01-01 05:59:40

In [7]: transactions_df = data["transactions"]

In [8]: transactions_df.sample(5)
Out[8]: 
     transaction_id  session_id    transaction_time product_id  amount
74              232           5 2014-01-01 01:20:10          1  139.20
231              27          17 2014-01-01 04:10:15          2   90.79
434              36          31 2014-01-01 07:50:10          3   62.35
420              56          30 2014-01-01 07:35:00          3   72.70
54              444           4 2014-01-01 00:58:30          4   43.59

First, we specify a dictionary with all the entities in our dataset.

In [9]: entities = {
   ...:    "customers" : (customers_df, "customer_id"),
   ...:    "sessions" : (sessions_df, "session_id", "session_start"),
   ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
   ...: }
   ...: 
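
Each value in this dictionary is a tuple of the table, the name of its index column, and optionally the name of its time index column. As a rough illustration of what those roles imply, the sketch below checks that an index column is unique and that a time index holds timestamps. It uses plain lists of row dicts rather than DataFrames, and `describe_entity` and all row values are hypothetical helpers for this example, not part of Featuretools.

```python
from datetime import datetime

# Hypothetical miniature tables stored as lists of row dicts (not DataFrames).
customers_rows = [{"customer_id": 2}, {"customer_id": 5}]
sessions_rows = [
    {"session_id": 1, "customer_id": 2, "session_start": datetime(2014, 1, 1, 0, 0, 0)},
    {"session_id": 2, "customer_id": 5, "session_start": datetime(2014, 1, 1, 0, 17, 20)},
]

# Same shape as the entities dictionary above: (table, index[, time_index]).
entities = {
    "customers": (customers_rows, "customer_id"),
    "sessions": (sessions_rows, "session_id", "session_start"),
}


def describe_entity(spec):
    """Summarize one entity tuple (hypothetical helper, not a Featuretools call)."""
    rows, index, *rest = spec
    time_index = rest[0] if rest else None
    ids = [row[index] for row in rows]
    return {
        "index": index,
        "time_index": time_index,
        # An index must identify each row uniquely.
        "index_is_unique": len(ids) == len(set(ids)),
        # A time index, when given, should hold timestamps.
        "time_index_is_datetime": time_index is None
        or all(isinstance(row[time_index], datetime) for row in rows),
    }


summary = {name: describe_entity(spec) for name, spec in entities.items()}
```

Here `customers` has no time index, so only its index uniqueness is checked, while `sessions` is checked on both counts.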

Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the “one” entity the “parent entity” and the “many” entity the “child entity”. A relationship between a parent and child is defined like this:

(parent_entity, parent_variable, child_entity, child_variable)

In this dataset we have two relationships:

In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
   ....:                  ("customers", "customer_id", "sessions", "customer_id")]
   ....: 
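
A one-to-many relationship only makes sense if the parent variable is unique in the parent table and every child value points at an existing parent. The following is a minimal sketch of that check in plain Python, assuming toy tables stored as lists of row dicts. `check_relationship` and the row values are hypothetical and are not part of the Featuretools API.

```python
# Hypothetical toy tables (not the mock dataset), as lists of row dicts.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 10, "customer_id": 1},
        {"session_id": 11, "customer_id": 2},
    ],
    "transactions": [
        {"transaction_id": 100, "session_id": 10},
        {"transaction_id": 101, "session_id": 10},
        {"transaction_id": 102, "session_id": 11},
    ],
}

# Same tuple layout as above:
# (parent_entity, parent_variable, child_entity, child_variable)
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def check_relationship(tables, relationship):
    """Return the child key values that have no matching parent row."""
    parent, parent_var, child, child_var = relationship
    parent_keys = [row[parent_var] for row in tables[parent]]
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError(f"{parent}.{parent_var} is not unique")
    child_keys = {row[child_var] for row in tables[child]}
    return sorted(child_keys - set(parent_keys))


# An empty list means every child row has a parent.
orphans = {rel: check_relationship(tables, rel) for rel in relationships}
```

Both relationships pass here: each transaction belongs to a known session, and each session belongs to a known customer.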

Note

The EntitySet class offers convenient APIs for setting up entities and relationships. See Representing Data with EntitySets for more information.

Run Deep Feature Synthesis

The minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let’s first create a feature matrix with one row for each customer in the data:

In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
   ....:                                                  relationships=relationships,
   ....:                                                  target_entity="customers")
   ....: 

In [12]: feature_matrix_customers
Out[12]: 
            zip_code  COUNT(sessions)  NUM_UNIQUE(sessions.device) MODE(sessions.device)  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(join_date)  DAY(date_of_birth)  YEAR(join_date)  YEAR(date_of_birth)  MONTH(join_date)  MONTH(date_of_birth)  WEEKDAY(join_date)  WEEKDAY(date_of_birth)  SUM(sessions.STD(transactions.amount))  SUM(sessions.MAX(transactions.amount))  SUM(sessions.SKEW(transactions.amount))  SUM(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  SUM(sessions.NUM_UNIQUE(transactions.product_id))  STD(sessions.SUM(transactions.amount))  STD(sessions.MAX(transactions.amount))  STD(sessions.SKEW(transactions.amount))  STD(sessions.MIN(transactions.amount))  STD(sessions.MEAN(transactions.amount))  STD(sessions.COUNT(transactions))  STD(sessions.NUM_UNIQUE(transactions.product_id))  MAX(sessions.SUM(transactions.amount))  MAX(sessions.STD(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  MAX(sessions.MEAN(transactions.amount))  MAX(sessions.COUNT(transactions))  MAX(sessions.NUM_UNIQUE(transactions.product_id))  SKEW(sessions.SUM(transactions.amount))  SKEW(sessions.STD(transactions.amount))  SKEW(sessions.MAX(transactions.amount))  SKEW(sessions.MIN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  SKEW(sessions.COUNT(transactions))  SKEW(sessions.NUM_UNIQUE(transactions.product_id))  MIN(sessions.SUM(transactions.amount))  MIN(sessions.STD(transactions.amount))  MIN(sessions.MAX(transactions.amount))  MIN(sessions.SKEW(transactions.amount))  MIN(sessions.MEAN(transactions.amount))  MIN(sessions.COUNT(transactions))  MIN(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.SUM(transactions.amount))  MEAN(sessions.STD(transactions.amount))  
MEAN(sessions.MAX(transactions.amount))  MEAN(sessions.SKEW(transactions.amount))  MEAN(sessions.MIN(transactions.amount))  MEAN(sessions.MEAN(transactions.amount))  MEAN(sessions.COUNT(transactions))  MEAN(sessions.NUM_UNIQUE(transactions.product_id))  NUM_UNIQUE(sessions.MODE(transactions.product_id))  NUM_UNIQUE(sessions.DAY(session_start))  NUM_UNIQUE(sessions.YEAR(session_start))  NUM_UNIQUE(sessions.MONTH(session_start))  NUM_UNIQUE(sessions.WEEKDAY(session_start))  MODE(sessions.MODE(transactions.product_id))  MODE(sessions.DAY(session_start))  MODE(sessions.YEAR(session_start))  MODE(sessions.MONTH(session_start))  MODE(sessions.WEEKDAY(session_start))
customer_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
1              60091                8                            3                mobile                   9025.62                 40.442059                    139.43                   0.019698                      5.81                  71.631905                  126                                    5                              4              17                  18             2011                 1994                 4                     7                   6                       0                              312.745952                                 1057.97                                -0.476122                                   78.59                               582.193117                                                 40                              279.510713                                7.322191                                 0.589386                                6.954507                                13.759314                           4.062019                                           0.000000                                 1613.93                               46.905665                                 0.640252                                   26.36                                88.755625                                 25                                                  5                                 0.778170                                -0.312355                                -0.780493                                 2.440005                                 -0.424949                            1.946018                                           0.000000                                   809.97                               30.450261                                  118.90                                -1.038434                                50.623125                                 12                                                  5                              1128.202500                                39.093244              
                 132.246250                                 -0.059515                                 9.823750                                 72.774140                           15.750000                                           5.000000                                                   4                                         1                                         1                                          1                                            1                                             4                                  1                                2014                                    1                                      2
2              13244                7                            3               desktop                   7200.28                 37.705178                    146.81                   0.098259                      8.73                  77.422366                   93                                    5                              4              15                  18             2012                 1986                 4                     8                   6                       0                              258.700528                                  931.63                                -0.277640                                  154.60                               548.905851                                                 35                              251.609234                               17.221593                                 0.509798                               15.874374                                11.477071                           3.450328                                           0.000000                                 1320.64                               47.935920                                 0.755711                                   56.46                                96.581000                                 18                                                  5                                -0.440929                                 0.013087                                -1.539467                                 2.154929                                  0.235296                           -0.303276                                           0.000000                                   634.84                               27.839228                                  100.04                                -0.763603                                61.910000                                  8                                                  5                              1028.611429                                36.957218              
                 133.090000                                 -0.039663                                22.085714                                 78.415122                           13.285714                                           5.000000                                                   4                                         1                                         1                                          1                                            1                                             3                                  1                                2014                                    1                                      2
3              13244                6                            3               desktop                   6236.62                 43.683296                    149.15                   0.418230                      5.89                  67.060430                   93                                    5                              1              13                  21             2011                 2003                 8                    11                   5                       4                              257.299895                                  847.63                                 2.286086                                   66.21                               405.237462                                                 29                              219.021420                               10.724241                                 0.429374                                5.424407                                11.174282                           2.428992                                           0.408248                                 1477.97                               50.110120                                 0.854976                                   20.06                                82.109444                                 18                                                  5                                 2.246479                                -0.245703                                -0.941078                                 1.000771                                  0.678544                           -1.507217                                          -2.449490                                   889.21                               35.704680                                  126.74                                -0.289466                                55.579412                                 11                                                  4                              1039.436667                                42.883316              
                 141.271667                                  0.381014                                11.035000                                 67.539577                           15.500000                                           4.833333                                                   4                                         1                                         1                                          1                                            1                                             1                                  1                                2014                                    1                                      2
4              60091                8                            3                mobile                   8727.68                 45.068765                    149.95                  -0.036348                      5.73                  80.070459                  109                                    5                              2               8                  15             2011                 2006                 4                     8                   4                       1                              356.125829                                 1157.99                                 0.002764                                  131.51                               649.657515                                                 37                              235.992478                                3.514421                                 0.387884                               16.960575                                13.027258                           3.335416                                           0.517549                                 1351.46                               54.293903                                 0.382868                                   54.83                               110.450000                                 18                                                  5                                -0.391805                                -1.065663                                 0.027256                                 2.103510                                  1.980948                            0.282488                                          -0.644061                                   771.68                               29.026424                                  139.20                                -0.711744                                70.638182                                 10                                                  4                              1090.960000                                44.515729              
                 144.748750                                  0.000346                                16.438750                                 81.207189                           13.625000                                           4.625000                                                   5                                         1                                         1                                          1                                            1                                             1                                  1                                2014                                    1                                      2
5              60091                6                            3                mobile                   6349.66                 44.095630                    149.02                  -0.025941                      7.55                  80.375443                   79                                    5                              5              17                  28             2010                 1984                 7                     7                   5                       5                              259.873954                                  839.76                                 0.014384                                   86.49                               472.231119                                                 30                              402.775486                                7.928001                                 0.415426                                4.961414                                11.007471                           3.600926                                           0.000000                                 1700.67                               51.149250                                 0.602209                                   20.65                                94.481667                                 18                                                  5                                 0.472342                                 0.204548                                -0.333796                                -0.470410                                  0.335175                           -0.317685                                           0.000000                                   543.18                               36.734681                                  128.51                                -0.539060                                66.666667                                  8                                                  5                              1058.276667                                43.312326              
                 139.960000                                  0.002397                                14.415000                                 78.705187                           13.166667                                           5.000000                                                   5                                         1                                         1                                          1                                            1                                             3                                  1                                2014                                    1                                      2

We now have dozens of new features to describe a customer’s behavior.
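
The feature names describe how each column is computed. A name such as MEAN(sessions.SUM(transactions.amount)) stacks two aggregations: sum the amounts within each session, then average those sums across a customer's sessions. In the output above, customer 1 has SUM(transactions.amount) = 9025.62 over COUNT(sessions) = 8, and 9025.62 / 8 = 1128.2025, which matches that customer's MEAN(sessions.SUM(transactions.amount)). Names such as DAY(join_date) are transforms of a single column. The sketch below reproduces both kinds of feature in plain Python. The session and transaction rows are hypothetical toy values, and this only illustrates the idea, not how Featuretools computes features internally.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy rows (not the mock dataset), as plain dicts.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 20.0},
    {"session_id": 2, "amount": 5.0},
    {"session_id": 3, "amount": 7.5},
]

# Depth 1: SUM(transactions.amount) for each session.
session_sum = defaultdict(float)
for t in transactions:
    session_sum[t["session_id"]] += t["amount"]

# Depth 2: MEAN(sessions.SUM(transactions.amount)) for each customer.
sums_by_customer = defaultdict(list)
for s in sessions:
    sums_by_customer[s["customer_id"]].append(session_sum[s["session_id"]])
mean_of_sums = {cid: mean(values) for cid, values in sums_by_customer.items()}

# Transform features: derived from one datetime value, no aggregation.
# This is customer 1's join_date from the customers table above.
join_date = datetime(2011, 4, 17, 10, 48, 33)
transform_features = {
    "DAY(join_date)": join_date.day,
    "YEAR(join_date)": join_date.year,
    "MONTH(join_date)": join_date.month,
    "WEEKDAY(join_date)": join_date.weekday(),  # Monday is 0, Sunday is 6
}
```

The transform values agree with customer 1's row in the feature matrix: day 17, year 2011, month 4, weekday 6.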

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, to build features for sessions, we only change the target entity:

In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
   ....:                                                 relationships=relationships,
   ....:                                                 target_entity="sessions")
   ....: 

In [14]: feature_matrix_sessions.head(5)
Out[14]: 
            customer_id   device  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(session_start)  YEAR(session_start)  MONTH(session_start)  WEEKDAY(session_start) customers.zip_code  NUM_UNIQUE(transactions.DAY(transaction_time))  NUM_UNIQUE(transactions.YEAR(transaction_time))  NUM_UNIQUE(transactions.MONTH(transaction_time))  NUM_UNIQUE(transactions.WEEKDAY(transaction_time))  MODE(transactions.DAY(transaction_time))  MODE(transactions.YEAR(transaction_time))  MODE(transactions.MONTH(transaction_time))  MODE(transactions.WEEKDAY(transaction_time))  customers.COUNT(sessions)  customers.NUM_UNIQUE(sessions.device) customers.MODE(sessions.device)  customers.SUM(transactions.amount)  customers.STD(transactions.amount)  customers.MAX(transactions.amount)  customers.SKEW(transactions.amount)  customers.MIN(transactions.amount)  customers.MEAN(transactions.amount)  customers.COUNT(transactions)  customers.NUM_UNIQUE(transactions.product_id)  customers.MODE(transactions.product_id)  customers.DAY(join_date)  customers.DAY(date_of_birth)  customers.YEAR(join_date)  customers.YEAR(date_of_birth)  customers.MONTH(join_date)  customers.MONTH(date_of_birth)  customers.WEEKDAY(join_date)  customers.WEEKDAY(date_of_birth)
session_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
1                     2  desktop                   1229.01                 41.600976                    141.66                   0.295458                     20.91                  76.813125                   16                                    5                              3                   1                 2014                     1                       2              13244                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          7                                      3                         desktop                             7200.28                           37.705178                              146.81                             0.098259                                8.73                            77.422366                             93                                              5                                        4                        15                            18                       2012                           1986                           4                               8                             6                                 0
2                     5   mobile                    746.96                 45.893591                    135.25                  -0.160550                      9.32                  74.696000                   10                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          6                                      3                          mobile                             6349.66                           44.095630                              149.02                            -0.025941                                7.55                            80.375443                             79                                              5                                        5                        17                            28                       2010                           1984                           7                               7                             5                                 5
3                     4   mobile                   1329.00                 46.240016                    147.73                  -0.324012                      8.70                  88.600000                   15                                    5                              1                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                         8                            15                       2011                           2006                           4                               8                             4                                 1
4                     1   mobile                   1613.93                 40.187205                    129.00                   0.234349                      6.29                  64.557200                   25                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             9025.62                           40.442059                              139.43                             0.019698                                5.81                            71.631905                            126                                              5                                        4                        17                            18                       2011                           1994                           4                               7                             6                                 0
5                     4   mobile                    777.02                 48.918663                    139.20                   0.336381                      7.43                  70.638182                   11                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                         8                            15                       2011                           2006                           4                               8                             4                                 1
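
Columns prefixed with customers. are values carried down from each session's parent customer. That is why sessions 3 and 5, which both belong to customer 4, show identical customers.* values, and why session 1 carries customer 2's zip code of 13244. A minimal sketch of this parent lookup in plain Python follows. It uses a hypothetical three-session subset, so the counts differ from the full mock dataset.

```python
from collections import Counter

# Hypothetical subset: two customers and three of their sessions.
customers = {
    2: {"zip_code": "13244"},
    4: {"zip_code": "60091"},
}
sessions = [
    {"session_id": 1, "customer_id": 2},
    {"session_id": 3, "customer_id": 4},
    {"session_id": 5, "customer_id": 4},
]

# Aggregation on the parent: COUNT(sessions) for each customer.
sessions_per_customer = Counter(s["customer_id"] for s in sessions)

# Lookup from child to parent: every session inherits its customer's values.
feature_rows = {
    s["session_id"]: {
        "customers.zip_code": customers[s["customer_id"]]["zip_code"],
        "customers.COUNT(sessions)": sessions_per_customer[s["customer_id"]],
    }
    for s in sessions
}
```

Sessions that share a customer end up with the same customers.* columns, just as in the feature matrix above.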

What’s next?