What is Featuretools?

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [1]: import featuretools as ft

Load Mock Data

In [2]: data = ft.demo.load_mock_customer()

Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

  • customers: unique customers who had sessions
  • sessions: unique sessions and associated attributes
  • transactions: list of events in each session

In [3]: customers_df = data["customers"]

In [4]: customers_df
Out[4]: 
   customer_id zip_code           join_date date_of_birth
0            1    60091 2011-04-17 10:48:33    1994-07-18
1            2    13244 2012-04-15 23:31:04    1986-08-18
2            3    13244 2011-08-13 15:42:34    2003-11-21
3            4    60091 2011-04-08 20:08:14    2006-08-15
4            5    60091 2010-07-17 05:27:50    1984-07-28

In [5]: sessions_df = data["sessions"]

In [6]: sessions_df.sample(5)
Out[6]: 
    session_id  customer_id   device       session_start
13          14            1   tablet 2014-01-01 03:28:00
6            7            3   tablet 2014-01-01 01:39:40
1            2            5   mobile 2014-01-01 00:17:20
28          29            1   mobile 2014-01-01 07:10:05
24          25            3  desktop 2014-01-01 05:59:40

In [7]: transactions_df = data["transactions"]

In [8]: transactions_df.sample(5)
Out[8]: 
     transaction_id  session_id    transaction_time product_id  amount
74              232           5 2014-01-01 01:20:10          1  139.20
231              27          17 2014-01-01 04:10:15          2   90.79
434              36          31 2014-01-01 07:50:10          3   62.35
420              56          30 2014-01-01 07:35:00          3   72.70
54              444           4 2014-01-01 00:58:30          4   43.59

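First, we specify a dictionary with all the entities in our dataset. Each value is a tuple containing the dataframe, the name of its index column, and, optionally, the name of its time index column.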

In [9]: entities = {
   ...:    "customers" : (customers_df, "customer_id"),
   ...:    "sessions" : (sessions_df, "session_id", "session_start"),
   ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
   ...: }
   ...: 

Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the “one” entity the “parent entity” and the “many” entity the “child entity”. A relationship between a parent and child is defined like this:

(parent_entity, parent_variable, child_entity, child_variable)

In this dataset we have two relationships:

In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
   ....:                  ("customers", "customer_id", "sessions", "customer_id")]
   ....: 
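
The tuple format above can be sanity-checked before running DFS. The following is a minimal sketch in plain Python, not part of the Featuretools API: the miniature tables and the `check_relationship` helper are hypothetical and exist only for illustration. It confirms that the parent variable is unique and that every child row points at an existing parent row.

```python
# Hypothetical miniature tables: each row is a dict using the column names above.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 1, "customer_id": 2},
        {"session_id": 2, "customer_id": 1},
        {"session_id": 3, "customer_id": 1},
    ],
    "transactions": [
        {"transaction_id": 10, "session_id": 1},
        {"transaction_id": 11, "session_id": 3},
        {"transaction_id": 12, "session_id": 3},
    ],
}

# Same format as in the text:
# (parent_entity, parent_variable, child_entity, child_variable)
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def check_relationship(tables, relationship):
    """Return the child key values that have no matching parent row.

    Hypothetical helper for illustration; Featuretools does not provide it.
    """
    parent_entity, parent_variable, child_entity, child_variable = relationship
    parent_keys = [row[parent_variable] for row in tables[parent_entity]]
    # The "one" side of a one-to-many relationship must identify rows uniquely.
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError(f"{parent_entity}.{parent_variable} is not unique")
    child_keys = {row[child_variable] for row in tables[child_entity]}
    return sorted(child_keys - set(parent_keys))


# An empty list means every child row has a parent.
orphans = {rel: check_relationship(tables, rel) for rel in relationships}
```

With the toy rows above, `orphans` maps both relationships to an empty list. A session that referenced a missing customer would show up as that customer's id in the list.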

Note

The EntitySet class offers convenient APIs for setting up and managing entities and relationships like these. See Representing Data with EntitySets for more information.

Run Deep Feature Synthesis

The minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let’s first create a feature matrix for each customer in the data:

In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
   ....:                                                  relationships=relationships,
   ....:                                                  target_entity="customers")
   ....: 

In [12]: feature_matrix_customers
Out[12]: 
            zip_code  COUNT(transactions)  DAY(date_of_birth)  MONTH(date_of_birth)  COUNT(sessions)  YEAR(date_of_birth)  SUM(transactions.amount)  WEEKDAY(date_of_birth) MODE(sessions.device)  MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)  NUM_UNIQUE(sessions.device)  MONTH(join_date)  MEAN(transactions.amount)  NUM_UNIQUE(transactions.product_id)  WEEKDAY(join_date)  MODE(transactions.product_id)  STD(transactions.amount)  SKEW(sessions.STD(transactions.amount))  SKEW(sessions.SUM(transactions.amount))  NUM_UNIQUE(sessions.WEEKDAY(session_start))  MAX(sessions.NUM_UNIQUE(transactions.product_id))  MIN(sessions.STD(transactions.amount))  MODE(sessions.WEEKDAY(session_start))  MEAN(sessions.COUNT(transactions))  SUM(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.NUM_UNIQUE(transactions.product_id))  MAX(sessions.MEAN(transactions.amount))  SKEW(sessions.COUNT(transactions))  MIN(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.MAX(transactions.amount))  STD(sessions.MIN(transactions.amount))  MEAN(sessions.MIN(transactions.amount))  MAX(sessions.SUM(transactions.amount))  SUM(sessions.SKEW(transactions.amount))  MEAN(sessions.STD(transactions.amount))  NUM_UNIQUE(sessions.MONTH(session_start))  MODE(sessions.DAY(session_start))  NUM_UNIQUE(sessions.YEAR(session_start))  SKEW(sessions.MAX(transactions.amount))  MEAN(sessions.MEAN(transactions.amount))  MAX(sessions.STD(transactions.amount))  SUM(sessions.MAX(transactions.amount))  MODE(sessions.MONTH(session_start))  STD(sessions.COUNT(transactions))  SKEW(sessions.MIN(transactions.amount))  STD(sessions.SKEW(transactions.amount))  MIN(sessions.MEAN(transactions.amount))  MIN(sessions.COUNT(transactions))  MEAN(sessions.SUM(transactions.amount))  SUM(sessions.STD(transactions.amount))  MODE(sessions.MODE(transactions.product_id))  MIN(sessions.SUM(transactions.amount))  STD(sessions.NUM_UNIQUE(transactions.product_id))  
SKEW(sessions.NUM_UNIQUE(transactions.product_id))  MODE(sessions.YEAR(session_start))  NUM_UNIQUE(sessions.MODE(transactions.product_id))  MEAN(sessions.SKEW(transactions.amount))  MIN(sessions.MAX(transactions.amount))  MAX(sessions.COUNT(transactions))  SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
1              60091                  126                  18                     7                8                 1994                   9025.62                       0                mobile                      5.81                    139.43             2011                   0.019698              17                            3                 4                  71.631905                                    5                   6                              4                 40.442059                                -0.312355                                 0.778170                                            1                                                  5                               30.450261                                      2                           15.750000                                                 40                                           5.000000                                 88.755625                            1.946018                                                  5                               132.246250                                6.954507                                 9.823750                                 1613.93                                -0.476122                                39.093244                                          1                                  1                                         1                                -0.780493                                 72.774140                               46.905665                                 1057.97                                    1                           4.062019                                 2.440005                                 0.589386                                50.623125                                 12                              1128.202500                              312.745952                                             4                                  809.97                                           0.000000        
                                   0.000000                                 2014                                                  4                                  -0.059515                                  118.90                                 25                                   78.59                                 0.640252                                   26.36                               582.193117                              279.510713                                13.759314                                 -0.424949                                7.322191                                        1                                -1.038434
2              13244                   93                  18                     8                7                 1986                   7200.28                       0               desktop                      8.73                    146.81             2012                   0.098259              15                            3                 4                  77.422366                                    5                   6                              4                 37.705178                                 0.013087                                -0.440929                                            1                                                  5                               27.839228                                      2                           13.285714                                                 35                                           5.000000                                 96.581000                           -0.303276                                                  5                               133.090000                               15.874374                                22.085714                                 1320.64                                -0.277640                                36.957218                                          1                                  1                                         1                                -1.539467                                 78.415122                               47.935920                                  931.63                                    1                           3.450328                                 2.154929                                 0.509798                                61.910000                                  8                              1028.611429                              258.700528                                             3                                  634.84                                           0.000000        
                                   0.000000                                 2014                                                  4                                  -0.039663                                  100.04                                 18                                  154.60                                 0.755711                                   56.46                               548.905851                              251.609234                                11.477071                                  0.235296                               17.221593                                        1                                -0.763603
3              13244                   93                  21                    11                6                 2003                   6236.62                       4               desktop                      5.89                    149.15             2011                   0.418230              13                            3                 8                  67.060430                                    5                   5                              1                 43.683296                                -0.245703                                 2.246479                                            1                                                  5                               35.704680                                      2                           15.500000                                                 29                                           4.833333                                 82.109444                           -1.507217                                                  4                               141.271667                                5.424407                                11.035000                                 1477.97                                 2.286086                                42.883316                                          1                                  1                                         1                                -0.941078                                 67.539577                               50.110120                                  847.63                                    1                           2.428992                                 1.000771                                 0.429374                                55.579412                                 11                              1039.436667                              257.299895                                             1                                  889.21                                           0.408248        
                                  -2.449490                                 2014                                                  4                                   0.381014                                  126.74                                 18                                   66.21                                 0.854976                                   20.06                               405.237462                              219.021420                                11.174282                                  0.678544                               10.724241                                        1                                -0.289466
4              60091                  109                  15                     8                8                 2006                   8727.68                       1                mobile                      5.73                    149.95             2011                  -0.036348               8                            3                 4                  80.070459                                    5                   4                              2                 45.068765                                -1.065663                                -0.391805                                            1                                                  5                               29.026424                                      2                           13.625000                                                 37                                           4.625000                                110.450000                            0.282488                                                  4                               144.748750                               16.960575                                16.438750                                 1351.46                                 0.002764                                44.515729                                          1                                  1                                         1                                 0.027256                                 81.207189                               54.293903                                 1157.99                                    1                           3.335416                                 2.103510                                 0.387884                                70.638182                                 10                              1090.960000                              356.125829                                             1                                  771.68                                           0.517549        
                                  -0.644061                                 2014                                                  5                                   0.000346                                  139.20                                 18                                  131.51                                 0.382868                                   54.83                               649.657515                              235.992478                                13.027258                                  1.980948                                3.514421                                        1                                -0.711744
5              60091                   79                  28                     7                6                 1984                   6349.66                       5                mobile                      7.55                    149.02             2010                  -0.025941              17                            3                 7                  80.375443                                    5                   5                              5                 44.095630                                 0.204548                                 0.472342                                            1                                                  5                               36.734681                                      2                           13.166667                                                 30                                           5.000000                                 94.481667                           -0.317685                                                  5                               139.960000                                4.961414                                14.415000                                 1700.67                                 0.014384                                43.312326                                          1                                  1                                         1                                -0.333796                                 78.705187                               51.149250                                  839.76                                    1                           3.600926                                -0.470410                                 0.415426                                66.666667                                  8                              1058.276667                              259.873954                                             3                                  543.18                                           0.000000        
                                   0.000000                                 2014                                                  5                                   0.002397                                  128.51                                 18                                   86.49                                 0.602209                                   20.65                               472.231119                              402.775486                                11.007471                                  0.335175                                7.928001                                        1                                -0.539060

We now have dozens of new features to describe a customer’s behavior.
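
To make the column names concrete, here is a minimal plain-Python sketch of what three of these features mean. It is an illustration only, not how Featuretools computes them, and the toy rows and amounts are made up. It shows how an aggregation can be stacked on top of another aggregation, which is where the "deep" in Deep Feature Synthesis comes from.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows; the amounts are invented for illustration.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 20.0},
    {"session_id": 3, "amount": 5.0},
]

# Depth 1, computed per session: SUM(transactions.amount).
session_sum = defaultdict(float)
for t in transactions:
    session_sum[t["session_id"]] += t["amount"]

# Collect the per-session sums under each session's parent customer.
sums_by_customer = defaultdict(list)
for s in sessions:
    sums_by_customer[s["customer_id"]].append(session_sum[s["session_id"]])

features = {
    customer_id: {
        "COUNT(sessions)": len(sums),
        # Adding up the per-session sums covers all of the customer's transactions.
        "SUM(transactions.amount)": sum(sums),
        # Depth 2: an aggregation applied to an aggregation.
        "MEAN(sessions.SUM(transactions.amount))": mean(sums),
    }
    for customer_id, sums in sums_by_customer.items()
}
```

The same arithmetic is consistent with the matrix above: customer 1 has SUM(transactions.amount) of 9025.62 across 8 sessions, and 9025.62 / 8 = 1128.2025, which matches MEAN(sessions.SUM(transactions.amount)) for that row.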

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, to build features for sessions instead, we only need to change the target entity:

In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
   ....:                                                 relationships=relationships,
   ....:                                                 target_entity="sessions")
   ....: 

In [14]: feature_matrix_sessions.head(5)
Out[14]: 
            customer_id   device  WEEKDAY(session_start)  MONTH(session_start)  MODE(transactions.product_id)  MEAN(transactions.amount) customers.zip_code  DAY(session_start)  MIN(transactions.amount)  NUM_UNIQUE(transactions.product_id)  YEAR(session_start)  COUNT(transactions)  SKEW(transactions.amount)  SUM(transactions.amount)  MAX(transactions.amount)  STD(transactions.amount)  NUM_UNIQUE(transactions.WEEKDAY(transaction_time))  customers.MODE(transactions.product_id)  customers.SKEW(transactions.amount)  customers.STD(transactions.amount)  customers.SUM(transactions.amount)  MODE(transactions.DAY(transaction_time))  customers.DAY(join_date)  customers.MONTH(date_of_birth)  customers.NUM_UNIQUE(transactions.product_id)  customers.COUNT(sessions)  customers.YEAR(date_of_birth)  NUM_UNIQUE(transactions.YEAR(transaction_time))  customers.YEAR(join_date)  customers.MEAN(transactions.amount)  customers.MONTH(join_date) customers.MODE(sessions.device)  customers.WEEKDAY(join_date)  customers.DAY(date_of_birth)  MODE(transactions.MONTH(transaction_time))  customers.COUNT(transactions)  customers.MAX(transactions.amount)  customers.MIN(transactions.amount)  MODE(transactions.YEAR(transaction_time))  customers.NUM_UNIQUE(sessions.device)  MODE(transactions.WEEKDAY(transaction_time))  customers.WEEKDAY(date_of_birth)  NUM_UNIQUE(transactions.MONTH(transaction_time))  NUM_UNIQUE(transactions.DAY(transaction_time))
session_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
1                     2  desktop                       2                     1                              3                  76.813125              13244                   1                     20.91                                    5                 2014                   16                   0.295458                   1229.01                    141.66                 41.600976                                                  1                                         4                             0.098259                           37.705178                             7200.28                                         1                        15                               8                                              5                          7                           1986                                                1                       2012                            77.422366                           4                         desktop                             6                            18                                           1                             93                              146.81                                8.73                                       2014                                      3                                             2                                 0                                                 1                                               1
2                     5   mobile                       2                     1                              5                  74.696000              60091                   1                      9.32                                    5                 2014                   10                  -0.160550                    746.96                    135.25                 45.893591                                                  1                                         5                            -0.025941                           44.095630                             6349.66                                         1                        17                               7                                              5                          6                           1984                                                1                       2010                            80.375443                           7                          mobile                             5                            28                                           1                             79                              149.02                                7.55                                       2014                                      3                                             2                                 5                                                 1                                               1
3                     4   mobile                       2                     1                              1                  88.600000              60091                   1                      8.70                                    5                 2014                   15                  -0.324012                   1329.00                    147.73                 46.240016                                                  1                                         2                            -0.036348                           45.068765                             8727.68                                         1                         8                               8                                              5                          8                           2006                                                1                       2011                            80.070459                           4                          mobile                             4                            15                                           1                            109                              149.95                                5.73                                       2014                                      3                                             2                                 1                                                 1                                               1
4                     1   mobile                       2                     1                              5                  64.557200              60091                   1                      6.29                                    5                 2014                   25                   0.234349                   1613.93                    129.00                 40.187205                                                  1                                         4                             0.019698                           40.442059                             9025.62                                         1                        17                               7                                              5                          8                           1994                                                1                       2011                            71.631905                           4                          mobile                             6                            18                                           1                            126                              139.43                                5.81                                       2014                                      3                                             2                                 0                                                 1                                               1
5                     4   mobile                       2                     1                              5                  70.638182              60091                   1                      7.43                                    5                 2014                   11                   0.336381                    777.02                    139.20                 48.918663                                                  1                                         2                            -0.036348                           45.068765                             8727.68                                         1                         8                               8                                              5                          8                           2006                                                1                       2011                            80.070459                           4                          mobile                             4                            15                                           1                            109                              149.95                                5.73                                       2014                                      3                                             2                                 1                                                 1                                               1
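
Columns prefixed with `customers.` in this matrix carry values from each session's parent customer down to the session row. The sketch below illustrates that lookup in plain Python; it is not Featuretools code. It uses two sessions and two customers taken from the outputs above, and stores zip codes as strings, which is an assumption made for this example.

```python
# Values taken from the customers table shown earlier, keyed by customer_id.
# Storing zip codes as strings is an assumption of this sketch.
customers = {
    1: {"zip_code": "60091"},
    2: {"zip_code": "13244"},
}

# Sessions 1 and 4 from the feature matrix above.
sessions = [
    {"session_id": 1, "customer_id": 2, "device": "desktop"},
    {"session_id": 4, "customer_id": 1, "device": "mobile"},
]

# For each session, follow customer_id to the parent row and copy its value.
session_features = {
    s["session_id"]: {
        "device": s["device"],
        "customers.zip_code": customers[s["customer_id"]]["zip_code"],
    }
    for s in sessions
}
```

Session 1 belongs to customer 2, so it receives that customer's zip code, in line with the first row of the matrix above. Aggregated parent features such as customers.COUNT(sessions) reach the session rows in the same way.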

What’s next?

Get help

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

  1. For usage questions, use Stack Overflow with the featuretools tag.
  2. For bugs, issues, or feature requests, start a GitHub issue.
  3. For discussion regarding development on the core library, use Gitter.
  4. For everything else, the core developers can be reached by email at help@featuretools.com.