featuretools.wrappers.DFSTransformer

class featuretools.wrappers.DFSTransformer(entities=None, relationships=None, entityset=None, target_entity=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_entities=None, ignore_variables=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False, profile=False)

Transformer using the scikit-learn interface, for use in Pipelines.

__init__(entities=None, relationships=None, entityset=None, target_entity=None, agg_primitives=None, trans_primitives=None, allowed_paths=None, max_depth=2, ignore_entities=None, ignore_variables=None, seed_features=None, drop_contains=None, drop_exact=None, where_primitives=None, max_features=-1, verbose=False, profile=False)

Creates Transformer

Parameters:
  • entities (dict[str -> tuple(pd.DataFrame, str, str)]) – Dictionary of entities. Entries take the format {entity id -> (dataframe, id column, (time_column))}.
  • relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).
  • entityset (EntitySet) – An already initialized entityset. Required if entities and relationships are not defined.
  • target_entity (str) – Entity id of entity on which to make predictions.
  • agg_primitives (list[str or AggregationPrimitive], optional) –

    List of Aggregation Feature types to apply.

    Default: ["sum", "std", "max", "skew", "min", "mean",
    "count", "percent_true", "n_unique", "mode"]
  • trans_primitives (list[str or TransformPrimitive], optional) –

    List of Transform Feature functions to apply.

    Default: ["day", "year", "month", "weekday", "haversine",
    "num_words", "num_characters"]
  • allowed_paths (list[list[str]]) – Allowed entity paths on which to make features.
  • max_depth (int) – Maximum allowed depth of features.
  • ignore_entities (list[str], optional) – List of entities to blacklist when creating features.
  • ignore_variables (dict[str -> list[str]], optional) – List of specific variables within each entity to blacklist when creating features.
  • seed_features (list[PrimitiveBase]) – List of manually defined features to use.
  • drop_contains (list[str], optional) – Drop features whose names contain any of these strings.
  • drop_exact (list[str], optional) – Drop features whose names exactly match any of these strings.
  • where_primitives (list[str or PrimitiveBase], optional) –

    List of primitive names (or types) to apply with where clauses.

    Default: ["count"]
  • max_features (int, optional) – Cap the number of generated features at this number. If -1, no limit.
  • verbose (bool, optional) – Enables verbose output if True.
  • profile (bool, optional) – Enables profiling if True.
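
For illustration, a minimal construction sketch with a restricted primitive set (the primitive choices and feature cap below are arbitrary examples, not the defaults):

import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# Generate only "mean"/"sum" aggregations and "month" transforms,
# and cap the output at 20 features.
dfs = ft.wrappers.DFSTransformer(entityset=es,
                                 target_entity="customers",
                                 agg_primitives=["mean", "sum"],
                                 trans_primitives=["month"],
                                 max_features=20)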

Example

In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: from sklearn.pipeline import Pipeline

In [4]: from sklearn.ensemble import ExtraTreesClassifier

# Get example data
In [5]: n_customers = 5

In [6]: es = ft.demo.load_mock_customer(return_entityset=True, n_customers=n_customers)

In [7]: y = [True, False, True]  # labels for the first 3 customers

# Build dataset
In [8]: pipeline = Pipeline(steps=[
   ...:     ('ft', ft.wrappers.DFSTransformer(entityset=es,
   ...:                                       target_entity="customers",
   ...:                                       max_features=3)),
   ...:     ('et', ExtraTreesClassifier(n_estimators=100))
   ...: ])
   ...: 

# Fit and predict
In [9]: pipeline.fit([1, 2, 3], y=y) # fit on first 3 customers
Out[9]: 
Pipeline(memory=None,
     steps=[('ft', <featuretools.wrappers.sklearn.DFSTransformer object at 0x113436198>), ('et', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False))])

In [10]: pipeline.predict_proba([4,5]) # predict probability of each class on last 2
Out[10]: 
array([[0., 1.],
       [0., 1.]])

In [11]: pipeline.predict([4,5]) # predict on last 2
Out[11]: array([ True,  True])

# Same as above, but using cutoff times
In [12]: ct = pd.DataFrame()

In [13]: ct['customer_id'] = [1, 2, 3, 4, 5]

In [14]: ct['time'] = pd.to_datetime(['2014-1-1 04:00',
   ....:                              '2014-1-2 17:20',
   ....:                              '2014-1-4 09:53',
   ....:                              '2014-1-4 13:48',
   ....:                              '2014-1-5 15:32'])
   ....: 

In [15]: pipeline.fit(ct.head(3), y=y)
Out[15]: 
Pipeline(memory=None,
     steps=[('ft', <featuretools.wrappers.sklearn.DFSTransformer object at 0x113436198>), ('et', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False))])

In [16]: pipeline.predict_proba(ct.tail(2))
Out[16]: 
array([[0.62, 0.38],
       [0.  , 1.  ]])

In [17]: pipeline.predict(ct.tail(2))
Out[17]: array([False,  True])

Methods

__init__([entities, relationships, …]) Creates Transformer
fit(cutoff_time_ids[, y]) Wrapper for DFS
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
transform(cutoff_time_ids) Wrapper for calculate_feature_matrix
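
Used outside a Pipeline, the transformer returns the feature matrix directly. A minimal sketch, reusing the es and ct objects from the example above:

# Build the feature matrix without an estimator step.
dfs = ft.wrappers.DFSTransformer(entityset=es,
                                 target_entity="customers",
                                 max_features=3)

fm = dfs.fit_transform(ct)  # pandas DataFrame of generated features
print(fm.shape)             # one row per customer in ct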