featuretools.synthesis.encode_features

featuretools.synthesis.encode_features(feature_matrix, features, top_n=10, include_unknown=True, to_encode=None, inplace=False, verbose=False)

Encode categorical features as one-hot (indicator) columns, one per retained category value.

Parameters:
  • feature_matrix (pd.DataFrame) – Dataframe of features.
  • features (list[PrimitiveBase]) – Feature definitions in feature_matrix.
  • top_n (int) – Number of top values to include for each encoded feature. Defaults to 10.
  • include_unknown (bool) – Add a feature encoding an unknown class. Defaults to True.
  • to_encode (list[str]) – List of feature names to encode. Features not in this list are left unencoded in the output matrix. Defaults to encoding all necessary features.
  • inplace (bool) – Encode feature_matrix in place. Defaults to False.
  • verbose (bool) – Print progress info. Defaults to False.
Returns:

encoded feature_matrix, encoded features

Return type:

(pd.DataFrame, list)

Example

In [1]: f1 = Feature(es["log"]["product_id"])

In [2]: f2 = Feature(es["log"]["purchased"])

In [3]: f3 = Feature(es["log"]["value"])

In [4]: features = [f1, f2, f3]

In [5]: ids = [0, 1, 2, 3, 4, 5]

In [6]: feature_matrix = ft.calculate_feature_matrix(features, es,
   ...:                                              instance_ids=ids)
   ...: 

In [7]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features)
   ...: 

In [8]: f_encoded
Out[8]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [9]: fm_encoded, f_encoded = ft.encode_features(feature_matrix,
   ...:                                            features, top_n=2)
   ...: 

In [10]: f_encoded
Out[10]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id is unknown>,
 <Feature: purchased>,
 <Feature: value>]

In [11]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            include_unknown=False)
   ....: 

In [12]: f_encoded
Out[12]: 
[<Feature: product_id = coke zero>,
 <Feature: product_id = car>,
 <Feature: product_id = toothpaste>,
 <Feature: purchased>,
 <Feature: value>]

In [13]: fm_encoded, f_encoded = ft.encode_features(feature_matrix, features,
   ....:                                            to_encode=['purchased'])
   ....: 

In [14]: f_encoded
Out[14]: [<Feature: product_id>, <Feature: purchased>, <Feature: value>]
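
The effect of top_n and include_unknown on a single column can be approximated in plain pandas. The sketch below is not the featuretools implementation: encode_top_n is a hypothetical helper, the sample data is made up, and it assumes that "top" means most frequent and that missing values fall into the unknown class. It also appends the new columns at the end rather than in the original feature's position.

```python
import pandas as pd


def encode_top_n(df, column, top_n=10, include_unknown=True):
    """Hypothetical helper: one-hot encode the top_n most frequent values of `column`."""
    out = df.drop(columns=[column])
    # value_counts() sorts by frequency (descending) and ignores NaN.
    top_values = df[column].value_counts().head(top_n).index
    for value in top_values:
        out[f"{column} = {value}"] = (df[column] == value).astype(int)
    if include_unknown:
        # Assumption: anything outside the top values, including NaN, is "unknown".
        out[f"{column} is unknown"] = (~df[column].isin(top_values)).astype(int)
    return out


# Made-up data that loosely mirrors the example above.
df = pd.DataFrame({
    "product_id": ["coke zero", "coke zero", "coke zero", "car", "car", "toothpaste"],
    "value": [0.0, 5.0, 10.0, 15.0, 20.0, 0.0],
})

encoded = encode_top_n(df, "product_id", top_n=2)
print(list(encoded.columns))
```

With top_n=2, the least frequent value ("toothpaste") gets no column of its own and is flagged only by the "is unknown" column, which matches the shape of the top_n=2 output shown earlier.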