API Reference

Demo Datasets

load_retail([id, nrows, return_single_table]) Returns the retail entityset example.
load_mock_customer([n_customers, …]) Returns dataframes of mock customer data.
load_flight([month_filter, …]) Download, clean, and filter flight data from 2017.

Deep Feature Synthesis

dfs([entities, relationships, entityset, …]) Calculates a feature matrix and features given a dictionary of entities and a list of relationships.
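Conceptually, dfs walks the relationships and stacks primitives to build features such as SUM(transactions.amount) for each customer. A minimal pandas sketch of one depth-1 feature it would generate (the customers/transactions tables here are hypothetical and hand-built, not produced by dfs itself):

```python
import pandas as pd

# Hypothetical parent/child tables standing in for two related entities.
customers = pd.DataFrame({"customer_id": [1, 2]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 20.0, 5.0],
})

# One feature dfs would generate automatically: SUM(transactions.amount),
# an aggregation from the child entity rolled up to the parent.
agg = (transactions.groupby("customer_id")["amount"]
       .sum()
       .rename("SUM(transactions.amount)"))
feature_matrix = customers.set_index("customer_id").join(agg)
```

dfs automates exactly this kind of groupby/join for every applicable primitive, at every depth, across all declared relationships.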


Scikit-learn (BETA)

DFSTransformer([entities, relationships, …]) Transformer using Scikit-Learn interface for Pipeline uses.


Timedelta

Timedelta(value[, unit, entity, data, inclusive]) Represents differences in time.

Time utils

make_temporal_cutoffs(instance_ids, cutoffs) Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
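The idea behind equally spaced cutoffs can be sketched with plain pandas (a hypothetical helper, not the featuretools implementation; the window_size and num_windows names are illustrative):

```python
import pandas as pd

def temporal_cutoffs(instance_ids, cutoffs, window_size, num_windows):
    """For each instance, build num_windows cutoff times spaced
    window_size apart, ending at that instance's final cutoff."""
    rows = []
    for iid, cutoff in zip(instance_ids, cutoffs):
        times = pd.date_range(end=cutoff, periods=num_windows, freq=window_size)
        rows.extend({"instance_id": iid, "time": t} for t in times)
    return pd.DataFrame(rows)

df = temporal_cutoffs([1], [pd.Timestamp("2017-01-04")], "1D", 3)
# Produces cutoffs at 2017-01-02, 2017-01-03, and 2017-01-04 for instance 1.
```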

Feature Primitives

Primitive Types

TransformPrimitive Feature for an entity that is based off of one or more other features in that entity.

Primitive Creation Functions

make_agg_primitive(function, input_types, …) Returns a new aggregation primitive class.
make_trans_primitive(function, input_types, …) Returns a new transform primitive class.
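The function handed to these creation helpers is an ordinary vectorized callable: array-like in, array-like out. A sketch of one that could back a hypothetical StringLength transform primitive (the wrapping call is shown commented, since the exact input/return type objects depend on the installed version):

```python
import pandas as pd

def string_length(column):
    # The callable given to make_trans_primitive: takes the values of a
    # column and returns one transformed value per input value.
    return pd.Series(column).str.len()

# Hypothetical wrapping (type arguments are illustrative, not verified):
# StringLength = make_trans_primitive(
#     function=string_length, input_types=[Text], return_type=Numeric)

lengths = string_length(["ab", "", "xyz"])
```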

Aggregation Primitives

Count Counts the number of non-null values.
Mean([skipna]) Computes the average value of a numeric feature.
Sum Sums elements of a numeric or boolean feature.
Min(**kwargs) Finds the minimum non-null value of a numeric feature.
Max Finds the maximum non-null value of a numeric feature.
Std Finds the standard deviation of a numeric feature ignoring null values.
Median Finds the median value of any feature with well-ordered values.
Mode Finds the most common element in a categorical feature.
AvgTimeBetween Computes the average time between consecutive events.
TimeSinceLast Time since last related instance.
TimeSinceFirst Time since first related instance.
NUnique Returns the number of unique categorical variables.
PercentTrue Finds the percent of 'True' values in a boolean feature.
All Test if all values are 'True'.
Any Test if any value is 'True'.
Last Returns the last value.
Skew Computes the skewness of a data set.
Trend Calculates the slope of the linear trend of a variable over time.
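Several of these aggregations map directly onto pandas groupby operations. A minimal sketch of the semantics of Count, Mean, NUnique, and PercentTrue (plain pandas, not the featuretools primitives themselves; the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "amount": [10.0, 20.0, 30.0, 5.0],
    "category": ["a", "b", "a", "a"],
    "is_fraud": [False, True, False, False],
})

g = df.groupby("customer_id")
summary = pd.DataFrame({
    "COUNT": g["amount"].count(),           # Count: non-null values
    "MEAN": g["amount"].mean(),             # Mean: average value
    "NUM_UNIQUE": g["category"].nunique(),  # NUnique: distinct values
    "PERCENT_TRUE": g["is_fraud"].mean(),   # PercentTrue: mean of booleans
})
```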

Transform Primitives

Combine features

IsIn([list_of_outputs]) For each value of the base feature, checks whether it is in a provided list.
Not For each value of the base feature, negates the boolean value.

General Transform Primitives

Absolute Absolute value of base feature.
TimeSince alias of featuretools.primitives.base.transform_primitive_base.time_since

Datetime Transform Primitives

Second Transform a Datetime feature into the second.
Minute Transform a Datetime feature into the minute.
Weekday Transform a Datetime feature into the weekday.
IsWeekend Transform a Datetime feature into a boolean indicating whether it falls on a weekend.
Hour Transform a Datetime feature into the hour.
Day Transform a Datetime feature into the day.
Week Transform a Datetime feature into the week.
Month Transform a Datetime feature into the month.
Year Transform a Datetime feature into the year.
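These datetime transforms correspond to pandas .dt accessors; a quick sketch of a few of them (plain pandas, not the primitives themselves):

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2017-03-11 14:30:15",   # a Saturday
                               "2017-03-12 08:05:00"])) # a Sunday

hours = ts.dt.hour                # Hour
weekdays = ts.dt.weekday          # Weekday (Monday=0 in pandas)
is_weekend = ts.dt.weekday >= 5   # IsWeekend
```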

Cumulative Transform Primitives

Diff Compute the difference between the value of a base feature and the previous value.
TimeSincePrevious Compute the time since the previous instance.
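Both cumulative transforms boil down to row-over-row differences; a sketch of the underlying semantics with pandas diff() (illustrative data, not the primitives themselves):

```python
import pandas as pd

values = pd.Series([3.0, 7.0, 12.0])
times = pd.Series(pd.to_datetime(["2017-01-01 00:00:00",
                                  "2017-01-01 00:01:00",
                                  "2017-01-01 00:03:30"]))

diffs = values.diff()                              # Diff: first entry is NaN
secs_since_prev = times.diff().dt.total_seconds()  # TimeSincePrevious, seconds
```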

Text Transform Primitives

NumCharacters Return the number of characters in a given string.
NumWords Returns the number of words in a given string by counting the spaces.
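The space-counting definition of NumWords means a word count of spaces + 1 for a non-empty string; a plain-Python sketch of both text primitives (hypothetical helpers, not the featuretools classes):

```python
def num_characters(s):
    # NumCharacters: length of the string.
    return len(s)

def num_words(s):
    # NumWords "by counting the spaces": spaces + 1.
    return s.count(" ") + 1

n = num_words("feature tools rocks")
```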

Location Transform Primitives

Latitude Returns the first value of the tuple base feature.
Longitude Returns the second value of the tuple base feature.
Haversine Calculate the approximate haversine distance in miles between two LatLong variable types.
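The haversine formula itself is standard; a self-contained sketch in miles (a hypothetical helper mirroring the primitive's (latitude, longitude) tuple inputs, not the featuretools implementation):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.76  # mean Earth radius, in miles

def haversine_miles(latlong1, latlong2):
    """Approximate great-circle distance between two (lat, lon) tuples."""
    lat1, lon1 = map(radians, latlong1)
    lat2, lon2 = map(radians, latlong2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# New York City to Los Angeles is roughly 2,450 miles great-circle.
d = haversine_miles((40.7128, -74.0060), (34.0522, -118.2437))
```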

Feature methods

FeatureBase.rename(name) Renames the feature; returns a copy.
FeatureBase.get_depth([stop_at]) Returns the depth of the feature.

Feature calculation

calculate_feature_matrix(features[, …]) Calculates a matrix for a given set of instance ids and calculation times.

Feature encoding

encode_features(feature_matrix, features[, …]) Encode categorical features

Saving and Loading Features

save_features(features, filepath) Saves the features list to a specified filepath.
load_features(filepath) Loads the features from a filepath.

EntitySet, Entity, Relationship, Variable Types


EntitySet([id, entities, relationships]) Stores all actual data for an entityset.
Entity(id, df, entityset[, variable_types, …]) Represents an entity in an EntitySet, and stores relevant metadata and data.
Relationship(parent_variable, child_variable) Class to represent a relationship between entities.

EntitySet load and prepare data

EntitySet.entity_from_dataframe(entity_id, …) Load the data for a specified entity from a Pandas DataFrame.
EntitySet.add_relationship(relationship) Add a new relationship between entities in the entityset.
EntitySet.normalize_entity(base_entity_id, …) Create a new entity and relationship from unique values of an existing variable.
EntitySet.add_interesting_values([…]) Find interesting values for categorical variables, to be used to generate "where" clauses.

EntitySet serialization

read_pickle(path[, load_data]) Load an EntitySet from a path on disk, assuming the EntitySet was saved in the pickle format.
read_parquet(path[, load_data]) Load an EntitySet from a path on disk, assuming the EntitySet was saved in the parquet format.
EntitySet.to_pickle(path) Write entityset to disk in the pickle format, location specified by path.
EntitySet.to_parquet(path) Write entityset to disk in the parquet format, location specified by path.

EntitySet query methods

EntitySet.__getitem__(entity_id) Get entity instance from entityset.
EntitySet.find_backward_path(…) Find a backward path between a start and goal entity.
EntitySet.find_forward_path(start_entity_id, …) Find a forward path between a start and goal entity.
EntitySet.get_forward_entities(entity_id[, deep]) Get entities that are in a forward relationship with entity.
EntitySet.get_backward_entities(entity_id[, …]) Get entities that are in a backward relationship with entity.

EntitySet visualization

EntitySet.plot([to_file]) Create a UML diagram-ish graph of the EntitySet.

Entity methods

Entity.convert_variable_type(variable_id, …) Convert variable in dataframe to different type.
Entity.add_interesting_values([max_values, …]) Find interesting values for categorical variables, to be used to generate "where" clauses.

Relationship attributes

Relationship.parent_variable Instance of variable in parent entity.
Relationship.child_variable Instance of variable in child entity.
Relationship.parent_entity Parent entity object.
Relationship.child_entity Child entity object.

Variable types

Index(id, entity[, name]) Represents variables that uniquely identify an instance of an entity.
Id(id, entity[, name]) Represents variables that identify another entity.
TimeIndex(id, entity[, name]) Represents the time index of an entity.
DatetimeTimeIndex(id, entity[, format, name]) Represents a time index of an entity that is a datetime.
NumericTimeIndex(id, entity[, name]) Represents a time index of an entity that is numeric.
Datetime(id, entity[, format, name]) Represents variables that are points in time.
Numeric(id, entity[, name]) Represents variables that contain numeric values.
Categorical(id, entity[, name]) Represents variables that can take unordered discrete values.
Ordinal(id, entity[, name]) Represents variables that take on ordered discrete values.
Boolean(id, entity[, name]) Represents variables that take on one of two values.
Text(id, entity[, name]) Represents variables that are arbitrary strings.
LatLong(id, entity[, name]) Represents an ordered pair (Latitude, Longitude). To make a latlong in a dataframe, do data['latlong'] = data[['latitude', 'longitude']].apply(tuple, axis=1).
ZIPCode(id, entity[, name]) Represents a postal address in the United States.
IPAddress(id, entity[, name]) Represents a computer network address.
EmailAddress(id, entity[, name]) Represents an email box to which email messages are sent.
CountryCode(id, entity[, name]) Represents an ISO-3166 standard country code.
SubRegionCode(id, entity[, name]) Represents an ISO-3166 standard sub-region code.

Feature Selection

remove_low_information_features(feature_matrix) Select features that have at least 2 unique values and that are not all null.