API Reference

Demo Datasets

load_retail([id, nrows, use_cache]) Returns the retail entityset example.
load_mock_customer([n_customers, …]) Return dataframes of mock customer data
load_flight([entity_id, nrows, force]) Returns the flight dataset.

Deep Feature Synthesis

dfs([entities, relationships, entityset, …]) Calculates a feature matrix and features given a dictionary of entities and a list of relationships.


Timedelta

Timedelta(value[, unit, entity, data, inclusive]) Represents differences in time.

Time utils

make_temporal_cutoffs(instance_ids, cutoffs) Makes a set of equally spaced cutoff times prior to a set of input cutoffs and instance ids.
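
The idea of equally spaced cutoff times can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `equally_spaced_cutoffs` and the `window_size` and `num_windows` parameters are assumptions, as is the choice to make the last generated time equal to the input cutoff.

```python
from datetime import datetime, timedelta


def equally_spaced_cutoffs(instance_ids, cutoffs, window_size, num_windows):
    """Return (instance_id, time) pairs.

    For each instance, emit num_windows times spaced window_size apart,
    ending at that instance's own cutoff (assumed convention).
    """
    rows = []
    for instance_id, cutoff in zip(instance_ids, cutoffs):
        # Walk backwards from the cutoff, then emit in chronological order.
        for k in range(num_windows - 1, -1, -1):
            rows.append((instance_id, cutoff - k * window_size))
    return rows


rows = equally_spaced_cutoffs(
    instance_ids=[1, 2],
    cutoffs=[datetime(2014, 1, 10), datetime(2014, 2, 1)],
    window_size=timedelta(days=1),
    num_windows=3,
)
for instance_id, time in rows:
    print(instance_id, time)
```

Each instance keeps its own cutoff, so the generated times differ per instance even though the spacing is shared.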

Feature Primitives

Primitive Types

Feature(entity, base_features, **kwargs) Alias for IdentityFeature and DirectFeature depending on arguments
TransformPrimitive(*base_features) Feature for an entity that is based on one or more other features in that entity.
AggregationPrimitive(base_features, …[, …]) Feature for a parent entity that summarizes related instances in a child entity.

Primitive Creation Functions

make_agg_primitive(function, input_types, …) Returns a new aggregation primitive class.
make_trans_primitive(function, input_types, …) Returns a new transform primitive class

Aggregation Primitives

Count(id_feature, parent_entity[, count_null]) Counts the number of non-null values.
Mean(base_features, parent_entity[, …]) Computes the average value of a numeric feature.
Sum(base_features, parent_entity[, …]) Sums the elements of a numeric or boolean feature.
Min Finds the minimum non-null value of a numeric feature (alias of min).
Max(base_features, parent_entity[, …]) Finds the maximum non-null value of a numeric feature.
Std(base_features, parent_entity[, …]) Finds the standard deviation of a numeric feature ignoring null values.
Median(base_features, parent_entity[, …]) Finds the median value of any feature with well-ordered values.
Mode(base_features, parent_entity[, …]) Finds the most common element in a categorical feature.
AvgTimeBetween(base_features, parent_entity) Computes the average time between consecutive events.
TimeSinceLast(base_features, parent_entity) Time since last related instance.
NUnique(base_features, parent_entity[, …]) Returns the number of unique values of a categorical feature.
PercentTrue(base_features, parent_entity[, …]) Finds the percent of ‘True’ values in a boolean feature.
All(base_features, parent_entity[, …]) Test if all values are ‘True’.
Any(base_features, parent_entity[, …]) Test if any value is ‘True’.
Last(base_features, parent_entity[, …]) Returns the last value.
Skew(base_features, parent_entity[, …]) Computes the skewness of a data set.
Trend(base_features, parent_entity, **kwargs) Calculates the slope of the linear trend of a variable over time.
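
What several of the aggregation primitives above compute can be sketched in plain Python: child rows are grouped by their parent id and each group is summarized. This is an illustration under assumptions, not the library's code; the `transactions` data and the output keys are hypothetical, and nulls are represented by `None`.

```python
from collections import defaultdict

# Hypothetical child rows: (parent customer_id, amount, product).
transactions = [
    (1, 20.0, "apple"),
    (1, 5.0, "pear"),
    (1, None, "apple"),  # null amount
    (2, 7.5, "pear"),
]

# Group child rows by the parent they belong to.
by_customer = defaultdict(list)
for customer_id, amount, product in transactions:
    by_customer[customer_id].append((amount, product))

summary = {}
for customer_id, rows in by_customer.items():
    amounts = [a for a, _ in rows if a is not None]  # ignore null amounts
    summary[customer_id] = {
        "count": len(rows),                        # number of child rows
        "sum": sum(amounts),                       # total of non-null amounts
        "mean": sum(amounts) / len(amounts) if amounts else None,
        "max": max(amounts) if amounts else None,
        "n_unique": len({p for _, p in rows}),     # distinct products
    }

print(summary[1])
```

Note the difference between the count (rows per parent) and the sum (total of a numeric feature over those rows).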

Transform Primitives

Combine features

PrimitiveBase.AND(other_feature) Logical AND with other_feature
PrimitiveBase.OR(other_feature) Logical OR with other_feature
PrimitiveBase.NOT() Creates inverse of feature

General Transform Primitives

Absolute(*base_features) Absolute value of base feature.
TimeSince alias of time_since

Datetime Transform Primitives

Second(*base_features) Transform a Datetime feature into the second.
Minute(*base_features) Transform a Datetime feature into the minute.
Weekday(*base_features) Transform a Datetime feature into the day of the week.
Weekend(*base_features) Transform a Datetime feature into a boolean indicating whether it falls on a weekend.
Hour(*base_features) Transform a Datetime feature into the hour.
Day(*base_features) Transform a Datetime feature into the day.
Week(*base_features) Transform a Datetime feature into the week.
Month(*base_features) Transform a Datetime feature into the month.
Year(*base_features) Transform a Datetime feature into the year.
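
The kind of value each datetime primitive extracts can be illustrated with the standard library. This is a sketch, not the library's implementation; it assumes the week is the ISO week number, that the weekday is numbered from Monday = 0, and that the weekend means Saturday or Sunday.

```python
from datetime import datetime

ts = datetime(2014, 1, 4, 13, 45, 30)  # a Saturday

parts = {
    "second": ts.second,
    "minute": ts.minute,
    "hour": ts.hour,
    "day": ts.day,
    "week": ts.isocalendar()[1],   # ISO week number (assumption)
    "month": ts.month,
    "year": ts.year,
    "weekday": ts.weekday(),       # Monday = 0 ... Sunday = 6 (assumption)
    "weekend": ts.weekday() > 4,   # Saturday or Sunday
}
print(parts)
```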

Cumulative Transform Primitives

CumCount(base_feature, group_feature[, …]) Calculates the number of previous values of an instance for each value in a time-dependent entity.
CumSum(base_feature, group_feature[, …]) Calculates the sum of previous values of an instance for each value in a time-dependent entity.
CumMean(base_feature, group_feature[, …]) Calculates the mean of previous values of an instance for each value in a time-dependent entity.
CumMax(base_feature, group_feature[, …]) Calculates the max of previous values of an instance for each value in a time-dependent entity.
CumMin(base_feature, group_feature[, …]) Calculates the min of previous values of an instance for each value in a time-dependent entity.
Diff(base_feature, group_feature) Compute the difference between the value of a base feature and the previous value.
TimeSincePrevious(time_index, group_feature) Compute the time since the previous instance.
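
A cumulative primitive and Diff can be sketched as a single pass over time-ordered rows, keeping state per group. This is an illustration only: the helper `cum_sum_and_diff` is hypothetical, the rows are assumed to be sorted by time already, and the running sum is assumed to include the current row.

```python
def cum_sum_and_diff(rows):
    """rows: list of (group, value) pairs, already sorted by time.

    Returns one (cumulative_sum, diff) pair per row, where both are
    computed within the row's group. diff is None for the first row
    of a group because there is no previous value.
    """
    running = {}   # group -> running total so far
    previous = {}  # group -> last value seen
    out = []
    for group, value in rows:
        running[group] = running.get(group, 0) + value
        diff = value - previous[group] if group in previous else None
        previous[group] = value
        out.append((running[group], diff))
    return out


rows = [("a", 1), ("b", 10), ("a", 3), ("a", 6), ("b", 15)]
result = cum_sum_and_diff(rows)
print(result)
```

Replacing the running total with a running count, mean, max, or min gives the other cumulative variants.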

Text Transform Primitives

NumCharacters(*base_features) Returns the number of characters in a given string.
NumWords(*base_features) Returns the number of words in a given string by counting the spaces.
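
Counting words by counting spaces can be sketched as follows. The function names are hypothetical and this is not the library's code; it only shows the technique the description names and its main caveat.

```python
def num_characters(text):
    """Number of characters in the string."""
    return len(text)


def num_words(text):
    """Approximate word count: number of spaces plus one."""
    return text.count(" ") + 1


print(num_characters("deep feature synthesis"))
print(num_words("deep feature synthesis"))
print(num_words("a  b"))  # two consecutive spaces inflate the count
```

Because only spaces are counted, repeated spaces or leading and trailing spaces are each treated as a word boundary.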

Location Transform Primitives

Latitude(*base_features) Returns the first value of the tuple base feature.
Longitude(*base_features) Returns the second value of the tuple base feature.
Haversine(*base_features) Calculate the approximate haversine distance in miles between two LatLong variable types.
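
The haversine distance between two (latitude, longitude) pairs can be sketched with the standard formula. This is an illustration, not the library's implementation; the Earth radius constant of 3958.8 miles is an assumption, and the library may use a different value.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumed value


def haversine_miles(latlong1, latlong2):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = (math.radians(v) for v in latlong1)
    lat2, lon2 = (math.radians(v) for v in latlong2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


boston = (42.3601, -71.0589)
new_york = (40.7128, -74.0060)
print(round(haversine_miles(boston, new_york), 1))
```

The result is approximate because the formula treats the Earth as a perfect sphere.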

Feature methods

PrimitiveBase.head([n, cutoff_time]) See values for feature
PrimitiveBase.rename(name) Rename Feature, returns copy
PrimitiveBase.get_depth([stop_at]) Returns depth of feature

Feature calculation

calculate_feature_matrix(features[, …]) Calculates a matrix for a given set of instance ids and calculation times.

Feature encoding

encode_features(feature_matrix, features[, …]) Encode categorical features

Saving and Loading Features

save_features(features, filepath) Saves the features list to a specified filepath.
load_features(filepath, entityset) Loads the features from a filepath.

EntitySet, Entity, Relationship, Variable Types


EntitySet(id[, entities, relationships, verbose]) Stores all actual data for an entityset
Entity(id, df, entityset[, variable_types, …]) Stores all actual data for an entity
Relationship(parent_variable, child_variable) Class to represent a relationship between entities

EntitySet attributes

EntitySet.entity_names Return list of each entity’s id

EntitySet load and prepare data

EntitySet.entity_from_dataframe(entity_id, …) Load the data for a specified entity from a Pandas DataFrame.
EntitySet.add_relationship(relationship) Add a new relationship between entities in the entityset
EntitySet.normalize_entity(base_entity_id, …) Utility to normalize an entity_store
EntitySet.combine_variables(entity_id, …) Combines two variables into a new variable, new_id
EntitySet.add_interesting_values([…]) Find interesting values for categorical variables, to be used to generate “where” clauses

EntitySet serialization


EntitySet query methods

EntitySet.__getitem__(entity_id) Get entity instance from entityset
EntitySet.find_backward_path(…) Find a backward path between a start and goal entity
EntitySet.find_forward_path(start_entity_id, …) Find a forward path between a start and goal entity
EntitySet.get_forward_entities(entity_id[, deep]) Get entities that are in a forward relationship with entity
EntitySet.get_backward_entities(entity_id[, …]) Get entities that are in a backward relationship with entity

Entity attributes

Entity.name Returns name of entity.

Entity methods

Entity.head([n, cutoff_time]) See first n instances in entity
Entity.show_instance(instance_ids) See row corresponding to instance id
Entity.is_child_of(entity_id) Returns True if self is a child of entity_id
Entity.is_parent_of(entity_id) Returns True if self is a parent of entity_id
Entity.convert_variable_type(variable_id, …) Convert variable in dataframe to different type
Entity.has_time_index() Returns True if there is a time_index, otherwise False
Entity.add_interesting_values([max_values, …]) Find interesting values for categorical variables, to be used to generate “where” clauses

Relationship attributes

Relationship.parent_variable Instance of variable in parent entity
Relationship.child_variable Instance of variable in child entity
Relationship.parent_entity Parent entity object
Relationship.child_entity Child entity object

Variable types

Index(id, entity[, name]) Represents variables that uniquely identify an instance of an entity
Id(id, entity[, name]) Represents variables that identify another entity
TimeIndex(id, entity[, name]) Represents time index of entity
DatetimeTimeIndex(id, entity[, format, name]) Represents time index of entity that is a datetime
NumericTimeIndex(id, entity[, name]) Represents time index of entity that is numeric
Datetime(id, entity[, format, name]) Represents variables that are points in time
Numeric(id, entity[, name]) Represents variables that contain numeric values
Categorical(id, entity[, name]) Represents variables that can take unordered discrete values
Ordinal(id, entity[, name]) Represents variables that take on an ordered discrete value
Boolean(id, entity[, name]) Represents variables that take on one of two values
Text(id, entity[, name]) Represents variables that are arbitrary strings
LatLong(id, entity[, name]) Represents an ordered pair (Latitude, Longitude)

Feature Selection

remove_low_information_features(feature_matrix) Select features that have at least 2 unique values and that are not all null
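
The selection rule can be sketched on a plain dictionary of columns. This is an illustration, not the library's implementation: the helper `keep_informative_columns` is hypothetical, nulls are represented by `None`, and counting a null as one of the distinct values is an assumption the description does not settle.

```python
def keep_informative_columns(columns):
    """columns: dict mapping column name -> list of values (None = null).

    Keep a column only if it has at least two distinct values
    (a null counts as a distinct value here, by assumption) and
    it is not entirely null.
    """
    kept = {}
    for name, values in columns.items():
        has_two_values = len(set(values)) >= 2
        not_all_null = any(v is not None for v in values)
        if has_two_values and not_all_null:
            kept[name] = values
    return kept


columns = {
    "constant": [1, 1, 1],
    "all_null": [None, None, None],
    "varied": [1, 2, 2],
    "with_null": [1, None, 1],
}
kept = keep_informative_columns(columns)
print(list(kept))
```

Constant and all-null columns carry no information for a model, which is why they are the ones dropped.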