API Reference

Demo Datasets

load_retail([id, nrows]) Returns the retail entityset example
load_mock_customer([n_customers, …]) Return dataframes of mock customer data
load_flight([entity_id, nrows, force]) Returns the flight dataset.

Deep Feature Synthesis

dfs([entities, relationships, entityset, …]) Calculates a feature matrix and features given a dictionary of entities and a list of relationships.

Timedelta

Timedelta(value[, unit, entity, data, inclusive]) Represents differences in time.

Feature Primitives

Primitive Types

Feature(entity, base_features, **kwargs) Alias for IdentityFeature and DirectFeature depending on arguments
TransformPrimitive(*base_features) Feature for an entity that is based on one or more other features
AggregationPrimitive(base_features, …[, …]) Feature for a parent entity that summarizes related instances in a child entity

Primitive Creation Functions

make_agg_primitive(function, input_types, …) Returns a new aggregation primitive class
make_trans_primitive(function, input_types, …) Returns a new transform primitive class

Aggregation Primitives

Count(id_feature, parent_entity[, count_null]) Counts the number of non-null values
Mean(base_features, parent_entity[, …]) Computes the average value of a numeric feature
Sum(base_features, parent_entity[, …]) Sums the elements of a numeric or boolean feature
Min alias of min
Max(base_features, parent_entity[, …]) Finds the maximum non-null value of a numeric feature
Std(base_features, parent_entity[, …]) Finds the standard deviation of a numeric feature ignoring null values.
Median(base_features, parent_entity[, …]) Finds the median value of any feature with well-ordered values
Mode(base_features, parent_entity[, …]) Finds the most common element in a categorical feature
AvgTimeBetween(base_features, parent_entity) Computes the average time between consecutive events using the time index of the entity.
TimeSinceLast(base_features, parent_entity) Time since last related instance
NUnique(base_features, parent_entity[, …]) Returns the number of unique values of a categorical variable
PercentTrue(base_features, parent_entity[, …]) Finds the percent of ‘True’ values in a boolean feature
All(base_features, parent_entity[, …]) Test if all values are ‘True’
Any(base_features, parent_entity[, …]) Test if any value is ‘True’
Last(base_features, parent_entity[, …]) Returns the last value
Skew(base_features, parent_entity[, …]) Computes the skewness of a data set.
Trend(base_features, parent_entity, **kwargs) Calculates the slope of the linear trend of a variable over time
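
The aggregation primitives listed above all follow the same pattern: values from a child entity are grouped by the parent instance they belong to, and each group is reduced to a single value. The following is a minimal plain-Python sketch of that idea, not the library's implementation. The helper `aggregate_by_parent` and the `transactions` data are hypothetical names introduced only for illustration, and skipping null values before summarizing is an assumption based on the descriptions of Count, Max, and Std.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_parent(child_rows, parent_key, value_key, summarize):
    """Group child values by parent id, then reduce each group to one value."""
    groups = defaultdict(list)
    for row in child_rows:
        value = row[value_key]
        if value is not None:  # assumption: nulls are ignored
            groups[row[parent_key]].append(value)
    return {parent: summarize(values) for parent, values in groups.items()}


# Hypothetical child entity: one row per transaction.
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 2, "amount": None},
]

# Analogues of Mean, Count, and Max for the parent entity "customers".
mean_amount = aggregate_by_parent(transactions, "customer_id", "amount", mean)
count_amount = aggregate_by_parent(transactions, "customer_id", "amount", len)
max_amount = aggregate_by_parent(transactions, "customer_id", "amount", max)
```

Passing the reducer as a function mirrors how the listing treats each primitive as a different summary applied to the same parent/child grouping.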

Transform Primitives

Combine features

PrimitiveBase.isin(list_of_output)
PrimitiveBase.AND(other_feature) Logical AND with other_feature
PrimitiveBase.OR(other_feature) Logical OR with other_feature
PrimitiveBase.NOT() Creates inverse of feature

General Transform Primitives

Absolute(*base_features) Absolute value of base feature
TimeSince alias of time_since

Datetime Transform Primitives

Second(*base_features) Transform a Datetime feature into the second
Minute(*base_features) Transform a Datetime feature into the minute
Weekday(*base_features) Transform a Datetime feature into the boolean of Weekday
Weekend(*base_features) Transform a Datetime feature into the boolean of Weekend
Hour(*base_features) Transform a Datetime feature into the hour
Day(*base_features) Transform a Datetime feature into the day
Week(*base_features) Transform a Datetime feature into the week
Month(*base_features) Transform a Datetime feature into the month
Year(*base_features) Transform a Datetime feature into the year
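
As a rough illustration of what the datetime transform primitives above extract, the sketch below pulls the same components out of a single timestamp using only the standard library. It is not the library's code: `datetime_parts` is a hypothetical helper, the week is assumed to be the ISO week number, and the weekend flag is assumed to mean Saturday or Sunday.

```python
from datetime import datetime


def datetime_parts(ts):
    """Return the components that the datetime primitives describe."""
    return {
        "second": ts.second,
        "minute": ts.minute,
        "hour": ts.hour,
        "day": ts.day,
        "week": ts.isocalendar()[1],      # assumption: ISO week number
        "month": ts.month,
        "year": ts.year,
        "is_weekend": ts.weekday() >= 5,  # assumption: Saturday or Sunday
    }


# 2014-01-04 was a Saturday.
parts = datetime_parts(datetime(2014, 1, 4, 13, 45, 30))
```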

Cumulative Transform Primitives

CumCount(base_feature, group_feature[, …]) Calculates the number of previous values of an instance for each value in a time-dependent entity.
CumSum(base_feature, group_feature[, …]) Calculates the sum of previous values of an instance for each value in a time-dependent entity.
CumMean(base_feature, group_feature[, …]) Calculates the mean of previous values of an instance for each value in a time-dependent entity.
CumMax(base_feature, group_feature[, …]) Calculates the max of previous values of an instance for each value in a time-dependent entity.
CumMin(base_feature, group_feature[, …]) Calculates the min of previous values of an instance for each value in a time-dependent entity.
Diff(base_feature, group_feature) For each value of the base feature, compute the difference between it and the previous value.
TimeSincePrevious(time_index, group_feature) Compute the time since the previous instance for each instance
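
The cumulative primitives above take a base feature and a group feature, so each value is computed from earlier rows in the same group. A minimal plain-Python sketch of that behaviour follows. It assumes the rows are already sorted by time, that the running sum includes the current row, and that the first row of each group has no previous value; `grouped_cumsum` and `grouped_diff` are hypothetical helpers, not library functions.

```python
def grouped_cumsum(values, groups):
    """Running sum of values, tracked separately for each group."""
    totals = {}
    out = []
    for value, group in zip(values, groups):
        totals[group] = totals.get(group, 0) + value
        out.append(totals[group])
    return out


def grouped_diff(values, groups):
    """Difference between each value and the previous value in its group."""
    last_seen = {}
    out = []
    for value, group in zip(values, groups):
        # The first row of a group has nothing to compare against.
        out.append(value - last_seen[group] if group in last_seen else None)
        last_seen[group] = value
    return out


# Hypothetical time-ordered rows: an amount and the customer it belongs to.
amounts = [10, 5, 7, 20, 1]
customers = ["a", "b", "a", "a", "b"]

running_totals = grouped_cumsum(amounts, customers)
differences = grouped_diff(amounts, customers)
```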

Feature methods

PrimitiveBase.head([n, cutoff_time]) See values for feature
PrimitiveBase.rename(name) Rename Feature, returns copy
PrimitiveBase.get_depth([stop_at]) Returns depth of feature

Feature calculation

calculate_feature_matrix(features[, …]) Calculates a matrix for a given set of instance ids and calculation times.

Feature encoding

encode_features(feature_matrix, features[, …]) Encode categorical features

Saving and Loading Features

save_features(features, filepath) Saves the features list to a specified filepath.
load_features(filepath, entityset) Loads the features from a filepath.

EntitySet, Entity, Relationship, Variable Types

Constructors

EntitySet(id[, entities, relationships, verbose]) Stores all actual data for an entityset
Entity(id, df, entityset[, variable_types, …]) Stores all actual data for an entity
Relationship(parent_variable, child_variable) Class to represent a relationship between entities

EntitySet attributes

EntitySet.id
EntitySet.name
EntitySet.entity_names Return list of each entity’s id
EntitySet.entities
EntitySet.relationships

EntitySet load and prepare data

EntitySet.entity_from_csv(entity_id, csv_path) Load the data for a specified entity from a CSV file.
EntitySet.entity_from_dataframe(entity_id, …) Load the data for a specified entity from a Pandas DataFrame.
EntitySet.add_relationship(relationship) Add a new relationship between entities in the entityset
EntitySet.normalize_entity(base_entity_id, …) Utility to normalize an entity_store
EntitySet.combine_variables(entity_id, …) Combines two variables into a new variable, new_id
EntitySet.add_interesting_values([…]) Find interesting values for categorical variables, to be used to generate “where” clauses

EntitySet serialization

EntitySet.to_pickle(path)
EntitySet.read_pickle(path)

EntitySet query methods

EntitySet.__getitem__(entity_id) Get entity instance from entityset
EntitySet.find_backward_path(…) Find a backward path between a start and goal entity
EntitySet.find_forward_path(start_entity_id, …) Find a forward path between a start and goal entity
EntitySet.get_forward_entities(entity_id[, deep]) Get entities that are in a forward relationship with entity
EntitySet.get_backward_entities(entity_id[, …]) Get entities that are in a backward relationship with entity

Entity attributes

Entity.name Returns name of entity.
Entity.variables
Entity.index
Entity.time_index

Entity methods

Entity.head([n, cutoff_time]) See the first n instances in the entity
Entity.show_instance(instance_ids) See row corresponding to instance id
Entity.is_child_of(entity_id) Returns True if self is a child of entity_id
Entity.is_parent_of(entity_id) Returns True if self is a parent of entity_id
Entity.convert_variable_type(variable_id, …) Convert variable in dataframe to different type
Entity.has_time_index() Returns True if there is a time_index, otherwise False
Entity.add_interesting_values([max_values, …]) Find interesting values for categorical variables, to be used to generate “where” clauses

Relationship attributes

Relationship.parent_variable Instance of variable in parent entity
Relationship.child_variable Instance of variable in child entity
Relationship.parent_entity Parent entity object
Relationship.child_entity Child entity object

Variable types

Index(id, entity[, name]) Represents variables that uniquely identify an instance of an entity
Id(id, entity[, name]) Represents variables that identify another entity
TimeIndex(id, entity[, name]) Represents time index of entity
DatetimeTimeIndex(id, entity[, format, name]) Represents time index of entity that is a datetime
Datetime(id, entity[, format, name]) Represents variables that are points in time
Numeric(id, entity[, name]) Represents variables that contain numeric values
Categorical(id, entity[, name]) Represents variables that can take unordered discrete values
Ordinal(id, entity[, name]) Represents variables that take on an ordered discrete value
Boolean(id, entity[, name]) Represents variables that take on one of two values
Text(id, entity[, name]) Represents variables that are arbitrary strings

Feature Selection

remove_low_information_features(feature_matrix) Select features that have at least 2 unique values and that are not all null
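
The selection rule described above can be sketched in plain Python as follows. This is an illustration of the stated criterion, not the library's implementation: the feature matrix is modelled as a dict of column name to list of values, `keep_informative_columns` is a hypothetical helper, and counting a null as one of the distinct values is an assumption.

```python
def keep_informative_columns(columns):
    """Keep columns with at least 2 distinct values that are not all null."""
    kept = {}
    for name, values in columns.items():
        has_two_distinct = len(set(values)) >= 2  # assumption: None counts
        not_all_null = any(v is not None for v in values)
        if has_two_distinct and not_all_null:
            kept[name] = values
    return kept


# Hypothetical feature matrix with one column per feature.
feature_columns = {
    "constant": [1, 1, 1],
    "all_null": [None, None, None],
    "varied": [1, 2, 2],
    "with_null": [1, None, 1],
}

selected = keep_informative_columns(feature_columns)
```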