featuretools.EntitySet

class featuretools.EntitySet(id, entities=None, relationships=None, verbose=False)

Stores all actual data for an entityset

entity_stores
__init__(id, entities=None, relationships=None, verbose=False)

Creates EntitySet

Parameters:
  • id (str) – Unique identifier to associate with this instance
  • verbose (bool) – Show additional information.
  • entities (dict[str -> tuple(pd.DataFrame, str, str)]) – Dictionary of entities. Entries take the format {entity id -> (dataframe, id column, (time_column), (variable_types))}. Note that time_column and variable_types are optional.
  • relationships (list[(str, str, str, str)]) – List of relationships between entities. List items are a tuple with the format (parent entity id, parent variable, child entity id, child variable).

Example

import featuretools as ft

# card_df and transactions_df are pandas DataFrames defined elsewhere
entities = {
    "cards" : (card_df, "id"),
    "transactions" : (transactions_df, "id", "transaction_time")
}

relationships = [("cards", "id", "transactions", "card_id")]

ft.EntitySet("my-entity-set", entities, relationships)
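Before passing these structures to the constructor, it can help to confirm that every relationship tuple names an entity id and a column that actually exist. The following is a minimal sketch in plain Python, not part of featuretools: check_relationships is a hypothetical helper, and column names are supplied as plain lists (with pandas, list(df.columns) would provide them).

```python
def check_relationships(columns_by_entity, relationships):
    """Return a list of human-readable problems; empty if none are found.

    columns_by_entity: dict mapping entity id -> list of column names
    relationships: list of (parent entity id, parent variable,
                            child entity id, child variable) tuples
    """
    problems = []
    for rel in relationships:
        if len(rel) != 4:
            problems.append("expected 4 items, got %d: %r" % (len(rel), rel))
            continue
        parent_id, parent_var, child_id, child_var = rel
        # Check the parent side first, then the child side.
        for entity_id, variable in ((parent_id, parent_var), (child_id, child_var)):
            if entity_id not in columns_by_entity:
                problems.append("unknown entity %r" % (entity_id,))
            elif variable not in columns_by_entity[entity_id]:
                problems.append("entity %r has no column %r" % (entity_id, variable))
    return problems


# Hypothetical column lists matching the example above.
columns_by_entity = {
    "cards": ["id"],
    "transactions": ["id", "card_id", "transaction_time"],
}
relationships = [("cards", "id", "transactions", "card_id")]
problems = check_relationships(columns_by_entity, relationships)
```

An empty result means each tuple follows the (parent entity id, parent variable, child entity id, child variable) format and refers to known columns. It says nothing about whether the values in those columns actually match.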

Methods

__init__(id[, entities, relationships, verbose]) Creates EntitySet
add_column(entity_id, column_id, column_data) Add variable to entity’s dataframe
add_entity(entity_id, df, **kwargs)
add_interesting_values([max_values, verbose]) Find interesting values for categorical variables, to be used to generate “where” clauses
add_last_time_indexes() Calculates the last time index values for each entity (the last time an instance or children of that instance were observed).
add_parent_time_index(entity_id, …[, …])
add_relationship(relationship) Add a new relationship between entities in the entityset
add_relationships(relationships) Add multiple new relationships to an entityset
combine_variables(entity_id, new_id, to_combine) Combines two variables into variable new_id
concat(other[, inplace]) Combine entityset with another to create a new entityset with the combined data of both entitysets.
delete_column(entity_id, column_id) Remove variable from entity’s dataframe
delete_entity_variables(entity_id, …)
denormalize(d[, denormalizer, entityset])
entity_from_dataframe(entity_id, dataframe) Load the data for a specified entity from a Pandas DataFrame.
find_backward_path(start_entity_id, …) Find a backward path between a start and goal entity
find_forward_path(start_entity_id, …) Find a forward path between a start and goal entity
find_path(start_entity_id, goal_entity_id[, …]) Find a path in the entityset represented as a DAG
gen_relationship_var(child_eid, parent_eid)
get_all_instances(entity_id)
get_backward_entities(entity_id[, deep]) Get entities that are in a backward relationship with entity
get_backward_relationships(entity_id) Get relationships where entity “entity_id” is the parent
get_column_count(eid, column_id)
get_column_data(entity_id, column_id) Get data from column in specified form
get_column_max(eid, column_id)
get_column_mean(eid, column_id)
get_column_min(eid, column_id)
get_column_names(entity_id) Return a list of the columns on the underlying data store
get_column_nunique(eid, column_id)
get_column_stat(eid, column_id, stat)
get_column_std(eid, column_id)
get_column_type(entity_id, column_id) Get type of column in underlying data structure
get_dataframe(entity_id) Get the data for a specified entity as a pandas dataframe.
get_forward_entities(entity_id[, deep]) Get entities that are in a forward relationship with entity
get_forward_relationships(entity_id) Get relationships where entity “entity_id” is the child
get_index(entity_id) Get name of the primary key ID column for this entity
get_instance_data(entity_id, instance_ids)
get_name() Returns name of entityset
get_pandas_data_slice(filter_entity_ids, …) Get the slice of data related to the supplied instances of the index entity.
get_relationship(eid_1, eid_2) Get relationship, if any, between eid_1 and eid_2
get_sample(n)
get_secondary_time_index(entity_id) Get names and associated variables of the secondary time index columns for this entity
get_sliced_instance_ids(entity_id, start, end)
get_time_index(entity_id) Get name of the time index column for this entity
get_top_n_instances(entity_id[, top_n])
get_variable_types(entity_id)
head(entity_id[, n, variable_id, cutoff_time])
index_data(r) If necessary, generate an index on the data which links instances of parent entities to collections of child instances which link to them.
make_index_variable_name(entity_id)
next()
normalize(normalizer)
normalize_entity(base_entity_id, …[, …]) Utility to normalize an entity_store
num_instances(entity_id)
path_relationships(path, start_entity_id) Generate a list of the strings “forward” or “backward” corresponding to the direction of the relationship at each point in path.
read_pickle(path)
sample_instances(entity_id[, n])
store_convert_variable_type(entity_id, …) Convert variable in data set to different type
to_pickle(path)
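
The summaries above describe a forward relationship as one in which the entity is the child, and a forward path as a chain of such relationships from a start entity to a goal entity. As an illustration of that idea only, here is a plain-Python sketch that searches relationship tuples breadth-first. It is not the featuretools implementation: sketch_forward_path is a hypothetical name, the "customers" entity is invented for the example, and the library's own return format may differ.

```python
from collections import deque


def sketch_forward_path(relationships, start_entity_id, goal_entity_id):
    """Return the relationship tuples leading child -> parent from start
    to goal, or None if the goal cannot be reached that way.

    relationships: list of (parent entity id, parent variable,
                            child entity id, child variable) tuples
    """
    # Index relationships by the entity that plays the child role.
    forward = {}
    for rel in relationships:
        forward.setdefault(rel[2], []).append(rel)

    queue = deque([(start_entity_id, [])])
    visited = {start_entity_id}
    while queue:
        entity_id, path = queue.popleft()
        if entity_id == goal_entity_id:
            return path
        for rel in forward.get(entity_id, []):
            parent_id = rel[0]
            if parent_id not in visited:
                visited.add(parent_id)
                queue.append((parent_id, path + [rel]))
    return None


# Hypothetical three-entity chain: transactions -> cards -> customers.
relationships = [
    ("customers", "id", "cards", "customer_id"),
    ("cards", "id", "transactions", "card_id"),
]
path = sketch_forward_path(relationships, "transactions", "customers")
```

Searching in the opposite direction (from "customers" to "transactions") returns None here, because that route follows parent-to-child links, which this document calls backward relationships.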