featuretools.Entity

class featuretools.Entity(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Represents an entity in an EntitySet and stores its data and relevant metadata.

An Entity is analogous to a table in a relational database.

See also

Relationship, Variable, EntitySet

__init__(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Create Entity

Parameters:
  • id (str) – Id of Entity.
  • df (pd.DataFrame) – Dataframe providing the data for the entity.
  • entityset (EntitySet) – EntitySet for this Entity.
  • variable_types (dict[str -> dict[str -> type]]) – Maps string variable ids to types (Variable), or to (type, kwargs) tuples to pass keyword arguments to the Variable.
  • index (str) – Name of id column in the dataframe.
  • time_index (str) – Name of time column in the dataframe.
  • secondary_time_index (dict[str -> str]) – Dictionary mapping columns in the dataframe to the time index column they are associated with.
  • last_time_index (pd.Series) – Time index of the last event for each instance across all child entities.
  • make_index (bool, optional) – If True, assume index does not exist as a column in the dataframe, and create a new column of that name using integers in the range (0, len(dataframe)). Otherwise, assume index exists in the dataframe.
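
The make_index behaviour described above can be sketched in plain Python. This is only an illustration, not featuretools' implementation: a list of row dicts stands in for the dataframe, and the helper name add_integer_index, the column name transaction_id, and the sample data are all hypothetical.

```python
def add_integer_index(rows, index):
    """Return copies of `rows` with a new integer id column named `index`.

    Mirrors the make_index=True description: the column must not already
    exist, and the generated ids run from 0 to len(rows) - 1.
    """
    if any(index in row for row in rows):
        raise ValueError(
            f"column {index!r} already exists; it would be used as-is "
            "when make_index is False"
        )
    # Put the new id first, then copy the original columns unchanged.
    return [{index: i, **row} for i, row in enumerate(rows)]


# Hypothetical data standing in for a dataframe that has no id column.
rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 4.5},
    {"customer": "a", "amount": 7.25},
]
indexed = add_integer_index(rows, "transaction_id")
```

The sketch raises an error if the column is already present, to reflect the assumption stated for make_index=True that the index does not yet exist; how the library itself reacts in that case is not specified here.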

Methods

__init__(id, df, entityset[, …]) Create Entity
add_interesting_values([max_values, verbose]) Find interesting values for categorical variables, to be used to
convert_variable_type(variable_id, new_type) Convert variable in dataframe to different type
delete_variable(variable_id) Remove variable from entity’s dataframe and from self.variables
query_by_values(instance_vals[, …]) Query instances that have variable with given value
set_index(variable_id[, unique]) Set the index to variable_id, the name of an existing variable.
set_secondary_time_index(secondary_time_index)
set_time_index(variable_id[, already_sorted])
update_data(df[, already_sorted, …]) Update entity’s internal dataframe, optionally making sure that data is sorted, that reference indexes to other entities are consistent, and that last_time_indexes are consistent.

Attributes

df Dataframe providing the data for the entity.
last_time_index Time index of the last event for each instance across all child entities.
shape Shape of the entity’s dataframe
variable_types Dictionary mapping variable ids to variable types
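
To make the last_time_index description concrete, here is a minimal plain-Python sketch of "the time of the last event for each instance". It assumes a single child entity represented as a list of row dicts; the helper name last_time_per_instance, the column names, and the data are hypothetical, and this is not how featuretools computes the attribute internally.

```python
from datetime import datetime


def last_time_per_instance(child_rows, parent_key, time_key):
    """Map each parent instance id to the latest time seen in its child rows."""
    latest = {}
    for row in child_rows:
        parent_id, when = row[parent_key], row[time_key]
        # Keep the maximum timestamp observed so far for this parent.
        if parent_id not in latest or when > latest[parent_id]:
            latest[parent_id] = when
    return latest


# Hypothetical child entity: transactions that reference customers.
transactions = [
    {"customer_id": 1, "time": datetime(2020, 1, 1)},
    {"customer_id": 2, "time": datetime(2020, 1, 3)},
    {"customer_id": 1, "time": datetime(2020, 1, 5)},
]
last_times = last_time_per_instance(transactions, "customer_id", "time")
```

With several child entities, the same maximum would be taken over the rows of all of them, which is what "across all child entities" suggests.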