featuretools.Entity

class featuretools.Entity(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, encoding=None, already_sorted=False, created_index=None, verbose=False)

Represents an entity in an EntitySet, and stores relevant metadata and data.

An Entity is analogous to a table in a relational database.

See also

Relationship, Variable, EntitySet

__init__(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, encoding=None, already_sorted=False, created_index=None, verbose=False)

Create Entity

Parameters:
  • id (str) – Id of Entity.
  • df (pd.DataFrame) – Dataframe providing the data for the entity.
  • entityset (EntitySet) – EntitySet for this Entity.
  • variable_types (dict[str -> dict[str -> type]]) – Optional mapping of entity_id to variable_types dict with which to initialize an entity’s store. An entity’s variable_types dict maps string variable ids to types (Variable).
  • index (str) – Name of id column in the dataframe.
  • time_index (str) – Name of time column in the dataframe.
  • secondary_time_index (dict[str -> str]) – Dictionary mapping columns in the dataframe to the time index column they are associated with.
  • last_time_index (pd.Series) – Time index of the last event for each instance across all child entities.
  • encoding (str, optional) – If None, will use ‘ascii’. Another option is ‘utf-8’, or any encoding supported by pandas.
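
The last_time_index parameter can be easier to picture with a small worked example. The following is a minimal sketch in plain Python, not the library's implementation: the names customer_ids, child_events, and last_event_times are hypothetical, and a plain dict stands in for the pd.Series that the parameter actually expects. It only illustrates the idea of "the time of the last event for each instance across all child entities".

```python
from datetime import datetime

# Hypothetical parent instances (e.g. the index values of a "customers" entity).
customer_ids = [1, 2, 3]

# Hypothetical child rows as (parent id, event time) pairs, pooled from every
# child entity that references the parent.
child_events = [
    (1, datetime(2014, 1, 1, 10, 0)),
    (1, datetime(2014, 1, 3, 9, 30)),
    (2, datetime(2014, 1, 2, 12, 0)),
    (1, datetime(2014, 1, 2, 8, 0)),
]


def last_event_times(instance_ids, events):
    """Return {instance id: time of its latest child event, or None}."""
    latest = {instance_id: None for instance_id in instance_ids}
    for instance_id, when in events:
        if instance_id not in latest:
            continue  # skip child rows that point at an unknown parent
        if latest[instance_id] is None or when > latest[instance_id]:
            latest[instance_id] = when
    return latest


# Plain-dict stand-in for the pd.Series passed as last_time_index.
last_time_index = last_event_times(customer_ids, child_events)
```

In this sketch an instance with no child events is left as None; how the library itself treats that case is not stated in this reference.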

Methods

__init__(id, df, entityset[, …]) Create Entity
add_interesting_values([max_values, verbose]) Find interesting values for categorical variables, to be used to
convert_all_variable_data(variable_types)
convert_variable_data(column_id, new_type, …) Convert variable in data set to different type
convert_variable_type(variable_id, new_type) Convert variable in dataframe to different type
delete_variable(variable_id) Remove variable from entity’s dataframe and from self.variables
infer_variable_types([ignore, link_vars]) Extracts the variables from a dataframe
query_by_values(instance_vals[, …]) Query instances that have variable with given value
set_index(variable_id[, unique]) variable_id: Name of an existing variable to set as index.
set_secondary_time_index(secondary_time_index)
set_time_index(variable_id[, already_sorted])
update_data(df[, already_sorted, …]) Update entity’s internal dataframe, optionally making sure data is sorted, reference indexes to other entities are consistent, and last_time_indexes are consistent.

Attributes

df
id
index
is_metadata
last_time_index
shape
time_index
variable_types
variables