featuretools.Entity

class featuretools.Entity(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Represents an entity in an EntitySet and stores its data and relevant metadata.

An Entity is analogous to a table in a relational database.

See also

Relationship, Variable, EntitySet

__init__(id, df, entityset, variable_types=None, index=None, time_index=None, secondary_time_index=None, last_time_index=None, already_sorted=False, make_index=False, verbose=False)

Create Entity

Parameters
  • id (str) – Id of Entity.

  • df (pd.DataFrame) – Dataframe providing the data for the entity.

  • entityset (EntitySet) – Entityset for this Entity.

  • variable_types (dict[str -> type/str/dict[str -> type]]) – Maps each variable id (str) to a Variable type, a type string (str), or a (type, kwargs) tuple whose kwargs are passed as keyword arguments to the Variable.

  • index (str) – Name of id column in the dataframe.

  • time_index (str) – Name of time column in the dataframe.

  • secondary_time_index (dict[str -> str]) – Dictionary mapping columns in the dataframe to the time index column they are associated with.

  • last_time_index (pd.Series) – Time index of the last event for each instance across all child entities.

  • make_index (bool, optional) – If True, assume index does not exist as a column in the dataframe and create a new column of that name using the integers in range(0, len(dataframe)). Otherwise, assume index exists in the dataframe.
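
The make_index behaviour described above can be approximated in plain pandas. This is only a minimal sketch, not featuretools' own code: the helper name add_integer_index and the sample column names are hypothetical, and it assumes the new column simply holds the integers 0 through len(df) - 1.

```python
import pandas as pd


def add_integer_index(df, index):
    """Return a copy of df with a new integer column named `index`.

    Hypothetical helper that mimics the documented make_index=True
    behaviour; it is not part of featuretools.
    """
    if index in df.columns:
        # With make_index=True the index column is assumed not to exist yet.
        raise ValueError(f"column {index!r} already exists")
    out = df.copy()
    # The new column holds 0, 1, ..., len(df) - 1 and is placed first.
    out.insert(0, index, list(range(len(out))))
    return out


# Illustrative data; the column names are made up for this example.
transactions = pd.DataFrame({"amount": [9.5, 3.0, 12.25]})
indexed = add_integer_index(transactions, "transaction_id")
```

Here indexed gains a transaction_id column while the original transactions dataframe is left unchanged.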

Methods

__init__(id, df, entityset[, …])

Create Entity

add_interesting_values([max_values, verbose])

Find interesting values for categorical variables.

convert_variable_type(variable_id, new_type)

Convert a variable in the dataframe to a different type.

delete_variables(variable_ids)

Remove variables from the entity’s dataframe and from self.variables.

query_by_values(instance_vals[, …])

Query instances that have a variable with the given value.

set_index(variable_id[, unique])

Parameter variable_id: name of an existing variable to set as index.

set_secondary_time_index(secondary_time_index)

set_time_index(variable_id[, already_sorted])

update_data(df[, already_sorted, …])

Update the entity’s internal dataframe, optionally making sure that data is sorted, reference indexes to other entities are consistent, and last_time_indexes are consistent.
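
As an illustration of the kind of lookup that query_by_values describes (selecting instances whose variable matches given values), the following pandas sketch may help. It is an assumption-laden stand-in rather than the library's implementation: select_by_values, its parameters, and the sample data are all hypothetical.

```python
import pandas as pd


def select_by_values(df, variable_id, instance_vals):
    """Return the rows of df whose `variable_id` column is in instance_vals.

    Hypothetical helper for illustration; not a featuretools function.
    """
    if variable_id not in df.columns:
        raise KeyError(variable_id)
    # Boolean mask: True where the column value is one of the requested values.
    mask = df[variable_id].isin(list(instance_vals))
    return df[mask]


# Illustrative data; the column names are made up for this example.
sessions = pd.DataFrame(
    {"session_id": [1, 2, 3, 4], "customer_id": ["a", "b", "a", "c"]}
)
matches = select_by_values(sessions, "customer_id", ["a", "c"])
```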

Attributes

df

Dataframe providing the data for the entity.

last_time_index

Time index of the last event for each instance across all child entities.

shape

Shape of the entity’s dataframe

variable_types

Dictionary mapping variable ids to variable types
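
To make the value forms accepted in a variable_types dict more concrete (a type, a type string, or a (type, kwargs) pair), the sketch below normalises each form into a (type, kwargs) pair. Everything defined here is a hypothetical stand-in for illustration: the classes Variable, Numeric and Categorical, their type strings, the categories keyword, and resolve_variable_type are not featuretools objects.

```python
class Variable:
    """Hypothetical stand-in for a featuretools Variable type."""
    type_string = None


class Numeric(Variable):
    type_string = "numeric"


class Categorical(Variable):
    type_string = "categorical"


# Lookup from type string to stand-in class (the string names are assumed).
TYPES_BY_STRING = {cls.type_string: cls for cls in (Numeric, Categorical)}


def resolve_variable_type(spec):
    """Normalise one variable_types value into a (type, kwargs) pair."""
    if isinstance(spec, str):
        # A type string is looked up by name.
        return TYPES_BY_STRING[spec], {}
    if isinstance(spec, tuple):
        # A (type, kwargs) pair carries keyword arguments for the Variable.
        vtype, kwargs = spec
        return vtype, dict(kwargs)
    # Otherwise the value is assumed to be a type already.
    return spec, {}


variable_types = {
    "amount": Numeric,         # a type
    "country": "categorical",  # a type string
    # a (type, kwargs) pair; "categories" is a made-up keyword argument
    "plan": (Categorical, {"categories": ["free", "pro"]}),
}

resolved = {
    name: resolve_variable_type(spec) for name, spec in variable_types.items()
}
```

After this runs, every entry in resolved has the same (type, kwargs) shape regardless of which form was used in the input dict.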