featuretools.entityset.EntitySet.entity_from_dataframe

EntitySet.entity_from_dataframe(entity_id, dataframe, index=None, variable_types=None, make_index=False, time_index=None, secondary_time_index=None, encoding=None, already_sorted=False)

Load the data for a specified entity from a Pandas DataFrame.

Parameters:
  • entity_id (str) – Unique id to associate with this entity.
  • dataframe (pandas.DataFrame) – Dataframe containing the data.
  • index (str, optional) – Name of the variable used to index the entity. If None, the first column is used.
  • variable_types (dict[str -> Variable], optional) – Maps variable ids to variable types. Used to initialize an entity’s store.
  • make_index (bool, optional) – If True, assume index does not exist as a column in dataframe, and create a new column of that name using integers. Otherwise, assume index exists.
  • time_index (str, optional) – Name of the variable containing time data. Its type must be variables.DateTime or castable to datetime (e.g. str, float, or numeric).
  • secondary_time_index (dict[str -> Variable], optional) – Name of the variable containing time data to use as a second time index for the entity.
  • encoding (str, optional) – Defaults to ‘ascii’ if None. Any other encoding supported by pandas, such as ‘utf-8’, may be given.
  • already_sorted (bool, optional) – If True, assumes that input dataframe is already sorted by time. Defaults to False.
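
The effect of make_index, time_index and already_sorted on the data can be pictured with plain pandas. The sketch below is not the featuretools implementation. The frame and its column names (sessions_df, session_id, session_start) are hypothetical. It also assumes that the generated integers start at 0 and that rows are ordered by the time index when already_sorted is False; the parameter descriptions above state neither explicitly.

```python
import pandas as pd

# Hypothetical input: no id column, timestamps stored as strings,
# rows not in time order.
sessions_df = pd.DataFrame({
    "device": ["tablet", "desktop", "mobile"],
    "session_start": ["2018-11-29 10:05:00",
                      "2018-11-29 10:00:00",
                      "2018-11-29 10:10:00"],
})

prepared = sessions_df.copy()

# make_index=True: the index column does not exist yet, so a new integer
# column with that name is created.
# Assumption: consecutive integers starting at 0.
prepared.insert(0, "session_id", list(range(len(prepared))))

# time_index: string values qualify because they can be cast to datetime.
prepared["session_start"] = pd.to_datetime(prepared["session_start"])

# already_sorted=False: the frame is not assumed to be in time order.
# Assumption: rows are then ordered by the time index (stable sort).
prepared = prepared.sort_values("session_start", kind="mergesort")

print(prepared)
```

If the frame is known to be in time order already, passing already_sorted=True tells the method it may rely on that order.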

Notes

Variable types not given in variable_types are inferred from the pandas dtype of each column.
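
To see what the inference has to work with, it can help to inspect the dtypes pandas assigns before loading the frame. The sketch below uses only pandas and a shortened version of the frame from the Example. It does not reproduce the inference rules themselves; it only prints the dtype kind of each column.

```python
import pandas as pd

# Shortened version of the transactions frame used in the Example.
transactions_df = pd.DataFrame({
    "id": [1, 2, 3],
    "amount": [100.40, 20.63, 33.32],
    "transaction_time": pd.date_range(start="2018-11-29 10:00",
                                      periods=3, freq="10s"),
    "fraud": [True, False, True],
})

# dtype.kind is a one-letter code: 'i' integer, 'f' float,
# 'M' datetime, 'b' boolean.
dtype_kinds = {name: dtype.kind
               for name, dtype in transactions_df.dtypes.items()}
print(dtype_kinds)
```

In the Example, the integer and float columns that were not named as index or time index are reported as numeric, and the bool column as boolean. A column whose dtype does not reflect its meaning, such as an integer id that is really a category, is a candidate for an explicit entry in variable_types.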

Example

In [1]: import featuretools as ft

In [2]: import pandas as pd

In [3]: transactions_df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   ...:                                 "session_id": [1, 2, 1, 3, 4, 5],
   ...:                                 "amount": [100.40, 20.63, 33.32, 13.12, 67.22, 1.00],
   ...:                                 "transaction_time": pd.date_range(start="10:00", periods=6, freq="10s"),
   ...:                                 "fraud": [True, False, True, False, True, True]})
   ...: 

In [4]: es = ft.EntitySet("example")

In [5]: es.entity_from_dataframe(entity_id="transactions",
   ...:                          index="id",
   ...:                          time_index="transaction_time",
   ...:                          dataframe=transactions_df)
   ...: 
Out[5]: 
Entityset: example
  Entities:
    transactions [Rows: 6, Columns: 5]
  Relationships:
    No relationships

In [6]: es["transactions"]
Out[6]: 
Entity: transactions
  Variables:
    id (dtype: index)
    session_id (dtype: numeric)
    amount (dtype: numeric)
    transaction_time (dtype: datetime_time_index)
    fraud (dtype: boolean)
  Shape:
    (Rows: 6, Columns: 5)

In [7]: es["transactions"].df
Out[7]: 
   id  session_id  amount    transaction_time  fraud
1   1           1  100.40 2018-11-29 10:00:00   True
2   2           2   20.63 2018-11-29 10:00:10  False
3   3           1   33.32 2018-11-29 10:00:20   True
4   4           3   13.12 2018-11-29 10:00:30  False
5   5           4   67.22 2018-11-29 10:00:40   True
6   6           5    1.00 2018-11-29 10:00:50   True