featuretools.demo.load_flight

featuretools.demo.load_flight(month_filter=None, categorical_filter=None, nrows=None, demo=True, return_single_table=False, verbose=False)

Download, clean, and filter flight data from 2017. The original dataset can be found here.

Parameters
  • month_filter (list[int]) – Only use data from these months (for example, [1, 2]). To skip, set to None.

  • categorical_filter (dict[str->list[str]]) – Use only the specified categorical values, given as a mapping from column name to a list of allowed values. For example, {'dest_city': ['Boston, MA'], 'origin_city': ['Boston, MA']} returns all flights into OR out of Boston. To skip, set to None.

  • nrows (int) – Number of rows to read, passed to the nrows argument of pd.read_csv. Applied before any filtering, so the filtered result may contain fewer rows.

  • demo (bool) – Use only two months of data. If False, use the whole year.

  • return_single_table (bool) – If True, exit the function early and return a single dataframe instead of an EntitySet.

  • verbose (bool) – Show a progress bar while loading the data.
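
The interaction between nrows, month_filter, and categorical_filter described above can be illustrated with a small pandas sketch. This is not the library's implementation: the helper filter_flights, the flight_date column, and the inline CSV are all hypothetical, and only the parameter names and the "in OR out of Boston" behavior come from the descriptions above.

```python
import io
import pandas as pd

# Hypothetical miniature of the flight data; flight_date is an assumed
# column name used only for this illustration.
CSV = """flight_date,origin_city,dest_city
2017-01-03,"Boston, MA","Chicago, IL"
2017-01-15,"Denver, CO","Boston, MA"
2017-02-01,"Boston, MA","Denver, CO"
2017-02-09,"Denver, CO","Chicago, IL"
2017-03-20,"Chicago, IL","Boston, MA"
"""


def filter_flights(csv_text, month_filter=None, categorical_filter=None, nrows=None):
    """Hypothetical helper mirroring the documented parameter semantics."""
    # nrows truncates the file at read time, so it applies before any filter.
    data = pd.read_csv(io.StringIO(csv_text), nrows=nrows, parse_dates=["flight_date"])

    if month_filter is not None:
        data = data[data["flight_date"].dt.month.isin(month_filter)]

    if categorical_filter is not None:
        # A row is kept if ANY listed column holds one of its allowed values.
        keep = pd.Series(False, index=data.index)
        for column, allowed in categorical_filter.items():
            keep |= data[column].isin(allowed)
        data = data[keep]

    return data.reset_index(drop=True)


boston = {"origin_city": ["Boston, MA"], "dest_city": ["Boston, MA"]}
in_or_out = filter_flights(CSV, categorical_filter=boston)
jan_feb = filter_flights(CSV, month_filter=[1, 2], categorical_filter=boston)
first_two = filter_flights(CSV, categorical_filter=boston, nrows=2)
```

Because nrows is applied at read time, a small nrows combined with a narrow filter can return very few rows, or none at all.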

Examples

In [1]: import featuretools as ft

In [2]: es = ft.demo.load_flight(verbose=True,
   ...:                          month_filter=[1],
   ...:                          categorical_filter={'origin_city':['Boston, MA']})
   ...: 
100%|xxxxxxxxxxxxxxxxxxxxxxxxx| 100/100 [01:16<00:00,  1.31it/s]

In [3]: es
Out[3]: 
Entityset: Flight Data
  Entities:
    airports [Rows: 55, Columns: 3]
    flights [Rows: 613, Columns: 9]
    trip_logs [Rows: 9456, Columns: 22]
    airlines [Rows: 10, Columns: 1]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest
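
Each relationship above points from a child column to the parent column it references. As a rough sketch of what that linkage means, the same keys can be followed with plain pandas merges. The rows below and the trip_log_id column are made up for illustration; only the key columns flight_id and carrier come from the relationships listed above.

```python
import pandas as pd

# Hypothetical rows; only the key columns come from the relationships above.
airlines = pd.DataFrame({"carrier": ["AA", "B6"]})
flights = pd.DataFrame({
    "flight_id": ["AA-1", "B6-7"],
    "carrier": ["AA", "B6"],
})
trip_logs = pd.DataFrame({
    "trip_log_id": [0, 1, 2],
    "flight_id": ["AA-1", "AA-1", "B6-7"],
})

# Follow trip_logs.flight_id -> flights.flight_id,
# then flights.carrier -> airlines.carrier.
joined = (
    trip_logs
    .merge(flights, on="flight_id", how="left")
    .merge(airlines, on="carrier", how="left")
)

# Count trips per carrier by walking from child rows up to the parent table.
trips_per_carrier = joined.groupby("carrier").size().to_dict()
```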