Save Intermediate Feature Matrix Results

In this tutorial, we will go over how to save intermediate results when computing a feature matrix.

[1]:
import featuretools as ft

In this example, we will use retail data from customers of a UK website, covering December 2010 to December 2011.

[2]:
es = ft.demo.load_retail(nrows=10000)

Let’s use a simple feature for this example: the customer’s country.

[3]:
region = ft.Feature(es["customers"]["Country"])

We can supply “cutoff times” to specify that we want to calculate features one year after a customer’s first invoice.

[4]:
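import pandas as pd
# Use each customer's first invoice time as the starting point for its cutoff time
cutoff_times = es["customers"].df[["CustomerID", "first_invoices_time"]].rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})
# Shift every cutoff time to 365 days after the first invoice
cutoff_times["time"] = cutoff_times["time"] + pd.Timedelta("365 days")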

Here is what some of the cutoff times look like.

[5]:
cutoff_times.head(10)
[5]:
instance_id time
CustomerID
17850.0 17850.0 2011-12-01 08:26:00
13047.0 13047.0 2011-12-01 08:34:00
12583.0 12583.0 2011-12-01 08:45:00
13748.0 13748.0 2011-12-01 09:00:00
15100.0 15100.0 2011-12-01 09:09:00
15291.0 15291.0 2011-12-01 09:32:00
14688.0 14688.0 2011-12-01 09:37:00
14527.0 14527.0 2011-12-01 09:41:00
15311.0 15311.0 2011-12-01 09:41:00
17809.0 17809.0 2011-12-01 09:41:00
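
The cutoff-time construction above can be tried on a small frame without loading the demo data. This is a minimal sketch using only pandas; the `customers` frame and its two rows are hypothetical stand-ins for `es["customers"].df`, with first-invoice times chosen to match the first two rows of the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for es["customers"].df
customers = pd.DataFrame({
    "CustomerID": [17850.0, 13047.0],
    "first_invoices_time": pd.to_datetime(
        ["2010-12-01 08:26:00", "2010-12-01 08:34:00"]),
})

# Same steps as the tutorial cell: select, rename, then shift by 365 days
toy_cutoffs = customers[["CustomerID", "first_invoices_time"]].rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})
toy_cutoffs["time"] = toy_cutoffs["time"] + pd.Timedelta("365 days")

print(toy_cutoffs)
```

Because `pd.Timedelta("365 days")` is a fixed duration rather than a calendar year, a first invoice dated shortly before a leap day would land one calendar day short of its anniversary.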

To save intermediate computations as CSV files, pass the path of the directory where they should be written. For example, if you pass a directory called “ft_temp”, CSV files will be written to that directory, each named according to the timestamp it represents.

[6]:
import os
save_progress = os.path.join(os.getcwd(), 'ft_temp')
os.makedirs(save_progress, exist_ok=True)
[7]:
fm_save = ft.calculate_feature_matrix([region],
                                       entityset=es,
                                       cutoff_time=cutoff_times.sample(10),
                                       save_progress=save_progress)

As seen below, there are now files in the directory, named by timestamp.

[8]:
% ls ft_temp/
ft_2011_12_01_03-08-00-000000.csv  ft_2011_12_02_05-03-00-000000.csv
ft_2011_12_01_09-00-00-000000.csv  ft_2011_12_02_05-19-00-000000.csv
ft_2011_12_01_12-43-00-000000.csv  ft_2011_12_02_12-07-00-000000.csv
ft_2011_12_01_12-51-00-000000.csv  ft_2011_12_02_12-18-00-000000.csv
ft_2011_12_02_03-19-00-000000.csv  ft_2011_12_03_12-57-00-000000.csv
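
Once the files exist, they can be read back with ordinary pandas calls. The sketch below is not part of the featuretools API: it writes two stand-in CSV files into a temporary directory (the file names imitate the listing above, but the `Country` column and its values are assumptions about the contents) and then combines them into one frame.

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in directory and files; the column layout is hypothetical.
with tempfile.TemporaryDirectory() as demo_dir:
    samples = [
        ("ft_2011_12_01_09-00-00-000000.csv", "United Kingdom"),
        ("ft_2011_12_02_05-03-00-000000.csv", "France"),
    ]
    for name, country in samples:
        pd.DataFrame({"Country": [country]}).to_csv(
            os.path.join(demo_dir, name), index=False)

    # Sorting the names orders these files by the date in the file name
    paths = sorted(glob.glob(os.path.join(demo_dir, "ft_*.csv")))

    # Read each file and record which file every row came from
    pieces = [pd.read_csv(p).assign(source_file=os.path.basename(p))
              for p in paths]
    combined = pd.concat(pieces, ignore_index=True)

print(combined)
```

Keeping the file name in a `source_file` column preserves the link between each row and the timestamp its file represents.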
[9]:
import shutil
shutil.rmtree(save_progress)
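
As an alternative to removing the directory by hand with `shutil.rmtree`, the standard library's `tempfile.TemporaryDirectory` deletes the directory automatically when its `with` block exits. This is a minimal sketch of that pattern; the name `save_progress_dir` and the `example.csv` file are illustrative, and in practice the path would be passed as `save_progress` inside the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="ft_temp_") as save_progress_dir:
    # Inside the block the directory exists and could be passed as save_progress
    existed_inside = os.path.isdir(save_progress_dir)

    # Stand-in for an intermediate result written during the computation
    with open(os.path.join(save_progress_dir, "example.csv"), "w") as f:
        f.write("Country\nUnited Kingdom\n")

# After the block the directory and everything in it are gone
removed_after = not os.path.exists(save_progress_dir)
print(existed_inside, removed_after)
```

This suits cases where the intermediate files are only needed while the computation runs; to keep them afterwards, use a regular directory as shown earlier.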