I have recently been tackling this year's KDD Cup competition. I was trying to speed up my code for fitting Prophet models on multiple time series from a pandas dataframe using Python's multiprocessing module. Below is an example of how the map function of the Pool class from the multiprocessing module works:

```python
from multiprocessing import Pool


def square(x):
    return x**2


if __name__ == "__main__":  # required where workers are started with "spawn" (e.g. Windows, macOS)
    p = Pool(8)
    results = p.map(square, range(16))
    p.close()
    p.join()
    print(results)
    # would print out [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
```

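Basically, Pool(8) creates a process pool object with 8 worker processes. p.map(square, range(16)) splits the iterable into chunks and hands them to the workers as tasks. Each worker applies the function square to every element of the chunks it receives, and the return values are collected, in the original input order, into the list results. The split is not necessarily 8 equal pieces of 2 elements: with the default chunksize, map derives the chunk size from the length of the iterable and the number of processes (here it works out to chunks of a single element), and a worker that finishes early simply picks up the next chunk. Passing chunksize=2 produces 8 chunks of 2 elements, although a fast worker may still end up processing more than one of them.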
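To see the chunking directly, here is a small sketch that returns the worker's process id alongside each square and passes an explicit chunksize=2. The helper names square_with_pid and run_demo are made up for illustration.

```python
import os
from multiprocessing import Pool


def square_with_pid(x):
    # Return the square together with the PID of the worker process that computed it
    return x ** 2, os.getpid()


def run_demo(n_processes=8, n_items=16, chunksize=2):
    with Pool(n_processes) as p:
        # Each task sent to a worker is a slice of `chunksize` consecutive elements
        return p.map(square_with_pid, range(n_items), chunksize=chunksize)


if __name__ == "__main__":
    pairs = run_demo()
    print([sq for sq, _ in pairs])
    # A chunk is processed by a single worker, so both elements of a chunk share a PID
    for i in range(0, len(pairs), 2):
        assert pairs[i][1] == pairs[i + 1][1]
    print(sorted({pid for _, pid in pairs}))
```

The squares always come back in input order, and the two elements of each chunk always share a PID; how many distinct PIDs appear, and which chunk lands on which worker, can differ from run to run.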

One possible way for me to use this mechanism to fit my models is to split the data into smaller dataframes with columns ['ds', 'y'], collect these smaller dataframes into a list, and map a Prophet wrapper function over this list using a pool of processes. The wrapper function would look like this:
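A minimal sketch of that preparation step, assuming the raw data sits in one long-format pandas dataframe with a series-id column, a timestamp column and a value column. The function build_runlist and the column names below are hypothetical.

```python
import pandas as pd


def build_runlist(data, id_col, time_col, value_col):
    """Split a long-format frame into one ['ds', 'y'] frame per series.

    Returns the series ids and the list of per-series frames in matching order.
    """
    ids, runlist = [], []
    for series_id, group in data.groupby(id_col, sort=True):
        frame = (
            group[[time_col, value_col]]
            .rename(columns={time_col: "ds", value_col: "y"})  # Prophet's expected names
            .sort_values("ds")
            .reset_index(drop=True)
        )
        ids.append(series_id)
        runlist.append(frame)
    return ids, runlist


# Hypothetical long-format data: one row per (series, timestamp) reading
data = pd.DataFrame({
    "series_id": ["a", "b", "a", "b"],
    "time": pd.to_datetime(["2018-01-01 01:00", "2018-01-01 00:00",
                            "2018-01-01 00:00", "2018-01-01 01:00"]),
    "value": [2.0, 10.0, 1.0, 20.0],
})
ids, runlist = build_runlist(data, "series_id", "time", "value")
print(ids)  # ['a', 'b']
print(runlist[0])
```

Because Pool.map returns results in the same order as its input, zip(ids, results) pairs each forecast with the series it came from.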

```python
from multiprocessing import Pool

import pandas as pd
from fbprophet import Prophet  # in Prophet >= 1.0 the package is imported as `prophet`


def Prophet_fit_predict(df):
    model = Prophet(
        n_changepoints=...,  # code for calculating number of points
        holidays=...,  # holidays DataFrame (columns 'holiday' and 'ds')
        holidays_prior_scale=0.1,
    )
    model.fit(df)
    future = pd.DataFrame({'ds': ...})  # code for future time stamps
    forecast = model.predict(future)
    return forecast['yhat'].values


if __name__ == "__main__":
    # runlist: list of ['ds', 'y'] DataFrames, one per time series
    with Pool(8) as p:
        results = p.map(Prophet_fit_predict, runlist)
```

This would work, but it would require duplicated computation of the future timestamps and the number of changepoints, which are the same for all my time series. Ideally, I want to calculate them once and share the values. One obvious way to do so is to change my run list from a collection of dataframes to a list of tuples like (df, n_points, holidays, future), which Pool.starmap can unpack into the wrapper's arguments. Another method, which I ended up using, is functools.partial.

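```python
from functools import partial
from multiprocessing import Pool

from fbprophet import Prophet  # in Prophet >= 1.0 the package is imported as `prophet`


def Prophet_fit_predict(df, n_points, holidays, future):
    model = Prophet(
        n_changepoints=n_points,
        holidays=holidays,
        holidays_prior_scale=0.1,
    )
    model.fit(df)
    forecast = model.predict(future)
    return forecast['yhat'].values


if __name__ == "__main__":
    n_points = ...  # code that calculates this
    holidays = ...  # holidays DataFrame (columns 'holiday' and 'ds')
    future = ...  # code that calculates this (a DataFrame with a 'ds' column)
    # Bind the shared arguments once; each task only supplies its own df
    worker = partial(Prophet_fit_predict, n_points=n_points, holidays=holidays, future=future)
    with Pool(8) as p:
        results = p.map(worker, runlist)
```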
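For comparison, here is a toy sketch of the two options described above: tuples passed to Pool.starmap versus a partial passed to Pool.map. It uses a hypothetical stand-in function, scale_and_shift, in place of the Prophet wrapper so it runs without Prophet installed; the run list and the factor/offset values are made up for illustration.

```python
from functools import partial
from multiprocessing import Pool


def scale_and_shift(values, factor, offset):
    # Stand-in for the Prophet wrapper: one per-series argument plus shared parameters
    return [v * factor + offset for v in values]


def run_with_tuples(runlist, factor, offset):
    # Option 1: pack the shared values into every task tuple; starmap unpacks each tuple
    tasks = [(values, factor, offset) for values in runlist]
    with Pool(4) as p:
        return p.starmap(scale_and_shift, tasks)


def run_with_partial(runlist, factor, offset):
    # Option 2: bind the shared values once; map passes each item as `values`
    worker = partial(scale_and_shift, factor=factor, offset=offset)
    with Pool(4) as p:
        return p.map(worker, runlist)


if __name__ == "__main__":
    runlist = [[1, 2], [3], [4, 5, 6]]
    print(run_with_tuples(runlist, 10, 1))   # [[11, 21], [31], [41, 51, 61]]
    print(run_with_partial(runlist, 10, 1))  # [[11, 21], [31], [41, 51, 61]]
```

In both versions the shared values are computed once in the parent process, but they are still pickled and shipped to the workers along with the tasks; the partial version simply leaves the run list itself unchanged.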