Pass more than one argument to Pool.map

I have recently been working on this year's KDD Cup competition. I was trying to speed up my code, which fits a Prophet model to each of several time series in a pandas DataFrame, using Python's multiprocessing module. Below is an example of how the map function of the Pool class from the multiprocessing module works:
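The original listing did not survive on this page, so here is a minimal reconstruction based on the names used in the next paragraph (Pool(8), p, square, range(16), results). Treat it as a sketch rather than the exact code from the post.

```python
from multiprocessing import Pool


def square(x):
    """Return x squared; this runs inside a worker process."""
    return x * x


if __name__ == "__main__":
    # Create a pool of 8 worker processes; the with-block shuts it down on exit.
    with Pool(8) as p:
        # map blocks until every element is processed and keeps input order.
        results = p.map(square, range(16))
    print(results)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, because each worker re-imports the main module.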

Basically, Pool(8) creates a process pool object with 8 worker processes. p.map(square, range(16)) chops the iterable into chunks and submits them to the pool as separate tasks. Each worker applies the function square to every element of the chunks it receives (with 16 elements and 8 workers that is about 2 elements per worker on average, although the actual split depends on the chunksize argument and on scheduling). The results are collected, in input order, into the results object.

One possible way for me to use this mechanism to fit my models is to split the data into smaller dataframes with columns ['ds', 'y'], collect these smaller dataframes into a list, and map a Prophet wrapper function over this list using a pool of processes. The wrapper function would look like this:
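The wrapper itself is missing from this page, so the following is only a rough sketch of what a one-argument wrapper could look like. The name run_prophet and the values of n_points and horizon are hypothetical placeholders, not taken from the original post. Only the Prophet calls (the n_changepoints argument, fit, make_future_dataframe, predict) are the library's own.

```python
def run_prophet(df):
    """Fit Prophet to one series and return its forecast.

    df is expected to have the columns ['ds', 'y'].
    """
    # Imported here so the sketch can be defined without Prophet installed;
    # older releases ship the same class as fbprophet.Prophet.
    from prophet import Prophet

    # Hypothetical values, recomputed on every call even though they are
    # identical for all series: this is the duplicated work.
    n_points = 25  # assumed number of changepoints
    horizon = 48   # assumed number of future periods

    model = Prophet(n_changepoints=n_points)
    model.fit(df)
    future = model.make_future_dataframe(periods=horizon)
    forecast = model.predict(future)
    return forecast[["ds", "yhat"]]
```

With a list of such dataframes, the call would then have the same shape as the square example: p.map(run_prophet, list_of_dataframes).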

This would work, but it would repeat, for every series, the computation of the future timestamps and the number of changepoints, which are the same for all my time series. Ideally, I want to calculate them once and share the values. One obvious way to do so is to change my run list from a collection of dataframes to a list of tuples like (df, n_points, holidays, future), in which case the wrapper has to unpack the tuple itself, because Pool.map passes a single argument to the function. Another method, the one I ended up using, is functools.partial.
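To make the two options concrete, here is a small self-contained sketch. The function fit_one is a hypothetical stand-in for the Prophet wrapper (it just averages the last n_points values), and the data are made up, so only the argument-passing pattern carries over. It uses plain lists instead of dataframes so that it runs with the standard library alone.

```python
from functools import partial
from multiprocessing import Pool


def fit_one(series, n_points, holidays, future):
    """Toy stand-in for a model fit (not Prophet).

    Forecasts the mean of the last n_points observations for every future
    step, and 0.0 on steps listed in holidays.
    """
    window = series[-n_points:]
    level = sum(window) / len(window)
    return [0.0 if t in holidays else level for t in future]


if __name__ == "__main__":
    all_series = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    # Shared values, computed once in the parent process.
    n_points, holidays, future = 2, {5}, [4, 5, 6]

    # Option 1: bind the shared arguments by keyword; map then supplies
    # only the first positional argument (the series).
    fit_shared = partial(fit_one, n_points=n_points, holidays=holidays, future=future)

    # Option 2: build one tuple per task and let starmap unpack it.
    tasks = [(s, n_points, holidays, future) for s in all_series]

    with Pool(2) as p:
        via_partial = p.map(fit_shared, all_series)
        via_tuples = p.starmap(fit_one, tasks)

    print(via_partial == via_tuples)
```

Binding by keyword avoids depending on argument order, since partial fills positional arguments from the left. Either way the shared objects are still pickled and sent to the workers along with the tasks, so what is saved is the repeated computation, not the transfer.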