Auto Tuning multiple Timeseries SARIMAX Model — With a case study and detailed code explanation

NandaKishore Joshi
6 min readSep 26, 2021

--

Working as a Data Scientist I have learnt that no one ML model can complete a project. Specially when trying to forecast any data in future, a single model cannot entirely solve the business need. We always end up forecasting multiple data combinations which need more than one( too many…!! timeseries model). For example , to make the forecast of sales for a Volkswagen Polo we can hypothetically have (Area — Variant) as data combination. Even if we consider only 10 areas and assume that the car has 5 variants, we will end up creating 50 timeseries data combination. Building 50 tuned models manually by itself is a challenge ,what if we have to tune these models once every month? That too if we have to use more sophisticated models like SARIMAX which has 7 hyperparameters to tune would be a nightmare every month for any Data Scientist.

In this article lets understand how to Auto Tune multiple SARIMAX models based on our business data (having multiple data combination) and choose the best parameters dynamically every time we have new actuals. We will look at a case study, of course with code to understand the idea behind auto tuning SARIMAX models. At the end of this article, you will be able to auto tune any number of SARIMAX models without any manual intervention every time new actuals are available. Sounds Interesting.. !! lets get on to the business.

Before starting with the code lets briefly understand the hyperparameters to be tuned for a SARIMAX model. SARIMAX is the extension of the ARIMA models with seasonality. Hence, SARIMAX takes into account the parameters involved in regular ARIMA mode (p,d,q) and also adds seasonality parameters (P,D,Q,s). These arguments to SARIMAX model are called order (p,d,q) and seasonal order (P,D,Q,s) respectively and hence 7 parameters to tune.

SARIMAX (p,d,q) x (P,D,Q,s)

CASE STUDY — Forecasting the revenue from different product categories for an e-commerce startup

Lets take up a close to real example to forecast the revenue of a startup from its various product categories for the next 16 months. Below is the sample of the data (dataframe =df) we have got

Fig 1 : Sample of actual data

Let’s understand the data first!!

We have got the monthly data from 2013 till May 2021 of revenue (in dollar!!) for 4 different product categories namely Cosmetics, Processed_Food, Furniture and Books. And this data is divided across 10 different areacodes. Now our task is to forecast the revenue for next 16 months for each Areacode — Product category combination.

AreaCodes — [‘A1’, ‘A10’, ‘A2’, ‘A3’, ‘A4’, ‘A5’, ‘A6’, ‘A7’, ‘A8’, ‘A9’]

In this article we will mainly focus on understanding how to auto tune the SAXIMAX model. So we will not go deep into EDA and other data preprocessing like null imputations etc.. We will also assume that the data has a seasonality of a year (which is more likely in a e-commerce company as purchases depend on the time of the year)

Lets define the index of the data and convert the revenue fields to float. We will impute nulls with zero assuming that nulls meant no revenue

df['monthyyyymm_no'] = pd.to_datetime(df['monthyyyymm_no'], format="%Y%m")
df = df.set_index('monthyyyymm_no', drop=True)
col_name="areacode"
first_col = df.pop(col_name)
df=df.astype(float)
df.insert(0, col_name, first_col)
df=df.fillna(0)

We will use custom gridsearch to find the best parameters. To do this lets define the list of parameters in the format [(p,d,q), (P,D,Q,s)]

params = {
0:[(1,1,1), (1,1,1,12)],
1:[(1,1,0), (1,1,1,12)],
2:[(1,1,0), (1,1,0,12)],
3:[(1,1,0), (0,1,0,12)],
4:[(1,1,1), (1,1,0,12)],
5:[(1,1,1), (2,1,0,12)],
6:[(1,1,2), (1,1,2,12)],
7:[(1,1,1), (1,1,2,12)],
8:[(1,1,1), (2,1,2,12)],
9:[(1,1,0), (1,1,2,12)],
10:[(2,1,1), (2,1,1,12)],
11:[(2,1,1), (1,1,1,12)],
12:[(2,1,1), (1,1,0,12)],
13:[(1,1,2), (2,1,2,12)],
14:[(1,1,2), (1,1,0,12)],
15:[(0,1,1), (1,1,1,12)]
}

Above we have defined 15 different parameter combination. This is inconsideration of the running cost. We can dynamically define more parameters like below where we get all the combination between 0 to 2.

p  = q = range(0,3)
d = range(0,2)
pdq = list(itertools.product(p, d, q))
pdqs = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

We need models for every Area — Product_category combination. So lets define lists to get them and also step=16 specifies the steps of the forecast we need

columns=['Cosmatics', 'Processed_Food','Furniture','Books']
area=[i for i in df['areacode'].unique()]
steps=16

We are using AIC as the metric to get the best parameters. Model with lowest AIC is picked to be best and the parameters corresponding to that model are stored as the best_param. In most timeseries models we use all the available actuals to train the model. This is because the most actual data will tell us the behavior of the future forecasts. Due to this we cannot split timeseries into test and train and hence cannot use our regression metrics like RMSE or MAE. So we relay on either Maximum Likelihood, AIC or BIC. In this example we are using AIC as the metric.

Now lets define a function to auto tune the SARIMAX model and store the parameters with the lowest AIC score.

#function returns dataframe with best parameters
def sarimax_gridsearch(ts,area,col,params, maxiter=5000):
'''
Input:
ts : your time series data
area : areacode of the current model
col : product category on which the model is being built
maxiter : number of iterations, increase if your model isn't converging
frequency : default='M' for month. Change to suit your time series frequency
e.g. 'D' for day, 'H' for hour, 'Y' for year.

Return:
Print and save out top parameter combinations
Returns dataframe of parameter combinations ranked by AIC
'''
# Run a grid search with pdq and seasonal pdq parameters and get the best AIC value
ans = []
for i in range(0,len(params)):

mod = SARIMAX(ts,
order=params[i][0],
seasonal_order=params[i][1],
enforce_stationarity=False,
enforce_invertibility=False
)

output = mod.fit()
ans.append([area,col,params[i][0][0],params[i][0][1],params[i][0][2], params[i][1][0],params[i][1][1],params[i][1][2],params[i][1][3],params[i][0],params[i][1], output.aic])
print('SARIMAX {} x {} 12 : parameters {},{} AIC={} '.format(area,col,params[i][0], params[i][1], output.aic))
# Convert into dataframe
ans_df = pd.DataFrame(ans, columns=['area','product','p','d','q', 'Ps','Ds','Qs','Ss','pdq','pdqs', 'aic'])
# Sort and return top combination
ans_df = ans_df.sort_values(by=['area','product','aic'],ascending=True)[0:1]

return ans_df

The above function accepts the actual data ,parameter grid (params) and corresponding area and product_category (col) as input. The output of this function is a dataframe with best parameters ((p,d,q),(P,D,Q,s)) along with the AIC value for the parameter combination for each ‘area’ and ‘product’ combination.

Below code shows how to call the above function

best_param=pd.DataFrame() # --- Dataframe to store the best parameters for each area-product combinationprediction=pd.DataFrame() # -- Dataframe to store the predictins for next 16 months for each area-product combinationans = []
par={}
for t in area:
data=df[df['areacode']==t]
forecast=pd.DataFrame()
temp={}
for c in columns:
# STEP 1 - Calling function to get best paramaters for each area-product combination
df_ans=sarimax_gridsearch(data[c],t,c,params)
# STEP 2 - Storing the best parameters for each area-product combination
best_param=best_param.append(df_ans)
best_param=best_param.sort_values(by=['area','product','aic'],ascending=True)

print('for area {} and product {}'.format(t,c))
print('best pdq is {}'.format(best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdq'].iloc[0]))
print('best pdqs is {}'.format(best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdqs'].iloc[0]))

# STEP 3- Building model with best parameters to make forecast for next 16 months
smx = SARIMAX(
data[c],
order=best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdq'].iloc[0],
seasonal_order=best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdqs'].iloc[0],
enforce_stationarity=False,
enforce_invertibility=False
)

model = smx.fit()
predictions = model.get_forecast(
steps=steps
).predicted_mean

df_forecast = pd.DataFrame(predictions)
df_forecast.columns=[c]
temp[c]=(best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdq'].iloc[0],best_param.loc[(best_param['area']==t) & (best_param['product']==c)]['pdqs'].iloc[0])

forecast=pd.concat([forecast, df_forecast], axis=1)
forecast=forecast.fillna(0)
forecast['area']=t
par[t]=temp # -- storing the best parameters into dictionary . this is an optional step

#step 4- storing the forecast of next 16 months for each area-product combination
prediction=pd.concat([prediction,forecast],axis=0)

The above code snippet mainly does these steps

Step 1 — Once data is filtered for a particular area, it is passed into the gridsearch function to get the best parameters for each area-product_category combination

Step 2 — Stores the best parameters for each area-product_category combination into a master best_param dataframe

Step 3 — SAXIMAX model is built using the best parameter for each area-product_category combination to make prediction for the next 16 months

Step 4 — Stores the forecast of the next 16 months for each area-product_category combination in a master master prediction dataframe

A sample of the final output of the above code is shown below

Fig 2 : Sample output of the best_parameters found by the gridsearch function along with AIC values
Fig 3 : Sample prediction of all 4 product_categories for Areacode A1 stored in prediction dataframe

The above code snippets can be used for other case studies and also with very little changes can also be used for other timeseries algo too.

Thanks for reading this article! Leave a comment below if you have any questions.

--

--