# Multivariate Timeseries Forecast with Lead and Lag Timesteps Using LSTM

**Why multivariate, and how can it help make better predictions?**

Time series forecasting plays a critical role in decision making in most industries. For example, forecasting the number of containers to be purchased can save a shipping company millions. Similarly, forecasting the demand for a particular product type plays a very important role in pricing, and hence profitability, for an e-commerce company.

In most cases it is the business or operations team that knows the factors affecting demand or supply. Simply making the forecast from historical patterns may not always yield the desired output, and it might not consider future prospects; there is a chance that past mistakes get repeated in future forecasts. It is always good to consider the factors in effect and give the team the capability to play around with them and understand their impact on the predictions.

This is where the multivariate timeseries forecast comes into the picture. Let us understand the multivariate forecast using the images below.

Figure 1 depicts the multivariate timeseries forecast of the dependent variable Y at time t with lag=5. The cell in red is the value to be forecasted at time t, which depends on the values in the yellow cells (t-5 to t). These are the independent variables affecting the prediction of Y at t.

We can treat a multivariate timeseries as a regression problem, with the independent variables being the features of the previous lag steps (up to t-1) along with the independent values at time t. With this approach there is a lot more control over the forecast than with just the previous timestamps.

Below is the multivariate timeseries which also considers the lead values.

From the above figure we can see that, along with the lag features, lead=2 (t+2) timesteps are also considered to make the forecast. This gives us more control over the factors affecting the forecast. In many cases we know that some future factors also affect our current-time predictions. With this approach, the decision-making team can really simulate the forecast based on various input values of the independent features.

**Implementation of Forecast model using LSTM**

Now let's see how to implement the multivariate timeseries with both lead and lag features.

- Getting the data ready with lead and lag factors

The major difference between using an LSTM for a regression task and for a timeseries is that, in a timeseries, lead and lag timestep data needs to be considered. Let's define a function which does just this, with the lead and lag as parameters.

```python
from pandas import DataFrame, concat

# convert series to supervised learning
def series_to_supervised(data, n_lag=1, n_lead=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_lag, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_lead):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
```

The above function converts the data into a supervised-learning frame with customized n_lag and n_lead steps. The output of this function contains the lag and lead steps as columns with (t-n) or (t+n) timestamps.
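As a quick sanity check, the function can be exercised on a tiny toy array (the sizes here are made up for illustration): with 2 variables, n_lag=2 and n_lead=2, each surviving row carries 2 lag steps, the current step, and 1 lead step per variable.

```python
import numpy as np
from pandas import DataFrame, concat

# same reframing function as defined above
def series_to_supervised(data, n_lag=1, n_lead=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    for i in range(n_lag, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    for i in range(0, n_lead):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# toy data: 6 timesteps, 2 variables
values = np.arange(12).reshape(6, 2)
reframed = series_to_supervised(values, n_lag=2, n_lead=2)
print(list(reframed.columns))
# 3 surviving rows (2 dropped for lag NaNs, 1 for lead NaNs),
# 2 vars x (2 lag + current + 1 lead) = 8 columns
print(reframed.shape)  # (3, 8)
```

Note how the NaN-dropping trims rows at both ends: lag shifts create NaNs at the top, lead shifts at the bottom.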

```python
# reframe with both lag and lead steps
reframed = series_to_supervised(values, n_lag, (n_lead + 1))

# removing the future (t+n) dependent variable (Y)
# df_no is the original (unreframed) DataFrame
if n_lead > 0:
    drop_idx = [i for i in range(df_no.shape[1] * (n_lag + 1), reframed.shape[1], df_no.shape[1])]
    reframed = reframed.drop(reframed.columns[drop_idx], axis=1)
```

The above code drops the future Y (at t+n) while training the model. Once we drop the future Y and have the reframed data, it's as simple as training an LSTM for a regression problem.
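The column indices being dropped can be checked with plain arithmetic (the sizes below are illustrative): the reframed frame lays out one block of n_vars columns per timestep, so the future copies of Y (the first variable) sit at every n_vars-th column starting right after the (t) block.

```python
# illustrative sizes: 2 variables, n_lag=2, one lead step kept in the frame
n_vars, n_lag, n_lead = 2, 2, 1
total_cols = n_vars * (n_lag + 1 + n_lead)  # lag blocks + current block + lead blocks = 8

# future Y columns start right after the current-time block, one per n_vars columns
future_y_idx = [i for i in range(n_vars * (n_lag + 1), total_cols, n_vars)]
print(future_y_idx)  # [6], i.e. the var1(t+1) column
```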

```python
# splitting reframed into X and Y, considering the first column to be our target feature
X = reframed.drop(['var1(t)'], axis=1)
Y = reframed['var1(t)']

X_values = X.values
Y_values = Y.values

# n_predict being the test length
train_X, train_Y = X_values[:(X_values.shape[0] - n_predict), :], Y_values[:(X_values.shape[0] - n_predict)]
test_X, test_Y = X_values[(X_values.shape[0] - n_predict):, :], Y_values[(X_values.shape[0] - n_predict):]

# reshaping train and test to feed to the LSTM: (samples, timesteps, features)
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
```
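The reshape step simply adds a timestep axis of length 1, since all the history is already encoded as lag columns. A minimal sketch with made-up shapes:

```python
import numpy as np

# hypothetical: 100 training samples, 40 reframed features each
train_X = np.zeros((100, 40))
train_X_3d = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
print(train_X_3d.shape)  # (100, 1, 40) -> (samples, timesteps, features)
```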

Creating a simple LSTM model:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

opt = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)

model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.25))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.20))
model.add(LSTM(units=10, return_sequences=False))
model.add(Dense(units=1, activation='linear'))
model.compile(loss='mae', optimizer=opt)
```

Once the model is ready, we can train it on the train data and test it on the test set. The code below shows some of the training callbacks which can help train a good model.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard

# adding a few model checkpoints and callbacks
es = EarlyStopping(monitor='val_loss', min_delta=1e-10, patience=10, verbose=1)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=10, verbose=1)
mcp = ModelCheckpoint(filepath="/test.h5", monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False)
tb = TensorBoard('logs')

history = model.fit(train_X, train_Y, epochs=50, batch_size=10, callbacks=[mcp, rlr], validation_data=(test_X, test_Y), verbose=2, shuffle=False)
```

Once the model is trained, we can get the predictions for our test data:

`yhat = model.predict(test_X)`
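To judge the model, the predictions can be compared against test_Y with a couple of standard error metrics. A minimal sketch, using made-up numbers in place of the real yhat and test_Y:

```python
import numpy as np

# hypothetical predictions and ground truth standing in for yhat and test_Y
yhat = np.array([2.1, 2.9, 4.2])
test_Y = np.array([2.0, 3.0, 4.0])

mae = np.mean(np.abs(yhat - test_Y))           # matches the 'mae' training loss
rmse = np.sqrt(np.mean((yhat - test_Y) ** 2))  # penalizes large misses more
print(round(mae, 4), round(rmse, 4))
```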

**Summary**

In this article we saw what a multivariate timeseries is and how to use both lead and lag data to make the predictions. Some points to note while using this approach:

- As n_lead and n_lag increase, the number of features for a particular prediction also increases. For example, if we have 5 independent features at every timestamp and we consider n_lag=5 and n_lead=2, then the overall number of features post reframe will be `5 + 5*n_lag + 5*n_lead`, which in this case is 40 features.
- A good amount of training data is required, as using lag and lead reduces the number of training rows.
- The LSTM model architecture has to be chosen wisely to avoid overfitting, as the number of features increases or decreases every time we change n_lead and n_lag.
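The feature-count arithmetic from the first point can be written out directly:

```python
# 5 independent features per timestep, n_lag=5, n_lead=2
n_features, n_lag, n_lead = 5, 5, 2
total = n_features + n_features * n_lag + n_features * n_lead
print(total)  # 40
```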