LSTM for time series prediction

The objective of this article is to present a Python class with a simple, intuitive interface for modeling and predicting time series data using deep learning. Ideally, the reader should be able to copy the code presented in this article or in the GitHub repository, tailor it to their needs (for example, by adding more layers to the model), and use it in their own work.

All the code that is used in this article can be found here:

The data for this article can be found here:

The packages that are used for deep modeling are TensorFlow and Keras.

A time series is a sequence of numerical data points in successive order. These points are often measured at regular intervals (every month, every day, every hour, etc.). The data frequency used in this article is hourly, and it was measured from 2004-10-01 to 2018-08-03. The total number of raw data points is 121271.

Visualization of the time series:

The main objective of the deep learning algorithm for a given time series is to find a function f such that:

Yₜ = f(Yₜ₋₁, Yₜ₋₂, …, Yₜ₋ₚ)

In other words, we want to estimate a function that explains the current values of energy consumption based on p lags of the same energy consumption.
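As a minimal sketch of this framing, the snippet below builds the p lagged inputs for a toy series with pandas. The helper name `make_lags` and the toy values are illustrative assumptions and are not taken from the article's repository.

```python
import pandas as pd

def make_lags(series: pd.Series, p: int) -> pd.DataFrame:
    """Return a frame holding the target Y_t and its p lags Y_{t-1}..Y_{t-p}."""
    frame = pd.DataFrame({"y_t": series})
    for lag in range(1, p + 1):
        # shift(lag) moves values down, so row t holds the value from t - lag.
        frame[f"y_t-{lag}"] = series.shift(lag)
    # The first p rows have an incomplete history, so drop them.
    return frame.dropna()

toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical values
lagged = make_lags(toy, p=2)
print(lagged)
```

Each remaining row pairs one target value with the p values that preceded it, which is the shape of input that f is estimated from.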

First, we need to read the data:

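import pandas as pd

# Read the raw file, merge 'Date' and 'Time' into a datetime index 'dt',
# and treat the strings 'nan' and '?' as missing values.
df = pd.read_csv('household_power_consumption.txt', sep=';',
                 parse_dates={'dt': ['Date', 'Time']}, infer_datetime_format=True,
                 low_memory=False, na_values=['nan', '?'], index_col='dt')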

Attribute Information:

1. date: date in format dd/mm/yyyy

2. time: time in format hh:mm:ss

3. global_active_power: household global minute-averaged active power (in kilowatt)

4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)

5. voltage: minute-averaged voltage (in volt)

6. global_intensity: household global minute-averaged current intensity (in ampere)

7. sub_metering_1: energy sub-metering №1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).

8. sub_metering_2: energy sub-metering №2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.

9. sub_metering_3: energy sub-metering №3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.

Data cleaning before processing

1) Note that the data include 'nan' and '?' as strings. I converted both to NumPy NaN at the import stage (above) and treated them the same way.

2) I merged the two columns 'Date' and 'Time' into 'dt'.

3) In the same step, I also converted the data to a time-series type by using the time as the index.

df.head()
df.info()
df.dtypes
df.describe()
df.columns

Dealing with missing values ‘nan’ with a test statistic
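The text does not show how the missing values are filled. One common choice, assumed here purely for illustration, is to replace each NaN with the mean of its column. The toy frame below stands in for `df`, and its values are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real DataFrame (hypothetical values).
toy = pd.DataFrame({
    "Global_active_power": [1.0, np.nan, 3.0],
    "Voltage": [240.0, 242.0, np.nan],
})

# Count the missing values in each column before imputing.
print(toy.isnull().sum())

# Assumption: impute every NaN with the mean of its own column.
filled = toy.fillna(toy.mean())
print(filled)
```

Mean imputation keeps the series length intact, which matters for the fixed-interval resampling later on. Interpolation or forward filling would be reasonable alternatives.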

Machine Learning: LSTM data preparation and feature engineering

I will apply a recurrent neural network (LSTM), which is well suited to time-series and other sequential problems. This approach works best when a large amount of data is available.

To reduce the computation time and get a quick result for testing the model, one can resample the data to an hourly frequency (the original data are recorded every minute). This reduces the size of the data from 2075259 to 34589 rows but keeps the overall structure of the data, as shown above.
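A minimal sketch of this resampling step, assuming the hourly value is the mean of the minute readings in that hour. The toy frame `minute_df` is a hypothetical stand-in for `df`.

```python
import numpy as np
import pandas as pd

# Toy minute-level data standing in for the real frame: 3 hours of readings.
index = pd.date_range("2007-01-01 00:00", periods=180, freq="min")
minute_df = pd.DataFrame(
    {"Global_active_power": np.arange(180, dtype=float)}, index=index
)

# Average all minute readings that fall within the same hour.
hourly_df = minute_df.resample("h").mean()
print(len(minute_df), "->", len(hourly_df))
```

Resampling requires a datetime index, which is why the 'dt' column was set as the index when the file was read.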

  • Above, I showed the 7 input variables (input series) and the 1 output variable, 'Global_active_power' at the current hour (the time step depends on the resampling).
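One way to produce such a frame is sketched below: all 7 variables at time t-1 become inputs and 'Global_active_power' at time t becomes the output. The helper name `frame_supervised` and the random toy values are assumptions for illustration. The column names are the ones used in the dataset file.

```python
import numpy as np
import pandas as pd

def frame_supervised(data: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pair every variable at time t-1 with the target at time t."""
    inputs = data.shift(1).add_suffix("(t-1)")  # 7 lagged input columns
    output = data[[target]].add_suffix("(t)")   # 1 output column
    # The first row has no previous time step, so it is dropped.
    return pd.concat([inputs, output], axis=1).dropna()

columns = [
    "Global_active_power", "Global_reactive_power", "Voltage",
    "Global_intensity", "Sub_metering_1", "Sub_metering_2", "Sub_metering_3",
]
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((5, 7)), columns=columns)  # hypothetical values

supervised = frame_supervised(toy, target="Global_active_power")
print(supervised.shape)
```

In practice the variables are usually scaled to a common range before this step, since they are measured in very different units.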

Splitting the data into train and test sets

First, I split the prepared dataset into train and test sets. To speed up training (for the sake of the demonstration), we train the model only on the first year of data and then evaluate it on the next 3 years of data.
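The split can be sketched as follows, assuming hourly rows in chronological order, 7 input columns followed by 1 target column, and one year taken as 365 * 24 rows. The helper name `split_and_reshape` and the random array are illustrative assumptions.

```python
import numpy as np

def split_and_reshape(values: np.ndarray, n_train: int):
    """Split rows chronologically; the last column is the target."""
    train, test = values[:n_train], values[n_train:]
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # LSTM layers expect 3D input: (samples, time steps, features).
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    return train_X, train_y, test_X, test_y

n_train_hours = 365 * 24  # first year of hourly data
# Hypothetical stand-in for the prepared dataset: 4 years, 7 inputs + 1 target.
values = np.random.default_rng(0).random((4 * n_train_hours, 8))

train_X, train_y, test_X, test_y = split_and_reshape(values, n_train_hours)
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

The rows are not shuffled before splitting, so the model is always evaluated on data that comes after its training period.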

Model architecture

1) LSTM with 100 neurons in the first visible layer

2) dropout 20%

3) 1 neuron in the output layer for predicting Global_active_power.

4) The input shape will be 1 time step with 7 features.

5) I use the Mean Absolute Error (MAE) loss function and the efficient Adam version of stochastic gradient descent.

6) The model will be fit for 20 training epochs with a batch size of 70.
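The architecture listed above could be written in Keras roughly as follows. This is a sketch under the stated settings, not necessarily the exact code from the repository, and the function name `build_model` is an assumption.

```python
from tensorflow import keras

def build_model(n_steps: int = 1, n_features: int = 7) -> keras.Model:
    """LSTM(100) -> Dropout(0.2) -> Dense(1), compiled with MAE loss and Adam."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),  # 1 time step, 7 features
        keras.layers.LSTM(100),                    # 100 LSTM units
        keras.layers.Dropout(0.2),                 # drop 20% of activations
        keras.layers.Dense(1),                     # predicts Global_active_power
    ])
    model.compile(loss="mae", optimizer="adam")
    return model

model = build_model()
model.summary()
```

With 3D training arrays named, for example, `train_X` and `train_y` (hypothetical names), fitting would then look like `model.fit(train_X, train_y, epochs=20, batch_size=70, validation_data=(test_X, test_y), shuffle=False)`.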

Final remarks

Here I have used an LSTM neural network, a widely used architecture for sequential problems.

To reduce the computation time and get some results quickly, I took the first year of data (resampled to an hourly frequency) to train the model and the rest of the data to test it.

I put together a very simple LSTM neural network to show that one can obtain reasonable predictions.

Moreover, the neural network architecture that I have designed is a toy model. It can be easily improved by adding CNN and dropout layers. A CNN is useful here because there are correlations in the data (a CNN layer is a good way to probe the local structure of the data).

For more articles like this, you can follow me on LinkedIn. The link is given below:

I am a Machine Learning Engineer currently pursuing my master's degree at the University of Ottawa. https://www.linkedin.com/in/iamharman/
