In many machine learning problems, the quantity we want to predict depends on clearly defined inputs, such as properties of the target, pixels of an image, etc. In time series, these independent variables are often unknown or do not exist.
For example, in predicting the demand for fresh vegetable products at a market, we don’t have a clear set of independent variables on which to fit a model. A market collects vegetables from various farmers in different places. Does the demand for a vegetable depend on the properties of the farm it was grown on, the temperature at that place, or the height and weight of the farmer? No, the demand for a vegetable does not depend on the farm or the farmer. The independent data is not easily available, and even if we try to find relationships between such independent variables and vegetable demand, those relationships are neither perfect nor clear.
Time series analysis is frequently used in such cases. The fundamental intuition behind time series forecasting is that the value of a variable at one time period depends on its values at previous time periods. Therefore, we analyze the series on the series itself. If you are walking the path of becoming a data scientist, you might have already come across the term Time Series, and you might have also realized the importance of Time Series Analysis and Forecasting. In this post, I will try to give a gentle introduction so that it can kickstart your learning.
- TOC
{:toc}
What is a Time Series?
As the name suggests, a time series is just a series of observations collected at different points in time. There exists correlation between observations collected at adjacent time points, so previous observations of a variable can be used to predict the same variable. This distinguishes time series data from general machine learning data, where the observations are collected at a single point in time.
Data collected over time represents a time series only if the observations depend on time. If the data is purely random in nature, forecasting future values is not possible, and such data is called white noise.
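As a quick illustration, here is a minimal sketch of white noise using NumPy: a sequence of independent random values with zero mean and constant variance, so past values tell us nothing about future ones.

```python
import numpy as np
import matplotlib.pyplot as plt

# White noise: independent draws with zero mean and constant variance.
# No trend, no seasonality, no autocorrelation -- hence unforecastable.
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=200)

plt.plot(noise)
plt.title("White noise")
plt.show()
```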
Univariate vs Multivariate time series
If only a single variable is varying over time, it is called a univariate time series. For example, the temperature of a room measured every hour. There are no other variables recorded, so predicting the temperature depends only on the temperature values recorded at previous time points.
If more than one variable is varying over time, it is called a multivariate time series. For example, if humidity is recorded along with temperature, then both temperature and humidity are considered in order to predict the temperature.
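As a small sketch with hypothetical readings, a univariate series holds one variable while a multivariate series holds several over the same time index:

```python
import pandas as pd

index = pd.date_range("2021-01-01", periods=5, freq="H")

# Univariate: only temperature varies over time.
temperature = pd.Series([20.1, 20.4, 21.0, 21.3, 21.1],
                        index=index, name="temperature")

# Multivariate: temperature and humidity recorded together.
multivariate = pd.DataFrame({"temperature": temperature,
                             "humidity": [40, 42, 41, 45, 44]})
print(multivariate)
```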
Note: Predicting the future is not the only goal of working with time series data. We can have different goals, which can mainly be categorized into analysis and forecasting.
Time Series Analysis
The main goal of time series analysis is extracting useful statistics from data in order to understand the nature and underlying causes of past behavior. It helps describe the available data and provides interpretation to understand the problem domain better. Time series analysis can help to make better predictions.
Time Series Forecasting
The main goal of forecasting is to build models on past data and use them to predict future observations, for example, predicting the number of births in a country based on data collected in past years. This is challenging because future observations are unavailable and must be predicted from what has already happened in the past.
Stationarity in Time Series
If all the statistical characteristics of the data, such as mean, autocorrelation and variance, do not vary with time, the time series is called stationary. In general, however, most recorded data is not stationary, i.e. its properties vary with time.
Analyzing such time series helps to understand patterns such as trend, seasonality, cyclicality and irregularity. Trend is the general direction in which the data changes as time passes. Seasonality is when a pattern recurs over fixed, regular time intervals. Cyclicality is when there are fluctuations around the trend; unlike seasonality, cycles may vary in length. Irregularity is when there are random fluctuations which are not systematic; these fluctuations cannot be controlled. These are called the time series components.
Most forecasting methods assume that the data is stationary, because stationary data is easier to predict. Therefore, it is important to convert non-stationary data to stationary before applying forecasting models.
Check out this post by Mehul Gupta, which explains why time series have to be stationary in more detail.
It is important to analyze these components carefully in order to better understand the problem during analysis or forecasting. Since it is difficult to see all the components in a time series, a method called decomposition can be used to identify them. These components can combine either in an additive way or in a multiplicative way.
An additive time series is one where the size of the fluctuations does not vary over time. The additive model is linear, and the seasonal pattern keeps roughly the same amplitude as time increases.
Time series = trend + seasonality + cyclicality + irregularity
A multiplicative time series is one where the variations or fluctuations in the data grow as time increases. The multiplicative model is non-linear, and the amplitude of the seasonal pattern increases or decreases over time.
Time series = trend * seasonality * cyclicality * irregularity
Time Series Decomposition
The purpose of decomposition is to identify and separate the components of a time series in order to perform better analysis and forecasting.
In general, the cyclical component is hard to separate, so it is grouped with the trend component to form a trend-cycle component. This is often simply referred to as the trend component, even though it may contain cyclical behavior.
Classical decomposition can be either multiplicative or additive. The seasonal_decompose() function from statsmodels can be used to perform classical decomposition. You need to specify whether the model is additive or multiplicative.
Below is an example which decomposes a dataset into its components: the original data, trend, seasonality and irregularity (residual).
Let’s first load the dataset and plot a simple graph:
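A minimal sketch, assuming the classic AirPassengers dataset saved locally as AirPassengers.csv with Month and Passengers columns (any monthly series with visible trend and seasonality will do):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly series, parsing the Month column as the datetime index.
series = pd.read_csv("AirPassengers.csv",
                     parse_dates=["Month"],
                     index_col="Month")["Passengers"]

series.plot(title="Monthly airline passengers")
plt.show()
```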
Since the variations are quite complex, we cannot see all the components clearly. Decomposing the series gives us a clearer picture. Let’s look at how we can decompose it using the seasonal_decompose() function:
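Continuing with the series loaded above; since the seasonal swings grow with the level of the series, a multiplicative model is the natural choice here:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose into observed, trend, seasonal and residual components.
result = seasonal_decompose(series, model="multiplicative")
result.plot()
plt.show()
```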
Now that you can see all the components, you can analyze them and remove any of them if not needed. For example, if you want to analyze the trend of stock data, you would need to remove the seasonality found in the data and the noise due to irregularity.
Let’s look at the basic steps to follow while performing a forecasting task:
Basic steps for Forecasting
- Defining the problem: Understanding the problem domain and clearly knowing the end goal of the forecast. One of the most important skills for a data scientist is being able to explain why a prediction is made and to present results properly. This is only possible with clear knowledge of who needs the forecast, why, and how it will be used.
- Data Collection: Collecting the past data related to the problem domain, gathering other important information from domain experts.
- Data preparation: This includes exploring the data to identify components like trend or seasonality; cleaning the data to fill missing values and remove outliers, if any; basic feature engineering to understand the relationships between features or to add new features; and resampling and data transforms to remove noise and improve forecasting.
- Modeling: This includes configuring the right forecast model for the data. Widely used time series models are Auto Regressive (AR) models, Moving Average (MA) models, Integrated (I) models, and combinations of these such as Auto Regressive Moving Average (ARMA) and Auto Regressive Integrated Moving Average (ARIMA) models. It is better to try models of different types, from simple to advanced approaches.
- Evaluation: A time series forecasting model can only be trusted through its performance at predicting the future. This may include testing the model on previous data by creating train-test splits and calculating the error, or waiting for new observations to occur and comparing them against the predictions. A minimal sketch of the hold-out approach follows below.
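This sketch reuses the monthly series loaded earlier; the ARIMA order is illustrative, not tuned:

```python
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA

# Hold out the last 12 observations -- never shuffle time series data.
train, test = series[:-12], series[-12:]

model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=len(test))

print("MAE:", mean_absolute_error(test, forecast))
```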
Applications of Time Series Forecasting
- Forecasting of agricultural commodity price
- Stock market analysis and forecasting
- Sales forecasting
- Forecasting supply chain components
- Weather forecasting
- Forecasting the birth rate in a country
and many more …
AR: Auto Regressive model
The Auto Regressive (AR) model is a specific type of regression model where the dependent variable depends on its own past values.
For example, consider stock price data. Today’s stock price is influenced by yesterday’s price. If the price (close price) of day t-1 is x dollars, the price of day t starts from x dollars. We can assume the price is determined by the following model:
$$x_t = c + \phi_1 x_{t-1} + \varepsilon_t$$

where $c$ and $\phi_1$ are constants, and $\varepsilon_t$ is white noise.
This model is called an AR model, and generally AR(p) is given by the following definition:

$$x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t$$
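As a sketch, an AR(p) model can be fit with statsmodels’ AutoReg (reusing the series from earlier; lags=3, i.e. p = 3, is an illustrative choice):

```python
from statsmodels.tsa.ar_model import AutoReg

# Regress x_t on x_{t-1}, x_{t-2}, x_{t-3}.
ar_model = AutoReg(series, lags=3).fit()
print(ar_model.params)  # estimated constant c and coefficients phi_1..phi_3

# Forecast the next 12 steps beyond the end of the series.
predictions = ar_model.predict(start=len(series), end=len(series) + 11)
```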
MA: Moving Average model
The Moving Average (MA) model works by analysing how wrong you were in predicting values at previous time periods in order to make a better estimate for the current time period. In other words, this model factors in the errors of previous observations.
Generally, MA(q) is given by the following definition:

$$x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

where $\mu$ is the mean of the series, $\theta_1, \ldots, \theta_q$ are constants, and the $\varepsilon$ terms are white noise errors.
The MA model complements the AR model by taking the errors of previous time periods into consideration, giving better forecast results.
The combined AR and MA model is called the ARMA model, and it is denoted ARMA(p,q).
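As a sketch, an ARMA(p,q) model is simply an ARIMA model with d = 0 (statsmodels dropped its separate ARMA class in favour of ARIMA); the orders below are illustrative:

```python
from statsmodels.tsa.arima.model import ARIMA

# ARMA(2, 1) == ARIMA(2, 0, 1): two AR terms, one MA term, no differencing.
arma_model = ARIMA(series, order=(2, 0, 1)).fit()
print(arma_model.summary())
```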
Estimating the ARMA(p,q) model hyperparameters
When selecting the parameters p and q, we must focus on the characteristics of the model. Each model has different characteristics in its autocorrelation function (ACF) and partial autocorrelation function (PACF). By looking at the ACF and PACF plots, we can identify the number of MA and AR terms, i.e. q and p respectively.
Autocorrelation
Correlation summarizes the relationship between two variables. When the correlation is calculated between time series observations and observations at previous time steps, it is called autocorrelation. The observations at previous time steps are called lags. The ACF plot is a bar chart of the correlation coefficients between the time series and lags of itself.
Partial Autocorrelation
The “partial” correlation between two variables is the amount of correlation between them that is not explained by their mutual correlations with a specified set of other variables. When the correlation between time series observations and observations at lag k is calculated after eliminating the effects of all shorter lags, i.e. 1, 2, …, k-1, it is called partial autocorrelation. The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself.
Characteristics of ACF and PACF
Now, the models have the following characteristics for the autocorrelation and partial autocorrelation.
- AR(p): As the lag gets large, the autocorrelation decreases exponentially, while the partial autocorrelation cuts off after lag p.
- MA(q): As the lag gets large, the autocorrelation cuts off after lag q, while the partial autocorrelation tails off to zero.
- ARMA(p,q): As the lag gets large, both the autocorrelation and the partial autocorrelation tail off to zero.
Using these characteristics, we can estimate the proper model.
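A minimal sketch of inspecting the ACF and PACF with statsmodels (reusing the series from earlier): a sharp cut-off in the PACF suggests the AR order p, while a sharp cut-off in the ACF suggests the MA order q.

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=24, ax=axes[0])   # look for the MA cut-off (q)
plot_pacf(series, lags=24, ax=axes[1])  # look for the AR cut-off (p)
plt.show()
```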
Stationarity
AR, MA and ARMA models require the data to be stationary. A stationary series has a constant mean and variance over time. But in the real world, most data is not stationary.
So how do we forecast a non-stationary series?
Well, here comes the ARIMA model, which works with data that is not stationary. The new term added in ARIMA is I.
Integrated (I)
Consider a non-stationary series which needs to be forecast. We can tell whether the data is stationary or not by studying its plot against time.
In such a plot it may be clearly visible that the mean is increasing over time, i.e. the series is not stationary. If this upward trend is eliminated, the series becomes stationary. The easiest way to do this is to consider the differences between consecutive time steps:

$$y_t = x_t - x_{t-1}$$
After applying this transformation, the series no longer shows an observable linear trend.
The transformed series is called the differenced series. This differenced series is used for forecasting instead of the original time series. This step of converting a non-stationary series to a stationary one gives rise to the new term I, which stands for Integrated. (Note: this has nothing to do with integration.)
In the above example, the data becomes stationary after first-order differencing, i.e. differencing the observations at consecutive time steps once. In some cases, though, the data remains non-stationary after first-order differencing, so the series may need to be differenced more than once to make it stationary.
d = 1: $$y_t = x_t - x_{t-1}$$

d = 2: $$y_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$$
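As a sketch, differencing is one line with pandas, and the Augmented Dickey-Fuller test from statsmodels gives a quick stationarity check (reusing the series from earlier):

```python
from statsmodels.tsa.stattools import adfuller

# First-order differencing; diff() leaves a NaN at the first position.
differenced = series.diff().dropna()

# ADF test: a p-value below 0.05 is commonly taken as evidence of stationarity.
p_value = adfuller(differenced)[1]
print("ADF p-value:", p_value)
```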
Hence, ARIMA has hyperparameters p, q and d:
- p - order of the AR model
- q - order of the MA model
- d - order of differencing
Seasonality
Note that the ARIMA model assumes the data is not seasonal. Variations that occur at specific regular intervals, such as weekly or monthly, make a time series seasonal; for example, the arrival of vegetables at a market or the sales of Diwali crackers. ARIMA does not work well for seasonal series.
There is an upgrade of the ARIMA model, called Seasonal ARIMA or SARIMA. It is represented as ARIMA(p,d,q) x (P,D,Q,m), with the parameters listed below, followed by a short sketch.
- p - trend AR order
- q - trend MA order
- d - trend difference order
- P - seasonal AR order
- Q - seasonal MA order
- D - seasonal difference order
- m - frequency of seasonality in timeseries
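A minimal sketch with statsmodels’ SARIMAX (reusing the monthly series from earlier, so m = 12; the orders are illustrative, not tuned):

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Trend orders (p, d, q) = (1, 1, 1); seasonal orders (P, D, Q, m) = (1, 1, 1, 12).
sarima = SARIMAX(series, order=(1, 1, 1),
                 seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(sarima.summary())
```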
To identify values for the seasonal model, the ACF and PACF plots can be analyzed by looking at the correlation at seasonal lag time steps, i.e. the lags at which seasonality occurs. Alternatively, this post explains how grid searching can be used across the trend and seasonal hyperparameters.
Another easy option is pmdarima’s auto_arima, which automatically finds the best-fitting model.
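A minimal sketch with pmdarima (reusing the monthly series; seasonal=True and m=12 tell the search to consider seasonal terms):

```python
import pmdarima as pm

# Search over (p, d, q) and seasonal (P, D, Q) orders, ranked by AIC.
model = pm.auto_arima(series, seasonal=True, m=12, trace=True)
print(model.summary())

forecast = model.predict(n_periods=12)  # 12 steps ahead
```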
Further Reading
- Fundamentals of Time series Data and Analysis
- ARIMA models for time series forecasting
- Summary of rules for identifying ARIMA models
- Notes on nonseasonal ARIMA models
- How does auto.arima() work?
Credits
Thanks for reading! And please feel free to connect with me via Twitter or LinkedIn. Feedback is always welcome.