Outlier Detection
Let’s take a moment to reflect. Both Kats and Greykite return detected data points for trend and changepoint detection. But are there other types of data points worth exploring?
One particularly intriguing concept is outliers in time series. As highlighted in Lotus Queen, an outlier is an object that significantly deviates from the rest of the dataset. In time series analysis, such points can provide valuable insights. For example, what caused the spikes in fraud detection traffic? Why are Sunday sales consistently lower than on weekdays? These are just a few examples.
Some outliers can be safely ignored, while others must be considered in subsequent modeling stages. Understanding the causes behind these anomalies is a critical step in deciding whether to exclude them or integrate them into your analysis.
In this section, you’ll explore outlier detection methods offered by Kats and Greykite, as well as Lady H.'s self-implemented methods! 😉
Kats Outlier Detection (Univariate Time Series)
Kats' outlier detection for univariate time series is based on the n * IQR (Interquartile Range) method. A data point is classified as an outlier if its value is lower than Q1−n×IQR (where Q1 is the first quartile) or higher than Q3+n×IQR (where Q3 is the third quartile). The parameter n is adjustable, allowing you to control the sensitivity of the outlier detection.

In more detail, Kats first decomposes the time series (using additive or multiplicative decomposition) to remove trend and seasonality, keeping only the residuals. It then detects outliers in the residuals using the n * IQR method. By default, Kats uses n = 3, but you can adjust this value through the iqr_mult parameter.
The detection itself is as simple as the 2 lines of code below. The time series here is sales data, which was shown earlier to fit a multiplicative decomposition better, so we use the multiplicative decomposition method and set iqr_mult to 5 so that only the more obvious outliers are flagged.
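As a rough sketch of what those two lines look like, assuming the sales data is already wrapped in a Kats TimeSeriesData object called sales_ts (the variable name and the exact decomposition string accepted by your Kats version are assumptions, not Lady H.'s actual code):

```python
from kats.detectors.outlier import OutlierDetector

# Decompose the series, then flag residuals outside Q1 - 5*IQR and Q3 + 5*IQR
outlier_detector = OutlierDetector(sales_ts, decomp="multiplicative", iqr_mult=5)
outlier_detector.detector()

# Timestamps of the detected outliers
print(outlier_detector.outliers)
```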
If we plot the detected outliers on the original time series, we can see that they do not all appear at a crest or trough. However, they do stand out more clearly from the rest of the data in the residuals. This also tells us that the crest or trough of a time series is not necessarily an outlier even though it has the most extreme values; it may simply be the result of seasonality or trend.

At the same time, Kats provides an outlier removal function with 2 options (a code sketch follows the list below):
The "no interpolation" option replaces detected outliers with NaN
The "with interpolation" option replaces detected outliers with linearly interpolated values
In the charts below, Lady H. has plotted the results from both options: the yellow line is the original time series, while the green line is the time series after Kats' outlier removal:

Again, we need to be cautious about outlier removal in time series data, since keeping some insightful outliers for later forecasting can bring more benefit than removing them.
🌻 Check detailed code in Kats Outlier Detection >>
Greykite Outlier Detection (Univariate Time Series)
Have to admit, Greykite doesn't have a mature outlier detection method. It only offers the ZscoreOutlierTransformer() function, which directly replaces detected outliers with NaN. This can be risky, since some outliers benefit later model forecasting and are better left in place. Moreover, it only applies to univariate time series.
Greykite detects outliers based on a z-score cutoff:
It first calculates the z-score for each observation using
z-score = abs(observation_value - population_mean) / population_std
If an observation's z-score is larger than the cutoff threshold, the observation is considered an outlier, and Greykite replaces it with NaN.
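A minimal sketch of how this transformer might be applied, assuming df is a pandas DataFrame containing only the numeric value column (the module path and the z_cutoff parameter name are recalled from Greykite's API and may differ slightly across versions):

```python
from greykite.sklearn.transform.zscore_outlier_transformer import ZscoreOutlierTransformer

# Replace observations whose |z-score| exceeds the cutoff with NaN
transformer = ZscoreOutlierTransformer(z_cutoff=3.0)
df_cleaned = transformer.fit_transform(df)

# The rows that became NaN are the detected outliers
print(df[df_cleaned.isna().any(axis=1)])
```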
Here are Greykite's detected outliers if we set the cutoff threshold to 3.

Let's compare with Kats detected outliers output:

Greykite tends to choose the most extreme data points as outliers, but after removing the trend and seasonality, these points no longer look like outliers in the residuals. For univariate time series outlier detection, should we use the residuals rather than the raw series? Kats vs. Greykite, which method makes more sense to you? Share your ideas here!
🌻 Check detailed code in Greykite Outlier Detection >>
Lady H.'s Self Implemented Outlier Detection (Multivariate Time Series)
While Kats does offer multivariate time series outlier detection, it unfortunately didn’t work as expected. As shown in this chat, Lady H. encountered errors during her attempts to use Kats for multivariate outlier detection. After investigating, she discovered that the issue stemmed from bugs in Kats. Upon reviewing the basic logic behind Kats’ multivariate detection method, Lady H. concluded that implementing her own solution would be more efficient than debugging Kats. Moreover, she went a step further and developed additional methods for detecting outliers in multivariate time series! 😉
VAR for Multivariate Time Series Outlier Detection
Kats' implementation uses VAR (Vector Auto Regression) for multivariate time series outlier detection. Lady H. made some improvements on this idea, and the general process is:
Test whether the dataset is cointegrated.
Make sure every time series in the input is stationary.
Fit a VAR model with an optimal order, and apply the Durbin-Watson test to make sure there is no leftover pattern in the residuals of any time series.
Calculate squared errors with the fitted VAR, then calculate a threshold as threshold = avg + n * std. The outliers are the data records whose squared error is higher than the threshold.
Now let's dive into more details!
First, let’s understand how the VAR (Vector Auto Regression) model works. VAR is a statistical model designed to capture the relationships between multiple variables as they evolve over time. In a VAR model, each variable is expressed as a linear combination of its own past values and the past values of other variables in the system. When dealing with multiple time series that are influencing each other, the model represents them as a system of equations, with one equation for each variable (time series). To make its formula easier to grasp, Lady H. has provided three examples below.
In the 1st example, there are 2 variables, Y1 and Y2, and each variable is represented as a linear combination of its own lags (past values) and the other variable's lags. α is the intercept, β is a coefficient, and ε is the error term. The number of lags is the order of the VAR; in this example the past values only use t-1, so the order is 1. Similarly, in the 2nd example, the order is 2 because each variable uses 2 lags, t-1 and t-2. In the 3rd example, the VAR order is still 2, but there are 3 variables, Y1, Y2, and Y3.
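Written out explicitly (this is just the standard VAR(1) form implied by the description above, using the same α, β, ε notation), the first example's system of equations is:
Y1(t) = α1 + β11 * Y1(t-1) + β12 * Y2(t-1) + ε1(t)
Y2(t) = α2 + β21 * Y1(t-1) + β22 * Y2(t-1) + ε2(t)
The 2nd and 3rd examples follow the same pattern, with additional β terms for each extra lag and each extra variable.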

Before applying the VAR (Vector Auto Regression) model, it is essential to ensure that all input variables are stationary, as this is a fundamental requirement for VAR. Fortunately, this was already addressed in the previous data exploration section. After applying first-order differencing to the "Humidity" and "HumidityRatio" variables, each time series is both difference-stationary and trend-stationary.
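A minimal sketch of that check, assuming the multivariate data is in a pandas DataFrame df containing only the numeric series, with "Humidity" and "HumidityRatio" among its columns (the variable names are placeholders, not Lady H.'s exact code):

```python
from statsmodels.tsa.stattools import adfuller

# First-order differencing for the two non-stationary variables
df_diff = df.copy()
df_diff[["Humidity", "HumidityRatio"]] = df_diff[["Humidity", "HumidityRatio"]].diff()
df_diff = df_diff.dropna()

# ADF test on every series: a p-value below 0.05 suggests the series is stationary
for col in df_diff.columns:
    p_value = adfuller(df_diff[col])[1]
    print(f"{col}: ADF p-value = {p_value:.4f}")
```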

Next, we need to determine the optimal order for the VAR (Vector Auto Regression) model. Four model selection metrics are used to identify the best order:
AIC (Akaike Information Criterion): This metric estimates the prediction error of a statistical model. It includes a penalty term to account for overfitting when the number of parameters exceeds the optimal level. AIC is designed to approximate complex models that may not perfectly represent reality, often favoring models with more parameters.
BIC (Bayesian Information Criterion): BIC is used for model selection within a set of parametric models with varying numbers of parameters. Compared to AIC, BIC focuses only on true models and applies a stronger penalty for additional parameters.
FPE (Final Prediction Error): FPE measures the model's prediction error for new data. A model with a lower FPE achieves a better balance between the number of parameters and its explanatory power.
HQIC (Hannan–Quinn Information Criterion): Similar to AIC, HQIC includes a penalty term but imposes a stronger penalty, making it more conservative in selecting model complexity.
For all these metrics, lower values indicate a better-performing model.
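One way to print all four metrics for a range of candidate orders with statsmodels (the search range of 1 to 20 is an arbitrary choice for this sketch, and df_diff is the differenced DataFrame from the previous step):

```python
from statsmodels.tsa.api import VAR

model = VAR(df_diff)

# Fit a VAR for each candidate order and print its information criteria (lower is better)
for lag in range(1, 21):
    result = model.fit(lag)
    print(f"order={lag}  AIC={result.aic:.3f}  BIC={result.bic:.3f}  "
          f"FPE={result.fpe:.3f}  HQIC={result.hqic:.3f}")
```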

As observed in the output above, an order of 18 yields the lowest AIC and FPE values, while the BIC and HQIC values are also near their lowest. Based on these results, the optimal order is determined to be 18.
However, selecting the optimal order for VAR is not sufficient. It is crucial to verify that there are no remaining patterns in the residuals. To achieve this, we use the Durbin-Watson test, which checks for serial correlation to ensure that the model adequately explains the variances and patterns in the time series without leaving residual patterns unconsidered.
The Durbin-Watson test produces values in the range of [0, 4]. A value close to 2 indicates no significant serial correlation. Values closer to 0 suggest positive serial correlation, while values closer to 4 indicate negative serial correlation. So getting close to 2 is better.
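A minimal sketch of that check using statsmodels, applied to the residuals of the VAR fitted with the chosen order (this reuses model and df_diff from the sketches above):

```python
from statsmodels.stats.stattools import durbin_watson

# Fit the VAR with the selected order and run the Durbin-Watson test on each series
fitted_var = model.fit(18)
for col, dw in zip(df_diff.columns, durbin_watson(fitted_var.resid)):
    print(f"{col}: {dw:.2f}")  # values near 2 indicate no leftover serial correlation
```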

The output above looks good. Otherwise, more action would be needed, such as increasing the model's order, adding more variables, or switching to a different model.
Now we come to the core logic of the outlier detection, which takes only 2 lines of code, as shown in the detect_anomalies() function. The input is the squared errors of the fitted VAR; a threshold is then calculated as the average squared error plus n times the standard deviation of the squared errors. An outlier is any record whose squared error reaches or exceeds the threshold.
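A sketch reconstructing that logic from the description above (the detect_anomalies name comes from Lady H.'s code, but the signature and the way squared errors are combined across series are assumptions of this sketch):

```python
import numpy as np

def detect_anomalies(squared_errors, n=3.0):
    # threshold = average squared error + n * standard deviation of the squared errors
    threshold = np.mean(squared_errors) + n * np.std(squared_errors)
    # a record is an outlier when its squared error reaches or exceeds the threshold
    return squared_errors >= threshold, threshold

# Squared residuals of the fitted VAR, averaged across the series (one value per record)
squared_errors = (np.asarray(fitted_var.resid) ** 2).mean(axis=1)
is_outlier, threshold = detect_anomalies(squared_errors, n=3)
print(threshold, is_outlier.sum())
```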

The whole piece of code and a sample of detected outliers are shown below:

However, how to visualize the outliers of this multivariate time series still puzzles Lady H. Each time series in the data has quite a different scale, so we can't simply put them in one plot and mark the outliers; putting them in separate plots and marking the outliers still doesn't deliver any meaningful insight, since the whole idea is based on the relationships between multiple time series. Any suggestions on how to visualize this?
🌻 Check detailed code in VAR Outlier Detection >>
VECM for Multivariate Time Series Outlier Detection
VECM, or Vector Error Correction Model, extends VAR (Vector Auto Regression) by incorporating error correction mechanisms. It is particularly useful for analyzing systems with potentially non-stationary variables. VECM begins by estimating an unrestricted VAR model and then tests for cointegration among the variables using the Johansen test. Because of this, VECM is often referred to as a cointegrated VAR model.
The key advantage of VECM over VAR lies in its ability to provide more efficient coefficient estimates by accounting for the long-term equilibrium relationships between variables.
To apply VECM to multivariate time series outlier detection, the code is quite similar to VAR outlier detection; we just need to replace the VAR model with VECM 😉
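A minimal sketch of that swap with statsmodels' VECM, applied to the original (undifferenced) series since VECM handles non-stationary variables; the lag order, the Johansen test settings, and the reuse of detect_anomalies are assumptions of this sketch rather than Lady H.'s exact code:

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

# Cointegration rank from the Johansen test
rank = select_coint_rank(df, det_order=0, k_ar_diff=17, method="trace").rank

# Fit the VECM; k_ar_diff counts lagged differences (roughly the VAR order minus 1)
fitted_vecm = VECM(df, k_ar_diff=17, coint_rank=rank, deterministic="ci").fit()

# Reuse the same threshold-based detection on the VECM residuals
squared_errors = (np.asarray(fitted_vecm.resid) ** 2).mean(axis=1)
is_outlier, threshold = detect_anomalies(squared_errors, n=3)
```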

🌻 Check detailed code in VECM Outlier Detection >>
After applying both VAR and VECM for outlier detection, have you wondered how different their detection results might be? Lady H. was equally curious and ran a comparison, finding that only two records had different outlier detection results.
