Introduction to Time Series Analysis

As discussed in the data preparation chapter, time series usually represent the dynamics of some process, so the order of the data entries has to be preserved. As emphasised there, a time series is simply a set of data, usually events, arranged by a time marker; typically, the entries are placed in the order in which the events occur or are recorded.

In the context of IoT systems, there might be several reasons why time series analysis is needed. The most widespread are the following:

- anomaly detection, i.e., recognising abnormal system behaviour;
- classification of known behaviour patterns;
- forecasting of future system states.

Due to the diversity of the task, various algorithms might be used in anomaly detection, including those covered in previous chapters: for instance, clustering to find typical response clusters, regression to estimate the normal future state and measure the distance between the forecast and the actual measurements, and classification to label states as normal or abnormal. An excellent example of a classification-based method for anomaly detection is the Isolation Forest [3].
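As a brief illustration, below is a minimal sketch of window-based anomaly detection with scikit-learn's IsolationForest; the series, the window length and the contamination level are made-up assumptions for illustration, not values from the cooling system example.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Made-up temperature log: stable around -18 with one injected spike.
    rng = np.random.default_rng(42)
    temperature = -18 + 0.3 * rng.standard_normal(1000)
    temperature[500:520] += 10  # the injected anomaly

    # Slice the series into overlapping windows so that each sample
    # describes a short fragment of behaviour rather than a single point.
    window = 20
    X = np.lib.stride_tricks.sliding_window_view(temperature, window)

    model = IsolationForest(contamination=0.05, random_state=0).fit(X)
    labels = model.predict(X)  # +1 = normal window, -1 = anomalous window
    print("anomalous windows start at:", np.where(labels == -1)[0])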

While most of the methods covered in previous chapters might be employed in time series analysis, this chapter illustrates the anomaly detection and classification cases through an industrial cooling system example.

A cooling system case

A given industrial cooling system has to maintain a specific temperature mode of around -18 °C. Due to the specifics of the technology, it goes through a defrost cycle every few hours to avoid ice deposits, which would lead to inefficiency and potential malfunction. At some point, however, a relatively short power supply interruption was noticed, which needs to be recognised in the future so that it can be reported appropriately. The logged data series is depicted in Figure 1:

Figure 1: Cooling System

It is easy to notice two standard behaviour patterns, defrost (the small spikes) and temperature maintenance (the data between the spikes), as well as one anomaly: the high spike.

One possible approach to building a classification model is K-nearest neighbours (KNN): whenever a new data fragment is collected, it is compared to the closest known fragments, and a majority principle determines its class. In this example, three behaviour patterns are recognised; therefore, a sample collection must be composed for each pattern. This might be done by hand since, in this case, the time series is relatively short.
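In code, composing such a collection by hand amounts to slicing the logged series into labelled fragments. The slice boundaries below are hypothetical and would come from visually inspecting Figure 1:

    import numpy as np

    # temperature: the logged cooling-system series as a 1-D NumPy array.
    # All slice boundaries here are hypothetical illustration values.
    defrost = [temperature[120:160], temperature[470:515]]       # small spikes
    maintenance = [temperature[200:400], temperature[550:750]]   # between spikes
    anomaly = [temperature[800:840]]                             # the high spike

    samples = defrost + maintenance + anomaly
    labels = (["defrost"] * len(defrost)
              + ["maintenance"] * len(maintenance)
              + ["anomaly"] * len(anomaly))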

Examples of the collected patterns (defrost on the left and temperature maintenance on the right) are presented in Figure 2:

Figure 2: Example Patterns

Unfortunately, in this example, only one anomaly sample is present (Figure 3):

Figure 3: Anomaly Pattern

A data augmentation technique might be applied to overcome this data scarcity: several new samples are produced from the given sample by applying Gaussian noise and randomly changing the sample's length (as a result, the original anomaly sample itself does not even have to be used for the model). Altogether, the collection of initial data is represented in Figure 4:
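A minimal sketch of such an augmentation step is given below; the noise level and the length-change range are illustrative assumptions that would have to be tuned on real data:

    import numpy as np

    rng = np.random.default_rng(0)

    def augment(sample, n_copies=10, noise_std=0.2, len_range=(0.8, 1.2)):
        # Produce noisy, randomly re-scaled copies of a 1-D pattern.
        copies = []
        for _ in range(n_copies):
            # Randomly stretch or shrink the sample via linear interpolation.
            new_len = int(len(sample) * rng.uniform(*len_range))
            x_old = np.linspace(0.0, 1.0, len(sample))
            x_new = np.linspace(0.0, 1.0, new_len)
            resampled = np.interp(x_new, x_old, sample)
            # Add Gaussian noise on top of the re-scaled copy.
            copies.append(resampled + rng.normal(0.0, noise_std, new_len))
        return copies

    # e.g. replace the single anomaly fragment with ten synthetic ones
    augmented_anomalies = augment(anomaly[0])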

Figure 4: Data Collection

One might notice that:

- the collected samples have different lengths, and
- similar patterns are not aligned in time, so their characteristic shapes fall at different positions within the fragments.

The abovementioned issues expose the problem of calculating distances from one example to another, since comparing the sequences data point by data point will produce misleading distance values. To avoid this, a Dynamic Time Warping (DTW) metric has to be employed [4]. For practical implementations in Python, it is highly recommended to visit the tslearn library documentation [5].
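For instance, with tslearn the DTW distance between two sequences of different lengths can be computed directly (the two sequences here are made up for illustration):

    import numpy as np
    from tslearn.metrics import dtw

    # Two made-up fragments of different lengths with a similar shape.
    a = np.array([0.0, 0.5, 2.0, 0.5, 0.0])
    b = np.array([0.0, 0.2, 0.6, 2.1, 0.4, 0.1, 0.0])

    # A point-by-point comparison is not even defined here, since the
    # lengths differ; DTW first aligns the sequences, then measures distance.
    print(dtw(a, b))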

Once the distance metric is selected and the initial dataset is produced, the KNN classifier might be implemented. Given a “query” data sequence, its closest samples can be determined using DTW. As an example, a simple query is depicted in Figure 5:

Figure 5: Single Query

For the practical implementation, the tslearn package is used. In the following example, 10 randomly selected data sequences are produced from the initial data set. While the underlying data set is the same, none of the selected sequences have been “seen” by the model, due to the randomness of the augmentation. The results are shown in Figure 6:
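A sketch of how such a classifier might be assembled with tslearn is shown below; it reuses the samples and labels collections from the earlier sketches, and the number of neighbours is an illustrative choice:

    from tslearn.neighbors import KNeighborsTimeSeriesClassifier
    from tslearn.utils import to_time_series_dataset

    # Pack the variable-length fragments into one padded dataset;
    # tslearn's DTW handles the padding internally.
    X_train = to_time_series_dataset(samples)
    y_train = labels

    knn = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="dtw")
    knn.fit(X_train, y_train)

    # Classify a new "query" fragment, as in Figure 5
    # (query_sequence is a hypothetical 1-D array).
    print(knn.predict(to_time_series_dataset([query_sequence])))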

Figure 6: Multiple Test Queries

As might be noticed, the query samples (black) are somewhat different from the ones found to be “closest” by the KNN. However, thanks to the advantages of DTW, the classification is done perfectly. The same idea demonstrated here might be used for detecting unknown anomalies by setting a similarity threshold for the DTW distance, for classifying known anomalies as shown here, or even for simple forecasting.
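As a sketch of the threshold idea, a query whose DTW distance to every known pattern exceeds some limit can be flagged as an unknown anomaly; the threshold value below is an assumption that would have to be tuned on real data:

    import numpy as np
    from tslearn.metrics import dtw

    def classify_or_flag(query, samples, labels, threshold=5.0):
        # Label the query with the DTW-closest known pattern, or flag it
        # as an unknown anomaly if nothing is similar enough.
        distances = [dtw(query, s) for s in samples]
        best = int(np.argmin(distances))
        if distances[best] > threshold:
            return "unknown anomaly"
        return labels[best]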


[1] Hyndman, R. J.; Athanasopoulos, G. "8.9 Seasonal ARIMA models". Forecasting: Principles and Practice. OTexts. Retrieved 19 May 2015.
[2] Box, G. E. P. (2015). Time Series Analysis: Forecasting and Control. Wiley. ISBN 978-1-118-67502-1.
[3] "IsolationForest example". scikit-learn 1.5.2 documentation.
[4] Gold, O.; Sharir, M. (2018). "Dynamic Time Warping and Geometric Edit Distance: Breaking the Quadratic Barrier". ACM Transactions on Algorithms, 14(4). doi:10.1145/3230734.
[5] Tavenard, R.; Faouzi, J.; Vandewiele, G.; Divo, F.; Androz, G.; Holtz, C.; Payne, M.; Yurchak, R.; Rußwurm, M.; Kolar, K.; Woods, E. (2020). "Tslearn, A Machine Learning Toolkit for Time Series Data". Journal of Machine Learning Research, 21(118), 1-6.