
IoT Data Analysis


At their core, IoT systems are built as tools for gaining better insight into different processes and systems in order to make better decisions. The insight comes from measuring the status of system or process elements, represented as data. Without proper interpretation, however, this data is nothing more than useless bits and bytes, so providing a means of understanding data is an essential property of a modern IoT system. Today, IoT systems produce vast amounts of data, which is very hard to use manually. Thanks to modern hardware and software developments, it is possible to build fully or semi-automated systems for data analysis and interpretation, which may go further into decision-making and acting on those decisions.

As various resources state, IoT data in most cases complies with the so-called 5Vs of Big Data, where matching even one of them is enough to qualify as a Big Data problem. As explained by Jain et al. [1], Big Data may come in different forms, volumes and structures, and in general the 5Vs, i.e. Volume, Variety, Veracity, Velocity and Value, can be interpreted as follows:

Volume

This characteristic is the most obvious and refers to the size of the data. In most practical IoT applications, large volumes of data build up through the intensive production and collection of sensor data. Such data rapidly fills existing operational systems and usually requires dedicated IoT data collection systems to be upgraded or, preferably, developed from scratch.
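To get a feel for how quickly sensor data accumulates, a back-of-envelope estimate helps. The sketch below is illustrative only; the sensor count, sampling rate and payload size are assumptions, not figures from the text.

```python
# Back-of-envelope estimate of the daily data volume of an IoT deployment.
# All figures (sensor count, sampling rate, payload size) are illustrative
# assumptions chosen for this example.

def daily_volume_bytes(sensors: int, hz: float, payload_bytes: int) -> int:
    """Raw bytes produced per day by `sensors` sampling at `hz` Hz."""
    samples_per_day = int(hz * 60 * 60 * 24)
    return sensors * samples_per_day * payload_bytes

# 10,000 sensors, one sample per second, 64-byte payload each:
volume = daily_volume_bytes(10_000, 1.0, 64)
print(volume / 1e9, "GB/day")  # roughly 55 GB of raw data per day
```

Even this modest deployment produces tens of gigabytes per day, which explains why an ordinary operational database fills up quickly.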

Variety

As Jain et al. explain, Big Data is highly heterogeneous in terms of source, kind and nature. With different systems, processes, sensors and other data sources involved, variety is usually a distinctive feature of practical IoT systems. For instance, an intelligent office building system would need data from the building management system, from appliances and independent sensors, and from external sources such as weather stations or forecasts obtained via external weather forecast APIs (Application Programming Interfaces). The system might also require historical data from other sources, such as XML documents or CSV files, diversifying the sources even more.
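In practice, handling variety means mapping each heterogeneous source into one common record shape before analysis. The sketch below illustrates this for a CSV export and a JSON API response; the field names ("ts", "temp_c", etc.) and sample values are assumptions for illustration, not a prescribed format.

```python
import csv
import io
import json

# Hypothetical normalisation of readings from two heterogeneous sources
# (a CSV file export and a JSON API response) into one common record shape.

def from_csv(text: str) -> list:
    """Parse a CSV export with `timestamp,temperature` columns."""
    return [{"ts": r["timestamp"], "temp_c": float(r["temperature"]), "source": "csv"}
            for r in csv.DictReader(io.StringIO(text))]

def from_json(text: str) -> list:
    """Parse a JSON API response with `time`/`value` fields."""
    return [{"ts": r["time"], "temp_c": r["value"], "source": "api"}
            for r in json.loads(text)]

csv_data = "timestamp,temperature\n2024-01-01T00:00,21.5\n"
json_data = '[{"time": "2024-01-01T00:05", "value": 21.7}]'

# Both sources end up as uniform records that downstream analysis can share:
records = from_csv(csv_data) + from_json(json_data)
print(records)
```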

Veracity

Unfortunately, volume or diversity of data alone does not bring value; the data also needs to be reliable and clean. In other words, data has to be of good quality; otherwise the analysis might not add value for the system's owner, or might even compromise the decision-making process. This quality of data is what Veracity represents. In IoT applications, data quality is easily lost through malfunctioning sensors that deliver missing or false data. Since hardware is an essential part of IoT, the data must be preprocessed in most cases.
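A minimal form of such preprocessing is to discard readings that are missing or physically implausible. The sketch below assumes a temperature sensor; the plausible range and the fault values in the sample data are illustrative assumptions.

```python
# Minimal preprocessing sketch for the veracity problem: drop readings that
# are missing or fall outside a physically plausible range. The range limits
# below are an illustrative assumption for a temperature sensor.

PLAUSIBLE = (-40.0, 85.0)  # assumed operating range of the sensor, in °C

def clean(readings, lo=PLAUSIBLE[0], hi=PLAUSIBLE[1]):
    """Keep only present, in-range values."""
    return [r for r in readings if r is not None and lo <= r <= hi]

# None, -999.0 and 140.2 stand in for typical sensor faults:
raw = [21.4, None, 22.0, -999.0, 21.8, 140.2]
print(clean(raw))  # [21.4, 22.0, 21.8]
```

Real deployments usually go further (interpolating gaps, cross-checking redundant sensors), but even this simple filter prevents obvious faults from skewing the analysis.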

Velocity

Data velocity characterises data that is bound to time, whose importance holds only during a specific period or at a particular time instant. A good example is any real-time system, such as an industrial process control system, where reactions or decisions must be made within a fixed period of time, requiring data at particular time instants. In this case, the data has the nature of a flow of a particular density.
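A common way to handle such a flow is to process only a bounded time window of the most recent samples. The sketch below keeps a 60-second window over a stream; the window size and timestamps are illustrative assumptions.

```python
from collections import deque

# Sketch of a fixed time window over a data stream, as used in near-real-time
# processing. Timestamps are in seconds; the 60 s window is an assumption.

class TimeWindow:
    def __init__(self, span_s=60.0):
        self.span = span_s
        self.buf = deque()  # (timestamp, value) pairs, oldest first

    def push(self, ts, value):
        self.buf.append((ts, value))
        # Evict samples older than the window relative to the newest one.
        while self.buf and ts - self.buf[0][0] > self.span:
            self.buf.popleft()

    def mean(self):
        return sum(v for _, v in self.buf) / len(self.buf)

w = TimeWindow(span_s=60.0)
for t, v in [(0, 10.0), (30, 12.0), (70, 14.0)]:  # the t=0 sample expires at t=70
    w.push(t, v)
print(w.mean())  # 13.0 — the mean of the last 60 seconds only
```

The bounded buffer keeps both memory use and decision latency predictable, which is exactly what near-real-time systems need.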

Value

Since IoT systems and their data analysis subsystems are built to add value for their owners, the returned value should exceed the costs of development and ownership. If this condition does not hold, the system is of low or no value.

Dealing with Big Data requires specific hardware and software infrastructure. While there is a certain number of typical solutions and many more customised ones, some of the most popular are explained here:

Relational DB-based systems

These systems are based on the well-known relational data model and the corresponding database management systems, such as MS SQL Server, Oracle Server, MySQL, etc. Such systems have several advantageous features, for instance a standardised query language (SQL), mature tooling and strong transactional (ACID) guarantees.

Unfortunately, scaling out data writing is not always possible and, for commercial software products, is usually supported only at a high cost.

Figure 1: Relational DBMS scaling options
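A relational design for sensor data can be sketched with Python's built-in sqlite3 module. The table and column names below are assumptions for illustration, not a prescribed schema.

```python
import sqlite3

# Illustrative relational design for sensor readings, using the standard
# library sqlite3 module and an in-memory database. Table and column names
# are assumptions for this sketch.

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE reading (
    sensor_id INTEGER NOT NULL,
    ts        TEXT    NOT NULL,
    value     REAL,
    PRIMARY KEY (sensor_id, ts))""")

con.executemany("INSERT INTO reading VALUES (?, ?, ?)",
                [(1, "2024-01-01T00:00", 21.5),
                 (1, "2024-01-01T00:01", 21.7),
                 (2, "2024-01-01T00:00", 19.9)])

# The declarative query language is one of the strengths of this model:
avg = con.execute("SELECT AVG(value) FROM reading WHERE sensor_id = 1").fetchone()[0]
print(avg)  # 21.6
```

A single-node database like this handles queries well; the scaling limits discussed above appear when the write load outgrows one server.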

Complex Event Processing (CEP) systems

CEP systems are highly application-tailored, enabling high performance at a reasonable cost. Such performance is usually needed for processing data streams, such as voice or video. They make it possible to maintain a limited time window for data processing, which is relevant for systems that are close to real-time, although some common drawbacks also have to be considered.

Figure 2: CEP systems
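The essence of CEP is deriving higher-level events from patterns in low-level event streams. The toy sketch below detects a "sustained overheat" event when a threshold is exceeded for several consecutive samples; the threshold, run length and event name are illustrative assumptions.

```python
# Toy illustration of the CEP idea: a complex event ("sustained overheat")
# is derived from a pattern in the raw stream, here N consecutive samples
# above a threshold. Threshold and run length are illustrative assumptions.

def detect_overheat(stream, threshold=80.0, consecutive=3):
    """Yield the index at which each sustained-overheat event completes."""
    run = 0
    for i, value in enumerate(stream):
        run = run + 1 if value > threshold else 0
        if run == consecutive:
            yield i  # the pattern completed at this sample

samples = [70, 85, 86, 79, 81, 82, 83, 84]
print(list(detect_overheat(samples)))  # [6] — three readings above 80 end at index 6
```

Production CEP engines express such patterns declaratively and evaluate many of them concurrently over time windows, but the stream-to-event principle is the same.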

NoSQL systems

As the name suggests, the main characteristic of NoSQL systems is greater flexibility in data models, which overcomes the limitations of highly structured relational data models. NoSQL systems are usually distributed, and distribution is the primary tool for enabling this flexibility. In IoT systems, software typically ages faster than hardware, which requires maintaining many versions of communication protocols and data formats to ensure backward compatibility. Another reason is the variety of hardware suppliers, where some protocols or data formats are specific to a given vendor. NoSQL also provides means for scaling out and up, enabling high future tolerance and resilience. A typical approach is key-value or key-document storage, where a unique key indexes incoming data blocks or documents (JSON, for instance). Other designs might extend the SQL data model with others – object models, graph models, or the mentioned key-value models – providing highly purpose-driven and therefore productive designs. However, the complexity of the design raises data integrity problems as well as maintenance complexity.

Figure 3: NoSQL systems
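The key-document approach can be sketched in a few lines: documents of varying shape are stored under unique keys with no fixed schema enforced. The key format and field names below are illustrative assumptions.

```python
import json

# Minimal key-document sketch: incoming JSON documents of differing shape
# are indexed by a unique key, so a schema change between device firmware
# versions does not break ingestion. Keys and fields are assumptions.

store = {}  # key -> parsed document

def put(key, document):
    store[key] = json.loads(document)  # no fixed schema is enforced

# Two device generations reporting different fields into the same store:
put("dev42:2024-01-01T00:00", '{"temp": 21.5}')
put("dev99:2024-01-01T00:00", '{"temperature_c": 21.7, "battery_pct": 88}')

print(store["dev99:2024-01-01T00:00"]["battery_pct"])  # 88
```

This flexibility is what makes backward compatibility cheap, but, as noted above, it shifts the burden of data integrity onto the application.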

In-memory data grids

This is probably the most performant type of system, providing high flexibility and scalability. Because these systems are designed to operate in servers' RAM, in-memory data grids are the best choice for data preprocessing in IoT systems due to their high performance and ability to scale dynamically depending on the actual workload. They provide all the benefits of CEP and relational systems, adding scale-out functionality for data writing. There are only two major drawbacks – limited RAM and high development costs. Examples of available solutions include Apache Ignite, Hazelcast and GigaSpaces.
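The scale-out idea behind such grids is key partitioning: each key is hashed to a node, so both data and write load spread across servers. The sketch below uses plain dicts as stand-in "nodes"; the node count and hash scheme are illustrative assumptions, not how any particular product works.

```python
from zlib import crc32

# Sketch of key partitioning, the scale-out mechanism of in-memory data
# grids. Plain dicts stand in for the RAM of separate servers; the node
# count and CRC32-based routing are illustrative assumptions.

class MiniGrid:
    def __init__(self, nodes=3):
        self.nodes = [dict() for _ in range(nodes)]

    def _node(self, key):
        # A stable hash routes a given key to the same partition every time.
        return self.nodes[crc32(key.encode()) % len(self.nodes)]

    def put(self, key, value):
        self._node(key)[key] = value

    def get(self, key):
        return self._node(key).get(key)

grid = MiniGrid(nodes=3)
for i in range(100):
    grid.put(f"sensor-{i}", i)

print(grid.get("sensor-7"))          # 7
print([len(n) for n in grid.nodes])  # the 100 keys spread across 3 partitions
```

Adding a node increases both capacity and write throughput, which is the scale-out property the text attributes to these systems; real grids additionally replicate partitions for resilience.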

This chapter is devoted to the main groups of algorithms for numerical data analysis and interpretation, covering both the mathematical foundations and application specifics in the context of IoT, and is split into several subchapters.


[1] Jain, A., Mittal, S., Bhagat, A., Sharma, D. K. (2023). Big Data Analytics and Security Over the Cloud: Characteristics, Analytics, Integration and Security. In: Srivastava, G., Ghosh, U., Lin, J. C.-W. (eds.) Security and Risk Analysis for Intelligent Edge Computing. Advances in Information Security, vol. 103. Springer, Cham. https://doi.org/10.1007/978-3-031-28150-1_2