Based on the IoT data lifecycle discussed earlier, we divide an IoT data management system into i) an online (i.e. real-time) front-end that interacts directly with the interconnected IoT objects and sensors, and ii) an offline back-end that handles the mass storage and in-depth analysis of the IoT data. The data management front-end is communication-intensive, as it involves the propagation of query requests and results to and from sensors and smart objects. The back-end is storage-intensive, as it involves the mass storage of produced data for later processing, analysis, and more in-depth queries. Although the storage elements reside on the back-end, they interact with the front-end frequently via continuous updates and are thus referred to as online. The autonomous edges in the IoT data lifecycle can be considered more communication-intensive than storage-intensive, as they provide real-time data to certain queries.
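The front-end/back-end split described above can be illustrated with a minimal Python sketch. The class names `FrontEnd` and `BackEnd`, the callable-per-sensor model, and the `average` query are assumptions made for illustration only; they are not part of any architecture specified in this paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class BackEnd:
    """Storage-intensive side (hypothetical): keeps every reading for later analysis."""
    archive: List[Tuple[str, float]] = field(default_factory=list)

    def store(self, sensor_id: str, value: float) -> None:
        self.archive.append((sensor_id, value))

    def average(self, sensor_id: str) -> float:
        """Example of an offline query over the stored history of one sensor."""
        values = [v for s, v in self.archive if s == sensor_id]
        if not values:
            raise KeyError(f"no stored readings for {sensor_id!r}")
        return sum(values) / len(values)


@dataclass
class FrontEnd:
    """Communication-intensive side (hypothetical): propagates queries to sensors."""
    sensors: Dict[str, Callable[[], float]]  # each sensor is modelled as a callable
    backend: BackEnd

    def query(self, sensor_id: str) -> float:
        value = self.sensors[sensor_id]()     # request goes out, result comes back
        self.backend.store(sensor_id, value)  # continuous update of the back-end
        return value
```

In this sketch every real-time query answered by the front-end also updates the back-end store, which mirrors the sense in which the storage elements are described as online.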
This envisioned data management architecture differs considerably from existing database management systems (DBMSs), which are mainly storage-centric. In traditional databases, the bulk of data is collected from predefined and finite sources, and stored in relations in scalar form according to strict normalisation rules. Queries are used to retrieve specific “summary” views of the system or to update specific items in the database. New data is inserted into the database when needed, also via insertion queries. Query operations are usually local, with execution costs bound to processing and intermediate storage. Transaction management mechanisms guarantee the ACID properties (atomicity, consistency, isolation, and durability) in order to enforce overall data integrity. Even if the database is distributed over multiple sites, query processing and distributed transaction management are still enforced. The execution of distributed queries is based on the transparency principle, which dictates that the database is still viewed logically as one centralised unit, and the ACID properties are guaranteed via the two-phase commit protocol.
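The two-phase commit protocol mentioned above can be sketched in a few lines of Python. This is a simplified, in-memory illustration: the `Participant` class and its `can_commit` flag are hypothetical stand-ins for a database site and its local checks, and the sketch omits the logging, timeouts, and failure recovery that a real distributed DBMS would need.

```python
from typing import List


class Participant:
    """One database site; `can_commit` is a hypothetical stand-in for local checks."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if the transaction committed at every site, else False."""
    # Phase 1 (voting): ask every site to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single negative vote aborts the transaction at all sites, which is how the protocol preserves atomicity across a distributed database.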
In IoT systems, the picture is dramatically different, with a massive and ever-growing number of data sources that include sensors, RFID tags, embedded systems, and mobile devices. In contrast to the occasional updates and queries submitted to traditional DBMSs, data is constantly streaming from a multitude of edge devices to the IoT data stores, and queries are more frequent and have more versatile needs. Hierarchical data reporting and aggregation may be required to guarantee scalability as well as to enable more prompt processing. The strict relational database schema and the practice of relational normalisation may be relaxed in favour of more unstructured and flexible forms that adapt to the diverse data types and sophisticated queries. Although distributed DBMSs optimise queries based on communication considerations, their optimisers base their decisions on fixed and well-defined schemas. This may not be the case in the IoT, where new data sources and streaming, localised data create a highly dynamic environment for query optimisers. Guaranteeing in IoT data management systems the transparency requirements imposed in distributed DBMSs is challenging, if not impossible. Furthermore, transparency may not even be required in the IoT, because innovative applications and services may require location and context awareness. Maintaining ACID properties while executing transactions is manageable in bounded IoT spaces (subsystems), but is challenging in the more globalised space. However, how mobile data sources and the data they generate can be incorporated into the already established data space is a novel challenge that IoT data management systems have yet to address.
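The hierarchical reporting and aggregation mentioned above can be sketched as follows. The sketch assumes, purely for illustration, that each intermediate node (for example a gateway) forwards a compact summary of its readings instead of the raw stream; the `Summary` type and the `summarise` and `merge` functions are hypothetical names, not an interface defined in this paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Partial aggregate that a node forwards upward instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarise(readings: Iterable[float]) -> Summary:
    """Aggregate raw readings at the lowest level of the hierarchy."""
    values = list(readings)
    if not values:
        raise ValueError("cannot summarise an empty set of readings")
    return Summary(len(values), sum(values), min(values), max(values))


def merge(summaries: Iterable[Summary]) -> Summary:
    """Combine child summaries at a higher level without seeing raw data."""
    parts = list(summaries)
    if not parts:
        raise ValueError("cannot merge an empty set of summaries")
    return Summary(
        count=sum(p.count for p in parts),
        total=sum(p.total for p in parts),
        minimum=min(p.minimum for p in parts),
        maximum=max(p.maximum for p in parts),
    )
```

Because count, sum, minimum, and maximum can each be combined from partial results, every level of the hierarchy sends a fixed-size message upward regardless of how many devices lie beneath it, which is the scalability benefit that motivates this design.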