1. Introduction
Energy security is considered a major challenge for maintaining sustainable economic development in numerous countries. The construction and building sectors consume the most energy globally. Moreover, recent public statistics related to energy consumption around the world indicate that 22% of total energy consumption is attributable to residential buildings, distributed across different residential end uses (i.e., HVAC, lighting, water heating, cooking, and other appliances) [1]. In fact, the growth in energy demand leads to power supply degradation. Furthermore, the excessive use of fossil fuels can lead to extensive environmental impact. Therefore, developing new strategies to manage energy use and integrating renewable energy sources are a major focus for researchers worldwide.
Most governments around the world investigate the development of new strategies for optimizing energy use and protecting the environment. For example, the Moroccan government aims to achieve a 20% energy-saving potential by 2030 [2]. Accordingly, modern residential and commercial buildings integrate building energy management systems (BEMSs) as advanced methods for controlling different building services, with the aim of optimizing energy consumption and maintaining the occupants’ comfort. BEMSs integrate different services, such as lighting, shading, and HVAC. Moreover, advanced controllers, such as model predictive control (MPC) and generalized predictive control (GPC), can maintain good indoor air quality while providing potential energy savings in a specific area [3]. Several control methods can combine thermal data to efficiently control HVAC and ventilation systems [4]. Moreover, the integration of renewable sources into the existing grid poses a big challenge for control approaches and storage infrastructure. Therefore, many investigators have proposed different smart grid concepts with the intention of securing electricity for consumers and exploiting generated electricity efficiently [5]. In fact, poor management of a building’s active systems may considerably increase wasted energy. Therefore, BEMSs have been proposed by numerous studies in order to facilitate the control and management of energy use in buildings, where weather information and occupant impact are considered key parameters [
6,
7,
8,
9]. In addition, building information models play an important role in BEMSs, where different types of information (e.g., geometric information, physical properties, and component information) can be integrated and considered for optimally computing essential building services such as thermal efficiency and lighting intensity. The main aim of smart buildings is to integrate physical and computational components to create an environment where energy is well exploited and the occupants’ comfort is preserved. Each building has unique characteristics. Therefore, developing real-time, adaptive approaches to environmental settings is a major challenge in smart buildings. Dealing with this issue requires the integration of new adaptive techniques and deep insight into control modeling. With recent advances in data processing and sensing networks, many industries and researchers have confirmed the potential of IoT as an enabler in the development of intelligent and context-aware services and applications [
10,
11]. Most studies demonstrate that building services require the integration of recent IoT and big-data technologies due to the complexity of large amounts of data that can be monitored and exploited [
12,
13,
14,
15]. These services can dynamically react to the changes in the environment and users’ preferences. Their main objective is to create a comfortable environment for occupants depending on their location and ongoing activities.
Typically, a complex sensing network connects home appliances. Therefore, monitoring and controlling home appliances and active systems with high performance require the integration of IoT and big data technologies in order to deal with the large amounts of unstructured data generated from objects in smart buildings, such as smart meters and other deployed sensors [
16,
17]. In fact, IoT refers to a network of interconnected devices capable of collecting and exchanging data. Big data refers to the vast and intricate datasets that originate from a variety of sources, including IoT devices. They are characterized by high volume, velocity, and variety, requiring advanced tools for processing and analysis [
18,
19,
20]. The multiple sources of data generated from deployed sensors require the integration of recent big data techniques such as artificial intelligence algorithms and distributed ledger technology. Additionally, data generated from buildings are mostly asynchronous, heterogeneous, and stored in different formats. Therefore, in order to manage building data properly, big data approaches must be integrated for processing large amounts of both historical and stream data to extract appropriate knowledge for making data-driven decisions [
21].
In summary, the large volume of complex data generated from smart objects in buildings necessitates the implementation of an effective IoT architecture to efficiently collect data and communicate between building appliances, together with the integration of recent big data technologies for processing already-stored or real-time stream data. Accordingly, the main purpose of this study is to illustrate the right path and propose an optimal architecture for managing building services with high performance. Furthermore, this study presents and discusses high-level data-driven approaches for data exchange, building data management, and real-time stream data processing. Further, existing works related to IoT and big data technologies are provided and compared in terms of their characteristics and performances.
The remainder of this paper is organized as follows:
Section 2 introduces the methodology followed for collecting and analyzing studies related to data sensing and processing approaches for smart building applications.
Section 3 presents an overview of the sensing technologies used for smart building applications.
Section 4 sheds more light on different big data technologies, including batch-based and stream-based techniques that can be integrated into building management systems. The discussion of and comparison criteria between stream processing techniques and the proposition of a global IoT and big data architecture platform are presented in
Section 5. Finally, Section 6 concludes the paper and outlines perspectives.
2. Materials and Methods
This section provides the methodology used for collecting, selecting, and analyzing studies related to the sensing and data processing approaches that can be integrated for improving smart building services management. The main objective of this work is to share efficient and novel technologies in order to help developers and researchers design an optimal IoT architecture to enhance the infrastructure deployment of new building management systems. This section presents a detailed review methodology.
2.1. Narrative Review Method
The use of IoT and big data technologies in smart buildings can help improve the building management system in order to reduce the energy consumption and maintain the occupants’ comfort. However, selecting the most relevant technologies for setting up an optimal architecture has still proven challenging in many studies related to smart buildings. This study follows a narrative literature review approach to provide a comprehensive overview of IoT and big data technologies that can be applied to smart building applications.
2.2. Data Collection Method
In order to perform this review, we collected relevant scientific articles, technical reports, and international standards from well-known databases such as IEEE Xplore, ScienceDirect, and Google Scholar. The selection criteria focused mostly on recent publications (from 2015 to 2024) related to energy performance, comfort indicators, and smart building technologies.
Several keywords have been used for this study, including “occupancy detection”, “buildings active services”, “buildings management systems”, and “IoT and big data for smart buildings”. The first search in Scopus yielded 1416 documents. Furthermore, another search was made using secondary keywords for extracting IoT and data processing approaches that can be integrated into building management systems.
2.3. Data Analysis Method
The data collected in this work were analyzed using qualitative research content analysis by systematically examining textual information in order to explore how IoT and big data technologies are applied to enable and improve smart building services. This method involved an in-depth examination of the selected literature to categorize the technological methods of enhancing key performance metrics and to highlight their relevance and applications across various smart buildings services (i.e., energy management, occupancy detection, and indoor air quality).
Qualitative content analysis was chosen due to its effectiveness in synthesizing diverse and interdisciplinary sources, especially in studies related to smart buildings. It allowed for the extraction of both explicit outcomes (e.g., energy savings, latency, and data accuracy) and implicit patterns (e.g., integration challenges, privacy concerns, and scalability issues). By clustering findings into thematic categories (i.e., sensor networks, data analytics techniques, system interoperability, and real-time decision-making), the review introduces a comprehensive overview of the current landscape. Despite this approach’s added value, it is important to acknowledge several limitations. Like several literature-based studies, the analysis is constrained by the reporting quality of existing studies. Many innovations may have been excluded if they were not presented and published in an accessible academic source.
Based on the research works presented in this review, a general architecture is proposed at the end of this work, with detailed descriptions of the different components that can be integrated into smart building applications. Furthermore, a detailed prototype platform of one of the important services in smart buildings, namely, occupancy detection, is presented by illustrating significant hardware/software settings in order to facilitate the deployment design of the IoT infrastructure for improving smart building services.
3. Sensing Technologies Used in Smart Building Services
New generations of buildings, or smart buildings, aim at integrating various methods of ensuring the comfort of building occupants while also ensuring that energy is well exploited. In fact, large amounts of data are generated daily from smart building-connected devices. Therefore, IoT, big data analytics, machine learning, and sensing techniques are cost-effective technologies that can be integrated into smart buildings with minimal infrastructure in order to facilitate the use of large amounts of data generated from building objects and to make optimal decisions for managing building services [
5]. Different types of sensors can be integrated for analyzing occupants’ behavior in buildings and, therefore, for studying the occupants’ preferences (i.e., visual and thermal) in order to introduce building automation and establish better control of active building systems [
22]. Various studies related to energy management in buildings have investigated the integration of IoT and big data technologies with the aim of managing the trade-off between energy consumption and occupants’ comfort. This section presents an overview of IoT integration into smart buildings, evaluating it based on diverse characteristics.
Indoor air quality is an important feature related to the comfort and well-being of building occupants. For example, the authors of [
23] proposed a general architecture prototype for real-time monitoring and processing; the aim of this work is to monitor and analyze the influence of CO2 on the SpO2 of building occupants. In fact, IoT platforms can help gather environmental data and act on building equipment in order to make building occupants more comfortable. The authors of [
24] present the usefulness of integrating IoT technology for regulating HVAC and determine the opening/closing patterns for windows based on indoor air quality data in order to improve the performance of students learning in a classroom. Moreover, advanced indoor air quality monitoring systems are presented in [
25] by integrating an open-source IoT web server platform named Emoncms for long-term monitoring and the storing of useful environmental data. Furthermore, the authors of [
26] integrated IoT techniques for monitoring indoor air quality data, including CO, NH3, CH4, NO2, and CO2, and machine learning for predicting different pollutant concentrations. In setting up a low-cost solution for monitoring indoor air quality data, CO2 is easy to measure and can be influenced by the building occupants. The authors of [
27] propose an IoT-based solution for monitoring real-time CO2 concentration in a closed area. An ESP8266 microcontroller, which comprises a Wi-Fi module, was integrated as a communication unit. Furthermore, the work presented in [
28] proposes a cloud-based IoT platform using a LoRa gateway for gathering indoor air quality data, such as PM2.5/PM10, CO, SO2, NOx, and O3 pollutants. Support vector machine regression (SVMR) and neural network algorithms were integrated and evaluated for predictive analysis. Moreover, different algorithms can be proposed for controlling building active systems based on different parameters such as occupancy information, air quality, and thermal parameters. For example, the authors of [
29] present a fuzzy controller for adjusting the working interval of the exhaust fan, with the use of an IoT platform for monitoring the CO2 level and conducting the control remotely. The work presented in [
30] concerns a simple approach for controlling HVAC systems in real time and achieving building occupants’ comfort. IoT technologies have been integrated to collect environmental thermal data. The authors of [
31] introduce an advanced approach in order to choose a suitable control strategy for ventilation systems. IoT devices and big data techniques were integrated for real-time data monitoring and processing. Moreover, the work presented in [
32] presents a smart home application for controlling HVAC and lighting equipment using the ESP8266 microcontroller and Blynk platform.
Several works related to energy management systems have demonstrated the importance of integrating IoT to enhance many building services, such as occupancy detection [
33,
34].
Table 1 presents the studies showing the importance of IoT and sensing technologies for improving building services by introducing different communication technologies that can be integrated into building equipment control. For example, the authors of [
35] present an occupancy detection algorithm using an IoT platform for collecting data from four environmental sensors (i.e., temperature, humidity, and CO2) to determine wasted spaces in the building. An IoT-based prototype has been proposed by the authors of [
36] for gathering occupancy information in the framework of energy building management. Lattice iCE40-HX1K stick FPGA boards and Raspberry Pi are the main elements integrated for counting people walking through the door. Further, the authors of [
37] present an IoT platform for collecting and processing environmental data to create a low-cost and accurate occupancy detection model. Moreover, the authors of [
38] propose a mobile air quality monitoring system (named e-nose) for different environmental parameters such as CO, CO2, NO2, PM10, humidity, and temperature in order to evaluate the impact of human activities on the indoor environment with the integration of different sensing techniques and cloud services.
Smart buildings are considered sociotechnical systems comprising various heterogeneous entities (e.g., lighting, HVAC, and RES) that must balance energy use and user comfort [
39]. Making living in buildings more comfortable requires the integration of advanced techniques allowing for the interaction of different building entities in order to perform appropriate actions (e.g., well-controlling HVAC and lighting), as depicted in
Figure 1. For example, occupancy detection and information can help control the building’s active systems [
40,
41,
42]. Different studies have investigated the development of advanced and intelligent strategies for making occupants’ lives more comfortable and therefore efficiently managing energy consumption. The main purpose of smart or intelligent buildings is the responsiveness to users’ preference needs. Integrating recent technologies is the main challenge of developing advanced building services in terms of reliability and performance [
10]. Several definitions have been proposed for smart buildings due to the evolution of technology. The integration of IoT and big data has become important for advancing smart approaches to building services [
39,
43]. The main purpose of an efficient building is to provide a comfortable environment while optimizing energy use and gas emissions. As depicted in Figure 1, different services (e.g., lighting control, air conditioning, solar energy generators, temperature/humidity monitoring, energy use monitoring, and cameras) can be integrated into smart buildings in order to establish the trade-off between occupants’ comfort and energy consumption [
16,
44,
Intelligent approaches and smart control strategies (e.g., automatic control, machine learning, and Q-learning) for building appliances (e.g., lighting, HVAC, and window opening) can assist in ensuring efficient energy use [
4,
46].
IoT technology is the main concept employed to enable building entities to collect data, run commands, and control equipment. Therefore, the introduction of IoT can be considered an opportunity for developing products that save energy costs and increase user comfort and work efficiency. This subsection presents, as generally as possible, the architecture and detailed layers adopted by industries and researchers. Generally, there is no standard IoT architecture. Diverse architectures have been proposed by different studies. The most adopted IoT architecture consists of four main layers, namely, the perception, network, data processing, and application layers, as shown in
Figure 2.
Deploying recent occupancy detection techniques requires the integration of advanced IoT architecture in order to deal with various numbers of connected devices deployed in distributed areas (e.g., offices, conference rooms, etc.) [
37,
41,
47]. As illustrated in
Figure 2, four main layers must be implemented in order to facilitate the deployment of advanced occupancy detection scenarios.
Perception layer: The perception layer (i.e., the physical layer) consists mainly of sensing techniques. Various types of sensors can be easily deployed or are already installed in the building. Recent studies have revealed an interest in using environmental sensors (e.g., CO2, temperature, humidity, and light intensity) in order to meet user privacy requirements, in contrast to camera sensors.
Network layer: The network layer is responsible for connecting all smart things and servers. As presented in
Figure 2, different protocols (e.g., HTTP and MQTT) and techniques can be integrated for transmitting and processing sensor data.
Data processing layer: Traditional methods are not sufficient for setting up smart building services on big data (e.g., due to latency and fault-tolerance limitations). The integration of big data techniques becomes a necessity for collecting and analyzing the numerous building data generated. Sophisticated IoT platforms such as Kaa and Thingsboard can provide advanced services, such as managing and integrating various IoT devices, and they are horizontally scalable and fault tolerant. Big data tools such as Spark and Apache Storm can process real-time big data streams in a distributed manner [
48].
Application layer: In order to efficiently use building systems such as HVAC, lighting, and ventilation, different control strategies have been developed for managing the use of these systems. For instance, building platforms could integrate MPC, GPC, and PID to efficiently control active equipment using environmental data and occupancy information [
44].
In order to set up and integrate occupancy detection systems in buildings, integrating IoT technologies becomes important to facilitate the collection and processing of data and, therefore, to act efficiently. Moreover, some works add more layers to their architecture to cover missing features, such as security and business layers [
49]. The aim is to facilitate the process of collecting and analyzing data in order to enable more efficient actions with the aim of reducing energy consumption and increasing the users’ comfort.
4. Data Processing Technologies for Smart Building Services
Most modern buildings include a diversity of sensors, actuators, and advanced connectivity techniques. Further, these buildings generate a huge amount of data every second. In fact, it is difficult to organize, analyze, and store such a large amount of data. Therefore, large-scale data processing techniques have become increasingly important for efficiently analyzing building data and generating decisions in order to better control active equipment. The aim is to process a myriad of generated data in an intelligent manner with high-performance computing [
39]. In the past, data storage was limited to spreadsheets and rows in database management systems. Information that could not be structured in row–column formats was disregarded [
50]. Big data technologies are designed for analyzing different types of data and predicting accurate results in a short period of time. Big data techniques include five main characteristics, named the five Vs (i.e., volume, variety, velocity, value, veracity), in order to deal efficiently with different kinds of generated data.
Volume: Storing and analyzing a large amount of data is the main challenge of big data technologies. In fact, developing big data solutions means no longer dealing only with gigabytes; the current challenge is dealing with zettabytes.
Variety: The data can be categorized into different types (i.e., structured, semi-structured, or unstructured). In fact, traditional database management tools are not sufficient to store and process large amounts of data of different types, and the data can take the form of documents, images, audio, logs, text, emails, or other formats.
Velocity: Generally, processing big data consumes substantial resources and time. The aim is to ensure the processing of data streams in a short time with high performance and continuity. In fact, most research work related to big data development refers to the velocity of data streaming collected from websites, sensors, and video streams.
Value: The value refers to how the data can be beneficial in decision-making. For example, the work related to energy efficiency in buildings requires the implementation of big data technologies for managing energy use and increasing human comfort [
31,
51].
Veracity: This refers to data (i.e., accuracy and noise) and availability. This factor can affect the analysis methods, the time needed, and data dependency.
According to several studies and norms related to big data development, the above-mentioned characteristics do not constitute a definitive rule. Several other characteristics can be taken into consideration, such as variability (changeability in data flow), viscosity (degree of correlation or complexity), viability (ability to remain active), volatility (how long data remain valid and how long they should be stored, i.e., durability), and validity (whether data are understandable in order to find hidden relationships) [
52].
Combining wireless communication technologies and electronic devices (i.e., smartphones, tablets, sensors, and home appliances) led to the emergence of the IoT concept. As the development of IoT infrastructure grows, the need for platforms that manage the data generated by large numbers of devices has become more important. This phenomenon has resulted in the generation of copious amounts of data from diverse sources.
Big data technology is the best solution to the problem of valuing and exploiting the huge data generated from IoT. Furthermore, industries and organizations should take advantage of the massive amount of data collected. The quick growth of information technologies, such as smartphones, social applications, and machine-to-machine communication, has led to the development of big data technologies [
53]. Several studies have investigated the integration of IoT and big data for ensuring the development and implementation of different services such as energy building management, healthcare, and electrical vehicles. For example, in the healthcare field, some solutions have been proposed using IoT for remotely monitoring patients and analyzing the data collected from the embedded connected devices such as glycosometers and tension meters [
54]. Further, in the building sector, for the successful management of active equipment, the integration of IoT and data analytics techniques is greatly needed due to the big data generated from sensors deployed in different areas of residential or industrial buildings [
31,
47]. Recently, car companies have invested in new strategies to reduce the energy consumption of vehicle components (e.g., HVAC) and to deploy recharging stations, using the IoT to collect data and big data tools to analyze the behavior of vehicles and drivers in order to provide efficient action and control [
55,
56].
In fact, the IoT and big data development necessitate the integration of different technologies and frameworks in order to transfer and gather the generated data to be processed. The following technologies have been presented in many research projects and are used by different companies to deploy their own solutions.
- (1)
Fog computing techniques: Fog computing is an extension of cloud computing. It can be considered a mini data center integrated between cloud computing and the edge of the network. Moreover, it is a highly virtualized platform that offers storage, processing, and networking services between the edge and traditional cloud servers. Furthermore, according to much research work, fog computing is described as a layer that brings cloud computing closer to IoT devices. Any computing device with data processing and storage ability can be considered a fog node [
57]. Among the advantages of fog nodes is their ability to reduce the amount of data transferred to the cloud server, as presented in
Figure 3.
- (2)
Data aggregation and ingestion: In order to link IoT devices to the processing unit (e.g., Spark or Apache Storm), different connector techniques can be deployed and implemented to enable the connection between big data processing elements and the data collected from different sources, such as IoT devices and SaaS apps, ending up in different target environments. Data ingestion is an important technology that can help make sense of the increasing volume and complexity of data. In fact, two main methods can be adopted for data ingestion, namely, batch and real-time ingestion. Real-time data ingestion refers to a process of collecting and transferring data to be processed directly from sources (see Figure 4). Real-time ingestion is an important technique for applications that are time-sensitive, i.e., when organizations need to react rapidly to new information (e.g., stock market trading and power grid monitoring). Also, a high-performance data pipeline is a crucial technique employed to help improve real-time data processing when systems are used to make rapid operational decisions and act on new insights. In contrast, batch ingestion transfers data as batches at defined time intervals. Ingestion in this case is useful when the user needs to gather a specific set of already-stored data on a daily basis, rather than for real-time decision-making.
Several techniques have been developed for data ingestion into different big data technologies (i.e., such as Spark, Storm, Samza, S4, etc.):
Apache Flume (
https://flume.apache.org/): This is a reliable and distributed service for efficiently collecting and transferring large amounts of streaming data and log data into the big data technologies employed, such as the Hadoop distributed file system (HDFS). It is robust, scales horizontally, and is fault tolerant, with tunable reliability mechanisms. It integrates a simple, extensible data model that enables online data processing techniques.
Apache Kafka (
https://kafka.apache.org/): This is a component of a data streaming platform used for collecting, processing, storing, and integrating data at scale. It has many use cases, including real-time pub/sub messaging, stream processing, and distributed streaming (a minimal producer sketch is given below).
Different technologies have been developed in order to improve the management of the large amounts of data that need to be stored or processed. Data ingestion is essential in order to ensure the transmission of well-structured data into a processing platform, whether for stream (i.e., near-real-time) or batch processing.
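As a concrete illustration of the ingestion step, the following minimal Python sketch publishes a sensor reading to a Kafka topic using the kafka-python client; the broker address, topic name, and payload fields are illustrative assumptions rather than settings taken from the reviewed works.

```python
# Minimal sketch: publishing a sensor reading to a Kafka topic with the
# kafka-python client. Broker address, topic, and payload fields are
# illustrative assumptions.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

reading = {"sensor": "co2_room_1", "ppm": 612, "timestamp": time.time()}
producer.send("building-sensors", value=reading)  # asynchronous send
producer.flush()                                  # block until delivered
```

A stream processing engine (or a consumer group feeding it) would then subscribe to the same topic to process the readings in near real time.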
4.1. Batches-Based Processing Techniques
Batch setting, or batch-based processing, involves procedures and operations performed on a large, static dataset, and the results are returned once processing is definitively completed. In batch processing, the data have already been stored on the server over a period of time, and the whole dataset can be analyzed and processed using different batch-based techniques such as Hadoop and Spark. For instance, weekly or monthly processing is required for billing systems and payrolls. Hadoop ecosystems and Apache Spark are the most popular technologies adopted to store, process, and analyze data in batches [
59].
Hadoop MapReduce: MapReduce is the main programming model and execution engine for analyzing massive data in the Hadoop framework. As a programming model, MapReduce includes two main functions, namely, map (Ki, Vi) and reduce (Kj, list(Vj)). Each map function applied to an input key–value pair (Ki, Vi) outputs intermediate key–value pairs (Kj, Vj). The intermediate values Vj are grouped by key in order to generate the pairs (Kj, list(Vj)). The reduce function applied to (Kj, list(Vj)) summarizes the entire dataset into pairs (Kk, Vk) after the shuffling phase [
59,
60,
61].
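To make the map/reduce key–value model concrete, the following minimal Hadoop Streaming sketch (written in Python) aggregates energy readings per room; the "room_id,kWh" record format is an illustrative assumption. The mapper emits (room_id, kWh) pairs:

```python
#!/usr/bin/env python3
# mapper.py - reads lines of the form "room_id,kWh" and emits key-value pairs
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    room_id, kwh = line.split(",")
    print(f"{room_id}\t{kwh}")
```

The reducer then sums the values per key, relying on Hadoop sorting the mapper output by key during the shuffle phase:

```python
#!/usr/bin/env python3
# reducer.py - sums kWh per room_id from sorted "room_id<TAB>kWh" lines
import sys

current_room, total = None, 0.0
for line in sys.stdin:
    room_id, kwh = line.strip().split("\t")
    if room_id != current_room:
        if current_room is not None:
            print(f"{current_room}\t{total}")
        current_room, total = room_id, 0.0
    total += float(kwh)
if current_room is not None:
    print(f"{current_room}\t{total}")
```

Both scripts would typically be submitted together with the Hadoop Streaming jar (hadoop jar hadoop-streaming.jar -input ... -output ... -mapper mapper.py -reducer reducer.py).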
Apache Spark: This is popular for simplifying the development of applications that can be executed in parallel on a distributed cluster in a fault-tolerant way. The most common programming language supported by Spark is Scala. Task implementations are based on the resilient distributed dataset (RDD), a read-only multiset of objects distributed over a cluster in a fault-tolerant manner [59]. The architecture comprises Spark Core and a set of libraries, and most of the functionalities provided by Apache Spark are built on top of Spark Core, which is considered the basis of the distributed and parallel processing of large amounts of data. On top of Spark Core, libraries such as Spark SQL, Spark Streaming, MLlib, and GraphX are utilized for data monitoring and processing [
62].
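A minimal PySpark sketch of the RDD-based batch model described above is given below; it aggregates stored energy readings per room. The HDFS path and the record format are illustrative assumptions.

```python
# Minimal PySpark batch job: sum the energy consumption per room from a set
# of stored CSV files. File path and "room_id,timestamp,kwh" format are
# illustrative assumptions.
from pyspark import SparkContext

sc = SparkContext(appName="batch-energy-aggregation")

readings = sc.textFile("hdfs:///building/energy/2024/*.csv")

totals = (
    readings.map(lambda line: line.split(","))
            .map(lambda cols: (cols[0], float(cols[2])))  # (room_id, kWh)
            .reduceByKey(lambda a, b: a + b)              # sum per room
)

for room, kwh in totals.collect():
    print(room, kwh)

sc.stop()
```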
The above-mentioned batch processing techniques can be integrated for setting up different scenarios related to energy management in buildings. Using data already stored in the past can help create the best approaches and methods for increasing profit and decreasing costs. As an example, several works have investigated the integration of batch processing techniques in order to forecast the behavior of building occupants using historical data from previous years, so as to develop the best planning and schedules of occupancy in a specific area (e.g., offices, floors, buildings, and conference rooms). The main purpose of such studies is to efficiently control the active equipment of buildings (e.g., turning HVAC and lighting systems on/off) in order to reduce the energy wasted in buildings.
Better control of building services requires real-time action. Several works investigate the use of environmental data in order to manage the use of the most energy-consuming equipment [
3,
4]. Analyzing all generated data from IoT and acting directly on this equipment necessitate the integration of stream data processing techniques in order to analyze continuously generated data with high performance (i.e., latency and accuracy) [
63].
4.2. Stream-Based Processing Techniques
Processing a myriad of generated data in batches is not sufficient when each new data instance should be processed rapidly in order to ensure a fast reaction. Specifically, the stream processing paradigm enables the online analysis of continuous and dynamic data. Therefore, stream data processing techniques have become more important to industries and researchers working on real-time services such as building energy management systems [
5]. The common streaming engines that have a large-scale adoption are Apache Spark Streaming (
https://spark.apache.org/), Apache Flink (
https://flink.apache.org/), and Apache Storm (
https://storm.apache.org/). In fact, several applications necessitate the integration of these data stream processing tools. For example, applications such as online video games must analyze in real time users’ actions and purchases collected from different sources.
Apache Storm: This is a distributed real-time processing system that can easily consume and process large amounts of stream data in order to ensure real-time processing. It includes several processing interfaces, such as its API and SQL stream processing. The architecture of Storm is composed of three essential elements: spouts as source units, and bolts and topologies as processing units. A spout is considered a source of streams; it consumes data from different sources, such as message queues and databases, and then transfers the tuples into the bolts of the topology without any processing tasks. Moreover, bolts consume the tuples from the spout, apply some tasks, such as aggregation, joining operations, or filtering, and then emit the new tuples to their output streams. The topology of a Storm application covers the multi-stage stream communication and computation of spouts and bolts, which are connected via stream groupings [
64].
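The following self-contained Python sketch illustrates the spout/bolt dataflow conceptually (it is not the Storm API itself, which is typically programmed in Java): a "spout" emits CO2 tuples and a "bolt" maintains a rolling average and emits a ventilation decision. The sample values and the 800 ppm threshold are illustrative assumptions.

```python
# Conceptual illustration of the spout/bolt dataflow (not the Storm API):
# tuples flow from a source (spout) through a processing unit (bolt).
from collections import deque


class Co2Spout:
    """Source of tuples; in Storm this would read from a queue or sensor feed."""
    def __init__(self, samples):
        self.samples = samples

    def next_tuple(self):
        for ppm in self.samples:
            yield {"co2_ppm": ppm}


class VentilationBolt:
    """Processing unit; consumes tuples, applies logic, emits new tuples."""
    def __init__(self, window=5, threshold=800):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def process(self, tup):
        self.window.append(tup["co2_ppm"])
        avg = sum(self.window) / len(self.window)
        return {"avg_co2": avg, "fan_on": avg > self.threshold}


# "Topology": wire the spout to the bolt and run the stream.
spout = Co2Spout([450, 620, 790, 910, 1020, 840])
bolt = VentilationBolt()
for tup in spout.next_tuple():
    print(bolt.process(tup))
```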
Apache Spark Streaming: Spark Streaming enables the real-time processing of data coming from different sources such as Flume, Kafka, and Amazon Kinesis. In fact, these processed data can be emitted to file systems and live dashboards. It is considered a micro-batch, general-purpose data stream processing engine and an extension of the Spark Core API. The main data abstraction in Spark is called the resilient distributed dataset (RDD). Furthermore, Spark Streaming offers the concept of the discretized stream (DStream) in order to process a continuous stream of data, represented as a sequence of RDDs [
65].
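The micro-batch DStream model can be illustrated with the following minimal PySpark Streaming sketch, which reads "room_id,kWh" lines from a TCP socket in 10-second micro-batches and aggregates them per room; the host, port, and record format are illustrative assumptions.

```python
# Minimal Spark Streaming (DStream) sketch: micro-batches of "room_id,kWh"
# lines read from a socket and aggregated per room. Host, port, and record
# format are illustrative assumptions.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="stream-energy-aggregation")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)
totals = (
    lines.map(lambda line: line.split(","))
         .map(lambda cols: (cols[0], float(cols[1])))
         .reduceByKey(lambda a, b: a + b)
)
totals.pprint()  # print each micro-batch result

ssc.start()
ssc.awaitTermination()
```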
Apache Flink: This is designed for the distributed processing and stateful computation of continuous data streams (i.e., unbounded or bounded data streams). It is also considered a generic-purpose data stream processing engine. Each dataflow or instance in Flink starts with one or several sources and ends in multiple sinks in order to submit data to different outputs such as file systems or databases [
66,
67]. Apache Flink can be integrated into different cluster resource managers such as Hadoop and Kubernetes, and it can also be set up to be executed as a stand-alone cluster.
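Assuming the PyFlink DataStream API is available, the following minimal sketch shows the source-to-sink dataflow just described on a small bounded stream of CO2 readings; the example values and the 1000 ppm threshold are illustrative assumptions.

```python
# Minimal PyFlink DataStream sketch (assumes the apache-flink Python package
# is installed): a bounded stream of CO2 readings is mapped to alert flags
# and written to a print sink.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

readings = env.from_collection([420, 650, 910, 1180])  # CO2 in ppm
alerts = readings.map(lambda ppm: (ppm, ppm > 1000))   # flag high readings
alerts.print()

env.execute("co2-alert-job")
```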
4.3. Big Data and Machine Learning
Industries need to gather useful information from data streams (e.g., social media, credit card transactions, and deployed sensors). Data collected from high-speed streams are more and more common in modern data ecosystems [
68]. The real-time processing of a huge amount of data is not easy. For example, web companies must integrate systems that are able to process each instance of incoming data, demonstrating high performance in terms of time of execution and memory use [
69]. Furthermore, machine learning approaches have had an impact on many types of applications and systems, such as speech processing, natural language processing, and computer vision. Batch-based learning techniques require a training phase on data already stored in file systems or databases. Different frameworks have been designed and developed for applying machine learning algorithms on big data as one batch. Distributed machine learning frameworks (e.g., MLBase and Mahout) have demonstrated their effectiveness in terms of variety and volume [70]. However, other frameworks have been developed in order to address the issue of applying machine learning techniques on data coming from different sources (e.g., social media, building sensors, and internet traffic data) as a data stream. In fact, streaming machine learning frameworks (e.g., massive online analysis (MOA)) also address the dimensions of variety and velocity [
69,
70].
Batch-based machine learning integration (Apache Mahout): Most of the work related to building energy management, such as occupancy detection systems, has investigated the integration of batch machine learning techniques. This means that the data should already be available in a specific database or file system in order to build an efficient model by training on these data. In this case, the model cannot be updated or improved using stream data. Further, Apache Mahout is an open-source framework that offers free implementations of scalable and distributed machine learning algorithms, including several approaches such as classification, clustering, and recommender engines. The authors of [
71] present the Hadoop ecosystem architecture, which includes several libraries, among them the Mahout library, which is suitable for applications requiring large-scale data processing executed on top of Apache Hadoop.
Figure 5 presents the global architecture of Hadoop ecosystems presented by [
71], including the Mahout Library. In summary, the main purpose of Apache Mahout is to execute and apply machine learning algorithms on data already stored in HDFS.
In fact, batch machine learning is extremely important for improving services that need to process historical data in order to make efficient decisions and to plan for the present or the future. However, many problems cannot be addressed with batch-based methods because the environment changes, and a static model becomes unserviceable over time.
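As an illustration of the batch workflow discussed above (train once on already-stored data, then deploy a static model), the following minimal scikit-learn sketch trains an occupancy classifier on historical environmental data; the CSV file, column names, and model choice are illustrative assumptions and not the method of any specific reviewed work.

```python
# Minimal batch-ML sketch: train a static occupancy classifier on stored
# environmental data. File name, columns, and model are illustrative
# assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("environmental_history.csv")        # stored readings
features = data[["co2_ppm", "temperature", "humidity"]]
labels = data["occupied"]                               # 0/1 ground truth

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)                             # one-shot batch training

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Once trained, such a model stays fixed in production, which is exactly the limitation that online learning techniques address in the next subsection.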
4.4. Online and Real-Time Machine Learning Techniques
Most developed solutions and frameworks related to machine learning techniques address just two out of three dimensions (i.e., variety and volume). In fact, some solutions have emerged for the three most popular dimensions (i.e., variety, volume, and velocity) in order to perform the stream mining of big data [
72]. The main idea of stream machine learning, or online machine learning, is to update the model continuously and improve performance as each instance arrives at the processing unit in real or near real time. In fact, batch-based machine learning algorithms are static; these algorithms exploit a large dataset to train and identify the model before moving it to production. In contrast, online learning algorithms process data incrementally as new data become available. Actually, stream learning is still developing its philosophy, and the documentation of its general properties is not yet as comprehensive as in batch learning. In this subsection, we discuss the two main frameworks that involve machine learning for data streams: Apache SAMOA, developed by Apache [
69], and StreamDM, by Huawei [
68].
Apache SAMOA (scalable advanced massive online analysis) is a distributed system for applying and programming streaming algorithms in order to develop new ML algorithms while avoiding the complexity of stream processing engines (SPEs); developers can integrate new platforms using the SAMOA APIs. The SAMOA project is divided into two main elements: SAMOA-Platform and SAMOA-API. The SAMOA-API can be integrated with different SPEs without any restriction, and if new SPEs are released, a new SAMOA platform module can be included. In order to perform ML algorithm development in SAMOA, five components are included in the global architecture: processing items, content events, streams, topologies, and tasks. Processors are computational units in SAMOA used for executing algorithms separately on a defined SPE. In fact, processing items (PIs) refer to the particular implementation of processors in different SPEs, such as a Bolt in Storm. Two different PI types are provided by the SAMOA community, namely, the entrance PI and the normal PI. The entrance PI consumes events from external sources or generates new events and transfers them to the normal PI to be processed and exploited for ML algorithms. The normal PI can send the processed event or a new event to the outgoing streams. The SAMOA framework facilitates the implementation of ML algorithms in a parallel way by specifying the parallelism hint (i.e., the number of PI instances at runtime during task execution), as presented in
Figure 6. Also, the stream can be considered as a communication tool between PIs (source PI and destination PI), as shown in
Figure 6. The content event carries the data transferred from one PI to another via a stream. The source PI sends an event to the destination PI via a stream, while the destination PI consumes the event coming from the source PI. In fact, in order to consume the content events coming from the source PI, the destination PI should integrate different “grouping” mechanisms, which define the stream route of content events. Three grouping mechanisms can be recognized in SAMOA, namely, shuffle grouping, all grouping, and key grouping. Moreover, the topology is a set of connected PIs and streams into which distributed streaming ML algorithms can be integrated in the SAMOA architecture. Tasks are related to ML integration and are often used for evaluating the performance of algorithms. For example, the prequential evaluation task is the most popular task in the SAMOA architecture. It is used for testing the performance of models and for exploiting the current instance to train and update the model [
69].
StreamDM is an open-source framework for applying machine learning algorithms on data streams. It is built on top of Spark Streaming. The main goal is to connect the Spark framework, using the Spark Streaming library, with online machine learning algorithms in a distributed system. The Spark environment already includes machine learning libraries such as MLlib and spark.ml. The aim of StreamDM is to develop advanced data mining and machine learning methods in order to apply them to large-scale data streams. In fact, StreamDM has been developed both for practical users, who only use the algorithms already provided by the framework without the need to develop new ones, and for researchers and developers, who can implement new algorithms in StreamDM. Spark Streaming is the main provider of data streams in StreamDM. It does not process streams instance by instance; the idea is to process mini batches of data, which increases latency. Further, StreamDM is mostly based on tasks that describe its functionality in mining data streams.
Figure 7 describes, as generally as possible, the workflow of an example StreamDM task named “EvaluatePrequential”. The task comprises four main classes. First, the Reader class reads data streams from discretized streams. In addition, the testing and training data streams are consumed by the Learner class, which contains the machine learning algorithms. Further, the predicted data are transmitted to the Evaluator class in order to evaluate the performance of the applied algorithm. Then, the results are sent by the Output Writer class to other data stream consumers such as file systems, consoles, or other tasks.
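The incremental test-then-train (prequential) loop that SAMOA and StreamDM implement at scale can be illustrated on a single machine with the following minimal sketch, written here with the Python river library as a stand-in (it is not part of either framework); the feature names and stream values are illustrative assumptions.

```python
# Minimal prequential (test-then-train) sketch with the river library:
# each arriving instance is first used for prediction, then to update the
# model. Feature names and values are illustrative assumptions.
from river import metrics, tree

model = tree.HoeffdingTreeClassifier()
metric = metrics.Accuracy()

# Hypothetical stream of (features, occupancy label) instances.
stream = [
    ({"co2_ppm": 455, "temperature": 21.0}, 0),
    ({"co2_ppm": 820, "temperature": 23.5}, 1),
    ({"co2_ppm": 980, "temperature": 24.0}, 1),
]

for x, y in stream:
    y_pred = model.predict_one(x)   # test on the incoming instance...
    metric.update(y, y_pred)
    model.learn_one(x, y)           # ...then train on it (prequential)

print(metric)
```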
5. Discussion
Most digital solutions exploit a huge amount of data generated every millisecond. The main aim is to process these data quickly with high performance. Furthermore, many applications and systems, such as building energy management systems, require the near-real-time processing of large amounts of data streams and the real-time acting on and controlling of a myriad of deployed devices simultaneously. Numerous open-source frameworks and systems have been designed and developed for processing data streams. In fact, generating significant knowledge from unbounded data in time has become more difficult. The variety of developed frameworks and systems presents a substantial challenge for researchers and developers in choosing the best elements and determining which framework is suitable for their use cases. There are many features of data stream processing that should be taken into account in order to make the right design and develop an appropriate approach.
The benchmarking of some services is an important tool that can be used for evaluating the performance of big data technologies, especially stream processing frameworks [
73]. Latency and fault tolerance are the main parameters that can help researchers and industries to choose an appropriate framework for their use cases. Several works have evaluated the performance of different stream processing frameworks. The aim is to choose a suitable system that can help facilitate the implementation of desirable solutions. Most works related to big data stream technologies focus on evaluating three popular data stream processing systems, namely, Apache Storm, Apache Spark, and Apache Flink. These systems can be integrated in order to process data streams with high performance. For example, the authors of [
74] set up a benchmarking method for testing and evaluating the performance of Apache Storm, Spark Streaming, and Flink in terms of throughput and latency. The results show that Apache Flink and Storm are close in terms of latency; however, Apache Spark has the highest latency. Moreover, the work presented in [
75] shows the potential of Apache Storm in terms of latency and node failure. The results of the experiment showed that Storm has better performance (i.e., by at least 50%) than Apache Flink and Apache Spark.
Table 2 presents the comparison criteria of the two most popular big data stream processing systems.
The main objective of this work is to design and choose the best elements in order to set up occupancy detection systems using environmental data generated from sensors deployed in buildings. Most building services require the timely control of active equipment; this is why real-time monitoring and the processing of large amounts of environmental data have become increasingly important for setting up effective control of building active equipment. In fact, occupancy information is considered the main input for applying different control approaches such as PID, MPC, and GPC to HVAC, ventilation, or lighting systems in order to save energy and maintain the users’ comfort.
Machine learning (ML) algorithms are the most popular solution that can help researchers and developers use environmental data for detecting indoor building occupancy, i.e., either the presence of occupants or their number. Most of the works related to occupancy detection systems use traditional ML algorithms. This means that the environmental data should already be collected and stored for exploitation in order to build the model using ML algorithms (e.g., SVM, ANN, and DT). However, dealing with environmental data is not easy because they are non-stationary and change over time. The main purpose of this paper is to design and develop an appropriate architecture platform using recent big data stream processing technologies, which can involve the integration of machine learning algorithms that can deal with data streams and non-stationary data. Based on the results presented in the literature review of works that address the evaluation of stream processing techniques, Storm remains the best framework for implementing occupancy detection systems in terms of latency and fault tolerance. In this work, the objective is to present a general architecture that can be integrated into a smart building for detecting occupancy based on environmental data using recent IoT and big data technologies. The idea is to use valuable tools in terms of performance in order to provide efficient building services.
Figure 8 illustrates the adopted architecture that can improve the performance of occupancy detection using IoT and big data technologies. The architecture includes four main layers.
Data sensing: Many types of sensors can be included in the proposed architecture. As an example, for collecting CO2 data, there are different embedded technologies that can help in collecting data from sensors in the environment. The main types of environmental data that can be used for detecting occupancy include CO2, temperature, humidity, lighting, and electricity consumption. Furthermore, infrared or camera sensors can be exploited to gather the true occupancy information, which can help to train the model (it is named the label).
IoT platform middleware: An IoT platform is a set of elements that enable developers to collect data from multiple sensors remotely while ensuring secure connectivity. The aim of each deployed IoT platform is fast communication between the different connected devices and the cloud, using advanced networking protocols such as MQTT and HTTPS. Different IoT platforms provide a free service for data acquisition and visualization. In fact, these platforms can be implemented in the cluster. For example, Kaa and Thingsboard are the most popular open-source IoT platforms that include multiple services related to data management.
Stream data processing techniques: In the proposed architecture, Storm is the most suitable due to its effectiveness in terms of fault tolerance and latency. Moreover, it is a distributed system and can support an advanced machine learning framework, such as Apache SAMOA.
Smart home services: Different applications can be developed for controlling active equipment in buildings. The aim is to manage building services such as HVAC and lighting systems in order to reduce energy consumption and maintain the occupants’ comfort. Different approaches proposed in the literature aim at efficiently controlling building devices (a minimal control-loop sketch is given below) [
29,
30,
31].
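As an illustration of the kind of control logic that can sit in this services layer, the following minimal discrete PID sketch regulates indoor CO2 toward a setpoint by adjusting a fan command; the gains, setpoint, and actuator mapping are illustrative assumptions (the reviewed works rely on PID, MPC, or GPC variants).

```python
# Minimal discrete PID sketch for the services layer: drive indoor CO2
# toward a setpoint by adjusting fan speed. Gains, setpoint, and the fan
# mapping are illustrative assumptions.
class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        error = measurement - self.setpoint      # positive when CO2 too high
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, min(100.0, output))      # clamp to 0-100 % fan speed


controller = PID(kp=0.1, ki=0.01, kd=0.05, setpoint=800.0)  # 800 ppm target
for co2_ppm in [850, 900, 870, 820, 790]:                   # sampled readings
    fan_percent = controller.update(co2_ppm, dt=60.0)       # 60 s sampling
    print(f"CO2={co2_ppm} ppm -> fan={fan_percent:.1f} %")
```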
Furthermore, a prototype platform was deployed in the laboratory in order to set up the occupancy detection use case by integrating the above-mentioned technologies for ensuring the real-time stream processing of data coming from deployed sensors.
Figure 9 illustrates the different elements and components that are integrated into the platform in order to set up an occupancy detection system. In the proposed scenario, multiple types of sensors and microcontrollers have been exploited as hardware parts for collecting and sending data to the remote server. In fact, the main hardware modules that have been deployed for data acquisition and monitoring in EEBLab are as follows.
The MH-Z14A is an infrared CO2 meter that can be integrated using either the UART or I2C protocol. The DHT22 is used for thermal comfort assessment by measuring indoor/outdoor temperature and relative humidity in a given area; its single-wire communication protocol was used for gathering data. Further, the Raspberry Pi 3 is considered the bridge, or edge node, of the deployed platform. Its main task is to gather data using the UART protocol and to filter/reorganize the sensors’ data to be transmitted to the server.
The NodeMCU is a low-cost Wi-Fi module supporting the 802.11 Wi-Fi Direct protocol and soft-AP with an integrated TCP/IP protocol stack; a fan module (12 V) is used to control the air quality in the room; and a PZEM-004T current sensor is deployed for measuring the consumption of different building appliances and machines (e.g., ventilation, HVAC, and lighting).
Regarding data transfer, communication techniques are required. The MQTT (Message Queuing Telemetry Transport) protocol is deployed on the Ubuntu server. It is mainly used to transmit data from the monitoring prototype to the gateway actor. It is worth noting that MQTT is a machine-to-machine connectivity protocol designed as an extremely lightweight publish/subscribe messaging transport. In fact, the sensing node samples the sensors’ data and sends them via MQTT to the gateway, which transports them to the server using the HTTP protocol. The deployed cluster includes the Apache Storm setup as a data stream engine for processing data coming directly from sensors, without storing them, and then sending back the decision results for the ventilation system control.
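A minimal sketch of the sensing-node side of such a pipeline is given below, publishing a JSON payload over MQTT with the paho-mqtt client; the broker address, topic, and payload fields are illustrative assumptions and not the exact EEBLab settings.

```python
# Minimal sensing-node sketch: publish a JSON sensor payload over MQTT with
# the paho-mqtt client. Broker address, topic, and fields are illustrative
# assumptions.
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()             # paho-mqtt 1.x constructor; 2.x also takes a CallbackAPIVersion
client.connect("192.168.1.10", 1883)  # MQTT broker on the Ubuntu server

payload = {
    "node": "eeblab-room-1",
    "co2_ppm": 640,                # e.g., from the MH-Z14A over UART
    "temperature": 22.4,           # e.g., from the DHT22
    "humidity": 41.0,
    "timestamp": time.time(),
}
client.publish("eeblab/sensors/room-1", json.dumps(payload), qos=1)
client.disconnect()
```

On the server side, the gateway or the stream processing topology would subscribe to the same topic to consume these messages and feed the occupancy detection logic.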
6. Conclusions and Perspectives
The integration of IoT and big data is a crucial solution for improving building services. For example, managing building equipment requires the efficient manipulation of the large amount of data generated from connected devices. In this work, we provide an overview highlighting the transformative role of IoT and big data technologies in smart building applications in order to improve operational efficiency, energy management, and occupant comfort. Many technologies have been presented and discussed: both batch-based and stream-based techniques for developing the optimal architecture for processing building data and setting up active services. The key findings demonstrate that IoT devices, integrated with big data analytics, enable the real-time monitoring and optimization of building systems, leading to significant energy savings and improved sustainability. In fact, stream processing technologies are the most suitable for processing building environmental data.
For example, Storm is classified by various studies as the best framework for processing data streams in terms of latency and fault tolerance. The main objective of this study was to design an effective architecture, including the best tools for processing large amounts of environmental data, in order to implement efficient building management systems that improve building services with the aim of reducing energy consumption and maintaining the occupants’ comfort. The technologies and platforms studied are the main means for researchers to improve the deployment of next-generation approaches, such as digital twins, for critical environment monitoring and anomaly detection. Future research should focus on addressing the remaining challenges by developing robust data security frameworks, improving interoperability between IoT devices, and exploring advanced data processing techniques like edge computing. Overall, the continued development of IoT and big data technologies holds great promise for the evolution of smart buildings, fostering more intelligent, energy-efficient, and responsive environments.