1. Introduction
Internet of Things (IoT) applications based on smart and distributed cameras have gained considerable attention over the last decades and have become very popular. Numerous applications require the integration of Wireless Vision Sensor Networks (WVSN), such as traffic monitoring in smart cities [1], security and inspection during production in Industry 4.0 [3], the diagnosis of diseases in healthcare [4] and self-driving cars in the automotive sector [5]. Continuous advancement across these sectors means that more and more applications require greater speed, greater autonomy, more data and a smaller system size. As a consequence, WVSN systems must have low latency and low power consumption.
The demands on WVSN make integrating image processing systems in IoT challenging because of the somewhat contrasting nature of image processing and IoT. On the one hand, image processing systems rely on high data volumes and require significant processing capabilities; on the other hand, the IoT relies on wireless communication and was developed with a focus on low-data-volume applications. This contrast defines the core challenges in designing image processing-based IoT systems.
To understand how the amount of data affects the processing in the camera node, we need to analyze the relationship between the application requirements and the node constraints. For example, a smaller size requirement (to reduce weight in drones or to hide the camera in vehicles) affects all camera components but especially the battery [6]; as a result, the autonomy of the camera could be compromised. In addition, a low-latency requirement in the IoT system demands a general increase in speed at the node, affecting both the processing and communication parts and resulting in an overall increase in energy consumption. Furthermore, handling more data means using more memory and increasing both the computational and communication loads, thereby affecting both energy and latency at the node. Therefore, the smart camera node is undoubtedly a crucial component of an IoT system, and the data captured and processed by the system are a main factor in its processing and communication requirements.
Image processing systems consist of a series of image processing tasks that are typically sequential, and the initial input image is reduced as it progresses through the image processing pipeline. As a result, the data volume changes across the processing stages, which is significant for an IoT node when node offloading is considered. From a node offloading perspective, the performance of an IoT node is defined by the inter-dependencies of the processing and communication components. In our previous work, we introduced our node offloading method, intelligence partitioning [7], which relies on the computational and communication inter-dependencies to extract the optimal partition point as a trade-off between energy efficiency and latency. Throughout that analysis, we considered the input image as predefined; hence, its characteristics such as size and content were not included in the performance analysis while searching for the optimal partition point. The aim of this paper is to analyze how changes in the input image, in terms of size and content, affect the choice of the optimal partition point for the node. To address this, we provide an analysis based on a traditional image processing system, which we implemented on a Raspberry Pi, measuring the energy consumption for different partition and input image configurations. Our aim is to achieve a better understanding of how to optimize data-intensive IoT nodes. Thus, we provide a method to study a broader design space for image processing systems with intelligence partitioning, which enables designers to choose appropriate input options.
As a motivating example, we propose an industrial biscuit inspection application. The system under analysis combines IoT technology with computer vision, so the factory benefits from the advantages offered by both. On the one hand, IoT technology allows the vision system to work wirelessly. It also provides the factory with the flexibility needed to carry out future adaptations of the production system, avoiding rewiring and costly installations. On the other hand, the vision system is responsible for detecting biscuits that do not meet the quality criteria (size and shape) and providing their coordinates. The camera node must capture images of the biscuits transported by a conveyor belt at a certain speed, so image processing time is vital. In addition, the autonomy of the system is important to avoid production stoppages due to battery changes. To meet the latency and power demands, we decided to apply system partitioning, where the challenge is to find the optimal partition point that guarantees low latency and low power consumption. Our hypothesis is that the characteristics of the image (size and content) have a non-linear effect on the latency and power of the system, thus affecting partitioning decisions. Through the study of this application, we intend to understand and explain the importance of image size in partitioning decisions, which, together with the type of processing platform, the type of communication and the type of optimization, will influence the latency and energy of each partition. The challenges found in the motivating example (reducing latency and power consumption at the node) are common to most wireless vision systems, where limited node resources force designers to adopt creative design solutions (such as partitioning).
The article is structured as follows. First, we analyze how the inter-task data amount varies along the processing chain with respect to changes in the image. Next, we analyze the dependency between the processing time of a task and the data handled by that task to see how changes in the amount of data affect the processing time. Then, we study and compare the latency and energy behaviors of the processing and communication components of each partition point in an example image processing system. Finally, we look at how changes in the image affect the optimal partitioning solutions based on the energy and latency optimization objectives at the node.
2. Related Work
The integration of image processing systems in the context of IoT continues to pose challenges. Energy and latency are the main focus of much of the current research effort [12] aimed at improving these systems. Smart cameras with integrated image analysis capabilities are among the most challenging IoT nodes. The data-intensive nature of such systems can impose high energy consumption and long delays in both the wireless communication and the processing tasks. Thus, deploying low-latency, low-energy data-intensive IoT nodes is a challenging task.
To address this, several approaches have been presented that rely on fine-tuned implementations to improve node latency and energy. Abas et al. [13] achieved energy efficiency in the node by adapting the activity of the camera to the remaining battery charge. In addition, their system only records and transmits information when events of interest are detected. Anagnostou et al. [14] presented an energy-aware hardware activation scheduler that activates functions as energy becomes available. Qurabat et al. [15] improved energy efficiency by proposing data reduction applied at two levels of network design: the sensor nodes and the gateway. Dai et al. [16] reduced latency at the edge nodes by improving a low-latency object detection algorithm. In recent years, techniques and technologies based on Artificial Intelligence (AI) have become increasingly established in the IoT field. Mohammadi et al. explained in their survey [17] how Deep Learning (DL) is already being applied in each layer of IoT systems (Device, Edge/Fog and Cloud). In this survey, the authors highlighted four methods and technologies to incorporate DL on IoT devices. From the point of view of computational efficiency, Network Compression and Accelerators stand out. From the energy perspective, Approximate Computing and Tinymote with DL are the most efficient. These works propose solutions to improve the efficiency of the node itself, whereas we address the problem at the system design level, taking into account the amount of data processed and communicated to gain an advantage in terms of latency and energy.
During the past decade, other works proposed node offloading by distributing the processing tasks between the node and a server. This approach opens up a new design exploration dimension in which image processing may be local, partially local or totally remote. Pinto et al. [18] analyzed three scenarios to determine when it is best to send data and when to process them locally. Khursheed et al. [19] investigated partitioning between hardware (Field-Programmable Gate Array or FPGA) and software (micro-controller), as well as between local and central processing, in order to find an optimal partition point that guarantees minimum energy consumption in the camera node. Imran et al. [20] decreased the execution time by parallelizing tasks on an FPGA. In their work, they also considered the partitioning of tasks in order to distribute the processing load between the node and the server. In [21], Motlagh et al. proposed an Unmanned Aerial Vehicle (UAV) as an IoT node in which energy savings are achieved by moving local (on-board) processing to a Multi-Access Edge Computing (MEC) node. Zhao et al. [22] presented a system partitioning approach for Convolutional Neural Networks (CNNs) that distributes the convolutional layers, which contribute greatly to inference latency. Although these works take different processing scenarios into account, they do not consider the effects on latency and energy of changes in the system input.
The aforementioned works offer solutions that undoubtedly contribute to the improvement and optimization of the camera node. However, these works do not consider changes in the image in their analysis. From a data point of view, an image can vary with respect to two dimensions: size and content. The size refers to the resolution of the image, which directly influences the number of pixels to be processed. Content refers to features in the image that are relevant to the application. For example, in an object identification task, the number of objects in an image might affect the processing and communication load, subsequently affecting the efficiency of the optimization methods proposed.
Many authors have studied the effect that changes in image size and content have on their systems. In [23], Wang et al. studied an image content weighting methodology to improve image quality assessment (IQA) algorithms. Fookes et al. [24] analyzed the impact of image resolution on facial recognition performance, showing that their proposed super-resolution methodology improves the recognition of low-resolution and noisy facial images. In [25], Yan et al. developed a model to map pedestrians at different resolutions, thereby reducing the average rate of false positives for detection in traffic scenes. In the work presented by Alhilal et al. [26], a low-complexity vision system was designed for the identification of objects in WVSN. The proposed architecture is characterized by low-power processing, thus achieving energy efficiency at the node. In [27], Gu et al. investigated the influence of viewing distance and image resolution on IQA performance. Ur Rehman et al. focused on efficient image delivery based on object detection [28]. Their work proposes a new object detection model that reduces false transmissions of images and transmits only image segments instead of complete images. The results showed considerable energy savings in the camera node compared to existing techniques. Romic et al. [29] evaluated the performance of a camera-based stairs detection system as a function of image resolution. The results showed that selecting an optimal resolution is essential to achieving a trade-off between precision and processing speed. Yazidi et al. addressed the challenge of data growth from the IoT and the generation of Big Data by different platforms [30]. The authors conducted a study of latency sensitivity to data size in order to measure the performance of the Apache Spark framework (a fast parallel processing system for Big Data) and evaluate its computational complexity. Despite these studies on system input changes, none of them addresses the impact of the size and content of input images on both latency and energy at the node. Nevertheless, the considerable research effort devoted to the effect of input image changes on implementations highlights the importance of the topic.
In summary, none of the related works studied the effect that changes in the input image could have on design decisions related to system deployment. Specifically, such decisions would be affected by how changes in the image propagate through the system and alter the energy and latency at different partitions. In this work, we address the problem of camera node optimization through intelligence partitioning [7]. Intelligence partitioning aims to find a suitable cut-off point that optimizes latency, energy, or both, considering processing and communication in the sensor node (Figure 1).
In this article, we rely on the previously introduced intelligence partitioning method but provide a more elaborate view of the problem that includes variations in input image size and content. This is because image size and content could affect the inter-task data amount along the processing chain, affecting the workload of the processing tasks and the node efficiency. Therefore, we explore the use of intelligence partitioning as a methodology for camera node optimization, thereby contributing to the problem of implementing data-intensive, low-latency, low-energy camera nodes.
3. Theory
3.1. Intelligence Partitioning/Node Offloading
Intelligence partitioning is a method developed by Shallari et al. to improve the energy efficiency of smart sensor nodes by analyzing the trade-off between processing and communication [7]. It focuses on the prospective energy consumption variation due to partitioning the processing trail at any given point, and allocating the processing tasks between the sensor node and a remote processing unit. By taking into consideration a variety of partition configurations and wireless communication technologies, it provides insight into the inter-effects of processing and communication in the overall energy consumption of the sensor node. However, the current approach only provides a partial view of the problem, because in the analysis of the optimal partitioning point, the size of the input data and subsequently the inter-task data amount between the processing tasks are considered fixed.
The system inter-task data amount during processing could vary for two reasons. The first concerns the content of the image. There are processing tasks that could be sensitive to the content of the image. This is because normally the objective of an image processing system is to detect objects in the image for subsequent counting, identification, classification, etc. Therefore, there should be variations in the data amounts during processing that result from the outputs of the system tasks that are dependent on the image content. The second concerns the initial stages of design, where the image size must be selected. This decision directly affects the data amount that the system must process and therefore should affect the inter-task data amount during processing too.
Because of the close relationship between data, time and energy, we think that data are the common factor in node constraints. For this reason, we believe that the choice of system input size could have an effect on subsequent partitioning decisions.
3.2. Communication Model
Intelligence partitioning assumes that several tasks can be executed in different locations. As a result, data have to be exchanged between these locations depending on the partitioning point. However, the energy and delay resulting from this transfer depend on the inter-task data amount to exchange at each task as well as the chosen communication technology. Several technologies have been discussed in the context of IoT or smart cameras.
In order to evaluate multiple technologies and their impact on partitioning as the data change, we employ a framework that models the data transfer of various IoT communication technologies. Krug and O’Nils [31] introduced a modular framework that allows us to evaluate and select the most suitable communication technologies for our system. The models are implemented in Matlab and calculate the latency and energy per data transfer for several communication technologies. The framework covers the functional level of sensing, which in our work corresponds to the camera node, and is thus viable for this task.
To calculate the energy and delay, Krug and O’Nils considered the communication technology, the resulting protocol-specific timing based on the data amount to transfer and corresponding real hardware transceivers. The amount of data to transfer is used to determine the number of packets to be sent by the transceiver and thus determines its activity. The energy consumption then depends on the resulting duration of each activity as well as the corresponding power consumption of the selected hardware. As a result, the models are able to provide the communication cost for an arbitrary data amount.
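To illustrate the principle described above (data amount, then number of packets, then activity duration, then energy), the following Python sketch computes latency and energy for an arbitrary data amount with a deliberately simplified packet model. It is not the Krug and O’Nils framework, which is implemented in Matlab and uses protocol-specific timing for real transceivers; the `Transceiver` fields, the single transmit/idle power split, and any numeric values are hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Transceiver:
    """Hypothetical radio parameters; not taken from any real transceiver datasheet."""
    max_payload_bytes: int    # payload carried by one packet
    overhead_bytes: int       # per-packet header/protocol overhead
    data_rate_bps: float      # physical-layer data rate in bit/s
    tx_power_w: float         # power drawn while transmitting
    per_packet_gap_s: float   # protocol idle/turnaround time per packet
    idle_power_w: float       # power drawn during that gap


def communication_cost(data_bytes: int, radio: Transceiver) -> Tuple[float, float]:
    """Return (latency_s, energy_j) to send data_bytes under a simple packet model."""
    if data_bytes <= 0:
        return 0.0, 0.0
    # The data amount determines the number of packets and thus the radio activity.
    n_packets = math.ceil(data_bytes / radio.max_payload_bytes)
    total_bits = 8 * (data_bytes + n_packets * radio.overhead_bytes)
    tx_time = total_bits / radio.data_rate_bps
    gap_time = n_packets * radio.per_packet_gap_s
    latency = tx_time + gap_time
    # Energy is the duration of each activity multiplied by the power drawn in it.
    energy = tx_time * radio.tx_power_w + gap_time * radio.idle_power_w
    return latency, energy
```

Because the packet count is a ceiling function of the data amount, cost grows in steps rather than perfectly linearly, which is one reason why the data amount at a partition point matters.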
In order to observe the impact of the communication component at the partition points, we chose a subset of models for this study. We analyze an image processing system with a relatively high data amount to transfer compared to traditional IoT use cases. Therefore, we selected the following communication technologies, which are able to handle this data amount: Bluetooth 5.0, 802.11n (Wi-Fi), LTE Cat.4, and LTE Cat.1. All of these technologies support higher data rates and are used in smart camera applications, with LTE Cat.4 corresponding to traditional smartphone-type communication. Other popular low-power communication technologies such as LoRa were not considered in this study, as they cannot handle the large data amounts required to send images or intermediate data. For these technologies, partitioning always results in complete in-node processing.
4. Methodology
We focus on sequential processing systems because the partitioning cut between the sensor node and the cloud server has to be a single directed edge. More general architectures can be transformed into an acyclic, sequential system by collapsing cycles and fork-join structures into single processing nodes. Thus, our work also applies to general processing architectures that can be transformed in this way into a sequential architecture, which is possible if the overall processing algorithm has a single input and a single output. Our method is therefore general, but limited by the fact that a partitioning cut can only be applied to directed edges that separate the architecture graph into two otherwise disconnected sub-graphs.
4.1. Optimization Problem
In an IoT system, the data are captured by the devices and sent to the cloud for analysis, processing or both. Once the cloud computes a solution, the data could be sent back to the same node or an actuator in order to achieve the system objective. In this article, we limit the scope of the optimization problem to the sensor node, assuming it does not expect data back. Therefore, we analyze both the time and the energy in order to optimize and offload the node.
As mentioned before, image processing systems usually involve a set of smaller tasks

$$T = \{t_1, t_2, \ldots, t_n\}$$

that form a complex processing function $F$. In a distributed system, a specific task $t_i$ of the function $F$ is not bound to a specific geographical location. Therefore, the actual execution of tasks is location independent [7]. Each task can be mapped to any node in the system: the camera node, a cloud server, or an intermediate fog computational resource. Formally, the distributed function $F$ has its functionality distributed between node and cloud so that

$$F = F_{node} \cup F_{cloud},$$

where the subsets $F_{node}$ and $F_{cloud}$ are the different clusters of tasks composing the function $F$ that are executed at the respective location. Because of this, both the latency and the energy of the tasks are also incurred at different entities. The mapping of the computational and communication load of a node onto the different computational resources (Figure 2) is defined as intelligence partitioning,

$$IP = \{L_{proc}, E_{proc}, L_{comm}, E_{comm}\},$$

where $L_{proc}$ and $E_{proc}$ are the node latency and energy due to processing, and $L_{comm}$ and $E_{comm}$ are the latency and energy due to communication from the node to the cloud.

Both latency and energy have a direct relationship with the data amount. The data transfers ($d_i$) between the processing tasks in the node make up the set $D$,

$$D = \{d_0, d_1, d_2, \ldots, d_n\},$$

where $d_0$ is the input image, $d_1, d_2, \ldots$ are the data between the processing tasks, and $d_n$ is the system output data. The data amount transferred between two computational layers can be expressed as

$$D_{link}(j) = d_j,$$

where $j$ is the position of the system partition cut.

Our optimization problem is focused on latency and energy; therefore, system partitioning must allow a reduction of one or both of these. The latency in the node ($L_{node}$) at a specific partition cut depends on two components: the computation and communication latencies. The computation latency ($L_{proc}$) can be defined as the accumulated processing time of the tasks ($t$) that make up the system from its start to the partition cut. The computation latency depends on the processing platform ($P$) but also on the data ($D$) generated by the processing algorithms. The communication latency ($L_{comm}$) refers to the time needed to transfer the data at the partition point ($d_j$), and it depends on the intrinsic characteristics of the communication technology ($C$). The node latency ($L_{node}$) is derived as

$$L_{node}(j) = L_{proc}(j) + L_{comm}(j) = \sum_{i=1}^{j} f^{L}_{proc}(t_i, P, d_{i-1}) + f^{L}_{comm}(C, d_j), \tag{6}$$

where $d_{i-1}$ is the input to task $t_i$, $d_i$ is its output, and $f^{L}_{proc}$ and $f^{L}_{comm}$ are the measurement or estimation functions for the processing and communication latency, respectively. Like the latency, the energy consumption in the node up to the partition cut has two components: the computation and communication energies. Therefore, the node energy ($E_{node}$) is derived as

$$E_{node}(j) = E_{proc}(j) + E_{comm}(j) = \sum_{i=1}^{j} f^{E}_{proc}(t_i, P, d_{i-1}) + f^{E}_{comm}(C, d_j). \tag{7}$$

In Section 5.3.3, we will consider three specific objective functions for our case study: minimizing latency only ($\min_j L_{node}(j)$), minimizing energy only ($\min_j E_{node}(j)$), and minimizing energy ($\min_j E_{node}(j)$) under delay constraints ($L_{node}(j) \le L_{max}$).
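Because the sequential chain has only $n + 1$ possible cut positions, Equations (6) and (7) and the three objectives can be evaluated by exhaustive search. The Python sketch below assumes the per-task processing latency and energy have already been measured for one specific input image and that a communication cost function (data amount in, latency and energy out) is available for the chosen technology; the function names, the objective labels and the `l_max` parameter are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A communication cost function maps a data amount to (latency_s, energy_j).
CommCost = Callable[[int], Tuple[float, float]]


def node_costs(task_costs: Sequence[Tuple[float, float]],
               data_amounts: Sequence[int],
               comm: CommCost) -> List[Tuple[float, float]]:
    """Return (L_node(j), E_node(j)) for every cut position j = 0..n.

    task_costs[i - 1] is the measured (latency, energy) of task t_i on the node;
    data_amounts[j] is d_j, with d_0 the input image and d_n the system output.
    """
    n = len(task_costs)
    if len(data_amounts) != n + 1:
        raise ValueError("expected data amounts d_0..d_n")
    results = []
    l_proc = e_proc = 0.0
    for j in range(n + 1):
        if j > 0:
            # Tasks t_1..t_j run on the node: accumulate their processing cost.
            l_proc += task_costs[j - 1][0]
            e_proc += task_costs[j - 1][1]
        # d_j is sent over the link at cut position j.
        l_comm, e_comm = comm(data_amounts[j])
        results.append((l_proc + l_comm, e_proc + e_comm))
    return results


def best_cut(costs: Sequence[Tuple[float, float]], objective: str,
             l_max: Optional[float] = None) -> Optional[int]:
    """Return the cut j for 'latency', 'energy' or 'energy_under_delay' (L_node <= l_max)."""
    cuts = range(len(costs))
    if objective == "latency":
        return min(cuts, key=lambda j: costs[j][0])
    if objective == "energy":
        return min(cuts, key=lambda j: costs[j][1])
    if objective == "energy_under_delay":
        if l_max is None:
            raise ValueError("l_max is required for the delay-constrained objective")
        feasible = [j for j in cuts if costs[j][0] <= l_max]
        return min(feasible, key=lambda j: costs[j][1]) if feasible else None
    raise ValueError(f"unknown objective: {objective}")
```

Since both the data amounts $d_j$ and the measured task costs change with image size and content, re-running the search with measurements from a different input image is a direct way to see whether the preferred cut moves.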
4.2. Method for Analyzing Algorithm Image Sensitivity
According to Equations (6) and (7), the optimal partition points are data-dependent. This means that both the latency and the energy of the partitions will vary based on the inter-task data amount processed before the partition cut and the data amount transmitted over the link at the partition cut.
The inter-task data amount through the system varies for two closely related reasons. The first is the dependence of the data amount between processing tasks. Once the image enters the system, it undergoes a series of transformations that reduce the data. This reduction occurs because each task exerts a reducing action that affects the inter-task data amount to a lesser or greater extent. The second is changes in the image. Normally, during the execution of the application, the image content changes, where content refers to the number of objects. The objective of a processing system is usually the recognition of objects for their subsequent labeling, counting, classification, etc., so some processing tasks are sensitive to these changes in the image. Consequently, a change in the number of objects could affect the data amounts at the outputs of these tasks, producing variations in the inter-task data amount along the processing chain. The image can also change in size, which likewise affects the data amount throughout the system. This is because the tasks at the beginning of the system process pixels in order to isolate information relevant to the application, so their output data amount changes with the size of the image. Although the size of the input image does not normally vary during the execution of the application, it plays a fundamental role in the design of the system. The number of objects is related to the image resolution, so a higher resolution (larger image size) allows more objects per image. On the other hand, a larger image size could increase communication and processing times, thereby affecting partition latency and energy.
Therefore, we focus on two aspects of the image: its size and its content in terms of the number of objects visible in the image.
We analyzed the image sensitivity of the tasks by observing their outputs while feeding multiple images of various sizes and numbers of objects into the system. We expect the output of an image-sensitive processing task to vary with the input image size, its content, or both. However, image processing systems may also contain tasks that are not image-sensitive. Accordingly, each algorithm has a characteristic image sensitivity that causes variations in the inter-task data amount, which in turn could play a role in partitioning decisions.
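The sensitivity test described above amounts to comparing a task's output data amount across controlled input variations. A minimal Python sketch, assuming each task's output size has been recorded for a grid of image sizes and object counts; the dictionary layout and the function name are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical record for one task: (image_pixels, n_objects) -> output data amount.
Measurements = Dict[Tuple[int, int], int]


def image_sensitivity(outputs: Measurements) -> Dict[str, bool]:
    """Classify one task as size-sensitive, content-sensitive, both, or neither.

    Only pairs of inputs that differ in exactly one factor are compared, so the
    effect of image size is not confused with the effect of the number of objects.
    """
    size_sensitive = False
    content_sensitive = False
    items = list(outputs.items())
    for (size_a, objs_a), out_a in items:
        for (size_b, objs_b), out_b in items:
            if out_a == out_b:
                continue
            if objs_a == objs_b and size_a != size_b:
                size_sensitive = True     # same content, different size
            elif size_a == size_b and objs_a != objs_b:
                content_sensitive = True  # same size, different content
    return {"size": size_sensitive, "content": content_sensitive}
```

A pixel-level task would typically be flagged as size-sensitive only, an object-level task (for example, one emitting per-object coordinates) as content-sensitive, and a task with a fixed-size output as neither.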
4.3. Processing Time Behavior Analysis
The variation of the inter-task data amount along the processing chain means that the data at the inputs of the algorithms ($d_{i-1}$) vary due to the dependency between the tasks. These changes should affect the time it takes to execute each of the system tasks. The processing time is an important aspect at a partition point because of its direct impact on overall latency and energy (Equations (6) and (7)). For this reason, the processing time behavior of the tasks with respect to the data amount could be another factor that influences the best partitioning solutions. We analyzed the processing time behavior by measuring the time each system task takes to process its input data, which varies depending on the input image and on the data amount dependency between the preceding tasks. We are interested in observing how the processing task times behave in order to analyze how they are influenced by changes in the input image (Table 1). Variations in processing time will increase or decrease the delay of the partitions according to the data amounts.
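One simple way to summarize how a task's processing time responds to its input data amount is a per-task least-squares fit. The Python sketch below assumes a linear model $t = a + b\,d$, which is a simplifying assumption for illustration only; a task whose time depends non-linearly on its input would need a different model. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_time_model(data_amounts: Sequence[float],
                   times: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of t = a + b * d for one task; returns (a, b).

    a approximates a fixed per-invocation cost and b the cost per unit of input data.
    """
    n = len(data_amounts)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two paired measurements")
    mean_d = sum(data_amounts) / n
    mean_t = sum(times) / n
    var_d = sum((d - mean_d) ** 2 for d in data_amounts)
    if var_d == 0:
        raise ValueError("input data amounts must vary")
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(data_amounts, times))
    b = cov / var_d
    a = mean_t - b * mean_d
    return a, b
```

A slope close to zero indicates a task whose processing time is largely insensitive to the data it receives, while a large slope marks a task whose contribution to the partition latency grows with image size or content.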
4.4. Partitioning Depending on System Architecture
In this work, we analyze the latency and energy of the partitions at different points in the processing chain. To do this, we first analyze the tasks of the system separately, extracting the amount of data at each task's input and output as well as the related processing time and energy. The partition points depend on the type of system architecture, which can be sequential or parallel. In a purely sequential system, the data amounts, latencies, and energies of the partition points correspond to the inputs and outputs of the processing tasks, which greatly simplifies the analysis of partitions. However, in a parallel architecture, the data amounts, latencies, and energies of the partition points do not necessarily correspond to the inputs or outputs of individual processing tasks. Therefore, analyzing the partitions of a parallel architecture requires a prior serialization step.
The serialization process consists of grouping parallel tasks into compound tasks. The time of a compound task is the maximum of the times of the tasks that compose it, and its energy is the sum of the energies of all its constituent tasks. The data amount at the output of the compound task is the sum of the data at the outputs of its constituent tasks. Once the parallel architecture is serialized, the data amounts, latencies, and energies between the compound tasks correspond to specific points in the processing chain.
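The three serialization rules above (maximum time, summed energy, summed output data) can be sketched in a few lines of Python; the `Task` record and the `compound` function are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Task:
    name: str
    time_s: float      # measured processing time
    energy_j: float    # measured processing energy
    out_bytes: int     # data amount at the task output


def compound(parallel: Sequence[Task], name: str) -> Task:
    """Collapse tasks that run in parallel into one compound task.

    Time is that of the slowest branch, energy is summed over all branches,
    and the output data amount is the sum of the branch outputs.
    """
    if not parallel:
        raise ValueError("need at least one task")
    return Task(
        name=name,
        time_s=max(t.time_s for t in parallel),
        energy_j=sum(t.energy_j for t in parallel),
        out_bytes=sum(t.out_bytes for t in parallel),
    )
```

Taking the maximum for time reflects that parallel branches overlap in time, while energy is consumed by every branch regardless of overlap.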
Our application case is a serial architecture from a time and energy point of view, but there is a parallel data stream where data are transferred between non-consecutive tasks (Channel Separation and Image Segmentation). For this reason, we only apply serialization from a data perspective.
Figure 3 shows the system flow diagram, in which we have marked the partition points from 1 to 10. Partition 1 corresponds to the system input data, and partition 10 to the output. We observed that the data amount, time and energy of Image Histogram are very small compared to those of the previous task (Channel Separation) and the following one (Image Segmentation). Consequently, the partition point after Image Histogram would be practically the same (almost the same latency and energy) as the partition point after Channel Separation. Therefore, the time and energy of Image Histogram are accumulated into the next partition, that is, the output of Image Segmentation.
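The bookkeeping for dropping such a negligible partition point can be sketched as folding the task's time and energy into its successor. A minimal Python sketch, assuming per-task time and energy have been measured; the function name and the tuple layout are hypothetical:

```python
from typing import List, Sequence, Tuple

Stage = Tuple[str, float, float]  # (task name, time_s, energy_j)


def merge_into_next(stages: Sequence[Stage], negligible: str) -> List[Stage]:
    """Remove the partition point after `negligible` by adding its time and
    energy to the task that follows it in the sequential chain."""
    merged: List[Stage] = []
    carry_t = carry_e = 0.0
    for name, t, e in stages:
        if name == negligible:
            # Hold this task's cost until the next task is reached.
            carry_t, carry_e = t, e
            continue
        merged.append((name, t + carry_t, e + carry_e))
        carry_t = carry_e = 0.0
    if carry_t or carry_e:
        raise ValueError("negligible task has no successor to merge into")
    return merged
```

For example, applying it with `negligible="Image Histogram"` to a chain containing Channel Separation, Image Histogram and Image Segmentation leaves two stages, with Image Segmentation carrying the histogram's cost.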
7. Conclusions
We have applied intelligence partitioning to offload an image processing-based IoT node in order to reduce both node energy and latency. This work shows that the optimal partitioning point depends heavily on the selected input image size but is hardly affected by the number of objects in the image. In addition, the optimal partitioning point also depends on the specific objective function.
Image size affects the partitioning points for two reasons. The first is that, during processing, more tasks process pixels than process object-related data. Because of this, the cumulative effect of pixel processing is much larger than that of object processing, making size more relevant. The second reason concerns data communication at the partition point. We found that data communication dominates over processing in both energy and time consumption at the node. This enhances the relevance of image size, because the largest amounts of data to be sent occur at partitions early in the processing chain. In addition, due to the reducing effect of the algorithms, the amount of data in the object processing stage is much lower than in the pixel stage, so sending data is less costly, making the number of objects less important.