Impact of Input Data on Intelligence Partitioning Decisions for IoT Smart Camera Nodes

Abstract: Image processing systems exploit image information for a purpose determined by the application at hand. Implementing image processing systems in an Internet of Things (IoT) context is challenging due to the amount of data involved, which affects the three main node constraints: memory, latency and energy. One method to address these challenges is the partitioning of tasks between the IoT node and a server. In this work, we present an in-depth analysis of how the input image size and its content within conventional image processing systems affect the decision on where tasks should be implemented, with respect to node energy and latency. We focus on explaining how the characteristics of the image are transferred through the system until finally influencing partition decisions. Our results show that the image size significantly affects the efficiency of the node offloading configurations. This is mainly due to the dominant cost of communication over processing as the image size increases. Furthermore, we observed that image content has limited effects on the node offloading analysis.


Introduction
Internet of Things (IoT) applications based on smart and distributed cameras have gained considerable attention in recent decades and have become very popular. There are numerous applications that require the integration of Wireless Vision Sensor Networks (WVSN), such as traffic monitoring in smart cities [1], security and inspection during production in Industry 4.0 [2,3], the diagnosis of diseases in healthcare [4] and self-driving cars in the automotive sector [5]. Continuous advancement in these sectors means that more and more applications require greater speed, greater autonomy, more data and a smaller system size. As a consequence, WVSN systems must have low latency and low power consumption.
The demands on WVSN make integrating image processing systems in IoT challenging due to the contrasting natures of image processing and the IoT. On the one hand, image processing systems rely on high data volumes and require significant processing capabilities; on the other hand, the IoT relies on wireless communication and was developed with a focus on low-data-volume applications. This contrast defines the core challenges in designing image processing-based IoT systems.
To understand how the amount of data affects the processing in the camera node, we need to analyze the relationship between the application requirements and the node constraints. For example, a smaller size requirement (to reduce weight in drones or to conceal the camera in vehicles) affects all camera components but especially the battery [6]; for this reason, the autonomy of the camera could be compromised. In addition, a low latency requirement in the IoT system requires a general increase in speed at the node, affecting both the processing and communication parts and resulting in an overall increase in energy consumption. Furthermore, the requirement of handling more data means using more memory and increasing both the computational and communication loads, thereby affecting both energy and latency at the node. Therefore, the smart camera node is undoubtedly a crucial component in an IoT system, with the data captured and processed by the system being a main factor affecting processing and communication requirements.
Image processing systems consist of a series of image processing tasks that are typically sequential, and the initial input image is reduced as it progresses through the image processing pipeline. As a result, the data volume changes through the processing stages, which is of significant importance in an IoT node if we consider node offloading. From a node offloading perspective, the performance of an IoT node is defined by the inter-dependencies of the processing and communication components. In our previous work, we introduced our node offloading method, intelligence partitioning [7], which relies on the computational and communication inter-dependencies to extract the optimal partition point as a trade-off between energy efficiency and latency. Throughout that analysis, we considered the input image as predefined; hence, its characteristics, such as size and content, were not included in the performance analysis while searching for the optimum partition point. The aim of this paper is to provide an analysis of how changes in the input image, in terms of size and content, affect the choice of the optimal partition point for the node. To address this, we provide an analysis based on a traditional image processing system, which we have implemented on a Raspberry Pi, measuring the energy consumption for different partition and input image configurations. Our aim is to achieve a better understanding of how to optimize data-intensive IoT nodes. Thus, we provide a method to study a broader design space for image processing systems with intelligence partitioning, which enables designers to choose appropriate input options.
As a motivating example, we propose an industrial biscuit inspection application. The system to analyze combines IoT technology with computer vision; in this way, the factory benefits from the advantages offered by both. On the one hand, IoT technology makes the vision system able to operate wirelessly. It also provides the factory with the flexibility needed to carry out future adaptations of the production system, avoiding rewiring and costly installations. On the other hand, the vision system is responsible for detecting cookies that do not meet the quality criteria (size and shape) and providing their coordinates. The camera node must capture images of the biscuits, which are transported by a conveyor belt at a certain speed, so image processing time is vital. In addition, the autonomy of the system is important to avoid production stoppages due to battery changes. To meet the latency and power demands, we decided to apply system partitioning, where the challenge is to find the optimal partition point that guarantees low latency and low power consumption. Our hypothesis is that the characteristics of the image (size and content) will have a non-linear effect on the latency and power of the system, thus affecting partition decisions. Through the study of this application, we intend to understand and explain the importance of image size in partitioning decisions, which, together with the type of processing platform, type of communication and type of optimization, will influence the latency and energy in each of the partitions. The challenges found in the motivating example (reducing latency and power consumption at the node) are currently found in most wireless vision systems, where limited node resources force us to adopt creative design solutions (such as partitions).
The article is structured as follows. First, we analyze how the inter-task data amount varies along the processing chain with respect to changes in the image. Next, we analyze the dependency between processing time of a task and data to handle by this task to see how changes in the amount of data affect the processing time. Then, we study and compare the latency and energy behaviors of the processing and communication components of each partition point in an example image processing system. Finally, we look at how changes in the image affect the optimal partitioning solutions based on the energy and latency optimization objectives at the node.

Related Work
The integration of image processing systems in the context of IoT continues to pose challenges. Energy and latency are the primary drivers of current research efforts [8][9][10][11][12] contributing to the overall improvement of these systems. Smart cameras with integrated image analysis capabilities are among the most challenging IoT nodes. The data-intensive nature of such systems can impose high energy consumption and long delays in both the wireless communication and the processing tasks. Thus, deploying low-latency, low-energy, data-intensive IoT nodes is a challenging task.
To address this, several approaches have been presented, relying on fine-tuned implementations to improve the node latency and energy. Abas et al. [13] achieved energy efficiency in the node by adapting the activity of the camera to the amount of remaining battery charge. In addition, their system only records and transmits information when events of interest are detected. Anagnostou et al. [14] presented an energy-aware hardware activation scheduler that activates functions as energy becomes available. Qurabat et al. [15] improved energy efficiency by proposing data reduction applied at two levels of network design: the sensor nodes and the gateway. Dai et al. [16] managed to reduce latency at the edge nodes by improving a low-latency object detection algorithm. In recent years, techniques and technologies based on Artificial Intelligence (AI) have become prevalent in the IoT field. Mohammadi et al. explained in their survey [17] how Deep Learning (DL) is already being applied in each layer of IoT systems (Device, Edge/Fog and Cloud). In this survey, the authors highlighted four methods and technologies to incorporate DL on IoT devices. From the point of view of computational efficiency, Network Compression and Accelerators stand out. From the energy perspective, Approximate Computing and Tinymote with DL are the most efficient. These works propose solutions for the efficiency of the current node, but we want to address the problem at the system design level, taking into account the amount of data processed and communicated to achieve an advantage in terms of latency and energy.
During the past decade, other works proposed node offloading by distributing the processing tasks between the node and a server. This approach opens up a new design exploration dimension where image processing may be local, partially local or fully remote. In their work, Pinto et al. [18] analyzed three scenarios to find out when it is best to send data and when to process them locally. Khursheed et al. [19] investigated the partition between hardware (Field-Programmable Gate Array, FPGA) and software (micro-controller), but also between local and central processing, in order to achieve an optimal partition point that guarantees minimum energy consumption in the camera node. Imran et al. [20] decreased the execution time by parallelizing tasks in an FPGA. In their work, they also considered the partition of the tasks in order to distribute the processing load between the node and the server. In [21], Motlagh et al. proposed an Unmanned Aerial Vehicle (UAV) as an IoT node where energy savings are achieved by moving local (on-board) processing to a Multi-Access Edge Computing (MEC) node. Zhao et al. [22] presented a system partition approach based on Convolutional Neural Networks (CNNs) by distributing the convolutional layers that contribute greatly to inference latency. Although these works take into account processing scenarios, they do not consider the effects on latency and energy due to changes in the system input.
The aforementioned works offer solutions that undoubtedly contribute to the improvement and optimization of the camera node. However, these works do not consider changes in the image in their analysis. From a data point of view, an image can vary with respect to two dimensions: size and content. The size refers to the resolution of the image, which directly influences the number of pixels to be processed. Content refers to features in the image that are relevant to the application. For example, in an object identification task, the number of objects in an image might affect the processing and communication load, subsequently affecting the efficiency of the optimization methods proposed.
Many authors have studied the effect that changes in image size and content have on their systems. In [23], Wang et al. studied an image content weighting methodology in order to improve image quality assessment (IQA) algorithms. Fookes et al. [24] analyzed the impact of image resolution on facial recognition performance, showing that their proposed super-resolution methodology improves the recognition of low-resolution and noisy facial images. In [25], Yan et al. developed a model to map pedestrians at different resolutions. In this way, the authors managed to reduce the average rate of false positives in the context of detection in traffic scenes. In the work presented by Alhilal et al. [26], a low-complexity vision system was designed for the identification of objects in WVSN. The proposed architecture is characterized by low-power processing, thus achieving energy efficiency at the node. In [27], Gu et al. investigated the influence of viewing distance and image resolution on IQA performance. Ur Rehman et al. focused on efficient image delivery based on object detection [28]. Their work proposes a new object detection model that reduces the false transmission of images and transmits only image segments instead of complete images. The results showed considerable energy savings in the camera node compared to more recent techniques. Romic et al. [29] evaluated the performance of a camera-based stairs detection system as a function of different image resolutions. The results showed that the selection of an optimal resolution is essential to achieve a trade-off between precision and processing speed. Yazidi et al. faced the challenge of data growth from the IoT and the generation of Big Data by different platforms [30]. The authors conducted a data-size latency sensitivity study in order to measure the performance of the Apache Spark framework (Spark is a fast parallel processing system in the Big Data environment) and evaluate its computational complexity.
Despite the existence of studies related to system input changes, none of them addresses the problem of the impact of size and content of the input images on both the latency and energy at the node. However, the fact that there is a great research effort to cover the effect of input image changes on the implementations highlights its importance.
Summarizing, none of the related works studied the effect that changes in the input image could have on design decisions related to system deployment. Specifically, such design decisions would be affected by how changes in the image are transferred through the system to affect energy and latency at different partitions. In this work, we address the problem of camera node optimization through intelligence partitioning [7]. Intelligence partitioning targets finding a suitable cut-off point that optimizes latency, energy or both, considering processing and communication in the sensor node (Figure 1). In this article, we rely on the previously introduced intelligence partitioning method, but we provide a more elaborate view of the problem, including variations in the input image size and content. This is because image size and content could have an impact on the inter-task data amount along the processing chain, affecting the workload of the processing tasks and the node efficiency. Therefore, we explore the use of intelligence partitioning as a methodology for camera node optimization, thereby contributing to the problem of implementing data-intensive, low-latency/energy camera nodes.

Intelligence Partitioning/Node Offloading
Intelligence partitioning is a method developed by Shallari et al. to improve the energy efficiency of smart sensor nodes by analyzing the trade-off between processing and communication [7]. It focuses on the prospective energy consumption variation due to partitioning the processing chain at any given point and allocating the processing tasks between the sensor node and a remote processing unit. By taking into consideration a variety of partition configurations and wireless communication technologies, it provides insight into the inter-effects of processing and communication on the overall energy consumption of the sensor node. However, the current approach only provides a partial view of the problem, because, in the analysis of the optimal partitioning point, the size of the input data and, subsequently, the inter-task data amount between the processing tasks are considered fixed.
The system inter-task data amount during processing could vary for two reasons. The first concerns the content of the image. Some processing tasks could be sensitive to the content of the image, because the objective of an image processing system is normally to detect objects in the image for subsequent counting, identification, classification, etc. Therefore, the outputs of the tasks that depend on image content should produce variations in the data amounts during processing. The second concerns the initial stages of design, when the image size must be selected. This decision directly affects the data amount that the system must process and therefore should also affect the inter-task data amount during processing.
Because of the close relationship between data, time and energy, we think that data are the common factor in node constraints. For this reason, we believe that the choice of system input size could have an effect on subsequent partitioning decisions.

Communication Model
Intelligence partitioning assumes that several tasks can be executed in different locations. As a result, data have to be exchanged between these locations depending on the partitioning point. However, the energy and delay resulting from this transfer depend on the inter-task data amount to exchange at each task as well as the chosen communication technology. Several technologies have been discussed in the context of IoT or smart cameras.
In order to evaluate multiple technologies and their impact on the partitioning depending on the data change, we employ a framework that models the data transfer of various IoT communication technologies. Krug and O'Nils [31] introduced a modular framework that allows us to evaluate and select the most suitable communication technologies for our system. The models are implemented in Matlab and calculate the latency and energy per data transfer for several communication technologies. The framework covers the functional level of sensing which, in our work, corresponds to the camera node and is thus viable for this task.
To calculate the energy and delay, Krug and O'Nils considered the communication technology, the resulting protocol-specific timing based on the data amount to transfer and corresponding real hardware transceivers. The amount of data to transfer is used to determine the number of packets to be sent by the transceiver and thus determines its activity. The energy consumption then depends on the resulting duration of each activity as well as the corresponding power consumption of the selected hardware. As a result, the models are able to provide the communication cost for an arbitrary data amount.
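As a rough illustration of this activity-based model, the following sketch estimates transfer latency and energy from the data amount at a partition point. The payload size, per-packet airtime and transceiver power in the usage example are illustrative placeholders, not parameters taken from the framework in [31].

```python
import math

def comm_cost(data_bytes, payload_per_packet, t_packet_s, p_tx_w):
    """Activity-based cost of transferring data_bytes over a link.

    The data amount determines the packet count, the packet count
    determines how long the transceiver is active, and the energy is
    the active duration times the transceiver power draw.
    """
    n_packets = math.ceil(data_bytes / payload_per_packet)
    latency_s = n_packets * t_packet_s
    energy_j = latency_s * p_tx_w
    return latency_s, energy_j

# Illustrative example: a 292 x 436 RGB image (3 bytes per pixel) over
# a hypothetical Wi-Fi-like link with 1460-byte payloads.
lat, en = comm_cost(292 * 436 * 3, payload_per_packet=1460,
                    t_packet_s=0.5e-3, p_tx_w=0.7)
```

The same function can be evaluated at every candidate partition point, since only the data amount changes between cuts.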
In order to observe the impact of the communication component at the partition points, we chose a subset of models for this study. We analyze an image processing system with a relatively high data amount to transfer compared to traditional IoT use cases. Due to this, we selected the following communication technologies that are able to handle this data amount: Bluetooth 5.0, 802.11n (Wi-Fi), LTE Cat.4 and LTE Cat.1. All these technologies are suitable for higher data rates and are used for smart camera applications, where LTE Cat.4 corresponds to traditional smartphone-type communication. Other popular low-power communication technologies, such as LoRa, were not considered in this study, as they are not able to handle the large data amounts required to send images or intermediate data. For these technologies, the partitioning always results in complete in-node processing.

Methodology
We focus on sequential processing systems because the partitioning cut between the sensor node and the cloud server has to be a single directed edge. More general architectures can be transformed into an acyclic, sequential system by collapsing cycles and fork-join structures into single processing nodes. Thus, our work is also applicable to general processing architectures that can be transformed in this way into a sequential architecture, which is possible if the overall processing algorithm has a single input and a single output. Our method is therefore general but limited by the fact that a partitioning cut can only be applied at edges that separate the architecture graph into two otherwise disconnected sub-graphs.

Optimization Problem
In an IoT system, the data are captured by the devices and sent to the cloud for analysis, processing or both. Once the cloud computes a solution, the data could be sent back to the same node or an actuator in order to achieve the system objective. In this article, we limit the scope of the optimization problem to the sensor node, assuming it does not expect data back. Therefore, we analyze both the time and the energy in order to optimize and offload the node.
As mentioned before, image processing systems usually involve a set of smaller tasks t_i that form a complex processing function F:

F = {t_1, t_2, ..., t_N}

In a distributed system, a specific task t_i of the function F is not bound to a specific geographical location; the actual execution of tasks is location independent [7]. Each task can be mapped to any node in the system: the camera node, a cloud server, or an intermediate fog computational resource. Formally, the distributed function F has its functionality distributed between node and cloud so that

F = f_Node ∪ f_Cloud

where the subsets f_Node and f_Cloud are the different clusters of tasks composing the function F that are executed at the respective location. Because of this, both the latency and the energy of the tasks are also accounted for in different entities. The mapping of the computational and communication load of a node between the different computational resources (Figure 2) is defined as intelligence partitioning:

L_Node = L_P,Node + L_C,Node→Cloud
E_Node = E_P,Node + E_C,Node→Cloud

where L_P,Node and E_P,Node are the node latency and energy due to processing, and L_C,Node→Cloud and E_C,Node→Cloud are the latency and energy due to communication from the node to the cloud. Both latency and energy have a direct relationship with the data amount. The data transfers (D_i) between the processing tasks in the node make up the D function, where D_0 is the input image, D_1, D_2, ... are the data between the processing tasks, and D_N is the system output data. The data amount transferred between the two computational layers can be expressed as

D_Node→Cloud = D_j

where j is the position of the system partition cut.
Our optimization problem focuses on latency and energy; therefore, system partitioning must allow a reduction of one or both of these. The latency in the node (L_Node), for a specific partition cut, depends on two components: the computation and communication latencies. The computation latency (L_P) can be defined as the accumulated processing time of the tasks (t_i) that make up the system from its start to the partition cut. The computational latency depends on the processing platform (P_P) but also on the data (D_i) generated by the processing algorithms. The communication latency (L_C) refers to the time to transfer the data at the partition point (D_j), and it depends on the intrinsic characteristics of the communication technology (C_C). The node latency (L_Node) is derived as

L_Node = Σ_{i=1..j} L_P(t_i, D_{i−1}, P_P) + L_C(D_j, C_C)    (6)

where D_{i−1} is the input to t_i and D_i is its output, and L_P and L_C are the measurement or estimation functions for the processing and communication latency, respectively. Like the latency, the energy consumption in the node up to the partition cut has two components: the computation and communication energies. Therefore, the node energy (E_Node) is derived as

E_Node = Σ_{i=1..j} E_P(t_i, D_{i−1}, P_P) + E_C(D_j, C_C)    (7)

In Section 5.3.3, we will consider three specific objective functions for our case study, minimizing latency (L_Node) only, energy (E_Node) only, and energy (E_Node) under a delay constraint (MaxDelayConstraint).
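To make Equations (6) and (7) concrete, the following sketch evaluates every candidate cut j and returns the one with the lowest node energy, optionally under a delay constraint. All task times, energies, data amounts and the linear communication model are illustrative placeholders, not measurements from our setup.

```python
def node_cost(j, t_proc, e_proc, d_out, lat_c, en_c):
    """Latency and energy at cut j, as in Equations (6) and (7):
    tasks 1..j run on the node, then D_j is sent to the cloud."""
    latency = sum(t_proc[:j]) + lat_c(d_out[j])
    energy = sum(e_proc[:j]) + en_c(d_out[j])
    return latency, energy

def best_partition(t_proc, e_proc, d_out, lat_c, en_c, max_delay=None):
    """Cut with minimum node energy among delay-feasible cuts."""
    feasible = []
    for j in range(len(d_out)):
        latency, energy = node_cost(j, t_proc, e_proc, d_out, lat_c, en_c)
        if max_delay is None or latency <= max_delay:
            feasible.append((energy, latency, j))
    return min(feasible)

# Illustrative data: D_0 is an RGB image; later stages shrink the data.
t_proc = [0.010, 0.012, 0.008]                # s per task
e_proc = [0.020, 0.024, 0.016]                # J per task
d_out = [3_000_000, 1_000_000, 125_000, 64]   # bytes at D_0..D_3
lat_c = lambda d: d * 1e-6                    # placeholder link: 1 us/byte
en_c = lambda d: d * 2e-6                     # placeholder link: 2 uJ/byte
energy, latency, cut = best_partition(t_proc, e_proc, d_out, lat_c, en_c)
```

With these placeholder numbers the communication cost dominates, so the cheapest cut transmits only the final, smallest data amount; with a faster link or slower node the optimum shifts toward earlier cuts.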

Method for Analyzing Algorithm Image Sensitivity
According to Equations (6) and (7), the optimal partition points are data-dependent. This means that both the latency and the energy of the partitions will vary based on the inter-task data amount processed before the partition cut and the data amount transmitted over the link at the partition cut.
The inter-task data amount through the system varies for two closely related reasons. The first is the dependence of the data amounts between processing tasks. Once the image is introduced into the system, it undergoes a series of changes that lead to a reduction of data, because each task exerts a reducing action that affects the inter-task data amount to a lesser or greater extent. The second is changes in the image itself. Normally, during the execution of the application, the image changes in content, which refers to the number of objects. The objective of a processing system is usually the recognition of objects for their subsequent labeling, counting, classification, etc., so there are processing tasks that are sensitive to these changes in the image. Because of this, a change in the number of objects could affect the data amounts in the outputs of these tasks, producing variations in the inter-task data amount along the processing chain. Another change of the image refers to its size. This type of change also has an effect on the data amount throughout the system because, at the beginning of the system, the tasks process pixels in order to isolate the information relevant to the application. Consequently, the data amount at the output changes relative to the size of the image. Although the size of the input image normally does not vary during the execution of the application, it plays a fundamental role in the design of the system. This is because the number of objects is related to the resolution of the image, so a higher resolution (larger image size) allows an increased number of objects per image. Conversely, a large image size could increase communication and processing times, thereby affecting the partition latency and energy.
Therefore, we focus on two aspects of the image: its size and its content in terms of the number of objects visible in the image.
We have analyzed the image sensitivity of the tasks by observing their outputs while introducing multiple images into the system, with various sizes and numbers of objects. We expect the output of an image-sensitive processing task to vary based on the input image size, its content, or both. However, there can also be non-image-sensitive tasks in image processing systems. Accordingly, each algorithm has a characteristic image sensitivity that causes variations in the inter-task data amount, which in turn could play a role in partitioning decisions.

Processing Time Behavior Analysis
The variation of the inter-task data amount along the processing chain means that the data at the inputs of the algorithms (D_{i−1}) vary due to the dependency between the tasks. These changes should affect the time it takes to execute each of the system tasks. The processing time is an important aspect at a partition point because of its direct impact on overall latency and energy (Equations (6) and (7)). For this reason, the processing time behavior of the tasks with respect to the data amount could be another factor that influences the best partitioning solutions. We have analyzed the processing time behavior by measuring the time it takes to process the input data in each of the system tasks, which varies depending on the input image and the data amount dependency between the previous tasks. We are interested in observing how the task processing times behave in order to analyze how they are influenced by changes in the input image (Table 1). The variation of the processing times will contribute to the partitions by increasing or decreasing their delay based on the data amounts.
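One simple way to characterize this behavior is to fit the measured task times against the input data amounts: a good linear fit with a near-zero intercept indicates per-pixel processing, while deviations reveal content-dependent work. The measurement pairs below are hypothetical illustrations, not values from Table 1.

```python
def linear_fit(xs, ys):
    """Least-squares line through (x, y) pairs: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical (input data amount in bytes, task time in s) pairs for
# one task measured across several input image sizes.
data = [0.12e6, 0.5e6, 2.0e6, 8.0e6, 12.2e6]
times = [0.0040, 0.0165, 0.0661, 0.2640, 0.4030]
slope, intercept = linear_fit(data, times)
# slope is the per-byte processing cost of this task in seconds.
```

Tasks whose points fall well off the fitted line for images with many objects are the content-sensitive ones discussed above.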

Partitioning Depending on System Architecture
In this work, we analyze the latency and energy of the partitions at different points in the processing chain. To do this, we first analyze the tasks of the system separately, extracting the amount of data at each task's input and output as well as the related processing time and energy. The partition points depend on the type of system architecture, which can be sequential or parallel. In a purely sequential system, the data amounts, latencies and energies of the partition points correspond to the inputs and outputs of the processing tasks, which greatly simplifies the analysis of partitions. In a parallel architecture, however, neither the data amounts, nor the latencies, nor the energies of the partition points have to correspond to the inputs or outputs of the processing tasks. The analysis of the partitions in a parallel architecture therefore requires a prior serialization process. The serialization process consists of grouping the parallel tasks into compound tasks. The time of a compound task is the maximum time among the tasks that compose it, and its energy is the sum of the energies of all the tasks composing it. The data amount at the output of the compound task is the sum of the data at the outputs of the tasks that compose it. Once the parallel architecture is serialized, the data amounts, latencies and energies between the compound tasks correspond to specific points in the processing chain.
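The serialization rule described above can be sketched directly; the task dictionaries below are a hypothetical representation, not the data structures of our implementation.

```python
def collapse_parallel(group):
    """Collapse tasks that execute in parallel into one compound task.

    As described in the text: the compound time is the maximum of the
    member times, the compound energy is the sum of the member energies,
    and the compound output data amount is the sum of the member outputs.
    """
    return {
        "time_s": max(t["time_s"] for t in group),
        "energy_j": sum(t["energy_j"] for t in group),
        "data_out": sum(t["data_out"] for t in group),
    }

# Two hypothetical tasks running in parallel branches:
branch = [
    {"time_s": 0.012, "energy_j": 0.020, "data_out": 127_000},
    {"time_s": 0.003, "energy_j": 0.004, "data_out": 1_024},
]
compound = collapse_parallel(branch)
```

After every parallel group is collapsed this way, the chain is sequential and each inter-task boundary is a valid partition point.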
Our application case is a serial architecture from a time and energy point of view, but there is a parallel data stream where data are transferred between non-consecutive tasks (Channel Separation and Image Segmentation). For this reason, we only apply serialization from a data perspective. Figure 3 shows the system flow diagram, where we have marked the partition points from 1 to 10. Partition 1 (p1) corresponds to the system input data, and partition 10 (p10) to the output. We have observed that the data amount, time and energy of the Image Histogram task are very small compared to those of the previous task (Channel Separation) and the following one (Image Segmentation). Due to this, the partition point after Image Histogram is practically the same (almost the same latency and energy) as the partition point after Channel Separation. The time and energy of Image Histogram are therefore accumulated in the next partition, that is, at the output of Image Segmentation.

Results
We have carried out an in-depth analysis of the latency and energy of different partitions. We have first designed the system and a dataset for the purpose of the study. We then ran the system for each of the dataset images and extracted data amounts and execution times for each of the system tasks during processing. Finally, we set the optimization objectives and analyzed the impact of varying the input image at various partition points in the system.

Application Case
Our case study is a biscuit inspection system. This system is responsible for analyzing images and identifying whether biscuits are acceptable or not, depending on the visual biscuit characteristics. The system is made up of ten processing tasks: Color Transformation, Channel Separation, Image Histogram, Image Segmentation, Edge Detection, Image Region Filling, Morphology, Component Labeling, Features Extraction and Classification. Figure 3 shows the system flow diagram, and the data types of the output from each processing task are found in Table 1.
The primary input is an RGB image, which has a certain size and number of objects. First, the image is transformed from the RGB to the YCbCr color space. The Channel Separation task receives the YCbCr image and separates the "Y" channel. The Image Histogram task then uses this channel to compute a histogram, which is used by the next task. Image Segmentation has two inputs: a 1D array coming from Image Histogram and the image from Channel Separation. Initially, this task uses the 1D array to define the intensity level that separates the background from the objects of interest; it computes a global threshold. Subsequently, the threshold is used to convert the input image into a binary image without background (2D binary matrix). Next, we introduce the binary image into the Edge Detection task, which applies a Sobel operator [32] to detect the edges of the objects. The output of this task is a binary image showing the edges of the objects (2D binary matrix). Next, we fill the object regions by applying the Region Filling task. The output is a binary image (2D binary matrix) with the regions defined in white. The Morphology task takes the binary image with filled regions and applies the Open operator [33] to remove the image artifacts (biscuit crumbs). The Morphology task results in an image whose objects are only biscuits (2D binary matrix). The Component Labeling task is responsible for differentiating the regions of the image by assigning distinct numerical labels to each of the regions. Therefore, the output is a 2D matrix of non-binary labels. Next, the Feature Extraction task analyzes the label matrix to extract the characteristics of the objects, such as their centroid, perimeter and eccentricity. Its output is a matrix where each row corresponds to one object and the columns indicate the features. Finally, the Classification task analyzes the features matrix in order to separate the good cookies from the bad ones.
The system output is a 2D matrix with the coordinates of the cookies and an additional bit indicating their quality.

Experiment Description
The scope is to analyze the impact that variations in the input image have on the optimal partition point, which may shift as a result of image size and content variations. Due to the configuration of the camera layout for frame capture, the maximum number of objects is limited to 18. Thus, we consider ten different images containing 0, 2, 4, 6, 8, 10, 12, 14, 16 and 18 objects, respectively. Similarly, we consider ten different image sizes between 292 × 436 and 2848 × 4288 pixels. Hence, 100 different images are used in the experiments, as illustrated in Figure 4.

As shown in Equations (6) and (7), both latency and energy at a partition point have two components: processing and communication. The processing latency and energy at a partition point are the sums of the latencies and energies of all processing tasks up to the partition point, since all these tasks are implemented in the smart sensor node. The communication latency and energy, instead, are defined by the resulting data size at the given partition point. Therefore, the latency and energy at a partition point are the cumulative processing cost plus the communication cost of the serialized system.

The image processing tasks in Figure 3 were implemented on a Raspberry Pi 2 Model B relying on the OpenCV libraries. We measured the power consumption of each image processing task with the UNI-T tester [34], whose accuracy is in the range of 10 mA and 10 mV. Furthermore, we used the gettimeofday() function to record the execution time of each image processing task with microsecond resolution. Based on these measurements, we calculated the processing energy consumption for different partitioning configurations. The measured variation in computational latency was less than 5%, and both energy consumption and latency were obtained as averages of 10 measurements for each input image and processing configuration. In our analysis, we also used four wireless communication technologies: BLE 5, 802.11n, LTE Cat.1 and LTE Cat.4. For each data size resulting from the 10 input images and the inter-task data sizes, we calculated the latency and energy consumption based on the communication model described in Section 3.2. First, we extracted the data amounts based on the sizes of the images and their content (Section 5.3.1). Then, we used the communication model to generate both the time and the energy required to communicate such data.
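The per-task measurement procedure can be mimicked in software. In the sketch below, `current_a` and `voltage_v` stand in for the external tester readings (the values are placeholders, not measurements), and Python's `time.monotonic()` plays the role of gettimeofday():

```python
import time

def profile_task(task, data, runs=10, current_a=0.35, voltage_v=5.0):
    """Average execution time and energy of one processing task.

    current_a / voltage_v are placeholders for the per-task power
    measured with the external tester; E = P * t = V * I * t.
    """
    start = time.monotonic()
    for _ in range(runs):
        out = task(data)
    latency = (time.monotonic() - start) / runs   # seconds, averaged over runs
    energy = voltage_v * current_a * latency      # joules
    return latency, energy, out
```

Calling `profile_task(color_transform, image)` for each task in turn yields the per-task latencies and energies from which the partition costs are accumulated.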

Experimental Results
This section presents the results of the experiments: we analyze and characterize processing and communication, pose the optimization problem and finally study how the best partitioning solutions are affected. Table 1 summarizes how the data amount and the processing time of the different tasks depend on image size and number of objects.

Analysis of Inter-Task Data Amount
From the data amount perspective, we see that the outputs of tasks 1, 2, 4, 5, 6, 7 and 8 are sensitive only to changes in image size, while tasks 9 and 10 are sensitive to the number of objects. Task 3 is sensitive to neither size nor objects. Figure 5 depicts the result of the inter-task data amount experiment and shows the impact of image size and number of objects on the data amounts inside the processing pipeline. We can see that the size of the primary input image has a much greater effect than the number of objects in the image; only tasks 9 and 10 are sensitive to the number of objects. Table 1 shows that the execution time of the tasks is also affected by the size (tasks 1 to 9) and the objects (tasks 9 and 10) of the input image. The execution time of each task grows linearly with respect to its input. The time of task 9 varies with both image size and content because Features Extraction analyzes pixels to provide an output based on objects. Figure 6 depicts the result of the Raspberry Pi experiment and shows the impact of image size and number of objects on the processing time of the individual tasks in the pipeline. The changes in the inter-task data amount and processing times suggest that partition points might also be affected. Both latency and energy at a partition point depend on two components: processing and communication (Equations (6) and (7)), which vary differently depending on the size of the image and the number of objects. Figures 7-10 break the analysis into processing and communication, considering delay or energy.
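The size and content sensitivities summarized in Table 1 can be illustrated with a simple byte-count model of the task outputs. The byte widths below (8-bit pixels, 1-byte binary values, 4-byte labels and features) and the feature count are our assumptions and may differ from the actual implementation:

```python
def inter_task_bytes(w, h, n_objects, n_features=4):
    """Illustrative data amounts (bytes) at each task output."""
    return {
        "input_rgb": 3 * w * h,                  # primary input
        "ycbcr": 3 * w * h,                      # task 1: size-dependent
        "y_channel": w * h,                      # task 2: size-dependent
        "histogram": 256 * 4,                    # task 3: constant
        "binary": w * h,                         # tasks 4-7: size-dependent
        "labels": 4 * w * h,                     # task 8: size-dependent
        "features": 4 * n_objects * n_features,  # task 9: object-dependent
        "classification": 4 * n_objects,         # task 10: object-dependent
    }
```

Doubling the image dimensions quadruples the pixel-stage outputs while leaving the histogram and the object-stage outputs unchanged, which is exactly the pattern observed in Figure 5.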

Characterization of Processing and Communication
The processing times and energies incurred by tasks 9 and 10 are so small compared to the other tasks that their effect is hardly noticeable in Figures 7-10. Figure 9a shows the total latency of the partitions, obtained by summing the processing and communication components of Figure 7a,b. Similarly, Figure 9b shows the total energy of the partitions, obtained by summing the processing and communication components of Figure 8a,b. The number of objects affects the processing latency only for partitions 9 and 10 (Figure 10), where the effect is too small to be perceptible in the plot. The latency due to the number of objects is independent of the image size; therefore, it becomes relatively more important as the image size is reduced. However, the effect of the number of objects on latency is very small, even for small images. Similarly, the effect of the number of objects on the processing energy is negligible. Therefore, we ignore this effect in the remainder of our analysis.

Optimization Problem
As the partitioning decision involves a trade-off between energy and latency, we can formulate different optimization problems.
We consider three objectives, where L(p) and E(p) denote the node latency and energy at partition point p (Equations (6) and (7)):
Minimizing latency: p* = argmin_p L(p) (8)
Minimizing energy: p* = argmin_p E(p) (9)
Minimizing energy under a latency constraint: p* = argmin_p E(p) subject to L(p) ≤ L_max (10)
Figure 11 shows the best partitioning solutions for each optimization goal. Each figure presents four different communication technologies (BLE 5, Wi-Fi, LTE Cat.4 and LTE Cat.1) and a single processing platform (Raspberry Pi). The graphs show the optimal partitions for the different image sizes, and we note that the optimal point differs across image sizes, at least for some communication technologies. In Figure 12, we show the times and energies of the partitions as a function of the image sizes in order to observe how the change in the partitioning solutions occurs. The cases in Figure 12a-c correspond to the cases seen in Figure 11. However, as the case of LTE Cat.4 is repeated in Figure 11b,c, Figure 12c only shows the case for BLE 5 communication. By applying the optimization functions to each of these graphs, we observe the changes in the best partitioning solutions. The partitioning solutions that meet the optimization objective are marked with circles.

Discussion

Impact on System-Wide Energy and Latency
The intelligence partitioning aims at finding the optimal system partition point with respect to system latency and node energy. For that reason, we analyze the partition points from the time and energy perspectives, according to the objective functions in Equations (8)-(10).
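These objective functions reduce to a simple selection over the per-partition totals. A minimal sketch, where `lat[p]` and `en[p]` are assumed to hold the latency and energy totals of Equations (6) and (7) for each partition point p (names ours):

```python
def best_partitions(lat, en, l_max=1.0):
    """Pick optimal partition points for the three objectives.

    lat[p]/en[p]: node latency (s) and energy (J) when partitioning
    after task p; l_max is the latency constraint (1 s in the
    experiments of Figure 14a).
    """
    p_min_lat = min(lat, key=lat.get)               # Equation (8)
    p_min_en = min(en, key=en.get)                  # Equation (9)
    feasible = [p for p in lat if lat[p] <= l_max]  # Equation (10)
    p_con = min(feasible, key=lambda p: en[p]) if feasible else None
    return p_min_lat, p_min_en, p_con
```

Running this selection once per image size reproduces the kind of size-dependent optima shown in Figure 11.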
We have observed that changing the size of the input image triggers a chain of effects that ultimately alters the latencies and energies of the system. First, changes in image size affect the inter-task data amount along the processing chain (Figure 5). This is because some processing tasks are sensitive to image size, its content, or both (Table 1). The data amount at the partitioning point directly affects communication latency and energy, which are consequently data-dependent (Figures 7b and 8b). In turn, the processing latency is also affected, due to its dependency on the input data of the processing tasks (Figure 7a). Finally, due to the relationship between time and energy in the processing platform, the processing energy varies with processing time (Figure 8a). Figure 11 presents three cases where the partitioning solutions change due to changes in the image size. The change in partitioning solution can be related to how the processing and communication components contribute to partition time and energy. In Figures 7 and 8, we see that the processing and communication behaviors at the partition points are completely different. In early partitions, where the data amount at the partition point is larger (Figure 5), there is little accumulated processing time but a lot of communication; near the end of the system, there are small data amounts to be sent but a lot of accumulated processing time. Due to this difference in data dependency, the best partitioning solution in latency, energy or both may vary with the size of the image. To demonstrate this, Figures 13 and 14 show the latency and energy contributions of the processing and communication components of partitions 4 and 9 as a function of the image size (cf. Figure 12a-c).
The communication time and energy of partition 9 are not visible because the data size at partition 9 is very small, reaching a maximum of only 288 Bytes compared to 1.53 MBytes at partition 4. The time needed to send the data of partition 9 is therefore on the order of microseconds, much lower than that of partition 4, which is on the order of seconds. The same observation applies to the communication energy.
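The gap can be checked with a back-of-the-envelope serialization time. The bitrate below is an illustrative effective throughput chosen for the example, not a value from the communication model of Section 3.2, which also accounts for protocol overhead:

```python
def tx_time_s(num_bytes, bitrate_bps):
    """Ideal serialization time, ignoring protocol overhead."""
    return num_bytes * 8 / bitrate_bps

# Assumed effective bitrate of 10 Mbit/s for illustration:
t9 = tx_time_s(288, 10_000_000)        # partition 9: ~0.23 ms
t4 = tx_time_s(1_530_000, 10_000_000)  # partition 4: ~1.2 s
```

Even before protocol overhead, the roughly 5300:1 data ratio between partitions 4 and 9 translates directly into the same ratio in ideal transmission time, which is why partition 9 is invisible at the scale of the plots.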

Impact on Optimal Partition Point Selection
From Figure 13a,b, we see that partition 4 is dominated by communication, while partition 9 is dominated by processing. In partition 4, there is a large data amount to be sent, requiring more time and energy than the processing performed up to that point. The processing dominance of partition 9, in contrast, is due to the small data amount to be sent, while a lot of processing time and energy has already been spent by this final stage of the processing. Figure 14a shows the latency perspective, with the latency constraint marked with a purple line (1 s). Due to communication dominance, partition 4 meets the time constraint only from size 292 × 436 × 3 to 1144 × 1720 × 3, whereas partition 9 meets it from size 292 × 436 × 3 to 2280 × 3432 × 3. Figure 14b shows the energy perspective; this time we have marked the partition points that do not meet the delay constraint with an "x". We see that, although a lot of processing energy is expended in partition 9, due to the delay constraint it is the best partitioning solution from size 1144 × 1720 × 3 to size 2280 × 3432 × 3.
Both the processing and communication time and energy are dependent on the data, so the best partitioning solutions are closely related to the image size. Hence, the best partitioning solution is different for different image sizes. In addition, the type of optimization problem also influences the optimal solution.

How to Make Partition Decisions Based on Image Size?
In this work, we have analyzed the impact of the size and content of the image on the deployment of system tasks. We have shown that the size of the image influences partitioning decisions and studied why, which is our main contribution. This section distills this knowledge so that designers can apply it to achieve more effective node offloading.
Our study shows that the best implementation depends on the details of the application requirements (such as minimum image resolution, power and energy budgets) and on the communication technology and protocol used. Because the optimal partitioning choice is sensitive to all these parameters, we advocate a systematic and quantitative exploration of the design space (Figure 15).
Partitioning analysis:
1. Select or create a typical image of the application in question. Subsequently, scale the image in order to have a dataset with different resolutions.
2. Run the system for each of the images in the dataset and:
• Measure the amount of data at the output of each of the tasks in the system;
• Measure the execution time and processing energy at the output of each of the tasks in the system;
• Measure the communication time and energy at the output of each of the tasks in the system, employing the communication framework [31].
3. If the system is not purely sequential, serialize it and assign the times, energies and amounts of data to the composite tasks.
4. Apply Equations (6) and (7) to calculate the time and energy of the partitions.
5. Apply the optimization objective in order to find the optimal partitioning point for each image size.
Figure 15. Example of partitioning analysis. In this example, we evaluate partition 3 with respect to a set of four images which vary in size. The partition analysis must be done on all possible partition points of the system.
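Steps 4 and 5 above can be sketched as a cumulative-sum computation. Argument names are ours; the per-task processing costs come from the measurements of step 2, and the per-partition communication costs from the communication framework [31]:

```python
def partition_costs(t_proc, e_proc, t_comm, e_comm):
    """Equations (6) and (7): the node latency/energy at partition p is
    the cumulative processing cost of tasks 1..p plus the cost of
    communicating the data produced at p (serialized system).

    t_proc[i], e_proc[i]: processing time (s) and energy (J) of task i+1.
    t_comm[i], e_comm[i]: time/energy to transmit the output of task i+1.
    """
    lat, en, tp, ep = {}, {}, 0.0, 0.0
    for i in range(len(t_proc)):
        tp += t_proc[i]
        ep += e_proc[i]
        lat[i + 1] = tp + t_comm[i]   # Equation (6)
        en[i + 1] = ep + e_comm[i]    # Equation (7)
    return lat, en

# Step 5 is then, e.g., min(lat, key=lat.get) for the latency objective.
```

Repeating this for every image size in the dataset yields the per-size cost curves from which the optimal partition is read off.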
To summarize, the latency and energy consumption of both processing and communication depend on the image size and the inter-task data amount, which determine the optimal partitioning point for a given scenario. These results show that the inter-task data amount is of major importance in the design space exploration of data-intensive IoT nodes, such as the one introduced in our recent work [35].

Conclusions
We have applied intelligence partitioning with the purpose of offloading an image processing-based IoT node, reducing both node energy and latency. This work shows that the optimal partitioning point depends heavily on the input image size, but is hardly affected by the number of objects in the image. In addition, the optimal partitioning point also depends on the specific objective function.
Image size affects the partitioning point for two reasons. First, more tasks in the processing chain operate on pixels than on object-level data; the cumulative cost of pixel processing is therefore much larger than that of object processing, making size the more relevant factor. Second, at the partition point, data communication dominates over processing in both the energy and time consumed at the node. This enhances the relevance of image size, because the largest amounts of data to be sent are found at partitions early in the processing chain. Moreover, due to the data-reduction effect of the algorithms, the amount of data handled in the object processing stage is much lower than in the pixel stage, so sending it is less costly, which makes the number of objects less important.