Implementation of Multi-Exit Neural-Network Inferences for an Image-Based Sensing System with Energy Harvesting

Abstract: Wireless sensor systems powered by batteries are widely used in a variety of applications. For applications with space limitations, sensor size has been reduced, which limits battery energy capacity and memory storage size. A multi-exit neural network helps overcome these limitations by filtering out data without objects of interest, thereby avoiding computation of the entire neural network. This paper proposes implementing a multi-exit convolutional neural network on the ESP32-CAM embedded platform as an image-sensing system with an energy constraint. The multi-exit design saves energy by 42.7% compared with the single-exit condition. A simulation result, based on an exemplary natural outdoor light profile and measured energy consumption of the proposed system, shows that the system can sustain its operation with a 3.2 kJ (275 mAh @ 3.2 V) battery by sacrificing only 2.7% of accuracy.


Introduction
Dramatic advances in computation require an increasing amount of data to analyze, and sensors have become essential devices for collecting data from the physical world. Electronic sensing systems have been utilized in a variety of applications, including biomedical observation, civil engineering monitoring, and energy resource detection [1][2][3][4][5][6][7][8]. Sensors have shrunk to fit a greater number of applications, and they employ batteries so that they can be placed easily without an external power connection [9][10][11][12][13][14][15][16][17]. As an example, three AAA-sized batteries with a capacity of 2400 mAh can continuously power a wireless sensor for 2.4 months [17]. Three approaches are used to maximize system lifetime for a given battery capacity. First, a sensing system uses duty-cycled operation between active and sleep modes [14,16]. This reduces the total energy consumption (equivalently, the average power consumption) by keeping the system in a low-power sleep mode during the long periods in which it does not need to be fully operational. Second, low-power circuits are developed for both active and sleep modes [18][19][20][21][22][23][24][25]. Lastly, the system includes an energy harvester to recharge the connected battery using ambient energy [17]. However, the harvested power is typically lower than the power consumption in active mode.
Images are among the most popular data for analyzing a target. Low-power wireless image systems have been studied to operate for an extended time at a given battery capacity [25]. As with other data (e.g., acceleration), the system faces a trade-off between sampling frequency on one side and power consumption and data storage size on the other. A lower sampling frequency saves power and storage but increases the chance of missing important data. To reduce storage even at a high sampling rate, image-based sensors have recently employed machine learning algorithms to evaluate whether images include objects of interest [26]; only the useful images are stored in memory. For image recognition, a convolutional neural network (CNN) is widely used [27]. It generates a classification label by convolving image data with trained weights through multiple layers. The label indicates the type of object in the original image. However, a CNN requires heavy computation (millions of multiply-accumulate (MAC) operations [27]) and thus high power consumption, which is critical for a battery-powered image system with energy harvesting.
There has been continuous effort to reduce energy consumption in neural network computation. To overcome energy shortage in intermittent computing systems with energy harvesting, software compression and hardware acceleration have been proposed for embedded systems [28]. However, this approach is not suitable for tasks that require instant outcomes, since it completes one inference over multiple energy cycles. Instead, multi-exit CNNs were recently proposed [27,29], as shown in Figure 1. A multi-exit CNN has multiple paths that generate labels from an image. Each path has a different depth and thus a different time cost. A shorter path provides lower accuracy, but it uses the entropy of an inference result to check its confidence. This approach has been applied to millimeter-scale systems with energy harvesting and a µAh-level battery [30], but the inference takes up to 4.1 min. Zeng et al. [31] applied a multi-exit CNN to the industrial internet of things to satisfy various timing requirements for real-time processing, but power reduction was not a primary consideration.
In this paper, we propose an image-based inference system operating on a commercial ESP32-CAM microcontroller. Periodically, the system takes an image with a connected camera, runs a three-exit CNN on the image, and obtains a label. The three exits achieve accuracies of 60.5%, 70% and 76% on the CIFAR-10 dataset. The system detects the battery energy level with its built-in ADC at each branch and selects the path accordingly. At the end of each of the two early exits, it calculates entropy to determine whether a deeper path is necessary. The system reduces energy consumption by 42.54% at a 240 MHz clock frequency. A simulation, based on an exemplary natural outdoor light profile [17] and measured energy consumption of the proposed system, achieves an accuracy of 72.1% with an entropy threshold of 1.9 and a battery energy threshold of 1500 J. It requires a battery energy of 3.2 kJ, which can be supported by a battery with a size of 43 mm × 14 mm × 14 mm [32].
The rest of this article is arranged as follows. Section II introduces a target system. Section III proposes the CNN architecture, and Section IV shows the hardware platform. Sections V and VI show measurement results and evaluation results for long-term use, respectively. Finally, Section VII concludes this paper.

Target System
We target an image-based system that can be attached to a static location (e.g., a tree) to monitor objects. Figure 2 shows the software and hardware diagram of the target system. To detect specific objects (e.g., wild animals), the system alternates between active and sleep modes, as shown in Figure 2a. Once a timer moves the system from sleep to active mode, the system takes an image using a camera and categorizes its contents with a multi-exit CNN. If the captured image contains an object of interest, the system stores the picture in data storage. Otherwise, it goes back to sleep mode without storing the image. Figure 2b shows the hardware structure of the target system. A microprocessor controls the transition between active and sleep modes, image capture and processing, and data transfer. Data storage such as an SD card stores the selected images and the CNN coefficients. The entire system is powered by a battery. An energy harvester (e.g., a solar panel) recharges the battery to extend system lifetime.

Proposed Multi-Exit CNN
Figure 3 shows the proposed multi-exit CNN. We added more layers to the model of [27] and modified some of its parameters for higher accuracy. It processes images with 3 input channels (red, green, and blue). Each channel has a resolution of 32 × 32 and an 8-bit color depth, identical to the CIFAR-10 dataset. A convolution layer (CVx) convolves the input feature map with several filters and generates the output feature map. A ReLU function (RLx) activates the output feature map by replacing negative values with 0. Some ReLU layers are followed by max pooling (PLx). At the end of each exit, fully connected layers (FCx) generate the 10 final outputs. The CNN has 3 exits, marked as early exit1, early exit2 and main exit. At two branching points (after RL2 and after PL3), the system measures the battery voltage and finds the consumed battery energy. If the consumed energy is higher than a threshold (BAT_THx), the narrower path is selected (PLa1 or PCb1). The model is trained with the 3 exits together, weighted equally, and the average loss of the 3 exits is minimized. At the output of the two early exits, the system checks the entropy of the outputs as a confidence level. If the entropy is larger than a threshold value (ENT_TH), the result is more likely to be unreliable, and the CNN returns the processing to a deeper path (CV3 or CV4). The entropy is calculated with Equations (1) and (2).
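The exit-selection logic of the proposed CNN can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the exit outputs are normalized with a softmax and that the confidence measure is the Shannon entropy H = -Σ p_i log2(p_i) in bits. Base 2 is an assumption, chosen because the maximum entropy over 10 classes is then log2(10) ≈ 3.32, which is compatible with thresholds of up to 3.1. The names `softmax`, `entropy_bits`, `select_exit` and the callables in `exit_fns` are hypothetical.

```python
import math

def softmax(outputs):
    """Normalize raw exit outputs into probabilities."""
    peak = max(outputs)                       # subtract the peak for numerical stability
    exps = [math.exp(z - peak) for z in outputs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def select_exit(exit_fns, image, consumed_j, bat_th, ent_th):
    """Return (label, exit_index).

    exit_fns: callables ordered from shallowest to deepest (hypothetical stand-ins).
    bat_th:   one consumed-energy threshold per early exit, in joules.
    """
    last = len(exit_fns) - 1
    for i, run_exit in enumerate(exit_fns):
        if i < last and consumed_j <= bat_th[i]:
            continue                          # enough energy margin: skip this early exit
        probs = softmax(run_exit(image))
        if i == last or entropy_bits(probs) <= ent_th:
            return probs.index(max(probs)), i
        # entropy above threshold: fall through to the next, deeper path

# Toy exits: exit1 is undecided, exit2 is confident in class 0, main exit picks class 1.
toy_exits = [
    lambda img: [0.0] * 10,
    lambda img: [10.0] + [0.0] * 9,
    lambda img: [0.0, 10.0] + [0.0] * 8,
]
low_energy = select_exit(toy_exits, None, consumed_j=2000, bat_th=[1500, 1500], ent_th=1.9)
high_energy = select_exit(toy_exits, None, consumed_j=100, bat_th=[1500, 1500], ent_th=1.9)
```

In the toy run, the low-energy case rejects early exit1 (uniform outputs, maximum entropy) and accepts early exit2, while the high-energy case goes straight to the main exit.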
J. Low Power Electron. Appl. 2021, 11
Figure 3. Proposed multi-exit CNN diagram.
For energy efficiency, the system employs fixed-point data and coefficients. Figure 4a shows the distribution of the coefficients. The coefficients are mainly distributed between −1.0 and 1.0, implying that the fractional part matters more than the integer part. Figure 4b,c shows the accuracy and the coefficient storage size, respectively, for 8-bit, 16-bit and double-type data. Compared with the double-type data, the 8-bit integer format reduces the storage size by 86.8% at the cost of a 4.3% accuracy drop.
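A minimal sketch of 8-bit fixed-point quantization for coefficients that lie mostly in [−1.0, 1.0] is given below. The exact fixed-point format is not stated in the text, so the split into 1 sign bit and 7 fractional bits (often called Q0.7 or Q7) is an assumption, and the function names are hypothetical.

```python
def quantize_q7(value, frac_bits=7):
    """Map a real coefficient to a signed 8-bit integer (assumed Q7 layout)."""
    scale = 1 << frac_bits                    # 128 steps per unit
    q = round(value * scale)
    return max(-scale, min(scale - 1, q))     # saturate to the int8 range [-128, 127]

def dequantize_q7(q, frac_bits=7):
    """Recover the approximate real value from the stored integer."""
    return q / (1 << frac_bits)

# Ideal storage saving when every 64-bit double becomes one 8-bit integer.
ideal_saving = 1 - 8 / 64                     # 0.875

approx = dequantize_q7(quantize_q7(0.3))      # 0.296875, within half a step of 0.3
```

The ideal saving of 87.5% is close to the reported 86.8%. With 7 fractional bits the worst-case rounding error inside the representable range is half a step, that is 1/256.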

Hardware Implementation
Figure 5 shows the implemented hardware for the proposed system, consisting of an ESP32-CAM module, an OV2640 camera module (1632 × 1232 resolution), and an SD card (≥128 MB). Table 1 details the ESP32-CAM module [33]. The processor of the ESP32-CAM module operates at a clock frequency of up to 240 MHz. At the maximum frequency, the entire operation, including image capture and CNN inference, is processed in 35.7 s. The module takes 5 or 3.3 V as a supply voltage.
Figure 5. Hardware labels: image sensor, microcontroller, SD card.
Figure 6 shows the current consumption in active and sleep modes. The sleep mode reduces power consumption by 93.7% compared with the active mode. Figure 7 depicts the data flow between external storage and SRAM. After each layer is processed, the coefficients of that layer are replaced by those of the next layer. All the inputs and outputs of the layers are kept in memory. Thus, the CNN processing can roll back to a former layer before branching when the entropy is larger than the threshold.
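The layer-by-layer data flow, with coefficients swapped in one layer at a time and all layer outputs kept for rollback, can be illustrated with a toy sketch. Everything here is hypothetical: `run_layers`, `load_coeffs` and the additive toy layers only stand in for the real SD-card reads and CNN layers.

```python
def run_layers(layers, load_coeffs, x, cache, start=0):
    """Run layers[start:], holding only one layer's coefficients at a time."""
    for idx in range(start, len(layers)):
        coeffs = load_coeffs(idx)             # overwrites the previous layer's coefficients
        x = layers[idx](x, coeffs)
        cache[idx] = x                        # every layer output stays in memory
    return x

loads = []                                    # records which coefficient sets were fetched

def load_coeffs(idx):
    """Stand-in for reading one layer's coefficients from external storage."""
    loads.append(idx)
    return idx + 1

layers = [lambda x, c: x + c] * 4             # each toy layer adds its coefficient

cache = {}
trunk_out = run_layers(layers[:2], load_coeffs, 0, cache)        # layers 0 and 1
# Suppose the early exit is rejected by the entropy check: resume the deeper
# path from the cached branch-point output instead of recomputing the trunk.
main_out = run_layers(layers, load_coeffs, cache[1], cache, start=2)
```

Because the branch-point output is cached, the rollback fetches only the coefficients of the deeper layers, and each coefficient set is read once.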

Figure 6. Current consumption in active and deep-sleep modes.
Figure 9 shows the measured results of the total inference at different clock frequencies and exits; the maximum, minimum and average values from 20 samples are shown. In Figure 9a, the average current consumption increases from 35.2 to 72.9 mA as the frequency increases from 80 to 240 MHz. In Figure 9b, the processing time decreases roughly linearly as the clock frequency increases; the 240 MHz clock shortens the processing time by around 65%. Figure 9c shows the energy consumption for each exit at different frequencies. Compared with the main exit, early exit1 saves energy by 83.5%, and early exit2 saves energy by 56.3%. The highest frequency (240 MHz) yields the lowest energy consumption (1.3, 3.5 and 8.2 J) for all 3 exits. This means that the energy saved by the shorter processing time at a higher frequency outweighs the energy added by the higher power consumption, because part of the power consumption does not depend on the clock frequency.
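The frequency trade-off follows from E = V · I · t. The short calculation below uses the reported currents and time saving, under two assumptions that the text does not state explicitly: the supply voltage is the same at both clock frequencies, and the 65% time saving is measured relative to the 80 MHz case. The function name is hypothetical.

```python
def relative_energy(i_low_ma, i_high_ma, time_saving):
    """Energy at the high clock divided by energy at the low clock.

    With a constant supply voltage, E = V * I * t, so the voltage cancels
    and the ratio is (current ratio) * (time ratio).
    """
    return (i_high_ma / i_low_ma) * (1.0 - time_saving)

# 35.2 mA at 80 MHz, 72.9 mA at 240 MHz, about 65% shorter processing time.
ratio = relative_energy(35.2, 72.9, 0.65)
```

The ratio comes out near 0.72: the current roughly doubles, but the time drops to about a third, so the 240 MHz run uses less energy overall, which agrees with the measured trend.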
Figure 10 shows the measured CNN inference with 10,000 images from the CIFAR-10 dataset. Figure 10a,b shows the entropy distributions from early exit1 and early exit2, respectively. In both figures, the average entropy of the correct inferences is lower than that of the incorrect inferences, which supports the proposed confidence check at the output of the early exits. The proposed system treats lower entropy as higher confidence in the inference result at an exit. Figure 10c,d shows the fraction of accepted results for different values of ENT_TH and the accuracy among them. Note that this accuracy does not include the samples discarded by the entropy check. At ENT_TH = 0.1, only 3% (10%) of the results are accepted, with an accuracy of 97% (99%), at early exit1 (early exit2). As ENT_TH increases, more results are accepted at the early exit, resulting in lower accuracy. When ENT_TH is higher than 3.1, all results are accepted, and the accuracy becomes equal to that of a single path (early exit1 or early exit2). Figure 10e shows the overall accuracy using all 3 exits across ENT_TH. Note that inference results are obtained for all the samples, unlike in Figure 10c. As ENT_TH increases, the overall accuracy decreases from the main-exit-only accuracy to the early-exit1-only accuracy. For example, the accuracy is 72% at ENT_TH = 1.4. Figure 10f shows that the average processing time is 20.7 s at ENT_TH = 1.4.
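The acceptance ratio and the accuracy among accepted results at one early exit can be computed as sketched below. The function name and the five-sample toy data are hypothetical and are not taken from the measurements; a result is treated as accepted when its entropy does not exceed the threshold.

```python
def acceptance_stats(entropies, correct, ent_th):
    """Return (acceptance ratio, accuracy among accepted results)."""
    accepted = [ok for h, ok in zip(entropies, correct) if h <= ent_th]
    ratio = len(accepted) / len(entropies)
    # Accuracy is undefined when nothing is accepted.
    accuracy = sum(accepted) / len(accepted) if accepted else None
    return ratio, accuracy

# Toy data: one entropy value and one correctness flag per inference.
entropies = [0.05, 0.5, 1.2, 2.5, 3.2]
correct = [True, True, False, True, False]

strict = acceptance_stats(entropies, correct, 0.1)   # few results, high accuracy
loose = acceptance_stats(entropies, correct, 3.3)    # everything accepted
```

The toy sweep reproduces the qualitative behavior described for the measurements: a strict threshold accepts few results with high accuracy, and a loose threshold accepts everything at the single-exit accuracy.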
Compared with a single-exit CNN with only the main exit, the proposed multi-exit CNN system reduces the processing time by 42.5% and thus reduces energy consumption by the same proportion, at the cost of an accuracy loss of 2.9%.
Figure 10. Measured CNN inference with 10,000 images from the CIFAR-10 dataset: (a) entropy distribution for early exit1, (b) entropy distribution for early exit2, (c) acceptance ratio and accepted accuracy across ENT_TH for early exit1, (d) acceptance ratio and accepted accuracy across ENT_TH for early exit2, (e) overall accuracy over 3 exits across ENT_TH, and (f) average processing time for each image across ENT_TH.

Simulation Results for Long Term Operation
To evaluate the energy saving of the proposed multi-exit CNN system for a target system, we perform simulations using MATLAB, based on an exemplary natural outdoor light profile [17], measurement results of an energy harvester [17] and the proposed system. The light profile is obtained from 5 HOBO MX2202 light sensors in the Beechwood Farms Nature Reserve of Audubon Society of Western Pennsylvania in Allegheny County, Pennsylvania.
The charging power of the energy harvester is measured as shown in Figure 11. The energy harvesting system includes a solar panel (Adafruit 200) and an energy harvesting chip with power management (TI BQ25570). The output of the energy harvester is connected to the ESP32 module as a power supply. A Keithley 2401 source measurement unit measures the harvested power. The simulation emulates a scenario in which the proposed system wakes up every 3 min, performs a CNN inference, and then enters sleep mode again. To suppress the sleep-mode power consumption and achieve sustainable system operation, we include models of a low-power timer (TI TPL5111) and a switch (ZVN2110A). In sleep mode, the timer consumes 35 nA while counting for a fixed period, and the leakage current of the switch is only 30 pA. When the timer reaches a threshold, the switch is turned on, and the system enters the active mode. Thus, the sleep-mode current of the system is taken as 35 nA, mainly due to the low-power timer.
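The energy balance behind such a simulation can be sketched in a few lines. This is not the authors' MATLAB model: the function name, the 3.3 V supply used for the sleep-mode term, the toy harvest profile, and the simplification that harvesting and consumption are settled once per 3 min cycle are all assumptions.

```python
def worst_deficit(harvest_w, active_j, period_s=180.0, sleep_a=35e-9, supply_v=3.3):
    """Largest energy shortfall below a full battery, in joules.

    harvest_w: average harvested power in each wake-up period (W).
    active_j:  energy of the inference performed in each period (J).
    """
    deficit = 0.0
    worst = 0.0
    for p_harvest, e_active in zip(harvest_w, active_j):
        used = e_active + sleep_a * supply_v * period_s   # inference plus sleep
        gained = p_harvest * period_s
        deficit = max(0.0, deficit + used - gained)       # cannot charge above full
        worst = max(worst, deficit)
    return worst

# Toy profile: two dark periods, then two bright ones, always using the main exit (8.2 J).
needed_j = worst_deficit([0.0, 0.0, 0.1, 0.1], [8.2] * 4)
```

The worst deficit is the battery capacity needed to avoid a power outage for that profile. The sleep term is about 21 µJ per cycle, which is negligible next to the 1.3 to 8.2 J consumed by an inference.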

Figure 11. Testing setup for the proposed system, with the solar panel, energy harvester, ESP32-CAM module, and source meter.
Figure 12 shows the simulated long-term operation. Figure 12a shows the accuracy across BAT_TH1 and BAT_TH2 when ENT_TH is set to 2.0. As a wider margin is given to the energy budget (higher BAT_THx), the main exit is selected more frequently, resulting in higher accuracy. At BAT_TH1 = 1500 J and BAT_TH2 = 1500 J, the accuracy is 72.11%, which is 2.66 percentage points lower than the main-exit-only approach (74.77%). Figure 12b shows the battery capacity required to avoid a power outage. Figure 12c shows the average processing time per wake-up. At BAT_TH1 = 1500 J and BAT_TH2 = 1500 J, a battery capacity of 3050 J is required, and the average processing time is 0.4 min. This can be implemented with a rechargeable battery (43 mm × 14 mm × 14 mm) [32]. Figure 12d shows the distribution of the selected path for each inference at BAT_TH1 = 1500 J and BAT_TH2 = 1500 J. Among the 9569 inferences, 59.8% are processed by the two early exits to save energy without skipping any inference.
Of the early exit1 results, 17% exceed the entropy threshold and are processed again with early exit2, and 10.2% of the early exit2 results are re-processed by the main exit.
Figure 13a shows the accuracy across ENT_TH from 0.1 to 3.1 at BAT_TH1 = 1500 J and BAT_TH2 = 1500 J. The accuracy decreases from 74.77% to 71.95% as more early exits are used. At ENT_TH = 1.9, the required battery capacity decreases from 19.5 to 3.2 kJ owing to the reduced re-calculation, as shown in Figure 13b. In Figure 13c, the average processing time also decreases from 0.69 to 0.40 min.
Figure 14 shows the battery energy over time. With the energy harvester, the system can regain the consumed energy in duty-cycled operation and achieve energy autonomy. The worst case consumes 3163 J. The operation of the system can be sustained by a rechargeable battery with an energy capacity as high as 4.6 kJ [32]. This demonstrates the feasibility of such systems.
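The battery figures can be cross-checked with a short unit conversion. The helper name is hypothetical, and the duration estimate assumes that all 9569 inferences are spaced exactly 3 min apart, which the text implies but does not state.

```python
def mah_to_joules(capacity_mah, voltage_v):
    """Convert a battery rating in mAh at a given voltage to joules."""
    return capacity_mah / 1000.0 * 3600.0 * voltage_v

# 275 mAh at 3.2 V, the battery quoted for sustained operation.
energy_j = mah_to_joules(275, 3.2)

# Simulated time span if the 9569 inferences occur one every 3 minutes.
days = 9569 * 3 / (60 * 24)
```

The conversion gives about 3168 J, which covers the 3163 J worst case and is below the 4.6 kJ capacity cited for the rechargeable battery. Under the spacing assumption, the simulation spans roughly 20 days.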

Conclusions
This paper demonstrates the feasibility of implementing a CNN in a battery-powered sensing system. By using multiple exits with different depths, the proposed system analyzes captured images with 42.5% less time and energy than a conventional single-exit CNN, at the cost of a 2.9% accuracy drop. Simulation results, based on an exemplary natural outdoor light profile and measured energy consumption of the proposed system, show that the system can sustain its operation with a 3.2 kJ (275 mAh @ 3.2 V) battery by sacrificing only 2.7% of accuracy.
