Edge Devices for Internet of Medical Things: Technologies, Techniques, and Implementation

The health sector is currently experiencing a significant paradigm shift. The growing number of elderly people in several countries, along with the need to reduce healthcare costs, creates a pressing need for intelligent devices that can monitor and diagnose the well-being of individuals in their daily life and provide the necessary alarms. In this context, wearable computing technologies are gaining importance as edge devices for the Internet of Medical Things. Their enabling technologies are mainly related to biological sensors, computation on low-power processors, and communication technologies. Recently, energy harvesting techniques and circuits have been proposed to extend the operating time of wearable devices and to improve their usability. This survey paper provides an overview of technologies, techniques, and algorithms for wearable devices in the context of the Internet of Medical Things. It also surveys the various transformation techniques used to implement those algorithms on fog computing and IoT devices.


Introduction
The need for a better quality of life has led to revolutions in many aspects of human life. The healthcare system is considered vital for modern society. In the legacy system, patients have to move to a healthcare center for treatment or diagnosis. Additionally, patients with chronic diseases, such as cardiovascular disease and diabetes, have to be checked on a regular basis. However, access to healthcare services for people in remote areas remains a challenge in the legacy system. The 2014 statistics from the World Health Organization (WHO) emphasize that there are over 422 million diabetes patients worldwide. In 2019, over 1.5 million people worldwide lost their lives due to diabetes, which is a cause for real concern [1]. International communities and national governments have adopted e-health and mobile health systems to improve healthcare services and make them affordable and accessible for everyone [2,3]. Figure 1 shows the three pillars of the e-health system.
The miniaturization of electronic devices, along with advances in communication technologies, has enabled the modernization of the healthcare system through, among others, telemonitoring of patients, health-record tracking, automatic emergency calls, and telediagnosis. For instance, it has been observed using empirical data that type-1 diabetes patients are threatened by hypoglycemia, which is a principal cause of sudden nocturnal death. Hypoglycemia can be detected by continuous monitoring of the glucose level and the electrocardiogram (ECG) signal: if the QT interval of the ECG signal is larger than 445 ms and the glucose level is below 3.5 mmol/L, then the patient is at a high risk of sudden death [4]. Wearable technologies, such as smart watches and wristbands, are some of the enabling technologies for e-health, as shown in Figure 2. An e-health system is generally composed of, among others, wireless sensors that form a body area network (BAN), wearable devices to collect data from the sensors, a gateway for internet access, and a cloud server to process and store e-health data. The server is usually located in health clinics or hospitals. Wearables are battery-powered devices, which places a stringent limit on their computational capability. To further extend the operating time of wearables, energy-harvesting techniques and wireless power transfer, along with duty cycling, have been suggested in many published reports [5]. Such systems are composed of a communication module, sensors, microcontrollers, an energy management unit, a battery, and an energy scavenging or receiver unit.
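As an illustration of such a rule-based alarm, the two thresholds from [4] can be combined in a simple check; the function name and structure are our own sketch, not code from the cited work:

```python
# Illustrative sketch of the sudden-death risk rule reported in [4]:
# a prolonged QT interval combined with hypoglycemia.

QT_THRESHOLD_MS = 445.0   # QT interval threshold, in milliseconds
GLUCOSE_THRESHOLD = 3.5   # hypoglycemia threshold, in mmol/L

def high_risk(qt_interval_ms: float, glucose_mmol_l: float) -> bool:
    """Return True when both risk conditions from [4] hold simultaneously."""
    return qt_interval_ms > QT_THRESHOLD_MS and glucose_mmol_l < GLUCOSE_THRESHOLD
```

In a deployed wearable, such a check would run continuously on the glucose and ECG streams and trigger an alarm when it returns true.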
The Internet of Medical Things (IoMT) is a system in which medical devices and healthcare units are interconnected using IoT technology [6]. Wearable medical devices are cardinal to the IoMT, as they collect vital biological data and deliver them to healthcare providers. The IoMT is a system of systems in which a multitude of technologies, platforms, and algorithms are used all the way from the sensor layer to the cloud computing layer.
This contribution aims at reviewing the latest breakthroughs in the IoMT by focusing on the following domains:
• Semiconductor and multicore technology.
• Energy harvesting and transfer.
• Algorithm transformation techniques.
• Implementation of deep neural networks on edge devices.
• User-centered design of IoMT.
The paper is organized as follows. Section 2 compares our work with existing review works. Section 3 describes the technologies, algorithms, and paradigms used in designing energy-efficient IoMT edge devices. Section 4 focuses on energy harvesting sources and solutions for IoMT low-end nodes. Section 5 presents platforms and transformation techniques for the energy-efficient implementation of DNN algorithms on IoMT edge devices. Section 6 presents approaches for a user-centered design of IoMT. Finally, Section 7 concludes the paper. Table 1 summarizes the list of abbreviations used throughout the paper. The list of symbols, their meanings, and typical values are shown in Table 2.

Related Works
Numerous survey papers on e-health have been published recently [7][8][9][10][11][12][13][14][15]. A systematic review of wearable sensors to monitor human activities is the focus of [7]. The work in [8] surveyed sensors, communication protocols, and hardware platforms used in wearable and unobtrusive devices. The authors discussed the following short-range communication protocols: ZigBee, NFC, UWB, and WiFi. However, the security and the algorithms used in wearable devices were not considered. The authors of [9] summarized the research work in wireless communication (WAN, LAN, and BAN) and briefly discussed energy harvesting for wearables. However, energy efficiency cannot be achieved by looking at wireless communication only; innovation in sensor technology and the low-power design of integrated circuits for wearables, from the system level down to the layout, should also be considered. On the same topic, in [10], wireless communication protocols for a body area network were compared based on five metrics: transmission range, latency, power consumption, security and privacy, and data rate. It was concluded that ZigBee and low-power Bluetooth are the most promising candidates for BAN. The design requirements of biomedical wearables were the focus of the work presented in [11]. The authors identified factors that should be considered when developing biomedical wearables, including cost, features of the biomedical signal, human factors, and the ecosystem. The study focused on hardware requirements for four sensors (PPG, EMG, microphone, and IMU) and two embedded systems (Arduino and Raspberry Pi). The work lacks discussion of the algorithms used to process big IoMT data, energy harvesting, and power management techniques, which are vital for wearables. A few shortcomings of the previous survey work have been addressed in [12]. The authors reviewed the existing technologies, algorithms, and architectures for the processing of medical data.
In particular, the authors described the architecture, technologies, and applications of the IoMT, cloud computing IoMT, and edge-cloud IoMT. The IoMT technologies discussed are RFID, wireless sensor networks, and middleware. The authors considered AI a technology of the cloud computing layer; however, the trend in the IoT is to move AI to the edge and fog layers. The work of [13] reviewed existing IoMT monitoring systems and the machine learning algorithms used at the edge layer. The application of edge, fog, and cloud computing in e-health was the focus of the review work described in [15]. The authors surveyed machine learning algorithms for monitoring, classification of biomedical signals, and prediction of patient well-being. They further analyzed the pros and cons of edge intelligence for an IoT healthcare system. However, hardware implementation techniques for machine learning algorithms that target reliability and power efficiency were overlooked. The survey work reported in [14] summarized the latest research breakthroughs in technologies, services, and applications of health IoT (HIoT) systems. Furthermore, the authors outlined open issues in adopting HIoT; among the listed challenges are energy consumption, interoperability, security and privacy, and scalability. The technologies considered by the authors are identification standards, wireless communication standards, and localization services. Table 3 compares our work with the existing survey papers in this field.

Technologies for IoMT Edge Devices
For years, the CMOS transistor was the subject of aggressive scaling. The aim was to reduce the feature size of the transistor to gain performance, lower power consumption, and increase the circuit complexity, among other benefits.
Let S be the scaling factor. According to Dennard's scaling law, as the clock frequency increases by S, the switched power dissipation and the area per gate decrease by S². For decades, Dennard's law was used as a guideline to scale the supply voltage of CMOS technology. According to the ITRS roadmap, 2009 edition, the scaling factor S was 1.44 per technology cycle. However, for gate lengths below 90 nm, the supply voltage and the threshold voltage no longer scale at the same pace as the gate length [16].
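The law can be illustrated numerically with the ITRS figure S = 1.44 for one technology cycle; the values below are normalized, not measured data:

```python
# Dennard scaling, one technology cycle: frequency scales up by S while
# per-gate switched power and area scale down by S^2 (normalized units).

S = 1.44  # ITRS (2009 edition) scaling factor per technology cycle

def dennard_scale(freq=1.0, power_per_gate=1.0, area_per_gate=1.0):
    """Apply one cycle of Dennard scaling to normalized device parameters."""
    return {
        "frequency": freq * S,                    # increases by S
        "power_per_gate": power_per_gate / S**2,  # decreases by S^2
        "area_per_gate": area_per_gate / S**2,    # decreases by S^2
    }
```

After one cycle, frequency rises by 44% while per-gate power and area shrink to roughly 48% of their previous values.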
In nanometer-scale design, the planar bulk CMOS transistor reached a dead-end [17,18]. The main factors that led to the abandonment of bulk CMOS are the short-channel effect and the leakage current. At a 70 nm channel length, the active-mode leakage current is responsible for over 40% of the total power consumption [19]. Dark silicon has also emerged as a design challenge in the nanometer regime; several design techniques to circumvent dark silicon issues have been elaborated upon in [20,21].
The FinFET transistor has been proposed as a substitute for the bulk CMOS transistor [22]. In the double-gate FinFET, the channel length is controlled by two gates. The gates can be independent (IG-FinFET) or shorted (SG-FinFET). Faster logic gates are constructed using SG-FinFET, whereas low-power devices are designed using IG-FinFET.
The microcontroller is the core technology for both low-end and middle-end devices [23]. For high-end devices, the single-board computer is the dominant technology.
The market survey shows that ARM's 32-bit processor, the Cortex-M3, is the leading technology for the development of low-end devices. Table 4 summarizes the processor/microcontroller families, manufacturers, and target applications using the Cortex-M3 core. Manufactured in 28 nm technology, the processor consumes 8 µW/MHz, with a reported efficiency of 3.34 CoreMark/MHz. The Cortex-M3 can be deployed at various levels in the IoT chain (end devices, gateways, and cloud services).
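From the 8 µW/MHz figure, the dynamic power at a given clock rate follows by simple multiplication; the clock value below is a hypothetical example, not a figure taken from Table 4:

```python
# Back-of-the-envelope power estimate from the 8 uW/MHz figure quoted above.

POWER_PER_MHZ_UW = 8.0  # reported efficiency of the 28 nm Cortex-M3

def core_power_uw(clock_mhz: float) -> float:
    """Estimated dynamic power in microwatts at a given clock frequency."""
    return POWER_PER_MHZ_UW * clock_mhz

# e.g. a hypothetical 48 MHz clock would draw roughly 384 uW on this estimate
```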
The Cortex-M3 is based on the Armv7-M architecture and uses Mbed, a secure and open-source operating system. The security of Mbed is ensured using a multilayer security architecture: hardware, software, communication, and life-cycle. Furthermore, Mbed supports the following short-range and long-range connectivity options: Bluetooth Low Energy, Wi-Fi, Ethernet, cellular, NFC, RFID, LoRa LPWAN, and 6LoWPAN Sub-GHz mesh. The Cortex-M3 architecture allows for the integration of a crypto-accelerator; crypto-algorithms can be implemented using dedicated accelerators or in software using the instruction set. A number of wearables have been designed using Cortex-M3 technology [24].
Traditionally, at the onset of the cloud computing era, data collected from IoMT devices were processed by cloud servers using computationally intensive tasks such as machine learning algorithms and big data analytics [25]. However, cloud-based computing is an energy-inefficient solution, incurs extra latency, and puts user privacy at risk. To circumvent these shortcomings, fog-based computing has been advocated (Figure 3). Edge devices in the fog computing paradigm can process, store, and transmit data. Recently, multicore technology has received ample attention in the realization of IoT edge devices. It is well known that in low-power design, the most effective way to achieve a substantial power reduction is to reduce the supply voltage. However, the speed of the logic devices then deteriorates. Parallel processing is one approach that alleviates this speed degradation. In processor design, increasing the number of low-power cores is an effective approach for low-power design. In the nanometer regime, near-threshold computing combined with parallel processing is an effective way to reduce the power consumption of digital circuitry. This approach was used in [26] to design a quad-core RISC-V processor for IoT endpoints. The processor is designed with compressed instructions, deploys an L0 buffer to reduce cache accesses, and uses a logarithmic interconnect for inter-core communication and core-memory data transfer. Cache memories are accessed via a dedicated interconnect. The reported efficiency of the processor, implemented in 65 nm bulk CMOS, is 67 MOPS/mW. Figure 4 illustrates a simplified architecture of the Parallel Ultra-Low-Power (PULP) processor. The processor was used to design wearables that collect ECG and EMG signals [27]. The wearable has higher energy efficiency compared with similar wearables designed using the ARM Cortex-M4F and M7.

Sub-Nyquist Sampling
Compressive sensing, or sub-Nyquist sampling, is an effective signal processing technique that has been used to reduce the power consumption of ECG signal processing (signal acquisition and compression) [28]. It has been advocated to reduce the sampling rate for redundant signals by taking advantage of signal sparsity. Mathematically, given an L-sparse signal x of dimension N (that is, x ∈ R^N with at most L non-zero elements), compressive sensing acquires a compressed measurement vector y ∈ R^M, with M ≪ N, as shown in (1):

y = Ax, (1)

where A ∈ R^(M×N) is the compressive sensing matrix, which should satisfy the restricted isometry property (RIP) shown in (2):

(1 − δ_L)‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + δ_L)‖x‖₂², (2)

where δ_L ∈ (0, 1) is an isometry constant. The reconstruction of the signal is achieved by solving the convex optimization problem in (3):

minimize ‖x‖₁ subject to ‖Ax − y‖₂ ≤ δ, (3)

where δ is an arbitrarily small positive number.
There are three families of algorithms that solve (3): thresholding, greedy, and ℓ1-minimization algorithms. Table 5 categorizes these algorithms; their details are reported in [29]. Compressive sensing (CS) has recently been implemented in a plethora of medical wearables. In [30], the authors implemented a CS algorithm for real-time compression of ECG signals on a wearable device. The scheme is based on three processing stages: the first stage is a linear transformation, the second stage is specification, and the final one is encoding. Three types of compressed sensing matrices were evaluated in terms of latency and percentage root-mean-square difference (PRD). The first matrix was generated using quantized normal number generation, the second was a pseudorandom matrix obtained by processing one stored random vector, and the last one (sparse binary) used a sub-Gaussian random matrix that satisfies the RIP. To further reduce the redundancy between packets, Huffman coding with a codeword of size 512 was used. The reconstruction algorithm was implemented on a desktop PC. The reported results showed that the compressive sensing algorithm dissipated 37.1% less energy compared with the discrete wavelet transform.
In [31], an IoT platform is proposed for the real-time monitoring of an ECG signal. The platform is composed of a wearable device that collects the ECG signal, compresses it using a CS algorithm, and sends it wirelessly to an IoT edge device for reconstruction, classification, and analysis. The authors considered two reconstruction algorithms: subspace pursuit and orthogonal matching pursuit. The edge devices were designed using a commercial processor that has eight cores distributed as follows: a cluster of four high-performance cores and a cluster of four low-power cores.
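As a minimal sketch of one of these greedy reconstruction algorithms, orthogonal matching pursuit can be expressed in a few lines of NumPy; the Gaussian sensing matrix, problem sizes, and toy signal below are illustrative choices, not the configuration used in [31]:

```python
import numpy as np

def omp(A, y, L):
    """Greedy recovery of an L-sparse x from measurements y = A @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    x_hat = np.zeros(A.shape[1])
    for _ in range(L):
        # pick the column most correlated with the current residual
        idx = int(np.argmax(np.abs(A.T @ residual)))
        support.append(idx)
        # least-squares fit on the selected support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat[support] = coef
    return x_hat

# Toy demo: a 2-sparse signal of length 32 recovered from 16 random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 32)) / np.sqrt(16)
x = np.zeros(32)
x[3], x[20] = 1.5, -0.8
x_rec = omp(A, A @ x, L=2)
```

Subspace pursuit follows the same correlate-then-least-squares pattern but selects and prunes several support candidates per iteration.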

Approximate Computing
Approximate computing is a paradigm shift in which circuits are designed, or code is written, to produce acceptable but inexact results [32]. This approach has been used, for instance, to reduce the bit-width of arithmetic circuits when implementing digital signal processing algorithms using fixed-point representation [33].
Approximate computing can be applied at various abstraction levels: data, circuit, architecture, instruction, algorithm, and system level. An approximate circuit is achieved by reducing the number of partial products when designing multipliers [34] or the number of transistors when designing the mirror adder [35]. A comprehensive survey of approximate arithmetic circuits is reported in [36]. In the biomedical domain, the work of [37] devised an approximate adder circuit by reducing the number of gates from five (two AND, two XOR, and one OR) to four (two OR, one AND, and one XOR). The approximate adder was then used to design an energy-efficient Baugh-Wooley multiplier. In [38], the authors used approximate adders and multipliers to design an energy-efficient wearable that implements the Pan-Tompkins algorithm for the detection of the QRS complex. The algorithm has five stages. First, the ECG signal is recorded and sampled using an ADC. Afterwards, the sampled signal is filtered using a low-pass filter followed by a high-pass filter. The output is then squared and integrated.
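A minimal floating-point sketch of these stages follows; the wearable in [38] implements them with approximate fixed-point hardware, and the moving-average and difference filters below are crude illustrative stand-ins for the actual filter designs:

```python
import numpy as np

def qrs_feature_signal(ecg, fs=200):
    """Low-pass, high-pass, squaring, and moving-window integration of an ECG trace."""
    lp = np.convolve(ecg, np.ones(5) / 5, mode="same")   # low-pass smoothing
    hp = np.diff(lp, prepend=lp[0])                      # crude high-pass (first difference)
    squared = hp ** 2                                    # emphasize large deflections
    win = max(1, int(0.15 * fs))                         # ~150 ms integration window
    return np.convolve(squared, np.ones(win) / win, mode="same")
```

Peaks of the resulting feature signal mark candidate QRS complexes, which a thresholding step would then confirm.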
The algorithm is computation-intensive, and the optimization of the bit-width and the architecture of the arithmetic circuits play a pivotal role in reducing the overall power consumption. To accomplish this, the authors used five types of approximate full adders (Figure 5) and two architectures of the approximate multiplier. The approximation starts from the LSB and moves towards the MSB. In each iteration, both the energy saving and the accuracy of peak detection are determined. The process halts when the maximum number of LSBs has been approximated without affecting the structural similarity index measure (SSIM).
Figure 5. Approximate adders used in [38]. (a) Full adder, (b) the carry-out and the sum are complemented, (c) the input A is copied to the carry-out, (d) both sum and carry-out are approximated, (e) the input A is copied to the sum and B is copied to the carry-out.
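The LSB-first approximation can be modeled in software by masking the k least-significant input bits before an exact addition; this illustrates the accuracy loss only (not the energy saving), and the function is our own sketch rather than the adder circuits of [38]:

```python
def approx_add(a: int, b: int, k: int) -> int:
    """Add two unsigned integers while ignoring their k least-significant bits."""
    mask = ~((1 << k) - 1)        # zero out the k lowest bits of each operand
    return (a & mask) + (b & mask)

# With k = 2: 11 (0b1011) -> 8, 6 (0b0110) -> 4, so the approximate sum is 12
# instead of the exact 17; k = 0 reproduces exact addition.
```

In the search loop of [38], k would be increased step by step until the SSIM of the detected peaks starts to degrade.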
Approximate memory and storage is another approach within approximate computing. It is achieved by techniques such as bit-trimming, voltage scaling, reducing the refresh rate of DRAM, and approximate compression for selected memory regions [39][40][41].

Techniques: Energy Harvesting for IoMT Devices
Nowadays, most wearable devices use batteries as a power supply source. This is inconvenient and leads to limited operation time and frequent battery replacement. Energy-autonomous systems, supplied by energy harvesting or energy transfer, provide promising solutions that ensure continuous sensing and transmission of measured data with minimal human intervention. The power consumption of wearable devices typically ranges from microwatts up to watts. Therefore, relatively high requirements are placed on the energy harvesters [42] to realize a reliable power supply for wearable devices such as fitness trackers, body-attached sensors, and thermoelectrically powered watches [43]. In telemedicine, the commonly used sensors include accelerometers, heart rate monitors, glucose concentration monitors, and blood pressure monitors. Table 6 summarizes the power requirements of these wearable components, which constitute an important guideline for the design of the needed energy harvesting system. The extracted energy cannot be supplied directly to the system; power management is generally required to rectify and condition the energy and to enhance the power extraction ability [49] (Figure 6).

Energy Harvesting Sources
Efficient energy harvesting from ambient sources is very important and can extend or replace the battery-based supply [50][51][52]. Electrical energy can be extracted using one or a combination of different transduction mechanisms. Several converter principles can be implemented, such as piezoelectric [53], triboelectric [54], electromagnetic [55], and electrostatic [56] converters. However, the ambient sources can fluctuate, affected by ambient conditions or aging, which makes their predictability and controllability quite challenging (Table 7).
Piezoelectric energy harvesting is an efficient technique to generate electricity from movements, vibrations, and shocks. Several piezoelectric materials can be used, including crystals, ceramics, and polymers. To enhance the piezoelectric effect, the converter should be installed in a part of the body that is exposed to a large compressive force. Piezoelectric energy harvesters (PEHs) have frequently been implemented in shoe insoles, where the foot can apply a large amount of pressure [57][58][59]. Several studies focusing on kinetic energy generated by body motion to supply wearable systems have recently been conducted. In [60], the authors proposed a piezoelectric harvester embedded in the heel of a shoe in order to extract the kinetic energy generated by human walking. Although the device can generate significant levels of power, some optimizations of the structure are required. The mechanical specifications of the piezoelectric ceramics have to be considered when they are implemented in areas with large stresses, such as during walking or running, in order to avoid mechanical damage to the piezoelectric layers.
Electromagnetic energy harvesting is based on Faraday's law of induction: an electric current is induced when a conductor moves through a magnetic field. Electromagnetic energy harvesting systems are designed as a system of springs, magnets, and coils. The output power depends mostly on the number of coils and the magnetic mass. Therefore, reducing the size, weight, and complexity of these energy harvesters is quite challenging. For example, in [61], the authors demonstrated the performance of a frequency up-converted electromagnetic harvester in harvesting energy from human limbs. In this work, an average power density of 0.33 mW/cm³ was achieved from low-frequency human vibration (∼5 Hz) to supply wearable devices.
In [62], triboelectric nanogenerators and hybridized systems were reviewed. The authors highlighted the recent development of triboelectric nanogenerator (TENG)-based hybrid generators (e.g., the incorporation of TENGs with other transducers such as piezoelectric, electromagnetic, and thermoelectric generators) and hybridized systems, from the perspective of operation concepts, energy management strategies, optimization techniques, and system integration. New implementations of these systems for outdoor, indoor, wearable, and implantable applications were also reviewed, with an overview of future applications of hybrid energy harvesters in healthcare, robotics, and the Internet of Things (IoT). The authors also discussed some challenges for the developed TENG-based generators, such as material optimization, output power enhancement, the operation mode, and energy storage strategies.

Energy Transfer
An interesting alternative for supplying medical wearable devices is energy transfer. Both radio-frequency (RF) energy transfer and inductive power transfer (IPT) are of interest for supplying wireless sensor networks (WSNs) for e-health.
Typically, an IPT system consists of an AC signal generator supplying the transmitter coil to generate a magnetic flux, which induces a voltage in the receiver coil situated in the proximity of the transmitter coil. To increase the transmission efficiency and the received power, LC resonance on both sides of the IPT system is required, which is obtained with additional compensation capacitors connected to the transmitter and receiver coils in series or in parallel [63,64]. The connected loads are generally supplied with a DC voltage within specific voltage and current levels. Therefore, an additional energy management stage, comprising an AC-DC rectifier and a DC-DC converter, becomes an essential part of an IPT system. On the other hand, an IPT system works properly only with an ideal load impedance, which depends on the current consumption and the output voltage. Therefore, impedance matching between both sides is recommended, using a controllable gain on the transmitter side as well as a controllable DC-DC converter on the receiver side [65].
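The series-compensation capacitors mentioned above are sized so that each side resonates at the operating frequency, C = 1/((2πf)²L); the coil inductance and frequency below are illustrative values, not from a cited design:

```python
import math

def compensation_capacitance(l_henry: float, f_hz: float) -> float:
    """Capacitance that resonates with inductance l_henry at frequency f_hz."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_henry)

# e.g. a hypothetical 10 uH transmitter coil driven at 200 kHz
c = compensation_capacitance(10e-6, 200e3)   # ~63 nF
```

Both the transmitter and receiver sides would be tuned to the same frequency so that the LC pairs resonate together.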
Moreover, an IPT system achieves its highest performance when the transmitter and receiver coils are in resonance and in a concentric position with a minimal separation distance [52,66]. Misalignment between the transmitter and receiver coils degrades the coupling factor of the IPT system and thus the transmission efficiency and the received power. In the case of flexible coils, additional deviations can occur due to the bending or stretching of the transmitter or receiver coils. The bending can be in the concave or convex direction. However, when both coils are bent in the same direction and at the same angle, the flexible coils perform like rigid flat coils (Figure 7). To overcome the bending issues, the selection of a proper coil, in terms of parameters [67] and geometry [68], to tolerate the bending has been investigated. Coils for wearable applications can generally be fabricated as rigid coils [69], flexible coils [68,70,71], or fabric-based coils [67,72,73] integrated with clothes. Rigid coils show the highest efficiency compared with the other types of coils, but they are not preferred due to their weight and lack of flexibility. On the other hand, flexible coils can be a good alternative, with which a printed coil can be designed with different material properties, such as copper [68] or silver [74], and with various possible geometries and sizes. However, flexible coils are difficult to integrate with clothes, especially where the fabrics are stretchable, and a specific fabrication process for the coil is required [73]. For that reason, many studies work on fabric-based coils, which show more and more potential. They can be fabricated from thin and heterogeneous conductor wires via automated sewing machines, offering high precision and simplicity of integration with clothes.
To solve the flexible coil bending issues, many studies investigate controlling the IPT system by varying the supply gain, the frequency, or even the compensation capacitor values [75]. These control architectures require additional elements in the transmitter-side circuit, which increases the system cost and size. On the other hand, some studies [71] apply an IPT with bent transmitter and receiver coils to reach higher performance in terms of coupling factor, as shown in Figure 7c. Others focus on IPT systems with multiple transmitter coils that generate a higher magnetic field [72] for a higher received voltage on the receiving side.

Energy Harvesting Solutions for Medical Wearable Devices
Several solutions have been proposed to realize energy scavenging from the human body [43,76]. This can be from everyday activities, without performing a specific workout, such as breathing, arm motion, walking, running, or pedaling. Mechanical energy is available from the movements of different body zones like the elbow, the knee, the ankle or heel (Figure 6). In [77], the performance of three different vibration generators was investigated at nine positions on the body of a person walking on a treadmill. The results show that at lower body locations (hip, knee and ankle), the amount of energy generated is four times higher than the amount generated at upper locations.
Body heat also provides interesting possibilities for supplying wearable systems. In [78], a flexible thermoelectric generator (TEG) based on the Seebeck effect was able to generate 4.95 mW of power from body heat and was then used to power a wearable multi-sensing bracelet. The self-powered multi-sensing bracelet can work sustainably under various conditions, including human motion. In such systems, the amount of energy is highly dependent on the temperature difference between the human body and the ambient temperature [79].
In some investigations, involuntary activities such as cardiac motion, blood pressure, and breathing have been used to produce biomechanical energy, which can regularly provide energy for wearable devices. In [80], cardiac contractions are used as a source of energy to power low-power pacemakers. The developed harvester delivers 11.1 µJ of electrical energy when driven by a constant 90 bpm heartbeat. The extraction of energy from the human body is in general much more complex than energy harvesting from machines [81] due to the requirements for small size and weight. The available energy is often weak and not easily usable; e.g., human body kinetic energy typically has a low frequency and a low amplitude.
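As a back-of-the-envelope check, assuming the 11.1 µJ figure is the energy delivered per cardiac cycle (our interpretation, not stated explicitly above), the average harvested power at 90 bpm can be estimated as follows:

```python
# Average-power estimate for the cardiac harvester of [80], assuming 11.1 uJ
# is delivered per heartbeat (an interpretation, not a stated figure).

ENERGY_PER_BEAT_J = 11.1e-6   # electrical energy per cardiac cycle
BEATS_PER_SECOND = 90 / 60    # 90 bpm = 1.5 beats per second

average_power_w = ENERGY_PER_BEAT_J * BEATS_PER_SECOND
# ~16.65 uW of average power, in the microwatt range typical of pacemakers
```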
In [82], the efficient use of triboelectric nanogenerators (TENGs) in healthcare as a convenient technique for patient rehabilitation was recently demonstrated. A wearable TENG-based rehabilitation device (Rehab-TENG) was implemented as an exercise gaming device while extracting energy. First, the authors successfully controlled a game on a laptop with the device using arm flexion and extension. This can be an effective technique for testing the motor function of a patient's impaired arm. Second, the Rehab-TENG device was implemented as an energy harvester in an exercise system where the patient moved their impaired arm to store energy in a capacitor. Based on the charging rate of the storage capacitor, the level of deficiency could be evaluated, which can thereby improve patient compliance, since it motivates them to perform more repetitive motions of the impaired body zone, and this feature can eventually speed up recovery. Finally, the authors discussed the possibility of using the Rehab-TENG device as an autonomous home-based exercise and monitoring system, which is particularly useful in the current pandemic situation, avoiding hospital visits for rehabilitation treatment and monitoring.
The main trend in energy harvesting technologies for IoT biomedical applications [49] is towards the development of biocompatible wearable energy harvesters, such as textiles, footwear, or watches, that extract energy from the human body, are lightweight and easily integrable into textiles, and offer the possibility of scaling their size for a better energy output.

Deep Learning
Deep neural networks (DNNs), as a branch of artificial intelligence (AI), are rapidly growing in both academia and industry, showing superior advantages in different domains such as pattern recognition, speech recognition, image classification, and computer vision [83,84]. They are also opening the path to new critical domains such as e-health, self-driving cars, and surgical robots, where high precision and safety are the highest priority. As shown in Figure 8, a DNN is composed of a number of neurons, arranged in input, output, and hidden layers, where each neuron performs a simple multiply-and-accumulate (MAC) operation.
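The per-neuron MAC operation can be sketched as a dot product plus a bias, followed by a nonlinearity (ReLU is chosen here purely for illustration):

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One neuron: multiply-and-accumulate followed by a ReLU activation."""
    acc = float(np.dot(weights, inputs) + bias)   # the MAC operation
    return max(0.0, acc)                          # ReLU nonlinearity

# A tiny example with made-up weights: the weighted sum is -0.05, so ReLU
# clamps the output to zero.
y = neuron(np.array([0.5, -1.0, 2.0]), np.array([0.2, 0.4, 0.1]), 0.05)
```

A layer is simply many such neurons evaluated on the same inputs, which is why DNN inference cost is dominated by MAC counts.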
DNN computation is composed of two main phases: training and inference. In the training phase, the weights are learned; in the inference phase, the learned weights are fixed and deployed on the underlying hardware. Training is an offline process, while inference is an online process, so the inference time and power consumption should be kept as small as possible, especially if the DNN is going to be deployed on edge devices in the IoMT. The deployment of DNNs on embedded systems is particularly challenging, as these platforms have limited computation resources and storage and are usually battery-powered. In the following, we investigate different platforms for executing DNNs. Then, we review different approaches to reducing power in resource-constrained embedded devices.

Platforms to Execute DNN Applications
There are several popular platforms with which to execute DNNs, such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs) [85]. Application-Specific Integrated Circuits (ASICs) are also experiencing increasing popularity in DNN computation, as they can offer optimized performance and power efficiency. Multiprocessor systems-on-chip (MPSoCs) are another platform, providing flexibility in the communication among neurons. These platforms can be embedded into larger systems (embedded systems) to perform a specific task. Various attempts have been made to improve the efficiency of such platforms for DNN applications, targeting either the training or the inference phase [86]. For IoT devices, the inference phase is of primary interest, where the trained model is deployed on the device. The computations in the inference phase can be optimized using frameworks such as TensorFlow Lite, TensorRT, XLA, AMP, and Arm NN. These frameworks try to bridge the gap between the DNN models and the underlying hardware platforms. Among them, TensorFlow Lite and TensorRT are the engines targeting IoT and embedded devices.

Deploying DNNs on IoMT Edge Devices
In recent years, cloud computing and IoT techniques have grown rapidly, proving beneficial in different aspects of human life. According to one prediction, the number of IoT devices will approach 21 billion by 2025, with the market reaching USD 112 billion by then (IoT Analytics Research 2018; investinbsr). At the same time, these devices generate a massive amount of data that should be analyzed locally, which demands highly efficient data processing platforms and techniques. DNNs are well suited to processing this data. However, they come with computation and storage requirements beyond the capability of current embedded devices operating within a limited power budget. As an example, VGG-16 [87], a popular deep convolutional neural network, consists of 138 million parameters, requires about 500 MB of memory, and involves 15.5 billion floating-point operations (FLOPs) per inference [88]. Such requirements are beyond the capability of IoT devices.
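The ~500 MB figure can be checked with a back-of-envelope calculation, assuming each of the 138 million parameters is stored as a 32-bit (4-byte) float:

```python
params = 138_000_000                      # VGG-16 parameter count
bytes_per_param = 4                       # 32-bit floating point
size_mb = params * bytes_per_param / (1024 ** 2)
print(f"{size_mb:.0f} MB")                # ≈ 526 MB, in line with the ~500 MB cited
```

This also shows why the reduced-precision representations discussed below matter: halving or quartering the bit width shrinks the model's storage footprint proportionally.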
Considering the above facts, it is highly challenging to deploy DNNs on resource-constrained embedded devices, and extensive effort is needed to enable such deployment. Among the proposed approaches are pruning [89], quantization [90], knowledge distillation [91], parameter sharing [92], and compression [93], all of which reduce the computational complexity of DNNs. The impact of these approaches should be measured on the device where the DNN is to be deployed, as different approaches show varying impacts on different platforms. After observing system parameters such as throughput, latency, and power, the algorithms should be refined and tuned to improve these parameters. Figure 9 shows a schema for the co-optimization of DNNs and hardware platforms.

Algorithms and Methods to Reduce Computation and Power Consumption in Embedded Devices
There are various algorithms and methods at different levels (such as DNN modeling, computer architecture, and the compiler) to improve the execution of DNNs. A popular approach is quantization [93], which reduces the precision of the number representation, e.g., from a 32-bit to a 4-bit representation. Quantization relaxes the storage requirement and can lower the number of computations, usually at the cost of lower accuracy. Although quantization is efficient with regard to storage, the underlying hardware must be able to skip the unnecessary zero computations that result from the reduced number representation. Without such support, quantization alone cannot be considered an effective approach to reduce the inference time or power consumption of embedded devices.
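As an illustration, the following sketch applies symmetric linear quantization, one common scheme, to map 32-bit weights onto 4-bit signed integers; the `quantize`/`dequantize` helper names are hypothetical, chosen only for this example.

```python
import numpy as np

def quantize(weights, bits=4):
    # Symmetric linear quantization: map floats onto signed integers
    # in [-(2**(bits-1) - 1), 2**(bits-1) - 1] using a single scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.array([0.52, -0.31, 0.08, -0.74], dtype=np.float32)
q, s = quantize(w, bits=4)
print(q)                    # integers in [-7, 7]
print(dequantize(q, s))     # approximation of the original weights
```

The rounding step is where accuracy is lost: each weight is off by at most half the scale, which is why aggressive bit widths usually require retraining or calibration to recover accuracy.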
Another efficient approach is pruning, which mainly removes weights that are smaller than a threshold value. Pruning can also be applied to filters by removing those that are least important according to a defined criterion [88]. Like quantization, pruning usually lowers accuracy, although retraining helps retain accuracy to some extent. Weight pruning also requires hardware support; without it, the underlying hardware cannot skip the zero computations and thus cannot reduce the power consumption or inference time. Filter pruning, on the other hand, skips the computation related to the removed filters entirely. It is therefore an efficient approach for reducing both power and inference time in embedded devices.
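A minimal sketch of filter pruning in the spirit of the L1-norm criterion mentioned above; the `prune_filters` helper and the (filters, channels, height, width) weight layout are assumptions made for this illustration.

```python
import numpy as np

def prune_filters(conv_weights, keep_ratio=0.5):
    # Rank convolutional filters by their L1 norm and keep only the
    # most important fraction; the pruned layer is physically smaller,
    # so no special hardware support is needed to skip the removed work.
    n_filters = conv_weights.shape[0]
    n_keep = max(1, int(n_filters * keep_ratio))
    norms = np.abs(conv_weights).reshape(n_filters, -1).sum(axis=1)
    keep = np.sort(np.argsort(norms)[-n_keep:])  # surviving filter indices
    return conv_weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))        # 8 filters, 3 input channels, 3x3 kernels
pruned, kept = prune_filters(w, keep_ratio=0.5)
print(pruned.shape)                      # (4, 3, 3, 3): half the filters remain
```

Because whole filters disappear, the next layer's input channels shrink as well, so the savings compound through the network; in a real pipeline the remaining weights would then be fine-tuned to recover accuracy.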
Dynamic batch sizing is another approach to control power and throughput at run-time [94]. Instead of using a fixed batch size, the batch size is dynamically adjusted depending on the power budget and the required throughput. This allows IoMT devices to save power whenever necessary.
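A toy sketch of the idea: pick the largest batch size that fits the current power budget, under a hypothetical linear per-sample power-cost model (the function name and the cost model are assumptions, not taken from [94]).

```python
def choose_batch_size(power_budget_mw, per_sample_cost_mw, max_batch=32):
    # Largest batch whose estimated power draw fits the current budget,
    # clamped to [1, max_batch]. A bigger batch raises throughput and
    # power; a smaller one saves power at the cost of throughput.
    batch = min(max_batch, int(power_budget_mw // per_sample_cost_mw))
    return max(1, batch)

# As the battery drains and the budget shrinks, the batch size shrinks too.
for budget in (400, 200, 50):
    print(budget, "->", choose_batch_size(budget, per_sample_cost_mw=25))
# 400 -> 16, 200 -> 8, 50 -> 2
```

In practice the controller would re-evaluate this choice periodically from measured (not modeled) power, but the run-time trade-off between throughput and power budget is the same.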

User-Centred Design
The market success of the e-health system depends on user acceptance of its technologies. Numerous reports suggest incorporating user requirements into the development of the IoMT. Surveys are important instruments for identifying the functional and non-functional requirements of a medical health system [95][96][97][98]. This approach has been adopted in the vINCI project (Clinically-validated Integrated Support for Assistive Care and Lifestyle Improvement: the Human Link) [96,99]. The project uses the following technologies: a smartphone app, a dashboard, a smartwatch, and a smart insole. Its users fall into two categories: primary end-users (seniors over 65 years old) and secondary end-users (caregivers). Two questionnaires, one for each category, were designed to capture the user requirements. Twelve functional and five non-functional requirements were identified, and these requirements were considered in developing the vINCI platform.

Conclusions and Open Issues
The miniaturization of electronic devices, coupled with advances in internet technology and wireless communication, has enabled a plethora of pervasive and ubiquitous applications. The Internet of Medical Things (IoMT) is a paradigm shift in the health industry aimed at improving medical services, reducing cost, increasing life expectancy, and so forth. The IoMT is composed of low-end, middle, and high-end nodes. Wearables are low-end nodes comprising multiple medical sensors, energy scavenging units, wireless communication circuitry, an embedded system, a storage unit, and a power management unit. To increase the operating hours of wearables, several techniques can be applied, from the algorithm level down through all the design abstraction layers of VLSI design, together with energy harvesting techniques. Additionally, a significant power reduction can be achieved by reducing the transmission rate from wearables to the fog layer. This work systematically reviewed the approaches taken to design energy-efficient IoMT edge nodes. In particular, the following topics have been discussed: technology trends, sub-Nyquist sampling, approximate computing, multi-core technology, energy harvesting techniques, and the implementation of deep learning on edge devices. Among these topics, approximate computing for the IoMT is under-researched and needs further exploration, as it is a very promising technique for reducing energy consumption. Energy harvesting technologies provide promising possibilities, even if they are not yet frequently used. Multi-source energy harvesting can be coupled with algorithmic techniques, such as computational offloading, a topic that needs further investigation. At the technology level, RISC-V-based multicore architectures are a promising solution for wearables; however, the security and trustworthiness of RISC-V need further exploration.
Last but not least, the involvements of different stakeholders and end-users in determining the functional and non-functional requirements for the development of the IoMT are often overlooked and need to be accounted for before, during, and after the development of an IoMT platform.

Conflicts of Interest:
The authors declare no conflict of interest.