A Review of Embedded Machine Learning Based on Hardware, Application, and Sensing Scheme

Machine learning is an expanding field with an ever-increasing role in everyday life, and its utility in the industrial, agricultural, and medical sectors is undeniable. Recently, this utility has come in the form of machine learning implementations on embedded system devices. While there have been steady advances in the performance, memory, and power consumption of embedded devices, most machine learning algorithms still have very high power consumption and computational demands, making the implementation of embedded machine learning somewhat difficult. However, different devices can be selected for different applications based on their overall processing power and performance. This paper presents an overview of several implementations of machine learning on embedded systems, organized by device, application, machine learning algorithm, and sensors. We mainly focus on NVIDIA Jetson and Raspberry Pi devices, along with a few less commonly utilized embedded computers, and examine which of these devices were most often used for specific applications in different fields. We also briefly analyze the ML models most commonly implemented on these devices and the specific sensors used to gather input from the field. All of the papers included in this review were selected using Google Scholar and published papers in the IEEE Xplore database. The selection criterion for these papers was the usage of embedded computing systems in either a theoretical study or a practical implementation of machine learning models. The papers needed to provide at least one, and preferably all, of the following results in their studies: the overall accuracy of the models on the system, the overall power consumption of the embedded machine learning system, and the inference time of the models on the embedded system.
Embedded machine learning is experiencing rapid growth in both scale and scope, driven by advances in system performance and machine learning models as well as by the greater affordability and accessibility of both. Improvements are evident in quality, power usage, and effectiveness.


Introduction
Machine learning has become a ubiquitous feature in everyday life. From self-driving vehicles, facial recognition systems, and real-time interpretation of different languages, to security surveillance, smart home applications, and health monitoring, artificial intelligence has changed almost every society on earth [1][2][3][4]. Due to the extremely high computational requirements of machine learning models, until recently, the majority of these breakthroughs were implemented on high-power stationary computing systems. However, continuous advancements in embedded system design have made the implementation of machine learning models on embedded computing systems viable for a wide variety of mobile and low-power applications.

Objective and Method
To reiterate, the goal of this study is to summarize the current state-of-the-art research in the embedded machine learning area for different applications, so that researchers can have an overview of the cutting-edge methods and results, as well as predict the general trajectory of embedded machine learning advances. The method of research for this study was the compilation of the results gathered by the research papers referenced for this work. Excluding the related works in the Benchmark and Review section of the references, all of the papers presented in this review included a proposal or implementation of embedded machine learning for a specified application, with the results of each study including one or all of the following findings: accuracy, inference speed, and power consumption.

Hardware
Embedded systems are computer hardware systems designed to perform dedicated functions in combination with a larger system. They are found in many everyday items, from mobile phones to household appliances. Embedded computer devices are a subset of embedded systems used for computational tasks in more dedicated or remote operations, such as running machine learning algorithms in real time on small unmanned aerial vehicles, connecting systems to the Internet of Things, and even security monitoring. While the variety of embedded computer devices produced and used is quite wide, most academic research conducted on embedded machine learning is focused on Raspberry Pi and NVIDIA Jetson devices. Other devices used include the ASUS Tinker Board series, Google's Coral TPU dev series, ODROID-XU4 boards, and the Banana Pi board series.

General Considerations
When choosing an embedded computing device for a specific application, many different parameters need to be kept in mind. These parameters include, but are not limited to, system processing speed, affected by the integrated CPU and GPU; system memory, affected by the RAM; system storage space; system bus and drivers; the overall power consumption of the system; and its purchase cost. Generally, systems with higher performance and memory are capable of performing more complex machine learning tasks at greater speed but have higher power consumption and monetary prices. On the other hand, cheaper and less power-intensive systems offer lower performance and memory, performing their dedicated tasks far more slowly.

Processor Units
Processing units are the integrated electrical circuits responsible for performing the fundamental algorithmic and arithmetic logic processes of a computer device. There are different categories of processors, with the most common ones in embedded computer systems being CPUs and GPUs. Central Processing Units, or CPUs, are the processors present in most electrical devices and are responsible for the execution of programs and applications; they are usually composed of multiple cores and have their clock speed measured in gigahertz. Graphics Processing Units, or GPUs, are dedicated processors used for graphical rendering, allowing devices to allocate graphically intensive tasks, such as real-time object recognition, to them. All of the embedded computer devices presented in this review contain both a CPU and a GPU, with the CPUs being various ARM Cortex multicore processors [24][25][26][27][28][29][30][31][32][33][34]. The GPUs for each system were more varied in both clock speed and power consumption. More detailed descriptions are given within each system's subsection.

Memory Units
System memory generally refers to a computing system's Random Access Memory, or RAM, which is responsible for storing application data for quick access. The larger a system's RAM, the more quickly the system can run simultaneous applications, making RAM proportional to the overall performance of a system. Embedded computing devices are packaged with their own memory component, with most embedded systems in this review having 1 GB, 2 GB, or 4 GB of RAM [30,31], while the most recent NVIDIA kits have between 8 GB and 16 GB [24,28]. Memory bus width is another important parameter of system memory, influencing the rate at which data can be accessed and edited, with the widths of the systems included in this review ranging from 128-bit to 256-bit.

Storage Units
Computer storage refers to the component of a computing device responsible for retaining long-term application and computation data. While the CPU's access to and alteration of storage data is much slower than its access to RAM data, it consumes far less power and processing capability. Storage systems come in many varieties, such as flash drives, hard drives, solid state drives, SD cards, and embedded MultiMediaCard memory, or eMMC. Hard drives were the most common form of storage until recently, with their advantage over alternatives being their overall capacity and their downside being their relatively slow data access speed. Solid state drives, or SSDs, have provided far faster data access at the cost of storage size; however, in recent years, SSDs have made leaps in capacity and are now comparable in overall storage size to hard drives. Flash drives are quick and easy to connect to or disconnect from different computing devices while having very small storage space; they are very similar to SSDs in terms of performance. Secure digital cards, or SD cards, are also similar to flash storage but have both much smaller physical sizes and storage capacities. eMMCs are architecturally similar to flash storage and are generally used in small laptops and embedded computing systems. Most development kit embedded computing systems contain eMMCs, as is the case for NVIDIA Jetson, Coral Edge, and ASUS Tinker Board devices; others, such as ODROID-XU4 boards, do not have their own integrated storage devices but instead provide a flash storage interface. Raspberry Pi boards have interfaces for both SD cards and flash drives.

Operating Systems
Operating systems are responsible for managing and running all of the applications on a computing device, allowing applications to request services through a defined application program interface (API). This makes the creation and usage of various applications much simpler, as all low-level functions, such as allocating disk space for an app, can be delegated to the OS. Operating systems rely on a library of device drivers to tailor their services to specific hardware environments; while every application makes a common call to a storage device, it is the OS that receives that call and uses the corresponding driver to translate it into the commands needed by the underlying hardware. Operating system capabilities can be divided into three areas: providing a UI through a CLI or GUI, launching and managing application execution, and identifying and exposing system hardware resources to applications. Most personal computing devices utilize general-purpose operating systems, such as Windows, Mac OS, and Linux. While there are dedicated embedded operating systems, mainly used in ATMs, airplanes, and IoT devices, most embedded computing systems utilize operating systems based on or very similar to general-purpose computer operating systems. For example, NVIDIA Jetson boards have Linux for Tegra included in their development software kits [35].

Bus and Drivers
Computer buses are communication systems responsible for transferring data between the various components of a computing system. While most home computer systems have 32-bit to 64-bit buses, embedded devices have far smaller bus widths, between 4-bit and 8-bit. Drivers are the software components responsible for communicating between a computer device's software and its hardware. They generally run at a high privilege level in the OS runtime environment, and in many cases are directly linked to the OS kernel, the portion of an OS such as Windows, Linux, or Mac OS that remains memory-resident and handles execution for all other code. Drivers define the messages from the OS to a specific device that facilitate the device's fulfillment of the OS's request. The device drivers used in each embedded computing system are related to the operating system of the device. For example, Raspberry Pi devices mainly use Raspberry Pi's own operating system, which is based on Debian, while NVIDIA Jetson boards mainly rely on JetPack, NVIDIA's proprietary Software Development Kit (SDK) for the Jetson board series, which includes the Linux for Tegra (L4T) operating system. This means the driver kernels for both of these embedded system product lines are similar to those of a Linux computer [36].
Firmware refers to software that is directly embedded in specific devices, giving users low-level control over them. Essentially, firmware is responsible for giving simple devices their operation and system communication instructions. It differs from other software in that it does not rely on APIs, OSs, or device drivers to operate. Firmware is the first part of device programming to start issuing instructions when the device is powered on, and in some simpler devices, such as keyboards, it never pauses its operation. It is mostly installed on a ROM for software protection and proximity to the physical components of its specific device. It can only work with a basic, low-level binary language known as machine language [37]. All of this applies to the components within an embedded system, meaning each device within the system has its own unique firmware, with varying levels of complexity based on the function of the device.

NVIDIA Jetson
Jetson is the name of a series of machine learning embedded systems by NVIDIA used for autonomous devices and various embedded applications. While Jetson developer kits vary in capability and performance, they are generally very reliable for implementing machine learning tasks; this is especially true for more graphically intensive applications. The downside is that NVIDIA Jetson boards also tend to be more costly than market alternatives. Most of the sources shown in this review either only made use of Jetson boards or used them in combination with other devices. These specific developer kits were the NVIDIA Jetson Nano, NVIDIA Jetson TX1, NVIDIA Jetson TX2, NVIDIA Jetson AGX Xavier, and NVIDIA Jetson Xavier NX.
NVIDIA Jetson Nano is one of the smaller Jetson kits, specialized for machine learning tasks like image classification, object detection, segmentation, and speech processing. It has a 128-core Maxwell GPU, a Quad-core ARM Cortex-A57 1.4 GHz CPU, 4 GB 64-bit LPDDR4 25.6 GB/s memory, 2x MIPI CSI-2 DPHY camera lanes, and Ethernet, HDMI, and USB connection ports. Unlike most other NVIDIA kits, the Nano does not have an integrated storage unit and has to rely on SD cards for that purpose. It has a power consumption of 5-10 Watts, and with a price range of USD 300-USD 500, it is the most affordable option of all the NVIDIA development kits [24].
The Jetson TX1 and TX2 series are a discontinued line of embedded system development kits with flexible capabilities, including strong performance for machine learning tasks. As the discontinuation of this line is especially recent for the TX2 series, research publications that utilize the TX2 board are not uncommon, while the TX1 is much rarer. The TX1 has a 256-core Maxwell GPU, a Quad-core ARM® Cortex®-A57 CPU, a 4 GB LPDDR4 memory, and a 16 GB eMMC 5.1 storage.
The Jetson AGX Xavier is one of the most powerful developer kits produced by NVIDIA. It is mainly used for creating and deploying end-to-end AI robotics applications for manufacturing, delivery, retail, and agriculture, but it can also be applied to less intensive machine learning applications. It has a 512-core Volta GPU with Tensor Cores, an 8-core ARM v8.2 64-bit CPU, a 32 GB 256-bit LPDDR4x memory, a 32 GB eMMC 5.1 flash storage, two USB-C ports, and an HDMI and camera connector. It has a price of about USD 4000 and a power consumption of 30 Watts, making it much more costly in both price and electricity than the other Jetson kits [27].
The Jetson Xavier NX kit is another series of NVIDIA developer kits, designed as the successor to the TX series. It is power-efficient and compact, making it suitable for machine learning application development. It has an NVIDIA Volta architecture GPU with 384 NVIDIA CUDA® cores and 48 Tensor cores, a six-core NVIDIA Carmel ARM®v8.2 64-bit CPU, an 8 GB 128-bit LPDDR4x memory, two MIPI CSI-2 DPHY camera lanes, and Ethernet, HDMI, and USB Type A and Micro-AB connection ports. It has an integrated storage component of its own instead of relying on a micro SD storage interface. It has a power consumption of 10 Watts and a price of around USD 2000. Its well-rounded quality makes it a very good, if somewhat expensive, choice for machine learning implementation on embedded systems [28].

Google Coral
Google Coral Dev Board is a single-board computer by Coral that can be used to perform fast machine learning (ML) inferencing in a small form factor; it is mainly used for prototyping custom embedded systems, but it can also be used for embedded machine learning on its own. It has an Edge TPU coprocessor capable of performing 4 trillion operations per second, and it is compatible with TensorFlow Lite. It has a quad Cortex-A53 CPU, integrated GC7000 Lite graphics, 1 GB/2 GB/4 GB LPDDR4 memory, 8 GB eMMC storage as well as a MicroSD slot, Type-C, Type-A, and micro-B USB ports, Gigabit Ethernet, and HDMI 2.0 ports. The overall board has a low power cost of 6-10 Watts, and at USD 130, the price for the board is relatively low [29].

Raspberry Pi
Raspberry Pi is a series of extremely popular embedded computers developed by the Raspberry Pi Foundation in the United Kingdom. The uses for these systems are extremely wide, including machine learning. Like the Jetson series, Raspberry Pi products are very commonly used in embedded machine-learning implementation projects. For this review, the three systems of Raspberry Pi that were commonly utilized were the Raspberry Pi 3 Model B, the Raspberry Pi 3 Model B+, and the Raspberry Pi 4 Model B.
The Raspberry Pi 3 Model B is the first iteration of the third-generation Raspberry Pi computers. It has a Quad Core 1.2 GHz Broadcom BCM2837 64-bit CPU, a 400 MHz VideoCore IV video processor, a 1 GB LPDDR2 memory, a microSD port for storage, a 100 Base Ethernet port, 4 USB 2.0 ports, and a full-size HDMI port. It has an extremely low power consumption of 1.5 Watts and a monetary cost of about USD 40 [30].
The Raspberry Pi 3 Model B+ is the final iteration of the third-generation Raspberry Pi computers. It has a Quad Core 1.4 GHz Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC CPU, a 400 MHz VideoCore IV video processor, a 1 GB LPDDR2 memory, a microSD port for storage, a 1000 Base Ethernet port, 4 USB 2.0 ports, and a full-size HDMI port. Its main advantages over the Model 3B are its processor's higher clock speed and its PoE (Power over Ethernet) support. At 2 Watts, its power consumption is still low but higher than that of the Model 3B. It also has a very similar monetary cost of around USD 40.
The Raspberry Pi 4 Model B is the first iteration of the fourth-generation Raspberry Pi computers. It has a Quad Core 1.5 GHz Broadcom BCM2711, Cortex-A72 (ARMv8) 64-bit SoC CPU, a 500 MHz VideoCore VI video processor, a choice between 1 GB, 2 GB, 4 GB, and 8 GB LPDDR4 memory, a microSD port for storage, a Gigabit Ethernet port, 2 USB 3.0 and 2 USB 2.0 ports, and 2 micro-HDMI ports. Its main advantages over the Model 3B+ are its newer, faster processor, its larger memory options, and its USB 3.0 and true Gigabit Ethernet support, making it a superior choice compared to the previous iteration of Raspberry Pi. It has a relatively low power consumption of 4 Watts and a monetary cost of about USD 40-USD 80 depending on the memory size [31].

ODROID XU4
The ODROID XU4 is an energy-efficient single-board embedded computing system by Hardkernel Co. (Anyang-si, Gyeonggi-do, South Korea). It is compatible with open-source software and can use different versions of Linux, such as Ubuntu, as its operating system. It has an Exynos5422 octa-core CPU combining Cortex™-A15 2 GHz and Cortex™-A7 cores, a Mali-T628 MP6 GPU, a 2 GB LPDDR3 memory, eMMC 5.0 flash storage as well as a microSD slot, 2 USB 3.0 and 1 USB 2.0 ports, Gigabit Ethernet, and an HDMI 1.4 port. It has an operating power of 5 Watts and its cost is generally around USD 100 [32].

Banana Pi
Banana Pi is an open-source hardware platform by Shenzhen SINOVOIP Co. (Shenzhen, China). Like other embedded systems, it has a wide range of applications, among them embedded machine learning implementation. It has an Allwinner H3 quad-core Cortex-A7 CPU with H.265/HEVC 4K support, a Mali400MP2 GPU, 1 GB DDR3 memory, an 8 GB eMMC onboard storage, two USB 2.0 ports, an HDMI port, and an Ethernet interface. Its overall power consumption is about 5 Watts and it has a price range of USD 50-USD 75 [33].

ASUS Tinker Board
The ASUS Tinker Board S is a powerful SBC with a wide range of functions, such as computer vision, gesture recognition, image stabilization and processing, and computational photography. It has a Rockchip Quad-Core RK3288 CPU, an ARM® Mali™-T764 GPU, a 2 GB dual-channel DDR3 memory, a 16 GB eMMC onboard storage, 4 USB 2.0 ports, and Realtek Gigabit LAN connectivity. It has a maximum power consumption of 5 Watts and is a relatively low-price system for all of its capabilities, ranging in price from USD 100-USD 150 [34].
The ASUS Tinker Edge R is specifically developed for AI applications, containing an integrated Machine Learning (ML) accelerator that speeds up processing efficiency, lowers power demands, and makes it easier to build connected devices and intelligent applications. It has an Arm® big.LITTLE™ A72+A53 hexa-core CPU, an ARM® Mali™-T860 MP4 GPU, a 4 GB dual-channel LPDDR4 system memory and a 2 GB LPDDR3 memory on the Rockchip NPU, a 16 GB eMMC flash storage as well as a microSD slot, 3 USB 3.2 Type A and 1 USB 3.2 Type C ports, Gigabit Ethernet, and HDMI ports. It supports a maximum 65-Watt power supply and is a relatively low-price system for all of its capabilities, ranging in price from USD 200-USD 270 [38].
All of the information related to hardware specifications is summarized in Table 1.

Sensors
Electrical sensors are components responsible for gathering input from a given physical environment. The specific input that a sensor responds to varies from sensor to sensor and could be temperature, ultrasound waves, light waves, pressure [39,40], or motion. Sensors do this by acting as switches in a circuit, controlling the flow of electric charge through their overall systems. Sensors can be split into two overarching categories: active sensors and passive sensors. Active sensors emit their own radiation, such as ultrasound waves or laser light, from an internal power source; this radiation is reflected from objects in the environment, and the sensor then detects these reflections as inputs. Radars are an example of active sensors. Passive sensors simply detect the radiation or signature emitted from their targets, such as body heat [41].
The most important characteristics of sensor performance are transfer function, sensitivity, span, uncertainty, hysteresis, noise, resolution, and bandwidth. The transfer function is the functional relationship between the physical input signal and the electrical output signal. The sensitivity is the ratio of a small change in the output electrical signal to the corresponding small change in the input physical signal, i.e., the slope of the transfer function. The span is the range of input physical signals that may be converted to electrical signals by the sensor. Uncertainty is generally defined as the largest expected error between actual and ideal output signals. Hysteresis is the width of the expected error, in terms of the measured quantity, for sensors that do not return to the same output value when the input stimulus is cycled up or down. Output noise is generated by all sensors in addition to the output signal, and since there is an inverse relationship between the bandwidth and the measurement time, the noise decreases with the square root of the measurement time. The resolution is defined as the minimum detectable signal fluctuation. The bandwidth is the frequency range between the upper and lower cutoff frequencies, which respectively correspond to the reciprocals of the response and decay times [42].
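These definitions can be made concrete with a minimal sketch for an idealized linear temperature sensor, where the transfer function is V = sensitivity * T + offset. All numbers here are hypothetical, chosen only to illustrate how sensitivity, span, noise, and resolution relate to one another.

```python
# Illustrative constants for a hypothetical linear temperature sensor.
SENSITIVITY = 0.01     # V per degree C: slope of the transfer function
OFFSET = 0.5           # V output at 0 degrees C
SPAN = (-40.0, 125.0)  # range of inputs the sensor can convert, degrees C
NOISE_RMS = 0.0005     # V of output noise

def transfer(temp_c: float) -> float:
    """Transfer function: maps the physical input to the electrical output."""
    lo, hi = SPAN
    if not lo <= temp_c <= hi:
        raise ValueError("input outside sensor span")
    return SENSITIVITY * temp_c + OFFSET

def resolution() -> float:
    """Minimum detectable input fluctuation: the output noise floor
    referred back to the input, i.e. noise divided by sensitivity."""
    return NOISE_RMS / SENSITIVITY

print(transfer(25.0))  # 0.75 V at 25 degrees C
print(resolution())    # 0.05 degrees C
```

In this toy model the resolution follows directly from the noise and sensitivity: a noisier or less sensitive sensor cannot distinguish smaller input changes.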
Once sensors acquire input and convert it into electrical current, they can communicate their data to the rest of an overarching system through a variety of means, the main methods being to transfer data over a wired interface or to transfer data wirelessly [43,44]. Since the embedded systems studied in this research all made use of wired communication for their sensing systems, we focus only on wired communication. Standard wired interfaces between sensors and computing devices use serial ports, which transfer data between the data terminal equipment (DTE) and data circuit-terminating equipment (DCE). For successful data communication, the DTE and DCE must agree on a communication standard, the transmission speed, the number of bits per character, and whether stop and parity framing bits are used. Most modern computing devices and embedded systems use USB standards for their communication, connection, and power peripherals, which includes any additional sensor systems. USB has had many iterations since its inception: USB 1.x (up to 12 Mbps), USB 2.0 (up to 480 Mbps), USB 3.0 (up to 5 Gbps), and USB4 (up to 40 Gbps). Most devices have ports for the USB 2.0 and USB 3.0 types, with USB4 being mostly suited to mobile smartphone devices. One of the main advantages of USB devices, including sensor systems, is that they can have multiple functionalities through a single connection port; for example, a USB camera can record both video and audio. These devices are referred to as composite devices, and each of their functionalities is assigned a specific address. USB devices can draw 5 V and a maximum of 500 mA from a USB host, allowing both a data interface for sensor systems and power for the sensor component [45].
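The framing parameters the DTE and DCE must agree on can be sketched in a few lines. The example below frames one data byte in a common "8E1" configuration (one start bit, 8 data bits transmitted LSB first, an even parity bit, one stop bit); the function name and configuration are illustrative, not a specific driver API.

```python
def frame_byte_8e1(byte: int) -> list:
    """Frame a single data byte as the bit sequence placed on an
    8E1 serial line: start bit, 8 data bits (LSB first), even parity
    bit, stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB transmitted first
    parity = sum(data) % 2                      # even parity: make 1s even
    return [0] + data + [parity] + [1]          # start + data + parity + stop

bits = frame_byte_8e1(0x41)  # ASCII 'A' = 0b01000001
print(bits)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

Each character thus costs 11 bit-times on the wire, which is why the agreed transmission speed (baud rate) does not translate directly into a data byte rate.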

Sensor-to-Computation Pipeline
Once sensor systems receive input, they convert the input into digital data and transfer it to a display or a larger system. The format of the gathered data depends on the specific input a sensor collects; cameras collect videos or images, and microphones collect audio. The environmental data collected by sensors are then stored within internal or external storage components connected to the overall system. These data are then used for whatever purpose the overall system employing the sensor was designed.
As the focus of these research projects is reviewing the capability of different embedded systems for running machine learning models, all of the sensor data are transferred to a previously trained machine learning algorithm or used to train a new algorithm based on an existing architecture. In cases of trained model deployment, depending on the exact application of the model as well as its architecture, the stored data collected by the sensor systems are transferred to the model to perform predictions. For example, image identification and object recognition models compare image files to the dataset images they have been trained with to identify either specific objects of interest or the entire image, while forest biomass estimation models compare the results gathered from LiDAR sensors to their training dataset to estimate the concentration of vegetation in certain areas of forests [46].
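The deployment path described above can be sketched schematically: a stored sensor reading is handed to a previously trained model, which compares it to its reference data to produce a prediction. The "model" below is a toy nearest-neighbor classifier over feature vectors; all data, labels, and names are made up for illustration and stand in for a real trained network.

```python
import math

# Reference dataset the toy model was "trained" with: feature vector -> label.
TRAINED_DATASET = [
    ([0.9, 0.1], "pedestrian"),
    ([0.2, 0.8], "vehicle"),
]

def predict(features: list) -> str:
    """Compare incoming sensor features to the trained dataset and
    return the label of the closest reference sample."""
    return min(
        TRAINED_DATASET,
        key=lambda item: math.dist(item[0], features),  # Euclidean distance
    )[1]

# Simulated sensor reading flowing through the pipeline, e.g. features
# extracted from a stored camera frame:
reading = [0.85, 0.2]
print(predict(reading))  # pedestrian
```

A deployed embedded model follows the same shape at much larger scale: stored sensor data in, comparison against learned parameters, prediction out.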

Specific Sensors
Much like the different embedded computing systems that were used for machine learning implementation, many different sensors were used in our review sources depending on the application of the research. Not all sources made active use of a sensor within their work; some mainly explored the theoretical implementation of their machine learning models using sensor systems. Among those that did implement their systems in some capacity, many implemented some form of object detection, image recognition, image segmentation, or other computer vision, making extensive use of different integrated and separate image and video cameras. These cameras included infrared, RGB, depth, thermal, and 360-degree cameras. Other sensors used included microphones, electrocardiograms, radar, motion sensors, LiDAR, and multi-sensors.

RGB Cameras
RGB color cameras, or visible imaging sensors, are sensor systems that collect and store visible light waves as electrical signals that are then reorganized into rendered color images. The images and videos they capture replicate human vision, capturing light waves with 400-700 nm wavelengths through light-sensitive electrical diodes and saving them as pixels. Modern cameras can capture high-definition images [47]. The main use of these sensors is for object detection and image classification algorithms. Among the sources in this review, the main applications in which an RGB camera was implemented included autonomous vehicles for pedestrian and sign detection; security cameras for intruder detection, facial recognition, and employee safety monitoring; and drones for search and rescue, domestic animal monitoring [48,49], agricultural crops, and wildlife observation [50].

Infrared Cameras
Infrared cameras or thermal imaging sensors are sensor systems that collect and store the heat signature that is emitted from objects as electronic images that show the apparent surface temperature of the captured object. They contain sensor arrays, consisting of thousands of detector pixels arranged in a grid on which infrared energy is focused. The pixels then generate an electrical signal that is used to create a color map image corresponding to the heat signature detected on an object ranging from violet to red, yellow, and finally white, with deep violet corresponding to the lowest detected heat signature and bright white corresponding to the highest detected heat signature [51]. In a similar sense to RGB cameras, the main use of these sensors is for object detection and image classification algorithms, albeit for more specialized tasks. Applications proposed by the sources in this review included autonomous vehicles for pedestrian detection, hand gesture, sign language, and facial expression recognition, thermal monitoring of electrical equipment, and profile recognition in smart cities.

Depth Cameras
Depth or range cameras are specific forms of sensor systems used to measure the exact three-dimensional depth of a given environment. They work by illuminating the scene with infrared light and measuring the time-of-flight. There are two operating principles for these sensors: pulsed light and continuous-wave amplitude modulation. In a sense, depth camera operation is very similar to LiDAR, relying on infrared radiation reflection instead of laser light [52]. Among the sources of this paper, depth cameras were mainly used for quad-copter drone formation control, ripe coffee bean identification, and personal fall detection.

360 Degree Cameras
360-degree cameras are sensor systems used to record images or video from all directions in 3D space using two over-180-degree cameras facing the front and rear of the device; the borders of the two images or videos are then stitched together to create a seamless single 360-degree image or video file. Users and automated applications can then select a specific section of the captured 360-degree image or footage for the intended use. Other than the over-180-degree field of view of each camera lens, 360-degree cameras work in an identical fashion to RGB cameras, capturing visible-spectrum light and storing it as digital data in pixel format [53,54]. While 360-degree cameras have various applications, from recreational ones such as vlogging and nature photography to navigational ones such as Google Maps, the sources used in this paper mainly relied on them for biometric recognition and marine life research.

Radar
RADAR, short for Radio Detecting And Ranging, is a radio transmission-based sensor system designed for detecting objects. RADAR systems operate using short-pulse electromagnetic waves; these pulses are reflected back from objects in the sensor's path. Essentially, "When these pulses intercept precipitation, part of the energy is scattered back to the RADAR" [55]. RADAR systems can rely on 14 different frequency bands depending on the application. They have a wide variety of applications, from meteorology to military surveillance and astronomical studies. Among the sources used for this review, RADAR systems were scarcely used; where they did appear, the main usages were deep-learning-based car-following systems for hybrid electric cars and multi-target classification for security monitoring.

LiDar
Lidar (light detection and ranging) sensors emit millions of laser waveforms and then collect their reflections to precisely measure the shape and distance of physical objects in a 3D environment. Essentially, they are laser-based radar systems. This process is repeated millions of times per second to create a precise, real-time three-dimensional map of an area called a point cloud, which can then be used in navigation systems [56]. While the technology itself is decades old, improvements in Lidar range, accuracy, and power consumption, as well as in physical features such as dimensions and weight, have driven a rise in its popularity in recent years, especially in the fields of robotics, navigation, remote sensing, and advanced driving assistance [57]. Among our sources, Lidar was mainly used for locating people in danger during search and rescue operations, such as those following an earthquake, and for optimizing trajectory tracking for small multi-rotor aerial drones.
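Each Lidar return is a range measured along a known beam direction; converting these spherical coordinates to Cartesian points and accumulating them is what builds the point cloud described above. The following sketch shows the conversion for a single return (the angle convention is an illustrative assumption; real sensors differ in their frame definitions).

```python
import math

def beam_to_point(range_m: float, azimuth_deg: float, elevation_deg: float):
    """Convert one Lidar return (range plus beam angles) to an (x, y, z) point.

    Azimuth is measured in the horizontal plane from the x-axis,
    elevation upward from that plane. Accumulating millions of such
    returns per second yields the point cloud used for navigation.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return (x, y, z)
```

A return of 10 m at zero azimuth and elevation maps to the point (10, 0, 0) directly ahead of the sensor.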

Microphones
Microphones are sound sensors that act as transducers, converting sound waves into electrical audio signals carrying the sound data. When sound waves strike the microphone diaphragm, the resulting vibrations are converted into a corresponding output audio signal via electromagnetic or electrostatic principles [58]. This audio signal can then be stored as digital data and replayed or used in other applications, such as training sound recognition machine learning models. The sources presented in this review mainly used microphones for real-time speech source localization.
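The speech source localization mentioned above is commonly based on the time difference of arrival (TDOA) between a pair of microphones: under a far-field assumption, the delay fixes the direction of the source. The sketch below shows this geometric step only; it is an illustrative simplification, not the method of any specific sourced paper, and the constant assumes air at roughly room temperature.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C (assumed)

def doa_angle_deg(tdoa_s: float, mic_spacing_m: float) -> float:
    """Far-field direction of arrival from the time difference between
    two microphones, measured from broadside (0 = straight ahead)."""
    ratio = SPEED_OF_SOUND * tdoa_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))
```

A zero delay places the source directly in front of the pair, while the maximum possible delay (spacing divided by the speed of sound) places it at 90 degrees, along the microphone axis.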

Body Motion Sensors
Body motion sensors, also known as motion capture (mocap) sensors, are a family of sensor systems used to track a person's physical movement or posture. They generally work by making use of other sensing systems, including photosensors, angle sensors, IR sensors, optical sensors, accelerometers, inertial sensors [59], and magnetic bearing sensors [60]. Mocap sensors are widely known for their use in the entertainment industry, but with recent advances, they have become affordable and accurate enough for common consumer use. Among the sources in this review, motion capture was used for complex posture detection.

Electrocardiograms
Electrocardiograms are heart monitoring sensors used for quick analysis of a patient's heart [61][62][63]. Heart contractions generate natural electrical impulses that are measurable by nonintrusive devices, such as lead wires placed on a patient's skin. The measured pulses are then converted into an electric signal that can be used to detect irregularities in the patient's heart rate [64]. Naturally, electrocardiograms are mainly used in medical facilities or by caregivers and nurses to monitor heart health [65,66]; however, the sources used for this review have also utilized them for identifying epileptic seizures.
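Once the ECG signal's R-peaks have been detected, the intervals between them (R-R intervals) give the heart rate, and their spread is one simple indicator of rhythm irregularity. The following minimal sketch assumes R-peak detection has already been done; the function names and the use of max-min spread as an irregularity proxy are illustrative choices, not a clinical method from the sourced papers.

```python
def heart_rate_bpm(rr_intervals_s):
    """Mean heart rate from the intervals between successive R-peaks."""
    mean_rr = sum(rr_intervals_s) / len(rr_intervals_s)
    return 60.0 / mean_rr

def rr_spread_s(rr_intervals_s):
    """Spread of the R-R intervals; a large spread can flag an
    irregular rhythm for further inspection."""
    return max(rr_intervals_s) - min(rr_intervals_s)
```

For example, a steady 0.8 s interval between beats corresponds to 75 beats per minute with zero spread.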

Electroencephalograms
Electroencephalograms are brain monitoring sensors used for analyzing a patient's brain activity. The brain's processes are the result of electrical currents traveling through its neurons at varying levels depending on the patient's current state, what they are doing, or how they are feeling. Electroencephalograms record these currents across the various brain regions using painless electrodes placed around a patient's scalp. Recordings of these fluctuations are then saved as either a paper or digital graph [67]. Much like electrocardiograms, electroencephalograms are mainly used in medical facilities by caregivers and nurses, in this case to monitor brain activity; however, sources used for this review have also utilized them for anesthesia patient monitoring.
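EEG analysis, including the anesthesia monitoring mentioned above, frequently works with the power of the recorded fluctuations inside standard frequency bands (delta, theta, alpha, beta). The sketch below computes band power with a direct DFT over a short window; it is a didactic illustration using only the standard library, not the processing pipeline of any sourced paper.

```python
import cmath
import math

def band_power(samples, fs, f_lo, f_hi):
    """Power of an EEG trace within one frequency band, via a direct DFT.

    samples: evenly spaced voltage readings; fs: sampling rate in Hz.
    (A real system would use an FFT; the direct sum keeps the idea plain.)
    """
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                        for i in range(n))
            total += abs(coeff) ** 2 / n
    return total
```

A pure 10 Hz rhythm, for instance, concentrates its power in the alpha band (roughly 8-13 Hz) and contributes almost nothing to the delta band.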

Applications
Embedded machine learning applications are all either remote in nature or require more mobile systems for their implementation. The applications covered in this review are divided into the following categories: autonomous driving, security, personal health and safety, unmanned aerial vehicle navigation, and agriculture.

Autonomous Driving
Autonomous driving refers to the ever-expanding field of assisted and self-driving vehicles. It involves the implementation of machine learning algorithms designed to detect obstacles, street signs, pedestrians, and other vehicles. Almost all self-driving vehicle AI models are computer vision models, such as object and depth detection and distance measurement, with some exceptions that rely on Lidar or Radar for obstacle detection. Due to the nature of the application, the highest priority for models developed on embedded systems for self-driving vehicles is performance speed. Driving requires extremely short reaction times, which makes the speed at which a model can identify objects and allow the other car systems to make driving decisions very important.

Security and Safety
Security applications of machine learning span many different areas, such as intruder detection or personnel safety in hazardous worksites [68]. Once again, most of these models are trained for computer vision purposes in order to identify different individuals and ensure authorized access to secure locations and information. They do this through facial recognition and biometric identification using embedded system-operated camera systems, to name a few avenues. Ensuring personnel safety in hazardous work environments also involves constant monitoring by camera systems to see if any of the employees are showing visible signs of illness or injury. Accuracy and computational speed are both of very high importance in these applications.

Healthcare
Monitoring the health of hospital and nursing home patients is one of the fields in which machine learning has been found to be increasingly useful. The AI models trained for these purposes vary depending on the exact nature of the task they are created to accomplish [69,70]. Applications involving the monitoring of specific organs, such as a patient's heart rate or brain activity, can rely on various medical instruments, achieved with electrocardiograms and electroencephalograms, as well as visual and thermal cameras. Fast performance of the machine learning models is of even greater importance in these scenarios, as they can quite literally be a matter of life and death. Other health monitoring applications include posture recognition and monitoring systems that rely on motion sensors and cameras to identify the posture of a given patient and inform their caretakers in case of any danger.

Drones
Aerial drones, or unmanned aerial vehicles, have a long history of military use, but have become increasingly utilized in everyday life over the past decade, be it for package delivery, remote video recording, wildlife research, or simply for recreational purposes. Many of these drones are of the quadcopter variety [71]. While most drones require remote piloting, there has been an increasing element of automation to their navigation [72,73], odometry, landing, and trajectory systems. AI models trained for these purposes use pathways, object images, and balance data models. While performance speed is an important factor for these models, accuracy takes far greater precedence as even the slightest misclassification can result in damage to or the destruction of the drone.

Agriculture
Different agricultural sectors have also started making use of machine learning. Object detection and facial recognition models are customized for recognizing individual animals during feeding and drinking to measure their overall consumption as well as monitor animal behavior and health. Object detection machine learning models are also used in farming crops for identifying weeds within the field, damaged crops, and crops ready for harvest, as well as any damage to the field and its fences. In both instances, the detection accuracy and energy consumption of the models are far more important than the performance speed.

Application Based System Comparison
As previously discussed, most review work on embedded machine learning has focused on the implementation of modified ML architectures on specific embedded devices, whereas in this work, our focus is on identifying the advantages certain systems provide for specific applications and sensing schemes. For this purpose, we have divided our sources into the following categories, with a summary of each presented in Tables 2-12 after the conclusion section. The systems are then compared by their performance and cost, the former being assessed differently depending on the task for which the machine learning model is trained. The method used for analyzing performance differs from source to source and depends heavily on the specific application and sensory system; each sourced paper used a different method for analyzing model accuracy and inference speed. The mean of all the final results, alongside the power consumption, is used to assess the overall performance of each embedded system and is presented in Figures 2-9.
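The aggregation step described above, averaging each source's reported accuracy, inference speed, and power consumption per device, can be sketched as follows. The field names and example figures are illustrative placeholders, not values taken from the sourced papers or tables.

```python
def summarize_device(results):
    """Average the reported metrics of every source that used one device.

    results: list of dicts, one per sourced paper, each holding that
    paper's reported metrics (field names here are assumptions).
    """
    n = len(results)
    return {key: sum(r[key] for r in results) / n
            for key in ('accuracy_pct', 'inference_ms', 'power_w')}
```

Applied to two hypothetical sources reporting 90% accuracy at 40 ms and 10 W, and 80% at 60 ms and 6 W, the summary for that device would be 85% accuracy, 50 ms inference time, and 8 W power draw.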

Image Recognition, Object Detection, and Computer Vision
As previously stated, different machine learning methods have been seeing ever-increasing application within various fields. Among these methods is the broad field of computer vision, which includes image and object detection. These applications can range from security and agriculture to autonomous vehicles; we have further divided them into the specific fields in which they are applied.

Crop Identification
As previously discussed, machine learning has been seeing an increasing level of application within the field of crop and animal agriculture, as it has in many other professions. These applications range from smart, affordable farming solutions such as in [74] to the monitoring of ripened produce as in [75]. While time is valuable in any discipline, for agricultural machine learning applications it is not nearly as much of a priority as power consumption and accuracy. Most of the applications covered in this review involve the usage of object recognition algorithms for the detection of various field or crop features, but other applications are analyzed as well. The performance of these applications is covered in Table 2, in addition to a comparison graph provided in Figure 2.

Face and Expression Recognition
Facial recognition is one of the most well-known applications in the field of computer vision: many personal projects, academic research studies, and computer applications have been developed regarding or using it. There are also many specialized models based on facial recognition, such as facial recognition models for animals [85], or facial expression recognition models that make use of existing facial recognition technologies as a baseline [86]. The priorities of facial recognition models depend on the application: models used for security purposes need both high accuracy and high inference speed, while commercial application models are not under as much scrutiny. Most of the sources used in this review either implement facial recognition directly [87] or use it as a basis for emotion and personality assessment [85]. The performance of these applications is covered in Table 3, in addition to a comparison graph provided in Figure 3.

Depth Estimation
Depth estimation is a sub-field of machine learning that attempts to estimate depth within 2D images. It involves the use of pixel shape and orientation to identify the distance of objects within 2D images and video from the device that recorded them. Its utility is mainly in photography and depth estimation for self-driving vehicles, while within our sources, it was mostly used for personal projects such as in [88]. The performance of these applications is covered in Table 4, in addition to a comparison graph provided in Figure 4.

Autonomous Vehicle Obstacle Recognition
One of the most widespread and focused implementations of machine learning, specifically embedded machine learning, is in autonomous or assisted vehicles. Self-driving cars have been a staple of both science fiction and practical research for decades, but in the past decade, they have come increasingly close to reality, with advances in machine learning being one of, if not the largest, driving factors. While there are many different aspects of driving that a machine learning algorithm could automate, from speed adjustment to the piloting of the vehicle in different directions, the focus in this review is mainly on the implementations of detection schemes for the various obstacles a vehicle can encounter, from other cars to pedestrians [98], road signs [99], traffic lights [5], and speed bumpers [11]. Due to the extremely dangerous nature of this application, systems used for these implementations need to be both as accurate and as fast as possible. The performance of these applications is covered in Table 5, in addition to a comparison graph provided in Figure 5.

Computer Vision in Medical Diagnosis and Disability Assistance
An interesting and beneficial application of computer vision is its use in the diagnosis of medical conditions and in assisting individuals with disabilities. Many of the sources presented in this review made use of RGB and thermal imaging of patients to perform object detection and image classification to find any signs of medical conditions such as melanoma [110] or diabetes [111], while others presented systems for assisting the visually impaired [112]. In both fields of application, while very high accuracy is of extreme importance, high inference speed is also paramount for any aids for individuals with special needs. The results of these benchmarks are covered in Table 6, in addition to a comparison graph provided in Figure 6.

Computer Vision in Safety and Security
A more novel application of computer vision models is their use in security systems as well as safety oversight networks. The sources presented in this section cover applications from detecting violent assaults [12] and mining personnel safety [3] to detecting survivors of severe natural disasters [113]. Most of these applications make use of RGB video and image cameras to perform detection and recognition. The results of these benchmarks are covered in Table 7, in addition to a comparison graph provided in Figure 7.

Smart Cities
"Smart city" is an increasingly used term within tech circles that refers to, among other things, the usage of machine learning and AI for the automation of many aspects of city management. Many of these applications are related to traffic management [14] or to the profiling of individuals [144]. It is very important for these models to be able to handle a large number of objects at any given time; for this reason, inference time is of higher priority for these applications. Most of these applications make use of RGB video cameras to perform detection and recognition. The results of these benchmarks are covered in Table 8, in addition to a comparison graph provided in Figure 8.

General Embedded Computer Vision
Many of the sources presented in this review could not fit into a large enough application category of their own. These sources ranged from works focused on the visual location of robotic limb grasping points [145] to studies of the identification of individuals via their clothing [146]. For that purpose, these sources were all included within a generalized category presented in Table 9, in addition to the comparison graphs shown in Figure 9.

Non-Vision-Related Machine Learning
Among the sources used for this review, a number were unrelated to any sub-field of computer vision and relied on different sensing schemes, from LiDar [171] to ultrasound [13], for gathering training data and implementation, in applications from waste management [148] to heart monitoring [13]. While the sensing schemes and overall applications of these models vastly differed from one another, their numbers for each application and sensor were not sufficient for a proper category-by-category comparison. For this reason, they are displayed within Table 10.

Embedded Machine Learning Optimization
Some of the sources in this review did not look into new applications of machine learning, but rather sought to optimize the performance of existing machine learning architectures on embedded system devices. The optimizations ranged from improving the effectiveness of image captioning models on the NVIDIA Jetson TX2 [172] to pruning deep neural nets [173]. It should be noted that, unlike the other sources in this review, most of these papers did not have sensing schemes. The results of these benchmarks are covered in Table 11, in addition to a comparison graph provided in Figure 10.

Benchmarks, Reviews, and Machine Learning Enhancements
Among the sources used for this review, there were works of research that were not focused on introducing a specific application or a new method for implementing machine learning tasks in any field. These papers either benchmarked different embedded system hardware via the implementation of specific machine learning architectures [20] or sought to improve the learning rate of machine learning models and implement their work on embedded computing systems [23]. While most of the work that fell into this category did not include any sensing schemes, the data gathered in these works were highly relevant and were for that reason included in this review. The results of these benchmarks are covered in Table 12, and a comparison graph is provided in Figure 11.

Conclusions
Rapid advances have been made in the field of machine learning, causing an explosion in model variety, application, and performance. While many of these models are implemented on powerful stationary computers, many applications face cost, power, and size limitations on the deployment of their models. For this reason, the field of embedded machine learning, i.e., the implementation of machine learning on embedded computing systems, has also received a great deal of attention recently. The main challenges in embedded machine learning stem from the severe limitations of embedded system devices in terms of computational performance and power, with different devices having different performance levels, power requirements, and purchasing costs. In this review, a large collection of research work on the implementation of embedded machine learning on Raspberry Pi, NVIDIA Jetson, and a few other series of devices is presented alongside the overall power consumption, inference time, and accuracy of these implementations. In addition, unlike many other reviews of this topic, this paper also presents the overall sensing scheme used in many of the works, a major dimension of embedded machine learning study that we believe has been overlooked by most other reviews on the subject. The hope is that this review gives interested researchers a general introduction to the field of embedded machine learning.
Overall, this review covered studies of several generations of embedded systems, specifically the NVIDIA Jetson and Raspberry Pi series, showing that, much like dedicated computing systems, embedded devices have been experiencing steady improvements in performance and power consumption. More recent Jetson boards such as the TX2 offer far higher performance than the TX1 while having the same power consumption. As these advances continue, it stands to reason that embedded machine learning will see even greater attention and become even more widespread. All of the systems discussed in this work have their own distinct advantages and disadvantages that users need to consider when choosing a system for their embedded machine learning application. More robust systems with high performance and relatively efficient power usage, such as the Jetson and Coral Dev Board lines, tend to be more expensive, while more affordable options such as the Raspberry Pi and Banana Pi boards tend to have far lower performance. More remote applications, such as agricultural object detection systems, might need a greater number of low-power systems without much emphasis on performance, while autonomous vehicle applications place a far greater emphasis on performance and accuracy than on cost and power usage. A general table of each source's hardware, application, ML architecture, and sensor is provided in Table 13 for interested readers.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: