This section presents the results of the chosen experimental studies, organized by application category. For each study, we examined the demonstrated TinyML use case and, in particular, the reasons the authors gave for choosing a TinyML solution over an IoT or cloud-based solution. Their results and conclusions were then analyzed to assess how each experiment fits into TinyML's application history.
4.1. TinyML Applications in Healthcare
This subsection covers TinyML experiments related to the field of healthcare. Most of these studies have covered patient diagnostics and health monitoring.
Table 7 lists the results of the healthcare application experiments.
Experimental studies on TinyML implementations in healthcare have focused on data processing assistance for various treatment or detection procedures. One experiment tested TinyML as a solution for speech enhancement in hearing aids [26]. The experiment used an STM32 microcontroller unit to represent the hardware constraints of a hearing aid. The model's memory footprint was reduced by 47% using experimental pruning techniques so that it could fit onto the device. Survey participants expressed a moderate preference for the enhanced audio sample over the unprocessed one, and the computational latency was reduced to 4.26 ms, below the benchmark requirement of 10 ms. The authors judged the optimized model satisfactory and its latency low enough to consider pruned RNNs for the task, and the TinyML implementation allowed audio processing on the edge device without relying on a real-time IoT service.
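As a rough illustration of the pruning step described above, the following sketch applies magnitude pruning to a small recurrent speech-enhancement model using the TensorFlow Model Optimization toolkit. The layer sizes, sparsity target, and data shapes are placeholder assumptions and not the configuration used in [26].

```python
# Minimal sketch of magnitude pruning for a small recurrent model, loosely
# following the idea of shrinking an RNN to fit a hearing-aid MCU.
# Layer sizes, sparsity target, and data shapes are illustrative only.
# Note: support for pruning recurrent layers depends on the
# tensorflow_model_optimization version; a Dense-only model can be substituted.
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

FRAME_FEATURES = 129          # spectral bins per audio frame (assumed)
SEQUENCE_LENGTH = 16          # frames per inference window (assumed)

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQUENCE_LENGTH, FRAME_FEATURES)),
        tf.keras.layers.GRU(128, return_sequences=True),
        tf.keras.layers.Dense(FRAME_FEATURES, activation="sigmoid"),  # spectral mask
    ])

# Prune roughly half of the weights, mirroring the ~47% size reduction reported.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.ConstantSparsity(
        target_sparsity=0.5, begin_step=0)
}
pruned = tfmot.sparsity.keras.prune_low_magnitude(build_model(), **pruning_params)
pruned.compile(optimizer="adam", loss="mse")

# Dummy training data stands in for paired noisy/clean spectra.
x = np.random.rand(256, SEQUENCE_LENGTH, FRAME_FEATURES).astype("float32")
y = np.random.rand(256, SEQUENCE_LENGTH, FRAME_FEATURES).astype("float32")
pruned.fit(x, y, epochs=1, batch_size=32,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip pruning wrappers before converting the model for on-device deployment.
deployable = tfmot.sparsity.keras.strip_pruning(pruned)
deployable.summary()
```

After stripping the pruning wrappers, the sparse model can then be converted (e.g., to TensorFlow Lite) for deployment on a constrained target.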
Other TinyML experiments cover medical detection scenarios, one example being a TinyML model for detecting gait deficits among patients with Parkinson's disease [27]. The researchers identified latency and network connection dependency as the reasons for choosing TinyML over the alternative solutions. The embedded model was run on an ATmega2560 microcontroller. Their results suggested that these techniques accurately classified Parkinson's symptoms and could be extended in future work to perform predictive analysis. They concluded that their hardware was appropriate for the experimental tasks and suggested that the experiment could also be scaled to a wider range of embedded devices.
Another detection experiment analyzed electrocardiogram data using a TinyML solution to detect and classify cardiac arrhythmias [28]. The experiment used an nRF52 chip from Nordic Semiconductor with an ARM Cortex-M4 processor as the hardware base for running the model. The inference model was a CNN built on the CMSIS-NN library, which is designed for ARM Cortex-based processors. The experimental results showed a model accuracy of 87% on the testing set, a 210 KB output binary required to run the network, an inference latency of 95 ms, and a power usage of 21 mW/h. For future work, they suggested comparing their inference library with alternatives such as TensorFlow Lite for Microcontrollers (TFLM) to find the best performing ML architecture for their experimental task.
TinyML has also been shown to detect medical problems from visual data. One such experiment used a TinyML model for the detection and classification of liver lesions from images fed into the model [30]. Rather than a physical microcontroller, the experiment used an automated CAD model fitted for use in a TinyML environment with small-memory constraints. The model consisted of a deep neural network (DNN) architecture with pretrained weights from Keras Applications, trained with the Adam optimizer for maximum accuracy. The experimental results showed 80% accuracy in detecting liver lesions on the experimental data, with a 75 ms inference delay. These results were considered satisfactory enough to support the use of TinyML inference for acute detection of lesions.
Two additional experiments investigated the use of TinyML for seizure detection in patients with epilepsy, using different models running on different hardware devices. One group of researchers ran their experiment on an STM32L476 ARM Cortex microcontroller, with detection processed on the device [29]. The study reported high accuracy rates of over 90% and a low false-alarm rate. The other seizure detection experiment used a BioWolf wearable ExG device with a multicore processor, with the seizure detection model implemented on the device [31]. Both experiments measured detection success using the recall score, because their use cases involved both feature detection and classification. Both studies used Random Forest as the main classification algorithm, finding it the most suitable for classification on a resource-constrained platform.
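As a minimal illustration of this evaluation approach, the sketch below trains a Random Forest on synthetic seizure-window features and reports recall alongside accuracy; the feature dimensions, class balance, and hyperparameters are assumptions rather than the configurations used in [29,31].

```python
# Minimal sketch of seizure-window classification with a Random Forest,
# scored with recall as in the surveyed experiments. All data are synthetic
# placeholders; feature count and tree count are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_windows, n_features = 2000, 32              # e.g., per-channel band-power features
X = rng.normal(size=(n_windows, n_features))
y = (rng.random(n_windows) < 0.1).astype(int)  # ~10% "seizure" windows

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# A modest number of shallow trees keeps the model small enough to consider
# for a resource-constrained target.
clf = RandomForestClassifier(n_estimators=30, max_depth=6, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("recall  :", recall_score(y_test, pred))  # sensitivity to true seizure windows
```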
One study focused on detecting colorectal cancer polyps in patients [46]. The authors chose to embed their model on a hardware device to reduce power consumption and increase accuracy. They used a CNN for classification, implemented on a medical capsule robot (MCR) modified to contain an MCU running a four-layer CNN. Images captured by the MCR sensor were compressed and sent to the model for classification. The model successfully classified cancerous polyps on the test dataset with an accuracy of 92.8% while maintaining low power use, drawing only 2.5488 mW at a clock frequency of 8 MHz, with an inference latency of 1.6 s. They concluded that their four-layer model had a slightly lower accuracy than other common CNNs but that its power and memory efficiency made it usable for classification tasks.
Another study used TinyML to detect respiratory conditions from sound recordings, distinguishing healthy lung respiration from that of asthma patients [49]. The researchers chose on-device inference to reduce diagnosis latency and enhance data privacy by avoiding transmission across a network. The hardware was an Arduino Nano MCU with an ARM Cortex processor and 256 KB of RAM. Inference was performed with a custom-designed CNN running on TensorFlow Lite Micro. The custom CNN achieved 96% accuracy and 97% precision and recall on the test dataset, using only 12 KB of RAM for inference and approximately 250 KB of flash storage for the model, with an inference time of 127 ms.
Overall, TinyML's use in healthcare applications mainly focuses on diagnosis, with ML models chosen for classification purposes. The hardware used is typically specialized microcontrollers and wearable devices, particularly those based on the ARM Cortex series. The key reason for choosing TinyML is resource constraints, in particular memory and power constraints. Other common reasons include system security and data privacy, both of which were hypothesized to be better protected with on-device inference than with cloud-based IoT. TinyML inference latency remained around 100 ms, with complex diagnostics such as cancer lesion detection taking 1 to 2 s on average. Inference accuracy rates ranged between 80 and 99%.
4.2. TinyML Applications in Ecology
This subsection covers the TinyML experiments related to ecology, including agriculture, water treatment, and environmental monitoring.
Table 8 lists the results and implications of the experiments.
Research into TinyML applications in smart farming has found that the majority of TinyML solutions deal with crop management, such as moisture and temperature data processing, irrigation optimization, and yield efficiency optimization [1]. These experiments chose TinyML over IoT because of latency and security issues that prevent the effective use of cloud-based inference. Machine learning techniques in smart agriculture often rely on processing data collected from sensors to analyze crop status and detect the presence of disease or pests [41]. This processing is typically performed under time, power, and memory constraints, as accurate results must be delivered in time to enable a successful response, and detection must be cost-effective and deployable at scale across diverse ecological environments and agricultural settings.
One such experiment used TinyML to forecast greenhouse temperatures, using sensor data to predict temperature patterns [33]. The neural network model was run on an Arduino Nano 33 BLE Sense for greenhouse monitoring. This platform proved sufficient for running the neural network models while consuming very little power compared to traditional computing platforms. The resulting experimental model used only 0.17 W of power, compared to 3.5 W for the base model, suggesting that temperature forecasting can be implemented successfully on the edge even on extremely power-constrained devices.
TinyML for edge audio processing has also been tested, with a trained model deployed on an Arduino Nano connected to a 0.9-inch display [38]. The results showed that mosquito wingbeats can be recognized by a model processing data collected on the Arduino. Limitations include distance issues, as audio quality depends on the sample's distance from the device, and the testing environment imposed certain sound constraints. The model achieved an accuracy of 88.3% on a testing dataset of mosquito wingbeat samples. The inference time was 337 ms, with a RAM consumption of 9.2 kB and a flash usage of 43.4 kB for one second of data.
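The surveyed paper does not detail its feature pipeline, but a generic sketch of how a one-second audio window might be reduced to a compact feature vector for a small on-device classifier is shown below; the sampling rate and the use of MFCC features are illustrative assumptions.

```python
# Generic sketch: turn a one-second audio clip into a compact feature vector
# suitable for a small on-device classifier. MFCCs are an assumption here;
# the surveyed experiment's exact feature pipeline is not specified.
import numpy as np
import librosa

SAMPLE_RATE = 16000                      # assumed sampling rate

def features_from_clip(clip: np.ndarray) -> np.ndarray:
    """Compute MFCCs over a 1 s clip and average them over time."""
    mfcc = librosa.feature.mfcc(y=clip, sr=SAMPLE_RATE, n_mfcc=13)
    return mfcc.mean(axis=1)             # 13-dimensional summary vector

# One second of synthetic audio stands in for a recorded wingbeat sample.
clip = np.random.uniform(-1.0, 1.0, SAMPLE_RATE).astype("float32")
print(features_from_clip(clip).shape)    # (13,)
```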
TinyML image classification can be carried out on an OpenMV Cam STM32H7 Plus, with an LR-Net model embedded in the camera's ROM. The resulting TinyML system can process and classify 15 images per second at an accuracy of 98.0%, with the model using only 13 kB of the camera's 31 kB of RAM [40]. STM32CubeAI was used to generate a C code file from the neural network model so that inference could be performed on the camera hardware.
Other classification experiments focused on using embedded models to detect and classify gases [42]. One implementation achieved a classification accuracy of 72% for detecting gases with a sensor and classifying them into one of four categories (ammonia, methane, nitrous oxide, or none of these). The main proposed use case of this experiment is to assist farmers in monitoring air quality with an edge probe that has less need to remain connected to the cloud. The researchers aimed to develop a low-cost, low-power probe capable of measuring the presence of environmentally harmful gases. The experiment used TinyML inference over IoT to increase transmission efficiency and avoid cloud-to-device traffic. The hardware included several sensors connected to an STM32 MCU running a pretrained ANN converted from Python to C code using XCubeAI. Accuracy was calculated by comparing the predicted values to the true values, which revealed five classification errors in 18 test patterns.
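A minimal sketch of such a four-class gas classifier, together with the accuracy arithmetic implied by the reported results (13 of 18 test patterns correct, roughly 72%), is shown below; the number of sensor channels and the layer widths are assumptions.

```python
# Sketch of a small four-class gas classifier and the accuracy arithmetic
# reported in the surveyed experiment (5 errors over 18 test patterns).
# Sensor feature count and layer widths are illustrative assumptions.
import tensorflow as tf

N_SENSOR_FEATURES = 6        # assumed number of gas-sensor channels
CLASSES = ["ammonia", "methane", "nitrous_oxide", "none"]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_SENSOR_FEATURES,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Accuracy as computed in the paper: correct predictions over test patterns.
errors, test_patterns = 5, 18
print(f"accuracy = {(test_patterns - errors) / test_patterns:.1%}")  # ~72.2%
```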
TinyML solutions have also been tested for vehicular emission detection, monitoring CO2 emissions from vehicles using engine sensor data. The experiment used a Freematics ONE+ hardware platform featuring an ESP32 microcontroller for vehicular emissions monitoring [32]. TinyML was chosen over IoT inference to reduce power costs and reliance on network infrastructure and to enable real-time, distributed monitoring near pollution sources. The researchers implemented unsupervised TinyML with Typicality and Eccentricity Data Analytics (TEDA) to process vehicular emission data, and the system successfully processed OBD-II sensor data while maintaining low power consumption. The experimental results for vehicle emission detection showed 94% accuracy using TEDA, with inference times of approximately 1 ms and RAM usage of only 1.5 KB.
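TEDA itself is a simple recursive procedure, and a minimal sketch of how it can flag anomalous readings in a univariate emission stream is shown below; the threshold parameter and the synthetic data are illustrative assumptions rather than the study's configuration.

```python
# Minimal sketch of TEDA (Typicality and Eccentricity Data Analytics) for
# streaming anomaly detection, of the kind applied to vehicle emission data.
# The threshold m and the synthetic data are assumptions for illustration.
import numpy as np

class TEDADetector:
    """Recursive eccentricity-based outlier detector for a univariate stream."""

    def __init__(self, m: float = 3.0):
        self.m = m          # sigma-style sensitivity threshold
        self.k = 0          # number of samples seen
        self.mean = 0.0
        self.var = 0.0

    def update(self, x: float) -> bool:
        """Consume one sample and return True if it is flagged as anomalous."""
        self.k += 1
        if self.k == 1:
            self.mean, self.var = x, 0.0
            return False
        # Recursive mean and variance updates (no sample buffer needed).
        self.mean = ((self.k - 1) / self.k) * self.mean + x / self.k
        self.var = ((self.k - 1) / self.k) * self.var + \
                   ((x - self.mean) ** 2) / (self.k - 1)
        if self.var == 0.0:
            return False
        # Eccentricity and its normalized form.
        ecc = 1.0 / self.k + ((self.mean - x) ** 2) / (self.k * self.var)
        norm_ecc = ecc / 2.0
        # Flag the sample when it exceeds the m-sigma eccentricity threshold.
        return norm_ecc > (self.m ** 2 + 1.0) / (2.0 * self.k)

# Synthetic CO2-like readings with one injected spike.
readings = np.concatenate([np.random.normal(400, 10, 200), [900.0],
                           np.random.normal(400, 10, 50)])
detector = TEDADetector(m=3.0)
flags = [detector.update(float(r)) for r in readings]
print("anomalies at indices:", [i for i, f in enumerate(flags) if f])
```

Because only the running mean and variance are stored, this style of detector keeps RAM use in the low-kilobyte range, which is consistent with the small memory footprint reported above.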
Another TinyML experiment carried out environmental prediction and atmospheric pressure forecasting to determine whether a neural network running on an MCU can reliably predict weather patterns [34]. The experiment used Long Short-Term Memory (LSTM) networks to predict atmospheric pressure, with two LSTM cells of 30 units each and dropout layers at a 20% drop rate. The hardware was an STM32F401RET6 microcontroller with 512 KB of flash memory and 96 KB of SRAM. The predictions had a Root Mean Square Error of 0.0255. The system demonstrated the successful operation of deep tiny neural networks on a memory-constrained MCU.
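The described architecture can be sketched directly in Keras, as shown below; the input window length, training data, and training schedule are assumptions, while the two 30-unit LSTM layers and 20% dropout follow the description in [34].

```python
# Sketch of the described pressure-forecasting network: two stacked LSTM
# layers of 30 units with 20% dropout, predicting the next pressure value.
# The input window length and training data here are assumptions.
import numpy as np
import tensorflow as tf

WINDOW = 24   # past pressure samples per prediction (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(30, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(30),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),          # next normalized pressure reading
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

# Synthetic normalized pressure series stands in for the sensor log.
series = (np.sin(np.linspace(0, 60, 3000)) + 1.0) / 2.0
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])[..., None]
y = series[WINDOW:]
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, RMSE]
```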
Another experiment examined the use of TinyML to detect cholera contamination in communal tap water in rural areas. The researchers used TinyML to process physicochemical parameters of the water and predict the presence of water-borne cholera, on the premise that traditional laboratory testing methods are expensive and impractical to deploy in rural Africa [35]. The implementation used an embedded kit with an ARM Cortex-M4 processor that could carry out offline inference. The model used a Support Vector Machine (SVM) as the primary ML algorithm, chosen specifically for its effectiveness with small datasets and nonlinear pattern recognition. Model compression was applied to reduce the model to a size that fit on the hardware. The experimental solution achieved 94% accuracy for SVM classification on the testing dataset, with an output latency of 1 ms and a memory usage of 1.6 KB of RAM and 15 KB of flash, while remaining within the power constraints.
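A minimal sketch of SVM classification on water quality parameters is shown below; the parameter set, kernel choice, and data are illustrative assumptions rather than the study's configuration.

```python
# Minimal sketch of SVM classification on water physicochemical parameters,
# echoing the cholera-risk detector; feature names, kernel, and data are
# illustrative assumptions rather than the study's configuration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Columns: pH, turbidity, temperature, conductivity (assumed parameters).
X = rng.normal(size=(400, 4))
y = (X[:, 1] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=400) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=1)
# An RBF-kernel SVM handles small, nonlinear datasets well; a trained model of
# this size can typically be compressed to a few KB for an MCU target.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```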
Models smaller than 140 kB can successfully predict plant growth status and disease presence with 96% accuracy on a 10-class dataset. Such models are compact enough to fit within a battery-powered Sony camera system and draw only about 2.63 mW per hour to operate, allowing for easy use and deployment in power-constrained agricultural environments [41].
One experiment integrated TinyML with unmanned aerial vehicles (UAVs) for smart farming. On-device inference was chosen over cloud inference to reduce bandwidth requirements, enhance privacy, and reduce energy consumption. The model predicted soil moisture content using a combined DNN and LSTM model on an ESP32 microcontroller running TFLM. The resulting DNN achieved a 97% accuracy rate with an average inference time of 97.8 ms. The experiment found that DNN models with smaller LSTM structures required less memory for inference and inferred faster than those with larger LSTM structures [47].
Overall, the use of TinyML in ecology spans a wide breadth of applications, ranging from sensor data processing to audio and image classification. As a result, a wide range of neural networks are used, each specific to its application. Resource constraints include memory, power, and communication bandwidth, with the latter dependent on the local infrastructure of the application environment. Inference latency ranged from 0.1 to 1 s, and accuracy ranged from 98% for simple image classification down to 72% for complex gas analysis. These results show that TinyML's performance in ecological experiments varies with task complexity and experimental setting.
4.3. TinyML Applications in Vehicular Detection
Another field for TinyML experimentation is vehicular assistance software, particularly involving sensors and object detection.
Table 9 lists the related experiments, their results, and the implications.
Intelligent cars offer a new platform for the development of embedded vehicle service applications [43]. TinyML implementations in vehicles stem from research gaps identified in IoT-based intelligent vehicle implementations. IoT solutions rely on direct data-streaming connections between a vehicle processor and a centralized cloud server, an arrangement that suffers from scaling issues: beyond a certain scale, the cloud server may not be capable of managing the volume of data transmission, reception, and processing [43]. Solving this problem requires localizing certain ML system components, such as inference, onto the edge computing layer to limit the volume of data that must be streamed to and from the cloud. This research gap offers opportunities to experiment with edge inference for intelligent vehicles, both by applying such a model to different use cases and by optimizing the degree of localization between devices and the cloud.
Use cases for TinyML systems in intelligent vehicles often involve the detection of other road objects such as vehicles or pedestrians [36] and the detection of road quality issues such as potholes. TinyML experiments have included models for real-time vehicle and pedestrian detection (VaPD) in automotive driver assistance systems. One such experiment tested whether TinyML could process camera input data and detect both vehicles and pedestrians within model size constraints [36]. The researchers implemented a vehicle and pedestrian detection system on a Raspberry Pi 4 and an NVIDIA Jetson Nano 2 GB, with the Raspberry Pi 4 augmented by a Coral USB Accelerator containing an Edge TPU coprocessor for optimal inference capability. The edge model used a Tiny YOLO v3 architecture with Tucker tensor decomposition to decompose the convolutional layers and reduce the parameter count; this technique was chosen specifically because it joins decomposition and fine-tuning in a single optimization step. The optimized model achieved an experimental precision of 77.5% while using a much smaller memory size (22 to 32%) than the baseline model, with a storage cost of 10.7 MB compared to an 875 MB baseline. These results showed that functional object detection can be implemented on an edge vehicle even under model size and storage constraints, allowing for more memory-efficient implementations.
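To illustrate the compression idea, the sketch below applies a Tucker decomposition to a single convolutional kernel and compares parameter counts before and after; the kernel shape and target ranks are assumptions, and the joint decomposition-plus-fine-tuning procedure used in [36] is not reproduced here.

```python
# Sketch of Tucker decomposition applied to a convolutional kernel, the
# compression idea used on Tiny YOLO v3 in the surveyed work. The kernel
# shape and target ranks are illustrative assumptions.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend("numpy")

# A Keras-style conv kernel: (kernel_h, kernel_w, in_channels, out_channels).
kernel = np.random.randn(3, 3, 128, 256).astype("float32")

# Keep the spatial modes intact and shrink only the channel modes.
ranks = [3, 3, 32, 64]
core, factors = tucker(tl.tensor(kernel), rank=ranks)

original_params = kernel.size
compressed_params = core.size + sum(f.size for f in factors)
print(f"original parameters  : {original_params}")
print(f"compressed parameters: {compressed_params} "
      f"({compressed_params / original_params:.1%} of original)")

# Reconstruction error gives a sense of how much fine-tuning must recover.
approx = tl.tucker_to_tensor((core, factors))
rel_err = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
print(f"relative reconstruction error: {rel_err:.3f}")
```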
Detection targets also include the vehicle's driver, as shown in one experiment [39]. The experiment used TinyML to detect signs of driver drowsiness for a vehicle safety assistance system. It combined lightweight sensors with a locally run embedded model to minimize computational cost and reduce inference latency as much as possible, as cloud-hosted inference was considered to have an unacceptable latency. The main challenge identified was the high training cost of DL models, in terms of both computing power and training data, required to make accurate predictions. The experiment used several lightweight DL models, quantized to reduce their size, to perform inference. It aimed to address limitations found in existing driver detection experiments, namely difficulties in accurately identifying different driver head movements and user-unfriendly means of driver detection. The resulting model had an accuracy of 0.9964.
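The quantization step mentioned above can be illustrated with a standard post-training quantization sketch; the model architecture, input shape, and calibration data below are placeholders rather than the models used in [39].

```python
# Sketch of post-training quantization, the size-reduction step described
# for the drowsiness-detection models. The model, input shape, and
# representative data below are placeholders, not the study's own models.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),       # assumed grayscale face crop
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # alert vs. drowsy
])

def representative_data():
    # A handful of calibration samples; real images would be used in practice.
    for _ in range(32):
        yield [np.random.rand(1, 64, 64, 1).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
print(f"quantized model size: {len(tflite_model) / 1024:.1f} KB")
```

Full integer quantization typically shrinks a model to roughly a quarter of its float32 size, which is why it is a common step when fitting models onto embedded targets.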
A similar experiment examined TinyML for real-time bus passenger detection and counting in smart public transport systems [37]. The experiment involved optimizing a Tiny YOLO network to accurately detect passengers while still meeting resource constraints. The model was adjusted to use a depthwise decomposition, carrying out decomposition and fine-tuning in separate steps, and employed batch normalization and LeakyReLU activations to reduce the computational complexity as much as possible. The experiment achieved a detection accuracy of 0.945 during rush hour, with the model file size decreasing from 60.5 MB to 7 MB compared to the base model.
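A minimal sketch of the kind of building block implied by this description, a depthwise-separable convolution followed by batch normalization and LeakyReLU, is shown below; the filter counts and input resolution are assumptions.

```python
# Sketch of a depthwise-separable convolution block with batch normalization
# and LeakyReLU, the general pattern described for the passenger-counting
# model. Filter counts and input shape are illustrative assumptions.
import tensorflow as tf

def depthwise_block(x, filters: int):
    """Depthwise conv + pointwise conv, each followed by BN and LeakyReLU."""
    x = tf.keras.layers.DepthwiseConv2D(3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU()(x)
    x = tf.keras.layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU()(x)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = depthwise_block(inputs, 32)
x = tf.keras.layers.MaxPooling2D()(x)
x = depthwise_block(x, 64)
model = tf.keras.Model(inputs, x)
model.summary()   # far fewer parameters than the equivalent standard convolutions
```

Replacing standard convolutions with depthwise-separable ones is what drives the large reduction in file size reported above, since each block needs only a fraction of the weights of a full convolution.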
When implemented on a camera sensor in an intelligent vehicle, a TinyML system can detect road anomalies such as speed bumps and potholes at a high rate, maintaining an F1 score of 0.76–0.78 across multiple iterations of an experimental driving route [43]. The study did not compare memory or power use, which may provide an additional research topic when assessing the efficiency of intelligent vehicle TinyML implementations rather than just their accuracy.
The use of TinyML for vehicular detection mainly involves image-based detection and classification of obstacles, pedestrians, passengers, and vehicles. These applications favor TinyML solutions due to the need to overcome bandwidth and latency delays and to avoid communication link problems. The results depend on the setting and available resources, with accuracy typically around 76–78%, indicating a need for improvement.
4.4. Other TinyML Applications
TinyML is also used for other applications such as industrial design, edge device security, and edge model deployment.
Table 10 lists studies that used TinyML for applications other than healthcare, ecology, or vehicular detection.
Other experiments have focused on the application of TinyML to machine design and production. One experiment aimed to use TinyML for efficient real-time fault detection in operating machinery to reduce production and maintenance costs. The TinyML model was tasked with monitoring the condition of industrial assets and detecting anomalies in them. Data acquisition, training, and inference were all performed on an edge device. The framework used an MCU attached to an accelerometer to capture vibration signals. The TinyML system achieved an anomaly detection accuracy of 99.9% when tested on a centrifugal pump [44].
The heterogeneity of TinyML systems, particularly of their hardware, creates barriers to their widespread use in industrial settings where standardization and scale are paramount [6]. Insufficiently detailed documentation of TinyML model distributions further hinders their integration into large-scale industrial systems. Nevertheless, the privacy, latency, and energy efficiency benefits of TinyML continue to drive demand for its use in industrial settings.
TinyML was chosen over IoT to avoid a bottleneck in transferring and receiving data caused by bandwidth constraints and server response times, which delay inference; this problem was hypothesized to worsen as the number of machines increased. The experiment used edge training as well as edge inference, as the authors believed that the anomaly detector must be trained on the same device on which inference is performed. The experiment used an STM32 MCU with 1 MB of RAM and 2 MB of flash memory, built around an ARM Cortex core. The system used a signal processing technique called Wavelet Packet Decomposition (WPD) to reduce the data dimensions to fit the MCU's memory constraints, separating the input data into low- and high-frequency regions. The autoencoder was trained in C using a backpropagation algorithm. The network was trained sequentially to account for data storage limits and functioned by comparing the Mean Square Error (MSE) of the sensor data to a set anomaly level; if the error exceeded the anomaly level, a signal was sent to investigate the machine. The system output was evaluated by comparing the ratios of True Positives, True Negatives, False Positives, and False Negatives. The study claimed a training accuracy of 0.997, with only one false positive out of 3400 samples, and a testing accuracy of 1.0 on 100 test samples. The main limitation is the test sample count, as 100 tests on only one machine may be insufficient for this use case. The test results are also insufficiently explained, as no diagram is provided to analyze them; the results (accuracy, precision, and F1 score) are only mentioned in one line.
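A rough end-to-end sketch of this pipeline, with Wavelet Packet Decomposition features feeding an autoencoder whose reconstruction MSE is compared against an anomaly level, is shown below; the wavelet, window length, network sizes, and threshold rule are illustrative assumptions rather than the study's on-device C implementation.

```python
# Sketch of the anomaly-detection pipeline described above: Wavelet Packet
# Decomposition reduces a vibration window to a small feature vector, an
# autoencoder learns to reconstruct "healthy" features, and a sample is
# flagged when its reconstruction MSE exceeds a set anomaly level. The
# wavelet, window length, and network sizes are illustrative assumptions.
import numpy as np
import pywt
import tensorflow as tf

def wpd_features(window: np.ndarray, wavelet: str = "db4", level: int = 3):
    """Energy of each wavelet-packet node at the given decomposition level."""
    wp = pywt.WaveletPacket(data=window, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="natural")
    return np.array([np.sum(node.data ** 2) for node in nodes], dtype="float32")

# Synthetic "healthy" vibration windows (sinusoid plus noise).
t = np.linspace(0, 1, 1024, endpoint=False)
healthy = np.stack([np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
                    for _ in range(200)])
features = np.stack([wpd_features(w) for w in healthy])      # shape (200, 8)

# Small autoencoder trained only on healthy data.
dim = features.shape[1]
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(dim,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(dim),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(features, features, epochs=20, verbose=0)

# Anomaly level: a margin above the reconstruction errors seen in training.
train_errors = np.mean((autoencoder.predict(features, verbose=0) - features) ** 2,
                       axis=1)
anomaly_level = train_errors.mean() + 3 * train_errors.std()

def is_anomalous(window: np.ndarray) -> bool:
    f = wpd_features(window)[None, :]
    mse = float(np.mean((autoencoder.predict(f, verbose=0) - f) ** 2))
    return mse > anomaly_level

faulty = np.sin(2 * np.pi * 50 * t) + 1.5 * np.random.randn(t.size)  # noisy fault
print("healthy flagged:", is_anomalous(healthy[0]))
print("faulty flagged :", is_anomalous(faulty))
```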
One experiment deployed and tested ML models on a field-programmable gate array (FPGA). An edge inference solution was used to reduce classification latency, because detecting and classifying malware in real time is highly time sensitive and requires detection speeds measured in microseconds [45]. The FPGA was loaded with software to detect and classify malware and side-channel attacks from hardware register data collected on the device. As the goal of the experiment was to design and test a detection system for hardware-level malware, the authors chose an edge AI implementation that performs detection and classification on an embedded system. Multiple model types were tested on the same hardware to compare their performance and assess their suitability. The experimental results showed varying degrees of accuracy across the implemented models, with the J48 model achieving the highest F1 score of 0.918 for malware detection.
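The model-comparison step can be illustrated as follows, with a scikit-learn decision tree standing in for Weka's J48 (C4.5); the feature set and data are synthetic placeholders.

```python
# Rough sketch of the model-comparison step: several lightweight classifiers
# are trained on the same hardware-counter feature set and ranked by F1 score.
# A scikit-learn decision tree stands in for Weka's J48 (C4.5); the features
# and data here are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))                 # e.g., hardware performance counters
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)      # 1 = malicious behavior (synthetic)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

models = {
    "decision_tree (J48-like)": DecisionTreeClassifier(max_depth=8, random_state=2),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:26s} F1 = {f1_score(y_te, model.predict(X_te)):.3f}")
```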
A final experiment examined automated cloud-to-edge deployment of ML models. The ML training pipeline returned a trained model and a set of evaluation metrics, which were deployed via an automated deployment pipeline that ran immediately after validation. The model deployment script ran within a Docker service image, which deployed the model on the edge device, where it could carry out inference on real-time data generated by the device's sensors. The experiment simulated a real use case and applied the ML pipeline to it: the pipeline successfully trained the model in the cloud and deployed it to an edge device, where the deployed model performed principal component analysis on humidity and temperature measurements collected in real time by the device sensors. The training cloud server had four cores and 16 GB of RAM, and the edge device was a Kunbus, which is similar to a Raspberry Pi Core. The ML model ran inside a Docker container on the edge device and required 1.267 s to run. Building the Docker image from the trained model took 457.4 s, and the combined deployment steps took approximately 38 s. The running model could process more than 12,000 requests per minute and correctly predicted all anomalies in the test dataset [51].
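As a rough sketch of the edge-side inference described here, the snippet below fits a PCA model on humidity and temperature data and flags readings with a high reconstruction error; the data, threshold, and feature choices are illustrative assumptions rather than the deployed pipeline's configuration.

```python
# Sketch of the kind of edge-side inference described above: a PCA model,
# fitted in the cloud, scores streaming humidity/temperature readings by
# reconstruction error so unusual measurements can be flagged. Thresholds
# and data are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Training data from the cloud: correlated humidity (%) and temperature (°C).
temp = rng.normal(22.0, 2.0, 5000)
humidity = 60.0 - 1.5 * (temp - 22.0) + rng.normal(0.0, 1.0, 5000)
train = np.column_stack([humidity, temp])

pca = PCA(n_components=1).fit(train)           # keep the dominant correlation axis

def reconstruction_error(samples: np.ndarray) -> np.ndarray:
    restored = pca.inverse_transform(pca.transform(samples))
    return np.mean((samples - restored) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(train), 99.5)

# Simulated real-time readings from the edge device's sensors.
stream = np.array([[58.0, 23.0],      # consistent humidity/temperature pair
                   [75.0, 30.0]])     # anomalous combination
for reading, err in zip(stream, reconstruction_error(stream)):
    status = "ANOMALY" if err > threshold else "ok"
    print(f"humidity={reading[0]:.1f}%, temp={reading[1]:.1f}°C -> {status}")
```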