TinyML with Meta-Learning on Microcontrollers for Air Pollution Prediction

Tiny machine learning (tinyML) involves the application of ML algorithms on resource-constrained devices such as microcontrollers. TinyML performance can be improved by using a meta-learning approach. In this work, we propose lightweight base models running on a microcontroller to predict air pollution and show how performance can be improved using a stacking ensemble meta-learning method. We used an air quality dataset for London. Deployed on a Raspberry Pi Pico microcontroller, the tinyML file sizes were 3012 bytes and 5076 bytes for the two proposed base models. The stacked model achieved RMSE improvements of up to 4.9% and 14.28% when predicting NO2 and PM2.5, respectively.


Introduction
Tiny machine learning is a rapidly growing field of machine learning [1], targeting resource-constrained devices such as microcontrollers to optimise energy efficiency and latency [2]. This work aims to improve tinyML models running on a microcontroller by using a meta-learning method to predict hourly air pollutants. With a stacking ensemble architecture, the meta-learner learns from each base model and improves the final prediction. We designed lightweight models as base members and selected a simple least squares linear regression as the meta-learner.

Materials and Methods
Air quality data from five monitoring sites in the Greater London area between 1 July 2019 and 13 December 2021 were collected using the Openair framework [3]. These sites were London Bexley (BEX), London Westminster (HORS), London N. Kensington (KC1), London Eltham (LON6), and London Marylebone Road (MY1). Seven features were selected as inputs (NOx, NO2, NO, PM2.5, modelled wind speed, wind direction, and air temperature) to predict two hourly pollutants (NO2 and PM2.5). This work used 80% of the data for the training set and 20% for the test set. All features were normalised to the range [0, 1], and missing values were filled using a multivariate imputer provided by scikit-learn.
Figure 1a shows the stacking ensemble concept. The stacking architecture consisted of two base models in Level-0 and a least squares linear regression as the meta-learner in Level-1. Figure 1b shows the proposed base models: dense layers as Base-1 and a combination of a 2-D CNN and dense layers as Base-2. The rectified linear unit (ReLU) activation function was used in all layers except the output layers. In this work, we used the TensorFlow framework to create the ML models and deployed the lite versions to a Raspberry Pi Pico microcontroller without performing quantisation.
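A sketch of this two-level architecture is given below. The layer widths and kernel size are illustrative assumptions (the exact topologies are in Figure 1b), and `fit_stack` is a hypothetical helper showing how the Level-1 least squares regression combines the Level-0 outputs.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LinearRegression

n_features = 7  # seven input features, as in the paper

# Base-1: a small fully connected network (ReLU everywhere except the output).
base1 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),  # linear output for regression
])

# Base-2: a 2-D CNN followed by dense layers; the feature vector is
# reshaped into a (7, 1, 1) "image" so Conv2D can be applied.
base2 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features, 1, 1)),
    tf.keras.layers.Conv2D(4, kernel_size=(3, 1), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])

def fit_stack(X, y):
    """Fit the Level-1 meta-learner: ordinary least squares on the
    Level-0 base-model predictions (hypothetical helper)."""
    p1 = base1.predict(X, verbose=0).ravel()
    p2 = base2.predict(X.reshape(-1, n_features, 1, 1), verbose=0).ravel()
    return LinearRegression().fit(np.column_stack([p1, p2]), y)
```

In a full pipeline the base models would first be trained on the training set, and the meta-learner fitted on their (ideally held-out) predictions.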

Discussion
As reported in Table 1, the stacked model reduced the RMSE values obtained by the individual base models. The linear approximation by the meta-learner found the best way to combine the Level-0 members' outputs by minimising the residual sum of squares between the input and target sets. Converting the ML models to lite versions reduced the model size by about 83% and 77% for Base-1 and Base-2, respectively. The deployed tinyML model sizes were 3012 bytes for Base-1 and 5076 bytes for Base-2, lite enough even without applying any quantisation techniques. The lite versions' accuracy was not degraded compared to that of the original models.
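The conversion step behind this size reduction can be sketched as below. The stand-in model here is an assumption (the paper's Base-1/Base-2 topologies are in Figure 1b); the point is that converting without quantisation keeps weights in float32, so accuracy is preserved, and the saving comes from the compact FlatBuffer serialisation.

```python
import tensorflow as tf

# Stand-in Keras model; layer sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Convert to a TensorFlow Lite FlatBuffer without quantisation.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

# The resulting byte string is what gets deployed to the microcontroller.
print(f"TFLite model size: {len(tflite_bytes)} bytes")
```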


Table 1. RMSE values of the base and stacked models obtained from the test set.
