Dimensionality Reduction for Smart IoT Sensors

Smart IoT sensors are characterized by their ability to sense and process signals, producing high-level information that is usually sent wirelessly while minimising energy consumption and maximising communication efficiency. Systems are getting smarter, meaning that they are providing ever richer information from the same raw data. This increasing intelligence can occur at various levels, including in the sensor itself, at the edge, and in the cloud. As sending one byte of data is several orders of magnitude more energy-expensive than processing it, data must be handled as near as possible to its generation. Thus, the intelligence should be located in the sensor; nevertheless, it is not always possible to do so because real data is not always available for designing the algorithms or the hardware capacity is limited. Smart devices detecting data coming from inertial sensors are a good example of this. They generate hundreds of bytes per second (100 Hz, 12-bit sampling of a triaxial accelerometer) but useful information comes out in just a few bytes per minute (number of steps, type of activity, and so forth). We propose a lossy compression method to reduce the dimensionality of raw data from accelerometers, gyroscopes, and magnetometers, while maintaining a high quality of information in the reconstructed signal coming from an embedded device. The implemented method uses an adaptive vector-quantisation algorithm that represents the input data with a limited set of codewords. The adaptive process generates a codebook that evolves to become highly specific for the input data, while providing high compression rates. The codebook's reconstruction quality is measured with a peak signal-to-noise ratio (PSNR) above 40 dB for a 12-bit representation.


Introduction
Nowadays, smart sensors are everywhere: in our mobile phones [1], wearables [2,3], embedded systems, home automation, industrial environments, and so forth. Most of the time these systems are not autonomous but rather are based on the cooperation of two or more entities. For example, a sensor gathers data that it pre-processes and sends to a mobile phone wirelessly, then the phone processes the data further and finally sends it to the cloud for deeper knowledge extraction.
Inertial sensors generate information about the movement of a person, process, or animal, thereby constituting a clear example of this data life cycle. For example, a 12-bit 3-axis accelerometer embedded in a smartwatch sampling at 20 Hz generates more than 2.5 Mb of data every hour. However, the amount of information actually sent to our smartphones is several orders of magnitude less than this amount: the number of steps, activities performed, minutes of sleep, and so on. All this information is added to the cloud to provide further insights into sleep patterns or the number of calories that should be burned. The impact on the energy consumption and communication quality of the device is evident.
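The gap between raw data and useful information is easy to check with back-of-the-envelope arithmetic; the following sketch reproduces the smartwatch figure above (the constant names are our own):

```python
# Raw data volume of a 12-bit, 3-axis accelerometer sampled at 20 Hz,
# as in the smartwatch example above.
BITS_PER_SAMPLE = 12
AXES = 3
SAMPLE_RATE_HZ = 20
SECONDS_PER_HOUR = 3600

bits_per_hour = BITS_PER_SAMPLE * AXES * SAMPLE_RATE_HZ * SECONDS_PER_HOUR
print(f"{bits_per_hour / 1e6:.2f} Mb/hour")  # about 2.59 Mb of raw data

# The useful output (step count, activity label, sleep minutes) fits in a
# few bytes per minute -- several orders of magnitude less.
```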
During this transmission process, the system expends energy according to the amount of data sent. Furthermore, sensors are not always powered by a continuous energy supply source; instead, their energy often comes from batteries that must meet other constraints, such as size, operational temperature range, and means of recharging. In any case, if the amount of data being sent is reduced, these systems can remain autonomous for a longer period of time.
Furthermore, the selection of the communication protocol also relies on the amount of data sent. As in Figure 1, the required data rate may vary between the design and running of the smart sensor because, when designing a smart sensor, the processes or algorithms to be implemented are not known a priori, and raw data must be sampled at a high frequency to avoid missing relevant information. Hence, the first phase of intelligence design involves creating a raw database that merges several data sources. This database usually contains three data sources: raw sensor data sampled as quickly as possible, metadata associated with the activities or scenarios to be detected, and additional data from third parties. When all the information has been collected, the algorithms are designed offline to best fit the application requirements. The last design phase involves deploying specific processes according to the capabilities of the hardware. This deployment considers questions such as which sensors are sampled, when, and how often; how raw data is processed; and, lastly, which algorithms should be run to produce the highest-level information possible to send out wirelessly over a constrained network. As a result, in the execution phase, the smart sensors will send out high-level information instead of raw data. This abstraction process aims to reduce the memory space needed, reduce the energy costs associated with sending data, and allow a simpler representation to reach the end user. Typically, this process is carried out with techniques, such as machine learning [4], that have been designed to classify and group low-level abstractions. Figure 2 illustrates a timeline containing the common steps that should be followed to analyse any dataset correctly. In the pre-processing phase, time windows are used to contain the signal data, and different techniques are applied to the data vector [4][5][6][7][8][9][10]. Filtering removes noise and eliminates outliers.
Statistical methods can be used to evaluate changes in the data produced by abrupt variations and involve the use of minimum, maximum, average, and median values or, in the case of motion sensors, correlations between axes. Variances and standard deviations can be used to improve representations. To explore the relationship between the variation in one axis and that in another, a cross-correlation is used. The next step, as indicated in Figure 2, is to apply a dimensionality reduction model to the data in order to compress them. The usual solutions run offline models on a computer or in the cloud that process the raw data from the sensors directly.
Typical techniques for inertial sensors are principal component analysis (PCA) [9][10][11][12], sequential forward selection (SFS) [13], random subset feature selection (RSFS) [13], independent component analysis (ICA) [12], and independent principal component analysis (I-PCA) [12].
On the other hand, it is necessary to know the techniques and metrics that are used in these applications, in this case, IoT lossy compression. There are multiple options for lossy compression [14], including traditional methods based on transforms and decompositions, such as the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and vector quantisation (VQ), as well as learning-based methods such as artificial neural networks (ANN) and deep belief networks (DBN). As evaluation metrics, a large series of study variables can be found [14], although the most representative in this field by far are the peak signal-to-noise ratio (PSNR), the compression ratio (CR), the bit rate (BR), and the mean squared error (MSE).
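These metrics are straightforward to compute; the following is a minimal sketch of MSE, PSNR, and CR in Python (the function names and signatures are our own, not taken from [14]):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error between the original and reconstructed signals."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    return float(np.mean((a - b) ** 2))

def psnr(original, reconstructed, peak):
    """Peak signal-to-noise ratio in dB; `peak` is the largest possible
    value of the representation (e.g., 2**12 - 1 for 12-bit data)."""
    return float(10.0 * np.log10(peak ** 2 / mse(original, reconstructed)))

def compression_ratio(raw_bits, compressed_bits):
    """Compression ratio: raw size over compressed size."""
    return raw_bits / compressed_bits
```

For a 12-bit signal, `peak` would be 4095, and a PSNR above 40 dB corresponds to a reconstruction MSE well below one percent of the full-scale range squared.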
We can find examples of DCT and DWT techniques applied to environmental signals [15][16][17] such as temperature, humidity, and wind speed, with compression ratios of up to one hundred and RMSE below unity. Compressed sensing [14,17] can also be used, usually based on the traditional models, while shifting the workload to the decoder side, so that encoding on the device side does not require many resources. Hybrid models also exist, in which a lossy compression stage is corrected with a lossless one to lower the error in the reconstructed signal [18], at the cost of compression ratios on the order of tens.
The main objective is to reduce energy consumption in data transmission [15][16][17][18][19][20][21], since the cost of wireless communication is usually the highest; therefore, methods are adapted to reduce the amount of data transmitted, such as SZ compression [19] to reduce the sending of medical data such as ECG or heart rate.
Vector quantization models [22][23][24][25][26], mentioned above, achieve higher compression ratios, exceeding one hundred [22], and are commonly used for image compression. Typical PSNR values for these models range from 25 to 60 dB [22,23]. As with other models, there are variations of the traditional model, such as frequency sensitive competitive learning (FSCL) [25,26], that add adaptability to the data.
The last step applies the abstraction layer to the sensor data by processing the data with classification models, such as support vector machines (SVM) [5,8,10,11], random forest [27], and multilayer perceptron (MLP) [10].
Each of these tasks can be executed in the different entities comprising a system, depending on their computing capabilities, power, memory, and so on. As a rule of thumb, the closer to the source of the data each task is executed, the more efficient the system will be. Nevertheless, it is challenging to execute dimensionality reduction and abstraction algorithms in embedded systems that usually consist of a processor and limited memory.
The objective of this work is to adapt the traditional VQ model (Figure 3), as it is among the techniques presenting the highest compression ratios. The purpose is to execute the compression of data gathered by an inertial sensor inside an embedded device and to improve its energy efficiency by lowering the number of bytes transmitted; to improve the security of communications, as data encoded with VQ results in a closed information circuit; and to analyse the compression capacity of the proposed model with inertial data instead of images.

Materials and Methods
In this work, we analyse several VQ methods for reducing the dimensionality of raw data coming from inertial measurement unit (IMU) sensors while maintaining the highest quality of information. The aim of this study is to optimize the efficiency of a sensor equipped with an accelerometer, gyroscope, and magnetometer that monitors movement at 50 Hz so that it is capable of continuously sending these data to an external entity (e.g., the cloud) for analysis.
The smart sensor (Figure 4) consists of a Pycom LoPy4 module (ESP32 dual-core microprocessor, 240 MHz, RAM: 4 MB, ROM: 8 MB) and the BNO055 inertial sensor from Bosch. The device can communicate using different protocols depending on specific requirements: BLE/Wi-Fi/LoRa/Sigfox. Considering a resolution of 16 bits at 50 Hz, sensor data are generated at about 300 bytes per second, which is a limitation in terms of the necessary energy and the data protocol used. According to Figure 1, Wi-Fi is used to send raw data in the design phase, and LoRa is used to send compressed data in the execution phase.

Data Processing Proposal
First, a public dataset must be chosen on which to apply the pre-processing and then the dimensionality-reduction or compression model. The objective is to validate the approach and quantify the errors produced for inertial sensors in different positions emitting different amounts of data. In addition, a labelled database is needed so that classifications can be made, if necessary. Then we will be able to evaluate the impact on the generalisation of the trained model with databases created by people outside the study and the chosen inertial sensors.
For this study, we analysed several public databases covering different human activities monitored using inertial sensors. Some databases involved a large number of activities [2,28], many test subjects [29], a large number of sensors [1], or a large amount of data [1,3,29]. In order to carry out an analysis of the compression model without involving a variety of activities [3] or the development of these activities [29], we chose the heterogeneity activity recognition dataset [1]. This database used inertial sensors in smartphones or smartwatches to record patterns in sitting, standing, walking, climbing stairs, descending stairs, and riding a bicycle. We analysed the repeatability and variability of the data from different devices in the database and finally chose the data from the Samsung S3 at a sampling frequency of 100 Hz. The Samsung S3 was positioned on the front of the torso with a belt.

Data Pre-Processing
For the pre-processing of the data, we followed the steps indicated in Figure 5, which involved grouping sample sequences in data windows of variable sizes to evaluate their impact, as well as calculating statistical parameters to extract information from the dataset to provide to an abstraction layer.
• Raw data subsampling. In order to obtain more input data for training purposes, a subsampling option is included in the sensor sample rate (Tsoriginal) for the database. In this proposal, we considered three sampling frequencies: the original frequency, plus half-frequency subsampling and quarter-frequency subsampling (Ts), producing two and four new signals, respectively, thus allowing us to increase the data collected.
• Window maker. There are two parameters that are used to define the window size (TW): the sensor sampling frequency (Ts) and the full sample time (Tf). To evaluate their impacts, different combinations have been tested, with the only restriction being that the window size must always be a natural number.
• Statistical data normalisation. Once the TW for the data window (xi) is defined, it is standardized by subtracting its mean and dividing this quantity by its standard deviation, resulting in the standardized data window (xi′) (see Equation (1)). The later data submission will include the mean value, the standard deviation, and the associated VQ index for (xi′). Two additional VQs can be implemented to code the mean and standard deviation values, resulting in a data submission consisting of three fully encrypted indexes. However, for simplicity, in this work, only the standardized data vector is compressed.
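The statistical normalisation step can be sketched in a few lines of Python (a simplified illustration with our own naming; it assumes a non-constant window so the standard deviation is non-zero):

```python
import numpy as np

def standardize_window(window):
    """Standardize a data window as in Equation (1): subtract the mean and
    divide by the standard deviation. The mean and standard deviation are
    returned as well, since they are sent along with the VQ index so the
    receiver can undo the normalisation.
    Assumes the window is not constant (standard deviation > 0)."""
    x = np.asarray(window, dtype=float)
    mu = float(x.mean())
    sigma = float(x.std())
    return (x - mu) / sigma, mu, sigma
```

The receiver reconstructs the original scale with `x = x_std * sigma + mu` after decoding the VQ index.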
Once all the tasks have been executed, the normalized data window, along with its average and standard deviation, will be available. The compressor model input will be the normalized window, which defines the expected input size. However, if the window calculation parameters are modified, it will be necessary to retrain the compressor model.

Dimensionality Reduction
The dimensionality reduction models analysed in this work are of the VQ type, a lossy compression technique that uses a competitive decision algorithm. To represent an input, the VQ selects a value from a list of indexes corresponding to the N-centroid table (codebook). Each input is a point in an n-dimensional space, where the set of N centroids is also located. The VQ divides this space by optimising the arrangement of the regions associated with each centroid, which are called Voronoi regions (Figure 6).
The usual VQ techniques, such as K-means, distribute the centroids by replicating the probability density of the occurrence of examples in that space. In this work, VQs based on MSCL (magnitude sensitive competitive learning) [26] will be applied, meaning that the arrangement of the centroids distributes equally among them a certain magnitude defined by a function local to the centroids.
In order to compare several compression objectives, three magnitude definitions will be analysed: fixed value, number of activations, and quantification error. When selecting a magnitude function with a constant value of one, the VQ generated is equivalent to a classical K-nearest neighbour (K-NN). When the magnitude counts the number of activations that a centroid has along its training cycles, the generated VQ is equivalent to the frequency sensitive competitive learning (FSCL) model [25], where the generated codebook optimizes the Shannon information entropy of the coding (in the optimal coding, the codes are used with uniform probability). For a magnitude that accounts for the mean quantification error (Qerror) of each centroid with the data it captures in its Voronoi region, this VQ, with error as magnitude, will be referred to as error sensitive competitive learning (ESCL).
The steps of the VQ training algorithm are shown in Algorithm 1. During training, the VQ recalculates the positions of the centroids by presenting the data set cyclically. Training stops when a certain error level for the obtained coding is met. In each cycle, each training example is assigned to one centroid based on a two-step competition. First, the BMU (best matching unit) and NMU (next matching unit) calculations are accomplished, obtaining the two centroids with the smallest Euclidean distances from the input data. Secondly, the LBMU (local best matching unit) calculation is performed to select the definitive winner among the two centroids, i.e., the centroid with the smallest product of its Euclidean distance with its local magnitude. Once all the training examples have been assigned to the centroids, we proceed to the update phase, where new centroid positions are assigned. Next, if the local magnitude is not a fixed value, the algorithm selects the BMU for each training sample again using the updated codebook, and accumulates codebook-centroid wins and errors for updating their new magnitudes.
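The training loop described in this section can be sketched as follows. This is our own simplified illustration of the competition and update phases, not the authors' reference implementation; the batch-style centroid update and the exact magnitude heuristics are assumptions made for brevity:

```python
import numpy as np

def train_mscl_vq(data, n_centroids, n_cycles=50, magnitude="fixed", rng=None):
    """Sketch of an MSCL-style VQ training loop.
    magnitude: 'fixed' (constant magnitude, plain VQ/K-means-like),
    'fscl' (activation counts) or 'escl' (mean quantification error)."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    # Initialise the codebook from randomly chosen training examples.
    codebook = data[rng.choice(len(data), n_centroids, replace=False)].copy()
    mag = np.ones(n_centroids)

    for _ in range(n_cycles):
        # Assignment phase: two-step competition per training example.
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        two_best = np.argsort(dists, axis=1)[:, :2]          # BMU and NMU
        # LBMU: among the two candidates, the smallest distance x magnitude.
        local = dists[np.arange(len(data))[:, None], two_best] * mag[two_best]
        winners = two_best[np.arange(len(data)), np.argmin(local, axis=1)]

        # Update phase: move each centroid to the mean of its examples.
        for k in range(n_centroids):
            assigned = data[winners == k]
            if len(assigned):
                codebook[k] = assigned.mean(axis=0)

        # Magnitude update (skipped when the magnitude is a fixed value):
        # re-select the BMU with the updated codebook and accumulate
        # wins and errors per centroid.
        if magnitude != "fixed":
            dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            bmu = np.argmin(dists, axis=1)
            wins = np.bincount(bmu, minlength=n_centroids)
            if magnitude == "fscl":           # activation counts
                mag = wins.astype(float) + 1.0
            else:                             # 'escl': mean error per centroid
                err = np.bincount(bmu, weights=dists[np.arange(len(data)), bmu],
                                  minlength=n_centroids)
                mag = err / np.maximum(wins, 1) + 1e-12
    return codebook
```

Multiplying the distance by the local magnitude penalises centroids that already activate often (FSCL) or already carry a large mean error (ESCL), pushing the codebook toward uniform activations or uniform error, as described above.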
To choose the appropriate function for updating the local magnitude, we must consider the different behaviours each one induces. The ESCL forces the VQ to form Voronoi regions with the same average error, so the centroids tend to present uniform quantification errors. The FSCL adjusts the Voronoi regions to obtain the same activation frequency, so the centroids are used with uniform probability. A unity magnitude adjusts the Voronoi regions to present the same data density, so the centroids tend to present uniform densities.
After training, the VQ will assign the winning centroid to an input (based on the Euclidean distance), so only the BMU calculation is applied. The output of the system will be the numerical index referring to that centroid.
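At run time, encoding and decoding therefore reduce to a nearest-neighbour search and a table lookup. A minimal sketch, assuming a trained `codebook` array:

```python
import numpy as np

def encode(codebook, x):
    """Return the index of the winning centroid (BMU) for sample x."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def decode(codebook, index):
    """Reconstruct the sample by looking up the centroid's weights."""
    return codebook[index]
```

Only the index is transmitted; the receiver holds a copy of the codebook and performs the lookup.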

Analysis of Method Performances
The performance evaluation of the compressor model is broken down into two phases; for each of them, a series of objectives and procedures is defined in order to verify the compression quality and the capacity of the running system. Python 3 with the PyCharm IDE was used for the entire pre-processing and dimensionality-reduction application. The smart sensor was programmed with MicroPython; thus, a large part of the generated code was reusable, except for the adaptation of certain libraries, including numpy.

Offline Training and Run Validation
Once the codebook has been trained, the coding process calculates the Euclidean distance of each centroid from the data sample vector, and the closest centroid is determined. The winning centroid index is transmitted as NeuronWin. Figure 7 shows the flow chart listing the steps taken in offline validation. The decoding step consists of extracting the weights of the winning neuron from the centroid-weight lookup table as the data is reconstructed.

Offline Training and Online Run Validation
When verifying the model in the smart sensor itself, the aim is to: (i) evaluate the feasibility of implementing the compression system when the smart sensor is operating in real time, and (ii) measure the model generalisation of a compression designed offline with one set of data using different data.
The encoding process follows the same procedure as in the previous case, using raw data read directly from the sensor. Figure 8 describes the steps used to evaluate the adaptive model within the microcontroller and compare the error in the remote device. The decoding step consists of extracting the weights of the winning neuron from the centroid-weight lookup table as the data is reconstructed.

The pre-processing and dimensionality-reduction models were incorporated into the microcontroller to encode the data from the inertial sensor. The size of the compressor model was designed with the microcontroller's memory limitations in mind to allow for real-time implementation and operation. A Wi-Fi communication link was established between the device and the computer, allowing the computer to receive the raw data window, the coded data, and the statistical parameters. Once in the computer, we used the coded data to reconstruct the signal (recovering the weights of the winning centroid used as a replacement for the original data) and to evaluate the error committed in the compression.
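The payload sent over the Wi-Fi link can be illustrated with a small framing helper. The packet layout below (a one-byte codebook index, optionally followed by the raw window packed as little-endian int16 values for error evaluation) is an assumption for illustration, not the paper's actual protocol:

```python
import struct

def frame_coded_window(neuron_win, raw_window=()):
    """Build a packet for the device-to-computer link.

    neuron_win : winning codebook index (0-255, fits in one byte)
    raw_window : optional raw samples shipped alongside for error evaluation
    """
    head = struct.pack("B", neuron_win)                      # 1-byte index
    body = struct.pack(f"<{len(raw_window)}h", *raw_window)  # raw int16 samples
    return head + body
```

In normal operation only the one-byte head would be sent; the raw window is a debug path used to measure the compression error on the computer side.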

Offline Training and Run Validation
We evaluated different architectures by varying the sampling frequency (F), the size of the samples (T), and the number of centroids in the codebook (N). Of the raw data in the database, 1,961,400 samples were reserved for training and 600,100 samples were used for testing. The model accuracy was measured with the mean squared error (MSE) when reconstructing the input signal, and the peak signal-to-noise ratio (PSNR) measured the quality of that reconstruction. We also considered the size of the look-up table (LUT-Size) and the signal compression ratio (bits of the original signal vs. bits required for the codebook index, i.e., the compressed signal). Both parameters have a direct relationship with F, T, and N, and they must comply with the constraints imposed by the embedded system in terms of available memory and communication protocol. Table 1 lists different combinations of the temporal parameters F and T for a fixed codebook size (N = 512), which allows us to visualize the compression level achieved and the required memory. In Table 2, the temporal parameters are fixed (F = 100 Hz and T = 1 s) and different codebook sizes N are tested to show the effect on the MSE. Figures 9-11 show the PSNR distributions, as well as the mean, minimum, and maximum values, for the cases considered in Table 2 while evaluating the three different target functions (ESCL, FSCL, and unity).
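The figures of merit used above can be computed as follows. This is a sketch: the 12-bit peak value follows from the sampling resolution quoted earlier, while the channel count and the byte-aligned index size are illustrative assumptions:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error of the reconstructed signal."""
    return float(np.mean((np.asarray(original) - np.asarray(reconstructed)) ** 2))

def psnr(original, reconstructed, peak=2**12 - 1):
    """PSNR in dB, assuming a 12-bit full-scale signal (peak = 4095)."""
    return 10.0 * np.log10(peak**2 / mse(original, reconstructed))

def compression_ratio(F, T, channels, bits_per_sample=12, N=512):
    """Bits in the original window vs. bits to index a codebook of N entries.

    The index is assumed byte-aligned: one byte for N <= 256, two bytes above.
    """
    original_bits = F * T * channels * bits_per_sample
    index_bits = 8 if N <= 256 else 16
    return original_bits / index_bits
```

For example, a 1 s window at 100 Hz over 3 channels encoded with a 512-entry codebook needs 16 bits instead of 3600, a ratio of 225 under these assumptions.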

Offline Training and Online Run Validation
We have evaluated the suitability of the method using the models with N = 128 and N = 256 due to the memory constraints of the embedded system. Table 3 shows the PSNRs achieved by the embedded system for F = 50 Hz-100 Hz and N = 128-256. Figures 12 and 13 show the PSNR distributions and mean, minimum, and maximum values obtained from evaluating the three different target functions (TF = ESCL, FSCL, unity). Table 4 shows the PSNR achieved by the embedded system while using testing data instead of training data, which measures the generalisation of the model.

The PSNR for a PCA-based model with up to 24 components has been calculated in order to compare it with the proposed VQ model (the number of components was set to reach 90% of the variance). Figure 14 shows how the average PSNR increases with the number of components, but only slowly beyond 6 components, requiring at least 19 components to reach a mean value of 40 dB, with each PCA component stored as a 2-byte float16 value. This means that to obtain an average PSNR between 40 and 50 dB, we would have to use more than 19 components, equivalent to 38 bytes, whereas VQ achieves similar or better PSNR figures while sending only one byte, leading to significantly less data being transmitted.

Discussion
The compression rate is defined as the ratio between the size of the original signal and that of the compressed signal. In VQ compression methods, it is established a priori by the model parameters F and T, which define the window size TW, and by the number of bytes required to index a look-up table of size N. Thus, we can set the desired compression rate while considering the cost of the memory required to store the look-up table. Table 1 shows how the compression rate increases as the sampling frequency and sample size increase, i.e., as the LUT-Size does. Table 2 shows that the compression rate decreases when the number of neurons and the size of the look-up table increase, as we use two bytes to index codebooks with N > 256. Compression rate figures of 600 can be achieved in the smallest models, with codebook sizes of 76.8 kB and 153.8 kB.
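The memory cost of the look-up table discussed here scales with both the codebook size and the window length. A sketch, assuming 2-byte storage per codeword weight (the storage width is an assumption for illustration):

```python
def lut_size_bytes(F, T, channels, N, bytes_per_weight=2):
    """Look-up table memory: N codewords of F*T*channels weights each."""
    return N * F * T * channels * bytes_per_weight
```

Under these assumptions, halving N halves the table size, while doubling either F or T doubles both the codeword length and the table size, which is the trade-off between compression rate and memory discussed above.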
The MSE is related to the number of samples a single neuron in the codebook must represent. Table 1 shows that increasing the window size increases the MSE for a fixed number of neurons, while we can see in Table 2 how the MSE decreases as the number of neurons increases for a fixed window size. This same pattern can also be seen in Figures 9-11, where the PSNR increases as the number of neurons N increases.
Looking at the PSNR distributions in Figures 9-11, we can identify several regions of interest in these violin plots, namely, three regions between 80 and 40 dB, and a fourth region in the highest part of the chart, above 120 dB, which appears only in models with the highest numbers of neurons. This region corresponds to a nearly perfect reconstruction where the centroid weights have hardly any noise, indicating overfitting, where single centroids dedicated to coding individual samples appear. To avoid this situation, codebook sizes must be at least 5-10 times smaller than the number of samples to learn, in order to force the centroids to learn different patterns. Next, an area between 80 and 60 dB corresponds to small Voronoi regions where the reconstructed signal shows low noise; it is more evident in models with higher numbers of neurons. Between 60 and 50 dB, we can observe a neck belonging to the intermediate-sized Voronoi regions, in which certain signals are close to the centroid but others lie on the periphery of the Voronoi region, giving rise to lower PSNR values. Lastly, in the 50-40 dB area, the Voronoi regions are large and, as they comprise widely separated input data values, they produce noisier reconstructions.
In Figure 9, an extra simulation with 16,384 neurons is added, in which we can see a greater growth in the neurons capable of nearly replicating the pattern exactly, as well as an increase in the average. This situation is due to the fact that the number of neurons is very large and the centroids have few examples to train, so they adjust to replicate the input values.
We can see a large increase in the mean from N = 4096 to N = 8192, as well as the appearance of overfitting in the 120 dB zone of the violin plot. As the objective is to achieve high compression, we will focus on the range from N = 128 to N = 1024, in which the evolution of the model is clear. The evolution of the 40-60 dB and 60-80 dB areas shows how, as we add more neurons, the data moves towards areas with higher PSNR values, increasing the density in the area around 70 dB. With this model, we achieve a good balance, obtaining PSNR values greater than 50 dB while applying compression factors greater than 300.
In Figures 12 and 13, two models are selected to analyse the impact of the target function used. Looking at Figure 12, the ESCL model gives the best response: its average is higher than those of the other two, and its 50-70 dB zone is wider, providing a better compression result. Figure 13 shows a similar result but with a less pronounced difference, as the number of centroids is doubled. In both model parametrisations, the FSCL and unity magnitudes tend to behave similarly. This effect tends to occur when the distribution of data points is almost uniform, making the density and the frequency of activation essentially equivalent.
The implementation of the compression algorithm on the microcontroller shows the importance of choosing a codebook size that respects the limits of the embedded system: models with N = 128 were required to get the system working properly.
In Table 3, we verify the correct implementation of the algorithm in the embedded device, allowing the signal to compress in the same way that it does in the offline process. For this implementation, models such as those that appear in Table 1 have been evaluated, achieving similar results. The generalisation of the model evaluation results in Table 4 shows how the PSNR has decreased compared to the values obtained in Table 3, as expected. However, the PSNR remains at values above 40 dB.

Conclusions
We are surrounded by sensors that collect data from the environment continuously and send them to the Internet. Reductions in their data rates are necessary in order to enhance their energy usage, improve data security, and minimize the sensor data traffic generated by billions of IoT devices.
In our implementation of VQ models, we define an adaptive codebook that evolves to become highly specific to the input data. It has been designed to strike a balance between complexity (allowing implementation in embedded devices) and error minimisation. We found that the ESCL model was the best VQ algorithm, as it was able to provide compression factors between 300 and 600 with PSNR values varying from 40 dB to 70 dB.
This work provides evidence that VQ methods are able to reduce the dimensionality of raw data within an embedded device, with the memory required for their adaptive look-up tables being their main limitation. The use of additional external flash memory could allow the use of larger look-up tables and improve performance. An additional benefit of VQ encoding is that data encryption is possible in situations where the data to be transmitted could be intercepted [30]. At very low energy costs, VQ encoding makes it nearly impossible to decode the data sent in the absence of a model.