Microbiological Quality Estimation of Meat Using Deep CNNs on Embedded Hardware Systems

Spectroscopic sensor imaging of food samples, processed by deep machine learning models, can be used to assess sample quality. This article presents an architecture for estimating microbial populations in meat samples using multispectral imaging and deep convolutional neural networks. The deep learning models operate on embedded platforms rather than offline on a separate computer or a cloud server. Different storage conditions of the meat samples were used, and various deep learning models and embedded platforms were evaluated. In addition, the hardware boards were evaluated in terms of latency, throughput, efficiency and value on different data pre-processing and imaging-type setups. The experimental results showed the advantage of the XavierNX platform in terms of latency and throughput, and the advantage of the Nano and RP4 in terms of efficiency and value, respectively.


Introduction
Ensuring the quality and safety of food products is expected by consumers, especially for highly perishable (e.g., freshly ground meat) food commodities. Food products' quality and safety control must be accurate and efficient to meet consumers' increasing expectations and standards, which in turn results in more demanding and labour-intensive processes [1]. In addition, the increasing demand for food due to the increase in the world population, especially in some developing countries such as China and India [2], has made it even more challenging to ensure food quality and safety control processes.
Various food analysis methods, inspections and audits take place to evaluate the quality and safety of raw and processed materials and the final product, and are considered the control measures [3,4]. In the case of monitoring microbiological food safety and quality, microbiological analysis (e.g., colony counting methods), chemical analysis, or molecular techniques are performed [5,6]. However, these methods are time-consuming, provide retrospective results, are expensive or depend on high-tech infrastructure and require specialized staff [7,8], which in turn allows microbiological inspection only of a small sample of the food products available on the market [4].
The progress over the last decades in sensors technology and Artificial Intelligence, particularly in machine learning and computer vision, has enabled the development of solutions for automatic quality and authenticity assessment of food products, like meat and fish [3,9], dairy products [10], food powders [11] or oil [12]. Such technological solutions allow the quick and low-cost assessment of the quality of food products without the need

System Architecture
In this section, we describe the proposed system for estimating the microbial population of food, which consists of three phases: data acquisition, offline AI training, and online operation. The modular architecture of the system allows it to be flexible. It can be applied to different types of food and expanded upon with alternative algorithmic techniques for data pre-processing on different embedded boards.
The first phase, data acquisition, involves collecting multispectral imaging (MSI) data from the food samples. The MSI data is then used in the offline AI training phase. Next, an AI model is trained to estimate the microbial population levels of the food samples using a convolutional neural network for regression (numerical estimation of the microbial population). Finally, the trained model is deployed on the edge device using an embedded hardware board in the online operation phase. It processes the MSI food images in real-time to provide microbial population estimates. Figure 1 illustrates the block diagram of the

Acquisition of Food Imaging Data and Estimation of Microbial Population
Two datasets were formed in terms of packaging: minced pork samples that were stored aerobically (AIR) and samples that were under modified atmosphere packaging (MAP). The gas composition was 80% O2 and 20% CO2. Each dataset contained four experimental replicates (R1-R4). The samples were stored at different temperatures, from 4 °C to 12 °C, and subjected to lab-based microbiological analysis. The MSI data were collected from fresh to spoiled states, covering as much as possible a representative number of samples throughout storage. The microbial population (i.e., the total viable counts, TVC) was measured using the plate count method. In parallel to the acquisition of TVC, multispectral images of the respective minced pork samples (ca. 70 g) were acquired using the VideometerLab system [22] to form a suitable dataset for the training of the machine learning regression models. Each minced meat sample was placed in a Petri dish, and the latter was placed inside an Ulbricht sphere, in which the camera was top-mounted, and the corresponding multispectral image of the product's surface was taken. The MSI images had a resolution of 1200 × 1200 pixels and 18 different wavelengths, non-uniformly distributed ranging from 405 to 970 nm (i.e., 405, 430, 450, 470, 505, 565, 590, 630, 645, 660, 850, 870, 890, 910, 920, 940, 950 and 970 nm). A more detailed description of the data acquisition and storage conditions is available in [3].
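For concreteness, the wavelength-to-channel layout of such an MSI cube can be sketched in NumPy; the array below is a random stand-in, not real VideometerLab data:

```python
import numpy as np

# The 18 VideometerLab wavelengths (nm) listed above, in acquisition order.
WAVELENGTHS = [405, 430, 450, 470, 505, 565, 590, 630, 645, 660,
               850, 870, 890, 910, 920, 940, 950, 970]

def band_index(wavelength_nm):
    """Channel index of a given wavelength within the MSI cube."""
    return WAVELENGTHS.index(wavelength_nm)

# One MSI sample is a 1200 x 1200 x 18 cube; a random stand-in is used here.
msi = np.random.rand(1200, 1200, 18).astype(np.float32)

# Pseudo-RGB view from the bands used later in the pipeline
# (645 nm -> Red, 505 nm -> Green, 470 nm -> Blue).
rgb = msi[:, :, [band_index(645), band_index(505), band_index(470)]]
```

Indexing channels by wavelength rather than position keeps band selection robust if the acquisition order ever changes.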



Offline Training Phase
The acquired MSI meat images were pre-processed, as shown in Figure 2. The pre-processing included image resizing to 224 × 224 pixels resolution to reduce the CNN model complexity and, thus, the training time of the CNN regression models. Training on the original MSI image resolution (1200 × 1200 × 18 pixels) would result in running out of memory. Additionally, resizing prevents increased inference times on edge devices, which typically have limited processing and memory resources. After resizing, the data sample is optionally converted to an RGB image (224 × 224 × 3 pixels) by concatenating specific wavelengths into a group of three channels (i.e., 645 nm for Red, 505 nm for Green, and 470 nm for Blue) to further reduce the number of input data channels (6× less), or it remains in the MSI format (224 × 224 × 18 pixels). Next, the RGB or the MSI image is optionally masked to remove the background, Petri dish and fat present in the meat images. The masking is performed by image segmentation using the k-means clustering algorithm applied to the RGB image. The resulting mask is used to segment the meat part of either the MSI or the RGB image, with the values of the pixels of the non-meat part of the images being set equal to 0 (black colour). As a final pre-processing step, pixel value normalization ([0, 1] values range) is applied, resulting in 224 × 224 × C images with C = 18 for MSI and C = 3 for RGB images.
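A minimal NumPy sketch of this pipeline follows, assuming nearest-neighbour resizing and min-max normalization (the exact resizing and normalization routines used in the study are not specified, so these are illustrative choices):

```python
import numpy as np

RGB_BANDS = (8, 4, 3)  # channel indices of 645, 505 and 470 nm in the 18-band cube

def nn_resize(img, size=224):
    """Nearest-neighbour resize of an (H, W, C) cube; a stand-in for the
    library resize routine used in the actual pipeline."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def preprocess(msi, to_rgb=False, mask=None):
    """Resize -> optional RGB conversion -> optional masking -> [0, 1] scaling."""
    x = nn_resize(msi, 224)
    if to_rgb:
        x = x[:, :, list(RGB_BANDS)]
    if mask is not None:                 # mask: (224, 224) bool, True = meat
        x = x * mask[:, :, None]         # non-meat pixels set to 0 (black)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + 1e-12)  # normalize pixel values to [0, 1]

sample = np.random.rand(1200, 1200, 18)
out_msi = preprocess(sample)               # (224, 224, 18) MSI tensor
out_rgb = preprocess(sample, to_rgb=True)  # (224, 224, 3) pseudo-RGB tensor
```

Resizing first means the optional masking and normalization operate on roughly 29× fewer pixels than the raw 1200 × 1200 cube.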
Various machine learning regression models were evaluated to find the best-suited and most accurate deep learning architecture for predicting microbial populations. The pre-processed meat images, MSI or RGB, were used as input (X) at the training phase of the CNN regression models. At the same time, the ground truth labels (TVC values) were calculated from the microbiological analysis. During the offline training phase, various CNN models were trained for hyperparameter optimisation and evaluated using the image pre-processing steps described above.



Online Operation Phase
Regarding the online operation phase, the same pre-processing steps were used as in the offline phase, as shown previously in Figure 2. However, during the online operation phase, the most accurate CNN regression models were deployed on various edge devices (embedded systems) with distinct architectures. For each edge device, the regression models were optimized (quantized) for the target hardware to maximise the performance. By doing this, the model size is vastly reduced, which in turn requires fewer processing capabilities, benefiting the embedded devices' memory requirements and compute constraints.
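The accuracy cost of such post-training quantization can be illustrated with a toy affine INT8 scheme in NumPy. This is a conceptual sketch only, not the vendor toolchains (TensorRT, Vitis-AI, TFLITE/OpenVINO) actually used for deployment:

```python
import numpy as np

def quantize_int8(w):
    """Affine post-training quantization of a tensor to INT8, returning the
    integer tensor together with its (scale, zero_point) parameters."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = float(np.round(-128.0 - lo / scale))
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 18, 7, 7).astype(np.float32)  # a conv layer's weights
q, scale, zp = quantize_int8(w)
max_err = float(np.abs(dequantize(q, scale, zp) - w).max())
# INT8 storage is 4x smaller than FP32, at the cost of a bounded rounding error.
```

Each weight moves at most one quantization step, which is why model size shrinks 4× while accuracy typically degrades only slightly.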

Experimental Setup
The experimental setup section includes a detailed description of the datasets used to train and test the CNN regression models, as well as descriptions of the models used for image segmentation and regression for microbial population estimation. It also includes a description of the edge devices benchmarked for the online operation phase and their key features.

Evaluation Datasets
The dataset containing multispectral images of raw minced pork was pre-processed into MSI and RGB image types, each having a set with and without masking. Each CNN model was trained on these categories using a 4-fold cross-validation experimental protocol to avoid overlapping between train and test subsets. The distribution of the AIR (424 samples) and MAP (423 samples) subsets divided into 4-fold training is tabulated in Tables 1 and 2, respectively.

K-Means Masking
To segment the minced meat images, i.e., to remove redundant information from the image samples such as the background, the petri dish and the fat, the k-means clustering algorithm was used. By fitting a k-means model on an RGB image sample, the undesirable areas of the images were removed, with an additional threshold operation on the pixels. An example of the effect of masking is shown in Figure 3. The k-means model parameters were empirically optimised, and the k-means model weights were stored for reuse during the online, operational phase.
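The masking step can be sketched with a minimal two-cluster Lloyd's k-means in NumPy. The deterministic initialisation, cluster count and brightness-based cluster selection below are illustrative assumptions, not the study's exact configuration:

```python
import numpy as np

def kmeans2(pixels, iters=10):
    """Minimal 2-cluster Lloyd's k-means over (N, 3) RGB pixels,
    deterministically initialised at the darkest and brightest pixels."""
    order = np.argsort(pixels.sum(axis=1))
    centers = pixels[[order[0], order[-1]]].astype(float)
    for _ in range(iters):
        dists = ((pixels[:, None, :] - centers) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):          # guard against empty clusters
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

# Synthetic stand-in image: dark background with a brighter "meat" patch.
img = np.zeros((64, 64, 3))
img[16:48, 16:48] = 0.8
labels, centers = kmeans2(img.reshape(-1, 3))
meat_cluster = centers.sum(axis=1).argmax()      # assume meat = brighter cluster
mask = (labels == meat_cluster).reshape(64, 64)  # True where meat pixels are
```

Once fitted, such cluster centres can be serialized and reused at the online phase, matching the stored-weights approach described above.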


CNN Regression Models
For the estimation of the microbiological quality of meat samples, 2D CNN regression models were used. Various well-known and widely used CNN architectures were tested, namely MobileNet [23], DenseNet [24], EfficientNet [25], VGG16 [26], and ResNet [27]. ResNet-18 and ResNet-34 achieved significantly higher performance than the other four CNN architectures. Thus, in the remainder of this article, we consider only ResNet-based evaluations. The inputs of the ResNet-18 and ResNet-34 models were adjusted according to the input data (MSI or RGB). Finally, the output was set for regression, with detailed layer information for both architectures shown in Table 3.
* C is the number of channels; C = 3 for RGB images; C = 18 for MSI images.

Embedded Systems
Seven embedded systems were evaluated for this application: a Raspberry Pi 4 (RP4) with 4 GB and 8 GB of RAM, an Intel Neural Compute Stick 2 (NCS2), an NXP i.MX 8M Plus (IMX8P), an NVIDIA Jetson Nano (Nano), an NVIDIA Jetson Xavier NX (XavierNX), and the AMD-Xilinx FPGA boards Ultra96-V1 (Ultra96) and Kria KV260 (KV260). Each platform is unique regarding its underlying technology, external memory bandwidth and density, type of AI acceleration, power consumption and cost. Table 4 presents the list of embedded systems used in the present evaluation with their core specifications, with more details on each setup explained below. The Intel NCS2 is a vision processing unit (VPU) accelerator with 16 low-power 128-bit-wide vector processing units (a.k.a. SHAVE cores) running at 700 MHz. It comes in the form of a USB stick, so it requires a host controller; an RP4 fitted with 4 GB LPDDR4 running a 32-bit OS (Buster) was used as the host. The CNN models used on the NCS2 were quantized (FP16) and inferred using the OpenVINO v2022.2 runtime engine.

3. IMX8P (NXP i.MX 8M Plus)
The NXP i.MX 8M Plus (IMX8P) includes a quad-core ARM Cortex-A53 running at 1.8 GHz, an ARM Cortex-M7, a HiFi4 DSP running at 800 MHz and, most importantly, a Neural Processing Unit (NPU). The NPU includes several hardware features, such as a 128-bit vector engine and tensor processing cores capable of accelerating INT8 models. Any models of unsupported data types (e.g., FP16 and DINT8) have their inference fall back to the CPU. The TFLITE v2.9.1 runtime engine was used, which meant the previous TFLITE quantized models could be reused.

4. Nano (NVIDIA Jetson Nano)
The NVIDIA Jetson Nano (Nano) includes an embedded GPU with 128 CUDA cores, a quad-core ARM Cortex-A57 64-bit CPU and 4 GB LPDDR4. Of the two supported power modes, we used MAXN (10 W), where the 4× CPU cores run at 1.48 GHz and the GPU at 921.6 MHz. Running JetPack v4.6.1, the CNN models were quantized (FP16) and executed using the TensorRT (TRT) runtime engine.

5. XavierNX (NVIDIA Jetson Xavier NX)
The NVIDIA Jetson Xavier NX (XavierNX) belongs to a more powerful family than the Nano, as it includes more GPU cores, a more powerful CPU, and higher-density, higher-speed LPDDR4. Its GPU comprises 384 cores and 48 Tensor Cores, while its CPU is a 64-bit 6-core NVIDIA Carmel ARMv8.2. Of the various power modes, we used power mode 1 (15 W, 4 cores), where the 4× CPU cores ran at 1.4 GHz and the GPU at 1.1 GHz. Running JetPack v5.0.2, the CNN models were quantized (FP16/INT8) and executed using the TensorRT (TRT) runtime engine.

6. Ultra96 (Avnet Ultra96-V1)
The Avnet Ultra96-V1 (Ultra96) is an AMD-Xilinx FPGA board fitted with a ZU3EG variant, capable of accelerating AI models using a soft Deep Learning Processor Unit (DPU) in the Programmable Logic (PL). The DPU architecture is configurable with various parallelism and performance settings at the expense of PL resources. The Ultra96 was configured with the B1600 variant of the DPUCZDX8G running at 300 MHz. The models were quantized (INT8) using Vitis-AI v2.5 and inferred with the VART runtime engine.

7. KV260 (Xilinx Kria KV260 Starter Kit)
The Xilinx Kria KV260 is a System-on-Module with a carrier card, containing an FPGA with a higher resource count than the Ultra96 and aimed at vision AI applications. Similarly to the Ultra96 setup, a DPU was implemented; the main difference was that it was configured with the more capable B4096 variant running at 300 MHz.

Experimental Results
The architecture presented in Section 2 was evaluated according to the experimental setup presented in Section 3. In Section 4.1, the performance metrics used in the training phase are outlined, the microbial population estimation results using three different CNN models on the minced pork dataset are presented, and the quantization loss results for the target edge devices are explored. Finally, in Section 4.2, the metrics used to evaluate the online (edge device) microbial population estimation are outlined, and the results obtained from benchmarking each edge device on the proposed architecture, as illustrated in Figure 1, are presented.

Accuracy Metrics
The metrics used to evaluate the performance of the CNN regression models are the Root Mean Square Error (RMSE), the Pearson Correlation Coefficient (r), the Mean Absolute Error (MAE), and the Residual Prediction Deviation (RPD), which have also been used as the performance metrics in [3,35-38]. The equations of the metrics are described below:

$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \tilde{y}_n)^2}$$

$$r = \frac{\sum_{n=1}^{N}(y_n - \bar{y})(\tilde{y}_n - \bar{\tilde{y}})}{\sqrt{\sum_{n=1}^{N}(y_n - \bar{y})^2}\sqrt{\sum_{n=1}^{N}(\tilde{y}_n - \bar{\tilde{y}})^2}}$$

$$MAE = \frac{1}{N}\sum_{n=1}^{N}\left|y_n - \tilde{y}_n\right|$$

$$RPD = \frac{\sigma_{\tilde{y}}}{RMSE}$$

where $y_n$ is the real TVC value of the n-th meat sample as calculated from the microbiological analysis, $\tilde{y}_n$ is the TVC value estimated by the CNN regression model, $\bar{y}$ is the average real TVC value, $\bar{\tilde{y}}$ is the average estimated TVC value, and $\sigma_{\tilde{y}}$ is the standard deviation of the estimated TVC values.
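These four metrics can be computed directly in NumPy; note that, per the definitions above, RPD here uses the standard deviation of the estimated TVC values:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, Pearson r, MAE and RPD for TVC regression outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    rpd = float(np.std(y_pred)) / rmse
    return {"RMSE": rmse, "MAE": mae, "r": r, "RPD": rpd}

# Toy TVC values (log CFU/g) to exercise the metrics.
m = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```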

CNN-Based Microbial Population Estimation
The training was performed using k-fold cross-validation, with k = 4, due to the number of available replicates for each AIR and MAP dataset. The type of data that the regression models were trained with were MSI (224 × 224 × 18) or RGB (224 × 224 × 3) images, with two different pre-processing types, i.e., with masking and without masking. The training was implemented with the Root Mean Squared Propagation (RMSprop) optimizer and Mean Squared Error (MSE) as the loss function.
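The replicate-based 4-fold split can be sketched as follows; the sample identifiers are hypothetical placeholders:

```python
# Replicate-based 4-fold cross-validation: each fold holds out one complete
# experimental replicate (R1-R4), so samples from the same batch never end up
# in both the train and test subsets.
replicates = {"R1": ["s1", "s2"], "R2": ["s3", "s4"],
              "R3": ["s5", "s6"], "R4": ["s7", "s8"]}

folds = []
for held_out in replicates:
    test_set = replicates[held_out]
    train_set = [s for rep, samples in replicates.items()
                 if rep != held_out for s in samples]
    folds.append((train_set, test_set))
```

Splitting by replicate rather than by random shuffling is what prevents leakage between the train and test subsets.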
Tables 5 and 6 present the averaged 4-fold cross-validation results for the AIR and MAP data. The results indicate that most CNN models have comparable performance regarding r, RMSE, and MAE metrics. However, ResNet-18 on MSI data with masking achieved the highest RPD metric. Notably, better results were observed in the AIR data than the MAP, with masking improving the RPD results on average by approximately 2.5% for AIR and 4.0% for MAP data. Some reasons for the better performance of the model based on AIR data compared to MAP data may be the different batches (e.g., initial microbial population) and the dominance of the different microbial groups due to different packaging conditions. It needs to be stressed that there is an assembly of quality characteristics of the samples contributing to the development of models. For example, as expected, the obtained better performance of the models based on aerobic samples could not be attributed to the colour since this was maintained better in the samples stored under MAP due to high oxygen presence.
Although ResNet-34 has twice as many parameters as ResNet-18, slight overfitting was observed, indicating that the model may have been too complex for the given task. Furthermore, for the RGB data, despite the input data being six times smaller than MSI (3 channels instead of 18), the RMSE, MAE, and r values were near the MSI-based CNN model results. However, the RPD was much lower for the RGB image data. The region from 405 nm (VIS) to 970 nm (NIR) is associated with protein, fat, and moisture [39]. Therefore, it is more informative for the 'description' of meat deterioration compared to changes only of colour (RGB models). Hyperspectral imaging (HSI) and multispectral imaging have been used for the prediction of freshness, quality, and safety parameters, with the region 400-1000 nm being the most utilized in animal-origin foods [40,41]. In the case of HSI, feature selection methods are applied to select key wavelengths to improve performance and computational time, and avoid overfitting [42,43]. The results of these studies show the potential in a wide range of applications. The studies by [44,45] showed great potential for the prediction of quality (i.e., TVC and TVB-N) in terms of RPD (>3). In the present study, the highest RPD was 2.83. Comparing the data acquisition workflow, in the present study, the samples were stored simulating real-life conditions in a range of storage temperatures from 4 °C (refrigeration conditions) to 12 °C (abusive temperature), including a high number of samples (n > 400 for each packaging condition) and independent batches (R1-R4).

CNN Performance with Data Quantization
After evaluating the FP32 CNN regression models, the next step was to quantize each model for the target embedded system and its compatible runtime engine. As mentioned previously in Section 3.3, each hardware platform supports specific data types. The results of the quantization were averaged across all CNN models and compared to the original results (FP32 data), with Table 7 showing the delta change in each metric. For the metrics r and RPD, higher is better, while the opposite applies to RMSE and MAE. While in the case of FP16 no loss was observed, PTQ-INT8 (Post-Training Quantization) models did show a slight drop in accuracy, except TRT (XavierNX), which showed a huge drop and was unsuitable for further hardware testing. As for the rest of the INT8 results (orange coloured in Table 7), the delta in loss could be minimized further via Quantization-Aware Training (QAT). However, this was not explored further, as the results were satisfactory to proceed with the hardware evaluations. For the evaluation and comparison of the performance of each embedded system, the following main metrics were defined and used:

1. Latency: Execution time from start to finish of a specific stage. To accurately extract this measurement, the application was run multiple times, and the average latency was calculated for each stage. The overall test time was at least 30 s because, apart from the target application process, other OS processes also use the hardware resources (such as CPU cores, cache memory, etc.), which may add noise to the experimental results. Stages of interest included loading MSI data, pre-processing, and model inference.

2. Throughput: The maximum samples-per-second throughput of each embedded system, considering all stages of the data pipeline but not including the loading of models.
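The two measurements above can be sketched as follows; the stage latencies in the dictionary are hypothetical placeholder values, not the measured results:

```python
import time

def mean_latency(stage_fn, min_seconds=30.0):
    """Average per-call latency of one pipeline stage, repeating the call
    until at least `min_seconds` of wall time has elapsed."""
    runs, start = 0, time.perf_counter()
    while time.perf_counter() - start < min_seconds:
        stage_fn()
        runs += 1
    return (time.perf_counter() - start) / runs

# Throughput considers every per-sample stage but excludes model loading.
# Hypothetical per-stage latencies in seconds:
stage_latencies = {"load_msi": 2.0, "pre_process": 0.05, "inference": 0.004}
throughput = 1.0 / sum(stage_latencies.values())  # samples per second
```

Averaging over a long window amortizes the noise added by other OS processes sharing the CPU and caches.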

Hardware Evaluation Results
The hardware performance of each embedded system used in the study was evaluated using the metrics described above: latency, throughput, efficiency, and value. Latency is measured for each stage of the data pipeline, while throughput is calculated by considering the entire pipeline.

1. Load CNN Model (Latency)
Loading the CNN model is performed once, and Figure 4 demonstrates that it could be time-consuming. The NCS2 and Nano platforms were the slowest, requiring an average of 3.6 and 2.9 s, respectively, to load the CNN model weights. In contrast, the remaining platforms required between 12 ms (RP4_64bit) and 754 ms (Ultra96). This variation in loading time depends on both the CPU capabilities and the size and data type of the quantized model.


2. Load k-means Model (Latency)
When masking was used, the k-means model needed to be loaded once. After that, the model weights were binarized for re-use, and in this stage, they were deserialized. The deserialization process depended solely on the CPU and the corresponding operating system. The NCS2 platform, which uses an RP4 with a 32-bit OS, had the slowest deserialization time by a significant margin (46.3 ms). In contrast, the XavierNX, which has the latest and fastest ARM CPU with a 64-bit OS, had the fastest deserialization time (6.2 ms). Figure 5 shows the results.

3. Read MSI Samples (Latency)
Loading a multi-spectral image was found to be the most computationally intensive task of the data pipeline, posing a significant bottleneck. The size of the MSI sample, which typically had a resolution of 1200 × 1200 × 18 pixels and an average sample size of 100 MB, further intensified the computational load. The results of loading the MSI samples across different embedded systems are presented in Figure 6. It was observed that the slowest system was the Ultra96, with an average loading time of 4.6 s, while XavierNX was the fastest one, taking only 2 s. The loading time depended on the CPU's capabilities, particularly the clock frequency.


4. Pre-Processing (Latency)
The pre-processing stage of the data pipeline involved several steps (refer to Figure 2) and varied depending on the input data type, whether MSI or RGB image, and whether it was with or without masking. Masking had a negligible impact on pre-processing time, as shown in Figure 7, due to the resizing of the input data at the start of the pipeline, which reduced the amount of data to be processed in subsequent stages. Processing RGB data, consisting of three channels, was slightly faster than the MSI data, which had 18 channels, although the difference was not proportional. Among the platforms tested, the fastest was the XavierNX, which had the most powerful CPU, while the slowest was the Ultra96, which had the least powerful CPU.

5. CNN Regression Model Inference Time (Latency)
The results of the CNN model inference time for each embedded system, using the fastest-performing quantization with minimal quantization loss, are presented in Table 8. The ResNet models were derived from the same architecture but with varying numbers of parameters due to input shapes. The model with the fewest parameters (ResNet-18: RGB) was the fastest. In addition, platforms that used TFLITE models (RP4_64bit and IMX8P) could perform multi-threaded execution, further decreasing latency. The slowest platform was the IMX8P (174.8/281.7/129.3 ms), while the fastest was the XavierNX (4.3/5.9/3.0 ms) for ResNet-18:MSI, ResNet-34:MSI and ResNet-18:RGB, respectively.


Throughput (samples per second)
We also evaluated the performance of the full data pipeline, considering all stages, including loading the meat sample image, pre-processing, and model inference. The total throughput results, measured in samples per second, are presented in Figure 8. Our results indicate that loading the meat sample image was the biggest bottleneck, negatively impacting the overall performance. Notably, the impact of masking on performance was smaller, as it affected the pre-processing stage, which was the second biggest bottleneck. The slowest platform was the Ultra96, which had the slowest CPU, while the fastest platform was the XavierNX, which had the fastest CPU. Overall, our findings suggest that the performance of this application is highly dependent on CPU capabilities.

Efficiency (throughput per watt)
Figure 9 presents each embedded system's efficiency (samples/watt) results when considering power consumption. Based on our results, the most efficient platform was the Nano, closely followed by the RP4_64bit. On the other hand, the least efficient platforms were the KV260 and the Ultra96. Interestingly, the XavierNX, which performed best in latency due to its fast CPU and accelerator, ranked third in efficiency. It is worth noting that further efficiency gains can be achieved through hardware and software/firmware optimizations, as development kits often include features that may not be necessary for a given application.

Value (throughput per dollar)
Finally, we measured the value metric of each platform, which considers the cost of the embedded system (development kit). The results are presented in Figure 10, where the RP4_64bit was the most cost-efficient platform, with the highest throughput per dollar. In contrast, the IMX8P was the least cost-efficient platform on this metric. Interestingly, the XavierNX, which ranked first in the latency metrics, ranked only fifth in value, indicating that it may not be the best choice for cost-sensitive applications.
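Both derived metrics, efficiency and value, are simple ratios over the measured throughput, so a board can lead one ranking and trail the other. The sketch below uses two hypothetical boards with made-up throughput, power and price figures purely to show the computation:

```python
def efficiency(samples_per_s, watts):
    """Throughput per watt (samples/s/W)."""
    return samples_per_s / watts

def value(samples_per_s, price_usd):
    """Throughput per dollar of development-kit cost (samples/s/$)."""
    return samples_per_s / price_usd

# Hypothetical boards: (throughput in samples/s, power in W, kit price in USD).
boards = {
    "BoardA": (1.8, 6.0, 399.0),   # faster but expensive
    "BoardB": (0.9, 4.0, 75.0),    # slower but cheap
}
best_eff = max(boards, key=lambda b: efficiency(boards[b][0], boards[b][1]))
best_val = max(boards, key=lambda b: value(boards[b][0], boards[b][2]))
print(best_eff, best_val)  # BoardA BoardB
```

With these assumed figures the faster board wins on efficiency while the cheaper board wins on value, mirroring how a platform can top the latency ranking yet fall down the cost-normalized one.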


Conclusions
An architecture for estimating the microbial population of food samples using multispectral imaging and deep machine learning regression models operating on embedded hardware was presented. Models were trained on minced pork samples stored under AIR and MAP conditions, using different image pre-processing techniques, and deployed on a wide range of well-known hardware platforms. The evaluation showed that the most accurate results in both AIR and MAP storage conditions were achieved when applying transfer learning to the ResNet-18 model with the masked MSI images. In addition, processing RGB images instead of MSI ones resulted in lower latency and higher throughput on the tested embedded boards, with a slight reduction in microbial population estimation accuracy. Regarding hardware performance, the XavierNX platform outperformed all other evaluated embedded boards in latency and throughput because of its powerful CPU and accelerator. In terms of energy efficiency and value, the Nano and the RP4 outperformed the other tested hardware boards. Moreover, on average, loading the MSI data accounted for 86% of the total execution time in the end-to-end pipeline, pre-processing for 8%, and CNN model inference for 6%.
The evaluation results indicate the potential of portable devices for food quality assessment using spectroscopic sensors and AI at the edge. Such portable devices will allow easy and rapid testing of food quality by the corresponding public authorities in the short term and, with the further development of spectroscopic sensor technologies, by individual consumers in the longer term.
