Evaluation of a Micro-Electro Mechanical Systems Spectral Sensor for Soil Properties Estimation

Soil property estimation using reflectance spectroscopy has advanced considerably over the last decades. The non-destructive nature and high accuracy of spectroscopic techniques have made soil analysis markedly more efficient than conventional laboratory techniques. As the need for rapid, low-cost, and accurate soil property estimation increases, micro-electro-mechanical systems (MEMS) have been introduced and are becoming applicable for informed decision making in various domains. This work presents the assessment of a MEMS sensor (1750–2150 nm) in estimating clay and soil organic carbon (SOC) contents. The sensor was first tested under various experimental setups (different working distances and light intensities) by assessing its similarity (Spectral Angle Mapper) to the measurements of a reference spectroradiometer covering the full 350–2500 nm range. MEMS performance was evaluated over spectra measured from 102 samples in laboratory conditions. Model calibration was performed using random forest (RF) and partial least squares regression (PLSR). The results indicate that MEMS could be employed for soil properties estimation, since the RF model demonstrated solid performance for both clay (R² = 0.85) and SOC (R² = 0.80). These findings pave the way for supporting daily agricultural applications and land-related policies through the exploration of a wider set of soil properties.


Introduction
The world is facing a new era in agriculture where technological advances are largely utilized to manage agricultural systems by assisting in the estimation of temporal and spatial changes in soil and crop production [1]. These advances that range from robotics and drones to computer vision software and sensor development have completely transformed modern agriculture. Precision agriculture or smart farming is highly connected with these technologies and along with Internet of Things (IoT) services are becoming the future of farming applications [2]. Walter et al. [3] highlighted that agriculture undergoes a fourth revolution due to the increasing use of information and communication technology that could enable the continuous monitoring of the farm through a reliable data flow.
However, not all types of data could be gathered effortlessly. Due to the spatial variation of soil attributes, soil analysis is a very important task for agricultural monitoring of soil fertility, yet it is time consuming and costly, while it is a destructive technique that uses conditions. Firstly, the most suitable MEMS sensor configuration for reliable measurements was evaluated by testing different set ups, adjusting the height and the light intensity of the sensor in a subset dataset (six soil samples). These measurements were compared to reference measurements with a spectroradiometer with spectral range from 350-2500 nm. In the next step, after selecting the optimal sensor configuration, measurements in the full dataset were performed. Finally, the results regarding both the most accurate measurement methodology, and the modelling evaluation for SOC and clay estimation using partial least squares regression (PLSR) and the random forests (RF) models are provided and discussed.

Sample Dataset
In this study, a set of 102 topsoil samples (0–30 cm) was utilized, which will be denoted as the full dataset (Figure 1). The full dataset comprised soil samples from two different regions. Specifically, 45 topsoil samples were collected using a random soil sampling strategy that was adapted during the field campaign to specifically include bare soil samples from the broader rural area around Lake Zazari, located in Western Macedonia, Greece. The remaining 57 samples were selected from the Republic of North Macedonia (MKD) part of the GEO-CRADLE initiative's open SSL. For more details (e.g., sampling strategies and soil types), the reader is referred to Tziolas et al. [42]. The physical samples were sent to the Laboratory of Remote Sensing and Geographic Information Systems of the Aristotle University of Thessaloniki (Thessaloniki, Greece). The GEO-CRADLE SSL complies with GEOSS data principles and Open Database License standards, and it is publicly available through the GEO-CRADLE regional data hub. The initial MKD part of GEO-CRADLE is a regional VNIR-SWIR SSL, comprising the spectral signatures and key soil properties of 124 topsoil samples across the MKD region. The final 57 soil samples were selected from this library by applying the Kennard-Stone algorithm [43] over the predictor space, in order to achieve adequate variability of the target soil properties (clay and SOC) in the dataset.
A sample dataset consisting of six indicative soil samples (hereinafter referred to as the subset dataset, Figure 2) representing different soil classes was selected from the complete dataset and was used to assess the most suitable MEMS sensor configuration for the subsequent spectral measurements.

Sample Preparation and Chemical Analysis
The soil samples were carefully cleaned from possible vegetation, roots, stones, or other non-soil particles and air-dried. Each sample was crushed, passed through a 2 mm sieve, and then separated into two equal sub-samples. The first sub-sample was used for spectroscopic analysis, while the second was used for chemical analysis. The chemical analyses, in the case of the samples from the area of Zazari, were carried out in the laboratory of the Inter-Balkan Environment Center, while in the case of the samples from the area of the Republic of North Macedonia, they were taken from the open GEO-CRADLE database. The Vougioukos method [44] was used to determine the physicochemical parameters and specifically to determine the granulometric composition of the soil. The total organic carbon was determined via the Walkley-Black method [45].

Spectral Measurements
The spectral measurements were performed with two different sensors. The first one was the Spectral Evolution PSR+ 3500 instrument (Spectral Evolution Inc., Lawrence, MA, USA) covering the electromagnetic spectrum from 350 to 2500 nm. The spectral resolution of PSR+ 3500 is 2.8 nm @ 700 nm; 8 nm @ 1500 nm; 6 nm @ 2100 nm. PSR+ 3500 derived spectral signatures are graphical representations of one-dimensional functions, mapping the electromagnetic spectrum of VNIR-SWIR to the sensed reflectance. The second spectral device was a MEMS-based S2.2 sensor (Spectral Engines OY, Helsinki, Finland) that employs a patented Fabry-Pérot interferometer and covers the spectral range from 1750 to 2150 nm, with a wavelength interval of 5 nm. Two tungsten vacuum lamps compose the built-in illumination source. The device is compact (25 × 25 × 17.5 mm³), weighs 15 g, and is equipped with a 5 V rechargeable battery supply; it is therefore portable, able to operate in situ, and able to withstand ambient temperatures between 10 and 50 °C.
Spectral measurements are based on the principles of energy-matter interactions. The relative reflectance $R_o$ of each sample is calculated as follows:

$$R_o = \frac{I_o - I_d}{I_w - I_d}$$

where $I_o$ is the reflectance of the sample, $I_d$ the reflectance of the dark measurement (which is sensed by using each device with the sensor's aperture closed), and $I_w$ the reflectance value of the respective white-reference material used for each sensor.
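The relative reflectance calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual dark- and white-reference normalization, (I_o − I_d)/(I_w − I_d), applied band by band; the function name and the raw counts are hypothetical and are not taken from either instrument's software.

```python
def relative_reflectance(sample, dark, white):
    """Return per-band relative reflectance (I_o - I_d) / (I_w - I_d).

    Assumption: all three readings are sequences of raw values
    covering the same spectral bands in the same order.
    """
    if not (len(sample) == len(dark) == len(white)):
        raise ValueError("all three readings must cover the same bands")
    reflectance = []
    for i_o, i_d, i_w in zip(sample, dark, white):
        span = i_w - i_d
        if span <= 0:
            raise ValueError("white reference must exceed the dark reading")
        reflectance.append((i_o - i_d) / span)
    return reflectance


# Hypothetical raw readings for three bands (illustrative values only).
sample_counts = [520.0, 610.0, 450.0]
dark_counts = [20.0, 10.0, 50.0]
white_counts = [1020.0, 1010.0, 850.0]
print(relative_reflectance(sample_counts, dark_counts, white_counts))
```

Subtracting the dark reading from both numerator and denominator removes the sensor's offset before the sample is scaled against the white reference.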

Reference Spectral Measurements
All measurements of the full dataset were performed inside a totally opaque dark box to prevent interference from external light and to limit temperature deviations. The protocol steps were performed in the following order: (i) measurement of a white reference panel (Spectralon™) with over 98% reflectance in the VNIR-SWIR region and 95% in the SWIR for irradiance to reflectance conversion, (ii) three measurements of two sand dune samples, Wylie Bay and Lucky Bay, that were used as internal soil standards according to Ben Dor et al. [46], and (iii) three consecutive measurements of each soil sample, which was rotated 90° each time, averaged to form the spectral signature. This procedure was repeated after every five soil sample measurements. The sensors were cross-calibrated through the Wylie Bay and Lucky Bay fine earth samples that were used as diffusion gold standards [47], through the transformation:

$$R_\lambda = 1 - \frac{\rho_{S,\lambda} - \rho_{SBM,\lambda}}{\rho_{S,\lambda}}$$

$$R_{c,\lambda} = R_\lambda \, R_{o,\lambda}$$

According to the above set of formulas, a correction coefficient $R_\lambda$ is calculated for each sensed spectral band $\lambda$, where $\rho_{S,\lambda}$ is the reflectance of the standardization media as measured by each of the two setups, $\rho_{SBM,\lambda}$ is the reference standard as defined by the Commonwealth Scientific and Industrial Research Organization (CSIRO), and the standardized reflectance $R_{c,\lambda}$ of each sample is the element-wise product of the vector containing the calculated correction factor $R_\lambda$ with the sensed reflectance for each wavelength $R_{o,\lambda}$.
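The internal soil standard correction can be illustrated with a short Python sketch. It assumes that the per-band correction coefficient is the ratio of the reference standard reflectance to the reflectance of the same standard as measured by the setup at hand (algebraically equal to 1 − (ρ_S − ρ_SBM)/ρ_S), and that it is applied as an element-wise product. The function names and numeric values are hypothetical.

```python
def correction_factors(measured_standard, reference_standard):
    """Per-band factor mapping the measured standard onto its reference.

    Assumption: factor = rho_SBM / rho_S for every band, which equals
    1 - (rho_S - rho_SBM) / rho_S.
    """
    if len(measured_standard) != len(reference_standard):
        raise ValueError("standards must cover the same bands")
    factors = []
    for rho_s, rho_sbm in zip(measured_standard, reference_standard):
        if rho_s == 0:
            raise ValueError("measured standard reflectance must be non-zero")
        factors.append(rho_sbm / rho_s)
    return factors


def standardize(sample_reflectance, factors):
    """Element-wise product of the correction factors and a sample spectrum."""
    if len(sample_reflectance) != len(factors):
        raise ValueError("spectrum and factors must cover the same bands")
    return [r * f for r, f in zip(sample_reflectance, factors)]


# Hypothetical two-band example (illustrative values only).
measured = [0.8, 0.5]      # standard as sensed by the setup
reference = [0.4, 0.25]    # benchmark values for the same standard
factors = correction_factors(measured, reference)
print(standardize([0.6, 0.2], factors))
```

A useful sanity check of this formulation is that standardizing the measured standard itself returns the reference values.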

MEMS Sensor Configuration and Experimental Setup
Considering the lack of an established protocol for measuring soil samples with the MEMS sensor, ten different scenarios regarding the sensor's configuration were evaluated. Each scenario tested a different instrumentation setup in order to select the optimal configuration. The parameters that were adjusted were the sensor's distance from the sensed surface and the device's light intensity. The device's aperture angle is 7°, while the sensor's distance from its protective glass is 12 mm, resulting in a circular sensing area of 1.43 mm diameter that does not cover the minimum soil particle size limit of 2 mm of coarse sand [48]. Consequently, robust and representative measurements are questionable for non-homogeneous materials such as soil samples. For that reason, a mounting bracket that raises the device above the sensing area and thereby expands the sensor's field of view to a circular area of over 2 mm diameter was tested. Three cylinders with heights (H) of 15, 20, and 35 mm and an interior surface of maximum reflectance were evaluated. The tested tubes increased the diameter of the sensing area to 2.87, 3.3, and 3.9 mm, respectively. For each tube, measurements were performed under three Light Intensity (LI) scenarios, with LI set to 80%, 90%, and 100%, respectively. The experimental setup of the MEMS sensor, along with the change of the sensed area with height, is shown in Figure 3.
The default sensor configuration was also assessed, i.e., direct contact of the soil sample with the sensor's lens and 100% LI. The above-mentioned measurements were performed on the subset dataset. The MEMS measurement procedure followed exactly the same steps as the PSR+ 3500 spectral measurements. The sensed spectral signatures were compared to the PSR+ 3500 reference spectral signatures in terms of the Spectral Angle Mapper similarity [49], and the best-scoring configuration was selected for the modeling part. The Spectral Angle Mapper angle is calculated as:

$$\alpha = \cos^{-1}\left(\frac{\sum_{\lambda=1}^{n} t_\lambda \, \rho_\lambda}{\sqrt{\sum_{\lambda=1}^{n} t_\lambda^{2}} \, \sqrt{\sum_{\lambda=1}^{n} \rho_\lambda^{2}}}\right)$$

where $n$ is the total count of spectral bands, $t_\lambda$ the sensed reflectance at $\lambda$ nm, and $\rho_\lambda$ the corresponding reflectance of the PSR+ 3500 measurements.
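As an illustration of the similarity measure, the following Python sketch computes the spectral angle between two spectra sampled at the same bands, using the standard definition of the Spectral Angle Mapper as the arccosine of the normalized dot product. The function name and example spectra are hypothetical; in the study the two spectra would be a MEMS signature and the PSR+ 3500 signature resampled to the same wavelengths, a resampling step that is assumed here and not shown.

```python
import math


def spectral_angle(t, rho):
    """Spectral angle (radians) between a sensed and a reference spectrum."""
    if len(t) != len(rho) or not t:
        raise ValueError("spectra must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(t, rho))
    norm_t = math.sqrt(sum(a * a for a in t))
    norm_rho = math.sqrt(sum(b * b for b in rho))
    if norm_t == 0 or norm_rho == 0:
        raise ValueError("spectra must not be all zeros")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_t * norm_rho)))
    return math.acos(cosine)


# Hypothetical reflectance values at four shared bands.
mems = [0.31, 0.33, 0.30, 0.28]
reference = [0.30, 0.32, 0.31, 0.27]
print(spectral_angle(mems, reference))
```

Because the angle depends only on the direction of the two vectors, a spectrum that differs from the reference by a constant gain factor has an angle of zero; the measure is therefore sensitive to differences in spectral shape rather than in overall brightness.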

MEMS Sensor Spectral Measurements
The spectral measurements of the full dataset with the MEMS sensor were performed using the best-performing setup from the previous step. The procedure followed the steps described in the previous section, i.e., white reference measurement, internal soil standard measurements, and finally soil sample measurements.

Spectral Data Preprocessing
A common practice in soil spectroscopy is to test different pre-processing techniques, which may help to enhance the absorption bands or perform scatter correction. In that regard, reflectance (R) measurements were transformed to absorbance values through the log transformation log10(1/R). Then, the Savitzky-Golay algorithm [50] was applied to compute the first derivative of the absorbance spectra through local polynomial regression. Through the first derivative, any possible baseline signal was removed [51]; a third-degree polynomial was fitted by least squares to spectral neighborhoods of 101 nm width for the full PSR+ 3500 measurements and 21 nm for MEMS. The final part of spectral preprocessing is range scaling, which was conducted through z-transformation with the standard normal variate (SNV) technique [52]. Range scaling aims to reduce the multiplicative effect of light scattering through centering and scaling, resulting in an average value of 0 and a standard deviation of 1 for each spectral band. Reflectance measurements along with the preprocessing transformations are shown in Figure 4. According to Figure 4c, the PSR+ 3500 sensor presents higher sensitivity in reflectance sensing compared to MEMS, since it presents smoother changes in the first derivative curve. Due to this sensor characteristic, along with the lower spectral resolution, the variability of the reflectance at each sensed wavelength is lower for the MEMS sensor.
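Two of the preprocessing steps described above, the absorbance transformation and the SNV scaling, can be sketched with the Python standard library alone. The sketch applies SNV in its conventional per-spectrum form (centering and scaling each spectrum by its own mean and sample standard deviation), which is an assumption about the implementation; the function names and values are hypothetical. The Savitzky-Golay derivative step is deliberately omitted here; in practice a library routine such as `scipy.signal.savgol_filter` with `deriv=1` could supply it.

```python
import math
import statistics


def to_absorbance(reflectance):
    """Convert reflectance values (0 < R <= 1) to absorbance, log10(1/R)."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance values must be positive")
    return [math.log10(1.0 / r) for r in reflectance]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    if sd == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(value - mean) / sd for value in spectrum]


# Hypothetical reflectance spectrum at five bands.
reflectance = [0.30, 0.32, 0.35, 0.33, 0.29]
absorbance = to_absorbance(reflectance)
print(snv(absorbance))
```

After SNV, the values of each processed spectrum have a mean of 0 and a standard deviation of 1, so spectra that differ mainly by a multiplicative scatter effect become directly comparable.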

Land 2021, 10, x FOR PEER REVIEW 7 of 16

Model Calibration and Validation
Two modeling iterations were conducted for each device and each target soil property, one fitting the RF algorithm and the other the PLSR method.
RF has been proven to be an effective prediction algorithm in various domains [53,54]. RF is an ensemble of tree-based predictors, RF = {T(x,Θ1), T(x,Θ2), …, T(x,Θk)}, where k is the number of trees and Θk are independent and identically distributed random vectors, with each tree casting its recommendation for input x [55]. The RF model is tuned through three parameters: the number of variables used for the development of each tree, which ranges from 1 to the total number of variables; the number of trees grown (beyond 500 trees there is no statistically significant difference in the results) [56]; and the minimum node size, which determines the minimum size of each intermediate node and of the resulting leaves. The choice of the minimum node size affects how coarse the RF outputs are; thus, for a minimum node size equal to 1, the RF acts as a classifier, while for larger values, it produces regression results [57].
According to Wold et al. [58], PLSR is one of the most used approaches for estimating chemical property values through multivariate linear modeling, owing to its simplicity without a tradeoff in performance. PLSR is capable of analyzing strongly correlated multivariate data by constructing factors of latent variables, aiming to explain the covariance between the predictor variables and their responses.
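The accuracy of the models was assessed by calculating standard modeling metrics between the measured values $y$ and the estimated values $\hat{y}$, namely the coefficient of determination (R²), the root mean squared error (RMSE), and the ratio of performance to interquartile distance (RPIQ), which are given by:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

$$\mathrm{RPIQ} = \frac{Q_3 - Q_1}{\mathrm{RMSE}}$$

where $N$ is the number of samples, $\bar{y}$ the mean of the measured values, and $Q_1$ and $Q_3$ the first and third quartiles of the measured values. Due to the small size of the dataset, leave-one-out cross validation (LOOCV) was applied for splitting the data into calibration and validation sets, resulting in increased model accuracy in terms of the above-mentioned metrics and in unbiased error estimates. Data preparation and preprocessing were performed with the R base package, while the modelling part was completed with Caret [59].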
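The evaluation metrics and the leave-one-out scheme can be illustrated with a self-contained Python sketch. It uses the common definitions R² = 1 − SS_res/SS_tot, RMSE as the root of the mean squared residual, and RPIQ as the interquartile range of the measured values divided by the RMSE. The quartile convention, the one-predictor least-squares line that stands in for the RF and PLSR models, and all names and data values are assumptions made for illustration; this is not the R/Caret workflow used in the study.

```python
import math
import statistics


def rmse(y, y_hat):
    """Root mean squared error between measured and estimated values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = statistics.mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rpiq(y, y_hat):
    """Interquartile range of the measured values divided by the RMSE."""
    q1, _, q3 = statistics.quantiles(y, n=4, method="inclusive")
    return (q3 - q1) / rmse(y, y_hat)


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the real models."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(y)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return predictions


# Hypothetical single-predictor data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_hat = loocv_predictions(x, y)
print(r_squared(y, y_hat), rmse(y, y_hat), rpiq(y, y_hat))
```

Each prediction comes from a model that never saw the sample being predicted, which is why LOOCV is attractive when, as here, the dataset is too small to set aside a separate validation set.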

Physicochemical Analysis of Soil Samples
The results from the physicochemical analyses show an average clay content of 34.81% and an average SOC content of 1.33%, which is typical for soils of the Southern Balkan region, which have low SOC content (Table 1).

MEMS Sensor Optimal Configuration
The spectral analysis of the six samples constituting the subset dataset displayed distinguishable trends and characteristics for each sample. A visual inspection of the different setups regarding heights and light intensities was initially performed. As shown in Figure 5, the different sensor setups resulted in differences in the samples' spectral signatures. More specifically, Figure 5 shows how the different LI levels (100%, 90%, and 80%) affect the spectral signatures at a given height, i.e., 15, 20, and 35 mm. The greatest differences from the reference measurements were observed at minimum LI; hence, LI is a very important factor during the measurements. Regarding the comparison of the default MEMS sensor setup and the PSR+ 3500 measurement, it was observed that the reflectance of the latter was significantly lower (Figure 5d). The signal-to-noise ratio of the electromagnetic radiation reflected from soil samples is positively correlated with the LI, while it is negatively correlated with the distance between the sensor and the target. This is in accordance with the Beer-Lambert law, which states that the radiation passing through an area per unit time decreases exponentially due to its conversion to Joule heating on its way to the absorbing medium. This change can be described by the formula:

$$L_i(x) = L_i(0) \, e^{-A x}$$

where $L_i$ stands for the light intensity, $x$ is the length of the medium between the light source and the target, and $A$ is the medium's absorbance coefficient.
To quantitatively assess the spectral similarity of the spectra produced by the MEMS sensor to the PSR+ 3500 reference measurements, the Spectral Angle Mapper was utilized. The setup with 20 mm height and 100% light intensity showed the highest similarity to the PSR+ 3500 measurements compared to the remaining scenarios, i.e., the minimum Spectral Angle Mapper distance, as shown in Figure 6. The default (proximal) setup suggested by the manufacturer, which implies 100% LI and zero distance between the sensor and the sample, was the least optimal sensor configuration, with an average spectral distance of 0.052 rad. These findings demonstrate that the field of view plays an important role in the sensor's accuracy and verify the initial hypothesis that, in order to cover the minimum soil particle size limit, the sensor's field of view must be increased. This was achieved by increasing the distance between the soil sample and the device. The other nine evaluated scenarios presented more accurate measurements in comparison to the default. More specifically, the order from the least accurate scenario to the most accurate is shown in Table 2. To that end, the complete dataset was measured in accordance with the insights derived from the preliminary spectral analysis, setting the light intensity to 100% and using the cylinder of 20 mm height.

Reference Spectral Measurements Compared to The MEMS Sensor Spectral Measurements
The calibration models were generated using the RF algorithm and the PLSR statistical method, while the preprocessing included the transformation from reflectance to absorbance, the Savitzky-Golay first derivative, and the SNV transformation, which were all applied in both calibration models. With LOOCV on the full dataset, the highest metric values were obtained with the PSR+ 3500 sensor and the RF algorithm, with R² = 0.93 and RMSE = 6.97% for clay content and R² = 0.91 and RMSE = 0.32% for SOC content (Table 3). PLSR was less accurate, supporting the hypothesis that it is more efficient in predicting linear relationships. However, in soil analysis, the relationship between soil properties and spectral data is usually nonlinear [60], and this can explain the low predictive performance of PLSR. In contrast, RF can describe complex linear and nonlinear relationships and interactions while reducing the probability of model overfitting. High accuracy for clay content prediction was also reported by Waiser et al. [28] and in [61], with R² values ranging from 0.85 to 0.88. However, the results are in contrast with the studies of [13,62], which reported better prediction models with the use of PLSR for clay content rather than SOC. In a study using 1534 legacy soil samples from Brazil, Ref. [63] noted that PLSR and RF models had similar predictive accuracies, which were, however, inferior to those of this study. This could be attributed to the local character of the present study and the smaller number of soil samples. Regarding SOC predictions, the results are similar to [14,56], which reported better results for the RF approach than for PLSR. However, Ref. [64] found that PLSR outperformed RF, which highlights that differences between studies regarding the size of the dataset and the range of values of the modelled properties could significantly alter the accuracy of prediction.
The respective scatter plots are shown in Figure 7. The variable importance analysis plot presents the most important wavelengths selected by the RF calibration models (Figure 8). In most cases, the important wavelengths of the PSR+ 3500 correspond with those of the MEMS sensor. Clay minerals in general present distinctive absorptions in the NIR-SWIR region. Regarding the PSR+ 3500 sensor, significant wavelengths above 2200 nm indicate the presence of clay minerals such as kaolinite, which is in agreement with Lee et al. [23], who also identified specific wavelengths as important for clay estimation (1904, 2177, 2201, 2213, 2321, and 2492 nm). The narrow absorption bands near the 1900 nm region result from the presence of hydroxyls and water, vermiculite [63], and montmorillonitic clay [64]. Regarding SOC, the RF variable importance analysis of the PSR+ 3500 full spectrum indicated that the most significant wavelengths were mostly centered in the visible region. This probably occurs due to the effect that SOC has on the soil's color as a result of absorptions from chromophores, as reported in the literature [5,65,66]. More specifically, it has been observed that the yellow and red colors of the soil are due to the presence of hematite and goethite [67]. SOC in general is closely related to NIR wavelengths indicating C-O, C=O, and N-H compounds, while 1852, 1930, 2033, 2060, and 2208 nm were indicated as fundamental wavelengths by Viscarra Rossel et al. [64]. Shi et al. [68] also reported the significance of the 1800-2450 nm region, which overlaps with the spectral region of the MEMS sensor used. The above suggests that the usefulness of a MEMS spectral sensor is property related, and therefore its spectral range must be selected carefully.
Comparing the results of the two sensors, their accuracies are very close. For the MEMS sensor, the RF model gave R2 = 0.85 and RMSE = 8.17% for clay content, while for SOC content the R2 was slightly lower (R2 = 0.80, RMSE = 0.46%). This could be attributed to the selection of a suitable spectral region of the MEMS sensor for specifically estimating clay and SOC. However, the PLSR results of the MEMS sensor were significantly lower than those of all the other models. The results for the models based on the PSR+ 3500 full spectrum are in accordance with [69], where a one-dimensional multi-channel convolutional neural network was applied to VNIR-SWIR spectra for the estimation of clay and SOC content, resulting in an R2 of 0.86 in both cases. Viscarra Rossel et al. [70] reported similar prediction accuracies for both clay and SOC using only the NIR region of the electromagnetic spectrum, which supports the choice of the respective MEMS sensor for these soil attributes. Moreover, the findings of the current study demonstrate the potential of miniaturized spectrometers with reduced wavelength ranges compared to other alternatives, as demonstrated by Tang et al. [40], with similar or even better predictive performance. This can be explained by the choice of RF for developing the spectroscopic regression models. Despite that, the lower accuracy of the MEMS sensor compared to the PSR+ 3500 is presumably due to its smaller spectral range and coarser spectral resolution, which leave out specific wavelengths that could enhance the model's predictive performance. As shown by the studies mentioned above, both the visible spectral region and wavelengths above 2200 nm can hold additional valuable information that could result in better prediction models.
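For readers less familiar with the accuracy metrics quoted above, the sketch below shows one common way of computing RMSE and R2 from observed and predicted values. It assumes the coefficient-of-determination definition of R2 (one minus the ratio of residual to total sum of squares); the paper does not state which definition was used. The sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the measured property."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical clay contents (%) for four validation samples.
observed_clay = [10.0, 20.0, 30.0, 40.0]
predicted_clay = [12.0, 18.0, 33.0, 37.0]

clay_rmse = rmse(observed_clay, predicted_clay)
clay_r2 = r_squared(observed_clay, predicted_clay)
print(f"RMSE = {clay_rmse:.2f}%, R2 = {clay_r2:.3f}")
```

Because RMSE is expressed in the units of the property, the clay and SOC RMSE values are not directly comparable with each other, whereas R2 is unitless.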
The evaluated device meets all portability requirements; thus, it can serve as the basis for a set of in situ soil spectroscopy applications and services. Successful data collection of this kind could attract widespread interest for daily farming applications (e.g., fertilization by farmers and/or agri-consultants) and for checking farming compliance with various international policies and treaties. However, limitations need to be overcome in order to acquire representative spectral measurements outside laboratory conditions, such as the absence of developed standards and protocols for measurements and sample preparation. In this work, diffuse reflectance analysis was performed according to the standardization protocol proposed by Ben Dor et al. [46], which only concerns reflectance measurements in the laboratory. The inter-calibration of the MEMS sensor and the PSR+ 3500 was thus achieved through standardization with the Wylie Bay and Lucky Bay fine earth samples and their standard measurements provided by CSIRO. These standardization media, due to their nature, present characteristics that may be impractical for in situ use (i.e., they are hard to keep pure). To that end, there is a need for further assessment of different materials that could serve as alternatives to the internal soil standards.

Conclusions
The present work provides insights that MEMS spectral sensors can support the global effort of assuring crop quality by monitoring soil and accurately estimating key properties that serve as soil quality indices. The spectral signatures acquired with the MEMS sensor presented high similarity with the corresponding PSR+ 3500 signatures in the common spectral range, which encourages further research on soil properties that are sensitive to vibrations induced by electromagnetic radiation with wavelengths inside the 1750-2150 nm region.
Within the framework of this work, ten different measurement setups, including the default setup, were evaluated over a sub-sample of six samples. The measurements were compared to PSR+ 3500 measurements, and the Spectral Angle Mapper indicated that the MEMS sensor should operate at maximum light intensity and at a distance of 20 mm from the soil sample. This aspect of MEMS needs to be further assessed, especially the capability to obtain in situ reflectance measurements of rough soil under strong scattering effects caused by aggregates larger than 2 mm.
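The Spectral Angle Mapper comparison can be sketched as follows. This is a minimal illustration of the standard spectral angle formula (the arccosine of the normalized dot product of two spectra), and it is not the authors' implementation. It assumes both spectra have already been resampled to the same wavelengths; the reflectance values and setup names are hypothetical.

```python
import math


def spectral_angle(reference, target):
    """Spectral angle in radians between two spectra on the same wavelength grid."""
    if len(reference) != len(target):
        raise ValueError("Spectra must be sampled at the same wavelengths.")
    dot = sum(r * t for r, t in zip(reference, target))
    norm_reference = math.sqrt(sum(r * r for r in reference))
    norm_target = math.sqrt(sum(t * t for t in target))
    cos_angle = dot / (norm_reference * norm_target)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)


# Hypothetical reference spectrum (e.g., from the reference spectroradiometer).
reference_spectrum = [0.30, 0.32, 0.35, 0.33]

# Hypothetical MEMS measurements under two setups.
setups = {
    "setup_a": [0.60, 0.64, 0.70, 0.66],  # same shape, different brightness
    "setup_b": [0.30, 0.25, 0.40, 0.20],  # different spectral shape
}

setup_angles = {
    name: spectral_angle(reference_spectrum, spectrum)
    for name, spectrum in setups.items()
}
best_setup = min(setup_angles, key=setup_angles.get)
print(best_setup, setup_angles)
```

A smaller angle means a more similar spectral shape. Because the angle is insensitive to a uniform scaling of reflectance, it compares shape rather than overall brightness, which makes it a reasonable choice when setups differ in light intensity and working distance.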
The modeling part indicated that RF outperformed PLSR in both clay and SOC predictions, yielding satisfactory accuracy metrics. On the other hand, PLSR modeling over MEMS signatures needs further research, since the device's coarse spectral resolution and lower spectral sensitivity presumably provided less information related to the studied properties than the PSR+ 3500. The fitted models revealed correlations between reflectance and clay in the regions of 1990 and 1880 nm, and between reflectance and SOC in the regions of 1750, 2000, and 2100 nm, in accordance with the absorption features of clay and SOC reported in the literature.
Nonetheless, there is a need to analyze a broader set of soil samples with high spatial variability, covering different soil classes, under real field conditions. In addition, the limited spectral range of the sensor means that it can mainly be used for estimating specific soil properties. Moreover, this novel approach needs to be tested in the field under real conditions (e.g., roughness), along with the full chain of interconnected systems, to pave the way for the development of tailored downstream services and associated tools aimed at the various stakeholders.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: http://datahub.geocradle.eu/dataset/regional-soil-spectral-library. Additional data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.