Article

Low-Cost Portable Near-Infrared Spectroscopy for Predicting Soil Properties in Paddy Fields of Southeastern China

1 State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou 310058, China
2 Zhejiang Key Laboratory of Agricultural Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
3 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
4 State Key Laboratory of Efficient Utilization of Arid and Semi-Arid Arable Land in Northern China, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(6), 1805; https://doi.org/10.3390/s26061805
Submission received: 28 January 2026 / Revised: 10 March 2026 / Accepted: 11 March 2026 / Published: 12 March 2026
(This article belongs to the Special Issue Soil Sensing and Mapping in Precision Agriculture: 2nd Edition)

Abstract

Timely and accurate soil property information is critical for sustainable agriculture and precision nutrient management. Conventional laboratory methods are accurate but costly and labor-intensive, restricting their feasibility for high-density soil mapping. Low-cost, portable near-infrared (NIR) spectroscopy presents a promising alternative for rapid, on-site, and non-destructive soil analysis. This study aimed to evaluate the potential of a low-cost, portable NIR sensor (NeoSpectra) for the quantitative prediction of key soil properties in paddy fields from Southeastern China. The target properties were soil organic matter (SOM), total nitrogen (TN), pH, and particle size fractions (clay, silt, and sand). A total of 995 soil samples were collected from representative paddy fields in the region, and spectral measurements were conducted in the laboratory on air-dried samples. We developed and compared the performance of multiple machine learning algorithms, including partial least squares regression (PLSR), Cubist, random forest (RF) and memory-based learning (MBL), to build robust calibration models. The models predicted SOM and TN accurately enough for quantitative use (R2 ≥ 0.75, LCCC > 0.85, RPD > 2). Predictions for pH, silt, sand, and clay were less accurate (R2 of 0.48–0.55, LCCC of 0.67–0.71, RPD of 1.39–1.49), suggesting the sensor’s utility is limited to indicating general trends for these properties. Among the tested algorithms, MBL consistently provided the most accurate and robust predictions across the majority of soil properties. Our findings demonstrate that the low-cost portable NIR sensor, when coupled with appropriate machine learning algorithms, is a powerful and viable tool for the rapid and reliable estimation of critical paddy soil fertility properties (SOM and TN). This technology has significant potential to support field-level soil health monitoring, precision fertilization strategies, and sustainable land management in the agricultural systems of Southeastern China.

1. Introduction

Sustainable agricultural management and the advancement of precision agriculture are critically dependent on the accurate and timely assessment of soil properties. Key indicators of soil health and fertility, such as soil organic matter (SOM), total nitrogen (TN), pH, and texture (sand, silt, and clay content), govern nutrient cycling, water retention, and ultimately, crop productivity [1]. Traditionally, the characterization of these properties has relied on conventional wet chemistry laboratory methods. Although accurate, wet chemistry is expensive, slow, and generates hazardous waste, limiting its feasibility for high-density monitoring [2]. This analytical bottleneck hinders the ability of farmers and land managers to make rapid, data-driven decisions for site-specific nutrient management and environmental protection.
To overcome these limitations, diffuse reflectance spectroscopy, particularly in the near-infrared (NIR) region (780–2500 nm), has emerged as a promising alternative for rapid, non-destructive, and cost-effective soil analysis [3]. NIR spectroscopy measures the absorption of light by molecular bonds (e.g., C-H, N-H, O-H) in soil constituents. By applying chemometric and machine learning techniques to the spectral data, it is possible to develop predictive models for various soil properties simultaneously from a single scan [4]. The performance of soil property prediction is heavily reliant on the choice of machine learning algorithm. While partial least squares regression (PLSR) remains a standard linear technique, it often struggles to capture the complex, non-linear relationships present in large, heterogeneous soil spectral libraries [5]. Non-linear algorithms such as random forest (RF) and Cubist have been introduced to address this, yet they still typically apply a global model to the entire dataset. In contrast, memory-based learning (MBL) adopts a local modeling strategy, dynamically selecting spectrally similar neighbors for each prediction sample [5,6]. This approach has shown particular promise in regional-scale studies by effectively handling local spectral non-linearities that global models may over-smooth [6]. Early applications predominantly utilized expensive, laboratory-based spectrometers, which, despite their accuracy, still required soil samples to be transported from the field, thus retaining some of the logistical challenges of conventional methods [7].
Recent technological advancements have led to the miniaturization of spectrometers, resulting in the development of low-cost, portable, and handheld NIR sensors. These devices offer the transformative potential for in situ soil analysis, empowering users to obtain immediate feedback directly in the field [8]. The accessibility and ease of use of these portable sensors are democratizing soil analysis, moving it from the laboratory to the farm. Numerous studies have demonstrated the capability of various portable NIR instruments to predict key soil properties. For instance, research has shown successful predictions of soil organic carbon (SOC), total nitrogen (TN), and soil texture, although the performance can be influenced by factors such as soil moisture and the robustness of the calibration models [9,10].
Among the emerging low-cost devices, the NeoSpectra sensor (Si-Ware Systems) distinguishes itself through the use of Micro-Electro-Mechanical Systems (MEMS) Fourier-Transform (FT-NIR) technology. This design allows for a significant reduction in size and cost while maintaining a wide spectral range (1350–2550 nm) that covers major absorption features for SOM and clay minerals. Unlike discrete-wavelength sensors, the NeoSpectra captures continuous spectral data, making it a potentially transformative tool for soil analysis. Several recent studies have validated its effectiveness across diverse soil types and geographic regions. For example, Sharififar et al. (2019) evaluated the NeoSpectra sensor for predicting soil organic carbon and total carbon, finding its performance to be comparable to more expensive research-grade instruments [11]. Mitu et al. (2024) further investigated the consistency among multiple NeoSpectra units and concluded that while the devices produce comparable spectral data, calibration transfer strategies may be necessary for applications involving multiple sensors to ensure high accuracy [12]. More recently, large-scale soil spectral libraries have been developed using the NeoSpectra sensor, covering a wide diversity of mineral soils in the United States, Africa and Australia, demonstrating its utility for building robust predictive models for properties including SOC, TN, pH, and clay content [13,14]. These studies collectively highlight the growing confidence in the NeoSpectra sensor as a reliable tool for soil science and agronomy.
However, the performance of NIR spectroscopy is highly dependent on the specificity of the calibration dataset, and models developed for one region or soil type may not be directly applicable to another. Paddy soils, in particular, present a unique context due to their management under flooded conditions, which influences their biogeochemical properties. While some studies have explored the use of vis-NIR spectroscopy in paddy soils [15,16,17,18], research focusing specifically on the application of low-cost, portable FT-NIR sensors like the NeoSpectra in this critical agricultural system remains limited.
While low-cost NIR sensors have shown promise, the current literature is heavily skewed toward stable, aerated upland soils in dryland systems. Paddy soils pose a distinct challenge: they undergo periodic redox-driven transformations under seasonally flooded conditions and anthropogenic compaction (plow pans), which produce a distinct mineralogy and can obscure or interfere with spectral features. Consequently, global dryland models are often non-transferable to these critical rice-producing regions, necessitating a regional-scale evaluation that accounts for these specific pedogenic conditions. Furthermore, many prior evaluations rely on relatively small datasets (<200 samples) or standard global calibration models (e.g., PLSR), which often fail to capture regional soil heterogeneity. To date, a comprehensive evaluation of MEMS-based sensors using MBL on a large, regional-scale paddy soil dataset is lacking. Therefore, this study aims to address this research gap by evaluating the potential of the low-cost, portable NeoSpectra NIR sensor for predicting key soil properties (SOM, TN, pH, clay, silt, and sand) in paddy soils from Southeastern China. Using a substantial dataset of 995 soil samples along with laboratory spectral measurements on air-dried soil, we develop and compare the performance of multiple machine learning algorithms to build robust predictive models. The findings of this work contribute to establishing the utility of this technology for rapid, on-site soil assessment to support precision agriculture in one of the world’s most important rice-producing regions.

2. Materials and Methods

2.1. Study Area and Soil Sampling

This study was conducted in the Hang-Jia-Hu plain, the largest alluvial plain in Zhejiang Province, China, and a key component of the Yangtze River Delta (Figure 1). As the province’s most important grain production base, the region encompasses the major cities of Hangzhou, Huzhou, and Jiaxing, spanning twelve counties and approximately 7600 km². The area is characterized by a subtropical humid monsoon climate, with hot, humid summers and cool winters. The average annual temperature is 16 °C and annual precipitation is approximately 1300 mm. Geomorphologically, the region consists of coastal and lacustrine alluvial plains with low, flat topography, particularly in the east. Rice is the dominant crop, making paddy soils the major soil type in this study area. The cropping system typically follows a rice–wheat or rice–rapeseed rotation, with the rice growing season generally extending from May to October.
Field surveys were conducted from May to June 2025 using a grid-based sampling design stratified by terrain, slope position, soil type, and cropping system. A total of 995 soil samples were collected from 256 sampling sites. At each site, soil cores were taken to a depth of 1 m using a handheld high-frequency vibration soil core sampler (Model VD51, Cote, Melbourne, Australia). The cores were sectioned into four depth intervals (i.e., 0–20, 20–40, 40–60, and 60–100 cm) where possible.
Selected soil physicochemical properties were determined according to standard national protocols. Soil organic matter (SOM) was determined using the potassium dichromate oxidation–external heating method (Walkley–Black). Total nitrogen (TN) was measured using the semi-micro Kjeldahl method. Soil pH was determined potentiometrically using a glass electrode in a 1:2.5 (w/v) soil-to-water suspension. Soil particle size fractions (clay, silt, and sand) were analyzed using the pipette method after dispersing the soil with sodium hexametaphosphate.

2.2. Spectral Measurement and Preprocessing

Spectral measurements were conducted in the laboratory on air-dried soil samples to establish a standardized baseline for the sensor’s performance, eliminating the confounding effects of variable soil moisture and surface roughness typical of field conditions. Samples were air-dried at room temperature until equilibrium moisture content was reached. All samples were ground and passed through a 2 mm sieve to ensure homogeneity. To ensure the validity of the spectral predictive models, each homogenized soil sample was split into two equivalent subsamples: one was used for the reference chemical analysis (as described in Section 2.1), and the other was subjected to spectral scanning. Approximately 100 g of sieved soil was placed into a 10 cm diameter Petri dish, and the surface was carefully leveled using a flat scraper to minimize surface roughness and shadow effects.
Soil reflectance spectra were acquired using a NeoSpectra near-infrared (NIR) spectrometer (Si-Ware Systems, Menlo Park, CA, USA) covering the spectral range of 1350–2550 nm. The NeoSpectra sensor utilizes optical MEMS technology, which significantly reduces cost and enhances portability, making it one of the most widely studied NIR spectrometers in soil science [19]. The instrument samples the spectrum at non-uniform intervals, with an average spectral resolution of approximately 8 nm across the wavelength range. For each soil sample, spectral measurements were repeated three times, and each measurement consisted of 10 automatic internal scans. The final representative spectrum for each sample was obtained by averaging the 30 total scans.
To enhance relevant soil information and mitigate noise and scattering effects, several spectral preprocessing methods were evaluated. These included absorbance transformation (AB, AB = log(1/R)), first derivative (FD), and standard normal variate (SNV) correction. In addition to individual methods, combinations were also tested, resulting in a total of six preprocessing strategies (AB, FD, SNV, AB + FD, AB + SNV, and AB + FD + SNV). The optimal spectral preprocessing method was determined by the best model performance for each soil property.
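As a minimal sketch of the three transformations and their six combinations, the Python snippet below uses only the standard library. It assumes a base-10 logarithm for the absorbance transform, approximates the first derivative by adjacent-band differences (the derivative settings used in the study are not stated), and applies the steps in the order AB, FD, SNV. The function names and the toy reflectance values are illustrative and are not taken from the study.

```python
import math
from statistics import mean, stdev

def absorbance(reflectance):
    """AB = log10(1/R); assumes R is a fraction in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]

def first_derivative(spectrum):
    """First derivative approximated by adjacent-band differences (assumption)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def snv(spectrum):
    """Standard normal variate: centre and scale a spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]

STEPS = {"AB": absorbance, "FD": first_derivative, "SNV": snv}

# The six strategies listed in the text, applied left to right.
STRATEGIES = [
    ("AB",), ("FD",), ("SNV",),
    ("AB", "FD"), ("AB", "SNV"), ("AB", "FD", "SNV"),
]

def preprocess(reflectance, steps):
    """Apply the named preprocessing steps in sequence to one spectrum."""
    out = list(reflectance)
    for step in steps:
        out = STEPS[step](out)
    return out

# Toy reflectance spectrum (hypothetical values, six bands only).
reflectance = [0.30, 0.32, 0.35, 0.33, 0.38, 0.40]
processed = {"+".join(s): preprocess(reflectance, s) for s in STRATEGIES}
```

Each entry of `processed` would then be passed to the same modelling pipeline, and the strategy giving the best validation statistics would be retained for that soil property.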

2.3. Spectral Modelling

Four machine learning algorithms were employed to predict soil properties from the NIR spectral data.
PLSR is a standard linear technique in soil spectroscopy that addresses multicollinearity by projecting high-dimensional spectral data into orthogonal latent variables (LVs) [20]. This reduces data complexity while maximizing covariance with the response variable. The optimal number of LVs (ncomp) was determined within a range of 1 to 30 by minimizing the root mean square error (RMSE) using 5-fold cross-validation.
RF is an ensemble method that aggregates predictions from multiple decision trees trained on bootstrap samples [21]. The final prediction is derived by averaging the outputs of individual trees. We fixed the number of trees (ntree) at 500 to ensure stability, while the number of features considered at each split (mtry) was optimized between 1 and 15 using 5-fold cross-validation [22].
Cubist is a rule-based piecewise linear model derived from the M5 algorithm [23]. It manages complex non-linear relationships by recursively partitioning the dataset into subsets defined by rules, fitting a separate linear model for each. Two key hyperparameters, committees (10–100) and neighbors (1–9), were optimized via 5-fold cross-validation.
MBL is a local learning approach that avoids training a global model [24]. Instead, it dynamically constructs local regression models for each new sample using spectrally similar cases from the calibration set [25]. In this study, the Mahalanobis distance, calculated on the first two principal components, was used to identify neighbors. A local PLSR model was then fitted for each new sample. The number of neighbors (30–250, step size of 10) and local LVs (3–20) were optimized to balance model stability and accuracy.
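The local modelling idea can be sketched in Python as follows, assuming NumPy is available. The helper names (`pls1_fit`, `mbl_predict`), the synthetic data, and the default settings are hypothetical, and the sketch is not the implementation used in the study. Because principal component scores are uncorrelated, the Mahalanobis distance in score space reduces to a Euclidean distance on scores divided by their standard deviations.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)          # weight vector
        t = Xr @ w                         # latent variable scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = yr @ t / tt                    # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return x_mean, y_mean, beta

def mbl_predict(X_cal, y_cal, x_new, k=30, n_lv=2, n_pc=2):
    """Predict one sample from a local PLS model built on its k nearest neighbours."""
    mu = X_cal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_cal - mu, full_matrices=False)
    scores_cal = (X_cal - mu) @ Vt[:n_pc].T
    score_new = (x_new - mu) @ Vt[:n_pc].T
    # Mahalanobis distance on the first n_pc principal component scores.
    sd = scores_cal.std(axis=0, ddof=1)
    dist = np.sqrt((((scores_cal - score_new) / sd) ** 2).sum(axis=1))
    idx = np.argsort(dist)[:k]
    x_mean, y_mean, beta = pls1_fit(X_cal[idx], y_cal[idx], n_lv)
    return float(y_mean + (x_new - x_mean) @ beta)

# Synthetic low-rank "spectra" (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
scores = rng.normal(size=(80, 2))
loadings = rng.normal(size=(2, 20))
X = scores @ loadings + 1e-3 * rng.normal(size=(80, 20))
y = 2.0 * scores[:, 0] - scores[:, 1] + 5.0
pred = mbl_predict(X[:-1], y[:-1], X[-1], k=30, n_lv=2)
```

In a tuning loop, `k` and `n_lv` would be varied over the ranges given above (30–250 neighbors, 3–20 local LVs) and the combination with the lowest error retained.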

2.4. Evaluation of Model Performance

A total of 995 samples were partitioned into calibration (75%, 746 samples) and validation (25%, 249 samples) sets using a location-based Kennard–Stone algorithm. To prevent over-optimistic model evaluation, the algorithm was constrained to ensure that all samples from a single sampling location (across different depths) were assigned to the same subset. Details of the location-based Kennard–Stone algorithm adopted in this study are illustrated in Figure 2. Model performance of the four algorithms was evaluated on the validation set using the coefficient of determination (R2), root mean square error (RMSE), Lin’s concordance correlation coefficient (LCCC), and ratio of performance to deviation (RPD). Higher R2 and LCCC values, along with lower RMSE, indicate superior model performance.
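The details in Figure 2 are not reproduced here, so the following Python sketch shows only one plausible way to impose the location constraint, under stated assumptions: each sampling location is represented by the mean spectrum of its depth samples, the Kennard–Stone max-min rule is run on those location means, and every sample from a selected location goes to the calibration set. The function names, the toy data, and the use of a site fraction (rather than a sample fraction) are assumptions for illustration.

```python
import math
import random
from collections import defaultdict

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Start with the two most distant points.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    while len(selected) < n_select and remaining:
        # Add the point farthest from its nearest already-selected point.
        nxt = max(sorted(remaining), key=lambda r: min(dist[r][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

def location_split(spectra, site_ids, cal_fraction=0.75):
    """Split sample indices so that all depths of a site share one subset."""
    by_site = defaultdict(list)
    for idx, site in enumerate(site_ids):
        by_site[site].append(idx)
    sites = sorted(by_site)
    # Represent each site by the mean spectrum of its samples (assumption).
    centroids = [
        [sum(col) / len(by_site[s]) for col in zip(*(spectra[i] for i in by_site[s]))]
        for s in sites
    ]
    n_cal_sites = max(2, round(cal_fraction * len(sites)))
    cal_sites = {sites[k] for k in kennard_stone(centroids, n_cal_sites)}
    cal = [i for i, s in enumerate(site_ids) if s in cal_sites]
    val = [i for i, s in enumerate(site_ids) if s not in cal_sites]
    return cal, val

# Toy example: 8 hypothetical sites, 2 depth samples each, 3 spectral bands.
rng = random.Random(1)
site_ids = [f"S{k}" for k in range(8) for _ in range(2)]
spectra = [[rng.random() for _ in range(3)] for _ in site_ids]
cal, val = location_split(spectra, site_ids)
```

Keeping all depths of a site together prevents near-duplicate spectra from the same profile appearing on both sides of the split, which is the source of the over-optimistic evaluation mentioned above.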

3. Results

3.1. Statistical Summary of Soil Properties and Spectral Characteristics

Descriptive statistics for the measured soil properties are summarized in Table 1. The dataset exhibited a wide range of variation, particularly for soil fertility indicators. SOM and TN showed high variability, with coefficients of variation (CV) of 69.58% and 61.54%, respectively. The soil particle size fractions were dominated by silt (mean of 68.58%), followed by sand (19.85%) and clay (11.57%), reflecting the silt-rich alluvial parent material of the region. Soil pH covered a broad range from acidic (4.4) to alkaline (8.55), with a near-neutral mean of 6.93. Importantly, the statistical characteristics (mean, standard deviation, and range) of the calibration and validation sets were comparable across all properties, confirming that the location-based Kennard–Stone algorithm effectively partitioned the dataset into representative subsets.
The average soil reflectance spectra and their variations across different depth intervals are presented in Figure 3a. The spectra exhibited typical characteristics of soil reflectance in the NIR region, with distinct absorption features near 1400 nm and 1900 nm associated with O-H bonds in soil water and clay minerals, and weaker features around 2200 nm related to Al-OH absorption in clay lattice structures. Overall, soil reflectance increased with wavelength. The standard deviation (shaded area) indicated significant spectral variation among the samples, reflecting the high natural heterogeneity of the soil across the study area. When looking into spectral variation by depth, the topsoil layers (0–20 and 20–40 cm) generally exhibited lower reflectance compared to deeper layers (40–60 and 60–100 cm), likely due to higher SOM in the topsoil, which tends to absorb more light.
The principal component analysis (PCA) performed on the spectral data (Figure 3b) revealed the distribution of soil samples at four depth intervals in the spectral space. The first two principal components (PC1 and PC2) explained 25.59% of the total spectral variance (15.07% and 10.52%, respectively). The score plot illustrates a continuous distribution rather than distinct clustering, reflecting the gradual pedogenic transition down the soil profile. However, a tendency toward separation is observable, where topsoil samples (0–20 cm) generally occupy the lower quadrants (negative PC2 scores) while subsoil samples (60–100 cm) shift toward the upper quadrants, indicating spectral differentiation driven by SOM and texture gradients.

3.2. Model Performance Across Different Predictive Models

The performance of the predictive models for the six soil properties on the validation set is summarized in Table 2. Among the four machine learning algorithms (PLSR, RF, Cubist, and MBL), the MBL algorithm consistently yielded the most accurate predictions. Consequently, the results presented in Figure 4 focus on the performance of the MBL models.
The models achieved high predictive accuracy for soil fertility properties. SOM prediction was robust, with R2 of 0.76, LCCC of 0.87, and RPD of 2.05 (Figure 4a). Similarly, TN predictions showed strong agreement between observed and predicted values (R2 of 0.75, LCCC of 0.86, RPD of 2.01) (Figure 4b). These results indicate that the portable NeoSpectra sensor can quantitatively monitor SOM and TN with high reliability. In contrast, the predictions for pH and soil particle size fractions were less accurate. The model for pH achieved R2 of 0.53, LCCC of 0.71, and RPD of 1.46 (Figure 4c). For soil particle size fractions, silt content showed the best performance among the fractions (R2 of 0.55, LCCC of 0.70, RPD of 1.49), followed by sand (R2 of 0.53, LCCC of 0.71, RPD of 1.46) and clay (R2 of 0.48, LCCC of 0.67, RPD of 1.39) (Figure 4d–f). While these accuracies are lower than those for SOM and TN, the LCCC values suggest that the models can still distinguish between high and low values, making them useful for rapid screening or identifying general trends.
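The four validation statistics reported above can be computed as in the following Python sketch. The exact formulas used in the study are not stated, so the definitions here are assumptions: R2 as 1 − SSE/SST, RPD as the sample standard deviation of the observed values divided by RMSE, and LCCC with population (1/n) variances and covariance. Under these definitions RPD is approximately 1/sqrt(1 − R2); for example, R2 = 0.76 gives about 2.04, close to the reported RPD of 2.05 for SOM, which is consistent with, but does not confirm, similar definitions. The function name and the toy values are illustrative.

```python
import math
from statistics import mean, stdev

def validation_metrics(observed, predicted):
    """Return R2, RMSE, LCCC, and RPD for paired observed and predicted values."""
    n = len(observed)
    mo, mp = mean(observed), mean(predicted)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    # Population variances and covariance for Lin's concordance coefficient.
    var_o = sst / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": rmse,
        "LCCC": 2.0 * cov / (var_o + var_p + (mo - mp) ** 2),
        "RPD": stdev(observed) / rmse,
    }

# Toy values (hypothetical, not from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.5, 4.5]
metrics = validation_metrics(observed, predicted)
```

Unlike R2, LCCC also penalizes systematic bias and scale differences between predicted and observed values, which is why the two statistics are reported together.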

3.3. Model Performance at Different Depth Intervals

Model performance across the four depth intervals (0–20, 20–40, 40–60, and 60–100 cm) revealed clear depth-dependent patterns. SOM prediction was most accurate in the 0–20 cm (R2 = 0.81) and 20–40 cm (R2 = 0.73) layers, but declined noticeably in the deeper subsurface layers (40–60 cm and 60–100 cm), with R2 decreasing to 0.61 and 0.52, respectively (Figure 5a). A similar trend was observed for TN, where the 0–60 cm layers exhibited superior predictability (R2 of 0.67–0.78) compared to the 60–100 cm layer (R2 of 0.39) (Figure 5b). For soil particle size fractions, the depth-wise prediction performance was more variable. Clay and silt were both predicted with relatively high accuracy in the upper layers (R2 of 0.65–0.77), followed by a marked decline in the 40–60 cm interval (R2 of 0.20–0.25) and a recovery in the deepest layer (R2 of 0.46–0.75, Figure 5d,e). In contrast, sand showed a more gradual decrease in predictive performance with depth, with R2 decreasing from 0.63 at 0–20 cm to 0.54 at 20–40 cm and remaining at 0.50–0.51 in the deeper layers (Figure 5f). These results suggest that the NeoSpectra sensor is particularly effective for assessing key fertility indicators (SOM, TN) and physical properties in the plow layer (0–40 cm), which is the most critical zone for rice root growth and nutrient management.

4. Discussion

4.1. Potential of NeoSpectra for Characterizing Paddy Soil Properties

This study demonstrates that the low-cost, portable NeoSpectra sensor can successfully predict key soil fertility properties in the paddy fields of Southeastern China. The high prediction accuracy for SOM (R2 = 0.76) and TN (R2 = 0.75) is comparable to results often achieved with benchtop research-grade spectrometers in paddy soil regions [26,27,28]. This performance is attributed to the direct interaction of NIR radiation with the fundamental molecular vibrations of C–H, N–H, and O–H bonds present in SOM [29]. Given that SOM and TN are the most critical indicators for soil fertility and rice productivity, these results confirm that the MEMS-based NeoSpectra sensor is a viable tool for supporting precision nutrient management in this region.
In contrast, the predictions for pH and soil particle size fractions (clay, silt, sand) were less accurate, serving primarily as indicators of general trends (R2 of 0.48–0.55) rather than precise quantitative measurements. This lower performance is consistent with previous studies on portable sensors [12,14]. Unlike SOM, pH does not have a direct spectral response in the NIR region, so its prediction relies on correlations with spectrally active soil components such as clay minerals and SOM [29]. Similarly, while clay minerals have distinct absorption features (e.g., around 1900 and 2200 nm), the complex mineralogy of paddy soils, which often includes redox-influenced iron oxides, may complicate the spectral signals. Nevertheless, the achieved accuracy is sufficient for rapid, high-density field screening to identify problematic zones (e.g., acidification or sandy patches) that require further attention.

4.2. Superiority of MBL in Regional Modeling

A key finding of this study was the superior performance of the MBL algorithm compared to global modeling techniques like PLSR, RF, and Cubist. Regional soil datasets, such as the one used in this study (covering approximately 7600 km²), often present high heterogeneity arising from parent materials, pedogenesis, and human management. Global models like PLSR attempt to fit a single equation to the entire dataset, which often leads to the averaging of spectral features and reduced accuracy for local variations [18,25,30].
MBL overcomes this limitation by dynamically constructing a local model for each unknown sample using only its most spectrally similar neighbors from the calibration library. This approach is particularly effective for complex soil spectral libraries because it implicitly handles non-linear relationships by approximating them with a series of local linear models. Our results suggest that for the regional-scale soil spectral library in Southeastern China, implementing MBL strategies is crucial for maximizing the predictive power of low-cost sensors. The novelty of this approach lies in the integration of MEMS-based sensing with a local modeling strategy (MBL) at a regional scale. While previous studies often utilize small, homogeneous datasets or rigid global models, our results demonstrate that MBL effectively bypasses the ‘averaging’ effect of global equations. This allows low-cost sensors to achieve accuracies (R2 = 0.76 for SOM, R2 = 0.75 for TN) previously reserved for research-grade benchtop instruments in paddy environments.

4.3. Influence of Soil Depth and Management on Prediction Accuracy

The stratification of model performance by depth intervals revealed interesting patterns related to paddy soil pedogenesis and human management. For SOM and TN, predictive accuracy exhibited a clear monotonic decrease with depth. The superior prediction of SOM and TN in the topsoil layers (0–40 cm) aligns with the typical distribution of SOM in cultivated soils, where surface accumulation from crop residues and fertilization creates a wider range of SOM and TN values that facilitates robust model calibration. This zone corresponds to the plow layer, which is the most agronomically active and variable layer in rice cultivation systems [31]. Accuracy declined notably in the subsoil (60–100 cm), likely due to the lower concentration and variance of SOM, which reduces the signal-to-noise ratio in the NIR spectra.
In contrast, the predictions for clay and silt showed a complex, non-monotonic trend with depth. While accuracy was high in the surface layer, a marked decline occurred specifically in the 40–60 cm transition zone. This depth often corresponds to the plow pan, a compacted layer characterized by the accumulation of iron–manganese nodules and distinct hydrologic properties due to long-term flooding. These features likely introduce spectral interference that disrupts the correlation between clay minerals and the NIR signal. This depth-dependent limitation, a feature of managed paddy profiles, provides a necessary boundary condition for portable NIR applications and distinguishes our work from general surface-soil surveys by highlighting how anthropogenic management layers directly impact sensor reliability. The recovery of prediction accuracy in the deep subsoil (60–100 cm) suggests that the parent material at this depth is more homogeneous and free from the distinct anthropogenic disturbances found in the plow pan.

4.4. Implications for Precision Agriculture in Southeastern China

The integration of the NeoSpectra sensor with MBL algorithms offers a practical solution to the data bottleneck in precision agriculture. Conventional grid sampling (as performed in this study) is too costly for routine monitoring by smallholder farmers. However, the portability and affordability of the NeoSpectra device enable high-density spatial and temporal measurements, facilitating more effective, data-driven soil monitoring [32,33].
Crucially, to fully leverage the extensive data contained in the China Soil Spectral Library (CSSL), future research must focus on spectral transfer modeling [34,35,36]. The CSSL is primarily constructed using high-precision, research-grade spectrometers (e.g., ASD FieldSpec), which differ significantly in spectral resolution and range from MEMS-based sensors like NeoSpectra. This instrumental discrepancy creates a domain shift that often prevents the direct application of national-scale models to low-cost sensor data. Therefore, developing robust calibration transfer strategies or deep transfer learning is necessary. These methods can mathematically harmonize NeoSpectra readings to align with the ASD standard, effectively bridging the gap between affordable field sensors and high-quality national databases. Successful implementation of this transfer would allow local stakeholders to benefit from the massive, diverse training data of the national soil spectral library without the need for extensive, site-specific recalibration.
While the device currently requires air-dried, sieved samples for maximum accuracy, future work should also focus on calibrating the NeoSpectra sensor for in situ spectral measurements. This will require incorporating external parameter orthogonalization (EPO) or direct standardization (DS) algorithms to remove the interfering effects of soil moisture and surface roughness [15,37]. By combining spectral transfer modeling with in situ calibration, this technology can unlock the full potential of the national soil spectral library, optimizing nitrogen fertilization and soil health monitoring across the Yangtze Delta. Finally, while this study established the feasibility of the NeoSpectra sensor using robust machine learning methods (MBL), future research employing larger-scale libraries (N > 2000) should investigate deep learning approaches (e.g., Convolutional Neural Networks) together with advanced variable selection to potentially capture more complex spectral features [13,38,39].
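For readers unfamiliar with EPO, the following Python sketch (assuming NumPy) illustrates the general idea: paired spectra of the same samples in dry and moist states are differenced, the dominant directions of that difference are estimated by singular value decomposition, and spectra are projected onto the subspace orthogonal to those directions. The function name, the single-component default, and the synthetic data are hypothetical; no such calibration was performed in this study.

```python
import numpy as np

def epo_projection(X_dry, X_moist, n_components=1):
    """Build an EPO projection matrix from paired dry and moist spectra."""
    D = X_moist - X_dry                       # difference spectra (samples x bands)
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[:n_components].T                   # dominant moisture-related directions
    return np.eye(X_dry.shape[1]) - V @ V.T   # projector orthogonal to them

# Synthetic example (hypothetical): moisture adds one fixed spectral shape
# scaled by a sample-specific amount.
rng = np.random.default_rng(0)
X_dry = rng.normal(size=(10, 6))
moisture_effect = rng.normal(size=6)
X_moist = X_dry + rng.uniform(0.5, 2.0, size=(10, 1)) * moisture_effect
P = epo_projection(X_dry, X_moist)

# After projection, moist and dry spectra of the same sample coincide
# in this idealized one-direction case.
X_moist_corrected = X_moist @ P
X_dry_corrected = X_dry @ P
```

In practice the number of removed components would need to be tuned, because real moisture effects are not confined to a single spectral direction and removing too many components also discards soil information.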

5. Conclusions

This study demonstrates that the low-cost portable NeoSpectra sensor is a viable tool for assessing paddy soil fertility in Southeastern China. MBL proved to be the most robust modeling strategy, consistently outperforming global algorithms. The sensor achieved high predictive accuracy for SOM and TN, making it suitable for quantitative nutrient management, while predictions for pH and soil particle size fractions (clay, silt, and sand) served primarily as indicators of general trends. Furthermore, model performance was superior in the agronomically critical topsoil layers (0–40 cm). Overall, this technology enables cost-effective, high-density soil monitoring to support precision agriculture in intensive rice production systems.

Author Contributions

Conceptualization, S.C.; methodology, S.C. and M.L.; software, M.L. and Y.J.; validation, H.G. and D.Y.; formal analysis, M.L. and Y.J.; data curation, M.L. and Y.J.; writing—original draft preparation, S.C., M.L. and Y.J.; writing—review and editing, H.G., D.Y., J.Q., Q.Y. and Z.S.; visualization, M.L. and Y.J.; supervision, S.C.; funding acquisition, S.C. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (U24A20575, 42201054) and the open project of State Key Laboratory of Efficient Utilization of Arable Land in China, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences (EUAL-2025-01).

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be available upon reasonable request to the corresponding author.

Acknowledgments

We would like to express our gratitude to all colleagues and students involved in the field surveys and spectral measurements. During the preparation of this manuscript, the authors used Gemini 3 Pro for grammar, punctuation, and language refinement, without altering the content.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lal, R. Soil health and carbon management. Food Energy Secur. 2016, 5, 212–222.
  2. Viscarra Rossel, R.A.; Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46–54.
  3. Stenberg, B.; Viscarra Rossel, R.A.; Mouazen, A.M.; Wetterlind, J. Visible and near infrared spectroscopy in soil science. Adv. Agron. 2010, 107, 163–215.
  4. Ben-Dor, E.; Chabrillat, S.; Demattê, J.A.M.; Taylor, G.R.; Hill, J.; Whiting, M.L.; Sommer, S. Using imaging spectroscopy to study soil properties. Remote Sens. Environ. 2009, 113, S38–S55.
  5. Barra, I.; Haefele, S.M.; Sakrabani, R.; Kebede, F. Soil spectroscopy with the use of chemometrics, machine learning and pre-processing techniques in soil diagnosis: Recent advances—A review. TrAC Trends Anal. Chem. 2021, 135, 116166.
  6. Viscarra Rossel, R.A.; Behrens, T.; Ben-Dor, E.; Chabrillat, S.; Demattê, J.A.M.; Ge, Y.; Gomez, C.; Guerrero, C.; Peng, Y.; Ramirez-Lopez, L.; et al. Diffuse reflectance spectroscopy for estimating soil properties: A technology for the 21st century. Eur. J. Soil Sci. 2022, 73, e13271.
  7. Bellon-Maurel, V.; Fernandez-Ahumada, E.; Palagos, B.; Roger, J.M.; McBratney, A. Critical review of chemometric indicators for soil-condition assessment using near-infrared spectroscopy. TrAC Trends Anal. Chem. 2010, 29, 1045–1055.
  8. Soriano-Disla, J.M.; Janik, L.J.; Viscarra Rossel, R.A.; MacDonald, L.M.; McLaughlin, M.J. The performance of visible, near-, and mid-infrared reflectance spectroscopy for the prediction of soil physical, chemical, and biological properties. Appl. Spectrosc. Rev. 2014, 49, 139–186.
  9. d’Acqui, L.P.; Falsone, G.; Bonifacio, E.; Vingiani, S. A comparison between portable and benchtop NIR instruments for the determination of soil properties. Geoderma 2018, 311, 25–34.
  10. Wijewardane, N.K.; Ge, Y.; Wills, S.; Lubeley, M. In-situ and laboratory evaluation of a portable spectrometer for soil organic carbon and total nitrogen sensing. Soil Sci. Soc. Am. J. 2016, 80, 744–754.
  11. Sharififar, A.; Tso, C.P.; Lee, S.K.; Tang, K.H.D. Evaluating a low-cost portable NIR spectrometer for the prediction of soil organic and total carbon using different calibration models. Catena 2019, 181, 104088.
  12. Mitu, S.M.; Smith, C.; Sanderman, J.; Ferguson, R.R.; Shepherd, K.; Ge, Y. Evaluating consistency across multiple NeoSpectra (compact Fourier transform near-infrared) spectrometers for estimating common soil properties. Soil Sci. Soc. Am. J. 2024, 84, 1324–1339.
  13. Sanderman, J.; Partida, C.; Safanelli, J.L.; Shepherd, K.; Ge, Y.; Mitu, S.M.; Ferguson, R. Application of a Handheld Near Infrared Spectrophotometer to Farm-Scale Soil Carbon Monitoring. Eur. J. Soil Sci. 2025, 76, e70053.
  14. Huang, Y.C.; Ng, W.; Minasny, B.; Tang, Y.; McBratney, A.B. Accessible Soil Spectroscopy: Evaluating Low-Cost Vis–NIR Spectrometers for Resource-Constrained Environments. Eur. J. Soil Sci. 2025, 76, e70248.
  15. Ji, W.; Viscarra Rossel, R.A.; Shi, Z. Accounting for the effects of soil moisture in the prediction of soil organic matter and carbonate in paddy soils with visible and near-infrared spectroscopy. Geoderma 2015, 259, 28–36.
  16. Chakraborty, S.; Weindorf, D.C.; Li, B.; Ali, M.N.; Paul, S.; Roy, D.P. Paddy soil property prediction using a portable X-ray fluorescence spectrometer in the lower Gangetic plain of India. Geoderma 2017, 288, 120–129.
  17. Dai, L.; Xue, J.; Lu, R.; Wang, Z.; Chen, Z.; Yu, Q.; Shi, Z.; Chen, S. In-situ prediction of soil organic carbon contents in wheat-rice rotation fields via visible near-infrared spectroscopy. Soil Environ. Health 2024, 2, 100113.
  18. Dai, L.; Wang, Z.; Zhuo, Z.; Ma, Y.; Shi, Z.; Chen, S. Prediction of soil organic carbon fractions in tropical cropland using a regional visible and near-infrared spectral library and machine learning. Soil Tillage Res. 2025, 245, 106297.
  19. Elzeiny, W.E.; Eltagoury, Y.M.; Sabry, Y.M.; Khalil, D.A. On-chip photonic MEMS coupled-cavity spectrometer. IEEE Photonics Technol. Lett. 2023, 35, 951–954.
  20. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
  21. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  22. Wang, N.; Chen, S.; Huang, J.; Frappart, F.; Taghizadeh, R.; Zhang, X.; Wigneron, J.P.; Xue, J.; Xiao, Y.; Peng, J.; et al. Global soil salinity estimation at 10 m using multi-source remote sensing. J. Remote Sens. 2024, 4, 0130.
  23. Quinlan, R.J. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, TAS, Australia, 16–18 November 1992; pp. 343–348.
  24. Ramirez-Lopez, L.; Behrens, T.; Schmidt, K.; Stevens, A.; Demattê, J.A.M.; Scholten, T. The spectrum-based learner: A new local approach for modeling soil vis–NIR spectra of complex datasets. Geoderma 2013, 195, 268–279.
  25. Wang, Z.; Chen, S.; Lu, R.; Zhang, X.; Ma, Y.; Shi, Z. Non-linear memory-based learning for predicting soil properties using a regional vis-NIR spectral library. Geoderma 2024, 441, 116752.
  26. Ji, W.; Shi, Z.; Huang, J.; Li, S. In situ measurement of some soil properties in paddy soil using visible and near-infrared spectroscopy. PLoS ONE 2014, 9, e105708.
  27. Yang, M.; Chen, S.; Xu, D.; Hong, Y.; Li, S.; Peng, J.; Ji, W.; Guo, X.; Zhao, X.; Shi, Z. Strategies for predicting soil organic matter in the field using the Chinese Vis-NIR soil spectral library. Geoderma 2023, 433, 116461.
  28. Liu, Y.; Lu, Y.; Chen, D.; Zheng, W.; Ma, Y.; Pan, X. Simultaneous estimation of multiple soil properties under moist conditions using fractional-order derivative of vis-NIR spectra and deep learning. Geoderma 2023, 438, 116653.
  29. Nocita, M.; Stevens, A.; van Wesemael, B.; Aitkenhead, M.; Bachmann, M.; Barthès, B.; Dor, E.B.; Brown, D.J.; Clairotte, M.; Csorba, A.; et al. Soil spectroscopy: An alternative to wet chemistry for soil monitoring. Adv. Agron. 2015, 132, 139–159.
  30. Lotfollahi, L.; Delavar, M.A.; Biswas, A.; Fatehi, S.; Scholten, T. Spectral prediction of soil salinity and alkalinity indicators using visible, near-, and mid-infrared spectroscopy. J. Environ. Manag. 2023, 345, 118854.
  31. Zhang, S.; Huang, G.; Zhang, Y.; Lv, X.; Wan, K.; Liang, J.; Feng, Y.; Dao, J.; Wu, S.; Zhang, L.; et al. Sustained productivity and agronomic potential of perennial rice. Nat. Sustain. 2023, 6, 28–38.
  32. Chen, S.; Arrouays, D.; Mulder, V.L.; Poggio, L.; Minasny, B.; Roudier, P.; Libohova, Z.; Lagacherie, P.; Shi, Z.; Hannam, J.; et al. Digital mapping of GlobalSoilMap soil properties at a broad scale: A review. Geoderma 2022, 409, 115567.
  33. Peng, Y.; Ben-Dor, E.; Biswas, A.; Chabrillat, S.; Demattê, J.A.; Ge, Y.; Gholizadeh, A.; Gomez, C.; Guerrero, C.; Herrick, J.; et al. Spectroscopic solutions for generating new global soil information. The Innovation 2025, 6, 100839.
  34. Shi, Z.; Wang, Q.; Peng, J.; Ji, W.; Liu, H.; Li, X.; Viscarra Rossel, R.A. Development of a national VNIR soil-spectral library for soil classification and prediction of organic matter concentrations. Sci. China Earth Sci. 2014, 57, 1671–1680.
  35. Viscarra Rossel, R.A.; Shen, Z.; Lopez, L.R.; Behrens, T.; Shi, Z.; Wetterlind, J.; Sudduth, K.A.; Stenberg, B.; Guerrero, C.; Gholizadeh, A.; et al. An imperative for soil spectroscopic modelling is to think global but fit local with transfer learning. Earth-Sci. Rev. 2024, 254, 104797.
  36. Safanelli, J.L.; Hengl, T.; Parente, L.L.; Minarik, R.; Bloom, D.E.; Todd-Brown, K.; Gholizadeh, A.; Mendes, W.D.S.; Sanderman, J. Open Soil Spectral Library (OSSL): Building reproducible soil calibration models through open development and community engagement. PLoS ONE 2025, 20, e0296545.
  37. Minasny, B.; Bandai, T.; Ghezzehei, T.A.; Huang, Y.C.; Ma, Y.; McBratney, A.B.; Ng, W.; Norouzi, S.; Padarian, J.; Sharififar, A.; et al. Soil science-informed machine learning. Geoderma 2024, 452, 117094.
  38. Zhang, X.; Chen, S.; Xue, J.; Wang, N.; Xiao, Y.; Chen, Q.; Hong, Y.; Zhou, Y.; Teng, H.; Hu, B.; et al. Improving model parsimony and accuracy by modified greedy feature selection in digital soil mapping. Geoderma 2023, 432, 116383.
  39. Hong, Y.; Chen, S.; Hu, B.; Wang, N.; Xue, J.; Zhuo, Z.; Yang, Y.; Chen, Y.; Peng, J.; Liu, Y.; et al. Spectral fusion modeling for soil organic carbon by a parallel input-convolutional neural network. Geoderma 2023, 437, 116584.
Figure 1. The study area and location of sampling sites for model calibration and validation.
Figure 2. Location-based Kennard–Stone algorithm for splitting the calibration and validation sets.
Figure 3. Spectral characteristics of reflectance spectra (a) and principal component analysis (PCA) score (b) of the collected paddy soil samples using NeoSpectra sensor. In the reflectance spectra plot, the black line represents the global mean, the grey shaded area indicates the standard deviation, and the colored lines correspond to the four sampling depth intervals. In the PCA score plot, the first two principal components (PC1 vs. PC2) illustrate the distribution of soil samples colored by four sampling depth intervals.
Figure 4. The scatter plots of the best model (MBL) in spectral prediction of SOM (a), TN (b), pH (c), clay (d), silt (e), and sand (f).
Figure 5. Comparison of model performance in four depth intervals for SOM (a), TN (b), pH (c), clay (d), silt (e), and sand (f).
Table 1. Descriptive statistics of SOM, TN, pH, clay, silt, and sand. N, number of samples; Max, maximum; Min, minimum; SD, standard deviation; CV, coefficient of variation (%).

| Soil property | Dataset | N | Max | Min | Mean | SD | CV (%) |
|---|---|---|---|---|---|---|---|
| SOM (g kg⁻¹) | Whole | 995 | 68.27 | 2.27 | 14.89 | 10.36 | 69.58 |
| SOM (g kg⁻¹) | Calibration | 746 | 68.27 | 2.27 | 15.43 | 10.84 | 70.25 |
| SOM (g kg⁻¹) | Validation | 249 | 49.89 | 2.42 | 13.29 | 8.57 | 64.47 |
| TN (g kg⁻¹) | Whole | 995 | 3.44 | 0.2 | 0.91 | 0.56 | 61.54 |
| TN (g kg⁻¹) | Calibration | 746 | 3.44 | 0.2 | 0.93 | 0.58 | 62.37 |
| TN (g kg⁻¹) | Validation | 249 | 3.17 | 0.21 | 0.84 | 0.49 | 58.33 |
| pH | Whole | 995 | 8.55 | 4.4 | 6.93 | 0.7 | 10.1 |
| pH | Calibration | 746 | 8.55 | 4.4 | 6.87 | 0.72 | 10.48 |
| pH | Validation | 249 | 8.45 | 4.54 | 7.1 | 0.61 | 8.59 |
| Clay (%) | Whole | 995 | 83.74 | 1.02 | 11.57 | 9.4 | 81.24 |
| Clay (%) | Calibration | 746 | 83.74 | 1.02 | 11.97 | 9.88 | 82.54 |
| Clay (%) | Validation | 249 | 66.64 | 2.01 | 10.36 | 7.68 | 74.13 |
| Silt (%) | Whole | 995 | 92.19 | 9.58 | 68.58 | 10.23 | 14.92 |
| Silt (%) | Calibration | 746 | 92.19 | 12.12 | 68.03 | 10.66 | 15.67 |
| Silt (%) | Validation | 249 | 86.8 | 9.58 | 70.24 | 8.63 | 12.29 |
| Sand (%) | Whole | 995 | 53.8 | 2.83 | 19.85 | 7.68 | 38.69 |
| Sand (%) | Calibration | 746 | 53.8 | 2.83 | 20 | 7.99 | 39.95 |
| Sand (%) | Validation | 249 | 50.95 | 3.82 | 19.4 | 6.69 | 34.48 |
Table 2. The coefficient of determination (R²), Lin’s concordance correlation coefficient (LCCC), and the ratio of performance to deviation (RPD) for six soil properties predicted by four models using a NeoSpectra spectrometer. The metrics for the best model are marked in bold font.

| Soil Property | PLSR R² | PLSR LCCC | PLSR RPD | RF R² | RF LCCC | RF RPD | Cubist R² | Cubist LCCC | Cubist RPD | MBL R² | MBL LCCC | MBL RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SOM | 0.59 | 0.7 | 1.56 | 0.72 | 0.83 | 1.89 | 0.74 | 0.86 | 1.97 | **0.76** | **0.87** | **2.05** |
| TN | 0.57 | 0.68 | 1.53 | 0.73 | 0.83 | 1.92 | 0.74 | 0.85 | 1.97 | **0.75** | **0.86** | **2.01** |
| pH | 0.46 | 0.62 | 1.35 | 0.49 | 0.64 | 1.41 | 0.5 | 0.67 | 1.42 | **0.53** | **0.71** | **1.46** |
| Clay | 0.34 | 0.57 | 1.23 | 0.43 | 0.59 | 1.33 | 0.37 | 0.57 | 1.26 | **0.48** | **0.67** | **1.39** |
| Silt | 0.44 | 0.55 | 1.33 | 0.47 | 0.62 | 1.38 | 0.45 | 0.62 | 1.34 | **0.55** | **0.7** | **1.49** |
| Sand | 0.26 | 0.36 | 1.16 | 0.44 | 0.59 | 1.34 | 0.5 | 0.68 | 1.42 | **0.53** | **0.71** | **1.46** |
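For readers who wish to compute the three metrics reported in Table 2 on their own validation sets, a minimal sketch follows. It uses common textbook definitions, which are assumptions here because this section does not spell out the formulas: R² as one minus the ratio of residual to total sum of squares, LCCC from population (1/n) moments, and RPD as the sample standard deviation of the observations divided by the RMSE. The function name and the toy values are hypothetical.

```python
import numpy as np


def validation_metrics(y_obs, y_pred):
    """Return R2, Lin's concordance correlation coefficient, and RPD for one validation set."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))

    # Coefficient of determination: 1 - SSE / SST (assumed definition).
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

    # Lin's CCC penalises both scatter and systematic bias (location and scale shift).
    cov = np.mean((y_obs - y_obs.mean()) * (y_pred - y_pred.mean()))
    lccc = 2.0 * cov / (y_obs.var() + y_pred.var() + (y_obs.mean() - y_pred.mean()) ** 2)

    # Ratio of performance to deviation: spread of the observations relative to the error.
    rpd = y_obs.std(ddof=1) / rmse
    return {"R2": r2, "LCCC": lccc, "RPD": rpd}


# Toy values for illustration only (not data from the study).
metrics = validation_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(metrics)
```

Under these definitions R² and RPD are closely linked (approximately R² = 1 − 1/RPD² for large samples), and the values in Table 2 are consistent with that relationship, for example RPD = 2.05 alongside R² = 0.76 for SOM.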
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, M.; Jin, Y.; Guo, H.; Yu, D.; Qian, J.; Yu, Q.; Shi, Z.; Chen, S. Low-Cost Portable Near-Infrared Spectroscopy for Predicting Soil Properties in Paddy Fields of Southeastern China. Sensors 2026, 26, 1805. https://doi.org/10.3390/s26061805
