1. Introduction
Agriculture is a fundamental source of income and food worldwide. To enhance both the quantity and quality of agricultural production, it is essential to monitor the quality of water and soil [1]. Healthy soils not only contribute to food security and nutritional health but also play a crucial role in mitigating climate change and reducing environmental pollutants. However, agriculture in semiarid regions faces significant challenges due to soil texture and composition, leaching processes, and interactions with groundwater, all of which affect crop productivity and sustainability [2].
The complex phenomena mentioned above can be characterized through patterns that reveal qualitative differences between soil samples. These patterns can be analyzed using statistical techniques together with experiments that measure the electromagnetic response of the soil. Because these phenomena generate the characteristic patterns of agricultural soils, studying them is relevant for recognizing different soil types. In this context, the characterization of agricultural soils using interdigitated sensors (IDSs) combined with principal component analysis (PCA) presents a powerful approach for detecting complex patterns in multivariate datasets [3].
IDSs measure electromagnetic properties that correlate with the chemical and physical composition of the soil, providing valuable insights into factors such as moisture, salt content, and the presence of organic or inorganic compounds [4]. These microwave sensors are well-suited for diverse environments due to their high sensitivity, rapid response time, low manufacturing cost, and simple design [5,6,7]. Their operation is based on the interaction of electromagnetic waves with the analyte under test, which affects the S11 reflection coefficient. Shifts in the resonant frequency or changes in the amplitude of the S11 parameter can be correlated with variations in soil or water composition, enabling precise pattern recognition [8].
PCA further enhances this analysis by reducing data dimensionality, identifying key sources of variability, and facilitating pattern visualization [
9,
10]. PCA is a multivariate technique used to analyze relationships between variables within a dataset. Its objective is to reduce the dimensionality of the data by transforming the original set of variables into a smaller set of uncorrelated variables known as principal components. These principal components are linear combinations of the original variables and are ordered so that each captures the maximum possible remaining variance in the data. By retaining only the first two or three principal components, the data can be plotted and visually inspected to extract patterns that characterize the dataset. In addition, discarding the low-variance components can reduce noise in the original dataset, improving data interpretation [10,11].
Mathematically, PCA is a linear algebra problem: finding new variables that are linear combinations of the original variables. When these new variables are ordered so that each successively captures the largest remaining variance of the dataset, they are called principal components. Each principal component is associated with an eigenvector and an eigenvalue of the covariance (or correlation) matrix of the data. The eigenvectors define the directions in the multidimensional space of the dataset along which the variance is maximized. The eigenvalues measure the variance captured along each of these directions; dividing an eigenvalue by the sum of all eigenvalues gives the proportion of variance explained by the corresponding principal component [
12]. Previous studies have demonstrated the effectiveness of PCA in distinguishing different soil types and their interactions with agrochemicals [
13]. While IDSs have shown promise in detecting soil properties at local scales, their application in semiarid environments remains limited [
14,
15].
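The linear algebra described above can be illustrated with a minimal sketch, assuming PCA is carried out on the correlation matrix of a data table whose columns are the measured variables. The function name `pca_correlation` and the synthetic input are hypothetical and serve only as an illustration; they are not the data or the software used in this study.

```python
import numpy as np

def pca_correlation(X):
    """PCA by eigen-decomposition of the correlation matrix.

    X: 2-D array, rows = observations, columns = variables.
    Returns eigenvalues (descending), eigenvectors (as columns),
    percentage of variance per component, and component scores.
    """
    X = np.asarray(X, dtype=float)
    # Standardize every column to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()
    scores = Z @ eigvecs                      # coordinates along each component
    return eigvals, eigvecs, explained, scores

# Synthetic placeholder data (assumption): four strongly correlated columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.05 * rng.normal(size=(200, 1)) for _ in range(4)])

eigvals, eigvecs, explained, scores = pca_correlation(X)
print(np.round(explained, 2))
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, and the sample variance of each score column equals the corresponding eigenvalue.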
The increasing demand for “smart agriculture” solutions highlights the importance of integrating modern sensing techniques into soil monitoring. Smart agriculture aims to optimize agricultural production by utilizing sensor networks, measuring equipment, and computational methods to enhance resource efficiency [16,17,18,19].
Building on these observations, this study investigates the combined capabilities of PCA-based pattern recognition and microwave IDSs for recognizing the patterns of agricultural soils at varying depths. Soil samples were collected and analyzed from depths of 0 to 90 cm following a standardized protocol to ensure representativeness. The results provide insight into soil composition patterns and offer evidence of interactions between soil layers and groundwater. These findings have direct implications for understanding glyphosate behavior in semiarid agricultural soils. The El Potrillo ranch, located in the state of Chihuahua, Mexico, provides an ideal setting for studying these dynamics. Its proximity to the Bustillos lagoon offers an opportunity to investigate soil electromagnetic patterns, as well as their interactions with water sources and agrochemicals.
2. Materials and Methods
2.1. Soil Sampling and Site Description
To conduct the pattern recognition study in agricultural soils, four soil samples were collected from the El Potrillo ranch in Chihuahua, Mexico; coordinates: 28°31′41.0″ N, 106°41′47.5″ W (
Figure 1). This site was selected because of its importance in cultivating corn, cotton, sorghum, and chili crops and because of its proximity to Bustillos lagoon, which is recognized as a migratory bird sanctuary. This proximity makes it possible to study the soil patterns in conjunction with those of the lagoon waters analyzed in reference [20]. In addition, these soils are representative of agricultural soils in semiarid regions, where soil texture, leaching processes, and interactions with groundwater significantly influence agricultural productivity and soil quality.
At each sampling point, a 30 cm × 30 cm excavation was performed, and agricultural soil samples were collected at four depths: M1 (surface), M2 (30 cm), M3 (60 cm), and M4 (90 cm). One kilogram of soil was collected at each depth and immediately sealed in airtight bags to prevent moisture loss and external contamination, following a standardized protocol to guarantee their representativeness and consistency.
To establish comparison materials, distilled water (DT), deionized water (DI), and commercial glyphosate at two concentrations (GLY-05% and GLY-01%) were included in this study. Distilled and deionized water were chosen as standard comparison liquids due to their high purity and minimal ion presence, while glyphosate was selected to evaluate sensor responses to a chemically distinct, more viscous liquid, and because it is the herbicide that is applied to these agricultural soils.
2.2. Sample Preparation for Microwave Sensing
The collected soil samples were prepared for microwave characterization through a homogenization procedure. A mass of 500 g of each soil sample was mixed with 1 L of deionized water to standardize moisture content, thereby eliminating variability due to natural differences in soil humidity. The mixture was then processed in a Vitamix industrial blender for 60 min, ensuring uniform particle dispersion and the removal of agglomerates. This optimized soil interaction with electromagnetic waves during sensor characterization.
2.3. Microwave Sensor Fabrication and Vector Network Analyzer Measurements
An array of four interdigitated microwave sensors with different geometrical configurations was used for electromagnetic characterization. This array includes designs with 3, 6, 9, and 12 interdigitated fingers, referred to as 3F, 6F, 9F, and 12F, shown in Figure 2a–d. All sensors were fabricated on flexible polyimide substrates (50.8 µm thick) with a relative dielectric constant of 3.2, coated with a 35 µm copper layer (Pyralux® AG, DuPont™, Wilmington, DE, USA). The interdigitated electrode patterns were defined using a direct laser writing photolithography system (MicroWriter ML3, Durham Magneto Optics Ltd., Durham, UK). This equipment allows the resolution to be adjusted dynamically, with feature sizes between 0.6 µm and 5 µm selectable during exposure. Each substrate was cut into 2.5 cm × 4 cm rectangles using a wire saw (model 850, South Bay Technology, San Clemente, CA, USA). All substrates were then cleaned with isopropyl alcohol for 10 min in an ultrasonic bath (Branson 0510, Brookfield, CT, USA). Next, each substrate was coated with a positive photoresist (AR-P 3120, Allresist GmbH, Strausberg, Germany) using a spin coater (SPIN 20, APT Automation, Bünde, Germany) at 4500 rpm for 120 s to achieve a uniform 3 µm film over the entire substrate. The coated substrates were prebaked at 120 °C for 60 s to promote adhesion, and the sensor design was then exposed onto the photoresist using the laser writer at a dose of 60 mJ/cm². Development was carried out with AR-300-26 developer (Allresist, Strausberg, Germany) for 40 s. The exposed metal was then etched in a ferric chloride solution for approximately 20 min, and the remaining photoresist was removed by ultrasonic cleaning in acetone.
Each sensor was then fitted with a high-frequency SMA connector (model 73251-1150, Molex, Lisle, IL, USA) compatible with 50 Ω coaxial cables, which minimizes losses due to impedance mismatch and enables connection to the vector network analyzer. Microwave measurements were performed using a vector network analyzer (VNA) (ENA E5063A, Keysight Technologies, Santa Rosa, CA, USA) with a frequency range of 100 kHz–14 GHz in a two-port, 50 Ω configuration. The instrument was calibrated using an electronic calibration module (N7553A, Keysight), which includes open, short, and load standards.
For each of the agricultural soil samples M1, M2, M3, and M4, distilled water (DT), deionized water (DI), and commercial glyphosate at two concentrations (GLY-05% and GLY-01%), the scattering parameter S11 was recorded every 60 s over a 50 min period. This gave 50 data points per sample and sensor, for a total of 1600 measurements across the eight samples and four sensors. Electromagnetic characterization with the VNA provides the scattering parameters, in particular the reflection parameter S11, which is expressed in decibels as S11 = 20 log10(R), where R = Aref/Ainc. Here, Ainc and Aref represent the amplitudes of the incident and reflected waves, respectively.
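As a brief illustration of the expression above and of the measurement count, the following sketch converts an amplitude ratio to decibels and reproduces the total of 1600 readings. The function name `s11_db` and the example amplitudes are hypothetical; a VNA reports S11 directly, so this is only a worked check of the arithmetic.

```python
import math

def s11_db(a_ref, a_inc):
    """Reflection parameter S11 in dB from reflected and incident amplitudes."""
    if a_ref <= 0 or a_inc <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(a_ref / a_inc)

# Example amplitudes (assumed values): 10% of the incident amplitude is reflected.
print(s11_db(0.1, 1.0))

# Measurement bookkeeping taken from the text.
n_samples = 8            # M1, M2, M3, M4, DT, DI, GLY-05%, GLY-01%
n_sensors = 4            # 3F, 6F, 9F, 12F
points_per_sample = 50   # one reading every 60 s over 50 min
total_measurements = n_samples * n_sensors * points_per_sample
print(total_measurements)
```

A ratio of 1 (total reflection) corresponds to 0 dB, and smaller reflected amplitudes give increasingly negative S11 values.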
The reflection parameter S11 was measured for the soil samples M1 through M4, deionized water (DI), distilled water (DT), and the commercial glyphosate solutions (GLY-05% and GLY-01%). To keep the sample size the same, each sample was poured into a 10 mL glass beaker with an internal diameter of 22 mm to ensure that the electric field of the electromagnetic wave fully interacted with the liquid under test. Each sensor was then carefully immersed in each sample, ensuring that the interdigitated electrodes were completely submerged, as illustrated in Figure 2e. This procedure was designed to maximize the interaction between the electromagnetic field and the sample, thereby enabling a thorough evaluation of the reflection characteristics and dielectric behavior of each substance.
All measurements were conducted under controlled conditions at a constant temperature of 23 °C to ensure experimental consistency and minimize potential fluctuations that might affect the accuracy and reproducibility of the results. Maintaining this stable temperature helped to reduce thermal variations that could otherwise influence the permittivity or conductivity of the test samples.
The resulting dataset was processed using principal component analysis (PCA) via OriginPro 9.1.0 software to identify significant patterns in electromagnetic behavior, which were then used for pattern recognition in agricultural soils.
More information on sensor geometry and fabrication processes, the samples S1–S7 from Bustillos lagoon shown in
Figure 1, and techniques of electromagnetic characterization can be found in a previous work [
20].
3. Results and Discussion
To evaluate the capability of the interdigitated sensor array for pattern recognition between different types of soil and water samples, the obtained measurements were processed using principal component analysis (PCA).
Table 1 shows the eigenvalues, the percentage of variance, and the cumulative percentage of variance for the sample set M1, M2, M3, M4, DI, DT, GLY-01%, and GLY-05%. The percentage of variance is each eigenvalue expressed as a percentage of the sum of all eigenvalues, while the cumulative percentage of variance is the running sum of these percentages, taken from the largest eigenvalue to the smallest. According to
Table 1, the first principal component accounts for the majority of the variance in the multivariate dataset, explaining 99.73% of the total variability. The second principal component contributes only 0.26%, and the third and fourth components contribute even less.
Figure 3 shows the PCA results for agricultural soil samples (M1, M2, M3, and M4) collected at various depths, along with deionized water (DI), distilled water (DT), and glyphosate at two concentrations (GLY-05% and GLY-01%) for comparison. The plot shows that the first principal component (PC1) accounts for 99.73% of the total variance, while the second component (PC2) explains only 0.26%. Together, these components capture 99.99% of the total variance, suggesting that most of the meaningful information is embedded in PC1, allowing for a simplified but highly effective representation of multivariate data.
When examining the sample distribution in the PC1–PC2 plane, it becomes apparent that DI and DT exhibit minimal variability, indicating a high degree of similarity in their electromagnetic response. This is consistent with the controlled nature of both types of water, which are processed to remove impurities and therefore tend to show homogeneous characteristics.
In contrast, the projections corresponding to commercial glyphosate at 5% (GLY-05%) are positioned far from both water and soil samples, indicating a significantly different spectral pattern. Interestingly, the 1% glyphosate solution (GLY-01%) aligns more closely with the soil samples, specifically M1–M4, suggesting a greater affinity or interaction with the soil matrix at lower concentrations. These findings are consistent with previous research indicating that lower concentrations of glyphosate exhibit increased interaction and retention in deeper soil layers due to adsorption phenomena. This behavior has been observed in studies modeling solute transport in soil interfaces [
21], as well as in numerical simulations of water flow and solute transport affected by soil layering [
22]. Furthermore, Fenn et al. [
23] demonstrated that the adsorption and desorption of glyphosate increase with soil depth, reinforcing the notion of enhanced retention at greater depths.
A closer inspection of the soil samples reveals that M4, collected from the deepest layer, exhibits the greatest proximity to GLY-01%. This finding may imply that agrochemicals applied at the surface tend to percolate and accumulate in deeper soil layers, where higher density, reduced aeration, and greater moisture retention enhance compound adsorption [
24].
As we move toward shallower depths (from M4 to M1), a gradual shift in PC1 positioning is observed: the samples become increasingly distant from glyphosate and closer to DI and DT. This trend may reflect a reduced interaction between the upper soil layers and glyphosate, possibly due to degradation, volatilization, or limited adsorption. In particular, the surface soil sample (M1) exhibits a PCA profile more closely aligned with the water samples, indicating a reduced glyphosate retention capacity and greater susceptibility to external environmental influences. These results are consistent with previous observations that herbicides such as glyphosate typically undergo significant degradation and volatilization at the soil surface due to increased exposure to environmental conditions. Specifically, Qingshan Li et al. [
25] attributed this behavior to surface-level photodegradation, microbial activity, and changes in compound mobility associated with the lower bulk density and increased porosity characteristic of the upper soil layers. Similar mechanisms have also been extensively reviewed by Borggaard et al. [
26].
Figure 4 shows the values of the principal components PC1 and PC2 resulting from an analysis that includes agricultural soil samples (M1, M2, M3, and M4), distilled water (DT), deionized water (DI), glyphosate at two concentrations (GLY-05% and GLY-01%), and water samples from Bustillos lagoon (S1, S2, S3, S4, S5, S6, and S7), as reported in reference [
20]. This analysis enables a visual assessment of the correlation between the characteristics of the soil samples and those of the lagoon water samples.
The relationship between water samples S1–S7 from Bustillos lagoon and soil samples M1–M4 from El Potrillo ranch, analyzed in the two-dimensional space defined by PC1 and PC2, reveals distinctive patterns that reflect the particular behavior of each group. The PC1 results indicate low variability among the lagoon water samples, while the limited variance observed in PC2 suggests consistent patterns in these samples, likely due to the natural homogenization typical of aqueous environments.
The Bustillos lagoon water samples (S1–S7) cluster within a specific region of the PCA plot, indicating high similarity in their electromagnetic behavior. In contrast, the soil samples exhibit increasing dispersion along PC1 as the extraction depth increases, suggesting greater heterogeneity in the soil properties at deeper levels. Notably, the deepest soil sample (M4) shows a higher degree of similarity with the lagoon water samples in PC1. According to Hansen et al. [
27] and Lu et al. [
24], deep soil layers can act as hotspots for geochemical processes that influence solute transport. This behavior is associated with increased soil density and reduced porosity at greater depths, resulting in more pronounced chemical exchange between groundwater and deeper soil strata, driven by mechanisms such as mineral adsorption and solute diffusion.
Conversely, surface soil samples such as M1 exhibit a lower correlation with the lagoon waters, possibly due to natural leaching and volatilization processes occurring in these more exposed layers, along with reduced interaction with groundwater. Geographical proximity may also play a role. As shown in
Figure 1, sample S1 is located near the extraction site of soil samples M1, M2, M3, and M4. In
Figure 4, the PCA value of S1 is observed to be closely aligned with those of the soil samples, particularly in PC1. This suggests a potential direct interaction between surface water and soil, influenced by factors such as water chemistry and soil permeability [
25].
Additionally, the DT and DI water samples showed remarkably similar behavior, with closely clustered PC1 and PC2 values. This high correlation is attributed to their similar physicochemical properties—such as low conductivity and the absence of dissolved ions—which clearly distinguishes them from both the lagoon water and soil samples. These results demonstrate the effectiveness of principal component analysis in discriminating between different types of samples based on pattern recognition.
Table 2 shows the eigenvalues of the correlation matrix for the set of samples M1–M4, DI, DT, GLY-01%, and GLY-05%, together with the water samples (S1, S2, S3, S4, S5, S6, and S7) from Bustillos lagoon reported in reference [
20]. The table includes the percentage of variance explained by each principal component and their cumulative contribution. PC1 accounts for most of the variance in the multivariate data, explaining 97.33% of the total variance, demonstrating its strong influence on the dataset, whereas PC2 contributes an additional 2.21%. In contrast, the third and fourth components together explain only 0.46% of the variance, reinforcing that most of the original information is concentrated in the first two principal components. These results demonstrate that dimensionality reduction is effective for pattern recognition in this analysis.
To evaluate the possibility of improving the understanding and discrimination of patterns in agricultural soils by increasing the explained variance of PC1 and PC2, a principal component analysis was performed by sequentially eliminating each of the interdigitated sensors. For this analysis, combinations of three sensors featuring different interdigitated finger geometries were tested, as shown in
Figure 5, which shows the following sensor configurations: (a) 6F–9F–12F, (b) 3F–9F–12F, (c) 3F–6F–12F, and (d) 3F–6F–9F.
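The sensor-exclusion procedure described above can be sketched as follows, assuming the data are arranged with one column per sensor and that PCA is computed on the correlation matrix. The helper names `pc1_share` and `leave_one_sensor_out` are hypothetical, and the input is synthetic placeholder data built so that the first column is only weakly correlated with the rest; it does not reproduce the measurements of this study.

```python
import numpy as np

SENSORS = ("3F", "6F", "9F", "12F")

def pc1_share(X):
    """Percentage of total variance explained by PC1 (correlation-matrix PCA)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))  # ascending order
    return 100.0 * eigvals[-1] / eigvals.sum()

def leave_one_sensor_out(X, names=SENSORS):
    """Map each excluded sensor name to the PC1 share of the remaining columns."""
    shares = {}
    for drop, name in enumerate(names):
        keep = [i for i in range(len(names)) if i != drop]
        shares[name] = pc1_share(X[:, keep])
    return shares

# SYNTHETIC placeholder data (assumption): the "3F" column is deliberately noisy.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
X = np.column_stack([
    0.5 * base + rng.normal(size=200),   # stands in for 3F: weakly correlated
    base + 0.1 * rng.normal(size=200),   # stands in for 6F
    base + 0.1 * rng.normal(size=200),   # stands in for 9F
    base + 0.1 * rng.normal(size=200),   # stands in for 12F
])

shares = leave_one_sensor_out(X)
best = max(shares, key=shares.get)       # exclusion that maximizes the PC1 share
for name, share in shares.items():
    print(f"without {name}: PC1 explains {share:.2f}%")
```

With this construction, excluding the weakly correlated column gives the largest PC1 share, which mirrors the type of comparison made between the four three-sensor configurations.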
In
Figure 5, it can be seen that when sensor 3F is excluded, the variance explained by PC1 reaches its maximum (99.01%), indicating that this sensor has a lower correlation with the other interdigitated sensors. This suggests that sensor 3F introduces redundant or non-complementary information compared to the others. To corroborate this result, the correlation matrix r between the sensors was calculated and visualized using a heatmap (Figure 6). In this heatmap, the row and column corresponding to sensor 3F show the lowest correlations with the rest of the sensors, represented by the coolest colors.
This identification of sensor 3F as a redundant element within the sensor array aligns with previous findings in the literature, where principal component analysis (PCA) has been used to enhance data interpretability by removing variables with a low contribution or weak correlation. Sena et al. [
13] demonstrated that excluding redundant variables improved the accuracy of soil property characterization. Similarly, Zhuo et al. [
28] and Huang et al. [
29] reported that the application of PCA in sensor network design and environmental modeling allowed for the removal of weakly correlated variables, improving both the representativeness of the data and the performance of the predictive analyses. These studies support the methodological approach adopted in the present work, confirming that strategic sensor exclusion can significantly improve the effectiveness of pattern recognition in agricultural soil analysis.
Additionally, the correlation matrix r was computed for each combination of three sensors, as shown in Figure 7. In this figure, the heatmap corresponding to the exclusion of sensor 3F (Figure 7a) exhibits the highest intensity, indicating a stronger correlation among the remaining sensors (6F, 9F, and 12F).
To quantify the correlation among the sensors, the determinant of the correlation matrix r was calculated for each combination of three sensors (Table 3). A determinant close to zero indicates a stronger correlation among the remaining sensors. When sensor 3F is excluded, the determinant of r is the smallest in magnitude (4.31 × 10⁻³), reinforcing the conclusion that sensors 6F, 9F, and 12F exhibit a higher degree of mutual correlation in the absence of sensor 3F.
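The determinant criterion can be illustrated with a short sketch, assuming three-column data tables with one column per sensor. The function name `corr_determinant` and both synthetic datasets are hypothetical; they only show that the determinant of a correlation matrix approaches zero for strongly correlated columns and approaches one for uncorrelated columns.

```python
import numpy as np

def corr_determinant(X):
    """Determinant of the correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return float(np.linalg.det(R))

# SYNTHETIC placeholder data (assumption), not the sensor measurements.
rng = np.random.default_rng(2)
base = rng.normal(size=(500, 1))
strong = base + 0.1 * rng.normal(size=(500, 3))   # three strongly correlated columns
weak = rng.normal(size=(500, 3))                  # three nearly independent columns

det_strong = corr_determinant(strong)
det_weak = corr_determinant(weak)
print(f"strongly correlated: det = {det_strong:.2e}")
print(f"weakly correlated:   det = {det_weak:.3f}")
```

Since the determinant equals the product of the eigenvalues of the correlation matrix, a value near zero means that at least one principal component carries almost no variance, which is the same situation in which PC1 explains nearly all of it.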
4. Conclusions
This work shows that PCA is a powerful tool for pattern recognition, demonstrating its effectiveness in analyzing different agricultural soil depths and water samples from Bustillos lagoon. The analysis reveals a significant correlation between deep soil samples and lagoon water samples, where the deepest soil sample, M4, exhibited a greater similarity to water sample S1, which is the closest to the soil extraction point.
PC1 accounts for 97.33% of the total variance, while PC2 explains only 2.21%, amounting to 99.54% of the total variability in the samples. Deionized and distilled water have similar principal component scores, whereas the 5% glyphosate sample shows a marked difference from both the water and soil samples. In contrast, the 1% glyphosate sample scores are similar to those of the soil samples, particularly sample M4, corresponding to the greatest extraction depth.
To further refine the analysis, PCA was repeated while excluding each IDS in turn, in order to identify redundant sensors and allow a smaller sensor array to be used. The exclusion of sensor 3F maximized the variance explained by PC1, indicating a lower correlation of this sensor with the remaining sensors, as confirmed by the correlation matrices and determinant values. In particular, removing sensor 3F resulted in the smallest determinant of the correlation matrix r, reinforcing the conclusion that sensors 6F, 9F, and 12F exhibit a stronger mutual correlation in its absence.
Taken together, the methodology applied in this study represents an improvement over conventional approaches by integrating PCA not only for classifying soil and water samples but also as a tool for sensor optimization. The strategic exclusion of redundant sensors, supported by correlation matrices and determinant analysis, demonstrates that a similar or improved pattern recognition performance can be achieved with a reduced sensor array. This advancement contributes to more efficient and scalable implementations of electromagnetic sensing in agricultural soil monitoring, reducing system complexity without compromising analytical accuracy.