1. Introduction
In the current global context, marked by accelerated climate change and continuous soil degradation, real-time monitoring of ecosystems is becoming increasingly urgent [
1]. The need to understand and manage soil health and associated environmental factors is more pressing than ever, given the direct impact on agriculture, biodiversity and food security [
2]. Soil health is “the ability of soils to provide multiple functional traits necessary to maintain ecosystem stability” [
3]. Thus, a new concept of “One Health” was introduced [
4], a collaborative and transdisciplinary approach that aims to preserve human, animal, and environmental health through surveillance, prevention, and mitigation. Living soil has the ability to support plant and animal productivity, maintain and improve water and air quality, promote plant and animal health, and create a natural, manageable ecosystem [
1]. Soil quality and soil health are often compatible [
5]. Soil health is a more practical expression for research and agricultural communities in terms of current comprehensive soil management and assessment [
6]. The most important features of healthy soils are a gentle slope favoring agriculture, an adequate depth of soil moisture, a sufficient supply of nutrients, biodiversity of organisms, the absence of weeds, resistance to degradation and drainage of excess water [
2].
Inadequate soil management can lead to soil degradation and increased aridification, which in turn reduces or halts soil functions. Soil degradation must be combated by various means if the United Nations Sustainable Development Goals are to be achieved. The EU Soil Strategy for 2030 also aims to ensure the health and resilience of soils in the European Union, promoting their sustainable use and protecting them against degradation. Its objectives, with a deadline of 2050, include reducing soil pollution to zero, restoring degraded soils, and supporting a healthy and sustainable food system [
7].
However, for all these initiatives to be successful and for soil degradation mitigation strategies to be implemented correctly and effectively, it is essential to adopt the right soil quality indicators. Currently, such indicators, where used, differ significantly in terms of type (physical, chemical or biological), intensity, sensitivity and frequency of collection [
6].
The growing adoption of precision agriculture and advanced soil monitoring technologies, including IoT solutions, is further propelling the expansion of the market for online devices designed to monitor soil fertility, which is closely linked to environmental conditions [
8,
9]. The growing need to improve agricultural productivity to feed a rapidly growing population and the growing acceptance of precision agriculture and fertility management solutions are therefore the main drivers of new state-of-the-art monitoring products. Additionally, the availability of sophisticated soil monitoring equipment at reasonable prices, supported by growing government and business initiatives that promote sustainable agricultural techniques, is fueling market expansion [
10].
This study responds to this current need by presenting an advanced technological solution for preventing soil aridification. Our solution provides in situ data, which is essential for developing analytical tools that enable informed decision-making. This directly supports efforts to combat desertification, optimize sustainable agricultural practices, and create effective environmental policies for natural resource conservation. Quality soil is vital for sustainable agricultural production, biodiversity conservation and the efficient functioning of ecosystems [
11].
The growing demand to enhance agricultural productivity to feed a rapidly expanding population, coupled with the increasing adoption of precision agriculture and fertility management solutions, are the primary drivers for the development of new, state-of-the-art monitoring products [
12]. Additionally, the affordability of sophisticated soil monitoring equipment is further stimulating market expansion, driven by rising governmental and business initiatives that support sustainable agricultural techniques [
13,
14]. Furthermore, the need to increase agricultural productivity for a growing population, coupled with stringent government regulations on sustainability and the rising need to maintain soil quality, are key drivers for the development of new real-time monitoring and data collection products [
15]. These products provide essential data for making viable decisions and establishing new directives for soil protection.
While conventional laboratory methodologies remain the gold standard for precise soil analysis, they have significant drawbacks, notably high costs, intensive labor requirements, and discrete data points that preclude continuous monitoring. The proposed IoT-SoL system is specifically engineered to circumvent these limitations by enabling uninterrupted data acquisition without requiring on-site travel. To provide empirical validation for these advantages, a comparative analysis was conducted focusing on three critical metrics: cost, time efficiency, and scalability (
Table 1). The findings presented in
Table 1 substantiate the implementation of the IoT-SoL system, indicating its advantages over traditional methodologies.
In
Table 1, the proposed system obtains “Excellent” ratings in time efficiency (real-time monitoring) and scalability (modular architecture, easily expandable), demonstrating a reliability (“Very good”) comparable to the “gold standard” precision of laboratory methods.
Soil health evaluation is the process of analyzing soil quality and condition to understand its fertility, structure, and overall capacity to support plant growth. It provides insight into the soil’s ability to promote plant development, nutrient cycling, water retention, and carbon sequestration. Understanding soil health enables farmers and land managers to make informed decisions regarding land use, soil management practices, and the application of fertilizers or soil amendments to enhance soil fertility and productivity [
16].
Soil health monitoring is a concept that refers to the creation and application of practices and technologies aimed at improving soil quality, productivity, and its resilience to environmental stresses [
17].
Bhatnagar, V. et al. [
18] describe soil health monitoring as the evaluation of surface soil properties like moisture, temperature, pH, and nutrients to apply appropriate interventions such as irrigation, fertilization, crop rotation, and conservation tillage. In contrast, our research specifically focuses on measuring key soil quality parameters—temperature, moisture, electrical conductivity (EC), and nutrient levels (NPK)—at varying soil depths. We also measure air quality at the soil level, which significantly influences soil aridification trends.
In contrast to Payero JO [
19], who proposes a large-scale, cost-effective Internet of Things (IoT) system for measuring crop water consumption using a weighing system to monitor crop evapotranspiration (ET), our research is based on creating an open-source ecosystem platform. This platform provides real-time, in situ data from both multi-depth soil monitoring and air quality monitoring at the soil level.
In a related context, Jian Zhang et al. [
20] provide an overview of smart agriculture; however, their study lacks a concrete and innovative platform or a specific framework for the collection of both soil and environmental data. This highlights a gap that our research aims to address.
By combining IoT (Internet of Things) with a pilot SoL (Soil-of-Life) station, soil health management can benefit from several key advantages:
Real-time, remote monitoring of soil conditions at five depths (up to 1 m depth), including temperature, moisture, salinity, NPK, and pH.
Real-time, remote monitoring of environmental conditions, such as solar radiation, PM2.5 and PM10 particle counts, CO2 levels, wind speed and direction, atmospheric temperature, humidity, pressure, and precipitation.
Data-driven adaptive management of soil interventions like irrigation, fertilization, crop rotation, and conservation tillage to prevent land aridification [
21].
Therefore, the implementation of the Ecosystem platform, supported by IoT technology, holds significant promise for making future predictions essential for maintaining fertile land [
22]. This approach is important for achieving smart and sustainable agriculture while also preventing soil aridification.
This work designs the architecture of an Ecosystem platform that can serve as the foundation for developing a network of soil health monitoring platforms. The goal is to prevent soil aridification and, ultimately, desertification.
Thus, by implementing a pilot IoT-SoL station and utilizing IoT technology, we set a precedent for the proactive monitoring of soil degradation and environmental pollution. This contributes significantly to efforts to combat desertification and protect biodiversity. Consequently, this work presents the development and implementation of the Ecosystem platform, an advanced software solution designed for collecting data from a heterogeneous set of multi-parameter sensors integrated into the IoT-SoL monitoring platform.
The central objective is to integrate this software into a physical pilot IoT-SoL monitoring station. This station will use innovative technologies and a local Wi-Fi network for efficient data transfer. Consequently, this work details the creation of a robust system for collecting and storing data from various devices and sensors, resulting in a structured database that provides measurable and relevant values for ecosystem analysis. By adopting an agile methodology with iterative development cycles, this paper outlines the processes for developing, integrating, and testing the components. This approach ensures continuous online access to analytical datasets and high-performance computational tools.
2. Electronic Equipment
The Ecosystem platform comprises a software solution engineered for the acquisition of data from a heterogeneous array of multi-parameter sensors. This software is deployed on a physical pilot IoT-SoL monitoring station, which leverages innovative existing technologies and a local Wi-Fi network for data transmission. Furthermore, the project encompasses the design of a robust system for the collection and storage of the resultant sensor data. The outcome of this implementation is a structured database that furnishes measurable and pertinent values, which can be visualized in real time through the platform at
https://statiesol.incdmtm.ro/viewdatesol.xhtml (accessed on 15 September 2025).
This work involved the development, integration, and testing of components using a series of existing innovative technologies and an agile methodology, in which development follows an iterative life cycle throughout the project. This phase provides access to an analytical dataset and computational tools, as described below:
- (1)
Description of electronic equipment
The electronic hardware responsible for data acquisition, processing, and transmission to the project server comprises several key components:
- -
A multi-functional development board, centered around an ESP32-S3R2 System-on-Chip (SoC) microcontroller. Key specifications for this application include a dual-core 32-bit Xtensa® LX7 processor operating at up to 240 MHz, integrated 2.4 GHz Wi-Fi (802.11 b/g/n) and Bluetooth® 5 (LE) connectivity. The board is also provisioned with 512 KB of embedded SRAM, 384 KB of ROM, 2 MB of on-board PSRAM, and 16 MB of external Flash memory. A SIM7670G 4G communication module is included to facilitate mobile network connectivity.
- -
A suite of nine sensors dedicated to capturing pertinent soil and atmospheric data.
- -
An electronic conversion module to translate UART TTL signals into the RS485 protocol.
- -
An integrated power solution consisting of a battery and solar panel charging capabilities.
- -
Visual and auditory indicators (e.g., LEDs, buzzer) to signal operational status.
- (2)
Sensor description
The equipment contains nine sensors, as shown in
Figure 1. Each sensor is identified by a unique address. The sensors acquire and, upon request, deliver numerical values of physical, chemical, and nutrient parameters taken from the soil, as well as values related to atmospheric measurements.
Figure 1 shows the physical configuration of the IoT-SoL station, highlighting its multi-parametric capability through the integration of nine distinct sensors (five deep soil sensors and four environmental sensors).
The chemical and physical parameters acquired by multiparametric sensors are presented in
Supplementary S1, Table S1-1. Sensors 1 through 5 (7-in-1 Integration Soil Sensor) are dedicated to collecting soil data, measuring parameters such as temperature, humidity, conductivity, pH, and essential nutrients (nitrogen, phosphorus, and potassium). The remaining four sensors (6 to 9) are used for atmospheric measurements (6-NBL-W-HPRS/Solar Radiation Sensor; 7; 8-NBL-W-PM PM2.5 + PM10 Integrated Sensor; 9-Multi-Parameter Ultrasonic Weather Sensor). Specifically, they acquire data on solar radiation, the concentration of PM2.5 and PM10 particles, CO2 levels, and a comprehensive set of weather parameters, including wind speed, wind direction, temperature, humidity, atmospheric pressure, and precipitation.
The full list of write addresses used by the sensors is presented in
Supplementary S1, Table S1-2. This table clarifies the scope of the system, showing that it collects values for the 7 soil parameters (including NPK, pH and EC) and an extensive set of atmospheric parameters (solar radiation, CO2, precipitation, etc.), which will support the correlated analysis of aridification.
- (3)
Description of the RS485 communication network
From a hardware point of view, the nine sensors are connected to an RS485 bus whose master is the ESP32-S3 microcontroller [
23]. The sensors have the same supply voltage and the same parameters for data transmission:
- (4)
Supply voltage: 12 V DC
- (5)
Communication parameters: 9600 baud, 8N1
For the RS485 physical network, the MODBUS communication software protocol was used. According to the accompanying technical manuals, each sensor can receive three commands from the RS485 network master:
- (a)
The address read command is the same for each sensor: 0x00 0x20 0x00 0x68.
- (b)
The address write command differs from sensor to sensor. These commands were used to establish the unique address of each sensor.
The third byte represents the new address, while the last two bytes correspond to the CRC, calculated and inserted into the command in low-byte and high-byte order.
Supplementary Table S1-3 lists the write commands used to set the address of each sensor. These commands are required because every sensor is shipped from the factory with the default address 0x01, so each one must be assigned a unique address before it is connected to the shared bus. The complete set of address-setting commands is available in
Supplementary S1.
- (c)
The data read command varies from sensor to sensor, as the number of bytes requested is specific to each device. The two CRC16 bytes also differ accordingly.
As in the address write command, the last two bytes are the CRC (Cyclic Redundancy Check) value, which is calculated and appended to the command in low-byte, high-byte order. The time interval between two consecutive readings from the same sensor must be at least 1000 ms.
This provides a comprehensive list of sensor-specific data request commands and the length of the response message in bytes [
24]. These commands are sent from the RS485 network master to each individual sensor to initiate data acquisition. The table outlines the communication protocol by specifying the exact command structure for each sensor and the expected length of the message returned to the master, which is essential for proper parsing and processing of the data.
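The framing described above can be illustrated with a minimal Python sketch that computes the standard Modbus RTU CRC-16 and appends it low byte first to a read request. The function names `crc16_modbus` and `build_read_request` are illustrative and are not taken from the station firmware; the sketch assumes the sensors use the standard Modbus function code 0x03 (read holding registers), as the soil sensor request 01 03 00 00 00 08 suggests.

```python
def crc16_modbus(data: bytes) -> int:
    """Return the Modbus RTU CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_read_request(address: int, start_register: int, count: int) -> bytes:
    """Build a function-0x03 request: address, function, start register, register count, CRC."""
    body = (bytes([address, 0x03])
            + start_register.to_bytes(2, "big")
            + count.to_bytes(2, "big"))
    crc = crc16_modbus(body)
    # The CRC is appended low byte first, then high byte.
    return body + crc.to_bytes(2, "little")


frame = build_read_request(address=0x01, start_register=0x0000, count=8)
print(frame.hex(" ").upper())  # 01 03 00 00 00 08 44 0C
```

Because the sensor address is part of the checksummed bytes, the two CRC bytes change whenever the address or the number of requested registers changes, which is why each sensor has its own read command.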
- (6)
Output data
Although the length of the data packets varies, the response received from each sensor adheres to a predefined message format. For clarity, an example of a typical response from the soil sensors is provided below:
The meaning of each byte within this received data frame is detailed in
Table 2:
Communication example (obtaining data from sensor):
Command: 01 03 00 00 00 08 44 0C
Response: 01 03 10 01 14 01 46 00 4B 02 D2 00 08 00 08 00 15 00 29 2B 90
Temperature: 01 14 (276 → 27.6 °C)
Humidity: 01 46 (326 → 32.6%)
Conductivity: 00 4B (75 → 75 µS/cm)
pH: 02 D2 (722 → 7.22 pH)
N: 00 08 (8 → 8 mg/kg)
P: 00 08 (8 → 8 mg/kg)
K: 00 15 (21 → 21 mg/kg)
Salinity: 00 29 (41 → 41 mg/L)
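The decoding shown in this example can be sketched in a few lines of Python. The field order and scale factors follow the worked example above; the field names, the function name `decode_soil_response`, and the treatment of every register as an unsigned 16-bit value are assumptions made for illustration (sub-zero temperatures would need signed handling), and the CRC bytes are not verified here.

```python
import struct

# (field name, scale factor) in the order the registers appear in the response.
SOIL_FIELDS = [
    ("temperature_c", 0.1),
    ("humidity_pct", 0.1),
    ("conductivity_us_cm", 1),
    ("ph", 0.01),
    ("n_mg_kg", 1),
    ("p_mg_kg", 1),
    ("k_mg_kg", 1),
    ("salinity_mg_l", 1),
]


def decode_soil_response(frame: bytes) -> dict:
    """Decode a function-0x03 response carrying eight big-endian 16-bit registers."""
    address, function, byte_count = frame[0], frame[1], frame[2]
    if function != 0x03 or byte_count != 2 * len(SOIL_FIELDS):
        raise ValueError("unexpected function code or payload length")
    payload = frame[3:3 + byte_count]
    registers = struct.unpack(">8H", payload)
    values = {"address": address}
    for (name, scale), raw in zip(SOIL_FIELDS, registers):
        values[name] = round(raw * scale, 2)
    return values


response = bytes.fromhex(
    "01 03 10 01 14 01 46 00 4B 02 D2 00 08 00 08 00 15 00 29 2B 90"
)
reading = decode_soil_response(response)
print(reading["temperature_c"], reading["humidity_pct"], reading["ph"])  # 27.6 32.6 7.22
```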
During its operational cycle, the electronic equipment provides visual and auditory feedback through three LEDs and a buzzer. These signals indicate the current operational status and alert the user to specific events.
A key objective of this study is the implementation of a robust network architecture. The proposed system is designed to efficiently collect data from the pilot system’s sensors. This is accomplished using the MQTT protocol, which enables a flexible publish/subscribe communication model, with data being stored in a centralized database to ensure secure and organized data management [
25]. The database manages information acquired from the pilot ground station, ensuring that the stored information is both measurable and valid. Data acquisition from the pilot station is followed by its storage to support future decisions regarding soil health. Two user interfaces are under development: a web interface for data visualization and a graphical application for visual data representation. While the strategic transformation of increasing data volumes into managerial decisions is essential for the station’s performance and sustainability, the data collection itself is carried out by installing soil sensors at various depths.
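To make the publish/subscribe model more concrete, the sketch below builds the topic and JSON payload that a station could publish for one sensor reading. The topic hierarchy, the payload fields, the station identifier `pilot-01`, and the function name `build_mqtt_message` are hypothetical; the text does not specify the message format, and the actual publish call (made through an MQTT client library) is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_mqtt_message(station_id: str, sensor_address: int, values: dict,
                       timestamp: datetime) -> tuple[str, str]:
    """Return a (topic, payload) pair for one sensor reading."""
    # Hypothetical topic hierarchy: one branch per station and sensor address,
    # so a subscriber can filter with wildcards such as "iot-sol/+/sensor/#".
    topic = f"iot-sol/{station_id}/sensor/{sensor_address}"
    payload = json.dumps(
        {
            "station": station_id,
            "sensor": sensor_address,
            "time": timestamp.isoformat(),
            "values": values,
        },
        sort_keys=True,
    )
    return topic, payload


topic, payload = build_mqtt_message(
    "pilot-01",
    1,
    {"temperature_c": 27.6, "ph": 7.22},
    datetime(2025, 9, 15, 12, 0, tzinfo=timezone.utc),
)
print(topic)
print(payload)
```

Keeping one topic per sensor address would let the database writer and the web interface subscribe independently to the same stream, which is the main attraction of the publish/subscribe model.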
- (7)
Data storage
Data storage is managed on a dedicated server running the Linux operating system. The sensor data is stored in a MySQL database, while a Java EE (Jakarta EE) GlassFish application server handles the querying and management of data received via the TCP/IP protocol.
GlassFish is the Eclipse implementation of Jakarta EE, providing support for key technologies such as Jakarta REST, Jakarta CDI, Jakarta Security, Jakarta Persistence, Jakarta Transactions, Jakarta Servlet, Jakarta Faces, and Jakarta Messaging. This comprehensive support allows developers to create portable and scalable enterprise applications that seamlessly integrate with traditional technologies [
26]. Additionally, optional components can be installed to provide supplementary services and extend the server’s functionality.
Built on a modular core powered by OSGi, GlassFish runs directly on the Apache Felix implementation. It can also function with the Equinox OSGi or Knopflerfish OSGi runtimes. The HK2 component abstracts the OSGi module system to provide components that can also be viewed as services. These services can be discovered and injected at runtime, which allows for flexible and dynamic architecture.
MySQL is an open-source Relational Database Management System (RDBMS). A relational database organizes data into one or more data tables, where data can be linked to one another; these relationships help in structuring the data.
SQL is the language used by programmers to create, modify, and retrieve data from the relational database, as well as to control user access. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer’s storage system. It also manages users, allows network access, and facilitates database integrity testing and backup creation.
The management of sensor data is handled by a MySQL 8.0.40 database server. The relational structure of the database, designed for the efficient management of sensor data, is presented in Figure 2.
To facilitate easier management and reduce the database size on the storage disk, the database was normalized. This process established one-to-one or one-to-many relationships between the tables, thereby eliminating the duplication of stored values. The resulting structure, as shown in
Figure 2, consists of several interconnected tables:
- -
The Station (STATIE) table, which serves as a central hub and contains general information about each pilot station, such as its ID, city, name, and geographic coordinates.
- -
The USERS table manages user accounts and is linked to the STATIE table.
- -
The Air Data (DATE_AER) table stores atmospheric measurements, including solar radiation, PM2.5, PM10, and CO2 levels.
- -
The Soil Data (DATE_SOL) table is dedicated to storing soil-specific parameters, such as temperature, humidity, conductivity, pH, and nutrient levels (N, P, K).
- -
The Multisensor Data (DATE_MULTISENZOR) table is designed to handle a wide range of data from a single multi-functional sensor, including wind speed, rainfall, and various environmental metrics.
- -
The HEADER_NAME table contains metadata for the various data tables, improving data organization and retrieval.
Figure 2 illustrates the logical diagram (ER Diagram) of the MySQL database that provides efficient storage, eliminates redundancy, and allows complex queries.
This structured design ensures that data is stored efficiently and logically, with clear relationships that prevent redundancy and support robust data queries and analysis.
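The one-to-many relationship between a station and its readings can be illustrated with a small runnable sketch. Python's built-in `sqlite3` module is used here only as a stand-in for MySQL so that the example runs without a server; the table names STATIE and DATE_SOL come from the text, whereas every column name and all inserted values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STATIE (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    city      TEXT,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE DATE_SOL (
    id            INTEGER PRIMARY KEY,
    statie_id     INTEGER NOT NULL REFERENCES STATIE(id),
    sensor_addr   INTEGER NOT NULL,
    measured_at   TEXT NOT NULL,
    temperature_c REAL,
    humidity_pct  REAL,
    ph            REAL
);
""")

# Station details are stored once; readings only carry the station key.
conn.execute("INSERT INTO STATIE (id, name, city) VALUES (1, 'Pilot station', 'Example city')")
rows = [
    (1, 1, "2025-09-15T12:00:00", 27.6, 32.6, 7.22),
    (1, 2, "2025-09-15T12:00:00", 25.1, 30.4, 7.18),  # made-up values
]
conn.executemany(
    "INSERT INTO DATE_SOL (statie_id, sensor_addr, measured_at, temperature_c, humidity_pct, ph) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# One station row joins to many soil readings.
result = conn.execute("""
    SELECT s.name, d.sensor_addr, d.temperature_c
    FROM STATIE AS s
    JOIN DATE_SOL AS d ON d.statie_id = s.id
    ORDER BY d.sensor_addr
""").fetchall()
print(result)
```

Because the station name and coordinates live in a single STATIE row, each new reading adds only a small foreign key rather than repeating the station metadata, which is the redundancy the normalization removes.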
To establish the logical scheme and mathematical model for the requested problem, a simplified flowchart was implemented in the firmware and is presented in
Figure S1-1, from Supplementary S1. This flowchart illustrates the operational logic of the data acquisition process.
Data security is implemented on three levels. During transfer, data is encrypted with AES-128 by the LPWAN module and then transmitted to the cloud server over HTTPS/TLS connections. Data integrity is ensured by Message Authentication Codes (MAC) included in the data packets. When stored, data is protected in the database by access controls and standard security measures.
The electronic configuration for data transmission has been optimized using the following elements:
- -
MPPT controller from 20 V to 12 V that ensures efficient battery charging;
- -
Over-discharge protection: an integrated or separate BMS (Battery Management System) to prevent battery damage;
- -
DC-DC converter to 5 V (for microcontroller and sensors);
- -
Voltage and current monitoring: an INA219 or similar module to track battery status and system consumption.
Communication with the LTE module is established via the UART (Universal Asynchronous Receiver-Transmitter) protocol for the transmission of AT commands, with the baud rate configured at 115,200 bits per second, which is fast enough to ensure effective communication [
27]. The communication setup is as follows:
- -
The ESP32-S3 GPIOs operate at 3.3 V, whereas the SIM7670G uses 1.8 V/2.8 V logic, so a logic level converter (e.g., TXB0108) is placed between them.
- -
Deep sleep mode is enabled for the ESP32-S3, and the SIM7670G is put into low-power mode when not transmitting.
- -
Data is uploaded at regular intervals to save energy (e.g., every 30 min).
Deep sleep mode is an advanced power-saving technique: when no data is being transmitted to the platform, the ESP32-S3 microcontroller enters an extremely low power consumption mode, and the SIM7670G module is also put into a low-power mode. Instead of being constantly active, the system wakes at regular intervals (for example, every 30 min), collects data, and transmits it; for the rest of the time, it remains in power-saving mode. This duty cycling is an effective way to minimize the total power consumption of a monitoring system.
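The benefit of duty cycling can be estimated with a simple time-weighted average of the current drawn while awake and while asleep. The sketch below shows the arithmetic; the current and timing figures are placeholders chosen for illustration and were not measured on the station, and the function name `average_current_ma` is hypothetical.

```python
def average_current_ma(active_ma: float, sleep_ma: float,
                       active_s: float, period_s: float) -> float:
    """Time-weighted average current over one wake/sleep cycle."""
    if not 0 < active_s <= period_s:
        raise ValueError("active time must be positive and no longer than the period")
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


# Placeholder figures: 150 mA while awake for 30 s, 1 mA asleep,
# one acquisition and upload every 30 min (1800 s).
avg = average_current_ma(active_ma=150.0, sleep_ma=1.0, active_s=30.0, period_s=1800.0)
print(f"average current: {avg:.2f} mA")  # average current: 3.48 mA
```

With these assumed figures the average draw falls from 150 mA to about 3.5 mA, which shows why the length of the sleep interval dominates the energy budget of a battery and solar powered station.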
The graphical representation of how the sensor data values are managed is presented in
Figure 3.
Figure 4 illustrates the logical flow of data from sensor collection to application server processing. This flow supports the goal of transforming raw data into measurement reports and decision-making information.
To prevent erroneous data (due to temporary faults or noise) from reaching the main database, the firmware implements a validation routine. First, all readings must fall within the specified operating range of the sensor; any value outside this range is rejected and recorded as ‘Invalid’. Second, the routine checks whether the change between two consecutive readings exceeds a physically plausible rate of change (e.g., a change of 5 °C within 5 min at a depth of 80 cm is suspect), which indicates a sensor error.
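The two checks can be sketched as follows. This is a minimal Python illustration rather than the station firmware; the operating range of −40 to 80 °C and the rate limit of 0.5 °C per minute are hypothetical thresholds chosen so that the 5 °C in 5 min example from the text is flagged.

```python
from typing import Optional


def validate_reading(value: float, low: float, high: float,
                     previous: Optional[float] = None,
                     elapsed_min: float = 0.0,
                     max_rate_per_min: Optional[float] = None) -> str:
    """Return 'Valid' or 'Invalid' for one reading."""
    # Check 1: the value must lie inside the sensor's operating range.
    if not low <= value <= high:
        return "Invalid"
    # Check 2: the change since the previous reading must not be implausibly fast.
    if previous is not None and max_rate_per_min is not None and elapsed_min > 0:
        rate = abs(value - previous) / elapsed_min
        if rate > max_rate_per_min:
            return "Invalid"
    return "Valid"


print(validate_reading(27.6, -40.0, 80.0))    # Valid
print(validate_reading(120.0, -40.0, 80.0))   # Invalid (out of range)
print(validate_reading(25.0, -40.0, 80.0, previous=20.0,
                       elapsed_min=5.0, max_rate_per_min=0.5))  # Invalid (5 °C in 5 min)
```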
In addition, the IoT-SoL system is designed with a ‘store-and-forward’ mechanism to ensure continuous monitoring despite network interruptions. The station is equipped with a non-volatile storage module (e.g., a MicroSD card or EEPROM memory). If the wireless transmission (e.g., LoRaWAN) fails after a reading, the data is immediately saved locally. A status flag is set, and upon the next successful connection attempt, the system retransmits the stored packets first, ensuring the integrity of the data time series.
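The store-and-forward behavior can be sketched with a first-in, first-out buffer. In this Python illustration an in-memory queue stands in for the MicroSD or EEPROM storage, and the class name `StoreAndForward` and the `send` callback are hypothetical; the point is only that the backlog is sent oldest first so the time series stays in order.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffer packets locally and retransmit the oldest ones first."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        self._send = send                      # returns True when delivery succeeded
        self._pending: Deque[bytes] = deque()  # stand-in for non-volatile storage

    def submit(self, packet: bytes) -> None:
        """Queue a new packet, then try to flush the queue in order."""
        self._pending.append(packet)
        self.flush()

    def flush(self) -> int:
        """Send queued packets oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self) -> int:
        return len(self._pending)


delivered = []
link_up = False


def fake_send(packet: bytes) -> bool:
    """Simulated link: delivers only while link_up is True."""
    if link_up:
        delivered.append(packet)
    return link_up


buffer = StoreAndForward(fake_send)
buffer.submit(b"reading-1")   # link down: kept locally
buffer.submit(b"reading-2")   # still down
link_up = True
buffer.submit(b"reading-3")   # link restored: the backlog goes out first
print(delivered)
```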
On the analysis platform, if a data series for a particular parameter (e.g., NPK) is consistently marked as ‘Invalid’, an imputation model based on the inter-sensor correlation established by Pearson analysis (e.g., NPK estimation based on EC values, which is a more robust sensor) can be used to fill the gaps, thus allowing the aridification algorithms to run uninterrupted. The system also provides a maintenance alert for any sensor that returns invalid data for a predefined period (>24 h). This alert is automatically triggered on the user’s dashboard, requesting on-site maintenance intervention.
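A gap-filling step of this kind can be sketched with an ordinary least-squares line fitted on the readings where both parameters are valid. The text does not specify the imputation model, so the linear form, the function names, and the synthetic EC and nitrogen values below are all assumptions made for illustration; the 24 h alert threshold is the one stated above.

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are constant; no line can be fitted")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def impute(ec: list[float], target: list) -> list[float]:
    """Fill None entries of `target` from EC using a line fitted on the valid pairs."""
    pairs = [(e, t) for e, t in zip(ec, target) if t is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [t if t is not None else slope * e + intercept for e, t in zip(ec, target)]


def needs_maintenance(hours_invalid: float, threshold_h: float = 24.0) -> bool:
    """Raise an alert once a sensor has returned invalid data for longer than the threshold."""
    return hours_invalid > threshold_h


ec_values = [50.0, 60.0, 70.0, 80.0, 90.0]   # synthetic EC readings
nitrogen = [5.0, 6.0, None, 8.0, 9.0]        # None marks an 'Invalid' reading
filled = impute(ec_values, nitrogen)
print(filled)
print(needs_maintenance(30.0))
```

Such a fit is only as good as the correlation it rests on, so in practice the imputed values would presumably be flagged as estimates rather than stored as measurements.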
The structure of the database is given in
Table 3:
To access the website that manages the measurement values from the soil station’s sensors, use the following address:
http://statiesol.incdmtm.ro (accessed on 15 September 2025). The screen shown in
Figure 4 will then be displayed:
To view the values of the measurement sensors, first select the desired measurement station, then select the appropriate link to access the corresponding data.
3. Sensor Experimental Setup and Validation
The Open Access Data Platform serves as an important instrument by making the collected, validated, and structured data readily available to researchers, agronomists, and interested stakeholders. This dissemination mechanism not only facilitates in-depth analysis and data merging but also enables informed decision-making aimed directly at improving agricultural management. By leveraging this information, users can accurately assess the current state of land aridification and swiftly identify specific nutrient deficiencies (both macro- and micronutrients) essential for achieving quality agriculture. Thus, the platform supports the transition toward precision agriculture, optimizing resource utilization and enhancing crop sustainability and yield.
3.1. Sensor Experimental Setup
To evaluate the performance and the calibration methodology of the proposed system, a rigorous experimental configuration was established to analyze data measured across three distinct soil types: Chernozem (Romanian Plain), alluvial soils (Danube Meadow area), and sandy soils (Oltenia Plain). This configuration, detailed in the following points, was designed to validate the sensors under simulated field conditions:
- -
The test objective is the validation of the accuracy and reliability of the 5 multi-parametric sensors (S1–S5) in the IoT station by comparing them with a reference sensor (e.g., NBL-S-HS/Soil Handheld Meter).
- -
The target factor, which is an independent variable, is soil type (3 types: Soil I, Soil II, Soil III).
- -
The observation factor is depth levels; the measurements were made at 5 distinct depths (levels) to simulate real-world field use: 20 cm, 40 cm, 60 cm, 80 cm, and 100 cm.
- -
The measured variables are Temperature (T), Moisture (M), Electrical Conductivity (EC), pH, Nitrogen (N), Phosphorus (P), Potassium (K), and Total Dissolved Solids (TDS).
- -
The measurements are quarterly averages.
3.2. Experimental Results
In this work, three types of soil were measured. In order to obtain a validation of the measurements, these soils were analyzed in two ways:
- -
Mode I: the first measurement was performed using the 5 multisensors (NBL-S-TMC-7/7-in-1 Soil Integrated Sensor) simultaneously at each depth, as shown in
Figure 1.
- -
Mode II: the second measurement was made using a corresponding reference sensor (NBL-S-HS/Soil Handheld Meter), which is used to quickly measure agricultural environmental parameters such as soil temperature and humidity, pH, salinity, and electrical conductivity. The values are displayed in real time on the screen, and the data is stored in the internal memory of the logger. After measurement, the data can be downloaded from the logger to a computer via the included software for analysis or storage, as shown in
Figure 5.
The electrochemical sensors (pH and EC) used in the IoT-SoL system are equipped with Automatic Temperature Compensation (ATC) circuits. This is important because EC and pH readings are strongly influenced by soil temperature. The ATC function automatically adjusts the reading to a standard reference temperature (25 °C), thus ensuring data comparability regardless of in situ soil thermal fluctuations. In our experimental design, we placed the sensors at significant depths (20 cm to 100 cm) because soil temperature at depth is much more stable and less susceptible to extreme daily variations (hot/cold) than surface temperature. Therefore, most of the essential readings (moisture, NPK) are physically protected from discrepancies caused by rapid thermal fluctuations.
To eliminate random noise and short-term discrepancies (caused, for example, by electrical interference or environmental micro-fluctuations), the collected data is subjected to algorithmic filtering (e.g., a moving average filter or a similar technique) before being stored. This process smooths the data, suppressing abnormal peaks (outliers) that do not represent a real change in the state of the soil.
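A moving average filter of the kind mentioned above can be sketched in a few lines of Python. The window length of 3 and the sample series are assumptions chosen for illustration; the text does not state which filter or window the platform actually uses.

```python
from collections import deque


def moving_average(samples, window: int = 3):
    """Yield the mean of the last `window` samples (fewer at the start of the series)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    for sample in samples:
        recent.append(sample)
        yield sum(recent) / len(recent)


raw = [20.0, 20.0, 26.0, 20.0, 20.0]   # synthetic series with one spike
smoothed = list(moving_average(raw, window=3))
print(smoothed)  # [20.0, 20.0, 22.0, 22.0, 22.0]
```

Note that a moving average attenuates a spike and spreads it over the window rather than removing it; a median filter is a common alternative when isolated outliers must be rejected outright.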
The aggregated experimental results for the three soil types (Chernozem, alluvial, and sandy soils), collected from their corresponding geographical regions in Romania (Romanian Plain, Danube Meadow, and Oltenia Plain, respectively), are presented next. These results comprise the quarterly average values calculated for each measured variable (T, M, EC, pH, N, P, K, and TDS). To evaluate the precision of the IoT-SoL system, these averages are compared directly against the baseline values from the aforementioned reference sensor, which serves as the gold standard. This comparison is the basis for assessing the reliability and accuracy of the integrated IoT-SoL sensors under simulated field conditions [28,29,30].
Table 4, Table 5 and Table 6 (Soil I, II, III) report the quarterly averages measured by the five IoT sensors S1–S5, all of the same type (7-in-1 Integration Soil Sensor), at five depths (20 cm to 100 cm), which are subsequently compared with the values of the reference sensor.
The main finding is that the readings of the IoT sensors closely match those of the reference sensor, with minimal differences for most parameters, which supports the accuracy of the system. The main trend observed in these tables is a decrease in moisture (M) and nutrients (N, P, K) with depth, especially below 40 cm, reflecting the natural dynamics of the soil.
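One simple way to quantify how closely paired IoT and reference values agree is the mean absolute difference across depths. The sketch below is illustrative only: the function name is hypothetical and the numbers are made up, not taken from Tables 4 to 6.

```python
def mean_absolute_difference(iot_values, reference_values):
    """Average of |IoT - reference| over paired measurements."""
    if not iot_values or len(iot_values) != len(reference_values):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - b) for a, b in zip(iot_values, reference_values))
    return total / len(iot_values)


# Hypothetical moisture averages (%) at 20, 40, 60, 80 and 100 cm.
iot = [25.0, 24.0, 22.5, 21.0, 20.0]
ref = [25.5, 24.0, 22.0, 21.0, 19.5]
print(mean_absolute_difference(iot, ref))
```

A value close to zero, relative to the measurement range of the parameter, indicates close agreement between the two instruments.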
3.3. Analysis of Results
In the analysis of the results, correlation analysis was used to reveal how the variables within the system vary together. For example, a strong positive correlation between two variables indicates that they tend to increase or decrease together, although correlation alone does not establish that one influences the other. By identifying significant correlations, the analysis helps to simplify complex datasets, highlighting the most important relationships and reducing the focus on less relevant factors. Correlation analysis also forms the foundation for statistical models that can be used to predict values and to gain a deeper understanding of complex processes, and it provides valuable insights for validating existing hypotheses.
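The Pearson correlation coefficient between two variables x and y is computed as
\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
where
$n$ is the sample size;
$x_i$ are the individual values of the variable x;
$y_i$ are the individual values of the variable y;
$\bar{x}$ is the arithmetic mean of all x values;
$\bar{y}$ is the arithmetic mean of all y values.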
The Pearson Correlation Coefficient ($r_{xy}$) measures the strength and direction of a linear relationship between two variables. Its value ranges from −1 to +1. An absolute value close to 1 indicates a strong correlation (high similarity in behavior), while a value close to 0 indicates a weak or nonexistent correlation.
Applying Colton’s rules (stated in 1974), the following apply:
*—A correlation coefficient from 0.25 to 0.50 (or from −0.25 to −0.50) means a weak correlation (acceptable degree of association).
**—A correlation coefficient from 0.50 to 0.75 (or from −0.50 to −0.75) means a moderate to good correlation.
***—A correlation coefficient greater than 0.75 (or less than −0.75) means a strong correlation (very good degree of association).
The limits of the Pearson coefficient are as follows:
\[
-1 \le r_{xy} \le +1
\]
Thus, as the value of the Pearson correlation coefficient approaches 1 (in absolute value), the “intensity” of the linear relationship between the two variables will be higher [32,33].
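The calculation and the Colton classification described above can be sketched as follows. This is an illustrative implementation, not the spreadsheet procedure used in the study; the function names are hypothetical, and assigning the boundary values 0.50 and 0.75 to the "**" band is an assumption, since the stated ranges overlap at their endpoints.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def colton_label(r):
    """Map a coefficient to the *, **, *** markers of Colton's rules."""
    a = abs(r)
    if a > 0.75:
        return "***"  # strong correlation
    if a >= 0.50:
        return "**"   # moderate to good (boundary handling is assumed)
    if a >= 0.25:
        return "*"    # weak correlation
    return ""         # below 0.25: no marker


# Made-up, perfectly linear data for illustration.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, colton_label(r))
```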
Supplementary Tables S3-(1–6) present the Pearson linear correlation coefficients between all measured parameters (T, M, EC, pH, N, P, K, TDS). The main statistical result that emerges from these tables is the existence of a moderate-to-strong linear correlation (marked with ** or ***) between pairs of variables such as Nitrogen (N) and Potassium (K) and, in certain soils, between Moisture (M) and Temperature (T).
The correlation matrices of the IoT sensors (Matrix 1) are very close to those of the reference sensor (Matrix 2), which indicates that the IoT system accurately captures the relationships between soil parameters.
To verify the reproducibility and scientific reliability of the measurements, a percentage of similarity between the two correlation matrices was calculated for each soil type studied. For this purpose, we used cosine similarity, a common technique in data analysis for measuring the similarity between two non-zero vectors.
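To calculate the cosine similarity between the two vectors $V_1$ and $V_2$, we use the following formula [34]:
\[
S = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}
\]
where
$S$ is the similarity;
$V_1 \cdot V_2$ is the dot product between the vectors;
$\lVert V_1 \rVert$ and $\lVert V_2 \rVert$ are the Euclidean norms (or lengths) of the vectors.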
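As an illustration of this formula, a minimal Python sketch is shown below. The function name `cosine_similarity` and the example vectors are hypothetical and do not reproduce the spreadsheet calculation or the data of the study.

```python
from math import sqrt


def cosine_similarity(v1, v2):
    """Dot product of v1 and v2 divided by the product of their norms."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = sqrt(sum(a * a for a in v1))
    norm2 = sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Made-up correlation vectors for an IoT system and a reference system.
s = cosine_similarity([0.82, -0.40, 0.55], [0.80, -0.35, 0.60])
print(f"similarity: {100 * s:.2f}%")
```

Multiplying the result by 100 gives the similarity as a percentage, the form in which the results are reported.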
To apply this method, a data preprocessing step was required. The correlation matrices, obtained individually for each soil type under study, were each transformed into a single vector. This vectorization provides a concise and uniform representation of the relational structure among the measured variables for each soil environment (Chernozem, alluvial, and sandy soils). The resulting input vectors, which standardize the correlation data for each soil type, are presented in
Table 7:
Table 7 provides the vector data for the calculation of the Cosine Similarity, which quantifies the differences between the correlation matrices of the IoT sensors and the reference ones.
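The exact flattening scheme is not detailed in the text. One common choice, assumed here purely for illustration, is to keep only the entries above the main diagonal, since a correlation matrix is symmetric and its diagonal is always 1. The function name `upper_triangle` and the example matrix are hypothetical.

```python
def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix, row by row."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Made-up 3 x 3 correlation matrix (not data from the study).
corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
]
print(upper_triangle(corr))
```

Under this assumption, the eight measured parameters (T, M, EC, pH, N, P, K, TDS) yield a vector of 28 unique pairwise coefficients. Including the diagonal or both triangles would change the vector length and tend to raise the resulting similarity, so the chosen scheme should be reported alongside the result.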
The main statistical result is the degree of similarity obtained from the comparative analysis: 99.01% for Soil I, 99.03% for Soil II and 99.86% for Soil III. This very high similarity (over 99%) supports the validity and robustness of the IoT-SoL system as a reliable soil monitoring tool.
Following the comparative analysis of the correlation matrices, we obtained a high degree of similarity, which supports the validity of the autonomous IoT-SoL system for monitoring soil quality in the context of climate change and preventing aridification. The similarity percentages are as follows:
- For Soil I, a similarity of 99.01%;
- For Soil II, a similarity of 99.03%;
- For Soil III, a similarity of 99.86%.
Consequently, a similarity coefficient exceeding 99% indicates a considerable concordance between the two compared correlation matrices (the one obtained via the IoT-SoL system and the reference matrix). This high level of agreement supports the precision of the integrated sensors within the autonomous IoT-SoL system and indicates the reliability of a novel measurement and verification methodology based on this system and the open-access Ecosystem platform. This finding supports IoT-SoL as a viable and reproducible instrument for soil parameter monitoring.
All statistical analyses, including the Pearson correlation matrices and the Cosine Similarity calculation, were performed in Microsoft Excel.
To provide an aggregate and rigorous validation of the measurement capability of the IoT-SoL system against the reference sensor, we employed Cosine Similarity (CS). This metric measures the cosine of the angle between the two correlation matrices (representing the IoT system and the reference system) viewed as high-dimensional vectors. A result close to 1 (or 100%) indicates high similarity, confirming that the two systems exhibit a near-identical pattern of correlations between the measured parameters.
The results of the Cosine Similarity analysis for the three soil types are summarized below and visually presented in
Figure 6:
To ensure a robust statistical validation of the IoT-SoL monitoring system, two complementary statistical methodologies were utilized: Pearson Correlation and Cosine Similarity. Pearson Correlation was employed to assess the strength and direction of the linear relationship between the variables measured by the IoT sensors and the reference values, offering insight into data consistency. Simultaneously, Cosine Similarity was used to determine the angular similarity between the correlation vectors (derived previously), thereby quantifying the structural alignment of the datasets. These approaches serve complementary roles in the validation process, and the comparative results are detailed in
Table 8:
In conclusion, the Pearson correlation matrices show that, at the individual level, each pair of parameters behaves similarly between the IoT station and the reference sensor, whereas the cosine similarity takes all these individual correlations (the entire matrix) and condenses them into a single percentage score. Therefore, regardless of the individual relationships, the overall response pattern of our IoT system is almost identical to that of a certified reference system.