A WebGIS-Based System for Supporting Saline-Alkali Soil Ecological Monitoring: A Case Study in Yellow River Delta, China

Abstract: Monitoring and evaluating the soil ecological environment is essential for ensuring the health of saline-alkali soil and the safety of agricultural products. Within a regional ecological risk-reduction strategy, it is of foremost importance to develop a useful online system for soil ecological assessment and prediction that protects people from the threat of sudden disasters. However, traditional manual or empirical parameter adjustment leads to mismatched model hyperparameters, which cannot meet the urgent need for high-performance prediction of soil properties from multi-dimensional data in a WebGIS system. To this end, this study aims to develop a saline-alkali soil ecological monitoring system for real-time monitoring of soil ecology in the Yellow River Delta, China. The system applies advanced web-based GIS, including a front-end and back-end technology stack, cross-platform deployment of machine learning models, and a database of multi-source environmental variables. The system adopts a five-layer architecture and integrates functions such as data statistical analysis, soil health assessment, soil salinity prediction, and data management. It visually displays the statistical results of air quality, vegetation indices, and soil properties in the study area, and provides users with ecological risk assessment functions for analyzing soil heavy metal pollution. In particular, the system introduces machine learning models optimized by the tree-structured Parzen estimator (TPE) to achieve accurate prediction of soil salinity. The TPE-RF model had the highest prediction accuracy on the testing set (R² = 94.48%), outperforming the TPE-GBDT model, and showed a strong ability to capture the nonlinear relationship between environmental variables and soil salinity. The system developed in this study can provide accurate saline-alkali soil information and health assessment results for government agencies and farmers, which is of great significance for agricultural production and saline-alkali soil ecological protection.


Introduction
Saline-alkali land is an important reserve of cultivated land and an important buffer zone for global environmental change [1]. Saline-alkali soil health is not only a critical foundation for sustaining global ecosystems and agricultural development [2,3], but also plays an important role in ensuring food security, maintaining biodiversity, and promoting sustainable development [4][5][6]. In recent years, with the acceleration of urbanization, the decline of soil quality (such as soil salinization and heavy metal accumulation) has become an important threat to saline-alkali soil health [7,8]. For example, soil heavy metals can cause farmland soil degradation and pollution by changing soil physical and chemical properties [9], posing a threat to the quality of agricultural products and human health [10]. Geographic Information Systems (GIS) have been widely used for the prevention and early warning of saline-alkali soil health risks. GIS offers map data processing, spatial analysis, and visualization, and can effectively detect the spatial distribution of regional soil properties [11,12]. However, traditional GIS technology cannot achieve real-time monitoring and evaluation of saline-alkali soil ecological health, and therefore lags in the prevention and control of saline-alkali soil health risks. Information technology represented by WebGIS is emerging and is gradually being applied in many fields, such as agriculture, soil science, meteorology, and ecological protection [13][14][15][16]. Compared with traditional GIS, WebGIS enables real-time interaction for spatial data analysis, multiple functional services, and monitoring models [17,18]. Therefore, it is of great significance to develop a WebGIS system for saline-alkali soil ecological monitoring with integrated prediction and evaluation, which can improve the quality and efficiency of agricultural production and ensure the quality and safety of food.
At present, technical innovation in WebGIS system development falls into two areas: databases and evaluation models. Database technology allows a WebGIS system to store, manage, and analyze geographic information data efficiently. For example, the Geo APEXOL interface was applied to a WebGIS system for on-site evaluation of nonpoint source pollution [19]. The interface used a complete database of soil, meteorology, land use, DEM, and agricultural management data, which automated the model establishment process [19]. Cao et al. used a B/S three-layer architecture, Oracle, and AJAX technology to build a database of trace elements in coal, and integrated WebGIS technology to visualize the data and analyze the trace elements [20]. Monitoring soil properties requires data such as topography, climate, vegetation, and remote sensing images, and a reliable database can efficiently manage these multi-source heterogeneous data.
An accurate evaluation model is the basis for a WebGIS system to predict and evaluate the monitoring target, and the most commonly used model is the linear model. Yong et al. integrated an exponential smoothing prediction model into a WebGIS-based system to better fit the variation of soil heavy metal pollution [15]. The physics-based STONE model was used in a WebGIS system that associated potential rockfall source information with earthquake-induced peak ground acceleration (PGA) scenarios to simulate earthquake-induced rockfall trajectories [21]. In addition, other linear models such as probability sampling [22] and curve fitting [23] were applied to WebGIS disaster monitoring systems and combined with physical models to achieve real-time monitoring of landslides and typhoons [24,25]. Because linear models are simple, easy to use, and give stable fitted relationships, a WebGIS system with an embedded linear model can monitor the target quickly and in real time. However, the relationship between environmental variables and monitoring targets is significantly nonlinear, so linear models have low prediction accuracy and cannot meet the need for high-precision evaluation of soil properties in a WebGIS system.
Compared with linear models, machine learning models evaluate soil properties with higher accuracy [26]. To our knowledge, embedding machine learning models into WebGIS systems to monitor soil properties has not been reported, although many conventional studies have used machine learning models to predict soil properties. For example, Xiao et al. used random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) models to predict soil salinity parameters [27]. They found that the overall performance of the XGB model was better than that of the SVM and RF models [27]. RF and Cubist models were used to predict the content of heavy metals in soil, and the RF model was more accurate than the Cubist model in predicting Ni and Cu [28]. Liu et al. compared the performance of random forest (RF), partial least squares regression (PLSR), support vector machine regression (SVMR), and Gaussian process regression (GPR) models in predicting soil water content (SWC), and showed that the GPR model performed best [29].
A high-performance machine learning model is the core of accurate real-time prediction of soil properties, which is in line with the future research trend of WebGIS systems. Although the choice of model is very important for a WebGIS system that predicts soil properties online, many studies have found that low model performance and prediction accuracy are essentially caused by mismatched hyperparameters [30][31][32][33]. Hyperparameters can significantly affect the final performance of machine learning models [34][35][36], yet traditional machine learning approaches to soil property prediction rely on manual or empirical hyperparameter adjustment, which is a key shortcoming. For example, the hyperparameters of the GBDT model are divided into structural parameters and regularization parameters. The structural parameters control the complexity and prediction accuracy of the model, and include max_depth (the maximum depth of each tree), max_features (the number of features considered when splitting a node), and n_estimators (the number of boosting iterations, i.e., the number of trees). The regularization parameters control over-fitting, and include min_impurity_decrease (the minimum impurity decrease required to split a node) and subsample (the proportion of samples drawn for each tree in the modeling process) [37]. When the hyperparameter space of the prediction model is large, these hyperparameters interact with each other. Manual parameter tuning is time-consuming and inefficient, and it is difficult to find the optimal hyperparameters this way. Quickly finding the optimal combination of hyperparameters is the key to improving the prediction performance and accuracy of the model in a WebGIS system.
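To make the cost of manual or exhaustive tuning concrete, the following minimal sketch counts the model fits that a full grid over five GBDT hyperparameters would require. The parameter names follow scikit-learn's `GradientBoostingRegressor` and the candidate values are hypothetical; neither is taken from the study's actual configuration.

```python
from itertools import product

# Hypothetical candidate values for five GBDT hyperparameters (illustrative only,
# not the search space used in the study).
search_space = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 5, 7, 9],
    "max_features": [0.3, 0.5, 0.7, 1.0],
    "min_impurity_decrease": [0.0, 0.01, 0.1],
    "subsample": [0.6, 0.8, 1.0],
}


def grid_size(space):
    """Number of model fits an exhaustive grid search would need."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def iter_grid(space):
    """Yield every hyperparameter combination as a dict."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


n_combinations = grid_size(search_space)
print(n_combinations)  # 5 * 4 * 4 * 3 * 3 = 720
```

Even this small grid needs 720 fits before cross-validation is taken into account, and each added hyperparameter multiplies that number, which is why a guided search is attractive.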
With the development of information technology, more and more optimization algorithms are being applied to machine learning hyperparameters. For example, Yu et al. used the TPE algorithm to optimize the hyperparameters of the CatBoost model, constructed the TPE-CatBoost soil moisture spatial estimation model, and showed that it has high estimation accuracy [38]. Compared with traditional hyperparameter tuning methods such as manual tuning, grid search (GS), and random grid search (RGS), the TPE algorithm adjusts parameters adaptively, explores globally, and can search the entire parameter space [39]. While searching for the global optimum, it avoids becoming trapped in local optima [38], which significantly improves the accuracy of machine learning prediction. To meet the need of WebGIS systems to handle massive multi-source heterogeneous data, embedding high-performance algorithms with hyperparameter optimization will be an inevitable requirement for achieving high-precision prediction and processing multi-dimensional data. However, to our knowledge, the application of TPE-optimized machine learning models to WebGIS monitoring systems has not been reported. Specifically, it is unclear whether WebGIS development technologies such as the Predictive Model Markup Language (PMML) and Java PMML (JPMML) can achieve cross-platform deployment of TPE-optimized machine learning models and encapsulate the deployed models without retraining or reloading, thereby enhancing the modular flexibility and scalability of the core models. Furthermore, it is also unclear whether a TPE-optimized machine learning model can reach its best performance and prediction accuracy inside a WebGIS system.
Therefore, the aim of this study is to develop a saline-alkali soil ecological monitoring system based on WebGIS technology for the Yellow River Delta of China, comprising a presentation layer, an application layer, a data analysis layer, an application support layer, and a database layer. The system provides soil statistical analysis, soil health evaluation, soil salinity prediction, and data management functions. A machine learning model optimized by TPE is embedded in the system, and the prediction accuracy of soil salinity is improved by combining vegetation index and air quality variables. The purpose of this study is to provide technical and application support for promoting sustainable agricultural development and saline-alkali soil ecological protection in the region.

Study Area
The study area is located in the Yellow River Delta (YRD) of Shandong Province, China (Figure 1). It is an alluvial plain formed by sediment deposited in the Bohai depression at the mouth of the Yellow River. The study area lies in the mid-latitude warm temperate zone and has a warm temperate continental monsoon climate, with an average annual temperature of 14.2 °C and annual precipitation of 794 mm. The topography is flat, the parent material of the soil is loess alluvium, and the soil is rich in organic matter and nutrients, which provides good conditions for agricultural production. The region has well-developed agriculture, with rice, wheat, and maize as the main crops, and is rich in biodiversity. The area also has a long history of oil extraction, fragile soil ecosystems, and severe salinization problems. Therefore, monitoring the soil ecological environment is crucial for the sustainable development of saline-alkali land in this area.

Requirement Analysis
The Yellow River Delta is home to China's Shengli Oilfield, which has a history of more than 60 years of oil exploitation. Human activities such as oil extraction and smelting have caused great damage to the soil ecology of the region. The area is also a typical coastal saline-alkali land, and the ecological environment of its farmland soil is very complex. This creates an urgent need for monitoring and evaluating the saline-alkali soil ecological environment. Monitoring the ecological environment of saline-alkali soil in the YRD also meets national strategic requirements for ecological protection. The Ministry of Ecology and Environment of China issued the "14th Five-Year Plan for Ecological and Environmental Monitoring" (28 December 2021), which calls for implementing the environmental quality monitoring network construction project, improving environmental quality monitoring and early warning capacity, strengthening water ecological environment monitoring capacity in the Yellow River Basin, and giving priority to supporting the ecological monitoring of national nature reserves such as the YRD. In addition, farmers and agricultural management departments need to understand farmland soil information and monitor the soil ecological environment in real time for agricultural production and land use in the Yellow River Delta. Although some scholars have studied farmland soil ecological monitoring in the Yellow River Delta region, for example by providing solutions for soil improvement and water and salt regulation, a complete ecological monitoring system that can serve as technical support is still lacking. Therefore, this study developed a WebGIS-based saline-alkali soil ecological monitoring system to meet the urgent needs of various users for real-time monitoring of soil salinization and for pollution risk prevention and control in the YRD.

Remote Sensing Data
To fully cover the study area, we selected two Sentinel-2 remote sensing images (Level-2A) acquired during the same overpass (23 September 2022), which were obtained from the data sharing website of the European Space Agency (https://dataspace.copernicus.eu/). ENVI 5.3 software was used to preprocess the 11 bands (B1-B11) of the Sentinel-2 image data, including band combination, mosaicking, and cropping, with the resolution set to 10 m × 10 m. Figure 2 shows eight soil texture feature statistics obtained by gray level co-occurrence matrix (GLCM) processing: Mean, Variance (VAR), Homogeneity (HOM), Contrast (CTRA), Dissimilarity (DIS), Entropy (ENT), Second-Order Moment (SECM), and Correlation (CORR). Four vegetation indices were obtained with the band calculation tool of ENVI (Figure 2): the Modified Normalized Difference Vegetation Index (MNDVI), the Normalized Difference Vegetation Index (NDVI), the Optimized Soil-Adjusted Vegetation Index (OSAVI), and the Plant Senescence Reflectance Index (PSRI). The calculation formulas of the vegetation indices are shown in Table 1.
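As an illustration of the band calculations, the sketch below computes three of the indices per pixel using their commonly published definitions. Table 1 is not reproduced here, so the formulas, the soil adjustment factor of 0.16, and the Sentinel-2 band mapping (red = B4, blue = B2, red edge = B6, NIR = B8) are assumptions that may differ from those used in the study; MNDVI is omitted because its definition varies between authors.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def osavi(nir, red, soil_factor=0.16):
    """Optimized Soil-Adjusted Vegetation Index with the usual 0.16 adjustment (assumed)."""
    return (nir - red) / (nir + red + soil_factor)


def psri(red, blue, red_edge):
    """Plant Senescence Reflectance Index: (Red - Blue) / RedEdge."""
    return (red - blue) / red_edge


# Hypothetical surface reflectance values for a single pixel (0-1 scale).
pixel = {"B2": 0.10, "B4": 0.10, "B6": 0.40, "B8": 0.50}

print(ndvi(pixel["B8"], pixel["B4"]))
print(osavi(pixel["B8"], pixel["B4"]))
print(psri(pixel["B4"], pixel["B2"], pixel["B6"]))
```

The same functions apply element-wise to whole rasters if the inputs are array types that support arithmetic; pixels where a denominator is zero would need to be masked first.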

Soil Properties Data
In late September 2022, a total of 60 soil samples were collected from the 0-20 cm surface layer at the sampling points in the study area. They were used to determine soil salinity, pH, and heavy metal content.
(1) Soil salinity was determined by the mass (gravimetric) method.
The collected soil samples were pre-treated by removing impurities, air-drying indoors, and grinding, and were then mixed with water at a 1:5 soil-to-water ratio and shaken. The soil-water mixture was filtered, and the leachate was drawn off with a volumetric pipette and placed in a beaker for water-bath heating. A small amount of 15% H2O2 was added to remove organic matter, and the leachate was then evaporated. The beaker was dried in an oven at 105 °C for 4 h, cooled, and weighed on an analytical balance to measure the residue mass. The beaker was then dried for a further 2 h and reweighed, and this was repeated until constant weight was reached. The total amount of soluble salts was calculated as follows (Equation (5)):

$$\text{Total soluble salt} = \frac{m_1 - m_2}{m} \tag{5}$$

where the total soluble salt is the mass fraction of soluble salt in the soil, $m_1$ is the combined mass of the beaker and the salt residue, $m_2$ is the mass of the empty beaker, and $m$ is the mass of soil sample corresponding to the volume of leachate drawn off.
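A small worked example of the residue calculation may help. It assumes the salt content is the residue mass divided by the equivalent soil mass, as the variable definitions imply; the function name, the g kg⁻¹ conversion, and the masses are hypothetical.

```python
def total_soluble_salt(m1_g, m2_g, m_g):
    """Mass fraction of soluble salt: (beaker + residue - empty beaker) / soil mass."""
    if m_g <= 0:
        raise ValueError("soil mass must be positive")
    return (m1_g - m2_g) / m_g


# Hypothetical weighings in grams: beaker with residue, empty beaker, equivalent soil mass.
fraction = total_soluble_salt(m1_g=50.0246, m2_g=50.0000, m_g=10.0)
print(fraction)           # mass fraction (g of salt per g of soil)
print(fraction * 1000.0)  # the same value expressed in g per kg
```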
(2) The soil pH value was determined by the potentiometric method.
We weighed 10 g of air-dried soil that had passed through a 2 mm sieve into a beaker and added 25 mL of carbon dioxide-free water or calcium chloride solution. The mixture was stirred with a glass rod and left to stand for 30 min, and was then measured with a pH meter. The pH value was recorded once the reading was stable. Figure 4 shows the interpolation results of soil pH and soil salinity obtained by ArcGIS processing.
(3) Determination of heavy metal content in soil.
Soil was taken from 4-6 points in each field, mixed, and packed into sample bags, and the coordinates of the sampling points were determined by high-precision GPS. The soil samples were brought back to the laboratory for pretreatment (natural air drying, removal of debris, crushing, and grinding) and then passed through a 200-mesh nylon sieve. We weighed 0.2 g of soil sample and added 6 mL of HNO3, 2 mL of HCl, and 2 mL of HF (a 3:1:1 ratio) in a fume hood, running three parallel replicates of all operations. The samples were digested in a microwave digestion system (CEM MARS5, USA), with one GSS-1 standard soil sample and one blank added to each batch. The contents of three heavy metals (As, Pb, and Zn) were measured using an inductively coupled plasma mass spectrometer (Agilent ICP-MS 7800, USA).

Risk Assessment Method of Soil Heavy Metals
The risk assessment of heavy metals in saline-alkali soil is crucial for the protection of farmland ecology. This study used three soil heavy metal evaluation methods: the single pollution index, the Nemerow comprehensive pollution index, and the Hakanson potential ecological risk index.

Single Pollution Index
The single pollution index model is used to evaluate the risk of a single heavy metal element in farmland soil [44], and can identify the main heavy metal pollutants and their degree of pollution. The calculation formula (Equation (6)) is as follows:

$$P_i = \frac{C_i}{S_i} \tag{6}$$

where $C_i$ (mg kg⁻¹) is the measured content of the i-th heavy metal in soil, $S_i$ (mg kg⁻¹) is the risk assessment standard threshold of the i-th heavy metal, and $P_i$ is the single factor index value of the i-th heavy metal element in the soil. The soil heavy metal standard used in this study is based on the "Soil Environmental Quality Agricultural Land Soil Pollution Risk Control Standard (Trial)" (GB 15618-2018) [45].
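The single pollution index, the ratio of measured content to standard threshold, can be sketched in a few lines. The threshold and sample values below are illustrative placeholders only; the applicable GB 15618-2018 screening values depend on soil pH and land use and should be substituted in practice.

```python
def single_pollution_index(measured, threshold):
    """P_i = C_i / S_i for one heavy metal (both in mg/kg)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return measured / threshold


# Placeholder thresholds in mg/kg (assumed for illustration, not quoted from the standard).
THRESHOLDS = {"As": 25.0, "Pb": 170.0, "Zn": 300.0}

# Hypothetical measured contents at one sampling point, in mg/kg.
sample = {"As": 12.5, "Pb": 34.0, "Zn": 75.0}

indices = {
    metal: single_pollution_index(content, THRESHOLDS[metal])
    for metal, content in sample.items()
}
print(indices)  # {'As': 0.5, 'Pb': 0.2, 'Zn': 0.25}
```

A value above 1 means the measured content exceeds the chosen threshold for that element.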

Nemerow Comprehensive Pollution Index
The Nemerow comprehensive pollution index comprehensively reflects the cumulative risk of multiple heavy metals at a single sampling point [46]. The calculation formula (Equation (7)) is as follows:

$$P_{total} = \sqrt{\frac{\bar{P}_i^{\,2} + P_{i\max}^{2}}{2}} \tag{7}$$

where $P_{total}$ is the Nemerow comprehensive pollution index value of the heavy metals at a soil sampling point, $\bar{P}_i$ is the mean of the single factor indices of the heavy metal elements at that point, and $P_{i\max}$ is the maximum of those single factor indices. The Nemerow comprehensive pollution index is divided into five grades, and the evaluation results are shown in Table 2.
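A minimal sketch of the comprehensive index follows, assuming the standard Nemerow form that combines the mean and the maximum of the single factor indices at a sampling point. The function name and input values are hypothetical.

```python
import math


def nemerow_index(single_indices):
    """sqrt((mean(P_i)^2 + max(P_i)^2) / 2) over the metals at one sampling point."""
    if not single_indices:
        raise ValueError("at least one single factor index is required")
    p_mean = sum(single_indices) / len(single_indices)
    p_max = max(single_indices)
    return math.sqrt((p_mean ** 2 + p_max ** 2) / 2.0)


# Hypothetical single factor indices for three metals at one point.
p_total = nemerow_index([3.0, 1.0, 2.0])
print(p_total)  # mean = 2, max = 3, so sqrt(6.5)
```

Because the maximum enters the formula directly, one strongly polluting element raises the index even when the others are low.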

Hakanson Potential Ecological Risk Index
The potential ecological risk index is a risk assessment method for heavy metals in farmland soil [47], which considers the toxicological properties of each heavy metal, the degree of soil pollution, and the comprehensive ecological risk. The calculation process (Equations (8)-(10)) is as follows:

$$C_f^i = \frac{C^i}{C_n^i} \tag{8}$$

where $C^i$ is the measured concentration of the i-th heavy metal element (mg kg⁻¹), $C_n^i$ is the evaluation reference value of the i-th heavy metal element in saline-alkali soil, and $C_f^i$ is the pollution coefficient of the single element.

$$E_r^i = T_r^i \times C_f^i \tag{9}$$

where $T_r^i$ is the toxicity response coefficient of the i-th heavy metal, and $E_r^i$ is the potential ecological risk index of the i-th heavy metal element alone. The toxicity coefficients used in the potential ecological risk index are given in Table 3.

$$RI = \sum_{i=1}^{n} E_r^i \tag{10}$$

where RI is the potential ecological risk index of the heavy metals in saline-alkali soil. The evaluation results of the potential ecological risk index of heavy metals are divided into five levels (Table 4).
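The three steps (pollution coefficient, single-element risk, and their sum RI) can be sketched as below. The toxicity coefficients (As = 10, Pb = 5, Zn = 1) are the values commonly cited for Hakanson's method and are assumed here because Table 3 is not reproduced; the measured and reference contents are hypothetical.

```python
# Commonly cited Hakanson toxicity response coefficients (assumed; check against Table 3).
TOXICITY = {"As": 10.0, "Pb": 5.0, "Zn": 1.0}


def potential_ecological_risk(measured, reference, toxicity=TOXICITY):
    """Return ({metal: E_r}, RI), where E_r = T_r * (C / C_n) and RI = sum of E_r."""
    single_risk = {}
    for metal, content in measured.items():
        pollution_coefficient = content / reference[metal]  # C_f = C / C_n
        single_risk[metal] = toxicity[metal] * pollution_coefficient
    return single_risk, sum(single_risk.values())


# Hypothetical measured and reference contents in mg/kg.
measured = {"As": 20.0, "Pb": 40.0, "Zn": 100.0}
reference = {"As": 10.0, "Pb": 20.0, "Zn": 50.0}

single_risk, ri = potential_ecological_risk(measured, reference)
print(single_risk)  # {'As': 20.0, 'Pb': 10.0, 'Zn': 2.0}
print(ri)           # 32.0
```

With equal pollution coefficients, As dominates RI because of its larger toxicity coefficient, which is the weighting the method is designed to provide.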

System Architecture
The system adopts a five-layer architecture, consisting of a presentation layer, an application layer, a data analysis layer, an application support layer, and a database layer (Figure 6). The modules interact with each other through interfaces to provide data statistical analysis, soil health assessment, soil salinity prediction, and data management. The system architecture is based on a front-end technology stack (Vue.js, Element UI, Node.js, Axios, and Echarts.js), a back-end technology stack (Spring Boot and MyBatis-Plus), cross-platform deployment of machine learning models (Predictive Model Markup Language (PMML) and Java PMML (JPMML)), and the MySQL database. These development tools and technologies improve development efficiency and system performance, giving the system a good user interface, reliable back-end logic, efficient data interaction, and scalable machine learning capabilities.

Presentation Layer
The presentation layer is the front end of the system and consists of the user interface. In this study, Vue.js, Element UI, and Echarts.js were used for front-end development, and Node.js was used as the server runtime environment. Vue.js is a popular JavaScript framework for building visual interactive interfaces, to which libraries can be added according to user needs [48]. Element UI is a desktop UI component library based on Vue.js that provides a wealth of components and templates for interface design. Echarts.js is an open-source JavaScript visualization library for creating interactive data visualization charts [49]. Axios is used for front-end data interaction and webpack for packaging. Axios is a promise-based HTTP client that automatically transforms the data format of requests and responses, which simplifies data-handling code. In the presentation layer of this system, the Vue.js framework is used to build the front-end user interface for data visualization, and the components and templates provided by Element UI are embedded to improve the appearance and ease of use of the interface. Finally, Echarts.js is used to build and render visual charts of air quality and vegetation index data.

Application Layer
The application layer is the back end of the system, including the back-end framework, the data access layer, and the application support layer. First, the back end of the system uses Spring Boot as its framework to provide basic Web development capabilities and easy configuration. Spring Boot is an open-source framework for building Java applications that simplifies application development, deployment, and configuration [50]. It follows a convention-over-configuration approach that allows developers to set up and run standalone applications quickly through automatic configuration and default values. With the Spring Boot framework, the system can be built quickly and can serve data efficiently, which improves its reliability and development efficiency. Second, the data access layer uses MyBatis-Plus as an Object-Relational Mapping (ORM) tool to simplify access to the MySQL database. It implements the create, read, update, and delete operations on soil attribute data and the paging function of the interface. MyBatis-Plus is an enhanced tool library based on the MyBatis framework that provides a range of convenient functions and features. It retains the flexibility and powerful SQL-writing ability of MyBatis while simplifying and enhancing interaction with the database [51]. Third, the application support layer uses ArcGIS Server to host the map service and the ArcGIS API for JavaScript as the map development kit.

Data Analysis Layer
(1) Hyperparameter optimization machine learning model
To predict soil salinity, we built a hyperparameter-optimized machine learning model in two stages. In the first stage, the hyperparameters of the RF and GBDT models are optimized using the TPE algorithm in a Python environment, so that the models achieve better accuracy in training and prediction. In the second stage, the hyperparameters optimized by TPE are used to establish soil salinity prediction models based on random forest (RF) and gradient boosting decision tree (GBDT). The details are as follows:

① TPE hyperparameter optimization
The Tree-structured Parzen Estimator (TPE) is a single-objective Bayesian optimization algorithm, built on tree-structured Parzen density estimators, for solving global optimization problems. Compared with other parameter optimization algorithms such as Grid Search (GS) and Random Grid Search (RGS), TPE adaptively adjusts the size of the parameter search space and searches more efficiently. It can find a good hyperparameter configuration for a machine learning model in as few iterations as possible, which saves time [52]. The TPE algorithm uses a Gaussian mixture model to model the hyperparameters. During optimization, the sampled hyperparameter sets are divided into two probability density functions: the distribution $l(x)$ formed by the hyperparameter sets with less risk (lower loss) and the distribution $g(x)$ formed by the hyperparameter sets with greater risk (higher loss). TPE defines the two probability density functions as follows (Equations (11) and (12)):

$$l(x) = p(x \mid y < y^{*}) \tag{11}$$

$$g(x) = p(x \mid y \geq y^{*}) \tag{12}$$

where $y^{*}$ is the threshold, which is the value at quantile $r$ of the observed set of $y$ values. The range of $r$ is (0, 1), and its default value is 0.15. TPE selects expected improvement (EI) as the acquisition function. The calculation formula is:

$$EI_{y^{*}}(x) = \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y \mid x)\, dy \propto \left( r + \frac{g(x)}{l(x)} (1 - r) \right)^{-1} \tag{13}$$

Equation (13) shows that EI increases with $l(x)$ and decreases with $g(x)$, so the most appropriate $x$ is the one that makes the ratio $l(x)/g(x)$, and hence the EI value, as high as possible in the iterative process. In each iteration, the algorithm returns the $x^{*}$ with the largest EI value, which then participates in the next iteration. Finally, the optimal hyperparameter combination of the RF and GBDT models is found.
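The split-and-select idea behind TPE can be illustrated with a minimal one-dimensional sketch: observations are divided at a quantile into a "good" and a "bad" group, a Parzen (kernel) density is built on each, and the next point is the candidate with the largest density ratio. This is not the implementation used in the system. The fixed kernel bandwidth, the toy objective, and all names (`kde`, `tpe_minimize`, `n_candidates`) are assumptions made for illustration; in practice the objective would be the cross-validated error of an RF or GBDT model.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian Parzen-window density estimate built on `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / norm + 1e-12  # small constant avoids division by zero

    return density


def tpe_minimize(objective, low, high, n_init=30, n_iter=40,
                 quantile=0.15, n_candidates=50, seed=0):
    """Minimize a 1-D objective with a simplified TPE loop (fixed bandwidth)."""
    if n_init < 2:
        raise ValueError("n_init must be at least 2")
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # random warm-up
    ys = [objective(x) for x in xs]
    bandwidth = (high - low) / 10.0  # simplification: real TPE adapts this

    for _ in range(n_iter):
        # Split observations at the chosen quantile of the objective values.
        order = sorted(range(len(xs)), key=lambda i: ys[i])
        n_good = max(1, min(len(xs) - 1, math.ceil(quantile * len(xs))))
        good = [xs[i] for i in order[:n_good]]
        bad = [xs[i] for i in order[n_good:]]
        l, g = kde(good, bandwidth), kde(bad, bandwidth)

        # Draw candidates from l(x): pick a good point and add Gaussian noise.
        candidates = []
        for _ in range(n_candidates):
            c = rng.gauss(rng.choice(good), bandwidth)
            candidates.append(min(high, max(low, c)))

        # EI grows with l(x)/g(x), so evaluate the candidate with the largest ratio.
        x_next = max(candidates, key=lambda c: l(c) / g(c))
        xs.append(x_next)
        ys.append(objective(x_next))

    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


def toy_objective(x):
    """Stand-in for a validation error; its minimum is at x = 2."""
    return (x - 2.0) ** 2


best_x, best_y = tpe_minimize(toy_objective, low=-5.0, high=5.0)
print(best_x, best_y)
```

Sampling candidates from the "good" density and ranking them by the ratio concentrates evaluations where low losses have already been observed, while the "bad" density steers the search away from regions known to perform poorly.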

② RF
Random Forest (RF) is a decision tree ensemble model based on the Bagging principle. The RF model uses the Bootstrap Aggregating (Bagging) method to randomly resample the data into multiple subsets and builds a decision tree on each. For regression problems, the average of the predictions of all decision trees is taken as the final result [53,54]. RF is efficient, easy to use, and handles high-dimensional data well. It has been widely used in soil attribute prediction [27].

③ GBDT
Gradient Boosting Decision Tree (GBDT) is an ensemble learning model based on Boosting [55]. By improving the model step by step, it can handle high-dimensional data and nonlinear relationships, and it performs well in data analysis and prediction. It uses the Classification and Regression Tree (CART) as the weak learner and constructs multiple weak learners by iteratively building decision trees. GBDT uses gradient descent to train each newly added weak learner on the negative gradient of the current model's loss function [56]. The trained weak learner is then added to the model, and the loss function is fitted repeatedly so that the model is continuously improved. The core formulas of the GBDT algorithm are as follows. GBDT first initializes the decision tree model (Equation (14)):

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \tag{14}$$

where $f_0(x)$ is the initialized weak learner, $c$ is the optimal constant, i.e., the constant that minimizes the total loss over the data set, and $L(y_i, c)$ is the loss function, which represents the error between the predicted value and the actual target value. Each weak learner adjusts to the training samples according to the performance of the previous weak learner by calculating the residual error left by the previous weak learner (Equation (15)):

$$r_{ki} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{k-1}(x)} \tag{15}$$

Then the residuals $r_{ki}$ are fitted to create a new weak learner $T_k(x)$, and the learner is updated (Equation (16)):

$$f_k(x) = f_{k-1}(x) + T_k(x) \tag{16}$$

These steps are repeated until the number of weak learners reaches the value $K$ set by the hyperparameter. Finally, the predictions of all weak learners are summed to form a strong learner, which outputs the predicted soil properties (Equation (17)):

$$F(x) = f_0(x) + \sum_{k=1}^{K} T_k(x) \tag{17}$$
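The boosting loop (initialize with a constant, compute residuals, fit a weak learner to them, update) can be sketched with one-split regression trees under squared loss, for which the negative gradient is simply the residual. The stump learner, the shrinkage factor `learning_rate`, and the toy data are assumptions for illustration and do not reproduce the study's GBDT configuration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gbdt_fit(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss; returns a prediction function."""
    f0 = sum(ys) / len(ys)  # the constant minimizing squared loss is the mean
    trees = []
    preds = [f0] * len(xs)
    for _ in range(n_trees):
        # For squared loss the negative gradient equals the residual y - f(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    return predict


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy nonlinear data (hypothetical): y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]

model = gbdt_fit(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
model_mse = mse(ys, [model(x) for x in xs])
print(baseline_mse, model_mse)
```

The shrinkage factor slows each update so that later trees can correct earlier ones; it is a common addition in practice and is not part of the four-step description above.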
The data set for the prediction model is based on the spatial interpolation and resampling results of air quality, vegetation indices, and soil properties (at a unified resolution of 10 m × 10 m), and the point data set is obtained by raster-to-point conversion in ArcGIS. The two machine learning models with optimized hyperparameters were used for prediction and accuracy evaluation. The predicted point data were converted back to raster in ArcGIS to obtain the spatial prediction results of soil properties in the YRD for each model. The coefficient of determination (R²) and the root mean squared error (RMSE) were used to compare the measured and predicted values of soil salinity and evaluate the accuracy of the models [57]. The accuracy and robustness of each model were also assessed from the fit of the resulting scatter plot. The Shapley Additive exPlanations (SHAP) method is used to explain the contribution of covariates to the prediction target [58]. We used SHAP to rank the importance of the variables and to examine their relationship with predicted soil salinity.
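For reference, the two accuracy metrics can be computed as in the sketch below, using their standard definitions; the measured and predicted salinity values are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical measured and predicted soil salinity values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

print(r_squared(measured, predicted))  # 1 - 0.5 / 5.0 = 0.9
print(rmse(measured, predicted))       # sqrt(0.5 / 4)
```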
(2) Cross-platform deployment of machine learning models (PMML, JPMML)
Predictive Model Markup Language (PMML) and Java PMML (JPMML) are used to implement cross-platform deployment of the machine learning models. PMML is a standard XML format for describing and exchanging prediction models [59]. It defines a general model representation into which various machine learning and statistical models (such as decision trees, random forests, and linear regression) can be converted [60]. A model can therefore be migrated from one platform or environment to another without being retrained or re-implemented, which makes its deployment and application more flexible and scalable. JPMML is a PMML interpreter for the Java platform. It supports a variety of machine learning models and can handle complex model transformation and evaluation tasks. With JPMML, developers can deploy and apply PMML-encoded prediction models in Java applications, which allows models to be shared and applied across platforms and systems.
(3) Database design
Figure 7 shows the database design of the system. This study used MySQL as the database management system for storing and managing soil attribute data and air quality factor data. The soil attribute data include sampling point coordinates, pH, heavy metal content, and soil salt content, and the air quality factor data cover pollutant gases from January to December in the study area. Multiple tables of environmental data were established in the MySQL database, including AQI, O3, PM2.5, Mental, and Point. In the database design, the ID, Longtitude, and Latitude fields exist in every table. The air quality tables (i.e., AQI, O3, and PM2.5) contain the monthly averages from January to December. The Mental table contains fields for the content of As, Pb, and Zn, which are used to assess soil ecological risk. The Point table contains fields for pH and Salt, which are used to calculate the spatial interpolation results in the WebGIS system. The data type of all common fields (i.e., ID, Longtitude, and Latitude) is set to int, and the data type of the air quality and soil property fields is set to double.
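The table layout described above can be sketched as follows. The sketch uses Python's built-in `sqlite3` module purely as a stand-in for MySQL so that it runs without a server. The monthly column names (`M01` to `M12`), the table name `PM25` (written without the dot), and the inserted row are assumptions; the table and field names Mental, Point, and Longtitude follow the text as written.

```python
import sqlite3

# Hypothetical monthly column names M01..M12 for the January-December averages.
MONTH_COLUMNS = [f"M{month:02d}" for month in range(1, 13)]
COMMON_FIELDS = "ID INTEGER PRIMARY KEY, Longtitude INTEGER, Latitude INTEGER"


def create_schema(conn):
    """Create the five environmental tables described in the database design."""
    monthly = ", ".join(f"{col} REAL" for col in MONTH_COLUMNS)
    for table in ("AQI", "O3", "PM25"):  # PM25 stands for the PM2.5 table
        conn.execute(f"CREATE TABLE {table} ({COMMON_FIELDS}, {monthly})")
    # "As" is quoted because AS is an SQL keyword.
    conn.execute(f'CREATE TABLE Mental ({COMMON_FIELDS}, "As" REAL, Pb REAL, Zn REAL)')
    conn.execute(f"CREATE TABLE Point ({COMMON_FIELDS}, pH REAL, Salt REAL)")
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Hypothetical record for one sampling point.
conn.execute(
    "INSERT INTO Point (ID, Longtitude, Latitude, pH, Salt) VALUES (?, ?, ?, ?, ?)",
    (1, 118, 37, 8.2, 3.5),
)
print(conn.execute("SELECT pH, Salt FROM Point WHERE ID = 1").fetchone())
```

In MySQL the `REAL` columns would be declared as `DOUBLE` and the integer fields as `INT`, matching the types stated in the design.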

Design of WebGIS System Functions
The WebGIS system functions are released hierarchically to the cloud platform server according to the data response, in order to support multi-level management and services for soil ecology and attribute prediction in the study area. The logical view of the functional architecture is designed as a tree structure. The leftmost part of Figure 8 is organized by data type, including soil measured data, remote sensing data, air quality data, vegetation index data, and soil texture data. The right side of Figure 8 shows the deployed functions, including data statistical analysis, soil health evaluation, soil salinity prediction, and data management. The leaf nodes implement the detailed functions, and the system can be extended modularly to meet new requirements in the future.

Data Statistical Analysis
As the basis of the WebGIS system, this function displays and analyzes all spatial and non-spatial statistical data related to soil ecology of the study area, including air quality, vegetation index, and soil texture modules.

Air Quality Monitoring
To present the air quality of the study area from multiple angles, three display modes are used: statistical tables, statistical charts and online maps. The statistical table (top left of Figure 9) shows the concentration standards of PM2.5, O3, PM10, SO2, NO2 and CO, and the horizontal histogram (bottom left of Figure 9) shows the proportion of pollutant gases in the primary and secondary standards. The vertical histogram compares the monthly mean AQI, which is high in winter and low in summer. At the far right of Figure 9, the user can select a monitoring station on the map to see the air pollution level at that site. The pie chart shows the share of each month's concentration of a pollutant such as O3 or PM2.5 in the total annual concentration. In the main interface (middle of Figure 9), the user can observe the spatial distribution of the pollutant gases in the study area by selecting an interpolation map. The interpolation maps are published with ArcGIS Server on the cloud server. For example, when SO2 is selected in the layer drop-down box, the legend shows that the SO2 concentration ranges from 8.31 μg m⁻³ to 13.25 μg m⁻³. The SO2 concentration is higher in the northeast and south of the study area and lower in the middle. This function helps users identify areas with serious air pollution and take targeted control measures in a timely manner.

Vegetation Index Analysis
Since the vegetation index is a relatively abstract concept for ordinary users, this module provides an introductory text on the concept of the vegetation index and a bubble diagram of its related applications (left side of Figure 10). With this introduction, the user can readily interpret the line chart that compares the descriptive statistics of the vegetation indices. The map overlay shows the spatial distribution of each vegetation index in the study area, and the pie chart shows the area proportion of each level. For example, by viewing the spatial interpolation results of the MNDVI index (Figure 10), users can intuitively analyze, according to the color, where the MNDVI index is sparse (green areas) and dense (red areas) in the study area. Areas with higher MNDVI values are concentrated near the seaside, and vegetation coverage is also higher on both sides of the river in the inland area, which may be related to the dense farmland nearby and to sufficient soil nutrients and water content. In contrast, the MNDVI index is lower in the north and east. In this module, users can qualitatively infer the soil ecological condition of saline-alkali land in the study area from the vegetation coverage characteristics.

Soil Texture Analysis
To help users understand the soil texture characteristics of the study area, the concept of soil texture, the particle composition and the texture types are presented on the left side of Figure 11. The main interface displays remote sensing images of eight soil textures in the study area, namely Mean, VAR, HOM, CTRA, DIS, ENT, SECM and CORR, and users can browse them with the arrow buttons. For example, after the Mean transformation, users can clearly observe how the soil texture characteristics vary with color across the region. The soil textures near the ocean (purple and blue patches) are homogeneous because the land use types in these areas (mainly wetlands and tidal flats) are monotonous. The yellow, green and brown patches far from the ocean have high texture features. In these areas, farmland, forest land, grassland, construction land and other land use types are mixed, resulting in complex soil texture features. This module provides data support for farmland soil ecological assessment and for the spatial prediction of soil properties.

Ecological Risk Assessment
The ecological risk assessment function includes the single pollution index, the Nemero comprehensive pollution index and the Hakanson potential ecological hazard index. Users can browse the rating criteria for each pollution index by clicking the tab 'See the evaluation criteria'. Taking the Hakanson ecological risk assessment as an example (Figure 12), a single sampling point is first selected to obtain its ID, latitude and longitude. Clicking the tab 'View the data for that point' then shows that the concentrations of As, Pb, and Zn in the soil are 25.377 mg kg⁻¹, 27.383 mg kg⁻¹, and 38.163 mg kg⁻¹, respectively. Finally, the Hakanson potential ecological risk index of this point is calculated. At the bottom of Figure 12, the comprehensive index RI is 35.913, indicating that this point is at a minor ecological risk level. In the bar chart, although all three heavy metals were at a low ecological risk level, the Hakanson potential ecological risk index of soil As was considerably higher than that of Pb and Zn, indicating that As had higher accumulation and ecological risk. Similarly, users can choose the single pollution index or the Nemero comprehensive pollution index to evaluate the soil ecology and determine whether the soil in the area is polluted by heavy metals.
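The three indices can be sketched in a few lines. The sketch below uses the usual textbook forms: the single pollution index P_i = C_i / S_i, the Nemero (often spelled Nemerow) comprehensive index sqrt((P_max² + P_avg²) / 2), and the Hakanson index RI = Σ T_i · C_i / B_i. The toxicity coefficients (As = 10, Pb = 5, Zn = 1) are the values commonly used with Hakanson's method and may differ from the ones tabulated for this system. The reference values in REFERENCE are hypothetical placeholders, so the output is not expected to reproduce the RI of 35.913 reported above.

```python
import math

# Toxicity coefficients commonly used with Hakanson's index (assumption:
# the system's own table may list different values).
TOXICITY = {"As": 10.0, "Pb": 5.0, "Zn": 1.0}

# Hypothetical reference (background/standard) concentrations in mg/kg.
REFERENCE = {"As": 10.0, "Pb": 25.0, "Zn": 60.0}


def single_pollution_index(conc, reference):
    """P_i = measured concentration / reference concentration."""
    return {metal: conc[metal] / reference[metal] for metal in conc}


def nemero_index(conc, reference):
    """Comprehensive index: sqrt((P_max^2 + P_avg^2) / 2)."""
    p = list(single_pollution_index(conc, reference).values())
    p_avg = sum(p) / len(p)
    return math.sqrt((max(p) ** 2 + p_avg ** 2) / 2.0)


def hakanson_ri(conc, reference, toxicity=TOXICITY):
    """Return the per-metal risk factors E_i and their sum RI."""
    e = {m: toxicity[m] * conc[m] / reference[m] for m in conc}
    return e, sum(e.values())


# Concentrations of the sampling point discussed in the text (mg/kg).
point = {"As": 25.377, "Pb": 27.383, "Zn": 38.163}
factors, ri = hakanson_ri(point, REFERENCE)
print(single_pollution_index(point, REFERENCE))
print(nemero_index(point, REFERENCE))
print(factors, ri)
```

The per-metal factors returned by `hakanson_ri` correspond to the bars of the chart in the interface, and their sum corresponds to the comprehensive index RI.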

Evaluation of Spatial Variability of Soil Properties
Users can select the spatial distribution results of different soil properties in the left column, which intuitively show the spatial variability of soil pH and soil salinity in the region. For example, areas with high soil pH are concentrated in the north and south of the study area (Figure 13), indicating that soil alkalinity is strong in these areas and that its distribution has significant spatial heterogeneity. Based on the spatial variability of soil pH and salinity, users can prevent and control salinization, and raise crop yield and quality by improving soil quality.

SHAP-Based Variable Importance Analysis
The SHAP analysis results (upper right of Figure 14) show, as a histogram, the total importance of the different environmental variables in the prediction process. The higher the SHAP value, the greater the contribution of the factor to the fitting accuracy of soil salinity. The scatter plot of Figure 14 also shows the positive and negative driving effects and the cumulative importance of the environmental variables at each point. When the red points gather in the positive area on the right side of the axis and the blue points gather in the negative area on the left side, the factor has a positive driving effect on soil salinity, and vice versa. For example, the air quality factors O3 and PM10 have a significant positive driving effect on the fitting accuracy of soil salinity, while SO2 and NO2 have a negative driving effect. In addition, the lower left of the interface uses the coefficient of variation to describe the degree of variation of soil salinity and of the various features. A feature with a high coefficient of variation has dispersed, highly variable data.
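The two summary statistics used in this module can be sketched as follows. The global importance of a variable is taken here as the mean absolute SHAP value over all samples, which is the usual convention for SHAP bar charts, and the coefficient of variation is the standard deviation divided by the mean. The SHAP values themselves would come from an explainer applied to the trained model; the small matrix below is hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics


def mean_abs_importance(shap_rows, names):
    """Global importance: mean |SHAP value| per feature over all samples."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (assumption)."""
    return statistics.stdev(values) / statistics.fmean(values) * 100.0


# Hypothetical SHAP values: one row per sample, one column per feature.
names = ["O3", "SO2"]
shap_rows = [[0.2, -0.1], [-0.4, 0.3]]

print(mean_abs_importance(shap_rows, names))
print(coefficient_of_variation([1.0, 2.0, 3.0]))
```

The sign of each individual SHAP value, which this summary discards, is what the scatter plot uses to show whether a variable drives the prediction up or down.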

Data Management
The data management module shows the detailed information of the soil sampling points (Figure 15) and allows users to quickly browse, query and operate on the data. The main interface of the module provides functions to add, modify, delete and refresh data. To add a sampling point, the user clicks the add button and enters the index information of the soil sampling point. The user can also click the modify button to update the data of a sampling point as its information changes, which keeps the data up to date. This module thus provides comprehensive management of the soil sampling point data.

The Application Prospect of Hyperparameter Optimization Machine Learning in WebGIS System
Predicting soil properties is essential for monitoring soil ecological health, and selecting accurate models and algorithms is the key to soil attribute prediction. Applying accurate algorithms or models in a WebGIS system improves prediction accuracy, which is consistent with the development trend of existing WebGIS systems. For example, Sciortino et al. applied an SL algorithm in WebGIS, using the SL prediction model to monitor changes in product shelf life and thereby predict shelf life in real time [61]. Yong et al. applied an exponential smoothing prediction model in a WebGIS system to predict soil heavy metal pollution [15], and their results showed that the model could predict soil heavy metal pollution well. With the development of automated machine learning algorithms, hyperparameter optimization has become a key step in improving the prediction accuracy and generalization ability of machine learning models [62]. This system applied the TPE algorithm to optimize the key hyperparameters of the RF and GBDT models, which significantly improved their prediction accuracy.
Traditional hyperparameter optimization by manual tuning or grid search is uncertain and easily falls into a local optimum, with a high search time cost and poor performance. The TPE algorithm has proved to be a high-performance tuning method in many studies [63,64]. It selects the next candidate hyperparameter configuration based on the performance of the previously evaluated configurations. Over thousands of iterations, TPE keeps approaching the minimum of the loss function and thus finds the optimal hyperparameters of the model at a low time cost. The advantage of applying the TPE algorithm in this study is that it reduces the time cost of hyperparameter optimization and significantly improves the prediction efficiency and fitting accuracy of the model. Combining WebGIS technology with the TPE-RF and TPE-GBDT models achieves soil salinity prediction with strong generalization ability and high precision. However, to meet the need of WebGIS systems to process massive multi-source heterogeneous data, embedding deep learning algorithms is likely to become the trend for high-precision prediction and processing of multi-dimensional data. Deep learning models have a large number of hyperparameters with complex combinations, have stronger learning and generalization ability [65], and have great potential in dealing with multidimensional data from cross-scale fusion [66]. Some difficulties remain. For example, unlike the decision tree model, the hyperparameter optimization interface of a neural network model needs to be clearly defined. In particular, some deep neural networks such as CNNs have complex hyperparameters, and optimizing them is very difficult. Future research should consider using the TPE algorithm to optimize the hyperparameters of deep learning models to improve the computational efficiency of the system.
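The idea behind TPE can be sketched for a single continuous hyperparameter. The sketch below is a deliberately simplified illustration, not the implementation used in this system: it splits the evaluated points into a "good" and a "bad" group by a quantile gamma, models each group with a Parzen (Gaussian kernel) density, samples candidates from the good density l(x), and evaluates the candidate with the largest ratio l(x)/g(x). The fixed kernel bandwidth, the parameter defaults and the quadratic toy loss are all assumptions made for the example; libraries such as Hyperopt and Optuna provide full TPE implementations.

```python
import math
import random


def kde(x, points, bandwidth):
    """Parzen (Gaussian kernel) density estimate at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_minimize(loss, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=24, seed=0):
    """Simplified one-dimensional TPE; returns ((best_x, best_loss), history)."""
    if n_init < 2:
        raise ValueError("n_init must be at least 2")
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplification
    # Start with random evaluations.
    history = [(x, loss(x)) for x in
               (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        # Split the observations into good (lowest losses) and bad groups.
        history.sort(key=lambda item: item[1])
        n_good = max(1, int(gamma * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]
        # Sample candidates from l(x), clipped to the search bounds.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        # Evaluate the candidate that maximizes l(x) / g(x).
        nxt = max(candidates,
                  key=lambda c: kde(c, good, bandwidth)
                  / (kde(c, bad, bandwidth) + 1e-12))
        history.append((nxt, loss(nxt)))
    return min(history, key=lambda item: item[1]), history


# Toy loss standing in for a validation error over one hyperparameter.
(best_x, best_loss), history = tpe_minimize(lambda x: (x - 2.0) ** 2, -10.0, 10.0)
print(best_x, best_loss)
```

Unlike grid search, each new evaluation is placed where past results suggest low loss is likely, which is why the method needs fewer model fits to reach a good configuration.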

The Synergy of Multi-Source Environmental Variables in WebGIS System
Using multiple covariates can improve the prediction accuracy of the model, and environmental variables are strongly correlated and affect each other. To comprehensively monitor the ecological environment of the YRD, the system therefore integrates multiple data sources related to the soil ecological environment of the YRD in the data analysis layer, including air quality factor data, soil texture, vegetation indices, soil attribute data and other data. Using WebGIS technology, the multi-source data are shared in a visual form, which helps users to comprehensively analyze and monitor soil ecological health in the YRD. This is similar to approaches in the existing literature that use multi-source data for comprehensive analysis. Recently, Mallinis et al. used WebGIS technology to build a visual platform that integrates thematic data on all available ecosystem types, ecosystem ranges, ecosystem conditions, ecosystem services and other information to help decision makers conduct overall assessment and comprehensive analysis of ecosystems [67]. Similarly, Yao et al. presented a variety of data, such as locust spatial data, field survey data, control data, and meteorological data, in a locust decision support system, which helps users fully understand the occurrence of locust disasters and make decisions [18].
However, some limitations should also be taken into account. The data may be insufficiently complete and timely, which affects the accurate monitoring of the soil ecological environment. For example, the soil attribute data used in this system were obtained by manual field sampling. Owing to limitations of environment, resources and technology, the sampling scale and the number of samples are small, the spatial representativeness is poor, and the data are incomplete and difficult to obtain in real time. Soil quality changes dynamically, so accurate monitoring of the soil ecological environment over a wide area requires more complete data. For example, detailed field data such as drainage measures and soil microbes in saline-alkali land are important for evaluating soil ecological processes at the field scale and for providing specific field management and soil improvement programs [68]. Such monitoring also requires the data to be updated in real time. How to obtain more complete and real-time data is therefore a key issue to be solved. With the development of information technology, sensor technology continues to progress, and comprehensive and accurate soil data can be obtained. Google Earth Engine (GEE) is a publicly accessible platform that provides a large number of earth observation data sets for analysis worldwide, including satellite image data and real-time observation data from multiple sensors [69]. Future research can obtain soil parameters and other environmental variables through the GEE platform and establish a complete YRD soil ecological database to monitor the soil ecological environment more accurately and in real time.
The system provides a visual platform for monitoring the soil ecological environment. Under current policy, soil ecological monitoring systems are essential for environmental monitoring and saline-alkali land protection. At present, our system is applied at the local scale of the YRD region, where it performs well with high prediction accuracy and plays an important role in monitoring the soil ecological environment. China is rich in soil resources, and in the future the system could be applied to a wider range of areas. However, soil quality and soil properties may differ between regions, and the system may therefore behave differently in different regions. Future research needs to introduce more environmental variables and build a rich and complete database. Some module functions should also be added to the WebGIS system. More vegetation indices should be explored to support the soil ecological monitoring functions. In particular, long time series of vegetation indices could be added to the WebGIS system to enable spatio-temporal dynamic monitoring of saline-alkali soil, including warning of soil ecological degradation and erosion and evaluation of soil water holding capacity and carbon sequestration capacity. In addition, the system is oriented to a wide range of user groups. Considering that non-professional farmers may lack experience and knowledge, the system provides introductions to the various types of data, so that users can acquire professional knowledge while using the system. However, to improve its ease of use and stability, the future WebGIS system should also be tested and improved with different groups of users.

Conclusions
This study developed a WebGIS-based system for real-time ecological monitoring of saline-alkali soil. The system integrates a variety of WebGIS technologies. The front end uses technologies such as Vue.js, Element UI, Axios and Echarts.js to provide users with an intuitive and easy-to-use interactive interface. The back end uses the Spring Boot framework to improve the stability of the system input data. PMML and JPMML are used for cross-platform deployment of the machine learning models, so that the prediction models can be shared. Unlike previous studies, this study takes multi-source environmental variables into account and reflects the diversity of the soil ecosystem in the saline-alkali land of the YRD. Users can comprehensively analyze and monitor the soil ecological environment through the system. The data statistical analysis module shows the monitoring results of air quality, vegetation index and soil texture in a visual form. The soil health assessment module provides users with the risk assessment of soil heavy metals and the spatial variability of soil properties. The soil salinity prediction module introduces the TPE algorithm to optimize the RF and GBDT models, which significantly improved the prediction accuracy. The data management module applies MyBatis-Plus, and users can update data in real time. This study thus provides a platform for users to monitor the soil ecological environment, which offers scientific decision support for salt reduction and carbon sequestration in saline-alkali land.

Figure 1. (a-c) The location of the study area and soil sampling points.

2.3.2. Air Quality Data
Six air quality factors were selected as environmental variables, including CO, NO2, O3, SO2, PM2.5 and PM10. The data came from the Dongying Municipal Environmental Protection Bureau (http://sthj.dongying.gov.cn/), including 40 air quality monitoring stations. We employed ordinary kriging (OK) to spatially interpolate the air quality factors of the 40 monitoring stations (Figure 3) in the ArcGIS 10.8 software, and obtained raster data of air quality with a resolution of 10 m × 10 m.

Figure 4. The interpolation results of (a) soil pH and (b) soil salinity.

Figure 6. The architecture of the saline-alkali soil ecological monitoring system in the YRD.

$f_k$ is the $k$th weak learner.

Figure 7. The architecture of database design.

Figure 8. Functions of the saline-alkali soil ecological monitoring system in the YRD.

Figure 9. The main interface of the air quality module.

Figure 10. The main interface of the vegetation index module.

Figure 11. The main interface of the soil texture module.

Figure 12. The ecological risk assessment results of soil heavy metals in the system.

Figure 13. The spatial distribution of soil pH.

Figure 14 shows the soil salinity prediction module, which uses the TPE-RF and TPE-GBDT models to predict soil salinity content. The interface shows the prediction accuracy, the spatial prediction map, the scatter plots of the test set and training set, and the SHAP analysis results. The module also provides a text box for user-defined prediction: after the user inputs the values of multiple feature indicators, the TPE-optimized machine learning model is called at the back end to perform a single-point prediction of soil salinity. For example, the upper left of Figure 14 shows that the single-point salinity predicted by the TPE-RF model is 0.096 g kg⁻¹. Compared with the TPE-GBDT model, the TPE-RF model had a higher fitting accuracy in the test set (R² = 94.48%), which shows that the TPE-RF model has a strong ability to explain the nonlinear relationship between environmental variables and soil salinity. The scatter plots of the training set and the test set of the TPE-RF model are more compact along the diagonal.
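The fitting accuracy quoted above is the coefficient of determination, R² = 1 − SS_res / SS_tot, which the interface reports as a percentage (94.48% corresponds to 0.9448). A minimal sketch of the computation is given below; the measured and predicted salinity values are hypothetical and serve only to show the calculation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted soil salinity values (g/kg).
measured = [0.10, 0.25, 0.40, 0.55]
predicted = [0.12, 0.24, 0.38, 0.57]
print(f"R2 = {r_squared(measured, predicted) * 100:.2f}%")
```

Points that lie close to the diagonal of the measured-versus-predicted scatter plot give a small residual sum of squares, and hence an R² close to 1.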

Figure 14. The prediction function of the soil salinity by TPE-RF model.

Figure 15. The main interface of the data management module.

Table 1. The formula for vegetation index based on Sentinel-2 remote sensing image.

Table 2. Evaluation level of Nemero comprehensive pollution index.

Table 3. Toxicity coefficient of Hakanson potential ecological hazard index.

Table 4. Evaluation level of Hakanson potential ecological hazard index.