Assessment of Soil Pollution Levels in North Nile Delta, by Integrating Contamination Indices, GIS, and Multivariate Modeling

The proper assessment of trace element concentrations in the north Nile Delta of Egypt is needed in order to reduce the high levels of toxic elements in contaminated soils. The objectives of this study were to assess the risks of contamination for four trace elements (nickel (Ni), cobalt (Co), chromium (Cr), and boron (B)) in three different layers of the soil using the geoaccumulation index (I-geo) and pollution load index (PLI) supported by GIS, as well as to evaluate the performance of partial least-square regression (PLSR) and multiple linear regression (MLR) in estimating the PLI based on data for the four trace elements in the three different soil layers. According to I-geo, the results show widespread contamination by Ni, Co, Cr, and B in the three different layers of the soil. The I-geo values varied from 0 to 4.74 for Ni, 0 to 6.56 for Co, 0 to 4.11 for Cr, and 0 to 4.57 for B. According to I-geo classification, the status of Ni, Cr, and B ranged from uncontaminated/moderately contaminated to strongly/extremely contaminated. Co ranged from uncontaminated/moderately contaminated to extremely contaminated. There were no significant differences in the values of I-geo for Ni, Co, Cr, and B in the three different layers of the soil. According to the PLI classification, the majority of the samples were very highly polluted. For example, 4.76% and 95.24% of the samples were unpolluted and very highly polluted, respectively, in the surface layer of the soil profiles. Additionally, 14.29% and 85.71% of the samples were unpolluted and very highly polluted, respectively, in the lowest layer of the soil profiles. Both calibration (Cal.) and validation (Val.) models of the PLSR and MLR performed well in predicting the PLI based on data for the four studied trace elements, as an alternative method. The validation (Val.) models predicted the PLI with R² = 0.89–0.93 in the surface layer, 0.91–0.96 in the subsurface layer, 0.89–0.94 in the lowest layers, and 0.92–0.94 across the three different layers. In conclusion, the integration of the I-geo, PLI, GIS technique, and multivariate models is a valuable and applicable approach for the assessment of the risk of contamination for trace elements, and the PLSR and MLR models could be used through applying chemometric techniques to evaluate the PLI in different layers of the soil.

assessing and analyzing trace element contents in soil because it intuitively reflects the effects of human activities on trace elements as well as the impact of trace elements on the environment [26]. The PLI represents the number of times the trace element concentrations in the soil surpass the background concentrations, and it provides a summative measure of the total level of trace element toxicity in a given sample. Both parameters were used to evaluate the contamination levels in the soil and bottom sediment of Upper Egypt [27].
Natural landscapes and ecosystems have been impacted by global development and unplanned agricultural practices [28]. Therefore, the evaluation of soil pollution requires in-depth knowledge of the spatial distribution of contaminants [29]. A GIS database can provide detailed information for low-cost soil surveying. GIS databases can also help in deriving digital elevation models (DEMs), which can assist in the development of landscape features used to characterize landforms [30]. Analyzing the distribution and concentration of trace elements enables pollution levels to be determined and the associated impacts on both the environment and human health to be assessed. The evaluation and mapping of toxic elements in soil can help in developing strategies to promote the sustainable use of soil resources, reduce soil degradation, and expand crop production. Spatial interpolation is used to survey and interpret the spatial distributions of pollutants in soil [31]. Inverse distance weighting (IDW) is useful for comprehensively evaluating pollution patterns [32]. Proper assessments of the toxic element concentrations in soils, supported by GIS databases, are needed to reduce the high levels of toxic elements in contaminated soils [33].
Calculating the PLI requires a series of calculation steps that require significant time and effort to convert several numbers from the trace element data for the soil into a single value describing the soil contamination level [27]. Partial least-square regression (PLSR) and multiple linear regression (MLR) could be used to solve this problem since they are standard methods for specifying a linear relationship between independent variables and dependent variables [34,35]. New modeling frameworks and multivariate regression models such as PLSR and MLR should facilitate a substantial increase in the efficiency of predicting dependent variables based on several independent variables. PLSR is commonly used to create predictive models of the hyperspectral responses of in situ canopy samples [9,36]. PLSR was recently shown to perform well in assessing water quality indices [37]. PLSR and MLR have been proposed for resolving strongly multicollinear and noisy variables and efficiently assessing measured parameters [38]. Both methods can combine data for a large number of trace elements into a single index to enhance the prediction of a measured variable. Therefore, using these methods, the PLI or other pollution indices can be simultaneously estimated from data for several trace elements. PLSR can reduce many collinear factors to a few, non-correlated latent factors, preventing the overfitting or underfitting of the data, and avoiding redundant data [37,39]. There is little information available with which to evaluate the PLSR and MLR methods based on trace elements for assessing the PLI of soil in different layers.
Therefore, the objectives of this study were to (i) map soil pollution according to four trace elements (Ni, Co, Cr, and B) based on I-geo using the GIS technique for soil profiles in three soil layers; (ii) assess the risk of contamination for four trace elements using I-geo and PLI in three different soil layers; and (iii) evaluate the performance of PLSR and MLR models based on four trace elements (Ni, Co, Cr, and B) for predicting PLI, as a new method, in three different soil layers.

The Study Area
The research area is situated in the north of the Nile Delta (Figure 1) in Kafr El-Sheikh Governorate, Egypt. The geographic coordinates are in UTM zone 36 (latitudes 31°0′0″-31°40′0″ N and longitudes 30°30′0″-31°10′0″ E; Figure 1). The northern part of the Nile Delta is in the arid region, while the southern part is in the hyper-arid region, according to a map of the global distribution of arid regions. The study area is enclosed by two branches of the Nile River: the Rosetta to the west and the Damietta to the east. A network of 40,000 km of canals diverts and supplies water from the Nile River to nearly 2 million farmers for cropland irrigation, with a similar network of drainage canals also incorporated in the region [40,41]. These drainage canals cover approximately 18,000 km, leading to a total length of nearly 58,000 km when combined with irrigation canals [40]. In the Nile Delta, the normal population growth rate is 21,600/year. An area of 825,100 acres is reserved for cultivation and is known for its rice, beet, cotton, and wheat crops. In the north of the Nile Delta, the Lake Burullus flood plain, coastal plain, urban and industrial commercial centers, and some sand dunes in the coastal sections are the important features and major landforms. The alluvial plain, lacustrine plain, and marine plain (71.08%, 19.34%, and 9.57% of the total territory, respectively) are the three major physiographic units.



Soil Analysis
Soil samples were collected from 21 representative profiles in Kafr El-Sheikh Governorate during September 2018. The 21 soil profiles were selected according to the geomorphologic units in the study area to represent the different agricultural practices, and soil samples were collected from different layers according to morphological variations. Because each physiographic unit is covered by more than one profile, the degree of certainty in the distribution is high. The soil profiles were described according to visual characterization, and the Munsell soil color book was used to determine the color grade. The studied profiles and soil sample locations were determined using a Global Positioning System (GPS) unit (German model), as shown in Figure 1. Throughout the sampling, the GPS was used to pinpoint the precise locations of the sampling sites. Soil samples were taken from the selected top layers. The soils of the study area are classified in the order Entisols; because there were no morphological differences within the soil profiles, samples were taken at three fixed depths. The soil type is clay. Three soil samples were taken from each profile at depths of 0-30, 30-60, and 60-100 cm. All the samples were placed in sealed polyethylene bags and returned to the laboratory. The samples were composited, homogenized, air-dried at 25 to 35 °C, crushed, and sieved to 2 mm. Soil properties (pH, EC, Ca²⁺, Mg²⁺, K⁺, Na⁺, CO₃²⁻, HCO₃⁻, Cl⁻, SO₄²⁻, OM %, CaCO₃ %, P %, and N %) were determined using the prepared samples and are given in Table S1. For the determination of the total element concentration, exactly 1 g of powdered soil sample was digested with aqua regia (HNO₃ ICP-OES). The threshold trace element concentrations in the soil, in mg kg⁻¹ dry soil, were determined according to Kabata-Pendias [42], as shown in Table 1.
The I-geo expresses pollution by comparing the measured levels of trace elements with background levels, and it was originally used for evaluating bottom sediments [43,44]. Trace element contamination was assessed using the geoaccumulation index (I-geo) with the following equation:

$$I_{geo} = \log_2\left(\frac{C_n}{1.5\,B_n}\right) \qquad (1)$$

where $C_n$ is the concentration of the trace element measured in the soil and $B_n$ is the geochemical background concentration of the trace element (average crust) [45]. The constant 1.5 in Equation (1) was introduced to minimize the effect of potential differences in background values that could be attributed to lithological variations in the sediments. The comparison here is between the measured concentration and that of the elements in the Earth's crust, because soil is part of the Earth's crust, and its chemical composition is related to that of the crust [46]. The I-geo classification is shown in Table 2.

Table 2. Class, value, and contamination level according to geoaccumulation index (I-geo) in soil [46].
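The calculation in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the class boundaries below are the commonly used seven-class Müller scale and are an assumption, since Table 2 is not reproduced in this text, and the function names `igeo` and `igeo_class` are hypothetical.

```python
import math

# ASSUMPTION: upper class limits follow the widely used Mueller scale;
# the study's own Table 2 is not available here and may differ.
IGEO_CLASSES = [
    (0.0, "uncontaminated"),
    (1.0, "uncontaminated/moderately contaminated"),
    (2.0, "moderately contaminated"),
    (3.0, "moderately/strongly contaminated"),
    (4.0, "strongly contaminated"),
    (5.0, "strongly/extremely contaminated"),
]


def igeo(c_n, b_n):
    """Equation (1): I-geo = log2(C_n / (1.5 * B_n)), concentrations in mg/kg."""
    if c_n <= 0 or b_n <= 0:
        raise ValueError("concentrations must be positive")
    return math.log2(c_n / (1.5 * b_n))


def igeo_class(value):
    """Map an I-geo value to a contamination label (assumed class limits)."""
    for upper, label in IGEO_CLASSES:
        if value <= upper:
            return label
    return "extremely contaminated"


# Hypothetical concentrations, for illustration only.
print(igeo(150.0, 50.0), igeo_class(igeo(150.0, 50.0)))
```

Under these assumed limits, the maximum I-geo values reported in this study (4.74 for Ni and 6.56 for Co) fall in the strongly/extremely contaminated and extremely contaminated classes, respectively.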

The Pollution Load Index (PLI)
The PLI is the geometric mean of the contamination factors ($C_f^i$, abbreviated CF below) and defines the contribution of all trace elements at a specific place [47]. The contamination factor (CF) was computed by dividing the metal concentration by the background value using the following equation [48]:

$$CF = \frac{C_{metal}}{C_{background}}$$

The pollution load index (PLI) is used to estimate element concentrations in soils relative to the reference concentration and was calculated using the following equation [49]:

$$PLI = \sqrt[n]{CF_1 \times CF_2 \times \dots \times CF_n}$$

where CF is the contamination factor and n is the number of metals. This parameter can be used to determine the level of environmental pollution in order to undertake monitoring activities to improve soil quality. The PLI classification is shown in Table 3.

Table 3. Class, value, and pollution level according to pollution load index (PLI) in soil [47].
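As a rough sketch of the two steps described above, the following Python functions compute the contamination factor and the PLI as a geometric mean. The function names and the example values are hypothetical; the background concentrations actually used in the study are not given in this text.

```python
import math


def contamination_factor(c_metal, c_background):
    """CF = measured concentration / background concentration."""
    if c_background <= 0:
        raise ValueError("background concentration must be positive")
    return c_metal / c_background


def pollution_load_index(cfs):
    """PLI = (CF_1 * CF_2 * ... * CF_n) ** (1 / n), the geometric mean of the CFs."""
    cfs = list(cfs)
    if not cfs:
        raise ValueError("at least one contamination factor is required")
    if any(cf < 0 for cf in cfs):
        raise ValueError("contamination factors cannot be negative")
    if any(cf == 0 for cf in cfs):
        return 0.0  # a single zero CF drives the whole product to zero
    # Sum of logs avoids overflow when many large CFs are multiplied.
    return math.exp(sum(math.log(cf) for cf in cfs) / len(cfs))


# Hypothetical contamination factors for four elements, for illustration only.
print(pollution_load_index([2.0, 8.0, 4.0, 4.0]))
```

Because the PLI is a product, one element with a concentration of zero yields a PLI of zero regardless of the other elements, which is worth keeping in mind when interpreting the minimum PLI values.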

Spatial Distributions of Trace Elements
Spatial interpolation is widely used when data are collected at distinct locations (such as soil profiles) to produce continuous information [50]. The ArcGIS Spatial Analyst 10.2.1 extension offers spatial data analysis tools that use statistical theory and techniques to model spatially referenced data. Data for the four trace elements were used to derive the interpolated values using the interpolation methods of ArcGIS Spatial Analyst. Inverse distance weighting (IDW) is an interpolation method that uses the values measured around the prediction site. The values measured closest to the prediction site have a greater effect on the predicted value than more distant ones; greater weight is given to the points closest to the prediction site, the weight being a function of distance [51]. The advantage of using IDW in mapping the spatial distribution of heavy metals is that it is efficient. This interpolation method works better with evenly distributed points [52]. In this study, the concentrations of the elements are attributed not to natural sources but to other sources, such as agricultural and industrial drainage, whose effect on the concentrations in the soil varies with the distance from the source; the IDW method was therefore considered the better choice. The statistical relations between the known points were determined using the IDW function of ArcGIS and were used to determine the concentrations of trace elements in the study area. The IDW was used with 12 neighboring samples to estimate each grid point, and the distances were weighted with a power of two.

$$Z(x_0) = \frac{\sum_{i=1}^{n} x_i / d_{ij}^{\,r}}{\sum_{i=1}^{n} 1 / d_{ij}^{\,r}}$$

where $x_0$ represents the estimation point and $x_i$ represents the data points inside a selected neighborhood. The distance between the estimation point and the data points, $d_{ij}$, is connected to the weights through the exponent r. The IDW method gives data points near the interpolation point relatively large weights, whereas data points further away have less impact. The larger the exponent r, the more influence is granted to the points close to $x_0$.
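The interpolation described above can be illustrated with a small pure-Python sketch. It is not the ArcGIS implementation: the function name `idw_estimate`, the tuple layout of the input, and the sample coordinates are assumptions made for illustration, while the neighborhood size of 12 and the power of 2 are taken from the text.

```python
import math


def idw_estimate(points, target, k=12, power=2.0):
    """Inverse distance weighted estimate at `target`.

    points: iterable of (x, y, value) tuples for the sampled locations.
    target: (x, y) of the location to estimate.
    k:      number of nearest samples used (12 in the text).
    power:  distance exponent (2 in the text).
    """
    tx, ty = target
    # Pair each sample with its distance to the target, nearest first.
    by_distance = sorted((math.hypot(x - tx, y - ty), v) for x, y, v in points)
    if not by_distance:
        raise ValueError("no data points")
    if by_distance[0][0] == 0.0:
        return by_distance[0][1]  # target coincides with a sample
    nearest = by_distance[:k]
    weights = [1.0 / d ** power for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical samples: one at distance 1 (value 10) and one at distance 2 (value 20).
samples = [(1.0, 0.0, 10.0), (-2.0, 0.0, 20.0)]
print(idw_estimate(samples, (0.0, 0.0)))
```

With a power of two, the sample at distance 1 receives four times the weight of the sample at distance 2, so the estimate is pulled toward the nearer value.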

Partial Least-Square Regression (PLSR)
PLSR was evaluated in this study as a new method for predicting the PLI. PLSR is a versatile tool that can easily manage data when the number of input variables is much greater than the number of target variables and the input variables are strongly collinear and noisy [53]. In this study, to link the input variables (four trace elements) to the output variable (PLI), PLSR was combined with leave-one-out cross-validation (LOOCV). An important step in PLSR analysis is to select the optimal number of latent variables (LVs) in order to represent the calibration data without overfitting or underfitting. The number of LVs was optimized using the LOOCV in terms of the lowest value of the RMSE. Random 10-fold cross-validation was applied to the datasets to increase the robustness of the results, using Unscrambler X Version 10.2 (CAMO Software AS, Oslo, Norway). The performance of PLSR models based on four trace elements for each soil layer was evaluated for predicting the PLI. The best model for both calibration (Cal.) and validation (Val.) was chosen according to the lowest value of the root mean square error (RMSE) and mean absolute deviation (MAD) as well as the highest determination coefficient (R²) and accuracy (Acc). The R² reported here is the absolute variance fraction. The RMSE is an absolute measure of how closely the model fits the data points, with smaller RMSE values suggesting a better fit. The RMSE is calculated with the following equation:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(PLI_o - PLI_f\right)^2}$$

The MAD evaluates the average magnitude of the errors without taking their direction into account, and it measures precision for continuous variables, as seen below:

$$MAD = \frac{1}{n}\sum_{i=1}^{n}\left|PLI_o - PLI_f\right|$$

where $PLI_o$ represents the observed value, $PLI_f$ is the predicted value, and n represents the number of data points.
For the accuracy (Acc) of the model, $PLI_p$ is the predicted or simulated value, and $PLI_{ave}$ is the average value.
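The error measures used to compare the models can be sketched as follows. The RMSE and MAD functions follow the definitions given in the text. The `r2_conventional` function is an assumption: it is the conventional coefficient of determination, which may differ from the absolute variance fraction used in the study, whose formula is not available here. The accuracy measure is omitted for the same reason, and all function names and example values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted PLI values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mad(observed, predicted):
    """Mean absolute deviation: average error magnitude, ignoring sign."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2_conventional(observed, predicted):
    """ASSUMPTION: conventional R^2 = 1 - SS_res / SS_tot, not necessarily
    the 'absolute variance fraction' formula used in the study."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted PLI values, for illustration only.
obs = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 30.0]
print(rmse(obs, pred), mad(obs, pred), r2_conventional(obs, pred))
```

Since squaring penalizes large errors more heavily, the RMSE of a set of predictions is never smaller than its MAD, which gives a quick consistency check on reported values.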

Multiple Linear Regression (MLR)
MLR was evaluated in this study, like PLSR, as a new method for predicting the PLI. MLR analyzes a dependent parameter (PLI) using two or more independent parameters (four trace elements). MLR attempts to model the linear relationship between the independent variables and the response (dependent) variable. The best model for both calibration (Cal.) and validation (Val.) was also chosen according to the lowest RMSE and mean absolute deviation (MAD) as well as the highest R². The least-square approach was used to calculate the parameters of the regression equation that minimized the sum of the squared errors:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon \qquad (9)$$

where, for i = 1, ..., n observations, $Y_i$ is the dependent variable, $x_{i1}, \dots, x_{ip}$ are the explanatory variables, $\beta_0$ is the y-intercept (constant term), $\beta_1, \dots, \beta_p$ are the slope coefficients for each explanatory variable, and $\varepsilon$ is the model's error term (also known as the residuals).
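A least-squares fit of Equation (9) can be sketched without external libraries by solving the normal equations. This is only an illustration of the technique, not the SPSS or Unscrambler workflow used in the study; the function names `fit_mlr` and `predict_mlr` and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Y = b0 + b1*x1 + ... + bp*xp.

    X: list of rows, each with p explanatory values; y: list of responses.
    Returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict_mlr(coefs, row):
    """Evaluate the fitted regression equation for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, noise-free).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_mlr(X, y)
print(coefs, predict_mlr(coefs, [3, 2]))
```

In the setting of this study, each row of `X` would hold the four trace element concentrations of one sample and `y` the corresponding PLI. Because the PLI is a geometric mean, a linear model can only approximate it, and strongly collinear predictors make the normal equations ill-conditioned, which is one motivation for PLSR.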

Statistical Analysis
This statistical analysis was performed using SPSS (v. 12.0, SPSS Inc., Chicago, IL, USA). The relationship between the observed and predicted value of the PLI derived from PLSR was modeled using a simple linear regression. The significance level of the coefficients of determination (R 2 ) for these relationships was set at 0.05.

The Variation of Four Trace Elements in Three Different Layers of Soil
In this study, contamination with Ni, Co, Cr, and B in three different layers of soil was assessed. There were wide variations in the values of the four trace elements in the three different layers. Across the three layers, Ni varied from 0 to 2720 mg kg⁻¹, Co varied from 0 to 2694 mg kg⁻¹, Cr varied from 0 to 2327 mg kg⁻¹, and B varied from 0 to 3551 mg kg⁻¹ (Table S2 and Table 4). The trace element concentrations in the soil therefore ranked, in descending order, as B > Ni > Cr > Co. There were no significant differences in the values of Ni, Co, Cr, and B between the three layers of the soil, as shown in Table 4.
Sustainability 2021, 13, 8027

Assessment of Contamination Risk Using Geoaccumulation Index
In this study, the contamination of soils was assessed based on the I-geo. The I-geo values indicated widespread pollution by Ni, Co, Cr, and B in the different layers of soil. I-geo (Ni) varied from 0 to 4.74, I-geo (Co) varied from 0 to 6.56, I-geo (Cr) varied from 0 to 4.11, and I-geo (B) varied from 0 to 4.57 (Table 5). There were no significant differences in the values of I-geo (Ni), I-geo (Co), I-geo (Cr), and I-geo (B) between the three layers of the soil, as shown in Table 5. The spatial distributions of I-geo for heavy metals in the study area are illustrated in Figures 2-5. On these maps, beryl green represents soils unpolluted by the analyzed heavy metals, while red represents strongly polluted soils. The Co maps are wholly covered by red, indicating that the study area was strongly polluted with Co in the different layers of soil. The B and Cr maps are mostly covered by orange, indicating that a large part of the research area was moderately to highly polluted with B and Cr. Based on the Ni maps, about half of the research area was highly polluted with this metal. Additionally, the spatial distribution maps of Ni in the soil layers show concentrations increasing toward the eastern parts, which may be due to the proximity to the El Gharbia main drain.
The results show that the I-geo values of the four trace elements indicated non-polluted to highly contaminated conditions. The I-geo values of Ni, Co, Cr, and B varied widely in the different layers of soil (Table S3 and Figures 2-5). According to the I-geo classification, Ni, Cr, and B ranged from uncontaminated/moderately contaminated to strongly/extremely contaminated, while Co ranged from uncontaminated/moderately contaminated to extremely contaminated (Table S3 and Table 6). The highest I-geo values of Ni, Co, Cr, and B were found at 1Und, 14Und, 17S, and 4Und, respectively. The I-geo (Ni) revealed that all the soil samples were contaminated except for three samples (5S, 20Und, and 21Und), which were in the non-polluted category, as shown in Table S3. The I-geo (Co) showed a high accumulation effect in many samples and indicated severe contamination in most samples; samples 15S and 21Und were non-polluted, as shown in Table S3. I-geo (Cr) and I-geo (B) showed high contamination in most samples and a medium degree of contamination in many others; non-polluted samples were found for Cr at 21Und and for B at 11Sub and 9-21Und.
These results indicate that the three soil layers in Kafr El-Sheikh Governorate contain large quantities of Ni, Co, Cr, and B. The main sources of Ni, Co, and Cr pollution in soils are the metal plating, fossil fuel combustion, Ni mining, and electroplating industries; anthropogenic inputs such as sewage sludge and other wastes used as soil conditioners; agricultural fertilizers, especially phosphates and other inorganic fertilizers; and atmospheric deposition [54–59]. The main sources of B contamination in soil are the borosilicate mineral tourmaline, underground currents, groundwater, and seawater [60].
According to the I-geo classification, 33.4% and 57.1% of the soil samples in the surface layer (S), 52.3% and 38.1% in the subsurface layer (Sub), and 28.5% and 57.1% in the underground layer (Und) were strongly contaminated and strongly/extremely contaminated, respectively, with Ni (Table 6). For Co, about 71.4% and 90.4% were strongly/extremely contaminated in the S and Sub layers, respectively, while 66.6% of the soil samples in the Und layer were extremely contaminated (Table 6). For Cr, about 76.1%, 76.1%, and 71.4% were strongly contaminated in the S, Sub, and Und layers, respectively. For B, about 47.6% and 33.3% of the soil samples from the S layer, as well as 52.3% and 23.8% from the Sub layer, were strongly contaminated and strongly/extremely contaminated, respectively; additionally, 14.3% and 76.1% of the soil samples in the Und layer were moderately/strongly contaminated and strongly contaminated, respectively. The rest of the soil samples from the three layers were uncontaminated/moderately contaminated to moderately contaminated for all the trace elements.

Assessment of Contamination Risk Using Pollution Load Index
In this study, the pollution of soils was assessed on the basis of PLI values. The PLI_S values varied from 0 to 35.50, and the mean value was 22.86. The PLI_Sub values varied from 11 to 34.43, and the mean was 23.87. The PLI_Und values varied from 0 to 32.61, and the mean was 23.20 (Table 7). These results agree with those of Elbehiry et al. [61], who evaluated the risks for four trace elements based on the PLI in soils of the Nile Delta close to the studied area. They found that the PLI ranged from 0.03 to 23.36 across the studied soils. According to the PLI classification, 4.76% and 95.24% of the samples were unpolluted and very highly polluted, respectively, in the S layer of the soil profiles. All the samples were very highly polluted in the Sub layer of the soil profiles. Additionally, 14.29% and 85.71% of the samples were unpolluted and very highly polluted, respectively, in the Und layers of the soil profiles (Table 8). This region is one of Egypt's most inhabited, fertile, and cultivated, which means it supports a large population. Furthermore, since it is close to the sea and hosts many of Egypt's industrial and agricultural resources, as well as much of the domestic drainage in the Nile Delta, this region is subjected to many stresses. Another source of these elements is the industrial drainage in the El Gharbia main drain [61].
Note to Tables 7 and 8: PLI_S is the pollution load index in the surface layer; PLI_Sub is the pollution load index in the subsurface layer; PLI_Und is the pollution load index in the underground layer. U.P. is unpolluted; V.H.P. is very highly polluted.

Performance of PLSR and MLR Models in Predicting the PLI
Mathematical techniques can be used to calculate the PLIs of soil sites with high accuracy [47]. These methods, however, are time consuming, since they require many mathematical equations to convert the trace element data into a single value that represents the soil pollution level. By contrast, the PLSR and MLR methods are easy to apply and do not need several steps for calculating the PLI. Multivariate regression models such as PLSR and MLR have recently been used as alternative methods to predict water quality indices based on data for several trace elements [37,39].
To reduce the strongly collinear independent variables to a small number of orthogonal factors, multivariate statistical techniques (PLSR models based on the four trace elements) were tested for predicting the PLI. PLSR techniques can be used to identify optimized models that enhance the efficiency when searching for optimized relationships [53,62–65]. The calibration (Cal.) models of PLSR and MLR performed well in predicting the PLI based on four trace elements, with R² = 0.91-0.95 in the surface layer, 0.94-0.97 in the subsurface layer, 0.92-0.99 in the underground layers, and 0.93-0.97 across the three different layers (Tables 9 and 10). The validation (Val.) models also performed well in predicting the PLI based on data for four trace elements, with R² = 0.89-0.93 in the surface layer, 0.91-0.96 in the subsurface layer, 0.89-0.94 in the underground layers, and 0.92-0.94 across the three different layers (Tables 9 and 10). In general, the Cal. and Val. PLSR models performed better in predicting the PLI for the three different layers than the MLR models (Tables 9 and 10; Figures 6 and 7), showing a higher R² and lower RMSE and MAD. For example, the RMSE and MAD for the Cal. models of PLSR were 2.19 and 1.68 in the surface layer, 1.07 and 0.83 in the subsurface layer, 1.71 and 1.24 in the underground layers, and 0.94 and 1.19 across the three different layers, respectively (Tables 9 and 10). The PLSR analysis selected an optimal number of latent variables (LVs) of 1 to 2 in order to represent the calibration data without overfitting or underfitting. The PLSR and MLR models showed a very small drop in the quality of the performance measures (R², RMSE, MAD, and Acc) when moving from the calibration stage to the testing stage (Tables 9 and 10; Figures 6 and 7). A significant positive relationship was also obtained between the measured and predicted values of I-geo and PLI for both the calibration and validation datasets (Tables 9 and 10; Figures 6 and 7).

Table 9. Results of calibration (R²cal, RMSEc, MADc, and Accc) and ten-fold cross-validation (R²val, RMSEv, MADv, and Accv): partial least-square regression models of the relationships between four trace elements and pollution load index (PLI). ***: p < 0.001.

To the best of our knowledge, the issue of predicting PLI using PLSR and MLR models, based on trace element data, has not been addressed to date. Multivariate regression models were recently shown to perform well in predicting water quality indices [37,39]. For example, Gad et al. [37] found that PLSR based on data for several trace elements accurately estimated four pollution indices for water and the drinking water quality index (DWQI) for both the Cal. and Val. models; R² varied from 0.98 to 1.00 for the Cal. and from 0.88 to 0.99 for the Val. Elsayed et al. [39] found that principal component regression (PCR) and support vector machine regression (SVMR) represented robust models for predicting six water quality indices in the Cal. and Val. models; R² varied from 0.48 to 0.99. Finally, the results obtained from this study indicate that both PLSR and MLR have the potential to predict the PLI in and across three different layers.


Conclusions
Trace metal (Ni, Co, Cr, and B) contamination in the north Nile Delta of Egypt was assessed based on the I-geo and PLI. The distribution patterns of trace metals in the three layers of the soil profiles indicate high pollution. The status regarding Ni, Cr, and B, according to I-geo classification, ranged from uncontaminated/moderately contaminated to strongly/extremely contaminated. Co ranged from uncontaminated/moderately contaminated to extremely contaminated. According to the I-geo classification, 33.4% and 57.1% of the soil samples in the surface layer (S), 52.3% and 38.1% in the subsurface layer (Sub), and 28.5% and 57.1% in the underground layer (Und) were strongly contaminated and strongly/extremely contaminated, respectively, with Ni. The PLI showed that the majority of the samples were very highly polluted. According to the PLI classification, 4.76% and 95.24% of the samples were unpolluted and very highly polluted, respectively, in the S layer of the soil profiles. All the samples were very highly polluted in the Sub layer of the soil profiles. Additionally, 14.29% and 85.71% of the samples were unpolluted and very highly polluted, respectively, in the Und layers of the soil profiles; thus, high concentrations of the four trace elements may present potential health risks for the human populations residing in the surrounding area. The deterioration of soil quality in this area can be attributed to large applications of agricultural resources, industrial activities, poor drainage, and its location close to the sea and Lake Burullus. The PLSR and MLR models showed robust performance in estimating the PLI in and across different soil layers, showing the highest R² values, lowest RMSE and MAD values, and greatest slope values in calibration and validation datasets. The PLSR and MLR models showed a very small drop in the quality of the performance measures (R², RMSE, MAD, and Acc) when moving from the calibration stage to the testing stage. The Cal. and Val. PLSR models showed a higher R² and lower RMSE and MAD than the MLR models. For example, the RMSE and MAD for the Cal. models of PLSR were 2.19 and 1.68 in the surface layer, 1.07 and 0.83 in the subsurface layer, 1.71 and 1.24 in the underground layers, and 0.94 and 1.19 across the three different layers, respectively. Future studies should evaluate both the PLSR and MLR models under different environmental conditions for different soils.

Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/su13148027/s1, Table S1. Physiochemical parameters in three layers for different soil profiles; Table S2. Trace element concentrations in soil samples; Table S3. Geoaccumulation index values and contamination levels in soil.