1. Introduction
Climate change is significantly altering arid and semi-arid regions, impacting both the environment and communities [
1]. Shifting weather patterns, rising temperatures, and changing ecosystems are introducing complex challenges that are affecting the balance in ecological systems and societal well-being [
2]. Coastal areas face rising sea levels, while altered weather impacts agricultural lands and water availability [
3]. Notably, the Kerkennah archipelago in Tunisia stands out as a particularly vulnerable locale within this context. It is currently confronting a dire situation characterized by sea-level rise and critical drought conditions. These environmental challenges have precipitated salinization of the island’s surface and a gradual expansion of salt flats [4], thereby exacerbating the island’s vulnerability and destroying its remaining agricultural lands.
According to the literature, very few studies have been conducted on soil salinization in the Kerkennah archipelago [
4,
5,
6,
7]. In fact, the primary investigator for soil salinization research, especially on the expansion of salt flats (sebkhas) in Kerkennah, is Etienne [
4,
5,
6], whose studies extensively used remote sensing methods. The most recent publication on this topic appeared in 2021 [7]; it used spectral indices such as the normalized difference vegetation index (NDVI), the automated water extraction index (AWEI), and the salinity index (SI) to map soil salinity and identify areas with halophytic vegetation.
Although salinization of Kerkennah’s lands has become very noticeable, particularly through the decline of palm trees, information on the agricultural lands remains scarce, and a comprehensive and systematic monitoring approach for soil salinization in Kerkennah is absent. The absence of such monitoring limits our capacity to make informed decisions and implement targeted interventions to mitigate the adverse impacts of soil salinization on the local ecosystem and agricultural practices.
Traditional soil salinity monitoring and prediction often involve labor-intensive fieldwork and sampling. Prior research has mainly focused on qualitatively differentiating between salinized and non-salinized soils by analyzing soil salinity distribution and dynamics. However, recently, there has been a transformative shift, favoring remote sensing [
8], geographic information systems (GIS), and advanced modeling for mapping soil salinity. These technologies provide broad spatial coverage, which is crucial for both agricultural and environmental perspectives, by using remote sensing data to monitor and detect soil salinity effectively.
Recent advances in remote sensing technology have significantly improved the mapping and monitoring of diverse soil attributes, including salinity, by utilizing medium- to high-resolution satellite data [
9]. In particular, Sentinel-2 has been useful in mapping soil salinization [
10,
11,
12,
13]. Overcoming climate challenges, a recent study employed machine learning techniques with Sentinel-2 data from the Google Earth Engine to accurately predict soil salinity [
14]. The authors also combined multiple spectral indices, such as the modified normalized difference water index (MNDWI) by Han Qiu (2005) [
15] and the normalized difference salinity index (NDSI) formulated by Khan et al. in 2005 [
16]. These indices, along with the NDVI, highlight distinct features like water, vegetation, and salinity in satellite images.
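As a minimal sketch of how such indices are computed, the snippet below evaluates the NDVI, MNDWI, and NDSI from reflectance arrays. It assumes the commonly used normalized-difference forms (NDVI from NIR and red, MNDWI from green and shortwave infrared, NDSI from red and NIR) and co-registered bands of equal shape; the function names and the toy reflectance values are illustrative and do not come from the studies cited above.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(green, red, nir, swir1):
    """Compute three normalized-difference indices from reflectance bands."""
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation
        "MNDWI": normalized_difference(green, swir1),  # open water
        "NDSI": normalized_difference(red, nir),       # salinity
    }

# Toy reflectances for three pixels (water, vegetation, bare saline soil).
# These values are made up for illustration only.
green = np.array([0.08, 0.10, 0.20])
red = np.array([0.06, 0.05, 0.30])
nir = np.array([0.02, 0.45, 0.32])
swir1 = np.array([0.01, 0.25, 0.35])

indices = spectral_indices(green, red, nir, swir1)
```

With this formulation the NDSI is simply the negative of the NDVI, which is one reason salinity studies usually combine several indices rather than relying on one.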
Nevertheless, monitoring by relying on optical sensors like Sentinel-2 could be limited by weather conditions. Synthetic Aperture Radar (SAR) remote sensing offers all-weather monitoring and penetration capabilities that can capture the attributes of the soil beneath the surface layer [
17]. Given the increased accessibility of SAR technology, like the Sentinel-1 mission developed by the European Space Agency, which offers high-resolution imagery (up to 10 m) and a more frequent revisit cycle (6 days) [
18], there is now a better chance for enhancing the mapping and monitoring of soil salinity and its associated characteristics. Approaches like machine learning algorithms show promise in modeling the relationship between remote sensing data and electrical conductivity (EC) by using simple linear regression [
19], random forest regression [
20], support vector machine [
21] and other types of regression models [
22]. Despite challenges, these statistical models have shown promising results in salinity estimation using Sentinel-1 images in arid and semi-arid regions [
12,
23,
24,
25,
26,
27,
28]. These advancements pave the way for improved soil salinity estimation and management. Certain researchers have integrated sensors with varying characteristics to study attribute mapping, and they have discovered that the fusion of multiple sensor sources can significantly enhance the precision of digital soil mapping [
29,
30]. While there have been advancements in using Sentinel-2 data for mapping soil salinization, the combined use of Sentinel-1 and Sentinel-2 data to predict soil salinity remains relatively limited [30,31].
This study aims to assess the effectiveness of various machine learning algorithms when applied to Sentinel-1 data for soil salinity prediction in the Kerkennah region and to generate comprehensive spatiotemporal maps, since 2018, using Sentinel-1 and Sentinel-2 data and the capabilities of the Google Earth Engine. By harnessing the potential of remote sensing in conjunction with data-driven methods, this research not only provides valuable insights but also has the potential to guide decision-making processes regarding soil salinity management and sustainable land use planning in the Kerkennah archipelago.
A significant advancement of this research lies in the introduction of spatiotemporal series of soil salinity and normalized difference vegetation index (NDVI) data since 2018. This endeavor aims to unveil complex environmental dynamics, shedding light on the interplay between drought conditions, soil salinity variations, and the broader ecological landscape of the Kerkennah archipelago. Consequently, this study marks the first attempt to employ machine learning techniques in conjunction with Synthetic Aperture Radar (SAR) data to monitor, predict, and understand the evolution of soil salinity dynamics in the unique context of the Kerkennah archipelago.
This study seeks to offer a valuable monitoring tool intended for use by decision-makers and farmers, especially in agricultural regions grappling with the critical problem of soil salinization, where such crucial information is currently absent. It addresses the urgent demand for efficient monitoring in areas where soil salinity represents a substantial challenge to agricultural productivity and long-term sustainability.
The paper is organized as follows: In
Section 2, the study areas, the datasets used, and the methodology implemented for processing RADAR and optical data are discussed. A detailed presentation of the machine learning algorithms employed for performance evaluation is also provided.
Section 3 is dedicated to the development, testing, and validation of soil salinity prediction maps. Finally, the correlations between these data and vegetation health are examined and discussed.
2. Materials and Methods
2.1. Study Area
Kerkennah is an archipelago located in the Gulf of Gabes, about 18 km from the town of Sfax in Tunisia, and it consists of two main islands, i.e., Chergui and Gharbi, as shown in
Figure 1, along with twelve smaller islets.
The archipelago covers an area of 15.7 thousand hectares and is characterized by a flat and monotonous relief with low slopes and a maximum altitude of 13 m. The archipelago is known for its fragile natural environment, which is characterized by an arid climate with a long dry period with an average annual precipitation of 200 mm/yr. The area is dominated by soft formations, particularly red würmian silts, which are prone to coastal erosion [
32]. Additionally, saline soils cover almost half of the total area and only a few agricultural lands are left in Kerkennah [
4].
The fragility of Kerkennah’s environment is further exacerbated by two major problems: sea-level rise and the continuous expansion of salt playas at the expense of the palm groves, which are gradually dying, as shown in Figure 1. This is primarily caused by human activities, such as increased salt production, unregulated construction practices, and illicit sand extraction for building purposes [33].
The predominant agricultural activity on the islands revolves around olive cultivation, which is the most widely grown crop. Additionally, pomegranates, vines, and naturally occurring palm trees are also cultivated [
34].
The research area, where samples were collected, as shown in
Figure 2, encompasses a diverse range of land types which allowed us to examine variations in soil salinity across different land uses and salinity levels. In fact, it comprises a big part of the renowned “Zorii Forest,” as well as segments of the irrigated parcels of Ramla (irrigated with underground water from the well with salinity level of approximately 3.6 g/L, shown in
Figure 1) and parts of the Brenka sebkha and the Ouled Kacem sebkha. This geographical choice offers a distinct differentiation between saline and non-saline lands. Notably, the “Zorii Forest” encompasses a vast 500 hectares, making it the largest and oldest cultivated land in Kerkennah. It is characterized by its “rendzina” soil type [
35], abundant in limestone, known locally as “hmeda” in Kerkennah. This area is also characterized by its diversified vegetation and its unique grape and fig varieties, some of which have flourished for centuries.
2.2. Ground Truth Data
Ground truth data for soil salinity levels were collected during field surveys on 20 and 21 February 2023. A total of 59 soil samples were collected from various locations representing different land cover types and soil salinity levels, as shown in
Table 1. Each soil sample was georeferenced using a Global Positioning System (GPS) device for accurate spatial integration.
The depth of the soil samples ranged from 0 to 30 cm. The images in
Figure 1 depict four sample sites. A standardized procedure was followed to collect soil samples at each location. Each analyzed sample was a composite of 9 soil samples collected from different places of a (10 × 10) m square centered at the sample location. This approach was followed to optimize the sample representation within the Sentinel-1 image pixel (10 × 10) m square. We strategically positioned the samples in close proximity to each other to enhance the accuracy of our digital mapping within the study area.
When the collected samples arrived at the laboratory, they were air dried at room temperature for almost a week. After that, coarse fragments such as stones, wood, and roots were removed, and the samples were ground to a fine powder with an agate mortar and pestle until they could pass through a 2 mm sieve. The electrical conductivity (EC) of an unfiltered 1:5 soil-to-water suspension was measured as follows: soil suspensions were prepared by combining 5 g of soil with 25 mL of distilled water, and were agitated continuously with a magnetic agitator for 2 h. Subsequently, the EC of the suspensions was measured using a conductivity meter. The EC meter was calibrated at 25 °C using a KCl standard solution (1.413 dS/m) prior to the measurements on the soil suspensions, which supports the accuracy of the EC measurements.
2.3. Sentinel-1 C-SAR and Sentinel-2 MSI Satellite Dataset
The S1 GRDH_1SDV (VV, VH) image, obtained from [36], was captured on 21 February 2023 and covers the entire Kerkennah archipelago. Operating in the C-band at a frequency of 5.405 GHz, the Sentinel-1A SAR acquires data in the interferometric wide-swath mode to cover extensive areas; as a ground range detected product, the image has a spatial resolution of 20 × 22 m and a pixel spacing of 10 × 10 m. Additionally, a temporal resolution of 3 to 5 days over Tunisia facilitates close monitoring of landscape dynamics. Acquired in ascending orbit, the image offers dual-polarization observations (VV and VH), spanning a swath of 250 × 350 km with an incidence angle varying from 30.6 to 46.0 degrees [18]. These attributes make the dataset well suited to studying environmental dynamics.
The Sentinel-2 MSI mission, first launched on 23 June 2015, represents a significant advancement in the field of terrestrial observation. Covering an extensive geographic range from 82.8° N to 56° S, the mission comprises four units (A, B, C, and D), operating at an altitude of 786 km and conducting observations at 10:30 LTDN. The Sentinel-2 MSI records 13 distinct bands with central wavelengths ranging from 443 nm to 2190 nm, at three spatial resolutions. Details at a 10 m resolution are available in bands B2 (490 nm), B3 (560 nm), B4 (665 nm), and B8 (842 nm), while bands B5 (705 nm), B6 (740 nm), B7 (783 nm), B8a (865 nm), B11 (1610 nm), and B12 (2190 nm) provide a 20 m resolution. Bands B1 (443 nm), B9 (940 nm), and B10 (1375 nm) have a 60 m resolution [37]. With a bit depth of 12, these data are acquired over a swath width of 290 km with a revisit period of 5 days. The data are stored in the SAFE format for effective management and accessibility. All data from the years 2019, 2020, 2021, 2022, and 2023, downloaded from [37], were processed to extract spatiotemporal vegetation index series, thereby providing a means to monitor temporal variations in vegetation cover.
Figure 2 depicts an image captured by the Sentinel-1 satellite on 21 February 2023. The image shows a color composite of backscatter values (σ0) derived from Sentinel-1 data, expressed in decibels (dB), over the Kerkennah archipelago. In this composite, the color channels are allocated as follows: red represents VV polarization, green represents VH polarization, and blue represents the VH/VV ratio. The image also features a magnified view of the region where ground measurements were carried out on 20 and 21 February 2023. This target area lies within a section of the Zorii Forest agricultural zone.
Examination of the color composite radar image underscores the notable discrepancies in land cover throughout the entire study area. These disparities are conspicuous due to the unique attributes inherent in the various radar polarizations employed for the composite representation. More specifically, the red color, symbolizing the VV polarization, accentuates smooth and compact surfaces. Such areas may encompass elements like roadways, urbanized zones, or exposed soils. The green color, linked to VH polarization, unveils rugged surfaces and regions characterized by denser vegetation. Vegetation, woodlands, and cultivated fields are distinguishable in this hue. The blue element, corresponding to the VH/VV ratio, enables the differentiation of distinct terrain characteristics. This ratio can yield insights into soil roughness.
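The composite described above can be sketched as follows. This is an illustrative reconstruction, not the processing chain used to produce Figure 2: it assumes the inputs are calibrated σ0 values in linear power units, and the dB stretch limits for each channel are assumed display ranges chosen only for this example.

```python
import numpy as np

def to_db(sigma0_linear, floor=1e-6):
    """Convert linear backscatter to decibels, guarding against log(0)."""
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))

def stretch(band, lo, hi):
    """Linearly rescale band values from [lo, hi] to [0, 1], clipping outliers."""
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

def sar_rgb_composite(vv_linear, vh_linear):
    """Build an RGB array with R = VV, G = VH, B = VH/VV (all in dB)."""
    vv_db = to_db(vv_linear)
    vh_db = to_db(vh_linear)
    ratio_db = vh_db - vv_db  # a ratio in linear units is a difference in dB
    return np.dstack([
        stretch(vv_db, -25.0, 0.0),     # assumed display range for VV
        stretch(vh_db, -30.0, -5.0),    # assumed display range for VH
        stretch(ratio_db, -15.0, 0.0),  # assumed display range for VH/VV
    ])

# Toy 2 x 2 backscatter values in linear units (illustrative only).
vv = np.array([[0.10, 0.02], [0.30, 0.05]])
vh = np.array([[0.010, 0.004], [0.020, 0.015]])
rgb = sar_rgb_composite(vv, vh)
```

Because the ratio is taken in dB, the blue channel is a simple difference of the two polarization bands, which keeps the three channels on comparable scales before stretching.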
2.4. Preprocessing and Approach for Salinity Mapping
The process of generating spatiotemporal salinity maps in the Kerkennah agricultural zone, utilizing Sentinel-1, Sentinel-2, and in situ data, comprises a series of interconnected stages.
The data acquisition and preprocessing phase entails the following steps: acquiring Sentinel-1 SAR image data, applying radiometric calibration to obtain backscatter coefficients, subsetting to the study area, applying speckle filtering for noise reduction, and performing Range Doppler terrain correction. SAR indices (Figure 3) were then calculated from mathematical combinations of the vertical-vertical (VV) and vertical-horizontal (VH) polarizations.
In total, thirteen SAR indices were computed and compared against the in situ electrical conductivity measurements. These indices, along with VV and VH, were then analyzed to identify the most promising indicators, and a correlation matrix was used to determine the index best correlated with the EC values.
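The screening step can be illustrated with a short sketch that ranks candidate predictors by the absolute value of their Pearson correlation with EC. The thirteen indices are not listed in the text, so the four predictors below (VH, VV, their difference, and their sum in dB) are hypothetical stand-ins, and the data are synthetic values generated only to make the example runnable.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def rank_predictors(predictors, ec):
    """Return (name, r) pairs sorted by decreasing |r| against EC."""
    scores = {name: pearson_r(values, ec) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Synthetic data: 59 points, matching only the number of field samples.
rng = np.random.default_rng(0)
n = 59
vh_db = rng.uniform(-28.0, -12.0, n)
vv_db = vh_db + rng.uniform(5.0, 9.0, n)
ec = 0.5 * np.exp(-0.15 * vh_db) + rng.normal(0.0, 1.0, n)  # made-up relation

# Hypothetical candidate predictors (not the indices used in the study).
predictors = {
    "VH": vh_db,
    "VV": vv_db,
    "VH-VV": vh_db - vv_db,
    "VH+VV": vh_db + vv_db,
}
ranking = rank_predictors(predictors, ec)
best_name, best_r = ranking[0]
```

Ranking by |r| rather than r matters here because the useful relationships between backscatter and EC are negative.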
In situ data collection (
Table 1) involved measuring the soil electrical conductivity (EC) at sites distributed across the study area. Regarding Sentinel-2 data, optical imagery from 2018 to 2023 was acquired using the Google Earth Engine, followed by atmospheric correction and calculation of the normalized difference vegetation index (NDVI) values for a vegetation analysis. Then, time series datasets were constructed by organizing the SAR indices, EC data, and NDVI values with corresponding dates. The machine learning algorithms, including regression techniques, used in situ EC data, SAR indices, and NDVI as input parameters. The model was selected based on the coefficient of determination (R²) and the root mean square error (RMSE), calculated as follows (an R² closer to 1 indicates better explanatory power, while an RMSE closer to 0 reflects more accurate predictions):
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (1)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (2)$$
where $y_i$ is the observed value for the i-th data point, $\hat{y}_i$ is the predicted value for the i-th data point, $\bar{y}$ is the mean (average) of the observed values, and $n$ is the number of data points.
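For reference, the two selection metrics can be computed as in the sketch below, which uses the standard definitions of the coefficient of determination and the root mean square error. The observed and predicted values are arbitrary numbers chosen for illustration and are not measurements from this study.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def rmse(y_obs, y_pred):
    """Root mean square error, in the same units as the observations."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

# Illustrative values only.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

r2_value = r_squared(y_obs, y_pred)   # 0.9 for these values
rmse_value = rmse(y_obs, y_pred)      # sqrt(0.125) for these values
```

Note that the RMSE carries the units of the predicted variable, so it should be read against the range of the observed EC values, whereas R² is dimensionless.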
Once the model was selected and validated, it was employed to predict EC values throughout the area, and then throughout the whole Island, combining NDVI data to generate spatiotemporal maps illustrating trends in soil salinity and vegetation health.
The final step involved interpreting the generated maps to discern spatial and temporal variations in salinity, effectively communicated through visualization methods that highlight changes in the agricultural landscape.
Figure 3 summarizes the methodology applied in this study.
The insights extracted from
Figure 4 offer a comprehensive view of the relationships between Sigma VH, Sigma VV, and various indices and the EC values. Among these factors, Sigma VH, Sigma VV, and three other indices stand out for their significant correlations, denoted as “R” (referring to the Pearson correlation coefficient), as depicted in
Figure 4. Notably, these correlations are consistently moderate and negative, ranging from approximately −0.4 to −0.47. Given this consistency, we simplified the analysis by using only Sigma VH in the subsequent phases of the study, as it had the strongest correlation (−0.47). This decision streamlined the approach and kept the focus on the most representative parameter for our research.
2.5. Machine Learning Algorithms Used
2.5.1. Random Forest Regression
The random forest (RF) algorithm is an ensemble learning method proposed by Breiman [
37]. It uses multiple decision trees with the same distribution to set up a forest for training and predicting sample data [
38]. Decision trees are non-parametric supervised learning methods that summarize decision rules from data, and use the tree structure to solve classification and regression problems.
In the case of regression, the RF algorithm uses regression trees (RT). At each candidate split of an RT, the mean of the samples in each resulting node is taken as the prediction, and the mean square error (MSE) between the samples and that mean is calculated. The algorithm selects the split that minimizes the MSE and continues until no more features are available or the overall MSE is optimal, at which point the regression tree stops growing [39].
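The branching rule can be illustrated with a minimal sketch that searches for the single threshold on one feature minimizing the weighted MSE of the two child nodes. This shows one split of one tree only; a random forest repeats this recursively on bootstrap samples and averages many trees. The function name and the toy data are assumptions made for the example, not the implementation used in this study.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, weighted_mse) of the split on x minimizing child MSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_score = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical feature values
        left, right = y[:i], y[i:]
        # The variance of a node equals the MSE around the node mean.
        score = (left.size * left.var() + right.size * right.var()) / y.size
        if score < best_score:
            best_threshold = (x[i - 1] + x[i]) / 2.0
            best_score = score
    return best_threshold, best_score

# Toy data: low backscatter paired with high EC, and vice versa (illustrative).
x = np.array([-26.0, -24.0, -22.0, -16.0, -14.0, -12.0])
y = np.array([20.0, 22.0, 21.0, 2.0, 1.0, 3.0])
threshold, score = best_split(x, y)
```

For this toy dataset the best threshold falls midway between the two clusters, which is the behavior the MSE criterion is designed to produce.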
2.5.2. Polynomial and Linear Regression
Polynomial regression is a type of linear regression that aims to find an appropriate polynomial of a certain order to describe the relationship between an independent variable (x) and a dependent variable (y) [
40]. It is a special case of multiple linear regression (MLR) where the polynomial equation is used to capture the curvilinear interaction between the variables [
41]. Polynomial regression allows for more flexibility in fitting the data compared to simple linear regression. The general form of polynomial regression is given in Equation (3):
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m \quad (3)$$
The coefficients $a_0, \ldots, a_m$ define the polynomial, and the order m determines its complexity. When m = 1, it becomes a linear model, and when m = 2, it becomes a quadratic regression model. Polynomial fitting aims to construct a polynomial of order m that approximates the data points, minimizing the residual error between the estimated and actual values.
The polynomial equation includes higher-order terms such as quadratic and cubic, which can better capture the nonlinear relationships between the variables [
42]. The degree of the polynomial determines the complexity of the model [
43]. The least squares method is commonly used to estimate the coefficients of the polynomial equation.
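A least squares polynomial fit can be sketched with NumPy as shown below. The data are generated from a known quadratic so that the recovered coefficients can be checked; they are not field data, and the helper name `fit_polynomial` is introduced here for illustration.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Least squares fit of a polynomial of the given order; returns a callable."""
    coeffs = np.polyfit(x, y, deg=order)  # coefficients, highest power first
    return np.poly1d(coeffs)

def sum_squared_residuals(model, x, y):
    """Sum of squared differences between observations and model predictions."""
    return float(np.sum((y - model(x)) ** 2))

# Synthetic data from y = 1 + 2x + 0.5x^2 (illustrative, noise free).
x = np.linspace(-3.0, 3.0, 13)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

linear_model = fit_polynomial(x, y, order=1)
quadratic_model = fit_polynomial(x, y, order=2)

ssr_linear = sum_squared_residuals(linear_model, x, y)
ssr_quadratic = sum_squared_residuals(quadratic_model, x, y)
```

Raising the order never increases the training residual, which is why a high-order fit on a small sample can look excellent while generalizing poorly; comparing models on held-out samples is the usual safeguard.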
2.5.3. Exponential Regression
Exponential regression represents a nonlinear modeling approach within statistical analysis, designed specifically to accommodate datasets that display exponential growth or decay characteristics. This method entails the fitting of an exponential function to the data, where the functional relationship between the dependent variable and one or more independent variables is expressed through an exponential equation [
44].
The functional form of the exponential equation is:
$$y = a e^{bx}$$
where:
y is the dependent variable, i.e., the quantity to be predicted or explained;
a is the coefficient that vertically scales the exponential curve;
e is the base of the natural logarithm, a constant approximately equal to 2.71828;
b is the rate parameter in the exponent, governing the rate of growth or decay;
x is the independent variable.
Exponential regression is particularly effective for datasets that show a constant percentage rate of increase or decrease across the range of the independent variable [45].
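One common way to fit a model of the form y = a·e^(bx) is to take logarithms, which turns it into the straight line ln(y) = ln(a) + b·x, and then apply ordinary least squares. The sketch below follows that route; it is an assumption that this matches how the model in this study was fitted, since a direct nonlinear least squares fit is an equally plausible alternative. The coefficients used to generate the synthetic data are arbitrary.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); requires y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fitting requires strictly positive y")
    b, ln_a = np.polyfit(x, np.log(y), deg=1)  # slope first, then intercept
    return float(np.exp(ln_a)), float(b)

# Synthetic, noise-free data from arbitrary coefficients (illustrative only).
x = np.array([-26.0, -22.0, -18.0, -14.0])
y = 0.05 * np.exp(-0.2 * x)

a, b = fit_exponential(x, y)
```

The log transform minimizes relative rather than absolute errors, so with noisy data it weights low values more heavily than a direct nonlinear fit would.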
3. Results
3.1. Performance of Machine Learning Algorithms
The performances of the machine learning algorithms, including polynomial regression, random forest, exponential regression and linear regression were evaluated based on their predictive accuracy for soil salinity estimation.
Table 2 presents the performance metrics obtained for the four regression models, fitted in Python using Sigma VH as the predictor and EC as the response.
From Table 2, the coefficient of determination (R²) and root mean square error (RMSE) values reveal distinct performance trends among the regression algorithms. The random forest model exhibits strong fitting, with an R² of 0.84 and an RMSE of 0.83, whereas the exponential regression model, despite a lower R² of 0.75, produces reliable predictions, as shown by its low RMSE of 0.47. The linear regression model exhibits weak explanatory power, with an R² of 0.32 and an RMSE of 0.79. The polynomial regression model has the highest R² (0.91) and the lowest RMSE (0.28). In terms of R², random forest and polynomial regression therefore perform best. However, although polynomial regression had the best scores, it produced a fifth-degree equation, which overly complicated the relationship we were seeking to establish, and we eliminated it due to concerns about overfitting. When deciding between the random forest (RF) and exponential regression models, even though the RF model had a higher R², we favored the exponential regression model for its simplicity, its alignment with the exponential-like curve observed in our data, and its lower RMSE. Our goal was to find a direct and interpretable relationship, making exponential regression a more suitable choice.
Given this dataset and the R² and RMSE results, the exponential regression model appears to be the most fitting choice. The trends in the data align well with the exponential form of the model, as shown in Figure 5, which supports its ability to represent the relationship between Sigma VH and EC.
The equation derived from the chosen exponential model is given in Equation (4):
where Y represents the predicted values of electrical conductivity and X corresponds to the Sigma VH values obtained from the Sentinel-1 data.
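Applying a fitted exponential model to a raster of Sigma VH values is a pixel-wise operation, as sketched below. The coefficients `A_HYPOTHETICAL` and `B_HYPOTHETICAL` are placeholders invented for this example and are not the coefficients of Equation (4); the small raster is likewise illustrative.

```python
import numpy as np

# Placeholder coefficients, NOT the fitted values of the study's model.
A_HYPOTHETICAL = 0.05
B_HYPOTHETICAL = -0.2

def predict_ec(sigma_vh_db, a, b):
    """Evaluate Y = a * exp(b * X) for every pixel; NaN pixels stay NaN."""
    return a * np.exp(b * np.asarray(sigma_vh_db, dtype=float))

# Toy 2 x 2 Sigma VH raster in dB, with one masked (NaN) pixel.
sigma_vh = np.array([[-28.0, -20.0], [-15.0, np.nan]])
ec_map = predict_ec(sigma_vh, A_HYPOTHETICAL, B_HYPOTHETICAL)
```

With a negative rate parameter, lower backscatter maps to higher predicted EC, consistent with the negative correlation between Sigma VH and EC reported earlier.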
3.2. Soil Salinity Prediction Maps
Then, the best-performing machine learning algorithm, i.e., exponential regression, was applied to the entire Sentinel-1 dataset to predict the annual soil salinity level across the study area and, afterwards, across the entire Kerkennah islands.
Figure 6 illustrates the distinct pattern of salinity distribution within the study area. The high-salinity regions, with EC values between 12 and 30 dS/m, are primarily concentrated in the northern and southern sectors, while the rest of the area displays low salinity levels close to zero. This distribution aligns with the demarcation between the Brenka sebkha to the north, the Ouled Kacem sebkha to the south, and the agricultural parcels of the “Zorii Forest” and the “hmeda” soil in between. Notably, smaller areas of elevated salinity (ranging from 6 to 18 dS/m) are evident within the agricultural lands, especially in the eastern and northeastern zones. These areas receive irrigation from wells with a salinity level of 3.6 g/L. According to Fehri [33], the average amount of salt added annually per hectare is about 9.35 tons within the irrigated perimeter of Ramla. Although this input alone should not produce such high EC values, it appears to have had a clear influence. It is also possible that the model overestimates these values, because during validation with the validation samples we noticed such overestimation in moderately saline soils inside the agricultural parcels.
Across the period from 2018 to 2023, a pattern of soil salinity variation is visible, and it appears to originate in the different annual precipitation patterns of these years. The interplay between rainfall and salinity dynamics becomes evident in these slight variations. Notably, 2019 and 2020 emerge as outliers, showing the lowest EC values observed across all the years examined. This deviation invites further investigation into the specific environmental factors and mechanisms that might have contributed to this exceptional dip in soil salinity.
Taking a closer look at Figure 7, where we applied the model to the entire archipelago to assess its efficiency, the saline flats (sebkhas) can be clearly distinguished by high EC values ranging from 6 to 30 dS/m, while the remaining areas exhibit notably lower EC values near zero. This contrast supports the accuracy of the mapping in delineating regions with varying soil salinity.
The depiction of saline soils is highly accurate and aligns seamlessly with the soil occupancy map created by Etienne in 2017 [
4].
Upon closer examination of the temporal data, a trend emerges. The islands display a slight oscillation in salinity, particularly noticeable in their southern sector, Gharbi Island, as evidenced by the transition from red to green shades in 2019 and 2020.
Unexpectedly, during these specific years, an apparent reduction in soil salinity becomes evident in the southern part of Gharbi Island. Even amid the fluctuations in overall salinity levels, the sebkhas maintain their consistent form and presence. An explanation emerges from the precipitation data for 2019 and 2020, recorded at the meteorological station of the Ramla sector and provided by the Territorial Extension Unit of Kerkennah (CTV Kerkennah). The hydrological year 2019–2020 recorded the highest annual precipitation among all the years under consideration: from September 2019 to August 2020, annual precipitation reached 353 mm, with distinct peaks in October 2019 (167.5 mm) and March 2020 (53.5 mm). The previous hydrological year, 2018–2019, also had an annual precipitation of 285.4 mm, which apparently influenced the EC values of 2019. This significant precipitation appears to have exerted a pronounced influence on the islands’ salt levels, particularly within the sebkha regions and most notably in the El West sebkha on Gharbi Island. This area, which is characterized by a very low altitude (between 0 and 2 m), appears to have been particularly affected by the substantial influx of rainfall, leading to observable changes in salt levels.
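Grouping monthly rainfall by hydrological year (September to August, as used above) can be sketched as follows. Only the two monthly totals quoted in the text (October 2019 and March 2020) are real values; the third record is invented to show how the year boundary is handled, so the resulting sums are partial and do not reproduce the 353 mm annual total.

```python
from collections import defaultdict

def hydrological_year(year, month, start_month=9):
    """Label a month with its hydrological year, e.g. Sep 2019 to Aug 2020 -> '2019-2020'."""
    start = year if month >= start_month else year - 1
    return f"{start}-{start + 1}"

def annual_totals(monthly_records):
    """Sum (year, month, precipitation_mm) records per hydrological year."""
    totals = defaultdict(float)
    for year, month, precipitation_mm in monthly_records:
        totals[hydrological_year(year, month)] += precipitation_mm
    return dict(totals)

records = [
    (2019, 10, 167.5),  # quoted in the text
    (2020, 3, 53.5),    # quoted in the text
    (2020, 9, 10.0),    # hypothetical value, falls in the next hydrological year
]
totals = annual_totals(records)
```

Aggregating by hydrological year rather than calendar year keeps a single rainy season within one total, which makes the comparison with annual EC maps more meaningful.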
3.3. Correlation with Vegetation Health Data
The soil salinity prediction maps were analyzed in conjunction with vegetation health information derived from the Sentinel-2 data and NDVI values (
Figure 8).
This analysis aimed to identify correlations between soil salinity levels and land use practices. Areas with sparse vegetation cover or land uses that intensify salinization, such as improper irrigation practices, may exhibit higher soil salinity levels.
Upon initial observation, a notable trend emerges: NDVI values decreased significantly over the period from 2018 to 2023. Areas with high NDVI values, ranging from 0.15 to 0.3, correspond to areas with dense vegetation, whereas areas with low NDVI values, ranging from 0 to 0.15, correspond to areas with sparse vegetation. This observation indicates a decline in the overall health of the islands’ vegetation.
A closer examination reveals an interesting anomaly in the year 2020, when vegetation experienced an unexpected increase, particularly in the northern part of Chergui Island. Upon reexamination of the soil salinity maps, a noteworthy observation arises: the regions exhibiting elevated NDVI values in 2020 coincide with areas of markedly high salinity. It is essential to recognize that sebkhas, known for their high salt content, harbor halophytes that can significantly affect NDVI values. These findings prompt further investigation into the relationship between vegetation, soil salinity, and the presence of halophytes. NDVI values alone may not provide a definitive understanding of the vegetation type, particularly whether it comprises halophytes. Therefore, identifying halophyte-covered areas as saline regions cannot rely on NDVI alone, as has also been demonstrated in another recent publication [7].
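One simple way to locate pixels where this ambiguity arises is to flag those that combine a relatively high NDVI with a high predicted EC. The sketch below uses the class boundaries mentioned in the text (NDVI of 0.15 and EC of 6 dS/m) as thresholds; treating such pixels as candidate halophyte cover is an illustrative assumption that would need field verification, and the arrays are toy values.

```python
import numpy as np

NDVI_HIGH = 0.15  # lower bound of the higher NDVI class described in the text
EC_SALINE = 6.0   # dS/m, lower bound of the elevated salinity range in the text

def candidate_halophyte_mask(ndvi, ec):
    """Flag pixels that are both vegetated (by NDVI) and saline (by EC)."""
    ndvi = np.asarray(ndvi, dtype=float)
    ec = np.asarray(ec, dtype=float)
    return (ndvi >= NDVI_HIGH) & (ec >= EC_SALINE)

# Toy 2 x 2 rasters (illustrative values only).
ndvi = np.array([[0.05, 0.20], [0.25, 0.10]])
ec = np.array([[20.0, 15.0], [1.0, 0.5]])
mask = candidate_halophyte_mask(ndvi, ec)
```

In this toy case only the pixel that is both green and saline is flagged, while the vegetated non-saline pixel and the bare saline pixel are not.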
4. Discussion
The vulnerable Kerkennah archipelago, in addition to facing sea-level rise, is facing critical drought conditions, leading to salinization of its surface and expansion of salt flats. Despite the islands’ apparent salinization, a comprehensive and systematic monitoring approach for soil salinization, especially in agricultural lands, has been lacking, limiting informed decision making to address its adverse impacts. Recent advancements in remote sensing technologies have transformed soil salinity monitoring and prediction. The integration of Geographic Information Systems (GIS) and advanced machine learning algorithms with satellite data offers wide spatial coverage, essential for agricultural and environmental perspectives.
This study’s approach involved using machine learning algorithms with Sentinel-1 data to build a soil salinity prediction model. We also utilized Sentinel-2 data to generate maps of the NDVI values to better understand the correlation between soil salinity levels and land use practices.
Several regression models (polynomial, random forest, linear, and exponential) were applied to Sentinel-1 data to identify the most reliable model for constructing soil salinity prediction maps in the Kerkennah region. Similar recent studies (2023) have highlighted the efficacy of various regression algorithms, such as random forest with R² = 0.80 [24] and the PLSR model with R² = 0.66 [46], for predicting soil salinity in semi-arid and arid regions. In our study, the exponential regression model was selected to predict soil salinity from ground truth measurements and Sentinel-1 SAR data because of its good fit (R² = 0.75) and low root mean square error (RMSE = 0.47 dS/m).
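To make the model-selection criteria concrete, the sketch below fits an exponential model of the form EC = a·exp(b·x) and computes R² and RMSE. It is a sketch under stated assumptions, not the authors' implementation: the predictor x (here imagined as a Sentinel-1 backscatter value in dB), the log-linear least-squares fitting procedure, the function names, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on log(y); requires y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, log_a = np.polyfit(x, np.log(y), 1)  # slope, intercept of the log-linear fit
    return np.exp(log_a), b

def predict(x, a, b):
    """Evaluate the fitted exponential model."""
    return a * np.exp(b * np.asarray(x, dtype=float))

def r2_rmse(y_true, y_pred):
    """Return the coefficient of determination and the root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(np.mean(resid ** 2))

# Synthetic, noise-free example (hypothetical predictor values, not study data).
x = np.array([-20.0, -18.0, -16.0, -14.0, -12.0, -10.0])
y = 50.0 * np.exp(0.25 * x)  # "measured" EC in dS/m, generated from known a and b
a, b = fit_exponential(x, y)
r2, rmse = r2_rmse(y, predict(x, a, b))
```

Because the synthetic data contain no noise, the fit recovers the generating parameters almost exactly. With real field measurements, R² and RMSE would quantify how well each candidate model reproduces the ground truth, and fitting in log space weights errors differently from a direct nonlinear fit.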
The maps, generated by applying the exponential regression model to Sentinel-1 data in the Google Earth Engine, revealed distinctive patterns in the annual soil salinity distribution of the study area over six years (2018 to 2023). High-salinity regions were concentrated in the northern and southern sectors, with EC values ranging from 12 to 30 dS/m. In contrast, the remaining areas displayed low salinity levels close to zero. These findings are consistent with the geographical features and the locations of the sebkhas (the Brenka sebkha in the north and the Oueld Kacem sebkha in the south) and agricultural parcels. Interestingly, smaller areas with EC values from 6 to 18 dS/m were evident within agricultural lands, particularly in the eastern and northeastern zones. Although these values may be overestimated, they provide insight into the potential long-term effects of irrigating with groundwater with a salinity of 3.6 g/L. From 2018 to 2023, temporal variation in soil salinity was observed, possibly attributable to annual precipitation patterns.
Applying the model to the entire Kerkennah archipelago yielded results that closely resemble the earlier land use maps created by Etienne in 2017 [4]. These results effectively differentiate saline soil areas from agricultural regions. Between 2018 and 2023, fluctuations in soil salinity were noted, potentially linked to the varying patterns of annual precipitation. Notably, the years 2019 and 2020 stood out for their markedly lower EC values, especially in the southern part of the islands (Gharbi Island). This anomaly prompted further investigation, and the meteorological station data showed that these two years experienced the highest annual precipitation levels. The association between increased rainfall and decreased soil salinity in these years offers insight into the interplay between precipitation and salt levels.
To understand the correlation between soil salinity levels and land use practices, this study integrated soil salinity prediction maps with Sentinel-2-derived NDVI data using the Google Earth Engine. The analysis revealed an overall decrease in NDVI values over the study period, suggesting a decline in vegetation health. Notably, an unexpected increase in NDVI values was observed in 2020, particularly in the northern part of the archipelago, which is attributed to the relatively high precipitation levels in 2019 and 2020. This increase coincided with areas of high soil salinity, primarily sebkhas known for their halophytic vegetation. However, NDVI data alone may not accurately differentiate halophyte-influenced areas, as demonstrated in a previous study [7].
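The overlay of NDVI and predicted salinity described above can be illustrated with a simple raster mask that flags pixels where relatively high NDVI coincides with high predicted EC, i.e., candidate halophyte areas. This is only a sketch under assumptions: the thresholds reuse the ranges quoted in this paper (NDVI ≥ 0.15, EC ≥ 12 dS/m), while the function name and the small example grids are hypothetical and not part of the study's workflow.

```python
import numpy as np

def flag_possible_halophytes(ndvi, ec, ndvi_min=0.15, ec_min=12.0):
    """Return a boolean mask of pixels with NDVI >= ndvi_min and EC >= ec_min (dS/m).

    Both rasters are assumed to be co-registered on the same grid.
    """
    ndvi = np.asarray(ndvi, dtype=float)
    ec = np.asarray(ec, dtype=float)
    if ndvi.shape != ec.shape:
        raise ValueError("NDVI and EC rasters must share the same grid")
    return (ndvi >= ndvi_min) & (ec >= ec_min)

# Hypothetical 2 x 2 rasters (not data from the study).
ndvi_grid = np.array([[0.20, 0.05],
                      [0.25, 0.18]])
ec_grid = np.array([[20.0, 25.0],
                    [1.0, 13.0]])
mask = flag_possible_halophytes(ndvi_grid, ec_grid)
```

Such a mask only marks where the two signals overlap. Confirming that the vegetation is actually halophytic would still require field observations or additional spectral information.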
The developed model presents a promising approach for soil salinity prediction and mapping in the Kerkennah archipelago. It could be used to monitor soil salinity across the entire island area, and especially in agricultural parcels. However, it has shown limitations when assigning EC values to saline soils, as it appears to overestimate EC values in saline and moderately saline areas. A future sampling campaign should be conducted to calibrate and better validate the model. This campaign would gather additional ground truth data from different areas of the islands that represent the varying degrees of soil salinity present in the study area. Incorporating these new data into the model would allow adjustments that improve its accuracy and bring the predicted EC values closer to the actual soil salinity levels. Another promising approach would be to enhance soil salinity models by integrating environmental variables through multi-criteria decision analysis.
5. Conclusions
In conclusion, this study highlights the potential of Sentinel-1 data for predicting soil salinity. It followed a localized model calibration approach, in which the model was constructed from ground truth measurements and Sentinel-1 SAR data.
The chosen machine learning algorithm, i.e., exponential regression, was applied to Sentinel-1 datasets using the Google Earth Engine, enabling the mapping of the average annual salinity of the Kerkennah islands over six years (2018 to 2023).
The model produced accurate soil salinity maps, highlighting the saline areas of the islands and even the moderately saline areas within the agricultural parcels. Although some overestimation of EC values in saline regions was observed, the model remains a reliable tool for monitoring soil salinization in the Kerkennah archipelago.
This study could serve as a foundational step toward the development of more precise models for monitoring soil salinity using SAR data in the Kerkennah archipelago. To this end, a subsequent sampling campaign could be carried out to fine-tune and better validate the current model. Alternatively, exploring other machine learning algorithms might yield an even more refined predictive model.