A Modified Bare Soil Index to Identify Bare Land Features during Agricultural Fallow-Period in Southeast Asia Using Landsat 8

Abstract: Bare soil is a critical element in the urban landscape and plays an essential role in urban environments. Yet, the separation of bare soil and other land cover types using remote sensing techniques remains a significant challenge. There are several remote sensing-based spectral indices for barren land detection, but their effectiveness varies depending on land cover patterns and climate conditions. In this research, we introduce a modified bare soil index (MBI) using shortwave infrared (SWIR) and near-infrared (NIR) wavelengths derived from Landsat 8 (OLI, Operational Land Imager). The proposed index was tested on two different bare soil patterns in Thailand and Vietnam, where large areas of bare soil during the agricultural fallow period obstruct the separation between bare soil and urban areas. Bare soil extracted from the MBI achieved a higher overall accuracy of about 98% and a kappa coefficient over 0.96, compared to the bare soil index (BSI), normalized difference bareness index (NDBaI), and dry bare soil index (DBSI). The results also revealed that the MBI contributes considerably to the accuracy of land cover classification. We suggest using the MBI for bare soil detection in tropical climatic regions.


Introduction
Remote sensing and satellite imagery have been widely utilized for monitoring land and environmental changes, including urban expansion [1][2][3], deforestation [4][5][6], climate change impacts [7,8], wildfire damage [9,10], and other natural and anthropogenic dynamics. Presently, there are diverse high-resolution satellites which positively support urban studies, such as HyMap [11], Worldview [12], SPOT [13,14], and Sentinel-2 [15,16]. An integration of Sentinel-2 and Sentinel-1, a free-of-charge Synthetic Aperture Radar (SAR) sensor, has potential for urban mapping [17,18]. Yet, the major limitations of these observation data are data-acquired costs and time of coverage, especially for urban expansion studies, which are often considered over a long-term period. The cost per scene for commercial satellites is costly relative to the income of developing countries. The freely accessible data with fine resolution such as the Sentinel mission has only been available since 2015. Therefore, Landsat data (i.e., 4, 5, 7-ETM, and 8-OLI) are commonly used in numerous studies worldwide since its data cover nearly 50 years consecutively [19]. With medium multispectral resolution and powerful thermal infrared (TIR) sensors, Landsat data are applied for monitoring urban expansion and surface temperature, an essential parameter in the urban environment [20][21][22][23][24].
The presence of bare soil in peri-urban and countryside regions makes accurate classification of urban land covers difficult. This is partly because bare soil and urban features have relatively similar spectral characteristics [25]. The confusion could be limited by using settlement products (e.g., World Settlement Footprint-WSF [26], Global Human Settlement Layer-GHSL [27,28]) to mask out urban areas. However, data continuity is a disadvantage, since these products are available only for a few specific points in time. Multitemporal data are a potential approach to detect bare soil, since agricultural bare soil is seasonal whereas urban features are permanent. Yet, applying multitemporal data in the tropics is relatively difficult due to data loss during the rainy season, when the cloud cover rate reaches 85-95% [29,30]. Bare soil detection, therefore, is still a challenging task.
The bare soil includes fallow agricultural land during the fallow period of cultivation-land preparation time of transition between two crops-and available land after the land clearance process or pre-urbanized parcels [31]. In some urban areas, the presence and extension of bare soil are closely related to dust storm frequency, especially in desert and arid climate regions [32][33][34]. Besides, barren land without vegetation is vulnerable to sediment transport, soil erosion, and landslides [35][36][37]. Such fallow land is essential because of its relatively lower heat capacity relative to other land cover types, inducing high temperatures over its land surface. It is, therefore, vital to classify bare soil in urban studies, mainly when the urban heat island (UHI) is assessed in various temporal and spatial scales [38]. The dynamics of land cover changes and inaccuracy in classifying bare soil leads to inadequate assessment of the heat harshness in inner city areas.
Barren land is often misclassified as built-up areas and vice versa. We may face an "unrealistic urbanization" because the classified urban areas are actually bare soil areas. Consequently, urban development rates and developing tendencies might be erroneously assessed in the very first step of urban planning. Once barren land is efficiently differentiated from built-up land, it improves urban expansion assessment and helps contribute to other developments such as agricultural management. For instance, there were 238,276 hectares of rice cultivation in the Vietnamese Mekong Delta damaged during the severe drought and salinity intrusion in the dry season of 2015-2016 [39]. Such damage could be assessed by monitoring the fallow land areas. An identical evaluation is applicable for aquaculture, such as allocating fallow shrimp ponds [40,41]. Therefore, bare soil monitoring is crucial for land use planning and agricultural practices, and policymaking.
Scholars worldwide have proposed many spectral indices derived from Landsat to discriminate bare soil from other land covers (Table 1). Generally, bare soil indices are constructed from Landsat wavelengths ranging from the visible to the near-infrared (NIR) and shortwave infrared (SWIR), combined into two-, three-, or four-band indices. Koroleva et al. [42] proposed an approach to separate bare soil using the Red-NIR spectral space, which is a principal basis for determining soil type [43]. Although bare soil occupies an ellipse-shaped area on the Red-NIR plot, this area is smaller than the vegetation area [42], which may make it difficult to discriminate bare soil from other land cover types in general. Lin et al. [44] proposed a non-ratio index, the bareness index (BI), for general urban feature extraction from the Red, NIR, and SWIR1 (1.57-1.65 μm) bands. However, this index has limited ability to separate bare soil from built-up areas [44]. Likewise, the enhanced built-up and bareness index (EBBI) was introduced to detect urban features and bare soil, but its differentiation ability is inadequate [31]. Bare soil features are also not distinguished from built-up features by bare soil index 3 (BSI3) [45] and normalized difference soil index 1 (NDSI1) [46]. For the bare soil indices BSI2 [47], BSI [48], BSI1 [34], and the dry bare soil index (DBSI) [49], barren land and urban areas cannot be separated because their threshold values are similar and overlap. Whether the index is a two-band or a four-band ratio, these indices differentiate bare soil parcels less clearly because of this limited capability. DBSI was initially proposed for arid climatic regions (i.e., Iraq), and applying DBSI in humid regions is therefore inappropriate. Deng et al. [50] introduced a normalized difference soil index (NDSI2) in an attempt to highlight bare soil information by reversing the modified normalized difference water index (MNDWI) [51], based on the high reflectance of bare soil in the shortwave infrared wavelength. Yet, the NDSI2 detects large and dry bare soil parcels, while small and sparse parcels are often neglected. The thermal infrared (TIR) wavelength has also been used to facilitate bare soil detection, for example in the normalized difference bareness index (NDBaI) [52] and the normalized difference bare land index (NBLI) [53]. The performance of NDBaI and NBLI is relatively high compared to other barren indices. Nevertheless, the disadvantage of thermal infrared-based indices is the low resolution of the TIR band (i.e., 60 m in Landsat TM/ETM, 100 m in Landsat 8). By using panchromatic (PAN) images with better spatial resolution (i.e., 15 m) and SWIR2 (2.11-2.29 μm), the modified normalized difference soil index (MNDSI) overcomes the low-resolution limitation of NDBaI. Besides, MNDSI shows excellent discrimination between urban areas and bare soil in a comparatively hot area like Dehradun (India) [54]. Most of these indices were first designed for temperate climates and high-latitude regions, and it is well known that a barren index's performance is significantly affected by climate patterns [49]. Hence, they are not suitable for the tropical monsoon climate with its different bare soil patterns (i.e., homogeneous versus sparse bare soil, see Section 2.1).
This paper aims to introduce a more accurate bare soil index to support long-term studies related to land cover classification using Landsat images in the tropics. Firstly, we introduce a new bare soil index to improve the discrimination between bare soil and other land covers at finer resolution using a spectral index derived from Landsat 8 imagery. Subsequently, we assess the proposed modified bare soil index's performance in two test sites and compare it with three other available bare soil indices.
Figure 1 presents a flowchart that describes the procedures of development and validation of the modified bare soil index, from the data used to the output. Each step is discussed in detail below.

Test Sites
We chose the eastern Bangkok metropolis (Thailand) as the first test site (test site #1) because this is a semi-urban region with heterogeneous urban and annual agricultural landscapes, which become seasonal bare land in the fallow period [55] (Figure 2). The concentrated and smaller-size residential areas are interspersed with the fields due to the suburbanization process [56]. The second site (test site #2) is located in central Soc Trang province, a local city in the Vietnamese Mekong Delta, where most agricultural land is double-cropped rice or a rain-fed rice system [57]. The immense paddy fields become seasonal bare soil during the dry and salinity intrusion periods. Both test sites are located in flat low plains and distributed over the Gley soil (Gleysols-GL) based on the World Reference Base for Soil Resources (WRB) [58,59]. Specifically, the test sites are square with a 20 km edge and dominated by four major land cover types (i.e., bare soil, built-up, vegetation, and water bodies).

Data
Two Landsat 8 (OLI/TIRS) scenes were collected from the US Geological Survey (USGS) website (https://earthexplorer.usgs.gov accessed on 26 August 2020.) The image covering Bangkok metropolitan (Thailand) was captured on 19 February 2020 (WRS-2 path/row: 129/50), and the other image was also in the dry season of 2020 in Soc Trang province (Vietnam) on 25 February with path 125 and row 53. All images are Level-2 data (surface reflectance) with a cloud cover rate of less than 0.5%, minimizing atmospheric effects for further analyses.
Compared to earlier Landsat satellites, Landsat 8 additionally carries coastal aerosol and cirrus bands for cloud-related studies. The main drawback of Landsat 8 data is the low resolution of its thermal infrared bands, about 100 m compared to 60 m in Landsat 7, which affects the accuracy of urban bare soil detection using the thermal band [60,61]. The thermal infrared band was therefore excluded from the candidate indices, and only the remaining bands were used so that the index resolution stays at 30 m. Shortwave infrared (SWIR) bands were given particular attention since they can penetrate thin clouds and discriminate vegetation from soil [62,63]. Figure 3a shows the reflectance properties of the four land cover types, based on 4000 training pixels from both sites. The graph also shows that the profiles of bare soil and built-up cover are relatively similar, which makes separating the two challenging in urban studies. The reflectance of both features varies across the visible wavelengths, but urban features reflect more energy in this range. In contrast, bare soil reflects more near-infrared (0.85-0.88 μm) and shortwave infrared 1 (SWIR1: 1.57-1.65 μm) energy, band 5 and band 6 on Landsat 8, respectively. Relative to urban features, bare soil tends to absorb energy in shortwave infrared 2 (SWIR2: 2.11-2.29 μm).

Modified Bare Soil Index
Water, in contrast to bare soil and built-up surfaces, absorbs most of the energy from the visible to the infrared spectrum, especially in the SWIR1 and SWIR2 channels. Vegetation absorbs these SWIR wavelengths, whereas NIR energy is mostly reflected back to the sensor from vegetated surfaces (Figure 3a,b). We used the dissimilarities among bare soil, urban, and vegetation in the NIR, SWIR1, and SWIR2 bands as the basis for separating bare soil from other land cover features. Based on these observations, we developed the modified bare soil index (MBI). Firstly, we tried a two-band index (Equation (1)) based on the distinction between bareness and urban areas in SWIR1 and SWIR2 to enhance bare soil signals (Figure 4b). This ratio is the normalized shortwave-infrared difference soil-moisture index (NSDS), which responds to soil moisture because of water absorption in the shortwave infrared [64]:

$$\mathrm{NSDS} = \frac{\mathrm{SWIR1} - \mathrm{SWIR2}}{\mathrm{SWIR1} + \mathrm{SWIR2}} \qquad (1)$$

This ratio highlights bare soil features compared to urban features, but the highest values of the index belong to vegetation. Subsequently, we proposed to add the NIR band into Equation (1):

$$\mathrm{MBI} = \frac{\mathrm{SWIR1} - \mathrm{SWIR2} - \mathrm{NIR}}{\mathrm{SWIR1} + \mathrm{SWIR2} + \mathrm{NIR}} + f \qquad (2)$$

where f is an additional factor (f = 0.5).

Figure 4. (b) Two-band index (Equation (1)), which mostly highlights vegetation in a bright tone, with bare soil in gray and urban areas in black; (c) three-band bare soil index (Equation (2)), emphasizing bare soil (bright tone), with urban features in gray and vegetation in black.
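The two indices can be illustrated with a minimal per-pixel sketch. It assumes the normalized-difference forms NSDS = (SWIR1 − SWIR2)/(SWIR1 + SWIR2) and MBI = (SWIR1 − SWIR2 − NIR)/(SWIR1 + SWIR2 + NIR) + f with f = 0.5. The function names and the zero-denominator handling are illustrative choices, not part of the original method description.

```python
import math


def nsds(swir1, swir2):
    """Two-band normalized SWIR difference (assumed form of Equation (1))."""
    denom = swir1 + swir2
    if denom == 0:
        return math.nan  # no signal in either band: index undefined
    return (swir1 - swir2) / denom


def mbi(nir, swir1, swir2, f=0.5):
    """Three-band modified bare soil index (assumed form of Equation (2)).

    Inputs are surface reflectances of Landsat 8 band 5 (NIR),
    band 6 (SWIR1), and band 7 (SWIR2); f is the additional factor.
    """
    denom = swir1 + swir2 + nir
    if denom == 0:
        return math.nan
    return (swir1 - swir2 - nir) / denom + f


def mbi_image(nir_band, swir1_band, swir2_band, f=0.5):
    """Apply mbi() pixel by pixel to three equally sized 2-D lists."""
    return [
        [mbi(n, s1, s2, f) for n, s1, s2 in zip(nir_row, s1_row, s2_row)]
        for nir_row, s1_row, s2_row in zip(nir_band, swir1_band, swir2_band)
    ]
```

For full scenes the same arithmetic would normally be applied elementwise to raster arrays rather than nested lists; the list version is used here only to keep the sketch dependency-free.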

Comparisons with Other Bare Soil Indices
This study calculated three other bare soil indices on Landsat 8 to compare with the MBI. The bare soil index (BSI1) was first introduced by Rikimaru et al. [65] to monitor forest status. Diek et al. [48] then replaced SWIR1 with SWIR2 to improve bare soil detection; this variant is also called the bare soil index (BSI). Another index, the normalized difference bareness index (NDBaI) proposed by Zhao and Chen [52], is based on the higher radiation of bare soil in the thermal band. NDBaI has been an option to monitor bare soil, land conversion, impervious surface, and its relation to surface temperature [66][67][68][69][70]. However, applying the thermal band to separate urban areas from bare soil is expected to be inefficient because the thermal band's pixel size is always coarser than that of the multispectral bands, at 60 m for Landsat TM/ETM and 100 m for Landsat 8. A more recent bare soil index is the dry bare soil index (DBSI). The DBSI is a four-band index, which is an inverse modified normalized difference water index (MNDWI) adjusted by the normalized difference vegetation index (NDVI) [49]. The three indices were computed using the corresponding formulas in Table 1.
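As an illustration of the verbal description of DBSI, the sketch below assumes the standard forms MNDWI = (Green − SWIR1)/(Green + SWIR1) and NDVI = (NIR − Red)/(NIR + Red), so that the inverse MNDWI adjusted by NDVI becomes (SWIR1 − Green)/(SWIR1 + Green) − NDVI. The authoritative formulas are those in Table 1; the function names here are illustrative.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def inverse_mndwi(green, swir1):
    """MNDWI with the band order reversed, so dry surfaces score high."""
    return (swir1 - green) / (swir1 + green)


def dbsi(green, red, nir, swir1):
    """Dry bare soil index as described in the text:
    inverse MNDWI minus NDVI (assumed reading of 'adjusted by')."""
    return inverse_mndwi(green, swir1) - ndvi(red, nir)
```

Subtracting NDVI lowers the score of vegetated pixels, which is the role the text assigns to the NDVI adjustment.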

Bare Soil Classification Using Thresholding
Although a single index image together with other multispectral bands can be classified by supervised and unsupervised algorithms, thresholding is the simplest method to identify a specific land cover type [71][72][73][74][75][76]. Thresholding has also been used to evaluate the performance of remote sensing-based indices [49]. After the four bare soil indices were computed, the index images were classified into two land cover types, bare soil and non-bare soil, using an appropriate threshold value. In practice, an index image depicts more than two land cover types, and its histogram has several peaks, so conventional binary thresholding is unsuitable for classifying it. We therefore applied multi-Otsu thresholding (MOT), which extends Otsu thresholding [77], to determine the bare soil threshold values for the different bare soil indices. To ensure uniformity in bare soil identification between the two test sites, the mean threshold values were used instead of site-specific values. The computed threshold values were then slightly adjusted by comparison with false color composite (FCC) images to obtain the most appropriate threshold for bare soil detection on each index.
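The idea behind multi-Otsu thresholding can be sketched as an exhaustive search over histogram cut points that maximizes the between-class variance. This is a minimal illustration, not the implementation used in the study; the bin count, the default of three classes, and the function name are assumptions.

```python
from itertools import combinations


def multi_otsu(values, classes=3, bins=64):
    """Return classes-1 thresholds that maximize between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no threshold exists")
    width = (hi - lo) / bins

    # Histogram of the index values.
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    # Prefix sums of counts and of count * bin center.
    cum_n, cum_s = [0], [0.0]
    for n, c in zip(hist, centers):
        cum_n.append(cum_n[-1] + n)
        cum_s.append(cum_s[-1] + n * c)

    best_score, best_cuts = -1.0, None
    for cuts in combinations(range(1, bins), classes - 1):
        edges = (0,) + cuts + (bins,)
        score = 0.0
        for a, b in zip(edges, edges[1:]):
            n = cum_n[b] - cum_n[a]
            if n == 0:
                break  # an empty class is not a valid partition
            s = cum_s[b] - cum_s[a]
            # n * mean^2; maximizing the sum of these terms is
            # equivalent to maximizing between-class variance.
            score += s * s / n
        else:
            if score > best_score:
                best_score, best_cuts = score, cuts
    if best_cuts is None:
        raise ValueError("too few distinct values for the requested classes")
    return [lo + c * width for c in best_cuts]
```

Following the text, the thresholds computed for the two test sites would then be averaged and fine-tuned against the FCC images; that manual step is not reproduced here.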

Pure Pixel Selection
We selected "pure pixels", or confident pixels, for four land cover types, namely bare soil, built-up, vegetation, and water bodies. Pixel selection drew on image interpretation, ground sampling, and field experience. Besides, very high resolution (VHR) images in Google Earth were also used as references, as suggested by previous studies [78][79][80][81][82]. To increase objectivity and independence, we selected two separate datasets in each test site to compare index performance and evaluate bare soil extraction. The first dataset consists of 2000 separate pixels, 500 for each land cover type. The second dataset contains 400 pixels, 200 for bare soil and 200 for non-bare soil. A pixel was randomly chosen only when it was known with confidence to belong to a particular land cover and was not adjacent to another land cover. Additionally, any two selected pixels had to be at least 90 m apart (i.e., about 3 pixels on the Landsat image). Finally, pixel purity was tested by the Jeffries-Matusita and Transformed Divergence separability measures [83][84][85]. These measures indicate how well the pixel group of one land cover can be separated from that of another land cover in a pair. All land cover types are compared in this way, in which a group is separated from another with a value range of 1.71-2.01. All selected pixel groups achieved fine separability (> 1.99).
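The Jeffries-Matusita (JM) measure can be illustrated for a single feature under a Gaussian assumption. The study applies the measure to multi-band pixel groups, which requires covariance matrices; the one-dimensional version below is a simplification for illustration only, and the function name is an assumption.

```python
import math
from statistics import mean, variance


def jeffries_matusita(a, b):
    """JM distance between two samples of one feature (range 0 to 2).

    Uses the Bhattacharyya distance between two univariate Gaussians
    fitted to the samples; both samples need non-zero variance.
    """
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    bhattacharyya = (
        (m1 - m2) ** 2 / (4 * (v1 + v2))
        + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2)))
    )
    return 2 * (1 - math.exp(-bhattacharyya))
```

Values close to 2 indicate that the two groups barely overlap, while values near 0 indicate that they are statistically indistinguishable.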

Performance Assessment
The second datasets of pure pixels (i.e., bare soil and non-bare soil) were used to evaluate how well the four bareness indices separate bare soil from other land cover types. The bare soil maps were compared to the truth points by constructing a confusion matrix, from which two well-known remote sensing indicators were estimated: overall accuracy and the kappa coefficient.
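Both indicators follow directly from the confusion matrix. The sketch below computes them for a binary bare soil map; the label encoding (1 for bare soil, 0 for non-bare soil) and the function names are illustrative assumptions.

```python
def confusion_matrix(truth, predicted):
    """Return (tp, fn, fp, tn) for binary labels, 1 = bare soil."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    return tp, fn, fp, tn


def overall_accuracy_and_kappa(truth, predicted):
    """Overall accuracy and Cohen's kappa from the confusion matrix."""
    tp, fn, fp, tn = confusion_matrix(truth, predicted)
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    if expected == 1:
        return observed, 1.0  # only one class present and fully agreed
    return observed, (observed - expected) / (1 - expected)
```

Kappa discounts the agreement that would occur by chance, which is why it is reported alongside overall accuracy.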
In addition to the thresholding classification evaluated by the above metrics, a random forest classifier (RFC) was applied to assess the performance of the MBI in a multivariate classification against the other three indices. The dataset of four land cover types was divided into two sets: the classifier was trained on 70% of the pixels and validated on the remaining 30% [71,86]. Seven spectral bands (i.e., RGB, NIR, SWIR1, SWIR2, TIR1) and the four bare soil indices were used to train the model. There are two critical parameters in the RFC: the number of trees (ntree) and the number of randomly sampled variables (mtry). The mtry parameter is automatically optimized. The number of classification trees was set to a value that is not too small, i.e., ntree = 500 [7,71,87]. Subsequently, the mean decrease accuracy (MDA) was computed for each contributing variable. A more important variable is indicated by a higher MDA, which reflects the model's accuracy loss when this variable is excluded [87]. The classification was repeated n = 1000 times on completely different training datasets to limit possible biases resulting from the variable selection.
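The notion of mean decrease accuracy can be illustrated with a permutation test: shuffle one variable at a time and record how much the accuracy of an already trained classifier drops. This is a simplified stand-in for the out-of-bag computation inside a random forest, not the procedure used in the study; `predict` is a hypothetical callable representing any trained classifier.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Permutation-based accuracy drop for each feature column."""
    rng = random.Random(seed)
    rows = [list(r) for r in rows]
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the labels.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable that the classifier does not rely on yields a drop of zero, whereas an influential variable such as a well-separating index yields a large drop.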

Bare Soil Indices
Four bare soil indices calculated for test sites #1 and #2 are presented in Figure 5. The index images were uniformly rescaled to a value range of −1 to 1 for comparison among the indices. Visually, BSI emphasizes built-up covers instead of bare soil areas, because buildings in the index images appear in a lighter tone (i.e., higher than 0.5) than bare soil parcels (Figure 5a,c). Although DBSI highlights bare areas better than BSI, several built-up areas are still in a light tone, especially in the city center and for buildings with highly reflective roof materials. The NDBaI and our bare soil index, MBI, performed well both in emphasizing bare soil and in separating it from built-up covers. Bare areas can be visibly identified by pixels with values moving towards 1.0 on both NDBaI and MBI (Figure 5b,d). An overlapping rate test shows that the overlapping rate (OLR) between bare soil and urban pixels reaches 35% and 25% for DBSI and NDBaI, respectively. Therefore, determining an optimal threshold value to differentiate bareness from urban areas is relatively challenging. Moreover, even if an approximate value can be chosen, these two covers cannot be fully distinguished because of the existing overlap. The mean ranges in BSI are relatively distinct,

Extraction of Bare Soil Areas using Modified Bare Soil Index
Bare soil index images were applied with different thresholding values for mapping ratio images into bare soil and other land cover types (i.e., as values out of the thresholding ranges). In the first test site, bare soil thresholding ranges were BSI ( ) and a visualization technique based on image interpretation keys (e.g., color, tone, texture, size, shape, and association) when as much bare soil as possible is able to be retrieved. The bare soil extraction results using threshold values are shown in Figure 7. Generally, these four indices were able to detect bare soil features, especially for large and homogeneous areas of bare soil (Figure 7b,c). Yet, the classification ability of these bareness indices is dissimilar. The BSI classified more bare soil areas than there are in reality, i.e., it is clearly depicted in the overview bare soil layers. For instance, once many bare soil areas were detected using the BSI index, several built-up areas were misclassified as bare soil (Figure 7a,c: BSI). The misclassification of bare soil was reduced when using the DBSI, but there are still buildings classified as having bare soil features, especially in the city core of Soc Trang province (test site #2) and a dense cluster of residential areas in eastern Bangkok (Figure 7a,c: DBSI). The extracted bareness layers using NDBaI and MBI were relatively similar to visible bare soil in composite images in terms of spatial patterns and distribution, except for some small-size and scattered residential areas that were incorrectly identified. The difference can only be seen in Figure 7a,c for NDBaI and MBI, where a strip of scattered residential areas and the city center were accurately classified as other land cover types by MBI instead of bare soil features as classified by NDBaI. In short, among four indices of bare soil, MBI is the most effective index in identifying bare soil features using index values in terms of visibility. Detection performance is detailed in Section 3.3. 
Figure 7. Spatial distribution of bare soil layers extracted from BSI, NDBaI, DBSI, and MBI on entire test sites and zoom-in areas in solid line squares where there were significant differences among the indices compared to FCC images: (a,b) Test site #1, (c,d) Test site #2.

Performance Assessment
The bare soil layers were compared to ground truth points (i.e., the second dataset of pure pixels). The results were mainly evaluated by the overall accuracy and kappa coefficients.
Figure 8 shows the variable importance assessment using the RFC, which reveals that the MDA in test site #1 is generally lower than in test site #2. The more extensive the bare soil areas are, the more misclassification there is liable to be. MBI is the most significant contributor to the classifier in comparison to the multispectral bands and the three other bare soil indices. The classification accuracy decreases by 28 ± 14.6% in test site #1 and 26.3 ± 4.2% in test site #2 without MBI. In test site #2, the accuracy drops by 77.1% without the contribution of MBI. The contribution of BSI is also noticeable, with an MDA of about 22.7 ± 3.3% and 25.1 ± 2.3% in test sites #1 and #2, respectively. It is followed by the NDBaI, with a small difference in MDA between the two test sites. The DBSI efficiency is relatively low in this test, with its MDA lying between 14 ± 1.5% (test site #2) and 16.6 ± 1.3% (test site #1). Among the individual spectral bands, the TIR wavelength is the least contributing variable for the classification.
In contrast, as expected, NIR, SWIR1, and SWIR2 are the most meaningful individual contributors among the multispectral wavelengths. The contribution of these bands is even higher than that of NDBaI and DBSI. This again supports the rationale for the proposed band combination and indicates that the MBI has potentially higher efficiency than the other indices in this study.

Discussions
Several remote sensing indices are derived from different spectral wavelengths to enhance and separate bare soil from other land cover features. Unlike some other indices (e.g., NDVI and MNDWI), which can be widely applied in many regions regardless of climate conditions and geographical features, the bare soil index is a relatively sensitive indicator. The bare soil index's performance depends on soil composition, soil moisture, and even surrounding green covers [49,64]. Rasul [49] also indicated that the bare soil index's performance differs in humid and dry-arid regions. For example, the dry bareness index (DBSI) was assessed as having high accuracy in Erbil (Iraq), but it is an inappropriate indicator for bareness mapping in our test sites (i.e., tropical monsoon regions). We should consider the study area's climatic conditions as well as the effectiveness when choosing an appropriate bare soil index. Furthermore, surface characteristics should be considered when applying the spectral indices in general and the MBI in particular. For instance, the effectiveness of NDVI in vegetation detection is negatively influenced by urban architectures such as cool roofs, cool pavement materials, and rooftop gardens. Similarly, the MBI performance may be affected by urban materials such as clay roof tiles and road paving bricks.
The modified bare soil index (MBI) is based on NIR, SWIR1, and SWIR2 spectral wavelengths. This index effectively distinguishes bare soil from other areas at both test sites. The NDBaI also had high separation efficiency (Table 2), but its accuracies varied between the two test sites. Thus, we can preliminarily conclude that the bare soil classification efficiency of the MBI is independent of bare soil and urban patterns. Compared with the NDBaI, the proposed index uses infrared wavelengths with the same pixel resolution instead of the 100 m thermal infrared band in Landsat 8.
Additionally, the application of NDBaI for finer resolution satellites, e.g., Sentinel-2, WorldView, is impossible because the thermal infrared band is unavailable from these satellites. However, the wavelengths required for MBI estimation are all applicable for Sentinel-2 and WorldView. MBI can even be considered for bare soil-related studies at larger scales using low-resolution data such as MODIS and Sentinel-3. Therefore, the MBI may provide a better option and has the potential to be applied to a wide range of data in bare soil-related research, such as the pre-urbanization stage and agricultural losses caused by natural disasters. Although there are comparable spectrum bands among the satellites, the dissimilarities in pixel size and central wavelengths may affect the MBI's performance. Therefore, empirical studies on MBI's ability to identify bare soil based on the data mentioned earlier are expected to be carried out in the future.
Two kinds of bare soil are visible in a color composite image, namely dry and humid bare soil (Figure 9c-f). Dry bare soil (type #2) has a high reflectance value in shortwave infrared wavelengths, while humid bare soil (type #1) has a low reflectance value in these wavelengths because the energy is absorbed by its high moisture content (Figure 9g). As a result, humid bare soil is misidentified as urban features (Figure 9a,b). When we reduce the MBI threshold value to capture the whole area of humid bare soil, several urban areas are misclassified. Fortunately, humid bare soil is uncommon in both test sites, so it does not significantly affect the general bare soil extraction. Nevertheless, future studies should continue to seek an index or technique that fully distinguishes urban areas from wet bare soil for research in urban, semi-urban, and countryside areas.

Conclusions
We enhanced the bare soil signals by a three-band bare soil index called modified bare soil index (MBI) using NIR, SWIR1, and SWIR2 wavelengths in Landsat 8 to precisely classify and differentiate bare soil from built-up and other land covers, especially for areas disturbed by seasonal bare soil in the tropical regions. The value range of urban area and bareness in MBI has fewer overlapping zones in comparison to BSI, NDBaI, and DBSI, whereby MBI significantly reduces misclassification between built-up and bare soil features. Bare soil areas derived by MBI thresholding value achieved high accuracy with kappa coefficients over 0.96 and overall accuracy higher than 98% in the study areas. The MBI is also a critical predictor in land cover classification compared to individual bands and other considered bare soil indices.
The proposed MBI is a potential option for classifying bare soil in tropical climatic regions. The MBI is more efficient than NDBaI, which relies on the low-resolution thermal infrared wavelength in Landsat 8. Besides, the MBI might be applicable to finer-resolution satellites such as Sentinel-2 and WorldView, and to any satellite with similar spectral bands. Yet, when applying the MBI, climatic conditions, bare soil patterns, and land cover characteristics should be considered to optimize the index's effectiveness.