www.mdpi.com/journal/remotesensing Article Application of Remote-sensing Data and Decision-Tree Analysis to Mapping Salt-Affected Soils over Large Areas

Expert assessments of crop and range productivity for very large arid and semiarid areas worldwide are ever more in demand, and these studies require greater sensitivity in delineating the different grades or levels of soil salinity. In conjunction with a field study in arid southeastern Oregon, we assess the merit of adding decision-tree analysis (DTA) to a commonly used remote-sensing method. Randomly sampled surface soil horizons were analyzed for saturation percentage, field capacity, pH and electrical conductivity (EC). IFSAR data were acquired for terrain analysis and surficial geological mapping, followed by derivation of layers for analysis. Significant correlation was found between EC values and surface elevation, bands 1, 2, 3 and 4 of the Landsat TM image, and the brightness and wetness indices. Maximum-likelihood supervised classification of the Landsat images yields two salinity classes: non-saline soils (EC < 4 dS m⁻¹), with a prediction accuracy of 97%, and saline soils (EC > 4 dS m⁻¹), with a prediction accuracy of 60%. Addition of DTA results in successful prediction of five classes of soil salinity and an overall accuracy of about 99%. Moreover, the calculated area of salt-affected soil was overestimated when mapped using remote-sensing data only, compared to that predicted by additionally using DTA. DTA is a promising approach for mapping soil salinity in more productive and accurate ways than remote-sensing analysis alone.


Introduction
Soil salinization is a major land-degradation problem in arid and semi-arid environments [1-4]. Mapping techniques that can inventory and monitor soil salinization over large areas in more efficient, faster and less expensive ways are required for precision agriculture and for sustaining soil productivity in many parts of the world. Predictive mapping techniques, such as linear and multiple regression, geostatistics (i.e., Kriging and CoKriging), fuzzy logic, neural networks, and classification and regression trees [5-9], have been used to develop soil and natural resource maps. Each of these techniques provides optimal results under certain circumstances. Geostatistics, for example, yield significant results when data are normally distributed and stationary (mean and variance do not vary significantly in space); where significant deviations from normality and stationarity arise, the analysis becomes problematic [10,11]. This normality issue is difficult to constrain at smaller scales, especially when values of environmental parameters and soil properties change dramatically from one location to another across the soilscape. Mapping soil salinity is just such a case: electrical conductivity (EC) values change significantly between salt-affected and normal soils over relatively short distances within large areas. Using geostatistics in this case will result in significant errors, and the predicted values will depart significantly from the original. Mapping salt-affected soils in the field is difficult because they are interspersed with normal soils and form no contiguous pattern [12].
Geostatistical techniques such as cokriging could benefit from the availability of secondary data in developing prediction maps, but all of the data have to be numerical and normally distributed, not nominal or categorical. Thus, valuable data such as geology and vegetation, which either have a significant influence on salt accumulation or are influenced by soil salinity, could not be used in producing predictive soil maps. This issue is in addition to the need of geostatistical techniques for large amounts of field data in order to obtain optimal results. Decision-tree analysis (DTA), on the other hand, is a predictive mapping technique that can be used in developing soil salinity maps over large areas. It is a non-parametric, or distribution-free, statistical method, which means data are not required to fit a normal distribution curve. It can be used with ordinal as well as categorical attributes. Moreover, it requires no assumptions about the data and provides interpretable prediction rules that can be extrapolated to similar areas [6,13,14].
Remote-sensing data have been used successfully in mapping soil salinity for decades [2,3,12,15-20]. The principle behind this success is based on the dramatic effects that soil salinity has on soil physical, chemical and biological properties. The quantities and changes in soil properties can be monitored using remote sensing.
The objective of this study is to demonstrate a combined method involving remote-sensing data and DTA for developing soil salinity maps in an efficient and timely manner. This study was conducted in support of an initial soil survey of Malheur County, Oregon, the largest unmapped area in the USA.
Recognizing the fine distinctions between five soil salinity classes used by the US Department of Agriculture Natural Resources Conservation Service [21] is one of the means by which soil map units are being designed and delineated in the survey area.

Site Description
Field reconnaissance of the study area, located in central Malheur County, southeastern Oregon, USA (Figure 1), reveals that saline soils and intergrades are abundant. For this experiment, a 1,160 km² area was chosen, covering the range of saline soils thought to be present in the region [22]. Surface elevation ranges from 1,175 to 1,771 m (mean = 1,297 m) aMSL and slopes range from 0 to 57° (mean = 2°). Aridisols are the dominant soil order [23] in the area, as the soil moisture regime is aridic with mean annual precipitation (MAP) varying from 178 mm in low areas to 330 mm at high elevations (mean = 254 mm); a portion of this falls as snow during winter, with 220 mm at median elevations. The soil temperature regime is mesic to frigid, with surface air temperatures ranging from a monthly mean minimum of −8 °C (January) to a monthly mean maximum of 33 °C (August). Soils in the area are developed on Tertiary basalt and andesite, tuffaceous sedimentary rocks, lacustrine deposits and fluvial sedimentary rocks [24]. Prevalent landforms in the area are alluvial fans, fan remnants, basins, flood plains, pediments, and playas. Vegetation varies from native vegetation to agricultural pasture land [25]. Dominant native types of vegetation in the area are Wyoming big sagebrush (Artemisia tridentata wyomingensis), shadscale saltbush (Atriplex confertifolia), greasewood (Sarcobatus vermiculatus), black sagebrush (Artemisia nova), basin big sagebrush (Artemisia tridentata tridentata), and low sagebrush (Artemisia arbuscula arbuscula).
Most soils in the area are well drained and the water table lies far below the soil surface, except in areas close to agricultural pasture land and the internal playas. Agricultural lands are pump-irrigated and irrigation water flows to nearby low-lying areas, resulting in higher water tables. Coyote Lake, like many smaller playas in the area, is an internally draining playa (alkali flat or sabkha) that is periodically water-logged during the winter and spring seasons and dries out during the summer. Playas are bare, shallow depressions with a high content of soluble salts and high alkalinity.

Data Sources and Description
Spatially extensive digital data appropriate to describing the morphology and distribution of soils (cf. [7]), especially the abundance of salinity, were used in this work, including digital elevation model (DEM) data, Landsat imagery, geology, vegetation, and climate (Figure 1). A description of the data layers and their sources is presented in Table 1, and a more detailed description of the attribute values for each layer is provided in Table 2. No soil map exists for the area. Spatial data were represented using the raster data model in ArcGIS, which is helpful in data modeling and data manipulation. Data layers were resized to a spatial resolution of 30 m, which is the resolution of most of the data used in this study. To reduce errors due to georeferencing, as many of the layers are digitized from paper maps, all layers were closely examined for spatial correspondence to late-generation USGS digital orthoquad maps (combined aerial imagery and topographic data). Elevation, terrain attributes and landforms were developed based on two digital elevation models (DEMs). Initially, we used stock USGS DEMs with their 10 m grid size, but found that playa rims, dune fields and other landforms were absent or poorly resolved in such data. To improve on this, we had Interferometric Synthetic Aperture Radar (IFSAR, also known as InSAR) data acquired from fixed-wing aircraft and processed by Intermap Technologies, Inc. (Englewood, CO, USA), under contract with the USDA-NRCS, for our use in developing the DTA technique in its application to digital soil mapping. Because the existing (bedrock) geological map for the area is a preliminary 1:500,000-scale representation [32] and therefore unsuitable for use in this experiment, we developed a 1:24,000-scale surficial geology map based on field reconnaissance, aerial photographs, the IFSAR DEM, and the bedrock geology. The surficial geological units include age, process and material properties, e.g., Holocene aeolian sand dunes, Late Pleistocene sandy gravelly alluvial fans, and residuum on Tertiary andesitic basalt.

Soil Samples and Analysis
About 210 surface soil samples, nominally the 0-15 cm layer, were collected from soil pits excavated and described according to US Soil Survey standards [21]. Field work occurred during July and August (dry season) 2006, when salt efflorescence reaches its maximum in this region (M. Keller, personal comm., 2006). Sampling sites were selected using a stratified random sampling method, modified by access constraints imposed by the existing network of remote dirt roads. Stratification of the random sample depended on landscape complexity and the need to represent all area-class map units [3,33]. Samples were air dried, crushed and passed through a 2 mm sieve. Electrical conductivity (EC) was measured in the soil-saturation extract in decisiemens per meter (dS m⁻¹) according to [34] (Table 3). Soil reaction (pH) was also measured in the soil paste. No data were available on water table depth or water salinity for this remote area.

Mapping Methods
Two methods were used in this paper to develop soil salinity maps: remote-sensing analysis without and with DTA. Two Landsat TM images, acquired on August 17, 2005 [26], were mosaicked and subset to cover the area of interest (Figure 1). Salt-affected soils are usually poorly vegetated, and stressed vegetation could be used as an indirect sign of the presence of salinity. Two vegetation indices were therefore integrated in the analysis: the Normalized Difference Vegetation Index (NDVI) [27] and the Soil Adjusted Vegetation Index (SAVI) [28]. Tasseled Cap Transformation (TCT) indices (brightness, greenness, and wetness) [29] were used to distinguish areas with high spectral reflectance, green vegetated areas, and soil and vegetation moisture. Band ratios such as b3/b1 and b5/b7 were also calculated and used to interpret some properties conventionally associated with non-saline soils. Band ratio b3/b1 is found to reflect iron content, as reported by [35], whereas b5/b7 is found to have a strong correlation with clay mineral content in poorly vegetated areas [36].
Using DTA, several additional environmental variables were incorporated in developing soil-salinity prediction maps. DTA was carried out using the See5/C4.5 algorithm [37], a system for automated knowledge acquisition for artificial intelligence (AI) applications [38]. A constructed decision tree consists of nodes representing variables or attributes, branches representing attribute values, and leaves representing classes. A decision tree is built by selecting, at each node, the attribute that minimizes the amount of disorder in the sub-tree rooted at that node.
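To make the attribute-selection step concrete, the following is a minimal sketch, assuming that "disorder" is measured as Shannon entropy and that the split is scored by plain information gain. C4.5-family algorithms normalize this score as a gain ratio, which is omitted here for brevity. The attribute names, class labels and training pixels are hypothetical and are not taken from the study data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of `target` before the split minus the weighted entropy after
    splitting `rows` on the values of `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical training pixels (illustrative only, not from the study).
pixels = [
    {"landform": "playa", "geology": "lacustrine", "salinity": "high"},
    {"landform": "playa", "geology": "lacustrine", "salinity": "high"},
    {"landform": "floodplain", "geology": "alluvium", "salinity": "high"},
    {"landform": "fan", "geology": "alluvium", "salinity": "low"},
    {"landform": "fan", "geology": "basalt", "salinity": "low"},
    {"landform": "pediment", "geology": "basalt", "salinity": "low"},
]

# Score each candidate attribute and keep the one that removes the most disorder.
gains = {a: information_gain(pixels, a, "salinity") for a in ("landform", "geology")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy sample the landform attribute separates the two classes completely, so it would be chosen for the root node; categorical layers such as geology enter the calculation in the same way as any other attribute.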
Soil samples were classified according to their EC values into five classes [21]: (1) EC < 2 dS m⁻¹, very low; (2) EC from 2 to 4 dS m⁻¹, low; (3) EC from 4 to 8 dS m⁻¹, moderate; (4) EC from 8 to 16 dS m⁻¹, high; and (5) EC > 16 dS m⁻¹, very high. Sampling points were buffered using a 300 m (10 pixels) buffering distance on the GIS map to collect other localized environmental data that have a major influence on the development of soil salinity at each sampling location, for use in training the prediction model. These locations were randomly sampled using the Classification and Regression Tree (CART) model [39]. About 21,400 training samples (pixels) were used to train the model, whereas about 7,600 validating samples (pixels) were used to validate the model. Training and validation data were boosted using 10 trials to enhance the prediction accuracy. Using this function creates a sequence of decision trees, where each subsequent tree attempts to fix the misclassification errors of the previous one. Each decision tree makes a prediction, and the final prediction is a weighted vote of the predictions of all trees [40]. Also, the growing tree was pruned by 30% to reduce over-fitting and increase the model efficiency [3].
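The five-class EC scheme and the weighted vote of boosted trees can be sketched as follows. This is an illustration under stated assumptions rather than the See5 implementation: the text does not say which side of each class boundary is inclusive, so a value lying exactly on a boundary is assigned here to the higher class, and the per-tree weights in the usage example are hypothetical.

```python
from collections import defaultdict

# Upper bounds (dS/m) of the first four salinity classes.
# Assumption: a value exactly on a boundary falls in the higher class.
EC_CLASSES = [(2.0, "very low"), (4.0, "low"), (8.0, "moderate"), (16.0, "high")]

def ec_class(ec):
    """Map an EC value (dS/m) to one of the five salinity classes."""
    for upper, name in EC_CLASSES:
        if ec < upper:
            return name
    return "very high"

def weighted_vote(predictions):
    """Combine per-tree predictions given as (class_label, weight) pairs and
    return the class with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in predictions:
        totals[label] += weight
    return max(totals, key=totals.get)

# EC values of 12.5 and 82.8 dS/m are the floodplain extremes reported in the text.
print(ec_class(1.2), ec_class(12.5), ec_class(82.8))

# Three hypothetical boosted trees disagree; the summed weights decide.
print(weighted_vote([("moderate", 0.9), ("high", 0.5), ("high", 0.6)]))
```

Here two lower-weight trees that agree on "high" outvote a single higher-weight tree that predicts "moderate", which is the sense in which later trees can correct the errors of earlier ones.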

Field Observations
Across the area we find a significant correlation between EC values and surface elevation (Table 4). Field observations in the study area indicate the presence of salt accumulation in low-lying landforms across the landscape, including Holocene floodplains, Holocene playas and Quaternary fluvial fans. Salts effloresced on the soil surface in the floodplain of Crooked Creek (close to the agricultural land). EC values were very high at that location and varied from 12.5 to 82.8 dS m⁻¹. This high salt content is associated with salt-tolerant vegetation (halophytes) such as salt grass (Distichlis spicata var. stricta) and greasewood. Also, this location is low-lying and the ground water table was encountered at about 60 cm. Higher salt content was also observed in the playas: no vegetation grows in these areas, soils are strongly compacted, and pH values are greater than 8.5. However, the salt accumulation on playa surfaces is significantly less than that observed on floodplains in the northeastern quarter of the study area. The distal ends of fluvial fans, whether Holocene or Middle Pleistocene in age, have morphological traces of salts within the soil matrix.

Image Analysis and Visual Interpretation
Salt-affected soils could easily be identified visually from the Landsat images using the false color composite (RGB 432) and the brightness and wetness indices. The spectral reflectance curve (Figure 2) shows that severely salt-affected soils have a high reflectance in the visible (bands 1, 2 and 3) and near-infrared (band 4) parts of the spectrum and relatively low reflectance in the mid-infrared parts of the spectrum (bands 5 and 7). A significant correlation was found between the EC values and bands 1, 2, 3, and 4 of the Landsat images (Table 4). Also, a significant correlation was found between the EC values and the brightness and wetness indices. Landsat images were classified using the maximum likelihood supervised classifier in the ENVI program into 5 classes: (1) saline soil; (2) agricultural land; (3) inter-mountain basins big sage steppe; (4) low sagebrush steppe; and (5) inter-mountain big sagebrush shrubland (Figure 3). The output map was reclassified into two classes: saline (class 1), EC > 4 dS m⁻¹, and non-saline (classes 2, 3, 4, and 5), EC < 4 dS m⁻¹. Prediction accuracy was about 60% for salt-affected soils and about 97% for non-saline soils, with an overall accuracy of about 95%. The classified saline area represents 6.7% of the total area, whereas the non-saline area represents 93.3%.
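As an illustration of how a correlation between EC and a spectral band might be computed, the sketch below uses the Pearson product-moment coefficient. The text does not state which correlation statistic underlies Table 4, so the choice of Pearson's r is an assumption, and the EC and band 3 reflectance values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample: EC (dS/m) and band 3 reflectance at five sampling points.
ec = [1.0, 3.0, 6.0, 12.0, 40.0]
band3 = [0.12, 0.15, 0.20, 0.28, 0.45]

r = pearson_r(ec, band3)
print(round(r, 3))
```

A coefficient near +1 in such a sample would correspond to the pattern described above, in which more saline surfaces reflect more strongly in the visible bands. Testing significance additionally requires the sample size, which this sketch does not address.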

Decision-Tree Analysis and Predicted Soil Salinity Map
Decision-tree analysis yielded a predictive soil map with classes of salinity (Figure 4). Prediction confidence in the classification accuracy of the decision tree is high (Figure 4). The overall accuracy of the decision tree produced without boosting was 98.4% and the Kappa coefficient was 0.90 (Table 5). Producer's accuracy varied from 77.7 to 99.2% with a mean value of 87.6%, whereas user's accuracy varied from 79.0 to 99.2% with a mean value of 85.8%. The prediction accuracy was enhanced by using 10 trials of boosting: the overall accuracy was 98.8% and the Kappa coefficient was 0.92 (Table 6). Producer's accuracy varied from 78.8 to 99.2%, with a mean value of 91.4%, whereas user's accuracy varied from 71.0 to 99.6% with a mean value of 86.0%. The calculated area of saline soils represents 1.9% of the total area, whereas the non-saline area represents 97.4%.
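The accuracy measures reported above are conventionally derived from a confusion matrix of validation pixels. The sketch below shows one way to compute overall accuracy, producer's and user's accuracy, and Cohen's Kappa. It assumes that rows hold the reference (field) class and columns hold the mapped class, and the 2 x 2 matrix is hypothetical rather than taken from Tables 5 or 6.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square confusion matrix where
    matrix[i][j] is the number of validation pixels of reference class i
    that were mapped to class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(row) for row in matrix]                      # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # mapped totals

    overall = sum(diagonal) / total
    producers = [diagonal[i] / row_totals[i] for i in range(n)]  # 1 - omission error
    users = [diagonal[j] / col_totals[j] for j in range(n)]      # 1 - commission error

    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa

# Hypothetical two-class validation result (not from the study).
matrix = [[90, 10],
          [5, 95]]
overall, producers, users, kappa = accuracy_metrics(matrix)
print(overall, producers, users, kappa)
```

Because Kappa discounts chance agreement, it is lower than the overall accuracy whenever one class dominates the validation sample, which is consistent with Kappa values of 0.90 to 0.92 accompanying overall accuracies above 98%.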

Discussion
Multi-class soil salinity maps are required for modern management of arid lands, and techniques such as that described here are needed to develop such inventories efficiently. Our results indicate that Landsat TM images of the study area effectively identify bare areas that have a high reflectance due to their high salinity content and/or salt efflorescence on the soil surface. This result agrees with that obtained by [41], who reported that salt-affected soils with salt encrustation at the surface are generally smoother than non-saline surfaces and cause high reflectance in the visible and near-infrared bands. It was also noticed that the high spectral reflectance of some areas east and northeast of playas was due to millimeter-thin deposits of yellowish saline dust blown from the dry lake surfaces, resulting in the misclassification of these locations as highly saline soils. Accordingly, classifying salt-affected soils based on spectral signatures could overestimate their areas.
Vegetation indices (NDVI, SAVI, and the greenness index) did not have a significant correlation with the EC values, which indicates that halophytes could not be used to identify salt-affected areas under vegetation cover. This could also be due to the coarse resolution of the Landsat image (30 m pixel size) and the smaller size of the salt-affected areas. Landsat data could only distinguish between highly saline soils and normal or non-saline soils; salinity classes or degrees in between could not be discriminated. Similar results are noted by [3] and [12]. We also found that the wetness index has a significant correlation with the EC values, which could be due to the tendency of salt-affected soils to retain high moisture content.
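For reference, the two vegetation indices discussed above can be sketched from red and near-infrared reflectance (Landsat TM bands 3 and 4, respectively). The soil adjustment factor of 0.5 is the commonly used default and is an assumption here, since the text does not state the value applied; the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance.
    Assumes nir + red is non-zero."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor is the soil brightness
    correction (0.5 is a commonly used default, assumed here)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical surface reflectances (TM band 4 = NIR, band 3 = red).
vegetated = {"nir": 0.45, "red": 0.08}
salt_crust = {"nir": 0.50, "red": 0.45}

for name, px in (("vegetated", vegetated), ("salt crust", salt_crust)):
    print(name, round(ndvi(px["nir"], px["red"]), 3), round(savi(px["nir"], px["red"]), 3))
```

Both indices respond to the contrast between near-infrared and red reflectance, so a bright salt crust that reflects strongly in both bands scores near zero, much like any other bare surface. This offers one way to read the weak relationship between these indices and EC reported above.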
The soil salinity map developed by DTA successfully predicts five classes of salinity levels, a significant improvement over standard remote-sensing methods. This is likely due to the ability of DTA to integrate other environmental variables that have a significant influence on the development of secondary salinization. Surficial geology and terrain attributes, notably elevation and slope, are critical variables in predicting soil salinity across such a broad area. We find the difference between the USGS 10 m DEM and the IFSAR 5 m DEM to be significant, in that the former partially or wholly misses key landforms with saline soils. Secondary salinization mostly occurs in lowland areas, where groundwater frequently rises through the soil profile [12,38]. Therefore, it is important to identify these locations using the DEM. Soil salinity is influenced not only by the morphology of the soil profile but also by the soil physical, chemical and biological properties [12,42]. Bedrock geology and its chemical composition were integrated in the analysis and could enhance the prediction accuracy. Soils in the study area are developed on Quaternary sediments and Tertiary volcanic flow rocks, mostly basalt and andesitic basalt, which are rich in feldspars and salt-bearing inclusions and vugs. Dominant vegetation is another variable that is directly influenced by higher soil salinity levels. This was also integrated in the analysis; however, the vegetation map has a coarser scale that does not represent the vegetation types associated with soil salinity (halophytes), especially over smaller areas. This biological relationship to salinity is noted elsewhere across the study area, yet is not useful for large-scale distinctions of map classes.

Conclusions
Current remote-sensing methodology used in mapping soil salinity could be significantly improved if decision-tree analysis (DTA) is incorporated in such efforts. Remote-sensing data alone have been only a nominally successful tool for mapping soil salinity over large areas, as they can distinguish only two classes: highly salt-affected soils, indicated by poor, sparse vegetation and high spectral reflectance, and non-saline soils, indicated by healthy vegetation. This is insufficient for modern soil salinity management with its five classes of salinity. Moreover, salt-affected areas may be overestimated when mapped using only spectral signatures.
The development of a soil salinity map additionally using DTA successfully distinguishes between the five classes of salinity used in the USA and elsewhere. DTA proved to be an efficient, useful approach for mapping soil salinity over large areas compared to traditional remote-sensing analysis. DTA incorporates several environmental variables that significantly influence the development of soil salinity, not only the spectral properties of the soil surface. The use of surficial geology, terrain and landform map layers, especially those developed using high-resolution IFSAR DEMs, significantly enhances delineations of map classes. Using this technique could significantly enhance the productivity and accuracy of soil salinity mapping compared to conventional mapping methods, especially in such remote, inhospitable areas. Predictive maps of multi-class soil salinity should now be easier to obtain elsewhere around the world.

Figure 1. Study area, in Malheur County, southeastern Oregon, USA (see inset map), represented by a Landsat TM image of the study area (left) and IFSAR DEM with hillshade (right). Both images have the overlay of soil pits and sampling points.

Figure 2. Spectral reflectance of salt-affected soils collected by using (a) Landsat TM image and (b) spectroradiometer. Spectral properties of salt-affected soils in the study area were measured in the field at almost the same time as the acquisition of the Landsat TM images (August 17, 2005). The spectral reflectance was measured using the FieldSpec® Pro, manufactured by Analytical Spectral Devices of Boulder, Colorado. The instrument has a field of view of 25 mm. It covers the spectral range from 350 to 2,500 nm with an average bandwidth of 1 nm.

Table 1. Databases and their sources.

Table 2. Environmental variables and their properties.

Table 3. Soil samples and their XY coordinates, saturation percentage (SP), field capacity (FC), pH and EC values.

Table 4. Correlation between EC values and numerical environmental variables.

Table 5. Producer's, user's, overall accuracy and Kappa coefficient for predicted soil salinity classes developed without boosting.

Table 6. Producer's, user's, overall accuracy and Kappa coefficient for predicted soil salinity classes developed with 10 trials of boosting.