1. Introduction
An estimate of suspended sediment yield is required in engineering practice concerned with improving land and water management in a river basin. The transport of sediment in rivers has a series of negative effects, such as reservoir siltation and channel bed modification. Such effects may disturb the sediment balance in the basin. In particular, sediment that is eroded from sloping areas can accumulate in the river network, thereby affecting channel water conveyance [1]. Moreover, soil erosion causes several well-documented problems, such as the loss of fine and nutrient-rich topsoil, which reduces land productivity, and the pollution of surface water bodies [2,3,4,5]. The study of erosion and sediment yield has long been established as an important area of hydrological research because of the economic significance of the processes involved.
As in other developing Southeast Asian countries, land degradation is a major problem in Thailand. This problem manifests itself in the deterioration of soil structure and fertility, particularly on sloping land [6]. Cultivation on sloping areas affects the environment through siltation, flash floods, poor crop yields, etc. [7]. The estimation of sediment yield is required in planning and designing water resource development projects, especially for studying the feasibility of a dam or a barrage, assessing sediment budgets and examining the delivery of sediment and contaminants to the estuarine or ocean system; it also provides a valuable means of studying the denudation process [8]. However, sediment data is rarely available due to the lack of monitoring. Erosion and sediment transport are complex phenomena that are affected by several factors, such as climatic and geomorphological conditions, land use, etc.
The approaches employed to estimate sediment yield can be divided into four main groups [9], namely: (1) the soil erosion and sediment delivery approaches, wherein estimated soil erosion rates are factored by a sediment delivery ratio, which is often based on basin characteristics; (2) the physically-based and/or distributed basin modeling approaches, wherein the movement of water and soil is estimated in a distributed way throughout the basin; (3) the models relating sediment concentration or load to the river flow, wherein measured sediment concentration data is related to river flow characteristics; and (4) empirical models based on broad basin and climate descriptors, wherein sediment yield equations are derived from known basin characteristics. The soil erosion and sediment delivery approaches are usually based on the Universal Soil Loss Equation (USLE) [10] and the concept of the sediment delivery ratio (SDR) [11]. Although many combinations of erosion and sediment delivery modelling are available [12,13,14,15,16], they still require calibration and, thus, cannot be transferred from the study area to other catchments and environments. Moreover, USLE cannot be applied easily to non-agricultural land uses or to areas outside of the range of the original development and application [9].
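As a minimal illustration of the first group of approaches, the sketch below multiplies a USLE-style gross erosion rate by a sediment delivery ratio. The factor names follow the standard USLE product form A = R·K·LS·C·P; the function names, units and all numeric values are hypothetical and are not taken from this study.

```python
def usle_soil_loss(r, k, ls, c, p):
    """Gross soil loss per unit area (e.g. t/ha/yr) as the product of the USLE factors."""
    return r * k * ls * c * p


def sediment_yield(soil_loss, area, sdr):
    """Sediment yield at the basin outlet: gross erosion scaled by the delivery ratio."""
    if not 0.0 <= sdr <= 1.0:
        raise ValueError("sediment delivery ratio must lie between 0 and 1")
    return soil_loss * area * sdr


# Hypothetical factor values, for illustration only
loss = usle_soil_loss(r=300.0, k=0.3, ls=2.0, c=0.1, p=1.0)
total = sediment_yield(loss, area=1000.0, sdr=0.2)  # area in ha
print(f"soil loss = {loss:.1f} t/ha/yr, sediment yield = {total:.0f} t/yr")
```

Both the USLE factors and the delivery ratio must be calibrated for a given basin, which is why this kind of model is hard to transfer between catchments.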
The physically-based model describes the physical processes involved in the flow and transport of sediment, and these processes use the laws of the conservation of mass, momentum and sediment transport to explain the inherent processes; however, the physically-based model requires extremely onerous input data. When the input data is scarce, the large number of involved parameters may cause significant uncertainty in soil erosion estimates [1]. Furthermore, the simulation of sediment transport at the basin scale is still computationally very expensive. The models relating sediment concentration or load to river flow are most commonly used in practice. These models assume that river flow, rather than sediment supply, is the dominant factor in sediment yield. However, such models also require a large amount of data to give realistic estimates of long-term average annual sediment yield. This approach is based on “what has happened” rather than “what may happen”. Understanding sediment supply and transport processes is required to extrapolate their potential consequences during unmonitored future climate and/or land-use scenarios. The empirical model is based on limited knowledge of the processes and relies on the data describing input and output behavior. This method, however, is able to make abstractions and generalizations of the process and often complements the physically-based model [17].
Several authors have shown the effectiveness of statistical relationships, which allow one to estimate river sediment transport from easily available geomorphologic, hydrological and climatic parameters [1,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]. Sediment yield is controlled by the factors that control erosion and sediment delivery, including local topography, soil properties, climate, vegetation cover, catchment morphology, drainage network characteristics and land use [26,28]. Langbein and Schumm [32] studied the relationship between mean annual precipitation and sediment yield in the United States, while Walling and Webb [31] concluded that no simple relationship exists between climate and sediment yield, because climate’s effect on sediment load is very complex. Anderson [33] proposed three major groups of explanatory variables for relating sediment yield to watershed variables. These are the hydrologic event variables, the watershed condition and land use variables, and the inherent watershed variables, such as area, geology and physiography. He also noted that the sediment measuring device and its efficiency are important for obtaining accurate sediment measurements. Bray and Xie [29] identified six categories of variables that can be related to the processes associated with the generation and delivery of suspended sediment to the basin outlet in Canada: hydroclimatic conditions, basin topographic features, land surface features, soil characteristics, channel network features and human activities. Ciccacci et al. [30] and Grauso et al. [1] investigated the correlation between the sediment yield and some geomorphologic, hydrological and climatic parameters in Italy. They found a significant relationship between average yearly sediment yield per unit watershed area and the drainage density. Restrepo et al. [23] developed a multiple regression model for estimating the sediment yield in a South American watershed. They reported six catchment variables that predict sediment yield (runoff, precipitation, precipitation peakedness, mean elevation, mean water discharge and relief), with mean annual runoff as the dominant control factor. Syvitski and Milliman [22] described the factors influencing the estimation of sediment loads from rivers, namely drainage area size, basin relief, geologic condition, climate and vegetation cover. They successfully estimated the long-term flux of sediment delivered by rivers to the coastal zone (488 global rivers) with the BQART model, which is influenced by geomorphic and tectonic characteristics, geography, geology and human activities. Recently, Cohen et al. [19] introduced a comprehensive global fluvial sediment predictor named WBMsed (Water Balance Model with sediment), a distributed global-scale riverine sediment flux model. The major inputs for the model are anthropogenic factors, ice cover, lithology, reservoir sediment trapping, drainage area size, maximum basin relief, daily temperature and daily discharge.
The statistical method for reducing a large number of interrelated variables to a smaller number of dominant variables is called principal component analysis (PCA) and has been used in many areas of scientific research [17,34,35,36,37,38,39,40]. Recently, Tayfur et al. [41] investigated sediment load prediction and generalization from the laboratory scale to the field scale using PCA in conjunction with data-driven methods of artificial neural networks and genetic algorithms. Despite its widespread use, PCA has a disadvantage: the interpretability of the second and higher components may be limited. For this reason, Varimax rotation is applied to the PCA solution to enhance the interpretability of the components by maximizing a simple structure. An alternative rotational approach is independent component analysis (ICA) [42,43,44], which finds a linear representation of non-Gaussian data, so that the components are statistically independent. Westra et al. [44] report that the PCA and Varimax rotations provide fairly accurate interpretations for global and local phenomena, respectively, while the interpretability of ICA results appears to be less successful.
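The combination of PCA and Varimax rotation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the procedure used in this study: the data are synthetic, the function names `pca_loadings` and `varimax` are hypothetical, and the rotation uses the common SVD-based iteration for the Varimax criterion.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Component loadings from the correlation matrix of X (rows = basins, columns = variables)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    explained = eigvals[order] / eigvals.sum()
    # Scale eigenvectors by sqrt(eigenvalue) so entries are variable-component correlations
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return loadings, explained


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation using an SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break  # no meaningful improvement: stop iterating
        criterion = new_criterion
    return loadings @ rotation, rotation


# Synthetic example: 40 "basins", 6 correlated variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.3 * rng.normal(size=(40, 6))

loadings, explained = pca_loadings(X, n_components=2)
rotated, rotation = varimax(loadings)
print("fraction of variance explained:", np.round(explained, 3))
print("rotated loadings:\n", np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable keeps the same communality (row sum of squared loadings) before and after rotation; only the distribution of loadings across components changes, which is what makes the rotated components easier to interpret.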
The objectives of this study are to propose a complementary methodology that can be used in the prediction of suspended sediment yield in an ungauged basin (i.e., one where the river flow data is unavailable) based on a data-driven modeling approach. The use of the PCA with Varimax rotation to identify the key factors affecting sediment yield and the use of multiple regression analysis to establish the relationships between suspended sediment yield and the basin’s characteristics in terms of geomorphology and climate are also investigated.
2. Study Area
The study basin covers an area of 102,636 km² of Ping, Wang, Yom and Nan river basins in Northern Thailand. It is located between 15°30′ N and 20°00′ N latitudes and 98°00′ E and 101°30′ E longitudes (Figure 1). The Ping, Wang, Yom and Nan rivers are the main tributaries of the Chao Phraya River, the most important river of Thailand. These four tributaries originate from the Phi Pannam Mountain and course through mountainous areas before merging with each other in the alluvial plains of the Nakhon Sawan Province to form the Chao Phraya River.
Figure 1.
The study area showing the locations of suspended sediment gauging stations.
The study area is mountainous, with agriculturally productive valleys. The Ping, Wang, Yom and Nan rivers travel from north to south. The climate of the study area is dominated by seasonal monsoons. The rainy season that lasts from May to October is influenced by the southwest monsoon from the Indian Ocean and the depressions originating in the Pacific Ocean. The average monthly temperature ranges from 15 °C in December to 40 °C in April, except in high altitude locations. The study area can be classified as a tropical rainforest with high biodiversity. The general description of the study area [45] is presented in Table 1.
Table 1.
A general description of the study area.
Basin Characteristic | Ping | Wang | Yom | Nan |
---|---|---|---|---|
Drainage area (km²) | 33,896 | 10,791 | 23,616 | 34,331 |
Main river length (km) | 740 | 460 | 735 | 770 |
Forest area (percent) | 73.66 | 76.07 | 49.68 | 45.14 |
Mean annual discharge (m³·s⁻¹) | 276.59 | 51.26 | 115.95 | 381.07 |
Mean annual runoff (10⁶ m³·yr⁻¹) | 8725.30 | 1617.50 | 3656.60 | 12,014.80 |
Mean annual rainfall (mm·yr⁻¹) | 1125 | 1099 | 1159 | 1241 |
No. of selected rain gauge stations | 45 | 23 | 23 | 34 |
No. of selected suspended sediment gauging stations | 22 | 1 | 4 | 10 |
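The discharge and runoff rows of Table 1 are related by a unit conversion, which the short sketch below checks. It assumes a 365-day year (an assumption, since the averaging convention is not stated here); the variable and function names are illustrative. Under that assumption, the converted discharges agree with the tabulated runoff volumes to within about 0.1%.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year

# Values copied from Table 1:
# (mean annual discharge in m3/s, mean annual runoff in 10^6 m3/yr)
table1 = {
    "Ping": (276.59, 8725.30),
    "Wang": (51.26, 1617.50),
    "Yom": (115.95, 3656.60),
    "Nan": (381.07, 12014.80),
}


def runoff_from_discharge(q_m3s):
    """Convert a mean discharge (m3/s) to an annual runoff volume (10^6 m3/yr)."""
    return q_m3s * SECONDS_PER_YEAR / 1e6


relative_diff = {}
for basin, (discharge, runoff) in table1.items():
    computed = runoff_from_discharge(discharge)
    relative_diff[basin] = abs(computed - runoff) / runoff
    print(f"{basin}: computed {computed:.1f}, tabulated {runoff:.1f} (10^6 m3/yr)")
```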
In terms of soil erosion, Alford’s report [46] on mountain watersheds informs us that the Chao Phraya river basin, in Northern Thailand, showed no evidence of a significant increase in sediment yield during the period extending from the late 1950s to the mid-1980s. However, the Northern region of Thailand is very vulnerable to soil erosion, due to its undulating topography, steep slopes and high rainfall. Due to rapid economic development and population growth in the area, the forest-covered land in this northern region decreased from 68.54% in 1961 to 54.27% in 2004 [47]. The most vulnerable area is steeply sloping land, which is under cultivation (more than 35% of sloping land). In recent times, human encroachment on forest areas in the upper part of the study area and land use changes with respect to agriculture have become problematic [48].
5. Conclusions
This study presents an investigation of the factors affecting suspended sediment yield in the Ping, Wang, Yom and Nan river basins in Thailand, using principal component analysis. From the principal component analysis, six components of dominant factors influencing suspended sediment yield were identified. These components account for 86.7% of the total variance of all variables considered in the analysis. The dominant factors from each group were then taken as predictor variables in the subsequent multiple regression analysis to estimate suspended sediment yield and area-specific suspended sediment yield.
From the regression analysis, it was found that there are three factors that significantly affect suspended sediment yield. These factors are hierarchical anomaly density, basin area and forest area. On the other hand, there are five factors that significantly influence area-specific suspended sediment yield. These are basin slope, hierarchical anomaly density, main channel length, forest area and dry season rainfall. The regression models indicate better predictability of suspended sediment yield and area-specific sediment yield for basins with a drainage area of less than 1000 km².
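A regression of this kind can be sketched with ordinary least squares. The example below is illustrative only: it assumes a log-linear (power-law) model form, which this section does not state, and it uses synthetic data with made-up coefficients rather than the study's observations. Only the three predictor names are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 40  # number of synthetic "basins"

# Synthetic predictors with hypothetical ranges (not the study's data)
area = 10 ** rng.uniform(1.0, 4.0, n)               # basin area, km2
forest = rng.uniform(20.0, 90.0, n)                 # forest area, percent
anomaly_density = 10 ** rng.uniform(-1.0, 1.0, n)   # hierarchical anomaly density

# Assumed coefficients: intercept, area, forest area, anomaly density
true_coef = np.array([1.5, 0.9, -0.6, 0.4])
design = np.column_stack([
    np.ones(n),
    np.log10(area),
    np.log10(forest),
    np.log10(anomaly_density),
])
# Synthetic response in log10 space with a small amount of noise
log_yield = design @ true_coef + rng.normal(0.0, 0.005, n)

# Ordinary least-squares fit of the log-transformed model
coef, *_ = np.linalg.lstsq(design, log_yield, rcond=None)
predicted = design @ coef
ss_res = np.sum((log_yield - predicted) ** 2)
ss_tot = np.sum((log_yield - log_yield.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("fitted coefficients:", np.round(coef, 3))
print("R^2 in log space:", round(r_squared, 4))
```

With real data, predictions would be back-transformed from log space and the error of estimation assessed separately for each basin size class.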
A set of equations for predicting suspended sediment yield and area-specific suspended sediment yield for basin areas with different sizes within the error of estimation range was proposed. These equations may be used to estimate the expected sediment yield in ungauged basins in the planning and design of water and land development and conservation projects in the northern part of Thailand with easily determined dominant input variables. However, it should be noted that the error of estimation for suspended sediment is relatively high, which is partially due to uncertainties in the sediment sampling/measurement (especially during high discharges or flood events) and in developing the sediment-discharge rating curve equations. Additionally, these models were developed for the estimation of average annual suspended sediment in ungauged basins. Therefore, the application of the models for sediment yield on short time periods, such as event-based estimation, is not recommended, due to the hysteresis effect in sediment rating curves.
Since this is a data-driven approach, limited data availability may restrict its applicability. The proposed equations may be tested further in other basins with similar hydrological and geomorphological conditions, provided that adequate data is available. It should be noted that the proposed models were developed under natural conditions, without water regulating structures; hence, their use in basins with such infrastructure may not be warranted. In addition, the models were developed with a relatively short period of data; hence, it is suggested that they be updated with a longer period of data and more stations, as well as for variable climate and land use conditions. This can be done by using variable land use data (percent of forest area and agricultural area) and climate characteristics (annual rainfall, dry and wet season rainfall, precipitation concentration index) and then regenerating a new set of regression models.