1. Introduction
Long-term forecasting of watershed runoff and extreme runoff is constrained by insufficient understanding of their physical mechanisms. The selection of forecasting factors is limited by the limitations of temporal and spatial scales and lacks systematic scientific evidence, rendering the reliability and accuracy of forecasting results inadequate for engineering practice. To improve long-term forecasting accuracy, systematic screening of suitable forecasting factors and the improvement of research methods are needed. Zheng Jinling [
1] selected atmospheric circulation indices, the El Niño–Southern Oscillation (ENSO) index and other atmospheric and oceanic physical factors, as well as 1431 forecasting factors such as the distribution of high-altitude pressure fields, precipitation, and temperature, and used the stepwise regression method to achieve a long-term forecasting qualification rate of over 80% for reservoirs in the upper reaches of the Songhua River in Northeast China. Huang Chiyuan [
2] selected meteorological factors and established a multiple-regression equation based on statistical methods to provide the long-term forecasting of the May runoff of the Tianshan River in Xinjiang, providing support for agricultural irrigation. Jin Juliang [
3] established a neural network model for annual runoff forecasting and tested the Artificial Neural Network (ANN) model with 23 years of the measured annual runoff data from Yamadu Station on the Ili River in Xinjiang, along with the corresponding measured data from four previous influencing factors, achieving ideal results. Zhang Lanying [
4] established an optimized support vector machine monthly runoff forecasting model based on the grid search algorithm using five forecasting factors: the monthly average rainfall, the monthly average precipitation, the monthly average relative humidity, the average maximum temperature, and the average minimum temperature. The model was applied to eight sub-basins of the Shiyang River Basin and showed good applicability. The above methods have the advantages of good intelligence and high efficiency. However, there are many factors selected, and most of them are at the same scale, making it difficult to analyze and mine the information of factors and runoff effectively. At the same time, the prediction period is limited.
To study high and low extreme runoff, factors at multiple scales are currently used for forecasting. Li Wenlong [5,6] searched for factors affecting the 2010 flood in the relative motion of the sun, Earth, and moon, while Peng Zhuoyue [7] comprehensively considered the astronomical indicators that characterize the trajectories and relationships of the sun, Earth, and moon, and selected years similar to the forecast year for runoff forecasting. Jin Chaohui [
8] conducted in-depth research on the relationship between the lunar declination angle and the annual water in reservoirs. Liu Qingren [
9] focused on solar activity and used mathematical statistical analysis methods to analyze hydrological impact characteristics of sunspots and El Niño events on the Songhua River Basin and the basic laws of water and drought disasters. Li Wenlong [
10] used the year table of El Niño and La Niña events under the Pacific Decadal Oscillation (PDO) cold and warm phases, combined with the actual inflow of water in the Fengman Reservoir basin, to obtain the inflow pattern of Fengman Reservoir in the year of El Niño occurrence under the PDO cold and warm phases, and successfully predicted that the inflow of Fengman Reservoir in 2015 would be an extremely dry year. Li Yongkang [
11] analyzed the factor characteristics of the atmospheric circulation in the early stages of the major flood (drought) and the extremely large flood (drought) years as well as the characteristics of circulation in the early stage of the major drought and flood years to select the better forecasting factors. The variation pattern of extreme runoff has a good correspondence with the characteristic values of the astronomical and global scale factors. The forecasting factors have a long prediction period and good forecasting effect. However, some forecasting factors lack quantitative processing and are mostly based on a single factor to predict the inflow of the watershed. The forecasting methods such as similar year comparison method and correlation analysis method can only qualitatively forecast the incoming water in the basin and cannot conduct quantitative research. Moreover, they cannot further explore the long-term forecasting rules of forecasting factors and runoff, resulting in low forecasting accuracy. At the same time, due to the complex influencing factors of the long-term forecasting and the relatively low forecasting accuracy, it is necessary to combine the characteristics of the watershed for data fusion and revise the forecast results. It is necessary to use a new method to fuse the forecasting factors and further improve the accuracy of the method prediction.
Future atmospheric circulation anomalies can be predicted from the movements and positions of the sun, Earth, and moon, which in turn allows runoff and extreme runoff levels to be predicted [12,13]. El Niño and La Niña phenomena can cause energy and water vapor anomalies, which lead to precipitation and runoff anomalies and to extreme floods and droughts in certain regions of the Earth. Meteorological factors at the watershed scale correlate well with the abundance or scarcity of runoff, so they predict runoff abundance or scarcity accurately, but they identify extreme runoff poorly. Real-time hydro-meteorological observations can be used to correct the forecasts derived from astronomical and global-scale factors.
For medium- and long-term runoff forecasting, the current weather forecasts cannot provide reliable precipitation data for the corresponding forecast period. The machine learning-based, data-driven models represented by neural networks [14,15], extreme learning machines [16,17,18], and support vector machine regression models [19,20] can explore and express the complex data relationship between the predicted object and observed data, and have strong nonlinear fitting ability, thus attracting much attention.
This article takes the annual inflow of the Fengman Reservoir Basin from 1933 to 2017 as the research object. Based on sample classification, factors of astronomical scale and global circulation scale are quantitatively processed. BP neural network [
21,
22] and support vector machine [
23,
24] are used for fitting and training forecasts, and the forecast results are qualitatively and quantitatively analyzed. Due to the influence of model parameters on the training and prediction results of the neural networks and support vector machines, in order to improve the prediction performance of the model, this article uses the GWO algorithm to optimize the parameters of the neural networks and support vector machines. Thus, GWO-BP and GWO-SVM models are constructed. The results show that the multi-factor fusion GWO-BP algorithm has a good forecasting effect and high accuracy. This study employs a GWO-BP neural network to systematically investigate extreme runoff patterns through sample analysis and key influencing factor identification, ultimately developing a predictive model for extreme runoff events.
2. Forecast Factor Analysis
The astronomical scale factor mainly refers to the relative motion of the sun, moon, and Earth, with sunspot numbers, lunar declination angle, and lunar calendar dates of the 24 solar terms as the main forecasting factors. The global circulation scale factors are selected based on the El Niño and La Niña events during the PDO warm and cold phases. The watershed-scale factors are selected based on the rainfall from September to October of the previous year and the temperature anomaly in April of the current year, which can reflect the climate characteristics and changes in the watershed. Based on the astronomical, global atmospheric, and ocean current cycle forecasts, the model can be corrected and rolled forward for forecasting.
2.1. Relative Sunspot Numbers
The sun is the closest star to Earth, and solar activity exerts the most significant influence on terrestrial hydrological processes. Variations in solar activity not only affect atmospheric circulation but also induce changes in atmospheric circulation patterns and various hydrological elements. A larger sunspot relative number (SRN) indicates stronger solar activity, while a smaller SRN indicates weaker solar activity.
2.2. Lunar Declination Angle
The angle between the plane of the moon's apparent orbit (the "white path") and the Earth's equatorial plane on the celestial sphere is called the lunar declination angle (also known as the white-red declination). The crustal volume change caused by the maximum declination angle of the moon is 2.3 times that caused by the minimum. The tidal cycle changes and crustal deformation caused by lunar movement are the main causes of earthquakes and heavy precipitation (or drought). The correlation analysis of the lunar declination angle trajectory is conducted as follows: the annual maximum values of the lunar declination angle are connected to form a trajectory, and the trajectory segment around the forecast year is compared with historical segments from similar years to infer the runoff pattern.
2.3. 24 Solar Terms and Lunar Phases
The gravitational forces of the sun and moon have a significant impact on the solid, liquid, and gas on Earth, causing phenomena such as land tides, tides, and atmospheric tides. The relative motion between the sun and Earth is characterized by the 24 solar terms. Each of the 24 solar terms corresponds to a certain position reached by the Earth for every 15° movement on the ecliptic. The Gregorian dates of the 24 solar terms are roughly the same every year, and the lunar calendar time corresponding to the solar terms is introduced for forecasting. Comprehensively, it can reflect the impact of relative motion of the sun, moon, and Earth on the abundance or scarcity of runoff.
2.4. Ocean Current Circulation
Weather and climate processes are direct products of atmospheric circulation, and various physical factors affect climate by influencing the distribution and variation of atmospheric circulation. The seasonal variation of large-scale (hemispheric or even global) and long-term atmospheric circulation is crucial to hydroclimatic change. El Niño and La Niña events during the warm and cold phases of La Madre (the Pacific Decadal Oscillation, PDO) can cause droughts or floods in some areas.
2.5. Watershed-Scale Factors—Agricultural Proverbs
The selection of basin forecasting factors should be based on the characteristics of the basin and factors that have a significant impact and decisive effect on the inflow of the basin should be chosen. Many factors are difficult to effectively select, and meteorological factors with good corresponding relationships can be selected based on agricultural proverbs. This article takes the annual inflow of the Fengman Reservoir Basin as the research object and collects and analyzes proverbs about the high and low runoff in the basin, such as “there is more rainfall after autumn, and the mountain slopes will be flooded in the following year” and “spring is cold and summer is flooded”. This study identifies prior-year September–October precipitation and current-year April temperature anomalies as key predictors for constructing an extreme runoff forecasting model.
3. GWO-BP Forecasting Model
3.1. BP Neural Network Model
The BP neural network is a multi-layer feedforward neural network proposed by Rumelhart [
25,
26,
27,
28], characterized by forward signal propagation and backward error propagation. The excitation function of its neurons is an S-shaped function. When the number of input nodes is n and the number of output nodes is m, the BP neural network expresses the function mapping relationship from n independent variables to m dependent variables. The topological structure of the BP neural network includes the input layer, hidden layer, and output layer, which can learn and store the complex correspondence relationship, and uses error backpropagation and steepest gradient information to find the parameter combination that minimizes network error. Due to the non-convex nature of the error function, the optimization speed of the neural network is slow. To improve the training speed and prediction accuracy of the neural network, intelligent optimization methods can be used to optimize the inter-layer parameters of the network and determine the optimal parameter values. In this paper, the wolf pack optimization algorithm is used as the optimization method to determine the initial values of the parameters of the neural network.
3.2. GWO Optimization Model
Wolves have a strict hierarchical system and specific hunting patterns, with the leader wolf responsible for decision making, the scout wolf responsible for finding prey, and the fierce wolf responsible for besieging prey. The allocation of prey in wolf packs is determined based on individual strength, with the strong receiving more and the weak receiving less or none, thus achieving a survival of the fittest mechanism in wolf packs. The wolf pack algorithm simulates the wandering, summoning, and besieging of wolf pack hunting, following the wolf pack’s “winner is king” first wolf generation mechanism and the “survival of the strong” wolf pack renewal mechanism. The wolf pack algorithm is based on mechanisms such as prey initialization, leader wolf generation, wolf detection, wandering behavior, and fierce wolf running behavior, siege behavior, and wolf pack updating behavior for optimization, and has the advantages of fast optimization speed and good optimization effect [
29].
3.3. GWO-BP Model
Once the structure of the BP neural network is determined, its weights and thresholds are obtained through training. BP neural network prediction involves assigning the weights and thresholds from the optimal individual to the BP neural network, enabling the network to generate predicted outputs via training and simulation. If the error remains large after multiple optimization iterations, the error acts as the objective function, and forecasting factors serve as inputs for secondary fitting. The final forecast result is obtained by superimposing the primary forecast result on the secondary forecast error. Optimization via the Gray Wolf Optimizer (GWO) algorithm effectively improves parameter estimation accuracy, reduces network learning time, and enhances fitting prediction performance.
The steps for optimizing the BP neural network using the GWO model are as follows.
Step 1: Initialize the neural network structure. A three-layer neural network is used as the network structure. The number of input layer neurons is n1, the number of hidden layer neurons is n2, and the number of output layer neurons is 1, where n2 = 2 × n1 + 1. The initial weights and thresholds are randomly initialized in the range of [−0.5, 0.5].
Step 2: Initialize the wolf pack individuals. Determine the upper and lower limits (ub, lb) of the artificial wolf pack with a population size of N and a maximum iteration count of Imax. Determine the dimension of the wolf pack vector as n based on the number of parameters, and determine the head wolf, scout wolf, and fierce wolf pack individuals based on their fitness.
Step 3: Individual evolution of wolf packs. Explore the wolf’s wandering behavior, the fierce wolf’s running behavior, the wolf pack’s besieging behavior, and the wolf pack’s updating behavior. The position of the wolf pack individuals is continuously updated, and the optimal leader wolf individual is constantly updated. The wolf pack individuals evolve towards the optimal direction, that is, the direction of optimal fitness.
Step 4: Iteration termination test. If the maximum number of iterations or the termination condition is met, the iteration is stopped. Otherwise, step 3 is repeated continuously until the optimal parameter vector individual is found.
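Steps 1 to 4 can be illustrated with a minimal sketch, written here in Python rather than the MATLAB used in the study. It is not the authors' implementation. It assumes the canonical Grey Wolf Optimizer update, in which every wolf moves toward the three best individuals (alpha, beta, delta), instead of the leader, scout, and fierce wolf roles described above. The function names, the search bounds of [-2, 2], the toy data, and the default population and iteration settings are all hypothetical.

```python
import math
import random


def make_network(n1):
    """Three-layer net with n2 = 2*n1 + 1 hidden nodes; returns (n2, parameter count)."""
    n2 = 2 * n1 + 1
    dim = n1 * n2 + n2 + n2 + 1  # hidden weights, hidden biases, output weights, output bias
    return n2, dim


def forward(params, x, n1, n2):
    """Forward pass: sigmoid hidden layer followed by one linear output node."""
    out = params[-1]  # output bias
    for j in range(n2):
        z = params[n1 * n2 + j]  # bias (threshold) of hidden node j
        for i in range(n1):
            z += params[j * n1 + i] * x[i]
        hidden = 1.0 / (1.0 + math.exp(-z))
        out += params[n1 * n2 + n2 + j] * hidden
    return out


def mse(params, data, n1, n2):
    """Fitness of one wolf: mean squared error of the network on the samples."""
    return sum((forward(params, x, n1, n2) - y) ** 2 for x, y in data) / len(data)


def gwo(objective, dim, lb, ub, n_wolves=20, max_iter=100, seed=0):
    """Canonical GWO (assumed variant): wolves move toward the three best wolves."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_fit, history = None, float("inf"), []
    for it in range(max_iter + 1):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta
        fit = objective(leaders[0])
        if fit < best_fit:
            best, best_fit = leaders[0], fit
        history.append(best_fit)
        if it == max_iter:  # Step 4: stop after the final evaluation
            break
        a = 2.0 * (1.0 - it / max_iter)  # decreases linearly from 2 to 0
        for w in wolves:  # Step 3: position update
            for d in range(dim):
                pos = 0.0
                for lead in leaders:
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    pos += lead[d] - A * abs(C * lead[d] - w[d])
                w[d] = min(ub, max(lb, pos / 3.0))  # keep inside (lb, ub)
    return best, best_fit, history


# Toy usage with two made-up input factors and a made-up target.
n1 = 2
n2, dim = make_network(n1)
data = [((a / 4.0, b / 4.0), 0.3 * a / 4.0 + 0.5 * b / 4.0)
        for a in range(5) for b in range(5)]
best, best_fit, history = gwo(lambda p: mse(p, data, n1, n2), dim, lb=-2.0, ub=2.0)
print(f"best MSE: initial {history[0]:.4f}, final {best_fit:.4f}")
```

In a full GWO-BP workflow the returned vector would be assigned to the network as initial weights and thresholds before backpropagation training; that training step is omitted here.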
4. Study Area
The study area is the watershed controlled by the Fengman Reservoir, which is located on the Second Songhua River. The basin above Fengman lies in the central-southern region of Jilin Province, between longitude 125°18′–125°45′ E and latitude 41°40′–44°05′ N. The location and water system map of the Fengman Reservoir watershed are shown in
Figure 1. There are two sources in the control basin of the Fengman Reservoir, namely Toudaojiang and Erdaojiang. The main source of the Erdao River originates from Tianchi in the Changbai Mountain. After the confluence of the Toudao and Erdao Rivers, it is called the Second Songhua River. The larger tributaries that are accepted along the main river section above the Fengman of the basin include the Lafa River and the Huifa River. The entire watershed is distributed in the northwest slope and Changbai Mountain area, with elevations gradually increasing from northwest to southeast, rising upstream from the Fengman 200 m~500 m to over 2000 m at the source of the river.
The forest vegetation coverage in the control basin of the Baishan Reservoir is relatively high, with secondary forest vegetation mainly distributed in mountainous areas, and some primary forests distributed in the river source area. Therefore, the soil erosion in this basin is relatively light, and the sediment content in the river is very low. The rivers with relatively high sediment content are the Lafa River and Huifa River basins, which are affected by frequent human activities due to the large amount of arable land in the area.
The area where the Fengman Control Watershed is located is on the edge of the East Asian continent, with a climate belonging to the cold temperate continental monsoon climate. The main characteristics of spring in this region are strong winds and dry weather, the main characteristics of summer are heavy rainfall and humid heat, the main characteristics of autumn are sunny and warm weather, and the main characteristics of winter are cold and long.
The precipitation in the watershed controlled by Fengman varies greatly from north to south, with the southern region receiving about 900 mm of precipitation and the central region dropping below 700 mm. Moreover, the distribution of precipitation is uneven throughout the year. The average annual precipitation of Fengman controlled basin from June to September is 520.4 mm, accounting for about 75% of the total annual precipitation in the region. Heavy rain and even rainstorms often occur in the flood season. The precipitation in this basin varies greatly between different years, with the highest annual precipitation being 1021.4 mm (1951) and the lowest being 565.0 mm (1958). The maximum precipitation is 1.8 times the minimum. Major floods in the basin generally occur from May to early September but are most common in late July and August.
5. Model Application
5.1. Sample Classification
To forecast the annual average inflow of the basin more effectively, the measured sample series and the investigated flood series are classified into seven grades according to the fractal classification method of definite samples. They are the Super-High-Flow Year (SHFY), High-Flow Year (HFY), Partial High-Flow Year (PHFY), Normal-Flow Year (NFY), Partial Low-Flow Year (PLFY), Low-Flow Year (LFY), and Super-Low-Flow Year (SLFY).
The method of sample classification is the proportional factor method. The multi-year runoff series $\{q_i\}$ ($i = 1, 2, \dots, n$) is known, and the multi-year average runoff $\bar{q}$ is multiplied by the proportion factors corresponding to the sample grading to obtain the grade limits:
$$Q_j = K_j \, \bar{q}, \qquad j = 1, 2, \dots, 6$$
The proportion factors follow the grading method of the Fengman Reservoir Basin, which was established for inflow forecasting through long-term analysis combined with production practice. The intervals of the scaling factors are as follows: SHFY: >1.4, HFY: 1.2~1.4, PHFY: 1.1~1.2, NFY: 0.9~1.1, PLFY: 0.8~0.9, LFY: 0.6~0.8, SLFY: 0~0.6.
In the formula above, $Q_j$ denotes the limit values separating the seven grades; $K_1 = 1.4$ is the ratio factor at the limit between the SHFY and the HFY; $K_2 = 1.2$ is the ratio factor at the limit between the HFY and the PHFY; $K_3 = 1.1$ is the ratio factor at the limit between the PHFY and the NFY; $K_4 = 0.9$ is the ratio factor at the limit between the NFY and the PLFY; $K_5 = 0.8$ is the ratio factor at the limit between the PLFY and the LFY; and $K_6 = 0.6$ is the ratio factor at the limit between the LFY and the SLFY.
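The grading by scaling-factor intervals can be sketched in a few lines of Python. This is an illustration only: the function name, the example flows, and the mean of 400 are hypothetical, and assigning a ratio that falls exactly on a limit to the lower grade is an assumption, since the text only fixes "SHFY: >1.4".

```python
# Lower limits of each grade as multiples of the multi-year mean, highest first.
GRADE_LIMITS = [
    (1.4, "SHFY"),
    (1.2, "HFY"),
    (1.1, "PHFY"),
    (0.9, "NFY"),
    (0.8, "PLFY"),
    (0.6, "LFY"),
]


def classify_year(flow, mean_flow):
    """Return the grade of one year from the ratio of its flow to the mean flow."""
    ratio = flow / mean_flow
    for lower, grade in GRADE_LIMITS:
        if ratio > lower:  # assumption: a ratio exactly on a limit falls to the lower grade
            return grade
    return "SLFY"


# Hypothetical flows with a hypothetical multi-year mean of 400.
flows = [600.0, 520.0, 460.0, 400.0, 340.0, 280.0, 200.0]
grades = [classify_year(q, 400.0) for q in flows]
print(grades)
```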
5.2. Processing and Analysis of Forecasting Factors
The astronomical-scale factors, global circulation-scale factors, and watershed-scale factors are quantified as shown in
Table 1.
5.2.1. Sunspot Relative Number
The value of the relative number of sunspots can be obtained as the factor based on the website of the Space Environment Forecasting Center (
http://www.sepc.ac.cn/dailyReport_chn.php (accessed on 30 June 2018)).
The single and double cycles of the relative sunspot numbers: With a single cycle, the value is 1, and with a double cycle, the value is 0. The periods 1933–1943 (17th cycle), 1954–1963 (19th cycle), 1976–1985 (21st cycle), and 1996–2007 (23rd cycle) are set as single cycles, and are assigned a value of 1. The periods 1944–1953 (18th cycle), 1964–1975 (20th cycle), 1986–1995 (22nd cycle), and 2008–2018 (24th cycle) are double cycles, and are assigned a value of 0.
The phase of the relative number of sunspots: The year at the highest point of each cycle curve of the annual sunspot relative number is called the peak year of sunspot activity (M year). The year at the lowest point of the curve is called the valley year of sunspot activity (m year). Years with an SRN larger than 40 are recorded as the M phase, and those with an SRN less than 40 are recorded as the m phase. The phase of the year before the valley is denoted as m − 1, and the phase of the year after the valley is denoted as m + 1. By analogy, the sunspot phases are m, m + 1, m + 2, M − 1, M, M + 1, M + 2, M + 3, M + 4, m − 3, m − 2, and m − 1, which are assigned the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12, respectively.
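The two sunspot encodings can be written down as lookup tables. The sketch below takes the cycle periods and the phase values from the text; the names `SOLAR_CYCLES`, `PHASE_VALUES`, and `cycle_factor` are hypothetical. It treats odd-numbered cycles as single cycles, which matches the list given (cycles 17, 19, 21, and 23).

```python
# (first year, last year, cycle number) as listed in the text.
SOLAR_CYCLES = [
    (1933, 1943, 17), (1944, 1953, 18), (1954, 1963, 19), (1964, 1975, 20),
    (1976, 1985, 21), (1986, 1995, 22), (1996, 2007, 23), (2008, 2018, 24),
]

# Phase labels in the order given in the text, mapped to the values 1..12.
PHASE_LABELS = ["m", "m+1", "m+2", "M-1", "M", "M+1", "M+2", "M+3", "M+4",
                "m-3", "m-2", "m-1"]
PHASE_VALUES = {label: value for value, label in enumerate(PHASE_LABELS, start=1)}


def cycle_factor(year):
    """Return 1 for a single (odd-numbered) cycle and 0 for a double (even-numbered) one."""
    for start, end, number in SOLAR_CYCLES:
        if start <= year <= end:
            return 1 if number % 2 == 1 else 0
    raise ValueError(f"year {year} is outside the tabulated cycles")


print(cycle_factor(1940), cycle_factor(1950), PHASE_VALUES["M"])
```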
5.2.2. Lunar Apparent Declination
The value of the lunar declination angle: The minimum and maximum declination angles of the moon are 18.50 degrees and 28.50 degrees, with a motion period of 18.6 years. This cycle is similar to the Saros cycle of solar and lunar eclipses (18 years, 11 days, 8 h).
The phase of the lunar declination angle: The angle between the Earth’s equatorial plane and the apparent orbit of the moon is called the Lunar Apparent Declination (LAD, also known as the white-red declination). The tidal periodic variation and crustal deformation caused by the lunar movement are the main causes of heavy floods (or droughts), which can be used as a predictor of annual average inflow in the basin. The LAD varies periodically. The maximum annual maximum LAD in the first year is set as m1 and in the second year as m2, with the rest deduced by analogy to the minimum annual maximum LAD. The minimum annual maximum LAD in the first year is set as M1 and in the second year as M2, with the rest deduced by analogy to the maximum annual maximum LAD. The annual phase of the LAD is formed and set as a value from 0 to the final value. When the declination angle of the moon is descending, the sign is determined to be negative. When the lunar declination angle is ascending, the symbol is determined to be positive.
5.2.3. 24 Solar Terms and Lunar Calendar Dates
The 24 solar terms are the 24 time points that reflect the position of the Earth on the ecliptic. This position reflects the distance of the Earth from the sun and the angle at which the sun shines on the Earth. Since the solar calendar dates are fixed for several days each year to the 24 solar terms, the lunar calendar date can further distinguish the occurrence of different times and effectively reflect the relative position of the sun, Earth, and the moon. Different locations change the tidal forces and energy acting on Earth’s atmosphere and oceans, which will cause abnormal temperatures and atmospheric conditions, leading to abrupt changes in the atmospheric circulation system. The abrupt change in the atmospheric circulation system may lead to changes in the ridge line, the intensity of the subtropical high, and the area index of the subtropical high. The intensification of air convection leads to extreme events of drought and waterlogging in the region. The astronomical calendar issued by China’s Zijinshan Observatory shall prevail. Calibrating the time of 24 solar terms and lunar Calendar Dates (24STLCD) reflects the relative position of the sun and the moon to a certain extent, and the structure of the celestial motion and gravitational force to Earth.
The lunar calendar time of the 24 solar terms: The dates of each year’s 24 solar terms are treated as sample factors. The lunar calendar dates of the 24 solar terms are regarded as forecast factors calculated through the traditional Chinese calendar.
Leap year indicator: It is necessary to determine whether the year is a leap year or a regular year. A leap year is assigned a value of 1, and a regular year is assigned a value of 0.
The timing of the Beginning of Spring solar term relative to the Spring Festival: The factor is assigned a value of 1 if the Beginning of Spring occurs before the Spring Festival, and a value of 0 if it occurs after the Spring Festival.
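The Beginning of Spring indicator reduces to a date comparison once both dates are known. The sketch below is a minimal illustration with a hypothetical function name. The two example years use Gregorian dates as I understand them (Beginning of Spring on 4 February 2015 and 3 February 2017, Spring Festival on 19 February 2015 and 28 January 2017); in practice the dates should be taken from the astronomical calendar the text refers to.

```python
from datetime import date


def spring_start_factor(beginning_of_spring, spring_festival):
    """Return 1 if the Beginning of Spring falls before the Spring Festival, else 0."""
    return 1 if beginning_of_spring < spring_festival else 0


# Example dates; verify against the official astronomical calendar before use.
factor_2015 = spring_start_factor(date(2015, 2, 4), date(2015, 2, 19))
factor_2017 = spring_start_factor(date(2017, 2, 3), date(2017, 1, 28))
print(factor_2015, factor_2017)
```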
5.2.4. Ocean Global Atmospheric Circulation Factor
There are many factors affecting atmospheric circulation. In this manuscript, we choose the Ocean Global Atmospheric Circulation Factor (OGACF), which has a great influence on atmospheric circulation, such as the La Madre, El Niño, and La Niña phenomena.
La Madre’s warm and cold phases: The value of La Madre is −1 if it is a cold phase. If it is a warm phase, the value of La Madre is +1. The first cold phase cycle was from 1889 to 1924, and the first warm phase cycle was from 1925 to 1945. The second cold phase cycle was from 1946 to 1977, and the second warm phase cycle was from 1978 to 1999. The third cold phase cycle is from 2000 to 2035.
El Niño and La Niña phenomena: According to the website of the National Climate Center (
http://www.ncc-cma.net/cn/), the phenomenon for each year is determined as El Niño or La Niña. If it is an El Niño year, the value is +1. If it is a La Niña year, the value is −1. The notable El Niño years are 1891, 1898, 1925, 1939–1941, 1953, 1957–1958, 1965–1966, 1972–1976, 1982–1983, 1997–1998, and 2007.
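The two circulation factors can be encoded as follows. The phase periods and the El Niño years are copied from the text. The text does not list La Niña years, so they are passed in by the caller, and returning 0 for a year that is neither El Niño nor La Niña is an assumption. All names are hypothetical.

```python
# (first year, last year, value): -1 for a cold phase, +1 for a warm phase.
LA_MADRE_PHASES = [
    (1889, 1924, -1), (1925, 1945, 1), (1946, 1977, -1),
    (1978, 1999, 1), (2000, 2035, -1),
]

# Notable El Niño years as listed in the text.
EL_NINO_YEARS = {
    1891, 1898, 1925, 1939, 1940, 1941, 1953, 1957, 1958, 1965, 1966,
    1972, 1973, 1974, 1975, 1976, 1982, 1983, 1997, 1998, 2007,
}


def la_madre_value(year):
    """Return -1 for a cold-phase year and +1 for a warm-phase year."""
    for start, end, value in LA_MADRE_PHASES:
        if start <= year <= end:
            return value
    raise ValueError(f"year {year} is outside the tabulated phases")


def enso_value(year, la_nina_years=frozenset()):
    """Return +1 for El Niño, -1 for La Niña, and 0 otherwise (assumed neutral code)."""
    if year in EL_NINO_YEARS:
        return 1
    if year in la_nina_years:
        return -1
    return 0


print(la_madre_value(1933), la_madre_value(2015), enso_value(1998))
```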
5.2.5. Watershed Meteorological Factors
Watershed Meteorological Factors (WMFs) are the direct factors affecting inflow at the watershed scale. The WMFs include temperature, precipitation, evaporation, drought index, runoff, and so on. Guided by the proverbs “more rain after autumn, flooding hillsides in the coming year” and “if spring is cold, the summer is waterlogging”, a statistical analysis of the WMFs against the inflow was performed. It shows that, in the Fengman Reservoir Basin, the rainfall in September–October of the previous year and the temperature departure in April of the current year both affect the inflow of the current year.
(1) Precipitation in September–October of the previous year
The relationship between the precipitation in September–October of the previous year after autumn and the inflow in the Fengman Reservoir Basin is analyzed. It can be seen that if the precipitation in September–October of the previous year was more than 100 mm, the probability of the Fengman Reservoir Basin being a flood year was higher.
For the WMF to forecast the characteristics of the inflow in the forecasting year, it is necessary to know the precipitation in September–October of the previous year. The measured values can be taken as the factors to forecast the inflow. The meteorological information can be obtained from the hydrological and meteorological monitoring department for free.
(2) Temperature departure in April
The relationship between the temperature departure in April of that year and the annual inflow of the Fengman Reservoir Basin is analyzed. It can be seen that if the temperature departure in April of that year was less than 1 °C, the probability of the Fengman Reservoir Basin experiencing a flood year was higher.
For the WMF to forecast the characteristics of the inflow in the forecasting year, it is necessary to know the temperature departure in April of that year. The measured values can be taken as the factors to forecast the inflow. The meteorological information can be obtained from the hydrological and meteorological monitoring department for free.
5.3. Example of Model Application
The annual inflow of the Fengman Reservoir Basin {qi, i = 1, 2, …, 85} from 1933 to 2017 is selected as the sample sequence. The BP neural network forecasting model is trained with the first 80 years (1933–2012) to determine the model parameters. The model is tested with the annual average runoff and the factors of the next five years (2013–2017).
Based on the established forecasting model, the forecast values of annual water volume are calculated for 2013 to 2017. According to the Hydrological Information Forecasting Specification (GB/T 22482-2008), in the quantitative forecasting of medium- and long-term hydrological forecasting, the allowable error for water volume is 20% of the annual variation. When the forecast error is less than the allowable error, the forecast is considered qualified. Taking 20% of the annual variation in water volume from 1933 to 2017 gives an allowable error of 118.6 m³/s.
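The qualification check can be sketched as below. Two points are assumptions rather than statements from the text: the "annual variation" is taken as the difference between the largest and smallest observed annual values, and the class thresholds (A at 85% or more, B from 70%, C from 60%) reflect my reading of GB/T 22482-2008. The function names and the five sample values are hypothetical.

```python
def allowable_error(observed, fraction=0.2):
    """Allowable error as a fraction of the observed variation (assumed: max minus min)."""
    return fraction * (max(observed) - min(observed))


def qualification_rate(observed, forecast, tolerance):
    """Percentage of forecasts whose absolute error is below the allowable error."""
    passed = sum(1 for o, f in zip(observed, forecast) if abs(f - o) < tolerance)
    return 100.0 * passed / len(observed)


def accuracy_class(rate):
    """Map a qualification rate in percent to a class (thresholds are assumptions)."""
    if rate >= 85.0:
        return "A"
    if rate >= 70.0:
        return "B"
    if rate >= 60.0:
        return "C"
    return "unqualified"


# Made-up flows in m³/s, for illustration only.
observed = [400.0, 520.0, 310.0, 650.0, 480.0]
forecast = [430.0, 500.0, 400.0, 640.0, 470.0]
tolerance = allowable_error(observed)
rate = qualification_rate(observed, forecast, tolerance)
print(tolerance, rate, accuracy_class(rate))
```

In the study the allowable error is derived once from the full 1933–2017 series and then applied to both the training and the testing years; the toy example derives it from the five sample values only to stay self-contained.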
The inputs of the model are the forecast factors, including the logarithm of sunspot phases, lunar declination angle, 24 solar terms and lunar calendar dates, the warm and cold phases of La Madre, the rainfall in September and October of the previous year, and the average temperature in April. A comparison of the GWO-BP prediction results with those of the GWO-SVM model and with the single-scale factor predictions shows that the GWO-BP model has better prediction performance than the GWO-SVM model.
5.3.1. GWO-BP Model Prediction Results
MATLAB 2014 is used to program, calculate, and verify the results. The input layer has 28 nodes, the hidden layer is set to 28 nodes after trial calculation, and the output layer has 1 node. The population size N of the wolf pack optimization algorithm is 50, and the maximum iteration number Gmax is 500. The optimization algorithm tunes the parameters of the BP neural network, namely the network weights of the hidden layer and output layer and the neuron bias terms. A bias term represents the activation threshold of a neuron and adjusts its output baseline.
Figure 2 shows the comparison between the model’s training simulation values (1933–2012) and forecast values (2013–2017) and the annual average flow observation values in the Fengman Basin. The statistical analysis results of the errors related to the training and forecasting stages are shown in
Table 1.
According to the statistical results in
Table 1, the qualification rate in the training phase is 100.0%, classified as Class A; the testing phase qualification rate is 80%, with favorable forecasting performance, and is classified as Class B.
5.3.2. GWO-SVM Model Prediction Results
The input samples are the same as those of the GWO-BP model, and the LibSVM toolbox in MATLAB 2014 is used, with its parameters optimized by the GWO algorithm. After trial calculation verification, the support vector machine in this model is the ν-SVR, and the kernel function is the sigmoid function. The parameters optimized by the algorithm are the penalty coefficient and the kernel function parameter. The penalty coefficient balances the maximization of the margin against the tolerance for errors. For a Gaussian kernel, the kernel parameter controls the range of influence of a single sample on the decision boundary; for a polynomial kernel, it adjusts the complexity of the polynomial mapping. The Gaussian kernel function was used in this study. After calculation, the parameters c = 0.9977 and g = 8.5277 are obtained.
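For reference, the kernel forms named in this subsection can be written as LibSVM parameterizes them. Here $g$ is the kernel parameter reported above (LibSVM's gamma) and $c$ is the penalty coefficient (LibSVM's cost). The symbols $r$ and $d$ stand for LibSVM's coef0 and degree options, which the text does not report, so they appear only to complete the formulas.

```latex
\begin{aligned}
K_{\mathrm{Gaussian}}(\mathbf{u}, \mathbf{v}) &= \exp\!\left(-g \,\lVert \mathbf{u} - \mathbf{v} \rVert^{2}\right) \\
K_{\mathrm{polynomial}}(\mathbf{u}, \mathbf{v}) &= \left(g \,\mathbf{u}^{\top}\mathbf{v} + r\right)^{d} \\
K_{\mathrm{sigmoid}}(\mathbf{u}, \mathbf{v}) &= \tanh\!\left(g \,\mathbf{u}^{\top}\mathbf{v} + r\right)
\end{aligned}
```

With a Gaussian kernel, a larger $g$ narrows the region over which a single training sample influences the fitted function, which is the effect described above.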
Based on the GWO-SVM forecasting model, the annual runoff forecast values are calculated for 2013–2017.
Figure 3 shows the comparison of the model’s training simulation values (1933–2012) and forecast values (2013–2017) with the annual average flow observation values in the Fengman Basin. The statistical analysis results of the errors related to the training and forecasting stages are shown in
Table 2. According to the statistical analysis results in
Table 2, it can be seen that the pass rate during the training phase is 100.0%, which is classified as Class A. The pass rate during the testing phase is 60%, and the predicted results are classified as Class C.
7. Conclusions
This study focuses on long-term forecasting of watershed inflow and analyzes the limitations of commonly used forecasting factors attributed to their short prediction periods. Current ultra-long-term forecasting lacks digital processing of astronomical-scale and global circulation-scale factors, and analyses of watershed-scale factor mechanisms are insufficient and unreliable. These limitations hinder comprehensive modeling of multi-scale factors. Through screening and quantifying astronomical-scale and global ocean current cycle-scale factors, observations of sunspot phase pairs, lunar declination angles, and their related trajectory characteristics were introduced as forecasting factors. The 24 solar terms and lunar calendar dates were used to quantitatively characterize the movements of celestial bodies (e.g., the sun, moon, and Earth). The characteristics of ocean currents influenced by the PDO (Pacific Decadal Oscillation), El Niño, and La Niña phenomena were digitized. Based on watershed agricultural proverbs, meteorological and hydrological forecasting factors affecting watershed inflow were selected, effectively addressing shortcomings such as short forecasting periods, difficulties in acquiring factors, and poor factor correlation. Using the constructed forecasting factors as inputs, the wolf pack optimization algorithm (GWO) was applied to optimize the BP neural network for predicting the annual inflow of the Fengman Basin from 2013 to 2017. Comparisons with the GWO-SVM model showed that the GWO-BP model achieved high qualitative and quantitative forecasting accuracy, exhibiting favorable effectiveness. The GWO-BP model is applicable to long-term runoff forecasting in river basins.