# Application of Set Pair Analysis-Based Similarity Forecast Model and Wavelet Denoising for Runoff Forecasting

## Abstract


## 1. Introduction

## 2. SPA-SF Model

#### 2.1. SPA Principles

Suppose that $B_i$ represents a runoff set and that another runoff set is expressed by the set $B_{n+1}$; then $B_i$ and $B_{n+1}$ constitute a set pair. Typically, the set pair in set pair analysis is expressed as $H(B_i, B_{n+1})$, which means that $B_i$ and $B_{n+1}$ form a pair. To determine the characteristics of the set pair, identity, difference and oppositional analyses can be performed using the same, different and reverse connection degrees [20]. Constructing connection degrees and computing connection coefficients, both of which are based on the set pair, is critical for SPA.

#### 2.2. SPA-SF Model and Its Application to Forecast Runoff

Consider a runoff time series $x_i$ ($i = 1, 2, \ldots, n$) in which $x_i$ depends on $x_{i-1}, x_{i-2}, \ldots, x_{i-m+1}, x_{i-m}$, the $m$ previous adjacent historical values of runoff (i.e., the most recent runoff values at the same location for which the forecast is required). Define the set $B_i = (x_i, x_{i+1}, \ldots, x_{i+m-1})$ ($i = 1, 2, \ldots, n-m$); $x_{m+i}$ is then the subsequent runoff value of set $B_i$, as shown in Table 1 [20]. The elements of set $B_i$ are regarded as impact factors (i.e., there are $m$ impact factors), and $x_{m+i}$ is regarded as the predicted value or dependent variable. The subsequent value, $x_{n+1}$, can be predicted from the relationship between the current set $B_{n+1} = (x_{n-m+1}, x_{n-m+2}, \ldots, x_{n-1}, x_n)$ and the historical sets $B_i$, using SPA-SF [20].

**Table 1.** The sets constituted from hydrological time series [20].

Sets | Elements in sets | | | | | Subsequent values |
---|---|---|---|---|---|---|
$B_1$ | $x_1$ | $x_2$ | $x_3$ | ... | $x_m$ | $x_{m+1}$ |
$B_2$ | $x_2$ | $x_3$ | $x_4$ | ... | $x_{m+1}$ | $x_{m+2}$ |
$B_3$ | $x_3$ | $x_4$ | $x_5$ | ... | $x_{m+2}$ | $x_{m+3}$ |
... | ... | ... | ... | ... | ... | ... |
$B_{n-m}$ | $x_{n-m}$ | $x_{n-m+1}$ | $x_{n-m+2}$ | ... | $x_{n-1}$ | $x_n$ |
$B_{n+1}$ | $x_{n-m+1}$ | $x_{n-m+2}$ | $x_{n-m+3}$ | ... | $x_n$ | $x_{n+1}$ |
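The layout of Table 1 is a sliding window over the series. The following minimal sketch shows one way to build the historical sets, their subsequent values and the current set; the function name `build_sets` and the zero-based list indexing are illustrative assumptions, not part of the original method description.

```python
from typing import List, Sequence, Tuple


def build_sets(series: Sequence[float], m: int) -> Tuple[List[tuple], List[float], tuple]:
    """Return (historical sets B_1..B_{n-m}, their subsequent values, current set B_{n+1})."""
    n = len(series)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < len(series)")
    # B_i = (x_i, ..., x_{i+m-1}); its subsequent value is x_{i+m}
    history = [tuple(series[i:i + m]) for i in range(n - m)]
    subsequent = [series[i + m] for i in range(n - m)]
    # B_{n+1} = (x_{n-m+1}, ..., x_n); its subsequent value x_{n+1} is unknown
    current = tuple(series[n - m:])
    return history, subsequent, current


history, subsequent, current = build_sets([1, 2, 3, 4, 5, 6, 7], m=3)
print(history)     # [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
print(subsequent)  # [4, 5, 6, 7]
print(current)     # (5, 6, 7)
```

With $n = 7$ and $m = 3$ this yields $n - m = 4$ historical sets, matching the row count implied by Table 1.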

- (1) The appropriate value of $m$ is selected through the analysis.
- (2) Process the elements of set $B_i$ to obtain their symbolic rank according to a classification criterion. Here, the mean deviation is used to classify the elements: calculate the average $\mu_j$ and the average absolute deviation $d_j$ ($j = 1, 2, \ldots, m$) of each impact factor (i.e., the element in the same position of every set). The elements of set $B_i$ are then assigned to Classes I, II and III according to the intervals $(0, \mu_j - 0.5d_j)$, $(\mu_j - 0.5d_j, \mu_j + 0.5d_j)$ and $(\mu_j + 0.5d_j, \infty)$, respectively.
- (3) Construct the current set $B_{n+1}$ and quantify it according to the same classification standard. Construct the set pair $H(B_{n+1}, B_i)$ and count the number of identical symbols (i.e., identity), the number of symbols that differ by one class (i.e., differences), such as Class II vs. I or Class II vs. III, and the number of symbols that differ by two classes (i.e., compositionality), such as Class III vs. I. Calculate the connection degree for each set pair. The connection degree $\mu_{B_{n+1} \sim B_i}$, which describes the relationship between $B_{n+1}$ and $B_i$, is defined by Equation (1) [20].
- (4) When the values of $I$ and $J$ are reasonably chosen, Equation (1) becomes a value called the connection coefficient, denoted by $\mu'_{B_{n+1} \sim B_i}$. In this study, the values of $I$ and $J$ were chosen as 0.5 and −1 [20], respectively. Thus, the connection coefficient $\mu'_{B_{n+1} \sim B_i}$ of the set pair $H(B_{n+1}, B_i)$ can be obtained.
- (5) Determine the $K$ historical samples that are most similar to $B_{n+1}$, i.e., those with the largest connection coefficients $\mu'_{B_{n+1} \sim B_i}$. The value of $K$ can be determined empirically or from the connection coefficients that exceed a certain threshold, with $K \le n^{0.5}$. Typically, the choice of $K$ depends on the specific circumstances of the study. In this study, a suitable value of $K$ was chosen based on the largest values of the connection coefficients. The relative weights of the $K$ historical samples are determined from the relative membership degree, $\upsilon_{n+1,i}$, corresponding to the connection coefficients. The prediction of $x_{n+1}$ is the weighted average of the subsequent values of the $K$ historical samples, as given by Equation (2) [20].

**Figure 1.** The flowchart for the forecasting processes of the set pair analysis-based similarity forecast (SPA-SF) model.
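Steps (1) to (5) can be sketched end to end as follows. This is a minimal illustration under stated assumptions rather than the author's implementation: the class bounds are computed from the historical sets only, values on a bound fall into Class II, the connection coefficient is taken as identity + I × differences + J × compositionality (each expressed as a fraction of $m$), and the weights are the selected coefficients normalised to sum to one, with negative coefficients clipped to zero. All function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def class_bounds(column: Sequence[float]) -> Tuple[float, float]:
    """Bounds (mu_j - 0.5 d_j, mu_j + 0.5 d_j) for one impact factor."""
    mu = mean(column)
    d = mean(abs(v - mu) for v in column)  # average absolute deviation
    return mu - 0.5 * d, mu + 0.5 * d


def to_class(value: float, bounds: Tuple[float, float]) -> int:
    """Map a value to Class I (1), II (2) or III (3); ties go to Class II (assumption)."""
    low, high = bounds
    if value < low:
        return 1
    return 2 if value <= high else 3


def connection_coefficient(a: Sequence[int], b: Sequence[int],
                           I: float = 0.5, J: float = -1.0) -> float:
    """Identity + I * differences + J * compositionality, as fractions of m."""
    m = len(a)
    same = sum(1 for x, y in zip(a, b) if x == y)
    one_off = sum(1 for x, y in zip(a, b) if abs(x - y) == 1)
    two_off = m - same - one_off
    return (same + I * one_off + J * two_off) / m


def spa_sf_forecast(series: List[float], m: int, k: int) -> float:
    """Forecast x_{n+1} as a weighted average over the k most similar historical sets."""
    n = len(series)
    if not (1 <= m < n and 1 <= k <= n - m):
        raise ValueError("need 1 <= m < n and 1 <= k <= n - m")
    history = [series[i:i + m] for i in range(n - m)]
    subsequent = [series[i + m] for i in range(n - m)]
    current = series[n - m:]
    bounds = [class_bounds([h[j] for h in history]) for j in range(m)]
    cur_cls = [to_class(v, bounds[j]) for j, v in enumerate(current)]
    coeffs = [
        connection_coefficient(cur_cls, [to_class(v, bounds[j]) for j, v in enumerate(h)])
        for h in history
    ]
    top = sorted(range(len(history)), key=lambda i: coeffs[i], reverse=True)[:k]
    clipped = [max(coeffs[i], 0.0) for i in top]
    total = sum(clipped)
    # Fall back to equal weights when no selected coefficient is positive.
    weights = [c / total for c in clipped] if total > 0 else [1.0 / k] * k
    return sum(w * subsequent[i] for w, i in zip(weights, top))


# A strictly periodic toy series: the pattern (5, 9) is always followed by 1.
print(spa_sf_forecast([1, 5, 9] * 4, m=2, k=2))
```

The toy series is only a sanity check that the most similar historical sets dominate the forecast; real runoff data would need the choice of $m$ and $K$ discussed in steps (1) and (5).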

## 3. Wavelet Denoising

#### 3.1. Wavelet Transforms and Daubechies Wavelet Coefficients

Two filters, the decomposition low-pass filter **H** and the decomposition high-pass filter **G**, are needed to compute discrete wavelet transforms. This work applied the four-coefficient Daubechies wavelet (DAUB4) filters proposed by Daubechies. DAUB4 has four coefficients: $c_0$, $c_1$, $c_2$ and $c_3$. A decomposition matrix, **F**, built from these coefficients is applied to the hydrological data vector, **X**, to obtain the transformed vector **FX** [23].

Let {$c_0$, $c_1$, $c_2$, $c_3$} be the smoothing filter **H**. The odd elements of the **FX** output are obtained by convolving **H** with **X**, which gives an approximation of the original data, **S**. The filter {$c_3$, $-c_2$, $c_1$, $-c_0$}, denoted as **G**, is not a smoothing filter because of its negative values. The even elements of the **FX** output are obtained by convolving **G** with **X**, which gives **W**, the detailed signal of the original data [23].

The inverse matrix, **F′**, reconstructs the original data from the transformed vector [23], such that **F′**(**FX**) = **X**.

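The values of the coefficients {$c_0$, $c_1$, $c_2$, $c_3$} estimated by Daubechies are as follows [23]:

$$c_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad c_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad c_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad c_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}$$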
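The action of **F** and **F′** can be illustrated with a one-level DAUB4 transform. The sketch below assumes the standard closed-form DAUB4 coefficients and a periodic wrap-around at the end of the data vector (the boundary treatment is not stated above); the function names are hypothetical. Under these assumptions **F** is orthogonal, so **F′** is simply its transpose.

```python
import math
from typing import List, Sequence, Tuple

_SQ3 = math.sqrt(3.0)
_DEN = 4.0 * math.sqrt(2.0)
C = [(1 + _SQ3) / _DEN, (3 + _SQ3) / _DEN, (3 - _SQ3) / _DEN, (1 - _SQ3) / _DEN]
H = C                                # smoothing filter {c0, c1, c2, c3}
G = [C[3], -C[2], C[1], -C[0]]       # detail filter {c3, -c2, c1, -c0}


def daub4_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One decomposition level: return approximation S and detailed signal W."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("length must be even and at least 4")
    s, w = [], []
    for k in range(0, n, 2):
        window = [x[(k + j) % n] for j in range(4)]  # periodic wrap-around (assumption)
        s.append(sum(h * v for h, v in zip(H, window)))
        w.append(sum(g * v for g, v in zip(G, window)))
    return s, w


def daub4_inverse(s: Sequence[float], w: Sequence[float]) -> List[float]:
    """Apply the transpose of F, which inverts the orthogonal forward step."""
    n = 2 * len(s)
    x = [0.0] * n
    for i, (si, wi) in enumerate(zip(s, w)):
        for j in range(4):
            x[(2 * i + j) % n] += H[j] * si + G[j] * wi
    return x


data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, detail = daub4_forward(data)
restored = daub4_inverse(approx, detail)
print(max(abs(a - b) for a, b in zip(data, restored)))  # reconstruction error near machine precision
```

A constant input produces a zero detailed signal, which is one quick way to see why **G** picks out fluctuations while **H** smooths.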

#### 3.2. Wavelet Denoising Procedure

- (1) Choose an appropriate wavelet function and number of resolution levels, $M$. The original one-dimensional time series is decomposed by the wavelet transform into an approximation at resolution level $M$ and detailed signals at the various resolution levels.
- (2) At each resolution level, the detailed signals $w_j(t)$ ($j = 1, 2, \ldots, M$) whose absolute values fall below a fixed threshold are set to zero, where the subscript $j$ denotes the $j$-th resolution level. Detailed signals whose absolute values exceed the threshold are replaced by the difference between their values and the threshold [24]. Equation (6) gives the threshold quantification used to obtain the processed detailed signals at each resolution level during wavelet denoising. Threshold quantification is usually not applied to the approximation.
- (3) Wavelet reconstruction derives the denoised time series from the approximation at resolution level $M$ and the processed detailed signals, $\hat{w}_j(t)$, at all resolution levels.
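The thresholding in step (2) can be sketched as below. The sign-preserving form used here is the usual soft-thresholding rule; that Equation (6) takes exactly this form is an assumption based on the description in step (2), and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(detail: Sequence[float], threshold: float) -> List[float]:
    """Zero small coefficients and shrink the rest toward zero by `threshold`."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    processed = []
    for value in detail:
        excess = abs(value) - threshold
        # Keep the sign of the original coefficient when it survives the threshold.
        processed.append(math.copysign(excess, value) if excess > 0 else 0.0)
    return processed


print(soft_threshold([3.0, -0.5, -2.0, 1.0], threshold=1.0))  # [2.0, 0.0, -1.0, 0.0]
```

Only the detailed signals would be passed through this function; the approximation at level $M$ is left unchanged before reconstruction.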

## 4. Application and Analysis

#### 4.1. The Proposed Method Combining SPA-SF and Wavelet Denoising

#### 4.2. Determining Thresholds

In the threshold $T$ of Equation (7), $n_j$ is the length of the detailed signals at resolution level $j$, and $\sigma$ is the noise strength of the detailed signals at that level. The noise strength is obtained from Equation (8) [24], in which $w_j(t)$ are the detailed signals at resolution level $j$. $T$ therefore varies with the length of the detailed signals and with the noise strength. The error model assumed for the choice of the error estimator in Equations (7) and (8) is Gaussian white noise with zero mean [24]. Note that wavelet denoising applied to a short time series may not provide a sufficiently robust estimate of the noise strength in Equation (8). In practice, a long time series should be used.
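As a rough illustration of how such a threshold is computed per resolution level, the sketch below assumes the commonly used universal threshold $T = \sigma\sqrt{2\ln n_j}$ with the median-based noise estimate $\sigma = \mathrm{median}(|w_j(t)|)/0.6745$, both associated with the wavelet shrinkage literature cited as [24]. Whether Equations (7) and (8) have exactly these forms is an assumption, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Sequence


def noise_strength(detail: Sequence[float]) -> float:
    """Median-based noise estimate for one resolution level (assumed form of Equation (8))."""
    return median(abs(v) for v in detail) / 0.6745


def universal_threshold(detail: Sequence[float]) -> float:
    """Threshold T = sigma * sqrt(2 ln n_j) for one resolution level (assumed form of Equation (7))."""
    n_j = len(detail)
    if n_j < 2:
        raise ValueError("need at least two detailed-signal values")
    return noise_strength(detail) * math.sqrt(2.0 * math.log(n_j))


detail = [0.6745, -0.6745, 1.349, -1.349, 0.0]
print(noise_strength(detail), universal_threshold(detail))
```

Because the estimate rests on a median of only $n_j$ values, it becomes unstable for short series, which is consistent with the caution above about using a long time series.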

#### 4.3. Collection and Collation of Research Data

Name of stations | Name of rivers | Name of river basins | Drainage basin area (km²) |
---|---|---|---|
Lijia | Lijia River | Lijia River | 148.62 |
Yanping | Kano River | Beinan River | 476.16 |
Tateyama | Fung Ping River | Hsiukuluan River | 249.40 |
Mizuho Bridge | Hsiukuluan River | Hsiukuluan River | 1538.81 |
Renshou Bridge | Mugua River | Hualien River | 425.92 |
Hualien Bridge | Hualien River | Hualien River | 1506.00 |

**Figure 2.** The location of the six investigated hydrometric sites in eastern Taiwan. The boundaries of the drainage basins are also shown [25]. (**a**) Lijia Station; (**b**) Yanping Station; (**c**) Tateyama and Mizuho Bridge Stations; (**d**) Renshou Bridge and Hualien Bridge Stations.

#### 4.4. Comparison of Models

In the root mean square error (RMSE), $Q_{est}$ and $Q_{obs}$ represent the estimated and observed runoff, respectively. A small RMSE value indicates that the simulation results are close to the observed data and therefore highly accurate.
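The RMSE calculation can be checked against the Lijia values reported in Table 4. The sketch below assumes the usual definition, the square root of the mean squared difference between $Q_{est}$ and $Q_{obs}$ over the forecast years; the function name is hypothetical.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between estimated and observed runoff."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Lijia station, 2003-2008, values taken from Table 4
observed = [3385, 2587, 3448, 4954, 3757, 2357]
spa_sf = [3051, 2774, 2890, 3880, 2943, 3124]
spa_sfw = [2192, 2920, 2789, 4492, 4029, 2151]

print(round(rmse(spa_sf, observed)))   # 691
print(round(rmse(spa_sfw, observed)))  # 619
```

Both results agree with the RMSE row for Lijia in Table 4, which suggests that the reported errors are averaged over the six forecast years.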

## 5. Results and Discussion

**Figure 3.** Original annual runoff time series and annual runoff time series denoised by wavelet denoising for the six stations. (**a**) Lijia Station; (**b**) Yanping Station; (**c**) Tateyama Station; (**d**) Mizuho Bridge Station; (**e**) Renshou Bridge Station; (**f**) Hualien Bridge Station.

The annual runoff $x_i$ ($i = 1, 2, \ldots, n$) was taken to depend on the five previous historical values. Using a fixed classification criterion, the elements in each set $B_i$ were processed to obtain their symbolic ranks. The current set $B_{n+1}$ was constructed and quantified according to the same classification standard. In addition, the values of $I$ and $J$ were selected as 0.5 and −1 [20], respectively. The connection coefficient, $\mu'_{B_{n+1} \sim B_i}$, of the set pair $H(B_{n+1}, B_i)$ was then obtained. A suitable value of $K$ was chosen based on the number of the largest connection coefficients. Equation (2) was used to forecast the annual runoff time series. For example, the calculation process for the Hualien Bridge station in 2003 is shown in Table 3. The largest connection coefficient was 0.8, shared by the two sets $B_3$ and $B_7$, so $K$ was 2. These data indicate that the values of the largest connection coefficient and $K$ were reasonable, which suggests that the SPA-SF model can be applied to forecast annual runoff time series.

**Table 3.** The calculation process of forecasting annual runoff for the Hualien Bridge station in 2003.

Sets | $x_i$ | $x_{i+1}$ | $x_{i+2}$ | $x_{i+3}$ | $x_{i+4}$ | Subsequent values $x_{i+5}$ | Identity | Differences | Compositionality | Connection coefficients |
---|---|---|---|---|---|---|---|---|---|---|
$B_1$ | III | III | III | I | II | 37,977 | 0.4 | 0.2 | 0.4 | 0.1 |
$B_2$ | III | III | I | II | II | 34,293 | 0.2 | 0.4 | 0.4 | 0.0 |
$B_3$ | III | I | II | III | II | 22,702 | 0.6 | 0.4 | 0.0 | 0.8 |
$B_4$ | I | II | III | II | I | 39,307 | 0.4 | 0.4 | 0.2 | 0.4 |
$B_5$ | II | II | II | I | II | 34,200 | 0.0 | 0.8 | 0.2 | 0.2 |
$B_6$ | II | II | I | I | II | 27,551 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_7$ | II | I | III | II | I | 36,767 | 0.6 | 0.4 | 0.0 | 0.8 |
$B_8$ | I | II | II | I | II | 46,204 | 0.0 | 0.6 | 0.4 | −0.1 |
$B_9$ | II | II | I | II | III | 30,304 | 0.0 | 0.6 | 0.4 | −0.1 |
$B_{10}$ | II | I | II | III | II | 34,553 | 0.4 | 0.6 | 0.0 | 0.7 |
$B_{11}$ | I | II | III | II | II | 28,901 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_{12}$ | II | III | II | II | I | 31,007 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_{13}$ | III | II | II | I | II | 49,942 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_{14}$ | II | II | I | II | III | 31,395 | 0.0 | 0.6 | 0.4 | −0.1 |
$B_{15}$ | II | I | II | III | II | 28,567 | 0.4 | 0.6 | 0.0 | 0.7 |
$B_{16}$ | I | II | III | II | I | 14,963 | 0.4 | 0.4 | 0.2 | 0.4 |
$B_{17}$ | II | III | II | I | I | 34,098 | 0.2 | 0.4 | 0.4 | 0.0 |
$B_{18}$ | III | II | I | I | II | 29,349 | 0.2 | 0.4 | 0.4 | 0.0 |
$B_{19}$ | II | I | I | II | I | 32,706 | 0.4 | 0.4 | 0.2 | 0.4 |
$B_{20}$ | I | I | II | II | II | 24,671 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_{21}$ | I | II | II | II | I | 46,275 | 0.2 | 0.6 | 0.2 | 0.3 |
$B_{22}$ | II | I | II | I | III | 28,012 | 0.2 | 0.4 | 0.4 | 0.0 |
$B_{23}$ | I | II | I | III | I | 52,494 | 0.4 | 0.2 | 0.4 | 0.1 |
$B_{24}$ | II | I | III | I | III | 53,543 | 0.4 | 0.2 | 0.4 | 0.1 |
$B_{25}$ | I | III | I | III | III | 16,919 | 0.2 | 0.0 | 0.8 | −0.6 |
$B_{26}$ | III | I | III | III | I | – | – | – | – | – |
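The entries of Table 3 can be reproduced from the class symbols alone. The sketch below recomputes the connection coefficients of a few rows against the current set $B_{26}$ and then forms the 2003 forecast from the two most similar sets. It assumes that the coefficient is identity + I × differences + J × compositionality with I = 0.5 and J = −1, and that the weights are the selected coefficients normalised to sum to one; the variable and function names are hypothetical.

```python
from typing import Sequence

RANK = {"I": 1, "II": 2, "III": 3}


def connection_coefficient(current: Sequence[str], historical: Sequence[str],
                           i_coef: float = 0.5, j_coef: float = -1.0) -> float:
    """Connection coefficient of the set pair from class symbols."""
    gaps = [abs(RANK[c] - RANK[h]) for c, h in zip(current, historical)]
    identity, differences, compositionality = gaps.count(0), gaps.count(1), gaps.count(2)
    return (identity + i_coef * differences + j_coef * compositionality) / len(gaps)


current = "III I III III I".split()  # B26, the current set
# A subset of Table 3 rows: (class symbols, subsequent value)
history = {
    "B1": ("III III III I II", 37977),
    "B3": ("III I II III II", 22702),
    "B7": ("II I III II I", 36767),
    "B10": ("II I II III II", 34553),
    "B25": ("I III I III III", 16919),
}

coeffs = {name: connection_coefficient(current, symbols.split())
          for name, (symbols, _) in history.items()}
top = sorted(coeffs, key=coeffs.get, reverse=True)[:2]  # K = 2
total = sum(coeffs[name] for name in top)
forecast = sum(coeffs[name] / total * history[name][1] for name in top)

print(coeffs)    # B1: 0.1, B3: 0.8, B7: 0.8, B10: 0.7, B25: -0.6
print(top)       # ['B3', 'B7']
print(forecast)  # 29734.5
```

The forecast of 29734.5 is consistent with the SPA-SF value of 29,734 listed for Hualien Bridge in 2003 in Table 4. Because $B_3$ and $B_7$ have equal coefficients, each receives a weight of 0.5, so this particular check does not distinguish between alternative weighting schemes.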

**Table 4.** The forecasting results and RMSE obtained using the SPA-SF model and the SPA-SF model with wavelet denoising (SPA-SFW) for the six stations. All runoff values are in m³/s.

Year | Lijia observed | Lijia SPA-SF | Lijia SPA-SFW | Yanping observed | Yanping SPA-SF | Yanping SPA-SFW |
---|---|---|---|---|---|---|
2003 | 3,385 | 3,051 | 2,192 | 9,928 | 8,174 | 9,844 |
2004 | 2,587 | 2,774 | 2,920 | 10,674 | 14,195 | 9,477 |
2005 | 3,448 | 2,890 | 2,789 | 25,568 | 7,635 | 11,234 |
2006 | 4,954 | 3,880 | 4,492 | 24,889 | 13,261 | 13,037 |
2007 | 3,757 | 2,943 | 4,029 | 6,589 | 8,547 | 8,487 |
2008 | 2,357 | 3,124 | 2,151 | 7,510 | 4,147 | 9,844 |
RMSE | – | 691 | 619 | – | 9,013 | 7,707 |

Year | Tateyama observed | Tateyama SPA-SF | Tateyama SPA-SFW | Mizuho Bridge observed | Mizuho Bridge SPA-SF | Mizuho Bridge SPA-SFW |
---|---|---|---|---|---|---|
2003 | 9,730 | 6,403 | 6,725 | 30,552 | 34,723 | 34,158 |
2004 | 7,455 | 6,901 | 7,117 | 34,808 | 38,753 | 32,795 |
2005 | 11,458 | 8,154 | 7,047 | 40,134 | 38,702 | 37,280 |
2006 | 8,664 | 8,207 | 8,765 | 39,159 | 32,655 | 45,371 |
2007 | 18,772 | 5,639 | 7,411 | 51,320 | 44,931 | 52,029 |
2008 | 7,257 | 8,207 | 6,939 | 34,877 | 51,015 | 42,048 |
RMSE | – | 5,713 | 5,128 | – | 7,943 | 4,391 |

Year | Renshou Bridge observed | Renshou Bridge SPA-SF | Renshou Bridge SPA-SFW | Hualien Bridge observed | Hualien Bridge SPA-SF | Hualien Bridge SPA-SFW |
---|---|---|---|---|---|---|
2003 | 4,624 | 7,428 | 4,225 | 31,077 | 29,734 | 31,007 |
2004 | 6,783 | 4,851 | 5,628 | 38,921 | 37,552 | 36,088 |
2005 | 17,327 | 6,896 | 7,834 | 68,780 | 34,293 | 34,293 |
2006 | 12,114 | 5,135 | 5,737 | 57,138 | 28,371 | 29,611 |
2007 | 16,804 | 11,979 | 11,979 | 68,247 | 28,901 | 34,707 |
2008 | 9,657 | 6,939 | 7,166 | 47,589 | 30,673 | 41,024 |
RMSE | – | 5,770 | 5,192 | – | 25,347 | 22,815 |

**Table 5.** The RMSE of the forecasting results obtained using SPA-SF, SPA-SFW and autoregression (AR) for each station, with the average over the six stations.

Name of stations | SPA-SF | SPA-SFW | AR |
---|---|---|---|
Lijia | 691 | 619 | 1,061 |
Yanping | 9,013 | 7,707 | 9,141 |
Tateyama | 5,713 | 5,128 | 4,674 |
Mizuho Bridge | 7,943 | 4,391 | 9,030 |
Renshou Bridge | 5,770 | 5,192 | 6,036 |
Hualien Bridge | 25,347 | 22,815 | 21,146 |
Average | 9,080 | 7,642 | 8,515 |

Name of stations | m = 1 | m = 2 | m = 3 | m = 4 | m = 5 |
---|---|---|---|---|---|
Lijia | 0.31699 | 0.25847 | 0.25648 | −0.03414 | 0.13865 |
Yanping | 0.22231 | 0.26237 | 0.15771 | −0.14061 | 0.43697 |
Tateyama | 0.08698 | 0.30733 | 0.47995 | 0.11940 | −0.00456 |
Mizuho Bridge | 0.31934 | 0.13720 | 0.39881 | 0.18277 | −0.05631 |
Renshou Bridge | 0.21177 | 0.09370 | 0.22852 | 0.28814 | 0.14405 |
Hualien Bridge | 0.06826 | 0.21847 | 0.36847 | 0.20636 | 0.12162 |

## 6. Conclusions

## Conflicts of Interest

## References

1. Wu, C.L.; Chau, K.W. Data-driven models for monthly streamflow time series prediction. Eng. Appl. Artif. Intell. **2010**, 23, 1350–1367.
2. Montanari, A. Large sample behaviors of the generalized likelihood uncertainty estimation (GLUE) in assessing the uncertainty of rainfall-runoff simulations. Water Resour. Res. **2005**, 41.
3. Wang, W.S.; Li, Y.Q. Set pair analysis of water resources and hydrology. South-to-North Water Divers. Water Sci. Technol. **2011**, 9, 27–32. (in Chinese)
4. Jin, J.L.; Wei, Y.M.; Wang, W.S. Set pair analysis based on similarity forecast model of water resources. J. Hydroelectr. Eng. **2009**, 28, 72–77. (in Chinese)
5. Zhao, K. Set Pair Analysis and Its Preliminary Applications; Zhejiang Science and Technology Press: Hangzhou, China, 2000. (in Chinese)
6. Jiang, Y.L.; Xu, C.F.; Yao, Y.; Zhao, K.Q. Systems information in set pair analysis and its applications. In Proceedings of the 2004 International Conference on Machine Learning and Cybernetics, Shanghai, China, 26–29 August 2004.
7. Li, H.M.; Fu, Q. Highest floodlevel prediction based on set pair analysis similarity forecast model. J. Heilongjiang Hydraul. Eng. **2010**, 3, 30–32. (in Chinese)
8. Gao, J.; Sheng, Z. Method and application of set pair analysis classified prediction. J. Syst. Eng. **2002**, 7, 458–462. (in Chinese)
9. Nourani, V.; Rahimi, A.Y.; Nejad, F.H. Conjunction of ANN and threshold based wavelet de-noising approach for forecasting suspended sediment load. Int. J. Manag. Inf. Technol. **2013**, 3, 9–26.
10. Nejad, F.H.; Nourani, V. Elevation of wavelet denoising performance via an ANN-based streamflow forecasting model. Int. J. Comput. Sci. Manag. Res. **2012**, 1, 764–770.
11. Chang, S.G.; Yu, B.; Vetterli, M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. **2000**, 9, 1532–1546.
12. Lim, Y.H.; Lye, L.M. Denoising of streamflow series affected by tides using wavelet methods. In Proceedings of the Annual Conference of the Canadian Society for Civil Engineering, Montréal, Québec, Canada, 5–8 June 2002.
13. Liu, G.H.; Qian, J.L.; Wang, J.J. Study of flood forecast based on wavelet soft-threshold technology and ANN. J. Hydroelectr. Eng. **2004**, 23, 5–10. (in Chinese)
14. Wang, X.J.; Fei, S.M. Application of wavelet analysis to hydrological runoff simulation. Water Resour. Power **2007**, 25, 1–3. (in Chinese)
15. Wang, H.R.; Ye, L.T.; Liu, C.M.; Yang, C.; Liu, P. Problems in wavelet analysis of hydrologic series and some suggestions on improvement. Progr. Nat. Sci. **2007**, 17, 80–86.
16. Cui, L.; Chi, D.; Wu, S. Forecasting precipitation using gray topological with wavelet denoising. J. Liaoning Tech. Univ. **2009**, 28, 853–856. (in Chinese)
17. Wang, W.H.; Hu, S.X.; Li, Y.Q. Wavelet transform method for synthetic generation of daily streamflow. Water Resour. Manag. **2011**, 25, 41–57.
18. Chou, C.M. A threshold based wavelet denoising method for hydrological data modelling. Water Resour. Manag. **2011**, 25, 1809–1830.
19. Li, A.Y.; Lu, J.H. Annual runoff forecasting based on wavelet de-noising SPA model. Adv. Mater. Res. **2012**, 356, 2301–2306.
20. Wang, W.S.; Zhang, X.; Jin, J.L.; Ding, J.; Wang, H. Methods of Uncertainty Analysis for Hydrology; Science Press: Beijing, China, 2011. (in Chinese)
21. Wang, W.S.; Ding, J.; Lee, Y.Q. Hydrological Wavelet Analysis; Chemical Industry Press: Beijing, China, 2005. (in Chinese)
22. Mallat, S.G. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. **1989**, 11, 674–693.
23. Lin, Z.S.; Zheng, Z.W. Diagnosis Technology of Climate Using Wavelet; Meteorology Press: Beijing, China, 1999. (in Chinese)
24. Donoho, D.L.; Johnstone, J.M. Ideal spatial adaptation by wavelet shrinkage. Biometrika **1994**, 81, 425–455.
25. Taiwan River Restoration Network. Available online: http://trrn.wra.gov.tw/trrn/controlRiver/index.do (accessed on 17 November 2013).

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Chou, C.-M. Application of Set Pair Analysis-Based Similarity Forecast Model and Wavelet Denoising for Runoff Forecasting. *Water* **2014**, *6*, 912-928.
https://doi.org/10.3390/w6040912
