Time Series Analysis of Evaporation Duct Height over South China Sea: A Stochastic Modeling Approach

: The evaporation duct could signiﬁcantly affect the work status of maritime microwave communication systems in the South China Sea. Therefore, the exact forecasting of the evaporation duct is vital for the normal operation of the systems. This study presents a stochastic modeling approach to predict the future trends of the evaporation duct over the South China Sea. The autoregressive integrated moving average (ARIMA) model has been used for modeling the monthly evaporation duct height estimated from the Climate Forecast System Reanalysis dataset released by the National Centers for Environment Prediction. The long-term evaporation duct height data were collected for a period of 10 years from 2008 to 2017. The analysis of correlation function reveals the existence of seasonality in the time series. Therefore, a seasonal ARIMA model with the form as ARIMA(0,0,1) × (0,1,2) 12 is proposed by ﬁtting the monthly data optimally. The ﬁtted model is further used to forecast the evaporation duct variation for the year 2018 at 95% level of conﬁdence, and high-accuracy results are obtained. Our study demonstrates the feasibility of the proposed stochastic modeling technique to predict the future variations of the evaporation duct over South China Sea. Author Contributions: Conceptualization, Q.Z.; methodology, Q.Z.; software, Q.Z. and F.H.; vali-dation, F.H.; formal analysis, F.H.; investigation, F.H.; data curation, Q.Z.; writing—original draft preparation, F.H.; writing—review and editing, Q.Z.; project administration, Q.Z.;


Introduction
High-speed trans-horizon wireless marine communication has always been a challenge due to the contradiction between the transmission distance and channel capacity [1,2]. The wirless communication via evaporation duct provides a possible way to satisfy the need for long distance and high capacity at the same time [3]. As a special atmospheric layer resulting from the sea surface evaporation, the evaporation duct could trap radio waves within the evaporation duct layer. This would enable the evaporation duct to act as a wireless communication channel. Therefore, studying the evaporation duct properties, while estimating their effect on a reliable over-the-sea wireless communication link, is essential [4,5].
Evaporation duct height (EDH), the difference in height between the top of trapping layer and the sea surface, is the most widely-used parameter to quantify the trapping ability. This parameter is also used to estimate the path loss when designing the wireless communication capacity. Regarding this, to analyze the historical characteristics of EDH and to carry out relevant future predictions could help the engineers gain further insight into the nature and the variation of the evaporation duct channel. Deep investigations have been conducted to explore the varying and distribution characteristics of EDH, and the results confirm the existence of the inherent principle in the spatial-temporal variation [6][7][8][9]. Given this, the attempt to take advantage of the spatial-temporal characteristics and use the time series analysis technique for analyzing the EDH variability attains significance.
The time series analysis technique is commonly used to identify the persistence pattern in a sequence and to predict the future behavior of the variable based on its past pattern [10].
Various approaches are applied to simulate the patterns. One of most popular approach is the Box-Jenkins modeling technique, which also known as autoregressive integrated moving average (ARIMA) model [11]. The ARIMA model is a linear combination of autoregressive and moving average components [12]. In the autoregressive component, the current observation is represented as a weighted average of previous observations, plus a random term that is uncorrelated as a weighted average of previous observations. In the moving average component, the current observation is represented as a weighted average of uncorrelated random terms of other observations. The application of ARIMA model has been successfully extended to various fields of climatology such as wind speed [13,14], precipitation [15], air temperature [16], sea surface temperature [17], and soil moisture [18]. Studies carried by Yang Shaobo also successfully demonstrate the ability of ARIMA model to predict the EDH variation in the following day [19].
The spatio-temporal variations of EDH over South China Sea (SCS) have been extensively evaluated using evaporation duct model and reanalysis data. Climatology of EDH variations in the South China Sea was studied and reported in [6,20]. Wireless communication link characteristics, such as the path loss distribution and fading characteristics, are then detailed calculated based on the climatology [21]. In this paper, more quantitative information on the long-term trends of EDH over SCS is to be revealed by using the time series models. The current work suggests an ARIMA model based on a stochastic process that uses the characteristics of the historical data and forecasts its future values.
The remainder of this paper is organized as follows: In Section 2, the data and method adopted in our research is described in details. In Section 3, we focus on the time series modeling of EDH based on the ARIMA model. In Section 4, the conclusion is given.

Study Area
As the largest semi-enclosed marginal sea in the northwest Pacific Ocean, SCS ( Figure 1) is a hot spot for evaporation duct research. Numerous works have been published to establish a long-range radio link over South China Sea based on the evaporation duct channel [7,[22][23][24]. However, without knowing the evaporation duct height, it is difficult to calculate the statistics of fading variations of the channel to estimate the fade margin to establish reliable high data rate wireless communication via evaporation duct. In this view, analysis of EDH characteristic over SCS is of particular concern for the present study.

Monthly EDH Data
The monthly mean of EDH data from 2008 to 2017 has been retrieved from air-sea environmental data using evaporation duct model [25]. Based on the Monin-Obukhov similarity and Liu-Katsaros-Businger theories, the evaporation duct model estimates the EDH from local air-sea environmental conditions, including air temperature, sea surface temperature, specific humidity, surface wind speed, and pressure [26,27]. More details about the basic theory, model algorithm and parameterization schemes can be found in the authors' recently published studies [6][7][8]. In our study, the data of air-sea variables are obtained from the reanalysis products of the National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) dataset [28]. A series of studies has been conducted on the comparison and evaluation of the EDH data obtained from this hybrid method with the measuring results from different sea areas, including the South China Sea [6,20,29]. The studies have demonstrated satisfactory agreement between the retrieved EDH and practically measured EDH data, thus rendering the methodology suitable for the EDH monitoring and research. As a result, the application of stochastic time series modeling on the long-term monthly mean EDH can be performed with great accuracy.

ARIMA Modeling Approach
ARIMA model consists of three parts, namely, AutoRegression, Integration and Moving Average. The autoregression part with lag order p refers to the current values X t as a function of past time series values X t−1 , X t−2 , . . ., X t−p and can be written as where t denotes the time indices and the terms φ 1 , φ 2 , . . ., φ p are autoregressive coefficients that relate X t to X t−1 , X t−2 , . . ., X t−p . The moving average part of the model with lag q can be expressed in following form where ε t−1 , ε t−2 , . . ., ε t−p are the independent white noise sequence with zero mean and variance σ 2 . θ 1 , θ 2 , . . ., θ q are the moving average coefficients relating X t to ε t−1 , ε t−2 , . . ., ε t−p . If the time series is not stationary, differencing should be applied to stationalize the time series and this should correspond to the integration in the model with lag d. Hence, the ARIMA(p,d,q) can be mathematically written as where B is the backshift operator, i.e., BX t = X t−1 . The terms φ(B) and θ(B) are the autoregressive and moving average operators, respectively, and can be written as When the time series contains seasonality, a multiplicative seasonal ARIMA denoted as ARIMA(p,d,q) × (P,D,Q) s is applied, where P,D,Q represent seasonal autoregression, seasonal differencing and seasonal moving average orders, respectively, and s denotes the seasonal cycle. The seasonality in the time series can be determined by the correlation function examing the lags significantly different from zero. A seasonal ARIMA(p,d,q) × (P,D,Q) built for the time series is defined as follows: where B is the backshift or lag operator, s is the seasonal lag (in month for present study), ε t represents error variables, d and D are non-seasonal and seasonal difference, φ and Φ are the non-seasonal and seasonal autoregressive parameters, and θ and Θ are the non-seasonal and seasonal moving parameters, respectively. The methodology of ARIMA modeling approach contains iterative procedures to find the best-fit model of the internal structures of the time series through three steps, namely, identification, estimation and diagnostic checking. To identify a suitable model, the stationarity of the time series should be checked firstly. For non-stationarity time series, it is necessary to employ situable differencing process to achieve stationarity. The autoregression and moving average parts of the differenced time series are determined by the graphs of autocorrelation function (ACF) and partial autocorrelation function (PACF). Considering the ACF and PACF correlograms, several ARIMA models are identified as the alternatives. The best-fit model is mainly determined by normalized Bayesian information criterion (BIC) value. BIC is a comprehensive parameter to measure the overall fitness of a model as following equation where l max is the model's maximum likelihood and n and k are the sample size and the number of estimated parameters, respectively. If the preferred model is obtained, the next step is to estimate the model parameters. Once the model parameters are specified, ACF and PACF correlograms of the residuals, and the difference between the actual and fitted data, are plotted for diagnostic check. For a good ARIMA model, the residuals should be uncorrelated and distributed normally with a zero mean. At last, the best-fit model is used to forecast the future values of current time series.

Model Verification and Comparison Criteria
Several different consistency measures, including the root of the mean square error (RMSE) and mean absolute percentage error (MAPE), are adopted to evaluate the performances of proposed model. These two quantified error measures are given by where P i and X i are the predicted and observed values, respectively. N is the number of data.  Figure 2b shows the long-term monthly average EDH variation over SCS from January 2008 to December 2017. There is a common trend of EDH observed in the seasons of each year. Generally, the overall level of EDH from January to May is lower than that of June to December. The peak values appear in June and July, and the nadirs are observed in the months of March and April. This variation is mainly attributed to the seasonal monsoon over the South China Sea [9]. The SCS is governed by the Southwest monsoon and northeast monsoon in summer and winter, respectively. Our previous research has pointed out that the evaporation duct height over South China Sea shows strong relevant to the sea surface wind and SST. The seasonality of the wind and SST patterns is charged for the seasonal variability in the EDH over the SCS. Further, the seasonality in the time series can be adpoted in the statistical model to determine the future trends of EDH.

ARIMA Modeling
The establishment of ARIMA model contains several iterative procedures, including model identification, model parameters estimation and model diagnostic checking. The monthly EDH data from 2008 to 2017 are used to train and obtain the optimal ARIMA model.
The model identification is mainly based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) correlograms of the monthly EDH time series. Autocorrelation describes the correlations between each pair of points in the time series with a specific lag. Partial autocorrelation is used to identify the correlations between time series points with different lags. If the ACF and PACF sequence rapidly converges to zero with increasing lag values, the sequence is considered stationary. Figure 3 shows the ACF and PACF correlograms of the EDH time series with different differential options. The x-axis is the time lag indicating the separation steps, and the y-axis is the correlation value between −1 and +1. The blue shade represents the 95% confidence intervals. Table 1 summarizes the BIC, RMSE and MAPE values of five best models among the different combinations of ARIMA model parameters to the data.    Figure 4 gives the ACF and PACF between the residuals at various lags of the five ARIMA models. Globally, the lags of all cases satisfy the 95% confidence criteria. Considering the autocorrelation coefficients of the residuals, all 24 autocorrelation coefficients are not statistically significant. This indicates that the residuals are similar to white noise and the models are a good fit. Individually, the model ARIMA(0,0,1) × (0,1,2) 12 has the minimum BIC value and smallest RMSE and MAPE. As a result, model ARIMA(0,0,1) × (0,1,2) 12 is considered as the best fit model and could be further applied in the following forecasts and analysis.    Once the model parameters are estimated, the model diagnostic checking is then performed, and a forecast of EDH for the next 12 months till December 2018 has been made at a 95% level of confidence. Figure 5 shows that the selected ARIMA model could simulate the seasonality well from the periodicity in the historical EDH series. The major feature of ARIMA model is that this modeling technique is rather simple and only fewer parameters are needed to make a good prediction. However, the random large fluctuations existing in the data that led to these extreme values may not be well represented by the model. Further evaluation of the model is conducted to compare the forecast values with the observed data till December 2018. Table 2 shows the residuals between the predicted and observed data are within ±0.5 for the whole period, except for the months of January and August. This may result from the linear characteristics of the ARIMA model [10,12]. Recently, a new software package, namely, Jumps Upon Spectrum and Trend (JUST), has been proposed to detect potential jumps in non-stationary time series and predict future trends [30,31]. This method is based on the spectral analysis and may also solve the non-stationary EDH time series forecasting. This will be part of our next research project. Generally, the prediction accuracy of the selected ARIMA(0,0,1) × (0,1,2) 12 is satisfactory and the model can be further extended to the months where the observed data are not available.

Conclusions
The study discusses the stochastic modeling of evaporation duct height over South China Sea for the period from 2008 to 2017. ARIMA models are analyzed to fit and forecast the monthly EDH data. Seasonal differencing is used to remove the seasonality and make the time series stationary. Correlation analysis is conducted to the differenced series in terms of the ACF and PACF, and the significant lags are adopted to calculate the order of the ARIMA models. Based on evaluation criteria, it is found that ARIMA(0,0,1) × (0,1,2) 12 model is adequate for simulating the EDH. Further, prediction has been conducted based on the proposed model for the year 2018. The results shows a high degree of fitness between the forecasted and observed values. Obviously, there are also some discrepancies, which probably come from the inherent linearity of the model. Improving the forecasting ability of our proposed model should be part of our future work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: