Canonical Correlation Study on the Relationship between Shipping Development and Water Environment of the Yangtze River

The sustainable development of the Yangtze River will affect the lives of the people who live along it as well as the development of cities beside it. This study investigated the relationship between shipping development and the water environment of the Yangtze River. Canonical correlation analysis is a multivariate statistical method used to study the correlation between two groups of variables; this study employed it to analyze data relevant to shipping and the water environment of the Yangtze River from 2006 to 2016. Furthermore, the Yangtze River Shipping Prosperity Index and Yangtze River mainline freight volume were used to characterize the development of Yangtze River shipping. The water environment of the Yangtze River is characterized by wastewater discharge, ammonia nitrogen concentration, biochemical oxygen demand, the potassium permanganate index, and petroleum pollution. The results showed that a significant correlation exists between Yangtze River shipping and the river's water environment. Furthermore, mainline freight volume has a significant impact on the quantity of wastewater discharged and petroleum pollution in the water environment.


Introduction
The sustainable development of the Yangtze River is crucial for the lives of the people and the development of the cities along it [1,2]. The Yangtze River has an abundance of natural resources, a superior geographical environment, and a strong shipping capacity. Its navigable length exceeds 2800 kilometers, it carries 80% of China's inland river freight volume, and it is known as the Chinese "golden waterway." The Yangtze River plays a critical role in China's economic development. Over the years, the economies of the 11 provinces along the Yangtze River have developed rapidly and continuously, forming the Yangtze River Economic Belt. In 2018, the GDP of the Yangtze River Economic Belt was approximately US$ 5.76 trillion, accounting for 45% of the whole country's economy [3]. The Yangtze River Economic Belt is one of the key regions for economic development in China. It has a high degree of urbanization, a high population density, and developed industry and agriculture [4]. The quality of the water environment directly affects the ecological balance, economic development, and public health in the region [5]. Therefore, maintaining the water quality of the Yangtze River is particularly crucial for its sustainable development.
With the development of the Yangtze River Economic Belt, human disturbance in the Yangtze basin has led to increased soil loss; furthermore, increased dam construction and river navigation have altered the pristine ecosystem and threatened local biodiversity [6]. In addition, the Yangtze River receives a large amount of industrial wastewater [7], municipal sewage, and sewage discharged from ships. The Yangtze River Valley Water Environment Monitoring Center monitors the water quality of the Yangtze River. It has detected many harmful organic chemicals in the water, and the contents of nutrients, heavy metals, and dissolved organic carbon in the wastewater were shown to be increasing [8]. Water pollution in the Yangtze River will indirectly affect the economic situation of the cities in the basin [9]. According to relevant studies, the discharge of pollutants into the Yangtze River basin has increased year by year, reaching 35.32 billion tons in 2016. Nearly 20% of the river is inferior to the class III standard for surface water; moreover, the overall compliance rate of important functional areas of water is only 73.8% [10].
At present, research on the pollution in the Yangtze River water environment is mainly focused on the effects of heavy metal, waste dumping, organic micropollutants, and dam construction [11][12][13][14][15]. One study statistically analyzed the influences of runoff, estuarine areas, and adjacent sea areas on heavy metal pollution in the surface water of the Yangtze River estuary using the metal profile fluxes in floods and tides [12]. In another study, the impact of waste dumping on the water environment of the Yangtze River was determined with statistical data and spatial patterns, and then the subsequent environmental impact on local and water ecosystems was evaluated [13]. Furthermore, through a water diversion project, the content of organic micropollutants in the Yangtze River water source and their threat to water quality were investigated and analyzed [14]. Another study discussed the influence of dam construction on the water environment by analyzing the effects of changes in sediment elements and contents on hydrological conditions [15].
However, the impact of shipping on water pollution in the Yangtze River has rarely been studied. According to statistics, there are more than 80,000 transport vessels in the main stream of the Yangtze River, most of which are not equipped with oil-water separation devices or domestic sewage treatment devices. Millions of tons of oily sewage, tens of billions of tons of wastewater, and 750 million tons of domestic waste are discharged into the Yangtze River every year [16]. In recent years, domestic garbage, domestic waste water, and oil produced during the transportation of ships have caused pollution to the Yangtze River's water environment, which cannot be ignored.
The construction and development of ecological environment monitoring websites has provided abundant data for environmental research. Multivariate statistical analysis methods are considered reliable and effective tools for extracting valuable ecological environment information from large amounts of monitoring data. Various methods of this kind have been used for water quality correlation analysis of rivers in the Yangtze River Basin. Cluster analysis can be used to assess the changes and trends in river water quality [17]. Principal component analysis can screen out independent comprehensive factors related to water quality [18]. Factor analysis is often used to identify key information that reflects the distribution of factors affecting water quality [19]. Canonical correlation analysis (CCA) can establish the correlation between two groups of indicators as a whole and is therefore capable of reflecting the overall correlation between two groups of variables. It has also been applied to a number of environmental quality assessments with agreeable results. CCA has been used in water quality correlation analysis of the Nervion-Ibaizabal estuary in the Basque country of Spain [20], the southwest Caspian Sea [21], and the Kalen River in Iran [22]. In this article, CCA is employed to determine the relationship between shipping and the related water environment in the Yangtze River.
In this study, the authors sought to (1) collect, collate, and screen the relevant historical data of shipping and the water environment of the Yangtze River; (2) use Spearman correlation analysis to identify the correlation between shipping and the water environment of the Yangtze River; and (3) use canonical correlation analysis (CCA) to determine the relationship between Yangtze River shipping and the related water environment. This study's findings provide a reference for developing countermeasures to prevent shipping pollution as well as for the sustainable development of the Yangtze River. Such work will inform public policy and be useful for various stakeholders in discussions on the sustainable development of the Yangtze River.

Canonical Correlation Analysis
CCA is a descriptive method that seeks to obtain measures of association between two sets of multivariate observations [23,24]. CCA is widely employed in practical analysis. For example, CCA combined with a regularization method was used to examine the relationship between quality and process monitoring [25]. In addition, CCA is widely used in biomedicine [26,27]; in such work, feature vectors are extracted from gray matter and white matter, which represent two different anatomical feature spaces of the brain. Moreover, one study used CCA to find the optimal feature subset from two groups of highly correlated features, thereby helping improve the diagnosis performance of Parkinson's disease [28].
For describing the correlation between two groups of variables, the Spearman correlation coefficient has a disadvantage: it considers only the correlation between a single X and a single Y in isolation and does not account for the correlations among the variables within the X and Y groups. Many pairwise correlation coefficients exist between two groups, which makes the relationship complex and difficult to describe as a whole. CCA is a multivariate analysis method for studying the correlation between two groups of variables. It borrows the dimension-reduction idea of principal component analysis: components are extracted from each of the two groups so that the correlation between the components of the two groups is maximized, while the components extracted from the same group are uncorrelated with each other. CCA then uses the correlation between these components to describe the linear correlation of the two groups as a whole.
The principle of a CCA is as follows. There are two interrelated groups of random variables:
$$X = (X_1, \cdots, X_p)^{\top}, \quad Y = (Y_1, \cdots, Y_q)^{\top} \tag{1}$$
The covariance matrix of X and Y is:
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{2}$$
In Equation (2), $\Sigma_{11} = \mathrm{cov}(X)$ is a $p \times p$ matrix; $\Sigma_{22} = \mathrm{cov}(Y)$ is a $q \times q$ matrix; and $\Sigma_{12} = \Sigma_{21}^{\top} = \mathrm{cov}(X, Y)$ is a $p \times q$ matrix.
Following the idea of principal component analysis, the correlation between the two sets of variables (X and Y) can be transformed into a correlation between two canonical variables (U and V). To identify the canonical variables U and V, the coefficient vectors $a = (a_1, \cdots, a_p)^{\top}$ and $b = (b_1, \cdots, b_q)^{\top}$ must be found, as shown in Equations (3) and (4):
$$U = a^{\top} X = a_1 X_1 + \cdots + a_p X_p \tag{3}$$
$$V = b^{\top} Y = b_1 Y_1 + \cdots + b_q Y_q \tag{4}$$
The variance and covariance of U and V are shown in Equation (5):
$$\mathrm{Var}(U) = a^{\top} \Sigma_{11} a, \quad \mathrm{Var}(V) = b^{\top} \Sigma_{22} b, \quad \mathrm{Cov}(U, V) = a^{\top} \Sigma_{12} b \tag{5}$$
The correlation coefficient of U and V is shown in Equation (6):
$$\mathrm{corr}(U, V) = \frac{a^{\top} \Sigma_{12} b}{\sqrt{a^{\top} \Sigma_{11} a}\, \sqrt{b^{\top} \Sigma_{22} b}} \tag{6}$$
In CCA, the difficulty is how to select the coefficient vectors a and b so as to maximize corr(U, V) for the given X, Y, and $\Sigma$. The correlation coefficient between the canonical variables U and V does not change when they are multiplied by an arbitrary positive constant. Therefore, the coefficient vectors a and b are restricted as in Equation (7) to avoid redundant solutions:
$$\mathrm{Var}(U) = a^{\top} \Sigma_{11} a = 1, \quad \mathrm{Var}(V) = b^{\top} \Sigma_{22} b = 1 \tag{7}$$
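The selection of the coefficient vectors a and b described above can be sketched numerically. The following is a minimal illustration, not the procedure used in this study (the study's calculations were run in IBM SPSS Statistics 25.0). It assumes NumPy is available, uses synthetic data rather than the study's data, and solves the problem by whitening each group and taking a singular value decomposition, which is one standard way to obtain the canonical correlations; the function name `cca` and all variable names are hypothetical.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlation analysis via whitening and SVD.

    X is (n, p) and Y is (n, q). Returns the canonical correlations in
    descending order and the coefficient matrices A (p, m) and B (q, m),
    m = min(p, q), whose k-th columns give the k-th pair of coefficient
    vectors, scaled so that each canonical variable has unit variance.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance blocks of the joint covariance matrix.
    S11 = Xc.T @ Xc / (n - 1)
    S22 = Yc.T @ Yc / (n - 1)
    S12 = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse symmetric square root; assumes S is positive definite.
        w, Q = np.linalg.eigh(S)
        return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

    S11_is = inv_sqrt(S11)
    S22_is = inv_sqrt(S22)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    Uk, s, Vkt = np.linalg.svd(S11_is @ S12 @ S22_is, full_matrices=False)
    A = S11_is @ Uk
    B = S22_is @ Vkt.T
    return s, A, B


# Synthetic example: one shared latent factor links the two groups.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([latent + 0.3 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])

r, A, B = cca(X, Y)
print("canonical correlations:", r)
```

Because the first canonical correlation is the maximum correlation over all linear combinations of the two groups, it can never be smaller than the absolute Pearson correlation between any single variable in X and any single variable in Y.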

Data Selection and Source
In this study, the Yangtze River Shipping Prosperity Index, Yangtze River mainline freight volume, Yangtze River wastewater discharge quantity, ammonia nitrogen, biochemical oxygen demand, potassium permanganate index, and petroleum were selected as variables.
While many factors can be used to characterize the development of shipping on and the water environment of the Yangtze River, only a limited number of them can be quantified with actual data. For this reason, this study employed the Yangtze River mainline freight volume and the Yangtze River Shipping Prosperity Index to characterize the development of Yangtze River shipping. The Yangtze River mainline freight volume is the most suitable quantitative index for representing the development of shipping, whereas the Yangtze River Shipping Prosperity Index is a qualitative index for investigating the Yangtze River ports; it reflects the production and operations of the shipping ports and shipping enterprises. The freight volume of the Yangtze River mainline is obtained from the annual freight statistics of the Yangtze River. The Shipping Prosperity Index of the Yangtze River follows the principle of a diffusion index: it is a comprehensive quantitative index developed by processing and summarizing the qualitative indexes in a survey of the prosperity of port and waterway enterprises on the Yangtze River. The survey observation points include 36 port companies and 50 shipping companies (86 samples in total). Each year is divided into four quarters, in which enterprises are observed and classified by type, region, transportation, goods, etc. These results are finally aggregated to obtain the annual Shipping Prosperity Index for the Yangtze River [29]. The prosperity index takes 100 as the critical point and fluctuates between 50 and 150. An index greater than 100 indicates that rising responses dominate, which reflects that Yangtze River shipping is in a state of growth and prosperity. The higher the index, the better the state of prosperity.
Yangtze River mainline freight volume and Yangtze River Shipping Prosperity Index are derived from the data provided by the waterway transportation economy and shipping prosperity in the Yangtze River Yearbook from 2006 to 2016 [30][31][32][33][34][35][36][37][38][39][40].
In this study, the water environment of the Yangtze River was characterized by the Yangtze River wastewater discharge quantity, ammonia nitrogen, biochemical oxygen demand, potassium permanganate index, and petroleum. Specifically, the wastewater discharge quantity is constructed from annual statistics of water discharged by users in industry, construction, the tertiary sector, and urban households in the Yangtze River Basin. The Yangtze River wastewater discharge quantity comes from the data on sewage discharge in the Changjiang & Southwest River Water Resources Bulletin [41][42][43][44][45][46][47][48][49][50][51], collected from the Yangtze River Water Resources Network from 2006 to 2016. The data on ammonia nitrogen, biochemical oxygen demand, potassium permanganate index, and petroleum came from sample monitoring at seven key monitoring points in the main stream of the Yangtze River basin, covering the seven regions of Sichuan, Chongqing, Hubei, Hunan, Jiangxi, Anhui, and Jiangsu, as shown in Figure 1. The monitoring frequency is once a week. The sampling method is based on ISO 5667- . The annual average is used in this study. The measurement method is shown in Table 1. The data on ammonia nitrogen, biochemical oxygen demand, potassium permanganate index, and petroleum are from the China Environmental Yearbook [52][53][54][55][56][57][58][59][60][61][62] from 2006 to 2016. Tables 1 and 2 show the indicators and data. This study collected, filtered, and processed data to perform a CCA of data characterizing the development of Yangtze River shipping and the river's water environment. Data calculations and output were performed using IBM SPSS Statistics 25.0.

Spearman Correlation Analysis
Spearman correlation analysis was performed to study the relationship between the development of shipping and the water environment of the Yangtze River. Table 3 presents the results, which show that the freight volume (x2) of the Yangtze River mainline is significantly correlated with wastewater discharge (y1) and ammonia nitrogen concentration (y2). The Spearman correlation coefficients were 0.874 and 0.880, respectively, with a p-value of <0.001. Spearman correlation analysis was also used to test the intra-group correlations of the X group (x1 and x2) and the Y group (y1, y2, y3, y4, and y5). The correlation coefficient between the Yangtze River Shipping Prosperity Index (x1) and mainline freight volume (x2) was −0.661 with a p-value of 0.027. The intra-group correlations of the Y group can be found in Table 4. The correlation coefficients between wastewater discharge (y1) and ammonia nitrogen concentration (y2) and between potassium permanganate index (y4) and petroleum (y5) were −0.775 and 0.804, respectively, both significant at the 0.05 level. Although some of the pairwise correlation coefficients were very high, Spearman correlation analysis can only reflect the correlation between single variables; therefore, to reflect the overall correlation between the two groups of variables, performing CCA on the two groups is necessary.
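For readers who want to see what a Spearman coefficient measures, the following minimal sketch computes it as the Pearson correlation of average ranks, using only the Python standard library. The two series are made-up placeholders for illustration and are not the study's data; the function names are hypothetical, and the study itself obtained its coefficients from SPSS.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up placeholder series (11 annual values each), NOT the study's data.
freight_example = [10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 23]
discharge_example = [30, 31, 31, 33, 32, 34, 34, 35, 34, 35, 35]

print(round(spearman(freight_example, discharge_example), 3))
```

Because the coefficient depends only on ranks, it captures monotonic association between one pair of variables at a time, which is why it cannot summarize the relationship between two whole groups of variables.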

Canonical Correlation Analysis
In the CCA, two pairs of canonical correlation variables (U1, V1 and U2, V2) were extracted from the two groups of data (X and Y groups), as shown in Table 5. The canonical correlation coefficient of the first pair of canonical correlation variables (U1 and V1) was 0.979, significant at the 0.05 level, whereas that of the second pair (U2 and V2) was 0.55 and was not statistically significant. This meant that the first canonical correlation coefficient was reliable, whereas the second could not be meaningfully interpreted. Thus, the first pair of canonical correlation variables (U1 and V1) was used in the subsequent analysis. In addition, the canonical correlation coefficient of the first pair of canonical correlation variables (0.979) was greater than all the correlation coefficients obtained by the Spearman correlation analysis, indicating that the CCA results could better represent the relationship between the river's shipping and water environment than could the Spearman correlation analysis. This means that the influence of shipping on the water environment is not simply on individual indicators but on the overall water environment. Therefore, the correlation between the shipping and water environment of the Yangtze River could be represented by the first pair of canonical variables U1 and V1. Table 6 lists the standardized canonical correlation coefficients, which are summarized as the canonical correlation model in Equation (8).
According to Equation (8), the canonical variable U1 is dominated by Yangtze River mainline freight volume (x2) with a coefficient of 0.847. The canonical variable V1 is dominated by Yangtze River wastewater discharge quantity (y1) and petroleum (y5), with coefficients of 0.656 and 0.526, respectively. This canonical correlation model indicates that the Yangtze River mainline freight volume has a significant impact on the wastewater discharge quantity and petroleum in the Yangtze River's water environment. According to the model's result, this study determined that the Yangtze River mainline freight volume (x2) has a significant correlation with the wastewater discharge quantity (y1) and petroleum (y5).
For the canonical variable V1, the coefficients of ammonia nitrogen concentration (y2), biochemical oxygen demand (y3), and potassium permanganate index (y4) were 0.382, −0.107, and −0.222, respectively. The negative values of the coefficients indicate that these water pollution indicators in the Yangtze River are gradually decreasing.
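The significance pattern reported for the two canonical correlations can be illustrated with a short worked calculation. The text does not state which test statistic SPSS reported, so the sketch below assumes Bartlett's chi-square approximation to Wilks' lambda, which is one common sequential test for canonical correlations. The inputs are the reported coefficients (0.979 and 0.55), n = 11 annual observations (2006 to 2016), p = 2 shipping variables, and q = 5 water environment variables. The function name is hypothetical, and with only 11 observations the chi-square approximation is rough.

```python
from math import log


def bartlett_statistics(canon_corrs, n, p, q):
    """Sequential Bartlett chi-square statistics for canonical correlations.

    For k = 0, 1, ..., tests that the (k+1)-th and all later canonical
    correlations are zero. Returns a list of (wilks_lambda, chi2, df).
    """
    scale = n - 1 - (p + q + 1) / 2.0
    results = []
    for k in range(len(canon_corrs)):
        wilks = 1.0
        for r in canon_corrs[k:]:
            wilks *= 1.0 - r * r
        chi2 = -scale * log(wilks)
        df = (p - k) * (q - k)
        results.append((wilks, chi2, df))
    return results


# Upper 5% critical values of the chi-square distribution.
CHI2_CRIT_05 = {10: 18.307, 4: 9.488}

stats = bartlett_statistics([0.979, 0.55], n=11, p=2, q=5)
for k, (wilks, chi2, df) in enumerate(stats, start=1):
    verdict = "significant" if chi2 > CHI2_CRIT_05[df] else "not significant"
    print(f"pair {k}: Wilks lambda={wilks:.4f}, chi2={chi2:.2f}, "
          f"df={df}, {verdict} at 0.05")
```

Under these assumptions the first statistic exceeds its critical value and the second does not, which agrees with the conclusion that only the first pair of canonical variables should be interpreted.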

Canonical Structural Analysis
A canonical structural analysis was employed to measure the strength and direction of the correlation between the original variables (X and Y) and the canonical variables (U1 and V1) using canonical loadings and cross-loadings. A canonical loading is the correlation between an original variable and the canonical variable of its own group (e.g., U1 with x1 and x2). The greater the absolute value of the canonical loading, the more of the original variable is explained by the canonical variable. A cross-loading is the correlation between an original variable and the canonical variable of the other group (e.g., U1 with y1, y2, y3, y4, and y5). Table 7 presents the canonical structural analysis results.
According to the results in Table 7, the canonical loading of the Yangtze River mainline freight volume (x2) on the canonical variable U1 was as high as 0.98. This means that mainline freight volume can represent the Yangtze River's shipping development very well. Furthermore, the cross-loading of the mainline freight volume (x2) on the canonical variable V1 was 0.967, indicating that mainline freight volume is very strongly associated with the Yangtze River's water environment.
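Canonical loadings and cross-loadings can be computed directly as correlations between the original variables and the canonical variable scores. The sketch below is an illustration under stated assumptions: it uses NumPy and synthetic data (not the study's data), obtains the first canonical pair by whitening and SVD, and uses hypothetical function names. It also shows a general identity: for the first pair, each cross-loading equals the corresponding canonical loading multiplied by the first canonical correlation.

```python
import numpy as np


def first_canonical_pair(X, Y):
    """Return (r1, U1, V1): first canonical correlation and its scores."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S11 = Xc.T @ Xc / (n - 1)
    S22 = Yc.T @ Yc / (n - 1)
    S12 = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse symmetric square root; assumes S is positive definite.
        w, Q = np.linalg.eigh(S)
        return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

    S11_is = inv_sqrt(S11)
    S22_is = inv_sqrt(S22)
    Uk, s, Vkt = np.linalg.svd(S11_is @ S12 @ S22_is, full_matrices=False)
    a = S11_is @ Uk[:, 0]   # coefficient vector for the X group
    b = S22_is @ Vkt[0]     # coefficient vector for the Y group
    return s[0], Xc @ a, Yc @ b


def correlations_with(score, data):
    """Correlation of one canonical score with each column of data."""
    return np.array([np.corrcoef(score, data[:, j])[0, 1]
                     for j in range(data.shape[1])])


# Synthetic data: 2 variables in X, 5 in Y, linked by one latent factor.
rng = np.random.default_rng(1)
n = 80
latent = rng.normal(size=n)
X = np.column_stack([rng.normal(size=n),
                     latent + 0.2 * rng.normal(size=n)])
Y = np.column_stack([latent + 0.4 * rng.normal(size=n)] +
                    [rng.normal(size=n) for _ in range(4)])

r1, U1, V1 = first_canonical_pair(X, Y)
loadings_x = correlations_with(U1, X)   # canonical loadings of x on U1
cross_x = correlations_with(V1, X)      # cross-loadings of x on V1
loadings_y = correlations_with(V1, Y)   # canonical loadings of y on V1
cross_y = correlations_with(U1, Y)      # cross-loadings of y on U1

print("r1:", r1)
print("x loadings:", loadings_x, "x cross-loadings:", cross_x)
```

The signs of the loadings depend on the arbitrary sign of the canonical variables, so only their relative signs and magnitudes are meaningful.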

Discussions
The results of the Spearman correlation analysis and CCA revealed that the canonical correlation coefficient derived from the CCA was greater than the correlation coefficients derived from the Spearman correlation analysis. This indicated that the CCA could better reflect the relationship between river shipping and the water environment than could the Spearman correlation analysis. According to the CCA result, mainline freight volume has a significant impact on the wastewater discharge quantity and petroleum in the Yangtze River's water environment. It also shows that freight volume rises with the continuous development of Yangtze River shipping. This further suggests that growth in the number of ships, shipping tonnage, and crew numbers on the Yangtze River has led to an increase in ship sewage and wastewater emissions and ship oil spills. Pollution accidents resulting from ships have increased gradually in recent years. This has had a serious impact on the Yangtze River water environment, which may threaten the sustainable development of the Yangtze River Economic Belt. In addition, in the canonical correlation model, it can also be found that ammonia nitrogen concentration, biochemical oxygen demand, and potassium permanganate index show negative correlation coefficients, indicating that these water pollution indicators in the Yangtze River Basin decrease gradually. According to China's surface water environmental quality standard GB3838-2002, ammonia nitrogen concentration, biochemical oxygen demand, and permanganate index are considered the basic indicators for detecting surface water quality. Therefore, the reduction in the values of these indicators shows that the water quality in the Yangtze River has been improving and further indicates that water quality management in the Yangtze River Basin can be considered successful in recent years.
Unfortunately, there are still many pollution problems left in the Yangtze River water environment such as wastewater discharge and petroleum pollution, which require urgent solutions. Additionally, there are still many indicators in the Yangtze River waters that have exceeded standards, such as total phosphorus and chemical oxygen demand, which could not be controlled in the short term. Therefore, the prevention and control of water pollution in the Yangtze River Basin needs a long-term strategy based on continuous data analyses.

Conclusions
In this study, data on the development of shipping on the Yangtze River and data on the Yangtze River's water environment were analyzed. First, CCA was better than Spearman correlation analysis at reflecting the relationship between inland river navigation and the water environment. Second, the canonical correlation coefficient (0.979) indicated a significant correlation between Yangtze River shipping and its water environment. According to the CCA result, mainline freight volume has a significant impact on wastewater discharge quantity and petroleum in the Yangtze River's water environment.

Conflicts of Interest:
The authors declare no conflict of interest.