Modified Principal Component Analysis for Identifying Key Environmental Indicators and Application to a Large-Scale Tidal Flat Reclamation

Identification of key environmental indicators (KEIs) from a large number of environmental variables is important for environmental management in tidal flat reclamation areas. In this study, a modified principal component analysis approach (MPCA) was developed for determining the KEIs. In addition to the commonly considered numerical divergence attribute, the MPCA accounts for two further important attributes of the environmental variables: pollution status and temporal variation. It also replaces Pearson's correlation with the distance correlation (dCor) to measure nonlinear interrelationships between the variables. The proposed method was applied to the Tiaozini sand shoal, a large-scale tidal flat reclamation region in China. Five KEIs were identified: dissolved inorganic nitrogen, Cd, and petroleum in the water column, and Hg and total organic carbon in the sediment. The identified KEIs were shown to correspond well with the biodiversity of phytoplankton, indicating that they adequately represent the environmental condition of the coastal marine system. The MPCA is therefore a practicable method for extracting effective indicators that play key roles in the coastal and marine environment.


Introduction
Coastal tidal flats have often been reclaimed to moderate the conflict between population growth and land scarcity [1]. In China, about 1.12 million hectares (ha) of coastal flats have been reclaimed since 1979 [2], and a further 0.25 million ha were expected to be reclaimed by 2020 [3]. Numerous studies have noted that reclamation has noticeable effects on the coastal marine environment [4][5][6][7]. To characterize the local environmental changes caused by reclamation and to evaluate the environmental effects of reclamation activities, it is essential to identify the key environmental indicators (KEIs) from a large number of environmental variables. KEIs also provide useful tools for tracking environmental progress, supporting policy evaluation, and informing the public about coastal and marine governance.
Environmental issues often involve the simultaneous analysis of a wide range of variables. Principal component analysis (PCA) can effectively reduce the dimension of a multivariate data set by retaining only the first few principal components (PCs) [8], while preserving as much of its structure as possible [9]. PCA has therefore been commonly employed in environmental fields to identify KEIs. For example, Pejman et al. used PCA to extract the parameters contributing most to water quality variations across all seasons in the Haraz River Basin, Iran [10]. Yang et al. and Yao et al. identified soil organic matter as an indispensable factor of soil quality from a total of 22 soil properties on the northern coast of Jiangsu Province, China [11,12]. Using PCA, Berger et al. selected five key water quality variables (electric conductivity, oxygen concentration, caffeine, silicate, and toxic units with respect to pesticides), which were significantly correlated with ecological quality in German streams [13]. Ouyang et al. extracted the parameters most important for assessing seasonal variations in river water quality from 16 physical and chemical parameters in the LSJR basin, USA [14].
In coastal and marine regions, González-Oreja and Sáiz-Salinas applied PCA to a data set of 30 abiotic variables and proposed dissolved oxygen as the key environmental factor controlling the distribution of benthos in the Bilbao and Plentzia estuaries, Spain [15]. Performing PCA on long-term physico-chemical and biological data from the Nervión estuary, Borja et al. confirmed that redox potential, dissolved oxygen, and sediment metal concentrations were the critical controls on the local benthic structure [16]. Looi et al. extracted seven major components from 28 variables measured in 48 samples of coastal water in the Straits of Malacca, Malaysia. They attributed these components to mineral-related parameters and to natural and anthropogenic pollution sources [17]. Udayakumar et al. used PCA to identify the major parameter affecting the ecological health of coastal water in the Mangalore coastal region, India [18]. On the basis of PCA, Iyer et al. constructed a statistical model to explain the relationships between various physicochemical variables and the environmental conditions on the Cochin coast, southwest India [19]. Wilbers et al. identified four key factors explaining the presence of pollution in surface water from a data set of 32 sampling locations in the Mekong Delta, Vietnam [20]. In Kuwait Bay, Al-Mutairi et al. derived three principal components responsible for water quality variations. The first included DO and pH, the second PO4, TSS, and NO3, and the third seawater temperature and turbidity [21]. PCA has also been used to compare the heavy metal composition of coastal sediments. It has further been used to reduce the original chemical variables to a few factors that identify the noteworthy elements in contaminated areas [22,23]. Simeonov et al. described a sediment data set of 15 analytes, collected from the western coastline of the USA, by four latent factors. These factors were conditionally named "anthropogenic", "organic", "natural", and "hot spots" [24]. The work of Pereira et al. demonstrated the importance of organic matter content and the fine-grained sediment fraction in controlling the distribution of bioavailable metals in the Paraguaçu estuary, Brazil [25]. However, these previous studies relied primarily on the numerical divergence of a data set, with inadequate consideration of two other important attributes: pollution status and temporal variation. In addition, Pearson's linear correlation was frequently used to detect correlations between environmental variables, whereas these relationships are often inherently nonlinear [26].
The objective of this study is to develop an approach for identifying KEIs that accounts for all three important attributes of the environmental variables and for their nonlinear interrelationships. To achieve this, a modified PCA method (MPCA) was proposed. The three attributes were taken as coordinates to construct a three-dimensional environmental characteristic space, and the distance correlation (dCor) was used to measure the nonlinear interrelationships between the variables. The MPCA was applied to determine the KEIs in the Tiaozini sand shoal, a large-scale tidal flat reclamation region in China, and the identified KEIs were shown to represent the local environmental characteristics successfully. The approach provides a practicable tool for identifying KEIs to assess environmental quality in coastal reclamation areas. The results can be used as part of a framework to assess the effects of coastal reclamation and related policies, and to help safeguard the ecosystem services of coastal and marine environments. They can also provide technical support for establishing coastal environmental monitoring and administration systems for the conservation and sustainable development of marine resources.

MPCA
Assume that A is an m × n matrix containing measurements of n environmental variables, each measured at m sampling sites. First, A is normalized with respect to three attributes (data divergence, pollution status, and temporal variation) to construct three matrices D, P, and T.
(i) Matrix of data divergence $D = [d_{ij}]_{m \times n}$

$$d_{ij} = \frac{a_{ij} - \bar{a}_j}{\sigma_j} \qquad (1)$$

where $a_{ij}$ is the value of the jth variable measured at the ith sampling site; $\bar{a}_j$ is the spatial average of the jth variable; $\sigma_j$ is the spatial standard deviation of the jth variable; and $d_{ij}$ is the data divergence of $a_{ij}$, which indicates the spatial differentiation of the jth variable.
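The divergence normalization is a column-wise z-score. The minimal NumPy sketch below assumes the m × n layout above, with sites in rows and variables in columns. The helper name `divergence_matrix` is hypothetical. Using the sample standard deviation (`ddof=1`) for the spatial standard deviation is also an assumption, since the section does not state which estimator was used.

```python
import numpy as np


def divergence_matrix(A):
    """Data-divergence matrix D for an (m sites x n variables) matrix A.

    d_ij = (a_ij - mean_j) / std_j, computed column by column.
    Assumption: std_j is the sample standard deviation (ddof=1).
    """
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)        # spatial average of each variable
    std = A.std(axis=0, ddof=1)  # spatial standard deviation of each variable
    return (A - mean) / std


# Hypothetical example: 3 sites, 2 variables
A = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]
print(divergence_matrix(A))
```

Each column of the result has zero mean. Variables with very different units therefore become comparable along the divergence axis.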
(ii) Matrix of pollution status $P = [p_{ij}]_{m \times n}$, computed with Equation (2), where $p_{ij}$ is the pollution index of $a_{ij}$; $S_j$ is the environmental quality standard of the jth variable; pH is the measured pH of the water; $pH_{ju}$ and $pH_{jd}$ are the upper and lower limits of the pH standard, respectively; and $DO_f$ is the saturation concentration of dissolved oxygen in water. A $p_{ij}$ greater than 1.0 indicates that $a_{ij}$ exceeds the environmental quality guidelines.
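The exact pH and DO forms of Equation (2) cannot be recovered from this section. The sketch below is a hedged illustration that assumes the single-factor index conventions commonly used in Chinese environmental assessment:

- a ratio $a_{ij}/S_j$ for not-to-exceed variables;
- the deviation from the midpoint of $[pH_{jd}, pH_{ju}]$ for pH;
- a saturation-based form for DO, whose standard is a minimum.

These formulas and all helper names are assumptions, not the authors' confirmed equations.

```python
def ratio_index(value, standard):
    """p = a / S for a variable whose standard is an upper limit."""
    return value / standard


def ph_index(ph, ph_upper, ph_lower):
    """Assumed pH form: distance from the midpoint of [ph_lower, ph_upper],
    scaled by half the range width, so p > 1 means outside the range."""
    midpoint = (ph_upper + ph_lower) / 2.0
    half_width = (ph_upper - ph_lower) / 2.0
    return abs(ph - midpoint) / half_width


def do_index(do, do_standard, do_saturation):
    """Assumed DO form: the standard is a minimum, so low DO gives p > 1.
    Both branches equal 1.0 when do == do_standard."""
    if do >= do_standard:
        return abs(do_saturation - do) / (do_saturation - do_standard)
    return 10.0 - 9.0 * do / do_standard


# Hypothetical values, not taken from Tables 1-2
print(ratio_index(0.30, 0.20))  # a variable above its standard
print(ph_index(8.0, 8.5, 7.8))
print(do_index(6.0, 5.0, 8.0))
```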
(iii) Matrix of temporal variation $T = [t_{ij}]_{m \times n}$, computed with Equation (3), where $t_{ij}$ is the temporal fluctuation intensity of $a_{ij}$; $a^k_{ij}$ is the measurement at the kth monitoring campaign; $\bar{a}_{ij}$ is the temporal average of $a_{ij}$; $l$ is the total number of monitoring campaigns; and sgn is the sign function, which equals 1 if $a_{ij}$ shows an overall increasing tendency and −1 otherwise. The larger the absolute value of $t_{ij}$, the larger the temporal variation of $a_{ij}$.
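Equation (3) is likewise not reproduced in this section. One plausible reading of the definitions is a signed coefficient of variation over the l campaigns, sketched below. Three parts are assumptions: the population standard deviation, taking the sign from a least-squares trend, and the name `temporal_intensity`. With l = 2, the magnitude reduces to |x2 − x1| / (x1 + x2), which cannot exceed 1.

```python
import numpy as np


def temporal_intensity(series):
    """Signed temporal fluctuation intensity of one a_ij over l campaigns.

    Assumed form: t = sgn(trend) * std / mean, with the population std and the
    trend taken from a least-squares line through the campaigns in time order.
    """
    x = np.asarray(series, dtype=float)   # a^k_ij, k = 1..l, in time order
    cv = x.std(ddof=0) / x.mean()         # requires a positive temporal mean
    slope = np.polyfit(np.arange(x.size), x, 1)[0]
    sign = 1.0 if slope > 0 else -1.0     # sgn: 1 if increasing, -1 otherwise
    return sign * cv


print(temporal_intensity([1.0, 3.0]))  # increase between two campaigns
print(temporal_intensity([3.0, 1.0]))  # decrease
```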
The spatial distribution, pollution status, and temporal variation of an environmental variable are treated as mutually independent. They can therefore be used as the coordinates of a three-dimensional characteristic space (C-Space). Mathematically, $d_{ij}$, $p_{ij}$, and $t_{ij}$ are the projected lengths of $a_{ij}$ on the respective coordinates of the C-Space (Figure 1). The measurement vector of any variable in this space can be written as:

$$\mathbf{v}_{ij} = (d_{ij},\ p_{ij},\ t_{ij}) \qquad (4)$$

where $\mathbf{v}_{ij}$ is the measurement vector of the jth variable at the ith sampling site.
In a conventional PCA, Pearson's correlations between variables are calculated to quantify their pairwise linear dependence. In an environmental system, however, these dependences are often strongly nonlinear [26] and difficult to detect with Pearson's correlation. This study therefore uses the distance correlation (dCor) to measure the relationships between the environmental variables. This measure has been shown to have notable advantages in detecting nonlinear or non-monotone relationships among random variables [27].
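Before any distances are computed, the three attribute matrices are combined into the C-Space vectors $\mathbf{v}_{ij} = (d_{ij}, p_{ij}, t_{ij})$. The minimal sketch below assumes that D, P and T share the m × n layout. The (m, n, 3) array shape and the name `cspace_vectors` are illustrative choices, not the authors'.

```python
import numpy as np


def cspace_vectors(D, P, T):
    """Stack D, P, T (each m x n) into V of shape (m, n, 3), V[i, j] = v_ij."""
    D, P, T = (np.asarray(M, dtype=float) for M in (D, P, T))
    if not (D.shape == P.shape == T.shape):
        raise ValueError("D, P and T must have the same (m, n) shape")
    return np.stack([D, P, T], axis=-1)  # last axis holds (d, p, t)


# Hypothetical 2-site, 3-variable example
D = np.arange(6).reshape(2, 3)
V = cspace_vectors(D, D + 10, D + 20)
print(V[1, 2])  # vector (d, p, t) of the third variable at the second site
```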
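First, the C-Space distance of the αth variable $\mathbf{v}_\alpha$ is defined as:

$$\phi_{kl} = \lVert \mathbf{v}_{k\alpha} - \mathbf{v}_{l\alpha} \rVert, \quad k, l = 1, 2, \ldots, m \qquad (5)$$

where $\lVert\cdot\rVert$ is the Euclidean norm and $\phi_{kl}$ is the C-Space distance between the kth and the lth sampling sites for the αth variable $\mathbf{v}_\alpha$. Similarly:

$$\varphi_{kl} = \lVert \mathbf{v}_{k\beta} - \mathbf{v}_{l\beta} \rVert, \quad k, l = 1, 2, \ldots, m \qquad (6)$$

where $\varphi_{kl}$ is the distance between the kth and the lth sampling sites for the βth variable $\mathbf{v}_\beta$. The empirical distance covariance (dCov) of $\mathbf{v}_\alpha$ and $\mathbf{v}_\beta$ can be calculated as:

$$\mathrm{dCov}_m^2(\mathbf{v}_\alpha, \mathbf{v}_\beta) = \frac{1}{m^2} \sum_{k=1}^{m} \sum_{l=1}^{m} \Phi_{kl} \Psi_{kl} \qquad (7)$$

where $\Phi_{kl} = \phi_{kl} - \bar{\phi}_{k\cdot} - \bar{\phi}_{\cdot l} + \bar{\phi}_{\cdot\cdot}$ and $\Psi_{kl} = \varphi_{kl} - \bar{\varphi}_{k\cdot} - \bar{\varphi}_{\cdot l} + \bar{\varphi}_{\cdot\cdot}$ are the double-centred distances; a bar denotes the mean over the dotted index. The empirical distance correlation (dCor) is then given as:

$$\mathrm{dCor}_m(\mathbf{v}_\alpha, \mathbf{v}_\beta) = \sqrt{\frac{\mathrm{dCov}_m^2(\mathbf{v}_\alpha, \mathbf{v}_\beta)}{\sqrt{\mathrm{dCov}_m^2(\mathbf{v}_\alpha, \mathbf{v}_\alpha)\,\mathrm{dCov}_m^2(\mathbf{v}_\beta, \mathbf{v}_\beta)}}} \qquad (8)$$

Based on the dCor among the environmental variables, the comprehensive matrix R can then be established as:

$$R = \left[\mathrm{dCor}_m(\mathbf{v}_\alpha, \mathbf{v}_\beta)\right]_{n \times n}, \quad \alpha, \beta = 1, 2, \ldots, n \qquad (9)$$

Water 2018, 10, 69 5 of 18

The eigenvalue problem of matrix R is then solved:

$$|R - \lambda I| = 0 \qquad (10)$$

where λ is the eigenvalue and I is the identity matrix. The eigenvectors of R are the principal components (PCs), and they are ordered by the magnitudes of their corresponding eigenvalues.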
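The distance-correlation steps and the eigen-decomposition of R can be sketched as follows. This uses the standard V-statistic form of the sample distance correlation, which we assume matches the authors' computation. The helper names are hypothetical, and the random array only stands in for real C-Space vectors.

```python
import numpy as np


def centered_distances(X):
    """Double-centred C-Space distance matrix of one variable.

    X has shape (m, 3): the vectors (d, p, t) of the variable at m sites.
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # phi_kl
    return (dist
            - dist.mean(axis=1, keepdims=True)   # row means
            - dist.mean(axis=0, keepdims=True)   # column means
            + dist.mean())                       # grand mean


def dcor(X, Y):
    """Sample distance correlation (V-statistic form) of two variables."""
    A, B = centered_distances(X), centered_distances(Y)
    dcov2_xy = max((A * B).mean(), 0.0)  # clip tiny negative round-off
    dvar2_x = (A * A).mean()
    dvar2_y = (B * B).mean()
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0                       # a constant variable gets dCor 0
    return float(np.sqrt(dcov2_xy / np.sqrt(dvar2_x * dvar2_y)))


def dcor_matrix(V):
    """Comprehensive matrix R from C-Space vectors V of shape (m, n, 3)."""
    n = V.shape[1]
    R = np.eye(n)
    for a in range(n):
        for b in range(a + 1, n):
            R[a, b] = R[b, a] = dcor(V[:, a, :], V[:, b, :])
    return R


def principal_components(R):
    """Eigen-decomposition of symmetric R, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
V = rng.normal(size=(12, 4, 3))  # hypothetical: 12 sites, 4 variables
R = dcor_matrix(V)
eigvals, pcs = principal_components(R)
```

`numpy.linalg.eigh` is used because R is symmetric. It returns eigenvalues in ascending order, hence the reordering so that the first column of `pcs` is the first PC.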
The eigenvector with the largest eigenvalue best represents A and is defined as the first PC. Similarly, the eigenvector with the second-largest eigenvalue best represents the residual left in A once the first eigenvector has been removed, and is defined as the second PC. Each subsequent eigenvector is defined in the same way. Typically, only the first few PCs are needed to achieve a high level of discriminating accuracy, with an accumulated contribution rate greater than 75%. Herein, the contribution rate of the gth PC is computed as:

$$c_g = \frac{\lambda_g}{\sum_{h=1}^{n} \lambda_h} \qquad (11)$$

where $c_g$ and $\lambda_g$ are the contribution rate and eigenvalue of the gth PC, respectively. The criteria proposed by Andrews et al. [28] are then employed to select the most appropriate indicators from the retained PCs to construct the KEIs:
(1) From each retained PC, only the variables with a factor loading (the corresponding element of the eigenvector) within 10% of the highest factor loading (in absolute value) are kept;
(2) If these variables are correlated (correlation coefficient > 0.60 and significant (two-tailed) at p ≤ 0.01), the variable with the highest sum of correlation coefficients (absolute values) is retained for the KEIs; otherwise, all of the variables are retained.
The variables contained in the KEIs are the most important contributors to the integrated environmental condition in terms of spatial distribution, marine pollution, and temporal variation. The main steps of the MPCA are illustrated in Figure 2.
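The PC-retention rule (accumulated contribution rate above 75%) and criterion (2) can be sketched as below. All names are hypothetical, and the sketch makes four assumptions:

- the eigenvalues are sorted in descending order;
- the "sum of correlation coefficients" is taken over all other variables in R;
- significance at p ≤ 0.01 is supplied as a precomputed boolean matrix (for dCor it would typically come from a permutation test, not shown);
- correlated candidates are grouped transitively.

```python
import numpy as np


def contribution_rates(eigvals):
    """c_g = lambda_g / (sum of all eigenvalues)."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()


def n_retained(eigvals, threshold=0.75):
    """Smallest number of leading PCs whose accumulated contribution rate
    exceeds the threshold (eigvals assumed sorted in descending order)."""
    cumulative = np.cumsum(contribution_rates(eigvals))
    return int(np.argmax(cumulative > threshold)) + 1


def drop_redundant(candidates, R, significant, r_min=0.60):
    """Criterion (2) sketch: within each group of mutually linked candidates,
    keep only the variable with the largest sum of |R| with the other variables."""
    R = np.asarray(R, dtype=float)
    linked = (np.abs(R) > r_min) & np.asarray(significant, dtype=bool)
    strength = np.abs(R).sum(axis=1) - np.abs(np.diag(R))
    remaining, kept = list(candidates), []
    while remaining:
        group, frontier = {remaining[0]}, [remaining[0]]
        while frontier:                      # collect a connected group
            v = frontier.pop()
            for u in remaining:
                if u not in group and linked[v, u]:
                    group.add(u)
                    frontier.append(u)
        kept.append(max(group, key=lambda v: strength[v]))
        remaining = [v for v in remaining if v not in group]
    return sorted(kept)


# Hypothetical eigenvalues and a 3-variable correlation matrix
print(n_retained([5.0, 2.0, 1.5, 1.0, 0.5]))
R = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
print(drop_redundant([0, 1, 2], R, np.ones((3, 3), dtype=bool)))
```

Variables 0 and 1 are linked (0.8 > 0.60), so only the one with the larger correlation sum (variable 1) survives, while the unlinked variable 2 is kept.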

Application Area
The Tiaozini sand shoal is a silt-muddy flat of roughly 52,882 ha on the northern coast of Jiangsu Province, China. It is surrounded by the radial sand ridges of the Yellow Sea and has experienced stable and relatively rapid deposition in recent decades [29]. According to the Jiangsu Coastal Development Program (China's National Development and Reform Commission (NDRC), 2009), the local government planned a large-scale reclamation project covering 22,773.33 ha of this flat, named the Tiaozini Land Reclamation (Figure 3), to ease the mounting shortage of agricultural land. The project was designed to be carried out in three phases, reclaiming 6746.67 ha, 8446.67 ha, and 7579.99 ha of land successively. Phase I was completed in July 2013, and the other two phases have not yet started. In this study, the proposed MPCA was applied to identify the KEIs in the reclamation region using environmental data collected before and after the Phase I project.

Field Sampling
We carried out the field sampling in September 2013, after the completion of the Phase I project. The marine monitoring system included 12 sampling sites (Figure 4). Eight water quality variables and seven sediment variables were measured at each site. In the water column, pH, dissolved oxygen (DO), chemical oxygen demand (COD), dissolved inorganic nitrogen (DIN), soluble reactive phosphorus (SRP), petroleum (PETRO), and the heavy metals Cd and Hg were measured. In the sediment samples, sulphide, total organic carbon (TOC), PETRO, total Kjeldahl nitrogen (TKN), total phosphorus (TP), Cd, and Hg were measured. Marine phytoplankton measurements were conducted synchronously. To represent the environmental status of the area before reclamation, historical monitoring data for the same variables from April 2010 were taken from the official report "Marine environmental monitoring report for Tiaozini Reclamation Region" (Jiangsu Marine Environmental Monitoring and Forecasting Center, Jiangsu Marine Fisheries Research Institute, 2010). Descriptive statistics were computed to describe the main characteristics of the coastal environment in the study area before (Table 1) and after (Table 2) the Phase I project. Tables 1 and 2 also list the corresponding limits of each variable under the China Sea Water Quality Standard (CSWQS GB 3097-1997) and the China Marine Sediment Quality standard (CMSQ GB 18668-2002).
Figure 5 compares the environmental variable measurements (mean values with spatial standard deviations) before and after the Phase I project. The monitoring data indicated that pH and DO were relatively stable both spatially and temporally. After the Phase I project, PETRO concentrations increased noticeably in both the water column and the sediment. This was largely because of oil spills from working vessels and oily wastewater discharged from working equipment during the land reclamation. The concentrations of all the other variables decreased to varying degrees.
Figure 6 shows the temporal variation in the Shannon-Wiener index, the Margalef index (R_Margalef), and the Pielou index of phytoplankton (mean values with spatial standard deviations) before and after the Phase I project. The spatial distributions of the post-Phase I biodiversity indices were more symmetrical than those of the pre-Phase I indices. Overall, the Shannon-Wiener index and R_Margalef of phytoplankton increased, whereas the Pielou index decreased slightly after the land reclamation.
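For reference, the three phytoplankton indices follow standard definitions, sketched below from species abundance counts. The section does not state the log bases. Two choices are therefore assumptions: base 2 for the Shannon-Wiener index (with the Pielou index using the same base) and the natural log in the Margalef index. The name `diversity_indices` is hypothetical.

```python
import math


def diversity_indices(counts, log_base=2):
    """Shannon-Wiener H', Margalef d and Pielou J' from species abundances."""
    counts = [c for c in counts if c > 0]
    N, S = sum(counts), len(counts)     # total individuals, number of species
    if N <= 1:
        raise ValueError("need more than one individual")
    H = -sum((c / N) * math.log(c / N, log_base) for c in counts)
    d = (S - 1) / math.log(N)           # Margalef richness (natural log assumed)
    J = H / math.log(S, log_base) if S > 1 else 0.0  # Pielou evenness
    return H, d, J


# Hypothetical sample: four species with equal abundance
print(diversity_indices([10, 10, 10, 10]))
```

With perfectly even abundances, J' equals 1. An increase in H' accompanied by a slight drop in J', as reported above, is therefore consistent with richness growing faster than evenness.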

Results
All monitored variables were taken as potential environmental indicators (PEIs). A pool of PEIs was established, consisting of eight water variables (pH, DO, COD, DIN, SRP, PETRO, Cd, Hg) and seven sediment variables (Sulphide, TOC, PETRO, TKN, TP, Cd, Hg). The pre-reclamation sediment data were limited (12 sediment sampling sites in 2013 but only 5 in 2010), so the temporal fluctuation intensity could not be calculated at every site. To address this, the study area was divided into four sub-regions (North, Center, South, and Far), each containing at least one sediment sampling site and several water sampling sites in both years (Figure 4). The spatial average of each variable in each sub-region was calculated for the pre- and post-Phase I periods. Equation (3) was then used to obtain each sub-region's overall temporal fluctuation intensity. The matrix T for the study area was then obtained by assuming that the temporal fluctuation intensity at any sampling site can be approximated by the overall value of its sub-region. Matrices D and P were constructed using Equations (1) and (2), respectively, with the data set measured in September 2013 (Table 2). This choice highlights the spatial distribution and environmental quality of the variables after the Phase I reclamation and avoids overuse of the historical data. The measurement vectors in the C-Space were then built according to Equation (4).
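The sub-region workaround for matrix T can be sketched in three steps:

1. Average each variable per sub-region in each period.
2. Compute one temporal intensity per sub-region from the two averages.
3. Assign that intensity to every 2013 site in the sub-region.

For illustration only, the sketch assumes a signed coefficient of variation for Equation (3), with the population standard deviation and the sign taken from the direction of change. The function name, arguments and example labels are hypothetical.

```python
import numpy as np


def subregion_intensity(pre, post, pre_region, post_region, site_region):
    """Approximate t for each post-reclamation site from sub-region averages.

    pre, post: values of ONE variable at the sites sampled in each period.
    pre_region, post_region: sub-region label of each of those sites.
    site_region: sub-region label of each row (site) of matrix T.
    Every sub-region must contain at least one site in both periods.
    """
    pre, post = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
    pre_region, post_region = np.asarray(pre_region), np.asarray(post_region)
    t_by_region = {}
    for region in set(site_region):
        m_pre = pre[pre_region == region].mean()     # pre-Phase I spatial average
        m_post = post[post_region == region].mean()  # post-Phase I spatial average
        # population std of two values is |m_post - m_pre| / 2
        cv = abs(m_post - m_pre) / 2.0 / ((m_pre + m_post) / 2.0)
        t_by_region[region] = cv if m_post > m_pre else -cv
    return np.array([t_by_region[r] for r in site_region])


# Hypothetical data: 2 sites sampled in 2010, 3 sites in 2013
t = subregion_intensity(pre=[1.0, 3.0], post=[3.0, 1.0, 1.0],
                        pre_region=["North", "Center"],
                        post_region=["North", "Center", "Center"],
                        site_region=["North", "Center", "Center"])
print(t)
```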
The pollution status (p), temporal variation (t), and spatial variation (d) of all the PEIs are illustrated in the box charts in Figure 7. Both the pollution statuses and the temporal fluctuation intensities varied strongly among the measured variables, whereas the spatial variations were closer to each other. The measurements generally met the water and sediment quality objectives, except for DIN (p ranged from 0.51 to 1.81). After the reclamation, PETRO in the water column rose significantly at all sampling sites (t ranged from 0.44 to 0.68), and Hg in the sediment changed sharply (t ranged from −0.98 to −0.91).

The distance correlations among the variables were calculated using Equations (5)-(8) and are shown in Table 3.
The Kaiser-Meyer-Olkin (KMO) test was first performed to measure sampling adequacy, that is, whether it is appropriate to proceed with a factor analysis. The KMO value was 0.884, confirming that the sampling was adequate for PCA. The MPCA was then conducted to select representative quality indicators from the pool of PEIs to construct the KEIs.
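The overall KMO statistic compares squared correlations with squared partial correlations, which are obtained from the inverse of the correlation matrix. A minimal sketch follows. The section does not state whether the KMO was computed from the Pearson or the dCor matrix, so the input R is left generic. The name `kmo` is hypothetical.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for an invertible correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + q2))


# Hypothetical 3-variable correlation matrix
print(kmo([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]))
```

Values close to 1 (such as the 0.884 reported above) mean that partial correlations are small relative to ordinary correlations. In that case the variables share enough common structure for PCA. With only two variables, the statistic is always 0.5, because the partial correlation then equals the ordinary correlation.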
The results in Table 4 show that the first four PCs accounted for 79.71% of the total variance of the coastal marine system characteristics. The extraction rates (communalities) of the individual variables ranged from 73.82% to 92.14%. These PCs therefore contain the primary information of the original data set. The PCs were interpreted in order of the magnitude of their eigenvalues. The first PC explained 57.52% of the variance. All factor loadings in the first PC were positive and differed relatively little. In the following, the subscripts W and S denote the water column and the sediment, respectively. PC1 included six highly weighted variables whose factor loadings were within 10% of the highest one: Cd_W (0.288), pH_W (0.273), DO_W (0.283), Hg_W (0.278), Sulphide_S (0.268), and TOC_S (0.267). It also had moderate loadings from DIN_W (0.251), COD_W (0.250), TKN_S (0.254), and TP_S (0.252), which could be explained by nutrient conditions in the water column and sediment. The highest loading, from Cd_W, resulted from the significant correlations among pH_W, DO_W, SRP_W, COD_W, Hg_W, TKN_S, Sulphide_S, Cd_S, and Cd_W (Table 3). All the highly weighted variables in this component except TOC_S were significantly correlated with Cd_W (dCor > 0.6 and p < 0.01; see Table 3). PC1 could therefore be identified as the "aqueous toxin and organic pollution component".
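Criterion (1) can be checked against the loadings quoted in the text. The sketch below reads "within 10% of the highest factor loading" as |loading| ≥ 0.9 × max|loading|, which is an assumption. The dictionaries contain only the loadings reported in the text. With these values, the rule recovers the six highly weighted PC1 variables, and the PC2 loadings reported in the next paragraph leave DIN_W alone. The function name is hypothetical.

```python
def high_loading_variables(loadings, tol=0.10):
    """Criterion (1): keep variables whose |loading| lies within tol of the largest |loading|."""
    top = max(abs(v) for v in loadings.values())
    return [name for name, v in loadings.items() if abs(v) >= (1.0 - tol) * top]


# PC1 and PC2 loadings as reported in the text (unreported loadings omitted)
pc1 = {"Cd_W": 0.288, "pH_W": 0.273, "DO_W": 0.283, "Hg_W": 0.278,
       "Sulphide_S": 0.268, "TOC_S": 0.267, "DIN_W": 0.251,
       "COD_W": 0.250, "TKN_S": 0.254, "TP_S": 0.252}
pc2 = {"DIN_W": -0.479, "pH_W": -0.399, "DO_W": -0.325,
       "SRP_W": 0.390, "Sulphide_S": 0.323, "TOC_S": 0.316}

print(high_loading_variables(pc1))  # ['Cd_W', 'pH_W', 'DO_W', 'Hg_W', 'Sulphide_S', 'TOC_S']
print(high_loading_variables(pc2))  # ['DIN_W']
```

The PC1 cut-off is 0.9 × 0.288 = 0.2592, which separates TOC_S (0.267) from TKN_S (0.254). For PC2 the cut-off is 0.9 × 0.479 ≈ 0.431, so pH_W (−0.399) falls outside it.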
Figure 7. Box charts of (a) pollution status; (b) temporal variation; and (c) spatial variation of the …
The second PC explained 9.14% of the variance with high factor loading from DIN W (−0.479) and a moderate factor loading from pH W (−0.399), DO W (−0.325), SRP W (0.390), Sulphide S (0.323), and TOC S (0.316).PC2 could be identified as the "eutrophication component" since it mainly explained variations in characters that are related to the marine eutrophication.In PC2, DIN W correlated strongly with pH W (dCor = 0.918 and (c) The distance correlations among the variables were calculated using Equations ( 5)-( 8) and are shown in Table 3.
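The screening steps described above can be sketched in Python with NumPy. These steps are the KMO adequacy check, eigen-decomposition of the correlation matrix with PCs ordered by eigenvalue, and the rule that a variable is "highly weighted" if its loading is within 10% of the largest one. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the input is a symmetric, invertible correlation matrix, such as the dCor matrix of Table 3, although the source does not state that this matrix is positive definite. The helper names `kmo`, `pca_from_corr`, and `highly_weighted` are hypothetical. The usage line applies the 10% rule only to the ten PC1 loadings quoted in the text.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)  # assumes R is invertible
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # off-diagonal entries are partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def pca_from_corr(R):
    """Eigenvalues (descending), eigenvectors, and explained-variance ratios."""
    vals, vecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(vals)[::-1]  # interpret PCs by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs, vals / vals.sum()


def highly_weighted(names, loadings, tol=0.10):
    """Names whose |loading| lies within `tol` (here 10%) of the largest |loading|."""
    w = np.abs(np.asarray(loadings, dtype=float))
    cutoff = (1.0 - tol) * w.max()
    return [name for name, x in zip(names, w) if x >= cutoff]


# Usage with the PC1 loadings quoted in the text (cutoff = 0.9 * 0.288).
pc1_names = ["Cd W", "pH W", "DO W", "Hg W", "Sulphide S", "TOC S",
             "DIN W", "COD W", "TKN S", "TP S"]
pc1_loadings = [0.288, 0.273, 0.283, 0.278, 0.268, 0.267,
                0.251, 0.250, 0.254, 0.252]
print(highly_weighted(pc1_names, pc1_loadings))
# -> ['Cd W', 'pH W', 'DO W', 'Hg W', 'Sulphide S', 'TOC S']
```

The 10% cutoff (0.2592) separates the six highly weighted variables from the moderate group (0.250 to 0.254), matching the grouping described in the text.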
… contaminant of the heavy-duty construction of the reclamation. As the largest contributor to PC4, Hg S showed the highest absolute temporal fluctuation at all sampling sites (|t| > 0.90; see Figure 7b). Its relative independence and drastic change made Hg S a sensible choice as an important indicator in the region.
Phytoplankton communities depend directly on, and are highly sensitive to, the environmental variables in the coastal marine system [47,48]. For this reason, phytoplankton biodiversity was used as the response variable for the local environmental conditions affected by reclamation. Figure 8 compares the KEI measurements before and after the Phase I project. After the reclamation, the concentrations of Cd W, TOC S, DIN W, and Hg S decreased significantly. The concentration of PETRO W increased but remained at a rather low level. These changes indicate that the post-reclamation environment favored phytoplankton development more than the pre-reclamation environment did, which is consistent with the phytoplankton biodiversity results (see Figure 6). To further evaluate the validity of the selected KEIs, multiple regression analysis was conducted with the KEIs as independent variables and the biodiversity indices as dependent variables. After the Phase I project, the regressions yielded coefficients of determination (R²) of 0.996 for the Margalef index (R Margalef), 0.918 for the Shannon-Wiener index, and 0.938 for the Pielou index. Before the Phase I project, the corresponding values were 0.994, 0.912, and 0.957 (Table 5). These results suggest that the identified KEIs represent the coastal environmental system well.
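The validation step regresses each biodiversity index on the KEIs and reports R². Below is a minimal ordinary-least-squares sketch. It assumes an intercept term and one row per observation (sampling site), because the source does not describe the exact sample layout. `ols_r2` is a hypothetical helper, and the usage example uses synthetic data only, not study data.

```python
import numpy as np


def ols_r2(X, y):
    """Fit y = b0 + X @ b by least squares; return (coefficients, R^2, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single predictor becomes one column
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # requires n > k + 1
    return coef, r2, adj_r2


# Synthetic check (not study data): y is an exact linear function of two predictors.
X_demo = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 2], [5, 5]], dtype=float)
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1]
coef, r2, adj_r2 = ols_r2(X_demo, y_demo)
print(np.round(coef, 6), r2)  # coefficients near [1, 2, -1]; R^2 equal to 1 up to rounding
```

R² cannot decrease when predictors are added, so the sketch also returns the adjusted R². With five KEIs as predictors, the adjusted value is defined only when each regression has at least seven observations.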
PCA has become a frequently used tool for identifying KEIs in environmental applications because it reduces dimensionality effectively with minimal loss of information. Many researchers have applied PCA to reduce the dimensionality of environmental variables in order to identify key elements [17,22–24], locate the critical controllers of the local aquatic structure [15,16,49–51], and reveal the KEIs responsible for water quality variations [18–21,25]. In the conventional PCA algorithm, Pearson's correlation is typically used to measure the correlations among the environmental variables and to construct the correlation coefficient matrix R for the subsequent factor analysis. However, Pearson's correlation is poorly suited to variables whose interrelationships are nonlinear. For example, we calculated the Pearson's correlation coefficients for the following pairs in the C-space of the study region: Cd W with Sulphide S, Cd W with SRP W, Cd W with pH W, Cd W with DO W, TOC S with Hg S, and TOC S with Sulphide S. The coefficients were −0.329, 0.0975, −0.0191, −0.146, −0.208, and −0.406, respectively, and their two-tailed significance values were all greater than 0.05. Pearson's correlation therefore fails to capture the strong nonlinear relationships among these variables. Such errors would distort the high factor loadings in the subsequent factor analysis and adversely affect the identification of KEIs in the system. In this study, adopting the distance correlation (dCor) effectively resolves this problem. The validation test of the KEIs demonstrates that the proposed MPCA is a suitable method for identifying KEIs in the coastal and marine environment. The method can also be extended to extract key indicators that support environmental quality management in other regions of the world where environmental systems exhibit nonlinear or non-monotonic characteristics.
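The contrast between Pearson's correlation and dCor can be illustrated with a minimal Python sketch of the standard sample distance correlation, in the V-statistic form based on double-centred distance matrices. Equations (5)-(8) are not reproduced in this section, so it is an assumption, not a confirmed fact, that they match this form exactly. The helper names are hypothetical. The inputs may be 1-D samples or, as with measurement vectors in the C-space, 2-D arrays with one vector per row. In the 2-D case, Euclidean distances between rows are used.

```python
import numpy as np


def _centered_distances(z):
    """Double-centred Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n points in one dimension
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    dvar_x = (a * a).mean()
    dvar_y = (b * b).mean()
    if dvar_x == 0.0 or dvar_y == 0.0:
        return 0.0  # a constant sample carries no dependence information
    # Clamp tiny negative round-off before taking square roots.
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y)))


# A symmetric, purely nonlinear relationship.
x = np.array([-1.0, 0.0, 1.0])
y = x ** 2
print(np.corrcoef(x, y)[0, 1])     # Pearson's r: zero for this symmetric pattern
print(distance_correlation(x, y))  # equals 10 ** -0.25 (about 0.562) up to rounding
```

In this example Pearson's r vanishes because the pattern is symmetric, while dCor remains clearly positive. In the population, dCor is zero only when the variables are independent, which is why it can detect the nonlinear dependence that Pearson's correlation misses.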

Conclusions
A modified principal component analysis approach (MPCA) has been developed to identify KEIs. In the MPCA algorithm, a characteristic space was constructed from three attribute dimensions of the environmental variables: spatial distribution, pollution status, and temporal variation. This space captures the integrated environmental characteristics of the variables. The distance correlation (dCor) was introduced to detect the nonlinear relationships inherent among these variables. The MPCA was applied to identify the KEIs from a pool of PEIs consisting of eight water and seven sediment chemical variables in the Tiaozini large-scale tidal flat reclamation area. The identified KEIs were DIN, Cd, and PETRO in the water column, and Hg and TOC in the sediment. The KEIs responded satisfactorily to phytoplankton biodiversity both before and after the Phase I project, which indicates that the selected KEIs adequately represent the environmental condition of the reclamation region. Therefore, the proposed MPCA is suitable for extracting effective indicators that play key roles in the coastal and marine environment. The KEIs can help policy makers and researchers monitor environmental changes in coastal reclamation areas and make informed decisions when selecting management strategies for the sustainable development and utilization of seawater resources. Future work should test the validity of the MPCA with more cases from different coastal and marine systems. It will also be of interest to collect long time-series data to analyze the complex interrelationships between the KEIs and marine plankton. Specially designed numerical experiments in the study region could then be used to investigate the environmental behavior and ecological effects of the KEIs.
three coordinate vectors, respectively, in the C-Space.Thus, v ij indicates the integrated environmental feature of the jth variable at the ith sampling site.

Figure 1. Sketch of the C-Space and the vector of an arbitrary variable measurement.

Figure 2. Flowchart of the main steps used to perform the modified principal component analysis approach (MPCA).

Water 2018, 10, 69

All samples were collected, stored, and analyzed according to China's national standards "Specification of Oceanographic Survey" (GB/T 12763-2007) and "Specification for Marine Monitoring" (GB 17378-2007). Both standards were issued by the General Administration of Quality Supervision, Inspection and Quarantine and the Standardization Administration of China. Samples were also handled according to the "Technical specification of marine biological quality monitoring" (HY/T 078-2005), issued by the State Oceanic Administration of China.

Figure 4. Location of the sampling sites.

Figure 5. Comparison of environmental variable measurements before and after the Phase I project.

Figure 6. Comparison of phytoplankton biodiversity before and after the Phase I project.

Figure 7. Box charts of (a) pollution status, (b) temporal variation, and (c) spatial variation of the variable measurements.

Figure 8. Comparison of KEI measurements before and after the Phase I project.

Table 1. Descriptive statistics of measured variables of water and sediment samples pre-Phase I (April 2010).
Notes: 1 Standard deviation; 2 Coefficient of variation; 3 Below detection limit; 4 No available standard.

Table 2. Descriptive statistics of measured variables of water and sediment samples post-Phase I (September 2013).

Table 3. Distance correlations (dCor) among the measured variables across the reclamation area.
Notes: W: water chemical variable; S: sediment chemical variable; * Significant (two-tailed) at …

Table 5. Multiple regressions of the phytoplankton biodiversity indices on the KEIs.