Data Analysis of Beach Sands’ Chemical Analysis Using Multivariate Statistical Methods and Heavy Metal Distribution Maps: The Case of Moonlight Beach Sands, Kemer, Antalya, Turkey

: Multivariate statistical methods are widely used in several disciplines of fundamental sciences. In the present study, the data analysis of the chemical analysis of the sands of Moonlight Beach in the Kemer region was examined using multivariate statistical methods. This study consists of three parts. The multivariate statistical analysis tests were described in the ﬁrst part, then the pollution indexes were studied in the second part. Finally, the distribution maps of the chemical analyses and pollution indexes were generated using the obtained data. The heavy metals were mostly observed in location K1, while they were sorted as follows based on their concentrations: Mg > Fe > Al > Ti > Sr > Mn > Cr > Ni > Zn > Zr > Cu > Rb. Also, strong positive correlations were found between Si, Fe, Al, K, Ti, P. According to the results of factor analysis, it was found that four factors explained 83.5% of the total variance. On the other hand, the coe ﬃ cient of determination ( R 2 ) was calculated as 63.6% in the regression model. Each unit increase in the value of Ti leads to an increase of 0.022 units in the value of Si. Potential Ecological Risk Index analysis results (RI < 150) revealed that the study area had no risk. However, the locations around Moonlight Beach are under risk in terms of Enrichment Factor and Contamination Factor values. The index values of heavy metals in the anomaly maps and their densities were found to be successful; and higher densities were observed based on heavy metal anomalies.


Introduction
Just as a single factor can affect any formation in nature, several factors can also affect it. Fundamental sciences, particularly multivariate statistical methods, are used to examine, analyze, and interpret these formations in nature. Scientific studies examining heavy metals content in food, water, and the soil have gained prominence in terms of the functioning of the ecosystem and the quality of human life. Therefore, the number of studies on beach sands increased [1][2][3][4][5]. In these studies, as well as many other studies, statistical evaluations were conducted [6][7][8][9][10][11][12][13][14][15][16][17][18].
In this context, the interpretation of beach sand data using the multivariate statistical analysis comes to the forefront. Therefore, multivariate statistical analyses such as principal component analysis (PCA), cluster analysis (CA), etc. are used to reveal the relationship between variables and the formation they affect [19][20][21][22]. Besides, the adopted multivariate statistical approach and recent advances in mathematics show how copulas may represent an efficient tool to investigate the statistical behavior of dependent variables [23][24][25][26].
The study site is the world-famous Moonlight Beach (also known as Ayisigi Beach, Kemer, Antalya, Turkey) which is located on the southern coast of Turkey. Located in very close to downtown Kemer, the beach has a length of 350 m and a width of 25 m, and it is a sandy beach without any pebbles ( Figure 1).
Samples were collected from the study site by a random sampling method. The samples were collected from three different points in each location, 10 m from the sea on the beachside. For each sample, a total of 3 kg of sand (sediment, sediment) was collected from a depth of approximately 10 cm using a sterile shovel. The coordinates of the location of each sample were obtained by using a GPS device. The samples were dehumidified by keeping them at 105 • C for 24 h in an oven, and they were dried to have a constant weight. Then, an agate mortar was used to ground the samples until they got the mesh size of 200 µm, later, the chemical analyses were carried out by standard methods using XRF. The oxide values of the samples, which had been subjected to the chemical analyses, were recalculated using background values; in the end, the element contents of the samples were determined.
Statistical analyses of the data about the heavy metal concentrations of the samples were conducted using SPSS 23 and Stata 14 software packages. Also, the density maps were prepared using ArcGIS software. MS Excel was used to calculate index values.
Firstly, descriptive statistics were calculated for the data set of heavy metal concentrations obtained from the sand samples collected from the beach, then Shapiro-Wilk and Kolmogorov-Smirnov normality tests were performed on the data set. The data group was tried to be interpreted by using multivariate statistical methods. In the present study, correlation analysis, factor analysis, and regression analysis were used. The spatial distribution of heavy metals along the beach was explained using density maps of the data set. Then, the pollution indices of the beach sands of Moonlight Beach were determined using pollution index calculation methods.

Correlation Analysis
The strength and direction of the relationship between two variables are determined using correlation analysis. This relationship is determined according to the value of the correlation coefficient, which is between −1 and +1. If the value of the correlation coefficient between two variables is +1 or −1, it is interpreted that the correlation between the variables is perfect. On the other hand, if the value of this coefficient approaches zero, the strength of the correlation decreases. The sign of the coefficient indicates the correlation's direction. Accordingly, the sign of + shows a positive correlation, while the sign of − shows a negative correlation. Usually, the following four correlation measures are calculated: The Pearson's correlation coefficient (also known as Pearson product-moment correlation), Spearman's correlation coefficient, the Point-Biserial correlation coefficient, and Kendall rank correlation coefficient [44]. These coefficients are used based on the data's meeting several statistical assumptions. Since the data set did not show normal distribution, Spearman's correlation coefficient was used instead of the Pearson's correlation coefficient in the present study. The equation of Spearman's correlation coefficient is as follows: where ρ denotes the Spearman's correlation coefficient, d i is the difference between ranks of the corresponding variables, and n denotes the number of observations.

Factor Analysis
As a multivariate statistical analysis method, factor analysis is used in several disciplines. In particular, it is widely used to transform a large number of variables into a small number of significant variables. In this analysis technique, the maximum common variance is calculated using all variables, and they are put in a common factor. We can use these new factors extracted in other analyses. Factor analysis is divided into two methods as exploratory and confirmatory according to its purpose. Exploratory factor analysis is preferred when a theory will be suggested by finding a factor using the relationship between variables; on the other hand, confirmatory factor analysis is preferred when the relationship between variables will be used to test a previously known theory [45,46]. A factor model can be defined as follows: . .
where x is a random data matrix, which has k variables and n units, and u is the mean vector of x. The coefficient of l ij denotes the factor loading of the ith variable on the jth factor. It indicates the factor loadings on the factor. Since F j affects all observed factors, it is referred to as the common factor (i = 1, . . . , k; j = 1, . . . , m; m < k ) [45,46].
The data set is expected to meet the following assumptions to apply factor analysis: The data set should have no outliers. Also, there should be more samples than the number of factors, i.e., the sample size should be sufficient. There should be no multicollinearity between variables. The variables are not required to have a fixed variance between them. The variables are required to satisfy the linearity requirement. Finally, the variables are required to satisfy the assumption of multivariate normal distribution. However, the last assumption is not required when the Principal Component Analysis method is selected.

Regression Analysis
Regression analysis is referred to as a statistical modeling method. It is used to predict the correlation between one or multiple independent variables (explanatory variable) and one dependent variable (explained variable).
The multiple regression model is defined as follows: In this equation, Y i stands for the ith value of the dependent variable observed, X ij stands for the ith value of the jth independent variable, β j stands for jth estimated regression coefficient, ε denotes the error term (residual), and k stands for the number of independent variables [47].
The regression model in the multiple regression is required to have a proper specification. In other words, it should be ensured that appropriate independent variables are used in the model. The variables are required to have a linear relationship. There should be no outlier in the data set. There should be no correlation between errors and independent variables. The values of dependent and independent variables should show equal variance. The errors are expected to be distributed close to the normal distribution. There should be no multicollinearity problem, i.e., high correlation between variables.
The most frequently used methods for estimating the coefficients are the least-squares method and the maximum likelihood method.

Calculation of Contamination Indices
Heavy metal contamination in sediments is determined using the Contamination Factor (CF) index [48,49].
The Pollution Load Index (PLI), on the other hand, indicates heavy metal concentrations. It is derived from concentration factors and summarizes the overall heavy metal toxicity level in a given sample [50,51].
The Enrichment Factor (EF) examines the ratio of a sample's elemental concentrations to the elemental concentrations in the surficial Earth's crust [49]. The thresholds used in examining this ratio are given in Table 1 [52,53]. The potential ecological impact of heavy metals in sediments on organisms in marine ecosystems can be investigated using the Potential Ecological Risk Index (RI) [54]. Potential Ecological Risk Index (RI) can be calculated by multiplying the result of the Contamination Factor (CF) and the toxicity coefficient. The total value of each metal calculated gives the potential ecological risk index of that sample. The following toxicity coefficients are used in the equation: (T f ): Mn = 1, Cr = 2, Ni = 5, and Zn = 1.

Concentration and Descriptive Statistics in Sediment
The coordinates of the locations where 20 beach sand samples were taken and the descriptive statistical values of 21 elements found in these samples are given in Table 2. The concentration values of the elements in these samples are also shown in the same table. According to the average values of the concentrations, the elements are listed in a descending order as follows: While determining the distribution of the variable, the extension of the distribution toward the right or left tail is called skewness. Skewness is used to determine the degree to which the distribution is symmetrical [55][56][57][58][59]. Skewness value is used to analyze the asymmetrical distribution of elements. If skewness = 0, the series is distributed symmetrically. The series becomes asymmetrical in line with the difference of the skewness value from zero. The series is right-skewed (positive) when the skewness > 0, and the series is left-skewed (negative) when the skewness < 0, which means it is not symmetrical. In the case of the right-skewed data, usually, the arithmetic mean will be greater than the mode and median values. On the other hand, the arithmetic mean is usually smaller than the mode and median values in the case of the left-skewed data [55]. In the present study, the elements of Ca, Si, Mg, and Cr have left-skewed curves while the other elements have right-skewed curves. The elements with a right-skewed curve tend to have higher values compared to other elements.
The kurtosis, on the other hand, is used to determine the height of the distribution of the same variables [55][56][57][58][59]. If the kurtosis value equals to 3, the curve has a mesocortical shape. In this case, there is no kurtosis in the curve, and there is a symmetrical distribution. A leptokurtic curve is seen when kurtosis > 3, which means that the curve is asymmetrical and the probability of extreme values is high. On the other hand, a platykurtic curve is formed when kurtosis < 3, which means that the curve is asymmetrical, and the data is flatter, "less peaked", and spread over a wider area compared to the normal distribution [55]. In the present study, the elements with a kurtosis value less than 3 are found to be Ca, Si, K, Ti, Mn, Cr, Ni, Sr, Pb, Ba, Zn, and they have asymmetric curves. These elements have platykurtic curves. On the other hand, other elements are asymmetric and have leptokurtic curves.
The Shapiro-Wilk and Kolmogorov-Smirnov normality tests were conducted to examine whether the data consisting of the element concentrations in the beach sand samples had a normal distribution ( Table 3). The hypotheses below were tested for these normality tests: Hypothesis 1 (H 1 ). The data fit the normal distribution.

Hypothesis 2 (H 2 ).
The data do not fit the normal distribution. H 1 hypotheses were accepted when the significant (sig.) value of the Z statistics was greater than 0.05, while H 1 hypotheses were rejected for other values. According to the results of hypothesis tests, Si, Mg, Na, Ti, Mn, Cr, and Sr were found to show normal distribution, while other data did not fit the normal distribution.

Correlation Analysis
The correlation coefficients determined based on the results of chemical analyses of beach sands were presented in Table 4. The correlation coefficient was calculated using Spearman's rank-order correlation. The following elements were found to have strong positive correlations (0.01 level) with the other elements mentioned: Si with Fe, Al, K, Ti, and P; Fe with Al, K, Ti, Mn, and P; Al with K, Ti, P; Na with S; K with Ti, Mn, and P; Ti with Mn and P. On the other hand, high negative correlations (0.01 level) were found between Mg and Ba and between Na and Zn. There is a negative correlation (0.05 level) relationship between Ca and Si, Al, K, Ni. However, the following elements were found to have a negative correlation (0.05 level) with the other elements mentioned: Ca with Si, Al, K, and Ni; Mg with P, Sr, Ba, and Zn; S with Zn; Sr with Zr. Metals showing a high positive correlation were interpreted to be of similar origin [56][57][58][59]. These metals were found to be associated with Si. However, the elements with negative correlations were thought to have different origins. Moreover, Ca constitutes a different origin group from Si.

Factor Analysis
The Kaiser-Meyer-Olkin (KMO) test statistic was used to examine the suitability of the data of the Moonlight Beach coastline for factor analysis. Since the test statistic was found to be 0.52 > 0.5, the data set was considered suitable for factor analysis. Moreover, the test statistics for Bartlett's Test of Sphericity was found to be 301.027 and the significant value was calculated as 0.000. Since p < 0.01, the null hypothesis was rejected, which meant that the correlation matrix was not equal to the identity matrix. Therefore, the dataset was also found to be suitable for factor analysis according to Bartlett's Test of Sphericity (Table 5).
Since all of the data did not fit the normal distribution, factor loadings were studied using the Principal Component Analysis. It was thought that the components showing a strong correlation indicated the source that provided the strength of the correlation between those components [60,61].
The proportional changes for each analyzed element are given in Table 6. The elements which had the initial value 1 did not lose much significance at the end of the dimension reduction operation through factor analysis. These values were found to be close to 1 ( Table 6).
The levels of explained variance based on the factor analysis are given in Table 7. According to the factor analysis, four factors were obtained for 14 elements. With the value of 30.584%, the first factor has the highest explanatory power. The other four factors obtained explains 83.540% of the total variance. Since the level of explained variance is high (close to 1), the obtained factors have high explanatory power.
Principal component analysis (PCA) has a key role in statistical analysis [9,15,20,62]. It was used to determine the differences between beach sand samples based on the variables. A Rotated Component Matrix was obtained to see factor loadings more clearly. In the present study, four principal components were determined using the SPSS 23 software package. According to the Rotated Component Matrix, the elements of Al, K, Ti, Si, and Mg constituted Factor 1 (Table 8). Similarly, Na, S, Ni, Ca, and Sr constituted Factor 2, while Mn, Cr, and Fe constituted Factor 3. Finally, the element of P constituted Factor 4.
The eigenvalue is an important measure for observing the number of components determined in factor analysis [21,55,62]. According to eigenvalues, the breakpoint formed by the values in the plot causes the curve to have a new slope due to a significant change in the variance 4; thus, the curve turns into an approximately flat line after point 6 ( Figure 2). It was found to be compatible with four factors previously determined and became an important indicator for following the change of these factors. Since the breakpoints in the curve are not sharp, it is interpreted that the number of variables affecting the study area might be high.

Regression Analysis
The dependent variable to be explained by regression analysis is Si, while the independent variables selected to explain this variable are Sr, Mn, Mg, Na, Cr, and Ti. The R 2 value, the coefficient of determination of the regression model, was found to be 63.6%. These independent variables explain 63.6% of the dependent variable. The value of the variance inflation factor (VIF) was examined in terms of multicollinearity assumption; and the maximum VIF value was found to be 2.916 (<10), which was an acceptable value. There was no multicollinearity problem. The Durbin-Watson statistic was determined to be 1.597. According to the ANOVA table, the significant value was calculated as 0.021. According to these results, the feature and number of data used in the statistical analysis were found to be sufficient (Table 9). Examining the ANOVA table of the model, it is seen that the F test value is 3.790 and the significant value is 0.021 (<0.05). According to these figures, the model was found to be statistically significant.
According to the values of the regression coefficients in the model, the effects of two elements on the Si value were found to be statistically significant. One unit increase in the Ti value results in an increase of 0.022 units in the Si value. On the other hand, one unit increase in Cr value causes a 0.069 unit increase in Si value.
The significance values of the t statistics for the constant-coefficient and the elements of Mg, Na, Mn, and Sr were found to be higher than 0.05. Therefore, they were not found to be statistically significant.

Metal Concentrations and Distribution in Baech
Depending on the characteristics of the locations, some heavy metals may accumulate at the locations [6,7,19,21,34,[52][53][54][55]61]. The distribution maps of these deposits are important in terms of showing the differences and similarities of the elements and heavy metals' general distribution on the site [11,34]. In this context, distribution maps of the heavy metal concentrations of Al, Fe, P, Ti, which were the metals representing the first factor, in the beach sand samples from Moonlight Beach were prepared (Figure 3). According to these maps, the beach sand samples of Moonlight Beach were found to have very low values in terms of heavy metal anomalies. It is evident that the beach, where anthropogenic activities are intense, are surrounded by heavy metals.

Results of Index
Various "index" calculations were used to determine the toxicity risk of the beach and to calculate the quality of the beach sand [37,48,49]. These calculated values were compared with the national Maximum Permissible Concentration (MPC) [61]. The following indices were used in an integrated manner: Pollution Load Index (PLI), Potential Ecological Risk Index (RI), Contamination Factor (CF), and Enrichment Factor (EF). Fe was considered to be the "conservative element" in the calculation of Enrichment Factor (EF) (Tables 10 and 11). Considering the distribution of Potential Ecological Risk index values of beach sand samples from Moonlight Beach, location K1 stands out with a value of 65.73, which is the highest value. However, this value is well below the threshold value (RI < 150) (Table 1). However, according to PLI, all of the locations with a value of less than 1, except for the locations of K3, K13, and K2, were found to be polluted (PLI > 1: Polluted). On the other hand, CF and EF values were found to be 0 (zero) in location K2. However, in other locations, they were greater than 1 (one). The national Maximum Permissible Concentration (MPC) values for heavy metal concentrations have been determined on a national scale, [61,63]. The distribution map of the Potential Ecological Risk Index (RI) values of the study area is shown in Figure 4. The values of all of the locations were found to have lower than the national Maximum Permissible Concentration (MPC) values [61].

Conclusions
According to the data obtained from the chemical analyses of the chemical contents of the samples taken from the study area and the findings obtained from the statistical analyses and distribution maps of these data can be listed as follows: The elemental contents of the beach sand samples collected from Moonlight Beach are ordered according to their abundance as follows: Ca > Si > Mg > Fe > Al > Na > K > Ti > S > Sr > Mn > Cr > Ba > P > Ni > Zn > Zr > Cu > Rb. Location K1 draws attention in terms of having the highest heavy metal anomaly (Fe, Cl, Al, Mn, Cr, Ni, Br). Location K18 has the richest Ca content while the location K14 has the richest Si content.
According to the results of the Shapiro-Wilk and Kolmogorov-Smirnov normality tests, the contents of all the other elements except for Si, Mg, Na, Ti, Mn, Cr, and Sr are not normally distributed. Therefore, factor loadings of the data were calculated using Principal Component Analysis.
Heavy metals with a strong positive correlation (0.01 level) with Si are ordered as Fe, Al, K, Ti, P. This relationship suggests that these metals are of similar origin. On the other hand, Ca shows a different correlation behavior from Si and heavy metals strongly correlated with Si.
The results of the Kaiser-Meyer-Olkin (KMO) test (0.52 > 0.5) revealed that the data set was suitable for factor analysis. According to the results of Bartlett's Test of Sphericity (301.027; significant 0.000; p < 0.01; the null hypothesis is rejected; the correlation matrix is not equal to the identity matrix), the data set was found to be suitable for factor analysis.
According to the results of Principal Component Analysis, by which four factors were extracted from 14 heavy metals, 83.540% of the total variance is explained by these factors. According to eigenvalues, the breaking point causes a significant change in variance 4. This case was found to be consistent with PCA. The first factor explains 30.584% of the total variance. The first factor, which was determined by Rotated Component Matrix, consists of the elements of Al, K, Ti, Si, and Mg, and they are consistent with the elements that show a strong correlation.
The value of the coefficient of determination (R 2 ) of the regression model was calculated as 63.6%. The highest value of the variance inflation factor (VIF) was calculated as 2.916 (<10), and the Durbin-Watson coefficient was determined as 1.597. According to the ANOVA table, the F test value was 3.790, and the significant value was 0.021 (<0.05). It was found that there was no multicollinearity problem, the data were sufficient, and the regression model was significant. According to the coefficients of the regression model, one unit increase in the value of Ti causes a 0.022 unit increase in the value of Si.
According to distribution maps of heavy metal anomalies, location K3 has the minimum anomaly. In particular, Moonlight Beach has no problem with its beach sands in terms of heavy metal anomalies. However, the promontory part on the west of the beach, the creek on the east of the beach, and the parking area of the boats show heavy metal anomalies. Heavy metal entry to the beach in these areas should be prevented.
The Potential Ecological Risk Index (RI) analysis conducted for the study area revealed that the risk index was RI < 150. In this case, the entire study area was found to have no potential ecological risk.
The heavy metal analyses stated in the manuscript can be repeated for certain periods. In the case of detecting heavy metal increases in the area, emergency action plans can be prepared. The coast can be protected by sharing this data with local governments. Moreover, the procedure can be repeated to eliminate the risk of increasing heavy metal values. Also, measures can be taken to avoid anthropogenic pollution.
Funding: No external funding was received for this research.