Correlation of Environmental Parameters and the Water Saturation Induced Deterioration of Earthen Archaeological Sites: The Case of World Heritage Liangzhu City, China

Abstract: This paper proposes a combined methodology for the quantitative analysis of the correlations between the monitored influencing environmental factors and the water saturation induced deterioration of earthen relics in a humid area. The Archaeological Ruins of Liangzhu City, which have been exposed and severely damaged in a humid environment with high water content and dry–wet cycles, are chosen as an example. A monitoring system covering atmospheric, groundwater, and soil moisture conditions, together with images of the surface, was installed. Based on the proposed methodology, 11 key influencing indexes for the water saturation induced deterioration, involving groundwater, soil moisture and temperature at different depths, atmospheric radiation, and rainfall, are investigated, and their correlation is described by a regression model. The weight rankings of the influencing factors for the deterioration of the research area are calculated. The results can help to quantitatively control the atmospheric environment where the earthen relics are located and can promote the conservation of archaeological ruins in humid environments.


Introduction
Sites with soil as the main building material are called earthen sites. As places where cultural relics are preserved, earthen sites are an important component of cultural heritage that demonstrates the unique spiritual value, vitality, and creativity of a nation [1]. When an earthen site is located in a humid environment, the stability of its historic constructions is degraded by high water content, high temperature, and dry–wet cycles [2,3]. When the soil is softened by rainfall and groundwater, the archaeological relics suffer severe damage due to soil erosion, cracks, water seepage, fragmentation, soil collapse, instability, and even the total loss of structure [4-8]. Currently, the protection of earthen sites has gradually become the focus of multidisciplinary scientific and technological research, drawing on chemical, image-based, monitoring, and laser scanning technologies (Table 1).

Technology | Key Results | References
Chemical Technology | Alkaline activation as an alternative consolidation treatment for clayey materials | Elert et al. 2015 [9]
Chemical Technology | A new type of grout material composed of gypsum, quicklime, soil, and admixtures for crack repair | Lv et al. 2017 [10]
Chemical Technology | Tentative protection treatment of earthen monuments with tung oil and quicklime | Li et al. 2012 [11]
Image-based Technology | Image-based technology for mapping, restoring, and interpreting archaeological heritage | [19]

Research has been carried out on many earthen sites (Table 2). However, the damage processes and influencing factors are complex and often difficult to identify and evaluate quantitatively. The correlation between environmental factors and site deterioration is often site-specific, so a common model has not yet been proposed. For earthen relics in a humid environment, the saturation zone is the surface manifestation of the degradation caused by rainfall and groundwater. For a historic soil slope, the phreatic line is the intersection between the free water surface and the cross section of the slope body, formed as water seeps from the upstream surface of the slope, through the slope body, and downstream. The soil below the phreatic line is saturated and affected by the water seepage. Therefore, the position and shape of the phreatic line in the slope greatly influence the shear strength of the soil material, the stability of the slope, and the seepage stability of the soil material. Determining its location is an important part of the seepage analysis and stability analysis of a soil slope. Furthermore, effectively lowering the phreatic line is a research topic in practical engineering.

China has a long history and culture characterized by a large number of cultural relics and historical sites [26]. The Archaeological Ruins of Liangzhu City are a typical representative of earthen sites.
In October 2018, a monitoring system for Liangzhu City was established and initiated. The monitoring includes overall topographic features, the characteristics of important archaeological relics, the cracks of open relics, environmental parameters, tourist management, archaeological excavation records, etc. In this study, a methodology for the analysis of the quantitative correlations between the influencing environmental factors and the water saturation induced deterioration of earthen relics is proposed.

Site Description
The Neolithic Archaeological Ruins of Liangzhu City (about 3300-2300 BCE) were added to the UNESCO World Heritage List during the 43rd session of the World Heritage Committee in Baku (Azerbaijan) on 6 July 2019. The ruins, with a core area covering 14.3 km², are located in the Yangtze River Basin on the southeastern coast of China. They are considered an important representation of an early urban civilization with rice-cultivating agriculture as its economic foundation [27,28]. Liangzhu City has a well-planned structure consisting of a palace area, altar, moated city, outer wall, and a water conservancy system [29,30]. The peripheral Water Conservancy System of Liangzhu City, including the Area of High-Dam, the Area of Low-Dam, and the causeway in front of the mountains, was built about 5000-4840 years ago. It served multiple functions, such as flood control, water storage, irrigation, and transportation, and thus represents a comprehensive water management strategy. It was designed through unified planning at the beginning of the construction of Liangzhu City. The peripheral Water Conservancy System is the earliest large-scale water conservancy project site found in China so far and one of the earliest dam systems in the world. Liangzhu City has a subtropical monsoon climate. Figure 1 shows the location of Liangzhu City. The property of Liangzhu City is composed of four areas: (i) the Area of Yaoshan Site, (ii) the Area of High-dam at the Mouth of the Valley, (iii) the Area of Low-dam on the Plain, and (iv) the Area of City Site. Currently, only four relics in Liangzhu City have been excavated: the Mojiaoshan Palace, the Laohuling dam site, the southern wall, and the Yaoshan altar. All four relics have been damaged by rainfall and groundwater since their excavation. The Laohuling dam, which is part of the Area of High-dam, was excavated in 2016 (Figure 2).
The Laohuling dam is mainly composed of silty clay soil, and its hydrological conditions are closely related to the surrounding environment. The open Laohuling dam relic has an inverted "U" shape, which means that water flows to the Laohuling dam from the nearby mountains. It can be divided into the east section, south section, and west section. The east section runs from southwest to northeast and faces northwest, with a length of about 14 m and a relative height from 2.1 m to 4.1 m. The south section runs from southeast to northwest and faces northeast, with a length of about 33 m and a relative height from 3.2 m to 5.9 m. This area represents the main excavation section. The west section runs from south to north and faces east, with a length of about 18.7 m and a relative height from 2.9 m to 4.1 m. According to an investigation of the Archaeological Ruins of Liangzhu City, typical structural deterioration in the Laohuling dam relic was caused by the humid climate (Figure 3). Firstly, numerous soil cracks appeared when the dam relic was exposed to the air, and some of the fracture channels experienced water seepage during rainy conditions. Secondly, a high saturation line was formed by capillary water infiltration, and the mechanical properties of the soil at the foot of the slope were weakened by immersion in water. Thirdly, due to alternating dry–wet cycles, the silty clay soil started to shrink and swell, which resulted in the expansion of cracks and numerous local collapses. Fourthly, the collapses and holes in the slope expanded, leading to large and visible deformations. Since the excavation in 2016, the instability of the structure, the surface cracks, the holes, and the loss of soil have been investigated. The results of the field survey showed that the Laohuling dam relic is a representative area for research on earthen archaeological sites, as it contains almost all types of damage occurring at a humid earthen site.

Monitoring System
The Liangzhu Site Management Committee monitors the environmental conditions of the open Laohuling dam relic. This mainly includes monitoring atmospheric, groundwater, and soil moisture conditions. This monitoring system has been operating since October 2018, and its components are shown in Table 3.
Table 3. Monitoring components of the Laohuling dam relic.
In addition to the monitored components in Table 3, periodic images of the Laohuling dam site were taken with installed high-resolution cameras to collect daily information on the important parts with cracks, holes, or other damage. The camera can rotate automatically according to a preset direction of 45°, covering the surface of the dam site. A digital single-lens reflex (SLR) camera with 4000 k resolution and a maximum focal length of 20 mm was chosen. Manual inspection was also performed daily in order to protect the archaeological relics. The environment in the glass shed differs from that outside, especially on summer days and rainy days. The current monitoring system only focuses on the meteorological, groundwater, and soil information surrounding the study area. As the dimensions of the Laohuling dam site are small, the environmental monitoring device is positioned in the center of the dam area. In order to improve the correlation analysis, more indexes need to be monitored in the future.

Identification of the Saturation Area
The saturation of the slope foot can weaken the stability of the dam site by reducing the shear strength of the soil. The photos of the saturation area were taken by the SLR camera. The dimensions of the saturation area can be identified by image recognition technology. After applying the algorithm to identify the saturation area, the position and time of the image with the best recognition effect can be recorded as the measuring point for further data acquisition.

Image Recognition
The gray clustering algorithm was applied to separate the saturation area from the dry area. The main steps for identifying the saturation area were as follows (Figure 4):
1. To improve algorithm efficiency, a square of 9 pixels by 9 pixels was selected as the clustering object. The average pixel value of the gray and blue channels in the square was calculated. Then, the average value of these two channels was used as the clustering feature. As clustering is an unsupervised learning method, it can only divide the image pixels into two categories; it cannot indicate whether the black or the white category belongs to the saturation area. Therefore, further feature screening was needed to confirm the category. Here, we set white to represent the saturation area.
2. There were a large number of discrete points in the output of the clustering step. Therefore, we applied a morphological method for further image optimization. Morphological open operations were applied to smooth the regional boundary and eliminate discrete pixels.
3. After the application of the morphological open operations, a few scattered white areas remained that had been mistakenly identified as saturation area. Connected-domain filtering was used to classify the regions, and small, scattered white areas were removed from the saturation area.
4. After filtering the connected domains, the saturation area was identified. However, there were a few black pixels in the white area, so we used the morphological close operation to fill them. In this step, all cavities in the region were filled and adjacent regions were connected.
5. Contour detection was used to mark the edge of the saturation area. Accordingly, the edge of the saturation region was identified, and the pixel area of the saturation region was calculated.
Finally, the area in the red circle in Figure 4 was identified as the saturation area. The area detected by the algorithm matched the actual situation.
The image recognition process can be used to calculate the dimensions of the saturation area, which is an indicator of the deterioration of the dam site. As the camera takes 12 photos every day, an automatic process is necessary. Thus, the image recognition process was coded so that the saturation area is identified automatically when the images are input.
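The five steps above can be illustrated with a minimal NumPy sketch. It is not the code used in the study: the gray conversion (plain channel mean), the rule that the darker cluster is the wet one, the 3 × 3 structuring element, the 4-connectivity, and the `min_size` threshold are all assumptions made for illustration, and contour drawing (step 5) is reduced to counting the pixels of the final mask.

```python
import numpy as np
from collections import deque

def block_features(rgb, block=9):
    """Step 1: mean of the gray and blue channels for each block x block tile."""
    h, w = rgb.shape[0] // block * block, rgb.shape[1] // block * block
    rgb = rgb[:h, :w].astype(float)
    gray = rgb.mean(axis=2)                      # simple gray conversion (assumption)
    feat = (gray + rgb[..., 2]) / 2.0            # average of gray and blue channels
    return feat.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def two_means(values, iters=20):
    """Step 1: 1-D k-means with k = 2; True marks the darker cluster."""
    lo, hi = values.min(), values.max()
    for _ in range(iters):
        dark = np.abs(values - lo) <= np.abs(values - hi)
        if dark.all() or not dark.any():
            break
        lo, hi = values[dark].mean(), values[~dark].mean()
    return dark

def _neighbourhood(mask, pad_value):
    """Stack the 3 x 3 neighbourhood of every cell along a new first axis."""
    p = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(mask):
    return _neighbourhood(mask, True).all(axis=0)

def dilate(mask):
    return _neighbourhood(mask, False).any(axis=0)

def keep_large_regions(mask, min_size):
    """Step 3: drop 4-connected regions smaller than min_size cells."""
    out, seen = np.zeros_like(mask), np.zeros_like(mask)
    h, w = mask.shape
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        queue, region = deque([(r, c)]), []
        seen[r, c] = True
        while queue:
            y, x = queue.popleft()
            region.append((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(region) >= min_size:
            ys, xs = zip(*region)
            out[list(ys), list(xs)] = True
    return out

def wet_pixel_area(rgb, block=9, min_size=4):
    wet = two_means(block_features(rgb, block))  # assumption: darker cluster = wet
    wet = dilate(erode(wet))                     # step 2: morphological opening
    wet = keep_large_regions(wet, min_size)      # step 3: connected-domain filtering
    wet = erode(dilate(wet))                     # step 4: morphological closing
    return int(wet.sum()) * block * block        # step 5: pixel area of the region

# Synthetic check: bright upper half, dark lower half, one isolated dark tile.
demo = np.full((90, 90, 3), 200, dtype=np.uint8)
demo[45:, :] = 40
demo[9:18, 9:18] = 40
print(wet_pixel_area(demo))  # 4050: fifty 9 x 9 tiles; the isolated tile is removed
```

Clustering on 9 × 9 tiles rather than on single pixels keeps the feature array small, which is the efficiency argument given in step 1.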

Identification of the Saturation Area of Laohuling Dam Site
A small region with the largest saturation area at the foot of the Laohuling dam site was selected as the study area. During 2019, 678 images of the study area were collected. However, only 191 images showed the deterioration area clearly. The other images could not be used for recognition because of shadows cast by sunlight. The proposed image recognition algorithm was applied to identify and calculate the saturation area. The area of wet pixels (i.e., white pixels) ranged from 11,853 to 797,762. The pixel area can be converted into the actual dimensions of the saturation area by applying information on the shooting angle, shooting distance, and lens focal length. Due to the lack of this information, the pixel area was used in the analysis. Among the days of the year, 171 days had a wet pixel area of less than 10,000 pixels (including days with missing images and dry days), which accounted for 46.85% of the whole year; 24 days had a wet pixel area of 10,000-200,000, which accounted for 6.58% of the whole year; 40 days had a wet pixel area of 200,000-400,000, which accounted for 10.96% of the whole year; 87 days had a wet pixel area of 400,000-600,000, which accounted for 23.84% of the whole year; and 21 days had a wet pixel area of 600,000-800,000, which accounted for 5.75% of the whole year.
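The percentages above follow from dividing each day count by the 365 days of 2019. The short sketch below reproduces that arithmetic; the dictionary labels are illustrative names, not identifiers from the monitoring system. It also reports the sum of the five day counts, which is 343.

```python
# Day counts per wet-pixel-area bin, as reported for 2019.
DAYS_IN_2019 = 365
day_counts = {
    "<10,000 (incl. missing and dry days)": 171,
    "10,000-200,000": 24,
    "200,000-400,000": 40,
    "400,000-600,000": 87,
    "600,000-800,000": 21,
}

# Share of the year for each bin, in percent, rounded to two decimals.
shares = {label: round(100 * days / DAYS_IN_2019, 2)
          for label, days in day_counts.items()}
days_covered = sum(day_counts.values())

for label, share in shares.items():
    print(f"{label}: {share}%")
print("days covered by the five bins:", days_covered)  # 343
```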

Correlation Analysis and Factor Analysis
The following processes are used to analyze the correlation between the environmental parameters and the saturation area.
First, the correlations between the monitored environmental parameters were assessed by correlation analysis. A multi-collinearity analysis was performed for the data groups with high correlation coefficients. The variance inflation factor (VIF) and the condition number were also used to determine whether multi-collinearity existed.
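As an illustration of these two diagnostics, the sketch below computes them with NumPy on synthetic data. It relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix, and it defines the condition number as the square root of the ratio of the largest to the smallest eigenvalue of that matrix. The data and the thresholds mentioned in the comments (VIF above 10, condition number above 30) are common rules of thumb assumed here, not values taken from this study.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    """sqrt(largest / smallest eigenvalue) of the correlation matrix."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))  # ascending order
    return float(np.sqrt(eig[-1] / eig[0]))

# Synthetic indicators: x3 is almost a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
cond = condition_number(X)
print(np.round(vif, 1), round(cond, 1))
# Rule of thumb (assumption): VIF > 10 or condition number > 30 suggests
# that the indicators involved should not enter a regression together.
```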
Secondly, multiple regression analysis was carried out. Data were collected at different temporal scales (e.g., seconds, hours, days), and the daily average was calculated for each factor. Multiple correlation analysis was performed to test the correlation between the monitoring data (x) and the saturation pixel area (y). Regression analysis and recursive feature elimination were applied to select the monitoring indicators with statistical significance. Afterward, the regression model can be established. In the current research area, only one year of data is available to establish the regression model, which is not reliable enough. The regression model needs to be optimized after data have been collected for more years.
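A compact sketch of this step is given below under stated assumptions: timestamps are taken to be seconds since the start of the record, the regression is ordinary least squares, and recursive feature elimination is simplified to dropping, at each round, the indicator with the smallest absolute standardized coefficient. The function names and the synthetic data are hypothetical and only illustrate the procedure.

```python
import numpy as np

def daily_mean(timestamps_s, values):
    """Average readings per day; days without readings become NaN."""
    days = (np.asarray(timestamps_s) // 86400).astype(int)
    sums = np.bincount(days, weights=np.asarray(values, dtype=float))
    counts = np.bincount(days)
    out = np.full(len(counts), np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly drop the indicator with the weakest standardized coefficient."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        coef, _ = fit_ols(Z[:, keep], y)
        keep.pop(int(np.argmin(np.abs(coef[1:]))))  # coef[0] is the intercept
    return keep

# Daily averaging of irregular readings (two readings on each of two days).
print(daily_mean([0, 3600, 86400, 90000], [1.0, 3.0, 10.0, 20.0]))  # [ 2. 15.]

# Synthetic regression: only indicators 0 and 2 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=300)
selected = recursive_feature_elimination(X, y, n_keep=2)
_, r2 = fit_ols(X[:, selected], y)
print(selected)  # [0, 2]
```

Standardizing the indicators before comparing coefficients matters here because the monitored quantities have different units.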
Factor analysis is a statistical technique for the extraction of common factors from variable groups, and it decomposes the monitoring indicators into two parts (a common factor group and a special factor group) by applying principal component analysis (PCA). Accordingly, the same information dispersed across multiple variables is concentrated. The Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity should be applied to the original dataset to check whether the data are suitable for factor analysis. The closer the KMO measure is to 1, the better the effect of factor analysis. Bartlett's test of sphericity is used to test whether the variables in the correlation matrix are independent of each other. If the null hypothesis is rejected, factor analysis can be performed. For factor analysis of normalized data, the number of common factor groups can be determined by a scree plot or by retaining the common factors with an eigenvalue greater than 1. Eigenvalues represent the ability of common factors to explain the information of all variables. An eigenvalue greater than 1 indicates that the explanatory power of a common factor is greater than the average explanatory power of one primary variable. Furthermore, the model can retain most of the internal information of the original data structure through factor rotation. Thus, it can better explain the influences of the environmental factors on changes of the saturation area.
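The suitability checks and the eigenvalue criterion can be sketched with NumPy as follows. The formulas are the standard textbook definitions: KMO compares squared correlations with squared partial correlations, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The synthetic six-variable dataset is an assumption for illustration, and the p-value is left out because it would require a chi-square distribution function from another library.

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall)."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)              # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)        # off-diagonal entries only
    r2, p2 = np.sum(R[off] ** 2), np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))

def bartlett_sphericity(X):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    n, p = X.shape
    _, logdet = np.linalg.slogdet(np.corrcoef(X, rowvar=False))
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return float(chi2), p * (p - 1) // 2

def n_factors_kaiser(X):
    """Number of eigenvalues of the correlation matrix greater than 1."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int(np.sum(eig > 1.0))

# Synthetic data: six indicators driven by two independent latent factors.
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
X = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + 0.5 * rng.normal(size=(500, 6))

kmo_value = kmo(X)
chi2, dof = bartlett_sphericity(X)
k = n_factors_kaiser(X)
print(round(kmo_value, 2), round(chi2, 1), dof, k)
# With 15 degrees of freedom, the tabulated 0.001 critical value is about 37.7,
# so a statistic far above it rejects the hypothesis of independent variables.
```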

Regression Model
The correlation analysis of 33 monitoring indicators revealed that 11 monitoring indicators are significantly related to the saturation area. These indicators are: "PH value", "soil moisture 0 cm horizontal", "soil moisture 450 cm vertical", "soil temperature 100 cm horizontal", "soil temperature 50 cm horizontal", "soil temperature 0 cm horizontal", "groundwater level", "soil temperature 400 cm vertical", "atmospheric radiation", "rainfall", and "three days of cumulative rainfall". The optimal multiple regression model was established with these indicators, and its coefficient of determination (R²) was 0.85.
Multiple regression analysis showed that monitoring factors such as soil temperature at different depths have strong correlations with the wet pixel area. The strongest correlation with the wet pixel area (−0.7) was found for "soil temperature 0 cm horizontal". That is, when the value of "soil temperature 0 cm horizontal" increased, the wet pixel area decreased significantly.

Factor Analysis
The KMO statistic was 0.79, and the p-value of Bartlett's test of sphericity was 0, which indicated that the original dataset is suitable for factor analysis. The results of the factor analysis show that five groups of common factors were extracted and retained. The five factors were rotated by the varimax orthogonal rotation to maximize the variance of each factor. After the factor rotation, the matrix of factor loads can be obtained. A factor load represents the correlation coefficient between a variable and a common factor. The five groups of common factors consist of different indicators, as described below.
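For readers unfamiliar with the rotation step, the following NumPy sketch extracts principal-component loadings from a correlation matrix and applies the widely used SVD-based varimax iteration. It is a generic illustration on synthetic data with two latent factors, not the computation performed on the monitoring dataset; `pca_loadings` and `varimax` are hypothetical helper names.

```python
import numpy as np

def pca_loadings(X, k):
    """Loadings of the first k principal components of the correlation matrix."""
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))  # ascending order
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(vals[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and the rotation."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        grad = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        R = U @ Vt                               # closest orthogonal matrix
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:   # criterion stopped improving
            break
        d_old = d
    return L @ R, R

# Synthetic data: indicators 0-3 follow one latent factor, 4-6 another.
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
X = np.column_stack([f[:, 0]] * 4 + [f[:, 1]] * 3) + 0.5 * rng.normal(size=(500, 7))

unrotated = pca_loadings(X, k=2)
rotated, rotation = varimax(unrotated)
dominant = np.argmax(np.abs(rotated), axis=1)    # factor with the largest load
print(np.round(rotated, 2))
print(dominant)
```

Because the rotation is orthogonal, the communality of each indicator (the row sum of squared loads) is unchanged; only the distribution of load across the factors becomes easier to interpret.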
The first group consists mostly of temperature-related monitoring indicators, and it includes: "soil temperature 0 cm horizontal", "soil temperature 50 cm horizontal", "soil temperature 100 cm horizontal", "soil temperature 300 cm vertical", "soil temperature 400 cm vertical", "soil temperature 450 cm vertical", "soil humidity 0 cm horizontal", "soil humidity 300 cm vertical", "soil humidity 400 cm vertical", "atmospheric temperature(q)", "PM25", "groundwater temperature", and "groundwater PH". The highest factor load among the 13 variables was 0.981 for "soil temperature 300 cm vertical", with other soil temperature factor loads greater than 0.9. Accordingly, the common factor is called the temperature factor, and it explains 32.6% of the overall variance.
The second group consists of mostly humidity-related monitoring indicators, and it includes: "soil moisture 50 cm horizontal", "soil moisture 100 cm horizontal", "soil moisture 450 cm vertical", "groundwater level", "PH(q)", and "turbidity". The highest factor load was 0.908 for "soil moisture 100 cm horizontal", and other soil humidity factor loads were greater than 0.8. Thus, the second group of common factors is called the humidity factor, and it explains 17.4% of the total variance.
The third group consists mostly of environment-related monitoring indicators, and it includes: "noise", "illumination", "wind direction(q)", "atmospheric radiation", and "wind speed(q)". The highest factor load was 0.841 for "atmospheric radiation", while the factor load of "illumination" was 0.809. Thus, the third group of common factors is called the environmental factor, and it explains 11.2% of the total variance.
The fourth group consists mostly of rainfall-related monitoring indicators, and it includes: "rainfall", "three days cumulative rainfall", and "CO2". The highest factor load was 0.856 for "rainfall", while the factor load of "three days cumulative rainfall" was 0.843. Thus, the fourth group of common factors is called the rainfall factor, and it explains 7.2% of the total variance.
The fifth group consists of mostly oxygen-related monitoring indicators, and it includes "dissolved oxygen" and "conductivity". The highest factor load was 0.805 for "conductivity", while factor load was 0.611 for "dissolved oxygen". Thus, the fifth group of common factors is called the oxygenated factor, and it explains 5.5% of the total variance.
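Adding up the reported shares gives the total variance captured by the five retained factors. The short sketch below performs this sum and also expresses each factor as a share of the retained variance; these relative shares are plain arithmetic on the reported percentages and are not the weight rankings of Table 5.

```python
# Explained variance (percent of total) reported for each common factor.
explained = {
    "temperature": 32.6,
    "humidity": 17.4,
    "environmental": 11.2,
    "rainfall": 7.2,
    "oxygenated": 5.5,
}

# Total variance captured by the five factors together.
total = round(sum(explained.values()), 1)

# Each factor's share of the retained variance, in percent.
relative = {name: round(100 * value / total, 1)
            for name, value in explained.items()}

print(total)     # 73.9
print(relative)
```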
Regrouping of common factors can prevent information fragmentation caused by the analysis of each monitoring indicator, and it also reduces the analysis of highly correlated factors. Table 5 shows the calculated weight rankings of monitoring factors for the wet pixel area of the research area.

Discussion and Conclusions
Environmental parameters such as temperature, moisture, groundwater, rainfall, and wind can cause severe damage to earthen archaeological ruins. In order to protect the earthen ruins, an advanced monitoring system is usually adopted worldwide. The current study applied correlation analysis, regression analysis, and factor analysis in order to identify the main influencing factors of the saturation induced deterioration of an earthen site.
A regression model that calculates the wet pixel area (representing the saturation area) from the monitoring indexes was established, which can be used to support the environmental control of the protected area. With the model, the change in the dimensions of the saturation area can be calculated and forecasted from the indicators "PH value", "soil moisture 0 cm horizontal", "soil moisture 450 cm vertical", "soil temperature 100 cm horizontal", "soil temperature 50 cm horizontal", "soil temperature 0 cm horizontal", "groundwater level", "soil temperature 400 cm vertical", "atmospheric radiation", "rainfall", and "three days of cumulative rainfall". The regression model should be optimized in the future based on more training data from the monitoring system and from the identification of the dimensions of the saturation area.
Temperature and humidity have been widely demonstrated to be the most significant factors in the destruction of earthen ruins. In the current study, the damage caused by temperature and humidity to different parts of the Laohuling dam relic was classified. The temperature factor group explained nearly one-third of the information in the model, and accordingly, it represents the most important group of common factors. In the temperature group, the most important components are the six monitoring indicators in Table 5. The second most important group is the humidity factor, which explained nearly one-fifth of the model's information. The important components of the humidity factor are "soil moisture 50 cm horizontal", "soil moisture 100 cm horizontal", "soil moisture 450 cm vertical", and "groundwater level". The important indexes from the temperature and humidity factor groups deserve more attention during the design of conservation treatments and routine maintenance.
The proposed methodology could be used to analyze the monitoring data and to calculate the dimensions of the saturation area. The current study provided preliminary results for predicting further saturation-induced deterioration of earthen objects in the Archaeological Ruins of Liangzhu City. The micro-environment in the glass shed of the Laohuling dam site will be controlled in the future by the installation of dehumidifiers and automatic sprinklers, as well as the construction of a waterproof curtain underground. The regression model can provide scientific recommendations for the performance of the environmental control machines. Thus, the conservation and protection of the relics in the humid environment will be more reliable. The image recognition algorithm, correlation analysis process, and factor analysis process can be adopted by other archaeological ruins to analyze monitoring data and acquire useful features from the dataset for protection.