Principal Components Analysis (PCA) of Monument Stone Decay by Rainwater: a case study of “Basílica da Estrela” church, Portugal†

An extended version of the Principal Components Analysis (PCA) of monument stone decay phenomena occurring at the “Basílica da Estrela” church, Lisbon, Portugal, is presented. The PCA rationale and general methodological procedure are described as a first step of a stepwise approach to the eigenvector methods of data analysis. PCA, like other eigenvector methods, seeks to reveal the underlying structure that might exist within a set of multivariate observations. Temperature, pH, electrical conductivity and main ionic species were measured on several seepage water samples collected over three years inside the monument. PCA results are discussed from the perspective of a non-destructive tool.


Introduction
Based on previous studies [1][2][3], an extended version of the Principal Components Analysis (PCA) of monument stone decay phenomena occurring at "Basílica da Estrela", Portugal, is now presented. The "Basílica da Estrela" church is the most relevant 18th-century monument in the city of Lisbon, Portugal. Built with limestone, it is located in a moderately polluted area about 15 km away from the sea. Physical weathering forms, such as granular disintegration, flakes, scales and spalling, prevail inside. However, chemical weathering forms are also present and are largely dominated by calcite re-precipitation forming large white zones. Soluble salts, in contrast, were practically non-existent. Regarding monument stone decay and conservation issues, infiltration of rainwater through the terrace of the church was pointed out as the main problem of the monument [3]. A preliminary presentation of PCA (a plot of the variables in the plane of the first two components) was given in [2]; here the structure of the relations between variables and samples is developed further.
To study the water-rock interaction and the stone decay phenomena occurring at "Basílica da Estrela", chemical and physical analyses were performed on seepage water samples collected over three years inside the church (high choir). Temperature, pH, electrical conductivity and main ionic species were measured on each sample. The data gathered over the sampling period have been processed and are analysed here using Principal Components Analysis (PCA).
PCA is used in this paper to help data interpretation and also as a first step of a stepwise approach to the eigenvector methods of data analysis. Only the rationale and the general methodological procedures of PCA are presented. The mathematics (theoretical and practical manipulation) and the computational essentials underlying PCA implementation are beyond the scope of this paper; interested readers are referred to Davis [4], for instance, on which our description of PCA relies heavily. The analyses in this paper were run with the computer program "STAT-ITCF" (1987, 3rd version); ITCF stands for "Institut Technique des Céréales et des Fourrages".
There are some examples of the use of PCA in studies addressing monument stone decay problems. Moropoulou et al. [5] tried to show how PCA could be a powerful tool in conservation and restoration problems. Bacci et al. [6] used PCA to discriminate areas resulting from degradation processes of calcareous stones due to interaction with atmospheric pollutants in urban and industrialized areas. Benavente et al. [7] applied PCA to analyse the relationship between petrophysical properties and durability when studying salt weathering in dual-porosity building dolostones. Maurício et al. [8] applied Benzécri Multiple Correspondence Factor Analysis (MCFA) to chemical data of salt efflorescences collected at Sta Marija Ta' Cwerra Church in Malta. Torfs et al. [9] also applied PCA, combined with other methods, to the data collected in their study of the environmental effects on the deterioration of the church of Sta Marija Ta' Cwerra, Malta. Zezza et al. [10] also included Principal Components Analysis in their study of the environmental effects on the deterioration of the Cathedral of Bari, Italy.

Data Sampling
To assess the rainwater-induced alteration processes at "Basílica da Estrela", physical and chemical analyses were performed on seventeen seepage water samples collected over three years inside the church at the elevated choir. The seepage water samples resulted from rainwater that penetrated the building through the roof and percolated through the monument, its composition changing through water-rock interactions. This approach can be considered a non-destructive testing (NDT) tool for characterizing the alteration of geologic materials in the built environment, as it does not involve the extraction of samples from those materials.
For each sample, the main chemical species determined were the anions Cl⁻, NO₃⁻, SO₄²⁻, HCO₃⁻ and CO₃²⁻ and the cations Na⁺, K⁺, Ca²⁺ and Mg²⁺. Electrical conductivity, pH and temperature were also measured. The data gathered over the sampling period form the raw data set that is analysed in this paper using Principal Components Analysis (PCA) to help data interpretation. Table 1 gives the basic statistical parameters of the raw data set, which consists of 12 selected variables (physical and chemical properties, table columns) measured on seventeen samples.
Table 1. Raw data set: basic statistical parameters of physical and chemical properties measured on seventeen seepage water samples. The chemical analyses are in weight per cent. MAX: maximum value; AV: average (mean); MIN: minimum value; STD: standard deviation.
According to Davis [4], PCA is a factor analysis technique designed for interval or ratio data, that is, measurements made on a continuous numerical scale. If the raw multivariate data matrix has n rows representing observations (samples) and m columns representing variables, the n samples or objects may be regarded as points located in the m-dimensional space defined by the m variables. In general, PCA, like any other eigenvalue and eigenvector method, was originally devised to explain the interrelationships among a large number of variables by the presence of a few factors, principal components or axes. Its main purpose is to reduce the larger m-dimensional space (a multivariate set of observations) to a smaller p-dimensional one by computing new, uncorrelated, orthogonal components that are linear combinations of the original variables, while losing as little as possible of the variance in the original data set. The new components are called the principal components of the multivariate data matrix. The linear transformation of the m original variables into p new variables is performed so that each new variable accounts, successively, for as much of the remaining total variance as possible.
The question is then how many factors should be retained. A pragmatic approach consists of extracting only two or three factors, because this is the maximum number that can conveniently be displayed as scatter diagrams; any larger number increases the dimensionality of the problem to the point where it again becomes difficult to grasp [4]. The factor axes, or principal components, are then plotted two at a time as flat 2D diagrams, which are easy to handle and can be taken in at a glance. Only p factor axes should be needed to explain the data, and the usual assumption is that p < m.
Finally, it should be stressed that the principal components have to be interpreted in terms of the original variables. This is not always as easy as one would wish. A full-circle approach is therefore usually followed: from variables to principal components, to reduce the size of the problem, and back to variables, to interpret the principal components. 2D diagrams of the principal component loadings show the correlations among the original variables themselves and also between these and the principal component axes. Likewise, projecting the sample scores (sample co-ordinates) onto the first two principal axes (interpreted in terms of the original variables according to their loadings on the principal components) can give significant insight into the inter-sample relationships in the data set. In this way the analyst can explore the inter-variable relationships, the inter-object (similarity) relationships in a given data set, and the relations of variables and objects with each principal component.
As the principal components are linear transformations of the m original variables, we are able to plot PCA scores simply by projecting our original observations onto the principal axes [4].
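The rationale above can be summarised in a few lines of algebra. The notation is ours, not that of Davis [4], and the sketch assumes that the variables are standardized before the analysis, so that the matrix being decomposed is the correlation matrix R of the n × m data matrix:

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \bar{x}_j}{s_j},
  \qquad i = 1,\dots,n,\quad j = 1,\dots,m \\
\mathbf{R} &= \frac{1}{n-1}\,\mathbf{Z}^{\mathsf{T}}\mathbf{Z} \\
\mathbf{R}\,\mathbf{a}_k &= \lambda_k\,\mathbf{a}_k,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0 \\
y_{ik} &= \sum_{j=1}^{m} a_{jk}\,z_{ij},
  \qquad k = 1,\dots,p,\quad p < m \\
\frac{\lambda_k}{\sum_{j=1}^{m}\lambda_j} &= \frac{\lambda_k}{m}
  \quad \text{(share of the total variance carried by component } k\text{)}
\end{aligned}
```

Here x̄_j and s_j are the mean and standard deviation of variable j, a_k is the k-th unit-length eigenvector and y_ik is the score of sample i on component k. The last line holds because the trace of a correlation matrix equals the number of variables m.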

General Methodological Algorithm: a simple layout
Here we sketch only some of the basic computational steps usually involved in principal components analysis as it is, presumably, implemented in several commonly available libraries of computer programs. Like other factor analysis techniques, PCA exploits the mathematical and computational relationships that exist between a data matrix, its matrices of cross-products, and their eigenvalues and eigenvectors.
Principal components are nothing more than the eigenvectors of a variance-covariance or correlation matrix. PCA is thus concerned with finding these axes and measuring their magnitude. It starts by extracting the eigenvalues and eigenvectors of a variance-covariance or correlation matrix and then discards the less important of these. The eigenvectors give the coordinates of the principal component axes of the data set. They may provide significant insight into the structure underlying the data set, since they yield the orientations of the principal axes of the cloud of points. The eigenvalues, in turn, represent the lengths of the successive principal axes; that is, they represent the amount or proportion of the total variance accounted for by the corresponding eigenvectors. In general, then, a PCA implementation involves only a few steps. First, the variance-covariance or correlation matrix, that is, the matrix of cross-products, is computed from the original raw or transformed data set. The variance-covariance matrix contains correlations as its elements when all the initial raw variables in the data set are standardized to have means of 0.0 and variances of 1.0. Standardization may be unavoidable if the original variables are expressed in different, incompatible units. Second, the eigenvalues and their associated eigenvectors are extracted from this matrix. In a third step, the principal component scores are computed by projecting each sample or original observation onto the principal components. Principal component loadings are the elements of the eigenvectors that are used to compute the scores of the observations; they are simply the coefficients of the linear equation which the eigenvector defines [4].
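The three steps just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the STAT-ITCF implementation used in this paper: it assumes NumPy, the function name `pca_correlation` is hypothetical, and the input is a synthetic stand-in, not the seepage water data.

```python
import numpy as np

def pca_correlation(X):
    """PCA of an (n samples x m variables) array via its correlation matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: standardize each variable to mean 0 and variance 1, so that
    # the variance-covariance matrix of Z is the correlation matrix R.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = (Z.T @ Z) / (n - 1)
    # Step 2: eigenvalues and eigenvectors of the symmetric matrix R,
    # reordered from the largest eigenvalue to the smallest.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 3: scores, i.e. the samples projected onto the principal axes.
    # The columns of eigvecs hold the loadings in the sense used above.
    scores = Z @ eigvecs
    # Proportion of the total variance carried by each component.
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained

# Synthetic stand-in: 17 samples x 4 variables (NOT the measured data).
rng = np.random.default_rng(0)
common = rng.normal(size=(17, 1))
X = np.hstack(
    [common + 0.3 * rng.normal(size=(17, 1)) for _ in range(3)]
    + [rng.normal(size=(17, 1))]
)
eigvals, eigvecs, scores, explained = pca_correlation(X)
print(np.round(100 * explained, 1))  # percentage of variance per component
```

Retaining only the first two columns of `scores` and `eigvecs` corresponds to discarding the less important components; the variance of each column of `scores` equals the corresponding eigenvalue.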
A final step in PCA implementation may involve plotting and interpreting the 2D scatter diagrams defined by each pair of principal components. In these cross-plots, the variables and the samples are shown at positions representing, respectively, their loadings and scores on the principal components. The arc on the loadings diagram is part of a circle representing a communality of 1.00. The communality of a variable is the amount of its variance retained in the principal components. If a variable falls on the circle, the two components account for all of its variability. Variables that plot inside the circle are characterized by variability that is not fully represented by the two principal components.
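As an illustration of how communalities relate to the circle, the following sketch assumes that the coordinates plotted for each variable are its correlations with the components, that is, the eigenvector elements scaled by the square root of the eigenvalue. This scaling is our assumption about the diagram, the function name `communalities` is hypothetical, NumPy is assumed, and the data are synthetic.

```python
import numpy as np

def communalities(X, n_components=2):
    """Variable coordinates on the retained axes and their communalities."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # correlation matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)     # guard against round-off
    # Assumed plotting coordinates: correlation of each variable with each
    # component = eigenvector element * sqrt(eigenvalue).
    coords = eigvecs * np.sqrt(eigvals)
    retained = coords[:, :n_components]
    # Communality = squared distance of the variable from the origin, i.e.
    # the share of its (unit) variance kept by the retained components.
    comm = (retained ** 2).sum(axis=1)
    return retained, comm

# Synthetic stand-in: 17 samples x 5 variables (NOT the measured data).
rng = np.random.default_rng(1)
X = rng.normal(size=(17, 5))
X[:, 1] += X[:, 0]                            # induce some correlation

coords2, comm2 = communalities(X, n_components=2)
coords_all, comm_all = communalities(X, n_components=5)
print(np.round(comm2, 2))     # at most 1: on or inside the unit circle
print(np.round(comm_all, 2))  # all 1.0 when every component is kept
```

Under this assumption a variable with a communality close to 1 plots near the circle, and its distance from the origin is the square root of its communality.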

Results and Discussion
The matrix of pairwise correlations and the eigenvalues for the first eigenvectors are given in Table 2. The loadings of the eleven original variables on axis I are plotted against the loadings on axis II in Figure 1. The samples are plotted in the score space defined by the same first two principal components (Figure 2), at positions corresponding to their scores on these two axes. The first principal axis accounts for about 45.2% of the total variance, whereas the second principal component represents an additional 24.7%; together they account for 69.9%, almost 70%, of the total variance of the results from the seepage water samples. K⁺, Na⁺, Cl⁻, SO₄²⁻, HCO₃⁻, NO₃⁻ and pH are all well represented on the plane defined by the first two principal eigenvectors, with communality values varying between 0.80 and 0.95 (Figure 1). On the other hand, Ca²⁺, CO₃²⁻, Mg²⁺, T and specific conductivity (σ) are not well represented on this plane, as they plot far from the circle representing a communality of 1.00. These last variables, which also involve the third or fourth principal components, thus seem to be less important in explaining the overall variability in the seepage water samples.
The first principal eigenvector is strongly and positively correlated with K⁺, Na⁺, Cl⁻, HCO₃⁻ and NO₃⁻ and negatively correlated with pH. Samples projected onto the right side of the scores plot have values for these chemical species higher than their mean values. On the left side we find the samples with the highest pH values. The second principal eigenvector is positively correlated with pH, CO₃²⁻, Ca²⁺ and SO₄²⁻ and negatively with T and Mg²⁺. However, this axis does not clearly contribute to the analysis of the sample positions on the scores plane of the first two principal components.
Only K⁺, Na⁺, Cl⁻, HCO₃⁻ and NO₃⁻ form one cluster on this factor plane, which is, in general, positively correlated with SO₄²⁻. This seems to suggest a common source or process for the strongly and positively correlated variables forming the cluster, and a not very different source and alteration process, or possibly a slight combination of other ones, for SO₄²⁻. Together with pH, these variables seem to play a significant role in the characterisation of the seepage waters. They explain most of the variation observed in the chemical composition of the samples, while the other variables, including Ca²⁺, do not. This surprisingly secondary role played by Ca²⁺ is possibly associated with the stalactite formation observed inside the church. However, the variables analysed, taken together, do not provide enough discrimination of the seepage waters to allow sub-classifications other than samples richer or poorer in these variables. Taking into account the sampling period, no time- or season-dependent control of the seepage water composition has appeared clearly from the analysis of the data. This could reflect a significant uniformity in the contribution of ion sources and stone alteration processes.

Conclusions
The water-rock interaction and environmentally induced processes at "Basílica da Estrela" seem to promote essentially the enrichment of the seepage waters in K⁺, Na⁺, Cl⁻ and HCO₃⁻. In this case, PCA has produced a result in general agreement with the one obtained from the analysis of the same data set by Figueiredo et al. [1][2][3] using a perhaps more classical geochemical approach. However, it should be stressed that this kind of multivariate analysis may provide a basis for the management of environmental and stone decay data which may, when needed, be combined with other geochemical and petrophysical studies. For instance, PCA may help in planning future research campaigns for related studies. Hence PCA can support non-destructive studies of stone decay, since the seepage water samples represent the product of the interaction between pollutants and stones and can be studied without sampling the cultural materials themselves.