Combination of Total-Reflection X-Ray Fluorescence Method and Chemometric Techniques for Provenance Study of Archaeological Ceramics

The provenance study of archaeological materials is an important step in understanding the cultural and economic life of ancient human communities. One of the most popular approaches in provenance studies is to determine the chemical composition of the material and process the data with chemometric methods. In this paper, we describe a combination of the total-reflection X-ray fluorescence (TXRF) method and chemometric techniques (PCA, k-means cluster analysis, and SVM) to study Neolithic ceramic samples from eastern Siberia (Baikal region). A database of ceramic samples including 10 elements (indicators) was created for classification by geographical origin and ornamentation type. This study shows that PCA cannot be used as the primary method for provenance purposes, although it can reveal some patterns in the data. SVM and k-means cluster analysis classified most of the ceramic samples by archaeological site and type with high accuracy. The chemometric techniques also revealed the similarity of some samples found at sites located close to each other. A database created and processed by SVM or k-means cluster analysis can be supplemented with new samples, which can then be classified automatically.


Introduction
The study of ancient ceramics is an important source of information about the cultural and economic life of ancient human communities. Because ceramics were objects of everyday use, they can provide extensive information about trade relations, religious customs, and communication between communities [1]. Provenance analysis of ceramics is possible through the determination of their chemical composition [2]. Quantitative data on elemental composition are usually obtained by analyzing powdered ceramic fragments using X-ray fluorescence spectrometry (XRF) [3-5], instrumental neutron activation analysis [6,7], or inductively coupled plasma mass spectrometry (ICP-MS) [3,8]. Laser ablation ICP-MS [7,9] and micro-X-ray fluorescence spectrometry (µXRF) [10] can be used to analyze the clay component and the inclusions separately in a ceramic cross-section. Non-destructive characterization of bulk ceramic composition is possible using portable XRF [11-15].
Despite the large number of methods used to analyze ceramics, one of the most popular approaches to chemical analysis in archaeology is the use of XRF spectrometry combined with machine learning techniques to attribute samples to specific places of origin. This

TXRF Analysis
This research used a TXRF method for the analysis of ceramic samples, which was successfully validated in our previous study [35]; the analytical features of the method are not considered here. Table 1 presents the concentration ranges of the elements in the 81 ceramic samples obtained by TXRF analysis, grouped by archaeological site. Most of the ceramic samples (45) come from the Popovsky Lug site. The elements of interest were chosen not only on the basis of the sensitivity and accuracy of the TXRF method, but also on the basis of previous investigations in which indicator elements were identified [35]. Acid-leaching sample preparation transfers the clay component of the ceramic into solution and separates it from the insoluble silicate minerals included in the ceramic [24,35]. For provenance analysis, this approach allows better identification of sample groups than the bulk composition of the ceramic samples does. Because it is based on the clay component of the ceramics, the method is also promising for comparison with the chemical composition of regional clays. Figure 1 shows the spectra of ceramic samples from different sites. For correct interpretation of the spectra, all intensities were normalized to the Se-Kα line (internal standard). As the spectra show, the peaks of the same element differ only slightly (Ni, Ga, Y, and Pb) or strongly (Cr, Zn, and Sr) among samples from different sites. The small differences observed for some elements indicate that ceramic samples may have similar elemental compositions even when they were found far away from each other.

Factor Analysis and Principal Component Analysis (PCA)
Factor analysis and PCA are among the most popular chemometric techniques for exploratory data analysis. PCA is employed as a dimensionality reduction method to visualize the samples in a lower-dimensional space. The TXRF data for the 81 ceramic samples were arranged in a matrix containing the concentrations of 10 elements together with the archaeological type and site of each sample. First, factor analysis was applied for fast pattern recognition. It extracted four significant factors and showed that Zn, Cr, and Sr had the highest loadings on the first three factors. These three elements were therefore used to construct ternary diagrams. Figure 2 presents two ternary diagrams of Zn, Cr, and Sr concentrations, grouped by archaeological site (Figure 2a) and by type (Figure 2b). In Figure 2a, a cluster of samples from the Ust-Karenga site can be recognized in the lower-left corner. The Popovsky Lug samples cover most of the space in Figure 2a and overlap with the other sites. However, the Shishkino and Makarovo sites can be visually separated from the others. The ternary diagram by ceramic type (Figure 2b) shows the same Ust-Karenga cluster; no other distinct clusters were observed for the other types.
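A ternary diagram displays only the relative proportions of the three elements. As a minimal sketch, assuming that all three concentrations share one unit and using an arbitrary vertex layout (Zn at the lower left, Cr at the lower right, Sr at the top; the layout used in Figure 2 is not specified here), the mapping from concentrations to plotting coordinates can be written in Python as follows. The function name `ternary_xy` is hypothetical.

```python
import math
from typing import Tuple


def ternary_xy(zn: float, cr: float, sr: float) -> Tuple[float, float]:
    """Map Zn, Cr, and Sr concentrations (same unit, e.g. mg/kg) to 2D ternary coordinates.

    Hypothetical vertex layout: Zn at (0, 0), Cr at (1, 0), Sr at (0.5, sqrt(3)/2).
    """
    total = zn + cr + sr
    if total <= 0:
        raise ValueError("at least one concentration must be positive")
    # Closure: express each element as a fraction of the three-element sum
    f_zn, f_cr, f_sr = zn / total, cr / total, sr / total
    # Barycentric -> Cartesian: weighted average of the three vertex positions
    x = f_zn * 0.0 + f_cr * 1.0 + f_sr * 0.5
    y = f_sr * math.sqrt(3) / 2
    return x, y


# Hypothetical concentrations (mg/kg), for illustration only
print(ternary_xy(120.0, 60.0, 20.0))
```

Because of the closure step, two samples with the same Zn:Cr:Sr ratios plot at the same point even if their absolute concentrations differ, which is why ternary diagrams emphasize compositional patterns rather than overall enrichment.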
Next, we applied principal component analysis (PCA). The score and loading scatter plots were used to identify criteria for clustering the samples. Figure 3a-d present the score scatter plots, constructed by projecting the samples onto the first two principal components (PC1-PC2), and Figure 3e-h present the corresponding loading plots. Figure 3a,c are plotted by site, and Figure 3b,d by type. In all plots, the first two principal components of the correlation matrix account for about 50% or more of the total variance. In Figure 3e,f, the most significant variables for PC1 are Cr, Ni, V, and Pb; in Figure 3g,h, they are V, Rb, and Pb. The second principal component divides the samples by their Ni and Sr contents.
As can be seen from Figure 3a, ceramics from the Popovsky Lug, Shishkino, Ust-Yamniy, and Makarovo sites are practically indistinguishable, while the ceramics from Ust-Karenga and Ust-Yumurchen are separated from them. The Ust-Karenga ceramics are notable for their high Pb content, and they differ from the other samples in the contents of a group of mutually correlated elements: Rb, Pb, Ni, V, and Cr. The samples from the Shishkino, Ust-Yamniy, and Makarovo sites form a common cluster with those from the Popovsky Lug site. These sites, located in the uppermost (southern) section of the upper Lena, are fairly close to each other, which can explain their similarity in the PCA plots. If the Popovsky Lug samples are excluded, the samples from Makarovo, Shishkino, and Ust-Karenga separate clearly (Figure 3c).
Molecules 2023, 28, x FOR PEER REVIEW 6 of 15
The division of ceramic samples by archaeological type is shown in Figure 3b (all types) and Figure 3d (samples from Popovsky Lug only). The scatter plot in Figure 3b shows a clear differentiation of the Ust-Karenga ceramic type. Figure 3d shows that the Setchaty type lies slightly apart from the other samples. The Ust-Belsky and Posolskaya types are distributed over the entire area of the plot, which makes it impossible to distinguish between them. The remaining samples are mixed.
Although factor analysis and PCA produced some useful results, they did not reveal any systematic pattern that could be used to create reference groups with common provenance features.

Generalized Cluster Analysis by k-Means (k-Means CA)
The k-means CA method is a fast and easy way to analyze provenance without preliminary preparation of the dataset (for example, outlier exclusion). PCA revealed similarities among ceramic samples from geographically close sites but could not separate them; for example, the ceramic samples from Popovsky Lug, Shishkino, and Makarovo could not be distinguished. The k-means CA gave a better view of the differences among the ceramic samples. Table 2 presents the results of the k-means CA of the ceramic samples grouped by archaeological site. The testing error was 0.448 and the training error 0.272. The total number of clusters was six, equal to the number of archaeological sites. The Popovsky Lug and Makarovo samples were assigned only to clusters No. 3 and No. 4, respectively, so these two sites are clearly distinguished from the others, which was not possible using PCA. Most of the Ust-Karenga samples (7/8) belong to cluster No. 1. Samples from the Ust-Yamniy site share cluster No. 6 with samples from Shishkino, as was observed by PCA. Two of the four Ust-Yumurchen samples are in cluster No. 5, and one belongs to cluster No. 6. Cluster No. 2 contains only two samples, from the Ust-Karenga and Ust-Yumurchen sites, the same pattern observed in the PCA plots. Overall, the results of k-means CA are better and clearer for provenance analysis than those of PCA. The k-means CA also eliminates the subjective factor in assessing the classification, together with the additional errors that this factor introduces. Table 3 presents the results of the k-means CA of the ceramic samples grouped by type. The conditions of analysis were the same as for the analysis by site. The testing error was 0.349 and the training error 0.206. The total number of clusters was five. The classification by type was even better than that by site.
Distinct clusters were obtained for the Ust-Belsky, Setchaty, Khaitinsky, and Posolskaya types. Only the Ust-Karenga type is split between two clusters, but these clusters are independent, meaning that they show no correlation with other variables. The k-means CA therefore works very well for classifying ceramic samples on the basis of elemental data. The classification of ceramics by type is better than that by site, suggesting that the archaeological typology based on visual assessment of the recovered material is reliable.
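The clustering step can be illustrated with a minimal NumPy sketch of Lloyd's k-means using squared Euclidean distance and a 2:1 training/test split, the settings reported for the STATISTICA run. Several details are assumptions: each element is z-scored with training-set statistics, centroids are initialized from random training samples, and test samples are simply assigned to the nearest training centroid. The mean squared distance returned here is only a stand-in fit measure and is not the "testing/training error" reported by STATISTICA. All function names are hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Nearest-centroid labels and squared Euclidean distances for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns the final centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels, _ = assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return centroids


def kmeans_train_test(X, k, seed=0):
    """Fit k-means on ~2/3 of the samples and assign the remaining ~1/3 to the nearest centroid."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(len(X) * 2 / 3))
    train, test = idx[:n_train], idx[n_train:]
    # Assumption: z-score each element using training-set statistics only
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0, ddof=1)
    Z = (X - mu) / sd
    centroids = kmeans(Z[train], k, seed=seed)
    train_labels, train_d2 = assign(Z[train], centroids)
    test_labels, test_d2 = assign(Z[test], centroids)
    return {
        "centroids": centroids,
        "train_idx": train, "train_labels": train_labels, "train_msd": train_d2.mean(),
        "test_idx": test, "test_labels": test_labels, "test_msd": test_d2.mean(),
    }
```

For an 81 × 10 concentration matrix, `kmeans_train_test(X, k=6)` would use 54 training and 27 test samples; cross-tabulating the returned labels against the known site names gives a contingency table of the same kind as Table 2.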

Support Vector Machines (SVMs)
Tables 4 and 5 present the SVM classification of the ceramic samples by archaeological site and type, respectively. Some samples were not correctly classified, and the tables also give the percentage of misclassification and the incorrect attributions. Only four outliers were detected by PCA before construction of the site-based SVM model. The classification accuracy of the SVM for sites is 84.416%. In the SVM summary table, some samples from the Ust-Yamniy and Makarovo sites are assigned to Popovsky Lug and vice versa, although these samples have some characteristics of their own associated with the pottery type. The pottery of the Ust-Karenga site stands out in terms of chemical composition, whereas the pottery of the Ust-Yumurchen site is not clearly distinguished. However, data were obtained for only four samples from Ust-Yumurchen, which is not enough for unambiguous conclusions. Table 5 presents the results of the SVM analysis of the ceramic samples by archaeological type. Six outliers were excluded before construction of this SVM model, whose classification accuracy was 89.333%. The Khaitinsky, Posolskaya, and Ust-Karenga types were clearly distinguished by SVM. Two of the 13 Ust-Belsky samples were incorrectly classified as Posolskaya type, and almost half of the Setchaty-type samples were wrongly classified as Khaitinsky or Posolskaya type.
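The reported accuracies can be checked against the sample counts. Assuming (this is not stated explicitly) that each accuracy was computed over all samples remaining after outlier exclusion, i.e., 81 - 4 = 77 samples for the site model and 81 - 6 = 75 for the type model, the following Python snippet finds the numbers of correctly classified samples consistent with the reported percentages. The helper name is hypothetical.

```python
def consistent_counts(accuracy_pct, n_total, decimals=3):
    """Numbers of correct predictions c (0..n_total) for which 100*c/n_total,
    rounded to `decimals` places, equals the reported accuracy."""
    target = round(accuracy_pct, decimals)
    return [c for c in range(n_total + 1) if round(100 * c / n_total, decimals) == target]


print(consistent_counts(84.416, 81 - 4))  # site model -> [65]
print(consistent_counts(89.333, 81 - 6))  # type model -> [67]
```

Under this assumption, 65 of 77 samples were classified correctly by site and 67 of 75 by type, which would leave 12 and 8 misclassified samples, respectively.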
The SVM model performed very well for the provenance analysis of the ceramic samples, with an accuracy of more than 80%. To find out why the incorrectly predicted samples did not fit the classification, we repeated the PCA after removing outliers: these samples were indeed located near the 3σ boundary. When the samples incorrectly predicted by the SVM method were removed, the SVM accuracy reached almost 100%. Thus, the resulting database can be supplemented with new samples and processed by the SVM method for subsequent automatic classification.

Sample Description
[...] wrapped with a cord. A distinctive feature of Posolskaya-type ceramics is a subtriangular thickening on the outer or inner side of the rim, below which a belt of through-punctures is applied along a drawn or pressed groove. Ceramics of the Ust-Belsky type are characterized by both closed- and open-shape vessels with pointed or rounded bottoms. The vessels are smooth-walled, decorated from the rim to the bottom, and distinguished by horizontal and zigzag rows of imprints made with a stack. Ceramics from the Ust-Karenga and Ust-Yumurchen sites are represented by corded pottery. These two sites are located near the Upper Vitim River within the Vitim Plateau (north-eastern Transbaikalia). Ust-Karenga ceramics have a paraboloid shape, with imprints of a thin twisted cord on the outer surface and traces of rubbing on the inner side. The ornament was made with a comb stamp, and zigzag patterns dominate. Ust-Yumurchen ceramics are characterized by imprints of a ribbed blade or a twisted cord on the outer surface. The ornament was applied with a jagged stamp and with a stack of rectangular or oval cross-section. The rim has a subtriangular molding on the outer or inner side, which makes these vessels similar to Posolskaya-type ceramics.
Samples were selected mainly with reference to whole vessels, to allow a full description and archaeological interpretation. The ceramic fragments were taken from different parts of the vessels, rinsed in deionized water (18.2 MΩ·cm) in an ultrasonic bath for 60 min, dried, and photo-documented. Typical ceramic samples from the aforementioned archaeological sites are presented in Figure 5. The ceramic samples were then crushed and milled using a mortar grinder.

Sample Preparation
The following reagents were used for TXRF analysis: a single-element Se standard solution (1000 mg/L, Merck, Darmstadt, Germany) for the preparation of the internal standard; nitric and hydrochloric acids (ultrapure grade, Merck) for the preparation of the aqua regia used in the leaching procedure; and ultrapure deionized water (18.2 MΩ·cm, Elga Labwater, High Wycombe, UK) for dilution.
A 20 mg portion of the sample was placed in a polytetrafluoroethylene (PTFE) vessel, and 1 mL of aqua regia was added. The closed vessel was heated on a hot plate at 170 °C for 8 h and cooled, after which 3.95 mL of ultrapure water and 0.05 mL of the Se internal standard were added. The resulting solution was homogenized on a vortex mixer (IKA, Staufen im Breisgau, Germany) for 5 min, and the insoluble solid residue was separated. A 10 µL aliquot of the solution was placed on a quartz carrier and dried on a heating plate.
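The digestion volumes fix the dilution factor that links solution and sample concentrations. The Python arithmetic below is a sketch that assumes the volumes are additive and that nothing is lost from the closed vessel (neither is stated in the text). It gives the total solution volume, the Se concentration in the measured solution, and the factor that converts a solution concentration in mg/L into a concentration in the solid sample in mg/kg, which is one way to express the internal-standard concentration on a sample basis.

```python
# Quantities from the digestion protocol
SAMPLE_MASS_MG = 20.0
V_AQUA_REGIA_ML = 1.0
V_WATER_ML = 3.95
V_SE_STANDARD_ML = 0.05
SE_STOCK_MG_PER_L = 1000.0

# Assumption: volumes are additive and no solution is lost during heating
v_total_ml = V_AQUA_REGIA_ML + V_WATER_ML + V_SE_STANDARD_ML

# Se internal standard concentration in the final solution (mg/L)
se_solution_mg_per_l = SE_STOCK_MG_PER_L * V_SE_STANDARD_ML / v_total_ml

# C_solid (mg/kg) = C_solution (mg/L) * V (L) / m (kg)
mg_per_kg_per_mg_per_l = (v_total_ml / 1000.0) / (SAMPLE_MASS_MG / 1e6)

# Se expressed on a sample basis (mg/kg), i.e. as if it were part of the solid
se_sample_basis_mg_per_kg = se_solution_mg_per_l * mg_per_kg_per_mg_per_l

print(f"total volume: {v_total_ml:.2f} mL")                                # 5.00 mL
print(f"Se in solution: {se_solution_mg_per_l:.1f} mg/L")                   # 10.0 mg/L
print(f"conversion factor: {mg_per_kg_per_mg_per_l:.0f} (mg/kg)/(mg/L)")    # 250
print(f"Se, sample basis: {se_sample_basis_mg_per_kg:.0f} mg/kg")           # 2500
```

Under these assumptions, every 1 mg/L measured in the solution corresponds to 250 mg/kg in the original 20 mg ceramic portion.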

Spectra Acquisition
TXRF elemental analysis was performed using an S2 PICOFOX benchtop spectrometer (Bruker Nano Analytics, Berlin, Germany) equipped with a Mo-anode X-ray tube, a multilayer monochromator, and a 30 mm² silicon drift detector (energy resolution <150 eV at the Mn-Kα line). Each sample was measured in triplicate for 500 s at 50 kV and 0.50 mA. Quartz carriers were used as sample holders and reflectors. The Spectra 7.8.2 software package (Bruker Nano Analytics, Berlin, Germany) was used for spectra processing.

Data Analysis
The concentration of each element ($C_i$, mg/kg) was calculated using the following formula [37]:

$$C_i = \frac{N_i / S_i}{N_{is} / S_{is}} \cdot C_{is}$$

where $C_{is}$ is the concentration of the internal standard; $N_i$ and $N_{is}$ are the net peak areas of the element of interest and the internal standard, respectively; and $S_i$ and $S_{is}$ are the sensitivities of the element of interest and the internal standard, respectively. Ca, P, and Mn were excluded from the data on the basis of a study of the heterogeneity of ceramic samples [38], which showed that the concentrations of these elements may vary strongly within a single sample and therefore cannot be used in provenance analysis.
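The internal-standard calculation, in which the sensitivity-corrected net peak area of the analyte is divided by that of the internal standard and multiplied by the internal-standard concentration, translates directly into code. The function below is a minimal Python sketch; it returns $C_i$ in whatever unit $C_{is}$ is given in, and the numbers in the usage line are hypothetical, not measured values.

```python
def element_concentration(n_i, s_i, n_is, s_is, c_is):
    """Internal-standard quantification: C_i = (N_i / S_i) / (N_is / S_is) * C_is.

    n_i, n_is: net peak areas of the analyte and the internal standard (counts)
    s_i, s_is: sensitivities of the analyte and the internal standard
    c_is:      internal-standard concentration; the result has the same unit
    """
    if n_is <= 0 or s_i <= 0 or s_is <= 0:
        raise ValueError("n_is, s_i and s_is must be positive")
    return (n_i / s_i) / (n_is / s_is) * c_is


# Hypothetical peak areas, sensitivities and C_is, for illustration only
print(element_concentration(n_i=1500, s_i=1.2, n_is=4000, s_is=1.0, c_is=2500.0))
```

Because only ratios of peak areas enter the result, fluctuations that affect the analyte and the internal standard equally (for example, small differences in the deposited volume) cancel out.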
The database of ceramic samples, an 81 × 11 matrix, was divided into separate files, each representing a specific pottery type or site, which were then processed statistically. Different classification approaches were used for provenance analysis: unsupervised learning and pattern recognition (factor analysis, PCA, and k-means cluster analysis) and supervised learning (support vector machines). Unsupervised learning algorithms are not guided by previously known classifications; labels can be assigned only after the clusters have been defined. Supervised methods fit the model to a given classification. Chemometric methods were performed using the STATISTICA 10 program (TIBCO Software Inc., Palo Alto, CA, USA).
Factor analysis (FA) was applied under the following conditions: 10 variables (elements), extraction of 3 factors, method of principal components, and varimax raw factor rotation.
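A minimal NumPy sketch of these factor-analysis settings follows: loadings are extracted by the principal-components method from the correlation matrix of the element concentrations and then rotated with raw varimax (no Kaiser row normalization). The iteration follows the widely used SVD-based varimax algorithm; it is not STATISTICA's own code, and the function names are hypothetical.

```python
import numpy as np


def pc_factor_loadings(X, n_factors=3):
    """Unrotated loadings by principal-components extraction from the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]  # keep the n_factors largest
    return eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0.0, None))


def varimax_raw(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalization) via the SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        grad = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                          # closest orthogonal matrix
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation
```

For a concentration matrix `X` (samples × elements), `varimax_raw(pc_factor_loadings(X, 3))` gives one column per factor; the element with the largest absolute loading in each column is the one that dominates that factor. Because the rotation is orthogonal, each element's communality (row sum of squared loadings) is unchanged.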
Principal component analysis (PCA) was applied as follows: 10 elements were chosen as variables, scores were automatically standardized, and loading and score plots were based on the first two components.
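These PCA settings (standardized variables, first two components) can be sketched with NumPy's SVD. Standardizing each element before the decomposition is equivalent to PCA of the correlation matrix. Loadings are returned as variable-component correlations, which is one common convention; this is an assumption, since the loading scaling used for the plots is not stated. The function name is hypothetical.

```python
import numpy as np


def pca_standardized(X, n_components=2):
    """PCA of z-scored variables (equivalent to PCA of the correlation matrix).

    Returns scores (n_samples x n_components), loadings (n_variables x n_components)
    and the fraction of total variance explained by every component.
    """
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ vt[:n_components].T
    # Correlation between each variable and each component score
    loadings = vt[:n_components].T * s[:n_components] / np.sqrt(n - 1)
    return scores, loadings, explained
```

In this sketch, `explained[:2].sum()` is the cumulative share of variance captured by PC1-PC2, and plotting the two columns of `scores` (colored by site or type) and of `loadings` (labeled by element) yields score and loading plots of the kind used for Figure 3.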
The k-means cluster analysis (k-means CA) was applied under the following conditions: k-means algorithm, squared Euclidean distance, and a 2:1 ratio of training to test samples.
Support vector machines (SVMs): outliers were first identified by PCA. Figure 6 shows an example of outlier detection using score and loading plots in the space of the principal components. Samples outside the 3σ boundary were considered outliers, and the SVM model was applied after all outliers had been excluded.
The SVM method is based on data mining and machine learning. By default, the program divides the entire data set into two groups, training and testing; we placed 75% of the samples in the training group and the remaining 25% in the test group. The SVM model used classification type 2, a radial basis function (RBF) kernel (gamma = 0.450), and cross-validation.
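The outlier-screening and SVM steps can be sketched with NumPy and scikit-learn. Several choices here are assumptions rather than the authors' STATISTICA settings: the "3σ border" is implemented as ±3 standard deviations on each of the first two PC scores; STATISTICA's "classification type 2" is taken, to our understanding, to be nu-SVM classification and is mapped to scikit-learn's `NuSVC` with a hypothetical `nu = 0.1`; features are z-scored before the RBF kernel; and cross-validation is 5-fold on the training set. The function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC


def pca_3sigma_inliers(X, n_components=2):
    """True for samples whose scores on the first PCs all lie within +/-3 SD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ vt[:n_components].T
    return np.all(np.abs(scores) <= 3 * scores.std(axis=0, ddof=1), axis=1)


def fit_provenance_svm(X, y, gamma=0.45, nu=0.1, seed=0):
    """Drop PCA outliers, split 75/25, cross-validate and fit an RBF nu-SVM."""
    keep = pca_3sigma_inliers(X)
    X_in, y_in = X[keep], y[keep]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_in, y_in, test_size=0.25, random_state=seed, stratify=y_in)
    model = make_pipeline(StandardScaler(), NuSVC(nu=nu, kernel="rbf", gamma=gamma))
    # Mean accuracy over 5 cross-validation folds of the training set
    cv_accuracy = cross_val_score(model, X_tr, y_tr, cv=5).mean()
    model.fit(X_tr, y_tr)
    test_accuracy = model.score(X_te, y_te)
    return model, keep, cv_accuracy, test_accuracy
```

With very small classes (such as the four Ust-Yumurchen samples), `NuSVC` may raise a `ValueError` because the requested `nu` is infeasible for some class pairs; lowering `nu`, or switching to the C-parameterized `SVC`, avoids this.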


Conclusions
A reference database was created to characterize the ancient ceramics of the studied territories. The database was integrated into the STATISTICA program for rapid study of the differences and similarities among the samples using chemometric approaches (PCA, k-means CA, and SVM) and for classification by geographical origin and ornamentation. The study showed that exploratory methods such as PCA can be used for an initial assessment of the data matrix, while k-means CA or SVM should be applied as the primary methods for provenance analysis. According to the results of these chemometric techniques, some samples from the Popovsky Lug, Ust-Yamniy, and Makarovo sites formed a common cluster, because the sites are very close to each other and have the same clay sources. These sites are in the uppermost (southern) section of the upper Lena, which can be conditionally called Kachugsky after the regional center, the village of Kachuga. The close proximity of these sites explains their similar geomorphological positions, the correlation of their stratigraphic sections, and the similar composition of the clay taken from different sites in this region. The chemical composition of some of the pottery from the Ust-Karenga and Ust-Yumurchen sites differs from that of the pottery from the other sites owing to differences in cultural affiliation and raw-material sources. A more detailed study of these ceramics will require the analysis of a larger number of samples from these sites, as well as of the clay sources. Based on the results obtained in this study, we may assume that the raw materials used to manufacture the ceramics were local. The database created and processed by the SVM or k-means CA methods can be supplemented with new samples, which can then be classified automatically. This study showed that combining the TXRF method with chemometric techniques is a fast and effective way to conduct provenance analysis of archaeological ceramics.