1. Introduction
Glass artworks represent a unique intersection of artistic expression and material science, whose preservation is of paramount importance for cultural heritage [1]. These objects, ranging from historical stained glass windows to intricate decorative pieces, are valued not only for their aesthetic qualities but also for the historical and technological information they embody [2]. Glass artifacts not only reflect artistic and aesthetic intentions but also preserve information about historical production techniques, raw material sources, and technological innovations, providing valuable insights for both art historians and materials scientists [3,4,5]. The surface integrity of glass artworks is highly sensitive to environmental conditions, including fluctuations in humidity, temperature, exposure to atmospheric pollutants, and variations in light intensity [6]. These environmental factors often interact in complex, nonlinear ways, influencing weathering processes at multiple scales, from atomic-level compositional changes to micrometer-scale cracking and macroscopic surface corrosion [3]. Over time, these factors induce complex physicochemical weathering processes, resulting in microstructural deterioration, compositional alterations, and visible changes in color and texture. The susceptibility and manifestation of such weathering processes often depend strongly on the glass composition. For example, lead-barium glass, commonly used in decorative and optical objects, exhibits distinct surface corrosion patterns due to the leaching of lead and barium ions [7,8]. Understanding these composition-dependent mechanisms is essential for evaluating the state of preservation, predicting potential risks, and designing targeted conservation strategies. Traditional approaches, however, often fall short in capturing the complex, multidimensional interactions among compositional parameters, environmental conditions, and observable deterioration features [9]. The wide variety of glass formulations and environmental exposures further complicates the establishment of universal criteria for evaluating deterioration. These challenges underscore the need for systematic, data-driven frameworks that integrate chemical, physical, and visual characteristics, enabling more comprehensive, predictive, and interpretable analyses of glass weathering patterns.
Traditional methods for investigating glass weathering primarily rely on laboratory-based analytical techniques such as X-ray diffraction (XRD), scanning electron microscopy (SEM), Raman spectroscopy, and inductively coupled plasma (ICP) analysis [10]. These methods provide detailed insights into surface morphology, crystallographic changes, and elemental composition, offering high-resolution data essential for understanding specific weathering mechanisms [7]. However, while such analytical measurements remain fundamental for obtaining reliable compositional information, the subsequent interpretation and classification of glass types or weathering states often depend on manual analysis and conventional statistical approaches. Furthermore, conventional studies often rely on single-variable correlations, which limits their ability to capture the complex, nonlinear interactions among chemical composition, environmental exposure, and observable physical changes in glass artifacts [11]. This complexity poses significant challenges to traditional analytical and statistical frameworks. These limitations highlight the growing need for systematic, data-driven approaches that can integrate chemical, physical, and visual features in a unified framework, enabling more accurate classification, prediction, and mechanistic understanding of glass weathering processes while improving scalability, reproducibility, and interpretability.
Recent advancements in machine learning (ML) offer a promising avenue to overcome these limitations by enabling the extraction of meaningful patterns and predictive insights from complex, heterogeneous datasets [12]. In materials science, ML techniques have been increasingly employed to model composition–property relationships, predict degradation or corrosion behavior, and guide the design of novel materials with enhanced stability [13,14,15]. In the context of heritage science, data-driven approaches have demonstrated success in predicting pigment degradation, assessing metal corrosion, and evaluating stone weathering under environmental stressors [16,17]. However, existing research has yet to establish a systematic and integrated framework for analyzing glass weathering, and most studies address the issue only through isolated case analyses. Several independent research groups have approached this problem using diverse machine learning methods. Li's research group [18] employed a joint Daen-LR, ARIMA-LSTM, and MLR architecture (JMLA) to analyze the chemical composition of ancient glass, demonstrating improved classification accuracy and efficiency in glass type identification. Rahman and colleagues [19] developed a deep learning-based glass classification model using a convolutional neural network (CNN) that extracts hierarchical features from oxide content data, capturing complex patterns to accurately identify glass types. Chen and colleagues [20] applied random forest and BP neural networks to successfully recognize eight major and minor sub-classes of glass artifacts, highlighting the potential of neural network-based supervised learning for subclass identification. Meng's team [21] conducted principal component analysis on chemical composition data and developed a case-specific clustering algorithm (K-means++) to categorize glass relics, achieving robust clustering validated by inertia and silhouette scores. Tang's research group [22] proposed a stacking integration classification combined with Gaussian mixture clustering (SIC-GMC) for component correction and category identification of weathered silicate glass, integrating ensemble learning with probabilistic clustering to handle small datasets. Xu's group [23] developed a support vector machine classification model, training it on known samples and applying it to predict the classification of unknown ancient glass artifacts based on their chemical composition. Chen and collaborators [24] constructed a classification framework using decision trees, support vector machines, and logistic regression based on glass patterns, colors, surface weathering, types, and composition ratios, complemented by K-means clustering for subclassification of high-potassium and lead-barium glass. Cai's research team [25] utilized a generalized Shapley function based on fuzzy measurements to analyze the correlation between chemical composition indicators across different glass categories, revealing systematic changes in correlations from unweathered to weathered lead-barium and high-potassium glasses, thus assisting archaeological classification. Despite these promising developments, the systematic application of ML to glass weathering remains limited, particularly when considering multi-dimensional datasets that combine chemical composition, physical properties, and observable visual characteristics. Integrating such approaches holds significant potential for developing scalable, non-destructive, and reproducible frameworks for assessing and predicting surface weathering in glass artworks, ultimately supporting more informed preservation strategies and deepening our understanding of material-specific degradation processes over time.
In this study, we construct a machine learning-based analytical framework using a dataset of weathered glass samples that includes both high-potassium and lead-barium glass, together with their associated chemical compounds. By integrating statistical analysis with methods such as the Prototypical Network, Gaussian Mixture Model (GMM), Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), and Mutual Information, the proposed system aims to identify key compositional indicators, reveal intrinsic clustering patterns, and enhance the interpretability of glass weathering behavior. This work establishes a systematic, data-driven approach for understanding and predicting surface deterioration in glass artworks. Its significance lies in bridging materials science and cultural heritage research, providing a reproducible and non-destructive means to analyze complex degradation processes. The framework not only deepens the scientific understanding of composition-dependent weathering mechanisms but also offers practical insights for the long-term preservation and sustainable management of historical glass artifacts.
Despite the strengths of this integrated analytical framework, several limitations should be acknowledged. First, the dataset used in this study may not cover the full spectrum of glass types and weathering conditions, which could restrict the generalizability of the resulting models when applied to broader archaeological or historical contexts. Second, potential biases and noise introduced during data collection, including variations in sample preparation and analytical instrumentation, may affect the stability and accuracy of the predictions. Third, the current work primarily focuses on the influence of chemical composition on glass classification and weathering behavior, while external environmental factors such as soil characteristics, humidity, temperature, and long-term burial conditions are not explicitly incorporated. These omissions may limit the framework’s ability to capture the complete range of mechanisms that drive surface deterioration in glass materials. Future studies incorporating more diverse datasets and additional environmental parameters would help mitigate these constraints and strengthen the robustness of the analytical approach.
Taken together, these considerations outline both the potential and the current boundaries of data driven approaches in cultural heritage materials research. By recognizing these limitations, this study provides a basis for future work that expands the dataset, incorporates environmental factors, and validates the framework across diverse glass contexts. The following sections present the methodology and empirical findings that demonstrate the framework’s ability to enhance the predictive understanding and interpretability of glass weathering.
2. Methods
2.1. Workflow
In this study, we designed a machine learning-based analytical system for the study of weathered glass, aiming to establish a comprehensive and data-driven framework for understanding, classifying, and interpreting glass weathering phenomena. The overall pipeline is shown in Figure 1. The workflow follows a progressive, four-stage approach, moving from macroscopic correlation analysis to detailed chemical interpretation.
Stage 1: Macro-level Correlation Analysis
This initial step employs Pearson correlation analysis to identify potential associations between observable weathering conditions and categorical glass attributes, including glass type, color, and surface pattern. This stage serves as a global screening step, highlighting which broad categorical factors may be influenced by weathering. The results provide a statistical anchor for the subsequent modeling stages and help narrow the analytical focus.
Stage 2: Primary Glass Type Classification
Building on the correlation findings, a Prototypical Network is adopted for the primary classification task, distinguishing between high-potassium and lead-barium glass. The PN is particularly suitable for few-shot learning scenarios, which is essential given the limited sample size typical of heritage glass datasets. The model learns an embedding function that maps chemical composition features into a metric space, where each class is represented by a prototype (i.e., the mean embedding of support samples). Classification is performed by measuring the distance between a query sample and these class prototypes.
Stage 3: Fine-grained Subclass Analysis and Validation
After identifying the major glass types, Gaussian Mixture Models (GMMs) are applied within each type to detect latent compositional subclusters. This probabilistic clustering approach captures subtle chemical heterogeneity and enables fine-grained subclass discovery. To validate the statistical robustness of these subclasses, Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) is subsequently employed. OPLS-DA separates predictive (between-class) variation from orthogonal (within-class) variation, thereby providing a rigorous evaluation of subclass separability.
Stage 4: Mechanistic Chemical Correlation Analysis
In the final stage, Mutual Information (MI)-based network analysis is conducted to quantify nonlinear statistical dependencies among chemical components. By constructing an information-theoretic network of elemental interactions, this analysis reveals co-varying chemical elements and potential structural or weathering-related relationships. This stage provides mechanistic insight into the compositional patterns observed in the classification and clustering results.
2.2. Data Overview
The data were acquired from a private archaeology database containing the chemical composition detection results of 58 glass cultural relics, comprising a total of 66 measurement points. For each glass relic, basic attributes including ornamentation, color, and glass type (high-potassium or lead-barium) were recorded. At each detection point, the degree of weathering and the contents of 14 chemical components were measured. These components included SiO2, Na2O, K2O, CaO, MgO, Al2O3, Fe2O3, CuO, PbO, BaO, P2O5, SrO, SnO, and SO2.
Overall, the dataset describes glass artifacts using four categories of information: glass type, ornamentation, color, and chemical composition. The glass type was classified into two groups: high-potassium and lead-barium. Ornamentation was simplified into three patterns (A, B, and C). Color was categorized into eight shades: light green, light blue, dark green, dark blue, purple, green, teal, and black.
Since the chemical composition variables represent relative proportions of oxides, the dataset constitutes compositional data. Direct application of conventional statistical methods to such data may lead to spurious correlations caused by the closure effect. Therefore, the Centered Log-Ratio (CLR) transformation [26] was applied to the chemical composition variables prior to subsequent analysis. For a composition vector $\mathbf{x} = (x_1, x_2, \ldots, x_D)$ with strictly positive components, the CLR transformation is defined as:

$$\operatorname{clr}(\mathbf{x}) = \left( \ln \frac{x_1}{g(\mathbf{x})}, \ln \frac{x_2}{g(\mathbf{x})}, \ldots, \ln \frac{x_D}{g(\mathbf{x})} \right),$$

where $g(\mathbf{x}) = \left( \prod_{i=1}^{D} x_i \right)^{1/D}$ denotes the geometric mean of all components. This transformation maps the compositional data from the simplex space to real space, thereby mitigating the closure effect and enabling the application of conventional statistical methods in subsequent analyses.
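As a minimal sketch (not the study's actual preprocessing code), the CLR transformation above can be implemented in a few lines of NumPy; the small epsilon added to guard against zero components is an assumption, since the text does not state how below-detection values were handled:

```python
import numpy as np

def clr_transform(x, eps=1e-9):
    """Centered log-ratio transform of a composition vector.

    Subtracting the mean of the logs is equivalent to dividing each
    component by the geometric mean before taking the log. The eps
    guard for zero components is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float) + eps
    log_x = np.log(x)
    return log_x - log_x.mean()

# Example: a hypothetical 4-part composition (percentages summing to 100)
composition = [69.3, 9.9, 6.3, 14.5]
z = clr_transform(composition)
print(z.sum())  # CLR vectors sum to (numerically) zero
```

A useful sanity check is that every CLR-transformed vector sums to zero, which reflects the one-dimensional constraint inherited from the simplex.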
2.3. Chi-Squared Test
The Chi-squared test is a statistical technique used to compare observed data with expected distributions based on a specific hypothesis. Its main purpose is to determine whether the differences between observed and expected values can be attributed to chance or if they suggest a significant relationship between variables [27,28,29]. The formula for Pearson's chi-squared test is

$$\chi^2 = \sum_{i} \sum_{j} \frac{\left( O_{ij} - E_{ij} \right)^2}{E_{ij}}.$$
The Yates' corrected chi-square test is appropriate when analyzing a 2 × 2 contingency table formed by two binary categorical variables, particularly when one or more cells have expected frequencies between 1 and 5 [30,31,32]. This correction reduces the risk of Type I errors by making the test more conservative. For our dataset, as certain expected cell frequencies fall within this range, we employed the Yates' continuity correction to adjust the chi-square test:

$$\chi^2_{\text{Yates}} = \sum_{i} \sum_{j} \frac{\left( \left| O_{ij} - E_{ij} \right| - 0.5 \right)^2}{E_{ij}},$$

where $O_{ij}$ and $E_{ij}$ represent the observed and expected frequencies for the cell in the $i$-th row and $j$-th column, respectively. The degrees of freedom ($df$) for a contingency table are defined as $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
However, while the chi-square test determines whether a statistically significant association exists between variables, the test statistic itself is highly sensitive to sample size and does not directly reflect the strength of the relationship. To address this, we calculated Cramér's V as a measure of effect size to assess the practical significance of the observed associations [29]. Cramér's V scales the chi-square statistic to a range between 0 and 1, where 0 indicates no association and 1 indicates a perfect relationship. For a 2 × 2 table, it is calculated as:

$$V = \sqrt{\frac{\chi^2}{N}},$$

where $N$ is the total number of observations. To evaluate the practical significance of the findings, the magnitude of the effect size was interpreted as follows [33]: weak ($V < 0.3$), moderate ($0.3 \le V < 0.5$), and strong ($V \ge 0.5$). This classification ensures that even when statistical significance is achieved, the actual strength of the association is rigorously assessed.
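The Yates-corrected statistic and Cramér's V defined above can be computed directly from a contingency table. The following NumPy sketch uses a hypothetical 2 × 2 table, not the study's data; by convention, Cramér's V is computed from the uncorrected statistic:

```python
import numpy as np

def yates_chi2_and_cramers_v(table):
    """Yates-corrected chi-squared statistic and Cramér's V for a 2x2 table."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    # Expected counts under independence: outer product of margins / n
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2_plain = ((obs - expected) ** 2 / expected).sum()
    chi2_yates = ((np.abs(obs - expected) - 0.5) ** 2 / expected).sum()
    v = np.sqrt(chi2_plain / n)  # min(r-1, c-1) = 1 for a 2x2 table
    return chi2_yates, v

# Hypothetical table: rows = glass type, columns = weathered / unweathered
chi2_c, v = yates_chi2_and_cramers_v([[20, 5], [10, 15]])
print(round(chi2_c, 2), round(v, 3))  # 6.75 0.408
```

Equivalent results can be obtained from `scipy.stats.chi2_contingency` with `correction=True`; the explicit version above makes the correction term visible.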
2.4. Prototypical Network
Prototypical Networks are popular few-shot solvers that aim at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks [34,35]. Their simplicity and computational efficiency make them an appealing alternative to more complex meta-learning algorithms for few-shot and zero-shot learning. As shown in Figure 2, the framework is composed of an embedding network that maps input samples into a low-dimensional feature space and a prototype metric classifier that performs classification based on the Euclidean distance between embedded samples and class prototypes.
The basic principle is as follows [36,37]: given a sample set $S$ and a query set $Q$, PN is achieved by first constructing the prototypes of all sample classes, then measuring the distance between the query data features and the class prototypes using a fixed function. The prototype for the $k$-th class, $\mathbf{c}_k$, is calculated as

$$\mathbf{c}_k = \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\phi(\mathbf{x}_i),$$

where $\mathbf{x}_i$ is the $i$-th sample of class $k$ and $f_\phi$ denotes the feature extraction module that extracts the data into a feature vector. Then, for a query sample $\mathbf{x}$ from $Q$, the probability that it belongs to the $k$-th class is calculated by a non-parametric softmax classifier:

$$p(y = k \mid \mathbf{x}) = \frac{\exp\left( -d\left( f_\phi(\mathbf{x}), \mathbf{c}_k \right) \right)}{\sum_{k'} \exp\left( -d\left( f_\phi(\mathbf{x}), \mathbf{c}_{k'} \right) \right)},$$

where $d(\cdot, \cdot)$ is a metric function. Taking the negative logarithm of the probability, the cross-entropy loss function can be computed as follows:

$$\mathcal{L} = -\log p\left( y = k \mid \mathbf{x} \right).$$
To implement the proposed Prototypical Network framework in this study, a lightweight embedding network based on a multilayer perceptron (MLP) was constructed to accommodate the small tabular dataset. The embedding network consists of two fully connected layers, where the 14-dimensional input features are first projected to a 16-dimensional hidden representation followed by a ReLU activation function, and then further transformed into an 8-dimensional embedding space used for metric-based classification. The model parameters were initialized using the default Kaiming uniform initialization in PyTorch (version 2.5.1). Training was performed using the Adam optimizer with a learning rate of 1 × 10−3 for 30 epochs and a batch size of 16, without applying dropout or weight decay. Class prototypes were computed as the mean embedding of all samples belonging to each class in the training set, and classification was carried out by measuring the Euclidean distance between the query embeddings and these prototypes. The architecture and hyperparameters were selected to match the limited scale of the dataset and to ensure stable and efficient convergence without extensive hyperparameter tuning.
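The prototype construction and nearest-prototype classification steps can be illustrated without the learned MLP embedding; in this hedged sketch, raw feature vectors stand in for the embeddings produced by f_phi, which is an assumption made purely for illustration:

```python
import numpy as np

def build_prototypes(X, y):
    """Class prototype = mean feature vector of each class's support samples."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(query, prototypes):
    """Assign the query to the class with the nearest Euclidean prototype.

    This is the argmax of the softmax-over-negative-distances rule,
    since the largest probability corresponds to the smallest distance.
    """
    labels = list(prototypes)
    dists = [np.linalg.norm(query - prototypes[c]) for c in labels]
    return labels[int(np.argmin(dists))]

# Toy support set: two classes in a 2-D "embedding" space
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
protos = build_prototypes(X, y)
print(classify(np.array([4.8, 5.2]), protos))  # 1
```

In the full model, `X` would hold the 8-dimensional embeddings produced by the trained two-layer MLP rather than raw features.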
2.5. Gaussian Mixture Model
Gaussian Mixture Modeling (GMM) is performed using the sklearn.mixture package in Python 3.10 to identify potential sub-clusters within the glass samples. To account for the varying scales of different chemical components, the multivariate chemical composition data (comprising 14 features) are first normalized using Z-score standardization [38,39]. Our approach treats GMM as a multivariate clustering method. The probability density of the data $\mathbf{x}$ is defined as a weighted sum of $K$ Gaussian distributions:

$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\left( \mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \right),$$

where $\pi_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ represent the mixing proportions, mean vectors, and full covariance matrices for each component, respectively. The model parameters $\theta = \{ \pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k \}_{k=1}^{K}$ are determined through the Expectation-Maximization (EM) algorithm, initialized via the k-means strategy [40].
Model selection, specifically the determination of the optimal number of Gaussian components, is based on minimizing the Bayesian Information Criterion (BIC), defined as:

$$\mathrm{BIC} = -2 \ln \hat{L}\left( M_K \right) + p \ln n,$$

where $\hat{L}(M_K)$ is the maximized likelihood function of model $M_K$ with $K$ components, with maximizing parameters $\hat{\theta}$ determined through the EM algorithm, $n$ is the sample size, and $p$ is the number of estimated parameters [41]. The BIC is a log-likelihood-based metric that includes a penalty term for model complexity, helping to prevent overfitting, and has been widely validated across diverse applications [42].
To further validate the clustering quality and ensure the mathematical validity of the identified subclasses, the Silhouette Coefficient and the Davies–Bouldin Index (DBI) are employed as quantitative internal evaluation metrics. The Silhouette Coefficient measures how similar an individual sample is to its assigned cluster relative to other clusters, with values ranging from −1 to 1, where a higher average value indicates superior cluster separation [43]. The Davies–Bouldin Index evaluates the average similarity between each cluster and its most similar one, where a lower value signifies a more distinct and compact clustering structure [44]. By identifying the number of components that simultaneously achieve a minimum BIC and a minimum DBI while maintaining a high Silhouette Coefficient, the most statistically robust classification of glass artifacts is determined.
To facilitate visualization and exploratory analysis, Principal Component Analysis (PCA) is applied to reduce the dimensionality of the multivariate chemical composition data. PCA projects the original high-dimensional feature space onto a lower-dimensional subspace spanned by the leading principal components, which capture the majority of data variance [45,46]. In this study, the first two principal components are retained and used for visualizing the distribution patterns and clustering tendencies of glass samples, providing an intuitive representation that complements the subsequent GMM analysis.
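The model-selection loop described above can be reproduced in miniature with scikit-learn; the synthetic two-cluster data below is an assumption standing in for the standardized 14-feature composition matrix:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, davies_bouldin_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: two well-separated groups of 4-feature samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (40, 4)), rng.normal(5, 0.5, (40, 4))])
X = StandardScaler().fit_transform(X)  # Z-score standardization

# Choose the number of components by minimum BIC
bics = {}
for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          init_params="kmeans", random_state=0).fit(X)
    bics[k] = gmm.bic(X)
best_k = min(bics, key=bics.get)

# Validate the chosen solution with internal cluster-quality metrics
labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)
print(best_k, round(silhouette_score(X, labels), 2),
      round(davies_bouldin_score(X, labels), 2))

# First two principal components for visualization, as in the study
X2 = PCA(n_components=2).fit_transform(X)
```

On real data the BIC, DBI, and Silhouette curves would be inspected jointly, as the text describes, rather than trusting the BIC minimum alone.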
2.6. OPLS-DA
Orthogonal projection to latent structures discriminant analysis (OPLS-DA) is a widely used statistical method in multivariate data analysis, particularly in class pattern recognition [47,48]. In OPLS-DA, the response $\mathbf{Y}$ is a dummy matrix that contains the information about class membership for each observation. OPLS separates the variation described by the model into two different parts: predictive and orthogonal. The predictive part is the variation in $\mathbf{X}$ that is used to model the variation in $\mathbf{Y}$. The orthogonal part contains variation in $\mathbf{X}$ that is unrelated to the response $\mathbf{Y}$. In the OPLS-DA context, the predictive part contains between-class variation while the orthogonal part contains within-class variation. By dividing the variation into two parts, the interpretation of the model becomes easier [49].
The OPLS-DA method can establish correlation models between different indices and samples, and subsequently screen indices that reflect sample differences, as represented by the variable importance in projection (VIP) values [50]. In addition, to evaluate the accuracy and reliability of the OPLS-DA model, permutation testing is commonly employed.
To ensure the robustness of the OPLS-DA model and to mitigate the potential risk of overfitting given the constraints of the sample size, a rigorous internal validation protocol was implemented. The predictive performance of the model was first assessed using a 7-fold cross-validation procedure. During this process, the dataset was partitioned into seven subsets, where each subset was systematically excluded and predicted by the remaining six subsets. This iterative process yielded the cumulative parameters $R^2Y(\mathrm{cum})$, which represents the fraction of the variation of $\mathbf{Y}$ explained by the model, and $Q^2(\mathrm{cum})$, which represents the fraction of the variation of $\mathbf{Y}$ that can be predicted by the model according to the cross-validation [51,52].
Furthermore, a permutation test with 200 iterations was conducted to evaluate the statistical significance of the classification. In each iteration, the class labels in the $\mathbf{Y}$ matrix were randomly shuffled while the chemical composition data in the $\mathbf{X}$ matrix remained constant, followed by the recalculation of the corresponding OPLS-DA model. The resulting $R^2Y$ and $Q^2$ values from the permuted models were compared against those of the original model. The model is considered statistically valid if the permuted $Q^2$ values are consistently lower than the original value and the intercept of the $Q^2$ regression line is below zero [52,53].
2.7. Mutual Information Network
Mutual information (MI) inherently addresses the challenge of fairly measuring statistical associations between paired variables. It serves as a fundamental measure of statistical dependence between two random variables, where higher MI values indicate stronger dependence. Previous studies have characterized MI as an objective function that promotes model fairness by maximizing the entropy of cluster proportions, while simultaneously enhancing firmness by minimizing conditional entropy [54]. Together, these properties demonstrate that mutual information provides a robust and intrinsically meaningful framework for interpreting the increasingly large and complex datasets encountered across a wide range of scientific and industrial applications [54,55,56].
Formally, MI quantifies the reduction in uncertainty of one random variable given knowledge of another. For two random variables $X$ and $Y$, mutual information is defined as:

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},$$

where $p(x, y)$, $p(x)$, and $p(y)$ are the joint and marginal probability distributions, respectively.
MI was estimated using the k-nearest neighbors (k-NN) estimator [57] implemented in scikit-learn's mutual_info_regression, with k set to 3. This non-parametric estimator avoids binning artifacts and is suitable for continuous variables with small sample sizes.
To enable fair per-class comparison, each class-specific MI matrix was independently min-max normalized to [0, 1] based on its own upper-triangular values (excluding diagonal). The differential score was computed to highlight class-specific differences.
Statistical robustness was assessed via bias-corrected and accelerated (BCa) bootstrap confidence intervals [58] (1000 resamples; a 90% confidence level was used to account for the low statistical power at n = 18). Pairs were considered significant if the 90% BCa CI strictly excluded zero.
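The k-NN MI estimation step can be reproduced with scikit-learn's `mutual_info_regression`; the oxide variables below are synthetic stand-ins used only to show that the estimator separates a dependent pair from an independent one:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 200
sio2 = rng.normal(size=n)                    # hypothetical oxide content
k2o = 0.8 * sio2 + 0.2 * rng.normal(size=n)  # strongly dependent variable
noise = rng.normal(size=n)                   # independent variable

# k-NN MI estimator with k = 3 neighbors, as described above
mi = mutual_info_regression(np.column_stack([k2o, noise]), sio2,
                            n_neighbors=3, random_state=0)
print(mi[0] > mi[1])  # dependence detected: MI(k2o) >> MI(noise)
```

In the study, pairwise MI values like these populate the class-specific matrices that are then min-max normalized and compared between classes.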
2.8. Evaluation Metrics
To quantitatively evaluate the performance of the proposed method, we adopt four widely used evaluation metrics: Accuracy, Precision, Recall, and F1-score. These metrics are computed based on the confusion matrix, which consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [59,60].
Accuracy measures the overall correctness of the model by calculating the proportion of correctly classified samples among all samples. It is defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Precision reflects the reliability of positive predictions, indicating the proportion of correctly predicted positive samples among all samples predicted as positive:

$$\text{Precision} = \frac{TP}{TP + FP}.$$

Recall, also known as sensitivity, measures the ability of the model to correctly identify positive samples. It is defined as the ratio of true positive samples to all actual positive samples:

$$\text{Recall} = \frac{TP}{TP + FN}.$$

F1-score is the harmonic mean of Precision and Recall, providing a balanced evaluation when there is a trade-off between the two metrics:

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
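The four metrics follow directly from the confusion-matrix counts; a small self-contained sketch with a hypothetical confusion matrix:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts: TP=8, TN=9, FP=2, FN=1 (20 samples total)
acc, prec, rec, f1 = classification_metrics(tp=8, tn=9, fp=2, fn=1)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.85 0.8 0.889 0.842
```

Note that the harmonic-mean form of F1 penalizes a large gap between Precision and Recall more strongly than an arithmetic mean would.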
3. Results and Discussion
3.1. Correlation Analysis
Chi-square tests were performed to evaluate the associations between weathering conditions and three categorical variables: glass type, color, and surface pattern. This analysis enabled us to determine whether observable deterioration features showed statistically significant relationships with specific compositional categories or visual attributes, thereby providing an initial quantitative assessment of variation in weathering behavior among different classes of glass.
The results (Table 1) indicate a statistically significant association between glass type and weathering, demonstrating that the occurrence of weathering differs significantly between glass types. To assess the practical significance of this relationship, Cramér's V was calculated as 0.344, indicating a moderate association between the variables. Specifically, lead-barium glass exhibits a markedly higher tendency to undergo weathering compared with high-potassium glass, suggesting that compositional differences may influence the susceptibility of glass to deterioration.
In contrast, glass color (Table 2) was not significantly associated with weathering based on the Yates-corrected chi-squared test. However, this non-significant result should be interpreted with caution, as the statistical power of the test may be limited. Therefore, while no strong association between glass color and weathering is detected in this dataset, the possibility of a true relationship cannot be excluded.
The association between surface pattern and weathering (Table 3) was marginal but not statistically significant, suggesting a possible but inconclusive relationship between decorative style and weathering behavior. Given the low expected frequencies in some categories and the limited statistical power of the analysis, this result should be interpreted as suggestive rather than definitive evidence against an association. Furthermore, the analysis is based on a relatively small dataset, which may not fully represent the broader population of ancient glass artifacts. Consequently, the observed pattern should be interpreted with caution.
Within the scope of the present dataset, the findings suggest that chemical composition appears to play a dominant role in the weathering processes of ancient glass, whereas visual attributes such as color and decorative patterns exhibit comparatively weaker or indirect associations. Further investigations based on larger and more comprehensive datasets are required to validate and generalize these observations.
3.2. Classification of Glass Types
Based on the chi-squared test results, which indicated that chemical composition plays a dominant role in the weathering behavior of ancient glass, the Prototypical Network was further evaluated for classifying lead-barium glass and high-potassium glass using 14 chemical components as input features. During training, the model exhibited rapid and stable convergence (Figure 3). The initial classification accuracy reached 81.25% with a loss of 0.5502 in the first epoch, and accuracy improved steadily to 100% by the 13th epoch, accompanied by a continuous decrease in training loss to 0.0475. These results demonstrate that chemically informed feature representations enable effective discrimination between the two glass types.
On the test set, the model achieved 100% classification accuracy (Figure 4), demonstrating that it effectively captures the characteristic chemical feature distributions of each glass type. These results indicate that the Prototypical Network is highly effective for few-shot classification of materials, achieving both fast convergence and robust generalization on limited multicomponent chemical data.
To mitigate the potential influence of data partitioning on model performance and stability, different random seeds were used to generate multiple train–test splits of the dataset. This strategy reduces the potential bias associated with a single random partition and provides a more robust evaluation of the model. For each randomized split, the dataset was first divided into training and test subsets. Feature scaling was then performed using a StandardScaler (scikit-learn 1.3.2) fitted exclusively on the training data, and the resulting parameters were subsequently applied to transform both the training and the corresponding test set, thereby preventing any potential data leakage. Based on this procedure, a total of 10,000 independent training and evaluation runs were conducted. The final model performance was summarized by reporting the average values of four evaluation metrics (accuracy, precision, recall, and F1-score), as presented in Table 4.
The experimental results demonstrate that the proposed Prototypical Network consistently achieves strong classification performance. Specifically, the model attains an average accuracy of 0.9674, an average precision of 0.9837, an average recall of 0.9720, and an average F1-score of 0.9766, indicating stable and reliable behavior across diverse data partitions.
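The leakage-free evaluation protocol described above can be sketched as follows. This is a minimal illustration with synthetic stand-in data (the study uses the 14-oxide composition table), scikit-learn's NearestCentroid substituting for the learned-embedding Prototypical Network, and 100 rather than 10,000 runs; the key point is that the StandardScaler is fitted on the training split only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for the 14-component composition data (two separable classes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 14)), rng.normal(3, 1, (30, 14))])
y = np.array([0] * 30 + [1] * 30)

scores = []
for seed in range(100):                                  # the study performs 10,000 runs
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    scaler = StandardScaler().fit(X_tr)                  # fitted on training data only
    X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)  # no leakage
    y_hat = NearestCentroid().fit(X_tr_s, y_tr).predict(X_te_s)
    scores.append([accuracy_score(y_te, y_hat), precision_score(y_te, y_hat),
                   recall_score(y_te, y_hat), f1_score(y_te, y_hat)])
acc, prec, rec, f1 = np.mean(scores, axis=0)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Averaging the four metrics over seeds, as done here, is what yields the Table 4 summary statistics.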
To ensure a fair comparison, the widely adopted CART (Classification and Regression Tree) decision tree [61,62,63] was evaluated using the same experimental protocol, with identical data preprocessing procedures, random resampling strategies, and evaluation metrics. The average performance of the Prototypical Network was found to be highly comparable to that of CART. Considering that the validation set contains only 14 samples, the observed differences in average performance between the two models are small enough to be regarded as negligible.
Importantly, beyond average performance, we further examine the models under worst-case data partition scenarios. Under these unfavorable conditions, the Prototypical Network demonstrates a consistent performance advantage of approximately 10% over the decision tree model across the evaluated metrics. This result indicates a markedly improved level of robustness and stability with respect to adverse sample distributions, which is particularly desirable in small-sample and data-sensitive classification tasks.
To further evaluate the robustness of the Prototypical Network, a sensitivity analysis was conducted using the pre-trained model, which achieved 100% accuracy on the original test set. Random perturbations of varying magnitudes (±0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1) were independently applied to each chemical component in the test data, and the model was evaluated over 100 repeated runs at each noise level. The analysis revealed that only perturbations in SiO2 resulted in observable changes in classification outcomes, whereas variations in the other chemical components had negligible effects. The corresponding frequencies of classification changes across the different perturbation levels are reported in Table 5.
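A generic version of this sensitivity procedure can be sketched as below. The noise model (additive uniform noise applied to one feature at a time) and the toy threshold classifier are illustrative assumptions; the study applies the perturbations to the trained Prototypical Network.

```python
import numpy as np

def perturbation_sensitivity(model_predict, X_test, noise_levels, n_runs=100, rng=None):
    """Count, per (feature, noise level), how many of n_runs perturbed
    evaluations change at least one predicted label."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = model_predict(X_test)                    # reference predictions
    changes = {}
    for j in range(X_test.shape[1]):
        for eps in noise_levels:
            flips = 0
            for _ in range(n_runs):
                Xp = X_test.copy()
                # Assumed noise model: additive uniform noise on feature j only
                Xp[:, j] += rng.uniform(-eps, eps, size=len(Xp))
                if np.any(model_predict(Xp) != base):
                    flips += 1
            changes[(j, eps)] = flips
    return changes

# Toy threshold classifier standing in for the trained Prototypical Network
predict = lambda X: (X[:, 0] > 0.5).astype(int)
X_test = np.array([[0.45], [0.60]])
changes = perturbation_sensitivity(predict, X_test, noise_levels=[0.01, 0.2])
print(changes[(0, 0.01)], changes[(0, 0.2)])  # small noise never flips; large noise does
```

The flip counts per feature and noise level correspond to the change frequencies reported in Table 5.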
The near-exclusive sensitivity to SiO2 perturbations highlights its dominant discriminative role in the embedding space, consistent with its position as the primary network former in silicate glasses. Importantly, this does not imply that the model reduces to a univariate SiO2 threshold; the embedding is learned jointly across all oxides, enabling robust multivariate separation even under adverse data partitions (Table 4).
3.3. Subclass Analysis and Validation
Using lead-barium glass as a case study, Gaussian Mixture Models (GMM) were employed to investigate potential subclass structures. As summarized in Table 6, model selection was performed rigorously by evaluating cluster numbers from n = 1 to 6. The Bayesian Information Criterion (BIC) reached its global minimum at n = 3 (958.11), providing strong statistical evidence for a three-component model. This selection was further validated by quantitative internal metrics: the tri-cluster configuration achieved a Silhouette Coefficient of 0.2062 and a Davies–Bouldin Index (DBI) of 1.5901. Although higher-order models showed slight improvements in geometric separation, their markedly elevated BIC values indicated a high risk of over-parameterization. Consequently, the n = 3 model was identified as the most robust and parsimonious representation, effectively balancing goodness-of-fit with model complexity.
Subsequent Principal Component Analysis (PCA) was performed for visualization (Figure 5), revealing clear separation among the three clusters in the reduced feature space. These results demonstrate that GMM, combined with dimensionality reduction, can effectively uncover latent subclass structures in lead-barium glass based on chemical composition.
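The model-selection loop described above can be reproduced schematically with scikit-learn. This sketch uses synthetic three-group data in place of the standardized lead-barium compositions, so the metric values differ from Table 6; the workflow (BIC minimization across n = 1–6, internal validation indices, then a 2-D PCA projection) is the same.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, davies_bouldin_score
from sklearn.decomposition import PCA

# Synthetic stand-in for the standardized lead-barium compositions (3 latent groups)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (40, 5)) for m in (0.0, 2.0, 4.0)])

results = {}
for n in range(1, 7):                                # evaluate n = 1 .. 6 components
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    row = {"bic": gmm.bic(X)}
    if n > 1:                                        # internal indices need >= 2 clusters
        labels = gmm.predict(X)
        row["silhouette"] = silhouette_score(X, labels)
        row["dbi"] = davies_bouldin_score(X, labels)
    results[n] = row

best_n = min(results, key=lambda n: results[n]["bic"])   # global BIC minimum
print("BIC-optimal number of components:", best_n)

# 2-D PCA projection of the selected clustering, as used for Figure-5-style plots
labels = GaussianMixture(n_components=best_n, random_state=0).fit_predict(X)
X2 = PCA(n_components=2).fit_transform(X)
```

Choosing the BIC minimum rather than the best silhouette value is what guards against the over-parameterization noted for the higher-order models.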
The OPLS-DA model achieved clear separation among the three subclasses of lead-barium glass. The score plot (Figure 6a) revealed distinct clustering patterns, indicating that the chemical compositions of the subclasses are systematically different [64,65]. The model showed strong explanatory and predictive performance, with high R²X, R²Y, and Q² values, suggesting good model fitness and reliability. The permutation test (Figure 6b) confirmed the robustness of the model [65]: all permuted R² and Q² values were lower than the original values, and the Q² regression intercept was −0.333, demonstrating the absence of overfitting. The VIP plot (Figure 6c) identified the main variables contributing to class discrimination. Components such as BaO, SiO2, and PbO exhibited VIP > 1, indicating their significant influence on the differentiation of subclasses. These compositional differences likely reflect variations in raw material sources or production techniques within the lead-barium glass group.
Based on the classification results and compositional characteristics, the lead-barium glass samples can be divided into three subclasses:
Subclass 1, characterized by high PbO content;
Subclass 2, with high SiO2 and low PbO;
Subclass 3, with high BaO and low SiO2.
To quantify the tolerance of the lead-barium glass subclassification to numerical variation in individual compositional variables, a feature-wise perturbation analysis was conducted by introducing bidirectional multiplicative fluctuations in the original data space. For a given compositional feature with original value x, the perturbed value x′ was generated according to
x′ = x · (1 + u), u ∼ U(−δ, δ),
where δ denotes the noise amplitude. The noise level δ was progressively increased from small values (0.01 and 0.02), through intermediate values (0.05, 0.1, and 0.2), to larger amplitudes spanning 0.3–0.9 and ultimately reaching 1.0, thereby covering a wide range of symmetric positive and negative numerical fluctuations.
For each feature and each noise level δ, 100 independent perturbation tests were performed. After perturbation, the data were standardized using the scaler derived from the original dataset and reclustered using a fixed three-component GMM. Subclass assignments were compared with the original clustering after label alignment. The acceptable noise range was conservatively defined as the maximum value of δ for which the subclassification remained completely unchanged across all repeated tests; once at least one sample exhibited a different subclass label, the model was regarded as unstable for that feature.
As a result, the feature-specific acceptable noise ranges derived under this criterion are summarized in Table 7. The tolerable fluctuation amplitudes vary substantially among chemical components, reflecting pronounced differences in their influence on the stability of lead-barium glass subclassification. SO2 exhibits the most restricted tolerance (±0.05), indicating that even minimal bidirectional numerical fluctuations in this component may change subclass assignments. Several components, including BaO, Na2O, PbO, SiO2, and SnO2, also display low acceptable noise ranges (±0.1), suggesting that these variables exert strong control on subclass boundaries and that relatively small positive or negative deviations can affect the clustering outcome.
In comparison, Al2O3 and CaO show moderate tolerance (±0.2), while P2O5 exhibits a slightly higher threshold (±0.3), indicating a more balanced contribution to the clustering structure. Components such as CuO, Fe2O3, K2O, MgO, and SrO maintain stable subclass assignments under substantially larger perturbations (±0.4–0.5), implying a comparatively weaker influence on subclass differentiation within the explored fluctuation range.
From an analytical perspective, components characterized by low acceptable noise thresholds—particularly SO2, PbO, Na2O, BaO, SiO2, and SnO2—warrant special attention in measurement strategies, as their numerical variability can substantially influence compositional interpretation. Improving the analytical reliability of these components, for example through optimized calibration procedures, enhanced signal stability, or refined quantification approaches, would contribute to a more accurate characterization of lead-barium glass and thereby strengthen the overall reliability of related compositional analyses and subclassification studies.
3.4. Chemical Correlation Analysis
A more detailed examination of the internal compositional structure of lead-barium glass revealed several pronounced and systematically recurring elemental correlations that shed light on the material’s chemical behavior during both formation and weathering. As shown in Figure 7, SiO2 exhibited strong negative correlations with PbO, P2O5, and SrO, suggesting that an increase in silica content is typically accompanied by a decrease in these oxides. In contrast, CuO showed a strong positive correlation with BaO, and BaO was also positively correlated with SO2, indicating possible co-variation in their raw material sources or melting behavior. Additionally, CaO and P2O5 displayed a strong positive correlation, implying a possible association in the glass network structure.
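A correlation matrix of this kind can be computed in a few lines; assuming Pearson coefficients (the exact method behind Figure 7 is not stated here), the sketch below uses synthetic compositions constructed only to reproduce the reported signs, not the actual values.

```python
import numpy as np

# Illustrative synthetic compositions reproducing the reported correlation signs
rng = np.random.default_rng(4)
sio2 = rng.uniform(30, 60, 50)
pbo = 70 - sio2 + rng.normal(0, 2, 50)         # SiO2 vs PbO: strongly negative
cao = rng.uniform(1, 5, 50)
p2o5 = 0.5 * cao + rng.normal(0, 0.2, 50)      # CaO vs P2O5: strongly positive

# Pairwise Pearson correlation matrix over the four oxides
names = ["SiO2", "PbO", "CaO", "P2O5"]
corr = np.corrcoef(np.vstack([sio2, pbo, cao, p2o5]))
print({"SiO2-PbO": round(corr[0, 1], 2), "CaO-P2O5": round(corr[2, 3], 2)})
```

The full oxide-by-oxide matrix, rendered as a heatmap, yields a Figure-7-style visualization.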
To further investigate the inter-element relationships in the two types of ancient glass, a mutual information (MI)-based differential network analysis was performed. Separate MI matrices were first computed for the two glass categories, and their normalized difference matrix was used to construct the network. Edges were retained only when the bootstrap confidence interval of the differential score excluded zero, indicating statistically significant differences in association strength between the two glass types.
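The differential-network construction can be sketched with scikit-learn's kNN-based mutual information estimator. The max-normalization of each MI matrix and the per-class bootstrap resampling shown here are plausible readings of the procedure rather than the authors' exact implementation; the toy data couple two features in one class only.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_matrix(X, seed=0):
    """Symmetric pairwise mutual-information matrix (kNN estimator)."""
    p = X.shape[1]
    M = np.zeros((p, p))
    for j in range(p):
        M[:, j] = mutual_info_regression(X, X[:, j], random_state=seed)
    return (M + M.T) / 2

def differential_edges(X0, X1, n_boot=20, alpha=0.05, seed=0):
    """Retain edges whose bootstrap CI of the normalized MI difference excludes zero."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        b0 = X0[rng.integers(0, len(X0), len(X0))]   # resample each class separately
        b1 = X1[rng.integers(0, len(X1), len(X1))]
        M0, M1 = mi_matrix(b0), mi_matrix(b1)
        # Normalize each matrix by its maximum before differencing (assumed choice)
        diffs.append(M1 / M1.max() - M0 / M0.max())
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return (lo > 0) | (hi < 0)                       # True where the CI excludes zero

# Toy data: features 0 and 1 are tightly coupled in class 1 only
rng = np.random.default_rng(5)
X0 = rng.normal(0, 1, (80, 3))                       # independent features
X1 = rng.normal(0, 1, (80, 3))
X1[:, 1] = X1[:, 0] + rng.normal(0, 0.1, 80)         # strong 0-1 dependence in class 1
edges = differential_edges(X0, X1)
print(edges[0, 1])  # the 0-1 edge differs significantly between classes
```

Edges surviving this confidence-interval filter are the ones drawn in the differential network.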
In the lead-barium glass (Class 1), the network is dominated by PbO and SiO2, which show the strongest positive differential association. Significant associations also emerged between SiO2 and SrO, as well as between Fe2O3 and PbO. These high positive differential scores indicate that the lead-silicate framework in Class 1 is highly cohesive, with PbO acting as the primary flux that strongly co-varies with the glass-forming SiO2 and the stabilizing SrO [66].
In contrast, the high-potassium glass (Class 0) exhibits a more complex, multi-component dependency network. The glass former SiO2 emerged as a central hub, showing significantly stronger associations (negative differential scores) with Al2O3, CaO, and K2O. Additionally, strong dependencies were observed between Fe2O3 and P2O5, and between K2O and CaO. This pattern suggests that in the potassium-based system the chemical structure depends more on an integrated aluminosilicate and calcium-potassium-silicate framework, potentially reflecting the use of plant ash or specific mineral fluxes in which multiple oxides are introduced simultaneously [67,68].
Overall, the MI network analysis (Figure 8) highlights a fundamental transition in chemical integration: the lead-barium glass is characterized by a focused lead-silicate interaction, whereas the high-potassium glass relies on a more distributed and interdependent alkali-calcium-aluminosilicate network.
While the present framework successfully discriminates weathering patterns based on oxide composition alone, glass deterioration is inherently an interaction between material intrinsic properties and extrinsic environmental conditions. Prior studies have shown that environmental factors strongly influence the kinetics and extent of alteration, whereas composition primarily governs the qualitative nature of weathering products and relative durability. The absence of site-specific environmental metadata in our dataset precludes modeling these interactions directly. Future extensions incorporating environmental context variables would enable more holistic predictive modeling of degradation risks.