Article

Integrated Compositional Modeling and Machine Learning Analysis of REE-Bearing Coal Ash from a Weathered Dumpsite

1
Institute of Combustion Problems, Almaty 050012, Kazakhstan
2
Faculty of Chemistry and Chemical Technology, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
*
Author to whom correspondence should be addressed.
Minerals 2025, 15(7), 734; https://doi.org/10.3390/min15070734
Submission received: 13 May 2025 / Revised: 19 June 2025 / Accepted: 12 July 2025 / Published: 14 July 2025
(This article belongs to the Section Mineral Processing and Extractive Metallurgy)

Abstract

Coal combustion residues are increasingly viewed as alternative sources of rare earth elements (REEs), but their heterogeneous composition and post-depositional alteration complicate resource evaluation. This study analyzes 50 coal ash (CA) samples collected from a weathered dumpsite near Almaty, Kazakhstan, originating from power generation using coal from the Ekibastuz Basin. A multi-method approach—comprising bulk chemical characterization, unsupervised clustering, X-ray diffraction (XRD), scanning electron microscopy (SEM), and supervised machine learning (ML)—was applied to identify consistent indicators of REE enrichment. While conventional regression models failed to predict individual REE concentrations accurately, ML algorithms consistently highlighted vanadium (V) as the most robust predictor of ΣREE across Random Forest, XGBoost, and LASSO. This suggests that V may act as a geochemical proxy for REE-bearing phases, potentially due to co-retention in amorphous or ferruginous matrices. Despite compositional similarity among many samples, XRD and SEM revealed marked variability in phase structure and crystallinity, underscoring the limitations of bulk oxide data alone. These findings demonstrate that REE behavior in ash cannot be predicted deterministically, but ML can be used to screen for informative compositional signals. The proposed workflow may support the preliminary classification and valorization of heterogeneous ash materials in secondary resource strategies.

1. Introduction

Coal combustion continues to play an important role in global energy production, particularly in rapidly industrializing regions. However, this process generates vast amounts of coal ash (CA), which is a complex mixture of inorganic residues that includes both fly ash and bottom ash [1,2,3]. CA formation creates significant environmental and management challenges. At the same time, CA is increasingly recognized as a promising secondary resource for the extraction of strategically important elements, particularly rare earth elements (REEs) [4,5,6,7] and germanium (Ge) [8,9,10].
REEs such as cerium (Ce), lanthanum (La), yttrium (Y), and scandium (Sc) are essential for a wide range of advanced applications, including permanent magnets, batteries, catalysts, lasers, and phosphors [11,12,13,14]. Sc, although often grouped with REEs, exhibits distinct behavior and is particularly valued for its use in high-strength aluminum alloys and solid oxide fuel cells [15,16,17,18]. Ge is another critical element with high economic value, owing to its use in fiber optics, infrared imaging systems, semiconductors, and photovoltaics [19,20,21,22]. The global supply of both REEs and Ge is geographically concentrated, raising concerns over resource security, supply chain vulnerability, and price volatility.
Recent studies have demonstrated that CA, especially the fine-grained fly ash fraction, can contain economically relevant concentrations of REEs and Ge [23,24,25,26]. However, the distribution, enrichment mechanisms, and mineralogical associations of these elements are highly variable and often poorly understood [27,28,29]. Factors such as the mineral composition of the parent coal, combustion temperature, ash particle size, and redox environment all influence the retention and partitioning of REEs and Ge into the ash residue. In this context, a deeper understanding of the multivariate chemical environment that governs the mobility and enrichment of target elements is essential for developing viable extraction technologies.
In recent years, machine learning (ML) techniques have gained momentum in coal-related research, offering powerful tools for predicting material properties and critical element content from easily measurable parameters. ML models have been used to estimate ash content [30], amorphous phase fraction [31], and metal recovery potential [32], as well as to support the design of ash-based construction materials [33,34]. Specifically in the context of critical elements, several studies have focused on predicting the concentrations of rare earth elements (REEs) in coal itself. Chatterjee et al. demonstrated the use of support vector machines and neural networks, combined with data augmentation techniques, to classify Indiana coal samples with high REY (rare earths and yttrium) potential using routine coal parameters [35]. Their follow-up work expanded this approach through geostatistical mapping and uncertainty modeling, enabling identification of coal zones with economically viable REE potential [36]. In parallel, Bhatt et al. (2025) applied spectroscopy-based methods such as LIBS and LA-ICP-MS coupled with principal component analysis (PCA) and multivariate regression to quantify and image REE distribution in coal samples [37].
Beyond raw coal, ML has also been applied directly to CA, where REE and Ge redistribution during combustion introduces further complexity. Song et al. proposed a multi-task neural network to predict the content of multiple REEs in CAs using bulk oxide composition as input, enabling fast and cost-effective screening of ash feedstocks [38]. Xu and Li (2019) addressed Ba–Eu spectral interference in coal combustion products using ML-based predictive thresholds for accurate interpretation of REE data in ICP-MS analysis [39]. These works underscore the growing utility of ML both upstream and downstream in the coal utilization chain. Despite these advances, there remains a lack of studies that integrate ML with geochemical and mineralogical interpretation of REE and Ge behavior in post-combustion residues, especially for high-ash coals.
The present study aims to bridge this gap by combining multivariate statistics, clustering, and ML to assess REE and Ge enrichment patterns, compositional predictors, and selective recovery strategies in CA samples derived from Ekibastuz Basin coal in Kazakhstan. Its primary goals are to elucidate the multivariate chemical controls on REE and Ge behavior in this ash; to identify key predictors of their distribution using both statistical and ML tools; and to propose selective extraction strategies tailored to the geochemical characteristics of distinct ash types. By linking compositional fingerprinting with recovery potential, this work contributes to the broader effort to develop efficient resource recovery technologies for critical elements from industrial waste streams.

2. Materials and Methods

2.1. Coal Ash Sampling

Sampling was carried out at the CA dump of Almaty TPP-2, which operates on Ekibastuz coal. The location of the ash dump on the map with the sampling area (red circle) is shown in Figure 1. The sampled material, originating from surface ash dumps, represents a heterogeneous mixture of fly ash and bottom ash residues. It should be noted that, since the samples were collected from an open-air dumpsite, secondary or weathering-derived phases may contribute to the observed REE distribution. Additionally, meteoric water exposure could influence REE mobility and potentially cause vertical variations in concentration.
A sampling plot was selected within the area marked on the map, and a square with sides of about 20 m was laid out visually. On each side of the square, 5 points (including the vertices) were marked at 5 m intervals, as shown in Figure 2.
A drill was used to collect samples at each marked point, at the surface of the dump and at depths of 0.5, 1.0, 1.5, and 2.0 m. A total of 125 samples, each weighing 1.5–2.5 kg, were collected. Each sample was numbered, placed in a separate polyethylene bag, and then processed as follows. At each of the 25 locations, two composite samples were prepared. The first (Sample A) combined material from the surface and from a depth of 1.5 m. The second (Sample B) combined material from depths of 0.5 and 2.0 m. A portion collected at 1.0 m depth was added to both A and B to ensure uniformity across depth intervals. In total, 50 composite samples were obtained and subjected to further analysis.
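The sample bookkeeping described above can be sketched in a few lines of Python. This is an illustration only: the 5 x 5 grid layout, the coordinate values, and the variable names are assumptions, and Sample B is assumed to draw on the deepest drilled interval (2.0 m).

```python
from itertools import product

# Assumed layout: a 5 x 5 grid of drill points at 5 m spacing (25 locations).
GRID_COORDS_M = [0, 5, 10, 15, 20]
DEPTHS_M = [0.0, 0.5, 1.0, 1.5, 2.0]  # surface plus four drilled depths

# Depth intervals combined into each composite; the 1.0 m portion goes into both.
# Assumption: Sample B uses the deepest drilled interval (2.0 m).
COMPOSITE_DEPTHS_M = {
    "A": [0.0, 1.5, 1.0],
    "B": [0.5, 2.0, 1.0],
}

locations = list(product(GRID_COORDS_M, GRID_COORDS_M))
primary_samples = [(x, y, d) for (x, y) in locations for d in DEPTHS_M]

# One entry per composite: which primary samples are blended into it.
composites = {
    (x, y, label): [(x, y, d) for d in depths]
    for (x, y) in locations
    for label, depths in COMPOSITE_DEPTHS_M.items()
}

print(len(locations), len(primary_samples), len(composites))  # 25 125 50
```

The counts reproduce the totals stated in the text: 25 locations, 125 primary samples, and 50 composites.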
The CA samples were dry, loose powders consisting largely of spherical particles; their color ranged from light gray to dark gray depending on the sampling location.

2.2. Analytical Methods for Coal Ash Characterization

Elemental analysis of the CA samples was performed as follows. The samples were mixed with a NaOH–Na2CO3 mixture (1:1 by weight) in a 5-fold excess and fused at 500 °C for 1 h. The resulting fusion cakes were subjected to complete acid digestion using a microwave-assisted system with a mixture of hydrochloric acid (36%) and hydrogen peroxide at 80 °C for 2 h. The concentrations of elements in the resulting solutions were determined by atomic absorption spectroscopy (AAS, GBC Scientific Equipment Pty Ltd., Melbourne, Australia), except for silicon and phosphorus, which were measured by inductively coupled plasma optical emission spectroscopy (ICP-OES, Optima 8300, PerkinElmer, Waltham, MA, USA). Each determination was performed in triplicate, with relative standard deviations not exceeding 3%. The elemental compositions of samples 1–50 are presented in Table S1.
The phase composition and crystallinity of the CA samples were determined by X-ray diffraction (XRD) using Cu-Kα radiation in the 2θ range of 10–80°. The amorphous phase content was estimated semi-quantitatively for all samples as the difference between the total diffracted area (10–50° 2θ) and the integrated area of the crystalline peaks above the baseline.
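The area-difference estimate of the amorphous fraction can be illustrated with a short sketch. It assumes that a background-subtracted pattern and a fitted crystalline-peak contribution are available on the same 2θ grid; the function names, inputs, and toy numbers are hypothetical, and the authors' actual peak-fitting procedure is not described in the text.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def amorphous_fraction(two_theta, total_intensity, crystalline_intensity, lo=10.0, hi=50.0):
    """Share of the diffracted area in [lo, hi] not assigned to crystalline peaks.

    total_intensity: background-subtracted pattern (hypothetical input).
    crystalline_intensity: fitted peak contribution on the same grid (hypothetical input).
    """
    idx = [i for i, angle in enumerate(two_theta) if lo <= angle <= hi]
    x = [two_theta[i] for i in idx]
    total = trapezoid_area(x, [total_intensity[i] for i in idx])
    crystalline = trapezoid_area(x, [crystalline_intensity[i] for i in idx])
    return (total - crystalline) / total


# Toy pattern: flat total signal with two triangular "peaks" covering half the area.
two_theta = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
total = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
peaks = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]

fraction = amorphous_fraction(two_theta, total, peaks)
print(f"amorphous fraction: {fraction:.2f}")  # amorphous fraction: 0.50
```

Points outside the 10–50° window (here the value at 60°) are ignored, mirroring the integration range given in the text.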
The surface morphology of the material was examined by scanning electron microscopy (SEM) using a Quanta 200i 3D microscope (FEI Company, Hillsboro, OR, USA).
In addition, the following characteristics were determined separately for the coal ash samples: Fe(II) content, soluble alumina, acid-insoluble residue, loss on ignition (LOI), unburned carbon, and free CaO.
The content of ferrous iron (Fe2+) in the CA samples was determined by redox titration. The samples were digested in sulfuric acid and, if necessary, ferric iron was reduced to ferrous iron using ascorbic acid. The resulting Fe2+ was titrated with a standard potassium permanganate solution until a persistent pink endpoint was reached.
Soluble alumina (Al2O3) in CA samples was evaluated by leaching the sample with dilute hydrochloric acid (1:5), followed by filtration and analysis of the leachate using complexometric titration with EDTA.
The acid-insoluble residue was determined gravimetrically. The sample of CA was treated with a mixture of hydrochloric and nitric acids to dissolve soluble components. The residue was filtered, dried, and weighed to quantify refractory minerals, which are expected to be dominated by quartz and aluminosilicate phases [40].
Loss on ignition (LOI) was measured by heating a known mass of the sample of CA in a muffle furnace at 950–1000 °C for 1–2 h.
Unburned carbon content in the CA samples was estimated from LOI, under the assumption that the loss primarily results from the combustion of residual carbonaceous matter. However, due to the heterogeneous composition of the CA samples, including possible minor contributions from carbonate decomposition and other volatile phases, the estimated unburned carbon values in some cases exceed the measured LOI.
Free calcium oxide (free CaO) in the CA samples was determined by ethylene glycol extraction. The sample was treated with ethylene glycol, which selectively dissolves free CaO. The extract was titrated with standard hydrochloric acid using phenolphthalein as an indicator.
The contents of Fe2+, soluble Al2O3, insoluble residue, LOI, unburned carbon, and free CaO are presented in Table S2.

2.3. Data Processing and Analytical Workflow

The analytical dataset included 50 CA samples characterized by 22 attributes, incorporating both major and trace elements, as well as derived analytical indicators such as loss on ignition, unburned carbon, and free CaO content (Table S3). Data processing was performed in Python 3.10 in the Google Colab environment using standard scientific libraries, including pandas, NumPy, Matplotlib 3.7.1, seaborn 0.12.2, and scikit-learn.
To explore multivariate relationships between chemical components, unsupervised clustering techniques were employed. In particular, agglomerative hierarchical clustering (AHC) was used with Ward’s linkage and multiple distance metrics, including Euclidean, Manhattan, and cosine distance. This allowed for the identification of chemical affinity groupings without assuming linearity or normality. The number and quality of clusters were assessed using dendrograms and silhouette coefficients. Dimensionality reduction was also performed using principal component analysis (PCA) to visualize underlying trends and data structure.
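A minimal sketch of Ward-linkage clustering with silhouette-based selection of the cluster count is given below. It uses synthetic data in place of the 50 x 22 dataset, and all variable names and parameter choices are assumptions. SciPy's Ward method is defined for Euclidean distances, so this sketch does not cover the Manhattan or cosine variants mentioned above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 50 samples x 22 attributes in two offset groups.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(25, 22)),
    rng.normal(4.0, 1.0, size=(25, 22)),
])

# Standardize so that no attribute dominates the distance calculation.
X_scaled = StandardScaler().fit_transform(X)

# Ward linkage merges the pair of clusters giving the smallest increase
# in within-cluster variance (Euclidean distances).
Z = linkage(X_scaled, method="ward")

# Cut the tree at several cluster counts and score each partition.
scores = {}
for k in range(2, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On real ash data the silhouette maximum is usually less clear-cut than in this toy case, which is why the dendrogram is inspected alongside the coefficient.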
Supervised ML models were applied to assess which chemical attributes most significantly influence the observed variability in REE and Ge content. Regression-based models, including Random Forest, XGBoost, and LASSO, were tested using 5-fold cross-validation. Due to the heterogeneous nature of the ash material and expected noise, model performance was evaluated not only by R2 scores but also by interpretability. To this end, SHAP (SHapley Additive Explanations) values were used to estimate the relative importance of each input variable. The focus of the ML analysis was not on precise prediction but on identifying compositional attributes consistently associated with REE and Ge enrichment.
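The cross-validation setup can be sketched as follows, assuming scikit-learn and synthetic data in which a single feature carries the signal. The hyperparameters, random seeds, and variable names are illustrative assumptions, not the settings used in the study; XGBoost and SHAP are omitted to keep the example dependency-light.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: 50 samples x 22 attributes; only feature 0 drives the target.
X = rng.normal(size=(50, 22))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    # Scaling inside the pipeline keeps each fold free of information leakage.
    "LASSO": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
}

# Mean 5-fold R2 per model.
cv_r2 = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    cv_r2[name] = float(fold_scores.mean())

# Rank features by absolute LASSO coefficient on the full dataset.
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
coefs = lasso.named_steps["lasso"].coef_
top_feature = int(np.argmax(np.abs(coefs)))

print(cv_r2, top_feature)
```

With only 50 samples, fold-to-fold scatter in R2 can be large, so the mean score is best read together with the per-fold values.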
This clustering strategy and the use of alternative similarity metrics follow recommendations from recent studies on combustion residues and coal ash geochemistry [41,42,43].

3. Results and Discussion

3.1. General Chemical Composition and Variability in REE Content in CA

The overall oxide composition of the studied ash samples is presented in Figure 3.
The material is dominated by silica (SiO2) and alumina (Al2O3), with average concentrations of approximately 62 wt % and 28 wt %, respectively. Iron oxide (Fe2O3) reaches around 8 wt %, while P2O5, TiO2, CaO, and MgO occur at notably lower levels, generally below 1 wt %. The low dispersion in SiO2 and Al2O3 contents reflects a relatively homogeneous aluminosilicate framework, whereas Fe2O3 and P2O5 show broader variability (see Table S1), possibly due to the heterogeneous nature of combustion residues and secondary weathering processes at the dumpsite.
To evaluate REE enrichment, the combined concentrations of Ce, La, Y, and Sc were calculated as ΣREE. The results are shown in Figure 4.
The ΣREE values span a range from approximately 91,000 to over 133,000 mg/kg. While most samples fall within a narrow compositional band, several display significantly elevated concentrations, indicating localized enrichment. This heterogeneity may be attributed to variable proportions of REE-hosting phases such as phosphates, aluminosilicate glass, or their weathering derivatives, as evidenced by previous studies of coal-ash mineralogy [6,25].
The samples with the highest ΣREE are of particular interest for further mineralogical characterization and selective recovery strategies.
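The ΣREE definition (the sum of Ce, La, Y, and Sc) and the ranking of samples by ΣREE can be sketched as below. The sample identifiers and concentrations are hypothetical values in arbitrary units, not data from Table S1.

```python
REE_COLUMNS = ("Ce", "La", "Y", "Sc")


def sum_ree(sample):
    """Combined Ce + La + Y + Sc concentration for one sample record."""
    return sum(sample[element] for element in REE_COLUMNS)


# Hypothetical records in arbitrary concentration units (not data from Table S1).
samples = [
    {"id": "s1", "Ce": 40.0, "La": 20.0, "Y": 15.0, "Sc": 5.0},
    {"id": "s2", "Ce": 55.0, "La": 25.0, "Y": 20.0, "Sc": 10.0},
    {"id": "s3", "Ce": 30.0, "La": 15.0, "Y": 10.0, "Sc": 5.0},
]

# Rank samples from highest to lowest total REE content.
ranked = sorted(samples, key=sum_ree, reverse=True)
for sample in ranked:
    print(sample["id"], sum_ree(sample))
```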
The relationships between ΣREE and selected critical elements (Ge, Ga, Li, and V) were further explored using Spearman’s correlation analysis. Spearman’s rank-order correlations were calculated because several major and trace-element variables are non-normally distributed and display monotonic but non-linear trends. In such situations, Spearman’s ρ is the standard choice for correlation analysis of skewed geochemical or environmental datasets [44].
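The difference between rank-based and linear correlation can be shown with a small pure-Python sketch. Spearman's ρ is computed here as the Pearson correlation of average ranks, which is the standard definition; the function names and toy data are illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but strongly non-linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [value ** 3 for value in x]

print(round(spearman_rho(x, y), 3), round(pearson(x, y), 3))
```

For this toy series the rank correlation is perfect while the linear coefficient is not, which is the behavior that motivates the use of Spearman's ρ for skewed data.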
Figure 5 presents the resulting correlation matrix. A moderate positive correlation was observed between Ge and Ga, suggesting possible co-enrichment or shared geochemical behavior. In contrast, ΣREE exhibited weak correlations with other critical elements, indicating distinct sources or mineral hosts. The observed correlations are consistent with earlier studies showing that REEs in coal ash are chiefly retained in Si–Al glass or discrete phosphate minerals, whereas Ga and Ge are hosted mainly by aluminosilicate glass, oxide residues, or volatile condensates; i.e., they occupy mineralogical domains distinct from those of the REEs [5,6,7].

3.2. Pattern Recognition and Multivariate Analysis

The chemical complexity and inherent heterogeneity of ash samples from dumpsites limit the interpretive power of pairwise correlations alone. While ΣREE and selected critical elements showed some degree of variability across the dataset, these trends may reflect non-linear or multivariate dependencies. To identify sample groupings based on overall chemical similarity, agglomerative hierarchical clustering was applied to the scaled dataset. Figure 6 presents the resulting dendrogram generated using Ward’s linkage method. The vertical axis represents the linkage distance, which reflects the increase in within-cluster variance when clusters are merged.
The dendrogram reveals a clear hierarchical structure, with samples progressively grouped based on chemical similarity. At lower linkage distances, individual samples with nearly identical compositions are merged, forming tightly bound subgroups. As the linkage distance increases, these subgroups combine into broader clusters. A natural division into four primary compositional clusters is evident around a linkage distance threshold of 10–15: cluster 1 (orange in Figure 6)—samples 49 and 50; cluster 2 (green)—samples 10, 21, 25, 26, 31–33, 35–45; cluster 3 (red)—samples 1–5, 7–9, 11–14, 16–20, 22, 24, 27, 28, 30; cluster 4 (violet)—samples 6, 15, 23, 29, 34, 46–48. These groupings may reflect differences in combustion regimes, feedstock variability, or post-depositional weathering effects that influence the distribution of both major and trace components.
To further investigate the underlying structure of multivariate chemical variation in the dataset, PCA was performed on the scaled compositional data. This technique reduces the dimensionality of the dataset while preserving the maximum amount of variance, allowing for visualization of dominant compositional trends and relationships between samples. PCA is particularly well suited for identifying latent drivers of geochemical variability, especially in complex, heterogeneous materials such as coal ash [45,46]. In this study, PCA was applied to the full set of standardized major and trace elements, and the results were projected onto the two leading principal components, which together account for the majority of the variance. Figure 7 displays the PCA score plot for the first two principal components (PC1 and PC2), illustrating the multivariate distribution of samples in reduced-dimensional space.
The PCA score plot shown in Figure 7 reveals distinct patterns in the multivariate distribution of CA samples. The first principal component (PC1), accounting for 44.6% of the total variance, separates samples primarily based on variations in major oxides such as SiO2 and Al2O3 versus iron- and phosphate-bearing oxides Fe2O3 and P2O5. The second component (PC2), contributing an additional 16.2% of variance, reflects subtler shifts related to trace and critical elements, including ΣREE and Ge. Samples with elevated ΣREE tend to be positioned toward the upper-right quadrant of the plot, while those with balanced or lower critical element content cluster near the center. The distribution supports the presence of overlapping but geochemically distinct ash fractions, in agreement with the clustering results discussed earlier.
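Explained-variance shares such as those quoted for PC1 and PC2 follow directly from the singular values of the standardized data matrix. The sketch below shows the calculation with NumPy on synthetic data; the array shapes and names are assumptions and the numbers do not reproduce Figure 7.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 50 samples x 8 variables sharing one latent trend plus noise.
latent = rng.normal(size=(50, 1))
X = latent @ rng.normal(size=(1, 8)) + 0.3 * rng.normal(size=(50, 8))

# Standardize each variable (zero mean, unit variance) before PCA.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# Squared singular values are proportional to the variance of each component.
explained = S**2 / np.sum(S**2)

scores = U[:, :2] * S[:2]   # sample coordinates on PC1 and PC2 (score plot)
loadings = Vt[:2].T         # contribution of each variable to PC1 and PC2

print(np.round(explained[:2], 3))
```

Inspecting `loadings` shows which variables pull samples along each axis, which is how oxide contrasts like SiO2 versus Fe2O3 are read off a PCA.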
Thus, multivariate analysis revealed consistent internal structuring within the ash dataset despite its overall heterogeneity. Hierarchical clustering delineated four compositional groups, potentially reflecting differences in combustion conditions, feedstock variability, or secondary alteration. PCA supported these groupings and demonstrated that the major axes of variance are driven by a combination of oxide balance (e.g., SiO2 vs. Fe2O3/P2O5) and the distribution of trace and critical elements such as ΣREE and Ge. Correlation patterns between ΣREE, Ge, Ga, and Li further suggest that these elements are controlled by distinct geochemical processes and likely reside in separate host phases. These findings provide a rationale for applying supervised ML techniques.

3.3. Mineralogical and Microstructural Characterization

To support the geochemical trends observed in the dataset and to better understand the potential host environments for REEs and other critical metals, selected mineralogical and microstructural analyses were conducted. Although REEs occur at low concentrations and are not directly detectable via SEM or XRD, these techniques provide essential context regarding the bulk mineral framework.
Figure 8 presents the XRD patterns of four representative samples, each corresponding to one of the four compositional clusters defined by AHC (see Figure 6).
These diffraction profiles indicate substantial mineralogical divergence across clusters. Cluster 1 (blue line, sample 49) is characterized by dominant mullite and hematite phases, with negligible amounts of calcite and amorphous material, suggesting formation under high-temperature conditions typical of slag-enriched or recrystallized ash. Cluster 2 (orange line, sample 10) exhibits prominent calcite and hematite reflections alongside diminished mullite, indicating lower thermal exposure and possible post-depositional carbonate formation through lime interaction or atmospheric carbonation. Cluster 3 (green line, sample 1) presents a more balanced assemblage of quartz, mullite, hematite, and moderate calcite, consistent with well-combusted fly ash of relatively homogeneous origin. In contrast, cluster 4 (red line, sample 15) displays a diffuse amorphous halo and weak crystalline signals, especially for quartz and mullite, pointing to fine-grained, poorly crystallized material—likely derived from low-temperature trapping zones or weathered surface layers. For all four samples, a minor peak at ~22° 2θ was observed but could not be confidently matched to any known phase; it may reflect a trace crystalline component or overlap with the amorphous halo. Semi-quantitative evaluation of the XRD patterns showed that the amorphous phase content across the analyzed samples ranged from approximately 45% to 60%.
The SEM observations are consistent with these interpretations (Figure 9).
The microstructure of sample 49 representing cluster 1 (Figure 9a) is dominated by a dense, fused matrix with minimal porosity and few distinguishable particles, indicating intense thermal transformation and recrystallization—supporting the presence of mullite and hematite. Sample 10 representing cluster 2 (Figure 9d) exhibits a mixture of angular and partially fused particles, with a moderately heterogeneous texture, in line with partial combustion and carbonate formation. In contrast, sample 1 (Figure 9b) shows abundant spherical particles with smooth surfaces, reflecting uniform combustion and typical fly ash morphology. Finally, sample 15 (Figure 9c) displays a highly fragmented and fine-grained texture, lacking well-formed crystalline particles—likely reflecting amorphous phase enrichment and post-depositional alteration.
Together, these microstructural differences reinforce the geochemical and mineralogical clustering, indicating that combustion conditions and post-depositional processes strongly influence ash morphology and phase assemblages. Importantly, while neither XRD nor SEM directly captured the low-abundance REE phases, their combination revealed diagnostic contrasts in texture and crystallinity between compositional clusters. This structural diversity implies distinct retention environments for trace elements and provides a basis for linking bulk chemical signatures to specific ash morphotypes.
Table 1 summarizes the key features of each cluster, including the list of assigned samples and average contents of selected major oxides and trace elements (including REEs and Ge).
Despite similar average concentrations of major oxides such as SiO2, Fe2O3, and CaO across the identified clusters, their XRD and SEM signatures reveal substantial differences in crystallinity and morphology. This discrepancy arises because bulk chemical composition accounts for total elemental content without distinguishing between crystalline, amorphous, or structurally disordered phases. In contrast, XRD selectively detects crystalline structures, and SEM visualizes surface features shaped by thermal exposure, phase evolution, and weathering processes. As a result, comparable oxide levels may be associated with distinctly different mineral phases and microstructural expressions.
Given this complex and multivariate nature of CA materials, conventional univariate correlations are insufficient to explain the selective enrichment of REEs and Ge. Therefore, in the following section, supervised ML models are applied to identify which compositional attributes most significantly influence the concentration and distribution of critical elements. These models aim to extract non-obvious relationships embedded in the dataset and to prioritize chemical predictors that can guide future separation and recovery efforts.

3.4. Machine-Learning-Based Prediction of REE and Ge Enrichment

Given the modest sample size (n = 50), model outputs are best viewed as indicators of compositional trends rather than production-grade predictors.
While pairwise correlation methods such as Pearson’s coefficient offer useful preliminary insights, they are inherently limited in their ability to capture the multivariate complexity of compositional data. In the studied ash samples, elemental concentrations often co-vary non-linearly due to overlapping sources, phase partitioning, and secondary transformations [4,45]. Moreover, some critical elements—such as REEs and Ge—may be hosted in minor or amorphous phases not directly aligned with bulk oxide trends. Therefore, a multivariate, model-driven approach is required to identify hidden patterns and compositional predictors of trace element enrichment. Supervised ML techniques are particularly well suited for this task, as they allow for ranking of variables by importance, accommodate non-linear relationships, and support interpretation through explainable frameworks.
To identify key compositional drivers of REE and Ge variability, a set of supervised machine learning models was employed, including Random Forest, XGBoost, and LASSO regression. These models differ in their internal mechanisms (tree-based ensembles versus linear regularization), so feature relevance can be cross-checked across algorithms. Model performance was evaluated using five-fold cross-validation, with the coefficient of determination (R2) and root mean square error (RMSE) used to assess predictive accuracy. In addition, SHAP (SHapley Additive exPlanations) values were calculated to estimate the relative importance of input features and enhance model interpretability.
The comparative predictive performance of Random Forest, XGBoost, and LASSO models in estimating the concentrations of selected critical elements (Ce, La, Y, Sc, and Ge) is summarized in Table 2.
The regression results show that none of the three algorithms achieved reliable prediction (Random Forest R2 ≈ −0.03 to −0.40; XGBoost ≈ −0.67 to −1.13; LASSO ≤ 0.51). We attribute this to three factors: limited sample size (n = 50) relative to the 22 input variables, strong collinearity among bulk-oxide features, and analytical noise introduced by weathering and heterogeneous sampling depth. To check whether the issue was model-dependent, we ran additional small-sample methods (k-nearest neighbor, partial least squares, Gaussian-process regression). These gave similarly low scores (R2 < 0.45). The data therefore lack the signal needed for precise concentration estimates.
The results indicate that Ce and Y are the only elements for which meaningful compositional trends could be captured by at least one model, with LASSO achieving R2 values of 0.384 and 0.513, respectively. This suggests that these elements may be partially associated with stable or predictable carrier phases—such as iron oxides, phosphate residues, or the aluminosilicate glass matrix—consistent with earlier observations of bulk mineralogy and variable P2O5 and Fe2O3 content in high-ΣREE samples.
In contrast, both La and Sc yielded near-zero or negative R2 values across all models, indicating that their concentrations are not linearly or non-linearly predictable from the bulk oxide composition. This may reflect dispersed partitioning among amorphous or minor phases, or greater mobility under surface weathering conditions, although this interpretation remains tentative without direct mineralogical confirmation.
Ge exhibited the poorest performance overall, with highly negative R2 values in tree-based models and no explanatory power in LASSO. While Ge is often considered a coal-associated critical element, its behavior in combustion residues appears decoupled from the main oxide matrix. This may suggest that Ge is partitioned into secondary, volatile-derived, or poorly crystalline carriers not captured by conventional compositional analysis—though further phase-specific characterization would be required to support this hypothesis.
The fact that V emerged as a dominant predictor for ΣREE in all three modeling approaches further reinforces its value as a robust geochemical proxy for REE enrichment in coal ash. Its consistent influence across statistical methods supports the notion that V-rich samples may co-localize or geochemically parallel REE-hosting domains, making it a useful screening parameter for targeted recovery strategies.
Negative R2 values indicate that the corresponding model performs worse than a naive prediction using the mean of the target variable, i.e., it introduces more error than simply assuming a constant average.
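A short worked example makes the meaning of a negative R2 concrete. The definition used is the standard one, R2 = 1 − SS_res/SS_tot; the numbers are toy values chosen for easy arithmetic.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


y_true = [10.0, 12.0, 14.0, 16.0]
mean_baseline = [13.0, 13.0, 13.0, 13.0]   # always predict the mean of y_true
inverted_model = [16.0, 14.0, 12.0, 10.0]  # predicts the opposite trend

print(r2_score(y_true, mean_baseline))   # 0.0
print(r2_score(y_true, inverted_model))  # -3.0
```

The constant-mean baseline scores exactly zero, while the second set of predictions accumulates four times the baseline's squared error and therefore scores −3.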
Figure 10 illustrates these performance metrics with a side-by-side comparison of the R2 and RMSE values from Table 2 for Ce, La, Y, Sc, and Ge across the three models.
Only Ce and Y show moderate predictability, particularly under the LASSO model, with R2 values of 0.384 and 0.513, respectively. In contrast, La, Sc, and Ge display near-zero or negative R2 values across most models, indicating poor correlation with bulk compositional inputs. RMSE trends further confirm the limited explanatory power, especially for Ge, which consistently yielded the highest errors.
To underline the most influential compositional variables contributing to ΣREE enrichment, the top five predictors identified by each model—Random Forest, XGBoost, and LASSO—are presented in Table 3. The listed variables were selected based on feature importance (tree-based models) or non-zero regression coefficients (LASSO), and their values were normalized to allow cross-model comparison.
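One way to put tree-based importances and signed regression coefficients on a common scale is sketched below. The normalization rule (absolute value divided by the largest absolute value) is an assumption, since the text does not specify the scheme used; the model labels, feature names, and numbers are placeholders and are not the values in Table 3.

```python
def normalize_importances(raw):
    """Scale absolute importances so the strongest feature equals 1.0."""
    magnitudes = {name: abs(value) for name, value in raw.items()}
    strongest = max(magnitudes.values())
    return {name: magnitude / strongest for name, magnitude in magnitudes.items()}


def top_n(raw, n=5):
    """Top-n features by normalized importance, dropping zero-weight features."""
    normalized = normalize_importances(raw)
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0.0][:n]


# Placeholder values only: generic feature names, not the predictors in Table 3.
raw_by_model = {
    "tree_model": {"feat_a": 0.40, "feat_b": 0.20, "feat_c": 0.10},      # impurity-style importances
    "linear_model": {"feat_a": -8.0, "feat_c": 2.0, "feat_d": 0.0},      # signed coefficients
}

top_by_model = {model: top_n(raw, n=2) for model, raw in raw_by_model.items()}

# Features that appear in the top list of every model.
shared = set.intersection(*({name for name, _ in top} for top in top_by_model.values()))
print(top_by_model, shared)
```

Taking the absolute value discards the sign of a regression coefficient, so this kind of comparison speaks to the strength of an association, not its direction.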
Figure 11 presents the normalized feature importance values for ΣREE prediction derived from SHAP-style interpretation across the Random Forest, XGBoost, and LASSO models, based on the top-ranked predictors listed in Table 3.
V is the top-ranked predictor in Random Forest and LASSO and the fifth-ranked in XGBoost, which supports the role of this element as an important geochemical proxy for ΣREE enrichment in CA. In the Random Forest model, SiO2, Li, MgO, and P2O5 also contribute notably, pointing to potential links between REE content and the aluminosilicate matrix or phosphorus-bearing phases. XGBoost assigns the highest importance to the acid-insoluble residue, likely reflecting the influence of refractory carriers such as mullite or hematite, followed by CaO, soluble alumina, and alkali oxides. LASSO emphasizes V along with Ga, K2O + Na2O, MgO, and CaO, suggesting that certain trace and alkali elements may co-vary with REEs under specific combustion or weathering conditions.
Several consistent patterns emerge that provide insight into the geochemical controls on REE behavior in CA.
Most notably, V appears as the top-ranked predictor in both Random Forest and LASSO, and remains within the top five in XGBoost. This convergence across models with fundamentally different architectures—ensemble averaging (RF), gradient boosting (XGB), and L1-penalized regression (LASSO)—strongly supports the interpretation that V acts as a robust geochemical proxy for REE enrichment. Mechanistically, this is plausible given that both V and REEs are redox-sensitive, lithophile elements that may co-associate with iron-bearing glassy or amorphous matrices formed under high-temperature combustion. Furthermore, both elements may be retained in similar residual phases under oxidative weathering conditions.
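The cross-model convergence described above can be checked directly against Table 3. The following is a minimal sketch that tallies how many of the three top-five lists contain each feature; the feature lists are transcribed from Table 3, and the variable names are illustrative.

```python
from collections import Counter

# Top-five predictors per model, transcribed from Table 3.
top5 = {
    "Random Forest": ["V", "SiO2", "Li", "MgO", "P2O5"],
    "XGBoost": ["Insoluble residue", "CaO", "Soluble alumina", "K2O + Na2O", "V"],
    "LASSO": ["V", "Ga", "K2O + Na2O", "MgO", "CaO"],
}

# Count in how many models each feature reaches the top five.
counts = Counter(feature for features in top5.values() for feature in features)

# Features shared by every model, and those shared by at least two.
in_all_models = sorted(f for f, n in counts.items() if n == len(top5))
in_two_or_more = sorted(f for f, n in counts.items() if n >= 2)

# Rank of V within each model's list (1 = most important).
v_rank = {model: features.index("V") + 1 for model, features in top5.items()}

print(in_all_models)
print(in_two_or_more)
print(v_rank)
```

Only V appears in all three lists, while CaO, MgO, and K2O + Na2O each appear in two.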
The appearance of Ga (only in LASSO) alongside V suggests that certain trace elements behave coherently with REEs under shared geochemical constraints, possibly linked to condensation or volatility patterns during combustion.
MgO appears among the top predictors in both Random Forest and LASSO, and SiO2 ranks second in Random Forest, suggesting that ΣREE may be structurally or texturally associated with the aluminosilicate network and its modification through Mg-bearing glass or spinel phases. Their presence supports the interpretation that REEs are not simply concentrated in isolated mineral grains but may be distributed throughout more pervasive amorphous silicate frameworks.
The presence of K2O + Na2O and CaO in both LASSO and XGBoost underscores the role of alkali and alkaline earth components in influencing REE retention. These oxides may reflect lime-based additives or phase transitions during combustion, which modulate the physicochemical environment in which REEs are immobilized. For example, CaO may signal the formation of apatite-like phases or influence glass structure polymerization, both of which can affect REE partitioning.
The insoluble residue, the top predictor in the XGBoost model, is likely a proxy for the non-leachable, refractory component of the ash, which may serve as a cumulative indicator of stable REE-bearing hosts such as mullite, iron oxides, or weathered glass. Similarly, soluble alumina may differentiate mobile from structurally retained aluminum, indirectly reflecting ash porosity, reactivity, or the proportion of stable aluminosilicate phases.

4. Integrated Interpretation and Implications

This study offers new insights into the geochemical and mineralogical factors influencing REE enrichment in combustion residues, particularly in the context of weathered ash deposits. While most ML models showed limited predictive accuracy in absolute terms, several compositional variables emerged repeatedly across independent algorithms, which points to their diagnostic significance.
Among these, V demonstrated the most consistent importance, ranking within the top five predictors in all three models—Random Forest, XGBoost, and LASSO. Its recurrence, despite different model architectures and statistical assumptions, highlights its potential role as a robust proxy for REE-hosting domains. This likely reflects shared partitioning behavior between V and REEs under oxidizing combustion conditions, as well as mutual affinity for amorphous, ferruginous, or phosphate-rich phases. Prior studies have reported similar associations in U.S. CA, where V and REEs often co-localize within silico-ferruginous matrices or aluminosilicate glass [47].
However, it must be emphasized that the observed behavior of V may be specific to the geochemical and depositional context of the studied dump-site material. Factors such as fuel composition, combustion regime, and post-depositional alteration strongly affect the distribution of both REEs and trace elements like V. Therefore, while V proved to be a reproducible predictor within this dataset, its applicability to other coal ash sources remains uncertain. The observed correlation likely reflects localized mechanisms, such as high-temperature glass entrapment and subsequent weathering, rather than universal geochemical controls. Similar cautionary notes regarding proxy indicators were expressed in other ML-driven ash studies [28,31,38], as well as in broader reviews of ash heterogeneity [48].
The overall weak predictive performance for La, Sc, and Ge across all models further supports the interpretation that REEs are dispersed among poorly crystalline or amorphous carriers, which are not easily captured by bulk chemical descriptors. This is corroborated by XRD and SEM analyses, which revealed that samples with similar oxide compositions often exhibit markedly different mineralogical textures and crystallinity. These structural divergences help explain the limited success of ML in modeling individual REE contents and underline the necessity of integrating phase-level data in predictive workflows. Several works have emphasized the key role of amorphous and sub-micron phases—especially iron- and phosphate-bearing matrices—in hosting REEs under combustion conditions [7,11,25].
From a practical standpoint, this work shows that ML models, even when not highly predictive, can serve as effective screening tools to prioritize compositional variables and eliminate noise in multivariate datasets. The combined presence of V, CaO, MgO, and alkalis (K2O + Na2O) appears to reflect zones of higher REE potential, particularly in materials exhibiting elevated amorphous content or ferruginous inclusions. For instance, CaO may signal carbonate precipitation or phosphate stabilization, both of which can influence REE solubility and leachability [23,26].
The cluster-based classification of samples (Figure 6, Table 1) further contextualizes these findings by linking geochemical trends to mineralogical expressions. Samples in cluster 1, with high mullite and hematite content, reflect dense, recrystallized ash formed under elevated temperatures, while those in cluster 4 display amorphous-rich textures and more porous morphology. These features are crucial for process selection: mullite-bound REEs may require aggressive treatment (e.g., alkali fusion or acid baking), whereas REEs in weathered glass or iron oxyhydroxides may be accessible via mild acid or organic ligand leaching [5,29].
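To relate the clusters to REE content, the cluster averages in Table 1 can be summed into ΣREE using the definition given in the Figure 4 caption (Ce + La + Y + Sc). The sketch below is simple arithmetic on the tabulated means, with units taken as printed in Table 1; because summation is linear, the sum of the cluster averages equals the cluster average of the per-sample sums.

```python
# Cluster-average concentrations (mg/kg as printed), transcribed from Table 1.
cluster_means = {
    1: {"Ce": 38676, "La": 15006, "Y": 20008, "Sc": 13586},
    2: {"Ce": 52702, "La": 19697, "Y": 26874, "Sc": 15950},
    3: {"Ce": 52207, "La": 18876, "Y": 24449, "Sc": 17258},
    4: {"Ce": 56034, "La": 20387, "Y": 28447, "Sc": 17624},
}

# Sigma-REE as defined in the Figure 4 caption: Ce + La + Y + Sc.
ree_elements = ("Ce", "La", "Y", "Sc")

sigma_ree = {
    cluster: sum(values[element] for element in ree_elements)
    for cluster, values in cluster_means.items()
}

# Clusters ordered from highest to lowest average sigma-REE.
ranked_clusters = sorted(sigma_ree, key=sigma_ree.get, reverse=True)

for cluster in ranked_clusters:
    print(f"cluster {cluster}: {sigma_ree[cluster]}")
```

On these tabulated averages, cluster 4 has the highest and cluster 1 the lowest ΣREE, with clusters 2 and 3 in between.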
Thus, the use of ML in conjunction with XRD and SEM has enabled the identification of structurally and compositionally consistent signals underlying REY variability, even in challenging, heterogeneous matrices. While direct quantification of REE content remains elusive without detailed mineralogical data, the approach developed here provides a practical and scalable framework for evaluating REE potential in secondary resources.

5. Conclusions

This study integrates compositional, structural, and data-driven approaches to evaluate the distribution of REEs and Ge in heterogeneous CA materials derived from Ekibastuz Basin coal. Rather than relying on conventional geochemical normalization or mineral-specific partitioning models, the analysis focuses on identifying statistically reproducible indicators of REE enrichment using ML and phase-level characterization.
Although traditional statistical models failed to predict individual REE concentrations with high accuracy, the combined use of unsupervised clustering, XRD/SEM characterization, and supervised ML allowed for the identification of robust compositional indicators. Among them, V was the most consistent predictor of ΣREE content, ranking first in Random Forest and LASSO and within the top five in XGBoost. This suggests that V may serve as a practical geochemical proxy for REE enrichment in certain types of ash, particularly those that have undergone high-temperature transformation and subsequent weathering. However, its reliability as a universal indicator is limited, as it likely reflects site-specific mechanisms of REE retention under oxidizing and amorphous conditions.
The poor predictability of Ge, La, and Sc further emphasizes the role of amorphous and poorly crystalline phases in hosting critical elements, which are not readily captured by bulk oxide composition alone. The observed discrepancy between chemical similarity and mineralogical differentiation reinforces the need for integrated phase-sensitive techniques to complement compositional modeling.
The methodological framework developed—combining compositional fingerprinting, structural assessment, and ML-based prioritization—can be adapted for rapid screening of REE-bearing ash streams. In resource-constrained settings, the identification of surrogate indicators such as V, CaO, MgO, and alkalis may facilitate preliminary classification and valorization of CA materials.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/min15070734/s1, Table S1: Content of components in CA samples; Table S2: Results of chemical analysis of coal ash samples; Table S3: Created dataset.

Author Contributions

Conceptualization, R.N. and K.K.; methodology, O.T. and L.M.; investigation, L.M., A.B. and O.T.; resources, R.N.; writing—original draft preparation, K.K., O.T., L.M. and A.B.; writing—review and editing, R.N., L.M. and A.K.; visualization, A.K.; project administration, R.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (grant No. BR21882017).

Data Availability Statement

The data supporting the results can be made available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bhattacharya, S.S.; Kim, K.H. Utilization of coal ash: Is vermitechnology a sustainable avenue? Renew. Sustain. Energy Rev. 2016, 58, 1376–1386.
2. Shi, W.; Bai, J.; Kong, L.; Li, H.; Bai, Z.; Vassilev, S.V.; Li, W. An overview of the coal ash transition process from solid to slag. Fuel 2021, 287, 119537.
3. Petrović, M.; Fiket, Ž. Environmental damage caused by coal combustion residue disposal: A critical review of risk assessment methodologies. Chemosphere 2022, 299, 134410.
4. Thomas, B.S.; Dimitriadis, P.; Kundu, C.; Vuppaladadiyam, S.S.V.; Raman, R.S.; Bhattacharya, S. Extraction and separation of rare earth elements from coal and coal fly ash: A review on fundamental understanding and on-going engineering advancements. J. Environ. Chem. Eng. 2024, 12, 112769.
5. Dodbiba, G.; Fujita, T. Trends in extraction of rare earth elements from coal ashes: A review. Recycling 2023, 8, 17.
6. Fu, B.; Hower, J.C.; Zhang, W.; Luo, G.; Hu, H.; Yao, H. A review of rare earth elements and yttrium in coal ash: Content, modes of occurrences, combustion behavior, and extraction methods. Prog. Energy Combust. Sci. 2022, 88, 100954.
7. Rybak, A.; Rybak, A. Characteristics of some selected methods of rare earth elements recovery from coal fly ashes. Metals 2021, 11, 142.
8. Rudnik, E. Challenges and Opportunities in Hydrometallurgical Recovery of Germanium from Coal By-Products. Molecules 2025, 30, 1695.
9. Zhou, C.; Du, J.; Zhang, Y.; Sun, J.; Wu, W.; Liu, G. Redistribution and transformation mechanisms of gallium and germanium during coal combustion. Fuel 2021, 305, 121532.
10. Meshram, P.; Abhilash. Strategies for recycling of primary and secondary resources for germanium extraction. Min. Metall. Explor. 2022, 39, 689–707.
11. Balaram, V. Sources and applications of rare earth elements. Environ. Technol. Treat Rare Earth Elem. Pollut. Princ. Eng. 2022, 113, 75–113.
12. Jiang, Y.; Fu, H.; Liang, Z.; Zhang, Q.; Du, Y. Rare earth oxide based electrocatalysts: Synthesis, properties and applications. Chem. Soc. Rev. 2024, 53, 714–763.
13. Chai, S.-S.; Zhang, W.-B.; Yang, J.-L.; Zhang, L.; Theint, M.M.; Zhang, X.-L.; Guo, S.-B.; Zhou, X.; Ma, X.-J. Sustainability applications of rare earths from metallurgy, magnetism, catalysis, luminescence to future electrochemical pseudocapacitance energy storage. RSC Sustain. 2023, 1, 38–71.
14. Malavekar, D.B.; Magdum, V.V.; Khot, S.D.; Kim, J.H.; Lokhande, C.D. Doping of rare earth elements: Towards enhancing the electrochemical performance of pseudocapacitive materials. J. Alloys Compd. 2023, 960, 170601.
15. Junior, A.B.; Espinosa, D.C.R.; Vaughan, J.; Tenório, J.A.S. Recovery of scandium from various sources: A critical review of the state of the art and future prospects. Miner. Eng. 2021, 172, 107148.
16. Røyset, J.; Ryum, N. Scandium in aluminium alloys. Int. Mater. Rev. 2005, 50, 19–44.
17. Yamada, Y.; Yoneda, M.; Fukuzumi, S. High and robust performance of H2O2 fuel cells in the presence of scandium ion. Energy Environ. Sci. 2015, 8, 1698–1701.
18. Phoung, S.; Williams, E.; Gaustad, G.; Gupta, A. Exploring global supply and demand of scandium oxide in 2030. J. Clean. Prod. 2023, 401, 136673.
19. Priyadarshini, P.; Sahoo, D.; Naik, R. A review on the optical properties of some germanium based chalcogenide thin films and their applications. Opt. Quantum Electron. 2022, 54, 166.
20. Osterthun, N.; Neugebohrn, N.; Gehrke, K.; Vehse, M.; Agert, C. Spectral engineering of ultrathin germanium solar cells for combined photovoltaic and photosynthesis. Opt. Express 2021, 29, 938–950.
21. Kamble, G.U.; Shin, S.W.; Park, S.W.; Gaikwad, M.A.; Karade, V.C.; Jang, J.S.; Park, Y.; Ghorpade, U.V.; Suryawanshi, M.P.; Kim, J.H. Germanium Selenide: A Critical Review on Recent Advances in Material Development for Photovoltaic and Photoelectrochemical Water-Splitting Applications. Sol. RRL 2023, 7, 2300502.
22. Chiara, R.; Morana, M.; Malavasi, L. Germanium-based halide perovskites: Materials, properties, and applications. ChemPlusChem 2021, 86, 879–888.
23. Ketegenov, T.; Kamunur, K.; Mussapyrova, L.; Batkal, A.; Nadirov, R. Enhancing Rare Earth Element Recovery from Coal Ash Using High-Voltage Electrical Pulses and Citric Acid Leaching. Minerals 2024, 14, 693.
24. Reedy, R.C.; Scanlon, B.R.; Bagdonas, D.A.; Hower, J.C.; James, D.; Kyle, J.R.; Uhlman, K. Coal ash resources and potential for rare earth element production in the United States. Int. J. Coal Sci. Technol. 2024, 11, 74.
25. Park, S.; Kim, M.; Lim, Y.; Yu, J.; Chen, S.; Woo, S.W.; Yoon, S.; Bae, S.; Kim, H.S. Characterization of rare earth elements present in coal ash by sequential extraction. J. Hazard. Mater. 2021, 402, 123760.
26. Banerjee, R.; Chakladar, S.; Mohanty, A.; Chattopadhyay, S.K.; Chakravarty, S. Leaching characteristics of rare earth elements from coal ash using organosulphonic acids. Miner. Eng. 2022, 185, 107664.
27. Berti, D.; Groppo, J.G.; Joshi, P.; Preda, D.V.; Gamliel, D.P.; Beers, T.; Schrock, M.; Hopps, S.D.; Morgan, T.D.; Hower, J.C.; et al. Electron microbeam investigations of the spent ash from the pilot-scale acid extraction of rare earth elements from a beneficiated Kentucky fly ash. Int. J. Coal Geol. 2025, 303, 104738.
28. Liu, C.; Yang, Y.; Chen, L.; Wu, J.; Sun, Y.; Han, M.; Guo, X.; He, M.; Jin, Z. Rare earth resource in fly ashes from coal power plants of China: Based on machine learning model and unit-based estimation. Int. J. Coal Geol. 2025, 303, 104743.
29. Zhou, C.; Li, C.; Li, W.; Sun, J.; Li, Q.; Wu, W.; Liu, G. Distribution and preconcentration of critical elements from coal fly ash by integrated physical separations. Int. J. Coal Geol. 2022, 261, 104095.
30. Wen, Z.; Liu, H.; Zhou, M.; Liu, C.; Zhou, C. Explainable machine learning rapid approach to evaluate coal ash content based on X-ray fluorescence. Fuel 2023, 332, 125991.
31. Qi, C.; Wu, M.; Zheng, J.; Chen, Q.; Chai, L. Rapid identification of reactivity for the efficient recycling of coal fly ash: Hybrid machine learning modeling and interpretation. J. Clean. Prod. 2022, 343, 130958.
32. Wu, M.; Qi, C.; Chen, Q.; Liu, H. Evaluating the metal recovery potential of coal fly ash based on sequential extraction and machine learning. Environ. Res. 2023, 224, 115546.
33. Ashraf, M.W.; Tu, Y.; Khan, A.; Siddiqui, A.S.; Mubarak, S.; Sufian, M.; Ullah, S.; Wang, C. Experimental and explainable machine learning based investigation of the coal bottom ash replacement in sustainable concrete production. J. Build. Eng. 2025, 104, 112367.
34. Dev, K.L.; Kumar, D.R.; Wipulanusat, W. Machine learning prediction of the unconfined compressive strength of controlled low strength material using fly ash and pond ash. Sci. Rep. 2024, 14, 27540.
35. Chatterjee, S.; Mastalerz, M.; Drobniak, A.; Karacan, C.Ö. Machine learning data augmentation approach for identification of rare earth element potential in Indiana Coals, USA. Int. J. Coal Geol. 2022, 259, 104054.
36. Chatterjee, S.; Karacan, C.Ö.; Mastalerz, M. Exploring the uncertainty of machine learning models and geostatistical mapping of rare earth element potential in Indiana coals, USA. Int. J. Coal Geol. 2024, 282, 104419.
37. Bhatt, C.R.; Jain, J.C.; Bol’shakov, A.A.; McIntyre, D.L. Chemistry imaging and distribution analysis of rare earth elements in coal using LIBS and LA-ICP-MS instruments. Int. J. Coal Geol. 2025, 301, 104710.
38. Song, Y.; Zhao, Y.; Ginella, A.; Gallagher, B.; Sant, G.; Bauchy, M. Predicting rare earth elements concentration in coal ashes with multi-task neural networks. Mater. Horiz. 2024, 11, 1448–1464.
39. Xu, N.; Li, Q. Threshold value determination using machine learning algorithms for Ba interference with Eu in coal and coal combustion products by ICP-MS. Minerals 2019, 9, 259.
40. Wang, P.; Liu, H.; Zheng, F.; Liu, Y.; Kuang, G.; Deng, R.; Li, H. Extraction of aluminum from coal fly ash using pressurized sulfuric acid leaching with emphasis on optimization and mechanism. JOM 2021, 73, 2643–2651.
41. Eminagaoglu, M.; Oskay, R.G.; Karayigit, A.I. Evaluation of elemental affinities in coal using agglomerative hierarchical clustering algorithm: A case study in a thick and mineable coal seam (kM2) from Soma Basin (W. Turkey). Int. J. Coal Geol. 2022, 259, 104045.
42. Xu, N.; Finkelman, R.B.; Dai, S.; Xu, C.; Peng, M. Average linkage hierarchical clustering algorithm for determining the relationships between elements in coal. ACS Omega 2021, 6, 6206–6217.
43. Xu, N.; Li, Q.; Zhu, W.; Finkelman, R.B.; Engle, M.A.; Wang, R.; Wang, Z. Advocating the Use of Bayesian Network in Analyzing the Modes of Occurrence of Elements in Coal. ACS Omega 2023, 8, 39096–39109.
44. Reimann, C.; Filzmoser, P.; Hron, K.; Kynčlová, P.; Garrett, R.G. A new method for correlation analysis of compositional (environmental) data–a worked example. Sci. Total Environ. 2017, 607, 965–971.
45. Bishop, B.A.; Shivakumar, K.R.; Alessi, D.S.; Robbins, L.J. Insights into the rare earth element potential of coal combustion by-products from western Canada. Environ. Sci. Adv. 2023, 2, 529–542.
46. Filzmoser, P.; Hron, K.; Reimann, C. Principal component analysis for compositional data with outliers. Environmetr. Off. J. Int. Environmetr. Soc. 2009, 20, 621–632.
47. Taggart, R.K.; Hower, J.C.; Dwyer, G.S.; Hsu-Kim, H. Trends in the rare earth element content of US-based coal combustion fly ashes. Environ. Sci. Technol. 2016, 50, 5919–5926.
48. Blissett, R.S.; Rowson, N.A. A review of the multi-component utilisation of coal fly ash. Fuel 2012, 97, 1–23.
Figure 1. Coal ash sampling area (image obtained via Google Maps).
Figure 2. Scheme of sampling of CA dump of Almaty TPP-2. The square plot (20 × 20 m) contains 25 sampling points arranged at 5 m intervals.
Figure 3. Average bulk oxide composition of coal ash samples (SiO2, Al2O3, Fe2O3, P2O5, Ti, CaO, MgO) with standard deviation bars.
Figure 4. Total concentrations of ΣREE (Ce + La + Y + Sc) across the CA samples.
Figure 5. Spearman’s correlation matrix between ΣREE and selected critical elements (Ge, Ga, Li, V) in CA samples.
Figure 6. Agglomerative hierarchical clustering dendrogram based on standardized major and trace element concentrations (Ward’s linkage method).
Figure 7. Principal component analysis (PCA) score plot showing the distribution of ash samples in the PC1–PC2 space based on scaled chemical composition.
Figure 8. XRD patterns of CA samples representing different clusters (1–4) according to AHC (Figure 6).
Figure 9. SEM images of CA samples 49 (a), 1 (b), 15 (c) and 10 (d).
Figure 10. Comparative performance of ML models for ΣREE and Ge prediction.
Figure 11. Normalized importance of top ΣREE predictors across ML models (from Table 3).
Table 1. Compositional summary of AHC-defined clusters, including sample IDs and average concentrations of major oxides and selected critical elements.

| Cluster ID | Sample IDs | SiO2, wt % | Al2O3, wt % | Fe2O3, wt % | CaO, wt % | P2O5, wt % | LOI, wt % | Ce, mg/kg | La, mg/kg | Y, mg/kg | Sc, mg/kg | Ge, mg/kg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 49, 50 | 53.6 | 26.6 | 6.8 | 3.8 | 0.5 | 10.2 | 38,676 | 15,006 | 20,008 | 13,586 | 11,174 |
| 2 | 6, 15, 23, 29, 34, 46, 47, 48 | 60.3 | 29.9 | 7.7 | 4.9 | 0.6 | 7.8 | 52,702 | 19,697 | 26,874 | 15,950 | 13,722 |
| 3 | 1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14, 16, 17, 20, 24, 27, 28, 43 | 63.4 | 27.9 | 8.2 | 4.4 | 0.6 | 10.3 | 52,207 | 18,876 | 24,449 | 17,258 | 13,047 |
| 4 | 10, 11, 18, 19, 21, 22, 25, 26, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45 | 62.6 | 28.2 | 7.4 | 4.5 | 0.6 | 9.0 | 56,034 | 20,387 | 28,447 | 17,624 | 12,310 |
Table 2. Comparative predictive performance (R2 and RMSE) of Random Forest, XGBoost, and LASSO models for selected critical elements.

| Element | Random Forest R2 | Random Forest RMSE | XGBoost R2 | XGBoost RMSE | LASSO R2 | LASSO RMSE |
|---|---|---|---|---|---|---|
| Ce | −0.034 | 6249 | −0.671 | 6782 | 0.384 | 4823 |
| La | −0.398 | 2611 | −0.494 | 2439 | 0 | 2207 |
| Y | −0.079 | 3195 | 0.147 | 2714 | 0.513 | 2145 |
| Sc | −0.223 | 2299 | −0.636 | 2438 | 0 | 2079 |
| Ge | −0.25 | 1392 | −1.130 | 1647 | 0 | 1245 |
Table 3. Top 5 compositional predictors of ΣREE concentrations identified by Random Forest, XGBoost, and LASSO models (normalized scores).

| Model | Feature | Normalized Score |
|---|---|---|
| Random Forest | V | 1.0000 |
| Random Forest | SiO2 | 0.6325 |
| Random Forest | Li | 0.4943 |
| Random Forest | MgO | 0.4741 |
| Random Forest | P2O5 | 0.4126 |
| XGBoost | Insoluble residue | 1.0000 |
| XGBoost | CaO | 0.6585 |
| XGBoost | Soluble alumina | 0.5727 |
| XGBoost | K2O + Na2O | 0.5015 |
| XGBoost | V | 0.4589 |
| LASSO | V | 1.0000 |
| LASSO | Ga | 0.3389 |
| LASSO | K2O + Na2O | 0.3145 |
| LASSO | MgO | 0.3029 |
| LASSO | CaO | 0.2550 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nadirov, R.; Kamunur, K.; Mussapyrova, L.; Batkal, A.; Tyumentseva, O.; Karagulanova, A. Integrated Compositional Modeling and Machine Learning Analysis of REE-Bearing Coal Ash from a Weathered Dumpsite. Minerals 2025, 15, 734. https://doi.org/10.3390/min15070734
