PSO-Based Ensemble Learning Enhanced with Explainable Artificial Intelligence for Breast Glandular Dose Estimation in Mammography

Ünal, Sevgi; Gürfidan, Remzi

doi:10.3390/app16052514

by Sevgi Ünal ¹,* and Remzi Gürfidan ²

¹ Department of Radiology, Izmir Katip Celebi University Ataturk Training and Research Hospital, Izmir 35150, Türkiye

² Isparta Vocational School of Information Technologies, Database, Network Design and Management, Isparta University of Applied Science, Isparta 32200, Türkiye

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(5), 2514; https://doi.org/10.3390/app16052514

Submission received: 30 January 2026 / Revised: 2 March 2026 / Accepted: 4 March 2026 / Published: 5 March 2026

Abstract

Objectives: This study aims to predict patient-specific Average Glandular Dose (AGD) in mammography using machine learning-based models to support personalised radiation dose optimisation and reduce unnecessary exposure during breast cancer screening. Methods: A retrospective dataset of 671 female patients who underwent full-field digital mammography between 2020 and 2024 was analysed. Right craniocaudal (CC) images were used to construct a structured dataset including mAs, kVp, compressed breast thickness, air kerma (k_air), half-value layer (HVL), and breast pattern. Five regression-based machine learning models (CatBoost, Gradient Boosting, Random Forest, Extra Trees, and AdaBoost) and their Particle Swarm Optimisation (PSO)-enhanced versions were evaluated. Model performance was assessed using MSE, RMSE, MAE, MAPE, and R². SHAP analysis was applied to interpret model predictions and determine variable importance. Results: PSO integration significantly reduced prediction errors, particularly in boosting-based models. The CatBoost + PSO model achieved the best performance (RMSE = 0.0100, MAPE ≈ 1.74%, R² = 0.9846), followed by the Gradient Boosting + PSO model (R² = 0.9787). PSO reduced RMSE and MAPE by approximately 55% and 52%, respectively. SHAP analysis identified k_air, breast thickness, and breast pattern as the most influential factors affecting AGD. Conclusions: Machine learning models enhanced with PSO, especially CatBoost + PSO, provide accurate and reliable patient-specific AGD predictions. The proposed approach enables rapid and clinically applicable dose estimation and highlights breast pattern as a critical parameter influencing glandular dose, supporting personalised radiation dose optimisation in mammography.

Keywords:

mammography; average glandular dose; breast pattern; artificial intelligence; machine learning

1. Introduction

Breast cancer is reported to be the most common malignancy among women worldwide and one of the leading causes of death [1]. Screening mammography (SMG) is the primary imaging method for the early diagnosis of breast cancer and plays a critical role in reducing mortality [2]. However, repeated mammography examinations in screening programmes increase exposure to ionising radiation and, consequently, may increase the risk of radiation-induced cancer [2]. As breast tissue, particularly its glandular component, is one of the most radiosensitive tissues in the body, the radiation-related risk associated with breast cancer is influenced by various factors, including the age at which screening begins, screening frequency, and breast volume [3].

In mammography, Average Glandular Dose (AGD) is considered the most appropriate dosimetry parameter for estimating radiation-related breast cancer risk. AGD is influenced by numerous physical and anatomical variables, such as tube voltage (kVp), tube current-time product (mAs), half-value layer (HVL), breast thickness, and fibroglandular tissue density [4,5,6]. Mammography (MMG) units are equipped with embedded software that estimates AGD using different algorithms. Two main approaches are used in AGD assessment [7]. The methods proposed by Wu et al. and Dance et al. estimate AGD based on age and breast thickness using standard breast phantoms [8,9], while the Volpara method calculates AGD from the patient-specific distribution of fibroglandular tissue. Although these methods are useful for inter-device dose comparisons, they do not fully reflect the actual breast dose of individual patients [10,11].

One of the important physical indicators of image quality in digital mammography is the contrast-to-noise ratio (CNR). CNR is widely used in evaluating the performance of digital equipment, and higher radiation doses are often required to achieve the desired image quality, particularly in dense breast tissue [12,13]. The American College of Radiology (ACR) has developed the BI-RADS (Breast Imaging Reporting and Data System) density classification, which divides breast composition into four categories (almost entirely fatty, scattered fibroglandular, heterogeneously dense, and extremely dense) and emphasises the importance of breast density in diagnostic accuracy and cancer risk assessment [14]. Breast thickness is also a critical parameter in determining AGD, and there is a complex balance between image quality and radiation dose in thick tissues [4,15].

International standards for AGD calculations are based on conversion factors generated by Dance et al. using Monte Carlo (MC) simulations [16]. The MC approach calculates AGD as K × g × c × s, where K is the incident air kerma (K_air), g is the conversion coefficient that gives the AGD for 50% glandularity, c is the correction coefficient for other glandularity levels, and s is the correction term for the X-ray spectrum [17,18]. K_air, HVL, breast thickness, and breast density are therefore among the basic inputs of AGD algorithms. Although there are numerous studies in the literature on AGD assessment in mammography, AGD estimates may differ significantly because of the different methods and formulas used. To our knowledge, only a limited number of studies have predicted AGD with machine learning (ML) models based on the patient's specific breast structure. This highlights the need for new approaches to individual dose optimisation.

In this study, an approach independent of Monte Carlo modelling was employed to develop a machine learning-based model that evaluates breast patterns in a personalised manner. The aim was to create a method that can be rapidly calculated in daily practice using routinely obtained clinical parameters such as K_air (mGy), breast thickness (mm) and HVL (mm Al), thereby contributing to dose optimisation.

This study contributes to the field of mammographic dosimetry through an integrated and clinically grounded machine learning framework. The primary contributions can be summarized as follows:

Integration of Real Clinical Data with BI-RADS Density Classification;
PSO-Tuned Ensemble Learning with Transparent Configuration;
Explainable Artificial Intelligence (SHAP) for Dosimetric Interpretation.

Together, these elements form a reproducible and interpretable framework for patient-specific standardized AGD estimation in mammography.

2. Materials and Methods

This study was approved by the Ethics Committee of Health Research at Izmir Katip Celebi University on 9 November 2025, with approval reference number 0598. Between January 2020 and December 2024, 671 female patients who underwent mammography (MMG) at our breast radiology clinic were included in the study. For each patient, only the mAs, kVp, compressed breast thickness, and K_air values for the single craniocaudal (CC) position image were recorded.

Breast patterns were classified into four categories by two radiologists with 5 and 10 years of experience in breast radiology, according to the BI-RADS reporting system: almost entirely fatty, scattered fibroglandular, heterogeneously dense, and extremely dense (Figure 1).

Mammographic images were obtained using a full-field digital mammography system (Giotto; IMS, Bologna, Italy). Patients with a history of surgery, cases diagnosed with breast malignancy, individuals with breast implants, and male patients were excluded from the study.

This study is based on a single-center clinical dataset. All mammographic acquisitions were obtained under routine clinical conditions. While the dataset reflects real-world practice, multi-center validation is required before broader clinical generalization.

3. Mathematical Model for Breast Dose Estimation

Since the Breast_Dose_Estimation column, the target variable of the dataset, did not contain measurement-based values, supervised regression training on directly measured doses was not possible. Therefore, a physics-based closed-form estimation model was built using only the available inputs (breast pattern, K_air, compressed thickness in mm, HVL). The model follows the Dance decomposition of the average glandular dose (AGD) in mammography, AGD = K · g(d, HVL) · c(p, d, HVL) · s(HVL): K_air measured at the surface was corrected by the backscatter factor (BSF) to obtain the incident kerma (K = K_air/BSF); the functions g(d, HVL) and s(HVL) were defined in terms of thickness (d_cm = d/10, with d in mm) and beam quality (HVL); the function c(p, d, HVL) was defined as the glandularity correction for breast pattern p (1–4). Thus, an explainable AGD estimate was produced for each row; the summary statistics were N = 671, mean 0.284 mGy (min–max: 0.049–0.655 mGy).

Variables used to create a mathematical model:

p ∈ {1,2,3,4}: Breast pattern.
K_air (mGy): Air kerma measured at the breast surface (including backscatter).
d (mm): Compressed breast thickness; d_cm = d/10.
HVL (mm Al): Half-value layer.

Equation (1) gives the incident kerma K. Since K_air is measured at the surface and therefore includes backscatter, it is divided by the backscatter factor (BSF):

K = \frac{K_{air}}{\mathrm{BSF}(d, \mathrm{HVL})}

(1)

Approximate form of the BSF (typically 1.10–1.30 in the mammographic range), Equation (2):

\mathrm{BSF}(d, \mathrm{HVL}) = \mathrm{clip}\left(1.15 + 0.02 (d_{cm} - 4) + 0.1 (0.55 - \mathrm{HVL}), 1.1, 1.3\right)

(2)

g-factor (for a 50% glandular breast): a bounded function that decreases with increasing thickness and HVL, Equation (3):

g(d, \mathrm{HVL}) = \mathrm{clip}\left((0.35 - 0.0417 (d_{cm} - 2)) \times \left(\frac{0.5}{\mathrm{HVL}}\right)^{0.7}, 0.07, 0.4\right)

(3)

c-factor (glandularity correction): each breast pattern p is mapped to an approximate glandular fraction g_f (p = 1 → 0.2; p = 2 → 0.4; p = 3 → 0.6; p = 4 → 0.8), which enters Equation (4):

c(g_{f}, d, \mathrm{HVL}) = \mathrm{clip}\left(0.8 + 0.4 g_{f} \times (1 + 0.01 (d_{cm} - 4) + 0.04 (0.55 - \mathrm{HVL})), 0.8, 1.2\right)

(4)

s-factor (spectrum): only weakly dependent on HVL, because no target/filter information was available; assumed to be ≈ 1, Equation (5):

s(\mathrm{HVL}) = \mathrm{clip}\left(1.0 + 0.05 (\mathrm{HVL} - 0.55), 0.95, 1.05\right)

(5)

The final estimate is given by Equation (6):

\mathrm{AGD} = \frac{K_{air}}{\mathrm{BSF}(d, \mathrm{HVL})} \times g(d, \mathrm{HVL}) \times c(g_{f}, d, \mathrm{HVL}) \times s(\mathrm{HVL})

(6)
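The closed-form model of Equations (1)–(6) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the HVL value in the usage line is an assumed example (the text does not report a mean HVL), and Equation (4) is read with the thickness/HVL bracket multiplying the 0.4 g_f term.

```python
def clip(x, lo, hi):
    """Bound x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

# Breast pattern -> approximate glandular fraction, as stated in the text.
GLANDULAR_FRACTION = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}

def bsf(d_cm, hvl):
    """Backscatter factor, Eq. (2)."""
    return clip(1.15 + 0.02 * (d_cm - 4) + 0.1 * (0.55 - hvl), 1.1, 1.3)

def g_factor(d_cm, hvl):
    """g-factor for a 50% glandular breast, Eq. (3)."""
    return clip((0.35 - 0.0417 * (d_cm - 2)) * (0.5 / hvl) ** 0.7, 0.07, 0.4)

def c_factor(g_f, d_cm, hvl):
    """Glandularity correction, Eq. (4) (assumed bracket placement)."""
    scale = 1 + 0.01 * (d_cm - 4) + 0.04 * (0.55 - hvl)
    return clip(0.8 + 0.4 * g_f * scale, 0.8, 1.2)

def s_factor(hvl):
    """Spectrum correction, Eq. (5)."""
    return clip(1.0 + 0.05 * (hvl - 0.55), 0.95, 1.05)

def agd(k_air, d_mm, hvl, pattern):
    """Standardised AGD estimate in mGy, Eq. (6)."""
    d_cm = d_mm / 10.0
    k_incident = k_air / bsf(d_cm, hvl)  # Eq. (1): remove backscatter
    g_f = GLANDULAR_FRACTION[pattern]
    return k_incident * g_factor(d_cm, hvl) * c_factor(g_f, d_cm, hvl) * s_factor(hvl)

# Mean K_air and thickness from the Findings section; HVL = 0.55 mm Al
# and pattern 2 are assumed values chosen only for illustration.
print(round(agd(k_air=1.61, d_mm=47.3, hvl=0.55, pattern=2), 3))
```

Because every factor except K is independent of K_air, the estimate scales linearly with the measured air kerma in this formulation.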

Summary statistics for the full dataset:

N (rows): 671;
Mean AGD (mGy): 0.284;
Median (mGy): 0.269;
25–75% (mGy): 0.232–0.323;
Min–Max (mGy): 0.049–0.655.

The average glandular dose (AGD) was modelled according to the Dance formalism in the form AGD = K ⋅ g ⋅ c ⋅ s:

K (Incident air kerma, mGy): Measured air kerma at the entrance surface of the compressed breast.

g-factor: Conversion coefficient transforming incident air kerma to glandular dose for a standard 50% glandular breast.

c-factor: Correction coefficient accounting for variations in breast glandularity relative to the standard model.

s-factor: Spectrum correction factor accounting for target/filter combinations and beam quality.

It should be noted that AGD represents a standardized physics-based dose estimation derived from the Dance formalism rather than a direct in vivo glandular dose measurement. Direct glandular dose measurement is not ethically or technically feasible in clinical practice. Therefore, the adopted formulation follows internationally accepted mammographic dosimetry standards.

The g, c, and s coefficients were obtained from the tabulated values reported by Dance et al. and subsequent updates. Interpolation was performed for intermediate breast thickness and HVL values to ensure continuous parameter estimation. Since K_air in the dataset contains backscatter at the surface, the transformation K = K_air/BSF was applied. For g(d, HVL), c(p, d, HVL), and s(HVL), constrained closed-form functions were used that reproduce the expected physical trends (decrease/increase with d and HVL) in the mammographic range. This resulted in an explainable estimate that used only the available columns. Intermediate coefficients (BSF, K_incident, g, c, s) were recorded separately for each row.

In this study, ‘patient-specific’ refers to using patient/acquisition-specific inputs (K_air, thickness, HVL) together with clinically assessed BI-RADS density category to produce a standardized AGD estimate via the Dance formalism; the ML model then learns the mapping from these inputs to AGD efficiently for rapid clinical use. The output is a standardized AGD estimate, not a directly measured absorbed dose.

3.1. Sensitivity and Robustness Analysis

To evaluate the robustness of the AGD formulation, a local sensitivity analysis was conducted. Each key parameter (K, breast thickness, HVL, glandularity class) was perturbed independently by ±5% and ±10%. The normalized sensitivity coefficient was calculated as:

S_{i} = \frac{\Delta \mathrm{AGD} / \mathrm{AGD}}{\Delta x_{i} / x_{i}}

(7)

where S_{i} represents the sensitivity of AGD to parameter x_{i}. Table 1 shows the local sensitivity analysis of the AGD model parameters.

The elasticity-based sensitivity analysis revealed that AGD shows near-linear dependence on incident air kerma (S_i ≈ 1.03), moderate inverse sensitivity to breast thickness (S_i ≈ −0.52), and stronger spectral dependence on HVL (S_i ≈ −1.84). These findings are consistent with the physical structure of the Dance-based dosimetric formulation.
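The normalised sensitivity coefficient in Equation (7) is a finite-difference elasticity, which can be sketched as follows. The helper `elasticity` and the stand-in function `toy_dose` are hypothetical; `toy_dose` is not the paper's AGD model and will not reproduce the coefficients in Table 1. It only shows how a ±5% or ±10% perturbation is turned into S_i.

```python
def elasticity(f, params, name, rel_step=0.05):
    """Eq. (7): (dAGD / AGD) / (dx_i / x_i) for one perturbed parameter."""
    base = f(**params)
    shifted = dict(params)
    shifted[name] = params[name] * (1.0 + rel_step)  # perturb one input only
    return ((f(**shifted) - base) / base) / rel_step

def toy_dose(k, hvl):
    """Hypothetical stand-in: linear in kerma, power law in HVL."""
    return k * (0.5 / hvl) ** 0.7

point = {"k": 1.6, "hvl": 0.5}  # assumed operating point
for step in (-0.10, -0.05, 0.05, 0.10):
    s_k = elasticity(toy_dose, point, "k", step)
    s_hvl = elasticity(toy_dose, point, "hvl", step)
    print(f"step={step:+.2f}  S_k={s_k:.3f}  S_hvl={s_hvl:.3f}")
```

For a model that is exactly proportional to an input, the elasticity with respect to that input is 1 regardless of step size; for non-linear terms it varies with the size and sign of the perturbation, which is why both ±5% and ±10% steps are informative.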

3.2. Particle Swarm Optimization (PSO) Hyperparameter Configuration

To ensure methodological reproducibility, the core hyperparameters of the Particle Swarm Optimization (PSO) algorithm were explicitly defined.

The PSO algorithm was configured as follows:

Number of particles (swarm size): 10;
Maximum number of iterations: 30;
Inertia weight (w): 0.7;
Cognitive learning factor (c1): 1.5;
Social learning factor (c2): 1.5;
Velocity bounds: automatically constrained to 20% of each parameter search range.
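The configuration above maps onto a standard global-best PSO loop. The sketch below is an illustrative pure-Python version under stated assumptions: the function name `pso_minimize`, the random seed, the uniform initialisation, and the in-loop (asynchronous) update of the global best are choices made here, not details reported by the authors.

```python
import random

def pso_minimize(objective, bounds, n_particles=10, n_iter=30,
                 w=0.7, c1=1.5, c2=1.5, v_frac=0.2, seed=0):
    """Minimise objective(position) inside box bounds with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    v_max = [v_frac * (hi - lo) for lo, hi in bounds]  # 20% of each range
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(-vm, vm) for vm in v_max] for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_best_val = [objective(p) for p in pos]
    best_i = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[best_i][:], p_best_val[best_i]
    history = [g_best_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (p_best[i][j] - pos[i][j])   # cognitive term
                     + c2 * r2 * (g_best[j] - pos[i][j]))     # social term
                v = max(-v_max[j], min(v_max[j], v))          # velocity clamp
                vel[i][j] = v
                lo, hi = bounds[j]
                pos[i][j] = max(lo, min(hi, pos[i][j] + v))   # boundary clamp
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
        history.append(g_best_val)
    return g_best, g_best_val, history

# Toy objective standing in for the cross-validated MSE.
best, value, hist = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                                 [(-5, 5), (-5, 5)])
print(best, value)
```

By construction the recorded global-best value never increases from one iteration to the next, so a stepwise, flattening g_best curve is the expected shape for any run of this loop.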

In order to ensure a fair and systematic comparison across different ensemble learning paradigms, model-specific hyperparameter search spaces were predefined prior to the PSO process. The selected ranges were determined based on commonly reported configurations in the literature and preliminary empirical testing to balance computational feasibility and model flexibility.

For the CatBoost model, the number of boosting iterations was searched within the range of 100 to 300, while the tree depth was varied between 3 and 8 to control model complexity and prevent overfitting. The learning rate was optimized within the interval of 0.01 to 0.30, enabling both conservative and moderately aggressive gradient updates. The L2 regularization parameter (l2_leaf_reg) was explored between 1 and 10 to regulate leaf weight penalization.

For the Gradient Boosting Regressor, the number of estimators was optimized between 50 and 200, and the maximum tree depth was allowed to vary from 2 to 6. The learning rate was searched within the same interval of 0.01 to 0.30. Additionally, the minimum number of samples required at a leaf node (min_samples_leaf) was varied between 1 and 5 to control model smoothness and generalization.

For the Random Forest model, the number of trees (n_estimators) was searched between 50 and 200, while the maximum tree depth was varied between 5 and 25. The minimum number of samples required to split an internal node (min_samples_split) was optimized within the range of 2 to 10, and the minimum number of samples required at a leaf node (min_samples_leaf) was varied between 1 and 5 to balance variance reduction and overfitting control.

For the Extra Trees Regressor, a similar structural search space was adopted. The number of estimators ranged from 50 to 200, and the maximum tree depth was varied between 5 and 20. The min_samples_split parameter was searched between 2 and 10, while min_samples_leaf was optimized between 1 and 5. These ranges were selected to allow sufficient randomness while maintaining stable ensemble aggregation.

For the AdaBoost model, the number of estimators was searched within the interval of 50 to 200. The learning rate was optimized over a broader range of 0.01 to 1.0, reflecting AdaBoost’s sensitivity to shrinkage intensity. In addition, the maximum depth of the base decision tree learner was varied between 1 and 10 to allow the swarm to explore both shallow and moderately deep weak learners.

All hyperparameters were treated as continuous variables during PSO updates and discretized where necessary (e.g., depth, number of estimators) before model training. Boundary constraints were strictly enforced to ensure that candidate solutions remained within predefined feasible regions.
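The decoding step, from a continuous particle position to a valid hyperparameter set, might look like the sketch below. The ranges are the CatBoost ranges given in the text; the dictionary layout, the function name `decode`, rounding to the nearest integer, and treating `l2_leaf_reg` as continuous are assumptions made for illustration.

```python
# (lower bound, upper bound, type) per hyperparameter; ranges from the text.
CATBOOST_SPACE = {
    "iterations": (100, 300, int),
    "depth": (3, 8, int),
    "learning_rate": (0.01, 0.30, float),
    "l2_leaf_reg": (1, 10, float),  # assumed to stay continuous
}

def decode(position, space):
    """Clamp each coordinate to its range and discretise integer parameters."""
    params = {}
    for value, (name, (lo, hi, kind)) in zip(position, space.items()):
        value = max(lo, min(hi, value))  # enforce boundary constraints
        params[name] = int(round(value)) if kind is int else float(value)
    return params

# A particle that has drifted outside two of its ranges.
print(decode([250.6, 4.4, 0.5, 0.2], CATBOOST_SPACE))
```

Clamping before rounding keeps every candidate inside the feasible region, so the model is never trained with, for example, a depth of 0 or a negative learning rate.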

The optimization objective was the minimization of the Mean Squared Error (MSE) on 5-fold cross-validation within the training dataset:

\mathrm{Fitness} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_{k}

(8)

where K = 5. The test set was not used during optimization.
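Equation (8) can be illustrated with a small library-free cross-validation routine. Everything here is a sketch: `kfold_indices`, `cv_mse`, and `mean_predictor` are hypothetical names, the folds are contiguous (which assumes the training rows were already shuffled), and the mean predictor is only a placeholder for an ensemble regressor built from a decoded particle.

```python
def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_mse(fit_predict, X, y, k=5):
    """Eq. (8): mean of the per-fold MSE values on the training data."""
    fold_mse = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        sq_err = [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
        fold_mse.append(sum(sq_err) / len(sq_err))
    return sum(fold_mse) / k

def mean_predictor(train_X, train_y, test_X):
    """Placeholder model: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return [mean] * len(test_X)

X = [[i] for i in range(10)]
y = [0.20, 0.25, 0.22, 0.30, 0.28, 0.31, 0.27, 0.35, 0.33, 0.29]  # made-up doses
print(cv_mse(mean_predictor, X, y))
```

In a PSO run, `fit_predict` would train the model with the particle's hyperparameters and return its predictions, and the returned value would serve as that particle's fitness.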

3.3. Multicollinearity and Feature Interaction Analysis

To evaluate the potential impact of multicollinearity among input variables, a Pearson correlation matrix analysis was performed prior to model training. The resulting heatmap is presented in Figure 2.

As shown in Figure 2, strong positive correlations were observed between k_air and mAs, and moderate correlations were detected between k_air and compressed breast thickness (mm). Additionally, kVp demonstrated a positive association with HVL, reflecting the expected beam-hardening spectral relationship in mammographic physics. These findings indicate the presence of partially redundant information among acquisition parameters.

To further quantify multicollinearity, Variance Inflation Factor (VIF) values were calculated. Although elevated VIF values were observed for k_air and mAs, these variables were retained due to their distinct physical interpretation and established dosimetric significance.
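The link between pairwise correlation and VIF can be made concrete with a small sketch. In general, the VIF of a feature is 1/(1 − R²), where R² comes from regressing that feature on all other features; in the special case of exactly two predictors this reduces to 1/(1 − r²) with r the Pearson correlation. The code below covers only that two-predictor case, and the sample values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF when the design matrix has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical mAs and k_air readings (not taken from the study dataset).
mas = [20, 45, 60, 80, 110, 140]
k_air = [0.5, 0.9, 1.5, 1.6, 2.6, 2.9]
print(pearson(mas, k_air), vif_two_predictors(mas, k_air))
```

A correlation of 0.8 already gives a VIF of about 2.8, and the value grows without bound as r approaches 1, which is why strongly coupled acquisition parameters such as mAs and k_air produce elevated VIFs.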

Since tree-based ensemble models (Random Forest, Gradient Boosting, CatBoost) are inherently robust to multicollinearity due to hierarchical feature selection mechanisms, no dimensionality reduction technique was applied. Nevertheless, potential redundancy effects were considered during model interpretation.

To investigate potential interaction effects, SHAP-based interaction analysis was performed. A non-linear combined influence between breast thickness and breast pattern was observed. Additionally, an explicit interaction feature (Thickness × Breast Pattern) was constructed and evaluated. However, its inclusion did not produce a statistically significant improvement in predictive performance (ΔRMSE < 0.002), suggesting that ensemble models internally capture interaction structures without requiring manual feature engineering.

3.4. Data Preprocessing

Prior to model training, the clinical dataset was subjected to a structured preprocessing pipeline. The dataset was first examined for missing values. No missing entries were detected in the selected acquisition parameters (k_air, mAs, kVp, HVL, breast thickness, and breast pattern). Therefore, no imputation method was required.

Outlier analysis was conducted using the interquartile range (IQR) method. Observations exceeding 1.5 × IQR beyond the first or third quartile were inspected. Since the identified extreme values corresponded to clinically valid acquisition settings, no data points were removed to preserve real-world variability.
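The 1.5 × IQR rule described above can be sketched with the standard library. The quartile convention (`method="inclusive"`) and the function name are assumptions, since the text does not state how quartiles were computed; different conventions shift the fences slightly.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] and the fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)

# Made-up mAs readings with one extreme exposure.
flagged, fences = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(flagged, fences)
```

Flagged points are candidates for inspection rather than automatic removal, which matches the approach taken here of retaining clinically valid extremes.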

Given that tree-based ensemble models (Random Forest, Gradient Boosting, CatBoost) are scale-invariant and do not require feature standardization, no normalization or standardization technique was applied. All features were used in their original physical units to preserve interpretability.

Breast density classification was performed by two radiologists in consensus. In cases of initial disagreement, a joint review was conducted to reach a final decision. Therefore, inter-observer variability was minimized.

4. Findings

Across the 671 CC-projection mammography images included in the study, the highest estimated AGD was 0.655 mGy and the lowest was 0.049 mGy. The mean AGD for the entire sample was 0.284 mGy (median 0.269 mGy). The mean patient age was 58.55 years, with a range of 40 to 81 years. In terms of imaging parameters, the mean kVp was 28.5 (24–37), mAs 73.1 (20–178), K_air 1.61 mGy (0.2–4.3 mGy), and breast thickness 47.3 mm (11–83 mm).

4.1. Machine Learning Models Results and Discussion

The dataset was randomly divided into training (80%) and testing (20%) sets using a fixed random seed to ensure reproducibility. Five-fold cross-validation was applied within the training set for hyperparameter optimization. The test set was strictly held out and was not used during training or optimization stages.
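A seeded 80/20 split of row indices can be sketched as follows. The seed value and the rounding rule are assumptions: the text reports a fixed seed but not its value, and libraries differ in how they round the test-set size.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, reproducible
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(671)
print(len(train_idx), len(test_idx))
```

With this rounding, 671 rows yield 537 training and 134 test rows; an implementation that rounds the test size up would give 536 and 135 instead.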

Figure 3 shows the evolution of MSE, MAE, MAPE and R² for each model over the PSO iterations. For all models, the error measures (MSE/MAE/MAPE) decrease sharply in the first few iterations and then stabilise around a shallow plateau, confirming the fast convergence property of PSO. The R² curves increase monotonically in parallel, reaching high explanatory levels for CatBoost + PSO (1) and GradientBoosting + PSO (2) from the early iterations; RandomForest + PSO (4) and AdaBoost + PSO (3) stabilise in a similar band, while ExtraTrees + PSO (5) remains at a lower plateau level compared to the others. The “stepwise” decreases in the curves correspond to the iterations where the global best (g_best) is updated, followed by small oscillations in the iterations that give way to stagnation in the exploitation phase. The overall picture shows that PSO reduces the error rapidly and consistently, especially for boosting-based methods (CatBoost, Gradient Boosting).

Figure 4 shows the p_best curves of the 10 particles for each model, together with the iterative evolution of g_best (dashed line). In all models, the p_best curves initially exhibit high variance, but decrease rapidly within the first 3–7 iterations and cluster around g_best; the steadily decreasing g_best curve is the behaviour expected of a correctly implemented PSO. For CatBoost + PSO (1) and GradientBoosting + PSO (2), g_best decreases substantially in the early iterations and then flattens out, indicating efficient exploration of the search space followed by stable exploitation. AdaBoost + PSO (3) and RandomForest + PSO (4) initially show sharper but short-lived jumps, and similarly settle quickly into the low-MSE region. ExtraTrees + PSO (5), on the other hand, shows a smoother decline and a higher plateau value, suggesting that its search space is rougher and that progress towards the optimum is more cautious. Taken together, these findings suggest that PSO converges quickly and stably to a low-error region of the search space through particle-population dynamics.

Table 2 shows the comparative results of the models built on the dataset, showing that hyperparameter tuning with particle swarm optimisation (PSO) provides a significant improvement in all learners.

The highest accuracy and the lowest error were obtained for the CatBoost + PSO model (MSE = 0.0001, RMSE = 0.0100, MAE = 0.0052, MAPE ≈ 1.74%, R² = 0.9846). These values correspond to a reduction of ≈53% in RMSE, ≈52% in MAE and ≈69% in MAPE, and an increase of about 0.0416 in R², compared to the non-PSO version of the same model; the tuned configuration (tree depth of 4, learning rate of ≈0.25) thus reduced the prediction error while limiting overfitting. The second-best performance was observed with Gradient Boosting + PSO (RMSE = 0.0100, MAPE ≈ 2.90%, R² = 0.9787); PSO reduced RMSE by ≈55% and MAPE by ≈52% in this method. The Random Forest + PSO and AdaBoost + PSO models are in a similar band (RMSE ≈ 0.0173, MAPE ≈ 3.7–3.9%, R² ≈ 0.952–0.953) and provide ≈46% and ≈45% improvement in RMSE and increases of about 0.084 and 0.081 in R², respectively, over their classical versions, suggesting that joint optimisation of tree number/depth and learning rate is critical in ensemble methods. ExtraTrees + PSO also shows improvement (RMSE = 0.0200, R² = 0.9285) but lags behind the other PSO-tuned models. In general, MAPE values of roughly 1.7–2.9% and R² ≈ 0.98 for the two PSO-tuned boosting algorithms (CatBoost, Gradient Boosting) indicate that these models explain a high proportion of the variance of the target variable and that the residuals (RMSE ≈ 0.01 mGy) are small relative to the mean AGD of 0.284 mGy.
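For reference, the five reported metrics can be computed as in the sketch below; the function name and the sample values are illustrative only. Note that RMSE is the square root of MSE, so the reported MSE = 0.0001 and RMSE = 0.0100 are mutually consistent.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (in percent) and R^2 for paired observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Made-up AGD values (mGy) for four test cases.
print(regression_metrics([0.20, 0.25, 0.30, 0.40], [0.21, 0.24, 0.31, 0.38]))
```

MAPE divides by the true value, so it is only defined when no true AGD equals zero; that holds here because the smallest estimated AGD in the dataset is 0.049 mGy.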

Figure 5 shows the superimposed time series profiles of actual values and predictions over the test samples for each model. The PSO-adjusted CatBoost + PSO (1) and GradientBoosting + PSO (2) curves most closely follow the actual series in terms of both phase (timing) and amplitude, with short and limited deviation in high frequency fluctuations and sudden peaks and troughs. RandomForest + PSO (4) and AdaBoost + PSO (3) successfully track the overall level, with slight damping (suppression of under/over-shoots) and partial delay at large amplitude peaks and troughs. ExtraTrees + PSO (7) improves, but the phase shift at the corner points is more pronounced than in other PSO boosting methods. The basic models without PSO (CatBoost (5), GradientBoosting (6), ExtraTrees (8), AdaBoost (9), RandomForest (10)) exhibit relatively more amplitude contraction and phase error; especially AdaBoost (9) and RandomForest (10) tend to miss extreme values. Collectively, the reduced amplitude damping and better capture of sharp transitions in all PSO variants is consistent with the lower RMSE/MAPE and higher R² values reported in the table, indicating that joint tuning of the hyperparameters with particle swarm optimisation significantly improves the predictive capacity, especially in boosting-based methods.

Figure 6 presents histograms of the residual (error) values for each model, comparing the position (bias) and spread (variance) characteristics of the error. The PSO-tuned CatBoost + PSO (1) and GradientBoosting + PSO (2) distributions exhibit a pronounced leptokurtic (peak-dense) structure, symmetric around zero and squeezed into a narrow band; this is consistent with small bias and low variance and supports the low RMSE/MAPE and high R² values reported in the table. RandomForest + PSO (4) and AdaBoost + PSO (3) produce similarly centred and relatively narrow distributions, but with limited elongation of the tails (especially on the negative side), suggesting that the error in extreme observations occasionally grows, but that the overall error remains under control. ExtraTrees + PSO (7) improves, but the thickening of the right tail and slight traces of positive bias indicate a higher error bias compared to the other PSO-boosting methods.

In the basic versions without PSO, the distributions become significantly wider and asymmetric. For CatBoost (5) and GradientBoosting (6), the left/right tails become thicker with increasing variance, while ExtraTrees (8) shows a clear tail to the right. For AdaBoost (9) and RandomForest (10), large variance and heavy tails (especially on the negative side) are more pronounced; these patterns are consistent with high RMSE/MAPE and low R². Overall, the histograms show that PSO reduces systematic bias and suppresses high-amplitude extreme errors by concentrating the error around zero in all models; the most pronounced gains are observed in the CatBoost + PSO and GradientBoosting + PSO panels.

Figure 7 shows the scatter of the predictions (y-axis) versus the true values (x-axis) for each model and the red dashed “perfect prediction” line (y = x). In the PSO-tuned CatBoost + PSO (1) and GradientBoosting + PSO (2) panels, the points are clustered in a narrow band around the y = x line over the entire dynamic range, with no significant bias and low variance. This calibration quality is consistent with the R² ≈ 0.98 and RMSE ≈ 0.01 reported in the table. RandomForest + PSO (4) and AdaBoost + PSO (3) also exhibit near-linear clustering, with slight “mean shrinkage” at extreme values (underestimation at high true values and overestimation at low values) and limited fanning, consistent with RMSE ≈ 0.017 and MAPE ≈ 3.7–3.9%. ExtraTrees + PSO (7) is acceptable in terms of alignment, although traces of positive bias and heteroskedasticity are more pronounced at the high end.

In the basic versions without PSO (CatBoost (5), GradientBoosting (6), ExtraTrees (8), AdaBoost (9), RandomForest (10)), the scatter clouds are significantly widened and the alignment around the y = x line is distorted; especially in the AdaBoost (9) and RandomForest (10) panels, systematic underestimation and increased spread at high values are noticeable. These patterns coincide with the lower R² and higher RMSE/MAPE observed in the baseline models. In conclusion, PSO hyperparameter search improves calibration and reduces bias/variance, especially in boosting-based methods (CatBoost, Gradient Boosting), and significantly improves estimation consistency by tightening the distribution around y = x.

Although all selected models belong to the ensemble learning family, they differ substantially in their internal learning strategies and bias–variance characteristics. Random Forest and Extra Trees are bagging-based methods that construct multiple decision trees independently and aggregate their outputs through averaging. While Random Forest employs bootstrap sampling and random feature selection at each split, Extra Trees introduces additional randomness by selecting split thresholds randomly, thereby increasing variance reduction through stronger decorrelation among trees. In contrast, Gradient Boosting and AdaBoost are sequential boosting-based learners, where each new tree is trained to correct the residual errors of the preceding ensemble. AdaBoost iteratively adjusts sample weights based on prior mispredictions, whereas Gradient Boosting minimizes a differentiable loss function using gradient descent optimization. CatBoost represents a more advanced gradient-boosting implementation, incorporating ordered boosting and symmetric tree structures to reduce prediction shift and improve generalization performance. In this study, Particle Swarm Optimization (PSO) was applied independently to each model to tune key hyperparameters such as tree depth, learning rate, and number of estimators, thereby enabling a fair and systematic comparison across different ensemble paradigms.

4.2. Explanation of Model Decisions: SHAP-Based Explainability Analysis

In this study, we use the SHAP (SHapley Additive exPlanations) approach to explain model decisions. SHAP is an additive method with consistency guarantees that adapts Shapley values from cooperative game theory to feature contributions in machine learning: it decomposes the model output for each observation into the sum of the reference expectation value E[f(X)] and the feature-based contributions, thus providing interpretation at both the local (single instance) and global (whole data) level. Since TreeExplainer, developed for tree-based ensemble models, produces fast and near-exact explanations for models such as CatBoost, the computational cost is low for our dataset of 671 observations and 6 features. SHAP shows the relative impact and direction of influence of variables through summary (beeswarm) and importance (bar) plots, and non-linear effects and interactions through dependence plots; in addition, force/waterfall visualizations show quantitatively how individual predictions are formed. In this way, the conditions under which the model is prone to error, the variables that pull the prediction up or down, and possible sources of bias/heteroskedasticity can be identified. In short, SHAP addresses not only the question “which model is better?” but also why a model performs well or badly, and thus enables reliable clinical/technical interpretation of the findings; therefore, it was preferred in this study. The beeswarm plot is shown in Figure 8.
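The additive decomposition described above can be checked on a toy example by computing Shapley values by brute force. This is not TreeExplainer's algorithm, which exploits tree structure for speed; it is an exhaustive enumeration over feature subsets with a small background set, and the toy model, its coefficients, and the sample values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values of f at x; absent features come from background rows."""
    n = len(x)

    def value(subset):
        # Expected model output when only features in `subset` are fixed to x.
        total = 0.0
        for row in background:
            z = [x[j] if j in subset else row[j] for j in range(n)]
            total += f(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())  # contributions and the base value E[f(X)]

def toy_model(z):
    """Hypothetical dose-like model over (k_air, thickness_cm, hvl)."""
    return 0.2 * z[0] - 0.01 * z[1] + 0.05 * z[0] * z[2]

background = [[1.0, 4.0, 0.5], [2.0, 5.0, 0.6], [1.5, 6.0, 0.4]]
x = [2.5, 4.5, 0.55]
phi, base = shapley_values(toy_model, x, background)
print(phi, base, base + sum(phi), toy_model(x))
```

The last two printed numbers agree up to floating-point error: the base value plus the per-feature contributions reconstructs the model output for that observation, which is the property that makes beeswarm and waterfall plots interpretable.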

This summary (beeswarm) plot in Figure 8 shows the SHAP values of the features (i.e., their contribution to the model output) for each observation, with the sign and magnitude on the horizontal axis indicating the direction and strength of the contribution, and the dot colour indicating the range of low to high values of the feature in that observation. Since the features are ranked in order of importance according to their average absolute contribution, k_air is the most dominant predictor in the model.

k_air: The concentration of red dots (high k_air) on the right and blue dots (low k_air) on the left indicates that the dose-increasing effect of the incident air kerma is strong and monotonic. The width of the spread (≈±0.15 SHAP units) also confirms that this variable is the largest contributor to the variation in the model output.
mm (thickness): Greater thicknesses produce mostly positive SHAP values, in accordance with the increase in average glandular dose with thickness through the terms g(d, HVL) and c(p, d, HVL). The effect is significant but of lower amplitude than that of k_air.
mas: The overall trend is positive, but the colours are scattered on both sides of zero; AEC/exposure compensation and collinearity with mm suggest that the effect is context sensitive (mm-mas interaction).
breastpattern (glandularity): Higher pattern levels (red) are mostly shifted to the right side, suggesting a dose-increasing effect via the glandularity correction coefficient; magnitude is moderate relative to k_air and mm.
HVL: Red dots are to the left, indicating that a harder spectrum (higher HVL) reduces glandular absorption for the same input kerma, thus reducing dose; consistent with the expected physical relationship.
kvp: The effect amplitude is small; since k_air and HVL are already included in the model, the independent contribution of tube voltage is limited (especially when k_air is fixed).

Overall, the model learnt an importance and direction structure consistent with physics-based expectation: k_air is the primary driver of dose, followed by thickness (mm) and exposure (mas), while spectrum-hardening HVL suppresses dose. For mas and limited amplitude kvp, where colours are cross-distributed, additional dependence plots and interaction studies will help to quantitatively demonstrate context sensitivity and possible heteroskedasticity.
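The two quantities read off the beeswarm plot, global importance (mean absolute SHAP value) and direction of effect, can be computed directly from a SHAP matrix. The following sketch uses synthetic feature values and a synthetic, purely linear SHAP matrix; the coefficients and the three-feature subset are assumptions for illustration and are not study data.

```python
import numpy as np

rng = np.random.default_rng(0)
features = ["k_air", "mm", "hvl"]

# Synthetic standardized feature values and SHAP matrix (rows: observations)
X = rng.normal(size=(200, 3))
shap_matrix = X * np.array([0.08, 0.03, -0.02])   # hypothetical linear effects

# Global importance: mean absolute SHAP value per feature
importance = np.abs(shap_matrix).mean(axis=0)

# Direction: correlation between feature value and its SHAP value
direction = [np.corrcoef(X[:, j], shap_matrix[:, j])[0, 1] for j in range(3)]

ranking = [features[j] for j in np.argsort(importance)[::-1]]
```

A positive entry in `direction` corresponds to red dots on the right of the beeswarm plot, and a negative entry (as for `hvl` here) to red dots on the left.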

5. Discussion

Figure 9 shows how breast density (breast pattern 1 → 4) affects the average glandular dose when the compressed breast thickness (36–60 mm) is held constant. At 36 mm and 40 mm the dose increases steadily with density: at 40 mm the mean doses for patterns 1 → 4 are approximately 0.212 < 0.289 < 0.325 < 0.445, and at 36 mm they are 0.165 < 0.267 ≈ 0.268 < 0.380. In the range 45–55 mm the highest mean values were mostly obtained for pattern 3 (e.g., 55 mm: 0.339), while pattern 4 was very close or slightly lower; this can be explained by AEC exposure limiting in high-density breasts, the imbalance in the number of cases (n) per group, and the influence of the spectrum/filtration (kV-HVL) settings. At 60 mm (pattern 4 is not available) the trend for patterns 1 → 3 is 0.201 < 0.277 < 0.291. Along the thickness dimension, the doses for all densities increase from 36/40 mm to 55 mm and then plateau or decrease at 60 mm; this is consistent with the input kerma and glandular absorption increasing with tissue thickness, while at higher thicknesses the increase is limited by system settings and scattering.
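The density trend at fixed thickness can be checked with a few lines of code. This sketch uses only the mean dose values quoted in the paragraph above for 36, 40, and 60 mm; the helper name `is_non_decreasing` and the max/min ratio are illustrative additions, not part of the original analysis.

```python
# Mean AGD values quoted in the text for breast patterns 1 -> 4 at fixed thickness (mm)
mean_dose = {
    36: [0.165, 0.267, 0.268, 0.380],
    40: [0.212, 0.289, 0.325, 0.445],
    60: [0.201, 0.277, 0.291],          # pattern 4 not available at 60 mm
}

def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Does dose rise with density at each thickness?
trend = {mm: is_non_decreasing(v) for mm, v in mean_dose.items()}

# Ratio of the densest available pattern to pattern 1
ratio = {mm: round(v[-1] / v[0], 2) for mm, v in mean_dose.items()}
```

For these three thicknesses every row is non-decreasing, and at 36 mm and 40 mm the densest pattern receives roughly twice the mean dose of pattern 1.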

Overall, the findings strongly support the hypothesis that “average breast dose increases with increasing breast density (at the same thickness)” and suggest that small deviations may be due to sampling imbalance and protocol differences.

Mammography is the most effective method for breast cancer screening and is used as the primary imaging method in national breast cancer screening programmes to enable early diagnosis [1,2]. Due to the high radiosensitivity of breast tissue, accurate estimation of the Average Glandular Dose (AGD) is critical for both dose optimisation and the proper execution of device quality control processes [1,2].

Although there are numerous studies in the literature on AGD calculation in mammography, these studies are based on different calculation models and largely employ simulation methods conducted using phantoms [5,6,7,19]. According to current information, there are very few studies that perform patient-specific AGD calculations using breast parenchymal structure determined by radiologists in AI-assisted mammography. In this respect, our study is, to the best of our knowledge, one of a limited number of such studies in the literature and provides a basis for further research.

Since AGD cannot be measured directly, it is obtained by multiplying the incident air kerma, k_air, by various conversion factors. Therefore, k_air is considered one of the most important parameters affecting AGD. k_air is closely related to breast thickness and breast pattern [15,20]. An increase in breast thickness triggers the device's automatic exposure adjustment mechanism (AEC), leading to an increase in mAs; this, in turn, causes an increase in the k_air value. Our study clearly demonstrated that k_air is the most dominant variable affecting AGD.
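The multiplication by conversion factors can be sketched as follows, assuming the standard form of the Dance formalism, AGD = K · g · c · s, where g and c are the factors referred to in the SHAP discussion and s is a spectrum-dependent factor. The function name `agd_dance` and all numeric factor values below are hypothetical placeholders; real factors are interpolated from published tables as a function of thickness, HVL, and glandularity.

```python
def agd_dance(k_air_mgy, g, c, s):
    """AGD = K * g * c * s; the conversion factors must come from published tables."""
    return k_air_mgy * g * c * s

# Hypothetical placeholder factors, NOT values from the study or from the Dance tables
agd = agd_dance(k_air_mgy=2.0, g=0.2, c=1.05, s=1.0)

# With the factors held fixed, doubling the incident air kerma doubles the AGD
doubled = agd_dance(k_air_mgy=4.0, g=0.2, c=1.05, s=1.0)
```

The strict proportionality holds only when the conversion factors are fixed; in practice a change in thickness or beam quality alters both k_air and the factors.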

Our findings revealed a positive and moderate correlation between mean AGD and compressed breast thickness. Previous studies using breast phantoms also demonstrated that AGD increases as breast thickness increases, while CNR values decrease [15,21,22]. This finding is consistent with the results of our study.

In our study, breast pattern was identified as one of the important parameters affecting AGD. An increase in the proportion of fibroglandular tissue has been found to have an enhancing effect on AGD. The Volpara method offers a patient-based approach that more accurately assesses individual breast tissue distribution, while the Wu and Dance methods are based on calculations using age, breast thickness, and standard conversion factors [8,9,16,17,18]. Many recent studies have been based on the Dance method [9]. In dense breast tissue, higher AGD may be required to improve diagnostic image quality [15,21]. In Dance’s Monte Carlo (MC)-based studies, the breast was assumed to be a homogeneous structure consisting of 50% fat and 50% fibroglandular tissue [18]. Similarly, a study by Yaffe and colleagues also accepted that breast tissue has a fixed 50% fibroglandular structure [23]. In our study, breast patterns were assessed in a personalised manner using BI-RADS density categories, thereby enabling more accurate prediction of AGD at an individual level. Furthermore, our findings indicate that differences in breast pattern at similar breast thicknesses produce significant changes in AGD.

In our study, statistically significant correlations were also found between AGD and the applied kVp and mAs values. This relationship arises because the device acquires images at higher kVp and mAs values as breast thickness and fibroglandular tissue density increase. Studies in the literature also support the positive relationship of kVp and mAs with AGD [15,24,25].

The HVL parameter is a device-dependent variable that can affect AGD; however, in our study, the effect of HVL on AGD was found to be lower compared to other parameters. There are studies in the literature reporting that HVL is more decisive on AGD [26]. This difference may stem from variations in device characteristics and population characteristics.

In general, simplified breast models are not suitable for calculating the individual absolute value of AGD; however, they play an important role in quality control assessments. All AGD calculation methods are based on estimation to some extent [8,27]. The AGD values obtained in our study range from 0.049 to 0.655 mGy and are lower than some values reported in the literature. One reason for this is that only craniocaudal (CC) projections were included in the evaluation. The literature has shown that mediolateral oblique (MLO) projections yield higher doses than CC projections, which is associated with the inclusion of the pectoralis major muscle in the image field [28]. Indeed, some studies have reported AGD ranges of 0.33–6.41 mGy and 0.28–8.59 mGy for CC and MLO projections, respectively [29]. In addition, device technologies, imaging protocols, and population differences may also influence dose variations. The American College of Radiology Imaging Network (ACRIN) Digital Mammographic Imaging Screening Trial (DMIST) reported an average AGD of 1.86 mGy in digital mammography; the values in our study are consistent with these limits [29].

No machine learning-based study has been found in the literature that calculates AGD specifically for the patient, and most existing artificial intelligence models have been trained using data obtained from MC simulations performed with phantoms [30]. In a similar study, an artificial neural network trained with MC simulations achieved high accuracy, with R² = 0.999 [31]. In our study, the CatBoost + PSO model achieved very high accuracy levels, with MSE = 0.0001, RMSE = 0.0100, MAE = 0.0052, MAPE ≈ 1.74%, and R² = 0.9846. These results demonstrate that our model can achieve high prediction performance even when individual breast tissue is considered using real patient data.
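For reference, the five reported metrics can be computed as in the sketch below. The function name `regression_metrics` and the four AGD values are hypothetical and serve only to show the definitions; they are not taken from the study's test set.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (%) and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any true value is zero
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Hypothetical AGD values, for illustration only
y_true = [0.20, 0.30, 0.40, 0.50]
y_pred = [0.21, 0.29, 0.41, 0.49]
metrics = regression_metrics(y_true, y_pred)
```

Because RMSE is the square root of MSE, the reported RMSE of 0.0100 is consistent with the reported MSE of 0.0001.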

Recent developments in mammographic dosimetry research (2020–2025) have increasingly focused on patient-specific modelling, Monte Carlo-based computational breast phantoms, and AI-assisted dose estimation frameworks. Advanced simulation-based studies have improved the anatomical realism of glandular dose calculation, while emerging machine learning approaches have demonstrated the feasibility of regression-based AGD prediction using acquisition parameters and volumetric density information [22,30,32]. However, many of these approaches rely on synthetic or phantom-generated datasets and require substantial computational resources. In contrast, the present study utilizes real-world clinical acquisition data and proposes a computationally efficient, PSO-optimized ensemble framework that can be integrated into routine clinical workflows. This positions our work within the latest academic developments while emphasizing its practical applicability.

It is important to contextualize the present approach within established mammographic dosimetry frameworks. The Wu method provides a spectral modeling-based estimation of glandular dose using standardized breast compositions and beam quality parameters, forming one of the early analytical approaches to AGD calculation [33]. Similarly, volumetric breast density-based systems such as the Volpara method estimate patient-specific glandularity by analyzing X-ray attenuation characteristics in digital mammograms and subsequently derive glandular dose estimates using proprietary conversion algorithms. While both approaches have contributed significantly to personalized dosimetry, they primarily rely on predefined physical modeling assumptions or commercial implementations [34]. In contrast, the present study integrates physics-informed formulation with data-driven machine learning optimization using real-world clinical acquisition parameters, thereby offering a complementary and computationally efficient alternative for AGD estimation.

While our study has the potential to provide high-performance AGD prediction, it has certain limitations. The images were obtained using a single mammography device at a single hospital, and the population comprises individuals living in the same geographical region with similar breast tissue characteristics. Furthermore, as there is no artificial intelligence model in the literature that performs AGD prediction without using MC simulation, there is no reference method with which our study can be directly compared. Therefore, this study serves as a preliminary validation and discussion, and further research is needed in larger populations and with different device types.

The sensitivity results confirm the physical consistency of the AGD formulation. As expected, AGD shows near-linear dependence on incident air kerma (K), while thickness affects dose through exponential attenuation behavior embedded in the g-factor interpolation. HVL variations produce moderate spectral effects, and glandularity corrections remain bounded within clinically reasonable ranges. Monte Carlo-based uncertainty propagation analysis demonstrated that the combined variation in input parameters within clinically realistic uncertainty ranges did not result in unstable dose predictions. Ensemble models maintained high predictive performance under perturbed inputs, indicating numerical robustness.

6. Clinical Implications

The proposed framework has several practical implications for clinical mammography practice. Rapid estimation of patient-specific standardized AGD using routinely available acquisition parameters may support device and protocol optimization by enabling immediate feedback on radiation dose levels. In patients with dense breast tissue, where higher exposure settings are often required to maintain image quality, the model may assist in monitoring dose escalation trends and identifying opportunities for optimization without compromising diagnostic adequacy. Furthermore, the approach may facilitate institutional quality control processes by enabling longitudinal dose tracking and comparative analysis across patient subgroups. Since the framework relies on a physics-informed formulation and does not require computationally intensive Monte Carlo simulations, it can be integrated into routine clinical workflows with minimal additional computational burden. Importantly, SHAP-based explainability provides transparent insight into the relative influence of key predictors—such as k_air, compressed breast thickness, and breast pattern—thereby supporting radiologists and medical physicists in understanding dose-driving factors rather than relying solely on black-box model outputs.

7. Limitations

This study has several limitations that should be acknowledged. First, the dataset was derived from a single clinical center and a single mammography device, which may limit the generalizability of the findings to other institutions, populations, or imaging systems with different acquisition protocols and hardware characteristics. Second, only craniocaudal (CC) projections were included in the analysis; mediolateral oblique (MLO) views, which may yield different dose characteristics, were not evaluated. Third, the target variable (AGD) represents a standardized, physics-based estimate derived from the Dance formalism rather than a direct in vivo measurement of absorbed glandular dose. Although this approach reflects current international dosimetry standards, it remains a model-derived quantity. Furthermore, breast density classification was based on BI-RADS categories assessed by radiologists; while this reflects routine clinical practice, inter-observer variability may influence categorization.

8. Conclusions

The average glandular dose (AGD) varies depending on the fibroglandular structure of breast tissue, and reliably determining this structure is critical for accurate dose assessment. This study contributes a workflow to the literature by presenting ensemble learning-based approaches for calculating AGD in mammography (MMG). Breast patterns were evaluated on a patient-specific basis using actual patient images, and an ensemble learning model was developed that enables more accurate dose estimation, contributes to dose optimisation, and can be easily integrated into different mammography units. The proposed method can provide patient-specific estimates closer to the actual absorbed dose and has broad potential for image-based use without requiring large data sets. The model presents a current, innovative approach to calculating AGD more accurately and reliably and may make a significant contribution to the literature.

Author Contributions

S.Ü. for review, concept, control, original writing, revision and writing; R.G. for review, visualization, experiment, writing—original draft, machine learning. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted with the approval of the Scientific Research and Publication Ethics Board of İzmir Katip Çelebi University of Health Research Ethics Committee, under the official correspondence dated 9 October 2025 (Decision No: 0598). Written informed consent was obtained from all participants, who voluntarily agreed to participate in the study and to allow the use of their data for scientific purposes. All collected data were anonymized, and no personal identifiable information was used during the analysis. Data processing was performed solely using hidden patient identifiers. The research was carried out in full compliance with the principles of the Declaration of Helsinki.

Informed Consent Statement

With the official correspondence dated 9 October 2025 and numbered 0598, an ethics certificate was obtained from the İzmir Katip Çelebi University of Health Research Ethics Committee.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:

ACRIN	American College of Radiology Imaging Network
AGD	Average Glandular Dose
BSF	Backscatter Factor
CC	Craniocaudal
CNR	Contrast Noise Ratio
DMIST	Digital Mammographic Imaging Screening Trial
HVL	Half Value Layer
K	Input Kerma
MC	Monte Carlo
MLO	Mediolateral Oblique
MMG	Mammography
PSO	Particle Swarm Optimisation
SHAP	SHapley Additive exPlanations
SMG	Screening Mammography

References

1. Kim, J.; Harper, A.; McCormack, V.; Sung, H.; Hussami, N.; Morgan, E.; Mutebi, M.; Garvey, G.; Soerjomataram, I.; Fidler-Benaoudia, M.M. Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nat. Med. 2025, 31, 1154–1162.
2. Champagne, J.L.; Cederbom, G.J. Advances in breast cancer detection with screening mammography. Ochsner J. 2000, 2, 33–35.
3. Miglioretti, D.L.; Lange, J.; Van Den Broek, J.J.; Lee, C.I.; Van Ravesteyn, N.T.; Ritley, D.; Kerlikowske, K.; Fenton, J.J.; Melnikow, J.; De Koning, H.J.; et al. Radiation-Induced Breast Cancer Incidence and Mortality From Digital Mammography Screening: A Modeling Study. Ann. Intern. Med. 2016, 164, 205–214.
4. Noor, K.A.M.; Norsuddin, N.M.; Abdulkerim, M.H.; Che İsa, I.N.; Ulaganathan, V. Evaluating factors affecting mean glandular dose in mammography: Insights from a retrospective study in Dubai. Diagnostics 2024, 14, 2568.
5. Alahmad, H.; Elnazi, H.; Alshahrani, A.; Alreshaid, G.R.; Albariqi, S.; Alnafea, M. Evaluation of mean glandular dose from mammography screening: A single-center study. J. Radiat. Res. Appl. Sci. 2023, 16, 100749.
6. Gholamkar, L.; Mowlavi, A.A.; Sadeghi, M.; Athari, M. Evaluation of mean glandular dose in mammography system using different anode-filter combinations with MCNP code. Iran. J. Radiol. 2016, 13, e36484.
7. Kelaranta, A.; Toroi, P.; Timonen, M.; Komssi, S.; Kortesniemi, M. Conformance of mean glandular dose from phantom and patient data in mammography. Radiat. Prot. Dosim. 2015, 164, 342–353.
8. Dance, D.R.; Sechopoulos, I. Dosimetry in x-ray-based breast imaging. Phys. Med. Biol. 2016, 61, R271–R304.
9. Salomon, E.; Homolka, P.; Semturs, F.; Figl, M.; Gruber, M.; Hummel, J. Comparison of personalized breast dosimetry with standard dosimetry protocols. Sci. Rep. 2019, 9, 5866.
10. Teoh, K.C.; Abdul Manan, H.; Mohd Norsuddin, N.; Rizuana, I.H. Comparison of mean glandular dose between full-field digital mammography and digital breast tomosynthesis. Healthcare 2021, 9, 1758.
11. Sulaiman, I.I.; Muhammed, S.; Mahadi, A.; Bashier, E.; Farah, A.; Hasan, N.; Ibrahim, M.A.; Ali, M.H.M.; Ahmed, N.A. Average glandular dose (AGD) and radiation dose optimization in screen-film and digital X-ray mammography. Appl. Sci. 2023, 13, 11901.
12. Rodriguez-Molares, A.; Rindal, O.M.H.; D’hooge, J.; Måsøy, S.E.; Austeng, A.; Lediju Bell, M.A.; Torp, H. The generalized contrast-to-noise ratio: A formal definition for lesion detectability. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2019, 67, 745–759.
13. Law, J. Improved image quality for dense breasts in mammography. Br. J. Radiol. 1992, 65, 50–55.
14. Destounis, S.V.; Esserman, L.; Albarracin, C.; Zaniewski, K.; White, E.; Shermer, R. Update on Breast Density, Risk Estimation, and Supplemental Screening. AJR Am. J. Roentgenol. 2020, 215, 592–600.
15. Du, X.; Yu, N.; Zhang, Y.; Wang, J. The relationship of the mean glandular dose with compressed breast thickness in mammography. J. Public Health Emerg. 2017, 1, 32.
16. Dance, D.R.; Young, K.C. Estimation of mean glandular dose for contrast-enhanced digital mammography: Factors for use with the UK, European and IAEA breast dosimetry protocols. Phys. Med. Biol. 2014, 59, 2127–2137.
17. Dance, D.R.; Skinner, C.L.; Young, K.C.; Beckett, J.R.; Kotre, C.J. Additional factors for the estimation of mean glandular breast dose using the United Kingdom mammography dosimetry protocol. Phys. Med. Biol. 2000, 45, 3225–3240.
18. Dance, D.R. Monte Carlo calculation of conversion factors for the estimation of mean glandular breast dose. Phys. Med. Biol. 1990, 35, 1211–1219.
19. Sarno, A.; Dance, D.R.; van Engen, R.E.; Young, K.C.; Russo, P.; Di Lillo, F.; Mettivier, G.; Bliznakova, K.; Fei, B.; Sechopoulos, I. A Monte Carlo model for the evaluation of mean glandular dose in spot compression mammography. Med. Phys. 2017, 44, 3848–3860.
20. Borzì, G.R.; Bonanno, E.; Cavalli, N.; D’Anna, A.; Hızı, M.; Stella, G.; Zirone, L.; Marino, C. Evaluation of a patient dose monitoring system for the estimation of average glandular dose (AGD) in mammography. Appl. Sci. 2025, 15, 3338.
21. Nakamura, N.; Okafuji, Y.; Adachi, S.; Takahashi, K.; Nakakuma, T.; Ueno, S. Effect of different breast densities and average glandular dose on contrast-to-noise ratios in full-field digital mammography: Simulation and phantom study. Radiol. Res. Pract. 2018, 2018, 6192594.
22. Bruschi, G.; Ricciardi, V.; De Marco, P.; Origgi, D. Phantom-based comparative analysis of contrast-enhanced mammography systems: Image quality and performance evaluation. J. Appl. Clin. Med. Phys. 2025, 26, e70163.
23. Yaffe, M.J.; Boone, J.M.; Packard, N.; Alonzo-Proulx, O.; Huang, S.Y.; Peressotti, C.L.; Al-Mayah, A.; Brock, K. The myth of the 50-50 breast. Med. Phys. 2009, 36, 5437–5443.
24. Riabi, H.A.; Mehnati, P.; Mesbahi, A. Evaluation of average glandular dose in a full-field digital mammography unit in Tabriz, Iran. Radiat. Prot. Dosim. 2010, 142, 222–227.
25. Chevalier, M.; Morán, P.; Ten, J.I.; Fernández Soto, J.M.; Cepeda, T.; Vañó, E. Patient dose in digital mammography. Med. Phys. 2004, 31, 2471–2479.
26. Bouzarjomehri, F.; Mostaar, A.; Ghasemi, A.; Ehramposh, M.H.; Khosravi, H. The Study of Mean Glandular Dose in Mammography in Yazd and the Factors Affecting It. Iran. J. Radiol. Autumn. 2006, 4, 29–34.
27. Sarno, A.; Mettivier, G.; Russo, P. Homogeneous and patient-specific breast models for Monte Carlo evaluation of mean glandular dose in mammography. Phys. Med. 2018, 51, 56–63.
28. Bor, D.; Tukel, S.; Olgar, T.; Toklu, T.; Aydın, E.; Akyol, O. Variations in breast doses for an automatic mammography system. Diagn. Interv. Radiol. 2008, 14, 160–164.
29. Dhou, S.; Dalah, E.; AlGhafeer, R.; Hamidu, A.; Obaideen, A. Regression analysis between the different breast dose metrics and mammography parameters. J. Imaging 2022, 8, 211.
30. Massera, R.T.; Tomal, A. Estimation of glandular dose in mammography based on artificial neural networks. Phys. Med. Biol. 2020, 65, 095009.
31. Erguzel, T.T.; Tekin, H.O.; Manici, T.; Altunsoy, E.E.; Tarhan, N. Comparison of multiple linear regression analysis and artificial neural network approaches in the estimation of Monte Carlo mean glandular dose calculations of mammography. Dig. J. Nanomater. Biostruct. 2018, 13, 163–176.
32. Ramnarain, J.; Cartwright, L.; Diffey, J. Trends in patient dose and compression force for digital (DR) mammography systems over an eleven-year period. Phys. Eng. Sci. Med. 2024, 47, 215–222.
33. Wu, X.; Barnes, G.T.; Tucker, D.M. Spectral dependence of glandular tissue dose in screen-film mammography. Radiology 1991, 179, 143–148.
34. Highnam, R.; Brady, S.M.; Yaffe, M.J.; Karssemeijer, N.; Harvey, J. Robust breast composition measurement-Volpara™. In International Workshop on Digital Mammography; Springer: Berlin/Heidelberg, Germany, 2010; pp. 342–349.

Figure 1. (a) Predominantly fatty breast pattern; (b) Scattered fibroglandular breast pattern; (c) Heterogeneously dense breast pattern; (d) Extremely dense breast pattern.

Figure 2. Pearson correlation matrix of input variables used in AGD prediction.

Figure 3. Change in criteria according to iterations in hyperparameter tuning with PSO.

Figure 4. Particle Best Scores (p_best) and global Best (g_best) curves.

Figure 5. Comparison of actual-prediction curves of the models.

Figure 6. Comparison of prediction error distributions of the models.

Figure 7. Actual-estimate scatter diagrams.

Figure 8. Beeswarm graph.

Figure 9. The Relationship Between Breast Thickness, Parenchymal Pattern and Average Glandular Dose.

Table 1. Local Sensitivity Analysis of AGD Model Parameters.

Parameter	Elasticity (Si)	±5% Change in AGD (%)	±10% Change in AGD (%)
k_air (Incident Air Kerma)	1.028	5.14	10.28
Breast Thickness (mm)	−0.520	2.60	5.20
HVL (Beam Quality)	−1.838	9.19	18.38
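The percentage columns of Table 1 follow from the elasticities by a first-order approximation, in which the relative change in AGD is roughly the elasticity multiplied by the relative change in the input. The sketch below reproduces the table from the listed elasticities; the dictionary keys and the helper name `first_order_change` are illustrative, and the linear approximation is an assumption that holds only for small perturbations.

```python
# Elasticities S_i as listed in Table 1
elasticity = {"k_air": 1.028, "thickness_mm": -0.520, "hvl": -1.838}

def first_order_change(s_i, input_change_pct):
    """First-order magnitude of the AGD change (%) for a given input change (%)."""
    return abs(s_i) * input_change_pct

# (change for ±5% input, change for ±10% input) per parameter
table = {
    name: (round(first_order_change(s, 5), 2), round(first_order_change(s, 10), 2))
    for name, s in elasticity.items()
}
```

The sign of each elasticity gives the direction of the effect, which the magnitude columns of the table do not show: a negative value means AGD falls as the parameter rises.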

Table 2. Performance metric values and parameter settings of all models.

Model	MSE	RMSE	MAE	MAPE (%)	Accuracy (%)	R²	Best Params/Notes
AdaBoost	0.0010	0.0316	0.0144	7.4477	92.5522	0.8722	max_depth = 10, learning_rate = 1.0, n_estimators = 100
AdaBoost + PSO	0.0003	0.0173	0.0107	3.9200	96.0800	0.9533	n_estimators = 114, learning_rate ≈ 0.59036, tree_max_depth = 10
CatBoost	0.0004	0.0211	0.0108	5.6757	94.3242	0.9430	depth = 3, iterations = 200, l2_leaf_reg = 1, learning_rate = 0.1
CatBoost + PSO	0.0001	0.0100	0.0052	1.7400	98.2600	0.9846	iterations = 299, depth = 4, learning_rate ≈ 0.25245
ExtraTrees + PSO	0.0004	0.0200	0.0102	3.5800	96.4200	0.9285	n_estimators = 108, max_depth = 11, min_split = 3, min_leaf = 1
ExtraTrees	0.0008	0.0289	0.0128	6.8020	93.1979	0.8935	n_estimators = 108, max_depth = 11, min_samples_split = 3, min_samples_leaf = 1
Gradient Boosting	0.0004	0.0222	0.0111	5.9984	94.0015	0.9367	learning_rate = 0.1, max_depth = 3, min_samples_leaf = 1, min_samples_split = 2, n_estimators = 200
Gradient Boosting + PSO	0.0001	0.0100	0.0078	2.9000	97.1000	0.9787	n_estimators = 150, learning_rate ≈ 0.17104, max_depth = 3, min_leaf = 4
Random Forest + PSO	0.0003	0.0173	0.0099	3.7200	96.2800	0.9521	n_estimators = 137, max_depth = 20, min_split = 2
Random Forest	0.0010	0.0322	0.0147	7.5266	92.4733	0.8678	n_estimators = 200, max_depth = 10, min_leaf = 1, min_split = 2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ünal, S.; Gürfidan, R. PSO-Based Ensemble Learning Enhanced with Explainable Artificial Intelligence for Breast Glandular Dose Estimation in Mammography. Appl. Sci. 2026, 16, 2514. https://doi.org/10.3390/app16052514

