Article

Bayesian Integration of Bronchoalveolar Lavage miRNAs and KL-6 in Progressive Pulmonary Fibrosis Diagnosis

by Piera Soccio 1,*, Valerio Longo 2, Corrado Mencar 2, Pasquale Tondo 1,3, Fabiola Murgolo 1, Giulia Scioscia 1,3 and Donato Lacedonia 1,3
1 Department of Medical and Surgical Sciences, University of Foggia, Viale Luigi Pinto, 1, 71122 Foggia, Italy
2 Department of Computer Science, University of Bari Aldo Moro, 70125 Bari, Italy
3 Institute of Respiratory Diseases, Policlinico Riuniti of Foggia, 71122 Foggia, Italy
* Author to whom correspondence should be addressed.
Diagnostics 2025, 15(10), 1257; https://doi.org/10.3390/diagnostics15101257
Submission received: 2 April 2025 / Revised: 3 May 2025 / Accepted: 7 May 2025 / Published: 15 May 2025
(This article belongs to the Section Pathology and Molecular Diagnostics)

Abstract

Background/Objectives: Progressive pulmonary fibrosis (PPF) is one of the most severe and complex challenges in respiratory medicine. It is characterized by a rapid decline in lung function and often a poor prognosis, making it a priority in research on interstitial lung diseases (ILDs). The aim of this study is to correlate classical clinical features and three genetic biomarkers with the diagnosis and prognosis of PPF in ILDs. Methods: This study involved 19 patients with PPF and 20 patients with non-progressive pulmonary fibrosis (nPPF) from the S.C. of Respiratory System Diseases at the Policlinico of Foggia (Italy) between 2015 and 2022. All participants underwent pulmonary function tests (PFTs), a 6 min walk test (6MWT), and bronchoalveolar lavage (BAL) sampling after giving written consent for these procedures. Bayesian analysis with generalized linear models was applied for both diagnostic and prognostic classification. Results: The proposed Bayesian model enables the estimation of the contribution of each considered feature and the quantification of the uncertainty that results from the small size of the dataset. The miRNAs miR-21 and miR-92a, alongside the protein biomarker KL-6, were identified as significant indicators for PPF diagnosis, enhancing both the sensitivity and specificity of predictions. Conclusions: The identification of specific genetic markers such as microRNAs and their integration with traditional clinical characteristics can significantly enhance the management of patients with the disease. This multidimensional approach, which integrates clinical data with omics data, could enable more precise identification and monitoring of the disease and potentially optimize future treatments through larger studies and extended follow-ups.

1. Introduction

Pulmonary fibrosis is a chronic disease that results from the excessive accumulation of fibrotic tissue in the lungs, causing thickening and stiffness that compromise lung function and lead to symptoms such as dyspnea and reduced breathing capacity [1,2].
There are two primary forms of lung fibrosis: progressive and non-progressive. Progressive pulmonary fibrosis (PPF) is characterized by a continuous and irreversible decline in lung function despite treatment. Disease progression is defined by a decline in forced vital capacity (FVC), worsening symptoms, and an increased need for oxygen therapy [3]. Idiopathic pulmonary fibrosis (IPF) is one of the most severe and well-known forms of PPF, but progressive fibrosis can also occur in other interstitial lung diseases (ILDs). IPF is a chronic fibrotic lung disease of unknown origin that predominantly affects middle-aged and elderly adults. It is characterized by progressive fibrosis of lung tissue, leading to irreversible deterioration of lung function. While the clinical course of IPF is variable, it is generally associated with a poor prognosis, with a median survival ranging from 3 to 5 years from diagnosis [3]. Key symptoms include dyspnea (shortness of breath), dry and persistent cough, fatigue, and weight loss.
In addition to IPF, several ILDs can develop a progression that is characterized by irreversible fibrosis and impaired respiratory function. These include non-specific interstitial pneumonia (NSIP), which can exhibit a progressive fibrotic course, although it typically presents a more uniform fibrotic pattern compared to IPF. NSIP may be idiopathic or secondary to autoimmune diseases and can have a more favorable prognosis than other forms of progressive pulmonary fibrosis [4]. Chronic hypersensitivity pneumonitis, caused by prolonged environmental exposure, can evolve into progressive fibrosis with a clinical presentation similar to that of IPF [3]. Connective tissue disease associated with interstitial lung disease (CT-ILD), which may occur in patients with autoimmune diseases, shows an unpredictable progression and a significant effect on prognosis [5]. Fibrotic pneumoconiosis, such as silicosis and asbestosis, caused by chronic inhalation of toxic mineral dusts, results in progressive pulmonary fibrosis and impaired respiratory function [6]. Finally, fibrotic sarcoidosis, a late-stage complication of sarcoidosis, is characterized by disease-typical granulomas progressing to diffuse fibrosis, with effects on lung function similar to those found in other fibrotic ILDs [7].
The diagnosis of progressive fibrotic ILDs is based on a combination of clinical history, imaging (particularly high-resolution computed tomography, HRCT), lung function tests, and, in some cases, lung biopsy. Identifying fibrotic progression early is crucial for enhancing treatment and the clinical management of patients.
In contrast, non-progressive pulmonary fibrosis (nPPF) is characterized by the stabilization of fibrotic tissue in the lungs, without further deterioration over time. In this form, the scarring process may stop or slow significantly, allowing the patient to maintain relatively stable lung function over time. Although this condition can also occur in IPF, it is more common in other forms of pulmonary fibrosis [3].
Despite advances in medical research, there are no specific laboratory tests for the early detection of progressive fibrosis. The lack of diagnostic tools hinders the ability to accurately predict the activity of the disease and its future course. A better understanding of the underlying molecular mechanisms is therefore crucial for the development of targeted treatments and improved patient outcomes. Krebs von den Lungen 6 (KL-6) is a mucus-forming glycoprotein with a high molecular weight that is expressed on the surface membrane of alveolar epithelial cells (AEC-II) and bronchiolar epithelial cells. Recent studies have shown that elevated serum levels of KL-6 act as a prognostic indicator, correlating with decreased survival rates [8,9]. Additionally, KL-6 has been identified as a predictor of exacerbations in IPF [10], and it shows promise as a biomarker for monitoring the response to antifibrotic therapy [11]. Furthermore, KL-6 has been identified as a promising biomarker to assess the severity of fibrotic processes and disease progression in ILD, with significantly higher levels observed in patients with PPF compared to those with nPPF. The significant difference in KL-6 levels between serum and bronchoalveolar lavage fluid (BAL) highlights its potential diagnostic and prognostic value, providing information on both local and systemic aspects of ILD [12].
MicroRNAs (miRNAs), a class of small noncoding RNAs that play a key role in the post-transcriptional regulation of gene expression, also play a fundamental role in pulmonary fibrosis by modulating gene expression involved in key pathological processes such as epithelial–mesenchymal transition (EMT) and extracellular matrix deposition [13,14]. Dysfunctions in these miRNAs can alter the balance of the inflammatory response and collagen production, contributing to the development and progression of pulmonary fibrosis [15].
MiRNAs can be transported by nanometric extracellular vesicles known as exosomes. Exosomes are essential for intercellular communication and can influence physiological and pathological processes. Many molecular components of exosomes have also been associated with various diseases, suggesting a role as diagnostic, prognostic, and therapeutic monitoring tools [16,17,18]. Recent studies on exosomal miRNAs, derived from both the serum and BAL of patients with IPF, highlight their role in the pathogenesis of the disease and their potential as progression biomarkers and as targets for the development of new drugs [19,20].
Despite advancements in imaging diagnostics and functional assessment, the early identification of fibrotic progression remains a clinical challenge. In this context, the combination of clinical and genetic biomarkers, such as KL-6, miR-21, and miR-92a, represents a promising opportunity to enhance early diagnosis, risk stratification, and therapeutic decision-making in patients with fibrotic ILDs. KL-6 reflects the damage and regenerative processes of the alveolar epithelium [8,9,10,11], while miR-21 is known for its pro-fibrotic role, as it promotes epithelial–mesenchymal transition (EMT) and extracellular matrix deposition [13,14,15]. In contrast, miR-92a plays an anti-fibrotic role, modulating angiogenesis and matrix remodeling [19,20]. The interaction between miR-21 and miR-92a may play a key role in balancing the pro- and anti-fibrotic processes, furthering our understanding of disease progression and refining the potential diagnostic and therapeutic approaches. Integrating these biomarkers into a single analytical model could improve diagnostic accuracy and allow for a more precise characterization of fibrotic progression.
In the diagnosis of pulmonary fibrosis, sample sizes vary depending on the research approach. Studies using CT analysis for drug trials have used cohorts of hundreds of patients to validate predictive measures of disease severity [21]. In contrast, deep learning detection methods have employed datasets with thousands of samples to ensure the accuracy of automated classification systems [22].
Meanwhile, exploratory studies on PET/CT imaging have operated with smaller patient groups to evaluate new imaging modalities [23]. These variations in sample sizes reflect the diverse methodologies and objectives in pulmonary fibrosis diagnostic research.
The cost of a study grows with the size of the population involved. This problem is further exacerbated when studies involve complex procedures such as biomarker measurements. For example, in a recent meta-analysis of the prognostic role of KL-6 measurement in IPF, the reported studies involved 56 to 267 cases, with the latter obtained from multiple centers [24].
It is well known that small sample sizes can limit the significance of statistical studies [25]. Bayesian analysis offers a robust solution for addressing challenges posed by small sample sizes in medical studies [26,27,28,29].
It allows for the incorporation of prior information, enhancing the reliability and interpretability of results. By utilizing Bayesian methods, researchers can effectively quantify the uncertainty associated with results from small datasets, helping to make decisions for future research directions. Moreover, Bayesian approaches facilitate the combination of all available information throughout the clinical trial process, from design to analysis. This is particularly beneficial in studies of rare diseases where conventional power requirements are challenging to meet due to inherently small sample sizes and heterogeneity. This comprehensive framework enables researchers to make informed decisions, improve study designs, and improve the accuracy of treatment effect assessments in medical research.
There are several studies that have used Bayesian analysis for the diagnosis of pulmonary fibrosis. Meltzer et al. [30,31] developed a model to diagnose IPF by applying Bayesian probit regression modeling to gene expression profiles of lung tissue. Cottin et al. [32] identified factors that influence the clinical likelihood of a diagnosis of IPF and used a Bayesian framework to integrate radiological and histopathological features to estimate the post-test probability of diagnosis. Another study also developed a model and app using Bayesian additive regression tree to assist radiologists in diagnosing the pulmonary fibrosis pattern [33]. Additionally, Bayesian network analysis has been used to integrate the extracellular vesicle proteome and clinical information for the diagnosis of IPF [34]. Other studies use Bayesian methods to evaluate drug therapies for the treatment of IPF [35,36,37,38].
Given the limited sample size (39 patients), this study should be considered exploratory in nature. While Bayesian methods are particularly suitable for modeling with small datasets, we acknowledge the potential for overfitting even within this framework. To mitigate this risk, we employed a model selection strategy based on the Expected Log Pointwise Predictive Density using Leave-One-Out Cross-Validation (ELPD-LOO), which helps assess the generalization performance of the models. Nonetheless, the findings require validation in larger, independent cohorts before clinical translation.
To the best of our knowledge, while research into pulmonary fibrosis has increasingly focused on the role of miRNAs as potential biomarkers and therapeutic targets, the application of Bayesian analysis specifically to miRNA biomarkers in this context appears to be limited. In this paper, we present the results of a Bayesian analysis conducted on a small cohort of 39 patients with lung diseases. The aim of this study was to investigate the relationship between classical clinical parameters and three genetic biomarkers in the diagnosis and prognosis of progressive pulmonary fibrosis.

2. Materials and Methods

2.1. Population

This study involved 19 patients with PPF and 20 patients with nPPF from the S.C. of Respiratory System Diseases at the Policlinico of Foggia (Italy) between 2015 and 2022. The classification of patients followed the criteria established by Hambly et al. [39] to define the progression of the disease. Written informed consent was obtained from all participants, and ethical approval was granted by the Ethics Committee of Policlinico of Foggia, adhering to the principles of the Declaration of Helsinki. Ethical considerations precluded the inclusion of healthy donors. Diagnoses of various diffuse interstitial lung diseases (ILDs) were made according to the ATS/ERS/JRS/ALAT guidelines [3]. The PPF group included 19 cases: 10 cases of idiopathic pulmonary fibrosis (IPF), 4 cases of hypersensitivity pneumonitis (HP), 2 cases of non-specific interstitial pneumonia (NSIP), and 3 cases related to rheumatologic diseases. In contrast, the nPPF group consisted of 20 cases: 11 patients with slow-progressing IPF, 2 with HP, 3 with NSIP, and 4 with other ILDs. All participants underwent pulmonary function tests (PFTs), a 6 min walk test (6MWT), and bronchoscopy with bronchoalveolar lavage (BAL) sampling, after giving written consent for these procedures. BAL samples from all patients were collected, aliquoted, and stored at −80 °C until analysis.

2.2. Bronchoalveolar Lavage: Procedure and Diagnostic Utility

Bronchoalveolar lavage (BAL) was performed for diagnostic purposes in all patients according to the guidelines of the BAL Working Group of the European Respiratory Society [40,41]. In summary, BAL is an auxiliary procedure used in fibrobronchoscopy to assess large alveolar compartments, providing both cellular and non-cellular constituents of the lower respiratory tract. During this procedure, the bronchoscope introduces 3–5 doses of 20–50 mL of saline solution into the peripheral lung, followed by the recovery of at least 30% of the volume by gentle suction to prevent damage to or collapse of the distal airways [42]. The lavage fluid was divided into aliquots, where a portion was used for studies of lung cellularity, while the rest was stored at −80 °C. The experimental procedure was carried out in accordance with the following stages [43,44,45]:
  • Exosome purification from BAL;
  • Western blotting Multiplex and Surface Marker Analysis;
  • Isolation of RNA;
  • RNA reverse transcription and expression of miRNA through q-PCR.
A detailed description of each stage is reported in the Supplementary Materials.

2.3. Dataset

The dataset used for analysis consists of 39 samples with 11 numerical features and 4 nominal features, including 2 targets (diagnosis and prognosis), which are well-balanced (Figure 1).
Feature statistics are reported in Table 1, where 23 missing values are reported.
Figure 2 reports the box plots of the numerical features conditioned on diagnosis and prognosis.
Finally, Figure 3 shows the distributions of gender and smoking status. As expected, male patients are more common than females [46]; on the other hand, there is no significant difference between smokers and non-smokers.

2.4. Workflow

2.4.1. Data Preparation

  • Missing values: no imputation methods have been considered adequate to estimate the missing values; therefore, the rows containing missing values were removed from the dataset when a processing method could not be applied with missing data.
  • Outliers: potential outliers detected from boxplots were discussed with the domain experts, and their inclusion in the dataset was eventually confirmed.
  • Normalization: we opted for the z-score normalization of all numerical features, so that each feature would be centered around the mean and would have a unitary standard deviation.
$$z = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}$$
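As an illustration of the normalization step, the following minimal Python sketch standardizes one numerical feature. The sample values are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not state which variant was used.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a list of numbers: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - m) / s for v in values]

# Hypothetical measurements of one numerical feature (illustration only).
feature = [62.0, 75.0, 81.0, 90.0, 102.0]
z = z_score(feature)
```

After this transformation the feature has zero mean and unit standard deviation, so coefficients of different features become comparable in scale.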

2.4.2. Feature Augmentation

Since our main interest is in understanding the possible correlation between genetic features and diagnosis or prognosis, additional features have been calculated to favor the discovery of these relationships. In particular, pairwise products have been considered to highlight the correlation among genetic features. In fact, the product of feature A and B (normalized with z-score) is positive and high if both features are high or both are low, while the product is negative if they are discordant. The pairwise product of genetic features led to the introduction of six derived features in the dataset.
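The pairwise-product augmentation can be sketched in a few lines of Python. The feature names `g1` to `g4` and their values are hypothetical placeholders; four inputs are used only because four features yield the six pairwise products mentioned above.

```python
from itertools import combinations

def pairwise_products(columns):
    """Build one derived feature per unordered pair of z-scored features.

    columns: dict mapping a feature name to a list of z-scored values.
    """
    derived = {}
    for a, b in combinations(columns, 2):
        derived[f"{a} x {b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return derived

# Hypothetical z-scored features for three patients (illustration only).
columns = {
    "g1": [1.0, -1.0, 0.5],
    "g2": [2.0, -0.5, -1.0],
    "g3": [0.0, 1.5, -1.5],
    "g4": [-1.0, 0.0, 1.0],
}
derived = pairwise_products(columns)
```

For the first two patients `g1` and `g2` are concordant (both high, then both low), so the product is positive; for the third they are discordant and the product is negative.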

2.4.3. Feature Ranking and Selection

The number of features is high compared to the low cardinality of the samples. To reduce the risk of overfitting, feature selection was performed using the Wrapper method, which correlates the selected features with the selected predictive model [47]. The Wrapper method requires the iterative generation of a subset of candidate features: this is accomplished by first ranking features and then providing the selection procedure with an incremental list of features according to the rank. The following procedure has been carried out:
  • Feature ranking: the adopted method is based on three stages.
    • Random Forest feature ranking was applied [48]: features were sorted by their importance, which was estimated by Gini impurity.
    • Since we worked with linear models, we needed to avoid the presence of highly correlated features. To this end, a Pearson correlation matrix among features was computed, and features with an absolute correlation coefficient ≥0.5 (generally interpreted as moderate to high correlation) with higher ranked features were removed.
    • Finally, gender was removed because it introduced label-leakage bias in classification. Also, the DLCO/VA feature was discarded due to the high number of missing values.
  • The choice of the final feature subset was made according to the Wrapper method by using the ranked list of features obtained in the previous stage. Starting from the most important feature, several models were induced by incrementally adding new features according to the rank. These models were then evaluated according to ELPD-LOO (described in Section 2.4.5), and the model showing the highest value of this measure (with the corresponding feature subset) was selected.
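The correlation filter and the incremental selection over the ranked list can be sketched as follows. This is a simplified illustration, not the authors' code: the ranking is taken as given, each feature is compared only with the higher ranked features that were retained (one possible reading of the procedure), and `score` is a hypothetical stand-in for the ELPD-LOO of a fitted model.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_correlated(ranked, data, threshold=0.5):
    """Keep a feature only if |r| < threshold with every retained, higher ranked feature."""
    kept = []
    for name in ranked:
        if all(abs(pearson(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def best_prefix(ranked, score):
    """Wrapper step: score each incremental prefix of the ranking, return the best one."""
    prefixes = [ranked[:k] for k in range(1, len(ranked) + 1)]
    return max(prefixes, key=score)

# Hypothetical data: "b" is almost a multiple of "a", "c" is uncorrelated with "a".
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],
}
kept = drop_correlated(["a", "b", "c"], data)

# Toy score that prefers two-feature models; a real score would be ELPD-LOO.
selected = best_prefix(kept, score=lambda features: -abs(len(features) - 2))
```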

2.4.4. Bayesian Model

Both targets (diagnosis and prognosis) are binary; therefore, the objective of the model is to estimate the probability of each class, given the selected input features. More specifically, the Bayesian model assumes a Bernoulli distribution of the target T (T being diagnosis or prognosis)
$$T \mid x \sim \mathrm{Bernoulli}\left(p_T(x)\right)$$
where x is the feature vector, and the probability $p_T$ is defined by a generalized linear model:
$$p_T(x) = \left(1 + \exp\left(-\left(\beta_{0,T} + \beta_T \cdot x\right)\right)\right)^{-1}$$
The proposed Bayesian model enables the estimation of the contribution of each feature considered in a transparent way. It is therefore more understandable than opaque techniques, such as deep learning models [49].
The Bayesian model requires the specification of the prior distributions of the linear coefficients. In this work, the Student's t-distribution has been used to specify priors because its tails are heavier than those of the standard normal distribution.
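The output of the Bayesian model is the posterior distribution of the coefficients $\beta_{0,T}$ and $\beta_T$ as defined by Bayes’ rule
$$\Pr\left(\beta_{0,T}, \beta_T \mid x_{obs}, T_{obs}\right) \propto \Pr\left(T_{obs} \mid \beta_{0,T}, \beta_T, x_{obs}\right)\,\Pr\left(\beta_{0,T}, \beta_T\right)$$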
which quantifies the uncertainty of the coefficients due to the scarcity of observed data.
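To make the structure of the model concrete, the sketch below evaluates the unnormalized log-posterior of a logistic model with independent Student's t priors on the coefficients. The degrees of freedom (`nu=3.0`) and the scale (`scale=2.5`) are assumptions chosen for illustration, since the text does not report the prior hyperparameters, and the data are hypothetical. In practice a sampler would draw from this posterior rather than evaluate it at single points.

```python
from math import exp, lgamma, log, log1p, pi

def predict_proba(beta0, beta, x):
    """Logistic link: probability of the positive class for feature vector x."""
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x))
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    e = exp(eta)  # numerically stable branch for negative eta
    return e / (1.0 + e)

def student_t_logpdf(v, nu=3.0, scale=2.5):
    """Log-density of a zero-centered Student's t prior (nu and scale are assumed values)."""
    z = v / scale
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi) - log(scale)
            - (nu + 1) / 2 * log1p(z * z / nu))

def log_posterior(beta0, beta, X, t):
    """Unnormalized log-posterior: log-prior plus Bernoulli log-likelihood."""
    lp = student_t_logpdf(beta0) + sum(student_t_logpdf(b) for b in beta)
    for x, ti in zip(X, t):
        p = predict_proba(beta0, beta, x)
        lp += log(p) if ti == 1 else log(1.0 - p)
    return lp

# Hypothetical z-scored data: one feature, two patients.
X = [[-1.0], [1.0]]
t = [0, 1]
agree = log_posterior(0.0, [1.0], X, t)      # coefficient sign matches the data
disagree = log_posterior(0.0, [-1.0], X, t)  # coefficient sign contradicts the data
```

Because the prior is symmetric around zero, the two evaluations differ only through the likelihood, which favors the coefficient whose sign agrees with the observations.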

2.4.5. Expected Log Pointwise Predictive Density (ELPD-LOO)

The generalization ability of the Bayesian model (i.e., the ability to predict the target of newly observed data) can be estimated through Leave-One-Out (LOO) cross-validation, a general technique that predicts the target of each data point in the dataset by training a model without that data point. In the Bayesian context, the output of a model is a probability distribution, whose predictive ability is evaluated by scoring the log-probability of the observed outcome. The Expected Log Pointwise Predictive Density (ELPD) LOO is a scoring function suitable for evaluating the generalization ability of a Bayesian model [50].
ELPD-LOO can be used to score a Bayesian model with a specified subset of features within a feature selection process based on the Wrapper method. The model with the best ELPD-LOO is retained as the model with the highest predictive quality.
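The score can be illustrated with a short Python sketch that assumes exact leave-one-out refits: for each sample, posterior draws are obtained from a model trained without that sample, and the log-likelihood of the held-out outcome is recorded for every draw. The input layout and the toy numbers are assumptions for illustration. In practice the refits are usually replaced by a Pareto-smoothed importance sampling approximation, and the text does not state which variant was used here.

```python
from math import exp, log

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values) / len(values))

def elpd_loo(loo_log_lik):
    """Sum over samples of the log predictive density of the held-out outcome.

    loo_log_lik[i][s] is log p(t_i | x_i, theta_s), where theta_s is a draw from
    the posterior fitted WITHOUT sample i (assumed to be computed elsewhere).
    """
    return sum(log_mean_exp(row) for row in loo_log_lik)

# Toy input: two samples, three posterior draws each (illustration only).
uninformative = [[log(0.5)] * 3, [log(0.5)] * 3]
informative = [[log(0.9), log(0.8), log(0.85)], [log(0.7), log(0.75), log(0.8)]]
score_uninformative = elpd_loo(uninformative)
score_informative = elpd_loo(informative)
```

Higher (less negative) values indicate better out-of-sample prediction, so the informative toy model scores above the coin-flip baseline of n·log(0.5).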

3. Results

3.1. Feature Ranking and Selection

To keep the analysis manageable, and to focus on correlations between features of different types, we divided the features into three groups (Table 2) and considered the two highest ranked features for each group. Linear correlation among these features is negligible; therefore, all the resulting six features were retained for feature selection, according to their rank. Table 3 reports the list of features considered for each clinical problem.
For each of the two clinical problems, all the considered features were evaluated to further refine the selection. We built a Bayesian model for each individual feature and for each pair (resulting in 21 different models). If the posterior distribution of a coefficient assigned a high probability to values close to zero, either for the single feature or for the pair, the corresponding feature was removed because its contribution could be irrelevant or ambiguous (i.e., in some models, the contribution of the feature to the output is positive; in others, it is negative).
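The count of 21 models follows from six single-feature models plus the 15 unordered pairs of six features. A brief Python check, with placeholder feature names `f1` to `f6`:

```python
from itertools import combinations

# Placeholder names for the six retained features (illustration only).
features = ["f1", "f2", "f3", "f4", "f5", "f6"]

single_models = [(f,) for f in features]
pair_models = list(combinations(features, 2))
candidate_models = single_models + pair_models
```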
Table 4 reports the list of features resulting from the workflow.
For each subset of features in each group, we quantified the performance of the model using ELPD-LOO, which is reported in Figure 4.
Regarding the diagnosis of PPF, the figure highlights the combination of features LIN + miR-21 + miR-21 × miR-92a as the most effective, while the exclusion of miR-21 leads to a small reduction in average ELPD. On the other hand, for prognosis estimation, the most effective models are based on AGE + DLCO, with or without KL6, which were both considered in the subsequent analysis.

3.2. Performance Evaluation

For each model in the ranking, and for both clinical problems, we computed the expected accuracy, sensitivity, and specificity (expected value computation is required because each model is defined by a probability distribution of the parameters). In Table 5, the performance evaluation of models for diagnosis is reported. The table shows that miRNA features, together with the LIN feature, significantly improve the expected quality of classification, in terms of accuracy, sensitivity, and specificity. In particular, the highest values of accuracy and specificity are obtained when the interaction between miR-21 and miR-92a is considered.
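One way to obtain expected metrics, sketched below under stated assumptions, is to compute accuracy, sensitivity, and specificity once per posterior draw and then summarize them by mean and standard deviation. The input layout (`proba_draws[s][i]`), the 0.5 cutoff applied per draw, and the toy values are assumptions; the text does not detail the exact computation.

```python
from statistics import mean, pstdev

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (both classes must occur)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

def expected_metrics(y_true, proba_draws, cutoff=0.5):
    """Mean and standard deviation of each metric across posterior draws.

    proba_draws[s][i] is the predicted probability for sample i under draw s.
    """
    per_draw = [
        confusion_metrics(y_true, [1 if p > cutoff else 0 for p in draw])
        for draw in proba_draws
    ]
    names = ("accuracy", "sensitivity", "specificity")
    return {n: (mean(col), pstdev(col)) for n, col in zip(names, zip(*per_draw))}

# Toy example: four patients, two posterior draws (illustration only).
y_true = [1, 1, 0, 0]
proba_draws = [
    [0.9, 0.6, 0.2, 0.4],  # all four classified correctly
    [0.9, 0.4, 0.2, 0.6],  # one false negative, one false positive
]
summary = expected_metrics(y_true, proba_draws)
```

The standard deviation across draws is the quantity that reflects how strongly parameter uncertainty propagates into the reported performance.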
In Table 6, the performance results are reported for prognosis prediction. In this case, the presence of KL6 improves both accuracy and specificity when age and DLCO are also used for prediction, while sensitivity is improved when age is not considered.
In both tables, we also observe that the prediction uncertainty (as represented by the standard deviation) of the best models is small when compared with the other models, yet high in absolute terms (for example, the standard deviation of specificity of the best model for prognosis classification is 14.4% as reported in Table 6). This assessment of uncertainty is an objective indicator that more data are needed to obtain more stable results.

3.3. Uncertainty Evaluation

For both diagnosis and prognosis prediction, we reported the posterior probability of the class of each sample in a confusion diagram, as reported in Figure 5 and Figure 6.
In each class diagram, the x-axis shows the observed classes of PPF vs. others in diagnosis, SLOW vs. RAPID in prognosis; on the y-axis, the posterior probabilities of each data sample have been reported. As posterior probabilities are defined as probability distributions, they are synthetically represented as circles centered on the distribution mean, and radius proportional to the standard deviation. If the posterior probability is greater than 0.5, the predicted label is PPF diagnosis and RAPID for prognosis.
By recording the posterior probability of the target class, it is possible to appreciate the uncertainty in the decision: when this probability is close to the cutoff point of 0.5, classification should not be made, because the available data do not provide enough information to make a clear decision. Concerning PPF diagnosis, Figure 5 shows an increase in uncertainty in the classification of PPF when the compound feature miR-21 × miR-92a is not considered (right diagram). This can be observed in the concentration of the class probability distributions of PPF around the cutoff point of 0.5 when this feature is removed from the model. On the other hand, we observe a reduction in uncertainty for the opposite class when this feature is removed. Regarding the prognosis, Figure 6 shows an increase in uncertainty about the prediction of class probabilities when KL6 is used in the model (this is observable in the larger areas of the circles on the right diagram). However, many predictions are away from the cutoff point, thus making the classification less uncertain. We also observe the presence of predictions around the 0.5 cutoff for both the SLOW and RAPID classes, which may be due to the fuzzy nature of their definition, which does not admit a sharp separation.
The presence of circles with large radii is an indication that, for some samples, the estimation of the class probability is highly uncertain. In some cases, the circles intersect the class cutoff, suggesting that no decisions should be made on these samples. On the other hand, there are samples with large radii that are far from the cutoff; in such a case, the class prediction is certain, assuming the hypothesis of the model. The reduction in the uncertainty can be achieved by increasing the number of samples. Another option to reduce the uncertainty is to use more flexible (i.e., non-linear) models, which, however, may be less interpretable than linear models, and would require validation processes (for example, k-fold cross-validation) to avoid overfitting that, in turn, requires more data.
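The decision rule suggested by the diagrams can be sketched as a classifier with an abstain option: a sample is left undecided when the interval around its mean posterior probability crosses the cutoff. The width factor `k` is an assumption, since the text only says that the radius is proportional to the standard deviation, and the draws are hypothetical.

```python
from statistics import mean, pstdev

def decide(prob_draws, cutoff=0.5, k=1.0):
    """Classify one sample from posterior draws of its class probability.

    Returns "undecided" when the interval mean +/- k*std contains the cutoff.
    k is an assumed proportionality factor for the circle radius.
    """
    m = mean(prob_draws)
    s = pstdev(prob_draws)
    if m - k * s <= cutoff <= m + k * s:
        return "undecided"
    return "positive" if m > cutoff else "negative"

# Hypothetical posterior draws for three samples (illustration only).
clear_positive = decide([0.90, 0.85, 0.95])
borderline = decide([0.45, 0.55, 0.60])
wide_but_far = decide([0.70, 0.95, 0.99])  # large spread, yet far from the cutoff
```

The third case mirrors the remark above: a sample can have a large radius and still receive a clear class label when the whole circle lies on one side of the cutoff.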

3.4. Interpretation of the Linear Coefficients

In Figure 7 the linear coefficients of the logistic function (2) are reported for the diagnosis of PPF, with and without miR-21. The coefficients are described through posterior probability distributions.
None of the coefficients shown includes 0 with significant probability (the intercept is not considered here). For both models and all features, the probability that a coefficient is in the interval [−0.05, +0.05] is less than 5%. This is an indication that all the coefficients contribute to the definition of class probability in a coherent way. Specifically, higher values of LIN determine a decrease in the probability of PPF, while the co-occurrence of miR-21 and miR-92a (i.e., either they are both low or they are both high) increases the probability of PPF. An increase in miR-21 (in the model that uses it) determines a decrease in the probability of PPF, which is partially compensated by the feature miR-21 × miR-92a.
In Figure 8, the linear coefficients of the logistic function are reported for the prognosis, with and without KL6. Both age and DLCO are negatively correlated with class probability (class is RAPID prognosis; therefore, the higher the age or DLCO, the higher the tendency to classify the subject with a SLOW prognosis).
On the other hand, KL6 is positively correlated (the higher the value, the higher the probability of RAPID prognosis). For both models and all features, the probability that a coefficient is in the interval [−0.05, +0.05] is less than 5.5%.
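The probability that a coefficient lies near zero can be estimated from posterior draws as the fraction of draws falling in the interval [−0.05, +0.05]. The sketch below uses hypothetical draws; real draws would come from the fitted model.

```python
def prob_in_interval(draws, low=-0.05, high=0.05):
    """Fraction of posterior draws inside [low, high]."""
    return sum(1 for d in draws if low <= d <= high) / len(draws)

def prob_positive(draws):
    """Fraction of posterior draws with a positive sign."""
    return sum(1 for d in draws if d > 0) / len(draws)

# Hypothetical posterior draws of one coefficient (illustration only):
# one draw near zero, nineteen clearly positive.
draws = [0.02] + [1.0 + 0.1 * i for i in range(19)]
near_zero = prob_in_interval(draws)
sign_consistency = prob_positive(draws)
```

A small `near_zero` together with a `sign_consistency` close to 0 or 1 corresponds to the situation described above, where a coefficient is unlikely to be null and its direction of effect is coherent.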

4. Discussion

The experimental results show that a Bayesian approach to medical diagnosis could provide interpretable information on the uncertainty of predictive models, which is particularly apparent in the case of small datasets. The Bayesian approach adopted in this study explicitly quantifies the uncertainty associated with parameter estimates, which is particularly useful in studies with limited sample sizes. Unlike traditional frequentist methods, Bayesian inference provides probability distributions for each coefficient, allowing for a more transparent assessment of each variable’s contribution to classification. This is especially relevant in exploratory studies on rare diseases or conditions with high individual variability, such as PPF.
In all cases reported in this paper, we observe bell-shaped distributions, which indicate the presence of a compact region of the parameter space where the true (yet unknown) values reside, if the modeling assumptions are correct. However, it must be noted that data are normalized (according to z-score); therefore, the interpretation of the coefficients requires some care. As an example, the posterior distribution of the KL6 coefficient in prognosis shows that its true value is roughly between 0 and 3; this is the (uncertain) contribution of KL6 to the linear predictor when its raw (non-normalized) value is around its mean plus one standard deviation.
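The effect of normalization on interpretation can be made explicit with a small worked example. All numbers below (coefficient, mean, standard deviation, and measured value) are hypothetical and serve only to show the arithmetic: a coefficient on a z-scored feature multiplies the number of standard deviations by which the raw value departs from the mean.

```python
from math import exp

def log_odds_contribution(beta_z, value, feature_mean, feature_std):
    """Contribution of one feature to the linear predictor (log-odds scale)."""
    return beta_z * (value - feature_mean) / feature_std

# Hypothetical numbers: coefficient 1.5 on the z-scored feature,
# raw value exactly one standard deviation above the mean.
contribution = log_odds_contribution(beta_z=1.5, value=1400.0,
                                     feature_mean=1000.0, feature_std=400.0)
odds_ratio = exp(contribution)  # multiplicative change in the odds
```

With a raw value one standard deviation above the mean, the contribution equals the coefficient itself, which is why the posterior of a standardized coefficient can be read directly as a log-odds shift per standard deviation.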
The uncertainty in the coefficients determines the uncertainty in the classification. In fact, the confusion diagrams report, for each data point, the classification that is pictorially represented by a circle whose radius is proportional to its uncertainty. It is worth noting that models with a higher number of features provide more uncertain classifications than simpler models because of the propagation of uncertainty from the feature coefficients to the class.
Dealing with uncertainty makes modeling more complex, but the results are more informative and robust. For example, despite the considerable uncertainty in the coefficients of the linear models, for all of them the probability of being close to zero or of having an inconsistent sign is very small: this is a robust indication that the features are relevant to the classification. On the other hand, uncertainty quantification is essential for deciding whether the available data are insufficient for a robust classification and, consequently, whether the models are unsuitable for further analysis or clinical practice. In such a case, the analyst is informed that additional data are required for conclusive results.
Moreover, given the relatively small cohort analyzed in this study, the present findings should be interpreted as preliminary and hypothesis-generating. While the Bayesian framework helps mitigate some limitations associated with sample size, further validation in larger, multi-center studies is necessary to confirm these results and support their clinical applicability.
From a medical viewpoint, the results provide valuable information on the disease and on the potential application of the models in clinical practice to diagnose PPF and predict its progression. The findings indicate that including miRNAs such as miR-21 and miR-92a, alongside the protein biomarker KL-6, improves diagnostic and prognostic accuracy for PPF. The analysis revealed that certain miRNAs are linked to the disease, indicating a potential role as diagnostic markers. However, larger studies are required to confirm their clinical applicability and define their actual predictive value.
The interaction between miR-21 and miR-92a was identified as a significant indicator for PPF diagnosis, improving both the accuracy and specificity of the predictions. This is consistent with previous studies that have highlighted the crucial role of miRNAs in modulating gene expression and the pathological processes of pulmonary fibrosis [13,51].
We demonstrated that the integration of miRNAs with protein biomarkers can provide more detailed and accurate information compared to traditional diagnostic methods. Specifically, we examined the expression of KL-6 mRNA alongside miR-21 and miR-92a. KL-6 is known for its high expression in fibrotic and interstitial lung diseases [52,53]. Analysis of KL-6 mRNA showed a significant correlation with the severity of fibrosis, confirming its role as an indicator of fibroblastic activity in the lungs. MiR-21 and miR-92a were significantly associated with PPF, suggesting that these miRNAs could play a key role in the pathogenesis of the disease. Specifically, miR-21 is known for its ability to modulate inflammatory and fibrotic responses [54,55], while miR-92a is involved in the regulation of cellular metabolism and proliferation [56,57].
Integrating miRNAs with KL-6 improved the model’s predictive capacity compared to single biomarkers, suggesting a potential benefit in analyzing pulmonary fibrosis progression. Further validation is, however, needed for clinical translation. The combination of these biomarkers provided a more detailed and accurate picture than the isolated use of each. This multidisciplinary approach, which integrated clinical data with omics data, could enable more precise identification and monitoring of the disease and potentially pave the way for new treatment and management strategies. Furthermore, the use of machine learning techniques for data analysis enhanced the ability to identify significant patterns and complex relationships between clinical and genetic variables.
Despite the promising findings, it is important to acknowledge that the implementation of these biomarkers—particularly when derived from exosomes isolated from bronchoalveolar lavage (BAL)—faces important technical and economic challenges in clinical practice. The isolation, characterization, and analysis of exosomal RNA currently require specialized equipment and expertise, which limits widespread routine use. Nevertheless, advances in microfluidics and RNA quantification technologies may gradually reduce these barriers, and alternative sources such as serum-derived exosomes are being increasingly explored to provide more accessible sampling options. Therefore, while BAL exosome profiling is not yet standardizable for routine diagnostics, our results suggest that this approach holds significant translational potential that warrants further investigation in larger, clinically oriented studies.
The Bayesian methodology used in this study represented a significant advance. Unlike pointwise generalized linear models (GLMs), which provide point estimates for coefficients, the Bayesian approach offers a posterior distribution for each coefficient. This not only provides a probability distribution as an output, but also allows for observing the influence of each variable on the classification.
In particular, the Bayesian approach allowed for an explicit evaluation of the uncertainty associated with parameter estimates, a crucial aspect in studies with small sample sizes. The posterior distribution of each coefficient provides a more accurate and robust estimate compared to traditional methods, such as generalized linear models (GLMs), which provide only point estimates. The probability distribution associated with each parameter allows for a more transparent assessment of each variable’s contribution to classification, with uncertainty indicated by the width of the distribution. This approach is especially useful in complex clinical contexts, such as in progressive pulmonary fibrosis, where classification models need to account for individual variability and a limited number of samples.
The distance of the coefficient from zero indicates the relevance of the variable, and the width of the distribution expresses the associated uncertainty. This approach improved the management of uncertainty in the data and provided more robust results, especially considering the limited sample size [58]. Furthermore, the ability to update prior probabilities with new evidence makes Bayesian analysis particularly suitable for exploratory studies in complex clinical contexts.
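The idea of updating a prior with new evidence can be illustrated with a deliberately simpler model than the GLMs used in this study: a conjugate Beta-binomial update of a success rate. The counts below are hypothetical and the function names are illustrative only; the sketch is meant to show that updating batch by batch gives the same posterior as using all the evidence at once.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial evidence."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Posterior mean of the success rate."""
    return alpha / (alpha + beta)

# Hypothetical counts of correct/incorrect classifications in two batches.
prior = (1.0, 1.0)  # uniform prior on the success rate
after_first = update_beta(*prior, successes=15, failures=4)
after_second = update_beta(*after_first, successes=8, failures=2)
all_at_once = update_beta(*prior, successes=23, failures=6)
print(after_second, beta_mean(*after_second))
```

In this conjugate setting the posterior after the first batch simply becomes the prior for the second one; for the regression models discussed here, the same principle would typically require re-running the sampler with an informed prior.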
However, it is important to note that posterior uncertainty in many parameters remains significant. The width of the posterior distributions for several coefficients indicates substantial uncertainty in their estimates, which is reflected in the uncertainty observed in the classification results. This is particularly noticeable in models with more features, where the propagation of uncertainty from the coefficients increases the overall uncertainty in the classification. Thus, while the results are promising, the significant uncertainty suggests that a larger sample size would be beneficial to further refine the model and improve diagnostic and prognostic accuracy.
The small sample size led to significant uncertainty in the classification of diagnosis and prognosis. The Bayesian analysis enabled a quantification of this uncertainty, providing robust results but, at the same time, reporting high uncertainty in classifying some samples, thus suggesting that the model could be improved by including a larger number of patients. Furthermore, the lack of a control group of healthy subjects limits our ability to compare biomarker levels between different populations.
Beyond diagnostic and prognostic purposes, the identification of specific biomarkers such as KL-6, miR-21, and miR-92a may influence future clinical management by enabling patient stratification and the tailoring of therapeutic approaches. For instance, miR-21 has been implicated in TGF-β-mediated fibrotic signaling pathways [54], making it a potential target for anti-fibrotic therapies. Recent preclinical studies have explored the use of miR-21 inhibitors to reduce fibrosis, suggesting that its measurement could guide the application of emerging RNA-based treatments [59,60]. Similarly, the modulation of miR-92a, known to affect endothelial function and cell proliferation [57], could open new avenues for therapeutic intervention in fibrotic diseases.
Moreover, the integrative use of biomarkers in predictive modeling may assist in identifying patients at higher risk of rapid progression, thereby informing decisions about the early initiation of anti-fibrotic agents, enrollment in clinical trials, or more intensive follow-up protocols. The Bayesian approach also offers the flexibility to incorporate new data over time, which is particularly relevant in evolving therapeutic landscapes where novel treatments—such as nintedanib or pirfenidone analogs, RNA therapeutics, or anti-miRNA agents—are under continuous development [61,62].
Overall, our findings highlight the potential of a biomarker-based, uncertainty-aware framework to support precision medicine strategies in progressive pulmonary fibrosis, by improving risk stratification, monitoring, and potentially guiding the use of targeted interventions.
In summary, our study highlights how the integration of KL-6, miR-21, and miR-92a provides a more comprehensive view of progressive pulmonary fibrosis, improving both diagnosis and prognosis. The use of Bayesian methodology allowed for a more accurate assessment of uncertainty and of the influencing variables, facilitating the identification of new therapeutic strategies. This multidimensional approach marks a significant advancement in the management of pulmonary fibrosis, with the potential to optimize future treatments through larger studies and extended follow-ups.

5. Conclusions

This study demonstrates that Bayesian analysis can be a powerful tool for the diagnosis and prognosis of progressive pulmonary fibrosis (PPF), especially in contexts with limited data. The integration of genetic biomarkers such as miR-21, miR-92a, and KL-6 with traditional clinical parameters could enhance the management of patients with PPF, contributing to a more personalized approach. Specifically, miR-21 and miR-92a may be crucial for early diagnosis and monitoring disease progression, while KL-6 proves to be a useful prognostic indicator that could be incorporated into clinical routines to monitor patient response to antifibrotic treatments.
However, it is important to acknowledge that the integration of omics data into routine clinical practice presents several challenges. These include the need for larger validation studies, cost considerations, the availability of necessary technologies, and the integration of omics data with traditional clinical measures. Therefore, while these biomarkers hold promise, their clinical implementation requires careful consideration of these factors.
Further studies with larger samples and validation of biomarkers in independent cohorts are necessary to confirm these results. Additionally, exploring the use of exosomes derived from bronchoalveolar lavage and serum as diagnostic and prognostic tools could offer new opportunities to monitor local and systemic pathological processes, expanding our understanding of PPF and improving therapeutic strategies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics15101257/s1, Supplementary Method S1. Exosome Purification from BAL; Supplementary Method S2. Western Blotting; Supplementary Method S3. Multiplex Surface Marker Analysis; Supplementary Method S4. RNA Extraction; Supplementary Method S5. Reverse Transcription and miRNA Expression Analysis by Real-Time PCR.

Author Contributions

P.S.: Conceptualization; Data curation; Formal analysis; Methodology; Project administration; Supervision; Writing—original draft; Writing—review and editing. V.L.: Data curation; Formal analysis; Investigation; Software; Validation; Writing—original draft; Writing—review and editing. C.M.: Data curation; Formal analysis; Investigation; Methodology; Writing—original draft; Writing—review and editing. P.T.: Data curation; Investigation; Writing—review and editing. F.M.: Data curation; Investigation; Writing—review and editing. G.S.: Data curation; Investigation; Writing—review and editing. D.L.: Conceptualization; Project administration; Resources; Supervision; Writing—original draft; Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. This manuscript has been published with the financial support of the Dept. of Medical and Surgical Science of the University of Foggia.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Policlinico Riuniti of Foggia (approval number 26/CE/2023, 10 March 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, C.; Zhao, H.; Li, B.L.; Fu, G.; Liu, H.; Cai, J.M.; Zheng, M. CpG-oligodeoxynucleotides may be effective for preventing ionizing radiation-induced pulmonary fibrosis. Toxicol. Lett. 2018, 292, 181–189.
  2. Li, N.; Lin, Z.; Zhou, Q.; Wang, Y.; Guan, Y.; Li, H.; Zhao, Y.; Liu, N.; Jin, Y.; Yao, S. Metformin alleviates crystalline silica-induced pulmonary fibrosis by remodeling endothelial cells to mesenchymal transition via autophagy signaling. Ecotoxicol. Environ. Saf. 2022, 245, 114100.
  3. Raghu, G.; Remy-Jardin, M.; Richeldi, L.; Thomson, C.C.; Inoue, Y.; Johkoh, T.; Kreuter, M.; Lynch, D.A.; Maher, T.M.; Martinez, F.J.; et al. Idiopathic Pulmonary Fibrosis (an Update) and Progressive Pulmonary Fibrosis in Adults: An Official ATS/ERS/JRS/ALAT Clinical Practice Guideline. Am. J. Respir. Crit. Care Med. 2022, 205, e18–e47.
  4. Cottin, V.; Hirani, N.A.; Hotchkin, D.L.; Nambiar, A.M.; Ogura, T.; Otaola, M.; Skowasch, D.; Park, J.S.; Poonyagariyagorn, H.K.; Wuyts, W.; et al. Presentation, diagnosis and clinical course of the spectrum of progressive-fibrosing interstitial lung diseases. Eur. Respir. Rev. 2018, 27, 180076.
  5. Oliveira, R.P.; Ribeiro, R.; Melo, L.; Grima, B.; Oliveira, S.; Alves, J.D. Connective tissue disease-associated interstitial lung disease. Pulmonology 2022, 28, 113–118.
  6. Jindal, S.K.; Aggarwal, A.N.; Gupta, D. Dust-induced interstitial lung disease in the tropics. Curr. Opin. Pulm. Med. 2001, 7, 272–277.
  7. Bonham, C.A.; Strek, M.E.; Patterson, K.C. From granuloma to fibrosis: Sarcoidosis associated pulmonary fibrosis. Curr. Opin. Pulm. Med. 2016, 22, 484–491.
  8. Wakamatsu, K.; Nagata, N.; Kumazoe, H.; Oda, K.; Ishimoto, H.; Yoshimi, M.; Takata, S.; Hamada, M.; Koreeda, Y.; Takakura, K.; et al. Prognostic value of serial serum KL-6 measurements in patients with idiopathic pulmonary fibrosis. Respir. Investig. 2017, 55, 16–23.
  9. Yokoyama, A.; Kohno, N.; Hamada, H.; Sakatani, M.; Ueda, E.; Kondo, K.; Hirasawa, Y.; Hiwada, K. Circulating KL-6 predicts the outcome of rapidly progressive idiopathic pulmonary fibrosis. Am. J. Respir. Crit. Care Med. 1998, 158, 1680–1684.
  10. Ohshimo, S.; Ishikawa, N.; Horimasu, Y.; Hattori, N.; Hirohashi, N.; Tanigawa, K.; Kohno, N.; Bonella, F.; Guzman, J.; Costabel, U.; et al. Baseline KL-6 predicts increased risk for acute exacerbation of idiopathic pulmonary fibrosis. Respir. Med. 2014, 108, 1031–1039.
  11. d’Alessandro, M.; Bergantini, L.; Cameli, P.; Pieroni, M.; Refini, R.M.; Sestini, P.; Bargagli, E. Serum concentrations of KL-6 in patients with IPF and lung cancer and serial measurements of KL-6 in IPF patients treated with antifibrotic therapy. Cancers 2021, 13, 689.
  12. Soccio, P.; Moriondo, G.; d’Alessandro, M.; Scioscia, G.; Bergantini, L.; Gangi, S.; Tondo, P.; Foschino Barbaro, M.P.; Cameli, P.; Bargagli, E.; et al. Role of BAL and serum Krebs von den Lungen-6 (KL-6) in patients with pulmonary fibrosis. Biomedicines 2024, 12, 269.
  13. Pandit, K.V.; Milosevic, J.; Kaminski, N. MicroRNAs in idiopathic pulmonary fibrosis. Transl. Res. 2011, 157, 191–199.
  14. Liang, H.; Liu, S.; Chen, Y.; Bai, X.; Liu, L.; Dong, Y.; Hu, M.; Su, X.; Chen, Y.; Huangfu, L.; et al. miR-26a suppresses EMT by disrupting the Lin28B/let-7d axis: Potential cross-talks among miRNAs in IPF. J. Mol. Med. 2016, 94, 655–665.
  15. Liu, R.H.; Ning, B.; Ma, X.E.; Gong, W.M.; Jia, T.H. Regulatory roles of microRNA-21 in fibrosis through interaction with diverse pathways (Review). Mol. Med. Rep. 2016, 13, 2359–2366.
  16. Admyre, C.; Grunewald, J.; Thyberg, J.; Gripenbäck, S.; Tornling, G.; Eklund, A.; Scheynius, A.; Gabrielsson, S. Exosomes with major histocompatibility complex class II and co-stimulatory molecules are present in human BAL fluid. Eur. Respir. J. 2003, 22, 578–583.
  17. Njock, M.S.; Fish, J.E. Endothelial miRNAs as cellular messengers in cardiometabolic diseases. Trends Endocrinol. Metab. 2017, 28, 237–246.
  18. Sánchez-Vidaurre, S.; Eldh, M.; Larssen, P.; Daham, K.; Martinez-Bravo, M.J.; Dahlén, S.E.; Dahlén, B.; van Hage, M.; Gabrielsson, S. RNA-containing exosomes in induced sputum of asthmatic patients. J. Allergy Clin. Immunol. 2017, 140, 1459–1461.e2.
  19. Soccio, P.; Moriondo, G.; Scioscia, G.; Leo, V.; Tondo, P.; Salerno, L.; Palange, P.; Foschino Barbaro, M.P.; Lacedonia, D. Searching for airways biomarkers useful to identify progressive pulmonary fibrosis. BMC Pulm. Med. 2023, 23, 407.
  20. Lacedonia, D.; Scioscia, G.; Soccio, P.; Conese, M.; Catucci, L.; Palladino, G.P.; Simone, F.; Quarato, C.M.I.; Di Gioia, S.; Rana, R.; et al. Downregulation of exosomal let-7d and miR-16 in idiopathic pulmonary fibrosis. BMC Pulm. Med. 2021, 21, 188.
  21. Jacob, J.; Bartholmai, B.J.; Rajagopalan, S.; van Moorsel, C.H.M.; van Es, H.W.; van Beek, F.T.; Struik, M.H.L.; Kokosi, M.; Egashira, R.; Brun, A.L.; et al. Predicting Outcomes in Idiopathic Pulmonary Fibrosis Using Automated Computed Tomographic Analysis. Am. J. Respir. Crit. Care Med. 2018, 198, 767–776.
  22. Sabir, W.; Farhan, M.; Almalki, N.S.; Alnfiai, M.M.; Sampedro, G.A. FibroVit-Vision transformer-based framework for detection and classification of pulmonary fibrosis from chest CT images. Front. Med. 2023, 10, 1282200.
  23. Röhrich, M.; Leitz, D.; Glatting, F.M.; Wefers, A.K.; Weinheimer, O.; Flechsig, P.; Kahn, N.; Mall, M.A.; Giesel, F.L.; Kratochwil, C.; et al. Fibroblast Activation Protein-Specific PET/CT Imaging in Fibrotic Interstitial Lung Diseases and Lung Cancer: A Translational Exploratory Study. J. Nucl. Med. 2022, 63, 127–133.
  24. Aloisio, E.; Braga, F.; Puricelli, C.; Panteghini, M. Prognostic role of Krebs von den Lungen-6 (KL-6) measurement in idiopathic pulmonary fibrosis: A systematic review and meta-analysis. Clin. Chem. Lab. Med. 2021, 59, 1400–1408.
  25. Ramirez-Santana, M. Limitations and biases in cohort studies. In Cohort Studies in Health Sciences; IntechOpen: London, UK, 2018.
  26. Larson, C.; Kaplan, D.; Girolamo, T.; Kover, S.T.; Eigsti, I.M. A Bayesian statistics tutorial for clinical research: Prior distributions and meaningful results for small clinical samples. J. Clin. Psychol. 2023, 79, 2602–2624.
  27. Sondhi, A.; Segal, B.; Snider, J.; Humblet, O.; McCusker, M. Bayesian additional evidence for decision making under small sample uncertainty. BMC Med. Res. Methodol. 2021, 21, 221.
  28. Han, T.; Proctor, R.W. Change of Variable-Foreperiod Effects within an Experiment: A Bayesian Modeling Approach. J. Cogn. 2022, 5, 40.
  29. Han, J.; Sauberan, J.; Tran, M.T.; Adler-Shohet, F.C.; Michalik, D.E.; Tien, T.H.; Tran, L.; DO, D.H.; Bradley, J.S.; Le, J. Implementation of Vancomycin Therapeutic Monitoring Guidelines: Focus on Bayesian Estimation Tools in Neonatal and Pediatric Patients. Ther. Drug Monit. 2022, 44, 241–252.
  30. Meltzer, E.B.; Barry, W.T.; D’Amico, T.A.; Davis, R.D.; Lin, S.S.; Onaitis, M.W.; Morrison, L.D.; Sporn, T.A.; Steele, M.P.; Noble, P.W. Bayesian probit regression model for the diagnosis of pulmonary fibrosis: Proof-of-principle. BMC Med. Genom. 2011, 4, 70.
  31. Meltzer, E.B.; Barry, W.T.; Yang, I.V.; Brown, K.K.; Schwarz, M.I.; Patel, H.; Ashley, A.; Noble, P.W.; Schwartz, D.A.; Steele, M.P. Familial and sporadic idiopathic pulmonary fibrosis: Making the diagnosis from peripheral blood. BMC Genom. 2014, 15, 902.
  32. Cottin, V.; Tomassetti, S.; Valenzuela, C.; Walsh, S.L.F.; Antoniou, K.M.; Bonella, F.; Brown, K.K.; Collard, H.R.; Corte, T.J.; Flaherty, K.R.; et al. Integrating Clinical Probability into the Diagnostic Approach to Idiopathic Pulmonary Fibrosis: An International Working Group Perspective. Am. J. Respir. Crit. Care Med. 2022, 206, 247–259.
  33. Fan, W.; Chen, Q.; Maccarrone, V.; Luk, L.; Navot, B.; Salvatore, M. Developing radiology diagnostic tools for pulmonary fibrosis using machine learning methods. Clin. Imaging 2024, 106, 110047.
  34. Tomoto, M.; Mineharu, Y.; Sato, N.; Tamada, Y.; Nogami-Itoh, M.; Kuroda, M.; Adachi, J.; Takeda, Y.; Mizuguchi, K.; Kumanogoh, A.; et al. Idiopathic pulmonary fibrosis-specific Bayesian network integrating extracellular vesicle proteome and clinical information. Sci. Rep. 2024, 14, 1315.
  35. Zhao, C.; Yin, Y.; Zhu, C.; Zhu, M.; Ji, T.; Li, Z.; Cai, J. Drug therapies for treatment of idiopathic pulmonary fibrosis: A systematic review, Bayesian network meta-analysis, and cost-effectiveness analysis. EClinicalMedicine 2023, 61, 102071.
  36. Canestaro, W.J.; Forrester, S.H.; Raghu, G.; Ho, L.; Devine, B.E. Drug Treatment of Idiopathic Pulmonary Fibrosis. Chest 2016, 149, 756–766.
  37. Skandamis, A.; Kani, C.; Markantonis, S.L.; Souliotis, K. Systematic review and network meta-analysis of approved medicines for the treatment of idiopathic pulmonary fibrosis. J. Drug Assess. 2019, 8, 55–61.
  38. Rochwerg, B.; Neupane, B.; Zhang, Y.; Garcia, C.C.; Raghu, G.; Richeldi, L.; Brozek, J.; Beyene, J.; Schünemann, H. Treatment of idiopathic pulmonary fibrosis: A network meta-analysis. BMC Med. 2016, 14, 18.
  39. Hambly, N.; Farooqi, M.M.; Dvorkin-Gheva, A.; Donohoe, K.; Garlick, K.; Scallan, C.; Chong, S.G.; MacIsaac, S.; Assayag, D.; Johannson, K.A.; et al. Prevalence and characteristics of progressive fibrosing interstitial lung disease in a prospective registry. Eur. Respir. J. 2022, 60, 2102571.
  40. Clinical guidelines and indications for bronchoalveolar lavage (BAL): Report of the European Society of Pneumology Task Group on BAL. Eur. Respir. J. 1990, 3, 937–976.
  41. Haslam, P.L.; Baughman, R.P. Report of ERS Task Force: Guidelines for measurement of acellular components and standardization of BAL. Eur. Respir. J. 1999, 14, 245–248.
  42. Meyer, K.C.; Raghu, G.; Baughman, R.P.; Brown, K.K.; Costabel, U.; du Bois, R.M.; Drent, M.; Haslam, P.L.; Kim, D.S.; Nagai, S.; et al. An official American Thoracic Society clinical practice guideline: The clinical utility of bronchoalveolar lavage cellular analysis in interstitial lung disease. Am. J. Respir. Crit. Care Med. 2012, 185, 1004–1014.
  43. Danese, E.; Minicozzi, A.M.; Benati, M.; Paviati, E.; Lima-Oliveira, G.; Gusella, M.; Pasini, F.; Salvagno, G.L.; Montagnana, M.; Lippi, G. Reference miRNAs for colorectal cancer: Analysis and verification of current data. Sci. Rep. 2017, 7, 8413.
  44. Chapman, J.R.; Waldenström, J. With Reference to Reference Genes: A Systematic Review of Endogenous Controls in Gene Expression Studies. PLoS ONE 2015, 10, e0141853.
  45. Livak, K.J.; Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) Method. Methods 2001, 25, 402–408.
  46. Kawano-Dourado, L.; Glassberg, M.K.; Assayag, D.; Borie, R.; Johannson, K.A. Sex and gender in interstitial lung diseases. Eur. Respir. Rev. 2021, 30, 210105.
  47. Tang, J.; Alelyani, S.; Liu, H. Feature selection for classification: A review. In Data Classification: Algorithms and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014; p. 37.
  48. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
  49. Molnar, C. Interpretable Machine Learning: A Guide For Making Black Box Models Explainable. 2022. Available online: https://christophmolnar.com/books/interpretable-machine-learning (accessed on 6 May 2025).
  50. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017, 27, 1413–1432.
  51. Pottier, N.; Maurin, T.; Chevalier, B.; Puisségur, M.P.; Lebrigand, K.; Robbe-Sermesant, K.; Bertero, T.; Lino Cardenas, C.L.; Courcot, E.; Rios, G.; et al. Identification of keratinocyte growth factor as a target of microRNA-155 in lung fibroblasts: Implication in epithelial-mesenchymal interactions. PLoS ONE 2009, 4, e6718.
  52. Lucà, S.; Pagliuca, F.; Perrotta, F.; Ronchi, A.; Mariniello, D.F.; Natale, G.; Bianco, A.; Fiorelli, A.; Accardo, M.; Franco, R. Multidisciplinary Approach to the Diagnosis of Idiopathic Interstitial Pneumonias: Focus on the Pathologist’s Key Role. Int. J. Mol. Sci. 2024, 25, 3618.
  53. d’Alessandro, M.; Bergantini, L.; Cameli, P.; Vietri, L.; Lanzarone, N.; Alonzi, V.; Pieroni, M.; Refini, M.; Sestini, P.; Bonella, F.; et al. Krebs von den Lungen-6 as a biomarker for disease severity assessment in interstitial lung disease: A comprehensive review. Biomark Med. 2020, 14, 665–674.
  54. Cardin, S.; Guasch, E.; Luo, X.; Naud, P.; Le Quang, K.; Shi, Y.; Tardif, J.C.; Comtois, P.; Nattel, S. Role for MicroRNA-21 in Atrial Profibrillatory Fibrotic Remodeling Associated with Experimental Postinfarction Heart Failure. Circ. Arrhythm. Electrophysiol. 2012, 5, 1027–1035.
  55. Huang, Y.; He, Y.; Li, J. MicroRNA-21: A central regulator of fibrotic diseases via various targets. Curr. Pharm. Des. 2015, 21, 2236–2242.
  56. Fang, L.L.; Wang, X.H.; Sun, B.F.; Zhang, X.D.; Zhu, X.H.; Yu, Z.J.; Luo, H. Expression, regulation and mechanism of action of the miR-17-92 cluster in tumor cells (Review). Int. J. Mol. Med. 2017, 40, 1624–1630.
  57. Yanshen, Z.; Lifen, Y.; Xilian, W.; Zhong, D.; Huihong, M. miR-92a promotes proliferation and inhibits apoptosis of prostate cancer cells through the PTEN/Akt signaling pathway. Libyan J. Med. 2021, 16, 1971837.
  58. Raftery, A.E. Bayesian Model Selection in Social Research. Sociol. Methodol. 1995, 25, 111–163.
  59. Théry, C.; Amigorena, S.; Raposo, G.; Clayton, A. Isolation and characterization of exosomes from cell culture supernatants and biological fluids. Curr. Protoc. Cell Biol. 2006, 30, 3.22.1–3.22.29.
  60. Liu, G.; Friggeri, A.; Yang, Y.; Milosevic, J.; Ding, Q.; Thannickal, V.J.; Kaminski, N.; Abraham, E. miR-21 mediates fibrogenic activation of pulmonary fibroblasts and lung fibrosis. J. Exp. Med. 2010, 207, 1589–1597.
  61. Wollin, L.; Wex, E.; Pautsch, A.; Schnapp, G.; Hostettler, K.E.; Stowasser, S.; Kolb, M. Mode of action of nintedanib in the treatment of idiopathic pulmonary fibrosis. Eur. Respir. J. 2015, 45, 1434–1445.
  62. Rupaimoole, R.; Slack, F. MicroRNA therapeutics: Towards a new era for the management of cancer and other diseases. Nat. Rev. Drug Discov. 2017, 16, 203–222.
Figure 1. Distribution of diagnosis (a) and prognosis (b) in the dataset.
Figure 2. Feature distributions according to diagnosis (a) and prognosis (b).
Figure 3. Distribution of gender (a) and smoking status (b).
Figure 4. Model ranking based on ELPD computation; the circles represent the computed ELPD value, while the horizontal line is the standard error. The difference in ELPD is computed with respect to the highest ranked model.
Figure 5. Class posterior probabilities of the best model for the diagnosis of PPF, with and without miR-21. The area of each circle is proportional to the variance of the posterior probability distribution.
Figure 6. Posterior class probabilities of the best models for the prognosis of PPF, with and without KL6.
Figure 7. Distribution of the linear coefficients of the best model for the diagnosis of PPF, with and without miR-21. Multiple lines are reported for each distribution, which are consequences of multiple chains of Monte Carlo sampling of the distributions. The lines are highly overlapping because of sampling convergence.
Figure 8. Distribution of the linear coefficients of the best model for the prognosis, with and without KL6.
Table 1. Basic statistics of the numerical features.
| Feature | Min | Max | Mean | Std. Dev. | Missing |
|---|---|---|---|---|---|
| Age | 46 | 80 | 66.46 | 7.56 | 0 |
| FVC% | 23 | 107 | 71.66 | 17.62 | 1 |
| FEV1% | 23 | 112 | 75.58 | 19.32 | 3 |
| DLCO | 30 | 122 | 57.51 | 19.08 | 4 |
| DLCO/VA | 50 | 167 | 100.67 | 26.05 | 9 |
| Macro% | 19 | 93 | 73.95 | 18.43 | 0 |
| Neu% | 0 | 74 | 12.71 | 16.16 | 1 |
| Lin% | 2 | 28 | 9.38 | 6.01 | 0 |
| 2^(−ΔΔCT) miR-21 | 0.07 | 3.41 | 0.81 | 0.95 | 0 |
| 2^(−ΔΔCT) miR-92a | 0.02 | 3.31 | 0.5 | 0.64 | 0 |
| 2^(−ΔΔCT) KL-6 | 0.0 | 56.87 | 11.23 | 14.76 | 5 |

Abbreviations: FVC% = forced vital capacity (% predicted); FEV1% = forced expiratory volume in 1 s (% predicted); DLCO = diffusing capacity for carbon monoxide; DLCO/VA = DLCO corrected for alveolar volume; Macro% = percentage of macrophages in BAL; Neu% = percentage of neutrophils in BAL; Lin% = percentage of lymphocytes in BAL; 2^(−ΔΔCT) miR-21 = relative expression level of miR-21; 2^(−ΔΔCT) miR-92a = relative expression level of miR-92a; 2^(−ΔΔCT) KL-6 = relative expression level of KL-6.
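For reference, the relative expression levels in Table 1 use the 2 to the power of minus ΔΔCT notation of the standard comparative-CT convention. The sketch below states that convention in general form. The reference (normalizer) and calibrator (control) used in this study are not specified in this section, so the labels "ref" and "cal" are generic placeholders rather than the authors' choices.

```latex
\Delta C_T = C_T^{\text{target}} - C_T^{\text{ref}}, \qquad
\Delta\Delta C_T = \Delta C_T^{\text{sample}} - \Delta C_T^{\text{cal}}, \qquad
\text{relative expression} = 2^{-\Delta\Delta C_T}
```

Under this convention, a value of 1 means the target is expressed at the same level as in the calibrator, while values above or below 1 indicate higher or lower relative expression.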
Table 2. Feature grouping.
| Functional Parameters | BAL Markers | miRNA Biomarkers |
|---|---|---|
| AGE | MACRO% | miR-21 |
| FVC% | NEU% | miR-92a |
| FEV1% | LIN% | KL-6 |
| DLCO | | miR-21 × miR-92a |
| DLCO/VA | | miR-92a × KL6 |
Table 3. Features considered for either diagnosis or prognosis.
| Diagnosis | Prognosis |
|---|---|
| FVC% | Age |
| DLCO | DLCO |
| Lin | Lin |
| Neu | Neu |
| miR-21 × miR-92a | KL6 |
| miR-21 | miR-92a × KL6 |
Table 4. Selected features for either diagnosis or prognosis.
| Diagnosis | Prognosis |
|---|---|
| miR-21 × miR-92a | KL6 |
| miR-21 | Age |
| Lin | DLCO |
Table 5. Performance evaluation of the selected classification models for diagnosis (expected values, standard deviation in brackets; underlined: best values per column).
| Model | Accuracy | Specificity | Sensitivity |
|---|---|---|---|
| Lin + miR-21 + miR-21 × miR-92a | 0.75 (0.036) | 0.67 (0.101) | 0.81 (0.094) |
| Lin + miR-92a | 0.70 (0.077) | 0.58 (0.114) | 0.81 (0.152) |
| Lin + miR-21 × miR-92a | 0.72 (0.046) | 0.62 (0.106) | 0.80 (0.133) |
| Lin + miR-21 | 0.67 (0.050) | 0.54 (0.118) | 0.79 (0.143) |
| Lin | 0.68 (0.046) | 0.55 (0.141) | 0.80 (0.168) |
| miR-92a | 0.54 (0.048) | 0.35 (0.307) | 0.70 (0.328) |
| miR-21 | 0.52 (0.061) | 0.34 (0.348) | 0.67 (0.359) |
Table 6. Performance evaluation of the selected classification models for prognosis (expected values, standard deviation in brackets; underlined: best values per column).
| Model | Accuracy | Specificity | Sensitivity |
|---|---|---|---|
| Age + KL6 + DLCO | 0.75 (0.058) | 0.73 (0.144) | 0.77 (0.107) |
| DLCO + KL6 | 0.65 (0.052) | 0.59 (0.184) | 0.70 (0.168) |
| Age + DLCO | 0.68 (0.057) | 0.63 (0.171) | 0.73 (0.135) |
| DLCO | 0.62 (0.051) | 0.51 (0.227) | 0.71 (0.204) |
| Age + KL6 | 0.67 (0.056) | 0.62 (0.155) | 0.71 (0.138) |
| Age | 0.58 (0.047) | 0.51 (0.240) | 0.64 (0.219) |
| KL6 | 0.57 (0.055) | 0.55 (0.309) | 0.59 (0.266) |
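The performance tables report accuracy, specificity, and sensitivity as an expected value with a standard deviation in brackets. The following minimal sketch shows one way such a summary could be produced, assuming a set of binary prediction vectors is available (for example, one per posterior draw). The function names, the toy labels, and the use of the sample standard deviation are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Accuracy, specificity, and sensitivity for one prediction vector.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "specificity": tn / (tn + fp),  # true negative rate
        "sensitivity": tp / (tp + fn),  # true positive rate
    }


def summarize(y_true, prediction_draws):
    """Mean and sample SD of each metric across several prediction vectors."""
    per_draw = [metrics(y_true, draw) for draw in prediction_draws]
    return {
        name: (mean([m[name] for m in per_draw]),
               stdev([m[name] for m in per_draw]))
        for name in per_draw[0]
    }


# Toy data (hypothetical): 3 positive and 3 negative cases, 3 prediction vectors.
y_true = [1, 1, 1, 0, 0, 0]
draws = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
]
summary = summarize(y_true, draws)
for name, (m, s) in summary.items():
    print(f"{name}: {m:.2f} ({s:.3f})")
```

Computing each metric per prediction vector and then averaging keeps the spread visible, which matters with a dataset this small: a wide standard deviation signals that the point estimate should be read with caution.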
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Soccio, P.; Longo, V.; Mencar, C.; Tondo, P.; Murgolo, F.; Scioscia, G.; Lacedonia, D. Bayesian Integration of Bronchoalveolar Lavage miRNAs and KL-6 in Progressive Pulmonary Fibrosis Diagnosis. Diagnostics 2025, 15, 1257. https://doi.org/10.3390/diagnostics15101257
