Article

Evaluation of the ‘qXR’ Software for the Detection of Pulmonary Nodules, Cardiomegaly and Pleural Effusion: A Comparative Analysis in a Latin American General Hospital

by Adriana Anchía-Alfaro 1, Sebastián Arguedas-Chacón 1, Georgia Hanley-Vargas 1, Sofía Suárez-Sánchez 1, Luis Andrés Aguilar-Castro 2, Sergio Daniel Seas-Azofeifa 2, Kal Che Wong Hsu 2, Diego Quesada-Loría 1, María Felicia Montero-Arias 3, Juliana Salas-Segura 4 and Esteban Zavaleta-Monestel 1,*

1 Life Sciences Center for Innovation, Hospital Clínica Bíblica, San José 10104, Costa Rica
2 Radiology Department, Hospital Clínica Bíblica, San José 10104, Costa Rica
3 Pulmonology Department, Hospital Clínica Bíblica, San José 10104, Costa Rica
4 Cardiology Department, Hospital Clínica Bíblica, San José 10104, Costa Rica
* Author to whom correspondence should be addressed.
BioMedInformatics 2026, 6(2), 15; https://doi.org/10.3390/biomedinformatics6020015
Submission received: 5 January 2026 / Revised: 17 March 2026 / Accepted: 20 March 2026 / Published: 25 March 2026

Abstract

Background/Objectives: AI-based tools for chest radiograph interpretation are increasingly used as decision-support systems, yet their performance must be validated in local clinical environments before deployment. This study evaluated the diagnostic performance of qXR (Qure.ai, v3.2) for detecting pulmonary nodules, cardiomegaly, and pleural effusion in adult patients at Hospital Clínica Bíblica, San José, Costa Rica. Methods: Three radiologists independently interpreted 225 chest radiographs, providing the reference standard. qXR outputs were compared against radiologist assessments for each finding. Sensitivity, specificity, Cohen’s kappa, and area under the ROC curve (AUC) were calculated. Due to the convenience-stratified sampling design, predictive values were not used for clinical interpretation. Results: For pulmonary nodules, qXR achieved a sensitivity of 0.71, specificity of 0.90, Cohen’s kappa of 0.51, and AUC of 0.80. For pleural effusion, sensitivity and specificity were both 0.86, with a kappa of 0.63 and AUC of 0.86. Cardiomegaly showed the lowest sensitivity and AUC, with a sensitivity of 0.64, specificity of 0.91, kappa of 0.57, and AUC of 0.77. Conclusions: qXR demonstrated moderate diagnostic agreement with radiologist assessments for pulmonary nodules and pleural effusion, and weaker overall performance for cardiomegaly under local imaging conditions. These results reflect technical concordance between the AI system and individual radiologists and do not constitute evidence of clinical utility or real-world impact. Context-specific validation is essential prior to integrating AI tools into routine radiological workflows.

1. Introduction

AI-based decision-support systems have demonstrated increasing potential to enhance diagnostic accuracy, workflow efficiency, and inter-reader consistency in medical imaging, particularly in radiology-intensive environments [1]. In chest radiography, deep learning algorithms have been shown to assist in the detection of clinically relevant findings such as pulmonary nodules, pleural effusions, and cardiomegaly, features that are often subtle and subject to inter-observer variability, especially in high-volume clinical settings [1,2]. These systems are primarily designed to augment, rather than replace, radiologist interpretation by prioritizing suspicious findings and providing standardized, reproducible assessments [3].
Chest X-ray imaging remains a cornerstone of diagnostic evaluation worldwide due to its accessibility, low cost, minimal radiation exposure, and utility as a first-line modality for a broad spectrum of cardiopulmonary conditions [4,5]. However, its diagnostic performance is highly dependent on reader expertise and workload, which may be unevenly distributed across healthcare systems. AI-based tools offer a scalable approach to support image interpretation, particularly in settings where subspecialty radiology expertise or advanced imaging modalities are limited [6].
Despite the rapid development and regulatory approval of several AI tools for chest radiograph interpretation, most validation studies have been conducted using datasets derived from high-income countries. These datasets may not adequately represent the demographic characteristics, imaging protocols, or equipment variability encountered in routine clinical practice in Latin American healthcare systems. Consequently, the external validity and generalizability of AI performance across different geographic, epidemiological, and operational contexts remain uncertain [7].
In Latin America, evidence evaluating the real-world performance of AI-assisted chest radiography is limited, and in Costa Rica, no published studies have assessed the application of AI algorithms for chest X-ray interpretation using locally generated data. The absence of national evidence derived from domestic imaging archives and interpreted by practicing local radiologists represents a critical gap, particularly given the increasing interest in integrating AI solutions into public and private healthcare systems.
The present study seeks to address this gap by evaluating the performance of the qXR software within the Costa Rican clinical context, using chest radiographs interpreted independently by local radiologists as the reference standard. By generating locally relevant performance data, this study contributes to the biomedical informatics literature by examining the contextual validity of AI-based diagnostic tools and informing evidence-based decisions regarding their potential implementation. Ultimately, establishing region-specific evidence is essential to ensure that AI integration in radiology supports safe, equitable, and clinically meaningful adoption within national health systems.

2. Materials and Methods

The initial study population comprised all posteroanterior (PA) and anteroposterior (AP) chest radiographs performed at the Radiology and Medical Imaging Department of Hospital Clínica Bíblica, main campus and Santa Ana campus, between 15 August 2024 and 28 February 2025, totaling 5017 radiographs.
Sample size was calculated using the formula for the comparison of two proportions [8]:
n = (Z_{α/2} + Z_β)² · [P₁(1 − P₁) + P₂(1 − P₂)] / (P₁ − P₂)²

where
n = required sample size;
Z_{α/2} = 1.96 (two-sided significance level of 0.05);
Z_β = 0.84 (statistical power of 80%);
P₁ = sensitivity of radiologist detection;
P₂ = sensitivity of qXR detection.
A two-sided significance level of 0.05 and a statistical power of 80% were applied. Sensitivity was selected as the primary parameter given the clinical priority of minimizing missed diagnoses. Published sensitivity estimates for the three radiological findings were used as reference values [9,10]. The sensitivity values for pleural effusion detection (radiologist: 0.76; qXR: 0.89) yielded the largest required sample size and were therefore used as the basis for calculation, resulting in a target of 225 radiographs.
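The base formula can be reproduced with a short script. This is a minimal illustrative sketch, not the authors’ actual code; with the pleural-effusion inputs the base formula yields a per-comparison estimate on the order of 130, and the final target of 225 corresponds to the stratified allocation across the four groups described below.

```python
import math

def n_two_proportions(p1: float, p2: float,
                      z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Required n for comparing two proportions
    (two-sided alpha = 0.05, power = 80% by default)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)

# Pleural effusion sensitivities: radiologist 0.76, qXR 0.89
print(n_two_proportions(0.76, 0.89))  # -> 131
```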
The sample was distributed by convenience across the three findings and radiographs without pathological signs, yielding the following groups: 67 with pleural effusion, 67 with cardiomegaly, 24 with at least one pulmonary nodule, and 67 without findings. This distribution was intended to ensure minimum representation of each finding. Because only 24 radiographs with pulmonary nodules were available in the institutional database, priority was given to cardiomegaly, pleural effusion, and normal cases. A single radiograph could contain more than one finding; this grouping was applied solely for selection purposes and does not reflect the actual prevalence of these findings in the study population.
Since the sample was constructed by convenience with a predetermined distribution of findings (67–67–24–67), the positive predictive value (PPV) and negative predictive value (NPV) estimates derived from it do not reflect the expected performance in clinical practice under real prevalence conditions. For this reason, PPV and NPV are presented solely for descriptive purposes; clinical interpretation should rely primarily on sensitivity, specificity, and AUC, which are prevalence-independent metrics.
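The prevalence dependence of predictive values can be made explicit with Bayes’ rule. The sketch below uses the pleural-effusion sensitivity and specificity reported later in this study; the 5% prevalence is a hypothetical value chosen only to illustrate how PPV collapses outside the enriched sample.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value at a given prevalence (Bayes' rule)."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

# Pleural effusion (sens = spec = 0.86): at the in-sample prevalence of
# 49/225 the PPV is ~0.63, but at a hypothetical 5% prevalence it falls
# to ~0.24 -- which is why PPV/NPV from an enriched sample do not transfer.
print(round(ppv(0.86, 0.86, 49 / 225), 2))  # -> 0.63
print(round(ppv(0.86, 0.86, 0.05), 2))      # -> 0.24
```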
We included PA or AP chest radiographs corresponding to patients over 18 years old and performed at Hospital Clínica Bíblica. Studies other than chest radiographs, incomplete images, and qXR reports that did not specify nodule location or pleural effusion laterality were excluded. Image completeness was assessed by members of the research team prior to inclusion; selection required complete chest radiographs with no missing anatomical regions to ensure adequate interpretation and reporting.
All 225 selected radiographs fully met the established inclusion and exclusion criteria, and none were discarded for technical quality reasons. Radiographs were identified through the institutional Picture Archiving and Communication System (PACS) using structured report keywords and diagnostic codes associated with pleural effusion, cardiomegaly, and pulmonary nodules. Searches were performed by a member of the research team who was not involved in image interpretation. Radiographs without pathological findings were identified by selecting reports in which none of the three radiological signs under investigation were documented. Figure 1 shows the selection and interpretation process of the radiographs.
The study was approved by the Scientific Ethical Committee of Hospital Clínica Bíblica, which authorized the retrospective and anonymous evaluation of radiological images without requiring individual informed consent. All images were anonymized by the research team prior to data collection and analysis.

2.1. Image Acquisition

All chest radiographs included in this study were frontal projections, comprising both posterior–anterior (PA) and anterior–posterior (AP) acquisitions, obtained as part of routine clinical care at Hospital Clínica Bíblica. No dedicated imaging protocol was implemented for study purposes, and no selection, modification, or preprocessing of images was performed by the study team prior to qXR analysis.
Images were transmitted to qXR in their native DICOM format through an established institutional integration between the hospital’s Picture Archiving and Communication System (PACS) and the Qure Gateway. The Gateway, installed on a dedicated secondary server within the hospital network, handled anonymization and re-identification of studies before forwarding them to Qure.ai’s cloud-based analysis servers. This integration was operational prior to the study period and functioned as part of the hospital’s standard radiology workflow. The qXR algorithm processed each radiograph as received, without any manual intervention, image conversion, or quality filtering applied at the institutional level. The proportion of PA versus AP acquisitions was not prospectively recorded, and retrospective stratification was not feasible given that qXR does not provide functionality to filter or segment outputs by acquisition projection type.

2.2. qXR Artificial Intelligence Software

All anonymized chest radiographs were processed using the qXR artificial intelligence algorithm (Qure.ai, Mumbai, India). To ensure data confidentiality, all anonymization and study handling were executed within the institutional firewall of Hospital Clínica Bíblica before studies were forwarded through the established Gateway integration. We used the commercially available version of the software, qXR v3.2, Rev. 04, employing only manufacturer-preset parameters, including internal operating thresholds, with no adjustments or additional retraining for this cohort. Binary outputs were generated using the default manufacturer-defined operating thresholds embedded within the software. No post-processing modifications were applied.
For processing, qXR accepts only anonymized DICOM (.dcm) studies meeting predefined technical and clinical criteria. Radiographs labeled as “chest”, acquired in PA or AP projections, with a minimum resolution of 1440 × 1440 pixels and a grayscale depth of at least 10 bits, were included. The algorithm has been previously trained on large volumes of non-local images and has authorization for clinical use in more than 50 countries, including approval by the U.S. Food and Drug Administration (FDA).
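The intake criteria above can be expressed as a simple eligibility check. This is an illustrative sketch using standard DICOM attribute names (Rows, Columns, BitsStored, BodyPartExamined, ViewPosition); the actual gating logic inside qXR is proprietary and not documented.

```python
def meets_qxr_criteria(rows: int, cols: int, bits_stored: int,
                       body_part: str, view_position: str) -> bool:
    """Check a study against the stated qXR intake criteria:
    chest label, PA/AP projection, >= 1440x1440 pixels, >= 10-bit grayscale."""
    return (
        body_part.upper() == "CHEST"
        and view_position.upper() in {"PA", "AP"}
        and min(rows, cols) >= 1440
        and bits_stored >= 10
    )

print(meets_qxr_criteria(2048, 2048, 12, "CHEST", "PA"))  # -> True
print(meets_qxr_criteria(1024, 1024, 12, "CHEST", "PA"))  # -> False (resolution)
```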
qXR is an automated analysis system based on deep convolutional neural networks (CNNs), designed to detect multiple radiological findings. The software generates a continuous probability for each evaluated sign, accompanied by a binary label (“present”/”absent”) and, when applicable, a heat map highlighting regions of interest. In accordance with the manufacturer’s intellectual property policy, the exact model architecture, training parameters, and specific datasets used are not publicly available. Decision thresholds are internal to the system, not modifiable by the user, and not openly documented, as they form part of the software’s proprietary design. According to qXR documentation, the algorithm presents known limitations related to variability in non-standard images, the presence of artifacts, intrathoracic devices, and low-prevalence conditions [11].

2.3. Establishment of the Reference Standard in the Study

To define the reference standard (gold standard), three radiologists independently evaluated the chest radiographs included in the study. The 225 images were distributed equally among the radiologists, such that each specialist interpreted only a subset of them. For this reason, duplicate readings were not performed, nor was a consensus process conducted among evaluators; the individual report of each radiologist was considered the definitive diagnostic reference for the images assigned to them. This approach was selected to reflect real-world clinical practice, in which chest radiographs are typically interpreted by a single radiologist. While this strategy does not allow assessment of inter-reader variability, it provides an ecologically valid reference standard for evaluating AI performance in routine settings. The three participating radiologists were experienced professionals with 3 to 19 years of clinical experience. Each radiological report included an assessment of the following findings:
  • Presence or absence of pulmonary nodules, specifying quantity and anatomical location (left or right lung; upper, middle, or lower fields) when applicable.
  • Presence or absence of pleural effusion, indicating laterality.
  • Presence or absence of cardiomegaly.
Radiologists had access only to the images and did not receive clinical or demographic information about participants to ensure objective and independent interpretation. Radiologists were blinded to qXR outputs, and the research member performing statistical analysis was blinded to radiologist identity.

2.4. Anonymization and Data Recording

A sub-team of the research group assigned each image a unique identification code and then randomly distributed the radiographs among the three radiologists. Before readings began, the data collection instrument was presented to the radiologists, questions were clarified, and adjustments were made according to their expert recommendations.
Each radiologist interpreted the images independently, at different times and according to their availability, recording findings on the form designed for this purpose. The evaluations were conducted on the computers of the Medical Imaging Service at Hospital Clínica Bíblica.

2.5. Statistical Analysis

Data were recorded by part of the research team using an editable collection form on the REDCap platform and subsequently exported to Microsoft Excel 365, version 2602 (Microsoft Inc., Redmond, WA, USA). All statistical analyses were conducted by a biostatistician in R version 4.5.0 (RStudio environment), using base R functions and the pROC package.
For each radiological sign, radiograph classification was based on the independent observations of the three radiologists and the findings generated by qXR, from which binary variables were derived. Sensitivity, specificity, predictive values, Cohen’s kappa, anatomical accuracy, and area under the ROC curve (AUC) were estimated for each finding [12,13]. Confidence intervals for AUC values were calculated using DeLong’s method. For findings requiring additional characterization, specifically, nodule location and pleural effusion laterality, the corresponding kappa coefficient was also calculated. All analyses were conducted using a fully paired design between radiologist readings and AI outputs.
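The analyses were run in R with pROC; as a language-agnostic illustration of the per-finding metrics, the sketch below computes sensitivity, specificity, and Cohen’s kappa from a 2 × 2 confusion table (hypothetical counts, not study data):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 confusion table."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    p_obs = (tp + tn) / n  # observed agreement between reference and AI
    # chance agreement from the marginal totals of reference and AI labels
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"sensitivity": sens, "specificity": spec, "kappa": kappa}

m = binary_metrics(tp=40, fp=5, fn=10, tn=45)
print(m)  # sensitivity 0.80, specificity 0.90, kappa 0.70
```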

3. Results

Among the 225 radiographs included in the analysis, radiologists identified 31 pulmonary nodules, 85 cases of cardiomegaly, and 49 cases of pleural effusion, which were used as the reference standard to evaluate qXR performance. The distribution of true positives, true negatives, false positives, and false negatives for each sign is shown in Table 1.
Initially, 24 radiographs with at least one pulmonary nodule were identified in the institutional database, based on the diagnoses recorded in prior radiology reports. Verification or reevaluation of these original reports was not a study objective, so no systematic comparison was made between them and the new interpretations. During the reading conducted specifically for this analysis, 31 nodules were identified. This difference may be explained by interobserver variability and by the fact that, although 24 radiographs with at least one nodule were selected, each could contain more than one nodule, whereas the search filter captured presence only as a binary variable. For the performance analysis, this current reading of 31 nodules was used as the operational reference. A similar pattern was observed for the other two radiological findings: interobserver variability, together with discrepancies between identifying a finding and documenting it in the patient record (which determines whether it appears in database searches), may explain the differences between the initial and final case counts.

3.1. Specificity and Sensitivity

Methodological note: The PPV and NPV reported correspond to the convenience-stratified sample of the study and do not necessarily reflect values expected in populations with different prevalences.
For pulmonary nodule detection, qXR showed a sensitivity of 0.71 (95% CI: 0.53–0.83) and a specificity of 0.90 (95% CI: 0.84–0.93), with a PPV of 0.49 (95% CI: 0.35–0.63) and an NPV of 0.96 (95% CI: 0.92–0.98) (Table 2). Cohen’s kappa was 0.51 (95% CI: 0.39–0.63, p < 0.0001), the lowest among the three signs evaluated. Regarding nodule location, a kappa of 0.83 (95% CI: 0.62–1.0, p < 0.0001) was obtained.
For cardiomegaly, qXR achieved a sensitivity of 0.64 (95% CI: 0.57–0.70) and a specificity of 0.91 (95% CI: 0.85–0.94), with a PPV of 0.81 (95% CI: 0.70–0.88) and an NPV of 0.80 (95% CI: 0.74–0.86). Cohen’s kappa was 0.57 (95% CI: 0.44–0.70, p < 0.0001).
In the detection of pleural effusion, qXR showed a sensitivity of 0.86 (95% CI: 0.73–0.93) and an equal specificity of 0.86 (95% CI: 0.80–0.90), with a PPV of 0.63 (95% CI: 0.51–0.73) and an NPV of 0.96 (95% CI: 0.91–0.98). Cohen’s kappa was 0.63 (95% CI: 0.50–0.76, p < 0.0001). For effusion laterality identification, qXR obtained a kappa of 0.76 (95% CI: 0.53–0.99, p < 0.0001).

3.2. Receiver Operating Characteristics (ROC) Curves

The area under the ROC curve was 0.80 (95% CI: 0.72–0.89) for pulmonary nodules, 0.86 (95% CI: 0.80–0.91) for pleural effusion, and 0.77 (95% CI: 0.71–0.83) for cardiomegaly (Figure 2).
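The AUC values above have a direct probabilistic reading: the AUC is the probability that a randomly chosen positive case receives a higher algorithm score than a randomly chosen negative case (the Mann–Whitney interpretation underlying DeLong’s method). A minimal sketch with hypothetical scores:

```python
def auc_rank(pos_scores, neg_scores):
    """AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg));
    ties between a positive and a negative score count as 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical continuous probabilities for positive vs negative cases
print(auc_rank([0.9, 0.6], [0.1, 0.6, 0.3]))  # -> ~0.917
```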

4. Discussion

This study evaluated the diagnostic performance of qXR for detecting pulmonary nodules, cardiomegaly, and pleural effusion in chest radiographs of adult patients at Hospital Clínica Bíblica, San José, Costa Rica. Diagnostic accuracy metrics and discriminative capacity were assessed against the independent interpretation of three radiologists, providing locally generated performance data under heterogeneous imaging conditions characteristic of routine clinical practice.
For pulmonary nodules, qXR showed moderate agreement with the reference standard (kappa = 0.51; AUC = 0.80). These results are lower than those reported in prior validation studies of qXR for nodule detection. Mahboub et al., evaluating qXR on a dataset of 894 chest radiographs with three radiologists as ground truth, reported an AUC of 0.99 and sensitivity of 1.00, substantially exceeding the AUC of 0.80 observed in the present study [14,15]. Similarly, Govindarajan et al., in a prospective multicenter study of 65,604 chest radiographs across 35 centers in India, reported an AUC of 0.91 and sensitivity of 0.72 for nodule detection [16]. The performance gap between these studies and the present findings likely reflects several compounding factors: both reference studies were conducted under more controlled acquisition conditions, predominantly standardized PA radiographs in high-volume radiology networks, whereas our sample included heterogeneous PA and AP studies from a single community hospital. Additionally, the nodule subgroup in our study comprised only 24 cases, limiting statistical power and increasing susceptibility to sampling variability. It is also notable that both Mahboub et al. and Govindarajan et al. received funding or direct involvement from Qure.ai, which may introduce optimism bias in reported performance estimates, a recognized limitation in industry-sponsored AI validation studies. Predictive values should be interpreted with caution given the artificially manipulated prevalence in this sample; clinical conclusions should rely on sensitivity, specificity, and AUC, which are prevalence-independent metrics.
For cardiomegaly, qXR showed the lowest sensitivity among the three findings (sensitivity = 0.64; kappa = 0.57), consistent with the variability reported in the literature for this finding. AI performance for cardiomegaly is known to be influenced by cardiothoracic ratio variation, degree of lung inflation, and radiographic projection type [17,18]. Our sample included both PA and AP radiographs, which may have contributed to the observed variability; however, stratified analysis by projection type was not feasible, as qXR does not provide functionality to filter outputs by acquisition projection [19]. These findings are consistent with the broader literature on AI-based cardiomegaly detection. In the large multicenter study by Govindarajan et al., qXR achieved a sensitivity of 0.80 and AUC of 0.97 for cardiomegaly across 35 centers, values substantially higher than the sensitivity of 0.64 observed in our study [16]. Furthermore, a recent systematic review and meta-analysis by Kufel et al. synthesizing 14 studies and 70,472 chest radiographs reported a pooled AUC of 0.96 (95% CI 0.94–0.97) for AI-based cardiomegaly detection across diverse algorithms and datasets [18]. The sensitivity observed in our study falls well below both of these benchmarks. This discrepancy is most plausibly attributable to the uncontrolled inclusion of AP radiographs in our sample, a factor explicitly flagged by Kufel et al. as a source of interpretational variability, combined with the small size of our cardiomegaly-positive subgroup and the absence of projection-stratified analysis.
For pleural effusion, qXR demonstrated balanced performance, with a sensitivity and specificity of 0.86, kappa of 0.63, and AUC of 0.86. For comparative context, Govindarajan et al. reported a sensitivity of 0.67 and specificity of 0.99 for pleural effusion detection in their prospective multicenter cohort, yielding an AUC of 0.97 [16]. The profile of our results differs notably: while our sensitivity exceeds that of Govindarajan et al., our specificity (0.86 vs. 0.99) is considerably lower. This pattern may reflect the influence of the uncontrolled projection mix in our sample, as AP acquisitions, particularly portable studies, produce basal opacification and diaphragmatic elevation patterns that can mimic or obscure small effusions, potentially inflating false-positive rates. Overall, the balanced sensitivity–specificity profile observed for pleural effusion suggests that this finding represents the most reliably detectable sign for qXR under heterogeneous local imaging conditions, consistent with the relatively high spatial conspicuity and distinct radiographic features of moderate-to-large effusions.
The intended clinical use case of qXR at our institution is as a second reader to assist radiologists in the interpretation of chest radiographs. Based on the performance metrics observed in this study, particularly the moderate kappa values across all three findings and the sensitivity limitations for cardiomegaly, the software did not demonstrate sufficient effectiveness to support this intended use under current local imaging conditions. These results reflect technical concordance between qXR and individual radiologist assessments and should not be interpreted as evidence of clinical utility or readiness for routine deployment.
Compared with previously published studies, a trend toward lower sensitivity and specificity was observed across all three findings [2,16]. This is most evident when contrasting our results with large multicenter studies such as Govindarajan et al., which differ substantially in design, scale, and acquisition conditions [16]. Furthermore, many commercial AI models are optimized for global normal/abnormal triage rather than sign-specific detection, which may partly explain the performance differences observed [16].
The convenience-stratified sampling design (67–67–24–67) introduced spectrum bias that directly affects PPV and NPV estimation. Clinical interpretation should therefore rely exclusively on sensitivity, specificity, and AUC. Additionally, the reference standard was based on independent radiologist interpretation without tomographic confirmation, and technical heterogeneity between PA and AP studies could not be controlled for. These methodological factors limit direct comparability with prior research and should be considered when interpreting the observed performance metrics.
Taken together, these findings contribute locally generated evidence on AI-assisted chest radiography in a Latin American clinical setting where adoption of these technologies is still emerging. The performance variability observed across findings and in comparison with international studies reinforces the importance of context-specific validation prior to implementation. Future studies should address clinical impact, performance under real prevalence conditions, multicenter reproducibility, and operational integration into routine workflows.

4.1. Limitations

Several limitations should be considered when interpreting these findings. First, the number of pulmonary nodules available in the institutional database was small (n = 24), which constrained the sample size for this finding and may limit the generalizability of nodule-specific performance metrics.
Second, the convenience-stratified design (67–67–24–67) does not reflect the real prevalence of these findings in the study population. While necessary to ensure minimum case representation per group, this approach introduces spectrum bias and prevents extrapolation of predictive values to clinical settings with different prevalence profiles. Additionally, radiographs classified as normal were selected from routine clinical examinations and may contain incidental findings unrelated to the three signs under investigation, which may further affect the representativeness of the normal group.
Third, the sample included both PA and AP radiographs, but the proportion of each projection type was not prospectively recorded. Retrospective stratification was not feasible, as qXR does not provide functionality to filter or segment outputs by acquisition projection. This is a relevant constraint, as AP acquisitions, particularly portable films, tend to magnify cardiac and mediastinal structures due to source-image geometry, potentially contributing to cardiomegaly misclassification [20].
Fourth, no confirmatory imaging was used to establish diagnostic ground truth. CT confirmation or histopathological verification was not obtained for suspected malignant nodules, and no subspecialty thoracic radiology review was performed. The reference standard relied exclusively on independent interpretation by three general radiologists, which, while appropriate for a preliminary validation study, limits certainty regarding diagnostic truth.
Fifth, this study was not designed to evaluate clinical impact. Aspects such as triage utility, workflow integration, workload reduction, and performance in non-specialized settings were outside the scope of this retrospective accuracy analysis and cannot be inferred from these results.
Finally, the single-center design limits generalizability to other institutions, imaging environments, and levels of care.
Notwithstanding these limitations, this study provides valuable locally generated evidence on AI-assisted chest radiography performance in Costa Rica. The use of three radiologists as the reference standard and the systematic evaluation of multiple accuracy metrics strengthen the internal validity of the findings and support the case for future prospective, multicenter investigations.

4.2. Future Research

Future research should move beyond retrospective accuracy assessment toward prospective evaluation of AI-assisted chest radiography as a decision-support tool in routine clinical practice.
Implementation-oriented studies could examine how qXR outputs are integrated into radiology reporting systems, how radiologists interact with algorithm-generated results during interpretation, and whether AI-assisted workflows affect reading time, diagnostic confidence, or error rates.
Use-case-specific evaluations would also be valuable, particularly for case prioritization, quality assurance, and support in high-volume settings. Defining clear operational scenarios prior to evaluation would yield more actionable evidence than generalized accuracy metrics alone.
Longitudinal and multicenter studies are needed to assess consistency of AI performance over time and across institutions with varying imaging equipment, patient populations, and clinical workflows, conditions representative of the broader Latin American healthcare landscape.
Finally, future work should address human–AI interaction and governance, including algorithm transparency, user trust calibration, and alignment with local regulatory and ethical frameworks. These dimensions are especially relevant in middle-income health systems where AI adoption is emerging, and evidence beyond technical validation remains scarce.

5. Conclusions

qXR demonstrated moderate diagnostic agreement with radiologist assessments for pulmonary nodules (kappa = 0.51; AUC = 0.80) and pleural effusion (kappa = 0.63; AUC = 0.86), and the weakest overall performance for cardiomegaly (sensitivity = 0.64; kappa = 0.57; AUC = 0.77) under local imaging conditions at Hospital Clínica Bíblica. These results reflect technical concordance between the AI system and individual radiologists and do not constitute evidence of clinical utility or real-world impact.
This study contributes locally generated validation evidence for AI-assisted chest radiography in a Latin American clinical context where such data remain limited. The findings support the continued investigation of qXR as a complementary decision-support tool, while underscoring that context-specific validation is essential prior to the broader implementation of AI systems in radiological practice.

Author Contributions

Conceptualization, E.Z.-M., S.A.-C. and A.A.-A.; methodology, D.Q.-L., A.A.-A., G.H.-V., S.A.-C. and E.Z.-M.; software, S.A.-C.; validation, A.A.-A., G.H.-V., S.S.-S. and E.Z.-M.; formal analysis, D.Q.-L.; investigation, L.A.A.-C., S.D.S.-A., K.C.W.H. and A.A.-A.; resources, E.Z.-M. and S.A.-C.; data curation, D.Q.-L., A.A.-A., G.H.-V. and S.S.-S.; writing—original draft preparation, A.A.-A. and D.Q.-L.; writing—review and editing, E.Z.-M., M.F.M.-A., J.S.-S. and A.A.-A.; visualization, D.Q.-L., A.A.-A. and S.A.-C.; supervision, E.Z.-M.; project administration, A.A.-A.; funding acquisition, E.Z.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by AstraZeneca CAMCAR. No specific grant number was assigned.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Hospital Clínica Bíblica (protocol code CEC-HCB-E017-2024; date of approval 8 August 2025).

Informed Consent Statement

Patient consent was waived due to the retrospective nature of the study and the impracticality of contacting participants, including deceased patients or those with unavailable contact information. All data were anonymized to ensure confidentiality. The study was conducted in full compliance with national ethical regulations.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank Yazlin Alvarado-Rodríguez, Rebeca Agüero-Cedeño, Daniel Díaz-Juan, Briansy Angulo-Gaucherand and Daniel Nieto-Bernal for their support and assistance during the implementation of this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
PA: Posteroanterior
AP: Anteroposterior
PPV: Positive Predictive Value
NPV: Negative Predictive Value
FDA: U.S. Food and Drug Administration
CNNs: Deep Convolutional Neural Networks
CI: Confidence Interval
CTR: Cardiothoracic Ratio

Figure 1. Flow diagram of chest radiograph selection, inclusion and exclusion criteria, ground truth, and AI interpretation. Abbreviations: PA, Posteroanterior; AP, Anteroposterior; AI, Artificial Intelligence.
Figure 2. Receiver operating characteristic (ROC) curves for the detection of pulmonary nodules (AUC = 0.80), pleural effusion (AUC = 0.86), and cardiomegaly (AUC = 0.77). The dashed diagonal line represents the reference line of no discrimination.
Table 1. Distribution of True and False Results by Radiological Sign.
Radiological Sign | True Positives | True Negatives | False Positives | False Negatives
Pulmonary nodule | 22 | 197 | 23 | 9
Cardiomegaly | 54 | 127 | 13 | 31
Pleural effusion | 42 | 151 | 25 | 7
Table 2. Agreement Statistics Between qXR and Radiologists by Radiological Sign.
Radiological Sign | Sensitivity | Specificity | PPV | NPV | Cohen’s Kappa
Pulmonary nodule | 0.71 | 0.90 | 0.49 | 0.96 | 0.51
Cardiomegaly | 0.64 | 0.91 | 0.81 | 0.80 | 0.57
Pleural effusion | 0.86 | 0.86 | 0.63 | 0.96 | 0.63
Abbreviations: PPV, positive predictive value; NPV, negative predictive value.
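The agreement statistics in Table 2 follow directly from the confusion-matrix counts in Table 1. The following minimal Python sketch (not part of the study's analysis pipeline; the function name is illustrative) reproduces the reported values from those counts, including Cohen's kappa as the chance-corrected proportion of agreement:

```python
def agreement_stats(tp, tn, fp, fn):
    """Compute agreement metrics from confusion-matrix counts
    (AI output vs. radiologist reference standard)."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "kappa": kappa}

# Counts taken from Table 1 (TP, TN, FP, FN)
for sign, counts in {
    "Pulmonary nodule": (22, 197, 23, 9),
    "Cardiomegaly": (54, 127, 13, 31),
    "Pleural effusion": (42, 151, 25, 7),
}.items():
    stats = agreement_stats(*counts)
    print(sign, {k: round(v, 2) for k, v in stats.items()})
```

Rounded to two decimals, the printed values match Table 2 for all three radiological signs.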
