Diagnostic Performance of Artificial Intelligence-Based Computer-Aided Detection and Diagnosis in Pediatric Radiology: A Systematic Review

Artificial intelligence (AI)-based computer-aided detection and diagnosis (CAD) is an important research area in radiology. However, only two narrative reviews, one about general uses of AI in pediatric radiology and one about AI-based CAD in pediatric chest imaging, have been published to date. The purpose of this systematic review is to investigate AI-based CAD applications in pediatric radiology, their diagnostic performances, and the methods used for their performance evaluation. A literature search of electronic databases was conducted on 11 January 2023. Twenty-three articles that met the selection criteria were included. This review shows that AI-based CAD could be applied in pediatric brain, respiratory, musculoskeletal, urologic and cardiac imaging, especially for pneumonia detection. Most of the studies (93.3%, 14/15; 77.8%, 14/18; 73.3%, 11/15; 80.0%, 8/10; 66.7%, 2/3; 84.2%, 16/19; 80.0%, 8/10) reported model performances of at least 0.83 (area under the receiver operating characteristic curve), 0.84 (sensitivity), 0.80 (specificity), 0.89 (positive predictive value), 0.63 (negative predictive value), 0.87 (accuracy), and 0.82 (F1 score), respectively. However, a range of methodological weaknesses (especially a lack of model external validation) were found in the included studies. In the future, more AI-based CAD studies in pediatric radiology with robust methodology should be conducted to convince clinical centers to adopt CAD and to realize its benefits in a wider context.


Introduction
Artificial intelligence (AI) is an active research area in radiology [1][2][3][4]. The investigation of AI for computer-aided detection and diagnosis (CAD) in radiology, however, started as early as 1955. All CAD systems are AI applications and can be subdivided into two types: computer-aided detection (CADe) and computer-aided diagnosis (CADx) [5][6][7]. The former focuses on the automatic detection of anomalies (e.g., tumors) on medical images, while the latter automatically characterizes anomaly types, such as benign versus malignant [7]. Since the 1980s, more researchers have become interested in CAD system development due to the availability of digital medical imaging and powerful computers. The first CAD system approved by the United States Food and Drug Administration became commercially available in 1998 for breast cancer detection [6].
Early AI-based CAD systems in radiology were entirely rule-based, and their algorithms could not improve automatically. In contrast, machine learning (ML)-based and deep learning (DL)-based CAD systems can automatically improve their performance through training, and hence, they have become dominant. DL is a subset of ML whose models have more layers than traditional ML models. DL algorithms are capable of modeling high-level abstractions in medical images without predetermined inputs [5,8,9].
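The contrast between a fixed rule-based system and a learning-based one can be illustrated with a deliberately simplified sketch. Everything below (the one-dimensional feature, the threshold values, and the simulated data) is an illustrative assumption, not taken from any CAD system in this review:

```python
import numpy as np

# Simulated 1-D image feature (e.g., a lesion intensity score) per case;
# purely illustrative, not real clinical data.
rng = np.random.default_rng(42)
normal = rng.normal(0.3, 0.1, 200)    # feature values for normal cases
abnormal = rng.normal(0.7, 0.1, 200)  # feature values for abnormal cases
x = np.concatenate([normal, abnormal])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = normal, 1 = abnormal

# Rule-based CAD: a hand-set threshold fixed by its designers; it cannot
# improve no matter how much labeled data accumulates.
rule_pred = (x > 0.4).astype(int)

# ML-style CAD: the decision boundary is *learned* from labeled training
# data (here, simply the midpoint of the two class means) and would shift
# automatically if the training data changed.
learned_threshold = (normal.mean() + abnormal.mean()) / 2
ml_pred = (x > learned_threshold).astype(int)

rule_acc = (rule_pred == y).mean()
ml_acc = (ml_pred == y).mean()
print(f"rule-based accuracy: {rule_acc:.3f}, learned accuracy: {ml_acc:.3f}")
```

A real DL-based CAD system would learn millions of CNN parameters rather than a single threshold, but the principle is the same: performance improves automatically from training data instead of being fixed by hand-written rules.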

Literature Search
A literature search of electronic scholarly publication databases, including EBSCOhost/Cumulative Index of Nursing and Allied Health Literature Ultimate, Ovid/Embase, PubMed/Medline, ScienceDirect, Scopus, SpringerLink, Web of Science, and Wiley Online Library, was conducted on 11 January 2023 to identify articles investigating the diagnostic performance of AI-based CAD in pediatric radiology, with no publication year restriction [12,19,20]. The search statement used was ("Artificial Intelligence" OR "Machine Learning" OR "Deep Learning") AND ("Computer-Aided Diagnosis" OR "Computer-Aided Detection") AND ("Pediatric" OR "Children") AND ("Radiology" OR "Medical Imaging"). The keywords used in the search were based on the review focus and on systematic reviews of the diagnostic performance of AI-based CAD in radiology [19][20][21][22][23].

Article Selection
A reviewer with more than 20 years of experience in conducting literature reviews was involved in the article selection process [14,24]. Only peer-reviewed original research articles that were written in English and focused on AI-based CAD in pediatric radiology with diagnostic accuracy measures were included. Gray literature, conference proceedings, editorials, reviews, perspectives, opinion pieces, commentaries, and non-peer-reviewed articles (e.g., those published via the arXiv research-sharing platform) were excluded because this systematic review focused on the diagnostic performance of AI-based CAD in pediatric radiology and on the appraisal of the associated methodology reported in refereed original articles. Papers mainly about image segmentation or clinical prediction rather than disease identification or classification were also excluded [12]. Figure 1 illustrates the details of the article selection process. A three-stage screening process, assessing (1) article titles, (2) abstracts, and (3) full texts against the selection criteria, was employed after duplicate article removal from the database search results. Every non-duplicate article within the search results was retained until its exclusion could be decided [14,25,26].
Figure 1. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram for the systematic review of the diagnostic performance of artificial intelligence-based computer-aided detection and diagnosis in pediatric radiology. CINAHL, Cumulative Index of Nursing and Allied Health Literature.

Data Extraction and Synthesis
Two data extraction forms (Tables 1 and 2) were developed based on a recent systematic review on the diagnostic performance of AI-based CAD in radiology [12]. The following data were extracted from each included paper: author name and country; publication year; imaging modality; diagnosis; diagnostic performance of the AI-based CAD system (area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy and F1 score); AI type (such as ML and DL) and model (e.g., support vector machine and convolutional neural network (CNN)) used to develop the CAD system; study design (either prospective or retrospective); source (such as the public dataset of the Guangzhou Women and Children's Medical Center, China) and size (e.g., 5858 images) of the dataset used to test the CAD system; patient population (such as 1-5-year-old children); any sample size calculation; model internal validation type (e.g., 10-fold cross-validation); any model external validation (i.e., any model testing with a dataset not involved in internal validation and acquired from a different setting); the reference standard for ground truth establishment (such as histology and expert consensus); any model performance comparison with clinicians; and model commercial availability. When diagnostic performance findings were reported for multiple AI-based CAD models in a study, only the values of the best-performing model were presented [27]. Meta-analysis was not conducted because this systematic review covered a range of imaging modalities and pathologies, and hence, high study heterogeneity was expected, affecting its usefulness [12,13,28]. The Revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to assess the quality of all included studies [9,12,13,19,23,27,29].
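The seven diagnostic performance measures extracted above can all be derived from a binary confusion matrix plus the model's continuous scores. The following self-contained sketch (the function name `cad_metrics` is our own, chosen for illustration) shows one standard way to compute them, with AUC obtained via the Mann-Whitney rank formulation:

```python
import numpy as np

def cad_metrics(y_true, y_pred, scores):
    """Compute the seven evaluation measures extracted in this review from
    binary ground-truth labels, binary predictions, and model scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.asarray(scores, dtype=float)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # AUC via the rank-sum (Mann-Whitney U) formulation: the probability
    # that a randomly chosen positive case scores higher than a randomly
    # chosen negative case, counting ties as half.
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))

    return {"AUC": auc, "sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "accuracy": accuracy, "F1": f1}
```

This exhaustive pairwise AUC is O(P x N) and is only suitable for small test sets; libraries such as scikit-learn (`roc_auc_score`) implement the same quantity more efficiently.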

Discussion
This article is the first systematic review on the diagnostic performance of AI-based CAD in pediatric radiology covering brain [30][31][32][33][34][35][36][37][38], respiratory [42][43][44][45][46][47][48][49][50], musculoskeletal [40,41], urologic [51,52] and cardiac imaging [39]. Hence, it advances the previous two narrative reviews about various uses of AI in pediatric radiology [17] and AI-based CAD in pediatric chest imaging [16], published in 2021 and 2022, respectively. Most of the included studies reported AI-based CAD model performances of at least 0.83 (AUC), 0.84 (sensitivity), 0.80 (specificity), 0.89 (PPV), 0.63 (NPV), 0.87 (accuracy), and 0.82 (F1 score). However, the diagnostic performances of these CAD systems appeared slightly lower than those reported in the systematic review of AI-based CAD in radiology (pooled sensitivity and specificity: 0.87 and 0.93, respectively) [10]. In addition, pediatric pneumonia was the only disease investigated by more than two studies [43,[45][46][47][48][49][50]. Although these studies reported that their CAD performances for pneumonia diagnosis were at least 0.850 (AUC), 0.760 (sensitivity), 0.800 (specificity), 0.891 (PPV), 0.905 (accuracy) and 0.903 (F1 score), which would be sufficient to support less experienced pediatric radiologists in image interpretation, all but one were retrospective studies, and all relied on the chest X-ray dataset consisting of 1741 normal and 4346 pneumonia images of 6087 1-5-year-old children collected from the Guangzhou Women and Children's Medical Center, China [13,[43][44][45][46][47][48][49][50]. It is noted that the use of a public dataset could facilitate AI-based CAD model performance comparison across similar studies [43].
On the other hand, this approach could limit model generalizability (i.e., the ability to maintain performance when applied to different settings), causing the model to be unfit for real clinical situations [10,46]. Although techniques such as cross-validation can be used to improve AI-based CAD model generalization ability [37], only one of these studies used a cross-validation approach [50], while half of them did not report the internal validation type [43,45,49]. In addition, some ground truths given in the public datasets might be inaccurate, indicating potential reference standard issues [10,42]. These studies did not calculate the required sample size, perform external validation, or compare their model performances with radiologists, although these steps are essential for demonstrating the trustworthiness of study findings [43,45,46,[48][49][50]. As per Table 2, the aforementioned methodological issues were also common in the other included studies. These issues are found in many studies of AI-based CAD in radiology as well [10,12,13]. Table 2 reveals that DL and its model, the CNN, were commonly used for the development of AI-based CAD systems in pediatric radiology, similar to the situation in radiology as a whole [13]. According to the recent narrative review of AI-based CAD in pediatric chest imaging published in 2022, 144 Conformité Européenne-marked AI-based CAD systems for brain (35%), respiratory (27%), musculoskeletal (11%), breast (11%), other (7%), abdominal (6%) and cardiac (4%) imaging were commercially available in radiology [16]. The proportions of these systems are comparable to the findings of this systematic review that brain, respiratory and musculoskeletal imaging were the three most popular application areas of AI-based CAD in pediatric radiology and cardiac imaging was the least popular (Table 1).
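For reference, the 10-fold cross-validation mentioned above as an internal validation type can be sketched as follows (`kfold_indices` is an illustrative helper written for this review, not code from any included study). Note that cross-validation only reuses the same dataset, so it cannot substitute for external validation on data from a different setting:

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index pairs for k-fold cross-validation:
    the shuffled cases are split into k folds, and each case appears in the
    held-out test fold exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Each fold's metrics (e.g., AUC) would then be averaged over the k runs.
splits = list(kfold_indices(100, k=10))
```

Because every case still comes from the same institution and acquisition protocol, a model can score well under this scheme yet fail on data from another hospital, which is exactly the generalizability gap discussed above.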
However, except for Helm et al.'s retrospective study on the detection of pediatric pulmonary nodules in 29 3-18-year-old patients with the use of an AI-based CAD system developed for adults [44], no commercial system was involved in the included studies. Helm et al.'s study [44] was the only one that performed external validation of the CAD system, with the reference standard established by the consensus of six radiologists, and one of the few that compared CAD performance with clinicians. However, that study only used four evaluation measures: sensitivity (0.42), specificity (1.00), PPV (1.00) and NPV (0.26); the other metrics commonly used in more clinically focused studies, AUC and accuracy, were not reported [10,12,44,53]. This highlights that even in a more clinically focused AI-based CAD study in pediatric radiology with a better design, common methodological weaknesses, such as retrospective data collection with limited information reported on patient characteristics and cases included, and no sample size calculation, were still present (Table 2) [44,54,55]. Hence, these findings explain the results in Figure 2: a concern regarding applicability was found in patient selection, and a risk of bias was noted in both the patient selection and reference standard categories, although similar results were also reported in the systematic reviews of AI-based CAD in radiology [10,12].
Apparently, AI-based CAD in pediatric radiology is less developed than its adult counterpart. For example, not many studies were published before 2020 [33,35,36,44,52], and the studies mainly focused on MRI and X-ray and on particular patient cohorts (Table 2). Although Schalekamp et al.'s [16] narrative review published in 2022 suggested the use of AI-based CAD designed for the adult population in children, Helm et al.'s [44] study demonstrated that this approach yielded low sensitivity (0.42) and NPV (0.26) in detecting pediatric pulmonary nodules because of the smaller nodule sizes in children. Hence, AI-based CAD systems specifically designed/fine-tuned for pediatric radiology by researchers and/or commercial companies seem necessary in the future. In addition, for further research, more robust study designs that can address the aforementioned methodological issues (especially the lack of external validation) are essential to provide trustworthy findings that convince clinical centers to adopt AI-based CAD in pediatric radiology. In this way, the potential benefits of CAD could be realized in a wider context [5,10,12,13].
This systematic review has two major limitations. First, the article selection, data extraction, and synthesis were performed by a single author, albeit one with more than 20 years of experience in conducting literature reviews [14]. According to a recent methodological systematic review, this is an appropriate arrangement provided that the single reviewer is experienced [14,24,77,78,79]. Additionally, through adherence to the PRISMA guidelines and the use of the data extraction forms (Tables 1 and 2) devised based on the recent systematic review on the diagnostic performance of AI-based CAD in radiology and the QUADAS-2 tool, the potential bias should be addressed to a certain extent [12,14,26,29]. Second, only articles in English identified via databases were included, potentially affecting the comprehensiveness of this systematic review [9,21,26,27,80]. Nevertheless, this review still has wider coverage of AI-based CAD in pediatric radiology than the previous two narrative reviews [16,17].

Conclusions
This systematic review shows that AI-based CAD for pediatric radiology could be applied in brain, respiratory, musculoskeletal, urologic and cardiac imaging. Most of the studies (93.3%, 14/15; 77.8%, 14/18; 73.3%, 11/15; 80.0%, 8/10; 66.7%, 2/3; 84.2%, 16/19; 80.0%, 8/10) reported AI-based CAD model performances of at least 0.83 (AUC), 0.84 (sensitivity), 0.80 (specificity), 0.89 (PPV), 0.63 (NPV), 0.87 (accuracy), and 0.82 (F1 score), respectively. Pediatric pneumonia was the most common pathology covered in the included studies, which reported CAD performances for pneumonia diagnosis of at least 0.850 (AUC), 0.760 (sensitivity), 0.800 (specificity), 0.891 (PPV), 0.905 (accuracy) and 0.903 (F1 score). Although these diagnostic performances appear sufficient to support less experienced pediatric radiologists in image interpretation, a range of methodological weaknesses, such as retrospective data collection, no sample size calculation, overreliance on a public dataset, small test set sizes, limited patient cohort coverage, limited use of diagnostic accuracy measures and cross-validation, lack of model external validation and of model performance comparison with clinicians, and risk of bias in the reference standard, were found in the included studies. Hence, their AI-based CAD systems might be unfit for real clinical situations due to a lack of generalization ability. In the future, more AI-based CAD systems specifically designed/fine-tuned for a wider range of imaging modalities and pathologies in pediatric radiology should be developed. In addition, more robust study designs should be used in further research to address the aforementioned methodological issues and to provide trustworthy findings that convince clinical centers to adopt AI-based CAD in pediatric radiology. In this way, the potential benefits of CAD could be realized in a wider context.