Evaluating the Checklist for Artificial Intelligence in Medical Imaging (CLAIM)-Based Quality of Reports Using Convolutional Neural Network for Odontogenic Cyst and Tumor Detection

This review aimed to determine whether studies employing a convolutional neural network (CNN) for odontogenic cyst and tumor detection follow the methodological reporting recommendations of the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). We retrieved CNN studies using panoramic and cone-beam computed tomographic images, published from inception to April 2021, in PubMed, EMBASE, Scopus, and Web of Science. The included studies were assessed according to the CLAIM. Of the 55 studies retrieved, 6 CNN studies for odontogenic cyst and tumor detection were included. Judged against the CLAIM items, the abstract, methods, results, and discussion sections of the included studies were insufficiently described. The problem areas included item 2 in the abstract; items 6–9, 11–18, 20, 21, 23, 24, and 26–31 in the methods; items 33, 34, 36, and 37 in the results; item 38 in the discussion; and items 40 and 41 in "other information." The CNN reports for odontogenic cyst and tumor detection were evaluated as low quality. Inadequate reporting reduces the robustness, comparability, and generalizability of a CNN study for dental radiograph diagnostics. The CLAIM is a sound guideline for study design and can improve the reporting quality of artificial intelligence studies in the dental field.


Introduction
Advances in digital dentistry, along with rapid developments in diagnostic artificial intelligence (AI), have the potential to improve diagnostic accuracy. In addition, AI-based applications can assist dentists in making timely interventions and can increase their working performance. Applications of AI in dentistry include the detection, segmentation, and classification of anatomy (tooth, root morphology, and mandible) and pathology (caries, periodontal inflammation, and osteoporosis) [1,2].
In the last decade, deep-learning methods such as the convolutional neural network (CNN) have been demonstrated to achieve remarkable results on panoramic and cone-beam computed tomographic (CBCT) images [3][4][5]. Consequently, an increasing number of studies employ the CNN framework. Indeed, most studies on the automated detection of odontogenic cysts and tumors are based on this framework and have achieved high performance [6][7][8]. However, AI still faces challenges in terms of robustness, comparability, and generalizability in medical imaging.
In the medical field, several checklists are applied to report the evaluation of machine learning models [9][10][11][12]; these include the Standards for Reporting of Diagnostic Accuracy Studies (STARD) [13][14][15][16], Consolidated Standards of Reporting Trials (CONSORT)-AI extension [17], and the AI checklist in dental research [18]. Recently, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) has been developed based on the consensus of radiological experts and is viewed as the best guideline for presenting research [19]. To the best of our knowledge, no previous systematic review has evaluated the methodological quality among studies on AI in dentistry. Therefore, this systematic review was methodologically performed on available studies using CNN for automated detection of odontogenic cysts and tumors to determine if the reports adequately adhered to the items of the CLAIM guideline.

Inclusion and Exclusion Criteria
All of the reports employing the CNN model to examine the performance of automated detection of odontogenic cysts and tumors on panoramic and CBCT images were eligible. We excluded methodological reviews, studies not employing CNN, studies unrelated to the topic, and studies not involving humans.

Electronic Search
A comprehensive literature search was conducted on electronic databases, including PubMed, EMBASE, Scopus, and Web of Science, from inception to 18 April 2021. The search strategy combined MeSH (Medical Subject Headings) terms and free-text words: ("deep learning" (MeSH Terms) OR deep learning (Text Word) OR convolution neural network (Text Word)) AND ("odontogenic tumors" (MeSH Terms) OR odontogenic tumor (Text Word) OR "odontogenic cysts" (MeSH Terms) OR odontogenic cysts (Text Word)). The detailed search strategy is provided in Supplementary Table S1. No language restriction was applied.
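As a minimal sketch of how such a Boolean strategy can be assembled programmatically (this is not the authors' code; the term lists are taken from the strategy above, and the field-tag syntax follows PubMed conventions):

```python
# Hypothetical sketch: assembling the Boolean search string described above.
# Field tags ([MeSH Terms], [Text Word]) follow PubMed query conventions.
deep_learning_terms = [
    '"deep learning"[MeSH Terms]',
    'deep learning[Text Word]',
    'convolution neural network[Text Word]',
]
lesion_terms = [
    '"odontogenic tumors"[MeSH Terms]',
    'odontogenic tumor[Text Word]',
    '"odontogenic cysts"[MeSH Terms]',
    'odontogenic cysts[Text Word]',
]

def build_query(*term_groups):
    # OR the terms within each group, then AND the groups together.
    return " AND ".join("(" + " OR ".join(g) + ")" for g in term_groups)

query = build_query(deep_learning_terms, lesion_terms)
print(query)
```

Composing the string from explicit term lists makes it easy to adapt the same concept blocks to the syntax of each database searched.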

Manual Searching
In addition to searching electronic databases, the list of bibliographic references of the included studies was screened to identify potentially relevant additional studies. Furthermore, we also searched opengrey.eu from inception to April 2021 for eligible studies in grey literature.

Study Selection
The title and abstract of each identified study were independently screened by two reviewers (V.N.T.L. and D.-W.L.) to discard duplicates and studies that did not satisfy the inclusion criteria. Subsequently, full-text articles were examined when the abstract provided insufficient information. A third reviewer (Y.-M.Y.) resolved any disagreements during this process. Full-text articles that satisfied the inclusion criteria were independently assessed by two reviewers (V.N.T.L. and D.-W.L.) with clinical knowledge of odontogenic cysts and tumors and methodological knowledge of AI research.

Data Extraction
Two reviewers (V.N.T.L. and D.-W.L.) independently extracted the data from each included article into predesigned data collection forms in Microsoft Word: (1) general characteristics (primary author, country, date of publication, journal name); (2) specific characteristics (study objectives, dataset, CNN model, comparative analysis, outcome metrics, and performance). Discrepancies were resolved by discussion with a third reviewer (Y.-M.Y.).

Reporting Epidemiological and Descriptive Characteristics
Among the included CNN studies, the following epidemiological and descriptive characteristics were assessed: journal category, location and occupation of the corresponding author, reporting guideline used, and funding source.

Reporting of Methodological Elements of the Included CNN Studies
This systematic review was performed based on the CLAIM guideline [19], which includes 42 items. According to this guideline, we examined whether methodological elements were reported in the included CNN studies.

Statistical Analysis
Categorical data are summarized as numbers (percentages). The information extracted for the CLAIM items across the included studies is summarized as absolute and relative frequencies.
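Such a descriptive summary amounts to counting category occurrences and dividing by the total. A brief illustration (the sample values below are hypothetical, not the review's extracted data):

```python
from collections import Counter

# Illustrative only: summarizing a categorical variable with absolute and
# relative frequencies. The data below are hypothetical sample values.
journal_category = ["dental", "dental", "biomedical", "dental",
                    "biomedical", "dental"]

counts = Counter(journal_category)                 # absolute frequencies
n = len(journal_category)
summary = {k: f"{v} ({v / n:.0%})" for k, v in counts.items()}
print(summary)  # e.g. {'dental': '4 (67%)', 'biomedical': '2 (33%)'}
```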

Study Selection
The search strategy yielded a total of 55 studies from electronic databases and manual searching. After removing duplicates, 49 studies remained, of which 26 were excluded after screening the titles and abstracts. The remaining 23 studies were assessed for eligibility by full-text review. At this stage, studies were excluded for the following reasons: methodological review (n = 4), unrelated to the topic (n = 12), and not involving human participants (n = 1). Finally, six reports were included in the systematic review [6][7][8][20][21][22] (Figure 1).
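The arithmetic of this selection flow can be checked directly from the counts reported above:

```python
# Sanity check of the study-selection flow using the counts reported above.
identified = 55                      # electronic databases + manual searching
after_dedup = 49                     # after duplicates removed
excluded_title_abstract = 26
full_text_assessed = after_dedup - excluded_title_abstract
excluded_full_text = {
    "methodological review": 4,
    "unrelated to the topic": 12,
    "no human participants": 1,
}
included = full_text_assessed - sum(excluded_full_text.values())
print(full_text_assessed, included)  # 23 6
```

The 23 full-text articles minus the 17 full-text exclusions leave exactly the six included reports, so the reported flow is internally consistent.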


Items and Subcategory — No. (%) of Reports
Journal category
  Biomedical engineering field: 2 (33%)
  Dental or medical field: 4 (67%)
Location of corresponding author

Reporting of CLAIM Items across the Included Studies
All 42 methodological items of the CLAIM were assessed across the included studies (Supplementary Table S2). In the abstract section, two studies (33%) did not present a structured summary of the study design, methods, results, and conclusions (item 2) [7,20].
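The per-item assessment behind such statements can be sketched as a simple compliance tally over a study-by-item matrix; the matrix below is illustrative only (see Supplementary Table S2 for the actual assessment):

```python
# Hypothetical sketch: tallying per-item CLAIM compliance across the six
# included studies. 1 = item reported, 0 = not reported. Values are
# illustrative, not the review's data.
N_STUDIES = 6
claim_matrix = {
    1: [1, 1, 1, 1, 1, 1],   # item 1: reported by all six studies
    2: [1, 0, 1, 1, 0, 1],   # item 2: e.g. two studies lacked it
}
for item, flags in sorted(claim_matrix.items()):
    reported = sum(flags)
    print(f"Item {item}: {reported}/{N_STUDIES} ({reported / N_STUDIES:.0%})")
```

Extending the matrix to all 42 items reproduces the absolute and relative frequencies reported in the results.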

Discussion
In our systematic review, the included CNN studies focused primarily on improving model performance for automated odontogenic cyst and tumor detection. Such detection can help reduce morbidity and mortality through long-term follow-up and early intervention. However, the application of AI must remain grounded in the fundamental tenets of science and scientific publication, which are evident in study design and reporting.
To examine the level of compliance with design and reporting standards, we evaluated six reports employing CNN for odontogenic cyst and tumor detection based on the CLAIM. Recently, CLAIM was used as evaluation guidance in the design and reporting of CNN studies for brain metastasis detection [23], knee imaging [24], and radiological cancer diagnosis [25] in the medical field. In our study, none of the CNN reports followed any previous reporting guidelines. After evaluation, the methodological reporting recommendations of the CLAIM guideline were missing in most CNN studies. Among the included CNN studies, we found a lack of adherence to the standards of CLAIM in the abstract, methods (study design, data, ground truth, data partitions, model, training, and evaluation), results, discussion, and supplementary sections. These findings indicate that the robustness, comparability, and generalizability of the CNN studies for the automated detection of odontogenic cysts and tumors are not guaranteed. Consequently, the reporting quality related to the AI application for medical imaging must be improved for a clear, transparent, and reproducible CNN study.
Among the included studies, the high heterogeneity of study designs can compromise the robustness, comparability, and generalizability of CNN studies for automated odontogenic cyst and tumor detection. Regarding sample characteristics, the location and category of lesions were inconsistent across the datasets. Furthermore, the private datasets differed in size. A benchmark dataset is therefore needed to resolve these issues. Beyond sampling, comparators should be consistent across studies to reduce dataset bias. In particular, outcome measures should be standardized to improve the comparability of model performance. Such issues commonly arise in novel fields: deep learning is an emerging approach, and the included studies were all published within the last three years.
Previous studies have shown that the quality of "AI for health" studies remains low and that reporting is often insufficient to fully comprehend, let alone replicate, these studies [26][27][28]. In dental and oral sciences, the emergence of reporting standards is necessary given the increasing number of recent CNN studies [1]. The CNN is a type of deep-learning algorithm used across many branches of computer vision dealing with medical image analysis and represents a future computer-aided technology for medical and dental experts [29]. To improve future CNN studies, authors should examine their assumptions in greater detail and report valid and adequate items following the CLAIM guideline. Although relatively new, the CLAIM is the best available guideline for presenting medical imaging research and should be applied widely to improve the reporting of AI research.

Strengths and Limitations
Our review is the first to investigate the reporting quality of CNN studies for the automated detection of odontogenic cysts and tumors. However, it has some limitations. From the reader's perspective, we evaluated only CNN reports for the automated detection of odontogenic cysts and tumors. Moreover, AI researchers may have omitted or removed important details during publication despite using appropriate methods. Further studies should compare our results with those obtained using the CONSORT-AI extension and the STARD checklist to ascertain their reliability. In addition, we did not investigate other dental topics because we intended only to evaluate the quality of CNN reports for odontogenic cyst and tumor detection. Nevertheless, we recommend the CLAIM as the best framework to help AI researchers report on any topic in dentistry.

Conclusions
This review revealed that the CLAIM-based quality of CNN reports for odontogenic cyst and tumor detection is low. Performing a CNN study with insufficient reporting raises the likelihood of producing invalid results. The CLAIM therefore serves as a sound guideline during study design to help authors write AI manuscripts in dentistry.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/app11209688/s1. Table S1: Detailed search strategies for each database. MeSH terms, search terms, and combinations of the two were used for each database search. Table S2: Evaluating the CLAIM-based quality of CNN reports for odontogenic cyst and tumor detection.