Deep Learning Methods in Medical Image-Based Hepatocellular Carcinoma Diagnosis: A Systematic Review and Meta-Analysis

Simple Summary: Based on a systematic search that identified 1356 records on the diagnostic performance of deep learning (DL) methods applied to medical images for hepatocellular carcinoma (HCC), we found a pooled sensitivity of 89% (95% CI: 87–91), a pooled specificity of 90% (95% CI: 87–92), and an area under the curve (AUC) of 0.95 (95% CI: 0.93–0.97). DL methods and human clinicians performed similarly in HCC detection, each with an AUC of 0.97 (95% CI: 0.95–0.98). Although heterogeneity was substantial, DL methods for diagnosing HCC from medical images have shown promising results.
Abstract: (1) Background: We aimed to systematically review studies of the diagnostic performance of DL methods for HCC based on medical images. (2) Materials: We searched Embase, IEEE, PubMed, Web of Science, and the Cochrane Library for studies published before 3 July 2023. Studies that developed or applied DL methods to diagnose HCC from medical images were eligible. Binary diagnostic accuracy data were extracted to determine the outcomes of interest: sensitivity, specificity, and AUC. (3) Results: Of the 48 eligible studies identified, 30 were included in the meta-analysis. The pooled sensitivity was 89% (95% CI: 87–91), the specificity was 90% (95% CI: 87–92), and the AUC was 0.95 (95% CI: 0.93–0.97). Subgroup analyses by image method (contrast-enhanced vs. non-contrast-enhanced images), imaging modality (ultrasound, magnetic resonance imaging, and computed tomography), and comparison with clinicians consistently showed acceptable diagnostic performance of DL models. Publication bias and high heterogeneity across studies and subgroups may have led to overestimation of the diagnostic accuracy of DL methods in medical imaging. (4) Conclusions: Future studies would benefit from more rigorous reporting standards that address the specific challenges of DL research in this field.


Introduction
Primary liver cancer, of which HCC is the most common form, is the sixth most commonly diagnosed cancer worldwide and the third leading cause of cancer-related death [1]. Early-stage HCC often causes no noticeable symptoms, so diagnosis is frequently delayed until the cancer has progressed. When symptoms such as fatigue, weight loss, or abdominal discomfort do occur, they are nonspecific and resemble those of other liver diseases, such as cirrhosis and hepatitis, which makes prompt differentiation and diagnosis of HCC difficult [2,3]. HCC is also characterized by tumor heterogeneity, which can reduce the accuracy of tissue sampling and biopsy and further complicate diagnostic confirmation [4]. Accurate and reliable technologies are therefore crucial for the early detection of HCC. Medical imaging is essential in clinical practice for diagnosis, staging, and treatment planning. Modalities such as ultrasound (US), magnetic resonance imaging (MRI), and computed tomography (CT) are noninvasive and provide valuable images of the tumor, reducing patient discomfort and risk compared with invasive procedures such as biopsy [5–7]. These techniques provide detailed anatomical and pathological information that helps determine tumor characteristics such as size, location, and malignancy. However, the interpretation of medical images still relies on the subjective judgment and experience of healthcare professionals, and diagnostic results can vary among doctors [8,9]. Given this variation in expertise, achieving accurate and timely diagnoses from medical images remains challenging.
DL is a family of machine learning techniques that encompasses multiple model architectures and can address many types of machine learning problems. Common DL methods are based on convolutional neural networks, recurrent neural networks, long short-term memory networks, and generative adversarial networks, among others. DL methods have shown promising results in the automated analysis of medical images, enabling automatic diagnosis and classification of diseases by identifying relevant features and lesions [10,11]. Compared with manual analysis, DL methods offer faster processing and greater efficiency, reducing the burden on doctors. Developing a DL method typically involves several steps: building the model, collecting and preprocessing data, setting model parameters, training the model, and evaluating and tuning it. The dataset is usually divided into training, validation, and test sets, and the data can be augmented, cropped, or otherwise processed. DL methods use multi-level feature extraction networks to learn complex visual patterns and features from large volumes of medical image data, which can improve diagnostic accuracy. By capturing subtle but diagnostically significant features, DL methods can outperform traditional approaches in assessing pathological changes [12,13]. In addition, DL models can analyze large-scale medical image data to reveal hidden patterns and correlations, improving the understanding of disease mechanisms, variation, progression, and prognosis [14,15]. However, comprehensive evidence on the use of DL-based methods for HCC detection is currently lacking. Accordingly, this study aimed to provide a systematic review and meta-analysis of published data to evaluate the diagnostic performance of DL methods based on medical images in detecting HCC.
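The division into training, validation, and test sets described above can be sketched as follows. This is a minimal illustration, not the procedure of any included study: the function name `split_by_patient`, the 70/15/15 fractions, and the choice to split by patient rather than by image are all assumptions. Splitting by patient keeps every image of one patient in a single subset, which avoids optimistic test results caused by near-duplicate images leaking from training into testing.

```python
import random


def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split unique patient IDs into training, validation, and test sets.

    Hypothetical helper: the 70/15/15 fractions are illustrative only.
    All images from one patient end up in exactly one subset.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(set(patient_ids))          # de-duplicate, deterministic order
    random.Random(seed).shuffle(ids)        # reproducible shuffle
    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]            # remainder goes to the test set
    return train, val, test
```

For example, with 20 patients contributing three images each, the call returns 14, 3, and 3 unique patients with no overlap between subsets.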

Protocol Registration and Study Design
We registered our study protocol in PROSPERO (CRD42023442527). The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure comprehensive and transparent reporting [16] and the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) guidelines [17]. Informed consent was not required because all data were obtained from previously published studies in public databases.

Search Strategy and Eligibility Criteria
A systematic search was conducted in Embase, IEEE, PubMed, Web of Science, and the Cochrane Library to identify studies published from database inception to 3 July 2023 that developed DL methods for diagnosing HCC from medical images. A team of clinicians and investigators collaboratively developed the search strategy for each database; Supplementary Note SI summarizes the search terms and strategy used. No restrictions were imposed on region or language, and no publication types were excluded other than conference abstracts, scientific reports, letters, and narrative reviews.
Two investigators assessed eligibility by screening titles, abstracts, and relevant citations, and discrepancies were resolved through discussion with a third contributor. Studies were eligible if they reported the diagnostic performance of DL models for early HCC detection using medical images and provided diagnostic results, such as sensitivity and specificity, or enough detail to construct 2 × 2 contingency tables. No restrictions were placed on participant characteristics, imaging modality, or the intended setting of the DL models.

Data Extraction
Two investigators independently extracted study characteristics and diagnostic yield data using a standardized data extraction sheet, and uncertainties were resolved through discussion with a third researcher. The diagnostic accuracy data were organized into 2 × 2 contingency tables of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs).
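As a sketch of how each extracted 2 × 2 table yields the outcome measures, the hypothetical `Table2x2` class below computes sensitivity, specificity, and accuracy, and the hypothetical `from_rates` helper rebuilds approximate counts when a study reports only sensitivity, specificity, and the numbers of HCC and non-HCC cases. The rounding rule in `from_rates` is an assumption; the extraction sheet used in this review is not described at that level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """One extracted 2 x 2 contingency table (hypothetical helper)."""
    tp: int  # HCC correctly classified as HCC
    fp: int  # non-HCC incorrectly classified as HCC
    tn: int  # non-HCC correctly classified as non-HCC
    fn: int  # HCC missed by the index test

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def from_rates(sensitivity: float, specificity: float,
               n_hcc: int, n_non_hcc: int) -> Table2x2:
    """Back-calculate counts from reported rates, rounded to whole cases."""
    tp = round(sensitivity * n_hcc)
    tn = round(specificity * n_non_hcc)
    return Table2x2(tp=tp, fp=n_non_hcc - tn, tn=tn, fn=n_hcc - tp)
```

For instance, a study reporting 90% sensitivity and 88% specificity in 100 HCC and 100 non-HCC cases maps to TP = 90, FN = 10, TN = 88, and FP = 12.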

Study Quality Assessment
Three researchers used the QUADAS-2 tool to assess the risk of bias and concerns about the applicability of the included studies [18].

Statistical Analysis
Hierarchical summary receiver operating characteristic (SROC) curves were used to assess the diagnostic performance of the DL methods, presenting pooled estimates of sensitivity, specificity, and AUC with 95% CIs and prediction regions. Because many studies evaluated multiple DL methods, a separate meta-analysis was performed using the contingency table of the most accurate method from each study. Heterogeneity was evaluated with the I² statistic, and potential sources were explored through subgroup meta-analyses and meta-regression. A random-effects model was used to account for the substantial heterogeneity. Publication bias was assessed visually with funnel plots.
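The review pooled estimates with hierarchical SROC models in Stata. As a simpler stand-in that shows how a random-effects model and the I² statistic work, the sketch below pools logit-transformed proportions with the DerSimonian-Laird estimator. This univariate approach is a substitute chosen for illustration, not the hierarchical model used in the analysis, and it ignores the correlation between sensitivity and specificity; the function names and the 0.5 continuity correction are assumptions. Pass `(tp, fn)` pairs to pool sensitivity or `(tn, fp)` pairs to pool specificity.

```python
import math


def logit_and_var(events, total):
    """Logit of events/total and its approximate variance.

    A 0.5 continuity correction (an assumed convention) is applied only
    when the proportion is exactly 0 or 1, where the logit is undefined.
    """
    if events == 0 or events == total:
        events, total = events + 0.5, total + 1.0
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)


def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled, ci_low, ci_high, tau2, i2_percent) on the input scale.
    """
    w = [1 / v for v in variances]                        # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variation from heterogeneity
    w_re = [1 / (v + tau2) for v in variances]            # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, i2


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pooled_proportion(pairs):
    """Pool (events, non_events) pairs; returns (estimate, low, high, I2 %)."""
    logits, variances = zip(*(logit_and_var(e, e + n) for e, n in pairs))
    est, lo, hi, _tau2, i2 = dersimonian_laird(logits, variances)
    return inv_logit(est), inv_logit(lo), inv_logit(hi), i2
```

Three identical studies give I² = 0, whereas studies with sensitivities of 90%, 50%, and 99% give an I² well above 50%, illustrating how values such as those reported here (above 99%) signal that between-study variation dominates sampling error.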
While collating the data, we found that DL methods applied to contrast-enhanced images had higher diagnostic accuracy than those applied to non-contrast-enhanced images. The following subgroup analyses were therefore conducted: (a) By image method, the DL methods were divided into two categories: contrast-enhanced images, acquired with contrast media, and non-contrast-enhanced images, acquired without contrast media. (b) By imaging modality, the DL methods were categorized as CT, US, or MRI. (c) By validation type, the DL methods were classified as internally validated (using internal data for validation) or externally validated (using external data for validation). (d) The DL methods were compared with human clinicians using aggregated performance measures on the same datasets. A meta-analysis was conducted only if at least three original studies were available. All analyses were performed in Stata (version 17.0). All tests were two-sided, and p < 0.05 was considered statistically significant.
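The rule that a subgroup was meta-analyzed only when at least three original studies contributed to it can be expressed as a small filter. In this sketch, the record layout `(study_id, subgroup, table)` and the function name are hypothetical. The threshold counts distinct studies rather than contingency tables, because one study may contribute several tables.

```python
from collections import defaultdict

MIN_STUDIES = 3  # threshold stated in the Statistical Analysis section


def eligible_subgroups(records, min_studies=MIN_STUDIES):
    """Group contingency tables by subgroup and keep poolable subgroups.

    records: iterable of (study_id, subgroup, table) tuples, where table is
    any representation of a 2 x 2 table, e.g. (tp, fp, tn, fn).
    Returns {subgroup: [tables]} for subgroups with at least min_studies
    distinct studies; other subgroups are not pooled.
    """
    tables = defaultdict(list)
    studies = defaultdict(set)
    for study_id, subgroup, table in records:
        tables[subgroup].append(table)
        studies[subgroup].add(study_id)
    return {g: tables[g] for g in tables if len(studies[g]) >= min_studies}
```

A subgroup with four tables from only two studies is therefore excluded, while one with three tables from three studies is pooled.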

Study Selection and Characteristics
As shown in Figure 1, our initial search yielded 1356 records; after 316 duplicates were removed, 1040 records underwent screening. Title and abstract screening excluded 944 records, leaving 96 for full-text review. Ultimately, 48 articles were included in our review, of which 30 provided data for the meta-analysis. Among these studies, 45 used retrospective data, 1 used prospective data, and 2 used data from open-access sources. Five studies used out-of-sample datasets for external validation, and five compared the performance of the DL methods with that of clinicians on the same dataset. By imaging modality, the studies were classified as MRI (n = 10), US (n = 7), and CT (n = 13). Tables 1 and 2 summarize the characteristics of the included studies.

Overall Performance of the DL Methods
Of the 48 included studies, 30 provided sufficient data to construct contingency tables and calculate diagnostic yields and were therefore eligible for the meta-analysis. The meta-analysis included 102 contingency tables (Figure 2A), with a pooled sensitivity of 89% (95% CI: 87-91) and a pooled specificity of 90% (95% CI: 87-92) across all DL methods. The AUC was 0.95 (95% CI: 0.93-0.97).

Subgroup Meta-Analyses
Among the studies included in the analyses, 23 used contrast-enhanced images, contributing 65 contingency tables. For these studies, the pooled sensitivity was 92% (95% CI: 89-93), the pooled specificity was 94% (95% CI: 92-96), and the AUC was 0.97 (95% CI: 0.96-0.99) (Figure 3A). Six studies used non-contrast-enhanced images, contributing 30 contingency tables; their pooled sensitivity was 84% (95% CI: 81-86), their pooled specificity was 80% (95% CI: 77-82), and their AUC was 0.89 (95% CI: 0.85-0.91) (Figure 3B). Because most studies evaluated multiple DL methods, we also pooled only the most accurate method from each study, which yielded 30 contingency tables. This analysis gave a pooled sensitivity of 93% (95% CI: 91-95), a pooled specificity of 95% (95% CI: 92-97), and an AUC of 0.98 (95% CI: 0.96-0.99) (Figure 2B).
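Selecting the most accurate method from each study, as in the analysis above, can be sketched as follows. The text does not define "highest accuracy"; this sketch assumes overall accuracy, (TP + TN) / N, and breaks ties by keeping the first table encountered. Both the criterion and the function name `best_table_per_study` are assumptions.

```python
def best_table_per_study(records):
    """Keep, for each study, the contingency table with the highest accuracy.

    records: iterable of (study_id, (tp, fp, tn, fn)).
    Accuracy is assumed to mean (TP + TN) / N; ties keep the first table.
    Returns {study_id: (tp, fp, tn, fn)}.
    """
    best = {}
    for study_id, (tp, fp, tn, fn) in records:
        acc = (tp + tn) / (tp + fp + tn + fn)
        if study_id not in best or acc > best[study_id][0]:
            best[study_id] = (acc, (tp, fp, tn, fn))
    return {sid: table for sid, (_acc, table) in best.items()}
```

Because this selection keeps only each study's best result, pooled estimates based on it would be expected to exceed those from pooling all tables, which is consistent with the higher values reported for this analysis (93% vs. 89% sensitivity).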
The meta-analysis included 29 studies that used within-sample datasets for internal validation, comprising 92 contingency tables. For these studies, the pooled sensitivity and specificity were 89% (95% CI: 87-91) and 90% (95% CI: 88-92), respectively, and the AUC was 0.95 (95% CI: 0.93-0.97) (Figure 5A). External validation was performed in only 5 studies, contributing 10 contingency tables. The pooled sensitivity and specificity for these studies were 93% (95% CI: 89-96) and 83% (95% CI: 69-91), respectively, and the AUC was 0.95 (95% CI: 0.93-0.97) (Figure 5B).
Among the 30 included studies, 6 directly compared the diagnostic performance of the DL methods with that of human clinicians using the same dataset. These studies contributed 20 contingency tables for the DL methods and 10 for the human clinicians. The pooled sensitivity was 91% (95% CI: 88-93) for the DL methods and 88% (95% CI: 80-93) for the human clinicians, and the pooled specificity was 92% (95% CI: 89-95) and 95% (95% CI: 89-97), respectively. Both the DL methods and the human clinicians had an AUC of 0.97 (95% CI: 0.95-0.98) (Figure 6A,B).

Heterogeneity Analysis
Based on the random-effects model, the meta-analysis of 30 studies indicated that DL methods were beneficial for diagnosing HCC from medical images. However, heterogeneity was high, with I² values of 99.83% (p < 0.01) for sensitivity and 99.84% (p < 0.05) for specificity (Supplementary Figure S4). Subgroup analyses (Supplementary Figures S5-S7) and meta-regression (Supplementary Table S1) were used to explore potential sources of between-study heterogeneity. Image method and validation type, but not imaging modality, showed statistically significant differences, corroborating the subgroup findings. A funnel plot was generated to evaluate publication bias and indicated significant publication bias (p < 0.05) (Supplementary Figure S3).
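The text reports a p-value for funnel-plot asymmetry without naming the test. For diagnostic accuracy reviews, a common choice is Deeks' funnel plot asymmetry test, which regresses the log diagnostic odds ratio (DOR) on 1/√ESS, where ESS is the effective sample size, weighting each study by its ESS; a slope clearly different from zero suggests small-study effects such as publication bias. The sketch below assumes this test and a 0.5 continuity correction. It returns the slope, its t statistic, and the degrees of freedom; converting t to a p-value is left to a statistics library, since the Python standard library has no t distribution.

```python
import math


def deeks_test(tables):
    """Deeks' funnel plot asymmetry test (sketch).

    tables: list of (tp, fp, tn, fn), one per study. Fits a weighted
    least-squares regression of ln(DOR) on 1/sqrt(ESS) with weights ESS.
    Returns (slope, t_statistic, degrees_of_freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fp, tn, fn in tables:
        if 0 in (tp, fp, tn, fn):                   # assumed 0.5 correction
            tp, fp, tn, fn = tp + 0.5, fp + 0.5, tn + 0.5, fn + 0.5
        n_pos, n_neg = tp + fn, fp + tn             # cases with / without HCC
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)   # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log(tp * tn / (fp * fn)))    # log diagnostic odds ratio
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("at least three studies are needed")
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (k - 2) / sxx)             # standard error of the slope
    t = slope / se if se > 0 else float("nan")
    return slope, t, k - 2
```

In a set of studies where smaller studies report larger DORs, the slope is positive, which is the pattern this test is designed to detect.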

Quality Assessment
QUADAS-2 was used to assess the quality of the included studies; the results are summarized in Supplementary Figure S1, and Supplementary Figure S2 presents a detailed evaluation of each item related to risk of bias and applicability concerns. For patient selection (n = 30) and reference standard (n = 22), over half of the studies had a high or unclear risk of bias. This was mainly due to unclear descriptions of the included patients, such as previous testing, presentation, setting, and intended use of the index test, and to insufficient external evaluation.

Discussion
In this study, we assessed the diagnostic effectiveness of DL methods for HCC detection based on medical images. Across all studies, the pooled sensitivity, specificity, and AUC were 89%, 90%, and 0.95, respectively. When only the most accurate DL method from each study was considered, performance was higher, with a sensitivity of 93%, a specificity of 95%, and an AUC of 0.98. In the subgroup analyses, DL methods applied to contrast-enhanced images had higher diagnostic accuracy than those applied to non-contrast-enhanced images. A plausible explanation is that contrast media increase the conspicuity of lesions relative to the surrounding liver parenchyma and reveal characteristic enhancement patterns, so that small lesions are displayed more clearly, which facilitates cancer diagnosis. MRI, US, and CT are the main imaging techniques for diagnosing HCC, and the choice among them depends on several factors, including the patient's condition, the availability of medical resources, and specific circumstances; typically, doctors choose the most suitable examination based on each patient's needs and the characteristics of their condition. Moreover, internal validation may overstate diagnostic performance because the validation data come from the same, relatively homogeneous population as the training data, whereas external validation on out-of-sample data can reveal how performance varies across subgroups, including different ethnic groups. However, the high heterogeneity and variance between studies lead to considerable uncertainty in the diagnostic accuracy estimates of this meta-analysis.
Our systematic search identified four systematic reviews or meta-analyses that explored the role of artificial intelligence (AI) and medical images in HCC diagnosis. However, these reviews covered diverse domains, making direct comparison with our research difficult. Chou et al. found that image-based diagnosis of HCC had a sensitivity of 84% and a specificity of 99%, highlighting its importance, but they did not examine AI methods [67]. In our analysis, DL methods achieved a higher pooled sensitivity (89%) for image-based HCC diagnosis, albeit with a lower pooled specificity (90%). Lai showed that AI methods outperformed traditional systems in predicting HCC treatment outcomes, but the review lacked sufficient data for a meta-analysis [68]; by pooling data quantitatively, our meta-analysis reduces the influence of random error and increases statistical power. Martinino et al. observed that AI methods became more effective in diagnosing HCC as the number of studies and images increased, but their review did not distinguish between machine learning and DL methods [69]. Our review focuses specifically on DL methods, which learn patterns and features automatically from the data to support more accurate prediction and decision-making. Zhang et al.'s meta-analysis showed that DL methods excel in predicting microvascular invasion, with superior accuracy, methodology, and cost-effectiveness, but it did not investigate HCC classification [70]. Our study addresses this gap.
Our research showed that DL methods are a powerful tool for diagnosing HCC, for the following reasons. First, DL methods can extract intricate patterns and features from medical images, enabling accurate identification of early-stage liver tumors that may be difficult for human experts to detect [71]. Such early detection is important for improving treatment outcomes and patient survival [72]. Second, DL models can process and analyze large volumes of imaging data in a relatively short time, facilitating faster and more efficient diagnosis [73,74], so that patients can receive their diagnosis promptly and treatment planning and intervention can be expedited. Third, DL methods can be trained on large datasets of medical images to continuously improve their accuracy and performance [75,76]; when retrained on new data, they can keep pace with advances in HCC detection. Another advantage of DL in HCC diagnosis is its potential to reduce human subjectivity and variability. By relying on objective image analysis, DL methods can provide consistent and standardized evaluations, leading to more reliable and reproducible diagnoses [77,78]. This consistency is especially valuable when doctors' opinions differ, as the methods can provide an additional objective perspective. Furthermore, DL models can integrate multiple imaging modalities, such as CT, MRI, and ultrasound, to provide a more comprehensive assessment than single-modality methods [79,80]. By fusing information from various sources, these models can enhance the accuracy and reliability of HCC diagnosis and help guide treatment decisions.
Lastly, with socioeconomic development, the datasets available for HCC research are steadily growing in quality, size, and diversity. Meanwhile, DL methods continue to advance through algorithmic innovation. The strong performance of DL methods depends on large training datasets, from which models can learn discriminative features that generalize to new cases. With insufficient data, training is severely compromised and the model tends to overfit. Data augmentation can alleviate this problem, although it may not improve the model's ability to generalize. Because we included articles published up to 2023, some of which used training sets of up to hundreds of thousands of samples, the higher AUC values observed in our study compared with similar reviews are plausible. Overall, DL has considerable advantages in HCC diagnosis, including improved early detection, faster processing, continuous learning and improvement, reduced subjectivity, and more comprehensive evaluation. Integrating DL methods into clinical practice could substantially improve patient care and outcomes in HCC.
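As an illustration of the simple geometric data augmentation mentioned above, the sketch below generates flipped and rotated copies of a 2-D image stored as a list of pixel rows. It is a hypothetical example (the function name `augment` is invented here), not the augmentation pipeline of any included study. Whether a transform is appropriate is a design decision; for example, a horizontal flip places the liver on the wrong side of the abdomen, which may or may not matter for a given model. Such transforms enlarge the training set but add no new patients, which is why they may not improve generalization.

```python
def augment(image):
    """Return geometric variants of a 2-D image given as a list of rows.

    Produces a horizontal flip, a vertical flip, a 90-degree clockwise
    rotation, and a 180-degree rotation. Intensity changes, cropping, and
    scaling, which real pipelines often add, are omitted for brevity.
    """
    hflip = [list(row[::-1]) for row in image]          # mirror left-right
    vflip = [list(row) for row in image[::-1]]          # mirror top-bottom
    rot90 = [list(col) for col in zip(*image[::-1])]    # rotate clockwise
    rot180 = [list(row[::-1]) for row in image[::-1]]   # rotate twice
    return [hflip, vflip, rot90, rot180]
```

For the 2 × 2 image `[[1, 2], [3, 4]]`, the clockwise rotation is `[[3, 1], [4, 2]]`.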
Our study had some limitations. First, heterogeneity was substantial; although subgroup and meta-regression analyses were carried out, it could not be fully explained. Second, owing to limited data, we could not perform subgroup analyses based on tumor size and location. Third, the included studies were almost entirely retrospective, and confounding may limit the internal validity of retrospective studies. The study design of research on DL methods for image-based HCC diagnosis should therefore be improved.

Conclusions
In conclusion, DL methods based on medical images were found to be highly accurate for detecting HCC, although heterogeneity was substantial. Furthermore, the sensitivity of DL methods was significantly higher when contrast-enhanced imaging techniques were used.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15235701/s1. Supplementary File S1: QUADAS-2 summary plot. Supplementary File S2: Publication bias and meta-regression results. Supplementary File S3: Forest plot of studies included in the meta-analysis. Supplementary Note SI: The search terms and search strategy used in each database.
Funding: This study was supported by the National Natural Science Foundation of China (82171944 and 81873899 to Baoming Luo) and Guangdong Province Natural Science Foundation (2021A1515012611 to Baoming Luo).

Figure 2. A comprehensive evaluation of DL methods. (A) Receiver operating characteristic (ROC) curves of all studies included in the meta-analysis (102 tables of 30 studies), and (B) ROC curves of studies reporting the highest accuracy (30 tables of 30 studies).

Figure 3. Pooled performance of DL methods using different medical image methods. (A) ROC curves of studies with contrast-enhanced images (65 tables of 23 studies), (B) ROC curves of studies with non-contrast-enhanced images (30 tables of 6 studies).

Cancers 2023 , 19

Figure 4. Pooled performance of DL methods using different imaging modalities. (A) ROC curves of studies using MRI (29 tables of 10 studies), (B) ROC curves of studies using US (32 tables of 7 studies), and (C) ROC curves of studies using CT (41 tables of 13 studies).

Figure 5. Pooled performance of DL methods using different validation types. (A) ROC curves of studies with internal validation (92 tables of 29 studies), (B) ROC curves of studies with external validation (10 tables of 5 studies).

Figure 6. Pooled performance of DL methods versus human clinicians using the same sample. (A) ROC curves of studies using DL methods (20 tables of 6 studies), and (B) ROC curves of studies using human clinicians (10 tables of 6 studies).

Table 1. Study design and basic demographics.
Patients who had abdominal CT scans within three months of operation with a routine clinical imaging protocol of contrast-enhanced portal venous phase CT. Patients who had (1) no contrast-enhanced CT scans; (2) metal artifacts infiltrating the tumor on CT imaging; (3) prior ablation, embolization, resection, or transplantation, as these prior treatments would alter the appearance of the tumors on imaging and compromise the quantitative image analysis; (4) tumors that were ruptured; (5) tumors with a diffuse infiltrative pattern (as tumor borders were challenging to determine for analysis). Patients who (1) were at least 18 years old; (2) had clear CT images with a lesion location that could be easily analyzed; (3) had no other genetic history in the family. Patients who (1) took related prohibited drugs before CT image acquisition; (2) during hospital examination, the patient had a severe malignant tumor and
Patients who were (1) pathologically confirmed with one of the following malignant hepatic tumors: HCC, ICC, and metastasis; (2) with preoperative multi-phase contrast-enhanced CT available. Patients (1) who were ≤18 years old; (2) who had a prior liver resection or transplantation; (3) whose interval between the pathologic examination and the preoperative CT > 100 days; (4) whose image quality was poor. Histopathology 723 Oestmann et al. 2021 [34] Patients had histopathological diagnosis and were older than 18 years NA Histopathology 118 Wang et al. 2021 [31] The HCC group consisted of patients not only treated by surgical resection but also treated by intervention, radiofrequency ablation, cryoablation, microwave therapy, or any other invasive treatment therapy. Both solitary and multiple HCC tumor nodules were enrolled. Patients diagnosed with malignant lesions other than HCC such as hemangioendothelioma, sarcoma, intrahepatic cholangiocarcinoma, and metastatic tumor were included in the control group. Patients diagnosed with benign lesions such as leiomyolipoma, hemangioma, cyst, abscess, adenoma, and focal nodular hyperplasia were also included in the

Table 2. Methods of model training and validation.