Ensemble Deep Learning Model to Predict Lymphovascular Invasion in Gastric Cancer

Simple Summary
Lymphovascular invasion (LVI) is a crucial predictor in gastric cancer, indicating an increased likelihood of lymph node spread and poorer patient outcomes. Detecting LVI(+) in gastric cancer histopathology is challenging because invasion can be subtle, which motivated the proposal of a deep learning-based detection method using H&E-stained whole-slide images. Both the classification and detection models performed well, and their ensemble showed the best predictive performance in identifying LVI areas. This approach holds promise for precision medicine, potentially streamlining examinations and reducing discrepancies among pathologists.

Abstract
Lymphovascular invasion (LVI) is one of the most important prognostic factors in gastric cancer as it indicates a higher likelihood of lymph node metastasis and poorer overall outcome for the patient. Despite its importance, the detection of LVI(+) in histopathology specimens of gastric cancer can be a challenging task for pathologists as invasion can be subtle and difficult to discern. Herein, we propose a deep learning-based LVI(+) detection method using H&E-stained whole-slide images. The ConViT model showed the best performance in terms of both AUROC and AUPRC among the classification models (AUROC: 0.9796; AUPRC: 0.9648). The AUROC and AUPRC of YOLOX computed based on the augmented patch-level confidence score were slightly lower (AUROC: −0.0094; AUPRC: −0.0225) than those of the ConViT classification model. With weighted averaging of the patch-level confidence scores, the ensemble model exhibited the best AUROC, AUPRC, and F1 scores of 0.9880, 0.9769, and 0.9280, respectively. The proposed model is expected to contribute to precision medicine by potentially saving examination-related time and labor and reducing disagreements among pathologists.


Introduction
Gastric cancer is the most common type of cancer in Korea, accounting for 12% of all cancer cases according to data from the National Cancer Center in 2018 [1]. In 2020, more than 1 million (1,089,103) new cases of gastric cancer were estimated worldwide, resulting in 768,793 deaths [2]. Lymph node metastasis is the most significant prognostic factor for patients with gastric cancer, and the presence of lymphovascular invasion (LVI) is the most significant risk factor for lymph node metastasis [3][4][5][6]. LVI is defined as the invasion of vessel walls by tumor cells and/or the presence of tumor emboli within an endothelial-lined space [7]. The predictive value and prevalence of LVI are highly dependent on the type of cancer, and the presence of LVI is a recognized prognostic factor in a variety of solid malignancies, including breast cancer, urothelial carcinoma, and colorectal cancer [8]. Since the proclamation of LVI as an important factor in the prognosis of gastric cancer by Talamonti et al. [9], the American Joint Committee on Cancer has recommended the evaluation of LVI [10]. According to the current Japanese guidelines, LVI in gastric cancer is not clinically useful information except for predicting the possibility of curative endoscopic resection. LVI is the most significant risk factor associated with lymph node metastases in individuals with early gastric cancer [6,[11][12][13]. The rate of lymph node metastasis observed in patients exhibiting LVI (25.7–32.1%) was much higher than that in those without LVI (1.5–2.3%) [6,11,13,14]. In addition, Fusikawa et al. showed a significant difference between the values of 79.8% in the LVI(−) group and 67.2% in the LVI(+) group in advanced cancer [7]. The five-year survival rate of advanced cancers with nodal metastases is 76.7% in the LVI(−) group and 60.9% in the LVI(+) group [7]. Therefore, LVI is an independent prognostic marker in gastric cancer and tends to worsen the prognosis, particularly in cases of advanced malignancy with lymph node metastasis.
The recognition of lymphatic tumor emboli in microscopic sections is dependent on the pathologist [15]. There is potential for significant inter-observer variation in the diagnosis of LVI among pathologists [16]. Inter-observer disagreement can be expected because tissue shrinkage during fixation can cause retraction artifacts that isolate tumor aggregates, and these are easily confused with true tumor emboli during routine examination of hematoxylin and eosin (H&E)-stained sections [17,18]. Tumors may also be artefactually displaced into vessels during specimen cut-up or processing [19]. For instance, Gilchrist et al. noted that when three surgical pathologists were asked to assess LVI in pT1-2 N0 M0 mastectomy specimens, all three concurred in only 12 of 35 breast cancer cases [15,16]. Several attempts have been made to overcome these limitations. The monoclonal D2-40 antibody can selectively detect lymphatic vessels as it is expressed in the lymphatic endothelium but not in blood vessels, and D2-40 staining is reportedly more sensitive than H&E staining for detecting lymphatic invasion (LI) [17,20,21]. Elastin staining may also be used for a clearer recognition of blood vessels as it identifies the elastic fibers of blood vessels [22][23][24][25]. Inter-observer agreement in the diagnosis of LVI was improved by adding ancillary D2-40 and elastin staining, regardless of the experience of the pathologists [4]. However, the assessment of LVI by pathologists is inherently limited owing to human error. Examining large areas of tumors for LVI is time-consuming and challenging because LVI foci can be small and their interpretation subjective. Nonetheless, the presence of LVI can have a marked impact on disease management, and the identification of a single genuine focus is sufficient to label a case as LVI(+). Thus, automated identification of lesions that may indicate LVI could have significant clinical utility [19].
Digital pathology refers to the creation of whole-slide images (WSI) from a histology slide that can be viewed on a screen to form a diagnostic report [26]. Traditionally, histological diagnosis and pathological staging have been performed by pathologists using glass slides and microscopes [26]. Digital pathology is now increasingly being implemented in laboratories around the world, and digital support management is seen as a key component of health service planning aimed at improving efficiency, network operation, and quality [26]. There is great potential for using artificial intelligence (AI) to assist pathologists and derive new insights into disease biology, even in areas imperceptible to human observers [27]. However, the majority of AI medical devices that have received FDA approval and have been introduced to the market thus far are primarily focused on radiology. In contrast, only a limited number of devices have been approved for use in the field of pathology [28]. Moreover, it is important to explore the potential of these AI technologies as many pathology departments do not have enough pathologists.
AI algorithms that utilize convolutional neural networks (CNNs) for image analysis have already shown significant promise in the pathological evaluation of various solid tumors, including prostate cancer screening in prostate biopsies [29,30], the evaluation of clinical outcomes [31,32], and the prediction of mutations [33] or molecular subtypes [34] from H&E-stained sections. The usefulness of these algorithms in identifying small regions of prognostic significance in digital WSI has previously been demonstrated in the context of identifying metastatic breast cancer within lymph nodes [35,36]. In addition, an AI model has been shown to automatically find LVI in the WSI of testicular cancer [19]. In that study, the AI model identified LVI foci with a higher recall than a human expert (recall score: 0.68 vs. 0.56).
In this study, we developed an algorithm to identify LVI foci related to the prognosis of gastric cancer. The image classification and detection models were trained and validated at both the patch and WSI levels. An ensemble approach was used to combine the predictions of these sub-models to improve the overall performance of the model. The sub-models were trained on a dataset of WSI of gastric cancer, with annotations of vascular and lymphatic vascular invasion. A conceptual diagram of the LVI prediction model is shown in Figure 1.
Cancers 2024, 16, 430

Patients and Tumor Samples
Gastric adenocarcinoma slides were obtained from 88 patients who underwent endoscopic submucosal dissection, subtotal gastrectomy, or total gastrectomy at the Chonnam National University Hwasun Hospital from 2018 to 2021. The availability of adequate tissue and the histological diagnosis of gastric cancer were the inclusion criteria. One hundred WSI were collected from these patient samples. Clinical information was collected from the electronic medical records maintained in the electronic database of the hospital. This study was approved by the Institutional Review Board (IRB) of the Chonnam National University Hwasun Hospital (CNUHH-2021-197) and conducted in accordance with the Declaration of Helsinki. Informed consent from patients was waived with IRB approval.

Datasets
The slides were scanned using a Leica-Aperio GT450 Scanner (Leica Biosystems) with a 40× objective. Using QuPath 0.3.0 tools, the LVI(+) regions were annotated by two board-certified pathologists. Examples of LVI(+) and LVI(−) are depicted in Figure 2. We performed CD34 and D2-40 immunohistochemical staining on all slides to confirm LVI(+) foci and to increase the accuracy of marking them. We randomly split the WSIs into training, validation, and test sets at a 6:2:2 ratio. We patchified the WSIs using conventional digital pathology image analysis (Figure 1, preprocessing panel). LVI(+) patches were generated from the LVI(+) annotations. LVI(−) patches were generated from the remaining tissue with a non-overlapping sliding window that visited all WSI regions not containing LVI(+) foci. To handle class imbalance and remove redundancy in LVI(−) patches, one-third of all LVI(−) patches were sampled. The LVI(+) and LVI(−) patches were generated at the 20× level (0.5 µm/pixel) with a size of 512 × 512 pixels.
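The non-overlapping tiling and one-third subsampling of LVI(−) patches can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: the function names, the rectangle format of the annotations, the rule that any tile touching an annotation is skipped, and the fixed random seed are all hypothetical.

```python
import random

PATCH = 512  # patch side length in pixels at the 20x level, as stated in the text


def overlaps(tile, box):
    """Return True if two (x0, y0, x1, y1) rectangles intersect."""
    return (tile[0] < box[2] and box[0] < tile[2]
            and tile[1] < box[3] and box[1] < tile[3])


def negative_tiles(width, height, lvi_boxes, patch=PATCH):
    """Tile the slide without overlap and keep tiles free of LVI(+) annotations.

    Assumption: a tile that intersects any annotated box is discarded.
    """
    tiles = []
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            tile = (x, y, x + patch, y + patch)
            if not any(overlaps(tile, box) for box in lvi_boxes):
                tiles.append(tile)
    return tiles


def subsample(tiles, fraction=1 / 3, seed=0):
    """Randomly keep a fraction of the LVI(-) tiles (one-third in the text)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    return rng.sample(tiles, round(len(tiles) * fraction))
```

For a hypothetical 2048 × 1536 pixel region with one annotated focus, `negative_tiles` returns every 512 × 512 tile except the one containing the focus, and `subsample` keeps roughly one-third of them.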
To conduct external validation, we utilized a publicly accessible classification dataset that contained patch images pertaining to lymphatic invasion [37]. Comprising 48 WSIs sourced from 27 patients, this external validation dataset comprised 302 positive instances and 671 negative instances. The patch images were captured at a 5×-level magnification (2 µm/pixel) with dimensions of 512 × 512 pixels. Notably, this external validation dataset was acquired using a distinct scanner (Leica-Aperio AT2) and originated from a different hospital setting.

Model Development
We fine-tuned the image classification and detection models to identify the LVI foci in a given patch image. The following analysis was conducted using Python 3.8, PyTorch 1.13.1, and a single A100 GPU.

Classification Models
We defined the classification problem as a binary classification. The ResNet 50 [38], EfficientNet B3 [39], and ConViT (Small) [40] models were fine-tuned on the LVI datasets. The selected image classification models have between 20 and 30 M parameters. In an empirical study, we found that models with more parameters tended to overfit because of our limited dataset volume. We initialized the models with ImageNet [41] pretrained weights and kept all layers trainable to account for the modality gap between conventional RGB images and digital pathology images. Image augmentations were applied, including affine transform, elastic transform, blurring, brightness, and color jittering. Balanced weight-sampling methods were applied during training to alleviate data imbalance. The image classification models were trained using the Adam optimizer (learning rate: 1 × 10⁻⁴), a cosine annealing learning rate scheduler, and automatic mixed precision.
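Two of the training ingredients named above, balanced weight sampling and cosine annealing of the learning rate, can be illustrated with a short sketch. It uses plain Python rather than the framework code of the study; the inverse-class-frequency weighting, the function names, `total_steps`, and the minimum learning rate of zero are assumptions, since the text does not specify them. Only the base learning rate of 1 × 10⁻⁴ comes from the text.

```python
import math
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by the inverse frequency of its class.

    Drawing samples in proportion to these weights gives every class the
    same total probability mass, which counteracts class imbalance.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def cosine_annealing_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Standard cosine annealing from base_lr down to min_lr over total_steps."""
    cos = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos)
```

With one LVI(+) and three LVI(−) labels, the positive sample receives weight 1 and each negative sample weight 1/3, so both classes are drawn equally often in expectation. The schedule starts at the base learning rate, reaches half of it at the midpoint, and decays to the minimum at the final step.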

Detection Models
The detection model was utilized to classify and localize the desired object in the entire image simultaneously. A regression operation was applied to localize the object using a bounding box. We utilized a one-stage object detection model called the YOLO model [42]. YOLO detection uses the concept of an anchor box. An anchor box is a bounding box with a predefined shape and aspect ratio that is used as a reference when predicting the bounding box location. For example, human objects commonly fit rectangular boxes that are tall and narrow. In contrast, dog objects commonly fit rectangular boxes that are short and wide. Anchor-based methods have been actively utilized to improve prediction performance. However, the shape of LVI foci is arbitrary; several LVI foci assumed a square shape, and the others assumed a rectangular shape of varying size. To compare the impact of the anchor box assumption on LVI foci detection, we trained both an anchor-based detection model (YOLO v3) [43] and a detection model without the anchor box assumption (YOLOX) [44]. To match the number of parameters, the medium size of YOLOX was selected. The hyperparameters and data augmentations followed the recommendations of each framework. The detection model can return multiple LVI(+) regions for a single patch. Therefore, unlike a classification model, a single-patch image can have multiple prediction confidence scores. To aggregate multiple confidence scores, we computed the augmented confidence score of each patch image using the maximum operator.
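The aggregation by the maximum operator can be written in a few lines. This is a minimal sketch: the function name and the score of 0.0 assigned to a patch with no detected boxes are assumptions, as the text does not state how empty detections are handled.

```python
def augmented_confidence(box_scores, empty_score=0.0):
    """Collapse the confidences of all boxes detected in one patch into one score.

    The maximum operator is taken from the text; empty_score (used when the
    detector returns no box for the patch) is a hypothetical default.
    """
    return max(box_scores, default=empty_score)


# Example: a patch with three detected boxes and a patch with none.
patch_scores = [augmented_confidence([0.2, 0.9, 0.4]), augmented_confidence([])]
```

Taking the maximum means a patch is scored by its most confident detection, which matches the rule that a single genuine focus is enough to call a patch LVI(+).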

Ensemble Model
The ensembled confidence score ($C_{ens}$) is calculated as the weighted average of the confidence score of the classification model ($C_{clf}$) and the augmented confidence score of the detection model ($C_{det}$), according to Equation (1):

$$C_{ens} = \frac{w_{clf}\, C_{clf} + w_{det}\, C_{det}}{w_{clf} + w_{det}} \qquad (1)$$

where $w_{clf}$ and $w_{det}$ denote the weighting factors of the classification and detection models, respectively. Considering the performance of each model, we empirically set both $w_{clf}$ and $w_{det}$ to 1.0, so that Equation (1) reduces to a simple average. The ensembled confidence score was treated as the final confidence score.
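A weighted average of the two confidence scores can be sketched as below. The normalization by the sum of the weights is an assumption made so that the result stays in the same range as the inputs; the function name and the check for non-positive total weight are hypothetical, and only the default weights of 1.0 come from the text.

```python
def ensemble_confidence(c_clf, c_det, w_clf=1.0, w_det=1.0):
    """Weighted average of the classification and detection confidences.

    c_clf: patch-level confidence of the classification model.
    c_det: augmented (max-aggregated) confidence of the detection model.
    Assumption: the weights are normalized by their sum.
    """
    total = w_clf + w_det
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    return (w_clf * c_clf + w_det * c_det) / total
```

With the equal weights reported in the text, a patch scored 0.8 by the classifier and 0.6 by the detector receives the mean of the two scores.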

Evaluation Metrics
Generally, to evaluate the classification performance, the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) are computed by comparing the prediction confidence that a model returns with the ground truth. From the TP, FP, FN, and TN counts, the accuracy, recall (sensitivity), precision (positive predictive value, PPV), F1 score, AUROC, and AUPRC are obtained. The detection performance was evaluated based on the intersection over union (IOU) of the bounding box predicted by the model and the ground truth bounding box. With the IOU threshold, we could determine whether the model prediction was true or false. Using the precision and recall scores, we can summarize the detection performance as an average precision (AP) score [45]. The AP50 score corresponds to the AP score at the IOU threshold of 50%. The classification performance of the detection model was computed based on the augmented confidence score that aggregated multiple prediction outputs.
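The threshold-based metrics and the IOU criterion follow directly from their standard definitions, as the sketch below shows. The function names and the (x0, y0, x1, y1) box format are assumptions made for illustration; AUROC, AUPRC, and AP require ranking over all confidence scores and are not covered here.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision (PPV), recall (sensitivity), and F1 from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct at AP50 when its IOU reaches 50%."""
    return iou(pred_box, gt_box) >= threshold
```

For example, two 10 × 10 boxes that share half their area have an IOU of 1/3 and would therefore not count as a match at the 50% threshold.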

Patient Characteristics
All the patients were LVI(+). The mean age of the patients was 69.6 years (±10.2), and the majority were men (73.0%). Poorly differentiated tumors comprised 46.0% of the cases. Despite being LVI(+), 10 patients (18.2%) did not exhibit lymph node metastasis (LNM). The mean number of involved lymph nodes was 12.0 (±13.7). Perineural invasion was observed in 39 (61.9%) patients. The clinicopathological features of the cases are summarized in Table 1.

Patch-Level Analysis
With WSI-level splitting, each WSI was randomly allocated to the training, validation, or test dataset. Each WSI differed in the extent of LVI; therefore, the number of LVI foci and patch images was heterogeneous across WSIs. The dataset configurations are presented in Table 2. The patch-level analysis comprised three components: classification, detection, and an ensemble of both classification and detection. Figure 3 illustrates example outputs of ground truths, classification-focused areas, and detection outputs.


Patch-Level Analysis: Classification Models
All classification models achieved high patch-level performance, with no considerable gap among them. The ConViT model showed the best performance in terms of both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) among the classification models (AUROC: 0.9796; AUPRC: 0.9648). The accuracy, precision, recall score, and F1 score were computed with a confidence score threshold of 0.5.

Patch-Level Analysis: Detection Models
In detection, the YOLOX model outperformed the YOLO v3 model in both detection (AP50) and classification metrics. The AP50 of YOLOX and YOLO v3 were 0.66 and 0.55, respectively. The AUROC and AUPRC values of YOLOX were higher than those of YOLO v3 (0.9702 vs. 0.9666 for the AUROC and 0.9423 vs. 0.9302 for the AUPRC). However, the AUROC and AUPRC of YOLOX computed based on the augmented patch-level confidence score were slightly lower (AUROC: −0.0094; AUPRC: −0.0225) than those of the ConViT classification model. In the detection models, the accuracy, precision, recall score, and F1 score were computed with an augmented patch-level confidence score threshold of 0.7. The threshold was set stricter than the value used for the image classification models to mitigate the large number of false positives that could occur during detection.

Patch-Level Analysis: Ensemble Model
Notably, the YOLOX model achieved the highest F1 score among the individual benchmark models (+0.0039 points compared with that of ConViT). Considering the AUROC, AUPRC, and F1 scores, we combined the best-performing classification and detection models (ConViT and YOLOX) in an ensemble approach. With simple averaging of the patch-level confidence scores, the ensemble model showed the best AUROC, AUPRC, and F1 scores of 0.9880, 0.9769, and 0.9280, respectively. The performances are summarized in Table 3.

WSI-Level Analysis
A WSI consists of multiple patch images, allowing for aggregation of these patches at the WSI level. The conceptual diagram of the WSI-level analysis is shown in Figure 4. Each patch prediction result was aggregated at the WSI level, and the WSI-level prediction results were aggregated once more over the entire test dataset. The WSI-level prediction performance is summarized in Table 4. The performance was consistent with the results of the patch-level prediction (Table 3). We set the decision threshold between positive and negative at the midpoint (0.5); however, this threshold can be rescaled depending on the interests of the researcher. In our dataset, WSIs generally included multiple LVI regions. Therefore, we concluded that the benefit of reducing false positives was more significant. If the LVI region is small, such as in patients with early-stage cancer, a strategy can be adopted to reduce false negatives by lowering the threshold.

External Validation
To measure the efficacy of the ensemble approach, we conducted an external validation using a preexisting dataset. Employing this classification dataset facilitated the application of our model to ascertain positive or negative LVIs [37]. The ensemble model demonstrated superior performance compared to both classification and object detection models (Table 5). Specifically, the AUROC of the ensemble model exhibited improvements of 0.025 (2.8%) and 0.052 (5.9%) in contrast to the classification and detection models, respectively. Furthermore, the AUPRC of the ensemble model saw enhancements of 0.044 (5.1%) and 0.081 (9.8%), respectively. Analogous to the internal validation dataset, the ensemble model exhibited robustness when compared to the classification-only and detection-only models.


Discussion
In this study, we present a deep-learning model for predicting gastric LVI from patch images of WSIs. Two models were developed: image classification and detection. The ConViT (classification) and YOLOX (detection) models showed comparable performances. The final ensemble model showed outstanding performance in predicting gastric LVI.
In a previous study, Ghosh et al. demonstrated that a deep-learning model could predict LVI foci in testicular cancer [19]. They applied a semantic segmentation-based model (DeeplabV3) [46] to predict the mask of LVI foci; however, the number of LVI(+) foci available to train and evaluate the semantic segmentation model was small. Therefore, the model performance can be further improved. With few samples of LVI foci included in the test dataset (34 foci), it could be difficult to determine the generalized performance of LVI prediction.
One of the primary tasks of digital pathology is the detection of mitosis, which often employs a two-stage framework comprising object detection and classification [47]. This approach is preferred due to the small size of mitotic objects, which makes the model predictions highly susceptible to false positives and false negatives. Initially, candidate regions are identified through object detection, and subsequently refined using classification techniques. While the sequential application of this two-stage framework may not pose significant challenges in studies based on limited benchmark datasets, it can prove time-consuming in typical medical scenarios. Therefore, to address this issue, we propose an ensemble approach that combines the advantages of the two-stage model while enabling parallel processing.
In our experimental setting, ConViT performed best among the candidate classification models. The ConViT model attempts to fuse the strong performance of transformer-based architectures with the advantages of CNNs. The ability of the transformer to focus on global information and the ability of CNNs to focus on local patterns boosted the prediction performance. The LVI foci had heterogeneous shape and size characteristics. In addition, it is essential to determine whether the LVI is located in lymphatic vessels or blood vessels. The most common false positives occurred in detachment artifacts owing to the failure to interpret peripheral contexts.
The detection model also showed comparable performance in detecting LVI foci. The anchor-free model YOLOX was more appropriate because of the varying sizes and shapes of the LVI foci. The YOLOX model exhibited AUROC and AUPRC values comparable to those of the ConViT model, and a slightly better F1 score. The ensemble model exhibited improved AUROC, AUPRC, and F1 scores compared with the classification-only and detection-only models (gains of 0.0084 in AUROC, 0.012 in AUPRC, and 0.022 in F1 score). Additionally, the improvement of the ensemble model was also found in the external validation (AUROC: 2.8%; AUPRC: 5.1%).
Our model predicted LVI foci in WSI; in other words, it identified whether LVI foci existed. However, LVI is essentially a histological finding that suggests the possibility of metastasis to the lymph nodes. Previous studies have reported models to predict LNM from pathological slide images of solid tumors, such as breast, colorectal, bladder, and prostate cancers. Although LNM is one of the most important prognostic factors, a model for predicting LNM in gastric cancer has not yet been reported. Wang et al. reported a model for predicting the prognosis of gastric cancer using the histopathology of resected lymph nodes; however, this was not a model for predicting metastasis to the lymph nodes. Our algorithm, which detects LVI(+) foci, is expected to significantly help pathologists in routine slide reading. However, predicting LVI(+) alone is not sufficient to predict the prognosis of a patient. Additional studies are needed on the association of the LVI(+) foci identified by this algorithm with the number of lymph node metastases and patient survival.
In addition, semi-supervised and active learning pipelines that make it easier to generate labels for LVI foci need to be further developed. Our YOLOX model can predict the LVI foci using a bounding box. Therefore, its predictions on other datasets can be treated as candidate annotations of LVI foci. With the supervision of human experts who reject or accept these candidate annotations (active learning), the labeled dataset can expand rapidly. Additionally, in this study, we hypothesized that detection would be sufficient to predict LVI foci. However, a previous study utilized semantic-segmentation-based modeling for testicular LVI foci detection. LVI foci share similar patterns despite differences in organs, such as tumors surrounded by blood or lymphatic vessels. Therefore, in the future, we aim to expand our work to compare semantic segmentation, object detection, and classification models to predict the LVI foci.
Our study has several limitations. First, the number of LVI(+) foci may be imbalanced across slides, and this imbalance may distort the learning process. We addressed this problem by splitting the data at the WSI level. Patient-level splitting is generally the best option for data splitting. However, the number of LVI(+) foci varied with patient status. Furthermore, of the multiple slide sections that may be available for a single gastric cancer tissue, we selected no more than five slides from the same patient. Patient-level splitting would therefore have been coupled with a heavy class imbalance, which is harmful to supervised learning. To mitigate this issue, we selected WSI-level splitting as the alternative.

Second, annotated LVI(+) foci always carry the possibility of false positives or false negatives. To reduce false-positive and false-negative marks at the annotation step, we confirmed CD34 and D2-40 immunohistochemical staining on all slides. In addition, LVI(+) status was confirmed by two pathologists. However, because an LVI(+) focus is a relatively small lesion within the WSI, some LVI(+) foci may have been left unannotated and some artifacts may have been marked as LVI(+) foci. Similarly, a focus predicted as LVI(+) by the trained model may be a false positive. To identify false positives, all areas predicted to be LVI(+) foci were individually checked by two pathologists. Through this process, we were able to improve the accuracy of the model in predicting LVI(+) foci.

Spatial heterogeneity is a crucial factor that must be taken into account in studies of artificial intelligence in digital pathology, and gastric cancer is recognized as a cancer type that exhibits pronounced spatial heterogeneity within the tissue. Nevertheless, spatial heterogeneity did not need to be treated as a significant factor in this project. LVI is histopathologically defined by the presence of tumor emboli within lymphatic/vascular channels and exhibits rather homogeneous morphological features. Moreover, LVI is not exclusive to gastric cancer but is also observed in various other cancers. These observations suggest that the scope of this research extends beyond gastric cancer and has potential for application to other cancer types.
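The difference between WSI-level and patient-level splitting described in the first limitation can be illustrated with a minimal sketch. It is not the splitting code used in this study; the record fields (`patient`, `slide`, `label`), the function name `group_split`, and the 20% test fraction are assumptions chosen for illustration. The only point shown is that all patches sharing the chosen grouping key are kept in the same partition.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that all records sharing `group_key` stay together.

    group_key="slide" gives WSI-level splitting;
    group_key="patient" gives patient-level splitting.
    """
    # Collect the records belonging to each group (slide or patient).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group identifiers reproducibly, then cut off the test groups.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical patch records: two patients, up to two slides each.
patches = [
    {"patient": "P1", "slide": "P1-a", "label": 1},
    {"patient": "P1", "slide": "P1-a", "label": 0},
    {"patient": "P1", "slide": "P1-b", "label": 0},
    {"patient": "P2", "slide": "P2-a", "label": 1},
    {"patient": "P2", "slide": "P2-b", "label": 0},
]

train_wsi, test_wsi = group_split(patches, "slide")
train_pat, test_pat = group_split(patches, "patient")
```

With `group_key="slide"`, slides from the same patient can land in different partitions, which is the trade-off accepted above in exchange for a more balanced distribution of LVI(+) foci.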

Conclusions
This research presents an ensemble deep learning model for detecting vascular and lymphatic invasion in WSIs of gastric cancer histopathology. The ensemble model was more robust and accurate than the single models. It can serve as a valuable tool for pathologists in diagnosing gastric cancer and may help improve the accuracy of diagnosis and prognosis of the disease. This approach can be considered an alternative to traditional methods and a step toward computer-aided diagnostic systems in histopathology.

Figure 1. Schematic of the LVI Net. Panel (A) shows the preprocessing step and annotations, and panel (B) illustrates the workflow of the LVI Net. Each patch image is input into both the classification and detection models. The predictions from these models are then combined by weighted averaging to compute the final confidence level (referred to as the ensemble confidence). This ensemble confidence is then used to predict the final diagnosis of LVI(+) or LVI(−).
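The weighted averaging step in panel (B) can be sketched as follows. This is an illustrative sketch rather than the LVI Net implementation: the function names, the equal default weight, the 0.5 decision threshold, and the use of the maximum box score as the patch-level detection confidence are all assumptions, since the caption does not specify them.

```python
def patch_detection_confidence(box_scores):
    """Reduce per-box detection scores to one patch-level confidence.

    Assumption: the highest box score is used; a patch with no
    detected boxes gets a confidence of 0.0.
    """
    return max(box_scores, default=0.0)


def ensemble_confidence(cls_conf, det_conf, w_cls=0.5):
    """Weighted average of classification and detection confidences.

    w_cls is the (assumed) weight of the classification model;
    the detection model receives the remaining weight 1 - w_cls.
    """
    if not 0.0 <= w_cls <= 1.0:
        raise ValueError("w_cls must lie in [0, 1]")
    return w_cls * cls_conf + (1.0 - w_cls) * det_conf


def predict_lvi(cls_conf, box_scores, w_cls=0.5, threshold=0.5):
    """Return (ensemble confidence, True if the patch is called LVI(+))."""
    det_conf = patch_detection_confidence(box_scores)
    conf = ensemble_confidence(cls_conf, det_conf, w_cls)
    return conf, conf >= threshold


# Hypothetical patch: classifier is confident, detector finds two boxes.
conf, is_positive = predict_lvi(cls_conf=0.9, box_scores=[0.2, 0.7])
```

In practice the weight and threshold would be tuned on a validation set; the values here are placeholders.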

Figure 2. Example patch images of LVI(+) and LVI(−). The left panel displays a patch classified as LVI(+), and the right panel displays a patch classified as LVI(−). LVI foci refer to tumors located within identifiable white, rounded structures that align anatomically with blood vessels and lymph nodes. A patch is classified as positive if it contains one or more regions indicating the presence of LVI. The LVI areas are marked with red boxes.

Figure 3. Output of the classification and detection models. The classification and detection results for the same patch image are summarized. Panel (A) displays the original image, and panels (B,C) show the classification and detection results, respectively. The heatmap generated using Grad-CAM highlights the areas on which the classification model focuses, with red areas indicating greater attention. This visualization indicates that the classification model attends to a relatively focused region. In contrast, the detection model localizes each object by enclosing it within a bounding box and provides a confidence level for each prediction. The detection model identifies several dispersed regions within the image.

Figure 4. Example of WSI-level analysis. A WSI-level analysis can be visualized by combining the results of the patch-level analysis. It can also be shown as a WSI-level heatmap, which uses the spatial information of the patch images. In the heatmap, red points indicate regions classified as LVI(+) with high confidence, whereas blue points indicate regions classified as LVI(−) with high confidence. The image on the right shows a magnified view of the area identified as LVI(+), together with the respective judgments of the classification and detection models.
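The construction of a WSI-level heatmap from patch-level results can be sketched as below. This is a minimal illustration under stated assumptions, not the code used to produce the figure: patches are assumed to be indexed by (row, column) positions on a regular grid, and the function names and the 0.5 threshold are hypothetical.

```python
def build_heatmap(patch_scores, n_rows, n_cols, fill=None):
    """Place patch-level confidences on a grid that mirrors the slide layout.

    patch_scores maps (row, col) grid positions to a confidence in [0, 1].
    Positions without a patch (for example, background) keep `fill`.
    """
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for (row, col), conf in patch_scores.items():
        grid[row][col] = conf
    return grid


def flag_positive(grid, threshold=0.5):
    """Return grid positions whose confidence meets the threshold."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value is not None and value >= threshold
    ]


# Hypothetical 2 x 3 slide with one background position (no patch).
scores = {(0, 0): 0.05, (0, 1): 0.92, (1, 1): 0.40, (1, 2): 0.71, (0, 2): 0.10}
heatmap = build_heatmap(scores, n_rows=2, n_cols=3)
hot_spots = flag_positive(heatmap)
```

High values in the grid correspond to the red points in the heatmap and low values to the blue points; the flagged positions are the candidates for magnified review.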

Table 1. Baseline characteristics of the study population.

Table 3. Performance of the trained model on the patch images.

Table 4. Performance of the trained model on the whole-slide images.

Table 5. Performance of the trained model on the external validation dataset.