Color-Transfer-Enhanced Data Construction and Validation for Deep Learning-Based Upper Gastrointestinal Landmark Classification in Wireless Capsule Endoscopy

While the adoption of wireless capsule endoscopy (WCE) has been steadily increasing, its primary application remains limited to the small intestine, with relatively little use in the upper gastrointestinal tract. However, advances in capsule endoscopy technology are widely expected to substantially increase its application in upper gastrointestinal examinations. This study addresses the underexplored task of landmark identification in the upper gastrointestinal tract using WCE, given the limited research and public datasets available in this emerging field. To contribute to the future development of WCE for gastroscopy, a novel approach is proposed: using color transfer techniques, a simulated WCE dataset tailored to the upper gastrointestinal tract is created. The similarity between this color-transferred dataset and authentic WCE images is verified using Euclidean distance measurements. Pioneering the exploration of anatomical landmark classification with WCE data, this study integrates similarity evaluation with image preprocessing and deep learning, specifically the DenseNet169 model. Using the color-transferred dataset, anatomical landmark classification accuracy in the upper gastrointestinal tract exceeds 90%. Furthermore, applying sharpen and detail filters increases classification accuracy from 91.32% to 94.06%.


Introduction
The global endoscopy market is growing steadily and has expanded beyond the United States and China [1]. Conventional wired endoscopy, while a cornerstone of diagnostics, has several limitations, such as the need for sedation, active involvement, and specialized techniques [2]. In contrast, wireless capsule endoscopy (WCE) offers a more patient-friendly alternative: the patient simply swallows a capsule with water. The transmitted capsule images are then reviewed by endoscopists, with diagnostic results typically available within 30–40 min on average [3]. The simplicity and convenience of WCE have recently driven a surge in its popularity, and it shows the potential to outpace wired endoscopy [4]. However, WCE is currently used predominantly for the small intestine [5]. This preference arises because the narrow lumen of the small intestine allows the capsule to capture the essential imaging areas effectively [5,6]. Nonetheless, technological advances are driving efforts to extend WCE to the upper gastrointestinal tract [7][8][9][10]. Notably, magnetic-controlled capsule endoscopy (MCCE) has emerged as a promising alternative to passive, uncontrollable WCE and has demonstrated the capability to image landmark areas in the upper gastrointestinal tract [11]. Figure 1 compares the concept of existing WCE with that of MCCE, which is under development.
Diagnostics 2024, 14, x FOR PEER REVIEW 2 of 20
Despite the increasing interest in upper gastrointestinal WCE, little research has been dedicated to identifying anatomical landmarks in WCE images [12], a limitation attributed to the nascent nature of this research domain [13,14]. Xu et al. proposed a multitask anatomy detection network for gastroscopy that classified 10 upper gastrointestinal anatomical structures with an average detection accuracy of 93.74% and a classification accuracy of 98.77% [15]. Takiyama et al. conducted an automatic anatomical classification study on 27,335 esophagogastroduodenoscopy (EGD) images categorized into larynx, esophagus, stomach, and duodenum using a GoogLeNet-based architecture [16]. They further extended their study to classify the upper, middle, and lower stomach, using a validation set of 13,048 images, and achieved an accuracy of 97% for the four-class classification and 99% for the three-position stomach classification. Thus, while studies exist on identifying upper gastrointestinal landmarks with conventional wired endoscopy, research on upper gastrointestinal landmarks in WCE images has only recently emerged [17,18], and further investigation of landmark identification in this context is needed.
The primary objective of this study was to propose a method that achieves uniform, high-accuracy classification of upper gastrointestinal landmarks from capsule endoscopic images. This approach aims to support landmark identification in images from active capsules within the upper gastrointestinal tract, in anticipation of their future commercial availability. Historically, capsule endoscopic images have required video processing because they are of lower quality and darker than wired endoscope images. To bring the accuracy achieved with WCE images closer to that achieved with wired endoscope images, we applied image preprocessing, which in turn required WCE images of the upper gastrointestinal tract. However, a significant challenge in this field is that, although public WCE datasets exist for the small and large intestines, no dedicated public dataset of WCE images of upper gastrointestinal landmarks is available [19]. Existing public datasets primarily focus on classifying normal versus abnormal findings across unspecified segments of the gastrointestinal tract [20]. While capsule endoscopy images of the stomach are available, they often lack coverage of the essential imaging areas defined for EGD, making it difficult to identify landmarks.
To address this gap, a realistic WCE dataset was created to support the future development of upper gastrointestinal WCE. To validate the similarity of this dataset to actual WCE images, small intestine datasets comprising both wired endoscopy and WCE images were used, namely the Kvasir and Kvasir-capsule datasets, which contain landmark images of the normal small intestine [19,21]. Our hypothesis was that transferring the colors of Kvasir-capsule images onto Kvasir images with a color transfer technique would yield similar classification accuracy after training DenseNet169. Testing this hypothesis confirmed that color-transferred landmark images closely resemble actual WCE images. Thresholds and error ranges were then established from this comparison and applied to the upper gastrointestinal landmark dataset to create a comprehensive dataset [22,23]. The resulting WCE image dataset for the upper gastrointestinal tract, validated through this hypothesis, demonstrates performance similar to that obtained using WCE images captured from the actual upper gastrointestinal tract. Moreover, following image preprocessing, this dataset has the potential to enhance the accuracy of landmark identification when applied to wired endoscopy images. The planned preprocessing combines the color-transferred dataset with deep learning techniques to improve the accuracy of classifying images of upper gastrointestinal anatomical landmarks. With this process, we anticipate that even low-quality WCE images will achieve uniform, high accuracy comparable to that of high-quality wired endoscope images. The dataset is also expected to support identification of the landmarks required for EGD-based imaging, contributing to the development of upper gastrointestinal WCE ahead of its anticipated commercialization.
This paper is organized as follows. Section 2 reviews upper gastrointestinal tract landmark classification, introduces the dataset, and describes the experiments for model selection and resolution comparison. It also details the hypothesis formulation, the selection of an image similarity measure for building a dataset that mimics WCE images, the image similarity experiment using the Kvasir and Kvasir-capsule datasets, the construction of a WCE dataset for the upper gastrointestinal tract, and the image similarity verification, including the resulting threshold and error range selection. Section 3 (Results) describes an experiment measuring landmark classification accuracy with the DenseNet169 model on the WCE datasets from Section 2 and reports the accuracy gains achieved through image preprocessing. Section 4 (Discussion) discusses the experimental results, considers limitations, and proposes future research directions. Section 5 (Conclusions) summarizes this research.

Related Research and Model Selection
The adoption of deep learning has significantly affected medical imaging, and this influence has been further amplified by convolutional neural networks (CNNs) [24,25]. In particular, various CNN-based deep learning models have been developed, offering new approaches to medical image analysis, and research in this area has flourished [26][27][28]. Whereas earlier work relied heavily on the performance of capsule endoscopy equipment, recent advances in deep learning have spurred a range of innovative CNN-based studies. As related work, we reviewed, by year, studies using anatomical landmark data captured with wireless capsule endoscopes; Table 1 summarizes this progress [14,[29][30][31][32][33][34][35][36][37]. These studies used a variety of CNN models, but several reported accuracies that varied widely and inconsistently [14,[30][31][32][33][34][35][36]. Because model performance can depend on the characteristics of each dataset, we searched for an effective CNN model for our dataset by applying the models most commonly used in the latest studies in Table 1 (ResNet, DenseNet, and Inception) to the original dataset. Among these models, DenseNet achieved the highest average classification accuracy, 93.28%. Since DenseNet has several variants, we trained and evaluated DenseNet121, DenseNet169, and DenseNet201 to identify the best performer, obtaining accuracies on the original data of 92.26%, 93.28%, and 91.17%, respectively. DenseNet169 was therefore selected for our study because it achieved the highest classification accuracy. DenseNet is well suited to transfer learning and performs well on classification problems with small datasets, which fits our research [38,39]. Table 2 presents the deep learning results for each model on the original dataset. DenseNet169 has also shown superior performance in various contexts in other studies [40]. For example, Abbas et al. successfully employed DenseNet169 in a five-stage automatic detection and classification system for hypertensive retinopathy, demonstrating its efficacy in classification tasks [41], and Farag et al. built a novel DenseNet169-based architecture that classified the severity of diabetic retinopathy with 97% accuracy [42]. Based on this collective evidence, DenseNet169 was chosen for this study.

Original Dataset
This study used a dataset of 2526 images of upper gastrointestinal landmarks obtained from endoscopic procedures conducted at Yonsei Severance Hospital. The dataset covers five landmark classes: Angulus, Antrum, Body A, Body B, and Cardia and Fundus. Table 3 gives the distribution of images in the original dataset, and Figure 2 shows example images for each of the five upper gastrointestinal landmark classes.

Hypothesis Formulation
Figure 3 shows the workflow of this study. Because WCE images targeting upper gastrointestinal landmarks were difficult to obtain, a hypothesis was developed to create an alternative dataset: endoscopic images of a quality similar to that of WCE images should yield comparable outcomes in practice. To validate this hypothesis, the Kvasir-capsule dataset, which contains WCE images, and the Kvasir dataset, which contains wired endoscopy images, were used. Because WCE images are scarce and available mainly for the small intestine, a comparative analysis of these two datasets was deemed suitable. Applying color transfer to the Kvasir dataset to mimic the colors of actual small intestine WCE images was expected to yield results similar to those obtained with the Kvasir-capsule dataset. The validation plan rested on the premise that, if color transfer was confirmed to produce such similarity, analogous outcomes could be expected for upper gastrointestinal landmarks. This similarity was quantified using the Euclidean distance.

Validation of Alternative WCE Datasets
Selection of the Image Similarity Measurement Method
To demonstrate the similarity between color-transferred wired endoscope images and actual WCE images, this study assessed image similarity. Of the many methods for evaluating similarity between images, the Euclidean distance (a distance-based measure) and cosine similarity (an angle-based measure) are among the most commonly used [43][44][45]. The Euclidean distance measures the distance between vectors to gauge their similarity; values closer to 0 indicate greater similarity [46]. Heidari et al. used color-based descriptors as image features and applied the Euclidean distance for image comparisons [47], and Wang et al. introduced an intuitive Euclidean distance measure for images, referred to as the image Euclidean distance [48]. The Euclidean distance is expressed as follows:
$$ d(p, q) = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i\right)^2} \tag{1} $$
where $p$ and $q$ are vectors in $n$-dimensional space.
Conversely, cosine similarity measures image similarity by the cosine of the angle between vectors, yielding values between −1 and 1 [49]. Cosine similarity is given by
$$ \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} $$
where $\lVert A \rVert$ and $\lVert B \rVert$ are the Euclidean norms of vectors $A$ and $B$, respectively. The Euclidean distance is suitable when deviations in shape values are small, whereas cosine similarity is effective when these deviations are substantial [50]. Because this study focuses on color similarity rather than shape, and because features are harder to extract from upper gastrointestinal landmarks than from other datasets, less emphasis was placed on shape values, and the Euclidean distance was judged to be better aligned with the study's objectives. Nevertheless, the subtle shape differences between classes were preserved in the Euclidean distance measurements [51]. A VGG16 model pre-trained on the ImageNet dataset was used to extract the image features for the Euclidean distance measurements [52].
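Equation (1) and the cosine measure can be illustrated with a minimal NumPy sketch. It assumes the inputs are feature vectors already extracted from the images (in this study, with an ImageNet-pretrained VGG16, a step not reproduced here); the function names are illustrative rather than taken from the study.

```python
import numpy as np


def euclidean_distance(p, q):
    """Equation (1): square root of the summed squared coordinate differences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))


def cosine_similarity(a, b):
    """Dot product divided by the product of Euclidean norms; ranges from -1 to 1."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(v):
    """Scale a feature vector to unit Euclidean norm."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)
```

For unit-length vectors the two measures are linked by $d^2 = 2 - 2\cos\theta$, so on normalized features they rank image pairs identically; the choice between them matters mainly for unnormalized features, where the Euclidean distance also reflects differences in magnitude.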

Comparative Analysis of the Kvasir and Kvasir-Capsule Datasets
To validate our hypothesis, two proof procedures were implemented. First, the colors of Kvasir-capsule images were transferred to Kvasir images using the well-known color transfer algorithm of Reinhard et al. [53]. This algorithm transfers only color: after converting the RGB image to the lαβ color space, it shifts and scales each channel of the original image so that its mean and standard deviation (SD) match those of the target image. The structural features of the original image therefore remain untouched, and only the color is transferred to the resulting image. In a similar context, Yin et al. used color transfer of skin color to identify faces by race [54]. The color transfer is expressed as follows:
$$ I^k = \frac{\sigma_t^k}{\sigma_s^k}\left(S^k - \mu_s^k\right) + \mu_t^k $$
where $I^k$ is the color-transferred image for channel $k$; $S^k$ and $T^k$ are channel $k$ of the source (original) image and the target image, respectively; $\mu_s^k$ and $\mu_t^k$ are their means; and $\sigma_s^k$ and $\sigma_t^k$ are their SDs. $k \in \{l, \alpha, \beta\}$ denotes the color channel, where $l$ is the achromatic channel and $\alpha$ and $\beta$ are the chromatic yellow-blue and red-green opponent channels, respectively.
Subsequently, DenseNet169 was trained with a 320 × 320 input size, separately on the color-transferred Kvasir dataset and on the Kvasir-capsule dataset. The classification accuracy reached 98.95% for the color-transferred Kvasir dataset and 95.92% for the Kvasir-capsule dataset. Although the two accuracies differ by approximately 3 percentage points, both exceed 95%, indicating similar performance. Further analysis using the Euclidean distance to measure image similarity yielded values ranging from approximately 0.8 to 1.2. Table 4 breaks down the Euclidean distance values for three classes, each consisting of 500 Kvasir images color-transferred using the Kvasir-capsule dataset. Together, these two validation approaches confirmed that color-transferred images yield deep learning accuracy comparable to that of actual WCE images. Figure 4 shows examples of color-transferred Kvasir images and Kvasir-capsule images.
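The per-channel mean and SD matching described above can be sketched in NumPy as follows. This is a minimal sketch assuming float RGB images in [0, 1]; the RGB-to-LMS matrix and the log-LMS-to-lαβ transform follow Reinhard et al., while the function names, the epsilon guard on the logarithm, the handling of zero-SD channels, and the final clipping are our own assumptions rather than details stated in this study.

```python
import numpy as np

# RGB -> LMS matrix and log-LMS -> l-alpha-beta transform (Reinhard et al.).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1.0, 1.0, 1.0], [1.0, 1.0, -2.0], [1.0, -1.0, 0.0]])
# Inverses are computed numerically so the round trip is exact up to rounding.
LMS2RGB = np.linalg.inv(RGB2LMS)
LAB2LMS = np.linalg.inv(LMS2LAB)
EPS = 1e-6  # assumed guard against log10(0) for black pixels


def rgb_to_lab(rgb):
    """(H, W, 3) float RGB in [0, 1] -> (H, W, 3) l-alpha-beta."""
    lms = rgb.reshape(-1, 3) @ RGB2LMS.T
    log_lms = np.log10(np.clip(lms, EPS, None))
    return (log_lms @ LMS2LAB.T).reshape(rgb.shape)


def lab_to_rgb(lab):
    """Inverse of rgb_to_lab (no clipping is applied here)."""
    lms = 10.0 ** (lab.reshape(-1, 3) @ LAB2LMS.T)
    return (lms @ LMS2RGB.T).reshape(lab.shape)


def reinhard_transfer(source_rgb, target_rgb):
    """Recolor `source_rgb` (e.g. a wired-endoscopy image) so that the mean and
    SD of each l, alpha, beta channel match those of `target_rgb` (a WCE image)."""
    s, t = rgb_to_lab(source_rgb), rgb_to_lab(target_rgb)
    s_mu, s_sd = s.mean(axis=(0, 1)), s.std(axis=(0, 1))
    t_mu, t_sd = t.mean(axis=(0, 1)), t.std(axis=(0, 1))
    # sigma_t / sigma_s per channel; a constant source channel is only shifted.
    scale = np.divide(t_sd, s_sd, out=np.ones_like(t_sd), where=s_sd > 0)
    out = scale * (s - s_mu) + t_mu
    return np.clip(lab_to_rgb(out), 0.0, 1.0)
```

In this setting, `source_rgb` would be an image from the wired endoscopy dataset and `target_rgb` a WCE image supplying the colors; because only channel statistics are changed, image structure is preserved, as the text notes.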

WCE Dataset Construction Using Color Transfer
Building on the comparison between the color-transferred Kvasir and Kvasir-capsule datasets, a WCE dataset for the upper gastrointestinal tract was created by transferring the colors of capsule endoscopy images to images obtained through conventional wired endoscopy. To address the initial imbalance in the number of images across classes, data augmentation was applied, including top-down inversion (vertical flip), left-right inversion (horizontal flip), and a 20-degree tilt (rotation), as detailed in Table 3. This augmentation produced 6400 images, 1280 in each class [55].
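A minimal NumPy sketch of the three augmentations and of topping up each class to 1280 images is given below. Assumptions: images are (H, W, C) arrays; the 20-degree tilt uses nearest-neighbour rotation with black corners; and the hypothetical `balance_class` helper cycles through transforms and originals in an illustrative order, since the study does not state how augmented copies were chosen. Five classes of 1280 images give the reported 6400 images.

```python
import numpy as np


def rotate_nearest(img, degrees):
    """Rotate an (H, W, C) image about its centre with nearest-neighbour
    sampling; output pixels that map outside the source are set to 0.
    The sign of `degrees` selects the rotation direction."""
    h, w = img.shape[:2]
    t = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: source coordinates for every output pixel.
    ys = np.cos(t) * (yy - cy) + np.sin(t) * (xx - cx) + cy
    xs = -np.sin(t) * (yy - cy) + np.cos(t) * (xx - cx) + cx
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.zeros_like(img)
    out[valid] = img[yi[valid], xi[valid]]
    return out


AUGMENTATIONS = (
    np.flipud,                            # top-down inversion
    np.fliplr,                            # left-right inversion
    lambda im: rotate_nearest(im, 20.0),  # 20-degree tilt
)


def balance_class(images, target=1280):
    """Hypothetical helper: append augmented copies until a class holds
    `target` images (a class that is already large enough is unchanged)."""
    out = list(images)
    k = 0
    while len(out) < target:
        src = images[(k // len(AUGMENTATIONS)) % len(images)]
        out.append(AUGMENTATIONS[k % len(AUGMENTATIONS)](src))
        k += 1
    return out
```

If a class needs more than three augmented copies per original image, this helper starts repeating transforms and therefore produces duplicates; combining transforms would be one way to avoid that.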
Three WCE datasets were selected as sources of target images: those of Liao et al. [56] and Rahman et al. [57], and the Gastrointestinal Bleeding WCE dataset [58]. The color information extracted from these target datasets was then applied to the original images obtained through wired endoscopy at Severance Hospital, yielding a new dataset of WCE images for the upper gastrointestinal tract. Figure 5 illustrates example images demonstrating color transfer, and Figure 6 displays the color-transferred images obtained with the three target datasets. The three resulting datasets were designated the L, R, and G datasets, respectively: the L dataset was derived from Liao et al. [56], the R dataset from Rahman et al. [57], and the G dataset from the normal images within the Gastrointestinal Bleeding WCE dataset [58]. These datasets served as references for our study.

Validation of Image Similarity: Measurement of Euclidean Distance
Image similarity was assessed by comparing three reference images, which are actual upper gastrointestinal WCE images, with the three resulting datasets. The Euclidean distance was calculated for each of the five classes in the three datasets. For each class, the distances of all 1280 images from the target image were computed, and the analysis focused on the closest and farthest values. Table 5 presents the Euclidean distance measurements between the target image and the three datasets; the left side of the table lists the closest distances and the right side the farthest distances.
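Assuming each image has already been mapped to a feature vector (for example, by the VGG16 extractor mentioned earlier), the closest and farthest distances per class can be obtained with a short NumPy sketch. The function names and the dictionary layout keyed by class name are illustrative assumptions.

```python
import numpy as np


def distance_extremes(reference, features):
    """Euclidean distance from one reference feature vector to each row of
    `features` (one row per image); returns (closest, farthest)."""
    reference = np.asarray(reference, dtype=float)
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features - reference, axis=1)
    return float(d.min()), float(d.max())


def table_rows(reference, features_by_class):
    """Hypothetical helper: (closest, farthest) distance for every class,
    e.g. {"Angulus": array of shape (1280, n_features), ...}."""
    return {name: distance_extremes(reference, feats)
            for name, feats in features_by_class.items()}
```

Running `table_rows` once per resulting dataset (L, R, and G) would give one set of closest and farthest values per class, which is the layout Table 5 reports.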

Experimental Methods
An experiment was designed to classify upper gastrointestinal landmarks using deep learning on the three datasets. The training, validation, and test sets comprised 3840, 1280, and 1280 images, respectively, a 6:2:2 ratio. The model was DenseNet169, implemented in TensorFlow with a batch size of 16 and stochastic gradient descent as the optimizer. Two experiments were conducted within this framework:
• Accuracy comparison across the three datasets at five image sizes;
• Accuracy evaluation of the datasets at five image sizes with different image preprocessing techniques (sharpen and detail filters).
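The study states only the 6:2:2 ratio. Splitting each class separately (a stratified split) is our assumption, chosen because it reproduces the reported 3840/1280/1280 counts for 1280 images per class. The sketch below is illustrative; `split_indices` and its `seed` parameter are hypothetical, and the DenseNet169 training itself is not shown.

```python
import numpy as np


def split_indices(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Stratified train/validation/test split of image indices.

    Each class is shuffled independently and cut according to `ratios`
    (6:2:2 by default); the test split takes whatever remains.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(ratios[0] * len(idx)))
        n_val = int(round(ratios[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)
```

With five classes of 1280 images, each class contributes 768, 256, and 256 images to the three splits.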
The first experiment classified upper gastrointestinal landmarks using DenseNet169 at various input image sizes. Although the prevalent image size for commercial WCE is 320 × 320 [59], in previous research, Iqbal et al. […] [60,61], which was undertaken to enhance accuracy and speed. The analysis covered input images of 128 × 128, 256 × 256, 384 × 384, and 512 × 512 pixels. The goal was to align the results with those of previous studies while considering the image quality of current commercial WCE images [62,63]. The second experiment evaluated the accuracy of the datasets at the five image sizes after applying image preprocessing techniques, namely the sharpen and detail filters. Together, these experiments examined how image size and preprocessing affect DenseNet169-based classification of upper gastrointestinal landmarks.
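The sharpen and detail filters can be illustrated as 3 × 3 convolutions. The study does not list its kernels, and it implemented its filters with scikit-image; as an assumption, this NumPy sketch instead uses the 3 × 3 kernels that Pillow defines for `ImageFilter.SHARPEN` and `ImageFilter.DETAIL`. The order used for the combined "sharpen and detail" setting (sharpen first) is also assumed.

```python
import numpy as np

# Assumed kernels (Pillow's SHARPEN and DETAIL); each sums to 1, so flat
# regions are left unchanged and only local contrast is amplified.
SHARPEN = np.array([[-2, -2, -2], [-2, 32, -2], [-2, -2, -2]], dtype=float) / 16.0
DETAIL = np.array([[0, -1, 0], [-1, 10, -1], [0, -1, 0]], dtype=float) / 6.0


def filter3x3(img, kernel):
    """Apply a 3x3 kernel to every channel of an (H, W, C) float image in [0, 1].

    Borders are handled by edge replication. The loop computes a correlation,
    which equals a convolution here because both kernels are symmetric.
    """
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0.0, 1.0)


def sharpen_and_detail(img):
    """Combined setting: sharpen first, then detail (order is an assumption)."""
    return filter3x3(filter3x3(img, SHARPEN), DETAIL)
```

The sharpen kernel weights the centre pixel twice as heavily as the result and subtracts its eight neighbours, whereas the detail kernel applies a milder boost using only the four direct neighbours, which is consistent with the expectation that sharpening highlights feature points more strongly.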

Accuracy Comparison among Three Datasets
This study compared the landmark classification accuracy of DenseNet169 on the three datasets. The average classification accuracy was 92.62% on the L dataset, 92.41% on the R dataset, and 92.38% on the G dataset. The L dataset thus achieved the highest average accuracy, and all three datasets maintained an average classification accuracy above 92%.

Classification Accuracy for Different Input Image Sizes
Among the five image sizes, 128 × 128 and 256 × 256 yielded the highest landmark classification accuracy. Two explanations are proposed. First, DenseNet uses dense blocks of several convolutional layers whose dense connections facilitate feature reuse; in this experiment, smaller images produced more accurate landmark classifications than larger ones. Second, the relatively small dataset (6400 images) lends itself to effective transfer learning with small data, which suggests that larger input images may lower accuracy. Consistent with these factors, Table 6 shows that accuracy decreases slightly as the image size increases.

Classification Accuracy among Different Classes
In the experiment, the Cardia and Fundus class achieved the highest accuracy of the five classes, whereas the Body A class achieved the lowest. In the Cardia and Fundus images, the narrow features of the esophagus were prominent, especially around the cardia, and the fundus was characterized by several parallel wrinkles in its upper part. Because these feature points differ from those of the other classes, the network appears to have learned to extract them successfully, which lowered the probability of misclassification for this class. Conversely, Body A had the lowest accuracy. Although Body A and Body B depict the same anatomical region, Body A was captured in the greater curvature area and Body B in the lesser curvature area, resulting in different shooting positions. However, the wrinkle shapes of the two classes did not differ significantly, and many images showed smooth stomach walls, making it difficult to extract distinctive feature points for either class. Consequently, the two classes were hard to distinguish during deep learning, leading to a high probability of misclassification. Figure 7 shows examples of correctly and incorrectly classified images for Body A and Body B; as these examples show, Body A and Body B are difficult to distinguish even by human visual inspection.

Classification Accuracy through Image Preprocessing Using Sharpen and Detail Filters
Image preprocessing has improved outcomes in many settings. In the second experiment, the goal was to improve on the accuracy obtained in Experiment 1 by applying three filter settings (sharpen, detail, and sharpen and detail) to each of the three datasets. The sharpen filter was adopted following the approach of Nascimento et al. [64], in which sharpening yielded better performance than the original images. In addition, Hentschel et al. [65] suggested that the detail filter, which increases sharpness, improves image quality. Both filters were implemented using Python's scikit-image package [66]; applying the three filter settings to each of the three datasets produced nine new datasets. The sharpen filter was expected to emphasize feature points, and the detail filter to improve accuracy by rendering fine structures more clearly; combining both filters was hypothesized to yield the highest accuracy. The nine datasets were trained at five different image sizes, in the same training environment as in Experiment 1. The results are presented in Table 7. In addition to accuracy, precision, recall, and the F1 measure, all derived from the confusion matrix, were reported to give a more complete assessment than accuracy alone. These indicators are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
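The text names the two filters but not their kernels. As a minimal sketch, assuming the common 3 × 3 kernels that Pillow uses for its built-in ImageFilter.SHARPEN and ImageFilter.DETAIL filters (the study reports using scikit-image, so its exact operators may differ), both filters can be written as normalized convolutions over a grayscale image. The helper names, the edge-replication border handling, and the sharpen-then-detail order for the combined filter are assumptions of this sketch.

```python
# Hypothetical sketch: "sharpen" and "detail" filters as normalized 3x3 kernels.
# Kernel weights follow Pillow's built-in SHARPEN and DETAIL filters (assumption).

SHARPEN_KERNEL = [[-2, -2, -2],
                  [-2, 32, -2],
                  [-2, -2, -2]]
SHARPEN_SCALE = 16  # sum of the weights, so flat regions are unchanged

DETAIL_KERNEL = [[0, -1, 0],
                 [-1, 10, -1],
                 [0, -1, 0]]
DETAIL_SCALE = 6  # sum of the weights


def convolve3x3(image, kernel, scale):
    """Apply a 3x3 kernel to a grayscale image (list of rows of 0-255 ints).

    Borders replicate the nearest edge pixel; each result is divided by
    `scale`, rounded, and clipped to 0-255. The kernels are symmetric, so
    correlation and convolution give the same result.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates to the image to replicate edge pixels.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = min(255, max(0, round(acc / scale)))
    return out


def sharpen(image):
    return convolve3x3(image, SHARPEN_KERNEL, SHARPEN_SCALE)


def detail(image):
    return convolve3x3(image, DETAIL_KERNEL, DETAIL_SCALE)


def sharpen_and_detail(image):
    # Assumed order for the combined filter: sharpen first, then detail.
    return detail(sharpen(image))
```

Because each kernel is divided by the sum of its weights (16 and 6), flat regions pass through unchanged and only local intensity differences, such as the edges of gastric folds, are amplified.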

Accuracy Comparison between the Datasets
This analysis compared the classification accuracy of DenseNet169 across the filtered and unfiltered datasets. Datasets processed with both the sharpen and detail filters achieved the highest classification accuracy. The sharpen filter alone led to a slight increase in accuracy, whereas the detail filter alone showed no notable improvement. Applying both filters together, however, increased overall accuracy relative to the baseline datasets. These results indicate that the combined sharpen and detail preprocessing was the most effective of the three filter settings tested.

Classification Accuracy across Different Input Image Sizes
Among the five image sizes tested, 128 × 128 and 320 × 320 yielded the highest average accuracy. Notably, the R sharpen and detail dataset at 512 × 512 showed the largest improvement, 2.74 percentage points, rising from 91.32% on the original R dataset to 94.06%. The second-largest improvement was observed for the G sharpen and detail dataset at 512 × 512, which rose by 1.59 percentage points, from 91.87% on the original G dataset to 93.46%. Whereas DenseNet169's average accuracy had previously stood out mainly at the 128 × 128 image size, applying the sharpen and detail filters improved classification accuracy even at larger image sizes, in contrast to the earlier experimental results.

Classification Accuracy among Different Classes
Among the five classes, the highest accuracy was achieved for the Cardia and Fundus class and the lowest for the Body A class, consistent with the findings in Section 3.1. However, misclassification decreased for each class when the image filters were applied. Figure 8 presents the confusion matrices for the three datasets with the sharpen and detail filters applied; the left matrix corresponds to an input size of 320 × 320 and the right matrix to 512 × 512.
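Confusion matrices such as those in Figure 8 are also the basis for the precision, recall, and F1 values reported alongside accuracy. A minimal sketch of that computation, assuming rows index the true class and columns the predicted class, and treating each class one-vs-rest (TP on the diagonal, FP elsewhere in the column, FN elsewhere in the row). The function names and the unweighted (macro) averaging are assumptions, not details given in the text.

```python
# Hypothetical sketch: accuracy and per-class precision/recall/F1 from a
# confusion matrix. Assumed convention: cm[i][j] counts images whose true
# class is i and whose predicted class is j.


def accuracy(cm):
    """Fraction of all images that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_metrics(cm):
    """Return one (precision, recall, f1) tuple per class, one-vs-rest."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp  # truly k, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results


def macro_average(results):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(results)
    return tuple(sum(r[m] for r in results) / n for m in range(3))
```

Per-class recall is the fraction of a class's images that are classified correctly, so large off-diagonal counts between the Body A and Body B rows would show up directly as lower recall for those two classes.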

Discussion
In this study, the absence of commercially available upper gastrointestinal landmark datasets was addressed by generating and validating wireless capsule endoscopy (WCE) data based on a developed hypothesis. First, color transfer was shown to make a small intestine dataset similar to real WCE images, and the same technique was then extended to the upper gastrointestinal tract dataset. Ideally, actual WCE images of upper gastrointestinal landmarks would be used; however, because no commercially available active capsule endoscope can yet image these landmarks, experiments were conducted with datasets resembling the real environment. Obtaining real upper gastrointestinal landmark data in the future would considerably strengthen the validation of image similarity for our hypothesized dataset. The per-class classification accuracy for the five landmarks was also informative. Across several experiments, accuracy was highest for the Cardia and Fundus classes, followed by the Angulus class. The distinctive shapes of the stomach folds and stomach wall in these classes likely contributed to their high accuracy. In contrast, Body A and Body B showed the lowest average classification accuracy of the five classes, indicating that images in these two classes differ only subtly. Collecting more data and refining the model structure may improve accuracy, and further research is needed to separate these two classes more clearly. The DenseNet169 model was effective at reusing learned features and facilitating gradient propagation, which is particularly useful for increasingly high-resolution medical imaging data. However, its deep architecture and dense connectivity lead to substantial memory usage, requiring considerable computational resources and training time. To address these challenges and keep training responsive and resource-efficient, especially at higher resolutions, the model design needs further refinement.
This study also examined the impact of image size on classification accuracy and found that accuracy decreased as image size increased. To improve overall classification accuracy, sharpen and detail filters were applied in the second experiment, which improved accuracy at all image sizes. This finding suggests that simple image preprocessing could benefit other medical image classification studies.

Conclusions
This study hypothesized that wireless capsule endoscopy (WCE) images of upper gastrointestinal landmarks could be simulated, created 6400 images augmented from 2546 original images, and validated the hypothesis through dataset collection. The Euclidean distance was then used to assess image similarity. Classification with the DenseNet169 model showed that the three color-transferred datasets achieved accuracy highly similar to that obtained on WCE images. Furthermore, training DenseNet169 on datasets preprocessed with image filters improved accuracy compared with the first experiment.
In the evolving market for endoscopy equipment, this research focused on identifying landmarks that future active capsule endoscopes are expected to capture and on improving the classification accuracy of capsule endoscope images through image preprocessing, using color transfer applied to wired endoscope images. While current wired endoscopes offer high image quality and a clear view, capsule endoscope imaging still requires improvements in image quality. Our findings indicate that, with sharpen and detail filtering as preprocessing, deep learning models trained on capsule endoscope images can achieve accuracy comparable to models trained on wired endoscope images. The medical significance is that image preprocessing allows capsule endoscopy images to produce results similar to those of wired endoscopy despite their lower image quality. These advances are expected to enable more precise diagnostic assistance in gastroscopy using images captured by future capsule endoscopy equipment for the upper gastrointestinal tract.

Figure 3
Figure 3 depicts the workflow of the study. Because wireless capsule endoscopy (WCE) images of upper gastrointestinal landmarks were difficult to obtain, a hypothesis was developed to create an alternative dataset. The hypothesis posited that endoscopic images of quality similar to WCE images would yield comparable outcomes when applied in practice. To validate this hypothesis, the Kvasir-capsule dataset, which contains WCE images, and the Kvasir dataset, which comprises wired endoscopy images, were used. Given the limited availability of WCE images, most of which depict the small intestine, a comparative analysis of these two datasets was deemed suitable. Applying color transfer to the Kvasir dataset to mimic the colors of actual small intestine WCE images was expected to yield results similar to those obtained with the Kvasir-capsule dataset. The validation plan therefore postulated that if color transfer was confirmed to induce this similarity, analogous outcomes could be expected for upper gastrointestinal landmarks. The similarity was quantified using the Euclidean distance between images.
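The text does not state which image representation the Euclidean distance was computed on. As a minimal sketch, assuming two images of equal size compared directly on their RGB pixel values, the distance is the square root of the summed squared differences over all pixels and channels, so a smaller value indicates greater similarity. The function names and the pixel-list representation are hypothetical.

```python
import math

# Hypothetical sketch: Euclidean (L2) distance between two equally sized images,
# each given as a list of (r, g, b) pixel tuples.


def flatten(image):
    """Concatenate all channel values of all pixels into one flat list."""
    return [value for pixel in image for value in pixel]


def euclidean_distance(image_a, image_b):
    """Square root of the summed squared differences (0.0 means identical)."""
    a, b = flatten(image_a), flatten(image_b)
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return math.dist(a, b)


def rms_difference(image_a, image_b):
    """Distance divided by sqrt(number of values), comparable across image sizes."""
    return euclidean_distance(image_a, image_b) / math.sqrt(len(flatten(image_a)))
```

Comparing images of different resolutions would require resizing them to a common size first; the root-mean-square variant removes the dependence of the raw distance on the number of pixels.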

Figure 3 .
Figure 3. Workflow of the study.

Figure 4 .
Figure 4. Illustration of color transfer and Kvasir-capsule images: (a) example image of Kvasir-capsule (1); (b) example image of Kvasir-capsule (2). These two images serve as examples for calculating the Euclidean distance. (c) Demonstration of color transfer from the Kvasir-capsule image to the Kvasir dataset image. From top to bottom, the images show the normal cecum, normal pylorus, and normal z-line.

Figure 5 .
Figure 5. Example images demonstrating color transfer.This process illustrates the application of color transfer to the original images based on the target images.The original images were sourced from wired endoscopy, and the target images represent examples from one of the three target datasets, namely, the Gastrointestinal Bleeding WCE dataset.
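The captions do not name the color-transfer algorithm. One widely used method, that of Reinhard et al., matches the per-channel mean and standard deviation of the original image to those of the target image in the lαβ color space. The sketch below applies the same statistics matching directly to RGB channels for brevity, which is a simplification; it illustrates the general technique under that assumption rather than the procedure used in this study, and the function names are hypothetical.

```python
import statistics

# Hypothetical sketch of statistics-matching color transfer in the spirit of
# Reinhard et al., simplified to RGB channels instead of the lαβ color space.
# Images are lists of (r, g, b) tuples with values in 0-255.


def channel_stats(pixels, channel):
    """Mean and population standard deviation of one color channel."""
    values = [p[channel] for p in pixels]
    return statistics.fmean(values), statistics.pstdev(values)


def color_transfer(original, target):
    """Recolor `original` so each channel matches the mean/std of `target`."""
    params = []
    for c in range(3):
        o_mean, o_std = channel_stats(original, c)
        t_mean, t_std = channel_stats(target, c)
        # A constant channel has no spread to rescale; map it to the target mean.
        scale = t_std / o_std if o_std > 0 else 0.0
        params.append((o_mean, t_mean, scale))

    recolored = []
    for pixel in original:
        # Center on the original mean, rescale the spread, shift to the target
        # mean, then round and clip to the valid 8-bit range.
        new_pixel = tuple(
            min(255, max(0, round((pixel[c] - src_mean) * scale + dst_mean)))
            for c, (src_mean, dst_mean, scale) in enumerate(params)
        )
        recolored.append(new_pixel)
    return recolored
```

Only global color statistics are transferred, so the structure of the original image (folds, mucosal texture) is preserved while its overall color cast moves toward that of the target image.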

Figure 7 .
Figure 7. Examples of correctly and incorrectly classified images for the Body A and Body B classes: (a) correctly classified images of Body A; (b) incorrectly classified images of Body A; (c) correctly classified images of Body B; (d) incorrectly classified images of Body B.

Table 1 .
Studies on landmarks in wireless capsule endoscopy by year.

Table 2 .
Accuracy comparison of five models on the original dataset.

Table 3 .
Distribution of images in the original dataset.

Table 4 .
Euclidean distance measurements for the color-transferred Kvasir dataset and the Kvasir-capsule dataset.

Table 6 .
Comparison of results across five input sizes for L, R, and G datasets.