1. Introduction
Globally, esophageal cancer (EC) is a considerable public health concern. The two primary histological types, squamous cell carcinoma (SCC) and adenocarcinoma (ACE), exhibit substantial differences in their incidence patterns and critical etiological factors, complicating the understanding and prevention of this illness. Their elevated mortality rate is the primary trait they share [
1]. SCC is common in certain regions of Africa and East Asia, often linked to lifestyle factors such as smoking and alcohol consumption. In contrast, ACE has become common in Western countries and is associated with Barrett’s esophagus, gastroesophageal reflux disease (GERD), and obesity. Dysphagia, unexplained weight loss, and a chronic cough are nonspecific signs of EC that may lead to late-stage detection and poor prognosis; the overall five-year survival rate typically ranges from 15% to 25% [
2]. Patients undergo contemporary treatment modalities such as chemotherapy, surgery, and radiotherapy, contingent upon the type and stage of cancer [
3].
Medical imaging datasets are fundamental to enhancing EC diagnosis. The National Taiwan University Hospital Yun-Lin Branch dataset comprises 2400 images divided into three subsets, training (1800), validation (400), and test (200), that are classified into eight groups reflecting different gastrointestinal and esophageal disorders: malignancy, normal tissue, staining, esophageal junction, varicose veins, duodenum, stomach, and inflammation. This dataset has been used to classify EC with machine learning and deep learning models such as logistic regression, VGG16, the Polynomial Classifier, and YOLOv8. YOLOv8 outperformed the other models in object detection, whereas the other models contributed to the analysis of complicated patterns and binary categories.
Hyperspectral imaging (HSI), or spectroscopy imaging, examines the interaction of light with the material under analysis. It is a hybrid approach that integrates imaging and spectroscopy. Collecting spectral data from each pixel of a two-dimensional (2-D) array detector generates a three-dimensional (3-D) dataset of spectral and spatial information [
4]. This spatial information indicates the origin of each spectrum in the samples, enabling a more accurate analysis in relation to the environmental lighting conditions. Moreover, HSI covers a continuous spectrum of light, featuring several spectral bands and an elevated spectral resolution. Consequently, HSI can capture the spectral fluctuation under two-dimensional disparate environmental situations [
5]. The spectral reflectance signature curve of an image is analyzed pixel by pixel across the full spectrum. By comparison, an RGB image has only three bands, at red, green, and blue wavelengths; the intensity curve of an RGB pixel quantifies the amount of light that a given target or object transmits, reflects, or emits [
6]. Optical imaging has also advanced beyond the existing White Light Imaging (WLI) and Narrow Band Imaging (NBI): hyperspectral imaging (HSI) offers spectrally and spatially richer information for disease detection. As recent studies summarize, HSI is used in biomedical imaging to better characterize tissues across multiple wavelengths, which has enhanced diagnostic image accuracy. Unlike WLI and NBI, which use restricted spectral ranges, HSI provides continuous spectral data and therefore greater distinctiveness between normal and pathological tissues. Building on these developments, the present work proposes the Spectrum-Aided Vision Enhancer (SAVE) to transform ordinary WLI images into hyperspectral images, improving endoscopic image classification without the need for specific hardware [
7].
NBI functions as a specific HSI technique by acquiring images from limited wavelength bands. This technique improves image attributes, providing high efficacy in medical diagnostics and other tasks requiring precise discrimination [
8]. Two short-wavelength light beams, at 415 nm (blue) and 540 nm (green), are combined to enhance optical images in this method [
9]. Owing to tissue scattering and absorption characteristics, longer wavelengths penetrate more deeply through tissue structure. Medical diagnostics benefit significantly from 415 nm blue light, which clarifies superficial mucosal vascular features, and from the complementary 540 nm green light, which enhances the visibility of submucosal Intraepithelial Papillary Capillary Loops (IPCLs) [
10]. A clear color contrast is evident in the resulting images, with surface vessels depicted in brown and submucosal vessels in cyan. This approach enhances the visual differentiation between blood vessels and mucosal tissue [
11]. Multiple studies indicate that NBI yields more accurate diagnoses than WLI in terms of accuracy, sensitivity, and specificity.
This paper presents SAVE, a system that converts WLI images into HSI images using band selection to identify specific narrow bands. Multiple machine learning and deep learning architectures were trained on SAVE images alongside WLI images for EC classification. The trained models were compared in terms of sensitivity, precision, accuracy, and F1-score.
2. Materials and Methods
2.1. Dataset
The dataset from the National Taiwan University Hospital Yun-Lin Branch aims to improve the diagnosis of esophageal disorders using medical images analyzed by machine learning algorithms. The complete dataset consists of 2400 images, separated into three segments: 200 in the test set, 400 in the validation set, and 1800 in the training subset, as indicated in
Table 1. Eight distinct categories are employed to classify each image, representing various gastrointestinal and esophageal diseases essential for accurate diagnosis and treatment planning. The first three categories are malignancy, indicating malignant lesions in the esophagus and crucial for malignancy detection; normal, representing healthy esophageal tissue; and staining, which reveals alterations in tissue coloration that may imply pathological processes. Other groupings include the esophageal junction, which concentrates on the area where the esophagus and stomach converge, and varicose, which emphasizes irregularities in blood vessels. The dataset also includes images of the duodenum, representing the initial portion of the small intestine; the stomach, crucial for comprehensive gastrointestinal evaluations; and inflammation, indicative of inflammatory conditions impacting the esophagus.
The data utilized in this paper was collected retrospectively in the Department of Gastroenterology, Ditmanson Medical Foundation Chia-Yi Christian Hospital in 2019–2023 using an Olympus (Olympus Corporation, Hachioji, Tokyo, Japan) EVIS EXERA III system (CV-190 platform, GIF-HQ190 model). Images were of high quality (1920 × 1080, JPEG, sRGB) and were mostly extracted as still frames from routine endoscopic videos. This retrospective study was conducted in accordance with the ethical standards of the Institutional Research Board and the principles of the 1964 Declaration of Helsinki and its later amendments. The study protocol was approved by the Institutional Review Board of National Taiwan University Hospital (NTUH) (IRB No. NTUH-202410087RINA; approved on 1 January 2025). Written informed consent was waived because of the study’s retrospective, anonymized design. An initial pool of more than 5000 clinically validated images was reduced to 2400 after quality control and sorted into eight esophageal disease classes, including normal, dysplasia, and esophageal cancer. Trained annotators performed the labeling, and a senior gastroenterologist confirmed the labels to be clinically accurate. The augmentation methods used to address class imbalance included rotation, flipping, and contrast normalization, ensuring that the classes had equal representation during model training. Images with realistic artifacts were deliberately incorporated into the dataset to enhance model robustness and represent the variability of the clinical environment.
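The augmentation operations described above (rotation, flipping, and contrast normalization) can be sketched in a few lines of NumPy. The function names and the choice of a 90° rotation angle are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np

def rotate90(img):
    # Rotate the image by 90 degrees counter-clockwise.
    return np.rot90(img)

def hflip(img):
    # Mirror the image horizontally.
    return img[:, ::-1]

def contrast_normalize(img):
    # Stretch pixel intensities to the full [0, 1] range.
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def augment(img):
    # Produce the augmented variants used to balance the classes.
    return [rotate90(img), hflip(img), contrast_normalize(img)]
```

Each source image thus yields several variants, letting under-represented classes be oversampled without collecting new data.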
PCA
The principal component analysis (PCA) biplot with K-Means clustering visually represents the findings of the dimensionality reduction and clustering for the classification of endoscopic images into categories such as malignancy, inflammation, varicose veins, staining, and normal regions. The first two principal components (PC1 and PC2) account for 17.8% and 12.8% of the variance, respectively, facilitating a distinct delineation of clusters in the reduced feature space. Distinct clusters, such as “cancer” and “normal,” display considerable separation, indicating the model’s potential for precise classification. Nevertheless, specific clusters, such as “inflammation” and “varicose veins,” exhibit significant overlap, reflecting similarities in visual characteristics that could result in misdiagnosis. The cluster sizes, indicated by the number of images in each category, further highlight the data distribution, with the “varicose veins” cluster containing the largest sample size. The presence of overlapping regions indicates the necessity for further feature extraction methods or the incorporation of advanced classifiers to improve differentiation. The PCA visualization corroborates the efficacy of the K-Means clustering method while highlighting opportunities for enhancement to obtain more precise and dependable categorization in endoscopic image analysis, as illustrated in
Figure 1.
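The dimensionality reduction and clustering described above can be reproduced in miniature with NumPy alone. This is a minimal sketch (PCA via SVD plus a bare-bones K-Means), not the exact implementation behind the biplot; the explained-variance figures quoted in the text come from the study's data, which is not reproduced here:

```python
import numpy as np

def pca_explained_variance(X):
    # Fraction of total variance captured by each principal component.
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values of centered data
    var = s ** 2
    return var / var.sum()

def kmeans(X, k, iters=50, init=None, seed=0):
    # Minimal Lloyd's algorithm; returns one cluster label per row of X.
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Projecting the feature vectors onto the first two components and plotting the K-Means labels yields the kind of biplot shown in Figure 1.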
2.2. Model Architecture
2.2.1. Logistic Regression
Logistic regression is a fundamental statistical method utilized for medical image classification [
12]. We employed logistic regression to classify EC images by analyzing grayscale image data [
13].
Input images X with dimensions 256 × 256 are initially flattened into a vector of 65,536 elements. The logistic regression model corresponds to a single-layer neural network that linearly transforms the input features via the weight matrix W and the bias term b, as delineated in Equation (1). The linear combination of the input features constitutes Z. After the linear combination Z is generated, it passes through the SoftMax activation function, yielding predicted probabilities for all C cancer types, as delineated in Equation (2).
The resulting value represents the probability that the input X is classified under class c. The model employs cross-entropy loss in its training process, as delineated in Equation (3).
Here, N denotes the total number of training samples, and y_{i,c} denotes the true class label of sample i for class c. Backpropagation calculates the gradients for Stochastic Gradient Descent (SGD) optimization, which updates the parameters W and b. The first moment estimate is calculated according to Equation (4), the second moment estimate is revised in accordance with Equation (5), and the final parameter adjustment is executed using Equation (6).
where η denotes the learning rate. The training spanned 300 epochs with a batch size of 64, with preprocessing that included image scaling and grayscale conversion. The model evaluation encompassed accuracy metrics alongside confusion matrices and classification reports to ascertain its capacity to distinguish EC stages.
2.2.2. VGG16
This study employed the VGG16 deep learning model to classify EC images. VGG16 is a convolutional neural network (CNN) pre-trained on the ImageNet dataset, making it appropriate for our research via transfer learning [
14]. The fundamental form of the model utilizes small 3 × 3 kernels in convolutional layers, together with hierarchical feature extraction using max-pooling layers [
15]. The extraction procedure is defined as Equation (7).
The operation includes three primary components: X represents the image input, W denotes the filters together with the bias term b, and f signifies the activation function. The final classification is represented by Equation (8).
The training process utilizes the categorical cross-entropy loss function, represented as Equation (9).
The Adam optimizer was employed for optimization, utilizing the weight update rule outlined in Equations (10)–(12).
The algorithm uses first and second moment estimates, with η representing the learning rate and ϵ serving as a safeguard against division by zero.
The training period lasted 300 epochs, incorporating data augmentation techniques such as rotation, zoom, and horizontal flipping to improve generalization. The model evaluation approach included accuracy assessment, confusion matrix analysis, and classification report examination, which demonstrated its proficiency in accurately distinguishing between different forms of esophageal cancer.
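The hierarchical feature extraction of Equation (7), a filtered linear combination passed through an activation f, can be illustrated with a single 3 × 3 convolution followed by the 2 × 2 max-pooling used between VGG16 blocks. This is a NumPy sketch with ReLU assumed as the activation, not the library implementation:

```python
import numpy as np

def conv2d_3x3(X, W, b):
    # Single-channel 'valid' convolution with a 3x3 kernel, per Equation (7).
    h, w = X.shape[0] - 2, X.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(X[i:i+3, j:j+3] * W) + b
    return np.maximum(out, 0.0)  # f: ReLU activation

def max_pool2(X):
    # 2x2 max-pooling with stride 2, halving each spatial dimension.
    h, w = X.shape[0] // 2, X.shape[1] // 2
    X = X[:h*2, :w*2]
    return X.reshape(h, 2, w, 2).max(axis=(1, 3))
```

VGG16 stacks many such convolution layers per block, with pooling between blocks, so each successive feature map covers a wider region of the input.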
2.2.3. YOLOv8
The system utilizes YOLOv8 to analyze EC while executing deep learning tasks via a controlled procedure for data management, model construction, and assessment [
16]. Training commences with the confirmation of three distinct directories designated for training, validation, and testing within the dataset structure [
The lightweight YOLOv8n-cls.pt model is designed for classification tasks, balancing inference speed with identification accuracy.
The training procedure spans 300 epochs with 224 × 224 pixel images and does not use early stopping, since the patience parameter is set to 0. The validation procedure first draws data from the valid/ directory if it contains data; otherwise, it falls back to the test/ set. The model is then evaluated on the selected dataset. In classification, cross-entropy serves as the primary loss function, penalizing erroneous predictions, as demonstrated by Equation (13).
where C is the number of classes, y_i is the actual label, and p_i is the predicted probability for class i.
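A minimal sketch of the cross-entropy loss of Equation (13); the small `eps` guard against log(0) is an implementation detail added here for numerical safety, not part of the equation:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Equation (13): L = -sum over the C classes of y_i * log(p_i).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(-np.sum(y_true * np.log(y_pred + eps)))
```

A confidently wrong prediction incurs a much larger penalty than a confidently correct one, which is what drives the classifier toward calibrated class probabilities.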
The trained YOLOv8 model employs the test dataset for final assessment via predictions that facilitate the measurement of classification accuracy and the evaluation of the model’s stability. The processing method ensures rapid evaluation periods while maintaining precise model testing and protecting dataset features and configuration data [
18].
2.2.4. MobileNetV2
The lightweight deep learning model Mobile Network Version 2 (MobileNetV2) facilitates EC classification through a structured method including dataset preparation, model training, and evaluation [
19]. The data preprocessing phase performs two operations on the images: resizing each to 256 × 256 pixels and normalizing using the mean and standard deviation from ImageNet. Consequently, the model attains enhanced performance via standardized input representations [
20]. MobileNetV2 preserves accuracy while reducing computational complexity through depth-wise separable convolutions. The final fully connected layer outputs C classes via the SoftMax activation function σ, which transforms logits into probability distributions across the C categories, as illustrated in Equation (14).
The model optimization method relies on cross-entropy loss to assess the divergence between predicted class outcomes and actual probabilities, as delineated in Equation (15).
The Adam optimizer updates model weights using first and second moment gradient estimates, facilitating the dynamic adjustment of the learning rate. The learning rate schedule features a step decay that reduces the learning rate by a factor of 0.1 every 10 epochs. During training, Automatic Mixed Precision (AMP) enables certain computations to run in 16-bit floating-point (FP16) mode rather than 32-bit (FP32) mode, enhancing performance efficiency while preserving accuracy [
21].
The method combines deep learning technologies with optimization advances to deliver an efficient procedure for EC diagnostics, facilitating the early detection of cancer. The simultaneous use of the MobileNetV2 architecture with cross-entropy loss, Adam optimization, mixed-precision training, and learning rate scheduling enhances the model’s efficacy in medical image classification tasks.
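The step-decay schedule described above (a factor of 0.1 every 10 epochs) reduces to a one-line rule; the initial learning rate in the example is an assumed value, not the study's setting:

```python
def step_decay_lr(initial_lr, epoch, drop=0.1, every=10):
    # The learning rate is multiplied by `drop` once per `every` epochs.
    return initial_lr * (drop ** (epoch // every))
```

Epochs 0–9 therefore train at the initial rate, epochs 10–19 at one tenth of it, and so on.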
2.3. SAVE
The spectral analysis technique, in conjunction with HSI, employs band selection to identify specific wavelengths from the entire spectral range prior to further analysis. Because HSI data collection across many wavelengths yields partly non-informative datasets, this data selection process is highly beneficial. SAVE was created by Hitspectra Intelligent Technology Co., Ltd., located in Kaohsiung City, Taiwan, to address band selection issues and fixed-band constraints, hence improving result accuracy. SAVE converts standard RGB and WLI images into HSI format.
In spectral reconstruction inside SAVE, precise mathematical formulations facilitate the accurate transformation of images from their conventional format to hyperspectral format. The conversion calibration procedure uses the 24-color Macbeth Color Checker, incorporating natural ambient color samples. JPEG images encoded in the standard RGB (sRGB) color space necessitate a modification to standardize their R, G, and B values within the range of 0 to 1. Through linearized RGB data in conjunction with the Gamma function, the technique produces results in the CIE 1931 XYZ color space, which delineate numerical correlations between wavelengths of the visible spectrum and perceived color responses.
The conversion transforms the normalized RGB values into XYZ values (XYZcamera), hence yielding CIE XYZ tristimulus values. The camera system operates to replicate the visual representations of colors found in the Macbeth Color Checker, which serves as the reference standard. The conversion employs the specific mathematical pattern illustrated in Equations (16)–(18).
Conversion of RGB to CIE 1931 XYZ color space, where the transformation matrix is defined in Equation (19).
The acquired spectral data is transformed using the light source spectrum S(λ) of the ophthalmoscope hyperspectral system together with the XYZ color matching functions, applied to the reflectance spectra R(λ) (380 nm–780 nm, at 1 nm intervals) within the XYZ color gamut.
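As a reference point for the linearization step, the standard sRGB path from gamma-encoded values to CIE 1931 XYZ can be sketched as follows. This uses the published IEC sRGB constants as an assumption; the study's own pipeline replaces the fixed matrix with Macbeth-calibrated coefficients:

```python
import numpy as np

# Standard sRGB (D65) to CIE 1931 XYZ matrix (IEC 61966-2-1).
M_SRGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def srgb_to_linear(c):
    # Invert the sRGB gamma encoding; input values lie in [0, 1].
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def srgb_to_xyz(rgb):
    # Linearize, then project into the CIE 1931 XYZ color space.
    return M_SRGB_TO_XYZ @ srgb_to_linear(rgb)
```

For the sRGB white point (1, 1, 1), the resulting Y component is 1.0 by construction, which is a convenient sanity check on any calibrated variant of this transform.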
The camera’s nonlinear response must be corrected using a third-order formula whose variables act as response modifiers. The V matrix is derived from the standardized outputs and contains these variables. The correction is limited to the third order to prevent excessive rectification.
The color difference analysis using CIEDE 2000 necessitates transforming both XYZ_Correct and XYZ_Spectrum via a translation process. An array (R(λ))_{401×24} was used to arrange the spectra into a matrix structure, whose rows comprise intensity values at 1 nm wavelength intervals.
The Macbeth Color Checker comprises 24 color samples, which form the columns in this configuration.
The established method ensures flawless spectrum conversion, hence improving the accuracy rate and operational efficiency of endoscopic HSI. As illustrated in
Figure 2, the study followed a structured pipeline starting with preprocessing and dataset partitioning, followed by model training with different architectures. A color correction step was incorporated to improve image consistency, and model performance was assessed.
Spectral reconstruction in the SAVE technique is performed by establishing a fine-grained correlation between RGB or white light (WLI) images, as shown in
Figure 3 and reference spectral information with the Macbeth Color Checker (X-Rite Classic) to calibrate it as shown in
Figure 4 (see
supplementary Table S2 for the RMSEs of the XYZ values before and after calibration and
Table S3 for the color difference before and after camera calibration). The color checker contains a set of 24 standardized color patches representing natural colors, which serves as the measurement reference when converting endoscopic RGB images to the CIE 1931 XYZ color space. The RGB values are scaled and linearized with a Gamma correction function to ensure that the color responses are true to life. The corrected values are then remapped to XYZ tristimulus values (XYZ camera), so that SAVE can reproduce the spectral behavior of the reference Macbeth chart. SAVE identifies the most informative spectral features and removes redundant data using multiple regression analysis (MRA) and principal component analysis (PCA). The PCA-based band selection is an eigenvector-based method that retains the eigenvectors contributing most to the spectral variance, i.e., more than 99% of the information, achieving dimensionality reduction without compromising diagnostic fidelity. This guarantees that only the necessary wavelength bands are kept, improving computational efficiency while retaining the spectral properties required for true color and tissue reproduction in medical imaging (see
supplementary Figure S29 for the RMSEs between analog and measured spectra of each color block;
Figure S30 for the chosen hyperspectral bands for the SAVE algorithm and
Figure S31 for the SSIM and PSNR test for the SAVE algorithm). The WLI images shown in
Figure 3 and the corresponding SAVE images as shown in
Figure 4 are compared with the original NBI images shown in
Figure 5.
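The PCA-based band selection criterion, retaining the components that jointly explain more than 99% of the spectral variance, can be sketched as:

```python
import numpy as np

def components_for_variance(spectra, threshold=0.99):
    # Number of principal components needed to retain `threshold` of the
    # spectral variance; `spectra` has shape (samples, wavelength_bands).
    Xc = spectra - spectra.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratios = s ** 2 / np.sum(s ** 2)
    # First index where the cumulative ratio reaches the threshold.
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```

The eigenvectors of the retained components then indicate which wavelength bands carry the diagnostic information worth keeping.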
4. Discussion
This research paper discusses the application of machine learning and deep learning technologies for the interpretation of medical images to classify esophageal disorders. This study illustrates the critical necessity of the timely detection of EC, since this health concern is a significant global public health challenge [
22]. Advanced imaging technology combined with machine learning models enhances EC analysis and treatment planning, given the disease’s dismal survival rates [
23].
This study included 2400 medical images sourced from Kaohsiung Medical University, categorized into eight unique classes encompassing diverse esophageal and gastrointestinal diseases. An appropriate allocation of images into training, validation, and testing segments facilitated a comprehensive model evaluation. The disease classification encompassed cancer, normal tissue, staining, varicose veins, esophageal junction abnormalities, inflammation, and diseases of the duodenum and stomach [
24]. This study assessed various machine learning and deep learning frameworks to optimize classification accuracy [
23].
This study included a combination of logistic regression alongside VGG16, YOLOv8, and MobileNetV2. Logistic regression, the primary statistical model, executed binary tasks, while the CNN-based VGG16 handled image recognition [
25]. This research chose YOLOv8 for its rapid object detection capabilities and MobileNetV2 for its efficient lightweight architecture, making it suitable for medical imaging applications [
26].
This study presented SAVE, a novel technique designed to convert WLI images into HSI via its established transformation method. The research team assessed the performance of WLI, NBI, and SAVE imaging modalities. The research findings indicated that SAVE, in conjunction with NBI, had a superior performance compared to WLI in the assessment of illness categories. SAVE attained a flawless classification performance, establishing it as a promising instrument for medical image analysis. The precision, recall, and F1-score attained a value of 1.00 for all classes.
The evaluation of the model performance revealed that VGG16 attained complete success in image categorization, with a 100% accuracy rate across all techniques. The performance of YOLOv8 differed by its imaging approach, with WLI achieving an 81% accuracy, SAVE attaining 85%, and NBI realizing 82% success. The optimal outcome from MobileNetV2 was achieved with WLI images, resulting in an accuracy of 86.5%. The accuracy rates of SAVE and NBI were 80% and 75%, respectively. Research demonstrates that deep learning models effectively detect diseases; nevertheless, their detection performance is contingent upon the chosen imaging approach [
27].
The assessment of machine learning models necessitates the accurate evaluation of four fundamental metrics: the precision, recall, F1-score, and accuracy [
28]. This work demonstrates that hyperspectral imaging combined with NBI is crucial for enhancing diagnostic accuracy in the detection of esophageal disorders [
29]. This research facilitates medical imaging advancement by integrating sophisticated imaging systems with deep learning models, thereby generating new opportunities for the early diagnosis of esophageal conditions [
30].
4.1. Practical Challenges
The SAVE algorithm depends heavily on accurate calibration of the endoscopic camera and the spectrometer using the 24-color Macbeth Color Checker. However, this calibration accuracy is difficult to maintain in the real world. Different endoscope models, light sources, and camera sensors vary in their spectral sensitivities and nonlinear characteristics. For example, variations in illumination spectra, gamma correction, and dark current may corrupt the recorded color information and introduce imprecision into the reconstructed hyperspectral data. Furthermore, the optical properties of biological tissues, including scattering, absorption, and reflection, differ across patients and body parts. These inconsistencies complicate consistent spectral-to-color mapping across imaging sessions. As a result, a small calibration error can be magnified through the multiple transformation steps, which decreases the accuracy of the simulated fine-band images. The SAVE algorithm also comprises numerous computationally expensive steps, including color space conversion, PCA, regression modeling, spectral reconstruction, and NBI simulation. All of these stages demand substantial processing power and memory bandwidth. Whereas such computations can be handled offline on high-performance workstations, merging SAVE with real-time medical imaging systems is a significant challenge. Capsule endoscopes and portable video endoscopes impose strict limits on size, power consumption, and processing capacity. To achieve clinical frame rates of 25–30 fps, dedicated hardware acceleration is required for real-time execution. Without it, latency and frame drops might interfere with live visualization during diagnostic or surgical procedures. Furthermore, hardware heterogeneity across manufacturers means a universal SAVE-compatible platform is not easily achievable.
One of the biggest challenges in implementing SAVE is the validation of the simulated NBI images, particularly for video capsule endoscopy (VCE). Traditional Olympus endoscopes offer a native NBI mode against which simulated results can be compared directly, whereas VCEs lack such an option. The absence of a reference ground truth makes it hard to determine how faithfully SAVE reproduces the actual NBI appearance and diagnostic characteristics. Although measures such as SSIM, PSNR, and entropy demonstrate quantitative similarity, they do not guarantee clinical interpretability or lesion visibility. Moreover, the training and evaluation data are quite limited and may not reflect the wide variability of mucosal texture, vascular patterns, and pathologies encountered in practice. Thus, prior to the clinical implementation of SAVE, extensive validation with larger, multi-center datasets and expert review is necessary to ensure diagnostic reliability.
4.2. Advantages and Limitations of Imaging Modalities
Each imaging modality has its own strengths and weaknesses that affect the success of medical image analysis. WLI remains the most widely used modality in endoscopic diagnostics because it is widely available, renders natural color, and is computationally inexpensive. It allows rapid image acquisition and fits into normal clinical workflows. However, WLI has poor contrast and limited spectral information and therefore low sensitivity for early lesion detection and subtle tissue differentiation [
31]. Conversely, NBI improves the visualization of mucosal structures and vascular patterns using narrow-band light centered at 415 nm and 540 nm, which coincide with hemoglobin absorption peaks. This enhances the detection of vascular and neoplastic anomalies, making NBI useful in the early diagnosis of esophageal and gastrointestinal cancers. Its significant shortcomings, however, are that it uses fixed spectral bands and relies on special optical filters, which add equipment cost and restrict the spectral flexibility necessary to fully characterize tissues [
32]. The proposed SAVE provides a computational alternative that reconstructs hyperspectral images from ordinary RGB or WLI images, generating richer spectral information without any specialized optical equipment. This enables better tissue differentiation and high diagnostic accuracy, especially when identifying abnormalities that depend on hemoglobin absorption characteristics. Nevertheless, SAVE adds complexity and requires high-quality color calibration to guarantee spectral fidelity across different endoscope systems. These issues are critical for rational and widespread clinical integration [
31].
Each machine learning and deep learning model employed in the present study has its own strengths and weaknesses that affect its applicability to medical imaging. Logistic regression is highly interpretable and computationally cheap and is hence best suited to small datasets or grayscale image analysis. Nonetheless, it is inherently a linear model and therefore cannot represent the complex nonlinear relationships in high-dimensional image data, explaining its lower accuracy on raw WLI images [
32]. The VGG16 model excels at image classification, achieving accuracies of up to 99–100% when trained on SAVE-processed or NBI images. This has been attributed to its deep hierarchical feature extraction and its ability to capture fine image detail. Its large computational cost and parameter count are the main trade-off, as they could limit its use in real-time or resource-constrained clinical settings [
33].
YOLOv8 offers a good trade-off between speed and accuracy, making it well suited to real-time detection and classification during live endoscopy. It features powerful spatial localization owing to its anchor-free design and efficient feature pyramid [
34]. It can, however, be less sensitive to small spectral differences, because the model is designed primarily around spatial rather than spectral or textural features. Finally, MobileNetV2 is designed for deployment on small and embedded platforms and has much lower computational demands, achieving moderate classification accuracy. Although its accuracy is below that of deeper convolutional networks, it remains acceptable for resource-constrained applications. To achieve clinically reliable performance, MobileNetV2 might need spectral augmentation or SAVE-based enhancement, which offers more discriminative features. In general, these models, when combined with SAVE, demonstrate potential for efficient, precise, real-time medical image analysis across a variety of hardware and clinical conditions [
34].
5. Conclusions
Machine learning and deep learning methodologies have demonstrated efficacy in assessing medical images of esophageal disorders, underscoring the importance of timely and precise diagnoses to improve treatment outcomes. The assessment of various classification models, including logistic regression, VGG16, YOLOv8, and MobileNetV2, applied to WLI, NBI, and SAVE images was conducted using data from Kaohsiung Medical University. The study findings indicated that NBI and SAVE exhibited enhanced disease detection capabilities compared to WLI, with SAVE achieving a flawless classification performance for each disease category tested. VGG16 achieved the highest results with 100% accuracy, whereas other algorithms exhibited varying performances depending on the chosen imaging method. This research indicates that AI-driven diagnostic tools, when combined with HSI, exhibit significant potential for the improved diagnosis of esophageal diseases. Deep learning systems utilizing advanced imaging techniques enhance medical diagnostic capabilities, resulting in faster and more accurate diagnoses of esophageal diseases. Future research must improve deep learning models by expanding data gathering and incorporating supplementary imaging techniques to enhance diagnostic accuracy. The integration of AI diagnostic technologies into medical practice holds significant potential for transforming the recognition and treatment of esophageal diseases, resulting in enhanced healthcare outcomes.