1. Introduction
The global burden of gastrointestinal disease was substantial in 2019: across 204 countries and territories, the age-standardized incidence rate was 95,582 cases per 100,000 person-years (95% uncertainty interval [UI], 87,741–104,084), corresponding to 7.3 billion incident cases (95% UI, 6.7–9.0 billion) [1]. Gastrointestinal diseases affect the gastrointestinal tract (the esophagus, stomach, small intestine, large intestine, and rectum) and the accessory digestive organs (the liver, gallbladder, and pancreas) [2]. Another estimate reported 443.53 million incident cases, 2.56 million deaths, and 88.99 million disability-adjusted life years (DALYs) due to digestive diseases worldwide in 2019, increases of 74.44%, 37.85%, and 23.46%, respectively, compared with 1990 [3]. Gastrointestinal (GI) alterations are common in the elderly, and although certain GI disorders occur more frequently in this population, no GI disease is exclusive to it. Some age-related changes in the GI system are physiologic, whereas others are pathologic and particularly common in individuals older than 65 years [4]. The World Health Organization (WHO) estimates that digestive diseases such as colorectal cancer, gastric ulcers, and inflammatory bowel disease account for a high disease prevalence worldwide, with incidence rising in association with dietary habits, lifestyle, and environmental factors [5]. Among GI diseases, colorectal cancer alone accounts for more than 1.9 million new cases annually, while gastric and esophageal cancers contribute substantially to cancer deaths worldwide [6].
Conventionally, GI diseases are diagnosed using invasive methods such as endoscopy, biopsy, and histopathological examination [7]. Endoscopy remains the gold standard for identifying gastrointestinal tract pathology; however, manual visual interpretation of endoscopic images is inconsistent and depends on the specialist's expertise [8]. Histopathology, although the most precise method, is labor-intensive and relies on expert judgment, which can delay diagnosis and treatment [9]. Radiological techniques such as magnetic resonance imaging (MRI) and computed tomography (CT) provide complementary diagnostic information but are limited by resolution and cost [10]. Although useful, these traditional techniques share inherent shortcomings, including operator dependence, risk of misinterpretation, and delayed detection of early-stage disease [11]. Consequently, there is a growing need for reliable automated diagnostic tools that overcome these limitations, improve accuracy, reduce clinician workload, and enable early detection of GI diseases [12].
Despite significant advances in deep learning architectures, several gaps remain in optimizing feature extraction and classification efficiency. Many existing models lack a structured hierarchical design based on a Mixture of Experts (MoE) framework, which leads to suboptimal feature extraction and inefficient use of resources. Conventional classification models often fail to direct attention toward the most clinically relevant regions; as a result, they process large amounts of irrelevant information and miss patterns that are critical for diagnosis. Without dedicated mechanisms for amplifying important signals, models also produce redundant representations, which weakens the overall quality of the extracted features. Models without residual pathways are especially prone to vanishing gradients, which destabilize training, limit the exchange of information between layers, and slow learning. Another shortcoming is the lack of adaptive routing, which prevents weights from being distributed flexibly across specialized modules and reduces the capacity to handle inputs of varying complexity. In addition, many designs omit intermediate auxiliary classifiers, even though these strengthen gradient propagation, improve early-stage feature learning, and mitigate overfitting by adding supplementary classification objectives before the final output. Hyperparameter tuning also remains difficult: manual and exhaustive search approaches are still widely used, whereas probabilistic methods such as the Tree-Structured Parzen Estimator (TPE) can converge faster and yield better configurations. Addressing these limitations can lead to more effective and robust classification frameworks for demanding diagnostic applications.
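The Tree-Structured Parzen Estimator (TPE) mentioned above can be illustrated with a compact, dependency-free sketch of its core loop. After a few random warm-up trials, past results are split into a "good" set (the best fraction `gamma`) and a "bad" set, a Parzen (kernel density) estimate l(x) is fitted to the good values and g(x) to the bad ones, and the next trial is the sampled candidate that maximizes l(x)/g(x). This is an illustration under stated assumptions, not the tuning setup used in this study: the one-dimensional search over log10 learning rate, the toy objective, the fixed bandwidth, clipping to bounds, and all names (`tpe_minimize`, `validation_loss`, and so on) are hypothetical. Real projects would normally use a library implementation such as Optuna's `TPESampler`, which handles multiple and categorical parameters and refines details such as the bandwidth choice.

```python
import math
import random


def parzen_density(x, points, bandwidth, low, high):
    """Gaussian kernel density over `points`, mixed with a uniform prior on [low, high]."""
    prior = 1.0 / (high - low)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    kernels = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) / norm for p in points)
    # Each observation and the prior receive equal mixture weight.
    return (kernels + prior) / (len(points) + 1)


def sample_parzen(rng, points, bandwidth, low, high):
    """Draw from the same mixture: pick a component, then sample it (clipped to bounds)."""
    i = rng.randrange(len(points) + 1)
    if i == len(points):
        return rng.uniform(low, high)
    return min(high, max(low, rng.gauss(points[i], bandwidth)))


def tpe_minimize(objective, low, high, n_trials=40, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    rng = random.Random(seed)
    history = []  # (x, loss) pairs
    bandwidth = (high - low) / 10.0  # fixed here for simplicity (an assumption)
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda h: h[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [h[0] for h in ranked[:n_good]]
            bad = [h[0] for h in ranked[n_good:]]
            candidates = [sample_parzen(rng, good, bandwidth, low, high)
                          for _ in range(n_candidates)]
            # Choose the candidate most likely under l(x) relative to g(x).
            x = max(candidates,
                    key=lambda c: parzen_density(c, good, bandwidth, low, high)
                    / parzen_density(c, bad, bandwidth, low, high))
        history.append((x, objective(x)))
    best = min(history, key=lambda h: h[1])
    return best, history


def validation_loss(log10_lr):
    """Hypothetical stand-in for training a model and returning its validation loss."""
    return (log10_lr + 3.0) ** 2  # minimum at a learning rate of 1e-3


best, history = tpe_minimize(validation_loss, low=-6.0, high=-1.0)
print(f"best log10(lr) = {best[0]:.3f}, loss = {best[1]:.4f}")
```

Compared with grid search, the density-ratio step concentrates later trials near configurations that have already performed well, while the uniform prior component keeps some exploration of the full range.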
The key contributions of this study are as follows:
- A hierarchical multi-stage architecture based on a Mixture of Experts framework refines features progressively, distributes learning across specialized expert blocks, and dynamically selects the best-suited expert blocks to maximize classification efficiency. 
- Each expert block incorporates a spatial-channel attention mechanism that enhances feature representation, focuses on diagnostically meaningful regions, suppresses irrelevant information, and adaptively adjusts attention weights for better disease classification. 
- Squeeze-and-excitation blocks are integrated into each expert block to recalibrate channel-wise feature responses, emphasizing key diagnostic patterns and reducing redundant information while retaining the characteristics essential for accurate classification. 
- Residual connections are incorporated throughout the architecture to mitigate the vanishing gradient problem, preserve low-level features, and carry information smoothly between layers, which speeds convergence and stabilizes training. 
- A dynamic routing mechanism assigns relative weights to the expert blocks, explicitly routing features and adjusting each expert block's contribution to the complexity of the input data, which improves the model's generalization efficiently. 
- Intermediate auxiliary classifiers are incorporated to enhance gradient flow, improve feature learning at earlier stages, and reduce overfitting by imposing additional classification tasks before the final decision. 
- The Tree-Structured Parzen Estimator is employed for hyperparameter optimization to automate parameter tuning, improve model efficiency, and achieve faster convergence with improved performance. 
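How an attention-equipped expert block and dynamic routing fit together can be sketched in a few dozen lines of dependency-free Python. This is a toy under stated assumptions, not the GID-Xpert implementation: feature maps are small nested lists indexed `[channel][row][col]`; each hypothetical expert applies squeeze-and-excitation (global average pooling, a bottleneck layer with ReLU, a sigmoid layer, channel rescaling), then a simplified spatial attention in which two scalar weights stand in for the convolution that CBAM-style modules apply to channel-pooled maps, then an identity residual connection; and a single linear gate over globally pooled features produces softmax routing weights. All function names, shapes, and random parameters are illustrative, and auxiliary classifiers and training are omitted.

```python
import math
import random

# Feature maps are nested lists indexed as fmap[channel][row][col].


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def global_avg_pool(fmap):
    """One descriptor per channel: the mean over all spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def squeeze_excitation(fmap, w1, w2):
    """SE recalibration: pool, bottleneck FC + ReLU, FC + sigmoid, rescale channels."""
    s = global_avg_pool(fmap)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


def spatial_attention(fmap, a, b, bias):
    """Per-pixel gate from the channel mean and max (scalar weights replace a conv)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    att = [[sigmoid(a * sum(fmap[c][i][j] for c in range(C)) / C
                    + b * max(fmap[c][i][j] for c in range(C)) + bias)
            for j in range(W)] for i in range(H)]
    return [[[fmap[c][i][j] * att[i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]


def expert_block(fmap, p):
    """SE, then spatial attention, then a residual (identity) connection."""
    out = spatial_attention(squeeze_excitation(fmap, p["w1"], p["w2"]), *p["spatial"])
    return [[[x + y for x, y in zip(rx, ry)] for rx, ry in zip(cx, cy)]
            for cx, cy in zip(fmap, out)]


def routed_mixture(fmap, experts, gate_w):
    """Dynamic routing: a softmax gate on the pooled input weights each expert's output."""
    s = global_avg_pool(fmap)
    weights = softmax([sum(w * v for w, v in zip(row, s)) for row in gate_w])
    outs = [expert_block(fmap, p) for p in experts]
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    mixed = [[[sum(wk * o[c][i][j] for wk, o in zip(weights, outs))
               for j in range(W)] for i in range(H)] for c in range(C)]
    return mixed, weights


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, H, W, R, K = 4, 3, 3, 2, 3  # channels, height, width, SE bottleneck width, experts
x = [rand_matrix(rng, H, W) for _ in range(C)]
experts = [{"w1": rand_matrix(rng, R, C), "w2": rand_matrix(rng, C, R),
            "spatial": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0)}
           for _ in range(K)]
gate_w = rand_matrix(rng, K, C)
y, weights = routed_mixture(x, experts, gate_w)
print("routing weights:", [round(w, 3) for w in weights])
```

This soft routing keeps every expert in the computation and blends their outputs; a sparse variant would keep only the top-scoring experts, which saves computation at the cost of a less smooth gate.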
The remainder of this paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the complete methodology, including the mathematical modeling. Section 4 discusses the results of the proposed model, and Section 5 concludes the paper with future directions.
2. Literature Review
GI diseases, encompassing both inflammatory disorders and malignant conditions, represent a substantial global health burden and require accurate and rapid diagnosis. Conventional diagnostic approaches, including endoscopy, histopathology, and radiological imaging, are widely used; however, they are often laborious, subject to interpretive variability, and dependent on specialist involvement. Recent advances in deep learning have shown significant potential for automating the classification of gastrointestinal diseases, using learned feature extraction to improve diagnostic accuracy. Although convolutional neural networks (CNNs) are widely used for medical image processing, current models still face challenges in feature extraction, generalizability, and interpretability.
Escobar et al. [13] introduced a transfer learning-based solution for classifying gastrointestinal diseases from endoscopic images that addresses the computational complexity of current models. Conventional deep learning techniques are accurate but involve very large numbers of parameters, which limits their practicality. To avoid this, the authors presented a lightweight CNN-based method using transfer learning, with VGG-16 fine-tuned from the  layer. The model achieved 98% accuracy while requiring significantly fewer parameters than other approaches. Ablation experiments compared several CNN architectures, including DenseNet201, ResNet50, Xception, and VGG19, and found that VGG-16 offered the best balance between accuracy and computational cost. Benchmarking against baseline models showed accuracy comparable to the state of the art with significantly lower computational requirements. The authors concluded that lightweight models can achieve high classification accuracy after fine-tuning and are therefore well suited to real-time clinical use.
Alhajlah et al. [14] proposed a deep learning framework for classifying gastrointestinal diseases that addresses the visual similarity between infected and normal regions. In classical approaches, these overlapping features led to misclassification. To improve diagnostic accuracy, the authors used a Mask Region-based Convolutional Neural Network (Mask R-CNN) for preliminary infection localization and fine-tuned pre-trained ResNet-50 and ResNet-152 models for feature extraction. The extracted features were fused serially to maximize information retention. To minimize redundancy and improve classification accuracy, an Improved Ant Colony Optimization (ACO) algorithm then selected the most significant features. Machine learning classifiers performed the final classification, achieving 96.43% accuracy on a publicly available dataset. A comparative evaluation showed that the proposed method outperformed existing methods in both precision and efficiency.
Noor et al. [15] introduced a deep learning-based classification framework for gastrointestinal (GI) diseases that addresses the inherent difficulties of analyzing wireless capsule endoscopy (WCE) images. WCE data typically suffer from poor contrast and high intra- and inter-class similarity, which makes manual inspection both labor-intensive and error-prone. To mitigate this, the authors designed a brightness-controlled contrast enhancement technique optimized with a genetic algorithm (GA) that dynamically adjusts contrast and brightness levels to improve visual quality. The enhancement was validated using quantitative measures, namely peak signal-to-noise ratio (PSNR), mean squared error (MSE), visual information fidelity (VIF), and information quality index (IQI), all of which confirmed significant improvements. When coupled with transfer learning and standard classifiers, the method achieved a classification accuracy of 96.40%, demonstrating that better image quality directly contributes to improved diagnostic performance and the potential for early detection of GI diseases.
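Two of the quantitative measures mentioned above, MSE and PSNR, are simple enough to show directly. The sketch assumes 8-bit grayscale images stored as nested lists (peak value 255); the function names are illustrative, and VIF and IQI, which are considerably more involved, are not reproduced here.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [(r - t) ** 2 for row_r, row_t in zip(reference, test)
             for r, t in zip(row_r, row_t)]
    return sum(diffs) / len(diffs)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


original = [[52, 60], [61, 70]]
enhanced = [[62, 70], [71, 80]]  # every pixel shifted by 10 grey levels
print(mse(original, enhanced))             # → 100.0
print(round(psnr(original, enhanced), 2))  # → 28.13
```

Both measures quantify fidelity to a reference image, so how they should be read when judging an enhancement depends on which image is chosen as the reference.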
Abraham et al. [16] developed a transfer learning approach for detecting and classifying digestive disorders, with emphasis on gastrointestinal conditions such as cancer, heartburn, irritable bowel syndrome (IBS), and lactose intolerance. The study evaluated endoscopic image data using several pre-trained networks: ResNet50, InceptionV3, DenseNet121, and EfficientNetB0. The results showed clear gains over conventional methods, with EfficientNetB0 achieving the highest accuracy (98.01%), precision (98%), and recall (98%). Compared with earlier work, this framework achieved superior outcomes, indicating that transfer learning can considerably enhance computer-aided diagnostic tools. The authors also suggested that such models could be extended to other medical imaging domains and combined with image enhancement to improve reliability.
Khan et al. [17] proposed GestroNet, an intelligent system for the automatic detection and classification of GI tract diseases, including bleeding, polyps, and ulcers. To overcome the segmentation challenges posed by varying lesion shapes and locations, the framework used deep saliency maps for region extraction, combined with Bayesian optimization for feature selection. MobileNet-V2, fine-tuned through transfer learning, served as the backbone, while a hybrid whale optimization algorithm eliminated redundant features. The system was validated on three datasets (Kvasir 1, Kvasir 2, and CUI Wah), achieving accuracies of 99.61%, 98.20%, and 98.02%, respectively. GestroNet outperformed existing methods, underscoring the benefits of combining deep architectures with optimization algorithms.
Noor et al. [18] presented a deep learning model for GI disease classification that integrates an attention mechanism to sharpen the focus on pathological regions. Their framework employed a lightweight CNN for feature extraction, followed by dimensionality reduction using a cosine similarity-based feature selection method. On the Kvasir dataset, the approach achieved an accuracy of 97.68%. The results showed that attention-driven feature refinement and selection improve both classification accuracy and robustness, making the framework suitable for real clinical use.
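The description above leaves the exact selection rule open, so the following is only one plausible variant of cosine similarity-based feature selection, not the cited authors' method: features are compared column-wise across samples, and a feature is dropped when its absolute cosine similarity to an already retained feature reaches a threshold. The function names and the 0.95 threshold are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_features(samples, threshold=0.95):
    """Greedily keep feature indices that are not redundant with earlier kept ones."""
    columns = list(zip(*samples))  # one tuple of values per feature
    kept = []
    for j, col in enumerate(columns):
        # Absolute value: strongly anti-aligned features are treated as redundant too.
        if all(abs(cosine(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Rows are samples; feature 1 is exactly twice feature 0, so it is redundant.
X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, -1.0],
     [3.0, 6.0, 0.0]]
print(select_features(X))  # → [0, 2]
```

The greedy pass depends on feature order; ranking features by a relevance score first, then filtering redundant ones, is a common refinement.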
Lonseko et al. [19] developed an attention-guided CNN framework for GI disease classification that combines an encoder–decoder structure with a spatial attention module, which selectively emphasizes diagnostically relevant image regions. Data augmentation was applied to counter dataset imbalance, and five-fold cross-validation was used to ensure reliable evaluation. Compared with standard models such as ResNet50, GoogLeNet, and DenseNet, the proposed model achieved superior results, with accuracy of up to 93.19%. t-SNE visualizations and confusion matrices further confirmed its effectiveness. The study highlighted the contribution of spatial attention to diagnostic accuracy in gastrointestinal imaging tasks.
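Five-fold cross-validation partitions the data into five folds and uses each fold once as the test set while training on the remaining four. The sketch below additionally stratifies by class, an added assumption (a common choice for imbalanced data) that the description above does not state; the function name and the labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class proportions kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0  # continue round-robin across classes so fold sizes stay balanced
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = ["normal"] * 10 + ["ulcer"] * 5  # hypothetical, imbalanced labels
for train, test in stratified_kfold(labels):
    print(len(train), len(test), Counter(labels[i] for i in test))
```

Every sample appears in exactly one test fold, and each fold here receives two "normal" and one "ulcer" sample, mirroring the 2:1 class ratio of the full set.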
Overall, the reviewed studies reveal several recurring limitations in existing GI disease classification approaches. Many conventional architectures exhibit weak progressive feature refinement and insufficient extraction of discriminative information, resulting in moderate classification accuracy. Models frequently fail to adequately highlight clinically important regions, which increases the risk of misdiagnosis. A further weakness lies in the absence of channel-level attention, limiting the model’s ability to emphasize critical features. Training instability also arises in deeper networks due to vanishing gradients, while most systems lack adaptive mechanisms for dynamically adjusting learning weights. Additionally, insufficient early-layer learning leads to the loss of low-level diagnostic cues, and reliance on manual hyperparameter tuning remains both inefficient and resource-intensive. Addressing these shortcomings is essential for building more reliable and robust GI disease classification systems.
5. Conclusions and Future Work
This study introduces GID-Xpert, a hierarchical, multi-stage, attention-driven model that integrates expert blocks with dynamic routing for the classification of gastrointestinal diseases. Our approach addresses key challenges of traditional and deep learning-based GI disease diagnosis by incorporating spatial-channel attention, expert modules, and dynamic routing. Experimental evaluations on multiple datasets show that GID-Xpert achieves superior classification performance, excelling in particular at bleeding detection, with 100% accuracy on WCEBleedGen and near-perfect results on KAUHC. The model also performs well on the GastroEndoNet dataset, suggesting that it is suitable for a range of GI conditions. Ablation studies confirm the effectiveness of the expert modules and attention mechanisms and underscore their importance for feature extraction and classification accuracy. Although the model performs strongly, its generalizability to unseen datasets and real-world clinical settings requires further investigation. Future studies may explore the integration of multimodal clinical data, improvements in computational efficiency, and explainable AI methods to enhance the interpretability of results. Overall, GID-Xpert is a notable contribution to computer-aided GI disease diagnosis, offering a scalable and effective solution for medical image analysis.