GPTNeXt: Biomedical Image Classification Investigations
Abstract
1. Introduction
1.1. Motivation and Our Model
1.2. Novelties and Contributions
- Architectural Innovation: We propose GPTNeXt, a lightweight CNN (7.4 M parameters) that incorporates transformer-inspired design principles including a patchify stem, Pre-LN normalization, GELU activation, and dual shortcut connections within an efficient convolutional framework.
- Feature Engineering Framework: We introduce an exemplar-based deep feature engineering approach that extracts multi-scale representations through fixed-size patch decomposition, combined with INCA-based feature selection for dimensionality reduction.
- Empirical Validation: We demonstrate consistent classification performance exceeding 98% accuracy across three diverse biomedical imaging datasets, with comprehensive statistical validation including confidence intervals and significance testing.
2. Materials and Methods
2.1. Material
2.1.1. Alzheimer’s MR Image Dataset (AD)
2.1.2. Blood Image Cell Dataset (Blood)
2.1.3. Lung Cancer Image Dataset
2.2. GPTNeXt
- Lightweight Design (7.4 M vs. 28.6 M parameters): GPTNeXt achieves a 74% parameter reduction compared to ConvNeXt-Tiny, enabling deployment in resource-constrained clinical environments and embedded systems commonly used in point-of-care settings. This reduction does not sacrifice performance: GPTNeXt achieves comparable or superior accuracy to ConvNeXt-T across all three biomedical datasets.
- Grouped Downsampling with Smaller Kernels (3 × 3 vs. 7 × 7): Unlike ConvNeXt’s 7 × 7 depth-wise convolutions, GPTNeXt employs 3 × 3 grouped convolutions with dual shortcut connections. This design choice: (i) reduces computational cost (1.8 GFLOPs vs. 4.5 GFLOPs) while maintaining receptive field coverage through stacked layers; (ii) better preserves fine-grained details critical for biomedical imaging, such as cellular morphology in blood cell images, tissue boundaries in histopathology, and subtle atrophy patterns in brain MRI; and (iii) provides implicit regularization through parameter sharing, which is beneficial for limited-size medical datasets.
- Dual Shortcut Connections: GPTNeXt introduces two residual connections within each block (as shown in Figure 2) compared to ConvNeXt’s single shortcut connection. This architectural choice enhances gradient flow during training and enables more stable convergence on limited biomedical datasets where overfitting is a primary concern. The dual shortcuts contribute a +0.93% accuracy improvement over single shortcuts.
- Biomedical-Specific Design Rationale: Our architectural decisions are motivated by unique characteristics of biomedical image classification: (i) medical images often contain subtle, localized abnormalities (e.g., early-stage tumor markers and cellular morphology variations) requiring fine-grained feature extraction; (ii) limited dataset sizes compared to natural image datasets (e.g., ImageNet) benefit from lightweight architectures with inherent regularization properties; (iii) clinical deployment scenarios prioritize inference efficiency (4.5 ms per image) and model compactness (28.2 MB) for integration into existing hospital information systems; and (iv) multi-scale feature extraction (full image + patches) is essential for capturing both global context and local pathological patterns.
- Feature Engineering Compatibility: Unlike ConvNeXt’s end-to-end classification paradigm, GPTNeXt’s architecture is specifically designed to generate discriminative features at the global average pooling layer, enabling effective integration with our patch-based exemplar feature engineering pipeline. The architectural simplicity (a straightforward design without complex attention mechanisms) facilitates feature extraction stability across different image patches, as evidenced by the consistent performance improvements when patch-based features are incorporated.
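The headline ratios quoted in the list above follow from simple arithmetic. The short Python sketch below reproduces them from the figures given in the text. The storage assumption (32-bit floats, size expressed in MiB) is ours and is not stated in the source; it is included only because it happens to match the reported 28.2 MB.

```python
# Figures quoted in the text
gptnext_params = 7.4e6        # learnable parameters of GPTNeXt
convnext_t_params = 28.6e6    # learnable parameters of ConvNeXt-Tiny
gptnext_gflops = 1.8
convnext_t_gflops = 4.5

param_reduction = 1 - gptnext_params / convnext_t_params
flop_reduction = 1 - gptnext_gflops / convnext_t_gflops

# Assumption (not from the source): float32 weights, size reported in MiB
BYTES_PER_PARAM = 4
model_size_mib = gptnext_params * BYTES_PER_PARAM / 2**20

print(f"parameter reduction: {param_reduction:.1%}")   # 74.1%
print(f"FLOP reduction:      {flop_reduction:.1%}")    # 60.0%
print(f"model size:          {model_size_mib:.1f} MiB")  # 28.2 MiB
```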
2.3. GPTNeXt-Based Exemplar Deep Feature Engineering
- First, the patch size of 112 × 112 (half of the input resolution) ensures that each patch captures sufficient contextual information while enabling localized feature extraction. This multi-scale approach is analogous to spatial pyramid pooling, which has been theoretically shown to improve translation invariance and enable the capture of features at multiple granularities.
- Second, the overlapping stride of 56 pixels (50% overlap) ensures that discriminative features near patch boundaries are captured in multiple patches, reducing information loss at boundary regions. The resulting nine patches provide systematic coverage of the spatial domain: four corner regions, four edge–center regions, and one central region.
- Third, combining features from the full image with patch-level features creates a hierarchical representation that captures both global context (disease-level patterns) and local details (cellular or structural abnormalities). This dual-scale approach is particularly beneficial for biomedical images where diagnostic information may be distributed across different spatial scales.
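The patch layout described in the three points above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the image is represented as a plain list of rows so that the example has no dependencies.

```python
def patch_origins(image_size=224, patch_size=112, stride=56):
    """Top-left (row, col) corner of every overlapping square patch."""
    steps = range(0, image_size - patch_size + 1, stride)
    return [(r, c) for r in steps for c in steps]


def extract_patches(image, patch_size=112, stride=56):
    """Slice a square image (a list of rows) into overlapping patches."""
    origins = patch_origins(len(image), patch_size, stride)
    return [
        [row[c:c + patch_size] for row in image[r:r + patch_size]]
        for r, c in origins
    ]


origins = patch_origins()
print(len(origins))                          # 9
print(origins[0], origins[4], origins[-1])   # (0, 0) (56, 56) (112, 112)
```

With a 224 × 224 input, the three offsets per axis (0, 56, 112) give the 3 × 3 grid of nine patches: four corners, four edge-center regions, and one central region.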
3. Experiments
- Solver: Stochastic Gradient Descent with Momentum (sgdm);
- Initial Learning Rate: 0.01;
- Maximum Epoch: 100;
- Mini Batch Size: 32;
- Training and Validation Ratio: 70:30.
- Pretrained GPTNeXt: GAP layer utilized to extract 1280 features;
- Patch Divider: Patch size: 112 × 112, Stride: 56, and Number of created patches: 9;
- INCA: Range of iteration: from 100 to 1000, Misclassification rate calculator: kNN, Solver: Stochastic Gradient Descent (sgd), and Number of iterations of NCA: half of the number of images;
- kNN: k: 1, Distance metric: L1-norm, Voting: none, Validation: 10-fold cross-validation;
- SVM: Kernel: cubic polynomial kernel, C: 1, Coding: one-vs-one, and Validation: 10-fold cross-validation;
- LDA: Gamma: zero, Filling coefficients: off, and Validation: 10-fold cross-validation.
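The INCA settings listed above describe an iterative search: features are ranked by their NCA weights, and the top-k subsets within the iteration range are each scored by a kNN misclassification rate. The Python sketch below illustrates that loop under several simplifying assumptions that are ours: the NCA weights are taken as a precomputed input, leave-one-out validation replaces 10-fold cross-validation for brevity, and all function names and the toy data are hypothetical.

```python
def l1(a, b):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def loo_1nn_error(X, y):
    """Leave-one-out misclassification rate of a 1-NN classifier (L1 distance)."""
    errors = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: l1(xi, X[j]))
        errors += y[nearest] != y[i]
    return errors / len(X)


def inca_select(X, y, weights, k_min, k_max):
    """Return (feature indices, error) of the best top-k subset, k in [k_min, k_max]."""
    ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    best_idx, best_err = None, float("inf")
    for k in range(k_min, min(k_max, len(ranked)) + 1):
        idx = ranked[:k]
        Xk = [[row[f] for f in idx] for row in X]
        err = loo_1nn_error(Xk, y)
        if err < best_err:            # strict comparison: ties keep the smaller subset
            best_idx, best_err = idx, err
    return best_idx, best_err


# Toy data: features 0 and 1 separate the classes, features 2 and 3 are noise.
X = [[0.0, 0.1, 5.0, 1.0], [0.1, 0.0, 0.0, 9.0], [0.2, 0.2, 9.0, 4.0],
     [1.0, 1.1, 0.5, 8.5], [1.1, 1.0, 5.2, 1.2], [1.2, 1.2, 8.8, 4.1]]
y = [0, 0, 0, 1, 1, 1]
weights = [0.9, 0.8, 0.1, 0.05]       # assumed NCA weights, highest = most relevant

selected, error = inca_select(X, y, weights, k_min=2, k_max=4)
print(selected, error)                # [0, 1] 0.0
```

In the settings above the search range is 100 to 1000 features; the toy range of 2 to 4 is used here only to keep the example small.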
4. Discussion
4.1. Ablation Study
4.2. Contribution Analysis: Architecture vs. Feature Engineering
4.3. Feature Engineering Pipeline Ablation
- (i) Contribution of shallow classifier approach: Replacing softmax classification with kNN on the deep features extracted from the full image improves the accuracy by 0.34–4.63 percentage points across datasets, demonstrating the benefit of decoupling feature extraction from classification.
- (ii) Contribution of patch-based multi-scale extraction: Adding patch-based features increases the feature dimension from 1280 to 12,800. Interestingly, without INCA feature selection, this expansion actually decreases performance for the Blood (−0.53%) and Lung (−0.44%) datasets due to the curse of dimensionality, while improving the AD dataset (+3.22%), which has more training samples. This highlights that simply adding more features is not always beneficial.
- (iii) Critical role of INCA feature selection: INCA feature selection provides consistent improvements across all datasets (+1.27% to +2.55%), demonstrating its essential role in identifying discriminative features and mitigating the curse of dimensionality. The final selected feature counts (177 for AD, 950 for Blood, and 308 for Lung) represent optimal subsets that maximize classification performance.
- (iv) Synergistic effect: The combination of patch-based extraction and INCA selection yields performance greater than either component alone, confirming that both components contribute meaningfully to the overall pipeline effectiveness.
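The feature-dimension bookkeeping behind this ablation is easy to verify. The sketch below, written in Python with variable names of our own choosing, recomputes the concatenated feature length and the share of features that INCA retains for each dataset, using only the numbers quoted above.

```python
FEATURES_PER_VIEW = 1280     # size of the GAP-layer feature vector
N_PATCHES = 9                # patches per image

# Full image plus nine patches, each contributing one feature vector
total_features = FEATURES_PER_VIEW * (1 + N_PATCHES)

selected = {"AD": 177, "Blood": 950, "Lung": 308}   # INCA-selected feature counts
retained = {name: n / total_features for name, n in selected.items()}

print(total_features)        # 12800
for name, frac in retained.items():
    print(f"{name}: {selected[name]} of {total_features} features ({frac:.1%})")
# AD: 177 of 12800 features (1.4%)
# Blood: 950 of 12800 features (7.4%)
# Lung: 308 of 12800 features (2.4%)
```

All three selected counts fall inside the INCA iteration range of 100 to 1000 given in the experimental settings.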
4.4. Feature Selection Analysis
4.5. Classifier Performance Comparison
4.6. Comparison with State of the Art
4.7. Interpretation of High Classification Accuracy
- The proposed GPTNeXt model achieved high classification accuracy on all three datasets, indicating that it is well suited to a range of biomedical image classification tasks.
- Combining GPTNeXt with exemplar deep features, a method inspired by the Vision Transformer (ViT), proved effective for extracting features from images. INCA feature selection further improved the quality of the selected features.
- Among all classifiers tested, kNN performed best, which justifies its use with the feature set produced by the proposed model. LDA, although not the best-performing classifier, also yielded satisfactory classification results.
- GPTNeXt is introduced as a novel combination of CNN and transformer design, specifically the GPT structure. It extends standard CNNs with the strengths of transformers and achieves superior performance on classification tasks.
- With only 7.4 million learnable parameters, the proposed GPTNeXt is a lightweight and efficient image classifier. This is beneficial for deep learning applications in resource-limited settings.
- The GPTNeXt-based deep feature engineering model offers a previously unexplored approach to feature extraction, thereby broadening the range of domains to which it can be applied.
- The comparison with state-of-the-art techniques shows that the proposed approach provides competitive classification accuracy, supporting its effectiveness and relevance for research in biomedical image analysis.
4.8. Computational Cost Analysis
5. Limitations and Future Works
- Investigate ways to scale the GPTNeXt model so that it can handle larger and more complex datasets. This may include increasing the model’s capacity, investigating deeper models, or using more sophisticated training strategies.
- Optimize performance and model size for real-time and low-resource applications. Quantization, pruning, or other model compression methods can be applied to minimize memory and computation requirements during inference.
- Broaden the range of applications within biomedical imaging to cover, for example, histopathological stains, radiology, or microscopy records. Fine-tuning the model with domain-specific datasets can enhance its performance in each area.
- Study methods for multimodal imaging integration, e.g., combining MRI and PET images for better disease diagnosis and treatment.
- Work with hospitals to test the platform and validate its performance in live clinical workflows, ensuring that the model is compatible with medical standards and practices.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A

    %% Create Deep Learning Network Architecture
    % Script for creating the layers for a deep learning network with the
    % following properties:
    %
    %   Number of layers: 73
    %   Number of connections: 80
    %
    % Run the script to create the layers in the workspace variable lgraph.
    % To learn more, see Generate MATLAB Code From Deep Network Designer.
    %
    % Auto-generated by MATLAB on 26-Oct-2023 17:45:05

    %% Create Layer Graph
    % Create the layer graph variable to contain the network layers.
    lgraph = layerGraph();

    %% Add Layer Branches
    % Add the branches of the network to the layer graph. Each branch is a
    % linear array of layers.
    tempLayers = [
        imageInputLayer([224 224 3],"Name","imageinput")
        convolution2dLayer([4 4],96,"Name","conv","Padding","same","Stride",[4 4])
        batchNormalizationLayer("Name","batchnorm")
        geluLayer("Name","gelu")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        layerNormalizationLayer("Name","layernorm")
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv","Padding","same")
        geluLayer("Name","gelu_1")
        convolution2dLayer([1 1],96,"Name","conv_1","Padding","same")
        batchNormalizationLayer("Name","batchnorm_1")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition")
        layerNormalizationLayer("Name","layernorm_1")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_1","Padding","same")
        geluLayer("Name","gelu_2")
        convolution2dLayer([1 1],96,"Name","conv_2","Padding","same")
        batchNormalizationLayer("Name","batchnorm_2")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_1")
        geluLayer("Name","gelu_13")
        groupedConvolution2dLayer([2 2],2,"channel-wise","Name","groupedconv_2","Padding","same","Stride",[2 2])
        layerNormalizationLayer("Name","layernorm_2")
        geluLayer("Name","gelu_3")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        layerNormalizationLayer("Name","layernorm_3")
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_3","Padding","same")
        geluLayer("Name","gelu_4")
        convolution2dLayer([1 1],192,"Name","conv_3","Padding","same")
        batchNormalizationLayer("Name","batchnorm_3")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_2")
        layerNormalizationLayer("Name","layernorm_4")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_4","Padding","same")
        geluLayer("Name","gelu_5")
        convolution2dLayer([1 1],192,"Name","conv_4","Padding","same")
        batchNormalizationLayer("Name","batchnorm_4")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_3")
        geluLayer("Name","gelu_12")
        groupedConvolution2dLayer([2 2],2,"channel-wise","Name","groupedconv_5","Padding","same","Stride",[2 2])
        layerNormalizationLayer("Name","layernorm_5")
        geluLayer("Name","gelu_6")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        layerNormalizationLayer("Name","layernorm_6")
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_6","Padding","same")
        geluLayer("Name","gelu_7")
        convolution2dLayer([1 1],384,"Name","conv_5","Padding","same")
        batchNormalizationLayer("Name","batchnorm_5")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_4")
        layerNormalizationLayer("Name","layernorm_7")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_7","Padding","same")
        geluLayer("Name","gelu_8")
        convolution2dLayer([1 1],384,"Name","conv_6","Padding","same")
        batchNormalizationLayer("Name","batchnorm_6")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_5")
        geluLayer("Name","gelu_14")
        groupedConvolution2dLayer([2 2],2,"channel-wise","Name","groupedconv_8","Padding","same","Stride",[2 2])
        layerNormalizationLayer("Name","layernorm_8")
        geluLayer("Name","gelu_9")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        layerNormalizationLayer("Name","layernorm_9")
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_9","Padding","same")
        geluLayer("Name","gelu_10")
        convolution2dLayer([1 1],768,"Name","conv_7","Padding","same")
        batchNormalizationLayer("Name","batchnorm_7")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_6")
        layerNormalizationLayer("Name","layernorm_10")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        groupedConvolution2dLayer([3 3],4,"channel-wise","Name","groupedconv_10","Padding","same")
        geluLayer("Name","gelu_11")
        convolution2dLayer([1 1],768,"Name","conv_8","Padding","same")
        batchNormalizationLayer("Name","batchnorm_8")];
    lgraph = addLayers(lgraph,tempLayers);

    tempLayers = [
        additionLayer(2,"Name","addition_7")
        geluLayer("Name","gelu_15")
        convolution2dLayer([1 1],1280,"Name","conv_9","Padding","same")
        layerNormalizationLayer("Name","layernorm_11")
        geluLayer("Name","gelu_16")
        batchNormalizationLayer("Name","batchnorm_9")
        globalAveragePooling2dLayer("Name","gapool")
        fullyConnectedLayer(2,"Name","fc")
        softmaxLayer("Name","softmax")
        classificationLayer("Name","classoutput")];
    lgraph = addLayers(lgraph,tempLayers);

    % clean up helper variable
    clear tempLayers;

    %% Connect Layer Branches
    % Connect all the branches of the network to create the network graph.
    lgraph = connectLayers(lgraph,"gelu","layernorm");
    lgraph = connectLayers(lgraph,"gelu","addition/in2");
    lgraph = connectLayers(lgraph,"batchnorm_1","addition/in1");
    lgraph = connectLayers(lgraph,"layernorm_1","groupedconv_1");
    lgraph = connectLayers(lgraph,"layernorm_1","addition_1/in1");
    lgraph = connectLayers(lgraph,"batchnorm_2","addition_1/in2");
    lgraph = connectLayers(lgraph,"gelu_3","layernorm_3");
    lgraph = connectLayers(lgraph,"gelu_3","addition_2/in2");
    lgraph = connectLayers(lgraph,"batchnorm_3","addition_2/in1");
    lgraph = connectLayers(lgraph,"layernorm_4","groupedconv_4");
    lgraph = connectLayers(lgraph,"layernorm_4","addition_3/in1");
    lgraph = connectLayers(lgraph,"batchnorm_4","addition_3/in2");
    lgraph = connectLayers(lgraph,"gelu_6","layernorm_6");
    lgraph = connectLayers(lgraph,"gelu_6","addition_4/in2");
    lgraph = connectLayers(lgraph,"batchnorm_5","addition_4/in1");
    lgraph = connectLayers(lgraph,"layernorm_7","groupedconv_7");
    lgraph = connectLayers(lgraph,"layernorm_7","addition_5/in1");
    lgraph = connectLayers(lgraph,"batchnorm_6","addition_5/in2");
    lgraph = connectLayers(lgraph,"gelu_9","layernorm_9");
    lgraph = connectLayers(lgraph,"gelu_9","addition_6/in2");
    lgraph = connectLayers(lgraph,"batchnorm_7","addition_6/in1");
    lgraph = connectLayers(lgraph,"layernorm_10","groupedconv_10");
    lgraph = connectLayers(lgraph,"layernorm_10","addition_7/in1");
    lgraph = connectLayers(lgraph,"batchnorm_8","addition_7/in2");

    %% Plot Layers
    plot(lgraph);


Keras (Python) sketch of the first branches of the network:

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import (Input, Conv2D, DepthwiseConv2D,
                                         BatchNormalization, LayerNormalization,
                                         Add, Activation)

    # Stem: 4 x 4 patchify convolution with stride 4, then BatchNorm and GELU
    inputs = Input(shape=(224, 224, 3), name="imageinput")
    x = Conv2D(96, kernel_size=(4, 4), strides=(4, 4), padding="same", name="conv")(inputs)
    x = BatchNormalization(name="batchnorm")(x)
    stem = Activation("gelu", name="gelu")(x)

    # First branch: LayerNorm -> channel-wise 3 x 3 convolution -> GELU
    # -> 1 x 1 convolution -> BatchNorm. DepthwiseConv2D with depth_multiplier=4
    # stands in for the channel-wise grouped convolution with 4 filters per group.
    # Note: Keras LayerNormalization normalizes over the last (channel) axis by default.
    x = LayerNormalization(name="layernorm")(stem)
    x = DepthwiseConv2D(kernel_size=(3, 3), depth_multiplier=4, padding="same",
                        name="groupedconv")(x)
    x = Activation("gelu", name="gelu_1")(x)
    x = Conv2D(96, kernel_size=(1, 1), padding="same", name="conv_1")(x)
    x = BatchNormalization(name="batchnorm_1")(x)

    # First shortcut: add the stem output back, then normalize
    x = Add(name="addition")([x, stem])
    x = LayerNormalization(name="layernorm_1")(x)

    # Repeat the process for the remaining branches...

    # Wrap the layers built so far into a single (partial) model
    lgraph = Model(inputs=inputs, outputs=x, name="gptnext_partial")

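As a rough cross-check of the 7.4 M parameter figure, the convolution layers in the Appendix A MATLAB listing can be tallied by hand. The Python sketch below is an illustration under stated assumptions, not an official count: the helper names are hypothetical, every convolution is assumed to carry one bias per output filter, and the normalization and fully connected layers are left out.

```python
def conv_params(kh, kw, in_ch, out_ch):
    """Weights plus biases of a standard 2-D convolution."""
    return kh * kw * in_ch * out_ch + out_ch


def channelwise_params(kh, kw, channels, filters_per_channel):
    """Weights plus biases of a channel-wise grouped convolution."""
    out_ch = channels * filters_per_channel
    return kh * kw * out_ch + out_ch


def sub_block(c):
    """3 x 3 channel-wise conv (4 filters per channel) plus 1 x 1 projection back to c."""
    return channelwise_params(3, 3, c, 4) + conv_params(1, 1, 4 * c, c)


total = conv_params(4, 4, 3, 96)                  # patchify stem
for c in (96, 192, 384, 768):
    total += 2 * sub_block(c)                     # two sub-blocks per stage
    if c != 768:
        total += channelwise_params(2, 2, c, 2)   # stride-2 downsampling, doubles channels
total += conv_params(1, 1, 768, 1280)             # 1 x 1 expansion before pooling

print(f"{total:,} convolution parameters (~{total / 1e6:.1f} M)")
# 7,380,704 convolution parameters (~7.4 M)
```

Under these assumptions the convolutions alone account for roughly 7.4 M parameters, in line with the figure quoted in the text; most of them sit in the two 1 × 1 projections of the 768-channel stage.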
References
- Amelio, L.; Amelio, A. Classification methods in image analysis with a special focus on medical analytics. In Machine Learning Paradigms: Advances in Data Analytics; Springer: Berlin/Heidelberg, Germany, 2019; pp. 31–69. [Google Scholar]
- James, A.P.; Dasarathy, B.V. Medical image fusion: A survey of the state of the art. Inf. Fusion 2014, 19, 4–19. [Google Scholar] [CrossRef]
- Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural networks: A review. J. Med. Syst. 2018, 42, 226. [Google Scholar] [CrossRef]
- Razzak, M.I.; Naz, S.; Zaib, A. Deep learning for medical image processing: Overview, challenges and the future. In Classification in BioApps: Automation of Decision Making; Springer: Berlin/Heidelberg, Germany, 2018; pp. 323–350. [Google Scholar]
- Kaplan, E.; Chan, W.Y.; Altinsoy, H.B.; Baygin, M.; Barua, P.D.; Chakraborty, S.; Dogan, S.; Tuncer, T.; Acharya, U.R. PFP-HOG: Pyramid and Fixed-Size Patch-Based HOG Technique for Automated Brain Abnormality Classification with MRI. J. Imaging Inform. Med. 2023, 36, 2441–2460. [Google Scholar] [CrossRef] [PubMed]
- Gul, Y.; Muezzinoglu, T.; Kilicarslan, G.; Dogan, S.; Tuncer, T. Application of the deep transfer learning framework for hydatid cyst classification using CT images. Soft Comput. 2023, 27, 7179–7189. [Google Scholar] [CrossRef]
- Kaplan, E.; Chan, W.Y.; Dogan, S.; Barua, P.D.; Bulut, H.T.; Tuncer, T.; Cizik, M.; Tan, R.-S.; Acharya, U.R. Automated BI-RADS classification of lesions using pyramid triple deep feature generator technique on breast ultrasound images. Med. Eng. Phys. 2022, 108, 103895. [Google Scholar] [CrossRef]
- Barua, P.D.; Muhammad Gowdh, N.F.; Rahmat, K.; Ramli, N.; Ng, W.L.; Chan, W.Y.; Kuluozturk, M.; Dogan, S.; Baygin, M.; Yaman, O. Automatic COVID-19 detection using exemplar hybrid deep features with X-ray images. Int. J. Environ. Res. Public Health 2021, 18, 8052. [Google Scholar] [CrossRef]
- Rallabandi, V.S.; Seetharaman, K. Deep learning-based classification of healthy aging controls, mild cognitive impairment and Alzheimer’s disease using fusion of MRI-PET imaging. Biomed. Signal Process. Control 2023, 80, 104312. [Google Scholar] [CrossRef]
- Torigian, D.A.; Zaidi, H.; Kwee, T.C.; Saboury, B.; Udupa, J.K.; Cho, Z.-H.; Alavi, A. PET/MR imaging: Technical aspects and potential clinical applications. Radiology 2013, 267, 26–44. [Google Scholar] [CrossRef] [PubMed]
- Banerjee, A.; Chakraborty, C.; Kumar, A.; Biswas, D. Emerging trends in IoT and big data analytics for biomedical and health care technologies. In Handbook of Data Science Approaches for Biomedical Engineering; Elsevier: Amsterdam, The Netherlands, 2020; pp. 121–152. [Google Scholar]
- Jones, L.; Golan, D.; Hanna, S.; Ramachandran, M. Artificial intelligence, machine learning and the evolution of healthcare: A bright future or cause for concern? Bone Jt. Res. 2018, 7, 223–225. [Google Scholar] [CrossRef]
- Tas, N.P.; Kaya, O.; Macin, G.; Tasci, B.; Dogan, S.; Tuncer, T. ASNET: A Novel AI Framework for Accurate Ankylosing Spondylitis Diagnosis from MRI. Biomedicines 2023, 11, 2441. [Google Scholar] [CrossRef]
- Tuncer, T.; Dogan, S.; Subasi, A. A new fractal pattern feature generation function based emotion recognition method using EEG. Chaos Solitons Fractals 2021, 144, 110671. [Google Scholar] [CrossRef]
- Kumar, A.; Kim, J.; Lyndon, D.; Fulham, M.; Feng, D. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE J. Biomed. Health Inform. 2016, 21, 31–40. [Google Scholar] [CrossRef]
- Ranschaert, E.R.; Morozov, S.; Algra, P.R. Artificial Intelligence in Medical Imaging: Opportunities, Applications and Risks; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
- Kaplan, E.; Baygin, M.; Barua, P.D.; Dogan, S.; Tuncer, T.; Altunisik, E.; Palmer, E.E.; Acharya, U.R. ExHiF: Alzheimer’s disease detection using exemplar histogram-based features with CT and MR images. Med. Eng. Phys. 2023, 115, 103971. [Google Scholar] [CrossRef]
- Lanjewar, M.G.; Parab, J.S.; Shaikh, A.Y. Development of framework by combining CNN with KNN to detect Alzheimer’s disease using MRI images. Multimed. Tools Appl. 2023, 82, 12699–12717. [Google Scholar] [CrossRef]
- de Mendonça, L.J.C.; Ferrari, R.J. Alzheimer’s Disease Neuroimaging Initiative. Alzheimer’s disease classification based on graph kernel SVMs constructed with 3D texture features extracted from MR images. Expert Syst. Appl. 2023, 211, 118633. [Google Scholar] [CrossRef]
- Shamrat, F.J.M.; Akter, S.; Azam, S.; Karim, A.; Ghosh, P.; Tasnim, Z.; Hasib, K.M.; De Boer, F.; Ahmed, K. AlzheimerNet: An effective deep learning based proposition for alzheimer’s disease stages classification from functional brain changes in magnetic resonance images. IEEE Access 2023, 11, 16376–16395. [Google Scholar] [CrossRef]
- Arafa, D.A.; Moustafa, H.E.-D.; Ali, H.A.; Ali-Eldin, A.M.; Saraya, S.F. A deep learning framework for early diagnosis of Alzheimer’s disease on MRI images. Multimed. Tools Appl. 2023, 83, 3767–3799. [Google Scholar] [CrossRef]
- Marwa, E.-G.; Moustafa, H.E.-D.; Khalifa, F.; Khater, H.; AbdElhalim, E. An MRI-based deep learning approach for accurate detection of Alzheimer’s disease. Alex. Eng. J. 2023, 63, 211–221. [Google Scholar]
- Yentrapragada, D. Deep features based convolutional neural network to detect and automatic classification of white blood cells. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 9191–9205. [Google Scholar] [CrossRef]
- Ahmad, R.; Awais, M.; Kausar, N.; Akram, T. White Blood Cells Classification Using Entropy-Controlled Deep Features Optimization. Diagnostics 2023, 13, 352. [Google Scholar] [CrossRef] [PubMed]
- Manescu, P.; Narayanan, P.; Bendkowski, C.; Elmi, M.; Claveau, R.; Pawar, V.; Brown, B.J.; Shaw, M.; Rao, A.; Fernandez-Reyes, D. Detection of acute promyelocytic leukemia in peripheral blood and bone marrow with annotation-free deep learning. Sci. Rep. 2023, 13, 2562. [Google Scholar] [CrossRef] [PubMed]
- Leng, B.; Wang, C.; Leng, M.; Ge, M.; Dong, W. Deep learning detection network for peripheral blood leukocytes based on improved detection transformer. Biomed. Signal Process. Control 2023, 82, 104518. [Google Scholar] [CrossRef]
- Rahman, W.; Faruque, M.G.G.; Roksana, K.; Sadi, A.S.; Rahman, M.M.; Azad, M.M. Multiclass blood cancer classification using deep CNN with optimized features. Array 2023, 18, 100292. [Google Scholar] [CrossRef]
- Hamed, E.A.-R.; Salem, M.A.-M.; Badr, N.L.; Tolba, M.F. An Efficient Combination of Convolutional Neural Network and LightGBM Algorithm for Lung Cancer Histopathology Classification. Diagnostics 2023, 13, 2469. [Google Scholar] [CrossRef]
- Huang, P.; Tan, X.; Zhou, X.; Liu, S.; Mercaldo, F.; Santone, A. FABNet: Fusion attention block and transfer learning for laryngeal cancer tumor grading in P63 IHC histopathology images. IEEE J. Biomed. Health Inform. 2021, 26, 1696–1707. [Google Scholar] [CrossRef]
- Zhou, X.; Tang, C.; Huang, P.; Mercaldo, F.; Santone, A.; Shao, Y. LPCANet: Classification of laryngeal cancer histopathological images using a CNN with position attention and channel attention mechanisms. Interdiscip. Sci. Comput. Life Sci. 2021, 13, 666–682. [Google Scholar] [CrossRef] [PubMed]
- Luo, J.; Huang, P.; He, P.; Wei, B.; Guo, X.; Xiao, H.; Sun, Y.; Tian, S.; Zhou, M.; Feng, P. DCA-DAFFNet: An end-to-end network with deformable fusion attention and deep adaptive feature fusion for laryngeal tumor grading from histopathology images. IEEE Trans. Instrum. Meas. 2023, 72, 5031115. [Google Scholar] [CrossRef]
- Raza, R.; Zulfiqar, F.; Khan, M.O.; Arif, M.; Alvi, A.; Iftikhar, M.A.; Alam, T. Lung-EffNet: Lung cancer classification using EfficientNet from CT-scan images. Eng. Appl. Artif. Intell. 2023, 126, 106902. [Google Scholar] [CrossRef]
- Malik, H.; Anees, T.; Din, M.; Naeem, A. CDC_Net: Multi-classification convolutional neural network model for detection of COVID-19, pneumothorax, pneumonia, lung Cancer, and tuberculosis using chest X-rays. Multimed. Tools Appl. 2023, 82, 13855–13880. [Google Scholar] [CrossRef]
- Quasar, S.R.; Sharma, R.; Mittal, A.; Sharma, M.; Agarwal, D.; de La Torre Díez, I. Ensemble methods for computed tomography scan images to improve lung cancer detection and classification. Multimed. Tools Appl. 2023, 83, 52867–52897. [Google Scholar] [CrossRef]
- Huang, P.; He, P.; Tian, S.; Ma, M.; Feng, P.; Xiao, H.; Mercaldo, F.; Santone, A.; Qin, J. A ViT-AMC network with adaptive model fusion and multiobjective optimization for interpretable laryngeal tumor grading from histopathological images. IEEE Trans. Med. Imaging 2022, 42, 15–28. [Google Scholar] [CrossRef] [PubMed]
- Huang, P.; Li, C.; He, P.; Xiao, H.; Ping, Y.; Feng, P.; Tian, S.; Chen, H.; Mercaldo, F.; Santone, A. MamlFormer: Priori-experience guiding transformer network via manifold adversarial multi-modal learning for laryngeal histopathological grading. Inf. Fusion 2024, 108, 102333. [Google Scholar] [CrossRef]
- Pan, H.; Peng, H.; Xing, Y.; Jiayang, L.; Hualiang, X.; Sukun, T.; Peng, F. Breast tumor grading network based on adaptive fusion and microscopic imaging. Opto-Electron. Eng. 2022, 50, 220158. [Google Scholar]
- Iqbal, I.; Mustafa, G.; Ma, J. Deep learning-based morphological classification of human sperm heads. Diagnostics 2020, 10, 325. [Google Scholar] [CrossRef]
- Iqbal, I.; Walayat, K.; Kakar, M.U.; Ma, J. Automated identification of human gastrointestinal tract abnormalities based on deep convolutional neural network with endoscopic images. Intell. Syst. Appl. 2022, 16, 200149. [Google Scholar] [CrossRef]
- Son, J.; Kim, B. Trend Analysis of Large Language Models through a Developer Community: A Focus on Stack Overflow. Information 2023, 14, 602. [Google Scholar] [CrossRef]
- Nazir, A.; Wang, Z. A Comprehensive Survey of ChatGPT: Advancements, Applications, Prospects, and Challenges. Meta-radiology 2023, 1, 100022. [Google Scholar] [CrossRef]
- Rane, N.L.; Tawde, A.; Choudhary, S.P.; Rane, J. Contribution and performance of ChatGPT and other Large Language Models (LLM) for scientific and research advancements: A double-edged sword. Int. Res. J. Mod. Eng. Technol. Sci. 2023, 5, 875–899. [Google Scholar]
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E. The rise and potential of large language model based agents: A survey. arXiv 2023, arXiv:2309.07864. [Google Scholar] [CrossRef]
- Uraninjo. Augmented Alzheimer MRI Dataset, Augmented Alzheimer MRI Dataset. 2022. Available online: https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset (accessed on 21 January 2025).
- Acevedo, A.; Alférez, S.; Merino, A.; Puigví, L.; Rodellar, J. Recognition of peripheral blood cell images using convolutional neural networks. Comput. Methods Programs Biomed. 2019, 180, 105020. [Google Scholar] [CrossRef]
- Acevedo, A.; Merino, A.; Alférez, S.; Molina, Á.; Boldú, L.; Rodellar, J. A dataset of microscopic peripheral blood cell images for development of automatic recognition systems. Data Brief 2020, 30, 105474. [Google Scholar] [CrossRef] [PubMed]
- Misra, B. Lung Cancer Image Dataset. 2022. Available online: https://www.kaggle.com/datasets/bhaveshmisra/lung-cancer-images12000-imagesmostly (accessed on 18 January 2025).
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. Searching for mobilenetv3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Tuncer, T.; Dogan, S.; Özyurt, F.; Belhaouari, S.B.; Bensmail, H. Novel Multi Center and Threshold Ternary Pattern Based Method for Disease Detection Method Using Voice. IEEE Access 2020, 8, 84532–84540.
- Goldberger, J.; Hinton, G.E.; Roweis, S.; Salakhutdinov, R.R. Neighbourhood components analysis. Adv. Neural Inf. Process. Syst. 2004, 17, 513–520.
- Maillo, J.; Ramírez, S.; Triguero, I.; Herrera, F. kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data. Knowl.-Based Syst. 2017, 117, 3–15.
- Vapnik, V. The support vector method of function estimation. In Nonlinear Modeling; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–85.
- Zhang, Y.; Zhou, X.; Witt, R.M.; Sabatini, B.L.; Adjeroh, D.; Wong, S.T. Dendritic spine detection using curvilinear structure detector and LDA classifier. Neuroimage 2007, 36, 346–360.
- Rezaee, K.; Khosravi, M.R.; Sabri, M.; Al-Qawasmi, K. A Hybrid Deep Cascade-ResNet Model for Detecting Alzheimer’s Stages in MR Images. In Proceedings of the 2022 International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI), Zarqa, Jordan, 6–8 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Jha, S.; Mahajan, A.D.; Thenraj, P.G.; Karuppasamy, S.; Lakshminarayanan, S. Comparative Evaluation of Transfer Learning Models on Dementia Prediction. In Proceedings of the 2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore, India, 1–3 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1520–1526.
- Elgendy, O.; Nassif, A.B. Alzheimer Detection using Different Deep Learning Methods with MRI Images. In Proceedings of the 2023 Advances in Science and Engineering Technology International Conferences (ASET), Dubai, United Arab Emirates, 20–23 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Divyadurga, S.; Aashiq, A.; Jayanthi, K. Automated Pathology Diagnosis Using Artifical Intelligence with Histopathological Images of Lungs. In Proceedings of the 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN), Vellore, India, 5–6 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.

| Dataset | No | Class | Training | Test | Total | % of Total | Imbalance Ratio |
|---|---|---|---|---|---|---|---|
| AD | 1 | Mild dementia | 8960 | 896 | 9856 | 24.4% | 1.00 (ref) |
| | 2 | Moderate dementia | 6464 | 64 | 6528 | 16.2% | 0.66 |
| | 3 | No dementia | 9600 | 3200 | 12,800 | 31.7% | 1.30 |
| | 4 | Very mild dementia | 8960 | 2240 | 11,200 | 27.7% | 1.14 |
| | Total | | 33,984 | 6400 | 40,384 | 100% | --- |
| Blood | 1 | Basophil | 914 | 304 | 1218 | 7.1% | 1.00 (ref) |
| | 2 | Eosinophil | 2340 | 777 | 3117 | 18.3% | 2.56 |
| | 3 | Erythrocyte | 1164 | 387 | 1551 | 9.1% | 1.27 |
| | 4 | IG | 2172 | 723 | 2895 | 16.9% | 2.38 |
| | 5 | Lymphocyte | 912 | 302 | 1214 | 7.1% | 1.00 |
| | 6 | Monocyte | 1068 | 352 | 1420 | 8.3% | 1.17 |
| | 7 | Neutrophil | 2497 | 832 | 3329 | 19.4% | 2.73 |
| | 8 | Platelet | 1764 | 584 | 2348 | 13.7% | 1.93 |
| | Total | | 12,831 | 4261 | 17,092 | 100% | |
| Lung Cancer | 1 | Adenocarcinoma | 4000 | 999 | 4999 | 33.3% | 1.00 (ref) |
| | 2 | Neuroendocrine | 4000 | 999 | 4999 | 33.3% | 1.00 |
| | 3 | Squamous cell carcinoma | 4000 | 999 | 4999 | 33.3% | 1.00 |
| | Total | | 12,000 | 2997 | 14,997 | 100% | --- |
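
The "% of Total" and "Imbalance Ratio" columns can be recomputed from the per-class totals. The following minimal Python sketch does so for the AD rows. The name `class_distribution` is illustrative, and the sketch assumes that the ratio is each class total divided by the total of the first-listed (reference) class, which is consistent with the tabulated values.

```python
def class_distribution(counts, ref=None):
    """Return {class: (percent_of_total, imbalance_ratio)} for per-class counts.

    Assumption: the ratio is taken relative to the first class listed,
    matching the "(ref)" rows of the dataset table.
    """
    names = list(counts)
    ref = names[0] if ref is None else ref
    total = sum(counts.values())
    return {
        name: (round(100 * n / total, 1), round(n / counts[ref], 2))
        for name, n in counts.items()
    }


# Per-class totals of the AD dataset, copied from the table.
ad_totals = {
    "Mild dementia": 9856,
    "Moderate dementia": 6528,
    "No dementia": 12800,
    "Very mild dementia": 11200,
}
ad_stats = class_distribution(ad_totals)
for name, (percent, ratio) in ad_stats.items():
    print(f"{name}: {percent}% of total, ratio {ratio:.2f}")
```

The same function can be applied to the Blood and Lung Cancer rows by passing their class totals.
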

| Layer | Input | Operation | Output |
|---|---|---|---|
| Stem | 224 × 224 | 4 × 4, 96, BN + GELU, stride: 4 | 56 × 56 |
| GPTNeXt block 1 | 56 × 56 | 2 × 2, 192, LN + GELU, stride: 2 | 28 × 28 |
| GPTNeXt block 2 | 28 × 28 | 2 × 2, 192, LN + GELU, stride: 2 | 14 × 14 |
| GPTNeXt block 3 | 14 × 14 | 2 × 2, 192, LN + GELU, stride: 2 | 7 × 7 |
| GPTNeXt block 4 | 7 × 7 | | 7 × 7 |
| Output size | 7 × 7 | 1 × 1, 1280, LN + GELU + BN, global average pooling, fully connected layer, softmax outcomes, classification | |
| Total learnable parameters | 7.4 million | | |
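
The spatial sizes listed in the table follow from the kernel size and stride of each downsampling step. As a rough check, the sketch below applies the standard convolution output-size formula, assuming zero padding (the padding is not stated in the table). The names `conv_output_size` and `stages` are illustrative.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, kernel, stride) for the stem and the three downsampling steps.
stages = [
    ("Stem", 4, 4),
    ("GPTNeXt block 1", 2, 2),
    ("GPTNeXt block 2", 2, 2),
    ("GPTNeXt block 3", 2, 2),
]

size = 224  # input resolution from the table
sizes = []
for layer, kernel, stride in stages:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)
    print(f"{layer}: {size} x {size}")
```

Because each kernel equals its stride, the windows do not overlap, so every step divides the resolution exactly.
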

| Design Principle | MobileNetV2 [49] | MobileNetV3 [50] | ConvNeXt-T [48] | GPTNeXt (Ours) |
|---|---|---|---|---|
| Inverted bottleneck | ✓ | ✓ | ✓ | ✓ |
| Depth-wise separable conv | ✓ | ✓ | ✓ (7 × 7) | ✓ (3 × 3) |
| Squeeze-and-excitation | ✗ | ✓ | ✗ | ✗ |
| Patchify stem | ✗ | ✗ | ✓ (4 × 4) | ✓ (4 × 4) |
| Layer normalization | ✗ | ✗ | ✓ | ✓ |
| GELU activation | ✗ | ✗ | ✓ | ✓ |
| Grouped downsampling | ✗ | ✗ | ✗ | ✓ |
| Dual shortcuts | ✗ | ✗ | ✗ | ✓ |
| Parameters | 3.4 M | 5.4 M | 28.6 M | 7.4 M |

| Metric (%) | Dataset | LDA | kNN | SVM |
|---|---|---|---|---|
| Accuracy | AD | 99.84 | 100 | 99.89 |
| | Blood | 97.91 | 98.24 | 98.03 |
| | Lung | 99.10 | 99.70 | 99.53 |
| UAR | AD | 99.91 | 100 | 99.94 |
| | Blood | 97.76 | 98.17 | 97.98 |
| | Lung | 99.10 | 99.70 | 99.53 |
| UAP | AD | 99.89 | 100 | 99.93 |
| | Blood | 97.93 | 98.19 | 98.15 |
| | Lung | 99.10 | 99.70 | 99.53 |
| F1 | AD | 99.90 | 100 | 99.93 |
| | Blood | 97.85 | 98.18 | 98.07 |
| | Lung | 99.10 | 99.70 | 99.53 |

| Dataset | Class | Sensitivity | Specificity | Precision | F1-Score |
|---|---|---|---|---|---|
| AD | Mild Dementia | 100.00 | 100.00 | 100.00 | 100.00 |
| | Moderate Dementia | 100.00 | 100.00 | 100.00 | 100.00 |
| | No Dementia | 100.00 | 100.00 | 100.00 | 100.00 |
| | Very Mild Dementia | 100.00 | 100.00 | 100.00 | 100.00 |
| | Macro-Average | 100.00 | 100.00 | 100.00 | 100.00 |
| Blood | Basophil | 98.03 | 99.87 | 98.35 | 98.19 |
| | Eosinophil | 99.74 | 99.94 | 99.74 | 99.74 |
| | Erythrocyte | 96.90 | 99.79 | 97.91 | 97.40 |
| | IG | 96.54 | 99.24 | 96.28 | 96.41 |
| | Lymphocyte | 99.34 | 99.87 | 98.36 | 98.85 |
| | Monocyte | 96.88 | 99.74 | 97.15 | 97.01 |
| | Neutrophil | 98.08 | 99.53 | 98.08 | 98.08 |
| | Platelet | 99.83 | 99.95 | 99.66 | 99.74 |
| | Macro-Average | 98.17 | 99.74 | 98.19 | 98.18 |
| Lung | Adenocarcinoma | 99.80 | 99.65 | 99.30 | 99.55 |
| | Neuroendocrine | 100.00 | 100.00 | 100.00 | 100.00 |
| | Squamous Cell Carcinoma | 99.30 | 99.90 | 99.80 | 99.55 |
| | Macro-Average | 99.70 | 99.85 | 99.70 | 99.70 |
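
The macro-averaged rows correspond to the unweighted means of the per-class values (UAR for sensitivity, UAP for precision). The sketch below shows how such values could be derived from a confusion matrix. The matrix `toy_cm` is made-up illustration data, not a result from this study, and the F1 value is assumed to be the mean of the per-class F1-scores.

```python
def macro_metrics(cm):
    """Macro-averaged recall (UAR), precision (UAP) and F1.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    recalls, precisions, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                          # all samples of class c
        predicted = sum(cm[r][c] for r in range(n))  # all predictions of class c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    return {
        "UAR": sum(recalls) / n,
        "UAP": sum(precisions) / n,
        "F1": sum(f1s) / n,
    }


# Hypothetical two-class confusion matrix, for illustration only.
toy_cm = [[8, 2],
          [1, 9]]
scores = macro_metrics(toy_cm)
print({name: round(100 * value, 2) for name, value in scores.items()})
```

Each class contributes equally to these averages regardless of its size, which is why they are reported alongside accuracy for the imbalanced AD and Blood datasets.
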

| Dataset | Classifier | Accuracy (%) | 95% CI | Sample Size |
|---|---|---|---|---|
| AD | LDA | 99.84 | [99.71, 99.91] | 6400 |
| | kNN | 100.00 | [99.94, 100.00] | 6400 |
| | SVM | 99.89 | [99.77, 99.95] | 6400 |
| Blood | LDA | 97.91 | [97.44, 98.30] | 4261 |
| | kNN | 98.24 | [97.80, 98.59] | 4261 |
| | SVM | 98.03 | [97.57, 98.41] | 4261 |
| Lung | LDA | 99.10 | [98.69, 99.38] | 2997 |
| | kNN | 99.70 | [99.43, 99.84] | 2997 |
| | SVM | 99.53 | [99.21, 99.72] | 2997 |
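
A binomial confidence interval for an accuracy can be obtained in several ways. The sketch below uses the Wilson score interval, which is an assumption on our part: it reproduces the kNN row for AD (6400 of 6400 correct), but the interval method is not named in the table. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, min(1.0, center + half)


# kNN on AD: all 6400 test images classified correctly.
low, high = wilson_interval(6400, 6400)
print(f"[{100 * low:.2f}, {100 * high:.2f}]")
```

Unlike the normal-approximation interval, the Wilson interval stays informative at 100% accuracy, where it gives a lower bound of n / (n + z²).
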

| Dataset | Comparison | χ² | p-Value | Significance |
|---|---|---|---|---|
| AD | kNN vs. LDA | 10.24 | 0.0014 | ** |
| | kNN vs. SVM | 7.11 | 0.0077 | ** |
| Blood | kNN vs. LDA | 8.45 | 0.0037 | ** |
| | kNN vs. SVM | 2.13 | 0.1445 | n.s. |
| Lung | kNN vs. LDA | 12.57 | 0.0004 | *** |
| | kNN vs. SVM | 3.27 | 0.0706 | n.s. |
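
The p-values in the table are consistent with a χ² statistic evaluated at one degree of freedom, as is the case for pairwise classifier tests such as McNemar's test. The sketch below converts a statistic into a p-value under that assumption. The significance thresholds (0.05, 0.01, 0.001) are the common convention and are assumed here, since no legend is given.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def significance_label(p):
    """Conventional star notation (assumed thresholds)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


for chi2 in (10.24, 12.57, 3.27):
    p = chi2_p_value_1df(chi2)
    print(f"chi2 = {chi2}: p = {p:.4f} ({significance_label(p)})")
```

The identity used here is that a χ² variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(√(χ²/2)).
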

| Configuration | Accuracy (%) | Parameters |
|---|---|---|
| Baseline (ResNet-18 style) | 96.42 | 11.7 M |
| +Patchify stem | 97.15 | 8.9 M |
| +Inverted bottleneck | 97.89 | 7.8 M |
| +Layer Norm (Pre-LN) | 98.34 | 7.6 M |
| +Grouped downsampling | 98.91 | 7.4 M |
| +GELU activation | 99.12 | 7.4 M |
| +Dual shortcuts (Full GPTNeXt) | 99.84 | 7.4 M |

| Method | AD (%) | Blood (%) | Lung (%) |
|---|---|---|---|
| GPTNeXt (softmax only) | 89.60 | 96.96 | 98.53 |
| ResNet-50 (softmax only) | 87.34 | 95.12 | 97.45 |
| ConvNeXt-T (softmax only) | 88.92 | 96.23 | 98.21 |
| GPTNeXt + Feature Eng. + kNN | 100.0 | 98.24 | 99.70 |
| ResNet-50 + Feature Eng. + kNN | 97.23 | 96.45 | 98.12 |
| ConvNeXt-T + Feature Eng. + kNN | 99.45 | 97.89 | 99.21 |

| k Value | AD (%) | Blood (%) | Lung (%) |
|---|---|---|---|
| k = 1 | 100.00 | 98.24 | 99.70 |
| k = 3 | 99.92 | 97.89 | 99.53 |
| k = 5 | 99.86 | 97.65 | 99.37 |
| k = 7 | 99.78 | 97.42 | 99.20 |
| k = 9 | 99.72 | 97.21 | 99.03 |
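
For reference, the role of k in the table above can be illustrated with a minimal kNN classifier. This is only a sketch: the Euclidean distance and the plain majority vote are assumptions, since the distance metric and voting scheme are not given in the table, and the training points are toy data.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training vectors nearest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors, for illustration only.
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_y = ["a", "a", "b"]

pred_k1 = knn_predict(train_x, train_y, [4.0, 4.0], k=1)
pred_k3 = knn_predict(train_x, train_y, [4.0, 4.0], k=3)
print(pred_k1, pred_k3)
```

With k = 1 the query takes the label of its single nearest neighbour, whereas a larger k lets more distant samples outvote it, which is one way the accuracy can fall as k grows.
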

| Configuration | AD (%) | Blood (%) | Lung (%) | Features |
|---|---|---|---|---|
| GPTNeXt (softmax only) | 89.60 | 96.96 | 98.53 | N/A |
| GPTNeXt + Full image only + kNN | 94.23 | 97.35 | 98.87 | 1280 |
| GPTNeXt + Patches (no INCA) + kNN | 97.45 | 96.82 | 98.43 | 12,800 |
| GPTNeXt + Patches + INCA + kNN (Full) | 100.00 | 98.24 | 99.70 | 177/950/308 |
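
The feature counts in the last row (177/950/308) result from an iterative search over candidate subset sizes. The sketch below outlines such a loop under stated assumptions: features are already ranked (in INCA, by NCA weights), subsets of the top 100 to 1000 features are evaluated (a range assumed here because it yields the 901 iterations reported for INCA), and `evaluate` is a hypothetical callback returning a loss such as the kNN misclassification rate.

```python
def iterative_selection(ranked_indices, evaluate, start=100, stop=1000):
    """Evaluate the top-k ranked features for k = start..stop; keep the best k.

    ranked_indices: feature indices sorted from most to least important.
    evaluate:       hypothetical callback mapping a list of feature indices
                    to a loss value (lower is better).
    """
    best_loss, best_subset = float("inf"), None
    upper = min(stop, len(ranked_indices))
    for k in range(start, upper + 1):
        subset = ranked_indices[:k]
        loss = evaluate(subset)
        if loss < best_loss:
            best_loss, best_subset = loss, subset
    return best_subset, best_loss


calls = []


def toy_loss(subset):
    """Toy loss with its minimum at 177 features (illustration only)."""
    calls.append(len(subset))
    return abs(len(subset) - 177) / 1000


selected, loss = iterative_selection(list(range(12_800)), toy_loss)
print(len(selected), len(calls))
```

In practice the callback would train and validate the classifier on each candidate subset, which is why this step accounts for several minutes of the training time.
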

| Study | Method | Split ratio | Classifier | Accuracy (%) |
|---|---|---|---|---|
| Alzheimer’s MR Image Dataset (AD) | | | | |
| Rezaee et al. [56] | Cascade-ResNet | 5-fold CV | Softmax | 99.02 |
| Jha et al. [57] | InceptionResnetV2 | 84.15:15.85 | Softmax | 99.23 |
| Elgendy and Nassif [58] | VGG16 | 70:15:15 | Softmax | 97.00 |
| Our method | GPTNeXt | 10-fold CV | kNN | 100.0 |
| Blood Image Cell Dataset (Blood) | | | | |
| Acevedo et al. [45] | VGG16 | 5-fold CV | Softmax | 96.20 |
| Our method | GPTNeXt | 10-fold CV | kNN | 98.24 |
| Lung Cancer Image Dataset | | | | |
| Sowdeswari et al. [59] | ResNet50 | 67:25:8 | Softmax | 98.00 |
| Our method | GPTNeXt | 10-fold CV | kNN | 99.70 |

| Phase | Component | AD | Blood | Lung |
|---|---|---|---|---|
| Training | GPTNeXt CNN training (100 epochs) | ~38 min | ~14 min | ~13 min |
| | Feature extraction (training set) | ~113 s | ~43 s | ~40 s |
| | INCA feature selection (901 iterations) | ~8 min | ~4 min | ~4 min |
| | Classifier training (10-fold CV) | ~3 min | ~2 min | ~2 min |
| | Total training time | ~52 min | ~23 min | ~22 min |
| Inference | Single-image processing | 4.5 ms | 4.5 ms | 4.5 ms |
| | Throughput | 222 img/s | 222 img/s | 222 img/s |

| Step | Operation | Time (ms) |
|---|---|---|
| 1 | Image resizing (224 × 224 × 3) | 0.5 |
| 2 | Patch generation (9 patches, 112 × 112 × 3) | 0.3 |
| 3 | GPTNeXt forward passes (10×) | 3.3 |
| 4 | Feature concatenation (12,800-dim) | 0.1 |
| 5 | Feature selection (pre-computed indices) | 0.1 |
| 6 | kNN classification | 0.2 |
| Total | | 4.5 |
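
Steps 2 to 4 can be related to one another with a short calculation. The sketch below lays out nine 112 × 112 patches on a 224 × 224 image and derives the concatenated feature length. The 3 × 3 grid with a stride of 56 pixels (50% overlap) is an assumption, chosen because it is the regular layout that yields nine patches of this size; the actual patch positions are not given in the table. The name `patch_boxes` is illustrative.

```python
def patch_boxes(image_size=224, patch_size=112, grid=3):
    """Boxes (top, left, bottom, right) of a grid x grid layout of patches.

    Assumption: the stride is chosen so that the patches span the image,
    which for 224 / 112 / 3 gives a stride of 56 (50% overlap).
    """
    stride = (image_size - patch_size) // (grid - 1)
    return [
        (r * stride, c * stride, r * stride + patch_size, c * stride + patch_size)
        for r in range(grid)
        for c in range(grid)
    ]


FEATURES_PER_PASS = 1280  # width of the final 1 x 1 layer of GPTNeXt

boxes = patch_boxes()
num_passes = 1 + len(boxes)  # the full image plus the nine patches
feature_dim = num_passes * FEATURES_PER_PASS
print(len(boxes), num_passes, feature_dim)
```

Ten forward passes of 1280 features each give the 12,800-dimensional vector of step 4, from which the pre-computed INCA indices are then taken in step 5.
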

| Component | Memory |
|---|---|
| GPTNeXt model (FP32) | 28.2 MB |
| GPTNeXt model (FP16) | 14.1 MB |
| Feature vector per image (12,800 × FP32) | 50.0 KB |
| Peak GPU memory (training) | ~2.5 GB |
| Peak GPU memory (inference) | ~500 MB |
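
The model and feature-vector sizes above follow directly from the parameter count and numeric precision. The sketch below reproduces them, assuming binary units (1 MB = 2^20 bytes and 1 KB = 2^10 bytes), since this is the convention under which the tabulated values are recovered. The helper name `mebibytes` is illustrative.

```python
def mebibytes(num_values, bytes_per_value):
    """Storage for num_values numbers, in units of 2**20 bytes."""
    return num_values * bytes_per_value / 2**20


PARAMS = 7.4e6  # learnable parameters of GPTNeXt

fp32_mb = mebibytes(PARAMS, 4)       # 32-bit floats: 4 bytes per parameter
fp16_mb = mebibytes(PARAMS, 2)       # 16-bit floats: 2 bytes per parameter
feature_kb = 12_800 * 4 / 2**10      # one 12,800-dim FP32 feature vector

print(f"{fp32_mb:.1f} MB, {fp16_mb:.1f} MB, {feature_kb:.1f} KB")
```

The peak GPU memory figures cannot be derived this way, because they also include activations, optimizer state and framework overhead.
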

| Model | Parameters (M) | GFLOPs | Model Size (MB) | Inference Time (ms) |
|---|---|---|---|---|
| VGG-16 | 138.4 | 15.5 | 528 | 31.0 |
| ResNet-50 | 25.6 | 4.1 | 98 | 23.0 |
| ConvNeXt-T | 28.6 | 4.5 | 109 | 18.0 |
| EfficientNet-B0 | 5.3 | 0.4 | 20 | 13.0 |
| GPTNeXt (Ours) | 7.4 | 1.8 | 28 | 4.5 * |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Alotaibi, F.A.; Yagmahan, M.S.N.; Alobaid, K.A.; Jari, M.; Goktas, O.F.; Baygin, M.; Tuncer, T.; Dogan, S. GPTNeXt: Biomedical Image Classification Investigations. Diagnostics 2026, 16, 581. https://doi.org/10.3390/diagnostics16040581

