1. Introduction
Cancers are a group of noncommunicable diseases that can occur almost anywhere in the human body [1]. They are characterized by unregulated cell growth and invasion into neighboring tissues, organs, and other anatomical sites. Among all causes of death, the World Health Organization (WHO) ranks cancer as the second leading killer worldwide. In 2020, there were an estimated 19.3 million new cases and nearly 10 million deaths [2]. Unfortunately, most cancers continue to pose difficulties regarding early detection, treatment, and prognosis [3].
Early cancer detection improves curability, resulting in significantly less morbidity and mortality than if cancers are detected at more advanced stages [4]. In recent years, medical imaging techniques have played a crucial role in cancer assessment because they provide detailed visualization of the human body's internal structures, which aids in cancer diagnosis and treatment [5]. In addition, accurate cancer susceptibility, recurrence, and survival predictions are essential to increase patients' survival rates.
Oral cancer is a complex, widespread malignancy, reported as the sixth most commonly diagnosed cancer [4]. There were 377,713 new lip and oral cavity cancer cases in 2020, with 177,757 deaths as a result [2]. As shown in Figure 1, by 2030 there will be an estimated 467,000 new cases and 220,000 deaths [2]. Oral cancer is one of the most lethal diseases of the head and neck, characterized by a wide range of behavior patterns, a high recurrence rate, and an increasing incidence [6]. Comorbidities such as speech impairment, oral pain, malnutrition, dysphagia, and lack of appetite are also common among people with oral cancer, and they contribute to the poor health-related quality of life these patients experience [1]. Oral squamous cell carcinomas (OSCCs) account for more than 90% of all cases of oral cancer [7]; however, only 70% of patients with this aggressive form of cancer survive beyond five years [1]. While lymphoma and leukemia are the most common types of cancer in the Kingdom of Saudi Arabia (KSA), oral cancer (OC) is the third most common [8].
Oral cancer predominantly affects the head, neck, and various subsites (Figure 2) [9,10]. It often arises from oral lesions and can potentially spread to other body parts [11]. Despite advances in treatment, including chemoradiation, radiation therapy, immunotherapy, and other anticancer treatments, the survival rate remains at 40% to 50% [12]. Early diagnosis and tailored treatment selection are imperative to enhance patient outcomes. Unfortunately, most oral cancer cases are detected late, with early lesions often asymptomatic and benign in appearance, making clinical diagnosis challenging [4]. Overcoming obstacles such as low awareness, limited screening, and delayed specialist consultation is crucial to prevent misdiagnosis, disease progression, and decreased survival.
The early detection of oral cancer improves patient survival rates [13] and can impact the outcomes for individuals with oral cancer as follows:
- More effective treatment options: Treatment options are more effective when oral cancer is detected early. Surgery, radiation therapy, and chemotherapy are common treatments for oral cancer, and they are most successful when the cancer is localized and has not spread to nearby tissues or lymph nodes.
- Higher cure rates: Early-stage oral cancer is often curable. Patients diagnosed at an early stage have a significantly higher chance of being cured than those diagnosed at an advanced stage, when the cancer has already spread to other body parts.
- Preservation of function and appearance: Early detection may allow less aggressive treatments to preserve important functions such as speech, swallowing, and chewing. It can also help in preserving the patient's facial appearance.
- Reduced morbidity: Advanced oral cancer can lead to significant morbidity, including disfigurement and difficulty in eating and speaking. Early detection can reduce the extent of surgery required and the associated complications, leading to a better quality of life for the patient.
- Lower healthcare costs: Treating oral cancer at an advanced stage typically involves more extensive and costly interventions. Early detection can lead to less aggressive treatments, shorter hospital stays, and reduced healthcare expenses.
- Improved quality of life: Early detection increases the chances of survival and improves the patient's overall quality of life. Patients diagnosed and treated at an early stage generally experience fewer side effects from treatment and a faster recovery.
Early oral cancer detection can reduce the death rate by 70% [14], underscoring the importance of precise histopathological identification and accurate early detection for informed treatment decisions and improved survival. In the early stages, oral tumors often lack symptoms and manifest as erythroleukoplakic lesions, including white patches (leukoplakia) or red patches (erythroplakia) [9]. The process of oral tumor identification involves several stages, as illustrated in Figure 3. It begins with conventional oral examinations conducted by dentists and specialists during routine check-ups. Subsequently, two diagnostic approaches are employed. The first is non-invasive and includes digital imaging, biomarker detection in saliva, and medical imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) [15]. Pathologists or computer-aided systems can perform non-invasive assessments. The second approach is invasive and entails a tissue biopsy for microscopic analysis. Histological grading is used to classify cancer cells based on tissue abnormalities, focusing on architectural differences and keratin pearls [16]. High-throughput microscopy techniques, such as ex vivo fluorescent confocal microscopy (FCM), can also be employed.
In the invasive assessment, tissue samples are extracted following a clinical examination to confirm the disease's presence through histological processes. These collected samples are processed, embedded in paraffin blocks, and then sectioned. Various tissue components are stained with different dyes for examination under optical magnification, commonly using hematoxylin and eosin (H&E) staining [1]. Diagnosing oral lesions relies on complex and expensive microscopic examination to detect cyto-histopathological abnormalities by analyzing tissue characteristics [17]. Histopathological images contain valuable phenotypic information for disease management and patient survival [18]. However, this gold standard approach is demanding, requiring experienced pathologists to annotate structures and morphological features on numerous tissue sections, impacting examination accuracy. Given the limitations of current approaches, there is a need for more accurate early screening methods for oral cancer, emphasizing the importance of precise histopathological identification in disease estimation and prognosis.
The rising prevalence of several diseases has forced medical experts to turn to technological aid [19]. In this vein, increasing diagnostic and prognostic accuracy may aid doctors in providing more precise care [6]. Furthermore, whole-slide images (WSIs) enable digital image production from whole-tissue slides at high resolution [18]. However, these microscopic imaging techniques produce unlabeled, large-sized microscopic images containing spatial information, cell interactions, and many objects [20]. Manual screenings for oral cancer, while valuable, come with several challenges and limitations:
- Subjectivity: Manual screenings heavily rely on the subjective judgment and experience of healthcare professionals, such as dentists and oral surgeons. Variability in interpretation can lead to inconsistencies in detecting abnormalities.
- Late detection: In some cases, oral cancer may not exhibit visible or palpable symptoms until it reaches an advanced stage. This means that even skilled professionals may miss early signs of cancer during routine screenings.
- Patient cooperation: Successful manual screenings depend on patients' ability to cooperate by opening their mouths and remaining still. This can be challenging, especially for anxious patients or those with limited mobility or cognitive impairments.
- False positives and negatives: Manual screenings can result in false positives (identifying a benign condition as cancerous) or false negatives (missing cancerous lesions). These errors can lead to unnecessary anxiety and additional testing, or to delayed diagnosis.
- Time consumption: Manual screenings can be time consuming, particularly in busy clinical settings. This may lead to rushed examinations or reduced thoroughness.
These challenges can impact the accuracy and effectiveness of early detection efforts. Thus, an automated system is necessary to augment pathologists' tasks.
Pathomics, integrating machine learning and digital pathology, aims to enhance prognostication. It can analyze a wide range of WSI data to generate quantitative features characterizing tissue sample phenotypes [21]. Machine learning has emerged as a promising approach in oncology, supporting disease prevention, accurate diagnoses, treatment decisions, and patient care [19]. This technology effectively analyzes medical images, including lesions and pathologies, facilitating early and precise diagnosis based on macroscopic photographs.
The digitalization of histopathology and pathomics has created a promising field that can transform medical and surgical pathology [20]. Machine learning, particularly deep learning, offers an opportunity to automate feature extraction and classification for early malignancy screening [22]. Despite challenges such as limited datasets, heterogeneity, and computational complexity, this study is motivated by several factors: (i) automating time-consuming tasks in visual tissue slide examination can aid pathologists [23]; (ii) precise automated classification is crucial for early tumor detection; (iii) deep learning has a growing impact on medical diagnosis; (iv) it assists medical professionals in treatment planning; and (v) optimization processes are essential for model design and hyperparameter tuning.
The research focuses on the challenging issue of oral cancer, a complex malignancy with a high incidence rate. Despite remarkable treatment strategy advancements, oral cancer’s survival rate remains distressingly low. Late-stage diagnosis is a primary contributing factor to this unfortunate reality. Early detection is pivotal in enhancing patient outcomes, as it directly correlates with reduced morbidity and mortality rates. Regrettably, early-stage oral cancer lesions often remain asymptomatic, posing significant diagnostic challenges. The current diagnostic methods, which heavily rely on labor-intensive and time-consuming histopathological examination by experienced pathologists, have demonstrated limitations. Therefore, a pressing need exists for developing more reliable and efficient early oral cancer detection screening methods. With this context in mind, the primary research objectives of this study are outlined as follows:
- Development of an automated oral cancer classification model: The foremost goal is to create an innovative model for the automated classification of oral cancer, employing cutting-edge deep learning techniques, specifically convolutional neural networks (CNNs). This framework is designed to be adaptable, removing the need for the manual assignment of hyperparameters and ensuring the automation of the classification process.
- Leveraging transfer learning for enhanced efficiency: Harness the power of transfer learning (TL) to significantly enhance the efficiency and effectiveness of oral cancer classification. This involves capitalizing on pre-trained CNN models to improve the accuracy of our classification system.
- Optimization with the Aquila and Gorilla Optimizer algorithms: Investigate the utilization of the Aquila and Gorilla Optimizers. These algorithms play a pivotal role in optimizing the performance of both the CNNs and the TL process, with the aim of improving classification accuracy. Furthermore, conduct a comparative analysis pitting the Aquila and Gorilla Optimizers against other nature-inspired algorithms to ascertain which optimization approach yields superior results within the context of oral cancer classification.
In summary, this research addresses the pressing issue of early oral cancer detection through a multifaceted approach that includes automated classification models, transfer learning, and advanced optimization algorithms. The following points summarize the current study’s contributions:
- An innovative model for classifying oral cancer, built upon pre-trained CNNs.
- The fusion of deep learning CNNs with the Aquila and Gorilla Optimizers, demonstrating their efficiency in oral cancer classification.
- A comprehensive optimization of each pre-trained model's performance through meticulous adjustments to the CNN and TL hyperparameters, facilitated by the Aquila and Gorilla Optimizers.
- The introduction of an adaptable framework that eliminates the need for manual hyperparameter assignment.
- A thorough comparative analysis between the two optimization algorithms, Aquila and Gorilla.
- Promising outcomes in classification performance, as substantiated by standard performance metrics.
The remaining sections of the paper are organized as follows: Section 2 presents an overview of deep learning, metaheuristic optimization, and the AO and GTO algorithms. The related research is set out in Section 3. Section 4 presents the proposed strategy and framework. The experimental results and comparisons with state-of-the-art techniques are discussed in Section 5. Finally, Section 6 presents the study's conclusions.
2. Background
Deep learning (DL) is a subset of artificial intelligence that mimics brain functions in data processing and decision making. The potential impact of applying optimized DL models in healthcare diagnostics is profound and far-reaching. Optimized DL models can analyze medical images, such as X-rays, MRIs, CT scans, and histopathology slides, with unprecedented accuracy. This enables the early detection of diseases such as cancer, cardiovascular issues, and neurological disorders, increasing the chances of successful treatment and improved patient outcomes. Furthermore, DL models can assist healthcare professionals in making more accurate diagnoses. They can help identify subtle patterns or anomalies that might be missed by human observers, reducing diagnostic errors and ensuring that patients receive appropriate care. They can also analyze large datasets to predict disease outbreaks, patient readmissions, and healthcare resource utilization.
Optimized models can analyze a patient’s medical history, genetic data, and other relevant information to recommend personalized treatment plans. This tailoring of treatment can lead to more effective therapies with fewer side effects. Additionally, DL models can automate routine tasks, such as triage, data entry, and medical image analysis. They can accelerate discovery by predicting potential drug candidates, simulating molecular interactions, and identifying disease biomarkers.
While there is an initial investment in developing and optimizing DL models, they can ultimately lead to cost savings in healthcare. Early disease detection, reduced hospitalizations, and more efficient resource allocation can lower healthcare costs in the long run. Moreover, DL models can continuously learn and adapt to new data and research findings. As more patient data becomes available and medical knowledge advances, these models can improve diagnostic accuracy and treatment recommendations. DL models can also be integrated into telemedicine platforms, enabling remote diagnosis and consultation.
Deep learning can address several challenges associated with manual oral cancer screenings [24] and can help to overcome them in the following ways:
- Automation: Deep learning models can automatically analyze medical images, such as oral cavity photographs or radiographs, without human interpretation. This automation can increase the efficiency of screenings and reduce the burden on healthcare professionals.
- Consistency: Deep learning models provide consistent and objective results, reducing the variability introduced by different healthcare providers. This consistency can lead to more reliable detection of abnormalities.
- Early detection: Deep learning algorithms can detect subtle and early signs of oral cancer that the human eye might miss. They can identify irregular patterns, shapes, and color changes in images indicative of precancerous or cancerous lesions.
- Enhanced visualization: Deep learning can enhance the visualization of challenging areas in the oral cavity, such as the base of the tongue or tonsils, by processing images to highlight potential abnormalities or fusing information from multiple imaging modalities.
- Reduced false positives and negatives: With proper training and validation, deep learning models can significantly reduce the occurrence of false positives and false negatives in oral cancer screenings, leading to more accurate results and reducing patient anxiety.
- Continuous learning: Deep learning models can continuously learn and adapt from new data, allowing them to improve over time as more cases are analyzed. This adaptability can keep the model up to date with the latest information and detection techniques.
- Speed: Deep learning algorithms can quickly process large medical image datasets, leading to faster screenings and potentially earlier diagnosis.
- Risk stratification: Deep learning can help to stratify patients into different risk categories based on the severity of detected abnormalities, allowing healthcare providers to prioritize follow-up care for high-risk individuals.
Deep learning uses multiple nonlinear layers to extract features. Convolutional neural networks (CNNs), a DL subcategory, are commonly used for visual image analysis with minimal preprocessing. CNNs were introduced by LeCun et al. in 1998 for document recognition [25]. In recent years, medical professionals have shown increasing interest in using machine learning for diagnostics [25]. DL's potential is promising, but it demands large datasets and traditionally operates in isolation.
By reusing pre-trained networks such as AlexNet for new tasks, transfer learning breaks this isolation paradigm, enabling knowledge transfer between tasks. Transfer learning adapts existing knowledge to new domains, avoiding the need to start from scratch when learning something new. It involves using components from one model to create a model for a different purpose, often incorporating additional training data and neural layers [26,27]. Transfer learning plays a crucial role in enhancing the accuracy of the classification system in several ways:
- Feature extraction: Transfer learning leverages pre-trained deep learning models (e.g., DenseNet201, VGG16, Xception) trained on large and diverse datasets, such as ImageNet, for general image recognition tasks. These models have learned to extract valuable hierarchical features from images, which are useful for various classification tasks. Instead of training a deep model from scratch, transfer learning allows these pre-trained models to be used as feature extractors.
- Reduced data requirements: Training deep neural networks from scratch often requires a vast amount of labeled data, which may not always be available, especially in medical imaging tasks. Transfer learning mitigates this challenge by using pre-trained models that have already learned generic features and achieves high accuracy even with a relatively small dataset, such as the one used in this study.
- Fine-tuning: Transfer learning allows for fine-tuning pre-trained models on a specific task. In the proposed framework, models like DenseNet201 were fine-tuned using oral cancer histopathology slide images. Fine-tuning involves updating the model's weights and parameters to adapt to the specific characteristics of the new dataset (a minimal illustration of this workflow is sketched after this list).
- Speed and efficiency: Transfer learning reduces the training time and computational resources compared to training a deep model from scratch.
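To make the feature-extraction and fine-tuning workflow above concrete, the following is a minimal sketch assuming a TensorFlow/Keras setup and a binary normal-versus-OSCC task; the image size, learning rates, and classification head are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal transfer-learning sketch (TensorFlow/Keras). Image size, head
# architecture, and learning rates are illustrative assumptions only.
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)

# 1) Load DenseNet201 pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False  # feature-extraction stage: freeze convolutional weights

# 2) Attach a small task-specific head for binary normal-vs-OSCC classification.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])

# 3) Optional fine-tuning stage: unfreeze the top of the base network and
#    continue training with a much smaller learning rate.
def enable_fine_tuning(model, base, n_unfrozen=50, lr=1e-5):
    base.trainable = True
    for layer in base.layers[:-n_unfrozen]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
```

In the feature-extraction stage only the new head is trained; fine-tuning then adapts the upper convolutional layers to the histopathology domain.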
The efficacy of deep learning (DL) models is profoundly contingent upon the volume of accessible data and the strategic selection of hyperparameters. The hyperparameter configuration substantially impacts a convolutional neural network's performance, and suboptimal selections can detrimentally affect applications [28]. Instead of adopting a random approach to determining hyperparameter values, an optimization procedure is implemented to meticulously fine-tune these parameters [28]. Such optimization ensures that hyperparameters are modulated proficiently, enhancing the application's performance.
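As a simplified illustration of what such a procedure optimizes, the sketch below encodes a few hyperparameters as a continuous search vector and defines a fitness function based on validation accuracy; the particular hyperparameters, their ranges, and the `build_and_train` helper are hypothetical placeholders, not the search space used in this work.

```python
# Hedged sketch: hyperparameters as a search vector with a validation-accuracy
# fitness function. Ranges, choices, and build_and_train() are illustrative
# assumptions, not the paper's exact search space.
import numpy as np

LEARNING_RATES = (1e-2, 1e-3, 1e-4, 1e-5)
BATCH_SIZES = (16, 32, 64)
DROPOUT_RANGE = (0.1, 0.6)

def decode(x):
    """Map a continuous vector x in [0, 1]^3 to concrete hyperparameters."""
    lr = LEARNING_RATES[int(x[0] * (len(LEARNING_RATES) - 1e-9))]
    bs = BATCH_SIZES[int(x[1] * (len(BATCH_SIZES) - 1e-9))]
    dropout = DROPOUT_RANGE[0] + x[2] * (DROPOUT_RANGE[1] - DROPOUT_RANGE[0])
    return {"learning_rate": lr, "batch_size": bs, "dropout": dropout}

def fitness(x, build_and_train):
    """Higher is better: validation accuracy of a briefly trained model.

    build_and_train(hparams) is a user-supplied (hypothetical) helper that
    builds the pre-trained CNN with the given hyperparameters, trains it for
    a few epochs, and returns the validation accuracy.
    """
    hparams = decode(np.clip(x, 0.0, 1.0))
    return build_and_train(hparams)
```

A metaheuristic optimizer then searches over such vectors, treating each fitness evaluation as one short training run.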
As discussed earlier, optimization methods are vital in various fields, including engineering, mathematics, medicine, and the military. They are crucial in enhancing efficiency and effectiveness by finding optimal solutions to recurrent problems across different domains [29]. These methods are particularly valuable when applied to real-world scenarios, where finding the best solution can have significant practical implications. In medical applications, preprocessing and optimization techniques have gained increasing attention from healthcare professionals, particularly for the automated classification of diseases such as the early detection of oral cancer. However, diagnosing these conditions accurately and efficiently can be challenging. In the preprocessing and optimization stages of building a deep learning system for tasks such as oral cancer classification, several challenges are commonly encountered:
- Data preprocessing challenges:
  - Data quality: Histopathology slide images may have varying qualities due to image resolution, staining variations, and artifacts.
  - Data augmentation: Augmenting the dataset by creating variations of the original images is essential for training deep learning models effectively. However, determining which augmentation techniques to apply and their parameters can be challenging.
- Hyperparameter optimization challenges:
  - High-dimensional hyperparameter space: Deep learning models have numerous hyperparameters, including learning rates, batch sizes, dropout rates, activation functions, and more. The hyperparameter space is high-dimensional, making manual tuning impractical.
  - Computational resources: Conducting an exhaustive search of the hyperparameter space can be computationally expensive and time-consuming, especially when dealing with multiple models and configurations.
  - Overfitting: Optimizing hyperparameters can lead to overfitting, where the model performs exceptionally well on the training data but fails to generalize to new, unseen data.
- Model selection challenges:
  - Model complexity: Choosing the appropriate deep learning architecture for the task is crucial. Models vary in complexity, and selecting one that balances performance and computational cost is challenging.
  - Transfer learning: Another challenge is deciding whether to use transfer learning and selecting the most suitable pre-trained model. Not all pre-trained models are equally effective for every task.
- Optimizer selection challenges:
  - Optimizer diversity: There is a wide variety of optimization algorithms available for deep learning, including gradient-based methods, evolutionary algorithms, and metaheuristic optimizers.
  - Optimizer hyperparameters: Optimizers have hyperparameters that need tuning, such as learning rates and momentum. Determining the optimal values for these hyperparameters is challenging.
- Evaluation metrics: Choosing appropriate evaluation metrics to assess the performance of the models is essential. In medical applications such as oral cancer classification, metrics like accuracy, sensitivity, specificity, and area under the ROC curve (AUC) are commonly used, but selecting the most relevant ones is challenging.
This is where optimization techniques come into play. By automating the classification of lesions based on medical imaging data, optimization algorithms can assist healthcare providers in making timely and accurate diagnoses. The optimization process [29] is iterative and involves an extensive search for the best solution among various trial alternatives. Optimization techniques can be broadly categorized into deterministic and stochastic algorithms [29,30]. Deterministic methods find globally optimal solutions quickly but may suffer from performance degradation as the problem size increases. These methods are complex and specialized [31] and struggle with NP-hard multidimensional problems.
On the other hand, stochastic optimizers use randomness to explore the solution space broadly, although they do not guarantee optimal results [31]. Heuristic approaches, such as evolutionary algorithms, memetic algorithms, and greedy strategies, fall under this category, providing efficient, near-optimal solutions at a lower cost [31]. However, many of these heuristics are problem-specific.
Metaheuristic algorithms, a class of stochastic algorithms inspired by biological systems, excel in solving nonlinear, multidimensional optimization problems [32]. They offer accurate and robust solutions, and their problem-independent nature makes them adaptable to various design challenges. Metaheuristics tackle complex, intractable problems at a higher level of abstraction [31] without depending on preconditions like differentiability or continuity.
Metaheuristics have several advantages: they do not require gradient information, can be adjusted dynamically, and are flexible due to their black-box design. These procedures start with trial-and-error approaches, evaluate potential solutions based on algorithm-specific equations, and continue until a predetermined stopping criterion is met [31]. As a result, different optimization techniques can yield solutions with varying levels of improvement.
Metaheuristic optimization involves a two-stage approach to finding optimal solutions: diversification (exploration) and intensification (exploitation). Diversification aims to maintain a global search by reducing the risk of getting stuck in local minima through randomizing the search. Intensification evaluates promising solutions near the population memory, akin to a targeted local search. Balancing these stages is crucial for effective metaheuristic optimization.
The no free lunch (NFL) theorem states that all non-resampling algorithms are roughly as effective as one another in solving practically any optimization problem. The theorem holds that any given black-box search or optimization algorithm will produce the same results when averaged across the various target functions within a constrained search space [32]. In practice, however, no single algorithm can effectively tackle every real-world problem. Ultimately, the NFL theorem has the potential to derail the efforts of any researcher who seeks to create a super-algorithm that solves all problems faster than a random algorithm.
Nature-inspired metaheuristic algorithms simulate biological or physical phenomena to solve optimization problems. The algorithms can be broken down into five classes: physics-based, nature-based, human-based, swarm-based, and animal-based. Researchers have shown that most metaheuristic algorithms are inspired by the strategies employed by predators and prey in the wild.
Three of the most widely used metaheuristic algorithm types are based on evolution, physics, and swarms [30]. A swarm algorithm is a model that may be used to simulate the social behavior of a population. Various swarm-based optimization algorithms have been developed since the early 1990s, including particle swarm optimization (PSO) and ant colony optimization (ACO). Swarm intelligence algorithms include, but are not limited to, artificial bee colony algorithms, firefly optimization algorithms, grey wolf optimization algorithms, sparrow optimization methods, and whale optimization algorithms.
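For readers unfamiliar with swarm intelligence, the following is a minimal, self-contained sketch of particle swarm optimization (PSO), one of the algorithms mentioned above; the inertia and acceleration coefficients are common textbook defaults, not values taken from this study.

```python
# Minimal particle swarm optimization (PSO) sketch for continuous minimization.
# Inertia (w) and acceleration coefficients (c1, c2) are common textbook defaults.
import numpy as np

def pso(fitness, dim, bounds, n_particles=30, max_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_particles, dim))    # particle positions
    v = np.zeros_like(x)                                    # particle velocities
    pbest = x.copy()                                        # personal best positions
    pbest_f = np.array([fitness(p) for p in x])             # personal best fitness
    gbest = pbest[pbest_f.argmin()].copy()                  # global best position

    for _ in range(max_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive (personal) pull + social (global) pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, low, high)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, float(pbest_f.min())

# Example usage: minimize the 5-dimensional sphere function.
# best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), dim=5, bounds=(-5.0, 5.0))
```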
The artificial Gorilla Troops Optimizer (GTO) is a recently released metaheuristic optimization method inspired by gorillas' natural behavior. Abdollahzadeh et al. [33,34] developed the GTO in 2021. The technique simulates gorillas' social behavior and movements in the wild. Gorillas live in family units known as "troops", which typically include a dominant male known as a "silverback", as well as many females and their young [32]. The GTO stands out due to its unique inspiration from the social behavior of gorilla troops. It introduces a novel optimization approach by simulating the dynamics of gorilla family units led by dominant silverbacks and considering the interactions between different members. The algorithm leverages the division of roles within gorilla troops, mimicking these groups' cooperation and decision-making processes. The GTO offers a fresh perspective on optimization, potentially enhancing its performance in solving complex problems. Its adaptability and emulation of nature's strategies make it a valuable addition to the optimization toolkit.
The Aquila Optimizer (AO) is a nature-inspired optimization algorithm modeled on the aquila's hunting behavior. The aquila is one of the most well-known raptors. Aquilas can capture a wide range of ground-dwelling prey due to their swiftness, agility, sturdy feet, and large sharpened talons. The aquila employs four main hunting methods, each with advantages and disadvantages, and most aquilas can switch between them quickly and intelligently depending on the circumstances [35,36,37].
The AO algorithm emulates aquilas' actions at each hunting stage to show how the bird operates under pressure. The AO algorithm's four main steps are high soaring with a vertical stoop to select the search area, contour flight with short glide attacks to locate prey within divergent search areas, low soaring with a slow descent to exploit within convergent search areas, and walk-and-grab attacks to swoop in and grab targets within convergent search areas. Like other metaheuristic techniques, the AO has two phases for updating the current individuals: exploration and exploitation. The AO algorithm transitions from the exploration phase to the exploitation phase based on the following condition: if t ≤ (2/3)T, where t is the current iteration and T is the maximum number of iterations, the exploration steps are enabled; otherwise, the exploitation steps take place [35,38]. The exploration phase therefore occurs when t ≤ (2/3)T, and it contains two methods: the first is expanded exploration, while the second is narrowed exploration.
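The skeleton below is a deliberately simplified sketch rather than the full AO formulation: it shows only this phase-switching logic, and the candidate updates are placeholder random moves around the current best solution instead of the expanded/narrowed exploration and exploitation equations described above.

```python
# Simplified AO-style skeleton: exploration for roughly the first two-thirds of
# the iterations, exploitation afterwards. The update rules are placeholder
# random moves, not the full Aquila Optimizer equations.
import numpy as np

def aquila_like_search(fitness, dim, bounds, pop_size=20, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fit = np.array([fitness(x) for x in pop])
    best_idx = int(fit.argmin())
    best, best_fit = pop[best_idx].copy(), float(fit[best_idx])

    for t in range(1, max_iter + 1):
        for i in range(pop_size):
            if t <= (2 / 3) * max_iter:
                # Exploration: wide, randomized moves around the best solution.
                step = rng.normal(0.0, 1.0, dim) * 0.1 * (high - low)
            else:
                # Exploitation: small contracting steps near the best solution.
                step = rng.uniform(-0.01, 0.01, dim) * (high - low)
            candidate = np.clip(best + step, low, high)
            f = float(fitness(candidate))
            if f < fit[i]:                      # greedy replacement (minimization)
                pop[i], fit[i] = candidate, f
                if f < best_fit:
                    best, best_fit = candidate.copy(), f
    return best, best_fit
```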
The AO draws inspiration from the hunting tactics of the aquila, a raptor known for its agility and efficiency in capturing prey. The AO’s ability to replicate these hunting strategies provides a versatile optimization approach. By dynamically transitioning between these methods based on specific conditions, AO introduces an element of adaptability not commonly seen in other metaheuristic algorithms. This adaptability enables AO to optimize solutions effectively across various problems and complexities. The Aquila Optimizer (AO) contributes to the system’s accuracy by efficiently tuning and optimizing the various hyperparameters of the deep learning models used in the oral cancer classification system. The AO can enhance the accuracy as follows:
- Hyperparameter optimization: DL models have numerous hyperparameters that significantly impact performance, including learning rates, batch sizes, dropout rates, activation functions, and more. Manually tuning these hyperparameters can be time consuming and may not yield the best results. The AO automates this process by intelligently searching the hyperparameter space to find the optimal configuration for each model. This fine-tuning leads to improved accuracy.
- Loss function selection: The AO recommends specific loss functions for different models. The choice of loss function is crucial in training deep learning models, and different loss functions suit different tasks and datasets (a minimal illustration of exposing this choice to the search follows this list).
- Model selection: The AO might also assist in selecting the most appropriate pre-trained convolutional neural network (CNN) model for the task. Different CNN architectures have varying levels of complexity and are better suited for specific types of data.
- Robustness to data variability: Like many medical datasets, oral cancer datasets can be highly variable due to differences in patient populations and image quality. The AO helps make the models robust to this variability by finding hyperparameter configurations that work well across different subsets of the data.
- Optimal data augmentation: The AO can also guide the decision to use data augmentation techniques. Data augmentation involves creating variations of the original data to improve the model's ability to generalize to unseen examples. The AO can determine whether data augmentation would benefit each model, further enhancing accuracy.
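As a small, hedged sketch of how loss function selection can be exposed to such an optimizer, the snippet below (assuming a Keras workflow; the candidate list is illustrative) maps one continuous search dimension to a categorical loss choice, including the KL divergence and Poisson losses discussed later in this study.

```python
# Hedged sketch: treating the loss function as a categorical search dimension.
# The candidate list is illustrative; the loss classes are standard Keras losses.
import tensorflow as tf

LOSS_CHOICES = [
    tf.keras.losses.BinaryCrossentropy(),
    tf.keras.losses.KLDivergence(),
    tf.keras.losses.Poisson(),
]

def pick_loss(gene):
    """Map a continuous gene in [0, 1] to one of the candidate loss objects."""
    idx = min(int(gene * len(LOSS_CHOICES)), len(LOSS_CHOICES) - 1)
    return LOSS_CHOICES[idx]

# Example: a gene value of 0.5 selects the KL divergence loss.
# model.compile(optimizer="adam", loss=pick_loss(0.5), metrics=["accuracy"])
```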
These optimization techniques, rooted in the principles of artificial intelligence, stochastic algorithms, and nature-inspired metaheuristics, lay the groundwork for the application of automated classification of oral cancer. By harnessing the power of deep learning and optimization, we aim to enhance the accuracy and efficiency of diagnosing oral cancer, ultimately improving patient outcomes and advancing medical diagnostics.
3. Related Studies
The advent of machine learning brought tremendous changes to medical imaging analysis by providing robust approaches to tackle medical image classification problems and computer-aided diagnosis systems that reduce observer-specific variability [16]. Furthermore, CNNs demonstrate the possibility of automating the classification of various cancerous tumors. Herein, two major techniques can be distinguished: the first is based on manual feature extraction according to pathologists' knowledge to ascertain grading, while the second is based on deep learning without manual feature engineering. As a result, image classification has benefited greatly from deep learning through architectures that meet the classification challenges and improve the predicted outcomes.
OSCC can be detected early, which helps reduce cancer-related mortality and morbidity [39]. Unfortunately, oral cancer is identified at an advanced stage in the majority of instances. Histopathological examination is the standard procedure for diagnosing OSCC; however, tumor heterogeneity constitutes a major challenge [11]. The increasing application of digitalization in histopathology motivates extensive research on developing accurate deep-learning-based decision support systems that can help in OSCC prognosis and management. Obtaining reliable diagnostic and prognostic information for OSCC could greatly assist pathologists in making informed judgments that ensure effective healthcare screening support, early detection, and treatment. This section reviews up-to-date state-of-the-art studies that have applied deep learning to OSCC.
Aubreville et al. [40] proposed a deep artificial neural network (DNN) approach for the automatic binary classification of OSCC. The approach was based on 7894 oral cavity confocal laser endomicroscopy (CLE) images from OSCC patients. The proposed approach first preprocessed the images by grouping them into patches and scaling them to reduce processing complexity and noise. Data augmentation (DA) was then performed to enrich the data. Finally, classification approaches were deployed using SVM and RF. The approach achieved image recognition with an accuracy of 88.3%. However, this approach needs further enhancement, especially regarding accuracy and its adaptation to more complex diagnostic tasks. Ariji et al. [41] investigated a deep learning CT image classifier to diagnose oral cancer lymph node metastasis. They used CT images of 441 histological nodes from 45 patients with OSCC. Their approach involved segmentation by two experienced radiologists, augmentation, training with a five-fold cross-validation procedure, validation, and testing using the AlexNet architecture. Using the DCNN classification algorithm, they achieved an accuracy of 78.2%.
Jeyaraj and Nadar [14] presented a partitioned DCNN (PDCNN) model for detecting cancerous lesions in hyperspectral images. The PDCNN model was developed to classify regions of interest (RoIs) in multidimensional hyperspectral images. The model involves classification, segmentation, labeling, feature extraction, and deep learning algorithms. They used a dataset from three repositories consisting of 2200 images. The proposed partitioned DCNN outperformed the conventional SVM classification technique and achieved a classification accuracy of 94.5% using a bagging and boosting method that selects the final feature based on weighted votes. Motivated by building a lightweight oral lesion classifier, Jubair et al. [39] introduced a transfer-learning-based DCNN. They used the EfficientNet-B0 transfer model for binary classification. For training and testing, 716 clinical images of tongue lesions were used. The experimental analysis reported an accuracy of 90%. The limitation of this study was its reliance on a small dataset that contained tongue lesions only. An OSCC classification using the MobileNet CNN for FCM scanner images was demonstrated in [42]. Tissue samples from twenty patients were collected and identified based on location and histological grading, followed by an ex vivo FCM investigation. After that, tissue annotation and feature extraction were performed. The model achieved a specificity of 96%. The main drawback of this work was its small sample size.
An automated deep learning technique for the binary classification of oral lesions was proposed using combined ResNet50 and VGG16 models in [43]. The model was trained with 332 digital images of oral lesions. These images were processed by discrete wavelet transform and adaptive histogram equalization. The ensemble model, capable of effectively extracting all useful features, achieved an accuracy of 96.2%. Jelena et al. [11] introduced a CNN-based dual-stage OSCC diagnostic system that performed OSCC multiclass classification and segmentation. In the first stage, automated three-class tumor grading was performed using Xception and SWT. In the second stage, microenvironment cell segmentation was used to discover new features. The workflow started with image acquisition, preprocessing, and augmentation, followed by decomposition, semantic segmentation, and classification. Finally, the best-performing configuration was deployed. The ensemble resulted in a classification accuracy of 94.1%. In [8], the authors aimed to construct a simple and reliable ANN model for classifying oral cancer based on risk factors, systemic medical problems, and clinicopathological aspects. A dataset consisting of 73 patients with 29 variables/cases was used. The analysis demonstrated a classification accuracy of 78.95%. The proposed model's biggest flaw was the very small database used.
Panigrahi et al. [16] studied the application of a deep learning architecture called a capsule network (CN) for OSCC diagnosis using histopathological images. For classification, the CN was based on a dynamic routing algorithm. Five distinct processes make up the proposed method: preprocessing, segmentation, image augmentation, data partitioning, and binary classification. This method achieved 97.35% accuracy using a WSI dataset of 150 images. In [1], the authors used a CNN to classify and segment OSCC from H&E-stained histological WSIs. The first stage was preprocessing to remove background and scanning artifacts and to segment RoIs for feature extraction and quantification. A new dataset involving two types of WSI containing 85,621 image patches for OSCC tissue samples and breast cancer metastases was introduced. The method achieved an accuracy of 97.6%, although the preprocessing stage needed further optimization. In [44], Maurya et al. introduced a TL-based classification approach for multiclass OSCC based on microscopic imaging. The framework extracted features from three ensembled DCNN models that applied various optimization methods. The framework was trained and tested on five large public datasets. A classification accuracy of 99.28% was achieved for the ensemble-based approach.
A segmentation and classification approach for detecting oral dysplasia lesions (ODLs) was introduced in [44]. The approach involved four stages: segmentation using an R-CNN, post-processing using morphological operations, feature extraction, and classification using a polynomial classifier. In this method, 66 images of the tongues of mice were histologically divided into 296 sections. The segmentation and classification accuracy ranged from 88.92% to 90.35%. In addition, preprocessing techniques could reduce the impact of pigmentation excesses or deficiencies. A multiclass OSCC grading approach was proposed in [45], using multiple DCNN architectures. A five-stage architecture was proposed based on TL with four pre-trained models and a CNN model. The workflow started with acquisition, labeling, augmentation, segmentation, and classification. The proposed CNN model obtained an accuracy of 97.5% with a dataset of 156 histopathological WSIs, although a larger-scale training dataset was needed. Figueroa et al. [22] developed a deep learning training approach aimed at understandability. This study focused on utilizing gradient-weighted class activation mapping and a dual training process with augmentation for optimized classification and segmentation. First, they collected the dataset and performed data cleaning, labeling, and annotation. Two-stage training was then performed: in the first stage, TL with VGG19 and data augmentation were used, and in the second stage, the GAIN training architecture was deployed. Although a classification accuracy of 86.38% was achieved, further enhancement is required.
The application of DS approaches has already shown the potential to revolutionize medical care in the fields of imaging, surgery, and laboratory medicine [46]. Numerous deep learning architectures have been proposed for automatic OSCC detection; however, they have several issues and challenges. For example, most of the proposed approaches lack highly complete datasets, have high system operation costs, suffer from limited accuracy, and lack proper optimization. Accordingly, further studies are needed to develop enhanced and optimized architectures that can be integrated into clinical practice. Future research in OSCC detection techniques could focus on the following:
- Large-scale datasets: Collecting and annotating large-scale datasets with diverse OSCC cases to train more robust deep learning models.
- Reducing operational costs: Exploring cost-effective data acquisition methods and model deployment in clinical settings.
- Improving accuracy: Investigating advanced network architectures, ensemble methods, and hybrid approaches to enhance classification accuracy.
- Clinical integration: Achieving seamless integration of deep learning models into clinical workflows, ensuring practical utility.
- Addressing dataset bias: Addressing potential biases in training data that may affect model generalization.
Further research and improvements in OSCC detection techniques promise to revolutionize oral cancer diagnosis and management, ultimately improving patient outcomes. This comprehensive review of related studies informs the approach taken in this manuscript, where the metaheuristic optimization algorithms GTO and AO were proposed to enhance the accuracy and efficiency of OSCC detection from histopathological images. The subsequent sections will detail the methodology and experimental setup based on the insights gained from the reviewed studies.
6. Conclusions and Future Work
In this study, we have presented a novel and highly effective methodology for the classification of oral cancer utilizing pre-trained convolutional neural networks (CNNs) in conjunction with two distinct metaheuristic optimization algorithms, namely, the Gorilla Troops Optimizer (GTO) and Aquila Optimizer (AO). Our approach focuses on optimizing the preprocessing steps, selecting appropriate optimizers, and fine-tuning the hyperparameters of pre-trained CNNs. We conducted experiments on the Histopathologic Oral Cancer dataset obtained from Kaggle, which comprises two classes: “normal”, with 2494 images, and “OSCC”, with 2698 images. Our preprocessing pipeline involved resizing, dimension scaling, and balancing the datasets, followed by data augmentation to enhance model generalization.
The AO and GTO metaheuristic optimizers were instrumental in optimizing various transfer learning (TL) parameters, ensuring that each pre-trained CNN model reached its optimal hyperparameter configuration. Compared with random or grid searches, this approach has demonstrated its reliability in producing superior results. We employed several key metrics to assess model performance, including accuracy, area under the curve (AUC), and specificity. Our study incorporated nine pre-trained CNN models, namely NASNetMobile, Xception, VGG16, VGG19, DenseNet201, MobileNetV2, MobileNetV3Small, MobileNet, and MobileNetV3Large, all initialized with "ImageNet" pre-trained weights. The preliminary results showcase the effectiveness of our proposed framework, achieving an average accuracy of 99.25% when the Aquila Optimizer uses the recommended KL divergence loss function for seven models; the GTO recommends the Poisson loss function for four models. Notably, the AO outperforms the GTO in terms of F1 score across all models except Xception, with DenseNet201 emerging as the top-performing model.
In future research, we plan to rigorously assess and validate the comparison between the two optimizers using statistical tests such as Friedman’s and Wilcoxon rank-sum tests. Additionally, we envision extending the application of our framework to other domains, such as COVID-19 and breast cancer detection. Exploring diverse deep learning architectures and investigating the integration of swarm intelligence in oral cancer treatment are promising directions for our ongoing work.