1. Introduction
Cancer can affect any organ of the human body; the areas most commonly affected are the brain, colon, skin, breasts, stomach, rectum, liver, prostate, and lungs. The most common tumors causing death in females and males are lung and colon cancer (LCC) [1]. When lung cells mutate uncontrollably, malignant cells appear, forming clusters called cancers. Globally, lung and colorectal (rectum and colon) tumors are the most common kinds of tumors after breast cancer (BC) [2]. Moreover, colorectal and lung tumors account for death rates of 9.4% and 18%, correspondingly, among all tumors. Hence, to explore treatment options in the early phase of the disease, the precise detection of these tumor subtypes is essential. Noninvasive approaches for detection involve computed tomography (CT) imaging and radiography for lung cancer, and flexible sigmoidoscopy and CT colonoscopy for colon tumors [3], but the dependable classification of these tumors is not always possible using noninvasive means, and invasive processes such as histopathology are essential for accurate disease detection and an enhanced quality of treatment. As well, the manual grading of histopathologic images can be tedious for pathologists [4]. Likewise, the precise grading of colon and lung tumor subtypes necessitates a trained pathologist, and manual grading can be prone to error. Consequently, automatic image processing techniques are now being applied for lung tumors [5].
Artificial intelligence (AI) methods are utilized in the medical domain for tasks such as the initial detection of disease in biomedical images, health crisis management, disease forecasting, etc. [6]. Deep learning (DL) methods can examine data in anatomical representations, high-dimensional images, and videos [7]. Likewise, DL methods derive hidden characteristics and features from healthcare images that are invisible to the naked eye for initial cancer recognition and discrimination between its phases. DL refers to a subdivision of ML that eradicates the necessity for manual feature engineering, and CNN-related DL methods present hierarchical feature mapping for superior representations of input images [8]. However, enormous amounts of data are needed for training large DL methods; transfer learning (TL) aids in adapting large pretrained models for downstream tasks. Therefore, TL decreases the necessity for the enormous training datasets that are rare in particular domains such as medicine [9]. TL and DL play a crucial role in healthcare by framing automatic diagnostic mechanisms that utilize healthcare images, including magnetic resonance images, histopathological images, radiographs, retina images, etc. Such automatic mechanisms are mainly utilized for classification tasks and assist doctors through automated quality checking and rapid data acquisition [10].
This manuscript offers the design of the Al-Biruni Earth Radius Optimization with Transfer Learning-based Histopathological Image Analysis for Lung and Colon Cancer Detection (BERTL-HIALCCD) technique. In the BERTL-HIALCCD technique, an improved ShuffleNet model is applied for the feature extraction process, and its hyperparameters are chosen by the BER algorithm. For the effectual detection of LCC, a deep convolutional recurrent neural network (DCRNN) model is applied. At the final stage, the coati optimization algorithm (COA) is exploited for the parameter selection of the DCRNN approach. The design of BER and COA for the hyperparameter tuning of the improved ShuffleNet and DCRNN models demonstrates the novelty of the work. To examine the result of the BERTL-HIALCCD technique, a comprehensive group of experiments was conducted on a large dataset of histopathological images.
2. Related Works
In [11], the authors utilized an AI-supported method and optimization approaches to classify histopathologic images of colon and lung tumors. In the presented method, the image classes were trained with the DarkNet-19 technique, a DL method. The selection of the ineffective attributes was attained with the Equilibrium and Manta Ray Foraging optimizer methods. The potential attributes obtained by the two optimization methods from the feature set mined by the DarkNet-19 method were combined and classified with the SVM approach. In [12], a hybrid classification method that included HOG and DAISY feature extraction modules and an Inception-v3 network was built to categorize lung tumors and normal tissues from lung pathological imagery. In [13], the authors presented a brief analysis of two feature extraction approaches for colon and lung tumor classification. In one presented method, six handcrafted feature extraction methods based on shape, color, structure, and texture were used. The RF, Gradient Boosting (GB), MLP, and SVM-RBF methods with the handcrafted attributes were trained and tested for lung and colon tumor categorization. In [14], the main intention was to utilize digital histopathology images and a multi-input capsule network for framing an enhanced computerized diagnosis mechanism to find adenocarcinomas and squamous cell carcinomas of the lungs, in addition to adenocarcinomas of the colon. In the presented multi-input capsule network, two convolutional layer blocks (CLBs) were utilized; the CLBs considered unprocessed histopathologic images as the input.
The authors of [15] presented a hybrid ensemble feature extraction method to proficiently find LCC. It integrated ensemble learning and deep feature extraction with high-performance filters for cancer image data. Hamida et al. [16] concentrated on the usage of a DL structure for highlighting and classifying colon tumor regions in a sparsely annotated histopathologic data context. Firstly, the authors reviewed and compared existing CNNs, including the DenseNet, AlexNet, VGG, ResNet, and Inception methods. The authors resorted to TL methods to cope with the lack of a rich WSI dataset.
Mangal et al. [17] aimed to devise a computer-aided diagnosis system to find squamous cell carcinomas and adenocarcinomas of the lungs, in addition to adenocarcinomas of the colon, utilizing a CNN to assess the digital pathology images of these cancers. Ding et al. [18] designed FENet for genetic mutation estimation utilizing histopathologic images of colon cancer. Unlike traditional methods that analyze patch-related features alone without considering their spatial connectivity, FENet incorporated feature enhancements in a convolutional graph NN to combine discriminative attributes and capture the gene mutation status.
In [19], a clinically comparable CNN structure-based approach was presented to carry out the automated classification of cancer grades and tissue structures in hematoxylin and eosin-stained colon HSI. It contained Enhanced Convolutional Learning Modules (ECLMs), a multi-level Attention Learning Module (ALM), and Transitional Modules (TMs). In [20], the efficiency of an extensive variety of DL-based structures was measured for the automatic tumor segmentation of colorectal tissue instances. The presented method demonstrated the efficacy of integrating CNNs and TL in the encoder part of the segmentation structure for histopathology image diagnosis. In [21], the authors provided a systematic examination of XAI with an initial concentration on the methods presently utilized in the healthcare domain. In [22], the authors established a DL technique to predict disease-specific survival for stage II and III colorectal cancer utilizing 3652 cases. In [23], a novel GCN-based approach for the early detection of COPD was presented, which utilized a small amount of weakly labeled chest CT image data from the openly accessible Danish Lung Cancer Screening Trial database. Jain et al. [24] demonstrated lung cancer recognition based on histopathological image diagnosis utilizing DL structures. The image features were extracted utilizing kernel PCA combined with a CNN (KPCA-CNN), with KPCA utilized in the feature extraction layer of the CNN.
Although several LCC classification models are available in the literature, there is still a need to enhance the detection performance. Most of the existing works did not focus on the hyperparameter optimization process. Generally, hyperparameter optimization helps in identifying an optimal combination of hyperparameters for a given model architecture and dataset. It proficiently searches the hyperparameter space for the optimal configuration, which saves time and computational resources by automatically exploring different combinations rather than relying on manual trial-and-error approaches, and it helps in quickly finding a good set of hyperparameters without exhaustively evaluating every possible combination. Therefore, in this work, the BER and COA are used for the hyperparameter tuning process.
3. Materials and Methods
In this manuscript, we developed a novel LCC detection model named the BERTL-HIALCCD technique, which aims to detect LCC effectually in histopathological images. To achieve this, the BERTL-HIALCCD approach follows a series of subprocesses, namely improved ShuffleNet feature extraction, BER-based parameter optimization, DCRNN-based classification, and COA-based hyperparameter tuning.
Figure 1 signifies the overall flow of the BERTL-HIALCCD system.
3.2. Feature Extraction Using Improved ShuffleNet
In this work, the improved ShuffleNet method extracts features from histopathological images. Depthwise convolution (DW-Conv) is a special case of grouped convolution in which the number of groups equals the number of channels, so that a typical convolution with 3 × 3 dimensions is evaluated per channel in the depthwise convolutional procedure [25]:

$$G_{i,j} = \sum_{u=1}^{3} \sum_{v=1}^{3} K_{u,v} \cdot F_{i+u-1,\, j+v-1} \quad (1)$$

In Equation (1), $G$ and $K$ represent the resultant feature matrix and the convolutional kernel weight matrix applied to the input feature matrix $F$, correspondingly, and $(i, j)$ represents the coordinates of the related matrix. The depthwise separable convolution replaces the typical convolution, requires fewer computations, and is typical for lightweight methods:

$$N_{DW} = D_F^2 \cdot D_K^2 \cdot M + D_F^2 \cdot M \cdot N \quad (2)$$
$$N_{Conv} = D_F^2 \cdot D_K^2 \cdot M \cdot N \quad (3)$$
$$\frac{N_{DW}}{N_{Conv}} = \frac{1}{N} + \frac{1}{D_K^2} \quad (4)$$

where $D_F$ and $D_K$ show the side sizes of the feature matrix and the convolutional kernels, $N_{DW}$ and $N_{Conv}$ denote the computation counts of the depthwise separable convolution and the typical convolution, correspondingly, and $N$ and $M$ signify the numbers of channels in the output and input feature maps. Thus, with 3 × 3 kernels, the computation amount of the depthwise separable convolution is roughly 1/9 of that of the typical convolution.
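As a concrete illustration, the following is a minimal sketch (assuming PyTorch) of a depthwise separable convolution block of the kind described above; the channel sizes and input shape are illustrative, not the exact improved-ShuffleNet configuration.

```python
# Depthwise separable convolution: a per-channel 3x3 depthwise convolution
# followed by a 1x1 pointwise convolution that mixes channels.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Depthwise: one 3x3 kernel per input channel (groups == in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 56, 56)          # batch, channels, height, width
y = DepthwiseSeparableConv(64, 128)(x)  # -> (1, 128, 56, 56)
```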
The channel shuffle method is a significant innovation of ShuffleNet, realizing the exchange of information across the channels of the feature extraction method at a small computational cost. In this work, effective channel attention (ECA) is introduced to suppress irrelevant attributes and accomplish the weighting of the features for the succeeding classifier models. Moreover, while retaining the lightness of the model, the label smoothing regularization (LSR) loss function is adopted, which takes the multi-dimensional loss computation into account and enhances the noise immunity of the model.
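As a brief illustration, an LSR-style loss can be realized in PyTorch (version 1.10 or later) as follows; the smoothing factor 0.1 is illustrative.

```python
# Label smoothing softens the one-hot targets, which improves noise immunity.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # LSR-style loss
logits = torch.randn(4, 5)                 # batch of 4 samples, five classes
targets = torch.tensor([0, 2, 4, 1])
loss = criterion(logits, targets)
print(loss.item())
```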
Visual attention modules are inspired by human visual characteristics to emphasize crucial data in images, which is advantageous for enhancing model performance. Visual attention mechanisms bring accuracy improvements to CNNs by weighting the output features, but mainly at the cost of increased complexity, as in the convolutional block attention module (CBAM) and SE. ECA is a lightweight attention model that borrows from SE the idea of constructing a channel attention module that is established in a CNN and participates in end-to-end model training. ECA exploits 1D convolutions for feature extraction, which prevents feature downscaling and efficiently captures cross-channel data interactions.
Assume the input feature matrix is $X \in \mathbb{R}^{W \times H \times C}$, where $W$, $C$, and $H$ characterize the width, channels, and height of the input features, correspondingly. First, the input matrix is processed via a global average pooling layer, which leads to the channel feature description matrix $y \in \mathbb{R}^{1 \times 1 \times C}$. Next, feature extraction is executed utilizing a 1D convolutional layer of kernel size $k$, and the output is processed via a nonlinear activation function:

$$\omega = \sigma\big(\mathrm{C1D}_k(y)\big) \quad (5)$$

In Equation (5), $\sigma$ represents the sigmoid activation function, and $\mathrm{C1D}_k$ designates the 1D convolutional process of kernel size $k$. Lastly, the input feature is multiplied with the attention weight along the channel dimension:

$$\tilde{X} = \omega \odot X \quad (6)$$

In Equation (6), $\odot$ represents elemental multiplication; $\omega$ is copied over the spatial dimensions to attain a feature matrix that is then point-multiplied with the input matrix. ECA belongs to the channel attention family, which enables the allocation of weights to the channels of the feature maps and makes the network focus on the most crucial channels. Thus, the ECA module is embedded after this layer.
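For clarity, a minimal sketch (assuming PyTorch) of such an ECA block is given below; the kernel size k = 3 is illustrative.

```python
# ECA: global average pooling -> 1D cross-channel convolution -> sigmoid
# weights -> channel-wise reweighting of the input feature map.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # global average pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k,
                              padding=k // 2, bias=False)  # C1D_k over channels
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)                     # (B, 1, C) descriptor
        w = self.sigmoid(self.conv(y)).view(b, c, 1, 1)    # per-channel weights
        return x * w                                       # broadcast over H, W

x = torch.randn(2, 64, 32, 32)
print(ECA()(x).shape)  # torch.Size([2, 64, 32, 32])
```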
3.3. BER-Based Hyperparameter Optimization
For the effectual identification of the hyperparameter values of the improved ShuffleNet method, the BER algorithm is used. The optimization technique aims to find the optimum solution of a problem subject to a set of constraints [26]. In this work, an individual of the population is characterized by the vector $\vec{S} = (S_1, S_2, \ldots, S_d)$, where $d$ represents the searching space size, and $S_i$ indicates the $i$-th feature or parameter of the optimization problem. The fitness function $f(\vec{S})$ is used for determining how well an individual performs up to the provided point. The optimization process exploits the subsequent phases to search through the population for the certain optimal vector $\vec{S}^{*}$ that maximizes the FF. This technique starts with a set of solutions. The following parameters are used for the optimization process: the population size, the FF, the dimension, and the lower and upper boundaries of the solution space.
In the proposed BER algorithm, the population is split into subgroups. The number of individuals in every group can be adapted to enhance the balance between the exploration and exploitation processes. Moreover, to assure the convergence of the optimization technique, the elitism approach is applied by keeping the leading solution when no better solution is found. If the fitness of the best solution does not increase appreciably for three iterations, the search is assumed to be trapped in a local optimum, and subsequently, additional exploration individuals are produced using the mutation process.
Exploration avoids stagnation in local optima by moving towards better solutions and is responsible for finding promising locations in the search space.
The heading-towards-the-best-solution strategy is used to search the prospective regions around an agent's existing location in the search space. This is achieved by repetitively searching among nearby promising alternatives for a better option with respect to the fitness value:

$$\vec{S}(t+1) = \vec{S}(t) + D \cdot \vec{h},$$

where $D$ denotes the diameter of the circle in which the searching agent looks for promising areas, $r$ denotes a random number within $[0, 1]$, and $\vec{h}$ shows the coefficient vector whose value is computed as $\vec{h} = 2r - 1$.
The exploitation team is responsible for improving the present solutions. The BER evaluates the fitness value of every individual at each cycle and identifies the optimal individual. The BER applies two dissimilar strategies to accomplish exploitation, as follows.
The following equation is used for moving a searching agent towards the better solution:

$$\vec{S}(t+1) = \vec{S}(t) + \vec{r} \cdot \vec{D}, \qquad \vec{D} = \vec{S}^{*} - \vec{S}(t),$$

where $\vec{S}(t)$ indicates the solution vector at iteration $t$, $\vec{r}$ denotes a random vector that controls the movement steps towards the better solution, $\vec{D}$ denotes the distance vector, and $\vec{S}^{*}$ shows the better solution vector.
The most potential area is the one surrounding the leader (the better solution). Consequently, some individuals hunt in this surrounding area for a better solution, with the possibility of finding the best solution. The BER exploits the subsequent formula to realize this operation:

$$\vec{S}(t+1) = r \cdot \big(\vec{S}^{*} + \vec{k}\big), \qquad \vec{k} = 1 + \frac{2\,t^{2}}{T^{2}},$$

where $\vec{S}^{*}$ represents the better solution, $r$ denotes a random value within $[0, 1]$, $t$ indicates the iteration value, and $T$ indicates the overall number of iterations.
Mutation is an alternative approach applied by the BER. It is a genetic operator used for creating and sustaining population diversity. It avoids premature convergence by helping the search escape local optima, using a modification in the searching space as a springboard to other interesting regions. The mutation is crucial for the remarkable exploration ability of the BER.
The BER selects the better-performing solutions for the following iteration to guarantee the quality of the obtained solution. Even though the elitism technique improves efficacy, it can cause early convergence on multi-modal functions. Note that the BER attains impressive exploration abilities by applying the mutation method and by letting the exploration group search near other individuals. Due to these strong exploration abilities, the BER can prevent early convergence.
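To make the search procedure concrete, the following is a simplified NumPy sketch of a BER-style loop with grouped exploration and exploitation, elitism, and mutation after three stagnant iterations; the update steps follow the equations above in simplified form and are illustrative rather than the authors' reference implementation.

```python
# Simplified BER-style search: half the population explores with a random
# coefficient vector h in [-1, 1]; the other half moves towards the best
# solution; stagnation for 3 iterations triggers a mutation.
import numpy as np

def ber_search(fitness, dim, lb, ub, pop=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.uniform(lb, ub, size=(pop, dim))      # initial population
    fit = np.array([fitness(s) for s in S])
    best = S[fit.argmin()].copy()
    best_fit = fit.min()
    stall = 0
    for _ in range(iters):
        for i in range(pop):
            r = rng.random()
            if i < pop // 2:                       # exploration group
                h = 2 * rng.random(dim) - 1        # coefficient vector in [-1, 1]
                cand = S[i] + r * h
            else:                                  # exploitation group
                cand = S[i] + rng.random(dim) * (best - S[i])
            cand = np.clip(cand, lb, ub)
            f = fitness(cand)
            if f < fit[i]:                         # greedy acceptance (elitism)
                S[i], fit[i] = cand, f
        if fit.min() < best_fit - 1e-12:
            best, best_fit, stall = S[fit.argmin()].copy(), fit.min(), 0
        else:
            stall += 1
        if stall >= 3:                             # mutation escapes local optima
            k = rng.integers(pop)
            S[k] = rng.uniform(lb, ub, dim)
            fit[k] = fitness(S[k])
            stall = 0
    return best, best_fit

# Example: tune a toy two-dimensional hyperparameter vector.
best, val = ber_search(lambda s: (s[0] - 0.01) ** 2 + (s[1] - 0.5) ** 2,
                       dim=2, lb=0.0, ub=1.0)
```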
3.4. Detection Using Optimal DCRNN Model
At this stage, the features are passed to the DCRNN model for classification. Recently, CNNs have been investigated and proven efficient in learning from high-dimensional, large-scale data [27]. Additionally, RNNs are robust in capturing long-term dependencies and learning temporal sequences. Here, the CNN and RNN are combined so that feature learning exploits both the local features encoded by the CNN and the long-term dependencies captured by the RNN; this CRNN design was originally demonstrated on MFCC-based representations of heart sounds. The learnable kernel size in all the layers is fixed to 3 × 3, and the well-known ReLU function is exploited in all the convolution layers. Max-pooling with 2 × 2 windows and a stride of 2 × 2 is employed after each convolutional layer.
The BN layer standardizes the mini-batches throughout the whole network, which reduces the internal covariate shift caused by the progressive transforms, and the dropout layer helps prevent overfitting and decreases the number of active neurons. Therefore, a feature map is attained for every input sample. After the max-pooling and convolutional layers, an LSTM layer is exploited for learning the temporal characteristics among the attained feature maps, and an FC layer with sixty-four neurons is used to learn the global features. Eventually, a softmax layer is executed to derive the probability distribution over the output classes.
Figure 2 depicts the infrastructure of the CRNN.
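For illustration, a minimal sketch (assuming PyTorch) of a CRNN of this kind is given below; the layer counts and widths are illustrative rather than the exact DCRNN configuration.

```python
# CRNN sketch: 3x3 conv + BN + ReLU + 2x2 max-pool blocks with dropout,
# an LSTM over the pooled feature sequence, a 64-neuron FC layer, and a
# softmax head (for training, logits with CrossEntropyLoss would be typical).
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.25),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, 64)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)                              # (B, 64, H', W')
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(b, h * w, c)  # spatial positions as a sequence
        out, _ = self.lstm(seq)
        z = torch.relu(self.fc(out[:, -1]))               # last step -> global features
        return torch.softmax(self.head(z), dim=1)

print(CRNN()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 5])
```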
To improve the detection rate, the COA is used for hyperparameter tuning. The updating of a candidate solution (a coati's position) in the COA depends on modeling two basic behaviors of coatis [28]:
- (i) the coatis' escape strategy from predators;
- (ii) the coatis' strategy while attacking iguanas.
Consequently, the COA population is updated in two stages.
The first stage of updating the coatis' population is modeled by simulating their strategy while attacking iguanas: a group of coatis climbs a tree to reach an iguana and scare it. This strategy moves the coatis toward dissimilar positions in the searching space, which illustrates the exploration capability of the COA in the global search space. Here, the position of the fittest member of the population is considered the position of the iguana. Thus, the climb of the coatis in the tree is modeled mathematically as

$$x_i^{new} = x_i + r \cdot \big(\mathit{Iguana} - I \cdot x_i\big), \quad i = 1, 2, \ldots, \left\lfloor \tfrac{N}{2} \right\rfloor \quad (15)$$

Once the iguana falls to the ground, it is placed at an arbitrary position in the searching space:

$$\mathit{Iguana}^{G}_j = lb_j + r \cdot (ub_j - lb_j) \quad (16)$$

According to this arbitrary position, the coatis on the ground move in the searching space as follows:

$$x_i^{new} = \begin{cases} x_i + r \cdot \big(\mathit{Iguana}^{G} - I \cdot x_i\big), & F\big(\mathit{Iguana}^{G}\big) < F(x_i),\\ x_i + r \cdot \big(x_i - \mathit{Iguana}^{G}\big), & \text{otherwise}, \end{cases} \quad i = \left\lfloor \tfrac{N}{2} \right\rfloor + 1, \ldots, N \quad (17)$$

The new position evaluated for a coati is accepted for the updating procedure only if it enhances the value of the objective function; otherwise, the coati remains in its prior position:

$$x_i = \begin{cases} x_i^{new}, & F\big(x_i^{new}\big) < F(x_i),\\ x_i, & \text{otherwise} \end{cases} \quad (18)$$

Here, $r$ represents a random real number within $[0, 1]$; $x_i^{new}$ denotes the new position evaluated for the $i$-th coati; $j$ indicates the dimension; $F(\cdot)$ shows the objective function value; $\mathit{Iguana}$ shows the iguana's position in the search space, which corresponds to the position of the fittest member; $\mathit{Iguana}^{G}$ shows the position of the iguana on the ground, produced at random; $\lfloor \cdot \rfloor$ shows the floor function; $I$ is an integer selected at random from $\{1, 2\}$; and $N$ denotes the population size.
The second stage updates the positions of the coatis in the searching space using a mathematical process that mimics the natural behavior of coatis when encountering predators and escaping from them. Once a predator attacks a coati, the animal escapes from its position; the coati's movement in this strategy results in a safer position close to its current one, which demonstrates the exploitation capability of the COA in a local search.
To simulate this behavior, a random position is produced near the current position of every coati as follows:

$$lb_j^{local} = \frac{lb_j}{t}, \qquad ub_j^{local} = \frac{ub_j}{t} \quad (19)$$
$$x_{i,j}^{new} = x_{i,j} + (1 - 2r) \cdot \big(lb_j^{local} + r \cdot (ub_j^{local} - lb_j^{local})\big) \quad (20)$$

A newly evaluated position is accepted only if it enhances the value of the objective function:

$$x_i = \begin{cases} x_i^{new}, & F\big(x_i^{new}\big) < F(x_i),\\ x_i, & \text{otherwise} \end{cases} \quad (21)$$

In Equation (21), $x_i^{new}$ denotes the newest position evaluated for the $i$-th coati, $j$ shows the dimension, $F(\cdot)$ indicates the objective function value, $r$ refers to a randomly generated value within $[0, 1]$, $t$ shows the iteration counter, $lb_j^{local}$ and $ub_j^{local}$ denote the local lower and upper boundaries of the search space, correspondingly, and $lb_j$ and $ub_j$ indicate the lower and upper boundaries of the search space, respectively.
A COA iteration is completed once the positions of all the coatis have been updated. The updating process based on Equations (15)–(21) is reiterated until the maximum number of iterations is attained.
The COA method uses a fitness function (FF) to obtain a superior efficiency of the classifier. It yields a positive value that indicates the quality of the candidate solutions, and the reduction of the classification error rate is considered in the FF.
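As a rough illustration of this procedure, the following Python sketch implements one COA iteration with the classification error rate as the fitness to be minimized; `error_rate` is a hypothetical placeholder for training and validating the DCRNN with a given hyperparameter vector, and the update rules follow Equations (15)–(21) as reconstructed above rather than the authors' exact implementation.

```python
# One COA iteration: phase 1 (attacking the iguana) explores globally;
# phase 2 (escaping predators) exploits locally with shrinking bounds.
import numpy as np

def error_rate(params) -> float:
    """Placeholder fitness: in practice, train/evaluate the DCRNN with
    `params` and return its error rate. A toy surrogate is used here."""
    return float(np.sum((np.asarray(params) - 0.3) ** 2))

def coa_step(X, fit, lb, ub, t, rng):
    n, d = X.shape
    iguana = X[fit.argmin()]                      # fittest member = iguana
    for i in range(n // 2):                       # phase 1a: coatis in the tree
        I = rng.integers(1, 3)                    # I drawn from {1, 2}
        cand = np.clip(X[i] + rng.random(d) * (iguana - I * X[i]), lb, ub)
        f = error_rate(cand)
        if f < fit[i]: X[i], fit[i] = cand, f     # greedy acceptance, Eq. (18)
    iguana_g = rng.uniform(lb, ub, d)             # phase 1b: iguana on the ground
    fg = error_rate(iguana_g)
    for i in range(n // 2, n):
        I = rng.integers(1, 3)
        step = (iguana_g - I * X[i]) if fg < fit[i] else (X[i] - iguana_g)
        cand = np.clip(X[i] + rng.random(d) * step, lb, ub)
        f = error_rate(cand)
        if f < fit[i]: X[i], fit[i] = cand, f
    lb_t, ub_t = lb / t, ub / t                   # phase 2: local bounds, Eq. (19)
    for i in range(n):
        r = rng.random(d)
        cand = X[i] + (1 - 2 * rng.random(d)) * (lb_t + r * (ub_t - lb_t))
        cand = np.clip(cand, lb, ub)
        f = error_rate(cand)
        if f < fit[i]: X[i], fit[i] = cand, f     # greedy acceptance, Eq. (21)
    return X, fit

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (10, 3))
fit = np.array([error_rate(x) for x in X])
for t in range(1, 51):
    X, fit = coa_step(X, fit, 0.0, 1.0, t, rng)
print("best error:", fit.min())
```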
4. Performance Validation
In this section, the cancer detection outcomes of the BERTL-HIALCCD technique are validated using the LC25000 database [29]. The database contains 25,000 instances with five classes, as provided in Table 1. Figure 3 demonstrates the sample images.
The confusion matrices of the BERTL-HIALCCD method for the LCC detection results are depicted in Figure 4. The results show that the BERTL-HIALCCD technique recognized lung and colon cancers effectually.
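For reference, the following scikit-learn sketch shows how per-class metrics of the kind reported below (accuracy, precision, recall, F-score, and AUC) can be derived from a confusion matrix and class probabilities; the labels and probabilities are random placeholders, and the two lung class abbreviations not named in the text (Lun-Ad, Lun-Be) are assumptions.

```python
# Per-class metric computation from predictions on a five-class problem.
import numpy as np
from sklearn.metrics import (confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)
from sklearn.preprocessing import label_binarize

classes = ["Col-Ad", "Col-Be", "Lun-Ad", "Lun-Be", "Lun-SC"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, 5000)            # toy ground-truth labels
y_prob = rng.dirichlet(np.ones(5), 5000)     # toy predicted probabilities
y_pred = y_prob.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred)
prec, reca, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                    zero_division=0)
y_bin = label_binarize(y_true, classes=list(range(5)))
for i, name in enumerate(classes):
    tp = cm[i, i]
    fn = cm[i].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    acc = (tp + tn) / cm.sum()               # per-class (one-vs-rest) accuracy
    auc = roc_auc_score(y_bin[:, i], y_prob[:, i])
    print(f"{name}: accu={acc:.4f} prec={prec[i]:.4f} "
          f"reca={reca[i]:.4f} F={f1[i]:.4f} AUC={auc:.4f}")
```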
In Table 2, the LCC recognition outcomes of the BERTL-HIALCCD method under 80:20 of TRP/TSP are reported. In Figure 5, the recognition outcomes of the BERTL-HIALCCD method are investigated under 80% of TRP. The figure indicates that the BERTL-HIALCCD system identified the five classes proficiently. In the Col-Ad class, the BERTL-HIALCCD technique provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.19%, 98.98%, 96.92%, 97.94%, and 98.34%, respectively. Likewise, in the Col-Be class, the BERTL-HIALCCD technique provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.27%, 97.85%, 98.51%, 98.18%, and 98.98%, respectively. Similarly, in the Lun-SC class, the BERTL-HIALCCD method provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 98.99%, 97.05%, 97.93%, 97.49%, and 98.59%, correspondingly.
In Figure 6, the recognition results of the BERTL-HIALCCD technique are investigated under 20% of TSP. The figure reveals that the BERTL-HIALCCD technique identified the five classes proficiently. In the Col-Ad class, the BERTL-HIALCCD approach provided $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.28%, 99.08%, 97.30%, 98.18%, and 98.54%, correspondingly. Similarly, in the Col-Be class, the BERTL-HIALCCD method provided $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.38%, 98.27%, 98.57%, 98.42%, and 99.07%, correspondingly. Similarly, in the Lun-SC class, the BERTL-HIALCCD algorithm provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99%, 96.65%, 98.40%, 97.52%, and 98.77%, respectively.
In Table 3, the LCC recognition results of the BERTL-HIALCCD method under 80:20 of TRP/TSP are reported. In Figure 7, the recognition outcomes of the BERTL-HIALCCD method are investigated under 80% of TRP. The results point out that the BERTL-HIALCCD system identified the five classes proficiently. In the Col-Ad class, the BERTL-HIALCCD technique provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.42%, 98.98%, 98.25%, 98.57%, and 98.98%, respectively. Likewise, in the Col-Be class, the BERTL-HIALCCD approach offers $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 98.77%, 95.66%, 98.35%, 96.99%, and 98.61%, correspondingly. Additionally, in the Lun-SC class, the BERTL-HIALCCD technique provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 98.67%, 96.33%, 97.05%, 96.69%, and 98.06%, correspondingly.
In Figure 8, the recognition results of the BERTL-HIALCCD technique are inspected under 20% of TSP. The results demonstrate that the BERTL-HIALCCD technique identified the five classes proficiently. In the Col-Ad class, the BERTL-HIALCCD technique provides $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 99.40%, 98.56%, 98.36%, 98.46%, and 99.01%, respectively. Additionally, in the Col-Be class, the BERTL-HIALCCD method offers $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 98.97%, 96.61%, 98.24%, 97.42%, and 98.70%, respectively. Likewise, in the Lun-SC class, the BERTL-HIALCCD approach presents $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and $AUC_{score}$ of 98.89%, 97.22%, 97.29%, 97.25%, and 98.29%, respectively.
Figure 9 inspects the accuracy of the BERTL-HIALCCD method during training and validation on 80:20 of TRP/TSP. The figure specifies that the BERTL-HIALCCD approach reaches greater accuracy values with higher epochs. In addition, the greater validation accuracy over the training accuracy shows that the BERTL-HIALCCD method learns productively on 80:20 of TRP/TSP.
The loss analysis of the BERTL-HIALCCD algorithm during training and validation on 80:20 of TRP/TSP is shown in Figure 10. The results point out that the BERTL-HIALCCD technique reaches similar values for the training and validation loss, indicating that the BERTL-HIALCCD method learns productively on 80:20 of TRP/TSP.
A detailed precision–recall (PR) curve of the BERTL-HIALCCD method on 80:20 of TRP/TSP is shown in Figure 11. The results show that the BERTL-HIALCCD method results in increasing PR values. In addition, the BERTL-HIALCCD approach reached higher PR values in all the classes.
In Figure 12, a ROC study of the BERTL-HIALCCD technique on 80:20 of TRP/TSP is revealed. The figure highlights that the BERTL-HIALCCD method results in improved ROC values. In addition, the BERTL-HIALCCD algorithm extends enhanced ROC values in all the classes.
To illustrate the improved cancer recognition results of the BERTL-HIALCCD technique, a brief comparison study is carried out in Table 4 [30]. The results point out that the BERTL-HIALCCD technique achieves improved results. Based on $accu_y$, the BERTL-HIALCCD technique gains an increasing $accu_y$ of 99.22%, while the MPADL-LC3, mSRC, Faster R-CNN, DAELGNN, ResNet50, CNN, and DL models accomplish decreasing $accu_y$ values of 99.09%, 88.21%, 98.79%, 98.73%, 93.64%, and 97.11%, respectively.
Additionally, based on $prec_n$, the BERTL-HIALCCD technique has an increasing $prec_n$ of 98.07%, while the MPADL-LC3, mSRC, Faster R-CNN, DAELGNN, ResNet50, CNN, and DL algorithms accomplish decreasing $prec_n$ values of 98.01%, 85.21%, 96.53%, 97.95%, 96.12%, and 97.07%, respectively. Lastly, based on $reca_l$, the BERTL-HIALCCD method has an increasing $reca_l$ of 98.06%, while the MPADL-LC3, mSRC, Faster R-CNN, DAELGNN, ResNet50, CNN, and DL approaches accomplish decreasing $reca_l$ values of 97.20%, 91.78%, 97.78%, 97.63%, 96.39%, and 96.44%, correspondingly. In addition, the computation time (CT) analysis reports that the BERTL-HIALCCD technique results in the minimal CT value compared to the other models. Therefore, the proposed model can be employed for accurate LCC detection and classification.