FabNet: A Features Agglomeration-Based Convolutional Neural Network for Multiscale Breast Cancer Histopathology Images Classification

Simple Summary

Histology sample images are usually diagnosed definitively on the basis of the radiologist's extensive knowledge; yet, owing to the highly gritty visual appearance of such images, specialists sometimes differ in their evaluations. Automating the image diagnostic process and decreasing the analysis time may be achieved via the use of advanced deep learning algorithms. Diagnostic objectivity may be improved with more effective and accurate automated technologies by lessening the differences between human readers. In this research, we propose a CNN model architecture for cancer image classification that accumulates layers closer together to further merge the semantic and spatial features. Our suggested model improves upon current state-of-the-art approaches in terms of precision.

Abstract

The definitive diagnosis of histology specimen images is largely based on the radiologist's comprehensive experience; however, due to the fine-to-coarse visual appearance of such images, experts often disagree in their assessments. Sophisticated deep learning approaches can help to automate the diagnosis process of the images and reduce the analysis duration. More efficient and accurate automated systems can also increase diagnostic impartiality by reducing the differences between operators. We propose the FabNet model, which can learn the fine-to-coarse structural and textural features of multi-scale histopathological images by using an accretive network architecture that agglomerates hierarchical feature maps to achieve significant classification accuracy. We expand on a contemporary design by incorporating deep and close integration to finely combine features across layers. Our deep accretive model structure combines the feature hierarchy in an iterative and hierarchical manner, yielding higher accuracy with fewer parameters. FabNet can identify malignant tumors from both whole images and patches of histopathology images.
We assessed the efficiency of our suggested model on standard cancer datasets, which included breast cancer as well as colon cancer histopathology images. Our proposed avant-garde model significantly outperforms existing state-of-the-art models in terms of accuracy, F1 score, precision, and sensitivity, with fewer parameters.


Introduction
Breast cancer is the most prevalent type of cancer in women, affecting 2.1 million women annually, and it is responsible for the bulk of cancer-related deaths globally [1]. It has been estimated that the incidence rates of breast cancer range from 19.3 per 100,000 women in Africa to 89.7 per 100,000 women in Europe [2]. Cancer is a fatal condition that can occur in nearly any bodily region or tissue when irregular cells abnormally spread, infiltrate, or move into adjacent tissues. The number of reported cases has increased in recent years and is projected to reach 27 million by 2030 [3][4][5][6][7]. Considering these challenges, the main contributions of this study are as follows:

1. We propose the FabNet model, which can learn the fine-to-coarse structural and textural features of multi-scale histopathological images through an accretive network architecture that agglomerates hierarchical feature maps to achieve significant classification accuracy.

2. To preserve and integrate the features, our model links convolutional blocks in a closely coupled, tree-based architecture. This method employs every layer of the network, from the shallowest to the deepest, to learn the rich patterns that occupy a large portion of the feature pile.

3. We assessed the FabNet model on two publicly available standard datasets related to breast cancer and colorectal cancer and observed that it outperforms the current state-of-the-art models in terms of accuracy, F1 score, sensitivity, and precision when evaluated at different magnification scales for both binary and multi-class classification.
The rest of this article is structured as follows: Section 2 addresses the related work. Section 3 defines the design of the proposed FabNet model. We define the experimental setup, datasets, training, and implementation descriptions, and provide a detailed analysis of the performance in Section 4. The discussion, conclusions, and possible future research directions are all contained in Section 5.

Related Works
Extensive work has been conducted in the literature to establish strategies for classifying and recognizing breast and colon cancers from histopathology images. The majority of the current approaches utilize computer-aided diagnosis (CAD) techniques to identify breast cancer tumors, both benign and malignant. Before the deep learning breakthrough, data were examined using conventional machine learning techniques based on supervised learning methods [28] to obtain the data features.

Conventional Learning Methods
The bulk of the research in this area has concentrated on small data samples taken mostly from proprietary datasets. In 2013, several algorithms, including Gaussian mixture models and fuzzy C-means clustering techniques, were used to classify nuclei from a dataset containing five hundred images from fifty patients. This study reported 96% accuracy for two-category classification [29], suggesting that such machine learning-based approaches allowed adequately comprehensive and precise research and were considered useful for supporting breast cancer diagnostics. Spanhol et al. [30] published yet another study in which they achieved 85.1% accuracy on a breast cancer dataset, applying support vector machines for a patient-level analysis. Using a database of ninety-two specimens, George et al. [31] proposed a breast cancer classification method applying neural networks with a support vector machine, which achieved 94% accuracy. Zhang et al. [32] suggested a cascading approach with a rejection alternative; this procedure was evaluated on a dataset with 361 specimens [33]. The study in [34] suggested the application of different classifiers, such as support vector machines and the k-nearest neighbor, for breast cancer histology image classification, achieving 87% accuracy through ensemble voting of the mentioned techniques. In [35], adaptive sparse support vector machine-based techniques were applied to a dataset at a 40× magnification level, reporting 94.97% accuracy. There have been a couple of other studies on histopathological representations for carcinoma classification; these studies specifically explain the dichotomies and shortcomings of various publicly accessible benchmark datasets [36,37].

Deep Learning Approaches
Deep learning has ushered in a new era in the domain of general object classification and detection. The classification of cancer histopathological images (i.e., breast and colon) has been a significant field of study due to advances in medical computer vision and deep learning. Because of the elevated histopathological image resolutions, the conventional machine learning algorithms and deep neural network models used to explicitly view the WSI have resulted in very complex network designs that are a challenge to train [38]. The number of samples available for the classification of cancer histopathology images is limited, and the image size is large, making the training of CNN models challenging. Furthermore, compressing an entire oncology image to the CNN's input size would result in a loss of the richness of the detailed feature data. As a result, some researchers suggested the classification of images based on patches to alleviate this challenge. In [39], the authors used a sliding-window technique to achieve the arbitrary extraction of image patches from the BreakHis dataset, which were then classified with AlexNet [40]. Even though the preceding studies demonstrate that patch-based image classification approaches are commonly used on different breast cancer histopathology datasets, histopathology images contain a large number of fine details that need to be extracted with the utmost accuracy and precision. We present FabNet, a CNN model that ensembles every fine-to-coarse detail for more accurate learning. This method employs every layer of the network, from the shallowest to the deepest, to learn the rich patterns that occupy a large portion of the feature pile.

FabNet: Features Agglomeration Approach
We define agglomeration as the combination or merging of network layers in a closely coupled manner. In the proposed FabNet model, as shown in Figure 1, we are particularly focused on the productive accumulation of depth, dimensions, and resolutions. We define an agglomeration sequence as deep if it is holistic and discrete, and the initial agglomerated layer moves features through several agglomerations. Since our network has multiple layers and connections, we designed a modular architecture that tends to reduce the complexity through grouping and replication. The proposed network layers are subdivided into blocks, for example, B1, which are further subdivided into stages based on the feature resolution. This design is focused on agglomerating the blocks to preserve and combine the feature channels. In Figure 2, a conv block (i.e., B1) is shown, which comprises two convolutional layers with 5 × 5 and 3 × 3 filter window sizes. The activation maps of both convolutional layers are concatenated and then transferred to another convolutional layer with a 1 × 1 filter window to reduce the channels to an optimal number. Agglomeration starts at the smallest, shallowest scale and gradually merges at the deeper, wider scales in a repetitive manner. In this manner, the shallow features are redefined as they progress over to deeper blocks of layers.
For a sequence of blocks 1, 2, 3, …, n, we formulated the function ℜ for such a repetition below.
In Equation (1), n is the number of blocks. To increase the depth of the network and its performance, we merge or fuse blocks in a tree-like, closely coupled structure. We pass an agglomerated node's feature map back to the baseline as the input feature map of the next sub-module, instead of forwarding intermediate agglomerations further up the tree. This spreads the agglomerations of all of the previous modules, rather than the preceding module only, which helps to best preserve the features. We combine the parent and left child nodes of the same depth to improve the performance.
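The tree-style fusion described above — concatenating the feature maps of a parent node and its child before reducing the channels — can be sketched in a few lines. This is an illustrative numpy stand-in, not the actual Keras implementation; the `agglomerate` helper and the shapes are hypothetical.

```python
import numpy as np

def agglomerate(parent, child):
    """Fuse two feature maps of equal spatial size by channel
    concatenation, the core merge step of the agglomeration tree.
    (A real block would follow this with a 1x1 convolution to
    reduce the channel count.)"""
    assert parent.shape[:2] == child.shape[:2]
    return np.concatenate([parent, child], axis=-1)

# Two hypothetical feature maps from adjacent blocks, shaped (H, W, C).
b1 = np.zeros((224, 224, 16))
b2 = np.zeros((224, 224, 32))

fused = agglomerate(b1, b2)
print(fused.shape)  # (224, 224, 48)
```

The fused map keeps every channel from both inputs, which is why a channel-reducing 1 × 1 convolution follows each merge in the actual architecture.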
Our model consists of conv blocks, which are the basic building blocks of each node. In the case of B1, the conv block accepts an input of 224 × 224 × 3. This input is passed to two different convolutional layers simultaneously. Both convolutional layers apply 16 kernels, with filter window sizes of 3 × 3 and 5 × 5, respectively, each followed by a nonlinearity (ReLU), which aims to alleviate the issue of vanishing gradients as well as improve the network's training speed. To generate an optimal feature map, the feature maps of these two convolutional layers are combined and thereafter transferred to a 1 × 1 convolutional layer. In each convolutional layer discussed above, we use zero padding, which preserves the original image size while also providing valuable knowledge about feature learning, which aids the extraction of low-level features for the subsequent layers. Following that, we apply batch normalization, which balances the outputs of the preceding activation layer by subtracting the batch mean and dividing by the batch standard deviation, thereby increasing the network's stability.
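The batch-normalization step described here (subtract the batch mean, divide by the batch standard deviation) can be written out numerically. This is a minimal numpy sketch of the formula only, with a hypothetical `eps` term added for numerical stability; a framework layer would additionally learn scale and shift parameters.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of activations per channel: subtract the
    batch mean and divide by the batch standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps)

batch = np.array([[1.0, 2.0], [3.0, 6.0]])  # 2 samples, 2 channels
normed = batch_norm(batch)
# Each channel now has zero mean and (near-)unit variance.
print(normed.mean(axis=0))
```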
The output of conv block B1 is fed into B2, which has an internal architecture similar to that of B1, as depicted in Figure 2, except for the number of kernels: conv block B2 contains 32 convolution filters. The feature maps of both conv blocks are then concatenated, which results in an enhanced collective feature map. We then apply an average pooling layer with 2 × 2 patches of the feature map and a stride of two. This layer down-samples the image, reducing the estimation complexity and the number of parameters, by dividing it into rectangular pooling regions and computing the mean value of each region. The average-pooled output propagates to conv block B3 as an input, and its features also reach the final stage C5. As mentioned earlier, B3 has the same internal architecture as conv blocks B1 and B2, but with 32 convolution filters. The output feature map of B3 is fed into conv block B4 as an input. The internal convolutional layers of conv block B4 apply 64 convolution filters to learn the features. The feature maps of B3 and B4 are fused to generate an extended feature map, which is followed by average pooling for downsampling. The average-pooled output feeds into the next conv block, B5, whose output is fed into B6 as an input. B6 utilizes 128 convolution filters.
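The 2 × 2, stride-2 average pooling used between the block pairs can be sketched as a plain numpy reshape-and-mean. `avg_pool_2x2` is a hypothetical helper operating on a single-channel map; real layers also handle the batch and channel axes.

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on an (H, W) feature map,
    halving each spatial dimension (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = avg_pool_2x2(fmap)
print(pooled.shape)  # (2, 2)
```

Each output value is the mean of one non-overlapping 2 × 2 region, e.g. the top-left output averages the four top-left inputs.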
The feature maps of conv blocks B5 and B6 are concatenated to fuse the features, which results in an enhanced feature map with detailed information. This step is followed by an average pooling operation to halve the image size. The pooled output is fed into the next conv block, and the network repeats the same operation until it reaches conv block B10. The only difference between the blocks is the number of convolution filters, which is 256 for B8 and 512 for B10. Up to B10, the feature maps of the entire network are optimally propagated from the shallower to the deeper layers and blocks, which makes the proposed network compact and closely binds the entire network. The best features of every block and stage are collected and fused at stage C5 by the extensions from C1 to C5 and by bridging the adjacent blocks. C5 is subjected to a global average pooling function, which significantly reduces the amount of data, and thus the size of the classification layers, by computing the average of every feature map in the preceding layer. The output layer, which is the last dense layer, includes one neuron for each class, normalized with the Softmax function; the number of neurons varies based on the classification category. We used binary and multi-class classifications in this study.

Methodology
As seen in Figure 3, the proposed method consists of three main steps. Firstly, we obtain training samples by applying a patch-extraction technique to the dataset. Secondly, stain normalization preprocessing of the dataset is performed to resolve the stain variation in the images. For stain normalization, several methods have been suggested in the literature [49][50][51]. DL-based approaches for classifying cancer histopathology images employ a training set to detect a wide range of enhancements to distinguish variations within, as well as across, the categories. A wide range of color inconsistencies in histopathological images may occur due to the color response of the automated scanners, stain supplier materials, and processing units, or due to varying staining procedures in different laboratories. Therefore, stain normalization is a basic step during histopathological image preprocessing. The key benefit of using image patches for training is that it preserves the local characteristic information from the histopathology images, helping the model to learn the local characteristic features. Thirdly, we train our proposed model with these extracted images to classify and differentiate between benign and malignant tumors. We outline the datasets, image preprocessing, model training, and implementation details below.
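For intuition, a much simpler Reinhard-style color transfer — matching each channel's mean and standard deviation to a reference image — conveys the basic idea behind stain normalization. The SPCN and Macenko methods actually used in this work are more elaborate; the function below is a hypothetical sketch only.

```python
import numpy as np

def match_channel_stats(source, target, eps=1e-8):
    """Shift and scale each color channel of `source` so its mean and
    standard deviation match `target`'s — a simple stand-in for the
    stain normalization schemes referenced in the text."""
    out = source.astype(float).copy()
    for c in range(source.shape[-1]):
        s, t = out[..., c], target[..., c].astype(float)
        out[..., c] = (s - s.mean()) / (s.std() + eps) * t.std() + t.mean()
    return out

rng = np.random.default_rng(0)
src = rng.uniform(0, 255, (8, 8, 3))   # hypothetical source tile
ref = rng.uniform(50, 200, (8, 8, 3))  # hypothetical reference tile
normed = match_channel_stats(src, ref)
```

After the transfer, each channel of `normed` has (up to the `eps` guard) the same mean and spread as the reference, reducing scanner- and lab-dependent color shifts.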


Dataset
To evaluate our proposed model, we used two main public cancer histology image datasets. These datasets were chosen with three motives: firstly, the diversity of cancer types represented in the histology slides, such as breast cancer and colorectal cancer; secondly, their size; and thirdly, the existence of multiple magnification factors, which allowed us to carry out different tests with restricted equipment while modifying different parameters.

BreakHis
In this study, we assessed our model with BreakHis, a publicly available breast-cancer-related histologic dataset [30]. The samples were created from breast tissue biopsy slides stained with H&E. The BreakHis dataset contains 7909 histopathological biopsy images of 700 × 460 pixels from eighty-two individuals. The dataset consists of two main categories, benign and malignant, each further subdivided into four subclasses. Table 2 shows the statistical specifics of this dataset, and Figure 4 shows a few illustrations of the histological images. For our tests, we randomly divided the entire dataset into training/testing subgroups at a 70:30 ratio. To assess our model's efficiency in clinical settings, we kept a patient-based distinction between the training and test data. For stain normalization, we adopted the technique suggested in [50], an innovative structure-preserving color normalization (SPCN) scheme. An illustration of stain-normalized images is shown in Figure 5.
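The patient-based train/test separation mentioned above can be sketched as a grouped split: patients, not images, are shuffled and partitioned, so no patient's images leak across subsets. All identifiers below are hypothetical.

```python
import random

def patient_level_split(image_patients, train_ratio=0.7, seed=42):
    """Split images 70:30 by PATIENT so that no patient appears in
    both the training and the test set. `image_patients` maps an
    image id to its patient id (hypothetical ids)."""
    patients = sorted(set(image_patients.values()))
    random.Random(seed).shuffle(patients)
    cut = int(len(patients) * train_ratio)
    train_p = set(patients[:cut])
    train = [i for i, p in image_patients.items() if p in train_p]
    test = [i for i, p in image_patients.items() if p not in train_p]
    return train, test

# 50 hypothetical images spread over 10 patients.
imgs = {f"img{i}": f"patient{i % 10}" for i in range(50)}
train, test = patient_level_split(imgs)
```

Splitting at the image level instead would let patches from the same patient land in both subsets and inflate the measured accuracy.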

NCT-CRC-HE-100K
This dataset includes 100 K publicly available images of human colorectal cancer (CRC), as well as normal tissues [52]. To stain normalize this dataset, in which the image size was 224 × 224 pixels, the Macenko approach [53] was used. We used this color normalization technique because the initial images had subtle variations between red and blue tones, resulting in misleading classifications. Figure 6 shows descriptive representations of the sample images. This dataset is divided into nine subclasses: adipose tissue (ADI), lymphocytes (LYM), background (BACK), mucus (MUC), smooth muscle (MUS), normal (NORM), debris (DEB), cancer-associated stroma (STR), and tumor (TUM). To improve the variance in this training set, normal tissue samples were obtained primarily from clinical specimens, as well as from gastrectomy samples (such as upper gastrointestinal smooth muscle). The number of training set images in each group was nearly equal, while the test set contained 7180 images.

Table 2 shows that the BreakHis dataset has a data imbalance problem, which was calculated as 0.42 at the image scale and 0.44 at the patient scale. This data disparity can bias the performance of computer-aided diagnosis (CAD) models toward the majority class in classification problems. Equation (2) determines the patch count obtained from the dataset images of the ith class.

Image Representation and Patch Extraction
Equation (2) depicts a mathematical representation of the Ni patches derived from the ith category, where xi is the number of images in the ith category, xth is a threshold image count, β is a constant, and n represents the number of classes. The fixed parameter β was set to 32. After patch extraction, each class has nearly the same number of patches. The primary benefit of utilizing patches during training for every individual class is that it preserves the regional distinctive details in the histological image, which enables the model to learn the spatial information [54].
To obtain an image classification, first, we use a patch classifier on several distinct magnifications of patches, and afterward, we average the results over all of the image's patches. The extraction and learning of similar features, for instance, the entire tissue composition, nucleus state, and texture features, are used to classify the images into the desired categories. We inferred that 224 × 224 as well as 700 × 460-pixel patches would be sufficient to capture the proper cell formation of the various tissues.
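A minimal sketch of patch extraction via a sliding window, assuming non-overlapping 224 × 224 windows; the exact stride used in this work is not specified here, so both the stride value and the helper name are hypothetical.

```python
import numpy as np

def extract_patches(image, patch=224, stride=224):
    """Slide a window over an (H, W, C) image and collect fixed-size
    patches — the sliding-window extraction used to build training
    samples from large histology images."""
    h, w = image.shape[:2]
    return [image[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)]

img = np.zeros((460, 700, 3))  # one BreakHis-sized image
patches = extract_patches(img)
print(len(patches))  # 2x3 grid -> 6 non-overlapping 224x224 patches
```

A smaller stride would produce overlapping patches and therefore more training samples per image.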

Model Training
We assessed the proposed model's efficiency in two areas: (1) sample classification based on binary and multi-class classification, and (2) sample classification based on patient- and image-level classification. We used the datasets discussed in the study, which were subdivided into training/validation sets. To find the optimal parameters for our model, we used a five-fold cross-validation scheme. We assessed our model with metrics such as accuracy, sensitivity, precision, and F1 score in the performance assessment. On an NVIDIA GTX 1080Ti, we used the Keras framework to implement the method. The metrics of five successfully completed trial experiments are reported. We compared our model's efficiency to that of cutting-edge models such as DenseNet 121 [55], VGG16 [56], and ResNet 50 [57].
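The five-fold cross-validation scheme can be sketched as a simple index partition; `five_fold_indices` is a hypothetical helper, not the code used in this work.

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into 5 folds; each fold
    serves once as the validation set while the remaining four form
    the training set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, 5)
    for k in range(5):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, val

# For 100 samples: 5 splits of 80 training / 20 validation indices.
splits = list(five_fold_indices(100))
```

Averaging the validation metric over the five splits gives a more stable estimate than a single held-out set.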

Implementation Details
The FabNet model assimilates the fine-to-coarse structural and textural features of multi-scale histopathological images through an accretive network architecture that agglomerates hierarchical feature maps to perform significant learning. Our model propagates the features from block to block and, overall, from stage to stage to assemble the best feature map for learning. We tuned the following hyperparameters in our model: the number of convolutional blocks (the internal architecture is defined in Figure 2), epochs, learning rate, optimizer, batch size, and batch normalization. The epochs were set to 20, 50, 70, and 100, while learning rates of 0.01, 0.001, 0.0001, and 10−4 were evaluated. We used batch sizes of 16, 32, and 64 due to hardware limitations. We tested the model with different optimizers such as Adadelta, Adamax, SGD, RMSprop, and Nadam, but Adam provided the optimal accuracy. The detailed optimized hyperparameters are shown in Table 3. The BreakHis and NCT-CRC-HE-100K datasets are intended to serve as standards for breast and colon cancer CAD systems. Before discussing the results, we define the evaluation metrics used to assess the proposed model. The experimental procedure for evaluating the proposed approach on the BreakHis dataset is similar to that used in the previous study [39]. The authors defined two types of accuracies, in which the first reflects the performance accuracy achieved on the patient scale.
If we suppose Np represents the number of images of a patient, Nc is the number of that patient's images that are accurately categorized, and Nt is the total number of patients, the score for an individual patient can be calculated as

Patient Score = Nc / Np

while the global patient accuracy can be calculated as

Patient Accuracy = Σ Patient Scores / Nt

The second case for the evaluation of classification accuracy is image-level accuracy. If we let Ntb be the number of test image samples for breast cancer and Ncb be the number of images that are accurately classified by the CAD system according to their labeled classes, the image-level accuracy can be defined as

Image Accuracy = Ncb / Ntb

The obtained accuracy at the image and patient levels for different magnification levels is shown in Table 4. Largely, a malignant case is considered to be positive during cancer diagnosis, whereas a benign case is considered to be negative. In clinical diagnosis, sensitivity (also known as recall) is more significant for medical professionals. Therefore, in this study, the proposed model is evaluated based on the metrics defined below:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity (Recall) = TP / (TP + FN)
Precision = TP / (TP + FP)
F1 score = 2 × (Precision × Recall) / (Precision + Recall)

Table 4 depicts the performance of the proposed model, which outperformed DenseNet 121 and MSI-MFNet in terms of test accuracy at each magnification level using the BreakHis dataset. The model showed superior test accuracy at the 40×, 200×, and 400× magnifications. At the 100× magnification level, the model slightly lags behind DenseNet 121, which achieves 90.21% accuracy at the patient level, while it achieves 92.71% for the image-level classifications.
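The accuracy definitions above translate directly into code. The confusion-matrix counts below are hypothetical and serve only to illustrate the formulas.

```python
def patient_score(n_correct, n_images):
    """Per-patient score: correctly classified images / patient images."""
    return n_correct / n_images

def global_patient_accuracy(scores):
    """Mean of the per-patient scores over all patients."""
    return sum(scores) / len(scores)

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), precision, and F1 score from
    confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    return acc, sens, prec, f1

# Hypothetical counts for a binary benign/malignant run.
acc, sens, prec, f1 = binary_metrics(tp=90, tn=85, fp=15, fn=10)
print(round(acc, 3), round(sens, 3))  # 0.875 0.9
```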
The experiments largely focused on binary and multi-class classification. The patch-wise binary and multi-classification outcomes are shown in Table 5. The results are reported using important metrics such as test accuracy and sensitivity (recall) on the 200× magnified image patches, and are compared with those of two benchmark models, DenseNet121 and MSI-MFNet. The experimental results obtained by the proposed FabNet were better than those of the mentioned models, with a larger margin in terms of test accuracy for binary as well as multi-class classification. In Table 6, the detailed results obtained from the proposed model are presented. It is evident that the model exhibited better accuracy for binary classification, as well as multi-classification, at contrasting magnifications, for instance, 40×, 100×, 200×, and 400×. The model performed particularly well for binary classification; for instance, at the 40× magnification scale, the model achieved 99% accuracy. The model showed good performance for multiple classes as well. Table 7 depicts the classification results of the proposed FabNet for the NCT-CRC-HE-100K dataset. It is evident that the model exhibited an outstanding performance in terms of test accuracy and sensitivity compared to those of the benchmark models VGG16, DenseNet 121, and ResNet50.
In Table 8, detailed class-wise scores for important metrics such as precision and sensitivity (recall) are given to illustrate the model's efficiency on the NCT-CRC-HE-100K dataset.
The ROC curve is a graphical characterization of a classification model's results. It is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR) at various discrimination thresholds, where TPR corresponds to sensitivity (recall) and FPR to 1 − specificity. The ROC curve of a random classifier is the diagonal line from (0,0) to (1,1): any curve above this diagonal indicates a model that performs better than chance, and any curve below it indicates a model that performs worse than chance. The area under the ROC curve, which lies between 0 and 1, is referred to as the AUC; a high AUC means that the classification model is accurate. The ROC curve for the binary classification of the BreakHis dataset is shown in Figure 6, where class 0 indicates a benign tumor and class 1 a malignant tumor. Figures 7 and 8 depict the ROC curves for the multi-class classification performance on the BreakHis and NCT-CRC-HE-100K datasets. The confusion matrices for the binary classification of the BreakHis dataset at different magnification scales are shown in Figure 9. As can be seen at the different magnification levels (40×, 100×, and 200×), our model tends to produce better results for binary classification. Because of the diverse and significant regions in the images, the confusion matrix results show that the binary scenarios performed better than the multi-class scenarios did. Higher magnification provides further structural information to the model, which helps it acquire a sound representation of the patches and their labels. The confusion matrix results for multi-class classification on the NCT-CRC colon cancer dataset are shown in Figure 10.
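The ROC/AUC computation described above can be sketched with scikit-learn; the scores below are synthetic malignancy probabilities, not FabNet outputs.

```python
# Sketch of the ROC/AUC evaluation (synthetic scores, not model outputs).
from sklearn.metrics import roc_curve, roc_auc_score

# class 0 = benign, class 1 = malignant, as in the binary BreakHis setting
y_true  = [0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.3]  # predicted P(malignant)

# TPR/FPR pairs swept over all decision thresholds, plus the scalar AUC.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")  # 1.0 = perfect ranking, 0.5 = chance-level diagonal
```

Plotting `fpr` against `tpr` reproduces the curve; the closer it hugs the top-left corner, the higher the AUC.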
Tables 9 and 10 show the results of the proposed model in comparison with benchmarks related to breast and colon histology models. Table 9 reports the mean and standard deviation of our results, obtained with and without stain normalization, to better characterize the use of the FabNet model for studying cancer histopathology images. The model outperformed several of the most recent studies. For example, in [68], the authors obtained 97.58% and 97.45% accuracy with 7.6 million training parameters, whereas we reached 99.03% accuracy with 3239 K training parameters; despite having fewer training parameters, our model achieved higher accuracy. In another study [2], the authors proposed the Inception Recurrent Residual Convolutional Neural Network (IRRCNN), which obtained 97.95% accuracy for image-level classification and 97.65% for patient-level classification; on this dataset, FabNet obtained 99.01% patient-level accuracy and 99.03% image-level accuracy. Using data augmentation, the authors obtained 99.05% accuracy for binary classification and 98.59% for multi-class classification, whereas we obtained comparable outcomes without applying data augmentation. Data augmentation enables a learning model to overcome important training constraints such as overfitting, thereby improving its accuracy and generalization capabilities; in the case of our model, we believe that its generalization ability holds up even in the absence of data augmentation. Rui Man et al. [55] reported a similar accuracy at the 40× magnification level; however, our model achieved better results at the 200× and 400× magnification levels, and the DenseNet121-based network they proposed has substantially more training parameters than FabNet does.

Conclusions
In this paper, we proposed the FabNet model, which can learn the fine-to-coarse structural and textural features of multi-scale histopathological images through an accretive network architecture that agglomerates hierarchical feature maps to achieve significant classification accuracy. We expanded upon the conventional convolutional neural network architecture by incorporating deeper integration to finely fuse information across layers. This layer expansion had a small impact on the model's depth; however, it made the model more tightly linked and compact, ensuring that every piece of detail is transferred to the deeper layers for better learning. Despite having fewer parameters, this lightweight network architecture yielded better classification accuracy than the state-of-the-art models did.
Our model yields improved classification probabilities at both the patch and the image levels. The efficiency and reliability of FabNet were assessed through several experiments, covering both multi-class and binary classification, on two public datasets containing breast and colon cancer data. The proposed FabNet improved upon the existing state-of-the-art models on both public benchmark datasets. The experimental settings were kept the same for the benchmark models and the proposed model in order to compare their performance fairly. The proposed model achieved 99% accuracy and a 98.9% F1 score for the binary classification of BreakHis at the 40× magnification scale, and 98.2% test accuracy with a 98.23% F1 score on the NCT-CRC-HE-100K colon cancer dataset without employing any data augmentation technique.
We believe that the model can reduce the cancer screening time for pathologists as well as oncologists. Oncologists and researchers working on cancer detection and diagnosis from histological images in diverse circumstances will benefit from the proposed model's high sensitivity and accuracy. Although the closely coupled architecture mitigated the dataset imbalance issue, so that it had only a minor effect on the model's performance, data imbalance is prominent in clinical histology, and we intend to explore dedicated strategies for coping with it in future work. We will also investigate which feature map combinations are most significant for classification. The proposed model can be used to perform a variety of tasks related to histological image-based classification in clinical environments.

Data Availability Statement:
Publicly available datasets were used in this study. The dataset can be found on www.tubo.tu.ac.kr.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report for this study.