1. Introduction
Exposure to electromagnetic fields (EMFs) at different frequencies has been reported to induce structural and functional alterations in brain tissue, particularly by triggering apoptotic processes [1]. Apoptosis, defined as programmed cell death, is an important biomarker for evaluating brain tissue damage. In this context, the TUNEL (terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling) technique is widely used to detect apoptosis by identifying DNA fragmentation and enabling histopathological assessment of apoptotic cells [2,3].
The evaluation of apoptotic damage in microscopic images obtained after TUNEL staining is generally based on semi-quantitative scoring. However, this approach is inherently subjective due to observer-dependent variability and may lead to inconsistencies between evaluations. In addition, manual assessment is time-consuming and prone to error, particularly in samples with high cell density. These limitations highlight the need for more objective and reproducible analysis methods. Recent studies have demonstrated that artificial intelligence-based approaches can improve the accuracy, efficiency, and reproducibility of medical image analysis [4].
Over the past decade, deep learning has been increasingly utilized in image classification. This is largely because it can autonomously derive hierarchical feature representations from raw inputs, eliminating the need for manual feature design [5]. Among deep learning methods, convolutional neural networks (CNNs) have been widely used in biomedical image analysis due to their ability to effectively extract spatial and structural features from image data [6]. CNN-based approaches have shown high performance in various medical applications, including cell classification, tumor detection, and histopathological image analysis [7,8].
CNN architectures are particularly effective in image-based tasks due to their local connectivity, weight sharing, and ability to learn complex patterns through multiple layers [9]. Moreover, transfer learning using pre-trained deep convolutional neural networks such as AlexNet, GoogLeNet, Inception-v3, and ResNet has been shown to be effective in classification tasks, especially when working with relatively small datasets [10]. Previous studies have reported high classification accuracies using these models in medical imaging applications, including brain tumor classification and histopathological image analysis [11,12,13].
Despite the growing number of studies applying deep learning to medical imaging, the use of artificial intelligence for the classification of apoptotic damage specifically in TUNEL-stained histopathological images remains limited. Furthermore, most existing studies rely on large publicly available datasets, whereas studies based on experimentally derived datasets are relatively scarce.
In this study, apoptotic damage in TUNEL-stained brain tissue microscopic images obtained under different electromagnetic field frequencies and ginseng treatment was classified using transfer learning-based CNN models. The performances of multiple pre-trained deep learning architectures, including AlexNet, SqueezeNet, GoogLeNet, Inception-v3, and ResNet-101, were comparatively evaluated. Rather than developing a novel deep learning architecture, the primary aim of this study was to investigate the feasibility of using established transfer learning models for objective and reproducible histopathological evaluation as an alternative to conventional semi-quantitative manual scoring.
2. Related Work
CNNs are frequently used in image classification in the healthcare field. They are neural networks specifically designed for image recognition problems [9]. The main advantages of CNNs are their improved feature learning capability and high classification performance compared to traditional machine learning approaches and classical neural networks. These advantages grow as the number of training examples increases, which yields a more robust and accurate model. In a CNN architecture, convolutional filters function as feature extractors, extracting increasingly complex features (spatial and structural information) as the network deepens. Feature extraction occurs through the convolution of filters with input patterns, followed by the selection of the most distinctive features, which are then used to train the classification network [14]. The Inception-v3-based network architecture, combined with transfer learning, has attracted attention in recent years due to its excellent performance on a wide variety of small datasets. Li et al. [15] successfully performed lymph node metastasis classification in colorectal cancer. Mednikov et al. [16] performed an effective classification of breast masses.
Deep CNNs (DCNNs) typically require large-scale image datasets to achieve the highest accuracy. However, obtaining such an image dataset is difficult in many fields. In such situations, reusing the features of established DCNNs such as AlexNet, GoogLeNet, Inception-v3, and ResNet-101, pre-trained on large-scale public image databases (such as ImageNet), has proven useful in solving various medical image classification tasks through transfer learning [10]. CNNs and pre-trained networks are frequently used in image classification studies, particularly in the healthcare field. Rasool et al. [11] proposed a novel hybrid CNN-based architecture to classify three brain tumor types from MRI images. They used a pre-trained GoogLeNet model for feature extraction and achieved an accuracy of 93.1% with it. In the present study, the most accurate result, 89.47%, was also obtained from the pre-trained GoogLeNet. Tamilarasi and Gopinathan [12] presented a non-invasive diagnostic support system for brain cancer diagnosis. They demonstrated the usefulness of their self-developed Inception-v3-based architecture for classifying brain MRI images, with an average accuracy of 95.1%, sensitivity of 96.2%, and specificity of 94%. Alinsaif and Lang [13] used the pre-trained CNNs SqueezeNet-v1.1, MobileNet-v2, ResNet-18, and DenseNet-201 for the automatic classification of histopathological images. ResNet generally showed better accuracy than SqueezeNet. A similar result was observed in the present study, with ResNet-101 (84.21%) yielding better accuracy than SqueezeNet (78.95%). Well-developed deep learning architectures are used in many image analysis approaches to achieve better performance. The most commonly used architectures are VGG-16, AlexNet, and GoogLeNet [12]. Therefore, pre-trained CNNs were preferred in this study as well.
3. Materials and Methods
3.1. Experimental Design and Histopathological Evaluation
The histopathological images used in this study were obtained from our previously conducted experimental study, which investigated the effects of electromagnetic field (EMF) exposure at different frequencies with and without ginseng treatment [1]. Brain tissue samples were collected following the experimental protocol and processed for histological examination.
Apoptosis in brain tissue was evaluated using the TUNEL (terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling) method. TUNEL-positive cells were identified by the presence of brown-stained nuclei, indicating DNA fragmentation associated with apoptosis. The severity and extent of apoptotic staining were assessed semi-quantitatively by an expert histologist and categorized into three classes: 0 (none), +1 (slight), and +2 (moderate).
3.2. Image Acquisition and Dataset Preparation
TUNEL-stained brain tissue sections were examined using a light microscope (Leica DM 500, Wetzlar, Germany) at 40× magnification. Following image acquisition, the dataset consisted of microscopy images categorized according to the semi-quantitative scoring provided by the expert histologist. A total of 92 images were included in the dataset, with 18 images in class 0, 39 images in class +1, and 35 images in class +2. All images had a resolution of 2048 × 1536 pixels. Due to the limited number of experimentally obtained histopathological images, transfer learning-based approaches were preferred to improve classification performance on a relatively small dataset.
To evaluate model performance under different conditions, the dataset was divided into training and test sets using three different ratios: 80% training and 20% testing, 70% training and 30% testing, and 50% training and 50% testing. The training set was used to train the models, while the test set was used to evaluate classification performance.
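The resulting set sizes can be checked with a short calculation. The following Python sketch assumes a stratified split in which the training percentage is applied to each class separately and rounded to the nearest image. The text does not state how the split was drawn, so the rounding rule and the helper names `stratified_counts` and `totals` are illustrative assumptions. Under this assumption the 80%/20% split leaves 19 test images, which would be consistent with the reported accuracies being multiples of 1/19 (for example, 17/19 ≈ 89.47%).

```python
# Class sizes reported for the dataset (classes 0, +1, +2).
CLASS_COUNTS = {"0": 18, "+1": 39, "+2": 35}


def stratified_counts(class_counts, train_percent):
    """Return per-class (train, test) image counts.

    Assumption (not stated in the text): the training percentage is
    applied to each class separately and rounded half up.
    """
    split = {}
    for label, n in class_counts.items():
        n_train = (n * train_percent + 50) // 100  # integer round-half-up
        split[label] = (n_train, n - n_train)
    return split


def totals(split):
    """Sum per-class counts into overall (train, test) totals."""
    n_train = sum(train for train, _ in split.values())
    n_test = sum(test for _, test in split.values())
    return n_train, n_test


if __name__ == "__main__":
    for pct in (80, 70, 50):
        split = stratified_counts(CLASS_COUNTS, pct)
        print(f"{pct}% training:", split, "totals:", totals(split))
```

With such small test sets, one additional correct or incorrect image changes the accuracy by roughly 5 percentage points in the 80%/20% case, which is worth keeping in mind when comparing models.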
3.3. Deep Learning Models and Training Procedure
Image classification was performed using pre-trained convolutional neural network (CNN) architectures implemented in MATLAB R2021b (MathWorks, Natick, MA, USA). The models used in this study were SqueezeNet, AlexNet, GoogLeNet, Inception-v3, and ResNet-101. These architectures were selected due to their proven performance in image classification tasks and their suitability for transfer learning. The selected CNN architectures differ in depth and structural design. AlexNet and SqueezeNet are relatively lightweight models, whereas GoogLeNet, Inception-v3, and ResNet-101 provide deeper architectures with enhanced feature extraction capabilities for complex image classification tasks. These architectures utilize convolutional and pooling operations to automatically extract hierarchical image features during the classification process.
The models were trained using a transfer learning approach, where pre-trained CNN architectures were adapted using the TUNEL microscopy image dataset. During training, model parameters were optimized to achieve the best classification performance for each dataset split. The classification performances of the models were evaluated and compared using accuracy, precision, sensitivity, and F1-score metrics.
Input images were resized according to the requirements of the pre-trained CNN architectures prior to training. The pre-trained CNN models were adapted for the three-class classification task used in this study. Different hyperparameter settings, including learning rate, batch size, and number of epochs, were evaluated to determine the optimal classification performance for each model. Model performance was assessed using different training and test dataset ratios.
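To illustrate the resizing step, the Python sketch below lists the input sizes commonly documented for the ImageNet-pretrained versions of these networks and computes the scale factors needed to bring a 2048 × 1536 image to each size. The input sizes are not stated in the text and are included here as assumptions; the helper name `resize_factors` is hypothetical, and whether the images were resized directly or cropped first is also not specified.

```python
# Image resolution reported for the dataset (width, height) in pixels.
SOURCE_SIZE = (2048, 1536)

# Assumed input sizes (width, height) of the pre-trained networks;
# these values are not given in the text.
INPUT_SIZES = {
    "AlexNet": (227, 227),
    "SqueezeNet": (227, 227),
    "GoogLeNet": (224, 224),
    "Inception-v3": (299, 299),
    "ResNet-101": (224, 224),
}


def resize_factors(source_size, target_size):
    """Return the (horizontal, vertical) scale factors for a direct resize."""
    source_w, source_h = source_size
    target_w, target_h = target_size
    return target_w / source_w, target_h / source_h


if __name__ == "__main__":
    for model, size in INPUT_SIZES.items():
        fx, fy = resize_factors(SOURCE_SIZE, size)
        print(f"{model}: {size[0]}x{size[1]}, scale x={fx:.4f}, y={fy:.4f}")
```

Because the source images are 4:3 and the assumed network inputs are square, a direct resize scales the two axes differently and slightly distorts cell shapes; cropping to a square region before resizing would be one way to avoid this.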
All experiments were conducted using MATLAB R2021b (MathWorks, Natick, MA, USA) on a system equipped with an Intel Core i7-7700HQ CPU (2.80 GHz), 16 GB RAM, and an NVIDIA GeForce GTX 1050 Ti GPU. A schematic overview of the classification workflow is presented in Figure 1.
3.4. Evaluation Metrics
Model performances were evaluated using accuracy, precision, sensitivity, and F1-score metrics. Furthermore, since the accuracy value alone would not be sufficient to measure the success of the network, a confusion matrix was obtained as shown in Table 1. The confusion matrix corresponding to the best classification accuracy was obtained for each CNN model. From the confusion matrix, it was possible to calculate the true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) of the classification model. TP and TN represent correctly classified samples, whereas FP and FN represent incorrectly classified samples. Accuracy (Equation (1)), precision (Equation (2)), sensitivity (Equation (3)), and F1-score (Equation (4)) were used as evaluation criteria. These equations represent the classical evaluation criteria for classification AI models [17]. Sensitivity refers to the probability of correctly detecting the presence of an object, while specificity refers to the probability of correctly detecting the absence of an object [18].
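Equations (1) to (4) are not reproduced in this text. In their standard form, written with the TP, TN, FP, and FN counts defined above, they read as follows. For the three-class problem it is assumed here that precision, sensitivity, and F1-score are computed per class in a one-vs-rest manner, which the text does not state explicitly.

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision}   &= \frac{TP}{TP + FP} \tag{2}\\
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{3}\\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{4}
\end{align}
```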
4. Results
In the 2100 MHz, 2450 MHz, and 2600 MHz EMF groups, the immunostaining intensity of apoptotic cells was observed at the +2 level compared to the control group. In contrast, in the 2100 MHz + G, 2450 MHz + G, and 2600 MHz + G groups, the apoptotic immunostaining intensity was at the +1 level. In the sham, ginseng, and control groups, apoptotic immunopositive staining was negligible.
Based on these findings, the dataset consisted of a total of 92 microscopy images classified into three categories (0, +1, +2), including 18 images in class 0, 39 images in class +1, and 35 images in class +2 (Figure 2).
The pre-trained CNNs studied included AlexNet with 25 layers, SqueezeNet with 68 layers, GoogLeNet with 144 layers, Inception-v3 with 315 layers, and ResNet-101 with 347 layers. Across different networks, higher or comparable accuracy rates were generally obtained when the validation frequency was set to 3 rather than 6. For example, in the Inception-v3 network, with a 70% training and 30% test dataset and a learning rate of 0.0001, the accuracy was 62.96% when the validation frequency was set to 3, while it was 55.56% when the validation frequency was set to 6. The highest accuracy rates were obtained with a batch size of 10, 6 epochs, and a validation frequency of 3. The learning rate was varied from 0.0001 to 0.1. The accuracy results reported below for each network were therefore obtained with a batch size of 10, 6 epochs, and a validation frequency of 3.
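To make these settings concrete, the Python sketch below counts training iterations and validation checks for a given configuration. It assumes that "validation frequency" means the number of training iterations between validation passes, that the 80% split contains 73 training images (approximately 80% of the 92 images), and that an incomplete final mini-batch is dropped in each epoch. None of these details is stated in the text, and the helper name `training_schedule` is hypothetical.

```python
def training_schedule(n_train, batch_size, epochs, validation_frequency):
    """Return (iterations per epoch, total iterations, validation iterations).

    Assumptions: incomplete mini-batches are dropped, and validation runs
    every `validation_frequency` iterations.
    """
    iters_per_epoch = n_train // batch_size
    total_iters = iters_per_epoch * epochs
    validation_iters = list(
        range(validation_frequency, total_iters + 1, validation_frequency)
    )
    return iters_per_epoch, total_iters, validation_iters


if __name__ == "__main__":
    for freq in (3, 6):
        per_epoch, total, checks = training_schedule(73, 10, 6, freq)
        print(f"frequency {freq}: {per_epoch} iterations/epoch, "
              f"{total} total, {len(checks)} validation checks")
```

Under these assumptions, training runs for only 42 iterations, so a validation frequency of 3 gives 14 checks and a frequency of 6 gives 7.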
The accuracy results for SqueezeNet are shown in Table 2. The highest accuracy of 78.95% in SqueezeNet was obtained with 80% training + 20% test data, batch size 10, epochs 6, validation frequency 3, and learning rate 0.0001. The accuracy and loss graphs for the 78.95% accuracy are given in Figure 3.
The accuracy results for AlexNet are shown in Table 3. The highest accuracy of 73.68% in AlexNet was obtained when 80% training + 20% test data, batch size 10, epochs 6, validation frequency 3, and learning rate 0.0001 were selected. The accuracy and loss graphs for the 73.68% accuracy are given in Figure 4.
The accuracy results for GoogLeNet are shown in Table 4. The highest accuracy of 89.47% in GoogLeNet was achieved with 80% training + 20% test data, a batch size of 10, 6 epochs, a validation frequency of 3, and a learning rate of 0.0001. Figure 5 shows the accuracy and loss graphs for the 89.47% accuracy, Figure 6 shows the confusion matrix, and Table 5 shows the evaluation metric results obtained from the confusion matrix. The evaluation metrics demonstrated high precision, sensitivity, and F1-score values across all classes for the best-performing GoogLeNet model. With the parameters that yielded the highest accuracy, the GoogLeNet model correctly classified 7 out of 10 new images that were drawn from different classes and were not part of the dataset.
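As an illustration of how per-class metrics follow from a three-class confusion matrix, the Python sketch below computes precision, sensitivity, and F1-score in a one-vs-rest manner. The matrix `TOY_CM` is a made-up example for demonstration only and is not the confusion matrix of Figure 6; the function name `per_class_metrics` is likewise hypothetical.

```python
# Hypothetical confusion matrix (NOT the study's data).
# TOY_CM[i][j] = number of images of true class i predicted as class j,
# with classes ordered 0, +1, +2.
TOY_CM = [
    [5, 1, 0],
    [1, 8, 1],
    [0, 2, 7],
]


def per_class_metrics(cm):
    """Return overall accuracy and one-vs-rest metrics for each class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total  # correct / all images
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics.append(
            {"precision": precision, "sensitivity": sensitivity, "f1": f1}
        )
    return accuracy, metrics


if __name__ == "__main__":
    acc, per_class = per_class_metrics(TOY_CM)
    print(f"accuracy: {acc:.4f}")
    for label, m in zip(("0", "+1", "+2"), per_class):
        print(label, {key: round(value, 4) for key, value in m.items()})
```

Note that the overall accuracy computed here is the fraction of correctly classified images, the usual multi-class form, whereas the per-class values treat each class in turn as the positive class.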
The accuracy results for Inception-v3 are shown in Table 6. The highest accuracy of 84.21% in Inception-v3 was obtained when 80% training + 20% test data, batch size 10, epochs 6, validation frequency 3, and learning rate 0.001 were selected. The accuracy and loss graphs for the 84.21% accuracy are given in Figure 7.
The accuracy results for ResNet-101 are shown in Table 7. The highest accuracy of 84.21% in ResNet-101 was obtained when 80% training + 20% test data, batch size 10, epochs 6, validation frequency 3, and learning rate 0.0001 were selected. The accuracy and loss graphs for the 84.21% accuracy are given in Figure 8.
The highest classification accuracy was achieved by the GoogLeNet model, reaching 89.47%. A comparison of the classification accuracy of the evaluated deep learning models is presented in Figure 9. In terms of processing time, SqueezeNet was the fastest model, whereas ResNet-101 showed the longest processing time. The models ranked from fastest to slowest average processing time as follows: SqueezeNet (66 s), AlexNet (74 s), GoogLeNet (95 s), Inception-v3 (230 s), and ResNet-101 (329 s).
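The accuracy and timing figures can be read together to see which models offer a trade-off that no other model beats on both counts. The Python sketch below uses only the best accuracies and average processing times reported in this section. The helper name `pareto_front` is illustrative, and treating these two numbers as the only selection criteria is a simplifying assumption.

```python
# (best accuracy in %, average processing time in s) as reported above.
RESULTS = {
    "SqueezeNet": (78.95, 66),
    "AlexNet": (73.68, 74),
    "GoogLeNet": (89.47, 95),
    "Inception-v3": (84.21, 230),
    "ResNet-101": (84.21, 329),
}


def pareto_front(results):
    """Return the models that no other model dominates.

    A model is dominated if another model is at least as accurate and
    at least as fast, and strictly better on one of the two criteria.
    """
    front = []
    for name, (acc, seconds) in results.items():
        dominated = any(
            other_acc >= acc
            and other_seconds <= seconds
            and (other_acc > acc or other_seconds < seconds)
            for other, (other_acc, other_seconds) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    print(pareto_front(RESULTS))
```

On these figures only SqueezeNet and GoogLeNet remain: AlexNet is both slower and less accurate than SqueezeNet, while Inception-v3 and ResNet-101 are both slower and less accurate than GoogLeNet.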
5. Discussion
The findings of the present study were evaluated in light of previous studies investigating the use of deep learning approaches in medical image classification. Dong et al. [9] developed a cell classification algorithm combining Inception-v3 with artificial features to identify normal and abnormal-looking cervical cells. Based on the improved Inception-v3 network structure, they found the classification accuracy to be 98.23%. In another study by Weis et al. [19], a deep learning algorithm was developed to recognize complex glomerular structural changes from light microscopy images of kidney tissues stained with Periodic Acid-Schiff (PAS). They included images of damaged tissues, such as necrosis and sclerosis, alongside normal tissue images in the dataset. Since they obtained images from a ready-made dataset, the number of training images was quite large (2451 images). They applied pre-trained CNNs to a test dataset consisting of 180 categorized images. They achieved accuracy of 85% in AlexNet, 86% in SqueezeNet, 90% in ResNet-101, and 92% in Inception.
CNNs perform classification by directly extracting image features from raw images by adjusting the parameters of the convolution and pooling layers. The features extracted by the CNN are largely dependent on the size of the training dataset. When the training dataset is limited, CNN models may be more prone to overfitting after a certain number of epochs [10]. Kaur and Gandhi [10] showed that the pre-trained transfer-learning AlexNet performed best in a shorter time compared to other proposed models (94% accuracy). Deepak and Ameer [20] used a pre-trained GoogLeNet to extract features from brain tumor MR images. They used 3064 brain MR images from 233 patients obtained from an open-access dataset and achieved 92.3% classification accuracy. Yang et al. [21] used AlexNet and GoogLeNet to grade glioma from MRI images. They showed that GoogLeNet was superior to AlexNet for this task in terms of the observed performance metrics. Similarly, in the present study, GoogLeNet (89.47% accuracy) yielded a more accurate result than AlexNet (73.68% accuracy).
The relatively limited dataset size is mainly due to the use of experimentally obtained TUNEL-stained microscopy images derived from our own animal study [1], rather than publicly available datasets. This provides a biologically relevant experimental dataset for evaluating the applicability of transfer learning-based CNN models.
6. Conclusions
A review of the literature reveals that results differ between studies, and that the same accuracy cannot always be reproduced even when the same neural network and the same parameters are used. Therefore, the most accurate results are obtained by comparing different neural networks and tuning them to the dataset at hand. TUNEL microscopy images share similar characteristics: the color change from light to dark brown indicates the level of damage, and the more dark brown cells an image contains, the higher the apoptotic cell density. Accurate classification of apoptotic intensity from TUNEL-stained microscopy images may be challenging, particularly when the available dataset size is limited. In this interdisciplinary study, the use of experimentally obtained TUNEL-stained microscopy images provided a biologically relevant framework for evaluating transfer learning-based CNN models. The most accurate result, 89.47%, was obtained with GoogLeNet, which also correctly classified 7 out of 10 additional images from different classes that were not part of the dataset. Among the pre-trained CNNs used, SqueezeNet was the fastest and ResNet-101 the slowest. Ranked from fastest to slowest average processing time, the models were SqueezeNet (66 s), AlexNet (74 s), GoogLeNet (95 s), Inception-v3 (230 s), and ResNet-101 (329 s). When evaluated using additional images outside the training dataset, the models demonstrated promising classification performance even with a relatively limited number of images.
In conclusion, the findings demonstrate that pre-trained CNN models can be effectively used for the automated classification of apoptotic damage in TUNEL-stained images. Among the evaluated models, GoogLeNet showed the best overall performance, while SqueezeNet offered advantages in terms of computational efficiency. These findings suggest that transfer learning-based CNN models may offer an objective and reproducible approach for evaluating apoptotic damage in histopathological image analysis.
Furthermore, a review of the literature reveals that while artificial intelligence algorithms are widely applied in histopathology studies, the lack of literature on their application in TUNEL staining scoring highlights the importance of this topic. Nevertheless, the study has some limitations. The relatively small dataset size may limit the generalizability of the results. Future studies, including larger datasets and improved model optimization techniques, may further enhance classification accuracy and overall model performance.