A Reconfigurable CNN-2D Hardware Architecture for Real-Time Brain Cancer Multi-Classification on FPGA

Mhaouch, Ayoub; Gtifa, Wafa; Nouira, Ibtihel; Abdelali, Abdessalem Ben; Machhout, Mohsen

doi:10.3390/a19020107


by Ayoub Mhaouch ¹,*, Wafa Gtifa ², Ibtihel Nouira ³, Abdessalem Ben Abdelali ¹ and Mohsen Machhout ¹

¹ Laboratory of Electronics and Microelectronics (EµE), Faculty of Sciences of Monastir, University of Monastir, Monastir 5000, Tunisia

² Laboratory of Automatics, Electrical Systems and Environment (LASEE), National Engineering School of Monastir (ENIM), University of Monastir, Monastir 5019, Tunisia

³ Laboratory of Technology and Medical Imaging, Faculty of Medicine of Monastir, University of Monastir, Monastir 5019, Tunisia

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(2), 107; https://doi.org/10.3390/a19020107

Submission received: 26 October 2025 / Revised: 26 November 2025 / Accepted: 1 December 2025 / Published: 1 February 2026


Abstract

Brain cancer classification using deep learning has gained significant attention due to its potential to improve early diagnosis and treatment planning. In this work, we propose a reconfigurable and hardware-optimized CNN-2D architecture implemented on FPGA for multiclass classification of brain tumors from MRI images. The contribution of this study lies in the development of a lightweight CNN model and a modular hardware design, in which three key IP cores (Conv2D, MaxPooling, and ReLU) are designed with parameterizable kernels, efficient dataflow, and optimized memory reuse to support real-time processing on resource-constrained platforms. These IPs are iteratively reconfigured to process each CNN layer, enabling flexibility while maintaining low latency. To evaluate the proposed architecture, we first implement the model in software on a Dual-Core Cortex-A9 processor and then deploy the hardware-accelerated version on an XC7Z020 FPGA. Performance is assessed in terms of execution time, power consumption, and classification accuracy. The FPGA implementation achieves a 93.21% reduction in latency and a 67.5% reduction in power consumption, while maintaining a competitive accuracy of 96.09% compared with 98.43% for the software version. These results demonstrate that the proposed reconfigurable FPGA-based architecture offers a strong balance between accuracy, real-time performance, and energy efficiency, making it highly suitable for embedded brain tumor classification systems.

Keywords:

brain cancer; hardware acceleration; artificial intelligence; hardware implementation; reconfigurable hardware architecture; convolutional neural network; PYNQ Z2

1. Introduction

Brain cancer remains one of the most aggressive and life-threatening diseases, where early and accurate diagnosis is crucial for effective treatment and improved survival rates [1]. Magnetic Resonance Imaging (MRI) is widely recognized as the most reliable non-invasive imaging technique for brain tumor detection and classification. However, manual interpretation of these images is time-consuming and subject to inter-observer variability [2]. Consequently, there has been a growing interest in developing automated systems that leverage deep learning techniques to enhance diagnostic accuracy and efficiency.

In recent years, deep learning, particularly Convolutional Neural Networks (CNNs), has demonstrated remarkable performance in automatic medical image analysis [3,4,5,6,7,8]. Brain tumor classification from MRI is a vital task in neuro-oncology: accurate identification of glioma, meningioma, pituitary tumor, or healthy tissue can influence treatment decisions and prognosis. CNNs have achieved state-of-the-art performance in multi-class brain tumor classification; for instance, end-to-end CNN models have reached high accuracy on publicly available MRI datasets [9]. However, most of these models are designed for high-performance GPU environments, which limits their use in real-time, energy-constrained clinical settings.

To address these challenges, Field-Programmable Gate Arrays (FPGAs) have emerged as a promising hardware platform for deep learning acceleration. FPGAs offer a unique balance between performance, energy efficiency, and flexibility, thanks to their fine-grained parallelism and reconfigurable logic resources. Unlike GPUs, which are optimized for general-purpose computation, FPGAs can be customized at the hardware level to meet the real-time and low-power constraints required in medical imaging systems [10,11,12]. Recent work demonstrates how CNNs can be optimized for FPGAs by quantization, compression, and hardware-aware redesign [13]. Quantized neural network frameworks significantly reduce memory footprint and computational demands by using low-precision weights and activations. However, this efficiency often comes at the cost of reduced model accuracy, especially for complex tasks such as multi-class brain cancer classification, where subtle features are critical for correct prediction. Moreover, quantization can introduce training instability, as gradient precision is limited, potentially slowing convergence or causing suboptimal learning [14,15]. Hardware implementations may also face difficulties due to reduced dynamic range and increased sensitivity to noise, which can affect inference reliability. Balancing memory savings, execution speed, energy efficiency, and accuracy remains a major challenge, often requiring careful design choices and calibration strategies to mitigate performance degradation while retaining the benefits of low-precision computation.

Recent advances in the medical imaging field have introduced large-scale foundation models and multimodal networks such as vision-language models and segmentation architectures that demonstrate impressive generalization across different imaging modalities and downstream tasks. However, these powerful models bring substantial computational and memory burdens, making them difficult to deploy on resource-limited embedded platforms like FPGAs [16]. Moreover, their scalability and high cost remain a critical bottleneck in low-resource clinical environments [17]. Consequently, there is an urgent need for lightweight and efficient CNN architectures that can operate in real time on FPGA hardware for brain tumor multi-class classification, while also being compatible with or complementary to existing foundation-model workflows.

In this work, we propose a reconfigurable 2D-CNN hardware architecture on FPGA, specifically optimized for multi-class brain cancer classification. The proposed IP cores are designed with generic input and output interfaces, enabling easy reuse and integration into diverse applications. This architecture delivers real-time inference while maintaining high classification accuracy. Furthermore, we discuss its potential integration with future workflows that leverage large multimodal or foundation models, highlighting the system’s flexibility and compatibility with advanced medical imaging pipelines.

The main contributions of this work can be summarized as follows:

Design of a CNN-2D model specifically tailored to the accurate multi-class classification of brain tumor types.
Design of reusable IP cores with generic input and output interfaces for convolution, pooling, and activation layers, enabling easy integration and reuse.
Reconfigurable hardware architecture for FPGA implementation of the proposed CNN under resource constraints.
Implementation and experimental validation on real brain MRI datasets demonstrating performance and accuracy.

The remainder of this paper is organized as follows: Section 2 reviews related works and existing FPGA-based CNN accelerators. Section 3 presents the proposed reconfigurable CNN-2D hardware architecture and its design methodology. Section 4 details the experimental setup and performance evaluation. Finally, Section 5 concludes the paper and outlines potential directions for future research.

2. Related Works

Deep learning has become the dominant methodology for brain tumor classification and segmentation from MRI, with current systems increasingly leveraging both GPU- and FPGA-based implementations. GPU-accelerated convolutional neural networks (CNNs) remain the most widely adopted solutions for multi-class brain tumor classification due to their ability to extract rich hierarchical features. For instance, CNN-2D architectures and convolutional autoencoder models have demonstrated high accuracy in distinguishing glioma, meningioma, pituitary tumors, and even healthy brain tissue [18,19]. These results highlight the strong discriminative power of deep models and motivate ongoing efforts toward achieving efficient, clinically deployable hardware implementations.

GPUs work very well for research and cloud or desktop applications, but they are not always suitable for clinical or point-of-care settings. They use a lot of energy, generate heat, have unpredictable delays, and need large memory, which makes real-time or low-power use in medical devices difficult. To address this, researchers are exploring techniques like model compression, mixed-precision quantization, pruning, and hardware-aware neural architecture design to reduce computing requirements while keeping diagnostic accuracy [19,20,21]. However, most models are still designed for GPUs, making them hard to use in real-time or embedded medical devices [20]. Recent studies highlight the need for hardware-aware designs and quantization methods to create practical, deployable systems [21].

FPGA acceleration has gained significant attention in medical imaging due to its parallelism, energy efficiency, and customisable dataflow architecture. Studies on FPGA-accelerated CNNs show promising results for MRI classification, segmentation, and registration. Xiong et al. [22] implemented a 3D U-Net for brain tumor segmentation on a Xilinx Alveo U280 FPGA, achieving a 5.21× speed-up over GPU execution and 11× better energy efficiency, with minimal loss in segmentation quality. Jarrah et al. [23] proposed an FPGA-based hybrid detection system combining K-means clustering and Gray Wolf Optimization for MRI tumor detection, achieving 88× faster execution than a CPU implementation. More recently, Mhaouch et al. [24] proposed an FPGA-accelerated Conv2D architecture for brain tumor multi-classification on a PYNQ-Z2 board. Their approach evaluates INT8, FP16, and FP32 precisions, showing that 8-bit fixed-point reduces latency by 16.8% and power by 22.2%, while maintaining ~94.2% classification accuracy.

GPU-based models provide high accuracy and flexibility, but they are power-hungry and less suitable for embedded or real-time applications [25]. FPGA-based designs, especially when combined with quantization and pipelining, offer a promising alternative for energy-efficient, real-time inference. Nevertheless, challenges remain in designing resource-efficient networks, managing FPGA memory and DSP constraints, and maintaining accuracy for multi-class classification. Our proposed reconfigurable CNN-2D FPGA architecture addresses these gaps by providing layer-level reconfigurability, reusable IP cores, quantization, and real-time inference capabilities, bridging the gap between high-performance GPU-trained CNNs and practical embedded deployment for multi-class brain tumor classification.

3. Methodology and Proposed Method

The proposed methodology aims to design and implement a reconfigurable CNN-2D hardware architecture on FPGA for real-time multi-class brain tumor classification. The design addresses both algorithmic accuracy and hardware efficiency, ensuring compatibility with resource-constrained platforms. The workflow consists of three main stages: data preprocessing, CNN model design, and FPGA implementation with reconfigurable IP cores. The following subsection provides an overview of the proposed AI-based approach for multi-class brain cancer classification, including details on the dataset, preprocessing techniques, and model training parameters.

3.1. Proposed Brain Cancer Multi-Classification

3.1.1. Methodology

The proposed approach for multi-class brain cancer classification leverages a Convolutional Neural Network (CNN) designed to automatically detect and categorize brain MRI images into four classes: glioma, meningioma, pituitary tumor, and healthy (no tumor). This model aims to extract relevant features from MRI scans while maintaining high classification accuracy and efficiency, making it suitable for integration into real-time diagnostic systems.

The methodology consists of three main stages. In the data preprocessing stage, MRI images are enhanced, normalized, and resized to ensure uniform input quality and reduce noise. The CNN model design stage focuses on defining the network architecture, selecting appropriate activation functions, and applying optimization techniques to achieve a balance between accuracy and computational efficiency. Finally, in the training and evaluation stage, the model is trained on labeled datasets and evaluated using metrics such as accuracy, precision, recall, and F1-score. Figure 1 illustrates the workflow of the proposed AI-based classification system, from image preprocessing to the final tumor classification.

The learning process employed in this study follows a supervised learning paradigm, which is well-suited for multi-class classification tasks. The proposed CNN architecture is designed to automatically extract discriminative features from brain MRI images, eliminating the need for manual feature engineering. During training, the network processes labeled MRI scans and iteratively learns to differentiate between various tumor types through backpropagation and weight optimization.

To improve training stability and accelerate convergence, the Adam optimizer is employed due to its adaptive learning rate and effectiveness in handling sparse gradients. The model is trained using categorical cross-entropy as the loss function, which is tailored for multi-class classification. Training is performed over multiple epochs, with performance continuously monitored on a separate validation set to track learning progress and prevent overfitting.
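For readers less familiar with the optimizer, the following Python sketch shows a single Adam update on one scalar weight. It assumes the commonly used default moment coefficients (β1 = 0.9, β2 = 0.999, ε = 1e-8), which the text does not state, together with the learning rate of 0.001 reported for training; the function name `adam_step` is hypothetical, and deep learning frameworks implement the same rule in vectorized form.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter (hypothetical helper).

    theta: current parameter value
    grad:  gradient of the loss with respect to theta
    m, v:  running first and second moment estimates
    t:     1-based step counter, used for bias correction
    Returns the updated (theta, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment: running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Usage: one step from theta = 1.0 with gradient 0.5 (assumed default coefficients).
theta, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
print(round(theta, 6))  # prints 0.999
```

Because both moments are bias-corrected, the very first step moves the weight by roughly the learning rate whatever the gradient's scale, which is one reason Adam is less sensitive to gradient magnitude than plain stochastic gradient descent.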

The proposed CNN model is evaluated using key performance metrics, including accuracy, precision, recall, and F1-score, providing a comprehensive assessment of its classification capability. Additionally, confusion matrices are generated to analyze misclassifications and gain deeper insights into the model’s decision-making process across different tumor categories. These evaluation strategies collectively ensure the reliability, robustness, and effectiveness of the proposed approach for multi-class brain tumor classification.
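The metrics listed above can all be derived from the confusion matrix. The sketch below, with hypothetical helper names and toy labels that are not results from this study, builds a 4 × 4 confusion matrix (rows are true classes, columns are predictions) and computes per-class precision, recall, and F1-score, plus overall accuracy, in plain Python.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows = true class, columns = predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, but another true class
        fn = sum(cm[k]) - tp                        # true class k, predicted as another
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0


CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]
# Toy labels for illustration only (not results from this study).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
for name, (p, r, f) in zip(CLASSES, per_class_metrics(cm)):
    print(f"{name:>10}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("accuracy:", accuracy(cm))
```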

3.1.2. Dataset and Preprocessing

The dataset used in this study consists of 7023 MRI images, categorized into four tumor types: glioma (1621 images), meningioma (1645 images), pituitary tumor (1757 images), and no tumor (2000 images). These images were obtained from a publicly available MRI dataset, Msoud, sourced from the Kaggle platform. The Msoud [26] dataset is an integration of three widely used MRI datasets: Figshare [27], SARTAJ [28], and BR35H [29], ensuring a diverse and well-balanced representation of tumor types (Figure 2).

To ensure effective model training and evaluation, the dataset was split into training (80%), validation (10%), and test (10%) sets, maintaining a balanced distribution of each class across all subsets. The inclusion of a validation set allows continuous monitoring of the model’s performance during training, helping to prevent overfitting and tune hyperparameters effectively. Data augmentation techniques such as rotation, flipping, zooming, and brightness adjustment were applied to further enhance the diversity of the training set and improve generalization.
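A class-stratified 80/10/10 split can be sketched as follows. The paper does not state how the split was drawn, so the shuffling scheme, the seed, and the rounding of per-class counts are assumptions; `stratified_split` is a hypothetical helper. The class sizes are the Msoud counts given above.

```python
import random


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split sample indices into train/val/test while preserving class proportions.

    labels:    one class label per image
    fractions: (train, val, test) fractions summing to 1 (assumed 80/10/10)
    Returns three lists of indices.
    """
    assert abs(sum(fractions) - 1.0) < 1e-9
    rng = random.Random(seed)                      # fixed seed: an assumption for reproducibility
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)

    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = int(round(fractions[0] * n))
        n_val = int(round(fractions[1] * n))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]             # remainder of each class goes to test
    return train, val, test


counts = {"glioma": 1621, "meningioma": 1645, "pituitary": 1757, "no_tumor": 2000}
labels = [lab for lab, c in counts.items() for _ in range(c)]
train, val, test = stratified_split(labels)
print(len(labels), len(train), len(val), len(test))
```

Splitting each class separately keeps the glioma, meningioma, pituitary, and no-tumor proportions nearly identical in all three subsets, which keeps per-class validation and test metrics comparable.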

Before training the model, several preprocessing steps are applied to improve image quality and optimize the learning process. The preprocessing pipeline consists of image resizing and data augmentation, both essential for enhancing model performance and preventing overfitting. All MRI images are resized to 256 × 256 pixels, ensuring uniform input dimensions across the dataset. This uniformity facilitates stable feature extraction and improves computational efficiency during training.

To improve the performance and robustness of the model, a set of data augmentation techniques is applied. These include rotation, flipping, zooming, and brightness adjustment, each introducing controlled variations that help the model generalize to real-world imaging conditions. Rotation at 90°, 180°, and 270° simulates differences in patient positioning. Flipping, both horizontal and vertical, provides additional diversity in orientations, allowing the model to better recognize tumors from different perspectives. Zooming, within a range of 0.7× to 1.3×, allows the model to learn tumor patterns at multiple scales. Brightness adjustment of ±30% compensates for variations in scanner settings and MRI contrast levels, improving model robustness to contrast differences across MRI scans.
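The rotation, flipping, and brightness operations described above can be illustrated on a small grayscale image stored as a list of rows. This is a minimal pure-Python sketch, not the augmentation pipeline used in the study: the helper names are hypothetical, the ±30% brightness change is interpreted as a multiplicative factor in [0.7, 1.3] on 8-bit intensities, and zooming is omitted because it requires resampling.

```python
import random


def rot90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def hflip(img):
    """Mirror the image left-right."""
    return [list(row[::-1]) for row in img]


def vflip(img):
    """Mirror the image top-bottom."""
    return [list(row) for row in img[::-1]]


def adjust_brightness(img, factor, max_val=255):
    """Scale intensities by `factor` and clip to [0, max_val]."""
    return [[min(max_val, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng):
    """Random multiple-of-90-degree rotation, random flips, and brightness jitter."""
    for _ in range(rng.choice([0, 1, 2, 3])):   # 0, 90, 180 or 270 degrees
        img = rot90(img)
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    # +/-30 % brightness, assumed here to be a multiplicative factor in [0.7, 1.3]
    return adjust_brightness(img, rng.uniform(0.7, 1.3))


rng = random.Random(0)
patch = [[10, 20, 30], [40, 50, 60]]   # toy 2 x 3 grayscale patch
print(augment(patch, rng))
```

Because the transforms are drawn at random each time `augment` is called, applying it on the fly to every training batch yields a different variant of each image per epoch without storing extra samples.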

These augmentation techniques are applied dynamically during training, effectively expanding the variability of the dataset without requiring additional MRI samples. This approach significantly improves the model’s generalization to unseen data and contributes to higher classification accuracy. Table 1 summarizes the augmentation strategies and their corresponding parameters used in the proposed brain tumor classification system.

3.1.3. Proposed CNN Architecture for Brain Cancer Multi-Classification

The proposed CNN architecture (Figure 3) is designed to efficiently extract spatial and structural features from brain MRI images, enabling accurate multi-class tumor classification. The proposed neural network integrates five convolutional layers, each followed by a MaxPooling2D layer, allowing a progressive reduction in spatial resolution while preserving essential discriminative features. These convolutional layers capture increasingly complex patterns related to tumor shape, texture, and boundaries, improving the model's ability to differentiate between tumor types. The final feature maps are passed through fully connected layers to generate the four-class output. Table 2 provides a detailed summary of the proposed CNN architecture.

The proposed CNN architecture for brain cancer classification processes grayscale MRI images of size 256 × 256 × 1, so the input layer receives a single-channel image. All convolutions use 3 × 3 kernels without padding, so each convolution reduces the height and width by 2, and each MaxPooling2D layer halves them (rounding down). The first convolutional layer applies 32 filters, producing an output of 254 × 254 × 32 that captures basic features such as edges and low-level textures. It is followed by a MaxPooling2D layer that reduces the spatial dimensions to 127 × 127 × 32, decreasing the computational load while retaining important features. The second convolutional layer uses 64 filters, producing a 125 × 125 × 64 feature map, followed by a MaxPooling2D layer that reduces the spatial size to 62 × 62 × 64. The third convolutional layer applies 128 filters, generating 60 × 60 × 128, and MaxPooling2D reduces the spatial dimensions to 30 × 30 × 128. The fourth convolutional layer uses 256 filters, producing an output of 28 × 28 × 256, followed by MaxPooling2D to 14 × 14 × 256. The fifth convolutional layer applies 512 filters, generating an output of 12 × 12 × 512, followed by MaxPooling2D, which reduces the shape to 6 × 6 × 512.
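The layer-by-layer dimensions above follow from two rules: an unpadded, stride-1 3 × 3 convolution shrinks each spatial side by 2, and a 2 × 2, stride-2 max pooling halves it with rounding down. The short Python sketch below (hypothetical function names) reproduces the chain from 256 × 256 × 1 down to the 6 × 6 × 512 map and the 18,432-element flattened vector.

```python
def conv_out(size, k=3):
    """Spatial size after an unpadded ('valid'), stride-1 k x k convolution."""
    return size - k + 1


def pool_out(size, k=2, s=2):
    """Spatial size after k x k max pooling with stride s (no padding, floor)."""
    return (size - k) // s + 1


def trace_shapes(input_size=256, filters=(32, 64, 128, 256, 512)):
    """Return (layer, height, width, channels) after every conv and pool layer."""
    h = w = input_size
    shapes = []
    for f in filters:
        h, w = conv_out(h), conv_out(w)
        shapes.append(("conv", h, w, f))
        h, w = pool_out(h), pool_out(w)
        shapes.append(("pool", h, w, f))
    return shapes


shapes = trace_shapes()
for kind, h, w, c in shapes:
    print(f"{kind:>4}: {h} x {w} x {c}")
_, h, w, c = shapes[-1]
print("flatten:", h * w * c)   # 6 * 6 * 512 = 18432
```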

The final pooling output is flattened into an 18,432-element (6 × 6 × 512) vector, which is fed into a Dense layer with 512 units to learn high-level abstract representations. The final classification layer contains 4 neurons with a softmax activation function corresponding to the four categories: glioma, meningioma, pituitary tumor, and no tumor. This architecture achieves a balance between computational efficiency and classification performance, making it suitable for deployment on hardware-constrained systems such as FPGAs.
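To make the output stage concrete, the following sketch applies a numerically stable softmax to four hypothetical logits from the final Dense layer and evaluates the categorical cross-entropy against a one-hot label. Function names and logit values are illustrative assumptions; frameworks typically combine these two steps internally for numerical stability.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Loss for one sample: -sum_k y_k * log(p_k); eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for p, y in zip(probs, one_hot))


CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]
logits = [2.0, 0.5, 0.1, -1.0]   # hypothetical output of the last Dense layer
probs = softmax(logits)
label = [1, 0, 0, 0]             # true class: glioma
print(CLASSES[probs.index(max(probs))], categorical_cross_entropy(probs, label))
```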

The model is trained following a structured supervised learning procedure. It is compiled using the Adam optimizer, selected for its adaptive learning rate and stable convergence, and the categorical cross-entropy loss function, appropriate for multi-class classification. The learning rate is set to 0.001, with a batch size of 32 and up to 50 training epochs. To prevent overfitting, early stopping halts training when the validation performance stops improving. The proposed neural network is trained on an augmented dataset, enhancing robustness and generalization. A separate validation set is employed to monitor training progress, evaluate convergence, and tune hyperparameters.
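The early-stopping rule can be sketched as a small patience counter. The paper does not report the monitored quantity or the patience, so monitoring validation loss with a patience of 3 epochs is an assumption made here for illustration, and the loss values are invented; Keras offers an equivalent `keras.callbacks.EarlyStopping` callback.

```python
class EarlyStopping:
    """Stop training when the monitored validation loss stops improving.

    patience:  epochs to wait after the last improvement (assumed value)
    min_delta: minimum decrease in loss that counts as an improvement
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Invented validation-loss curve, for illustration only.
val_losses = [0.9, 0.6, 0.45, 0.40, 0.41, 0.42, 0.40, 0.43, 0.44, 0.45]
stopper = EarlyStopping(patience=3)
stop_epoch = None
for epoch, loss in enumerate(val_losses):   # the reported setup allows at most 50 epochs
    if stopper.step(epoch, loss):
        stop_epoch = epoch
        break
print(f"stopped at epoch {stop_epoch}, best epoch {stopper.best_epoch}")
```

In practice the weights from the best epoch would be restored after stopping, so that later, slightly overfitted epochs do not determine the final model.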

After training, the model is assessed on an independent test set, unseen during training and validation. This final evaluation provides an unbiased measure of generalization ability, with performance metrics such as accuracy, precision, recall, F1-score, and confusion matrices used to assess classification quality across all tumor classes.

3.2. Proposed Hardware-Acceleration

To improve the computational efficiency of the proposed brain cancer classification model, a dedicated hardware-accelerated implementation is introduced using FPGA technology. The primary objective of this hardware acceleration is to optimize the execution of the Convolutional Neural Network (CNN), with particular emphasis on the most computationally demanding layers such as convolution, activation, and pooling operations. By mapping these operations onto specialized FPGA hardware kernels, the design achieves substantial gains in execution time, throughput, and energy efficiency compared to conventional CPU- or GPU-based software implementations. This approach leverages the inherent parallelism, reconfigurability, and low-power characteristics of FPGAs, enabling real-time inference while operating under strict resource constraints typically associated with embedded medical imaging systems.
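A rough operation count supports the emphasis on convolution. The Python sketch below counts multiply-accumulate (MAC) operations for the five convolutional layers, using the layer dimensions given in Section 3.1.3, and compares them with the two dense layers. It ignores bias additions, ReLU, and pooling, which cost only about one operation per element; the helper name `conv_macs` is hypothetical.

```python
def conv_macs(h_in, w_in, c_in, c_out, k=3):
    """MAC count of a stride-1, unpadded k x k convolution."""
    h_out, w_out = h_in - k + 1, w_in - k + 1
    return h_out * w_out * c_out * k * k * c_in


# (input height/width, input channels, filters) of the five convolutional layers.
layers = [(256, 1, 32), (127, 32, 64), (62, 64, 128), (30, 128, 256), (14, 256, 512)]
conv_total = 0
for size, c_in, c_out in layers:
    macs = conv_macs(size, size, c_in, c_out)
    conv_total += macs
    print(f"conv {size}x{size}x{c_in} -> {c_out} filters: {macs / 1e6:.1f} M MACs")

# Flatten (18,432) -> Dense(512), then Dense(512) -> Dense(4).
dense_macs = 6 * 6 * 512 * 512 + 512 * 4
print(f"conv total: {conv_total / 1e6:.1f} M MACs, dense total: {dense_macs / 1e6:.1f} M MACs")
```

Under these assumptions the convolutions require roughly 973 million MACs per image against about 9.4 million for the dense layers, i.e., around two orders of magnitude more, with the second convolutional layer alone accounting for the largest share.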

The proposed hardware-acceleration strategy focuses on deploying and optimizing the complete CNN-based brain cancer classification model on an FPGA platform while respecting strict constraints on the available logic resources, BRAM, and DSP slices. To achieve this, three core operations, namely convolution (Conv2D), max pooling (MaxPooling2D), and ReLU activation, were implemented as modular IPs with parameterizable configurations. Each IP is optimized for parallel execution and pipelining using High-Level Synthesis (HLS) pragmas to maximize throughput while minimizing latency and resource usage (Figure 4). The following paragraphs detail these layers (Conv2D, MaxPooling, and ReLU) and the equations underlying their hardware implementation.

The Conv2D layer is a core component in the CNN, where it applies a series of filters (kernels) to extract feature maps from the input image. For each filter, the convolution operation slides over the image and performs a dot product of the filter and the receptive field of the input image. The output of the convolutional operation is the feature map, which is then passed to the next layer. The convolution operation for a single output pixel is described by (1):

Y(i, j, k) = \sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{c=1}^{C} X(i + m, j + n, c) \cdot W(m, n, c, k) + B(k) \qquad (1)

where

Y(i, j, k) is the output feature map at position (i, j) for filter k.
X(i + m, j + n, c) is the input image at position (i + m, j + n) for channel c.
W(m, n, c, k) denotes the weights of filter k.
B(k) is the bias term for filter k.
M and N are the dimensions of the filter.
C is the number of input channels (e.g., 1 for grayscale).

On the FPGA, the proposed convolution IP performs multiply-accumulate operations over the input feature maps and filter weights to extract spatial and structural features. The algorithm accepts a generic input feature map of size M × N × H and a weight tensor of F filters, each of size K × K × H, and produces an output of size (M − K + 1) × (N − K + 1) × F. Nested loops iterate over the output channels, rows, columns, and kernel dimensions, and HLS #pragma UNROLL directives (factor 8) are applied to the filter, output-row, and kernel loops to exploit parallelism. Each accumulator is initialized with the bias of its filter at the start of the computation, which folds the bias addition into the multiply-accumulate chain. Algorithm 1 presents the proposed convolution IP for FPGA acceleration with generic parameters.

Algorithm 1: Proposed Conv2D for FPGA Acceleration with Generic Parameters and Maximal Supported Configuration (M = 256, N = 256, H = 3, F = 512, K = 3)
Inputs:
    I ← Input feature map (height M, width N, depth H) [M][N][H]
    W ← Weight tensor (F filters, each K × K × H) [F][K][K][H]
    B ← Bias vector for each filter [F]
Outputs:
    O ← Output feature map [C][L][F]
    Output height: C ← M − K + 1
    Output width: L ← N − K + 1
    Initialize output feature map: O ← zeros (C × L × F)
    For fi = 0 to F − 1 do
    #pragma HLS UNROLL factor = 8
        For ci = 0 to C − 1 do
        #pragma HLS UNROLL factor = 8
            For li = 0 to L − 1 do
                accumulator ← B[fi]
                For i = 0 to K − 1 do
                #pragma HLS UNROLL factor = 8
                    For j = 0 to K − 1 do
                    #pragma HLS UNROLL factor = 8
                        For hi = 0 to H − 1 do
                            accumulator ← I[ci + i][li + j][hi] * W[fi][i][j][hi] + accumulator
                        end for
                    end for
                end for
                O[ci][li][fi] ← accumulator
            end for
        end for
    end for
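As a software reference for Algorithm 1, the following Python function reproduces the same loop nest and data layout (input [M][N][H], weights [F][K][K][H], output [C][L][F]), with each accumulator initialized to the filter's bias. It is a sketch intended as a golden model for checking the IP's outputs, not synthesizable HLS code; the comments mark where Algorithm 1 places its UNROLL pragmas, and the name `conv2d` is hypothetical.

```python
def conv2d(I, W, B):
    """Unpadded, stride-1 convolution following the loop order of Algorithm 1.

    I: input feature map as nested lists, shape [M][N][H]
    W: filter weights, shape [F][K][K][H]
    B: biases, shape [F]
    Returns O with shape [M-K+1][N-K+1][F].
    """
    M, N, H = len(I), len(I[0]), len(I[0][0])
    F, K = len(W), len(W[0])
    C, L = M - K + 1, N - K + 1
    O = [[[0.0] * F for _ in range(L)] for _ in range(C)]
    for fi in range(F):                  # UNROLL factor 8 in Algorithm 1
        for ci in range(C):              # UNROLL factor 8 in Algorithm 1
            for li in range(L):
                acc = B[fi]              # bias loaded into the accumulator first
                for i in range(K):       # UNROLL factor 8 in Algorithm 1
                    for j in range(K):   # UNROLL factor 8 in Algorithm 1
                        for hi in range(H):
                            acc += I[ci + i][li + j][hi] * W[fi][i][j][hi]
                O[ci][li][fi] = acc
    return O


# Toy check: 3 x 3 x 1 input of ones, one 2 x 2 x 1 filter of ones, bias 0.5.
I = [[[1.0] for _ in range(3)] for _ in range(3)]
W = [[[[1.0] for _ in range(2)] for _ in range(2)]]
print(conv2d(I, W, [0.5]))   # 2 x 2 x 1 output, every value 4.5
```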

The MaxPooling layer reduces the spatial dimensions of the feature map by selecting the maximum value from a specific region of the feature map. This operation not only reduces the size but also helps in making the representation invariant to small translations and distortions in the image. The MaxPooling operation is defined as:

Y(i, j, k) = \max_{m, n} X(i + m, j + n, k) \qquad (2)

where

X(i + m, j + n, k) is the input to the pooling layer at position (i + m, j + n) for feature map k.
Y(i, j, k) is the output after max pooling at position (i, j) for feature map k.

The proposed pooling IP reduces the spatial dimensions of feature maps while retaining key features, and it treats the stride and kernel size as configurable parameters. For an input feature map of size M × N × H, the algorithm computes the maximum value within a sliding K × K window in each depth channel and produces a downsampled output of size ⌊M/S⌋ × ⌊N/S⌋ × H, where S is the stride (with K = S = 2, an odd trailing row or column is dropped, e.g., 125 → 62). Loop unrolling is applied to the channel dimension to accelerate computation, and the running maximum is reset at the start of each window and updated as the window is scanned. Algorithm 2 presents the proposed pooling IP for FPGA acceleration with generic parameters.

Algorithm 2: Proposed Pooling IP (MaxPooling2D) for FPGA Acceleration with Generic Parameters and Maximal Supported Configuration (M = 256, N = 256, H = 512, K = 2, S = 2)
Inputs:
    I ← Input feature map (height M, width N, depth H) [M][N][H]
Outputs:
    O ← Output feature map [C][L][H]

    Pool kernel size: K ← 2
    Pool stride: S ← 2
    Output height: C ← ⌊M/S⌋
    Output width: L ← ⌊N/S⌋
    Initialize output feature map: O ← zeros(C × L × H)
    For hi = 0 to H − 1 do
    #pragma HLS UNROLL factor = 8
        For ci = 0 to C − 1 do
            For li = 0 to L − 1 do
                max_val ← −∞
                For i = 0 to K − 1 do
                #pragma HLS UNROLL factor = 8
                    For j = 0 to K − 1 do
                        val ← I[ci·S + i][li·S + j][hi]
                        if (val > max_val) then
                            max_val ← val
                        end if
                    end for
                end for
                O[ci][li][hi] ← max_val
            end for
        end for
    end for
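A Python reference for Algorithm 2 is sketched below (hypothetical name `maxpool2d`). The running maximum must be reset to −∞ at the start of every pooling window; otherwise values would leak between windows, and an initial value of 0 would give wrong results for all-negative windows. The output size ⌊(M − K)/S⌋ + 1 used here equals ⌊M/S⌋ when K = S.

```python
def maxpool2d(I, K=2, S=2):
    """Max pooling following Algorithm 2, resetting the running maximum per window.

    I: input feature map as nested lists, shape [M][N][H]
    Returns O with shape [(M-K)//S + 1][(N-K)//S + 1][H].
    """
    M, N, H = len(I), len(I[0]), len(I[0][0])
    C, L = (M - K) // S + 1, (N - K) // S + 1
    O = [[[0.0] * H for _ in range(L)] for _ in range(C)]
    for hi in range(H):                         # UNROLL factor 8 in Algorithm 2
        for ci in range(C):
            for li in range(L):
                max_val = float("-inf")         # reset for every pooling window
                for i in range(K):
                    for j in range(K):
                        val = I[ci * S + i][li * S + j][hi]
                        if val > max_val:
                            max_val = val
                O[ci][li][hi] = max_val
    return O


# 5 x 5 x 1 input with values r*5 + c: the odd last row/column is dropped.
I = [[[float(r * 5 + c)] for c in range(5)] for r in range(5)]
print(maxpool2d(I))   # 2 x 2 x 1 output
```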

The ReLU (Rectified Linear Unit) activation function is applied element-wise to the feature map to introduce non-linearity into the model. This is crucial because it allows the network to learn complex patterns and representations that a purely linear model could not. ReLU sets all negative values in the feature map to zero and leaves positive values unchanged. It is defined as:

\mathrm{ReLU}(x(i, j, k)) = \max(0, x(i, j, k)) \qquad (3)

ReLU is simple and computationally efficient, making it ideal for FPGA implementation. Each element of the feature map is processed in parallel, and negative values are effectively discarded, enabling the model to focus on positive activations. Algorithm 3 presents the proposed ReLU Activation for FPGA acceleration with generic parameters.

Algorithm 3: Proposed ReLU Activation for FPGA Acceleration with Generic Parameters and Maximal Supported Configuration (M = 256, N = 256, H = 512)
Inputs:
    I ← Input feature map (height M, width N, depth H) [M][N][H]
Outputs:
    O ← Output feature map [M][N][H]

    Initialize output feature map: O ← zeros (M × N × H)
    For hi = 0 to H − 1 do
    #pragma HLS UNROLL factor = 8
        For i = 0 to M − 1 do
            For j = 0 to N − 1 do
                if I[i][j][hi] > 0 then
                    O[i][j][hi] ← I[i][j][hi]
                else
                    O[i][j][hi] ← 0
                end if
            end for
        end for
    end for
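For completeness, a Python equivalent of Algorithm 3 over the same [M][N][H] layout is shown below; the function name `relu` is hypothetical and the feature-map values are toy data.

```python
def relu(I):
    """Element-wise ReLU over a feature map stored as [M][N][H] nested lists."""
    return [[[x if x > 0 else 0.0 for x in pixel] for pixel in row] for row in I]


fmap = [[[-1.5, 2.0], [0.0, 3.0]],
        [[4.0, -0.5], [-2.0, -3.0]]]   # toy 2 x 2 x 2 feature map
print(relu(fmap))
```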

Collectively, these IP cores form the computational backbone of the FPGA-accelerated CNN, supporting reconfigurability for different kernel sizes, feature map dimensions, and depth channels. The design leverages local buffering in on-chip BRAMs to minimize external memory access, exploits FPGA DSP slices for multiply-accumulate operations, and pipelines the dataflow across layers to maximize throughput. This modular, parameterized approach enables the deployment of CNNs for real-time multi-class brain tumor classification on resource-constrained FPGA platforms.

The proposed hardware acceleration for the brain cancer classification system is designed to efficiently execute the three core CNN operations, Conv2D, MaxPooling2D, and ReLU, using custom FPGA IP cores configured for high performance. These operations are executed iteratively across the five convolutional stages of the network (i.e., n = 5), enabling a fully pipelined and optimized hardware workflow. Each IP is implemented with its maximum supported configuration, ensuring that the FPGA can process the largest feature maps and filter dimensions required by the model. For instance, the Conv2D IP is capable of handling feature maps up to 254 × 254 pixels with 512 filters, the MaxPooling2D IP operates on 127 × 127 × 512 feature maps, and the ReLU IP is applied after each convolution stage to introduce non-linearity at minimal computational overhead. Figure 5 presents a detailed overview of the proposed hardware implementation approach for brain cancer multi-classification.
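The iterative reuse of the three IPs can be summarized as an invocation schedule. The Python sketch below (hypothetical helper names) lists the Conv2D → ReLU → MaxPooling2D calls for the five stages with their input and output shapes, and reports the largest tensor each IP has to handle across all stages, which is the quantity that determines how large an IP's supported configuration must be. Element counts are independent of the data type, so the memory footprint in bytes depends on the numeric precision actually used.

```python
def build_schedule(input_hw=256, in_ch=1, filters=(32, 64, 128, 256, 512), k=3):
    """List IP invocations as (ip_name, input_shape, output_shape), shapes as (H, W, C)."""
    schedule = []
    h = w = input_hw
    c = in_ch
    for f in filters:
        conv_shape = (h - k + 1, w - k + 1, f)          # unpadded, stride-1 convolution
        schedule.append(("conv2d", (h, w, c), conv_shape))
        schedule.append(("relu", conv_shape, conv_shape))
        pooled = (conv_shape[0] // 2, conv_shape[1] // 2, f)   # 2 x 2 pooling, stride 2
        schedule.append(("maxpool2d", conv_shape, pooled))
        h, w, c = pooled
    return schedule


def largest_tensor_per_ip(schedule):
    """Maximum input and output element counts seen by each IP over all stages."""
    sizes = {}
    for ip, shape_in, shape_out in schedule:
        n_in = shape_in[0] * shape_in[1] * shape_in[2]
        n_out = shape_out[0] * shape_out[1] * shape_out[2]
        prev_in, prev_out = sizes.get(ip, (0, 0))
        sizes[ip] = (max(prev_in, n_in), max(prev_out, n_out))
    return sizes


schedule = build_schedule()
for step in schedule:
    print(step)
sizes = largest_tensor_per_ip(schedule)
print(sizes)
```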

Key optimization strategies are applied to enhance the performance of the proposed hardware accelerator. Loop unrolling with a factor of 8 is employed to increase parallelism, allowing multiple computations to be executed simultaneously and significantly reducing the latency of each operation. Furthermore, pipelining is introduced to overlap data processing and memory access, thereby improving throughput and ensuring continuous data flow across the Conv2D, MaxPooling, and ReLU IPs. The proposed CNN layers are synthesized and validated on a Xilinx FPGA platform, where critical performance indicators including execution time, throughput, resource utilization, and power consumption are systematically monitored. The resulting hardware acceleration achieves substantial reductions in processing time while maintaining efficient resource usage, making it well-suited for real-time brain tumor classification from MRI scans. Figure 6 presents the proposed hardware design for brain cancer multi-classification.
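The effect of an UNROLL factor of 8 on an accumulation loop can be illustrated in software. The sketch below (hypothetical names) splits a dot product into eight independent multiply-accumulate lanes with separate partial sums that are combined at the end, plus a tail loop for leftover elements. This shows one way such a loop can be restructured so that eight products can be computed in parallel; for integer or fixed-point data the result is identical, whereas for floating-point data the change in summation order can alter the last bits.

```python
def dot_sequential(a, b):
    """Reference dot product: one multiply-accumulate per iteration."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_unrolled(a, b, factor=8):
    """Dot product restructured into `factor` independent accumulation lanes.

    Within one group the lane updates do not depend on each other, so in
    hardware they could be issued together on separate multipliers. The
    partial sums are reduced at the end; leftover elements (when len(a) is
    not a multiple of `factor`) are handled by a tail loop.
    """
    n = len(a)
    partial = [0] * factor
    main = n - n % factor
    for base in range(0, main, factor):    # one group of `factor` elements per step
        for lane in range(factor):          # independent iterations
            partial[lane] += a[base + lane] * b[base + lane]
    total = sum(partial)                    # final reduction (an adder tree in hardware)
    for idx in range(main, n):              # tail iterations
        total += a[idx] * b[idx]
    return total


a = list(range(27))   # e.g., one 3 x 3 x 3 receptive field (integers for exactness)
b = [1] * 27
print(dot_sequential(a, b), dot_unrolled(a, b))   # both 351
```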

The proposed hardware-accelerated approach substantially reduces computational latency compared to traditional software-based CNN implementations, delivering a highly efficient solution for large-scale medical image classification. By leveraging FPGA-optimized IP cores, the system achieves real-time processing capability, making it well-suited for deployment in resource-constrained and time-critical clinical environments.

The next section presents a detailed discussion of the experimental results, comparing the software and hardware implementations. The benefits of FPGA acceleration, trade-offs in terms of power and execution time, and comparisons with existing state-of-the-art approaches are analyzed to highlight the advantages of the proposed hardware implementation.

4. Results and Performance Analysis

To evaluate the efficiency of the proposed hardware-accelerated CNN for brain cancer multi-classification, the Conv2D, MaxPooling, and ReLU IP cores were synthesized with Vivado HLS 2018.2 and exported as RTL IPs. These IPs were then integrated into the PYNQ-Z2 FPGA design, and the generated bitstream was deployed on the board to enable hardware acceleration. The design was optimized for computational efficiency and resource utilization.

For software execution and system control, Jupyter Notebook (Python 3.10) was used as an interactive environment to communicate with the FPGA through the PYNQ framework on the PYNQ-Z2 board. The trained CNN model was executed in different configurations, comparing its performance in three evaluation setups: (1) software-only execution on a Cortex-A9 processor, (2) hardware-accelerated execution on FPGA, and (3) hybrid CPU-FPGA execution.
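On the host side, each layer invocation comes down to three steps: write the layer parameters into the IP's AXI4-Lite registers, set `ap_start`, and poll `ap_done`. The sketch assumes a Vivado HLS block-level `ap_ctrl_hs` interface, whose control register at offset 0x00 holds `ap_start` (bit 0) and `ap_done` (bit 1). The argument offsets in `CONV_REGS` are hypothetical placeholders. Real values would come from the HLS-generated driver header. In a PYNQ notebook, `ip` would be an IP object of the loaded `pynq.Overlay`, which provides `read(offset)` and `write(offset, value)`. Here, any object with those two methods works, so the logic can be exercised without a board.

```python
import time

CTRL = 0x00                  # ap_ctrl_hs block-level control register
AP_START, AP_DONE = 0x1, 0x2

# Hypothetical argument offsets; real ones come from the HLS-generated driver header.
CONV_REGS = {"height": 0x10, "width": 0x18, "in_ch": 0x20, "out_ch": 0x28}

def run_layer(ip, params, regs=CONV_REGS, timeout_s=5.0):
    """Program one layer configuration into an HLS IP, start it, and wait for completion."""
    for name, value in params.items():
        ip.write(regs[name], value)
    ip.write(CTRL, AP_START)
    deadline = time.monotonic() + timeout_s
    while not (ip.read(CTRL) & AP_DONE):    # busy-wait on ap_done
        if time.monotonic() > deadline:
            raise TimeoutError("IP did not assert ap_done")

class FakeIP:
    """Stand-in for a PYNQ IP object: records writes and reports done as soon as it is started."""
    def __init__(self):
        self.regs, self.log = {}, []
    def write(self, offset, value):
        self.regs[offset] = value
        self.log.append((offset, value))
    def read(self, offset):
        if offset == CTRL and self.regs.get(CTRL, 0) & AP_START:
            return AP_DONE
        return self.regs.get(offset, 0)

ip = FakeIP()
run_layer(ip, {"height": 256, "width": 256, "in_ch": 1, "out_ch": 32})
print(ip.log[-1])   # (0, 1): the last write is the start command
```

Calling `run_layer` once per stage with that stage's dimensions is one way to realize the iterative reuse of a single set of IPs across the five convolutional stages.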

The following sections provide a detailed analysis of model accuracy, computational performance on different platforms, and a comparative study of the proposed approach with existing implementations.

4.1. Evaluation of the Proposed Model

The performance of the proposed CNN-based brain tumor classification model is evaluated using standard classification metrics: accuracy, precision, recall, and F1-score. The trained model is tested on an independent dataset to assess how well it generalizes across the four classes: glioma, meningioma, pituitary tumor, and no tumor.

The model is trained with the categorical cross-entropy loss and the Adam optimizer. Training and validation accuracy are tracked during training to guide hyperparameter tuning. The evaluation shows high accuracy and reliable classification performance across all categories. As shown in Table 3 and Figure 7, the model achieves an overall accuracy of 98.43% in distinguishing between glioma, meningioma, pituitary tumor, and no-tumor cases.
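For one-hot labels, the categorical cross-entropy loss is the batch average of −Σ_c y_c log(p_c), where p_c is the predicted probability of class c. The NumPy sketch below illustrates the loss for the four-class case. It is not the authors' training code. The clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Batch mean of -sum_c y_c * log(p_c) for one-hot y_true and row-normalized y_prob."""
    y_prob = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())

# Two samples, four classes (class order is illustrative): labels 0 and 2 as one-hot rows
y_true = np.eye(4)[[0, 2]]
uniform = np.full((2, 4), 0.25)          # an untrained model guessing uniformly
print(round(categorical_cross_entropy(y_true, uniform), 4))   # 1.3863, i.e. ln(4)

logits = np.array([[3.0, 0.5, 0.1, -1.0], [0.2, 0.1, 2.5, 0.0]])
print(categorical_cross_entropy(y_true, softmax(logits)))    # lower: correct classes dominate
```

Adam then updates the weights using the gradient of this loss. A uniform guess over four classes gives ln(4), which serves as a convenient sanity check at the start of training.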

The precision values for all classes are at least 97.85%, indicating that the model identifies tumor types with few false positives. The recall values, which measure the ability to detect positive cases, are also consistently high: at least 98.25% per class and 98.58% overall. Most tumors are therefore correctly classified. The F1-score, the harmonic mean of precision and recall, further supports the model's robustness. It reaches 98.37% overall and at least 98.05% for every class.
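The per-class metrics can be computed from a confusion matrix by treating each class one-vs-rest and then macro-averaging over the four classes. The confusion matrix below is a hypothetical placeholder, because the paper does not report one. The final loop checks that each F1-score in Table 3 is the harmonic mean of the listed precision and recall values.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP): column sums
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN): row sums
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def f1(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r)

# Hypothetical 4-class confusion matrix, for illustration only
cm = [[48, 1, 0, 1],
      [1, 47, 1, 1],
      [0, 0, 50, 0],
      [1, 0, 0, 49]]
p, r, f = per_class_metrics(cm)
print("macro precision", p.mean(), "macro recall", r.mean(), "macro F1", f.mean())

# Table 3: (precision, recall, reported F1) for glioma, meningioma, pituitary, no tumor, overall
for prec, rec, reported in [(97.90, 98.40, 98.15), (98.30, 98.80, 98.55),
                            (98.60, 98.90, 98.75), (97.85, 98.25, 98.05),
                            (98.16, 98.58, 98.37)]:
    assert abs(f1(prec, rec) - reported) < 0.01
```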

These results highlight the effectiveness of the proposed CNN architecture in learning discriminative features from MRI images, making it a reliable solution for automated brain tumor classification. The next sections will analyze the computational performance of the model on different hardware platforms, comparing software-based execution on the Cortex-A9 processor with the hardware-accelerated FPGA implementation.

4.2. Evaluation of Cortex-A9 CPU

To establish a baseline for performance comparison, the proposed CNN model was first executed on the Cortex-A9 processor embedded in the Zynq-7000 SoC. The model was implemented with TensorFlow Lite and run on the CPU to measure inference time, power consumption, and throughput. The results, including execution time and classification accuracy, are summarized in Table 5.

During execution, the CNN processed MRI images sequentially, utilizing the ARM Cortex-A9 dual-core processor running at 667 MHz. The inference time per image was recorded, revealing an average processing time of 721 ms per image, with a total execution time of 1 s for the entire test dataset. While the software-based implementation achieved the expected accuracy of 98.43%, it exhibited limitations in terms of computational efficiency, particularly in real-time applications where rapid inference is required.

The average power consumption measured during inference was 3.2 W, which is relatively high for embedded systems with tight power budgets. The throughput, defined as the number of images processed per second, was 1.386 images per second. This indicates that CPU-based execution struggles to meet real-time processing demands for large-scale medical datasets.
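Per-image latency and throughput can be measured with a simple wall-clock harness such as the one below. This is a sketch, not the authors' measurement script. `infer` is a hypothetical stand-in for any callable that runs one image through the model, whether a TensorFlow Lite interpreter call on the CPU or the accelerator driver on the FPGA. A few warm-up runs are excluded so that one-time initialization does not inflate the average. For sequential, one-image-at-a-time processing, throughput is then the reciprocal of the mean latency.

```python
import statistics
import time

def benchmark(infer, images, warmup=2):
    """Return (mean latency in seconds, throughput in images/s) for sequential inference."""
    for img in images[:warmup]:
        infer(img)                      # warm-up runs, not timed
    latencies = []
    for img in images[warmup:]:
        t0 = time.perf_counter()
        infer(img)
        latencies.append(time.perf_counter() - t0)
    mean_latency = statistics.mean(latencies)
    return mean_latency, 1.0 / mean_latency

def dummy_infer(img):
    time.sleep(0.01)                    # placeholder workload of about 10 ms

lat, fps = benchmark(dummy_infer, list(range(12)))
print(f"{lat * 1e3:.1f} ms/image, {fps:.1f} images/s")
```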

Overall, while the Cortex-A9 CPU implementation provides a reference for software-based execution, its high latency and power consumption make it less suitable for real-time clinical applications. To overcome these limitations, a hardware-accelerated implementation on an FPGA is proposed, significantly improving processing speed and energy efficiency. The next section presents the evaluation of the hybrid CPU-FPGA implementation and its advantages over the software-only approach.

4.3. Evaluation of Hybrid CPU-FPGA Implementation

To enhance computational efficiency, the proposed CNN model was accelerated using a hybrid CPU-FPGA approach, where key operations such as Conv2D, MaxPooling, and ReLU were offloaded to FPGA hardware while the rest of the processing remained on the Cortex-A9 CPU. The hardware-accelerated implementation was deployed on the PYNQ-Z2 board, leveraging Vivado HLS 2018.2 to generate optimized hardware IP cores.

Table 4 summarizes the resource utilization and performance metrics of the Conv2D, MaxPooling, and ReLU IP cores implemented on the PYNQ-Z2 FPGA. The metrics include slices, LUTs (look-up tables), FFs (flip-flops), DSPs (DSP slices), BRAMs (block RAMs), latency (in clock cycles), and operating frequency.

The Conv2D IP uses the most slices and is the only IP that uses DSPs: 15,456 slices, 6665 LUTs, 2234 FFs, and 20 DSP slices. It operates at 120.17 MHz with a latency of 34,848,820 cycles, reflecting the complexity of the convolution operations. The MaxPooling IP consumes fewer resources: 238 slices, 833 LUTs, 1238 FFs, and 2 BRAMs. It operates at 117.81 MHz with a latency of 981 cycles. The ReLU IP uses 3376 slices, 5561 LUTs, 9637 FFs, and 2 BRAMs, and operates at 102.56 MHz with a latency of 269 cycles.

The evaluation results (Figure 8) show a significant improvement in execution speed and power efficiency. Table 5 presents the performance of the proposed hardware acceleration for brain cancer multi-classification on the PYNQ-Z2. The FPGA-accelerated model reaches an accuracy of 96.09%. This is 2.34 percentage points lower than the software-only implementation (98.43%) but still higher than most of the accuracies reported in Table 6. The execution time per image drops to 0.0869 s, about 8.3 times faster than the 721 ms required by the CPU-only approach, which makes the FPGA-based solution well suited to real-time applications.

Power consumption also falls to 1.04 W, compared with 3.2 W for the CPU-based implementation, which shows the energy efficiency of the FPGA-based acceleration. Throughput, measured as the number of images processed per second, rises from 1.386 to 11.62 images per second (Table 5), allowing the system to handle larger datasets with minimal delay.
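The relative gains follow directly from Table 5. The short calculation below derives the speedup, the power reduction, and an energy-per-image figure. The energy figure assumes that the reported power is the average draw over the whole inference, which the paper does not state explicitly.

```python
# Values taken from Table 5
cpu = {"latency_s": 0.721, "power_w": 3.2}     # dual-core ARM Cortex-A9 (software)
fpga = {"latency_s": 0.0869, "power_w": 1.04}  # Zynq XC7Z020 (hardware-accelerated)

speedup = cpu["latency_s"] / fpga["latency_s"]
power_reduction = 1 - fpga["power_w"] / cpu["power_w"]

# Energy per image = average power x latency (assumes power is the mean draw during inference)
energy_cpu = cpu["power_w"] * cpu["latency_s"]
energy_fpga = fpga["power_w"] * fpga["latency_s"]

print(f"speedup {speedup:.2f}x")                  # speedup 8.30x
print(f"power reduction {power_reduction:.1%}")   # power reduction 67.5%
print(f"energy/image {energy_cpu:.3f} J vs {energy_fpga:.4f} J "
      f"({energy_cpu / energy_fpga:.1f}x less)")
```

Combining latency and power into energy per image captures both improvements in a single number. This is often the more relevant metric for battery- or thermally-constrained embedded devices.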

Table 6 provides a comprehensive comparison of recent machine learning methods applied to brain cancer classification, focusing on accuracy, hardware platform, power consumption, and latency. Most existing works rely predominantly on GPU or CPU-based architectures, achieving high accuracy but often at the cost of significant computational resources and power requirements. For example, EfficientNet-based models such as EfficientNetB3 [31] and EfficientNetB0 [32] achieve accuracies above 97% but require powerful GPU hardware, making them less suitable for embedded or real-time medical applications. Similarly, models combining image enhancement techniques (e.g., CLAHE + DWT [33] and CLAHE + CNN [34]) achieve competitive performance but are not optimized for hardware acceleration.

A few studies target embedded platforms, such as the ARM Cortex-A9 CPU [24] or the Jetson TX2 [41]. The CPU-only implementation in [24] is limited by the processor's modest parallelism and reaches a latency of 0.771 s. The Jetson TX2 implementation in [41] reports a latency of 0.0353 s but no accuracy or power figures. Hybrid CPU + FPGA solutions, such as the one reported in [24], improve both latency (0.683 s) and power consumption (3.6 W). However, their accuracy (94.1%) remains below that of state-of-the-art CNN models.

The proposed CNN-2D hardware-accelerated architecture offers a strong balance between accuracy, power consumption, and execution speed. On the dual-core ARM Cortex-A9, the model achieves 98.43% accuracy, surpassing most of the CPU- and GPU-based implementations in Table 6 at a moderate 3.2 W. The FPGA-accelerated version on the XC7Z020 achieves 96.09% accuracy with a power consumption of only 1.04 W and an inference latency of 0.0869 s. It outperforms both the CPU implementation (0.771 s, 6.2 W) and the hybrid CPU + FPGA implementation (0.683 s, 3.6 W) of [24] in latency and power. The proposed architecture therefore provides an effective trade-off between classification accuracy and hardware efficiency, making it well suited to real-time and resource-constrained medical imaging systems.

Overall, the proposed hardware implementation is an efficient and scalable solution for real-time brain cancer classification, delivering competitive accuracy while significantly reducing power consumption and inference latency. The convolutional feature extraction is offloaded to dedicated Conv2D, ReLU, and MaxPooling IP cores, and each block is optimized individually. As a result, the architecture achieves substantial acceleration under strict resource constraints. It is more energy-efficient than the CPU-based implementations and better suited to embedded medical imaging applications than power-hungry GPU-based approaches.

5. Conclusions

In this work, we have proposed a hardware-accelerated CNN model for the multi-classification of brain cancer, focusing on the classification of MRI images into four categories: glioma, meningioma, no tumor, and pituitary tumors. The model was optimized using FPGA acceleration to enhance the speed and efficiency of the classification process, addressing the key challenges of execution time, power consumption, and throughput that are critical for real-time medical applications.

The evaluation results showed that the FPGA-based approach significantly improved overall performance compared with a software-only implementation on the Cortex-A9 CPU. The accuracy of the CNN model remained high with FPGA acceleration: 96.09%, versus 98.43% in software. The execution time was reduced to 86.9 ms per image, about 8.3 times faster than the 721 ms required by the CPU-based model. Power consumption fell to 1.04 W, compared with 3.2 W for the CPU-based implementation, making the FPGA solution better suited to energy-efficient real-time medical diagnostics.

In terms of hardware resource utilization, the proposed Conv2D, MaxPooling, and ReLU IP cores demonstrated efficient use of FPGA resources, with frequencies ranging from 102.56 MHz to 120.17 MHz and optimized latency and resource consumption. These optimizations highlight the capability of FPGA acceleration in processing complex deep learning models while maintaining low power and high efficiency.

The results confirm that FPGA-based acceleration offers a power-efficient, fast, and scalable solution for real-time brain cancer classification, making it highly applicable for embedded medical imaging systems. This work opens avenues for future enhancements, including further FPGA optimizations and the integration of more advanced AI techniques for even more complex medical diagnosis tasks. Additionally, ongoing research will explore the potential for scaling this solution to handle other types of medical imaging data and improving the overall system’s generalizability and accuracy.

Author Contributions

Conceptualization, A.M. and W.G.; methodology, I.N.; software, A.M.; validation, A.B.A., I.N. and M.M.; formal analysis, A.M.; investigation, W.G.; resources, M.M.; data curation, A.B.A.; writing—original draft preparation, A.M. and W.G.; writing—review and editing, A.M. and W.G.; visualization, I.N.; supervision, A.B.A.; project administration, M.M.; funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was not funded by any external source.

Data Availability Statement

The dataset used in this study is publicly available on Kaggle at: https://www.kaggle.com/datasets/ (accessed on 2 January 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kumar, A.; Shukla, S.K.; Prakash, N.; Yadav, R.K. A deep learning and powerful computational framework for brain cancer MRI image recognition. J. Inst. Eng. Ser. B 2024, 105, 1–18.
2. Kanna, R.K.; Sahoo, S.K.; Mandhavi, B.K.; Mohan, V.; Babu, G.S.; Panigrahi, B.S. Detection of Brain Tumour based on Optimal Convolution Neural Network. EAI Endorsed Trans. Pervasive Health Technol. 2024, 10.
3. Bansal, S.; Jadon, R.S.; Gupta, S.K. Robust Hybrid Convolutional Network for Tumor Classification Using Brain MRI Image Datasets. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 576–584.
4. Rao, K.N.; Khalaf, O.I.; Krishnasree, V.; Kumar, A.S.; Alsekait, D.M.; Priyanka, S.S.; Alattas, A.S.; AbdElminaam, D.S. An efficient brain tumor detection and classification using pre-trained convolutional neural network models. Heliyon 2024, 10, e36773.
5. Xu, J.; Wang, P.; Li, Y.; Shi, X.; Yin, T.; Yu, J.; Teng, F. Development and validation of an MRI-Based nomogram to predict the effectiveness of immunotherapy for brain metastasis in patients with non-small cell lung cancer. Front. Immunol. 2024, 15, 1373330.
6. Mokri, S.M.G.; Valadbeygi, N.; Grigoryeva, V. Diagnosis of Glioma, Menigioma and Pituitary brain tumor using MRI images recognition by Deep learning in Python. EAI Endorsed Trans. Intell. Syst. Mach. Learn. Appl. 2024, 1.
7. Reddy, L.C.S.; Elangovan, M.; Vamsikrishna, M.; Ravindra, C. Brain Tumor Detection and Classification Using Deep Learning Models on MRI Scans. EAI Endorsed Trans. Pervasive Health Technol. 2024, 10.
8. Saeed, Z.; Bouhali, O.; Ji, J.X.; Hammoud, R.; Al-Hammadi, N.; Aouadi, S.; Torfeh, T. Cancerous and non-cancerous MRI classification using dual DCNN approach. Bioengineering 2024, 11, 410.
9. Khan, H.A.; Jue, W.; Mushtaq, M.; Mushtaq, M.U. Brain tumor classification in MRI image using convolutional neural network. Math. Biosci. Eng. 2021, 17, 6203–6216.
10. Mahajan, S.; Dhull, A.; Dahiya, A. An Efficient Deep Learning Technique for Brain Abnormality Detection Using MRI. Intel. Artif. 2025, 28, 81–100.
11. Wu, T.; Sun, J.; Wang, Z.; Tan, J.; Tang, X.; Xiong, D.; Feiweier, T.; Gong, Q.; Xing, H.; Wu, M. Accelerated MR cell size imaging through parallel acquisition technique (PAT) and simultaneous multi-slice (SMS) with local principal component analysis (LPCA) enhancement. Magn. Reson. Imaging 2025, 117, 110327.
12. Agnoli, A.-L.; Jungmann, D.; Lochner, B. Resonance imaging of brain tumors: Application of gadolinium-DTPA and comparison to computed tomography. Neurosurg. Rev. 1987, 10, 25–29.
13. Khaki, A.M.Z.; Choi, A. Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification. Appl. Sci. 2025, 15, 422.
14. Sakai, Y.; Tamiya, Y. S-DFP: Shifted dynamic fixed point for quantized deep neural network training. Neural Comput. Appl. 2025, 37, 535–542.
15. Tasci, M.; Istanbullu, A.; Tumen, V.; Kosunalp, S. FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs. Appl. Sci. 2025, 15, 688.
16. Ryu, J.S.; Kang, H.; Chu, Y.; Yang, S. Vision-language foundation models for medical imaging: A review of current practices and innovations. Biomed. Eng. Lett. 2025, 15, 809–830.
17. Jha, D.; Durak, G.; Das, A.; Sanjotra, J.; Susladkar, O.; Sarkar, S.; Rauniyar, A.; Tomar, N.K.; Peng, L.; Li, S.; et al. Ethical framework for responsible foundational models in medical imaging. Front. Med. 2025, 12, 1544501.
18. Missaoui, R.; Hechkel, W.; Saadaoui, W.; Helali, A.; Leo, M. Advanced Deep Learning and Machine Learning Techniques for MRI Brain Tumor Analysis: A Review. Sensors 2025, 25, 2746.
19. Berghout, T. The Neural Frontier of Future Medical Imaging: A Review of Deep Learning for Brain Tumor Detection. J. Imaging 2025, 11, 2.
20. Xie, Y.; Zaccagna, F.; Rundo, L.; Testa, C.; Agati, R.; Lodi, R.; Manners, D.N.; Tonon, C. Convolutional neural network techniques for brain tumor classification (from 2015 to 2022): Review, challenges, and future perspectives. Diagnostics 2022, 12, 1850.
21. Wang, K.; Liu, Z.; Lin, Y.; Lin, J.; Han, S. HAQ: Hardware-aware automated quantization with mixed precision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8612–8620.
22. Xiong, S.; Wu, G.; Fan, X.; Feng, X.; Huang, Z.; Cao, W.; Zhou, X.; Ding, S.; Yu, J.; Wang, L.; et al. MRI-based brain tumor segmentation using FPGA-accelerated neural network. BMC Bioinform. 2021, 22, 421.
23. Jarrah, A.; Amri, S. Optimized FPGA-based implementation of brain tumor detection by combining K-means and grey wolf optimization algorithms. Trait. Du Signal 2022, 39, 1879.
24. Mhaouch, A.; Gtifa, W.; Althobaiti, T.; Faraj, H.; Machhout, M. A Quality of Service Analysis of FPGA-Accelerated Conv2D Architectures for Brain Tumor Multi-Classification. Comput. Mater. Contin. 2025, 84, 5637–5663.
25. Hussain, S.I.; Toscano, E. Enhancing Recognition and Categorization of Skin Lesions with Tailored Deep Convolutional Networks and Robust Data Augmentation Techniques. Mathematics 2025, 13, 1480.
26. Nickparvar, M. Brain_Tumor_MRI Dataset. Kaggle. Dataset. 2021. Available online: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset (accessed on 10 May 2023).
27. Cheng, J. Brain Tumor Dataset. Figshare. 2017. Available online: https://figshare.com/articles/dataset/brain_tumor_dataset/1512427 (accessed on 10 May 2023).
28. Kaggle. Brain Tumor Classification (MRI). Available online: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri (accessed on 10 July 2023).
29. Hamada, A. Br35H: Brain Tumor Detection. 2020. Available online: https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection (accessed on 10 May 2023).
30. Mhaouch, A.; Gtifa, W.; Machhout, M. FPGA Hardware Acceleration of AI Models for Real-Time Breast Cancer Classification. AI 2025, 6, 76.
31. Reyes, D.; Sánchez, J. Performance of convolutional neural networks for the classification of brain tumors using magnetic resonance imaging. Heliyon 2024, 10, e25468.
32. Ishaq, A.; Ullah, F.U.M.; Hamandawana, P.; Cho, D.-J.; Chung, T.-S. Improved EfficientNet Architecture for Multi-Grade Brain Tumor Detection. Electronics 2025, 14, 710.
33. Hekmat, A.; Zhang, Z.; Khan, S.U.R.; Bilal, O. Brain tumor diagnosis redefined: Leveraging image fusion for MRI enhancement classification. Biomed. Signal Process. Control 2025, 109, 108040.
34. Rasheed, Z.; Ma, Y.K.; Ullah, I.; Ghadi, Y.Y.; Khan, M.Z.; Khan, M.A.; Abdusalomov, A.; Alqahtani, F.; Shehata, A.M. Brain Tumor Classification from MRI Using Image Enhancement and Convolutional Neural Network Techniques. Brain Sci. 2023, 13, 1320.
35. Oztel, I. Ensemble Deep Learning Approach for Brain Tumor Classification Using Vision Transformer and Convolutional Neural Network. Adv. Intell. Syst. 2025, 7, 2500393.
36. Mohanty, B.C.; Subudhi, P.K.; Dash, R.; Mohanty, B. Feature-enhanced deep learning technique with soft attention for MRI-based brain tumor classification. Int. J. Inf. Technol. 2024, 16, 1617–1626.
37. Hossain, S.; Chakrabarty, A.; Gadekallu, T.R.; Alazab, M.; Piran, M.J. Vision Transformers, Ensemble Model, and Transfer Learning Leveraging Explainable AI for Brain Tumor Detection and Classification. IEEE J. Biomed. Health Inform. 2024, 28, 1261–1272.
38. Mathivanan, S.K.; Sonaimuthu, S.; Murugesan, S.; Rajadurai, H.; Shivahare, B.D.; Shah, M.A. Employing deep learning and transfer learning for accurate brain tumor detection. Sci. Rep. 2024, 14, 7232.
39. Baez, A.; Fabelo, H.; Ortega, S.; Florimbi, G.; Torti, E.; Hernandez, A.; Leporati, F.; Danese, G.; Callico, G.M.; Sarmiento, R. High-level synthesis of multiclass SVM using code refactoring to classify brain cancer from hyperspectral images. Electronics 2019, 8, 1494.
40. Mallick, A.; Prasad, H.; Maji, P.; Banerjee, S.; Mondal, H.K. Deep Learning for Brain Tumor Detection with FPGA Pathway. In Proceedings of the 2024 IEEE International Symposium on Smart Electronic Systems (iSES), New Delhi, India, 16–18 December 2024; IEEE: New York, NY, USA, 2024; pp. 364–367.
41. Sutradhar, P.; Sancho, J.; Villa, M.; Martin-Perez, A.; Vazquez, G.; Rosa, G.; de Ternero, A.M.; Jimenez-Roldan, L.; Perez-Nunez, A.; Lagares, A.; et al. Exploration of realtime brain tumor classification from hyperspectral images in heterogeneous embedded MPSoC. In Proceedings of the 2022 37th Conference on Design of Circuits and Integrated Circuits (DCIS), Pamplona, Spain, 16–18 November 2022; IEEE: New York, NY, USA, 2022; pp. 1–6.

Figure 1. Workflow of the proposed AI model for brain cancer detection.

Figure 2. Sample Brain MRI Images Used for Tumor Classification: Glioma, Meningioma, and Pituitary Tumor.

Figure 3. Proposed CNN architecture for Tumor Classification: Glioma, Meningioma, and Pituitary Tumor.

Figure 4. Workflow for the Design, Optimization, and Implementation of the Proposed IPs [30].

Figure 5. Proposed hardware implementation approach for brain cancer multi-classification.

Figure 6. Proposed hardware design for the Brain Cancer Multi-Classification system.

Figure 7. Accuracy and loss metrics for the proposed Brain Cancer Multi-Classification.

Figure 8. Comparison of Execution Time and Power Consumption for Brain Cancer Multi-Classification Model on Cortex-A9 and Zynq XC7Z020.

Table 1. Data Augmentation Parameters for proposed Brain Tumor Multi-Classification.

Augmentation Technique	Parameters	Description
Rotation	Angles: 45°, 90°, 135°, 180°, 225°, 270°	Rotating MRI images to simulate variations in patient positioning and orientation.
Flipping	Vertical (V), and Horizontal (H)	Flipping images along vertical and horizontal axes to increase variability and enhance model generalization.
Zooming	0.7× to 1.3×	Zooming in and out to allow the model to detect tumors at different scales.
Brightness Adjustment	±30%	Modifying image brightness to account for variations in imaging conditions and scanner settings.
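The NumPy sketch below implements part of the Table 1 augmentations: flipping, the right-angle rotations (90°, 180°, 270°), and brightness adjustment. Brightness is modeled here as a multiplicative factor drawn from [0.7, 1.3] on images scaled to [0, 1]. This is an assumption, since the paper does not say whether the ±30% change is multiplicative or additive. The 45°, 135°, and 225° rotations and the 0.7× to 1.3× zoom require interpolation and are omitted. A library routine such as `scipy.ndimage.rotate` or `scipy.ndimage.zoom` could supply them. The function name `augment` is hypothetical.

```python
import numpy as np

def augment(img, rng):
    """Randomly flip, rotate by a multiple of 90 degrees, and rescale brightness of a 2-D image in [0, 1]."""
    out = img
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)                    # vertical flip (upside down)
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                    # horizontal flip (mirror)
    out = np.rot90(out, k=int(rng.integers(0, 4)))    # rotate by 0, 90, 180 or 270 degrees
    factor = rng.uniform(0.7, 1.3)                    # brightness change of up to +/-30%
    return np.clip(out * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
mri = rng.random((256, 256))                          # stand-in for one grayscale MRI slice
aug = augment(mri, rng)
print(aug.shape, aug.min() >= 0.0, aug.max() <= 1.0)
```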

Table 2. A detailed summary of the proposed CNN architecture.

Layer (Type)	Output Shape	Parameters
Input Layer	(256, 256, 1)	0
Conv2D (32 filters)	(254, 254, 32)	320
MaxPooling2D	(127, 127, 32)	0
Conv2D (64 filters)	(125, 125, 64)	18,496
MaxPooling2D	(62, 62, 64)	0
Conv2D (128 filters)	(60, 60, 128)	73,856
MaxPooling2D	(30, 30, 128)	0
Conv2D (256 filters)	(28, 28, 256)	295,168
MaxPooling2D	(14, 14, 256)	0
Conv2D (512 filters)	(12, 12, 512)	1,180,160
MaxPooling2D	(6, 6, 512)	0
Flatten	(18,432)	0
Dense (512 units)	(512)	9,437,696
Dense (4 units—output)	(4)	2052
Total Parameters		11,007,748
Trainable Parameters		11,007,748
Non-trainable Parameters		0
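The parameter counts in Table 2 can be reproduced from the layer shapes. The calculation also confirms the 3 × 3 kernel size implied by the first layer, since 320 = (3·3·1 + 1)·32. The sketch assumes one bias per filter or unit, as in standard Keras `Conv2D` and `Dense` layers.

```python
def conv_params(in_ch, filters, k=3):
    return (k * k * in_ch + 1) * filters    # weights + one bias per filter

def dense_params(in_units, units):
    return (in_units + 1) * units           # weights + one bias per unit

channels = [1, 32, 64, 128, 256, 512]
conv = [conv_params(c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]
flatten = 6 * 6 * 512                       # output of the last MaxPooling2D layer
dense = [dense_params(flatten, 512), dense_params(512, 4)]

print(conv)                    # [320, 18496, 73856, 295168, 1180160]
print(dense)                   # [9437696, 2052]
print(sum(conv) + sum(dense))  # 11007748
```

The first dense layer alone holds 9,437,696 of the 11,007,748 parameters (about 86%), whereas the five convolutional layers hold about 1.57 million.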

Table 3. Performance evaluation of the proposed brain Cancer Multi-Classification.

Metric	Glioma (%)	Meningioma (%)	Pituitary Tumor (%)	No Tumor (%)	Overall (%)
Accuracy	98.21	98.65	98.75	98.12	98.43
Precision	97.90	98.30	98.60	97.85	98.16
Recall	98.40	98.80	98.90	98.25	98.58
F1-Score	98.15	98.55	98.75	98.05	98.37

Table 4. Hardware implementation of Conv2D, MaxPooling, and ReLU as IP cores on the PYNQ-Z2 FPGA.

IP	Slices	LUTs	FFs	DSP	BRAM	Latency (cycles)	Freq (MHz)
Conv2D	15,456	6665	2234	20	0	34,848,820	120.17
MaxPooling	238	833	1238	0	2	981	117.81
ReLU	3376	5561	9637	0	2	269	102.56

Table 5. Performance of the proposed implementations for Brain Cancer Multi-Classification in PYNQ-Z2.

Platform	Execution Time (s)	Accuracy (%)	Power (W)	Throughput (FPS)
Dual-core ARM Cortex-A9	0.721	98.43	3.2	1.386
Zynq XC7Z020	0.0869	96.09	1.04	11.62

Table 6. Comparison of Different Machine Learning Approaches.

Works	Method	Hardware Platform	Accuracy (%)	Power (W)	Latency (s)
[24]	CNN	ARM CPU (Cortex-A9)	94.2	6.2	0.771
[24]	CNN	CPU + FPGA	94.1	3.6	0.683
[31]	EfficientNetB3	-	97.5	-	-
[35]	CNN-ViT	GPU	84.35	-	-
[32]	CLAHE + DWT	-	94.28	-	-
[36]	-	GPU	95.1	-	-
[37]	Inception-ResNetV2	-	93.8	-	-
[38]	MobileNetV3	-	98.5	-	-
[33]	CLAHE + CNN	-	83.0	-	-
[34]	EfficientNetB0	NVIDIA V100Q	98.5	-	-
[39]	SVM	ZedBoard (ZC7020)	-	2.04	-
[40]	EfficientNetB0	CPU	94.07	-	-
[41]	-	Jetson TX2 (854 MHz)	-	-	0.0353
This work	CNN-2D	ARM CPU (Cortex-A9)	98.43	3.2	0.721
This work	CNN-2D	FPGA XC7Z020	96.09	1.04	0.0869

