Article
Peer-Review Record

Performance Evaluation of FPGA, GPU, and CPU in FIR Filter Implementation for Semiconductor-Based Systems

J. Low Power Electron. Appl. 2025, 15(3), 40; https://doi.org/10.3390/jlpea15030040
by Muhammet Arucu 1,* and Teodor Iliev 2
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 30 May 2025 / Revised: 10 July 2025 / Accepted: 16 July 2025 / Published: 21 July 2025
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript titled 'Performance Evaluation of FPGA, GPU, and CPU in FIR Filter Implementation for Semiconductor-Based Systems' by Arucu et al. presents a study on advanced semiconductor technologies and finds that FPGAs outperform CPUs and GPUs in speed and energy efficiency for real-time applications, as demonstrated by their superior performance in an FIR filter implementation. The work is well organized and delivered, and I would recommend a minor revision for this to be accepted in JLPEA. My comments can be found below:

1) The author should show a summary table for the performance evaluation. The current section contains too many bullet points;

2) Can the author clarify the CPU benchmark: is it run through Python on the PYNQ board's Linux OS?

3) It would be helpful to provide direct power measurements.

4) The abstract mentions scalability, but it is not discussed thoroughly in the main text.

 

Author Response

Dear Reviewer

We sincerely appreciate your thorough and insightful feedback on our manuscript. Your detailed comments and constructive criticisms have been invaluable in guiding the revision process and improving the quality of our work. In response to your concerns, we have undertaken a comprehensive revision of the paper, addressing each point raised to ensure clarity, accuracy, and academic rigor. Below, we outline the specific changes made in response to your comments.

Comment 1: The author should show a summary table for the performance evaluation. The current section contains too many bullet points.

Response 1: We appreciate the suggestion to streamline the performance evaluation section. In the revised manuscript, we have added a summary table in Section 4 (Performance Evaluation and Results) that consolidates the key performance metrics, including processing times (FPGA: 0.004 s, GPU: 0.008 s, CPU: 0.107 s) and power consumption (FPGA: 1.431 W, GPU: ~50–100 W, CPU: ~5–10 W) across the three platforms. This table replaces the extensive bullet points, providing a clearer and more concise comparison of computational efficiency, latency, and energy consumption, enhancing readability and alignment with JLPEA’s standards.

Comment 2: Can the author clarify the CPU benchmark: is it run through Python on the PYNQ board’s Linux OS?

Response 2: Thank you for seeking clarification. The CPU benchmark was indeed executed on the ZYNQ XC7Z020 chip’s Arm processor, running on the Linux operating system installed on the PYNQ board, using Python with NumPy and SciPy libraries for FIR filter implementation. This has been explicitly detailed in the revised Section 3 (Methodology) of the manuscript. We have further clarified the software environment, specifying the use of Jupyter Notebook on the PYNQ board for generating and processing the input signal, as noted in Page 9, to ensure reproducibility and transparency.
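For illustration, a minimal Python sketch of the kind of CPU benchmark described here (NumPy/SciPy on the PYNQ board’s Linux OS) is given below; the filter length, cutoff, and signal parameters are hypothetical placeholders rather than the manuscript’s exact values.

```python
import time
import numpy as np
from scipy import signal

# Hypothetical test parameters (not the manuscript's exact values)
fs = 1000                                   # sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)

numtaps, beta = 101, 8.6                    # Kaiser-window FIR design
taps = signal.firwin(numtaps, 100, window=("kaiser", beta), fs=fs)

start = time.perf_counter()
y = signal.lfilter(taps, 1.0, x)            # FIR filtering on the Arm CPU
elapsed = time.perf_counter() - start
print(f"CPU FIR filtering time: {elapsed:.6f} s")
```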

Comment 3: It would be helpful to provide direct power measurement.

Response 3: We agree that direct power measurements enhance the study’s rigor. In the revised manuscript, we have included detailed power consumption measurements in Section 4, specifying that the FPGA implementation consumes 1.431 W, as reported by the Vivado HLx power analysis tool (Page 9). For the GPU and CPU, power consumption is estimated at approximately 50–100 W and 5–10 W, respectively, based on standard specifications for the Tesla K80 GPU and the ZYNQ XC7Z020 Arm processor, as clarified in the updated performance evaluation table. Additionally, we have added a power consumption bar chart in Section 4 to visually compare these metrics, addressing the need for clear and direct power data presentation.

Comment 4: The abstract mentioned scalability, but in the main text it was not discussed thoroughly.

Response 4: We acknowledge the need for a more thorough discussion of scalability. In the revised manuscript, we have expanded Section 5 (Discussion) to provide a detailed analysis of scalability (Page 11). This includes an explanation of FPGA scalability through reconfigurable logic blocks and parallel processing of filter taps, contrasted with GPU scalability via CUDA parallelism on the Tesla K80, and CPU limitations due to sequential processing. We further discuss constraints, such as FPGA resource limits and GPU power consumption challenges in massive-scale applications, to provide a comprehensive evaluation. This addition aligns the main text with the abstract’s claims and strengthens the manuscript’s depth.

We believe these revisions enhance the scientific rigor and focus of the manuscript. We are grateful for your insightful comments and look forward to your further feedback.

Sincerely,

Reviewer 2 Report

Comments and Suggestions for Authors

(1) This paper lacks substantive content on the discussion and comparison of FPGA, GPU, and CPU. The scenarios of the three applications are different, and it is not appropriate to compare them. FPGA is mainly used for verification work, rather than being implemented as a chip, which is completely different from GPU and CPU. The author has confused the concepts of these devices.
(2) The scale of FIR filters is relatively small and not suitable for evaluating processing speed. Moreover, the energy efficiency of the GPU is definitely higher than that of the FPGA.

Author Response

Dear Reviewer

We sincerely appreciate your thorough and insightful feedback on our manuscript. Your detailed comments and constructive criticisms have been invaluable in guiding the revision process and improving the quality of our work. In response to your concerns, we have undertaken a comprehensive revision of the paper, addressing each point raised to ensure clarity, accuracy, and academic rigor. Below, we outline the specific changes made in response to your comments.

Comment 1: This paper lacks substantive content on the discussion and comparison of FPGA, GPU, and CPU. The scenarios of the three applications are different, and it is not appropriate to compare them. FPGA is mainly used for verification work, rather than being implemented as a chip, which is completely different from GPU and CPU. The author confused the concepts of these devices.

Response 1: Thank you for your comment. We have enhanced Section 5 (Discussion, Page 11) to provide a more substantive comparison of FPGA, GPU, and CPU platforms, emphasizing their architectural differences and suitability for FIR filter implementation in DSP applications. The revised discussion clarifies that while FPGAs are used for verification in some contexts, they are also widely deployed as standalone processing units in real-time DSP systems due to their low-latency and energy-efficient hardware customization, as evidenced by our implementation on the ZYNQ XC7Z020 chip (Page 9). We justify the comparison by using a standardized FIR filter benchmark, which is a common DSP task applicable across all three platforms, enabling fair evaluation of computational efficiency, latency, and energy consumption (see performance table). To address potential confusion, we have added a paragraph in Section 1 (Introduction) explaining the complementary roles of FPGA (reconfigurable, low-latency), GPU (parallel processing), and CPU (general-purpose) in DSP, ensuring the comparison’s relevance and clarity.

Comment 2: The scale of FIR filters is relatively small and not suitable for evaluating processing speed. And the energy efficiency of GPU is definitely higher than FPGA.

Response 2: We acknowledge the concern regarding the FIR filter scale and energy efficiency claims. In the revised manuscript, we have addressed the filter scale in Section 5 (Page 11) by noting that the FIR filter, designed with a moderate number of taps using the Kaiser Window Method (Page 7), serves as a representative DSP task for comparing computational efficiency across platforms, as supported by prior studies [17, 18, 30]. While larger filters may amplify performance differences, the chosen scale is sufficient to demonstrate FPGA’s 27x and 2x speed advantage over CPU and GPU, respectively (0.004 s for FPGA, 0.008 s for GPU, 0.107 s for CPU, Page 11). We have added a discussion on limitations, noting that future work will explore larger filter sizes to further validate scalability (Page 14). Regarding energy efficiency, our results show that the FPGA consumes 1.431 W, compared with approximately 50–100 W for the GPU and 5–10 W for the CPU (Page 9), indicating comparable or slightly better efficiency for the FPGA in this context. We have clarified in Section 4 that these measurements are derived from Vivado HLx for the FPGA and standard specifications for the GPU (Tesla K80) and CPU (ZYNQ XC7Z020 Arm processor), supported by a new power consumption chart. To address the reviewer’s claim, we have added a comparison with the literature [30] in Section 5, acknowledging that GPUs may excel in energy efficiency for highly parallel tasks but noting FPGA’s advantage in low-latency DSP applications like FIR filtering.
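For the reader’s convenience, the quoted speedup factors follow directly from the reported processing times:

```latex
\mathrm{speedup}_{\mathrm{CPU}} = \frac{0.107\,\mathrm{s}}{0.004\,\mathrm{s}} \approx 26.8 \approx 27\times,
\qquad
\mathrm{speedup}_{\mathrm{GPU}} = \frac{0.008\,\mathrm{s}}{0.004\,\mathrm{s}} = 2\times
```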

We believe these revisions enhance the scientific rigor and focus of the manuscript. We are grateful for your insightful comments and look forward to your further feedback.

Sincerely,

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

I appreciate the consistency and clarity of the overview content in the first sections, correlated to the context of the domain and of the digital processing platforms.

The contribution of the paper in terms of applicability of the solution is not evidenced. Also, the novelty of the subject is poorly highlighted and referenced. I am sending you some suggestions and questions for reconsidering the evaluation:

  • In Section 3, a more exhaustive state of the art is needed regarding the different FIR algorithms and their relevance to the subject (a classification);
  • In Section 4, following the style of the other paragraphs, you should provide more substantial justification for the choice of your FIR filter design method, along with a thorough evaluation;
  • The calculation stages should be accompanied by strong references;
  • Does the scientific literature report methods other than FIR implementation for evaluating these digital processing devices?
  • I recommend extending the content of Section 5 to include aspects related to programming tools (open-source vs. proprietary, graphical platforms vs. hand-coded or hybrid, and how accessible they are to the user).
  • In my opinion, a paragraph on digital processing would be valuable, and digital signal processors (DSPs) could be considered in the comparative study.

Author Response

Dear Reviewer

We sincerely appreciate your thorough and insightful feedback on our manuscript. Your detailed comments and constructive criticisms have been invaluable in guiding the revision process and improving the quality of our work. In response to your concerns, we have undertaken a comprehensive revision of the paper, addressing each point raised to ensure clarity, accuracy, and academic rigor. Below, we outline the specific changes made in response to your comments.

Comment 1: The contribution of the paper in terms of applicability of the solution is not evidenced. Also, the novelty of the subject is poorly highlighted and referenced.

Response 1: We appreciate the concern regarding the evidence of applicability and novelty. In the revised manuscript, we have strengthened Section 5 (Discussion, Page 11) to explicitly highlight the applicability of our findings, emphasizing FPGA’s 27x and 2x performance improvement over CPU and GPU, respectively (0.004 s for FPGA, 0.008 s for GPU, 0.107 s for CPU), and its low power consumption (1.431 W, Page 10) for real-time DSP applications in semiconductor-based systems, such as edge computing and IoT (Page 14). The novelty is underscored by the direct comparison of FPGA, GPU, and CPU using a standardized FIR filter benchmark, a rare quantitative analysis in DSP contexts, supported by new references [30, 31] (Page 16). We have added a paragraph in Section 1 (Introduction) to clarify the study’s contribution to sustainable, low-power DSP solutions, addressing the growing demand in industries like healthcare and autonomous vehicles (Page 15).

Comment 2: In Section 3 – a more exhaustive state of the art is needed, regarding the different FIR algorithms and their relevance to the subject (a classification).

Response 2: Thank you for the suggestion. In the revised Section 3 (Methodology, Page 5), we have expanded the state-of-the-art review to include a classification of FIR filter design algorithms, such as the Windowing Method (e.g., Kaiser Window), Frequency Sampling, and Parks-McClellan (Page 6). We discuss their relevance to DSP applications, noting the Kaiser Window Method’s flexibility in balancing main lobe width and side lobe attenuation, making it ideal for our study’s focus on precise frequency response control (Page 7). This classification is supported by additional references [32, 33, 34] (Page 6), enhancing the literature context and justifying our methodological choices.
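As a purely illustrative sketch of how these three design families differ in practice, the following SciPy snippet uses hypothetical specifications (sampling rate, tap count, and band edges) that are not taken from the manuscript.

```python
import numpy as np
from scipy import signal

fs, numtaps, cutoff = 1000, 101, 100         # hypothetical specs: Hz, taps, Hz

# Windowing method (Kaiser window): apply a window to the ideal impulse response
h_kaiser = signal.firwin(numtaps, cutoff, window=("kaiser", 8.6), fs=fs)

# Frequency-sampling method: specify the desired magnitude response on a grid
h_fsamp = signal.firwin2(numtaps, [0, cutoff, 1.2 * cutoff, fs / 2],
                         [1, 1, 0, 0], fs=fs)

# Parks-McClellan (equiripple) method: minimax optimization over the bands
h_remez = signal.remez(numtaps, [0, cutoff, 1.2 * cutoff, fs / 2],
                       [1, 0], fs=fs)
```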

Comment 3: In Section 4 – Following the style of the other paragraphs, you should provide more substantial justification for the choice of your FIR filter design method, along with a thorough evaluation.

Response 3: We agree that a stronger justification for the FIR filter design method is necessary. In the revised Section 4 (Performance Evaluation and Results, Page 7), we have added a detailed explanation of why the Kaiser Window Method was chosen, highlighting its ability to adjust the shape parameter for precise control over filter characteristics, suitable for stringent DSP requirements (Page 7). The evaluation now includes a comprehensive performance table comparing processing times (FPGA: 0.004 s, GPU: 0.008 s, CPU: 0.107 s) and power consumption (FPGA: 1.431 W, GPU: ~50–100 W, CPU: ~5–10 W), supported by a new resource utilization and power consumption chart (Page 9), ensuring a thorough and consistent evaluation.
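To make the shape-parameter argument concrete, the short sketch below shows how a Kaiser design derives the filter length and beta from an attenuation and transition-width specification; the numeric targets are hypothetical, not the manuscript’s actual design values.

```python
from scipy import signal

fs = 1000                        # hypothetical sampling rate, Hz
stop_atten_db = 60.0             # hypothetical stop-band attenuation target, dB
transition_hz = 20.0             # hypothetical transition bandwidth, Hz

# kaiserord maps the attenuation/transition spec to a tap count and beta;
# a larger beta suppresses side lobes more strongly at the cost of a wider main lobe
numtaps, beta = signal.kaiserord(stop_atten_db, transition_hz / (0.5 * fs))
taps = signal.firwin(numtaps, 100, window=("kaiser", beta), fs=fs)
print(numtaps, beta)
```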

Comment 4: The calculation stages should be accompanied by strong references.

Response 4: To address this, we have revised Section 4 (Page 7) to include strong references for the calculation stages of the FIR filter design using the Kaiser Window Method. Specifically, we cite [34] for the Parks-McClellan algorithm’s role in optimizing filter coefficients and [30, 31] for performance benchmarking methodologies (Page 6, Page 16). Additionally, we reference [32, 33] for the mathematical formulation of FIR filters (e.g., equation 1, Page 6), ensuring that all calculation stages, including coefficient computation and frequency response analysis, are well-supported by peer-reviewed literature, enhancing the study’s credibility.

Comment 5: Does scientific literature report other methods than FIR implementation to evaluate these digital processing devices?

Response 5: Thank you for raising this point. In the revised Section 3 (Page 5), we have added a discussion on alternative methods for evaluating digital processing devices, such as Fast Fourier Transform (FFT) and Infinite Impulse Response (IIR) filters, which are commonly used to assess computational efficiency and memory bandwidth (Page 6). We cite [30] to note that FFT highlights memory bandwidth performance, while IIR filters test computational efficiency for recursive algorithms. We explain that FIR was chosen for its stability and linear-phase properties, making it a standard benchmark for DSP platforms, but we acknowledge the relevance of these alternatives and propose their exploration in future work (Page 14).
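As an illustration of those alternative benchmarks, a minimal sketch (with an arbitrary test signal and filter order, assumed for demonstration only) is shown below.

```python
import time
import numpy as np
from scipy import signal

x = np.random.randn(2**20)                  # hypothetical test signal

# FFT benchmark: large transforms primarily stress memory bandwidth
t0 = time.perf_counter()
X = np.fft.fft(x)
t_fft = time.perf_counter() - t0

# IIR benchmark: the recursive structure stresses sequential computation
b, a = signal.butter(4, 0.2)                # 4th-order low-pass, normalized cutoff
t0 = time.perf_counter()
y = signal.lfilter(b, a, x)
t_iir = time.perf_counter() - t0

print(f"FFT: {t_fft:.4f} s, IIR: {t_iir:.4f} s")
```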

Comment 6: I recommend extending the content of Section 5 to include aspects related to programming tools (open-source vs. proprietary, graphical platforms vs. hand-coded or hybrid, and how accessible are those to the user).

Response 6: We appreciate this suggestion. In the revised Section 5 (Page 11), we have added a subsection discussing programming tools for FPGA, GPU, and CPU implementations. For FPGA, we detail the use of Vivado HLx (proprietary) for Verilog-based design and Jupyter Notebook (open-source) on the PYNQ board for Python integration (Page 9). For GPU, we note CUDA (proprietary) usage via Google Colab, and for CPU, we highlight Python with NumPy (open-source). We compare the accessibility of graphical platforms (e.g., Vivado’s block design) versus hand-coded Verilog and hybrid approaches, noting that graphical tools enhance user accessibility for non-experts, while hand-coding offers precision for advanced users. This addition, supported by references [30, 31], addresses tool accessibility and aligns with the journal’s focus on practical DSP solutions.

Comment 7: In my opinion a paragraph on digital processing would be valuable and the digital signals processors DSP could be considered in the comparative study.

Response 7: We agree that including digital signal processors (DSPs) enhances the study’s scope. In the revised Section 1 (Introduction), we have added a paragraph introducing DSPs as specialized processors optimized for signal processing tasks, contrasting their fixed architecture with FPGA’s reconfigurability, GPU’s parallelism, and CPU’s versatility (Page 2). In Section 5 (Page 11), we discuss the potential inclusion of DSPs in comparative studies, noting their efficiency in FIR filtering but highlighting resource constraints compared to FPGAs. We propose future work to include DSPs (e.g., TI C6000 series) in benchmarks (Page 14), supported by reference [30], to provide a more comprehensive evaluation of digital processing platforms.

We believe these revisions enhance the scientific rigor and focus of the manuscript. We are grateful for your insightful comments and look forward to your further feedback.

Sincerely,

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The paper has been revised well, and it can be published.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

Thank you for answering to all of my comments!

Best regards
