Communication
Peer-Review Record

A Hybrid GPU and CPU Parallel Computing Method to Accelerate Millimeter-Wave Imaging

Electronics 2023, 12(4), 840; https://doi.org/10.3390/electronics12040840
by Li Ding 1,2,†, Zhaomiao Dong 1,†, Huagang He 1 and Qibin Zheng 1,2,*
Submission received: 11 October 2022 / Revised: 28 January 2023 / Accepted: 1 February 2023 / Published: 7 February 2023
(This article belongs to the Special Issue High-Performance Computing and Its Applications)

Round 1

Reviewer 1 Report

The manuscript is well organized and its contribution to the literature is good, but some corrections would make it better.

1. The conclusions should be clearer, explaining the novelty of the results in more depth.

2. The references should be updated with more recent studies.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents a CPU-GPU hybrid implementation of a range migration algorithm for 3D millimeter-wave imaging. The experimental results show that the proposed implementation is faster than a GPU-only implementation, since some parts of the algorithm are not well suited to SIMD-style parallelization on a GPU.

(1) A major concern is that the paper only compares the CPU-GPU approach with the GPU-only approach, and the discussion is not very deep. Do you think the proposed implementation fully utilizes the performance potential of the GPU? In general, a theoretical performance model of the GPU acceleration should be built before the experiments, and the measured performance should be compared against that model.
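For illustration, one common way to build such a model is a roofline-style bound. A minimal sketch follows, using placeholder hardware numbers and an assumed arithmetic intensity, not figures measured on the paper's platform:

#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical device characteristics; replace with the actual GPU's specifications.
    const double peak_gflops = 4400.0;   // assumed peak FP32 throughput, GFLOP/s
    const double peak_bw_gbs = 320.0;    // assumed peak memory bandwidth, GB/s
    const double intensity   = 2.0;      // assumed FLOPs per byte for the kernel of interest

    // A kernel cannot exceed the lower of the compute roof and the bandwidth roof.
    const double bound = std::min(peak_gflops, peak_bw_gbs * intensity);
    printf("attainable throughput: %.1f GFLOP/s\n", bound);
    return 0;
}

Comparing the measured kernel throughput against such a bound would indicate how much of the GPU's potential is actually being used.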

(2) I agree that the CPU-GPU approach is sometimes more efficient than the GPU-only approach, but at the same time, data transfer between the CPU and GPU tends to be a performance bottleneck. This trade-off should be analyzed for your application.
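For illustration, a minimal sketch of how that trade-off could be quantified with CUDA event timers; the kernel, problem size, and data contents below are placeholders, not the paper's code:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void process_kernel(float* d, int n) {          // stand-in for the real imaging step
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1 << 24;                                  // placeholder problem size
    const size_t bytes = n * sizeof(float);
    float* h = (float*)malloc(bytes);                       // contents irrelevant for timing
    float* d;
    cudaMalloc((void**)&d, bytes);

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);        // host-to-device transfer
    cudaEventRecord(t1);
    process_kernel<<<(n + 255) / 256, 256>>>(d, n);         // GPU compute
    cudaEventRecord(t2);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);        // device-to-host transfer
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float h2d, kern, d2h;
    cudaEventElapsedTime(&h2d, t0, t1);
    cudaEventElapsedTime(&kern, t1, t2);
    cudaEventElapsedTime(&d2h, t2, t3);
    printf("H2D %.2f ms, kernel %.2f ms, D2H %.2f ms\n", h2d, kern, d2h);

    cudaFree(d); free(h);
    return 0;
}

Reporting the transfer times alongside the compute times in this way would make the trade-off between the CPU-GPU and GPU-only variants explicit.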

(3) A MATLAB implementation inherently carries significant performance overhead. Is it adequate to discuss performance against a MATLAB implementation rather than a native implementation?

(4) What do the colors of the plots in Figure 7 show?

(5) Why are the results of your implementation not the same as those of the original MATLAB implementation, as shown in Figure 7? What is the algorithmic difference between the two implementations?

(6) Equation (2) in Algorithm 1: s(x', y', k_p) should be s_p(x', y', k_p).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

A Hybrid GPU and CPU Parallel Computing Method to Accelerate Millimeter-Wave Imaging

This paper describes a CPU-GPU hybrid method to accelerate millimeter-wave imaging. The paper is fairly well written.

My major concern with the paper is the experimental section. I don't believe that the baseline is strong (see comments). Moreover, only one platform is used for evaluation.

Comments:

  1. It would be better to add pseudocode rather than just equations. Having a loop form helps to better analyze the memory access patterns and control flow, which is important for GPU optimizations.

  2. What is the impact of placing the CPU-computed parts in global memory as opposed to constant memory?

  3. What was the baseline GPU implementation used for steps 1, 2, and 3? How was it chosen? Was cuSolver used for step 3? Was the LU decomposition from the NVIDIA samples repository used? There are several other libraries out there for CUDA LU (a sketch of what such a library-based baseline could look like is given after this list).

  4. For the GPU-only implementation, what data format was chosen for H?

  5. [Minor] Line 25: It is not fair to call NVIDIA just an artificial intelligence computing company; it is much more than that. I would recommend stopping that sentence at "NVIDIA".
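For reference, a minimal sketch of what a cuSOLVER-based LU baseline could look like; this is not the paper's baseline, only an illustration. The matrix size is a placeholder and the device matrix is assumed to be filled elsewhere:

#include <cstdio>
#include <cuda_runtime.h>
#include <cusolverDn.h>

int main() {
    const int n = 1024;                                   // placeholder dimension
    const size_t bytes = (size_t)n * n * sizeof(double);
    double* dA;
    cudaMalloc((void**)&dA, bytes);                       // assume dA is filled elsewhere

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    int lwork = 0;
    cusolverDnDgetrf_bufferSize(handle, n, n, dA, n, &lwork);

    double* dWork;
    int *dIpiv, *dInfo;
    cudaMalloc((void**)&dWork, lwork * sizeof(double));
    cudaMalloc((void**)&dIpiv, n * sizeof(int));
    cudaMalloc((void**)&dInfo, sizeof(int));

    // In-place LU factorization with partial pivoting: P * A = L * U.
    cusolverDnDgetrf(handle, n, n, dA, n, dWork, dIpiv, dInfo);

    int info = 0;
    cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
    printf("getrf info = %d\n", info);                    // 0 means success

    cudaFree(dA); cudaFree(dWork); cudaFree(dIpiv); cudaFree(dInfo);
    cusolverDnDestroy(handle);
    return 0;
}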

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Unfortunately, the revised version does not address my major concerns.

1. My major concern with the paper is still the experimental section. The baseline still seems weak, and no data were provided to improve confidence in it. Instead of addressing what framework was used for the baseline, the authors replied with how. It is still not clear how the proposed approach would compare to NVIDIA's solution.

   

2. It is not enough to just add pseudocode; it should also be explained. More importantly, the point of pseudocode is to clarify things, yet the current version raises more questions. For example, thread_id in line 1: is it the global thread id or the thread id within a thread block? Line 6 and similar lines should refer to the equations. Line 12: id-1 is not valid for thread_id = 0. That being said, the added pseudocode does help in understanding the paper better.
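To illustrate the two points above (a global thread index and guarding the id-1 access), a minimal placeholder kernel, not the paper's pseudocode, might read:

__global__ void neighbor_kernel(const float* in, float* out, int n) {
    // Global index across all blocks, not the index within a single block.
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;

    // id - 1 would be out of bounds for id == 0, so fall back to the element itself.
    float left = (id > 0) ? in[id - 1] : in[id];
    out[id] = 0.5f * (left + in[id]);
}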

  

3. Constant vs. global memory: the impact is not clear from the description and should be quantified. How much performance gain or loss would you suffer if you did not use constant memory? There are several situations where even code with warp-level broadcast constant reads gives zero or negative performance benefit. Since this is described as a contribution, the impact must be clearly quantified.
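For illustration, a minimal sketch of the two variants that could be timed against each other (all names and sizes here are placeholders, not the paper's code); c_table would be filled with cudaMemcpyToSymbol:

#define TABLE_SIZE 256

__constant__ float c_table[TABLE_SIZE];                   // constant-memory copy of the table

__global__ void use_constant(float* out, int n, int k) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        // Every thread in the warp reads the same address c_table[k],
        // which is the broadcast pattern constant memory is designed for.
        out[id] *= c_table[k];
}

__global__ void use_global(const float* g_table, float* out, int n, int k) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        out[id] *= g_table[k];                            // same data read through global memory
}

// Host side (sketch): upload the table once, then launch either kernel.
//   cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));
//   use_constant<<<blocks, threads>>>(d_out, n, k);

Timing both kernels on the actual coefficient tables, for example with cudaEvent timers or a profiler, would give the quantified gain or loss asked for here.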

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
