Improving the Fast Fourier Transform for Space and Edge Computing Applications with an Efficient In-Place Method
Round 1
Reviewer 1 Report (Previous Reviewer 4)
Comments and Suggestions for Authors
This revised paper has significantly improved in clarity, technical depth, and presentation quality. The authors have addressed most of my previous concerns, particularly in distinguishing their method from existing techniques, enhancing logical coherence, and expanding the discussion on power consumption and SNR improvements. However, a few minor refinements could further strengthen the paper:
- The newly added Table 9 requires a concise quantitative comparison, which would further substantiate the claims. Additionally, it should be supplemented with references to related works from the past five years rather than works more than a decade old.
- The English quality is much improved, but it still has some minor grammatical errors. For example, “the the” in line 211 and line 318 should be corrected to “the”. Please check the entire manuscript as well.
- Subsection 3.4 and Table 5 provide comprehensive power consumption data. It is recommended to further discuss the underlying reasons for the invariance of power consumption with respect to input size.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
This paper introduces a memory-efficient, in-place FFT technique designed for space and edge computing platforms with limited resources.
The authors propose a method where each memory address stores two FFT points.
They also use optimized number formats (FP8, FP16, BFP8, BFP16) to improve performance in terms of execution time, signal-to-noise ratio (SNR), and memory usage.
The proposed method is tested on multiple real devices, including AVR32, Intel Myriad 2, Raspberry Pi Zero 2 W, and a Zynq FPGA-based accelerator.
The method is motivated by space missions such as ERMIS 3, where limited processing and memory resources are a major challenge.
The method improves memory usage, execution time, and SNR.
The work is motivated by real satellite missions.
However, to improve clarity and ensure the paper flows smoothly, I recommend the following:
- The paper needs a dedicated "Related Work" section. Add this early in the paper, before the methods. Include a short description of key references listed in Table 9.
- Please move Table 9 into the new "Related Work" section for better flow.
- The results should include a comparison with at least two papers from Table 9, for example in execution time, SNR, or memory use. This will show how your method improves on others.
- Make sure citations are in numerical order.
- Refer to all tables clearly in the text.
- Check that all cited works are in the reference list.
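The two-points-per-address layout summarized in this report can be illustrated with a minimal sketch. The report does not specify the authors' word format, so everything below is an assumption: each complex point is held as a signed 8-bit real and a signed 8-bit imaginary component, two points share one 32-bit word, and the helpers `pack_pair`, `unpack_pair`, and `first_stage_butterfly` are hypothetical names. Because the first radix-2 stage after bit-reversal pairs adjacent points, one load and one store can then move both operands of a butterfly.

```python
def pack_pair(p0, p1):
    """Pack two complex points, each an (re, im) pair of signed 8-bit ints, into one 32-bit word."""
    word = 0
    for shift, value in zip((0, 8, 16, 24), (p0[0], p0[1], p1[0], p1[1])):
        if not -128 <= value <= 127:
            raise ValueError("component does not fit in a signed 8-bit field")
        word |= (value & 0xFF) << shift  # two's-complement byte placed at its bit offset
    return word


def unpack_pair(word):
    """Inverse of pack_pair: return ((re0, im0), (re1, im1))."""
    def to_signed(byte):
        return byte - 256 if byte & 0x80 else byte

    re0, im0, re1, im1 = (to_signed((word >> s) & 0xFF) for s in (0, 8, 16, 24))
    return (re0, im0), (re1, im1)


def first_stage_butterfly(word):
    """Twiddle-free radix-2 butterfly on the two points held in one word.

    One load (unpack) and one store (pack) cover both operands. Outputs are
    halved with an arithmetic shift, an assumed scaling that keeps results in 8 bits.
    """
    (ar, ai), (br, bi) = unpack_pair(word)
    upper = ((ar + br) >> 1, (ai + bi) >> 1)
    lower = ((ar - br) >> 1, (ai - bi) >> 1)
    return pack_pair(upper, lower)


# Usage: two points stored at one hypothetical address, updated with one load and one store.
memory = [pack_pair((10, -4), (6, 2))]
memory[0] = first_stage_butterfly(memory[0])
print(unpack_pair(memory[0]))  # prints ((8, -1), (2, -3))
```

Later radix-2 stages pair points that are 2, 4, ... positions apart, so a real implementation needs some reordering or address scheme to keep butterfly partners in the same word; how the authors handle this is not described in the report.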
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report (New Reviewer)
Comments and Suggestions for Authors
The paper is well written and very comprehensive in its explanations. The strengths of this work lie in its combination of algorithmic innovation and practical experimentation. The idea of storing two FFT elements per memory address to exploit parallel load/store on single-bank memory systems is elegant. I also liked that you validated this idea on four different platforms, which strengthens the impact of your results. The paper is also well organized.
I have a few minor suggestions that could further improve the paper:
In the Results, you mention measuring power on all devices. It would benefit readers if you briefly explained how power was measured or estimated for each platform (e.g., did you use an external power meter, an on-board sensor, or simulation?). A sentence or two in the Methods or Results section about the power measurement setup would be helpful.
Regarding the placement of Table 9, it currently appears in the Conclusions section. You might consider moving this table into the Discussion section, since that is where you actively compare your method to others.
Overall, the writing is clear. I did notice a few very small issues that could be corrected for polish. In the Introduction, the sentence starting “In the satellites the choice is to either...” could be rephrased for clarity as “In satellites, one either has to forward the data to Earth or add computing resources...” Ensure consistent capitalization: for example, “the Section” with a capital S, when referring back to a section, should be either lowercase “the section” or “Section X”. Check for typos, e.g., repetitions such as “can be be utilized” on page 14.
On another note, since you have a GitHub link for the Float8–Float32 converter tool ([22]), consider also providing, if possible, a link or a statement about the availability of your FFT implementation code or the scripts used for the experiments. If the code is proprietary or part of a project, that is fine, but if it can be shared, letting readers know where to find it (perhaps in the same repository or on a project website) would be a bonus.
In summary, my suggestions are mostly about presentation and minor clarifications. The core content of the paper is solid. Addressing the above points would polish the manuscript and does not require substantial work. Keep up the good work!
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
Please refer to the attachment.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
- This paper proposes an efficient in-place FFT method specifically designed for edge computing devices. The authors introduce a novel way to store two FFT elements per memory address, enabling parallel load/store operations. They also optimize the use of 8-bit and 16-bit floating-point (FP) and block floating-point (BFP) representations. Results show significant improvements in memory usage, execution time, and SNR.
The authors have made significant progress in revising the manuscript, addressing most of the concerns raised in the first review.
- This is a very concise and well-written article.
- Conclusion is concise and well supported.
- The bibliography is adequate in breadth and depth of coverage.
- The optimized BFP strategy leads to strong SNR improvements using minimal resources.
However, the following issues remain unresolved.
- The paper would benefit from a more intuitive example, such as a step-by-step FFT example with 8 input points.
- Including even a simple comparison to FFTW or CMSIS-DSP would be valuable.
- The study only considers BFP and FP representations; alternative formats such as posits are not discussed.
- Adding a short note on potential challenges for larger datasets, including scalability to larger FFT sizes, would be helpful.
- A deeper FPGA resource usage breakdown (e.g., LUT, BRAM, DSP utilization) would make the analysis more complete.
Overall, the paper can be accepted after minor revisions addressing the points mentioned above.
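The request above for a step-by-step 8-point example can be approached with a textbook iterative radix-2 decimation-in-time FFT, sketched below in Python. This is a generic in-place algorithm, not the authors' two-points-per-address method; the function names are hypothetical, and the optional `verbose` flag prints the buffer after each butterfly stage so the three stages of an 8-point transform can be followed.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x in place so that element i moves to the bit-reversed index of i."""
    n = len(x)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:      # propagate the carry of a reversed-order increment
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:           # swap each pair only once
            x[i], x[j] = x[j], x[i]


def fft_in_place(x, verbose=False):
    """Iterative radix-2 decimation-in-time FFT that overwrites x with its DFT."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    bit_reverse_permute(x)
    size = 2
    while size <= n:                               # log2(n) stages: sizes 2, 4, ..., n
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)    # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                a = x[start + k]
                b = x[start + k + half] * w
                x[start + k] = a + b               # upper butterfly output
                x[start + k + half] = a - b        # lower butterfly output
                w *= step
        if verbose:
            print(f"after size-{size} stage:",
                  [complex(round(v.real, 3), round(v.imag, 3)) for v in x])
        size *= 2
    return x


samples = [complex(v) for v in (1, 2, 3, 4, 0, 0, 0, 0)]
fft_in_place(samples, verbose=True)   # prints the buffer after each of the 3 stages
```

Every butterfly writes its two outputs back into the two slots it read, so no second N-point buffer is needed; this is the property that makes in-place methods attractive on memory-constrained devices.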
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper has some weaknesses: the novelty is unclear, there are English issues, some references are inadequate, the introduction must be improved, the related work section must be enhanced, the experimental evaluation must be improved, and the description of the method needs some improvements. Some sentences are too long. Generally, it is better to write short sentences with one idea per sentence.
Some figures are blurry. The authors should either use higher-resolution figures or redo them as vector graphics. All references should be cited in the paper. The authors should add more references in the introduction to support the claims. The authors should add a table that compares the key characteristics of prior work to highlight their differences and limitations. The authors may also consider adding a row to the table describing the proposed solution. The description of the proposed solution should be more formal. There is not enough discussion of the experimental results. The experiments have been carried out with only a few datasets. It is necessary to add more datasets to make the experiments more convincing.
N/A
Reviewer 2 Report
Comments and Suggestions for Authors
The paper introduces a novel in-place method for optimizing the FFT for space and edge computing applications. The authors propose a technique that uses BFP configurations and radix-2 FFT to enhance SNR, memory utilization, and power efficiency. Validation was conducted on multiple devices, including Atmel AVR32, Intel Movidius Myriad 2, Raspberry Pi Zero 2 W, and Xilinx Zynq 7000 FPGA.
- The proposed method addresses significant challenges in resource-constrained environments, such as satellites and edge devices.
- Efficient use of BFP and memory organization allows simultaneous storage and computation of FFT points, reducing resource demands.
- The paper evaluates the approach across multiple platforms, ensuring the results are robust and broadly applicable.
- The approach demonstrates consistent improvement in SNR, making it attractive for high-precision applications.
- Pseudocode and algorithmic descriptions are clear and replicable, enabling other researchers to validate or extend the work.
The paper’s novelty lies in its in-place FFT implementation tailored to specific hardware constraints, providing:
- A reduction in memory requirements by a factor of up to eight compared to out-of-place methods.
- Enhanced execution times for low-resource processors while maintaining or improving SNR.
The authors may wish to reread the paper thoroughly, improve its flow and coherence, and respond to the following issues:
- The authors may also add a new table to the related work section describing existing models and their specifications using relevant attributes (e.g., year of publication, contribution, etc.).
- Line 426: "the the data" should be "the data."
- While prior work is cited, a detailed comparison table highlighting key differences in performance, memory utilization, and SNR improvement is missing.
- Power efficiency metrics are discussed only for a subset of devices, leaving room for speculation about broader applicability.
- Incorporate diagrams for the memory organization and FFT data flow to enhance readability.
- Add a comprehensive table comparing the proposed method to existing techniques across metrics like execution time, memory utilization, and SNR.
- Provide power consumption data for all tested platforms to highlight energy efficiency consistently.
- Test the method on additional hardware configurations, such as GPUs or other FPGAs, to strengthen its generalizability.
Comments on the Quality of English Language
N/A
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes an in-place technique that enhances the efficiency of the radix-2 FFT. The technique is validated by execution on the space-proven Atmel AVR32 and the Vision Processing Unit (VPU) Intel Movidius Myriad 2, as well as on the edge device Raspberry Pi Zero 2 W and a low-cost accelerator developed on the Xilinx Zynq 7000 XC7Z007S Field Programmable Gate Array (FPGA). This is a very interesting and useful study. There are still some issues that need to be addressed before the paper is accepted.
The authors introduce many other improved FFT methods in the introduction but do not analyze their shortcomings. What are the advantages of your method over these methods?
The paper lacks performance comparison experiments between the new method and other methods.
Have the authors conducted prolonged testing, and how reliable is this method?
The reference format is not standardized. Some references have bolded years, while others do not. It is recommended to check and revise them.
Reviewer 4 Report
Comments and Suggestions for Authors
This paper proposes an efficient FFT method based on the in-place technique, optimized for space and edge computing applications, which is innovative and practical. The experimental part covers a variety of hardware platforms, validating the effectiveness of the proposed method. It improves the memory utilization, the execution time, and the power consumption. However, the paper requires further enhancement in terms of the clarity of technical details and the depth of experimental analysis. I have some questions, which are listed as follows:
1. The in-place FFT method combined with an efficient block floating-point (BFP) configuration excels in resource utilization and SNR performance. However, further clarification is needed to distinguish the core differences between the proposed method and existing in-place FFT techniques.
2. There are some spelling mistakes in the paper. For example, 'Modivius Myriad 2 Accelerator' should be corrected to 'Movidius Myriad 2 Accelerator' in line 224 and line 225. Terms (e.g., “space-proven”) should be clearly defined upon first usage to avoid potential confusion for the reader. Please check the entire manuscript as well.
3. It is recommended to enhance logical coherence through the use of subsections or transitional sentences, such as clarifying the relationship between data arrangement and butterfly operations in the methods section.
4. The reference citations primarily rely on earlier studies and should be supplemented with a discussion of related work from the past five years, including FPGA-based FFT accelerators and BFP applications, to ensure a comprehensive research background.
5. The paper compares the proposed method only with the FP configuration. It is recommended to include a comparison with other in-place FFT implementations to better highlight the performance advantages.
6. To strengthen the conclusions, additional analysis of power consumption is recommended, particularly with regard to varying input sizes and differences between hardware platforms.
7. The significant improvement in SNR due to BFP should be analyzed in depth, specifically examining whether it results from increased fractional bit retention or dynamic range optimization.