
Open Access Article

Peer-Review Record

A CNN-Based Indoor Positioning Algorithm for Dark Environments: Integrating Local Binary Patterns and Fast Fourier Transform with the MC4L-IMU Device

Appl. Sci. 2025, 15(7), 4043; https://doi.org/10.3390/app15074043

by Nan Yin, Yuxiang Sun and Jae-Soo Kim *

Reviewer 1: Anonymous

Reviewer 2: Petar Rajkovic

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Reviewer 5: Anonymous


Submission received: 9 March 2025 / Revised: 31 March 2025 / Accepted: 3 April 2025 / Published: 7 April 2025

(This article belongs to the Special Issue Advanced Convolutional Neural Network (CNN) Technology in Object Detection and Data Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

An essential point in any written work is its title, and unfortunately, the use of acronyms is not a good policy to follow, in my view. Therefore, it is absolutely necessary to change the title to something truly descriptive, something that invites the reader to take an interest in the article.

Only references 1, 2, and 3 are included in the introduction, which I believe is not sufficient. To avoid rewriting the introduction, and given that the following section, Related Works, contains valuable information, I propose merging both sections under the title Introduction. I would also include a description of how the article is organized at the end.

Since this is an article that presents experimental results, I consider it absolutely necessary for the authors to make an adequate comparison with the declared Related Works. What makes this method better than those declared? With respect to the state of the art, what improvements are proposed? Therefore, a discussion of results section is absolutely necessary.

The conclusions, while strictly related to the work developed, need to be expanded with a section discussing the results; a future version should therefore include the topics covered in that new section.

Author Response

Comment 1:

An essential point in any written work is its title, and unfortunately, the use of acronyms is not a good policy to follow, in my view. Therefore, it is necessary to change the title to something truly descriptive, something that invites the reader to take an interest in the article.

Response 1:

Thank you for this important suggestion. We fully agree that a clear and descriptive title improves readability and appeal. Accordingly, we have revised the title as follows:

"A CNN-Based Indoor Positioning Algorithm for Dark Environments: Integrating Local Binary Pattern and Fast Fourier Transform with MC4L-IMU Device"

Comment 2:

Response 2:

Agreed. The Introduction and Related Studies sections have been combined into one, which now ends with a description of how the article is organized (lines 33-136).

Comment 3:

Response 3:

Thank you for pointing this out. We totally agree with your suggestion and have therefore added an Experimental Results Discussion in Section 5.6, which makes an adequate comparison with the declared Related Works (lines 668-685).

Comment 4:

Response 4:

Agreed! We have added the discussion of experimental results to the Conclusions and Future Work section (lines 687-771).

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents a combined, novel approach for indoor positioning. The research is important and the results are applicable; the content is therefore of the highest possible significance.

The abstract is well-balanced between methodology and result presentation and gives an excellent brief overview.

The introduction positions the paper well but lacks references that would establish a proper connection to the technology (e.g., in lines 32-47).

Figure 2 is not quite readable (at least on my monitor). To be more effective, it should be presented differently.

The related work is well-written, and the references are up to date. If possible, it would be good to draw a parallel between the two aspects of the project (indoor positioning and obstacle recognition) and give an opinion on whether a single technique could serve as a solution in both cases.

Section 3 lacks references to its starting points and calculation examples. The same applies to Section 4.

Figure 3 is hard to read. Perhaps its blocks should be separate images, complemented by a diagram showing the building blocks.

Consider adding a diagram to explain the presented algorithm better (figure 5).

The method itself is adequately explained.

Comparison with state-of-the-art is missing, and this is the weakest point of the paper.

The conclusion and future work are rather short and do not provide a good correlation between the state-of-the-art and the presented method.

Author Response

Comment 1:

The introduction positions the paper well but lacks references that would establish a proper connection to the technology (e.g., in lines 32-47).

Response 1:

We totally agree with your point. We have added references 12, 13, 14, and 15 to establish a proper connection to the technology. We have also moved this part out of the introduction into a separate Section 2 (Structure and connection for MC4L-IMU Device), shown in lines 138-173.

Comment 2:

Figure 2 is not quite readable (at least on my monitor). To be more effective, it should be presented differently. (lines 108-110)

Response 2:

Thanks for your suggestion. I have resized the image and hope it will be clearer.

Comment 3:

Response 3:

We totally agree with your point. We have added a paragraph to the introduction and related studies section discussing whether a single technique could serve as a solution in both cases (lines 73-87).

Comment 4:

Section 3 lacks references to its starting points and calculation examples. The same applies to Section 4.

Response 4:

We added references 17 to 21 to Section 3, citing all key techniques such as the FFT, the attention mechanism, and adaptive thresholding. We also added an EKF Derivation and Parameter Settings subsection (Section 4.1.6, lines 440 to 467).

Comment 5:

Figure 3 is hard to read. Perhaps its blocks should be separate images, complemented by a diagram showing the building blocks.

Response 5:

Thank you very much for your thoughtful suggestion regarding Figure 3. We agree that clarity is important, and we appreciate your perspective. At the same time, we feel that presenting the building blocks together in a single figure helps convey the overall structure and relationships more cohesively.

Comment 6:

Consider adding a diagram to explain the presented algorithm better (figure 5).

Response 6:

Good idea. We have added a flowchart (Figure 5) to explain the presented algorithm (lines 516-517).

Comment 7:

The method itself is adequately explained. Comparison with state-of-the-art is missing, and this is the weakest point of the paper.

Response 7:

Agreed! To address this issue, we added an Experimental Results Discussion in Section 5.6, with further comparison and analysis against state-of-the-art algorithms (lines 667-685).

Comment 8:

The conclusion and future work are rather short and do not provide a good correlation between the state-of-the-art and the presented method.

Response 8:

We have rewritten the Conclusion and Future Work section to provide a good correlation between the state of the art and the presented method (lines 687-771).

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Authors should correct:

WRITING STYLE
1. It is not common practice to start a section/subsection or paragraph with "Figure...". For example, line 208 begins with "Figure 3 illustrates"... There should be at least several sentences before a figure is introduced.

2. Position of images: Figure 3 appears too late (line 398), although it is first mentioned in the text at line 209. It is common practice to place a figure next to the text that mentions and describes it. Usually, the text introducing a figure comes immediately before the figure.

CONTENT
1. Since the authors describe the created device (lines 32 to 47), it would be beneficial to see a photograph of the device, not just a simplified schematic of its structure.

2. The device consists of several integrated modules. It is common practice to provide a technical schematic or diagram showing the modules and their connections. Currently, Figure 1 shows only the components; their integration is not presented.

3. The structure of the introduction should be changed. The authors provide details about the device at the very beginning of the introduction, whereas these should be moved to the Materials and Methods section. The introduction should follow the usual structure, by paragraphs: motivation, a brief review of the current literature and the research gap, the contribution, and an outline of the rest of the paper.

4. Related works: too much text without proper citations from the reference list. Table 1, which classifies indoor positioning technologies with their advantages and weaknesses (which should be renamed "disadvantages"), has no citations from the literature.

5. Performance evaluation - this section should be split into at least 4 sections - "Performance evaluation methods" - to explain statistical methods or metrics that will be used in the experiment to evaluate, "Experimental setup" - to describe the software and hardware tools that implemented the algorithm and that were used in the experiment, sample data collection and sample characteristics, "Results" - to present results from experiment and performance evaluation, currently presented under "Performance evaluation section", and if possible, to compare with existing positioning algorithms, as announced in line 104, "Discussion" - to summarize conclusions from results and compare to previous approaches.

Comments on the Quality of English Language

1. The authors should choose their words carefully. For example, in line 183 (Table 1, last column), "weakness" is associated with people, while "disadvantages" is more common in technical contexts.

Author Response

WRITING STYLE
Comment 1.

It is not common practice to start a section/subsection or paragraph with "Figure...". For example, line 208 begins with "Figure 3 illustrates"... There should be at least several sentences before a figure is introduced.

Response 1:

Thank you for your suggestion. Every reference to a figure that opened a paragraph has been moved so that it now appears in the middle of the paragraph rather than at the beginning.

Comment 2.

Position of images: Figure 3 appears too late (line 398), although it is first mentioned in the text at line 209. It is common practice to place a figure next to the text that mentions and describes it. Usually, the text introducing a figure comes immediately before the figure.

Response 2:

Agreed! Figure 3 has been repositioned so that it appears next to the text that refers to it.

CONTENT
Comment 3.

Since the authors describe the created device (lines 32 to 47), it would be beneficial to see a photograph of the device, not just a simplified schematic of its structure.

The device consists of several integrated modules. It is common practice to provide a technical schematic or diagram showing the modules and their connections. Currently, Figure 1 shows only the components; their integration is not presented.

Response 3:

We completely agree with your point of view. We have moved the description of the equipment's composition and connections out of the introduction into Section 2, which also describes how the components are connected (lines 158 to 173).

Comment 4.

The structure of the introduction should be changed. The authors provide details about the device at the very beginning of the introduction, whereas these should be moved to the Materials and Methods section. The introduction should follow the usual structure, by paragraphs: motivation, a brief review of the current literature and the research gap, the contribution, and an outline of the rest of the paper.

Response 4:

Your point is very helpful. As you requested, we have merged the introduction and related studies sections and moved the equipment description from the introduction into a separate Section 2 (lines 158-173).

Comment 5. Related works: too much text without proper citations from the reference list. Table 1, which classifies indoor positioning technologies with their advantages and weaknesses (which should be renamed "disadvantages"), has no citations from the literature.

Response 5:

We sincerely thank the reviewer for pointing this out. In the revised manuscript, we have addressed this comment as follows:

We reviewed the entire “Related Works” section and have added appropriate citations throughout, especially where previous technologies, algorithms, or systems are discussed.
The title of Table 1 has been revised from “advantages and weaknesses” to “advantages and disadvantages” as suggested.

Comment 6. Performance evaluation - this section should be split into at least 4 sections - "Performance evaluation methods" - to explain statistical methods or metrics that will be used in the experiment to evaluate, "Experimental setup" - to describe the software and hardware tools that implemented the algorithm and that were used in the experiment, sample data collection and sample characteristics, "Results" - to present results from experiment and performance evaluation, currently presented under "Performance evaluation section", and if possible, to compare with existing positioning algorithms, as announced in line 104, "Discussion" - to summarize conclusions from results and compare to previous approaches.

Response 6:

We sincerely appreciate the reviewer’s thoughtful suggestion regarding the restructuring of the “Performance Evaluation” section. In this revision, we have added a new “Discussion” section, which provides an in-depth analysis of the experimental results and compares our method with existing approaches in indoor positioning and obstacle recognition (lines 667-685).

Author Response File: Author Response.docx

Reviewer 4 Report

Comments and Suggestions for Authors

Title: LBP-FFT-CNNs: A Convolutional Neural Network Combined with Local Binary Pattern and Fast Fourier Transform based on MC4L-IMU Device for Indoor Positioning

The paper presents LBP-FFT-CNNs, a novel indoor positioning system combining Local Binary Patterns (LBP), Fast Fourier Transform (FFT), and Convolutional Neural Networks (CNNs) with an IMU for enhanced obstacle recognition and positioning accuracy in dark environments.
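As a hedged illustration of the first ingredient named above, the sketch below computes the basic 3 × 3 (8-neighbour) Local Binary Pattern code for every interior pixel of a grayscale image with NumPy. The clockwise neighbour order, the `>=` comparison, and the function name `lbp_8neighbour` are assumptions for illustration; the authors' implementation may use a different radius, sampling pattern, or uniform-pattern mapping.

```python
import numpy as np

def lbp_8neighbour(img):
    """Basic 3x3 LBP: each interior pixel receives an 8-bit code.

    Bit k is set when the k-th neighbour (clockwise from top-left)
    is >= the centre pixel. Border pixels are dropped, so the output
    has shape (H-2, W-2).
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # (row, col) offsets of the 8 neighbours, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int32)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes += (neighbour >= centre).astype(np.int32) << bit
    return codes.astype(np.uint8)

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 8, 3]])
print(lbp_8neighbour(patch))   # [[42]]
```

In the example, only the neighbours 9, 7, and 8 are at least the centre value 6, setting bits 1, 3, and 5 (2 + 8 + 32 = 42). Because the code depends only on the sign of local differences, it is unchanged by a uniform brightness offset, which is one reason LBP is attractive under poor lighting.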

The paper is well-structured and presents a novel and well-justified approach for indoor positioning and obstacle recognition by combining LBP, FFT, and CNNs with an IMU.

The abstract is well-written. However, it can be improved by briefly mentioning the experimental setup or environment to give readers a better context of the results.

The introduction could briefly mention the significance of the IMU integration.

The related work section could benefit from a brief critique of the limitations of existing methods.

The section on the LBP-FFT-CNNs model could include a brief discussion of the computational complexity of the proposed model. The use of FFT and self-attention mechanisms can increase computational complexity.

The performance evaluation is comprehensive and well-presented. However, the section could discuss the real-time performance of the system, especially given the computational demands of FFT and CNNs.

Author Response

Comment 1: The abstract is well-written. However, it can be improved by briefly mentioning the experimental setup or environment to give readers a better context of the results.

Response 1:

Agreed! We have added a description of the experimental environment in the introduction section (lines 26-28), as follows:

To evaluate the robustness of the model and whether the MC4L-IMU works reliably under different conditions, the experiments were conducted in a controlled indoor environment with different obstacle materials and lighting conditions.

Comment 2: The introduction could briefly mention the significance of the IMU integration.

Response 2:

That's a good idea. We have briefly described the benefits of IMU integration in the introduction section (lines 81-86):

The IMU module, which includes an accelerometer, gyroscope, and magnetometer, provides acceleration, angular velocity, and magnetic field data. Fusing these data with visual information enhances system performance in low-light, occluded, or visually degraded environments. Moreover, inertial-visual fusion techniques help mitigate sensor drift in localization while reducing noise and motion blur in recognition tasks. These shared requirements and synergistic techniques motivate the development of unified algorithms that can serve both functions.
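The drift-mitigation argument can be illustrated with a complementary filter, a standard and much simpler fusion scheme than the EKF used in the paper. An integrated gyroscope rate is accurate over short intervals but drifts, while an absolute reference (here an accelerometer-derived tilt angle; in inertial-visual fusion a visual estimate plays the same role) is noisy but drift-free. The blend weight `alpha = 0.98`, the 10 ms sample period, and the simulated constant gyro bias are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def complementary_filter(gyro_rate, ref_angle, dt, alpha=0.98, angle0=0.0):
    """Fuse a drifting rate sensor with a noisy but drift-free angle reference.

    gyro_rate: angular velocity samples (rad/s)
    ref_angle: absolute angle samples (rad), e.g. accelerometer tilt
    Returns the fused angle estimate at each step.
    """
    angle = angle0
    fused = np.empty(len(gyro_rate))
    for k, (w, ref) in enumerate(zip(gyro_rate, ref_angle)):
        # High-pass the integrated gyro, low-pass the absolute reference
        angle = alpha * (angle + w * dt) + (1.0 - alpha) * ref
        fused[k] = angle
    return fused

# Stationary sensor: true angle 0, gyro has a constant bias of 0.01 rad/s
dt, n = 0.01, 2000
gyro = np.full(n, 0.01)
rng = np.random.default_rng(0)
accel_tilt = rng.normal(0.0, 0.05, n)

gyro_only = np.cumsum(gyro * dt)   # pure integration drifts linearly
fused = complementary_filter(gyro, accel_tilt, dt)
print(f"gyro-only drift: {gyro_only[-1]:.3f} rad, fused: {fused[-1]:.3f} rad")
```

After 20 s the integrated bias alone has accumulated 0.2 rad, whereas the fused estimate stays bounded near zero because the reference term continually pulls it back.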

Comment 3: The related work section could benefit from a brief critique of the limitations of existing methods.

Response 3:

Thank you for your suggestion. To make the article clearer, we have merged the introduction and related research sections, and the beginning of the article now includes a critique of the limitations of existing methods (lines 34-72).

Comment 4: The section on LBP-FFT-CNNs Model can have a brief discussion on the computational complexity of the proposed model. The use of FFT and self-attention mechanisms can increase computational complexity.

Response 4:

Thank you for pointing out this shortcoming of the article. To address it, we added an analysis of computational complexity in Section 3.2 (lines 224 to 238).

3.2 Computational Complexity Analysis

While the integration of FFT and self-attention mechanisms enhances feature representation and model robustness, it inevitably introduces additional computational overhead. FFT transforms spatial image data into the frequency domain, which involves O(N log N) operations per image, where N is the number of pixels. However, because only selected frequency components are retained, the dimensionality is reduced before the data enter the convolutional layers, partially offsetting the added cost.
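To make the cost argument concrete, the sketch below applies NumPy's 2-D FFT and then keeps only a centred k × k block of low-frequency magnitudes, so the array handed to later layers shrinks from H × W values to k × k. The crop size `k = 16`, the use of log-magnitudes, and the centre-crop retention rule are assumptions for illustration; the record does not state which frequency components the authors retain.

```python
import numpy as np

def low_freq_features(img, k=16):
    """Return the k x k block of lowest-frequency FFT magnitudes of img.

    np.fft.fft2 costs O(N log N) for N = H * W pixels; the crop then
    reduces what downstream layers see from H * W values to k * k.
    """
    h, w = img.shape
    if k > min(h, w):
        raise ValueError("k must not exceed the image size")
    spectrum = np.fft.fftshift(np.fft.fft2(img))   # DC moved to the centre
    cy, cx = h // 2, w // 2
    half = k // 2
    block = spectrum[cy - half:cy - half + k, cx - half:cx - half + k]
    return np.log1p(np.abs(block))                  # compress dynamic range

img = np.random.default_rng(0).random((128, 128))
feats = low_freq_features(img, k=16)
print(img.size, "->", feats.size)   # 16384 -> 256
```

Here a 128 × 128 input (16,384 values) is reduced to 256 frequency features, a 64-fold reduction that offsets part of the transform's cost in the layers that follow.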

The self-attention module, designed to capture global dependencies across feature maps, requires O((H·W)²) time and space for feature maps of size H × W. While more computationally demanding than standard convolution, its inclusion significantly improves the model’s ability to distinguish between background textures and obstacles under challenging conditions.
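The quadratic term comes from the attention matrix itself. In the single-head sketch below, a C × H × W feature map is flattened into n = H·W tokens and the full n × n score matrix is materialised, which is where the quadratic time and memory arise. The random projection matrices, the head dimension `d`, and the function name are placeholders; the heads, projections, and placement of the paper's module are not specified in this record.

```python
import numpy as np

def self_attention(feat, wq, wk, wv):
    """Single-head scaled dot-product self-attention over spatial positions.

    feat: (C, H, W) feature map; wq, wk, wv: (C, d) projection matrices.
    Returns the attended map reshaped to (d, H, W) and the (n, n)
    attention weights, where n = H * W.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (n, C): one token per pixel
    q, k, v = x @ wq, x @ wk, x @ wv             # (n, d) each
    scores = q @ k.T / np.sqrt(q.shape[1])       # (n, n): the quadratic term
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ v                            # (n, d)
    return out.T.reshape(-1, h, w), weights

rng = np.random.default_rng(0)
C, H, W, d = 8, 16, 16, 4
feat = rng.standard_normal((C, H, W))
wq, wk, wv = (rng.standard_normal((C, d)) for _ in range(3))
out, attn = self_attention(feat, wq, wk, wv)
print(out.shape, attn.shape)   # (4, 16, 16) (256, 256)
```

Doubling the feature map's side length quadruples n and multiplies the size of `attn` by 16, which is why applying attention only at selected, lower-resolution stages keeps the cost manageable.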

To ensure practical feasibility, we apply both FFT and self-attention only at specific stages in the network. Additionally, experiments were conducted on a Raspberry Pi 4 platform to validate real-time performance, confirming that the processing speed remains acceptable for embedded applications in indoor positioning.

Comment 5: The performance evaluation is comprehensive and well-presented. However, the section could discuss the real-time performance of the system, especially given the computational demands of FFT and CNNs.

Response 5:

This is a very useful suggestion. We therefore added a new set of experiments to verify the memory usage and latency of the system in a real environment. This part is reproduced below and appears in lines 645-665.

To be closer to the actual usage scenario, we randomly set up obstacles in a two-meter-wide corridor and conducted the experiment in a completely dark environment. Figure 12 presents the memory usage and inference latency of the proposed LBP-FFT-CNNs model over a 2-hour period, sampled every 5 minutes under simulated high-load and variable system conditions. Compared to baseline conditions, both metrics exhibit more pronounced fluctuations. The memory usage varies within the range of approximately 1280 MB to 1320 MB, reflecting periodic increases in background memory consumption or model memory reallocation. Despite these variations, the model maintains stability without exceeding the critical 1.4 GB threshold, ensuring continued operation within the constraints of the Raspberry Pi 4B.

Inference latency fluctuates between 145 ms and 160 ms, a significant increase relative to nominal performance (~45 ms). These elevated values simulate worst-case scenarios such as concurrent sensor data processing or intermittent I/O activity. Importantly, the latency remains within acceptable bounds for applications with moderate real-time constraints.

Overall, the system demonstrates robust behavior under resource-constrained conditions, confirming the model’s viability for deployment in dynamic embedded environments.
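A measurement loop of the kind described (latency and memory sampled at fixed intervals) can be sketched with the Python standard library alone. The `infer` callable stands in for the model's forward pass and is hypothetical. `tracemalloc` tracks only Python-heap allocations, so on a Raspberry Pi a process-level figure such as the resident set size reported by the OS would be closer to the memory curve described above. The interval defaults to zero so the example runs instantly; a 2-hour run sampled every 5 minutes corresponds to 2 × 60 / 5 = 24 intervals, and the 1400 MB limit mirrors the 1.4 GB threshold quoted in the text.

```python
import statistics
import time
import tracemalloc

def profile_inference(infer, n_samples=24, interval_s=0.0, mem_limit_mb=1400.0):
    """Sample inference latency and Python-heap memory at fixed intervals.

    infer: zero-argument callable wrapping one forward pass (hypothetical).
    Returns summary statistics; 'within_limit' reports whether the peak
    traced memory stayed below mem_limit_mb.
    """
    latencies_ms, mem_mb = [], []
    tracemalloc.start()
    try:
        for _ in range(n_samples):
            t0 = time.perf_counter()
            infer()
            latencies_ms.append((time.perf_counter() - t0) * 1e3)
            current, _ = tracemalloc.get_traced_memory()
            mem_mb.append(current / 2**20)
            time.sleep(interval_s)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "latency_ms_min": min(latencies_ms),
        "latency_ms_mean": statistics.mean(latencies_ms),
        "latency_ms_max": max(latencies_ms),
        "mem_mb_max": max(mem_mb),
        "within_limit": peak / 2**20 < mem_limit_mb,
    }

# Stand-in workload; replace with the real model call
stats = profile_inference(lambda: sum(i * i for i in range(50_000)))
print(stats)
```

Reporting minimum, mean, and maximum latency rather than a single average makes worst-case excursions, like the 145-160 ms band above, visible next to the nominal figure.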

Author Response File: Author Response.docx

Reviewer 5 Report

Comments and Suggestions for Authors

Summary:

This is a manuscript on indoor positioning in dark places with no light, where visual cues are reduced (affecting the accuracy of vision-based localization systems). The authors had previously worked on LRA to differentiate obstacles from the environment (walls) under limited visibility, experiencing issues with similar textures (many obstacle recognition systems suffer from this problem). In this case, the authors propose an approach (LBP-FFT-CNN) that fuses feature extraction (with local binary patterns, LBP), frequency-domain analysis (Fourier transform, FFT), and deep learning (CNN with self-attention) with inertial data (IMU). After experimentation, the authors claim to have achieved up to 96% obstacle recognition accuracy, a good coefficient of determination, and a very low PSI.

Broad comments:

Strengths:

The research focuses on an important topic (indoor navigation) that is not completely solved in the literature (especially textures in dark environments).
The combination of FFT with self-attention in the CNN is interesting and seems to overcome many of the issues related to local textures and global context.
The text is easy to follow for any reader.

Weaknesses:

A description of the computational overhead associated with the system would be interesting, since these solutions are most of the time part of an embedded system.
The scenario used for validation is suitable for the first laboratory validations, but a more realistic scenario would be completely conclusive.

Specific comments:

Major issues:

Please, elaborate the formulation of the inclusion of the EKF (equations 19 and 23): the mathematical explanation and the reason for the EKF parameters.
Please, consider having the text reviewed by a professional translator: some sentences are difficult to read (line 358), some lack articles (line 329, line 455 and many others), and some sound too colloquial (lines 324-325, for instance).
Please, consider elaborating section 6 a little more: now, it is just a summary of the abstract.

Minor issues:

Please, explain the rationale for using 10 neurons (line 241)
Please, add a description of the overhead that the proposed system implies in terms of processing time, memory consumption and power consumption.
Please, provide a citation for the claim that adaptive thresholding adapts better to changes in lighting conditions (line 256).
Please, consider testing your proposed algorithm in a more realistic indoor environment.
Maybe reference [11], used to showcase the use of signal-strength technology for positioning, is a little far-fetched.
Please, explain why 130 consecutive positions/obstacles (line 457)
Please, follow the guide to authors, especially regarding the use of italics (just a few examples are in lines 348, 349, and 518-524, but there are many others throughout the text).
Trailing space in line 235 (“…followed by a …”)
Please, rewrite the sentence in line 309 (the “.” is misleading).
Typo in line 243 (“bet-ween”)

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Please, consider having the text reviewed by a professional translator: some sentences are difficult to read (line 358), some lack articles (line 329, line 455 and many others), and some sound too colloquial (lines 324-325, for instance).

Author Response

Weaknesses:

Comment 1: A description of the computational overhead associated with the system would be interesting, since these solutions are most of the time part of an embedded system. The scenario used for validation is suitable for the first laboratory validations, but a more realistic scenario would be completely conclusive.

Response 1:

This is a very useful suggestion. We therefore added a new set of real-world experiments to verify the memory usage and latency of the system in a real environment; this part appears in lines 645-665.

Specific comments:

Major issues:

Comment 1: Please, elaborate the formulation of the inclusion of the EKF (equations 19 and 23): the mathematical explanation and the reason for the EKF parameters.

Response 1:

Thank you for your suggestion. To explain the EKF algorithm more clearly, we have added a new subsection, EKF Derivation and Parameter Settings (lines 439-467). We hope it helps readers understand how the EKF works in our system.
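The derivation itself is not reproduced in this record. For orientation, the generic EKF predict/update cycle that such a subsection builds on is sketched below. The state, motion model, measurement model, and the noise covariances Q and R are illustrative assumptions (a 1-D constant-velocity target observed through its position), not the settings chosen in the paper. With a linear measurement function the EKF reduces to the ordinary Kalman filter, but the code path is the same one a nonlinear model would take.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One extended Kalman filter predict/update cycle.

    x, P: prior state mean and covariance
    z: measurement vector
    f, h: (possibly nonlinear) motion and measurement functions
    F_jac, H_jac: their Jacobians, evaluated at the current estimate
    Q, R: process and measurement noise covariances
    """
    # Predict: propagate the mean through f, the covariance through its Jacobian
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearise h around the predicted state
    H = H_jac(x_pred)
    y = z - h(x_pred)                       # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical 1-D example: state = [position, velocity], position measured
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
f = lambda x: A @ x
F_jac = lambda x: A
h = lambda x: x[:1]
H_jac = lambda x: np.array([[1.0, 0.0]])
Q = np.diag([1e-4, 1e-3])     # assumed process noise
R = np.array([[0.05 ** 2]])   # assumed measurement noise (5 cm std)

x, P = np.zeros(2), np.eye(2)
for z in [0.10, 0.21, 0.29, 0.41, 0.50]:
    x, P = ekf_step(x, P, np.array([z]), f, F_jac, h, H_jac, Q, R)
print(x, np.diag(P))
```

The ratio of Q to R is the main tuning choice: a larger R makes the filter trust its motion model more and smooth the measurements harder, which is the kind of parameter justification the reviewer asked for.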

Comment 2: Please, consider having the text reviewed by a professional translator: some sentences are difficult to read (line 358), some lack articles (line 329, line 455 and many others), and some sound too colloquial (lines 324-325, for instance).

Response 2:

Thank you for your suggestion. We have engaged a specialized translation agency to correct the grammar and wording of the paper.

Comment 3: Please, consider elaborating section 6 a little more: now, it is just a summary of the abstract.

Response 3:

Following your suggestion, we have rewritten the conclusion section from the ground up. The revised version not only summarizes the abstract but also discusses the experimental results and describes future research directions.

Minor issues:

Comment 1: Please, explain the rationale for using 10 neurons (line 241)

Response 1: To verify the effect of the number of neurons on the distance estimation error, we conducted an experiment with the number of neurons set to 4, 8, 10, 16, and 32.
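A sweep like this can be organised as a simple loop over candidate widths that keeps the configuration with the lowest validation error and prefers the smaller network on ties. The `evaluate` callable (train a network with the given number of hidden neurons and return its mean distance-estimation error) is hypothetical; the record reports neither the training procedure nor the resulting errors, so none are assumed here.

```python
from typing import Callable, Dict, Iterable, Tuple

def sweep_hidden_units(
    evaluate: Callable[[int], float],
    candidates: Iterable[int] = (4, 8, 10, 16, 32),
) -> Tuple[int, Dict[int, float]]:
    """Evaluate each hidden-layer width and return the best one.

    evaluate(n) must train/validate a model with n hidden neurons and
    return its mean distance-estimation error (lower is better).
    Ties are broken in favour of the smaller, cheaper network.
    """
    errors = {n: evaluate(n) for n in candidates}
    best = min(errors, key=lambda n: (errors[n], n))
    return best, errors
```

Returning the full `errors` dictionary alongside the winner makes it easy to tabulate or plot the error against width, which is what would justify a choice such as 10 neurons.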

Comment 2: Please, add a description of the overhead that the proposed system implies in terms of processing time, memory consumption and power consumption.

Response 2: This is a very useful suggestion. We therefore added a new set of experiments to verify the memory usage and latency of the system in a real environment; this part appears in lines 645-665.

Comment 3: Please, provide a citation for the claim that adaptive thresholding adapts better to changes in lighting conditions (line 256).

Response 3: We have added references 17 and 18 to support the validity of adaptive thresholding (line 250).
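For readers unfamiliar with the technique, adaptive (local-mean) thresholding compares each pixel with the mean of its own neighbourhood rather than with a single global value, which is why it tolerates uneven or changing illumination. The sketch below uses an integral image so each local mean costs O(1). The window size `win`, the offset `c`, and the function name are illustrative assumptions, not the manuscript's settings.

```python
import numpy as np

def adaptive_mean_threshold(img, win=15, c=5.0):
    """Binarise img: pixel -> 1 if it exceeds (local mean - c), else 0.

    The local mean is taken over a win x win window (clipped at the
    borders) and computed from an integral image.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    r = win // 2
    # Integral image with a leading row and column of zeros
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    rows, cols = np.arange(h), np.arange(w)
    y0 = np.clip(rows - r, 0, h)[:, None]
    y1 = np.clip(rows + r + 1, 0, h)[:, None]
    x0 = np.clip(cols - r, 0, w)[None, :]
    x1 = np.clip(cols + r + 1, 0, w)[None, :]
    window_sum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    area = (y1 - y0) * (x1 - x0)
    local_mean = window_sum / area
    return (img > local_mean - c).astype(np.uint8)

ramp = np.tile(np.linspace(50, 250, 64), (64, 1))   # illumination gradient
img = ramp.copy()
img[20:30, 5:15] -= 40                              # dark patch in the dim region
mask = adaptive_mean_threshold(img)
print(mask[25, 10], mask[5, 10])   # 0 1
```

The dark patch is flagged (0) while the equally dim background just above it is not (1); a single global threshold high enough to catch the patch would also mark much of the dim left side of the ramp as dark.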

Comment 4: Please, consider testing your proposed algorithm in a more realistic indoor environment.

Response 4: This is a very useful suggestion. Therefore, we added a new set of experiments to verify the memory usage and latency of the system in a real environment.

Comment 5: Maybe reference [11], used to showcase the use of signal-strength technology for positioning, is a little far-fetched.

Response 5: Reference [11] has been deleted.

Comment 6: Please, explain why 130 consecutive positions/obstacles (line 457)

Response 6: We now explain in detail why we chose 130. The added text, shown below, appears in lines 475 to 482 of the paper:

We chose 130 position combinations to ensure data diversity while controlling experimental overhead. The combinations cover a variety of materials (such as metal, wood, paper, and wall), distances ranging from 0.3 to 3 meters with a step size of 5 cm, and different obstacle placements to simulate real indoor scenes. Preliminary experiments show that further increasing the number of samples yields only a limited improvement in model accuracy (<0.2%) while training time increases significantly. Therefore, 130 is a reasonable choice that balances representativeness and efficiency.
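The stated distance grid can be checked quickly: 0.3 m to 3 m in 5 cm steps gives 55 distances, and crossing them with the four materials listed as examples already yields 220 material-distance pairs before placements are varied, so the 130 combinations cover only part of the full space. How that subset was drawn is not described; the seeded random draw below is a hypothetical stand-in, shown only to make the enumeration concrete.

```python
import itertools
import random

MATERIALS = ["metal", "wood", "paper", "wall"]   # examples named in the response
# 0.30 m to 3.00 m inclusive in 0.05 m steps (integer cm avoids float drift)
DISTANCES_M = [cm / 100 for cm in range(30, 301, 5)]

grid = list(itertools.product(MATERIALS, DISTANCES_M))
print(len(DISTANCES_M), len(grid))   # 55 220

# Hypothetical selection rule: a reproducible random draw of 130 pairs
subset = random.Random(42).sample(grid, 130)
```

Stepping in integer centimetres and dividing afterwards avoids the accumulated rounding error that repeatedly adding 0.05 would introduce at the 3 m endpoint.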

Comment 7: Please, follow the guide to authors, especially regarding the use of italics (just a few examples are in lines 348, 349, and 518-524, but there are many others throughout the text).

Response 7: The use of italics has been revised in accordance with the author guidelines.

Comment 8: Trailing space in line 235 (“…followed by a …”)

Response 8: This has been corrected.

Comment 9: Please, rewrite the sentence in line 309 (the “.” is misleading).

Response 9: The sentence has been rewritten (lines 301-303).

Comment 10: Typo in line 243 (“bet-ween”)

Response 10: This word has been corrected (line 216).

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Congratulations on the article; it turned out excellent, and all the points that were somewhat weak in my opinion were effectively resolved.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper is significantly updated, and the authors meet all requirements from the previous review round.

I suggest accepting the paper in its present form.
