An RGB-D Camera-Based Wearable Device for Visually Impaired People: Enhanced Navigation with Reduced Social Stigma
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- The study relies on blindfolded college students to simulate visually impaired children due to the unavailability of real users. However, simulated users may not accurately replicate the perceptual and behavioral patterns of actual visually impaired children (e.g., spatial awareness differences), leading to biased performance metrics. Future work should prioritize collaboration with special education institutions to validate results with real users.
- The vibration interaction rules (e.g., multi-motor combinations) require memorization, but the study lacks data on learning curves or error rates during initial usage. This raises concerns about usability for children, who may struggle with complex feedback patterns. A simplified or adaptive feedback mechanism should be explored.
- While YOLOv8n shows high accuracy, the paper does not justify its selection over other lightweight models (e.g., YOLOv5s) for wearable devices. Metrics like model size (MB) and power consumption are critical for edge deployment but remain unaddressed.
- Real-time claims lack quantitative evidence (e.g., end-to-end latency per frame, GPU/CPU utilization rates). For wearable devices, latency exceeding 100ms may disrupt user experience, but no such thresholds are discussed.
- The combined voice-vibration mode showed no efficiency gain in experiments, but the study does not analyze why or propose optimizations (e.g., context-aware modality switching). A controlled study comparing uni-modal vs. multi-modal feedback in complex scenarios is needed.
- Training data is limited to campus environments (e.g., pedestrians, bicycles), potentially overlooking urban challenges like moving vehicles or irregular obstacles. Cross-environment testing and data augmentation for diverse scenarios are necessary.
- The arm-mounted design may cause fatigue during prolonged use, but no metrics (e.g., muscle strain, user comfort surveys) are provided. Longitudinal studies on ergonomic impacts are critical for practical adoption.
- Deep learning based related work can be supplemented such as A real-time constellation image classification method of wireless communication signals based on the lightweight network MobileViT, MobileRaT: A Lightweight Radio Transformer Method for Automatic Modulation Classification in Drone Communication Systems.
Author Response
Comments 1: The study relies on blindfolded college students to simulate visually impaired children due to the unavailability of real users. However, simulated users may not accurately replicate the perceptual and behavioral patterns of actual visually impaired children (e.g., spatial awareness differences), leading to biased performance metrics. Future work should prioritize collaboration with special education institutions to validate results with real users.
Response 1: Thank you for this important comment. We fully agree that using sighted college students to simulate visually impaired children may introduce differences in perception and behavior patterns (such as spatial cognitive ability), which is indeed a limitation of this study. Because visually impaired children are a particularly sensitive group, we have tried to reduce this bias by broadening the research target from visually impaired children to visually impaired people in general. We will also conduct experimental tests with real visually impaired users in future work to address this limitation. The specific changes are reflected in the title, abstract, introduction, and conclusion, where "visually impaired children" has been changed to "visually impaired people", and the supporting arguments throughout the article have been adjusted accordingly.
Comments 2: The vibration interaction rules (e.g., multi-motor combinations) require memorization, but the study lacks data on learning curves or error rates during initial usage. This raises concerns about usability for children, who may struggle with complex feedback patterns. A simplified or adaptive feedback mechanism should be explored.
Response 2: Thank you for this valuable comment. We understand your concern about the learning cost of the vibration interaction rules and have adjusted the paper accordingly. In the revised manuscript, we have reworded the text to avoid focusing the conclusions on children; the current experiment concentrates on general usability verification of the vibration interaction mode. We acknowledge that we did not collect learning-curve or error-rate statistics, but in the post-experiment interviews, five volunteers remarked that the device was easy to learn, as noted in lines 428-430 of the revised manuscript. In addition, before the experimental test, all 20 volunteers were able to operate the device independently after 3 minutes of training, and no noticeable learning burden was observed.
Comments 3: While YOLOv8n shows high accuracy, the paper does not justify its selection over other lightweight models (e.g., YOLOv5s) for wearable devices. Metrics like model size (MB) and power consumption are critical for edge deployment but remain unaddressed.
Response 3: Thank you for pointing out this important issue; we accept this suggestion. In lines 113-119 of the revised manuscript, we added the reasons for using YOLO and the selection criteria within the YOLO series. To match the accuracy and real-time requirements of the wearable device, we compared candidate models on both accuracy and speed: YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7, and other models were evaluated to select the best-performing one. It is true that we did not evaluate YOLOv5s specifically, for the following reasons. On the one hand, as shown in line 321 of the revised manuscript, among the models we verified, YOLOv8n reached an average accuracy of 0.955 with an average detection speed of 34.8 ms per image, leaving relatively little room for improvement, so we did not verify further lightweight models for this device. On the other hand, the differences in memory footprint and power consumption among these models on this device are small, with low memory usage overall, and the largest power draw comes from the depth camera and the camera gimbal control rather than the detector, so this study did not analyze model-level power consumption in more depth.
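For illustration, a model comparison of this kind can be scripted with the ultralytics package. This is a minimal sketch, not our actual benchmarking code; the weight file names and the sample image folder are placeholders:

```python
import os
import time
from ultralytics import YOLO

weights = ["yolov8n.pt", "yolov5s.pt"]   # candidate lightweight models

for w in weights:
    model = YOLO(w)                                 # auto-downloads if absent
    size_mb = os.path.getsize(w) / (1024 * 1024)    # on-disk model size

    model("samples/demo.jpg", verbose=False)        # one warm-up inference
    t0 = time.perf_counter()
    results = model("samples/", verbose=False)      # run over a sample folder
    per_image_ms = (time.perf_counter() - t0) * 1000 / max(len(results), 1)

    print(f"{w}: {size_mb:.1f} MB, {per_image_ms:.1f} ms/image")
```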
Comments 4: Real-time claims lack quantitative evidence (e.g., end-to-end latency per frame, GPU/CPU utilization rates). For wearable devices, latency exceeding 100ms may disrupt user experience, but no such thresholds are discussed.
Response 4: Thank you for this important suggestion. We have added quantitative real-time data in lines 288-292 and lines 322-325 of the revised manuscript, specifically the point cloud processing time and the end-to-end latency of image object detection, and we note that the results meet the sub-100 ms real-time threshold for wearable devices.
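A minimal sketch of how such per-frame end-to-end latency can be measured against the 100 ms budget; the three pipeline stages below are stubs standing in for the device's actual capture, detection, and feedback code:

```python
import time

def capture_frame():           # stub for the RGB-D capture stage
    return None

def detect_obstacles(frame):   # stub for point-cloud + YOLO processing
    time.sleep(0.02)           # pretend the pipeline takes ~20 ms
    return []

def send_feedback(obstacles):  # stub for vibration/voice output
    pass

LATENCY_BUDGET_MS = 100.0
latencies = []
for _ in range(300):                        # sample ~300 frames
    t0 = time.perf_counter()
    send_feedback(detect_obstacles(capture_frame()))
    latencies.append((time.perf_counter() - t0) * 1000.0)

mean = sum(latencies) / len(latencies)
worst = max(latencies)
print(f"mean {mean:.1f} ms, worst {worst:.1f} ms, "
      f"within budget: {worst < LATENCY_BUDGET_MS}")
```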
Comments 5: The combined voice-vibration mode showed no efficiency gain in experiments, but the study does not analyze why or propose optimizations (e.g., context-aware modality switching). A controlled study comparing uni-modal vs. multi-modal feedback in complex scenarios is needed.
Response 5: Thank you for this important suggestion. The voice-vibration combination mode indeed showed no efficiency gain in the experiment; we have added the reason in lines 410-416 of the revised manuscript. In the volunteer interviews, testers mentioned that the additional information to process and the need to wait for voice broadcasts reduced their walking efficiency.
Comments 6: Training data is limited to campus environments (e.g., pedestrians, bicycles), potentially overlooking urban challenges like moving vehicles or irregular obstacles. Cross-environment testing and data augmentation for diverse scenarios are necessary.
Response 6: Thank you for this important comment on the limitations of the dataset. We have added an explanation in the "Limitations" section, lines 533-545 of the revised manuscript. The campus environment is our first step: the experiment verifies obstacle avoidance for common static obstacles and slow-moving pedestrians on campus. Obstacle avoidance for urban-road obstacles can be extrapolated from these results to some extent, but validation in urban environments with fast-moving vehicles, together with expansion of the dataset, will be the focus of our next research efforts.
Comments 7: The arm-mounted design may cause fatigue during prolonged use, but no metrics (e.g., muscle strain, user comfort surveys) are provided. Longitudinal studies on ergonomic impacts are critical for practical adoption.
Response 7: Thank you for this important suggestion. We fully agree that ergonomic design matters for long-term use of wearable devices, and that longitudinal studies of fatigue and user comfort are essential for practical adoption. Although we did not conduct quantitative ergonomic evaluations (such as muscle strain measurements or comfort surveys), in lines 191-195 of the revised manuscript we added the weight of the prototype and noted that the load is borne on the user's shoulder during use. In addition, lines 419-431 of the revised manuscript report post-test feedback from volunteers, who found the prototype less tiring than a traditional white cane. At this stage, our main focus is optimizing navigation efficiency and obstacle avoidance for visually impaired users with RGB-D camera technology; in future stages we will add quantitative ergonomic evaluations.
Comments 8: Deep learning based related work can be supplemented such as A real-time constellation image classification method of wireless communication signals based on the lightweight network MobileViT, MobileRaT: A Lightweight Radio Transformer Method for Automatic Modulation Classification in Drone Communication Systems.
Response 8: Thank you for this important suggestion. In the deep-learning-related part of this study, the YOLO network is used for object recognition in environmental images. We adopt existing YOLO models without modifying their architecture; instead, we benchmark different YOLO versions under the existing models and adapt the best one to the smart device. Our efficiency improvements focus on the lightweight structure, point cloud processing for real-time performance, and the intelligent interaction method. The original manuscript already described the steps of using YOLO, including collecting the image dataset, manually annotating it, and dividing it into training, validation, and test data at a ratio of 8:1:1, as well as the training platform and parameter settings. In lines 113-119 of the revised manuscript, we further clarified that YOLO is used to improve accuracy and real-time performance, based on screening existing YOLO models rather than optimizing the model architecture.
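For illustration, the 8:1:1 split can be produced with a few lines of Python; this is a minimal sketch assuming a flat folder of images, with the folder name as a placeholder:

```python
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))  # placeholder folder
random.seed(42)           # fixed seed so the split is reproducible
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],
    "val":   images[n_train:n_train + n_val],
    "test":  images[n_train + n_val:],   # the remaining ~10%
}
for name, files in splits.items():
    print(name, len(files))
```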
Reviewer 2 Report
Comments and Suggestions for Authors
This is an interesting paper on an up-to-date subject. The authors propose a wearable navigation system for visually impaired children to enhance their independent travel and environmental awareness. The methodology sounds correct.
1) Abstract: "In contrast, vibration feedback (...)" In contrast to what?
2) "better mid-air obstacle recognition" - what is a "mid-air obstacle recognition"?
3) "As a symbol of disability, the cane may cause children to suffer strange eyes" - this sentence is incomprehensible, probably a vocabulary error.
4) "Figure 1 shows the device's structure" - please add a figure that demonstrates how the device is carried by a user. How much does it weight, what is its dimension?
5) Please provide technical details of the hardware used by the device, like camera resolution, IMU precision, drift, etc.
6) "This paper uses RANSAC plane fitting" - this paper or this device? What is "RANSAC plane fitting"?
7) Section 3.3 - it seems that the proposed method is some kind of 3D image processing / point cloud algorithm. Please present it in the form of the pseudocode.
8) "In this study, YOLOv8n was used" - please add reference to the paper describing it.
9) "During the training process, the initial learning rate is set to 0.01, the Batch Size is 16, the total number of training rounds is 100, and the training time is about 4 hours." - on which dataset it was trained? What was the purpose of training? What classes of objects were learned to recognize?
10) "combines homemade data sets" - what are those datasets? Where they can be downloaded?
11) It seems that this paper describes studies on human subjects. Please add information about ensuring ethical standards and procedures for research.
12) "All data generated or analyzed during this study are included in this published article." - where? I do not see any link.
13) Please publish source codes of your papers on an open website repository (GitHub or any other). Without them, your results are virtually impossible to reproduce.
Author Response
Comments 1: Abstract: "In contrast, vibration feedback (...)" In contrast to what?
Response 1: It contrasts vibration feedback with voice feedback. Since the article expanded its scope from visually impaired children to visually impaired people, the introduction and abstract were substantially adjusted, and this sentence has been reworded. We have also checked the whole article for rigor of expression.
Comments 2: "better mid-air obstacle recognition" - what is a "mid-air obstacle recognition"?
Response 2: It refers to obstacles that are not on the ground but in mid-air, such as low-hanging branches or railings, at roughly waist height or above. A cane cannot easily detect such obstacles, so the user would collide with them. In the experiment in Section 4.2, two mid-air obstacles were placed on the test site: a long strip of cardboard was attached to a wall at a height of 1.5 m so that it protruded into mid-air. A blindfolded volunteer relying on a cane could not detect this obstacle and would collide with it.
Comments 3: "As a symbol of disability, the cane may cause children to suffer strange eyes" - this sentence is incomprehensible, probably a vocabulary error.
Response 3: Thank you very much for pointing out the vocabulary error; the wording was indeed inappropriate. What we intended to express is that visually impaired people who use a conspicuous cane may attract excessive attention, or even discrimination, because of their disability. Since the article expanded its scope from visually impaired children to visually impaired people, the introduction and abstract were substantially adjusted and this passage has been reworded. We have paid closer attention to the accuracy of wording throughout the article.
Comments 4: "Figure 1 shows the device's structure" - please add a figure that demonstrates how the device is carried by a user. How much does it weight, what is its dimension?
Response 4: Thank you for your suggestion. In the revised manuscript, I have added a picture of the device as worn by a user, along with its dimensions and total mass, in lines 183-195.
Comments 5: Please provide technical details of the hardware used by the device, like camera resolution, IMU precision, drift, etc.
Response 5: Thank you for your suggestion. In the revised manuscript, I have added the camera parameters and the IMU model in lines 160-178.
Comments 6: "This paper uses RANSAC plane fitting" - this paper or this device? What is "RANSAC plane fitting"?
Response 6: Thank you for pointing out this problem in our original manuscript. This article studies a smart navigation wearable device, and the obstacle detection of this device uses the RANSAC plane fitting method; we have corrected the wording. In lines 257-267 of the revised manuscript, we re-describe the principle of RANSAC plane fitting in more detail.
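As a minimal sketch of the technique (not the manuscript's tuned implementation), RANSAC plane fitting is available in Open3D: segment_plane repeatedly samples three points, fits a candidate plane, and keeps the one with the most inliers. The input file and thresholds below are assumptions:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("frame.ply")   # placeholder input cloud

# segment_plane samples ransac_n points per iteration, fits a plane
# ax + by + cz + d = 0, and keeps the plane with the most inliers
# lying within distance_threshold of it.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                         ransac_n=3,
                                         num_iterations=1000)
a, b, c, d = plane_model
print(f"ground plane: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0")

ground = pcd.select_by_index(inliers)                   # the fitted plane
non_ground = pcd.select_by_index(inliers, invert=True)  # everything else
```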
Comments 7: Section 3.3 - it seems that the proposed method is some kind of 3D image processing / point cloud algorithm. Please present it in the form of the pseudocode.
Response 7: Thank you for your suggestion. Section 3.3 describes the obstacle detection process: the original point cloud is filtered and downsampled; the largest plane in the point cloud, i.e., the ground, is removed by the RANSAC plane fitting method; and finally the remaining point cloud is divided into obstacle clusters using the DBSCAN clustering method accelerated by a KD-tree structure. In lines 246-292 of the revised manuscript, I introduce this 3D point cloud processing method in detail and add pseudocode for the DBSCAN method.
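A minimal sketch of this filter-downsample-remove-ground-cluster pipeline in Open3D, whose cluster_dbscan uses a spatial index internally, much like the KD-tree acceleration described; all thresholds and the input file are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("frame.ply")       # placeholder input cloud
pcd = pcd.voxel_down_sample(voxel_size=0.05)     # downsample to cut density
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Remove the dominant plane (the ground) found by RANSAC.
_, ground_idx = pcd.segment_plane(distance_threshold=0.02,
                                  ransac_n=3, num_iterations=1000)
obstacles = pcd.select_by_index(ground_idx, invert=True)

# Cluster the remaining points into obstacle candidates (label -1 = noise).
labels = np.array(obstacles.cluster_dbscan(eps=0.3, min_points=10))
for k in range(labels.max() + 1):
    cluster = obstacles.select_by_index(np.where(labels == k)[0])
    print(f"obstacle {k}: {len(cluster.points)} points, "
          f"centre at {cluster.get_center()}")
```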
Comments 8: "In this study, YOLOv8n was used" - please add reference to the paper describing it.
Response 8: Thank you for your suggestion, I have added references to the YOLO series in the revised manuscript.
Comments 9: "During the training process, the initial learning rate is set to 0.01, the Batch Size is 16, the total number of training rounds is 100, and the training time is about 4 hours." - on which dataset it was trained? What was the purpose of training? What classes of objects were learned to recognize?
Response 9: Thank you for pointing out these problems. We apologize that the sentences in the original manuscript were unclear and caused confusion. As described in lines 307-314 of the revised manuscript, this study trains the model on a manually annotated obstacle image dataset using the PyTorch 1.11 framework, aiming to develop an obstacle recognition algorithm with strong generalization ability. As a single-stage object detector, YOLO relies on the prior knowledge provided by a large-scale annotated dataset; through supervised learning, the annotated data guide the optimization of the model parameters and ultimately yield accurate obstacle detection rules. In principle, any obstacle class can be recognized accurately, provided a sufficiently large annotated dataset supplies the prior knowledge. In this study, the obstacle classes in the dataset are person, bicycle, warning sign, garbage can, green plant, and chair.
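For illustration, a training run with these hyperparameters can be expressed with the ultralytics API as follows; this is a hedged sketch, and "obstacles.yaml" (a dataset config listing the six classes) is a placeholder name:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # start from the pretrained YOLOv8n weights
model.train(
    data="obstacles.yaml",      # placeholder config: image paths + 6 classes
    epochs=100,                 # total training rounds, as stated above
    batch=16,                   # batch size, as stated above
    lr0=0.01,                   # initial learning rate, as stated above
)
metrics = model.val()           # mAP and related metrics on the val split
```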
Comments 10: "combines homemade data sets" - what are those datasets? Where they can be downloaded?
Response 10: Thank you for pointing out this problem. We apologize that the sentence in the original manuscript was unclear and caused confusion. The self-made dataset consists of annotated images of the classes person, bicycle, warning sign, garbage can, green plant, and chair. The acquisition process is described in detail in lines 294-300 of the revised manuscript, and the download link for the dataset has been added at line 547; you are welcome to download or access it.
Comments 11: It seems that this paper describes studies on human subjects. Please add information about ensuring ethical standards and procedures for research.
Response 11: Thank you for pointing out that the section on ethical standards and research procedures was missing. I have added this section, including the Institutional Review Board Statement, at line 550 of the revised manuscript.
Comments 12: "All data generated or analyzed during this study are included in this published article." - where? I do not see any link.
Response 12: I am very sorry for my careless mistake. I have corrected it and placed the dataset download link at line 547 of the revised manuscript. The source code is published on GitHub, and the link is also at line 547 of the revised manuscript.
Comments 13: Please publish source codes of your papers on an open website repository (GitHub or any other). Without them, your results are virtually impossible to reproduce.
Response 13: Thank you very much for your suggestion. My source code has been published on GitHub. The link is at line 547 of the revised manuscript and can be accessed directly.
Reviewer 3 Report
Comments and Suggestions for Authors
I think the authors submitted a manuscript dealing with an important engineering problem. I think the manuscript may fit well within the scope of the journal. I think the introduction section gives enough background information on the research topic. But I think it would be good to give a definition of the levels and severity of visual impairments in humans. For example, my background does not lie in medicine or biology. Thus, my knowledge is limited on that topic. In my opinion, the related work section is also satisfactory. I think the overall architecture of the system is described well and it is suitable for a scientific publication. I felt that Section 4.3 is rather superficial. I think the authors could share more details on the design of the PID controller. The authors could illustrate the response of the PID controller to various input signals, for instance. I felt that Section 3.3 is also rather superficial. The obstacle detection algorithm could be illustrated with a flow chart. Sample results would also be welcomed. If I understand correctly, the authors used already existing algorithms for target recognition. I think the presentation of the results is also a bit superficial. It would be good to know something about the demographic composition of the volunteers. The authors proposed a system for visually impaired children. It would be good to know the severity and the type of visual impairment of the volunteers.
Author Response
Comments 1: I think the authors submitted a manuscript dealing with an important engineering problem. I think the manuscript may fit well within the scope of the journal. I think the introduction section gives enough background information on the research topic. But I think it would be good to give a definition of the levels and severity of visual impairments in humans. For example, my background does not lie in medicine or biology. Thus, my knowledge is limited on that topic. In my opinion, the related work section is also satisfactory. I think the overall architecture of the system is described well and it is suitable for a scientific publication. I felt that Section 4.3 is rather superficial. I think the authors could share more details on the design of the PID controller. The authors could illustrate the response of the PID controller to various input signals, for instance. I felt that Section 3.3 is also rather superficial. The obstacle detection algorithm could be illustrated with a flow chart. Sample results would also be welcomed. If I understand correctly, the authors used already existing algorithms for target recognition. I think the presentation of the results is also a bit superficial. It would be good to know something about the demographic composition of the volunteers. The authors proposed a system for visually impaired children. It would be good to know the severity and the type of visual impairment of the volunteers.
Response 1: Thank you very much for your comments; we agree with your suggestions. For Section 3.2 (gimbal design and implementation), we have added more PID controller design details in lines 237-245 of the revised manuscript. The controller input is the camera's set pitch angle; the actual pitch angle provided by the IMU is subtracted from it to produce the pitch angle error, and the PID controller outputs a control signal V that drives the gimbal motor, adjusting the camera to maintain the set angle. This section also adds a verification of the rationality of the gimbal's field of view, further justifying the use of the gimbal. For Section 3.3, the obstacle detection algorithm is described in more detail in lines 247-292 of the revised manuscript. First, the original point cloud is preprocessed by filtering and downsampling to reduce its density. Next, a plane is fitted to the preprocessed point cloud to remove the ground points, and a dynamic adaptive threshold method is used to mitigate the poor point cloud accuracy caused by long distances and strong light. Finally, the remaining point cloud is clustered into obstacle point clusters, with a KD-tree structure used to accelerate clustering and improve the detection rate. The object recognition algorithms themselves are mature, but our study established an obstacle dataset and used it to evaluate a variety of YOLO network models; we selected the YOLO model with the highest accuracy and fastest average speed, the one most suitable for the smart navigation wearable device, and deployed it on the physical prototype. When we expand this research in the future, we will improve the YOLO network algorithm to further optimize the smart device. The age and vision of the volunteers were added in lines 400-406 and 441 of the revised manuscript.
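A minimal sketch of the control loop just described (error = set pitch minus IMU pitch, PID output V drives the motor); the gains, loop rate, and hardware helpers are illustrative assumptions rather than the values used on the prototype:

```python
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative)    # control signal V

def read_imu_pitch():    # stub for the IMU driver (degrees)
    return 0.0

def drive_motor(v):      # stub for the gimbal motor command
    pass

pid = PID(kp=2.0, ki=0.1, kd=0.05)   # illustrative gains only
set_pitch, dt = 0.0, 0.01            # hold the camera level; 100 Hz loop
for _ in range(1000):
    error = set_pitch - read_imu_pitch()   # set angle minus measured angle
    drive_motor(pid.update(error, dt))
```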
Reviewer 4 Report
Comments and Suggestions for Authors
---- abstract
Abstract introduces potentially valuable work on a wearable navigation system for visually impaired children. However, for a scientific paper, it could be strengthened by providing more concrete methodological and technological details. While it mentions "advanced obstacle detection," "real-time object recognition," and the use of "YOLOv8n," these descriptions remain somewhat high-level. Greater specificity regarding the sensor types, the distinct algorithms beyond naming the model, or the particular implementation of the "multimodal feedback" would allow for a clearer understanding of the system's technical novelty.
Furthermore, the focus on exactly what was achieved could be more precise. The abstract states the system significantly improves safety and outperforms traditional aids based on user studies. Including key quantitative results or specific performance metrics from the user studies would provide more substantial evidence of the system's effectiveness.
---- introduction
The introduction section of the paper requires significant enhancement to clearly highlight the authors' contributions. As it stands, the current version is overly brief and lacks a detailed explanation of the work undertaken. It does not sufficiently convey the innovative aspects or the specific outcomes of the study.
To improve the introduction, the authors should explicitly describe the system they have developed. Readers need a clear understanding of what the system comprises, how it functions, and how it addresses the challenges faced by visually impaired children. Additionally, a concise overview of the methodology, core features, and intended impact of the system would provide valuable context.
Including a visual representation, such as a figure or diagram, could significantly enhance the clarity of the introduction. A well-crafted figure illustrating the system's design, components, or workflow would allow readers to grasp the concept more quickly and effectively.
---- related work
The segment that elaborates on the 'innovation and contribution of this study' should be more appropriately integrated into the introduction section for better context and flow.
----
The paper introduces the system structure conceptually (Subsection 3.1) and illustrates it with what appears to be a CAD design or schematic (Figure 1). However, there is no photographic evidence, detailed bill of materials, or description of an assembled, functional prototype presented within this section. This absence makes it difficult to ascertain whether the described wearable device was physically realized and tested, moving beyond the conceptual design stage. Evaluating claims about system integration (wired/Bluetooth interfaces, cohesion/coupling) and real-world performance requires evidence of a tangible implementation.
Insufficient Detail on Camera Gimbal Implementation (Subsection 3.2): The rationale for including a single-axis gimbal is stated – to improve camera stability during movement and reduce user fatigue. A conceptual PID control block diagram (Figure 2) is provided. However, this description remains high-level. Crucial implementation details are missing.
Conceptual Description of Obstacle Detection (Subsection 3.3): This subsection describes standard point cloud processing techniques for obstacle detection. While these methods are appropriate, the description lacks sufficient depth for scientific rigor and reproducibility. As presented, this section describes what techniques were chosen, but not how they were specifically implemented and tuned. This makes it impossible to verify the claimed results or reproduce the obstacle detection module accurately. The description, while referencing valid techniques, resembles a conceptual overview rather than a detailed implementation report.
Subsection 3.4 lacks details on the data collection process, it focuses solely on data preparation for obstacle recognition.
--- Experiments
Comparison of Interaction Efficiency. The experiment recruited five volunteers, blindfolding them to imitate visually impaired children. However, this approach raises serious concerns about the reliability and validity of the findings. The methodology used does not involve actual blind children—the very demographic this study seeks to assist—which undermines the practical relevance of the results. While testing on volunteers may have been logistically easier, it fails to replicate the unique challenges that visually impaired children face in real-world scenarios. This oversight reflects a lack of rigor in the experimental design and compromises the credibility of the study. Furthermore, the limited sample size of five participants is insufficient to establish statistically meaningful conclusions, and the experimental outcomes cannot be confidently generalized.
Comparison of Devices’ Function. The study compared the functionality of smart navigation wearable devices, traditional guide sticks, and ultrasonic canes by recruiting 20 volunteers. Similar to the previous remark about 5 volunteers, these participants did not belong to the target group of visually impaired children. The exclusion of actual blind children in this testing diminishes the applicability and reliability of the results. The experimental outcomes, summarized in histogram form, provide minimal value to the reader. A more thorough analysis is required, including detailed statistical evaluations and qualitative insights, to convey the comparative advantages of the devices accurately. Overall, the lack of focus on the intended user group casts doubt on the study's seriousness and practical contribution.
Comments on the Quality of English Language
The overall quality of English in the manuscript is acceptable.
Author Response
Comments 1:
---- abstract
Abstract introduces potentially valuable work on a wearable navigation system for visually impaired children. However, for a scientific paper, it could be strengthened by providing more concrete methodological and technological details. While it mentions "advanced obstacle detection," "real-time object recognition," and the use of "YOLOv8n," these descriptions remain somewhat high-level. Greater specificity regarding the sensor types, the distinct algorithms beyond naming the model, or the particular implementation of the "multimodal feedback" would allow for a clearer understanding of the system's technical novelty.
Furthermore, the focus on exactly what was achieved could be more precise. The abstract states the system significantly improves safety and outperforms traditional aids based on user studies. Including key quantitative results or specific performance metrics from the user studies would provide more substantial evidence of the system's effectiveness.
Response 1: I highly agree with your suggestions on the abstract, which have been very helpful in improving the paper. In the revised manuscript, we reorganized the abstract, adding a detailed description of the sensor types and the specific implementation of the multimodal feedback. In describing the results, we added an overview of the collision-rate metric and of the questionnaire results, which provide evidence of the system's effectiveness.
Comments 2:
---- introduction
The introduction section of the paper requires significant enhancement to clearly highlight the authors' contributions. As it stands, the current version is overly brief and lacks a detailed explanation of the work undertaken. It does not sufficiently convey the innovative aspects or the specific outcomes of the study.
To improve the introduction, the authors should explicitly describe the system they have developed. Readers need a clear understanding of what the system comprises, how it functions, and how it addresses the challenges faced by visually impaired children. Additionally, a concise overview of the methodology, core features, and intended impact of the system would provide valuable context.
Including a visual representation, such as a figure or diagram, could significantly enhance the clarity of the introduction. A well-crafted figure illustrating the system's design, components, or workflow would allow readers to grasp the concept more quickly and effectively.
Response 2: Thank you very much for pointing out the problems with our introduction; we strongly agree with your point of view. In the revised manuscript, we have rewritten the introduction. Its logic now proceeds from the challenges visually impaired people face when traveling, to the shortcomings of traditional assistive tools, to the RGB-D camera as a development trend for solving this problem, to the shortcomings of current RGB-D-camera-based aids, and finally to the innovations of our smart navigation wearable aid. The rewritten introduction describes the system as consisting of an RGB-D camera and a camera gimbal; the system uses obstacle detection and object recognition algorithms and guides users' travel navigation through multimodal intelligent interaction.
Comments 3:
---- related work
The segment that elaborates on the 'innovation and contribution of this study' should be more appropriately integrated into the introduction section for better context and flow.
Response 3: Thank you very much for your suggestion. In the revised manuscript, we have moved the "Innovation and Contribution" section into the introduction. At the same time, because we have expanded the scope from visually impaired children to visually impaired people, we have also adjusted the arguments in the "Related Work" section.
Comments 4:
----
The paper introduces the system structure conceptually (Subsection 3.1) and illustrates it with what appears to be a CAD design or schematic (Figure 1). However, there is no photographic evidence, detailed bill of materials, or description of an assembled, functional prototype presented within this section. This absence makes it difficult to ascertain whether the described wearable device was physically realized and tested, moving beyond the conceptual design stage. Evaluating claims about system integration (wired/Bluetooth interfaces, cohesion/coupling) and real-world performance requires evidence of a tangible implementation.
Insufficient Detail on Camera Gimbal Implementation (Subsection 3.2): The rationale for including a single-axis gimbal is stated – to improve camera stability during movement and reduce user fatigue. A conceptual PID control block diagram (Figure 2) is provided. However, this description remains high-level. Crucial implementation details are missing.
Conceptual Description of Obstacle Detection (Subsection 3.3): This subsection describes standard point cloud processing techniques for obstacle detection. While these methods are appropriate, the description lacks sufficient depth for scientific rigor and reproducibility. As presented, this section describes what techniques were chosen, but not how they were specifically implemented and tuned. This makes it impossible to verify the claimed results or reproduce the obstacle detection module accurately. The description, while referencing valid techniques, resembles a conceptual overview rather than a detailed implementation report.
Subsection 3.4 lacks details on the data collection process, it focuses solely on data preparation for obstacle recognition.
Response 4:
Thank you very much for your suggestion, and we agree with it. In Section 3.1 of the revised manuscript, we added a photograph of the physical prototype and a description of it, including its actual weight, size, and other information. To make the system easier for readers to understand, we also described the system's camera, IMU, and controller in more detail. In Section 3.2 of the revised manuscript, we added the input and output of the PID controller as well as its core calculation formula, together with a calculation of the camera gimbal's field of view to verify the rationality of the gimbal's placement. In Section 3.3 of the revised manuscript, we added the specific implementation process of obstacle detection, with a more detailed description and pseudocode. In Section 3.4 of the revised manuscript, we added the data collection process and reorganized that part.
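For illustration, a field-of-view placement check of the kind described can be computed with simple trigonometry; every number below (mounting height, pitch, vertical FOV) is an assumed example, not the prototype's actual geometry:

```python
import math

height = 1.2   # camera height above the ground in metres (assumed)
pitch = 25.0   # downward tilt of the optical axis in degrees (assumed)
v_fov = 58.0   # vertical field of view in degrees (assumed)

lower = math.radians(pitch + v_fov / 2)   # steepest downward ray
upper = math.radians(pitch - v_fov / 2)   # shallowest ray

near = height / math.tan(lower)           # closest visible ground point
# If the upper ray tilts above the horizon, coverage extends indefinitely.
far = height / math.tan(upper) if upper > 0 else float("inf")
print(f"ground coverage: from {near:.2f} m to {far:.2f} m ahead")
```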
Comments 5:
--- Experiments
Comparison of Interaction Efficiency. The experiment recruited five volunteers, blindfolding them to imitate visually impaired children. However, this approach raises serious concerns about the reliability and validity of the findings. The methodology used does not involve actual blind children—the very demographic this study seeks to assist—which undermines the practical relevance of the results. While testing on volunteers may have been logistically easier, it fails to replicate the unique challenges that visually impaired children face in real-world scenarios. This oversight reflects a lack of rigor in the experimental design and compromises the credibility of the study.
Furthermore, the limited sample size of five participants is insufficient to establish statistically meaningful conclusions, and the experimental outcomes cannot be confidently generalized. Comparison of Devices’ Function. The study compared the functionality of smart navigation wearable devices, traditional guide sticks, and ultrasonic canes by recruiting 20 volunteers. Similar to the previous remark about 5 volunteers, these participants did not belong to the target group of visually impaired children. The exclusion of actual blind children in this testing diminishes the applicability and reliability of the results. The experimental outcomes, summarized in histogram form, provide minimal value to the reader. A more thorough analysis is required, including detailed statistical evaluations and qualitative insights, to convey the comparative advantages of the devices accurately. Overall, the lack of focus on the intended user group casts doubt on the study's seriousness and practical contribution.
Response 5: Thank you very much for your suggestion; we completely agree with it. We asked volunteers to wear blindfolds to simulate visual impairment in the experiment, and we acknowledge that not involving actual blind children weakens the practical relevance of the results. Therefore, in the revision we broadened the target user group from visually impaired children to visually impaired people in general (including teenagers and adults), reducing the specificity of the target user group.
However, the experiment still uses volunteers with normal vision to simulate visual impairment, for two reasons. On the one hand, the needs of the technical verification stage: the current experiment focuses on verifying the basic interaction logic and functions of the device (such as obstacle avoidance failures and latency), and these indicators can be assessed under a simulated loss of vision. On the other hand, methodology: Hass C et al., in "Consumer Informatics and Digital Health: Health and Healthcare Solutions" (2019), note that usability testing focuses on products or services rather than end users, making it a powerful tool for improvement; and Stahlke et al., in "Usertesting without the user: Opportunities and challenges of an AI-driven approach in games user research" (2018), describe using AI-driven agents as substitutes for human participants to address the cost and time challenges of traditional user testing. At this stage of our study, we likewise use tests with sighted volunteers to iterate the design quickly. In subsequent studies, we will recruit visually impaired participants so that the experimental data better reflect real-world effects. In the experiment with 5 volunteers, we measured walking time with the smart navigation device under different interaction modes and interviewed the testers about their experience after the experiment. In the original manuscript, we evaluated and qualitatively analyzed the volunteers' questionnaire feedback, but because the questionnaire feedback and the experiment were presented under headings of the same level, the logic was unclear to readers. In lines 407-437 and 445-489 of the revised manuscript, we divide the results of the two experiments into two parts, data analysis and interview analysis, focusing the volunteer interviews on capturing user experience so as to analyze the advantages of the different interaction methods. The qualitative analysis of the questionnaire feedback is now placed one level below the experiment heading, so that the qualitative analysis of the experiment can be seen more clearly.

Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all my remarks. In my opinion, the paper can be accepted.
Author Response
Comments 1: The authors have addressed all my remarks. In my opinion, paper can be accepted.
Response 1: We sincerely appreciate the time and effort you have dedicated to reviewing our manuscript and providing constructive feedback. We are grateful for your positive assessment that all raised remarks have been adequately addressed. Your insightful comments have significantly improved the quality of this work.
Reviewer 3 Report
Comments and Suggestions for Authors
The marked comments of my previous review were not addressed.
I think the authors submitted a manuscript dealing with an important engineering problem. I think the manuscript may fit well within the scope of the journal. I think the introduction section gives enough background information on the research topic. But I think it would be good to give a definition of the levels and severity of visual impairments in humans. For example, my background does not lie in medicine or biology. Thus, my knowledge is limited on that topic. In my opinion, the related work section is also satisfactory. I think the overall architecture of the system is described well and it is suitable for a scientific publication. I felt that Section 4.3 is rather superficial. I think the authors could share more details on the design of the PID controller. The authors could illustrate the response of the PID controller to various input signals, for instance. I felt that Section 3.3 is also rather superficial. The obstacle detection algorithm could be illustrated with a flow chart. Sample results would also be welcomed. If I understand correctly, the authors used already existing algorithms for target recognition. I think the presentation of the results is also a bit superficial. It would be good to know something about the demographic composition of the volunteers. The authors proposed a system for visually impaired children. It would be good to know the severity and the type of visual impairment of the volunteers.
Author Response
Comments 1: But I think it would be good to give a definition on the levels and severity of visual impairments of humans.
Response 1: Thank you for your comments, I agree with you very much. In lines 32-36 of the latest revised manuscript, I added the definition of the level and severity of human visual impairment and cited literature to support the definition.
Comments 2: I felt that Section 4.3 is rather superficial. I think the authors could share more details on the design of the PID controller. The authors could illustrate the response of the PID controller to various input signals for instance.
Response 2: Thank you for pointing out the problems in my paper. I have revised the manuscript. In lines 251-271 of the latest revision, I added more details on the PID controller design, including its structure and the response of the tuned controller to step signals and disturbances.
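A minimal sketch of such a step-and-disturbance response check, using a PID loop around a simple first-order gimbal model; the plant model and gains are assumptions for illustration only:

```python
# Discrete-time simulation: a PID loop drives a first-order gimbal model
# d(pitch)/dt = (u - pitch) / tau toward a 10-degree step command,
# with an external disturbance injected at t = 2 s.
kp, ki, kd = 2.0, 0.5, 0.05
dt, tau = 0.01, 0.2
pitch, integral, prev_err = 0.0, 0.0, 0.0

for step in range(400):
    err = 10.0 - pitch            # step input of 10 degrees at t = 0
    integral += err * dt
    deriv = (err - prev_err) / dt
    prev_err = err
    u = kp * err + ki * integral + kd * deriv

    pitch += (u - pitch) / tau * dt
    if step == 200:
        pitch -= 5.0              # sudden disturbance (e.g., an arm swing)
    if step % 50 == 0:
        print(f"t={step * dt:.2f}s  pitch={pitch:.2f} deg")
```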
Comments 3: I felt that Section 3.3 is also rather bit superficial. The obstacle detection algorithm could be illustrated with a flow chart. Sample results would be also welcomed.
Response 3: Thank you for your comments. I have revised the manuscript. In the latest revision, lines 273-278 have been added to illustrate the obstacle detection algorithm. At the same time, lines 287 and 309 show sample results of the detection algorithm.
Comments 4: If I understand correctly, the authors used already existing algorithms for target recognition. I think the presentation of the results are also bit superficial.
Response 4: Thank you very much for your comments, which are extremely helpful. Indeed, I employed the existing YOLO target detection algorithm as the basis for the recognition task. However, I constructed and trained multiple target detection models using my own dataset. Through comparative analysis, the YOLOv8n model demonstrated the best performance in terms of recognition accuracy and detection speed. To enhance the presentation of the results, I have added definitions of the evaluation metrics used to assess model performance, along with the visualized detection outcomes, in the revised manuscript (lines 348–362). These additions help to strengthen the persuasiveness and clarity of the model evaluation process.
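For reference, the evaluation metrics referred to here are conventionally defined as follows (precision P and recall R over true positives, false positives, and false negatives, with mAP averaging AP over the N classes); this is the standard formulation rather than a quotation from the manuscript:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_0^1 P(R)\,dR, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```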
Comments 5: It would be good to know something on the demographic composition of volunteers. The authors proposed a system for visually impaired children. It would be good to know the severity and the type of visual impairment of the volunteers.
Response 5: Thank you for your comments; I agree very much. In lines 454-456 and 495-499 of the latest revised manuscript, I added descriptions of the demographic composition of the volunteers, including age, gender, and the type of visual impairment they simulated. In lines 445 and 493, I added descriptions of the severity and type of visual impairment of the volunteers. All recruited volunteers are healthy people with normal vision, but in the experimental tests we simulated the most severe visual impairment, complete blindness, by blindfolding them; if people with the most severe visual impairment can use the tool to avoid obstacles, then people with less severe impairment can also use it for assisted obstacle avoidance. In lines 445-454 of the latest revised manuscript, I explained why volunteers with normal vision were chosen: to avoid the possible ethical risks of testing product prototypes on visually impaired people, such as potential physical harm, volunteers with normal vision were recruited to simulate visual impairment. Finally, in lines 587-592 of the latest revised manuscript, the limitations section acknowledges the shortcomings of using people with normal vision to simulate visually impaired people, and future research will test with real visually impaired users.
Reviewer 4 Report
Comments and Suggestions for Authors
The revised paper demonstrates significant improvement in addressing previously highlighted concerns. The authors have enhanced the abstract by incorporating more detailed descriptions of the technological aspects and methodologies, which now provide clearer insights into the system's innovation and effectiveness. The introduction has been expanded, offering a more comprehensive overview of the study’s contributions, alongside visual aids that bolster understanding of the system’s design. Moreover, the technical sections present greater specificity, with added implementation details that strengthen the scientific rigor and reproducibility of the work. The experiments have been refined to better align with the study's objectives, and while some limitations remain, the overall improvements contribute to a more robust presentation of findings.
Author Response
Comments 1: The revised paper demonstrates significant improvement in addressing previously highlighted concerns. The authors have enhanced the abstract by incorporating more detailed descriptions of the technological aspects and methodologies, which now provide clearer insights into the system's innovation and effectiveness. The introduction has been expanded, offering a more comprehensive overview of the study’s contributions, alongside visual aids that bolster understanding of the system’s design. Moreover, the technical sections present greater specificity, with added implementation details that strengthen the scientific rigor and reproducibility of the work. The experiments have been refined to better align with the study's objectives, and while some limitations remain, the overall improvements contribute to a more robust presentation of findings.
Response 1: We sincerely appreciate your thorough evaluation and constructive feedback on our manuscript. Your insightful comments have been invaluable in enhancing the quality of our work. We are pleased that the revisions—including the expanded methodological details in the abstract, strengthened introduction with visual aids, and refined technical implementation descriptions—have met your expectations. Thank you for your time and expertise in reviewing this manuscript.
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
After two revisions, I think the paper can be accepted now. The authors addressed my comments.