Accelerating Surgical Skill Acquisition by Using Multi-View Bullet-Time Video Generation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper evaluates a multi-camera system for bullet-time video, and performs an analysis of the system on its ability to reduce occlusions, and a user study that validates the surgical education applications of the system for the suturing task.
Some comments on the work are the following:
* What would be the main differences between the proposed work and the one presented in [1]? That system also employed a 20-camera array to generate the bullet time video. If there are additional contributions or improvements over the system presented in [1], these must be properly cited, and the changes must be clearly indicated in the paper. Similarly, the submitted work includes a user evaluation of the camera array proposed in [1]. If this is the main contribution, then it would be better to present the paper as a validation of the learning advantages of such a system, and the methodology can also mainly focus on the study design employed to validate and compare. In its current form, it is unclear what the paper's contribution is; it appears to be building upon a previously proposed system.
* I still do not fully understand what the main contribution is besides using multiple cameras. The comparisons are performed with a system with a single camera and one with five cameras. It would be recommended to describe better what the benefits of the proposed system are, in addition to increasing the number of cameras. I understand this might be related to the inclusion of bullet time generation, but I wonder if that can also be done with a five-camera array.
* Similarly, while the number of cameras will help with occlusions, I also wonder if fewer cameras, positioned in a strategically arrayed manner, can also reduce occlusion problems. Are there any alternative methods to achieve similar results with fewer cameras, or is 20 cameras a requirement to achieve the results presented? For example, would a calibrated array of five cameras with overlapping scenes generate a similar effect?
* It is not clear how the users interact with the system once it is created. For example, the users watch the multi-view video before the task, or during the task. Also, does the presented bullet time allow the user to observe any particular angle at any particular time?
* While a set of computer vision methods is described in the document and included in the processing of the videos, it is unclear how they are used in the final system and how this information is used to enhance the learning experience.
* There are references to images that do not appear in the document, for example, Fig. 14 in line 371.
[1] Wang, Yinghao, et al. "A Surgical Bullet-Time Video Capturing System Depending on Surgical Situation." 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE). IEEE, 2021.
Comments on the Quality of English Language
The language can be improved, especially in the introduction. The motivations are not clearly connected.
Author Response
For research article
Response to Reviewer 1 Comments
1. Summary
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in track changes in the re-submitted file. Your review possesses a high degree of professionalism and we thank you for such high quality comments and suggestions. This will help us to improve this article and present our work better.
2. Questions for General Evaluation
Question | Reviewer’s Evaluation | Response and Revisions
Does the introduction provide sufficient background and include all relevant references? | Must be improved |
Is the research design appropriate? | Must be improved |
Are the methods adequately described? | Must be improved |
Are the results clearly presented? | Must be improved |
Are all figures and tables clear and well-presented? | Must be improved |
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: What would be the main differences between the proposed work and the one presented in [1]? That system also employed a 20-camera array to generate the bullet time video. If there are additional contributions or improvements over the system presented in [1], these must be properly cited, and the changes must be clearly indicated in the paper. Similarly, the submitted work includes a user evaluation of the camera array proposed in [1]. If this is the main contribution, then it would be better to present the paper as a validation of the learning advantages of such a system, and the methodology can also mainly focus on the study design employed to validate and compare. In its current form, it is unclear what the paper's contribution is; it appears to be building upon a previously proposed system. [1] Wang, Yinghao, et al. "A Surgical Bullet-Time Video Capturing System Depending on Surgical Situation." 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE). IEEE, 2021. |
||
Response 1: Thank you for your comment. Indeed, the method in this article is closely related to our previous work. The difference in contribution is that this time we improved the key step in generating bullet-time videos, namely the gaze-point selection strategy, using YOLO+CBAM. In addition, we designed experiments to demonstrate, for the first time, both quantitatively and qualitatively, the effectiveness of bullet-time videos for the education of surgical instrumentation skills. The previous work, by contrast, only introduced the basic bullet-time generation method. The fine-tuned object-detection contribution is described in Section 6, paragraph 1, and the effectiveness evaluation is described in the next paragraph on the same page. |
||
Comments 2: [I still do not fully understand what the main contribution is besides using multiple cameras. The comparisons are performed with a system with a single camera and one with five cameras. It would be recommended to describe better what the benefits of the proposed system are, in addition to increasing the number of cameras. I understand this might be related to the inclusion of bullet time generation, but I wonder if that can also be done with a five-camera array.] |
||
Response 2: Thank you for your comment. As you said, apart from mitigating the effects of occlusions, a higher number of cameras also allows for a more natural-looking bullet-time video that switches and rotates between viewpoints, which reduces the viewer's sense of alienation. With fewer cameras, as the object moves there will always be camera views that fail to capture the object at that moment. Compared with relying on frame interpolation, whose generated scenes are uncertain (especially in high-precision domains such as medicine and semiconductors), we think that increasing the number of physical cameras is the wiser choice. We added context at the beginning of Section 4.1 to emphasize this effect. |
||
Comments 3: [Similarly, while the number of cameras will help with occlusions, I also wonder if fewer cameras, positioned in a strategically arrayed manner, can also reduce occlusion problems. Are there any alternative methods to achieve similar results with fewer cameras, or is 20 cameras a requirement to achieve the results presented? For example, would a calibrated array of five cameras with overlapping scenes generate a similar effect?] |
||
Response 3: Thank you for your comment. As noted in Response 2, bullet-time video generated with fewer cameras looks unnatural. In addition, 20 cameras provide richer feature correspondences across overlapping views, and the increased data redundancy suppresses noise in the bundle adjustment optimization. The calibration is described in Section 3.1.1 on page 4; accurate intrinsic and extrinsic camera parameters help to improve the subsequent gaze-point selection and homography transformation. |
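As an illustration of why accurate intrinsic and extrinsic parameters matter when a gaze point is mapped into each view, the following minimal sketch projects a 3D point into one camera with the standard pinhole model. It is not the authors' implementation: the function name `project_point`, the matrices, and the sample point are hypothetical values chosen only for illustration.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X into pixel coordinates (pinhole model).

    K: 3x3 intrinsic matrix, R: 3x3 rotation, t: 3-vector translation.
    All names and values here are illustrative assumptions.
    """
    # World -> camera coordinates: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Camera -> pixel coordinates: apply intrinsics, then divide by depth
    u = (K[0][0] * Xc[0] + K[0][1] * Xc[1] + K[0][2] * Xc[2]) / Xc[2]
    v = (K[1][1] * Xc[1] + K[1][2] * Xc[2]) / Xc[2]
    return u, v


# Hypothetical camera: 800 px focal length, principal point at (320, 240),
# placed at the world origin and looking along +Z.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
gaze_point = [0.1, -0.05, 2.0]  # hypothetical gaze point, in metres

print(project_point(K, R, t, gaze_point))
```

An error in K, R, or t shifts the projected pixel in that view, so the same gaze point would land at inconsistent image positions across the camera array.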
||
Comments 4: [It is not clear how the users interact with the system once it is created. For example, the users watch the multi-view video before the task, or during the task. Also, does the presented bullet time allow the user to observe any particular angle at any particular time?] |
||
Response 4: Thank you for pointing this out. Users are required to watch the bullet-time video before the task. Allowing users to observe any angle at any time is an advantage and a feature of the generated bullet-time video. We have emphasized this in Section 4.3.1, paragraph 2, on page 11. |
||
Comments 5: [While a set of computer vision methods is described in the document and included in the processing of the videos, it is unclear how they are used in the final system and how this information is used to enhance the learning experience.] |
||
Response 5: Agreed. The original submission did not describe well how the computer vision methods are used. Accordingly, we added Section 3.2.3 on page 6 to illustrate the roles these technologies play in our work. |
||
Comments 6: [There are references to images that do not appear in the document, for example, Fig. 14 in line 371.] |
||
Response 6: Thank you for pointing this out. We have corrected the figure number on page 14, line 462. |
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper presents a novel approach that applies a multi-view capturing system and a bullet-time generation technique to surgical education.
Strengths:
- The paper is well-written, offering sufficient background information and detailed descriptions of the experiments.
- The proposed system consists of many complex components, and the paper provides a detailed explanation of each one.
- Based on the evaluation results, the proposed approach appears robust. The metrics and experimental studies are thoroughly and clearly presented.
Weaknesses:
- The quality of the presentation could be further enhanced, particularly with regard to the format of tables and plots.
- It would be beneficial for the author to include a more in-depth discussion in the Discussion section. At present, several next steps are mentioned without sufficient elaboration.
- Although the proposed approach has demonstrated improvements in the metrics, are there any resource or financial caveats that should be considered or acknowledged? In other words, to replace existing system with the proposed multi-view system, what are the main challenges to be tackled? It would be better if authors can add discussion towards real-world deployment and launch.
Author Response
For research article
Response to Reviewer 2 Comments
1. Summary
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in track changes in the re-submitted file. Your evaluation of the strengths and weaknesses of our work is very pertinent, and we will also focus on the deployment issues you raised at the end in our future development.
2. Questions for General Evaluation
Question | Reviewer’s Evaluation | Response and Revisions
Does the introduction provide sufficient background and include all relevant references? | Yes |
Is the research design appropriate? | Yes |
Are the methods adequately described? | Can be improved |
Are the results clearly presented? | Yes |
Are all figures and tables clear and well-presented? | Can be improved |
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: The paper is well-written, offering sufficient background information and detailed descriptions of the experiments. The proposed system consists of many complex components, and the paper provides a detailed explanation of each one. Based on the evaluation results, the proposed approach appears robust. The metrics and experimental studies are thoroughly and clearly presented. |
||
Response 1: Thank you for your affirmation; your encouragement means a lot to us. |
||
Comments 2: [The quality of the presentation could be further enhanced, particularly with regard to the format of tables and plots.] |
||
Response 2: Thank you for your comment. As you mentioned, our figure and table captions were too simplistic and could easily mislead readers. We added explanations and additional context for almost all tables and figures, especially the figures with subfigures. |
||
Comments 3: [It would be beneficial for the author to include a more in-depth discussion in the Discussion section. At present, several next steps are mentioned without sufficient elaboration.] |
||
Response 3: Thank you for your comment. We agree with your advice on the Discussion. Accordingly, we reconsidered our system’s limitations and added two main limitations of our work: “Lack of real-time performance” and “Limited portability”. We added two paragraphs to Section 5 on page 17 accordingly. |
||
Comments 4: [Although the proposed approach has demonstrated improvements in the metrics, are there any resource or financial caveats that should be considered or acknowledged? In other words, to replace existing system with the proposed multi-view system, what are the main challenges to be tackled? It would be better if authors can add discussion towards real-world deployment and launch.] |
||
Response 4: Thank you for pointing this out. As mentioned in the new additions to the Discussion and in Response 3, the current multi-view system’s hardware is too cumbersome, and it will be a challenge to deploy it without compromising the conduct of surgery or the operating room’s medical standards. Our system therefore currently exists as a post-processing plug-in system for accelerating the acquisition of surgical instrument skills, and it is still a long way from being a computer-assisted real-time bullet-time generation system. |
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
In the manuscript entitled “Accelerating Surgical Skill Acquisition by Using Multi-View Bullet-Time Video Generation”, the authors present several important findings. Despite the well-structured results, there are some points that must be addressed before I can provide a positive recommendation for accepting the manuscript for publication in the MDPI Applied Sciences journal:
- The Introduction section must be rewritten. Some ideas appear disjointed. For example, the authors write “Enhancing Learning Efficacy: Surgical …” and “Global Accessibility: Surgical Expertise,” but it is unclear whether they intend to present a glossary or introduce key concepts. In any case, the introduction should provide an extensive background on the relevance and prior work related to the main topic. I suggest including a comprehensive review of similar studies conducted globally to emphasize the international relevance and importance of the research.
- All figures need improved captions and context. The descriptions should clearly explain what is being shown to ensure the reader remains connected to the manuscript's narrative.
- Clarification is needed regarding the validity of the experimental results. For example, the authors state that “the experimental group achieved 30% lower trajectory errors (RMSE) and 22% faster task completion, with novices reporting 7.75/10 usefulness versus 6.0/10 for medical professionals.” However, how do the authors ensure that these results are attributable solely to the system and not to other factors (e.g., level of rest, individual experience, etc.)? The manuscript should include details on experimental controls to support the reliability of the outcomes.
- I recommend adding a paragraph before the Conclusion section to outline the study’s limitations.
- I also suggest including a paragraph before the Conclusion to highlight the novelty of this work in comparison with similar previously published studies.
- The Discussion section is too limited. This section should be one of the most extensive in the manuscript, where the results are thoroughly compared with prior work and their implications explored in depth. I recommend a careful rewrite of this section, including a detailed comparison with existing studies and a discussion of the significance of the findings.
- Conclusions must be fully supported by the presented results. I recommend revisiting the experimental design to ensure proper controls are in place to justify the claims made in the conclusions. Otherwise, the authors may consider modifying the scope of the study—for example, to focus on analyzing the system's electrical response.
- I suggest that the authors include in the Discussion section a perspective on the system’s potential applications in the medical sector. As a computer-assisted system, it would be helpful to reinforce its importance within current international trends in the scientific community. I also suggest citing the following reference: Automated Computer-Assisted Medical Decision-Making System Based on Morphological Shape and Skin Thickness Analysis for Asymmetry Detection in Mammographic Images. Diagnostics 2023, 13, 3440.
Author Response
For research article
Response to Reviewer 3 Comments
1. Summary
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in track changes in the re-submitted file. Your suggestions are vital for improving the article’s structure and ensuring that its novelty is conveyed. Thank you for reading our article in such detail and for your suggestions.
2. Questions for General Evaluation
Question | Reviewer’s Evaluation | Response and Revisions
Does the introduction provide sufficient background and include all relevant references? | Must be improved |
Is the research design appropriate? | Must be improved |
Are the methods adequately described? | Must be improved |
Are the results clearly presented? | Must be improved |
Are all figures and tables clear and well-presented? | Must be improved |
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: The Introduction section must be rewritten. Some ideas appear disjointed. For example, the authors write “Enhancing Learning Efficacy: Surgical …” and “Global Accessibility: Surgical Expertise,” but it is unclear whether they intend to present a glossary or introduce key concepts. In any case, the introduction should provide an extensive background on the relevance and prior work related to the main topic. I suggest including a comprehensive review of similar studies conducted globally to emphasize the international relevance and importance of the research. |
||
Response 1: Thank you for your comment. Indeed, the work described in this article did not directly address “Global Accessibility”, and another reviewer suggested expanding on it in the Discussion, so we moved it to the Discussion (Section 5), citing proven internationally deployed systems. “Enhancing Learning Efficacy”, however, is one of the central points we hope to improve, since textbooks and regular videos in the traditional sense have their limitations. This applies not only to open surgery: we found that laparoscopic surgery also requires spatial perception [1] (citation 3 in the article), which is where bullet-time videos can potentially help train participants. We deleted the original text about Global Accessibility from the Introduction on pages 1-2; instead, we split it into several arguments and inserted them in paragraphs 2 and 4 of the Discussion on page 17. [1] Vajsbaher, T., Schultheis, H. & Francis, N.K. Spatial cognition in minimally invasive surgery: a systematic review. BMC Surg 18, 94 (2018). https://doi.org/10.1186/s12893-018-0416-1 |
||
Comments 2: [All figures need improved captions and context. The descriptions should clearly explain what is being shown to ensure the reader remains connected to the manuscript's narrative.] |
||
Response 2: Thank you for your comment. As you mentioned, our figure and table captions were too simplistic and could easily mislead readers. We added explanations and additional context for almost all tables and figures, especially the figures with subfigures. |
||
Comments 3: [Clarification is needed regarding the validity of the experimental results. For example, the authors state that “the experimental group achieved 30% lower trajectory errors (RMSE) and 22% faster task completion, with novices reporting 7.75/10 usefulness versus 6.0/10 for medical professionals.” However, how do the authors ensure that these results are attributable solely to the system and not to other factors (e.g., level of rest, individual experience, etc.)? The manuscript should include details on experimental controls to support the reliability of the outcomes.] |
||
Response 3: Thank you for your comment. As noted in Response 2, bullet-time video generated with fewer cameras looks unnatural. In addition, 20 cameras provide richer feature correspondences across overlapping views, and the increased data redundancy suppresses noise in the bundle adjustment optimization. The calibration is described in Section 3.1.1 on page 4; accurate intrinsic and extrinsic camera parameters help to improve the subsequent gaze-point selection and homography transformation. |
||
Comments 4: [I recommend adding a paragraph before the Conclusion section to outline the study’s limitations.] |
||
Response 4: Thank you for your advice. In conjunction with your suggestions in Comment 6, we have reworded the Discussion section to include system limitations. We added three new paragraphs to Section 5 and edited the existing content on page 17 accordingly. |
||
Comments 5: [I also suggest including a paragraph before the Conclusion to highlight the novelty of this work in comparison with similar previously published studies.] |
||
Response 5: Agreed. Our work proposes system prototypes, improves algorithms, and develops new technologies, and such a comprehensive pipeline does call for a dedicated paragraph that states its novelty explicitly, as you mentioned. We added a paragraph summarizing the novelties right before Section 6 on page 17. |
||
Comments 6: [The Discussion section is too limited. This section should be one of the most extensive in the manuscript, where the results are thoroughly compared with prior work and their implications explored in depth. I recommend a careful rewrite of this section, including a detailed comparison with existing studies and a discussion of the significance of the findings.] |
||
Response 6: Agreed. In the Discussion we had previously passed over some key limitations and challenges too briefly, e.g., system limitations, hardware deployment challenges, and lack of data. While we gained an advantage in terms of efficiency, the other aspects mentioned appear to be advantages of pre-existing systems. We therefore rewrote almost the entire Discussion. Section 5, on page 17, now explains the challenges and limitations of the current multi-view system and of its continued development. |
||
Comments 7: [Conclusions must be fully supported by the presented results. I recommend revisiting the experimental design to ensure proper controls are in place to justify the claims made in the conclusions. Otherwise, the authors may consider modifying the scope of the study, for example, to focus on analyzing the system's electrical response.] |
||
Response 7: Thank you for your well-considered comment. Regarding the connection between the methods, the experiments, and the conclusions in Section 6, we maintain a one-to-one correspondence. The first paragraph of the Conclusion recounts the improvements since our previous work, in terms of the multi-viewpoint proposal and the bullet-time video generation method itself. The second paragraph summarizes the qualitative and quantitative analysis from the experiment and evaluation chapter. The last paragraph summarizes the paper and briefly describes the challenges we face (details in the Discussion). |
||
Comments 8: [I suggest that the authors include in the Discussion section a perspective on the system’s potential applications in the medical sector. As a computer-assisted system, it would be helpful to reinforce its importance within current international trends in the scientific community. I also suggest citing the following reference: Automated Computer-Assisted Medical Decision-Making System Based on Morphological Shape and Skin Thickness Analysis for Asymmetry Detection in Mammographic Images. Diagnostics 2023, 13, 3440.] |
||
Response 8: Agreed. Rather than international trends, however, we are more concerned with the challenges posed by functional defects, so we integrated your comments into the Discussion together with the reference you mentioned. We added the second paragraph of Section 5, under the subtitle “Lack of real-time performance”, to illustrate the limitations of the system, and briefly compared it with some of the immaturities of the breast cancer system [36] that you mentioned. |
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I thank the authors for the response and clarifications. Considering the responses, the current work appears to be more focused on evaluating the educational applications of the previously presented system. I would recommend making this clear from the introduction, as this is mentioned in the discussion section at the end of the paper.
Similarly, there is a set of motivations that are described similarly to a bullet point in the introduction. I understand that the proposed system can help support and address the motivations presented in the introduction. It would be recommended to discuss how the system contributes to these motivations. Then, as the previous work did not evaluate these from the user perspective, the main contribution of the paper can be introduced as the additional improvements to the algorithmic base of the previous system and the user-oriented evaluation of the applicability of the system in surgical education (and the additional experiments that demonstrate better control over occlusions, etc). This can help provide a better context for readers regarding the relevance of the current work in relation to the previous system.
Some minor details: equation 4.3.3 might be missing some values.
Comments on the Quality of English Language
The additional text clarifies some sections in the methodology. A recommendation is to review the introduction section.
Author Response
Comments 1: I thank the authors for the response and clarifications. Considering the responses, the current work appears to be more focused on evaluating the educational applications of the previously presented system. I would recommend making this clear from the introduction, as this is mentioned in the discussion section at the end of the paper. |
Response 1: Thank you for checking and agreeing with our response. Indeed, the method in this article is closely related to our previous work. We have conducted further research and improvements in both the technical aspects (algorithm improvement) and the exploration of effectiveness. To make the relationship between this work and our previous one clear, we added a new paragraph to the Introduction; please refer to Section 1, paragraph 7, lines 58 to 63. |
Comments 2: [Similarly, there is a set of motivations that are described similarly to a bullet point in the introduction. I understand that the proposed system can help support and address the motivations presented in the introduction. It would be recommended to discuss how the system contributes to these motivations. Then, as the previous work did not evaluate these from the user perspective, the main contribution of the paper can be introduced as the additional improvements to the algorithmic base of the previous system and the user-oriented evaluation of the applicability of the system in surgical education (and the additional experiments that demonstrate better control over occlusions, etc). This can help provide a better context for readers regarding the relevance of the current work in relation to the previous system.] |
Response 2: Thank you for your comment. We agree with your recommendation: it is essential to state the contributions at the beginning and to tie them back to the motivations. We divided them into three points: a better strategy for gaze-point setting, proof of the effectiveness of bullet-time video, and a user-oriented evaluation of the applicability of the system. We added this context in Section 1, paragraph 8 (the last paragraph), lines 64 to 70. |
Comments 3: [Some minor details: equation 4.3.3 might be missing some values.] |
Response 3: Thank you for your comment. There was a slight error in the numbering of the formulas; thank you for pointing this out. Moreover, the automatic system score S (0 to 10 points) duplicates the functionality of RMSE_ref in Figure 13, so we deleted the original Equation 4.3.4 (labeled 4.3.3) and the related description on the original page 14. |
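For readers unfamiliar with the metric, a trajectory RMSE of the kind referred to as RMSE_ref can be sketched as follows. This is a minimal sketch that assumes point-wise Euclidean error between two equal-length 2D trajectories; the manuscript's exact definition of RMSE_ref is not given in this record, and the function name and sample data are hypothetical.

```python
import math


def trajectory_rmse(trajectory, reference):
    """Root-mean-square of point-wise Euclidean distances between two
    equal-length 2D trajectories (an assumed form of the metric)."""
    if len(trajectory) != len(reference) or not trajectory:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        (x - rx) ** 2 + (y - ry) ** 2
        for (x, y), (rx, ry) in zip(trajectory, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference path and a participant path that deviates
# by one unit at the middle sample.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
participant = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

print(trajectory_rmse(participant, reference))
```

A lower value means the participant's instrument path stays closer to the reference path, which is the sense in which a reduction in RMSE is reported as an improvement.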
[1(6 in manuscript)] Wang, Yinghao, et al. "A Surgical Bullet-Time Video Capturing System Depending on Surgical Situation." 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE). IEEE, 2021.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have responded to the comments. Thus, I recommend accepting the manuscript for publication.
Author Response
Response: Thank you very much for taking your valuable time to check our response; your opinion is important to us.
Thank you also for supporting and recognizing our work.