Real-Time Deployment of Ultrasound Image Interpretation AI Models for Emergency Medicine Triage Using a Swine Model
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors: Comments in the attached file
Comments for author File: Comments.pdf
Author Response
In this paper, the authors present an application of Artificial Intelligence (AI) models for guidance and diagnostics in ultrasound image interpretation in emergency medicine. They train different models with the YOLOv8 architecture for detection and localization of the kidney, the liver, and ribs at different scan sites (guidance models), and for binary classification of the presence of injury (diagnostic models). The experiments are performed using a swine animal model, and manual and robotic scanning procedures are compared. The results show relatively high accuracy except for the detection of injury in the bladder. Although the work is a valuable contribution to the field, some points need clarification.
- There is a lack of references to other authors' works on the use of AI in eFAST (extended focused assessment with sonography in trauma) and a discussion of how the submitted article relates to those works. Examples of references: https://pubmed.ncbi.nlm.nih.gov/37165477/, https://pubmed.ncbi.nlm.nih.gov/38088928/, https://www.mdpi.com/2075-4418/13/21/3388
Response: We appreciate the reviewer taking the time to evaluate our manuscript for publication. We have added additional details on other work in this area and how this manuscript differs from those in the introduction starting on line 77.
- As I have no expertise in animal models, I don’t understand why the spleen is removed. I think clarification about this point is needed, as this journal is not specific on biology or medicine.
Response: Great comment. The spleen in swine is much larger than in humans and, unlike in humans, provides autoresuscitation during hemorrhage studies. As such, it is removed to better mimic human physiology during shock studies. As those studies were not within the scope of this manuscript, we did not initially include this detail, but we have now provided justification for this animal model approach in the methods on line 117.
- Line 119 - In the paragraph starting at line 119, the structure of the ultrasound dataset is described. It would be very useful here a diagram or a table summarizing the structure of the dataset.
Response: We agree with the reviewer and have added a figure as requested in this section of the paper describing the ultrasound image dataset.
- Line 141 - It is not clear here the meaning of the term “datastores”.
Response: A datastore is a MATLAB data structure that references the collections of images used for the different aspects of training, validation, and testing. In this context, it contains a random assortment of images and the corresponding bounding box labels. The sentence has been reworded for clarity.
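For readers less familiar with MATLAB, a rough Python analogue of such a datastore is sketched below; the directory layout, label file format, and split ratios are illustrative assumptions, not the exact ones used in the study.

```python
# Rough Python analogue of a MATLAB datastore: image paths paired with their
# bounding-box label files, shuffled and split for training/validation/testing.
# Directory names, label format, and split ratios are illustrative only.
import random
from pathlib import Path

pairs = [(img, img.with_suffix(".txt"))  # assume one label file per image
         for img in sorted(Path("frames").glob("*.png"))]
random.seed(0)
random.shuffle(pairs)

n = len(pairs)
train_set = pairs[: int(0.70 * n)]
val_set = pairs[int(0.70 * n) : int(0.85 * n)]
test_set = pairs[int(0.85 * n) :]
```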
- Line 161 – How many examples were used for training? As I understand, only a random subset of the total frames was labelled.
Response: A total of 44,736; 9,449; and 7,039 images were used for training the ribs, kidney, and bladder guidance AI models, respectively. This has been clarified in the text at the referenced section.
- Line 175 – The type of model used for diagnostics should be mentioned at the beginning of Section 2.4.
Response: Thanks for the comment – we agree with the reviewer and have specified which AI model types were used earlier in this section.
- Diagrams describing the overall data processing workflow would help a better understanding. For example, a diagram showing the different trained models and which task is performed by each of them (structure segmentation or injury classification).
Response: We have added a data processing workflow diagram and another detailing how the AI models and pre-processing steps fit together in the methods at this section. We hope that helps provide additional clarity in the text.
- Line 382. How were depths measured?
Response: Depths were measured with the Intel RealSense D435i camera using its stereo vision technology, as described starting on line 376. The stereo vision camera takes two images from two sensors and compares them. Given the known distance between the two sensors, the depth can be determined from the disparity between the images.
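For readers unfamiliar with stereo vision, this is the standard rectified stereo relationship: depth is inversely proportional to the pixel disparity between the two sensors. A minimal sketch follows; the focal length and baseline values are illustrative assumptions, not the camera's calibrated parameters.

```python
def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from a rectified stereo pair: Z = f * B / d."""
    return focal_length_px * baseline_m / disparity_px

# Example: a 640 px focal length, 0.05 m baseline, and 16 px disparity
# place the imaged surface 2.0 m from the camera.
print(stereo_depth_m(640, 0.05, 16))  # 2.0
```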
- The model for injury detection in bladder has a significantly low accuracy. Do you have any hypothesis about why?
Response: We believe the poor bladder performance could be due to a number of factors. One, there is additional variability with regard to bladder size and fullness that does not exist at any other scan point. Two, the bladder appears as a dark, fluid-filled cavity similar to the hemorrhage found adjacent to it, which could lead to misclassification. Three, the urinary catheter balloon can cause an artifact in captured ultrasound images, which can make interpreting the image more complicated. We have added these potential reasons for poor BLD performance in the discussion section on line 629.
- There is no discussion in the article about how the swine results are going to be generalized to humans.
Response: We have added reference to this generalization and human translation concept in the "next steps / future work" section of the discussion.
Reviewer 2 Report
Comments and Suggestions for Authors: In this manuscript, the authors propose an AI-driven approach for real-time ultrasound image interpretation using a variety of models, with a specific focus on the eFAST scan. The methodology presented is innovative and holds great promise for improving the accuracy and speed of medical diagnostics. However, several points require clarification and further exploration to strengthen the manuscript. I recommend the authors address the following points before publication:
Here are some specific comments and suggestions:
1. The manuscript describes how MATLAB's toolbox is used for cropping and resizing images. It would be helpful if the authors could further explain why they selected a 512×512 pixel size for resizing. How does this specific resolution impact the AI training process, particularly in terms of model performance and computational efficiency?
2. The authors provide a description of the image quality assessment and filtering process. It would be beneficial for the authors to clarify how the image quality scores (1 to 5) are quantified. Specifically, what criteria or standards were used to assign these scores? Additionally, can the authors confirm whether a standardized rating system or any auxiliary tools were used to ensure consistency in the evaluation process?
3. YOLOv8 architecture is mentioned multiple times for training the models. However, there is limited discussion on why YOLOv8 was chosen over other architectures such as Faster R-CNN, SSD, or other alternatives. Could the authors provide more insight into why YOLOv8 was preferred for this application? Specifically, what advantages does YOLOv8 offer for processing ultrasound images compared to other architectures?
4. The manuscript discusses how guidance filtering helps improve prediction accuracy, but there may be concerns about over-filtering, which could potentially lead to the loss of useful images. Would it be possible for the authors to elaborate on the potential trade-offs of this approach and discuss its impact on the overall results, particularly in terms of model performance and diagnostic accuracy?
5. IOU scoring is used during the testing phase for model evaluation, which is a reasonable approach. However, the manuscript does not clarify the specific threshold for IOU scores. In which cases would a model prediction be considered successful? It would be helpful to define the criteria for a successful prediction based on IOU scores, to allow readers to fully understand how model performance is assessed.
6. The manuscript highlights the high IOU score for the kidney model (0.94), indicating the model's ability to accurately identify this region. However, the bladder model has a relatively low accuracy (0.65), primarily due to the high false positive rate. Could the authors consider exploring ways to improve the bladder model's performance, such as refining the model or using more targeted training data to reduce false positives?
7. The real-time performance of the guidance and diagnostic models in the RT eFAST application appears to differ from the training data, particularly in the RUQ scan region. This suggests that data quality or processing delays may impact model accuracy in real-time scenarios. Could the authors explore strategies to optimize the models for real-time performance, especially in terms of handling live ultrasound images and minimizing processing time or delays that might affect diagnostic reliability?
8. The article mentions AI and medical imaging. It is suggested that the following research articles could be referenced: Ultrasonics, 139, 107277, 2024; Neurocomputing, 573, 12720, 2024; Photoacoustics, 2023, 34: 100572.
Comments on the Quality of English Language: The English could be improved to more clearly express the research.
Author Response
In this manuscript, the authors propose an AI-driven approach for real-time ultrasound image interpretation using a variety of models, with a specific focus on the eFAST scan. The methodology presented is innovative and holds great promise for improving the accuracy and speed of medical diagnostics. However, several points require clarification and further exploration to strengthen the manuscript. I recommend the authors address the following points before publication:
Here are some specific comments and suggestions:
- The manuscript describes how MATLAB's toolbox is used for cropping and resizing images. It would be helpful if the authors could further explain why they selected a 512×512 pixel size for resizing. How does this specific resolution impact the AI training process, particularly in terms of model performance and computational efficiency?
Response: We appreciate the reviewer taking the time to review our manuscript for publication. This decision was based on making the image square (equal height and width) while maintaining a high input resolution for distinguishing small features found in US images. Further, our team has had success using an image input size of 512 x 512 in prior US AI model development activities. This justification has been added to the methods on line 154.
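To make the preprocessing step concrete, a minimal resize sketch is shown below. The authors used MATLAB's toolbox for cropping and resizing; this OpenCV version only mirrors the 512 x 512 target size, and the file names and interpolation choice are illustrative.

```python
# Minimal resize-to-512x512 sketch (OpenCV), mirroring the target size
# discussed above; not the authors' MATLAB implementation.
import cv2

frame = cv2.imread("us_frame.png", cv2.IMREAD_GRAYSCALE)
resized = cv2.resize(frame, (512, 512), interpolation=cv2.INTER_AREA)
cv2.imwrite("us_frame_512.png", resized)
```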
- The authors provide a description of the image quality assessment and filtering process. It would be beneficial for the authors to clarify how the image quality scores (1 to 5) are quantified. Specifically, what criteria or standards were used to assign these scores? Additionally, can the authors confirm whether a standardized rating system or any auxiliary tools were used to ensure consistency in the evaluation process?
Response: Great question. Image quality scoring was used to help distinguish US scans where diagnosis was not possible (scored 1) from high-quality, easy-to-interpret images (scored 5). Two members of our author team, who were responsible for capturing the vast majority of the US images, performed this task. They met initially to establish scoring criteria for what constitutes a 1 and a 5 in order to standardize the scoring system, and then discussed score discrepancies to resolve those images as best as possible. More information on this process has been added to the methods on line 175.
- YOLOv8 architecture is mentioned multiple times for training the models. However, there is limited discussion on why YOLOv8 was chosen over other architectures such as Faster R-CNN, SSD, or other alternatives. Could the authors provide more insight into why YOLOv8 was preferred for this application? Specifically, what advantages does YOLOv8 offer for processing ultrasound images compared to other architectures?
Response: We have previously compared object detection models for shrapnel identification in ultrasound images and identified YOLO as the best performing model type for a similar application. Faster R-CNN and similar models are much slower than YOLO and were not considered. In addition, the YOLO architecture is easy to implement and use in real time. This justification has been added in the methods on line 192.
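For context, YOLOv8 models are typically run through the ultralytics Python package, which keeps real-time inference to a few lines. The sketch below uses generic pretrained weights and a placeholder confidence threshold, not the trained models from this study.

```python
# Minimal YOLOv8 inference sketch with the ultralytics package; weights and
# thresholds are placeholders, not the study's trained models.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.predict(source="us_frame_512.png", imgsz=512, conf=0.25, verbose=False)
for result in results:
    for box in result.boxes:
        print(int(box.cls), float(box.conf), box.xyxy.tolist())  # class, score, box
```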
- The manuscript discusses how guidance filtering helps improve prediction accuracy, but there may be concerns about over-filtering, which could potentially lead to the loss of useful images. Would it be possible for the authors to elaborate on the potential trade-offs of this approach and discuss its impact on the overall results, particularly in terms of model performance and diagnostic accuracy?
Response: Great suggestion. We have added discussion of the potential drawbacks of guidance filtering and how they may impact overall performance in the discussion section of the paper.
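As a schematic of the trade-off, the sketch below shows guidance filtering as we understand it from the manuscript: a frame only reaches the diagnostic model when the guidance model detects the target structure. The function names and confidence threshold are illustrative assumptions.

```python
# Schematic guidance-filtering loop: frames failing the guidance check never
# reach the diagnostic model. Raising MIN_CONF filters more noise but risks
# discarding diagnostically useful frames (the over-filtering concern above).
MIN_CONF = 0.5  # illustrative threshold

def guided_diagnosis(frames, guidance_model, diagnostic_model):
    diagnoses = []
    for frame in frames:
        detections = guidance_model(frame)  # list of (label, confidence) pairs
        if any(conf >= MIN_CONF for _, conf in detections):
            diagnoses.append(diagnostic_model(frame))
    return diagnoses
```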
- IOU scoring is used during the testing phase for model evaluation, which is a reasonable approach. However, the manuscript does not clarify the specific threshold for IOU scores. In which cases would a model prediction be considered successful? It would be helpful to define the criteria for a successful prediction based on IOU scores, to allow readers to fully understand how model performance is assessed.
Response: This was an oversight by the authors and has now been stated in the methods section on line 210. We aimed for IOU scores above 0.50, as this threshold is commonly used in the field for successful object detection models.
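For completeness, intersection over union for two axis-aligned boxes can be computed as below; a detection counts as successful when the score meets the 0.50 threshold mentioned above. The box coordinates are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Overlapping but offset boxes: IoU = 900 / 4100 ~ 0.22, below the 0.50 cutoff.
print(iou((10, 10, 60, 60), (30, 30, 80, 80)) >= 0.50)  # False
```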
- The manuscript highlights the high IOU score for the kidney model (0.94), indicating the model's ability to accurately identify this region. However, the bladder model has a relatively low accuracy (0.65), primarily due to the high false positive rate. Could the authors consider exploring ways to improve the bladder model's performance, such as refining the model or using more targeted training data to reduce false positives?
Response: We believe the poor bladder performance could be due to a number of factors. One, there is additional variability with regard to bladder size and fullness that does not exist at any other scan point. Two, the bladder appears as a dark, fluid-filled chamber similar to the hemorrhage found adjacent to it, which could lead to misclassification. Three, the urinary catheter balloon can cause an artifact in captured ultrasound images, which can make interpreting the image more complicated. Means of refinement, including improved model training and better data curation, are highlighted in the discussion as well. This is shown on line 642.
- The real-time performance of the guidance and diagnostic models in the RT eFAST application appears to differ from the training data, particularly in the RUQ scan region. This suggests that data quality or processing delays may impact model accuracy in real-time scenarios. Could the authors explore strategies to optimize the models for real-time performance, especially in terms of handling live ultrasound images and minimizing processing time or delays that might affect diagnostic reliability?
Response: We think there are a couple of dynamics happening here that impacted performance. First, loss of image resolution during real-time streaming, or other artifacts introduced in the process, may result in images that look different from the training data used to develop the AI models, which may cause the performance issues. This can be improved by including real-time data capture types in the model training. Second, there is no curation of images when running in real time, so images that would have been discarded from training may be passed to the AI model. Improved guidance models that better detect injury locations (pleura detection, edge of kidney/bladder, for instance) could assist with this. Both of these matters are discussed in the paper at lines 680 and 721, respectively.
- The article mentions AI and Medical imaging. It is suggested that the following research articles could be referenced: Ultrasonics, 139, 107277, 2024; Neurocomputing, 573, 12720, 20247; Photoacoustics, 2023, 34: 100572.
Response: Thanks for the suggestion and the links to the interesting articles. We have already provided three citations to ultrasound and AI applications and found those to be more relevant to the scope of work presented here than the photoacoustic imaging papers suggested. Thanks, however, for the information.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors: Comments in the attached file
Comments for author File: Comments.pdf
Author Response
Thanks for re-reviewing our manuscript. Comments are reproduced below along with our responses. We hope our edits address all raised issues.
- New references (from reference 16 onward) are missing in the bibliography section.
Our reference manager software did not update with the new references, but that has now been resolved.
- Figure 2 seems to have an error here:
The figure has been modified to address the issue. This figure describes the image dataset organization, not the training of the AI models, if that helps alleviate any confusion.
- The paragraph starting at line 665 contains some wording errors that need to be revised
Thanks for the suggestion, we have modified the paragraph for additional clarity.
- Line 722: “Further translation of this work will require translation…”. This sentence needs to be rephrased for clarity.
This has been reworded per reviewer recommendation.
- How do you think a large human image dataset including enough injury cases could be obtained? A brief comment about this point would improve the discussion.
We have added an extra sentence describing this requirement of working with emergency medicine departments to create this dataset.
Reviewer 2 Report
Comments and Suggestions for Authors: The author has made appropriate revisions to the paper.
Author Response
Thanks for reviewing our manuscript.