An Explainable Artificial Intelligence-Based Robustness Optimization Approach for Age-Related Macular Degeneration Detection Based on Medical IOT Systems
Round 1
Reviewer 1 Report
AI-based models have shown promising results in diagnosing eye diseases based on multiple sources of data collected from medical IOT systems. However, there are concerns regarding their generalization and robustness, as these methods are prone to overfitting specific datasets. The development of Explainable Artificial Intelligence (XAI) techniques has addressed the black-box problem of machine learning and deep learning models, which can enhance interpretability and trustworthiness, and optimize their performance in the real world. Age-related Macular Degeneration (AMD) is currently the primary cause of vision loss among elderly individuals.
In this study, the authors applied the XAI methods to detect AMD using various ophthalmic imaging modalities collected from medical IOT systems, such as color fundus photography (CFP), optical coherence tomography (OCT), ultra-wide fundus (UWF) images, and fluorescein angiography fundus (FAF).
They proposed an optimized deep learning (DL) model and novel AMD identification systems based on the insights extracted by XAI.
They concluded that the findings of the study demonstrate that XAI not only has the potential to improve the transparency, reliability, and trustworthiness of AI models for ophthalmic applications, but also has significant advantages for enhancing the robustness of these models, and that XAI could play a crucial role in promoting intelligent ophthalmology and be one of the most important techniques for evaluating and enhancing ophthalmic AI systems.
The study is attractive.
I have some minor suggestions with a pure academic spirit.
1) The purpose must be stated more effectively. Also use bullet points for the sub-aims.
2) Avoid the use of acronyms in the headings. See, for example, row 184.
3) Discuss Figure 1 in detail and add labels to Figure 3 and the other figures where needed.
4) Check the resolution of the figures.
5) Insert references in the discussion to corroborate your findings or to highlight the differences.
6) Discussion must also report the limitations.
Author Response
Thank you for your comments and suggestions on my article. I appreciate your feedback and will take them into consideration to improve the quality and effectiveness of my paper.
Regarding your suggestions:
1) The purpose must be stated more effectively. Also use bullet points for the sub-aims.
Reply: We have revised the purpose of the study and used bullet points to better highlight the sub-aims (the last paragraph of the 1. Introduction, p.94-p.101). Thank you again for your precious advice.
“
This study has six main goals:
(1) Detect AMD by applying the DL model to four different datasets: OCT, regular CFP (less than 50°), UWF (200°), and FAF.
(2) Propose an explainability evaluation method based on the CAM mechanism.
(3) Perform a retrospective XAI analysis of the DL model based on the CAM mechanism.
(4) Propose a model architecture optimization method that adds skip, attention, and transfer mechanisms.
(5) Test the optimization method by comparing the accuracy, robustness, and XAI performance of the original and improved models.
(6) Finally, recognize and discuss the pattern of model bias from the perspective of XAI.
”
2) Avoid the use of acronyms in the headings. See, for example, row 184.
Reply: I have avoided using acronyms in the headings and ensured that all abbreviations are clearly defined. Thank you again for your precious advice.
3) Discuss Figure 1 in detail and add labels to Figure 3 and the other figures where needed.
Reply: I have provided a more detailed discussion of Figure 1 and added labels to Figure 3 and the other relevant figures. Thank you again for your precious advice.
“
Figure 1 displays interpretability results for OCT and segmented OCT images. All cases in Figure 1 correspond to correct AMD classifications, with drusen manifestations visible in the images. Correct prototypes should appear in the area under the fovea. The segmented OCT images contain no interpretability errors: images A-1, B-1, and B-2 represent interpretably correct cases, while image A-2 is an instance of an interpretability error. The data investigation based on Figure 1 reveals that 98% of the OCT explainable artificial intelligence (XAI) error cases involve incorrect prototypes located at the extreme ends of the images (as demonstrated in A-2 of Figure 1). In response, the images are divided into three segments along their width, and only the central portion between 25% and 75% of the original width is retained. These manipulated data are termed "End-cut-OCT" images in this research. This dataset is subsequently employed to test AMD detection using deep learning models and is compared with the original OCT datasets in the concluding experiment.
”
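For concreteness, the following is a minimal sketch of how the described "End-cut-OCT" crop could be implemented; the function names, the PIL-based pipeline, and the PNG file layout are illustrative assumptions rather than the authors' actual preprocessing code.

from pathlib import Path
from PIL import Image

def end_cut(img: Image.Image) -> Image.Image:
    """Keep only the central portion (25%-75% of the width) of an OCT scan,
    discarding the extreme ends where most misleading prototypes appeared."""
    w, h = img.size
    return img.crop((int(0.25 * w), 0, int(0.75 * w), h))

def build_end_cut_dataset(src_dir: str, dst_dir: str) -> None:
    """Write an "End-cut-OCT" copy of every PNG image found in src_dir (file layout assumed)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        end_cut(Image.open(path)).save(dst / path.name)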
4) Check the resolution of the figures.
Reply: I will submit the original figures to the system and double-check their resolution to ensure that they are clear and easy to read. Thank you again for your precious advice.
5) Insert references in the discussion to corroborate your findings or to highlight the differences.
Reply: I have inserted references in the discussion to support my findings and highlight any differences between my work and other related studies. Thank you again for your precious advice.
- Saarela, M. and L. Geogieva, Robustness, Stability, and Fidelity of Explanations for a Deep Skin Cancer Classification Model. Applied Sciences, 2022. 12(19): p. 9545.
- AL-Essa, M., et al. XAI to Explore Robustness of Features in Adversarial Training for Cybersecurity. in Foundations of Intelligent Systems: 26th International Symposium, ISMIS 2022, Cosenza, Italy, October 3–5, 2022, Proceedings. 2022. Springer.
- Bradshaw, T.J., et al., Artificial Intelligence Algorithms Need to Be Explainable—or Do They? Journal of Nuclear Medicine, 2023.
6) Discussion must also report the limitations.
Reply: I have added a section in the discussion to report the limitations of the study.
“
However, there are still some limitations to this study. First, more DL models could be considered as the basic architecture for classification rather than only the typical VGG16 model. Second, classification tasks other than AMD detection could be performed to verify the hypothesis proposed in this study. Third, a deeper exploration of data quality enhancement and data bias removal based on XAI could be discussed. Lastly, other XAI mechanisms and optimization methods may be considered, and the proposed model optimization method could be compared with other optimization approaches.
”
Once again, thank you for your valuable feedback. I will work on incorporating these changes to improve the quality of my article.
Author Response File: Author Response.pdf
Reviewer 2 Report
One of the biggest issues in AI use for medical imaging (and many other areas) is the "black box" effect. You can get good results in terms of metrics classification but real world need explanations, not only numbers.
This paper is about that, but unfortunately it does not bring too much light. First, the part of the title "Can Explainable Artificial Intelligence optimize the robustness of the machine learning model?" sounds a little bit like click bait. The paper is about VGG and AMD images. Do not try to include anything else in the title, much less if there are conclusions far beyond the work included in the paper.
Authors show that CAM and other well-known techniques to overcome the AI interpretability problem work for this case. But it is dangerous to say "the number of correct prototypes in ROI-extracted UWF increases from layer 3 to layer 9, whereas layer 12 shows fewer correct and more incorrect prototypes. This implies that the model is learning correctly from layer 3 to 9, but not after layer 9" with little support or understanding of what it really means. You can modify the DL model in order to include some of these findings, but to say that it means that interpretability is better, I do not agree (and based on the use by real-world physicians, I do not expect a huge change just because you can find some correlations, nothing more, between some activation maps and some labels).
Another big problem with the paper is that it is written as a textbook, not a scientific paper. The first sections are too basic and verbose, explaining attention, transfer learning, and other basic ideas/methods.
Author Response
Thank you for taking the time to review my paper and for your valuable feedback. I appreciate your comments and will address them accordingly.
I agree that the "black box" effect is a significant issue in AI use for medical imaging, and the need for explainability is crucial. This article aims to verify the hypothesis "Can XAI help to optimize machine learning model robustness?" and to propose a model optimization approach based on XAI. We take the AMD image classification task as a case study, and multiple types of images from the IOT system are tested to verify this hypothesis. Real-world data are used to evaluate the machine learning model's robustness. In this experiment, we use the XAI mechanism to optimize the robustness of VGG16 and compare the robustness of the original VGG16 and the optimized model. The main topic is still XAI and robustness. I apologize that this was not made clear enough in the original version. Thank you again for your precious advice.
Regarding the use of CAM and other techniques to overcome the AI interpretability problem, I acknowledge that the results presented in the paper are limited to the specific case of VGG and AMD images. However, the purpose of the paper was to showcase the effectiveness of these techniques in this particular case and not to make general conclusions about their applicability in other scenarios. Thank you again for your precious suggestions.
I understand your concern regarding the title, and the title has been revised as “An explainable artificial intelligence-based robustness optimization approach for age-related macular degeneration detection based on medical IOT systems” to make it more specific to the work presented in the paper. Thank you so much for your comments and great advice.
Regarding the statement you mentioned, "the number of correct prototypes in ROI-extracted UWF increases from layer 3 to layer 9, whereas layer 12 shows fewer correct and more incorrect prototypes," I understand your concern about the conclusion drawn regarding the model's learning from layer 3 to 9 and its interpretability. I agree that it is essential to be cautious when drawing conclusions about interpretability from correlations between activation maps and labels. Therefore, I have revised the statement to reflect the findings presented in the paper more accurately and precisely. It has been revised as the following paragraph.
“Furthermore, the results of the XAI analysis conducted on the model trained for AMD detection revealed an interesting trend. As depicted in subfigure B of Figure 3, in the ROI-extracted UWF images the prototypes become increasingly reasonable from layer 3 to layer 9. This implies that the model is learning interpretably from layer 3 to layer 9, but not after layer 9. To explore the underlying reasons, this study conducted an extensive XAI analysis of this figure for each layer of the model. The findings are presented in Figure 4, which reveals that layers 3, 10, 12, 13, 14, and 16 exhibit weak interpretability.”
I also appreciate your comment about the writing style of the paper. I have revised the earlier sections to ensure that they are appropriately concise and effectively communicate the necessary background information.
Once again, thank you for your valuable feedback, and I will take all your comments into consideration to improve the quality of my paper. I really appreciated it.
Author Response File: Author Response.pdf
Reviewer 3 Report
In this study, the authors proposed explainable AI models for detecting AMD using various ophthalmic imaging modalities collected from medical IOT systems. Although the idea is of interest, some major concerns are raised as follows:
1. The models have been implemented using simple/conventional architectures, thus the contribution is limited. What is indeed a significant novelty added to this study?
2. The study mentions the application of Explainable Artificial Intelligence (XAI) techniques to detect Age-related Macular Degeneration (AMD) using various ophthalmic imaging modalities. However, it does not provide specific details about the XAI techniques employed. Different XAI techniques have varying strengths and limitations, and their suitability for a particular problem domain should be justified. Providing more information about the specific XAI techniques used and their rationale would enhance the study's credibility.
3. The study acknowledges concerns about the generalization and robustness of AI models for eye disease diagnosis. However, it does not thoroughly discuss how the proposed XAI methods address these concerns. Generalization is a critical aspect of deploying AI models in real-world settings, and it is important to discuss how the XAI techniques mitigate overfitting and improve the model's ability to perform well on unseen data.
4. The study mentions the proposal of an optimized deep learning (DL) model for AMD identification based on insights extracted by XAI. However, it lacks specific details about the architecture, training methodology, hyperparameters, and evaluation metrics used for the DL model. Providing more information about the DL model's design and its optimization process would allow readers to assess its effectiveness and reproducibility.
5. The study mentions the findings that demonstrate the potential of XAI to improve the transparency, reliability, and trustworthiness of AI models for ophthalmic applications. However, it does not provide comprehensive details about the evaluation metrics, validation datasets, or statistical significance of the results obtained. Including a thorough evaluation with appropriate benchmarking, statistical analysis, and validation against independent datasets would strengthen the study's claims.
6. While the study highlights the advantages of XAI in improving the performance of ophthalmic AI systems, it does not discuss the practical considerations for implementing these systems in real-world clinical settings. Factors such as integration with existing healthcare infrastructure, data sharing protocols, regulatory compliance, and user acceptance are critical for successful deployment. Addressing these practical implementation challenges would provide valuable insights into the feasibility and adoption of the proposed approach.
7. Deep learning is well-known and has been used in previous studies, e.g., PMID: 36642410 and PMID: 31920706. Therefore, the authors are suggested to refer to more works in this description to attract a broader readership.
8. The study emphasizes the potential benefits of XAI in ophthalmic AI systems but does not address potential ethical considerations associated with these technologies. AI in healthcare raises important ethical concerns related to data privacy, informed consent, biases, and the impact on the doctor-patient relationship. Discussing these ethical considerations and any measures taken to mitigate potential risks would enhance the study's relevance and ensure a more holistic understanding of the proposed approach.
9. When comparing the predictive performance among methods/models, the authors should perform some statistical tests to see significant differences.
10. The authors should compare their predictive performance to previously published works on the same problem/data.
11. Source codes should be provided for replicating the study.
12. Quality of figures should be improved.
English writing and presentation style should be improved.
Author Response
Thank you for your feedback on my paper. I appreciate your comments and will address them accordingly.
- Regarding your first concern, I understand your point that the models used in the study are simple and conventional. However, the novelty of the study lies in the application of explainable AI (XAI) to enhance the robustness of the machine learning model. This article proposes and aims to verify the hypothesis "Can XAI help to optimize machine learning model robustness?", and a novel model robustness optimization approach based on XAI is proposed. We take the AMD image classification task as a case study, and multiple types of images from the IOT system are tested to verify this hypothesis. Real-world data are used to evaluate the machine learning model's robustness. In this experiment, we use the XAI mechanism to optimize the robustness of VGG16 and compare the robustness of the original VGG16 and the optimized model. In addition, this study used XAI for detecting AMD using various ophthalmic imaging modalities collected from medical IOT systems. This approach has the potential to improve the interpretability and transparency of the models, which is crucial in the medical domain. Additionally, the study provides insights into the performance of these models compared to traditional machine learning models, which can be useful for future research.
- Regarding the second concern, I agree that providing more specific details about the XAI techniques used and their rationale would enhance the study's credibility. The XAI techniques used in the study include Class Activation Mapping (CAM) and Grad-CAM, which are commonly used in the field of computer vision to visualize the features learned by deep learning models. We used Grad-CAM and Layer-CAM (1) in the new setting of AMD detection model optimization; (2) to evaluate the explainability of the typical machine learning model and to recognize data and model bias; and (3) to guide the model optimization with skip, attention, and transfer mechanisms. Results show that, guided by the CAM mechanism, the optimized model achieves a significant robustness enhancement. Still, you are right about the need to explore the XAI methods, so I have provided more information about the specific XAI techniques used in the paper and their suitability for the problem domain to improve the clarity and credibility of the study.
“The gradient feature map is a great way to explain an ML model's convergence direction. By comparing the concepts behind different CAM mechanisms, this study uses Grad-CAM for heatmap generation at the last layer, since it is suitable for evaluating VGG16 frameworks, which include a fully connected layer. Layer-CAM is utilized for prototype extraction at layers within the network since, as weighted sums are calculated, no fully connected layer is required for this approach.”
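For illustration, below is a minimal sketch of last-layer Grad-CAM and element-wise Layer-CAM on a VGG16 classifier, written directly with PyTorch hooks rather than with any particular CAM library; the two-class head, the hooked layer, and the normalization details are assumptions for the example, not the paper's exact configuration.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16

model = vgg16(num_classes=2)   # AMD vs. non-AMD head; trained weights assumed to be loaded elsewhere
model.eval()

# Hook the layer to be explained (here the last convolutional layer; any intermediate layer works for Layer-CAM).
target_layer = [m for m in model.features if isinstance(m, torch.nn.Conv2d)][-1]
store = {}
target_layer.register_forward_hook(lambda m, i, o: store.update(act=o))
target_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

def _heatmap(cam: torch.Tensor, size) -> torch.Tensor:
    """Upsample the raw CAM to the input size and normalize it to [0, 1]."""
    cam = F.interpolate(cam, size=size, mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8))[0, 0].detach()

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Grad-CAM: channel weights are the globally averaged gradients of the class score."""
    model.zero_grad()
    model(x)[0, class_idx].backward()
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)
    return _heatmap(F.relu((weights * store["act"]).sum(1, keepdim=True)), x.shape[-2:])

def layer_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Layer-CAM: element-wise positive gradients weight the activations, so no fully
    connected layer is required and intermediate layers can also be explained."""
    model.zero_grad()
    model(x)[0, class_idx].backward()
    return _heatmap(F.relu((F.relu(store["grad"]) * store["act"]).sum(1, keepdim=True)), x.shape[-2:])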
- Thank you for your feedback on the need to discuss how the proposed XAI methods address concerns about the generalization and robustness of AI models in eye disease diagnosis. We agree that generalization is critical for the real-world deployment of AI models and have revised the article to include a more detailed discussion of how our proposed XAI techniques mitigate overfitting and improve model performance on unseen data. Specifically, we have elaborated on how our explainability methods help identify features that are most relevant for diagnosis, which in turn improves the generalization of the model. Additionally, we have included a discussion on how our proposed XAI methods can help identify and mitigate bias in the training data, which further improves the robustness of the model.
“
In conclusion, this study shows that the CAM-based XAI method is a great way to evaluate the efficiency of the layers in the model. Based on the CAM score and the judgment of the ophthalmologists, the layers with strong explainability should receive more weight in the model architecture. Through skip and attention mechanisms, these layers can contribute more to the prediction process. Besides, with CAM visualization and cooperation with experts, data bias can be recognized, which assists data preprocessing and data quality enhancement. Thus, compared to the original black-box models, in which model bias and data bias are not recognized, the interpretable improved learning model shows high robustness.
”
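As an illustration of the optimization idea described in the quoted paragraph, the sketch below adds squeeze-and-excitation style channel attention and an identity skip connection around a chosen sub-block of VGG16 layers; the class names, the reduction factor, and the wrapped layer indices are assumptions for the example, not the exact architecture used in the paper.

import torch.nn as nn
from torchvision.models import vgg16

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention that re-weights a layer's channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * w[:, :, None, None]    # excite: channel-wise re-weighting

class AttentiveSkipBlock(nn.Module):
    """Wrap an existing sub-block with channel attention plus an identity skip connection."""
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.block = block
        self.attn = ChannelAttention(channels)

    def forward(self, x):
        out = self.attn(self.block(x))
        return out + x if out.shape == x.shape else out   # add the skip only when shapes match

# Example: wrap two layers of VGG16's feature extractor (indices chosen only for illustration);
# re-inserting the wrapped block into the backbone is left to the training script.
backbone = vgg16(num_classes=2)
sub_block = nn.Sequential(*list(backbone.features.children())[7:9])   # Conv2d(128, 128) + ReLU
wrapped = AttentiveSkipBlock(sub_block, channels=128)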
- Thank you for your feedback on the need to provide more details about the architecture, training methodology, hyperparameters, and evaluation metrics used for the DL model. We have updated the article to include a more detailed description of the DL model's architecture, including the number of layers, activation functions, and optimization algorithm used. We have also provided more information about the training methodology, including the use of data augmentation techniques and the selection of hyperparameters. Finally, we have included a discussion of the evaluation metrics used, including sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC). We hope that these revisions provide a more comprehensive description of the proposed DL model and its performance.
- Thank you for your feedback on the need to provide more comprehensive details about the evaluation metrics, validation datasets, and statistical significance of the results obtained. We have revised the article to include a more thorough evaluation of the proposed XAI techniques, including benchmarking against state-of-the-art models, statistical analysis of the results, and validation against independent datasets. We have also included a discussion of the limitations of our study and directions for future research. We believe that these revisions strengthen the claims made in the article and provide a more complete picture of the effectiveness of our proposed XAI techniques.
“
In AI-based AMD detection, low explainability can result in low robustness, leading to reduced confidence in the model's output and lower adoption rates[52]. A model lacking transparency may also be difficult to interpret, making it challenging for clinicians to understand how the model arrived at its diagnosis. Therefore, XAI exploration is necessary for AI-based medical identification tasks.
Moreover, a high level of explainability and interpretability can assist in improving model performance and robustness[53]. This study verified the effectiveness of the data bias removal and model enhancement processes in enhancing the robustness of AMD detection by evaluating accuracy results on unseen test datasets. This study shows that the CAM-based XAI method is a great way to evaluate the efficiency of the layers in the model. Based on the CAM score and the judgment of the ophthalmologists, the layers with strong explainability should receive more weight in the model architecture. Through skip and attention mechanisms, these layers can contribute more to the prediction process. Besides, with CAM visualization and cooperation with experts, data bias can be recognized, which assists data preprocessing and data quality enhancement. Thus, compared to the original black-box models, in which model bias and data bias are not recognized, the interpretable improved learning model shows high robustness.
Additionally, XAI indicators and measurements can also aid in sample data selection and in the choice of preprocessing method[54]. The study showed that segmented OCT and ROI-extracted UWF images deliver the strongest explainability and interpretability, owing to the prominent drusen features in OCT after segmentation and the lower noise across the entire ROI-extracted UWF image. It is important to analyze and interpret visualizations of prototypes at different layers to understand the model's behavior and improve its accuracy, transparency, and interpretability. By understanding the most important features and patterns in diagnosis, clinicians can ensure that the AI system is not unfairly biased towards certain patient populations, making it more useful and effective in clinical practice.
Moreover, by exploring the XAI measurements in certain layers of the DL-based AMD detection model, this study has identified three possible reasons for the existence of wrong prototypes. Firstly, the complexity and variability of the features learned by the model at different layers could be contributing factors. Deeper layers of the model learn more abstract and intricate features that may not be directly relevant to the target AMD features, leading to wrong prototypes in certain layers that do not aid accurate detection. Secondly, the model may have overfit certain features or patterns in the training data, resulting in the generation of wrong prototypes. An insufficient amount of training data may also be a possible reason for the development of wrong prototypes, as it can cause the model to learn irrelevant or incorrect features. To address these issues, skip connections that learn global and local features from each layer could be introduced. Advanced feature extraction mechanisms, such as reward mechanisms for reinforcement learning, punishment mechanisms for adversarial neural networks, and attention mechanisms for useful feature extraction, should also be considered. Besides, since the features extracted from different channels differ, channel selection and model optimization should be considered in the future. Data rotation and other data augmentation processes could be used to improve data robustness, and more real-world data could be added to the model training dataset. Transfer learning is another great way to counter the small size of the training dataset; it has been verified in this study by comparing the performance of the original VGG16 with that of the improved model with transfer learning based on FAF images.
Furthermore, this study suggests a potential relationship between the center of the optic disc and AMD, as nearly all the analyzed images exhibit a prototype in this area. The optic disc is an essential region in the retina that is positioned close to the macula, which is responsible for detailed vision and is often affected by AMD. Drusen, small deposits of cellular debris, can accumulate in the macula of individuals with AMD and may also be detected near the optic disc. Therefore, it is plausible that the appearance of a prototype in the center of the optic disc in certain images could be associated with the presence of drusen and AMD. However, additional research is required to validate this hypothesis.
However, there are still some limitations to this study. First, more DL models could be considered as the basic architecture for classification rather than only the typical VGG16 model. Second, more classification tasks could be performed to verify the hypothesis proposed in this study. Third, a deeper exploration of data quality enhancement and data bias removal based on XAI could be discussed. Lastly, other XAI mechanisms and optimization methods may be considered, and the proposed model optimization method could be compared with other optimization approaches.
”
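On the transfer learning point raised in the quoted discussion above, the following is a minimal sketch of the kind of setup meant here: start from an ImageNet-pretrained VGG16, freeze the convolutional features, and retrain only a new two-class head on the small FAF dataset. The torchvision weights API shown and the decision to freeze the whole backbone are assumptions for illustration (older torchvision versions use pretrained=True instead).

import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

model = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)   # start from ImageNet-pretrained weights
for param in model.features.parameters():
    param.requires_grad = False                       # freeze the convolutional backbone
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)   # new AMD / non-AMD head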
- Thank you for your feedback on the need to discuss the practical considerations for implementing ophthalmic AI systems in real-world clinical settings. We have revised the article to include a more detailed discussion of the practical considerations, including data privacy, regulatory compliance, and ethical considerations. We have also included a discussion of the potential challenges that may arise when implementing ophthalmic AI systems in clinical settings, such as the need for robust validation and monitoring of the system's performance. We hope that these revisions provide a more complete picture of the practical considerations involved in implementing ophthalmic AI systems in real-world clinical settings.
“
The implementation of ophthalmic AI systems in clinical settings necessitates careful consideration of practical aspects, including data privacy, regulatory compliance, ethical considerations, and system validation and performance monitoring challenges. These considerations are crucial to ensure the privacy and security of patient data in accordance with relevant data protection regulations such as GDPR and HIPAA. Compliance with regulatory requirements, including obtaining necessary certifications or approvals, may be necessary. Ethical considerations are paramount, including informed consent, transparency in decision-making, and responsible use of AI technology. Robust validation processes are essential to assess the performance and reliability of ophthalmic AI systems, including evaluating the accuracy, sensitivity, specificity, and other pertinent metrics. Performance monitoring is crucial to identify any potential drift or degradation in system performance over time. Mitigating biases and ensuring fairness in AI systems is important and achieved through diverse and representative training data. User training and education are vital to equip healthcare professionals with the knowledge to interpret AI system outputs accurately. Integrating AI systems into clinical workflows should be seamless and efficient. Liability, malpractice, and legal frameworks must be established to address responsibility and accountability. Successful implementation requires collaboration among AI developers, healthcare professionals, regulatory bodies, and ethics committees to navigate these challenges while upholding patient privacy, safety, and ethical standards in AI technology usage.
”
- Thank you for your feedback on the need to refer to more works in the description of deep learning to attract a broader readership. We have revised the article to include a more comprehensive review of previous studies that have used deep learning for ophthalmic applications. We hope that these revisions provide readers with a better understanding of the state-of-the-art in this field and the novelty of our proposed approach.
“
The development of AI-based detection methods delivers a potential opportunity for AMD detection and treatment[5]. AMD is a complex and multifactorial eye disease that requires accurate and timely diagnosis and treatment to prevent vision loss and maintain quality of life for affected individuals[6, 7]. Traditional methods of AMD diagnosis, such as manual grading of fundus images, are time-consuming and can be prone to inter-observer variability[8]. The increasing prevalence of AMD, coupled with the aging population, creates a growing need for more effective and accessible screening and diagnosis methods. AI-based detection methods have the potential to address this need by providing cost-effective and scalable approaches to AMD diagnosis and monitoring[9], enabling earlier detection and intervention[10]. CNN-based deep learning methods have been proven to perform well for AMD detection[11, 12], scoring[13], classification (wet and dry)[14], biomarker extraction[15, 16], and the analysis of relationships between AMD and other organs (such as the liver)[17].
”
- Thank you for your feedback on the need to address potential ethical considerations associated with ophthalmic AI systems. We have revised the article to include a more detailed discussion of the ethical considerations involved in the development and deployment of these systems, including data privacy, informed consent, biases, and the impact on the doctor-patient relationship. We have also included a discussion of potential solutions to mitigate these ethical risks, such as the use of privacy-preserving techniques and transparent explainability methods. We hope that these revisions provide readers with a more complete understanding of the ethical considerations involved in the use of ophthalmic AI systems.
Thank you for your valuable feedback on my article. I appreciate your comments and suggestions, and I will take them into consideration in my revisions.
- Regarding the ethical considerations and potential risks of the proposed approach, I agree that discussing these aspects is crucial. I have expanded on this topic in my revised manuscript, including any measures taken to mitigate potential risks.
- In terms of comparing the predictive performance among methods/models, I agree that statistical tests should be performed to determine significant differences. I have included these tests in my revised manuscript to provide a more comprehensive analysis (a minimal sketch of such a comparison is shown after the code-availability note below).
- I also agree that comparing our predictive performance to previously published works on the same problem/data is important. I have included a comparison of our results with the relevant literature in my revised manuscript.
- Regarding the source code, I have provided it through my GitHub repository.
“
The source code is available on the GitHub platform: https://github.com/MiniHanWang/xai-amd-1.git.
”
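Following up on the statistical comparison mentioned above, the lines below sketch how the reported metrics and a paired significance test between the original and optimized models could be computed on the same unseen test set; the 0/1 NumPy label arrays, the helper names, and the choice of an exact McNemar test are assumptions for the example.

import numpy as np
from scipy.stats import binomtest
from sklearn.metrics import confusion_matrix, roc_auc_score

def binary_report(y_true, y_pred, y_score):
    """Sensitivity, specificity, and AUC-ROC for one binary classifier."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "auc": roc_auc_score(y_true, y_score)}

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar p-value comparing two models evaluated on the same unseen test set."""
    a_ok, b_ok = pred_a == y_true, pred_b == y_true
    only_a = int(np.sum(a_ok & ~b_ok))   # cases only model A classifies correctly
    only_b = int(np.sum(~a_ok & b_ok))   # cases only model B classifies correctly
    if only_a + only_b == 0:
        return 1.0
    return binomtest(min(only_a, only_b), n=only_a + only_b, p=0.5).pvalue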
Finally, I acknowledge your comment on the English writing and presentation style. I have worked on improving the clarity of the language and presentation in my revised manuscript.
I appreciate your feedback on the quality of the figures, and I will make sure to improve them in my revised manuscript.
Author Response File: Author Response.pdf
Reviewer 4 Report
The authors presented the paper "Can Explainable Artificial Intelligence optimize the robustness of the machine learning model? Age-related Macular Degeneration detection based on medical IOT systems"
1) The reference list may be improved. More review papers from the last 2-3 years should be cited in the Introduction section to show the progress in the area.
2) I don't see any relevant differences between the Introduction and Section 2. The text of Section 2 looks very similar to part of the Introduction, and the two may be united into one good text. No specific data, parameters, design, or previously published results are discussed in Section 2, which could have improved the significance of this section.
3) Table 2. You have excellent accuracy results. However, I recommend presenting the error of these data. Moreover, accuracy is only one result.
Is it possible, for example, to present not only accuracy but also Sensitivity, Specificity, False Negative Ratio, False Positive Ratio, Total Correct, Total Wrong, and Precision data? It would be good to present the main data with error.
4) The discussion section is poor. Please discuss the quantitative results of your experiments and compare your results with previously published works.
5) What would the results be if you used samples with another disease instead of Age-related Macular Degeneration? Have you studied such a possibility?
6) Please specify the quantitative results of your experiments in the Conclusion section. The novelty and limitations of the work should be clearly mentioned in the Conclusion section and the Abstract. I recommend adding future prospects and outlooks to the Conclusion section.
Figure 7. It is not possible to see the text.
Moderate editing of English language required
Author Response
Thank you for your feedback on my paper. I appreciate your suggestions, and I will take them into consideration in my revisions.
- Regarding the reference list, I agree that including more recent review papers in the Introduction would help demonstrate the progress in the area. I have added some relevant papers to my reference list to address this concern.
- I appreciate your comment on the similarity between the Introduction and Section 2. I have combined the two sections. Thank you so much for your precious advice.
- Regarding Table 2, I agree that presenting the error of the accuracy results would be valuable information. In addition, I acknowledge that accuracy is only one metric, and I have included other performance metrics, such as sensitivity and specificity, in my revised manuscript. Thank you so much for your precious suggestion. I really appreciate it.
- I agree that the discussion section needs improvement. In my revised manuscript, I have discussed the quantitative results of my experiments in more detail, and I will compare my results with previously published works to highlight the significance of my findings. Thank you again for your precious advice.
- Regarding the possibility of using samples with other diseases instead of Age-related Macular Degeneration, I have not studied this possibility. However, I will acknowledge this limitation in my revised manuscript and suggest possible future directions for research in this area.
- In my revised manuscript, I have specified the quantitative results of my experiments in the discussion section and have also mentioned the novelty and limitations of the work there. Additionally, future prospects and outlooks are included in the discussion and conclusion sections to provide readers with a sense of the potential impact of my research.
“
In AI-based AMD detection, low explainability can result in low robustness, leading to reduced confidence in the model's output and lower adoption rates[61]. A model lacking transparency may also be difficult to interpret, making it challenging for clinicians to understand how the model arrived at its diagnosis. Therefore, XAI exploration is necessary for AI-based medical identification tasks. Compared with former studies, this study proposes a novel model optimization approach based on XAI. Taking AMD image classification based on multiple types of images from the IOT system as an example, this study proposes a model enhancement approach that adds skip and attention mechanisms between explainable layers. A data bias removal process is implemented and guided by the XAI evaluation results. Real-world "unseen" data are used to evaluate the machine learning model's robustness. By comparing the robustness of the original VGG16 and the optimized model for AMD detection, this study verified the effectiveness of the XAI mechanism in optimizing the robustness of machine learning models based on four types of ophthalmic digital images (average accuracy on the unseen testing dataset across all data types: 82% vs. 96.62%). Ethics approval was granted by Zhuhai People's Hospital (Zhuhai Hospital Affiliated with Jinan University).
Moreover, a high level of explainability and interpretability can assist in improving model performance and robustness[62]. This study verified the effectiveness of the data bias removal and model enhancement processes in enhancing the robustness of AMD detection by evaluating accuracy results on unseen test datasets. This study shows that the CAM-based XAI method is a great way to evaluate the efficiency of the layers in the model. Based on the CAM score and the judgment of the ophthalmologists, the layers with strong explainability should receive more weight in the model architecture. Through skip and attention mechanisms, these layers can contribute more to the prediction process. Besides, with CAM visualization and cooperation with experts, data bias can be recognized, which assists data preprocessing and data quality enhancement. Thus, compared to the original black-box models, in which model bias and data bias are not recognized, the interpretable improved learning model shows high robustness.
Additionally, XAI indicators and measurements can also aid in sample data selection and in the choice of preprocessing method[63]. The study showed that segmented OCT and ROI-extracted UWF images deliver the strongest explainability and interpretability, owing to the prominent drusen features in OCT after segmentation and the lower noise across the entire ROI-extracted UWF image. It is important to analyze and interpret visualizations of prototypes at different layers to understand the model's behavior and improve its accuracy, transparency, and interpretability. By understanding the most important features and patterns in diagnosis, clinicians can ensure that the AI system is not unfairly biased towards certain patient populations, making it more useful and effective in clinical practice.
Moreover, by exploring the XAI measurements in certain layers of the DL-based AMD detection model, this study has identified three possible reasons for the existence of wrong prototypes. Firstly, the complexity and variability of the features learned by the model at different layers could be contributing factors. Deeper layers of the model learn more abstract and intricate features that may not be directly relevant to the target AMD features, leading to wrong prototypes in certain layers that do not aid accurate detection. Secondly, the model may have overfit certain features or patterns in the training data, resulting in the generation of wrong prototypes. An insufficient amount of training data may also be a possible reason for the development of wrong prototypes, as it can cause the model to learn irrelevant or incorrect features. To address these issues, skip connections that learn global and local features from each layer could be introduced. Advanced feature extraction mechanisms, such as reward mechanisms for reinforcement learning, punishment mechanisms for adversarial neural networks, and attention mechanisms for useful feature extraction, should also be considered. Besides, since the features extracted from different channels differ, channel selection and model optimization should be considered in the future. Data rotation and other data augmentation processes could be used to improve data robustness, and more real-world data could be added to the model training dataset. Transfer learning is another great way to counter the small size of the training dataset; it has been verified in this study by comparing the performance of the original VGG16 (accuracy for the unseen testing dataset = 97%) with that of the improved model with transfer learning (accuracy for the unseen testing dataset = 99.2%) based on FAF images.
Furthermore, this study suggests a potential relationship between the center of the optic disc and AMD, as nearly all the analyzed images exhibit a prototype in this area. The optic disc is an essential region in the retina that is positioned close to the macula, which is responsible for detailed vision and is often affected by AMD. Drusen, small deposits of cellular debris, can accumulate in the macula of individuals with AMD and may also be detected near the optic disc. Therefore, it is plausible that the appearance of a prototype in the center of the optic disc in certain images could be associated with the presence of drusen and AMD. However, additional research is required to validate this hypothesis.
The implementation of ophthalmic AI systems in clinical settings necessitates careful consideration of practical aspects, including data privacy, regulatory compliance, ethical considerations, and challenges related to system validation and performance monitoring. These considerations are crucial to ensure the privacy and security of patient data in accordance with relevant data protection regulations such as GDPR and HIPAA. Compliance with regulatory requirements, including obtaining necessary certifications or approval, may be necessary. Ethical considerations are paramount, including informed consent, transparency in decision-making, and responsible use of AI technology. Robust validation processes are essential to assess the performance and reliability of ophthalmic AI systems, including evaluating the accuracy, sensitivity, specificity, and other pertinent metrics. Performance monitoring is crucial to identify any potential drift or degradation in system performance over time. Mitigating biases and ensuring fairness in AI systems is important and achieved through diverse and representative training data. User training and education are vital to equip healthcare professionals with the knowledge to interpret AI system outputs accurately. Integrating AI systems into clinical workflows should be seamless and efficient. Liability, malpractice, and legal frameworks must be established to address responsibility and accountability. Successful implementation requires collaboration among AI developers, healthcare professionals, regulatory bodies, and ethics committees to navigate these challenges while upholding patient privacy, safety, and ethical standards in AI technology usage.
However, there are still some limitations to this study. First, more DL models could be considered as the basic architecture for classification rather than only the typical VGG16 model. Second, classification tasks other than AMD detection could be performed to verify the hypothesis proposed in this study. Third, a deeper exploration of data quality enhancement and data bias removal based on XAI could be discussed. Lastly, other XAI mechanisms and optimization methods may be considered, and the proposed model optimization method could be compared with other optimization approaches.
”
- I appreciate your feedback on Figure 7, and I will make sure to improve its legibility in my revised manuscript. All the original high-resolution figures will be submitted to the system and delivered to the editors. Thank you again for your precious advice.
Finally, I acknowledge your comment on the English language, and I will make sure to edit the paper appropriately to improve its clarity and readability.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The new version of the paper is OK.
The authors reduced the first part of the paper, modified the title, and made more precise comments where necessary, avoiding conclusions beyond the results shown in the paper.
Reviewer 4 Report
Thank you for the revised paper.
Minor editing of English language required