Optimal DaTQUANT Thresholds for Diagnostic Accuracy of Dementia with Lewy Bodies (DLB) and Parkinson’s Disease (PD)
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this manuscript, Kuo et al. used quantitative SPECT image features extracted with DaTQUANT to predict the patient clinical outcome. This work extended previous studies which were limited either to the patient population or missing key statistical indicators. The reviewer believes that the scientific soundness of this manuscript can be improved if the below comments can be further addressed.
1. Please explain the process of the construction of multi-variable model a bit more in the methods section, even if it is almost identical to the previous study. Is it simply a linear combination of these image features or are they weighted? Have the authors ever thought about using machine-learning based classifiers to predict the outcomes, such as a support vector machine?
2. In the result section, can the authors add ROC (receiver operating characteristic) curves of the single and multi-variable models? A comparison of ROC curves between this study and previous studies would also be very interesting.
3. The methods section also lacks some description of the imaging acquisition procedure and the scanner used for different patient cohorts. Please briefly provide this information, as well as the image processing steps, such as reconstruction and denoising, if possible.
4. In the discussion section, the authors could also discuss how pathologically or physiologically putamen could be linked with PD, DLB and how it could be used to differentiate the clinical outcomes.
Comments on the Quality of English Language
The abbreviations in this manuscript are a bit confusing. The reviewer suggests another round of proofreading. For example, 'SBR' was used as both 'specific binding ratio' and 'striatal binding ratio'. If the primary use case in this article is 'striatal binding ratio' then the former abbreviation should be removed, otherwise they should be abbreviated differently. Also, the title does not have to contain abbreviations. Some abbreviations were defined in the abstract but never used again, then defined again in the main body.
Some other minor typos: in Table 3, should 'Single-variable model: Post Putamen' instead read 'Posterior Putamen'?
Author Response
September 30th, 2024
Prof. Dr. Emilio Quaia,
Editor-in-Chief, Tomography
Department of Radiology
University of Padova
35100 Padova, Italy
Dear Dr. Quaia,
We would like to express our sincere gratitude for the time and effort invested by you and the reviewers in evaluating our manuscript titled “Optimal DaTQUANT Thresholds for Diagnostic Accuracy of Dementia with Lewy Bodies (DLB) and Parkinson’s Disease (PD)”.
We have carefully considered all the comments and suggestions from the reviewers and provided a detailed, point-by-point response to their comments in the following pages of this letter.
Thank you for your consideration, and we look forward to your feedback.
Sincerely,
Julia M. Fisher
Statistics Consulting Laboratory, BIO5 Institute
University of Arizona, Tucson, AZ, USA
julia@statlab.bio5.org
MDPI Editorial Office:
During the initial check, we noticed that the format of the references in your manuscript does not meet the requirements of the journal. Please cite references throughout the article with reference numbers, and place the numbers in square brackets [ ], for example [1], [1–3] or [1,3]. You may refer to the following website: https://www.mdpi.com/authors/references.
Thank you for your feedback. The brackets have been inserted and highlighted per your guidelines in the latest manuscript.
Review 1 Comments and Suggestions:
In this manuscript, Kuo et al. used quantitative SPECT image features extracted with DaTQUANT to predict the patient clinical outcome. This work extended previous studies which were limited either to the patient population or missing key statistical indicators. The reviewer believes that the scientific soundness of this manuscript can be improved if the below comments can be further addressed.
- Please explain the process of the construction of multi-variable model a bit more in the methods section, even if it is almost identical to the previous study. Is it simply a linear combination of these image features or are they weighted? Have the authors ever thought about using machine-learning based classifiers to predict the outcomes, such as a support vector machine?
Thank you for your insightful comment. We apologize for the oversight in not elaborating on the construction of the multi-variable model in the methods section. The multi-variable models were logistic regression models of disease (1 = has disease, 0 = does not have disease). For each type of ioflupane iodine-123 [I123] measure (i.e., SBR, z-score, percent deviation) and each possible combination of the eight region or asymmetry measures (i.e., striatum, putamen, caudate nucleus, anterior putamen, posterior putamen, and PCR of the MAH and the caudate and putamen asymmetries), a separate logistic regression model was fit. Thus, the log odds of having a given type of disease (i.e., pDLB or PS) was modeled as a linear combination of ioflupane iodine-123 measurements. The coefficients for each measure were estimated in the modeling process using maximum likelihood, producing an expression for the log odds of disease that differentially weights the various measures. The following text was added to section 2.3 for clarity:
“For each ioflupane iodine-123 [I123] binding measure (ie, SBR, z-score, % deviation), every possible combination of one to eight measurements (255 combinations in total) was included in a separate logistic regression model of disease state. For example, the most complex model using z-scores modeled the log odds of disease as a linear combination of the MAH z-scores for the striatum, putamen, caudate nucleus, anterior putamen, posterior putamen, and PCR, and the z-scores for the caudate and putamen asymmetries. In contrast, the simplest models included only a single predictor (e.g., the MAH z-score for the putamen). The predictive ability (accuracy, sensitivity, and specificity) of each combination of measurements and ioflupane iodine-123 [I123] binding measure was evaluated using leave-one-out cross-validation.”
Regarding the application of machine-learning classifiers such as support vector machines (SVMs), we did consider this approach. However, we decided to use logistic regression as it allows for easier interpretability of the coefficients, particularly in a clinical context where understanding the contribution of each variable is crucial. Additionally, logistic regression aligns with our previous work and existing literature, allowing for consistency in the analysis and comparison of results.
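As an illustration of the modeling procedure described in this response, the following is a minimal sketch, not the authors' actual code. It enumerates every non-empty combination of eight measures, fits a logistic regression to one combination by plain gradient ascent on the log-likelihood (a simple stand-in for a statistical package's maximum likelihood routine), and scores it with leave-one-out cross-validation. The measure labels, the synthetic z-scores, and all function names are assumptions made for the example.

```python
import itertools
import math
import random

# Hypothetical labels for the eight region and asymmetry measures.
MEASURES = [
    "striatum", "putamen", "caudate", "anterior_putamen",
    "posterior_putamen", "pcr", "caudate_asym", "putamen_asym",
]

def all_subsets(names):
    """Every non-empty combination of measures (2**8 - 1 = 255 for eight)."""
    return [combo
            for k in range(1, len(names) + 1)
            for combo in itertools.combinations(names, k)]

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_odds(w, x):
    """Intercept plus the weighted sum of the measurements."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, steps=200, lr=0.1):
    """Approximate the maximum likelihood fit by gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(log_odds(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def loocv_metrics(rows, y, subset):
    """Leave-one-out accuracy, sensitivity, and specificity for one subset."""
    X = [[row[m] for m in subset] for row in rows]
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        # Fit on everyone except patient i, then predict patient i.
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = 1 if log_odds(w, X[i]) >= 0 else 0  # log odds >= 0 means p >= 0.5
        if pred == 1 and y[i] == 1:
            tp += 1
        elif pred == 0 and y[i] == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(X),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Synthetic z-scores, invented for illustration: lower values with disease.
random.seed(0)
y = [1] * 15 + [0] * 15
rows = [{m: random.gauss(-2.0 if label else 0.0, 1.0) for m in MEASURES}
        for label in y]

subsets = all_subsets(MEASURES)
single = loocv_metrics(rows, y, ("posterior_putamen",))
full = loocv_metrics(rows, y, tuple(MEASURES))
print(len(subsets), single, full)
```

Only the simplest and the most complex models are scored here to keep the run short; looping `loocv_metrics` over every entry of `subsets`, once per binding measure, would reproduce the overall shape of the analysis.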
- In the result section, can the authors add ROC (receiver operating characteristic) curves of the single and multi-variable models? A comparison of ROC curves between this study and previous studies would also be very interesting.
Thank you for your feedback. We appreciate the usefulness of ROC curves and their ability to illuminate the separation of patient groups based on a classification measure. However, in this case, we feel they do not provide much additional clinical utility given the information we have already detailed in the manuscript. Specifically, we provide plots of the distributions of SBR values for each MAH region and asymmetry measure for patients with and without pDLB. These are almost identical to the analogous plots for the other ioflupane iodine-123 measures and quite similar to the plots for the PS/non-PS patient groups. These plots provide similar information to the ROC curves for the single-variable models. Given the overall message that a single region performs similarly to multiple regions at distinguishing patients with and without disease, and given the provision of accuracy, sensitivity, and specificity for all models, we do not feel that ROC curve plots would provide additional clinically actionable information.
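To make concrete why the distribution plots and a single-variable ROC curve carry overlapping information, here is a small sketch under stated assumptions. It sweeps a threshold across two groups of values and records the true-positive and false-positive rates at each step, which is all a single-variable ROC curve contains. The SBR values are invented for illustration, and the rule "value at or below threshold means disease" is an assumption of the example.

```python
def roc_points(diseased, healthy):
    """ROC points from sweeping a 'value <= threshold means disease' rule."""
    points = [(0.0, 0.0)]
    for t in sorted(set(diseased) | set(healthy)):
        tpr = sum(v <= t for v in diseased) / len(diseased)
        fpr = sum(v <= t for v in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical SBR values, invented for illustration only.
sbr_disease = [0.5, 0.8, 1.0, 1.4]
sbr_no_disease = [1.2, 1.6, 1.9, 2.3]

curve = roc_points(sbr_disease, sbr_no_disease)
area = auc(curve)
print(curve)
print(area)
```

In this toy data, 15 of the 16 diseased/non-diseased pairs are ordered correctly, so the area is 0.9375; the single overlapping pair (1.4 versus 1.2) is the same overlap that would be visible in a plot of the two distributions.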
- The methods section also lacks some description of the imaging acquisition procedure and the scanner used for different patient cohorts. Please briefly provide this information, as well as the image processing steps, such as reconstruction and denoising, if possible.
Thank you for your comment. We appreciate the importance of providing detailed information regarding the imaging acquisition procedure and the scanners used, as well as the image processing steps, to ensure the reproducibility and clarity of our methods.
The following paragraph was added to Section 2.2:
“The patients in the three multicenter clinical trials were imaged using gamma cameras having either two or three detectors fitted with low energy, high-resolution (LEHR) collimators. The images were acquired for 30 minutes duration starting 3 to 3.5 hours after injection of between 2.5 and 6 mCi (92-222 MBq) of ioflupane using an energy window of either 15% or 20% and a pixel size between 3 and 4.5 mm. Reconstruction was performed using the default DaTQUANT parameters of OSEM 2i10s, 3D low-pass post-filter with cut-off frequency 0.6 cycles/pixel and power 10, and no corrections were applied. After automatic registration to the standard striatal template, the striatal and occipital volumes of interest were adjusted manually (only if necessary to accommodate any slight variations in patient anatomy).”
- In the discussion section, the authors could also discuss how pathologically or physiologically putamen could be linked with PD, DLB and how it could be used to differentiate the clinical outcomes.
Thank you for your feedback. We have incorporated the following verbiage within the discussion section of the manuscript (lines 380 to 383):
“Patients with dementia symptoms typically have more comorbidities in the brain which may lower the DaT density in the striata slightly and result in the more restrictive optimum threshold values to differentiate pDLB from non-DLB pathology which was obtained. See Figure 2C for an example.”
It is beyond the scope of the present work to explain the pathological correlation between reduced tracer intensity in the image and reduced DaT density in the striatal regions. This has been covered adequately by other studies that are referenced in the present work and that included pathology analysis of brain tissue. As mentioned in several places in the manuscript, there is no reliable way to differentiate PD from DLB pathologies using DaT SPECT imaging, so we decline the reviewer’s request to include discussion on “how it could be used to differentiate the clinical outcomes.”
Comments on the Quality of English Language
The abbreviations in this manuscript are a bit confusing. The reviewer suggests another round of proofreading. For example, 'SBR' was used as both 'specific binding ratio' and 'striatal binding ratio'. If the primary use case in this article is 'striatal binding ratio' then the former abbreviation should be removed, otherwise they should be abbreviated differently.
Thank you for your feedback. The former abbreviation has been removed, and every ‘SBR’ in the text now refers only to ‘striatal binding ratio’.
Also, the title does not have to contain abbreviations.
Thank you for your feedback. Many clinicians and researchers commonly refer to the diseases primarily by their abbreviations. Keeping the abbreviations in the title does significantly improve search results in PubMed and other search engines when using the abbreviations in the search field, even when the abbreviations are contained in the manuscript text. Finally, we were able to find many examples of other publications using DLB and PD in the title (both with and without the expanded versions). Therefore, to improve the ability of readers interested in the content to find the manuscript in search engines, we have respectfully decided to keep the title as-is.
Some abbreviations were defined in the abstract but never used again, then defined again in the main body.
Thank you for your feedback. However, we went through all the abbreviations used in the abstract (including NSDD, DLB, PS, non-PS, and MAH) and found that each of these abbreviations was indeed used again in the manuscript.
Some other minor typos: in Table 3, should 'Single-variable model: Post Putamen' instead read 'Posterior Putamen'?
Thank you for your feedback. We updated Table 3 to read “Single-variable model: posterior putamen”.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents a retrospective analysis for establishing optimal DaTQUANT thresholds to enhance the diagnostic accuracy of DaT SPECT. While it addresses an important clinical need, there are several concerns that need to be addressed.
1. While the study claims to offer "optimal" thresholds, the result appears to show only modest improvements in accuracy over existing methods.
2. The threshold obtained in the study is highly related to the data used, and it may have limited value in clinical practice.
3. The sample size, though adequate for a retrospective analysis, may not provide sufficient power to draw broad conclusions.
4. The confidence intervals for the diagnostic accuracy metrics are wide, suggesting a high degree of uncertainty.
5. The paper would be strengthened with a more detailed methods section, such as the image acquisition and preprocessing steps.
Author Response
Review 2 Comments and Suggestions:
The manuscript presents a retrospective analysis for establishing optimal DaTQUANT thresholds to enhance the diagnostic accuracy of DaT SPECT. While it addresses an important clinical need, there are several concerns that need to be addressed.
- While the study claims to offer "optimal" thresholds, the result appears to show only modest improvements in accuracy over existing methods.
Thank you for your insightful observation. We agree that the improvements in accuracy offered by our proposed thresholds may appear modest. However, the term "optimal" is used to describe thresholds that are specifically tailored to the distinct patient populations examined in our study (PD and DLB), and which balance simplicity of use, ease of interpretability and similar diagnostic performance. To better define “optimal” to the readers of this work, we have included the following statement in Section 4.1:
“An optimal discriminator will balance simplicity of use, ease of interpretability and adequate diagnostic performance. A single-variable predictor, such as the z-score of the MAH posterior putamen, is much simpler to use and interpret than a weighted multi-variable calculation.”
- The threshold obtained in the study is highly related to the data used, and it may have limited value in clinical practice.
Thank you for raising this important point. We acknowledge that the thresholds derived in our study are specific to the dataset used, which consists of patients from multicenter phase 3 or 4 clinical trials (Neill and colleagues). We believe that this actually enhances the applicability of our results to routine clinical practice for several reasons:
- Diverse Patient Population: The data used in our study were drawn from a multicenter cohort, including a broad spectrum of patients with varying stages of Parkinson’s disease (PD) and dementia with Lewy bodies (DLB). This diversity in patient demographics and disease presentation enhances the applicability of our thresholds across different clinical scenarios. This is similar to the wider applicability and better general performance of normal databases or normal limits which are drawn from multiple imaging centers with diverse imaging equipment when compared with databases arising from single centers.
- Comparison with Previous Studies: We compared our thresholds with those from previous studies and found that our thresholds were consistent with and in some cases slightly improved upon existing standards. This suggests that our findings are not isolated but are aligned with broader trends in DaT SPECT imaging.
- Practical Considerations: While thresholds may vary based on population characteristics and imaging protocols, our study emphasizes the importance of refining and adapting these thresholds to specific clinical contexts. This approach allows for the potential tailoring of diagnostic criteria to local practices, which is a common necessity in clinical diagnostics.
In addition, the limitations of the current study (including applicability of the data set) are discussed in the last paragraph of Section 4. In light of these factors, we believe our thresholds offer meaningful guidance for clinical practice, particularly when used in conjunction with clinician judgment and other diagnostic tools.
- The sample size, though adequate for a retrospective analysis, may not provide sufficient power to draw broad conclusions.
Thank you for your comment regarding the sample size. We agree that while the sample size of 303 patients is sufficient for the retrospective nature of this study, it does have limitations in terms of statistical power and the generalizability of the findings. However, we would like to highlight several factors that support the robustness of our conclusions:
- Multicenter Data: The dataset used in this study is drawn from multiple phase 3 and 4 clinical trials, encompassing a diverse patient population across different centers. This multicenter approach enhances the representativeness of our sample and mitigates some concerns related to sample size by providing a broader perspective on the data.
- Consistency with Previous Research: Our findings are consistent with those of previous studies, which lends additional credibility to our conclusions. The thresholds and diagnostic accuracy observed align with those reported in other research, suggesting that our results are not isolated but reflect broader trends in the field.
- Statistical Rigor: We employed rigorous statistical methods, including cross-validation and bootstrapping, to ensure the reliability of our findings. These techniques help to maximize the utility of the available data and provide confidence in the observed results, even with a moderate sample size.
- Clinical Relevance: While larger studies are always preferable, the findings from this study offer valuable insights that can inform clinical practice. The thresholds we identified provide practical guidance for the use of DaT SPECT image quantification to support diagnoses of PD and DLB, particularly in settings where similar patient symptoms due to different pathology are encountered.
- Largest DLB Cohort to Date: The cohort of patients with dementia symptoms in this work is the largest known to date that has been studied with DaT SPECT and that has rigorous collection of clinical diagnoses and other data, as is done in a clinical trial.
Future studies with larger cohorts and prospective designs may be necessary to further validate and expand upon these findings.
- The confidence intervals for the diagnostic accuracy metrics are wide, suggesting a high degree of uncertainty.
Thank you for your observation regarding the confidence intervals of our diagnostic accuracy metrics. We acknowledge that the wider confidence intervals indicate some degree of uncertainty in our estimates, which is not uncommon in studies with a moderate sample size and multiple variables. However, several factors help contextualize this uncertainty:
- Complexity of the Dataset: The data used in this study involved a diverse population from multiple centers, which introduces variability that can lead to wider confidence intervals. This variability reflects real-world conditions, where heterogeneity of patients and imaging equipment (in terms of both manufacturer and age) is common, and it underscores the robustness of our findings across different scenarios.
- Cross-Validation and Bootstrap Methods: We utilized cross-validation and bootstrap resampling to estimate the confidence intervals. These methods are designed to provide more reliable estimates by accounting for variability in the data. While this approach can result in wider intervals, it also increases the reliability of the point estimates, ensuring that they are not overly optimistic.
- Clinical Relevance Despite Uncertainty: Even with wider confidence intervals, the diagnostic accuracy metrics we observed are within acceptable ranges and align with those reported in previous studies. The thresholds we identified still provide valuable clinical guidance, particularly in differentiating PD and DLB from patients with similar symptoms but no degeneration of DaT, where even incremental improvements in accuracy can be clinically meaningful.
- Future Research Directions: We agree that further studies with larger sample sizes and prospective designs would help to narrow these confidence intervals and reduce uncertainty. The importance of ongoing research in this area and use of larger cohorts of data (which would hopefully help to reduce the size of the confidence intervals) is already mentioned in the last paragraph of Section 4 of the manuscript.
- We acknowledge that future research and harmonization efforts might result not in a precise threshold value, but in a definition of a range of highly uncertain borderline values. We discuss this possibility in section 4: “It is possible that further work will unify thresholds by establishing a “gray zone” that will delineate the value ranges of indeterminate scans.”
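The link between sample size and interval width discussed in these points can be sketched with a percentile bootstrap. This is a minimal illustration, not the authors' procedure, whose exact resampling scheme is not described in this letter. Each patient contributes a 1 (classified correctly) or a 0 (misclassified); the accuracy of roughly 85% and the smaller cohort of 60 are invented for the example, while 303 is the sample size stated in the response. The function name and parameters are assumptions.

```python
import random

def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample patients with replacement and recompute accuracy each time.
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-patient results: 1 = classified correctly, 0 = not.
small_cohort = [1] * 51 + [0] * 9      # 60 patients, 85% accuracy
large_cohort = [1] * 258 + [0] * 45    # 303 patients, about 85% accuracy

lo_small, hi_small = bootstrap_ci(small_cohort)
lo_large, hi_large = bootstrap_ci(large_cohort)
print("n=60: ", lo_small, hi_small)
print("n=303:", lo_large, hi_large)
```

With the same underlying accuracy, the interval for the larger cohort comes out narrower, which is the behavior the response appeals to when it suggests that larger future cohorts would reduce uncertainty.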
- The paper would be strengthened with a more detailed methods section, such as the image acquisition and preprocessing steps.
Thank you for your comment. We appreciate the importance of providing detailed information regarding the imaging acquisition procedure and the scanners used, as well as the image processing steps, to ensure the reproducibility and clarity of our methods.
The following paragraph was added to Section 2.2:
“The patients in the three multicenter clinical trials were imaged using gamma cameras having either two or three detectors fitted with low energy, high-resolution (LEHR) collimators. The images were acquired for 30 minutes duration starting 3 to 3.5 hours after injection of between 2.5 and 6 mCi (92-222 MBq) of ioflupane using an energy window of either 15% or 20% and a pixel size between 3 and 4.5 mm. Reconstruction was performed using the default DaTQUANT parameters of OSEM 2i10s, 3D low-pass post-filter with cut-off frequency 0.6 cycles/pixel and power 10, and no corrections were applied. After automatic registration to the standard striatal template, the striatal and occipital volumes of interest were adjusted manually (only if necessary to accommodate any slight variations in patient anatomy).”
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have provided a satisfactory response to the issues raised.