A Metabolomics-Based Approach for Diagnosing NAFLD and Identifying Its Pre-Condition Along the Potential Disease Spectrum
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this study, researchers developed a diagnostic model for non-alcoholic fatty liver disease (NAFLD) using metabolomics. The model, which used machine learning, showed good diagnostic performance. It worked even for people with normal body mass index (BMI) and could identify "pre-NAFLD" states. The text is clear and accurate, but certain parts need more detail for consistency and better context.
1. Line 35: Define “LASSO” fully within the keywords section for accessibility to readers unfamiliar with this term.
2. Line 38: Rewrite “as part of the obesity pandemic” to make it clearer. Show that NAFLD is common even without obesity.
3. Line 46: Add a sentence to explain why current diagnostic methods are not good enough. For example, mention problems with sensitivity or cost.
4. Line 59: Explain if the study's goal (making a detection model for NAFLD) is to create a standard model for clinical use.
5. Line 69: Explain why 6587 individuals were randomly selected and how it was done.
6. Line 72: Explain why people who drink alcohol or have diabetes were not included, and if this affects the study.
7. Line 88: Mention that using ultrasound for fatty liver diagnosis is common. Add why this method could have problems, like differences between observers or machine issues.
8. Line 95: Explain why “lean NAFLD” is defined as having a BMI less than 23. Say where this cutoff comes from or why it was used.
9. Line 99: Mention that storing serum samples after tests keeps them in good condition. Add details like storage temperature and how long the samples were stored to support reliability.
10. Line 119: Give more details about how LASSO was used. Add the R package name, its version, and the R software version. For example: "using the 'glmnet' package (version X.X.X) in R version X.X.X."
11. Line 123: Fix how the text mentions “MetaboAnalyst 5.0” for MSEA. Make sure the version number is correct and formatted consistently. Confirm that reference 32 matches this version (5.0). If needed, update the citation and include full URLs without line breaks.
12. Line 144: The correlations with clinical markers help provide context. Add confidence intervals (e.g., R=0.3–0.4) in a table to improve understanding.
13. Line 163: Table 1 includes detailed statistics. Add one sentence to summarize the most important findings about metabolites. Explain why certain metabolites were chosen for analysis to help justify their inclusion.
14. Line 169: Include the formula for the diagnostic score. This makes it clear and useful for other researchers in the future.
15. Line 197: The LASSO model uses 70 metabolites and refers to Supplementary Table S3. Add details about the final model’s parameters, like the penalty parameter (lambda). Say how this value was chosen (e.g., cross-validation). This makes the process clear and possible to repeat. Include a statement about whether the LASSO model code is available, so others can reproduce the study.
16. Line 250: The term “pre-NAFLD” or “invisible NAFLD” is interesting. Add a clearer definition and explain the diagnostic process for this category.
17. Line 263: When saying the LASSO score can detect metabolic changes before NAFLD appears, give examples from the data. For instance, name a key metabolite or marker to support the claim.
18. Line 345: Add, “This ensures that the study was done according to ethical principles that protect patients’ rights and well-being.”
19. Line 349: Be specific about data-sharing steps. For example: “The anonymized datasets and code can be obtained by contacting the main author. A formal request and signed agreement are required.”
Author Response
Reviewer 1. Thank you very much for your detailed review and valuable advice. The logical consistency of the paper has been much improved as a result. Below are my responses.
- Line 35: Define “LASSO” fully within the keywords section for accessibility to readers unfamiliar with this term.
-> We have revised the description in line with the reviewer’s comment.
- Line 38: Rewrite “as part of the obesity pandemic” to make it clearer. Show that NAFLD is common even without obesity.
-> Thank you for your helpful suggestion. We have revised the description in line with the reviewer’s comment: “Over the last few decades, the prevalence of NAFLD has increased alongside the global rise in obesity. NAFLD is common not only in obese individuals but also frequently found in those of normal weight, known as lean NAFLD”.
- Line 46: Add a sentence to explain why current diagnostic methods are not good enough. For example, mention problems with sensitivity or cost.
-> We have revised the description in line with the reviewer’s comment: “Although diagnostic imaging modalities (e.g., ultrasound), liver biochemical tests, and liver biopsies are standard practices for diagnosing NAFLD/NASH, they have several limitations, which are widely acknowledged by the medical community.14-17 These include inter-operator variability in ultrasound, low sensitivity for detecting hepatic steatosis in ultrasound and liver biochemical tests, and the invasive nature of procedures such as liver biopsies.”
- Line 59: Explain if the study's goal (making a detection model for NAFLD) is to create a standard model for clinical use.
-> We have explored that aspect, so we revised the text as follows: “In this study, we investigated the potential of metabolomics to develop a comprehensive diagnostic model for NAFLD that could serve as a standard for clinical use.”
- Line 69: Explain why 6587 individuals were randomly selected and how it was done.
-> This study was planned as part of a case-cohort study and conducted as a cross-sectional study using a sub-cohort selected through random sampling. The sub-cohort should be representative of the population and is used for costly biological measurements, among other purposes. The sample size was set based on considerations of feasibility and cost. I have added an explanation of this to the Materials and Methods section.
- Line 72: Explain why people who drink alcohol or have diabetes were not included, and if this affects the study.
-> We have added the following text: “The reason for excluding individuals who consume more than a certain amount of alcohol is that excessive alcohol intake is one of the diagnostic criteria for NAFLD. The purpose of excluding individuals with diabetes and other diseases is to exclude those who have developed fatty liver due to the effects of their conditions, thereby approximating a general healthy population more closely.”
- Line 88: Mention that using ultrasound for fatty liver diagnosis is common. Add why this method could have problems, like differences between observers or machine issues.
-> We have revised the description in line with the reviewer’s comment: “Each participant also routinely underwent an abdominal ultrasound (Xario XG SSA-680A, Aplio400 TUS-A400, or Aplio500 TUS-A500 instrument [Canon Medical Systems, Tochigi, Japan]) examination, which is the standard method for NAFLD screening.” The limitations of ultrasound raised in your comment are addressed in the text added to the Introduction in response to comment 3.
- Line 95: Explain why “lean NAFLD” is defined as having a BMI less than 23. Say where this cutoff comes from or why it was used.
-> This is a well-known cutoff, used by Younes in the valuable review "NASH in Lean Individuals" and also proposed by the WHO expert consultation. The cutoff point of a BMI of 23 has been shown to be an important threshold for Asians. I have added this point to the main text: “Although the definition of obesity varies, including adjustments based on ethnicity,25 we considered participants with a body mass index (BMI) <23 kg/m2 and NAFLD to have ’lean NAFLD’, in accordance with the definition adopted by Younes et al.26 Additionally, according to the WHO expert consultation, BMI of 23 kg/m2 was identified as potential public health action point for Asian population.”
- Line 99: Mention that storing serum samples after tests keeps them in good condition. Add details like storage temperature and how long the samples were stored to support reliability.
-> The details you pointed out are described in the Supplementary Methods section, under sample collection and serum collection and preparation.
- Line 119: Give more details about how LASSO was used. Add the R package name, its version, and the R software version. For example: "using the 'glmnet' package (version X.X.X) in R version X.X.X."
-> We have added the version information as follows: “To develop a logistic regression-based diagnostic model, least absolute shrinkage and selection operator (LASSO) was applied using the ‘glmnet’ package (version 2.0-16) in R 3.3.4 (R Foundation for Statistical Computing, Vienna, Austria).”
- Line 123: Fix how the text mentions “MetaboAnalyst 5.0” for MSEA. Make sure the version number is correct and formatted consistently. Confirm that reference 32 matches this version (5.0). If needed, update the citation and include full URLs without line breaks.
-> Since MetaboAnalyst has been updated to version 6.0, I re-ran the analysis with that version and updated the information accordingly.
- Line 144: The correlations with clinical markers help provide context. Add confidence intervals (e.g., R=0.3–0.4) in a table to improve understanding.
-> We have added confidence intervals to each correlation coefficient.
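As an illustration of what such an interval involves, the following is a minimal sketch of a confidence interval for a correlation coefficient using Fisher's z-transformation. It assumes a Pearson correlation and a large sample; the response does not state which method the authors used, and the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation r observed on n samples.

    Uses Fisher's z-transformation (an assumption; not necessarily the
    authors' method).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                    # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Back-transform the interval limits to the correlation scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

For example, `pearson_ci(0.35, 1000)` gives an interval of roughly 0.29 to 0.40, which is the kind of range the reviewer's "R=0.3–0.4" example suggests.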
- Line 163: Table 1 includes detailed statistics. Add one sentence to summarize the most important findings about metabolites. Explain why certain metabolites were chosen for analysis to help justify their inclusion.
-> Here, we would like to demonstrate that although each metabolite shows highly significant differences between the presence and absence of NAFLD, they are insufficient alone for diagnostic use. Therefore, I have added the following sentence: “Even though the top 10 metabolites demonstrated highly significant differences, each used individually showed an AUC of 0.6-0.75, which is insufficient for clinical use.”
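The single-metabolite AUC mentioned here can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen NAFLD case has a higher value than a randomly chosen non-NAFLD participant, counting ties as one half. The function name and the toy values are hypothetical and are not the authors' data or code.

```python
def auc(values_cases, values_controls):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    if not values_cases or not values_controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in values_cases:
        for d in values_controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(values_cases) * len(values_controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so values of 0.6 to 0.75 for individual metabolites indicate only modest discrimination.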
- Line 169: Include the formula for the diagnostic score. This makes it clear and useful for other researchers in the future.
-> We have added the details to Supplementary Table S3.
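For orientation only, a LASSO-penalised logistic regression generally yields a score of the following form. The symbols are assumptions introduced here; the fitted coefficients are those the authors report in Supplementary Table S3 and are not reproduced. Whether the reported "LASSO score" is the linear predictor $s$ or the probability derived from it is not stated in this response.

```latex
\[
  s(\mathbf{x}) = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j x_j ,
  \qquad
  \hat{P}(\mathrm{NAFLD} \mid \mathbf{x}) = \frac{1}{1 + e^{-s(\mathbf{x})}}
\]
```

Here $x_j$ is the (typically standardised) level of metabolite $j$, $\hat\beta_j$ its fitted coefficient, and $p$ the number of metabolites retained by the model (70 according to the reviewer's comment on Line 197).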
- Line 197: The LASSO model uses 70 metabolites and refers to Supplementary Table S3. Add details about the final model’s parameters, like the penalty parameter (lambda). Say how this value was chosen (e.g., cross-validation). This makes the process clear and possible to repeat. Include a statement about whether the LASSO model code is available, so others can reproduce the study.
-> Thank you for your helpful comment. This information was originally included in the Supplementary Methods, but we have added details about lambda and moved it to the main text.
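To make the role of lambda concrete, the following is a minimal, self-contained sketch of L1-penalised logistic regression with lambda chosen by k-fold cross-validation. It is not the authors' code: they used the glmnet package in R, whereas this sketch uses plain proximal gradient descent on synthetic data, and every function name, the candidate lambda values, and the data are assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def predict(row, b0, beta):
    return sigmoid(b0 + sum(b * x for b, x in zip(beta, row)))


def fit_l1_logistic(X, y, lam, step=0.5, iters=200):
    """Minimise mean log-loss + lam * sum(|beta_j|); the intercept is unpenalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [predict(row, b0, beta) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding (the L1 proximal step)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def mean_log_loss(X, y, b0, beta):
    eps = 1e-12
    total = 0.0
    for row, yi in zip(X, y):
        pr = min(max(predict(row, b0, beta), eps), 1.0 - eps)
        total -= yi * math.log(pr) + (1 - yi) * math.log(1.0 - pr)
    return total / len(X)


def choose_lambda_by_cv(X, y, lambdas, k=4):
    """Return the candidate lambda with the lowest k-fold held-out log-loss."""
    folds = [set(range(i, len(X), k)) for i in range(k)]
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        loss = 0.0
        for held in folds:
            train = [i for i in range(len(X)) if i not in held]
            b0, beta = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam)
            test = sorted(held)
            loss += mean_log_loss([X[i] for i in test], [y[i] for i in test], b0, beta)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam


def make_synthetic_data(n=120, p=5, seed=0):
    """Toy standardised features; only the first two influence the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2 * r[0] - 2 * r[1]) else 0 for r in X]
    return X, y
```

A call such as `choose_lambda_by_cv(X, y, [0.01, 0.1, 10.0])` returns one of the candidate values. A very large lambda shrinks every coefficient to exactly zero, while a small one keeps the informative features, which is why the chosen value and the selection procedure matter for reproducibility.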
- Line 250: The term “pre-NAFLD” or “invisible NAFLD” is interesting. Add a clearer definition and explain the diagnostic process for this category.
-> We have revised and expanded the descriptions in line with the reviewer’s comment: “This suggests that the LASSO score can detect a ‘pre-NAFLD’ or ‘invisible NAFLD (metabolic change cannot be captured by diagnostic imaging)’ condition, indicative of an emerging metabolic syndrome, regardless of BMI. We therefore determined non-NAFLD individuals with the LASSO score above cutoff as ‘pre-NAFLD’.”
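The rule quoted above can be sketched as a small decision function. This is an illustration under stated assumptions: the cutoff value is not given in this response, so it is passed in as a parameter, and the function name, argument names, and the treatment of a score exactly equal to the cutoff are hypothetical.

```python
def categorize(nafld_on_ultrasound, lasso_score, cutoff):
    """Assign a participant to NAFLD, pre-NAFLD, or non-NAFLD.

    NAFLD is taken from the ultrasound diagnosis; among ultrasound-negative
    participants, a score above the cutoff is labelled pre-NAFLD.
    """
    if nafld_on_ultrasound:
        return "NAFLD"
    # Assumption: "above cutoff" is read as strictly greater than the cutoff
    if lasso_score > cutoff:
        return "pre-NAFLD"
    return "non-NAFLD"
```

With a hypothetical cutoff of 0.5, `categorize(False, 0.7, 0.5)` returns `"pre-NAFLD"`, whereas the same score in an ultrasound-positive participant is simply `"NAFLD"`.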
- Line 263: When saying the LASSO score can detect metabolic changes before NAFLD appears, give examples from the data. For instance, name a key metabolite or marker to support the claim.
-> We have added the following text: “In particular, glutamic acid, the most influencing component of the LASSO score exhibited a significant difference of 1 standard deviation between the pre-NAFLD and non-NAFLD groups.”
- Line 345: Add, “This ensures that the study was done according to ethical principles that protect patients’ rights and well-being.”
-> We have revised the description in line with the reviewer’s comment.
- Line 349: Be specific about data-sharing steps. For example: “The anonymized datasets and code can be obtained by contacting the main author. A formal request and signed agreement are required.”
-> We have revised the description in line with the reviewer’s comment.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript by Nojima & Kimura et al. attempts to develop a metabolomics-based diagnostic model for nonalcoholic fatty liver disease (NAFLD). It proposes "pre-NAFLD" as a novel diagnostic group. While metabolomics for disease diagnostics is well-established, applying this to identify "pre-NAFLD" is an interesting perspective on early detection. Combining machine learning (LASSO regression) with metabolomics data to address lean NAFLD is also interesting, as this subgroup is often underrepresented in studies. However, the following concerns need to be addressed:
- The study's reliance on BMI alone for defining lean versus non-lean individuals is a potential weakness. Other aspects like waist-to-hip ratio or visceral fat quantification may provide a more robust stratification. Have the authors recorded these metrics? If not, it should be included as a discussion point.
- The authors should consider validating ultrasound findings with a subset of participants undergoing MR spectroscopy or biopsy for a more robust comparison.
- Some claims about "pre-NAFLD" seem speculative, without direct evidence of progression from pre-NAFLD to clinical NAFLD. The authors should clarify and discuss this issue in the discussion section.
- Can the authors clarify whether "pre-NAFLD" participants were screened for other metabolic conditions that might affect the metabolomic profiles, like subclinical insulin resistance or dyslipidemia?
- The diagnostic accuracy of the LASSO model compared to current non-invasive tools (e.g., FibroScan or FLI) is not clearly discussed. The authors should add a discussion comparing the LASSO model to existing diagnostic tools for better contextualization.
- Fig.4 contains essential information about the clinical significance of the LASSO scores. However, the information and the overall presentation feel overwhelming. The authors could consider splitting the figure; this division would allow each sub-figure to be more focused and less crowded.
Figure 4A: Focus on the regression analyses and metabolic indicator stratifications.
Figure 4B: Showcase the conceptual illustration of the "NAFLD/metabolic syndrome spectrum."
Relocating detailed explanations (e.g., definitions of categories or findings) from the figure to the legend should improve clarity, as the figure should primarily convey visual data, while the legend should provide context. To focus the audience's attention, consider adding bold or enlarged labels to emphasize the diagnostic potential of the LASSO score or the progression from "pre-NAFLD" to NAFLD.
Minor concerns: Typo on Page 11, Lines 280-281: "Deep red, NAFLD with LASSO score < Cutoff (NAFLD-high)."
Author Response
Reviewer 2. Thank you for your insightful advice, which adds value to our paper. Below are my responses.
- The study's reliance on BMI alone for defining lean versus non-lean individuals is a potential weakness. Other aspects like waist-to-hip ratio or visceral fat quantification may provide a more robust stratification. Have the authors recorded these metrics? If not, it should be included as a discussion point.
-> Thank you for your valuable comments. This is a widely recognized cutoff, also used by Younes in the valuable review titled "NASH in Lean Individuals," and endorsed by the WHO's proposal that a BMI of 23 is an important cutoff point for Asians. I have added this information to the main text: “Although the definition of obesity varies, including adjustments based on ethnicity,25 we considered participants with a body mass index (BMI) <23 kg/m2 and NAFLD to have ’lean NAFLD’, in accordance with the definition adopted by Younes et al.26 Additionally, according to the WHO expert consultation, BMI of 23 kg/m2 was identified as potential public health action point for Asian population.” The waist-to-hip ratio is an important indicator of obesity, but measurements of the hips and quantified visceral fat values have not been taken. This point will be noted as a limitation: “Furthermore, more sensitive indicators of visceral fat obesity such as the waist-to-hip ratio and quantification of visceral fat were not assessed. These measures, potentially more relevant than BMI for metabolic syndrome and NAFLD onset, could serve as more appropriate stratification indicators and may also show a stronger correlation with the LASSO score. However, a BMI of 23 is still widely used as a useful benchmark for Asians and is therefore considered a reasonable standard.25,26”
- The authors should consider validating ultrasound findings with a subset of participants undergoing MR spectroscopy or biopsy for a more robust comparison.
-> As you pointed out, MR spectroscopy and liver biopsy are reliable methods for diagnosing hepatic steatosis. However, the objective of this research is to establish a practical measure for assessing hepatic steatosis in the general population, and it is very difficult to offer these examinations to individuals attending a health check-up. These two procedures are therefore not suitable for this purpose because of their cost and invasiveness.
- Some claims about "pre-NAFLD" seem speculative, without direct evidence of progression from pre-NAFLD to clinical NAFLD. The authors should clarify and discuss this issue in the discussion section.
-> We have added this sentence to the discussion: “Since this study did not directly examine whether individuals categorized as 'pre-NAFLD' are more susceptible to developing NAFLD, whether factors such as obesity, aging and changes in lifestyle have a stronger impact on the pre-NAFLD population should be verified in future prospective studies.”
- Can the authors clarify whether "pre-NAFLD" participants were screened for other metabolic conditions that might affect the metabolomic profiles, like subclinical insulin resistance or dyslipidemia?
-> Since this study was conducted within the framework of a health checkup examination, clinical tests as shown in Figure 4 have been carried out, but they are limited and no further endocrine testing has been conducted. I hope this clarifies the situation.
- The diagnostic accuracy of the LASSO model compared to current non-invasive tools (e.g., FibroScan or FLI) is not clearly discussed. The authors should add a discussion comparing the LASSO model to existing diagnostic tools for better contextualization.
-> Because FLI (Fatty Liver Index) could be calculated from the measurement items, we added a figure for comparison (Figure 3C) and discussed this in the discussion section: “While the fatty liver index (FLI) also demonstrated good diagnostic ability in lower and middle BMI category, it was a bit weaker in higher BMI category.”, “Remarkably, the simple fatty liver index (FLI), which includes triglycerides, γ-GTP, BMI, and abdominal circumference, also demonstrated a good association with lean NAFLD (Figure 3F). It is apparent that the LASSO score demonstrates high diagnostic ability across all BMI categories, possibly due to its lower dependency on obesity-related indicators such as BMI, compared to the FLI.”
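For readers unfamiliar with the comparator, the following is a minimal sketch of the fatty liver index using the commonly cited coefficients of the original FLI publication (Bedogni et al., 2006). It assumes triglycerides in mg/dL, γ-GTP (GGT) in U/L, and waist circumference in cm; the function name is hypothetical, and the response does not show how the authors computed the FLI.

```python
import math


def fatty_liver_index(triglycerides_mg_dl, bmi, ggt_u_l, waist_cm):
    """Fatty liver index on a 0-100 scale (coefficients as commonly cited)."""
    if triglycerides_mg_dl <= 0 or ggt_u_l <= 0:
        raise ValueError("triglycerides and GGT must be positive")
    linear = (0.953 * math.log(triglycerides_mg_dl)
              + 0.139 * bmi
              + 0.718 * math.log(ggt_u_l)
              + 0.053 * waist_cm
              - 15.745)
    # Logistic transform of the linear predictor, scaled to 0-100
    return 100.0 / (1.0 + math.exp(-linear))
```

Because BMI and waist circumference enter the formula directly, the index is tied to obesity-related measurements by construction, which is consistent with the authors' remark that the LASSO score depends less on such indicators.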
- Fig.4 contains essential information about the clinical significance of the LASSO scores. However, the information and the overall presentation feel overwhelming. The authors could consider splitting the figure; this division would allow each sub-figure to be more focused and less crowded.
Relocating detailed explanations (e.g., definitions of categories or findings) from the figure to the legend should improve clarity, as the figure should primarily convey visual data, while the legend should provide context. To focus the audience's attention, consider adding bold or enlarged labels to emphasize the diagnostic potential of the LASSO score or the progression from "pre-NAFLD" to NAFLD.
-> Thank you for your valuable advice. We split Figure 4 into two parts, with the illustration originally in Figure 4C now standing alone as Figure 5. The descriptions from Figure 4A have been moved to the legend, and we have graphed the standardized coefficients and R-squared change rate to create Figure 4B. Additionally, we have made enhancements such as emphasizing certain text.