Dietary Sources of Glycine Betaine and Proline Betaine in Plant Foods and Their Potential Biological Relevance in Human Nutrition
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Please see below for a range of suggestions aimed at improving the clarity and scientific appropriateness of this manuscript.
- Line 37 - should "a-amino acids" be "α-amino acids"?
- Lines 79-81 - this logic does not follow. Why does the presence of ProBet in some foods make it a good study? This content needs to be updated for clarity or expanded to make sense to a wider audience.
- Line 161 - should read "a metabolomic dataset" or "metabolomic datasets".
- Methods 2.2 - please provide more detail on the process of agitation during sample mixing for clarity. This is also important for interpretation of data in consideration of whether your measured content relates to total betaine content or just release from each type of matrix (data in Table 1).
- Line 324-325 - absorption by Caco-2 cells does not model for permeation through the mucus layer. As this is highly negatively charged, this presumably affects bioaccessibility modelling and should be acknowledged in discussion somewhere.
- Line 361-366 - this discussion content has seemingly not been focused on food intake, which has to be the focus based on the journal you have submitted to. Has modelling considered the amount that would reach skin from oral intake? Please reconsider this section carefully based on the target audience.
- Results - how was dietary intake of betaines originally estimated? This appears to be crucial in your modelling but I don't think this information has been included.
- Results - "Rare metabolic disorders" data appear to come from n=1 in two different conditions. Authors need to consider the reliability and value of interpretations in these cases based on the available starting information. Narratives about putative pathways should be presented with appropriate scientific conservatism. There may be a similar issue for some other examples where original n-values of observations are limited. How has this approach limited the potential for phenomenology?
- Line 567 - missing space between "estimates" and "and".
Author Response
- Line 37 - should "a-amino acids" be "α-amino acids"?
Reply: We thank the reviewer for pointing this out. The typographical error has been corrected, and “a-amino acids” has been replaced with “α-amino acids” throughout the manuscript.
- Lines 79-81 - this logic does not follow. Why does the presence of ProBet in some foods make it a good study? This content needs to be updated for clarity or expanded to make sense to a wider audience.
Reply: We agree that the original wording did not sufficiently clarify the rationale for focusing on proline betaine. The introduction has been revised to explicitly state that ProBet is almost exclusively of dietary origin in humans, is a well-established biomarker of citrus intake, and provides a useful contrast to glycine betaine in terms of metabolic fate and biological interpretation. This clarification strengthens the nutritional relevance of studying the two betaines together (ll. 70-78).
- Line 161 - should read "a metabolomic dataset" or "metabolomic datasets".
Reply: We thank the reviewer for this comment. The wording has been corrected to “metabolomic datasets” to reflect that data were retrieved from multiple studies and sources (l. 159).
- Methods 2.2 - please provide more detail on the process of agitation during sample mixing for clarity. This is also important for interpretation of data in consideration of whether your measured content relates to total betaine content or just release from each type of matrix (data in Table 1).
Reply: We thank the reviewer for this important observation. We agree that additional clarification on the extraction procedure is necessary to support correct interpretation of the quantitative data. The manuscript has been revised to clarify the applied extraction protocol and to state that the reported values represent the extractable fraction of betaines from each food matrix under standardized conditions, rather than an exhaustive determination of total betaine content (ll. 192-195; 201-203).
- Line 324-325 - absorption by Caco-2 cells does not model for permeation through the mucus layer. As this is highly negatively charged, this presumably affects bioaccessibility modelling and should be acknowledged in discussion somewhere.
Reply: We thank the reviewer for this insightful comment. We agree that Caco-2-based permeability and in silico absorption estimates do not account for the mucus layer, which may influence the bioaccessibility of highly polar and zwitterionic compounds such as betaines. This limitation has now been explicitly acknowledged in the discussion (ll. 339-345).
- Line 361-366 - this discussion content has seemingly not been focused on food intake, which has to be the focus based on the journal you have submitted to. Has modelling considered the amount that would reach skin from oral intake? Please reconsider this section carefully based on the target audience.
Reply: Thank you for this important observation. We agree that the discussion should remain clearly focused on dietary exposure, in line with the scope of Foods. The manuscript has been revised to clarify that skin sensitization predictions are included solely as part of a qualitative safety-related ADMET comparison and are not intended to model dermal exposure following oral intake of betaine-containing foods. We now explicitly state that systemic-to-skin exposure after dietary consumption was not assessed and is expected to be negligible, thereby reinforcing the food- and nutrition-oriented interpretation of these results (ll. 356-366).
- Results - how was dietary intake of betaines originally estimated? This appears to be crucial in your modelling but I don't think this information has been included.
Reply: We thank the reviewer for highlighting this important point. We agree that clarification on how dietary betaine intake was estimated is essential for correct interpretation of the modelling results. The manuscript has been revised to explicitly state that dietary intake values were derived from published nutritional studies and metabolomic literature, using reported consumption patterns and intake estimates rather than being calculated directly from the food concentration data generated in this study. We further clarify that these intake values were used at an aggregated study level for exploratory modelling purposes, and that the approach is intended to provide qualitative insights rather than precise individual-level exposure estimates (ll. 468-478).
- Results - "Rare metabolic disorders" data appear to come from n=1 in two different conditions. Authors need to consider the reliability and value of interpretations in these cases based on the available starting information. Narratives about putative pathways should be presented with appropriate scientific conservatism. There may be a similar issue for some other examples where original n-values of observations are limited. How has this approach limited the potential for phenomenology?
Reply: We thank the reviewer for this important and thoughtful comment. We agree that some data included under the category of rare metabolic disorders are derived from very limited sample sizes, in some cases single-patient reports, and that this constrains the strength of any mechanistic interpretation. The manuscript has been revised to explicitly acknowledge these limitations and to frame these observations as illustrative and hypothesis-generating rather than as evidence of causal pathways. We also clarify that, more generally, the integration of heterogeneous metabolomic sources prioritizes descriptive phenomenology over mechanistic inference, and that limited original sample sizes restrict the depth of biological interpretation that can be drawn (ll. 419-426).
- Line 567 - missing space between "estimates" and "and".
Reply: Done. We thank the reviewer for noting this typographical error. The missing space has been corrected.
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript presents a comprehensive, multi-faceted investigation into the dietary distribution, predicted pharmacokinetic profiles, and potential biological roles of two betaines: glycine betaine (GlyBet) and proline betaine (ProBet). The authors effectively combine targeted LC-MS quantification in a representative panel of plant-based foods, in silico ADMET predictions using the pkCSM platform, similarity-based target prediction via SuperPred 3.0, and integration with human metabolomic data from the HMDB. The study convincingly demonstrates distinct dietary sources for the two betaines (leafy vegetables/cereals for GlyBet vs. citrus fruits for ProBet) and provides a comparative analysis of their predicted ADMET properties, highlighting both similarities and key differences such as solubility and clearance. The work successfully positions these compounds not only as robust biomarkers of specific dietary intakes but also generates testable hypotheses regarding their differential involvement in metabolic (GlyBet) versus signaling-related (ProBet) pathways. The methodology is generally sound, the data presentation is clear, and the discussion contextualizes the findings within the existing literature. The manuscript is suitable for publication pending minor revisions to address the points below.
Authors need to answer the following questions:
- The description of sample homogenization and extraction is clear. However, the centrifugation details appear duplicated between sections 2.2 and 2.3. Please streamline this to avoid repetition.
- Specify the open-source cheminformatics tool(s) used for descriptor calculation.
- The model is appropriately framed as exploratory and study-level. The equation is clear. Please confirm that all predictor variables were coded appropriately in the model building, as this is implied but not explicitly stated.
Author Response
- The description of sample homogenization and extraction is clear. However, the centrifugation details appear duplicated between sections 2.2 and 2.3. Please streamline this to avoid repetition.
Reply: We thank the reviewer for this observation. The description of centrifugation has been streamlined to avoid redundancy, and repeated details have been consolidated to improve clarity and readability of the Methods section (ll. 192-195; 201-203).
- Specify the open-source cheminformatics tool(s) used for descriptor calculation.
Reply: Thank you for noting the missing information. The manuscript has been revised to explicitly specify the open-source cheminformatics software used for descriptor calculation (ll. 237-238).
- The model is appropriately framed as exploratory and study-level. The equation is clear. Please confirm that all predictor variables were coded appropriately in the model building, as this is implied but not explicitly stated.
Reply: We thank the reviewer for this suggestion. The manuscript has been revised to explicitly state that all predictor variables included in the regression model were coded consistently and according to their categorical or continuous nature, prior to model fitting (ll. 268-277).
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript presents a well‑integrated study combining experimental quantification, computational prediction, and metabolomic analysis of two dietary betaines. The systematic approach and the effort to connect analytical chemistry with computational and clinical data are commendable, and the topic is of clear relevance to understanding betaine distribution and potential roles in human nutrition. However, several important issues need to be addressed.
- Although the manuscript notes the limitations of applying SuperPred to betaines, the extended discussion of predicted targets and the detailed mechanistic explanations risk suggesting direct interactions that are unlikely for highly polar metabolites lacking classical pharmacophoric features. The repeated presentation of these speculative predictions may lead readers to interpret them as validated mechanisms. The authors should consider significantly shortening these sections and presenting the content in a clearly marked section on speculative biological hypotheses that emphasizes their exploratory nature and directs readers to rely primarily on established experimental evidence.
- The regression model is built from study‑level averages taken from very different populations, methods, and metabolomic platforms, yet it is used to make specific quantitative claims about how dietary intake relates to plasma GlyBet. Because the confidence intervals are wide, some predictors are not significant, and no validation or residual diagnostics are reported, it's difficult to judge how stable or unbiased the model estimates truly are. The authors should consider evaluating the stability of the regression model through cross‑validation or bootstrap resampling, presenting residual diagnostics to demonstrate that model assumptions are met, and clearly describing the study characteristics and sample sizes that inform each coefficient.
- There is no clear description of how dietary GlyBet and ProBet intake values were derived for the cohorts used in the regression model, creating a disconnect between the quantified food concentrations in Table 1 and the mmol/day exposure estimates in Section 3.5. Without knowing the assumed portion sizes, consumption frequencies, or how urinary ProBet measurements were translated into estimated citrus intake, readers cannot evaluate the biological plausibility or accuracy of the dietary intake coefficients. The authors should consider providing a supplementary table detailing the dietary assumptions used to calculate intake for each study, explicitly reconciling the concentration units in Table 1 with the mmol/day values in the model, discussing potential measurement error and dietary misclassification, and examining whether alternative consumption patterns meaningfully affect the regression estimates.
- Section 3.4 and Figure 1 provide a broad survey of plasma GlyBet levels across health and disease, but the manuscript doesn't clearly distinguish whether GlyBet is being interpreted as a sensitive metabolic marker or as a mechanistic driver of pathology. In conditions such as NAFLD and NASH, it remains unclear whether reduced GlyBet reflects downstream consequences of hepatic dysfunction or whether GlyBet depletion itself contributes to disease progression, and the text shifts between these interpretations. The authors should consider separating the discussion of GlyBet as an observational biomarker from the discussion of GlyBet as a mechanistic contributor to disease, and provide a more rigorous appraisal of the betaine supplementation literature, including study design, effect size consistency, sample attrition, and outcome definitions.
- Although the manuscript notes that betaine levels vary substantially by species, agronomic conditions, ripeness, and post‑harvest handling, the sampling approach described in Section 2.1 appears too limited to capture this biological and agricultural variability. Each food item seems to have been sourced from only one general supplier without documentation of growing conditions, harvest timing, storage parameters, or seasonal context. The authors should consider acknowledging this limitation and its implications for estimating typical dietary exposure.
- The authors should consider expanding the introduction to more clearly justify the clinical relevance of betaine dysregulation in NAFLD, NASH, CVD, and neuropsychiatric conditions, and to explain why simultaneous investigation of GlyBet and ProBet is necessary and what specific knowledge gap this study aims to address.
- Although the LC‑MS analytical methods are well referenced, several essential components needed for reproducibility and interpretation are missing. The food sampling strategy is insufficiently described. The dietary intake calculations are not explained, making it difficult to understand how the mmol/day estimates were derived from the quantified food data. The regression model lacks validation through cross‑validation or sensitivity analyses, and no diagnostic checks are provided to assess whether statistical assumptions are met.
- Some conclusions regarding GlyBet’s role in disease regulation extend beyond what the presented data can support. The claim that betaines act as modulators of cellular physiology isn't directly demonstrated, as the evidence more clearly positions them as biomarkers and only potential modulators pending further experimental validation. The authors should express statements about the clinical relevance of betaines more cautiously to avoid suggesting stronger evidence than the current data justify.
Author Response
- Although the manuscript notes the limitations of applying SuperPred to betaines, the extended discussion of predicted targets and the detailed mechanistic explanations risk suggesting direct interactions that are unlikely for highly polar metabolites lacking classical pharmacophoric features. The repeated presentation of these speculative predictions may lead readers to interpret them as validated mechanisms. The authors should consider significantly shortening these sections and presenting the content in a clearly marked section on speculative biological hypotheses that emphasizes their exploratory nature and directs readers to rely primarily on established experimental evidence.
Reply: We thank the reviewer for this important and constructive comment. We fully agree that similarity-based target prediction tools such as SuperPred have intrinsic limitations when applied to highly polar endogenous metabolites, and that extended mechanistic interpretations may risk overstatement. The manuscript has been revised to substantially shorten the sections discussing predicted targets and to clearly frame these analyses as exploratory and hypothesis-generating. Specifically, sections 3.4, 3.5, 3.6, and 3.7 have been rewritten and shortened. We now explicitly distinguish established experimental evidence from speculative associations, and we emphasize that SuperPred outputs are intended to provide qualitative contextualization rather than to imply validated ligand–target interactions. Readers are directed to rely primarily on consolidated experimental literature for mechanistic interpretation.
- The regression model is built from study‑level averages taken from very different populations, methods, and metabolomic platforms, yet it is used to make specific quantitative claims about how dietary intake relates to plasma GlyBet. Because the confidence intervals are wide, some predictors are not significant, and no validation or residual diagnostics are reported, it's difficult to judge how stable or unbiased the model estimates truly are. The authors should consider evaluating the stability of the regression model through cross‑validation or bootstrap resampling, presenting residual diagnostics to demonstrate that model assumptions are met, and clearly describing the study characteristics and sample sizes that inform each coefficient.
Reply: We thank the reviewer for this detailed and thoughtful assessment. We fully agree that the regression model is based on heterogeneous, study-level aggregated data derived from different populations, analytical platforms, and study designs, which limits formal validation and stability assessment. The manuscript has been revised to explicitly clarify that the model is exploratory and descriptive in nature, and that its coefficients should not be interpreted as predictive or causal estimates. We now state that cross-validation, residual diagnostics, and robustness analyses are not applicable in a strict sense to aggregated literature-derived data, and we emphasize that the model is intended to provide qualitative insights into the relative contribution of dietary and clinical factors rather than unbiased effect size estimates (ll. 268-277 and ll. 497-516).
- There is no clear description of how dietary GlyBet and ProBet intake values were derived for the cohorts used in the regression model, creating a disconnect between the quantified food concentrations in Table 1 and the mmol/day exposure estimates in Section 3.5. Without knowing the assumed portion sizes, consumption frequencies, or how urinary ProBet measurements were translated into estimated citrus intake, readers cannot evaluate the biological plausibility or accuracy of the dietary intake coefficients. The authors should consider providing a supplementary table detailing the dietary assumptions used to calculate intake for each study, explicitly reconciling the concentration units in Table 1 with the mmol/day values in the model, discussing potential measurement error and dietary misclassification, and examining whether alternative consumption patterns meaningfully affect the regression estimates.
Reply: We thank the reviewer for this important clarification request. We agree that dietary intake estimation requires explicit description to avoid misinterpretation. The manuscript has been revised to clearly distinguish between food concentration data generated in this study (Table 1), which are provided to characterize dietary sources, and dietary intake estimates (mmol/day), which were derived from published nutritional and metabolomic studies rather than calculated directly from our analytical measurements. Portion sizes, consumption frequencies, and exposure estimates therefore reflect the assumptions and methodologies of the original studies. We now explicitly state that intake values were used at an aggregated study level for exploratory modelling purposes, and we acknowledge the potential for dietary misclassification and measurement error inherent to this approach (ll. 481-494; see also reply to Reviewer 1).
- Section 3.4 and Figure 1 provide a broad survey of plasma GlyBet levels across health and disease, but the manuscript doesn't clearly distinguish whether GlyBet is being interpreted as a sensitive metabolic marker or as a mechanistic driver of pathology. In conditions such as NAFLD and NASH, it remains unclear whether reduced GlyBet reflects downstream consequences of hepatic dysfunction or whether GlyBet depletion itself contributes to disease progression, and the text shifts between these interpretations. The authors should consider separating the discussion of GlyBet as an observational biomarker from the discussion of GlyBet as a mechanistic contributor to disease, and provide a more rigorous appraisal of the betaine supplementation literature, including study design, effect size consistency, sample attrition, and outcome definitions.
Reply: We agree with the reviewer that distinguishing between biomarker interpretation and mechanistic causality is essential. In the revised manuscript, we explicitly adopt the interpretation that glycine betaine does not play a primary causal role in disease pathogenesis. We now state that variations in plasma GlyBet levels predominantly reflect underlying metabolic, hepatic, and renal dysfunction, and should be interpreted as downstream biomarkers rather than disease drivers. Any discussion of potential biological effects is framed as indirect and context-dependent, based largely on experimental models, and not as evidence of causal involvement in human disease (ll. 590-599).
- Although the manuscript notes that betaine levels vary substantially by species, agronomic conditions, ripeness, and post‑harvest handling, the sampling approach described in Section 2.1 appears too limited to capture this biological and agricultural variability. Each food item seems to have been sourced from only one general supplier without documentation of growing conditions, harvest timing, storage parameters, or seasonal context. The authors should consider acknowledging this limitation and its implications for estimating typical dietary exposure.
Reply: We thank the Reviewer for this thoughtful comment. We agree that betaine content in foods is influenced by multiple biological and agronomic factors, including species, growing conditions, ripeness, and post-harvest handling. As correctly noted, the food sampling described in Section 2.1 was not designed to capture the full extent of this variability. The primary aim of this analysis was to provide representative reference values for GlyBet and ProBet content across major food categories, rather than to estimate population-level dietary exposure. Food items were therefore sourced from standard commercial suppliers to reflect commonly available products, without detailed documentation of agronomic or seasonal parameters. We have now explicitly acknowledged this limitation in the revised manuscript and clarified that the reported values should be interpreted as indicative rather than exhaustive, particularly with respect to variability related to agricultural and post-harvest factors (ll. 324-330).
- The authors should consider expanding the introduction to more clearly justify the clinical relevance of betaine dysregulation in NAFLD, NASH, CVD, and neuropsychiatric conditions, and to explain why simultaneous investigation of GlyBet and ProBet is necessary and what specific knowledge gap this study aims to address.
Reply: This is a helpful suggestion. We agree that the introduction can more clearly articulate the clinical relevance of betaine dysregulation and the rationale for jointly investigating glycine betaine and proline betaine. The Introduction has been expanded to better contextualize the involvement of betaine metabolism in NAFLD/NASH, cardiovascular disease, and neuropsychiatric conditions, and to explicitly define the knowledge gap addressed by this study. We now clarify that, despite extensive literature on individual betaines, a comparative and integrated evaluation of GlyBet and ProBet as dietary-derived biomarkers (linking food sources, pharmacokinetic behavior, and human metabolomic data) has been lacking. This revision strengthens the justification for the study and its relevance to nutrition-focused research (ll. 139-160).
- Although the LC‑MS analytical methods are well referenced, several essential components needed for reproducibility and interpretation are missing. The food sampling strategy is insufficiently described. The dietary intake calculations are not explained, making it difficult to understand how the mmol/day estimates were derived from the quantified food data. The regression model lacks validation through cross‑validation or sensitivity analyses, and no diagnostic checks are provided to assess whether statistical assumptions are met.
Reply: We thank the reviewer for this comprehensive comment addressing reproducibility and interpretability. Each of the raised points has been addressed in the revised manuscript (see also replies to previous comments). First, the food sampling strategy has been clarified to specify that samples were selected to represent commonly consumed food categories and were analyzed to characterize relative betaine distribution across dietary sources rather than to capture agricultural or seasonal variability (ll. 324-326). Second, dietary intake estimates have been explicitly described as being derived from published nutritional and metabolomic studies, and not calculated from the food concentration data generated in this work (ll. 480-494; see also reply to Reviewer 1). We now clearly distinguish between analytical food composition data, provided for dietary contextualization, and intake estimates (mmol/day), which were used at an aggregated study level for exploratory analysis. Third, with respect to the regression model, we now explicitly state that formal validation procedures such as cross-validation, sensitivity analyses, or residual diagnostics are not applicable in a strict statistical sense to literature-derived, aggregated data. The model is therefore framed as descriptive and hypothesis-generating, intended to provide qualitative insight into patterns of association rather than validated predictive inference. These clarifications have been incorporated to improve transparency, reproducibility of interpretation, and alignment with the exploratory scope of the study (ll. 268-277). Furthermore, the abstract has been completely rewritten to reflect all the revisions integrated into the manuscript.
- Some conclusions regarding GlyBet’s role in disease regulation extend beyond what the presented data can support. The claim that betaines act as modulators of cellular physiology isn't directly demonstrated, as the evidence more clearly positions them as biomarkers and only potential modulators pending further experimental validation. The authors should express statements about the clinical relevance of betaines more cautiously to avoid suggesting stronger evidence than the current data justify.
Reply: We thank the reviewer for this important clarification. We fully agree that the evidence presented more strongly supports betaines as metabolic and nutritional biomarkers rather than as demonstrated modulators of cellular physiology or disease regulation. In the new version of the manuscript, all claims about betaines that go beyond what the presented data can support have been deleted. The text and, specifically, the Conclusions, have been revised to avoid overstatement and to clearly state that glycine betaine and proline betaine are best interpreted as biomarkers, while any potential modulatory effects remain indirect, context-dependent, and pending further experimental validation. Claims regarding clinical relevance have been deliberately expressed with caution to reflect the strength and limitations of the available data. The abstract has been completely rewritten to reflect all the revisions integrated into the manuscript.
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript has improved considerably and the authors have thoughtfully addressed all of my major concerns. I believe the work is now suitable for publication.

