Development of a Non-Invasive Clinical Machine Learning System for Arterial Pulse Wave Velocity Estimation

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript "Development of a Non-Invasive Clinical Machine Learning System for Arterial Pulse Wave Velocity Estimation" presents the development and validation of different machine learning models to estimate arterial pulse wave velocity (aPWV) using common clinical markers routinely collected during standard medical examinations. The manuscript is well written, clearly structured, and presents a transparent and reproducible methodology aligned with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) initiative. The study aligns well with the aims of Applied Sciences, but a few minor revisions are suggested to improve clarity and highlight the manuscript's unique contribution:
1) Please correct the bold formatting issue on line 82.
2) In the sentence "To the best of our knowledge, no studies have aimed to estimate PWV without the analysis of complex signals, using only routine clinical variables that can be easily measured during any medical visit in primary care," the manuscript attempts to state its originality. However, the contribution remains unclear. Please revise this sentence to better emphasize how your approach differs from the existing studies mentioned in the introduction.
3) Based on Table 2, the feature importance analysis using permutation-based ΔMSE with Random Forest is methodologically sound and widely accepted. However, as an optional enhancement, the authors may consider complementing this with SHAP values or traditional feature importance scores to improve interpretability, particularly if individual-level insights are important for the study’s application.
4) In the conclusion (line 382), the phrase “The best performed model showed excellent prediction accuracy…” is too vague. Please specify which model performed best.
Comments on the Quality of English Language
Although the manuscript is generally well written, a thorough revision of the English language throughout the text is recommended. For example, in Section 2, line 100, there is a subject-verb agreement error: 'Data from three distinct datasets are collected and preprocessed through imputation and normalization to ensure uniformity.' The correct form should be 'Data from three distinct datasets is collected...' since 'data' is considered a singular mass noun in this context.
Author Response
See attached file "reviewer 1.pdf" for detailed response.
RESPONSE TO THE REVIEWER #1
We would like to thank the reviewer for his/her comments and suggestions that have allowed us to improve the manuscript. Below, we hope to give a precise reply to all the reported concerns.
In the interest of clarity, the reviewer's comments are replicated in full below. After each comment, the authors' response is included in a square box (see attached file). Where needed, the added or modified text has been indicated in blue throughout the manuscript for easy reading.
Reviewer’s comments:
Please correct the bold formatting issue on line 82.
Done.
In the sentence "To the best of our knowledge, no studies have aimed to estimate PWV without the analysis of complex signals, using only routine clinical variables that can be easily measured during any medical visit in primary care," the manuscript attempts to state its originality. However, the contribution remains unclear. Please revise this sentence to better emphasize how your approach differs from the existing studies mentioned in the introduction.
We thank the reviewer for this valuable observation.
In the revised version of the manuscript, we have restructured and rewritten the final paragraph of the Introduction to better emphasize the originality of our approach and clearly state the study hypothesis. We now clarify that previous studies on aPWV or cfPWV estimation typically rely on the analysis of biosignals or waveform-derived features, which require either complex signal processing pipelines or specialized equipment to acquire data. In contrast, our hypothesis proposes that aPWV could be reliably estimated using only routinely collected clinical variables, such as blood pressure, anthropometric measures or metabolic markers, commonly available in any general medical consultation.
Based on Table 2, the feature importance analysis using permutation-based ΔMSE with Random Forest is methodologically sound and widely accepted. However, as an optional enhancement, the authors may consider complementing this with SHAP values or traditional feature importance scores to improve interpretability, particularly if individual-level insights are important for the study’s application.
We sincerely thank the reviewer for this thoughtful suggestion.
Following this recommendation, we have included a complementary SHAP (SHapley Additive exPlanations) analysis in the revised manuscript to improve interpretability and gain further insight into the model's behavior at the individual level. The resulting SHAP summary plot (see attached file) is consistent with the findings previously obtained through the Random Forest permutation-based ΔMSE method.
This figure ranks the variables by their overall contribution to the model’s predictions, with the importance of each variable determined by the magnitude of its SHAP values across all samples, regardless of sign. The SHAP analysis confirms that Age and Systolic Blood Pressure (SBP) are by far the most influential features, followed by DBP, BMI, and Waist Circumference. In contrast, features such as HbA1c, AGEs, Resting Heart Rate (RHR), Sex, and Smoking Status exhibit minimal to negligible contribution, which reinforces their exclusion from the final reduced model.
These new results are consistent with the feature importance rankings obtained through permutation-based analysis using Random Forest, thereby strengthening the robustness of our feature selection process and supporting the reliability of the proposed model for estimating aPWV from clinically accessible variables.
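For illustration only, the following minimal sketch shows how a SHAP analysis of this kind can be computed for a Random Forest regressor; the file name and column names are hypothetical assumptions, not the study's actual code.

```python
# Minimal sketch of a complementary SHAP analysis (illustrative names/data).
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# X: clinical features (e.g., Age, SBP, DBP, BMI, ...); y: measured aPWV.
X = pd.read_csv("evascu_features.csv")  # hypothetical file name
y = X.pop("aPWV")

model = RandomForestRegressor(n_estimators=500, random_state=42).fit(X, y)

# TreeExplainer is the standard choice for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global ranking: mean |SHAP| per feature across all samples, regardless of sign.
importance = pd.Series(abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))

shap.summary_plot(shap_values, X)  # beeswarm summary plot, as described above
```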
Corresponding comments and interpretations have been added to the Methodology and Results sections of the revised manuscript (see Section 2.4, last paragraph, and Section 3.1).
In the conclusion (line 382), the phrase “The best performed model showed excellent prediction accuracy…” is too vague. Please specify which model performed best.
We thank the reviewer for the suggestion. The sentence in the conclusion has now been revised to specify that "among the models tested, Polynomial Regression achieved the highest prediction accuracy …"
Although the manuscript is generally well written, a thorough revision of the English language throughout the text is recommended. For example, in Section 2, line 100, there is a subject-verb agreement error: 'Data from three distinct datasets are collected and preprocessed through imputation and normalization to ensure uniformity.' The correct form should be 'Data from three distinct datasets is collected...' since 'data' is considered a singular mass noun in this context.
We thank the reviewer for pointing out this language issue. The specific grammatical error mentioned has been corrected. In addition, we have conducted a thorough revision of the English language throughout the manuscript to improve clarity, consistency, and grammatical accuracy.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors, find all the remarks in the inserted text boxes in reference to the highlighted text.
Basically, you have to clearly outline the limitations of your study, as you used a reference method with just an "acceptable" but not "excellent" level of accuracy. Also, you trained your models with aPWV data mostly smaller than 10 m/s (as expected in healthy patients), but your final aim is to correctly predict aPWV in cardiovascular impaired/diseased patients. Therefore, the accuracy of your models must be verified with more data obtained in such a cohort.
A few spelling errors also need to be corrected.
Comments for author File: Comments.pdf
Author Response
See attached file "Reviewer 2.pdf" for a detailed response.
RESPONSE TO THE REVIEWER #2
We would like to thank the reviewer for his/her comments and suggestions that have allowed us to improve the manuscript. Below, we hope to give a precise reply to all the reported concerns.
In the interest of clarity, the reviewer's comments are replicated in full below. After each comment, the authors' response is included in a square box (see attached file). Where needed, the added or modified text has been indicated in blue throughout the manuscript for easy reading.
Reviewer’s comments:
Overall Comment:
Basically, you have to clearly outline the limitations of your study, as you used a reference method with just an "acceptable" but not "excellent" level of accuracy. Also, you trained your models with aPWV data mostly smaller than 10 m/s (as expected in healthy patients), but your final aim is to correctly predict aPWV in cardiovascular impaired/diseased patients. Therefore, the accuracy of your models must be verified with more data obtained in such a cohort.
We would like to thank the reviewer for this important observation.
We fully agree that these are relevant limitations that must be acknowledged in the manuscript. In response, we have now added a dedicated Limitations subsection at the end of the Discussion section, explicitly outlining two key limitations of our study:
First, we acknowledge that although the Mobil-O-Graph device provides an approximate estimation of aortic pulse wave velocity (aPWV), it is not the gold-standard carotid-femoral pulse wave velocity (cfPWV) measured via applanation tonometry. This limitation may introduce a degree of measurement bias that should be considered when interpreting the results presented in this study.
Second, we highlight that the training data primarily included individuals with aPWV values below 10 m/s, which is typical of apparently healthy populations. Although external validation on the ExIC-FEp cohort, which includes patients with elevated arterial stiffness, supports the generalizability of the models, further validation using larger and more representative datasets of cardiovascular patients is needed to ensure consistent performance across the entire risk spectrum.
Introduction:
I am not aware of any oscillometric sensor. Please explain the oscillometric sensor.
We thank the reviewer for pointing out this imprecision.
The term “oscillometric sensor” is indeed incorrect, and we apologize for the confusion. What is oscillometric is the method, not the sensor itself (as is already correctly stated elsewhere in the same paragraph). In fact, the oscillations in cuff pressure during inflation and deflation, caused by arterial pulsations, are detected by a pressure sensor.
We have corrected this misleading term in the manuscript and have also taken the opportunity to clarify this explanation in the revised version to ensure accuracy (see Introduction Section, paragraph 3).
Why are you entitled to state "extensively validated in clinical studies"? Your reference is a single report on measurements in 109 male and 11 female patients aged 62±11 years, with 22 patients being 50 years of age or younger and 29 patients being 70 years of age or older.
This would match your ExIC-FEp dataset, but not with respect to the male/female ratio, nor to the participants being suspected coronary artery disease patients, in contrast to your apparently healthy cohort. In any case, it is different from your training dataset. Please discuss the relevance in a "Limitations" paragraph.
We thank the reviewer for this valuable observation and fully agree that the original wording ("extensively validated") was inaccurate. As rightly pointed out, the reference we initially provided does not justify such a strong claim, especially considering the differences in study populations.
In response, we have now removed the phrase "extensively validated" from the manuscript. Instead, we state that the Mobil-O-Graph has been used as a surrogate method to estimate aPWV in both research and clinical settings (see Introduction Section, paragraph 3). Some comparative studies have reported a moderate correlation between Mobil-O-Graph estimates and cfPWV measurements [1], supporting its use as a proxy, particularly in contexts where access to the gold standard is limited.
Furthermore, we have added a dedicated Limitations section in the revised manuscript, where this issue is discussed explicitly. We acknowledge that the Mobil-O-Graph is not the gold standard for PWV assessment, and that, as emphasized in recent reviews, further validation in larger and more diverse clinical populations is still needed to support the widespread clinical adoption of this technique.
[1] Grillo, A., et al. "Non-Invasive Measurement of Aortic Pulse Wave Velocity: A Comparative Evaluation of Eight Devices." Journal of Hypertension, vol. 36, June 2018, p. e199. DOI: 10.1097/01.hjh.0000539556.02943.81
Data Collection:
Please explain why your blood pressure measurements involved 2 different devices (Omron and Mobil-O-Graph)
We thank the reviewer for this observation.
We would like to clarify that although the Mobil-O-Graph® device is capable of simultaneously measuring blood pressure and aPWV, in this study it was used exclusively for the estimation of aortic pulse wave velocity. All blood pressure measurements used in the models, specifically systolic and diastolic blood pressure, along with derived variables such as pulse pressure, were obtained using the Omron® M5-I monitor.
This choice was intentional and aligns with the main goal of our study: to develop a predictive model that relies solely on clinical variables routinely collected in standard medical visits. It is important to note that while conventional blood pressure monitors are ubiquitous in primary care settings, advanced diagnostic tools like the Mobil-O-Graph® are not commonly available, particularly in underserved or low-resource environments. Therefore, using the Omron® device for blood pressure measurements reflects a more realistic and accessible clinical scenario for the intended application of our model.
We have clarified this point in the revised manuscript, now explicitly stated in the last paragraph of Section 2.3 Data Collection, to avoid any misunderstanding regarding the roles of the two devices used.
I think you want to state that from two measurements the mean of aPWV was calculated
Thank you for this helpful observation.
We agree that the original phrasing may have been ambiguous. Our intention was indeed to state that, in this study, two aPWV measurements were taken five minutes apart, and their average was used as the final value.
The sentence has now been revised in the manuscript to clarify this point and avoid confusion.
Results:
No data shown for HbA1c or glycation in any of your results
We thank the reviewer for this valuable observation.
The reason why no results are shown for HbA1c and AGEs is that, during the feature selection phase, these variables demonstrated negligible or even negative contribution to the model's predictive performance. Specifically, permutation-based ΔMSE analysis revealed that including HbA1c and AGEs did not improve prediction and, in some cases, slightly degraded model accuracy. For clarity, Table 2 in the previous version of the manuscript presented only the features with a mean positive impact on model performance, as explicitly stated in its caption. Furthermore, in response to another reviewer’s suggestion, we have now performed an additional SHAP (SHapley Additive exPlanations) analysis to assess the contribution of each variable across all individual predictions. This analysis corroborated the previous findings, confirming that HbA1c and AGEs have a minimal effect on the model's output.
In any case, in response to the reviewer's comment, we have now expanded the feature importance table in the revised manuscript to include all variables assessed, regardless of their impact on performance. Its caption has also been updated. Additionally, we have added a paragraph at the end of Section 3.1 to explicitly explain the exclusion of these variables from the final models, highlighting both their low predictive relevance and the practical implications of reducing model complexity.
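As a reference point, the kind of permutation-based ΔMSE ranking described above can be sketched with scikit-learn's permutation_importance; the file and column names are hypothetical, and this is not necessarily the study's exact procedure.

```python
# Sketch of a permutation-based ΔMSE feature ranking (illustrative).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X = pd.read_csv("evascu_features.csv")  # hypothetical file name
y = X.pop("aPWV")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X_tr, y_tr)

# With neg-MSE scoring, importances_mean equals the mean increase in MSE
# (ΔMSE) when a feature is permuted; values near zero or negative indicate
# a negligible (or even harmful) contribution, as reported for HbA1c/AGEs.
result = permutation_importance(
    rf, X_te, y_te, scoring="neg_mean_squared_error",
    n_repeats=30, random_state=42,
)
ranking = pd.DataFrame(
    {"dMSE_mean": result.importances_mean, "dMSE_std": result.importances_std},
    index=X.columns,
).sort_values("dMSE_mean", ascending=False)
print(ranking)
```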
“These observations highlight the need for further external validation, especially with patients exhibiting high aPWV values, to assess and potentially enhance the model’s predictive capability in this subgroup.”
This paragraph - and in addition a few others - should be shifted into a chapter limitations which in this submission is missing
We agree with the reviewer’s observation. This limitation has now been moved to the newly added Limitations subsection, where it is explicitly addressed along with other relevant considerations.
Please explain "sensitivity to data shifts"
Thank you for this helpful observation.
In this context, what we meant by “sensitivity to data shifts” is the models’ ability to maintain predictive accuracy when applied to external datasets that, although derived from similar populations, may present slight variations in the distribution of clinical variables. For example, although both EVasCu and VascuNET consist of apparently healthy individuals, they differ slightly in average age and cardiovascular profiles, which may introduce minor distributional shifts.
In our results, models such as LR, BR, and SVR showed a more noticeable drop in performance compared to PR and NN when tested on external datasets. This suggests that, under the conditions of our study, these models may be somewhat less robust to such shifts.
We have now revised the manuscript to better explain what is meant by this expression, ensuring a clearer and more accurate interpretation within the study context.
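To illustrate what such a distributional shift means in practice, a small example (with synthetic data, not the study's cohorts) quantifies the difference between two cohorts' distributions of a shared variable using a two-sample Kolmogorov-Smirnov test:

```python
# Illustrative check of distributional shift between two cohorts (synthetic data).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
age_cohort_a = rng.normal(45, 12, 400)  # hypothetical age distribution, cohort A
age_cohort_b = rng.normal(52, 14, 300)  # hypothetical age distribution, cohort B

stat, p = ks_2samp(age_cohort_a, age_cohort_b)
print(f"KS statistic={stat:.3f}, p={p:.3g}")  # small p => distributions differ
```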
VOP abbreviation?
We sincerely thank the reviewer for pointing this out.
The abbreviation “VOP” mistakenly appeared in the manuscript due to a language oversight, as it corresponds to the Spanish acronym for Pulse Wave Velocity (“Velocidad de Onda de Pulso”) and is occasionally used informally in our native academic environment.
The new version of the manuscript has been carefully reviewed to remove this inconsistency, and all occurrences of "VOP" have been replaced with "aPWV". We apologize for this error and appreciate the reviewer's attention to detail.
Discussion:
From the graphs it becomes obvious that in this dataset (ExIC-FEp) apparently about 40 to 50% of study participants exhibit an aPWV higher than 10 m/s (a value to be considered an index of asymptomatic organ damage according to the European Society of Cardiology Guidelines for the management of arterial hypertension), which is not the case in the other 2 datasets. Thus you are training your algorithms preferentially on datasets with lower aPWV than present in a potentially diseased population. Please discuss extensively. Maybe this is the very reason, or another reason, for the poor performance of the LR, BR and SVR models as depicted in Figure 4.
We appreciate this reviewer’s important observation.
The primary rationale behind our approach was to develop and validate a model using a broad age range and a diverse yet structured clinical spectrum. The EVasCu dataset, used for training, comprises a relatively large and balanced sample of apparently healthy individuals across a wide range of ages. This choice allowed us to build an initial predictive model that performs well across the lower-to-mid aPWV spectrum.
We fully acknowledge, however, that a model trained mostly on subjects with lower aPWV values may not generalize as effectively to patients with advanced arterial stiffness or overt cardiovascular disease. To partially address this concern, we included external validation using the ExIC-FEp dataset, which contains a cohort of patients diagnosed with heart failure with preserved ejection fraction. The fact that our best models (PR and NN) maintained high predictive accuracy in this high-risk population supports the model’s robustness and potential applicability beyond the training range.
Furthermore, as an additional analysis, we conducted a reverse validation experiment in which the models were trained on the combined VascuNET + ExIC-FEp datasets, which include older individuals and patients with cardiovascular impairments, and subsequently validated on the EVasCu dataset. The results were highly consistent, with PR and NN again outperforming other models in external validation and achieving R² values of 0.95 and 0.91, respectively (see Table in attached document "Reviewer 2.pdf"). This additional analysis confirms that the results do not depend on a specific dataset for training, further supporting the robustness of the proposed modeling approach. Interestingly, models such as LR, BR, and SVR, whose performance dropped more substantially in the original external validation, achieved considerably better results under this reverse scheme. This suggests that their previously observed sensitivity may stem from being applied to clinically distinct populations that were not represented in the training set, rather than from intrinsic model limitations, as hypothesized by the reviewer.
Unfortunately, under this reversed validation scheme, it is no longer possible to assess model generalization on a clinically distinct cohort of patients with overt cardiovascular disease, as this population is now part of the training set. For this reason, we opted to retain the original validation strategy, training on EVasCu and validating on VascuNET and ExIC-FEp, as the most clinically relevant configuration, which is the one presented in the current manuscript.
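For clarity, the reverse-validation scheme described above can be sketched as follows; the file names and the degree-2 polynomial pipeline are illustrative assumptions, not the manuscript's exact configuration.

```python
# Sketch of the reverse-validation experiment (illustrative file names;
# PR = polynomial regression, one of the models compared in the manuscript).
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Train on the combined VascuNET + ExIC-FEp cohorts, validate on EVasCu.
train = pd.concat([pd.read_csv("vascunet.csv"), pd.read_csv("exic_fep.csv")])
test = pd.read_csv("evascu.csv")  # hypothetical file names
X_tr, y_tr = train.drop(columns="aPWV"), train["aPWV"]
X_te, y_te = test.drop(columns="aPWV"), test["aPWV"]

pr = make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression())
pr.fit(X_tr, y_tr)

print(f"External R^2 on EVasCu: {r2_score(y_te, pr.predict(X_te)):.2f}")
```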
That said, we fully agree with the reviewer that future work should involve the recruitment of larger and more diverse patient cohorts with elevated aPWV, and this has already been stated in the limitations of the study. Indeed, this would enable more granular validation strategies, such as splitting the cardiopathic population into separate training and testing subsets, ultimately allowing a more rigorous evaluation of model performance in high-risk clinical settings.
Finally, in the revised version of the manuscript, two additional paragraphs (the last ones) have been included at the end of the Discussion section to address this important point, as highlighted by the reviewer.
Please explain why the BMI increase in the young Chinese cohort (less than 44 years in age) as investigated in cit. 26, being a very different sample in comparison with your datasets, is to be associated - at least in part - with any influence on the aPWV in your study, simultaneously considering that the positive ΔMSE for BMI (with a huge standard deviation) is very small.
Why do you include BMI, a derived parameter?
What would be the result when leaving it out from your analysis?
We thank the reviewer for raising this point.
The cited study by Wang et al. (reference 26) was not intended to serve as direct empirical validation for our dataset or modeling approach, but rather to support the broader physiological rationale for including BMI as a potential predictor of arterial stiffness. Nevertheless, we agree with the reviewer that the characteristics of their cohort differ substantially from ours, particularly in terms of age, ethnicity, and follow-up duration. For this reason, we have not interpreted or presented the results of that study as directly comparable. Instead, our reference to it was meant to provide general support for the established link between BMI and vascular aging.
On the other hand, although BMI is a derived variable, it is routinely used in clinical settings as a surrogate marker of adiposity and metabolic risk. Its inclusion was initially justified by its clinical relevance and widespread use (as we wanted to justify with the inclusion of cite 26). However, as the reviewer correctly notes, the ΔMSE analysis in our study showed only a marginal contribution to model performance, with a high standard deviation, thus indicating inconsistency across cross-validation folds.
Interestingly, further insight was gained from the SHAP analysis conducted in the revised version of the manuscript, where BMI exhibited a higher individual predictive impact than either of its base components, weight and height. This suggests that BMI may capture interaction effects between these variables that could be more informative for modeling arterial stiffness in certain individuals. However, following the reviewer’s suggestion, we conducted an additional study to test the actual contribution of BMI to model performance, and the results were mixed. Removing BMI slightly improved performance in some models (e.g., SVR and BR), had minimal impact in others (e.g., PR), and caused only minor decreases in a few cases (e.g., NN and LR). Taken together, these findings suggest that, although BMI may carry relevant information in specific cases, it does not provide consistent or substantial added value in this particular dataset, likely because its information is already implicitly captured by the combination of height and weight, as correctly noted by the reviewer.
We ultimately decided to retain it in this approach, as SHAP analysis suggested that BMI may carry individualized predictive value beyond its base components, particularly in certain subsets of the population. Nevertheless, these new findings are now discussed in the revised manuscript (see Discussion section) to provide a transparent and nuanced interpretation of BMI’s role within our modeling framework.
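A minimal sketch of the BMI ablation check described above follows; the SVR model, the scaling step, and the 10-fold cross-validation are illustrative choices under hypothetical names, not the study's exact setup.

```python
# Sketch of a leave-one-feature-out (BMI) ablation check (illustrative).
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = pd.read_csv("evascu_features.csv")  # hypothetical file name
y = X.pop("aPWV")

def cv_r2(features):
    # Scale features, then score an SVR with 10-fold cross-validated R^2.
    model = make_pipeline(StandardScaler(), SVR())
    return cross_val_score(model, features, y, cv=10, scoring="r2").mean()

with_bmi = cv_r2(X)
without_bmi = cv_r2(X.drop(columns="BMI"))
print(f"R^2 with BMI: {with_bmi:.3f}, without BMI: {without_bmi:.3f}")
```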
Conclusions:
An R-squared of 0.95 for your PR and NN models in the VascuNET dataset is fine and impressive.
In your 3rd dataset, being a mixture of healthy and heart disease patients, the best performing models PR and NN achieve an R-squared of about 0.9 (only). This result appears to be the most relevant of your paper.
However, as the attempted goal is a most reliable aPWV prediction in clinically suspicious patients, the performance of your models remains completely open in such a cohort.
This fact must be clearly highlighted in a "Limitations" paragraph.
Also, the question of why you used an "acceptable" but not excellent reference method (see Milan's paper: Current assessment of pulse wave velocity: comprehensive review of validation studies) remains to be discussed and clarified in a "Limitations" paragraph.
We thank the reviewer for this valuable comment. A dedicated subsection on Limitations was already included in the revised manuscript, where the limitations raised are explicitly discussed. In addition, we have incorporated the relevant reference suggested by the reviewer (Milan et al.) to support our discussion on the limitations of the reference method used.
Furthermore, we have now extended the Limitations section to explain not only that the Mobil-O-Graph® is not the gold standard, but also why it was chosen. Specifically, the decision was based on its non-invasive, operator-independent design, lower cost, and suitability for large-scale, real-world clinical use. These points are now reflected in the Limitations section of the revised manuscript, as suggested by the reviewer.
"More accessible" is fine, but the purchase cost of the Mobil-O-Graph device is rather high, at least for underserved or remote areas.
We are fully aligned with the reviewer’s observation. The original wording may have been misleading, and we agree that the cost of devices like the Mobil-O-Graph® can indeed be a barrier in underserved or resource-limited areas. In response, we have revised the final paragraph of the Conclusions section to clearly state that our proposed approach is intended as an alternative for estimating aPWV in settings where access to specialized and costly diagnostic equipment is not feasible.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Authors should correct:
- Using abbreviations - at first appearance in the text, the whole name behind the abbreviation should be provided. For example, in line 127 there is Figure 1 with abbreviations for model training - LR, PR... later model evaluation with MSE, RMSE and R²... these should be (at least briefly, as whole words with abbreviations) explained before or immediately below the diagram within Figure 1.
- Table elements explanation - for each table, there should be an explanation of table elements and abbreviations below the table (in the table footer), regardless of whether they have already been explained in the preceding text - for Table 1 it is not clear what Age SD stands for; for Table 2, all abbreviations should be explained shortly in the table footer, such as SBP, DBP etc.
- Table 5. should be placed within the discussion section.
- Introduction - authors should formulate research questions or hypotheses or list key contributions in the last paragraph of the introduction. Currently, the last paragraph contains the research gap, motivation, and a descriptive form of the contribution. It would be clearer what problems this research is targeting and what the contributions are, if they were listed in bullets.
- Citation of sources - authors should provide sources for all referenced text. For example, in lines 108-117 there are VascuNET and STROBE mentioned, but no literature citing was provided for that part of text.
- Figure 1 placement - it has been placed within subsection 2.1 "Study population", but it is related to the whole procedure of the experiment. Before Study population there should be a subsection "2.1. The proposed method" with text explaining the whole experiment process, and Figure 1 placed there.
- Data source for the experiment - authors state in 2.1 that they use "off-the-shelf" ready datasets from some sources as a research sample, and later they state in 2.2 that they perform questionnaires (line 137) to collect data. How do they integrate off-the-shelf data and personalized real people data? This is impossible, since these are two different groups of people! Later they also state that they used some medical devices to measure blood-related data from real people - Omron® M5-I monitor (line 145), Mobil-O-Graph (line 154). I suppose the "Dataset data" were related to machine learning training, while real people data (obtained from questionnaires and measurements with devices) were used after the machine learning has been trained, to test the models obtained from training? All these are not clear, since the data sample for training (currently entitled Study population) and data collection were provided too early in the Materials and methods section. I suggest that Materials and methods should be changed: 1) the first subsection should be entitled The proposed method; 2) Variable selection (here should be explained what are the relevant measurements and other variables that will be analyzed from training and evaluation datasets and why these variables are important - currently presented in section 3.1); 3) Training data collection (includes off-the-shelf data); 4) Evaluation data collection (includes questionnaire and measurements); 5) Data preprocessing...
- It is not clear, and it is not commonly selected, which data are used for training and model preparation and which for prediction model evaluation. In lines 215-223 it is stated that VascuNET and ExIC-FEp are used for validation of the developed model? Should it be different - to have these off-the-shelf datasets for training and real (questionnaire and measurement) data for prediction model evaluation?
- The proposed method - should be explained what is the aim of this research - to make prediction model by using machine learning - the model is to be created via training with off-the-shelf data and evaluated with measurement/questionnaire data?
- Figure 3 and Figure 4 contain diagrams that represent results of correlation between measurements and the potential of creating prediction models by correlating these measurements. These diagrams are very important, but the font is too small, so it is suggested to place fewer diagrams in a row (3 max) and to enlarge the fonts on the x and y axes, in order to make these diagrams readable.
Author Response
See the attached file "Reviewer 3.pdf" for a detailed response.
RESPONSE TO THE REVIEWER #3
We would like to thank the reviewer for his/her comments and suggestions that have allowed us to improve the manuscript. Below, we hope to give a precise reply to all the reported concerns.
In the interest of clarity, the reviewer's comments are replicated in full below. After each comment, the authors' response is included in a square box (see detailed response "Reviewer 3.pdf"). Where needed, the added or modified text has been indicated in blue throughout the manuscript for easy reading.
Reviewer’s comments:
Using abbreviations - at first appearance in the text, the whole name behind the abbreviation should be provided. For example, in line 127 there is Figure 1 with abbreviations for model training - LR, PR... later model evaluation with MSE, RMSE and R²... these should be (at least briefly, as whole words with abbreviations) explained before or immediately below the diagram within Figure 1.
Thank you for your helpful observation.
While all abbreviations used in Figure 1 (such as LR, PR, BR, SVM, NN, MSE, RMSE, and R²) are already defined in the main text, we agree that including their full definitions in the figure caption improves clarity and immediate accessibility for the reader. We have therefore revised the caption of Figure 1 to include all relevant definitions, ensuring that the figure is self-contained and easier to interpret without requiring reference to the main text.
Table elements explanation - for each table, there should be an explanation of table elements and abbreviations below the table (in the table footer), regardless of whether they have already been explained in the preceding text - for Table 1 it is not clear what Age SD stands for; for Table 2, all abbreviations should be explained shortly in the table footer, such as SBP, DBP etc.
We thank the reviewer for this helpful suggestion.
In the revised version of the manuscript, we have added explanatory footnotes below each table to define all abbreviations and clarify the table elements. Specifically, abbreviations such as SBP (Systolic Blood Pressure), DBP (Diastolic Blood Pressure), and many others are now explained directly within the footnotes of the corresponding tables.
Table 5. should be placed within the discussion section.
We appreciate the reviewer’s observation. Table 5 has now been repositioned to appear within the Discussion section, as suggested.
Introduction - authors should formulate research questions or hypotheses or list key contributions in the last paragraph of the introduction. Currently, the last paragraph contains the research gap, motivation, and a descriptive form of the contribution. It would be clearer what problems this research is targeting and what the contributions are, if they were listed in bullets.
We appreciate this constructive suggestion.
In response, we have now revised and restructured the final paragraph of the introduction to improve clarity regarding the study’s objective and contributions. Specifically, we have explicitly stated the working hypothesis and added a list of clearly formulated research questions and expected contributions (in bullets), which we believe clearly define the scope of the study and highlight its novelty and potential impact.
Citation of sources - authors should provide sources for all referenced text. For example, in lines 108-117 there are VascuNET and STROBE mentioned, but no literature citing was provided for that part of text.
Thank you for pointing this out.
We have now revised the manuscript to include appropriate references for all previously uncited mentions. Specifically, we have now cited the VascuNET study with its corresponding reference, and we have added a citation to the STROBE guidelines where they are mentioned in the Study Population subsection.
Figure 1 placement - it has been placed within subsection 2.1 "Study population", but it is related to the whole procedure of the experiment. Before Study population there should be a subsection "2.1. The proposed method" with text explaining the whole experiment process, and Figure 1 placed there.
We thank the reviewer for this helpful observation.
In the revised manuscript, we have reorganized the structure of Section 2 to improve the logical flow and the contextual placement of Figure 1. Specifically, we have introduced a new subsection entitled "2.1. The Proposed Method", as suggested by the reviewer, placed before the description of the study population. This new subsection now provides a comprehensive overview of the entire experimental process, including dataset collection, preprocessing, feature selection, model training using 10-fold cross-validation, hyperparameter optimization, and external validation on independent datasets. Finally, Figure 1 has been relocated to this section accordingly.
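As an illustration of the workflow just summarized, a minimal sketch follows; the file name, the NN model choice, and the parameter grid are hypothetical, and the manuscript's actual models and grids may differ.

```python
# Minimal sketch of the described pipeline: imputation + normalization,
# then 10-fold cross-validated hyperparameter tuning (illustrative grid).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = pd.read_csv("evascu_features.csv")  # hypothetical file name
y = X.pop("aPWV")

pipe = make_pipeline(SimpleImputer(), StandardScaler(), MLPRegressor(max_iter=2000))
grid = GridSearchCV(
    pipe,
    {"mlpregressor__hidden_layer_sizes": [(16,), (32,), (32, 16)]},
    cv=KFold(n_splits=10, shuffle=True, random_state=42),
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)  # best config and its CV MSE
```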
Data source for the experiment - authors state in 2.1 that they use "off-the-shelf" ready datasets from some sources as a research sample, and later they state in 2.2 that they perform questionnaires (line 137) to collect data. How do they integrate off-the-shelf data and personalized real people data? This is impossible, since these are two different groups of people! Later they also state that they used some medical devices to measure blood-related data from real people - Omron® M5-I monitor (line 145), Mobil-O-Graph (line 154). I suppose the "Dataset data" were related to machine learning training, while real people data (obtained from questionnaires and measurements with devices) were used after the machine learning has been trained, to test the models obtained from training? All these are not clear, since the data sample for training (currently entitled Study population) and data collection were provided too early in the Materials and methods section.
We appreciate the reviewer’s concern and the opportunity to clarify this point.
All the datasets used in this study (EVasCu, VascuNET, and ExIC-FEp) were not off-the-shelf or publicly available datasets ready to use, but rather datasets collected directly by the research team through clinical fieldwork in real-world healthcare settings. This includes physiological measurements using validated medical devices (e.g., Omron®, Roche Diagnostic®, Mobil-O-Graph®) and basic clinical information obtained through structured clinical interviews (e.g., age, sex or smoking status).
The term “direct questioning” in the original text (at the beginning of the old subsection 2.2. Data Collection) referred specifically to clinician-administered interviews, during which demographic and lifestyle information was collected directly from participants. It did not refer to self-administered questionnaires or standardized survey instruments. To avoid confusion, we have now revised the wording in Section 2.3 “Data Collection” to more accurately reflect the nature of the data collection process.
Therefore, all data, from medical device measurements to clinical interviews, were gathered from the same set of participants within each respective study. Consequently, there is no integration of disparate data sources, and the dataset is fully coherent in both training and validation phases.
Accordingly, clarifications and rewording have been incorporated into the newly titled Sections 2.2 Study Population and 2.3 Data Collection to ensure greater clarity and avoid misunderstandings regarding the origin and collection methods of the data used.
I suggest that Materials and methods should be changed: 1) the first subsection should be entitled The proposed method; 2) Variable selection (here should be explained what are the relevant measurements and other variables that will be analyzed from training and evaluation datasets and why these variables are important - currently presented in section 3.1); 3) Training data collection (includes off-the-shelf data); 4) Evaluation data collection (includes questionnaire and measurements); 5) Data preprocessing...
As clarified in our previous response, we would like to reiterate that no off-the-shelf or publicly available datasets were used in this study, nor were standardized questionnaires administered. All datasets were collected directly by the research team through clinical fieldwork carried out in real healthcare settings.
Furthermore, both the training and evaluation datasets consist entirely of original, firsthand measurements from different participant groups, not from different data sources or methodologies, and were collected using the same protocol and the same set of demographic, lifestyle, clinical, and biochemical variables.
That said, and in line with the reviewer’s helpful suggestion, we have reorganized the Materials and Methods section to improve clarity and flow. Specifically:
- A new introductory subsection titled “The Proposed Method” (Section 2.1) has been added to provide an overview of the entire methodology;
- Sections 2.2 (“Study Population”) and 2.3 (“Data Collection”) now include clearer explanations of how and where the data were collected, in order to avoid potential misunderstandings.
It is not clear, and it is not commonly selected, which data are used for training and model preparation and which for prediction model evaluation. In lines 215-223 it is stated that VascuNET and ExIC-FEp are used for validation of the developed model? Should it be different - to have these off-the-shelf datasets for training and real (questionnaire and measurement) data for prediction model evaluation?
We thank the reviewer for this interesting comment.
Regarding the rationale for using EVasCu as the training dataset, we chose this configuration because EVasCu is the largest and most balanced dataset in terms of population distribution. Its size allows for more effective model training and tuning via cross-validation, reducing the risk of overfitting.
Nevertheless, we find the reviewer’s suggestion highly valuable. To explore whether the model’s performance is influenced by the choice of training dataset, we conducted an additional experiment in which we trained the models on VascuNET + ExIC-FEp and used EVasCu as the external validation dataset.
The table below shows the results for this additional experiment (see attached file "Reviewer 3.pdf"). As can be observed, the models retain excellent performance, particularly for Polynomial Regression (PR) and Neural Network (NN), with R² values of 0.95 and 0.91, respectively, on the external validation set, which supports the generalizability of the proposed models regardless of the training-evaluation split and further confirms the robustness of our approach.
Despite the valuable insights provided by this additional analysis, we have decided not to include it in the final version of the manuscript. The results align closely with those already presented, further reinforcing the robustness and generalizability of the proposed models. However, incorporating these extra results could potentially introduce redundancy and distract the reader from the main narrative. In this sense, we preferred to keep the manuscript focused and streamlined.
The proposed method - should be explained what is the aim of this research - to make prediction model by using machine learning - the model is to be created via training with off-the-shelf data and evaluated with measurement/questionnaire data?
We thank the reviewer for this observation.
As part of the revised manuscript, we have already included a clearly stated working hypothesis as well as a bullet-pointed list of research questions and expected contributions at the end of the introduction. To further clarify the objective of our study, we have now added an additional bullet that explicitly states the aim of developing a machine learning-based predictive model for aPWV estimation using only routinely collected clinical variables, trained and evaluated entirely on prospectively collected real-world datasets.
Figure 3 and Figure 4 contain diagrams that represent results of correlation between measurements and the potential of creating prediction models by correlating these measurements. These diagrams are very important, but the font is too small, so it is suggested to place fewer diagrams in a row (3 max) and to enlarge the fonts on the x and y axes, in order to make these diagrams readable.
We thank the reviewer for this constructive observation. Indeed, we agree that the previous formatting of the figures made the axis labels and annotations difficult to read.
To address this, we explored alternative layouts, including 3x2 and 2x2 configurations, as suggested by the reviewer. However, these formats resulted in overly large figures that disrupted the flow and readability of the manuscript. After testing several configurations, we opted to maintain the original 5x2 layout but implemented the following adjustments to enhance clarity:
* Removed redundant axis labels and titles to reduce visual clutter.
* Decreased the vertical spacing between rows to bring related subplots closer together.
* Increased the font size of axis labels and tick marks to ensure legibility.
A comparison between the original and improved layouts is provided in the attached file to illustrate these enhancements.
(See the attached file "Reviewer 3.pdf" for the before/after figures.)
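For completeness, a minimal matplotlib sketch (with synthetic data, not the manuscript's actual plotting code) of the layout adjustments listed above:

```python
# Sketch of the 5x2 layout with the described readability adjustments.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
fig.subplots_adjust(hspace=0.25)  # tighter vertical spacing between rows

for i, ax in enumerate(axes.flat):
    x = rng.normal(8, 2, 200)                      # hypothetical measured aPWV
    ax.scatter(x, x + rng.normal(0, 0.5, 200), s=5)
    ax.tick_params(labelsize=12)                   # larger tick labels
    if i % 5 == 0:                                 # label only the leftmost column
        ax.set_ylabel("Estimated aPWV (m/s)", fontsize=13)
    if i >= 5:                                     # label only the bottom row
        ax.set_xlabel("Measured aPWV (m/s)", fontsize=13)

plt.show()
```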
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors, you nicely included all the necessary (in my opinion) improvements in your updated submission. Very well done.