Article
Peer-Review Record

Wearable Sensors for Athletic Performance: A Comparison of Discrete and Continuous Feature-Extraction Methods for Prediction Models

Mathematics 2024, 12(12), 1853; https://doi.org/10.3390/math12121853
by Mark White 1,*, Beatrice De Lazzari 2,3,4, Neil Bezodis 1 and Valentina Camomilla 2,4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 April 2024 / Revised: 7 June 2024 / Accepted: 8 June 2024 / Published: 14 June 2024
(This article belongs to the Special Issue Trends and Prospects of Numerical Modelling in Bioengineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors investigate the efficacy of feature extraction methods for modelling countermovement jump performance using wearable sensor data. They show that continuous features derived from Functional Principal Component Analysis outperform discrete features in many respects.

Remarks:

1. A list of acronyms would be useful.

2. The equation in line 240 is incorrect.

3. The authors wrote in lines 431-433 that "as the number of features increased, continuous feature models were able to make effective use of the additional predictors, improving their validation performance." However, based on Figure 4 b) this statement is not true for the Validation Smartphone Dataset since standardized RMSE increases as the number of features increases for discrete, continuous, and combined features in the Validation Smartphone Dataset.

4. Plots in Figure A5 in the second and fourth rows and columns 1 and 2 are redundant since they are duplicates of plots from the first and third rows and columns 1 and 2. Moreover, Figures in rows 3 and 4 and columns 1 and 2 probably concern data from the accelerometer, not the Smartphone.

5. The reference 26 (Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Transactions on Signal Processing 2014, 62, 531–544; https://doi.org/10.1109/TSP.2013.2288675) does not occur in the manuscript.

Comments on the Quality of English Language

The meaning of the sentence "The feature selection method chosen was based on Lasso regression using least squares." in lines 184-185 is unclear. Maybe the authors meant "The chosen feature selection method was based on Lasso regression using least squares."

Author Response

  1. A list of acronyms would be useful.

We have added a table of acronyms on page 3.

  2. The equation in line 240 is incorrect.

Thank you for spotting the typo. The correct number of model fits was 100: we performed 5 (rather than 10) repeats of two-fold CV, and this was repeated for each of the 10 data truncations. We favoured exploring more ways of truncating the data over additional CV repeats. We have amended the text accordingly, using blue to highlight the change.
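For the reader, the corrected count breaks down as:

    5 CV repeats × 2 folds per repeat × 10 data truncations = 100 model fits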

  3. The authors wrote in lines 431-433 that "as the number of features increased, continuous feature models were able to make effective use of the additional predictors, improving their validation performance." However, based on Figure 4 b) this statement is not true for the Validation Smartphone Dataset since standardized RMSE increases as the number of features increases for discrete, continuous, and combined features in the Validation Smartphone Dataset.

Yes, you are correct. Although we noted this in the results section, our summary in the discussion did not acknowledge this point. We have therefore updated the discussion to highlight this difference and to suggest data quality as a possible cause: the handheld Smartphone may have registered variable extraneous motion, unlike the Accelerometer, which was held firmly in place (albeit subject to skin-movement artefact). We have updated section 4.2 (model robustness) as well as sections 4.3 (generalisability) and 4.6 (limitations and future directions).

  4. Plots in Figure A5 in the second and fourth rows and columns 1 and 2 are redundant since they are duplicates of plots from the first and third rows and columns 1 and 2. Moreover, Figures in rows 3 and 4 and columns 1 and 2 probably concern data from the accelerometer, not the Smartphone.

Thank you for pointing out the redundancy. It has been removed so there is one row for the Smartphone models and one for the Accelerometer models. We have retained the use of narrow and wide scales for the discrete features because we think it is important to show how much these beta coefficients vary in magnitude. One plot cannot adequately show the variations in magnitude at different scales. Without the redundancy, we have revised the panel layout by placing the two Accelerometer model plots on the same row (one plot for each of the chosen alignment points) rather than have them on separate rows, one above the other.

  5. The reference 26 (Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Transactions on Signal Processing 2014, 62, 531–544; https://doi.org/10.1109/TSP.2013.2288675) does not occur in the manuscript.

It appears in the manuscript in section 2.2 on line 121.

The meaning of the sentence "The feature selection method chosen was based on Lasso regression using least squares." in lines 184-185 is unclear. Maybe the authors meant "The chosen feature selection method was based on Lasso regression using least squares."

We have updated the manuscript with the proposed rewording. Thank you.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for submitting an interesting paper. My comments are:

1. Why did you use only 2 independent studies and not more?

2. You mentioned that you converted the time series into smooth functions using a 4th order B-spline. Why specifically a 4th order?

3. You mentioned that you used Lasso regression for feature selection. You should also consider the Elastic Net and SCAD penalties as candidates for feature selection. The Elastic Net addresses both feature selection and multicollinearity.

4. Why did you specifically use 2-fold cross-validation? Was it due to sample size?

5. You refer to features (par 3.2) that are generally normally distributed. Why is this the case? I would argue that there should be some features that are not Gaussian. This problem could be phrased in a non-parametric setting, but this is likely important to mention in future work.

6. You set a limit of 20 optimisation iterations (par 2.6). Are you sure that the results were feasible?

Author Response

  1. Why did you use only 2 independent studies and not more?

The two studies we chose for this analysis were conducted by our research teams, who had independently explored similar ideas using wearable sensors to estimate jump performance. Upon discovering our shared interests and the compatibility of our datasets, we decided to collaborate and pool our data to address the three research questions posed in this manuscript.

Our studies had moderate to large sample sizes, were conducted on similar cohorts, and differed primarily in the sensors used and their attachment methods. By focusing on our own datasets, we had full access to the raw data and a thorough understanding of the data collection protocols, enabling us to perform a detailed and nuanced analysis. Importantly, we also had ethical approval to perform the current analyses on both datasets.

While including a broader range of studies could enhance the generalizability of our findings, we believe the in-depth analysis of the datasets from these two representative studies provides valuable insights into the key factors influencing the performance of wearable sensor-based models for estimating jump performance.

  2. You mentioned that you converted the time series into smooth functions using a 4th order B-spline. Why specifically a 4th order?

Fourth order is the lowest order that can be chosen for a smooth continuous function representing the data if high curvature is to be penalised in the fitting process. For B-splines, 4th order implies the basis functions are cubic, the lowest order we could reasonably fit to the data. Moreover, the roughness penalty must be applied two orders lower, that is, to the curvature (the second-order derivative). For these reasons, 4th-order B-splines are typically used for fitting continuous smooth functions to data (Ramsay & Silverman, 2005).
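As a minimal sketch of this setup, assuming the fda toolbox for MATLAB that accompanies Ramsay & Silverman (2005), with an illustrative basis size and smoothing parameter rather than the values used in the study:

```matlab
% Smooth a noisy time series with a 4th-order (cubic) B-spline basis,
% penalising the second derivative (curvature) as described above.
% Requires the fda toolbox; nbasis and lambda are placeholders.
t = linspace(0, 1, 200)';             % normalised time points
y = sin(2*pi*t) + 0.05*randn(200, 1); % synthetic noisy signal

norder = 4;                           % 4th order => cubic basis functions
nbasis = 50;                          % illustrative basis size
basis  = create_bspline_basis([0 1], nbasis, norder);

lambda   = 1e-6;                      % illustrative smoothing parameter
penalty  = fdPar(basis, int2Lfd(2), lambda); % roughness penalty on D2
smoothFd = smooth_basis(t, y, penalty);      % smooth functional data object
```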

  3. You mentioned that you used Lasso regression for feature selection. You should also consider the Elastic Net and SCAD penalties as candidates for feature selection. The Elastic Net addresses both feature selection and multicollinearity.

Thank you for the suggestions. We appreciate your insight and agree that these methods have their merits in addressing feature selection and multicollinearity. In our study, we used Lasso regression for feature selection as a preprocessing step for all model types. The subsequent model fitting for prediction therefore did not use the non-zero coefficients from the Lasso feature-selection model. This also applied to the Lasso prediction model, which, like the other three model types, started from the shortlist of features. We have added a sentence to the end of the second paragraph in section 2.4 making this clear to the reader.

Given that feature selection was a separate prior operation, the specific choice of penalty (Lasso, Elastic Net or SCAD) would not have a significant impact on the final predictive model's performance. The key aspect is the sparsity pattern of the coefficients, which determines the selected features; the exact values of the remaining non-zero coefficients are not carried forward. While Elastic Net and SCAD offer advantages in handling multicollinearity and in providing flexibility in feature selection, respectively, their benefits are more pronounced when the model used for feature selection is also used for prediction. In our study it is not, so the impact of these advantages may be limited.

Therefore, we believe our current approach of using Lasso regression for feature selection as a preprocessing step is sufficient for our problem setting. The Lasso penalty is a straightforward and widely used method for identifying relevant features, it was effective in reducing the dimensionality of our dataset before fitting the final model, and it strikes a reasonable balance between simplicity and effectiveness.
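A minimal sketch of this two-stage approach in MATLAB may help (X and y are placeholder predictor and response variables; the cross-validation setting and the downstream SVM are illustrative, not the study's exact configuration):

```matlab
% Stage 1: Lasso used purely as a feature screen. Only the sparsity
% pattern (which coefficients are non-zero) is kept; the coefficient
% values themselves are discarded.
[B, FitInfo] = lasso(X, y, 'CV', 5);            % cross-validated Lasso path
shortlist = find(B(:, FitInfo.Index1SE) ~= 0);  % retained feature indices

% Stage 2: the predictive model is refitted from scratch on the
% shortlisted features, so the screening penalty's coefficient values
% never enter the final model.
mdl = fitrsvm(X(:, shortlist), y, 'Standardize', true);
```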

  4. Why did you specifically use 2-fold cross-validation? Was it due to sample size?

Our study was based on model comparisons throughout, so it was essential that our cross-validated errors could differentiate between models and correctly identify the best ones. Note that this is a different aim from determining the generalised performance of a given model on unseen data: our aim is model selection rather than model evaluation, and this is a key factor in choosing the cross-validation scheme. For model selection, a large validation set is important for differentiating between models, even if that reduces the training set somewhat, a finding that goes back to the work of Shao in the 1990s (Shao, 1993, 1997). Hence, two-fold cross-validation is the preferred choice for model selection as it provides a large validation set without making the training set unduly small. Under this scheme the error estimates carry a larger bias, but the large validation set better exposes the differences between models.

We have added a sentence to section 2.7 explaining this for the reader.
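A minimal sketch of the scheme in MATLAB (X and y are placeholders, and the linear model and RMSE metric stand in for the model types compared in the study):

```matlab
% Repeated two-fold cross-validation: each repeat holds out half the
% data, giving a large validation set that better differentiates
% between competing models (model selection, after Shao).
nRepeats = 5;                                  % 5 repeats x 2 folds = 10 fits
rmse = zeros(nRepeats, 2);
for r = 1:nRepeats
    cvp = cvpartition(numel(y), 'KFold', 2);   % fresh random half-split
    for k = 1:2
        tr = training(cvp, k);
        te = test(cvp, k);
        mdl  = fitrlinear(X(tr, :), y(tr));    % placeholder model
        yhat = predict(mdl, X(te, :));
        rmse(r, k) = sqrt(mean((y(te) - yhat).^2));
    end
end
meanRMSE = mean(rmse(:));                      % score for comparing models
```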

  5. You refer to features (par 3.2) that are generally normally distributed. Why is this the case? I would argue that there should be some features that are not Gaussian. This problem could be phrased in a non-parametric setting, but this is likely important to mention in future work.

The first few FPCs capture the main modes of variation in the data, and so their associated scores may include non-Gaussian characteristics such as skewness or long tails. These components are more likely to be influenced by the inherent structure and outliers in the data. As we move to higher-order principal components, the remaining variance becomes more evenly distributed and less influenced by specific data points or structures, which can lead to a more Gaussian distribution for these components. As the Central Limit Theorem suggests, the sum of many independent random variables tends towards a normal distribution, regardless of their individual distributions. Furthermore, computing FPC scores involves a linear transformation of the original data, which can have a normalising effect, especially for higher-order components that capture smaller and more random variations.
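This can be checked empirically. A quick sketch in MATLAB, assuming the fda toolbox (smoothFd stands for a smoothed functional data object) and the Statistics and Machine Learning Toolbox for lillietest:

```matlab
% Extract FPC scores and test each component for normality.
nharm  = 10;                       % number of FPCs to extract
pcaRes = pca_fd(smoothFd, nharm);  % functional PCA (fda toolbox)
scores = pcaRes.harmscr;           % n-by-nharm matrix of FPC scores

for j = 1:nharm
    if lillietest(scores(:, j))    % Lilliefors test at the 5% level
        fprintf('FPC %d: normality rejected\n', j);
    else
        fprintf('FPC %d: consistent with normality\n', j);
    end
end
```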

We agree that considering a non-parametric setting for this problem is valuable and should be explored in future work. This approach may be valuable as the first few FPCs play a significant role in the model. We have added a sentence to section 4.6 on future work.

  6. You set a limit of 20 optimisation iterations (par 2.6). Are you sure that the results were feasible?

Yes, we appreciate that the limit on the number of iterations for an optimisation procedure needs careful consideration. When developing the optimisation code, we scrutinised its performance using the verbose output to determine a reasonable upper limit for the iteration count, and noted that the optimised hyperparameter values rarely differed between runs of 20 and 30 iterations. Computational cost was another consideration, given the large number of models to be fitted: typically 50 models for each point in the grid search, requiring hundreds or even thousands of model fits (including optimisations). It was therefore reasonable to limit the number of optimisation iterations, and choosing 20 rather than the default 30 iterations reduced the computational cost by a third.

It should also be noted that we opted for the shorter 'auto' list of hyperparameters provided by MATLAB's Bayesian optimisation function, which are pre-selected based on their importance and impact on model performance. This usually requires fewer iterations to find the global optimum. For Lasso, this included only two hyperparameters ('Lambda' and 'Learner'); for SVM it included 'BoxConstraint', 'KernelScale', 'Epsilon' and 'Standardize' (effectively only three hyperparameters since the predictors were already standardised); and for XGBoost there were three hyperparameters: 'Method', 'NumLearningCycles' and 'LearnRate'. Note that several of these hyperparameters were categorical, which Bayesian optimisation handles directly and efficiently without the need for encoding or transformations. In conclusion, we are satisfied that there were sufficient iterations to determine an optimal model.
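As a minimal sketch of this configuration, shown for the SVM model (X and y are placeholders; per the response above, the 'auto' list for fitrsvm in R2023b covers 'BoxConstraint', 'KernelScale', 'Epsilon' and 'Standardize'):

```matlab
% Bayesian optimisation over the short 'auto' hyperparameter list,
% capped at 20 objective evaluations instead of the default 30.
opts = struct('MaxObjectiveEvaluations', 20, ...
              'ShowPlots', false, 'Verbose', 1);
mdl = fitrsvm(X, y, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);
```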

References cited in our responses:

Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (2nd ed). Springer.

Shao, J. (1993). Linear Model Selection by Cross-validation. Journal of the American Statistical Association, 88(422), 486–494. https://doi.org/10.1080/01621459.1993.10476299

Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica, 7(2), 221–242.

Reviewer 3 Report

Comments and Suggestions for Authors

The article is good and well structured. It requires minor corrections; my comments are given in the attached file.

Thanks 

Comments for author File: Comments.pdf

Author Response

From the comments annotated on the PDF:

  1. We have ordered the keywords alphabetically.
  2. We have made the sources of the data for the analysis clear in the caption of Table 2.
  3. The statistical analysis was performed in MATLAB R2023b. We now make this explicit in section 2.7.
  4. We have defined RMSE earlier in the text in section 2.7 on its first appearance, as we believe is standard practice. We think it unnecessary to redefine it here given its previous usage in the text and the fact that there is now a table of acronyms, as requested by another reviewer.
  5. All our figures have been prepared at the high resolution required by the journal. We have added horizontal and vertical lines splitting the heatmaps into quadrants to make the blocks for each feature set clearer to the reader.
  6. As suggested, we have restated the key findings of our study in the conclusions, and we believe this adds weight to the summary. Thank you for the suggestion.