Text Correction
There was an error in the original publication [1]. We would like to adjust the numerical values in Section 3.2. Specifically, we identified a rounding error in the following sentence, which could lead to a misunderstanding regarding our selection of optimal hyperparameters. This rounding issue might create the impression that we determined the best hyperparameters based on test set performance, which is not the case. Instead, we selected the run with the highest validation performance as the optimal configuration.
A correction has been made to Section 3.2:
The optimal parameter combination is selected using the averaged performance from the 5-fold CV; the metric considered is the averaged macro F1-score achieved on the validation dataset. The best performance was achieved by a network with the following hyperparameters: batch size 32, three CNN blocks with the “increasing filter, fixed kernel size” scheme, dropout with a rate of 0.2 as a regularization technique, and two LSTM layers. This combination results in a macro F1-score of 0.956 on the training set, 0.955 on the validation set, and 0.906 on the test set. Across all tested configurations, the macro F1-scores fall in the ranges of 0.932–0.969 (training set), 0.939–0.955 (validation set), and 0.867–0.906 (test set).
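To make the selection criterion explicit, here is a minimal Python sketch of the logic described above: average the per-fold validation macro F1-scores of each hyperparameter configuration and pick the configuration with the highest mean. The configuration labels and score values are hypothetical placeholders, not results from the study, and the sketch does not reproduce the authors’ actual training pipeline.

```python
import numpy as np

# Hypothetical per-fold validation macro F1-scores for each hyperparameter
# configuration (placeholder values, not the scores reported in the paper).
cv_results = {
    "bs32_3cnn_incr-filter_drop0.2_2lstm": [0.951, 0.958, 0.953, 0.956, 0.957],
    "bs64_2cnn_incr-filter_drop0.3_1lstm": [0.941, 0.945, 0.939, 0.944, 0.942],
    "bs32_1cnn_fixed-filter_drop0.2_2lstm": [0.940, 0.938, 0.943, 0.939, 0.941],
}

# Average the validation macro F1-score over the five folds per configuration.
mean_val_f1 = {name: float(np.mean(scores)) for name, scores in cv_results.items()}

# The optimal configuration is the one with the highest averaged validation score;
# the test set is not consulted at this stage.
best_config = max(mean_val_f1, key=mean_val_f1.get)
print(best_config, round(mean_val_f1[best_config], 3))
```

Because only validation scores enter the comparison, the test set plays no role in choosing the configuration, which is exactly the point the corrected sentence is meant to clarify.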
The Academic Editor has also instructed us to round all other entries of the same metric in the text to three decimal places so that the reporting is consistent throughout the publication. Therefore, additional corrections are required in other sections, which we list below.
A correction has been made to Section 3.3:
Regardless of the number of CNN blocks and the CNN structure used, the macro F1-scores on the training (0.935–0.973), validation (0.948–0.960), and test (0.877–0.901) sets are found to be close to the optimum.
A correction has been made to Section 4.5:
For example, the HS variants achieved a macro F1-score of 0.582–0.729 on the training dataset but a weighted F1-score of 0.821–0.856. The rating distribution of these variants shows that rating “1” is underrepresented; accordingly, a misclassified example of rating “1” influences the macro F1-score significantly more than misclassified examples of the other classes. In contrast, the influence of a misclassified example on the weighted F1-score is independent of its rating. The macro F1-score of the HS left dataset is a good example: the exercise scores 0.582 on the training dataset, while there are hardly any examples of rating “1”, so each of these few examples has a huge impact on the score.
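To illustrate why the two averages diverge, the following sketch (with made-up labels, not data from the study) uses scikit-learn’s f1_score to compare macro and weighted averaging on a label distribution in which rating “1” is strongly underrepresented.

```python
from sklearn.metrics import f1_score

# Hypothetical ratings: rating "1" is strongly underrepresented (2 of 32 samples).
y_true = ["1"] * 2 + ["2"] * 15 + ["3"] * 15

# Predictions: one of the two rating-"1" samples is misclassified as "2",
# and one sample of each majority class is also misclassified.
y_pred = ["2", "1"] + ["2"] * 14 + ["3"] + ["3"] * 14 + ["2"]

macro = f1_score(y_true, y_pred, average="macro")        # every class counts equally
weighted = f1_score(y_true, y_pred, average="weighted")  # classes weighted by support

print(f"macro F1:    {macro:.3f}")    # noticeably lowered by the rare class
print(f"weighted F1: {weighted:.3f}") # barely affected by the rare class
```

Because macro averaging gives the two rating-“1” examples the same weight as the fifteen examples of each majority class, a single misclassified “1” pulls the macro score well below the weighted score.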
A correction has been made to Section 4.7:
The macro F1-score per exercise is improved by 0.040 (DS), 0.245 (IL), 0.205 (HS), and 0.054 (TSP).
As the majority of instances of this metric appear in tables, the third decimal place also has to be added to three tables:
A correction has been made to Table 2:
| CNN-Blocks | IMU-Specific (Train/Validation/Test) | Channel-Specific (Train/Validation/Test) | Baseline (Train/Validation/Test) |
|---|---|---|---|
| 1 | 0.945/0.951/0.891 | 0.960/0.960/0.900 | 0.936/0.956/0.894 |
| 2 | 0.960/0.958/0.900 | 0.959/0.948/0.896 | 0.973/0.959/0.881 |
| 3 | 0.954/0.956/0.896 | 0.935/0.949/0.877 | 0.952/0.953/0.901 |
A correction has been made to Table 3:
| Dataset | Training Set | Validation Set | Test Set |
|---|---|---|---|
| Hurdle Step | 0.686 ± 0.045 | 0.679 ± 0.049 | 0.645 ± 0.049 |
| Hurdle Step right | 0.729 ± 0.037 | 0.755 ± 0.062 | 0.687 ± 0.041 |
| Hurdle Step left | 0.582 ± 0.041 | 0.566 ± 0.019 | 0.546 ± 0.018 |
| Inline Lunge | 0.877 ± 0.037 | 0.862 ± 0.050 | 0.825 ± 0.023 |
| Inline Lunge right | 0.863 ± 0.044 | 0.815 ± 0.062 | 0.840 ± 0.037 |
| Inline Lunge left | 0.868 ± 0.012 | 0.846 ± 0.050 | 0.849 ± 0.046 |
| Trunk Stability Pushup | 0.953 ± 0.027 | 0.897 ± 0.043 | 0.914 ± 0.037 |
| Deep Squat | 0.941 ± 0.029 | 0.948 ± 0.014 | 0.900 ± 0.021 |
A correction has been made to Table 4:
| Dataset | Training Set | Validation Set | Test Set |
|---|---|---|---|
| Hurdle Step | 0.816 ± 0.019 | 0.821 ± 0.015 | 0.301 ± 0.284 |
| Hurdle Step right | 0.854 ± 0.040 | 0.792 ± 0.021 | 0.267 ± 0.258 |
| Hurdle Step left | 0.821 ± 0.019 | 0.868 ± 0.022 | 0.405 ± 0.409 |
| Inline Lunge | 0.912 ± 0.031 | 0.880 ± 0.026 | 0.331 ± 0.177 |
| Inline Lunge right | 0.859 ± 0.030 | 0.806 ± 0.017 | 0.442 ± 0.353 |
| Inline Lunge left | 0.884 ± 0.026 | 0.813 ± 0.036 | 0.498 ± 0.347 |
| Trunk Stability Pushup | 0.953 ± 0.022 | 0.954 ± 0.010 | 0.154 ± 0.318 |
| Deep Squat | 0.978 ± 0.010 | 0.953 ± 0.007 | 0.485 ± 0.427 |
The authors state that the scientific conclusions are unaffected. This correction was approved by the Academic Editor. The original publication has also been updated.
Reference
- Spilz, A.; Munz, M. Automatic Assessment of Functional Movement Screening Exercises with Deep Learning Architectures. Sensors 2023, 23, 5.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).