
Open Access Article

Peer-Review Record

Performance Progression and Stability of Female Swimmers Across Different Swimming Techniques from Childhood to Adulthood

Sports 2026, 14(4), 164; https://doi.org/10.3390/sports14040164

by Francisco A. Ferreira^1,2,*, Mário J. Costa^3,4 and Catarina C. Santos^3,5,6

Reviewer 1: Gavriil Arsoniadis

Reviewer 2: Anonymous

Reviewer 3: Marco Panasci

Reviewer 4: Anonymous


Submission received: 2 March 2026 / Revised: 8 April 2026 / Accepted: 17 April 2026 / Published: 21 April 2026

(This article belongs to the Special Issue Performance Analytics and Health Tracking: Toward Lifelong Athletic Sustainability)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I would like to thank the authors for their contribution to this field of research.

The manuscript addresses an important topic in sport science, especially in the areas of long-term athlete development and performance tracking in swimming. Using longitudinal data across multiple strokes and distances in female swimmers offers a valuable contribution. However, several methodological clarifications, analytical justifications, and improvements in the presentation and interpretation of results are necessary before the manuscript can be considered for publication.

Below you can find my comments for each section of the manuscript.

Title and Abstract

C1. The abstract does not clearly specify that the sample consists only of Portuguese Top-50 swimmers, which limits the generalizability of the findings. This should be clearly stated to prevent misleading readers about the population represented.

C2. The explanation of stability metrics (e.g., ICC values and normative stability) in the abstract is brief and may be difficult for non-specialist readers to interpret. Consider briefly explaining what distinguishes low, moderate, and high stability.

C3. The statement that short distances seem to show higher stability than longer distances should be phrased more cautiously, since the reported ICC differences are relatively small.

C4. Consider making the final sentence of the abstract simpler for better clarity.

C5. Change “females’ swimming performance” to “female swimmers’ performance” for stylistic consistency.

C6. Eliminate the repetition of the word “stability” in the last sentences.

Introduction

C1. Although the introduction mentions that previous studies focus mainly on male swimmers or freestyle events, the specific knowledge gap addressed by the present study should be clearly articulated.

C2. The manuscript should explain why Portuguese swimmers constitute an appropriate sample for investigating long-term development trends.

C3. Is the Portuguese swimming system representative of broader developmental pathways?

C4. The introduction presents mean stability and normative stability, but the conceptual differences between these constructs could be explained more clearly before the Methods section.

C5. Some references are quite dated (e.g., tracking literature from the early 1990s). Consider including more recent longitudinal athlete development studies.

C6. Minor stylistic issues appear throughout the section (e.g., sentence length and punctuation).

C7. Check for consistency in citation formatting.

Materials and Methods

C1. Including only Top-50 ranked swimmers may cause selection bias and should be discussed more thoroughly. It might exclude late developers or swimmers who achieve elite level later.

C2. The method for retrieving performances from the SwimRankings database needs more explanation. It is unclear whether: a) the best performance per season was used, or b) multiple performances per swimmer were included.

C3. The use of both ANOVA and mixed-effects models is appropriate, but the reasoning for combining these methods should be more clearly explained.

C4. The manuscript should also explain: a) model assumptions, b) whether random slopes were considered.

C5. The thresholds used for η² interpretation appear unusual (e.g., “no effect if η² > 0.04”). These thresholds should be justified or replaced with commonly accepted standards.

C6. Specify the exact number of swimmers per event, if available.

C7. Clarify whether swimmers participated in multiple events simultaneously.

C8. The description of the software used for graphical visualization (Python, VS Code) may not be necessary for readers unless the code is included.

Results

C1. The manuscript states that performance “declines yearly,” but this seems to refer to decreasing race times (i.e., improved performance). This wording should be clarified.

C2. The boxplots in Figure 1 (page 4) display performance distributions across ages, but the figure appears visually cluttered and is hard to interpret without more explicit labels.

C3. The heatmap in Figure 2 (pages 5–6) is informative but needs a clearer explanation in the text to help the reader understand the observed patterns.

C4. Table 1 could benefit from emphasizing the most relevant trends instead of presenting all numerical values equally.

C5. Some sentences describing results are quite lengthy and could be simplified.

C6. Consider including confidence intervals where appropriate.

Discussion

C1. Some conclusions about talent identification and long-term athlete development seem stronger than the data justify. The relatively low stability coefficients indicate that predictive ability remains limited.

C2. Although several studies are cited, the discussion could better differentiate the findings from previous longitudinal swimming studies, especially concerning stabilization age.

C3. Several explanations (e.g., menarche timing, technical learning advantages of backstroke) are speculative and should be framed more cautiously.

Limitations

C1. The limitations section is appropriate but should expand on: a) Selection bias caused by the Top-50 ranking criterion, b) Lack of biological maturation indicators, and c) Possible effects of training volume and coaching systems.

C2. Consider distinguishing between methodological limitations and limitations related to generalizability.

Conclusions

C1. The conclusions offer useful recommendations for coaches, but some claims about talent identification should be tempered, considering the relatively low stability indicators.

C2. Highlight that the findings are based on Portuguese competitive swimmers, which may limit their external validity.

C3. Consider presenting two or three clear key takeaways.

Author Response

Reviewer 1:

R: We sincerely thank the reviewer for the careful evaluation of our manuscript and for the constructive and insightful comments. We appreciate the time and effort dedicated to improving the quality and clarity of our work. Below, we provide a detailed, point-by-point response outlining how each suggestion has been incorporated into the revised manuscript, if applicable. We hope that this new version will meet your requirements for further publication.

Title and Abstract

The abstract does not clearly specify that the sample consists only of Portuguese Top-50 swimmers, which limits the generalizability of the findings. This should be clearly stated to prevent misleading readers about the population represented.

R: Thank you for your remark. The following information has now been included: “Data from female Portuguese Top-50 rankings in the short-course pool was extracted from an open-access database (swimrankings.net)” (please see lines 18-19).

The explanation of stability metrics (e.g., ICC values and normative stability) in the abstract is brief and may be difficult for non-specialist readers to interpret. Consider briefly explaining what distinguishes low, moderate, and high stability.

R: Thanks. New information has been added (please see lines 25-26). However, please note that the abstract has a limited word count, so the detailed explanation of the statistical procedures is provided in the subsection entitled “2.3. Statistical analysis” (please see lines 116-146).

The statement that short distances seem to be higher stability in comparison to longer distances should be more cautiously phrased since the ICC differences reported are relatively small.

R: You are right. Minor changes were made to improve the clarity of the results and of the overall conclusions (please see lines 33-34).

Consider making the final sentence of the abstract simpler for better clarity.

R: Please see our response to your remark #3. The sentence was changed (please see lines 33-34).

Change “females’ swimming performance” to “female swimmers’ performance” for stylistic consistency.

R: Thank you. Changed accordingly.

Eliminate the repetition of the word stability in the last sentences.

R: Done as suggested.

Introduction

Although the introduction mentions that previous studies focus mainly on male swimmers or freestyle events, the specific knowledge gap addressed by the present study should be clearly articulated.

R: Lines 60-64 were revised according to the reviewer’s suggestions. The following sentences were added: “While these statistical analyses provide valuable insights, retrospective longitudinal research simultaneously considering the interaction between swimming techniques and distances across the maturational development remains scarce. This gap in the literature highlights the need to better characterize the trajectories of female swimmers across different competitive events.”

The manuscript should explain why Portuguese swimmers constitute an appropriate sample for investigating long-term development trends.

R: We thank the reviewer for this remark. Although we understand the concern, we believe that, from a methodological point of view, a detailed justification of the swimmers' nationality is not necessary in the manuscript. Developmental pathways in sport constitute a multifactorial phenomenon shaped by interacting factors such as race and region of origin, as evidenced in several theoretical models (see doi:10.1017/S0954579423001281; doi:10.13189/saj.2025.130408). While socio-cultural and economic differences influence physical activity patterns (see doi:10.1016/j.healthplace.2017.05.013), the findings presented here align with established developmental frameworks for both male and female swimmers in other international contexts (see doi:10.3389/fspor.2020.589938; doi:10.1111/sms.13599). Furthermore, this study provides an up-to-date analysis by incorporating performance data through 2025. Many experimental studies also use samples from a single nationality, and this information is not always made explicit to the reader. Nevertheless, in the interest of transparency and clarity, and acknowledging that the results may not be fully generalizable to other samples (e.g., swimmers from different countries), we have included the country associated with the ranking used. Moreover, as the competitive swimming system may differ between countries, this aspect has been highlighted in the take-home message.

Is the Portuguese swimming system representative of broader developmental pathways?

R: Please see our response to your remark #8. We acknowledge that some countries do not use the same competitive system. However, the Portuguese swimming pathway reflects broader global developmental trends, particularly in performance progression. This alignment is most prominent in the female swimmers' system, which mirrors international trajectories and represents a significant achievement for the national structure. In contrast, the pathways for males exhibit distinct nuances, suggesting that while the Portuguese system is representative of general trends, it remains uniquely shaped by sex-specific and country-specific characteristics.

The introduction introduces mean stability and normative stability, but the conceptual differences between these constructs could be explained more clearly before the methods section.

R: We understand your concern. However, we opted to present these differences in the Methods section to avoid excessive detail in the Introduction and to prevent potential misinterpretation of the concepts. Please see the new information added in the Statistical Analysis subsection (lines 119 and 126-127).

Some references are quite dated (e.g., tracking literature from the early 1990s). Consider including more recent longitudinal athlete development studies.

R: We thank the reviewer for the suggestion. However, we consider the current references to be adequate for our rationale, as they describe the primary methodological approaches upon which our study is built. These earlier works remain highly relevant to the specific context of swimming research and provide the necessary technical background for our analysis.

Minor stylistic issues appear throughout the section (e.g., sentence length and punctuation).

R: Thank you. Revised.

Check for consistency in citation formatting.

R: Thank you. Revised.

Materials and Methods

Including only Top-50 ranked swimmers may cause selection bias and should be discussed more thoroughly. It might exclude late developers or swimmers who achieve elite level later.

R: We acknowledge this concern, but we do not fully understand the point being raised. This study used a retrospective design, specifically selecting swimmers who reached the Top-50 national ranking to analyze the pathways of successful performers. While this focus may exclude some late developers, it allows for a clear analysis of the developmental trajectories leading to elite-level status in the Portuguese competitive system.

The method for retrieving performances from the SwimRankings database needs more explanation. It is unclear whether: a) the best performance per season was used, or b) multiple performances per swimmer were included.

R: Thank you. The data corresponds to the best performance achieved in each competitive season, meaning that only the best performance for each age was considered. We hope that this clarification makes the approach clearer to the reader. Please see the new information in lines 104-105.
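As an illustration, the “best performance per season” rule described in this response can be sketched in Python. This is a minimal sketch, not the authors' extraction procedure: it assumes each result is a hypothetical `(swimmer_id, age, time_s)` record, treats one age as one competitive season, and keeps the lowest (fastest) time.

```python
from collections import defaultdict


def best_per_season(results):
    """Keep only the fastest time per swimmer and age (one age = one season).

    `results` is an iterable of (swimmer_id, age, time_s) tuples. This record
    layout is a hypothetical illustration, not the SwimRankings export format.
    """
    best = defaultdict(dict)  # swimmer_id -> {age: lowest time in seconds}
    for swimmer, age, time_s in results:
        current = best[swimmer].get(age)
        # Lower race time = better performance, so keep the minimum.
        if current is None or time_s < current:
            best[swimmer][age] = time_s
    # Return plain dicts with ages in ascending order.
    return {s: dict(sorted(ages.items())) for s, ages in best.items()}


results = [
    ("A", 12, 70.4), ("A", 12, 69.8), ("A", 13, 67.1),
    ("B", 12, 72.0), ("B", 13, 70.9), ("B", 13, 71.3),
]
print(best_per_season(results))
# {'A': {12: 69.8, 13: 67.1}, 'B': {12: 72.0, 13: 70.9}}
```

Because each swimmer contributes one value per age, every later per-age analysis works on a single observation per swimmer and season.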

The use of both ANOVA and mixed-effects models is appropriate, but the reasoning for combining these methods should be more clearly explained.

R: In the statistical analysis subsection, a sentence was rephrased to “A linear mixed-effect model (LMM) with fixed intercepts and restricted maximum likelihood estimation was applied to compare the stability of performance across events.”

The manuscript should also explain: a) model assumptions, b) whether random slopes were considered.

R: To clarify the methodological approach on LMM, the following sentence was included in the Statistical Analysis subsection “Model assumptions, including linearity, normality of residuals, and homoscedasticity, were verified through visual inspection of residual plots and Q–Q plots, which confirmed that the data met the requirements for LMM.”

The thresholds used for η² interpretation appear unusual (e.g., “no effect if η² > 0.04”). These thresholds should be justified or replaced with commonly accepted standards.

R: We acknowledge the reviewer's remark. The information was revised and the interpretation for no effect was corrected to “if η² ≤ 0.04”.
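For reference, the corrected interpretation rule can be expressed in code. The sketch below computes classical η² for a one-way layout (SS_between / SS_total) and applies only the “no effect if η² ≤ 0.04” cutoff stated in this response. Grouping the times by age is an assumption, and a repeated-measures ANOVA may instead report partial η², which is computed differently.

```python
def eta_squared(groups):
    """Classical eta-squared for a one-way layout: SS_between / SS_total.

    `groups` is a list of lists, e.g. race times grouped by age (hypothetical).
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


def is_no_effect(eta2, cutoff=0.04):
    """Corrected rule from the response: 'no effect' if eta2 <= 0.04."""
    return eta2 <= cutoff


print(eta_squared([[1, 2, 3], [4, 5, 6]]))  # 13.5 / 17.5
print(is_no_effect(0.04), is_no_effect(0.05))  # True False
```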

Specify the exact number of swimmers per event, if available.

R: The exact number of swimmers per event is described in the boxplots (please see Figure 1).

Clarify whether swimmers participated in multiple events simultaneously.

R: Yes, swimmers may participate in more than one event if they belong to the Top-50 of different events; however, this does not imply that a swimmer’s best performances were achieved in the same competition. For example, considering the USA Top-50 rankings, Katie Ledecky would appear in at least three freestyle events (400, 800 and 1500 m), but this does not mean that her best performance for a specific age was achieved at the same competition. Nevertheless, looking specifically at her performance at 18 years old, Ledecky achieved her best performance in all of these events at the 2015 World Championships in Kazan. This situation is very common in competitive swimming, and we therefore opted not to include redundant information.

The description of the software used for graphical visualization (Python, VS Code) may not be necessary for readers unless the code is included.

R: We appreciate the reviewer’s suggestion. However, we believe that specifying the use of Python is relevant as it indicates that the visualizations were generated programmatically rather than through manual editing software. This ensures that the graphical outputs are a direct, reproducible reflection of the underlying statistical analysis.

Results

The manuscript states that performance “declines yearly," but this seems to refer to decreasing race times (i.e., improved performance). This wording should be clarified.

R: Rephrased to “Performance seems to improve yearly across all events through the time decline (s).” and clarified in the discussion (please see lines 153-154).
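The sign convention behind this rephrasing can be made explicit: because performance is recorded as race time in seconds, a decline in time is an improvement. A minimal sketch, assuming a hypothetical mapping from age to best time:

```python
def yearly_improvement(times_by_age):
    """Percentage change in race time between consecutive ages.

    Positive values mean the time dropped, i.e. performance improved.
    `times_by_age` maps age -> best time in seconds (hypothetical input).
    """
    ages = sorted(times_by_age)
    return {
        (a, b): 100 * (times_by_age[a] - times_by_age[b]) / times_by_age[a]
        for a, b in zip(ages, ages[1:])
    }


changes = yearly_improvement({12: 70.0, 13: 66.5, 14: 66.9})
for (a, b), pct in changes.items():
    label = "improved" if pct > 0 else "no improvement"
    print(f"{a}->{b}: {pct:+.1f}% ({label})")
# 12->13: +5.0% (improved)
# 13->14: -0.6% (no improvement)
```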

The boxplots in Figure 1 (page 4) display performance distributions across ages, but the figure appears visually cluttered and is hard to interpret without more explicit labels.

R: To address this issue, an additional table was created as a Supplementary File, which also includes the 95% confidence intervals (CIs), as suggested.

The heatmap in Figure 2 (pages 5–6) is informative but needs a clearer explanation in the text to help the reader understand the observed patterns.

R: The initial sentence was rephrased to: “Figure 2 presents the normative stability for each event, where lighter or darker colors translate into reduced or increased correlation across ages (respectively).”

Table 1 could benefit from emphasizing the most relevant trends instead of presenting all numerical values equally.

R: We appreciate the suggestion. As a solution, the stability values were highlighted in Table 1 (please see lines 167-169).

Some sentences describing results are quite lengthy and could be simplified.

R: Thank you. The length of the sentences was revised.

Consider including confidence intervals where appropriate.

R: We appreciate the suggestion. The 95% CIs were retrieved and are reported in Table S1 (as described in our response to remark #23).

Discussion

Some conclusions about talent identification and long-term athlete development seem stronger than the data justify. The relatively low stability coefficients indicate that predictive ability remains limited.

R: The last sentence of the first paragraph of discussion was carefully revised (please see lines 207-209), and we hope it is now clearer and better reflects the results obtained.

Although several studies are cited, the discussion could better differentiate the findings from previous longitudinal swimming studies, especially concerning stabilization age.

R: Although this aspect has been reinforced, several studies on stabilization age were already considered in the previous version. If the reviewer believes that the Discussion could be improved further, we would welcome suggestions to enhance it (e.g., specific studies to cite, rationale or other relevant aspects), as the remark was not entirely clear to the authors.

Several explanations (e.g., menarche timing, technical learning advantages of backstroke) are speculative and should be framed more cautiously.

R: We understand the reviewer's point of view and appreciate the alert. However, from our perspective, these factors could change the trajectory of the developmental pathways. Taking menarche timing as an example, female swimmers may differ from one another in the age at which their performance stabilizes, which may lead to variations in normative stability, since rank order may not be maintained over the years. Moreover, backstroke and freestyle are recognized as the most economical techniques among international swimmers [34], which may contribute to more consistent performance levels. From a technical development perspective, as backstroke is often one of the first techniques mastered in competitive swimming, athletes have a longer period to refine technical skills, potentially leading to earlier inter-group stability [7]. These distinctions help explain the slightly greater stability observed in short-distance and backstroke performances in the present cohort. Nevertheless, we would welcome other opinions and explanations.

Limitations

The limitations section is appropriate but should expand on: a) Selection bias caused by the Top-50 ranking criterion, b) Lack of biological maturation indicators, and c) Possible effects of training volume and coaching systems.

R: Thanks. The limitations paragraph was revised, and some limitations were extended or clarified (please see lines 313-320).

Consider distinguishing between methodological limitations and limitations related to generalizability.

R: Please see our response to your remark #31.

Conclusions

The conclusions offer useful recommendations for coaches, but some claims about talent identification should be tempered, considering the relatively low stability indicators.

R: Conclusion was revised and the recommendations were considered. Also, a new conclusion sentence was added (see lines 332-333).

Highlight that the findings are based on Portuguese competitive swimmers, which may limit their external validity.

R: We understand and appreciate the suggestion. The conclusion was revised.

Consider presenting two or three clear key takeaways.

R: The last part was reformulated in a way to present take-home messages (please see lines 333-344).

Reviewer 2 Report

Comments and Suggestions for Authors

The topic is relevant for long-term athlete development and talent identification in swimming. The dataset is valuable and the study design is interesting. However, several issues should be addressed before the manuscript can be considered for publication.

The study is described as a retrospective observational design, but the sampling procedure requires clearer explanation.

Whether swimmers could contribute data to multiple events.
Whether rankings were event-specific or swimmer-specific.
How missing seasons were handled.

The inclusion criteria require swimmers to:

be ranked Top-50 in the 2024–2025 season
have results in at least seven seasons from age 10 to 18

This procedure may introduce survivorship bias, because:

only swimmers who remained active and competitive until age 18 were included
swimmers who dropped out earlier were excluded

As a result, the developmental trajectories may be overly optimistic compared with the general swimmer population.

Discuss this potential bias explicitly in the limitations section.
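The two inclusion criteria listed above can be written as a simple filter, which also makes the survivorship concern concrete: a swimmer who was not ranked in 2024–2025, or who has too few seasons, never enters the sample. This is a sketch under assumptions; the `rank_2024_25` argument and the age-to-time mapping are hypothetical.

```python
AGES = range(10, 19)  # ages 10 to 18 inclusive, i.e. nine seasons


def is_eligible(rank_2024_25, times_by_age, top_n=50, min_seasons=7):
    """Apply the inclusion criteria as described in the review.

    rank_2024_25: national ranking position in 2024-2025 (None if unranked).
    times_by_age: mapping age -> best time in seconds; missing ages are absent.
    """
    ranked = rank_2024_25 is not None and rank_2024_25 <= top_n
    seasons = sum(1 for age in AGES if times_by_age.get(age) is not None)
    return ranked and seasons >= min_seasons


full = {age: 60.0 - age for age in range(10, 17)}   # ages 10-16: 7 seasons
short = {age: 60.0 - age for age in range(12, 18)}  # ages 12-17: 6 seasons
print(is_eligible(12, full), is_eligible(12, short), is_eligible(None, full))
# True False False
```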

In the Results section, the manuscript states:

“Performance (time, s) seems to decline yearly across all events.”

However, declining time means performance improvement in swimming.

This wording is confusing and could be misinterpreted.

The mixed-effects model shows:

ICC = 0.05 in the general model
ICC = 0.11–0.15 in separate models

These values indicate low stability between swimmers, yet the discussion interprets them as meaningful stability.

This interpretation should be more cautious, as ICC values below ~0.40 typically indicate weak reliability or stability.
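To make the ICC benchmark concrete: in a random-intercept model, the ICC is the share of the random variance that lies between swimmers, σ²_between / (σ²_between + σ²_residual). The variance components below are hypothetical values chosen to reproduce ICC = 0.05, and the 0.40 cutoff is the reviewer's rule of thumb rather than a universal standard.

```python
def icc_random_intercept(var_between, var_residual):
    """ICC for a random-intercept model: between-swimmer share of random variance."""
    return var_between / (var_between + var_residual)


icc = icc_random_intercept(var_between=0.5, var_residual=9.5)  # hypothetical values
print(f"ICC = {icc:.2f}")  # ICC = 0.05
# Rule of thumb cited by the reviewer: ICC below ~0.40 suggests weak stability.
print("below 0.40" if icc < 0.40 else "at or above 0.40")  # below 0.40
```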

The reported conditional R² values are extremely high:

R²c = 0.98 in the general model
R²c up to 0.99 in some technique models

Such values are unusual in performance development research and may reflect:

model overfitting
strong dependence on age trends
model specification issues

Provide clarification on:

how R² values were calculated
whether marginal and conditional R² were both assessed
which variance components contributed to these values.

The manuscript introduces mean stability and normative stability, but the conceptual definitions are not sufficiently clear.

For example:

“Mean stability examines whether individuals maintain their relative position in a distribution”
“Normative stability investigates the consistency of inter-individual differences”

These definitions are somewhat confusing and overlap conceptually.

Provide clearer operational definitions and explain how each metric contributes to the research objectives.

In the Methods section:

Normative stability was evaluated through the Pearson correlation coefficient.

However:

correlations were calculated across multiple repeated observations
this structure may violate independence assumptions.

It is unclear whether autocorrelation or repeated-measures correlations were considered.

Clarify the correlation procedure and justify the use of Pearson correlations in this longitudinal context.

The manuscript states:

“The Kolmogorov–Smirnov test proved normality of data for all ages.”

However, this test has low power with moderate samples and is rarely recommended as the sole indicator of normality.

Additionally, normality is less relevant for mixed-effects models.

Provide additional justification or consider removing this statement.

Some visual elements require clarification.

Figure 1

The figure description indicates Bonferroni comparisons with colored lines, but this may be difficult to interpret without clearer labeling.

Figure 2

The heatmap includes significance markers but the caption lists:

“*p < 0.05, *p < 0.01”

The second should likely be “**p < 0.01”.
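The corrected caption convention (*p < 0.05, **p < 0.01) and the Bonferroni adjustment mentioned for Figure 1 can be sketched side by side. The p-values are invented, and whether stars are placed on raw or Bonferroni-adjusted p-values is an assumption, not something stated in the manuscript; the example only shows how the two choices can differ.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjusted p-values: min(1, p * m).

    m is the number of comparisons in the family; it defaults to len(p_values),
    but e.g. all pairwise comparisons among nine ages would give m = 36.
    """
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]


def stars(p):
    """Caption convention: '**' for p < 0.01, '*' for p < 0.05, '' otherwise."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


raw = [0.001, 0.004, 0.03]  # hypothetical p-values
adjusted = bonferroni(raw)  # m = 3 comparisons in this toy example
print([stars(p) for p in raw])       # ['**', '**', '*']
print([stars(p) for p in adjusted])  # ['**', '*', '']
```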

The manuscript suggests that performance before age 15 cannot reliably predict adult success.

While this may be plausible, the present dataset:

only includes swimmers who remained competitive until age 18
does not track adult elite success

Therefore, the predictive conclusions should be framed more cautiously.

Author Response

Reviewer 2:

The study is described as a retrospective observational design, but the sampling procedure requires clearer explanation.

Whether swimmers could contribute data to multiple events.
Whether rankings were event-specific or swimmer-specific.
How missing seasons were handled.

R: We appreciate the reviewer’s comments. Minor changes were made in the participants’ subsection (lines 97-98). Regarding the first point: yes, swimmers may participate in more than one event if they belong to the Top-50 of different events; however, this does not imply that a swimmer’s best performances were achieved in the same competition. For example, considering the USA Top-50 rankings, Katie Ledecky would appear in at least three freestyle events (400, 800 and 1500 m), but this does not mean that her best performance for a specific age was achieved at the same competition. Nevertheless, looking specifically at her performance at 18 years old, Ledecky achieved her best performance in all of these events at the 2015 World Championships in Kazan. This situation is very common in competitive swimming, and we therefore opted not to include redundant information. Regarding the last point, we also highlight that, when two out of the nine performances were missing (corresponding to the nine years searched), the swimmers were removed from the analysis.

The inclusion criteria require swimmers to:

be ranked Top-50 in the 2024–2025 season
have results in at least seven seasons from age 10 to 18

This procedure may introduce survivorship bias, because:

only swimmers who remained active and competitive until age 18 were included
swimmers who dropped out earlier were excluded

As a result, the developmental trajectories may be overly optimistic compared with the general swimmer population.

Discuss this potential bias explicitly in the limitations section.

R: We appreciate the reviewer highlighting this potential bias, which is now addressed in the limitations section (lines 313-320).

In the Results section, the manuscript states:

“Performance (time, s) seems to decline yearly across all events.”
However, declining time means performance improvement in swimming.
This wording is confusing and could be misinterpreted.

R: Based on your suggestions and those of the other reviewers, we have clarified this issue throughout the text. We hope it is clearer now.

The mixed-effects model shows:

ICC = 0.05 in the general model
ICC = 0.11–0.15 in separate models

These values indicate low stability between swimmers, yet the discussion interprets them as meaningful stability.

This interpretation should be more cautious, as ICC values below ~0.40 typically indicate weak reliability or stability.

R: Thank you. The data were interpreted as suggested, and a tentative comparison across events was highlighted instead of a direct interpretation of the ICC values. Please let us know if it is understandable now.

The reported conditional R² values are extremely high:

R²c = 0.98 in the general model
R²c up to 0.99 in some technique models

Such values are unusual in performance development research and may reflect:

model overfitting
strong dependence on age trends
model specification issues

Provide clarification on:

how R² values were calculated
whether marginal and conditional R² were both assessed
which variance components contributed to these values.

R: We acknowledge the reviewer’s concern regarding the high R² values. These were retrieved from the Linear Mixed Model analysis in SPSS using the Nakagawa and Schielzeth variance decomposition framework. Both marginal and conditional R² were assessed; the high conditional values (R²c = 0.86–0.99) are primarily driven by the strong developmental effect of age and the substantial between-swimmer variance captured by the random intercept. In age-group swimmers, stable inter-individual differences and age-related growth explain the vast majority of performance variance, which the model accurately reflects. To prevent overfitting and ensure proper specification, we employed Restricted Maximum Likelihood (REML) estimation. We have made minor changes in the manuscript (lines 285-290) to clarify these high values.
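The explanation above can be checked against the Nakagawa and Schielzeth definitions: R²m = σ²_f / (σ²_f + σ²_u + σ²_e) and R²c = (σ²_f + σ²_u) / (σ²_f + σ²_u + σ²_e), where σ²_f is the variance of the fixed-effect predictions, σ²_u the random-intercept (between-swimmer) variance and σ²_e the residual variance. The components below are hypothetical, chosen to show that a conditional R² of about 0.98 can coexist with an ICC of 0.05 when the fixed age effect carries almost all of the variance. Whether the reported ICC excludes the fixed-effect variance, as `adjusted_icc` does here, is an assumption.

```python
def nakagawa_r2(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 for a random-intercept LMM (Nakagawa & Schielzeth)."""
    total = var_fixed + var_random + var_residual
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_random) / total
    return r2_marginal, r2_conditional


def adjusted_icc(var_random, var_residual):
    """ICC that ignores fixed-effect variance: random share of the non-fixed variance."""
    return var_random / (var_random + var_residual)


# Hypothetical variance components (s^2): a large age effect, small swimmer effect.
var_f, var_u, var_e = 93.0, 0.1, 1.9
r2m, r2c = nakagawa_r2(var_f, var_u, var_e)
print(f"R2m = {r2m:.3f}, R2c = {r2c:.3f}, ICC = {adjusted_icc(var_u, var_e):.2f}")
# R2m = 0.979, R2c = 0.980, ICC = 0.05
```

In this toy decomposition the gap between R²c and R²m is what the random intercept adds; here it is small, so the high R²c comes almost entirely from the fixed effect.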

The manuscript introduces mean stability and normative stability, but the conceptual definitions are not sufficiently clear.

For example:

“Mean stability examines whether individuals maintain their relative position in a distribution”
“Normative stability investigates the consistency of inter-individual differences”

These definitions are somewhat confusing and overlap conceptually.

Provide clearer operational definitions and explain how each metric contributes to the research objectives.

R: In the statistical analysis subsection, the following sentences were complemented: “Mean stability (i.e., the consistency of group-level performance over time) was assessed using the quartiles distribution and mean ± standard deviation (SD) values … Normative stability (i.e., the consistency of an individual’s rank-order position relative to their peers) was evaluated through the Pearson correlation coefficient (r)”. By using both metrics, we can distinguish between the overall system progression (mean stability) and the predictability of individual success (normative stability) in female swimmers, which was done in the results subsection (Figure 1 for mean stability and Figure 2 for normative stability).
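The two operational definitions above can be illustrated with a short sketch using only the Python standard library: mean stability as the per-age mean, SD and quartiles of the group, and normative stability as the Pearson correlation between swimmers' times at each pair of ages (the kind of values shown in a heatmap such as Figure 2). The swimmer data are invented, and using only swimmers with results at both ages (pairwise deletion) is an assumption about how missing seasons are treated.

```python
import statistics
from itertools import combinations


def mean_stability(swimmers, ages):
    """Group-level summary per age: mean, SD and quartiles of race times."""
    summary = {}
    for age in ages:
        times = [s[age] for s in swimmers if age in s]
        summary[age] = {
            "mean": statistics.fmean(times),
            "sd": statistics.stdev(times),
            "quartiles": tuple(statistics.quantiles(times, n=4)),
        }
    return summary


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def normative_stability(swimmers, ages):
    """Pearson r for every pair of ages, using swimmers with times at both ages."""
    result = {}
    for a, b in combinations(ages, 2):
        pairs = [(s[a], s[b]) for s in swimmers if a in s and b in s]
        x, y = zip(*pairs)
        result[(a, b)] = pearson(x, y)
    return result


# Hypothetical best times (s) per swimmer, keyed by age.
swimmers = [
    {12: 70.0, 13: 67.0, 14: 65.0},
    {12: 72.0, 13: 68.5, 14: 66.0},
    {12: 74.0, 13: 70.0, 14: 65.5},
    {12: 76.0, 13: 71.0, 14: 67.5},
]
ages = [12, 13, 14]
print(mean_stability(swimmers, ages)[12]["quartiles"])  # (70.5, 73.0, 75.5)
for pair, r in normative_stability(swimmers, ages).items():
    print(pair, round(r, 2))
```

In this toy data the age-12 times correlate more strongly with age 13 than with age 14, which is the kind of pattern a correlation heatmap across ages makes visible.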

In the Methods section:

Normative stability was evaluated through the Pearson correlation coefficient.

However:

correlations were calculated across multiple repeated observations
this structure may violate independence assumptions.

It is unclear whether autocorrelation or repeated-measures correlations were considered.

Clarify the correlation procedure and justify the use of Pearson correlations in this longitudinal context.

R: Relating paired performances across ages in this way is a common methodological approach in longitudinal data analysis (see doi:10.1080/02640414.2011.587196; doi:10.1111/cdep.12221; https://pmc.ncbi.nlm.nih.gov/articles/PMC3761712/). This choice is now also justified in the statistical analysis subsection, in line with a previous comment of yours.

The manuscript states:

“The Kolmogorov–Smirnov test proved normality of data for all ages.”

However, this test has low power with moderate samples and is rarely recommended as the sole indicator of normality.

Additionally, normality is less relevant for mixed-effects models.

Provide additional justification or consider removing this statement.

R: We understand your point. However, the Kolmogorov–Smirnov test was used to check the normality assumption required by the remaining statistical analyses (i.e., ANOVA and correlations). If we removed the sentence, the editors or the other reviewers might ask us to include it again. We hope you understand.

Some visual elements require clarification.

Figure 1

The figure description indicates Bonferroni comparisons with colored lines, but this may be difficult to interpret without clearer labeling.

Figure 2

The heatmap includes significance markers but the caption lists:

“*p < 0.05, *p < 0.01”

The second should likely be “**p < 0.01”.

R: We attempted to provide higher-quality images. Also, the second marker was corrected to “**p < 0.01”. Thanks for highlighting this.

The manuscript suggests that performance before age 15 cannot reliably predict adult success.

While this may be plausible, the present dataset:

only includes swimmers who remained competitive until age 18
does not track adult elite success

Therefore, the predictive conclusions should be framed more cautiously.

R: We appreciate this caution. The Conclusion section was revised and now includes more cautious interpretations.

Reviewer 3 Report

Comments and Suggestions for Authors

Abstract

Please mention the use of linear mixed-effects models and ICCs briefly in the methods section.
Please clarify the conclusion regarding backstroke and short-distance stability.

Introduction

Please include a paragraph on the female swimming performance model to better contextualize the study.
Add one or more scientific references supporting the study hypothesis.

Materials and Methods

Participants

Could you please clarify why only swimmers aged 18 or over in 2024-25 were included?
Confirm that the performances were from official short-course competitions.
Please indicate the date on which the SwimRankings database was accessed.
Justify briefly the use of World Aquatics Points.
Please state that the data were publicly available and analysed anonymously.

Study design

Please clarify the inclusion/exclusion criteria used to select swimmers from the Portuguese Top-50 rankings.
Furthermore, provide a breakdown of the number of individual swimmers represented in the 3,087 performances.

Statistical analysis

Consider briefly justifying the choice of repeated-measures ANOVA over linear mixed-effects models.
Clarify the interpretation of effect sizes and correlation thresholds to improve reader understanding.

Results

Please improve the quality of the graphs for a better understanding of the results.
Please update Table 2 to include the correct p-values for all parameters.

Discussion

Focus the discussion more explicitly on the study’s results, ensuring all interpretations are directly supported by the data.
Clarify why backstroke and short-distance events showed higher stability, relating these findings to existing literature.
Expand the comparison of your results with previous studies, highlighting similarities and differences in performance progression and stability.
Consider emphasizing limitations related to sample selection and developmental variability more concisely.
Add a paragraph discussing the practical applications of the results, specifically highlighting how coaches can use these findings to guide training and athlete development.

Author Response

Reviewer 3:

Abstract

Please mention the use of linear mixed-effects models and ICCs briefly in the methods section.

R: Thank you. Included as suggested.

Please clarify the conclusion regarding backstroke and short-distance stability.

R: Based on your suggestions and those of the other reviewers, changes were made (please see lines 32-34).

Introduction

Please include a paragraph on the female swimming performance model to better contextualize the study.

R: We thank the reviewer for the suggestion. An additional paragraph was included (please see lines 60-64).

Add one or more scientific references supporting the study hypothesis.

R: The scarcity of literature on this specific topic limits in-depth support; however, new references were included (e.g., doi:10.3390/sports4010016).

Materials and Methods

Participants

Could you please clarify why only swimmers aged 18 or over in 2024-25 were included?

R: We thank the reviewer for this observation. The inclusion criteria were restricted to swimmers aged 18 or older in the 2024-25 season to ensure that the dataset consisted entirely of retrospectively observed data. Including younger swimmers (e.g., 16-year-olds) would have resulted in incomplete data profiles for the study period, necessitating the use of predictive modeling or statistical imputation to fill in missing values. To maintain the highest level of data integrity and avoid the uncertainty associated with such projections, we focused only on those with fully realized data points for the specified timeframe.

Confirm that the performances were from official short-course competitions.

R: Yes. Both the Top-50 rankings consulted and the performances across ages were retrieved from official short-course competitions.

Please indicate the date on which the SwimRankings database was accessed.

R: The information is presented in reference 22, and the database was accessed on 17 July 2025.

Justify briefly the use of World Aquatics Points.

R: It is the most common metric of swimming performance and has recently been used to categorize swimmers into competitive tiers.
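For readers unfamiliar with the metric, World Aquatics Points scale a swim time T against a base time B for the same event, course, and sex using P = 1000 × (B / T)³, which places performances in different techniques and distances on a common scale. The sketch below simply applies this formula; the base time used is hypothetical, and any rounding rule applied to officially published points is not reproduced here.

```python
def world_aquatics_points(swim_time_s: float, base_time_s: float) -> float:
    """Return P = 1000 * (B / T) ** 3 for swim time T and base time B, in seconds."""
    if swim_time_s <= 0 or base_time_s <= 0:
        raise ValueError("times must be positive")
    return 1000.0 * (base_time_s / swim_time_s) ** 3


# Hypothetical base time of 52.0 s; a 65.0 s swim is 25% slower than B.
print(round(world_aquatics_points(65.0, 52.0)))  # prints 512
```

Because the relation is cubic, each 1% increase in swim time costs roughly 3% of the points, so the scale is more sensitive to time differences than a simple ratio would be.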

Please state that the data were publicly available and analysed anonymously.

R: The information was reinforced and included in the Study Design subsection (lines 97-98; 102-106).

Study design

Please, clarify the inclusion/exclusion criteria used to select swimmers from the Portuguese Top-50 rankings.

R: The inclusion/exclusion criteria were improved and clarified. Please see lines 95-98.

Furthermore, provide a breakdown of the number of individual swimmers represented in the 3,087 performances.

R: A breakdown was already provided in the boxplots (n is reported in each label) and is now expanded in Table S1.

Statistical analysis

Consider briefly justifying the choice of repeated-measures ANOVA over linear mixed-effects models.

R: Both analyses are relevant to the different aims addressed in our manuscript. While repeated-measures ANOVA was used for the mean stability analysis (based on the repeated performances in a given event), the LMM was used to compare the different events. This idea was expanded in the Statistical Analysis subsection: “A linear mixed-effect model (LMM) with fixed intercepts and restricted maximum likelihood estimation was applied to compare the stability of performance to compare events.”

Clarify the interpretation of effect sizes and correlation thresholds to improve reader understanding.

R: The initial sentence regarding the correlation was rephrased “Figure 2 presents the normative stability for each event, where lighter or darker colors translate into reduced or increased correlation across ages (respectively).” to allow a clearer interpretation of the results (please see lines 171-172).

Results

Please improve the quality of the graphs for a better understanding of the results.

R: Thank you. We attempted to improve the quality of the images. Moreover, a new Table S1 provides a clearer analysis of the boxplot values.

Please update Table 2 to include the correct p-values for all parameters.

R: We apologize, but we were unable to understand this remark. Could you please explain what you mean by “correct p-values”?

Discussion

Focus the discussion more explicitly on the study’s results, ensuring all interpretations are directly supported by the data.

R: Your comment was considered, and the Discussion was revised in several places to provide a deeper explanation. We hope it now meets your expectations.

Clarify why backstroke and short-distance events showed higher stability, relating these findings to existing literature.

R: We appreciate the reviewer’s suggestion to further clarify the stability observed in backstroke and short-distance events. We agree that this is a key finding and have ensured this is explicitly addressed in the Discussion section by synthesizing the biomechanical, physiological, and pedagogical factors that contribute to this stability. Specifically, we highlight that the biomechanical and physiological demands of short-distance events may result in lower performance fluctuations within a season, aligning with previous evidence [1, 7, 11]. Furthermore, backstroke and freestyle are recognized as the most economical techniques among international swimmers [34], which may contribute to more consistent performance levels. From a technical development perspective, as backstroke is often one of the first techniques mastered in competitive swimming, athletes have a longer period to refine technical skills, potentially leading to earlier inter-group stability [7]. These distinctions help explain the slightly greater stability observed in short-distance and backstroke performances found in the present cohort.

Expand the comparison of your results with previous studies, highlighting similarities and differences in performance progression and stability.

R: New sentences were included regarding this question (please see lines 295-300).

Consider emphasizing limitations related to sample selection and developmental variability more concisely.

R: Thank you. Based on your comments and those of the other reviewers, several limitations were described in more detail, and new ones were added. We hope that this section is now improved (please see lines 313-320).

Add a paragraph discussing the practical applications of the results, specifically highlighting how coaches can use these findings to guide training and athlete development.

R: The requested information is already presented in the Conclusion section: “Taken together, these findings bring several practical implications for long-term swimmers’ development. Some caution is warranted when interpreting early success or attempting to identify talented female Portuguese swimmers before the age of 15 years, as performance at age groups does not reliably predict adult success. The performance trajectories during the initial stages of competitive swimming (10-15 years) are strongly influenced by growth and maturation, which require avoiding premature judgments about long-term potential and possibly include within-sport specialization perspectives instead. To counteract discouragement from performance plateaus, coaches must actively use the quantified annual progression data provided in this study in combination with percentile curves and mathematical models to establish appropriate goals and ensure realistic expectations as swimmers approach peak performance age. Moreover, frameworks addressing the different developmental phases and normative benchmarks are welcome to be integrated into this analysis.”

Reviewer 4 Report

Comments and Suggestions for Authors

Dear authors,

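I appreciate the opportunity to evaluate this work. The study examines performance progression and stability in female swimmers across various swimming styles, which is pertinent to sport science and athlete development. The paper is well-structured, and the dataset is vast, encompassing numerous seasons of competitive performances. However, after careful consideration, I regret to say that I cannot recommend the manuscript for publication in its current form. The reasons for this decision are outlined below: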

The study's overall contribution to the current literature seems constrained. The analysis offers descriptive insights into performance progression across swimming styles, with the primary findings substantially corroborating patterns previously recorded in research on long-term athlete development and performance trajectories in swimming. The article fails to significantly enhance the existing theoretical or empirical comprehension of the subject matter.

While the manuscript references several previous studies on swimming performance development, the discussion could benefit from a more explicit positioning of the present findings relative to the existing evidence. For instance, many of the observed patterns such as the progressive reduction in yearly improvements with age and the emergence of a performance plateau in mid-adolescence have already been documented in prior longitudinal analyses of competitive swimmers. Clarifying in more detail what specific novel insights are provided by examining different techniques and distances, beyond confirming previously reported developmental trajectories, would help to better justify the contribution of the study to the literature.

The study solely depends on retrospective ranking data, limiting the capacity to include significant elements that may affect performance development, like biological maturation, training volume, coaching environment, and athlete selection methods. The interpretation of the observed development and stability patterns is rather constrained without considering these variables.

The discussion section might benefit from a more thorough engagement with the current literature and a more critical analysis of the results. The discussion, as it is, reiterates the findings instead of examining the mechanisms that may elucidate the observed disparities among swimming styles and distances.

Several methodological elements require additional elucidation, especially concerning the consequences of the sampling technique (top-ranked swimmers exclusively) and the inherent biases linked to longitudinal ranking-based datasets.

Another aspect that may deserve further clarification concerns the interpretation of the stability indicators. In the general mixed-effects model, the intraclass correlation coefficient (ICC) reported is relatively low (ICC = 0.05), suggesting that only a small proportion of the variance in performance is attributable to between-swimmer differences after accounting for the fixed effects. This raises questions regarding the practical magnitude of the stability effects discussed in the manuscript. A more cautious interpretation of these values, along with a clearer explanation of their practical significance for talent identification and long-term athlete development, would strengthen the methodological transparency of the study.

Some figures could benefit from improved clarity. For instance, Figure 2 (the heatmap of Pearson correlation coefficients) is somewhat difficult to interpret due to the density of information and the small values displayed within the cells. Improving the visualization or simplifying the presentation may enhance readability for the reader. This is my opinion. The same could apply for Figure 1 as well with boxplots.

I am unable to endorse the manuscript's publication for these reasons. I trust that the authors will find these comments to be constructive and beneficial in the process of enhancing the work for submission to another journal.

Author Response

Reviewer 4:

Dear authors,

I appreciate the opportunity to evaluate this work. The study examines performance progression and stability in female swimmers across various swimming styles, which is pertinent to sport science and athlete development. The paper is well-structured, and the dataset is vast, encompassing numerous seasons of competitive performances. However, after careful consideration, I regret to say that I cannot recommend the manuscript for publication in its current form. The reasons for this decision are outlined below:

The study's overall contribution to the current literature seems constrained. The analysis offers descriptive insights into performance progression across swimming styles, with the primary findings substantially corroborating patterns previously recorded in research on long-term athlete development and performance trajectories in swimming. The article fails to significantly enhance the existing theoretical or empirical comprehension of the subject matter.

R: We thank the reviewer for this comment. While some findings align with previous literature, the study adds value in several ways. It focuses exclusively on female swimmers, an underrepresented group in longitudinal research. It analyzes a complete developmental window (10–18 years) using a consistent Top-50 cohort. It also compares multiple techniques and distances simultaneously, which is less explored. Additionally, combining mean and normative stability with mixed models provides a more comprehensive approach. Finally, the use of recent data (up to 2025) reflects current development patterns. The manuscript was revised to better emphasize these contributions.

While the manuscript references several previous studies on swimming performance development, the discussion could benefit from a more explicit positioning of the present findings relative to the existing evidence. For instance, many of the observed patterns such as the progressive reduction in yearly improvements with age and the emergence of a performance plateau in mid-adolescence have already been documented in prior longitudinal analyses of competitive swimmers. Clarifying in more detail what specific novel insights are provided by examining different techniques and distances, beyond confirming previously reported developmental trajectories, would help to better justify the contribution of the study to the literature.

R: We thank the reviewer for this valuable suggestion. We agree that some observed patterns are consistent with previous literature. However, the novelty of this study lies in the comparative analysis across techniques and distances, allowing us to identify subtle differences in stability and progression that are not evident in single-event studies. In particular, our findings highlight variation in stability profiles between techniques and distances of competitive events, providing additional insight into how developmental trajectories may differ by event. The Discussion section was revised to more clearly position these contributions relative to existing evidence. Specifically, new sentences were added on this point (please see lines 295-300).

The study solely depends on retrospective ranking data, limiting the capacity to include significant elements that may affect performance development, like biological maturation, training volume, coaching environment, and athlete selection methods. The interpretation of the observed development and stability patterns is rather constrained without considering these variables.

R: We thank the reviewer for this important remark. We acknowledge that the retrospective nature of ranking data limits the inclusion of factors such as biological maturation, training load, and coaching environment. As such, the interpretation of developmental patterns should be made with caution. However, the use of objective, large-scale performance data allows for a consistent and ecologically valid analysis of competitive outcomes over time. Based on your comments and those of the other reviewers, we have expanded the Limitations section to explicitly address the absence of these variables and to highlight their potential influence on performance trajectories (please see lines 313-320).

The discussion section might benefit from a more thorough engagement with the current literature and a more critical analysis of the results. The discussion, as it is, reiterates the findings instead of examining the mechanisms that may elucidate the observed disparities among swimming styles and distances.

R: We thank the reviewer for this constructive comment. We acknowledge that the initial version of the Discussion was more descriptive. Following this and previous reviewers’ suggestions, several paragraphs were revised to provide a more critical interpretation of the results and underlying mechanisms (please see lines 207-209). In particular, we expanded the explanation of differences between techniques and distances, integrating biomechanical, physiological, and developmental perspectives. We also strengthened the connection with current literature to better contextualize our findings (please see lines 295-300). We hope these revisions improve the depth and clarity of the Discussion section.

Several methodological elements require additional elucidation, especially concerning the consequences of the sampling technique (top-ranked swimmers exclusively) and the inherent biases linked to longitudinal ranking-based datasets.

R: We thank the reviewer for this helpful comment. The methodological aspects related to the use of Top-50 swimmers and ranking-based longitudinal data have been further clarified in the Methods and Limitations sections (please see lines 313-320). This approach allows for a consistent analysis of developmental trajectories within a defined competitive level, while also acknowledging considerations inherent to this type of dataset. These points are now also described more clearly in the Materials and Methods section to improve transparency and interpretation (please see lines 97-98, 104-106).

Another aspect that may deserve further clarification concerns the interpretation of the stability indicators. In the general mixed-effects model, the intraclass correlation coefficient (ICC) reported is relatively low (ICC = 0.05), suggesting that only a small proportion of the variance in performance is attributable to between-swimmer differences after accounting for the fixed effects. This raises questions regarding the practical magnitude of the stability effects discussed in the manuscript. A more cautious interpretation of these values, along with a clearer explanation of their practical significance for talent identification and long-term athlete development, would strengthen the methodological transparency of the study.

R: We thank the reviewer for this insightful comment. We agree that the reported ICC values indicate a relatively low proportion of between-swimmer variance and should be interpreted with caution. Following this and previous reviewers’ suggestions, the interpretation of stability indicators was revised to avoid overstatement (please see lines 32-33, 207-209). We now emphasize that these values reflect limited predictive capacity at the individual level, particularly in early developmental stages. Additionally, the practical implications for talent identification and long-term development were reframed more cautiously to better align with the observed effect sizes.

Some figures could benefit from improved clarity. For instance, Figure 2 (the heatmap of Pearson correlation coefficients) is somewhat difficult to interpret due to the density of information and the small values displayed within the cells. Improving the visualization or simplifying the presentation may enhance readability for the reader. This is my opinion. The same could apply for Figure 1 as well with boxplots.

R: We thank the reviewer for this helpful suggestion. Following this and previous reviewers’ comments, several improvements were made to enhance figure clarity. The overall image quality was improved, and Figure 1 is now complemented by a new Table S1 (95% CI) to aid interpretation (Supplementary File). Additionally, key values were highlighted in Table 1 (red) to emphasize relevant trends (please see lines 167–169). For Figure 2, the explanation of the heatmap was clarified in the text (please see lines 171–172) to improve the interpretation of the correlations. We hope these changes enhance readability.

I am unable to endorse the manuscript's publication for these reasons. I trust that the authors will find these comments to be constructive and beneficial in the process of enhancing the work for submission to another journal.

R: We thank the reviewer for the time and effort dedicated to evaluating our manuscript and for the constructive feedback provided. We respectfully hope that the revisions made in response to this and the other reviewers’ comments have substantially improved the clarity, methodological transparency, and overall contribution of the study. We believe that the manuscript is now stronger and better aligned with the journal’s standards, and we hope it may be reconsidered for publication.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I want to thank the authors for their effort in responding to all my comments. I believe the current manuscript is ready for publication.

Author Response

Round 2:

I want to thank the authors for their effort in responding to all my comments. I believe the current manuscript is ready for publication.

R: We sincerely thank the reviewer for acknowledging the improvement in manuscript quality and for the positive recommendation for publication.

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript is clearer and better organized than the last one. Some of the initial problems have been fixed, and the article is now in much better shape. That said, there are still some important limitations, especially when it comes to interpretation and the transparency of the methods.

The manuscript is well-written and methodologically sound, although its overall contribution still seems a bit small. The results mostly back up what has already been said about how swimming performance improves and then levels off. The inclusion of female swimmers and the comparison of techniques and distances are pertinent and appreciated. It is still not obvious how these discoveries add to what we already know in a meaningful way. The manuscript would benefit from a clearer discussion of what is new in terms of ideas, beyond just verifying what is already known about developmental trends. Right now, the work reads more like a thorough description of trajectories than like a new way to understand them.
Linear mixed-effects models are an appropriate choice for this kind of data. Nonetheless, the analysis does not fully capitalize on this methodology. The manuscript presents ICC values and fixed effects; however, it does not thoroughly examine interaction effects. The notably low ICC in the general model (ICC = 0.05) requires more careful interpretation. If only a small part of the variance can be explained by differences between swimmers after accounting for age, distance, and technique, this directly affects how "stability" should be understood. At present, this point is raised, but not critically. There is also no model comparison or sensitivity analysis, which would help the reader judge how robust these results are.
From a statistical point of view, mean and normative stability are well-defined, but their practical meaning is still a little unclear. It would be beneficial to elucidate the implications of these results for coaches or talent discovery systems. For instance, if ICC values are always low, this means that they don't have much predictive power at the individual level. This is an essential point, but the discussion doesn't go into enough detail about it. The text would be stronger if there were a clearer connection between the statistical results and how they may be used in real life.
It is reasonable that the design focuses only on the Top-50 Portuguese swimmers, but this raises problems that go beyond simply acknowledging constraints. This is a highly selected sample from a single country. Consequently, it is essential to examine more critically how this selection may influence the observed stability patterns. The current discussion does not give enough weight to how this sampling approach could affect generalizability.
The phrase "statistical modeling was done to guess values between two known years of performance" needs a lot more information. This is not a small technical step; it has a direct effect on the longitudinal structure of the data. The text should make it explicit how these values were made (for example, the interpolation method and the assumptions) and whether this process could change the estimations of stability. It is hard to judge how strong and reliable the analyses are without this information.
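The procedure actually used in the manuscript is not described in this record, so the following is only one plausible option, assumed for illustration: linear interpolation between the two nearest seasons with a recorded performance. The hypothetical helper `interpolate_missing` fills interior gaps only (no extrapolation) and makes the key assumption explicit, namely that performance changes at a constant rate between the two known ages.

```python
def interpolate_missing(points_by_age: dict[int, float]) -> dict[int, float]:
    """Linearly interpolate missing whole-year ages between known ones.

    Hypothetical helper: ages before the first or after the last known age
    are left absent (no extrapolation).
    """
    ages = sorted(points_by_age)
    filled = dict(points_by_age)
    for a0, a1 in zip(ages, ages[1:]):
        p0, p1 = points_by_age[a0], points_by_age[a1]
        for age in range(a0 + 1, a1):
            # Point on the straight line through (a0, p0) and (a1, p1).
            filled[age] = p0 + (p1 - p0) * (age - a0) / (a1 - a0)
    return dict(sorted(filled.items()))


print(interpolate_missing({13: 400.0, 16: 550.0}))
# {13: 400.0, 14: 450.0, 15: 500.0, 16: 550.0}
```

Because every filled value lies exactly on the line between its neighbours, a step of this kind removes year-to-year noise at the filled ages and could inflate correlations between adjacent ages, which is the kind of effect on stability estimates the comment asks the authors to address.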
Some of the images, such as the correlation heatmap and the multi-panel boxplots, are still hard to read because they are so dense. Key patterns are not immediately visible, and the visual load makes them harder to understand. To improve clarity, you could simplify the graphs, move some of the detailed outputs to the supplementary materials, or, for example, enlarge the heatmap.
The Discussion section is comprehensive but would benefit from clearer structuring and partial condensation. In its current form, the section moves between literature comparison, biological interpretation, statistical explanation, and practical implications without a clear internal organization. I would recommend that the authors introduce subsections (for example, something like Performance Progression, Stability Interpretation, Mixed-Model Findings, Practical Implications, and Limitations; this is only a suggestion, although Limitations are mandatory anyway), which would improve readability and conceptual clarity. I particularly suggest separating the limitations currently presented in lines 312–326 into a distinct subsection (for example, 4.1. Limitations). This would strengthen the scientific structure and make the section easier to follow and read.
The paragraphs between lines 210–229 and 231–248 largely reiterate known developmental trends (quadratic progression, peak age, decline in yearly improvement). While relevant, this section is somewhat literature-heavy and could be condensed. The manuscript would benefit from shifting emphasis from confirming previously reported patterns toward clarifying what is specifically added by the present dataset (for example - integration of techniques and distances in female swimmers, combined stability approaches). A reduction of approximately 15–20% in this section would improve focus without losing scientific grounding.
The section addressing mean and normative stability (lines 250–283) is one of the most important parts of the manuscript. However, the practical implications of these findings could be articulated more explicitly. In particular: (1) The contrast between group-level stabilization and limited rank-order predictability deserves stronger emphasis. (2) The implications of stability emerging only around ages 15–16 should be clearly linked to talent identification frameworks. Currently, the statistical findings are described appropriately, but their applied interpretation could be sharpened.
The discussion of the mixed-effects model (lines 284–299) requires “deeper” interpretation. As I already mentioned in my first observations, the reported ICC values are relatively low, yet the practical meaning of this is only briefly mentioned (lines 289–291). A more explicit discussion of what a low ICC implies for individual predictability would strengthen the manuscript. Additionally, I would recommend that the authors clarify conceptually the contrast between high conditional R² values and low ICC values, as it has important implications for how “stability” is understood.
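The apparent paradox can be shown with the variance decomposition commonly used to report R² for mixed models: with fixed-effect variance σ²_f, between-swimmer (random-intercept) variance σ²_u, and residual variance σ²_e, marginal R² = σ²_f / (σ²_f + σ²_u + σ²_e), conditional R² = (σ²_f + σ²_u) / (σ²_f + σ²_u + σ²_e), and the adjusted ICC = σ²_u / (σ²_u + σ²_e). The sketch assumes this definition of the ICC (the record does not state which one the manuscript used) and uses invented variance components chosen so that ICC = 0.05; they are not the study's estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    fixed: float     # sigma^2_f: variance of fixed-effect predictions (e.g., age, distance, technique)
    swimmer: float   # sigma^2_u: between-swimmer (random-intercept) variance
    residual: float  # sigma^2_e: within-swimmer residual variance

    @property
    def total(self) -> float:
        return self.fixed + self.swimmer + self.residual


def adjusted_icc(v: VarianceComponents) -> float:
    """Share of the non-fixed variance that lies between swimmers."""
    return v.swimmer / (v.swimmer + v.residual)


def marginal_r2(v: VarianceComponents) -> float:
    """Variance explained by fixed effects alone."""
    return v.fixed / v.total


def conditional_r2(v: VarianceComponents) -> float:
    """Variance explained by fixed effects plus swimmer random intercepts."""
    return (v.fixed + v.swimmer) / v.total


# Hypothetical components (not the study's estimates): fixed effects dominate.
v = VarianceComponents(fixed=9000.0, swimmer=100.0, residual=1900.0)
print(f"ICC = {adjusted_icc(v):.2f}")                # ICC = 0.05
print(f"marginal R2 = {marginal_r2(v):.2f}")         # marginal R2 = 0.82
print(f"conditional R2 = {conditional_r2(v):.2f}")   # conditional R2 = 0.83
```

With these numbers the fixed effects explain most of the variance, so the conditional R² is high, yet once they are accounted for, only 5% of the remaining variance separates swimmers from one another. A high conditional R² therefore does not by itself show that individual swimmers are consistently distinguishable from each other.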
The biomechanical and energetic explanations provided in lines 300–311 are interesting but somewhat “speculative”. This section could either be slightly condensed or more clearly tied to the actual statistical findings of the study. Currently, I noticed that the link between the energetic hierarchy of techniques and the observed stability patterns is suggested rather than demonstrated.
The Conclusions section (lines 327–347) would benefit from a more concise and decisive structure. At present, the conclusions restate the descriptive findings but could be reformulated into clearer take-home messages: (1) early performance (before ~15 years) has limited predictive value; (2) stability increases after mid-adolescence; (3) distance appears to exert a stronger influence on stability than technique; (4) between-swimmer variance remains relatively small once age is controlled; (5) talent identification processes should avoid premature judgments. These are only suggestions. Reducing repetition and focusing on 4–6 strong, practice-oriented statements would significantly strengthen the closing section. This is my opinion.
Overall, the manuscript is clearly better than the previous version. The structure is stronger, and some of the problems raised before have been fixed. Nonetheless, significant constraints persist, especially concerning the depth of interpretation, the explanation of specific scientific procedures, and the integration of the results into a broader conceptual framework. Further improving these areas would make the study more convincing and easier to understand.
To ensure consistency in formatting and reporting standards, the manuscript would benefit from thorough editing. Make sure that: (1) all abbreviations are written out in full at first use in the text, followed by the abbreviation in parentheses, and that only the abbreviation is used thereafter; (2) units of measurement are explicitly provided in all tables and figures, either in the column/row headers for tables or in the axis labels for figures; (3) statistical symbols (such as p, r, ICC, and R²) are formatted uniformly throughout the manuscript; (4) decimal formatting and spacing are consistent across the text and tables. Consistency in these elements will improve clarity and readability.

Author Response

Round 2:

Dear authors,

R: We sincerely thank the reviewer for acknowledging the improvement of the manuscript quality. We also thank you for the constructive comments aimed at strengthening our work. In the sections below, we address each recommendation and describe the corresponding revisions made to the text. We believe these improvements have significantly enhanced the manuscript.

The manuscript is well written and methodologically sound, although its overall contribution still seems somewhat limited. The results mostly support what has already been reported about how swimming performance improves and then levels off. The inclusion of female swimmers and the comparison of techniques and distances are pertinent and appreciated. It is still not obvious how these findings add meaningfully to what we already know. The manuscript would benefit from a clearer discussion of what is conceptually new, beyond verifying known developmental trends. At present, the work reads more like a thorough description of trajectories than like a new way to understand them.

R: We thank the reviewer for the positive assessment of the manuscript and for recognizing the relevance of including female swimmers and comparing swimming techniques and distances. We appreciate this comment; however, we do not fully agree that the study is merely a description of performance trajectories. In our view, the manuscript goes beyond confirming general developmental trends by providing a new perspective on stability across events, showing that the progression of performance is not uniform and should be interpreted according to the specific characteristics of each event. This comparative approach across distances and techniques adds to the current literature by extending the understanding of how stability and development differ within competitive swimming, especially in female swimmers. We also believe that the Discussion explores these aspects in considerable detail and has been improved (please see lines 253-260, 290-295). Nevertheless, we have revised the manuscript to make the novel contribution and the conceptual relevance of these findings more explicit.

Linear mixed-effects models are an appropriate choice for this kind of data. Nonetheless, the analysis does not fully capitalize on this methodology. The manuscript presents ICC values and fixed effects; however, it does not thoroughly examine interaction effects. The notably low ICC in the general model (ICC = 0.05) requires more careful interpretation. If only a small part of the variance can be explained by differences between swimmers after accounting for age, distance, and technique, this directly affects how "stability" should be understood. At present, this point is raised, but not critically examined. There is also no model comparison or sensitivity analysis, which would help readers judge how robust these results are.

R: We thank the reviewer for this important comment. We have revised the manuscript to clarify the interpretation of the mixed-effects models, particularly the interaction effects and the low ICC in the general model. The low ICC value (0.05) indicates that, after accounting for age, distance, and technique, only a small part of the remaining variance is explained by differences between swimmers. This was already reflected in the manuscript as low predictive power for stability across events, and therefore stability should be interpreted cautiously. However, the model remains informative because it allows direct comparison between events and helps identify how stability differs according to event characteristics. We also clarified that the general model included the interaction between distance and technique. In addition, we acknowledge that no formal model comparison or sensitivity analysis was performed, and this has now been recognized as a limitation of the study.

From a statistical point of view, mean and normative stability are well defined, but their practical meaning is still somewhat unclear. It would be beneficial to explain the implications of these results for coaches or talent identification systems. For instance, consistently low ICC values mean that the models have little predictive power at the individual level. This is an essential point, but the Discussion does not examine it in enough detail. The text would be stronger with a clearer connection between the statistical results and how they may be used in practice.

R: The practical meaning of these stability measures is essential for coaches, talent identification systems, and talent scouts. Mean stability reflects the general progress of the entire group, helping coaches set realistic performance benchmarks for each age. Normative stability describes how much an athlete’s ranking changes over time. High normative stability means that the top-ranked athletes today are likely to remain at the top later, making early talent selection more reliable. Conversely, low normative stability suggests that talent identification systems should remain open and inclusive for longer, as "late bloomers" who are currently behind may eventually catch up with or even overtake their early-maturing peers. This clarification of mean and normative stability is already included in the Discussion, consistent with the results (please see lines 284-287). Regarding the low ICC values, we kindly ask the reviewer to see the reply to remark #2.
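As a minimal sketch of the two stability measures described in this reply, the snippet below assumes that mean stability is summarized as the group mean time at each age and that normative stability is the Spearman rank-order correlation of the same swimmers' times between consecutive ages. These operationalizations, the function names (`average_ranks`, `pearson`, `spearman`, `stability_summary`), and the example times are hypothetical; the manuscript's exact procedure may differ.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest, i.e., fastest time) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def stability_summary(times_by_age):
    """Hypothetical helper.

    times_by_age maps age -> list of times (s), with swimmers in the same order at every age.
    Returns (mean time per age, Spearman correlation for each pair of consecutive ages).
    """
    ages = sorted(times_by_age)
    mean_stability = {age: mean(times_by_age[age]) for age in ages}
    normative_stability = {
        (a, b): spearman(times_by_age[a], times_by_age[b]) for a, b in zip(ages, ages[1:])
    }
    return mean_stability, normative_stability


if __name__ == "__main__":
    # Illustrative times (s) for five hypothetical swimmers at ages 13-15.
    times = {
        13: [70.0, 72.5, 71.0, 74.0, 73.0],
        14: [68.5, 69.0, 70.0, 71.5, 72.0],
        15: [66.0, 67.5, 68.0, 69.0, 70.5],
    }
    means, normative = stability_summary(times)
    print(means)
    print(normative)
```

In this toy example the group mean improves every year (mean stability), while the rank correlation is 0.8 between ages 13 and 14 but 1.0 between 14 and 15, i.e., rankings reshuffle more at the younger age even though the group trend is smooth.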

It is reasonable that the design focuses only on the Top-50 Portuguese swimmers, but this raises problems that go beyond simply acknowledging constraints. This is a highly selected sample from a single country. Consequently, a more critical examination of how this selection may influence the observed stability patterns is essential. The current discussion does not give enough weight to how this sampling approach could affect generalizability.

R: The authors acknowledge this issue, which is clearly presented in the Limitations subsection (please see lines 335-341). Although we understand the concern, we believe that, from a methodological point of view, a detailed justification regarding the nationality of the swimmers is not necessary in the manuscript. Developmental pathways in sport are a multifactorial phenomenon shaped by interacting factors such as race and region of origin, as evidenced in several theoretical models (see doi:10.1017/S0954579423001281; doi:10.13189/saj.2025.130408). While socio-cultural and economic differences influence physical activity patterns (see doi:10.1016/j.healthplace.2017.05.013), the findings presented here align with established developmental frameworks for both male and female swimmers in other international contexts (see doi:10.3389/fspor.2020.589938; doi:10.1111/sms.13599). Furthermore, this study provides an up-to-date analysis by incorporating performance data through 2025. Finally, many experimental studies are conducted with samples from a single nationality, and this information is not always explicitly reported to the reader.

The phrase "statistical modeling was done to guess values between two known years of performance" needs considerably more detail. This is not a minor technical step; it directly affects the longitudinal structure of the data. The text should state explicitly how these values were generated (for example, the interpolation method and its assumptions) and whether this process could change the stability estimates. Without this information, it is hard to judge how robust and reliable the analyses are.

R: We appreciate this observation. For this reason, the sentence was improved and clarified (please see lines 97-100). It now states that non-linear interpolation was applied when necessary. In addition, to ensure objective prediction and minimize bias, different models, such as exponential or polynomial regressions, were fitted to the available data, and the final value was taken from the model with the best fit (i.e., minimum Mean Squared Error).
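The model-selection step described in this reply can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a single swimmer's times at the observed ages, two candidate models (a quadratic polynomial and an exponential curve fitted on log-transformed times), and selection by in-sample mean squared error. Function names and the example data are hypothetical.

```python
import numpy as np


def fit_candidates(ages, times):
    """Fit two candidate curves to one swimmer's observed (age, time) pairs.

    Returns a dict mapping model name -> function that predicts time from age.
    """
    ages = np.asarray(ages, dtype=float)
    times = np.asarray(times, dtype=float)
    models = {}
    # Quadratic polynomial: time = c2*age**2 + c1*age + c0
    quad_coefs = np.polyfit(ages, times, deg=2)
    models["quadratic"] = lambda a, c=quad_coefs: np.polyval(c, a)
    # Exponential: time = A * exp(b * age), fitted as a straight line on log(time)
    b, log_a = np.polyfit(ages, np.log(times), deg=1)
    models["exponential"] = lambda a, b=b, log_a=log_a: np.exp(log_a + b * np.asarray(a, dtype=float))
    return models


def interpolate_missing(ages, times, missing_age):
    """Fill one missing season with the candidate model that has the lowest MSE on the observed data."""
    observed_ages = np.asarray(ages, dtype=float)
    observed_times = np.asarray(times, dtype=float)
    models = fit_candidates(observed_ages, observed_times)
    mse = {
        name: float(np.mean((predict(observed_ages) - observed_times) ** 2))
        for name, predict in models.items()
    }
    best = min(mse, key=mse.get)
    return best, float(models[best](missing_age)), mse


if __name__ == "__main__":
    # Hypothetical times (s) at ages 12, 13, 15 and 16; age 14 is missing.
    best, estimate, mse = interpolate_missing([12, 13, 15, 16], [75.0, 71.0, 66.5, 65.0], 14)
    print(best, round(estimate, 2), mse)
```

Because in-sample MSE tends to favor the more flexible model when only a few seasons are observed, a leave-one-out or hold-out error may be a more conservative criterion; the reply specifies only minimum Mean Squared Error.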

Some of the images, such as the correlation heatmap and the multi-panel boxplots, are still hard to read because they are so dense. Key patterns are not immediately visible, and the visual load makes them harder to understand. To improve clarity, you may simplify the graphs, move some of the detailed outputs to supplementary materials, or enlarge the heatmap, for example.

R: Thank you. The quality of the images was improved and some detailed information was refined. The heatmaps were also resized. Please note that the images meet the minimum quality required by MDPI policy and can be enlarged for better readability in the PDF version.

The Discussion section is comprehensive but would benefit from clearer structuring and partial condensation. In its current form, the section moves between literature comparison, biological interpretation, statistical explanation, and practical implications without a clear internal organization. I would recommend that the authors introduce subsections (for example, Performance Progression, Stability Interpretation, Mixed-Model Findings, Practical Implications, and Limitations; this is only a suggestion, although limitations are mandatory anyway), which would improve readability and conceptual clarity. I particularly suggest moving the limitations currently presented in lines 312–326 into a distinct subsection (for example, 4.1. Limitations). This would strengthen the scientific structure and make the section easier to follow.

R: Although we understand your suggestion to some extent, it is difficult to split the Discussion into subsections, since the results do not follow the same rationale. Moreover, the three other reviewers complimented the authors on the way the Discussion was presented, and further changes could go against these previous comments. Please keep in mind that we are trying to accommodate all feedback, and it is not always possible to make every requested change.

The paragraphs between lines 210–229 and 231–248 largely reiterate known developmental trends (quadratic progression, peak age, decline in yearly improvement). While relevant, this section is somewhat literature-heavy and could be condensed. The manuscript would benefit from shifting emphasis from confirming previously reported patterns toward clarifying what is specifically added by the present dataset (for example - integration of techniques and distances in female swimmers, combined stability approaches). A reduction of approximately 15–20% in this section would improve focus without losing scientific grounding.

R: We appreciate the suggestion to further highlight the novelty of our methodology and findings in relation to existing literature. To better differentiate our work from previous studies, we have incorporated new insights (see lines 287-295). We also value the suggestion to condense certain sections; however, we believe the current discussion provides a comprehensive and relevant extension of the topic that is essential for context.

The section addressing mean and normative stability (lines 250–283) is one of the most important parts of the manuscript. However, the practical implications of these findings could be articulated more explicitly. In particular: (1) The contrast between group-level stabilization and limited rank-order predictability deserves stronger emphasis. (2) The implications of stability emerging only around ages 15–16 should be clearly linked to talent identification frameworks. Currently, the statistical findings are described appropriately, but their applied interpretation could be sharpened.

R: Please see the replies to remarks #2 and #3. To clarify the implications of these findings, we have included a more accessible interpretation (please see lines 284-287).

The discussion of the mixed-effects model (lines 284–299) requires “deeper” interpretation. As I already mentioned among my first observations, the reported ICC values are relatively low, yet the practical meaning of this is only briefly mentioned (lines 289–291). A more explicit discussion of what low ICC implies for individual predictability would strengthen the manuscript. Additionally, I recommend that the authors clarify, conceptually, the contrast between high conditional R² values and low ICC values, as it has important implications for how “stability” is understood.

R: Please see the previous replies (remarks #2 and #3). This idea is explored in depth in lines 292-295. Beyond a certain point, adding a large amount of information can dilute the essence of the manuscript and affect readability. We hope that you can be sensitive to this issue, since the other three reviewers were comfortable with the way it is written.

The biomechanical and energetic explanations provided in lines 300–311 are interesting but somewhat “speculative”. This section could either be slightly condensed or more clearly tied to the actual statistical findings of the study. Currently, I noticed that the link between the energetic hierarchy of techniques and the observed stability patterns is suggested rather than demonstrated.

R: We thank the reviewer for this comment. We respectfully do not consider this section speculative, as it is grounded in established swimming pedagogy and performance principles. For example, backstroke is typically introduced early, allowing more time for technical development, which may help explain its higher stability. Additionally, the biomechanical and physiological demands vary across distances, which may not strongly influence within-season performance fluctuations. Nevertheless, we have revised the text to more clearly link these explanations to our findings.

The Conclusions section (lines 327–347) would benefit from a more concise and decisive structure. At present, the conclusions restate the descriptive findings but could be reformulated into clearer take-home messages, for example: (1) early performance (before ~15 years) has limited predictive value; (2) stability increases after mid-adolescence; (3) distance appears to exert a stronger influence on stability than technique; (4) between-swimmer variance remains relatively small once age is controlled; (5) talent identification processes should avoid premature judgments. These are only proposed ideas. In my opinion, reducing repetition and focusing on 4–6 strong, practice-oriented statements would significantly strengthen the closing section.

R: Thank you for this important suggestion. We have revised the Conclusions accordingly (please see lines 350-357).

Overall, the manuscript is clearly better than the previous version. The structure is stronger, and some of the problems raised before have been fixed. Nonetheless, significant constraints persist, especially concerning the depth of interpretation, the explanation of specific scientific procedures, and the integration of the results into a broader conceptual framework. Further improving these areas would make the study more convincing and easier to understand.

R: We appreciate the general comments. We made a great effort to solve the issues raised in your remarks in both the first and second rounds, which led to major revisions. As you surely know, it is hard to make the necessary changes if, round after round, the authors are asked to make deeper changes. Your comments clearly helped us enhance our work, and we hope you now find the manuscript more concise and well discussed. We sincerely thank you for your collaboration.

To ensure consistency in formatting and reporting standards, the manuscript would benefit from thorough editing. Make sure that: (1) all abbreviations are written out in full at first use in the text, followed by the abbreviation in parentheses, and that only the abbreviation is used thereafter; (2) units of measurement are explicitly provided in all tables and figures, either in the column/row headers for tables or in the axis labels for figures; (3) statistical symbols (such as p, r, ICC, and R²) are formatted uniformly throughout the manuscript; (4) decimal formatting and spacing are consistent across the text and tables. Consistency in these elements will improve clarity and readability.

R: Thank you for your suggestions. All these details were revised.

Round 3

Reviewer 4 Report

Comments and Suggestions for Authors

Dear authors,

Thank you for the detailed responses and for the substantial revisions made to the manuscript. The paper is clearly improved in structure, clarity, and methodological transparency compared to the previous version. I appreciate the effort invested in addressing the comments from both rounds.

That said, a few conceptual points still require clearer articulation before the manuscript can be considered further.

1. Interpretation of the mixed-effects models (ICC vs R²c). While the manuscript now acknowledges the low ICC values, the conceptual implications remain somewhat underdeveloped. In particular, the contrast between high conditional R² and low ICC deserves clearer explanation. This has important implications for how “stability” and individual predictability are interpreted and should be addressed more explicitly.

2. Sampling framework (Top-50 Portuguese swimmers). The limitations section acknowledges restricted generalizability, but a more critical discussion of how range restriction and selective sampling may influence observed stability patterns would strengthen the manuscript.

3. Interpolation procedure. Although the method is now described, it would be helpful to briefly clarify how frequently interpolation was applied and whether its use may have influenced stability estimates.

4. In the paragraph discussing previous literature (lines 287–291), the authors state that “existing literature on Olympic swimmers has attempted to compare competitive events” and that such comparisons “often rely on indirect, non-statistical assessments.” However, it is not entirely clear which specific studies are being referred to here. Please specify and explicitly cite the studies supporting this claim. If reference [29] is intended to substantiate this statement, this should be made clearer in the text. Otherwise, additional references should be included to support the assertion. Providing precise citations will strengthen the argument and clarify how the present study advances beyond prior work.

5. While the manuscript has improved substantially, several sentences remain excessively long and syntactically dense, particularly in the Introduction and Discussion sections. In multiple instances, sentences extend well beyond 40–50 words, combining several ideas that would benefit from clearer separation. For readability and clarity, I strongly recommend limiting sentences to approximately 30–40 words whenever possible and splitting overly complex constructions into shorter, more direct statements. This would significantly improve flow and accessibility without altering the scientific content. This is particularly relevant in:

- Introduction (for example, lines 41–46 and 68–82);
- Discussion (multiple paragraphs between lines 223–253 and 271–311).

6. The limitations are currently presented within the final part of the Discussion (lines 335–347), but not as a clearly separated subsection. For structural clarity and consistency with scientific reporting standards, I recommend introducing a distinct subsection (4.1. Limitations).

7. The tables would benefit from a standardized “Notes” section below each table. Adding structured table notes would improve clarity and reporting consistency.

- All abbreviations (e.g., ICC, R²c, F) are defined in the table notes.
- Statistical symbols and significance thresholds are clarified.
- Any color coding or formatting conventions are explicitly explained.

Author Response

Reviewer 4:

Dear authors,

That said, a few conceptual points still require clearer articulation before the manuscript can be considered further.

R: We appreciate the reviewer’s comments and recognition of the manuscript’s improvement. We hope that this final version meets your expectations and the standards required for publication.

Interpretation of the mixed-effects models (ICC vs R²c). While the manuscript now acknowledges the low ICC values, the conceptual implications remain somewhat underdeveloped. In particular, the contrast between high conditional R² and low ICC deserves clearer explanation. This has important implications for how “stability” and individual predictability are interpreted and should be addressed more explicitly.

R: We thank the reviewer for this insightful comment. The contrast between these two metrics provides key insight into the nature of performance stability in these female swimmers. A high R²c (e.g., > 0.86) demonstrates that the model has high explanatory power, largely driven by the deterministic effects of age and event characteristics (distance and technique). However, the low ICC (e.g., < 0.15) indicates that individual stability is limited. Conceptually, this suggests that while overall performance progression is highly predictable across the population, the relative ranking of individual swimmers remains fluid (i.e., accounting for inter-individual variation in ranks). In practical terms, this means that early performance is a poor predictor of long-term standing, because the variance attributable to individual swimmers is small compared with the variance driven by the rapid, non-linear improvements associated with maturation. Minor changes were made in the Discussion section (lines 299-306) to clarify this issue.
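The contrast between a high R²c and a low ICC follows from how the variance is partitioned. The sketch below assumes the widely used Nakagawa and Schielzeth decomposition, with fixed-effect variance var_fixed, between-swimmer (random-intercept) variance var_swimmer, and residual variance var_residual: R²m = var_fixed / total, R²c = (var_fixed + var_swimmer) / total, and an adjusted ICC = var_swimmer / (var_swimmer + var_residual). Whether the manuscript reports this adjusted ICC is an assumption, and the variance values in the example are hypothetical, chosen only to reproduce an ICC of 0.05.

```python
def variance_partition(var_fixed, var_swimmer, var_residual):
    """Return (adjusted ICC, marginal R2, conditional R2) from mixed-model variance components.

    var_fixed:    variance of the fixed-effect predictions (e.g., age, distance, technique)
    var_swimmer:  between-swimmer (random-intercept) variance
    var_residual: residual (within-swimmer) variance
    """
    total = var_fixed + var_swimmer + var_residual
    icc_adjusted = var_swimmer / (var_swimmer + var_residual)  # ignores fixed-effect variance
    r2_marginal = var_fixed / total  # share explained by fixed effects only
    r2_conditional = (var_fixed + var_swimmer) / total  # fixed + random effects
    return icc_adjusted, r2_marginal, r2_conditional


if __name__ == "__main__":
    # Hypothetical components: large fixed-effect variance, tiny swimmer variance.
    icc, r2m, r2c = variance_partition(var_fixed=9.0, var_swimmer=0.05, var_residual=0.95)
    print(f"ICC = {icc:.3f}, R2m = {r2m:.3f}, R2c = {r2c:.3f}")
```

With these illustrative values the ICC is 0.05 while R²c is about 0.905: almost all explained variance comes from the fixed effects (age, distance, technique), so the model describes the group trajectory well even though swimmer identity adds little.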

Sampling framework (Top-50 Portuguese swimmers). The limitations section acknowledges restricted generalizability, but a more critical discussion of how range restriction and selective sampling may influence observed stability patterns would strengthen the manuscript.

R: We appreciate your point of view. We believe these aspects are already clearly addressed in both the Discussion and Limitations sections. Specifically, the use of a Top-50 framework and the specific national scope of the sample were necessary to meet our aim of analyzing high-level performance progression, and this approach is not unusual: previous studies have also been framed within a specific context (please see https://doi.org/10.1080/02640414.2011.587196; https://doi.org/10.1371/journal.pone.0292038). Both the selective nature of this sampling and its geographical restriction are explicitly acknowledged as limitations and reflected in our conclusions (please see lines 332-333; 358-360), without introducing bias in the interpretation of the findings.

Interpolation procedure. Although the method is now described, it would be helpful to briefly clarify how frequently interpolation was applied and whether its use may have influenced stability estimates.

R: We appreciate your request for clarification. The interpolation procedure was used only in rare instances (estimated at ~3% of the total values). Consequently, we emphasize that this procedure had a negligible impact on the analysis of performance progression and stability across ages and techniques. We hope this clarifies the issue.
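One way to document whether the ~3% of interpolated values affect the stability estimates is a simple sensitivity check: recompute a rank-order stability coefficient after excluding swimmers with an interpolated value at either age, and compare it with the estimate from all swimmers. The sketch below is hypothetical (names, flags, and times are illustrative); it assumes each value carries a boolean interpolation flag and that there are no tied times.

```python
import numpy as np


def ranks(x):
    """Ranks 1..n (1 = fastest time); assumes there are no tied times."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


def interpolation_sensitivity(times_a, times_b, interp_a, interp_b):
    """Return (share of interpolated values, rho with all swimmers, rho with observed-only swimmers)."""
    times_a = np.asarray(times_a, dtype=float)
    times_b = np.asarray(times_b, dtype=float)
    interp_a = np.asarray(interp_a, dtype=bool)
    interp_b = np.asarray(interp_b, dtype=bool)
    share = float(np.mean(np.concatenate([interp_a, interp_b])))
    keep = ~(interp_a | interp_b)  # drop swimmers with an interpolated value at either age
    rho_all = spearman(times_a, times_b)
    rho_observed = spearman(times_a[keep], times_b[keep])
    return share, rho_all, rho_observed


if __name__ == "__main__":
    # Six hypothetical swimmers; the last swimmer's value at age A was interpolated.
    share, rho_all, rho_observed = interpolation_sensitivity(
        [70.0, 72.5, 71.0, 74.0, 73.0, 75.5],
        [68.5, 69.0, 70.0, 71.5, 72.0, 73.0],
        [False, False, False, False, False, True],
        [False] * 6,
    )
    print(share, rho_all, rho_observed)
```

A small difference between the two coefficients would support the claim of negligible impact; a large one would suggest reporting both estimates.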

In the paragraph discussing previous literature (lines 287–291), the authors state that “existing literature on Olympic swimmers has attempted to compare competitive events” and that such comparisons “often rely on indirect, non-statistical assessments.” However, it is not entirely clear which specific studies are being referred to here. Please specify and explicitly cite the studies supporting this claim. If reference [29] is intended to substantiate this statement, this should be made clearer in the text. Otherwise, additional references should be included to support the assertion. Providing precise citations will strengthen the argument and clarify how the present study advances beyond prior work.

R: Thank you. The statement referred to reference [29], and the text was corrected accordingly.

While the manuscript has improved substantially, several sentences remain excessively long and syntactically dense, particularly in the Introduction and Discussion sections. In multiple instances, sentences extend well beyond 40–50 words, combining several ideas that would benefit from clearer separation. For readability and clarity, I strongly recommend limiting sentences to approximately 30–40 words whenever possible and splitting overly complex constructions into shorter, more direct statements. This would significantly improve flow and accessibility without altering the scientific content. This is particularly relevant in:
- Introduction (for example, lines 41–46 and 68–82);
- Discussion (multiple paragraphs between lines 223–253 and 271–311).

R: Thank you. The text was revised accordingly.

The limitations are currently presented within the final part of the Discussion (lines 335–347), but not as a clearly separated subsection. For structural clarity and consistency with scientific reporting standards, I recommend introducing a distinct subsection (4.1. Limitations).

R: We appreciate the reviewer's suggestion. We have carefully considered it in the context of the journal’s formatting guidelines. Sports does not require or typically enforce a separate limitations subsection in the main text; rather, limitations are commonly integrated into the final part of the Discussion, as reflected in recent experimental articles published in this journal. Please see the “Manuscript Preparation” subsection of the “Instructions for Authors” (available at: https://www.mdpi.com/journal/Sports/instructions). Also, since the Results and Discussion were not originally structured using subsections, we believe that introducing a subsection only for the limitations within the Discussion would be inconsistent with the overall structure of our manuscript. For this reason, we opted to maintain the current format and hope this explanation clarifies our decision.

The tables would benefit from a standardized “Notes” section below each table. Adding structured table notes would improve clarity and reporting consistency.
- All abbreviations (e.g., ICC, R²c, F) are defined in the table notes.
- Statistical symbols and significance thresholds are clarified.
- Any color coding or formatting conventions are explicitly explained.

R: Thank you. The suggested structure was adopted, and the labels and notes of the Figures and Tables were updated.
