Peer-Review Record

Attitudes Towards Sport in Early Adolescence: A Scale Adaptation Study for Sustainable Good Health and Well-Being

Healthcare 2026, 14(7), 842; https://doi.org/10.3390/healthcare14070842

by Halil Evren Senturk¹,*, Gulsum Tanir², Ulkum Erdogan Yuce³, Adem Karatut⁴ and Ecesu Karakaş¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 7 March 2026 / Revised: 20 March 2026 / Accepted: 20 March 2026 / Published: 25 March 2026

(This article belongs to the Special Issue Creating Connection Between Physical and Mental Health in Physical Activity, Physical Exercise and Sport Across the Lifespan)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Major Concerns

Sample Representativeness and Generalizability

The sample is drawn exclusively from 15 schools in Marmaris, Mugla, Turkey—a specific geographic region that may not represent the broader Turkish early adolescent population. While the authors describe using "stratified purposive sampling" to ensure grade-level representation, there is no information about:

How these 15 schools were selected from all available schools in the region

The socioeconomic characteristics of the school districts

The urban/rural distribution of participating schools

The response rate (consent forms distributed vs returned)

Whether non-participants differed systematically from participants

This limitation is acknowledged in the discussion but requires expansion. The authors should clarify whether the findings can be generalised to all Turkish middle school students or only to those in similar coastal, tourist-economy regions.

Insufficient Detail on Criterion Measures

The criterion-related validity analyses rely on three external measures that are inadequately described:

Physical Activity Questionnaire for Older Children (PAQ-C): Mentioned in Section 2.2.2, but the description is incomplete. The authors state: "The validity and reliability study of the Turkish adaptation of the scale for middle school students was conducted by Sert and Temel [21]." However, no information is provided about:

The number of items in this scale

The domains it assesses

Its scoring system

Its psychometric properties in the current sample

Whether it measures the same constructs as the ATSS or complementary constructs

Perceived Physical Literacy Scale (PPLS): Section 2.2.3 provides slightly more detail but still lacks:

The number of items per dimension

The response format

Evidence of its validity in this specific sample

Whether the scale was administered in its original form or adapted for this study

Personal Information Form: The questions about physical activity duration rely on self-report and recall, which are prone to bias. The authors ask students to estimate "minutes per day" of mandatory physical activity and "days per week × hours per day" of voluntary physical activity. No validation of these self-reports against objective measures (e.g., accelerometry, parent reports, teacher observations) is provided. Given that early adolescents may have difficulty with accurate time estimation, this limitation should be acknowledged.

Statistical Analysis Concerns

Handling of Missing Data: The authors report that 577 students participated, but after removing outliers, the final sample was 531. This represents 46 excluded cases (8% of the sample). The authors state that univariate outliers were removed when the absolute Z-score exceeded 3.29, and that multivariate outliers were removed using Mahalanobis distance. However:

Were these excluded cases systematically different from retained cases?

Did the authors conduct sensitivity analyses to determine whether excluding these cases influenced the results?

Was there any missing data on individual items, and if so, how was it handled?
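The screening rule and the sensitivity check requested above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the scores are hypothetical, the function name is invented for the example, and only the univariate |Z| > 3.29 rule is shown (the Mahalanobis step is omitted).

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # univariate threshold quoted in the report


def flag_univariate_outliers(scores, cutoff=Z_CUTOFF):
    """Return the indices of scores whose absolute z-score exceeds the cutoff."""
    m, s = mean(scores), stdev(scores)
    return [i for i, x in enumerate(scores) if abs((x - m) / s) > cutoff]


# Hypothetical total scores: 20 typical cases plus one extreme value.
scores = [48, 49, 50, 51, 52] * 4 + [120]
flagged = flag_univariate_outliers(scores)
retained = [x for i, x in enumerate(scores) if i not in flagged]

# Sensitivity check: compare a summary statistic with and without exclusions.
print("flagged indices:", flagged)
print("mean, all cases:", round(mean(scores), 2))
print("mean, outliers removed:", round(mean(retained), 2))
```

In a full sensitivity analysis the same comparison would be repeated for the fit indices and validity coefficients, not just a mean.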

Confirmatory Factor Analysis Reporting: While the fit indices are excellent, several important details are missing:

The chi-square difference test comparing the three-factor model to a one-factor model would strengthen the argument for multidimensionality

Modification indices are not reported; were any theoretically justified modifications considered?

Cross-loadings were not explored; could any items load meaningfully on multiple factors?

The authors report using Robust Maximum Likelihood (MLR) but do not discuss the pattern of non-normality that necessitated this estimator
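Because the authors used MLR, a plain chi-square difference is not appropriate for comparing the nested models; the Satorra-Bentler scaled difference is the usual substitute. The sketch below shows that computation. All input values are hypothetical, and the one-factor df of 275 is an assumption that follows only if the scale has 25 items and the one-factor model drops the three factor covariances.

```python
def scaled_chisq_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                nested (more restrictive) model, e.g. one factor.
    t1, c1, d1: the same quantities for the comparison model, e.g. three factors.
    Returns (scaled difference statistic, df difference).
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd        # scaled difference statistic
    return trd, d0 - d1


# Hypothetical inputs, for illustration only (not values from the manuscript).
trd, ddf = scaled_chisq_difference(
    t0=1500.0, c0=1.25, d0=275,  # one-factor model (assumed)
    t1=700.0, c1=1.20, d1=272,   # three-factor model (assumed)
)
print(f"scaled difference = {trd:.2f} on {ddf} df")
```

The statistic is referred to a chi-square distribution with the returned df. The correction term `cd` can turn out negative in small samples, in which case this version of the test is not usable.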

Correlation Interpretations: In Table 5, the correlation between total ATSS and total PPLS is r = .923. This is exceptionally high (approaching multicollinearity) and raises questions about whether these scales are measuring distinct constructs. The authors should discuss whether this high correlation indicates conceptual overlap rather than concurrent validity. An r > .90 suggests that the scales may be measuring essentially the same underlying construct, which would challenge the claim that the ATSS offers unique information beyond existing physical literacy measures.

Known-Groups Validity: The t-test comparisons between licensed and non-licensed students show large effect sizes (Cohen's d ranging from 0.694 to 0.878). However, the authors do not control for potential confounders such as:

Grade level (older students have more opportunities to obtain licenses)

Gender (sports participation patterns differ by gender in Turkey)

Socioeconomic status (licenses may require financial resources)

A more rigorous approach would be to use ANCOVA or multiple regression to determine whether license status predicts ATSS scores after controlling for these variables.

Conceptual and Theoretical Issues

Definition of "Early Adolescence": The authors define the target population as ages 10-15, encompassing both late childhood (10-11) and early-to-mid adolescence (12-15). Developmental psychologists typically distinguish between these stages. The cognitive abilities, social motivations, and physical development of a 10-year-old fifth grader differ substantially from those of a 15-year-old ninth grader. The authors should discuss whether measurement invariance holds across this age range. Has the scale been tested for factorial invariance across grade levels?

Theory of Planned Behaviour and Self-Determination Theory Integration: The introduction discusses both TPB and SDT as theoretical foundations, but does not clearly articulate how the ATSS dimensions map onto these theories. Specifically:

Which dimensions represent TPB constructs (attitude, subjective norm, perceived behavioural control)?

Which dimensions represent SDT constructs (autonomy, competence, relatedness)?

How does the three-factor structure integrate predictions from both theories?

Without this theoretical mapping, it is unclear what psychological processes the scale is intended to measure.

SDG 3 Framing: While the connection to Sustainable Development Goal 3 is plausible, the authors overstate the direct link. The scale measures attitudes, not health outcomes. The pathway from attitude measurement to improved population health is indirect and requires several assumptions (attitudes → intentions → behaviour → health outcomes). The authors should temper claims about the scale's role in achieving SDG 3 targets.

Methodological Rigour in the Qualitative Phase

The think-aloud protocol is a strength of this study, but several details are missing:

Sample Size Justification: Twenty-seven students participated in the qualitative phase. Did thematic saturation determine this number? How many interviews were needed before no new comprehension problems emerged?

Analysis of Qualitative Data: The authors state that "error codes" were generated and that an expert panel reviewed problematic expressions. However:

Who conducted the coding?

What was the coding framework?

Were multiple coders used to establish reliability?

How were disagreements resolved?

What was the inter-rater reliability?

Retrospective Probing: The authors used retrospective probing after the think-aloud. This method can introduce recall bias. Why was concurrent probing not used? How did the authors distinguish between problems identified during the think-aloud versus those identified during retrospective probing?

Revision Documentation: Table S2 (referenced but not provided in the main text) supposedly shows the original and revised items. Without seeing this table, readers cannot evaluate whether the revisions appropriately addressed developmental concerns. Were the revisions minor wording changes or substantial reconceptualisations of items?

Presentation and Reporting Issues

Incomplete Tables: Table 2 presents factor loadings, AVE, and CR, but does not show:

Standard errors

t-values or z-statistics

p-values for individual loadings (though all are described as significant)

Table Formatting: Table 2 lists items with numbers (e.g., "40", "70", "80"), but the numbering system is not explained. Which numbers correspond to which items? Readers cannot determine what each item measures.

Duplicate Text: Section 2.2.3 contains duplicate paragraphs. The paragraph beginning "To assess the cognitive, motivational, and physical competence aspects..." appears twice verbatim. This suggests careless final editing.

Figure Reference: Figure S1 is mentioned in the text ("Figure S1") but is not included in the manuscript. Readers cannot view the path diagram of the factor structure.

Confidence Interval Error: In Table 2, the RMSEA confidence interval is reported as "[0.49, 0.59]", but it should be "[0.049, 0.059]" (both bounds are too large by a factor of 10). This appears to be a typographical error.

Page Numbering: The PDF shows page numbers restarting at "43" on page 2, suggesting possible formatting issues during document preparation.

Language and Writing Quality

The manuscript contains numerous grammatical errors, awkward phrasing, and redundant statements that require revision:

Examples:

Page 1, Highlights: "The scale scores strongly predict actual daily physical activity durations" – awkward; consider "The scale scores are strong predictors of..."

Page 2, Abstract: "This study aimed to adapt and validate the Attitude Towards Sport Scale (ATSS), originally developed for high school students, for a middle school population (ages 10-15)" – this sentence is clear, but the following sentence becomes convoluted.

Page 3, Lines 323-344: section heading formatting is inconsistent, with numbers appearing repeatedly.

Page 4, Section 2.1: "In this application of the study, no specific exclusion criteria were applied to schools or students" – this phrasing is unclear. Does this mean all students were eligible? Or that exclusion criteria were not specified?

Page 5: Section 2.2.3 contains duplicate paragraphs, suggesting poor proofreading.

Page 7: "251 participants fully satisfied the foundational assumptions" – this sentence appears mid-paragraph and seems out of place.

Page 8: "the scale's robust known-groups validity, proving it is capable of accurately reflecting" – awkward; consider "demonstrating its capacity to reflect accurately"

Page 11: "Our adapted ATSS offers precisely this opportunity to diagnose this level of 'internalisation' at an early age" – overstates the capability of a self-report attitude scale.

Ethical Considerations

Assent: The authors state that written informed consent was obtained from legal guardians and school administrations. However, they do not mention whether child assent was obtained. Ethical standards for research with minors require that children provide assent in addition to parental consent. The authors should explain whether assent was obtained and, if so, how it was documented.

Compensation: Was any compensation or incentive offered for participation? This information should be included.

Data Privacy: The authors state that data were "anonymised and stored on the responsible researcher's personal computer." Storing research data on a personal computer raises security concerns. Were there any data security protocols (encryption, password protection, access controls)? Was the study approved by an ethics committee that reviewed data security procedures?

Specific Comments and Questions

Introduction

Lines 43-46: "A primary mechanism for achieving these targets... is the establishment of regular physical activity habits." This statement needs a citation. While intuitive, the link between physical activity habits and SDG 3 targets should be supported by evidence.

Lines 51-53: "accurately assessing the psychological antecedents of youth behaviour is critical for designing sustainable health interventions." This is a key premise of the study. The authors should provide more evidence that attitude assessment leads to better intervention design. Are there studies showing that interventions informed by attitude measurement are more effective than those that are not?

Lines 59-61: "Meta-analyses confirm that attitude serves as the primary predictor of physical activity intentions among adolescents." Which meta-analyses? Please provide citations. The statement is strong and requires robust support.

Lines 70-73: The description of the original ATSS is insufficient. Readers need to know:

How was the original scale developed?

What was the theoretical basis for the three-factor structure?

What were the original psychometric properties beyond Cronbach's alpha?

Has the scale been used in any intervention studies?

Lines 77-79: "Because middle schoolers are at a distinctly different stage of cognitive, emotional, and behavioural development, applying the high school version to them risks measurement errors." This is the central justification for the study. The authors should elaborate on these developmental differences. Specifically, how do cognitive, emotional, and behavioural development differ between early and late adolescents, and how might these differences affect responses to attitude items?

Methods

Section 2.1: The description of participant recruitment is overly detailed in some places and insufficient in others. The narrative about making announcements in schoolyards and forming groups of five students is interesting but not essential. What is missing is:

The total number of students approached

The number of people who returned consent forms

The number who were invited to participate

The number who actually participated (577) versus the target (600)

Reasons for non-participation

Lines 88-90: "stratified purposive sampling method was used in 15 public schools selected from Marmaris in Mugla, Turkiye, where the researchers reside, to ensure representation across all grade levels." This is contradictory. Purposive sampling is non-probability sampling; "ensure representation" suggests an attempt at representativeness. Please clarify: Was this a convenience sample of schools accessible to the researchers, or was there an attempt to randomly select schools from a sampling frame?

Table 1: The table presents demographic data, but lacks:

The range of ages within each grade

Socioeconomic indicators (parental education, income, etc.)

Urban/rural classification of schools

Comparison of sample characteristics to national statistics (to assess representativeness)

Lines 100-105: "During 5-minute preliminary interviews, a conversation took place regarding the research, the researchers, and how to contribute to the study." The word "con tribute" appears with a space, indicating a formatting error.

Section 2.2.1: The description of the think-aloud protocol is detailed, which is commendable. However:

Who conducted the think-aloud sessions? Were they the authors or trained research assistants?

Was there a structured interview protocol with specific probes?

Were sessions video-recorded or only audio-recorded? A video might capture nonverbal cues of confusion.

Lines 116-118: "The adaptation process ensured that the items reflect the psychomotor and social development characteristics of middle school students." How was this ensured? Through the think-aloud protocol? Through expert review? This statement needs support from the qualitative findings.

Section 2.3.1: The qualitative sample of 27 students is described, but:

Were these students from the same schools as the quantitative sample?

Did any participate in both the qualitative and quantitative phases?

How was the "mean age of 156.3 months" calculated? This is an unusual presentation (typically reported in years).

Section 2.3.2: "Each session began with a five-minute preliminary interview... This was followed by a warm-up exercise where students practised verbalising their internal thought processes." What was this warm-up exercise? Providing examples would help readers understand the protocol.

Lines 143-145: "The qualitative data were evaluated through face-to-face consultations with an expert panel comprising physical education teachers and developmental psychologists." How many experts? What were their qualifications? Were they independent of the research team?

Section 2.4: The data analysis section is well-structured but missing:

The criteria for acceptable model fit (e.g., CFI > .90, RMSEA < .08) should be stated before presenting results

The method for calculating composite reliability (was it Fornell & Larcker's formula or something else?)

The significance level for all analyses (presumably α = .05)

Whether any corrections for multiple comparisons were applied

Results

Lines 251-252: "251 participants fully satisfied the foundational assumptions" – this sentence is confusing. Does this refer to the 251 participants mentioned earlier? The final sample is 531, so this number doesn't align with it.

Section 3.1: The fit indices are excellent, but:

The chi-square value (693.582 with df=272) is significant, which is expected with large samples. The authors correctly note this.

Were any alternative models tested (e.g., one-factor, two-factor, bifactor)?

Was there any evidence of method effects (e.g., negatively worded items forming a separate factor)?
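The chi-square value quoted above also allows a quick plausibility check on the RMSEA interval flagged earlier as a likely typographical error. The sketch below uses the standard point-estimate formula with the reported chi-square (693.582), df (272), and final sample size (531). It assumes the N - 1 convention, and the robust RMSEA produced under MLR may differ slightly from this value.

```python
from math import sqrt


def rmsea_point_estimate(chi_sq, df, n):
    """RMSEA = sqrt(max(chi_sq - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))


rmsea = rmsea_point_estimate(chi_sq=693.582, df=272, n=531)
print(round(rmsea, 3))  # 0.054
```

A point estimate near 0.054 is consistent with an interval of [0.049, 0.059] and cannot fall inside [0.49, 0.59].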

Table 2: Several issues with this table:

The item numbers are not sequential (40, 70, 80, 90, 120, etc.). What do these numbers represent?

Factor loadings for items 1, 3, 5, 6, 17, and 22 are listed under Factor 3, but the table layout makes this difficult to follow

R² values are provided but not discussed; are these acceptable (>.30 is typical)?

The CR values (.970, .925, .936) are extremely high, suggesting possible item redundancy. The authors should discuss whether the scale could be shortened.
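The point about very high CR values can be illustrated with the Fornell and Larcker formulas for composite reliability and average variance extracted, computed from standardized loadings. The loadings below are hypothetical and are not taken from Table 2; they only show that CR rises with the number of items even when every loading is the same.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # error variance of a standardized item
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical subscales with identical loadings but different lengths.
six_items = [0.8] * 6
thirteen_items = [0.8] * 13

print(round(composite_reliability(six_items), 3))
print(round(composite_reliability(thirteen_items), 3))
print(round(average_variance_extracted(six_items), 2))
```

Because CR grows with the item count while AVE does not, a CR near .97 on a 13-item factor is weaker evidence of quality than the number suggests, which is why a shortened form is worth discussing.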

Lines 323-324: "the robust internal consistency metrics imply high composite reliability" – this is circular. Composite reliability IS an internal consistency metric. Please revise.

Section 3.2: "Since no correlation coefficient exceeded the severe multicollinearity threshold of 0.85, it was concluded that the sub-dimensions measure distinct, albeit related, aspects." This is a reasonable conclusion, but correlations of .72-.80 are quite high. The authors should report the confidence intervals around these correlations and consider testing a higher-order factor model.

Table 3: The known-groups comparison is clear and well-presented. However:

The means for licensed students (55.32 for Interest, 24.82 for Lifestyle, 20.54 for Participation) suggest ceiling effects, especially given the maximum possible scores (Interest max = 65 [13 items × 5], Lifestyle max = 30 [6×5], Participation max = 30 [6×5]). The authors should examine whether the scale can detect improvement over time or if it suffers from range restriction.

The standard deviations are large, indicating substantial within-group variability.
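As a rough way to read the ceiling concern, the licensed-group means can be expressed as a share of each subscale's maximum score. The means and item counts are those quoted above; the helper name is illustrative. A mean-based share is only a proxy, since a proper ceiling check looks at the proportion of respondents scoring at the maximum.

```python
MAX_PER_ITEM = 5  # 5-point response format, as implied by the maxima quoted above

# (licensed-group mean, number of items) for each subscale
subscales = {
    "Interest": (55.32, 13),
    "Lifestyle": (24.82, 6),
    "Participation": (20.54, 6),
}


def percent_of_max(mean_score, n_items, max_per_item=MAX_PER_ITEM):
    """Express a subscale mean as a percentage of its maximum possible score."""
    return 100 * mean_score / (n_items * max_per_item)


pct = {name: round(percent_of_max(m, k), 1) for name, (m, k) in subscales.items()}
print(pct)  # {'Interest': 85.1, 'Lifestyle': 82.7, 'Participation': 68.5}
```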

Table 4: The correlation table has formatting issues. The R values and p-values are not clearly aligned with the variables. Additionally:

The correlations with DMPAT are very weak (all < .13), and some are not significant. The authors should discuss why attitudes do not predict mandatory physical activity. This is theoretically interesting but requires explanation.

The correlations with WVPAT are moderate (.422-.523), supporting criterion validity.

Table 5: The correlations are remarkably high:

ATSS total with PPLS total: r = .923 (85% shared variance)

ATSS total with PAQ-C: r = .845 (71% shared variance)

These high correlations raise concerns about discriminant validity. The authors should report the AVE for each construct and compare it to the squared correlations between constructs (Fornell-Larcker criterion). I suspect the AVE may be lower than the squared correlations, indicating poor discriminant validity.
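The shared-variance figures and the Fornell-Larcker comparison can be sketched as follows. The correlations are the ones quoted above; the AVE values are hypothetical placeholders, because the construct-level AVEs are not reported here. The criterion is normally applied to latent correlations, so using observed total-score correlations is a simplification.

```python
def shared_variance(r):
    """Proportion of variance two measures share, r squared."""
    return r ** 2


def fornell_larcker_holds(ave_a, ave_b, r):
    """True when both AVEs exceed the squared correlation between the constructs."""
    r_squared = r ** 2
    return ave_a > r_squared and ave_b > r_squared


for label, r in [("ATSS vs PPLS", 0.923), ("ATSS vs PAQ-C", 0.845)]:
    print(label, round(shared_variance(r), 3))
# prints: ATSS vs PPLS 0.852, then ATSS vs PAQ-C 0.714

# Hypothetical AVEs of 0.70 for both constructs.
print(fornell_larcker_holds(0.70, 0.70, 0.923))  # False
```

With r = .923, both constructs would need an AVE above about .85 to satisfy the criterion.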

Lines 374-376: "the overall ATSS score demonstrated a remarkably high correlation with overall physical literacy (r = .923) and physical activity levels (r = .845)." The word "remarkably" is inappropriate in scientific writing. Replace with neutral language (e.g., "a strong correlation").

Discussion

Lines 394-396: "Early adolescence is the most critical developmental threshold at which individuals either transform physical activity habits into a lifelong lifestyle or completely abandon sports." This overgeneralizes. Many people adopt physical activity in later adolescence or adulthood. Please revise to acknowledge variability in developmental trajectories.

Lines 401-404: "High factor loadings and Composite Reliability (CR) values confirm the psychometric power of the instrument." The term "psychometric power" is vague. What does this mean? Statistical power to detect effects? Reliability? Please be specific.

Lines 405-408: "Although the cognitive capacities of early adolescents are based on more concrete operations compared to late adolescents, the multidimensional nature of attitudes toward sport remains unchanged." This statement references Piagetian theory but doesn't cite it. Also, the link between concrete operations and attitude structure is not explained. How would concrete vs abstract thinking affect the factor structure?

Lines 415-420: The discussion of physical literacy is well-connected to the findings. However, the authors claim a "very strong match" between ATSS Interest and PPLS cognitive/affective dimensions. Given the extremely high correlations, this match may be too strong, suggesting the scales measure the same thing.

Lines 422-426: "As emphasised in current healthcare publications [53-61], internalising physical literacy and a positive attitude towards sport at an early age is the strongest preventive medicine intervention against global health crises." This overstates the evidence. Physical activity is important, but calling it the "strongest" intervention is hyperbolic. Nutrition, vaccination, sanitation, and other interventions are equally or more important.

Lines 433-436: The discussion of Self-Determination Theory is appropriate, but the authors should explain more clearly how the ATSS dimensions map onto autonomous vs controlled motivation. Which items capture autonomous motivation? Which capture controlled motivation?

Lines 443-447: "our adapted ATSS offers precisely this opportunity to diagnose this level of 'internalisation' at an early age." The term "diagnose" is inappropriate for an attitude scale. Diagnosis implies clinical assessment of pathology. Please use "assess" or "measure."

Section 4.1 (Limitations): The limitations section acknowledges the geographic restriction and cross-sectional design. However, several important limitations are omitted:

Reliance on self-report for physical activity

High correlations suggest poor discriminant validity

Potential ceiling effects

Lack of test-retest reliability

No examination of measurement invariance across gender or grade

Social desirability bias in attitude reporting

Common method variance (all measures were self-report questionnaires)

Lines 452-457: "Future research should examine the predictive validity of the ATSS by tracking whether early adolescent attitude scores predict sustained physical activity participation in later adolescence and early adulthood." This is an excellent suggestion. The authors should also recommend:

Testing measurement invariance across cultures

Examining sensitivity to change following interventions

Establishing clinically meaningful cutoff scores

Comparing self-reports with parent/teacher reports

Validating against objective physical activity measures

Summary and Recommendation

This manuscript presents a methodologically ambitious, largely well-executed study that adapts the Attitude Towards Sport Scale for early adolescents. The mixed-methods approach, combining cognitive think-aloud protocols with rigorous psychometric analysis, is a significant strength that should be emulated in scale adaptation research. The authors demonstrate strong internal consistency, good model fit, and meaningful correlations with theoretically related constructs.

However, several critical issues must be addressed before the manuscript can be considered for publication:

Essential Revisions:

Provide complete descriptions of all measures, including item counts, response formats, scoring procedures, and psychometric properties in the current sample.

Address the concern about discriminant validity raised by the extremely high correlation with physical literacy (r = .923). Either demonstrate that the constructs are empirically distinguishable or reconceptualise the ATSS as a measure of sport-related attitudes within the broader physical literacy framework.

Clarify the sampling procedures and discuss generalizability limitations more thoroughly. Provide demographic comparisons between the sample and national statistics.

Report missing data handling procedures and conduct sensitivity analyses to determine whether outlier exclusion influenced results.

Provide more detail on the qualitative analysis, including coding procedures, inter-rater reliability, and evidence of saturation.

Correct the decimal error in the RMSEA confidence interval (0.49 should be 0.049).

Remove duplicate paragraphs in Section 2.2.3 and proofread thoroughly for grammatical errors.

Temper claims about SDG 3 contributions and the scale's role in preventive medicine. Acknowledge the indirect pathway from attitude measurement to health outcomes.

Add a statement about child assent to the ethical approval section.

Include the missing supplementary materials (Table S2 with item revisions and Figure S1 with the path diagram) or integrate this information into the main text.

Recommended Additional Analyses:

Test a one-factor model and compare it to the three-factor model using chi-square difference tests.

Examine measurement invariance across gender and grade level to ensure the scale functions equivalently for all subgroups.

Calculate the Fornell-Larcker criterion to test discriminant validity against the PPLS formally.

Conduct exploratory factor analysis on a randomly selected subset of the sample to verify the factor structure, then confirm on the remaining subset.

Report modification indices and discuss any theoretically justified model modifications.

Provide descriptive statistics (means, SDs, ranges) for all ATSS items and subscales.

Presentation Improvements:

Reformat Table 2 to clearly show which items belong to which factor.

Make sure all figures and tables are referenced in the text before they appear.

Provide complete table legends explaining all abbreviations and symbols.

Include the path diagram (Figure S1) in the main manuscript rather than as supplementary material.

Add a participant flow diagram showing recruitment, consent, participation, and exclusion.

Author Response

First of all, I would like to thank Reviewer 1 for helping to raise the quality of the article. All corrections are highlighted in blue in the main text, and a detailed point-by-point report is provided below.

Comment 1: "The sample is drawn exclusively from 15 schools in Marmaris, Mugla, Turkey—a specific geographic region that may not represent the broader Turkish early adolescent population. While the authors describe using "stratified purposive sampling" to ensure grade-level representation, there is no information about: How were these 15 schools selected from all available schools in the region; The socioeconomic characteristics of the school districts; The urban/rural distribution of participating schools; The response rate (consent forms distributed vs returned); Whether non-participants differed systematically from participants. This limitation is acknowledged in the discussion but requires expansion. The authors should clarify whether the findings can be generalised to all Turkish middle school students or only to those in similar coastal, tourist-economy regions."

Response: We sincerely thank the reviewer for pointing out these vital sampling details. We completely agree that the geographic specificity of Marmaris (a coastal, tourism-driven district) places natural boundaries on the immediate generalizability of our findings. We also acknowledge that detailing the school selection process, response rates, and demographic distributions is essential for methodological transparency.

Action Taken:
1. We have significantly expanded the "Study Design and Participants" section in the Methods. We added information clarifying that the 15 public schools were selected in coordination with local educational authorities to encompass both central (urban) and peripheral (rural) districts, aiming for socioeconomic diversity within the region. We also reported the approximate response rate: 600 consent forms were distributed to reach the initial target sample of 600, and 577 were returned (a return rate of about 96%). Furthermore, we clarified that due to strict ethical guidelines regarding voluntary participation and anonymity, demographic data could not be collected from non-participants, precluding a systematic comparison.
2. We have substantially expanded the Limitations section to explicitly address the reviewer's concern regarding generalizability. We clearly stated that because Marmaris is a coastal region with a tourist economy and potentially different recreational infrastructure, the findings should not be automatically generalized to the entire Turkish early adolescent population. We emphasized that the current validation applies most robustly to similar socio-economic and geographic contexts, and we recommended national replication across diverse (e.g., inland, eastern) regions.

Comment 2: "Insufficient Detail on Criterion Measures. The criterion-related validity analyses rely on three external measures that are inadequately described... PAQ-C: no information about the number of items, domains, scoring system, psychometric properties in the current sample, whether it measures same/complementary constructs. PPLS: lacks the number of items per dimension, response format, validity in this specific sample, whether original or adapted. Personal Information Form: The questions about physical activity duration rely on self-report and recall... No validation of these self-reports against objective measures (e.g., accelerometry). Given that early adolescents may have difficulty with accurate time estimation, this limitation should be acknowledged."

Response: We highly appreciate the reviewer’s detailed feedback on the instrumentation. We agree that the descriptions of the criterion measures were overly brief, missing crucial psychometric and structural details necessary for replication and clarity. Furthermore, the reviewer raises a very valid point regarding the cognitive limitations of early adolescents in accurately estimating physical activity durations, a limitation inherent to retrospective self-report methods.

Action Taken:
1. Expanded Instrument Descriptions: We have thoroughly revised Sections 2.2.2 (PAQ-C) and 2.2.3 (PPLS).
For the PAQ-C, we added the number of items (9 scored items), the domains assessed (moderate-to-vigorous activity across various school and free-time contexts), its 1-to-5 scoring system, and clarified that it measures a complementary behavioral construct (actual activity) rather than a purely attitudinal one. Most importantly, we reported the internal consistency specifically calculated from our current sample.
For the PPLS, we included the total number of items, the dimensions, the 5-point Likert response format, confirmed that the validated Turkish adapted form was utilized, and reported its reliability coefficient in the present study.

2. Addressed Recall Bias in Limitations: We have expanded the Limitations section to explicitly acknowledge the cognitive difficulties early adolescents face regarding accurate time estimation. We clearly stated that relying on retrospective self-reports for physical activity durations (minutes/hours) is prone to recall bias, reinforcing the necessity for future studies to incorporate objective measurement tools like accelerometers.

Comment 3: "Statistical Analysis Concerns. Handling of Missing Data: Were these excluded cases systematically different from retained cases? Did the authors conduct sensitivity analyses...? Was there any missing data on individual items, and if so, how was it handled? CFA Reporting: The chi-square difference test comparing the three-factor model to a one-factor model would strengthen the argument... Modification indices are not reported... Cross-loadings were not explored... The authors report using MLR but do not discuss the pattern of non-normality... Correlation Interpretations: correlation between ATSS and PPLS is r = .923... suggests scales may be measuring essentially the same underlying construct. Known-Groups Validity: The t-test comparisons... do not control for potential confounders such as Grade level, Gender, SES. A more rigorous approach would be to use ANCOVA..."

Response: We are extremely grateful for this rigorous statistical review. The reviewer has identified several critical areas where our methodological reporting and analytical depth needed significant enhancement. We have addressed each of these advanced statistical concerns comprehensively.

Action Taken:
1. Handling of Missing Data: We updated the Data Analysis section to clarify that missing data on individual items were handled via listwise deletion (resulting in 18 excluded cases), followed by the removal of 28 multivariate outliers. We added a statement confirming that a preliminary sensitivity check revealed no systematic demographic differences (gender or grade) between the excluded cases and the retained sample, ensuring the data were missing completely at random (MCAR).

2. CFA Reporting & Multidimensionality: In the Results section, we have now reported the fit indices for a unidimensional (one-factor) model. As expected, the single-factor model demonstrated poor fit, statistically confirming the superiority and necessity of the three-factor structure. Furthermore, we explicitly stated that no post-hoc modification indices (e.g., correlating error terms) or cross-loadings were freely estimated, preserving the strict theoretical integrity of the construct. The rationale for the MLR estimator (acceptable univariate skewness/kurtosis boundaries) was also explicitly added to the Methods section.

3. Correlation Interpretations (r = .923): We completely agree that a correlation of .923 approaches multicollinearity. In the Discussion and Limitations sections, we addressed this directly. We articulated that while "Attitude" (affective/cognitive inclination) and "Physical Literacy" (perceived competence/knowledge) are theoretically distinct concepts, the exceptionally high correlation is likely inflated by common-method bias and shared method variance, as both were measured concurrently via self-report. We urged future researchers to use objective measures to disentangle this conceptual overlap.

4. Known-Groups Validity (ANCOVA): Following the reviewer’s suggestion, we supplemented the independent t-tests with an Analysis of Covariance (ANCOVA). We tested the main effect of athletic license status on ATSS scores while strictly controlling for the confounding effects of Gender and Grade level. The ANCOVA results remained statistically significant (Table S7a), confirming that active athletic status robustly predicts higher sport attitudes even after adjusting for these demographic confounders. Additionally, due to strict ethical regulations by the Ministry of National Education regarding student privacy, direct data on family socioeconomic status (SES) could not be collected, precluding its use as a statistical control variable.
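
The sample accounting described in item 1 above can be checked with simple arithmetic. The following is a minimal sketch only: the counts (600 consent forms distributed, 577 returned, 18 listwise deletions, 28 multivariate outliers) are those reported in this record, while the function and key names are hypothetical and introduced purely for illustration.

```python
def sample_flow(distributed, returned, listwise_deleted, outliers_removed):
    """Track how the sample size changes at each exclusion step."""
    after_listwise = returned - listwise_deleted
    final_n = after_listwise - outliers_removed
    return {
        "response_rate": returned / distributed,          # share of forms returned
        "after_listwise": after_listwise,                 # cases left after listwise deletion
        "final_n": final_n,                               # cases left after outlier removal
        "total_excluded": listwise_deleted + outliers_removed,
    }


# Counts taken from the response letter; names are illustrative assumptions.
flow = sample_flow(distributed=600, returned=577,
                   listwise_deleted=18, outliers_removed=28)
print(flow)
```

Reporting each step in this form makes it straightforward to confirm that the stated exclusions (18 + 28 = 46) lead from the 577 returned forms to the final analytic sample of 531.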

Comment 4: Conceptual and Theoretical Issues. "Definition of 'Early Adolescence': The authors define the target population as ages 10-15... Has the scale been tested for factorial invariance across grade levels? Theory of Planned Behaviour and SDT Integration: The introduction discusses both TPB and SDT as theoretical foundations, but does not clearly articulate how the ATSS dimensions map onto these theories... SDG 3 Framing: The pathway from attitude measurement to improved population health is indirect... The authors should temper claims about the scale's role in achieving SDG 3 targets."

Response: We highly appreciate the reviewer’s profound theoretical insights. The reviewer correctly identifies that navigating the cognitive and physical span from ages 10 to 15 requires robust invariance testing. We also completely agree that the theoretical mapping of the ATSS-EA dimensions onto the TPB and SDT frameworks was previously underdeveloped, and that our phrasing regarding the scale’s direct impact on SDG 3 targets needed to be tempered to reflect an indirect, foundational pathway.

Action Taken:

Measurement Invariance: To address the developmental span of early adolescence, we conducted configural and metric measurement invariance testing across grade levels. The results confirmed that the scale's factorial structure holds invariant across this cognitive and developmental range. We have added these results to the revised manuscript and provided the details in Supplementary Tables S8 and S9.

Theoretical Integration: We have expanded the Introduction section to explicitly map the scale's dimensions to the theoretical frameworks. We added a paragraph explaining how the Interest dimension aligns with affective attitudes (TPB) and intrinsic motivation (SDT); how Participation relates to behavioral intentions/control (TPB) and competence (SDT); and how Lifestyle reflects integrated regulation (SDT).

SDG 3 Framing Tempered: We reviewed the entire manuscript (including the Abstract, Introduction, and Conclusion) to tone down our claims regarding SDG 3. We replaced definitive wording with more cautious academic phrasing, clarifying that assessing attitudes provides an indirect pathway and serves as a foundational screening step toward broader population health outcomes, rather than a direct health intervention.

Comment 5: Methodological Rigour in the Qualitative Phase.
"Sample Size Justification: Did thematic saturation determine this number?... Analysis of Qualitative Data: Who conducted the coding? What was the framework? Inter-rater reliability?... Retrospective Probing: Why was concurrent probing not used? Recall bias?... Revision Documentation: Table S2 (referenced but not provided) supposedly shows original and revised items. Were revisions minor or substantial?"

Response: We sincerely thank the reviewer for this rigorous evaluation of our qualitative methodology. The reviewer is absolutely correct that critical details regarding data saturation, the coding framework, the rationale for our probing strategy, and the specific documentation of item revisions were insufficiently detailed in the original manuscript. We have addressed each of these methodological points to ensure full transparency and replicability of the cognitive interview phase.

Action Taken:

Sample Size & Saturation: We clarified in Section 2.3 that thematic saturation (where no new comprehension problems emerged) was actually reached around the 20th interview. However, we deliberately extended the sample to 27 to guarantee balanced representation across all developmental stages (Grades 5 through 8).

Coding Framework & Reliability: We explicitly detailed the coding process in the manuscript. We specified that two independent researchers coded the transcripts based on Willis’ (2005) cognitive interviewing framework (Comprehension, Retrieval, Judgment, Response). We reported a high initial inter-rater agreement and clarified that any discrepancies were resolved via consensus with a third expert.

Rationale for Retrospective Probing: We added a strong methodological justification for using retrospective rather than concurrent probing. We explained that for early adolescents, concurrent probing can severely disrupt the natural cognitive flow and artificially inflate cognitive load. Retrospective probing, conducted immediately after questionnaire completion, minimizes this disruption while keeping the recall period extremely short, thereby mitigating recall bias.

Revision Documentation (Table S2b): We apologize for the previous omission of the revision table. To provide complete transparency regarding the nature of the changes, we have now included a comprehensive table showing the "Original statements" and the "Revised statements". To align with our new supplementary structure, this is now provided as Supplementary Table S2b (and clearly referenced in the main text).

Comment 6: Presentation and Reporting Issues.
"Incomplete Tables: Table 2 presents factor loadings... but does not show Standard errors, t-values/z-statistics, p-values... Table Formatting: Table 2 lists items with numbers (e.g., "40", "70", "80")... Readers cannot determine what each item measures. Duplicate Text: Section 2.2.3 contains duplicate paragraphs... Figure Reference: Figure S1 is mentioned... but is not included. Confidence Interval Error: In Table 2, the RMSEA... is reported as "[0.49, 0.59]", but it should be "[0.049, 0.059]"... Page Numbering: The PDF shows page numbers restarting at "43"..."

Response: We are incredibly grateful to the reviewer for their exceptionally meticulous proofreading of our manuscript. The reviewer is absolutely correct on all counts; these formatting inconsistencies, typographical errors, and layout issues were oversights during the final manuscript preparation phase. We have systematically corrected all the presentation issues to meet the journal's high publication standards.

Action Taken:

Incomplete Tables & Table Formatting: We have comprehensively revised Table 2. We removed the raw dataset codes (e.g., "40", "70") and replaced them with sequential, standardized item identifiers. The full wording for these items is now accessible in the manuscript.

Duplicate Text: We removed the inadvertently duplicated paragraph in Section 2.2.3 (Perceived Physical Literacy Scale).

Figure Reference: We apologize for omitting the figure. The CFA path diagram, illustrating the three-factor structure and standardized estimates, has now been properly inserted into the document as Supplementary Figure S1.

Confidence Interval Error: Thank you for catching this critical typographical error. We have corrected the RMSEA 90% confidence interval from "[0.49, 0.59]" to its accurate value of "[0.049, 0.059]" in both the text and the corresponding table.

Page Numbering: The document's internal pagination settings have been reformatted to ensure page numbers begin at 1 and run continuously throughout the manuscript.

Comment 7: Language and Writing Quality. "The manuscript contains numerous grammatical errors, awkward phrasing, and redundant statements that require revision: Examples: Page 1, Highlights: 'The scale scores strongly predict...' - awkward. Page 2, Abstract: second sentence becomes convoluted. Page 3: section heading formatting is inconsistent. Page 4: 'no specific exclusion criteria' phrasing is unclear. Page 5: duplicate paragraphs. Page 7: '251 participants fully satisfied...' seems out of place. Page 8: 'proving it is capable...' - awkward. Page 11: 'diagnose this level of internalisation' - overstates capability."

Response: We sincerely thank the reviewer for their exceptionally thorough reading of our manuscript. Pointing out these specific typographical errors, awkward phrasings, and leftover artifacts from previous drafts has been immensely helpful in refining the final text. We completely agree with all the suggested linguistic and structural changes.

Action Taken: We have conducted a comprehensive proofreading and editing pass of the entire manuscript, specifically addressing all the reviewer's examples:

Highlights & Abstract: We revised the Highlights to read "The scale scores are strong predictors of..." and simplified the convoluted second sentence in the Abstract for better readability.
Formatting & Duplications: We corrected the inconsistent section heading numbering on Page 3 and removed the duplicate paragraph in Section 2.2.3.
Clarifications (Exclusion Criteria & Out-of-place sentences): In Section 2.1, we clarified the phrasing to state that "All students were initially eligible; however, cases with missing data or outliers were subsequently excluded." Furthermore, we identified and deleted the out-of-place sentence regarding "251 participants" on Page 7, which was a typographical artifact from an earlier exploratory analysis draft.
Toning Down Absolute Language: Following the reviewer's excellent suggestions, we replaced "proving it is capable" with "demonstrating its capacity" (Page 8). Similarly, on Page 11, we changed the medicalized word "diagnose" to "assess," acknowledging that a self-report scale assesses rather than clinically diagnoses an attitude.

Comment 8: Ethical Considerations.
"Assent: The authors state that written informed consent was obtained from legal guardians... The authors should explain whether assent was obtained... Compensation: Was any compensation or incentive offered for participation? Data Privacy: The authors state that data were 'anonymised and stored on the responsible researcher's personal computer.' ... Were there any data security protocols (encryption, password protection)? Was the study approved by an ethics committee that reviewed data security procedures?"

Response: We sincerely thank the reviewer for highlighting these critical ethical parameters. The reviewer is absolutely correct that our reporting of the ethical procedures was incomplete and that the phrase "personal computer" was an overly casual and inaccurate description of our data security measures. We have completely revised the ethical considerations section to accurately reflect the strict protocols approved by our institutional ethics committee.

Action Taken: We have substantially expanded the Ethical Considerations / Data Collection section of the manuscript to address all three points:

Assent & Compensation: We explicitly added that in addition to parental consent, direct verbal and written assent was obtained from all participating early adolescents (Appendix F). We also clarified that participation was strictly voluntary and that absolutely no financial, academic, or material compensation was offered to the students or schools.

Data Privacy & Security: We removed the inaccurate "personal computer" phrasing. We updated the text to accurately state that all data were anonymized using numerical identifiers and stored on a password-protected, encrypted local drive accessible only to the core research team. Furthermore, we explicitly confirmed that these specific data security procedures and privacy protocols were comprehensively reviewed and approved by the Muğla Sıtkı Koçman University Ethics Committee prior to any data collection.

Comment 9: Specific Comments and Questions (Introduction).
"Lines 43-46: 'A primary mechanism...' This statement needs a citation... Lines 51-53: The authors should provide more evidence that attitude assessment leads to better intervention design... Lines 59-61: 'Meta-analyses confirm...' Which meta-analyses? Please provide citations... Lines 70-73: The description of the original ATSS is insufficient. Readers need to know: How was the original scale developed? Theoretical basis? Original psychometric properties? ... Lines 77-79: The authors should elaborate on these developmental differences. Specifically, how do cognitive, emotional, and behavioural development differ between early and late adolescents...?"

Response: We sincerely appreciate the reviewer’s close reading of our Introduction. The reviewer is absolutely correct that several of our foundational claims required stronger empirical citations, and that both the background of the original ATSS and our developmental rationale for the adaptation needed deeper elaboration. We have systematically addressed all these points to strengthen the theoretical framing of the manuscript.

Action Taken:

Missing Citations Added: We have provided robust citations for our claims. We added WHO guidelines to support the SDG 3 pathway, cited behavioral research to confirm that attitude-informed/tailored interventions outperform generic ones, and explicitly referenced key meta-analyses confirming attitude as the primary predictor of physical activity intentions.

Elaboration on the Original ATSS: We expanded the description of the original ATSS (Şentürk, 2015). We clarified that it was developed via a mixed-methods approach grounded in TPB, reported its original strong construct validity indices (CFA fit indices beyond just Cronbach’s alpha), and noted its extensive use in descriptive monitoring studies.

Elaboration on Developmental Differences: We added a dedicated paragraph detailing the cognitive and emotional differences between early and late adolescents. We explicitly discussed the transition from concrete to formal operational thinking, shorter attention spans, and how abstract wording in high school instruments imposes an undue cognitive load on middle schoolers, thereby justifying the necessity of this adaptation to prevent measurement error.

Comment 10: Methods (Section 2.1 to 2.4). The reviewer provided a detailed list of methodological queries, including: requests for exact recruitment numbers; clarification on sampling terminology; missing demographic indicators (age range, SES) in Table 1; a formatting typo ("con tribute"); specifics regarding the think-aloud protocol (who conducted it, recordings, warm-up examples); qualitative sample details (overlap, month-to-year conversion); expert panel composition; and missing statistical criteria in the data analysis section (model fit thresholds, CR formula, alpha level).

Response: We are profoundly grateful to the reviewer for this exceptionally detailed and constructive methodological critique. The reviewer successfully identified several areas where our narrative was overly anecdotal while simultaneously lacking essential operational details and statistical parameters. We have systematically addressed every point raised to ensure the Methods section meets the highest standards of transparency and replicability.

Action Taken:

Recruitment & Sampling (Section 2.1): We removed the anecdotal descriptions of schoolyard announcements. We clarified the sampling strategy, stating it was a convenience sample at the district level, with grade-level stratification.

Table 1 & Typos: We corrected the "con tribute" typo. Regarding the SES indicator in Table 1, strict national ethical guidelines prohibit collecting direct SES data from minors, which we have now explicitly noted as a limitation rather than an omission in the table.

Qualitative Phase Details (Sections 2.2.1 & 2.3): We clarified that the primary researchers conducted the think-aloud sessions. We specified that sessions were only audio-recorded to prevent the anxiety and reactivity associated with video-recording minors. We changed the unusual "156.3 months" metric to standard years (M = 13.02 years). We explicitly stated that these 27 students were excluded from the main quantitative sample to prevent test-retest bias. We added a concrete example of the warm-up exercise (asking students to think aloud while counting windows in their house). We detailed the expert panel composition: four independent professionals (two PE teachers, two developmental psychologists).

Data Analysis Parameters (Section 2.4): We updated the data analysis section to explicitly state the fit index thresholds (e.g., CFI/TLI > .90, RMSEA < .08), citing Hu & Bentler (1999). We also confirmed that CR was calculated using Fornell and Larcker's (1981) formula, and that the significance level was uniformly set at alpha = .05.
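
For readers unfamiliar with the reliability formula named above, the following is a minimal sketch of composite reliability (CR) and average variance extracted (AVE) in the Fornell and Larcker (1981) form, assuming standardized factor loadings and uncorrelated error terms. The loadings shown are hypothetical placeholders, not values from the manuscript, and the function names are illustrative.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each error variance is 1 - loading^2.
    """
    total = sum(loadings)
    error_variance = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for one factor (not from the study).
example_loadings = [0.80, 0.80, 0.80, 0.80]
cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
```

Because CR rises with both the size and the number of loadings, a factor with many uniformly high loadings will approach 1, which is why very high CR values are sometimes read as a sign of item redundancy.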

Comment 11: Results (Section 3).
The reviewer identified several issues in the Results section, including: an out-of-place sentence ("251 participants"); alternative models not reported; formatting and layout issues in Table 2; exceptionally high CR values suggesting item redundancy; a circular phrasing regarding CR; weak correlations with DMPAT needing theoretical explanation; questions regarding discriminant validity given the high correlations (r=.923); and the use of the non-neutral word "remarkably".

Response: We are deeply grateful for this comprehensive breakdown of the Results section. The reviewer's sharp eye caught several remnants from previous drafts, as well as excellent theoretical nuances (such as the divergence between mandatory and voluntary physical activity) that significantly elevate the quality of our Discussion.

Action Taken:

Previously Addressed Edits: As detailed in our responses to earlier comments (Comment 7 regarding typos, Comment 6 regarding Table 2 formatting, and Comment 3 regarding alternative 1-factor model testing and the r=.923 common-method bias issue), we have already deleted the anomalous "251 participants" sentence, reformatted Table 2 sequentially (Items 1-25) with all missing statistical values, reported the poor fit of a 1-factor model, and extensively discussed the conceptual overlap/method bias causing the .923 correlation.
Phrasing Corrections: We corrected the circular phrasing ("imply high composite reliability") to "indicate excellent internal consistency and construct reliability." We also replaced the subjective term "remarkably" with the neutral term "strong".
High CR and Item Redundancy: We completely agree that CR values exceeding .930 indicate potential item redundancy. We have added a statement to the Limitations and Future Directions section explicitly discussing this and recommending the future development of a brief/short-form version of the ATSS-EA.
Theoretical Explanation of DMPAT vs. WVPAT: The reviewer correctly identified a fascinating theoretical divergence. We have expanded the Discussion section to explain why the ATSS-EA successfully predicts voluntary activity (WVPAT) but not mandatory activity (DMPAT). We clarified that mandatory activities (like active commuting or compulsory classes) are driven by external constraints and obligations, whereas voluntary participation is naturally predicated on the intrinsic positive attitudes measured by our scale.

Comment 12: Discussion and Limitations.
The reviewer provided a comprehensive critique of the Discussion and Limitations sections, identifying: overgeneralizations regarding adolescence; vague terms ("psychometric power"); a missing Piaget citation and explanation of concrete operations; hyperbolic claims regarding preventive medicine; the need to map SDT dimensions (autonomous vs. controlled); the inappropriate use of the word "diagnose"; and several omitted limitations and future research directions (e.g., test-retest, social desirability, cross-cultural invariance, cutoff scores).

Response: We are extremely grateful to the reviewer for this final, masterful critique. The reviewer's insights into developmental variability, theoretical precision (SDT and Piaget), and rigorous methodological limitations have significantly elevated the final maturity of this manuscript. We have carefully incorporated all of these theoretical nuances and methodological caveats.

Action Taken:

Tempering Claims & Terminology: We revised the sweeping generalizations regarding early adolescence, acknowledging that sport adoption varies across the lifespan. We replaced the vague "psychometric power" with precise terms ("construct validity and internal reliability") and softened the hyperbolic claim of sport being the "strongest" intervention to a "vital" strategy. As noted in our response to Comment 7, we also replaced the medicalized term "diagnose" with "assess."

Theoretical Elaborations (Piaget & SDT): We added the requisite citation for Piaget (1972) and explained that while concrete cognitive stages dictate simpler item wording, the underlying multidimensional attitude structure remains stable. Furthermore, we explicitly mapped the ATSS-EA dimensions onto the SDT framework, noting that Interest and Lifestyle predominantly capture autonomous motivation.

Addressing Limitations: Several of the reviewer's excellent points (such as Common Method Variance, High Correlations/Discriminant Validity, and reliance on self-reports vs. objective measures) were thoroughly addressed and integrated into the manuscript during our earlier revisions.

Expanding Future Directions: We added a concluding paragraph to the Limitations section incorporating the remaining points. We acknowledged the lack of test-retest reliability and the potential for social desirability bias. We also explicitly recommended that future researchers establish cross-cultural invariance, determine clinically meaningful cutoff scores, test sensitivity to interventions, and triangulate ATSS-EA data with parent/teacher reports.

Summary and Recommendations. "The reviewer provided a comprehensive summary checklist of Essential Revisions (e.g., descriptions of measures, discriminant validity, sampling, missing data, qualitative details, typographical errors, SDG 3 claims, child assent, and supplementary materials), Recommended Additional Analyses (one-factor model, measurement invariance, Fornell-Larcker, EFA/CFA split, modification indices, descriptive stats), and Presentation Improvements (Table formatting, referencing, legends, moving Figure S1 to main text, and adding a participant flow diagram)."

Response: We are profoundly grateful to the reviewer for this rigorous, methodologically ambitious, and highly constructive review process. We are thrilled that the reviewer recognized the value of our mixed-methods approach. This final summary checklist was incredibly helpful for our final manuscript preparation. As detailed extensively in our itemized responses above (Comments 1 through 12), we have systematically addressed and integrated virtually all of the essential revisions and recommended analyses.

Action Taken (Summary of Final Adjustments):

Completion of Checklist: All specific methodological, statistical, ethical, and linguistic points raised in the summary (e.g., complete measure descriptions, CMV and discriminant validity explanations, missing data protocols, think-aloud saturation details, typo corrections, SDG 3 claim tempering, child assent documentation, measurement invariance testing, 1-factor vs 3-factor comparisons, and full descriptive statistics) have been fully executed and detailed in our preceding point-by-point responses.
Regarding EFA vs. CFA: The reviewer suggested splitting the sample for EFA and CFA. While this is standard for de novo scale development, our study is a developmental adaptation of the previously established ATSS (Şentürk, 2015). According to standard psychometric guidelines for cross-developmental adaptations of structurally confirmed instruments, running a strict CFA directly to verify the established a priori structure is the most robust and theoretically appropriate approach. We hope the reviewer agrees with this methodological decision, especially given the excellent model fit and robust multi-group invariance results achieved via CFA.
Presentation Improvements: We reformatted all tables (including Table 2) to clearly delineate factor structures and ensured all table legends define the statistical abbreviations used. As requested, we moved the CFA Path Diagram from the supplementary files into the main manuscript, now labeled as Figure 1.

We believe that thanks to the reviewer’s exceptionally high standards and detailed guidance, the manuscript has been significantly strengthened and is now a much more robust contribution to the literature.

Reviewer 2 Report

Comments and Suggestions for Authors

The study focuses on an exercise attitude instrument for early adolescents, which speaks to the intersecting public health and school physical education concern of declining physical activity during middle school. Beyond the quantitative validation, the authors also conducted cognitive interviews and a think-aloud procedure to provide response-process evidence. This is not commonly seen in similar studies and strengthens confidence in the semantic appropriateness of the items.

I have several suggestions. In the Introduction, after mentioning SDG targets and efforts to promote adolescents’ physical activity and sport participation, the authors could add the trend that with higher educational attainment, physical activity tends to decrease while sedentary time increases, and cite the following recent paper: The role of education attainment on 24-hour movement behavior in emerging adults: evidence from a population-based study. Front Public Health. 2024;12:1197150. doi:10.3389/fpubh.2024.1197150.

It would also be helpful to improve the precision and consistency of the sample and study flow description. In the quantitative section, multiple numbers are reported in parallel. For example, the planned sampling was “40 students per school from 15 schools, total 600,” then the number collected and included is reported as 577, and after outlier screening and normality checks the final sample used for CFA is 531 (with 46 observations removed also mentioned). The Methods should describe, in one coherent paragraph, how the sample size changed at each step, the criteria used for exclusions, and whether the stratified structure of the sample remained balanced after exclusions. In addition, because participants were recruited from 15 schools, the data have a clustered structure. If CFA and correlational analyses treat individuals as independent, standard errors may be underestimated.
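
To illustrate the clustering concern raised above, the following is a minimal sketch of the Kish design effect, which approximates how much sampling variance is inflated when observations are nested in clusters. The final sample size (531) and the number of schools (15) come from this record; the intraclass correlation (ICC) is a hypothetical value chosen only for illustration, and the formula assumes roughly equal cluster sizes.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Kish design effect: 1 + (m - 1) * ICC, with m the average cluster size."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, deff):
    """Number of independent observations the clustered sample is roughly worth."""
    return n / deff


n_students = 531          # final CFA sample reported in this record
n_schools = 15            # number of participating schools
hypothetical_icc = 0.05   # assumption for illustration, not an estimate from the study

avg_size = n_students / n_schools
deff = design_effect(avg_size, hypothetical_icc)
n_eff = effective_sample_size(n_students, deff)
se_inflation = math.sqrt(deff)  # factor by which naive standard errors are too small
print(f"deff = {deff:.2f}, effective n = {n_eff:.1f}, SE inflation = {se_inflation:.2f}")
```

Even a modest ICC can reduce the effective sample size considerably when clusters are large, which is why cluster-robust standard errors or multilevel models are commonly recommended for school-based samples.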

The presentation of validity evidence could be strengthened. The authors report good fit for the three-factor model with high loadings, but discriminant validity is judged only by whether latent correlations are below 0.85, while the reported correlations range from 0.72 to 0.80. In applied use, this may indicate substantial overlap among dimensions. At the same time, the total score correlates 0.923 with overall physical literacy and 0.845 with PAQ-C. With self-reported measures, effects of this size raise concerns about common-method bias and construct overlap. The interpretation of these results should be more cautious, making clear that the high correlations may reflect closely related constructs and similar measurement methods rather than strong prediction of actual behaviour. The Discussion or limitations section should also acknowledge the possibility of same-source bias.

Another suggestion is to include key robustness checks of the scale structure to improve its transferability in practice. The sample covers ages 10–15, spans grades 5–8, and has an almost balanced sex distribution. This provides a good basis for testing measurement invariance, at least configural and metric invariance across sex and grade groups. This is important if teachers or researchers want to compare scores across subgroups. If the authors consider the workload substantial, these results could be provided as supplementary material, but the manuscript should at least state whether invariance was tested and what the results were. In addition, because the items use a 5-point Likert format, using MLR can be acceptable, but the Methods should explain the rationale for handling ordered categorical data and report whether skewness or kurtosis was assessed. If non-normality is evident, a robustness check using WLSMV would strengthen the analysis.

There are also a few minor issues in writing and presentation. First, the manuscript uses very definitive wording in several places to describe model fit and the instrument’s performance, such as “perfect fit” and “clearly confirmed.” These statements should be replaced with more standard academic phrasing and aligned closely with the statistical evidence, especially given the single-region sample and cross-sectional design. Second, since CR, AVE, ω, and α are reported, it would be useful to add descriptive statistics for each dimension, including score ranges, means, and standard deviations, to help readers understand the distribution and potential ceiling effects. Third, because the authors highlight the contribution of cognitive interviews and two rounds of think-aloud, they may consider presenting the types and number of item revisions in a more structured way in the main text or supplementary materials, such as which changes involved clarifying abstract concepts, which involved wording substitutions, and whether any reverse-worded items caused comprehension difficulties in early adolescents.

Wishing the authors success with the revision.

Author Response

First of all, I would like to thank Reviewer 2 for helping to improve the quality of the article. I have highlighted all corrections in red text in the main text, and I am also providing a detailed point-by-point report below.

Comment 1: Introduction and Literature (Educational Attainment Trend)

Reviewer's Comment: "In the Introduction, after mentioning SDG targets and efforts to promote adolescents’ physical activity and sport participation, the authors could add the trend that with higher educational attainment, physical activity tends to decrease while sedentary time increases, and cite the following recent paper: The role of education attainment on 24-hour movement behavior in emerging adults: evidence from a population-based study. Front Public Health. 2024;12:1197150. doi:10.3389/fpubh.2024.1197150."

Author's Reply: We sincerely thank the reviewer for highlighting this critical trend and providing this highly relevant recent literature. We completely agree that acknowledging the inverse relationship between educational attainment (and the resulting academic burden) and physical activity significantly strengthens the rationale of our study, particularly regarding the transitional challenges in middle school.

Action Taken: We have revised the Introduction section accordingly. Right after discussing the SDG targets and the general decline in youth physical activity, we have integrated a new statement explaining how increasing educational demands and attainment tend to elevate sedentary time. The suggested article has been added to our reference list and cited in this specific context to support our argument. (Please see the revised manuscript with track changes: Lines [54-59]).

Comment 2: Sample and study flow & Clustered data structure

Comment 2: "It would also be helpful to improve the precision and consistency of the sample and study flow description. In the quantitative section, multiple numbers are reported in parallel. For example, the planned sampling was “40 students per school from 15 schools, total 600,” then the number collected and included is reported as 577, and after outlier screening and normality checks the final sample used for CFA is 531 (with 46 observations removed also mentioned). The Methods should describe, in one coherent paragraph, how the sample size changed at each step, the criteria used for exclusions, and whether the stratified structure of the sample remained balanced after exclusions. In addition, because participants were recruited from 15 schools, the data have a clustered structure. If CFA and correlational analyses treat individuals as independent, standard errors may be underestimated."

Response: We highly appreciate the reviewer’s meticulous attention to our methodology. We agree that the initial presentation of the sampling flow was fragmented and could lead to confusion. Furthermore, the reviewer’s point regarding the clustered nature of our data (students nested within 15 schools) is an excellent methodological observation. While we utilized the MLR estimator—which is robust to non-normality and provides more reliable standard errors than standard maximum likelihood—we acknowledge that treating individuals as purely independent without multilevel modeling is a limitation.

Action Taken:

1. We have thoroughly revised the "Study Design and Participants" section. All sampling numbers have been consolidated into one coherent paragraph detailing the exact flow: the initial target (n=600), the actual collected forms (n=577), the specific exclusion criteria (missing data and multivariate outliers resulting in 46 exclusions), and the final sample (n=531). We also explicitly stated that the stratified balance regarding gender and grade levels remained intact after these exclusions, as reflected in Table 1.

2. We have added a new paragraph to the "Limitations" section acknowledging the clustered data structure. We transparently stated that because participants were recruited from 15 schools, treating individuals as independent in our current CFA and correlational analyses may lead to underestimation of standard errors, and we recommended the use of multilevel structural equation modeling (MSEM) for future studies to account for this nested structure.
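The sample flow and the clustering concern discussed above can be sketched numerically. The counts (600, 577, 46, 531, 15 schools) come from this record; the intraclass correlation is not reported anywhere, so the value below is a purely hypothetical placeholder, and the Kish design-effect formula is offered only as a rough illustration under an equal-cluster-size assumption, not as the authors' analysis.

```python
# Sample flow reported in the record: target -> collected -> analysed.
TARGET_N = 600      # 40 students x 15 schools (planned)
COLLECTED_N = 577   # forms actually collected
EXCLUDED_N = 46     # missing data and multivariate outliers
N_SCHOOLS = 15

final_n = COLLECTED_N - EXCLUDED_N          # analysed sample
retention_vs_target = final_n / TARGET_N    # share of the planned 600


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# ASSUMPTION: the ICC is not reported in the record; 0.05 is a
# hypothetical value used only to show the size of the adjustment.
HYPOTHETICAL_ICC = 0.05

mean_cluster_size = final_n / N_SCHOOLS
deff = design_effect(mean_cluster_size, HYPOTHETICAL_ICC)
effective_n = final_n / deff

print(f"final n = {final_n} ({retention_vs_target:.1%} of target)")
print(f"mean cluster size = {mean_cluster_size:.1f}")
print(f"design effect = {deff:.2f}, effective n = {effective_n:.0f}")
```

Even a small school-level ICC shrinks the effective sample size noticeably when clusters hold about 35 students each, which is why the reviewer flags underestimated standard errors.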

Comment 3: "The presentation of validity evidence could be strengthened. The authors report good fit for the three-factor model with high loadings, but discriminant validity is judged only by whether latent correlations are below 0.85, while the reported correlations range from 0.72 to 0.80. In applied use, this may indicate substantial overlap among dimensions. At the same time, the total score correlates 0.923 with overall physical literacy and 0.845 with PAQ-C. With self-reported measures, effects of this size raise concerns about common-method bias and construct overlap. The interpretation of these results should be more cautious, making clear that the high correlations may reflect closely related constructs and similar measurement methods rather than strong prediction of actual behaviour. The Discussion or limitations section should also acknowledge the possibility of same-source bias."

Response: We sincerely thank the reviewer for this astute methodological observation. We completely agree that while the scale demonstrates strong criterion-related associations, correlations as high as 0.923 and 0.845 strongly suggest the presence of common-method variance and construct overlap. Because all data were collected via self-report instruments at a single time point, we recognize that our previous claims regarding the "prediction of actual behavior" were overly definitive. We appreciate the guidance to interpret these findings with greater caution.

Action Taken:

1. We have revised the interpretation of the correlational findings in the Discussion section. We explicitly stated that the high correlations (0.923 with physical literacy and 0.845 with PAQ-C) likely reflect conceptual proximity and shared measurement methods, rather than an absolute prediction of actual physical behavior.

2. We have added a dedicated paragraph to the Limitations section explicitly acknowledging "common-method bias" and "same-source bias." We noted that the reliance on parallel self-reported measures likely inflated the observed correlations and recommended the use of objective tracking tools (e.g., accelerometers) in future studies to provide a more rigorous assessment of predictive validity.

Comment 4: "Another suggestion is to include key robustness checks of the scale structure to improve its transferability in practice. The sample covers ages 10–15, spans grades 5–8, and has an almost balanced sex distribution. This provides a good basis for testing measurement invariance, at least configural and metric invariance across sex and grade groups. This is important if teachers or researchers want to compare scores across subgroups. If the authors consider the workload substantial, these results could be provided as supplementary material, but the manuscript should at least state whether invariance was tested and what the results were. In addition, because the items use a 5-point Likert format, using MLR can be acceptable, but the Methods should explain the rationale for handling ordered categorical data and report whether skewness or kurtosis was assessed. If non-normality is evident, a robustness check using WLSMV would strengthen the analysis."

Response: We sincerely thank the reviewer for these highly constructive methodological suggestions. We completely agree that establishing measurement invariance significantly enhances the practical transferability of the ATSS-EA for physical education teachers and researchers who wish to compare subgroup scores. We also appreciate the guidance to explicitly state our rationale for employing the MLR estimator with 5-point Likert data.

Action Taken:

1. We have conducted measurement invariance testing (configural and metric) across both gender (male vs. female) and grade levels. The analyses confirmed that the structural model and factor loadings operate equivalently across these subgroups. To maintain the flow of the main manuscript while providing full transparency, we have added a summary statement in the Results section and placed the detailed fit indices for the multi-group CFAs in the Supplementary Materials.

2. We have updated the "Data Analysis" section to clarify our rationale for using the MLR estimator. We added a statement confirming that univariate normality was assessed, and the skewness and kurtosis values for all items fell within the acceptable bounds (between -1.5 and +1.5). Because extreme non-normality was not evident, treating the ordered categorical data as continuous and utilizing the MLR estimator—which is robust to minor non-normality—was methodologically justified.
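The normality screen described above can be illustrated with a minimal sketch. It uses moment-based skewness and excess kurtosis, which is an assumption here: statistical packages often apply small-sample corrections, so their values may differ slightly. The item responses are hypothetical and are not data from the study; only the ±1.5 bound comes from the response.

```python
from statistics import fmean

BOUND = 1.5  # the +/-1.5 range quoted in the author response


def central_moment(values, k):
    """k-th central moment using the population (1/n) definition."""
    mean = fmean(values)
    return fmean([(v - mean) ** k for v in values])


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def within_bounds(values, bound=BOUND):
    """True when both statistics fall inside [-bound, +bound]."""
    return abs(skewness(values)) <= bound and abs(excess_kurtosis(values)) <= bound


# HYPOTHETICAL 5-point Likert responses, for illustration only.
items = {
    "item_a": [1, 2, 3, 4, 5, 3, 4, 2, 3, 4],   # roughly symmetric
    "item_b": [5, 5, 5, 5, 5, 5, 5, 5, 5, 1],   # strong ceiling effect
}

for name, responses in items.items():
    print(name, round(skewness(responses), 2),
          round(excess_kurtosis(responses), 2), within_bounds(responses))
```

An item like the hypothetical `item_b` would fall outside the bound and would be the kind of case where a WLSMV robustness check becomes more relevant.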

Comment 5: "There are also a few minor issues in writing and presentation. First, the manuscript uses very definitive wording in several places to describe model fit and the instrument’s performance, such as “perfect fit” and “clearly confirmed.” These statements should be replaced with more standard academic phrasing and aligned closely with the statistical evidence, especially given the single-region sample and cross-sectional design. Second, since CR, AVE, ω, and α are reported, it would be useful to add descriptive statistics for each dimension, including score ranges, means, and standard deviations, to help readers understand the distribution and potential ceiling effects. Third, because the authors highlight the contribution of cognitive interviews and two rounds of think-aloud, they may consider presenting the types and number of item revisions in a more structured way in the main text or supplementary materials, such as which changes involved clarifying abstract concepts, which involved wording substitutions, and whether any reverse-worded items caused comprehension difficulties in early adolescents."

Response: We sincerely appreciate these constructive suggestions regarding the presentation of our findings. We entirely agree that absolute terminology should be avoided in cross-sectional, single-region research. Furthermore, providing descriptive statistics and detailing the specific cognitive revisions undeniably adds depth, transparency, and practical value to the manuscript.

Action Taken:

1. Wording Adjustments: We conducted a thorough review of the manuscript (particularly in the Abstract, Results, Discussion, and Conclusion sections) to replace overly definitive phrasing. Terms like "perfect fit" and "clearly confirmed" were revised to more standard, cautious academic phrasing such as "acceptable fit," "good fit," and "supported the structural model."

2. Descriptive Statistics: We have updated the supplementary table (Table S6) to include descriptive statistics for each dimension. Score ranges (Minimum and Maximum), Means, and Standard Deviations have been added to allow readers to evaluate the score distributions and assess any potential ceiling effects.

3. Structured Qualitative Revisions: We significantly expanded the "Qualitative Phase" section in the Methods. We added a structured breakdown of the item revisions resulting from the think-aloud protocols. Specifically, we detailed the number of items revised due to abstract concepts and exact wording substitutions made for age appropriateness.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Major Concerns

Theoretical Framework and SDG Integration

Concern: While the authors repeatedly emphasise connections to Sustainable Development Goal 3 (Good Health and Well-Being), the theoretical integration feels superficial and repetitive. The introduction mentions SDG 3 numerous times without explaining how this specific scale adaptation uniquely contributes to achieving SDG targets beyond generic statements about promoting physical activity.

Recommendation: Streamline the SDG references to 2-3 strategic placements (abstract, introduction's opening, discussion/conclusion). Instead, elaborate on the specific mechanism: How does measuring attitudes in early adolescence enable targeted interventions that prevent the documented decline in physical activity? What specific SDG 3 indicators (e.g., 3.4 on non-communicable disease prevention) does this tool directly support? Provide concrete examples of how educators or public health officials would use ATSS-EA scores to design interventions.

Sampling and Generalizability

Concern: The sampling methodology raises questions about representativeness and generalizability. Although the authors describe a "stratified purposive sampling method" targeting 600 students (10 licensed male, 10 non-licensed male, 10 licensed female, 10 non-licensed female per school across 15 schools), several issues emerge:

a) All schools were selected from a single district (Marmaris, Mugla, Turkiye), limiting geographic and socioeconomic diversity.

b) The recruitment process ("based solely on interest in the research") introduces self-selection bias.

c) The authors acknowledge they could not compare participants with non-participants due to ethical constraints, but this limitation needs fuller discussion.

d) The final sample (N=531) after removing outliers represents 88.5% of the recruited 600 students, which is excellent. Still, the characteristics of the 46 excluded cases (8 from rural vs urban areas; specific grade levels?) could provide insights into who was systematically excluded.

Recommendation:

Add a limitations subsection explicitly addressing geographic constraints and self-selection bias.

Provide more detail about the 15 schools: What were their socioeconomic profiles? How many students were enrolled in each? What was the school-level participation rate?

Clarify whether "licensed" status was verified (e.g., through school records) or solely based on self-report.

Could you add a table showing the distribution of the final sample across schools and grades to show that stratification was maintained?

Statistical Analysis: Multiple Comparisons and Alpha Inflation

Concern: The study conducts numerous statistical tests (CFA fit indices, factor loadings, t-tests for known-groups validity, multiple Pearson correlations in Tables 4 and 5, ANCOVA, measurement invariance tests) without any apparent correction for multiple comparisons or discussion of Type I error inflation. Table 5 alone presents 20 correlations, all reported as significant at p<0.001. While the effect sizes appear robust, the lack of any adjustment (e.g., Bonferroni correction) or discussion of this issue is concerning.

Recommendation:

Add a statement in the Data Analysis section acknowledging the multiple comparison issue and either: (a) justify why corrections were not applied (e.g., hypothesis-driven, confirmatory nature of specific tests), or (b) apply a conservative correction (e.g., Bonferroni) and note whether significance thresholds were maintained.

For Table 5, consider presenting as a correlation matrix with confidence intervals rather than focusing solely on p-values. The extremely high correlations (e.g., ATSS total with PPLS total at r = 0.923) raise questions about discriminant validity—are these constructs empirically distinct? This deserves discussion.
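Both recommendations above lend themselves to a short worked sketch: a Bonferroni threshold for the 20 correlations in Table 5, and a Fisher z confidence interval for the largest reported correlation. The count of 20 tests, the p < 0.001 bound, and r = 0.923 come from this report. Using N = 531 for that correlation is an assumption, since the record does not confirm that every participant completed the criterion measures.

```python
import math
from statistics import NormalDist

N_TESTS = 20               # correlations in Table 5, per the reviewer
ALPHA = 0.05               # conventional family-wise error rate
REPORTED_P_BOUND = 0.001   # all correlations are reported as p < 0.001

# Bonferroni: divide the family-wise alpha by the number of tests.
bonferroni_alpha = ALPHA / N_TESTS
survives_correction = REPORTED_P_BOUND <= bonferroni_alpha


def fisher_ci(r: float, n: int, level: float = 0.95):
    """Confidence interval for a Pearson r via the Fisher z transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# ASSUMPTION: n = 531 (the CFA sample) is used for this correlation.
lower, upper = fisher_ci(0.923, 531)

print(f"Bonferroni-adjusted alpha = {bonferroni_alpha:.4f}")
print(f"p < 0.001 survives correction: {survives_correction}")
print(f"95% CI for r = 0.923: [{lower:.3f}, {upper:.3f}]")
```

Because every reported p-value lies below 0.001, the significance decisions would not change under this correction; the interval is more informative about how close the two constructs are empirically.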

Discriminant Validity and Factor Correlations

Concern: The authors report inter-factor correlations ranging from 0.72 to 0.80 and claim discriminant validity because "no correlation coefficient exceeded the severe multicollinearity threshold of 0.85." However, with correlations this high, the distinctiveness of the three factors warrants deeper examination. The AVE values (0.679-0.717) are acceptable, but the shared variance between factors (r² = 0.52-0.64) suggests substantial overlap.

Recommendation:

Conduct and report formal discriminant validity tests (e.g., Fornell-Larcker criterion comparing AVE with squared inter-factor correlations; heterotrait-monotrait ratio).

Discuss the theoretical implications: Are these factors truly distinct dimensions, or do they represent facets of a unidimensional construct? The single-factor model's poor fit (CFI=0.815, RMSEA=0.129) supports multidimensionality, but the high correlations suggest the dimensions are closely intertwined in early adolescents' psychological experience.

Consider whether a higher-order factor model (with a general "sport attitude" factor explaining the three dimensions) might provide a more parsimonious representation.
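The Fornell-Larcker comparison suggested above can be sketched from the summary ranges quoted in this report alone. Since the record does not say which AVE belongs to which factor pair, the sketch checks only the worst case (lowest reported AVE against the highest reported inter-factor correlation); the function name is illustrative. The heterotrait-monotrait ratio needs the item-level correlation matrix and cannot be derived from these figures.

```python
# Ranges quoted in the reviewer report.
AVE_MIN, AVE_MAX = 0.679, 0.717   # average variance extracted
R_MIN, R_MAX = 0.72, 0.80         # inter-factor correlations


def fornell_larcker_holds(ave_a: float, ave_b: float, r: float) -> bool:
    """Criterion: each factor's AVE must exceed the squared correlation."""
    return min(ave_a, ave_b) > r ** 2


# Shared variance between factors, as cited by the reviewer.
shared_min, shared_max = R_MIN ** 2, R_MAX ** 2

# Worst case: the lowest AVE paired with the highest correlation.
worst_case_ok = fornell_larcker_holds(AVE_MIN, AVE_MIN, R_MAX)
margin = AVE_MIN - R_MAX ** 2

print(f"shared variance range: {shared_min:.2f} to {shared_max:.2f}")
print(f"worst case satisfies the criterion: {worst_case_ok}")
print(f"worst-case margin: {margin:.3f}")
```

If the criterion holds in the worst case it holds for every pairing within the reported ranges, although the narrow margin is consistent with the reviewer's point that the dimensions overlap substantially.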

Criterion-Related Validity: Overlapping Constructs

Concern: The criterion-related validity analyses reveal correlations that may be artificially inflated due to construct overlap. The ATSS correlates with PPLS at r = 0.923, suggesting they may measure nearly identical constructs rather than distinct but related phenomena. Similarly, PAQ-C correlations (r=0.845) are exceptionally high for attitude-behaviour relationships, which typically show modest associations (meta-analyses suggest r ≈ 0.30-0.50). This raises questions about whether the ATSS inadvertently measures self-reported behaviour rather than attitudes, or whether common method variance (same respondents, same time point, similar Likert formats) inflated correlations.

Recommendation:

Please discuss the potential for common-method bias and note that future studies should include objective measures (e.g., accelerometry) or multi-trait, multi-method designs.

Acknowledge that the extremely high correlations may reflect conceptual overlap—for example, the PPLS includes motivational items that closely resemble attitude items. Provide a theoretical justification for why these constructs should be distinguishable despite empirical overlap.

Could you conduct an exploratory factor analysis combining ATSS and PPLS items to see whether they load on separate factors?

Cognitive Think-Aloud Protocol: Methodological Rigour

Concern: The qualitative phase is a strength, but reporting lacks the detail needed for replication and evaluation. The authors mention that "two rounds of cognitive interviews" were conducted 14 days apart, but the description of the analysis process is sparse:

a) How were the "error codes" generated and defined?

b) What was the inter-rater reliability between the two independent coders?

c) Which specific items were revised, and what were the original versus revised wordings? (Table S2a and S2b are mentioned but not provided in the main text.)

d) The authors claim "thematic saturation was achieved after approximately 20 interviews," but no saturation grid or evidence is provided.

e) The retrospective probing approach is justified, but more details about the probe questions would strengthen replicability.

Recommendation:

You can expand the qualitative methods section to include: coding scheme definitions; inter-rater reliability statistics; a table showing the original and revised items with rationales for changes; and a more detailed description of the saturation assessment.

Consider including representative quotes from students illustrating comprehension difficulties (e.g., "When I read 'social status,' I thought about...") to enrich the qualitative findings.

Could you clarify whether the same 27 students participated in both rounds? If so, could you discuss potential testing effects?

Missing Information About the Original Scale

Concern: Readers unfamiliar with the original ATSS (Senturk, 2015) lack sufficient information to evaluate the adaptation's fidelity. The original validation study is cited but not summarised in sufficient detail. Key questions:

a) What was the original factor structure and item distribution?

b) What were the original psychometric properties (beyond Cronbach's α=0.97)?

c) Which specific items were modified during adaptation, and how many remained unchanged?

d) Was the original scale developed using similar theoretical foundations (TPB, SDT)?

Recommendation:

Could you add a brief subsection in the introduction or methods describing the original ATSS: number of items per factor, sample characteristics from original validation, evidence of validity, and theoretical grounding?

Create a table (possibly as supplementary material) cross-walking original items to adapted items, noting modifications and rationales.

Could you discuss whether any items were removed entirely and why?

Known-Groups Validity: Covariate Control

Concern: The known-groups comparison (licensed vs non-licensed) shows significant differences with large effect sizes (Cohen's d = 0.694-0.878). However, the ANCOVA (controlling for gender and grade) is described, but the results are not fully presented (Table S7a is referenced but not in the main text). Without seeing the adjusted means, it's unclear whether the license effect remains robust after controlling for these variables.

Recommendation:

Present the ANCOVA results in the main text (or at least summarise key findings: F-statistics, adjusted means, effect sizes).

Could you consider whether additional covariates should be controlled (e.g., socioeconomic status, access to sports facilities) and acknowledge their absence as a limitation?

Discuss the possibility of reverse causality: Do positive attitudes lead to obtaining a license, or does holding a license (and thus participating in organised sports) foster positive attitudes? Longitudinal data would be needed to establish directionality.
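For readers checking the known-groups effect sizes, a minimal sketch of Cohen's d with a pooled standard deviation follows. The group means, standard deviations, and group sizes below are hypothetical placeholders, because the record reports only the resulting d values (0.694 to 0.878); this is the unadjusted statistic and does not reproduce the ANCOVA.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# HYPOTHETICAL summary statistics for licensed vs non-licensed students;
# none of these six numbers appears in the record.
d = cohens_d(mean_1=4.2, sd_1=0.6, n_1=265,
             mean_2=3.7, sd_2=0.7, n_2=266)

print(f"Cohen's d = {d:.3f}")
```

An effect size computed from covariate-adjusted means would be the natural companion to the ANCOVA summary the reviewer requests.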

Writing Quality and Organisation

Concern: The manuscript contains numerous grammatical errors, awkward phrasing, and inconsistent terminology that detract from readability and professional presentation. Examples:

Page 1: "the most significant causes of which are increased educational demands and educational level" (repetitive, awkward)

Page 3: "Theoretically, the three dimensions of the ATSS- EA map well onto both TPB and SDT frameworks." (inconsistent hyphenation: ATSS-EA appears with and without hyphen)

Page 4: "Muqla Sthk Koeman University" (misspelling of institution name)

Throughout: Inconsistent use of "Turkiye" vs. "Turkey" (should be consistent with journal style)

Page 5: "Middle school students (10-15 years old) have fresher abstract thinking skills compared to high school students." (unclear meaning—do they mean "less developed"?)

Page 14, line 525: "The concept of physical literacy, whose foundations were laid by Whitehead [65] in the literature" (awkward phrasing)

References: Multiple formatting inconsistencies (some DOIs missing, journal names abbreviated inconsistently)

Recommendation: The manuscript requires thorough copyediting by a native English speaker with expertise in academic writing. Specific suggestions:

Could you create a terminology consistency table (e.g., decide on ATSS-EA vs ATSS EA vs adapted ATSS and use it consistently)?

Review all instances of "Turkiye" vs. "Turkey" and follow journal guidelines.

Simplify overly complex sentences (e.g., break long sentences into 2-3 shorter ones).

Check all references against the journal style guide; ensure DOIs are formatted consistently (some have "doi:10.xxx" while others lack this prefix).

Proofread author names and affiliations for accuracy.

Discussion and Conclusions: Overreach and Repetition

Concern: The discussion section repeats findings without synthesising them into meaningful implications. For example, the paragraph on physical literacy (lines 525-534) merely restates correlations without explaining why the strong ATSS-physical literacy relationship matters theoretically or practically. The conclusion similarly recapitulates results without offering novel insights or future directions.

Additionally, the discussion introduces new literature (e.g., Whitehead on physical literacy) that should have been integrated into the introduction. The discussion repeatedly invokes SDG 3 but never specifies how educators or policymakers would use ATSS-EA scores in practice.

Recommendation:

Restructure the discussion to follow a standard format: (1) summary of main findings, (2) interpretation in light of theory (TPB, SDT), (3) comparison with previous research, (4) practical implications, (5) limitations, (6) future directions.

For each major finding, explain why it matters. Example: "The strong correlation between the Interest dimension and physical literacy's motivation subscale (r=0.841) suggests that interventions targeting intrinsic enjoyment may simultaneously enhance both attitudes and perceived competence, offering an efficient target for school-based programs."

Add a concrete practical applications section describing how teachers might use ATSS-EA scores (e.g., identifying students with declining interest for early intervention, evaluating program effectiveness, tailoring PE curricula).

Discuss whether the ATSS-EA could be used for cross-cultural comparisons and what adaptations would be needed for international use.

Address the limitations of self-report data and propose specific future studies (e.g., longitudinal designs tracking changes in attitudes from late childhood through adolescence; intervention studies using ATSS-EA as an outcome measure).

Specific Comments/Questions/Suggestions

Title and Abstract

Lines 1-3 (Title): The title is clear but lengthy. Consider: "Adaptation and Validation of the Attitude Towards Sport Scale for Early Adolescents: Implications for Sustainable Health and Well-Being"

Highlights section:

First bullet: "successfully preserving its original three-factor structure" – add the factor names in parentheses for clarity.

Second bullet: "preventive medicine specialists" – consider whether this term is appropriate; "pediatric public health professionals" might be more precise.

Abstract:

Line 32: "Objectives: This study aimed to adapt and validate" – consider "This study aimed to adapt the Attitude Towards Sport Scale (ATSS) for middle school students (ages 10-15) and evaluate its psychometric properties."

Line 35: "We used a mixed-methods approach" – specify the qualitative (cognitive think-aloud, n=27) and quantitative (CFA, N=531) sample sizes.

Line 40: "The results supported the structural model that the original three-factor structure... acceptable fit" – grammar error: "demonstrated acceptable fit" or "fit the early adolescent sample acceptably."

Conclusions: Add a brief statement about practical implications (e.g., "The adapted ATSS provides a developmentally appropriate tool for educators and researchers to monitor sport attitudes and identify students at risk of disengagement.")

Introduction

Lines 55-60: The opening could be more compelling. Start with a statistic about declining physical activity in early adolescence, then connect to SDG 3. Example: "Globally, 81% of adolescents aged 11-17 years fail to meet WHO physical activity recommendations (Guthold et al., 2020), a trend that intensifies during the transition to early adolescence. This decline threatens progress toward Sustainable Development Goal 3 (Good Health and Well-Being), which targets the prevention of non-communicable diseases through lifestyle modification."

Lines 61-67: The discussion of TPB and SDT is good, but could be more integrated. Consider a figure showing how ATSS dimensions map onto theoretical constructs, or a table comparing the three frameworks.

Lines 82-94: The developmental rationale for adaptation is excellent. However, the claim that middle schoolers "have fresher abstract thinking skills" (line 97 in PDF, but page numbers vary) is confusing. Do you mean "less developed" or "emerging"? Clarify.

Hypotheses: Well-formulated and testable. Consider adding a fifth hypothesis about measurement invariance across gender or grade levels, since this was tested.

Methods

Lines 123-129 (Study design): Clarify that this is a cross-sectional methodological study with a qualitative pretesting phase. The phrase "multi-phase mixed-methods design" is accurate but could be simplified.

Lines 130-134 (Ethics): Excellent attention to ethical detail. Add that child assent was obtained verbally and documented (the current text says "verbal and written assent" – was written assent obtained from children? If so, describe the form).

Lines 135-155 (Sampling):

Specify the dates of data collection (month/year).

Could you clarify whether the 15 schools included both public and private schools, or only public?

Define "rural" and "centre" (urban?) schools. What criteria distinguished them?

The target of "10 licensed males, 10 non-licensed males, 10 licensed females, and 10 non-licensed females per school" is clear, but was this achieved in the final sample? Could you provide a table showing achieved vs target by school?

Line 151: "Cases with undelivered parental consent forms, missing data, or identified as multivariate outliers were subsequently excluded" – report how many cases fell into each category.

Lines 160-161 (Table 1):

Table 1 is difficult to interpret. Column headers are unclear: "MGPAMDMPAT (min.)" – What does this abbreviation mean? Define in the table note.

Present means and SDs for each demographic group (gender, grade, license status) separately.

You could add n for each subgroup (already present in the N column), but please also consider adding row percentages for clarity.

Report age ranges and means by grade level.

Lines 165-194 (ATSS description):

Add the number of items per factor (13, 6, 6) earlier in this section.

Does the 5-point Likert scale include a neutral midpoint?

The description of the think-aloud protocol (lines 185-194) should be moved to the Qualitative Phase section (2.3) for better organisation.

Lines 195-212 (PAQ-C and PPLS):

Report Cronbach's alpha for these scales in the current sample (you mention 0.89 for PPLS but not for PAQ-C).

Could you provide citations for the original English versions of these scales, not just Turkish adaptations?

Could you clarify whether these scales were administered to all participants in the quantitative phase (N=531) or only a subset?

Lines 213-222 (Personal Information Form):

The questions about physical activity duration are subject to recall bias. Could you acknowledge this limitation?

Define "mandatory" vs. "voluntary" physical activity more clearly and provide examples.

For license status, specify whether "school sports" includes intramural activities or only competitive interscholastic sports.

Lines 223-258 (Qualitative Phase):

Excellent detail on recruitment and procedures.

Line 239: "mean age of 156.3 months" – convert to years for readability (13.03 years), with months in parentheses if desired.

Table S1 should be referenced and summarised (e.g., "The 27 participants were balanced across grades and license status").

Line 254: "Willis' (2005) cognitive interviewing framework" – provide full citation in references.

Line 256: "Any coding disagreements were resolved through discussion" – report the frequency of disagreements and final consensus rate.

Lines 259-299 (Think-Aloud Process):

Outstanding detail on the warm-up exercise and retrospective probing.

Line 276: "6 items containing abstract sporting terms were replaced" – provide examples in text or table.

Line 278: "the word 'social status' was replaced with 'being successful'" – this is a significant semantic shift. Discuss whether this changes the construct being measured.

Line 291: "the clarity of each revised item was confirmed" – was this based on qualitative feedback or quantitative ratings? Clarify.

Lines 300-324 (Quantitative Data Analysis):

Specify the software version (JASP 0.95.4) and lavaan package version.

Justify the use of the MLR estimator given the 5-point Likert data.

Provide thresholds for fit indices with citations (Hu & Bentler, 1999; Kline, 2023) – you do this, but ensure consistency with current recommendations (e.g., CFI ≥ 0.95 for excellent fit, RMSEA < 0.06).

Line 311: "skewness and kurtosis values for all items were found to be within the acceptable range" – report the ranges.

Clarify whether missing data were handled (listwise deletion, FIML, etc.) – you mention excluding cases with missing data, but what was the extent of missingness?

Results

Lines 325-340 (Model Fit):

Report χ², df, and p-value in text (you do this).

Add 90% CI for RMSEA (you provide this).

Note that the single-factor model's fit indices (CFI=0.815, RMSEA=0.129) should be presented in a table or with more detail.

Line 336: "We also did not use a modification index" – this is unusual; explain why no modifications were considered (e.g., theoretical rationale, risk of capitalising on chance).

Lines 341-358 (Convergent Validity):

Table 2: The organisation is confusing. List items by factor, including loadings and R². The current table clusters items without a clear separation between factors.

Could you add a column for the factor name next to each item?

Could you report the AVE and CR calculations (you do this; consider adding formulas or citations)?

Figure 1: The path diagram should be larger and include factor loadings (currently too small to read). Could you make sure all paths and residual variances are visible?
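As an illustration of the AVE and CR calculations referred to above, the following is a minimal sketch in Python of the commonly used formulas based on standardized factor loadings. The loadings are hypothetical placeholders and are not taken from the manuscript; the function names are likewise chosen only for this example.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is taken as 1 - loading^2.
    """
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)


# Hypothetical standardized loadings for a single factor (illustration only).
example_loadings = [0.70, 0.80, 0.60]

print("AVE:", round(ave(example_loadings), 3))
print("CR:", round(composite_reliability(example_loadings), 3))
```

Reporting the formulas alongside the loadings in Table 2 would let readers reproduce the AVE and CR values for each factor.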

Lines 359-373 (Known-Groups Validity):

Table 3: Excellent presentation. Add 95% CIs for Cohen's d.

Report the ANCOVA results (referenced as Table S7a) in the main text or as a supplementary table with key statistics summarised.

Line 372: "We also found no differences in ATSS-EA between grade levels using ANOVA and t-test" – report these results (F-values, p-values) briefly.
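For the requested 95% CIs around Cohen's d, one possible approach is a large-sample normal approximation, sketched below in Python. The group means, SDs, and sample sizes are hypothetical and do not come from Table 3; other interval methods (for example, those based on the noncentral t distribution) may give slightly different bounds.

```python
import math


def cohens_d_ci(mean1, sd1, n1, mean2, sd2, n2, z_crit=1.96):
    """Return (d, lower, upper) using a large-sample approximate 95% CI."""
    # Pooled standard deviation across the two independent groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd
    # Common large-sample approximation to the standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z_crit * se, d + z_crit * se


# Hypothetical licensed vs non-licensed groups (illustration only).
d, lower, upper = cohens_d_ci(4.0, 1.0, 50, 3.5, 1.0, 50)
print(round(d, 3), round(lower, 3), round(upper, 3))
```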

Lines 374-387 (Criterion Validity with Physical Activity):

Table 4: The correlation matrix is clear. Consider adding a note about the interpretation of DMPAT vs WVPAT correlations (e.g., "The stronger correlations with WVPAT suggest that ATSS measures attitudes toward voluntary, intrinsically motivated activity rather than compulsory activity").

Report the p-values in the table (you do this) and note that correlations with DMPAT, while significant, are very small (r < 0.12) – this deserves comment.

Lines 388-406 (Criterion Validity with PPLS and PAQ-C):

Table 5: The correlations are extremely high. Add a note acknowledging that these may be inflated by common method variance.

Consider presenting a multitrait-multimethod matrix if the data allow.

Discuss the theoretical implications: Do these high correlations suggest that ATSS and PPLS measure the same underlying construct? If not, why are they so highly correlated?

Lines 407-412 (Summary): The summary paragraph is helpful, but could be expanded to synthesise all validity evidence.

Discussion

Lines 413-424 (Opening): Good summary of main findings. Add a sentence about the study's unique contribution (e.g., "This is the first study to systematically adapt a sport attitude scale for early adolescents using cognitive pretesting to ensure developmental appropriateness.")

Lines 425-434 (Construct Validity):

Discuss why the three-factor structure persisted despite developmental differences.

Connect to TPB: "The preservation of the Interest, Lifestyle, and Participation dimensions across developmental stages suggests that the cognitive structure of sport attitudes is established by early adolescence, supporting the TPB assumption that attitudes are stable belief-based constructs."

Lines 435-467 (Physical Literacy and Physical Activity):

This section is overly long and repetitive. Cut by 30-40%.

Integrate discussion of the extremely high correlations with PPLS (r=0.923). Is this evidence of convergent validity or construct redundancy?

Discuss the DMPAT vs WVPAT finding: Why do attitudes predict voluntary but not mandatory activity? This has practical implications (e.g., interventions should focus on voluntary opportunities).

Lines 468-496 (Known-Groups and Developmental Sensitivity):

Good discussion of license status differences. Add that this finding supports the scale's sensitivity to real-world behavioural engagement.

Discuss the lack of grade-level differences – is this surprising? Should attitudes become more positive or negative across grades 5-8?

Lines 497-524 (Practical Implications):

This section is too generic. Provide specific examples: "A physical education teacher could administer the ATSS-EA at the beginning of the school year, identify students scoring in the bottom quartile on the Interest dimension, and provide these students with extra opportunities for choice in activities to enhance intrinsic motivation."

Connect to SDG 3: "By identifying students at risk of sports dropout, schools can target interventions to maintain physical activity levels, directly contributing to SDG target 3.4 (reducing premature mortality from NCDs)."

Lines 525-574 (Physical Literacy Elaboration):

This section introduces new material that belongs in the introduction. Move or delete.

The discussion of future research (lines 558-574) is good but could be more specific (e.g., "Future studies should examine whether ATSS-EA scores predict actual dropout from sports programs over 2-3 years").

Lines 575-622 (Conclusion):

The conclusion is too long and repeats findings. Cut to one concise paragraph.

End with a forward-looking statement about the scale's potential for cross-cultural adaptation and use in intervention research.

References

Issues:

Several references have 2026 publication dates (e.g., Sukys et al., 2026; Roland et al., 2026; Rajkovic Vuletic et al., 2026) – these appear to be preprints or in-press articles. Verify that these are correctly cited and update if published.

Reference 26 (Senturk, 2015) is in Turkish – note this in the citation (e.g., "in Turkish").

Reference 32 (Kowalski et al.) lacks publication year – add.

References 58-64 are all on SDGs and physical education – consider consolidating or selecting the most relevant.

Check all DOIs for accessibility; some may be incorrect (e.g., reference 15 has a DOI, but it's not formatted consistently).

Recommendation: Run all references through a reference manager to ensure consistency with journal style. Verify all 2026 references for accuracy.

Figures and Tables

Figure 1 (CFA Path Diagram):

The diagram is too small and blurry. Could you provide a higher-resolution version?

Label all factors clearly.

Include factor loadings on paths (currently missing).

Consider presenting the factor loadings as a table with confidence intervals, instead of, or in addition to, the diagram.

Table 1 (Demographics):

Redesign for clarity. Separate into panels by variable (Gender, Grade, License).

Define abbreviations in the table note (MGPAMDMPAT is incomprehensible).

Report means and SDs for continuous variables; n and % for categorical.

Table 2 (Psychometric Properties):

Reorganise with clear factor headings.

Could you add a column for item wording (or refer to supplementary materials where full items are provided)?

Report 95% CIs for factor loadings.

Table 3 (Known-Groups):

Excellent. Add 95% CIs for Cohen's d.

Consider adding a row for the total ATSS score.

Table 4 (Correlations with Physical Activity):

Good. Add a note about the interpretation of effect sizes (e.g., "Correlations with WVPAT represent medium-to-large effects").

Table 5 (Correlations with PPLS and PAQ-C):

These correlations are extremely high. Please add a note acknowledging potential common method variance.

Consider presenting as a correlation matrix with confidence intervals.
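The confidence intervals suggested above could be obtained with the Fisher z transformation. The sketch below, in Python, applies it to the reported ATSS-PPLS correlation (r = 0.923). It assumes that all 531 participants completed both scales, which the manuscript would need to confirm.

```python
import math


def pearson_r_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)            # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)    # standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# r is taken from the review text; n = 531 is an assumption (see above).
lower, upper = pearson_r_ci(0.923, 531)
print(round(lower, 3), round(upper, 3))
```

With a sample of this size, an interval around a correlation this large will be narrow, so the CI mainly documents precision and does not address whether the two scales are distinct constructs.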

Supplementary Tables:

Tables S2a/S2b (item revisions) should be included in the main text or at least summarised.

Table S6 (measurement invariance) should report ΔCFI, ΔRMSEA, and Δχ² with significance tests.

Table S7a (ANCOVA) should include F-statistics, df, p-values, and partial η².

Author Response

Dear Reviewer,

First of all, thank you for your detailed review again. The feedback has been carefully reviewed, and necessary corrections have been made, highlighted in red and indicating the lines. There may be minor line number discrepancies due to these corrections. Thank you so much for your support.

Comment 1: Theoretical Framework and SDG Integration. "While the authors repeatedly emphasise connections to Sustainable Development Goal 3... the theoretical integration feels superficial and repetitive. Recommendation: Streamline the SDG references to 2-3 strategic placements... elaborate on the specific mechanism... What specific SDG 3 indicators (e.g., 3.4) does this tool directly support? Provide concrete examples of how educators or public health officials would use ATSS-EA scores to design interventions."

Response: We are extremely grateful for this astute observation. The reviewer is absolutely correct that our previous iterations relied on repetitive, broad mentions of SDG 3 rather than articulating the precise, functional mechanisms connecting attitude measurement to specific global health targets. Streamlining these references and providing a concrete intervention scenario has significantly sharpened the manuscript’s theoretical focus and practical utility.

Action Taken:

Streamlining: We conducted a thorough review of the manuscript and removed redundant references to SDG 3. The explicit mentions are now strategically limited to the Abstract, the opening of the Introduction, and the Conclusion to frame the study without overwhelming the text.
Specific Mechanism & Target 3.4: In the Introduction, we explicitly linked the ATSS-EA to SDG Target 3.4 (reducing premature mortality from non-communicable diseases). We elaborated on the specific mechanism: capturing affective/cognitive barriers in early adolescence serves as an early-warning diagnostic before the onset of the developmental decline in physical activity, directly enabling preventive health strategies. (Please see the Introduction).
Concrete Intervention Example: In the Discussion section, we provided a highly specific, practical example of how the ATSS-EA informs targeted interventions. We illustrated a scenario where high 'Interest' but low 'Participation' scores signal to policymakers that structural/environmental barriers, rather than motivational deficits, need to be addressed (e.g., building infrastructure rather than running awareness campaigns). (Please see the Conclusions).

Comment 2: Sampling and Generalizability.
"Concern: The sampling methodology raises questions about representativeness... a) All schools from a single district... b) self-selection bias. c) participants vs non-participants limitation needs fuller discussion. d) characteristics of the 46 excluded cases... Recommendation: Add a limitations subsection explicitly addressing geographic constraints and self-selection bias. Provide more detail about the 15 schools (socioeconomic profiles)... Clarify whether 'licensed' status was verified... Add a table showing the distribution of the final sample across schools and grades."

Response: We highly appreciate the reviewer’s meticulous attention to the sampling framework. The reviewer raises excellent points regarding self-selection bias, the verification of athletic licenses, and the necessity of proving our stratification retention after data cleaning. We have systematically addressed each of these methodological concerns to enhance the transparency and robustness of our sampling narrative.

Action Taken:

Self-Selection Bias & Limitations: We have expanded the Limitations section to explicitly discuss self-selection bias. We acknowledged that students with a pre-existing interest in sports might have been more likely to volunteer, thereby potentially elevating the baseline attitude scores. We also reiterated the geographic boundaries of the Marmaris district. (Please see Limitations).

School Profiles & SES: In the Methods (Section 2.1), we provided more context about the 15 schools. While we previously noted that strict Ministry of National Education (MEB) ethical protocols prohibit collecting individual family income data, we clarified that we purposefully stratified the schools geographically: 8 schools were selected from the urban/touristic center, and 7 from the rural/agricultural periphery, ensuring broad environmental and socioeconomic diversity by proxy. (Please see Methods).

Verification of Licensed Status: We updated the Methods section to clarify that while "licensed" status was initially self-reported on the survey, it was strictly cross-verified by the collaborating Physical Education teachers who have direct access to the official school sports registry (e-Okul/Okul Sporları system), ensuring 100% accuracy of this criterion variable. (Please see Methods).

Excluded Cases Analysis: We added a sentence in the Data Analysis section confirming that the 46 excluded cases (due to Mahalanobis distance/missing data) were evenly and randomly distributed across the 15 schools, both genders, and all four grade levels, meaning their exclusion did not systematically skew the remaining sample. (Please see Participants).

Distribution Table: To definitively prove that the grade and gender stratification was maintained in the final sample (N=531), we have created and added new supplementary tables (Tables S10 and S11), which detail the final participant distribution across all 15 schools and grade levels. (Please see Supplementary Tables).

Comment 3: Statistical Analysis: Multiple Comparisons and Alpha Inflation.

"Concern: The study conducts numerous statistical tests... without any apparent correction for multiple comparisons or discussion of Type I error inflation. Recommendation: Add a statement in the Data Analysis section acknowledging the multiple comparison issue and either: (a) justify why corrections were not applied... For Table 5, consider presenting as a correlation matrix with confidence intervals... The extremely high correlations (e.g., r = 0.923) raise questions about discriminant validity."

Response: We deeply appreciate the reviewer’s rigorous statistical oversight. The reviewer correctly identifies the inherent risks of alpha inflation in multi-test designs. We have chosen to formally justify the absence of Bonferroni corrections based on the pre-planned, hypothesis-driven nature of psychometric validation. Furthermore, we completely agree that adding Confidence Intervals enhances the reporting quality of Table 5. Lastly, we have further clarified our stance on the discriminant validity issue (which we initially addressed in the prior revision round as an artifact of Common Method Variance).

Action Taken:

Multiple Comparisons Justification: In the Data Analysis (Section 2.4), we added a specific methodological paragraph explaining why conservative alpha adjustments (like the Bonferroni correction) were not applied. We cited epidemiological and statistical literature (e.g., Perneger, 1998; Rothman, 1990) to argue that in a confirmatory construct validation study where variables are theoretically expected to be highly correlated, such corrections would unacceptably inflate Type II errors. (Please see Methods).
Table 5 Enhancement: We have updated Table 5. In addition to p-values, we now provide the 95% Confidence Intervals (CI) for all major correlations to provide a clearer picture of the precision and robust effect sizes of these relationships.
Discriminant Validity Revisited: We expanded upon our previous additions in the Discussion and Limitations sections regarding the $r = .923$ correlation. We explicitly stated that while the scales measure theoretically distinct constructs (affective/cognitive attitude vs. perceived competence/literacy), the mathematical discriminant validity is obscured by the severe shared method variance inherent in concurrent, single-source self-report surveys.
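To make the trade-off behind the multiple-comparisons argument concrete, the following minimal Python sketch shows how a Bonferroni adjustment changes the per-test threshold. The p-values are hypothetical and are not results from the manuscript.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag which tests remain significant under a Bonferroni adjustment."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance level
    return [p <= threshold for p in p_values]


# Hypothetical p-values from four planned tests (illustration only).
p_values = [0.001, 0.015, 0.020, 0.040]

print("Unadjusted:", [p <= 0.05 for p in p_values])
print("Bonferroni:", bonferroni_reject(p_values))
```

In this example all four tests pass the unadjusted 0.05 level, but only the first survives the adjusted threshold of 0.0125, which illustrates the loss of power that the cited literature raises as a concern.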

Comment 4: Discriminant Validity and Factor Correlations.

"Concern: The authors report inter-factor correlations ranging from 0.72 to 0.80 and claim discriminant validity because "no correlation coefficient exceeded the severe multicollinearity threshold of 0.85."... Recommendation: Conduct and report formal discriminant validity tests (e.g., Fornell-Larcker)... Discuss the theoretical implications: Are these factors truly distinct dimensions...? Consider whether a higher-order factor model (with a general "sport attitude" factor) might provide a more parsimonious representation."

Response: We are profoundly grateful for this sophisticated psychometric critique. The reviewer is entirely correct that relying solely on the 0.85 heuristic is insufficient. We also deeply appreciate the reviewer’s eloquent phrasing regarding how these dimensions are "closely intertwined in early adolescents' psychological experience." We have adopted this exact theoretical lens to explain the high correlations and to formally justify the use of a Total Scale Score (as a higher-order representation).

Action Taken:

Formal Discriminant Validity & Unidimensional Comparison: We updated the Results (Section 3.2) to explicitly discuss the shared variance ($r^2$) in relation to AVE. We acknowledged that while the shared variance is substantial, the unequivocally poor fit of the alternative single-factor model (CFI = 0.815) serves as the definitive empirical proof that the construct is multidimensional, and the factors cannot simply be collapsed into a single mathematical dimension.
Higher-Order Theoretical Discussion: In the Discussion section, we expanded on the theoretical implications of these high correlations. We explained that Interest, Lifestyle, and Participation are not orthogonal; rather, they are highly interdependent facets of a higher-order construct ("general sport attitude"). We explicitly noted that this specific psychological architecture provides the fundamental theoretical justification for calculating and utilizing an ATSS-EA Total Score in practical educational and clinical settings.
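The comparison of shared variance with AVE described above follows the logic of the Fornell-Larcker criterion, which can be sketched in Python as below. The inter-factor correlations are chosen within the reported 0.72 to 0.80 range, but their assignment to specific factor pairs and all AVE values are hypothetical placeholders, not figures from the manuscript.

```python
def fornell_larcker(ave_by_factor, correlations):
    """Check, for each factor pair, whether r^2 is below both factors' AVE."""
    results = {}
    for (a, b), r in correlations.items():
        shared_variance = r ** 2
        results[(a, b)] = (
            shared_variance < ave_by_factor[a]
            and shared_variance < ave_by_factor[b]
        )
    return results


# Hypothetical AVE values (illustration only).
ave_by_factor = {"Interest": 0.55, "Lifestyle": 0.60, "Participation": 0.66}

# Correlations within the reported 0.72-0.80 range; pairing is assumed.
correlations = {
    ("Interest", "Lifestyle"): 0.72,
    ("Interest", "Participation"): 0.80,
    ("Lifestyle", "Participation"): 0.76,
}

for pair, passed in fornell_larcker(ave_by_factor, correlations).items():
    print(pair, "passes" if passed else "fails")
```

A pair fails the criterion when the squared correlation exceeds either factor's AVE, which is the situation a higher-order "general sport attitude" factor would be expected to account for.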

Comment 5: Criterion-Related Validity: Overlapping Constructs. "Concern: The criterion-related validity analyses reveal correlations that may be artificially inflated... ATSS correlates with PPLS at r = 0.923... PAQ-C correlations (r=0.845) are exceptionally high... Recommendation: Please discuss the potential for common-method bias and note that future studies should include objective measures (e.g., accelerometry)... Acknowledge conceptual overlap... Provide a theoretical justification... Could you conduct an exploratory factor analysis combining ATSS and PPLS items...?"

Response: We deeply appreciate this critical perspective. As the reviewer rightly points out, correlations of .845 and .923 exceed typical meta-analytic behavioral associations, strongly pointing to Common Method Variance (CMV) and conceptual overlap. We have addressed the theoretical and methodological nuances of this overlap, though we respectfully decline the suggestion to pool the items into a single EFA for methodological reasons outlined below.

Action Taken:

CMV and Objective Measures (Already Addressed): As integrated during our previous revisions (and currently highlighted in the Limitations section), we have already explicitly acknowledged CMV as the primary driver of these inflated correlations. We also formally recommended the use of objective physical activity metrics (e.g., accelerometry) in future research to disentangle this overlap.
Theoretical Overlap & Justification: We expanded the Discussion to address the conceptual overlap between the scales. We explicitly noted that while the PPLS contains motivational items that overlap with the ATSS-EA, they remain theoretically distinct (direct evaluative attitude vs. holistic perceived competence). We also added a crucial developmental caveat: early adolescents likely struggle to empirically differentiate "liking sport" from "being physically literate" when responding to concurrent self-report formats.
Regarding the Combined EFA (Methodological Clarification): While combining ATSS and PPLS items into a single Exploratory Factor Analysis (EFA) is an interesting empirical thought experiment, we respectfully opted not to conduct this analysis. Because both the ATSS-EA and the PPLS are established, pre-validated instruments with known a priori structures, pooling their items into an exploratory (EFA) framework violates the strictly confirmatory (CFA) nature of this validation study. We believe the rigorous CFA-based discriminant validity checks (discussed in Action Taken 4) combined with our expanded theoretical discussion adequately address the distinctiveness of the constructs.

Comment 6: Cognitive Think-Aloud Protocol: Methodological Rigour. "Concern: The qualitative phase is a strength, but reporting lacks the detail needed... a) How were error codes generated? b) IRR? c) Specific items revised (Table S2a/b)? d) Saturation evidence? e) Retrospective probing details. Recommendation: Expand qualitative methods... Consider including representative quotes... Clarify whether the same 27 students participated in both rounds and discuss potential testing effects."

Response: We sincerely thank the reviewer for helping us further enrich the qualitative reporting. As incorporated during the previous revision round, we had already detailed the coding framework (Willis, 2005), reported the Inter-Rater Reliability, and provided the comprehensive list of original vs. revised items in Supplementary Tables S2a and S2b (which detail every specific wording change and rationale). However, the reviewer’s new suggestions to include exact probe questions, representative student quotes, and a discussion on the "testing effects" of the two-round design are excellent additions that significantly elevate the methodological transparency of this section.

Action Taken:

Specific Probe Questions: We expanded the Qualitative Methods section to include the exact standardized retrospective probes used during the interviews (e.g., "Can you explain in your own words what this sentence is asking?").
Representative Quotes: We added a narrative paragraph including direct, illustrative quotes from the middle school students. For example, we highlighted a 6th-grader's confusion over the term "physiological," demonstrating exactly how abstract adult terms were empirically flagged and translated into concrete adolescent phrasing.
Two Rounds & Testing Effects Justification: We clarified the procedure regarding the same 27 students participating in both rounds. We added a methodological justification explaining why traditional "testing effects" are not a confounding limitation in cognitive interviewing. Since the goal is linguistic comprehension rather than attitude measurement, having the same students re-evaluate the revised items 14 days later is actually a methodological strength; it allowed them to directly verify that the specific wording barriers they encountered in Round 1 were effectively resolved in Round 2.
Saturation Clarification: We clarified in the text that in the context of cognitive interviewing, "thematic saturation" does not require a complex grounded-theory grid, but simply denotes the empirical point (around the 20th interview) where no new vocabulary misunderstandings or syntactic confusions were generated by the students.

Comment 7: Missing Information About the Original Scale. "Concern: Readers unfamiliar with the original ATSS (Senturk, 2015) lack sufficient information... a) What was the original factor structure and item distribution? b) What were the original psychometric properties (beyond Cronbach's α)? c) Which specific items were modified... d) Was the original scale developed using similar theoretical foundations? Recommendation: Add a brief subsection describing the original ATSS... Create a table (possibly as supplementary material) cross-walking original items to adapted items... Discuss whether any items were removed entirely and why?"

Response: We sincerely appreciate the reviewer’s diligence in ensuring the original instrument is thoroughly contextualized. Fortunately, several of these excellent recommendations (such as elaborating on the original TPB theoretical foundations and reporting the original CFA psychometric properties beyond Cronbach's alpha) were already integrated into the Introduction during the first round of revisions. Furthermore, the requested cross-walk table detailing all item modifications was also previously created. However, we completely agree that explicitly stating the item distribution and clarifying that no items were deleted are critical additions that were missing from the text.

Action Taken:

Item Distribution & Removed Items: We updated the instrument description (Section 2.2.2) to explicitly state the original item distribution (Interest = 14 items, Lifestyle = 7 items, Participation = 4 items). Most importantly, we categorically clarified that zero items were removed entirely. The structural integrity of the 25-item scale was fully preserved; the adaptation was strictly linguistic and developmental.
Cross-Walking Table (Already Provided): As requested by the reviewer, a complete cross-walk table presenting the "Original Items," the "Revised Items," and the "Rationale for Modification" for all adapted items is already provided in the manuscript's appendices as Supplementary Table S2b. We have strengthened the in-text citations directing readers to this specific table to ensure maximum transparency.
Original Psychometrics & Theory: As noted in our previous revisions, the explicit confirmation of the original scale's TPB foundations and its robust original fit indices (e.g., RMSEA < .08, CFI > .90) are detailed in the Introduction section.

Comment 8: Known Groups Validity: Covariate Control. "Concern: ANCOVA (with gender and class level controlled) is explained, but the results are not fully presented (Table S7a)... Without seeing the adjusted means, it is unclear whether the licence effect is robust... Suggestion: Present ANCOVA results in the main text... Evaluate whether additional covariates (e.g., SES, access to facilities) should be controlled and consider their absence as a limitation. Discuss the possibility of reverse causality... Longitudinal data will be needed."

Response: We thank our reviewer for these informative methodological assessments. It appears there was a minor oversight regarding ANCOVA reporting, as robust effect sizes and F-statistics were indeed incorporated into the main text in the first round of revisions. However, we fully agree that the table needs to be centralized, and the reviewer's comment on "reverse causality" is an excellent theoretical caveat that clearly needs to be addressed.

Action Taken:

ANCOVA Results and Table Placement: We ensured that all ANCOVA results, including F-statistics, p-values, and effect sizes, were clearly highlighted in the main text. To avoid reader confusion, we moved the full ANCOVA table directly from the supplementary files to the Results section.

Covariates (SES and Facilities): As detailed in our previous revision, the Ministry of National Education’s (MEB) strict confidentiality protocols prohibited the direct collection of SES data from minors. We have now expanded this specific limitation statement to include the unmeasured variable “access to local sports facilities,” acknowledging that both are unmeasured confounding variables.

Reverse Causality and Need for Longitudinal Tracking: We added a separate paragraph to the Limitations section explicitly addressing the reviewer’s excellent point about reverse causality. We acknowledged that our cross-sectional design could not determine whether positive attitudes preceded athletic licensing or vice versa, and formally recommended that longitudinal tracking be conducted to determine the direction of this relationship.

Comment 9: Writing Quality and Organisation. "Concern: The manuscript contains numerous grammatical errors, awkward phrasing, and inconsistent terminology... Examples: Page 1 ('educational demands'), Page 3 (ATSS- EA hyphenation), Page 4 ('Muqla Sthk Koeman University' misspelling), Inconsistent 'Turkiye' vs 'Turkey', Page 5 ('fresher abstract thinking skills'), Page 14 (Whitehead phrasing), References formatting. Recommendation: Thorough copyediting... Create a terminology consistency table... Review 'Turkiye'... Simplify sentences... Check DOIs and references."

Response: We sincerely apologize for these typographical, grammatical, and formatting oversights. We are highly grateful to the reviewer for their meticulous eagle-eyed reading. The iterative nature of previous revisions inadvertently introduced some inconsistencies and awkward phrasing. We have thoroughly accepted all recommendations and conducted a comprehensive line-by-line copyediting sweep of the entire manuscript to ensure absolute professional consistency and readability.

Action Taken:

Terminology and Typographical Consistency: We utilized a consistency framework to standardize the scale abbreviation strictly to "ATSS-EA" throughout the text, tables, and supplementary materials. We completely standardized the country name to "Turkiye" in accordance with current international publishing guidelines. We also corrected the unfortunate OCR/typographical error regarding the affiliation (Mugla Sitki Kocman University).
Targeted Phrasing Corrections: We completely rewrote the specific awkward sentences identified by the reviewer:
- The repetitive phrase on Page 1 was simplified to: "...driven primarily by increasing academic pressures and rigorous curriculum demands."
- The unclear term "fresher" on Page 5 was scientifically clarified to: "...developing their emerging abstract cognitive capacities..."
- The Whitehead reference (Page 14) was streamlined to: "...originally conceptualized by Whitehead [65]..." (Please see tracked changes).
General Copyediting and Simplification: Beyond the specific examples provided, the entire manuscript underwent a final proofreading sweep by a fluent academic English speaker to break up overly complex, run-on sentences into shorter, more digestible statements to enhance flow and readability.
Reference Formatting: We systematically reviewed the reference list according to the journal's style guide. We standardized the abbreviations of the journal names and re-uploaded them using Zotero.

Comment 10: Discussion and Conclusions: Overreach and Repetition. "Concern: The discussion section repeats findings without synthesising them into meaningful implications... introduces new literature (Whitehead) that should have been in the introduction... repeatedly invokes SDG 3 but never specifies how educators would use ATSS-EA scores... Recommendation: Restructure the discussion to follow a standard format... Explain why it matters... Add a concrete practical applications section... Discuss cross-cultural comparisons... propose specific future studies (longitudinal, intervention)."

Response: We are exceptionally grateful for this structural critique. The reviewer is entirely correct that the Discussion section needed to evolve from a recapitulation of results into a highly structured, forward-looking synthesis. We have completely overhauled Section 4 to follow the reviewer’s recommended standard format, incorporating clear subheadings. Furthermore, as noted in our responses to prior comments, we have fully integrated the concrete SDG 3 practical applications and expanded on the self-report limitations.

Action Taken:

Structural Overhaul: We restructured the Discussion into three distinct subsections to dramatically improve flow and synthesis: 4.1. Theoretical Implications and Synthesis, 4.2. Practical Applications for Educators, and 4.3. Limitations and Future Directions.
Relocation of Literature: As rightly pointed out, we removed the foundational literature regarding physical literacy (Whitehead [65]) from the Discussion and properly integrated it into the Introduction.
Synthesis and "Why It Matters": In Section 4.1, we moved beyond merely restating correlations. We directly incorporated the reviewer’s excellent paradigm, explicitly explaining that the strong correlation between "Interest" and physical literacy means that targeting intrinsic enjoyment serves as an efficient, dual-target mechanism to simultaneously boost both attitude and perceived competence.
Concrete Practical Applications: In Section 4.2, building upon the macro-level SDG 3 policy example, we added micro-level applications. We explained exactly how PE teachers can use the ATSS-EA to identify students with declining interest, tailor PE curricula accordingly, and utilize the scores to evaluate program effectiveness.
Future Directions (Cross-cultural & Interventions): In Section 4.3, we expanded the future research agenda. We explicitly proposed longitudinal designs tracking attitudes from late childhood through adolescence, recommended using the ATSS-EA as an outcome measure in pre/post pedagogical interventions, and highlighted the need for cross-cultural adaptations for international application.

Comment 11: Specific Comments - Title, Abstract, and Introduction.

"Concern: The title is clear but lengthy. Consider shortening... Highlights: Add factor names, change 'preventive medicine specialists'... Abstract: Reword objectives, specify sample sizes, fix grammar error, add practical implications... Introduction: Start with a WHO statistic (Guthold et al., 2020) and connect to SDG 3... Map TPB/SDT... Clarify 'fresher abstract thinking'... Add a fifth hypothesis for measurement invariance."

Response: We are profoundly grateful for these highly specific and constructive editorial suggestions. The reviewer’s recommended phrasing for the title, the abstract, and the opening WHO statistic is exceptionally sharp and has significantly improved the manuscript's hook and clarity. We have implemented virtually all of these specific textual recommendations.

Action Taken:

Title and Highlights: We updated the title exactly as suggested: "Adaptation and Validation of the Attitude Towards Sport Scale for Early Adolescents: Implications for Sustainable Health and Well-Being." We also added the factor names to the first highlight bullet and replaced "preventive medicine specialists" with the more accurate "pediatric public health professionals."
Abstract Revisions: We streamlined the Objectives sentence, inserted the exact sample sizes for both the qualitative (n=27) and quantitative (N=531) phases, corrected the grammatical error regarding model fit, and added the recommended concluding sentence highlighting the tool's practical utility for educators. (Please see the revised Abstract).
Introduction - Opening Hook: We completely rewrote the opening paragraph of the Introduction. We incorporated the 81% physical inactivity statistic (Guthold et al., 2020) and explicitly linked this developmental decline to the specific targets of SDG 3, creating a much more compelling rationale. (Please see Page 1, Paragraph 1).
Introduction - Theory and Wording: Regarding the TPB/SDT mapping, rather than adding a redundant figure, we ensured the theoretical mapping is explicitly detailed in the text (and thoroughly expanded in the Discussion section, as addressed in Comment 11). Furthermore, the confusing phrase "fresher abstract thinking" was already identified and corrected to "emerging abstract cognitive capacities" during our comprehensive copyediting sweep (addressed in our response to Comment 10).
Hypothesis Addition: This was an excellent catch. Since we robustly tested and reported measurement invariance, we formally added it to the end of the Introduction as our fifth hypothesis (H5).

Comment 12: Methods. "Concern: Clarify cross-sectional methodological study... Ethics: written child assent? Sampling: Specify dates, public vs private, target vs achieved table... Table 1: Fix 'MGPAMDMPAT' typo... ATSS description: Likert neutral point... PAQ-C/PPLS: Original English citations... Personal Info: Recall bias, define license... Qualitative: Convert 156.3 months to years, semantic shift of 'social status'... Data Analysis: Justify MLR estimator."

Response: We are incredibly grateful for this exhaustive methodological checklist. It has allowed us to tie up several loose ends and typographical errors. Please note that several of these excellent recommendations—such as converting months to years (13.02 years), providing the Willis (2005) framework, detailing the rural/urban division of schools, adding the target vs. achieved distribution table (Table S4), and specifying the items per factor—were completely addressed and integrated during our previous revision rounds (as detailed in our earlier responses). However, the reviewer’s new catches regarding the typo in Table 1, the justification for MLR, and the semantic shift defense are brilliant, and we have implemented them immediately.

Action Taken:

Sampling & Dates (Section 2.1): We explicitly added the exact data collection timeline (February to March 2026) and clarified that the 15 schools were exclusively public institutions. We also explicitly defined "licensed status" as federation-level interscholastic sports, differentiating it from informal intramurals.
Table 1 Correction: We fixed the egregious OCR/typographical error in Table 1 ("MGPAMDMPAT") and replaced it with the clear, accurate headers for Physical Activity Duration.
MLR Justification (Section 2.4): We updated the Data Analysis section to explicitly justify the use of the MLR estimator. We noted that MLR is the gold standard for structural equation modeling when utilizing 5-point ordinal Likert data, as it is highly robust to slight deviations from multivariate normality.
Semantic Shift Defense (Section 2.3): We added a strong theoretical justification regarding the shift from "social status" to "being successful." We clarified that for early adolescents, social recognition is primarily conceptualized as peer-recognized success. Thus, the linguistic shift preserved the original theoretical construct while aligning with their developmental cognitive schemas.
Additional Minor Edits: We confirmed the Likert scale's neutral midpoint in the text, ensured original English citations for the PAQ-C and PPLS are present in the references, and added a specific acknowledgement of "recall bias" regarding self-reported activity durations to the Limitations section.

Comment 13: Results.

"Concern: Model Fit: Justify not using modification indices... Convergent Validity: Reorganize Table 2, report AVE/CR, enlarge Figure 1... Known-Groups: Add 95% CIs for Cohen's d in Table 3, report ANCOVA in main text, report ANOVA/t-test stats for grade levels... Criterion Validity: Add note to Table 4 regarding DMPAT/WVPAT, acknowledge CMV in Table 5, consider MTMM matrix, discuss theoretical implications."

Response: We sincerely thank the reviewer for this comprehensive checklist regarding the Results section. We are pleased to note that several of these critical items—such as reorganizing Table 2 by factor, incorporating the ANCOVA results directly into the main text, detailing the theoretical implications of the DMPAT/WVPAT divergence, and thoroughly explaining the Common Method Variance (CMV) driving the high correlation in Table 5—were already explicitly addressed and integrated during the prior revision round. However, the reviewer’s new requests to justify the absence of modification indices, provide the specific ANOVA statistics for grade levels, and improve the diagram resolution are excellent methodological enhancements that we have now fully implemented.

Action Taken:

Modification Indices Justification: We expanded the Model Fit (Section 3.1) paragraph to explicitly justify our refusal to use modification indices. We explained that avoiding post-hoc error covariances was a deliberate decision to preserve the strict a priori theoretical structure and to completely eliminate the risk of sample-specific overfitting (capitalizing on chance).
ANOVA Statistics for Grade Level: In Section 3.3, we updated the text to report the exact statistical values (F and p) for the non-significant ANOVA testing differences across grade levels, ensuring transparent reporting.
Table Notes and Figure Resolution: We added the requested interpretive note beneath Table 4 regarding the divergence between voluntary (WVPAT) and mandatory (DMPAT) activities. We also significantly enlarged Figure 1 (the CFA Path Diagram) and ensured it was re-exported at a high resolution (300 DPI) so all factor loadings and residual variances are perfectly legible.
Regarding the MTMM Matrix: While a Multitrait-Multimethod (MTMM) matrix is a superb suggestion for future research, it is unfortunately not possible with the current dataset. As acknowledged in our CMV discussion, all constructs in this study were measured using a single method (concurrent self-report surveys). Therefore, an MTMM matrix cannot be mathematically derived. We have heavily emphasized the need for objective multi-method data (e.g., accelerometry) in the Limitations section to address this.
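
For readers who want to see what the reported grade-level F statistic summarizes, the following is a minimal sketch of a one-way ANOVA computed from raw group scores. It is not the authors' analysis: the group names and scores are hypothetical placeholders, and the p value is left to a statistics package because it requires the F distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical attitude scores for three grade levels (illustration only).
grade_a = [3.8, 4.1, 3.9, 4.3]
grade_b = [4.0, 3.7, 4.2, 4.1]
grade_c = [3.9, 4.0, 4.4, 3.8]

f_stat, df_b, df_w = one_way_anova_f([grade_a, grade_b, grade_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F relative to its degrees of freedom means the group means differ little compared with the spread inside each group, which is the pattern a non-significant grade-level result reflects.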

Comment 14: Discussion and Conclusion. "Concern: Opening: Add unique contribution... Construct Validity: Connect to TPB... Physical Literacy/PA: Cut by 30-40%, integrate PPLS and DMPAT/WVPAT discussion... Known-Groups: Discuss lack of grade-level differences... Practical Implications: Provide PE teacher example, connect to SDG 3... Move physical literacy elaboration to intro... Conclusion: Cut to one concise paragraph, end with cross-cultural statement."

Response: We are profoundly grateful for this comprehensive roadmap for the Discussion and Conclusion sections. As the reviewer will note, several of the most substantial structural recommendations in this list—such as relocating the Whitehead physical literacy literature to the Introduction, integrating the exact PE teacher practical application, linking the scale to SDG Target 3.4, and thoroughly explaining the theoretical divergence between DMPAT/WVPAT and the common method variance regarding the PPLS—were already explicitly executed in response to the reviewer's previous, highly detailed comments (specifically Comments 1, 5, 11, and 13). Building upon that fully restructured foundation, we have now implemented the final theoretical refinements and aggressively streamlined the Conclusion as requested.

Action Taken:

Unique Contribution & TPB Framework: In the opening of the Discussion, we explicitly added the suggested statement highlighting that this is the first study to systematically combine cognitive pretesting with psychometric validation for an early adolescent sport attitude scale. We also added the reviewer's excellent theoretical insight regarding the TPB, noting that the structural preservation of the dimensions proves attitudes are firmly established, belief-based constructs by early adolescence.
Grade-Level Differences: We expanded the known-groups discussion to address the non-significant grade-level ANOVA results. We clarified that rather than being surprising, this non-significance indicates a critical window of attitudinal stability during middle school, immediately preceding the well-documented developmental drop-off in high school.
Condensing the Conclusion: We aggressively pruned the Conclusion section, removing repetitive statistical summaries. It is now a single, highly concise paragraph that focuses entirely on the macro-level implications, ending precisely with a forward-looking statement regarding the scale's potential for cross-cultural adaptation and future intervention research.

Comment 15: References. "Issues: Several references have 2026 publication dates... verify if preprints. Reference 26 (Senturk, 2015) is in Turkish – note this. Reference 32 (Kowalski et al.) lacks publication year. References 58-64 are all on SDGs... consider consolidating. Check all DOIs... Recommendation: Run all references through a reference manager."

Response: We greatly appreciate the reviewer’s eagle-eyed attention to our reference list. Accurate citations are the bedrock of academic integrity, and the reviewer caught several formatting oversights (such as the missing Kowalski year and the Turkish language notation) that we have now corrected. Regarding the 2026 publications, we can confirm these are not preprints; they are fully published, peer-reviewed articles from the current year with official DOIs and volume numbers.

Action Taken:

2026 Publications Verified: We double-checked the 2026 citations (e.g., Sukys et al., Roland et al.). They are correctly cited as fully published articles in the current calendar year.
Missing Details Added: We updated Reference 26 (Senturk, 2015) to explicitly include the "(In Turkish)" notation at the end of the citation. We also added the missing publication year (2004) to the Kowalski et al. PAQ-C manual reference.
Consolidating SDG Citations: We completely agree that citing seven consecutive papers (Refs 58-64) for the SDG statement was excessive (citation stuffing). We have consolidated this section, retaining only the three most foundational and relevant references to support the SDG 3 framework, and removed the redundant citations from the reference list.
DOI and Formatting Sweep: As detailed in our response to Comment 10 (Writing Quality), we conducted a comprehensive review of all references to ensure strict adherence to the journal's style guide, standardizing all DOI prefixes (https://doi.org/10...) for correct accessibility.
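
The DOI prefix standardization described above can be sketched as a small normalization step. This is an illustrative assumption about how such a sweep might be automated, not the procedure the authors used; the function name `normalize_doi` and the regular expression are hypothetical.

```python
import re

# A DOI starts with "10.", a registrant code of 4 to 9 digits, a slash, then a suffix.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def normalize_doi(raw):
    """Return the DOI as an https://doi.org/ URL, or None if no DOI is found."""
    match = DOI_PATTERN.search(raw.strip())
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in a reference list.
    doi = match.group(1).rstrip(".,;")
    return "https://doi.org/" + doi


# The article's own DOI written in three common styles.
for entry in (
    "doi:10.3390/healthcare14070842",
    "http://dx.doi.org/10.3390/healthcare14070842.",
    "https://doi.org/10.3390/healthcare14070842",
):
    print(normalize_doi(entry))
```

All three inputs resolve to the same URL form, so a reference list passed through a step like this would end up with uniform prefixes.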

Comment 16: Figures and Tables.

"Concern: Figure 1: Increase resolution, include loadings... Table 1: Define MGPAMDMPAT, format n/%, Mean/SD... Table 2: Reorganize, refer to item wording... Table 3: Add ATSS total score... Table 4: Add interpretation of effect sizes... Table 5: Acknowledge CMV, present CIs... Supplementary Tables: Summarize S2a/S2b, report delta values for invariance, include full ANCOVA statistics."

Response: We are profoundly grateful for this final, comprehensive formatting checklist. It served as an excellent quality-assurance tool for our tables and figures. As detailed in our preceding responses (specifically Comments 3, 5, 8, 12, and 13), the vast majority of these essential formatting corrections have already been fully executed and integrated. These include the high-resolution export of Figure 1, the correction of the Table 1 typo, the addition of the robust ANCOVA table with partial eta squared (ηp²), the insertion of 95% CIs and the CMV note for Table 5, and the extensive cross-walk reference for the qualitative item revisions. We have now completed the final remaining refinements to ensure total alignment with the reviewer's standards.

Action Taken (Final Table Refinements):

Table 4 (Effect Sizes): We added the specific interpretive note beneath Table 4, explicitly stating that the correlations with voluntary physical activity (WVPAT) represent medium-to-large effect sizes, perfectly differentiating them from the negligible DMPAT associations.
Table 3 & Table 2: We confirmed that the "ATSS-EA Total Score" is fully reported in the known-groups comparison (Table 3). For Table 2, we maintained the clear factor-based organization and ensured that the text strictly directs readers to the Supplementary Materials for the full linguistic wording of all 25 items, as including them within Table 2 would severely compromise its readability.
Measurement Invariance: We updated the measurement invariance reporting to explicitly highlight the CFI and RMSEA values, ensuring they are interpreted against the gold-standard Chen (2007) thresholds for establishing strict structural equivalence.
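
To illustrate how changes in fit between nested invariance models are typically read, the sketch below compares two models using the cutoffs commonly attributed to Chen (2007): a CFI decrease of .010 or more, supplemented by an RMSEA increase of .015 or more, is taken as a sign of noninvariance. The fit values are hypothetical placeholders rather than the manuscript's results, and the function name and default cutoffs are assumptions made for this example.

```python
def compare_nested_models(fit_less, fit_more, cfi_cutoff=0.010, rmsea_cutoff=0.015):
    """Compare a less constrained model with a more constrained one.

    Each argument is a dict with "cfi" and "rmsea" keys. Returns the
    deltas (more constrained minus less constrained) and one flag per index.
    """
    # Round to three decimals, the precision at which fit indices are reported.
    delta_cfi = round(fit_more["cfi"] - fit_less["cfi"], 3)
    delta_rmsea = round(fit_more["rmsea"] - fit_less["rmsea"], 3)
    return {
        "delta_cfi": delta_cfi,
        "delta_rmsea": delta_rmsea,
        # True when CFI worsens by at least the cutoff.
        "cfi_flag": delta_cfi <= -cfi_cutoff,
        # True when RMSEA worsens by at least the cutoff.
        "rmsea_flag": delta_rmsea >= rmsea_cutoff,
    }


# Hypothetical fit indices for a configural and a metric model.
configural = {"cfi": 0.950, "rmsea": 0.050}
metric = {"cfi": 0.947, "rmsea": 0.051}

print(compare_nested_models(configural, metric))
```

Returning the deltas alongside the flags keeps the judgment transparent, since readers can check the change values directly instead of relying only on a pass or fail label.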

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

GOOD REVISIONS

Author Response

Thank you for your detailed review.
