1. Introduction
Early adolescence is a developmental window in which rapid cognitive gains co-occur with heightened emotional vulnerability. During this period, neuroplasticity is markedly enhanced, facilitating learning and growth in academic ability; at the same time, emotion-regulatory functions of the prefrontal cortex are not yet fully mature, rendering adolescents more susceptible to mood lability and mental health problems (
Casey et al., 2019;
Pfeifer & Allen, 2021). Globally, mental health problems among adolescents constitute a major public health concern, with recent estimates indicating that approximately 10–20% experience anxiety or depressive symptoms and that prevalence rates have continued to rise in recent years (
Lu et al., 2024;
World Health Organization, 2025). In China, systematic reviews and meta-analyses similarly report high prevalence rates of internalizing symptoms among children and adolescents, suggesting a substantial mental health burden comparable to, or in some estimates exceeding, global averages (
Z. Chen et al., 2023;
Zhou et al., 2024).
Within China’s Confucian-heritage educational system, academic achievement occupies a central social and cultural position and is widely regarded as a primary pathway to social mobility and family honor. This emphasis, reinforced by parental expectations and social comparison, often translates into sustained academic pressure during adolescence, increasing vulnerability to internalizing problems such as depression and anxiety. At the same time, students in participating Chinese regions consistently demonstrate strong academic performance relative to international benchmarks (
Organisation for Economic Co-operation and Development, 2023). Comparative research suggests that this coexistence of high achievement and high academic pressure is particularly salient in East Asian education systems, giving rise to the widely discussed “high achievement–high pressure” profile (
Steare et al., 2023). In response, China’s Ministry of Education issued the Double Reduction policy in 2021, aiming simultaneously to reduce academic burden and strengthen school-based mental health services. Against this backdrop, examining the coordinated development of academic ability and mental health in early adolescence is of immediate practical importance and of policy relevance for advancing educational equity and improving school mental-health support systems.
Adolescence is widely recognized as a critical developmental period characterized by elevated risk for mental health problems. A large-scale meta-analysis of 192 epidemiological studies identified the peak age of onset for most mental disorders at approximately 14.5 years, with a substantial proportion of first-episode symptoms emerging before age 18 (
Solmi et al., 2022). In Western populations, clear sex differences have been documented, with boys showing higher rates of externalizing problems and girls exhibiting greater vulnerability to internalizing symptoms such as anxiety and depression (
Salk et al., 2017). In the Chinese context, cultural socialization processes—including collectivist norms and high societal and academic expectations—may further predispose adolescents to internalized psychological difficulties (
X. Chen et al., 2013). Recent systematic reviews and large-scale studies indicate that approximately 22–26% of Chinese children and adolescents report depressive symptoms, and about one quarter report anxiety symptoms, placing China at a medium-to-high level internationally (
Xu et al., 2024). Longitudinal and repeated cross-sectional evidence further suggests that internalizing symptoms increase steadily during the junior-high years (roughly ages 12–15), particularly among girls and adolescents exposed to high academic pressure (
Liu et al., 2024;
Y. Sun et al., 2023;
Wu et al., 2022). Thus, this body of evidence highlights early adolescence as a critical window for mental-health risk emergence and an important target period for educational and preventive interventions.
Academic ability is a dynamic construct that evolves with cognitive maturation and educational experience. Longitudinal and growth-model research demonstrates that the development of core academic skills is shaped by the interplay of educational environment, socioeconomic resources, and neurocognitive maturation, showing nonlinear trajectories characterized by alternating acceleration and plateau phases (
Erbeli et al., 2021;
Little et al., 2021). In reading, fluency growth is fastest in the early grades but slows thereafter, and students exhibit heterogeneous developmental profiles rather than a single path (
Khanolainen et al., 2024). The reciprocal association between reading and mathematics is stronger in elementary school but tends to weaken or change direction in secondary education (
Gnambs & Lockl, 2023). Longitudinal evidence from Chinese children similarly reveals persistent divergence in vocabulary growth (
van der Kleij et al., 2023). With the onset of early adolescence (approximately 10–14 years), neural and cognitive systems undergo major reorganisation: prefrontal functions and executive control develop rapidly, and abstract and formal-operational reasoning begin to emerge (
Best & Miller, 2010). Such ability differences become socially visible through grades, standardized tests, and classroom rankings, triggering social-comparison effects that may intensify achievement anxiety and widen self-efficacy disparities, leaving some students struggling to cope with academic pressure and to regulate their emotions (
Jiang et al., 2021). Conversely, higher language and vocabulary ability facilitates emotional expression and social communication, thereby reducing psychological distress (
Hentges et al., 2021;
Mellado, 2025). Academic ability and mental health are thought to influence each other bidirectionally. According to Attentional Control Theory, anxiety disrupts goal-directed attention and consumes working-memory resources, impairing task performance (
Eysenck et al., 2007). Excessive stress and anxiety heighten attention to threat cues, reduce working-memory efficiency, and decrease task persistence, whereas chronic anxiety or depression is associated with lower classroom engagement and achievement (
Owens et al., 2012;
Linnenbrink-Garcia & Pekrun, 2014).
Cross-sectional studies have consistently revealed a negative association between mental-health indicators and academic ability (
Steinmayr et al., 2016); however, such evidence only captures static relationships at a single time point. To compare the strength of this association across developmental stages and to identify its trajectory or lagged effects, longitudinal designs are required (
Pekrun et al., 2022). Cross-study comparisons indicate that most longitudinal research supports a negative relationship between mental health and academic ability, yet the magnitude and direction of this association vary across samples, statistical models, and control variables. For instance, longitudinal tracking studies in European samples have found that depressive symptoms predict subsequent declines in academic performance; however, when baseline ability and socioeconomic background are statistically controlled, the association often weakens or becomes non-significant (
López-López et al., 2021;
Wickersham et al., 2023). In Chinese adolescent samples, some studies have reported a persistent negative predictive effect of anxiety on subsequent Chinese-language achievement, whereas others suggest that this effect may be limited to short-term fluctuations or may vary depending on the measurement approach employed. These inconsistencies not only reflect the stage-specific and context-dependent nature of the relationship between mental health and academic ability but also highlight notable limitations in current psychometric practices (
W. Chen, 2025;
Ye et al., 2019). At present, many longitudinal studies of academic development still rely on the Classical Test Theory (CTT) framework, in which measurement typically depends on within-grade standard scores, raw scores, or teacher ratings. Such scores primarily reflect an individual’s relative position within a specific sample or test form, rather than representing an interval-scaled latent ability; consequently, cross-grade comparisons may overestimate or underestimate true growth (
Protopapas et al., 2014). Moreover, item difficulty, scoring standards, and content coverage often differ substantially across grades or test versions. Without employing scaling or linking techniques to place all forms on a common scale, it is impossible to ensure measurement invariance and conceptual equivalence of the latent construct (
Kolen & Brennan, 2014). Therefore, longitudinal or cross-sectional studies based solely on CTT scores may conflate true developmental change with measurement error, hindering accurate identification of stage-specific features in the link between mental health and academic ability (
Gorter et al., 2015).
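To make the scaling concern concrete, the following minimal Python sketch uses invented numbers (not data from this study) to show how within-grade standardization can erase growth that is visible on a common scale. The latent values, the grade labels, and the helper `within_grade_z` are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical latent abilities on a common scale (illustrative numbers only):
# each grade is about one unit higher than the previous one, with equal spread.
latent = {
    7: [-1.0, 0.0, 1.0],
    8: [0.0, 1.0, 2.0],
    9: [1.0, 2.0, 3.0],
}

def within_grade_z(scores):
    """Standardize scores against their own grade's mean and SD."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

z_scores = {grade: within_grade_z(vals) for grade, vals in latent.items()}

# Grade means on the common scale preserve the between-grade growth ...
common_means = {g: mean(v) for g, v in latent.items()}
# ... whereas within-grade z-scores are centered at zero in every grade.
z_means = {g: mean(v) for g, v in z_scores.items()}

print(common_means)
print(z_means)
```

In this toy case the common-scale means rise across grades while every within-grade z-score mean is zero, so a cross-grade comparison of z-scores would show no growth at all.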
Overall, developmental evidence concerning the relationship between academic ability and mental health during early adolescence (approximately 12–15 years) in the Chinese context remains limited. Existing studies have primarily focused on cross-sectional samples or a single educational stage, offering little insight into developmental continuity. To address this gap, the present study adopts a developmental perspective that integrates educational measurement and mental health research to examine whether the association between academic ability and internalizing symptoms follows a dynamic, stage-specific pattern during early adolescence. Vocabulary comprehension was selected as a core indicator of academic ability, given its foundational role in language understanding and its relevance across academic domains (
Ricketts et al., 2020). Methodologically, Item Response Theory (IRT) and vertical scaling were used to construct a unified developmental vocabulary ability scale spanning Grades 1–12. This scale was then applied to an independent junior-high sample (Grades 7–9) using a fixed-item-parameter calibration procedure (
König et al., 2021), enabling developmentally comparable ability estimates. This measurement framework allowed us to examine cross-grade variation in the association between academic ability and internalizing symptoms without confounding developmental differences with measurement artifacts.
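The logic of scoring a new sample on an existing scale can be sketched as follows. This is a simplified illustration, not the calibration procedure used in the study: it assumes a two-parameter logistic (2PL) model, which the text does not specify, and it covers only ability estimation with item parameters held fixed. The item parameters, response patterns, and the function `estimate_theta` are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate.

    responses: list of 0/1 item scores.
    items: list of (a, b) pairs already expressed on the common scale.
    The item parameters are treated as known and are never re-estimated.
    """
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p) if u == 1 else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Hypothetical anchored item parameters (discrimination a, difficulty b).
items = [(1.0, -1.0), (1.2, 0.0), (0.9, 1.0), (1.1, 2.0)]

theta_low = estimate_theta([1, 0, 0, 0], items)   # only the easiest item correct
theta_high = estimate_theta([1, 1, 1, 0], items)  # all but the hardest item correct
print(theta_low, theta_high)
```

Because the item parameters stay on the original metric, the resulting ability estimates for students in different grades are expressed in the same units, which is the property the fixed-parameter approach is meant to secure.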
Based on developmental theory and prior empirical evidence, the present study tested two primary hypotheses. First, vocabulary ability was expected to be negatively associated with internalizing symptoms during early adolescence, such that adolescents with higher vocabulary ability would report lower levels of depression, anxiety, and stress (H1). Second, the strength of this association was hypothesized to vary across grade levels, reflecting developmental differences in academic demands and emotional vulnerability during early adolescence (H2). In addition, by comparing IRT-based vertically scaled vocabulary ability estimates with within-grade standardized raw scores, the study examined whether conclusions about academic–mental health associations depend on the measurement framework used.
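One way to express the two hypotheses formally is a moderated regression of the following form. The notation is an illustrative assumption rather than the model specification reported in the study: Grade 7 is taken as the reference category, and gender and only-child status are included as covariates, as mentioned in the Discussion.

```latex
\begin{equation}
Y_i = \beta_0 + \beta_1 \theta_i
    + \beta_2 \mathrm{G8}_i + \beta_3 \mathrm{G9}_i
    + \beta_4 (\theta_i \times \mathrm{G8}_i)
    + \beta_5 (\theta_i \times \mathrm{G9}_i)
    + \gamma_1 \mathrm{Gender}_i + \gamma_2 \mathrm{OnlyChild}_i
    + \varepsilon_i
\end{equation}
```

Here $Y_i$ is an internalizing-symptom score (depression, anxiety, or stress), $\theta_i$ is the vocabulary ability estimate, and $\mathrm{G8}_i$ and $\mathrm{G9}_i$ are grade indicators. Under this parameterization, H1 corresponds to $\beta_1 < 0$, and H2 corresponds to at least one of the interaction coefficients $\beta_4$ or $\beta_5$ differing from zero.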
4. Discussion
4.1. Overview of Main Findings
This study examined the association between academic ability and mental health during early adolescence from a developmental perspective. To enable cross-grade comparability, we used IRT to construct a vertically linked vocabulary scale spanning Grades 1–12, thereby placing ability estimates on a common developmental metric. Using this common metric, we examined associations between vocabulary ability and three internalizing symptoms (depression, anxiety, and stress) and tested whether these associations varied across Grades 7–9. Higher vocabulary ability was associated with lower levels of all three symptoms. These associations remained statistically significant after controlling for gender and only-child status, and the Grade 8 interaction terms suggested a steeper negative association than in Grade 7, albeit with modest incremental variance explained. Overall, these findings contribute empirical evidence on academic–mental health co-development in adolescence and illustrate how IRT-based vertical scaling supports the identification of developmental patterns in academic–psychological associations across grades in the Chinese context.
4.2. Developmental Interpretation of Grade Differences
In a sample of junior secondary students, higher vocabulary ability was significantly associated with lower levels of depression, anxiety, and stress. This finding is consistent with longitudinal and review studies showing that children with weaker language skills are more likely to exhibit internalizing problems during late childhood and early adolescence (
Bornstein et al., 2013;
Hentges et al., 2021). Language ability may reduce internalizing symptoms through two primary pathways: by facilitating emotion regulation (e.g., cognitive reappraisal or linguistic distancing;
Nook et al., 2020,
2025) and by enhancing social competence (e.g., improved peer interaction and emotional support;
Wieczorek et al., 2024). In this context, stronger language ability may reflect richer emotional vocabulary and inner speech resources, which could support emotion labeling and cognitive reappraisal as well as more effective social communication. Consequently, students with higher vocabulary ability may report lower levels of internalizing symptoms. Furthermore, the negative association appeared relatively more pronounced in Grade 8 than in Grade 7. Within the Chinese middle-school curriculum structure, Grade 8 marks a significant increase in course difficulty and academic demands, which may heighten the relevance of language-related resources for emotional regulation and coping (
J. Sun, 2024). In this context, the association between vocabulary ability and internalizing symptoms may become more salient during Grade 8, even if the overall effect size remains modest.
By contrast, although Grade 9 is generally associated with increasing pressure related to the high-stakes entrance examination, data collection in the present study took place at the beginning of the fall semester, prior to the peak period of exam-related stress. Prior research suggests that exam-related stress and emotional distress tend to intensify as high-stakes examinations approach rather than remaining constant across the school year, with peaks in mental health symptoms often observed during examination periods (
George, 2024). As a result, the psychological burden typically associated with imminent entrance examinations may not yet have fully manifested at the time of assessment in the present study. Moreover, in the Chinese context, higher academic burden and examination-related pressure have been consistently associated with depressive and anxiety symptoms among adolescents (
Wang et al., 2025). In addition, Grade 9 students may enter a more structured phase of exam preparation (e.g., standardized instruction and collective training) early in the semester. Such arrangements may reduce between-student variability in study routines, perceived stress, and emotional responses, thereby attenuating the statistical detectability of interaction effects at this stage.
4.3. Methodological Implications
Methodologically, this study goes beyond prior work that has relied primarily on CTT-based raw scores or within-grade standardized scores by adopting an IRT-based vertical scaling framework to support cross-grade comparability of vocabulary ability. By placing academic performance on a common latent scale rather than treating it as grade-specific or sample-dependent, this approach enabled developmentally interpretable comparisons across grades. This measurement strategy proved critical for identifying stage-specific patterns in the association between academic ability and internalizing symptoms. Using a unified ability scale, we were able to examine how the strength of academic–psychological associations varies across developmental stages, revealing a more pronounced association in early secondary school (especially Grade 8). Such patterns may be difficult to detect using conventional within-grade standardization, which removes between-grade variance and may obscure developmental differences.
The use of IRT calibration and linking aligns with established practices in large-scale assessments such as NAEP, PISA, and TIMSS, where common metrics are used to ensure comparability across forms, grades, and populations (
Yamamoto & Mazzeo, 1992). Consistent with prior large-scale studies of academic skill development, the resulting vocabulary trajectory showed a nonlinear, stage-specific pattern, characterized by steady growth in primary school, accelerated growth with widening individual differences in early secondary school, and a leveling-off in upper secondary school (
Peng et al., 2019). Importantly, by integrating a developmentally comparable academic ability scale with mental health outcomes, the present study illustrates how vertically scaled academic measures can advance research on the dynamic interplay between academic and psychological development during adolescence.
4.4. Limitations and Future Directions
This study has several limitations. First, participants were recruited via convenience sampling from public schools in two economically developed Chinese cities, which may limit generalizability to other regions and school contexts. Neither dataset collected individual-level socioeconomic indicators or ethnicity; contextual information was limited to the city/school level. Because Dataset 1 primarily served IRT calibration and vertical scaling, we prioritized large grade-level samples and response-quality screening over detailed background variables, and Dataset 2 included only basic demographics for the regression models. Preliminary checks suggested possible differential item functioning (DIF) in a small number of items, and mean ability estimates differed across cities, supporting the need for broader multi-site calibration and validation. Future studies should sample more diverse regions and school types and collect richer contextual data to test invariance/DIF and improve the robustness and generalizability of the scale. Second, because the present study adopted a cross-sectional design, causal direction and true developmental trajectories could not be identified, and inferences about underlying mechanisms may be biased. In addition, academic ability was represented solely by vocabulary ability, excluding domains such as mathematics and reading comprehension, which limits a comprehensive understanding of academic–mental health covariation. Future research should employ multi-wave longitudinal designs to examine the temporal ordering and causal effects between vocabulary ability and internalizing symptoms. Future work should also integrate mathematics and reading-comprehension measures within the unified scaling framework and include process-level tracking of mental health (e.g., emotion-regulation tasks or experience sampling) to enhance the explanatory power of mechanism testing.
Furthermore, future studies could extend hybrid IRT approaches, such as multi-group IRT, Bayesian hierarchical modeling, and comparisons of linking under the non-equivalent groups with anchor test (NEAT) design across regions, to systematically evaluate differential item functioning (DIF) and measurement fairness, thereby improving cross-regional comparability.
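As a minimal sketch of what a cross-regional linking comparison might involve, the example below applies the mean-sigma method, one simple linking option for anchor-item designs, to hypothetical anchor-item difficulties. It is not a procedure reported in this study; the difficulty values, the region labels, and the function `mean_sigma_constants` are assumptions introduced for illustration.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_source, b_target):
    """Return slope A and intercept B that map source-scale difficulties
    onto the target scale using the mean-sigma method."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B

# Hypothetical difficulties of the same anchor items from two separate
# regional calibrations (region 2 constructed here as 2 * region 1 + 1.5).
b_region_1 = [-1.0, 0.0, 1.0, 2.0]
b_region_2 = [-0.5, 1.5, 3.5, 5.5]

A, B = mean_sigma_constants(b_region_1, b_region_2)

# Region 1 difficulties re-expressed on the region 2 scale.
linked = [A * b + B for b in b_region_1]

# Item-level discrepancies after linking; large values would single out
# anchor items that behave differently across regions.
residuals = [target - link for target, link in zip(b_region_2, linked)]
print(A, B, residuals)
```

In real data, residuals of this kind offer only an informal screen; anchor items with large discrepancies would be candidates for a formal DIF analysis before being retained for linking.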