Our findings reveal a modest yet statistically significant linear relationship between intelligence and creativity (r = 0.123, p < .001) across a large, developmentally diverse sample of Hong Kong Chinese students. This is broadly consistent with previous large-scale meta-analyses such as Kim (2005) (r = 0.174) and Gerwig et al. (2021) (r = 0.25). What initially appears to be a straightforward association between the two constructs, however, becomes more informative on closer examination. The key result lies not in the baseline correlation itself but in the nonlinear structure embedded within the data, a pattern that both challenges and refines decades of theoretical assumptions about where intelligence matters for creative thinking.
The scholarly discourse surrounding intelligence and creativity has long centered on what Guilford (1967) called a “necessary but not sufficient” relationship, a formulation that has anchored research discussions for generations. Yet this theoretical anchor increasingly requires revision. Recent work by Weiss et al. (2020) and Karwowski et al. (2016) has demonstrated that the traditional 120-threshold model oversimplifies what is a far messier and possibly more culturally contingent phenomenon. Our analytical approach diverged from prior methodologies: rather than imposing a predetermined threshold, we employed segmented regression analysis with empirical breakpoint detection, allowing the data to indicate whether and where meaningful inflection points exist.
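To make the idea of empirical breakpoint detection concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes a continuous piecewise-linear model with a single breakpoint, searches a grid of candidate breakpoints, and compares the best segmented fit with a plain linear fit by BIC. The function names (`ols_rss`, `gaussian_bic`, `fit_one_breakpoint`), the candidate grid, and the simulated data are all illustrative assumptions; covariates such as age and grade are omitted for brevity.

```python
import numpy as np

def ols_rss(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def gaussian_bic(rss, n, k):
    """BIC for a Gaussian linear model, up to an additive constant.

    The error variance is left out of k for every model, so BIC
    differences between models are unaffected.
    """
    return n * np.log(rss / n) + k * np.log(n)

def fit_one_breakpoint(x, y, candidates):
    """Grid-search one breakpoint for a continuous piecewise-linear fit."""
    n = len(x)
    best = None
    for c in candidates:
        hinge = np.maximum(x - c, 0.0)           # slope change applies above c
        X = np.column_stack([np.ones(n), x, hinge])
        beta, rss = ols_rss(X, y)
        score = gaussian_bic(rss, n, k=4)        # 3 coefficients + breakpoint
        if best is None or score < best["bic"]:
            best = {"breakpoint": float(c), "beta": beta, "bic": score}
    return best

# Synthetic data for illustration only (NOT the study's data): the slope
# is 0.05 below a hypothetical breakpoint of 102 and flat above it.
rng = np.random.default_rng(0)
n = 2000
iq = rng.normal(100, 15, n)
creativity = 0.05 * iq - 0.05 * np.maximum(iq - 102, 0) + rng.normal(0, 0.3, n)

_, rss_linear = ols_rss(np.column_stack([np.ones(n), iq]), creativity)
bic_linear = gaussian_bic(rss_linear, n, k=2)
fit = fit_one_breakpoint(iq, creativity, np.arange(85.0, 121.0, 1.0))

print("estimated breakpoint:", fit["breakpoint"])
print("BIC(linear) - BIC(segmented):", round(bic_linear - fit["bic"], 2))
```

A positive BIC difference in the last line favors the segmented model. Dedicated tools (for example, iterative estimation as described by Muggeo, 2008) estimate the breakpoint on a continuous scale rather than on a grid; the grid is used here only to keep the sketch short.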
4.1. Methodological Advantages
One of the main methodological advantages of this work is the transparent use of segmented regression with empirically detected breakpoints, coupled with explicit model comparison using information criteria. Such an approach moves decisively beyond the problematic practice of trying different threshold values until the desired results emerge, a form of p-hacking. Complementing this data-driven approach, another strength of the present investigation concerns sample size and analytical precision. Whereas earlier threshold research, including the influential work by Karwowski et al. (2016) and the recent comprehensive cross-cultural study by Repeykova et al. (2025), often operated with sample sizes insufficient for stable nonlinear modeling (frequently N < 200 per group), our substantially larger sample affords several analytical advantages. The segmented regression estimates are considerably more stable, with tighter confidence intervals around the detected breakpoint. This statistical robustness substantially reduces the likelihood that our threshold of 102 represents mere sample-specific fluctuation rather than a genuine feature of the intelligence–creativity relationship in this population. Furthermore, the bootstrap confidence intervals and model comparison indices (BIC differential, Davies test statistic) gain credibility through increased statistical power. Where Repeykova et al. (2025) reported wide 95% CI bounds (e.g., 75.76 to 124.24 for their UAE subsample), larger samples permit narrower confidence intervals that better discriminate signal from noise. This enhanced precision enables more confident inferences about where the intelligence–creativity inflection point lies. Larger samples also stabilize variance estimation, a particular concern in nonlinear modeling, where heteroscedasticity can produce spurious breakpoint detection (Muggeo, 2008).
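The link between sample size and the width of a breakpoint confidence interval can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than the procedure used in this study: it applies a percentile bootstrap (resampling cases) to a grid-searched single breakpoint and compares a small and a large simulated sample. The helper names, the sample sizes of 150 and 3000, the 200 bootstrap replicates, and the data-generating model are all hypothetical.

```python
import numpy as np

def best_breakpoint(x, y, candidates):
    """Return the candidate breakpoint with the smallest residual sum of squares.

    All candidates share the same number of parameters, so minimizing RSS
    is equivalent to minimizing BIC here.
    """
    best_c, best_rss = None, np.inf
    for c in candidates:
        X = np.column_stack([np.ones(len(x)), x, np.maximum(x - c, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if rss < best_rss:
            best_c, best_rss = float(c), rss
    return best_c

def bootstrap_ci(x, y, candidates, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the breakpoint."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        draws[b] = best_breakpoint(x[idx], y[idx], candidates)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(draws, [alpha, 1 - alpha])
    return float(lo), float(hi)

def simulate(n, rng):
    """Synthetic data with a hypothetical breakpoint at 102 (illustration only)."""
    x = rng.normal(100, 15, n)
    y = 0.05 * x - 0.05 * np.maximum(x - 102, 0) + rng.normal(0, 0.3, n)
    return x, y

rng = np.random.default_rng(1)
grid = np.arange(85.0, 121.0, 1.0)
ci_small = bootstrap_ci(*simulate(150, rng), grid)
ci_large = bootstrap_ci(*simulate(3000, rng), grid)
width_small = ci_small[1] - ci_small[0]
width_large = ci_large[1] - ci_large[0]

print("n = 150 : 95% CI", ci_small, "width", width_small)
print("n = 3000: 95% CI", ci_large, "width", width_large)
```

Under these assumptions the interval from the larger sample should be the narrower of the two, which is the qualitative point made above. The exact widths depend on the simulated effect size and noise level and say nothing about the intervals in the present data.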
What emerged from this data-driven modeling was instructive on multiple fronts. The simplest linear specification, which assumes a uniform relationship across the entire intelligence spectrum, captured only marginal explanatory power. However, the Davies test comparing zero breakpoints to one yielded a statistic of 11.33 with a bootstrap p-value of .010, clear evidence that some form of structural shift was present. When we conducted a greedy BIC search to identify the empirically optimal single breakpoint, it emerged at an intelligence score of 102, notably lower than the long-standing Western threshold of 120. This finding aligns with recent cross-cultural investigations. Repeykova et al. (2025) examined Russian and United Arab Emirates samples and identified distinct breakpoints at 128 and 100, respectively. This underscores a critical insight: even if the threshold hypothesis itself is universal, the precise location of the threshold seems to vary substantially across contexts, a variation that may reflect environmental factors such as culture. Such variation is not merely statistical noise; rather, it reflects deeper differences in how intelligence becomes mobilized toward creative ends, as is evident from the mixed findings of previous threshold studies that tested samples of diverse geographic and cultural origins. For instance, thresholds have been reported at 85 in German samples (Preckel et al., 2006), at 109.2 in Chinese samples (Shi et al., 2017), and at 120–129 in Turkish samples (Çetinkaya, 2023).
The data-driven 102-model outperformed the conventional 120-model decisively, with a BIC differential of 8.89 points, which Kass and Raftery (1995) would classify as strong evidence favoring the empirically detected breakpoint. We pursued this investigation further, testing more complex two-, three-, and four-breakpoint specifications for additional inflection points. The four-breakpoint specification showed a pattern similar to that of the three-breakpoint specification but offered no better explanatory power. We therefore examined the three-breakpoint structure more closely. Its pattern (an initial modest slope below 97, pronounced acceleration between 97 and 102, a dramatic drop to near zero between 102 and 115, and a marginal rebound above 115) resembled overfitting to noise. The partial R² values shrank to negligible proportions (0.00040), and the BIC climbed by 12.15 points, a clear Bayesian penalty reflecting excessive complexity without proportionate explanatory benefit. Nevertheless, something in that oscillating pattern is worth attending to. Sligh et al. (2005) proposed that individuals at different intellectual levels may deploy their fluid reasoning in qualitatively distinct ways during idea generation. The sharp acceleration just above 97, followed by the near-zero slope above 102, might capture such a qualitative switching mechanism. This suggests that individuals hovering just below 102 may gain considerable creative benefit from marginal increases in reasoning capacity, whereas those above 102 may increasingly depend on other factors such as personality traits, accumulated domain expertise, and motivational variables rather than raw intellectual power.
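For readers less familiar with how the BIC differentials reported above are read, the sketch below maps a BIC difference onto the verbal evidence categories of Kass and Raftery (1995), treating the difference as an approximation to twice the natural log of the Bayes factor. The function names are illustrative, and the Bayes factor conversion is the usual large-sample approximation rather than an exact quantity.

```python
import math

def kass_raftery_label(delta_bic):
    """Verbal evidence category for |delta BIC| (approx. 2 * ln Bayes factor),
    following the cut-offs in Kass and Raftery (1995)."""
    d = abs(delta_bic)
    if d < 2:
        return "not worth more than a bare mention"
    if d < 6:
        return "positive"
    if d <= 10:
        return "strong"
    return "very strong"

def approx_bayes_factor(delta_bic):
    """Large-sample approximation: Bayes factor ~ exp(delta BIC / 2)."""
    return math.exp(delta_bic / 2.0)

# BIC differences reported in the text
print(kass_raftery_label(8.89))    # 102-model vs. 120-model
print(kass_raftery_label(12.15))   # penalty for the three-breakpoint model
print(round(approx_bayes_factor(8.89), 1))
```

On this scale, 8.89 falls in the "strong" band (6 to 10), corresponding to an approximate Bayes factor of roughly 85 in favor of the 102-model. The 12.15-point increase for the three-breakpoint specification falls in the "very strong" band, here counting against the more complex model.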
The present study’s analytical precision provides more defensible estimates of threshold location and sharpness. Moreover, the large sample size carries implications for both precision and generalizability. Meta-analytic evidence shows that intelligence–creativity correlations vary substantially across studies, with sample size and measurement precision accounting for significant variance in effect estimates (Karwowski, 2021). By pooling data across developmental stages, this investigation captures the intelligence–creativity relationship across its natural range of variation. Analytically, including age and grade as covariates is a more rigorous practice than simple bivariate correlation, and reporting these choices transparently models good practice in developmental research.
4.2. Hypothetical Cultural Influences on the Threshold
What merits substantial emphasis, however, is the fundamental ambiguity embedded in our and others’ cross-cultural findings. Differences in detected thresholds across cultural contexts could reflect genuine cultural variations in how intelligence supports creativity or could reflect measurement interpretation effects and sampling artifacts that masquerade as cultural differences. This interpretive uncertainty requires far more cautious framing than the literature has typically provided.
4.2.1. Cultural Differences in Cognitive Cautiousness
The empirical patterns deserve attention first. Repeykova et al. (2025) detected distinct breakpoints at 128 (Russia) and 100 (UAE), which differ from each other and, in the Russian case, substantially from the Hong Kong-derived figure of 102. Superficially, these variations could suggest cultural calibration, under the hypothesis that threshold mechanisms are neuropsychologically universal yet culturally modulated. However, several recent methodological critiques complicate this straightforward interpretation, particularly with regard to the instruments we used to assess intelligence and creativity.
Despite the reputation of Raven’s Matrices as a culture-fair test, whether it functions identically across cultural contexts remains a concern. Gonthier (2022) provides the most comprehensive recent synthesis, examining the solution processes involved in Raven’s Matrices and challenging the test’s “culture-fair” assumption. Gonthier (2022) pointed out that several assumptions underlying the test are not culture-fair and make it largely impossible to draw clear-cut conclusions from average score differences between ethnic groups. Critically, these cultural assumptions emerge not from obvious language-based bias but from deep differences in how people from different cultural backgrounds perceive, organize, and manipulate visual-spatial information.
For Hong Kong respondents specifically, this creates interpretive complications. Confucian educational traditions emphasize analytical precision, pattern recognition within established frameworks, and cautious hypothesis formation, cognitive styles well suited to matrix completion tasks but potentially quite different from the exploratory, risk-tolerant approaches more prevalent in Western contexts. A Hong Kong respondent raised in such traditions may approach Raven’s Matrices by meticulously analyzing each element, searching for clear logical principles, and potentially arriving at accurate conclusions efficiently. However, if cultural background shapes confidence in pattern identification and willingness to commit to answers, then what appears as an “intelligence threshold” could partially reflect cultural differences in cognitive cautiousness rather than fundamental limits on reasoning capacity. If East Asian respondents employ more conservative pattern-verification strategies than Western respondents, they might produce different score distributions on Raven’s Matrices not because of inferior reasoning, but because of culturally shaped confidence calibration.
4.2.2. Sociocultural Factors of Expression Willingness in Creativity
Similarly, the assessment tool for creativity raises another concern. Ivancovsky et al. (2018), comparing Israeli and South Korean respondents on AUTs, found that Israelis scored higher than Koreans. They suggested that cross-cultural differences in creativity might be explained by variations in inhibitory control, that is, cultural differences in willingness to express ideas freely, particularly unconventional or socially risky ideas. This finding has direct implications for our threshold analysis. If creativity differences reflect, at least partially, not creative capacity but willingness to express creative ideas, then divergence between cultural groups might stem from sociocultural factors rather than cognitive ones. Hong Kong’s specific cultural position could produce particular patterns of idea-expression inhibition. A respondent might possess substantial creative ideation capacity but generate fewer responses simply because cultural norms around appropriateness, social harmony, and respectful self-presentation discourage uninhibited idea generation. If this cultural modulation of expression operates differentially across intelligence levels (for example, if higher-intelligence individuals develop metacognitive strategies to override cultural inhibition whilst lower-intelligence individuals remain bound by cultural constraints), then apparent intelligence–creativity thresholds could emerge as statistical artifacts of measurement interpretation rather than genuine cognitive inflection points.
Taken together, these concerns suggest that apparent cross-cultural differences in threshold locations could plausibly arise from methodological factors rather than reflecting genuine cultural calibration of a hypothesized universal psychological mechanism. Our detected threshold at 102 may represent either: (a) a genuine feature of how intelligence and creativity relate in Hong Kong’s cultural context, or (b) an artifact of how Hong Kong students interpret Raven’s Matrices through culturally trained cognitive filters, combined with how they modulate creative idea expression based on cultural norms around appropriateness and self-presentation.
We cannot adjudicate definitively between these interpretations with the present evidence. The convergence of the Hong Kong 102-figure with Repeykova et al.’s UAE breakpoint of 100 is intriguing and might suggest genuine cross-cultural patterns. Yet this convergence could equally reflect non-Western contexts responding to culture-biased intelligence measures and facing cultural norms about creative expression. The Russian breakpoint at 128 might represent either a genuinely different cultural calibration or simply a different pattern of how Russian respondents approach Raven’s tasks and express creative ideas. Without explicit measurement invariance testing of Raven’s tasks across cultures, and explicit modeling of response-style effects on AUT fluency, claims about cultural calibration of universal thresholds remain speculative.
For educators, this implies that threshold-based screening for gifted programs carries hidden cultural baggage. If the threshold holds for a predominantly Western-educated or individualistic sample, applying it uniformly across culturally diverse classrooms risks systematically under-identifying creative potential in students whose cultural backgrounds emphasize different forms of creative expression or whose intelligence is expressed through alternative cognitive modalities (Sternberg & Grigorenko, 2004). Future cross-cultural validation studies comparing threshold locations across societies with differing educational philosophies and values are encouraged to illuminate whether the threshold is truly invariant or culturally relative.
Educational implications also extend to instructional design and creativity nurturing. If the threshold is partly a cultural artifact rather than a hard cognitive ceiling, educators can potentially transcend it through culturally congruent pedagogies. For instance, schools emphasizing creative conformity may artificially suppress creativity in high-intelligence students by penalizing nonconformity; conversely, schools lacking intellectual scaffolding may fail to develop creative expression in students with emerging cognitive abilities. The data-driven precision of this study’s approach creates a platform for investigating such contextual moderation. Future work examining whether threshold patterns shift under different educational conditions, reward structures, or cultural framings would advance both theory and practice. This connects to broader literature on how teaching for creativity remains surprisingly contingent; creativity instruction works best when aligned with students’ and communities’ cultural values and epistemologies (Beghetto & Anderson, 2022).