1. Introduction
Creativity has become a central goal in contemporary mathematics education, as learners are increasingly expected to generate ideas flexibly, construct and justify non-standard solutions, and engage productively with open-ended mathematical tasks (
Silver, 1997;
Leikin, 2009). A growing body of research has examined how specific task types—such as Multiple Solution Tasks (MSTs), Mathematical Modelling Tasks, and Fermi problems—can support students’ creative engagement. Nevertheless, judgements about the extent to which a given task has the potential to elicit creativity continue to rely largely on teachers’ professional experience and intuitive decision-making (
Vanlommel et al., 2017).
There is a lack of theoretically grounded frameworks that enable educators and researchers to compare mathematical tasks a priori in terms of their creativity-eliciting potential. Existing perspectives, such as cognitive demand (
Stein et al., 1996,
2000), are highly valuable for describing the complexity of mathematical work required by a task, but they foreground a different analytical dimension from the one at issue here: the breadth of the creative exploration space that a task structure affords, that is, the extent to which a task invites multiple assumptions, representations, and solution pathways before learners’ actual performance is observed. To address this gap, the present study proposes Creativity-Eliciting Potential Entropy (CEPE), an entropy-inspired theoretical indicator designed to capture the structural openness of mathematical tasks in relation to such creative exploration. CEPE is not proposed as a replacement for established perspectives such as cognitive demand; rather, it is intended as a complementary lens. This study focuses on three task types that the literature describes as having the potential to foster students’ creative mathematical thinking: MSTs (
Leikin, 2009), mathematical modelling tasks (
Lu & Kaiser, 2022), and Fermi problems (
Okamoto et al., 2023). It compares task rankings based on CEPE with teachers’ evaluations and examines to what extent CEPE complements teachers’ intuitive judgements about the potential of mathematical tasks to elicit creativity.
2. Literature Review
2.1. Framework
In this section, CEPE is developed step-by-step from existing research rather than being introduced as an isolated construct. We begin by specifying the understanding of creativity that is most relevant to the present study, with particular attention to idea generation and openness in mathematical activity. On this basis, we review three task types that have repeatedly been associated in prior research with creative mathematical engagement: MSTs, mathematical modelling tasks, and Fermi problems. We then connect these task types to broader research on the properties of mathematical tasks, especially openness in terms of start, process, and goal, before introducing entropy as a conceptual analogy for describing the diversity of possible pathways embedded in task structure. This sequence provides the basis for the theoretical proposal of CEPE presented at the end of the section.
2.1.1. Creativity
Creativity is commonly defined as the ability to generate ideas that are both novel and appropriate within a given context (
Treffinger, 2011;
Runco & Jaeger, 2012). Early studies conceptualised creativity through the lens of divergent thinking (
Guilford, 1950), while
Torrance (
1988) operationalised it using indicators such as fluency, flexibility, originality, and elaboration. Here, fluency refers to the ability to generate multiple ideas, flexibility to the ability to produce ideas across different categories or perspectives, originality to the production of novel or uncommon ideas, and elaboration to the ability to develop and refine ideas in detail. More recent cognitive perspectives describe creativity as an adaptive process that involves navigating between exploratory (divergent) and convergent modes of thinking, balancing variability and selectivity in idea generation (
Beaty et al., 2014). Contemporary research continues to expand this view. For example, recent work argues that creativity should be understood as a process involving iterative idea generation, evaluation, and refinement, rather than solely in terms of products or traits (
Green et al., 2024). Moreover, learning-science research highlights that creativity does not occur in isolation, but depends on interactions among individual cognition, prior knowledge, motivation, and contextual factors, indicating the multifaceted and dynamic nature of creativity in real-world settings (
Paaßen et al., 2022).
For the purposes of the present study, creativity is approached primarily in terms of idea generation and model generation in support of problem solving within structured mathematical activity, with particular attention to fluency and flexibility. This focus does not deny that creativity also involves evaluation, refinement, and contextual appropriateness; rather, it reflects the specific aim of the study, which is to examine the creativity-eliciting potential of task structures before learners’ actual performance is observed. Because CEPE is concerned with the extent to which a task affords diverse assumptions, representations, solution routes, and possible modelling approaches, a perspective on creativity that foregrounds divergent idea generation is especially appropriate here. This perspective is also consistent with empirical work suggesting that mathematical creativity is closely related to both divergent thinking and domain-specific mathematical competence (
Schoevers et al., 2020). In this sense, the study adopts a deliberately task-oriented and exploratory view of creativity, while acknowledging that fuller manifestations of mathematical creativity emerge through the interaction of idea generation, evaluation, and instructional context.
2.1.2. Creativity in Mathematics Education
Mathematics education research increasingly recognises creativity as an important competence (
Silver, 1997;
Leikin, 2009).
Mann (
2006) emphasised that creativity is the essence of mathematics, arguing that mathematical activity inherently involves making conjectures, identifying patterns, and generating original ideas. Recent empirical research further supports this perspective. For example,
Schoevers et al. (
2020) demonstrated that creative mathematical thinking is closely associated with students’ divergent thinking and mathematical ability, highlighting the cognitive foundations through which creativity contributes to mathematical performance.
International educational frameworks have also begun to incorporate aspects of creative thinking into assessments of students’ competencies. While creative thinking is treated as a separate domain in PISA 2022, the framework nevertheless acknowledges the importance of creative mathematical reasoning—particularly when students engage with non-routine or open-ended tasks that require flexible idea generation and justification (
OECD, 2024). Taken together, these perspectives suggest that cultivating creativity is increasingly seen as a meaningful and necessary goal in contemporary mathematics education.
2.1.3. Creativity in Multiple Solution Tasks (MSTs)
MSTs have been widely described as meaningful for supporting the development of mathematical creativity, as they invite learners to consider more than one possible approach to the same task (
Silver, 1997;
Leikin, 2009).
Leikin (
2009) noted that engaging with several solution paths may open opportunities for learners to broaden the space of mathematical ideas they activate. Empirical findings also point to potential learning benefits associated with MSTs. For example, the year-long implementation of MSTs in geometry reported by
Levav-Waynberg and Leikin (
2012) was accompanied by improvements in students’ connectedness of geometrical knowledge as well as in indicators such as fluency and flexibility. In addition,
Leikin (
2013) proposed a model for evaluating mathematical creativity in which multiplicity—the requirement to produce several solutions—serves as a basis for examining components such as fluency, flexibility, originality, and insight.
Recent studies provide further nuance to how MSTs may be related to creative mathematical performance.
Geitona et al. (
2022), for instance, examined geometry-based MSTs and found that the relationships among mathematical creativity, students’ visualisation processes, and their apprehension of geometrical figures varied depending on how the tasks were presented (e.g., with or without a supporting diagram). Taken together, these studies indicate that MSTs can provide conditions under which diverse reasoning processes may emerge, suggesting their relevance as a task type for research on mathematical creativity.
2.1.4. Creativity in Mathematical Modelling
Mathematical modelling is commonly described as a bidirectional process that connects real-world situations with mathematical representations, enabling learners to apply mathematical knowledge to address context-based tasks (
Blum & Borromeo Ferri, 2009;
Niss & Blum, 2020).
Blum and Leiss (
2007), for example, illustrated how learners translate contextual information—such as the speed, travel time, and height difference of a mountain cableway—into mathematical expressions and diagrams, followed by analysis and validation based on the constructed model. Such examples capture the typical structure of a modelling task, in which learners iteratively move between contextual understanding and mathematical formalisation.
Given its educational relevance, mathematical modelling has been incorporated into school curricula in various countries (
Stohlmann et al., 2015;
European Commission, 2019;
Vorhölter et al., 2019). Several studies have suggested that modelling may offer opportunities for learners to engage in creative processes (
Mann, 2006;
English & Watters, 2009;
Çiltaş, 2012;
Wessels, 2014). For instance,
Çiltaş (
2012) reported that preservice teachers participating in modelling activities showed increased creativity. At the same time, the conceptualisation of creativity within mathematical modelling remains a topic of ongoing discussion.
Wessels (
2014,
2017) and
Lu and Kaiser (
2021,
2022) have each proposed different sets of creativity-related components; however, a shared definition or framework has yet to emerge (
Okamoto et al., 2023). Moreover,
Lu and Kaiser (
2021,
2022) noted that certain types of modelling tasks may provide fewer opportunities for creative engagement, though little research has examined the creativity-eliciting potential of the specific task types investigated in the present study.
2.1.5. Creativity in Fermi Problems
Fermi problems constitute a particular form of mathematical modelling in which learners estimate quantities by making assumptions and approximations based on limited information (
Morrison, 1963;
Peter-Koop, 2004;
Ärlebäck & Albarracín, 2017). A classic example of a Fermi problem is “How many piano tuners are there in Chicago?” In such tasks, solvers typically decompose the overarching problem into subproblems (such as estimating population, ownership rates, or frequency of use) and formulate assumptions for each, eventually combining them into an overall estimate.
Because the initial conditions are intentionally underspecified, learners must draw on their prior knowledge and experience to supplement missing information and develop plausible assumptions (
Albarracín & Gorgorió, 2014;
Greefrath & Frenken, 2021). Owing to these characteristics, researchers have noted that Fermi problems may offer opportunities for creative thinking.
Silver (
1997), for instance, suggested that generating multiple ideas during Fermi estimation tasks can support fluency, and studies by
Goel and Singh (
1998) and
Ärlebäck and Bergsten (
2013) reported that the processes of formulating assumptions and estimations may foster aspects of creative reasoning.
More recent work has attempted to clarify creativity-related factors in Fermi problems. A distinctive feature of these studies is their focus on idea generation—specifically, on how learners reformulate or reinterpret the assumptions embedded within the problem (
Hartmann et al., 2019;
Albarracín & Ärlebäck, 2022;
Borys & Hartmann, 2022). However, consensus has yet to be reached regarding how such creativity-related factors should be defined or assessed (
Okamoto et al., 2023). Furthermore, although prior research has linked Fermi problems to creative reasoning processes, little work has examined the inherent creativity-eliciting potential of the specific task types addressed in the present study.
2.1.6. Properties of Mathematical Tasks
In mathematics education research, tasks are widely understood as the primary means through which the curriculum is enacted in classroom practice.
Doyle (
1983) characterised academic tasks as specifying the products students are expected to generate, the resources available to them, and the operations they are encouraged to perform. Through these specifications, tasks influence students’ information processing and shape the kinds of learning opportunities that become accessible to them.
Building on this perspective, Stein and colleagues (
Stein et al., 1996,
2000) proposed the Mathematical Tasks Framework, which offers an analytic lens for examining how tasks are represented in curricular materials, how they are set up by teachers, and how they are implemented during instruction and subsequently engaged with by students. This framework does not treat tasks as fixed entities; rather, it highlights that the learning opportunities a task affords may shift depending on how it is introduced and enacted in the classroom.
Stein et al. (
1996,
2000) further noted that maintaining high levels of cognitive demand during instruction is challenging, as interactions in the classroom may unintentionally simplify the intended mathematical work.
Leuders (
2023) synthesises research on the characteristics of mathematical tasks and emphasises that tasks structure and guide students’ mathematical activity. In particular, he highlights openness as a key property of tasks and systematically classifies task types according to which components—start, process, or goal—are specified by the task and which are left to the learner (see
Table 1). Here, start refers to the givens, process to the solution pathway, and goal to the intended outcome. Openness depends on which of these are specified versus left open to students.
This classification demonstrates that tasks differ substantially in the degree of freedom they offer, ranging from routine exercises to problems with multiple solution paths, inverse problems (i.e., tasks where the goal is fixed, but the method—and sometimes the starting conditions—must be reconstructed, often by working backward), open-ended situations, and worked examples.
The three task types discussed above (MSTs, mathematical modelling tasks, and Fermi problems) were selected not only because prior research has associated them with creative mathematical engagement, but also because they differ in systematic ways with respect to their structural openness. In other words, they do not merely represent different curricular formats; they also exemplify different configurations of what is given to learners at the start of a task, what remains open during the process, and how tightly the goal is specified. For this reason, a more general perspective on the properties of mathematical tasks is needed to relate these task types to a common analytical framework. The research on task properties reviewed in this subsection, particularly on task openness, therefore serves as the basis for the later formulation of CEPE.
From a creativity-oriented perspective, such structural properties are highly significant because they influence the extent to which learners may activate diverse strategies, assumptions, or representations. However, existing frameworks generally describe these properties qualitatively, and few approaches provide quantitative indicators of the breadth, diversity, or dispersion inherent in a task’s potential solution space.
This lack of quantitative conceptualisation motivates the proposal of CEPE in the present study as a means of capturing, in a systematic manner, the creativity-related potential embedded within mathematical tasks.
2.1.7. Task Selection by Teachers
Prior studies have shown that when selecting mathematical tasks aimed at fostering creativity, teachers tend to rely heavily on their personal experience and intuition.
Vanlommel et al. (
2017) reported that instructional decisions are often guided more by tacit knowledge and experiential judgement than by explicit data. Similarly,
Murtafiah et al. (
2020) found that experienced teachers integrate their instructional history with content knowledge and evaluate tasks intuitively, often based on recollections of prior classroom practices.
However, such judgements are inherently subjective, and concerns regarding their consistency and validity have been raised. In mathematics education, teachers’ conceptions of mathematical creativity have been shown to vary according to their educational background and cultural context (
Leikin et al., 2013). More broadly, research on teachers’ beliefs about creativity suggests that teachers often differ substantially in how they recognise, value, and support creativity in educational settings (
Bereczki & Kárpáti, 2018). Recent systematic review work in mathematics education likewise indicates that mathematical creativity is still conceptualised through multiple notions and theoretical traditions, making it difficult to rely on a single, shared evaluative perspective when selecting tasks for their creativity-eliciting potential (
Joklitschke et al., 2022;
Sipahi & Bahar, 2025). Consequently, there is a risk that tasks with high potential for eliciting creativity may be underestimated, while others may be overvalued on the basis of individual perceptions and implicit criteria.
Even when teachers acknowledge the importance of nurturing students’ creative thinking, they often face considerable challenges in selecting and implementing appropriate tasks. Systemic pressures, such as curriculum pacing and standardised assessments, tend to favour procedural exercises over open-ended tasks (
Leikin et al., 2013). Additionally, limited expertise in teaching for creativity and adherence to traditional, teacher-centred instructional models may lead some educators to avoid using mathematical tasks with high creativity-eliciting potential, including open-ended tasks or MSTs (
Vale & Barbosa, 2024). From a practical standpoint, these tasks often generate unpredictable student responses and require teachers to demonstrate a high degree of pedagogical flexibility—something that not all teachers feel confident managing (
Lev-Zamir & Leikin, 2013).
Taken together, these findings suggest that although intuition- and experience-based judgements play a significant role in task selection, their accuracy and reliability are limited. Teachers’ preferences and previous successes may unconsciously introduce bias, potentially leading to task choices that misalign with the actual potential to foster creativity (
Vanlommel et al., 2017).
These limitations underscore the need for a more objective and theoretically grounded framework for describing the creativity-eliciting potential inherent in mathematical tasks. Importantly, the aim is not to replace teachers’ professional judgement, but to complement it by making more transparent the structural task features that may underlie such judgements. This need becomes even more pressing when the field itself contains multiple notions and evaluation traditions concerning mathematical creativity (
Joklitschke et al., 2022;
Sipahi & Bahar, 2025). Against this background, the following subsections introduce CEPE as a theory-driven lens for describing task openness and anticipating, a priori, the potential of mathematical tasks to elicit creative mathematical activity.
2.1.8. Entropy
Entropy, originally developed in physics and information theory (
Boltzmann, 1877;
Shannon, 1948), provides a general measure of diversity and uncertainty within a system. In educational and psychological research, entropy has been used not in its strict physical or engineering sense but as an indicator of the dispersion, variability, and openness of learning processes (
Koyama & Niwase, 2019;
Mai et al., 2023). These studies show that entropy can capture how widely learners explore available options, how stable their behaviours are over time, and how diverse the conditions of a learning environment can be.
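To make the idea of entropy as a measure of diversity concrete, the following minimal sketch computes Shannon's entropy, H = -Σ p_i log2 p_i, for a discrete distribution over options. The function name and the three example distributions are illustrative assumptions and are not taken from the studies cited above.

```python
import math


def shannon_entropy(probabilities):
    """Return the Shannon entropy (in bits) of a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing and are skipped to avoid log2(0).
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical distributions over the options a learner might choose.
one_option = [1.0]                      # no choice at all
four_equal = [0.25, 0.25, 0.25, 0.25]   # four equally likely options
four_skewed = [0.85, 0.05, 0.05, 0.05]  # four options, one dominant

print(shannon_entropy(one_option))   # 0.0
print(shannon_entropy(four_equal))   # 2.0
print(shannon_entropy(four_skewed))  # smaller than the uniform case
```

Entropy is zero when only one option exists, grows with the number of options, and is largest when the options are used evenly, which is the sense of "dispersion" referred to above.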
Although applications of entropy to creativity research remain limited, several authors have drawn connections between entropy and creative thinking. Creativity has been conceptualised as involving the exploration of a broad space of ideas (
Beaty et al., 2014) or as the adaptive regulation of uncertainty during idea generation (
Malaie et al., 2024). These perspectives align with the entropic notion that creativity involves navigating and managing a landscape of diverse possibilities.
Building on these interpretations, the present study uses entropy as a conceptual framework for describing the structural openness of mathematical tasks. We propose CEPE as an exploratory index that captures how many pathways, assumptions, or solution routes a task potentially affords. Rather than applying information-theoretic formulas in a strict sense, CEPE adapts the general idea of entropy—quantifying dispersion and diversity—to characterise task features that teachers often evaluate intuitively. This enables a more objective perspective on a task’s potential to elicit creative mathematical thinking.
2.1.9. A Proposal of Entropy in This Study
Building on the literature reviewed above, this study proposes CEPE as a theory-driven indicator for identifying, a priori, mathematical tasks that have the potential to elicit students’ creativity. CEPE is defined here as an indicator of the overall structural openness of a task with respect to solution strategies, assumptions, representations, and validation. Drawing on an entropy analogy, CEPE refers to the breadth and diversity of possible task-realisation states that a given task structure affords before learners’ actual performance is observed. In this sense, CEPE is concerned not with creativity as an outcome but with the extent to which the structure of a task makes creative mathematical activity possible.
In line with the theoretical argument developed in the preceding subsections, the present study focuses on three task types that have been discussed in prior research as potentially supportive of creative mathematical thinking: a geometrical proof task, a mathematical modelling task, and a Fermi problem. These task types were selected because they differ systematically in terms of task openness across start, process, and goal, while at the same time representing well-established forms of mathematical activity in the relevant research traditions (
Bicer et al., 2021;
Joklitschke et al., 2022;
Vale & Barbosa, 2024). The purpose of selecting these three tasks is therefore not to claim that they exhaust the full range of creativity-supporting tasks, but to provide analytically contrasting cases through which the theoretical usefulness of CEPE can be explored.
The three tasks analysed in this study were selected as representative examples of broader task types that have been discussed in prior research on mathematical creativity (
Vale & Barbosa, 2024;
Sipahi & Bahar, 2025). More specifically, the proof task was chosen in relation to research on MSTs, which has highlighted the role of multiple pathways in creative mathematical activity (
Leikin, 2009); the modelling task was chosen in relation to research emphasising the creativity-demanding character of mathematical modelling (
Lu & Kaiser, 2022); and the Fermi problem was chosen in relation to research identifying the generation of assumptions and intermediate models as central to creative engagement in such tasks (
Okamoto et al., 2023). Their selection was thus guided by both prior research on creativity-supporting task types and their theoretical relevance to the CEPE framework, rather than by an attempt to compare all possible school mathematics tasks.
The three tasks examined in this study, a geometrical proof task as a Multiple Solution Task (MST), a Mathematical Modelling Task, and a Fermi Problem, are as follows (
Figure 1):
Based on the task characteristics framework proposed by
Leuders (
2023), these tasks are categorised according to the classification shown in
Table 1. The following analysis explains how this task classification relates to the concept of CEPE.
Rather than treating CEPE as a fully operationalised quantitative entropy measure, the present study adopts it as a theory-driven predictive indicator of creative potential. Specifically, CEPE is preliminarily defined as a relative ranking determined through structured expert judgement. For the three tasks analysed in the next section, a hypothesised ordering of CEPE values is proposed and subsequently compared with teachers’ evaluations of each task’s potential to elicit creativity. This approach allows CEPE to function as an analytical lens for task selection without presupposing direct measurement of creativity. CEPE should therefore be understood as a theoretical predictor rather than a direct measure of creativity itself. While tasks with higher CEPE values are assumed to provide structural conditions that are more conducive to creative activity, such conditions do not guarantee creative outcomes. Instead, creativity emerges through complex interactions between task structure and factors such as learners’ prior knowledge, instructional design, and assessment criteria. From this perspective, CEPE conceptualises a task in terms of the openness of its start, process, and goal phases. Based on this framework, the three tasks are analysed below with reference to their classification in
Table 1, and their relative CEPE levels are discussed.
Task 1, the geometrical proof task as an MST, provides a clearly defined proposition and goal while allowing multiple proof strategies. The Process phase is therefore categorised as unknown, whereas the Start and Goal phases are given. Structurally, the task can be regarded as an Inverse Problem. However, the availability of multiple solution paths positions it between a Problem and an Inverse Problem. Its overall CEPE is thus classified as Medium-Low.
Task 2, the Mathematical Modelling Task, presents a real-world situation with a given context in the Start phase. The Process phase is open, as learners independently construct, revise, and validate models. The Goal phase allows variability depending on modelling assumptions and can therefore be regarded as known/unknown. Owing to this flexibility, Task 2 is classified as a Problem with a Medium-High level of CEPE.
Task 3, the Fermi Problem, provides only partial initial information and requires learners to generate their own assumptions and estimations. Both the Process and Goal phases remain open, with no single correct solution. Consequently, Task 3 corresponds to an Open Situation/Open-Ended Problem and exhibits a High level of CEPE, consistent with prior analyses of the branching and open structure of Fermi Problems.
Table 2 summarises the degree of openness across the three phases for each task and indicates their correspondence with the task types in
Table 1, together with their relative CEPE levels.
More generally, tasks with low structural openness—such as Closed Tasks, Routine Exercises, and Worked Problems—are characterised by low CEPE values. Problems and Inverse Problems exhibit greater openness, particularly in the Process phase, and are associated with medium levels of CEPE. Open Situations and Open-Ended Problems represent the highest degree of openness and therefore the highest CEPE.
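The classification described above can be written down as a small lookup structure from which the predicted CEPE ranking follows. This is only a sketch: the phase labels and CEPE levels are copied from the task descriptions in this section, whereas the five-step ordinal scale, the dictionary layout, and the function name are assumptions introduced for illustration. No numerical CEPE value is computed.

```python
# Ordinal CEPE levels, lowest to highest (the five-step scale is an assumption).
CEPE_LEVELS = ["Low", "Medium-Low", "Medium", "Medium-High", "High"]

# Phase openness and CEPE level of each task, as stated in the text.
TASKS = {
    "Task 1 (geometrical proof task, MST)": {
        "start": "given", "process": "unknown", "goal": "given",
        "cepe": "Medium-Low",
    },
    "Task 2 (mathematical modelling task)": {
        "start": "given", "process": "open", "goal": "known/unknown",
        "cepe": "Medium-High",
    },
    "Task 3 (Fermi problem)": {
        "start": "partial", "process": "open", "goal": "open",
        "cepe": "High",
    },
}


def predicted_ranking(tasks):
    """Order task names from highest to lowest CEPE level."""
    return sorted(
        tasks,
        key=lambda name: CEPE_LEVELS.index(tasks[name]["cepe"]),
        reverse=True,
    )


for rank, name in enumerate(predicted_ranking(TASKS), start=1):
    print(rank, name, TASKS[name]["cepe"])
```

Because the levels are ordinal, the structure supports ranking and comparison only; differences between levels carry no metric meaning.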
In summary, the creative potential of a task depends on its structural openness. Higher openness corresponds to a greater number of possible microscopic states—that is, a wider range of approaches learners may adopt to meet the task’s requirements. In this sense, CEPE can be interpreted, following
Boltzmann’s (
1877) notion of entropy, as an approximate indicator of the internal diversity embedded in a task’s structure.
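For reference, Boltzmann's relation in its standard modern form links the entropy S of a macroscopic state to the number W of microscopic states compatible with it. Reading W as the number of distinct ways in which a task can be realised is the analogy drawn in this study, not a measured quantity, and the subscripts A and B below are hypothetical labels for two tasks.

```latex
S = k_{B} \ln W,
\qquad
W_{A} < W_{B} \;\Longrightarrow\; S_{A} < S_{B}
```

Since the logarithm is strictly increasing, a task that affords more realisation states receives a higher entropy value, which is all the ordinal use of CEPE requires.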
Being able to identify tasks with high creative potential a priori is valuable for instructional design and curriculum development. While teachers’ intuitive judgements are often informative, CEPE offers an additional theoretical framework for systematically describing and comparing the creative affordances of mathematical tasks. Against this background, an important question remains as to whether, and to what extent, judgements based on CEPE align with teachers’ intuitive evaluations of mathematical tasks. Accordingly, the present study formulates the following research question.
3. Research Question
To what extent does the judgement based on the newly proposed CEPE align with teachers’ judgements regarding how strongly the three mathematical tasks (MST, Mathematical Modelling Task, and Fermi Problem) can elicit students’ creativity?
4. Method
This section describes the research procedure and the methods used for analysing the collected data.
4.1. Subjects
A total of 32 teachers participated in this study, including 12 Japanese and 20 German participants. The sample comprised 20 male and 12 female teachers. Participants ranged in age from 23 to 57 years (M = 35.7), and their teaching experience ranged from under 1 year to 28 years (M = 10.2 years). All participants were either current mathematics teachers or had prior experience teaching mathematics. All 12 Japanese participants held a mathematics teaching licence. The German participants held a teaching qualification in mathematics together with at least one additional subject. The participants also represented a range of school contexts. The Japanese participants were associated with elementary, lower secondary, and upper secondary education, whereas the German participants worked in school types such as Gymnasium, Realschule, and Gesamtschule, representing academically oriented, intermediate, and comprehensive secondary pathways within the German school system. They were recruited through institutional affiliations and research team networks. Data collection was conducted in a semi-structured interview format, administered either face-to-face or online. Participants evaluated the presented tasks based on the given prompts. The interviews were conducted in the participants’ native languages and were audio-recorded. Furthermore, the interviewer shared the same nationality and native language as the participants to ensure linguistic and cultural consistency in the data collection process.
4.2. Procedure
Participants ranked the three tasks according to their perceived potential to elicit creativity (1 = most creativity-eliciting, 3 = least creativity-eliciting). Tied rankings were not permitted; participants were required to assign distinct ranks (1st–3rd) to all tasks. For face-to-face interviews, the tasks were presented during the interview. For online interviews, the task materials were sent electronically in advance; some participants first examined the tasks during the interview, whereas others had already looked through them shortly beforehand. However, participants were not informed in advance about the specific task-comparison activity or the ranking procedure. They were only told beforehand that the interview concerned mathematics lessons. Participants were given sufficient time to examine the tasks, and no time limit was imposed. In addition, they were asked during the semi-structured interview to explain the reasons for their rankings, and their verbal responses were transcribed verbatim. The interview guide first elicited participants’ own understanding and definition of creativity in mathematics teaching, as well as its role in classroom practice, before asking them to compare the three tasks. Participants were then asked to rank the tasks from the most suitable to the least suitable for fostering creativity, to assign each task a score on a 1–10 scale, and to explain briefly the reasons for their evaluations. This procedure yielded not only quantitative ranking data but also qualitative data indicating which aspects of the task structure influenced their evaluations.
Furthermore, the authors established their own ranking of the three tasks based on CEPE. As described in the section “A Proposal of Entropy in this Study”, the predicted order was Task 3 (Fermi Problem) ranked first, Task 2 (Mathematical Modelling Task) ranked second, and Task 1 (geometrical proof task as an MST) ranked third.
4.3. Data Analysis
Statistical analyses were conducted to examine the degree of agreement between the researchers’ a priori rankings and the teachers’ evaluations. The a priori ranking was established by the authors on the basis of the theoretical analysis developed in Section 2, especially the proposed CEPE framework and the classification of task openness in terms of start, process, and goal. In this sense, the ranking was not treated as an independent coding exercise, but as a theory-driven analytical prediction derived from the conceptual framework of the study. First, for exact matches, the proportion of teachers whose complete rankings exactly matched the researchers’ predicted rankings was calculated. This observed proportion was then tested against the random baseline (1/6) using the exact binomial test. Similarly, for top matches, the proportion of teachers who ranked as first the same task that the researchers predicted to be first (the high-CEPE task) was calculated and tested against the random baseline (1/3) using the exact binomial test.
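The exact binomial tests described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: the function name `binom_upper_tail` is hypothetical, and the one-sided alternative (observed proportion greater than chance) is an assumption. That assumption appears consistent with the reported values, since it yields p ≈ 0.125 and p ≈ 0.066 for the Japanese subsample counts given in the Results section.

```python
from math import comb


def binom_upper_tail(successes: int, n: int, p0: float) -> float:
    """One-sided exact binomial p-value: P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Counts taken from the Results section: (label, matches, sample size, chance baseline).
# The one-sided ("greater") alternative is an assumption of this sketch.
reported_counts = [
    ("Exact match, all teachers", 11, 32, 1 / 6),
    ("Fermi Problem first, all teachers", 19, 32, 1 / 3),
    ("Exact match, Japan", 4, 12, 1 / 6),
    ("Fermi Problem first, Japan", 7, 12, 1 / 3),
]

for label, k, n, p0 in reported_counts:
    print(f"{label}: {k}/{n}, p = {binom_upper_tail(k, n, p0):.3f}")
```

The baselines follow from the design: three tasks ranked without ties admit 3! = 6 equally likely complete orders under random ranking (hence 1/6), and each task is equally likely to be placed first (hence 1/3).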
Next, to examine rank correlations, Kendall’s τ was computed between the researchers’ fixed rankings and each teacher’s rankings. The mean correlation coefficients and 95% confidence intervals were estimated using bootstrapping, and Kendall’s τ-b for the three-task case was also calculated. To assess consistency among teachers, Kendall’s coefficient of concordance (W) was computed based on the evaluation data of all 32 participants across the three tasks, and its significance was tested using the chi-square distribution with df = 2. Although the sample size is limited, exploratory analyses were also conducted separately by country (Japan and Germany); between-country differences in proportions were examined with Fisher’s exact test, and the difference in mean Kendall’s τ with a permutation test.
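The rank-correlation step can be illustrated as follows. This is a sketch under stated assumptions: the teacher rankings in `toy_teachers` are hypothetical and are not the study's data, the helper names are invented for illustration, and the percentile bootstrap with 5000 resamples is one common choice, since the text does not specify the bootstrap variant used. With three tasks and no ties, τ-a and τ-b coincide, and each teacher's τ can only take the values −1, −1/3, 1/3, or 1.

```python
import random
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau for two tie-free rankings of the same items."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_a)), 2):
        product = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(ranks_a) * (len(ranks_a) - 1) // 2
    return (concordant - discordant) / n_pairs


def bootstrap_mean_ci(values, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Ranks are listed in the order (Task 1 Proof, Task 2 Modelling, Task 3 Fermi).
predicted = (3, 2, 1)  # CEPE-based order: Fermi first, Modelling second, Proof third

# HYPOTHETICAL teacher rankings for illustration only; not the study's data.
toy_teachers = [(3, 2, 1), (3, 2, 1), (2, 3, 1), (3, 1, 2), (1, 2, 3), (2, 1, 3)]

taus = [kendall_tau(predicted, t) for t in toy_teachers]
mean_tau = sum(taus) / len(taus)
ci_low, ci_high = bootstrap_mean_ci(taus)
print(f"mean tau = {mean_tau:.3f}, 95% bootstrap CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the predicted ranking is fixed, resampling is done over teachers (that is, over their τ values), which treats teachers as the unit of analysis.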
5. Research Hypotheses
To address the research question, the following hypotheses were formulated to examine the relationship between the CEPE-based task ranking and teachers’ evaluations from four complementary perspectives: exact agreement with the predicted ranking, agreement on the top-ranked task, overall rank correlation, and consensus among teachers.
H1. (Exact Match): The proportion of teachers whose complete rankings (1st → 2nd → 3rd) match the researchers’ a priori rankings will be significantly higher than the random baseline (1/6).
H2. (Top Match): The proportion of teachers whose first-ranked task matches the researchers’ a priori first-ranked task (i.e., the high-CEPE task) will be significantly higher than the random baseline (1/3).
H3. (Rank Correlation): Kendall’s τ between the researchers’ rankings and the teachers’ rankings will be significantly positive (greater than 0).
H4. (Consensus): Kendall’s coefficient of concordance (W) among teachers will be significantly greater than 0, indicating a certain level of consensus across teachers.
6. Results
The results of the statistical analyses for the overall sample of Japanese and German participants are presented in this section. A certain degree of agreement was observed between the teachers’ evaluations and the prior predictions based on CEPE. Specifically, 11 out of 32 participants (34.4%) produced rankings that exactly matched the CEPE-predicted order (Fermi Problem > Mathematical Modelling > Proof). This proportion was significantly higher than the probability expected by chance (1/6 ≈ 16.7%), as confirmed by an exact binomial test (p = 0.012). This finding indicates that the task ranking predicted by CEPE corresponded, at least partially, with the teachers’ empirical judgements.
Furthermore, 19 out of 32 participants (59.4%) ranked the Fermi Problem as the most creativity-eliciting task, a proportion significantly greater than the chance level (1/3 ≈ 33.3%; p = 0.002). This suggests that teachers were more likely to perceive the Fermi Problem as having higher potential to elicit creativity compared with the other tasks.
In addition, Kendall’s rank correlation coefficient (τ) between the CEPE-predicted order and each participant’s ranking was calculated. The mean τ was 0.31, with a 95% bootstrap confidence interval of [0.10, 0.52]. This indicates a weak but consistent positive correlation between the two rankings, suggesting partial support for the theoretical prediction derived from CEPE. However, the effect size was within the small-to-moderate range, implying that individual differences among teachers’ evaluations still exist.
Finally, the degree of consensus among all participants was assessed using Kendall’s coefficient of concordance (W), yielding a value of W = 0.126, which was small but statistically significant (χ²(2) = 8.04, p = 0.018). This result suggests that while a weak yet statistically detectable common tendency existed among teachers’ evaluations, considerable variation remained. Hence, although teachers appear to share some understanding of “tasks that can elicit creativity”, the degree of consensus remains limited.
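The relationship between W, the chi-square statistic, and the p-value can be made concrete with a short sketch. The helper names and the toy rankings are hypothetical and are not the study's data. The formulas are the standard tie-free ones: W = 12S / (m²(k³ − k)) and χ² = m(k − 1)W, and for df = 2 the upper-tail probability of the chi-square distribution is exp(−χ²/2).

```python
from math import exp


def kendalls_w(rankings):
    """Kendall's W for m raters who each give a tie-free ranking of k items."""
    m, k = len(rankings), len(rankings[0])
    rank_sums = [sum(r[item] for r in rankings) for item in range(k)]
    mean_sum = m * (k + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m**2 * (k**3 - k))


def friedman_chi2(w, m, k):
    """Chi-square statistic associated with W: chi2 = m * (k - 1) * W."""
    return m * (k - 1) * w


def chi2_sf_df2(x):
    """Upper-tail probability of the chi-square distribution, valid for df = 2 only."""
    return exp(-x / 2)


# HYPOTHETICAL rankings for illustration only; not the study's data.
toy = [(3, 2, 1), (3, 2, 1), (2, 3, 1), (1, 2, 3)]
w = kendalls_w(toy)
chi2 = friedman_chi2(w, m=len(toy), k=3)
print(f"toy data: W = {w:.3f}, chi2(2) = {chi2:.2f}, p = {chi2_sf_df2(chi2):.3f}")

# Consistency check on the reported overall values (W = 0.126, m = 32, k = 3).
print(f"reported W -> chi2 = {friedman_chi2(0.126, 32, 3):.2f}")
print(f"reported chi2 = 8.04 -> p = {chi2_sf_df2(8.04):.3f}")
```

Applied to the reported overall values, W = 0.126 with m = 32 gives χ² ≈ 8.06, close to the reported 8.04; the small gap is presumably due to rounding of W. The reported χ² = 8.04 gives p ≈ 0.018, in line with the text.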
These results are visually illustrated in Figure 2 (heatmap) and Figure 3 (stacked bar chart). In Figure 2, the overall distribution of rankings across the three tasks is represented by colour intensity, clearly showing that the Fermi Problem was most frequently selected as the top-ranked task. Figure 3 visualises the proportion of each rank assigned to each task, revealing characteristic patterns in teachers’ evaluations, for example the relatively high proportion of the Proof Task being placed in the third rank.
Subsequently, similar analyses were conducted separately by country. In both countries, a consistent tendency was observed: teachers tended to assign the Fermi Problem the highest rank (Rank 1), aligning with the CEPE-predicted order (Fermi Problem > Mathematical Modelling Task > Proof Task). However, differences were observed in the statistical strength of these tendencies (see Figure 4 and Figure 5).
In Germany (n = 20), seven participants (35.0%) produced rankings that perfectly matched the order predicted by CEPE. This proportion was significantly higher than the chance level (1/6 ≈ 16.7%), as confirmed by an exact binomial test (p = 0.037). In addition, twelve participants (60.0%) ranked the Fermi Problem first, which was significantly greater than the chance level (1/3 ≈ 33.3%; p = 0.013). The mean rank correlation with the researchers’ predicted order was Kendall’s τ = 0.333, with a bootstrap 95% confidence interval of [0.067, 0.583], lying in the positive range. The coefficient of concordance among participants was W = 0.153, χ²(2) = 6.10, p = 0.047, indicating a small but statistically significant level of agreement. These results suggest that among German participants, evaluations tended to align with the CEPE-based assumption that higher entropy corresponds to greater creativity potential, and that a weak yet consistent collective consensus was present within the group.
In Japan (n = 12), four participants (33.3%) showed perfect agreement with the CEPE-predicted order. However, this result did not reach statistical significance (p = 0.125). Seven participants (58.3%) ranked the Fermi Problem first, exceeding the chance level (1/3) but narrowly missing statistical significance (p = 0.066). The mean Kendall’s τ was 0.278 with a 95% confidence interval of [−0.111, 0.611], indicating a positive point estimate but with substantial uncertainty. The coefficient of concordance was W = 0.090, χ²(2) = 2.16, p = 0.338, showing no statistically significant agreement among participants. Overall, Japanese participants exhibited a similar directional tendency, but the relatively small sample size (n = 12) resulted in greater uncertainty in the estimates.
Between-country comparisons revealed no significant differences in the rate of perfect agreement (Germany − Japan = +1.7 percentage points) or in the proportion of participants ranking the Fermi Problem first (Germany − Japan = +1.7 percentage points), as tested by Fisher’s exact test (both p = 1.000). The difference in mean Kendall’s τ (Germany − Japan = +0.056) was not significant according to the permutation test (p = 0.831). Therefore, the overall evaluation pattern appeared consistent across both countries, suggesting that the CEPE-predicted directionality was broadly shared across national contexts.
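The between-country comparison of proportions can be reproduced from the counts reported above. The sketch below is illustrative only: the function name is hypothetical, and it assumes the common two-sided definition of Fisher's exact test, which sums the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights, kept as exact integers.
    weight = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    extreme = sum(w for w in weight.values() if w <= weight[a])
    return extreme / comb(row1 + row2, col1)


# Counts from the Results section; rows = country, columns = (yes, no).
exact_match = fisher_exact_two_sided(7, 13, 4, 8)  # Germany 7/20, Japan 4/12
fermi_first = fisher_exact_two_sided(12, 8, 7, 5)  # Germany 12/20, Japan 7/12
print(f"exact match: p = {exact_match:.3f}")
print(f"Fermi first: p = {fermi_first:.3f}")
```

In both tables the observed split is the single most probable one given the margins, so every alternative table counts as at least as extreme and the two-sided p-value equals 1.000, matching the values reported above.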
7. Discussion
The results revealed that the overall level of agreement among teachers when ranking the three mathematical tasks was small but statistically significant (Kendall’s W = 0.126, p = 0.018). This indicates the presence of a weak yet systematic consensus regarding the relative creativity-eliciting potential of the tasks. In particular, the tendency to rank the Fermi Problem highest and the Proof Task lowest suggests that teachers shared a general directional understanding of which task structures are more likely to foster creativity.
At the same time, the modest magnitude of Kendall’s W highlights substantial variability in teachers’ judgements. This pattern is consistent with prior research demonstrating that teachers do not hold a single, unified conception of creativity. Instead, teachers rely on diverse and often implicit beliefs shaped by personal experience, pedagogical values, and instructional contexts (
Bereczki & Kárpáti, 2018). Such heterogeneity in creativity beliefs naturally limits the degree of inter-rater agreement, even when teachers evaluate the same set of tasks.
Importantly, similar patterns were observed across Japanese and German participants, and no statistically significant between-country differences were found. Given the small subsamples, this should be read cautiously, but it suggests that the limited consensus is unlikely to be attributable primarily to cultural or educational system differences. Rather, it more plausibly reflects individual variability in how creativity in mathematics is conceptualised and operationalised by teachers.
Despite the overall variability, teachers’ rankings showed a statistically significant alignment with the a priori order predicted by CEPE. Approximately one third of participants produced rankings that matched the CEPE-predicted order exactly, a proportion significantly exceeding the chance level. Moreover, the Fermi Problem—predicted by CEPE to have the highest creativity-eliciting potential—was ranked first by a clear majority of teachers.
These findings suggest that CEPE captures structural characteristics of tasks that resonate with teachers’ intuitive evaluations. This interpretation aligns with research in mathematics education showing that tasks allowing multiple solution strategies, representations, and assumptions are particularly conducive to mathematical creativity (
Levav-Waynberg & Leikin, 2012). From this perspective, CEPE functions as an indicator of the affordances for creativity embedded in task design, rather than as a predictor of actual creative performance.
However, the modest magnitude of Kendall’s τ (mean τ = 0.31) indicates that CEPE does not fully account for teachers’ judgements. Rather than undermining the framework, this result suggests that CEPE should be interpreted as a supportive theoretical lens that explains part, but not all, of the variance in teachers’ evaluations.
Interview responses help to contextualise why teachers’ rankings aligned only partially with the CEPE-based prediction. Teachers who ranked the Fermi problem highly often emphasised that it was closely connected to everyday life, accessible to a broad range of learners, and open to multiple lines of thought. In this sense, the Fermi problem was frequently perceived not only as structurally open, but also as broadly accessible in classroom settings. This interpretation is consistent with prior research suggesting that Fermi problems can support the generation of assumptions, estimations, and multiple lines of reasoning, while also drawing on learners’ everyday knowledge and experience (
Albarracín & Gorgorió, 2014;
Greefrath & Frenken, 2021;
Okamoto et al., 2023). At the same time, some teachers suggested that this very openness can make such tasks harder to position within a tightly defined instructional sequence, because the diversity of possible responses makes them less suitable as a direct application immediately following a specific unit of content.
By contrast, teachers who ranked the geometrical proof task lower often did so not because they denied its creative potential in principle, but because they regarded it as strongly dependent on learner characteristics, such as prior knowledge, year level, and mathematical confidence. Several participants suggested that proof tasks may indeed elicit creativity for mathematically strong learners, especially when multiple solution pathways are available, but may be less accessible for many other learners because rigour tends to take precedence over openness. This interpretation is in line with research on MSTs, which has shown that such tasks can foster fluency, flexibility, and connected mathematical thinking, while at the same time presupposing access to sufficiently developed mathematical resources (
Leikin, 2009;
Levav-Waynberg & Leikin, 2012). Thus, CEPE-based openness alone did not fully determine teachers’ judgements: teachers were also considering for whom a task is likely to become creative in practice.
The modelling task was frequently positioned between the other two tasks. Teachers often described it as offering a productive balance: it was seen as broader and more realistic than the proof task, yet more guided and manageable than the Fermi problem. In particular, participants noted that modelling tasks allow multiple solutions and interpretations, while still providing a degree of direction through the task setting. This view is broadly consistent with prior research suggesting that mathematical modelling can function as a creativity-demanding activity while also remaining closely connected to curricular aims and classroom practice (
Lu & Kaiser, 2022;
Vale & Barbosa, 2024). At the same time, interview responses suggested that instructional constraints were relevant across all three task types. Teachers indicated that creativity-oriented tasks often require more instructional time and are difficult to assess, both because evaluating students’ creative development is demanding and because open-ended responses themselves are harder to evaluate consistently. They also noted that greater openness tends to generate a wider range of student responses, which can make classroom management more difficult. Taken together, these interview-based considerations suggest that teachers’ judgements reflected not only structural openness but also the perceived fit between task demands, learner readiness, assessment demands, and classroom manageability. This interpretation is consistent with broader research showing that teachers’ beliefs about creativity are strongly context dependent and that the assessment of mathematical creativity and open-ended mathematical work remains methodologically challenging (
Bereczki & Kárpáti, 2018;
Sipahi & Bahar, 2025).
CEPE conceptualises task openness through an entropy analogy, suggesting that creativity-eliciting potential may be associated not with maximal openness, but with a more balanced form of uncertainty that still allows productive exploration. This interpretation is broadly consistent with psychological research indicating that moderate uncertainty can support exploratory engagement, whereas excessive uncertainty may impede action (
Hirsh et al., 2012).
The preceding discussion suggests that creativity-eliciting potential cannot be fully explained by a purely dichotomous understanding of openness. Rather, both the quantitative results and the interview data indicate that creativity may be most strongly supported when tasks provide a balanced degree of openness, while excessive openness may hinder creative engagement by leaving learners uncertain about how to proceed. From this perspective, the present findings do not yet justify a definitive reformulation of CEPE, but they do suggest the value of exploring whether different degrees of openness might be represented more explicitly within the framework.
On this basis, we retain only a cautious conceptual implication. The present findings suggest that creativity-eliciting potential may depend not simply on greater openness, but on a pedagogically productive balance of openness across task stages. In this sense, future research may explore whether CEPE can be refined to represent degrees of openness more explicitly. However, such an extension would require independent empirical and theoretical justification and is therefore beyond the scope of the present study.
8. Conclusions
This study set out to examine whether the newly proposed concept of Creativity-Eliciting Potential Entropy (CEPE) can serve as a theoretically grounded indicator for identifying mathematical tasks that have the potential to elicit students’ creativity, and to investigate the extent to which CEPE aligns with teachers’ intuitive evaluations of such tasks.
By comparing teachers’ rankings of three representative task types—a geometrical proof task, a mathematical modelling task, and a Fermi problem—with a priori rankings derived from CEPE, the study yielded three main findings. Firstly, teachers’ evaluations exhibited a weak but statistically significant consensus, indicating that a shared directional understanding of creativity-eliciting task characteristics exists, even though substantial individual variation remains. Secondly, teachers’ rankings showed a significant alignment with CEPE-based predictions, particularly in the tendency to evaluate the Fermi problem as the most creativity-eliciting task. This result suggests that CEPE captures structural task properties that resonate with teachers’ professional intuitions. Thirdly, discrepancies between CEPE-based predictions and teachers’ judgements were systematically related to learner-related factors and instructional constraints, highlighting that creativity in mathematics education emerges from the interaction between task structure, learner resources, and classroom context. This conclusion is consistent with prior research on MSTs, mathematical modelling, and Fermi problems, which has shown that the creativity-eliciting potential of tasks depends not only on openness as such, but also on how task structure interacts with learners’ resources and instructional conditions (
Leikin, 2009;
Levav-Waynberg & Leikin, 2012;
Lu & Kaiser, 2022).
Taken together, these findings indicate that CEPE should be understood neither as a direct measure of creativity nor as a prescriptive tool for task selection. Rather, CEPE functions as a theory-driven analytical lens that makes the degree of structural openness embedded in mathematical tasks explicit—an aspect that teachers often evaluate implicitly. In this sense, CEPE is best understood as a complementary theoretical lens rather than as a replacement for teachers’ professional judgement or for existing task-related frameworks. Accordingly, CEPE contributes to mathematics education research by offering a conceptual bridge between task-design theory and teachers’ experiential knowledge.
From a theoretical perspective, the study advances prior entropy-based approaches in education by shifting the focus from observed diversity (e.g., behavioural distributions or interaction patterns) to the potential diversity inherent in task structure itself. By interpreting openness in terms of an entropy analogy, CEPE provides a coherent framework for discussing why certain tasks afford a broader space of possible solution strategies, assumptions, and representations, thereby increasing opportunities for creative engagement. At the same time, the findings also suggest that creativity-eliciting potential may depend not simply on more openness, but on a pedagogically productive balance of openness across task stages.
To extend this framework, the study proposed a preliminary quantitative formulation—numerical CEPE (n-CEPE)—which models task openness probabilistically across the Start, Process, and Goal stages. Importantly, n-CEPE was not applied to the present dataset, as the empirical design focused on ordinal teacher judgements rather than metrically scaled task properties. Instead, n-CEPE is positioned here only as an exploratory methodological extension of CEPE, rather than as a validated measure established by the present study. In this sense, it should be regarded as a proposal for future refinement rather than as an analytic tool implemented in the present dataset.
In summary, this study suggests that CEPE provides a meaningful and theoretically grounded way to anticipate the creativity-eliciting potential of mathematical tasks a priori. While CEPE does not replace teachers’ professional judgement, it offers a systematic reference point that can support task selection, instructional design, and reflective teacher education. By clarifying the structural conditions under which creativity is more likely to emerge, CEPE contributes to ongoing efforts to make creativity a more transparent and discussable component of mathematics education. This contribution is particularly relevant in light of prior research showing that teachers’ beliefs about creativity are context dependent and that the assessment of mathematical creativity remains methodologically challenging (
Bereczki & Kárpáti, 2018;
Sipahi & Bahar, 2025).
However, several limitations of this study should be acknowledged. Firstly, the sample size was relatively small, comprising 32 teachers from Japan and Germany. Although sufficient for exploratory ordinal analyses, this limits the generalisability of the findings. In particular, factors such as teaching experience, grade levels taught, and familiarity with specific task types were not systematically controlled.
Secondly, although data on task openness and associated considerations were collected, they were intentionally not integrated into the quantitative analysis. The present study focused on ordinal comparisons between CEPE-based predictions and teachers’ rankings. Incorporating additional variables would have required metric scaling and weighting assumptions that could not yet be empirically justified, risking an unwarranted level of precision.
Thirdly, CEPE was examined indirectly through predicted task rankings rather than through direct numerical computation. As discussed, this choice reflects the exploratory nature of the study and the current lack of validated quantitative operationalisations of task openness.
Finally, the study did not examine learners’ actual creative performance. Consequently, no claims can be made regarding the direct impact of high-CEPE tasks on students’ creative outcomes.
Future research should therefore pursue larger and more diverse samples, refine the quantitative operationalisation of task openness, and apply n-CEPE in conjunction with empirical learner data. Comparative analyses linking CEPE values, teachers’ judgements, and students’ problem-solving behaviours will be particularly important for clarifying how structural task diversity translates into manifested creativity.