Article

Development of a Quantum Literacy Test for K-12 Students: An Extension of the Computational Thinking Framework

by Abdullahi Yusuf 1,2, Marcos Román-González 3,*, Noor Azean Atan 2, Santosh Kumar Behera 4 and Norah Md Noor 2

1 Department of Science Education, Sokoto State University, Sokoto 2134, Nigeria
2 Department of Advanced Learning Technology, Faculty of Educational Sciences and Technology, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
3 Faculty of Education, National Distance Education University (UNED), 28040 Madrid, Spain
4 Department of Education, Kazi Nazrul University, Asansol 713340, India
* Author to whom correspondence should be addressed.
Educ. Sci. 2026, 16(1), 31; https://doi.org/10.3390/educsci16010031
Submission received: 29 November 2025 / Revised: 18 December 2025 / Accepted: 22 December 2025 / Published: 26 December 2025
(This article belongs to the Special Issue Paving the Way for Quantum Education in K-12)

Abstract

As quantum technologies advance, there is growing international interest in integrating quantum concepts into secondary education. However, most K-12 quantum education studies rely on self-reported data or informal assessments lacking documented validity. This study addressed this gap by developing and validating the Quantum Literacy Test (QLt), a standardized instrument designed to objectively assess upper-secondary students’ understanding of foundational quantum concepts, practices, and perspectives. Grounded in the computational thinking (CT) framework, the QLt was piloted with 819 senior secondary school students in Nigeria and underwent a multi-phase validation process, including expert review, factor analysis, item-response modeling, differential item functioning analysis, and concurrent validity. The QLt demonstrated high internal consistency (α = 0.87) and structural validity. Strong concurrent validity was observed with the Computational Thinking Test (r = 0.65), and moderate validity was observed with a Spatial Ability Test (r = 0.32). However, machine learning models explained less than 40% of QLt score variance, suggesting the domain-specific nature of quantum literacy. We recommend that future research expand the QLt across diverse cultural contexts and increase item coverage of quantum practices and perspectives. The QLt offers a valuable tool for evaluating curriculum effectiveness and monitoring equity in quantum education, thereby contributing to a more inclusive quantum-ready workforce.

1. Introduction

Quantum computing (QC) is an emerging domain with the potential to transform numerous industries by solving problems that exceed the capacity of classical computing systems (J. C. Meyer et al., 2024; Rycerz et al., 2015). Rooted in the foundational principles of quantum mechanics (QM; Khan et al., 2023), the idea of QC was initially introduced by Richard Feynman in 1981, who suggested that quantum systems could be simulated effectively using quantum machines (Kandala et al., 2017). Despite early enthusiasm, the realization of practical quantum computers was delayed due to technical challenges, most notably the difficulty of stabilizing qubits. Progress accelerated in the early 2000s with contributions from industry leaders such as IBM, paving the way for the first quantum prototypes (Singh & Bhangu, 2023). Today, quantum computing is gaining momentum across various scientific disciplines, promising computational breakthroughs and spurring what scholars have termed the Second Quantum Revolution (Dowling & Milburn, 2003). This momentum is reflected in the growing emphasis on quantum education, particularly in quantum information science (QIS).
In response to this technological shift, numerous researchers and educational initiatives have begun introducing QC into university (e.g., Escanez-Exposito et al., 2025) and K-12 education (e.g., Holincheck et al., 2024; Hughes et al., 2022; Ivory et al., 2023; Tucker, 2023). These efforts are driven by a range of motivations, including the strategic importance of national quantum initiatives, the interdisciplinary nature and accessibility of quantum topics, the benefits of early exposure to complex scientific ideas, and the need to prepare students for future careers in the quantum workforce (Yusuf et al., 2025). At the K-12 and university levels, these studies have adopted diverse pedagogical approaches and tools to facilitate quantum learning and reported various educational outcomes from these interventions.
However, a common limitation across these studies is the reliance on self-reported or anecdotal learning outcomes following exposure to quantum content. To the best of our knowledge, only one standardized instrument has been developed in this domain: the PSEUDO-QM scale by Bondani et al. (2024). This instrument focuses on measuring pseudoscientific beliefs related to quantum mechanics, aiming to assess the prevalence of misconceptions and misinformation among high school students. While the PSEUDO-QM scale provides valuable insights into students’ beliefs and the influence of pseudoscientific narratives, it does not directly assess students’ conceptual understanding or literacy in QC. This gap highlights the need for robust assessment tools to capture the cognitive and conceptual gains achieved through quantum learning experiences (Nita et al., 2021). It is within this context that the present study introduces and validates the Quantum Literacy Test (QLt), designed to offer a reliable and standardized means of measuring quantum literacy among upper-secondary students.

2. Conceptual and Theoretical Foundations

The development of the QLt was grounded in recent theoretical proposals that extend computational thinking (CT) to quantum education (Román-González et al., 2025). This work outlines three stages through which the CT framework has evolved and expanded: (1) the rule-based stage, (2) the data-driven stage, and (3) the quantum stage. Below, we provide a concise overview of these stages and their extension into CT, culminating in the conceptual framework that guides the development of the QLt.
The first stage focuses on rule-based programming, broadly encompassing procedural programming and procedural thinking, as originally conceptualized by Papert (1980). Román-González et al. (2025) proposed that CT in this stage primarily follows a deductive process, where all rules and steps are explicitly defined beforehand. In other words, the rule-based concept is rooted in a deterministic philosophy, which assumes that a series of commands follows a clear, structured sequence guided by predefined rules and moves from general premises to specific statements. A practical application of CT in this stage is the use of visual block-based programming environments, such as Scratch, App Inventor, or Snap!, which help prevent syntax errors and support the development of computational thinking processes (Román-González et al., 2025).
The second stage of CT extension focuses on the construction and validation of artificial intelligence (AI) and machine learning (ML) models. Unlike the deterministic nature of CT in the rule-based stage, CT in the data-driven stage adopts a probabilistic approach, reflecting the iterative process of building and refining predictive or explanatory AI/ML models. In this stage, CT is probabilistic because AI/ML models are not programmed in the traditional sense but are progressively developed and refined through data collection, training, and modeling phases. Leading scholars have highlighted the potential for extending CT into this phase (Grover, 2024; Tedre, 2022). For instance, Tedre (2022) explores how ML technology challenges traditional CT paradigms in computing education. He argues that several classical CT principles need to be re-evaluated for this new phase, transitioning from foundational concepts like control structures and problem-solving workflows (CT 1.0) to advanced considerations such as correctness and the notational machine (CT 2.0). A practical application of CT at this stage is the use of prompting in generative AI tools to assist individuals in problem-solving and idea generation (Yusuf et al., 2024), which aligns with the core definition of CT as a convergent process that integrates human creativity with computational capabilities (Repenning & Grabowski, 2023, p. 1; Román-González et al., 2025, p. 5).
While the rule-based and data-driven stages belong to the classical computing paradigm, where information is encoded in binary form (0 or 1), the quantum-oriented stage draws on representations based on superposition, in which quantum states can be expressed as linear combinations of basis states (Román-González et al., 2025). Given the recognized role of CT in supporting creative and conceptual problem-solving, prior work has proposed extending CT frameworks to better align with quantum education contexts (Román-González et al., 2025). Compared with the deterministic and probabilistic reasoning emphasized in earlier stages, this proposed extension foregrounds a more holistic orientation. Building on Brennan and Resnick’s (2012) CT framework, such work has articulated CT in quantum contexts in terms of three components: quantum concepts (e.g., qubits, superposition, entanglement, quantum gates), quantum practices (e.g., seeking coherence and intentional collapse), and quantum perspectives (e.g., embracing uncertainty and recognizing unity within diversity) (Román-González et al., 2025; Xenakis et al., 2023). Synthesizing this review, Román-González et al. (2025) proposed a framework outlining how CT can be extended to quantum literacy (see Table 1).
The proposals for quantum CT in Stage 3 offer a conceptual and theoretical foundation for developing the QLt. In this study, we adopted the three interconnected components: quantum concepts, quantum practices, and quantum perspectives (see Figure 1). Quantum concepts (QC) refer broadly to the fundamental ideas individuals engage with as they seek to understand quantum computing. Quantum practices (QPr) encompass the behaviors and habits that emerge through interaction with these concepts. Finally, quantum perspectives (QPs) represent the views and opinions individuals form about the world and themselves regarding the quantum universe.

3. Design Principles

Building on the conceptual and theoretical foundations outlined above, we developed a 30-item Quantum Literacy Test (QLt). In addition to the quantum concepts proposed by Xenakis et al. (2023) and Román-González et al. (2025), we added the concepts of reversibility (Franklin et al., 2020), teleportation (Hughes et al., 2022), and algorithm (German et al., 2024). We propose that these are also part of the fundamental quantum concepts relevant to K-12 quantum education. Although the qubit is introduced instructionally as the foundational unit of quantum information, it was not retained as a standalone core literacy item in the QLt. During expert validation, the qubit was identified as a cross-cutting prerequisite that underpins multiple quantum concepts (particularly superposition), rather than as an independent literacy construct. Its exclusion was therefore intended to reduce content overlap and avoid redundancy within the item set.
The 30 items were distributed across QC (20 items, qc1 to qc20), QPr (3 items, qpr1 to qpr3), and QPs (7 items, qps1 to qps7). The test employs a multiple-choice format, with one correct response out of four options. Similarly to the Computational Thinking Test (CTt; Román-González, 2015), the QLt items incorporate three types of answer alternatives: visual blocks, visual arrows, and visual objects. In addition to assessing fundamental quantum literacy, each item requires specific cognitive tasks, such as prediction, sequencing, completion, and pattern recognition.
  • Prediction tasks require test-takers to forecast the likelihood of an event occurring.
  • Sequencing tasks involve following a sequence of events.
  • Completion tasks ask test-takers to finish a given task.
  • Pattern recognition tasks require identifying patterns based on a provided path.
These cognitive tasks are closely tied to the answer alternative styles. For instance, prediction tasks primarily use visual blocks, sequencing tasks often involve visual arrows, completion tasks combine visual blocks and visual objects, and pattern recognition tasks predominantly rely on visual blocks. While each item addresses a fundamental aspect of quantum literacy, it may also tap other latent dimensions. Figure 2, Figure 3, Figure 4 and Figure 5 provide examples of the items, illustrating their answer alternatives and required tasks. The complete items and detailed design principles can be accessed on the Open Science Framework1 (OSF) platform.

4. Participant Recruitment and Training

A sample of 853 senior secondary school students (Senior Secondary I and II) was recruited from ten schools (five public and five private). Prior to recruitment, approval was obtained from the Ministry of Basic and Secondary Education (Approval Number: MBSE/RP/2024/0412; see the Institutional Review Board Statement). Consent for participation was also obtained from the parents of each student. The participants’ demographic profiles covered a broad range of characteristics (see Table 2), including binary gender categories (female and male), grade level (SSI and SSII), career disciplines (STEM and non-STEM), disability status (mild and no disability), and age range (mean of 17.4 years). During the consent process, 34 students declined to participate, resulting in a final sample of 819 students.
After the recruitment, we conducted a two-month training program with the assistance of ten research assistants to introduce participants to quantum computing. The training was simple enough to match the participants’ present abilities and broad enough to introduce a wide range of quantum literacy. It was designed solely to support the development and validation of the QLt, rather than to evaluate or implement a formal curriculum. To minimize disruptions to classroom routines, training sessions were scheduled in coordination with each school. The program consisted of two modules, each lasting one month. In the first module, we introduced the concept of quantum computing, including a brief history, definition, and the fundamental principles. In the second module, the participants were exposed to different unplugged activities following the unplugged experiments implemented by previous researchers (e.g., Angara et al., 2020; Carreño et al., 2019; Franklin et al., 2020). Below, we discuss some of the unplugged activities implemented for each quantum concept.

4.1. Superposition

The concept of superposition was implemented to help participants understand how qubits (i.e., the fundamental units of quantum information) can be treated as existing in a combination of two states until measured. Following the recommendation of Jessica Pointing2, we used a doughnut to demonstrate this task (see Figure 6). We informed the participants that the sprinkled side of the doughnut represents |0〉 and the plain side represents |1〉. We spun the doughnut on a flat surface and informed the participants that the spinning doughnut was in a mixture of both sides at the same time, just like a qubit in superposition. We also informed the participants that they could not determine which side it would land on until it stopped spinning, at which point it collapsed into a definite state. Although no classical analogy perfectly captures this quantum phenomenon, Angara et al. (2020) suggest that using a doughnut as a qubit provides a memorable and intuitive visualization for students.

4.2. Entanglement

We also used doughnuts to demonstrate the concept of quantum entanglement. Two research assistants stood at opposite ends of the classroom, each holding a doughnut. Replicating the experiment by Angara et al. (2020), the two assistants simultaneously spun their doughnuts. We explained that this simultaneous spinning symbolizes the entangled state, where the properties of one doughnut are inherently linked to the other, regardless of the distance between them. To simulate measurement, one assistant stopped spinning their doughnut and revealed its side to the class. At the same moment, the second assistant also stopped spinning, ensuring that their doughnut displayed the opposite side. We explained that the display of opposite sides illustrates the key principle of entanglement, where the measurement of one particle instantly determines the state of the other, no matter how far apart they are (Duarte, 2022). After each iteration, participants were asked to observe and identify the displayed sides of both doughnuts, reinforcing their understanding of how entangled particles exhibit correlated outcomes.

4.3. Reversibility

The concept of quantum reversibility was implemented using a series of reversibility cards (see Figure 7; Franklin et al., 2020) that illustrate reversible and irreversible actions. Following the study by Franklin et al. (2020), we demonstrated that certain quantum operations, such as unitary transformations, are reversible, just as some of the real-life actions illustrated on the cards can be reversed. We concluded this section of the experiments by informing the participants that in quantum mechanics, reversible operations preserve information, whereas irreversible processes result in information loss.

4.4. Quantum Gates

To introduce students to quantum gates, we designed interactive sessions that allowed participants to grasp these fundamental concepts without requiring advanced technology. Following recent educational approaches (Liu et al., 2023), we implemented two key activities. First, we conducted a human quantum gates simulation, where students acted as qubits, embodying states |0〉 or |1〉, and physically moved to represent the application of quantum gates. For example, the Pauli-X gate (quantum NOT gate) was demonstrated by having students switch positions between |0〉 and |1〉, which illustrates the bit-flip operation. Similarly, the Hadamard gate was represented by students standing between |0〉 and |1〉, which signifies a superposition state.
To illustrate how certain gates can change the state of a qubit as it passes through the gate, we designed an activity using color-coded stickers that students physically step on. Each sticker represented a different quantum gate, and as students (qubits) stepped on each, they were instructed to change their state accordingly. For example, students in state |0〉 stepping on a Pauli-X sticker would step forward and announce they had flipped to |1〉. Second, we used Bloch sphere representation with balloons to help students visualize qubit states and the impact of quantum gates. Each student was given a balloon, on which they marked points to denote qubit states. Rotating the balloon represented the effect of different quantum gates, such as the Pauli-X and Hadamard gates.
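For readers who prefer a symbolic complement to these embodied activities, the two gates the students acted out can be written as 2 × 2 matrices and applied to the basis states. The base R sketch below is illustrative only and was not part of the classroom activities; it reproduces the bit-flip and superposition behaviors described above.

```r
# Basis states as column vectors: |0> = (1, 0), |1> = (0, 1).
ket0 <- c(1, 0)

# Pauli-X (quantum NOT): swaps the amplitudes of |0> and |1>.
X <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)

# Hadamard: sends |0> to an equal superposition of |0> and |1>.
H <- matrix(c(1,  1,
              1, -1), nrow = 2, byrow = TRUE) / sqrt(2)

X %*% ket0  # -> (0, 1): the "student" flips to |1>
H %*% ket0  # -> (0.707, 0.707): standing "between" |0> and |1>
```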

4.5. Quantum Algorithm

To illustrate Grover’s search algorithm, we used an unplugged classroom activity involving ten numbered balloons, one of which concealed a small colored ball (Figure 8). We first explained that a classical search would involve checking balloons one by one, resulting in up to ten attempts in the worst case. Students then participated in a quantum-inspired task conducted over several short rounds. In each round, students updated their expectations about the likely location of the ball based on previous outcomes and selected a small subset of balloons to check. Across successive rounds, students observed that the likelihood of identifying the correct balloon increased. The activity was designed to convey quadratic probability amplification, illustrating that the expected number of attempts scales with √N rather than N. This helped students understand how Grover’s algorithm improves search efficiency without guaranteeing immediate success.
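As a rough numerical illustration of the scaling the balloon activity conveys, the sketch below compares the number of classical checks with Grover's iteration count of approximately (π/4)√N for a single marked item. This is a back-of-the-envelope sketch of the counts, not a simulation of the algorithm itself.

```r
N <- 10                                       # number of balloons
classical_worst <- N                          # check every balloon
classical_avg   <- (N + 1) / 2                # expected checks in random order
grover_iters    <- ceiling(pi / 4 * sqrt(N))  # ~3 rounds for N = 10

c(classical_worst, classical_avg, grover_iters)  # 10, 5.5, 3
```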
To illustrate Shor’s algorithm, students engaged in an unplugged pattern-recognition task. Given a set of numbered balloons, they were asked to group and mark balloons that followed repeating numerical patterns (for example, counting in fixed steps such as every second or third number), instead of checking candidates individually. Based on the identified patterns, students then selected balloons to check until the hidden ball was found. We explained that Shor’s algorithm transforms factorization into a period-finding problem, in which identifying regularity in numerical relationships is central. At the algorithmic level, this period finding is enabled by the quantum Fourier transform (QFT), which makes the hidden periodic structure observable, although the present activity uses a high-level analogy rather than a direct implementation. In this way, students’ focus on detecting and using repeating patterns served as a conceptual analog of periodic structure detection, helping them distinguish Shor’s approach from classical trial-and-error factorization.
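The period-finding idea at the heart of this activity can also be illustrated with a small classical computation. The sketch below finds the period r of f(x) = a^x mod N for a = 2 and N = 15 and recovers the factors via greatest common divisors; on a quantum computer the QFT performs this period-finding step, so the sketch is only a conceptual analog.

```r
a <- 2; N <- 15
f <- sapply(1:12, function(x) (a^x) %% N)  # 2, 4, 8, 1, 2, 4, 8, 1, ...
r <- which(f == 1)[1]                      # period r = 4

# For even r, gcd(a^(r/2) - 1, N) and gcd(a^(r/2) + 1, N) give factors.
gcd <- function(p, q) if (q == 0) p else gcd(q, p %% q)
c(gcd(a^(r/2) - 1, N), gcd(a^(r/2) + 1, N))  # -> 3 5
```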
Our unplugged activities provided participants with a fundamental understanding of quantum literacy. While we acknowledge the limitations of our approach, we observed that the participants developed a strong sense of enthusiasm and curiosity about quantum computing. This curiosity led them to inquire about the practical applications of the concepts they had learned. To address their inquiries, we explained that quantum literacy serves as a foundation for understanding cutting-edge technologies such as quantum cryptography, quantum machine learning, and quantum simulations. For example, we explained that quantum cryptography helps keep messages safe, just like a secret code that only the right person can unlock. We informed them that banks and governments use this technology to protect important information from hackers. Another example we gave was traffic control. We asked the students to imagine a city where traffic jams always happen. Quantum computers can help find the best routes for cars, reducing traffic and saving time.

5. Expert Validation and Test Administration

5.1. Expert Validation

We submitted the 30 draft QLt items to ten subject-matter experts drawn from five countries: Algeria, Nigeria, India, Saudi Arabia, and China. Half of the reviewers specialize in quantum physics and half in quantum computing. Their feedback covered item wording, rule consistency, response formats, and domain balance. For instance, as the original pool contained 28 items (20 of which targeted quantum concepts), reviewers suggested adding two additional items to capture quantum practices, bringing the total to 30. Most recommendations were incorporated, yet one point remained unresolved. One of the experts recommended a change in the instruction for item qc6, which measures quantum entanglement using two “entangled cards.” The expert asked us to specify “which card the observer starts with” during the experiment. However, we argued that for an entangled pair there is no “designated first particle,” since which member is observed first is immaterial. Instead, both members of the pair are described by a single quantum state, and measurement on either one collapses that global state instantaneously. Overall, the expert panel judged the revised set highly suitable for classroom use, assigning a mean relevance rating of 4.53 on a 5-point scale (1 = poor, 5 = excellent).

5.2. Test Administration

Following the refinement of the item pool, we proceeded with administering our QLt to the 819 participants (see demographics in Table 2). Initially, we considered projecting the questions but realized that this approach could compromise “response independence,” potentially affecting the psychometric validity of the test. To preserve response independence, we opted for a paper-and-pencil format, allowing participants the flexibility to skip questions and answer at their own pace. The QLt had a 45 min time limit and was conducted under strict supervision to ensure test integrity. Below, we present the descriptive statistics of the test scores.
Descriptive statistics for the variables (see Table 3) indicate that the mean scores were moderate across all constructs. Quantum Concepts (QC) had a mean of 9.98, with a near-normal distribution as evidenced by a skewness of −0.08 and a moderate positive kurtosis of 1.47. Quantum Practices (QPr) recorded a mean of 1.57, with a skewness of −0.07 and a negative kurtosis of −0.92. Similarly, Quantum Perspectives (QPs) had a mean of 3.56, with minimal skewness (−0.02) and mild kurtosis (−0.21). The overall QLt scores showed a mean of 15.11, with a skewness of −0.11 and a kurtosis of 0.94, reflecting a nearly symmetric distribution with a modestly peaked shape. According to the literature, acceptable limits for a normal univariate distribution are [−2, +2] for skewness and [−7, +7] for kurtosis (El-Hamamsy et al., 2025). We plotted histograms and box plots to visualize normality and identify outliers across two categories: gender and grade level. The histograms depict a fairly normal distribution across categories with distinct patterns (see Figure 9), while the box plots indicate some outliers, particularly in the QLt scores of male SSII students (see Figure 10). These variations prompted us to examine statistical differences across the groups.
We conducted an independent-samples t-test and observed a significant difference across genders (t = −2.063, p = 0.039) but no significant difference across grade levels (see Table 4). An inspection of the mean scores shows that female participants (M = 15.41, SD = 4.28) had a slightly higher mean score than male participants (M = 14.79, SD = 4.12). However, when we compared the three dimensions of the QLt, we observed no gender difference on any of these dimensions (see Table 5). This suggests that while there may be a slight overall performance gap, males and females engage similarly across the specific dimensions measured.
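These descriptive and inferential checks follow standard R workflows. A minimal sketch is shown below, assuming a data frame qlt_data with a total-score column QLt and demographic columns gender and grade; the object and column names are placeholders, not our actual script.

```r
library(psych)

describe(qlt_data$QLt)                 # mean, SD, skewness, kurtosis
t.test(QLt ~ gender, data = qlt_data)  # independent-samples t-test (Welch)
t.test(QLt ~ grade,  data = qlt_data)  # grade-level comparison
```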

6. Psychometric Analysis

After collecting the responses, we conducted an initial psychometric validation of the test items using two complementary approaches: the Classical Test Theory (CTT) and the Item Response Theory (IRT) (Awopeju & Afolabi, 2016; De Champlain, 2010; El-Hamamsy et al., 2025). The IRT, CTT, and other analyses were performed in R (version 4.2.1, R Core Team, 2019) using the following packages: eRm (version 1.0-10, Mair & Hatzinger, 2007), mirt (version 1.44.0, Chalmers, 2012), ltm (version 1.2-0, Rizopoulos, 2006), EGAnet (version 2.3.0, Hudson & Alexander, 2025), lavaan (version 0.6-19, Rosseel, 2012), psych (version 2.5.3, Revelle, 2025), gbm (version 2.2.2, Ridgeway et al., 2024), caret (version 7.0-1, Kuhn, 2008), and e1071 (version 1.7-16, D. Meyer et al., 2024).

6.1. Classical Test Theory

CTT was employed to evaluate item difficulty, reliability, and discrimination. From the CTT results (see Table 6), the item difficulty indices ranged from 0.470 to 0.875, indicating a generally balanced level of difficulty, with most items falling near the ideal mid-range (around 0.50). Point-biserial correlation values, which reflect each item’s discrimination power, ranged from 0.207 to 0.391. The majority of items exceeded the acceptable threshold of 0.30, suggesting they reliably distinguished between higher- and lower-performing respondents. A few items (e.g., qc7, qc17, and qps7) showed lower discrimination values, which may warrant further review. The “drop alpha” values indicated that removing any individual item would not significantly improve the overall internal consistency. Reliability analysis showed strong internal consistency across the full scale (α = 0.87), as well as within subscales: quantum concepts (α = 0.84), quantum practices (α = 0.81), and quantum perspectives (α = 0.80), supporting the reliability of the instrument across its dimensions.
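The CTT indices above can be reproduced with the psych package already listed in Section 6. A minimal sketch, assuming an 819 × 30 matrix responses of dichotomously scored items (0/1); object names are illustrative.

```r
library(psych)

difficulty <- colMeans(responses)   # proportion correct = CTT item difficulty
alpha_out  <- psych::alpha(responses)

alpha_out$total$raw_alpha           # Cronbach's alpha for the full scale
alpha_out$alpha.drop                # alpha if each item is dropped
alpha_out$item.stats$r.drop         # corrected item-total discrimination
```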

6.2. Item Response Theory

Despite the utility of CTT, it is generally limited by its dependence on the specific sample and test used (Hambleton & Jones, 1993; DeVellis, 2006), and it does not allow for the separation of item properties from individual respondent characteristics (El-Hamamsy et al., 2025). In view of this, we additionally employed IRT, which offers a sample-independent validation approach based on the premise that individuals possess a stable underlying ability that consistently influences their responses, regardless of the specific test used (El-Hamamsy et al., 2025). By estimating the probability that a person at a particular ability level (expressed in standard deviation units) will answer each item correctly, IRT enhances the potential for findings to extend beyond a single group of respondents (Xie et al., 2019), thereby supporting measurement consistency across different populations (Dai et al., 2020; Jabrayilov et al., 2016). Based on the IRT, we first re-examined the item difficulty estimates. The following rules were adopted to interpret the difficulty estimates:
  • Very easy: <−2
  • Easy: −2 to −0.5
  • Moderate: −0.5 to +0.5
  • Difficult: +0.5 to +2
  • Very difficult: >+2
The item difficulty estimates (see Table 7) revealed that all items had difficulty values within a narrow and acceptable range, approximately between −0.15 and +0.16. This indicates that the instrument as a whole was well-targeted for the sample, with items neither excessively easy nor unduly difficult. Most items, such as qc5 (b = −0.022), qc14 (b = 0.044), and qps3 (b = −0.012), were centered close to zero, reflecting moderate difficulty. The most difficult item, qc15 (b = 0.163), and the easiest item, qps7 (b = −0.146), still fall within the ideal difficulty span, suggesting a balanced scale that can differentiate well across ability levels without skewing toward extremes. Furthermore, the relatively narrow confidence intervals indicate precise estimates, reinforcing the psychometric stability of the instrument for this population. Overall, the discrimination indices range from 0.3 to 0.5, with most items demonstrating moderate discrimination (around 0.3–0.4). Notably, qps6 (0.498) exhibits the highest discrimination, suggesting that it is particularly effective in distinguishing between high- and low-ability respondents.
We observed some discrepancies in item difficulty and discrimination between the CTT and the IRT models. Therefore, to confirm our analysis, we computed item fit statistics (see Table 8). Regarding overall fit, the p-values for all items are non-significant (p > 0.05), indicating that none show substantial misfit to the IRT model. The Outfit and Infit Mean Square (MSQ) values for most items fall within the acceptable range (0.8 to 1.2), confirming that the test items function well in differentiating respondents of varying abilities. However, qc17 (Outfit = 1.143, Infit = 1.096) shows slightly elevated values, suggesting some degree of unpredictability in responses to this item. Nevertheless, the overall results indicate that the test items fit the model well, with only minor deviations that do not significantly affect model validity.
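A condensed sketch of this model-fitting step using the mirt package; the object names are illustrative and the responses matrix is assumed from the CTT sketch above.

```r
library(mirt)

fit_1pl <- mirt(responses, model = 1, itemtype = "Rasch")  # 1-PL
fit_2pl <- mirt(responses, model = 1, itemtype = "2PL")    # 2-PL

coef(fit_2pl, IRTpars = TRUE, simplify = TRUE)  # a (discrimination), b (difficulty)
itemfit(fit_1pl, fit_stats = "infit")           # infit/outfit mean squares
```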
Following the evaluation of item fit statistics, the IRT analysis was extended through a series of informative visualizations, including the Item Characteristic Curves (ICC), Item Information Curves (IIC), and the Test Information Function (TIF). These were modeled across a range of latent ability levels to understand how items and the overall test function across different points on the ability scale. Several IRT models have been proposed for binary response data: the 1-Parameter Logistic (1PL) model, which assumes that only item difficulty varies; the 2-Parameter Logistic (2PL) model, which allows both item difficulty and discrimination to vary; the 3-Parameter Logistic (3PL) model, which adds a guessing parameter to account for the probability of low-ability respondents answering correctly by chance; and the 4-Parameter Logistic (4PL) model, which further incorporates the possibility that even high-ability individuals may occasionally respond incorrectly due to carelessness or ambiguity in item wording (El-Hamamsy et al., 2025). While we tested all four models, only the 1-PL and 2-PL models converged to stable solutions. Similarly to a previous study (see El-Hamamsy et al., 2025), we only presented the characteristics of 1-PL and 2-PL models for the reader.
In the ICC plots (see Figure 11), all items in the 1-PL model display somewhat similarly shaped S-curves, reflecting the model’s core assumption that item discrimination is held constant across all items. This means the curves differ only in their horizontal positioning, depending on item difficulty, but remain parallel and uniformly steep, indicating equal ability to distinguish between respondents of different proficiency levels (S. Lee & Bolt, 2018; Rosenbaum, 1987). However, in the 2-PL, this uniformity disappears. While several items continue to follow the familiar S-shaped curve, others deviate slightly in steepness or curvature, reflecting the introduction of item-specific discrimination parameters. According to El-Hamamsy et al. (2025), items with high discriminability have a steep ICC slope, while items with low discriminability have a gentle slope.
In the IIC plots (see Figure 12), the 1-PL model shows that all items exhibit classic bell-shaped curves that are somewhat clustered. This pattern reflects the model’s assumption that item discrimination is constant across all items. In contrast, the 2-PL model introduces item-specific discrimination parameters, resulting in a notable divergence among the IICs. While many items still retain a bell-like shape, their peak heights vary substantially. According to El-Hamamsy et al. (2025), items with lower discrimination parameters yield flatter and lower peaks, contributing less information overall. Thus, the gradual drop in peak heights across items in the 2-PL plot highlights differences in each item’s precision in estimating ability. This demonstrates greater flexibility and complexity by allowing discrimination to vary.
The final visualization presents the Test Information Function (TIF) alongside the Standard Error of Measurement (SEM). The TIF represents the total information provided by the test across different ability levels, obtained by aggregating the individual IICs (El-Hamamsy et al., 2025). Both the 1-PL and 2-PL models produce bell-shaped TIFs, indicating that the test is most informative near the average ability range (see Figure 13). However, the 2-PL model yields a significantly higher peak (about 9.5 on the y-axis) than the 1-PL (about 7.0 on the y-axis), demonstrating that the test provides greater information when item discriminations are permitted to vary. This enhanced information is reflected in a lower SEM curve, indicating improved measurement precision under the 2-PL model.
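All three visualizations are available directly from a fitted mirt model; a sketch of the plotting calls corresponding to the figures described above, reusing fit_2pl from the previous sketch.

```r
library(mirt)  # fit_2pl as estimated in the previous sketch

plot(fit_2pl, type = "trace")      # Item Characteristic Curves (ICC)
plot(fit_2pl, type = "infotrace")  # Item Information Curves (IIC)
plot(fit_2pl, type = "infoSE")     # Test Information Function with SEM overlay
```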
To strengthen the analysis and confirm the item difficulty indices, we plotted the person-item map (PIM). This map compares the range and position of the item measure distribution to the range and position of the person measure distribution. As a rule, items should span the measurement scale so that all levels of person ability are represented. From Figure 14, the distribution of person abilities spans from approximately −1 to +1, with the highest density centered around 0. Additional noticeable peaks appear near −0.5 and +0.5, suggesting a clustering of respondents around average ability levels. The two vertical lines are closely aligned and positioned near the center of the scale, implying that the items are well-targeted to the test-takers’ abilities. Overall, results from the CTT and IRT indicate that the QLt provides optimal measurement precision for the majority of respondents.
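The person-item map itself comes from the eRm package; a minimal sketch under the same response-matrix assumption as above.

```r
library(eRm)

rasch_fit <- RM(responses)           # Rasch model (conditional ML estimation)
plotPImap(rasch_fit, sorted = TRUE)  # person-item map, items sorted by difficulty
```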

6.3. Measurement Invariance

Measurement invariance, also known as measurement equivalence, refers to the consistency of an instrument’s measurement properties across different groups (El-Hamamsy et al., 2025). It can be evaluated in two complementary ways: through test-level analyses, which assess whether the overall construct is interpreted similarly across groups (Putnick & Bornstein, 2016), and item-level analyses, which examine whether individual test items function differently depending on group membership. In this study, we employed the Item-level measurement (ILM) approach using Differential Item Functioning (DIF). DIF is a statistical technique commonly used to identify potential biases in how individuals from different groups—such as gender (El-Hamamsy et al., 2025; Rachmatullah et al., 2022; Sovey et al., 2022), grade level (El-Hamamsy et al., 2025), or other demographic categories—respond to specific test items.
In our analysis, we employed the ILM within the DIF framework to examine how participants from different gender identities (male vs. female), disciplines (Science vs. non-science), grade levels (SSI vs. SSII), and school types (public vs. private) responded to individual QLt items. Visual inspection of the DIF plots indicated no meaningful differential functioning across these subgroups. For example, male and female participants showed highly similar item difficulty patterns, with only minor variations observed (see Figure 15 and Figure 16). Importantly, none of the items approached the ±0.5-logit threshold commonly used to indicate substantive DIF. Additional DIF plots for the remaining subgroups are provided in the OSF repository.
To complement the visual analyses, item-level DIF statistics were examined using Wald tests under the Rasch model. These analyses confirmed that logit differences between subgroup pairs were consistently small and that none of the Wald χ2 tests reached statistical significance. These results indicate that item difficulty estimates were stable across gender, field of study, school type, and level. Consequently, the small statistically significant gender difference observed in total QLt scores (Table 4) is best interpreted as reflecting true-score variation rather than systematic item-level bias. The full item-level DIF statistics, including Wald χ2 and p values, are available in the OSF repository for transparency and replication.
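A sketch of this Wald-test step with eRm, assuming gender is a binary grouping vector aligned with the rows of responses (variable names illustrative).

```r
library(eRm)

rasch_fit <- RM(responses)
Waldtest(rasch_fit, splitcr = gender)  # per-item z statistics (z^2 = Wald chi-square)
```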

7. Item Unidimensionality and Multidimensionality

7.1. Exploratory Factor Analysis

Given the satisfactory model fit and acceptable discrimination, we proceeded to identify the factor structure of the QLt through an exploratory factor analysis (EFA). Traditional EFA methods, such as principal axis factoring and maximum likelihood estimation, have been widely used to uncover latent structures in psychological assessments (Goretzko, 2023). However, these methods often rely on subjective criteria for determining the number of factors and may not perform optimally with dichotomous data. To address this, we employed Exploratory Graph Analysis (EGA, H. F. Golino & Demetriou, 2017), a contemporary technique that leverages network science to estimate the number of dimensions in a dataset. EGA constructs a network where variables are represented as nodes connected by edges that denote their relationships. In addition, EGA applies community detection algorithms to identify clusters corresponding to latent factors (Jiménez et al., 2024). This approach has demonstrated superior accuracy in determining dimensionality compared to traditional methods (H. F. Golino & Epskamp, 2017; H. Golino et al., 2020).
We used the Infomap algorithm to detect the communities. Studies have recommended this algorithm due to its efficiency in detecting smaller communities within a network (Bae et al., 2017; Hu & Liu, 2015). In addition, Infomap has been shown to outperform Louvain in certain contexts (Linhares et al., 2020). The Infomap algorithm identified three distinct clusters (red, blue, and green). However, visual inspection of the EGA graph (see Figure 17) revealed that items “qc7,” “qc17,” and “qps7” appeared weakly connected to the red cluster, despite their algorithm-assigned membership. To assess the strength of these connections, we computed network loadings and performed a bootstrap analysis of the EGA with 1000 iterations to evaluate cluster stability. Results indicated that all three items exhibited weak network loadings below 0.20 (see Table 9). In line with psychometric conventions, items with network loadings below 0.15 or 0.20 are typically considered for removal due to insufficient association with their assigned cluster (Cosemans et al., 2022). After removing the three items with weak network loadings, a total of 27 items were retained. A second round of EGA was then conducted to ensure effective item purification and confirm the revised factor structure (see Figure 18).
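A condensed sketch of this EGA workflow with the EGAnet package; here we pass the Infomap algorithm as an igraph community function (recent EGAnet versions also accept algorithm names as strings). This is illustrative, not our exact script.

```r
library(EGAnet)
library(igraph)

ega_fit <- EGA(responses, algorithm = igraph::cluster_infomap)  # dimensionality
net.loads(ega_fit)                                              # network loadings

boot_fit <- bootEGA(responses, iter = 1000,
                    algorithm = igraph::cluster_infomap)        # 1000 bootstraps
dimensionStability(boot_fit)                                    # cluster stability
```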
Importantly, an inspection of the revised blueprint indicated that item removal did not result in the substantive under-representation of any content domain. Accordingly, the balance across subscales was preserved at a level appropriate for a literacy-focused instrument. The re-estimated reliability of the refined scale still yields satisfactory internal consistency for the total instrument (α = 0.88, ω = 0.89) as well as for the QC (α = 0.84, ω = 0.85), QPr (α = 0.79, ω = 0.80), and QPs (α = 0.81, ω = 0.82) subscales. This indicates that item removal improved the psychometric coherence of the scale. Overall, the final 27-item QLt comprised 18 items for Quantum Concept (QC), three items for Quantum Practice (QPr), and six items for Quantum Perspective (QPs).

7.2. Confirmatory Factor Analysis

To corroborate the EGA solution, we conducted a confirmatory factor analysis on the 27 items. Specifically, a bi-factor model was employed using the weighted least squares mean and variance adjusted (WLSMV) estimator and an orthogonal rotation approach (see Figure 19). We had two reasons for estimating the bi-factor model. The first was to estimate the correlated components and ensure that each dimension is partially distinct, with small to moderate correlations with the other components. The second was to determine whether a single general factor sufficiently explains the constructs or whether specific factors provide meaningful variance (Rodriguez et al., 2016). Previous research has shown that the bi-factor model provides a better fit by capturing shared variance, as opposed to first- or higher-order CFA models (Reise et al., 2016). This reduces multicollinearity issues and improves interpretability (Chen et al., 2006).
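A sketch of the bi-factor specification in lavaan for the 27 retained items (qc7, qc17, and qps7 removed). The data frame name qlt_items is a placeholder; orthogonal = TRUE keeps the specific factors uncorrelated with the general factor, as the bi-factor model requires.

```r
library(lavaan)

bifactor_model <- '
  G   =~ qc1 + qc2 + qc3 + qc4 + qc5 + qc6 + qc8 + qc9 + qc10 + qc11 +
         qc12 + qc13 + qc14 + qc15 + qc16 + qc18 + qc19 + qc20 +
         qpr1 + qpr2 + qpr3 + qps1 + qps2 + qps3 + qps4 + qps5 + qps6
  QC  =~ qc1 + qc2 + qc3 + qc4 + qc5 + qc6 + qc8 + qc9 + qc10 + qc11 +
         qc12 + qc13 + qc14 + qc15 + qc16 + qc18 + qc19 + qc20
  QPr =~ qpr1 + qpr2 + qpr3
  QPs =~ qps1 + qps2 + qps3 + qps4 + qps5 + qps6
'

fit <- cfa(bifactor_model, data = qlt_items, estimator = "WLSMV",
           ordered = TRUE, orthogonal = TRUE)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
```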
The results in Table 10 present the factor loadings for both the correlated model and the bi-factor model. In the correlated model, all items demonstrated moderate to strong loadings on their respective factors, generally exceeding 0.68. In the bi-factor model, each item has two loadings: one on a general factor and one on a specific factor. All items loaded substantially on the general factor, supporting the presence of an overarching latent trait. However, the specific factor loadings remained moderate. Overall, the results support a strong general factor structure with meaningful and specific groupings, further validating the unidimensional composition of the test.

8. Criterion Validity

Given the balanced item difficulty and the valid factor structure demonstrated by our QLt, we extended our analysis to investigate how it relates to other established psychometric instruments within cognitive psychology and computer science. Specifically, we assessed concurrent validity by examining the relationships between the 27-item QLt and two well-known instruments: the Computational Thinking Test (CTt; Román-González, 2015) and the Spatial Ability Test (SAt; Ekstrom et al., 1976). Prior research has indicated that concurrent validity paradigms often produce higher validity coefficients than predictive paradigms, as the passage of time in predictive models tends to attenuate the strength of correlations between constructs (Gupta et al., 2013; W.-L. Lin & Yao, 2021).

8.1. Criterion Tests

The CTt is a 28-item performance-based assessment designed based on the framework of Brennan and Resnick (2012). The test covers fundamental computational concepts such as basic directions, sequences, loops, conditionals, and functions. After validation, the reliability coefficients were found to be 0.793 for the overall sample, with breakdowns of 0.721 for 5th and 6th graders, 0.762 for 7th and 8th graders, and 0.824 for 9th and 10th graders. Since the CTt was initially created for middle-school students, we selected it for our criterion validity analysis. On the other hand, the SAt includes the Paper Folding Test (PFT) developed by Ekstrom et al. (1976). The PFT consists of 20 items, each showing a sequence of 2 to 4 images that depict how a piece of paper is folded. After folding, a hole is punched through the paper, marked by a visible circle. Participants are then presented with five images of unfolded paper, each showing different hole patterns. Their task is to select the image that accurately represents how the unfolded paper would look with the correct hole placements based on the folding sequence.

8.2. Test Administration and Results

One of the primary challenges in assessing concurrent criterion validity is test fatigue, particularly when administering multiple tests in a single session (Song et al., 2022). This issue is especially critical for secondary school students, who may experience cognitive overload. To mitigate fatigue effects, we administered the tests in three separate sessions, with 30 min breaks between each. Additionally, to prevent order bias, we randomized the sequence of the Computational Thinking Test (CTt) and Spatial Ability Test (SAt) in Sessions 2 and 3, following the initial Quantum Literacy Test (QLt). Table 11 summarizes the test administration protocol.
While intensive training was provided for the QLt (as previously discussed), we did not provide such training for the CTt and SAt. Instead, we provided brief instructions on how to answer these tests via short instructional videos for each test. For the CTt, we provided an English language description of the video content prepared by Jiménez et al. (2024)3 to ensure that all participants fully understood the instructions. For the SAt, we projected a 4 min video prepared by Visuprep4 to introduce the patterns involved in answering paper folding tests. It should be noted that the tests were administered on a school-by-school basis.
The data collected were analyzed using Pearson correlation. Results indicate a significant correlation between our QLt and the criterion tests (see Table 12): CTt (r = 0.65, p < 0.01) and SAt (r = 0.32, p < 0.01). At the facet level (see Table 13), results indicate that computational concepts such as sequences, loops, conditionals, and functions were more strongly correlated with quantum concepts (r ≥ 0.4) than with quantum practices (r ≥ 0.13) and quantum perspectives (r ≥ 0.05).
Based on the criterion tests, our QLt demonstrates concurrent validity, as evidenced by significant correlations with established cognitive instruments. However, further predictive analysis reveals that while the QLt aligns with these cognitive measures, it is not strongly predicted by them. This is supported by the performance of four predictive algorithms, all yielding low coefficients of determination (R2 ≤ 0.40; see Table 14) and limited model predictive power (see Figure 20). From this analysis, we hypothesize that performing well on the CTt and the SAt does not causally determine performance on our QLt. Rather, our analysis suggests that the QLt is associated with these performance tests, which indicates its substantial concurrent validity.
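A condensed sketch of the correlation and predictive analyses, assuming a data frame scores with columns QLt, CTt, and SAt (names illustrative); the gbm learner shown is one of the four algorithms we tested.

```r
library(caret)

cor(scores$QLt, scores$CTt)  # concurrent validity with the CTt (r ~ 0.65)
cor(scores$QLt, scores$SAt)  # concurrent validity with the SAt (r ~ 0.32)

set.seed(1)
ctrl    <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation
gbm_fit <- train(QLt ~ CTt + SAt, data = scores, method = "gbm",
                 trControl = ctrl, verbose = FALSE)
gbm_fit$results$Rsquared     # variance explained (R^2 <= 0.40 in our analysis)
```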

9. Discussion and Conclusions

Until now, most K-12 quantum education studies have relied on self-reported data or bespoke quizzes that lack documented validity (e.g., Holincheck et al., 2024; Hughes et al., 2022; Ivory et al., 2023; Tucker, 2023). The only recent standardized instrument we identified focuses on assessing pseudoscientific beliefs about quantum mechanics rather than evaluating students’ conceptual understanding (Bondani et al., 2024). In contrast, we argued that a quantum literacy test should target students’ knowledge of foundational quantum ideas and provide an objective measure of how well learners grasp essential quantum concepts, practices, and perspectives. Such a tool should go beyond merely identifying misconceptions; it should capture students’ conceptual understanding to allow educators to evaluate instructional effectiveness and monitor learning progress reliably.
Based on this rationale, we developed and validated the Quantum Literacy Test (QLt) for upper-secondary students within the framework of computational thinking (CT). Our multi-step validation process, using both CTT and IRT, confirmed the instrument’s reliability and structural validity. Internal consistency and item-fit statistics were within acceptable ranges. Differential item functioning analyses demonstrated measurement fairness across gender, discipline, grade level, and school type. The factor structure was stable but indicated the retention of 27 items. Following this item purification, we examined how the QLt aligns with broader cognitive constructs. The large correlation with the Computational Thinking Test (CTt, r = 0.65) mirrors the algorithmic overlap between qubit logic and CT decomposition strategies (Román-González et al., 2025). To further support our findings, Bahadori and Dvorak (2024) argued that introducing simple qubit-gate circuits inside early computer science courses reinforces CT habits such as abstraction and decomposition. Previous studies found that CT skills themselves are intertwined with general reasoning resources, including spatial, numerical, and verbal abilities (Román-González et al., 2017; Tsarava et al., 2022). On the other hand, the moderate link to spatial ability (SAt, r = 0.32) is consistent with evidence that mental rotation supports state-vector visualization (C.-H. Lin & Chen, 2016) but does not guarantee success on abstract concepts that require formal reasoning (Kohl & Finkelstein, 2008).
Interestingly, despite strong concurrent validity, our machine learning models using CTt and SAt scores explained less than 40% of the variance in QLt performance (R2 ≤ 0.40). This modest predictive power suggests that quantum literacy may involve domain-specific knowledge and reasoning, not fully captured by broader cognitive abilities. Similar “domain-specific gaps” have been reported between general numeracy and energy-literacy tests (L.-S. Lee et al., 2015) and between general reasoning and genomic literacy (Linderman et al., 2021). Nevertheless, our instrument has shown acceptable validity and confirms the idea that CT can be extended to quantum education (Román-González et al., 2025; U.S. Department of Energy, Office of Science, 2023). We anticipate that the QLt will make a significant impact in assessing students’ quantum literacy in several ways. First, curriculum researchers can deploy it before and after unplugged, simulator-based, or hardware-access modules to quantify learning gains and compare pedagogies. Second, teachers can use the QLt for formative feedback, profiling classwide strengths across quantum concepts, practices, and perspectives. Third, because evidence from the differential-item-functioning shows no bias by gender, school type, or career track, the QLt is well suited to monitoring equity initiatives aimed at broadening participation in the quantum workforce.

10. Limitations and Future Direction

A key limitation of the present study is that validation was carried out solely with K-12 students in Nigeria. Because an instrument verified in one cultural setting is not automatically valid in another (Beaton et al., 2000; Gjersing et al., 2010; Huang & Wong, 2014), wider evidence is essential. Accordingly, our first future step will involve a cross-national validation study, adapting and testing the QLt in multiple languages and curricula to establish its robustness across diverse educational contexts. Like the CTt, which largely focused on CT concepts (Román-González et al., 2017), our QLt is weighted toward quantum concepts, leaving practices and perspectives under-represented. Our next development phase will therefore expand the item bank to enrich these two dimensions. Although our unplugged activities replicated those used in earlier studies (e.g., Angara et al., 2020; Carreño et al., 2019) and aligned with the Nigerian secondary school computer curriculum, we developed them without direct collaboration with classroom computer science teachers. Thus, in the future, we also plan to co-design new scenarios with classroom teachers, as recommended by a recent study (Moreno-León et al., 2025). This will include unplugged protocol-design tasks and ethics-of-quantum-technology vignettes, piloted alongside the existing QLt.
We also acknowledge limitations in the psychometric indices. While the IRT results indicate that the current item pool provides the highest precision around average ability levels, they also show reduced sensitivity at the lower and upper ends of the ability continuum. Future work may expand the item pool with deliberately easier and more challenging items to improve measurement precision across the full ability range. Overall, these future research developments will establish whether the QLt retains its psychometric properties internationally and also supply country-specific norms that educators and policymakers can use to benchmark progress toward national quantum literacy goals.

Author Contributions

Conceptualization, A.Y. and M.R.-G.; methodology, A.Y. and M.R.-G.; software, A.Y.; validation, A.Y. and S.K.B.; formal analysis, A.Y.; investigation, A.Y.; resources, A.Y. and S.K.B.; data curation, A.Y.; writing—original draft preparation, A.Y. and S.K.B.; writing—review and editing, M.R.-G., N.A.A., and N.M.N.; visualization, A.Y. and M.R.-G.; supervision, M.R.-G., N.A.A., and N.M.N.; project administration, M.R.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This research was approved by the Ministry of Basic and Secondary Education, Sokoto State, Nigeria (Approval Number: MBSE/RP/2024/0412; Approval Date: 2 October 2024), and followed the consent to participate procedures.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data for this study are available upon reasonable request.

Acknowledgments

During the preparation of this work, the authors used ChatGPT (version 4.5) to improve the readability and language of the manuscript. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the published article.

Conflicts of Interest

The authors declare no conflicts of interest.

Notes

1.
2. Jessica Pointing: Quantum Computing (explained with doughnuts)—www.youtube.com/watch?v=YIDdPbnnqsA (accessed on 28 June 2025).
3. https://vimeo.com/544462776/34e6be254a (accessed on 30 June 2025).
4.

References

  1. Angara, P. P., Stege, U., & MacLean, A. (2020, October 12–16). Quantum computing for high-school students: An experience report. IEEE International Conference on Quantum Computing and Engineering, QCE 2020 (pp. 323–329), Denver, CO, USA. [Google Scholar] [CrossRef]
  2. Awopeju, O. A., & Afolabi, E. R. I. (2016). Comparative analysis of classical test theory and item response theory based item parameter estimates of senior school certificate mathematics examination. European Scientific Journal, ESJ, 12(28), 263. [Google Scholar] [CrossRef]
  3. Bae, S.-H., Halperin, D., West, J. D., Rosvall, M., & Howe, B. (2017). Scalable and efficient flow-based community detection for large-scale graph analysis. ACM Transactions on Knowledge Discovery from Data, 11(3), 32. [Google Scholar] [CrossRef]
  4. Bahadori, F., & Dvorak, R. (2024). Bridging the quantum computing skills gap: Integrating Quantum education into computer science curricula. Journal of Computing Sciences in Colleges, 40(1), 21–22. [Google Scholar]
  5. Beaton, D. E., Bombardier, C., Guillemin, F., & Ferraz, M. B. (2000). Guidelines for the process of cross-cultural adaptation of self-report measures. Spine, 25(24), 3186–3191. [Google Scholar] [CrossRef] [PubMed]
  6. Bondani, M., Galano, S., Malgieri, M., Onorato, P., Sciarretta, W., & Testa, I. (2024). Development and use of an instrument to measure pseudoscientific beliefs in quantum mechanics: The PSEUDO-QM scale. Research in Science & Technological Education, 43(4), 1330–1351. [Google Scholar] [CrossRef]
  7. Brennan, K., & Resnick, M. (2012). New frameworks for studying and assessing the development of computational thinking. In Proceedings of the 2012 annual meeting of the american educational research association (Vol. 1, p. 25). Scientific Research Publishing. [Google Scholar]
  8. Carreño, M. J., Sepúlveda, J., Tecpan, S., Hernández, C., & Herrera, F. (2019). An instrument-free demonstration of quantum key distribution for high-school students. Physics Education, 54(6), 065006. [Google Scholar] [CrossRef]
  9. Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. [Google Scholar] [CrossRef]
  10. Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41(2), 189–225. [Google Scholar] [CrossRef]
  11. Cosemans, T., Rosseel, Y., & Gelper, S. (2022). Exploratory graph analysis for factor retention: Simulation results for continuous and binary data. Educational and Psychological Measurement, 82(5), 880–910. [Google Scholar] [CrossRef]
  12. Dai, B., Zhang, W., Wang, Y., & Jian, X. (2020). Comparison of trust assessment scales based on item response theory. Frontiers in Psychology, 11, 10. [Google Scholar] [CrossRef]
  13. De Champlain, A. F. (2010). A primer on classical test theory and item response theory for assessments in medical education. Medical Education, 44(1), 109–117. [Google Scholar] [CrossRef] [PubMed]
  14. DeVellis, R. F. (2006). Classical test theory. Medical Care, 44(11), S50–S59. [Google Scholar] [CrossRef]
  15. Dowling, J. P., & Milburn, G. J. (2003). Quantum technology: The second quantum revolution. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 361(1809), 1655–1674. [Google Scholar] [CrossRef] [PubMed]
  16. Duarte, F. J. (2022). Fundamentals of quantum entanglement (2nd ed.). IOP Publishing. [Google Scholar] [CrossRef]
  17. Ekstrom, R. B., French, J. W., & Harmann, H. H. (1976). Manual for kit of factor-referenced cognitive tests. Educational Testing Service. [Google Scholar]
  18. El-Hamamsy, L., Zapata-Cáceres, M., Martín-Barroso, E., Mondada, F., Dehler Zufferey, J., Bruno, B., & Román-González, M. (2025). The Competent Computational Thinking Test (cCTt): A valid, reliable and gender-fair test for longitudinal CT studies in grades 3-6. Technology, Knowledge and Learning, 30, 1607–1661. [Google Scholar] [CrossRef]
  19. Escanez-Exposito, D., Rodriguez-Vega, M., Rosa-Remedios, C., & Caballero-Gil, P. (2025). QScratch: Introduction to quantum mechanics concepts through block-based programming. EPJ Quantum Technology, 12, 12. [Google Scholar] [CrossRef]
  20. Franklin, D., Palmer, J., Jang, W., Lehman, E. M., Marckwordt, J., Landsberg, R. H., Muller, A., & Harlow, D. (2020, August 1–5). Exploring quantum reversibility with young learners. 2020 ACM Conference on International Computing Education Research, ICER 2020 (pp. 147–157), Virtual. [Google Scholar] [CrossRef]
  21. German, D.-A., Pias, M., & Xiang, Q. (2024, March 20–23). A quantum abacus for teaching quantum algorithms. 55th ACM Technical Symposium on Computer Science Education (SIGCSE’24), Portland, OR, USA. [Google Scholar] [CrossRef]
  22. Gjersing, L., Caplehorn, J. R., & Clausen, T. (2010). Cross-cultural adaptation of research instruments: Language, setting, time and statistical considerations. BMC Medical Research Methodology, 10(1), 13. [Google Scholar] [CrossRef]
  23. Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., Thiyagarajan, J. A., & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292–320. [Google Scholar] [CrossRef]
  24. Golino, H. F., & Demetriou, A. (2017). Estimating the dimensionality of intelligence like data using Exploratory Graph Analysis. Intelligence, 62, 54–70. [Google Scholar] [CrossRef]
  25. Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLoS ONE, 12(6), e0174035. [Google Scholar] [CrossRef]
  26. Goretzko, D. (2023). Regularized exploratory factor analysis as an alternative to factor rotation. European Journal of Psychological Assessment, 41(4), 264–276. [Google Scholar] [CrossRef]
  27. Grover, S. (2024). Teaching AI to K-12 learners: Lessons, issues, and guidance. In Proceedings of the 55th ACM technical symposium on computer science education (SIGCSE 2024) (pp. 422–428). Association for Computing Machinery. [Google Scholar] [CrossRef]
  28. Gupta, N., Ganster, D. C., & Kepes, S. (2013). Assessing the validity of sales self-efficacy: A cautionary tale. Journal of Applied Psychology, 98(4), 690–700. [Google Scholar] [CrossRef] [PubMed]
  29. Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38–47. [Google Scholar] [CrossRef]
  30. Holincheck, N., Rosenberg, J. L., Zhang, X., Butler, T. N., Colandene, M., & Dreyfus, B. W. (2024). Quantum science and technologies in K-12: Supporting teachers to integrate quantum in STEM classrooms. Education Sciences, 14(3), 219. [Google Scholar] [CrossRef]
  31. Hu, F., & Liu, Y. J. (2015). A novel algorithm Infomap-SA of detecting communities in complex networks. Journal of Communications, 10(7), 503–511. [Google Scholar] [CrossRef]
  32. Huang, W. Y., & Wong, S. H. (2014). Cross-cultural validation. In A. C. Michalos (Ed.), Encyclopedia of quality of life and well-being research (pp. 1339–1341). Springer. [Google Scholar] [CrossRef]
  33. Hudson, G., & Alexander, C. (2025). EGAnet: Exploratory graph analysis—A framework for estimating the number of dimensions in multivariate data using network psychometrics. Available online: https://cran.r-project.org/web/packages/EGAnet/index.html (accessed on 15 June 2025).
  34. Hughes, C., Isaacson, J., Turner, J., Perry, A., & Sun, R. (2022). Teaching quantum computing to high school students. The Physics Teacher, 60(3), 187–189. [Google Scholar] [CrossRef]
  35. Ivory, M., Bettale, A., Boren, R., Burch, A. D., Douglass, J., Hackett, L., Kiefer, B., Kononov, A., Long, M., Metcalf, M., Propp, T. B., & Sarovar, M. (2023, September 17–22). Quantum computing, math, and physics (QCaMP): Introducing quantum computing in high schools. 2023 IEEE International Conference on Quantum Computing and Engineering (QCE) (pp. 1–9), Bellevue, WA, USA. [Google Scholar] [CrossRef]
  36. Jabrayilov, R., Emons, W. H. M., & Sijtsma, K. (2016). Comparison of classical test theory and item response theory in individual change assessment. Applied Psychological Measurement, 40(8), 559–572. [Google Scholar] [CrossRef]
  37. Jiménez, M., Zapata-Cáceres, M., Román-González, M., Robles, G., Moreno-León, J., & Martín-Barroso, E. (2024). Computational concepts and their assessment in preschool students: An empirical study. Journal of Science Education and Technology, 33(6), 998–1020. [Google Scholar] [CrossRef]
  38. Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J. M., & Gambetta, J. M. (2017). Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549(7671), 242–246. [Google Scholar] [CrossRef]
  39. Khan, A. A., Ahmad, A., Waseem, M., Liang, P., Fahmideh, M., Mikkonen, T., & Abrahamsson, P. (2023). Software architecture for quantum computing systems—A systematic review. Journal of Systems and Software, 201, 111682. [Google Scholar] [CrossRef]
  40. Kohl, P. B., & Finkelstein, N. D. (2008). Patterns of multiple representation use by experts and novices during physics problem-solving. Physical Review Special Topics—Physics Education Research, 4(1), 010111. [Google Scholar] [CrossRef]
  41. Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26. [Google Scholar] [CrossRef]
  42. Lee, L.-S., Lee, Y.-F., Altschuld, J. W., & Pan, Y.-J. (2015). Energy literacy: Evaluating knowledge, affect, and behavior of students in Taiwan. Energy Policy, 76, 98–106. [Google Scholar] [CrossRef]
  43. Lee, S., & Bolt, D. M. (2018). Asymmetric item characteristic curves and item complexity: Insights from simulation and real data analyses. Psychometrika, 83(2), 453–475. [Google Scholar] [CrossRef]
  44. Lin, C.-H., & Chen, C.-M. (2016). Developing spatial visualization and mental rotation with a digital puzzle game at primary school level. Computers in Human Behavior, 57, 23–30. [Google Scholar] [CrossRef]
  45. Lin, W.-L., & Yao, G. (2021). Concurrent validity. In F. Maggino (Ed.), Encyclopedia of quality of life and well-being research (2nd ed.). Springer. [Google Scholar] [CrossRef]
  46. Linderman, M. D., Suckiel, S. A., Thompson, N., Weiss, D. J., Roberts, J. S., & Green, R. C. (2021). Development and validation of a comprehensive genomics knowledge scale. Public Health Genomics, 24(5–6), 291–303. [Google Scholar] [CrossRef]
  47. Linhares, C. D. G., Ponciano, J. R., Pereira, F. S. F., Rocha, L. E. C., Paiva, J. G. S., & Travençolo, B. A. N. (2020). Visual analysis for evaluation of community detection algorithms. Multimedia Tools and Applications, 79, 17645–17667. [Google Scholar] [CrossRef]
  48. Liu, T., Gonzalez-Maldonado, D., Harlow, D. B., Edwards, E. E., & Franklin, D. (2023). Qupcakery: A puzzle game that introduces quantum gates to young learners. Proceedings of the 54th ACM Technical Symposium on Computer Science Education, 1, 1143–1149. [Google Scholar] [CrossRef]
  49. Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20. [Google Scholar] [CrossRef]
  50. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., Chang, C.-C., & Lin, C.-C. (2024). e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071). TU Wien. Available online: https://cran.r-project.org/web/packages/e1071/e1071.pdf (accessed on 18 November 2025).
  51. Meyer, J. C., Passante, G., Pollock, S. J., & Wilcox, B. R. (2024). Introductory quantum information science coursework at US institutions: Content coverage. EPJ Quantum Technology, 11, 16. [Google Scholar] [CrossRef]
  52. Moreno-León, J., Román-González, M., Martín-Barroso, E., Zapata-Cáceres, M., Jiménez, M., & Robles, G. (2025). Enhancing computational thinking skills in early education: Exploring the efficacy and feasibility of unplugged methodologies. Thinking Skills and Creativity, 58, 101879. [Google Scholar] [CrossRef]
  53. Nita, L., Mazzoli Smith, L., Chancellor, N., & Cramman, H. (2021). The challenge and opportunities of quantum literacy for future education and transdisciplinary problem-solving. Research in Science & Technological Education, 41(2), 564–580. [Google Scholar] [CrossRef]
  54. Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books. [Google Scholar]
  55. Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. [Google Scholar] [CrossRef] [PubMed]
  56. Rachmatullah, A., Vandenberg, J., & Wiebe, E. (2022, July 8–13). Toward more generalizable CS and CT instruments: Examining the interaction of country and gender at the middle grades level. 27th ACM Conference on Innovation and Technology in Computer Science Education (Vol. 1, pp. 179–185), Dublin, Ireland. [Google Scholar] [CrossRef]
  57. R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available online: https://www.R-project.org (accessed on 25 August 2025).
  58. Reise, S. P., Kim, D. S., Mansolf, M., & Widaman, K. F. (2016). Is the bifactor model a better model or is it just better at modeling implausible responses? Application of iteratively reweighted least squares to the Rosenberg Self-Esteem Scale. Multivariate Behavioral Research, 51(6), 818–838. [Google Scholar] [CrossRef] [PubMed]
  59. Repenning, A., & Grabowski, S. (2023, June 6–8). Prompting is computational thinking. IS-EUD 2023: 9th International Symposium on End-User Development, Cagliari, Italy. Available online: https://ceur-ws.org/Vol-3408/short-s2-07.pdf (accessed on 15 June 2025).
  60. Revelle, W. (2025). psych: Procedures for psychological, psychometric, and personality research. Northwestern University. [Google Scholar] [CrossRef]
  61. Ridgeway, G., Edwards, D., Kriegler, B., Schroedl, S., Southworth, H., Greenwell, B., & Cunningham, J. (2024). gbm: Generalized boosted regression models. Available online: https://cran.r-project.org/web/packages/gbm/gbm.pdf (accessed on 18 November 2025).
  62. Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. [Google Scholar] [CrossRef]
  63. Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21(2), 137–150. [Google Scholar] [CrossRef]
  64. Román-González, M. (2015, July 6–8). Computational thinking test: Design guidelines and content validation. 7th International Conference on Education and New Learning Technologies (pp. 6–8), Barcelona, Spain. Available online: https://library.iated.org/view/ROMANGONZALEZ2015COM (accessed on 21 December 2025).
  65. Román-González, M., Moreno-León, J., Robles, G., & Rodríguez, J. D. (2025). Extending the computational thinking framework towards quantum education. Preprint. [Google Scholar] [CrossRef]
  66. Román-González, M., Pérez-González, J.-C., & Jiménez-Fernández, C. (2017). Which cognitive abilities underlie computational thinking? Criterion validity of the Computational Thinking Test. Computers in Human Behavior, 72, 678–691. [Google Scholar] [CrossRef]
  67. Rosenbaum, P. R. (1987). Comparing item characteristic curves. Psychometrika, 52(2), 217–233. [Google Scholar] [CrossRef]
  68. Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. [Google Scholar] [CrossRef]
  69. Rycerz, K., Patrzyk, J., Patrzyk, B., & Bubak, M. (2015). Teaching quantum computing with the QuIDE simulator. Procedia Computer Science, 51, 1724–1733. [Google Scholar] [CrossRef]
  70. Singh, J., & Bhangu, K. S. (2023). Contemporary quantum computing use cases: Taxonomy, review, and challenges. Archives of Computational Methods in Engineering, 30(2), 615–638. [Google Scholar] [CrossRef]
  71. Song, J., Howe, E., Oltmanns, J. R., & Fisher, A. J. (2022). Examining the concurrent and predictive validity of single items in ecological momentary assessments. Assessment, 30(5), 1662–1671. [Google Scholar] [CrossRef]
  72. Sovey, S., Osman, K., & Matore, M. E. E. M. (2022). Gender differential item functioning analysis in measuring computational thinking disposition among secondary school students. Frontiers in Psychiatry, 13, 1022304. [Google Scholar] [CrossRef]
  73. Tedre, M. (2022, October 31–November 2). Computational thinking 2.0. 17th Workshop in Primary and Secondary Computing Education (WiPSCE’22) (pp. 1–2), Morschach, Switzerland. [Google Scholar] [CrossRef]
  74. Tsarava, K., Mavridis, A., Froelich, A., Zourmpaki, C., & Žilinskaitė, I. (2022). Cognitive predictors of primary pupils’ computational thinking: The role of spatial, numerical, and verbal abilities. Computers & Education, 185, 104530. [Google Scholar] [CrossRef]
  75. Tucker, D. L. (2023, September 17–22). Leveraging dual enrollment programs to expand secondary education in quantum computation. 2023 IEEE International Conference on Quantum Computing and Engineering (QCE) (pp. 10–14), Bellevue, WA, USA. [Google Scholar] [CrossRef]
  76. U.S. Department of Energy, Office of Science. (2023). Building computational literacy through STEM education: A guide for federal agencies. U.S. Department of Energy. Available online: https://www.energy.gov/sites/default/files/2024-03/Building-Computational-Literacy-Through-STEM-Ed-Guide-for-Federal-Agencies-FINAL-PUBLIC.pdf (accessed on 15 June 2025).
  77. Xenakis, A., Avramouli, M., Sabani, M., Savvas, I., Chaikalis, C., & Theodoropoulou, K. (2023, May 1–4). Quantum serious games to boost quantum literacy within computational thinking 2.0 framework. 2023 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–9), Kuwait, Kuwait. [Google Scholar] [CrossRef]
  78. Xie, B., Davidson, M. J., Li, M., & Ko, A. J. (2019, February 27–March 2). An item response theory evaluation of a language-independent CS1 knowledge assessment. 50th ACM Technical Symposium on Computer Science Education (pp. 699–705), Minneapolis, MN, USA. [Google Scholar] [CrossRef]
  79. Yusuf, A., Pervin, N., & Román-González, M. (2024). Generative AI and the future of higher education: A threat to academic integrity or reformation? Evidence from multicultural perspectives. International Journal of Educational Technology in Higher Education, 21(1), 21. [Google Scholar] [CrossRef]
  80. Yusuf, A., Román-González, M., & Behera, S. K. (2025). Quantum computing in K-12 education: An early systematic review. Computer Science Education, 1–33. [Google Scholar] [CrossRef]
Figure 1. A conceptual framework of the QLt.
Figure 2. Test item qc3. Primary quantum literacy concept addressed: superposition (Xenakis et al., 2023). The letters in the figure denote the answer options.
Figure 3. Test item qc11. Primary quantum literacy concept addressed: reversibility (Franklin et al., 2020). The letters in the figure denote the answer options.
Figure 4. Test item qc15. Primary quantum literacy concept addressed: gates (Liu et al., 2023). The letters in the figure denote the answer options.
Figure 5. Test item qc18. Primary quantum literacy concept addressed: algorithms (German et al., 2024). The letters in the figure denote the answer options.
Figure 6. Quantum doughnut. 1 = sprinkled, 0 = plain.
Figure 7. Reversibility cards (Franklin et al., 2020). The letters in the figure number the cards for easy reading.
Figure 8. Colored balloons for quantum search. The numbers label the balloons; the colors illustrate that a quantum search can succeed in about 3 steps (√N), whereas a classical search can take up to 9 steps (N).
Figure 9. Distribution of QLt scores by gender (a,b) and grade (c,d).
Figure 10. Box-and-whisker plots showing outliers across groups.
Figure 11. Item characteristic curve (ICC) plots.
Figure 12. Item information curve (IIC) plots.
Figure 13. Test information function (TIF) plot.
Figure 14. Person-item map plot.
Figure 15. Item measures for the gender sub-groups.
Figure 16. A sample bar plot of differences in item difficulty between sub-groups.
Figure 17. EGA plot of the initial 30 items.
Figure 18. EGA plot of the final 27 items.
Figure 19. Hypothesized model.
Figure 20. Plot of model predictions against observed QLt values.
Table 1. CT extension to quantum literacy.

CT Framework | Extension to Quantum Literacy
CT framework stage | Stage #3
Computing paradigm | Quantum computing (non-binary)
Programming | Quantum programming
Type of logic involved | Non-classical logic
Philosophical paradigm | Holistic
CT concepts | Qubit, Superposition, Supremacy, Entanglement, Teleportation, Gates, and Tunnelling
CT practices | Seeking for coherence and intentionally collapsing
CT perspectives | Assuming uncertainty and experiencing unity within diversity
Table 2. Participants' characteristics.

Characteristic | Frequency | Percent
Gender | |
    Female | 392 | 47.9
    Male | 427 | 52.1
Level | |
    SSI | 415 | 50.7
    SSII | 404 | 49.3
Career field | |
    STEM | 593 | 72.4
    Non-STEM | 226 | 27.6
School type | |
    Public | 472 | 57.6
    Private | 347 | 42.4
School gender composition | |
    Boys | 322 | 39.3
    Girls | 213 | 26.0
    Co-education | 284 | 34.7
Disability status | |
    Mildly disabled | 206 | 25.2
    Non-disabled | 613 | 74.8
Table 3. Descriptive statistics.

Scale | N | Min. | Max. | Mean | Std. Error | Std. Deviation | Skewness | Kurtosis
QC | 819 | 0 | 20 | 9.98 | 0.13 | 3.81 | −0.08 | 1.47
QPr | 819 | 0 | 3 | 1.57 | 0.03 | 0.95 | −0.07 | −0.92
QPs | 819 | 0 | 7 | 3.56 | 0.05 | 1.56 | −0.02 | −0.21
QLt | 819 | 2 | 27 | 15.11 | 0.14 | 4.21 | −0.11 | 0.94
Table 4. Differences in QLt scores across gender and grade.

Group | N | Mean | Std. Dev. | t | p-Value
Gender | | | | |
    Male | 392 | 14.79 | 4.12 | −2.063 | 0.039
    Female | 427 | 15.41 | 4.28 | |
Grade | | | | |
    SSI | 419 | 15.33 | 2.86 | 1.510 | 0.131
    SSII | 400 | 14.89 | 5.27 | |
Table 5. Gender differences across QLt dimensions.

Dimension | t | p-Value | 95% CI Lower | 95% CI Upper
QC | −1.638 | 0.102 | −0.959 | 0.086
QPr | −1.250 | 0.212 | −0.214 | 0.047
QPs | −0.792 | 0.429 | −0.301 | 0.128
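The group comparisons in Tables 4 and 5 can be reproduced with base R's t.test(); a minimal sketch follows, in which the data frame `df` and its column names are hypothetical:

```r
# Independent-samples t-tests, as in Tables 4 and 5.
# 'df' is a hypothetical data frame with columns QLt, QC, QPr, QPs,
# gender, and grade. Student's t (equal variances) is assumed here.
t.test(QLt ~ gender, data = df, var.equal = TRUE)   # Table 4, gender
t.test(QLt ~ grade,  data = df, var.equal = TRUE)   # Table 4, grade

# Dimension-level comparisons with 95% CIs (Table 5):
t.test(QC  ~ gender, data = df, var.equal = TRUE)$conf.int
t.test(QPr ~ gender, data = df, var.equal = TRUE)$conf.int
t.test(QPs ~ gender, data = df, var.equal = TRUE)$conf.int
```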
Table 6. Classical Test Theory analysis according to students' scores.

Item | Difficulty (p-Value, Mean) | Std. Dev. | Std. Error | Point-Biserial Correlation (Item Discrimination) | Drop Alpha
qc1 | 0.470 | 0.29 | 0.025 | 0.354 | 0.85
qc2 | 0.505 | 0.20 | 0.027 | 0.338 | 0.83
qc3 | 0.699 | 0.20 | 0.027 | 0.338 | 0.81
qc4 | 0.476 | 0.39 | 0.028 | 0.326 | 0.84
qc5 | 0.503 | 0.10 | 0.026 | 0.350 | 0.85
qc6 | 0.503 | 0.24 | 0.025 | 0.391 | 0.89
qc7 | 0.875 | 0.19 | 0.032 | 0.215 | 0.81
qc8 | 0.509 | 0.31 | 0.025 | 0.388 | 0.88
qc9 | 0.486 | 0.50 | 0.027 | 0.344 | 0.88
qc10 | 0.501 | 0.22 | 0.027 | 0.369 | 0.88
qc11 | 0.492 | 0.21 | 0.025 | 0.375 | 0.88
qc12 | 0.607 | 0.33 | 0.025 | 0.379 | 0.88
qc13 | 0.509 | 0.19 | 0.026 | 0.355 | 0.88
qc14 | 0.487 | 0.15 | 0.027 | 0.343 | 0.88
qc15 | 0.459 | 0.14 | 0.027 | 0.366 | 0.85
qc16 | 0.508 | 0.23 | 0.027 | 0.342 | 0.86
qc17 | 0.823 | 0.27 | 0.031 | 0.207 | 0.81
qc18 | 0.513 | 0.31 | 0.026 | 0.356 | 0.85
qc19 | 0.691 | 0.24 | 0.025 | 0.357 | 0.88
qc20 | 0.509 | 0.26 | 0.026 | 0.325 | 0.88
qpr1 | 0.513 | 0.42 | 0.033 | 0.305 | 0.81
qpr2 | 0.627 | 0.11 | 0.032 | 0.322 | 0.81
qpr3 | 0.520 | 0.25 | 0.032 | 0.322 | 0.81
qps1 | 0.474 | 0.31 | 0.030 | 0.353 | 0.80
qps2 | 0.676 | 0.24 | 0.031 | 0.353 | 0.80
qps3 | 0.501 | 0.22 | 0.031 | 0.330 | 0.81
qps4 | 0.475 | 0.21 | 0.030 | 0.362 | 0.80
qps5 | 0.579 | 0.28 | 0.030 | 0.376 | 0.80
qps6 | 0.510 | 0.41 | 0.031 | 0.365 | 0.80
qps7 | 0.832 | 0.42 | 0.032 | 0.231 | 0.81
Overall reliability: α = 0.87; Quantum concepts reliability: α = 0.84; Quantum practices reliability: α = 0.81; Quantum perspectives reliability: α = 0.80. Note: Bold values either exceed the maximum threshold or are below the minimum threshold. Corresponding items could be considered for revision according to Classical Test Theory.
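As an illustration, the CTT indices in Table 6 correspond to standard item-analysis output from the psych package (Revelle, 2025). A minimal sketch, assuming the scored items sit in a hypothetical data frame `responses`:

```r
# Minimal sketch: classical item analysis for dichotomous (0/1) items.
library(psych)

# Item difficulty (proportion correct) is simply the item mean for 0/1 data.
difficulty <- colMeans(responses, na.rm = TRUE)

# alpha() returns the overall alpha, corrected item-total (point-biserial)
# correlations, and alpha-if-item-deleted values.
ctt <- alpha(responses)
ctt$total$raw_alpha        # overall reliability (cf. footnote of Table 6)
ctt$item.stats$r.drop      # item discrimination
ctt$alpha.drop$raw_alpha   # drop alpha per item
```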
Table 7. Item Response Theory analysis according to students' scores.

Item | Estimate | Std. Error | 95% CI Lower | 95% CI Upper | Discrim
qc1 | −0.023 | 0.070 | −0.160 | 0.116 | 0.329
qc2 | −0.032 | 0.070 | −0.171 | 0.106 | 0.307
qc3 | −0.007 | 0.070 | −0.145 | 0.131 | 0.311
qc4 | 0.091 | 0.070 | −0.047 | 0.229 | 0.293
qc5 | −0.022 | 0.070 | −0.160 | 0.116 | 0.324
qc6 | −0.022 | 0.070 | −0.160 | 0.116 | 0.373
qc7 | 0.096 | 0.070 | −0.042 | 0.234 | 0.222
qc8 | −0.048 | 0.070 | −0.186 | 0.090 | 0.370
qc9 | 0.050 | 0.070 | −0.088 | 0.188 | 0.315
qc10 | −0.012 | 0.070 | −0.150 | 0.126 | 0.346
qc11 | 0.024 | 0.070 | −0.114 | 0.162 | 0.356
qc12 | −0.038 | 0.070 | −0.176 | 0.100 | 0.357
qc13 | −0.048 | 0.070 | −0.186 | 0.090 | 0.326
qc14 | 0.044 | 0.070 | −0.094 | 0.182 | 0.316
qc15 | 0.163 | 0.071 | −0.024 | 0.301 | 0.344
qc16 | −0.043 | 0.070 | −0.181 | 0.095 | 0.313
qc17 | −0.104 | 0.070 | −0.243 | 0.034 | 0.212
qc18 | −0.063 | 0.070 | −0.201 | 0.075 | 0.340
qc19 | 0.029 | 0.070 | −0.109 | 0.167 | 0.329
qc20 | −0.048 | 0.070 | −0.186 | 0.090 | 0.333
qpr1 | −0.063 | 0.070 | −0.201 | 0.075 | 0.340
qpr2 | −0.125 | 0.071 | −0.263 | 0.013 | 0.315
qpr3 | −0.094 | 0.070 | −0.232 | 0.044 | 0.337
qps1 | 0.101 | 0.070 | −0.037 | 0.239 | 0.377
qps2 | 0.091 | 0.070 | −0.047 | 0.229 | 0.376
qps3 | −0.012 | 0.070 | −0.150 | 0.126 | 0.349
qps4 | 0.096 | 0.070 | −0.042 | 0.234 | 0.389
qps5 | 0.080 | 0.070 | −0.058 | 0.218 | 0.311
qps6 | −0.053 | 0.070 | −0.191 | 0.085 | 0.498
qps7 | −0.146 | 0.071 | −0.284 | 0.077 | 0.244
Table 8. Itemfit statistics.

Item | Chi-Square | df | p-Value | Outfit MSQ | Infit MSQ | Discrim
qc1 | 749.747 | 818 | 0.957 | 0.915 | 0.949 | 0.329
qc2 | 758.971 | 818 | 0.930 | 0.927 | 0.959 | 0.307
qc3 | 757.136 | 818 | 0.937 | 0.924 | 0.959 | 0.311
qc4 | 764.834 | 818 | 0.908 | 0.934 | 0.966 | 0.293
qc5 | 755.312 | 818 | 0.942 | 0.922 | 0.953 | 0.324
qc6 | 732.404 | 818 | 0.985 | 0.894 | 0.929 | 0.373
qc7 | 714.317 | 818 | 0.910 | 0.816 | 0.904 | 0.222
qc8 | 736.420 | 818 | 0.981 | 0.899 | 0.931 | 0.370
qc9 | 754.210 | 818 | 0.946 | 0.921 | 0.956 | 0.315
qc10 | 745.272 | 818 | 0.967 | 0.910 | 0.942 | 0.346
qc11 | 739.298 | 818 | 0.977 | 0.903 | 0.938 | 0.356
qc12 | 739.997 | 818 | 0.976 | 0.904 | 0.936 | 0.357
qc13 | 752.421 | 818 | 0.951 | 0.919 | 0.950 | 0.326
qc14 | 754.849 | 818 | 0.944 | 0.922 | 0.957 | 0.316
qc15 | 743.189 | 818 | 0.971 | 0.907 | 0.943 | 0.344
qc16 | 759.696 | 818 | 0.928 | 0.928 | 0.957 | 0.313
qc17 | 736.512 | 818 | 0.902 | 1.143 | 1.096 | 0.212
qc18 | 748.185 | 818 | 0.961 | 0.914 | 0.945 | 0.340
qc19 | 749.744 | 818 | 0.957 | 0.915 | 0.949 | 0.329
qc20 | 749.951 | 818 | 0.957 | 0.916 | 0.949 | 0.333
qpr1 | 738.403 | 818 | 0.902 | 0.946 | 0.985 | 0.340
qpr2 | 745.610 | 818 | 0.901 | 0.955 | 0.997 | 0.315
qpr3 | 740.793 | 818 | 0.902 | 0.949 | 0.987 | 0.337
qps1 | 718.623 | 818 | 0.908 | 0.922 | 0.967 | 0.377
qps2 | 709.041 | 818 | 0.914 | 0.910 | 0.967 | 0.376
qps3 | 734.722 | 818 | 0.903 | 0.941 | 0.982 | 0.349
qps4 | 712.213 | 818 | 0.912 | 0.914 | 0.962 | 0.389
qps5 | 797.939 | 818 | 0.927 | 0.996 | 0.954 | 0.311
qps6 | 702.578 | 818 | 0.921 | 0.902 | 0.961 | 0.498
qps7 | 732.740 | 818 | 0.903 | 0.939 | 0.981 | 0.244
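The Rasch calibration and the fit indices in Tables 7 and 8 (chi-square, outfit and infit MSQ) match the output format of the eRm package (Mair & Hatzinger, 2007). A minimal sketch, again assuming a hypothetical data frame `responses` of scored items:

```r
# Minimal sketch: Rasch calibration and item fit with eRm.
library(eRm)

rasch_mod <- RM(responses)               # conditional ML Rasch model
pp        <- person.parameter(rasch_mod) # person ability estimates

itemfit(pp)            # chi-square, df, p-value, outfit/infit MSQ per item
plotPImap(rasch_mod)   # person-item map (cf. Figure 14)
```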
Table 9. Network loadings.

Item | Community 1 | Community 2 | Community 3
qc1 | 0.81 | - | -
qc2 | 0.64 | - | -
qc3 | 0.65 | - | -
qc4 | 0.61 | - | -
qc5 | 0.73 | - | -
qc6 | 0.97 | - | -
qc7 | 0.19 | - | -
qc8 | 0.94 | - | -
qc9 | 0.81 | - | -
qc10 | 0.93 | - | -
qc11 | 0.93 | - | -
qc12 | 0.62 | - | -
qc13 | 0.62 | - | -
qc14 | 0.89 | - | -
qc15 | 0.88 | - | -
qc16 | 0.72 | - | -
qc17 | −0.15 | - | -
qc18 | 0.77 | - | -
qc19 | 0.74 | - | -
qc20 | 0.78 | - | -
qpr1 | - | 0.77 | -
qpr2 | - | 0.75 | -
qpr3 | - | 0.72 | -
qps1 | - | - | 0.61
qps2 | - | - | 0.73
qps3 | - | - | 0.81
qps4 | - | - | 0.75
qps5 | - | - | 0.70
qps6 | - | - | 0.69
qps7 | 0.17 | - | -
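The community structure and network loadings above can be estimated with the EGAnet package (Hudson & Alexander, 2025). The sketch below uses EGAnet's defaults (GLASSO estimation with Walktrap community detection), which may differ from the authors' exact configuration; `responses` is again hypothetical:

```r
# Minimal sketch: dimensionality via exploratory graph analysis.
library(EGAnet)

ega <- EGA(responses, model = "glasso")  # community plot (cf. Figures 17 and 18)
ega$n.dim                                # number of dimensions detected
net.loads(ega)$std                       # standardized network loadings (cf. Table 9)
```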
Table 10. Factor loading estimates for the correlated and bi-factor models.

Factor | Item | Correlated Model: Factor Loading | Bi-Factor Model: General Factor Loading | Bi-Factor Model: Specific Factor Loading
QC | qc1 | 0.75 [0.681–0.819] | 0.53 [0.463–0.597] | 0.48 [0.414–0.545]
 | qc2 | 0.79 [0.731–0.849] | 0.64 [0.575–0.704] | 0.45 [0.388–0.518]
 | qc3 | 0.74 [0.662–0.818] | 0.69 [0.627–0.753] | 0.47 [0.400–0.542]
 | qc4 | 0.80 [0.741–0.859] | 0.71 [0.647–0.773] | 0.45 [0.380–0.513]
 | qc5 | 0.75 [0.668–0.832] | 0.70 [0.637–0.763] | 0.46 [0.384–0.529]
 | qc6 | 0.74 [0.679–0.801] | 0.78 [0.715–0.845] | 0.51 [0.449–0.568]
 | qc8 | 0.74 [0.679–0.801] | 0.69 [0.625–0.755] | 0.51 [0.446–0.568]
 | qc9 | 0.78 [0.719–0.841] | 0.77 [0.707–0.833] | 0.47 [0.402–0.531]
 | qc10 | 0.75 [0.679–0.821] | 0.76 [0.674–0.846] | 0.49 [0.428–0.556]
 | qc11 | 0.78 [0.719–0.841] | 0.75 [0.685–0.815] | 0.47 [0.405–0.532]
 | qc12 | 0.80 [0.733–0.867] | 0.83 [0.744–0.916] | 0.44 [0.376–0.512]
 | qc13 | 0.81 [0.751–0.869] | 0.72 [0.636–0.804] | 0.44 [0.371–0.504]
 | qc14 | 0.78 [0.721–0.839] | 0.66 [0.576–0.744] | 0.47 [0.408–0.535]
 | qc15 | 0.75 [0.687–0.813] | 0.58 [0.515–0.645] | 0.50 [0.442–0.563]
 | qc16 | 0.77 [0.705–0.835] | 0.76 [0.697–0.823] | 0.47 [0.402–0.532]
 | qc18 | 0.77 [0.709–0.831] | 0.65 [0.566–0.734] | 0.48 [0.414–0.540]
 | qc19 | 0.78 [0.707–0.853] | 0.52 [0.453–0.587] | 0.46 [0.394–0.529]
 | qc20 | 0.76 [0.699–0.821] | 0.45 [0.387–0.513] | 0.48 [0.422–0.547]
QPr | qpr1 | 0.68 [0.410–0.950] | 0.49 [0.290–0.490] | 0.56 [0.325–0.801]
 | qpr2 | 0.86 [0.737–0.983] | 0.60 [0.520–0.680] | 0.47 [0.201–0.536]
 | qpr3 | 0.79 [0.610–0.970] | 0.73 [0.650–0.810] | 0.45 [0.253–0.647]
QPs | qps1 | 0.80 [0.708–0.892] | 0.48 [0.262–0.498] | 0.44 [0.185–0.488]
 | qps2 | 0.81 [0.720–0.902] | 0.54 [0.420–0.660] | 0.41 [0.272–0.541]
 | qps3 | 0.69 [0.535–0.845] | 0.68 [0.557–0.803] | 0.49 [0.104–0.480]
 | qps4 | 0.74 [0.636–0.844] | 0.59 [0.492–0.688] | 0.48 [0.093–0.458]
 | qps5 | 0.69 [0.527–0.853] | 0.43 [0.307–0.553] | 0.71 [0.510–0.912]
 | qps6 | 0.78 [0.700–0.860] | 0.64 [0.577–0.703] | 0.58 [0.420–0.735]
Note: Values in square brackets are 95% confidence intervals.
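The two competing structures in Table 10 can be specified in lavaan (Rosseel, 2012). The sketch below is illustrative rather than the authors' exact code: `responses` is hypothetical, and the bi-factor model is obtained by adding a general factor and constraining all factors to be orthogonal.

```r
# Minimal sketch: correlated three-factor model vs. orthogonal bi-factor model.
library(lavaan)

correlated <- '
  QC  =~ qc1 + qc2 + qc3 + qc4 + qc5 + qc6 + qc8 + qc9 + qc10 + qc11 +
         qc12 + qc13 + qc14 + qc15 + qc16 + qc18 + qc19 + qc20
  QPr =~ qpr1 + qpr2 + qpr3
  QPs =~ qps1 + qps2 + qps3 + qps4 + qps5 + qps6
'

bifactor <- paste(correlated, '
  G =~ qc1 + qc2 + qc3 + qc4 + qc5 + qc6 + qc8 + qc9 + qc10 + qc11 +
       qc12 + qc13 + qc14 + qc15 + qc16 + qc18 + qc19 + qc20 +
       qpr1 + qpr2 + qpr3 + qps1 + qps2 + qps3 + qps4 + qps5 + qps6
')

# Dichotomous items: declare them ordered so lavaan uses a WLSMV-type estimator.
fit_corr <- cfa(correlated, data = responses,
                ordered = colnames(responses), std.lv = TRUE)
fit_bi   <- cfa(bifactor, data = responses,
                ordered = colnames(responses), std.lv = TRUE,
                orthogonal = TRUE)

standardizedSolution(fit_corr)   # loadings with 95% CIs (cf. Table 10)
```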
Table 11. Test administration schedule.

Session | Test Administered | Sampling Method | Test Time
Session 1 | Initial test (QLt) | Purposive | 45 min
Break (30 min) | - | - | -
Session 2 | Randomized (CTt or SAt) | Random | 45 min
Break (30 min) | - | - | -
Session 3 | Remaining test (CTt or SAt) | - | 45 min
Table 12. Relationship between the QLt, CTt, and SAt at the domain level.

Measure | N | Mean | Std. Dev. | QLt | CTt | SAt
QLt | 819 | 21.37 | 7.90 | 1.00 | |
CTt | 819 | 18.78 | 4.27 | 0.655 ** | 1.00 |
SAt | 819 | 11.63 | 5.08 | 0.321 ** | 0.346 ** | 1.00
** Correlation is significant at the 0.01 level (2-tailed).
Table 13. Relationship between the QLt and CTt at the facet level.

Facet | Mean | Std. Dev. | QC | QPr | QPs | SEQ | LOOPS | COND | FUNC
QC | 11.44 | 3.02 | 1.00 | | | | | |
QPr | 2.00 | 0.77 | 0.238 * | 1.00 | | | | |
QPs | 4.54 | 1.15 | 0.032 | 0.089 * | 1.00 | | | |
SEQ | 17.98 | 3.61 | 0.720 ** | 0.125 * | 0.166 * | 1.00 | | |
LOOPS | 2.91 | 0.91 | 0.441 ** | 0.228 * | 0.058 | 0.457 ** | 1.00 | |
COND | 5.47 | 1.39 | 0.407 ** | 0.199 * | 0.190 * | 0.437 ** | 0.289 ** | 1.00 |
FUNC | 7.53 | 2.44 | 0.548 ** | 0.138 * | 0.048 | 0.568 ** | 0.400 ** | 0.407 ** | 1.00
* Correlation is significant at the 0.05 level (2-tailed). ** Correlation is significant at the 0.01 level (2-tailed).
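The correlation matrices in Tables 12 and 13 are standard Pearson correlations with significance tests, obtainable via psych::corr.test(). A minimal sketch, where the data frame `scores` and its column names are hypothetical:

```r
# Minimal sketch: Pearson correlations with p-values.
library(psych)

corr.test(scores[, c("QLt", "CTt", "SAt")])            # domain level (Table 12)
corr.test(scores[, c("QC", "QPr", "QPs",
                     "SEQ", "LOOPS", "COND", "FUNC")]) # facet level (Table 13)
```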
Table 14. Model performance.

Model | Algorithm | R Package | Feature Set | Cross-Validation | R² | RMSE | MAE | MPE | χ²
GBR | Gradient Boosting Regression | gbm (version 2.2.2) | CTt total score + SAt total score | 10-fold | 0.407 | 2.678 | 2.154 | 1.625 | 49.611
SVR | Support Vector Regression | e1071 (version 1.7-16) | CTt total score + SAt total score | 10-fold | 0.357 | 2.789 | 2.253 | 3.206 | 52.918
KNN | k-Nearest Neighbors | caret (version 7.0-1) | CTt total score + SAt total score | 10-fold | 0.239 | 3.033 | 2.393 | −0.024 | 65.979
LR | Linear Regression | stats (version 4.2.1) | CTt total score + SAt total score | 10-fold | 0.363 | 2.776 | 2.290 | 2.445 | 52.850
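A model comparison of this kind can be orchestrated through caret (Kuhn, 2008), which wraps gbm, kernel-based SVMs, k-NN, and linear regression behind a common interface. The sketch below is illustrative: `scores` is hypothetical, and the caret method names shown are stand-ins that may differ from the authors' exact configurations.

```r
# Minimal sketch: 10-fold cross-validated prediction of QLt from CTt and SAt.
library(caret)

set.seed(1)
folds <- createFolds(scores$QLt, k = 10, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", number = 10, index = folds)

fits <- list(
  GBR = train(QLt ~ CTt + SAt, data = scores, method = "gbm",
              trControl = ctrl, verbose = FALSE),
  SVR = train(QLt ~ CTt + SAt, data = scores, method = "svmRadial",
              trControl = ctrl),
  KNN = train(QLt ~ CTt + SAt, data = scores, method = "knn",
              trControl = ctrl),
  LR  = train(QLt ~ CTt + SAt, data = scores, method = "lm",
              trControl = ctrl)
)

# Shared folds make the models directly comparable on RMSE, R-squared, and MAE.
summary(resamples(fits))
```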
