Article

Methods for Cognitive Diagnosis of Students’ Abilities Based on Keystroke Features

1 School of Computer Science, Central South University, Changsha 410000, China
2 Xinjiang Technical Institute of Physics and Chemistry, University of Chinese Academy of Sciences, Urumqi 830011, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(9), 4783; https://doi.org/10.3390/app15094783
Submission received: 3 March 2025 / Revised: 28 March 2025 / Accepted: 22 April 2025 / Published: 25 April 2025

Abstract

Keystroke data contain behavioral information about students during the programming process. Clustering analysis of keystroke data can classify students according to specific characteristics of their programming behavior, thereby providing a basis for personalized teaching. Research that incorporates keystroke features is still at an early stage. Because keystroke data are independent and discrete, and because traditional clustering algorithms offer no clear criterion for choosing the number of clusters, this choice tends to be arbitrary, and outliers degrade the clustering results. To address these problems, we improve the original method: keystroke data are used to obtain information about students’ programming behavior, and the traditional clustering algorithm is optimized according to the characteristics of keystroke data. The K-means++ algorithm is adopted to determine the initial clustering centers, the elbow method is used to determine the number of clusters, and an outlier-handling algorithm is introduced. We independently constructed a keystroke dataset from computer-based programming examinations and used it to verify our method. The improved algorithm shows gains on multiple evaluation indicators. The experiments demonstrate that the proposed method can classify students’ proficiency levels more accurately when evaluating programming abilities in the educational field. This provides strong support for the formulation of teaching strategies and the allocation of resources, and the method possesses important application value and practical significance.

1. Introduction

With the rapid development of information technology, the field of education is constantly experiencing digital transformation. In programming education, traditional evaluation methods mainly rely on students’ examination scores and the completion of their assignments. Although this approach can, to a certain extent, reflect students’ learning achievements, it fails to comprehensively assess students’ cognitive processes and the proficiency levels of their abilities during the learning process [1]. Cognitive diagnosis relies on the performance of individuals in specific tasks to infer their cognitive abilities, and these performances are the outcomes of cognitive processes. Cognitive diagnosis can not only assess the cognitive abilities of individuals but also offer important insights for studying cognitive processes. By comparing the performances of different individuals in tasks, the disparities and characteristics of cognitive processes can be uncovered [2]. Previous studies have indicated that behavioral data generated during the learning process can reflect how the results of cognitive processes influence the final performance [3].
Keystroke data represent a type of behavioral data generated by students during the programming process. They encapsulate rich information and can mirror the distinct behavioral patterns of individuals. Similar to other biometric characteristics, keystroke data exhibit a certain degree of uniqueness and stability. The spatio-temporal characteristics of keystroke data offer a foundation for behavioral pattern analysis. By analyzing keystroke data, we can gain insights into students’ programming speed, code-editing behavior, and other characteristics. This enables us to determine whether students encounter problems during the learning process [4]. At the learning strategy level, programming learning involves both shallow and deep learning strategies, which are closely associated with students’ programming abilities, self-efficacy, and perseverance. Shallow learning strategies are more likely to be adopted in the initial stages of programming learning. They have a positive impact on perseverance and self-efficacy in computer programming. In contrast, deep learning strategies are more commonly employed in more advanced programming courses. The characteristics derived from keystroke data analysis reflect, to some extent, the learning strategies students adopt in programming learning. These characteristics are also related to students’ cognitive abilities [5]. Therefore, the in-depth analysis of keystroke data in programming can offer a novel perspective for researching learning behaviors in the field of programming education. It also provides robust support for gaining a deeper understanding of students’ programming abilities and cognitive processes.
Keystroke data have demonstrated unique application value in multiple fields. In the field of identity recognition, by analyzing features such as the time intervals between users’ keystrokes and the force applied when they press keys, we can achieve the precise identification of users’ identities, with an accuracy rate of over 90% [6,7]. In the field of psychological state analysis, research has found that users’ emotions can be recognized through keystrokes, and the accuracy rate can exceed 75% [8]. In the field of education, as the most direct and detailed behavioral records during students’ programming process, keystroke data can, to a certain extent, reflect aspects like the speed, state, fluency, and accuracy of students during the programming process. Through the in-depth exploration and analysis of multi-dimensional keystroke features, students can be divided into different categories based on keystroke data, which can offer a unique perspective on understanding students’ programming learning statuses and abilities [9,10].
Clustering algorithms, as significant tools in the field of data mining, can group similar data objects, thereby revealing the latent patterns or rules within the data [11]. When applied to the analysis of students’ programming keystroke data, clustering algorithms are expected to facilitate the accurate classification of students’ programming learning patterns, providing substantial data support and a theoretical foundation for personalized programming education.
Cognitive diagnosis, as an important research direction in the field of educational measurement, conducts an in-depth analysis of students’ mastery of knowledge and skills in the learning process, the cognitive strategies they employ, and their existing cognitive deficiencies [12]. Through cognitive diagnosis, teachers can obtain detailed information about students’ knowledge structures and cognitive abilities, thereby laying a solid foundation for personalized teaching. In programming education, cognitive diagnosis can help teachers better understand students’ programming proficiency. By first understanding students’ programming learning situations and then intervening, it can identify the problems and difficulties that students encounter in programming learning, thereby providing teachers with targeted teaching suggestions.
In this paper, a dataset of examinees’ keystroke data in programming examinations is constructed. Regarding the analysis method, the proposed analysis method breaks through the existing limitations of keystroke data analysis in the field of programming education. It thoroughly considers the dynamics and sequential nature of keystroke data and effectively captures the interrelationships between these features. The students’ programming behavior data were combined with clustering methods for in-depth analysis. Improvements were made in areas such as the tendency of the traditional K-means algorithm to be trapped in local optima and the handling of outliers. The proposed analysis method is verified in the field of cognitive diagnosis in programming education.

2. Related Works

2.1. Research Status

Some scholars have delved into the application of keystroke data and cluster analysis in the field of programming education. Friday et al. [13] introduced the first generation of TrackIt-Net, an interactive code visualization platform designed using student keystroke data. This platform aims to help computer science teachers gain a better understanding of students’ programming processes, offer more timely support, and enhance students’ learning outcomes. By integrating machine learning algorithms, Nakada et al. [14] explored the relationship between students’ typing rhythms and programming abilities in programming courses through typing games. It has been found that the ability to learn and comprehend programming is correlated with typing proficiency, and the machine learning model can effectively differentiate students’ levels of understanding of programs. When it comes to exploring learning patterns, in their 2020 study, Kamal et al. [15] employed three clustering algorithms, namely the K-means, Fuzzy C-Means (FCM), and Kernel Fuzzy C-Means (KFCM) algorithms, to analyze students’ learning behaviors and performances. The study revealed that these three algorithms possess distinct characteristics in predicting students’ learning behaviors and performances. Specifically, the KFCM algorithm demonstrated excellent performance in terms of accuracy, with the highest accuracy rate reaching 90.22%. In contrast, regarding time and memory usage, the K-means algorithm yielded better results. This indicates that data analysis methods based on clustering algorithms can effectively uncover students’ learning patterns, yet different clustering algorithms vary in aspects such as accuracy, time consumption, and memory usage.
Current research indicates that keystroke data can, to some extent, mirror the state of the programming process. Programming characteristics can be extracted from keystroke data, and this extraction can contribute to enhancing programming instruction. Moreover, existing studies have demonstrated the feasibility and accuracy of clustering algorithms—particularly the K-means algorithm—in analyzing student behavioral patterns. Nevertheless, research on programming patterns grounded in keystroke data remains scarce. Additionally, the application of clustering analysis techniques to keystroke data is still in its infancy. This paper integrates keystroke data with clustering analysis methods and puts forward a classification approach for programming patterns based on keystroke features.

2.2. Research Method

The remainder of this section elaborates on the keystroke features involved in this experiment. Subsequently, it introduces the clustering algorithm employed to analyze the feature data. Finally, it enumerates the evaluation indicators utilized to assess the clustering effect following data clustering analysis.

2.2.1. Keystroke Features

As a microscopic manifestation of students’ programming behaviors, keystroke features can provide a wealth of information to gain a deep understanding of students’ programming learning processes. Here are the keystroke features chosen for the experiment.
1.
Average Keystroke Speed
It is calculated as the number of keystrokes per unit time during the continuous keystroke process. Specifically, this value is obtained by dividing the total number of keystrokes by the total duration of continuous keystrokes in programming, and the unit is in keystrokes per minute. In this experiment, the threshold for continuous keystrokes was set to 500 milliseconds. That is to say, the keystroke behavior in which the time interval between keystrokes did not exceed 500 milliseconds was considered a continuous keystroke process while the keystroke process with the time interval exceeding the threshold was regarded as a non-continuous keystroke process. The number of keystrokes and the duration in the non-continuous keystroke process would be excluded from the calculation of the keystroke speed. The calculation formula was as follows:
S = \frac{\sum_{i=1}^{b} N_i}{\sum_{i=1}^{b} T_i}
In this formula, b represents the number of continuous keystroke processes in a programming problem, N_i represents the number of keystrokes in each continuous keystroke process, and T_i represents the duration of each continuous keystroke process. The keystroke speed can intuitively reflect students’ proficiency in programming tools. Skilled programmers can usually input code quickly and accurately, thus having a relatively high keystroke speed.
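A minimal sketch of this calculation (not the authors’ code) is given below. It assumes the input is the chronologically ordered list of key-down timestamps, in milliseconds, for one question, and it treats isolated keystrokes as non-continuous, excluding them from both the key count and the duration.

def average_keystroke_speed(down_times_ms, gap_threshold_ms=500):
    """Keystrokes per minute over continuous-typing bursts (gap <= 500 ms)."""
    # Split the key-down timestamps into bursts wherever the gap exceeds the threshold.
    bursts, current = [], list(down_times_ms[:1])
    for prev, curr in zip(down_times_ms, down_times_ms[1:]):
        if curr - prev <= gap_threshold_ms:
            current.append(curr)
        else:
            bursts.append(current)
            current = [curr]
    if current:
        bursts.append(current)
    # Only bursts with at least two keystrokes are counted as continuous processes (assumption).
    total_keys = sum(len(b) for b in bursts if len(b) >= 2)
    total_ms = sum(b[-1] - b[0] for b in bursts if len(b) >= 2)
    return 60000.0 * total_keys / total_ms if total_ms > 0 else 0.0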
2.
Modification Frequency
This feature is the proportion of modification operations, such as deletion, backspace, and the copy, paste, and undo operations performed via shortcut keys, among the total number of keystrokes. The calculation formula is as follows:
F = \frac{\sum_{i=1}^{h} C_i}{L}
In this formula, h represents the number of modification action types, C_i represents the number of occurrences of each modification action, and L represents the total number of keystrokes used for calculating this proportion. The modification frequency can reflect students’ thinking and error-correction activities during programming. An appropriate modification frequency shows that students are continuously optimizing the code, whereas an overly high modification frequency might suggest that students lack clear programming concepts or an adequate command of programming knowledge. The following are common modification operations during programming (a minimal calculation sketch follows the list):
  • Delete;
  • Backspace;
  • Copy shortcut key;
  • Paste shortcut key;
  • Undo shortcut key;
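A minimal calculation sketch for this feature, under the assumption that each keystroke event is recorded as a (key name, Ctrl-held) pair in chronological order; the key names and shortcut set below mirror the list above but are otherwise illustrative.

MOD_KEYS = {"Delete", "Backspace"}
MOD_SHORTCUTS = {"c", "v", "z"}  # copy, paste, undo when Ctrl is held

def modification_frequency(events):
    """Proportion of modification operations among all keystrokes."""
    total = len(events)
    if total == 0:
        return 0.0
    mods = sum(1 for key, ctrl_held in events
               if key in MOD_KEYS or (ctrl_held and key.lower() in MOD_SHORTCUTS))
    return mods / total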
3.
Average Typing Time of Keywords
For the average typing time of keywords, in this experiment, certain keywords from different programming languages, such as “return”, “for”, and “private”, were carefully selected. The average typing time of these keywords was determined by dividing the total typing duration of these keywords by the number of keywords, and the unit was in milliseconds. The calculation formula was as follows:
T = \frac{\sum_{i=1}^{k} D_i}{k}
In this formula, k represents the number of keyword occurrences in a programming problem and D_i represents the time taken to type the i-th keyword occurrence. This feature can reflect students’ familiarity with programming syntax. Students who are familiar with programming syntax can input keywords quickly and accurately, thus shortening the programming time. Here are some common keywords for some common programming languages in online exams (a small extraction sketch follows the list):
  • Python 3.12: def, class, if/else/elif, for/while, try/except/finally, import, return;
  • Java 8: class, public/private/protected, static, void, if/else, switch, for/while/do, try/catch/finally;
  • C(C17)/C++(C++17): int/float/char/double/void/short/long/signed/unsigned, if/else, switch/case, for/while/do/break/continue, return, struct, typedef.
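A sketch of how the keyword timings might be extracted from the key-down stream is shown below; the event format (key name, timestamp in milliseconds) and the keyword list are illustrative assumptions, and every matched occurrence is timed from its first to its last character.

KEYWORDS = ["return", "for", "while", "if", "class", "def", "include"]  # illustrative subset

def average_keyword_typing_time(down_events, keywords=KEYWORDS):
    """Average time (ms) taken to type the selected keywords."""
    keys = [k.lower() for k, _ in down_events]
    times = [t for _, t in down_events]
    durations = []
    for kw in keywords:
        chars = list(kw)
        for i in range(len(keys) - len(chars) + 1):
            if keys[i:i + len(chars)] == chars:
                # time from the first to the last character of the keyword
                durations.append(times[i + len(chars) - 1] - times[i])
    return sum(durations) / len(durations) if durations else 0.0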
4.
Time Spent on Each Question
It is determined by the time interval between pressing the first key and releasing the last key of each question, and the unit is in seconds. The time spent on each question can reflect students’ efficiency in solving programming problems. In general, the shorter the time spent, the stronger the student’s mastery and application of the relevant knowledge.
In addition to the previously mentioned keystroke features, the following data generated during the examination process were selected for clustering analysis.
5.
Average Number of Submissions per Question
The number of submissions of each question by students was counted. This feature could, to a certain extent, reflect the accuracy rate of students’ code submissions. The more submissions there were, the more difficulties were encountered during the problem-solving process, and the lower the accuracy rate of the results given by the code was.
6.
Final Examination Score
The final examination score was the actual one obtained by students in the programming examination, which served as a comprehensive indicator to measure students’ programming learning achievements.
These features comprehensively reflected students’ programming behaviors and learning processes from multiple dimensions, providing rich data support for the subsequent clustering analysis.

2.2.2. Clustering Algorithm

Clustering algorithms are a category of unsupervised learning algorithms [16], which seek to partition the samples within a given dataset into distinct clusters so that the samples belonging to the same cluster have a great deal of similarity whereas those across different clusters have marked differences. Based on their principles and characteristics, clustering algorithms can be roughly classified into several types, such as partitioning clustering algorithms, hierarchical clustering algorithms, density-based clustering algorithms, grid-based clustering algorithms, and model-based clustering algorithms.
In this experiment, the various data were relatively independent and exhibited a certain level of dispersion. Considering the convergence speed and interpretability of the algorithm, the K-means algorithm was selected for clustering analysis in this study. The K-means algorithm is a partition-based clustering algorithm. The underlying principle of the K-means algorithm is to partition the samples in the dataset into K clusters. The goal is to minimize the sum of the distances from each sample within a cluster to the center of that cluster. Euclidean distance is commonly used as the distance metric in calculating K-means, and its calculation formula is as follows:
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
Here, x = (x_1, x_2, \ldots, x_n) and y = (y_1, y_2, \ldots, y_n) are two samples, and n is the dimension of the samples. The mathematical model of the K-means algorithm can be expressed as follows:
\min_{C_1, C_2, \ldots, C_k} \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
Here, k is the number of clusters, C_i represents the i-th cluster, x is a data point in cluster C_i, \mu_i is the centroid of cluster C_i, and \| x - \mu_i \| represents the distance from data point x to centroid \mu_i. Through the minimization of this objective function, the K-means algorithm strives to identify an optimal clustering approach that makes the data points within each cluster as closely grouped around its centroid as possible, thereby achieving the purpose of clustering.
In the context of this study, the dimension of the samples corresponded to the number of extracted keystroke features, including the keystroke speed and modification frequency. Subsequently, each sample was assigned to the cluster with the nearest clustering center. Following this assignment, the center of each cluster was recalculated, which represented the mean of all samples within that cluster. This process was carried out iteratively until the clustering centers no longer changed or the change was negligible, or the preset maximum number of iterations was reached.
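The iteration just described can be written compactly in NumPy; the following is an illustrative sketch of standard Lloyd-style K-means rather than the exact implementation used in this study.

import numpy as np

def kmeans(X, centroids, max_iter=300, tol=1e-4):
    """X: (n_samples, n_features) keystroke-feature matrix; centroids: (k, n_features)."""
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)              # assign each sample to its nearest center
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))      # recompute each center as the cluster mean
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:   # centers barely moved: stop
            return labels, new_centroids
        centroids = new_centroids
    return labels, centroids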

2.2.3. Evaluation Method

The evaluation of the clustering effect is an important step. It helps in assessing the performance of clustering algorithms and plays a crucial role in analyzing the reliability and validity of the clustering results. According to the characteristics of the experimental dataset, internal evaluation indicators were chosen for evaluating the clustering effect in this study; they do not rely on external ground-truth label information. Such indicators usually evaluate the clustering effect in terms of two aspects: the compactness within clusters and the separation between clusters.
1.
Silhouette Score
The Silhouette Score is a commonly used internal evaluation indicator that comprehensively considers the compactness of a sample within the cluster and its separation from other clusters. The calculation formula of the Silhouette Score is given below:
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}
Here, i represents a sample, which, in this article, is a candidate’s keystroke data vector. a(i) represents the average distance from sample i to the other samples in the same cluster, quantifying the cohesion within the cluster; the smaller a(i) is, the closer together the samples within the cluster are. b(i) represents the average distance from sample i to all samples in the nearest other cluster, quantifying the separation between clusters; the larger b(i) is, the better the separation between clusters is.
The value of the Silhouette Score ranges from −1 to 1. The closer the value is to 1, the better the clustering effect is, that is, the samples within the cluster are closely clustered and there is a clear separation between clusters. The closer the value is to −1, the more likely the sample is misclassified. When the value is close to 0, it means that the sample is at the boundary of two clusters and the clustering effect is poor. The Silhouette Score can consider both the compactness within clusters and the separation between clusters, comprehensively evaluate the clustering effect, and have good applicability to different clustering algorithms and datasets.
2.
Calinski–Harabasz Index (CH index)
The CH index is the ratio of the between-cluster dispersion to the within-cluster dispersion in clustering. The formula for calculating the CH index is as follows:
CH = \frac{\mathrm{Tr}(B_k)}{\mathrm{Tr}(W_k)} \times \frac{n - k}{k - 1}
Here, \mathrm{Tr}(B_k) is the trace of the between-cluster dispersion matrix, which reflects the degree of dispersion between clusters, and \mathrm{Tr}(W_k) is the trace of the within-cluster dispersion matrix, which reflects the degree of dispersion within clusters. n represents the total number of samples, and k is the number of clusters. The larger the CH index is, the greater the dispersion between clusters and the smaller the dispersion within clusters will be, that is, the better the clustering effect will be. The CH index can efficiently evaluate the clustering effect and exhibits a certain level of adaptability to different types of datasets.
3.
Davies–Bouldin Index (DB index)
The DB index is an evaluation indicator used to measure the similarity between clusters, and it assesses the clustering effect by calculating the ratio of the intra-cluster average distance to the inter-cluster distance. Its calculation formula is as follows:
DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}}
Here, s_i represents the average distance from the samples in the i-th cluster to its center, that is, the average distance from samples to the cluster center. The smaller the value of s_i is, the more tightly the samples within the cluster are grouped. d_{ij} is the distance between the i-th and j-th clusters, usually computed as the distance between the cluster centers. The larger the value of d_{ij} is, the higher the degree of separation between clusters will be. The smaller the value of the DB index is, the better the clustering effect will be. In other words, the samples within each cluster are tightly grouped together while the samples in different clusters are relatively distant from each other.
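All three internal indicators are available in scikit-learn, so the evaluation step can be sketched as follows for a feature matrix X and the cluster labels produced by the algorithm (a usage sketch, not the authors’ code).

from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

def evaluate_clustering(X, labels):
    return {
        "silhouette": silhouette_score(X, labels),                # higher is better, in [-1, 1]
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }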

3. Materials and Methods

3.1. Experimental Method

As shown in Figure 1, in this experiment, the keystroke data were first collected by the data collection module. A multi-dimensional data collection and analysis method was employed in this study for comprehensively gathering and examining relevant information. During the data collection phase, relying on the self-developed on-computer programming examination system, the keystroke data of students during multiple programming exams were collected in real time and accurately.
In the data analysis stage, preprocessing operations such as cleaning, denoising, and standardization were performed on the original keystroke data. Abnormal data points and erroneous data records were effectively removed, and the data range was unified. The analysis and extraction of keystroke features were accomplished by means of the keystroke data processing algorithm. After obtaining the keystroke features of all the experiment participants, clustering analysis was conducted in combination with the examination data to obtain the experimental results.
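A minimal preprocessing sketch matching these steps is shown below, assuming the per-examinee features have already been assembled into a pandas DataFrame; the column names are hypothetical, and z-score standardization is used so that speeds, times, and proportions share a comparable scale.

import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["speed", "mod_freq", "keyword_ms", "question_s", "submissions", "score"]

def preprocess(df: pd.DataFrame):
    df = df.dropna(subset=FEATURES)                         # cleaning: drop incomplete records
    df = df[(df["speed"] > 0) & (df["question_s"] > 0)]     # drop obviously erroneous records
    X = StandardScaler().fit_transform(df[FEATURES].to_numpy())  # unify the data range
    return df, X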

3.2. Improvement of the K-Means Algorithm

3.2.1. Selection of Initial Clustering Centers

The examinees’ keystroke data during the programming examination exhibited a significant amount of randomness. In the traditional K-means algorithm, the random selection of initial clustering centers may cause the algorithm to fall into a local optimum [17], and different selections of initial clustering centers may lead to significantly different clustering results. To address this issue, the K-means++ algorithm was employed in this experiment to select the initial clustering centers. K-means++ chooses each new clustering center preferentially from data points that are far from the already selected centers (with probability proportional to the squared distance to the nearest chosen center). In this way, the algorithm can effectively prevent the initial clustering centers from being overly clustered and improve the convergence speed and stability of the algorithm.
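The seeding idea can be sketched as follows (D-squared sampling); in practice the same behavior is available directly through scikit-learn’s KMeans(init="k-means++"), and this sketch is for illustration only.

import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Pick k initial centers: each new center is drawn with probability
    proportional to its squared distance to the nearest center chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]           # first center chosen uniformly at random
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2), axis=1)
        probs = d2 / d2.sum()                     # D^2 weighting
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)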

3.2.2. Selection of the Clustering Number k

To ensure the universality of the data, the participants were selected without any screening based on grade. This made the examinees participating in the experiment diverse, and there was no way to determine the clustering number k for the examinees based on experience. In clustering algorithms, the K-means algorithm necessitates that the clustering number k be specified beforehand, and the choice of this value k often lacks an objective foundation. If k is not properly selected, it will seriously affect the clustering effect [18].
In this experiment, the elbow method was introduced to determine the clustering number. By calculating the Sum of Squared Errors (SSE) under different clustering numbers, a curve showing the relationship between SSE and the clustering number was plotted, and the clustering number corresponding to the inflection point of the curve was selected as the optimal clustering number. The calculation formula of SSE is as follows:
SSE = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
Here, k is the number of clusters, C_i is the i-th cluster, \mu_i is the center of the i-th cluster, and x is a sample in the i-th cluster.
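A sketch of the elbow computation with scikit-learn, where the per-fit SSE is exposed as inertia_; the range of k and the random seed are illustrative choices.

from sklearn.cluster import KMeans

def sse_curve(X, k_range=range(2, 9)):
    """Return {k: SSE} so the SSE-vs-k curve can be plotted and the elbow located."""
    return {k: KMeans(n_clusters=k, init="k-means++", n_init=10,
                      random_state=0).fit(X).inertia_
            for k in k_range}

In this experiment, the resulting curve shows its inflection point at k = 4 (see Figure 6).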

3.2.3. Treatment of Outliers

There were students at the top and bottom of the ranking in the examinee group. For example, there were top students who had received professional programming training in the school team and bottom students with extremely poor programming abilities. Considering this situation, the keystroke feature data had rather extreme values, and there were certain outliers in the clustering. The K-means algorithm measures similarity using the squared Euclidean distance; therefore, it is very sensitive to data noise and outliers.
Outliers, lying far from most data points, cause significant deviation from the centroid position during the calculation of cluster centroids. Even a small number of outliers can exert a substantial influence on the center of the cluster they belong to. Moreover, outliers may be misassigned to a cluster or form a separate “pseudo-cluster”. This disrupts the structure of the original data distribution and impairs the compactness of other clusters. Outliers necessitate additional iteration steps to stabilize the centroid position. The repeated adjustment of centroids prolongs the convergence time of an algorithm and may even prevent it from reaching a stable state. As a result, outliers reduce the accuracy of clustering. They may also increase the number of algorithm iterations, thereby decreasing algorithm efficiency.
In this experiment, the Local Outlier Factor (LOF) algorithm was introduced during the preprocessing stage of experimental data to identify outliers. The LOF algorithm determines whether a data point is an outlier by calculating the ratio of the local density of each data point to that of its neighboring data points. In other words, it assesses whether the entire observation vector of a candidate represents abnormal keystroke data. If a data point is identified as an outlier, it is corrected and reinserted into the dataset.
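A sketch of this step with scikit-learn’s LocalOutlierFactor is given below. The paper does not specify the exact correction rule, so replacing the flagged rows with the inlier column medians is an assumption made purely for illustration.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def correct_outliers(X, n_neighbors=20):
    """Flag outliers by local density ratio and replace them with inlier medians."""
    flags = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X)  # -1 = outlier, 1 = inlier
    X_fixed = X.copy()
    medians = np.median(X[flags == 1], axis=0)
    X_fixed[flags == -1] = medians               # hypothetical correction: inlier column medians
    return X_fixed, flags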
The flowchart of the improved K-means clustering algorithm is shown in Figure 2.

4. Results

4.1. Experimental Data

The experimental data were collected in the context of a final computer programming examination at a renowned comprehensive university. During the examination, the system script recorded, in real time, detailed information such as timestamps, key values, and the key status (pressed, released) of each keystroke made by the students. As depicted in Table 1, the ID indexed the keyboard and mouse operations in chronological sequence. The “key” represented the key value, the “time” indicated the timestamp when the current operation took place, the “type” specified whether the current key operation was a press or a release, and the “row” and “column” denoted the row number and column number, respectively, where the current key action occurred. Furthermore, the system recorded data such as the time when students started answering each question, the submission time, and the answering results, ensuring the comprehensiveness and reliability of the data sources.
In this study, a total of 70,742 keystroke data points from computer programming exams were collected, involving 932 students. All the participants were first-year students majoring in computer-related disciplines, and they had received a one-semester programming course before the computer-based examination. The computer-based examination in the experiment consisted of six questions in total, covering different difficulty levels and question contents. Considering the differences in students’ states at different stages of the examination and the differences in the difficulty of the questions, the feature data of each examinee were all average values of each feature in the six questions, so as to reduce errors.
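Since each feature was computed per question and then averaged over the six questions, the aggregation can be sketched with pandas as follows; the column names are hypothetical.

import pandas as pd

FEATURE_COLS = ["speed", "mod_freq", "keyword_ms", "question_s", "submissions", "score"]

def per_student_features(per_question: pd.DataFrame) -> pd.DataFrame:
    """Average the per-question feature rows of each examinee into one feature vector."""
    return per_question.groupby("student_id")[FEATURE_COLS].mean()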

4.2. Experimental Results

The detailed results of the clustering analysis are presented in Table 2.

5. Discussion

5.1. Clustering Result Analysis

To further interpret the clustering results, the features in Table 2 were matched with the programming behaviors during the exam as follows, so as to better reflect the characteristics of the programming behavior process.
1.
Keystroke Speed
Table 2 directly presents the average keystroke speed value. By comparing this value among different types of candidates, it more intuitively reveals the difference in operation speed among different types of programming behaviors.
2.
Modification Operation
The “average proportion of modification actions” in Table 2 was converted into the “modification frequency”. A higher proportion of modification actions implied a higher modification frequency. Meanwhile, the table content was replaced with the relative modification frequencies of the four types of candidates, making the expression more consistent with the context of programming behavior analysis and highlighting the operation frequency.
3.
Keyword Relevance
The “Average time to type a keyword” was transformed into “keyword proficiency”. A shorter average keyword typing time indicated a higher level of keyword proficiency. The time dimension was thus converted into a measure of the degree of mastery of programming knowledge. At the same time, the table content was replaced with the relative levels of keyword proficiency of the four types of candidates, which could better reflect the candidates’ familiarity with key programming elements.
4.
Question Relevance
Analysis indicators for results related to the questions, which comprehensively reflected programmers’ behaviors and outcomes during the problem-solving process, included the following two:
  • Score: This incorporated the “Average score for each question” in Table 2 into the relative high–low scores of the four types of candidates.
  • Number of Attempts: This converted the number of submissions for each question into the “Number of Attempts.” The more submissions there were, the more attempts had been made on a question. Then, we transformed the table content into the relative high–low numbers of attempts of the four types of candidates.
After matching the clustering analysis results with the programming behaviors in the examination, the analysis of the programming behaviors of different types of examinees is presented in Table 3 and Figure 3. Using the time spent typing keywords and the time spent on the questions as the coordinate axes, the data points of each student were plotted and color-coded according to the category they belonged to. As shown in Figure 4, students of different categories exhibited distinct clustered distributions: the data points of students within the same category were relatively concentrated, and there was a noticeable distance between the data points of students of different categories. This further validated the effectiveness of the clustering results.
Students in Category 1 accounted for the largest proportion among all the categories. They had a relatively slow average typing speed, but the proportion of their modification actions was the lowest. The average time they spent typing keywords was the shortest, and the time they spent answering questions was relatively short. Moreover, their average scores were the highest. In the radar chart in Figure 3, the area of the polygon was quite large, and the values in each feature dimension were relatively balanced. This indicates that students of this type had a proficient mastery of programming syntax and knowledge, possessed clear programming ideas, and could complete programming tasks efficiently and accurately. Although they had no advantage in typing speed, their overall programming ability was relatively strong. This type of student was named “Steady and Proficient Student”.
Students in Category 2 accounted for a relatively large proportion compared to some other categories. They had a relatively slow average typing speed, and the proportion of their modification actions was moderate. The average time they spent typing keywords was relatively short. However, they spent an extremely long time answering questions, and their average scores were relatively low. This indicates that students of this type have a reasonable level of familiarity with programming syntax. However, when solving complex programming problems, they may lack effective problem-solving strategies and time management abilities, thus spending a substantial amount of time but struggling to achieve satisfactory outcomes. This type of student was named “Syntax-Familiar but Problem-Struggling Student”.
Students in Category 3 accounted for a relatively small proportion among all the categorized students. Their average typing speed was moderately paced, but the proportion of their code modification actions was relatively high. The average time they spent typing keywords was relatively long, and the time they spent answering questions was considerably long. Additionally, their average scores were low. This indicates that students of this type may have had a certain degree of understanding of programming knowledge, yet their grasp of it was rather weak, and their programming concepts lacked clarity. As a result, they frequently modified the code during the programming process, and they were not proficient in using keywords. They spent a substantial amount of time solving problems but achieved unsatisfactory results. This type of student was named “Foundation-Lingering Student”.
Students in Category 4 accounted for the smallest proportion. They had the fastest typing speed, but the average time they spent typing keywords was extremely long. The time they spent answering questions was relatively short, and their average scores were relatively high. This may imply that students of this type were quite proficient in programming operations, but they encountered significant difficulties in understanding and applying programming syntax and keywords. However, they could achieve good results due to their relatively fast operation speed and certain problem-solving abilities. This type of student was named “Swift-Typing but Shallow Student”.
The proportions of students of each type in this examination are shown in Figure 5. From the clustering results, we can clearly see that there were significant differences in students’ abilities in programming learning and styles among different categories, and that the distribution of students’ abilities in programming learning was uneven. Students of the “Steady and Proficient” type accounted for the largest proportion, which indicates that in the experimental programming education setting, most students can attain an advanced programming proficiency through learning.
To demonstrate the effectiveness of this clustering analysis, several other common clustering algorithms have been selected in this paper to compare their clustering effects with those of the clustering algorithm adopted in this experiment. As shown in Table 4, the clustering algorithm employed in this experiment significantly outperformed the traditional K-means algorithm and other types of clustering algorithms across all evaluation indicators, which proved the validity of this clustering analysis experiment.

5.2. Impact of Different Improvements on the Experimental Results

In multiple experiments, when the traditional method of randomly selecting the initial clustering centers was adopted, different initial values resulted in substantial variations in the clustering results. In one of the experiments, the proportion of “Foundation-Lingering Student” changed from the original 7.09% to 10.23%, and the average keystroke speed also changed from 183.538 keystrokes per minute to 165.421 keystrokes per minute. The values of other features also changed significantly. As shown in Table 5, the comparison of the clustering effect evaluation indicators indicates that using the K-means++ algorithm to select the initial clustering centers significantly enhances the clustering effect.
For this experiment, the curve showing the relationship between SSE and the clustering number K is illustrated in Figure 6. When the value of K was 4, a distinct inflection point appeared on the curve. As shown in Table 6, by comparing various evaluation indicators under different values of K, it was shown that the clustering number obtained through the elbow method had notable advantages in terms of interpretability and the clustering effect.
As shown in Table 7, the presence of outliers disrupted the calculation of the clustering centers, thereby exerting a significant influence on the clustering process. The LOF algorithm was introduced to correct outliers and avoid this influence. Although the CH index decreased slightly after the LOF algorithm was added, the other evaluation indicators show that adding the LOF algorithm clearly improved the clustering result overall, which enhanced the reliability and accuracy of the clustering.

6. Conclusions

This paper proposes a cognitive diagnosis method for students’ programming abilities based on the analysis of keystroke features, aiming to address the common issues in programming education at colleges and universities. An experiment was designed to prove the effectiveness of applying the clustering algorithm to keystroke feature data and further demonstrate the feasibility of our proposed method in the field of programming education. Cognitive diagnosis constituted one of the pivotal core objectives of this study. In this section, we will conduct an in-depth analysis of different types of students based on the clustering results, describe more precisely the cognitive characteristics and ability levels of students during the programming learning process, explore the guidelines for intervention after cognitive diagnosis, and provide a strong basis for personalized teaching.

6.1. Cognitive Diagnosis Based on Clustering Results

Students of the “Steady and Proficient” type performed quite consistently in all aspects, which indicates that they had already solidly established a good cognitive structure in programming. They skillfully applied programming knowledge to solve problems and possessed strong programming thinking skills and practical abilities. For such students, more challenging learning tasks can be provided, such as participating in open-source projects and carrying out algorithm optimization, to further enhance their programming abilities and innovative thinking abilities.
Students of the “Syntax-Familiar but Problem-Struggling” type solved problems slowly and gained poor grades, indicating that they needed to improve their programming thinking and problem-solving abilities. When facing programming problems, these students lacked effective analysis methods and problem-solving strategies, and they spent a large amount of time but could not obtain the correct results. We should strengthen the thinking training for such students, guiding them to analyze problems, summarize problem-solving methods, and enhance their programming thinking and problem-solving skills.
Students of the “Foundation-Lingering” type were greatly deficient in mastering basic programming knowledge. The relatively high modification frequency and long keyword-typing time in the clustering results indicated that these students did not have an in-depth understanding of programming concepts and syntax. They needed to keep making attempts and adjustments during the programming process, which fully reflected their deficiencies in the systematic nature of knowledge and its coherence. The educational guidance for such students should be aimed at laying a solid foundation in basic knowledge, including detailed explanations and relevant exercises.
Students of the “Swift-Typing but Shallow” type excelled in keyboard operations but spent a long time typing keywords, which showed that there was a disconnection between their understanding and application of programming knowledge. When programming, they lacked a thorough understanding of the programming concepts and logic represented by the keywords, resulting in them spending much time using the keywords. The educational guidance for such students should focus on providing in-depth explanations of knowledge and guiding them to understand the internal logic and application scenarios of programming knowledge.

6.2. Exploration of the Laws of Programming Education Based on Clustering Results

6.2.1. Basic Abilities Determine the Upper Limit

A solid foundation in programming syntax and a complete knowledge system hold a central position in programming learning. This can be seen from the outstanding performance of students of the “Steady and Proficient” type. The proficient mastery of programming syntax and knowledge by these students provided a good foundation for them to solve high-level programming problems. Students of the “Swift-Typing but Shallow” type were in sharp contrast. Although these students were relatively adept at code operations, they had significant deficiencies in the understanding and application of programming syntax and knowledge, which created obstacles when faced with complex programming tasks. The above data indicate that a solid programming foundation cannot be replaced by programming proficiency, and the former is the key factor determining the upper limit of students’ programming abilities.
From the perspective of educational psychology, a solid foundation helps students build a stable and efficient knowledge network [19]. When facing new programming problems, they can quickly call up relevant knowledge for analysis and solution. However, operational proficiency without the support of a foundation is just a superficial skill, and it proves challenging to deal with more complex programming situations.

6.2.2. The Law of Learning Strategy Effectiveness

The learning strategies applied in programming education typically cover several key aspects such as time management and problem-solving approach planning. There is a strong correlation between programming scores and the effectiveness of learning strategies. The learning difficulties of students of the “Syntax-Familiar but Problem-Struggling” type reveal this law. Although these students were somewhat familiar with programming syntax, they lacked effective learning strategies when facing complex programming problems. In the end, they lacked reasonable time allocation or clear problem-solving ideas, resulting in much time spent but achieving unsatisfactory results. Simple knowledge memorization or mechanical practice cannot help students overcome the hurdle of tackling complex problems.

6.2.3. Nonlinear Growth Characteristics

The development of programming ability showed a pronounced threshold effect. It could be seen from the clustering results that students of the “Steady and Proficient” type accounted for more than 65%. This proportion indicates that when students’ basic knowledge accumulation reaches a certain critical point, their learning efficiency will undergo exponential growth.
In the early stage of programming learning, students need to go through a long process of quantitative change. During this stage, the learning process mainly focuses on three aspects: constantly accumulating programming knowledge, improving programming skills, and cultivating programming thinking. As students continuously consolidate their foundations and reach the threshold, their programming abilities will make a qualitative leap, enabling them to independently solve complex programming problems and even carry out innovative programming. According to existing research, this characteristic of nonlinear growth is in line with the general laws of students’ ability development in education [20].

6.2.4. The Theory of Process Quality Dominance

Compared to superficial behavioral indicators like the keystroke speed, behavioral characteristics in the programming process, such as the frequency of code modification and the time spent on processing keywords, can predict learning outcomes more accurately. This reflects the importance of the quality of in-depth learning in programming study.
Taking students of the “Foundation-Lingering” type as an example, while their average keystroke speed was at a moderate level, the proportion of code modification actions was relatively high, and the average time taken to type keywords was also relatively long. This reflected two aspects. First, the quality of their in-depth learning during the programming process was not high. Second, their understanding of programming knowledge was not profound enough. These factors led to frequent errors in programming practice and thus the need for continuous code modification. In contrast, students of the “Steady and Proficient” type had the lowest proportion of modification actions and the shortest average time taken to type keywords. This indicates that they had a deep understanding of the knowledge during the learning process. They could accurately apply programming syntax and keywords, thereby completing programming tasks efficiently.
Through an in-depth analysis of students’ programming keystroke data, this study effectively conducted a cognitive diagnosis of students’ programming abilities. This study also revealed the differences among students in different categories. These differences were reflected in aspects such as programming learning abilities, styles, key factors influencing their grades, and the problems and challenges they faced during the learning process. In the present study, the data employed were limited to keystroke data. Moreover, only cluster analysis was conducted within the context of programming course examinations at colleges and universities.
For future research, the scope of data collection could be broadened. Data could be gathered from students across multiple scenarios, including daily programming learning and the use of online programming platforms. By integrating multimodal data such as videos and images, a more comprehensive dataset on students’ programming learning behaviors can be constructed. Subsequently, more characteristic variables associated with students’ programming learning can be unearthed. Additionally, multi-dimensional analysis of candidates’ test results on online examination platforms can be achieved. This will furnish more practical guiding theories and methods for the advancement of programming education.

Author Contributions

Conceptualization, Y.S. and X.C.; methodology, X.C. and X.G.; software, X.C.; validation, X.C.; formal analysis, X.C. and X.G.; investigation, X.C.; resources, Y.S.; data curation, Y.S. and X.C.; writing—original draft preparation, X.C.; writing—review and editing, X.C.; visualization, X.C.; supervision, Y.S.; project administration, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data available on request due to privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xu, L.M.; Zhou, Q. Analysis of Teaching Problems and Countermeasures in Computer Programming Courses. Comput. Campus. 2020, 12, 9–12. [Google Scholar]
  2. Xu, Y.; Wu, Z. Research on cognitive diagnosis in English testing. Foreign Lang. Teach. Theory Pract. 2021, 2, 44–55+75. [Google Scholar]
  3. Limpo, T.; Alves, R.A.; Connelly, V. Examining the transcription-writing link: Effects of handwriting fluency and spelling accuracy on writing performance via planning and translating in middle grades. Learn. Individ. Differ. 2017, 53, 26–36. [Google Scholar] [CrossRef]
  4. Leinonen, J. Keystroke Data in Programming Courses; University of Helsinki: Helsinki, Finland, 2019. [Google Scholar]
  5. Mahatanankoon, P. Cognitive Learning Strategies in an Introductory Computer Programming Course. Inf. Syst. Educ. J. 2021, 19, 11–20. [Google Scholar]
  6. Wang, K.; Song, L.P.; Zheng, J.J. Continuous Identity Authentication by Integrating Keystroke Content and Keystroke Behavior. Comput. Eng. Des. 2020, 41, 1562–1567. [Google Scholar]
  7. Zhang, C.; Han, J.H.; Li, F.L.; Wei, C.P. A Review of Keystroke Dynamics Research. J. Inf. Eng. Univ. 2020, 21, 310–315+324. [Google Scholar]
  8. Epp, C.; Lippold, M.; Mandryk, R.L. Identifying emotional states using keystroke dynamics. In Proceedings of the Sigchi Conference on Human Factors in Computing Systems, Vancouver, BC, Canada, 7–12 May 2011. [Google Scholar]
  9. Khan, M.F.A.; Edwards, J.; Bodily, P.; Karimi, H. Deciphering Student Coding Behavior: Interpretable Keystroke Features and Ensemble Strategies for Grade Prediction. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023. [Google Scholar]
  10. Shrestha, R.; Leinonen, J.; Zavgorodniaia, A.; Hellas, A.; Edwards, J. Pausing While Programming: Insights From Keystroke Analysis. In Proceedings of the 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), Pittsburgh, PA, USA, 25–27 May 2022. [Google Scholar]
  11. Zhang, Y.L.; Zhou, Y.J. A Review of Clustering Algorithms. J. Comput. Appl. 2019, 39, 1869–1882. [Google Scholar]
  12. Ma, W.; Guo, W. Cognitive diagnosis models for multiple strategies. Br. J. Math. Stat. Psychol. 2019, 72, 370–392. [Google Scholar] [PubMed]
  13. James, F.E.; Feldhausen, R.; Bean, N.H.; Weese, J.; Allen, D.S.; Friend, M. From Typing to Insights: An Interactive Code Visualization for Enhanced Student Support Using Keystroke Data. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V 2, Pittsburgh, PA, USA, 26 February–1 March 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 1493–1494. [Google Scholar]
  14. Nakada, T.; Miura, M. Extracting typing game keystroke patterns as potential indicators of programming aptitude. Front. Comput. Sci. 2024, 6, 1412458. [Google Scholar] [CrossRef]
  15. Kamal Bunkar, S.T. Educational Data Mining for Student Learning Pattern Analysis using Clustering Algorithms. Int. J. Eng. Adv. Technol. 2020, 9, 481–488. [Google Scholar]
  16. Aggarwal, C.C.; Reddy, C.K. Data Clustering: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
  17. Patel, G.K.; Dabhi, V.K.; Prajapati, H.B. Study and Analysis of Particle Swarm Optimization for Improving Partition Clustering; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
  18. Guo, Y.K.; Zhang, X.Y.; Liu, L.P.; Ding, L.; Niu, X.L. K-means Clustering Algorithm with Optimized Initial Clustering Centers. Comput. Eng. Appl. 2020, 56, 172–178. [Google Scholar]
  19. Chen, Q. Contemporary Educational Psychology; Beijing Normal University Press: Beijing, China, 2007. [Google Scholar]
  20. An, S.A. The Complexity of Education and Nonlinear Laws. Mod. Educ. Manag. 2015, 7, 41–46+81. [Google Scholar]
Figure 1. Experimental method flowchart.
Figure 2. Improved K-means clustering algorithm process.
Figure 3. Radar chart comparing programming performance of different types of students.
Figure 4. Data scatter plot with keyword typing time and question time as coordinate axes.
Figure 5. The proportion of each type of students in this experiment.
Figure 6. SSE and cluster number K relationship curve.
Table 1. An example dataframe of keystroke logging information.
Id | Key | Time | Type | Row | Column
1 | Shift | 14:06:16:638 4/1/2025 | down | 0 | 0
2 | # | 14:06:16:652 4/1/2025 | down | 0 | 0
3 | Shift | 14:06:16:741 4/1/2025 | up | 0 | 1
4 | 3 | 14:06:16:748 4/1/2025 | up | 0 | 1
5 | i | 14:06:16:981 4/1/2025 | down | 0 | 1
6 | i | 14:06:17:44 4/1/2025 | up | 0 | 2
7 | n | 14:06:17:133 4/1/2025 | down | 0 | 2
8 | n | 14:06:17:206 4/1/2025 | up | 0 | 3
9 | c | 14:06:17:276 4/1/2025 | down | 0 | 3
10 | c | 14:06:17:356 4/1/2025 | up | 0 | 4
11 | l | 14:06:17:412 4/1/2025 | down | 0 | 4
12 | l | 14:06:17:558 4/1/2025 | up | 0 | 5
13 | u | 14:06:17:596 4/1/2025 | down | 0 | 5
14 | d | 14:06:17:653 4/1/2025 | down | 0 | 6
15 | u | 14:06:17:658 4/1/2025 | up | 0 | 7
16 | d | 14:06:17:740 4/1/2025 | up | 0 | 7
17 | e | 14:06:17:874 4/1/2025 | down | 0 | 7
18 | e | 14:06:17:932 4/1/2025 | up | 0 | 8
19 | Space | 14:06:18:412 4/1/2025 | down | 0 | 8
20 | Space | 14:06:18:469 4/1/2025 | up | 0 | 9
Table 2. Cluster analysis results of keystroke feature data and test data.
Category | Proportion | Average Typing Speed (Keys/Minute) | Average Percentage of Modified Actions (%) | Average Time to Type a Keyword (Milliseconds) | Average Time to Answer Each Question (Seconds) | Average Score for Each Question | Average Number of Submissions per Question
Category 1 | 65.35% | 152.315 | 10.105 | 38.827 | 1791.993 | 82.096 | 3.303
Category 2 | 24.41% | 143.702 | 12.876 | 99.316 | 4773.891 | 44.968 | 4.947
Category 3 | 7.09% | 183.538 | 23.329 | 312.115 | 3333.714 | 51.111 | 4.803
Category 4 | 3.15% | 296.528 | 15.422 | 2709.063 | 2734.728 | 76 | 2.288
Table 3. Analysis of the results after matching the clustering results with the programming behavior.
Category | Proportion | Typing Speed | Modify Frequency | Keyword Proficiency | Question Time | Number of Attempts | Score
Steady and Proficient | Largest | Slower | Minimum | Best | Shortest | Less | Highest
Syntax-Familiar but Problem-Struggling | Larger | Slowest | Moderate | Better | Extremely long | Most | Lowest
Foundation-Lingering | Smaller | Medium | Maximum | Worse | Longer | More | Lower
Swift-Typing but Shallow | Least | Fastest | Moderate | Worst | Shorter | Least | Higher
Table 4. Comparison of clustering algorithm effects.
Clustering Algorithms | Silhouette Score | CH Index | DB Index
Improved K-Means | 0.682 | 603.720 | 0.462
Traditional K-Means | 0.658 | 612.553 | 0.528
Hierarchical Clustering | 0.608 | 543.281 | 0.615
Gaussian Mixture Model | 0.512 | 467.141 | 0.739
Table 5. Comparison of clustering effects under different initial cluster center selection methods.
Initial Cluster Center Selection Method | Silhouette Score | CH Index | DB Index
K-Means++ | 0.682 | 603.720 | 0.462
Random | 0.658 | 612.553 | 0.528
Table 6. Comparison of clustering effects under different cluster numbers.
Number of Clusters | Silhouette Score | CH Index | DB Index
K = 4 | 0.682 | 603.720 | 0.462
K = 3 | 0.642 | 249.473 | 0.743
K = 5 | 0.543 | 542.950 | 0.578
Table 7. Comparison of clustering effects under different outlier processing methods.
Treatment of Outliers | Silhouette Score | CH Index | DB Index
LOF | 0.682 | 603.720 | 0.462
No treatment | 0.638 | 612.658 | 0.537
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
