Search Results (557)

Search Parameters:
Keywords = problem of outliers

24 pages, 10779 KiB  
Article
Digital Measurement Method for Main Arch Rib of Concrete-Filled Steel Tube Arch Bridge Based on Laser Point Cloud
by Zhiguan Huang, Chuanli Kang, Junli Liu and Hongjian Zhou
Infrastructures 2025, 10(7), 185; https://doi.org/10.3390/infrastructures10070185 - 12 Jul 2025
Viewed by 204
Abstract
To address the low efficiency of traditional manual measurement of the main arch rib components of concrete-filled steel tube (CFST) arch bridges, this study proposes a digital measurement technology that integrates geometric parameters with computer-aided design (CAD) models. First, the preprocessed scanned point cloud of the CFST arch rib components is registered with high precision against the discretized design point cloud of the standardized CAD model. Then, because the fitting of point cloud geometric parameters is sensitive to sparse or unevenly distributed points, such points are treated as outliers and eliminated: a slicing-based method is proposed to suppress outlier interference and improve fitting accuracy. Finally, quality, accuracy, and efficiency are evaluated through distance deviation analysis and geometric parameter comparison. On the experimental data, the fitting error of this method is 76.32% lower than that of the traditional method, alleviating its measurement and fitting shortcomings, while measurement efficiency is 5% higher than with the traditional manual approach.
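
A minimal sketch of the slicing idea (hypothetical slice width and residual tolerance; the paper's registration and CAD-comparison steps are omitted): slice the point cloud along the arch axis, fit a circle to each cross-section slice by algebraic least squares, and drop points whose radial residual is too large before refitting.

```python
import numpy as np

def fit_circle(xy):
    """Algebraic least-squares circle fit (Kasa method) to an (N, 2) slice."""
    A = np.column_stack([2 * xy[:, 0], 2 * xy[:, 1], np.ones(len(xy))])
    b = (xy ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([cx, cy]), np.sqrt(c + cx**2 + cy**2)

def slice_and_fit(points, axis=2, slice_width=0.05, resid_tol=0.01):
    """Slice along one axis, fit a circle per slice, drop radial outliers, refit."""
    results = []
    for lo in np.arange(points[:, axis].min(), points[:, axis].max(), slice_width):
        sl = points[(points[:, axis] >= lo) & (points[:, axis] < lo + slice_width)]
        if len(sl) < 10:
            continue
        xy = np.delete(sl, axis, axis=1)      # project slice onto its cross-section plane
        center, r = fit_circle(xy)
        resid = np.abs(np.linalg.norm(xy - center, axis=1) - r)
        inliers = xy[resid < resid_tol]       # points far off the circle are outliers
        if len(inliers) >= 3:
            center, r = fit_circle(inliers)   # refit on the inlier set only
        results.append((lo, center, r))
    return results
```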

20 pages, 3609 KiB  
Article
Beyond the Grid: GLRT-Based TomoSAR Fast Detection for Retrieving Height and Thermal Dilation
by Nabil Haddad, Karima Hadj-Rabah, Alessandra Budillon and Gilda Schirinzi
Remote Sens. 2025, 17(14), 2334; https://doi.org/10.3390/rs17142334 - 8 Jul 2025
Viewed by 264
Abstract
The Tomographic Synthetic Aperture Radar (TomoSAR) technique is widely used for monitoring urban infrastructures, as it enables the mapping of individual scatterers across additional dimensions such as height (3D), thermal dilation (4D), and deformation velocity (5D). Retrieving this information is crucial for building management and maintenance. Nevertheless, accurately extracting it from TomoSAR data poses several challenges, particularly the presence of outliers due to uneven and limited baseline distributions. One way to address these issues is through statistical detection approaches such as the Generalized Likelihood Ratio Test, which ensures a Constant False Alarm Rate. While effective, these methods face two primary limitations: high computational complexity and the off-grid problem caused by the discretization of the search space. To overcome these drawbacks, we propose an approach that combines a quick initialization process using Fast-Sup GLRT with local descent optimization. This method operates directly in the continuous domain, bypassing the limitations of grid-based search while significantly reducing computational costs. Experiments conducted on both simulated and real datasets acquired with the TerraSAR-X satellite over the Spanish city of Barcelona demonstrate the ability of the proposed approach to maintain computational efficiency while improving scatterer localization accuracy in the third and fourth dimensions.
(This article belongs to the Section Urban Remote Sensing)
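
A toy illustration of the grid-then-refine idea (not the paper's Fast-Sup GLRT; a hypothetical single-scatterer signal model with made-up baselines and X-band geometry): maximize a matched-filter statistic on a coarse height grid, then refine off-grid with a bounded continuous optimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
b = rng.uniform(-150, 150, 20)                   # perpendicular baselines (m), hypothetical
k = 4 * np.pi * b / (0.031 * 6e5)                # height wavenumbers (X-band, ~600 km range)
true_h = 42.7
y = np.exp(1j * k * true_h)                      # ideal scatterer response
y += 0.1 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))

def stat(h):                                     # matched-filter magnitude at height h
    return np.abs(np.vdot(np.exp(1j * k * h), y)) / len(y)

grid = np.arange(0.0, 100.0, 5.0)                # coarse on-grid initialization
h0 = grid[np.argmax([stat(h) for h in grid])]
res = minimize_scalar(lambda h: -stat(h), bounds=(h0 - 5, h0 + 5), method="bounded")
print(h0, res.x)                                 # continuous refinement beats the 5 m grid
```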

25 pages, 646 KiB  
Article
Exponential Squared Loss-Based Robust Variable Selection with Prior Information in Linear Regression Models
by Hejun Wei, Tian Jin and Yunquan Song
Axioms 2025, 14(7), 516; https://doi.org/10.3390/axioms14070516 - 4 Jul 2025
Viewed by 171
Abstract
This paper proposes a robust variable selection method that incorporates prior information through linear constraints. For more than a decade, penalized likelihood frameworks have been the predominant approach to variable selection, in which suitable loss and penalty functions are combined into an unconstrained optimization problem. In many applications, however, prior information is available. In this paper, we reformulate variable selection by incorporating such prior knowledge as linear constraints. In addition, we adopt a robust exponential squared loss function, which ensures that a few outliers in the dataset have little impact on the estimated model coefficients. A dedicated solution algorithm computes the coefficient estimates and other parameters, and the method is validated through numerical simulations and a real-data experiment. Experimental results demonstrate that our model significantly improves estimation robustness compared to existing methods, even in outlier-contaminated scenarios.
(This article belongs to the Special Issue Computational Statistics and Its Applications, 2nd Edition)
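
A minimal sketch of a robust fit under the exponential squared loss with one linear equality constraint (synthetic data; the constraint, γ value, and warm start are illustrative assumptions, and the paper's penalty term is omitted). The loss 1 − exp(−r²/γ) is bounded, so each outlier can contribute at most 1:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
y[:5] += 20.0                                   # a few gross outliers

def exp_sq_loss(beta, gamma=2.0):
    r = y - X @ beta
    return np.sum(1.0 - np.exp(-r**2 / gamma))  # bounded: outliers have capped influence

# Prior information encoded as a linear constraint, e.g. beta_1 + beta_2 = 3 (hypothetical)
con = LinearConstraint([[1.0, 1.0, 0.0]], 3.0, 3.0)
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]    # warm start from ordinary least squares
fit = minimize(exp_sq_loss, beta0, method="SLSQP", constraints=[con])
print(fit.x)                                    # close to beta_true despite the outliers
```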

20 pages, 4964 KiB  
Article
Unsupervised Approaches to Finding Outliers in Caption-Represented Images
by Jakub Zaprzałka and Magdalena Topczewska
Entropy 2025, 27(7), 661; https://doi.org/10.3390/e27070661 - 20 Jun 2025
Viewed by 207
Abstract
Both supervised and unsupervised machine learning algorithms are often based on regression to the mean. However, the mean can easily be biased by unevenly distributed data, i.e., outlier records. Batch normalization methods address this problem to some extent, but they also alter the data. In text-based data, the problem is even more pronounced, as distance distinctions between outlier records diminish with increasing dimensionality. The ultimate route to unbiased data is identifying the outliers. To address this issue, multidimensional scaling (MDS) and agglomerative techniques are proposed for detecting outlier records in text-based data. For both methods, two of the most common distance metrics are applied: Euclidean distance and cosine distance. Furthermore, in the MDS approach, both metric and non-metric versions of the algorithm are used, whereas in the agglomerative approach, the last-p and level cutoff techniques are applied. The methods are also compared with a raw-data-based method that selects the element most distant from the others under a given distance metric. Experiments were conducted on overlapping subsets of a dataset containing roughly 2000 records of descriptive image captions. The algorithms were also compared in terms of efficiency and evaluated through human judgment of the described images. Unsurprisingly, the cosine distance turned out to be the most effective distance metric, and the metric-MDS-based algorithm outperformed the others in the human evaluation. The presented algorithms successfully identified outlier records.
(This article belongs to the Collection Entropy in Image Analysis)
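
A minimal sketch of the metric-MDS variant (random vectors stand in for real caption embeddings; the outlier construction and centroid-distance rule are illustrative assumptions): embed pairwise cosine distances into 2-D and flag the point farthest from the embedding centroid.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(2)
base = rng.standard_normal(384)                  # shared topic direction
captions = base + 0.5 * rng.standard_normal((50, 384))   # stand-in caption embeddings
captions[0] = rng.standard_normal(384)           # one off-topic record: the outlier

D = cosine_distances(captions)                   # pairwise cosine distance matrix
mds = MDS(n_components=2, metric=True, dissimilarity="precomputed", random_state=0)
emb = mds.fit_transform(D)

dist_to_centroid = np.linalg.norm(emb - emb.mean(axis=0), axis=1)
print("suspected outlier:", int(np.argmax(dist_to_centroid)))   # -> 0
```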

16 pages, 616 KiB  
Article
Bayesian Quantile Regression for Partial Functional Linear Spatial Autoregressive Model
by Dengke Xu, Shiqi Ke, Jun Dong and Ruiqin Tian
Axioms 2025, 14(6), 467; https://doi.org/10.3390/axioms14060467 - 16 Jun 2025
Viewed by 249
Abstract
When performing Bayesian modeling on functional data, the model errors are often assumed to be normal, so the results may be sensitive to outliers and/or heavy-tailed data. Quantile regression is a natural and effective choice for such problems. This paper therefore introduces quantile regression into the partial functional linear spatial autoregressive model (PFLSAM), modeling the errors with the asymmetric Laplace distribution. Functional principal component analysis and a hybrid MCMC algorithm combining Gibbs sampling with Metropolis–Hastings steps are then developed to generate posterior samples, yielding Bayesian estimates of the unknown parameters and functional coefficients. Simulation studies show that the proposed Bayesian estimation method is feasible and effective.
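
The asymmetric-Laplace working likelihood that underlies Bayesian quantile regression reduces, up to constants, to the familiar check loss; a minimal sketch of that connection (illustrative only, none of the paper's PFLSAM machinery):

```python
import numpy as np

def check_loss(r, tau):
    """Quantile check loss: tau*r for r >= 0, (tau - 1)*r otherwise."""
    return np.where(r >= 0, tau * r, (tau - 1) * r)

def ald_logpdf(r, tau, sigma=1.0):
    """Asymmetric Laplace log-density; maximizing it over regression
    coefficients is equivalent to minimizing the check loss."""
    return np.log(tau * (1 - tau) / sigma) - check_loss(r, tau) / sigma

r = np.linspace(-3, 3, 7)
print(check_loss(r, tau=0.9))   # residuals below the 0.9 quantile are penalized lightly
```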

19 pages, 2554 KiB  
Article
Research on an Automated Cleansing and Function Fitting Method for Well Logging and Drilling Data
by Wan Wei
Processes 2025, 13(6), 1891; https://doi.org/10.3390/pr13061891 - 14 Jun 2025
Viewed by 359
Abstract
Oilfield data are characterized by complex types, large volumes, and significant noise interference, so data cleansing has become a key procedure for improving data quality. However, the traditional cleansing process must handle multiple problem types in turn, such as outliers, duplicate data, and missing values, making its steps complex and inefficient. Therefore, an integrated data cleansing and function fitting method is established: a fine-mesh data density analysis cleanses outliers and duplicate data, and an automated segmented fitting method imputes missing data. For the real-time data generated during drilling or well logging, cleansing is realized through grid partitioning and data density analysis, with the cleansing ratio controlled by the density threshold and grid spacing. After cleansing, the data are segmented according to similarity criteria and a fitting function type is determined for each segment to fill in the missing data, so outputs at any sampling frequency can be obtained. For hook load data measured by sensors at the drilling site and obtained from rig floor monitors or remote centers, the data cleansing percentage reaches 98.88% after two-stage cleansing while still retaining the original trend of the data. The cleansed data are then modeled by the automated segmented fitting method, with Mean Absolute Percentage Errors (MAPEs) below 3.66% and coefficient of determination (R²) values above 0.94. Through this integrated mechanism, the workflow synchronously eliminates outliers and redundant data and fills in missing values, dynamically adapting to the data requirements of numerical simulation and intelligent analysis and significantly improving on-site data processing efficiency and decision-making reliability in the oilfield.
(This article belongs to the Special Issue Modeling, Control, and Optimization of Drilling Techniques)
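
A minimal sketch of the fine-mesh density idea (grid resolution and density threshold are hypothetical): bin (time, value) samples into a grid and discard points that fall in sparsely populated cells, which is where isolated spikes land.

```python
import numpy as np

def density_cleanse(t, v, nt=200, nv=100, min_count=3):
    """Keep only points whose (time, value) grid cell meets a density threshold."""
    ti = np.digitize(t, np.linspace(t.min(), t.max(), nt))
    vi = np.digitize(v, np.linspace(v.min(), v.max(), nv))
    counts = {}
    for cell in zip(ti, vi):                     # occupancy of each grid cell
        counts[cell] = counts.get(cell, 0) + 1
    keep = np.array([counts[c] >= min_count for c in zip(ti, vi)])
    return t[keep], v[keep]

t = np.linspace(0, 100, 5000)
v = np.sin(t / 5) + 0.02 * np.random.default_rng(3).standard_normal(5000)
v[::500] += 4.0                                  # inject isolated spikes
tc, vc = density_cleanse(t, v)                   # spikes occupy low-density cells
print(len(t) - len(tc), "points removed")
```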

27 pages, 15957 KiB  
Article
DataMatrix Code Recognition Method Based on Coarse Positioning of Images
by Lingyue Hu, Guanbin Zhong, Zhiwei Chen and Zhong Chen
Electronics 2025, 14(12), 2395; https://doi.org/10.3390/electronics14122395 - 12 Jun 2025
Viewed by 361
Abstract
A DataMatrix (DM) code is an automatic identification barcode based on a combination of coding and image processing. Traditional DM code sampling methods mostly rely on simple segmentation and sampling, yet in practical scenarios the captured DM code images often suffer from wear, corrosion, geometric distortion, and strong background interference. To improve decoding ability in complex environments, a DM code recognition method based on coarse positioning of images is proposed. The two-dimensional barcode is first converted into a one-dimensional waveform using a projection algorithm. Then, the spacing between segmentation lines is predicted and corrected using an exponentially weighted moving average model for adaptive grid division. Finally, the local outlier factor algorithm and locally weighted linear regression are applied to predict and binarize the gray-level values, converting the DM code image into a data matrix. The experimental results show that this method effectively handles blurring, wear, corrosion, distortion, and background interference. Compared to popular DM decoding libraries such as libdmtx and zxing, it demonstrates better resolution, noise resistance, and distortion tolerance.
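
A minimal sketch of the EWMA spacing correction (the smoothing factor and tolerance are hypothetical): predict each next grid-line spacing from the running average and replace measurements that deviate too far from the prediction.

```python
import numpy as np

def ewma_correct(spacings, alpha=0.3, tol=0.25):
    """Exponentially weighted moving average over segmentation-line spacings;
    a measurement deviating more than tol (relative) is replaced by the prediction."""
    s = spacings[0]
    corrected = [s]
    for x in spacings[1:]:
        if abs(x - s) / s > tol:   # implausible spacing -> use the EWMA prediction
            x = s
        s = alpha * x + (1 - alpha) * s
        corrected.append(x)
    return np.array(corrected)

spacings = np.array([10.1, 9.9, 10.0, 17.5, 10.2, 9.8])   # one distorted spacing
print(ewma_correct(spacings))                              # 17.5 replaced by ~10
```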

14 pages, 1003 KiB  
Article
A Linear Fitting Algorithm Based on Modified Random Sample Consensus
by Yujin Min, Yun Tang, Hao Chen and Faquan Zhang
Appl. Sci. 2025, 15(11), 6370; https://doi.org/10.3390/app15116370 - 5 Jun 2025
Viewed by 394
Abstract
When performing linear fitting on datasets containing outliers, common algorithms may suffer from inadequate fitting accuracy. We propose a linear fitting algorithm based on Locality-Sensitive Hashing (LSH) and Random Sample Consensus (RANSAC). Our algorithm combines the efficient similarity search of LSH with the robust fitting mechanism of RANSAC. With suitably designed hash functions, similar data points map to the same hash bucket, enabling efficient identification and removal of outliers. RANSAC then fits the model parameters on the filtered dataset, and the optimal parameters of the linear model are obtained after multiple iterations. This algorithm significantly reduces the influence of outliers, improving fitting accuracy and robustness. Experimental results demonstrate that the proposed improved RANSAC linear fitting algorithm outperforms Weighted Least Squares, traditional RANSAC, and Maximum Likelihood Estimation, reducing the sum of squared residuals by 29%, 16%, and 8%, respectively.
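
A minimal sketch of the RANSAC stage (the LSH bucketing pre-filter is omitted; the iteration count and inlier threshold are illustrative): repeatedly fit a line to a random pair of points, score by inlier count, and refit by least squares on the best consensus set.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, thresh=0.5, rng=np.random.default_rng(4)):
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])        # candidate slope from two points
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < thresh
        if inliers.sum() > best_inliers.sum():   # keep the largest consensus set
            best_inliers = inliers
    return np.polyfit(x[best_inliers], y[best_inliers], 1)  # refit on inliers only

x = np.linspace(0, 10, 100)
y = 2 * x + 1 + 0.1 * np.random.default_rng(5).standard_normal(100)
y[:15] += 25.0                                   # 15% gross outliers
print(ransac_line(x, y))                         # approximately [2.0, 1.0]
```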

23 pages, 2623 KiB  
Article
An Inductive Logical Model with Exceptional Information for Error Detection and Correction in Large Knowledge Bases
by Yan Wu, Xiao Lin, Haojie Lian and Zili Zhang
Mathematics 2025, 13(11), 1877; https://doi.org/10.3390/math13111877 - 4 Jun 2025
Viewed by 356
Abstract
Some knowledge bases (KBs) extracted from Wikipedia articles achieve very high average precision (over 95% in DBpedia). However, subtle mistakes, including inconsistencies, outliers, and erroneous relations, are usually overlooked when KBs are constructed by extraction rules, and automatically detecting and correcting them is important for improving KB quality. In this paper, inductive logic programming with exceptional information (EILP) is proposed to automatically detect errors in large KBs. EILP leverages exceptional information that conventional rule-learning algorithms, such as inductive logic programming (ILP), ignore. Furthermore, an inductive logical correction method with exceptional features (EILC) is proposed to automatically correct these mistakes by learning a set of correction rules with exceptional features, with accompanying metrics to validate the revised triples. The experimental results demonstrate the effectiveness of EILP and EILC in detecting and repairing errors in large knowledge bases.

20 pages, 2009 KiB  
Article
A Novel Robust Test to Compare Covariance Matrices in High-Dimensional Data
by Hasan Bulut
Axioms 2025, 14(6), 427; https://doi.org/10.3390/axioms14060427 - 30 May 2025
Cited by 1 | Viewed by 437
Abstract
The equality of covariance matrices is one of the most important assumptions in many multivariate hypothesis tests, such as Hotelling's T² and MANOVA. The sample covariance matrix, however, is singular in high-dimensional data, where the number of variables (p) exceeds the sample size (n): its determinant is zero and its inverse cannot be computed. Although many studies addressing this problem are discussed in the Introduction, they have not focused on outliers in the data. In this study, we propose a test statistic that can be applied to high-dimensional datasets without being affected by outliers. The test is permutational, so no distributional assumption is required. We investigate its performance through simulation studies and a real-data example; in all cases, the proposed test demonstrates good type-I error control, power, and robustness. Additionally, we provide an R function in the "MVTests" package, so the proposed test can easily be applied to real datasets.
(This article belongs to the Special Issue Computational Statistics and Its Applications, 2nd Edition)
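
A minimal sketch of a permutation test for equal covariances (a plain Frobenius-norm statistic on a low-dimensional toy, not the paper's robust high-dimensional statistic): permute group labels and compare the observed statistic to its permutation distribution.

```python
import numpy as np

def perm_cov_test(X1, X2, n_perm=999, rng=np.random.default_rng(6)):
    stat = lambda A, B: np.linalg.norm(np.cov(A.T) - np.cov(B.T))  # Frobenius norm
    obs = stat(X1, X2)
    pooled, n1 = np.vstack([X1, X2]), len(X1)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))       # shuffle group membership
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)            # permutation p-value

X1 = np.random.default_rng(7).standard_normal((30, 5))
X2 = 2 * np.random.default_rng(8).standard_normal((30, 5))   # inflated covariance
print(perm_cov_test(X1, X2))                     # small p-value: reject equality
```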

25 pages, 1528 KiB  
Article
A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points
by Suyu Wang, Quan Yue, Zhenlei Xu, Peihong Qiao, Zhentao Lyu and Feng Gao
Mathematics 2025, 13(11), 1738; https://doi.org/10.3390/math13111738 - 24 May 2025
Viewed by 645
Abstract
Reinforcement learning has achieved significant success in sequential decision-making problems but exhibits poor adaptability in non-stationary environments with unknown dynamics, a challenge particularly pronounced in multi-agent scenarios. This study aims to enhance the adaptive capability of multi-agent systems in such volatile environments. We propose a novel cooperative Multi-Agent Reinforcement Learning (MARL) algorithm based on MADDPG, termed MACPH, which innovatively incorporates three mechanisms: a Composite Experience Replay Buffer (CERB) mechanism that balances recent and important historical experiences through a dual-buffer structure and mixed sampling; an Adaptive Parameter Space Noise (APSN) mechanism that perturbs actor network parameters and dynamically adjusts the perturbation intensity to achieve coherent and state-dependent exploration; and a Huber loss function mechanism to mitigate the impact of outliers in Temporal Difference errors and enhance training stability. The study was conducted in standard and non-stationary navigation and communication task scenarios. Ablation studies confirmed the positive contributions of each component and their synergistic effects. In non-stationary scenarios featuring abrupt environmental changes, experiments demonstrate that MACPH outperforms baseline algorithms such as DDPG, MADDPG, and MATD3 in terms of reward performance, adaptation speed, learning stability, and robustness. The proposed MACPH algorithm offers an effective solution for multi-agent reinforcement learning applications in complex non-stationary environments.
(This article belongs to the Special Issue Application of Machine Learning and Data Mining, 2nd Edition)
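
A minimal sketch of the Huber-loss mechanism (the threshold δ = 1 is a common default, assumed here): the loss is quadratic for small temporal-difference errors and linear beyond the threshold, so outlier TD errors contribute bounded gradients to the critic update.

```python
import numpy as np

def huber(td_error, delta=1.0):
    """Quadratic inside |e| <= delta, linear outside; the gradient is capped at delta."""
    e = np.abs(td_error)
    return np.where(e <= delta, 0.5 * e**2, delta * (e - 0.5 * delta))

td = np.array([0.1, -0.5, 8.0])    # one outlier TD error
print(huber(td))                   # [0.005, 0.125, 7.5] vs. 32.0 under squared loss
```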

25 pages, 5202 KiB  
Article
Hybrid Adaptive Sheep Flock Optimization and Gradient Descent Optimization for Energy Management in a Grid-Connected Microgrid
by Sri Harish Nandigam, Krishna Mohan Reddy Pothireddy, K. Nageswara Rao and Surender Reddy Salkuti
Designs 2025, 9(3), 63; https://doi.org/10.3390/designs9030063 - 16 May 2025
Viewed by 1111
Abstract
Distributed generation has emerged as a viable way to supplement traditional grids and lessen their negative environmental effects worldwide. Nevertheless, distributed generation is unpredictable and intermittent, which impedes the power system's ability to operate effectively, and outliers and denial-of-service (DoS) attacks further hinder energy management. Efficient energy management in grid-connected microgrids is therefore critical to ensure sustainability, cost efficiency, and reliability in the presence of uncertainties, outliers, denial-of-service attacks, and false data injection attacks. This paper proposes a hybrid optimization approach that combines adaptive sheep flock optimization (ASFO) and gradient descent optimization (GDO) to address the challenges of energy dispatch and load balancing in microgrids. The ASFO algorithm offers robust global search capabilities for exploring complex search spaces, while GDO ensures precise local convergence, optimizing the dispatch schedule and energy cost and maximizing renewable energy utilization. The hybrid ASFOGDO method leverages the strengths of both algorithms to overcome the limitations of standalone approaches. Results demonstrate its efficiency, with substantial improvements in energy efficiency and cost reduction over traditional methods such as interior point optimization, gradient descent, branch and bound, and the population-based Golden Jackal optimization. In case 1, the overall cost in scenarios 1 and 2 was reduced from 1620.4 rupees to 1422.84 rupees, whereas in case 2 the total cost was reduced from 12,350 rupees to 12,017 rupees with the proposed hybrid ASFOGDO algorithm. A detailed analysis of the impact of attacks and outliers on scheduling, operational cost, and supply reliability is presented in case 3.

24 pages, 3404 KiB  
Article
Commonness and Inconsistency Learning with Structure Constrained Adaptive Loss Minimization for Multi-View Clustering
by Kai Zhang, Kehan Kang, Yang Bai and Chong Peng
Electronics 2025, 14(9), 1847; https://doi.org/10.3390/electronics14091847 - 1 May 2025
Viewed by 392
Abstract
Subspace clustering has emerged as a prominent research focus, demonstrating remarkable potential in handling multi-view data by effectively harnessing their diverse and information-rich features. In this study, we present a novel framework for multi-view subspace clustering that addresses several critical aspects of the problem. Our approach introduces three key innovations: First, we propose a dual-component representation model that simultaneously considers both consistent and inconsistent elements across different views. The consistent component is designed to capture shared structural patterns with robust commonality, while the inconsistent component effectively models view-specific variations through sparsity constraints across multiple modes. Second, we implement cross-mode sparsity constraints that enable the inconsistent component to efficiently extract high-order information from the data. This design not only enhances the representation capability of the inconsistent component but also facilitates the consistent component in revealing high-order structural relationships within the data. Third, we develop an adaptive loss function that offers greater flexibility in handling noise and outliers, thereby significantly improving the model's robustness in real-world applications. Through extensive experimentation, we demonstrate that our proposed method consistently outperforms existing approaches, achieving superior clustering performance across various benchmark datasets. The experimental results comprehensively validate the effectiveness and advantages of our approach in terms of clustering accuracy, robustness, and computational efficiency.

15 pages, 2965 KiB  
Article
A Fast Proximal Alternating Method for Robust Matrix Factorization of Matrix Recovery with Outliers
by Ting Tao, Lianghai Xiao and Jiayuan Zhong
Mathematics 2025, 13(9), 1466; https://doi.org/10.3390/math13091466 - 29 Apr 2025
Viewed by 287
Abstract
This paper concerns a class of robust factorization models of low-rank matrix recovery, which have been widely applied in various fields such as machine learning and imaging sciences. An ℓ1-loss robust factorized model incorporating the ℓ2,0-norm regularization term is proposed to address the presence of outliers. Since the resulting problem is nonconvex, nonsmooth, and discontinuous, an approximation problem that shares the same set of stationary points as the original formulation is constructed. Subsequently, a proximal alternating minimization method is proposed to solve the approximation problem. The global convergence of its iterate sequence is also established. Numerical experiments on matrix completion with outliers and image restoration tasks demonstrate that the proposed algorithm achieves low relative errors in shorter computational time, especially for large-scale datasets.
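
A minimal sketch of the column-wise hard-thresholding step that the ℓ2,0 regularizer induces (illustrative only; the paper's full alternating scheme and its ℓ1-loss factor updates are omitted): the proximal map of λ‖·‖2,0 keeps a column if its squared norm exceeds 2λ and zeroes it otherwise.

```python
import numpy as np

def prox_l20(V, lam):
    """Proximal map of lam * ||V||_{2,0} (number of nonzero columns):
    keep column v_j if 0.5 * ||v_j||^2 > lam, otherwise set it to zero."""
    norms = np.linalg.norm(V, axis=0)
    out = V.copy()
    out[:, 0.5 * norms**2 <= lam] = 0.0
    return out

V = np.array([[3.0, 0.1, 0.00],
              [4.0, 0.2, 0.05]])
print(prox_l20(V, lam=0.5))    # the two small columns are zeroed, reducing rank
```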

19 pages, 2183 KiB  
Article
Methods for Cognitive Diagnosis of Students’ Abilities Based on Keystroke Features
by Xu Chi, Xinyu Guo and Yu Sheng
Appl. Sci. 2025, 15(9), 4783; https://doi.org/10.3390/app15094783 - 25 Apr 2025
Viewed by 331
Abstract
Keystroke data contain behavioral information about students during the programming process. Cluster analysis of keystroke data can classify students based on specific characteristics of their programming behavior, providing a basis for personalized teaching. Research combining keystroke features is still at an early stage. Because keystroke data are independent and discrete, and traditional clustering algorithms offer no clear criterion for selecting the number of clusters, this selection is often arbitrary, and outliers degrade the clustering result. To address these problems, we improve on the original method: keystroke data are used to capture students' programming behavior, and the traditional clustering algorithm is optimized for the characteristics of keystroke data. The K-means++ algorithm determines the initial cluster centers, the elbow method selects the number of clusters, and an outlier-processing algorithm is introduced. We independently constructed a keystroke dataset from computer-based programming examinations and used it to validate our method; the improved algorithm shows gains on multiple evaluation indicators. Experiments demonstrate that the proposed method classifies students' programming proficiency levels more accurately, providing strong support for the formulation of teaching strategies and the allocation of resources, with substantial application value and practical significance.
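
A minimal sketch of the cluster-count selection (synthetic features stand in for real keystroke statistics; the feature names and range of k are assumptions): run k-means++ over a range of k and look for the elbow in the inertia curve.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
# Stand-in keystroke features, e.g. mean inter-key latency and edit frequency
features = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0, 2, 5)])

inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(features)
    inertias.append(km.inertia_)
print(np.round(inertias, 1))    # inertia drops sharply until k = 3, then flattens
```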
