1. Introduction
In modern concrete engineering, accurate identification of the vibration state is critical for ensuring structural safety and durability, because the vibration state directly dictates the density and homogeneity of the material [1]. Traditional construction predominantly relies on manual experience, which often leads to internal porosity defects like honeycombs due to “under-vibration” or aggregate segregation caused by “over-vibration” [2]. In tunnel engineering, vault cavities are common defects closely related to insufficient pouring compactness [3]. Consequently, substantial research has been conducted: Yang et al. identified a linear relationship between vibration acceleration and porosity [4]; Zheng et al. proposed biaxial vibration mixing to improve overall concrete quality [5]. However, the intrinsic lack of self-perception in standard equipment often causes diagnostic accuracy to plummet under variable speeds [6]. More critically, the harsh construction environment and high-speed operation make vibrating rods prone to mechanical fatigue during frequent idling or collisions with rigid molds, resulting in uncontrollable quality and high equipment depreciation.
The convergence of machine learning and civil engineering has shifted monitoring from “post-sampling inspection” to “real-time early warning.” Within damage detection and state assessment, techniques such as YOLOv7-based crack identification [7], precision localization algorithms [8], deep learning-driven mechanical damage assessment [9], and genetic algorithm-based state forecasting [10] have demonstrated significant potential. To sharpen feature extraction, the integration of STFT with SVM has enabled damage detection [11], and multi-task fusion networks have been developed for complex scenes [12]. Furthermore, real-time monitoring coupled with finite element modeling [13,14] and nonlinear dynamics modeling [15] has established a foundation for addressing complex systems. For small-sample scenarios, lightweight diagnosis frameworks have exhibited high fidelity [16,17], while navigation algorithms for dynamic environments offer insights into sensor motion compensation [18]. Nevertheless, the computational intensity of these models often precludes their millisecond-level execution on micro-nodes embedded within vibrating rods, and research regarding active protection mechanisms for equipment safety remains remarkably scarce.
The emergence of convolutional neural networks (CNN) and Kolmogorov–Arnold Networks (KAN) offers a promising trajectory for real-time identification in embedded environments. CNNs, characterized by weight-sharing mechanisms, have excelled in tool wear identification [19] and power supply risk assessment [20]. To further enhance the recognition precision for internal impact damage [21] and vibration response parameters [22] in concrete, KAN algorithms utilize learnable nonlinear activation functions to achieve superior fitting precision. In practical engineering, both follow standardized workflows to adapt to construction needs [23]. CNNs capture local features and filter noise efficiently [24,25], achieving recognition accuracies of up to 99.43% under complex benchmarks [26]. While stacking technologies [27], hybrid predictions [28], and kernel-method-optimized classifiers [29,30] further bolster performance, identifying transitional states with blurred physical boundaries, such as “compacted” versus “over-vibrated,” remains a formidable challenge for lightweight models attempting to capture subtle damping shifts caused by liquefaction.
In summary, to bridge the gap in intelligent equipment for “real-time quality monitoring” and “active asset protection,” this study, inspired by robust identification strategies for complex environments [31], presents an integrated intelligent vibration device featuring an onboard edge computing unit. This work achieves equipment “self-perception” through embedded sensing arrays and control modules; proposes a CNN-KAN hybrid architecture designed to enhance nonlinear mapping precision under constrained computational budgets; and establishes a millisecond-level active protection mechanism. By identifying hazardous conditions in real time, the system mitigates operational wear and extends the service life of the equipment, providing a robust technical foundation for the intelligent upgrading of construction machinery.
3. Concrete Vibration State Recognition Algorithm
3.1. Basic Principles of KAN
The Kolmogorov–Arnold Network (KAN) is a neural network model constructed based on the Kolmogorov–Arnold representation theorem. This theorem states that any continuous multivariate function on a bounded domain can be represented as a finite composition of continuous single-variable functions and addition. Its general mathematical expression is:
$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$
where $\phi_{q,p}$ is a one-dimensional function, and $\Phi_q$ is a composite function.
Unlike traditional multilayer perceptrons, which place fixed activation functions on the network ‘nodes’ and learnable weight parameters on the ‘edges,’ the core innovation of KAN is moving the learnable nonlinear activation functions to the network ‘edges’.
Its typical network structure is shown in Figure 3. The network removes the originally fixed activation functions at the nodes and instead allows each edge to carry a learnable univariate function $\phi$ (usually parameterized using B-spline curves). In traditional neurons, inputs are weighted by $w$ and summed before passing through a fixed activation function $\sigma$; in KAN units, the input signal undergoes a functional transformation directly on the edge, and nodes only perform simple summation operations. In theory, this provides a more efficient, parameter-reduced method for function approximation. This high parameter efficiency significantly alleviates the reliance on large-scale memory, making KAN particularly suitable for deployment on resource-constrained edge computing platforms (such as the Raspberry Pi) embedded within the vibration equipment.
Univariate functions in KAN are usually parameterized using B-spline curves, which are easy to optimize. The mathematical definition of a B-spline is:
$$\phi(x) = \sum_{i} c_i B_i(x)$$
In the formula, $c_i$ is the trainable control vertex, and $B_i(x)$ is the corresponding B-spline basis function.
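The B-spline parameterization of a single edge function, a weighted sum of basis functions with trainable coefficients, can be sketched in plain Python. This is a minimal illustration using the standard Cox-de Boor recursion; the knot grid, the spline degree, and the coefficient values are hypothetical and are not taken from the paper's model.

```python
def bspline_basis(i, k, knots, x):
    """Cox-de Boor recursion: value of the i-th B-spline basis of degree k at x."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (x - knots[i]) / denom * bspline_basis(i, k - 1, knots, x)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - x) / denom * bspline_basis(i + 1, k - 1, knots, x)
    return left + right


def edge_function(x, coeffs, knots, degree=3):
    """phi(x) = sum_i c_i * B_i(x): one learnable KAN edge."""
    return sum(c * bspline_basis(i, degree, knots, x) for i, c in enumerate(coeffs))


# Hypothetical uniform knot vector: 11 knots give 11 - 3 - 1 = 7 cubic bases.
# The bases sum to one on the interval [0, 4).
knots = [float(t) for t in range(-3, 8)]
coeffs = [0.0, 0.5, 1.0, -0.5, 0.2, 0.8, 0.0]  # illustrative "control vertices"

phi_at_zero = edge_function(0.0, coeffs, knots)
```

During training only `coeffs` would change; the basis functions stay fixed, which is why the optimization of each edge remains a smooth, low-dimensional problem.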
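The computation at each layer in the network can be represented as:
$$x_{l+1} = \boldsymbol{\Phi}_l(x_l)$$
where $x_l$ is the input vector of layer $l$ and $\boldsymbol{\Phi}_l$ is a matrix composed of all the activation functions in that layer. Consequently, the end-to-end output of a network with $L$ layers is the composition of multiple layers of functions:
$$\mathrm{KAN}(x) = (\boldsymbol{\Phi}_{L-1} \circ \boldsymbol{\Phi}_{L-2} \circ \cdots \circ \boldsymbol{\Phi}_0)(x)$$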
By introducing learnable functions on the connections, combined with regularization and structural optimization, KAN can accurately depict highly complex nonlinear mapping relationships with an extremely small number of parameters, enabling it to handle complex modeling scenarios with multi-feature input very well.
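The layer-wise computation described above, in which every edge applies its own univariate function and every node only sums, can be illustrated with a short sketch. The fixed functions below (`math.sin`, a square, and so on) are hypothetical stand-ins for trained splines, chosen only so that the result can be checked by hand.

```python
import math


def kan_layer(x, phi):
    """One KAN layer. phi[j][i] is the univariate function on the edge
    from input i to output node j; each node only sums its incoming edges."""
    return [sum(phi_ji(x_i) for phi_ji, x_i in zip(row, x)) for row in phi]


def kan_forward(x, layers):
    """End-to-end output: the composition of all layers."""
    for phi in layers:
        x = kan_layer(x, phi)
    return x


# Hypothetical stand-ins for learned edge functions.
layer0 = [[math.sin, lambda v: v * v],          # 2 inputs -> hidden node 0
          [math.exp, abs]]                      # 2 inputs -> hidden node 1
layer1 = [[lambda v: 2.0 * v, lambda v: -v]]    # 2 hidden -> 1 output

y = kan_forward([0.0, 3.0], [layer0, layer1])
# hidden = [sin(0) + 3^2, exp(0) + |3|] = [9, 4]; output = 2*9 - 4 = 14
```

Note that there is no separate weight matrix: all trainable capacity lives in the edge functions themselves.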
3.2. Basic Principles of CNN
Convolutional neural networks (CNNs) possess significant advantages in processing temporally correlated vibration signals due to their local perception and weight-sharing mechanisms. The CNN module constructed in this paper is mainly responsible for extracting deep features from the raw waveform, and its network structure is shown in Figure 4.
In the feature extraction stage, the convolution kernel captures the local temporal evolution characteristics of the signal by performing cross-correlation operations with the input signal. For the j-th feature map of the l-th layer, the output $x_j^{l}$ is calculated as follows:
$$x_j^{l} = \sum_{i} x_i^{l-1} * w_{ij}^{l} + b_j^{l}$$
where $x_i^{l-1}$ is the feature map input from the previous layer, $w_{ij}^{l}$ is the learnable convolution kernel weight, $b_j^{l}$ is the bias term, and the symbol $*$ denotes the one-dimensional convolution operation.
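As a rough illustration of this operation, the sketch below computes one output feature map as the sum of per-channel cross-correlations plus a bias. The kernel values and the test signal are hypothetical; a trained network would learn the kernel weights.

```python
def conv1d_feature_map(inputs, kernels, bias=0.0, stride=1):
    """One output feature map of a 1D convolutional layer.

    inputs:  list of input channels (previous-layer feature maps)
    kernels: one kernel per input channel, all of the same length
    """
    k = len(kernels[0])
    n_out = (len(inputs[0]) - k) // stride + 1
    out = []
    for t in range(n_out):
        s = bias
        for x_i, w_i in zip(inputs, kernels):
            # cross-correlation: slide the kernel without flipping it
            s += sum(x_i[t * stride + m] * w_i[m] for m in range(k))
        out.append(s)
    return out


signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]   # toy single-axis waveform
slope_kernel = [-1.0, 0.0, 1.0]                  # hypothetical size-3 kernel
feature_map = conv1d_feature_map([signal], [slope_kernel])
```

Because the same three weights are reused at every time step, the layer needs far fewer parameters than a fully connected layer over the same window.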
To adapt to the inherently non-stationary characteristics of concrete vibration signals, this paper tailors the convolutional layer parameters. The first convolutional layer uses 13 convolution kernels of size three to capture the basic shape characteristics of the original waveform; the second convolutional layer uses 23 convolution kernels of size two, whose finer receptive field extracts the detailed features of high-frequency fluctuations in the signal.
To stabilize model training and accelerate convergence, a Batch Normalization (BN) layer is introduced after the convolution operation. The BN layer centers and scales each batch of data:
$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}$$
Here, $\mu$ and $\sigma^2$ are the mean and variance of the batch, respectively, and $\epsilon$ is a small constant that prevents division by zero. The normalized signal is then mapped through the ReLU activation function $f(x) = \max(0, x)$. This design not only enhances the nonlinear expressive capability of the network but also, by utilizing its sparse activation characteristics, alleviates the problem of gradient vanishing in deep networks to some extent.
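The normalization and activation steps can be sketched as follows. The `gamma` and `beta` parameters are the usual learnable scale and shift of batch normalization and are an assumption here, since the text only describes centering and scaling; the input values are illustrative.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Center and scale one batch of activations for a single channel."""
    mu = sum(batch) / len(batch)
    var = sum((v - mu) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in batch]


def relu(values):
    """ReLU: negative activations are clipped to zero (sparse activation)."""
    return [max(0.0, v) for v in values]


activations = [2.0, 4.0, 6.0, 8.0]     # illustrative values
normalized = batch_norm(activations)   # zero mean, approximately unit variance
output = relu(normalized)              # lower half of the batch becomes 0
```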
The pooling layer reduces the spatial dimensions through a down-sampling operation, thereby decreasing the number of model parameters and improving its robustness to signal shifts. This architecture adopts a max-pooling strategy to extract the maximum activation within a specific sliding window $\Omega$:
$$y = \max_{i \in \Omega} a_i$$
where $a_i$ is the activation value of the neuron within the pooling window. After multiple layers of convolution and pooling iterations, the original vibration signal is transformed into a highly condensed feature vector, which is finally connected to the classification module through a global average pooling layer.
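A minimal sketch of the two pooling steps is given below. The window size of two and the sample values are assumptions made for illustration; the paper's actual pooling configuration is given in its tables.

```python
def max_pool1d(x, window=2, stride=None):
    """Keep the largest activation inside each pooling window."""
    stride = stride or window
    return [max(x[t:t + window]) for t in range(0, len(x) - window + 1, stride)]


def global_average_pool(feature_maps):
    """Collapse every feature map to its mean, giving one value per channel."""
    return [sum(fm) / len(fm) for fm in feature_maps]


fm = [0.1, 0.9, 0.3, 0.2, 0.7, 0.4]
pooled = max_pool1d(fm, window=2)                    # [0.9, 0.3, 0.7]
vector = global_average_pool([pooled, [1.0, 3.0]])   # one entry per channel
```

Global average pooling fixes the length of the feature vector at the number of channels, regardless of how long the input window was.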
3.3. Data Preprocessing
3.3.1. Data Augmentation
The original vibration waveforms measured on construction sites are often long time series. Feeding them into the model in full would incur significant computational overhead. To address this, this study uses the sliding window method to segment the original signals. This technique slides a fixed-length window along the signal at a fixed step size, transforming the long time series into multiple short samples of equal length without interrupting the continuity of the signal’s physical characteristics. The specific segmentation process is shown in Figure 5.
Sliding window augmentation not only significantly expands the sample size, effectively mitigating the risk of overfitting that may arise from data scarcity, but also enhances the model’s robustness to signal time-shift characteristics, laying a data foundation for the stable training of subsequent deep neural networks.
The signal segments after sliding window augmentation are the only data source for all subsequent algorithms. CNN-KAN and CNN directly extract the original temporal features within the segments, while KAN further needs to transform these segments into statistical feature vectors.
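The segmentation can be sketched in a few lines. The example uses the window width of 300 samples and step of five samples reported in Section 4.2.2, applied to an 800-point baseline segment; the integer sequence stands in for real accelerometer readings.

```python
def sliding_windows(signal, window, step):
    """Cut a long sequence into overlapping, equal-length segments."""
    return [signal[s:s + window]
            for s in range(0, len(signal) - window + 1, step)]


baseline = list(range(800))   # placeholder for one 800-point baseline segment
segments = sliding_windows(baseline, window=300, step=5)

# Number of segments: (800 - 300) // 5 + 1 = 101
n_segments = len(segments)
```

Neighbouring segments share 295 of their 300 samples, so they are strongly correlated; this is worth keeping in mind when splitting data into training and test sets.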
3.3.2. Feature Extraction and Selection Strategy
To verify the classification performance of different algorithm architectures, this study designed a dedicated feature extraction process for the pure KAN model. Since the original vibration signals have high dimensionality and strong non-stationarity, directly inputting them into the KAN model would create tremendous computational pressure. Therefore, this study first converts the complex waveform signals into representative low-dimensional numerical indicators through feature extraction methods.
This study processes the original signals in both the time and frequency domains, extracting 20 initial features, including variance, root mean square value, peak factor, and energy distribution in specific frequency bands. For specific feature names and calculation formulas, see Table 1.
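To make the feature pool concrete, the sketch below computes a handful of such indicators for one segment. The formulas are the common textbook definitions and are assumed here, since Table 1 is the authoritative source; the function names, the 40–60 Hz band, and the 50 Hz test tone are hypothetical. A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def time_features(x):
    """Variance, RMS, peak (crest) factor, and skewness of one segment."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    std = math.sqrt(var)
    skew = (sum((v - mean) ** 3 for v in x) / n) / std ** 3 if std > 0 else 0.0
    return {"variance": var, "rms": rms,
            "peak_factor": peak / rms if rms > 0 else 0.0, "skewness": skew}


def power_spectrum(x, fs):
    """One-sided power spectrum via a naive DFT (O(n^2), fine for short windows)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coef) ** 2)
    return freqs, power


def frequency_features(x, fs, band):
    """Frequency center of gravity and share of energy inside a band [lo, hi)."""
    freqs, power = power_spectrum(x, fs)
    total = sum(power)
    if total == 0:
        return {"frequency_centroid": 0.0, "band_energy_ratio": 0.0}
    lo, hi = band
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    in_band = sum(p for f, p in zip(freqs, power) if lo <= f < hi)
    return {"frequency_centroid": centroid, "band_energy_ratio": in_band / total}


fs = 200.0                                    # sampling frequency used in the paper
tone = [math.sin(2 * math.pi * 50.0 * t / fs) for t in range(200)]  # test signal
tf = time_features(tone)
ff = frequency_features(tone, fs, band=(40.0, 60.0))
```

For a pure 50 Hz tone the RMS is about 0.707, the peak factor about 1.414, and the frequency center of gravity sits at 50 Hz, which gives a quick sanity check before the functions are applied to field data.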
Because redundant or weakly relevant indicators may degrade the function approximation of the KAN model, this paper designs a multi-criteria evaluation mechanism to screen features. The Pearson correlation coefficient measures the degree of association between each feature and the concrete vibration state, and analysis of variance quantifies how strongly each feature varies across working conditions, thereby assessing its effectiveness as a classification criterion. The features are ranked by their comprehensive scores, and those above a chosen threshold are retained as the core features. This approach reduces data dimensionality while keeping the model focused on key physical features, and it enhances the sensitivity of the system to hazardous operational signals, such as ground impacts and high-frequency idling, thereby laying a robust foundation for the equipment’s active protection logic.
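One way such a screening step could look is sketched below. The Pearson coefficient and the one-way ANOVA F-statistic follow their standard definitions, but the way the two are combined (an equal-weight sum of the absolute correlation and the F-statistic scaled by its maximum) is purely an assumption, because the paper does not state its scoring formula. The feature values are toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a feature grouped by class label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    # small constant keeps the ratio finite when a feature separates perfectly
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def rank_features(features, labels, top_k):
    """Rank features by an ASSUMED combined score: 0.5*|r| + 0.5*F/F_max."""
    r = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    f = {name: anova_f(vals, labels) for name, vals in features.items()}
    f_max = max(f.values()) or 1.0
    score = {name: 0.5 * r[name] + 0.5 * f[name] / f_max for name in features}
    return sorted(score, key=score.get, reverse=True)[:top_k]


labels = [0, 0, 1, 1, 2, 2]                       # toy state labels
features = {
    "rms":   [1.0, 1.1, 2.0, 2.1, 3.0, 3.1],      # tracks the state
    "noise": [0.5, -0.5, 0.4, -0.4, 0.5, -0.5],   # unrelated to the state
}
selected = rank_features(features, labels, top_k=1)
```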
3.3.3. Data Normalization
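To prevent differences in physical dimensions from affecting model training, and to allow the loss function to converge faster, Min–Max normalization was applied to all input feature data, scaling them to the [0, 1] range:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original feature value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of that feature, and $x'$ is the normalized value.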
The normalization method eliminates differences in magnitude while fully preserving the original distribution of the signal itself, which is conducive to improving the numerical stability of the model during backpropagation.
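A minimal sketch of this scaling for a single feature column follows. The guard for a constant column is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_normalize(column):
    """Scale one feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


scaled = min_max_normalize([2.0, 4.0, 6.0])
```

In a deployed system the minimum and maximum would normally be computed once on the training data and reused for incoming samples, so that training and inference see the same scale.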
3.4. Concrete Vibration State Recognition Algorithm Proposed in This Paper
To bridge the gap between construction quality control and active equipment protection, this paper proposes a hybrid recognition framework that leverages the feature extraction strength of CNNs and the precise nonlinear mapping of Kolmogorov–Arnold Networks (KAN). Because the sensor provides long-duration signals, direct end-to-end learning in KAN could impose excessive computational overhead on edge nodes. Therefore, this framework utilizes a CNN front end to refine deep features, which are then processed by a KAN-based classification head to achieve millisecond-level decision-making. The overall structure is shown in Figure 6.
The model abandons the traditional fully connected approach at the end of CNNs and instead incorporates a KAN module based on the Kolmogorov–Arnold representation theorem as the classification core. The CNN front end extracts deep feature vectors directly from raw vibration time-series signals through multilayer 1D convolution and pooling operations. The back-end KAN module uses its learnable B-spline basis functions to fit a nonlinear mapping from the deep features output by the CNN front end to the state classes. Compared to traditional architectures, CNN-KAN employs learnable activation functions on the “edges,” enabling a more detailed depiction of high-order nonlinear relationships within the data. This cascaded approach significantly reduces the number of model parameters while improving the model’s classification accuracy for working conditions with similar morphologies, such as the “over-vibrated” and “compacted” states.
To optimize the decision accuracy of multi-class classification tasks, this algorithm employs the cross-entropy loss function, whose mathematical expression is as follows:
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{i,c}\,\log P_{i,c}$$
Here, N represents the number of samples, C represents the number of classes, Y is the one-hot encoding of the true labels, and P represents the predicted probability distribution of each class by the model. For parameter optimization, the Adam optimizer is used to ensure fast convergence. The final output of the CNN-KAN model is integrated into the equipment’s control logic: when the predicted probability indicates a hazardous state (e.g., ground impact or idle vibration), the system triggers the industrial-grade alarm module immediately. This closed-loop design ensures that the equipment can autonomously mitigate mechanical wear while maintaining high-precision monitoring of the concrete vibration state.
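The loss and the alarm decision can be sketched together. The cross-entropy below averages over samples, which is the usual convention and an assumption here; the alarm rule, the hazard class set, and the 0.8 probability threshold are hypothetical, since the paper does not specify how the predicted probability is turned into a trigger.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot):
    """Mean cross-entropy over N samples and C classes."""
    loss = 0.0
    for p_row, y_row in zip(probs, one_hot):
        for p, y in zip(p_row, y_row):
            loss -= y * math.log(max(p, 1e-12))   # clamp to avoid log(0)
    return loss / len(probs)


def should_alarm(probs, class_names, hazard_classes, threshold=0.8):
    """HYPOTHETICAL rule: alarm when the top class is hazardous and confident."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] in hazard_classes and probs[best] >= threshold


classes = ["F1", "F2", "F3"]        # no-load states defined in Section 4.2.1
hazards = {"F3"}                    # assumed: ground contact counts as hazardous
alarm = should_alarm(softmax([0.2, 0.1, 4.0]), classes, hazards)
```

Requiring a confidence threshold in addition to the top class is one simple way to avoid nuisance alarms during state transitions.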
4. Analysis of Experimental Results
4.1. Experimental Setup
To enable the experimental data to more accurately reflect the nonlinear variation characteristics of vibration signals in complex construction environments, all data collection for this study was conducted at actual concrete pouring sites. As shown in Figure 7, during the experiment, the high-precision wireless triaxial accelerometer was securely fastened to the end of the vibrator using straps. This installation method effectively prevents signal attenuation during transmission and ensures the sensor can move synchronously with the vibration source.
Addressing the complex electromagnetic environment caused by high-density steel reinforcement mesh and high-power motors at construction sites, the experiment selected sensor nodes with strong anti-interference capabilities. For specific parameter configurations, please refer to Table 2.
The sampling frequency was set to 200 Hz to balance signal fidelity with computational efficiency, ensuring that the embedded Raspberry Pi unit can acquire high-quality vibration data without causing buffer overflows. Beyond the hardware configuration, the software ecosystem of the intelligent system was implemented entirely in Python. Model training and evaluation were conducted using the PyTorch 2.2.2 and scikit-learn libraries. For hardware-level execution, the lgpio and spidev libraries were employed to manage sensor data acquisition, while PostgreSQL (via psycopg2) provided a robust local persistence layer for real-time data storage. The monitoring and feedback interface was constructed using the Flask and SocketIO frameworks to ensure low-latency data visualization and instantaneous acoustic alerting during the construction process.
4.2. Dataset Construction
4.2.1. State Definitions and Classifications
The data labeling standards for this experiment strictly adhere to the physical evolution mechanisms of concrete vibration dynamics. To facilitate subsequent model input and result analysis, the experimental data are categorized into the vibratory loading dataset DATASET1 and the no-load vibration dataset DATASET2, and symbolic definitions are provided for the specific physical states in each set.
For the vibration operation condition dataset DATASET1, states are discriminated by observing the envelope evolution of the signal amplitude and whether its frequency components change abruptly. The idle reference state, in which the vibrator rod does not contact the concrete, is defined as D1. When the vibrator is inserted into the concrete but the vibration time is insufficient, the signal amplitude is significantly suppressed by strong dry friction between coarse aggregates; this under-vibrated condition is marked as D2. As the vibration continues, air bubbles are expelled from the concrete and the slurry rises to the surface; the compacted state, in which energy transfer efficiency reaches its optimum, is defined as D3. If the vibration time is too long, concrete segregation occurs, and aggregate settling reduces damping, so the waveform exhibits low-damping, high-amplitude “liquefaction” oscillation characteristics. This over-vibrated state is classified as D4.
For the DATASET2 of empty vibration conditions, classification is based on the characteristics of the acceleration vector distribution derived from Newton’s Second Law. When the vibrator is placed horizontally, the acceleration due to the vertical component of gravity (Y-axis) approaches zero. This horizontal no-load vibration condition is denoted as F1. When the vibrator is suspended vertically, the signal exhibits periodic sinusoidal oscillations dominated by gravity. This vertical free-vibration state is denoted as F2. When the vibrator head intermittently strikes the ground, the instantaneous bearing capacity and rigid collision of the ground cause the signal to contain numerous non-steady spike pulses, with amplitude fluctuating violently. This ground-contact and no-load vibration state is denoted as F3.
This study first anchors the effective time intervals for each state based on the previously mentioned dynamic characteristics. Subsequently, a continuous stable waveform segment of 2 s in length (totaling 800 sampling points) is extracted to serve as the raw physical baseline for subsequent model training.
4.2.2. Construction of Time-Series Datasets
To accommodate the input requirements of the CNN and CNN-KAN models and reduce the risk of overfitting on small samples, the experiment employed the sliding window technique to augment the benchmark data, using a window width of 300 samples and a sliding step size of five samples. This process constructs raw time-series datasets directly applicable to deep learning model training. The details of the sample distribution are shown in Table 3.
4.2.3. Feature Set Construction
In the comparative experiment, the KAN model cannot directly process the raw triaxial time-series signals due to its network architecture. Therefore, this study performed feature transformation on the samples obtained through sliding window sampling. First, features were extracted from the original signal in both the time domain and the frequency domain, establishing a preliminary pool of 20 features, including variance, root mean square value, peak factor, and energy in specific frequency bands. The specific names and mathematical definitions of the indicators are detailed in Table 1. To quantify the representational capability of each feature for different operating conditions and optimize computational efficiency, a dual quantitative assessment of feature efficacy was conducted using Pearson’s correlation coefficient and analysis of variance.
During the specific screening process, the study conducted a quantitative analysis of the performance of each feature across two independent recognition tasks. Taking the task of identifying vibration states as an example,
Figure 8 shows the comprehensive correlation score ranking between the initial features and the four states of concrete. From the distribution pattern of the data in the figure, it can be observed that the root mean square value and skewness demonstrate distinct advantages in characterizing signal energy evolution and waveform asymmetry. These indicators, positioned at the top of the scoring table, have become key physical criteria for distinguishing between conditions with particularly similar physical characteristics, such as dense compaction and overcompaction.
For the task of identifying types of no-load vibration, Figure 9 documents the feature selection process and the distribution of scores. When processing touchdown signals with transient impacts, frequency-domain metrics such as the frequency center of gravity carry extremely high discriminative weight. This indicates that such metrics sensitively capture the spectral fluctuations caused by random collisions, which, from a dynamic perspective, provides crucial support for the detailed classification of complex no-load vibration states.
By comparison, it can be seen that despite minor differences between tasks, core metrics such as root mean square value, skewness, and frequency center of gravity all exhibit high discriminative weights across both task categories. Based on these results, the experiment ultimately selected seven of the most representative core features from the initial 20 indicators to form the input vector. This operation reduces the feature dimension by 65% while preserving over 95% of the critical physical information, thereby constructing a feature dataset specifically tailored for KAN model training, as detailed in
Table 4.
4.3. Evaluation Criteria
To quantitatively evaluate the performance of the CNN-KAN model in concrete vibration state recognition across multiple dimensions, this paper selects accuracy, precision, recall, and F1-score as core evaluation metrics. These metrics collectively form a comprehensive evaluation framework that assesses the model’s overall performance, prediction accuracy, classification completeness, and robustness under category imbalance. The relevant performance evaluation metrics and their mathematical expressions are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
The accuracy metric measures the proportion of correct predictions made by the model across the entire dataset. Precision focuses on quantifying the proportion of true positives among samples predicted as positive, thereby reflecting the accuracy of the model’s decision-making. Recall focuses on the proportion of true positives that are correctly retrieved, serving as a measure of the system’s identification coverage capability. Considering that data from construction sites may exhibit characteristics of uneven category distribution, this paper introduces the F1-score as the harmonic mean of precision and recall to comprehensively evaluate a model’s classification robustness under complex operating conditions.
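For a multi-class task these metrics are typically computed per class from the confusion matrix, treating each class in turn as the positive class. The sketch below does this; the 2×2 matrix is made-up illustration data, not a result from the paper.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j.

    Returns overall accuracy and a per-class precision/recall/F1 report.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    report = {}
    for c in range(n_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n_classes)) - tp   # predicted c, wrong
        fn = sum(cm[c]) - tp                                # true c, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


cm = [[8, 2],     # illustrative counts only
      [1, 9]]
accuracy, report = per_class_metrics(cm)
```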
In addition to the quantitative indicators mentioned above, this experiment also incorporates multiple visualizations for in-depth analysis. ROC curves depict the relationship between the true positive rate and the false positive rate at different thresholds, serving to evaluate the model’s overall classification performance. The confusion matrix visually illustrates the misclassification patterns across different categories. Additionally, the variation in loss and accuracy over training epochs is examined to compare the convergence speed, training stability, and potential overfitting risks of the different algorithmic architectures.
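The construction of an ROC curve and its area can be sketched as follows for a single class treated one-vs-rest, which is a common way to extend ROC analysis to multi-class problems and is assumed here. The scores and labels are toy values.

```python
def roc_curve(scores, labels):
    """Sweep the decision threshold from high to low; labels are 0 or 1.

    Returns the false positive rates and true positive rates of the curve.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for idx, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # emit a point only after the last sample of a tied score group
        if idx + 1 == len(order) or scores[order[idx + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy one-vs-rest example: 1 = sample belongs to the class of interest.
perfect = auc(*roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
partial = auc(*roc_curve([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An area of one means every positive sample is scored above every negative sample; in the second toy case three of the four positive-negative pairs are ordered correctly, giving 0.75.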
In the evaluation process, all experimental data are divided into training and test sets according to established standards to ensure the objectivity and comparability of experimental results under equivalent benchmarks.
4.4. Comparative Experiment
To systematically validate the performance of the CNN-KAN model, comparative experiments were conducted on two independent datasets: a vibration test dataset and a no-load vibration test dataset. The experiment conducts a comparative evaluation of the proposed CNN-KAN against CNN and KAN. The following sections will detail model configurations, data characteristics, and specific recognition results across different experimental scenarios.
4.4.1. Concrete Vibration Condition Test
The core objective of this experiment is to verify whether the proposed model can accurately identify the four key states of concrete vibration compaction: empty vibration, under-vibration, vibration compaction, and over-vibration. For this task, this paper constructs the corresponding comparison sequences. The specific parameter configurations and hyperparameter settings for each model are shown in Table 5.
Based on the configuration scheme, the three models exhibit distinct mapping logic when processing vibration signals. The CNN module employs convolution operators to achieve hierarchical dimensionality reduction in the original sequence and capture local features. The KAN module leverages the approximation advantages of B-spline functions to decouple complex high-order nonlinear relationships within the feature space. Given this situation, CNN-KAN employs a tandem coupling approach to construct a classification system that progresses from automatic feature refinement to nonlinear discrimination.
The raw vibration waveforms for each vibration state are shown in Figure 10. The evolution of the raw vibration waveform across four states can be clearly observed: from the regular oscillations of the empty state D1, to the suppressed low amplitude of the under-vibrated state D2, then to the enhanced energy of the compacted state D3, and finally to the high-frequency oscillations of the over-vibrated state D4. The core challenge is that the waveforms of the compacted state D3 and the over-vibrated state D4 appear remarkably similar in later stages, making it easy for simple classification methods to misidentify them. The proposed CNN-KAN architecture utilizes front-end convolutional layers to extract local features, combined with the B-spline fitting capability of the back-end KAN, to capture minute damping and frequency variations. This overcomes the bottleneck where linear classifiers struggle to distinguish subtle differences.
This design advantage is particularly evident in the model’s convergence. As shown in the training and testing loss curves recorded in Figure 11, the CNN-KAN model exhibited a significant decline in loss values within the first 10 epochs and quickly stabilized at a low level. Its convergence trajectory markedly outperforms both the descent path of traditional CNN and the slow convergence of KAN.
The trend in test set accuracy over iterations further supports the above conclusion, as shown in Figure 12. Although the baseline CNN achieves high accuracy, its curve exhibits a step-like ascent accompanied by persistent fluctuations in the 0–50 epoch range, and its corresponding loss curve descends significantly slower than that of CNN-KAN. Such instability indicates that the traditional CNN is highly sensitive to parameter settings, primarily due to the high parameter density in its fully connected layers, which complicates gradient convergence when processing non-stationary vibration signals. The KAN model, in turn, learns more slowly than the other two models. In contrast, the CNN-KAN model demonstrates superior learning efficiency from the outset of training, rapidly surpassing the 90% accuracy threshold while maintaining stable performance, because the learnable B-spline functions on the edges of the KAN module achieve smoother local fitting and faster optimization. This “fast convergence, high retention” characteristic means that in practical engineering applications, the model can significantly reduce training time costs while demonstrating greater robustness to data perturbations.
To verify the model’s reliability under random conditions,
Figure 13 shows the accuracy distribution from ten independent replicate experiments. Unlike the relatively divergent distribution characteristics of the CNN and KAN models, the experimental results of the CNN-KAN model exhibit high concentration, presenting a sharp peak morphology with significantly compressed standard deviation. This extremely low result dispersion indicates that the hybrid architecture effectively counteracts the uncertainty introduced by random initialization through complementary advantages, ensuring high reproducibility of the algorithm in non-stationary construction signal processing.
Figure 14 illustrates the detailed classification performance of the three models on the test set. Comparing the prediction distributions across categories reveals the decision-making differences among the architectures when handling hard-to-classify samples.
The CNN model on the left achieved perfect classification for labels 0 and 2, but encountered a significant bottleneck in recognizing label 3, with a prediction accuracy of only 81%. It is worth noting that the CNN model exhibits specific structural misclassifications, incorrectly categorizing 17% of label 3 samples as label 1. Physically, this stems from the fact that over-vibration (label 3) and compacted (label 2) states exhibit extremely high similarity in vibration waveforms during the late stages of liquefaction, functioning as “twin signals”. This cross-category misclassification highlights an issue where traditional convolutional networks experience feature aliasing when capturing these subtle damping offsets and frequency drifts, making it difficult to precisely distinguish the physical differences between non-adjacent operating conditions. The KAN model on the right exhibits a completely different pattern of misclassification, with its primary errors concentrated between adjacent categories. Although the model performs relatively stably on label 0, it exhibits significant classification ambiguity across the intermediate categories. The CNN-KAN model demonstrates superior discriminative power, particularly in distinguishing between “compacted” (label 2) and “over-vibrated” (label 3) states. By utilizing the learnable B-spline basis functions within the KAN module to accurately disentangle high-order nonlinear features, the hybrid architecture effectively resolves overlapping feature boundaries that traditional networks overlook. The significant reduction in misclassification compared to baseline models confirms that the hybrid architecture can effectively capture the subtle damping shifts required for autonomous construction quality control.
The classification performance is further illustrated by the ROC curves in Figure 15. The CNN-KAN curve rapidly approaches the ideal upper-left corner for every category, with area-under-curve (AUC) values approaching one. This near-perfect result demonstrates the model’s high discriminative power in feature space and indicates its ability to adapt to the diverse and complex conditions found in construction environments. Compared with models whose class features overlap more widely, it shows significantly greater potential for engineering applications.
4.4.2. Experimental Investigation of the Internal Vibrator Under No-Load Conditions
This experiment focuses on three common abnormal operating conditions: horizontal no-load vibration, vertical no-load vibration, and ground-contact no-load vibration. It aims to verify whether the model can classify accurately when confronted with unstable impact signals. Because no-load vibration signals may contain transient pulses, the model structure was adapted accordingly. The specific parameter configurations are shown in Table 6.
The baseline CNN model retains the standard two-layer small-kernel convolution architecture (kernel size 3, stride 1), which places greater emphasis on preserving the temporal information of the original signal. In contrast, the proposed CNN-KAN architecture adopts a down-sampling convolution strategy with a stride of 2, combined with max-pooling layers, to achieve greater feature compression. The design uses the dimensionality reduction of the convolutional layers along the temporal dimension to isolate high-frequency ground-contact collision signals from redundant information. The resulting features are then fed into a KAN with a 4000-dimensional input for end-to-end nonlinear discrimination, enhancing the system’s ability to identify high-frequency abrupt changes within ground-contact signals.
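The temporal compression described above can be checked with the standard output-length relation for an undilated 1-D convolution or pooling layer, L_out = floor((L_in + 2p - k) / s) + 1. The sketch below applies it to two layer stacks. The input length (1000 samples), channel count (16), padding, and pooling size are hypothetical values chosen only so that the flattened dimension matches the 4000-dimensional KAN input mentioned in the text; they are not the Table 6 settings.

```python
def out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling layer (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def flattened_dim(length, layers, channels):
    """Flattened feature size after a stack of (kernel, stride, padding) layers."""
    for kernel, stride, padding in layers:
        length = out_len(length, kernel, stride, padding)
    return length * channels


# All numbers below are assumptions for illustration only.
INPUT_LEN = 1000   # hypothetical samples per window
CHANNELS = 16      # hypothetical channel count of the last conv layer

# Baseline-style stack: two convolutions, kernel 3, stride 1, padding 1.
baseline = [(3, 1, 1), (3, 1, 1)]
# Down-sampling stack: one convolution with stride 2, then max-pooling (2, 2).
downsampling = [(3, 2, 1), (2, 2, 0)]

print(flattened_dim(INPUT_LEN, baseline, CHANNELS))      # 16000
print(flattened_dim(INPUT_LEN, downsampling, CHANNELS))  # 4000
```

Under these assumed values, the stride-1 stack keeps all 1000 time steps, whereas the stride-2 convolution followed by pooling reduces them to 250, a fourfold compression of the features passed to the KAN.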
Figure 16 shows that the three no-load vibration states differ distinctly in their time-domain waveforms. In the horizontal no-load state, the forces are in equilibrium, so only small-amplitude, steady background noise is observed along the Y-axis. Vertical no-load vibration is primarily influenced by gravity, and its waveform exhibits regular sinusoidal oscillations. Ground-contact vibration is far more complex: high-frequency, random collisions between the rod tip and the ground produce numerous unstable spike pulses and abrupt energy fluctuations. The CNN-KAN architecture adapts well to such signals with transient impacts. Its front-end down-sampling convolutional layer captures the transient high-frequency features generated by collisions, while the back-end KAN uses nonlinear mapping to analyze the abrupt dynamic changes embedded in the waveform.
In terms of training dynamics, Figure 17 records the loss trajectories of the different models. When processing raw signals containing transient impacts, the KAN model shows persistent fluctuations in its loss curve because it lacks any preprocessing of high-frequency features, and its convergence speed is significantly constrained. In contrast, the CNN-KAN loss curve is smooth overall and shows no signs of oscillation. This convergence advantage indicates that the front-end CNN module in the hybrid architecture acts as a filter, suppressing the redundant noise generated by ground-contact collisions. It thereby provides the back-end KAN with feature inputs that have a high signal-to-noise ratio and strong discriminative power, ensuring a stable and efficient gradient descent process.
The change in test accuracy over training iterations is shown in Figure 18. The CNN-KAN model learns rapidly: its accuracy climbs steeply within a few epochs and then stabilizes at a plateau, with a compact and steady overall trajectory. In contrast, the comparison models rise more slowly and continue to fluctuate to varying degrees even during the stabilization phase. This “fast and stable” learning behavior indicates that the hybrid architecture extracts features more efficiently. The hybrid model quickly identifies key factors within non-stationary signals, significantly reducing the number of training cycles required to reach optimal performance.
To further quantify the model’s generalization stability under random disturbances, Figure 19 shows the accuracy distribution across ten independent replicate experiments. The relatively broad distributions of the CNN and KAN models indicate that their performance is significantly influenced by parameter initialization. In contrast, the accuracy histogram of CNN-KAN is the narrowest and most concentrated. This very low variance demonstrates the robustness of the hybrid architecture: even when confronted with pulse signals of a random nature, the model delivers highly consistent classification results, indicating strong engineering reproducibility.
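The spread across replicate runs can be summarized by the mean, standard deviation, and range of the per-run accuracies. The sketch below assumes the sample standard deviation (n - 1 denominator); the text does not state which estimator was used, and the accuracy values are hypothetical placeholders, not the measured results.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation, range) of per-run accuracies."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std, n - 1 denominator (assumption)
    spread = max(accuracies) - min(accuracies)
    return mean, std, spread


# Hypothetical accuracies (%) from ten independent runs, for illustration only.
runs = [97.1, 97.8, 98.2, 97.5, 97.9, 98.0, 97.3, 97.6, 98.1, 97.7]

mean, std, spread = summarize_runs(runs)
print(f"mean = {mean:.2f}%, std = {std:.2f}%, range = {spread:.2f}%")
```

A narrower histogram in a figure of this kind corresponds to a smaller `std` and `spread`, which is the sense in which a model is less sensitive to parameter initialization.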
Figure 20 shows the model’s strong performance in identifying “ground impact” (label 2), where it achieves near-perfect accuracy. From the perspective of equipment protection, this capability allows the intelligent system to detect high-energy collision pulses promptly. Unlike the baseline CNN and KAN models, which exhibit blurred decision boundaries for transient impacts, CNN-KAN filters out redundant noise, so that the active protection mechanism triggers the alarm only under genuinely hazardous conditions. This high sensitivity to impact signals is fundamental to extending the service life of the vibrating rod on chaotic construction sites.
Figure 21 further quantifies the classification performance of each model at different thresholds using ROC curves. The CNN-KAN curves rise steeply, almost vertically, for all categories and rapidly converge toward the ideal classification point in the upper-left corner. This shows that the model maintains an exceptionally high true positive rate while keeping the false positive rate near zero. By comparison, the curves of the CNN and KAN models are slightly flatter at the inflection points, indicating a remaining margin of uncertainty for samples with lower confidence. Numerically, CNN-KAN achieves AUC values approaching one for all categories, which reaffirms the advantages of its feature extraction and nonlinear mapping capabilities. The model not only compensates for the weaknesses of a single model in specific categories but also constructs a decision space with strong discriminative capability across the entire range. As a result, in complex and dynamic construction site environments, both regular periodic vibrations and random impact signals can be accurately identified and separated.
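For a multi-class classifier, per-category AUC values of this kind are commonly obtained in a one-vs-rest fashion; the text does not state the exact procedure, so that choice is an assumption here. The sketch computes AUC through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one) on hypothetical toy scores.

```python
def binary_auc(scores, is_positive):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probabilities, labels, n_classes):
    """Per-class AUC: class c is treated as positive, all others as negative."""
    return [
        binary_auc([row[c] for row in probabilities], [y == c for y in labels])
        for c in range(n_classes)
    ]


# Hypothetical predicted probabilities for three classes (NOT the paper's data).
labels = [0, 0, 1, 1, 2, 2]
probabilities = [
    [0.8, 0.1, 0.1],
    [0.3, 0.5, 0.2],  # a class-0 sample scored higher for class 1
    [0.2, 0.7, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]

print(one_vs_rest_auc(probabilities, labels, n_classes=3))
```

An AUC of one for a category means every sample of that category is scored above every sample of the other categories, which corresponds to a ROC curve passing through the upper-left corner.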
The experimental results validate that CNN-KAN achieves a synergistic balance between feature refinement and efficient mapping, with superior performance in identifying transitional states whose physical boundaries are blurred. A key advantage of this hybrid architecture is its high parameter efficiency and high-order nonlinear fitting capability, which allow the model to maintain high recognition accuracy even under the strict computational constraints of low-power edge nodes. Consequently, the framework is particularly well-suited to the intelligent upgrading of construction machinery where millisecond-level response and high-fidelity monitoring are required. Its robustness to data perturbations and fast convergence provide a reliable technical roadmap for real-time automated construction quality control in complex site environments.
4.5. Discussion
Combining the experimental results from both datasets, CNN-KAN outperforms the single-architecture models (CNN and KAN) across all performance metrics. As shown in Table 7, it achieves higher recognition accuracy, precision, recall, and F1-score, demonstrating strong robustness and classification stability.
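The four metrics listed above can all be derived from a confusion matrix. The sketch below computes per-class precision, recall, and F1-score together with overall accuracy, and then takes the unweighted (macro) average; whether the reported values are macro- or weighted-averaged is not stated in the text, so macro averaging is an assumption, and the matrix is a hypothetical toy example.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) per class; rows of cm are true labels."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 over classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical 3-class confusion matrix (NOT the paper's results).
cm_example = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 1, 49],
]

print(accuracy(cm_example))
print(macro_average(per_class_metrics(cm_example)))
```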
In the vibration type identification task, the CNN-KAN model achieved a test accuracy of 97.55%, a 3.35 percentage point improvement over the pure CNN model. Its core strength lies in a structural design that integrates the hierarchical feature extraction of the CNN with the precise nonlinear mapping of the KAN. Under actual working conditions, the waveforms of “over-vibration” and “vibration compaction” in concrete appear remarkably similar, and traditional models often struggle to capture the subtle nonlinear differences between them. CNN-KAN uses learnable B-spline functions, which are fitted locally from adjacent data points, to capture more precisely the damping shifts caused by varying degrees of concrete liquefaction. This gives it superior discrimination in conditions that are prone to confusion.
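To make the role of the B-spline functions concrete, the sketch below evaluates B-spline basis functions with the Cox-de Boor recursion and forms a single KAN-style edge function as a weighted sum of those bases. It is a simplified illustration, not the authors' implementation: the knot grid, degree, and coefficient values are hypothetical, and the residual base activation that KAN implementations often add alongside the spline is omitted.

```python
def bspline_basis(i, degree, knots, x):
    """Cox-de Boor recursion: value of the i-th B-spline basis of given degree at x."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (x - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, x)
    right = 0.0
    if right_den != 0:
        right = ((knots[i + degree + 1] - x) / right_den
                 * bspline_basis(i + 1, degree - 1, knots, x))
    return left + right


def kan_edge(x, coefficients, knots, degree=3):
    """Spline part of one KAN-style edge: sum_i c_i * B_i(x), with trainable c_i."""
    return sum(c * bspline_basis(i, degree, knots, x)
               for i, c in enumerate(coefficients))


# Hypothetical uniform knot grid; with degree 3 it defines 11 - 3 - 1 = 7 bases,
# and the bases sum to one for x in [0, 4).
knots = list(range(-3, 8))
coefficients = [0.2, -0.5, 1.0, 0.3, -0.1, 0.7, 0.4]  # illustrative, not trained

print(kan_edge(1.5, coefficients, knots))
```

Because each cubic basis is nonzero over only four knot intervals, changing one coefficient alters the function only locally. This local control is what lets such a function adjust its response in one narrow input range without disturbing its behavior elsewhere.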
In the no-load condition identification task, the model achieved a best accuracy of 98.17%, with a standard deviation of only ±0.53% across repeated experiments, demonstrating high consistency and reliability. The experimental analysis shows that the KAN model has superior approximation capability for small-scale, high-dimensional features. However, when it is given raw features that have not been processed by deep CNN layers, its noise robustness is slightly inferior to that of the hybrid architecture.
The tandem CNN-KAN architecture enables the equipment to maintain high-precision monitoring under variable loads while responding to sudden pulses (such as ground strikes) with consistent reliability (standard deviation of ±0.53%). Overall, this research provides a viable technical pathway for a new generation of intelligent construction tools that possess both “construction state awareness” and “self-protection capabilities,” optimized for execution on low-power embedded devices. From an industrial feasibility perspective, the estimated hardware cost of the smart integrated module is approximately 800 RMB, which facilitates large-scale deployment on standard vibratory machinery. Laboratory evaluations indicate that, by reducing human judgment errors, this real-time monitoring and warning system can significantly improve the construction process acceptance rate (compaction qualification rate). Furthermore, the Raspberry Pi 5 platform offers substantial hardware expandability through its GPIO and RS485 interfaces. Future development will focus on integrating the system with a variable-frequency drive (VFD) to establish a complete active vibration control mode. In this closed-loop configuration, the equipment will autonomously adjust the motor’s vibration frequency according to the identified liquefaction stage, completing the transition from a pure diagnostic tool to an autonomous process regulator.