1. Introduction
The integrity of security screening in public spaces, such as airports, is a critical component of both national and international safety. Although numerous security measures are implemented, systems reliant on human operators remain vulnerable to error, potentially leading to severe security breaches with significant material and societal consequences. X-ray imaging systems play a central role in these security protocols, particularly for baggage inspection. However, the manual identification of concealed threats within complex environments, such as improvised explosive circuits hidden inside electronic devices like laptops, presents a formidable challenge. This specific task requires a high level of specialized expertise and is inherently prone to human oversight, thereby creating a significant vulnerability in security checkpoints.
In response to these challenges, automated detection systems driven by deep learning have been explored by researchers. However, early investigations that applied conventional deep learning models directly to this problem revealed significant limitations. These limitations include relatively low classification accuracy and a high tendency towards overfitting, largely attributed to the complexity and inherent variations within the X-ray dataset [1]. The overlapping nature of components in X-ray images increases intra-class variation, while the visual similarity between benign laptop circuits and threat items elevates inter-class confusion, making robust classification difficult. To overcome these deficiencies, this study proposes a novel framework centered on feature fusion combined with a Random Weight Network (RWN) for classification. The core hypothesis is that features extracted from multiple and diverse deep learning architectures can provide a richer, more discriminative representation of the input data. By fusing these features and employing an RWN, which is noted for its rapid training and resistance to overfitting, it is anticipated that a more accurate and generalizable classification model can be achieved. This approach addresses the key research questions regarding the performance enhancement that can be achieved through feature fusion and the optimal configuration of the RWN classifier, including the impact of hidden neuron count and activation function.
The main contributions of this work are systematically outlined as follows:
- (a) A Novel Feature Fusion Framework: This study proposes and validates a new framework that integrates features extracted from multiple deep learning models (e.g., ShuffleNet, InceptionV3) and employs a Random Weight Network (RWN) for classification. This multi-source feature fusion strategy marks a significant departure from conventional single-model approaches.
- (b) Significant Performance Improvement: A substantial improvement in classification performance is demonstrated. The proposed feature fusion methodology achieves a test accuracy of 97.44%. This result is markedly superior to both the 83.55% accuracy of the best-performing individual deep learning model, ShuffleNet, and the 94.82% accuracy obtained by classifying features from a single model with an optimized RWN.
- (c) Comprehensive Empirical Analysis: A comprehensive empirical analysis of the RWN-based classifier is conducted. The investigation evaluates the influence of critical hyperparameters, including the number of hidden neurons and the choice of activation functions, providing a clear optimization guide for similar security applications.
- (d) Robustness and Generalization: The robustness and generalization capability of the proposed method are established through a comparative analysis against 11 state-of-the-art machine learning classifiers. The framework is shown to offer superior generalization and effective mitigation of overfitting.
- (e) Publicly Available Dataset: A challenging new dataset of X-ray images, featuring laptops with and without concealed circuits, has been created and made publicly available [1], thereby providing a valuable benchmark for future research in this domain.
Within this framework, the following research questions are posed to articulate the study’s core contributions and key capabilities:
- (a) How does classification with an RWN perform on datasets whose features are extracted by deep learning models?
- (b) Can the combination of features extracted from different deep learning models significantly improve training and test accuracy in classification?
- (c) What are the performance implications of existing deep learning algorithms when applied to X-ray security datasets, and how can these be addressed through feature fusion techniques?
- (d) How does the use of an RWN influence classification performance when compared to standard deep learning models on X-ray datasets?
- (e) Do the orderings of merged features (e.g., N|M and M|N) have a significant effect on classification outcomes in RWN?
- (f) What is the impact of the number of hidden layer neurons on the performance of an RWN, and how can the risk of overfitting be minimized through optimal parameter selection?
- (g) How does the selection of activation functions (sigmoid, tangent sigmoid, or hardlim) affect the classification performance of an RWN, particularly in the context of combined datasets?
The organization of the study is as follows: In Section 1, an introduction to the study is provided, a literature review is presented, and the motivation and contribution of the study are outlined. Section 2 covers feature extraction from deep learning models, feature fusion, and a description of the dataset. In Section 3, experiments are conducted and the results obtained are analyzed. Section 4 discusses the findings, and the study is concluded in Section 5.
X-ray imaging technologies have been used in various aspects of daily life, as well as in fields such as crystallography, astronomy, and medicine, since the discovery of X-rays by Wilhelm Conrad Röntgen. These technologies encompass a wide range of purposes and methods, including traditional transmission methods, dual-energy techniques, and scattered X-ray methods [2]. In these technologies, rays emitted from an X-ray source are attenuated as they pass through objects. This decrease in intensity is utilized to calculate the density (d) and effective atomic number (Zeff) of the materials [3]. Consequently, materials with higher density, which cause greater attenuation, appear brighter in X-ray images, while lower-density materials appear darker. The material information provided by X-ray devices thus supports a wide range of applications, from inspecting welds in industrial settings and identifying bone fractures in medicine to detecting prohibited materials in security-sensitive locations such as airports, courthouses, and shopping malls.
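The attenuation relationship underlying these calculations is commonly modeled by the Beer–Lambert law; the following standard formulation is included here purely for illustration (it is not stated explicitly in the original text):

$$
I = I_0 \, e^{-\mu t}
$$

where $I_0$ is the incident intensity, $I$ the transmitted intensity, $t$ the thickness of the traversed material, and $\mu$ the linear attenuation coefficient. At a given photon energy, $\mu$ depends on both the density and the effective atomic number of the material, which is why dual-energy systems, by measuring attenuation at two photon energies, can estimate d and Zeff separately.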
X-ray images are utilized for the detection of prohibited materials, aiming to minimize security risks at airports through the application of machine learning and image processing techniques. This involves identifying items passengers are forbidden to carry, whether on their person or in their luggage, by analyzing 3D or 2D X-ray images [4]. These applications are typically employed to assist personnel conducting baggage control or to automate the process. This section reviews the literature on X-ray image analysis and feature fusion using deep learning algorithms.
Previously, tasks such as classification in X-ray imaging were performed using manually extracted features, such as SIFT and PHOW, often within a Bag of Words (BoW) framework [5].
In later periods, the success of convolutional neural network (CNN) techniques led to their increased use in this field as well. Akçay et al. [6] implemented transfer learning in CNN using the fine-tuning paradigm. Jaccard et al. [7] detected the presence of threat materials in cargo containers using CNN on image patches. Mery et al. [8] compared methods such as Bag of Words, Sparse Representations, deep learning, and classical pattern recognition schemes. Jaccard et al. [9] also detected cars within cargo using CNN with data augmentation. Rogers et al. [10] used the original dual-energy images as separate channels in their CNN and performed data augmentation with Threat Image Projection. Caldwell et al. [11] investigated transfer learning in different scenarios using deep networks such as VGG. Morris et al. [12] focused on threat detection of traditional explosives using CNNs like VGG and Inception. In addition to these, newly emerged CNN models such as region-based CNNs [13] and single-shot models like YOLO (You Only Look Once) [14] have also been applied in X-ray imaging. Petrozziello and Jordanov [15] performed the detection of steel barrel holes using CNN and Stacked Autoencoder. Cheng et al. [16] used a YOLO-based method they called X-YOLO, which incorporates feature fusion and attention mechanisms. Wu and Xu [17] used hybrid self-supervised learning in the pre-training phase and performed detection with a Head-Tail Feature Pyramid head containing a transformer in its final stage. Wang et al. [18] used a YOLOv8-based method with a dual-branch structure comprising Sobel convolution and standard convolution branches; in the fusion stage, they used a lightweight star operation module.

In addition to classification, deep learning methods have also been employed for data augmentation. Yang et al. [19] performed data augmentation using generative adversarial networks (GANs) and compared the variants (DCGAN, WGAN-GP) using Fréchet Inception Distance scores. Kaminetzky and Mery [20] performed data augmentation using simulated 3D X-ray image models. Caldwell and Griffin [21] performed data augmentation with photographic images, using both photographs and X-ray images of the same object. Apart from these tasks, CNNs have also been used as feature extractors. Benedykciuk et al. [22] addressed the material recognition problem using a multiscale network structure consisting of five subnetworks operating on image patches. Babalik and Babadag [23] used CNNs as feature extractors; in subsequent stages, they selected features with a binary Sparrow Search Algorithm and classified them using Support Vector Machines (SVM) and k-nearest neighbors (KNN).

Methods such as ensemble learning and feature fusion have also been used to improve the performance of deep learning models. Ayantayo et al. [24] proposed three different deep learning models, which used early-fusion, late-fusion, and late-ensemble learning strategies to resist overfitting. Zhang et al. [25] used multi-domain features, employing transfer learning and feature fusion; they used SVM in the feature extraction and model selection stages, then fused these features to perform baby cry detection. Wu et al. [26] performed deep learning-based fault diagnosis for rolling bearings using a new multiscale feature fusion deep residual network containing multiple multiscale feature fusion blocks and a multiscale pooling layer. Liu et al. [27] implemented a multi-modal fusion approach for breast nodule diagnosis that combines two different deep learning models trained on simple clinical information and group images, using logistic regression. Patil and Kirange [28] proposed a method for detecting brain tumors by fusing features from deep networks, such as VGG and Inception, and a shallow CNN. Gill et al. [29] used deep learning methods, including CNN, Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), for image-based classification of fruits using early fusion and late fusion strategies. Deng et al. [30] addressed side-channel attacks in information security using feature extraction with a multi-scale feature fusion mechanism. Al-Timemy et al. [31] utilized Xception and InceptionResNetV2 deep learning architectures to extract features from three different corneal maps; the extracted features were fused into one pool to train conventional machine learning classifiers. Peng and Zhang [32] presented a deep learning network based on multiple feature fusion as well as ensemble learning approaches for the diagnosis and treatment of lung diseases; a deep supervised ensemble learning network was used to combine multiple inducers to improve lung lobe segmentation. Tu et al. [33] proposed a general framework for solving online packing problems using deep reinforcement learning hyper-heuristics, employing feature fusion to combine the visual information of real-time packing with distributional information about the random parameters of the problem. Tan et al. [34] performed component identification using a deep learning network based on coarse-grained feature fusion. Medjahed et al. [35] fused CNNs trained on different modalities, using machine learning (ML) algorithms in the classification phase. Ma et al. [36] proposed a deep dual-side learning ensemble model for Parkinson's disease diagnosis by analyzing speech data; their approach employs a weighted fusion mechanism to integrate multiple models. Alzubaidi et al. [37] detected shoulder abnormalities by training models on images of different body parts within the same domain and performing feature fusion with different machine learning classifiers. Agarwal et al. [38] combined channel-based fusion and model-based fusion to classify chest X-ray images using ResNet50V3 and InceptionV3 models. Li et al. [39] designed a dual-channel feature fusion network with an attention mechanism for detecting distal radius fractures, using a Faster region-based CNN (RCNN) and ResNet50 in its channels.
While a broader overview of deep learning for X-ray analysis is available in the literature [40], the foundation for the current study is the authors' prior work [1]. That paper details the creation of the dataset used herein and presents a comparative performance analysis of 11 different deep learning models.
The dataset presented in [1] is highly challenging due to two primary factors. Firstly, the overlapping nature of internal components in X-ray images leads to high intra-class variation. Secondly, high inter-class similarity, resulting from the visual resemblance between benign laptop circuits and threat circuits, reduces the distinction between the classes. Collectively, these issues caused the models evaluated in the previous work [1] to be prone to overfitting, a problem exacerbated by the dataset's limited size and high complexity. To overcome this issue and improve classification performance, this study proposes using an RWN to fuse the feature representations of the pre-trained models. The RWN's single-stage training was anticipated to make this fusion both fast and resistant to overfitting.
Therefore, the motivation for this study is twofold: to address the identified gap in the literature for this specific security application and to overcome the performance limitations of conventional deep learning models that were observed in our prior work [1].
3. Experiments
All experiments were conducted on a workstation equipped with an AMD Ryzen 9 5950X 16-Core Processor and 64 GB of RAM. The MATLAB 2021a environment was used for all stages of the study, including feature extraction, feature fusion, and the training and evaluation of the classification models.
The feature extraction process requires deep learning models to first be trained on the dataset. A comprehensive performance analysis of 11 such architectures was previously presented in [1]. The results from that study showed that ShuffleNet achieved the highest test accuracy of 83.55%, followed by the InceptionV3 architecture at 81.31%. In this study, the aim was to achieve higher accuracy in classification using features extracted from these architectures. The number of features extracted by each architecture in [1] is presented in Table 2.
The feature counts for the datasets obtained by combining the feature sets given in Table 2 are provided in Table 3.
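At the implementation level, this kind of early fusion reduces to column-wise concatenation of the per-image feature vectors. The following NumPy sketch is purely illustrative: the array shapes and variable names are hypothetical placeholders (not the counts from Table 2), and the study's own implementation used MATLAB 2021a.

```python
import numpy as np

# Hypothetical feature matrices: one row per X-ray image, one column per
# deep feature. The dimensions below are placeholders for illustration only.
feats_shufflenet = np.random.randn(1000, 544)
feats_inceptionv3 = np.random.randn(1000, 2048)

# Fused dataset N|M: concatenate along the feature (column) axis.
fused_nm = np.hstack([feats_shufflenet, feats_inceptionv3])
# The reversed ordering M|N simply permutes the columns.
fused_mn = np.hstack([feats_inceptionv3, feats_shufflenet])

print(fused_nm.shape)  # (1000, 2592)
```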
As seen in Table 3, besides using features from different architectures, the feature set from each architecture was concatenated with itself to investigate the effect of this repetition on classification performance. The datasets were used to conduct classification experiments with RWN, employing 10-fold cross-validation in each experiment. To account for the stochastic nature of the RWN, where input-to-hidden layer weights are randomly assigned, we repeated each 10-fold cross-validation experiment 30 times to obtain statistically robust performance measures. Our investigation focused on two key hyperparameters that govern the RWN's behavior: the number of hidden layer neurons and the choice of activation function. The investigation began by evaluating the impact of the number of hidden neurons, testing the following values: 50, 100, 250, 500, 1000, 2000, and 4585. The value of 4585 was specifically chosen because it matches the number of training samples in each fold of our 10-fold cross-validation. When the number of hidden neurons equals the number of training samples, the hidden layer output matrix (H) becomes square, allowing its inverse to be calculated directly without requiring the Moore–Penrose pseudo-inverse method. Subsequently, the effect of the activation function, the second key hyperparameter, was investigated. For these experiments, the number of hidden neurons was fixed to the value that yielded the best average test accuracy from the previous stage. The activation functions evaluated include tangent sigmoid, sigmoid, sine, hard limit, triangular basis, and radial basis, all of which are commonly used with RWNs.
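For reference, the sketch below shows a minimal NumPy rendering of the RWN training and prediction procedure just described, assuming the standard ELM-style formulation (random, frozen input weights and closed-form output weights). It is illustrative only: the function names and toy data are ours, and the actual experiments were run in MATLAB.

```python
import numpy as np

def train_rwn(X, T, n_hidden=250, seed=0):
    """Minimal RWN sketch: X is (n_samples, n_features), T is one-hot
    targets (n_samples, n_classes). Input weights are random and frozen."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                                   # hidden layer output matrix
    # Output weights via the Moore-Penrose pseudo-inverse. If n_hidden equals
    # the number of training samples, H is square and a plain inverse suffices.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_rwn(X, W, b, beta):
    return (np.tanh(X @ W + b) @ beta).argmax(axis=1)

# Toy usage on synthetic data, standing in for the extracted deep features.
X = np.random.randn(300, 64)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[y]  # one-hot encode the two classes
W, b, beta = train_rwn(X, T, n_hidden=100)
print("train accuracy:", (predict_rwn(X, W, b, beta) == y).mean())
```

Because training amounts to a single linear solve, this single-stage procedure is what gives the RWN its speed relative to iteratively trained networks.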
Firstly, the features given in Table 2 were reclassified using RWN and compared with the results of deep learning methods. Tangent sigmoid was used as the activation function in RWN, and the comparison results are presented in Table 4.
Table 4 reveals a substantial improvement in test accuracy when features extracted from deep learning models are classified using an RWN. Specifically, while the best-performing standalone deep learning model (ShuffleNet) achieved an accuracy of 83.55%, this figure increased to 94.82% when using an RWN with 250 hidden neurons on the features extracted from the same ShuffleNet model. Additionally, the results indicate a trade-off related to the number of hidden neurons: increasing the neuron count improves training accuracy at the cost of decreasing test accuracy. This trend, which is indicative of overfitting, is illustrated for several model architectures in Figure 8.
A similar analysis using classical machine learning algorithms shows that features extracted from ShuffleNet consistently yield the best performance. Both SVM and KNN demonstrated strong generalization, achieving test accuracies of 93.62% and 94.76%, respectively, on the ShuffleNet features, results that are comparable to those of the RWN. In contrast, while the TREE model achieved high accuracy on the training set, its performance dropped significantly on the test set, clearly emphasizing its tendency to overfit. This comparison highlights the superior generalization capabilities of RWN, SVM, and KNN for this classification task.
A combined evaluation of Table 4 and Figure 8 indicates that increasing the number of hidden neurons leads to overfitting, where the network begins to memorize the training data rather than generalizing from it. The results show that setting the number of hidden neurons to 250 provides the best balance, yielding the highest average test accuracy and overall classification performance among the tested values. Given the dramatic decrease in test performance observed with 4585 neurons, a clear sign of severe overfitting, this value was excluded from subsequent experiments on the combined feature datasets. The results of the experiments using the remaining neuron counts on these combined datasets are presented in Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10.
The extensive feature combinations detailed in Table 3 were designed to systematically investigate the principles of an effective fusion strategy. Our analysis of the subsequent classification results (Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13) revealed several key patterns. Firstly, experiments involving self-combination (e.g., ShuffleNet features fused with themselves) demonstrated no significant performance improvement over using the single feature set with an RWN. This critical finding indicates that merely increasing feature quantity is insufficient; feature diversity is a crucial driver of success. Secondly, the most substantial accuracy gains came from a synergistic fusion of features from the top-performing individual models, ShuffleNet and InceptionV3, whose distinct representational strengths complemented each other to create a more robust and discriminative feature space. This synergy proved more impactful than raw feature dimensionality alone, as this combination outperformed fusions with a higher total feature count. Finally, our tests confirmed that the order of feature concatenation (e.g., N|M vs. M|N) had a negligible impact on the final classification outcome. This is to be expected: because the RWN's input-to-hidden weights are drawn independently at random, permuting the input columns leaves the hidden representation statistically unchanged.
A holistic analysis of Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 reveals two key trends. First, duplicating the feature set by merging a dataset with itself does not yield a significant improvement in test performance. Second, as previously noted, increasing the number of neurons in the RWN's hidden layer consistently improves training performance, often at the expense of test performance. Furthermore, a critical finding is that fusing features from the individually best-performing deep learning models, notably ShuffleNet and InceptionV3, leads to the highest classification accuracies. This specific combination consistently produced the top results across different classifiers. Specifically, on the fused ShuffleNet-InceptionV3 feature set, several classifiers achieved high training accuracies, with SVM achieving 99.91%, TREE 99.54%, an RWN with 2000 neurons 99.69%, and KNN 97.87%. The highest test accuracy of 97.43% is achieved when the RWN hidden layer neuron count is set to 500 or 1000 and features extracted from the InceptionV3 and ShuffleNet architectures are combined. As detailed in the overall performance comparison in Table 14, other classifiers like SVM and KNN also achieved high performance on the fused feature sets, though the RWN provided a superior balance of accuracy and efficiency.
Having established the impact of the hidden neuron count, the investigation now shifts to evaluating the influence of the second key hyperparameter: the activation function. To conduct this analysis, the number of hidden neurons was fixed at 250. This value was chosen based on the results in Table 14, as it yielded the highest average test accuracy in the previous experiments. The performances of the different activation functions on the combined datasets are subsequently presented in Table 15, Table 16, Table 17, Table 18 and Table 19. Note that the results for the Tangent Sigmoid function with 250 neurons, which were already presented in Table 7, are not duplicated in this section.
A comparative analysis of the results from Table 7 and Table 15, Table 16, Table 17, Table 18 and Table 19 reveals a clear distinction in the performance of the tested activation functions. The Sigmoid, Tangent Sigmoid, and Hardlimit functions consistently yielded strong and comparable results. In contrast, the Sine, Tribas, and Radbas functions were demonstrably less effective, with their average training and test accuracies remaining below 90%.
The best overall performance was unequivocally achieved using the Sigmoid activation function. On the fused InceptionV3-ShuffleNet feature set, this configuration produced not only the highest training accuracy of 97.74% and a test accuracy of 97.40% but also the highest average training and test accuracies of 93.54% and 92.77%, respectively. A summary of these comparative results is presented in Table 20.
The summary results in Table 20 highlight a clear performance hierarchy among the activation functions. Sigmoid, Tangent Sigmoid, and Hardlimit consistently emerge as the top-performing functions. Conversely, Sine, Tribas, and Radbas demonstrate markedly inferior performance, particularly with respect to their average test accuracies.
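For clarity, the MATLAB-style names used above correspond to the following standard definitions; the NumPy renderings below are a sketch of those formulations, not code from the study.

```python
import numpy as np

# Standard definitions of the activation functions compared above,
# keyed by their MATLAB-style names.
activations = {
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),          # logistic sigmoid (logsig)
    "tansig":  np.tanh,                                     # tangent sigmoid
    "hardlim": lambda x: (x >= 0).astype(float),            # hard limit (step)
    "sine":    np.sin,
    "tribas":  lambda x: np.maximum(1.0 - np.abs(x), 0.0),  # triangular basis
    "radbas":  lambda x: np.exp(-np.square(x)),             # radial basis (Gaussian)
}
```

The bounded, saturating shape shared by the sigmoid, tangent sigmoid, and hard-limit functions is one plausible reason they behave similarly here, while the locally supported tribas and radbas functions respond only near zero input.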
Figure 9 presents the time and disk space usage for tests conducted with individual feature sets, while Figure 10 illustrates the same metrics for tests performed with combined feature sets. The reported values are calculated as the averages of the consumption metrics across all tests for each classifier. When evaluating computational efficiency, the RWN demonstrates a strong and balanced time–performance profile. While its training time is longer than that of lazy learners like KNN, it is significantly faster than SVM. More critically, its testing time is remarkably short, outperforming the much slower KNN. SVM exhibits the longest training time of all classifiers but has a faster testing time than KNN. KNN, as a lazy learning algorithm, has negligible training time, limited to loading instances into memory; however, its testing time is significantly longer due to the need to search for nearest neighbors during inference. TREE, on the other hand, is fast in both training and testing but, as demonstrated earlier, is highly prone to overfitting. In terms of disk usage, RWN is the most economical, requiring the least space, whereas SVM is the most demanding. Notably, when moving from individual to combined feature sets, the disk space consumption for KNN, TREE, and SVM increases significantly, while RWN's usage remains low and consistent.
Therefore, when considering the combined metrics of high classification accuracy, minimal disk space requirements, and a favorable balance of training/testing times, the RWN emerges as the most well-rounded and efficient classifier for this application. While SVM and KNN offer high accuracy potential, they demand greater computational and storage resources. TREE achieves a balanced trade-off between time and resource usage, but its classification performance does not match that of the other classifiers.
Having established the performance of the proposed RWN-based feature fusion framework, the final stage of our analysis compares these results against several state-of-the-art machine learning classifiers. Among these, CatBoost is a gradient boosting algorithm designed to handle categorical data effectively while mitigating overfitting [48]. Decision trees use a tree-like structure to model decisions and their possible outcomes and are widely used for both classification and regression problems [49]. The Gaussian Naïve Bayes algorithm, based on Bayes' theorem, is a probabilistic classifier that calculates the likelihood of different classes and is popular for its effectiveness in classification tasks [50]. Gradient boosting methods enhance model performance by iteratively combining weak learners to create a strong predictive model [51]. KNN is a straightforward yet effective algorithm for classification and regression that assigns class labels by analyzing the nearest k data points in the feature space [52]. LightGBM is a gradient boosting framework optimized for handling large datasets efficiently through distributed learning [53]. Logistic regression is a statistical method used to predict the probability of a dependent variable belonging to a particular category [54]. Random Forests combine multiple decision trees to tackle complex classification problems, improving accuracy and robustness [55]. The Ridge Classifier, an extension of ridge regression tailored for classification tasks, incorporates regularization to address overfitting [56]. SVM finds the most appropriate hyperplane to separate data into different classes and is widely used for both linear and nonlinear problems [57]. XGBoost is a high-performance gradient boosting framework known for its speed and scalability [58]. In Table 21, we present an analysis of the results, highlighting the strengths and generalization capabilities of the proposed approach.
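As an illustration of how such a benchmark can be assembled, the scikit-learn sketch below compares several of the listed classifiers under cross-validation on placeholder data. It is not the authors' pipeline: the data is synthetic, and the boosting libraries (CatBoost, LightGBM, XGBoost) are omitted for brevity, though they expose analogous fit/predict interfaces.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical fused feature matrix and labels (stand-ins, not the real dataset).
X = np.random.randn(500, 256)
y = np.random.randint(0, 2, size=500)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "Ridge": RidgeClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

# Report mean test accuracy and the train-test gap, a simple overfitting signal.
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=10, return_train_score=True)
    gap = scores["train_score"].mean() - scores["test_score"].mean()
    print(f"{name}: test={scores['test_score'].mean():.3f}, gap={gap:.3f}")
```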
The comparative results are presented in Table 21. It is crucial to note the experimental setup for this comparison: to provide a robust benchmark, the state-of-the-art classifiers were trained on the best-performing single feature set (ShuffleNet). Our proposed method, by contrast, was trained on the fused ShuffleNet-InceptionV3 feature set to specifically demonstrate the benefit of feature fusion.
The analysis clearly demonstrates the superiority of the proposed method. The RWN-based fusion approach not only achieves the highest average test accuracy of 97.43% but also exhibits the best generalization capability. This is evidenced by the minimal gap between its training and test accuracies, especially when compared to models like LightGBM and XGBoost, which, despite achieving perfect training scores, show a significant performance drop on the test set, indicating severe overfitting.
4. Results and Discussion
This study systematically investigated the performance of a novel feature fusion framework centered on a Random Weight Network (RWN) classifier. The findings directly address the core research questions posed in the Introduction, demonstrating a clear pathway to overcoming the limitations of conventional deep learning models in this challenging security domain.
The investigation first addressed the performance implications of substituting a standard deep learning classifier with an RWN. The results unequivocally demonstrate a significant performance uplift. For instance, on the features extracted from the best-performing standalone model, ShuffleNet, the test accuracy increased dramatically from 83.55% to 94.82% when an RWN with 250 hidden neurons was employed. This finding confirms that by decoupling feature extraction from classification, the inherent performance limitations of conventional models, namely low accuracy and a high propensity for overfitting on complex X-ray data, can be substantially mitigated.
Building upon this, the study validated its primary hypothesis regarding the efficacy of feature fusion. By combining features from different high-performing architectures, notably ShuffleNet and InceptionV3, the framework achieved a state-of-the-art test accuracy of 97.44%. This result provides a definitive affirmative answer to whether multi-model fusion can significantly enhance classification accuracy, clearly outperforming both standalone models and the RWN applied to single feature sets. This highlights that data diversity, achieved through fusing varied feature representations, is a key driver of performance. In contrast, simply duplicating an existing feature set by merging it with itself yields no significant improvement, reinforcing that the richness of the feature pool is what matters.
The performance of the proposed framework was also found to be critically dependent on the careful tuning of RWN’s hyperparameters. Addressing the impact of hidden layer size, the study revealed a clear trade-off: an excessive number of neurons led to overfitting, while an insufficient number resulted in ineffective learning. Optimal generalization was achieved through a balance, with 250 neurons providing the best average test accuracy across many scenarios, and 500 or 1000 neurons yielding the peak accuracy on the best fused dataset. Similarly, the choice of activation function proved significant. Sigmoid, Tangent Sigmoid, and Hardlimit functions consistently delivered superior and robust performance, with Sigmoid ultimately achieving the best overall results. Conversely, other implementation details, such as the order of feature concatenation (N|M vs. M|N), were found to have a negligible impact on the outcome.
In summary, this study confirms that a modular approach, which involves decoupling feature extraction, employing multi-model feature fusion, and utilizing a well-tuned RWN, is a highly effective strategy. This framework successfully answers the initial research challenges, demonstrating a clear pathway from the 83.55% accuracy of standalone models to the 97.44% achieved through the proposed methodology, thereby establishing a new performance benchmark in this security domain.