1. Introduction
Wireless sensor networks (WSNs) are now an important component of monitoring and data-collection systems. In applications such as smart agriculture, medical telemetry, industrial control, and environmental monitoring, they offer a low-cost way to obtain sensing capability where wired infrastructure would be impractical. A typical WSN deployment comprises hundreds or thousands of low-cost, battery-powered nodes spread over a remote, unattended, and frequently hostile area. The nodes sense local measurements, perform limited local processing, and relay data to a central base station or sink node. Although this design has significant advantages (small size, wireless operation, and the ability to operate independently), it also has severe weaknesses. Because no on-site personnel guard them, sensor nodes are easy targets for physical and network-layer attacks. Despite their broad range of applications, WSNs therefore face serious security challenges arising from their limited computational power, memory capacity, and energy, and from their lack of physical protection.
Of all the attacks known to target WSNs, node clone attacks, also known as node replication attacks, are among the most destructive and the most difficult to counter. In such an attack, an adversary physically captures a deployed sensor node, extracts its stored credentials (node ID and cryptographic keys), and produces several physically distinct copies carrying identical credentials. Once placed at different positions in the network, these replicas interact with other nodes exactly as genuine nodes do and relay information to the base station, which accepts them as credible. This opens the door to follow-on attacks such as false data injection, routing-path manipulation, selective packet dropping, and denial of service, which can cause IoT-dependent systems to fail, as noted by Chatterjee [1].
Clone attacks are particularly difficult to detect because of the hardware limitations of sensor nodes. These nodes are powered by small batteries, have limited memory, and use low-power processors, so detection algorithms must be lightweight. Any solution that generates significant verification traffic shortens battery life and reduces the working life of the very network it is meant to secure. At the same time, accuracy cannot be sacrificed: an undetected clone can do lasting harm to data quality and network operations. As Jane Nithya and Shyamala [2] note, most current security schemes prioritize either detection accuracy at the expense of energy or energy saving at the expense of detection performance, and none strikes a practical balance suited to real-world WSN deployments.
Various approaches have been examined to combat clone attacks in WSNs, including centralized, distributed, and hybrid methods. Machine learning techniques have become popular in this field because they can detect small behavioral changes and adapt to changing network conditions. Anitha et al. [3] showed that intelligent health-monitoring systems can detect node replication events in WSNs effectively, and Liu et al. [4] demonstrated that optimization-based approaches can maintain network lifetime without undermining security coverage. Because clones behave like genuine nodes and can manipulate data, disrupt routing protocols, degrade network performance, and access sensitive data, they must be detected early and accurately, as noted by D. Liu and P. Ning [5].
Together, these observations point to a specific need: a detection mechanism that operates efficiently on limited hardware, scales to large networks without failure, and remains dependably accurate over time. Combining clustering with machine learning classification is one possible way to meet this need. Clustering confines verification to local regions, whereas the classifier makes the fine-grained decision of whether an anomalous behavioral signature truly indicates a cloned node. This work implements the concept in the SDC-BENN framework.
The primary objectives of this paper are as follows:
To increase the security of WSNs against node clone attacks by developing an intelligent detection framework that detects node clones accurately and reliably.
To design a lightweight and scalable detection mechanism that works within the limited energy, memory, and processing capabilities of sensor nodes in large-scale WSN deployments.
To enhance clone-detection accuracy by combining Spatial Distributive Clustering (SDC), which provides effective feature organization, with Block Ensemble Neural Network (BENN)-based classification.
To reduce the communication and computational cost of clone detection while maintaining network lifetime and system performance.
To validate the proposed framework through a comprehensive performance evaluation using standard statistical metrics and comparison with existing detection methods.
2. Related Work
Clone node attacks in WSNs have drawn the attention of many researchers because of the severe damage they can cause to data accuracy and network reliability. The existing literature covers a wide variety of methods, including classical rule-based and optimization techniques, deep learning, and trust-management frameworks, each tackling part of the problem. Devi and Jaison [6] treated clone detection as a classification problem and designed TAIGBRFCNIA, an optimization-supported classifier that worked well in controlled settings; network scaling still needs to be considered for better performance. Pajila et al. [7] took a different approach and developed a fuzzy-logic-based defense mechanism against DDoS attacks that was reasonably effective under changing traffic patterns.
Sreedevi and Venkateswarlu [8] emphasized intra-cluster data aggregation and optimal sink-node placement with EEC-MA-PSOGA. Their approach produced significant energy savings and longer network lifetimes. Ramana et al. [9] developed WOGRU-IDS, a GRU-based intrusion detection system for IoT-connected WSNs, which achieved high detection rates, but its per-node computation was too heavy for most hardware-constrained sensor platforms. Chen et al. [10] generalized the SPRT statistical technique to tolerate faults in the detection phase, improving performance in noisy systems. Sajitha et al. [11] applied cuckoo filters to minimize the memory footprint with no loss of detection throughput; the method worked well in static topologies but lost accuracy when nodes moved or the topology changed rapidly.
Jane Nithya and Shyamala [12] paired MDSO-based cluster head selection with an RNN-based clone node detector (RNNCND), achieving higher detection accuracy through improved cluster formation. The downside was that the energy cost and training overhead of the RNN component are both problematic and were not sufficiently addressed. Kalvikkarasi and Selvakumar [13] presented an ensemble learning framework for clone detection based on an Optimized Extreme Learning Machine (OELM) coupled with a hybrid feature selection technique. Roberts and Ramasamy [14] proposed an improved clustering-based routing protocol for IoT-enabled WSNs that raises throughput and reduces energy consumption. Srividya et al. [15] provided a trust-based algorithm for predicting data link failures and detecting intrusions with high reliability, but it must continuously update the trust scores of all nodes.
Numan et al. [16] combined several isolated detection schemes into a hybrid system for static WSNs that achieved high detection accuracy. The algorithm, however, proved not very responsive once nodes started to move. Sheela et al. [17] used a neural optimization-based approach to clone detection and achieved consistent improvement across different network set-ups, although the complexity of the model remains a concern. Alrashed et al. [18] designed a protocol with replica detection and a quarantine mechanism specific to mobile WSNs, which managed node movement effectively but incurred growing coordination overhead as network dynamics increased. Ram and Ilavarasan [19] surveyed the WSN security landscape in general and found a consistent gap: the research area still lacks detection mechanisms that are lightweight, adaptive, and robust to changes in network scale and topology over time.
Khan and Singh [20] designed a dual-trust multi-level Sybil detection scheme that yielded correct trust evaluations but incurred a non-trivial overhead from continual per-node monitoring and trust recomputation, which is not feasible at scale. Devapriya et al. [21] considered an ECDSA authentication system for verifying mobile sinks, providing a high security level at a manageable computational cost. Nagamani and Annapurna [22] combined adaptive clustering with a trust-aware routing protocol, increasing security and routing efficiency in IoT-based WSNs. Dhanalakshmi [23] discussed a modified deep learning architecture for classifying attacks in WSNs that was highly accurate across various attack types. Bharathi and Rafi [24] examined AI-based models of denial-of-service attacks in WSNs, noting the promise of artificial intelligence for security development. Lastly, Liu et al. [25] proposed a coordinate-plane-based authentication scheme to identify clone nodes, a simple and scalable scheme applicable to large-scale WSN systems.
A review of the existing literature reveals a common trade-off: methods with good detection accuracy usually impose heavy network overhead, whereas energy-efficient methods usually have lower detection rates. Optimization and deep learning methods can increase accuracy, but at computational costs that many sensor nodes cannot sustain. These findings motivate the design presented in this work: SDC provides efficient, spatially informed feature grouping, and BENN offers strong ensemble-level classification performance; combined, they allow high detection accuracy without excessive resource consumption.
3. Materials and Methods
The mechanism presented in this section has a clear purpose: to identify replicated nodes in WSNs reliably while keeping computational demands and network traffic at a level that resource-constrained hardware can support. The solution consists of three interdependent processing stages: a pre-processing module that prepares raw sensor data for analysis, an SDC module that groups the network into spatially homogeneous clusters, and a BENN module that provides the final genuine-or-clone classification of each node. Together, the stages respond directly to the shortcomings identified in the prior literature: excessive overhead, poor scalability, and inaccurate detection in larger deployments.
As Figure 1 shows, the pipeline starts with network initialization, during which parameters including the number of nodes, deployment area, transmission range, and energy budget are set and the raw sensor dataset is loaded.
Figure 2 shows the architecture diagram of the proposed training model. The Estimate Nodes and Parameters block represents the coordinates of the node arrangement and its weight values. During pre-processing, the raw data undergoes spatial and behavioral operations that eliminate noise and discard irrelevant attributes to produce a cleaner feature set. The refined features then pass through a selection step that retains the attributes with the highest discriminative value, yielding the optimized feature representation consumed by both the clustering and classification modules.
The second stage applies SDC to the processed feature set. As illustrated in the flow diagram, nodes are grouped into clusters according to both geographic proximity and feature similarity. This clustering reduces redundancy and reveals local inter-node correlations. A similarity measure is then computed within and across clusters, and any node whose geographic location or identity is inconsistent with the expected cluster pattern is flagged as potentially suspicious. Such inconsistency is typical of clone nodes, which share an identity but appear at geographically distinct locations.
In the last stage, the cluster-organized feature vectors are fed to BENN for binary classification. Compared with single-model classifiers, the ensemble-based classifier is more precise and generalizes better. Once a clone is detected, the system triggers the corresponding countermeasures, either isolating the rogue node or rerouting traffic around it, and logs the incident for later use in retraining the model. The model is retrained on new labeled data at scheduled intervals to keep it in line with changing network behavior; retraining merges the latest, previously unseen data with the previously trained features. Systematic performance evaluation based on conventional measures then validates the entire framework. The flow diagram reflects the sequential and iterative character of the system, which lets it adapt to the specifics of the network. Because the proposed scheme is implemented at the base station or on edge servers, it does not need to run on each sensor, which avoids additional power consumption at the nodes. The model is also reliable because its clustering and classification operate on the distributed data as a whole rather than validating nodes individually, so a failed or unresponsive node has less impact on the dynamic clustering of the data.
The modules of the proposed work are as follows:
Dataset collection;
Pre-processing;
Clustering using SDC;
Prediction by BENN.
3.2. Pre-Processing
Pre-processing is the first stage in the detection pipeline and perhaps the most influential, because the quality of the data fed to all later stages (both clustering and classification) depends directly on it. In practical WSN deployments, sensor nodes are unattended and resource-starved, and the values they generate are often corrupted by environmental noise, packet loss, and hardware failures, leading to incomplete, noisy, and inconsistent data and to irregular communication patterns. Clone nodes aggravate the problem by creating duplicate node identities and by injecting false readings or overwriting data streams. A careful pre-processing step is therefore necessary: raw inputs are cleaned, informative features are extracted, and the feature space is reduced before the more intensive detection stages are performed.
The pre-processing phase starts with the initialization of the network parameters. Suppose that the network is composed of N sensor nodes, denoted as
$$\{n_1, n_2, \ldots, n_N\} \tag{1}$$
where N represents the number of sensors.
Each sensor node $n_i$ carries a set of raw attributes, including its node identifier, spatial location, communication-behavior measurements, and energy measurements. Together, these attributes form the raw dataset that is input to the detection pipeline. Because environmental interference, packet loss, and node failure are prevalent in unattended environments, the collected data may contain gaps or inconsistencies that must be addressed before any meaningful analysis can be conducted.
Raw data retrieved from field deployments of WSNs is rarely clean. Noise, missing values, and redundant attributes can all distort the decision surface learned by the classifier and lower detection rates; data cleaning is therefore the first concrete activity in this module. The raw feature vector of node $n_i$ can be expressed as
$$F_i = [f_{i1}, f_{i2}, \ldots, f_{im}] \tag{2}$$
in which m is the number of observed attributes. Such features can include the node ID, the x and y coordinates, transmission rate, number of packets forwarded, neighbor density, residual energy, and communication delay. Because many of these m attributes add little discriminating power to the clone detection objective, feeding the complete raw vector to the clustering and classification modules would bring no benefit and may mask the truly informative signals.
The initial step is data cleaning. Duplicate records and inconsistent values are detected and removed from the data. Missing values, which result from transmission failures and temporary node inactivity, are replaced with the column-wise mean or median, depending on the distribution of each attribute. Noise from irregular transmission patterns is also filtered out so that it does not mislead the downstream clustering and classification stages.
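The imputation step described above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function name `impute_column`, the use of `None` to mark a missing reading, and the sample values are hypothetical and do not come from the paper.

```python
from statistics import mean, median

def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate from; leave the gaps as they are
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical attribute columns, one entry per node.
residual_energy = [4.0, None, 6.0, 5.0]   # roughly symmetric: mean is reasonable
delay_ms = [10.0, 12.0, None, 200.0]      # skewed by one large value: median is safer

print(impute_column(residual_energy, "mean"))
print(impute_column(delay_ms, "median"))
```

The median is used for the skewed column because a single extreme reading would otherwise pull the filled-in value far from the typical one.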
Once cleaning is complete, a feature reduction step eliminates attributes whose dispersion or association with the clone label is too small to provide useful separation. This reduces dimensionality, accelerates processing, and lets the downstream classifier focus on the features that actually matter for detection. The reduced feature vector of each node is denoted
$$F_i' = [f_{i1}, f_{i2}, \ldots, f_{ik}], \quad k < m \tag{3}$$
where k is the number of retained attributes.
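As a rough illustration of the dispersion criterion, the following Python sketch drops columns whose variance falls below a cutoff. It covers only the dispersion part of the reduction step, not the association with the clone label, and the function name `drop_low_variance`, the cutoff `min_variance`, and the attribute names are hypothetical.

```python
from statistics import pvariance

def drop_low_variance(rows, names, min_variance=1e-3):
    """Keep only the columns whose population variance exceeds min_variance."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, [names[j] for j in keep]

# Hypothetical node records: the first attribute is constant and carries no signal.
rows = [
    [1.0, 0.5, 3.0],
    [1.0, 0.9, 7.0],
    [1.0, 0.1, 5.0],
]
names = ["hop_limit", "forwarding_ratio", "neighbor_count"]
reduced, kept = drop_low_variance(rows, names)
print(kept)
```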
The remaining raw attributes are then transformed into higher-level spatial and behavioral descriptors, which have greater diagnostic value for clone detection. Spatial descriptors encode node position and the self-consistency of reported positions, and behavioral descriptors encode trends in transmission rate, forwarding behavior, and energy consumption. The Euclidean distance between node $n_i$ and node $n_j$ is
$$d(n_i, n_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{4}$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are the reported coordinates of the two nodes.
If two nodes report the same identifier at positions farther apart than the set threshold, the system immediately raises a clone alert, because a single physical device cannot be present in two different places simultaneously.
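The spatial rule can be sketched in a few lines of Python. This is an illustrative example only: the function name `find_clone_alerts`, the `(node_id, x, y)` report format, and the threshold value are assumptions, not details given in the paper.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def find_clone_alerts(reports, threshold):
    """Return the sorted IDs that were reported at positions farther apart than threshold.

    reports: iterable of (node_id, x, y) tuples.
    """
    positions = defaultdict(list)
    for node_id, x, y in reports:
        positions[node_id].append((x, y))
    alerts = set()
    for node_id, points in positions.items():
        # Compare every pair of positions reported under the same identifier.
        if any(dist(p, q) > threshold for p, q in combinations(points, 2)):
            alerts.add(node_id)
    return sorted(alerts)

# Hypothetical reports: ID 7 appears at two distant positions, ID 3 only jitters slightly.
reports = [(7, 0.0, 0.0), (7, 30.0, 40.0), (3, 5.0, 5.0), (3, 5.5, 5.0)]
print(find_clone_alerts(reports, threshold=10.0))
```

A non-zero threshold tolerates small localization errors, so that ordinary measurement jitter for a genuine node does not trigger an alert.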
Behavioral features give the analysis another dimension. The packet transmission frequency, number of neighbors, forwarding ratio, and residual-energy profile of each node are extracted and analyzed together. Clone nodes frequently display non-standard behavioral patterns, such as abnormally high forwarding rates, energy consumption that disagrees with local neighborhood norms, or a mismatch between reported battery levels and actual traffic volumes. Combining the spatial and behavioral descriptors gives the framework a more detailed and informative view of the true state of each node, improving its ability to separate cloned nodes from genuine ones.
The extracted features span vastly different numeric ranges, including spatial values in meters, energy values in millijoules, and packet counts that can reach the thousands. To keep large-magnitude features from playing a disproportionately large role in the distance calculations of SDC or the weight updates of BENN, these values must be normalized.
Min-max normalization scales every feature to the range [0, 1]:
$$\hat{f}_{ij} = \frac{f_{ij} - \min_{k} f_{kj}}{\max_{k} f_{kj} - \min_{k} f_{kj}} \tag{5}$$
where $f_{ij}$ is the value of the jth feature of node $n_i$, and the minimum and maximum are taken over all nodes for that feature.
This normalization ensures that features measured in larger units do not overwhelm features measured in smaller units, so that each feature contributes fairly to cluster building and to the decision boundary learned by the neural network.
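A minimal Python sketch of column-wise min-max scaling follows. The function name `min_max_normalize` and the sample values are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero; the paper does not say how such columns are handled.

```python
def min_max_normalize(rows):
    """Scale every column of a list-of-rows table to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - low) / (high - low) if high > low else 0.0  # constant column -> 0.0 (assumption)
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical features per node: x position in meters, packets forwarded.
features = [
    [0.0, 100.0],
    [50.0, 300.0],
    [100.0, 200.0],
]
print(min_max_normalize(features))
```

After scaling, a 100 m difference in position and a 200-packet difference in traffic both span the full unit interval, so neither dominates a Euclidean distance.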
Pre-processing ends with a final consistency check that verifies node identifiers against spatial positions and behavioral profiles. Any node whose identifier is reported at geographically incompatible locations, or whose behavior differs visibly from that of its immediate neighbors, is immediately flagged as suspicious. This pre-filtering step narrows the set of candidates that SDC and BENN must consider and reduces the total computational effort.
The output of the pre-processing module is a normalized, optimized feature set
$$F^{*} = \{F_1^{*}, F_2^{*}, \ldots, F_N^{*}\} \tag{6}$$
where each $F_i^{*}$ holds the refined spatial and behavioral properties of node $n_i$. This compact representation is the shared input of the SDC clustering module, which uses it to create spatially coherent groups, and of the BENN classifier, which uses it to produce the final clone/genuine decision.
In summary, pre-processing is not a mere data-cleaning formality; it contributes materially to the overall detection capability of the framework. By systematically removing noise, eliminating redundant dimensions, and building a normalized, consistent feature space, this step improves the quality of the signal passed to both SDC and BENN, improves detection quality, reduces computation costs, and increases scalability to large-scale WSN deployments.
3.3. Clustering Using SDC
WSN deployments typically place nodes across remote or hazardous environments where they operate unattended, conditions that make node cloning attacks especially effective. Once an attacker captures and duplicates a sensor, the resulting clones carry valid credentials and merge seamlessly into the network, from which they can skew measurements, interfere with routing tables, and progressively erode the accuracy of network-supported decisions. Traditional centralized monitoring cannot handle this efficiently at scale, since verifying every node’s identity against all others generates message traffic that far exceeds the energy capacity of low-power hardware. The situation therefore requires a strategy that concentrates the identity-verification workload within manageable local regions while still catching clones positioned across different cluster zones. In this work, SDC clustering selects the most representative node from each group of nodes in the dataset, and the spatial representation of the nodes’ features is used to find the most informative feature attributes in the overall dataset.
Three interconnected problems make clone detection at scale particularly difficult. First, continuously monitoring all nodes across a large network generates message traffic that rapidly depletes battery reserves and overwhelms the limited bandwidth typical of WSN communication channels. Second, verifying the credentials of every node against a central registry or against all of its peers is computationally infeasible on low-power processors. Third, the topology of a WSN can vary over time as nodes join, leave, or move, and a detection system that fails to adapt will either miss newly introduced clones or falsely report a legitimate topology change as a threat. Together, these three constraints rule out both purely centralized and brute-force detection strategies and favor a geographically based, cluster-local strategy.
SDC addresses these challenges by structuring the network into clusters that reflect its physical geography. For each node, SDC first counts the number of other nodes within a prescribed threshold distance $d_0$; this is the density calculation. The density score identifies the cluster heads (CHs): the nodes with the largest local density are chosen as CHs because they are the most informationally central nodes in their area. The remaining nodes are then assigned to the closest CH by Euclidean distance, giving a cluster partition that approximates the real distribution of the deployment. Verification traffic is thus restricted to intra-cluster and adjacent-cluster exchanges, which significantly lowers the communication overhead of the whole network compared with network-wide broadcast methods.
This spatial confinement of verification activity is the primary advantage of SDC in clone detection. Because cloned nodes have the same credentials as their source yet appear at new physical locations, they introduce spatial inconsistencies that are evident at both intra- and inter-cluster distances. Each CH monitors the identifiers of its member nodes and cross-checks them with neighboring CHs, and an alert is raised if the same identifier is used in more than one cluster or if the reported position of a node contradicts the expected cluster geometry. Restricting these checks to cluster-local and neighboring-cluster traffic can cut the verification workload by about an order of magnitude compared with network-wide checking, which makes the strategy feasible in practice on constrained hardware platforms.
SDC also supports dynamic changes in the network. After the initial cluster partition is determined, every CH position is set to the geometric centroid of the current members of its cluster, which refines the cluster groupings and accounts for any additions, removals, or movements of nodes since the last update. These centroid recalculations are repeated iteratively as cluster membership varies, keeping the spatial map on which anomaly detection is based in line with the present network topology rather than an out-of-date snapshot.
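The sensor dataset is represented as the point set S, as defined in Equation (7):
$$S = \{s_1, s_2, \ldots, s_N\} \tag{7}$$
The cluster set C is initialized as empty, and the proximity threshold is set to $d_0$. The distance between each pair of data points is then calculated from Equation (8):
$$d(s_i, s_j) = \lVert s_i - s_j \rVert_2 \tag{8}$$
A loop over all N data points computes the spatial density $\rho_i$ and identifies the qualifying cluster members $M_i$, as represented by Equations (9) and (10), respectively:
$$\rho_i = |M_i| \tag{9}$$
$$M_i = \{\, s_j \in S : j \neq i,\ d(s_i, s_j) \le d_0 \,\} \tag{10}$$
where $|M_i|$ is the number of data points within distance $d_0$ of $s_i$.
Once the densities are computed, the cluster head CH is chosen as the data point with maximum local density. Each CH seeds its own cluster set as in Equation (11), and all remaining data points are assigned to the nearest CH as in Equation (12):
$$C_k = \{CH_k\} \tag{11}$$
$$C_k \leftarrow C_k \cup \{s_i\}, \quad k = \arg\min_{l} d(s_i, CH_l) \tag{12}$$
Each time cluster membership changes, the CH position is recalculated as the centroid of its updated member set, as described in Equation (13), keeping the cluster structure consistent with the latest network topology:
$$CH_k = \left( \frac{1}{|C_k|} \sum_{s_i \in C_k} x_i,\ \frac{1}{|C_k|} \sum_{s_i \in C_k} y_i \right) \tag{13}$$
where $x_i$ and $y_i$ are the attribute parameters of $s_i$.
The final predicted label can be evaluated as the maximum index of the cluster head (CH).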
The algorithm steps of the proposed SDC clustering are described in Algorithm 1.
Algorithm 1. SDC clustering algorithm
Input: Input parameters
Output: Clone predicted label
Step 1: Initialize the network parameters with the total number of data points.
    Let the data points S be denoted as in Equation (7).
    Initialize the threshold for distance estimation, $d_0$.
    Initialize the empty cluster set C.
Step 2: Compute the distance between the parameters by Equation (8).
Step 3: Compute the spatial density of each attribute from the dataset.
    For i = 1 to N // loop over the N entries of the dataset
        Calculate the density from Equation (9) based on the attributes within $d_0$.
        Validate the distance value of the attributes by Equation (10).
        Select the cluster head as the point of maximum density among the cluster points.
        Initialize the cluster array with the cluster head as in Equation (11).
        Assign the other attributes to the cluster of the relevant cluster head as in Equation (12).
        Update the cluster head after every change as in Equation (13).
    End loop
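The clustering steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `sdc_cluster`, the greedy rule for picking several cluster heads (highest density first, skipping any point within `d0` of an already chosen head), and the `max_iter` cap are assumptions made for the example, since the text does not specify how multiple heads are separated or when the iteration stops.

```python
from math import dist

def sdc_cluster(points, d0, max_iter=10):
    """Density-seeded clustering sketch: returns (labels, heads, density)."""
    n = len(points)
    # Density: number of other points within the threshold distance d0.
    density = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= d0)
        for i in range(n)
    ]
    # Head selection (assumed rule): visit points by descending density and
    # accept a point as a head only if it is farther than d0 from every head so far.
    heads = []
    for i in sorted(range(n), key=lambda i: -density[i]):
        if all(dist(points[i], h) > d0 for h in heads):
            heads.append(points[i])
    labels = None
    for _ in range(max_iter):
        # Assign each point to the nearest head by Euclidean distance.
        new_labels = [min(range(len(heads)), key=lambda k: dist(p, heads[k])) for p in points]
        if new_labels == labels:
            break  # membership is stable, so the centroids will not move
        labels = new_labels
        # Move each head to the centroid of its current members.
        for k in range(len(heads)):
            members = [points[i] for i in range(n) if labels[i] == k]
            if members:
                heads[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, heads, density

# Hypothetical deployment with two well-separated groups of nodes.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels, heads, density = sdc_cluster(points, d0=2.0)
print(labels)
print(heads)
```

With cluster labels available, a duplicate-identifier check only has to compare members of the same or neighboring clusters instead of every pair of nodes in the network.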
SDC also serves as a natural foundation for adding supplementary security measures. Within each cluster, lightweight cryptographic verification checks or neighborhood-voting schemes can be executed without interfering with the broader cluster structure. When a CH identifies a suspicious node identifier, the hierarchical layout of the cluster network allows the alert to travel efficiently to the base station or network controller along a short, pre-established reporting path, rather than through an uncoordinated broadcast.
Overall, SDC provides a scalable and energy-efficient basis for clone detection. By organizing the monitoring task around spatial density and geographic proximity, it verifies nodes locally rather than globally, so that communication and computation remain within the reach of resource-constrained hardware. Its iterative centroid update keeps the clusters accurate as the topology changes, making it suitable for large-scale, dynamic deployments where clone attacks are most likely to appear.
3.4. Prediction by BENN
BENN is an ensemble learning architecture: instead of a single large neural network, it uses a collection of smaller, independently operating neural network blocks whose outputs are merged to obtain a final prediction. The architecture of each block (MLP, CNN, or RNN) may be selected according to the nature of the input data and the task at hand. The rationale is practical: a single model trained on small or skewed data can overfit the training distribution and fail to generalize to new circumstances. By training several blocks independently and combining their predictions, BENN averages out the biases of the individual blocks, providing a more stable and reliable composite output than any individual component. The classifier thus learns features robustly and improves accuracy by pooling the decisions of different models into agreed labels. Because the proposed model is trained and retrained on defined feature patterns, it can periodically update its features as new data arrive in a real-time scenario.
In practice, each block is trained on a part or a specific subset of the whole feature set, which intentionally introduces diversity among the components, a property required for ensemble aggregation to be useful. During training, each block minimizes its own loss function separately and updates its weights using stochastic gradient descent:
$$W_b \leftarrow W_b - \eta \, \nabla_{W_b} L_b \tag{14}$$
where $W_b$ represents the weights of block b, $\eta$ is the learning rate, and $L_b$ is the loss function for that block.
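To make the per-block update concrete, the sketch below applies one stochastic gradient descent step to a single logistic unit. It is a simplified stand-in for a BENN block, not the paper's architecture: the function names, the binary cross-entropy loss, and the sample values are assumptions chosen to keep the example small.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def logistic_gradient(weights, features, label):
    """Gradient of the binary cross-entropy loss for one sample of a logistic unit."""
    prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
    return [(prediction - label) * x for x in features]

def sgd_step(weights, gradient, learning_rate):
    """One update: each weight moves against its gradient, scaled by the learning rate."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]

# Hypothetical sample: two normalized features of a node labelled as a clone (1).
weights = [0.0, 0.0]
gradient = logistic_gradient(weights, features=[1.0, 2.0], label=1)
weights = sgd_step(weights, gradient, learning_rate=0.1)
print(weights)
```

Each block would run this kind of update on its own weights and its own loss, which is what keeps the blocks independent of one another during training.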
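At inference, the output probabilities of the trained blocks are combined to form the final classification. In the unweighted version, the blocks are assumed to be equally reliable:
$$\hat{p}(x) = \frac{1}{B} \sum_{b=1}^{B} p_b(x)$$
In the weighted version, blocks with better prediction results on held-out validation data have a greater influence on the overall prediction:
$$\hat{p}(x) = \sum_{b=1}^{B} w_b \, p_b(x), \qquad \sum_{b=1}^{B} w_b = 1$$
where $B$ is the number of blocks, $p_b(x)$ is the probability output by block $b$ for input $x$, and $w_b$ is the weight assigned to block $b$.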
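The two aggregation modes can be illustrated with a short sketch. It assumes that block outputs are clone probabilities and that the weights are proportional to each block's validation accuracy; the source does not state how the weights are derived, so `validation_weights` and the example numbers are hypothetical.

```python
def aggregate_unweighted(block_probs):
    """Equal-reliability aggregation: plain mean of the block probabilities."""
    return sum(block_probs) / len(block_probs)

def validation_weights(val_scores):
    """Assumed weighting rule: normalize validation scores so they sum to 1."""
    total = sum(val_scores)
    return [s / total for s in val_scores]

def aggregate_weighted(block_probs, weights):
    """Weighted aggregation: blocks with larger weights influence the result more."""
    return sum(w * p for w, p in zip(weights, block_probs))

# Hypothetical outputs of three blocks for one node, and their validation accuracies.
probs = [0.9, 0.6, 0.3]
val_acc = [0.95, 0.80, 0.55]

weights = validation_weights(val_acc)
unweighted = aggregate_unweighted(probs)
weighted = aggregate_weighted(probs, weights)
```

In this example the most accurate block also reports the highest clone probability, so the weighted result is pulled above the plain mean.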
This means of integration has three practical benefits. First, variance is reduced, since individual prediction errors across blocks tend to cancel rather than compound. Second, generalization improves, as the blocks together cover a larger region of the feature space than any single model can. Third, the ensemble is more resilient, as the impact of a single poor block on the overall output is too small to bring down the whole process.
These properties make BENN a good choice for WSN clone detection, where the input features are generally noisy, the class distribution is often skewed, and the line between legitimate and cloned node behavior is not always clear. Individual blocks can be trained to learn different attributes of node behavior: one block can learn spatial anomalies and another communication anomalies, and these specialized assessments are then integrated into a single reliable classification outcome.
The modular nature of BENN also simplifies long-term maintenance. As additional labeled data is received or the attack space changes, new blocks may be added to the ensemble and existing ones retrained on new samples without disruption to the rest of the ensemble. BENN is therefore practical to scale: it grows with the data and adapts to slow changes in network behavior without massive system reengineering.
In short, BENN combines the predictive power of various neural networks with the error-correcting and variance-reducing effects of ensemble aggregation. Its block-based training scheme, flexible aggregation policies, and modular extensibility make it a good fit for high-accuracy clone detection in WSNs, where both deployment conditions and the available labeled data are highly heterogeneous and where that data should be used as efficiently as possible.
4. Results and Discussion
This section presents a rigorous evaluation of the SDC-BENN framework, comparing it with the AIA approach of [24] and the CP approach of [25]. Several classification measures are employed to give a complete overview of detection performance under various experimental conditions and data distributions.
All experiments were conducted using Python 3.12. The SDC-BENN model is assessed with standard classification metrics (accuracy, precision, recall and F-measure) as well as additional classification rates, which provide a reasonable and consistent basis for comparison with the similarity-based and classification-based methods [24,25]. This comparative analysis illustrates the feasibility of combining spatial clustering with ensemble-based learning to detect clone attacks in WSNs.
Three partitioning ratios are used, with 60, 70 and 80 percent of the data used for training and the remainder for testing in each case, so that the inferences are robust and the conclusions do not depend on the choice of a single train-test split. This multi-split protocol shows how detection performance varies as the amount of training data increases and also allows a fair comparison with the reference methods of [24] and [25], which are based on the same partitioning scheme.
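A minimal sketch of such a multi-split protocol is given below. It assumes a simple shuffled (non-stratified) split with a fixed seed; whether the authors stratified by class is not stated, and the function names and placeholder data are hypothetical.

```python
import random

def split_dataset(samples, train_fraction, seed=42):
    """Shuffle once with a fixed seed, then cut into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def multi_split(samples, fractions=(0.6, 0.7, 0.8), seed=42):
    """Build one train/test partition per training fraction."""
    return {f: split_dataset(samples, f, seed) for f in fractions}

# Placeholder data: 100 sample indices standing in for labeled node records.
data = list(range(100))
splits = multi_split(data)

for fraction, (train, test) in splits.items():
    print(f"train fraction {fraction}: {len(train)} train / {len(test)} test")
```

Reusing the same seed for every fraction keeps the shuffling identical, so the partitions differ only in where the cut falls.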
Precision, recall and F-measure are the three main evaluation measures. Precision is the fraction of flagged nodes that are actual clones; a low-precision detector generates too many false alarms and wastes response energy. Recall measures how much of the actual clone population the model detects; a low-recall detector allows attacks to go unnoticed. The F-measure combines the two and penalizes imbalance between them, providing a single score that summarizes overall detection quality.
Across all three data splits and all three main metrics, SDC-BENN performs significantly better than the AIA baseline of [24] and the CP baseline of [25]. Two complementary mechanisms contribute to the performance gains: the spatial clustering of SDC strengthens the anomaly signal before it reaches the classifier, and the ensemble architecture of BENN prevents the overfitting that constrains single-model methods. The gains are strongest at the 80% training level, where the ensemble has enough labeled examples for the individual blocks to approach their respective performance limits while still benefiting from aggregation. The mathematical equations of the evaluation metrics are presented below.
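For reference, the standard definitions of these metrics in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are sketched below. These are the textbook forms; the exact notation used by the authors is an assumption.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{F-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```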
Figure 3, Figure 4 and Figure 5 illustrate the variation in accuracy, recall and F-measure for feature-scaling coefficients 2 and 5 as the scaling parameter changes between 0.6 and 0.9. This parameter regulates the proportion of spatial and behavioral features in the SDC clustering process. Varying it along this range shows how sensitive each metric is to the feature-type balance and how the classifier reacts to the corresponding changes in the decision boundary.
The assessment covers all four node-identity types (ID1 to ID4), each of which represents a variation of normal and cloned node behavior. Accuracy, recall, and F-measure increase as the scaling parameter increases, which means that the more weight is given to spatial features, the better the classification on this dataset. This increasing trend is present in all four identity groups, which shows that the improvement is not confined to any single category.
Compared to the AIA approach in [24], SDC-BENN has a clear and consistent advantage. As the scaling parameter changes, the performance measures of AIA vary significantly, indicating that they are sensitive to the weighting of particular features in the model, whereas SDC-BENN maintains high and consistent performance over the whole range of the parameter. This stability is due to the variance-reduction effect of the ensemble: although the feature distribution may change as the parameter changes, the aggregate vote of the independently trained blocks remains reliable.
The breakdown of the dataset employed in each of the experiments is summarized in Table 1. The four identity classes (ID1 to ID4) are provided with training and testing samples in the same ratio as the reference methods in [25], so that all performance comparisons are made on an identical data-partitioning basis. The entire dataset has 3973 training examples and 226 test examples, which offers a realistic class distribution for testing clone detection in a large-scale WSN environment.
Figure 6 provides a direct comparison of precision, recall, and F-score between SDC-BENN and the Auto RS, RBRS, and HKBRS reference methods of [24] and [25]. As the bar chart indicates, SDC-BENN outperforms all three reference methods on all three metrics, with the widest margin in precision. This indicates the ability of SDC to generate highly discriminative feature groupings that sharpen the decision boundary of the classifier.
Figure 7 presents the accuracy and error rate of the same methods side by side. SDC-BENN has the highest accuracy and the lowest error rate, and the gap between it and its closest competitor is larger than the precision–recall numbers alone would suggest. This supports the claim that the proposed solution does not simply shift the precision–recall trade-off but achieves a genuine reduction in the total misclassification rate.
The full confusion matrix in Figure 8 displays the classifier's decisions as true positives, true negatives, false positives, and false negatives for each of the 13 identity classes included in the test set. From this matrix, the accuracy, sensitivity, specificity, precision, recall, and F1-score as well as the Matthews Correlation Coefficient (MCC) are all derived, giving a comprehensive view of the classifier's strengths and weaknesses in terms of residual errors.
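As an illustration of how such measures can be derived from a multi-class confusion matrix, the sketch below reduces the matrix to one-vs-rest counts for a chosen class and then applies the standard formulas. The 3×3 matrix is made-up example data, not the matrix of Figure 8, and the helper names are hypothetical.

```python
import math

def one_vs_rest_counts(matrix, k):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n)) - tp  # class k missed
    fp = sum(matrix[i][k] for i in range(n)) - tp  # others flagged as k
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def safe_div(a, b):
    # Return 0.0 instead of raising when a denominator is empty.
    return a / b if b else 0.0

def binary_metrics(tp, tn, fp, fn):
    """Standard measures computed from one-vs-rest counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "mcc": mcc}

# Made-up 3-class confusion matrix (rows: true class, columns: predicted class).
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]

counts = one_vs_rest_counts(cm, 0)
metrics = binary_metrics(*counts)
```

Repeating the reduction for every class index and averaging the results would give macro-averaged scores; whether the paper reports macro or micro averages is not stated.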
Figure 9 and Figure 10 show the ROC curve and a bar chart of performance measures calculated on the 70/30 training/test split. An area under the ROC curve close to unity indicates that SDC-BENN separates clone and genuine nodes with high accuracy. Figure 10 also indicates that all five performance measures (accuracy, Kappa coefficient, MCC, sensitivity and specificity) exceed 0.98.
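Two of the measures mentioned here, the area under the ROC curve and the Kappa coefficient, can be computed as in the following sketch. It assumes binary labels (1 = clone, 0 = genuine) and uses the pairwise-ranking definition of AUC; the function names and example values are hypothetical and are not taken from the reported experiments.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: agreement between predictions and labels beyond chance."""
    total = tp + tn + fp + fn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present
    return (observed - expected) / (1.0 - expected)

# Hypothetical classifier scores and ground-truth labels.
perfect_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial_auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
kappa = cohen_kappa(tp=45, tn=45, fp=5, fn=5)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable for large test sets.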
Figure 11 and Figure 12 provide a direct comparison between SDC-BENN and the standalone ANN and CNN baselines of [24] and [25], all on the identical 70/30 split.
Figure 11 shows that SDC-BENN has the best values in terms of sensitivity, specificity, precision, F1-score and MCC, whereas Figure 12 indicates that SDC-BENN has the best values in terms of accuracy and Kappa coefficient. The baseline methods consistently fall short by about 1 to 4 percentage points, depending on the measure. This consistent advantage across all measures shows that combining spatial clustering with ensemble classification performs better than either of the two baseline methods.
Figure 13 and Figure 14 show bar-chart comparisons of the proposed work with the existing models of [24] and [25] in terms of precision, recall and sensitivity, evaluated on datasets 2 and 3, respectively. Datasets 2 and 3 are derived from the synthetic WSN dataset, which has the attributes of node location, communication patterns, and clone behavior. The bar charts show that the proposed work achieves better performance than existing systems such as AI Aerospace [24] and Coordinate plane [25].