Effectiveness of Data Augmentation for Localization in WSNs Using Deep Learning for the Internet of Things

Wireless sensor networks (WSNs) have become widely popular and are extensively used for various sensor communication applications due to their flexibility and cost effectiveness, especially for applications where localization is a main challenge. The Dv-hop algorithm is a range-free localization algorithm commonly used in WSNs; despite its simplicity and low hardware requirements, it suffers from limited localization accuracy. In this article, we develop an accurate deep learning (DL)-based range-free localization scheme for WSN applications in the Internet of Things (IoT). To improve localization performance, we exploit a deep neural network (DNN) to correct the estimated distance between the unknown nodes (i.e., position-unaware) and the anchor nodes (i.e., position-aware) without burdening the IoT cost. Like other machine learning techniques, DNNs need large training datasets to yield accurate results. To address this challenge, we propose a Data Augmentation Strategy (DAS) that creates multiple virtual anchors around the existing real anchors, thereby generating more training data and significantly increasing the dataset size. We show that DAS can provide DNNs with sufficient training data, ultimately making it feasible for WSNs and the IoT to fully benefit from low-cost DNN-aided localization. The simulation results indicate that the accuracy of the proposed Dv-hop with DNN correction surpasses that of Dv-hop.


Introduction
In recent decades, with the advancements in IoT technologies, the intelligent perception and management of objects have become achievable through the connection of things and people [1]. WSNs have played an increasingly significant role in the IoT by facilitating the real-time sensing, collection, and processing of information. The inherent characteristics of node location make it an essential prerequisite for many functions. During the last decade, this topic has motivated extensive research endeavors that have resulted in several interesting localization algorithms [2]. As the demand for location-based services continues to grow, the accuracy of node localization significantly impacts various application areas, such as city surveillance and smart homes [3,4]. So far, many localization algorithms have been proposed in the literature, falling mainly into two categories: range-based and range-free algorithms. Range-based localization utilizes measurements of received signal attributes such as angle of arrival (AOA) [5], received signal strength (RSS) [6], and time of arrival (TOA) [7]. While range-based algorithms are known for their high accuracy, they are impractical for WSNs due to their high power requirements for communication between anchors and regular nodes, especially in the case of small battery-powered units. Furthermore, these algorithms are susceptible to interference and fading, often necessitating additional hardware, thereby burdening both the WSN and the IoT costs. Range-free localization techniques are variations of the well-known Dv-hop algorithm, which simply converts numbers of hops into coordinates [8-10]. Such techniques do not require any additional hardware, in contrast to range-based methods, but are much less accurate. To improve localization performance, researchers have recently resorted to machine learning (ML) or DL. Several ML techniques have been investigated in this context, such as the support vector machine (SVM) and artificial neural network (ANN) [11-14]. The key hurdle these techniques face in common is the requirement of large training datasets: the larger they are, the more accurately the estimated node positions can be corrected. However, such data, generated very often by anchors deployed in small numbers due to the expensive GPS technology they integrate, are relatively scarce, thereby limiting any potential ML-driven performance improvement. Therefore, to improve localization accuracy, we develop a precise and cost-efficient DNN-based range-free localization approach for WSN applications in the IoT. To tackle the issue of limited data for training the proposed DNN, we also propose an efficient DAS.
The rest of the paper is organized as follows. Section 2 introduces the range-free localization process. Section 3 describes the implementation of data augmentation in WSN applications. The architecture and processing of the DNN are described in Section 4. In Section 5, several experiments are performed, and the experimental results are analyzed. Finally, Section 6 concludes the paper with a summary.

Localization Process
The aim of sensor localization is to determine the locations of unknown nodes. The localization process identifies the positions of these unknown nodes based on input data. In our case, the input data consist of the locations of both the real anchors and their virtual counterparts, along with the unknown nodes. An overview of the localization process is depicted in Figure 1. This section explains the estimated distance and location computation process, while the remaining steps will be covered in the next section.
For estimation of the distance between anchor and unknown nodes, we assume each node communicates with an anchor node through a multi-hop path by using a localization algorithm [10]. Firstly, all unknown nodes in the network obtain minimal hop counts to every anchor node. During the second phase, when an anchor node obtains hop counts to the other anchors, it calculates an average distance for one hop, which is subsequently disseminated to the entire network. Anchor node i estimates the average hop size using the following equation:

HopSize_i = Σ_{j≠i} √((x_i − x_j)² + (y_i − y_j)²) / Σ_{j≠i} h_ij, (1)

where (x_i, y_i) and (x_j, y_j) are the known coordinates of anchors i and j, respectively, and h_ij is the minimum number of hops between them. Upon receiving the hop size from the anchor node with the least hops to it, every unknown node computes its distance d_i to each anchor node i using the hop size denoted in Equation (1) as HopSize_i and the minimum hop count denoted here as hops_i as follows:

d_i = HopSize_i × hops_i. (2)

Hence, the location of the unknown node can be estimated by solving the following set of equations:

(x̂ − x_i)² + (ŷ − y_i)² = d_i², i = 1, . . ., n, (3)

where (x_i, y_i) denote the coordinates of anchor i = 1, . . ., n and (x̂, ŷ) are the coordinates of the unknown node. Indeed, Equation (3) can be linearized, by subtracting the n-th equation from the first n − 1 equations, and solved under the minimum mean square error (MMSE) criterion to estimate the coordinates of the unknown node (x̂, ŷ) as follows:

X̂ = (AᵀA)⁻¹ AᵀB, (4)

where:

A = −2 [ x_1 − x_n, y_1 − y_n ; . . . ; x_{n−1} − x_n, y_{n−1} − y_n ], (5)

and

B = [ d_1² − d_n² − x_1² + x_n² − y_1² + y_n² ; . . . ; d_{n−1}² − d_n² − x_{n−1}² + x_n² − y_{n−1}² + y_n² ]. (6)
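As an illustration, the two phases and the MMSE solution above can be sketched as follows. This is a minimal sketch: the function and variable names are ours, and the inter-anchor hop counts are assumed to be already known, whereas in practice they are obtained by flooding through the network.

```python
import numpy as np

def hop_sizes(anchors, hop_counts):
    """Average hop size per anchor, as in Equation (1).

    anchors: (n, 2) array of anchor coordinates.
    hop_counts: (n, n) matrix of minimum hop counts between anchors.
    """
    n = len(anchors)
    sizes = np.zeros(n)
    for i in range(n):
        dists = sum(np.linalg.norm(anchors[i] - anchors[j]) for j in range(n) if j != i)
        hops = sum(hop_counts[i][j] for j in range(n) if j != i)
        sizes[i] = dists / hops
    return sizes

def mmse_position(anchors, d):
    """Linearize Equation (3) by subtracting the last equation from the
    others, then solve the resulting least-squares system (Equations (4)-(6))."""
    xn, yn = anchors[-1]
    A = -2.0 * (anchors[:-1] - anchors[-1])
    B = (d[:-1] ** 2 - d[-1] ** 2
         - anchors[:-1, 0] ** 2 + xn ** 2
         - anchors[:-1, 1] ** 2 + yn ** 2)
    return np.linalg.lstsq(A, B, rcond=None)[0]

# With exact distances, the unknown node at (3, 4) is recovered exactly.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
d = np.linalg.norm(anchors - np.array([3.0, 4.0]), axis=1)
print(mmse_position(anchors, d))
```

In the actual algorithm the distances d_i come from Equation (2), i.e., the hop size multiplied by the unknown node's hop count, so the recovered position is only approximate; that residual error is what the DNN correction targets.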

Data Augmentation in WSN Application
Typical augmentation techniques applied to images involve a range of transformations such as translation, blurring, flipping, rotation, and the introduction of various types of noise to data samples. These techniques are well established in the field, and diverse DASs are tailored to specific problems. For example, in the context of the MNIST database of handwritten digits, researchers have explored augmentation techniques [15]. In the field of machine learning, especially for researchers working with techniques like Generative Adversarial Networks (GANs), the limited availability of large datasets for effective training poses a significant challenge. In response, the concept of "virtual big data" was introduced [16]. This concept involves the generation of synthetic or virtual datasets that mimic the characteristics of real-world data, offering a solution in situations where obtaining extensive real-world data is impractical. Similarly, with small datasets of chemical reactions, data-driven models suffer from low accuracy in reaction prediction tasks; to tackle this, such models have been integrated with data augmentation strategies [17]. Data augmentation has also been used to improve the performance of data-driven reaction prediction models by increasing the sample size using fake data augmentation [18].
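As a minimal illustration of such image transformations, assuming 28×28 grayscale arrays like MNIST digits (the shift amount and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, shift=2, noise_std=0.05):
    """Return three augmented copies of one image: translated, flipped, noisy."""
    shifted = np.roll(img, shift, axis=1)                # horizontal translation
    flipped = np.fliplr(img)                             # horizontal flip
    noisy = img + rng.normal(0.0, noise_std, img.shape)  # additive Gaussian noise
    return [shifted, flipped, noisy]

digit = rng.random((28, 28))   # stand-in for one MNIST digit
copies = augment(digit)        # the dataset grows by a factor of 3 here
```

Each transformation preserves the label while producing a new training sample, which is the same principle the anchor-based DAS below exploits for localization data.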
In what follows, we will employ a similar approach by utilizing data augmentation to increase the dataset. This involves creating multiple copies of virtual anchors around the position of each real anchor, as illustrated in Figure 2. The coordinates of these new virtual anchors are mathematically represented by Equation (7). This data augmentation technique is applied to enhance the dataset, generating additional instances of virtual anchors to increase the training data.
As shown in Figure 2, there are three anchors (A = 3), and each anchor is surrounded by five virtual anchors (V = 5). The coordinates (x_ik, y_ik) of the k-th virtual anchor near the real anchor i can be obtained by adding a span multiplied by a random Gaussian variation (Δx_k, Δy_k), denoted as (Vcx, Vcy) in Algorithm 1 (steps 6 and 7), to the coordinates (x_i, y_i) of anchor i, for i = 1, . . ., A and k = 1, . . ., V, as follows:

(x_ik, y_ik) = (x_i + span · Δx_k, y_i + span · Δy_k), (7)

where A and V denote the numbers of real and virtual anchors, respectively.
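A sketch of this generation step, per Equation (7); we assume the Gaussian variations (Vcx, Vcy) of Algorithm 1 are standard normal draws scaled by the span, since the algorithm is not reproduced in full here:

```python
import numpy as np

rng = np.random.default_rng(42)

def virtual_anchors(real_anchors, V, span):
    """Place V virtual anchors around each real anchor per Equation (7):
    (x_ik, y_ik) = (x_i + span*dx_k, y_i + span*dy_k), with (dx, dy) ~ N(0, 1)."""
    A = len(real_anchors)
    virt = np.zeros((A, V, 2))
    for i, (xi, yi) in enumerate(real_anchors):
        for k in range(V):
            dx, dy = rng.normal(size=2)   # the (Vcx, Vcy) Gaussian variations
            virt[i, k] = (xi + span * dx, yi + span * dy)
    return virt

real = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 10.0]])  # A = 3, as in Figure 2
virt = virtual_anchors(real, V=5, span=1.0)                  # V = 5 per real anchor
```

A larger span spreads the virtual anchors further from their real anchor, which is exactly the parameter varied in the span experiments later in the paper.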
This augmentation strategy involving virtual anchors significantly expands the dataset, providing a more extensive and diverse set of training instances for the proposed DNN model. The input data for the proposed DNN framework are the distances between the real anchors and their virtual anchors, and the unknown nodes. The data are then structured into a format suitable for training, often represented in matrix form. The format of the input data for the DNN is a single array, as depicted in Figure 3.
In Figure 3, B_i and VB_ik represent the locations of the real anchors and virtual anchors, respectively, where i = 1, . . ., A, A being the total number of real anchors, and k = 1, . . ., V, V being the total number of virtual anchors. U_j denotes the unknown nodes, where j = 1, . . ., N_u, with N_u being the total number of unknown nodes. The distance between (the real anchors and their virtual anchors) and the unknown nodes is denoted as d_(ik)j.
The training data size D_t is described in Equation (8):

D_t = A (1 + V) N_u. (8)

In this scenario, the dataset composition is determined by the number of real anchors (A = 5), unknown nodes (N_u = 95), and the presence or absence of virtual anchors. Without virtual anchors (V = 0), the total data size D_t is 475 according to Equation (8). However, introducing virtual anchors, as exemplified with V = 20, leads to a substantial increase in the total data size, reaching 9975, as shown in Table 1 and Figure 4.
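In other words, each of the N_u unknown nodes contributes one distance per real anchor and per virtual anchor, so D_t = A(1 + V)·N_u; a quick check against the counts reported for Table 1:

```python
def training_data_size(A, V, Nu):
    """D_t: one distance per (real or virtual) anchor and unknown-node pair."""
    return A * (1 + V) * Nu

print(training_data_size(5, 0, 95))    # no virtual anchors -> 475
print(training_data_size(5, 20, 95))   # 20 virtual anchors per real anchor -> 9975
```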

DNN-Based Estimated Distance Correction
To obtain a deeper understanding of DNNs, it is necessary to revisit the basics of ANNs. ANNs have been used in several areas, such as engineering applications and WSN applications [14,19]. Several types of neural networks are described in [13]. Generally, an ANN can be defined as a system or mathematical model that consists of many nonlinear artificial neurons running in parallel and may be generated as one-layered or multilayered. An ANN consists of a network of neurons organized in input, output, and hidden layers. Different types of networks can be implemented by varying the structure of the weights and the activation functions of the neurons. Neural network systems can learn to approximate relationships between inputs and outputs without being overcome by the complexity and size of the problem. The training of an ANN using the backpropagation (BP) technique typically occurs in three main steps: the feedforward of the input training pattern, the backpropagation of the error, and the update of weights and biases.
ANNs, as highly efficient computational methods, find widespread applications in knowledge representation, machine learning, and predicting output responses in complex systems [20]. Recent advancements have underscored their effectiveness and led to notable achievements in diverse domains [21]. Various training processes have been employed for ANNs [22]; backpropagation is characterized by two essential stages: forward propagation and backward propagation [23]. A localization system based on a WSN and a backpropagation ANN (BP-ANN) has been practically implemented to detect and determine the position of an Alzheimer's patient in an indoor environment [14]. To achieve a minimal localization error, a thorough exploration of various DNN architectures was undertaken, considering different combinations of hidden layers and neurons. Through this iterative process, an optimal DNN architecture emerged, characterized by one input layer (the DNN input distance), five hidden layers with neuron counts of 20, 10, 5, 10, and 20, and a single output layer (the corrected distance), as depicted in Figure 5. The localization accuracy process involved the collection of DNN input data for training, testing, and validation. The dataset was partitioned, allocating 70% for training, 15% for testing, and 15% for validation. The iterative training of the DNN was extended up to 1000 iterations, a crucial step undertaken to enable the DNN to reach an optimal normalized root-mean-square error (NRMSE).
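The selected architecture can be sketched as follows; the weight initialization, tanh hidden activation, linear output, and untrained weights are our assumptions for illustration, since the article specifies only the layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# One input (estimated distance), hidden layers of 20, 10, 5, 10, 20 neurons,
# and one output (corrected distance).
layer_sizes = [1, 20, 10, 5, 10, 20, 1]
weights = [rng.normal(0.0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Feedforward pass: tanh on hidden layers, linear output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)
    return a @ weights[-1] + biases[-1]

d_est = np.array([[12.3], [40.7]])   # Dv-hop estimated distances (one per row)
d_corr = forward(d_est)              # corrected distances, meaningful after BP training
```

Training this network with backpropagation on the (estimated distance, true distance) pairs produced by the DAS is what yields the distance correction used in the rest of the paper.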

Simulation and Performance Analysis
The experimental region was defined by the parameters outlined in Table 2; the nodes were randomly deployed. We carried out a series of experiments. This section presents one case as an example of node deployment under the influence of network settings, specifically by changing the number of virtual anchors and the span range, as illustrated in Figure 6a. This figure provides a visualization of unknown nodes enclosed by 5 real anchors, each surrounded by 20 virtual anchors, with a span equal to 1 m. Figure 6b showcases another scenario where the unknown nodes and 5 real anchors are surrounded by 20 virtual anchors around each real anchor, but with a span equal to 6 m. These visual representations offer a concrete example of node deployment configurations, demonstrating the impact of the number of virtual anchors and varying span ranges in the simulation setup.

Experiment Results
To verify the performance of the proposed Dv-hop + DNN correction algorithm, simulations were separately carried out on the Dv-hop algorithm and the improved Dv-hop + DNN correction algorithm across various values of spans and node communication ranges within a randomly selected square area. The evaluation metric employed for this comparison was the normalized root-mean-square error (NRMSE), calculated using Equation (9):

NRMSE = (1 / (N_u · R)) Σ_{j=1}^{N_u} √((x_j − x̂_j)² + (y_j − ŷ_j)²), (9)

where (x_j, y_j) denotes the real position of the unknown node, (x̂_j, ŷ_j) represents the estimated position (Dv-hop) or corrected position (Dv-hop + DNN correction) of the unknown node, and R is the node communication range. The remaining parameters are defined in Table 1.
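A sketch of the metric; we assume here that the mean localization error over the unknown nodes is normalized by the communication range R, a common choice in Dv-hop studies:

```python
import numpy as np

def nrmse(true_pos, est_pos, R):
    """Mean Euclidean localization error over the unknown nodes,
    normalized by the communication range R (our assumption)."""
    errors = np.linalg.norm(true_pos - est_pos, axis=1)
    return errors.mean() / R

true_pos = np.array([[10.0, 10.0], [20.0, 5.0]])
est_pos = np.array([[13.0, 14.0], [20.0, 5.0]])
print(nrmse(true_pos, est_pos, R=30.0))   # mean error of 2.5 m over R = 30 m
```

Because the error is normalized, results for different communication ranges (e.g., 30 m versus 20 m later in this section) remain directly comparable.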
The comparison of the localization NRMSE for the two algorithms, Dv-hop and the proposed DNN-correction algorithm, is presented in Figure 7 for various numbers of virtual anchors, in Figure 8 for varying span, and in Figure 9 for changes in the communication range. These figures illustrate that the accuracies of all algorithms improved, as expected, with an increasing span and communication range. However, the proposed approach consistently outperformed the Dv-hop algorithm in terms of localization accuracy. The error in localizing unknown nodes decreased with a higher number of virtual anchor nodes in the proposed approach. The positioning errors were smaller with the proposed approach compared to the Dv-hop algorithm, even when the number of virtual anchor nodes was the same. Gradually changing the number of virtual anchor nodes yielded different results in various scenarios. The proposed approach demonstrates increasing superiority over the Dv-hop approach as the number of virtual anchor nodes rises. Importantly, under the same conditions, the NRMSE of the proposed approach with the augmentation system was smaller than that of the Dv-hop algorithm.
Figure 10 illustrates the cumulative distribution function (CDF) of the localization NRMSE under the experiment parameters shown in Table 2. From Figure 10a, using the proposed Dv-hop + DNN correction without the augmentation system, 70% of the sensors could estimate their position with an NRMSE of less than 0.2, while with Dv-hop + DNN correction for 5, 10, 15, 20, and 25 virtual anchors, the figures were 90%, 94%, 95%, 96%, and 98%, respectively. In contrast, from Figure 10b, with Dv-hop for 5, 10, 15, 20, and 25 virtual anchors, 76%, 82%, 82%, and 91% of the sensors, respectively, could estimate their position, and only about 50% with Dv-hop without virtual anchors. This further proves the accuracy of the proposed localization algorithm. Figure 10c shows that when the span is very small (without and with 5 to 25 virtual anchors), 71% of nodes estimate their position within an NRMSE of less than 0.2 using the Dv-hop + DNN correction algorithm, versus no more than 50% using Dv-hop (without and with 5 to 25 virtual anchors), as shown in Figure 10d. Meanwhile, Figure 10e,f show the effect of reducing the communication range from 30 m to 20 m for the two cases, Dv-hop and Dv-hop + DNN correction. Dv-hop + DNN correction without the augmentation system allowed 50% of the sensors to estimate their position with an NRMSE of less than 0.2, and with 5, 10, 15, 20, and 25 virtual anchors, 78%, 81%, 82%, 86%, and 84%, respectively. In contrast, from Figure 10f, with Dv-hop for 5, 10, 15, 20, and 25 virtual anchors, 55%, 62%, 69%, and 67% of the nodes achieve the same accuracy, and only about 50% with Dv-hop without virtual anchors. From the simulations, we observed that the proposed approach kept improving noticeably with a larger number of virtual anchor nodes, up to 20, at which point the total training data size was 9975, as shown in Table 1; performance gains started to saturate beyond that threshold. These results further prove the efficiency of the Dv-hop + DNN algorithm and show that the DAS has a sufficient effect for correcting the estimated coordinates of unknown nodes.

Effect of Span
The evaluation of the Dv-hop and Dv-hop + DNN correction algorithms was conducted with systematic adjustments made to the span (the distance between real anchors and virtual anchors) at values of 1 m, 3 m, 6 m, 9 m, and 12 m. The foundational aspects of the WSN model, detailed in Table 2, remained unchanged throughout these experiments. The experimental outcomes, detailed in Tables 3 and 4, highlight the performance of the Dv-hop, Dv-hop + virtual, and Dv-hop + DNN correction algorithms under varying span values, assessed through NRMSE values. Remarkably, the NRMSE obtained by the proposed Dv-hop + DNN correction consistently ranked first, particularly when virtual anchors were employed. Deploying many real anchors can be cost-prohibitive and may pose challenges in terms of training data size; to address this issue, we introduced a DAS that virtually increases the number of anchors, mitigating the prohibitive cost associated with deploying many real anchors.

Conclusions
This article introduced an innovative and precise machine learning-based approach for range-free localization in WSN applications within the IoT. Our methodology presents a cost-effective distance estimation strategy through the development of a DNN. The aim is to reduce localization errors and enhance accuracy without incurring additional hardware costs. Additionally, we proposed a DAS that virtually increases the number of anchors, significantly augmenting the training data and leading to more accurate localization. Simulation results illustrate the effectiveness of our DAS in range-free localization for WSNs, particularly with a limited number of real anchors. Notably, our proposed Dv-hop + DNN correction surpasses the traditional Dv-hop algorithm, demonstrating superior localization accuracy.

Figure 3. The augmented input data format.

Figure 4. Total data size of the virtual augmentation strategy.

Figure 5. Proposed DNN framework.

Figure 6. Sensor network configuration: unknown nodes, real and virtual anchors (a,b).

Figure 10. Cumulative distribution function (CDF) of the localization NRMSE with the experiment parameters shown in Table 2.

Table 1. Training data size.