Learning Wireless Sensor Networks for Source Localization

Source localization and target tracking are among the most challenging problems in wireless sensor networks (WSNs). Most state-of-the-art solutions are complicated and do not meet the processing and memory limitations of existing low-cost sensor nodes. In this paper, we propose computationally cheap solutions based on the support vector machine (SVM) and twin SVM (TWSVM) learning algorithms in which the network nodes first detect the desired signal. Then, the network is trained to identify the nodes in the vicinity of the source (or target); hence, the region of the event is detected. Finally, the centroid of the event region is taken as an estimate of the source location. The efficiency of the proposed methods is shown by simulations.


Introduction
The applications of networks of wireless smart motes with sensing, processing, and communication capabilities are expanding. Among other uses, these networks can help protect people from health effects such as sleep disturbance [1,2], annoyance [3], cardiovascular effects [4], learning impairment [5,6], and hypertension ischemic heart disease [7]. Other examples include noise monitoring [8] for obtaining noise maps and related action plans [9].
These wireless sensor networks (WSNs) are the basis of the emerging Internet of Things (IoT) and fit especially well in surveillance applications [10,11], where detecting an event source in a region of interest (ROI) and tracking it are important challenges.
Tracking over WSNs is a challenging problem since it requires implementing statistical filters with cumbersome computations. These computationally heavy algorithms must run on network nodes with severe limitations on processing power, memory, and communication.
The nodes' estimates of the source track are transmitted to a central unit of the network, referred to as the "fusion center" (FC). The FC adopts an appropriate fusion rule for obtaining a tuned estimate of the track. Some of the existing fusion rules are independent likelihood pool (ILP) [12], covariance intersection (CI) fusion [13], information graph [14], track-to-track fusion [15], and consensus-based fusion (distributed MTT-DMTT) [16].
There are also source localization approaches based on information theoretic methods. In the approach presented in [17], a coarse estimation of the source location is obtained based on the data of specified nodes which are assigned as the anchor nodes. Then, a set of non-anchor nodes with maximum mutual information (MI) between the source location and their measurements are activated. The source is localized after several iterations. The complexity of the MI-based method grows exponentially in the network size. To alleviate this problem, another sensor selection scheme for source localization based on conditional posterior Cramer-Rao Lower Bound (PCRLB) has been proposed by [18,19].
In many applications, it is important to detect the region of an event after its occurrence has been detected. In other words, many applications need to divide the ROI into event and non-event regions once the event occurrence is detected. For example, in forest fire detection or poisonous gas leakage applications, it is crucial to track the region of the event over time in order to estimate the event's expansion rate and schedule appropriate plans.
Due to the severe bandwidth and energy limitations of battery-powered sensor nodes, they are usually programmed to send as little data as possible. The extreme case in event detection is the implementation of distributed detection [20], where each node sends just one bit to FC indicating its decision about the event occurrence. Naturally, the detection performance of the nodes is not perfect and includes false alarms and misses of the event occurrence. However, if the erroneous decisions of the nodes can be corrected, the region of nodes with decision '1' (indicating event occurrence) would imply the event region. The goal of this paper is to correct both the false alarms and the misses of the network nodes using appropriate machine learning (ML) methods in order to estimate both the event location and its region.
In any WSN, we encounter distributed agents gathering data for a specific task. Since the amount of data collected by WSNs is usually enormous, the related problems and challenges can be handled by suitable ML methods. Instances of applying ML methods to different aspects of WSNs, such as node localization, channel allocation, and routing, have been reviewed in [21]. Support vector machines (SVMs) are powerful tools for classification and regression [22][23][24] based on statistical learning theory. Compared with other ML methods such as artificial neural networks, SVMs have a better generalization guarantee. SVM divides the data samples of two classes by determining a specific hyperplane in the input space. This hyperplane maximizes the separation between the two classes by solving a quadratic programming problem (QPP) whose solution is globally optimum.
Inspired by the idea of SVM, the twin support vector machine (TWSVM) for classification problems was proposed in [25]. TWSVM seeks two non-parallel proximal hyperplanes for classifying data sets. Unlike SVM, TWSVM solves two small SVM-type problems; i.e., a pair of QPPs instead of the single complex QPP of SVM. SVM and TWSVM also work effectively for classifying data sets that are not linearly separable by using the theory of kernel functions [26].
In this paper, two source localization methods are proposed based on distributed detection together with the SVM and TWSVM algorithms. Our contributions presented in this paper are as follows:
• Distributed detection is implemented over a WSN in order to alleviate both the computational and communication burdens (the two most important limitations of WSNs);
• By applying the SVM and TWSVM learning methods, the network becomes capable of detecting the nodes in the vicinity of the desired event, through which the region of the event is detected;
• The event location is taken to be the centroid of the event region; however, a correction method is provided for cases wherein the event occurred at the edges of the ROI;
• We show that our proposed methods, referred to as "Red-S" and "Red-T" since they perform region of event detection (Red) by applying SVM and TWSVM, respectively, are not only good at source localization but are also capable of tracking a moving source.
The proposed algorithms are more general and practical than existing learning methods such as the one in [27], where the region of the event is detected via data exchange among neighboring nodes and the false alarm probability is restrictively assumed to be equal to the miss probability.
The remainder of this paper is organized as follows. The models and assumptions used throughout this manuscript are discussed in Section 2. Section 3 provides the required background knowledge. Estimation of the event location and region of event detection using SVM and TWSVM are proposed in Section 4. The evaluation results are discussed in Section 5. Finally, the paper is concluded in Section 6.
Notations: Lower-case (resp. upper-case) bold letters denote vectors (resp. matrices), with a_i (resp. A_i,j) representing the ith element of a (resp. the (i, j)th element of A). In addition, I is used for denoting the identity matrix, 1 is an all-ones vector of proper size, aᵀ (resp. Aᵀ) denotes the transpose of a (resp. A), the 2-norm of a vector a is denoted by ‖a‖, and N(µ, Σ) denotes a normal distribution with mean vector µ and covariance matrix Σ. In addition, U(a, b) indicates the uniform distribution between a and b. Finally, the symbol ∼ means "distributed as".

System Model and Assumptions
The system model shown in Figure 1 is adopted throughout this paper. Different parts of this model are discussed in the following subsections. Figure 1. The network configuration studied in this paper. The network nodes observe a common phenomenon of interest and decide locally about event occurrence. Then, they send their decisions to FC, where the final decision u_0 is taken. If u_0 = 1, FC gives an estimate of the source location l̂ and its region.

Node Model
A network of N randomly deployed wireless smart motes (referred to as "nodes") is considered, programmed for collaborative detection of a desired event occurrence. In fact, there are two conditions: the normal condition in which no event has occurred (denoted by hypothesis H_0), and event occurrence (denoted by hypothesis H_1). Node i, i ∈ {1, . . . , N}, observes a common phenomenon of interest (POI) for detection of a desired scalar parameter θ(l) originating from the event location l according to the observation model:

z_i = g_i(θ(l)) + v_i,  i = 1, . . . , N,  (1)

where z_i is the observation of node i, v_i is a zero-mean additive white Gaussian noise (AWGN) with variance σ² (i.e., v_i ∼ N(0, σ²)), and g_i(.) is a mapping function (e.g., a function indicating the attenuation, at node i, of an acoustic signal originating from position l). It is assumed that the noises of the nodes, v_i, are uncorrelated both temporally and spatially.
Each node i takes a local decision u_i about event occurrence based on an optimal decision rule, later discussed in Section 3.3, and sends it to an FC (Figure 1). Note that the scalar observation model (1) has been adopted just for simplicity in our discussions. If the desired parameter (signal) were multi-dimensional, only the decision rule would change. Nevertheless, we adopt the scalar observation model since detection is not the focus of our study.

Network Model
We denote the location of node i, i ∈ {1, . . . , N} by x i and collect all locations in x = {x 1 , . . . , x N }. In order to detect the event region, it is assumed that the nodes' locations are known by FC. The positions may be obtained by using an appropriate localization method such as those presented in [28,29].
As illustrated in Figure 1, a parallel configuration has been adopted in this paper. The parallel configuration is the most popular configuration in the literature. However, nodes may transmit their decisions to FC indirectly through intermediate nodes in a hop-by-hop manner (since their communication range is usually limited). This configuration can be represented by the parallel configuration as well if the communication channels are assumed to be error-free [30][31][32].

Communication Channel Model
The communication channels between nodes and FC are assumed to be ideal and error-free. In practice, there are two sources of error: (i) erroneous local decisions, which are due to either local false alarms or misses of nodes, and (ii) erroneous received decisions due to the faulty nature of wireless channels. We integrate both error sources into just the first type and ignore the communication error rate since it does not affect the implementation and evaluation of our proposed algorithms at all.

Problem Statement
After the event occurrence is detected by FC, the problems are to:
1. estimate the event location l, and
2. divide the ROI into event and non-event regions.

Background
In this section, we briefly review the SVM (Section 3.1) and TWSVM (Section 3.2) learning algorithms. The section ends with a refresher on detection over WSNs (Section 3.3), which is needed for the discussion of the proposed methods.

Support Vector Machine (SVM)
The support vector machines (SVMs) are a popular class of supervised learning algorithms; a comprehensive tutorial on the topic can be found in [33]. In its original formulation, SVM attempts to separate two classes of data by a hyperplane whose distance from the data of each class is maximized.
Let us denote the training set by x = {x_1, ..., x_N}, where the class of each sample x_i is represented by y_i ∈ {−1, 1}. The goal of SVM is to separate the two classes of data by a hyperplane y = wᵀx + b whose distance to the nearest samples is maximized. To that end, the following constrained optimization problem must be solved [33]:

min_{w, b, ξ}  ½‖w‖² + C Σ_{i=1}^{N} ξ_i   s.t.  y_i(wᵀx_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, . . . , N,  (2)

where the ξ_i are slack variables used for non-separable samples, and C > 0 is a penalty term used for allowing some limited misclassifications. More specifically, as shown in Figure 2a, the SVM algorithm attempts to separate the two classes by maximizing the margin around the optimum hyperplane. However, in most practical cases there are borderline samples that make classification challenging (Figure 2b). The slack variables ξ_i take these borderline samples into account by imposing a penalty on them: ξ_i = 0 for samples outside the margin, 0 < ξ_i < 1 for samples within the margin, and ξ_i > 1 for misclassified samples. The optimization problem (2) can be solved by resorting to its dual problem and forming the Lagrange objective function, which leads to:

max_α  1ᵀα − ½ αᵀ Y R Y α   s.t.  0 ≤ α_i ≤ C,  i = 1, . . . , N,  yᵀα = 0,  (3)

where Y is a diagonal matrix with Y_ii = y_i, α is the column vector of the Lagrange coefficients, 1 is an N × 1 all-ones vector, and R is the matrix of the inner products of the samples, with elements given by:

R_i,j = x_iᵀ x_j.  (4)

Now, a quadratic optimization problem has been obtained that can be solved by using appropriate functions in related libraries of programming languages, such as the function quadprog(.) in MATLAB. After solving (3) for α, the optimum hyperplane is given by:

w = Σ_{i=1}^{N} α_i y_i x_i,  with b = y_j − wᵀx_j for any j with 0 < α_j < C.  (5)

The final decision rule for each input vector x is now:

y = sign(wᵀx + b) = sign(Σ_{i=1}^{N} α_i y_i x_iᵀx + b),  (6)

where sign(.) is the sign function. Though the SVM algorithm discussed above is applicable only when the two classes can be separated by a linear hyperplane, it can be extended to nonlinear cases via Mercer kernels.
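To make the dual formulation concrete, the following sketch approximates the solution of (3) for a toy two-blob data set, using projected gradient ascent in place of a QP solver (the data, step size, and iteration count are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Toy 2-D training set: two well-separated blobs (an assumption for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.4, (20, 2)), rng.normal(-2.0, 0.4, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

C = 0.5
R = X @ X.T                                   # inner-product matrix (4)
Q = (y[:, None] * y[None, :]) * R             # Q = Y R Y

# Projected gradient ascent on the dual (3): max 1^T a - 0.5 a^T Q a
alpha = np.zeros(len(y))
for _ in range(5000):
    alpha += 1e-3 * (1.0 - Q @ alpha)         # ascent step
    alpha -= y * (y @ alpha) / len(y)         # project onto y^T alpha = 0
    alpha = np.clip(alpha, 0.0, C)            # box constraint 0 <= alpha_i <= C

w = (alpha * y) @ X                           # hyperplane normal, as in (5)
sv = alpha > 1e-6                             # support vectors
b = np.mean(y[sv] - X[sv] @ w)

pred = np.sign(X @ w + b)                     # decision rule (6)
print("training accuracy:", (pred == y).mean())
```

A production implementation would hand (3) to a dedicated QP solver such as MATLAB's quadprog(.); the loop above merely illustrates that (3) is a box-constrained quadratic problem.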
Using Mercer kernels, the data are mapped into a higher-dimensional space wherein the two classes are linearly separable. To that end, one simply replaces the inner-product matrix R in (3) with an appropriate kernel matrix. A popular instance, also used in this paper, is the Gaussian kernel, for which the kernel matrix elements are given by:

K_i,j = exp(−‖x_i − x_j‖² / (2σ²)).  (7)

Then, the decision function (6) generalizes to:

y = sign(Σ_{i=1}^{N} α_i y_i K(x_i, x) + b).  (8)
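The kernel matrix in (7) can be built in a vectorized way; a minimal sketch (the toy points and σ are chosen arbitrarily for illustration):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """K[i, j] = exp(-||x1_i - x2_j||^2 / (2 sigma^2)), the Gaussian kernel (7)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * sigma**2))

X = np.array([[0.0, 0.0], [3.0, 4.0]])        # two points at distance 5
K = gaussian_kernel(X, X, sigma=5.0)
print(K)                                       # diagonal is 1; off-diagonal exp(-0.5)
```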

Twin Support Vector Machine (TWSVM)
In this subsection, the twin support vector machine (TWSVM) [25] is discussed as a popular and accurate method of data classification. The TWSVM formulation is similar to that of SVM except that, in TWSVM, not all data points appear in the constraints simultaneously. Furthermore, TWSVM is faster than SVM as it solves two smaller-sized QPPs [25].
Denote the data points of class +1 and class −1 by the matrices A ∈ R^(m_1×n) and B ∈ R^(m_2×n), respectively (m_1 + m_2 = N, and n is the feature dimension), with A_i and B_i ∈ R^n denoting the ith rows of A and B, respectively. The linear TWSVM, unlike SVM, finds a pair of nonparallel hyperplanes

xᵀw_1 + b_1 = 0  and  xᵀw_2 + b_2 = 0,  (9)

where w_1, w_2 ∈ R^n, and b_1, b_2 ∈ R. Each hyperplane of TWSVM is close to the data points of one of the two classes while being distant from the data points of the other class. Therefore, the formulation of TWSVM can be stated as the pair of problems

min_{w_1, b_1, q_1}  ½‖A w_1 + 1 b_1‖² + c_1 1ᵀq_1   s.t.  −(B w_1 + 1 b_1) + q_1 ≥ 1,  q_1 ≥ 0,  (10)

min_{w_2, b_2, q_2}  ½‖B w_2 + 1 b_2‖² + c_2 1ᵀq_2   s.t.  (A w_2 + 1 b_2) + q_2 ≥ 1,  q_2 ≥ 0,  (11)

where c_1, c_2 are positive parameters, and q_1, q_2 are the vectors of the related slack variables. It is clear that the purpose of TWSVM is to solve the two QPPs (10) and (11). Each QPP in the TWSVM pair represents a classic SVM formulation, with the exception that not all data points show up in the constraints of either problem (see Figure 3). Defining H = [A 1] and G = [B 1], the corresponding Wolfe dual problems can be obtained, respectively, as

max_α  1ᵀα − ½ αᵀ G (HᵀH)⁻¹ Gᵀ α   s.t.  0 ≤ α ≤ c_1 1,  (12)

max_γ  1ᵀγ − ½ γᵀ H (GᵀG)⁻¹ Hᵀ γ   s.t.  0 ≤ γ ≤ c_2 1.  (13)

It can be shown that the solutions to (12) and (13) are given by [w_1ᵀ b_1]ᵀ = −(HᵀH)⁻¹Gᵀα and [w_2ᵀ b_2]ᵀ = (GᵀG)⁻¹Hᵀγ. The matrices HᵀH and GᵀG are positive semidefinite, and thus they may be singular. Therefore, in order to avoid ill-conditioned cases, the inverse matrices (HᵀH)⁻¹ and (GᵀG)⁻¹ are replaced by the regularized versions (HᵀH + εI)⁻¹ and (GᵀG + εI)⁻¹ for a small ε > 0. The nonlinear TWSVM can be expressed as follows:

min_{u_1, b_1, q_1}  ½‖K(A, Cᵀ)u_1 + 1 b_1‖² + c_3 1ᵀq_1   s.t.  −(K(B, Cᵀ)u_1 + 1 b_1) + q_1 ≥ 1,  q_1 ≥ 0,  (14)

min_{u_2, b_2, q_2}  ½‖K(B, Cᵀ)u_2 + 1 b_2‖² + c_4 1ᵀq_2   s.t.  (K(A, Cᵀ)u_2 + 1 b_2) + q_2 ≥ 1,  q_2 ≥ 0,  (15)

where C = [Aᵀ Bᵀ]ᵀ, c_3, c_4 are positive parameters, and K(., .) is the kernel matrix of an appropriately chosen type (the Gaussian kernel in (7) here for its elements). The Wolfe dual problems of (14) and (15), from which the hypersurfaces below are obtained, take the same form as (12) and (13) with H = [K(A, Cᵀ) 1] and G = [K(B, Cᵀ) 1]:

max_α  1ᵀα − ½ αᵀ G (HᵀH)⁻¹ Gᵀ α  s.t.  0 ≤ α ≤ c_3 1,   max_γ  1ᵀγ − ½ γᵀ H (GᵀG)⁻¹ Hᵀ γ  s.t.  0 ≤ γ ≤ c_4 1.  (16)

The resulting hypersurfaces are

K(xᵀ, Cᵀ)u_1 + b_1 = 0,  (17)

K(xᵀ, Cᵀ)u_2 + b_2 = 0.  (18)

Then, the distance of each input point to each of the two obtained hypersurfaces determines its class. In other words, for each input point x, its class y is determined by:

y = sign(|K(xᵀ, Cᵀ)u_2 + b_2| − |K(xᵀ, Cᵀ)u_1 + b_1|).  (19)
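As an illustration, the linear TWSVM duals (12) and (13) can be approximated with a simple projected-gradient ascent; the toy blobs, step size, penalties, and regularization ε below are assumptions of this sketch, not values from the paper:

```python
import numpy as np

# Toy data: class +1 around (2, 2), class -1 around (-2, -2) (illustrative).
rng = np.random.default_rng(1)
A = rng.normal(2.0, 0.4, (20, 2))
B = rng.normal(-2.0, 0.4, (20, 2))
H = np.hstack([A, np.ones((20, 1))])          # H = [A 1]
G = np.hstack([B, np.ones((20, 1))])          # G = [B 1]
eps, c1, c2 = 0.1, 10.0, 10.0                 # regularization and penalties (assumed)

def solve_dual(P, M, c, steps=5000, eta=1e-3):
    """Projected gradient ascent on max 1^T a - 0.5 a^T P (M^T M + eps I)^-1 P^T a,
    subject to 0 <= a <= c, i.e., a regularized version of the duals (12)/(13)."""
    Q = P @ np.linalg.inv(M.T @ M + eps * np.eye(M.shape[1])) @ P.T
    a = np.zeros(P.shape[0])
    for _ in range(steps):
        a = np.clip(a + eta * (1.0 - Q @ a), 0.0, c)
    return a

alpha = solve_dual(G, H, c1)                                   # dual (12)
gamma = solve_dual(H, G, c2)                                   # dual (13)
z1 = -np.linalg.solve(H.T @ H + eps * np.eye(3), G.T @ alpha)  # [w1; b1]
z2 = np.linalg.solve(G.T @ G + eps * np.eye(3), H.T @ gamma)   # [w2; b2]

def twsvm_class(x):
    """Assign the class of the nearer hyperplane."""
    d1 = abs(x @ z1[:2] + z1[2]) / np.linalg.norm(z1[:2])
    d2 = abs(x @ z2[:2] + z2[2]) / np.linalg.norm(z2[:2])
    return 1 if d1 <= d2 else -1

print(twsvm_class(np.array([2.1, 1.9])), twsvm_class(np.array([-2.0, -2.2])))
```

Each dual here only carries box constraints, so the projection is a plain clipping step; this is the structural difference from the SVM dual, which also has an equality constraint.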

Detection over Sensor Networks
In this subsection, the theory of detection and its implementation over sensor networks are reviewed. A comprehensive tutorial on the topic is [20].
Detection is a basic task of WSNs in many applications [34,35] where a final decision about either an event occurrence (labeled as "hypothesis H 1 ") or no event occurrence (denoted by "hypothesis H 0 ") must be taken. This final decision is taken by FC via fusing the data of the network nodes.
The network nodes may send either raw or processed observations to FC. While sending raw measurements imposes a considerable communication burden on the network, the nodes are usually programmed to make a local decision and inform FC about their decision by sending just one bit. To have this practically popular scheme-known as "distributed detection"-implemented, two types of decision rules must be designed: a local decision rule for each node, and a decision fusion rule for FC. These two kinds of decision rules are discussed in what follows.

Local Decision Rule
Nodes of the network observe a desired signal θ according to model (1). In practical scenarios, there is no information about the specifications of the source signal (e.g., the location and the strength of the acoustic signal are not known). In these cases, it is proved that the optimal decision rule for each node would be the likelihood ratio test (LRT) if the statistical information of the sensing noise (v_i in (1)) is available [36]. Accordingly, the optimal decision rule for node i with observation model (1) is given by [36]:

u_i = 1 if z_i > τ_i, and u_i = 0 otherwise,  (20)

with u_i = 0 (u_i = 1) indicating H_0 (H_1), and τ_i being the detection threshold, which is set according to a local false alarm rate (i.e., Pr(u_i = 1|H_0)).
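For Gaussian sensing noise, the threshold τ_i can be computed from the desired local false alarm rate with the inverse normal CDF; a small sketch using the standard library (the noise level and false-alarm rate below are illustrative assumptions):

```python
from statistics import NormalDist
import random

sigma, p_fa = 1.0, 0.05
# Threshold such that Pr(z > tau | H0) = p_fa when z | H0 ~ N(0, sigma^2):
tau = sigma * NormalDist().inv_cdf(1.0 - p_fa)

# Monte-Carlo check of the local false-alarm rate under H0 (z_i = v_i only):
rnd = random.Random(0)
hits = sum(rnd.gauss(0.0, sigma) > tau for _ in range(200_000))
print(tau, hits / 200_000)
```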

Fusion Rule
The network nodes send their decisions to FC, where a decision fusion rule should be adopted in order to reach a final decision u_0. A simple and popular choice is the counting rule [37], in which the sum of the received decisions is compared against a threshold:

Λ = Σ_{i=1}^{N} u_i,  with u_0 = 1 if Λ ≥ t and u_0 = 0 otherwise,  (21)

where N indicates the network size, and t is the detection threshold, which is obtained according to a desired network false alarm rate. Under a common threshold for the sensors and the same noise statistics (i.e., in homogeneous WSNs), Λ under H_0 (i.e., Λ|H_0) is binomially distributed, while it follows a Poisson-binomial distribution in more general cases [38]. The threshold t is computed according to this distribution. While simple, the counting rule maintains robustness [39,40] and can reach almost optimum detection performance for large network sizes [20] (i.e., for sufficiently large values of N).
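The counting rule and its threshold selection can be sketched with a Monte-Carlo simulation (the network size, local rates, and the H_1 scenario below are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p_fa = 200, 0.05                  # network size and local false-alarm rate (assumed)

# Under H0, Lambda = sum_i u_i ~ Binomial(N, p_fa).  Pick the smallest threshold t
# with an empirical network false-alarm rate Pr(Lambda >= t | H0) below 1e-3.
lam0 = rng.binomial(N, p_fa, size=200_000)
t = int(np.quantile(lam0, 1.0 - 1e-3)) + 1
print("threshold:", t, "network P_fa:", (lam0 >= t).mean())

# Under H1, suppose 30 nodes near the source detect with probability 0.8 while the
# remaining nodes still fire only with probability p_fa:
lam1 = rng.binomial(30, 0.8, size=200_000) + rng.binomial(N - 30, p_fa, size=200_000)
print("network P_d:", (lam1 >= t).mean())
```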
There are modifications of the counting rule such as LVDF [41] and WDF [35,42]. Each node in LVDF modifies its decision based on the majority of the decisions of its neighbors. In WDF, the decision of each node is weighted based on the node's sensed signal-to-noise ratio (SNR) and then the weighted decisions are counted. Moreover, the network's detection performance can be improved by considering the correlation among the nodes' decisions [43,44].

Learning Methods for Source Localization
In this section, we propose two learning-based methods for both source localization and identifying the nodes affected by the event. The complexity of the methods and their design methodology are discussed as well.

Region of Event Detection by SVM (Red-S)
In this section, our first proposed method of source localization in WSNs is presented. Having it implemented in a WSN, we show that the region of the event can be recognized. This method, referred to as "Red-S", is carried out by the following steps:

1. The network nodes observe the ROI in order to detect a desired event (e.g., a desired target) when it occurs by using the decision rule (20). Then, they send their decisions to an FC, where the final decision is taken by exploiting an appropriate fusion rule such as the counting rule (21). It is assumed that the network size is sufficiently large so that the overall detection performance of the network is optimum (i.e., there is neither a false alarm nor a miss at the network level).

2. Upon event detection, FC runs the SVM algorithm in order to detect the region of the event (i.e., the nodes in the vicinity of the event) as follows. The decision of each node defines its class according to the mapping rule:

y_i = 2u_i − 1;  i.e., y_i = 1 if u_i = 1 and y_i = −1 if u_i = 0.  (22)

The Gaussian kernel (7) is adopted for constituting the kernel matrix K that replaces the matrix R in the optimization problem (3). The design of the parameter σ of the Gaussian kernel is elaborated in Section 4.3.1. Having solved the SVM, the coefficient vector α is obtained, from which the classifier parameters w and b follow. The event region is then obtained by applying the nodes' locations to the trained classifier:

ŷ = sign(K Y α + b1),  (23)

where Y is the diagonal matrix of the classes and K is the kernel matrix of the nodes' locations; ŷ_i = 1 denotes that node i is in the event region, while ŷ_i = −1 denotes that it is not. Note that (23) is the vectorized version of (8).

3. The location of the event is estimated by averaging the locations of the nodes in the region of the event; that is, the centroid of the nodes in the vicinity of the event is taken as the location of the event. More specifically, denoting the set of the nodes in the event region by E ≜ {x_i : y_i = 1, i = 1, . . . , N}, the estimate of the event location is given by:

l̂ = (1/|E|) Σ_{x_i ∈ E} x_i,  (24)

where |E| denotes the cardinality of E.
In summary, the locations of nodes are used as the training set together with the nodes' decisions as their classes. After having the network trained, the nodes' locations are applied to the obtained classifier. The result gives the region of event with its centroid as the estimation of the event's location.
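Assuming, for illustration, a perfectly corrected classification (every node within some radius of the source labeled +1), the centroid step (24) reduces to a one-line average; all positions and the radius below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
nodes = rng.uniform(0.0, 100.0, size=(500, 2))  # node positions in a 100 m x 100 m ROI
source = np.array([40.0, 60.0])                 # hypothetical source location

# Label nodes within 15 m of the source as the (perfectly classified) event region:
y = np.where(np.linalg.norm(nodes - source, axis=1) < 15.0, 1, -1)
E = nodes[y == 1]                               # event-region set E
l_hat = E.mean(axis=0)                          # centroid estimate (24)
print(l_hat)
```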

Region of Event Detection by TWSVM (Red-T)
Red-S gives both the event location and its region after the event occurrence is detected by FC. To improve the accuracy of the classification of the ROI into event and non-event regions, the more advanced TWSVM algorithm is applied. However, TWSVM is more sensitive to its design parameters, as discussed later, and needs an a priori estimate of the event location in order to yield accurate outcomes. This estimate can be provided by Red-S. In other words, the classification performance of Red-S is improved by applying TWSVM on the network. The overall algorithm is referred to as "Red-T" and performs the following steps after the Red-S result is available:

1. The parameters of TWSVM are computed based on the classification result of Red-S, as will be elaborated in Section 4.3.2.

2. The TWSVM classification algorithm, with its parameters computed in the previous step, is applied on the nodes' locations as the training set. The class of each node is determined according to the mapping rule given in (22).

3. The Wolfe dual problems of the nonlinear TWSVM are solved, yielding the two hypersurfaces (17) and (18).

4. The nodes are assigned to the event and non-event regions depending on which of the two hypersurfaces given by (17) and (18) is closer. In other words, the class of node i is updated according to:

y_i = sign(|K(x_iᵀ, Cᵀ)u_2 + b_2| − |K(x_iᵀ, Cᵀ)u_1 + b_1|),  (25)

where y_i = 1 denotes that node i is in the region of the event, while y_i = −1 denotes that it is not.

5. As in Red-S, the location of the event is obtained by averaging the locations of all nodes in the event region, i.e., by (24).
As an instance of how Red-S and Red-T work, see Figure 4, where the classification results of applying Red-S and Red-T to the network condition in Figure 4a are illustrated in Figure 4b,c, respectively. As shown, both the event location and its region have been detected by both methods, though the performance is improved by applying Red-T. Note that the nodes with false alarms are ignored by Red-S and Red-T, while the decisions of the nodes that are in the vicinity of the event but miss the detection are corrected.

Red-S Parameters
During the implementation of Red-S, the value of σ in the Gaussian kernel (7) is of importance. Its value should be chosen so that a reasonable kernel value is obtained. Therefore, it can be considered as a function of the average distance between nodes, or simply a function of the network density. For example, the average distance between nodes is smaller in dense networks, so σ should be set smaller as well.
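As a sketch, the density-dependent choice used later in the evaluation section (σ = 20000/N) can be wrapped in a helper:

```python
def kernel_sigma(n_nodes):
    """Kernel width decreasing with network density; the constant 20000 follows
    the sigma = 20000/N setting reported in the evaluation section."""
    return 20000.0 / n_nodes

print(kernel_sigma(1000), kernel_sigma(250))   # denser network -> smaller sigma
```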

Red-T Parameters
The performance of TWSVM depends on the choices of the kernel function parameters. The optimal values for the parameters are determined by the grid search method with the aim of minimizing the distance between the estimated location of event by TWSVM and that of Red-S.
To alleviate the computational burden, the settings c_1 = c_3 and c_2 = c_4 are adopted for the penalty parameters, so the grid search is performed over only two parameters.

Edge Effect Correction
As discussed in the previous sections, the centroid of the nodes in the event region, i.e., (24), is taken as the source location. Since the centroid is never located at the edges of the region under surveillance (ROI), Red-S and Red-T do not perform well at the edges of the ROI; we refer to this fault as the edge effect.
The edge effect is alleviated by adding (or subtracting) a random number to the already obtained estimate of the source location. More specifically, if a portion of the nodes in the event region, say 20 percent of them, are among the nodes with the lowest first coordinate (e.g., the nodes at the left edge of the network shown in Figure 1), the event may have occurred at the edge. Thus, the first coordinate of l̂ should be corrected by:

l̂(1) := l̂(1) − 2u,  (26)

where := indicates modification (like the assignment operator in programming languages), and u is a uniformly distributed random variable between 0 and l̂(1), i.e., u ∼ U(0, l̂(1)). The same correction mechanism may be applied to the other edges. This correction is validated in Section 5.

Computational Complexity
SVM requires solving a QPP with inequality constraints. It is well known that, for a training set of size N, the computational complexity of SVM is O(N³) [46,47]. On the other hand, as discussed in Section 3.2, TWSVM solves a pair of QPPs instead of the single complex QPP of SVM. If the number of patterns in each class approximately equals N/2, the complexity of TWSVM is O(2(N/2)³) [25]; in other words, TWSVM is four times faster than SVM. Accordingly, the complexities of Red-S (involving SVM) and Red-T (involving both SVM and TWSVM) are O(N³) and O((5/4)N³), respectively. Note that both outstanding source localization methods based on statistical signal processing, the MI-based and PCRLB-based methods, are iterative. Moreover, the nodes quantize their observations into L bits and A nodes are selected in each iteration [18]. The per-iteration complexity of the MI-based method is O(N·L^A + ALN) [18], which grows exponentially with A, the number of nodes selected in each iteration. The PCRLB-based method is much simpler, with a per-iteration complexity of O(ALN) [18]. Therefore, Red-S and Red-T, while consuming much less bandwidth (each node sends just one bit), provide a complexity reduction, especially compared to the MI-based method.
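The operation-count comparison above can be checked with a few lines (constants and lower-order terms ignored):

```python
# One QPP of size N for SVM versus two QPPs of size N/2 for TWSVM.
def svm_cost(n):
    return n ** 3

def twsvm_cost(n):
    return 2 * (n // 2) ** 3

n = 1000
print(svm_cost(n) / twsvm_cost(n))                    # SVM vs. TWSVM: 4x
print((svm_cost(n) + twsvm_cost(n)) / svm_cost(n))    # Red-T relative to Red-S: 5/4
```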

Evaluation and Discussion
In this section, the performance of the Red-S and Red-T schemes is evaluated. To that end, a randomly deployed homogeneous WSN in a 100 m × 100 m region is considered, where the nodes' observations are contaminated by standard normal AWGN. It is assumed that the network goal is to detect a target with an acoustic signal following the isotropic model:

P_i = P_0 / d_ti^a,  (27)

where P_i denotes the strength of the acoustic source at the location of node i, P_0 is the strength of the source at 1 m distance from the target, d_ti is the distance between node i and the target, and a is an attenuation coefficient (a = 2 is considered throughout our simulations). Therefore, the observation of node i can be modeled as:

z_i = P_i + v_i.  (28)

In the simulations, C = 0.5 has been used as the trade-off parameter of the SVM optimization problem, and σ = 20000/N as the kernel parameter (i.e., σ = 20 for N = 1000). The accuracy of Red-S and Red-T for source localization has been evaluated in terms of the mean squared error (MSE), defined by:

MSE = E(‖l − l̂‖²),  (29)

in which l and l̂ are respectively the exact and the estimated location of the target, and E(.) denotes statistical expectation.
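The observation generation of (27) and (28) can be sketched as follows (the target position, P_0, and the decision threshold are illustrative assumptions; a 1 m floor on the distance avoids the near-field singularity):

```python
import numpy as np

rng = np.random.default_rng(0)
nodes = rng.uniform(0.0, 100.0, size=(400, 2))   # random deployment in the ROI
target = np.array([50.0, 50.0])                  # hypothetical target position
P0, a = 100.0, 2.0                               # P0 is an illustrative assumption

d = np.maximum(np.linalg.norm(nodes - target, axis=1), 1.0)  # clamp d >= 1 m
P = P0 / d ** a                                  # isotropic attenuation (27)
z = P + rng.standard_normal(len(nodes))          # observations (28), standard AWGN
u = (z > 1.645).astype(int)                      # local decisions, ~5% false-alarm rate
print(u.sum(), "of", len(nodes), "nodes decided 1")
```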
The proposed methods are compared against the fault recognition (FR) algorithm [27]. In FR, the potentially faulty decision (i.e., either a false alarm or a miss of the event) of each node is meant to be corrected by comparing it with the decisions of the node's neighbors. In fact, FR exploits the spatial correlation among the nodes' decisions and works as follows:
1. Determine the neighbors of each node (i.e., the nodes within its communication range).

2. Nodes take an initial decision based on their observations.
3. The final decision of each node is set to the decision of the majority of its neighbors.
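The three FR steps above can be sketched with a vectorized majority vote (the node count, initial decisions, and communication range are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
nodes = rng.uniform(0.0, 100.0, size=(300, 2))   # node positions (assumed)
u = rng.integers(0, 2, size=300)                  # step 2: initial local decisions
r = 15.0                                          # communication range (assumed)

# Step 1: neighbours = nodes within range r (each node counts as its own neighbour).
D = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=2)
neigh = (D <= r).astype(int)

# Step 3: final decision = majority vote among neighbours.
u_fr = ((neigh @ u) * 2 > neigh.sum(axis=1)).astype(int)
print(u_fr[:10])
```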
The MSEs of the methods have been obtained via more than 5000 Monte-Carlo runs in different scenarios, with the results depicted in Figures 5-7. In addition to FR, we also examined naive source localization, where the centroid of all nodes with u_i = 1 is simply taken as the estimate of the source location. However, the MSEs of this method were more than three times those of FR (more than 120 m² in all cases). Therefore, its performance is included only in Figure 5, where the MSEs of the methods are shown for different network sizes. Comparing the performance of naive source localization across network densities (Figure 5a) with that of the other three methods (Figure 5b) reveals how much the performance is improved by processing the raw decisions. Figure 5b shows that the estimation error decreases in denser networks. However, the rate of decrease becomes negligible beyond a certain density; in other words, it is not advantageous to employ more network nodes past that point. For example, in the scenario of Figure 5b, the difference between the MSEs of 600-node and 1000-node networks is less than 4 m². Figure 5b also shows that Red-S yields a more accurate estimate of the event location in less dense networks. This is because the parameters of Red-T are more sensitive to the dynamics of the network. In addition, note that the grid search of the TWSVM algorithm has been confined in order to speed up the running time. In fact, Red-S is more robust than Red-T because it has just one parameter to adjust.
Figure 6 shows that the estimation error is lower at higher SNR values. In addition, Figure 7 shows how local false alarm rates affect the overall target tracking error. Note that nodes send more data (and hence consume more energy) at higher false alarm rates. Therefore, the false alarm rate should be as low as possible; however, a very low local false alarm rate decreases the detection probability of the network as well, so there is a trade-off. In another scenario, shown in Figure 8, the accuracy of Red-S in tracking a moving target has been assessed. In this scenario, a target moves from the lower-left corner to the upper-right corner at a constant speed. Figure 8 shows the average of 10 tracking runs. As shown, the tracking performance is better in denser networks, and the accuracy is acceptable even in networks with low densities. Another point is that the performance degrades in the corners, since Red-S takes the centroid of the event region as the event location, which is never located at the edges of the ROI. However, as shown in Figure 8b, the edge effect is alleviated appropriately, especially in denser cases, by applying the correction method discussed in Section 4.4.

Conclusions and Future Directions
In this paper, two methods of source localization were proposed for implementation in a WSN with low-cost nodes. In our proposed methods, the network first detects the region of the desired event and then computes the event location by averaging the locations of all nodes in the vicinity of the event. To achieve this, a combination of distributed detection and learning algorithms is exploited. In fact, the source localization is carried out in three phases: (i) the network nodes send their decisions about event occurrence to FC; (ii) upon event detection, FC classifies the nodes into two classes, nodes in the "event region" and nodes in the "non-event region"; (iii) the average of the locations of the nodes in the event region is taken as the event location.
The region of event detection is performed using the SVM and TWSVM learning methods; therefore, the proposed methods were referred to as "Red-S" and "Red-T", respectively. By simulating different scenarios, it was shown that Red-S and Red-T work appropriately in target localization, especially in dense networks. However, the proposed methods are capable of localizing just one event.
Applying learning methods, such as k-means clustering, to multi-source localization may be considered as future work.
Finally, note that though the proposed methods were applied in the WSN framework, they are applicable to any framework with false alarms and misses; one such example is medical health tests. The methods can also be applied to outlier detection applications.

Conflicts of Interest:
The authors declare no conflict of interest.