Tactile Routing for Location Privacy Preservation in Wireless Sensor Networks: A Game Theoretic Approach

Location Privacy Preservation (LPP) in Wireless Sensor Networks (WSNs) is a critical element in the success of WSNs in the era of the Internet of Things and smart systems. The LPP problem in a WSN can be stated as follows: given a WSN with an adversary aiming to uncover the location of the network's critical nodes, the goal of the WSN manager is to conceal the location of those nodes via routing and/or encryption mechanisms. Typical research on LPP in WSN routing involves developing and/or estimating the performance of a fixed routing protocol under a given attack mechanism. Motivated by advancements in network softwarization, in this work, we propose an approach where the WSN manager as well as the WSN adversary can deploy multiple routing and attack mechanisms, respectively. Initially, the proposed approach is formulated as a repeated two-player zero-sum game. The formulation is then extended to handle multiple objectives and incomplete information in the game matrix. The multiple objectives are handled via the epsilon constraint method, and the incomplete information is modeled as interval-based uncertainty. In sum, the proposed formulation reduces to linear programming problems, which can be solved efficiently. Numerical case studies illustrate the applicability of the proposed approach. Finally, a discussion on obtaining the required data from any given WSN, an interpretation of the formulation's results, and future research directions are presented.


Introduction
The Internet of Things (IoT) is the heart of Industry 4.0, where a number of smart devices are deployed on a communication network. The devices process the incoming and outgoing data ubiquitously, and they communicate among themselves autonomously. Thus, IoT is the crux in developing new applications and experiences including smart factories, smart agriculture, smart homes, etc. The smart devices of IoT play key roles in collecting, processing, and forwarding data from source nodes to the sink node(s) via multi-hop network communication. This ubiquitous and autonomous communication results in Wireless Sensor Networks (WSNs) with new challenges that are related to their sustainability. On one hand, since these devices have constrained capabilities (energy supply, transmission range, computing power, etc.), many research works focus on developing routing methods that cope with these constraints. On the other hand, these networks are typically deployed and operated in unattended environments. Therefore, these networks might be vulnerable to various types of attacks that may impact the operation of the network partially or completely, such as identifying the location of critical network nodes (such as source and sink nodes). Securing the location of critical nodes in WSNs is a complicated, pressing challenge that needs to be tackled on different fronts.

The difficulty in fake packet routing is determining the right number of fake packets to be induced into the network. Furthermore, fake packets consume network resources, including energy and time. Applications of fake packets can be seen in [13,14].

Ring Routing: The idea in this routing (see [15]) is to induce disjoint rings (cycles) while transmitting the packets from source nodes to sink node(s). Every disjoint ring contains a master node (bus node) that is responsible for forwarding the packets from source node to sink node (actual route). Simultaneously, the bus nodes on the actual route send additional packets to a ring of nodes. This additional set of packets sent on the disjoint rings confuses the adversaries. Further applications of ring routing can be seen in [16], and energy-efficient ring routing can be seen in [17].

Multi-Path Routing: In multi-path routing, data between source and sink nodes are sent via multiple disjoint paths. In order to prevent the adversary from tracing back the paths, some deviation mechanisms are proposed. For example, a random stride routing is proposed in [18], and a constrained random routing is proposed in [19]. The constrained routing limits the energy usage in the network. Typically, multi-path routing without careful deployment suffers from high energy consumption during the transmission. Thus, several flavors of this category focusing on reduced energy consumption in multi-path routing are proposed in [20,21,22].

Network Encoded Routing: In this routing, packets are re-encoded, split into multiple small packets, or integrated into one big packet during the transmission.

Directional Routing: Directional routing relies on directional antennas and/or transceivers. The underlying assumption is that directional routing makes eavesdropping expensive. In [23], the usage of directional antennas along with transmitting power control and information compression is depicted. Further use of directional transceivers is presented in [24].

Data Mule Routing: Data mules are mobile agents. The routing mechanism is as follows: a source node sends a packet to a randomly selected intermediate node. The intermediate node forwards the packet to a data mule. The data mule randomly moves around the network and forwards the packet to another intermediate node. Finally, that intermediate node forwards the packet to the sink node. The data mule makes it difficult for adversaries to locate the critical nodes, but it increases the transmission time.

Context-Aware Routing: The idea here is to use sensor nodes that can detect and localize the adversary. Such a mechanism can then be used for a local temporary shutdown (stop working for a while) of nodes around the adversary [25,26]. This shutdown acts as an entrapment of the adversary. Moreover, based on the distance of the adversary from the source or sink nodes, alternate routes avoiding the adversary's coverage area can be used.

Problem Statement
Consider a Wireless Sensor Network (WSN) where an adversary makes passive attacks to identify (or locate) the critical nodes. The adversary is free to choose among several attack mechanisms, for example, traffic analysis or tracing back. Let $A$ be the set of attack mechanisms available to the adversary and $a \in A$ be an attack mechanism. It is assumed that the adversary is capable of launching local or hotspot-level attacks on the WSN. The goal of the WSN manager is to maximize the safety period while ensuring the Quality of Service (QoS) with reasonable energy consumption. In this work, we assume that the WSN is designed such that multiple routing mechanisms can be employed adaptively, so the WSN manager (defender) can command the nodes in the WSN to switch among these routing mechanisms. For example, the defender could choose one routing mechanism (such as phantom routing or multi-path routing) for a certain period, then choose another one for the next period, and so on, according to a specific defense strategy. Let $D$ be the set of options available to the defender and $d \in D$ be a defense mechanism.

Let $h_{d,a}$ be the total survival time (the time from the start of attacks to the detection of critical station(s)) when the adversary picks an attack mechanism $a$ and the defender picks a defense mechanism $d$. The primary goal of the defender is to have a high survival time, which is contrary to the goal of the adversary. Therefore, the payoff for the adversary conflicts with that of the defender and can be taken as $-h_{d,a}$. Thus, the conflict between adversary and defender can be modeled as a two-player zero-sum game. However, unlike in conventional game theory models, the manager also has to satisfy the QoS and energy requirements. In addition, the survival time data may contain uncertainty. All the notations that are used in this paper are defined in Abbreviations. In the following section, a game-theory-based approach to LPP is proposed.

Modeling WSN LPP as a Matrix Game
The relation between network designers/administrators and intruders/attackers can be modeled as a non-cooperative game. Hence, security- and privacy-based protocols in computer networks can be designed using game theory. Game-theory-based approaches related to privacy in Internet-based communications can be seen in [27,28,29,30,31,32]. In this work, a strategic game between the WSN manager (as the defender) and the WSN attacker (as the adversary) is proposed. The roles of the defender and the adversary, under mild assumptions, can be cast as a two-player zero-sum game. The following are the key assumptions and characteristics of the proposed game:
• Rational players: Both players have well-defined individual goals, and they strive to reach their goals.
• Non-cooperative players: Both players are competitive, and their individual goals are diametrically opposed.
• Finite moves: Both players have a finite set of moves to play. A move for the defender (adversary) is defined as the selection of a routing (attack) mechanism.
• Independent moves: Unlike typical two-player games, in this game, there are no fixed time points where the players play their moves. Practically, the players can play their moves asynchronously and at different frequencies. However, the moves are taken independently, i.e., no player knows the choice of the other player a priori.
• Memory-less repeated moves: The game is a multiple-shot game, where the moves of the defender are not visible to the adversary (and vice versa), even after the end of the game. Thus, the game is memory-less: neither player learns about the other from past shots or moves.
• Expected preference: For every combination of defender move and attacker move, a payoff is known or can be estimated. This payoff captures the preference relation for each player. Both players seek the best expected payoff value.

Due to the above characteristics, the game can be approximated as a steady-state, two-player zero-sum game [33]. Since the moves are taken by the players over and over again during the network's operation, it is advisable for the defender to vary the pure strategy from move to move. Therefore, the optimal mixed strategy obtained from the two-player zero-sum game can be utilized as tactical operational probabilities for the defender.
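One way to read the optimal mixed strategy as tactical operational probabilities is to draw the routing mechanism for each operating period at random according to those probabilities. The following is a minimal sketch under stated assumptions: the mechanism labels and the mixed strategy are hypothetical placeholders (not values from the paper), and `schedule_routing` is an illustrative helper name.

```python
import random
from collections import Counter


def schedule_routing(mechanisms, probabilities, periods, seed=0):
    """Draw one routing mechanism per operating period from a mixed strategy."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.choices(mechanisms, weights=probabilities, k=periods)


mechanisms = ["R1", "R2", "R3"]   # hypothetical routing mechanisms
strategy = [0.5, 0.3, 0.2]        # hypothetical mixed strategy (sums to 1)

plan = schedule_routing(mechanisms, strategy, periods=10_000)
# Empirical share of periods in which each mechanism is deployed.
shares = {m: c / len(plan) for m, c in Counter(plan).items()}
```

Over a long horizon, the empirical shares approach the mixed-strategy probabilities, which is the sense in which the strategy describes the proportion of time dedicated to each routing mechanism.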

Deterministic Survival Times
Let $H \in \mathbb{R}^{|D| \times |A|}$ be a matrix whose element $h_{d,a}$ in the $d$th row and $a$th column represents the survival time when the adversary picks an attack mechanism $a \in A$ and the defender picks a defense mechanism $d \in D$. Let us assume that $h_{d,a}$ is crisp and deterministic. The expected payoff $E$ of the game, a gain for the defender (loss for the adversary), is given as follows:

$$E = x^T H y = \sum_{d \in D} \sum_{a \in A} h_{d,a}\, x_d\, y_a, \tag{1}$$

where $x = [x_1, \ldots, x_{|D|}]^T$, $x_d$ is the probability of using the strategy $d$ by the defender, $y = [y_1, \ldots, y_{|A|}]^T$, and $y_a$ is the probability of using the strategy $a$ by the adversary. In the case of mixed strategies, the objective of the defender is to maximize the minimum expected payoff, whereas the objective of the adversary is to minimize the maximum expected payoff.
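As a small numerical illustration of the expected payoff $x^T H y$, the sketch below evaluates it for one mixed and one pure strategy pair. The 2 × 2 survival-time matrix is hypothetical and chosen only for easy arithmetic; `expected_payoff` is an illustrative helper name.

```python
import numpy as np


def expected_payoff(H, x, y):
    """Expected survival time x^T H y for mixed strategies x (defender) and y (adversary)."""
    H = np.asarray(H, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ H @ y)


# Hypothetical survival times: rows are defense mechanisms, columns are attacks.
H = np.array([[4.0, 1.0],
              [2.0, 3.0]])

E_mixed = expected_payoff(H, [0.25, 0.75], [0.5, 0.5])  # both players randomize
E_pure = expected_payoff(H, [1.0, 0.0], [0.0, 1.0])     # pure strategies pick out h_{1,2}
```

With pure strategies, the expression reduces to a single entry of the matrix; with mixed strategies, it is the probability-weighted average of all entries.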

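Lemma 1. The Operations Research (OR) model to identify the best mixed strategy for the defender (also known as the max-minimizer) can be written as:

$$\begin{aligned} \max: \quad & \lambda \\ \text{s.t.}: \quad & h_a^T x \ge \lambda \quad \forall a \in A, \\ & e^T x = 1, \\ & x \ge 0, \end{aligned} \tag{2}$$

where $h_a$ is the $a$th column of matrix $H$, and $e$ represents a vector containing all ones, whose size can be identified from the context.

Proof 1. Let $\lambda(x)$ be the worst expected payoff value of the defender for any arbitrary choice of $x$, defined as:

$$\lambda(x) = \min_{y} \left\{ x^T H y : e^T y = 1,\ y \ge 0 \right\}. \tag{3}$$

The worst expected value can be rewritten as:

\begin{align}
\lambda(x) &= \min_{y} \left\{ (H^T x)^T y : e^T y = 1,\ y \ge 0 \right\} \tag{4a} \\
&= \min_{y} \left\{ \sum_{a \in A} (H^T x)_a\, y_a : e^T y = 1,\ y \ge 0 \right\} \tag{4b} \\
&= \min_{a \in A}\ (H^T x)_a, \tag{4c}
\end{align}

where $(H^T x)_a$ is the $a$th element of vector $H^T x$. Equation (4c) follows from the theory of linear programming, which states that the optimal solution of a linear function $\sum_{a \in A} (H^T x)_a\, y_a$ on a non-empty simplex $\{e^T y = 1,\ y \ge 0\}$ lies at one of the $|A|$ extreme points. Thus, the OR model that provides the best mixed strategy for the defender (also known as the max-minimizer) can be written as:

$$\max_{x}: \ \min_{a \in A}\ (H^T x)_a \quad \text{s.t.}: \ e^T x = 1,\ x \ge 0. \tag{5}$$

Since $(H^T x)_a = h_a^T x$, introducing an auxiliary variable $\lambda$ that is bounded above by every $h_a^T x$ shows that the above formulation is the same as Formulation (2).

Corollary 1. The OR model to identify the best mixed strategy for the attacker can be written as:

$$\begin{aligned} \min: \quad & \mu \\ \text{s.t.}: \quad & h^d y \le \mu \quad \forall d \in D, \\ & e^T y = 1, \\ & y \ge 0, \end{aligned} \tag{6}$$

where $h^d$ is the $d$th row of matrix $H$, and $\mu$ denotes the largest expected payoff that the attacker concedes to the defender.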
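Formulation (2) is a small linear program and can be handed to any LP solver. The sketch below is one possible encoding using SciPy's `linprog` with the HiGHS backend, which is swapped in here for illustration and is not the solver used in the paper's case studies. The payoff matrix is hypothetical, and `defender_strategy` is an illustrative helper name.

```python
import numpy as np
from scipy.optimize import linprog


def defender_strategy(H):
    """Solve max lambda s.t. h_a^T x >= lambda (all a), e^T x = 1, x >= 0."""
    H = np.asarray(H, dtype=float)
    n_d, n_a = H.shape
    # Decision vector: [x_1, ..., x_|D|, lambda]. linprog minimizes, so use -lambda.
    c = np.append(np.zeros(n_d), -1.0)
    # Rewrite h_a^T x >= lambda as lambda - h_a^T x <= 0, one row per attack a.
    A_ub = np.hstack([-H.T, np.ones((n_a, 1))])
    b_ub = np.zeros(n_a)
    # Probabilities sum to one; lambda does not enter this constraint.
    A_eq = np.append(np.ones(n_d), 0.0).reshape(1, -1)
    b_eq = [1.0]
    # x >= 0, lambda free.
    bounds = [(0, None)] * n_d + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_d], res.x[-1]


# Hypothetical survival times: rows are defense mechanisms, columns are attacks.
H = np.array([[4.0, 1.0],
              [2.0, 3.0]])
x_star, game_value = defender_strategy(H)
```

For this hypothetical matrix, equalizing the two column payoffs by hand gives the defender strategy (0.25, 0.75) with a guaranteed expected survival time of 2.5, which is what the LP should return.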

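Lemma 2. The proposed game has a mixed strategy Nash equilibrium.

Proof 2. Based on linear programming's weak duality theory [34], the following holds:

$$\max_{x}\ \min_{y}\ x^T H y \ \le\ \min_{y}\ \max_{x}\ x^T H y, \tag{7}$$

where $x$ and $y$ range over the probability simplices $\{e^T x = 1,\ x \ge 0\}$ and $\{e^T y = 1,\ y \ge 0\}$, respectively. Notice that both Formulations (2) and (6) are feasible. From linear programming's strong duality theory, it can be concluded that the optimal objective values for both formulations are equal and finite, say $v$. That is, the following holds:

$$\max_{x}\ \min_{y}\ x^T H y \ =\ \min_{y}\ \max_{x}\ x^T H y \ =\ v. \tag{8}$$

Let $x^*$ and $y^*$ be defined as follows:

\begin{align}
x^* &\in \arg\max_{x} \left\{ \min_{a \in A}\ h_a^T x : e^T x = 1,\ x \ge 0 \right\}, \tag{9} \\
y^* &\in \arg\min_{y} \left\{ \max_{d \in D}\ h^d y : e^T y = 1,\ y \ge 0 \right\}. \tag{10}
\end{align}

By these definitions, $x^{*T} H y \ge v$ for every mixed strategy $y$ and $x^T H y^* \le v$ for every mixed strategy $x$, so $x^{*T} H y^* = v$. Equation (8) therefore indicates that, no matter what the adversary (defender) chooses to play, the defender (adversary) has no rationale to change from $x^*$ ($y^*$). That is, neither player can hope to improve on the expected payoff value $x^{*T} H y^*$ by deviating unilaterally. Thus, $(x^*, y^*)$ represents the mixed strategy Nash equilibrium of the proposed game.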
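The equilibrium property argued in the proof can be checked numerically for a candidate pair of mixed strategies: because the expected payoff is linear in each player's strategy, it suffices to compare against the best pure-strategy deviation of each player. The sketch below assumes a hypothetical 2 × 2 payoff matrix and hand-computed candidate strategies; `is_equilibrium` is an illustrative helper name.

```python
import numpy as np


def is_equilibrium(H, x, y, tol=1e-9):
    """Check that neither player gains by deviating unilaterally from (x, y)."""
    H = np.asarray(H, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    value = x @ H @ y
    best_defender_reply = np.max(H @ y)    # defender's best pure reply to y
    best_attacker_reply = np.min(H.T @ x)  # attacker's best pure reply to x
    return bool(best_defender_reply <= value + tol
                and best_attacker_reply >= value - tol)


# Hypothetical survival times: rows are defense mechanisms, columns are attacks.
H = np.array([[4.0, 1.0],
              [2.0, 3.0]])

at_equilibrium = is_equilibrium(H, [0.25, 0.75], [0.5, 0.5])
off_equilibrium = is_equilibrium(H, [0.5, 0.5], [0.5, 0.5])
```

In the second call, the defender mixes evenly, so the attacker can lower the expected survival time by always playing the second attack; the check reports that the pair is not an equilibrium.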
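In practical scenarios, the defender (adversary) may have additional objectives that are unrelated to the adversary (defender). For example, the defender might be inclined to deviate from the optimal mixed strategy in order to reduce energy consumption and/or improve QoS in communication. Formulation (2) can be extended to include the QoS and energy objectives. Incorporating the additional objectives into the defender's model may invalidate the above minimax equality. However, the resulting strategy will ensure that QoS and energy consumption in the communication network are at the preferred level. A Multi-Objective Linear Programming (MOLP) model that captures the optimal mixed strategy for the defender can be written as:

\begin{align}
\max: \quad & \lambda \tag{11a} \\
\min: \quad & q^T x \tag{11b} \\
\min: \quad & p^T x \tag{11c} \\
\text{s.t.}: \quad & h_a^T x \ge \lambda \ \ \forall a \in A, \quad e^T x = 1, \quad x \ge 0, \tag{11d}
\end{align}

where $q = [q_1, \ldots, q_{|D|}]^T$, $q_d$ is the QoS level, expressed in terms of the average transmission latency, obtained by strategy $d$, $p = [p_1, \ldots, p_{|D|}]^T$, and $p_d$ is the average amount of energy required for implementing strategy $d$. Let $p_{\max} = p^T x^*$ and $q_{\max} = q^T x^*$. Formulation (11a-d) can be equivalently solved as a series of the following OR models, where $\varepsilon_1$ and $\varepsilon_2$ are parameters that are selected a priori:

\begin{align}
\max: \quad & \lambda \tag{12a} \\
\text{s.t.}: \quad & h_a^T x \ge \lambda \quad \forall a \in A, \tag{12b} \\
& q^T x \le q_{\max} - \varepsilon_1, \tag{12c} \\
& p^T x \le p_{\max} - \varepsilon_2, \tag{12d} \\
& e^T x = 1, \tag{12e} \\
& x \ge 0. \tag{12f}
\end{align}

When $\varepsilon_1 = \varepsilon_2 = 0$, the solution of Formulation (12a-f) results in the minimax game value. However, when $\varepsilon_1 > 0$ or $\varepsilon_2 > 0$, the solution deviates from the minimax game value. Obtaining the solution for various values of $\varepsilon_1$ and $\varepsilon_2$ results in a Pareto surface, which can be presented to the WSN manager. The network manager and decision makers can then select the operational point from the Pareto surface. In the following subsection, we extend the proposed model to handle uncertainty in the $H$ matrix.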
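The epsilon constraint sweep can be sketched as a loop over a grid of ($\varepsilon_1$, $\varepsilon_2$) values, solving one LP per grid point. The sketch below assumes the caps take the form $q_{\max} - \varepsilon_1$ and $p_{\max} - \varepsilon_2$, uses SciPy's `linprog` (HiGHS) as a stand-in solver, and runs on hypothetical data; `solve_capped` is an illustrative helper name.

```python
import numpy as np
from scipy.optimize import linprog


def solve_capped(H, q, p, q_cap=None, p_cap=None):
    """max lambda s.t. H^T x >= lambda, q^T x <= q_cap, p^T x <= p_cap, e^T x = 1, x >= 0."""
    H = np.asarray(H, dtype=float)
    n_d, n_a = H.shape
    c = np.append(np.zeros(n_d), -1.0)              # minimize -lambda
    rows = [np.hstack([-H.T, np.ones((n_a, 1))])]   # lambda - h_a^T x <= 0
    rhs = [np.zeros(n_a)]
    for coeff, cap in ((q, q_cap), (p, p_cap)):     # optional latency / energy caps
        if cap is not None:
            rows.append(np.append(coeff, 0.0))
            rhs.append([cap])
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  A_eq=[np.append(np.ones(n_d), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * n_d + [(None, None)], method="highs")
    return (res.x[:n_d], res.x[-1]) if res.success else None


# Hypothetical data: survival times, latency q and energy p per routing mechanism.
H = np.array([[4.0, 1.0],
              [2.0, 3.0]])
q = np.array([2.0, 4.0])
p = np.array([1.0, 3.0])

x_star, v = solve_capped(H, q, p)                   # unconstrained game value
q_max, p_max = q @ x_star, p @ x_star

frontier = []                                       # (eps1, eps2, lambda) triples
for eps1 in np.linspace(0.0, 0.2 * q_max, 5):
    for eps2 in np.linspace(0.0, 0.2 * p_max, 5):
        sol = solve_capped(H, q, p, q_max - eps1, p_max - eps2)
        if sol is not None:                         # skip infeasible grid points
            frontier.append((eps1, eps2, sol[1]))
```

Each tuple in `frontier` is one point of the Pareto surface: tightening the latency or energy cap can only keep or lower the guaranteed survival time.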

Incomplete Information of Survival Times
Uncertainty in the $H$ matrix is typically referred to as incomplete information in payoffs. Both probabilistic and non-probabilistic methods that handle uncertainty in payoffs are available in the literature [35,36,37,38,39]. Uncertain payoffs in the form of interval-valued payoffs [40,41,42,43] are related to the proposed WSN game.
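In this work, we assume that uncertainty exists in estimating the parameter $H$. Furthermore, we assume that the uncertainty can be represented by a bounded interval. Without loss of generality, let the interval-based payoff matrix of the defender be expressed as $\Omega = (\omega)_{d,a} = ([\underline{h}_{d,a}, \overline{h}_{d,a}])$, where $(\omega)_{d,a}$ is a closed interval, $\underline{h}_{d,a}$ is the lower bound of the closed interval, and $\overline{h}_{d,a}$ is the upper bound of the closed interval.

Note that the above uncertainty keeps the zero-sum nature of the game intact, since the payoff matrix of the attacker is $-\Omega = (-\omega)_{d,a} = ([-\overline{h}_{d,a}, -\underline{h}_{d,a}])$. Similar to the methodology presented in [41] for handling interval-based uncertainty, we transform Formulation (12a-f) into two formulations: one for obtaining the lower bound (pessimistic game) and another for obtaining the upper bound (optimistic game). The formulations are as follows:

\begin{align}
\max: \quad & \overline{\lambda} \tag{13a} \\
\text{s.t.}: \quad & \overline{h}_a^T \overline{x} \ge \overline{\lambda} \quad \forall a \in A, \tag{13b} \\
& q^T \overline{x} \le \overline{q}_{\max} - \varepsilon_1, \tag{13c} \\
& p^T \overline{x} \le \overline{p}_{\max} - \varepsilon_2, \tag{13d} \\
& e^T \overline{x} = 1, \tag{13e} \\
& \overline{x} \ge 0, \tag{13f}
\end{align}

\begin{align}
\max: \quad & \underline{\lambda} \tag{14a} \\
\text{s.t.}: \quad & \underline{h}_a^T \underline{x} \ge \underline{\lambda} \quad \forall a \in A, \tag{14b} \\
& q^T \underline{x} \le \underline{q}_{\max} - \varepsilon_1, \tag{14c} \\
& p^T \underline{x} \le \underline{p}_{\max} - \varepsilon_2, \tag{14d} \\
& e^T \underline{x} = 1, \tag{14e} \\
& \underline{x} \ge 0, \tag{14f}
\end{align}

where the values of $(\overline{q}_{\max}, \overline{p}_{\max})$ and $(\underline{q}_{\max}, \underline{p}_{\max})$ are obtained using (9) by replacing $h_a^T$ with $\overline{h}_a^T$ and $\underline{h}_a^T$, respectively. Upon solving Formulations (13a-f) and (14a-f), we obtain the upper bound $\overline{\lambda}$ and the lower bound $\underline{\lambda}$, respectively, for a given value of $\varepsilon_1$ and $\varepsilon_2$.

To obtain the Pareto surfaces, a grid search on different values of $\varepsilon_1$ and $\varepsilon_2$ is conducted. The solution of the above two models results in two Pareto surfaces. The network manager and/or the decision maker can pick a desired operational point from the Pareto surfaces.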
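The pessimistic and optimistic games differ only in which bound of the interval payoffs is fed to the LP. The sketch below illustrates this at $\varepsilon_1 = \varepsilon_2 = 0$ (so the latency and energy caps are inactive and omitted) on hypothetical interval bounds with non-constant widths; it uses SciPy's `linprog` (HiGHS) as a stand-in solver, and `game_value` is an illustrative helper name.

```python
import numpy as np
from scipy.optimize import linprog


def game_value(H):
    """Maximin value of the zero-sum game with defender payoff matrix H."""
    H = np.asarray(H, dtype=float)
    n_d, n_a = H.shape
    c = np.append(np.zeros(n_d), -1.0)            # minimize -lambda
    A_ub = np.hstack([-H.T, np.ones((n_a, 1))])   # lambda - h_a^T x <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_a),
                  A_eq=[np.append(np.ones(n_d), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * n_d + [(None, None)], method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1]


# Hypothetical interval payoffs: H_low <= H_up element-wise, widths vary by entry.
H_low = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
H_up = np.array([[5.0, 1.5],
                 [2.5, 4.0]])

lam_low = game_value(H_low)   # pessimistic game (lower bounds)
lam_up = game_value(H_up)     # optimistic game (upper bounds)
```

Because every entry of the upper-bound matrix is at least the corresponding lower-bound entry, the optimistic value cannot fall below the pessimistic one; the pair brackets the survival time the defender can guarantee.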

Numerical Experiments and Discussion
In this section, three case studies are presented to illustrate the usage of the proposed approach. Case Study-1 is a toy example that highlights the notion of mixed strategy, steady-state analysis, and data estimation. In Case Study-2, synthetic data on an arbitrary 6 × 6 game are utilized to depict the applicability of the proposed game. In Case Study-3, a non-square game with overlapping intervals, which is closer to real-world scenarios, is presented.

Case Study-1
As a toy example, let us consider the case where a defender can pick between Phantom Routing (PR) and Multi-path Routing (MR). On the other hand, the adversary can pick between Traffic Analysis (TA) and Tracing Back (TB). From the literature, the payoff matrix illustrated in Table 1 can be constructed. Furthermore, the average values of energy and latency presented in Table 2 can be attributed to the above routing mechanisms. Clearly, the 2 × 2 game presented in Table 1 has no pure strategy Nash equilibrium. In fact, this game is similar to the famous Matching Pennies game [47], which is a cornerstone example for the notion of mixed strategy Nash equilibrium and the concept of steady-state analysis of games. Admittedly, Tables 1 and 2 are not constructed from a comprehensive literature survey, and the studies in the literature may involve other parameters that may result in biased comparisons. Nevertheless, the sole purpose of Case Study-1 is to establish the following points:
• $H$, $p$, and $q$ can be estimated beforehand from either a simulation or an emulation study of the actual networks.
• The mixed strategy Nash equilibrium is strongly applicable to the proposed defender and adversary game.
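Whether a payoff matrix admits a pure strategy Nash equilibrium can be checked directly: such an equilibrium exists exactly when some entry is simultaneously the minimum of its row (the attacker cannot reduce it) and the maximum of its column (the defender cannot increase it). The sketch below uses hypothetical matrices, not the values of Table 1; `pure_saddle_points` is an illustrative helper name.

```python
import numpy as np


def pure_saddle_points(H):
    """Return (row, column) indices of entries that are a row minimum and a column maximum."""
    H = np.asarray(H, dtype=float)
    is_row_min = H == H.min(axis=1, keepdims=True)   # attacker's best reply per row
    is_col_max = H == H.max(axis=0, keepdims=True)   # defender's best reply per column
    return [(int(d), int(a)) for d, a in zip(*np.nonzero(is_row_min & is_col_max))]


# Hypothetical matching-pennies-like game: no pure strategy equilibrium.
H_cyclic = np.array([[4.0, 1.0],
                     [2.0, 3.0]])
# Hypothetical game with a dominant routing mechanism: one pure equilibrium.
H_dominant = np.array([[3.0, 2.0],
                       [1.0, 0.0]])

no_saddle = pure_saddle_points(H_cyclic)
one_saddle = pure_saddle_points(H_dominant)
```

An empty result, as for the first matrix, is the situation in which the defender benefits from randomizing over routing mechanisms.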

Case Study-2
The survival time payoff matrix $H$, containing the interval-based normalized survival times (values in (0, 1]), is given in Table 3, and Table 4 presents the energy $p$ and the latency $q$ values. Suitable units for survival time, energy, and latency can be obtained from the actual network. A discussion on obtaining the above data from an actual WSN is presented at the end of this section. The case study has the following key structure: The defender can change the routing policy from time to time and can pick any of six routing mechanisms, say R1, R2, ..., R6. On the other hand, the adversary can change the attacking mechanism from time to time and can pick any of six attack mechanisms, say A1, A2, ..., A6. The grid for $\varepsilon_1$ is generated by dividing $[0, 0.2 \times q_{\max}]$, $[0, 0.2 \times \overline{q}_{\max}]$, and $[0, 0.2 \times \underline{q}_{\max}]$ into 10 equally spaced intervals for solving Formulations (12a-f)-(14a-f), respectively. Similarly, the grid for $\varepsilon_2$ is generated by dividing $[0, 0.2 \times p_{\max}]$, $[0, 0.2 \times \overline{p}_{\max}]$, and $[0, 0.2 \times \underline{p}_{\max}]$ into 10 equally spaced intervals for solving Formulations (12a-f)-(14a-f), respectively. All the models are solved using the GLPK solver (https://www.gnu.org/software/glpk/, accessed on 1 September 2022). The Pareto surface obtained after solving Formulation (12a-f) is depicted in Figure 1a, and the Pareto surfaces obtained after solving Formulations (13a-f) and (14a-f) are illustrated as the blue and green surfaces in Figure 1b, respectively. Table 5 displays the results of the case study at $\varepsilon_1 = \varepsilon_2 = 0$, which indicate the minimax game values. From Table 5, it can be concluded that if the defender ignores the latency- and energy-related objectives, then playing R1, R2, or R6 equally will likely be the best for LPP. Practically, it means that the network administrator can alternate between R1, R2, and R6 to obtain the longest possible survival time while preserving the location privacy of the sink node. Other operational points can be extracted from Figure 1b.

Case Study-3
The setup of Case Study-3 is similar to that of Case Study-2. The key difference is in the data. The interval widths in Case Study-2 are constant, and the intervals are non-overlapping. In this case study, the interval widths are not constant, and the intervals overlap within a given row or column. The data for Case Study-3 are presented as follows: the survival time payoff matrix $H$ containing the interval-based survival times is given in Table 6, and Table 7 presents the energy $p$ and the latency $q$ values. All the models are solved using the open source GLPK solver. The Pareto surface obtained after solving Formulation (12a-f) is depicted in Figure 2a, and the Pareto surfaces obtained after solving Formulations (13a-f) and (14a-f) are illustrated as the blue and green surfaces in Figure 2b, respectively. Table 8 displays the results of the case study at $\varepsilon_1 = \varepsilon_2 = 0$, which indicate the minimax game values. From Table 8, it can be concluded that if the defender ignores the latency- and energy-related objectives, then, under optimistic circumstances, playing R3, R5, R6, R7, or R8 with the probabilities given in Table 8 is the best for LPP. In practice, this could mean that R7 is deployed 36.9% of the time, R3 is deployed 26.3% of the time, R8 is deployed 14.4% of the time, and so on. Similarly, under pessimistic circumstances, playing R1, R2, R4, R5, R6, or R7 with the probabilities given in Table 8 is the best for LPP.

Table 7. Energy (p) and Latency (q) for each routing mechanism in Case Study-3.

Estimating H, p, q
In order to implement the proposed technique, WSNs and IoT networks should have a flexible network design such that they can alternate among different routing protocols during their operation. The advancements in network softwarization could facilitate the existence of such networks. In [48], the authors surveyed the softwarization of Unmanned Aerial Vehicle (UAV) routing, while [49] presented the existing literature on AI-enabled routing protocols for UAV networks. Other applications of softwarization include, but are not limited to, routing algorithm optimization in software-defined Wide Area Networks (WANs) [50] and the integration of Multi-path TCP (MPTCP) and Segment Routing (SR) paradigms over SDN/NFV [51].
Once such networks are established, H, p, q can be easily obtained either by developing a simulation model or extracting empirical data from emulated networks. The simulation and emulation models should have the ability to estimate the survival times, latency, and energy for a pair of routing and attack mechanisms. Thus, for a given network under consideration, the data H, p, q should be estimated via simulation model a priori by the WSN manager or the decision maker. A parametric estimation of H, p, q can be a future research direction.
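As a rough illustration of how simulation output could be turned into the payoff data, the sketch below summarizes repeated simulation runs per (routing, attack) pair into a mean survival time and an interval. It rests on assumptions that the text does not specify: the runs are stored as a three-dimensional array, the interval bounds are taken as the minimum and maximum over runs, and the random numbers merely stand in for simulator output. `summarize_runs` is an illustrative helper name.

```python
import numpy as np


def summarize_runs(samples):
    """Collapse runs of shape (|D|, |A|, n_runs) into lower, mean and upper matrices."""
    samples = np.asarray(samples, dtype=float)
    return samples.min(axis=2), samples.mean(axis=2), samples.max(axis=2)


# Placeholder for simulator output: 3 routing mechanisms, 2 attacks, 20 runs each.
rng = np.random.default_rng(0)
runs = rng.uniform(10.0, 50.0, size=(3, 2, 20))

H_low, H_mean, H_up = summarize_runs(runs)
```

The mean matrix would feed the deterministic game, while the lower and upper matrices would feed the pessimistic and optimistic games; other interval constructions (for example, percentile-based bounds) are equally plausible.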

Conclusions
In this work, we present an initial step in a novel direction toward designing multi-routing Wireless Sensor Networks. This approach is motivated by the fact that the adversary can choose and/or change attacking mechanisms at any given time during the time horizon. Thus, the WSN manager should have the ability to choose/alternate among different routing protocols for the longevity of the network's LPP. Furthermore, the game theoretic model provides the WSN manager with a guide on the proportion of time to dedicate to each routing protocol under optimistic, average, and pessimistic circumstances. We believe the usage of multiple routing protocols with the suggested time proportions will be very effective for Location Privacy Preservation (LPP). In the future, we plan to investigate the challenges of implementing this approach in real networks and to study the impact of other possible payoffs on the network performance, such as computational complexity, bandwidth utilization, and memory consumption.

Abbreviations
$\lambda(x)$: the worst expected payoff value of the defender for any arbitrary choice of $x$
$x_d$: the probability of using the mechanism $d$ by the defender under average uncertainty conditions
$x$: a vector representing probabilities of using different defense mechanisms under average uncertainty conditions, defined as $x = [x_1, \ldots, x_{|D|}]^T$
$y_a$: the probability of using the mechanism $a$ by the adversary under average uncertainty conditions
$y$: a vector representing probabilities of using different attack mechanisms under average uncertainty conditions, defined as $y = [y_1, \ldots, y_{|A|}]^T$
$\overline{x}$: a vector representing probabilities of using different defense mechanisms under optimistic uncertainty conditions
$\underline{x}$: a vector representing probabilities of using different defense mechanisms under pessimistic uncertainty conditions
$x^*$, $\overline{x}^*$, and $\underline{x}^*$: the optimal value of $x$, $\overline{x}$, and $\underline{x}$, respectively
$y^*$: the optimal value of $y$
$q_d$: average transmission latency that can be obtained during the execution of defense mechanism $d$
$p_d$: average amount of energy that is required for implementing defense mechanism $d$
$q$: a vector representing transmission latency for all defense mechanisms, $q = [q_1, \ldots, q_{|D|}]^T$
$p$: a vector representing energy consumption for all defense mechanisms, $p = [p_1, \ldots, p_{|D|}]^T$
$q_{\max}$: the expected transmission latency at operating point $x^*$, defined as $q_{\max} = q^T x^*$
$p_{\max}$: the expected energy consumption at operating point $x^*$, defined as $p_{\max} = p^T x^*$
$\overline{p}_{\max}$ and $\underline{p}_{\max}$: the expected energy consumption at operating points $\overline{x}^*$ and $\underline{x}^*$, respectively, defined as $\overline{p}_{\max} = p^T \overline{x}^*$ and $\underline{p}_{\max} = p^T \underline{x}^*$
$\overline{q}_{\max}$ and $\underline{q}_{\max}$: the expected transmission latency at operating points $\overline{x}^*$ and $\underline{x}^*$, respectively, defined as $\overline{q}_{\max} = q^T \overline{x}^*$ and $\underline{q}_{\max} = q^T \underline{x}^*$
$\lambda$, $\overline{\lambda}$, and $\underline{\lambda}$: the worst payoff for the defender during average, optimistic, and pessimistic uncertainty conditions, respectively
$e$: a vector containing all ones
$\varepsilon_1$ and $\varepsilon_2$: parameters for the epsilon constraint method