Smart Energy Harvesting for Internet of Things Networks

In this article, we address the problem of prolonging the battery life of Internet of Things (IoT) nodes by introducing a smart energy harvesting framework for IoT networks supported by femtocell access points (FAPs), based on the principles of Contract Theory and Reinforcement Learning. Initially, the IoT nodes' social and physical characteristics are identified and captured through the concept of IoT node types. Then, Contract Theory is adopted to capture the interactions between the FAPs and the IoT nodes: the FAPs provide personalized rewards, i.e., charging power, to the IoT nodes to incentivize them to invest their effort, i.e., transmission power, to report their data to the FAPs. The IoT nodes' and FAPs' contract-theoretic utility functions are formulated following the network economics concept of the involved entities' personalized profit. A contract-theoretic optimization problem is introduced to determine the optimal personalized contract, i.e., a pair of transmission and charging power, between each IoT node and the FAP it is connected to, aiming to jointly guarantee the optimal satisfaction of all the involved entities in the examined IoT system. An artificial intelligence framework based on reinforcement learning is introduced to support the IoT nodes' autonomous association to the most beneficial FAP in terms of long-term gained rewards. Finally, detailed simulation and comparative results are presented to show the pure operation performance of the proposed framework, as well as its drawbacks and benefits compared to other approaches. Our findings show that the personalized contracts offered to the IoT nodes outperform a type-agnostic approach by a factor of four in terms of the achieved IoT system's social welfare.


Introduction
Internet of Things (IoT) has gained great research and industrial interest in the last decade, as it enables the operation and collaboration of a large number of devices with different communication and computing capabilities, such as sensors, actuators, smartphones, and others [1]. These IoT devices collect and report information to several types of applications in order to support the end-users' needs and deliver meaningful services, such as environmental monitoring, social networking, and surveillance systems [2]. The exploitation of the IoT devices' physical and social characteristics can create efficient coalitions among them to better serve a common goal in the system, e.g., crowdsourcing, surveillance of an area of interest, and in-home healthcare [3]. A common characteristic of IoT devices is their frequent transmission of data to a receiver, e.g., an access point or a multi-access edge computing server, for further processing and planning of the delivered services [4]. Even though the amount of transmitted data is usually small, the frequent transmissions reduce the battery life of the IoT devices, which often have limited power resources and whose battery replacement is a difficult and costly task [5]. Thus, energy harvesting from radio frequency signals through a wireless powered communication system has emerged as a suitable means to prolong the IoT devices' battery life [6]. In this paper, we introduce a smart energy harvesting method by exploiting the principles of Contract Theory, and an artificial intelligence model based on Reinforcement Learning to support the IoT nodes' autonomous association to femtocell access points.

Related Work
The topic of energy harvesting by IoT devices has been thoroughly studied in the literature, mainly focusing on the technical and implementation aspects of the problem [7]. The authors in [8] identify the problem of the limited battery of the IoT nodes and provide a short survey of the existing energy harvesting technologies and the corresponding power management techniques for using the harvested energy sparingly. In [9], the authors aim to jointly optimize the data flow from the IoT nodes to the users and the IoT nodes' battery usage by deploying IoT gateways and energy transmitters to save the energy used for the transmissions and to charge the IoT nodes in parallel, respectively. Furthermore, a game-theoretic approach based on the theory of Stackelberg games is adopted, where the IoT gateways optimize the data caching and incentivize the energy transmitters to charge the IoT nodes by determining their optimal transmission power strategy. A detailed survey is presented in [10] that identifies the currently available IoT energy harvesting systems, the corresponding energy distribution approaches, and the energy storage devices and control units that facilitate the IoT nodes' energy harvesting process. The provided categorization of the energy harvesting systems enables the reader to identify the differences among the existing energy harvesting techniques and the corresponding energy distribution approaches, and concludes with the most appropriate selection per realistic use case scenario. A predictive energy harvesting model is introduced in [11] by exploiting the extended Kalman filtering method while jointly guaranteeing the Quality of Service (QoS) requirements and several security protection levels in the IoT system. The proposed predictive energy harvesting model can enable the IoT system to plan its energy harvesting needs per connected IoT node and proactively adapt its operation and energy consumption based on the trade-off between energy demand and energy availability.
The exploitation of multiple energy harvesting sources and techniques, such as solar, radio frequency, thermal, and artificial light, is studied in [12] by introducing a hybrid energy harvesting model for the IoT nodes that can jointly support energy harvesting from several sources. The authors provide a mathematical analysis to prove the benefits of multiple and hybrid energy harvesting sources, in terms of the amount of harvested energy and the efficiency of the energy harvesting process, compared to a single energy harvesting source. Focusing on wireless powered communication systems and radio frequency energy harvesting, the authors in [13] describe a game-theoretic and a labor-economics-based approach to deal with optimal energy harvesting under complete and incomplete information scenarios, respectively, regarding the channel conditions between the IoT nodes and the energy transmitters. A Lyapunov optimization-based approach is formulated in [14] to jointly optimize the frequency and the stability of the sampling rate of the IoT energy harvesting nodes, showing an increase in the amount of harvested energy. The main novelty of the proposed method is its real-time operation and adaptation to the IoT system's conditions and to the energy availability of the nodes and the access points, without making any assumptions or predictions about future energy availability patterns. Furthermore, an on-demand energy harvesting model is proposed in [15] to improve the delay performance of the radio frequency energy harvesting process by introducing two associated discrete-time Markov chain models that jointly optimize the average packet delay, the packet loss probability, and the network throughput. The novel concept of charging each IoT node in a unicast manner via directed radio frequency signals is also introduced in [15]. The proposed method can charge each IoT node in a personalized manner by transmitting directed radio frequency beams to the node, thus increasing the amount of energy harvested by the IoT node.
A deep reinforcement learning approach based on actor-critic deep Q-network algorithms [16] is presented in [17] to jointly address the access, power transmission, and energy harvesting problem of the IoT nodes, considering the sum rate and the prediction loss. The importance of IoT energy harvesting nodes in public safety scenarios is discussed in [18], where the IoT nodes form coalitions based on their physical and socio-technical characteristics, which are further exploited by a mobile Unmanned Aerial Vehicle (UAV) in order to select the IoT node cluster that will be charged. This research work is extended in [19] by jointly optimizing the nodes' transmission power for reporting their data to an access point. Additionally, an energy-harvesting-aware routing algorithm is presented in [20] to jointly improve the IoT nodes' battery life and the IoT network's Quality of Service under different traffic loads and energy availability conditions. A practical application of IoT energy harvesting nodes is introduced in [21], where the IoT nodes measure the vibration conditions of railway tracks and report them to a reader in order to monitor the railway track conditions. The IoT nodes are installed on the railway tracks and harvest radio frequency energy from a reader installed on the train.
Following the above analysis, we conclude that great attention has been devoted to the technical and implementation aspects of energy harvesting in IoT systems. Specifically, the most recent approaches reviewed above mainly focus on increasing the amount of harvested energy, either by providing directed radio frequency beams from the transmitter to the receiver, by improving the efficiency of the charging power allocated to the IoT nodes, or by optimizing the energy consumption of the IoT nodes, thus achieving greater energy availability. However, the reviewed approaches have not fully exploited the IoT nodes' physical and social characteristics during the energy harvesting process, or their interactions with the energy transmitters [22], in order to ultimately optimize the energy harvesting process.
To address these issues, in this paper, we design a contract-theoretic approach to capture the interactions between the IoT energy harvesting nodes and the energy transmitters [23,24]. Our goal is to determine the optimal energy harvested by the IoT nodes with respect to the amount of data that they transmit, and the energy transmitters' optimal charging power. We also introduce an artificial-intelligence-based mechanism that enables the IoT devices to select the most beneficial energy transmitter based on their energy harvesting experience [25].

Contributions & Outline
The increasing number of Internet of Things (IoT) nodes and their corresponding need to extend their battery life in order to support IoT services have elevated the need to address the problem of energy harvesting from radio frequency signals in wireless powered communication systems. The ultimate goal of this approach is to guarantee the smooth operation of the overall IoT system and to prolong its seamless operation. To the best of our knowledge, this is the first research work that systematically studies the energy harvesting process in an IoT system from a techno-economic and artificial intelligence point of view. We introduce the concept of IoT energy harvesting node types, which are expressed as a function of the nodes' communication interest, their proximity to the energy transmitter and to each other, and their energy conversion efficiency. The IoT nodes' and the access points' utility functions are designed to represent the profit of the different entities from the energy harvesting and data acquisition processes, respectively. The main contributions of this paper are summarized as follows:
1. Based on the principles of Contract Theory, an optimization problem is formulated and solved to determine the IoT nodes' transmission power and transmitted data to the associated access point, as well as the energy transmitters' optimal charging power, in order for the overall system to converge to an optimal and stable point of operation;
2. An artificial-intelligence-based reinforcement learning mechanism is introduced, which targets the most beneficial long-term energy transmitter selection by each IoT energy harvesting node in an autonomous and distributed manner.
The rest of this paper is organized as follows. The system model is discussed in Section 2. The IoT node types and all the involved entities' utility functions are presented in Section 3.1. The contract-theoretic optimization problem is formulated in Section 3.2 and solved in Section 3.3. The artificial-intelligence-based selection of energy transmitters by the IoT nodes is discussed in Section 4. Numerical results are presented in Section 5, and the conclusions are drawn in Section 6.

System Model
We consider a femtocell-based communications network consisting of $|\mathcal{F}|$ femtocells with overlapping coverage in the examined communications environment, whose set is $\mathcal{F} = \{1, \dots, f, \dots, |\mathcal{F}|\}$. The femtocell access points (FAPs) act jointly as data receivers from the connected IoT nodes and as energy transmitters [26].
The IoT nodes can communicate with each other in order to exchange the information needed to perform a task, e.g., temperature sensors measuring the temperature in a smart building [27]. We define the relationship factor $r_{i,i'} \in [0, 1]$ between two IoT nodes $i$ and $i'$. A higher value of the relationship factor indicates a higher level of communication interest between the two IoT nodes. The communication channel gains between two IoT nodes and between an IoT node and a FAP are denoted as $G_{i,i'}$ and $G_{i,f}$, respectively, where $\lambda, \mu > 0$ capture the fading phenomena. At each timeslot $\tau$, each IoT node has some available energy $E^{(\tau)}_{av,i}$ [J], which determines its maximum possible transmission power $P^{Max}_i$ during the wireless information transmission (WIT) phase. Each IoT node harvests $E^{(\tau)}_{harv,i}$ [J] of energy during the wireless energy transfer (WET) phase and invests $E^{(\tau)}_{tr,i}$ [J] of energy to transmit its data to the FAP during the WIT phase. Thus, the available energy of each IoT node for the next timeslot $\tau + 1$ is determined as $E^{(\tau+1)}_{av,i} = E^{(\tau)}_{av,i} + E^{(\tau)}_{harv,i} - E^{(\tau)}_{tr,i}$. The transmission power of IoT node $i$ for reporting its data to FAP $f$ is denoted as $P_{i,f}$ [W], while the FAP's personalized charging power for IoT node $i$ is $P_{f,i}$ [W]. The FAP uses directional beams to improve the efficiency of energy harvesting [15]. Considering the non-orthogonal multiple access (NOMA) technique in the uplink communication from the IoT nodes to the FAPs, and the Successive Interference Cancellation (SIC) technique implemented at the FAPs, each IoT node's achievable data rate is given by Shannon's formula [28] in Equation (1), where $W$ [Hz] is the system's bandwidth and $\sigma^2$ is the power of the zero-mean Additive White Gaussian Noise (AWGN). Without loss of generality, we consider the IoT nodes indexed in decreasing order of their channel gains to the FAP; thus, by implementing the SIC technique, the signal of the IoT node with the highest channel gain is decoded first at the corresponding FAP, as presented in Equation (1). Given that the IoT devices reside in a small area, we account for the interference stemming from all the IoT nodes' transmissions, even if they are connected to different FAPs [29]. The acronyms and the notation adopted in this paper are presented in Tables 1 and 2, respectively.

Symbol | Description
$r_{i,i'}$ | Communication interest factor
$t_i, q_i, r_i, U_i$ | IoT node's type, effort, reward, utility function
$k$ | IoT node's data transmission cost
$e(r_i(t_i))$ | Evaluation function
$w$ | FAP's cost to provide the rewards
$U_f$ | FAP's utility function
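The two system-model relations above can be illustrated with a short Python sketch. It assumes the battery update $E^{(\tau+1)}_{av,i} = E^{(\tau)}_{av,i} + E^{(\tau)}_{harv,i} - E^{(\tau)}_{tr,i}$ and a generic single-receiver uplink NOMA rate with SIC. In that rate model, signals are decoded in decreasing order of channel gain, and each node sees as interference only the signals that have not yet been cancelled, plus noise. This is not a reproduction of Equation (1): how interference from nodes attached to other FAPs enters is simplified here, and the function names are hypothetical.

```python
import math


def next_available_energy(e_av, e_harv, e_tr):
    """Battery bookkeeping for one timeslot: E_av(t+1) = E_av(t) + E_harv(t) - E_tr(t)."""
    return e_av + e_harv - e_tr


def noma_sic_rates(powers, gains, bandwidth_hz, noise_power):
    """Hypothetical uplink NOMA rates [bit/s] at a single receiver with SIC.

    Nodes are decoded in decreasing order of channel gain; a node decoded
    earlier treats every not-yet-decoded (weaker) signal as interference.
    """
    order = sorted(range(len(powers)), key=lambda i: gains[i], reverse=True)
    rates = [0.0] * len(powers)
    for pos, i in enumerate(order):
        # Residual interference from signals that have not been cancelled yet.
        interference = sum(powers[j] * gains[j] for j in order[pos + 1:])
        sinr = powers[i] * gains[i] / (interference + noise_power)
        rates[i] = bandwidth_hz * math.log2(1.0 + sinr)
    return rates
```

For example, `noma_sic_rates([1.0, 1.0], [4.0, 1.0], 1.0, 1.0)` gives the strong node a rate of $\log_2 3$ (it still sees the weak node's signal) and the weak node a rate of 1, since its signal is decoded last, after all other signals have been cancelled.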

Contract Theoretic Energy Harvesting
In this section, we exploit the principles of Contract Theory to capture the interactions between the IoT energy harvesting nodes and the FAPs, in terms of the nodes transmitting data and harvesting energy and the FAPs charging the nodes. Assuming that each IoT node has selected the FAP that it will communicate with and harvest energy from (details in Section 4), each FAP acts as a virtual "employer", offering personalized rewards, in terms of charging power, to each connected IoT node. These rewards incentivize the nodes, which act as virtual "employees", to invest an effort, translated into their transmission power, to report their collected data to the FAP for further exploitation by the IoT service that is offered to the end-users, e.g., smart heating systems.

Types, Utility Functions, and Contracts
Each IoT node is characterized by its type, which depends on the node's physical and social characteristics within the IoT network. Those characteristics are summarized in the socio-physical factor $SP_i$, the proximity factor $\rho_{i,f}$, and the energy conversion efficiency factor $\eta_i$. To build the socio-physical factor $SP_i$ for each node $i$, we initially consider the symmetric channel gain matrix $\mathbf{G} = \{G_{i,i'}\}_{|\mathcal{I}| \times |\mathcal{I}|}$, $\forall i, i' \in \mathcal{I}$, and create the channel quality vector. The latter is a simple and indicative factor of the communication channel conditions of each node $i$ with all the other IoT nodes within the examined IoT network. We then normalize the channel quality vector. Additionally, each node $i$ associated with FAP $f$ is characterized by the proximity factor $\rho_{i,f} \in [0, 1]$, which expresses node $i$'s distance from FAP $f$, normalized with respect to the FAP's maximum coverage range. Each node is also characterized by its energy conversion efficiency factor $\eta_i \in [0, 1]$, which shows how efficiently the node can convert the energy harvested from the FAP's directed radio frequency beam into energy that can be exploited for its operations, e.g., data transmission. The type $t_i$ of each IoT node is defined as a function of these three factors. Each node invests an effort $q_i \in [0, 1]$ in order to transmit its data to the FAP, which is translated into its uplink transmission power $P_{i,f} = q_i \cdot P^{Max}_i$. For simplicity in the notation, we omit the timeslot indicator $\tau$ in the rest of the analysis. Furthermore, the FAP incentivizes each IoT node connected to it to report its data by charging it with directed radio frequency beams. The FAP's personalized reward to node $i$ is denoted as $r_i \in [0, 1]$, and the corresponding power of the directed radio frequency beam is $P_{f,i} = r_i \cdot P_f$, where $P_f$ [W] is FAP $f$'s available charging power. Thus, the IoT node's harvested energy in a timeslot $\tau$ during the WET phase, as discussed in Section 2, is $E^{(\tau)}_{harv,i}$, while the corresponding energy invested in its data transmission during the WIT phase is $E^{(\tau)}_{tr,i}$.
Each IoT node evaluates the reward $r_i$ received from the FAP based on the evaluation function $e(r_i(t_i))$, which is strictly increasing with respect to the received reward, e.g., $e(r_i(t_i)) = r_i(t_i)$. In practice, the evaluation function captures the node's required charging power. Therefore, each IoT node's utility function consists of the revenue that the IoT node enjoys from the charging process (first term of Equation (3)) and the cost of its data transmission due to its invested transmission power (second term of Equation (3)), where $k \in \mathbb{R}^+$ is the IoT node's cost of transmitting its data by investing its transmission power. Focusing on the benefit that each FAP gains from collecting data from the IoT nodes, we express its utility as the profit gained from the IoT nodes' invested effort, while considering the cost of providing the rewards. Each FAP is not aware of the IoT nodes' types; thus, we define the probability $\Pr_i(t_i)$ of an IoT node being of type $t_i$, and express the FAP's utility $U_f$ in terms of $\mathbf{t} = [t_1, \dots, t_{|\mathcal{I}|}]$, $\mathbf{r} = [r_1, \dots, r_{|\mathcal{I}|}]$, and $\mathbf{q} = [q_1, \dots, q_{|\mathcal{I}|}]$, which are the IoT node type, reward, and effort vectors, respectively, and of $w \in \mathbb{R}^+$, which is the FAP's cost of providing the rewards due to the energy spent on charging the nodes.
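The mapping from contract items to physical powers, together with a node's utility, can be sketched as follows. The sketch assumes the linear evaluation function $e(r_i) = r_i$ given as an example above. It also writes the node utility explicitly as $U_i = t_i e(r_i) - k q_i$, that is, type-weighted charging revenue minus transmission cost; this explicit form of Equation (3) is an assumption consistent with the two terms described. The function names are hypothetical.

```python
def uplink_power(q_i, p_max_i):
    """Transmission power P_{i,f} = q_i * P_i^Max for an effort q_i in [0, 1]."""
    if not 0.0 <= q_i <= 1.0:
        raise ValueError("effort q_i must lie in [0, 1]")
    return q_i * p_max_i


def charging_power(r_i, p_f):
    """Directed-beam charging power P_{f,i} = r_i * P_f for a reward r_i in [0, 1]."""
    if not 0.0 <= r_i <= 1.0:
        raise ValueError("reward r_i must lie in [0, 1]")
    return r_i * p_f


def node_utility(t_i, r_i, q_i, k, evaluate=lambda r: r):
    """Assumed utility U_i = t_i * e(r_i) - k * q_i, with e(r) = r by default."""
    return t_i * evaluate(r_i) - k * q_i
```

Passing a different strictly increasing `evaluate` (e.g., a concave function) changes only the revenue term, which keeps the sketch aligned with the general $e(\cdot)$ used in the text.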

Problem Formulation
In this section, we will formulate the problem of optimal energy harvesting and charging as a contract-theoretic optimization problem, as follows.
The solution to the optimization problem (5a)-(5d) is the optimal contract $\{r^*_i, q^*_i\}$ for each IoT node $i \in \mathcal{I}$.
In the following, we discuss the physical meaning of the optimization problem formulated above in detail. To determine the optimal power harvested by the IoT nodes and the optimal charging power provided by each FAP to each connected IoT node, the profits of the FAPs and the IoT nodes should be jointly optimized, as presented in (5a)-(5d). Each FAP aims to maximize its utility function (5a) by determining the optimal contracts $\{r^*_i, q^*_i\}$. It should be noted that the optimization problem (5a)-(5d) is solved by each FAP together with the IoT nodes connected to it. Thus, we solve as many optimization problems as the number of FAPs in the examined system, while requiring that each IoT node receive at least a non-negative utility (Equation (5b)) in order to be incentivized to participate in the IoT network. This condition (Equation (5b)) is referred to as Individual Rationality (IR). Furthermore, each node achieves at least as high a utility when receiving the contract designed for its unique characteristics, i.e., type, as when receiving any contract designed for another node (Equation (5c)). This condition is referred to as Incentive Compatibility (IC).
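For readability, the IR and IC conditions described above can be written out explicitly. This requires assuming the node utility $U_i = t_i e(r_i) - k q_i$, an assumed explicit form of Equation (3) that matches its two described terms. The block below is a sketch under that assumption, not a reproduction of (5b) and (5c).

```latex
% Assumed utility of a type-t_i node accepting contract (r_j, q_j):
%   U_i(r_j, q_j) = t_i e(r_j) - k q_j
\begin{aligned}
\text{IR:}\quad & t_i\, e(r_i) - k\, q_i \;\ge\; 0, && \forall i \in \mathcal{I},\\
\text{IC:}\quad & t_i\, e(r_i) - k\, q_i \;\ge\; t_i\, e(r_{i'}) - k\, q_{i'}, && \forall i, i' \in \mathcal{I},\; i \neq i'.
\end{aligned}
```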
Additionally, for notational convenience, we sort the types of the IoT nodes as $t_1 < t_2 < \dots < t_{|\mathcal{I}|}$. To further elaborate on the constraint of Equation (5d), we analyze and prove the conditions of fairness, monotonicity, and rationality in the following three propositions.
Proposition 1. (Fairness) An IoT node of higher (or the same) type will receive a higher (or the same) reward, i.e., $r_i > r_{i'}$ if $t_i > t_{i'}$, and $r_i = r_{i'}$ if $t_i = t_{i'}$.
Based on the fairness condition, an IoT node of a higher type, i.e., with improved socio-physical characteristics, will enjoy a higher reward from the FAP, i.e., increased charging power.
Proposition 2. (Monotonicity) An IoT node of higher type, i.e., $t_i > t_{i'}$, invests a higher effort, i.e., $q_i > q_{i'}$.
The physical meaning of the monotonicity property is that an IoT node with better socio-physical characteristics, i.e., type $t_i$, is expected to report a greater amount of information by investing more uplink transmission power, i.e., effort $q_i$. Thus, the FAP will provide a greater reward $r_i$ through increased charging power. The last condition that is examined is rationality.
Proposition 3. (Rationality) An IoT node of higher type, i.e., $t_i > t_{i'}$, achieves a higher utility, i.e., $U_i > U_{i'}$.
The conditions of fairness, monotonicity, and rationality are presented in a combined manner in Equation (5d).
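As a sanity check for a candidate contract menu, the three propositions can be tested numerically. The sketch below checks their weak (non-decreasing) versions across consecutive types. It assumes types sorted in strictly increasing order and the assumed utility $U_i = t_i r_i - k q_i$, i.e., $e(r) = r$. The function name and tolerance are hypothetical.

```python
def satisfies_propositions(types, rewards, efforts, k, tol=1e-12):
    """Return (fairness, monotonicity, rationality) for a sorted contract menu.

    Assumes t_1 < ... < t_|I| and U_i = t_i * r_i - k * q_i; each property is
    checked in its weak form between consecutive types.
    """
    n = len(types)
    if any(types[j] >= types[j + 1] for j in range(n - 1)):
        raise ValueError("types must be sorted in strictly increasing order")
    utilities = [t * r - k * q for t, r, q in zip(types, rewards, efforts)]
    fairness = all(rewards[j] <= rewards[j + 1] + tol for j in range(n - 1))
    monotonicity = all(efforts[j] <= efforts[j + 1] + tol for j in range(n - 1))
    rationality = all(utilities[j] <= utilities[j + 1] + tol for j in range(n - 1))
    return fairness, monotonicity, rationality
```

For instance, a menu in which a lower type receives a larger reward than the next type fails the fairness check, and usually the rationality check as well.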

Problem Solution
In this section, our goal is to solve the contract-theoretic optimization problem presented in Equations (5a)-(5d) under the scenarios of complete and incomplete information from the FAPs' perspective regarding the IoT nodes' socio-physical characteristics, i.e., types. The solution of the contract-theoretic optimization problems, which are solved by each FAP along with its connected IoT nodes, determines the optimal contracts $\{r^*_i, q^*_i\}$, $\forall i \in \mathcal{I}$. Based on this solution, the optimal charging power $P_{f,i}$ of each FAP to each connected node is determined, as well as the optimal transmission power $P_{i,f}$ of each IoT node.

Complete Information Scenario:
In this scenario, the FAPs know the types of the IoT nodes deterministically; thus, the contract-theoretic optimization problem (5a)-(5d) can be rewritten as follows.
Theorem 1. (Optimal Contract under Complete Information) The optimal contract $\{r^*_i, q^*_i\}$ between an IoT node $i$ and the FAP $f$ it is connected to, considering complete information of the IoT nodes'
The complete information scenario is an ideal case and will mainly be used for benchmarking purposes. In practice, the FAPs have limited information regarding the IoT nodes' socio-physical characteristics, i.e., types. Thus, in the following analysis, we examine the scenario of incomplete information regarding the IoT nodes' types.
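Section 5.1 reports that, under complete information, every node ends with $U_i = 0$. This means the FAP can set each reward so that the node's IR constraint holds with equality. A minimal sketch of that binding reward for a given effort is below. It assumes $e(r_i) = r_i$ and the utility $U_i = t_i e(r_i) - k q_i$, so $r_i = k q_i / t_i$. It is not Theorem 1: how the FAP chooses $q^*_i$ depends on its utility $U_f$, which this sketch does not model, and the function name is hypothetical.

```python
def complete_info_reward(t_i, q_i, k):
    """Smallest reward making the assumed U_i = t_i * r_i - k * q_i equal to zero.

    Assumes e(r) = r; raises if the required reward exceeds the cap r_i <= 1.
    """
    r_i = k * q_i / t_i
    if r_i > 1.0:
        raise ValueError("effort q_i cannot be compensated within r_i <= 1")
    return r_i
```

For example, a node of type 0.5 investing effort 0.2 at unit cost $k = 1$ needs a reward of 0.4, after which its utility is exactly zero.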

Incomplete Information Scenario:
In the following analysis, we examine the contract-theoretic optimization problem presented in (5a)-(5d) under the incomplete information scenario. Initially, we reduce the individual rationality conditions in Equation (5b). Based on the monotonicity and incentive compatibility conditions, we have $t_i e(r_i) - k q_i \geq t_i e(r_1) - k q_1$. Given that $t_i > t_1$, we can extend the above inequality as follows: $t_i e(r_i) - k q_i \geq t_i e(r_1) - k q_1 \geq t_1 e(r_1) - k q_1 \geq 0$. Thus, we conclude that the individual rationality condition holds for all the IoT nodes if $t_1 e(r_1) - k q_1 \geq 0$ holds. The latter constraint can be further reduced to $t_1 e(r_1) - k q_1 = 0$, as the FAP will provide the minimum reward sufficient for the IoT nodes to participate in the IoT network. Thus, the constraint (5b) is equivalent to $t_1 e(r_1) - k q_1 = 0$.
Next, our goal is to reduce the incentive compatibility (IC) constraints presented in Equation (5c). The following terminology is used to represent the IC constraints between a node $i$ and a node $i'$: (i) $(i, i')$, $i' \in \{1, \dots, i-1\}$: downward IC constraints; (ii) $(i, i-1)$, $i \in \mathcal{I}$: local downward IC constraints; (iii) $(i, i')$, $i' \in \{i+1, \dots, |\mathcal{I}|\}$: upward IC constraints; and (iv) $(i, i+1)$, $i \in \mathcal{I}$: local upward IC constraints. Based on the above analysis of reducing the constraints, we can rewrite the initial contract-theoretic optimization problem as (7a)-(7d). We observe that the optimization problem (7a)-(7d) is convex. Therefore, to determine the optimal contracts $\{r^*_i, q^*_i\}$, $\forall i \in \mathcal{I}$, we can use standard convex optimization techniques [30].
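A common way to resolve such reductions in contract theory is to make the IR constraint of the lowest type and all local downward IC constraints bind. Assuming $e(r) = r$ and $U_i = t_i r_i - k q_i$, this gives $r_1 = k q_1 / t_1$ and $r_i = r_{i-1} + k (q_i - q_{i-1}) / t_i$. The sketch below builds rewards this way for a given non-decreasing effort profile and then checks every IR and IC constraint exhaustively. For sorted types and non-decreasing efforts, this construction satisfies all upward and downward IC constraints. The sketch does not solve (7a)-(7d): the efforts are inputs, the FAP utility is not modeled, and the cap $r_i \leq 1$ is not enforced. The function names are hypothetical.

```python
def build_contract(types, efforts, k):
    """Rewards from binding IR (lowest type) and binding local downward IC.

    Assumes e(r) = r, U_i = t_i * r_i - k * q_i, types sorted ascending and
    efforts non-decreasing. Binding IR: t_1 r_1 = k q_1.
    Binding LDIC: t_i (r_i - r_{i-1}) = k (q_i - q_{i-1}).
    """
    if any(efforts[j] > efforts[j + 1] for j in range(len(efforts) - 1)):
        raise ValueError("efforts must be non-decreasing")
    rewards = [k * efforts[0] / types[0]]
    for i in range(1, len(types)):
        rewards.append(rewards[-1] + k * (efforts[i] - efforts[i - 1]) / types[i])
    return rewards


def check_ir_ic(types, rewards, efforts, k, tol=1e-12):
    """Exhaustively verify IR and every (upward and downward) IC constraint."""
    def u(i, j):  # utility of a type-i node that accepts contract j
        return types[i] * rewards[j] - k * efforts[j]

    n = len(types)
    ir = all(u(i, i) >= -tol for i in range(n))
    ic = all(u(i, i) >= u(i, j) - tol for i in range(n) for j in range(n))
    return ir, ic
```

With types (0.2, 0.5, 0.9), efforts (0.1, 0.2, 0.4), and $k = 1$, the construction yields rewards of about (0.5, 0.7, 0.92) and passes both checks. Giving every node the same reward of 0.5 instead keeps IR but breaks IC, because the highest type would prefer the cheapest contract.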

Artificial Intelligent Association
In this section, we introduce an artificial-intelligence-based reinforcement learning mechanism that enables the IoT nodes to make the most beneficial long-term energy transmitter (i.e., FAP) selection in an autonomous and distributed manner. Our study focuses on the Log-Linear reinforcement learning algorithms, such as the Max Log-Linear and the Binary Log-Linear algorithms, which are able to converge with high probability to the best equilibrium point of the system (if one exists). Additionally, the Log-Linear algorithms allow the IoT nodes to deviate from their probabilistically optimal decisions and make some suboptimal decisions in order to thoroughly explore their available actions. In this paper, we adopt the Max Log-Linear mechanism, which requires no exchange of information among the IoT nodes and the FAPs. Each IoT node aims to learn, in the long term, the most beneficial choice of FAP; thus, its strategy space is $S_i = \{s_1, s_2, \dots, s_f, \dots, s_{|\mathcal{F}|}\}$. Initially ($ite = 0$, where $ite$ denotes the iteration of the Max Log-Linear algorithm), each IoT node selects a strategy $s_i \in S_i$ with equal probability $1/|\mathcal{F}|$. Then, at each iteration, one IoT node is randomly selected to explore an alternative strategy $s'_i \in S_i$. The selected IoT node updates its strategy following the probabilistic learning rules in Equations (8a) and (8b), while the rest of the IoT nodes keep their previously selected strategies unchanged, i.e., the learning phase.
The pseudo-code of the introduced Max Log-Linear algorithm, which enables each IoT node to select the FAP from which it harvests energy and with which it communicates, is presented in Algorithm 1. The outcome of the Max Log-Linear algorithm is a stable selection of FAPs by the IoT nodes.
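The learning loop described above can be sketched in Python. Equations (8a) and (8b) are not reproduced here. Instead, the sketch uses the form of the max-log-linear rule commonly stated in the learning-in-games literature, which is an assumption. In that rule, the selected node switches to the trial strategy with probability $e^{\beta U(s')} / \max\{e^{\beta U(s)}, e^{\beta U(s')}\}$ and otherwise keeps its current FAP. The payoff callback, its load-sharing form, and the parameter $\beta$ are hypothetical illustrations, not the contract-theoretic utilities of Section 3.

```python
import math
import random


def max_log_linear(num_nodes, num_faps, payoff, beta=5.0, iterations=2000, seed=0):
    """Sketch of max-log-linear learning for FAP selection (assumed update rule).

    payoff(i, f, assignment) is a hypothetical callback returning node i's
    utility for joining FAP f while every other node keeps its FAP in
    `assignment`; beta (inverse temperature) is illustrative.
    """
    rng = random.Random(seed)
    # Iteration 0: every node picks each FAP with probability 1/|F|.
    assignment = [rng.randrange(num_faps) for _ in range(num_nodes)]
    for _ in range(iterations):
        i = rng.randrange(num_nodes)      # only one node explores per iteration
        current = assignment[i]
        trial = rng.randrange(num_faps)   # alternative strategy s'_i
        u_cur = payoff(i, current, assignment)
        u_try = payoff(i, trial, assignment)
        # Switch with probability exp(beta*u_try) / max(exp(beta*u_cur), exp(beta*u_try)),
        # evaluated as a difference of exponents to avoid overflow.
        p_switch = math.exp(beta * (u_try - max(u_cur, u_try)))
        if rng.random() < p_switch:
            assignment[i] = trial
    return assignment


def make_payoff(quality):
    """Hypothetical payoff: a node's link quality shared with co-located nodes."""
    def payoff(i, f, assignment):
        others = sum(1 for j, g in enumerate(assignment) if j != i and g == f)
        return quality[i][f] / (1 + others)
    return payoff
```

With two nodes, two identical FAPs, and a large $\beta$ (e.g., `max_log_linear(2, 2, make_payoff([[1.0, 1.0], [1.0, 1.0]]), beta=50.0)`), the nodes settle on different FAPs. Once they do, a move back to a shared FAP is accepted only with probability $e^{-25}$. Because a strictly better trial is always accepted while a worse one is accepted only with exponentially small probability, exploration is preserved without any information exchange beyond the node's own payoff.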

Numerical Results
In this section, a detailed numerical evaluation analysis is presented based on simulations in order to show the effectiveness and performance of the proposed smart energy harvesting framework for Internet of Things networks.First, in Section 5.1, we focus on validating the operation of the proposed contract-theoretic energy-harvesting mechanism, in terms of determining the optimal contracts under the scenarios of complete and incomplete information regarding the IoT nodes' socio-physical characteristics.The benefits of adopting Contract Theory and exploiting the IoT nodes' characteristics are presented in Section 5.2.Having verified and analyzed the pure operation of the proposed framework, a detailed comparative evaluation is presented in Section 5.3 to show the superior performance of the overall system by enabling the IoT nodes with artificial intelligence, against other approaches that have been used in the literature.
Throughout our evaluation, we consider $E_{av,i} \in [0, 20]$ mJ, representing a typical IoT system consisting of IoT nodes such as temperature sensors [31]. The proposed framework was evaluated on an ACER laptop with an Intel Core i7 3.9 GHz processor and 16 GB of available RAM. In the following results, unless otherwise explicitly stated, the above values of the simulation parameters are used.

Pure Operation Performance
In this section, we present the pure operation performance of the proposed contract-theoretic energy harvesting model by examining the scenarios of complete and incomplete information about the IoT nodes' characteristics from the FAPs' perspective. The results presented below are derived from one indicative timeslot in which the overall framework was executed, i.e., the IoT nodes' association to FAPs and the determination of the IoT nodes' transmission power $P_{i,f}$ (effort) and the FAPs' charging power $P_{f,i}$ (reward) based on the introduced contract-theoretic model. Figure 2a-c present the IoT nodes' effort $q_i$, the FAPs' reward $r_i$, and the IoT nodes' achieved utility $U_i$ as a function of the IoT nodes' types $t_i$, considering the scenarios of complete and incomplete information. It is noted that the IoT nodes' types $t_i$ are sorted for presentation purposes, i.e., $t_1 < t_2 < \dots < t_{|\mathcal{I}|}$.
The results reveal that the IoT nodes of higher type, i.e., better socio-physical conditions, invest more effort (Figure 2a) by transmitting with higher power to report more data to the FAPs that they are associated with. Thus, following the fairness (Proposition 1) and monotonicity (Proposition 2) conditions, the IoT nodes of higher type enjoy a higher reward (Figure 2b) from the FAPs, i.e., higher charging power. Therefore, based on the rationality (Proposition 3) condition, the IoT nodes of higher type achieve a higher utility, as shown in Figure 2c. Furthermore, it should be noted that the FAPs provide the minimum possible rewards to the IoT nodes under the complete information scenario, given that they know their socio-physical characteristics; thus, $U_i = 0, \forall i \in \mathcal{I}$ (Figure 2c). Additionally, Figure 3a,b illustrate the FAPs' cumulative utility and the overall IoT system's social welfare, respectively. The results show that the overall IoT system performs better under the complete information scenario. Specifically, the social welfare of the overall IoT system is reduced, on average, by 67% under the incomplete information scenario, which is the realistic situation in an IoT system. This observation shows that, despite this reduction, the proposed smart energy harvesting framework operates in an acceptable manner under the realistic condition of incomplete information regarding the IoT nodes' socio-physical characteristics.

Benefits of Socio-Physical Approach
In this section, a detailed comparative analysis is presented in order to highlight the benefits of introducing a contract-theoretic approach to perform the smart energy harvesting and considering the unique socio-physical characteristics of each IoT node.
The realistic incomplete information scenario is considered and compared to a type-agnostic scenario, where each FAP offers rewards proportional to the IoT nodes' invested effort, i.e., $r_i(q_i) = \frac{\sum_{i=1}^{|\mathcal{I}|} t_i}{|\mathcal{I}|} q_i$. Figure 4a,b present the IoT nodes' received rewards and their corresponding achieved utility, respectively, as a function of the IoT nodes' IDs. Figure 5a,b depict the FAPs' cumulative utility and the overall IoT system's social welfare, respectively, as a function of the number of IoT nodes in the examined system. The results reveal that, unlike the type-agnostic model, the proposed contract-theoretic smart energy harvesting model exploits the nodes' socio-physical characteristics in a personalized manner. Thus, the IoT nodes receive rewards tailored to their type (Figure 4a), and the IoT nodes that invest a higher effort, given their higher type, receive higher rewards. The achieved benefits are also reflected in the IoT nodes' achieved utility (Figure 4b), which respects the individual rationality condition under the proposed contract-theoretic model. Thus, in contrast to the type-agnostic scenario, the IoT nodes always achieve a non-negative utility for their invested effort. The FAPs' cumulative utility is similar in both cases (Figure 5a), given that, in the type-agnostic scenario, the FAPs gain from under-rewarding some IoT devices while spending a great amount of charging power by over-rewarding others. By studying the overall IoT system (Figure 5b), we observe that the contract-theoretic smart energy harvesting framework outperforms the type-agnostic approach by a factor of four on average, owing to the personalized rewarding mechanism that offers rewards tailored to the IoT nodes' needs. Thus, the transmission and charging power are used efficiently in the system.
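The type-agnostic baseline is simple enough to compute directly. Every node receives the mean type times its effort. Under the assumed utility $U_i = t_i r_i - k q_i$ with $e(r) = r$, a node's utility becomes $q_i(t_i \bar{t} - k)$, where $\bar{t}$ is the mean type. It is therefore negative whenever $t_i \bar{t} < k$. The function names and numbers below are hypothetical and chosen for illustration only; they are not the simulation parameters of this section.

```python
def type_agnostic_rewards(types, efforts):
    """Type-agnostic baseline: r_i = (sum_j t_j / |I|) * q_i, i.e., mean type times effort."""
    mean_type = sum(types) / len(types)
    return [mean_type * q for q in efforts]


def utilities(types, rewards, efforts, k):
    """Assumed node utility U_i = t_i * r_i - k * q_i (linear evaluation e(r) = r)."""
    return [t * r - k * q for t, r, q in zip(types, rewards, efforts)]


types = [0.2, 0.5, 0.9]
efforts = [0.1, 0.2, 0.4]
rewards = type_agnostic_rewards(types, efforts)
print(utilities(types, rewards, efforts, k=0.3))
```

With these illustrative values, the lowest-type node ends up with a negative utility (about -0.019) while the highest-type node gains. This is the kind of IR violation that the personalized contracts avoid.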

Comparative Evaluation
In this section, we demonstrate the benefits of introducing an artificial intelligence method based on reinforcement learning to enable the intelligent association of the IoT nodes to the FAPs. Five comparative scenarios are considered regarding how the IoT nodes select the FAP with which they associate: (i) the proposed reinforcement learning mechanism (RL) introduced in Section 4, and the heuristics in which each IoT node selects (ii) the closest FAP (Min Distance), (iii) the FAP that offered the maximum charging power in the previous timeslot (Max Charging Power), (iv) the FAP to which the minimum number of IoT nodes were connected in the previous timeslot (Min Nodes), and (v) a random FAP (Random). It is noted that all the IoT nodes are within the coverage area of all the considered FAPs. All results were derived from a Monte Carlo analysis of 1000 executions of the overall framework for all the comparative scenarios. Figure 6a-c present the IoT nodes' invested effort, gained reward, and achieved utility, respectively, as a function of the IoT nodes' IDs.
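The four baseline selection rules (ii)-(v) can be sketched in Python as follows. This is a minimal illustration under assumptions the paper does not state: node and FAP positions are 2-D coordinates in metres, the charging power "offered" by each FAP in the previous timeslot is summarized as one scalar per FAP, and ties go to the lowest FAP index. The function names are hypothetical, and the proposed RL mechanism of Section 4 is not reproduced here.

```python
import math
import random


def select_fap(strategy, node_pos, fap_pos, prev_power, prev_counts, rng):
    """Return the index of the FAP that one IoT node picks under a baseline rule.

    node_pos, fap_pos: 2-D coordinates [m] (assumed representation).
    prev_power[f]: charging power FAP f offered in the previous timeslot.
    prev_counts[f]: number of IoT nodes connected to FAP f in the previous timeslot.
    Ties are broken by the lowest FAP index (Python's min/max keep the first).
    """
    faps = range(len(fap_pos))
    if strategy == "Min Distance":        # (ii) closest FAP, i.e., smallest d_{i,f}
        return min(faps, key=lambda f: math.dist(node_pos, fap_pos[f]))
    if strategy == "Max Charging Power":  # (iii) highest previous charging power
        return max(faps, key=lambda f: prev_power[f])
    if strategy == "Min Nodes":           # (iv) fewest previously connected nodes
        return min(faps, key=lambda f: prev_counts[f])
    if strategy == "Random":              # (v) uniformly random FAP
        return rng.randrange(len(fap_pos))
    raise ValueError(f"unknown strategy: {strategy}")


def assign_timeslot(strategy, node_positions, fap_pos, prev_power, prev_counts, rng):
    """Associate every IoT node with a FAP for one timeslot."""
    return [select_fap(strategy, p, fap_pos, prev_power, prev_counts, rng)
            for p in node_positions]


if __name__ == "__main__":
    rng = random.Random(0)
    faps = [(0.0, 0.0), (100.0, 0.0), (50.0, 80.0)]
    nodes = [(10.0, 5.0), (90.0, 10.0), (55.0, 70.0)]
    for s in ("Min Distance", "Max Charging Power", "Min Nodes", "Random"):
        print(s, assign_timeslot(s, nodes, faps, [1.0, 3.0, 2.0], [4, 1, 2], rng))
```

In this example, Min Distance spreads the three nodes over FAPs 0, 1, and 2, whereas Max Charging Power and Min Nodes send all three nodes to FAP 1, because every node reads the same previous-timeslot statistics. This is consistent with the observation that these two heuristics tend to concentrate the IoT nodes on a single FAP per timeslot.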
Figure 7a,b illustrate the FAPs' cumulative utility and the overall IoT system's social welfare, respectively, as a function of the number of IoT nodes within the overall system. The results show that the proposed framework outperforms all other scenarios in terms of the IoT nodes' invested effort (Figure 6a), gained reward (Figure 6b), and achieved utility (Figure 6c), as well as the FAPs' cumulative utility (Figure 7a) and the system's social welfare (Figure 7b). This observation stems from the inherent characteristics of the proposed reinforcement learning mechanism, which enables the IoT nodes to select the FAPs that holistically provide them with a superior utility in the long term, rather than relying on fragmented selection criteria such as the minimum distance, the maximum charging power, or the minimum number of IoT nodes connected to the FAPs. FAP selection based on the minimum distance yields the next best results after the proposed reinforcement learning-based framework, as the communication distance is a dominant factor in the power attenuation of both the transmission and charging signals. The random selection scenario yields the worst results, as the IoT nodes select FAPs without considering their physical and social characteristics. The Max Charging Power and Min Nodes scenarios yield similarly mediocre results, as the IoT nodes, relying on the same previous-timeslot criterion, tend to select the same FAP in each timeslot, which burdens that FAP with serving all the connected IoT nodes efficiently. Furthermore, Figure 8a-c illustrate the total transmission power and utility of all the IoT nodes, and the total charging power of all the FAPs, respectively, for all the examined comparative scenarios. The results demonstrate that the proposed reinforcement learning-based FAP selection mechanism enables the IoT nodes to transmit with low power (Figure 8a) and efficiently exploit the FAPs' charging power (Figure 8c), in order to achieve superior utility (Figure 8b) within the examined IoT system. In contrast, the scenarios based on a single FAP selection criterion yield worse results, as they give the IoT nodes a myopic view of the IoT system and of the most beneficial FAP to connect to, transmit information to, and harvest power from. Moreover, the random scenario provides the lowest utility to the IoT nodes, as it cannot efficiently balance the trade-off between the energy spent to transmit the IoT nodes' data and the energy harvested from the FAPs' radio frequency signals. Figure 9a,b illustrate the IoT nodes' total achieved data rate and their corresponding total energy efficiency under all the examined comparative scenarios. The results show that the intelligent association of the IoT nodes to the FAPs through the introduced artificial intelligence framework exploits the low transmission power (Figure 8a) more effectively, achieving a superior data rate (Figure 9a) and improved energy efficiency (Figure 9b) compared to the rest of the examined scenarios. In contrast, the comparative scenarios that perform a myopic FAP selection achieve a low data rate and energy efficiency. Thus, it is concluded that considering multiple parameters in the FAP selection, and providing the IoT nodes with the intelligence needed to perform it, yields better results in terms of transmission power and achieved data rate, and correspondingly improves the overall energy efficiency of the IoT nodes.
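The relation between low transmission power, data rate, and energy efficiency can be illustrated with a short Python sketch. It assumes a single interference-free link with the Shannon-type rate $R = W \log_2(1 + g P / \sigma^2)$, where $W$ and $\sigma^2$ follow Table 2 and the channel gain $g$ is an assumed input; the exact rate expression of the paper's system model may differ, and the function names are hypothetical. Energy efficiency is taken as delivered bits per joule, $R/P$.

```python
import math


def data_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Assumed Shannon-type rate R = W * log2(1 + g * P / sigma^2) in bps."""
    snr = channel_gain * tx_power_w / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


def energy_efficiency(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Delivered bits per joule of transmit energy, R / P, in bit/J."""
    if tx_power_w <= 0:
        raise ValueError("transmission power must be positive")
    return data_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w) / tx_power_w


if __name__ == "__main__":
    W, sigma2 = 1e6, 1e-9        # illustrative bandwidth [Hz] and noise power [W]
    for g in (1e-7, 1e-6):       # weaker vs stronger channel to the chosen FAP
        for p in (0.01, 0.1):    # low vs high transmission power [W]
            print(g, p, data_rate(W, p, g, sigma2), energy_efficiency(W, p, g, sigma2))
```

Because $\log_2(1+x)$ grows sublinearly, $R/P$ decreases as $P$ grows on a fixed channel, while a stronger channel raises both the rate and the efficiency at the same power. In this simplified model, that is one way an association that improves channel conditions can deliver a higher data rate at low transmission power.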

Conclusions
In this paper, a smart energy harvesting framework for Internet of Things networks is introduced based on Contract Theory and Reinforcement Learning. Initially, a wireless powered communication system model is introduced, which exploits the IoT nodes' physical and social characteristics to define their types. Then, the IoT nodes' transmission power and the FAPs' personalized charging power are determined, based on the IoT nodes' characteristics, through a contract-theoretic framework that captures the interactions among the IoT nodes and the FAPs. The scenarios of complete and incomplete information regarding the IoT nodes' types are examined in detail. Furthermore, an artificial intelligence mechanism based on reinforcement learning is proposed to enable the IoT nodes to select the FAP that is most beneficial to connect to in the long term. Finally, detailed simulation and comparative results are presented to show the standalone operational performance of the proposed framework, as well as its benefits and drawbacks compared to other approaches.
Our current and future work aims to extend the proposed framework to a 6G wireless environment enriched with reconfigurable intelligent surfaces, in order to improve the channel conditions between the IoT nodes and the FAPs. To quantify the benefits of adopting reconfigurable intelligent surfaces, we perform a detailed experimental analysis that measures the transmission and charging power savings.

Figure 1. Smart Energy Harvesting for Internet of Things Networks.

Lemma 1. All the downward IC constraints are equivalent to the local downward IC constraint.
Proof. See Appendix A.5.
Following the same philosophy, we state the following Lemma.
Lemma 2. All the upward IC constraints are equivalent to the local downward IC constraint.
Proof. See Appendix A.6.
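The role of the local constraints in Lemmas 1 and 2 can be checked numerically. The Python sketch below assumes a hypothetical quasi-linear node utility $\theta_k r - q$ (type times reward minus effort) as an illustrative stand-in for the paper's utility function, together with strictly increasing types and non-decreasing rewards (monotonicity). It sets the efforts so that the individual rationality (IR) constraint of the lowest type and every local downward IC constraint bind, and then tests every pair of types for a profitable deviation, both downward and upward. All names are hypothetical.

```python
import itertools


def utility(theta, contract):
    """Hypothetical quasi-linear node utility: theta * reward - effort."""
    effort, reward = contract
    return theta * reward - effort


def build_contracts(thetas, rewards):
    """Return (effort, reward) pairs with binding IR (lowest type) and binding LDIC.

    Assumes thetas are strictly increasing and rewards are non-decreasing.
    """
    if any(b <= a for a, b in zip(thetas, thetas[1:])):
        raise ValueError("types must be strictly increasing")
    if any(b < a for a, b in zip(rewards, rewards[1:])):
        raise ValueError("rewards must be non-decreasing (monotonicity)")
    efforts = [thetas[0] * rewards[0]]  # IR binds: lowest type gets zero utility
    for k in range(1, len(thetas)):
        # LDIC binds: theta_k*r_k - q_k == theta_k*r_{k-1} - q_{k-1}
        efforts.append(efforts[-1] + thetas[k] * (rewards[k] - rewards[k - 1]))
    return list(zip(efforts, rewards))


def ic_violations(thetas, contracts, tol=1e-9):
    """List every (k, j) where type k strictly prefers contract j to its own."""
    return [
        (k, j)
        for k, j in itertools.permutations(range(len(thetas)), 2)
        if utility(thetas[k], contracts[j]) > utility(thetas[k], contracts[k]) + tol
    ]


if __name__ == "__main__":
    thetas = [1.0, 1.5, 2.0, 3.0]   # illustrative IoT node types
    rewards = [0.2, 0.5, 0.9, 1.4]  # illustrative monotone charging-power rewards
    menu = build_contracts(thetas, rewards)
    print(ic_violations(thetas, menu))
```

Running it prints an empty list: once monotonicity holds and the local downward constraints bind, no type gains from any other type's contract. Replacing the lowest type's contract with (0.0, 1.4) instead makes every higher type deviate to it.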

$\frac{1}{|S_i|}$, and receives a corresponding utility $U_i^{(ite)}$.

Figure 2. IoT nodes' (a) invested effort, (b) gained reward, and (c) achieved utility under the proposed contract-theoretic energy harvesting framework: Complete versus Incomplete information scenarios.

Figure 3. (a) FAPs' cumulative utility and (b) the overall IoT system's social welfare under the proposed contract-theoretic energy harvesting framework: Complete versus Incomplete scenarios.

Figure 4. (a) IoT nodes' rewards and (b) achieved utility under the contract-theoretic versus the type agnostic framework of energy harvesting.
Figure 5.

Figure 7. (a) FAPs' cumulative utility and (b) the overall IoT system's social welfare: A Comparative Evaluation.

Figure 8. (a) Total transmission power and (b) utility of all the IoT nodes, and (c) total charging power of all the FAPs: A Comparative Evaluation.
Figure 9.
A set of IoT energy harvesting nodes $\mathcal{I} = \{1, \ldots, i, \ldots, |\mathcal{I}|\}$ is considered. The distance between two IoT nodes $i, i' \in \mathcal{I}$ is denoted as $d_{i,i'}$ [m], while the distance of an IoT node from a FAP is $d_{i,f}$ [m], $\forall i \in \mathcal{I}, \forall f \in \mathcal{F}$. The overall system operates as a wireless powered communication network (WPCN), where the Wireless Energy Transfer (WET) and Wireless Information Transmission (WIT) phases are executed within a timeslot $\tau$ [sec]. The durations of the WET and WIT phases are denoted as $\tau_{WET}$ [sec] and $\tau_{WIT}$ [sec], respectively, with $\tau = \tau_{WET} + \tau_{WIT}$. The considered system model is presented in Figure
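The WET/WIT split of each timeslot can be sketched as a per-slot energy budget in Python. The sketch assumes a linear harvesting model, in which a node harvests $\eta_i\, g_{i,f}\, P_{f,i}\, \tau_{WET}$ during the WET phase and spends $P_i\, \tau_{WIT}$ during the WIT phase. Here $\eta_i$ and $P_{f,i}$ follow Table 2, while the channel gain $g_{i,f}$, the transmission power symbol $P_i$, and the function names are hypothetical; the paper's exact energy expressions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeslot:
    """One WPCN timeslot split into a WET phase and a WIT phase."""
    tau_wet: float  # WET phase duration [sec]
    tau_wit: float  # WIT phase duration [sec]

    def __post_init__(self):
        if self.tau_wet < 0 or self.tau_wit < 0:
            raise ValueError("phase durations must be non-negative")

    @property
    def tau(self) -> float:
        """Total timeslot duration: tau = tau_WET + tau_WIT."""
        return self.tau_wet + self.tau_wit


def harvested_energy(eta, charging_power_w, channel_gain, slot):
    """Assumed linear model: energy harvested during the WET phase [J]."""
    return eta * channel_gain * charging_power_w * slot.tau_wet


def transmission_energy(tx_power_w, slot):
    """Energy spent transmitting during the WIT phase [J]."""
    return tx_power_w * slot.tau_wit


def net_energy(eta, charging_power_w, channel_gain, tx_power_w, slot):
    """Harvested minus consumed energy for one node in one timeslot [J]."""
    return (harvested_energy(eta, charging_power_w, channel_gain, slot)
            - transmission_energy(tx_power_w, slot))
```

A positive net energy means the node harvests more during the WET phase than it spends during the WIT phase of the same timeslot, which is the condition under which energy harvesting extends its battery life.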

Table 1. List of Acronyms.

Table 2. Summary of Key Notations.
IoT node's consumed energy for data transmission [J] E f IoT node's transmission power [W]
$P_{f,i}$: FAP's charging power for the IoT node $i$ [W]
$R_{i,f}$: IoT node's achievable data rate [bps]
$W$: System's bandwidth [Hz]
$\sigma^2$: Power of zero-mean Additive White Gaussian Noise (AWGN)
$SP_i$: Socio-physical factor of the IoT node $i$
$\rho_{i,f}$: Proximity factor of the IoT node $i$ to FAP $f$
$\eta_i$: Energy conversion efficiency factor of the IoT node $i$
$G^{\Sigma}$: Channel quality vector
$\widetilde{G}^{\Sigma}$: Normalized channel quality vector
$CI_{i,i'}$: Communication interest factor between the IoT nodes $i$ and $i'$

Furthermore, we consider the communication interest factor $CI_{i,i'} \in [0, 1]$ between two IoT nodes $i, i'$, $\forall i, i' \in \mathcal{I}$, capturing the need of two IoT nodes to exchange information with each other in order to perform an IoT service. We define the symmetric communication interest matrix $\mathbf{CI} = \{CI_{i,i'}\}_{|\mathcal{I}| \times |\mathcal{I}|}$, $\forall i, i' \in \mathcal{I}$, and create the communication interest vector [...]. Thus, we obtain the normalized communication interest vector $\widetilde{\mathbf{CI}} = [\widetilde{CI}_1, \ldots, \widetilde{CI}_{|\mathcal{I}|}]$, where $\widetilde{CI}_i$ [...] shows the relative communication interest of each node $i$ with all the other IoT nodes in the network. By jointly combining the normalized communication interest and channel quality indicators, we conclude with the socio-physical factor