Smart Home Gateway Based on Integration of Deep Reinforcement Learning and Blockchain Framework

Advances in information and communication technology, particularly in sensors, have moved the Internet of Things (IoT) toward smart homes that sense and manage resources pervasively. In a smart home, the gateway connects a variety of IoT devices, so security rests on a centralized structure. To address the security of this system, we use a blockchain framework as the smart home gateway to withstand possible attacks, and we apply Deep Reinforcement Learning (DRL). The proposed blockchain-based smart home approach is evaluated for reliability and security in terms of accessibility, privacy, and integrity. To move away from the traditional centralized architecture, blockchain is used to store and exchange data in blocks. Data integrity inside and outside the smart home allows network members to be authenticated. The network is implemented on the Ethereum blockchain and measured in terms of security, response time, and accuracy. The experimental results show that the proposed solution outperforms recent existing works. DRL is a learning-based algorithm and the most effective component of the proposed approach: it improves system performance and, combined with blockchain, protects the smart devices of the smart home against unauthorized sharing and hacking of private data. We compared the proposed system with other state-of-the-art methods and tested it on two datasets, NSL-KDD and KDD-CUP-99. DRL achieves an accuracy of 96.9%, compared with 80.05% for the second-ranked method, Artificial Neural Networks, a difference of about 16 percentage points.


Introduction
The smart home combines IoT systems with comfort, a high-quality lifestyle, security, and convenience. IoT-based smart home networks interconnect smart devices such as wearable devices, smart home devices, and smart meters. The smart home can support a more independent lifestyle. The global smart home market is growing rapidly: it is expected to reach 53.45 billion dollars in 2022, with growth of 20.8% during 2018-2022 [1]. According to Gartner [2], the reported number of smart home devices rose from 500 million to 700 million within a year, which poses a challenge for the security of these devices [3]. One weakness of IoT networks is their exposure to security threats. The devices usually run unattended, with no one to manage or supervise them [4][5][6]. They are connected through a gateway using different wireless protocols, which opens the way for attackers to eavesdrop with little processing effort, and applying a security technique to each individual device is cumbersome [7,8]. In a real case in 2018, criminals hacked a North American casino and stole data through a fish tank [9,10]. The casino had taken some security precautions against threats, but the hackers were still able to send data from the tank to Finland. IoT techniques should prepare and improve their architecture to withstand more advanced attacks, including terrorist attacks [11,12]. Common smart home security problems concern data privacy, authentication, access control mechanisms, system configuration, etc. [13][14][15][16]. Traditional IoT systems are centralized and connected to the cloud, so a compromised central server leads to network failure. In addition, IoT devices have limited computation power, which makes them vulnerable to different security threats.
Various solutions address these problems, including the security layer presented in [17,18] and decentralized network architectures that implement blockchain for smart homes [19][20][21]. In recent years, blockchain has been identified as a way to solve security problems, creating trust, reliability, scalability, and privacy in the IoT paradigm [22,23]. Adopting blockchain in the smart home reduces major security concerns, e.g., authentication, integrity, authorisation, and single-point attacks [24]. A comprehensive Intrusion Detection System (IDS) helps address this problem by identifying the distinctive patterns of conventional attacks. A more recent technique, Deep Reinforcement Learning (DRL), can be applied to evaluate data flows in order to detect interference and attack patterns. This study presents a combination of blockchain and DRL in smart homes for applications such as data sharing within the smart home.
The main contributions of this paper include:
• Using a blockchain network for the smart home to investigate the problem of security.
• Evaluating the common IoT devices of the smart home based on hardware implementation.
• Presenting the architecture of a smart home gateway to relieve the recent challenges of the smart home.
• Improving the performance of the proposed system compared with other existing works.
• Applying Deep Reinforcement Learning to predict and interpret the data.
• Using Deep Reinforcement Learning with IoT sensors to create a safer smart home and improve the performance of the process.
The remainder of this paper is organized as follows. Section 2 describes related studies on smart home architecture. Section 3 describes the proposed machine learning approach for the smart home gateway based on the blockchain framework. Section 4 describes the implementation and experimental results of the proposed system, and the final section presents the discussion and conclusions.

Smart Home Based on Public Blockchain
Varshney et al. [41] presented a blockchain-based secure platform that uses IoT sensors to protect the smart home from threats. Their implementation demonstrates secure communication between IoT devices in a distributed environment. Lazaroiu et al. [42] presented the integration of IoT and blockchain in a smart district model that gives users access to the power grid. The developed system connects the user to the blockchain in the power grid system. Anyone with access to a solar panel configuration can use the network to buy and sell energy through the blockchain. Aggarwal et al. [43] proposed a secure energy trading scheme for the smart grid, known as Energy-Chain. The security of the implemented system is evaluated in terms of cost, communication, and computation time. Dorri et al. [44] presented a blockchain-based smart home as a representative case study. The authors designed the core building blocks of the smart home and tested transaction processing across its components. They also analyzed the privacy and security of the smart home architecture. They report lower processing time and ease of use for IoT devices.

Smart Home Based on Private Blockchain
Dorri et al. [45] presented a lightweight, secure blockchain-based system for the smart home. The blockchain is managed centrally by the smart home owner. Connections use a shared key for communication. Transactions are checked through lightweight hashing to detect any alteration. The system is secured against Distributed Denial-of-Service (DDoS) attacks and provides availability, confidentiality, and integrity. Two disadvantages of IoT devices are small storage and limited computation power. Moreover, data streaming requires considerable money and time, which led the authors of [46][47][48] to combine blockchain with smart contracts to improve system security, yielding a lightweight smart-contract-based approach for smart home architecture. Table 1 summarizes existing studies of smart homes based on public and private blockchains and compares them in terms of confidentiality, integrity, and scalability.

Smart Home Gateway
Smart home technology for the residential environment is growing due to recent developments. The living environment can be controlled easily using a control system and devices [49]. How the environment is manipulated depends on various conditions, e.g., cost, preferences, and the type of technologies involved. To control and monitor the smart home, various devices communicate through the gateway [50]. Designing a network around a gateway requires several functions: home network connection, internet connection, remote control, software update and expansion, and secure and reliable remote operation. The goal of implementing such gateways is to create a sustainable smart home. Sivaraman et al. [51] studied the security vulnerabilities of smart home networks. Managing devices and verifying certificates through the ISP is possible, but it is not sufficient for security because little user data is available. Jamil et al. [52] presented the integration of machine learning and blockchain for sustainable electrical power in the smart grid. The blockchain platform is benchmarked with Hyperledger Caliper in terms of resource utilization, latency, and throughput. This system is useful for energy crowdsourcing systems. Table 2 presents an overview of recent opportunities in blockchain-based smart home technology.

Smart Home Based on Reinforcement Learning
A home energy management system comprises both software and hardware that let users manage energy consumption and production. In this context, Cheng et al. [59] used model-free Q-learning for the control of window systems and achieved energy savings of 23%. Wei et al. [60] used a data-driven DRL method and achieved a 20% cost reduction. Nagy et al. [61] applied a data-driven method with reference to a rule-based control system. Wang et al. [62] applied model-free RL to reduce energy consumption, and Gao et al. [63] applied DRL to thermal comfort control and energy optimization.

Integration of Blockchain and Deep Reinforcement Learning in Smart Homes
This section presents smart home technology based on the integration of blockchain and machine learning. Smart homes have recently become a popular way to secure buildings and improve their performance through IoT devices, and to counter attacks in which hackers try to obtain users' information. We propose blockchain-based smart home security integrated with deep reinforcement learning. The main reason for applying deep reinforcement learning is that it is a learning-based algorithm, which allows it to improve the smart home's performance beyond that of existing works. Figure 1 shows the architecture of the smart home based on the blockchain framework with reinforcement learning. Information is collected from different IoT sensors and smart devices. The dataset passes through the deep reinforcement learning framework before entering the blockchain, which removes errors such as disruption, repetition, and loss of data values. Data-related problems are thus excluded by the DRL framework. DRL can focus on segments of the chain rather than on the data collection process. This makes the framework suitable for tasks such as fraud detection and prediction-based theft detection.

Deep Reinforcement Learning
Applying machine learning techniques in the smart home framework has become a lasting solution in several respects. These techniques control the IoT devices that matter most for improving home security. The DRL technique directly expresses and optimizes the value function, environment model, and policy in an end-to-end process. DRL can build a model by extracting patterns from the original high-dimensional data, which serves as the basis of the control policy. DRL-based optimization and decision control is shown in Figure 2. The process has two main parts: a training section and an execution section. The training section is learning-based: it draws on the execution section to learn from the real environment and optimize decisions. In emergency cases, the agent connects to the new environment and improves the obtained reward to update the optimization.

Markov Decision Process and Q-Learning
A Markov Decision Process (MDP) has four main components: the states ($E$), the actions ($C$), the state transition probability distribution $p(\cdot \mid e, c)$, and the reward probability distribution $q(\cdot \mid e, c)$. In the presented smart home gateway, these elements play the following roles. The agent is responsible for selecting actions in the environment so as to maximize the reward. In line with the home automation system, the agent controls the home energy system. The environment refers to the energy production of the smart products in the home and the prices of the services used in the system, e.g., Wi-Fi. The reward is the key element of the DRL algorithm: it guides the agent toward an acceptable value function. The objective is simple: more reward means more benefit from the real-world energy system. The action taken in the proposed smart home is the output of the DRL algorithm, namely the action selected from the Q-values. Actions cover the time-shift load, power-shift load, power balance, physical constraints of devices, satisfaction of demand, and other constraints. The MDP task can be discretized into time periods. In each time period $t$, the agent occupies a state $e_t \in E$ and selects an action $c_t$ from the actions available in that state. Executing the selected action leads to a transition to state $e_{t+1}$ and an immediate reward $W(e_t, c_t)$. The MDP can be applied to smart home technology when the model is complete. When information about the environment is lacking, e.g., the transition and reward probabilities are missing, model-free Q-learning can be used to generate the optimal policy. Q-learning is a temporal difference method that can predict incrementally online. The update rule of Q-learning is defined in Equation (1), where $\beta$ is the learning rate, which is kept below one during the learning process.
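For reference, the standard temporal-difference form of the Q-learning update can be written with the symbols of this section as below. This is the textbook form and an assumption about the notation, not a quotation of Equation (1): $\alpha$ is taken to be the discount factor, and $R_{t+1}$ is taken to denote the immediate reward $W(e_t, c_t)$ received after action $c_t$.

```latex
Q(e_t, c_t) \leftarrow Q(e_t, c_t)
  + \beta \left[ R_{t+1} + \alpha \max_{c} Q(e_{t+1}, c) - Q(e_t, c_t) \right],
\qquad 0 < \beta < 1
```

The bracketed term is the temporal-difference error: the gap between the bootstrapped target and the current estimate, of which a fraction $\beta$ is applied at each step.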
Algorithm 1 shows the simulation steps involved in the DRL process. In Q-learning, the Q-value of each state-action pair is stored in the Q-table. The table is updated by stochastic gradient descent according to Equation (2).

Algorithm 1 Smart home simulation process
Here $\beta$ controls the step size, and $R_{t+1} + \alpha \max_{c} Q(e_{t+1}, c)$ is the expected reward obtained by taking action $c_t$ in state $e_t$. In high dimensions, the agent learns values slowly: if the action and state spaces are high-dimensional, Q-learning with a Q-table becomes impractical. The state incorporates all the information of the smart home and keeps all historical records. Actions are selected from the policy at each time interval using the ε-greedy strategy, which selects the best action under the policy most of the time and explores for a fraction of the time governed by ε.
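The Q-table update and ε-greedy selection described above can be sketched in a few lines. This is a minimal illustration, not the paper's Algorithm 1: the five-state chain environment, the function names, and the hyperparameter values are all hypothetical, with `beta` playing the role of the step size β and `alpha` assumed to be the discount factor.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore; otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so that an all-zero table does not favor one action.
    return rng.choice([c for c, v in enumerate(q_row) if v == best])


def step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching the last state ends the episode with reward 1, otherwise reward 0."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_states=5, n_actions=2, episodes=500, beta=0.1, alpha=0.9,
          epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # the Q-table
    for _ in range(episodes):
        e, done = 0, False
        while not done:
            c = epsilon_greedy(q[e], epsilon, rng)
            e_next, reward, done = step(e, c, n_states)
            # Target: reward plus discounted best value of the next state.
            target = reward if done else reward + alpha * max(q[e_next])
            q[e][c] += beta * (target - q[e][c])
            e = e_next
    return q


q_table = train()
# Greedy action in every non-terminal state after training.
greedy_policy = [max(range(2), key=lambda c: row[c]) for row in q_table[:-1]]
print(greedy_policy)
```

The table has one row per state, which is why the approach stops scaling once the state keeps the full history of the home: the number of rows grows with every distinguishable history, and a function approximator has to replace the table.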

Gateway Network Based on Blockchain in Smart Homes
Applying blockchain in smart home gateways provides conclusive data transmission, authorization, authentication, and trust between devices. The smart home forms a centralized network that is distributed at the cloud layer through the blockchain. The presented smart home in the blockchain framework has three main layers: the device layer, the gateway layer, and the cloud layer, as shown in Figure 3. The device layer contains the smart devices, i.e., the IoT devices configured in the smart home that collect and monitor its data. The gateway layer stores the data generated by the device layer according to user needs. The cloud layer registers the gateway ID and the processed data of each gateway in the blockchain. The blocks are shared with users, who can access them whenever needed. The data collection process lets devices collect data and save it to the blockchain, and gives the user the opportunity to create, format, and verify blocks, etc. Figure 4 shows the structure of blocks in this system. A block has five main components: previous block hash, timestamp, nonce, fromdeviceid, and todeviceid. Figures 5 and 6 present the processes of user authorization and verification request. User authorization first requires installing a smart application and generating a unique key for each user. The unique key is then registered through a REST API, and, for verification, the client is registered in the super-node and the registration is confirmed.
Figure 6. Process of verification request.
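The five block components listed above can be illustrated with a small sketch of a hash-linked chain. The field names follow the text; everything else is an assumption made for illustration: SHA-256 over a JSON serialization as the block hash, the all-zero genesis hash, and the device identifiers. It does not reproduce the Ethereum implementation used in the experiments.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Block:
    # The five components named in the text.
    previous_block_hash: str
    timestamp: float
    nonce: int
    fromdeviceid: str
    todeviceid: str

    def digest(self) -> str:
        """Assumed hashing scheme: SHA-256 of the sorted JSON form of the block."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


GENESIS_HASH = "0" * 64  # hypothetical placeholder for the first block


def append_block(chain, fromdeviceid, todeviceid, timestamp, nonce=0):
    """Link a new block to the hash of the current last block."""
    previous = chain[-1].digest() if chain else GENESIS_HASH
    block = Block(previous, timestamp, nonce, fromdeviceid, todeviceid)
    chain.append(block)
    return block


def is_valid(chain):
    """A chain is valid if every block stores the hash of its predecessor."""
    if chain and chain[0].previous_block_hash != GENESIS_HASH:
        return False
    return all(
        chain[i].previous_block_hash == chain[i - 1].digest()
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, "sensor-01", "gateway-01", timestamp=1.0)
append_block(chain, "gateway-01", "cloud-01", timestamp=2.0)
append_block(chain, "sensor-02", "gateway-01", timestamp=3.0)

# Changing any field of an earlier block breaks the link stored in its successor.
tampered = list(chain)
tampered[1] = replace(chain[1], todeviceid="attacker")
print(is_valid(chain), is_valid(tampered))
```

Because each block carries the hash of the one before it, a network member can check integrity by recomputing hashes, without trusting the gateway that delivered the data.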

Results and Discussion
This section presents the implementation and experimental results of the smart home architecture based on DRL and blockchain. Table 3 gives an overview of the implementation environment. The system has 32 GB of memory and an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz. The programming language version used to implement the machine learning algorithm is 3.6.2. The blockchain framework is Ethereum, and the machine learning algorithm is deep reinforcement learning.

Data
The data processed in this system are collected from IoT sensors in a smart home and flow from the sensors to the smart home gateway. The collected information enters the data collecting layer as the input of the proposed system. Dedicated data cleaning removes inconsistencies in the data. Figure 7 shows gateway data management in the blockchain network. The process has three layers: the data collecting layer, the data pre-processing layer, and the hashing layer. Data collection comprises time setup, requesting data, and storing data. Data pre-processing comprises filtering, standardization, and classification. The hashing layer comprises encryption, hashing, and storage of the resulting values. Data generated by a device are communicated to the router at a set time. When a gateway needs new data, the stored raw data are sent to the gateway. In the second layer, to save storage, only the information associated with a device ID is stored after standardization and classification. Finally, the data generated in the smart home contain important user information, so they are secured by encryption, require a password from the user, and are stored as hash values.
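The pre-processing and hashing layers described above can be sketched as two small functions. This is an illustrative sketch under stated assumptions, not the gateway's actual code: the record fields (`device_id`, `timestamp`, `value`), the z-score standardization, grouping by device ID as the classification step, and SHA-256 as the hash are all hypothetical choices, and the encryption and password steps are left out.

```python
import hashlib
import json
from collections import defaultdict
from statistics import mean, pstdev


def preprocess(readings):
    """Filter, classify by device ID, and standardize the collected readings."""
    # Filtering: drop records without a device ID or value, and exact repeats.
    seen, kept = set(), []
    for r in readings:
        key = (r.get("device_id"), r.get("timestamp"), r.get("value"))
        if key[0] is None or key[2] is None or key in seen:
            continue
        seen.add(key)
        kept.append(r)

    # Classification (assumed here to mean grouping records per device).
    by_device = defaultdict(list)
    for r in kept:
        by_device[r["device_id"]].append(r)

    # Standardization: z-score of each value within its device group.
    grouped = {}
    for device_id, rows in by_device.items():
        values = [row["value"] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        grouped[device_id] = [
            {"timestamp": row["timestamp"],
             "z": 0.0 if sigma == 0 else (row["value"] - mu) / sigma}
            for row in rows
        ]
    return grouped


def hash_layer(grouped):
    """Hashing layer: one SHA-256 digest per device over its cleaned records."""
    return {
        device_id: hashlib.sha256(
            json.dumps(rows, sort_keys=True).encode("utf-8")
        ).hexdigest()
        for device_id, rows in grouped.items()
    }


readings = [
    {"device_id": "thermo-1", "timestamp": 1, "value": 20.0},
    {"device_id": "thermo-1", "timestamp": 2, "value": 22.0},
    {"device_id": "thermo-1", "timestamp": 2, "value": 22.0},  # repetition
    {"device_id": None, "timestamp": 3, "value": 5.0},         # no device ID
    {"device_id": "lock-1", "timestamp": 1, "value": None},    # lost value
    {"device_id": "lock-1", "timestamp": 4, "value": 1.0},
]

grouped = preprocess(readings)
digests = hash_layer(grouped)
print({device_id: len(rows) for device_id, rows in grouped.items()})
```

Keeping only records that carry a device ID, as the text describes for the second layer, is what bounds the storage needed at the gateway; the digests are the values that would then be written to the chain.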

Blockchain Framework Performance in Smart Homes
In this section, the implementation of the proposed architecture is presented to validate the performance of the system. Figure 8 shows the statistical parameters for smart home optimization in terms of security, for training and validation. Figure 8 covers eight parameters: (1) accuracy, (2) miss rate, (3) sensitivity, (4) specificity, (5) false positive value, (6) positive predictive value, and (8) negative predictive value. Table 4 shows the DRL prediction results for the blockchain-based smart home on the training set. A total of 150,317 records are processed in the training set, divided into 79,465 normal records and 70,852 attack records. In addition, 67,321 records are predicted correctly and 3531 incorrectly, which together account for the 70,852 attack records.
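The statistical parameters named above follow from the four cells of a binary confusion matrix. The sketch below uses their standard definitions with "attack" as the positive class; reading "false positive value" as the false positive rate is an assumption, as are the function and key names. The example counts are the validation figures reported with Table 5 in the next subsection.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: attacks predicted as attacks      fn: attacks predicted as normal
    tn: normal predicted as normal        fp: normal predicted as attacks
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "miss_rate": fn / (tp + fn),            # 1 - sensitivity
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),  # assumed meaning of "false positive value"
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Validation counts: 22,955 attack records (22,046 correct, 909 wrong)
# and 10,931 normal records (10,348 correct, 583 wrong).
validation = classification_metrics(tp=22046, fn=909, tn=10348, fp=583)
for name, value in validation.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and miss rate always sum to one, as do specificity and the false positive rate, so these pairs carry the same information viewed from opposite sides.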

Deep Reinforcement Learning Performance in Smart Homes
The performance evaluation of the DRL is based on Q-learning. The value function is defined in Equation (3): $Q(e, c, w) \approx Q^{\pi}(e, c)$.
Based on Equation (3), the parameter $w$ is defined in Equation (4). Table 5 shows the validation records of the presented system. There are 33,886 validation samples in total, divided into 10,931 normal and 22,955 attack records. Of the normal records, 10,348 are correctly predicted as normal and 583 are wrongly predicted as attacks although no attack occurred. Of the attack records, 22,046 are predicted correctly and 909 are predicted incorrectly.
Figure 9 shows the capability of the DRL algorithm according to the actions taken. The process is evaluated at one-hour intervals, and DRL optimization is high from 2 to 10 in the morning and from 5 to 9 in the afternoon.
Table 6 compares the performance of the proposed system with other state-of-the-art methods. The Artificial Neural Network (ANN) ranks second after DRL on the provided data, with an accuracy of 80.05%. NSL-KDD and KDD-CUP-99 are the two datasets used to evaluate the proposed system.
Figure 10 presents the performance of the smart home under various machine learning algorithms in terms of the false positive value, false negative value, positive predictive value, and negative predictive value of the predictions. Figures 11 and 12 show the response time and accuracy of the security measures as the quantity of data traffic varies. The gateway layer responds faster and also achieves the higher security measurement. The presented blockchain-based process provides security, authentication, confidentiality, and integrity to the smart home.
We implemented the broadcast and block on an ESP32 device to test the proposed architecture. In this process, the SN can create the broadcast and the block for transaction verification. Figure 13 shows two mined blocks.
The parts highlighted in green mark the start times of blocks one and two, and the parts highlighted in yellow mark their completion times. The mining time of a block can be displayed in human-readable form. The difficulty target controls how much work machines must do to generate a new block. If a new block is created within the time limit, the difficulty is adjusted so that mining takes an adequate amount of time; the difficulty and mining time are changed in code, and the results are summarized in Table 7, which shows the difficulty and mining time differences for block one. The starting difficulty is one, and from difficulty two the possible delay is 0.22 for each block. Figure 14 presents the time taken for the transaction and the difficulty within the processing time. According to the figure, mining at difficulty four takes 60 s. Difficulty and time are evaluated together to show how the mining time, in seconds, varies with the difficulty.
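The relation between difficulty and mining time discussed above can be reproduced qualitatively with a toy proof-of-work loop. This sketch assumes that difficulty means the number of leading zero hex digits required in a SHA-256 hash, which is a common teaching simplification and not necessarily the rule used on the ESP32 or in Ethereum; the function name and block payload are hypothetical, and the timings it prints depend on the machine and are not comparable to Table 7 or Figure 14.

```python
import hashlib
import time


def mine(block_data: str, difficulty: int):
    """Search for the smallest nonce whose hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    start = time.perf_counter()
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest, time.perf_counter() - start
        nonce += 1


results = {}
for difficulty in range(1, 5):
    nonce, digest, seconds = mine("block-1", difficulty)
    results[difficulty] = (nonce, digest, seconds)
    print(f"difficulty={difficulty} nonce={nonce} time={seconds:.4f}s")
```

Under this assumption each extra digit of difficulty multiplies the expected number of hash attempts by 16, which is why mining time grows steeply rather than linearly as the difficulty is raised.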

Table 7. Difficulty and mining time for block one.

Conclusions
The smart home is one of the recent technologies in the IoT and sensor framework. Detecting interference and identifying devices in smart homes are major challenges for prediction and evaluation, and blockchain and machine learning have great potential to address them. Because power and processing are limited in smart home deployments, such techniques cannot easily be applied in this system. Therefore, we have presented the integration of Deep Reinforcement Learning with blockchain to minimize the authentication, confidentiality, and integrity problems of the smart home's heterogeneous IoT devices and centralized gateway. This article reviewed existing works on smart home security and proposed a simple model for the security architecture of a blockchain. The user's role as a node in the blockchain framework was eliminated; instead, the IoT devices take that role, which makes the system unique.