An RSSI-Based Localization, Path Planning and Computer Vision-Based Decision Making Robotic System

Abstract: A robotic navigation system operates flawlessly under an adequate GPS signal range, whereas indoor navigation systems use simultaneous localization and mapping or other vision-based localization systems. The sensors used in indoor navigation systems are not suitable for low-power and small-scale robotic systems. Wireless area network transmitting devices have fixed transmission power, and receivers measure different signal strength values depending on their surrounding environments. In the proposed method, the received signal strength index (RSSI) values of three fixed transmitter units are measured every 1.6 m in a mesh format and analyzed by classifiers, so that the robot position can be mapped in the indoor area. After navigation, the robot analyzes objects and detects and recognizes human faces with the help of object recognition and facial recognition-based classification methods, respectively. This robot detects an intruder, with the current position, in an indoor environment.


Introduction
In an indoor environment, satellite GPS signals do not have adequate strength due to multipath reflection from indoor objects [1][2][3][4]. Radio frequency identification (RFID)-based localization technology needs an RFID tag placed over a certain area for a randomly moving agent, which identifies the position and makes decisions [5,6]. However, if the agent misses the tag, it is hard to achieve accurate localization. Ultra-wideband and infrared-based localization methods have higher accuracy than RFID-based methods [7]. Identifiable signatures, such as Wi-Fi, escalators, and high-tension transmission lines, influence sensor-based indoor navigation [8]. Jovicic et al. suggested that visible light communication through LEDs can precisely track individuals in an indoor area [9], and Complementary Metal Oxide Semiconductor (CMOS)-based cameras decode the transmitted information [10]. Movement in a dynamic environment can be sensed by a change in the data rate. As the distance increases, the received power (in dB) decreases [11]. The power spectral density (dBm/MHz) is also affected by an increase in distance. The use of a power amplifier reduces the rate at which received power falls with distance [12]. The link quality can be measured from different key performance evaluation parameters, such as received signal strength index (RSSI), link quality index (LQI), signal-to-noise ratio (SNR), and bit error rate (BER). Systems based on these parameters have achieved 1.8 m horizontal accuracy along with the floor number on which the receiver was located. In the emerging field of indoor localization techniques, ultra-wide band (UWB) is a newer technology [22]. When results must be derived from a large number of features with searches over random subsets, the best approach is to split each node [23]. Random forest-based forecasting accommodates more possible inputs than linear regression, multiple linear regression, and CNN networks [15].
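The inverse relationship between distance and received power noted above is often captured by the log-distance path-loss model. The sketch below is illustrative only: the reference power of -40 dBm at 1 m and the path-loss exponent of 2.5 are assumed values, not figures taken from the cited works.

```python
import math

def rssi_at_distance(d, rssi_ref=-40.0, n=2.5, d_ref=1.0):
    """Log-distance path-loss model: expected RSSI (dBm) at distance d (m).

    rssi_ref : RSSI measured at the reference distance d_ref (assumed -40 dBm at 1 m)
    n        : path-loss exponent (assumed 2.5 for a cluttered indoor space)
    """
    return rssi_ref - 10.0 * n * math.log10(d / d_ref)

def distance_from_rssi(rssi, rssi_ref=-40.0, n=2.5, d_ref=1.0):
    """Invert the model to estimate distance from a measured RSSI."""
    return d_ref * 10.0 ** ((rssi_ref - rssi) / (10.0 * n))
```

In practice the exponent varies room by room, which is precisely why the paper samples RSSI empirically instead of relying on such a model alone.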
The estimator calculations for random forest classification model concerning their features provide a better estimation. If the number of estimators is calculated, the prediction can be faster for real-time data [24]. Computer vision-based indoor navigation uses the marker-based approach. Based on the size, shape, distance from the camera, focusing methods, and different light conditions, the marker-based indoor navigation can be used for indoor navigation [25]. A robotic system should be capable of navigating in an indoor environment and have capabilities to understand the surroundings. In the visual localization and mapping method, the system builds the three-dimensional point cloud model for position estimation and localization.
For data storage and comparison with real-time data, one needs significant processing capabilities. Visual localization and mapping has challenges such as dependency on illumination conditions and a high processing time in dense, complex environments. To process and analyze real-time data, a graphics processor provides higher processing speed; for a system with lower processing capability, it is hard to achieve real-time processing while simultaneously performing object detection. The errors in depth measurements are directly related to the distance between the object and the sensor: as the distance increases, the error also increases. Additionally, the data quality depends on the sensor's depth resolution [26]. Simultaneous localization and mapping (SLAM)-based indoor positioning was tested using a Microsoft Kinect 3D sensor inside the laboratory, and the results are shown in Figure 1. This sensor framework uses a 3D depth sensor which emits infrared pulses and measures the reflected pulse time. Cloud points in x, y, and z coordinates are the sensor outputs. The sensor captures frames with cloud points at mapping time and compares these points during real-time localization. Based on the cloud-point matching ratio, the sensor tries to localize itself in the real-time environment. The blue track lines represent the sensor movement trajectory with orientation and sampling position, and the green dots are points that perfectly match the database values during the real-time scan. Yellow dots indicate a change in the environment after scanning, i.e., unmatched points. Also, the scanning time required to capture the area depends on the sensor and the processing unit, and sudden movements can affect the mapping and localization results.
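The cloud-point matching ratio described above can be illustrated with a simplified nearest-neighbour check. This sketch deliberately ignores registration and orientation, and the 5 cm tolerance is an assumed value for illustration only.

```python
import numpy as np

def match_ratio(map_cloud, live_cloud, tol=0.05):
    """Fraction of live-scan points lying within tol metres of some mapped
    point (the 'green dot' matches in Figure 1); the remainder correspond
    to unmatched ('yellow') points. Pose registration is omitted for brevity."""
    matched = sum(
        1 for p in live_cloud
        if np.min(np.linalg.norm(map_cloud - p, axis=1)) <= tol
    )
    return matched / len(live_cloud)

# Toy clouds: two live points sit near mapped points, one does not.
map_cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
live_cloud = np.array([[0.01, 0.0, 0.0], [1.0, 0.02, 0.0], [2.0, 2.0, 0.0]])
print(match_ratio(map_cloud, live_cloud))  # 2 of 3 points match
```

A real SLAM pipeline would first align the live scan to the map (e.g., via ICP) before computing such a ratio; this only shows the matching concept.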
To scan an area of 77.51 sq. ft., the sampled database size is greater than 600 MB, and this sensor requires high processing power, so we conclude that the system has an advantage in accuracy but is not suitable for applications running on limited hardware specifications. Additionally, this sensor does not work reliably in highly illuminated regions, such as windows and doors where sunlight is prominent, thereby affecting the result. The neural network should be fast enough to process the live video stream.
The YOLO (you only look once) model for real-time object detection processes 45 frames per second (fps), whereas Fast YOLO processes 155 fps in real time. By reducing the frame size, real-time detection is possible with minimal object localization error [27]. An individual facial detection and object recognition model was deployed on a system-specific low-power controlling device [28]. In the cited article, continuous data frames were captured by the camera unit and synthesized by object detection and facial recognition classification. Computer vision-based human detection is possible with the help of a microprocessor development board, such as the Raspberry Pi 3, and a passive infrared sensor [29]. The image of the person was captured by the camera module and transmitted to a mobile server for feature extraction, facial detection, and recognition in real time [30].
The proposed design is implementable in hardware with a pre-existing wireless module and can be used to navigate a person or a robotic agent in a complex environment, such as a shopping mall, or in a warehouse to transport cargo. RSSI-based indoor navigation is an integral part of the robotic system for taking the necessary action in a specific region. The contributions of this paper are as follows:

1. Based on the observations and literature, the RSSI and wireless link quality values varied continuously with changes in the environment, and so a comparative analysis of different classification models was made for this method.

2. The designed framework was implemented in a pre-existing environment without hardware modification and with limited power usage.

3. An integrated cognition-based target selection and path planning framework was implemented.
In Section 2, the data collection method, system flow, path planning, and facial and object recognition methods are explained. Section 3 describes the sample distribution, classification comparison, and prediction accuracy. Conclusions and future work are described in Section 4.

Methodology
The navigation method should be reliable and accurate in a frequently changing indoor environment. The RSSI-based fingerprint technique registers multiple sampling inputs from different wireless local area network nodes. The scanning points collect the signal quality and signal strength values, which are derived from the signal-to-noise ratio and data rate of the received link. A drawback is false localization for small sample sizes. The system's pseudo-flow graph is given in Algorithm 1. In the proposed system, 60,000 samples per position are collected at a rate of 30 ms/sample through a Raspberry Pi module. At the time of sampling from different positions, the robot also registers the angle and distance to the next and previous sampling points using a 3-axis magnetometer and accelerometer. Here P_s is the respective sampling position, P_e is the estimated position, and DB is the measured RSSI value.
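On Linux-based boards such as the Raspberry Pi, the link quality and RSSI of the current association are commonly read from wireless tools. The sketch below is an assumption-laden illustration: it parses the usual "Link Quality=xx/70  Signal level=-yy dBm" line printed by iwconfig and applies the 35-60 link-quality acceptance band used during sampling in this study.

```python
import re

# Pattern assumes iwconfig-style output with a /70 quality scale.
IWCONFIG_RE = re.compile(r"Link Quality=(\d+)/70\s+Signal level=(-?\d+) dBm")

def parse_iwconfig(text):
    """Extract (link_quality, rssi_dbm) from iwconfig-style output, or None."""
    m = IWCONFIG_RE.search(text)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2))

def accept_sample(quality, q_min=35, q_max=60):
    """Keep only samples whose link quality falls inside the accepted band."""
    return q_min <= quality <= q_max

reading = parse_iwconfig("Link Quality=50/70  Signal level=-47 dBm")
print(reading)  # (50, -47)
print(accept_sample(reading[0]))  # True
```

On the actual robot, the text would come from invoking the wireless tool in a loop (one reading every 30 ms per the sampling rate above) rather than from a fixed string.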
The link quality of the wireless data link may depend upon the signal-to-interference-plus-noise ratio (SINR), packet delivery ratio (PDR), and bit error rate (BER). The accuracy depends on the wireless data card manufacturer. In this study, the ideal link quality Q was 70, and so the sampling values were expected to lie in the range of 35 to 60. Samples were not accepted into the database unless their link quality was within this range; the robot/module collecting data waited for the link quality to improve. During real-time positioning, the system again collects the link quality and RSSI values, which form the input to the prediction model for estimating the present position. The robot/module then moves towards the nearest sampling point, collects the same information again, and applies the prediction model. If the nearby point adheres to the predicted position, the previous scanning point is considered the correctly localized point for calculating the optimum path. Here n is the number of sampled positions; r = 1, 2, or 3 indexes the data collected from the individual transmitters; θ_nn is the angle between corresponding sampling points; and D_nn is the corresponding distance. P_e is the estimated position calculated from the classification model. The actual position calculation takes place after the current position estimation, and the robot moves to the nearby positions to estimate their corresponding positions. If a nearby position is not as predicted, the system scans again for new data.
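The predict-and-confirm procedure above can be sketched as a small loop. All of the callables and the neighbour table below are hypothetical stand-ins for the robot's scanning, classification, and database components, not the paper's actual implementation.

```python
def localize(scan_fn, predict_fn, neighbors, max_tries=5):
    """Predict-and-confirm localization loop sketched from the text.

    scan_fn()          : hypothetical; returns a fresh (quality, RSSI) reading
    predict_fn(reading): hypothetical; maps a reading to a position label
    neighbors          : maps each position to the sampling points registered
                         next to it in the database
    """
    for _ in range(max_tries):
        p_est = predict_fn(scan_fn())    # estimate from a fresh scan
        p_next = predict_fn(scan_fn())   # reading taken after moving nearby
        if p_next in neighbors.get(p_est, ()):
            return p_est                 # neighbour agrees: accept the estimate
        # neighbour disagrees with the map: scan for new data and retry
    return None
```

The key design point is that a single prediction is never trusted on its own; an estimate is accepted only when the prediction at an adjacent sampling point is consistent with the stored connectivity.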
As the sample size contained more than 1,200,000 samples for 20 nodes (60,000 samples per position, i.e., 30 ms/sample), the decision was made through the prediction and classification models according to their fitness scores. For a minimal surrounding environment, error nodes in a triangular arrangement are shown in Figure 2. The generated database was tested with k-nearest neighbors (KNN), random forest (RF), radial basis function kernel support vector machine (RBF SVM), and multi-layer perceptron (MLP) neural network (NN) classification models. The inputs were applied to these models, and we calculated the fit score with RMS error. The sample data were taken in a classroom and passage area. The black dotted points were the sample area, and Tx_1, Tx_2, and Tx_3 were wireless local area network transmitters. D_1, D_2, and D_3 were the distances from the starting sampling point at position 1 to the transmitters. The system flow graph is shown in Figure 3. The robot asks the user to input the mapping position for the database. For better accuracy, the mapping points should be as dense as possible; here the distance between two consecutive points is 0.5 m. The system registers the link quality and RSSI values for all the samples and points under consideration. The database stores these values as Q (link quality out of 70), DB (RSSI value), and Pos (position number) in two separate CSV files, which are operated on by the model-defining and prediction modules. The model prediction module uses data-specific regression or classification models, such as the KNN, RBF SVM, RF, and MLP algorithms. An optimal model is selected based on accuracy and data processing time. Here, this database is tested with different regression and classification models.
The database was split into 70:30 training:testing sets for prediction, and the model was selected based on the highest fit score, minimum root-mean-square error (RMSE), and processing time. This prediction model was applied to the real-time scan values for robot localization. The robot scanned for new values and registered the RSSI values only when the link quality was between 35 and 65. If the link quality were to degrade, the prediction model error would increase, indicating a wrong robot position in the mapped area. For each position in the mapping, the database contains magnetic orientation values.
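A model comparison along these lines can be run with scikit-learn. The sketch below uses synthetic (link quality, RSSI) clusters as a stand-in for the actual database, so the model names match the study but the resulting scores do not represent the paper's measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Synthetic stand-in for the (link quality, RSSI) database: four positions,
# each with a characteristic mean reading plus environmental noise.
centers = np.array([[50, -45], [45, -55], [55, -60], [40, -70]])
X = np.vstack([c + rng.normal(0, 3, size=(300, 2)) for c in centers])
y = np.repeat(np.arange(4), 300)

# 70:30 training:testing split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "RBF SVM": SVC(kernel="rbf", gamma="scale"),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5  # error over position labels
    print(f"{name}: fit score {model.score(X_te, y_te):.3f}, RMSE {rmse:.3f}")
```

Note that the RMSE here is computed over position labels, matching the paper's use of position-number errors rather than metric distances.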
At the time of optimal path generation, the path from one fixed position to the target position is decided by connecting the nearest optimal points, as shown in Figure 4. The sampling points are shown as black dots and the objects in different shapes and colors. Each sample node is described in (column, row) format. With a starting point of (0,5) and a target of (3,1), the optimal path is shown in red as the sequence of points connected through nearby reachable points. At the time of sampling, the reachable points are registered. The starting point (0,5) was connected to points (0,4), (1,0), (1,4), (2,4), and (1,5). The connecting path from the (1,0) node is a black dotted line. The estimated paths from source to destination were calculated by the algorithm and listed in Figure 4. The path with the fewest hopping points is declared the optimal path. During path following, a larger number of nodes means that the system has to localize correctly after reaching each node, so to minimize the time to reach the destination, the number of hopping points should be minimal. The connecting points are stored in the database with their corresponding angles and distances. Based on the starting point and the target point, the algorithm estimates all possible paths. As shown in Figure 4, five paths were estimated from the starting point (0,5) to the target point (3,1). The algorithm calculates the connecting-point paths that reach the target point, and the minimal-distance path is considered optimal. The estimated path "a", marked in red, (0, 5) → (1, 0) → (3, 1), is the shortest among all estimated paths. When the robot encounters an obstacle while following the path, the algorithm calculates a new optimal path to the same destination. The robot tries to reach the destination, and in the interim, the system begins capturing frames from the on-board camera unit.
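Selecting the path with the fewest hopping points over the registered connectivity is the classic shortest-path-by-hops problem, which breadth-first search solves directly. The adjacency table below is reconstructed from the Figure 4 description of which points connect to (0,5) and is otherwise illustrative.

```python
from collections import deque

def min_hop_path(adjacency, start, target):
    """Breadth-first search over the registered connectivity graph:
    returns the path with the fewest hopping points, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Connectivity assumed from the Figure 4 description: (0,5) connects to
# (0,4), (1,0), (1,4), (2,4), and (1,5); (1,0) reaches the target (3,1).
adjacency = {
    (0, 5): [(0, 4), (1, 0), (1, 4), (2, 4), (1, 5)],
    (1, 0): [(3, 1)],
    (0, 4): [(1, 4)],
    (1, 4): [(2, 4)],
    (2, 4): [(3, 1)],
}
print(min_hop_path(adjacency, (0, 5), (3, 1)))  # [(0, 5), (1, 0), (3, 1)]
```

On this reconstructed graph, BFS recovers path "a" from the text, (0,5) → (1,0) → (3,1), as the minimal-hop route. Obstacle handling would amount to deleting the blocked edge and re-running the search.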
The system contains a database of authorized persons' images. If new authorized target coordinates are given, the robot follows the new target points and optimizes the path for the new target. The facial and object detection and recognition process is shown in Figure 5. The facial detection and recognition algorithm does not use depth sensor data; rather, it works on two-dimensional images. An individual's images are analyzed to detect the facial area using the Haar cascading method. Each frame captured by the Pi camera is synthesized by the processing unit and converted into a vector of 128 real-valued numbers. A pre-trained model based on the Microsoft Common Objects in Context (COCO) dataset was used for detection of objects such as chairs, tables, and bicycles. The individual image detection classifier was applied to the resultant image, as in the Figure 5 output phase. If the person was valid, then speech recognition was enabled. The system takes input in the form of a voice command for a new target. If the system fails to detect and recognize the face, the target point can also be input manually through a remote server. The system library function enables the microphone and registers the input until speech stops for 2 s. The given input is converted into a string, unnecessary background noise is removed, and the system extracts only the target inputs.
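The recognition step compares a captured face's 128-value encoding against the stored encodings of authorized persons. A minimal Euclidean-distance matcher is sketched below with synthetic encodings; the 0.6 acceptance threshold is a commonly used value for such encodings and is an assumption here, since the paper does not state its threshold.

```python
import numpy as np

def recognize(encoding, known_encodings, known_names, threshold=0.6):
    """Return the name of the closest authorized encoding, or 'unknown'.

    encoding        : 128-value real encoding of the captured face
    known_encodings : array of shape (n_people, 128) from the database
    threshold       : maximum Euclidean distance to accept a match (assumed)
    """
    distances = np.linalg.norm(known_encodings - encoding, axis=1)
    best = int(np.argmin(distances))
    return known_names[best] if distances[best] <= threshold else "unknown"

# Toy database: two authorized persons with synthetic encodings.
rng = np.random.default_rng(1)
db = rng.normal(size=(2, 128))
names = ["alice", "bob"]
print(recognize(db[0] + 0.01, db, names))          # "alice"
print(recognize(rng.normal(size=128), db, names))  # "unknown"
```

In the full pipeline, a recognized name gates the speech-recognition step, while "unknown" corresponds to the intruder case handled by the surveillance logic.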
The hardware setup is shown in Figure 6, with the camera positioned 1 m above ground level and connected to the Raspberry Pi 3 module to capture an adequate vision area. The MPU 6050 is an integrated accelerometer with a compass, embedded with the processing module to measure angle and displacement. The robotic action signal is sent to the H-bridge driver module based on the decision made by the navigation algorithm running on the embedded system.

Model Selection and Data Processing
The link quality and RSSI value distribution of the database is shown in Figure 7. The values are not linearly distributed but change with the link quality. Position 2, shown in orange in the graph, has a different distribution owing to the environment changing with time. It is hard to predict the exact location on a map without taking several samples over time and using prediction or classification models. The best-performing classification models were studied, and a comparative result for the sampled data is shown in Figure 8.
The input samples were randomized and applied to the classification models. The inputs were the link quality and RSSI value, and the positions were the output variables. The randomized data were segregated into 70:30 training:testing datasets. In Figure 8, the solid red dots are the training data and the light-colored dots are the testing data. The red and blue regions are the boundary conditions that separate the points. The classifier accuracy depends on the dataset distribution (shown in the first column); if the datasets do not overlap, then KNN, SVM, and NN have better accuracy than RF. For an RSSI-based indoor navigation system, the RSSI values depend on the surrounding medium and the measured inputs overlap. In this case, the RF classification method outperformed KNN, SVM, and NN, as shown in the second row. The performance analysis of the different classification models is shown in Tables 1 and 2. The mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were calculated for the sampled and real-time scanned datasets. The RF classification method had lower MAE_DB, MSE_DB, MAE_RT, MSE_RT, and RMSE_RT compared to the other methods. We took the samples in a grid formation, so the error between consecutive points at identical distances was the same. The error value cannot capture the position angle with respect to the target point. We denote the location of the samples as a label for the classifier, and the errors calculated in Table 1 are position-number errors. The computational performance is measured by the classification execution speed and therefore has a significant impact in a large sampled area. The RF classifier was computationally the fastest for the sampled and real-time scanned datasets. Based on the classification accuracy calculated for the samples, the random forest classification method is accurate and fast compared to the other classifiers, and so it can be used as the prediction model.
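The MAE, MSE, and RMSE over position labels reported in Tables 1 and 2 follow the standard definitions; the label values in the example below are made up purely for illustration.

```python
import numpy as np

def error_metrics(actual, predicted):
    """MAE, MSE, and RMSE over position labels, as used in Tables 1 and 2."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = predicted - actual
    mae = np.mean(np.abs(diff))        # mean absolute error
    mse = np.mean(diff ** 2)           # mean square error
    return mae, mse, np.sqrt(mse)      # RMSE is the square root of MSE

# Hypothetical actual vs. predicted position labels for four scans.
print(error_metrics([1, 2, 3, 4], [1, 2, 4, 2]))  # (0.75, 1.25, 1.118...)
```

Because these are position-number errors on a grid, converting them to metres requires multiplying by the grid spacing, which is why the text treats label error and metric error separately.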
The optimum number of estimators for the random forest-based classification algorithm must be calculated to optimize CPU processing. Fewer than 50 estimators were required to achieve a low out-of-bag (OOB) error for the sampled dataset, as shown in Figure 9. Here the estimator range was from 15 to 175, and the error rate dropped below the target level at 50 estimators. The RSSI value increased as the robot moved towards a transmitter: the higher the RSSI value, the higher the signal strength. The prediction model convergence for all registered RSSI values at four random positions is shown in Figure 10. The gray dots are registered RSSI values; the random forest classification algorithm's convergence can be seen as red lines. A bar graph of actual and predicted values for 25 random cases is shown in Figure 11. The robot predicted its position using the real-time scan data and the prediction model. We marked the best-predicted position among the other points as the estimated position. The robot then moved to the four nearby points centered on the predicted position in the grid. If, for a certain point, a precise prediction was available as per the robot's assumption, then before moving on, the robot confirmed that point as an accurate estimate. The cumulative distribution of errors (in meters) was determined. On average, the RF classification method achieves 90% of its position predictions within a 3 m range, with a maximum error of 6 m for any position. The RF, KNN, RBF SVM, and MLP achieved 90% accuracy at 2.87, 3.7, 3.85, and 5 m, as shown by the vertical dotted lines in Figure 12. The median accuracies in this study were 0.030, 0.045, 0.05, and 0.062 m, respectively. The threshold point for selecting the classification method was 3.2 m, because the error should not exceed half the region-of-interest area.
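The estimator sweep of Figure 9 can be reproduced in outline with scikit-learn's out-of-bag scoring. The data below are synthetic stand-ins for the sampled (link quality, RSSI) dataset, so the sweep shows the shape of the trade-off rather than the paper's actual OOB curve.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the sampled (link quality, RSSI) dataset.
centers = np.array([[50, -45], [45, -55], [55, -60], [40, -70]])
X = np.vstack([c + rng.normal(0, 3, size=(200, 2)) for c in centers])
y = np.repeat(np.arange(4), 200)

# Sweep the estimator count (15..175, as in Figure 9) and track OOB error.
for n in range(15, 176, 40):
    rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                bootstrap=True, random_state=0)
    rf.fit(X, y)
    print(f"{n:3d} estimators: OOB error {1.0 - rf.oob_score_:.4f}")
```

The OOB error typically flattens once enough trees are used, which is the basis for fixing a modest estimator count and saving CPU time on the robot.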
The threshold point was decided based on the mapping area, where the distance between two consecutive points was 1.6 m, and was set equal to two grid points, or 3.2 m, from the prediction point. For RF, RBF SVM, KNN, and MLP respectively, the accuracies of the individual methods at the threshold point were 91.7%, 86%, 84.3%, and 82.3%. The accuracy depends on the sampling data and the real-time link quality. The object detection and recognition module and the facial detection and recognition module work independently. Object detection results are shown in Figure 13 (Part A). The robot tries to understand the surroundings, identifies objects, and registers their positions in the database. The robot detected objects such as a chair, a laptop, and a person in the detection area; as the prediction model was not exposed to training data for the laboratory storage cabinet, some such objects were unrecognizable. The facial detection and recognition model's output helps in acquiring inputs from authorized persons. A detected person is recognized through the respective database name, whereas a person with no record in the database is deemed an unknown person, as shown in Figure 13 (Part B). Face orientation and lighting conditions may affect the detection process, so the robot tries to change its orientation to adjust appropriately to the lighting conditions. In case of a detection error, the system can receive the target point through manual input. The robotic system takes the necessary action for a change in target, or scans the area, with the help of the proposed localization method. This system is useful for surveillance without human interaction among the set-points.
The selection of a classification method depends on many parameters, such as the data collection method, the filtering process, and consideration of the dynamic environment. The proposed robotic navigation algorithm has the liberty to select a suitable classification method based on the dataset. Among the classification methods used in the comparative study, RF-based classification gave relatively higher accuracy in a complex environment (90% at 3.2 m) without requiring an activation function, learning rate, hidden-layer neuron count, or thresholding process to be set. Additionally, its processing time for larger sample counts is lower than that of the other methods. Hence, the time needed for prediction during position hopping is shorter, which is a critical factor for real-time robot localization.

Conclusions
The RSSI-based robotic system navigates a diverse indoor environment with the help of available wireless data links. The RSSI values change as objects appear between a wireless node and the receiver module. In this case, the data-link quality plays a crucial role in identifying the accurate mapping position. Dense area sampling with adequate sample points in diverse conditions improves accuracy. The accuracy of the classification models depends on the data distribution and the training:testing ratio, so it is necessary to calculate the classifier accuracy before fixing any model. For the sampled dataset, the random forest classification method provides a promising result with the least MAE, MSE, and RMS error compared to the other prediction models. The path optimization algorithm chooses the path with the fewest hopping nodes as the shortest path. In the proposed method, the RF classification method achieved a location accuracy 5.7%, 7.4%, and 9.4% higher than RBF SVM, KNN, and MLP, respectively, in a 3.2 m span. The designed robotic platform can be further modified to transport laboratory equipment from one room to another on the same floor. The system can sense the surroundings and make immediate decisions using the proposed computer vision methods. In the future, it is possible to deploy this system on aerial vehicles inside buildings for surveillance purposes.