Mobile Robot Wall-Following Control Using Fuzzy Logic Controller with Improved Differential Search and Reinforcement Learning

In this study, a fuzzy logic controller with the reinforcement-based improved differential search algorithm (FLC_R-IDS) is proposed for solving a mobile robot wall-following control problem. This study uses the reward and punishment mechanisms of reinforcement learning to train the mobile robot wall-following control. The proposed improved differential search algorithm uses parameter adaptation to adjust the control parameters. To improve the exploration of the algorithm, the number of superorganism members participating in the stopover-site search is varied. This study uses reinforcement learning to guide the behavior of the robot: when the mobile robot satisfies three reward conditions, it receives a reward of +1. The accumulated reward value is used to evaluate the controller and to guide subsequent controller training. Experimental results show that, compared with the traditional differential search algorithm and the chaos differential search algorithm, the average error value of the proposed FLC_R-IDS in the three experimental environments is reduced by 12.44%, 22.54% and 25.98%, respectively. Finally, the experimental results also show that a real mobile robot using the proposed method can effectively implement the wall-following control.


Introduction
Wall-following [1,2], navigation [3], path tracking [4], and parallel-parking controls are prevalent research topics in robotics and artificial intelligence. Robot navigation and parallel-parking behaviors require the robot to move in an unknown environment, and wall-following behavior is fundamental to mobile robot control. Zadeh [5] proposed fuzzy logic in 1965. Although real-world systems are characterized by a high degree of uncertainty, complexity, and nonlinearity, fuzzy logic can handle such uncertainty by encoding human experience in the form of rules. Many researchers have applied fuzzy logic controllers (FLCs) [6] to mobile robot navigation [7] and wall-following tasks [8]. Moreover, optimization methods have been proposed to improve the performance of the FLC, such as supervised learning [9], reinforcement learning [10,11], and population-based learning [12].
Traditional supervised learning requires training data, whereas reinforcement learning does not; only a reward and punishment mechanism is needed for training. Civicioglu [13] proposed the heuristic differential search (DS) algorithm in 2012, originally for geodetic coordinate transformation; in structure, the DS algorithm is similar to other traditional population-based algorithms.

Mobile Robot Control Using a Fuzzy Logic Controller
This section discusses the mobile robot and the architecture of the FLC. The mobile robot is trained to follow the wall and is guided by the reward conditions of reinforcement learning. Figure 1 shows the architecture of the FLC with the reinforcement-based improved differential search algorithm (FLC_R-IDS) for the mobile robot.

Figure 1. Architecture of the proposed FLC_R-IDS for the mobile robot.

Description of Mobile Robots
The experiments were carried out with a mobile robot (PIONEER 3-DX). The Pioneer 3-DX, widely used in navigation design and robot movement problems, is a lightweight, two-wheel, two-motor differential-drive robot that is ideal for indoor laboratory or classroom use. The robot is equipped with eight front ultrasonic sensors, batteries, motor encoders, microcontrollers with ARCOS firmware, and the Pioneer Mobile Robot Software Development Kit. The ultrasonic sensors measure ranges between 0.15 m and approximately 4.75 m. The ultrasonic sensors of the Pioneer 3-DX are fixed in the following configuration: two on the sides and six facing outward at 20° intervals, providing 180° of forward coverage, as shown in Figure 2.


Architecture of Fuzzy Logic Controller
In the architecture of the FLC, the right four ultrasonic sensors (S1, S2, S3, and S4) are the FLC inputs, and the left-wheel and right-wheel speeds of the robot are the FLC outputs. The FLC realizes a fuzzy model of the following form:

Rule $j$: IF $x_1$ is $A_{1j}$ and $x_2$ is $A_{2j}$ and $x_3$ is $A_{3j}$ and $x_4$ is $A_{4j}$ THEN $y_l$ is $u_j$ and $y_r$ is $v_j$

where $x_i$ is the ultrasonic sensor value of $S_i$ ($i = 1, \dots, 4$); $A_{ij}$ is the linguistic term of the precondition part; $y_l$ is the left-wheel speed of the robot; $y_r$ is the right-wheel speed of the robot; and $u_j$ and $v_j$ are the weights of the consequent part. The fuzzification operation uses the Gaussian membership function

$$\mu_{A_{ij}}(x_i) = \exp\left(-\frac{(x_i - m_{ij})^2}{\sigma_{ij}^2}\right)$$

where $m_{ij}$ represents the mean and $\sigma_{ij}$ the variance of the Gaussian membership function of the fuzzy set.
The fuzzy implication uses the product operation, so the firing strength of rule $j$ is evaluated as

$$\phi_j = \prod_{i=1}^{4} \mu_{A_{ij}}(x_i)$$

In the defuzzification operation, the center of area is used in this study, described as

$$y_l = \frac{\sum_j \phi_j u_j}{\sum_j \phi_j}, \qquad y_r = \frac{\sum_j \phi_j v_j}{\sum_j \phi_j}$$

where $y_l$ is the left-wheel speed of the robot, and $y_r$ is the right-wheel speed of the robot.
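To make the inference concrete, the following is a minimal sketch of this zero-order TSK-style controller in Python. The parameter values below are placeholders for illustration, not the trained values reported in this paper.

```python
import numpy as np

def flc(x, m, sigma, u, v):
    """Zero-order TSK fuzzy controller with Gaussian membership functions.

    x     : (4,)   ultrasonic sensor readings S1..S4
    m     : (4, J) means of the Gaussian membership functions
    sigma : (4, J) variances (spreads) of the membership functions
    u, v  : (J,)   consequent weights for the left/right wheel speeds
    """
    # Fuzzification: Gaussian membership of each input in each rule
    mu = np.exp(-((x[:, None] - m) ** 2) / sigma ** 2)
    # Implication: product over the four inputs gives each rule's firing strength
    phi = mu.prod(axis=0)
    # Defuzzification: center of area (weighted average of consequents)
    y_l = (phi * u).sum() / phi.sum()
    y_r = (phi * v).sum() / phi.sum()
    return y_l, y_r

# Example with J = 5 rules and random placeholder parameters
rng = np.random.default_rng(0)
J = 5
y_l, y_r = flc(rng.uniform(0.2, 0.8, 4),
               rng.uniform(0.2, 0.8, (4, J)),
               rng.uniform(0.1, 0.5, (4, J)),
               rng.uniform(0, 10, J),
               rng.uniform(0, 10, J))
```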


The Proposed Reinforcement-Based Improved Differential Search Algorithm
The DS algorithm is a recent heuristic algorithm [13]. Its main concept is to simulate the migration behavior of superorganisms. Numerical optimization algorithms share some common problems: they often have poor exploration ability and easily fall into local optima. To improve performance, this study proposes the IDS algorithm. The IDS algorithm is inspired by parameter adaptation, in which the controller adjusts its control parameters to the most suitable values during evolution. In addition, adjusting the probability of parameter exchange makes the algorithm more diverse, so it performs better than the original DS algorithm. For the mobile robot wall-following control task, the IDS algorithm is associated with the FLC discussed in Section 2.1. This study proposes a fuzzy logic controller with the reinforcement-based improved differential search algorithm (FLC_R-IDS) for solving the mobile robot wall-following control problem. The proposed IDS is an evolutionary algorithm that optimizes the parameters of the FLC; therefore, all FLC parameters are encoded into each individual of the IDS. Moreover, each individual (i.e., each FLC) is evaluated by reinforcement learning: its performance is defined as the reward value accumulated by the mobile robot during wall-following control in the training environment.
The DS algorithm simulates the migration behavior of superorganisms in search of food energy: a superorganism often migrates to areas of abundant resources during seasonal changes. A Brownian-like random-walk movement is used to determine the migration of the superorganism; the pseudocode is shown in Algorithm 1.

Algorithm 1. Pseudocode of the DS algorithm.

1  Initialize the Superorganism;
2  while the termination condition is not met do
3      Evaluate the Superorganism;
4      Randomly select a donor;
5      Calculate p1, p2 and Scale;
6      Generate the Stopover site, where Stopover site = Superorganism + Scale × (donor − Superorganism);
7      // The Superorganism participating in the search process is determined through a random scheme
8      Evaluate the Stopover site;
9      if Stopover site is better than Superorganism then
10         Superorganism is replaced by Stopover site;
11     end
12 end
13 return Stopover site;

In the DS algorithm, the initial position of the superorganism is expressed as follows:

$$X_{ij} = rand \cdot (up_j - low_j) + low_j$$

where $rand$ is a uniform random number between 0 and 1, and $up_j$ and $low_j$ are the upper and lower bounds of dimension $j$. A randomly selected individual of the superorganism serves as the donor, $donor = X_{rand\_select(i)}$, and the superorganism uses the donor to find the stopover sites. The size of the stopover site depends on the scale value. The scale is expressed as follows:

$$Scale = randg[2 \cdot rand_1] \cdot (rand_2 - rand_3)$$

where $randg$ denotes a gamma-distributed random number, and $rand_1$, $rand_2$, and $rand_3$ are uniform random numbers between 0 and 1. In the DS algorithm, the stopover site position is expressed as follows:

$$StopoverSite = X + Scale \cdot (donor - X)$$

The members of the superorganism are randomly assigned to the stopover-site search process. In assessing the stopover site, if the stopover site is better than the superorganism, the superorganism migrates to the stopover site. Through this constant migration of position, the superorganism moves toward the global best solution.
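The following is a compact sketch of one DS generation under these equations. The fitness function, the bounds, the gamma-sampling choice (shape 2·rand1, per the scale expression above), and the simplified participation map are illustrative assumptions, not the exact scheme of the original paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def ds_generation(X, fitness, low, up, p=0.3):
    """One generation of a DS-style search (sketch).

    X       : (N, D) superorganism (population of candidate solutions)
    fitness : callable mapping an (N, D) array to (N,) costs (lower is better)
    p       : assumed probability that an element participates in the search
    """
    N, D = X.shape
    donor = X[rng.permutation(N)]                       # randomly selected donors
    # Scale = randg[2*rand1] * (rand2 - rand3); epsilon keeps the gamma shape positive
    scale = rng.gamma(2 * rng.random() + 1e-9, 1.0) * (rng.random() - rng.random())
    stopover = X + scale * (donor - X)
    # Random scheme: decide which elements take part in the stopover site
    R = rng.random((N, D)) < p
    stopover = np.where(R, stopover, X)
    stopover = np.clip(stopover, low, up)               # keep within bounds
    # Greedy selection: migrate where the stopover site is better
    better = fitness(stopover) < fitness(X)
    X[better] = stopover[better]
    return X

# Usage on a toy sphere function
X = rng.uniform(-5, 5, (20, 8))
for _ in range(100):
    X = ds_generation(X, lambda P: (P ** 2).sum(axis=1), -5, 5)
```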


Improved Differential Search Algorithm for Optimizing FLC Parameters
Although the original DS algorithm can solve the problem, it lacks diversity and exploration ability [14]. To improve diversity and exploration capability, and to avoid falling into local optima during evolution, the IDS algorithm adjusts the number of superorganism members that the random scheme assigns to the stopover-site search. For many heuristic algorithms, no fixed control-parameter setting is well suited to all problems [18,19]. The original DS algorithm has two control parameters (p1 and p2) that affect the proportion of the superorganism in the stopover site. A flowchart of the IDS is shown in Figure 3 and is explained as follows:
Step 1. Initialize the Superorganism. The superorganism is defined as X_ij (i = 1, 2, 3, ..., N; j = 1, 2, 3, ..., D), where N is the population size and D is the dimension of the problem. Each individual of the superorganism encodes the fuzzy rules of an FLC. Figure 4 shows the FLC coding principle in the IDS algorithm, where m_ij is the mean and σ_ij is the variance of the Gaussian membership function, and u_j and v_j are the corresponding weight parameters of the consequent part. The FLC has four inputs and two outputs.
The input parameters are the values of the ultrasonic sensors facing the right wall (S1, S2, S3, and S4), and the output parameters are the rotation speeds of the left wheel (LW) and right wheel (RW). These control parameters must be defined by the user in advance. The FLC parameters are initialized randomly within their feasible ranges: each ultrasonic sensor value has a range of 0.2 to 0.8 m, and the left-wheel and right-wheel speeds have a range of 0 m/s to 10 m/s in the simulations.
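A minimal sketch of this encoding follows, assuming each individual concatenates the means and variances for the four inputs of every rule plus the two consequent weights; the variance range [0.1, 0.5] is an illustrative assumption, as the text only specifies the sensor and wheel-speed ranges.

```python
import numpy as np

rng = np.random.default_rng(2)

def init_superorganism(n_pop, n_rules):
    """Encode FLC parameters into IDS individuals.

    Per rule: 4 means + 4 variances (one per input sensor) + 2 consequent
    weights, so D = n_rules * 10. Means are drawn from [0.2, 0.8] m and
    consequent weights from [0, 10] m/s, per the text; the variance range
    is an assumption for illustration.
    """
    D = n_rules * 10
    X = np.empty((n_pop, D))
    for j in range(n_rules):
        base = j * 10
        X[:, base:base + 4] = rng.uniform(0.2, 0.8, (n_pop, 4))       # means m_ij
        X[:, base + 4:base + 8] = rng.uniform(0.1, 0.5, (n_pop, 4))   # variances sigma_ij
        X[:, base + 8:base + 10] = rng.uniform(0.0, 10.0, (n_pop, 2)) # weights u_j, v_j
    return X

X = init_superorganism(n_pop=20, n_rules=5)  # e.g., PS = 20 with 5 fuzzy rules
```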
Step 2. Evaluate the Individuals of the Superorganism. Each individual of the superorganism is a controller of the mobile robot. Reinforcement learning is used to train the mobile robot on the wall-following task: it offers rewards or penalties while the controller is being trained, and after several training episodes the controller gradually learns the correct actions that earn rewards. The accumulated reward value is used to evaluate the controller during the training process. Section 3.2 introduces the reinforcement learning conditions designed for the mobile robot wall-following control task.
Step 3. Adjust the Control Parameters. In many studies, the setting of the control parameters affects the performance of the algorithm [18,19]. The IDS algorithm has two control parameters, p1 and p2, which are adjusted self-adaptively during the learning process. Figure 5 shows how this parameter adaptation differs from fixed-parameter encoding. The initial values of p1 and p2 are 0.5. At each generation g, the control parameters p1 and p2 of each individual of the superorganism are generated from normal distributions with means μp1 and μp2 and standard deviation 0.1, truncated to [0, 1]:

$$p1_i = N(\mu_{p1}, 0.1), \qquad p2_i = N(\mu_{p2}, 0.1)$$

The means μp1 and μp2 are initialized to 0.5 and updated at each generation as follows:

$$\mu_{p1} = (1 - c) \cdot \mu_{p1} + c \cdot mean(S_{p1}), \qquad \mu_{p2} = (1 - c) \cdot \mu_{p2} + c \cdot mean(S_{p2})$$

where c is a positive constant between 0 and 1, mean(·) is the arithmetic mean, and S_p1 and S_p2 are the sets of successful control parameters p1 and p2 at generation g.
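A sketch of this self-adaptation follows. The constant c = 0.1 is an assumption (the paper does not state its value here), and "successful" values are taken to be those whose stopover sites improved on their parents.

```python
import numpy as np

rng = np.random.default_rng(3)

class AdaptiveParams:
    """Self-adaptive p1/p2 in the style of Step 3 (c = 0.1 is an assumption)."""

    def __init__(self, c=0.1):
        self.mu_p1, self.mu_p2, self.c = 0.5, 0.5, c

    def sample(self, n):
        # Draw per-individual parameters from N(mu, 0.1), truncated to [0, 1]
        p1 = np.clip(rng.normal(self.mu_p1, 0.1, n), 0.0, 1.0)
        p2 = np.clip(rng.normal(self.mu_p2, 0.1, n), 0.0, 1.0)
        return p1, p2

    def update(self, s_p1, s_p2):
        # Move the means toward the arithmetic mean of successful values
        if len(s_p1) > 0:
            self.mu_p1 = (1 - self.c) * self.mu_p1 + self.c * np.mean(s_p1)
        if len(s_p2) > 0:
            self.mu_p2 = (1 - self.c) * self.mu_p2 + self.c * np.mean(s_p2)

params = AdaptiveParams()
p1, p2 = params.sample(20)            # one parameter pair per individual
improved = rng.random(20) < 0.3       # placeholder success mask
params.update(p1[improved], p2[improved])
```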
Step 4. Select the Donor and Generate the Scale. An individual of the superorganism is randomly selected as the donor:

$$donor = X_{rand\_select(i)}$$

The size of the stopover site depends on the scale factor, which is expressed as follows:

$$Scale = randg[2 \cdot rand_1] \cdot (rand_2 - rand_3)$$

where $randg$ denotes a gamma-distributed random number, and $rand_1$, $rand_2$, and $rand_3$ are uniform random numbers between 0 and 1.
Step 5. Generate the Position of the Stopover Site. In the IDS algorithm, the position of the stopover site is expressed as follows:

$$StopoverSite = X + Scale \cdot (donor - X)$$

The stopover site must be limited in scope. If a component of the stopover site falls below 0.2 or above 0.8, it is regenerated as

$$StopoverSite_{ij} = rand \cdot (0.8 - 0.2) + 0.2$$

where $rand$ is a uniform random number between 0 and 1.
Step 6. Adjust the Number of Superorganism Members in the Search. In this step, the superorganism members participating in the search process are determined through a random scheme, whose pseudocode is shown in Algorithm 2. The random procedure produces an N × D random number matrix, called the R matrix, whose size corresponds to the population. If a value within the R matrix is greater than 0, the corresponding element of the superorganism individual is involved in the stopover site.
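A sketch of this participation scheme, combined with the Step 5 boundary repair, is given below. Algorithm 2 is not reproduced in the text, so using p1 as the participation probability is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def stopover_with_random_scheme(X, donor, scale, p1):
    """Generate stopover sites only for the entries selected by the R matrix.

    The R matrix marks which elements of each individual move; drawing it
    with participation probability p1 is an assumed stand-in for Algorithm 2.
    """
    N, D = X.shape
    R = (rng.random((N, D)) < p1).astype(int)          # 1 = participate, 0 = stay
    stopover = np.where(R > 0, X + scale * (donor - X), X)
    # Boundary repair from Step 5: regenerate out-of-range components
    out = (stopover < 0.2) | (stopover > 0.8)
    stopover[out] = rng.random(out.sum()) * (0.8 - 0.2) + 0.2
    return stopover

X = rng.uniform(0.2, 0.8, (20, 50))
donor = X[rng.permutation(20)]
site = stopover_with_random_scheme(X, donor, scale=0.7, p1=0.5)
```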
Step 7. Evaluate the Stopover Site. The stopover site is evaluated with the same reinforcement learning reward conditions used to evaluate the superorganism.
Step 8. Update the Superorganism. If the solution of the stopover site is superior to that of the superorganism, the superorganism is replaced by the stopover site:

$$X_i = \begin{cases} StopoverSite_i, & \text{if } StopoverSite_i \text{ is better than } X_i \\ X_i, & \text{otherwise} \end{cases}$$


Reward of Reinforcement Learning
Reinforcement learning is a machine learning method. Contrary to supervised learning, reinforcement learning needs no training data; its concept comes from reward and punishment. The learner is trained with rewards: when the learner behaves correctly, it is rewarded, and after some time it learns to behave correctly in different situations. Since correct behavior earns a reward, it is essential to discuss the reward conditions for training the mobile robot in wall-following behavior.
In reinforcement learning, when training the robot to move along the wall, the reward value evaluates the FLC performance. This study designed three conditions to train the robot to follow the right wall. When the mobile robot satisfies all three conditions at the same time, its controller gets a reward value of +1. When the robot violates any of the conditions during the move, the controller stops accumulating the reward value, and the next controller is tested. When the accumulated reward value of a controller reaches 6000, the mobile robot stops learning and proceeds to the next run of training.
To enable the robot to learn to follow the wall, the desired distance between the robot and the wall must be defined in advance. The first condition is to keep this distance: the robot must remain between 0.3 and 0.8 m from the wall, using sensor S4 to measure the robot-wall distance.
The first condition is therefore defined as 0.3 m ≤ S4 ≤ 0.8 m, where S4 denotes the ultrasonic sensor value. We also use the front and right-front sensors (S1 and S3) to determine whether any obstacles are present in front. The second condition checks whether a wall is sensed at the right front of the mobile robot by comparing the readings of S1 and S3, where cos(40°) accounts for the 40° angle between sensors S1 and S3. Finally, the third condition concerns the speed of the mobile robot. To prevent the robot from stalling during training, the third condition requires the wheel speeds of the robot to exceed 1 m/s, so that the mobile robot must move forward during training. The third condition is defined in terms of RM and LM, where RM is the right-wheel speed and LM is the left-wheel speed of the mobile robot.
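A minimal sketch of these reward conditions and of the reward accumulation follows. The exact inequality for the second condition is not fully recoverable from the text, so the comparison below is an assumption for illustration; the environment interface (env.sense, env.step) and the sensor ordering are likewise hypothetical.

```python
import math

D_MIN, D_MAX = 0.3, 0.8   # robot-wall distance band (condition 1), in meters
V_MIN = 1.0               # minimum wheel speed (condition 3), in m/s

def reward_step(s1, s3, s4, lm, rm):
    """Return True if all three wall-following reward conditions hold."""
    cond1 = D_MIN <= s4 <= D_MAX                  # keep distance to the right wall
    cond2 = s1 > s3 * math.cos(math.radians(40))  # assumed form of condition 2
    cond3 = lm > V_MIN and rm > V_MIN             # keep moving forward
    return cond1 and cond2 and cond3

def evaluate_controller(controller, env, success_reward=6000):
    """Accumulate +1 per step until failure or the success threshold is reached."""
    total = 0
    while total < success_reward:
        s = env.sense()                 # hypothetical: returns (S1, S2, S3, S4)
        lm, rm = controller(s)          # FLC outputs the wheel speeds
        env.step(lm, rm)
        if not reward_step(s[0], s[2], s[3], lm, rm):
            break                       # violating any condition ends the run
        total += 1
    return total
```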

Stability Analysis of the FLC_R-IDS
The global stability of the control system is a basic requirement for solving mobile robot wall-following control problems. Since a general evolutionary algorithm performs a random search, some search points may make the learning process unstable. In this subsection, the supervisory control u_r(t) is designed (Figure 1) to guarantee the global stability of the closed-loop system in the sense that the error state variables are uniformly bounded: ‖e(t)‖ ≤ M < ∞ for all t ≥ 0, where M is a design parameter specified by the designer. Therefore, using the supervisory control u_r(t) in Figure 1 always yields V(t) → 0 (i.e., s(t) → 0), which in turn implies ‖e(t)‖ ≤ M.

Experimental Results
To demonstrate the proposed FLC_R-IDS for solving the mobile robot wall-following control problem, this section describes the results of wall-following control simulations performed using the PIONEER 3-DX and compares the experimental results with those of other algorithms. As defined in Section 2, the FLC has four inputs (the right four ultrasonic sensors of the mobile robot) and two outputs (the left-wheel and right-wheel speeds of the mobile robot). The IDS is designed to optimize the FLC parameters; therefore, the FLC parameters are encoded into individuals of the IDS algorithm. Furthermore, each individual of the IDS is evaluated by the reinforcement learning reward described in Section 3. In this study, three reward conditions are defined: to maintain a user-defined robot-wall distance, to avoid robot-wall collisions, and to ensure that the robot can successfully move along the wall all the way around the environment.
During the learning process, the mobile robot learns to move along the wall in a simple environment comprising straight-line, right-angle, and obtuse-angle walls. Training in a simple environment demonstrates that the reward conditions effectively teach the wall-following behavior, and the learned behavior can also be completed along the wall in more complex test environments. The Webots robotic simulation software is used to train the mobile robot. For comparison, the DS algorithm [13] is used to optimize the FLC (FLC_R-DS), and the chaotic DS algorithm [14] is used to optimize the FLC (FLC_R-CDS). Figure 6 shows the training environment. The proposed method has five parameters: the population size (PS), the number of rules, p1, p2, and the learning success reward value. In general, the larger the PS, the more robust the search, at an increased computational cost. The number of rules depends on the complexity of the problem. The two control parameters p1 and p2 of the IDS algorithm are adjusted self-adaptively during the learning process. A single performance measurement in terms of failure and success can be used to determine, by trial-and-error tests, the control policy that produces a maximal learning success reward value. However, the selection of these parameters critically affects the simulation results. The population size (range [20, 50]), the number of rules (range [5, 10]), p1 and p2 (range [0, 1]), and the learning success reward value (range [5000, 8000]) were carefully examined in extensive experiments. Table 1 presents the initial parameters set before the learning process. The mobile robot learns along the wall in the training environment. If the robot satisfies the reward conditions, the controller gets the reward; if it cannot satisfy them, this is considered a failure. When a failure occurs, the accumulated reward value is used to evaluate the FLC.
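For reference, the examined parameter ranges can be summarized as a small configuration sketch; the concrete values chosen in Table 1 are not reproduced in this text, so the defaults below are placeholders.

```python
# Hyperparameter search ranges from the text; 'default' values are placeholders,
# not the Table 1 settings.
PARAM_RANGES = {
    "population_size": {"range": (20, 50),     "default": 20},
    "num_rules":       {"range": (5, 10),      "default": 5},
    "p1":              {"range": (0.0, 1.0),   "default": 0.5},  # self-adaptive, initialized at 0.5
    "p2":              {"range": (0.0, 1.0),   "default": 0.5},  # self-adaptive, initialized at 0.5
    "success_reward":  {"range": (5000, 8000), "default": 6000},
}
```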
Figure 6. The training environment.


Training Results of Mobile Robot Wall-Following Control
The mobile robot controller continues training until the success condition is reached, namely an accumulated reward value of 6000. When the mobile robot reaches the success condition, it can perform the wall-following task. In the learning process, the FLC_R-IDS, FLC_R-DS, and FLC_R-CDS methods each have 30 independent training runs. Table 2 compares the evaluation numbers of the three methods in the learning process.

Table 2. Comparison of the evaluation numbers of various existing models in the learning process.

Figure 7 shows the learning curves of the 30 independent training runs of the FLC_R-IDS, FLC_R-DS, and FLC_R-CDS methods. In Figure 7a, the average number of evaluations for learning success with the FLC_R-IDS method is 484; the minimum number of evaluations to complete the wall-following training is 26 and the maximum is 1851. In Figure 7b, the average number of evaluations for learning success with the FLC_R-DS method is 936; the minimum is 121 and the maximum is 2554. Figure 7c shows that the average number of evaluations for learning success with the FLC_R-CDS method is 1047; the minimum is 70 and the maximum is 2716. Hence, the FLC_R-IDS method learns faster than the other algorithms in the learning process.

In addition, we evaluate the IDS algorithm without the self-adaptive method, i.e., with the self-adaptive mechanism removed. To compare the IDS algorithm with and without the self-adaptive method, the IDS algorithm without the self-adaptive method was trained 30 times independently. Table 3 compares the results with those of the FLC_R-DS and FLC_R-IDS methods. The average number of evaluations without the self-adaptive method was 626.5. Without the self-adaptive method, the control parameters (p1 and p2) cannot be adjusted during the learning process; nevertheless, by adjusting the number of superorganism members in the search, this variant still learns faster than the FLC_R-DS method.

Table 3. Comparison between the evaluation numbers of various existing models in the learning process.

Testing Results of Mobile Robot Wall-Following Control
To evaluate the proposed FLC_R-IDS method, we describe the results of the wall-following control simulations performed using the Webots robotic simulation software and compare its performance with those of other algorithms. In the learning process, the FLC undergoes reinforcement learning to obtain the best controller. The trained FLC is then tested in the three experimental environments used for the simulations. First, the controller performs the test in the original training environment, whose terrain is relatively simple: mostly straight lines, right angles, and obtuse angles. The second experimental environment combines right angles and straight lines. The third experimental environment is more complex than the previous two: its terrain contains straight lines, right angles, and acute angles, where the acute angle never appeared in the training environment, and arc-shaped terrain is also used. The best controllers in each experimental environment are discussed for testing and analysis, and they are used to compare the performance of the FLC_R-IDS method with those of other methods. This example aims to design and analyze the FLC for a wall-following task. With 30 independent training runs, we obtain 30 mobile robot wall-following controllers, which are tested one by one in the experimental environments.
(1) Comparison of results of various methods in testing environment 1

This experiment demonstrates the best-performing FLC_R-IDS controller in the training environment. Figure 8a shows that the trained controller can complete the wall-following task. Figure 8b shows the distance values according to the ultrasonic sensors S1, S3, and S4 and the left-wheel and right-wheel speeds of the robot. When the robot moved along the wall to point A, it encountered a right angle; to avoid colliding with the wall, it quickly turned left. At this time, the ultrasonic sensor values of S1, S3, and S4 were 0.78, 0.53, and 0.32 m, respectively, and the left-wheel and right-wheel speeds were 1.76 and 2.94 m/s. When the robot moved to point B along the wall, it slowly turned left along a straight segment. At this time, the ultrasonic sensor values of S1, S3, and S4 were 0.76, 0.61, and 0.42 m, respectively, and the left-wheel and right-wheel speeds were 1.83 and 3.25 m/s. When the terrain at point C was a straight line, the robot continued straight ahead. At this time, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.42, and 0.3 m, respectively, and the left-wheel and right-wheel speeds were 1.81 and 1.81 m/s. At point D, when the robot encountered the outer corner, it had to turn right; otherwise, it would move away from the wall. In this case, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 1.0, and 1.0 m, respectively, and the left-wheel and right-wheel speeds were 2.76 and 2.3 m/s. At point E, after the robot turned the outer corner, it had to continue along the wall. In this case, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.75, and 0.46 m, respectively, and the left-wheel and right-wheel speeds were 2.64 and 2.1 m/s. Figure 9 compares the best path-tracking performances of the proposed FLC_R-IDS, FLC_R-DS, and FLC_R-CDS methods in test environment 1.
The mobile robot learns to move along the wall, and excellent wall-following performance is characterized by the robot's ability to maintain a constant distance from the wall. Thus, we analyze the distance between the robot and the wall using the mean absolute error (MAE), which evaluates the performance of the FLC in the wall-following task. When the mobile robot's S4 sensor value is 0.3 m, the error relative to d_wall is zero. The smaller the MAE value, the better the performance of the controller:

$$MAE = \frac{1}{Step_{total}} \sum_{i=1}^{Step_{total}} \left| S_{4(i)} - d_{wall} \right| \qquad (23)$$

where S_4(i) is the value of the S4 sensor at each step of the robot, d_wall = 0.3 m is the desired distance between the robot and the wall, and Step_total is the total number of steps the robot takes while walking along the wall.
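A minimal sketch of Equation (23); the sensor trace used here is a placeholder.

```python
import numpy as np

def wall_following_mae(s4_trace, d_wall=0.3):
    """Mean absolute error between the S4 readings and the target distance
    (Equation (23)); s4_trace holds one S4 reading per robot step."""
    s4 = np.asarray(s4_trace)
    return np.abs(s4 - d_wall).mean()

# Placeholder trace: a robot hovering around 0.3 m from the wall
print(wall_following_mae([0.32, 0.29, 0.31, 0.30, 0.35]))  # 0.018
```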
Table 4 presents the MAE values of the 30 controllers obtained by the FLC_R-IDS, FLC_R-DS, and FLC_R-CDS methods in test environment 1. This study also compares the performance of the FLC_R-IDS method with those of the other methods. After 30 independent training runs, each algorithm obtained 30 controllers. The best FLC controller found by the IDS outperforms those of the DS and CDS, and on the average over 30 controllers, the FLC_R-IDS method also performs better than the FLC_R-DS and FLC_R-CDS methods.
The wall-following controller is trained along the right wall using the fuzzy controller with evolutionary reinforcement learning. Because the mobile robot's ultrasonic sensors are symmetrical, the inputs of the FLC can be replaced with the robot's left ultrasonic sensors (sensor S1 is replaced by S5, S2 by S6, S3 by S7, and S4 by S8). Exchanging the robot's left and right wheels then shows that the mobile robot can also complete a task that follows the left wall. Figure 10 shows the path of the FLC_R-IDS method following the left wall in test environment 1 when the FLC inputs are replaced with the left-side ultrasonic sensors.

(2) Comparison of results of various methods in testing environment 2

The results of testing the mobile robot controller in test environment 2 are discussed next. This experimental environment combines right angles and straight lines. Figure 11a shows that the trained controller can complete the task in this combined right-angle and straight-line environment. Figure 11b shows the distance values according to the ultrasonic sensors S1, S3, and S4 and the left-wheel and right-wheel speeds at the robot's moving distances.
When the robot moved along the wall to point A and encountered the outer corner, it had to turn right; otherwise, it would move away from the wall. At this time, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 1.0, and 0.35 m, respectively, and the left-wheel and right-wheel speeds were 2.78 and 2.27 m/s. When the robot moved along the wall to point B, the area in front of the robot was a small corner; the robot had to go left and then right. At this time, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.36, and 0.38 m, respectively, and the left-wheel and right-wheel speeds were 1.61 and 2.32 m/s. When the robot moved along the wall to point C, it encountered a right angle; to avoid colliding with the wall, it quickly turned left. At this time, the ultrasonic sensor values of S1, S3, and S4 were 0.68, 0.71, and 0.37 m, respectively, and the left-wheel and right-wheel speeds were 2.11 and 3.39 m/s. When the robot travelled in a straight line at point D, it continued straight ahead. In this case, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.41, and 0.31 m, respectively, and the left-wheel and right-wheel speeds were 1.81 and 1.82 m/s. Figure 12 compares the best path-tracking performances of the proposed FLC_R-IDS, FLC_R-DS, and FLC_R-CDS methods in test environment 2. Similar to the previous experiment, Equation (23) is used to assess the performance of each FLC. Table 5 presents the MAE values of the 30 controllers of the FLC_R-IDS method and the other methods in test environment 2 and compares their performance. As shown in Table 5, several controllers failed in test environment 2 because the environment is more complex than the training environment; a controller that collides or stops in the test environment counts as a failure. After 30 independent training runs, each algorithm obtained 30 controllers. The best controller of the FLC_R-IDS method outperforms those of the FLC_R-DS and FLC_R-CDS methods, and on the average over 30 controllers, the FLC_R-IDS method also performs better. In the FLC_R-DS and FLC_R-CDS methods, collisions occurred while running test environment 2.
(3) Comparison of results of various methods in testing environment 3

This test environment is more complex than the previous ones. The terrain contains straight lines, right angles, and acute angles, as well as circular terrain. Figure 13a shows that the FLC can complete the task in this combination of right-angle, acute-angle, straight-line, and circular terrain. Figure 13b shows the distance values according to the ultrasonic sensors S1, S3, and S4 and the left-wheel and right-wheel speeds at the robot's moving distances.
When the robot moved along the wall to point A, where the terrain was a straight line, it had to turn right past the straight terrain to continue along the wall. At this moment, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 1.0, and 0.36 m, respectively, and the left-wheel and right-wheel speeds were 2.8 and 2.25 m/s. When the robot moved along the wall to point B, it was following the circular terrain, so it had to keep moving toward the right front. At this time, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.68, and 0.44 m, respectively, and the left-wheel and right-wheel speeds were 2.45 and 2.25 m/s. When the robot moved along the wall to point C, it encountered the acute angle; to avoid colliding with the wall, it quickly turned left. At this acute angle, if the mobile robot turns left too quickly it moves away from the wall, and if it turns left too late it collides with the wall, so the robot must turn left within an appropriate range to continue along the wall. In this case, the ultrasonic sensor values of S1, S3, and S4 were 0.69, 0.58, and 0.33 m, respectively, and the left-wheel and right-wheel speeds were 1.7 and 3.5 m/s. When the robot travelled in a straight line at point D, it continued straight ahead. In this case, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.42, and 0.3 m, respectively, and the left-wheel and right-wheel speeds were 1.81 and 1.82 m/s. When the robot moved along the wall to point E, it had just passed an obtuse angle and was not parallel to the wall, so it had to turn left to adjust the direction of its body. At this time, the ultrasonic sensor values of S1, S3, and S4 were 1.0, 0.48, and 0.36 m, respectively, and the left-wheel and right-wheel speeds were 1.86 and 2.1 m/s. Figure 14 compares the best path-tracking performances of the proposed FLC_R-IDS method and the other methods in test environment 3. Table 6 presents the MAE values of the 30 controllers of the FLC_R-IDS method and the other methods in test environment 3 and compares the performance of the FLC_R-IDS method with those of the FLC_R-DS and FLC_R-CDS methods.

Real Mobile Robot Wall-Following Control
This experiment shows the execution of an actual mobile robot wall-following control task using the FLC_R_IDS method and a PIONEER 3-DX robot. To illustrate the feasibility of the FLC_R_IDS method, a real environment was created to test the mobile robot's performance in an actual wall-following task. In the simulations, the inputs of the FLC were the ultrasonic sensor values and the outputs were the robot's left-wheel and right-wheel speeds; the maximum value of each ultrasonic sensor was 0.8 m, and each wheel reached a maximum translation speed of 10 m/s. On the real robot, the PIONEER 3-DX ultrasonic sensors have a range of 5 m, and the robot reaches a maximum translation speed of 1.4 m/s. In the experiments, the inputs and outputs of the FLC were therefore converted linearly between the simulated and real ranges. Figure 15 shows images of the wall-following control results of the proposed approach; the PIONEER 3-DX robot could move along the wall and maintain a user-defined distance from the wall.
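A sketch of the linear conversion between the simulated and real ranges described above; the exact mapping used in the paper is not given, so this affine rescaling is an assumption.

```python
def linear_map(value, src_lo, src_hi, dst_lo, dst_hi):
    """Affine rescaling of a value from [src_lo, src_hi] to [dst_lo, dst_hi]."""
    t = (value - src_lo) / (src_hi - src_lo)
    return dst_lo + t * (dst_hi - dst_lo)

# Real sensor reading (up to 5 m) -> simulated FLC input range (up to 0.8 m)
s4_sim = linear_map(2.5, 0.0, 5.0, 0.0, 0.8)       # 0.4
# Simulated FLC wheel-speed output (up to 10 m/s) -> real command (up to 1.4 m/s)
wheel_real = linear_map(7.0, 0.0, 10.0, 0.0, 1.4)  # 0.98
```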


Conclusions
This study proposed the IDS with a reinforcement-learning-designed FLC (FLC_R-IDS) to achieve a mobile robot wall-following control task. In the proposed approach, the IDS algorithm uses self-adaptive parameters to adjust the control parameters and uses a random scheme to adjust the number of superorganism members participating in the stopover-site search. These two mechanisms are used to optimize the FLC parameters. When reinforcement learning is used to train mobile robots along the wall, the reward conditions affect the overall learning outcome: if the reward conditions are too difficult, it is challenging for the robot to learn; conversely, if they are too simple, the robot learns easily but the resulting performance may be worse. In this study, three conditions are proposed, and the robot controller receives the reward value only when all three conditions are satisfied at the same time. Finally, the accumulated reward value is used to evaluate the FLC performance. With this method, the mobile robot uses no training data during the learning process. The experimental results show that, compared with the various existing models, the proposed method reduced the MAE values in the three experimental environments by 12.44%, 22.54%, and 25.98%, respectively. This study showed that the FLC_R-IDS method performs better than the FLC_R-DS and FLC_R-CDS methods in terms of requiring fewer evaluations to achieve success. Moreover, the FLC_R-IDS method can be applied successfully to a wall-following control task. The advantages of the proposed FLC_R-IDS are summarized as follows: (1) it uses no training data during the learning process; (2) it uses a self-adaptive method to adjust the control parameters; and (3) it designs three conditions to achieve the mobile robot wall-following control task: to maintain a user-defined robot-wall distance, to avoid robot-wall collisions, and to ensure that the robot can successfully move along the wall all the way around the environment.
In the simulations, the three reward conditions helped the robot learn to move along the wall effectively. However, this method may fail in dead zones of more complex experimental environments; thus, setting appropriate reward conditions is important, and designing better reward conditions is left for future work. In recent years, several heuristic algorithms have been proposed, such as the flower pollination algorithm and the Jaya algorithm. These algorithms have a good ability to search the solution space and could help the robot learn to move along the wall more quickly and escape dead zones.

Conflicts of Interest:
The authors declare no conflict of interest.