Interval Type-2 Neural Fuzzy Controller-Based Navigation of Cooperative Load-Carrying Mobile Robots in Unknown Environments

In this paper, a navigation method is proposed for cooperative load-carrying mobile robots. The behavior mode manager is used efficaciously in the navigation control method to switch between two behavior modes, wall-following mode (WFM) and goal-oriented mode (GOM), according to various environmental conditions. Additionally, an interval type-2 neural fuzzy controller based on dynamic group artificial bee colony (DGABC) is proposed in this paper. Reinforcement learning was used to develop the WFM adaptively. First, a single robot is trained to learn the WFM. Then, this control method is implemented for cooperative load-carrying mobile robots. In WFM learning, the proposed DGABC performs better than the original artificial bee colony algorithm and other improved algorithms. Furthermore, the results of cooperative load-carrying navigation control tests demonstrate that the proposed cooperative load-carrying method and the navigation method can enable the robots to carry the task item to the goal and complete the navigation mission efficiently.


Introduction
Robotics has progressed rapidly in recent years. Many researchers [1][2][3][4] have applied robots to various fields. Paul et al. [1] proposed a biomimetic robotic fish for informal science learning. Christopher et al. [2] presented a new robotic harvester that can autonomously harvest sweet pepper in protected cropping environments. Michail et al. [3] designed an autonomous robotic vehicle for monitoring fields that are difficult to access or dangerous for humans. Maurizio et al. [4] developed effective emotion-based assistive behaviors for a socially assistive robot intended for natural human-robot interaction scenarios with explicit social and assistive task functionalities.
Several real-life tasks are easy for humans but difficult for robots. Robots do not possess brains to think as humans. Therefore, if people can make robots think like humans and judge things more flexibly, then robots can be used in various ways. Recent studies have evaluated robotic applications such as obstacle avoidance, path navigation, load carrying, and path planning.
In the real environment, noise and interference cause sensor uncertainty. Classical methods often fail on complicated engineering problems because of the various uncertainties typical of those problems, and several theories can serve as mathematical tools for dealing with such uncertainties. In probability theory, for instance, a stochastically stable phenomenon requires the existence of a limit of the sample mean over a long series of trials, and testing for that limit demands a large number of observations. In this paper, a navigation method is proposed in which a behavior mode manager switches between the wall-following mode (WFM) and the goal-oriented mode (GOM) according to various environmental conditions. Additionally, an interval type-2 neural fuzzy controller (IT2NFC) based on DGABC is proposed, and reinforcement learning is used to develop the WFM adaptively. The proposed DGABC-based IT2NFC has several advantages: (1) only a small number of parameters is required; (2) interval type-2 fuzzy sets reduce sensor noise and disturbance; and (3) the effectiveness and robustness of the proposed controller are improved. Based on these advantages, the behavior mode manager makes the moving path of the mobile robot smoother and automatically switches between the two modes to complete the task.

Structure of IT2NFC
In this section, the IT2NFC is introduced. Its five layers are the input layer, fuzzification layer, firing layer, output processing layer, and output layer. In Figure 1, X_1 to X_n represent the inputs of the IT2NFC, which are the distances between nearby objects and the robot's infrared sensors. Y_Left and Y_Right indicate the outputs of the IT2NFC, which are the speeds of the left and right wheels, respectively. To improve the performance of the fuzzy system, the order reduction method established by O. Castillo [19] was used in this paper; reducing the order of the system decreases the computational complexity. A rule in the IT2NFC has the following form, where x_i is the input variable, A_i^j is an IT2FS, y_Left and y_Right are the output variables, w_0^j + Σ_{i=1}^{n} w_i^j x_i is the output of the Takagi-Sugeno-Kang linear function, i indexes the inputs from 1 to n, and j is the rule number.

Input Layer
In this layer, the input data is sent to the next layer directly without any computation.

Fuzzification Layer
In this layer, also called the membership function layer, the input data are fuzzified. Each node is defined as an IT2FS. Figure 2 shows that each membership function is a Gaussian membership function with an uncertain mean [m_1, m_2] and a fixed standard deviation σ,
where m_i^j and σ_i^j are the mean and standard deviation of the jth Gaussian membership function for the ith input. Shifting the mean horizontally yields the upper bound ū_A and lower bound u_A of the membership function, which together represent the membership interval.
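As an illustration, the upper and lower membership bounds described above can be sketched as follows. The function name and parameter values are hypothetical; the bound construction follows the standard interval type-2 Gaussian with uncertain mean [m1, m2] and fixed σ.

```python
import math

def it2_gaussian(x, m1, m2, sigma):
    """Interval type-2 Gaussian membership with uncertain mean [m1, m2]
    and fixed standard deviation sigma. Returns (lower, upper) bounds."""
    def gauss(x, m):
        return math.exp(-((x - m) ** 2) / (2 * sigma ** 2))
    # Upper bound: 1 inside the uncertain-mean interval, else the
    # Gaussian centered at the nearer mean.
    if x < m1:
        upper = gauss(x, m1)
    elif x > m2:
        upper = gauss(x, m2)
    else:
        upper = 1.0
    # Lower bound: the Gaussian centered at the farther mean.
    lower = min(gauss(x, m1), gauss(x, m2))
    return lower, upper
```

For any input, the returned pair brackets the membership value that a type-1 Gaussian with a mean anywhere in [m1, m2] would produce.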

Firing Layer
In this layer, each node denotes one rule. The product of the memberships of the relevant fuzzy sets is taken to calculate the firing strength F^j of each rule,
where f̄^j and f_j denote the upper and lower bounds of the firing strength.

Output Processing Layer
In this layer, the center-of-sets order reduction operation produces an interval type-1 fuzzy set, and the center-of-gravity method then defuzzifies the set to obtain the output interval [y_l, y_r].
The output of the fuzzy set is defined as follows, where y_l and y_r indicate the lower and upper bound values of the output, w_i^j represents the linked weight of each node, M is the number of rules, and n is the number of inputs.

Output Layer
In this layer, the previous output values are averaged to obtain the crisp output y of the neural network:

y = (y_l + y_r) / 2 (11)
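A minimal sketch of layers 2 to 5 is given below. The function names, rule encoding, and parameter values are hypothetical, and the output processing is a simplified firing-strength-weighted average used in place of the full center-of-sets type reduction, so this is an approximation of the structure rather than the paper's exact controller.

```python
import math

def gauss(x, m, sigma):
    return math.exp(-((x - m) ** 2) / (2 * sigma ** 2))

def membership_bounds(x, m1, m2, sigma):
    # Upper/lower membership of an IT2 Gaussian with uncertain mean [m1, m2].
    if m1 <= x <= m2:
        upper = 1.0
    else:
        upper = gauss(x, min(max(x, m1), m2), sigma)  # nearer mean
    lower = min(gauss(x, m1, sigma), gauss(x, m2, sigma))
    return lower, upper

def it2nfc_forward(inputs, rules):
    """Fuzzify the inputs, fire each rule with the product t-norm, and
    average the lower/upper weighted TSK outputs (simplified reduction)."""
    num_l = num_u = den_l = den_u = 0.0
    for rule in rules:
        f_lo = f_up = 1.0
        for x, (m1, m2, sigma) in zip(inputs, rule["sets"]):
            lo, up = membership_bounds(x, m1, m2, sigma)
            f_lo *= lo
            f_up *= up
        # TSK consequent: w0 + sum(wi * xi)
        w = rule["weights"]
        y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs))
        num_l += f_lo * y; den_l += f_lo
        num_u += f_up * y; den_u += f_up
    y_l = num_l / den_l
    y_r = num_u / den_u
    return (y_l + y_r) / 2  # output layer: average the interval endpoints
```

Because the output is a convex combination of the rule consequents, it always lies between the smallest and largest consequent values.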

Structure of ABC
The ABC, proposed by Karaboga in 2005, is a global optimization algorithm based on swarm intelligence. Its advantages include few control parameters, parallel computing, global searching, simple computation, easy implementation, and robustness. The ABC consists of employed bees, onlooker bees, and scout bees. In the traditional algorithm, the employed bees search for a new food source based on previous search experience and share the food-source information with the onlooker bees. The onlooker bees wait for the messages from the employed bees and then search for new food sources according to the employed bees' information. The scout bees monitor all the food sources: if the gain of a food source does not improve, a scout bee randomly searches for a new food source in the search area. The traditional ABC comprises the following steps [23].
Step 1: Initialization
Initialize the total number of food sources (FN), the maximum number of epochs (MEN), and the abandonment limit for each non-improved food source (limit). Place each employed bee randomly in the solution space and set its location as a food source; the fitness of a food source is called the food source gain.

x_i^j = x_min^j + rand[0, 1](x_max^j − x_min^j) (12)

where x_i^j is the initial value of the ith employed bee in the jth dimension, x_min^j and x_max^j are the minimum and maximum values of the jth dimension in the search space, and rand[0, 1] is a random number from 0 to 1.
Step 2: Employed bees
Each employed bee moves to a new food source, and the gain of the new food source is calculated.

v_i^j = x_i^j + rand[−1, 1](x_i^j − x_k^j) (13)

where v_i^j is the new location of the ith employed bee in the jth dimension, x_i^j is its previous location, x_k^j is the location of a randomly chosen employed bee k in the jth dimension, and rand[−1, 1] is a random number from −1 to 1.
Step 3: Onlooker bees
Use roulette wheel selection (Equation (14)) to determine which food source each onlooker bee should search. The onlooker bee searches near the selected food source and then calculates its gain.

P_i = fit_i / Σ_{m=1}^{FN} fit_m (14)

where P_i is the probability of the ith food source being selected, fit_i is the gain of the ith food source, and FN is the number of food sources.
Step 4: Scout bees
If the gain of a food source does not improve after limit trials, abandon the food source. According to Equation (12), a scout bee finds a new food source to replace the abandoned one.
Step 5: Record the best solution
Record the highest food source gain and set it as the best solution of the algorithm.
Step 6: Termination check
Check whether the current epoch equals MEN. If YES, then stop the algorithm and output the best solution. If NO, then return to Step 2.
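The six steps above can be sketched as follows on a toy minimization problem. Here a lower cost plays the role of a higher food source gain, and all names and parameter values (fn, limit, max_epochs) are illustrative assumptions, not the paper's settings.

```python
import random

def abc_minimize(cost, dim, bounds, fn=10, limit=20, max_epochs=100, seed=1):
    """Sketch of the traditional ABC (Steps 1-6) minimizing `cost`.
    fn: number of food sources; limit: abandonment threshold."""
    random.seed(seed)
    lo, hi = bounds

    def rand_source():                       # Eq. (12): random placement
        return [lo + random.random() * (hi - lo) for _ in range(dim)]

    def try_move(i):                         # Eq. (13): one-dimension move
        k = random.choice([m for m in range(fn) if m != i])
        j = random.randrange(dim)
        v = sources[i][:]
        v[j] += random.uniform(-1, 1) * (sources[i][j] - sources[k][j])
        cv = cost(v)
        if cv < costs[i]:                    # greedy selection
            sources[i], costs[i], trials[i] = v, cv, 0
        else:
            trials[i] += 1

    sources = [rand_source() for _ in range(fn)]
    costs = [cost(s) for s in sources]
    trials = [0] * fn
    best = min(zip(costs, sources))
    for _ in range(max_epochs):
        for i in range(fn):                  # Step 2: employed bees
            try_move(i)
        fits = [1.0 / (1.0 + c) for c in costs]
        for _ in range(fn):                  # Step 3: roulette wheel, Eq. (14)
            i = random.choices(range(fn), weights=fits)[0]
            try_move(i)
        for i in range(fn):                  # Step 4: scout bees
            if trials[i] > limit:
                sources[i] = rand_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
        best = min(best, min(zip(costs, sources)))  # Step 5: record best
    return best                              # Step 6 ends after max_epochs
```

On a smooth 2-D test function, the best cost typically drops well below the initial random sample within the allotted epochs.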

Structure of DGABC
The proposed DGABC is used to adjust the parameters of the IT2NFC. DGABC balances the search abilities of the traditional ABC and avoids premature convergence. The proposed DGABC uses a dynamic grouping strategy and reforms the movement equations of the employed and onlooker bees to improve the convergence speed and the algorithm's performance. The evolution flowchart is displayed in Figure 3. In DGABC, each bee represents an IT2NFC and a set of parameters. In the traditional ABC, the numbers of employed and onlooker bees are equal. However, improper setting of these numbers may unbalance the global and local search abilities and lower the performance.
The group strategy is used to bring parity to the searching competence. Bees are separated into groups dynamically according to their performance. The best bee is selected as the leader. To maintain consistency with the traditional ABC, in which employed bees lead onlooker bees, the leader in each group is selected as an employed bee and the rest of the bees in the group are selected as onlooker bees. During training, the proportion of two types of bees can be adjusted automatically.
Using similarity evaluation to categorize individuals prevents the three cases depicted in Figure 5. Some researchers [20] have used thresholds to group individuals; however, a fixed threshold results in an uneven number of groups in early and late epochs. Therefore, dynamic adjustment of the threshold is proposed in this paper.
The DGABC steps are as follows (the dynamic grouping method is depicted in Figure 6):
1. Dynamic Grouping
Step 1: Sorting
Sort the bees from best to worst and initialize every bee's group label to 0.
Step 2: Calculate the similarity thresholds
Find the fittest bee among the ungrouped bees, select it as the new group leader, name the group g, and calculate the distance and fitness thresholds.
where ADIS_g and AFIT_g are the distance and fitness thresholds, respectively, of group g, Leader_g^j is the leader location in group g, Fit(Leader_g) is the fitness value of the leader of group g, SN is the total number of bees, j indexes the dimensions, and NC is the total number of bees in group 0.
Step 3: Evaluate the similarity
Calculate the distance Dis_i and the fitness difference Fit_i between each ungrouped bee Bee_i and the leader.
If Dis_i < ADIS_g and Fit_i < AFIT_g, then bee Bee_i is similar to the group leader; place it in the same group and set its group number to g.
Step 4: Checking
Check whether any ungrouped bees remain. If YES, then return to Step 2, select the fittest ungrouped bee as the new group leader, and repeat Steps 2 and 3. If NO, then all bees are grouped and grouping is complete.
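The grouping steps above can be sketched as follows. Since the bodies of the threshold equations are not reproduced here, the thresholds below are approximated by averages over the remaining ungrouped bees, which is an illustrative assumption; the function name and distance metric are also hypothetical.

```python
def dynamic_grouping(bees, fitness):
    """Sketch of dynamic grouping. `bees` is a list of parameter vectors;
    lower `fitness` is better in this sketch."""
    order = sorted(range(len(bees)), key=lambda i: fitness[i])
    group_of = {i: 0 for i in order}      # Step 1: every bee starts in group 0
    ungrouped = list(order)
    g = 0
    while ungrouped:                      # Step 4: loop until all are grouped
        g += 1
        leader = ungrouped.pop(0)         # fittest ungrouped bee leads group g
        group_of[leader] = g
        if not ungrouped:
            break

        def dist(i):
            return sum(abs(a - b) for a, b in zip(bees[i], bees[leader]))

        # Step 2: dynamic thresholds (assumed: averages over ungrouped bees).
        adis = sum(dist(i) for i in ungrouped) / len(ungrouped)
        afit = sum(abs(fitness[i] - fitness[leader]) for i in ungrouped) / len(ungrouped)
        # Step 3: assign similar bees to the leader's group.
        for i in ungrouped[:]:
            if dist(i) < adis and abs(fitness[i] - fitness[leader]) < afit:
                group_of[i] = g
                ungrouped.remove(i)
    return group_of
```

Because the thresholds are recomputed from the bees still ungrouped, the grouping adapts as the population converges, which is the point of the dynamic threshold.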

Employed Bees Phase
In the traditional ABC, employed bees select their food-source search direction randomly. In the proposed method, the search direction is improved: the global best solution is used in the movement equation so that the employed bees move toward better solutions, while the random search mechanism is maintained.
v_i^j = x_i^j + ∅1_i^j (x_i^j − x_k^j) + ∅2_i^j (x_best^j − x_i^j)

where x_best^j is the best solution in the group, and ∅1_i^j and ∅2_i^j are random numbers in [−1, 1].

Onlooker Bees Phase
The grouping strategy alters the traditional onlooker bees' searching method: the assigned leader (an employed bee) must guide its teammates (the onlooker bees) in searching for a food source.
v_i^j = x_i^j + ∅_i^j (x_Leader^j − x_i^j)

where ∅_i^j is a random number in [−1, 1] and x_Leader^j is the location of the group leader.
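The two reformed movement rules can be sketched as follows. For readability this sketch updates every dimension at once rather than a single randomly chosen dimension, and the function names are hypothetical.

```python
import random

def employed_move(x_i, x_k, x_best):
    """Employed-bee update: a random difference term toward a partner x_k
    plus a pull toward the best solution x_best (sketch)."""
    return [xi + random.uniform(-1, 1) * (xi - xk)
               + random.uniform(-1, 1) * (xb - xi)
            for xi, xk, xb in zip(x_i, x_k, x_best)]

def onlooker_move(x_i, x_leader):
    """Onlooker-bee update: search around the group leader's location."""
    return [xi + random.uniform(-1, 1) * (xl - xi)
            for xi, xl in zip(x_i, x_leader)]
```

The onlooker update always lands between reflections about the current position and the leader, so onlookers concentrate the search near their leader while employed bees retain a wider, best-biased exploration.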

Control of the Mobile Robot
In this section, the mobile robot is introduced first, and the wall-following control method is then explained.

Description of the Mobile Robot
E-puck mobile robots, small-sized mobile robots developed by EPFL, were used in this research. As displayed in Figure 7a, the robot is controlled by a dsPIC processor. Several standard components, such as proximity infrared sensors, a microphone, an accelerometer, and a camera, are included in the robot. The mobile robots connect to each other through Bluetooth and also use Bluetooth to communicate with computers. E-puck mobile robots are now widely used; applications include mobile mechanical engineering, real-time programming, interpolation systems, signal transmission, image transmission, combination of sounds and images, human-computer interaction, and robot internal communication. Figure 7b shows that the e-puck is a two-wheeled mobile robot with eight surrounding infrared sensors, S_0 to S_7, covering 360°. The diameter of the e-puck is 7 cm, its height is 5 cm, and its maximum speed is 15 cm/s. The maximum sensing distance between the mobile robot and nearby objects is 6 cm, and the minimum distance is 1 cm.

Wall-Following Control of the Mobile Robot
In this study, RL was used to implement the wall-following behavior. By defining suitable fitness functions, the robot can evaluate its performance in the training environment even when predefined rules and pre-specified training data are not provided. The wall-following control learning flowchart is displayed in Figure 8. The IT2NFC has four input signals (S_0, S_1, S_2, and S_3) and two output signals (V_L and V_R). The input signal S_i denotes the distance from the ith infrared sensor, and the output signals represent the turning speeds of the left and right wheels. Figure 9 exhibits the training environment. The goal is to enable the mobile robot to adapt to any type of unknown environment, including straight lines, sharp corners, obtuse corners, smooth corners, and U-shaped corners. To prevent the robot from moving far away from the walls or colliding with obstacles, three termination actions were designed in this study.

Action 1:
The total moving distance exceeds the preset maximum distance (the length of one circuit of the training environment), which means that the robot has successfully detoured the training environment once.

Action 2:
As Figure 10a indicates, the distance detected by one of the sensors is less than 1 cm, which means that the robot has hit the wall.

Action 3:
As Figure 10b illustrates, the distance detected by one of the side sensors is more than 6 cm, which means that the robot is deviating from the wall.
To evaluate performance during learning, a fitness function is proposed in this paper. The fitness function comprises three sub-fitness functions (SF_1, SF_2, and SF_3): the total moving distance of the robot, the average distance between the robot and the wall, and the degree of parallelism between the robot and the wall.

SF 1
If the robot's moving distance R_dis is close to the preset distance R_total, then the robot has nearly detoured the training environment.
If the robot's moving distance exceeds the preset distance R_total, then the robot has traversed the training environment successfully and SF_1 = 0.

SF 2
To ensure the robot keeps a constant distance from the wall, the distance WD(t) between the robot's side sensor S_2 and the wall is calculated at every time step, as indicated in Figure 11a,
where d_wall is the expected distance between the robot and the wall and T_stop is the number of time steps. In this study, d_wall was set to 4 cm. If the robot keeps a constant distance from the wall, then SF_2 = 0.

SF 3
To ensure the robot remains parallel with the wall, the law of cosines is applied to the distances from the robot's front-right sensor and side sensor to calculate the length x(t), as shown in Figure 11b.

x(t) = sqrt(RS_1^2 + RS_2^2 − 2 RS_1 RS_2 cos(45°)) (27)

where r is the radius of the robot, RS_1 is the distance from sensor S_1, and RS_2 is the distance from sensor S_2. If the side sensor S_2 is perpendicular to the wall, then x(t) equals its expected value and SF_3 = 0. Therefore, the fitness function F can be defined as follows:
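A small sketch of the x(t) computation follows. The function name is hypothetical; the square root is applied because the law of cosines yields the squared length of the side opposite the 45° angle between the two sensor rays.

```python
import math

def sensor_gap(rs1, rs2, angle_deg=45.0):
    """Length x(t) between the points sensed by the front-right sensor S1
    and the side sensor S2, via the law of cosines."""
    a = math.radians(angle_deg)
    return math.sqrt(rs1 ** 2 + rs2 ** 2 - 2 * rs1 * rs2 * math.cos(a))
```

As a geometric sanity check: when the robot is parallel to a wall at distance d, the side ray measures RS_2 = d and the 45° front-right ray measures RS_1 = d√2, and the computed gap reduces to exactly d.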

Cooperative Load-Carrying and Navigation Control of Mobile Robots
In this section, the cooperative load-carrying control of multiple mobile robots is described. First, two mobile robots are set in the same direction and the load item is placed upon them. The front robot is the leader and the other is the follower; the distance between the two robots is 15 cm. Figure 12 demonstrates that the leader explores the environment and guides the follower, and the follower assists the leader in carrying the item.

Cooperative Load-Carrying Wall-Following Control of Mobile Robots
The wall-following control method is used for both the leader and the follower, and an auxiliary controller is attached to the follower. Figure 13 presents the flowchart of cooperative load-carrying wall-following control. The auxiliary controller has five inputs: the distances detected by the four sensors on the follower (S_0, S_1, S_2, and S_3) and the distance to the leader (R_d). The two outputs (V_L and V_R) are the speeds of the left and right wheels, respectively. Figure 14 exhibits the training environment, which includes straight lines, sharp corners, obtuse corners, smooth corners, and U-shaped corners. In this control method, the leader only follows the wall; the follower assists the leader in carrying the item and adjusts its own output from the auxiliary controller to match the leader's output. To prevent the follower and leader from staying at an unsafe distance or failing the mission, the following five termination cases were designed in this study.

Case 1
If one of the follower's sensors detects a distance of less than 1 cm, then the follower has hit the wall.

Case 2
If the follower's side sensor detects a distance of more than 6 cm, then the follower is deviating from the wall.

Case 3
If R_d is less than 10 cm or more than 20 cm, then the leader and the follower are too close or too far apart, respectively. An improper value of R_d might cause the leader to make a wrong detection or the item to fall.

Case 4
If the distance detected by the sensor under the item is smaller than the robot's height (R_h), then the task fails.

Case 5
If the side sensor detects a distance of less than 1 cm or more than 7.5 cm, then the item is too close to or too far from the wall, respectively.
In practical testing experiments, the side and bottom sensors are removed. When the robots encounter a termination case, the fitness value F(·) is calculated from the robot's time steps. The robots then return to the initial point and restart the program until they traverse the training environment successfully.

F(·) = T_stop / 5000 (30)

A maximum of 5000 time steps is allocated in this study; the longer the robots move, the higher the fitness value.
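The five termination cases and Equation (30) can be sketched as a simple check. All function and parameter names are hypothetical, and which physical sensor feeds each case (follower side sensor versus the item's side and bottom sensors) is an interpretation of the text, labeled as such in the comments.

```python
def load_carry_terminated(follower_sensors, follower_side, r_d,
                          item_bottom, item_side, robot_height):
    """Return the number of the first violated termination case, or 0 if
    none is violated. All distances are in cm (illustrative sketch)."""
    if min(follower_sensors) < 1.0:           # Case 1: follower hits the wall
        return 1
    if follower_side > 6.0:                   # Case 2: follower deviates from the wall
        return 2
    if r_d < 10.0 or r_d > 20.0:              # Case 3: unsafe leader-follower distance
        return 3
    if item_bottom < robot_height:            # Case 4: task failure (assumed: the
        return 4                              # item's bottom sensor reads too low)
    if item_side < 1.0 or item_side > 7.5:    # Case 5: item too close/far from wall
        return 5
    return 0

def fitness(t_stop):
    """Eq. (30): fitness grows with survival time, capped at 5000 steps."""
    return t_stop / 5000.0
```

Running the check at every time step and recording the step count at the first violation yields T_stop for Equation (30).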

Cooperative Load-Carrying Navigation Control of Mobile Robots
Environments can be classified as known or unknown. A navigation control method that effectively assists mobile robots in navigating unknown environments is proposed in this paper. The behavior mode manager is used to switch between the goal-oriented mode (GOM) and the wall-following mode (WFM). The robots are initialized in the GOM; on detecting obstacles, they switch to the WFM. Otherwise, the robots remain in the GOM.

• GOM
In an unknown environment, the goal position is the only information available to the robots. As shown in Figure 15, the robot computes the angle θ_RG and moves toward the goal:

θ_RG = θ_Goal − θ_Robot

where θ_Robot is the angle between the robot's heading and the x axis and θ_Goal is the angle between the goal direction and the x axis.
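The heading correction toward the goal can be sketched as follows; the function name and the degree convention are assumptions, and the angle is normalized so that the robot always takes the shorter turn.

```python
import math

def goal_angle(robot_pos, robot_heading_deg, goal_pos):
    """Angle theta_RG the robot must turn toward the goal:
    theta_Goal - theta_Robot, both measured from the x axis (degrees)."""
    dx = goal_pos[0] - robot_pos[0]
    dy = goal_pos[1] - robot_pos[1]
    theta_goal = math.degrees(math.atan2(dy, dx))
    theta_rg = theta_goal - robot_heading_deg
    return (theta_rg + 180.0) % 360.0 - 180.0  # normalize to [-180, 180)
```

A positive result means a counterclockwise turn, a negative result a clockwise turn, so the robot never rotates more than half a revolution toward the goal.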

• Behavior mode manager
Figure 17 shows the three areas of the robot, O_1, O_2, and O_3. Depending on which area contains the goal (O_i) and which sensors detect the obstacles (S_i), the behavior mode manager determines which side of the wall the robots should follow.
Left-side wall-following: the goal is in O_1 and S_7 or S_6 senses the obstacles, or the goal is in O_2 and S_7, S_6, and S_5 sense the obstacles.
Right-side wall-following: the goal is in O_1 and S_0 or S_1 senses the obstacles, or the goal is in O_3 and S_0, S_1, and S_2 sense the obstacles.
Before the robots switch to the WFM, the behavior mode manager determines which robot detects the obstacles first. Figure 18 displays that if the leader detects the obstacles, the leader pre-rotates first to become parallel to the wall. The leader then sends θ_L to the follower, and the follower rotates π − θ_L degrees. Once both robots are within a safe distance and parallel to the wall, the WFM is executed. After they pass the obstacles, the robots switch back to the GOM. Figure 19 analyzes the actions of the follower. As shown in Figure 21, during the WFM, if the obstacle is in the follower's O_1 area and the right-side sensor S_1 detects a distance larger than 6 cm (if the robots are in left-side WFM, the distance is detected by the left-side sensor S_6), then the item on the robots has passed the obstacle, and the behavior mode manager switches from the WFM back to the GOM.
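The side-selection rules and the mode switching can be sketched as follows. The string encodings of the areas and modes, the function names, and the 6 cm clearance parameter are illustrative assumptions.

```python
def choose_wall_side(goal_area, obstacle_sensors):
    """Decide which wall to follow from the goal area (O1/O2/O3) and the
    indices of the sensors (S0-S7) that detect obstacles."""
    s = set(obstacle_sensors)
    if (goal_area == "O1" and s & {6, 7}) or (goal_area == "O2" and {5, 6, 7} <= s):
        return "left"
    if (goal_area == "O1" and s & {0, 1}) or (goal_area == "O3" and {0, 1, 2} <= s):
        return "right"
    return None  # no rule fires: stay in GOM

def update_mode(mode, goal_area, obstacle_sensors, side_dist, safe_dist=6.0):
    """One step of the behavior mode manager: enter WFM when a wall side is
    chosen; return to GOM once the side sensor clears the obstacle."""
    if mode == "GOM" and choose_wall_side(goal_area, obstacle_sensors):
        return "WFM"
    if mode == "WFM" and goal_area == "O1" and side_dist > safe_dist:
        return "GOM"
    return mode
```

Calling `update_mode` once per control cycle reproduces the switching behavior: the robots leave the GOM only when obstacles block the goal side, and rejoin it once the item has cleared the obstacle.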

Experimental Results
Simulation is very important in the robot design process because a developed algorithm can be verified quickly; for robot learners, simulation tools also greatly reduce the cost of learning. A robot simulation platform integrates a physics engine that calculates motion, rotation, and collision according to the physical properties of the objects. The simulation tools commonly used for robots are Gazebo, V-REP, and Webots. Table 1 shows the features of these robot simulators. As shown in Table 1, V-REP and Webots support Linux, macOS, and Windows development environments, whereas Gazebo only supports Linux. V-REP has less functional support than Gazebo and Webots. Therefore, in this study, Webots was used as the robot simulation platform to verify the proposed DGABC-based IT2NFC and to perform the WFM control and cooperative load-carrying of mobile robots.

Results of WFM Control
The performance and stability of DGABC were compared with those of other algorithms. The best, worst, and average results, the standard deviation (STD), the number of successful runs out of ten, and the average execution times are listed in Table 2. A successful run is one in which the mobile robot traverses the training environment once, from the start point to the goal, during the simulation. The learning curves of the algorithms are shown in Figure 22; the results in Table 2 and Figure 23 were obtained under the same conditions. Table 2 shows that the proposed method achieved a higher learning effect than the improved ABC, ABC, and DE, and the average execution time of the proposed method is shorter than those of the other methods. In addition, the proposed IT2NFC (type-2 fuzzy sets) based on DGABC was compared with an IT1NFC (type-1 fuzzy sets) based on DGABC. Table 3 shows the performance comparison of the IT1NFC and IT2NFC; the experimental results show that the performance of the IT2NFC is better than that of the IT1NFC.

Results of Cooperative Load-Carrying WFM Control
The trained control method was then implemented for cooperative load-carrying mobile robots. Table 4 compares the performance of the proposed WFM with that of other algorithms in the simulation. The better performance and higher stability of DGABC for cooperative load-carrying WFM control are demonstrated in Figure 24, and Figure 25 shows the moving path of the two mobile robots in the training environment. PSO did not display any advantage in cooperative load-carrying WFM control; the proposed DGABC method exhibited the best performance in this control task.

Results of Cooperative Load-Carrying Navigation Control
The implementation of the proposed method for cooperative load-carrying mobile robots in unknown environments is demonstrated in Figure 26. In this experiment, the average distance between the two mobile robots (RAD) and the average distance between the wall and the follower robot (FAD) were the two critical criteria for evaluating the performance of the proposed method. As shown in Table 5, if the RAD value is too large, the two mobile robots do not stay within the safe distance, which might cause the object to fall. On the other hand, if the FAD value is too large or too small, the WFM does not perform satisfactorily when the mobile robots turn around a corner, which might cause the object to shift or fall during the mission and thus the mission to fail.

Discussion
In this study, the cooperative load-carrying task of two mobile robots was successfully implemented using the proposed DGABC-based IT2NFC. The proposed method can next be extended to three or more mobile robots. For example, Figure 27 shows the cooperative load-carrying task of three mobile robots. First, the wall-following control method of the cooperative load-carrying task is used for the leader, follower 1, and follower 2. Second, the auxiliary controller is attached to the followers, as described in Section 5.1. Finally, the behavior mode manager in navigation control is used to switch between the GOM and the WFM, as in Section 5.2. Following these steps, the proposed method can be extended to the cooperative load-carrying task of three or more mobile robots.

Conclusions
The proposed DGABC-based IT2NFC improves the search capacity and accelerates convergence while avoiding local optimal solutions. Using the proposed method, mobile robots can develop the controller adaptively because DGABC needs neither a predefined rule set nor pre-specified training data. According to the environmental situation, the mobile robots use the behavior mode manager to switch between the WFM and the GOM. Additionally, the pre-rotation mechanism ensures that the follower robot follows the wall properly and allows the leader robot to complete the cooperative load-carrying task in unknown environments.
The cooperative load-carrying task is complex for robots; several factors, such as the robot speed, the object payload, and the working environment, must be taken into consideration. Therefore, this study focused on implementing the cooperative load-carrying task with two mobile robots. In future work, we will consider implementing the cooperative load-carrying task with two or more real mobile robots in unknown environments.