# Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Structures and Behaviors

- PowerBrain: A robot must include one and only one PowerBrain module as the central unit of the robot. The physical PowerBrain module in the original kit contains a battery and a simple computer to generate electric power and a control signal. On its front side, a set of input buttons is arranged in a circular layout; these do not play a significant role in our simulation. Other modules, including the motor, pivot, and main modules, can be attached to the remaining five sides, where sockets are set up such that both the electric power and the control signal from the PowerBrain can be seamlessly transmitted to its attached modules.
- Motor: The motor module in the original kit allows two wheels to be fitted into the axles of two opposing motors that are aligned with a pair of opposite sides of the module, such that both wheels can be rotated in a clockwise or counter-clockwise direction simultaneously. In our implementation, we simplified such a connection mechanism between a motor and wheels by removing the existence of axles and axes and by allowing wheels to be connected to the motor directly. Further, we forced the wheels to be attached to a motor in a pairwise manner, thereby excluding the possibility of attaching a single wheel to one side of a motor. In our simulator, given a desired angular speed of the motor, we consistently apply a constant torque to the wheels connected to the motor until the angular speeds of the wheels reach the desired one.
- Pivot: The pivot module plays the role of a hinge joint that comprises two subparts and a single axle that connects these parts. The angle between the subparts can be adjusted within a range of 180 degrees, from $-90$ to $90$ degrees. Given a desired angle of a pivot, we repeatedly compute the torque based on the PD control mechanism and apply the torque to the hinge joint until the angle reaches the desired one.
- Main: The main module primarily serves as a decorative component for extending a structure while transmitting the power and data to its neighboring modules. Further, it acts as a support for passive wheels, which must be attached in pairs, as with the motor module. This is an essential component for building a large-scale robot with a complicated structure.
- Wheel: The wheel modules can be attached to either the motor modules or the main modules in pairs, as explained above. When the wheels are attached to a motor module, they can be controlled actively by applying torque through the motor module, which makes the robot exert forces on the surrounding environment, such as floors and walls; in turn, this enables the robot to move in the direction opposite to the exerted forces. In contrast, wheels attached to the main modules can only rotate passively, and they thus help the robot to navigate smoothly when other wheels or pivots are controlled actively.
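The two actuation rules described for the pivot and motor modules can be sketched as follows. This is a minimal illustration rather than the simulator's actual code: the gains `kp` and `kd`, the torque limit `max_torque`, the constant `torque` magnitude, and the tolerance `tol` are all assumed values that the text does not specify.

```python
import math


def pd_torque(target_deg, angle_deg, angular_vel_deg,
              kp=2.0, kd=0.1, max_torque=5.0):
    """PD torque for a pivot hinge (gains and limit are illustrative assumptions)."""
    # The pivot angle is restricted to the range [-90, 90] degrees.
    target = max(-90.0, min(90.0, target_deg))
    # Proportional term pulls toward the target; derivative term damps the motion.
    torque = kp * (target - angle_deg) - kd * angular_vel_deg
    # Clamp to an assumed actuator limit.
    return max(-max_torque, min(max_torque, torque))


def motor_torque(target_speed, wheel_speed, torque=1.0, tol=1e-3):
    """Constant-magnitude torque applied until the wheel reaches the target speed."""
    error = target_speed - wheel_speed
    if abs(error) <= tol:
        return 0.0  # target speed reached: stop applying torque
    return math.copysign(torque, error)
```

In a physics loop, these functions would be called once per simulation step with the current joint angle or wheel speed, and the returned torque would be applied to the corresponding joint.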

## 4. Problem Formulation

## 5. Learning Behaviors

## 6. Evolving Structures

- Generation: A population of initial robots is randomly generated (see Figure 4). The population size was 20 in our experiments. The random generation of each robot begins by instantiating the root node of a new tree, corresponding to the PowerBrain module in our case. After the generation of the root node, we extend the structure by recursively attaching a new node to every possible side to be extended within a limited depth of recursion. The type of each new node is randomly decided from among the motor, pivot, and main modules. If the motor module is selected, we extend the structure again by attaching two wheel modules symmetrically around the motor module, terminate the recursion, and backtrack to the parent node. Otherwise, we continue the recursion for each side of the newly attached node as long as the current depth of recursion is less than the limit. For each new extension of the structure, we check whether the extended structure is collision free. If self-collisions are identified, we cancel the latest extension and restore the structure to its previous state.
- Evaluation: We evaluate the fitness of each new robot by executing the reinforcement learning algorithm with the robot for a limited number of time steps and by measuring the average of the episodic cumulative rewards, as defined in Equation (8). The number of time steps for learning is set to a relatively small value for every robot in the initial population because the qualities of the initial robots are not expected to be sufficiently high, considering their pure randomness. As the evolution proceeds by iteratively replacing second-class robots with smarter ones, we incrementally increase the number of time steps to reveal the potential learning abilities of newer robots more thoroughly.
- Reproduction: Instead of recreating the entire population for each new generation, our steady-state approach adopts the policy of gradual one-by-one reproduction. At each generation, we select two robots for reproduction from the existing population based on the fitness proportion selection scheme. In this scheme, which is also known as roulette wheel selection, the probability of selecting a specific individual increases linearly with its fitness level, as defined in the following equation: $$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}$$ where $f_i$ is the fitness of the $i$-th robot and $N$ is the population size. New robots are created by recombining the two selected robots based on subtree crossover, which is widely used in genetic programming (see Figure 5) [29]. Given trees $T_1$ and $T_2$ associated with the selected robots, we randomly choose a pair of internal nodes $n_1$ and $n_2$, whose parent nodes are of the same type, from $T_1$ and $T_2$, respectively, and we swap the two subtrees rooted at $n_1$ and $n_2$ to produce a pair of new trees $T_1^{\prime}$ and $T_2^{\prime}$. A new tree may yield an invalid robotic structure (e.g., no motion module or self-collision), which needs to be discarded instead of being included in the population. If neither of the new robots is valid, we repeat the process of recombination from the beginning. Otherwise, we randomly choose one of the valid new robots as the candidate offspring for the next generation.
- Replacement: The candidate offspring born through recombination is not always accepted as a member of the population. We allow only an offspring that is fitter than at least one of the existing individuals to be included in the population. To this end, we evaluate the fitness of the candidate offspring by employing the reinforcement learning algorithm as described previously, and we compare its fitness value with the minimum fitness of the current population. If the offspring’s fitness is greater than the minimum, we replace the existing robot with the minimum fitness with the candidate offspring, which becomes a new member of the population. Otherwise, we simply discard the candidate offspring and leave the population unchanged at this iteration.
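One steady-state generation, as described in the Reproduction and Replacement steps above, can be sketched as follows. This is a simplified illustration under several assumptions: the `Node` class, the function names, and the `max_tries` cap are hypothetical; `evaluate` stands in for the reinforcement-learning fitness evaluation and `is_valid` for the structural checks (motion module present, no self-collision), neither of which is implemented here. The sketch also does not force the two selected parents to be distinct.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "powerbrain", "motor", "pivot", "main", or "wheel"
    children: list = field(default_factory=list)


def roulette_select(fitnesses, rng):
    """Return index i with probability f_i / sum_j f_j (assumes non-negative fitnesses)."""
    r = rng.uniform(0.0, sum(fitnesses))
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if r <= acc:
            return i
    return len(fitnesses) - 1  # guard against floating-point round-off


def crossover_points(root):
    """Collect (parent, child_index) for every internal (non-root, non-leaf) node."""
    points, stack = [], [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node.children):
            if child.children:
                points.append((node, i))
            stack.append(child)
    return points


def subtree_crossover(t1, t2, rng):
    """Swap subtrees whose parents share a type; return None if no such pair exists."""
    c1, c2 = copy.deepcopy(t1), copy.deepcopy(t2)  # parents stay untouched
    pairs = [(a, b) for a in crossover_points(c1) for b in crossover_points(c2)
             if a[0].kind == b[0].kind]
    if not pairs:
        return None
    (p1, i1), (p2, i2) = rng.choice(pairs)
    p1.children[i1], p2.children[i2] = p2.children[i2], p1.children[i1]
    return c1, c2


def steady_state_step(population, fitnesses, evaluate, is_valid, rng, max_tries=100):
    """Run one generation in place; return True if the offspring replaced the worst robot."""
    for _ in range(max_tries):
        i, j = roulette_select(fitnesses, rng), roulette_select(fitnesses, rng)
        result = subtree_crossover(population[i], population[j], rng)
        valid = [t for t in (result or ()) if is_valid(t)]
        if valid:
            break
    else:
        return False  # no valid offspring found within the assumed retry cap
    child = rng.choice(valid)
    child_fitness = evaluate(child)
    worst = min(range(len(fitnesses)), key=fitnesses.__getitem__)
    if child_fitness > fitnesses[worst]:
        population[worst], fitnesses[worst] = child, child_fitness
        return True
    return False  # offspring discarded; population unchanged
```

Because only the weakest individual can be replaced, the minimum fitness of the population never decreases from one generation to the next in this scheme.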

## 7. Experimental Results

### 7.1. Results from Clean Environment

### 7.2. Results from Cluttered Environment

## 8. Discussion

## 9. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Guizzo, E. By leaps and bounds: An exclusive look at how Boston Dynamics is redefining robot agility. IEEE Spectr. **2019**, 56, 34–39.
- Sung, J.-Y.; Guo, L.; Grinter, R.E.; Christensen, H.I. My Roomba is Rambo: Intimate home appliances. In International Conference on Ubiquitous Computing; Springer: Berlin/Heidelberg, Germany, 2007.
- Doncieux, S.; Bredeche, N.; Mouret, J.-B.; Eiben, A.E. Evolutionary robotics: What, why, and where to. Front. Robot. AI **2015**, 2, 1–18.
- Tinkerbots Modular Robotics Kit. Available online: http://www.tinkerbots.de (accessed on 12 March 2021).
- Bermano, A.H.; Bermano, T.; Rusinkiewicz, S. State of the art in methods and representations for fabrication-aware design. Comput. Graph. Forum **2017**, 36, 509–535.
- Zhu, L.; Xu, W.; Snyder, J.; Liu, Y.; Wang, G.; Guo, B. Motion-guided mechanical toy modeling. ACM Trans. Graph. **2012**, 31, 127:1–127:10.
- Coros, S.; Thomaszewski, B.; Noris, G.; Sueda, S.; Forberg, M.; Sumner, R.W.; Matusik, W.; Bickel, B. Computational design of mechanical characters. ACM Trans. Graph. **2013**, 32, 83:1–83:12.
- Ceylan, D.; Li, W.; Mitra, N.J.; Agrawala, M.; Pauly, M. Designing and fabricating mechanical automata from mocap sequences. ACM Trans. Graph. **2013**, 32, 186:1–186:11.
- Thomaszewski, B.; Coros, S.; Gauge, D.; Megaro, V.; Grinspun, E.; Gross, M. Computational design of linkage-based characters. ACM Trans. Graph. **2014**, 33, 64:1–64:9.
- Megaro, V.; Thomaszewski, B.; Nitti, M.; Hilliges, O.; Gross, M.; Coros, S. Interactive design of 3D-printable robotic creatures. ACM Trans. Graph. **2015**, 32, 216:1–216:9.
- Geilinger, M.; Poranne, R.; Desai, R.; Thomaszewski, B.; Coros, S. Skaterbots: Optimization-based design and motion synthesis for robotic creatures with legs and wheels. ACM Trans. Graph. **2018**, 37, 160:1–160:12.
- Zhu, Z.; Pan, Y.; Zhou, Q.; Lu, C. Event-triggered adaptive fuzzy control for stochastic nonlinear systems with unmeasured states and unknown backlash-like hysteresis. IEEE Trans. Fuzzy Syst. **2020**.
- Roman, R.-C.; Precup, R.-E.; Petriu, E.M. Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems. Eur. J. Control **2021**, 58, 373–387.
- Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley: Hoboken, NJ, USA, 1989.
- Baioletti, M.; Milani, A.; Santucci, V. Variable neighborhood algebraic differential evolution: An application to the linear ordering problem with cumulative costs. Inf. Sci. **2020**, 507, 37–52.
- Sims, K. Evolving virtual creatures. In Proceedings of the ACM SIGGRAPH 1994, Orlando, FL, USA, 24–29 July 1994; pp. 15–22.
- Funes, P.; Pollack, J. Evolutionary body building: Adaptive physical designs for robots. Artif. Life **1998**, 4, 337–357.
- Lipson, H.; Pollack, J.B. Automatic design and manufacture of robotic lifeforms. Nature **2000**, 406, 974–978.
- Kamimura, A.; Kurokawa, H.; Yoshida, E.; Murata, S.; Tomita, K.; Kokaji, S. Automatic locomotion design and experiments for a modular robotic system. IEEE ASME Trans. Mechatron. **2005**, 10, 314–325.
- Duarte, M.; Gomes, J.; Oliveira, S.M.; Christensen, A.L. Evolution of repertoire-based control for robots with complex locomotor systems. IEEE Trans. Evol. Comput. **2018**, 2, 314–328.
- Larik, A.; Haider, S. A framework based on evolutionary algorithm for strategy optimization in robot soccer. Soft Comput. **2019**, 23, 7287–7302.
- Alattas, R.J.; Pater, S.; Sobh, T.M. Evolutionary modular robotics: Survey and analysis. J. Intell. Robot. Syst. **2019**, 95, 815–828.
- Lund, H.H. Co-evolving control and morphology with LEGO robots. In Morpho-functional Machines: The New Species; Springer: New York, NY, USA, 2003; pp. 59–79.
- Ha, S.; Coros, S.; Alspach, A.; Kim, J.; Yamane, K. Joint optimization of robot design and motion parameters using the implicit function theorem. Robot. Sci. Syst. **2017**, 13.
- Schaff, C.; Yunis, D.; Charkrabarti, A.; Walter, M.R. Jointly learning to construct and control agents using deep reinforcement learning. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9798–9805.
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. **2013**, 32, 1238–1274.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv **2017**, arXiv:1707.06347.
- Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. **2003**, 35, 268–308.
- Banzhaf, W.; Francone, F.D.; Keller, R.E.; Nordin, P. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications; Morgan Kaufmann: San Francisco, CA, USA, 1998.
- Juliani, A.; Berges, V.; Teng, E.; Cohen, A.; Harper, J.; Elion, C.; Goy, C.; Gao, Y.; Henry, H.; Mattar, M.; et al. Unity: A General Platform for Intelligent Agents. arXiv **2020**, arXiv:1809.02627.

**Figure 1.** PowerBrain, motor, main, wheel, and pivot modules (in a clockwise direction from the top left).

**Figure 2.** An example robot structure. The four-wheeled robot on the upper right is represented as the tree structure.

**Figure 3.** Overview of our evolutionary design process. The blue bar below each robot represents the fitness of the robot. The robots in the red regions represent the selected and the replaced robots.

**Figure 5.** An example of the crossover operation. The sub-structures in the yellow and blue regions are pruned from the two parent robots on the left and are then recombined to produce the child structure on the right.

**Figure 6.** Training candidate robotic structures in the clean environment (**left**) and the cluttered environment (**right**).

**Figure 7.** The resulting structures evolved from the clean environment (**top**) and the cluttered environment (**bottom**).

**Figure 8.** Noteworthy robots evolved from the clean environment (**top**) and the cluttered environment (**bottom**).

**Figure 10.** The fitnesses of all robots generated by our evolutionary process in the clean environment.

**Figure 12.** The fitnesses of all robots generated by our evolutionary process in the cluttered environment.

**Figure 13.** Comparing ‘R+R’ (random structure combined with random behavior) and ‘E+R’ (evolved structure combined with random behavior).

**Figure 14.** Comparing ‘R+R’ (random structure combined with random behavior) and ‘R+L’ (random structure combined with learned behavior).

**Figure 15.** Comparing ‘R+R’ (random structure combined with random behavior) and ‘E+L’ (evolved structure combined with learned behavior).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Park, J.H.; Lee, K.H.
Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning. *Symmetry* **2021**, *13*, 471.
https://doi.org/10.3390/sym13030471
