In approaches such as that of Okumura et al. [18] from the Toyota Research Institute, only roundabout situations are considered; in the work of Mihaly et al. [19], only intersection scenarios are considered. Each of these works addresses a single type of driving situation, roundabouts or intersections. Such a system works properly only in a reduced scenario, and more policies are necessary to scale to all the different driving situations that a car faces while in use. In other methodologies, such as that presented by Aeberhard et al. [20], the authors reduce the complexity of the model by dividing the driving task into a finite set of lateral and longitudinal driving situations (guidance states). They first evaluate the driving situation and then produce a driving request, which outputs a series of discrete events to be evaluated by a deterministic automaton. Although there have been major improvements, all aspects of the automated driving system, including perception, localization, decision-making, and path-planning algorithms, still need further development to bridge the gap between robotics research and a customer-ready system. It is necessary to increase the number of driving situations modeled under different scenarios. Other approaches, such as the works of Thrun et al. [21] and Ulbrich et al. [22], apply POMDPs to behavior selection in order to perform lane changes while driving in urban environments, using a finite set of policies to speed up planning. Brechtel et al. [8] use a similar approach to deal with potentially hidden objects and observation uncertainties. These works have in common that a finite set of policies can speed up planning, which makes the approach attractive for the general driving problem and demonstrates the importance of selecting the number of policies. There are other approaches in which the data are divided into specialized subsets of similar samples, applying the “divide and conquer” technique to different problems. This method has been shown to accelerate the learning process of some algorithms. For example, Zhou et al. [23] propose an efficient “divide and conquer” model that constrains the Maximum Mean Discrepancy (MMD) loss function to a tight bound on the deviation between the empirical estimate and the expected value of the MMD, accelerating the training process. This approach consists of a division step and a conquer step. In the division step, they learn an embedding of the training images with an autoencoder and partition the images into adaptive subsets through K-means clustering on the embedding. In the conquer step, the sub-models are fed with the subsets separately and trained synchronously. Experimental results show that, for a fixed number of iterations, this approach converges faster and achieves better performance than standard MMD-GANs. In other computational research areas, such as Reinforcement Learning, designing a good reward function is essential to problems such as robot planning, and it can be challenging: the reward needs to work across multiple environments, which often requires many iterations of tuning. Ratner et al. [24] introduce a “divide and conquer” model that enables the designer to specify a reward separately for each environment. They conducted user studies, measuring user effort and solution quality, in an abstract grid-world domain and in a motion-planning domain for a 7-DOF manipulator; the studies demonstrate that this method is faster, easier to use, and produces higher-quality solutions than the typical method of designing a reward jointly across all environments. Wang et al. [25] propose a deep neural network model based on a semantic “divide and conquer” approach: they decompose a scene into semantic segments, such as object instances and background stuff classes, and predict a scale- and shift-invariant depth map for each semantic segment in canonical space. Semantic segments of the same category share the same depth decoder, so the global depth prediction task is decomposed into a series of category-specific ones, which are simpler to learn and easier to generalize to new scene types. Kim et al. [26] even proposed a bioinspired “divide and conquer” contour design methodology for a multifunctional lever, based on the morphological principle of the lever mechanism in the Salvia pratensis flower.
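The division step described above for the MMD-GAN model, partitioning training samples into adaptive subsets with K-means clustering on learned embeddings, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the autoencoder is omitted, `embeddings` stands in for its learned codes, and the function name and toy data are our own.

```python
import numpy as np

def kmeans_partition(embeddings, k, iters=50, seed=0):
    """Partition embedding vectors into k adaptive subsets with plain
    K-means; each subset would then be fed to its own sub-model."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random training points.
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign every embedding to its nearest centroid.
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members,
        # keeping the old centroid if a cluster is empty.
        new = np.array([embeddings[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Return the adaptive subsets, one per sub-model, and the labels.
    return [embeddings[labels == j] for j in range(k)], labels

# Toy stand-in for autoencoder codes: two well-separated blobs
# should end up in different subsets.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                  rng.normal(5.0, 0.1, (20, 2))])
subsets, labels = kmeans_partition(data, k=2)
```

In the cited work, each resulting subset is then used to train a separate sub-model synchronously in the conquer step.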
These kinds of solutions motivate us, as a research group, to ask ourselves the following questions: How can we avoid subjectivity in the labeling of the data? How can we select an adequate number of policies from the dataset? Finally, how can we avoid the manual labeling of the data?
The “divide and conquer” technique in the above-mentioned cases is closely related to our idea of dividing the data into driving situations, clustering similar data to accelerate the learning of different policies according to the driving situation. Dividing the data into specialized subsets in an unsupervised manner could avoid both the subjectivity of a person labeling the data and the time spent on hand-engineered labeling, while also speeding up the learning of the individual policies on the specialized subsets.
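As an illustration of this idea, the following hypothetical sketch clusters logged driving feature vectors into situations without any manual labels and fits one policy per cluster. The feature and action arrays, the function names, and the trivial mean-action "policy" are all our own simplifications for exposition, not an implementation from the cited works; a real system would train a learned policy on each subset.

```python
import numpy as np

def fit_situation_policies(features, actions, k, iters=30):
    """Unsupervised divide and conquer: cluster driving features into k
    situations with K-means, then fit one trivial policy (the mean
    recorded action) per cluster."""
    # Farthest-point initialisation for stable, spread-out centroids.
    centroids = [features[0]]
    for _ in range(k - 1):
        d = np.min(np.stack([np.linalg.norm(features - c, axis=1)
                             for c in centroids]), axis=0)
        centroids.append(features[int(d.argmax())])
    centroids = np.array(centroids)
    for _ in range(iters):  # plain K-means on the feature vectors
        labels = np.linalg.norm(
            features[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j] for j in range(k)])
    policies = [actions[labels == j].mean(axis=0) for j in range(k)]
    return centroids, policies

def act(feature, centroids, policies):
    """Dispatch: apply the policy of the nearest situation centroid."""
    j = int(np.linalg.norm(centroids - feature, axis=1).argmin())
    return policies[j]

# Toy log: one group of samples near 10 was driven with action 1.0,
# another group near 0 with action -1.0; no labels are provided.
rng = np.random.default_rng(2)
feats = np.vstack([rng.normal(10.0, 0.1, (10, 2)),
                   rng.normal(0.0, 0.1, (10, 2))])
acts = np.array([1.0] * 10 + [-1.0] * 10)
centroids, policies = fit_situation_policies(feats, acts, k=2)
```

At run time, a new observation is routed to the policy of the situation it most resembles, which is the dispatching behavior the specialized subsets are meant to enable.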