Synthetic data generation to speed up the object recognition pipeline

Abstract: This paper provides a methodology for producing synthetic images for training neural networks to recognise shapes and objects. There are many scenarios in which it is difficult, expensive and even dangerous to produce a set of images that is satisfactory for the training of a neural network. 3D modelling software has now reached such a level of realism and ease of use that it seemed natural to explore this path and to assess the reliability of a method that bases the training of the neural network on synthetic images. The results obtained in the two proposed use cases, the recognition of a pictorial style and the recognition of migrants at sea, lead us to support the validity of the approach, provided that the work is conducted in a scrupulous and rigorous manner, exploiting the full potential of the modelling software. The code produced, which automatically generates the transformations necessary for the data augmentation of each image and, in the case of Blender and Unity3D, the random environmental conditions, is available under the GPL licence on GitHub. The results obtained lead us to affirm that, through the good practices presented in the article, we have defined a simple, reliable, economic and safe method to feed the training phase of a neural network dedicated to the recognition of objects and features, applicable to various contexts.

Figure: Flow chart summarising the various phases of the training of a neural network using virtual scenarios.
[...] you will have to go back and examine the dataset of real photographs, train the network again, and see if the goals have been met.

Instead of immediately beginning with the capture of real-world images, we advocate starting with synthetic photographs created with 3D modelling software using our pipeline. If the three-dimensional environment is created with care, realism, and fidelity of detail, we will quickly and at low initial cost have a clear idea of the performance obtainable in the real world, and only then begin constructing a dataset of photographs taken in the real world.

We have found a number of advantages in this method. The first is the speed with which we can generate photographs to train the networks: once the working environment is set up, we can generate an almost unlimited number of images by simply running our algorithm and letting it produce images with random combinations of lights, shadows, and objects. Another advantage is cost: developing a synthetic dataset is far less expensive than creating a real one, which in our example would include the hiring of a helicopter, actors, and at least one ship. A synthetic dataset also lets us determine much more quickly whether the technique we are applying is appropriate and, if required, change strategy entirely and attack the problem from a different perspective. We believe that a synthetic dataset should not be used in place of a dataset made up of real images, because it is currently impossible to faithfully reconstruct all of the visual facets and details that make up the natural world, but we do believe that it is a useful tool for researchers working on machine learning problems, particularly those involving image classification.

We evaluated the synthetic datasets with neural networks after synthesising them with virtual reality software using the methodology outlined in the previous sections. [...] is presented.

The learning curve of the network on the training set is shown in blue in Figure 6A for the dataset of men at sea. As can be seen, the network begins to settle after a few early oscillations, becoming increasingly precise.

[...] with the following layers. The first is a Flatten type layer, whose job was to convert the network's data [...] to learn how to accurately categorise our datasets. We have additionally set up two more aspects.

The first is in the training phase: we assess the network accuracy percentage on the validation set [...]

To finish our study, we constructed a custom neural network for the collection of paintings. The goal of this network's architecture is to achieve a more linear training phase than InceptionResNet-V2.

[...] GameObject after constructing an empty GameObject to use as a container. A camera must then be placed in the scene and positioned so that it frames the scene precisely, as illustrated in Figure 13.

Depending on the lighting conditions we want to reproduce, we will also need a directional light and possibly one or more point lights or spotlights. After you have completed these preparatory steps, you will need to write a script that changes the lighting conditions at random, rotates the item, and takes photos at the appropriate resolution. We propose including some public variables in the script so that you can define the parameters using the Unity3D graphical [...]

The following are best practices for generating virtual scenario datasets using Unity3D. Let us assume we want to do a true-false binary classification. First and foremost, the models of the things that will make up the scene must be created. The models must be realistic, and we suggest Blender [...] we recommend creating a container object, in our case "objects", and placing it in the scene. We have added two more containers to it, one for objects of the true class and the other for objects of the false class. Three-dimensional models of people, ships, and barrels have been placed within the two containers. Figure 14b provides an example.
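The randomised capture script and the true/false container layout described above can be sketched independently of the rendering engine. The following Python fragment is only an illustration under stated assumptions, not the script released with the paper: the names `CaptureSettings` and `plan_captures`, the parameter ranges, the file naming scheme, and the assignment of people to the true class and of ships and barrels to the false class are all hypothetical. It only plans the captures (one record per image with random object rotation and lighting); applying each record to the scene and rendering the image would be done by the engine-specific script.

```python
import random
from dataclasses import dataclass


@dataclass
class CaptureSettings:
    """Hypothetical stand-ins for the script's public variables."""
    images_per_object: int = 4
    width: int = 640
    height: int = 480
    min_light_intensity: float = 0.2
    max_light_intensity: float = 1.5


def plan_captures(containers, settings, seed=None):
    """Return one randomised capture description per image to render.

    `containers` maps a class label ("true" or "false") to the names of
    the models placed inside that container.
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    plan = []
    for label, models in containers.items():
        for model in models:
            for index in range(settings.images_per_object):
                plan.append({
                    "label": label,
                    "model": model,
                    # Assumed layout: one output folder per class.
                    "filename": f"{label}/{model}_{index:04d}.png",
                    "resolution": (settings.width, settings.height),
                    # Random rotation of the object about the vertical axis.
                    "object_yaw_deg": rng.uniform(0.0, 360.0),
                    # Random elevation and strength of the directional light.
                    "light_elevation_deg": rng.uniform(0.0, 90.0),
                    "light_intensity": rng.uniform(
                        settings.min_light_intensity,
                        settings.max_light_intensity,
                    ),
                })
    return plan


# Assumed class assignment, for illustration only.
containers = {"true": ["person"], "false": ["ship", "barrel"]}
plan = plan_captures(containers, CaptureSettings(images_per_object=2), seed=0)
```

Keeping the tunable values in a single settings object mirrors the idea of exposing them as public variables, so that the number of images, the resolution, and the lighting range can be changed without editing the sampling logic.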

As a result, the operations flow that we suggest is as follows.