Synthetic-to-real Composite Semantic Segmentation in Additive Manufacturing

The application of computer vision and machine learning methods in the field of additive manufacturing (AM) for semantic segmentation of the structural elements of 3-D printed products will improve real-time failure analysis systems and can potentially reduce the number of defects by enabling in situ corrections. This work demonstrates the possibilities of using physics-based rendering for labeled image dataset generation, as well as image-to-image translation capabilities to improve the accuracy of real image segmentation for AM systems. Multi-class semantic segmentation experiments were carried out based on the U-Net model and a cycle generative adversarial network. The test results demonstrated the ability to detect structural elements of 3-D printed parts such as the top layer, infill, shell, and support. A basis for further enhancing the segmentation system with image-to-image style transfer and domain adaptation technologies was also developed. The results indicate that using style transfer as a precursor to domain adaptation can significantly improve real 3-D printing image segmentation in situations where a model trained on synthetic data is the only tool available. The mean intersection over union (mIoU) scores for synthetic test datasets were 94.90% for the entire 3-D printed part, 73.33% for the top layer, 78.93% for the infill, 55.31% for the shell, and 69.45% for supports.


I. INTRODUCTION
WITH the current exponential growth, the amount of plastic waste could reach 250 billion tons by 2050 [1], vast quantities of which end up polluting the natural environment on land and in the ocean [2]. Distributed manufacturing using additive manufacturing (AM) is growing rapidly and reforming global value chains [3], because there are millions of free 3-D printable consumer product designs and 3-D printing them results in substantial cost savings compared to conventionally manufactured commercial products [4], [5].
The growing popularity of 3-D printing is contributing notably to the plastic recycling problem, as 3-D printed products rarely carry recycling symbols [6], use uncommon polymers [7], and are increasing the overall market for plastic materials [8]. This is caused not only by additional plastic products but also by disturbing failure rates. Inexperienced 3-D printer users were estimated to have failure rates of 20% [9]. Even experienced professionals working in 3-D print farms, however, have failure rates of at least 2% [10]. The probability of a manufacturing defect increases with the size and print time of the object (e.g., large-scale fused filament printers [11] or products [12], [13], or fused granule printers [14], [15]), which can magnify the material waste created by even a small percentage of failures. It is clear that the ability to automatically detect deviations in AM will significantly help to reduce material waste and the time spent on reproducing failed prints.
Manuscript created October, 2022; This work was developed by Free Appropriate Sustainability Technology (FAST) research group at University of Western Ontario, Canada. This work is distributed under the GNU General Public License (GPL) 3.0 (https://www.gnu.org/licenses/gpl-3.0.en.html).
As recent studies [16] show, computer vision is the dominant tool for analyzing AM and extrusion-based 3-D printing processes. For example, Ceruti et al. [17] utilized data from the computer-aided design (CAD) files created in the first step of designing a 3-D printed component. Further down the software toolchain, Nuchitprasitchai et al. [18], Johnson et al. [19], and Hurd [20] developed failure analysis based on comparison with the Standard Tessellation Language (STL) files used at the slicing step in most 3-D printing. Further still, both Jeong et al. [21] and Wasserfall et al. [22] instead used the G-code files that provide the 3-D printer with spatial toolpath instructions for printing parts. The 3-D printing software toolchain need not be used at all, as several approaches rely on comparisons with reference data [23], [24] or ideal 3-D printing processes [25], [26]. In addition, a 3-D reconstruction-based scanning method for real-time monitoring of AM processes is also possible [27]. In previous works, the authors considered the possibilities of detecting critical manufacturing errors using classical image processing methods [28], as well as employing synthetic reference images rendered with a physics-based graphics engine [29].
The popular open-source Spaghetti Detective application [30], [31] also directly confirms the effectiveness of visual monitoring. An analysis of the Spaghetti Detective's [30] user performance database collected over 2.3 years showed that 24% of all 5.6 million print jobs were canceled, amounting to 456 hours of wasted continuous printing compared to 5,232 hours of printing in which all print jobs were finished (Figure 1). These statistics, however, do not include over a million canceled print jobs shorter than 5 minutes, which are assumed to reflect initial bed-leveling issues and therefore do not indicate manufacturing failures. Nor do they account for the working time human operators spent starting print tasks that were later canceled.
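The cancellation statistics above amount to a filter-and-sum over job records. A minimal sketch, assuming a hypothetical per-job record with a run time in minutes and a canceled flag (the actual Spaghetti Detective database schema is not described here), could look like this:

```python
from dataclasses import dataclass


@dataclass
class PrintJob:
    """Hypothetical job record; the real database schema is not given in the text."""
    duration_min: float  # wall-clock run time of the job in minutes
    canceled: bool       # True if the user canceled the job


def cancellation_stats(jobs, min_duration_min=5.0):
    """Return (cancel_rate, wasted_hours, finished_hours).

    Jobs shorter than `min_duration_min` are excluded, mirroring the
    assumption that very short cancellations are bed-leveling restarts
    rather than manufacturing failures.
    """
    kept = [j for j in jobs if j.duration_min >= min_duration_min]
    if not kept:
        return 0.0, 0.0, 0.0
    canceled = [j for j in kept if j.canceled]
    wasted_h = sum(j.duration_min for j in canceled) / 60.0
    finished_h = sum(j.duration_min for j in kept if not j.canceled) / 60.0
    return len(canceled) / len(kept), wasted_h, finished_h
```

For example, a 3-minute canceled job would be ignored, while a 120-minute canceled job counts as two wasted hours.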
Semantic segmentation [32] of both the entire manufactured part and its separate structural regions as each layer is produced will expand the capacity of visual analysis of AM processes and bring the detection and localization of individual production errors closer. This will allow printing defects to be corrected in situ, where each successive layer can be modified depending on the deviations found in the previous one, thus improving both the mechanical and aesthetic performance of the entire object. Segmentation-based analysis also significantly reduces the requirements for camera positioning accuracy and calibration, eliminating the need for visual markers and rigid holders.
In the previous work [29], the authors demonstrated the ability of Blender [33], a free and open-source physics rendering engine, to generate photorealistic images of ideal 3-D printing processes based on existing G-code files.This work served as a milestone in the development of a deep-learning-based approach presented in this paper for the semantic segmentation of structural elements in 3-D printing environments.
Using a synthetic dataset, however, comes at the cost of a domain shift, which is often strongly associated with appearance changes [34]. When the source (synthetic images) and target (real images) domains are semantically related but differ in visual representation, directly transferring knowledge learned in one domain to the other can adversely affect segmentation performance in the latter. Therefore, Domain Adaptation (DA) is needed to learn generalized segmentation rules in the presence of a gap between the source and target dataset distributions [34], [35].
There are examples in the literature of successful synthetic-to-real (sim-to-real) DA applications. Imbusch, Schwarz, and Behnke [36] proposed an unsupervised Generative Adversarial Network (GAN)-based DA approach for a robotics environment image dataset that provides performance close to training on real data and does not require annotations of real robotic setups. Li et al. [37] presented a semantically aware GAN-based neural network model for virtual-to-real urban scene adaptation that preserves important semantic information. Lee et al. [38] introduced a sim-to-real vehicle identification technique combining DA and semi-supervised learning methods. Domain adaptation, however, is a separate area of research and is not covered in this article. Instead, the possibility of applying a cycle-consistent adversarial network (CycleGAN) [39], an image-to-image translation method, to improve segmentation was considered, since generative adversarial networks can play a significant role in domain adaptation techniques and can be used in future research. This work, therefore, makes the following contributions:
• a technique for generating synthetic image-mask pairs of layer-by-layer ideal 3-D printing processes has been developed for subsequent neural network training;
• three independent labeled synthetic image datasets for (a) the entire part, (b) the top layer, and (c) the infill, shell, and supports of 3-D printed objects have been created;
• a neural network was trained for semantic segmentation of the entire printed part, as well as its current top printing layer and internal structure;
• an image-to-image translation approach to improving segmentation results has been explored.
All the above steps are described sequentially in this article after first reviewing related works in detail. The results discuss the potential for localizing 3-D printed parts in the image frame and applying image processing methods to their structural elements for subsequent detection of manufacturing deviations.

II. BACKGROUND
Semantic image segmentation is an actively developing area of research in deep machine learning [32], [40]. The main limiting factor, however, is the difficulty of obtaining annotated databases for training machine learning architectures. This approach requires thousands of images with labeled masked regions, and producing them is extremely difficult and time-consuming: manual annotation of a single image with pixel-by-pixel semantic labels can take more than 1.5 hours [41].
The use of synthetic images, in turn, provides a segmented training database essentially "free of charge", since masked regions of interest can be annotated automatically when creating virtual physics-based renders. In addition, advances in computer graphics make it possible to generate an almost unlimited amount of labeled data by varying environmental parameters in ranges that are difficult to achieve in real conditions [34]. The success of simulated labeled data is clearly illustrated by the now-classic GTA5 [42] and Synthia [43] image sets.
There are many examples of applying synthetic datasets to real-world practical problems. Nikolenko [44] presented an up-to-date survey of the use of synthetic data in a wide variety of deep learning tasks. Melo et al. [45] outlined the most promising approaches to integrating synthesized data into deep learning pipelines. Ward, Moghadam, and Hudson [46] used a real plant leaf dataset augmented with rendered images for leaf instance segmentation to measure complex phenotypic traits in agricultural sustainability problems. Boikov et al. [47] presented a methodology for steel defect recognition in automated production control systems based on synthesized image data.
Several researchers have introduced artificial intelligence (AI)-based methods into the AM field to classify the quality of manufacturing regions, as well as to segment failed areas in 3-D printing processes. Valizadeh and Wolff [48] provided a comprehensive overview of neural network applications to several aspects of AM processes. Banadaki et al. [49] proposed a convolutional neural network (CNN)-based automated system for assessing surface quality and internal defects in AM processes. The model is trained on images captured during material layering at various speeds and temperatures and classifies five failure gradations in real time with 94% accuracy. Saluja et al. [50] utilized deep learning algorithms to develop a warping detection system. Their method extracts the layered corners of printed components and identifies warpage with 99.3% accuracy. Jin et al. [51] presented a novel CNN-based method incorporating strain to measure and predict four levels of delamination. These works, however, each address a specific set of manufacturing problems, and the developed algorithms do not readily scale or generalize to other defects.
Analysis based on semantic segmentation, in turn, has significant potential for detecting and evaluating a wide range of manufacturing defects. Wong et al. [52], [53] demonstrated U-Net CNN 3-D volumetric segmentation in AM using medical imaging techniques to automatically detect defects in X-ray computed tomography images of specimens, with a mean intersection over union (mIoU) value of 99.3%. Cannizzaro et al. [54] proposed an AI-based in situ monitoring system for emerging defects that uses automatic GAN-based synthetic image generation to augment the training data set. These functions are built into a holistic distributed AM platform that allows data from all manufacturing stages to be stored and integrated. Davtalab et al. [55] presented an automated neural network system for semantic pixel-wise segmentation, based on one million images, to detect defects in 3-D printed layers.
Having an open structured annotated database for additive manufacturing will create considerable opportunities for the development of failure detection systems in the future.Segmentation and localization of individual structural elements of manufacturing objects can make it easier to detect and track erroneous regions when they occur.

III. METHODS
Based on the most common words in 3-D print filenames stored in the Spaghetti Detective database [30], sets of labeled images of printed products at various stages of production were generated in the physics-based graphics engine [29]. These image-mask pairs were then used to train neural networks to segment manufactured parts and their structural elements. Additionally, image-to-image style translation was explored as a way to reduce the domain gap and increase segmentation precision. Segmentation efficiency was tested both on synthetic renders outside the training sets and on real images. Data and source code for this project can be obtained from the Open Science Framework (OSF) repository [56].
A. Creation of synthetic image datasets
1) Selecting CAD designs for rendering: As part of the analysis of Spaghetti Detective's user performance database [30], more than 5.6 million filenames were split into meaningful lexical parts to create a dictionary of the most frequently used words (Figure 2). These print jobs were performed by 49,000 unique users on 57,000 different 3-D printers. The average print time was 3.6 hours.
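The filename-to-dictionary step can be illustrated with a small word-frequency sketch. The tokenization rules here (drop the extension, split camelCase, keep alphabetic runs of three or more letters) are assumptions for illustration; the text does not specify how filenames were partitioned.

```python
import re
from collections import Counter


def filename_words(filename, min_len=3):
    """Split a print-job filename into lowercase word tokens (assumed rules)."""
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", filename)    # drop the file extension
    stem = re.sub(r"([a-z])([A-Z])", r"\1 \2", stem)   # split camelCase words
    words = re.findall(r"[a-z]+", stem.lower())        # digits, '_', '-' act as separators
    return [w for w in words if len(w) >= min_len]     # discard short fragments like "mm"


def word_frequencies(filenames, top_n=25):
    """Count tokens over all filenames and return the `top_n` most common."""
    counts = Counter()
    for name in filenames:
        counts.update(filename_words(name))
    return counts.most_common(top_n)
```

For instance, `"phoneStand-0.3mm.gcode"` yields the tokens `phone` and `stand`; a production version would likely also need a stop-word list and handling for non-English names.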
Based on the compiled dictionary, a set of random Standard Tessellation Language (STL) files was collected for further processing from Thingiverse [57], an open catalog of widely used computer-aided designs (CADs) for 3-D printing. These files formed the basis for generating a database of synthetic images. A complete list of the CAD designs used is available in the OSF repository [56].
Fig. 2. Distribution of the 25 most frequently used words in file names for 3-D printing. A detailed analysis of the users' print tasks database is given in the source file repository [56].
2) Graphics rendering pipeline: All the selected STL files were converted into G-codes in the free MatterControl software [58] using the same slicing parameters: 0.3 mm layer height, 0.4 mm nozzle diameter, 4 perimeters, and 30% grid infill. The resulting G-codes were then parsed layer by layer through the Blender [33] programming interface, where the extruder trajectory is converted into a set of curves with a controllable thickness parameter and preset material settings. Each G-code layer is thus transformed into an independent 3-D object. The whole rendering process is illustrated in Figure 3. The functional component of the repository [59] was used as a basis for importing G-code files into the graphics engine. To create photorealistic renders, scenes similar to real physical environments were created in Blender. The position of the camera, as well as the degree of illumination and the location of light sources, were chosen to closely match the actual workspace (Figure 4).
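The layer-by-layer parsing step can be sketched independently of Blender. The following simplified parser is not the importer from repository [59]; it only shows one way to group extruding moves by Z height, assuming absolute coordinates (G90) and absolute extrusion (M82). In the pipeline described above, each resulting layer would then become a separate curve object.

```python
import re

# Matches axis words such as X12.5, Y-3, Z.3 or E0.042 in an upper-cased line.
_AXIS = re.compile(r"([XYZE])(-?\d*\.?\d+)")


def parse_layers(gcode_lines):
    """Group extruding moves into layers keyed by nozzle Z height.

    Returns {z: [((x0, y0), (x1, y1)), ...]}, one tuple per extrusion
    segment. Assumes absolute XYZ (G90) and absolute extrusion (M82);
    travel moves without an E increase are skipped.
    """
    x = y = z = e = 0.0
    layers = {}
    for raw in gcode_lines:
        line = raw.split(";", 1)[0].strip().upper()  # drop comments
        if not line:
            continue
        cmd = line.split()[0]
        params = {k: float(v) for k, v in _AXIS.findall(line)}
        if cmd == "G92":  # position reset; only the common "G92 E0" case is handled
            e = params.get("E", e)
            continue
        if cmd not in ("G0", "G1", "G00", "G01"):
            continue
        nx, ny = params.get("X", x), params.get("Y", y)
        nz, ne = params.get("Z", z), params.get("E", e)
        if ne > e and (nx, ny) != (x, y):  # filament pushed while moving in XY
            layers.setdefault(nz, []).append(((x, y), (nx, ny)))
        x, y, z, e = nx, ny, nz, ne
    return layers
```

Real slicer output also contains relative extrusion, arcs (G2/G3), and retraction moves, which a complete importer would need to handle.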
In addition to the printed part, the scene includes point light sources to create diverse, heterogeneous all-round illumination, a "Sun" light to create uniform background lighting, a flat printing surface with realistic texture and reflectivity, and a plane with a superimposed blurred background image to create the illusion of a defocused ambient environment. The color of the plastic material and the surface characteristics of the printed part were adjusted empirically using Blender's rich shader library [60]. When simulating surface irregularities, the Noise Texture [61] and Voronoi Texture [62] nodes were used to add Perlin and Worley noise, respectively, while the "Bump" node was added to adjust the overall roughness. Photorealistic color, transparency, and reflection parameters were obtained by combining the Principled [63] (adds multiple layers to vary color, reflection, sheen, transmission, and other parameters), Glossy [64] (adds reflection with a microfacet distribution), Diffuse [65] (adds Lambertian and Oren-Nayar diffuse reflections), and Transparent [66] (adds transparency without refraction) Bidirectional Scattering Distribution Functions (BSDFs) (Figure 6).
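To make concrete what the Voronoi Texture node contributes, the following is a minimal pure-Python sketch of 2-D Worley (cellular) noise, the F1 "distance to nearest feature point" variant. It illustrates the concept only; Blender's node has its own implementation with extra options (dimensions, distance metrics, randomness), and the grid scale used here is a hypothetical choice.

```python
import math
import random


def worley_f1(x, y, seed=0):
    """2-D Worley noise F1: distance from (x, y) to the nearest feature point,
    with one pseudo-random feature point per unit grid cell."""
    cx, cy = math.floor(x), math.floor(y)
    best = float("inf")
    # Searching the 3x3 block of neighbouring cells is the usual approximation.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            gx, gy = cx + dx, cy + dy
            # Seed per cell so the same cell always yields the same feature point.
            rng = random.Random(f"{seed}:{gx}:{gy}")
            fx, fy = gx + rng.random(), gy + rng.random()
            best = min(best, math.hypot(x - fx, y - fy))
    return best


def height_map(width, height, scale=8.0, seed=0):
    """Sample F1 on a pixel grid; a bump step would treat these values as heights."""
    return [[worley_f1(i / width * scale, j / height * scale, seed)
             for i in range(width)] for j in range(height)]
```

Feeding such a height field into a bump mapping step perturbs surface normals without changing geometry, which is the role the "Bump" node plays in the shader network.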
The G-code parsing procedure relies heavily on the Blender application programming interface [67], which provides access to the properties of all shader nodes used in the scene. The entire animation process is scripted, with randomized locations of the camera, light sources, and printing bed plane set in timeline keyframes, between which the graphics engine interpolates intermediate frames. Most of the G-codes were used twice with different levels of part completion, material color, print surface texture, light source locations, and camera orientations, resulting in approximately 100 unique synthetic images for each selected CAD design.
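The randomized keyframe generation can be sketched without Blender. The parameter ranges below (camera distance, elevation, light box) are hypothetical placeholders, not values from the text. In a Blender script, each value would then be assigned to the corresponding object's `location` and recorded with `keyframe_insert(data_path="location", frame=...)`, leaving interpolation to the engine.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Keyframe:
    frame: int
    camera: tuple        # (x, y, z) camera location relative to bed centre
    light: tuple         # (x, y, z) point-light location
    bed_rotation: float  # printing-bed rotation about Z, radians


def random_keyframes(n_keys, frame_step=10, seed=None):
    """Sample camera positions on a spherical shell above the bed plus random
    light positions; all numeric ranges are hypothetical."""
    rng = random.Random(seed)
    keys = []
    for i in range(n_keys):
        r = rng.uniform(0.25, 0.45)                  # camera distance
        elev = math.radians(rng.uniform(20.0, 60.0))  # elevation above the bed
        azim = rng.uniform(0.0, 2.0 * math.pi)        # azimuth around the part
        cam = (r * math.cos(elev) * math.cos(azim),
               r * math.cos(elev) * math.sin(azim),
               r * math.sin(elev))
        light = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), rng.uniform(0.3, 0.8))
        keys.append(Keyframe(i * frame_step, cam, light, rng.uniform(0.0, 2.0 * math.pi)))
    return keys
```

Sampling in spherical coordinates keeps the camera at a plausible distance and always above the bed plane, which a uniform box sample would not guarantee.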
The built-in compositing interface [68] was used to create pixel-perfect masks for each frame (Figure 7). During slicing, each extruder path is assigned its own type, which can be visualized in pseudo colors in the slicing environment (Figure 8). In this work, the outer and inner walls were combined into one structural element, "shell". For visual segregation (masking) of individual scene elements (background, top layer, infill, shell, and support), different values of the material pass index parameter [69] were set at the G-code parsing stage. This allows each selected element to be rendered as a region filled with a specific grayscale level.
Fig. 7. Composite node network (internal structure segmentation example) assigns user-defined color labels to each pixel in the output image, depending on whether it belongs to a particular area (infill, shell, or support) of the rendered part. This creates a pixel-precise ground truth mask (red) for each output image frame (red) in the animation.
The internal physics-based path tracer Cycles [70] was used to render each frame of the animation. To reduce rendering time, the number of samples was set to 64, the total number of light path bounces was reduced to 8, and the Reflective and Refractive Caustics features were disabled. Cycles performance depends on the system's computational power. With an 8 GB GPU, a 256×256 render tile size, and an output image size of 1024×1024 pixels, processing a single frame takes up to one minute, depending on the scale and geometric complexity of the scene within the camera viewport. Rendering an entire 50-frame animation this way can take up to one hour.
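The timing figures above support a quick worst-case budget. The sketch below assumes one rendered frame per image-mask pair (the compositor writes the mask in the same pass) and the stated upper bound of one minute per frame. Under these assumptions, a 50-frame animation takes about 0.83 h, and all 10,473 pairs in the three datasets take roughly 175 GPU-hours.

```python
def render_hours(n_frames, minutes_per_frame=1.0):
    """Worst-case wall-clock rendering time in hours."""
    return n_frames * minutes_per_frame / 60.0


# Image counts of the three synthetic datasets described in the text.
DATASETS = {"entire part": 5763, "top layer": 3570, "internal structure": 1140}

if __name__ == "__main__":
    print(f"50-frame animation: {render_hours(50):.2f} h")
    for name, n in DATASETS.items():
        print(f"{name}: {render_hours(n):.1f} h")
    print(f"all datasets: {render_hours(sum(DATASETS.values())):.1f} h")
```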
3) Synthetic image datasets: For the subsequent semantic segmentation task, three separate datasets were created (Figure 9). A total of 5,763 image-mask pairs of 1024×1024 pixels were generated for segmentation of the entire 3-D printed part, 3,570 for top layer segmentation, and 1,140 for infill, shell, and support (internal layer structure) segmentation.

B. Semantic image segmentation
Minaee et al. [32] and Ulku & Akagündüz [40] presented comprehensive overviews of the state of the art in semantic segmentation. As works [71]-[73] show, the U-Net family of neural network architectures achieves high segmentation efficiency with small amounts of training data. The DeepLab architecture, in turn, is one of the common baseline architectures in domain adaptation research [74]-[76].
This work employs the U-Net architecture [77] and its multi-class adaptation [78] due to its efficiency and simplicity.

C. Image-to-image translation
To potentially improve the efficiency of semantic segmentation, the application of the unpaired image-to-image translation method based on the CycleGAN network [39] was considered. This method learns the mapping between the source domain (real images) and the target domain (synthetic images) without paired data samples, by minimizing adversarial losses together with the cycle consistency loss L_C (Figure 10).
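For reference, the cycle consistency loss as defined in the original CycleGAN formulation [39] can be written as below, with G mapping real images x to the synthetic domain and F mapping synthetic images y back to the real domain. The weight λ = 10 is the default reported by the CycleGAN authors; the value used in this work is not stated.

$$\mathcal{L}_C(G,F) = \mathbb{E}_{x\sim p_{\text{real}}}\big[\lVert F(G(x)) - x\rVert_1\big] + \mathbb{E}_{y\sim p_{\text{syn}}}\big[\lVert G(F(y)) - y\rVert_1\big]$$

$$\mathcal{L}(G,F,D_{\text{syn}},D_{\text{real}}) = \mathcal{L}_{\text{GAN}}(G,D_{\text{syn}}) + \mathcal{L}_{\text{GAN}}(F,D_{\text{real}}) + \lambda\,\mathcal{L}_C(G,F)$$

The first term penalizes a real image that does not survive the round trip real → synthetic → real; the second does the same for synthetic renders. This is what allows training without paired real and synthetic views of the same scene.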
For this task, 589 synthetic renders and 794 real images of 3-D printed parts were manually selected. Training yields two generators: one converts real-domain images into their synthetic counterparts, and the other does the reverse (Figure 11). As Figure 11 shows, translating a synthetic render into a real image makes colors more natural, while translating a real image into a synthetic one reduces the contrast and saturation of the reflections on the printing bed and of incidental filament strings. This characteristic can improve segmentation of low-quality images.

IV. RESULTS
Semantic segmentation results for several real images are presented in Figure 12. The neural network was trained on synthetic renders without using the style translation technique.
Quantitative results are shown in Table 1. The test datasets include synthetic renders of STL models both included in and excluded from the training dataset. For 3-D models included in the training dataset, the color, angle, and environment parameters were changed so that the test images do not match the data the model was trained on.
The intersection over union (IoU) quantifies the degree of overlap (from 0 to 100%) between the ground truth mask and the predicted segmentation mask, where a larger value indicates a more accurate segmentation; the mIoU is the mean IoU across the corresponding classes in the dataset. mIoU scores for real images were calculated only for segmentation of the entire part, since obtaining manually labeled ground truth masks for the top layer and the internal structure of the part is a nontrivial task, given the geometric complexity of the infill elements.
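The IoU and mIoU definitions above can be sketched in pure Python, assuming masks stored as 2-D lists of integer class indices. The text does not state whether mIoU was pooled over a whole dataset or averaged per image; this sketch simply accumulates counts over whatever pixels it is given and skips classes absent from both masks.

```python
def iou_per_class(gt, pred, num_classes):
    """Per-class IoU (%) between two equally sized label masks.

    IoU_c = |GT_c ∩ P_c| / |GT_c ∪ P_c|; classes absent from both masks get None.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            if g == p:          # pixel is in both GT_c and P_c
                inter[g] += 1
                union[g] += 1
            else:               # pixel is in GT_g only and in P_p only
                union[g] += 1
                union[p] += 1
    return [100.0 * i / u if u else None for i, u in zip(inter, union)]


def mean_iou(gt, pred, num_classes):
    """Mean IoU (%) over classes present in either mask."""
    ious = [v for v in iou_per_class(gt, pred, num_classes) if v is not None]
    return sum(ious) / len(ious)
```

For a one-row example with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 scores 50% and class 1 scores about 66.7%, giving an mIoU of about 58.3%.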
As can be seen from Table 1, the segmentation accuracy on real images (78.16%) is inferior to that on synthetic data (94.90%), which indicates the need for additional research on domain adaptation. Detecting the top layer is a more complex task for the neural network than segmenting the entire part, which is clearly noticeable in the results within the same dataset (mIoU 73.33% for the top layer versus 99.74% for the background). Shell segmentation has the lowest score (mIoU 55.31%). This is likely due to the variety of shell geometries and the lack of the characteristic texture that infill and support areas have.
To analyze the influence of style transfer (ST) on semantic segmentation, separate CNNs were trained on three datasets of a single part (Figure 13). The synthetic and real datasets consist of 49 and 36 image-mask pairs, respectively. To compare the domains, t-distributed stochastic neighbor embedding (t-SNE) [79], [80] projections of the normalized bottleneck layers of the trained U-Net models were used (Figure 14). This nonlinear dimensionality reduction technique was applied to the 512-dimensional normalized vectors at the narrowest part of each trained model to visualize the affinity of the domains in latent feature space. As can be seen from Figure 14a, the feature space of the real domain (orange) moves closer to the synthetic data (blue) after image-to-image style translation (black). In addition to the t-SNE projections, the segmentation performance on the real image data after ST was also analyzed (Figure 14b). The heatmap columns represent the data on which each neural network model was trained, and the rows represent the input data to which segmentation was applied. The highest mIoU, as expected, was observed on the datasets on which each model was trained. When the real input data were converted by image-to-image translation, however, the segmentation score increased from 61.10% to 75.19% for the model trained solely on synthetic data. This result is the most valuable, since in real conditions training a convolutional network on real data may not be possible due to the lack of ground truth masks. It indicates that ST as a precursor to domain adaptation can significantly improve real 3-D printing image segmentation in situations where a model trained on synthetic data is the only tool available. Sample image segmentation results before and after style translation are shown in Figure 15.
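The text does not specify how the bottleneck feature maps were reduced to 512-dimensional normalized vectors. One common choice, assumed here purely for illustration, is global average pooling over the spatial dimensions followed by L2 normalization:

```python
import math


def bottleneck_embedding(feature_map):
    """Global-average-pool a C x H x W feature map (nested lists) into a
    C-dimensional vector, then L2-normalise it (assumed reduction)."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled
```

Embeddings computed this way for the synthetic, real, and style-transferred images could then be stacked and projected to two dimensions with a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE`.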
As can be seen from Figure 15, real-to-synthetic style transfer reduces the saturation of incidental filament strings and reflections on the printing platform, which, in turn, affects the semantic segmentation results. Image-to-image translation, therefore, could be a powerful tool for further improving segmentation performance through domain adaptation techniques.
This work continues the authors' previous research on physics-based rendering and demonstrates the significant potential of synthetic data and machine learning in the field of additive manufacturing. Owing to the relative simplicity of virtual printing and training data generation, the contours of a manufactured part can be segmented at every stage of its completion using a single camera in an arbitrary position. This reduces the requirements for camera calibration and eliminates the need for visual markers to tightly bind the image frame to the coordinate system of the 3-D printing space. It also offers the flexibility to be used on any type of 3-D printing system with the addition of an after-market camera.
The limitations of the developed method are the need to create synthetic images and expand the training dataset for each new manufactured part, as well as the need to implement transfer learning to improve segmentation accuracy. Additional research is also required on applying domain adaptation based on existing state-of-the-art techniques [81]-[83].
Together with edge-based markerless tracking [84], [85], the developed technique can become an integral part of a 3-D printing control and monitoring system such as OctoPrint [86].In the future, this will make it possible to implement an inline comprehensive system for recognizing the type of a part being produced and determining its location and orientation in the workspace, as well as for tracking its manufacturing deviations.

V. CONCLUSIONS
The semantic segmentation framework for additive manufacturing can enhance the visual analysis of manufacturing processes and allow the detection of individual manufacturing errors, while significantly reducing the requirements for positioning accuracy and camera calibration.
The results of this work will allow 3-D printed parts to be localized in captured image frames and image processing techniques to be applied to their structural elements for subsequent tracking of manufacturing deviations. Image style transfer is of significant value for further research on adapting the domain of synthetic renders to real images of 3-D printed products.
The demonstrated methodology achieved the following mIoU scores on the synthetic test datasets: entire printed part 94.90%, top layer 73.33%, infill 78.93%, shell 55.31%, and support 69.45%. The results illustrate the effectiveness of the

Fig. 1. Analysis of 3-D printer users' activity over 2.3 years. The runtime distribution shows a 24% failure rate for all 5.6 million printing tasks longer than 5 minutes.

Fig. 3. Synthetic AM database creation pipeline.Each 3-D part in the form of an STL (green) file is converted into a set of printer tool head trajectories (G-code, blue), which is the input parameter of the automated scripted section (gray).Blender environment (textures, camera, lights) and compositing settings can also be automated in the future.The image-mask pairs (red) are the result of a frame-by-frame animation rendering for each individual G-code file.

Figure 5 illustrates several examples of realistic textures for the printing bed plane.

Fig. 5. Texture samples for the printing bed. More than 15 photographs of surfaces such as wood, metal, paper, stone, and others were superimposed on the virtual working area. Variations in lighting, cropping, scaling, and image orientation during animation allow the creation of unique backgrounds.

Fig. 6. The shading node network was developed experimentally to maximize the realism of the generated renders. The creation of all connections and node settings is fully automated in code, which provides the flexibility to adjust the color, transparency, reflectivity, and other characteristics of the output material (red).

Fig. 8. 3-D model slicing procedure. (a) Whole part in STL format. (b) Internal structure of the sliced layers (red: outer shell, green: inner shell, yellow: infill, blue: support). (c) Side view illustrating the current printing layer (top layer at each manufacturing stage).

Fig. 10. Unpaired image-to-image translation using the cycle-consistent adversarial network. Handpicked images of real and virtual printed parts were loaded into CycleGAN, which learns to map real domain images to their synthetic counterparts and vice versa, minimizing the cycle consistency loss L_C.

Fig. 11. Image-to-image style translation example. Translating a real image into its synthetic version reduces the contrast and saturation of the reflections and incidental filament strings.

Fig. 12. Results of semantic segmentation for several real images. The neural network was trained on similar synthetic 3-D models; the color, printing surface texture, and slicing parameters, however, differ from those used in the training dataset.

Fig. 13. Datasets for the style transfer influence analysis. (a) Synthetic data. (b) Real data. (c) Real data after style transfer. The upper row shows sample images and the lower row the corresponding ground truth masks.

Fig. 14. Domain comparison via t-SNE projections (a) and segmentation performance before and after style translation (b).

Fig. 15. Results of image segmentation before and after style translation. Real-to-synthetic style transfer reduces the saturation of incidental filament strings and reflections on the printing platform, which, in turn, affects the semantic segmentation results.

TABLE I
SEGMENTATION RESULTS FOR SYNTHETIC TEST DATASETS (mIoU SCORES, %)