Vacuum Thermoforming Process: An Approach to Modeling and Optimization Using Artificial Neural Networks

In the vacuum thermoforming process, the combined effects of the processing parameters on the set of product deviations are conflicting and non-linear, which makes the mathematical modelling of the process complex and multi-objective. This work therefore developed prediction and optimization models using artificial neural networks (ANNs), with the processing parameters as the network inputs, the group of deviations as the outputs and, in addition, an objective function for deviation minimization. The data for the ANNs came from samples of a standard polystyrene product, produced in experimental tests following a fractional factorial design ($2^{k-p}$). Preliminary computational studies were carried out with various ANN structures and configurations on the test data until satisfactory models were reached; multi-criteria optimization models were then developed. Validation tests of the models' predictions and solutions showed that their prediction errors lie within the range of values found in the samples produced. Thus, it was demonstrated that, within certain limits, ANN models are valid for modelling the vacuum thermoforming process with multiple input parameters and multiple objectives, using a reduced quantity of data.


Introduction
Thermoforming of polymers is a generic term for a group of processes that involve forming or stretching a preheated polymer sheet over a mold to produce a specific shape. It is considered one of the oldest methods of processing plastic materials [1]. The process that uses vacuum (negative pressure) to stretch the heated polymer sheet over a mold is called vacuum forming or vacuum thermoforming [2]. Specifically, in this technique a sheet of thermoplastic material is preheated by a heating system (Figure 1a,b) and forced against the mold surface (positive or negative) by the negative pressure produced in the space between the mold and the sheet (Figure 1c): suction holes in the mold and a vacuum pump "suck" the air from that space and "pull" the sheet against the mold surface, which transfers its shape to the sheet. The part is finished after cooling and removal of excess material (Figure 1d) [3,4]. The typical sequence of this technique, according to Ghobadnam et al. [5], is presented in Figure 1.
However, in practice, predicting the final result of the process and the quality of the product from prior knowledge or trial-and-error methods can be far more difficult. The evaluation of the final performance of the system is sometimes complex because of various factors, such as the raw material of the mold, the equipment characteristics, the type and raw material of the sheet, and others [6–8]. In addition, the process often exposes conflicts between quality aspects and the adjustment of process control variables [9,10]. In recent years, several authors have developed work aimed at modelling and predicting the quality of the final product of the vacuum thermoforming process.
Engelmann and Salmang [6] presented a computational statistics model and data analysis, and Sala et al. [11] and Warby et al. [12], with a complementary focus, worked on the development of an elastic-plastic model for thickness analysis. Many studies concentrated on aspects of mold geometry and process parameters to verify their influence on the wall thickness distribution [5,13–15]. A hierarchically ordered multi-stage optimization strategy for solving complex engineering problems was also developed [3,16]. Martin et al. [17] studied the instrumentation and control of thermoforming equipment, with real-time analysis and control of multiple variables; their results demonstrate the accuracy of the controller and its prospective real-time application. Some studies focused on modeling, simulation, and optimization of the heating system by different methods and techniques [18–20].
However, for complex manufacturing processes such as this one, Meziane et al. [21], Tadeusiewicz [22] and Pham [23] suggest that traditional approaches to process control fail to capture all aspects of the process or of its subsystems. The number and type of variables involved can make the computational and mathematical modelling of the system a multi-variate, multi-objective, complex problem with non-linear and conflicting objectives [9,10,24]. According to these authors, several studies presented in the last few years have used computational intelligence (CI) techniques to model the non-linear characteristics and conflicting objectives of these processes. This research used a series of computational tools for problems whose solution or computational modeling requires human-like intelligence, with artificial neural networks (ANNs) being the most intensively investigated [25,26].
ANNs are mathematical computational models inspired by biological neural structures, or biological neurons [27,28]. The artificial neuron, or perceptron, is made up of three elements: an input "X", a weight "W", and a combiner (sum) function with an activation function (φ), which may be linear or not; in some cases, a bias, θ_j, is included [29]. The response "Y" of the ANN is obtained by applying the activation function to the output of the combiner or sum function, in matrix form Y = φ(W × X + θ) [30].
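The neuron response Y = φ(W × X + θ) can be illustrated with a minimal Python sketch. The function name, the example values, and the choice of the hyperbolic tangent as φ are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Return Y = phi(W . X + theta) for a single artificial neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Combiner (sum function): weighted sum of the inputs plus the bias.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function phi applied to the combiner output.
    return activation(net)


# Illustrative values only: two inputs, two weights, one bias.
y = neuron_output([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(y)
```

Passing `activation=lambda v: v` instead gives a linear neuron, which is the "linear or not" choice mentioned in the text.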
One basic ANN model is the multi-layer perceptron (MLP), which is typically composed of artificial neurons interconnected in a system or mesh of nodes. The MLP generally consists of "n" neurons divided into an input layer, an output layer, and one or more hidden layers. Between layers, the neurons are connected through adjustable weights (analogous to biological synapses), which record the knowledge learned about the relationship between the network inputs and outputs. Furthermore, the layers are coupled externally to their supervised training or learning algorithms [26,27].
In the MLP network, training is a cyclical process: the input and output data (patterns) are presented to the network by its training algorithm, and a performance index is calculated in each training round, or epoch. This supervised training can continue until the ANN model "learns" to produce the desired outputs for the inputs of its patterns [27], until a performance index of the network, such as the mean square error (MSE), reaches a value equal to or less than a specified error, or until the network reaches any other stop criterion specified during model programming. For this purpose, the networks are implemented with training algorithms, the most commonly used being the back propagation (BP) and Levenberg-Marquardt (LM) algorithms. The BP algorithm is a supervised (batch) learning method that seeks to minimize a global error function, the sum squared error (SSE), over the j neurons of the layer(s) at each epoch [31,32]. The LM algorithm, developed by Hagan and Menhaj [33] and implemented in MATLAB® software (MathWorks Inc., Natick, MA, USA) by Demuth and Beale [34], solves the minimization problem of a non-linear function by combining the Gauss-Newton method and the gradient descent algorithm, using Jacobian matrices [35].
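For reference, the weight update usually associated with the LM algorithm can be written as below. The notation is the generic textbook one and is assumed here rather than taken from the source: w is the vector of weights and biases, e the vector of network errors, J the Jacobian of e with respect to w, and μ a damping factor.

```latex
E(\mathbf{w}) = \tfrac{1}{2}\,\mathbf{e}^{\mathsf{T}}\mathbf{e},
\qquad
\mathbf{J} = \frac{\partial \mathbf{e}}{\partial \mathbf{w}},
\qquad
\Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e}
```

For small μ the step approaches the Gauss-Newton step, and for large μ it approaches a short gradient descent step, since the gradient of E is Jᵀe. This is the combination of the two methods described in the text.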
The ability to work with complex, multi-dimensional, and multi-criteria problems makes ANNs one of the main methods used in engineering for computational modeling [22]. A multi-criteria optimization model is one in which several objective functions are to be optimized simultaneously; in some cases, these functions conflict or compete with each other, so that no single solution can, for example, maximize all the objectives jointly [36].
In this context, some authors have developed computational models based on computational intelligence (CI) techniques, with or without statistical optimization, for the analysis of quality characteristics of parts produced by vacuum thermoforming; some of them are described by Chang et al. [24]. Likewise, Yang and Hung [9,10] proposed an "inverse" neural network model to predict the optimum processing conditions. The network inputs in that work included the thickness distribution at different positions on various parts, and the outputs, the optimal process parameters, were obtained by ANNs. Additionally, Küttner et al. [3] and Martin et al. [17] presented a methodology that uses an ANN to optimize the production technologies together with the product design. Finally, Chang et al. [24] tested an inverse ANN model on a laboratory-scale machine, using the desired local thicknesses as inputs and the processing parameters as outputs, with the aim of process optimization.
Thus, the current work first studied both the values of the manufacturing parameters and the quality of samples produced by the vacuum thermoforming process on a laboratory scale. These initial experimental results were then used to investigate the computational modeling of the process through several ANN models that aimed to predict the deviation values correctly for a given set of manufacturing parameters. This sequence of studies allowed the investigation of multivariable, multi-objective optimization algorithms that use the ANN models to obtain optimum values of the manufacturing parameters together with predictions of the group of product deviations. Finally, validation and confirmation tests were carried out to evaluate the ability of each model to simulate the process under new experimental conditions, to estimate deviations, to verify the efficiency of the approach, and to validate the proposed methodology.

Material, Equipment, and System
For the three-dimensional (3D) design of the model and mold, aspects inherent to the manufacturing process and a shrinkage of 0.5% were considered [8,37], and computer-aided design (CAD) software integrated with computer-aided manufacturing (CAM) was used. The mold was machined on a computer numerical control (CNC) machine using plates of medium-density fiberboard (MDF) as the raw material. It has the dimensional and geometric characteristics of a standard product, and a 3D coordinate measuring machine (3D CMM) was used to determine the dimensional and geometric deviations present in the mold.
A semi-automated vacuum-forming machine was developed and automated by the researchers. This equipment can work with sheets 0.1 to 3.0 mm thick and has a useful area of 280 × 340 mm, a mold displacement (z axis) of 150 mm, a vacuum pump of 160 mbar with a 1.0 CV motor, an infrared heating system composed of two resistors of 750 and 1000 W, movement by pneumatic systems, and temperature data acquisition by type "K" thermocouples and non-contact infrared sensors. The system is programmable and controlled by a commercial personal computer (PC) integrated with an Arduino microcontroller (Arduino Company Open Source Hardware, Somerville, MA, USA).
In this work, white laminated polystyrene (PS) sheets measuring 2.0 × 2.5 m, with a thickness of 1.0 mm, were used to manufacture the parts. The sheets were cut into 300 × 360 mm pieces (machine size), cleaned with water and neutral-pH liquid soap, and then dried and packaged in plastic film packages that had previously been heated at 50 °C for two hours.
The commercial equipment and software used in the development of this study are described below and included: a Micro-Hite 3D TESA TM 3D coordinate measuring machine (

Parameters and Measurement Procedure
There is no consensus among authors about the measurement parameters and procedures. According to Küttner et al. [3], Muralisrinivasan [4], Yang and Hung [9,10] and Chang et al. [24], several control and quality parameters can be used in the vacuum thermoforming process, depending on the type of equipment, mold, and product geometry. Throne [2], Klein [7], Throne [8] and Chang [24] explain that there is no specific measurement procedure or equipment to be used. Thus, for this work, the parameters used to control the deviations were defined as described in the following paragraphs, together with the scales, measurement procedures, and tolerances.
For the measurement of the errors, the 3D CMM was used with a 4 mm diameter solid probe, calibrated with an error of ±0.004 mm, which has an accuracy of 0.003 mm, together with CAI software. The reference values for the dimensions were calculated from the final dimensions of the mold. Additionally, according to Throne [2] and Klein [7], a deviation of ±1% for linear dimensions and ±50% for flatness on surfaces is acceptable and, as a reference, the values calculated for the dimensions were adopted as the general criteria for acceptance of the sample dimensions. Figure 2 presents the geometry of the standard product, with the dimensions and deviations to be measured in the samples.
The dimensional deviation of height ($DDH_i$), or DEV 01, was defined as the difference between the measured sample height and TSH, where TSH is the theoretical sample height; a negative (−) mean value indicates that the height is less than the ideal and a positive (+) mean value that it is greater than the ideal. For the calculation of DEV 01, eight (8) points were collected on each surface. In all equations in this section, the index i represents the i-th analyzed sample.
Polymers 2017, 9, x FOR PEER REVIEW 5 of 18
The deviation of the diagonal length ($DDL_i$), or DEV 02, is calculated as the difference between the value of $MLDS_i$ and the value of TDL:
$$DDL_i = MLDS_i - TDL$$
where $MLDS_i$ is the measured length of the diagonal in the sample, defined in this work as the quadratic relation of the lateral distances of the upper end of the sample (length and width), and TDL is the theoretical diagonal length of the sample, TDL = 207.97 mm, so:
$$DDL_i = MLDS_i - 207.97\ \text{mm}$$
For the calculation of DEV 02, five points were collected along each side of the samples. A negative (−) mean value indicates that the length is smaller than the ideal and a positive mean value that it is greater than the ideal.
The geometric deviation of flatness ($GD_i$), or DEV 03, which has a zero value (0) for an ideal surface or a positive value otherwise, was calculated from $MGDS_i$ and TGDS, where $MGDS_i$ is the measured geometric deviation of flatness in the sample and TGDS is the theoretical geometric deviation of flatness of the sample, that is, the calculated deviation, which was 0.11 mm. For DEV 03, nine (9) points were collected on the lower/bottom surface of the samples.
DEV 04, or the geometric deviation of side angles ($GDSA_i$), is expressed in this study in terms of z, the number of sides, and s, the evaluated face. For DEV 04, nine (9) points were collected on each surface analyzed.
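The deviation definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the "quadratic relation" for the diagonal is read as the square root of the sum of squares of length and width, the flatness deviation is assumed to be a simple difference, and all measured values in the example are invented. Only TDL = 207.97 mm, TGDS = 0.11 mm, and the ±1% linear tolerance come from the text.

```python
import math

TDL_MM = 207.97   # theoretical diagonal length (from the text)
TGDS_MM = 0.11    # theoretical flatness deviation (from the text)


def diagonal_deviation(length_mm, width_mm, tdl_mm=TDL_MM):
    """DEV 02: measured diagonal (assumed sqrt(L^2 + W^2)) minus TDL."""
    mlds = math.hypot(length_mm, width_mm)
    return mlds - tdl_mm


def flatness_deviation(mgds_mm, tgds_mm=TGDS_MM):
    """DEV 03, assumed form: measured minus theoretical flatness deviation."""
    return mgds_mm - tgds_mm


def within_linear_tolerance(deviation_mm, nominal_mm, tol=0.01):
    """Check the +/-1% acceptance criterion for linear dimensions."""
    return abs(deviation_mm) <= tol * nominal_mm


# Hypothetical measurements, for illustration only.
dev02 = diagonal_deviation(150.0, 144.0)
print(round(dev02, 3), within_linear_tolerance(dev02, TDL_MM))
print(round(flatness_deviation(0.25), 3))
```

A negative `dev02` means the measured diagonal is shorter than the ideal, matching the sign convention described for DEV 02.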

Experimental Study
In this research, we used the manufacturing parameters (factors) described by Throne [2] that are compatible with the sample geometry and the equipment, namely: (A) heating time, in seconds (s); (B) electric heating power, in percent (%); (C) mold actuator power (in bar and cm/s); (D) vacuum time (s); and (E) vacuum pressure, in millibar (mbar). Table 1 shows the levels/values for each parameter.
The experiment comprised 68 tests planned as a $2_V^{5-1}$ fractional factorial design (Montgomery [38]), with 16 process parameter settings and one center point. For each setting and for the center point, two (2) replicates were performed in random sequence, and for each replicate a sample and a repetition were manufactured in the same sequence, totaling 68 pieces (four samples per process parameter setting).
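The run count of the design can be checked with a short Python sketch. The generator E = A·B·C·D is the usual choice for a resolution V half fraction of five factors, but the source does not state which generator was used, so it is an assumption here, as are all names in the code.

```python
from itertools import product

FACTORS = ("A", "B", "C", "D", "E")


def fractional_factorial_2_5_1():
    """Return the 16 coded runs of a 2^(5-1) design, assuming E = A*B*C*D."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c * d  # assumed generator (defining relation I = ABCDE)
        runs.append((a, b, c, d, e))
    return runs


# 16 factorial settings plus one center point (all factors at coded level 0).
settings = fractional_factorial_2_5_1() + [(0, 0, 0, 0, 0)]

SAMPLES_PER_SETTING = 4  # two replicates, each with a sample and a repetition
total = len(settings) * SAMPLES_PER_SETTING
print(len(settings), total)  # prints "17 68"
```

The result agrees with the 17 settings and 68 pieces reported in the text.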
The 68 PS samples were produced and then cooled completely in an air-conditioned room at 22 °C with 60% humidity. Afterwards, the inspection methods described in the previous section were applied to quantify the linear and geometric dimensions of the samples. Table 2 shows the types of deviations and the respective values of the sample means (over four samples), the accuracy of the estimate of the sample mean (AE), and the standard deviation (S) of the estimate of the mean [38], for the 17 process parameter settings tested (center point, test No. 17). The data for each type of deviation are well distributed, except for one (1) point for DEV 03, corresponding to standard test 1 (samples 26 and 31 and their repetitions), which is an outlier.

Analysis of Data
First, an analysis of variance (ANOVA) was carried out to test the factors and their first- and second-order effects and to evaluate whether each factor was significant. The ANOVA results for the deviations versus the factors studied are summarized in Table 3, the F-test table, with a confidence level of 95% (α = 0.05), where the critical test value for the F distribution is $f_{0.05;1;17} = 4.45$. In general, for the main effects in Table 3, factors "A" and "B" are the most significant overall and, in particular, for DEV 01. For DEV 02, manufacturing parameter B stands out as significant; for DEV 03, all factors are significant; and for DEV 04, the most significant parameters are, in sequence, B, A, and D. Furthermore, many interaction effects are significant for the deviations. It is concluded that the critical manufacturing parameters for the deviations analyzed are the electric heating power (B), followed by the heating time (A). In addition, with the exception of the vacuum pressure factor (E) for the deviation of the diagonal length (DEV 02), every factor, or one of its interaction effects, is significant for at least one of the deviations.
Figure 3 presents the mean deviation values at each level of each factor, for each type of deviation. The figure confirms that the most relevant factors are those related to heating (A and B). In general, it also reveals that there is no predominant relationship between factor levels and lower ranges of deviations, and that the relationships between factors are not proportional. Furthermore, changing any input variable (+1 or −1) modifies at least one type of deviation. It can be concluded from this analysis that the modification of factor levels cannot be studied in isolation for each type of deviation. Therefore, the factors must be evaluated simultaneously, and none of them, or their second-order interactions, can be eliminated from a study or computational model of the process, since each is significant for at least one type of deviation.
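The per-level means discussed for Figure 3 correspond to a main-effect calculation on the coded design. A minimal sketch is given below; the function name and the small two-factor data set are invented for illustration and are not the values of Table 2 or Table 3.

```python
def main_effect(runs, responses, factor_index):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [y for run, y in zip(runs, responses) if run[factor_index] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical 2^2 example: coded runs and one response value per run.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
responses = [1.0, 3.0, 2.0, 4.0]

effect_a = main_effect(runs, responses, 0)
effect_b = main_effect(runs, responses, 1)
print(effect_a, effect_b)  # prints "2.0 1.0"
```

Center-point runs (coded level 0) are ignored by this function, so it can be applied directly to a design that includes them.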

Modeling, Tests, and Selection of Artificial Neural Network Models
For the programming tests of the multilayer ANN models, the input data of the networks were the sequence of factors (process parameter settings) and factor levels of the $2_V^{5-1}$ fractional factorial design with center points. The output data are the sample means of the deviation results (Table 2).
The networks were tested with the back propagation and Levenberg-Marquardt training algorithms. The transfer function "tansig" was used in the first layer and, in the other layers, combinations of the functions "purelin" and "tansig" were tested. The various network architectures tested were composed of an input layer with five values ($X_i$), an output layer with four values ($Y_j^{l}(p)$), and l hidden layers with j neurons in each. Figure 4 presents the general architecture of the ANN used.
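The forward pass of such a network (five inputs, one hidden layer, four outputs) can be sketched in plain Python as below. The hidden layer size of eight neurons, the random weights, and all function names are assumptions for illustration; the sketch is not the authors' MATLAB code and performs no training. `tansig` is written as the hyperbolic tangent, to which MATLAB's function of that name is mathematically equivalent.

```python
import math
import random


def tansig(x):
    """Hyperbolic tangent sigmoid transfer function."""
    return math.tanh(x)


def purelin(x):
    """Linear transfer function."""
    return x


def init_layer(n_in, n_out, rng):
    """Create random weights (n_out rows of n_in values) and biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def layer_forward(inputs, layer, activation):
    """Apply one layer: activation(W x + b) for every neuron."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(inputs, layers, activations):
    """Propagate the inputs through all layers in sequence."""
    signal = inputs
    for layer, activation in zip(layers, activations):
        signal = layer_forward(signal, layer, activation)
    return signal


rng = random.Random(0)
sizes = [5, 8, 4]  # 5 inputs, 8 hidden neurons (hypothetical), 4 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# One coded process setting (factors A..E) mapped to four predicted deviations.
outputs = mlp_forward([1, -1, 1, -1, 1], layers, [tansig, purelin])
print(outputs)
```

Swapping the activation list to `[tansig, tansig]` gives the all-sigmoid variant that the text also mentions testing.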

As general training parameters of the ANNs, the following were used: learning rate = 0.001, ratio to decrease learning rate = 0.001, maximum error increment = 0.001, and network performance function = "mae". As general stop parameters of the network, the following were used: performance goal = 0, minimum performance gradient = $1 \times 10^{-25}$, maximum number of epochs to train = 10,000, maximum number of validation increases = 100, and maximum momentum constant = $1 \times 10^{308}$. Additionally, the mean absolute error (MAE) was adopted instead of the MSE as the performance parameter of the network, with MAE ≤ 0.145 (the general MAE of the mean deviation in the samples). Equation (6) describes the calculation of the MAE.
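The MAE criterion can be illustrated with the standard definition of the mean absolute error, i.e., the average of the absolute differences between target and predicted values. The sketch below assumes that this standard form is what is meant; the function name is hypothetical and, apart from the pair −0.222 / −0.254 quoted later in the text and the 0.145 limit, the numbers are invented.

```python
def mean_absolute_error(targets, predictions):
    """Standard MAE: mean of |target - prediction| over all patterns."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and equal in length")
    return sum(abs(t - y) for t, y in zip(targets, predictions)) / len(targets)


MAE_LIMIT = 0.145  # acceptance limit stated in the text

# Illustrative values only (not the data of Table 2).
targets = [-0.222, 0.310, 0.120, 0.933]
predictions = [-0.254, 0.300, 0.150, 0.900]

mae = mean_absolute_error(targets, predictions)
acceptable = mae <= MAE_LIMIT
print(round(mae, 5), acceptable)
```

Unlike the MSE, this index is expressed in the same units as the deviations, which makes the comparison with the 0.145 limit direct.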
For the development of the multi-criteria optimization algorithms based on the ANN models, the script codes were implemented and processed using MATLAB® software. In each computational test of an optimization model, for the patterns shown to the ANN, the four initial solutions and the MAE values were recorded. Then a new test of the algorithm was recursively initialized. When the model reached an improved general MAE value in a new test run, the code recorded all input and output data of the network and classified them in a sequence of solutions; if the MAE did not improve, the algorithm continued the tests until it reached a network stop criterion and initialized a new model. At each renewal of the network by a stop criterion, all weights and biases were updated with random values. Each model was tested for up to 2000 epochs or for a total simulation time of 1020 min.
Table 4 summarizes the performance and processing values of the main multi-criteria ANN models and the data of the ANNs tested. The table shows the evolution of the models as their characteristics were modified, applying techniques to improve or simplify the ANN already discussed in other works: changing the training algorithm (models "D", "K", etc.), modifying the network structure (models "H", "M", etc.), modifying the transfer functions of the layers (models "T", "W", etc.), adjusting the proportion between the number of network patterns and the number of neuron layers (models "P", "V", etc.), and adjusting the amount of training data and test data of the models [39–41]. In Table 4, model "A" was the first satisfactory solution (MAE ≤ 0.145); however, it has a network structure with many nodes, a considerable number of weights and biases and, in addition, a significant processing time, which results in slow computing.
Models D, H, K, M, O, P, and T are intermediate models that presented problems which were later resolved or improved; models "D" and "K", for example, have an MAE > 0.145, i.e., prediction errors higher than those found in the process (Table 2). Models V, X, Y, and Z generally achieved the best performance, with prediction errors considerably lower than the limits found in the process samples. These models are theoretically similar and have a network structure that simplifies and reduces the processing time, with differences in the training process, the functions used, and the amount of data. Because the amount of data and the functions used modify the ANN models generated, the values of the weights and biases cannot be assumed to be the same and, consequently, neither the predicted values (for 68 output data) nor the general performance of the ANNs are the same. Figure 5 presents the values predicted by these models and by model "A" for each type of deviation, together with the target values of each pattern.
As seen in Figure 5, model "A" has significant prediction errors in all deviations, most evidently in DEV 02, where, for example, data value 5 is −0.222 ± 0.010 mm while model "A" predicts −0.254 mm. Model "V" has several prediction errors, notably for data value number 5 of DEV 01 and data value number 5 of DEV 04. Of the other models, "X" generally presents the worst prediction performance, with one significant prediction error for test 9 of DEV 03, considering the sample variation of 0.933° ± 0.132°.
Models "Y" and "Z" have negligible errors, which lie within the ranges found in the samples and are considerably lower than those of the other models. The gain in performance is due to the increase in the number of training and test data.
In Figure 6, the response surfaces of the "V", "Y", and "Z" models for temperature vs. types of deviations are shown. Comparing them, we observe that, although the "V" model has a network structure similar to that of the "Y" and "Z" models, the use of a linear transfer function (purelin) in the network contributed to a "linearization" of the surface and to generalization errors (Figure 6(C1-C4)); this was generally observed in other models. In contrast, the "Y" and "Z" models use hyperbolic tangent sigmoid transfer functions (tansig), which contributed to the nonlinear generalization of the models. However, as shown in Figure 6(B1-B4), the amount of data used in model "Y" was not adequate to generate an improved model; this was only achieved with the progressive increase in the amount of data in model "Z" (Figure 6(A1-A4)), which makes this model the most suitable for this work.
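The difference between the two transfer functions can be seen in a short Python sketch. The names purelin and tansig come from the text; the Python definitions below are stand-ins written for illustration, not the code used in this work.

```python
import math


def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return x


def tansig(x):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2x)) - 1, which equals tanh(x)."""
    return math.tanh(x)


# purelin grows without bound, while tansig saturates towards -1 and +1.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  purelin={purelin(x):+.3f}  tansig={tansig(x):+.3f}")
```

A network in which every layer is linear reduces to a single linear map of its inputs. This is consistent with the flattened response surfaces reported for the model that uses purelin.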

Modeling and Test of Multi-Criteria Optimization Algorithm Models
The multi-criteria optimization algorithms were developed based on the "Z" model (Table 4). The coefficient of performance, i.e., the objective function of the algorithm for simultaneous minimization of responses [36], was defined by Equation (7), where j denotes the j-th coefficient of performance for one solution vector and i the deviation type, with i = 1, 2, 3, and 4 for the deviations DEV 01, DEV 02, DEV 03, and DEV 04. The "admissible errors" for i = 1, 2, 3, and 4 were defined as |0.6 mm|, |2.1 mm|, |1 mm|, and |0.72°|, and the corresponding weights adopted are 2, 2, 3, and 1, respectively. With these data, new codes were programmed with two variations of the algorithm, each with its own domain, constraints, and discretization. The data used are described in Table 5.
The two variations of the algorithm were processed according to the same logic. First, the input values for the j-th possible solutions were generated in a data matrix, and then the matrix, the ANN model, and the sub-codes were used to find the initial solution. Next, the deviations of this solution were determined and the value of the coefficient of performance (O_j) was calculated. Finally, the information and data of this possible solution were recorded in a control table in decreasing order. Once this part is processed, the algorithm returns to the first step (internal loop) and repeats the process in search of an improved solution. If it finds one, it writes the data of this new solution to the decreasing control table. The process was repeated until the model had covered the entire solution space and thus found the global minimum of the solutions vector O_j and the optimal manufacturing parameters. Tables 6 and 7 present the best results.
Table 6. Summary of the 10 best results of the "A" variation of the optimization algorithm.

In Tables 6 and 7 we see that several configurations have the same value of O_j, or very close values, which was expected for a problem with multiple solution spaces, all of them being possible optimal solutions to the problem. However, analyzing Figure 3, we see that, in general, for the set of deviations, factor "A" gives better results at levels ≥85, factor "B" at levels ≥95, factor "C" at levels ≤92.5, factor "D" at mean levels ≥8.1, and factor "E" at levels close to or above 12.5. It follows that the first solution from Table 6 and the sixth solution from Table 7 are the most appropriate solutions to the problem.
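The exhaustive search described in this section can be sketched in Python as follows. The sketch rests on several assumptions: the form of the coefficient of performance (a weighted sum of |deviation| divided by the admissible error) is a placeholder and is not taken from Equation (7), toy_model is a hypothetical stand-in for the trained "Z" network, and the two-factor grid is invented for illustration. Only the admissible errors and the weights are taken from the text.

```python
from itertools import product

# Admissible errors and weights for DEV 01..DEV 04, as quoted in the text.
ADMISSIBLE = (0.6, 2.1, 1.0, 0.72)
WEIGHTS = (2, 2, 3, 1)


def performance(deviations):
    """Assumed form of O_j: weighted sum of |deviation| / admissible error.

    Placeholder only; this is NOT the paper's Equation (7).
    """
    return sum(w * abs(d) / e for w, d, e in zip(WEIGHTS, deviations, ADMISSIBLE))


def toy_model(a, b):
    """Hypothetical stand-in for the trained ANN: two inputs, four deviations."""
    return (0.01 * (a - 90), 0.02 * (b - 95), 0.005 * (a - b), 0.01 * (a + b - 185))


def grid_search(model, levels, keep=10):
    """Evaluate every combination of levels and return the `keep` lowest O_j."""
    results = []
    for point in product(*levels):
        results.append((performance(model(*point)), point))
    results.sort(key=lambda r: r[0])  # lowest coefficient of performance first
    return results[:keep]


# Hypothetical discretization of two factors, for illustration only.
levels = [(80, 85, 90, 95), (90, 95, 100)]
best = grid_search(toy_model, levels, keep=3)
for o_j, point in best:
    print(f"O_j={o_j:.4f}  point={point}")
```

Because every grid point is evaluated, the lowest value returned is the global minimum over the discretized domain, at the cost of a run time that grows with the product of the number of levels per factor.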

Confirmation Experiment
To validate the multi-criteria optimization models developed, new experimental tests were performed with the selected factors and levels. Two sequences of tests were run, one for each selected solution (parameter setting), with five (5) sequentially manufactured repetitions for each setting. The same experimental conditions, raw material, infrastructure, and test steps were preserved. Afterwards, the samples were inspected following the procedures already described, and the deviations were calculated as before. Tables 8 and 9 present the mean values of the four deviations for the samples of the validation tests, with the 95% confidence interval (CI) on the mean (n = 5 and α = 0.05). The predictions and the results of the best samples by O_j value in the main experimental tests (standard test number 5, Table 2) are also shown.
From Tables 8 and 9 it can be seen that the samples of the validation tests have lower mean deviations than those of the main experimental tests, and the CI limits are also lower. In terms of the average values, there is a significant improvement of 20% compared with the best samples of each test type (type A = 18% and type B = 22.5%). Regarding the multi-criteria optimization models, the predicted deviations are within the CI limits of the validation samples. Relative to the mean values of these samples, the predicted values have an average error of 13.2% for the type "A" model and 15.5% for type "B", both inside the CI. Furthermore, the values of O_j are, on average, 76% below the tolerance limits defined in this work.
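The check that a predicted deviation lies within the CI of the validation samples can be sketched in Python. The sketch assumes a two-sided Student t interval on the mean, which the text does not state explicitly; the five measurements and the prediction are hypothetical, and only n = 5 and α = 0.05 come from the text.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF4 = 2.776  # two-sided Student t critical value for alpha = 0.05, n - 1 = 4


def confidence_interval(samples, t_crit=T_CRIT_DF4):
    """Return (mean, lower, upper) of the t-based CI on the mean."""
    n = len(samples)
    m = mean(samples)
    half_width = t_crit * stdev(samples) / sqrt(n)
    return m, m - half_width, m + half_width


def inside(value, lower, upper):
    """True if value lies within the closed interval [lower, upper]."""
    return lower <= value <= upper


# Hypothetical deviation measurements (mm) for five repetitions.
samples = [0.21, 0.19, 0.24, 0.20, 0.22]
m, lo, hi = confidence_interval(samples)
print(f"mean={m:.3f}  CI=({lo:.3f}, {hi:.3f})")
print(inside(0.23, lo, hi))  # 0.23 is a hypothetical model prediction
```

With only five repetitions the t critical value is large, so the interval is wide compared with one based on the normal distribution.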

Conclusions
In general, it is concluded that the ANN models developed were able to simultaneously and satisfactorily model the geometric deviations in the polymer vacuum thermoforming process, in which the quality parameters and the manufacturing variables have conflicting objectives, using a laboratory infrastructure and a small number of tests.
The tests showed that, to minimize deviations, one should use factor "A" between 85 and 95 s, "B" in the range of 87.5% to 100%, "C" in the range of 85% to 100%, "D" from 6.3 to 8.1, and "E" between 12.5 and 15 mbar. The main factors in the analysis of the process are heating time (A) and electric heating power (B), and understanding their interactions is the critical point for minimizing the set of deviations. In addition, the analysis of the experimental results does not allow us to select a single set of factors and levels that simultaneously optimizes all parameters, because different levels of the same factor can be optimal for different responses, e.g., factor "D" [9].
It has been verified that gradually modifying the ANN architecture (functions, algorithms, and number of layers), together with progressively increasing the amount of data presented to the ANNs, significantly reduces the residuals and can improve the approximation of the network. It can also lead to optimization models based on ANNs with reduced numbers of neurons and satisfactory levels of generalization error.
In the validation tests, a gain of 20% was obtained in the general minimization of deviations and of 22.6% in the coefficient of performance (O_j), with an average forecast efficiency of 84% relative to the target value. The CI limits confirmed that the values predicted by the two models are within the expected variability of the process. It is also concluded that ANN models are an option for developing prediction and optimization algorithms for the polymer vacuum thermoforming process with a moderate amount of data.
Finally, each solution presented by the optimization models represents one set of possible values of the manufacturing parameters within the established modeling criteria, and the choice among the solutions will depend on other technical or economic factors involved in the process, such as processing time, operating cost, and electric energy consumption.