Multi-Attribute Machine Learning Model for Electrical Motors Performance Prediction

Abstract: Designing an electrical motor is a complex process that must deal with the non-linear phenomena caused by the saturation of the iron at high magnetic field strength, with the multi-physical nature of the investigated system and with requirements that may come into conflict. This paper proposes to use geometric parametric models to evaluate the multi-physical performances of electrical machines and to build a machine learning model that is able to predict multi-physical characteristics of electrical machines from input geometrical parameters. The focus of this work is to accurately estimate the electromagnetic characteristics, motor losses and stator natural frequencies with the developed machine learning model at the early-design stage of the electrical motor, when information about the housing is not yet available, and to include the model in optimisation loops to reduce the computational time. Three individual machine learning models are built, one for each physics analysed: a model for the torque and back electromotive force harmonic orders, a model for the motor losses and a model for the natural frequencies of the mode-shapes. The necessary data is obtained by varying the geometrical parameters of 2D electromagnetic and 3D structural motor parametric models. The accuracy of different machine learning regression algorithms is compared to obtain the best model for each physics involved. Ultimately, the developed multi-attribute model is integrated in an optimisation routine, to compare the computational time with the classical finite element analysis (FEA) optimisation approach.


Introduction
Electrical machines have been intensively used in recent years in the automotive industry for applications that require high efficiency and reliability, switching designers' focus to the development of robust high-performance electrical machines. The design of an electrical motor is a complex task that must deal with the non-linearity caused by the saturation of the iron at high magnetic field strength and the multi-physical nature of the investigated system, as well as with requirements that may come into conflict. Conventionally, the design follows a sequential process: firstly, the electromagnetic targets are satisfied, then the stress and thermal aspects are analysed, leaving the noise, vibration and harshness (NVH) characteristics to the end. This means that, at the end of the design process, only limited changes can be made to improve the noise and vibration characteristics. At the same time, since the design process is a multi-physical problem in which different domains are involved, experts from distinct fields must interact and contribute to achieve a robust, optimal design. Therefore, the electrical machine development process is iterative and computationally expensive. Moreover, the multi-physical nature of motor characteristics analysis requires a synergy between 2D finite element analysis (FEA) for electromagnetic and losses analysis and 3D FEA for structural characteristics. Hence, the overall time is increased by the use of finite element (FE) analysis, which is time consuming and memory intensive, even when parallelization techniques are used.
Traditionally, the process used to design a high-performance electrical machine is multi-objective machine optimisation [1,2]. The optimal design is obtained by automatically varying the geometric parameters within predefined limits, for imposed objectives and constraints. The design space exploration is conducted using optimisation algorithms [3], the designer having the freedom to choose the objectives, constraints and parameters discretisation, selected based on manufacturing capabilities.
Various studies in the state of the art focus on finding the optimal design of electrical machines. Due to economic reasons (e.g., the high cost of rare-earth materials) and the need for high power densities, cost optimisation has gained popularity in recent years, the machine being optimised to meet the requirements at the lowest cost [4]. Different motor topologies, including permanent magnet and synchronous reluctance machines, are analysed in [5] in order to select the best design at the lowest cost, while in [6] a permanent magnet synchronous machine (PMSM) is optimised to meet the performance and cost demands, with a focus on high-volume mass production and its constraints. Another optimisation objective is presented in [7], where the torque ripple is optimised to obtain a high and smooth torque. Nevertheless, the computational cost of an optimisation loop can increase drastically when a large number of machine designs are analysed. This is caused by the FE-based simulations conducted to evaluate the performances of the machine designs. Despite their well-known accuracy, FE-based simulations may limit the optimisation process due to their high computational cost (simulation times may vary from several minutes to several hours or even days [8]). To overcome these issues, fast models can be developed using machine learning [9], reducing the computational burden in the design stage, as most of the computations are carried out in the model building phase. At the same time, several processes can be brought earlier in the design cycle. This way, the system's performances and sensitivities can be identified in the concept stage and the designer can decide whether the desired targets are met.
Machine learning models used for design, optimisation, fault detection and robustness evaluation have been among the main research interests in the electrical machines field over the years. Some works have already focused on generating machine learning models that replace the time-consuming FEA and reduce the computational time. In [10], a statistical analysis that uses multiple correlation coefficients has been used to generate a fast model that is able to replace the FE model and reduce the computational effort, whereas in [11] the same objective is accomplished by using an artificial neural network. Another approach, which uses online learning and dynamic weighting methods, is presented in [12]. Moreover, in [3] the focus is on analysing the effectiveness of using machine learning models of electrical machines that incorporate tolerance or sensitivity aspects in a multi-objective optimisation run. Further works focusing on machine-learning-assisted multi-objective optimisation are presented in [13] and [14]. A recent work presents data-driven structural modelling of electrical machine airborne vibration, intended to be used both in the design stage for optimisation purposes and in system-level simulations [15]. In [16], a multi-physics simulation workflow based on reduced-order models is presented, which is used to predict the vibro-acoustic behaviour of electrical machines and to decrease the computation time. The influence of the production mass tolerances modeled at system level and the interaction between the uncertainties and the drive's components, together with the fitted machine learning model, can be found in [1]. Fast prediction of electromagnetic torque and flux linkages for system-level simulations is accomplished in [17] using a machine learning electromagnetic model based on artificial neural networks.
Machine learning models employed to predict sound pressure levels are developed in [18,19], where it is shown that the developed models can be considered as replacements for the FEA in future design and optimisation problems of the same motor. However, none of the above examples focuses on the multi-physical characteristics of electrical machines; each addresses the prediction of a single physics (e.g., electromagnetic, thermal or structural characteristics). In [20], a data-aided, deep learning meta-model based on cross-section image processing is created to predict the multi-physical characteristics (e.g., maximum torque, critical field strength, costs of active parts, sound power) quickly and to accelerate the full optimisation process. The accuracy of this method is highly dependent on the chosen hyper-parameter settings and on the precision of the input data. Even if the hyper-parameters can be selected based on a sensitivity analysis, the accuracy still depends on the precision of the input data. The proposed image-based method performs close to the parameter-based method presented in this paper only for increased pixel resolution of the training data, meaning that the method proposed in the cited work is memory intensive.
This paper proposes to use geometric parametric models to evaluate the multi-physical performances of electrical machines and to build a multi-attribute machine learning model. The aim is to describe the development process of a multi-attribute machine learning model that is able to predict the multi-physical characteristics of electrical machines. The obtained machine learning model is capable of quickly and accurately estimating the multi-attribute performance from geometrical parameters (which represent the input data). At the same time, the model is suitable for inclusion in optimisation routines, to reduce the process time, and in system-level simulations, for fast predictions. The main difference between the presented machine learning model and the ones found in the literature is the multi-physical feature of the developed model. The model can accurately estimate the torque and back electromotive force, the motor losses and the natural frequencies of the mode-shapes. Moreover, this article presents a way to simplify the machine learning problem by using a harmonic model to predict the electromagnetic values, reducing the computational burden in the training phase.
To this end, the full process of designing the electrical motor is brought into the early-design stage, where the characteristics (electromagnetic, motor losses, mode-shapes and natural frequencies) can be evaluated and predicted by the designer. Therefore, the designer can arrive at the best solution, identifying the system's capabilities and sensitivities in the concept phase, without involving experts from different domains. The proposed workflow is shown in Figure 1. The data needed to build the multi-attribute model are obtained by conducting both 2D electromagnetic and 3D structural FE analyses on a set of motor models generated by imposing multiple design parameters on the reference parametric model. After that, the resulting data is harnessed using machine learning algorithms, creating one machine learning model per physics involved. One harmonic model for the electromagnetic torque and back emf is created by applying a Fourier decomposition to the electromagnetic quantities, reducing the size and the complexity of the model without a significant loss in accuracy [21]. Two further models are built, one for the motor losses and one to predict the natural frequencies of the mode-shapes. The most suitable machine learning model for each target is chosen by testing different machine learning algorithms on multiple dataset sizes. Based on their capability to predict the multi-physical characteristics of electrical motors, the most accurate machine learning model for each physics is selected.
The paper is structured as follows. The process of building the stator parametric model, together with the most important design parameters and their intervals of variation, is presented in Section 2. Afterwards, the multi-physical FE analyses, which include electromagnetic simulations, losses computations and structural modal analysis, performed on the parametric model for a set of imposed design variables, are described in Section 3. Section 4 is dedicated to the multi-attribute machine learning model. Here, the prediction and fitting capabilities of different machine learning algorithms are evaluated and the impact of the training sample size is discussed. Section 5 compares the computational cost of machine learning and FEA. The final conclusions are drawn in Section 6.

Figure 1. Multi-attribute machine learning model workflow. Step 1: impose input parameters and analysis conditions (design space exploration with LHS). Step 2: use parametric FE motor models to obtain motor performances (modal analysis: modes and frequencies). Step 3: use machine learning to harness the obtained data and build machine learning models.

Stator Parametric Model
At the beginning of the development process, the designer has information about the stator core, but not about the rest of the components. Hence, for the structural analysis, only the stator core is modeled, leaving the housing and the rotor core aside. The stator is the main energy transfer path and has the most important influence on the NVH characteristics. The stator is excited by the electromagnetic field and transfers the energy to the exterior, causing airborne noise. Therefore, in order to characterise the transfer path, a three-dimensional model is created within Simcenter 3D. The 3D stator model is constructed based on the geometric dimensions and material specifications of the electrical motor under study. Afterwards, the stator model is parameterized with the same degrees-of-freedom (DOFs) as in the electromagnetic optimization process [6], since DOF consistency is needed for the multi-physics optimization routine. Figure 2 shows the cross-section and the DOFs of the parameterized model, where TWS represents the tooth width, YT is the yoke thickness, SOAng stands for the tooth tip angle, TGD is the tooth tip height, SO represents the slot opening, R is the stator inner radius, and R_a stands for the stator outer radius. The parameterised model allows the generation of a set of feasible designs by performing a design of experiments using specific sampling techniques. The most important design parameters, TWS, YT, SOAng, TGD and SO, are varied within imposed boundaries, while the stator length (L_stk) and the stator inner and outer radii are kept constant. Each parameter is individually varied within imposed limits and the design space is filled with the help of the Latin Hypercube Sampling (LHS) technique. The variation interval of the considered DOFs, with its lower (LB) and upper (UB) boundaries, is presented in Table 1. The variable DOFs are chosen based on their impact on the structural characteristics and electromagnetic performances of the machine.
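The sampling step described above can be sketched as follows. This is a minimal illustration, assuming SciPy's `scipy.stats.qmc` module is used for the Latin Hypercube; the bounds below are hypothetical placeholders and are not the LB/UB values of Table 1.

```python
import numpy as np
from scipy.stats import qmc

# Variable DOFs of the parameterised stator model.
names = ["TWS", "YT", "SOAng", "TGD", "SO"]

# Hypothetical bounds for illustration only (units: mm, mm, deg, mm, mm).
# The real lower/upper boundaries are the ones listed in Table 1.
lower = np.array([6.0, 4.0, 100.0, 0.8, 1.5])
upper = np.array([8.0, 5.5, 130.0, 1.2, 2.5])

# Draw a Latin Hypercube in the unit cube and map it to the physical bounds.
sampler = qmc.LatinHypercube(d=len(names), seed=0)
unit_samples = sampler.random(n=1000)            # shape (1000, 5), values in [0, 1)
designs = qmc.scale(unit_samples, lower, upper)  # shape (1000, 5), physical units

# Each row is one candidate design to be passed to the parametric FE models.
first_design = dict(zip(names, designs[0]))
```

Each row of `designs` would then be imposed on the electromagnetic and structural parametric models, with the stator length and radii held fixed.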
The yoke thickness is the main contributor to the stator vibration response due to its direct influence on the stator stiffness. Increasing the yoke thickness increases the stiffness, shifting the natural frequency corresponding to each mode-shape to a higher value. In particular, mode 0 (breathing mode) is strongly influenced by the variation of this parameter. At the same time, the yoke thickness, together with the tooth width, has a strong impact on the electromagnetic characteristics by influencing the saturation levels of the electrical machine. The tooth width can also influence the structural behavior of the motor by creating local tooth bending modes. On the other hand, the variation of the tooth tip height, slot opening and tooth tip angle mainly influences the electromagnetic quantities. The electromagnetic flux density harmonics are affected by this variation and so, indirectly, are the structural vibration characteristics, since these harmonics are the source of vibrations for the electrical motor.

Data Generation
Multi-physical FEA simulations are performed on the electromagnetic and structural parametric models in order to extract the electromagnetic quantities (i.e., electromagnetic torque, back electromotive force and motor losses) and the modal characteristics (i.e., mode-shapes and their corresponding natural frequencies). Valid motor designs are obtained by varying the stator geometrical parameters (TWS, YT, SOAng, TGD and SO) using the LHS method, while the stator length (L_stk), the stator inner and outer radii and the material properties remain invariant. For the electromagnetic analysis, the rotor and magnet properties and geometry are also kept constant. Given these assumptions, 3D FE modal analysis simulations are performed on the stator core, extracting the structural mode-shapes and natural frequencies, while 2D FE electromagnetic analyses are performed to obtain the electromagnetic characteristics and motor losses of the generated feasible designs. At the end of the data generation process, the obtained information is harnessed using different machine learning algorithms to obtain the most suitable machine learning model for predicting the electrical motor performances from input geometrical parameters.

Electromagnetic Analysis
The numerical analysis is performed on a twelve-slot, ten-pole Interior Permanent Magnet Synchronous Machine (IPMSM) with concentrated, star-connected windings, used in electric power steering applications. Relevant parameters of the machine under study are presented in Table 2. Two-dimensional FE electromagnetic analysis is performed within Simcenter Motorsolve on the machine under study, with the parameterized cross-section presented in Figure 3. The mesh size is chosen according to the imposed speed-accuracy trade-off, which defines a relationship between faster solutions and higher accuracy.
Electromagnetic analysis is conducted at a speed-accuracy trade-off of 3 out of 10, where 1 is the fastest and 10 is the most accurate level. For the selected accuracy level, the simulation takes approximately 3.5 min for each model. The analysis is conducted for the nominal operating conditions, imposing the nominal speed and nominal currents in the d-q reference frame. The motor performances extracted from the 2D analysis are the torque and the back electromotive force (back emf), together with the motor losses. The electromagnetic torque and the back emf waveforms extracted from the electromagnetic analysis for JT = 4.39935 mm, TWS = 6.9565 mm, SO = 1.8815 mm, SOAng = 116.65 deg., TGD = 0.98125 mm are displayed in Figure 4 with blue lines.
The generated data is intended to be used for training a machine learning model that is capable of predicting the performances of new electrical machine designs. Therefore, a machine learning model suitable for waveforms should be chosen. However, this model type includes time as one of its dimensions and is hence more computationally expensive in the training phase [22]. The problem can be simplified by using a harmonic model for the torque and back emf values, instead of using their time-varying waveforms. This way, the time dimension is eliminated and the machine learning algorithm is applied to discrete datasets, reducing the complexity and the training time compared with the waveform machine learning model.
To this end, the torque and back electromotive force waveforms are post-processed by applying the discrete Fourier decomposition, obtaining their corresponding harmonic order amplitudes and phases. The harmonic orders obtained for the electromagnetic torque and back emf waveforms extracted from the electromagnetic analysis for JT = 4.39935 mm, TWS = 6.9565 mm, SO = 1.8815 mm, SOAng = 116.65 deg., TGD = 0.98125 mm are shown in Figure 5, where blue marks the DC component of the electromagnetic torque and the 1st harmonic order of the back emf, and red marks the higher harmonic orders, which have lower amplitudes. For the training process, only the three most influential harmonic orders are taken into account for the electromagnetic torque (DC, 6th and 12th harmonic orders) and for the back emf (1st, 3rd and 11th harmonic orders). The torque and back emf waveforms reconstructed from the amplitudes and phases of the most important harmonic orders are represented in Figure 4.
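The harmonic reduction described above can be illustrated with a short NumPy sketch. It assumes the waveform is sampled uniformly over exactly one period; the synthetic torque signal, its amplitudes and the helper names `harmonic_components` and `reconstruct` are illustrative assumptions, not the data or code of this work.

```python
import numpy as np


def harmonic_components(waveform, orders):
    """Amplitude and phase of selected harmonic orders of one sampled period."""
    n = len(waveform)
    spectrum = np.fft.rfft(waveform) / n
    amps, phases = {}, {}
    for k in orders:
        # One-sided spectrum: every order except DC carries half the amplitude.
        scale = 1.0 if k == 0 else 2.0
        amps[k] = scale * np.abs(spectrum[k])
        phases[k] = np.angle(spectrum[k])
    return amps, phases


def reconstruct(amps, phases, n):
    """Rebuild the waveform from the retained harmonic orders only."""
    theta = 2.0 * np.pi * np.arange(n) / n
    return sum(a * np.cos(k * theta + phases[k]) for k, a in amps.items())


# Synthetic torque over one period (illustrative values only).
n = 360
theta = 2.0 * np.pi * np.arange(n) / n
torque = (5.0 + 0.20 * np.cos(6 * theta + 0.3)
          + 0.05 * np.cos(12 * theta - 1.0)
          + 0.01 * np.cos(18 * theta))

# Keep DC, 6th and 12th orders, as done for the torque in the text.
amps, phases = harmonic_components(torque, orders=(0, 6, 12))
torque_rec = reconstruct(amps, phases, n)
max_error = np.max(np.abs(torque - torque_rec))
```

In this toy case the reconstruction error is exactly the discarded 18th-order term, which shows why only a few amplitude and phase values per design need to be learned instead of a full waveform.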

Modal Analysis
In the process of obtaining highly efficient motor designs, the way the system transfers energy over a frequency band, for specific materials and dynamic events, must be taken into account. The dynamical behaviour of the electric motor, which depends on the geometry, mass, stiffness, damping and boundary conditions, is characterized by its modes [23]. Each mode (a deformation pattern) occurs at a specific frequency, the so-called natural frequency, and can be calculated by the modal analysis technique. Modal analysis is used to show how the structure can vibrate and to identify its natural frequencies. The identification of the mode-shapes and their frequencies is of great importance to an electrical machine designer, because a match between the frequencies of the air-gap exciting forces and the natural frequencies causes resonances, which amplify noise and vibrations. Resonance is an undesirable phenomenon and should be avoided. Knowing the values of the natural frequencies in advance, the designer can avoid resonance either by modifying the transmission path and shifting the stator structure eigen-frequencies, or by influencing the excitation source, in this case the air-gap forces. However, in some cases, because the electric machine operates with variable frequency, it is impossible to eliminate the resonance phenomena caused by variable-frequency forces that excite the structure.
The modal analysis is based on the general motion equation for a system with N degrees of freedom (DOFs):

$$[M]\{\ddot{x}(t)\} + [B]\{\dot{x}(t)\} + [K]\{x(t)\} = \{f(t)\} \qquad (1)$$

where [M] represents the system mass matrix, and [B] and [K] represent the system damping matrix and stiffness matrix, respectively, each of size N × N. The terms x(t) and f(t) are the displacement and force vectors of size N × 1. Modal analysis solves Equation (1) for free vibrations, i.e., without taking into consideration the damping and the force, resulting in the following expression of the motion equation:

$$[M]\{\ddot{x}(t)\} + [K]\{x(t)\} = \{0\} \qquad (2)$$

For this work, modal analysis is conducted by FEA, using the 3D structural parametric model of the stator. A proper mesh type is chosen for the structural model to fulfill the requirements regarding computational time and model accuracy. The structural mesh is built with a 1 mm mesh size on the stator face, resulting in 14,568 3D eight-noded hexahedral solid elements (CHEXA(8) in Simcenter 3D terminology) and 19,521 nodes. Once the mesh is set, the material for the stator core is chosen. An isotropic material is used for the stator steel, while free-free conditions are imposed for the modal analysis. The 3D structural model is presented in Figure 6. The structural modal analysis is performed for a frequency interval set to cover the audible frequency range, from 20 Hz to 20 kHz. The computational time of the full process per discretized 3D structural mesh, on a workstation with an Intel Core i7-9850H CPU running at 2.6 GHz and 32 GB of RAM, is approximately 2 min (1 min and 38 s). The first six mode-shapes of the base machine, identified by the automated process, with and without longitudinal deflection, are presented in Table 3. Here, the mode-shapes are identified as a combination (m,n), where m is the order of the circumferential deflection and n is the order of the longitudinal deflection.
Table 3. Mode-shapes of the stator core.
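For the undamped free-vibration problem, assuming a harmonic solution leads to the generalized eigenvalue problem [K]φ = ω²[M]φ, whose eigenvalues give the natural frequencies and whose eigenvectors give the mode-shapes. The sketch below solves it with `scipy.linalg.eigh` for a three-DOF spring-mass chain; the chain, its mass and stiffness values and the fixed end are illustrative assumptions and are unrelated to the stator FE model, which is solved in Simcenter 3D under free-free conditions.

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative 3-DOF spring-mass chain, fixed at one end (assumed values).
m = 1.0      # kg, mass of each DOF
k = 1.0e6    # N/m, stiffness of each spring
M = m * np.eye(3)
K = k * np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])

# Generalized symmetric eigenproblem K phi = omega^2 M phi.
# eigh returns eigenvalues in ascending order and mass-normalised eigenvectors.
eigvals, modes = eigh(K, M)
natural_freqs_hz = np.sqrt(eigvals) / (2.0 * np.pi)
```

Under free-free conditions, as used for the stator, the same computation would additionally return rigid-body modes at (numerically) zero frequency, which are discarded before the elastic modes are examined.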

FE Results
Running a large number of modal analysis simulations and identifying the modes and natural frequencies without human intervention is made possible by a set of scripts developed in Python and NX Open with the help of the Simcenter 3D journaling capability. The automated process opens a new Simcenter 3D session and controls the simulation workflow from Python according to the user's specifications. In the main Python script, the user sets the number of DOEs and some analysis conditions (e.g., frequency range, mesh size, material). The full process is then carried out automatically: imposing the desired geometric parameters, updating the mesh for each design, imposing the simulation conditions, solving the problem, exporting the data and identifying the mode-shapes and their corresponding natural frequencies. The mode-shapes are identified by taking into consideration the displacement in the radial, tangential and axial directions of the stator output nodes, highlighted in orange in Figure 6. A dedicated script analyses the nodes' displacement values and identifies the deformation pattern of all the nodes belonging to the outer face of the stator. Moreover, the script is able to distinguish between global and local modes; for this work, only the first six global modes of the stator are selected.
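The identification script itself is not reproduced here. As a hypothetical sketch of one way a circumferential order m could be extracted, the radial displacement of equally spaced nodes around the outer surface can be decomposed into spatial harmonics, taking the dominant order as m. The function name, the node layout and the synthetic displacement patterns below are all assumptions made for illustration.

```python
import numpy as np


def circumferential_order(radial_disp):
    """Dominant spatial harmonic of the radial displacement.

    Assumes the displacement is sampled at equally spaced angles around
    one full circumference of the stator outer surface.
    """
    spectrum = np.abs(np.fft.rfft(radial_disp))
    return int(np.argmax(spectrum))


# Synthetic displacement patterns at 72 equally spaced outer nodes.
angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
breathing = 1.0e-6 * np.ones_like(angles)          # uniform radial expansion
ovalisation = 1.0e-6 * np.cos(2.0 * angles + 0.4)  # two lobes around the ring
triangular = 1.0e-6 * np.cos(3.0 * angles)         # three lobes around the ring

orders = [circumferential_order(d) for d in (breathing, ovalisation, triangular)]
```

A complete classifier would also need the axial variation (to obtain n) and a criterion for rejecting local tooth modes, neither of which is covered by this sketch.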

Losses Analysis
The computation of losses is an important issue in the design of an electrical motor. The losses in different parts of the analysed system can affect the machine performances (e.g., demagnetisation, winding resistivity modification) and, in the end, decrease the efficiency [24]. A correct estimation of the electrical machine losses allows, together with the information regarding material thermal properties and the heat dissipation method, the computation of the temperature distribution in different parts of the electrical machine [25].
The machine losses can be computed either by using analytical methods or by the Finite Element Analysis method [26]. In this paper, the motor losses are quantified and analysed using FEA for the nominal operating point of the system. Therefore, the electromagnetic parametric model is used to generate the losses for imposed input geometry conditions. The biggest source of losses for the machine under study is the windings [27]. Besides that, the iron losses, composed of the sum of hysteresis loss, eddy-current loss and excess loss, are taken into account. Iron losses are computed a posteriori according to the Steinmetz model and separated into hysteresis and (lamination) eddy-current components. Because different parts of the electrical machine (e.g., stator teeth and stator back iron) are exposed to distinct flux density values, the losses are computed individually [25]. Hence, losses coming from different parts of the machine are analysed: the total machine losses, the winding, the iron, the stator back iron, the stator teeth, the rotor back iron and the magnet losses, all expressed in kilowatts.
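For orientation, one commonly used form of a Steinmetz-type iron loss separation is written below. It assumes a sinusoidal flux density of peak value B at frequency f; the coefficients k_h and k_e, the exponents α and β, and the exact formulation implemented in the FE solver are not given in this paper and are assumptions here.

```latex
p_{\mathrm{Fe}}
  = \underbrace{k_h \, f^{\alpha} \, \hat{B}^{\beta}}_{\text{hysteresis}}
  + \underbrace{k_e \, f^{2} \, \hat{B}^{2}}_{\text{eddy current}}
```

When an excess loss contribution is modelled explicitly, a third term proportional to $f^{1.5}\hat{B}^{1.5}$ is often added. Because the flux density differs between the stator teeth and the back iron, such an expression is evaluated separately for each region.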
The automatic motor losses extraction process starts by imposing the values of the geometric parameters generated with the LHS method to the parameterised electromagnetic model. Afterwards, the nominal operating point is set by imposing the corresponding direct and quadrature currents, together with the nominal speed. At the end, the electromagnetic model is solved and motor losses are computed and extracted.

Multi-Attribute Machine Learning Model Selection
A machine learning model is constructed by harnessing a set of system responses (output values) obtained by imposing a set of predictor variables (input parameters). Therefore, defining the input-output behaviour characteristics is essential. Then, based on the type of the obtained datasets (i.e., discrete or time-series), a proper machine learning method is chosen to process the data, which is divided into training, validation and test samples. The model is then trained using the training samples and its accuracy is tested. The accuracy of the developed machine learning model depends on the selected machine learning algorithm, as well as on the size of the dataset used for training the model.
In this paper, three independent machine learning models are developed, one for each analysed physics: one for the electromagnetic torque and back emf harmonics, one for the motor losses and one for the structural targets (modes and natural frequencies). To this end, three machine learning regression methods are tested on the available datasets exported from the 2D electromagnetic and 3D structural analyses. The tested methods are support vector regression (SVR), gradient boosting regressor (GBR) and Gaussian process regressor (GPR), selected for their capacity to predict discrete datasets. Moreover, their accuracy is tested for four training dataset sizes: 250, 500, 750 and 1000 samples. The Python Scikit-Learn library [28] was used to implement the discussed regression models. The procedure used to train, validate and test the machine learning models is the same for each method applied. The training process uses 70% of the available dataset, while the remaining 30% is used for testing.
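The comparison procedure described above can be sketched with Scikit-Learn as follows. The data are a synthetic stand-in for the FE results (five inputs, one target), the hyper-parameters are placeholders, and the R² score is an assumed accuracy metric; none of these are taken from this work.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def make_dataset(n_samples, seed=0):
    """Synthetic stand-in for the FE data: 5 inputs, 1 target (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_samples, 5))
    y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] * X[:, 3] - X[:, 4]
    return X, y


# Candidate regressors; the hyper-parameters here are placeholders.
candidates = {
    "SVR": lambda: SVR(kernel="rbf"),
    "GBR": lambda: GradientBoostingRegressor(random_state=0),
    "GPR": lambda: GaussianProcessRegressor(alpha=1e-6, normalize_y=True),
}

scores = {}
for n_samples in (250, 500, 750, 1000):
    X, y = make_dataset(n_samples)
    # 70% of the samples for training, the remaining 30% for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    for name, make_model in candidates.items():
        model = make_model().fit(X_train, y_train)
        scores[(n_samples, name)] = r2_score(y_test, model.predict(X_test))
```

The `scores` dictionary then holds one test score per combination of dataset size and algorithm, which is the kind of table from which the best model per physics can be picked.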

Support Vector Regression (SVR)
Support vector machines are popular machine learning models used for both classification and regression. SVR is a supervised learning algorithm that applies the Support Vector Machine principles to regression and is suitable for the prediction of discrete values [29]. The algorithm searches for the line that best fits the data. SVR allows the introduction of an error limit, or tolerance, and fits the data with a line or, in multiple dimensions, a hyperplane. The SVR objective is to minimize the l2 norm of the coefficient vector, $\min \tfrac{1}{2}\lVert w \rVert^{2}$, subject to the constraint that the error stays within the imposed tolerance. When using the SVR algorithm, one of the most important aspects to take care of is the dataset size. SVR is not suitable for training on large datasets because the training time increases more than quadratically with the number of samples [30] and becomes too computationally expensive. For large datasets, defined by more than 10,000 samples, it becomes infeasible to use SVR due to the increased training time. In such cases, Linear SVR or the SGD regressor can be used as substitutes for the classical SVR algorithm [31]. However, as the maximum size of the dataset used for this work is 1000, the classical SVR algorithm is applied. Aside from this, SVR was chosen for this work due to its advantages for small datasets: it is robust to outliers, has excellent generalization capability and a high prediction accuracy, and is easy to implement.
Another aspect that must be considered when applying SVR is the ratio between the number of features and the number of training samples. When the ratio is greater than one, meaning that the number of features for each data point exceeds the number of training samples, the SVR algorithm will underperform. To avoid this, the dimension of the features can be reduced by applying Principal Component Analysis (PCA) [32] to extract the first n principal components with the highest corresponding eigen-values. However, PCA-based feature extraction can lead to errors due to the loss of information, especially if the relationship between the features and the data is highly nonlinear. Another method to reduce the dimension of the features is to apply a feature selection algorithm [33], which selects the features that are most relevant for building the model. However, the datasets under investigation do not require any feature dimensionality reduction, because the number of features is much smaller than the number of samples. For the tested datasets, the number of features is always constant: nine for the electromagnetic datasets (i.e., torque harmonic amplitudes and phases and back emf harmonic amplitudes), six for the structural dataset and seven for the motor losses datasets, while the number of samples varies from 250 to 1000.
The hyper-parameters used to train the models were kept constant for all cases: C = 3.2 and the default kernel, 'rbf' (radial basis function).
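A minimal Scikit-Learn sketch of an SVR configured with these hyper-parameters is given below. Scikit-Learn's `SVR` predicts a single output, so wrapping it in `MultiOutputRegressor` to handle several targets at once is an assumption of this sketch, as are the synthetic inputs and targets.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data: 5 geometric inputs, 3 targets (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(size=(250, 5))
Y = np.column_stack([
    np.sin(2.0 * X[:, 0]) + X[:, 1],
    X[:, 2] * X[:, 3],
    np.cos(X[:, 4]),
])

# C = 3.2 and the RBF kernel, as stated in the text; one SVR per target.
svr = MultiOutputRegressor(SVR(C=3.2, kernel="rbf"))
svr.fit(X[:175], Y[:175])      # 70% of the samples for training
Y_pred = svr.predict(X[175:])  # remaining 30% for testing
```

Fitting one independent SVR per target keeps the configuration identical across outputs, at the cost of ignoring any correlation between them.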

Gradient Boosting Regressor (GBR)
Gradient boosting is one of the most popular machine learning algorithms for discrete datasets [34]. Compared with linear models, the tree-based methods Random Forest and Gradient Boosting are less affected by outliers [35]. Gradient Boosting is a robust machine learning algorithm that combines the Gradient Descent algorithm and Boosting [36]. Gradient Boosting tries to minimise the mean square error (MSE) or the mean absolute error (MAE).
Among its advantages are its capability to deal with missing data and its ability to fit the nonlinear relationship between the data and the features. At the same time, it allows the optimisation of different loss functions. The method trains multiple decision trees in a sequential process, starting from a tree with a weak prediction and improving the overall model's prediction capacity by adding another decision tree that is upgraded by modifying the weights of the first decision tree. Therefore, the new model has a better performance, given by the combination of the two decision trees [37]. After that, the error (also known as the residual) of the model's prediction is evaluated. If the error is not satisfactory, a new tree is introduced to better fit the data and reduce the error. The process is repeated for an imposed number of iterations in order to minimize the error and obtain a better prediction.
One of the main drawbacks is that GBR minimizes all errors, including the ones caused by outliers, which can lead to overfitting. However, overfitting can be addressed with methods such as regularization, setting a maximum depth and early stopping [38,39]. Another disadvantage is that the method is difficult to scale up: because each decision tree is trained based on the previous one, the process is hard to parallelize. For processes that need to be scaled up, a scalable end-to-end tree boosting system called XGBoost is widely used by data scientists. XGBoost reduces the run time and scales to billions of examples in distributed or memory-limited settings [40].
Considering that the datasets under test are discrete, with a non-linear relationship between data and features, the GBR algorithm is a suitable method to train a machine learning model. For the four datasets under test, overfitting was avoided by choosing a maximum depth of three and an early stopping criterion that halts the training process when the validation score has not improved for 20 iterations. The loss function imposed for the training process is the squared error.
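The GBR settings listed above could be reproduced along the following lines. This is a sketch assuming scikit-learn's GradientBoostingRegressor, which the paper does not name. The synthetic data, the upper bound on the number of trees and the validation fraction are assumptions. The maximum depth of three and the patience of 20 iterations come from the text, and the squared error is the estimator's default loss.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in with six features (not the paper's data).
X = rng.uniform(-1.0, 1.0, size=(500, 6))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 2]

gbr_model = GradientBoostingRegressor(
    max_depth=3,              # maximum tree depth, as stated in the text
    n_estimators=500,         # upper bound on boosting stages (assumed)
    n_iter_no_change=20,      # stop once the validation score stalls for 20 iterations
    validation_fraction=0.1,  # share of training data held out for early stopping (assumed)
    random_state=0,
)  # the loss is left at its default, the squared error
gbr_model.fit(X, y)

# Number of trees actually kept after early stopping.
n_trees_used = gbr_model.n_estimators_
print(f"Boosting stages used: {n_trees_used}")
```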

Gaussian Process Regressor (GPR)
The Gaussian process is a type of machine learning model with applicability in solving regression and probabilistic classification problems [41]. For this type of model, the covariance is parameterised by using a kernel function, the most popular being the constant kernel and the squared exponential kernel, also known as the radial basis function. The kernels give the shape of the prior and the posterior of the Gaussian process. The main advantages of the GPR are its capability to work well on small datasets and its ability to provide uncertainty measurements on the predictions [42].
In the literature on normative modeling, GPR has been used to characterize subject heterogeneity [43,44]. Normative modeling comprises a group of methods that are able to quantify the deviation of an individual from their expected value. GPR has the capability to model the heterogeneity, but it is either hard to estimate the aleatoric uncertainty accurately when the data are sparse, or unnecessary to model the conditional variance when the data are dense [44].
For the prediction of unseen values, GPR is a remarkably powerful machine learning algorithm. Because GPR needs fewer parameters than other machine learning algorithms to make predictions, it can solve a wide range of problems, even when a small number of data samples is used [45]. GPR becomes inefficient when the number of features of the dataset exceeds a few dozen. However, the structure of the data samples under test shows that the number of features is much lower (i.e., the maximum number of features is nine, for the electromagnetic dataset), so the GPR method is suitable for the dataset under test. Another concern is that Gaussian processes are computationally expensive and become infeasible to apply on large datasets. Nevertheless, for small datasets, such as the one under test, GP regression is still computationally reasonable. Aside from this, the GPR was chosen due to its flexibility in implementation, the user being able to add prior knowledge and information about the model (e.g., smooth, sparse, differentiable) by selecting different kernel functions. The kernel type used in the algorithm is a radial basis function (rbf) kernel.
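A minimal sketch of a GPR with an RBF kernel, including the predictive uncertainty mentioned above, is given below. It assumes scikit-learn's GaussianProcessRegressor, which the paper does not name. The synthetic data, the jitter term `alpha` and the target normalisation are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Small synthetic dataset (not the paper's data).
X = rng.uniform(-1.0, 1.0, size=(120, 3))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]

gpr_model = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0),  # RBF kernel, as stated in the text
    alpha=1e-6,                    # small jitter for numerical stability (assumed)
    normalize_y=True,              # centre and scale the targets (assumed)
    random_state=0,
)
gpr_model.fit(X[:100], y[:100])

# Predictive mean and standard deviation for 20 unseen points.
y_mean, y_std = gpr_model.predict(X[100:], return_std=True)
print(y_mean[:3], y_std[:3])
```

The standard deviation returned with each prediction is the uncertainty measure that distinguishes GPR from the SVR and GBR models.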

Performance Indicators for Prediction Accuracy Evaluation
The fitting capability of each regression model is analysed using two statistical indicators. The first one, R-squared, or the coefficient of determination, measures the proportion of the variance of the target values that is explained by the regression model. The model fits the data well when the R-squared score is high, while very low scores indicate underfitting. The model has perfect predictions when the R-squared score is 1.
Considering $\hat{y}_i$ as the predicted value of the $i$-th sample and $y_i$ as the corresponding true value, for a number of $n$ fitted points with the mean value $\bar{y}$, the R² coefficient is defined as:

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$. The second metric used to evaluate the fitting capabilities of the machine learning models is the mean squared error (MSE). MSE is a risk indicator, giving the average of the squared differences between the estimated values and the actual values:

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The model fits the data well when the MSE has values close to 0.
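The two indicators can be checked by hand on a small example. The sketch below uses plain Python and the standard definitions of the coefficient of determination and the MSE. The function names and the four sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n


# Illustrative values: three exact predictions and one that is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

r2_value = r_squared(y_true, y_pred)   # SS_res = 1, SS_tot = 5, so R2 = 0.8
mse_value = mse(y_true, y_pred)        # 1 / 4 = 0.25
print(r2_value, mse_value)
```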

Evaluation of Machine Learning Models Fitting Capabilities
The machine learning models that are most suitable to predict the motor multi-physical characteristics (electromagnetic, losses and structural targets) are selected based on their capacity to accurately fit the target values at the lowest computational cost, which must be lower than that of the FEA. Since the training time is negligible, the most suitable machine learning model for each physics involved is chosen based on its accuracy and on the number of samples used for training, looking for the one that needs the smallest dataset. Usually, the accuracy of a machine learning model depends on the number of training samples, hence its behaviour was tested for different sample sizes (i.e., 250, 500, 750 and 1000 samples).
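The selection procedure described above, in which each regressor is trained on growing datasets and scored with R² and MSE, could be scripted roughly as follows. This is a sketch assuming scikit-learn and synthetic data. The sample sizes are deliberately smaller than the paper's 250 to 1000 so that the example runs quickly, and the `sample` and `make_models` helpers are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)


def sample(n):
    """Draw n synthetic designs; a stand-in for FEA results."""
    X = rng.uniform(-1.0, 1.0, size=(n, 4))
    y = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.3 * X[:, 3]
    return X, y


def make_models():
    """Fresh, unfitted instances of the three regressors under comparison."""
    return {
        "SVR": SVR(kernel="rbf", C=3.2),
        "GBR": GradientBoostingRegressor(max_depth=3, random_state=0),
        "GPR": GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True),
    }


X_test, y_test = sample(200)   # held-out designs used for scoring
X_pool, y_pool = sample(200)   # pool from which training subsets are taken

scores = {}
for n_train in (50, 100, 200):  # reduced sizes (assumed) for a quick run
    for name, model in make_models().items():
        model.fit(X_pool[:n_train], y_pool[:n_train])
        y_hat = model.predict(X_test)
        scores[(name, n_train)] = (
            r2_score(y_test, y_hat),
            mean_squared_error(y_test, y_hat),
        )

for (name, n_train), (r2, err) in sorted(scores.items()):
    print(f"{name} n={n_train:4d}  R2={r2:.3f}  MSE={err:.4f}")
```

The scores printed by this sketch depend on the synthetic target and say nothing about the motor datasets; only the structure of the comparison is intended to carry over.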
The first values fitted were the torque and back emf harmonics. The ability of the tested regression methods to predict the most influential torque harmonic orders (i.e., DC TH, 6th TH and 12th TH) and back emf harmonic orders (i.e., 1st BmfH, 3rd BmfH and 11th BmfH) is presented in Table 4. Here, the R² and the MSE scores are presented for each individual regression method (i.e., SVR, GBR, GPR), sequentially trained using 250, 500, 750 and 1000 samples. The values show that, for all tested methods, the R² score increases and the MSE decreases as the number of samples is increased. In terms of machine learning models, the SVR model performs much better than the GBR model for all sample sizes, the GBR presenting a higher MSE and a lower R² than the SVR. At the same time, the GPR model fits the targets more accurately than both for every training set. Starting with 750 samples, the GPR presents excellent results, its R² score being at least 0.93 and its MSE at most 7%. In particular, for 750 samples, GPR improves the scores for the 12th TH, raising R² from 0.86 (for 250 samples) to 0.93 and reducing the MSE from 13% (for 250 samples) to 7%. The same applies to the 11th BmfH: the R² value increases from 0.89 to 0.94 and the MSE reduces from 11% to 1%. However, even though generating 1000 samples from FEA is more computationally expensive, GPR performs at its best for 1000 samples, improving the fitting capabilities compared with the 750 samples case, especially for the third most influential torque and back emf harmonics. For this case, the obtained R² score is 0.95 and the MSE is 5% for the 12th TH, and the R² score is 0.99 and the MSE is 1% for the 11th BmfH. Figure 7 shows the predicted values of the most influential torque and back emf harmonics against their actual target (original) values, obtained from the GPR 1000 samples machine learning model.
The capability of the tested regression methods to predict the modes and their corresponding natural frequencies is presented in Table 5, where the R² and the MSE scores are presented for each individual regression method (SVR, GBR, GPR), sequentially trained using 250, 500, 750 and 1000 samples.
As can be observed, the machine learning models under test predict very well even when they are trained using 250 samples. The presented values show that, except for the GPR method, which keeps its scores constant, the SVR and the GBR show an improvement of the R² score and a reduction of the MSE when the number of samples is increased. Moreover, their performance saturates beyond 750 samples: when the data size is increased from 750 to 1000 samples, the R² and the MSE remain the same, so expanding the dataset beyond 750 samples does not influence the accuracy of the developed models. The models developed for 750 samples already perform well, their R² values being 0.99 for the SVR and GBR models and 1 for the GPR model. Regarding the MSE, its values are between 0% and 1% for SVR and GBR, while for GPR it is 0%. The GPR model fits the natural frequencies more accurately than SVR and GBR and is the best performer even with 250 samples. The low number of training samples needed for a good performance is due to the fact that the global modes and their natural frequencies are highly affected by the yoke and tooth thickness and are not as sensitive as the electromagnetic targets when the tooth tip angle, tooth tip height and slot opening are adjusted within the set range. Therefore, fewer designs are needed to build the data-driven model, and the structural characteristics obtained from 250 sets of input geometrical parameters are sufficient to obtain a good generalisation capacity that allows the characterisation of a new design. Figure 8 shows the model capability to approximate the motor structural characteristics. Specifically, the frequency at which the first six mode-shapes appear is estimated with high accuracy by the GPR 250 samples machine learning model.
The ability to predict the losses targets is presented in Table 6, where the scores for all types of motor losses are given (i.e., total-Tot, winding losses-Wind, iron-Iron, stator back iron-SBI, rotor back iron-RBI and magnet-Mag). The results for R² and MSE show that SVR performs better than GBR for all dataset sizes. The performance of all three models improves as the size of the training data increases, the worst prediction being obtained at 250 samples for stator tooth losses. Both SVR and GBR manage to increase the R² to 1 and 0.99 and to reduce the MSE to a value under 2 for the 1000 samples case, but the model that is most suitable to accurately fit the losses data is, also in this case, GPR. This algorithm manages to predict the losses, even for 250 samples, with a maximum MSE of 9%, reducing the MSE to 1% for the 500 and 750 samples cases and to 0% for the 1000 samples case. GPR perfectly fits the target data for the 1000 samples case, where the accuracy indicators show perfect results, R² taking unity value and MSE being zero. Figure 9 displays the predicted motor losses (total, winding, iron, stator back iron, stator teeth, rotor back iron and the magnets) against their actual target (original) values, obtained from the GPR 1000 samples machine learning model. The obtained values are specific to the described number of stator DOFs. For fewer DOFs, a smaller number of feasible designs is needed to build the data-driven model. Correspondingly, when the number of DOFs is increased, the number of analysed designs must be enlarged to keep the same accuracy.

Machine Learning Model Computational Cost. Machine Learning Model versus FEA
The purpose of developing the multi-attribute machine learning model is to reduce the computational time at the design stage and obtain a fast estimation of the motor performances, while maintaining the high accuracy of FEA. This method allows an accurate system-level analysis and a fast optimisation routine. Therefore, the capability of the developed multi-attribute machine learning model to predict the optimal solution is evaluated, together with its computational cost. The multi-attribute machine learning model under test is defined by the combination of the best performing regressors for each physics involved (GPR 1000 for electromagnetic targets, GPR 250 for the natural frequencies of the modes and GPR 1000 for losses) and is from now on denoted ML1000.
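One way to picture the combination of the three per-physics regressors into a single multi-attribute model is the sketch below. It assumes scikit-learn GPRs with an RBF kernel. The `MultiAttributeModel` class and the `fit_gpr` helper are hypothetical names, the training data are synthetic, the sample counts are scaled down from 1000/250/1000 to 100/25/100, and all three models share the same three inputs for simplicity, whereas the paper uses different feature sets per physics.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


class MultiAttributeModel:
    """Bundle one fitted regressor per physics behind a single predict call."""

    def __init__(self, models):
        self.models = models  # dict: physics name -> fitted regressor

    def predict(self, X):
        # Query every per-physics model with the same design matrix.
        return {name: model.predict(X) for name, model in self.models.items()}


rng = np.random.default_rng(0)


def fit_gpr(n_samples, target_fn):
    """Fit a GPR on synthetic designs; target_fn stands in for an FEA result."""
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 3))
    gpr = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True)
    return gpr.fit(X, target_fn(X))


multi_model = MultiAttributeModel({
    "electromagnetic": fit_gpr(100, lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2),
    "structural": fit_gpr(25, lambda X: np.cos(X[:, 2]) + 0.1 * X[:, 0]),
    "losses": fit_gpr(100, lambda X: X[:, 0] ** 2 + 0.2 * np.sin(X[:, 1])),
})

# One candidate design evaluated by all three models at once.
prediction = multi_model.predict(np.zeros((1, 3)))
print(prediction)
```

Keeping the per-physics models separate lets each one be retrained with its own dataset size, which mirrors the different sample counts selected above.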
To quantify the advantage of using the ML1000 machine learning model over FEA, the model is included in a geometric optimization process that aims to maximise a function (e.g., the natural frequency of a mode) with respect to imposed multi-physical constraints. In parallel, an FEA-based optimisation procedure, with objectives and constraints identical to those of the ML1000 optimisation process, is performed on the structural and electromagnetic FE motor models. After that, the optimisation results of the two methods are compared in terms of accuracy and computational cost.

Optimisation Process
The optimisation procedure is performed using the HEEDS MDO software package. HEEDS MDO searches for the best and most robust solutions in a given design space, while drastically reducing the design time, compared with other optimization software [46].
The optimisation algorithm is specific to the software used. HEEDS provides a search strategy named SHERPA (simultaneous hybrid exploration that is robust, progressive and adaptive) [46]. The main advantages of the chosen algorithm are that it: allows an intuitive implementation, even for users with no expertise in the field of optimisation; uses multiple strategies simultaneously, and not sequentially, for a single search; adapts itself to each particular problem, without needing tuning parameters from the user; identifies the global and the local optimum simultaneously; conducts the optimisation by evaluating the actual model, rather than approximate response surface models; and has a high efficiency, saving days or even weeks of CPU time [46].
The optimisation problem can be described as follows: maximise the function $f([x])$, representing the natural frequency of Mode 2, subject to a set of constraints, $C_{\min} \le C([x]) \le C_{\max}$, where $[x]$ represents the stator geometry DOFs vector with its upper and lower boundary vectors $[UB]$ and $[LB]$, described in Table 2, and $C_{\min}$ and $C_{\max}$ are the minimum and maximum allowed values for each constraint.
The natural frequency of Mode 2, average torque, torque ripple and total losses belong to the constraint vector C([x]).
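The structure of this constrained maximisation can be illustrated with a plain random search over the bounded design space. This is only a stand-in: SHERPA is a proprietary strategy and is not reproduced here. The bounds, the constraint limits and the closed-form `surrogate` function are all hypothetical placeholders for Table 2 and for the trained model.

```python
import random

LB = [0.0, 0.0, 0.0]  # hypothetical lower bounds [LB]
UB = [1.0, 1.0, 1.0]  # hypothetical upper bounds [UB]


def surrogate(x):
    """Hypothetical stand-in for the trained model: design vector -> targets."""
    return {
        "mode2_freq": 1000.0 + 400.0 * x[0] - 150.0 * x[1],
        "avg_torque": 5.0 - 2.0 * x[0] + 1.0 * x[2],
        "torque_ripple": 0.02 + 0.05 * x[1],
        "total_losses": 100.0 + 30.0 * x[0] + 10.0 * x[2],
    }


# C_min <= C([x]) <= C_max for every constrained quantity (made-up limits).
C_MIN = {"mode2_freq": 900.0, "avg_torque": 4.0, "torque_ripple": 0.0, "total_losses": 0.0}
C_MAX = {"mode2_freq": 2000.0, "avg_torque": 10.0, "torque_ripple": 0.05, "total_losses": 125.0}


def is_feasible(targets):
    """True when every constrained quantity lies inside its allowed interval."""
    return all(C_MIN[key] <= targets[key] <= C_MAX[key] for key in C_MIN)


def random_search(n_designs, seed=0):
    """Keep the feasible design with the highest Mode 2 natural frequency."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(n_designs):
        x = [rng.uniform(lo, hi) for lo, hi in zip(LB, UB)]
        targets = surrogate(x)
        if is_feasible(targets) and targets["mode2_freq"] > best_f:
            best_x, best_f = x, targets["mode2_freq"]
    return best_x, best_f


best_x, best_f = random_search(n_designs=1000)
print(best_x, best_f)
```

Because each candidate costs only one surrogate evaluation, the number of designs explored can be raised at almost no extra cost, which is the property exploited in the comparison with FEA.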

Machine Learning vs. FEA
In order to study the accuracy and the computational feasibility of the developed ML1000 model, the optimisation is performed on a varying number of motor designs, from 250 to 2000, with a step of 250 designs. The optimisation results obtained from the two parallel processes, ML1000 and FEA-based optimisation, for the cases of 500, 1000 and 1500 optimisation designs are compared in Table 7. Here, the results obtained for the geometrical parameters, together with the values of the natural frequency of Mode 2, average torque, torque ripple and total losses, are expressed as the percentage difference from the nominal motor values. As can be observed, the ML1000 optimisation process identifies accurate results, compared with the FEA optimisation process, for all three presented cases. A small difference between the FEA and ML1000 values appears only in the third decimal place and is negligible. The computational cost of both methods, ML1000 and FEA-based optimisation, is shown in Figure 10. Here, the computation time is given as a function of the number of machine geometric designs taken into consideration during the optimisation process (N_O). The computational time is shown in blue for the FEA case and in red for the ML1000 model. For the machine learning model, the computation time consists of both the time spent building the machine learning model and the time required for optimisation. As can be observed, for the machine learning model, the most time-consuming process is the extraction of the data necessary for model identification. The cost of building the ML1000 model, based on FEA simulations, is 60 h, considering that the structural and electromagnetic FE analyses are carried out in parallel. Once the data is available, the training time is negligible (it usually takes a couple of seconds for 1000 samples), while the run time is low and considered negligible too.
In the same figure, it can be noticed that the computational cost of the ML1000 model is higher than that of the FEA for the cases where the number of simulations used to build the model is higher than the number of optimisation designs. Starting with 1000 optimisation designs, the ML1000 model becomes more time efficient than the FEA approach. This region is marked with a grey area in Figure 10. After this threshold, the FEA time cost increases considerably with each additional design, whereas for the machine learning model the computational cost of a new design is insignificant.
In order to quantify the time reduction, a computational efficiency factor that gives the relative reduction in computational time is introduced [47]. The computational efficiency ratio (k_cr) is defined as the ratio between the number of designs used to build the machine learning model (N_D) and the number of designs used in the machine geometric optimisation process (N_O):

$$k_{cr} = \frac{N_D}{N_O}$$

The saving of computational time for the proposed machine learning model is shown in Figure 11 in red. For the cases where k_cr is less than 1, marked with a grey area, the machine learning model is more time efficient than the FEA. The developed model presents computational feasibility over a wide range. For example, if 2000 designs are considered in the geometric optimisation process, the computational time reduction is 50%. The computational efficiency factor values for the other machine learning models, trained with 250, 500 and 750 samples, are shown with black, blue and green lines. These models become feasible starting with fewer optimisation designs, but the drawback is that they are not as accurate as the ML1000 model, as presented in the previous section.
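The ratio can be tabulated for the dataset sizes and optimisation budgets considered in the study. The sketch below assumes that the relative time saving equals 1 − k_cr, which matches the 50% quoted for 2000 optimisation designs with the 1000-sample model but is an interpretation rather than a formula stated in the text. The function and variable names are illustrative.

```python
def efficiency_ratio(n_build, n_opt):
    """k_cr = N_D / N_O: designs used to build the model over designs optimised."""
    return n_build / n_opt


ratios = {}
for n_build in (250, 500, 750, 1000):      # training set sizes (N_D)
    for n_opt in range(250, 2001, 250):    # optimisation designs (N_O)
        ratios[(n_build, n_opt)] = efficiency_ratio(n_build, n_opt)

# Assumed reading: time saving relative to FEA is 1 - k_cr when k_cr < 1.
k_cr = ratios[(1000, 2000)]
time_saving = 1.0 - k_cr
print(f"k_cr = {k_cr:.2f}, time saving = {time_saving:.0%}")
```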

Conclusions
This paper presents a multi-physical machine learning approach for multi-attribute performance prediction of electrical motors. For that, the full process of designing the electrical motor is brought into the early-design stage, where all the characteristics (electromagnetic, motor losses, modal analysis mode-shapes and natural frequencies) can be evaluated and predicted by the designer. The multi-attribute model is constructed based on the data obtained by conducting a series of 2D electromagnetic and 3D structural FE analyses on a set of motor models generated by modifying the DOFs, i.e., the geometrical parameters of a base parametric motor model. Three independent machine learning models are developed, one for each analysed physics: one for electromagnetic torque and back emf harmonics, one for motor losses and one for the structural targets (modes and natural frequencies). The tested methods are support vector regression (SVR), gradient boosting regressor (GBR) and Gaussian process regressor (GPR), selected due to their capacity to predict discrete datasets. Moreover, their accuracy is tested for four training dataset sizes: 250, 500, 750 and 1000 samples. The fitting capabilities of the regression models are individually analysed. Two key performance indicators (KPIs) are used to measure the fitting capabilities: the coefficient of determination (R²) and the mean squared error (MSE). The results show that the GPR algorithm applied on 1000 samples is the most suitable machine learning model to predict the electromagnetic torque and back electromotive force harmonic orders, its R² scores being at least 0.95, while the MSE is at most 5%. For structural targets, the regressor that fits the data with the best scores is the GPR model with 250 samples, with an R² score of 1 and an MSE of 0%.
Regarding the ability of the machine learning models to predict the motor losses, the GPR 1000 samples machine learning model presents the best results, with R² scores of 1 and MSE values of 0%. It was demonstrated that the multi-attribute machine learning model developed by combining the best models for each physics involved can provide accurate predictions and a significant reduction in computational time, compared with the classical FEA approach.