An AI-Assisted Design Method for Topology Optimization Without Pre-Optimized Training Data

Topology optimization is widely used by engineers during the initial product development process to obtain a first possible geometry design. The state of the art is iterative calculation, which requires both time and computational power. Some newly developed methods use artificial intelligence to accelerate topology optimization. These require conventionally pre-optimized data and are therefore dependent on the quality and quantity of the available data. This paper proposes an AI-assisted design method for topology optimization which does not require pre-optimized data. The designs are provided by an artificial neural network, the predictor, on the basis of boundary conditions and degree of filling (the volume percentage filled by material) as input data. In the training phase, geometries generated on the basis of random input data are evaluated with respect to given criteria. The results of those evaluations flow into an objective function which is minimized by adapting the predictor's parameters. After the training is completed, the presented AI-assisted design procedure supplies geometries which are similar to the ones generated by conventional topology optimizers, but requires a small fraction of the computational effort required by those algorithms. We anticipate that our paper will be a starting point for AI-based methods applied to problems where training data are hard to compute or unavailable.


Introduction
In Topology Optimization (TO), the material distribution over a given design domain is optimized by minimizing a certain objective function while fulfilling specified restrictions [1]. In most cases, the optimization problem is solved in a mathematical way by means of a suitable search algorithm.
The present contribution deals with the solution of TO problems by means of Artificial Intelligence (AI) techniques. State-of-the-art research in this area requires optimal structures, obtained by conventional TO, as a data basis. For this reason, those techniques are subject to several limitations, such as large computational effort and problematic handling of multi-modal formulations. The approach proposed here aims at removing those drawbacks by generating all the artificial knowledge required for the optimization during the learning phase, with no need of relying on pre-optimized results.

Topology Optimization
In this work, only the case of mono-material topology optimization will be considered. The material of which the structure is to be built is a constant of the problem, and the geometry remains the unknown.
In the case of stiffness optimization, a scalar measure of structural compliance is typically chosen as the function to be minimized. In addition, the condition that a given quantity of material is used over the design domain must be fulfilled. This material quantity is expressed as a fraction of the maximum possible amount of material (degree of filling). Minimization of compliance results in maximizing the stiffness. The available design domain, the static and kinematic boundary conditions for the regarded load cases as well as strength thresholds are typically considered as restrictions.
This paper will also focus on stiffness optimization, although the presented method is of general validity and could be applied to optimization with different objective functions or restrictions.
There are numerous possible approaches to TO [1]. In the "Solid Isotropic Material with Penalization" (SIMP) approach according to Bendsøe [2], the design domain is divided into elements. For each of those elements, the contribution to the overall stiffness of the structure is scaled with a factor to be determined.
The SIMP approach is able to provide optimized geometries for many practical cases by means of an iterative process. Each iteration involves computationally intensive operations: the most critical ones are assembling the stiffness matrix and solving the system of equations. When restrictions are involved, such as stress restrictions, the complexity of the optimization problem increases [3,4].

Artificial Neural Networks
Artificial Neural Networks (ANNs) belong to the area of Machine Learning (ML), which, in turn, is assigned to AI. ANNs are able to learn and execute complex procedures, which has led to remarkable results in recent years. For example, ANNs are able to recognize the objects shown in pictures by their shape and color, or to beat world champions in the board game "Go" [5,6].
The development of ANNs is progressing steadily, on the one hand due to the steadily increasing available computing power and on the other hand due to the discovery of new ways to improve the learning process.
ANNs or, more precisely, feedforward neural networks consist of layers connected in sequence. These layers contain so-called neurons [7]. A neuron (see Figure 1) is the basic element of an ANN. The combination of all layers is also called a network.
The neuron receives n inputs (here given as vector z), which are linearly combined, added to a bias value b and passed as argument to an activation function fa:

f_n(z) = f_a(w^T z + b). (1)

The coefficients of the linear combination, collected in the vector w, are called weights.
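As an illustration of equation (1), here is a minimal NumPy sketch of a single neuron and of a fully connected layer; the function names and the tanh activation are illustrative assumptions, not part of the paper:

```python
import numpy as np

def neuron(z, w, b, fa=np.tanh):
    """Single neuron: linear combination of the inputs plus a bias,
    passed through an activation function, cf. equation (1)."""
    return fa(w @ z + b)

def dense_layer(z, W, b, fa=np.tanh):
    """A fully connected layer with m neurons sharing the input z:
    W has shape (n, m), so the layer returns m outputs."""
    return fa(W.T @ z + b)

z = np.array([0.5, -1.0, 2.0])        # n = 3 inputs
w = np.array([0.1, 0.2, 0.3])
print(neuron(z, w, b=0.05))
```

With several neurons sharing the same input, the weight vectors stack into the matrix W, exactly as described for the fully connected layer below.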
It is usual that several neurons have the same input. All neurons with the same inputs are grouped together in one layer (also called fully connected layer). Since one single output is supplied by each neuron, each layer with m neurons also produces m outputs and the weights become matrices W ∈ R^(n×m). The outputs of a layer (except the last layer) serve as inputs for the following layer. The first layer is called input layer f^(1) and the last layer is called output layer f^(nL). Any layer whose input and output values are not accessible to the user is called hidden layer f^(2,...,nL−1). Each layer, for example the layer f^(2), has its own weights W^(2) and biases b^(2).
The number of layers nL is also named the depth of the network, which also originates the attribute "deep" in the term Deep Learning (DL). The term DL is generally used for ANNs with several hidden layers. The presence of several layers makes it possible to map a more complex transfer behavior between the input and the output layer.
The functional relationship realized by the ANN depends on the weights W and on the biases b, which are adjusted in the context of so-called training or learning according to certain algorithms (learning algorithms). The learning algorithm used consists in the gradient-based minimization of a scalar value termed error or loss, which is obtained from the deviations of the actual outputs from given target outputs. The values which describe the network's architecture and do not undergo any change during training, like the number of neurons in a layer, are termed hyperparameters.
In addition to the fully connected layers there are convolutional layers. These layers use convolution in place of the linear combination (1). Here the trainable weights, also called convolution kernel, are convoluted with the input of the layer and produce an output which is passed to the next layer. This process is efficient for grid-like data structures and is therefore used for many modern image applications [8].
The output of an ANN is henceforth referred to as prediction. Further details on the learning of an ANN can be found in the specific literature, for example [8,9].

ANN-based Topology Optimization
DL-based TO, by predicting the geometry through ANNs, aims to deliver optimized results in only a fraction of the time required by conventional optimization, by moving the computationally intensive part to the training algorithm, which is executed only once. The results provided by the trained ANNs can then be used directly, refined with conventional methods or adapted to the desired structure size.
There are already some attempts in this area. The majority used many thousands of topology-optimized geometries as training datasets for the ANN [10,11,12,13,14].
In the case of [12] 80,000 optimized datasets based on the 88 lines of code (top88) [15] were used for the training of the neural network.
Banga et al. used an approach in which intermediate results of conventional TO are the basis for the training datasets for the ANN [16].
Nie et al. used a Generative Adversarial Network (GAN) [17] and some physical fields over the initial domain, like strain energy density or von Mises stress, to predict the geometries. The model was trained with 49,078 conventionally topology-optimized datasets [18].
Yamasaki et al. [19] and Cang et al. [20] achieved good results using data-driven approaches. In the case of Cang, two cases were trained and tested: in the first case, the direction of a single load could be changed; in the second case, both the direction and the position of the load were variable. Yamasaki's ANN needed to be trained anew for each new boundary condition; only the volume fraction was variable.
A different approach was presented by Chandrasekhar and Suresh, where the ANN does not generate the whole geometry but only a density value at given x and y coordinates [21].
Although such ANN topology optimization procedures are able to perform the above-mentioned task of a fast and direct generation of optimized geometries, the predictions are subject to some restrictions.
Since topology-optimized training data are used, and the generation of these data with conventional methods is very time-consuming, the number of training data sets which can be considered is limited. In the case of [10], 100,000 data sets were generated, which took about 200 h; another 8 h were needed for training. This limitation negatively affects the accuracy in the prediction of unknown geometries (i.e. geometries which were not used within the training). For example, in [10] about 3.4 % of the generated geometries are not connected (with theoretically infinite compliance) and are therefore not usable.
This paper investigates the possibility, differing from the state of the art, of training an ANN without the use of topology-optimized data sets. The generation of training data sets and the training itself are merged into one single procedural step.
This makes it possible to process a much larger number of data sets for the training in a much shorter time. And since the compliance is calculated during the training, the ANN learns to avoid undesirable results.
The state-of-the-art procedures require the use of a large number of optimized data sets. These data sets must be optimal to be suitable as training data. Depending on the optimization formulation, this may not be the case, as local minima and convergence problems may occur. A method which does not use optimized data sets is not subject to these restrictions.

Method
The presented method is based on an ANN architecture called Predictor-Evaluator-Network (PEN), which was developed by the authors for this purpose. The predictor is the trainable part of the PEN and its task is to generate optimized geometries based on input data sets.
As mentioned, unlike the state-of-the-art methods described above, no pre-optimized data sets are used in the training. The geometries used for the training are created by the predictor itself on the basis of randomly generated input data sets and evaluated by the remaining components of the PEN, called evaluators.
The evaluators perform mathematical operations. Unlike the predictor, the operations performed by the evaluators are pre-defined and do not change during the training.
Each evaluator assesses the outputs of the predictor with respect to a certain criterion and returns a corresponding scalar value as a measure of the criterion's fulfillment. This measure is the loss or error of the evaluator. A scalar function of the evaluator outputs (objective function J, see section 2.7) combines the individual losses.
During the training, the objective function computed for a set of geometries (batch) is minimized by changing the predictor's trainable parameters, see section 2.2. In this way, the predictor learns how to produce optimized geometries.
The predictor, the individual evaluators, their tasks and their way of operation are explained in detail in the following sections.

Basic Definitions
In topology optimization, the design domain is typically subdivided into elements by appropriate meshing. In Figure 2, elements (with one element hatched) and nodes are visualized. In this work, only square meshes with equal numbers of rows and columns were examined, although the method could also be applied to non-square and three-dimensional geometries.

Element
The total number of elements in the 2D case is dx · dy, where dy is the number of rows and dx the number of columns (see Figure 2). In the square case the numbers of rows and columns are equal: dx = dy = d. The d^2 design variables x_i, i = {1, . . ., d^2}, termed density values, scale the contributions of the single elements to the stiffness matrix. The density has the value one when the stiffness contribution of the element is fully preserved and zero when it disappears.
The density values are collected in a vector x. In general, the density values x_i are defined in the interval [0, 1]. In order to prevent possible singularities of the stiffness matrix, a lower limit value x_min for the entries of x is set [2]: x_min ≤ x_i ≤ 1. The vector of design variables x can be transformed into a square matrix X_M of order d by using the R_2d operator: X_M = R_2d(x). Although a binary selection of the density is desired (discrete TO, material present/not present), values between zero and one are permitted for algorithmic reasons (continuous TO). To get closer to the desired binary selection of densities, the so-called penalization can be used in the calculation of the compliance. The penalization is realized by an element-wise exponentiation of the densities with the penalization exponent p > 1 [22].
The arithmetic mean of all x_i defines the degree of filling of the geometry:

M = (1/d^2) Σ_{i=1}^{d^2} x_i. (6)

The target value M_tar is the degree of filling that is to be achieved by the predictor.
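A minimal NumPy sketch of the quantities defined above; the row-major ordering assumed for the R_2d operator is an assumption, not taken from the paper:

```python
import numpy as np

d = 4                                        # mesh with d rows and d columns
x = np.random.uniform(0.001, 1.0, d * d)     # density vector with x_min = 0.001

# R_2d operator: reshape the density vector into a d x d matrix
X_M = x.reshape(d, d)

# SIMP penalization: element-wise exponentiation with p > 1
p = 3.0
x_pen = x ** p

# degree of filling, equation (6): arithmetic mean of all densities
M = x.mean()
print(f"degree of filling: {M:.3f}")
```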
The kinematic boundary conditions are stored in two (d_inp + 1) × (d_inp + 1) boolean matrices R_k,x and R_k,y. In Figure 3, which shows an overview of how the boundary conditions are handled, as well as in the following figures, the green arrows represent the kinematic boundary conditions and the red ones the static boundary conditions. The entries of R_k,x are set to one if the x-component of the displacement in the corresponding node is fixed, and zero otherwise. Analogously, the entries of R_k,y are set according to the fixed y-components of the displacements. Both matrices can be transformed into vectors with the R_1d operator, which is the inverse of the operator R_2d, and then arranged in sequence so that the vector r_k ∈ R^(2(d_inp+1)^2) is created. Analogously to the kinematic boundary conditions, two (d_inp + 1) × (d_inp + 1) matrices R_s,x and R_s,y are firstly built on the basis of the static boundary conditions (visualized by red arrows); the x- and the y-components of the applied forces are placed, respectively, into these matrices, which are then transformed into the static boundary condition vector r_s in the same way.
Investigations showed that the training speed could be increased, for high-resolution geometries, by dividing the training into levels with increasing resolution. Since smaller geometries are trained several orders of magnitude faster and the knowledge gained is also used for higher-resolution geometries, the overall training time is reduced compared to training that uses only high-resolution geometries. The levels are labeled with the integer number Λ.
Increasing Λ by 1 results in doubling the number d of rows or columns of the design domain's mesh. This is done by quartering the elements of the previous level. In this way, the nodes of the previous level are kept in the new level. The number of rows or columns at the first level is denoted as d_inp.
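The level refinement described above can be sketched with `np.kron`, which quarters each element so that the nodes of the previous level are preserved; this is a sketch of the idea, not the authors' implementation:

```python
import numpy as np

def refine_level(X):
    """Increase the level by one: each element of the previous level is
    split into four, doubling the number of rows and columns while
    keeping the nodes of the previous level."""
    return np.kron(X, np.ones((2, 2)))

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(refine_level(X).shape)   # (4, 4)
```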
The input data of the predictor include the kinematic r_k and static r_s boundary conditions as well as the target degree of filling M_tar. The output of the predictor is a geometry x. Input data can only be defined at the initial level and do not change while the level is changed. Hence, new nodes cannot be subject to static or kinematic boundary conditions (see Figure 4). The change of level occurs after a certain condition, which will be described later, is fulfilled.

Predictor
The predictor is in charge of creating an optimized geometry for a given input data set. Its ANN architecture consists of multiple hidden layers, convolutional layers and an output layer with d^2 neurons (see Figure 6). As activation function f_a(z) in the hidden and convolutional layers, the Parametric Rectified Linear Unit (PReLU) function is used [23]. The PReLU function is the equivalent of the Rectified Linear Unit (ReLU) function [23]

f_ReLU(z) = max(0, z) = { 0 if z < 0; z otherwise }, (7)

with the difference of a variable negative slope α, which can be adapted during training:

f_PReLU(z) = { αz if z < 0; z otherwise }.

The sigmoid function is well suited as activation function for the output layer because it provides results in the interval (0, 1), see Figure 5. This makes the predictor's output directly suitable to describe the density values of the geometry. All parameters that can be changed during training, like the biases, the slope α of the PReLU as well as the weights of the hidden layers, will be generally referred to as trainable parameters in the following. They are collected in the matrix W_p. The operations performed by the predictor can be represented by a function f_p: x = f_p(r_k, r_s, M_tar, W_p). Figure 6 shows the data flow through the predictor as well as the output layers for different levels Λ: an input data set (top left) is processed by several successive hidden layers and then passed on to some Residual Network (ResNet) blocks. In order to reduce the resolution to a lower Λ, average pooling is used.
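The activation functions named above can be sketched as follows; this is a minimal NumPy illustration in which α is a plain argument, whereas in the predictor it is a trainable parameter:

```python
import numpy as np

def prelu(z, alpha=0.25):
    """Parametric ReLU: identity for z >= 0, slope alpha for z < 0.
    In the predictor, alpha is adapted during training."""
    return np.where(z < 0.0, alpha * z, z)

def sigmoid(z):
    """Sigmoid output activation: maps any input into (0, 1), so the
    outputs can directly serve as density values."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(prelu(z))      # negative inputs are scaled by alpha
print(sigmoid(z))
```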
In Figure 6, the hidden block is the combination of a hidden or fully connected layer and an activation function call. The ResNet block is the combination of two (convolutional) layers and a shortcut that is added as a bypass to the output of the layers. The ResNet block not only allows for faster learning but also reduces the error [24].
For subsequent levels, the outputs of the last convolutional block of the previous level and the outputs of the last hidden block of the first level are added together and then, after an additional convolutional block, converted to the desired output dimension.

Evaluator: Compliance
The task of the compliance evaluator is the computation of the global mean compliance. For this purpose, an algorithm based on the Finite Element Method (FEM) [22] is used. The global mean compliance is defined according to [22] as

c = f^T u, (12)

with K as the stiffness matrix, f as the force vector and u as the displacement vector, related by K u = f. The compliance has the dimension of energy. As usual in the literature [22,15], the units will be omitted in the following for the sake of simplicity.
As already explained, the static boundary conditions vector r_s consists first of x-entries and then of y-entries. Since the degrees of freedom of the stiffness matrix are arranged in an alternating way (one x-entry and one y-entry), the force vector is to be built accordingly. In order to transform the static boundary condition vector r_s into the force vector f, the number of nodes l and a collocation matrix iR with

iR_{2j,j} = 1 where j = {0, . . ., l − 1},
iR_{2(j−l)+1,j} = 1 where j = {l, . . ., 2l − 1},
iR = 0 otherwise, (14)

are required. The force vector is then obtained as f = iR r_s. The system's equations write K u = f. The stiffness matrix K depends linearly on the penalized densities and is expressed by

K = Σ_{i=1}^{d^2} x_i^p K_i,

where the matrices K_i are the unscaled contributions of the single elements to the stiffness matrix. The penalization exponent p achieves the desired focusing of the geometry towards the limit values x_min and 1, as described in section 2.1.
The stiffness matrix K is then reduced by removing the columns and rows corresponding to the fixed degrees of freedom according to the kinematic boundary conditions. The result is the reduced stiffness matrix K_red, which then can be inverted. The reduced force vector f_red is determined according to the same principle. From the reduced equation K_red u_red = f_red, the reduced displacement vector is obtained as u_red = K_red^(−1) f_red. The reduced global mean compliance c_red is finally computed as follows:

c_red = f_red^T u_red. (20)

The calculation of the mean global compliance c according to (12) or c_red according to (20) leads to the same result, since u at the fixed degrees of freedom vanishes and therefore has no effect on c.
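A minimal sketch of the reduced compliance computation on a toy system; the stiffness matrix and load are illustrative, and a linear solver is used instead of the explicit inversion:

```python
import numpy as np

def compliance(K, f, fixed_dofs):
    """Reduced global mean compliance: remove rows/columns of the fixed
    degrees of freedom, solve the reduced system K_red u_red = f_red,
    and return c_red = f_red^T u_red, cf. equation (20)."""
    free = np.setdiff1d(np.arange(K.shape[0]), fixed_dofs)
    K_red = K[np.ix_(free, free)]
    f_red = f[free]
    u_red = np.linalg.solve(K_red, f_red)   # avoids explicit inversion
    return f_red @ u_red

# toy 3-DOF system with the first DOF fixed
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
f = np.array([0.0, 0.0, 1.0])
print(compliance(K, f, fixed_dofs=[0]))
```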

Evaluator: Degree of filling
The task of this evaluator is to determine the deviation ΔM of the degree of filling M_is, see (6), from the target value M_tar: ΔM = |M_is − M_tar|. By considering the filling degree's deviation ΔM in the objective function, the predictor is penalized proportionally to the extent of the deviation from the target degree of filling M_tar.

Evaluator: Filter
The filter evaluator searches for checkerboard patterns in the geometry and outputs a scalar value F ∈ [0, 1] that indicates the amount and extent of the checkerboard patterns detected. These checkerboard patterns consist of alternating high and low density values of the geometry. They are undesirable because they do not reflect the optimal material distribution and are difficult to transfer to real parts. Checkerboard patterns arise from poor numerical modelling [25].
Several solutions for the checkerboard problem were developed in the framework of conventional topology optimization [26]. In this work, a new strategy was chosen, which allows for the inclusion of the checkerboard filter into the quality function. In the present approach, checkerboard patterns are admitted, but detected and penalized accordingly. Since the type of implementation is fundamentally different, it is not possible to compare the conventional filter method with the filter evaluator. With the convolution matrix V, which is visualized in Figure 7, a two-dimensional convolution operation (discrete convolution) is performed over the geometry, and the mean value of the result serves as a first indicator. This indicator would already be sufficient to exclude geometries with checkerboard patterns, but it also penalizes good geometries without recognizable checkerboard patterns. Therefore, an improved indicator is formed on the basis of the mean value and with the help of the exponential function, which is less sensitive to small mean values but nevertheless results in a corresponding penalization for large checkerboard patterns. The parameter F_k controls the shape of the F-function (see Figure 8).
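Since the paper's convolution matrix V is given only in its Figure 7, the following sketch uses an assumed 2 × 2 alternating-sign kernel and an assumed exponential mapping, merely to illustrate the idea of detecting and penalizing checkerboard patterns:

```python
import numpy as np

def checkerboard_indicator(X, Fk=20.0):
    """Hypothetical sketch of the filter evaluator: evaluate a 2x2
    alternating-sign response (assumed kernel, not the paper's V) at
    every interior node, take the mean absolute response, and map it
    through an exponential so that small responses are barely penalized."""
    resp = X[:-1, :-1] - X[:-1, 1:] - X[1:, :-1] + X[1:, 1:]
    m = np.abs(resp).mean()
    return 1.0 - np.exp(-Fk * m)          # F in [0, 1)

checker = (np.indices((6, 6)).sum(axis=0) % 2).astype(float)  # checkerboard
solid = np.ones((6, 6))                                       # uniform
print(checkerboard_indicator(checker))    # close to 1: strong penalty
print(checkerboard_indicator(solid))      # 0.0: no penalty
```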

Evaluator: Uncertainty
When calculating the density values of the geometry x, the predictor should, as far as possible, focus on the limit values x_min and 1; intermediate values should be penalized. The deviation from this goal is expressed by the uncertainty evaluator with the scalar variable P (uncertainty). This value increases if the predicted geometry deviates significantly from the limit values and thus penalizes the predictor. The uncertainty evaluator uses the normal distribution function

f_g(x) = 1/(σ √(2π)) · exp(−(x − µ)^2 / (2σ^2)),

with σ^2 as the variance and µ as the expected value. The expected value is set to µ = 1/2, at which P should have its maximum. In order for P to be normalized (at x = 1/2 the function should have the value 1), the normal distribution function f_g(x) is multiplied by the term σ √(2π). The resulting function f_g,n is evaluated for all elements of the geometry. The mean value of the results provides the uncertainty:

P = (1/d^2) Σ_{i=1}^{d^2} f_g,n(x_i).

The variance σ^2 determines the width of the distribution function.
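A minimal sketch of the uncertainty evaluator as defined above; the value of σ is an illustrative assumption:

```python
import numpy as np

def uncertainty(x, sigma=0.25):
    """Uncertainty evaluator sketch: a Gaussian bump normalized to 1 at
    x = 0.5 penalizes intermediate densities; near the limits 0 and 1
    the penalty is small."""
    fgn = np.exp(-(x - 0.5) ** 2 / (2.0 * sigma ** 2))
    return fgn.mean()

print(uncertainty(np.array([0.0, 1.0])))   # small: densities at the limits
print(uncertainty(np.array([0.5, 0.5])))   # 1.0: maximal uncertainty
```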

Quality function and objective function
The task of the quality function is to combine all evaluator losses into one scalar. The following additional requirements must be considered:
• The function should have a simple mathematical form, in order not to complicate the minimum search.
• The function must be monotonically increasing with respect to the evaluators' losses.
• The function contains coefficients to control the relative influence of the evaluators' losses.
The most obvious variant fulfilling these criteria is a linear combination of the losses. The problem with this choice consists in the different and variable order of magnitude of the compliance loss with respect to the other losses. For a given choice of the coefficients, the relative influence of the losses changes for different parametrizations and input data sets. To avoid this drawback, a quality function in the following form was chosen. The addition of the constant value prevents the quality function from being dominated by one loss when its value is close to zero.
For every single data set, one value of f_Q exists. Optimization on the basis of single data sets would require large computational effort and lead to instabilities of the training process (large jumps of the objective function output). Therefore, a given number b_n of data sets (batch) is used and the corresponding quality function values are combined into one scalar value, which works as objective function for the optimization that rules the training. The value of the objective function J is calculated as the arithmetic mean of the quality function values obtained for the single data sets of the batch. Investigations showed that averaging the quality function outputs over numerous data sets stabilizes the training procedure. The disadvantage of this averaging is the possibility of forming prejudices: e.g. if one element is frequently present, then its frequency is also learned, even if the element's contribution to stiffness is in some cases small or non-existent.
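How the batch-wise objective function could be assembled is sketched below; the multiplicative combination with stabilizing constants is an illustrative assumption (the paper's exact quality function is given in its equation (30)), and the loss values are made up:

```python
import numpy as np

# hypothetical per-evaluator losses for a batch of b_n = 4 data sets
losses = {
    "compliance":  np.array([12.0, 8.5, 15.2, 9.9]),
    "filling_dev": np.array([0.02, 0.05, 0.01, 0.03]),
    "filter":      np.array([0.10, 0.00, 0.20, 0.05]),
    "uncertainty": np.array([0.30, 0.25, 0.40, 0.20]),
}

# illustrative combination: constants keep any single near-zero loss
# from dominating the product
f_Q = losses["compliance"] * (1.0 + losses["filling_dev"]) \
      * (1.0 + losses["filter"]) * (1.0 + losses["uncertainty"])

# objective function: arithmetic mean of f_Q over the batch
J = f_Q.mean()
print(round(J, 3))
```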

Training
The overview in Figure 9 describes the training process for a single level. During a batch iteration, the input data sets are randomly generated and passed to the predictor as well as to the evaluators; the predictor creates the corresponding geometries x_i. Afterwards, the quality function is computed from the evaluator losses according to (30). The objective function J is then calculated for the whole batch.
Then the gradient of the objective function with respect to the trainable parameters is calculated. The trainable parameters of the predictor for the next batch are then adjusted according to the steepest-descent criterion to decrease the value of the objective function.
When the level increases, the predictor outputs a geometry with higher resolution and the process starts again at batch b = 1.
It is important to stress that, unlike conventional topology optimization, the PEN method does not optimize the density values of the geometry, but only the weights of the predictor.

Implementation
The presented method is implemented in the programming language Python. The framework TensorFlow with the Keras programming interface (API) is used, which is well suited for programming ML algorithms in Python. TensorFlow is developed by Google and is an open-source platform for the development of machine-learning applications [23]. In TensorFlow, the gradients necessary for the predictor learning are calculated using Automatic Differentiation (AD), which requires the use of functions available in TensorFlow [27]. The configuration of the software and hardware used for the training is shown in Table 1. The predictor's topology, with all layers and all hyperparameters, is shown in Figure 10. The chosen hyperparameters were found to be the best after numerous tests in which the deviations of the predictions from the ones obtained by conventional TO were evaluated. The hyperparameters are displayed by the shape (numerical expression over the arrow pointing out of the block) of the output matrix of a block or by the comment near the convolutional block. The label of the output arrow describes the dimensions of the output vector or matrix. The names of the elements in Figure 10, e.g. "Conv2D", correspond to the Keras layer names.
The input data sets (top left) are processed by four fully connected (here termed "dense") layers, then reshaped into a three-dimensional matrix with the shape 8 × 8 × 64 and passed on to two sequential convolutional layers. Subsequently, the data gets reshaped into a vector and passed through a sigmoid activation layer. As a result, the geometry at the first level is available. The following levels build on the previous levels: the data from the last hidden block and the data prior to the sigmoid activation of the previous level are used for the next level, by transforming the outputs to the same shape and adding them together. Afterwards, the data gets reshaped into a vector and, again, passed through a sigmoid activation layer. As a result, the geometry at the next level is available. As already mentioned, the training of the predictor is based on randomly generated input data sets. All randomly chosen input data are uniformly distributed in the corresponding interval. They are generated according to the following features:
• Kinematic boundary conditions r_k: fixed degrees of freedom along the left side in x and y direction.
• Static boundary condition r_s: position randomly chosen from all (d_inp + 1)^2 = 81 nodes of level one (except the nodes which have a fixed degree of freedom); fixed magnitude r_s,F.
• Target degree of filling M_tar: uniform random M_tar ∈ {0.2, 0.21, . . ., 0.8}.
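The random generation of training inputs described above can be sketched as follows; the variable names and the node encoding are assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng()
d_inp = 8                      # first-level mesh: (d_inp + 1)^2 = 81 nodes

def random_input():
    """Sketch of the random training-input generation described above."""
    # kinematic BCs: clamp all nodes on the left edge in x and y
    Rk = np.zeros((d_inp + 1, d_inp + 1), dtype=bool)
    Rk[:, 0] = True
    # static BC: one force of fixed magnitude at a random free node
    free_nodes = [(i, j) for i in range(d_inp + 1)
                  for j in range(d_inp + 1) if not Rk[i, j]]
    node = free_nodes[rng.integers(len(free_nodes))]
    # target degree of filling from {0.20, 0.21, ..., 0.80}
    M_tar = rng.integers(20, 81) / 100.0
    return Rk, node, M_tar

Rk, node, M_tar = random_input()
print(node, M_tar)
```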
Algorithms 1, 2 and 3 show the training, the update of the trainable parameters and the convergence criterion, respectively.
The flow of data from the input (r k , rs, Mtar) of the ANN to the output x and the objective function J(x) is called forward propagation.
With the computation of the objective function's gradient with respect to the trainable parameters, the information flows backward through the ANN. This backward flow of information is called back-propagation [8]. Once the gradient is calculated, the trainable parameters are updated using the learning rate η, which defines the length of the gradient step, and the Adam optimizer (see algorithm 2) according to [28].
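A minimal TensorFlow sketch of one training step with forward propagation, back-propagation via `tf.GradientTape` and an Adam update; `predictor` and `objective` here are small placeholders, not the authors' PEN components:

```python
import tensorflow as tf

# stand-in for the predictor network (architecture is illustrative)
predictor = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='sigmoid'),
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def objective(geometries):
    # placeholder for the evaluator-based objective function J
    return tf.reduce_mean(tf.square(geometries - 0.5))

inputs = tf.random.uniform((8, 4))              # one batch of random inputs
with tf.GradientTape() as tape:
    geometries = predictor(inputs)              # forward propagation
    J = objective(geometries)                   # evaluators + quality function
grads = tape.gradient(J, predictor.trainable_variables)   # back-propagation
optimizer.apply_gradients(zip(grads, predictor.trainable_variables))
```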
After the trainable parameters of the predictor have been updated, a new batch can be elaborated. This process continues until a convergence criterion is fulfilled. In order to define a proper convergence criterion, the lowest objective function value J_best in the current level is tracked and compared to the current objective function value J_b. If the objective function value J_b of one batch is not lower than J_best, then the integer variable ζ_b (patience) is increased by one, else it is reset to zero. Once the patience exceeds a predefined value ζ_max, termed maximal patience (see table 2), the level Λ increases or, if the maximum level was reached, the training stops (see algorithm 1, line 7). The parameters in table 2 were used for the training.
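The patience-based convergence criterion can be sketched as follows; this is based on the description above, not on the paper's algorithm 3:

```python
def patience_step(J_b, J_best, zeta):
    """Convergence bookkeeping: reset the patience counter on
    improvement, otherwise increment it."""
    if J_b < J_best:
        return J_b, 0          # new best value, patience resets
    return J_best, zeta + 1    # no improvement, patience grows

# illustrative objective-function history for one level
J_best, zeta, zeta_max = float('inf'), 0, 3
for J_b in [5.0, 4.0, 4.2, 4.1, 4.3, 4.4]:
    J_best, zeta = patience_step(J_b, J_best, zeta)
    if zeta > zeta_max:
        print('increase level or stop training')
        break
print(J_best, zeta)
```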

Results
The training of the predictor lasted T_P = 3 h (2:56:32), which can be subdivided according to the individual levels as follows: 21 %, 7 %, 45 %, 26 %. The ANN-based TO geometries are similar to the results obtained by top88 according to [15] for the same input data sets. For the conventional SIMP-TO, the density filter method and the parameters mentioned in table 2 were used. An example prediction is shown in Figure 11. The training history shows the progression of the objective function (see Figure 12) and of the individual evaluator losses over the number of batches (see Figure 13). The smaller batch size at higher levels produces more oscillation of the curves and therefore makes it difficult to identify a trend. For this reason, the curves shown in the figures are filtered using the exponential moving average with a smoothing factor of 0.862 [29]. This filtering does not affect the original objective function and serves only for visual purposes.
The dashed vertical lines (labeled with the value of Λ) in Figures 12 and 13 show the change of level. It can be seen that after each increase of Λ, the value of the objective function increases. This can be explained by the addition of more weights, which are randomly initialized and still untrained.
The results were validated using n = 100 randomly generated input data sets, called validation data, that were not part of the training data sets, together with the corresponding optimized geometries, which were conventionally calculated by the top88 code available in [30].
The results of the comparison (PEN and top88) of the 100 validation data sets are summarized in the plots in Figure 14.
On average, the ANN-based TO delivers almost the same result as the conventional method in about 8.4 ms, while the conventional topology optimizer according to Andreassen [15] requires on average 1.9 s and is hence roughly 225 times slower, see Figure 14 a). It can also be seen that the majority of geometries generated by PEN have a compliance that is close to that of the conventionally generated geometries. With the help of the function κ, the accuracy and thus the validity of the predictor can be determined, and predictors with different hyperparameters can be compared. The function κ is needed because a single indicator is not sufficient to determine the accuracy of the predictor: it is possible, for example, that the accuracy κ_0.01 is low while the errors MAE and MSE are small at the same time. Since these error indicators concentrate on different kinds of differences, their average κ is a more meaningful indicator. The examples in Figure 15 show that the predictor can deliver geometries that are similar to those of the conventional method, but they also reveal some weaknesses. For example, in some cases the geometries are noisy and contain undesirable elements which do not contribute to the stiffness (see Figure 15, columns two and four). This may be improved by an appropriate choice of the predictor's hyperparameters and by adapting the quality function. Also included in this figure is a row of conventionally topology-optimized geometries using different parameters. For all sample geometries in Figure 15, the compliance is reported under the geometry diagram. For the ANN-TO-generated geometries, the evaluator losses are summarized in table 3. From the data in table 4 it can be seen that, in the examined cases, 79.7 % of the elements of the geometries obtained with the PEN method have density differences of less than 1 % compared to the conventionally optimized geometries.
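Under the assumption that κ averages the element-wise accuracy κ_0.01 (the share of elements whose density deviates by less than 1 %) with scores derived from MAE and MSE, the indicators could be computed as sketched below. The exact aggregation used by the authors is not fully specified here, so the conversion of errors to accuracies via (1 − error) is an assumption:

```python
import numpy as np

def indicators(rho_pred, rho_ref, tol=0.01):
    """Compare a predicted density field with a conventionally optimized
    reference field (both arrays of element densities in [0, 1])."""
    diff = np.abs(rho_pred - rho_ref)
    kappa_tol = np.mean(diff < tol)   # share of elements within the tolerance
    mae = np.mean(diff)               # mean absolute error
    mse = np.mean(diff ** 2)          # mean squared error
    # assumption: combine accuracy-like scores by averaging,
    # converting the errors to accuracies via (1 - error)
    kappa = (kappa_tol + (1.0 - mae) + (1.0 - mse)) / 3.0
    return kappa_tol, mae, mse, kappa
```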

Computing time comparison
As mentioned in section 3.2, the PEN method is orders of magnitude faster than top88. However, the predictor profits from a computationally intensive training phase, so it is interesting to attempt a comparison which takes the training time into account.
The PEN computing time for a single geometry, $t_{PEN}$, including its share of the training time, obviously depends on the number of geometries $e_P$ predicted on the basis of one single training process:

$$t_{PEN} = t_P + \frac{T_P}{e_P},$$

where $t_P$ is the computing time per single geometry and $T_P$ is the training time. The Break-Even Point (BEP) is given by the number of predictions $e_{BEP}$ for which both methods require the same time (including the training time contribution). To calculate the BEP, $t_{PEN}$ is set equal to $t_{TO}$, the computing time to optimize a single geometry using the conventional method. It results:

$$e_{BEP} = \frac{T_P}{t_{TO} - t_P}.$$

Table 5 shows the computing times of the different methods as well as the BEP.

t_P      t_TO    e_BEP
8.4 ms   1.9 s   5612

Table 5: Time comparison of conventional and PEN based methods
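As a plausibility check, the reported values can be plugged into the break-even relation directly; the variable names below are illustrative, and the training time is taken as the reported 2:56:32 (small deviations from the tabulated BEP stem from input rounding):

```python
# Reported quantities (see the text above)
t_P = 8.4e-3                    # s, prediction time per geometry (PEN)
t_TO = 1.9                      # s, conventional optimization time per geometry
T_P = 2 * 3600 + 56 * 60 + 32   # s, training time (2:56:32)

# Equating t_PEN(e_P) = t_P + T_P / e_P with t_TO gives the break-even point
e_BEP = T_P / (t_TO - t_P)
print(round(e_BEP))   # on the order of 5.6e3, consistent with the tabulated 5612
```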
When evaluating the results of this comparison, the following points should be considered:
• Due to the fact that $t_P \ll t_{TO}$, the BEP, for a given reference method, essentially depends on the training time.

Online
Due to the predictor's ability to quickly deliver the optimized geometry, the ANN-based TO can be executed online in the browser. Under the address https://www.tu-chemnitz.de/mb/mp/forschung/ai-design/TODL/ it is possible to perform investigations with different degrees of filling as well as different static boundary conditions.

Conclusion
In this paper, a method was presented that makes it possible to realize a topology optimizer using deep learning.
The ANN in charge of generating topology-optimized geometries does not need any pre-optimized data sets for the training. The generated geometries are in most cases very similar to the results of conventional topology optimization according to Sigmund or Andreassen.
This topology optimizer is much faster, due to the fact that the computing-intensive part is shifted into the training. After the training, the Artificial Neural Network based topology optimizer is able to deliver geometries which are nearly identical to the ones generated by conventional topology optimizers. This is achieved by using a new approach, the Predictor-Evaluator-Network (PEN) approach. PEN consists of a trainable predictor, which is in charge of generating geometries, and evaluators, which have the purpose of evaluating the output of the predictor during the training.
The method was tested up to an output resolution of 64 × 64. The optimization of the computational efficiency of the training phase was not the first priority of this project, since the training is performed just once and therefore affects the performance of the method only in a limited fashion. A critical step is the calculation of the displacements in the compliance evaluator. The use of faster algorithms (e.g. sparse solvers) could remove the mentioned limitations. One improving option could consist in implementing the compliance evaluator as an ANN itself and thus making it faster and more memory-efficient. This would make it possible to cope with finer resolutions or to learn with much larger batch sizes, and thus with more training data, in the same time.
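To illustrate the sparse-solver option mentioned above, compliance for a given stiffness system can be evaluated via a sparse solve instead of dense algebra. The tiny system below is a stand-in, not the actual finite-element matrices of the compliance evaluator:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# Stand-in sparse, symmetric positive definite "stiffness" matrix K
K = csr_matrix(np.array([[ 4.0, -1.0,  0.0],
                         [-1.0,  4.0, -1.0],
                         [ 0.0, -1.0,  4.0]]))
f = np.array([1.0, 0.0, 0.0])   # load vector

u = spsolve(K, f)               # displacements from K u = f (sparse solve)
c = f @ u                       # compliance c = f^T u
```

For the real evaluator, K would be assembled from the element stiffness matrices of the current density field, and the sparse factorization avoids the memory and runtime cost of a dense solve at finer resolutions.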
The results of the PEN method are comparable to the ones of the conventional method. However, the PEN method could prove superior in handling applications and optimization problems of higher complexity, such as stress limitations, compliant mechanisms and many more. This expectation is related to the fact that no optimized data are needed. All methods which process pre-optimized data suffer from the difficulties encountered by conventional optimization while managing the above-mentioned problems. Because the PEN method works without optimized data, it can also be applied to problems that have no optimal solutions or solutions that are hard to calculate, like the fully stressed truss optimization.
Up to now, variable kinematic boundary conditions have not been tested. This will be done in future research, together with resolution improvement, application to three-dimensional design domains and consideration of nonlinearities and restrictions.

Figure 1: Graphical representation of a single neuron

Figure 3: Matrix representation of a) kinematic boundary conditions and b) static boundary conditions

Figure 4: Nodes and elements for different levels Λ. The green arrows represent the kinematic boundary conditions. The red arrows represent the static boundary condition.

Figure 7: Left: sample geometry; right: convolution matrix V where the geometry has checkerboard patterns. A first indicator can be computed as the mean value of the convolution matrix: $\bar{v} = \frac{1}{(d-2)^2} \sum_{i=1}^{d-2} \sum_{j=1}^{d-2} V_{ij}$

Figure 8: Influence of factor F k on filter calculation

Figure 11: Sample geometry (left: ANN-based TO using the PEN method; right: 88 lines of code [15])

Figure 14: Computing time and compliance comparison

Figure 15: Additional sample geometries a) Deep Learning Topology Optimization b) validation data

• The training time in turn depends on the convergence condition. Within the framework of this project, an extensive study of the proper choice of the convergence criterion could not be made. The present choice allowed for good results; it can be expected that the training time could be reduced after a targeted study in this sense.
• Of course, the training time also depends on the hardware used for training. By using high-performance hardware, the training time, and thus the BEP, can be strongly reduced without affecting the versatility of the method in everyday use.
• This comparison does not include a study of the effect of the problem size (number of design variables).

Table 3: Summary of evaluator losses (same examples as in Figure 15)

Table 4: Summary of indicators