Modeling Properties with Artificial Neural Networks and Multilinear Least-Squares Regression: Advantages and Drawbacks of the Two Methods

The mean molecular connectivity indices (MMCI) proposed in previous studies are used in conjunction with well-known molecular connectivity indices (MCI) to model eleven properties of organic solvents. The MMCI and MCI descriptors selected by the stepwise multilinear least-squares (MLS) procedure were used to perform artificial neural network (ANN) computations, with the aim of detecting the advantages and limits of the ANN approach. The MLS procedure replicates the obtained results no matter how often it is repeated, a characteristic not shared by the ANN methodology, which on the one hand increases the quality of a description and on the other hand can result in overfitting. The present study also reveals how ANN methods prefer MCI relative to MMCI descriptors. Four types of ANN computations show that: (i) MMCI descriptors are preferred with properties with a small number of points; (ii) MLS is preferred over ANN when the number of ANN weights is similar to the number of regression coefficients; and (iii) in some cases, the MLS modeling quality is similar to the modeling quality of ANN computations. Both the common training set and an external randomly chosen validation set were used throughout the paper.


Introduction
Recently [1], the mean molecular connectivity indices (MMCI) were introduced to model eleven properties of organic solvents. The multilinear least-squares (MLS) procedure used to derive the quantitative structure-property relationships (QSPR) showed that three out of eleven properties, the refractive index (RI), the flash points (FP), and the ultraviolet cutoff values (UV), were modeled with the MMCI, while the remaining properties were modeled with the well-known molecular connectivity indices (MCI). The MMCI indices are also centered on the basic concepts of the delta, valence delta, I- and S-indices that go back to the origins of the molecular connectivity theory [2-7]. Results from two other recent studies that used semiempirical sets of descriptors [8,9] showed that the artificial neural network (ANN) model with a variable number of hidden neurons chosen by the software improves the quality of a QSPR obtained with the aid of the MLS methodology, also known as multilinear regression (MLR). Nevertheless, this improvement is somewhat artificial, as the ANN computations for the eleven properties employed a number of weights that, due to the presence of more than one hidden neuron, was much greater than the number of weights or regression coefficients in the MLS procedure. This can lead to poor results when new data are to be predicted. This is called overfitting, and it can be avoided by guiding the training process with the predictions in a test set, by more general regularization techniques, or by dropout of the hidden neurons.
Appl. Sci. 2018, 8, 1094
A scheme of the work is depicted in Figure 1. Data consisting of eleven physicochemical properties of solvents were randomly split into training (TR) and evaluation (EV) sets. Molecular descriptors were calculated for every molecule, as explained in the Materials and Methods section. MLS computations performed with the training set ended up choosing the best descriptors among the set of given descriptors. These best descriptors were used to perform the Multilayer Perceptron ANN (ANN-MLP) computations. To avoid overfitting, during its training process the ANN randomly selects test sets (TE) within the original TR set. Finally, the models obtained by each method are applied, for external validation, to the evaluation (EV) set. It should be underlined that the evaluation set is common to every type of computation.
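The splitting scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_sets`, the fixed seed, and the use of Python's `random` module are hypothetical and are not the authors' procedure (the paper relies on Statistica 8 for the TE selection). The fractions, about 30% for EV and 20% of the remainder for TE, are the ones quoted elsewhere in the paper.

```python
import random

def split_sets(n_compounds, ev_fraction=0.30, te_fraction=0.20, seed=0):
    """Return sorted index lists (TR, TE, EV) for n_compounds.

    EV is drawn first and is shared by every method; TE is then drawn
    from what remains and is only used by the ANN during training.
    """
    rng = random.Random(seed)          # hypothetical fixed seed for repeatability
    indices = list(range(n_compounds))
    rng.shuffle(indices)

    n_ev = round(ev_fraction * n_compounds)
    ev = sorted(indices[:n_ev])
    rest = indices[n_ev:]

    n_te = round(te_fraction * len(rest))
    te = sorted(rest[:n_te])
    tr = sorted(rest[n_te:])
    return tr, te, ev

tr, te, ev = split_sets(50)
print(len(tr), len(te), len(ev))
```

For the MLS procedure, which has no test compounds, the TR and TE lists would simply be merged into one training set.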
The aim of the present work is to pin down the real advantages and drawbacks of the ANN methodology, and to apply it to the modeling of the eleven properties of [1], where either MCI or MMCI are used as the descriptors. Four different types of ANN computations are performed here to detect the level of achieved improvement, if any: (a) with one hidden neuron, (b) with a pre-fixed number of hidden neurons, (c) with a variable number of hidden neurons chosen by the software, and (d) with a smaller number of descriptors for the one hidden neuron case. This last case attempted to render the number of ANN weights equal to the number of MLS weights. It was also monitored whether ANN computations preferred either MCIs or MMCIs for modeling purposes. The descriptors for the eleven properties are those of [1]; however, whenever a property was not satisfactorily modeled by the given MCI (or MMCI), the second or third best MCI (or MMCI) was chosen. The domain of applicability of the models presented here includes substances that have been used as solvents, without any other chemical restrictions.

Descriptors
Table 2 shows the molecular connectivity χ indices, the molecular pseudoconnectivity ψ indices (pseudo-MCI), and the dual connectivity and pseudoconnectivity indices (dual MCI, pseudo-MCI) used throughout this study. Three new indices were used: $\Delta = \sum_{EA} n_{EA}$, $\Sigma = \sum_{EA} \langle S_{EA} \rangle$, and $T_{\Sigma/M} = \Sigma^3/M^{1.7}$ (M = molar mass); $\Delta$ encodes the number of electronegative atoms ($n_{EA}$), and $\Sigma$ encodes the sum of the S-State index for the electronegative atoms N, O, F, Cl, and Br ($\langle S_{EA} \rangle$ is the average value for a specific atom). Table 3 shows the definitions of the MMCI (the first M stands for "mean"), which are based on averages of vertex invariants. The original Stolarsky mean has a minus sign in the denominator, here replaced by a plus sign to avoid a zero denominator when $\delta_i$ and $\delta_j$ are equal, although it is known that the limit of this function when $\delta_i$ tends to $\delta_j$ is finite. The present mean is thus a kind of pseudo-Stolarsky mean.
n is the number of atoms, ij denotes a σ bond, and µ is the cyclomatic number.
These two tables summarize the pool of descriptors used throughout this study: n is the number of atoms in a molecule, i = 1 to n denotes the atoms of a molecule, ij denotes directly σ-bonded atoms, while p is assigned the value n in Table 3. Replacing δ with the valence delta, $\delta^v$, in Table 2 gives the corresponding valence MCI, $\{D^v, {}^0\chi^v, {}^1\chi^v, \chi_t^v, {}^0\chi_d^v, {}^1\chi_d^v, {}^1\chi_s^v\}$; replacing the Intrinsic I-State with the Electrotopological S-State index gives the corresponding pseudoconnectivity electrotopological indices, $\{{}^S\psi_E, {}^0\psi_E, {}^1\psi_E, {}^T\psi_E, {}^0\psi_{Ed}, {}^1\psi_{Ed}, {}^1\psi_{Es}\}$ [3-9]. This subject is further elucidated in Appendices A and B. Replacing δ in Table 3 with $\delta^v$, I, and S gives three other subsets of MMCI: the valence, the I-State, and the S-State MMCI, respectively (the last subset includes ${}^{Ho}M_E$, ${}^{L}M_E$, and ${}^{St}M_E$). Because some S values can be negative (highly electropositive atoms), the S value is rescaled, as explained in [1], to avoid imaginary S-State MMCI values. Summing up, we have thirty-one MCI and thirty-six MMCI. Every index was obtained with an in-house Visual Basic program that runs on a normal PC and uses both adjacency and distance matrices [6].

Multilinear Least-Squares Regression
The stepwise multilinear least-squares (MLS) procedure of Statistica 8, which searches the whole combinatorial space built by the descriptors, was used to find the best set of indices, either MCI or MMCI, for the training compounds of Table 1. These indices were then used to evaluate the left-out compounds (EV, those with ( • ) in Table 1, ~30% of all compounds, 25% for El). The same best descriptors were also used for the ANN computations. To model the dipole moments, indices were multiplied by a two-valued symmetry factor, φ = 0, 1, i.e., φ·[MCI or MMCI] = 0 or φ·[MCI or MMCI] = [MCI or MMCI], where zero is used for the symmetric molecules with µ = 0. The number of indices in a relationship was chosen bearing in mind that the ratio of data points to the number of variables should be greater than or equal to five, and that the relationship should provide a correlation coefficient r > 0.84, i.e., $r^2$ > 0.70 [10]. External validation was performed for all types of model (ANN included) by adding the set of evaluation points (EV) to check the prediction ability of the overall model. Broadly speaking, a model shows robustness when the 25-30% of cases that form the EV set can be added to complete it without degrading its quality.
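As a minimal sketch of the two acceptance criteria stated above (at least five data points per variable and r > 0.84), the following fits a plain least-squares model with an intercept and checks both conditions. It is not the stepwise search of Statistica 8: the helper names `fit_mls` and `acceptable` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def fit_mls(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficients (intercept last) and the correlation
    coefficient r between observed and calculated values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([X, np.ones(len(y))])   # last column = intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_calc = A @ coef
    r = np.corrcoef(y, y_calc)[0, 1]
    return coef, r

def acceptable(n_points, n_variables, r, min_ratio=5.0, min_r=0.84):
    """Criteria quoted in the text: points/variables >= 5 and r > 0.84."""
    return n_points / n_variables >= min_ratio and r > min_r

# Synthetic data standing in for three descriptors and one property.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=30)

coef, r = fit_mls(X, y)
print(coef.round(2), round(r, 3), acceptable(30, 3, r))
```

A model with three variables therefore needs at least fifteen training compounds before the correlation threshold is even considered.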

Multi-Layer Perceptron-Artificial Neural Networks
ANN methods [11,12] that can perform regression and data validation carry out both tasks in a non-parametric way that makes no assumption regarding the relationship between y and x, where y = f(x). This means that the function Property = f(indices) is not known a priori. In short, a non-parametric model is a kind of black box that tries to discover the mathematical function that can approximate the relationship between the indices and the property well enough. It uses highly flexible transfer functions with adaptable parameters that can model a wide spectrum of functional relationships. The activation functions for both hidden and output nodes used in Statistica 8 are: identity (i), logistic sigmoid (l), hyperbolic tangent (t), sine (s), and exponential (e).
ANN results were obtained with the built-in utility of Statistica 8, the multilayer perceptron neural network (MLP-ANN). This network has a three-layered feedforward architecture with unidirectional full connections between successive layers (Figure 2) and error backpropagation (or backprop). The three layers are:

input units → hidden units → output units

Units are also known as neurons or nodes. In our case the input units correspond to our variables, i.e.,

variables (MCI or MMCI) → hidden units → P

The only output unit here is the targeted property, P. In the present study the number of variables corresponds to the number of MCI or MMCI descriptors. Each neuron, or node, in a layer connects to every neuron in the next layer. The connections between neurons are the weights that determine the values assigned to the nodes. There exist additional weights assigned to the bias values that act as node value offsets; therefore, the resulting number of weights is $(n_{in} + 1)\,n_{hid} + (n_{hid} + 1)\,n_{out}$, where $n_{in}$, $n_{hid}$, and $n_{out}$ are the numbers of input, hidden, and output nodes. The scheme in Figure 2 shows that, if a weight is added to a hidden node, the connections become seven. With five input nodes and seven hidden nodes (a 5-7-1 network) the weights become fifty. The weights adjusted by the training process are initially random and are handed over to all nodes of the following layer. The training process is iterative, and each iteration is called an epoch. Technically, the number of epochs is not a firm parameter, since it can exceed the given number. The weights are slightly varied in each epoch to minimize the sum-of-squares error function, $\mathrm{SOS} = \sum_{i=1}^{N} (P_{i,clc} - P_i)^2$, where $P_{i,clc}$ (clc = calculated) is the ith predicted value (network output) of the property, and $P_i$ is the target value. This function is the sum of the squared differences between the predicted outputs and the targets, taken over the entire training set of N points (compounds).
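The two weight counts quoted in this section (fifty for a 5-7-1 network and seven for the 4-1-1 case) are both reproduced by the usual count for a fully connected perceptron with one bias per hidden and output node, (inputs + 1) × hidden + (hidden + 1) × outputs. The short sketch below checks this and evaluates the SOS function on a toy example; the function names and the toy numbers are illustrative assumptions, not values from the paper's tables.

```python
def n_weights(n_inputs, n_hidden, n_outputs=1):
    """Weights of a fully connected three-layer perceptron, biases included."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def sos(p_calc, p_target):
    """Sum-of-squares error between calculated and target property values."""
    return sum((c - t) ** 2 for c, t in zip(p_calc, p_target))

print(n_weights(5, 7))   # 5-7-1 network -> 50
print(n_weights(4, 1))   # 4-1-1 network -> 7

# Toy values: the squared differences are 0.25, 0.0 and 1.0.
print(sos([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # -> 1.25
```

For comparison, an MLS model with six indices has seven regression parameters (six slopes and one intercept), which is why the 4-1-1 network is the one matched against it.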
Statistica 8 allows setting the number of networks to train and retain (Ntr/Nre). Two sets of values are imposed here: Ntr/Nre = $10^3$/200 and Ntr/Nre = $10^5$/200. In the corresponding tables only Ntr is shown, as Nre is constant. The ANN network of Statistica 8 is optimized with the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm to ensure a fast convergence rate [13,14].
Statistica 8, as a rule, sets by default the number of hidden nodes between 3 and 11. Nevertheless, as already mentioned, we perform four procedures: (i) first a single hidden node is used; then (ii) hidden nodes from two to twelve are sequentially tried 'by hand' (i.e., the program is not allowed to change the imposed number of hidden nodes); and, finally, (iii) the program chooses the number of hidden nodes. To come as close as possible to the MLS results, it was decided (iv) to compute again the one hidden neuron case after deleting either one or two of the indices with the lowest sensitivity value. In this case, for instance, the number of weights for the 4-1-1 case of $T_b$ is 7, and it equals the number of regression coefficients from the MLS calculations with six indices. Data required no normalization by the user, since the program performs this automatically.
Since the MLS procedure optimizes a number of regression parameters equal to the number of variables plus one (the bias parameter), a practical comparison between the two methods should only be performed when ANN uses no hidden neurons. In this case, the number of ANN weights equals the number of MLS parameters. One should expect that, with a growing number of hidden neurons, the model of a property should constantly improve due to the growing number of weights for each variable (akin to having a variable with many different weights). With ANN it is usually the case that the model becomes exceedingly good with a growing number of weights, and this frequently results in overfitting, with exceedingly poor prediction for the external values. The choice of training (TR = 80% of the values in Table 1, excluding the externally validated compounds) and test sets (TE = 20% of the values, the bold values in this table) usually prevents overfitting. In fact, the network is repeatedly trained for a number of cycles so long as the test error is decreasing; otherwise the training is halted. This method, known as the 'early stopping' procedure [12], avoids the trap that the program will always choose the maximum number of hidden nodes. Each property shows an optimal number of nodes, which rarely corresponds to the maximum number.
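The early stopping idea can be illustrated with a small, self-contained network. The sketch below is not the Statistica 8 implementation: it assumes plain gradient descent instead of BFGS, a tanh hidden layer with an identity output, and a hypothetical `patience` parameter that tolerates a few non-improving epochs before halting. All function names and the synthetic data are likewise assumptions.

```python
import numpy as np

def forward(X, W1, W2):
    """One tanh hidden layer, identity output; last row of W1/W2 is the bias."""
    Xb = np.column_stack([X, np.ones(len(X))])
    H = np.tanh(Xb @ W1)
    Hb = np.column_stack([H, np.ones(len(H))])
    return Xb, H, Hb, (Hb @ W2).ravel()

def train_early_stopping(X_tr, y_tr, X_te, y_te, n_hidden=3, lr=0.05,
                         max_epochs=2000, patience=20, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial weights, as in any MLP training run.
    W1 = rng.normal(scale=0.5, size=(X_tr.shape[1] + 1, n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, 1))
    best_sos, best_epoch, best_weights = np.inf, 0, (W1.copy(), W2.copy())
    history = []
    for epoch in range(max_epochs):
        # Gradient-descent step on the mean squared training error (SOS / 2N).
        Xb, H, Hb, out = forward(X_tr, W1, W2)
        err = (out - y_tr)[:, None]
        grad_W2 = Hb.T @ err / len(y_tr)
        grad_H = (err @ W2[:-1].T) * (1.0 - H ** 2)
        grad_W1 = Xb.T @ grad_H / len(y_tr)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
        # Monitor the SOS on the test set, which is never used for the update.
        te_sos = float(np.sum((forward(X_te, W1, W2)[3] - y_te) ** 2))
        history.append(te_sos)
        if te_sos < best_sos:
            best_sos, best_epoch = te_sos, epoch
            best_weights = (W1.copy(), W2.copy())
        elif epoch - best_epoch >= patience:
            break  # test error stopped decreasing: halt training
    return best_sos, best_epoch, best_weights, history

# Synthetic data with an 80/20 training/test split.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
best_sos, best_epoch, best_weights, history = train_early_stopping(
    X[:32], y[:32], X[32:], y[32:])
print(f"ran {len(history)} epochs; best test SOS {best_sos:.3f} at epoch {best_epoch}")
```

The weights returned are those of the epoch with the lowest test error, not those of the last epoch, which is the point of the procedure.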

Results
The results of the five procedures, one MLS and four ANN, are shown in Tables 4-9. Table 4 collects the MLS results for the eleven properties. In this table, the errors of the regression coefficients are given in parentheses in vector form (± signs have been omitted; 2nd line of each cell, 2nd column).
The training set for the elutropic value (El) includes pentane and tetrahydrofuran.
Tables 5-8 collect the different ANN-MLP results for the set of variables (descriptors, either MMCI or MCI) of Table 4. In these tables, the first column gives the $\delta^v$ type (see Appendix A) and the number of networks to train, Ntr = $10^3$ or $10^5$ (when both numbers gave rise to similar results, Ntr = $10^3$ was preferred), while the number of networks to retain is always 200. The activation functions together with the neuronal architecture are in the second column of Tables 5-8; the third line of this column shows the number of epochs for which the ANN-MLP calculation ran for each property. The third column gives the set of variables together with their statistics; its second line shows the sensitivities, which come from the sensitivity analysis that quantifies the importance of the input variables of the models. The $r^2$ and s statistics were obtained with an EXCEL spreadsheet by plotting the observed property, P, vs. the calculated one, $P_{clc}$, once for the training and test compounds, N(aTR + bTE), and a second time for the training + test + evaluated compounds, N(+cEV), where a, b, and c are the numbers of points (i.e., compounds). We remind the reader that the MLS procedure has no test compounds, only training compounds, N(TR). No ANN weights are shown, because of their large number and because every time an ANN-MLP runs, different weights and sensitivity values are obtained.
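The observed-versus-calculated statistics can be reproduced outside a spreadsheet. The sketch below assumes that $r^2$ is the squared Pearson correlation between P and $P_{clc}$ and that s is the standard error of estimate of the straight line P vs. $P_{clc}$ with N − 2 degrees of freedom; the paper does not spell out the definition of s, so this choice, the function name `r2_and_s`, and the sample numbers are assumptions.

```python
import numpy as np

def r2_and_s(p_obs, p_calc):
    """Squared correlation and standard error of estimate for P vs. P_clc."""
    p_obs = np.asarray(p_obs, dtype=float)
    p_calc = np.asarray(p_calc, dtype=float)
    r = np.corrcoef(p_obs, p_calc)[0, 1]
    # Straight line through the observed-vs-calculated plot.
    slope, intercept = np.polyfit(p_calc, p_obs, 1)
    residuals = p_obs - (slope * p_calc + intercept)
    s = np.sqrt(np.sum(residuals ** 2) / (len(p_obs) - 2))
    return r ** 2, s

# Hypothetical observed and calculated values for five compounds.
p_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
p_calc = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, s = r2_and_s(p_obs, p_calc)
print(round(r2, 4), round(s, 4))
```

Running the function once on the TR + TE compounds and once with the EV compounds appended mirrors the two rows of statistics reported per property.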

Plots
Figures 3-5 display the normal and residual plots of those properties that give rise to the best models and that also show optimal statistics for the evaluated points (given in the captions). All these plots follow the statistics shown in Table 9, 3rd column, 2nd line. The structure and importance of this type of plot were discussed in [16,17].


Discussion
For ease of discussion and interpretation, the most important and detailed statistical results collected in Tables 4-7 are summarized in Table 9. Table 8 shows a special case that will be discussed later on. While Tables 4-7 collect the detailed information about the modeling of the eleven properties, and especially about the type of indices, valence deltas, and structure of the ANN computations, Table 9 gives direct information about the different models.
Looking at the MMCI indices (letter M), in MLS they are optimal for three properties: refractive index RI, flash points FP, and cutoff UV.
In ANN computations with one hidden neuron (ANN 1HN, Table 5), these are instead important descriptors for cutoff UV, flash points FP, and elutropic values El. It seems that properties with fewer training points are better modeled by MMCIs. Concerning the statistical results for the training compounds, ANN 1HN (Table 9, 1st line) improves over MLS for the $T_b$ and El properties, while it lags behind for $-\chi\cdot10^6$; otherwise results are rather similar. With the whole set of compounds (Table 9, 2nd line), i.e., with training (and test, with ANN) plus evaluated compounds, ANN 1HN calculations improve again over MLS for $T_b$ and El, while they stay behind with ε, γ, UV, and $-\chi\cdot10^6$.
As soon as the number of hidden neurons grows, either by external choice, enHN (Table 6), or by software choice, snHN (Table 7), MMCIs are optimal descriptors only for the elutropic values (silica) El, which is the property with the lowest number of points.
The multiple hidden neuron case shows that, at the training level, ANN enHN (Table 9) improves consistently over the two previous cases (MLS and ANN 1HN) for $T_b$, ε, d, RI, γ, FP, µ, and UV. For $-\chi\cdot10^6$, ANN with several hidden neurons improves with respect to ANN 1HN, and for El there is an improvement only in relation to MLS (Table 4). Results for viscosity, η, are rather similar throughout the three cases. Mostly, the improvement concerns both the $r^2$ and the s statistics. Concerning the whole set of compounds (TR, TE, and EV), statistics improve in relation to the two previous cases (MLS and ANN 1HN) for $T_b$, ε, γ, FP, µ, and UV.
The advantage of the ANN over the MLS procedure is, in general, not striking for the eleven properties. In fact, with the sole exception of the training plus test set for the $-\chi\cdot10^6$ property, it does not achieve any useful improvement.
Normally, for an optimal modeling, the number of hidden neurons that are externally chosen (ANN enHN, Table 9) is smaller than the number of hidden neurons chosen by the software (ANN snHN, Table 9). In some cases it is much smaller, as for $T_b$ (an extreme case), d, and γ. Furthermore, ANN snHN statistics are either worse than or similar to the ANN enHN ones. This means that if you intend to let the software choose the number of hidden neurons, then it is better to stick to the MLS modeling. Could that depend on the ANN initial weights considered? Probably, even though it seems to be a general trend, i.e., it shows up with nearly all properties.
The MLS results compare rather well with the ANN 1HN results, even if the ANN computations have a number of weights bigger (by two) than the number of regression coefficients of the corresponding MLS computations. Thus, we decided to perform ANN calculations by deleting the two indices with the lowest sensitivity values in Table 5. In those cases where deletion of two indices gives rise to poor modeling, we deleted only one index. In this last case, the number of weights is no longer equal (actually it is bigger by one) to the number of regression coefficients or weights of the MLS case. Results are shown in Table 8, and, as the reader can notice, four properties, γ, UV, $-\chi\cdot10^6$, and El, do not show up due to poor modeling, while for properties d, FP, and µ only one index was deleted. We also notice that the dipole moment, µ, does not obey the lowest sensitivity rule (see Table 5), as following this rule we should have deleted the $T_{\Sigma/M}$ index. However, deletion of this index gives rise to poor modeling for the dipole moment. This confirms that sensitivity values change from run to run, like the weights, and that they are not a guide to the absolute importance of an index, but only to its importance within a given model. The statistics here are usually not as good as in the MLS case (Table 4), with a clear and striking exception: the modeling of the whole set of compounds for the dielectric constant, ε. Looking only at the training plus test modeling, we would simply have discarded this model. Nevertheless, the very good modeling of the evaluated compounds helps to improve the overall model for this property. Thus, (i) before throwing away some training plus test ANN or MLS results, re-evaluate them, and (ii) do not forget that a very good ANN model may be hiding somewhere.
All this comes back to the random assignment of the initial weights in ANN computations, which makes it difficult to reproduce values that seem to show up by chance. Tables 5-8 also tell us that there is no fixed preferential value for the parameter Ntr (number of networks to train). Usually, different Ntr values give rise to rather similar statistical parameters.
Generally, the addition of the EV set does not greatly affect the overall quality of the models, showing their robustness in most cases. The differences in $r^2$ are not greater than 0.5, as a rule. Some exceptions are the MLS models for FP and µ, and the ANN ones for ε, γ, and µ.
Concerning the most used values for $\delta^v$, Tables 4-7 show that the $\delta^v$ ppo configuration is preferred, especially throughout the nHN cases (Tables 6 and 7). This choice means a strong dependence on the core electrons for higher row atoms (see Appendix A). Regarding the exponent of the fractional term in $\delta^v$ (see Appendix A), the most used values are 1 and −0.5 (i.e., strong hydrogen atom dependence) and 50 (i.e., no hydrogen atom dependence). The strong hydrogen dependence of $\delta^v$ tells us that the hydrogen atoms should not be neglected.
Plots of Figures 3-6 exemplify the best models obtained from the given properties. These four properties, $T_b$, d, RI, and η (Vis), show the best statistics for the set of points evaluated. The residual plots, nevertheless, remind us that the models achieved could be further improved, since the evaluated points are not placed symmetrically around the zero line, as required in a perfect model. In the graphs of Figures 4 and 5 a point appears away from the remaining points, which could anchor the regression line. However, the corresponding residual plots show that this is not the case, since their residuals are not insignificant. This is due to the large number of values concentrated in the cloud of the remaining points.

Conclusions
The first interesting result of the present ANN-MLP computations is that MCIs are preferred over MMCIs, especially for properties with a relatively high number of points. In fact, only El, which has the smallest number of points, is usefully described with MMCI when ANN-MLP with more than one hidden neuron is performed.
The second result suggests that, for the properties given, it is better to fix the number of hidden neurons externally than to let the software choose it.
The third result shows that, with some exceptions, ANN-MLP improves on MLS calculations, even if the improvement is not dramatic.
One of the great advantages of MLS computation is that its statistical results are reproducible: no matter how many times the calculations are repeated with the same indices, the same results are obtained. The ANN-MLP results can instead seem non-reproducible, since the weights of the ANN-MLP calculations start with random values and the minimization procedure usually ends up with different values from run to run. Furthermore, as a rule, different ANN-MLP computations end up in different local minima. However, it must be pointed out that if the training process is repeated with the same procedure (the same seed, randomization algorithm, and precision) and the same data sets, the resulting model is the same.
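The role of the seed can be illustrated with a small sketch. This is not the software used in this work: it is a hand-written NumPy network with a single tanh hidden neuron trained by plain gradient descent, and the function name `train_mlp`, the learning rate, the number of epochs, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def train_mlp(X, y, seed, n_hidden=1, epochs=500, lr=0.05):
    """Train a one-hidden-layer tanh network by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)              # the only source of randomness
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # hidden activations, shape (n, n_hidden)
        err = h @ W2 + b2 - y                      # prediction error, shape (n,)
        grad_W2 = 2.0 * h.T @ err / n
        grad_b2 = 2.0 * err.mean()
        dz = 2.0 * np.outer(err, W2) * (1.0 - h ** 2) / n   # gradient at hidden pre-activation
        W1 -= lr * (X.T @ dz)
        b1 -= lr * dz.sum(axis=0)
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return np.concatenate([W1.ravel(), b1, W2, [b2]])

# Hypothetical data: two descriptors, one property
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(30, 2))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1]

w_a = train_mlp(X, y, seed=1)
w_b = train_mlp(X, y, seed=1)   # same seed, same procedure, same data
w_c = train_mlp(X, y, seed=2)   # different seed

print("same seed, identical weights:", np.array_equal(w_a, w_b))
print("different seed, identical weights:", np.array_equal(w_a, w_c))
```

With two inputs and one hidden neuron this network has five adjustable weights, against three coefficients for the corresponding MLS regression with an intercept, which gives a sense of how quickly the number of weights grows once more hidden neurons are added.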
ANN-MLP results obtained with one hidden neuron, either with the full set of descriptors (Table 5) or with a reduced set of descriptors (Table 8), confirm the validity of the MLS calculations. The asymmetry of the evaluated points around the zero line of the residual plots reminds us that things might be further improved, either with other types of ANN-MLP calculations or with new types of descriptors.
These results indicate that MLS models should be preferred, except when it is necessary to reach a given quality in the predictions that is only achievable with ANN-MLP models.
The present study also tells us that it is worth considering the hydrogen atoms when performing the calculations to derive the MCIs or the MMCIs, as in many cases they help to improve the quality of a model both in the MLS and ANN-MLP computations.

Figure 1. Flow chart of the methodology followed throughout the present work.

Figure 2. An ANN scheme with an input node (in), a bias node (b), a hidden node (hn), and an output node (on).

Table 2. Definition of the Molecular Connectivity Indices (MCI). Replacing δ with δ v and I with S, the corresponding valence, χ v, I-State, ψ I, and E-State, ψ E, MCIs are obtained.

Table 3. Definition of the Mean Molecular Connectivity Indices (MMCI). Replacing δ with δ v, I, and S, the respective valence (M v), I-State (M I), and E-State (M E) MMCIs are obtained.

Table 4. Best set of descriptors for the properties of Table 1 with the multilinear least-squares (MLS) methodology. 1st column: δ v type for the valence-dependent indices. 2nd column: set of descriptors and their statistical quality.

Table 5. ANN results with descriptors of Table 4 with one hidden neuron. 1st column: the δ v type and the Ntr value; 2nd column: ANN-MLP architecture, abbreviations for the activation functions for the internal layers, the number of epochs, and training and test errors; 3rd column: input indices, sensitivity values, and statistical parameters for the training plus test sets, a[N(aTR + bTE)], and with the evaluation set added, [N(+cEV)].