Net-Net AutoML Selection of Artificial Neural Network Topology for Brain Connectome Prediction

Abstract: Brain Connectome Networks (BCNs) are defined by brain cortex regions (nodes) interacting with others through electrophysiological co-activation (edges). The experimental prediction of new interactions in BCNs is a difficult task due to the large number of edges and the complex connectivity patterns. Fortunately, we can use another special type of network to achieve this goal—Artificial Neural Networks (ANNs). Thus, ANNs can use node descriptors such as Shannon entropies (Sh) to predict node connectivity for large datasets, including complex systems such as BCNs. However, training a high number of ANNs for BCNs is a time-consuming task. In this work, we propose a method to automatically determine which ANN topology is more efficient for BCN prediction. Since a network (ANN) is used to predict the connectivity in another network (BCN), this method was entitled Net-Net AutoML. The algorithm uses Sh descriptors for pairs of nodes in BCNs and for the ANN predictors of BCNs. Therefore, it is able to predict the efficiency of new ANN topologies for predicting BCNs. The current study used a set of 500,470 examples from 10 different ANNs used to predict node connectivity in BCNs, described by 20 features. After testing five Machine Learning classifiers, the best classification model to predict the ability of an ANN to evaluate node interactions in BCNs was provided by Random Forest (mean test AUROC of 0.9991 ± 0.0001, 10-fold cross-validation). Net-Net AutoML algorithms based on entropy descriptors may become a useful tool in the design of automatic expert systems to select ANN topologies for complex biological systems. The scripts and dataset for this project are available in an open GitHub repository.


Introduction
Any system may be represented as a complex network of nodes connected by edges, which can be any property or node interaction (L_ij = connection between nodes i and j) [1-7].

Figure 1 shows the workflow of the Net-Net AutoML method applied to BCNs. Our goal was to find a tool able to predict BCN node connectivity (communication between brain regions). This task can be solved by training an ANN classifier on a dataset of previously known node connections (output) together with any type of node descriptors (inputs). This solution involves intensive training of ANNs. Net-Net AutoML proposes a shortcut: the creation of another classifier, with any ML method, that can evaluate whether an ANN is able to accurately predict the node connectivity in BCNs, without any further ANN training. In order to obtain this classifier, one must first train different ANNs to predict BCN connectivity. The Net-Net AutoML classifier thus captures the relationship between the ANN topologies trained for BCN prediction and the ability of these ANNs to provide good predictions. To build ANNs for BCN prediction and Net-Net AutoML classifiers for the best ANN, both types of networks (BCNs and ANNs) must provide numerical descriptors. Therefore, for each BCN pair of nodes (brain regions), we knew the experimental connectivity (connected or not) and could calculate node descriptors such as Sh_k. This dataset was used to train different ANN topologies for BCN prediction. To include information about the ANNs in the Net-Net AutoML classifier, we added ANN descriptors such as the Sh_k of the entire ANN. Thus, we combined two types of descriptors in order to mix information about the BCN connectivity and the ANN topology: the Sh_k of BCN nodes (A_i-B_j pairs) with the Sh_k of an ANN (Sh_k(ANN)).
We took into account the Shannon entropy of both types of networks, but quantified at different levels: for individual nodes in the BCN and for the entire network in the ANN. The output of the best Net-Net AutoML classifier is the ability of an ANN, described by Sh_k(ANN), to predict the connectivity between BCN nodes A_i and B_j.

ANN Datasets and General Workflow
We calculated the entropy values Sh_k(A_i) and Sh_k(B_j) for a large number of pairs of nodes in the BCN of the CoCoMac experiment [47]. The values Sh_k(A_i) describe the activated brain regions and the values Sh_k(B_j) the co-activated brain regions. With these values we created a large dataset with output values L_ij = 1 (BCN links) or L_ij = 0 (absence of region co-activation), and input values Sh_k(A_i) and Sh_k(B_j). The dataset was used to train 10 different ANNs with the STATISTICA software [48]. This work is a proof of concept, so at this point we applied the default behavior of STATISTICA, which uses an automatic algorithm to find the best ANN by means of feature selection and topology modification. Future studies should include more ANNs. In the second step, we created the final dataset used to find a Net-Net AutoML classifier: we used the outputs of the 10 previously trained ANNs to predict BCN connections for 52,690 pairs of BCN nodes (500,740 examples after pre-processing).
Thus, the final dataset had as output the ability of a specific ANN to predict the type of connection between BCN nodes, and as inputs the different entropies of the BCN nodes and of the entire ANNs. In the next step, linear and nonlinear ML methods were tested to find the best Net-Net AutoML classifier for evaluating ANN topologies for BCN prediction.
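As a sketch of this assembly step, the meta-dataset can be built by crossing every trained ANN with every BCN node pair. The function name, argument layout, and the 10 + 10 split of the 20 features are our illustrative assumptions, not the paper's exact data layout:

```python
import numpy as np

def build_net_net_dataset(pair_entropies, link_labels, ann_predictions, ann_entropies):
    """Assemble the Net-Net AutoML dataset (illustrative sketch).

    pair_entropies  : (n_pairs, 10) array of Sh_1..Sh_5(A_i) and Sh_1..Sh_5(B_j).
    link_labels     : (n_pairs,) observed L_ij in {0, 1}.
    ann_predictions : (n_anns, n_pairs) L_ij predicted by each trained ANN.
    ann_entropies   : (n_anns, 10) Sh_k descriptors of each ANN topology (assumed width).

    Returns X (20 features per row: pair descriptors + ANN descriptors) and
    y (1 if that ANN predicted that pair correctly, else 0).
    """
    X_rows, y_rows = [], []
    for q in range(ann_predictions.shape[0]):          # one block per ANN topology
        for p in range(pair_entropies.shape[0]):       # one row per BCN node pair
            X_rows.append(np.concatenate([pair_entropies[p], ann_entropies[q]]))
            y_rows.append(int(ann_predictions[q, p] == link_labels[p]))
    return np.array(X_rows), np.array(y_rows)
```

With 10 ANNs and 52,690 pairs this yields 526,900 candidate rows, consistent with the reported ~500,740 examples remaining after pre-processing.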

Markov-Shannon Entropy Centralities for Nodes
In the present work, we used the MI-NODES application [49]. In the first step, the connectivity matrix L of the BCN was obtained from the CoCoMac experiment [47]: an n × n matrix over the n vertices/nodes, with values of 1 for connected nodes and 0 for unconnected ones. In the next step, a Markov matrix Π was derived from L by calculating the vertex transition probabilities p_ij. The powers Π^k (k = 1-5) of the probability matrix were multiplied by a vector containing the initial probabilities (^0p_j). The results are vectors with the absolute probabilities (^kp_j) of reaching each node by walking through k nodes. These vectors, one for each value of k, were used to calculate the entropy centralities Sh_k (see Equations (1) and (2)). For more mathematical details on combining Markov chain theory and Shannon entropy, see Reference [49] for the MI-NODES desktop application and Reference [46] for the previous Net-Net AutoML application to Biological Ecosystem Networks. G represents any graph/network with j nodes. Thus, the input descriptors for BCN nodes or entire networks such as ANNs are modified Shannon entropies that encode, through probabilities, information about walks through k nodes.
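A minimal sketch of this calculation for a whole graph, assuming a uniform initial probability vector and row-normalization of L into Π (the exact MI-NODES implementation may differ in these choices):

```python
import numpy as np

def markov_shannon_entropies(L, k_max=5):
    """Markov-Shannon entropy descriptors Sh_k of a graph (sketch).

    L     : (n, n) binary adjacency matrix (1 = connected, 0 = not).
    k_max : largest walk length k to consider.
    Returns [Sh_1, ..., Sh_kmax]; node-level variants would restrict the sum.
    """
    n = L.shape[0]
    deg = L.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # avoid division by zero for isolated nodes
    Pi = L / deg                              # Markov (row-stochastic) matrix of p_ij
    p = np.full(n, 1.0 / n)                   # initial probabilities ^0p_j (uniform assumption)
    entropies = []
    for k in range(1, k_max + 1):
        p = p @ Pi                            # absolute probabilities ^kp_j after k steps
        q = p[p > 0]
        entropies.append(float(-(q * np.log(q)).sum()))  # Shannon entropy Sh_k
    return entropies
```

For a symmetric regular graph such as a triangle, the probability vector stays uniform and every Sh_k equals ln(3).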

Net-Net AutoML Models
Once the values of the Markov-Shannon entropies were obtained for the BCN nodes and the ANNs, twelve Machine Learning classifiers from scikit-learn (Python) were tested to find the best Net-Net AutoML classifier able to determine the optimal ANN topology to predict BCNs (co-activations between brain regions):
• KNeighborsClassifier = KNN, k-nearest neighbors: a nonparametric classifier that assigns an unclassified sample to the same class as the nearest of k samples in the training set [50].
• LinearDiscriminantAnalysis = LDA, linear discriminant analysis [51]: a statistical supervised method that projects the input data to a lower dimension in order to maximize the scatter between classes versus the scatter within each class.
• GaussianNB = GNB, Gaussian Naive Bayes: a simple "probabilistic classifier" [52].
• SVC(kernel = 'rbf') = SVM_RBF, support-vector machine with a nonlinear radial basis function kernel [53].
• LogisticRegression = LogR, logistic regression [54]: a linear model that estimates the probability of a binary response from different factors.
• MLPClassifier = MLP, multilayer perceptron (artificial neural network) with 20 neurons in a hidden layer [55].
• DecisionTreeClassifier = DT, Decision Tree: a set of decision rules inferred from the features as a tree of rules (the paths from root to leaf represent classification rules) [56].
• RandomForestClassifier = RF, Random Forest [57]: aggregates several decision trees (parallel trees); each tree is generated from a bootstrap sample randomly drawn from the original dataset.
• XGBClassifier = XGB, an optimized distributed gradient boosting library based on serial trees [58].
• GradientBoostingClassifier = GB, gradient boosting [59].
• AdaBoostClassifier = Ada, a meta-estimator that first fits a classifier on the original dataset and then fits additional copies of the classifier with the weights of incorrectly classified instances adjusted [60].
• BaggingClassifier = Bagging, similar to Ada but the additional classifiers are fit on random subsets of the original dataset [61].
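The comparison above can be sketched with scikit-learn as follows. Synthetic data stands in for the real Sh_k feature matrix; SVC, MLP, and XGBoost are left out of this sketch for run time and because XGBoost is an external package:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 20-feature Sh_k dataset.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "GNB": GaussianNB(),
    "LogR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=10, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "Ada": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}

# Mean AUROC per classifier over a stratified 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {name: cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
           for name, clf in classifiers.items()}
```

The `results` dictionary then plays the role of Table 2: one mean AUROC per method on the same folds.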
LDA provides the simplest model equation. Let ^qS(L_ij) be the output variable of a Net-Net AutoML classifier, used to score the ability of a given ANN q to correctly predict the link, or brain-region co-activation, L_ij between two nodes/brain regions A_i-B_j (L_ij = 1). Equation (3) describes the general formula of the LDA model, where each Sh_k codifies information about nodes placed at a topological distance of at least k from the reference node, and Sh_k(ANN) describes the ANN topology. In order to avoid overfitting, 10-fold cross-validation was performed. The performance of the models was measured using the Area Under the Receiver Operating Characteristic curve (AUROC) [62]. The scripts and dataset are available in an open GitHub repository from one of the authors, Cristian R. Munteanu [63].
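Given the inputs described above, Equation (3) presumably takes the linear form below. This is a reconstruction from the text; the coefficient symbols e_0, a_k, b_k, c_k are illustrative and the fitted values are not reproduced here:

```latex
{}^{q}S(L_{ij}) = e_0
  + \sum_{k=1}^{5} a_k \, Sh_k(A_i)
  + \sum_{k=1}^{5} b_k \, Sh_k(B_j)
  + \sum_{k=1}^{5} c_k \, Sh_k(\mathrm{ANN})
```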

Results and Discussion
This study used the CoCoMac BCN dataset to build a computational tool able to select the best ANN topologies for BCN connectivity prediction. A recent study by Van Essen et al. [64] reviewed The Human Connectome Project, an ambitious five-year enterprise devoted to characterizing brain connectivity and function in healthy adult human beings. Lang [65] discussed the need for computational methods to disentangle the relationships between anatomical and functional connections. However, we have to deal with the estimation of data reliability and the presence of contradictory reports when developing new BCN models. Consequently, there is a need for computational tools for data mining over integrated large sets of partially redundant or inconsistent data in brain maps [66]. Moreover, data on BCNs often have to be systematically re-evaluated (collated) [67].
ANNs can discriminate the correct connectivity of nodes (n_j) in complex systems (L_ij) from incorrect and/or randomly distributed links. In order to select the best ANN topology for BCN node-connectivity prediction, we propose the Net-Net AutoML method. This tool is a classifier based on the Shannon entropies (Sh_k) of BCN pairs of nodes and of the ANN topology. In the first step, the BCN was turned into numerical input parameters (Sh_k values for all pairs of nodes) to feed alternative ANN classifiers. Next, we trained different ANNs to predict the BCN connectivity. In the final step, we joined all the data (Sh_k values of BCNs and ANNs) of the selected pre-trained cases and searched for the best AutoML classifier using twelve Machine Learning methods. Table 1 shows the values of Sh_k(ANN) for the 10 different topologies. As previously explained, Sh_k(A_i) and Sh_k(B_j) were used as inputs and L_ij = 1 (BCN links) or L_ij = 0 as outputs. At this point, we used Multilayer Perceptron (MLP) and Linear Neural Network (LNN) topologies with the default values provided by STATISTICA after its automatic improvement algorithm based on feature selection and topology modification. In order to obtain better generalization, future studies should use more ANN topologies and/or different configurations thereof. A High Performance Computing (HPC) service may be needed to test a high number of ANNs for many complex systems [68-73]. Table 1. Information indices Sh_k(ANN) of the ANNs used as inputs to find the best Net-Net Automated Machine Learning (AutoML) model.

(Table 1 columns: ANN No.; ANN Profile (inputs:hidden layers EPs:outputs); Sh_k(ANN).)

Thus, our final classifier is based on the entropies of different BCN nodes, Sh_k(A_i) and Sh_k(B_j), and the entropies of different ANN topologies, Sh_k(ANN). The best Net-Net AutoML model determined the scores ^qS(L_ij) for a given ANN topology to predict the connectivity of a BCN pair of nodes. This study presents a proof-of-concept model that fits the 500,740 outcomes obtained with 10 different ANNs very well.
Twelve ML classifiers were tested and the AUROC values were calculated using 10-fold cross-validation (see Table 2). The best model in this comparison was RF10 (RF based on 10 trees), with an AUROC of 0.9983 ± 0.0001. Figure 2 shows the box plot of the AUROC values of the 12 ML methods (10-fold CV). The box plot suggests that the AUROC values for all ML methods were stable across folds. In addition, the difference between RF and the other methods (the box plots are far from overlapping) proved to be statistically significant. All the Net-Net classifiers obtained good performance, with AUROC values greater than 0.90. An interesting result was provided by the simple KNN method, with a mean AUROC of 0.9958, still less than the 0.9983 of RF10. The linear methods (LDA and LogR) provided models with mean AUROC values greater than 0.92 and 0.95, respectively. Changing to tree-based methods provided better AUROC values, greater than 0.99, except for Ada. Bagging performed very similarly to the RF classifier, with a mean AUROC of 0.9980.

In the next step, we performed a grid search over the RF number of trees: 1 tree (DT), and 2, 5, 10, 50, and 100 trees (RF2, RF5, RF10, RF50, RF100). The results show that using just two trees (RF2) increases the mean AUROC from 0.9925 (DT, a single tree) to 0.9944 (see Table 3 and Figure 3). Doubling the number of trees from 50 to 100 changed the mean AUROC by only 0.0001. Therefore, we chose RF50 as the best Net-Net AutoML model, with a mean AUROC of 0.9991 (SD = 0.0001). The dataset, scripts, and results are available in an open GitHub repository [63].

Figure 4 shows the feature importance for the best Random Forest model based on 50 trees. We can observe that the entropies of the BCN nodes are the most important features, followed by the ANN entropies. The differences between the BCN node entropies are less important for this RF classifier.
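The tree-count grid search and the feature-importance readout (cf. Figure 4) can be sketched as follows, again with synthetic stand-in data in place of the real Sh_k dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the 20-feature Sh_k dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Grid over the number of trees (1 tree behaves like a single DT).
scores = {}
for n_trees in (1, 2, 5, 10, 50, 100):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    auc = cross_val_score(rf, X, y, cv=cv, scoring="roc_auc")
    scores[n_trees] = (auc.mean(), auc.std())   # mean and SD of AUROC per grid point

# Feature importances of the selected 50-tree model (normalized to sum to 1).
best = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = best.feature_importances_
```

Reading out `scores` reproduces the shape of Table 3 (mean ± SD per tree count), and ranking `importances` gives a Figure 4-style feature ordering.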
The most important feature was the BCN node entropy containing information about nodes placed at a minimum topological distance of 2, Sh_2(B).

Conclusions
This work confirms that Markov chains are useful to calculate Shannon entropy information indices Sh_k that quantify the connectivity patterns of both BCNs and ANNs. We demonstrated how to develop Net-Net AutoML models based on the Sh_k values of both networks. The dataset contains 500,470 examples and 20 features. Twelve linear and nonlinear Machine Learning classifiers were tested, and the best classification model was provided by Random Forest (50 trees), with a mean test AUROC of 0.9991 ± 0.0001 (10-fold cross-validation). The Net-Net AutoML models are useful to determine which ANN topology is better at predicting the connectivity in the BCN system. Consequently, we can use this methodology to identify the better-performing ANN topologies before training them, which may lead to savings in computing resources. The scripts and dataset for this project are available in an open GitHub repository from one of the authors, Cristian R. Munteanu [63].

Funding: ) and the European Regional Development Funds (FEDER) by the European Union. Additional support was offered by the Accreditation, Structuring, and Improvement of Consolidated Research Units and Singular Centers (ED431G/01), funded by the Ministry of Education, University and Vocational Training of Xunta de Galicia and endowed with EU FEDER funds. Last, the authors also acknowledge research grants from the Ministry of Economy and Competitiveness, MINECO, Spain (FEDER CTQ2016-74881-P) and the support of Ikerbasque, the Basque Foundation for Science.

Conflicts of Interest:
The authors declare no conflict of interest.