Antifragility Predicts the Robustness and Evolvability of Biological Networks through Multi-Class Classification with a Convolutional Neural Network

Robustness and evolvability are essential properties for the evolution of biological networks. To determine whether a biological network is robust and/or evolvable, its functions must be compared before and after mutations. However, this comparison can incur a high computational cost as the network size grows. Here, we develop a predictive method to estimate the robustness and evolvability of biological networks without an explicit comparison of functions. We measure antifragility in Boolean network models of biological systems and use it as the predictor. Antifragility occurs when a system benefits from external perturbations. Using the differences in antifragility between the original and mutated biological networks, we train a convolutional neural network (CNN) and test it to classify the properties of robustness and evolvability. We found that our CNN model successfully classified the properties. Thus, we conclude that our antifragility measure can be used as a predictor of the robustness and evolvability of biological networks.


Introduction
Robustness and evolvability are prevalent in the evolution of biological systems [1][2][3][4][5][6]. As studying the relationship between the two properties is necessary for understanding how biological systems can withstand mutations and simultaneously generate genetic variations, numerous studies on their relationship have been done [7][8][9][10][11]. Robustness allows existing functions to be preserved in the presence of mutations or perturbations, while evolvability enables new functions to be expressed for adapting to new environments [12][13][14]. To determine whether a biological system is robust and/or evolvable, its functions must be compared before and after internal perturbations. Pragmatically, in Boolean networks used as gene regulatory network models, the definition of robust and evolvable networks has been established via the comparison of dynamic attractors (i.e., stable steady states) before and after internal perturbations [14], based on numerical and experimental evidence showing that the attractors represent cell types or cell functions [15][16][17].

Boolean Network Models of Biological Systems

For a better understanding, we provide formal definitions of a Boolean network, an attractor, and a basin of attraction as follows [35]:
• We denote a Boolean network as G(V, F), composed of a set V = {v_1, v_2, ..., v_N} of nodes and a list F = (f_1, f_2, ..., f_N) of Boolean functions, in which f_i(v_{i1}, v_{i2}, ..., v_{ik}) is the Boolean function assigned to node v_i and has input nodes v_{i1}, v_{i2}, ..., v_{ik}. IN(v_i) signifies the set of input nodes v_{i1}, v_{i2}, ..., v_{ik} to v_i.
• The state of each node is 0 or 1 at discrete time t. We use v_i(t) to indicate the state of node v_i at time t. Then, the state of node v_i at time t + 1 is denoted by v_i(t + 1) = f_i(v_{i1}(t), v_{i2}(t), ..., v_{ik}(t)).
• If we designate [v_1(t), v_2(t), ..., v_N(t)] as v(t), v_i(t + 1) is simply written as v_i(t + 1) = f_i(v(t)). v(t) represents a gene expression profile at time t. For the whole Boolean network, it is written as v(t + 1) = f(v(t)).
• The set of edges E is defined as E = {(v_{ij}, v_i) | v_{ij} ∈ IN(v_i)}. Then, G(V, E) represents a directed graph which shows the network topology. An edge from v_{ij} to v_i means that v_{ij} has an effect on the expression of v_i. k is the in-degree of v_i.
• An initial state v(0) eventually converges into a set of state configurations, which is defined as an attractor. When an attractor is composed of only one state configuration (i.e., v = f(v)), it is a fixed-point attractor. When an attractor is composed of more than one state configuration, it is a cyclic attractor with period l if it consists of l state configurations (i.e., v^1 = f(v^l) = f(f(v^{l−1})) = ... = f(f(... f(v^1) ...))). Here, l is called the attractor length. The set of state configurations which eventually reach the same attractor is defined as the basin of attraction. For example, in Figure 1, the basins of attraction are the rest of the configurations going toward the attractors.

In this study, we use different Boolean network models of 37 biological systems. The network size ranges from 5 to 26 nodes. We find all attractors and basins of attraction by exhaustively tracking which states all state configurations finally converge to. Therefore, to find the attractors and basins for the tens of thousands of networks within a reasonable time, 26 nodes (2^26 = 67,108,864 states) is the maximum network size we can handle in our computing environment.
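The exhaustive attractor search described above can be sketched as follows. This is a minimal sketch using a hypothetical 3-node network (not one of the 37 biological models); `RULES`, `step`, and `attractors_and_basins` are illustrative names.

```python
from itertools import product

# A hypothetical 3-node Boolean network (illustrative only):
# each function maps the full state tuple v(t) to one node's next state.
RULES = [
    lambda v: v[1],             # v1(t+1) = v2(t)
    lambda v: v[0] and v[2],    # v2(t+1) = v1(t) AND v3(t)
    lambda v: not v[1],         # v3(t+1) = NOT v2(t)
]

def step(v):
    """Synchronous update: v(t+1) = f(v(t))."""
    return tuple(int(f(v)) for f in RULES)

def attractors_and_basins():
    """Track every initial state until a state repeats; the repeating cycle
    is the attractor, and the start state belongs to that attractor's basin."""
    basins = {}
    for v0 in product([0, 1], repeat=len(RULES)):
        trajectory, v = [], v0
        while v not in trajectory:
            trajectory.append(v)
            v = step(v)
        cycle = trajectory[trajectory.index(v):]   # states on the attractor
        basins.setdefault(frozenset(cycle), []).append(v0)
    return basins

basins = attractors_and_basins()
```

For this toy network, every one of the 2^3 = 8 initial states falls into the basin of a single fixed-point attractor; for 26-node networks the same tracking must cover 2^26 states, which is what makes the exhaustive approach expensive.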
Table 1 shows information about the biological networks collected from the Cell Collective public platform for modeling biological networks. In the table, the networks are sorted by their size.

Mutations and Classification of Robustness and Evolvability
We introduce four types of random mutations to each biological network to study their robustness and evolvability: We add one regulatory link, delete one regulatory link, change the position of a link in the network, or flip one state (i.e., 0 changes into 1, and 1 changes into 0) in the output of the Boolean functions assigned to each node [63][64][65]. These four types of mutations are distributed as equally as possible. Then, we measure properties of different mutants without repetitions and compare them with the original networks.
Comparing the attractors between the original and mutated networks, we classify the properties of biological networks [66]. Our classification is extended from the definition of robust and evolvable networks in Aldana et al.'s work [14]. They added perturbations to a network structure and then observed the attractors of the original and perturbed networks. Under the assumption that all attractors represent essential cell types or cell functions, they considered the preservation of attractors as robustness and the emergence of new attractors as evolvability.
Based on their concept, we divide the properties of robustness and evolvability into four classes: Not robust & not evolvable, not robust & evolvable, robust & not evolvable, and robust & evolvable (Figure 2). For example, suppose an original network has a set of attractors A = {A_1, A_2, A_3}; Figure 2 illustrates how the four classes are determined from the attractors of its mutants. In our simulations, we independently add an internal perturbation 1000 times to each network, and then get 1000 perturbed networks per biological network. Next, we classify the properties of all the perturbed networks into the above four classes. Since one biological network can be not robust & not evolvable, not robust & evolvable, robust & not evolvable, or robust & evolvable against the 1000 perturbations, we can get the percentage frequency distribution of the four classes per network.
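Under this scheme, the four-way labeling reduces to two set comparisons. A minimal sketch, assuming attractors are represented as frozensets of state tuples; `classify` and the example attractors are illustrative names, not the authors' code.

```python
def classify(orig_attractors, mut_attractors):
    """Label a mutant against the original network: robust if every original
    attractor is preserved, evolvable if at least one new attractor emerges."""
    robust = orig_attractors <= mut_attractors
    evolvable = bool(mut_attractors - orig_attractors)
    return {(False, False): "not robust & not evolvable",
            (False, True):  "not robust & evolvable",
            (True,  False): "robust & not evolvable",
            (True,  True):  "robust & evolvable"}[(robust, evolvable)]

a1 = frozenset({(0, 0)})          # hypothetical fixed-point attractor
a2 = frozenset({(0, 1), (1, 0)})  # hypothetical cyclic attractor of length 2
label = classify({a1}, {a1, a2})  # original kept, a new attractor emerged
```

Repeating this comparison for each of the 1000 mutants of a network yields the percentage frequency distribution of the four classes.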

Antifragility in Boolean Networks
Antifragility, as defined by Taleb [22], is the property of a system that improves when exposed to perturbations. We recently developed a measure to assess antifragility in Boolean networks [23,24], which we employ in this study. One advantage of this measure is that its computation time increases linearly with N (Figure S2 in Supplementary Material).
On the assumption that antifragility indicating responses to external perturbations might be helpful to predict responses to internal perturbations (i.e., mutations), we use antifragility as a predictor to estimate the robustness and evolvability of biological networks against mutations. To understand our antifragility measure, we first have to explain how we measure complexity and external perturbations.

Complexity of Boolean Networks
Complexity can be seen as a balance between change and regularity [26]. In terms of information, change means that information becomes different, while regularity means that information is preserved. In biology, a balance between the change and preservation of genetic information gives biological systems the flexibility and stability with which they robustly adapt to their environment.
We defined change and regularity as emergence and self-organization, respectively, and developed measures to quantify them. Moreover, using the two metrics, we presented a measure to evaluate complexity [67,68]. Complexity (C) is calculated from emergence (E) and self-organization (S) as follows:

C = 4 × E × S, (1)

where the coefficient 4 normalizes the values of C to the range [0,1] (0 ≤ C ≤ 1). Since S can be regarded as the complement of E (S = 1 − E) [67,68], Equation (1) is reformulated as:

C = 4 × E × (1 − E). (2)

In a Boolean network which consists of N nodes, E (0 ≤ E ≤ 1) is measured as the average of the emergence values of all nodes, where the emergence of each node is computed through Shannon's information entropy:

E_i = −(p_0 log2 p_0 + p_1 log2 p_1), (3)

where p_0 (p_1) is the ratio of how many 0s (1s) are expressed by node i during T state transitions. The ratios are computed over the state transitions not from 1 to T but from T + 1 to 2T, so that we can exclude as many transient states as possible. This enables us to get more stable values of p_0 and p_1. In Equation (2), C has the maximum value of 1 when E is 0.5 [68,69].
For example, under the condition that all the nodes of a Boolean network have the same emergence values, when p_0 or p_1 is around 0.89, the average E becomes 0.5. In contrast, C has the minimum value of 0 for a node when E is 1 (constant change) or 0 (no change). Under the condition that all the nodes have the same emergence values, when p_0 and p_1 are 0.5 or when p_0 or p_1 is 1, the average E of a network becomes 1 or 0, respectively. As seen in the examples, complexity is determined by how 1s and 0s are distributed during state transitions. The distribution of 1s and 0s at each node represents how node states are altered and kept. Thus, the complexity of Boolean networks measures the degree of balance (E = 0.5) between change (E = 1) and regularity (E = 0) of gene expression information.
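The complexity computation above can be sketched directly from the node-wise ratios of 1s. A sketch; the function names are ours, and the input is the list of per-node p_1 values.

```python
from math import log2

def emergence(p1):
    """Shannon entropy of one node's time series, from its fraction p1 of 1s."""
    p0 = 1.0 - p1
    return -sum(p * log2(p) for p in (p0, p1) if p > 0.0)

def complexity(p1s):
    """C = 4 * E * (1 - E), where E is the average emergence over all nodes."""
    E = sum(emergence(p) for p in p1s) / len(p1s)
    return 4.0 * E * (1.0 - E)
```

As the text notes, p_1 = 0.5 at every node gives E = 1 and hence C = 0 (constant change), p_1 = 1 gives E = 0 and C = 0 (no change), and p_1 around 0.89 gives E near 0.5 and C near its maximum of 1.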

External Perturbations to Boolean Networks
We consider flipping the node states of the networks as external perturbations so as to measure antifragility [23,24] (as opposed to the internal perturbations, i.e., mutations, described above). The degree of external perturbations (∆x) is quantified by the following equation:

∆x = (X × (T/O)) / (N × T) = X / (N × O), (4)

where X is the number of nodes randomly chosen to be perturbed, O is the interval (in time steps) at which the X nodes are perturbed, T is the number of state transitions, and N is the network size. The values of T and O were determined based on our previous research [23]: If T is large enough, the shape of the antifragility curve is consistently obtained over T (with T increasing). In other words, antifragility curves quickly converge with an increasing T. For Boolean networks which have fewer than 100 nodes, T = 200 was enough to measure antifragility. O = 1 showed the most distinctive differences of antifragility between networks. As with other variables related to our antifragility measure, the time required to perform these calculations increases linearly with N.

Antifragility of Boolean Networks
Our antifragility measure (A) is composed of two terms: The difference of "satisfaction" before and after external perturbations (∆σ) and the degree of external perturbations (∆x) [23,24]. The equation is as follows:

A = −∆σ × ∆x. (5)

Here, the degree of external perturbations (∆x) was explained above. To obtain the difference of satisfaction before and after external perturbations (∆σ), the concept of satisfaction (σ) should be explained. σ represents how much the "goal" of agents has been attained [70]. The agents and goal can be defined differently depending on systems and observers. In Boolean networks, each node can be regarded as an agent. Their goal can be arbitrarily defined as achieving high complexity.
For Equation (5), ∆σ is calculated by the following equation:

∆σ = C − C_0, (6)

where C_0 and C are the complexities before and after external perturbations, respectively. ∆σ has values in the range [−1, 1] (−1 ≤ ∆σ ≤ 1) as C_0 and C have the interval [0,1]. Thus, if ∆σ is positive (C > C_0), it means that complexity was increased by the external perturbations. That is, we define "benefiting from perturbations" as increasing complexity. In this case, the Boolean network can be considered antifragile. If ∆σ is negative (C < C_0), it indicates that complexity decreased under the external perturbations; the network can be seen as fragile. If ∆σ is zero (C = C_0), complexity is maintained against external perturbations, and the network can be regarded as robust. To calculate C_0 and C, when the state transitions of the original network and the perturbed one are computed, the same initial states are used at t = 0. Because complexity can differ depending on the initial states, C_0 and C are calculated as the averages of the complexity values acquired from a large number of randomly chosen initial conditions. This yields a more stable value of A as a system property. In our simulations, the number of initial states (s) is set to 1000.
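Putting the degree of perturbation and the satisfaction difference together, the antifragility value for one perturbation setting can be computed as below. A sketch; the argument names are ours, and the sign convention (negative values antifragile, positive fragile) is taken from [23,24].

```python
def delta_x(X, N, O=1):
    """Degree of external perturbation: X of N nodes flipped every O steps."""
    return X / (N * O)

def antifragility(C0, C, X, N, O=1):
    """A = -(C - C0) * delta_x; under the convention of [23,24],
    A < 0 is antifragile, A = 0 robust, A > 0 fragile."""
    return -(C - C0) * delta_x(X, N, O)

A = antifragility(C0=0.4, C=0.6, X=5, N=20)  # complexity rose under perturbation
```

In practice C0 and C would each be averages over many random initial states (s = 1000 in our simulations), and the computation is repeated for X = 1, ..., N to obtain the antifragility curve of a network.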

Property Classification with a Convolutional Neural Network
We tried several classification methods, including multinomial logistic regression (46.33% accuracy), and a convolutional neural network (CNN) was the most effective.

Input and Output in a CNN
We use a CNN to classify the properties of the biological networks into the four classes: Not robust & not evolvable, not robust & evolvable, robust & not evolvable, and robust & evolvable. In our CNN model, the input consists of the differences of antifragility between the original and mutated networks, and the output consists of the properties classified into the four classes. As we use 37 biological networks and add the internal perturbations 1000 times to each network, we have 37 × 1000 = 37,000 antifragility-difference samples and their corresponding class labels.
We explain how to obtain the input, taking the Mammalian cell cycle network with 20 nodes and its mutated network as examples. Figure 4a,b show the antifragility of the two networks depending on the number of perturbed nodes X. We interpolate to get 30 data points for each network (independently of N) and subtract the antifragility value of the mutated network from that of the original network at each data point. Then, we get 30 difference values over the normalized range [1/N, 1 (= N/N)] (Figure 4c). In this way, for all the networks with different numbers of nodes, we can get 37,000 input elements, each composed of 30 data points. This process allows the antifragility of all the networks to have the same size of 30 × 1. This shape of input is necessary because CNNs require inputs with the same width and height.
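The resampling step above can be sketched with NumPy. A sketch; `profile_diff` is our illustrative name, and we assume the curves are indexed by X = 1, ..., N and placed on the normalized axis X/N before linear interpolation.

```python
import numpy as np

def profile_diff(orig, mut, n_points=30):
    """Resample two antifragility curves (values for X = 1..N) onto a common
    30-point grid over the normalized axis X/N, then return original - mutated."""
    N = len(orig)
    x = np.arange(1, N + 1) / N               # normalized axis in [1/N, 1]
    grid = np.linspace(1 / N, 1.0, n_points)  # common 30-point grid
    return np.interp(grid, x, orig) - np.interp(grid, x, mut)

d = profile_diff(np.zeros(20), np.ones(20))   # constant toy curves, N = 20
```

Whatever the network size N, the output is always a 30 × 1 vector, which is what lets curves from 5-node and 26-node networks share one CNN input shape.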

Nested k-Fold Cross-Validation
We carry out nested k-fold cross-validation to select a model and evaluate the model performance. Figure 5 briefly illustrates the process of the cross-validation. We split the 37,000 samples into training data (37,000 × 0.75 = 27,750) and test data (37,000 × 0.25 = 9250) with a ratio of 75 to 25 and set k = 4 (Outer loop A in Figure 5). However, the data about the properties labeled with the four classes is imbalanced, as the four classes are not equally distributed. Thus, we have a multiclass classification problem with imbalanced data. We balance the training data using an oversampling technique so that every class has the same number of samples. Then, we divide the balanced data into training data and validation data with a ratio of 67 to 33 (Inner loop in Figure 5). From k = 1 to k = 4, we train our model with the balanced data and test it with the original imbalanced data for each fold (Outer loop B in Figure 5).

We have three hyperparameters: Batch size = {32, 64, 128}, data balancing = {SMOTE, ADASYN}, and CNN architecture = {simple, complex}. For the data balancing, we try SMOTE (Synthetic Minority Over-Sampling Technique) and ADASYN (Adaptive Synthetic Sampling Approach), which are oversampling techniques generating synthetic samples from the minority class [71,72]. Regarding the architecture, we try a simple CNN model and a complex one. A CNN is usually composed of convolutional layers (Conv) with a rectified linear unit (ReLU) activation function, pooling layers (Pool), a flattening layer (Flat), fully connected layers (FC), and an output layer (Out). The layers are arranged in the order Conv+ReLU-Pool-Flat-FC-Out. Depending on how many Conv+ReLU-Pool layers are stacked, CNNs become deeper neural networks, and deeper CNNs can handle higher-resolution images. In our study, the simple CNN has Conv-Conv+ReLU-Pool, and the complex CNN has Conv+ReLU-Conv+ReLU-Pool-Conv+ReLU-Conv+ReLU-Pool (Figure 6).

•
Inner loop: We set the values of hyperparameters and fit the model parameters. The model is trained and validated with the balanced data. We use AUC (i.e., area under the ROC curve) as a criterion to find the best hyperparameter set. We have 12 hyperparameter sets in total (Table 2 and Figure 6). For the three rows of the inner loop derived from the first fold in Figure 5, we calculate AUC = {AUC(hyp1), AUC(hyp2), . . . , AUC(hyp12)} in the validation of each row, and then get averages over the three rows (i.e., AUC avg. = {AUC(hyp1) avg. , AUC(hyp2) avg. , . . . , AUC(hyp12) avg. }). Repeating the process from the second fold to the fourth one, we get their means for the four folds (i.e., AUC final = {AUC(hyp1) final , AUC(hyp2) final , . . . , AUC(hyp12) final }). Finally, we selected hyp 2 as the best hyperparameter set from the values of AUC final in Table 2.

•
Outer loop: We train the model which has the optimal parameters determined from the inner loop. The model is trained with the balanced data. Tables 3 and 4 show the accuracy of the training and validation dataset, respectively. After training the model, we test it with the original imbalanced data (Outer loop B in Figure 5). To evaluate the model performance, we get a test accuracy, a confusion matrix, and precision-recall (PR) curves. They are averages over the four folds.

Attractors and Basins of Attraction of Biological Networks
We explored the attractors and basins of attraction to find the structural features of the state space in the Boolean network models of the 37 biological systems. Specifically, we measured the number of attractors, the average length of attractors, and the normalized basin entropy. The normalized basin entropy was calculated by dividing Krawitz et al.'s measure [73] by the number of nodes (i.e., H = −Σ_ρ p_ρ log2 p_ρ / N, where p_ρ is the basin size of attractor ρ divided by the state space size 2^N, and Σ_ρ p_ρ = 1). This is normalized between 0 and 1. As mentioned in the introduction, the attractors can represent cell types or functions [15][16][17]. Hence, from a biological viewpoint, the number of attractors can be interpreted as the number of cell functions, and the average length of attractors can be regarded as the time that it takes cell functions to be conducted. The normalized basin entropy can be seen as the versatility of cell functions because it provides information about the distribution of basin sizes. The more even the basin sizes of the attractors are in the state space, the larger the value of the normalized entropy is. In the case that all basins of attraction have similar sizes, when initial states are changed by noise, they are more likely to converge to different attractors by jumping from one basin to another. Figure 7a shows the distribution of the number of attractors in log scale. It is clear that various biological systems have different numbers of cell functions. Figure 7b displays the distribution of the average length of attractors. Overall, more than half of the networks have only fixed-point attractors, and the rest of them have relatively short cyclic attractors when their state space sizes are considered. The values varied between 1 and 11, indicating that each biological system spends different amounts of time carrying out its cellular function(s). Figure 7c presents the distribution of the normalized basin entropy.
The values are diversely distributed between 0 and 0.541. This indicates that some biological systems perform only a few cell functions dominantly, while others commonly switch among many cell functions.
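The normalized basin entropy can be computed directly from the basin sizes. A sketch; `normalized_basin_entropy` is our illustrative name.

```python
from math import log2

def normalized_basin_entropy(basin_sizes, N):
    """H = -sum(p * log2 p) / N, where p is each basin's share of the
    2^N state space (Krawitz et al.'s basin entropy divided by N)."""
    total = 2 ** N
    return -sum(s / total * log2(s / total) for s in basin_sizes) / N
```

For instance, a 2-node system whose four states split evenly into four fixed-point basins gives H = 1 (maximally versatile), while a single basin covering the whole state space gives H = 0 (one dominant function).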

All three measures have large value variations, so it is difficult to find common structural features of the state space. These biological systems are specialized for their different cell functions, which may cause such variations. From the variations, we can see that there exist biological networks that have a broad variety of numbers of attractors, attractor lengths, and basin distributions. For these networks, it is computationally difficult to study the robustness and evolvability of biological systems by comparing the attractors before and after perturbations. It is even harder for larger biological networks because the number of attractors, the attractor length, and the basin size will increase as the number of nodes grows. Therefore, it is necessary to estimate the robustness and evolvability of biological networks without measuring attractors exhaustively.

Distribution of the Four Classes on Robustness and Evolvability of Biological Networks
We classified the robustness and evolvability of the 37 biological networks into the four classes (not robust & not evolvable, not robust & evolvable, robust & not evolvable, robust & evolvable) to investigate how the biological networks respond to mutations. Figure 8 shows the percentage frequency distribution of the four classes. For each biological network, we added a different internal perturbation 1000 times, and thus acquired the percentage frequency from 1000 different mutated networks.
As seen in Figure 8, every biological network has three or four classes. Overall, the class of robust & evolvable takes up the lowest percentage, while the classes of not robust & evolvable and robust & not evolvable account for the majority. The only class that is not present in all networks is not robust & not evolvable. These findings suggest that a biological system can display different behaviors of robustness and evolvability against mutations. Furthermore, it is known that biological systems are robust and evolvable [1][2][3][4][5][6], which could be explained not from the behavior of robust & evolvable but from the two dynamical behaviors of not robust & evolvable and robust & not evolvable.
In other words, it is highly likely that mutations will lead to robust or evolvable networks. It can also be argued that a more complex mechanism is required for being at the same time robust & evolvable, so this could explain why not many networks exhibit both properties frequently.
In addition, such a distribution of the dynamical responses suggests evolutionary profiles of what the environments given to the biological systems were like. Not robust & evolvable makes up the highest percentage frequency of the four classes in the distribution, which implies that many biological systems might have been mainly exposed to drastic environmental changes under which it was hard to keep the existing cell functions, and thus evolved towards producing brand-new cell functions to adapt to the new environments.

Association Between Mutation Type and Robustness and Evolvability
We computed Cramer's V to measure the strength of association between the mutation type and the robustness and evolvability. We had the four mutation types: Adding a link (add), deleting a link (delete), changing the position of a link (change), and flipping one state in a Boolean function (flip). We randomly chose one mutation type and then applied it to the biological network. This process was repeated 1000 times per biological network, so we obtained 37,000 pairs (mutation type, property class) for the 37 biological networks. Table 5 is a contingency table displaying the frequency distribution of the four classes depending on the mutation type. From the table, we got Cramer's V = 0.2292. Cramer's V takes values from 0 to 1; the closer to zero, the weaker the association between the variables. Hence, we found that there is a weak association between the mutation type and the property class. This indicates that the type of genetic mutation does not have a strong effect on determining the robustness and evolvability of biological systems.
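Cramer's V can be computed from the contingency table without a statistics library: V = sqrt(chi2 / (n · (min(r, c) − 1))) for an r × c table. A sketch; `cramers_v` is our illustrative name.

```python
def cramers_v(table):
    """Cramer's V from an r x c contingency table (list of rows),
    with the chi-square statistic computed by hand."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # chi2 = sum over cells of (observed - expected)^2 / expected
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(table)) for j in range(len(col_tot)))
    return (chi2 / (n * (min(len(table), len(col_tot)) - 1))) ** 0.5
```

A perfectly diagonal table yields V = 1 (complete association), whereas identical rows yield V = 0 (independence); our observed value of 0.2292 sits near the weak-association end of this scale.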


Prediction of Robustness and Evolvability Using Antifragility
We obtained a test accuracy, a confusion matrix, and precision-recall curves on the test data to evaluate the performance of our CNN model. We preferred precision-recall (PR) curves over receiver operating characteristic (ROC) curves because we tested the model on imbalanced data [74]. Table 6 presents the test accuracy; the overall accuracy is 0.5845. Figure 9a shows the normalized confusion matrix, in which the x-axis refers to the predicted classes and the y-axis to the actual classes. The diagonal elements give the probability of correctly predicted classes: 0.6150 for not robust & not evolvable, 0.6632 for not robust & evolvable, 0.5657 for robust & not evolvable, and 0.4108 for robust & evolvable. From the accuracy and the confusion matrix, we can see that our model performs satisfactorily on the classification of robustness and evolvability. Figure 9b exhibits a micro-averaged PR curve for the four classes. We computed the micro-average globally, without distinguishing elements between different classes, which is usually preferable for imbalanced classes. In our PR curve, the average precision (AP), i.e., the area under the PR curve, is 0.54. To compute the AP, we used the average_precision_score function in the scikit-learn library of Python (https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py). A large AP means both high precision and high recall: high precision is related to a low false positive rate (type I error α), and high recall is related to a low false negative rate (type II error β).
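The micro-averaged AP can be reproduced without scikit-learn. The sketch below implements the underlying computation, AP as the step-wise area under the PR curve, assuming one-hot ground-truth labels and per-class scores; the labels and scores shown are toy values, not our test data.

```python
import numpy as np

def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over samples ranked by score."""
    order = np.argsort(-scores)           # rank samples by decreasing score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                     # true positives at each rank
    precision = tp / np.arange(1, y.size + 1)
    recall = tp / y.sum()
    d_recall = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * d_recall))

# Micro-averaging: flatten the one-hot label matrix and the score matrix
# (4 columns, one per class) and compute a single AP over all entries.
y_onehot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1]])          # toy labels, 3 samples
scores = np.array([[0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.5, 0.2, 0.1],
                   [0.3, 0.3, 0.2, 0.2]])    # toy classifier outputs
micro_ap = average_precision(y_onehot.ravel(), scores.ravel())
```

A perfect ranking (all positives scored above all negatives) yields AP = 1.0, which is the sense in which a larger AP means simultaneously high precision and high recall.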
Overall, our model performs well, but there are differences between classes. This might result from the scarcity of data on robust & evolvable, as seen in Figure 8, and/or a high variance of the differences of antifragility classified into robust & evolvable. These two factors might have made it hard for our CNN model to find the feature map of robust & evolvable.
Table 7 compares the AP values of our model with those of random classifiers. In PR curves, the AP of a random classifier is 0.5 only when there are two classes and their distributions are balanced. For binary classification of balanced or imbalanced datasets, the AP of a random classifier is defined as positive/(positive + negative) [63,74]. Our test data is imbalanced, so we computed the AP values of random classifiers from the ratio of positive and negative samples. Since the output is binarized when the PR curve and AP are extended to multi-class classification, calculating the AP values of our model and the random classifiers is treated as a binary classification for each class. Table 7 demonstrates that all four AP values of our model are larger than those of the random classifiers. Although the robust & evolvable class has a relatively smaller AP value than the other classes, our model classifies all the classes better than random classifiers do.
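The random-classifier baseline follows directly from class prevalence: under a one-vs-rest binarization, a random classifier's AP for a class equals positive/(positive + negative) for that class. A minimal sketch with made-up class counts (the real counts come from our test split, summarized in Figure 8 and Table 7):

```python
# Illustrative class counts for an imbalanced test set of 10,000 samples;
# these numbers are hypothetical, not the paper's.
counts = {
    "not robust & not evolvable": 4200,
    "not robust & evolvable": 3100,
    "robust & not evolvable": 2200,
    "robust & evolvable": 500,
}
total = sum(counts.values())

# One-vs-rest: for each class, the AP of a random classifier is simply
# the prevalence of that class among all samples.
baseline_ap = {cls: n / total for cls, n in counts.items()}
```

A model is only informative for a class when its AP exceeds this prevalence baseline, which is exactly the comparison Table 7 makes; note that the rare robust & evolvable class has the lowest baseline, so even a modest AP can beat chance there.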

Discussion
In this study, we classified robustness and evolvability in Boolean network models of biological systems into four classes: not robust & not evolvable, not robust & evolvable, robust & not evolvable, and robust & evolvable. The classification was defined based on the change of attractors, which represent cell fates or cell functions, before and after mutations. Using antifragility, which is easily calculated from the dynamics during state transitions under external perturbations, we proposed a classifier to predict the four classes. We used a convolutional neural network, where the input is the difference of antifragility between original networks and their mutated versions and the output is one of the four classes. Our model showed a good performance on the multi-class classification, indicating that our antifragility measure can serve as a predictor of the robustness and evolvability of biological networks.
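As a concrete illustration of this pipeline, the sketch below passes a vector of antifragility differences through a single 1-D convolution layer with ReLU, global max pooling, and a softmax over the four classes. Everything here is a hypothetical minimal architecture in NumPy with random weights, not the trained CNN of the paper; the input length of 16 and the filter sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels, bias):
    """'Valid' 1-D convolution of a single-channel signal, then ReLU."""
    K, w = kernels.shape
    L = x.size - w + 1
    out = np.empty((K, L))
    for k in range(K):
        for i in range(L):
            out[k, i] = x[i:i + w] @ kernels[k] + bias[k]
    return np.maximum(out, 0.0)

def forward(x, kernels, bias, W, b):
    feat = conv1d_relu(x, kernels, bias).max(axis=1)  # global max pooling
    logits = W @ feat + b                             # dense output layer
    z = np.exp(logits - logits.max())
    return z / z.sum()                                # softmax over 4 classes

# Hypothetical input: antifragility differences of one mutated network,
# sampled at 16 perturbation settings (illustrative values only).
x = rng.normal(size=16)
kernels = rng.normal(size=(8, 3)) * 0.1   # 8 filters of width 3
bias = np.zeros(8)
W = rng.normal(size=(4, 8)) * 0.1         # 4 output classes
b = np.zeros(4)
probs = forward(x, kernels, bias, W, b)   # class probabilities, sums to 1
```

In practice the weights would be learned by minimizing cross-entropy against the attractor-based class labels, but the forward pass above shows how a fixed-length antifragility-difference vector maps to a four-way class probability.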
To evaluate the utility of our antifragility measure in terms of computational cost, we measured the computation time of finding attractors and of calculating antifragility. When searching for all attractors, we used Dubrova and Teslenko's algorithm, a SAT-based approach that finds attractors efficiently in synchronous Boolean networks [75]. This algorithm has two steps: first, the problem is encoded as input for the SAT solver; second, the problem is solved by the SAT solver. However, SAT is NP-complete, so the problem is intractable in the worst case due to exponential complexity in the encoding or solving phase. For the 37 biological networks with fewer than 30 nodes in our study, finding attractors was faster than calculating antifragility (Figure S3 in Supplementary Material). However, we also tried to compute the attractors and antifragility of lymphocytic leukemia, a network with 91 nodes (2^91 possible states) collected from Cell Collective. While it took about 47 min to calculate its antifragility value, we could not obtain its attractors even after a few days on our cluster equipped with 132 GB of memory. As with lymphocytic leukemia, in some large Boolean networks finding all attractors is computationally expensive or even unfeasible because of very long attractors or a large number of attractors, since the state space grows exponentially with the number of nodes. As the state space size is 2^N, memory simply runs out if one attempts to explore large networks exhaustively. In this case, our classifier with antifragility can be a useful tool for studying the robustness and evolvability of biological networks, as it requires only a sample of the dynamics of original and perturbed networks.
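The exponential cost is easy to see in the brute-force alternative to the SAT-based search: enumerate all 2^N states and follow each trajectory until it revisits a state. The toy 3-node network below uses hypothetical update rules (not one of the 37 biological models) to make the idea concrete; the same loop over 2^91 states for lymphocytic leukemia would be clearly infeasible.

```python
from itertools import product

def step(state):
    """Synchronous update of a toy 3-node Boolean network (made-up rules)."""
    a, b, c = state
    return (b, a and c, not a)

def find_attractors(n_nodes, step):
    """Exhaustive attractor search: O(2^N) start states."""
    attractors = set()
    for start in product((False, True), repeat=n_nodes):
        seen = {}
        s, t = start, 0
        while s not in seen:          # iterate until a state repeats
            seen[s] = t
            s, t = step(s), t + 1
        # states visited at or after the first occurrence of the repeated
        # state form the attractor (a fixed point or a limit cycle)
        cycle = frozenset(st for st, ti in seen.items() if ti >= seen[s])
        attractors.add(cycle)
    return attractors

attractors = find_attractors(3, step)  # one fixed point and one 2-cycle
```

Antifragility, by contrast, only needs a sample of trajectories from random initial states under perturbation, so its cost scales with the sample size rather than with 2^N.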
For further study, we plan to use more data. We will collect more kinds of biological Boolean networks, introduce artificial data generated from random Boolean networks, and add more mutations to the networks. Since the amount of data is one of the important factors influencing model performance, we will run simulations with more extensive data to find a better classifier. In addition, we will thoroughly explore the relationship between antifragility and robustness/evolvability. One possibility is that, because antifragility reflects the characteristics of basins of attraction (e.g., their structure) and of attractors (e.g., their number and length), it might work for predicting the preservation and emergence of attractors. However, to reveal the relationship more clearly, we need further explorations. Taking a step forward, we will evolve random networks using antifragility as a fitness function and examine the robustness and evolvability of the resulting networks. This future research could give a clue to the mechanism by which biological systems obtained robustness and evolvability, and also suggest a possibility for developing robust and/or evolvable engineered systems that use antifragility as a control parameter.
Supplementary Materials: The following are available online at http://www.mdpi.com/1099-4300/22/9/986/s1, Figure S1: Histograms of transient lengths: how many state transitions all state configurations go through until they reach attractors. For the 37 biological networks, the maximum is 31 in aurka, which means that every state configuration of all biological networks converges to attractors within 32 state transitions. This implies that the basins of attraction of biological networks tend to be relatively shallow.

Funding: This work was partially supported by UNAM's PAPIIT projects IN107919 and IV100120.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.