From the population of all available pathways in the database (3241) matching the previously defined criteria (origin and destination defined, a valid origin-to-destination traversal, and full-coverage traversals), pathways were randomly selected for the different analyses and experiment runs.
5.1. Preliminary Tests
Each comparison “match” comprises two distinct corresponding graphs with a selected origin node and an expected destination node. Note that a single pathway may admit several different traversals, producing different comparisons for the same pathway. For the primary analysis, we considered pathways with a valid traversal between an origin (root node) and a destination (leaf node) in both graphs. For a fair comparison, we also restricted ourselves to full-coverage traversals only; that is, we treated pathways as connected graphs.
Taking all the data into account would mean millions of comparisons and many hours of computation. To simplify the process, we drew a random statistical sample using selection criteria applied in the same proportion as in the total population: the number of nodes (size) and a single origin node (root); the latter guarantees full-coverage traversals.
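The proportional selection described above can be sketched as a stratified random sample over graph sizes. This is an illustrative Python sketch only; the record layout (`id`, `size`) and the helper name are hypothetical assumptions, not the actual data model used in this work.

```python
import random
from collections import defaultdict

def stratified_sample(pathways, fraction, seed=42):
    """Draw a proportional random sample: `fraction` of the pathways
    from each stratum (here, each graph size) independently, so the
    sample keeps the size distribution of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in pathways:
        strata[p["size"]].append(p)
    sample = []
    for _, group in strata.items():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical population: 2340 pathways with sizes 2..20, one origin each
population = [{"id": i, "size": 2 + i % 19} for i in range(2340)]
sample = stratified_sample(population, 0.20)
```

Because each size stratum is sampled independently, the 20% sample preserves the bar heights of the population histogram up to rounding.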
In Figure 11 and Figure 12, we can see the data distribution categorized by graph size and complexity (edges), representing the number of pathways with each characteristic.
For the first overview analysis, the most representative values selected were sizes from 2 to 20, covering 3125 pathways (96.4%); further selecting the pathways with a single origin leaves a total of 2340 pathways, distributed as shown by the bar heights in Figure 13.
Then, we selected a random statistical sample of 20% (468 pathways) in proportion to the selected size criteria (2 to 20), meaning 20% of each of these sizes, as shown in blue in Figure 13.
A first run of 109,278 pairwise comparisons was executed, all measured using the proposed algorithms. Each matching pair was also tested with a third-party external tool. We reviewed many previous works to evaluate our results; considering the similarity in the outputs, not all of them were available, updated, open-source, or accessible. We selected a tool that was available and that provides a pairwise comparison with a one-to-one score on a scale of 0.0 to 1.0: “TMPAlign”, a newer version of the tool MPAlign introduced in 2014 [30].
With the random statistical sample of 20% of the selected data, Figure 14 shows an interesting first result: all pairwise comparisons reporting 0% equivalent nodes produce a score of 0 in every metric, both for our Global Scores and for the TMPAlign tool. Therefore, for the remaining runs, we skip the comparisons where the equivalent-nodes ratio is 0, since they provide no significant values while consuming a significant amount of computation time only to score 0. This allowed a larger statistical sample, with more significant results, in subsequent runs.
5.2. Analysis of Algorithms for Pairwise Comparisons
Several execution metrics were collected to evaluate the tests for each pairwise comparison previously defined. After the first tests with the 20% sample, and considering that comparisons without equivalent nodes always produce the same score (0, as described before), the random statistical sample was increased to 50% while only performing the comparisons with at least one equivalent node, so a broader diversity of metabolic pathways was tested. This new selection comprised 1169 pathways, which means 682,696 possible comparisons.
The first algorithm relies mainly on the score provided by the global alignment; however, to make this score more meaningful, some additional metrics were developed to relate the scores to the data, such as the coverage of one pathway by another, especially when they are of different sizes. Accordingly, the values are expressed as ratios from 0.0 to 1.0.
The global score analyzes the traversal, or complete reading, of the pathway and considers the number of similar elements as a whole. In the tests carried out, it was observed that applying a negative gap penalty such as the standard −2 makes no sense for a metabolic process, since a sequence obtained from a metabolic pathway does not lose information during its reading, as happens in DNA or RNA sequencing. Hence, the most realistic pairwise-comparison values were obtained using a gap value of 0 in the global score, and 0 was therefore the gap value selected for Algorithm 1.
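The effect of a zero gap penalty can be sketched with a standard Needleman–Wunsch global alignment over two node traversals. This is an illustrative sketch, not the exact scoring of Algorithm 1: match = 1, mismatch = 0, and gap = 0 are assumed values, under which the raw score equals the length of the longest common subsequence, and dividing by the larger traversal length gives one possible relative score in [0, 1].

```python
def global_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Needleman-Wunsch global alignment score with a free gap
    (gap = 0), normalized by the longer sequence length."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # leading gaps in seq_b
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):          # leading gaps in seq_a
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + hit,   # align i with j
                           dp[i - 1][j] + gap,       # gap in seq_b
                           dp[i][j - 1] + gap)       # gap in seq_a
    return dp[n][m] / max(n, m)
```

With a negative gap (e.g., −2), skipped nodes would be penalized even though a pathway traversal loses no information by omission, which is why a gap of 0 reflects the comparison more faithfully.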
For the first algorithm, Figure 15 shows the relationship between the graphs’ sizes and the relative Global pairwise comparison scores. No direct correlation between the size of the graphs and the Global scores can be seen. This “disordered” behavior was found in all the algorithms tested, indicating independence between the data and the algorithms.
It is worth noting that, for the relative global score, the highest possible score is bounded by the size ratio of the graphs. For example, consider two pathways, one with three nodes and one with ten. The best case is that all the nodes of the smaller pathway appear in the larger one, in the same order; the highest possible score is then 30%. This bound is visible in Figure 15 as a “diagonal” that delimits the dispersion of the points across the graph.
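The 30% ceiling in the example follows directly from the normalization, assuming the relative score divides the number of matches by the larger graph’s size:

```python
def score_ceiling(size_a, size_b):
    """Upper bound on the relative global score for two pathway sizes:
    at most min(size) nodes can match, normalized by max(size)."""
    return min(size_a, size_b) / max(size_a, size_b)

ceiling = score_ceiling(3, 10)  # the 30% ceiling from the example
```

Points in the scatter plot can therefore never cross the line given by this ratio, which produces the bounding diagonal.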
Then, for the second algorithm, Figure 16 shows the relation between the complexity ratio of the graphs and its influence on the Numerical DbP scores. The numerical evaluation of the second algorithm measures the differences between the graphs according to the difference in edges. Again, there is no direct correlation between the complexity of the graphs and the Numerical DbP scores.
Next, Figure 17 shows the relationship between the number of equivalent nodes and the Global scores, for Algorithm 1 on the left and for the TMPAlign tool on the right, with the 50% sample. Similar to the threshold observed for the global scores against the size ratio in Figure 14, a comparable behavior appears with the equivalent-nodes ratio, with more points aligned along the central diagonal than before. This also occurs, to a lesser degree, for the comparison values obtained with TMPAlign, whose scores are less correlated with the equivalent-nodes ratio; this suggests that TMPAlign may consider other factors when generating its scores. Nevertheless, one important observation holds: when the graphs should not be similar (i.e., at a low equivalent-nodes ratio), both tools tend to confirm it.
If we also consider the common families of the pathways (about 638 different families across the total population of 3241 pathways, indicating a great diversity of metabolic pathways) and group the pathways by common-families criteria, we can observe in Figure 18 that most of the scores in each category remain very close, and that the scores tend to be higher as the compared pathways share more families. As with everything in biology, there is an exception: some pairwise-compared metabolic pathways with several common families (eight in the case shown in the figure) may have few or no similar elements at all, producing lower scores than their counterparts and contradicting the general behavior observed for most of the comparisons.
5.4. ANOVA Tests
A two-way ANOVA was conducted to examine the effects of size ratio and families on the relative global score of pairwise comparison of metabolic pathways.
Assumptions. The ANOVA test makes the following assumptions about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. Having repeated measures for the same participants is not allowed;
No significant outliers in any cell of the design;
Normality. The data for each design cell should be approximately normally distributed;
Homogeneity of variances. The variance of the outcome variable should be equal in every cell of the design.
Residual analysis was performed to test the assumptions of the two-way ANOVA. Outliers were assessed by the box-plot method, normality was assessed using the Shapiro–Wilk normality test, and homogeneity of variances was assessed by Levene’s test (Listing 1).
Listing 1. Two-way ANOVA: Global. Summary statistics. R output. (Own Source).
>
# Summary statistics
# Compute the mean and the SD (standard deviation)
# of the Global score by groups:
> doe_selection %>%
+   group_by(size_ratio, families) %>%
+   get_summary_stats(global, type = "mean_sd")
# A tibble: 9 x 6
  size_ratio families variable     n  mean    sd
  <chr>      <fct>    <chr>    <dbl> <dbl> <dbl>
1 different  none     global      50 0.095 0.043
2 different  few      global      50 0.101 0.048
3 different  several  global      50 0.135 0.07
4 medium     none     global      50 0.166 0.081
5 medium     few      global      50 0.166 0.094
6 medium     several  global      50 0.239 0.155
7 similar    none     global      50 0.199 0.131
8 similar    few      global      50 0.185 0.107
9 similar    several  global      50 0.321 0.161
>
There were some extreme outliers, the residuals were not normally distributed (Shapiro–Wilk, p < 0.05), and there was no homogeneity of variances (Levene, p < 0.05).
A log10 data transformation was then applied, as shown in Listing 2. After that, there were no extreme outliers, and the residuals were normally distributed (Shapiro–Wilk, p > 0.05). However, there was still no homogeneity of variances (Levene, p < 0.05).
Listing 2. Log10: Transforming the Data. (Own Source).
>
# Some common heuristic transformations for non-normal data
# include log for greater skew:
#   log10(x) for positively skewed data,
#   log10(max(x + 1) - x) for negatively skewed data

# Log transformation of the skewed data:
> doe_selection$global <- log10(doe_selection$global)
>
The log10 transformation improved the distribution of the data towards normality, as shown in Figure 20 (Listing 3).
Listing 3. Homogeneity of variance assumption test. (Own Source).
>
# This can be checked using Levene's test:
> doe_selection %>% levene_test(global ~ size_ratio*families)
# A tibble: 1 x 4
    df1   df2 statistic        p
  <int> <int>     <dbl>    <dbl>
1     8   441      3.48 0.000657
>
Levene’s test is significant (p < 0.05), as shown above. Therefore, we cannot assume homogeneity of variances across the different groups [28,31]. In general, one wants Levene’s test statistic to be non-significant. When it is significant, one can either:
- (a) ignore this violation, based on a priori knowledge of the distributional characteristics of the population being sampled;
- (b) relax the assumption of homoscedasticity and run the Welch one-way test, which does not require that assumption [32].
In this case, Levene’s test is testing whether the variances of the groups are significantly different. If Levene’s test is significant (i.e., the value of significance is less than 0.05), then we can conclude that the variances are significantly different.
If the overall p-value from the ANOVA table is less than some significance level, then we have sufficient evidence to say that at least one of the means of the groups is different from the others. However, this does not tell us which groups are different from each other. It simply tells us that not all of the group means are equal.
To find out exactly which groups differ from each other, we must conduct pairwise t-tests between the groups while controlling the family-wise error rate. One of the most common approaches is Bonferroni’s correction when calculating the p-values for each pairwise t-test [33].
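Under stated assumptions (synthetic group data; SciPy available), the Bonferroni-corrected pairwise procedure can be sketched as follows; `ttest_ind` with `equal_var=False` gives the Welch variant that does not assume equal variances.

```python
from itertools import combinations
import numpy as np
from scipy.stats import ttest_ind

def pairwise_bonferroni(groups, equal_var=False):
    """Pairwise (Welch) t-tests between named groups; each raw p-value
    is multiplied by the number of comparisons (Bonferroni), capped
    at 1.0."""
    pairs = list(combinations(sorted(groups), 2))
    return {
        (a, b): min(ttest_ind(groups[a], groups[b],
                              equal_var=equal_var).pvalue * len(pairs), 1.0)
        for a, b in pairs
    }

# Synthetic scores shaped roughly like the summary stats in Listing 1
rng = np.random.default_rng(1)
groups = {
    "none": rng.normal(0.10, 0.04, 50),
    "few": rng.normal(0.10, 0.05, 50),
    "several": rng.normal(0.32, 0.16, 50),
}
adjusted = pairwise_bonferroni(groups)
```

Multiplying each p-value by the number of comparisons is the classical Bonferroni adjustment; it keeps the family-wise error rate below the chosen alpha at the cost of some statistical power.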
The Welch test is an alternative to the standard ANOVA when homogeneity of variance cannot be assumed (i.e., Levene’s test is significant). In this case, the Games–Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences [28,31,32].
On the other hand, if the normality assumption is not met, one can run the statistical tests (t-test or ANOVA) on both the transformed and non-transformed data and look for meaningful differences. If both lead to the same conclusions, one may choose not to transform the outcome variable and carry on with the test outputs on the original data [34].
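This check can be sketched as follows, with synthetic positive scores standing in for the study data:

```python
import math
import numpy as np
from scipy.stats import f_oneway

def same_conclusion(groups, alpha=0.05):
    """Run a one-way ANOVA on the raw scores and on their log10
    transform; return True when both reach the same reject /
    fail-to-reject decision at the given alpha."""
    logs = [[math.log10(x) for x in g] for g in groups]
    p_raw = f_oneway(*groups).pvalue
    p_log = f_oneway(*logs).pvalue
    return bool(p_raw < alpha) == bool(p_log < alpha)

# Three synthetic groups with clearly different means, all positive
rng = np.random.default_rng(2)
groups = [
    np.clip(rng.normal(0.10, 0.04, 50), 0.01, None),
    np.clip(rng.normal(0.17, 0.08, 50), 0.01, None),
    np.clip(rng.normal(0.32, 0.16, 50), 0.01, None),
]
agree = same_conclusion(groups)
```

When the raw and log-transformed analyses agree, reporting the untransformed results keeps the effect sizes on the original, interpretable scale.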
Tests were executed using transformed and non-transformed data; the results were similar, showing evidence of the influence of size ratio and families on the Global score.
The biological reference suggests higher scores when the compared data are more similar (size ratio medium or similar) and there are some common families (few and several), as observed in the tests executed (Listing 4).
Listing 4. ANOVA Table after Bonferroni correction. (Own Source).
>
# In the R code below, the asterisk represents the interaction
# effect and the main effect of each variable
# (and all lower-order interactions).
> res.aov <- doe_selection %>%
+   anova_test(global ~ size_ratio * families)
Coefficient covariances computed by hccm()
> res.aov
ANOVA Table (type II tests)
               Effect DFn DFd      F        p p<.05   ges
1          size_ratio   2 441 52.229 4.39e-21     * 0.192
2            families   2 441 27.777 4.35e-12     * 0.112
3 size_ratio:families   4 441  3.209 1.30e-02     * 0.028
>
Then, some extra tests were conducted: the Welch one-way ANOVA test, pairwise comparisons using Games–Howell, and pairwise t-tests with no assumption of equal variances, with Bonferroni correction [33]. They all show a significant interaction between size ratio and families for the most representative combinations, i.e., when there are few or several common families and the size ratio is medium or similar.
There is a statistically significant interaction between size ratio and common families on the Global score, F(4, 441) = 3.21, p = 0.013, eta2[g] = 0.03; see the ANOVA table above (Listing 4) after applying a Bonferroni adjustment [33]. The ANOVA summary is shown in Figure 21.
Consequently, an analysis of simple main effects for size ratio was performed, with statistical significance receiving a Bonferroni adjustment. There is a statistically significant difference in mean Global scores for both the medium (F(2, 441) = 7.72, p < 0.05) and similar (F(2, 441) = 24.4, p < 0.05) size ratios between the none–several and few–several common-families levels.
As expected from the biological point of view, when there are no common families between the compared metabolic pathways, the scores are not influenced, and this is statistically demonstrated. The same holds when the size ratio between the pathways differs, for any family category.
All pairwise comparisons between the different family groups, organized by size ratio, were analyzed with Bonferroni correction. There was a significant difference in Global scores for the none–several and few–several group relations (p < 0.05).
5.5. Tests against a Reference Algorithm
The TMPAlign algorithm selected as a reference tool was outdated: it was made available in 2017, written in Python 2.7, and uses services of the KEGG database that are no longer available as expected. Its documentation indicates that it can work with any database, so it was adjusted to work with the same MetaCyc data files used in this work. TMPAlign was also not using the enzyme data (i.e., when comparing two reactions, it only considers the reaction’s id when generating the score), because the KEGG service required to handle this information is no longer available as a free service, and the data obtained from MetaCyc do not fulfill the same criteria. Furthermore, for some pairwise comparisons, TMPAlign raised errors during execution with no clear explanation; all such comparisons were excluded from the subsequent analysis for all algorithms.
It is important to remark that the main goal of using a reference tool is not to generate the same values, but rather to show that, if two pathways are significantly different, both tools can denote it, and conversely for similar pathways.
As shown in previous sections, the TMPAlign tool was run on all the executed pairwise comparisons. The proposed algorithms and TMPAlign show similar, comparable behaviors. The main point is that, when a pair of metabolic pathways is compared, we want a valid comparison value: when the pathways have a similar structure and similar internal elements, we expect higher values; when there are few or no common nodes at all, we expect values closer to zero. All the graphs shown indicate that Algorithms 1 and 2 behave similarly to TMPAlign and are comparable to it. The next section will show that the proposed algorithms are up to 10 times faster than TMPAlign.