Author Contributions
A.S., X.C., L.S., Q.W., Y.Z., H.F., C.W., Y.X., H.C., F.S., S.C.L., X.Z., J.F.Z., X.L., X.W. and R.H. performed the study and analyzed and interpreted the data. S.C.L., X.Z. and J.F.Z. designed and conceptualized the study. X.L., X.W., R.H., F.S., S.C.L. and J.F.Z. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Summary of study design. A machine learning model was developed to address a limitation of COG risk stratification: it does not identify patients with high survival probability within the high COG-risk group. The algorithm computes a score based on intratumoral and intracellular microbial gene abundance, the M-score, and uses it to separate high COG-risk patients into two subpopulations (Mhigh and Mlow). This improves the accuracy of risk stratification and complements the current COG risk assessment, and could spare a subset of high COG-risk patients from traditional high-risk therapies.
Figure 2.
Principal coordinate analysis of the gene dissimilarity matrix computed by Skmer (MKP). Based on microbial sequence similarity, the 120 neuroblastoma patients were grouped into two MKP clusters, designated MKP1 and MKP2. Patients in the two clusters had significantly different microbial profiles in their tumor tissues. The survival probability of patients in MKP1 was significantly lower than that of patients in MKP2 (p = 9.505 × 10⁻⁸).
Figure 3.
Principal coordinate analysis of the gene dissimilarity matrix computed by Skmer (COG Risk). Red, orange, and blue points represent patients categorized by COG criteria as high, intermediate, and low risk, respectively. All patients clustered in MKP1 were COG high-risk, whereas MKP2 contained patients at all three COG risk levels. The COG high-risk patients had distinct microbiome characteristics, although some COG high-risk patients in MKP2 had microbiome features similar to those of COG intermediate- and low-risk patients.
Figure 4.
Kaplan–Meier survival estimates for four MKP and COG risk groups. COG high-risk patients in MKP1 and MKP2 had lower survival probabilities than COG low- and intermediate-risk patients in MKP2. The hazard of death among high-risk patients in MKP1 was 17.1 times that of low- and intermediate-risk patients in MKP2 (hazard ratio (HR) = 17.1, p = 4.605 × 10⁻⁹), and the hazard among high-risk patients in MKP2 was 5.56 times that of low- and intermediate-risk patients in MKP2 (HR = 5.56, p = 0.0004). Among high-risk patients, survival probability was lower in MKP1 than in MKP2 (HR = 3.78, p = 6.422 × 10⁻⁶). Overall, survival probability was lower in MKP1 than in MKP2 (HR = 5, p = 9.505 × 10⁻⁸).
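For readers unfamiliar with the Kaplan–Meier estimator named in this caption, the following is a minimal sketch of the product-limit calculation. The function name and the toy data are illustrative assumptions and are not taken from the study; the survival curves in the figure were presumably produced with a statistical package rather than code like this.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time per patient (e.g. in days)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at t are removed before the next time
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data, NOT from the study
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why groups with many censored observations can still be compared.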
Figure 5.
Performance measures for the MKP model. (A) Estimated survival function for each individual. The thick red line represents the overall ensemble survival, and the thick green line represents the Nelson–Aalen estimator. (B) Survival probabilities estimated for each patient by the prediction model in the out-of-bag (OOB) ensemble (blue points correspond to death events; black points are censored observations). (C) OOB time-dependent Brier score (0 = perfect, 1 = poor, and 0.25 = guessing). The score is shown stratified by ensemble mortality into four groups corresponding to the 0–25, 25–50, 50–75, and 75–100 percentile values of mortality. The red line is the overall (non-stratified) time-dependent Brier score. (D) OOB time-dependent continuous ranked probability score (CRPS; 0 = perfect, 1 = poor, and 0.25 = guessing). The score is shown stratified by ensemble mortality into the same four groups. The red line is the overall (non-stratified) time-dependent CRPS.
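The reference values quoted for the Brier score (0 = perfect, 1 = poor, 0.25 = guessing) follow directly from its definition as a mean squared error between predicted survival probability and observed status at a given time. The sketch below illustrates this with hypothetical outcomes; it deliberately ignores censoring, whereas a time-dependent OOB Brier score would normally handle censored observations (an assumption about the authors' software, which the caption does not specify).

```python
def brier_score(predicted_survival, alive_at_t):
    """Mean squared difference between predicted S(t) and observed status at t.

    predicted_survival -- predicted probability of surviving past time t
    alive_at_t         -- 1 if the patient was alive at t, 0 if dead
    Censoring is ignored in this simplified sketch.
    """
    pairs = list(zip(predicted_survival, alive_at_t))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Hypothetical outcomes at a fixed time t, NOT from the study
status = [1, 0, 1, 1, 0, 0]

perfect = brier_score([1, 0, 1, 1, 0, 0], status)   # every prediction correct
guessing = brier_score([0.5] * 6, status)           # always predict 0.5
worst = brier_score([0, 1, 0, 0, 1, 1], status)     # every prediction wrong
```

A constant prediction of 0.5 is off by 0.5 for every patient, so its squared error is 0.25 regardless of the outcomes, which is why 0.25 serves as the "guessing" benchmark.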
Figure 6.
Predicting patients’ survival probability with microbial gene abundance. The distribution of M-scores in COG high-risk patients is shown in the inset (upper right). Patients with M-scores higher than 50 were classified as Mhigh, and those with M-scores lower than 50 were classified as Mlow. Survival analysis indicated that Mhigh patients had significantly lower survival probability (p = 6.422 × 10⁻⁶). Red lines and points represent Mhigh; blue lines represent Mlow.
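The classification rule in this caption can be written as a small function. This is only an illustration of the stated threshold: the function name is hypothetical, and because the caption does not say how a score of exactly 50 is handled, the sketch returns `None` in that case rather than guessing.

```python
def classify_m_score(m_score, threshold=50.0):
    """Assign an M-score to Mhigh or Mlow using the threshold from the caption.

    Returns None for a score exactly at the threshold, because the caption
    leaves that case unspecified (assumption made for this sketch).
    """
    if m_score > threshold:
        return "Mhigh"
    if m_score < threshold:
        return "Mlow"
    return None


# Hypothetical scores, NOT from the study
labels = [classify_m_score(s) for s in (12.5, 50.0, 87.0)]
```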
Figure 7.
CREB activation may be a key genetic event related to the M-score that contributes to tumorigenesis in the Mhigh group, which has low survival probability. Among the 120 patients, CREB was overactivated in the Mhigh group relative to the Mlow group. This could activate downstream genes related to cell growth, survival, angiogenesis, migration, and invasion, including BCL-2, VEGF, NGF, and IGF-2, and thereby lead to the lower survival probability in the Mhigh group.
Table 1.
Characteristics of 120 neuroblastoma patients.
Characteristics | N (%) |
---|---|
Gender | |
Male | 70 (58.3) |
Female | 50 (41.7) |
Ethnicity | |
White | 85 (70.8) |
Others | 35 (29.2) |
MKI | |
Low | 35 (29.2) |
Intermediate | 34 (28.3) |
High | 26 (21.7) |
Unknown | 25 (20.8) |
MYCN Status | |
Amplified | 23 (19.2) |
Not Amplified | 96 (80.0) |
Unknown | 1 (0.8) |
COG Risk | |
Low Risk | 12 (10.0) |
Intermediate Risk | 11 (9.2) |
High Risk | 97 (80.8) |
Location of tumor | |
Abdomen | 104 (86.7) |
Others | 16 (13.3) |
| Mean (SD) |
Age (in years) | 4.3 (2.5) |
Survival Time (in days) | |
Event | 1009.2 (617.2) |
Censored | 2204.5 (734.5) |
Table 2.
Cox proportional hazards regression model test result.
Variables | p-Value |
---|---|
MKP Clusters | 9.505 × 10⁻⁸ |
Gender | 0.6899 |
MKI | 0.0556 |
MYCN Status | 0.2449 |
COG Risk | 2.659 × 10⁻⁵ |
Location | 0.9878 |
Ethnicity | 0.5443 |
Table 3.
Chi-square test of independence between MKP clusters and other potential factors.
Variables | Chi-Square (df) | p-Value |
---|---|---|
Gender | 0.0898 (1) | 0.7645 |
Ethnicity | 0.1997 (1) | 0.655 |
MKI | 5.0892 (3) | 0.1654 |
MYCN Status | 0.6865 (1) | 0.4074 |
COG Risk | 7.8701 (2) | 0.0195 |
Location of tumor | 0.0005 (1) | 0.9827 |
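The p-values in Table 3 are upper-tail probabilities of the chi-square distribution at the reported statistic and degrees of freedom. As a consistency check, the sketch below recomputes them from the table's own values using closed-form expressions that exist for 1, 2, and 3 degrees of freedom. The function name is an assumption for illustration, and the results should agree with the table only approximately because the tabulated statistics are rounded.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with df in {1, 2, 3}."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are handled in this sketch")


# (statistic, df) pairs copied from Table 3
table3 = {
    "Gender": (0.0898, 1),
    "Ethnicity": (0.1997, 1),
    "MKI": (5.0892, 3),
    "MYCN Status": (0.6865, 1),
    "COG Risk": (7.8701, 2),
}
recomputed = {name: chi2_upper_tail(x, df) for name, (x, df) in table3.items()}
```

Of the variables tested, only COG Risk falls below the conventional 0.05 level, which matches the observation elsewhere in the captions that MKP1 consists entirely of COG high-risk patients.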
Table 4.
p-Values and hazard ratios between different risk groups in MKP1 and MKP2.
Variables | p-Value | Hazard Ratio |
---|---|---|
MKP1 vs. MKP2 | 9.505 × 10⁻⁸ | 5 |
MKP1 vs. COG high risk in MKP2 | 6.422 × 10⁻⁶ | 3.78 |
MKP1 vs. COG low and intermediate risk in MKP2 | 4.605 × 10⁻⁹ | 17.1 |
MKP2 vs. COG high risk in MKP2 | 0.2119 | 0.75 |
MKP2 vs. COG low and intermediate risk in MKP2 | 0.0041 | 4.07 |
COG high risk in MKP2 vs. COG low and intermediate risk in MKP2 | 0.0004 | 5.56 |
Table 5.
Error rate comparison with different features.
Variables | Error Rate (%) |
---|---|
Microbial Gene Abundance | 29.87 |
Gender | 71.67 |
MKI | 53.65 |
MYCN Status | 75.21 |
COG Risk | 68.97 |
Location of tumor | 82.39 |
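Table 5 does not define its error rate. One common convention for survival models, assumed here purely for illustration, is one minus Harrell's concordance index, under which 50% corresponds to random ranking and values above 50% mean the feature ranks patients worse than chance. The sketch below shows that calculation on hypothetical data; the function name and the toy values are not from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of usable pairs ranked correctly by risk score.

    A pair (i, j) is usable when patient i died strictly before time j.
    It is concordant when patient i also has the higher predicted risk.
    Ties in risk score count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data, NOT from the study
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
error_rate_percent = 100 * (1 - c_index)  # assumed definition of "error rate"
```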