Newborn screening (NBS) using tandem mass spectrometry (MS/MS) has transformed our ability to identify infants with hereditary metabolic diseases and to provide early, lifesaving treatment. Because screening is designed to detect affected infants with high sensitivity, it produces frequent false-positive results [1]. Additional biochemical and DNA testing of all screen-positive cases is performed to confirm (true positive) or reject (false positive) the primary screening result and to reach a final diagnosis. In some cases, this two-tier strategy leads to iterative rounds of testing and diagnostic delays, placing an undue burden on the healthcare system, including physicians and clinical laboratories, as well as on patients and their families.
At present, only one or a few metabolic analytes or ratios from the MS/MS screening panel are used to identify infants with a metabolic disorder. For example, screen-positive cases for methylmalonic acidemia (MMA) are identified using specific cutoff values for propionylcarnitine (C3) and its ratio to acetylcarnitine (C2), two of the 39 analytes measured in MS/MS screening. As an alternative to analyte cutoffs, Clinical Laboratory Integrated Reports (CLIR, formerly R4S) postanalytical tools draw on a large database of dynamic reference ranges for disease-related analytes and many additional informative analyte ratios to improve the separation of true- and false-positive cases [2]. In CLIR, the reference ranges and the overlap of analyte values between patient and control groups can be adjusted for multiple continuous and categorical clinical variables (e.g., birth weight, sex, age at blood collection); such adjustment has been shown to significantly reduce false-positive results [6].
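To make the cutoff-based logic concrete, the sketch below flags an MMA screen positive only when both C3 and the C3/C2 ratio exceed fixed cutoffs. The cutoff values, the units and the requirement that both thresholds be exceeded are assumptions chosen for illustration, not the California program's actual criteria, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; each NBS program sets its own.
C3_CUTOFF = 5.0      # propionylcarnitine, µmol/L (assumed value)
C3_C2_CUTOFF = 0.20  # C3/C2 ratio, unitless (assumed value)


@dataclass
class Specimen:
    c3: float  # propionylcarnitine (C3)
    c2: float  # acetylcarnitine (C2)


def is_mma_screen_positive(s: Specimen) -> bool:
    """Flag a specimen when C3 and the C3/C2 ratio both exceed their cutoffs."""
    if s.c2 <= 0:
        raise ValueError("C2 must be positive to form the C3/C2 ratio")
    return s.c3 > C3_CUTOFF and s.c3 / s.c2 > C3_C2_CUTOFF
```

Any specimen that clears both thresholds is called positive regardless of the other 37 analytes, which is the limitation that postanalytical tools such as CLIR try to address.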
Machine learning is an emerging strategy for the classification of metabolic disorders in newborns [7]. In particular, Random Forest (RF), also known as random decision forests [9], is a powerful tree-based method for supervised machine learning with numerous applications in high-throughput genomic [11] and metabolomic [12] data analysis. We recently showed that RF analysis of all 39 MS/MS analytes in the California NBS panel improved the separation of true- and false-positive cases [15]. We compared results from our RF analysis with those obtained from CLIR for the same cohort of MMA screen positives; this comparison showed that using the entire set of MS/MS analytes measured at birth significantly improved the prediction of MMA false positives. Here we adapted the RF approach we developed for MMA [15] to additional metabolic disorders, with the goals of improving the diagnosis of glutaric acidemia type 1 (GA-1) and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD), and of facilitating the detection of ornithine transcarbamylase deficiency (OTCD), which is not currently on the Recommended Uniform Screening Panel (RUSP) [16]. The performance and stability of the RF model were evaluated using NBS data from infants who screened positive for these disorders in the California NBS program. Based on these findings, we developed open-source web-based software (https://rusptools.shinyapps.io/RandomForest) that incorporates our RF model for the analysis and interpretation of newborn screening data. The new RF tool could be used to identify false-positive results in conjunction with CLIR tools and established second-tier confirmatory testing using biochemical and DNA analysis of all screen-positive cases.
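The core RF idea can be sketched in a few dozen lines of standard-library Python. The version below is a simplified, hypothetical illustration, not the model used in this study: each tree is a depth-1 stump grown on a bootstrap sample and restricted to a random subset of features, and the RF score is the fraction of trees voting "true positive". Production implementations grow much deeper trees and expose many more options.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels, default):
    """Majority class of 0/1 labels (ties -> 1), or `default` if empty."""
    if not labels:
        return default
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    overall = majority(labels, 0)
    best = None
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            impurity = len(left) * gini(left) + len(right) * gini(right)
            if best is None or impurity < best[0]:
                best = (impurity, f, thr,
                        majority(left, overall), majority(right, overall))
    return best[1:]  # (feature, threshold, left_class, right_class)


def fit_forest(rows, labels, n_trees=25, max_features=None, seed=0):
    """Bagged stumps, each grown on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max_features or max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(n_features), k)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def rf_score(forest, row):
    """RF score: fraction of trees voting 'true positive' (class 1)."""
    votes = [rc if row[f] > thr else lc for f, thr, lc, rc in forest]
    return sum(votes) / len(votes)
```

Averaging votes rather than taking a hard majority yields a continuous score between 0 and 1, which is what allows a score cutoff to be tuned, as in the comparison with CLIR later in this section.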
Although MS/MS screening identifies most infants with a metabolic disorder on the RUSP, it also produces a high number of false positives, all of which require additional confirmatory testing. At present, NBS relies on the detection of abnormal levels of only one or a few disease-specific markers and their ratios. We recently showed that Random Forest-based analysis of all analytes on the MS/MS screening panel improved the separation of true- and false-positive cases [15]. Here we extended this RF-based approach to four metabolic disorders (GA-1, MMA, OTCD and VLCADD), each of which is compromised by high false-positive rates and diagnostic delays following a positive newborn screen. Without changing the sensitivity of screening for these disorders, RF reduced the number of false positives by 89% for GA-1, 45% for MMA, 98% for OTCD and 2% for VLCADD (Figure 1). By reducing false positives from first-tier screening, this RF-based second-tier approach increased the positive predictive value (PPV), in particular for GA-1 (from 3% to 22%) and OTCD (from 3% to 62%) (Table 1). These results support our previous findings of improved performance using RF-based analysis of the entire newborn metabolic profile [15].
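The size of these PPV gains follows from simple arithmetic. Because sensitivity is unchanged, the number of true positives stays fixed, and PPV depends only on how many false positives remain. The helpers below (hypothetical names) compute PPV from counts and the PPV after removing a given fraction of false positives, starting from a baseline PPV.

```python
def ppv(tp: float, fp: float) -> float:
    """Positive predictive value: share of screen positives that are true positives."""
    return tp / (tp + fp)


def ppv_after_fp_reduction(ppv_before: float, fp_reduction: float) -> float:
    """PPV after removing a fraction of false positives at unchanged sensitivity.

    With TP fixed, the false positives per true positive before filtering are
    (1 - ppv_before) / ppv_before; filtering scales that by (1 - fp_reduction).
    """
    fp_per_tp = (1 - ppv_before) / ppv_before
    return 1 / (1 + (1 - fp_reduction) * fp_per_tp)


print(round(ppv_after_fp_reduction(0.03, 0.89), 3))  # → 0.219
```

With a baseline PPV of 3%, an 89% reduction in false positives gives about 22%, in line with the GA-1 result; a 98% reduction gives about 61%, close to the 62% reported for OTCD, with the small gap presumably reflecting rounding of the published baseline.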
Metabolic analytes with a large mean decrease in accuracy (MDA) in the RF model are more important for classifying disease status. We used MDA to identify the top-ranked MS/MS analytes and clinical variables for each disorder (Figure 3). All primary MS/MS markers currently used to identify screen positives for the four disorders in the California NBS program were among the five top-ranked analytes (Table 3). RF also identified several secondary analytes that form important ratios with primary analytes for GA-1 (C5DC/C8), MMA (C3/methionine) and VLCADD (C14:1/C2) [26]. Methionine, which was top-ranked in the MDA analysis for MMA, has been associated with differences between MMA phenotypic subgroups, with lower levels in patients with remethylation defects (CblC, D or F) than in those with mutase deficiency (mut0/−). Notably, methionine was also the top-ranked analyte in the RF model for reducing OTCD false positives (Figure 3). The methionine/citrulline ratio has been identified as an OTCD screening marker [30]. By analogy with the MMA subgroups, these results suggest that methionine could be associated with OTCD phenotypic subgroups. However, abnormal levels of multiple serum amino acids, such as methionine, proline, alanine and glycine, could also be a sign of the generalized liver damage seen in OTCD patients [29]. In contrast to the other three disorders, RF achieved only a very small reduction in false-positive cases for VLCADD, which indicates the need to discover novel screening markers and to use molecular confirmatory testing to identify VLCADD carriers, who could mistakenly be classified as false positives [37]. By comparison, a retrospective study using R4S tools showed that sequential postanalytical analysis could have reduced follow-up testing in 25.8% of VLCADD cases [38].
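MDA can be illustrated with generic permutation importance: shuffle one feature's values, re-score the model and record the drop in accuracy. Breiman's original RF importance computes this per tree on out-of-bag samples; the sketch below simplifies that by permuting the columns of a single evaluation set for any prediction function. The function names and the stand-in model used in the test are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Permutation importance: mean accuracy drop when one column is shuffled.

    rows are tuples of feature values; returns one importance per feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on shows a large drop, which is the basis for ranking analytes by MDA.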
Random Forest incorporates information from all metabolic analytes and clinical variables collected at birth. Analytes and variables with a weaker association with a particular disorder contribute less to the RF classification and are ranked lower in the MDA analysis. Including clinical variables in RF allows the model to account for their effects on metabolic analyte levels. For example, if an analyte level were higher in males than in females, the effective cutoff value for this analyte would automatically be set higher for males than for females. Including additional informative analyte ratios could further improve RF performance. Because it is difficult to adjust the levels of many analytes simultaneously for multiple interacting variables, RF offers a new solution to this problem by integrating all the information from screening directly into a single RF score. A single RF score could improve the prediction of metabolic disease status, particularly as the volume of NBS data, and with it the challenge of analyzing these data, increases in the future.
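The covariate adjustment described above can be illustrated with a toy depth-2 tree that splits on sex and then learns one analyte cutoff per branch. This is a hypothetical example with made-up data: a real RF chooses its splits greedily and may use sex at any depth or not at all, but the result is the same kind of covariate-specific threshold.

```python
def best_threshold(values, labels):
    """Cutoff t maximizing the accuracy of the rule 'value > t means positive'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def fit_sex_then_analyte_tree(records):
    """records: (sex, analyte_value, label) triples. Returns {sex: cutoff}."""
    cutoffs = {}
    for sex in sorted({s for s, _, _ in records}):
        values = [v for s, v, _ in records if s == sex]
        labels = [y for s, _, y in records if s == sex]
        cutoffs[sex] = best_threshold(values, labels)  # one cutoff per branch
    return cutoffs


def predict(cutoffs, sex, value):
    """Apply the branch-specific cutoff for this infant's sex."""
    return int(value > cutoffs[sex])
```

In the test data, males have higher analyte levels overall, so the learned male cutoff ends up above the female one, and the same analyte value can be classified differently depending on sex.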
To further evaluate the performance of RF, we compared it with CLIR postanalytical tools. Using MS/MS data for GA-1 screen-positive cases, CLIR and RF performed similarly in predicting false positives (Table 4). At the default RF score cutoff, RF yielded 14 fewer false positives and one more false negative than CLIR. Lowering the RF cutoff to reach the same sensitivity as CLIR resulted in four false negatives (the same as CLIR) and 72 false positives (five more than CLIR). Notably, CLIR incorporates several million normal screening results and profiles of screen-positive cases from NBS programs across the US and worldwide, whereas the RF tool is currently limited to the data from this study, which come from one state (the California NBS program) and cover four diseases. As with CLIR, additional NBS data could readily be incorporated to further improve RF-based predictions. RF and CLIR use different methodologies, each with its own advantages for reducing false-positive screens. When comparing CLIR and RF results for GA-1 false-positive cases, we found that 40 infants were classified as true positive (TP) by CLIR but as false positive (FP) by RF, while another 26 infants were classified as TP by RF but as FP by CLIR. Results from the two tools could be integrated using ensemble methods to achieve better predictive performance than either method alone.
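The cutoff adjustment and the ensemble idea in this comparison can both be sketched briefly. The first helper finds the highest score cutoff that reaches a target sensitivity, which is how one classifier's cutoff can be lowered to match another tool's sensitivity; the last combines two scores by a weighted average, one of the simplest ensemble schemes. All names are hypothetical, and the averaging assumes both scores have already been rescaled to a common 0 to 1 range; how CLIR output would be rescaled is left open here.

```python
def sensitivity(scores, labels, cutoff):
    """Fraction of true cases (label 1) with score >= cutoff."""
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    return hits / sum(labels)


def false_positives(scores, labels, cutoff):
    """Number of non-cases (label 0) with score >= cutoff."""
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)


def cutoff_for_sensitivity(scores, labels, target):
    """Highest observed score cutoff whose sensitivity is at least `target`."""
    for c in sorted(set(scores), reverse=True):
        if sensitivity(scores, labels, c) >= target:
            return c
    raise ValueError("target sensitivity is not reachable")


def ensemble_score(score_a, score_b, weight=0.5):
    """Weighted average of two scores assumed to share a 0 to 1 scale."""
    return weight * score_a + (1 - weight) * score_b
```

Lowering the cutoff trades additional false positives for fewer false negatives, mirroring the GA-1 comparison above, where matching CLIR's sensitivity raised the RF false-positive count to 72.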
We note that data for metabolic analytes and clinical variables may be collected differently across NBS programs. Age at blood collection, for example, is an important covariate for metabolite levels [39], and some states may collect blood spots earlier than 24 hours after birth. Age at collection was included in the RF model to adjust for its effect on marker levels and to make the algorithm applicable to other NBS programs. However, other program-specific factors may limit the application of this RF model, which was built using California NBS data, to other programs. To address this problem, we could either collect data from different NBS programs and adjust the RF tool accordingly (e.g., through batch-effect correction), or develop separate RF models tailored to the specific needs of each program.
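Batch-effect correction can take many forms. One of the simplest, sketched below, standardizes each analyte within each NBS program so that values from different programs share a common location and scale before they are passed to a model. This is a hypothetical illustration, not a method used in this study; the program identifiers are made up, and the approach assumes that programs have broadly similar case mixes. More elaborate approaches, such as empirical Bayes adjustment (ComBat), exist.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_program(records):
    """Z-score one analyte separately within each NBS program (batch).

    records: list of (program_id, value) pairs. Returns z-scores in input order.
    """
    by_program = defaultdict(list)
    for program, value in records:
        by_program[program].append(value)
    # Per-program location (mean) and scale (population standard deviation).
    stats = {p: (mean(v), pstdev(v)) for p, v in by_program.items()}
    z = []
    for program, value in records:
        mu, sd = stats[program]
        z.append((value - mu) / sd if sd > 0 else 0.0)
    return z
```

Standardizing within programs removes systematic shifts between programs, but it would also remove genuine differences in disease prevalence or case mix, which is why a model tailored to each program may be preferable in some settings.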
To facilitate broader application of RF in second-tier analysis and interpretation, we developed novel web-based software (https://rusptools.shinyapps.io/RandomForest/). This RF tool could be of primary interest to NBS reference laboratories evaluating MS/MS data from screen-positive cases. Once an RF model has been established for a particular disease, individual NBS data can be analyzed and false-positive screens predicted within minutes. However, RF-based predictions should always be considered in conjunction with established second-tier confirmatory analysis using biochemical and DNA testing of all screen-positive cases. Ideally, such combined analysis should be performed as rapidly as possible to reduce the number of “false alarms” and positive callouts before parents are contacted. This is particularly important for inborn metabolic disorders that can present in the first weeks of life and require a fast turnaround time for NBS results. The new open-source software offers a low barrier to entry that enables users to analyze case data rapidly and, in turn, helps improve the RF algorithm for newborn screening.