Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations

The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.


Introduction
Precision medicine (PM) is a disruptive concept that takes into account both individual variability and population characteristics to provide personalized care; this approach widens biological knowledge and explores the great diversity of individuals [1]. PM comprises the customization of healthcare for an individual on the basis of measurements obtained at the individual level, while also drawing on the data and learning retrieved from the rest of the population; hence, PM relies on both biological individuality and population knowledge to provide tailored healthcare. One of the goals of PM is to use the ever-growing understanding of biology to provide patients with accurate and personalized interventions. All PM strategies include decision-making processes based on biomarker-driven approaches. Genes, gene expression products (i.e., transcripts and proteins), and metabolites are the main biomarker families. Given this molecular diversity of biomarkers, the rise of high-throughput omics technologies offers an unprecedented opportunity to capture the whole picture of biological systems in a hypothesis-free and unbiased mode. These global strategies are conceptually disruptive compared to current approaches, which are mainly hypothesis-driven and, thus, intrinsically reductionist. Holistic investigative methods need to be applied to multiple levels of biological information to deeply understand disease processes.
The prediction of normal and pathological states in patients is based on a dynamic understanding of gene-environment interactions on individual and population scales [2]. The new concept of systems medicine relies on global and integrative approaches for patient care. A biological system can be fully understood only if the space and time scales are considered. Figure 1 gives an overview of the multi-scale perspective of systems medicine.

Figure 1. Multi-scale biology overview of systems medicine. Three main drivers define phenotype: (i) the molecular phenome, which is defined by the underlying molecular supports of biological information; the different omics strategies enable interrogation of these supports for information retrieval; (ii) the exposome, defined by environmental effects spanning from exposure to toxic substances or drugs to diet; and (iii) the clinical phenome, defined by the different clinical metrics. These different biological and clinical metrics should be approached in a multi-dimensional fashion and should take into account the inherent spatial and temporal scales of both measurement technologies and disease dynamics from the molecular to the population level.
For centuries, the biological sciences addressed the different parts of living systems independently, and physicians viewed and treated diseases in a similarly compartmentalized manner. Global information retrieval allows a contextual, pathophysiological understanding of disease for better diagnosis and treatment [2,3]. Descriptions of structure, organization, and function should all be considered for a complete understanding of a given biological system. The structure involves basic biomolecules (genes, gene expression products, proteins, and metabolites). The topological connections between these molecules define the organization. The function reflects how the system evolves with regard to metabolic fluxes and environmental stimuli [4,5].
Inborn errors of metabolism (IEM) are an appealing model for systems medicine because the disrupted pathways underlying these diseases have been described at least to some extent. IEM clinical presentations are often non-specific; therefore, appropriate laboratory tests are pivotal for making a diagnosis [6]. However, widespread routine laboratory diagnosis strategies still rely mainly on sequential investigation assays. This approach is slow and lacks an integrated overview of the generated data. For faster and more effective IEM screening and diagnosis, a paradigm shift in investigation strategies is urgently needed. Part of the answer may be found in the new field of systems medicine, which capitalizes on the omics surge and on bioinformatics and computational advancements to translate the huge amount of data generated by high-throughput omics technologies into clinically actionable tools that aid medical decision-making.
In this review, omics technologies that allow holistic biological information retrieval are described. Furthermore, the huge potential of multi-omics data integration strategies within the clinical context is described, as is their role as a key driver of the clinical actionability of omics-based biomarkers. Challenges facing their clinical implementation are then discussed, with a focus on the relevance of these strategies to IEM.

Omics Revolution in Translational and Clinical Contexts
Since the discovery of the DNA structure [7], great advances have been made in understanding genome complexity; these advances led to the sequencing of the whole human genome through international endeavors such as the Human Genome Project [8]. Genomics approaches have been widely adopted in biomedical research and have successfully identified the genes and genetic loci involved in the development of human diseases [9][10][11]. These findings revealed the complexity of biological systems and provided insights for new approaches to disease diagnosis, treatment, and prevention [12][13][14][15]. Additionally, other high-throughput omics technologies have been developed to measure other biomolecules, such as epigenomics for epigenetic markers, proteomics for proteins and peptides, and metabolomics for low-molecular-weight metabolites. High-throughput analytical methods allow us to study a large number of omics markers simultaneously. In many ways, the different omics association studies are conceptually similar: they all search for omics biomarkers connected with phenotype through unbiased, ome-wide screening. Given the uneven maturity of the different omics technologies, genomics is the next-generation sequencing (NGS)-based technology closest to clinical adoption, whereas transcriptomics and epigenomics, although promising, remain less mature. Regarding mass spectrometry (MS)-based omics, metabolomics seems closer than proteomics to being introduced into clinical practice because metabolite analyses using MS are already routinely adopted in clinical laboratories for drug monitoring and IEM screening. In this review, we mainly focus on mature omics technologies that are actively involved in clinical practice to achieve the promise of PM. However, an overview of all omics methods is also given.

High-Throughput Sequencing (HTS) Technologies
Next-generation sequencing (NGS) techniques using a massively parallel sequencing strategy have profoundly changed the clinical genomic landscape. HTS techniques can be classified according to their applications for investigating genomes, epigenomes, or transcriptomes. NGS-based strategies that could be used in medical diagnostics vary according to the size of the interrogated genome. These strategies include capturing the protein-coding regions of a selected panel of genes (tens to hundreds), sequencing the entire genetic code of a person, which is called whole-genome sequencing (WGS), and sequencing the parts of the genome that contain exonic regions, which is called whole-exome sequencing (WES). WGS and WES are used to discover variants associated with a cell function or a disease [16][17][18]. In contrast, NGS-based transcriptome analysis (RNA-seq) [19] entails quantitative gene expression profiling, whereas epigenomic methods focus on chromatin structure [20].

Genomics
The genome is the complete set of DNA of an organism. This genetic material is mainly found in the nucleus of the human cell (nuclear DNA). Mitochondria contain their own genome (mitochondrial DNA). Frederick Sanger described the chain-termination strategy to determine the nucleotide sequence of a DNA fragment (500-1000 bases) [21]. Sanger used chemically modified nucleotide bases and radioactive labeling, along with DNA polymerase, primers, chain-terminating nucleotides, and electrophoresis. Since then, sequencing chemistries have evolved: fluorophore-labeled dideoxynucleotides and thermostable DNA polymerases enabled cycle sequencing, and electrophoresis automation and laser detection enhanced the sensitivity of this method [22][23][24]. The replicated DNA fragments produce signals (electropherogram peaks) related to the nucleotide sequence. Thereafter, these reads undergo an alignment step with a reference genome to identify variants and define their genomic origin. Of note, the Sanger method is still considered the standard for DNA sequencing, with an error rate of approximately 1 in 10,000 bases. The first human genome sequence was achieved in 2001 [25,26]. The genome sequences of several model organisms were determined soon thereafter. These endeavors were accomplished with Sanger DNA sequencing, which involves high costs and low throughput. These drawbacks limited the potential of DNA sequencing for healthcare translation. Several HTS technologies were developed soon after the release of the human genome sequence [27], and high-throughput analysis became widely available for genomics. NGS-based platforms provide the ability to replicate, in parallel, many overlapping short DNA fragments (50-500 bases) derived from already prepared libraries. Different innovative approaches exist for the spatial separation of fragments on arrays or beads [8,28,29]. Simultaneous DNA replication of each fragment during the reaction cycles produces billions of short stretches of DNA sequence, called reads. Hence, each base is synthesized several times. The minimum number of times each monitored base is incorporated into an overlapping read is called the depth of coverage. At the end of the cycles, all the short reads are assembled according to a reference sequence, allowing reconstruction of the original sequence, ranging from a small exon to an entire genome. Innovative high-throughput NGS-based methods conserve the genome information and the redundancy of the sequenced genome through their depth of coverage. Different commercial HTS platforms exist. These platforms differ mainly in their sequencing strategies (ligation versus synthesis), in the amplification of DNA fragments by polymerase chain reaction (PCR) (flow cell bridge PCR versus bead emulsion PCR), and in their adopted targeting approach (PCR amplification versus hybrid capture) [8].
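As a toy illustration of the depth-of-coverage concept described above, the following Python sketch counts how many simulated, already-aligned reads cover each base of a small region; the read coordinates and region size are hypothetical, and the snippet is purely didactic rather than part of any sequencing platform's software.

```python
# Toy sketch (didactic only): per-base depth of coverage over a small region,
# computed from the coordinates of already-aligned reads. All coordinates are
# hypothetical; real pipelines derive coverage from BAM alignments.

from collections import Counter

def depth_of_coverage(read_intervals, region_start, region_end):
    """Return (minimum, mean) read depth over [region_start, region_end)."""
    depth = Counter()
    for start, end in read_intervals:                  # each read as a half-open interval
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos] += 1
    per_base = [depth[pos] for pos in range(region_start, region_end)]
    return min(per_base), sum(per_base) / len(per_base)

# Three overlapping 100-base reads over a 150-base region
reads = [(0, 100), (25, 125), (50, 150)]
print(depth_of_coverage(reads, 0, 150))                # -> (1, 2.0)
```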
HTS platforms are powerful technologies for personal genome and transcriptome sequencing [8]. Variants can only be interpreted with a good clinical history, family history, and physical examination. These preliminary steps allow physicians to assess whether there are similar or related phenotypes in other family members; if so, the inheritance pattern can be evaluated. However, clinical validity is the most challenging aspect of NGS. According to the size of the interrogated genomic information, three strategies can be used for diagnostic purposes. Targeted gene sequencing panels are useful tools for analyzing specific genes in a given clinical condition and are widely used in current clinical practice [14,15]. WES focuses on the more functional and informative part of the genome and is being adopted for genetic studies of IEM for gene identification and clinical diagnosis [10,11,30,31]; this approach may soon replace targeted approaches. WGS provides a unique window to investigate genetic or somatic variations, thus leading to new avenues for exploration of normal and disease phenotypes. However, the inherent data management and interpretation issues hamper its clinical implementation [32]. From a clinical perspective, comparing different genomic diagnostic approaches is of great interest but requires standard and adopted metrics [33]. Sanger sequencing is the gold standard and allows confident calling of genotypes. Because non-inferiority is a prerequisite for the clinical adoption of any new medical innovation, Goldfeder et al. recently proposed a metric to quantify clinical-grade reporting standards for sequencing technologies [34].

Epigenomics
Chemical modifications of DNA, histones, non-histone chromatin proteins, and nuclear RNA define the epigenome. These changes affect gene expression without altering the base sequence. Epigenetics usually refers to the structural adaptation of chromosomal regions. These epigenetic marks may be transient or inherited through cell division [35]. They result from environmental exposures at various developmental stages throughout the life span [36]. The four main actors of the epigenetic machinery are DNA methylation, histone modification, microRNA (miRNA) expression and processing, and chromatin condensation [37,38]. Epigenomic modifications depend on spatial and time-related factors. Therefore, they can be tissue-specific in response to environmental or disease-related modifiers. These modifications can regulate gene expression and, thus, affect cell homeostasis. Comprehensive mapping of the epigenetic makeup of many cell types and tissues has been reported [39]. Different strategies have been developed to assess the epigenome [20]. Epigenomics methods generally focus on chromatin structure and include histone modification ChIP-seq (chromatin immunoprecipitation sequencing), which allows the identification of DNA-associated protein-binding sites.

Transcriptomics
The gene expression pattern in a cell or tissue can broadly reflect its functional state. The transcriptome is the complete set of RNA transcripts, including ribosomal RNA (rRNA), messenger RNA (mRNA), which represents only 1.5 to 2 percent of the transcriptome, transfer RNA (tRNA), miRNA, and other non-coding RNA (ncRNA). Quantitative analyses of the transcriptome can be performed with either microarrays (chips) or RNA sequencing (RNA-seq). Microarrays are based on specific hybridization of RNA transcripts to DNA probes, whereas HTS-based expression profiling by RNA-seq allows comprehensive qualitative and quantitative mapping of all transcripts [19].

Mass Spectrometry-Based Omics Technologies
MS analyzers are instruments that weigh molecules and separate them according to their mass-to-charge ratios. There are several MS analyzers with different analytical technologies and, thus, various performance levels regarding resolution, accuracy, throughput, and chemical coverage. MS analysis can be semi-quantitative in an untargeted fashion using high-resolution MS instruments or quantitative through targeted analysis using tandem MS [52-54]. MS instruments can be combined with separation methods such as liquid or gas chromatography, capillary electrophoresis, or ion mobility. These combinations aim to enhance the dynamic range, sensitivity, specificity, and chemical coverage [55,56]. Given the chemical diversity of proteins and metabolites and the high sensitivity of this technology, MS has proven its superiority for metabolomics and proteomics analyses.
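To make the mass-to-charge notion concrete, the short sketch below computes the m/z values expected for protonated ions of a molecule at different charge states; the neutral mass used is an arbitrary illustrative value, and only the standard proton-mass constant is assumed.

```python
# Minimal sketch: m/z of a protonated ion [M + nH]n+ at different charge states.
# The neutral monoisotopic mass below is an arbitrary illustrative value.

PROTON_MASS = 1.007276  # Da

def mz_protonated(neutral_mass, charge):
    """m/z observed for an [M + nH]n+ ion of a molecule with the given neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# Example: a hypothetical molecule of neutral monoisotopic mass 1500.70 Da
for z in (1, 2, 3):
    print(f"charge {z}+ -> m/z {mz_protonated(1500.70, z):.4f}")
```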

Proteomics
The proteome consists of all the proteins expressed by a biological system [57]. Post-translational modifications rely on a highly specialized enzymatic arsenal specific to each cellular type, which leads to the generation of different proteomes from the same genome. These modifications add layers of complexity to the proteome and, thus, broaden its functionalities [58]. Hence, proteins exhibit different conformations, localizations, and interactions depending on space and time factors. The development of proteomics assays is driven by these complexity challenges. The proteome can mainly be analyzed using MS or protein microarrays [53,59]. MS combined with protein separation allows rapid and accurate detection of hundreds of human proteins and peptides from a small amount of body fluid or tissue [59][60][61]. Recent studies showed promising results using proteome analysis to explore cystinuria [62], mucopolysaccharidoses [63], and liver mitochondrial functions [64]. Despite increasing analytical performance, proteomics has not yet been adopted in routine clinical laboratory practice [65].

Metabolomics
The idea behind metabolomics or metabolic profiling has been used empirically in the past; for example, urine organoleptic characteristics (taste, odor, or color) aided in the diagnosis of medical conditions [66]. The metabolome is defined as the set of metabolites present in a given biological system, fluid, cell, or tissue at a given time [67]. Metabolomics is an omics approach based on biochemical characterization of metabolites and their fluctuations related to internal (genetic) and external (environmental) factors [68]. The metabolomics approach has been applied in many disease studies [69,70]. MS and nuclear magnetic resonance (NMR) are the main analytical techniques used in metabolomics [71]; of the two, MS is already adopted in clinical laboratories. New advances in analytical technologies, such as ion mobility spectrometry (IMS) combined with high-resolution MS, have allowed better coverage of the metabolome [56]. Because IEM are related to metabolism disruption, metabolomics is particularly suited to assessing these diseases. The future of IEM diagnosis relies on simultaneous quantitative metabolic profiling of many metabolites in biological fluids. Targeted MS-based metabolomics is already widely used and implemented in national IEM newborn screening programs worldwide [72]. Untargeted approaches have also been tested and have shown promising results [73,74]. An integrated strategy for IEM assessment using both targeted and untargeted approaches has recently been proposed by Miller et al. This strategy provides useful and actionable diagnostic information for IEM; the authors successfully diagnosed 21 IEM disorders using plasma metabolite measurements through metabolomics [75]. Aygen et al. performed a multi-center clinical study in 14 clinical centers in Turkey using NMR-based platforms. The urine samples of 989 neonates were analyzed; a set of specific metabolites that vary in patients compared with healthy individuals was characterized, predictive models were developed, and a reference NMR database was built [74]. For a deeper overview of the potential of metabolomics in IEM investigations, the reader may refer to a recent comprehensive review reporting the underlying metabolic profiling technologies, their advantages and limitations, and their applications in IEM [76].
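The targeted screening logic mentioned above can be illustrated with a toy sketch that flags metabolites whose measured concentrations fall outside laboratory reference intervals; the metabolite names, units, and cut-offs below are hypothetical placeholders and are not clinical values.

```python
# Toy sketch of targeted screening logic: flag metabolites whose measured
# concentration falls outside a laboratory reference interval.
# Metabolite names, units, and intervals are hypothetical placeholders,
# NOT clinical cut-offs.

reference_intervals = {          # metabolite -> (low, high), arbitrary units
    "metabolite_A": (10.0, 60.0),
    "metabolite_B": (0.5, 5.0),
    "metabolite_C": (100.0, 400.0),
}

def screen_profile(measured):
    """Return metabolites outside their reference interval, with the direction of deviation."""
    flags = {}
    for name, value in measured.items():
        low, high = reference_intervals[name]
        if value < low:
            flags[name] = "below reference interval"
        elif value > high:
            flags[name] = "above reference interval"
    return flags

sample = {"metabolite_A": 75.2, "metabolite_B": 1.2, "metabolite_C": 90.0}
print(screen_profile(sample))   # -> metabolite_A above, metabolite_C below
```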

Phenomics
The phenome describes the measurable physical and chemical outcomes of the interactions between genes and the environment that are experienced by individuals and influence their phenotypes [77]. Hence, phenotypes can be retrieved through precise, quantitative analysis [78]. Phenomics, the branch of science that explores how our genes respond to environmental changes, is an emerging and powerful approach to revealing important human attributes at the molecular level. It aims to explain how we adapt and why we are affected by diseases [79]. In other words, phenomics approaches capture our personalized experience with our environment [80][81][82][83][84].
Two main pillars build phenomics: deep phenotyping (DP) and phenomics analysis (PA). DP refers to a strategic and comprehensive approach to data acquisition that includes clinical assessment, laboratory analyses, pathology, and imaging. PA involves the evaluation of patterns and relationships between individuals with related phenotypes and/or between genotype-phenotype associations. PA relies on both clinical data and high-dimensional data integration [85], analysis, and visualization [81,86,87].
In a recent work, Kochinke et al. provided a curated database of 746 currently known genes involved in intellectual disability (ID). The genes were classified according to ID-associated clinical features. This work allowed systematic insights into the clinical and molecular landscapes of ID disorders [88]. Kim et al. introduced the integrative phenotyping framework (iPF) for disease subtype identification. This solution allows accessible visualization of multi-omics data following effective dimension reduction. The strategy has been successfully applied to chronic obstructive pulmonary disease (COPD) [89]. The Monarch initiative is an impressive global endeavor that provides computational tools for genotype-phenotype analysis, genomic diagnostics, and PM across broad areas of disease. Thus, the Monarch initiative illustrates the importance of phenomics [90].
For more details, the reader may refer to a recent review reporting state-of-the-art phenome-wide association studies [79].

Multi-Omics Strategies, or When the Whole Is More than the Sum of Its Parts
Although each omics technology can measure one family of biomolecules accurately and comprehensively, each is limited to the functional role that its type of molecule plays in a biological system. With the significant advancement of the high-throughput technologies and diagnostic techniques described here, the molecular basis of many disorders has been unveiled, and integrative consideration of these data could help address this limitation. However, translation of a patient-specific molecular mechanism into personalized clinical applications remains a challenging task that requires integration of multi-dimensional molecular and clinical data into patient-centric models. For example, family history, clinical history, and physical examination are mandatory for the interpretation of variants and laboratory results. However, in NGS, reporting the result is a delicate task. NGS test accuracy is at its best when the considered variant in a given gene has been previously associated with the patient's condition and when a conclusive functional test has revealed the gene's functional abnormalities. Furthermore, few functional studies are available regarding the biological effect of individual variants. This largely impedes effective and comprehensive interpretation of NGS data. In this regard, PM combining multilayer molecular information and specific clinical phenotypes for a given patient may be an answer to this limit [1]. Applying the PM concept to omics and clinical data is a challenging and exciting task. This integrative view of disease modeling is an emerging knowledge-based paradigm in translational and clinical research that capitalizes on the ever-growing power of computational methods to collect, store, integrate, model, and interpret curated disease information across multi-scale biology from molecules to phenotypes [85,91]. With the tremendous amount of available biological and clinical data, the development of appropriate data mining tools is mandatory to extract the hidden information, thereby allowing its translation into actionable clinical tools [91,92]. As technologies keep evolving and datasets grow in volume, velocity, and variety in the big data era, a strong informatics infrastructure will be essential to embrace the PM promise of improved healthcare derived from personal data. Different computational solutions using machine learning and dimension reduction methods have been developed for omics integration [93]. Recent studies have shown the potential of multi-omics studies to provide insightful biological inferences [64][94][95][96][97][98][99] and to help determine definitive diagnoses in the IEM field [10,11].
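As a hedged illustration of one simple integration strategy (often called early integration), the sketch below concatenates standardized feature matrices from two simulated omics layers and trains a single classifier with scikit-learn; the data are randomly generated, and the model is illustrative only, not a validated clinical predictor.

```python
# Minimal sketch of "early" multi-omics integration: feature matrices from two
# omics layers are standardized and concatenated before a single classifier is
# trained. Data are randomly simulated; this is illustrative, not a validated model.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 60
metabolomics = rng.normal(size=(n_samples, 200))      # simulated metabolite features
transcriptomics = rng.normal(size=(n_samples, 500))   # simulated transcript features
labels = rng.integers(0, 2, size=n_samples)           # simulated case/control labels

# scale each omics block separately, then concatenate sample-wise
X = np.hstack([
    StandardScaler().fit_transform(metabolomics),
    StandardScaler().fit_transform(transcriptomics),
])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())   # chance-level on random data
```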

Technical Limitations
Experimental and Analytical Noise
Reproducibility and repeatability are prerequisites for obtaining consistent results [100]. These important validation steps are hampered by so-called batch effects. In addition, this drawback can be an important confounder in association studies and potentially causes spurious associations unrelated to the outcomes of interest. Multiple technical platforms from different manufacturers are usually available for the same type of omics profiling. For example, multiple versions of microarray and sequencing platforms are available for genomics, transcriptomics, and epigenomics association studies. They usually have different coverage of the sequenced regions [44]. MS platforms for proteomics and metabolomics have different sensitivities and chemical space coverage [61,76]. This is due to differences in MS analyzer technology in terms of ionization method, resolving power, measurement accuracy, multi-dimensional separation, scan speed, dynamic range, and analysis throughput [55,56,76]. Such technical heterogeneity often makes meta-analysis and data fusion of different omics studies very challenging. Batch effect issues can be handled by using harmonized Standard Operating Procedures (SOPs) [101][102][103][104]. Furthermore, using standard quality control (QC) processes and metrics to normalize intra-laboratory and inter-laboratory omics measurement variations [105,106], applying consistent statistical correction methods [107][108][109], and using appropriate computational tools [110] can address some technical variation issues.
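As a minimal sketch of the kind of statistical correction cited above, the following snippet mean-centers each feature within each batch of a simulated dataset; real studies typically use dedicated methods (e.g., empirical Bayes batch correction), and the column names and batch labels here are hypothetical.

```python
# Simple illustration of one statistical correction for batch effects:
# mean-centering each feature within each batch. Real studies typically use
# dedicated methods (e.g., empirical Bayes batch correction); this is only a sketch.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(8, 3)), columns=["feat1", "feat2", "feat3"])
data["batch"] = ["A", "A", "A", "A", "B", "B", "B", "B"]
data.loc[data["batch"] == "B", ["feat1", "feat2", "feat3"]] += 2.0  # simulated batch shift

def center_by_batch(df, feature_cols, batch_col="batch"):
    """Subtract the per-batch mean from each feature (grand mean added back)."""
    corrected = df.copy()
    grand_means = df[feature_cols].mean()
    batch_means = df.groupby(batch_col)[feature_cols].transform("mean")
    corrected[feature_cols] = df[feature_cols] - batch_means + grand_means
    return corrected

print(center_by_batch(data, ["feat1", "feat2", "feat3"]).groupby("batch").mean())
```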

Analytical Accuracy and Clinical Relevance
Historically, genomics has evolved in tight association with the human reference sequence, currently GRCh38 [111]. The genome contains approximately 20,000 protein-coding genes, and these vary enormously, spanning from eight base pairs (a transfer RNA) to millions of base pairs. For a given gene, the exon number spans from one to hundreds. Furthermore, the GC richness of a gene is a great challenge, especially for capture chemistry-based targeted sequencing. This great genome complexity challenges NGS sequencing strategies with regard to accuracy, which is the mandatory prerequisite for clinical implementation. For example, given the intrinsically short reads of HTS strategies, simple repeats that are shorter than the read can be resolved with NGS; however, if the read length is shorter than the repeat stretch, then the size of the repeated region is difficult to define [112]. Another clinically relevant challenge of short reads is the lack of phase information, i.e., the parental chromosomal origin of a variant. The characterization of compound heterozygosity (two identified variants in the same gene) is challenging and, thus, illustrates this limitation. A variant-calling algorithm solution has been developed to handle such issues [113]. To solve some of these challenges, long-read sequencing strategies, which use either barcoding of longer molecules combined with short-read sequencing and in silico assembly [114] or direct sequencing of longer molecules [115], may be of interest. Such sequencing strategies may provide a more accurate view of the genome. Chaisson et al. provided seminal evidence for the utility of long-read sequencing to generate high-quality reference genomes; the authors closed euchromatic gaps in the GRCh37 human reference genome using long-read sequencing [115]. Another major drawback of existing NGS strategies is the need for time-consuming library preparation and DNA enrichment. More automation of this step would improve workload and turnaround and dramatically facilitate the adoption of NGS in the clinical environment, which requires a high standard of accuracy and rapid reporting of results. The use of nanopores is a promising technology that could overcome this limitation by directly sequencing DNA fragments passing through nanopores using either nanophotonic chambers [116] or a protein nanopore [117].
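The read-length limitation for repeat sizing can be formalized in a tiny sketch: a tandem repeat can be sized directly only if a single read spans it together with unique flanking sequence on both sides. The flank requirement and read lengths below are illustrative assumptions.

```python
# Tiny sketch of the read-length limitation discussed above: a tandem repeat can be
# sized directly only if a single read spans the repeat plus unique flanking sequence
# on both sides. The flank length and read lengths are illustrative values.

def repeat_is_resolvable(read_length, repeat_length, min_flank=20):
    """True if one read can span the repeat with min_flank unique bases on each side."""
    return read_length >= repeat_length + 2 * min_flank

for repeat_length in (50, 150, 400, 2000):
    short_read = repeat_is_resolvable(150, repeat_length)      # typical short read
    long_read = repeat_is_resolvable(10_000, repeat_length)    # typical long read
    print(f"{repeat_length} bp repeat: short reads {short_read}, long reads {long_read}")
```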
Regarding MS-based omics, there are still great challenges to their widespread use in the clinical environment. For metabolomics, the great drawback is still metabolite identification, particularly for untargeted approaches [118]. Accurate, curated spectral repositories are essential for their clinical adoption and compliance with regulatory issues. Furthermore, the lack of harmonized data reporting and of data visualization in clinically accessible formats limits their clinical implementation [76]. Proteomic analysis is technically challenging, notably because of diverse splice variants and post-translational modifications; these modifications limit the ability of DNA- and RNA-based measurements to predict protein levels [119,120]. Better proteomic measurements require unbiased identification and quantification of proteins by direct measurements using methods that analyze their unique structure, mass, and charge with high specificity [106,121,122]. Furthermore, the detection of subtle changes in low-abundance proteins, which are often important in early-stage disease screening, might be limited by MS sensitivity. To overcome this limitation, different approaches have been proposed, such as immunocapture enrichment of low-abundance proteins prior to their MS detection [123], at the expense of additional steps in the analytical process, which may affect the throughput. The MS-based omics community is aware of these limitations and actively strives to overcome them [54,104,124].

Omics Informatics Pipelines in the Clinical Environment
A bioinformatics pipeline is a sequential series of computationally complex data analysis processes spanning from raw data retrieval to final results output. The series includes processing, data analysis, and interrogation of reference databases. Two main pipelines are discussed here: NGS and MS-based pipelines. Figure 2 presents a schematic overview of both pipelines.

Figure 2. Left: the NGS pipeline comprises library construction and capture, the sequencing reaction, and signal processing. A base-calling step then defines the unaligned nucleotide sequence, and the data are stored in FASTQ files containing quality scores. Subsequently, read alignment to a reference sequence is performed, followed by variant calling and annotation. The final output is a list of variants in VCF format for visualization and interpretation. Right: the MS pipeline starts with sample preparation, which depends on the MS instrument and the combined separation method. Data acquisition is performed according to the chosen mode (full scan or tandem MS). A pre-processing step is then needed for feature extraction and data cleaning. The result is a list of features that undergoes data analysis, molecular annotation, and identification before biological interpretation. Signal processing is platform-dependent in NGS, whereas open source solutions are available for pre-processing MS data.

NGS Informatics Pipeline
Millions of reads are generated by most common NGS platforms using short reads that overlap either the whole genome (WGS) or a specific region (targeted sequencing). An NGS pipeline includes platform-specific software to generate the sequence from the primary instrument signal; this step is called base calling. Subsequently, the overlapping reads are aligned against a reference human genome sequence. Several alignment algorithms have been developed, with different performance results regarding sequence variation detection [125,126]. The aligned reads are used as input files for the detection of single-nucleotide variants (SNVs), copy number variations (CNVs), indels, and large rearrangements using open source or commercial tools. This step is called variant calling. Several variant callers are available, such as Atlas [127], MuTect [128], VarScan2 [129], and the Genome Analysis Toolkit (GATK) [130]. It should be noted that these variant callers exhibit different performances depending on the platform and variant types [131]. Thus, using different variant callers for wider variant capture is recommended. Further annotation of the detected variants is performed using clinical data, Human Genome Variation Society annotations, genotype-phenotype correlations, pathway analysis, and predicted effects (on transcription and translation). The quality and consistency of the interrogated online databases are crucial for this step. To avoid misdiagnoses, interpreting variants should be approached as a dynamic big data problem because these online databases constantly evolve as disease knowledge evolves [132]. NGS is rapidly making its way into the clinic, and its smooth integration upstream and downstream of a sequencing analysis is becoming an important issue. Informatics challenges facing the implementation of NGS in clinical environments range from data acquisition to data reporting, including data validation, data analytics, data storage, and interoperability with existing laboratory systems and clinical informatics infrastructures. Sample tracking and workflow management logistics are the core of any clinical-grade laboratory; this should also be true for NGS. Offline wet-laboratory steps such as nucleic acid extraction, library preparation, and sequencing runs should be consistently tracked, along with downstream steps such as bioinformatics analysis, quality assurance documentation, data interpretation, and results reporting. All these steps add complexity and error sources to the workflow. To overcome the interoperability problems that face in-house custom solutions, such as fragmentation of the workflow, the ideal informatics solution should be fully integrated with the laboratory information system (LIS) so that it can track samples from order receipt to results reporting. Of note, the generated NGS data range from 10 GB for WES to 150 GB for WGS; hence, data storage solutions need to be addressed before implementation. Data analysis challenges not only include the computationally heavy burden of the NGS bioinformatics pipeline but also involve handling the huge amount of background data related to wet laboratory steps, sample meta-data, sample processing and tracking, reports, and QC data. With all these high-dimensional data management issues, NGS clinical implementation should be approached with big data analytics solutions [133].
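As a minimal, hypothetical sketch of a filtering step that can follow variant calling, the snippet below reads a VCF text file and keeps records whose FILTER field is PASS and whose QUAL exceeds a chosen threshold; the file path and threshold are placeholders, and the column order follows the standard VCF specification.

```python
# Minimal sketch of a post-variant-calling step: read a VCF text file and keep
# records with FILTER == "PASS" and a QUAL above a chosen threshold.
# The file path is a placeholder; column order follows the VCF specification
# (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...).

def filter_vcf(path, min_qual=30.0):
    kept = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):          # skip header and meta-information lines
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt, qual, filt = fields[:7]
            if filt == "PASS" and qual != "." and float(qual) >= min_qual:
                kept.append((chrom, int(pos), ref, alt, float(qual)))
    return kept

if __name__ == "__main__":
    for record in filter_vcf("sample.vcf"):   # placeholder path
        print(record)
```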

Mass Spectrometry-Based Omics Informatics Pipeline
MS-based processing methods involve four main steps: (i) data acquisition; (ii) data pre-processing; (iii) data analysis using chemometrics; and (iv) identification, network, and pathway analysis [76]. Data files are acquired with proprietary software depending on each platform. Various proprietary data formats have been developed by MS manufacturers to handle MS data, but this raises sharing and processing limitations between platforms. To address this problem, open formats have been developed, such as netCDF, mzDATA, mzXML, and mzML [134]. Data pre-processing includes peak detection and peak alignment, the latter being a retention or drift time correction step in separation-coupled methods (gas chromatography-MS, liquid chromatography-MS, capillary electrophoresis-MS, and ion mobility-MS). During alignment for untargeted analysis, it is crucial to match peaks corresponding to the same analytes across different samples. Subsequently, baseline correction and spectral deconvolution for visualization are performed. Depending on the algorithm used, the order of these steps might differ [135]. The output of these steps is a matrix containing feature concentrations or intensities across the different samples. Different output formats, such as txt, csv, or an Excel spreadsheet, can be used. Subsequently, before data analysis and modeling, different filters, transformations, and normalization methods can be applied to the generated matrix to handle the noise and clean the data. Then, various pattern recognition and machine learning techniques are applied to extract the important features (metabolites or proteins) for the subsequent identification step and for pathway and network analysis [76]. MS-based bioinformatics pipeline challenges are the same as those described above regarding big data scaling and interoperability with the laboratory information system (LIS) in a clinical environment. However, some limitations are specific to these platforms, such as sample extraction and/or derivatization, which are offline processes that should be consistently tracked. The metabolite or protein identification steps still lack smooth and streamlined informatics solutions for direct database interrogation. From an informatics perspective, NGS informatics seems much more mature for inclusion in clinical practice. Therefore, many endeavors are needed to bring MS informatics infrastructures to a clinical grade [136], and some initiatives have already begun [104,137,138].
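The matrix-level steps described above (filtering sparse features, imputation, log transformation, and scaling) can be sketched on a simulated feature table as follows; the sample and feature names, the missingness simulation, and the thresholds are arbitrary illustrative choices.

```python
# Sketch of common matrix-level pre-processing steps on an MS feature table
# (samples x features): drop sparse features, impute, log-transform, and scale.
# The feature table is simulated; thresholds are arbitrary examples.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
intensities = pd.DataFrame(
    rng.lognormal(mean=5, sigma=1, size=(10, 50)),
    index=[f"sample_{i}" for i in range(10)],
    columns=[f"feature_{j}" for j in range(50)],
)
intensities[intensities < 60] = np.nan        # simulate missing low-intensity signals

# 1) keep features detected in at least 80% of samples
detected = intensities.notna().mean() >= 0.8
filtered = intensities.loc[:, detected]

# 2) impute remaining missing values with half the feature minimum, then log2-transform
imputed = filtered.fillna(filtered.min() / 2)
logged = np.log2(imputed)

# 3) autoscale each feature (zero mean, unit variance)
scaled = (logged - logged.mean()) / logged.std()
print(scaled.shape)
```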

Biological Variation
Biological variation is another source of discrepancies in omics studies. Whereas genetic profiles are essentially identical across tissues and cell types, all other omics profiles depend on the sample type. Tissue and cell-type specificity lead to two important issues in multi-omics approaches: tissue and cell-type selection and tissue heterogeneity. The most accessible specimen in humans is peripheral blood. Blood-based specimens such as plasma, serum, and leukocytes are commonly used in omics studies. Although the use of blood as a surrogate tissue is sometimes relevant, the biological relevance of blood omics profiles may not be apparent for many human diseases. Using blood-based specimens is a convenient start for searching for novel disease-related biomarkers; however, using blood as a surrogate tissue requires cautious validation and interpretation to unravel disease mechanisms [139][140][141]. Furthermore, diet, circadian rhythm, and drugs may interfere. Another issue is cell heterogeneity; a tissue sample always involves several cell types, each having a unique omics profile. Depending on the location of a tissue sample or the individual physiological condition, the proportions of the different cell types can change substantially. Statistical methods have been developed to adjust for potential confounding effects due to cell-type heterogeneity [142][143][144]. However, measuring the omics profile of each purified cell type is an ideal solution that could directly inform the molecular mechanism of a disease [145].
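One simple adjustment strategy alluded to above can be sketched as a regression: the omics feature is regressed on estimated cell-type proportions, and the residuals are carried forward as a composition-adjusted signal. The proportions and feature below are simulated, and the published methods cited above are considerably more sophisticated.

```python
# Sketch of adjusting one omics feature for cell-type composition: regress the
# feature on estimated cell-type proportions (ordinary least squares) and keep
# the residuals for downstream association testing. Data are simulated.

import numpy as np

rng = np.random.default_rng(3)
n = 100
proportions = rng.dirichlet([5, 3, 2], size=n)        # simulated fractions of 3 cell types
feature = 2.0 * proportions[:, 0] + rng.normal(scale=0.1, size=n)  # composition-driven signal

design = np.column_stack([np.ones(n), proportions[:, :-1]])  # intercept + 2 free proportions
coef, *_ = np.linalg.lstsq(design, feature, rcond=None)
residuals = feature - design @ coef                    # composition-adjusted feature

print(np.corrcoef(residuals, proportions[:, 0])[0, 1]) # near zero after adjustment
```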

Definitions
A biomarker has been defined as a trait that can be objectively measured and evaluated; therefore, it can be used as an indicator of biological processes (normal versus disease) or of pharmacologic response upon a therapeutic intervention [146]. The FDA defined a biomarker as a measurable endpoint that may be used as an indicator of a disease or physiological state of an organism. According to these definitions, several indicators may be included, such as imaging-based or laboratory-measured biomarkers [147]. The Institute of Medicine Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials defines omics as the study of related sets of biological molecules in a comprehensive fashion. Omics-based tests are defined as "an assay composed of, or derived from, multiple molecular measurements and interpreted by a fully specified computational model to produce a clinically actionable result" [148].

Biomarker Development
To be used for diagnostics or drug development, an ideal biomarker needs to be highly specific and sensitive [147]. Biomarkers can be classified as pharmacodynamic, indicating the outcome of the interaction between a drug and a target [149], or as prognostic/predictive, stratifying the patient population into responders and non-responders [150]. Another classification, comprising three types, has been suggested by the Biomarkers and Surrogate End Point Working Group [146,151]: type 0 biomarkers indicate the natural history of disease and correlate with clinical indices, type I biomarkers track the effects of intervention associated with the drug mechanism of action, and type II biomarkers are surrogate end points that predict clinical benefit.
Biomarker development and translational strategies have four main issues that need to be addressed: analytical validity, clinical validity, clinical utility, and regulatory and ethical compliance. Analytical validity includes evidence of assay accuracy, reliability, and reproducibility. Clinical validity denotes evidence regarding the statistical association of biomarkers with the clinical outcome. Clinical utility assesses the benefit of the biomarker in terms of public health. Regulatory and ethical issues address guidelines and requirements compliance of the previous development steps with regulatory bodies and societal challenges, respectively [152]. Figure 3 represents the different pillars of a biomarker discovery pipeline.

Criteria for Omics-Based Biomarkers in Clinical Context
Three main aspects entail omics-based test development: analytical development, computational modeling of the predictor, and validation of its clinical utility. Given the multi-dimensional and rich information generated by omics data, mathematical modeling is the key to building classifiers for effective medical decision-making. Because omics data are high-dimensional, machine learning and chemometric methods are needed to obtain insights from the data [91]. These methods may be divided into two main classes: unsupervised and supervised methods [76]. Unsupervised methods are exploratory and track patterns in the data; they include principal component analysis [153], independent component analysis [154], k-means clustering [155], hierarchical cluster analysis [156], and self-organizing maps [157]. Supervised methods are mainly predictive and explanatory. They model the dataset so that the class label of separate validation set samples can be predicted based on a series of mathematical models derived from the original data, namely the training set. Different supervised methods, such as partial least squares discriminant analysis (PLS-DA) [158] and orthogonal PLS-DA (OPLS-DA) [159], as well as support vector machines [160], can be applied. For more details, the reader may refer to a recent review [76]. Figure 4 presents a schematic view of the two main computational modeling strategies using machine learning techniques for omics-based biomarker implementation.
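A hedged sketch of the supervised workflow just described, using scikit-learn: a model is fitted on a training set and then predicts the class labels of held-out validation samples, here with PCA-based dimension reduction feeding a support vector machine. The data are simulated, so the accuracy obtained is meaningless except as a demonstration of the mechanics.

```python
# Sketch of a supervised workflow: fit a model on a training set and predict the
# class labels of held-out validation samples. A support vector machine is combined
# with PCA-based dimension reduction via scikit-learn. Data are simulated.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 300))            # 80 samples x 300 omics features
y = rng.integers(0, 2, size=80)           # simulated binary class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
model.fit(X_train, y_train)                              # training set
print(accuracy_score(y_test, model.predict(X_test)))     # held-out validation set
```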
The high dimensionality of omics data requires new approaches to omics-based biomarker development. McShane et al. described the main issues to take into account during omics-based biomarker development, including samples, analytical development of assays, computational model development, clinical utility assessment, and ethical and regulatory issues. The authors suggested criteria that should be assessed for effective biomarker validation [161]. All these steps raise specific challenges regarding validation practices and determine the use of these omics-based tests [100]. A stepwise approach to using machine learning methods for clinical phenotype prediction and omics-based predictor development, spanning from data collection to large-scale clinical validation, is presented in Figure 5.

Omics Integration and the Curse of Dimensionality
Biomedical data are becoming quantitatively (number of samples) and qualitatively (data heterogeneity) complex. The number of samples is driven by the ever-growing throughput of data acquisition technologies and their digitization, whereas heterogeneity entails biological features (biomolecules, diseases) and related metadata (sampling metadata and clinical data). Furthermore, data can be acquired through different platforms, thus adding bias, complexity, and noise. To address these issues, machine learning methods are suitable for data modeling and integration [162]. Data integrative methods can holistically analyze multiple data types to provide systems-level biological insights [91]. Dimensionality reduction techniques have been widely used to handle the biomedical big data deluge, but on a large scale they are computationally intensive. To handle these issues, topological data analysis (TDA) methods may help. TDA methods have emerged recently, but the underlying concepts go back to Leonhard Euler and his work on topology in the 18th century. TDA methods acquire insight from data by analyzing their shapes (patterns) through geometric dimensional conversions [163][164][165]. These methods have shown good performance in finding hidden patterns when other standard methods fail [95,163,166]. Parsimony phylogenetic analysis is another promising method for handling the omics data deluge [167]. Disease subtype classification for patient stratification is both data-dependent and method-dependent. Thus, it is urgent to have representative and consistent reference datasets that can be used for the comparison and evaluation of methods.
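The method-dependence of patient stratification noted above can be illustrated with a small sketch: the same simulated patient-by-feature matrix is clustered hierarchically with two different linkage criteria, which can yield different subtype assignments. The data, cluster number, and agreement measure are illustrative assumptions.

```python
# Sketch illustrating method-dependence of unsupervised patient stratification:
# the same simulated feature matrix clustered with two different linkage criteria
# can yield different subtype assignments. Labels are arbitrary, so the naive
# agreement score is only a rough indication.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 100))                    # 40 patients x 100 features (simulated)

labels_ward = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
labels_avg = fcluster(linkage(X, method="average"), t=3, criterion="maxclust")

print("cluster sizes (ward):   ", np.bincount(labels_ward)[1:])
print("cluster sizes (average):", np.bincount(labels_avg)[1:])
print("naive agreement:", np.mean(labels_ward == labels_avg))
```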

Data Integrity, Standardization, and Sharing
Data quality, integrity, and security are the keys to retrieving and maintaining the flow of data and are essential for achieving the promise of "precise" medicine. Data sharing can allow a study to proceed despite a low number of participants, which is often the case in IEM studies. However, the key drivers of data sharing are data and meta-data standards, which are essential for successful data integration and exchange. The lack of such standards, or their inconsistent use, especially in omics, is a major drawback [102]. Furthermore, in addition to global harmonization, new regulatory approaches adapted to these new omics strategies are urgently needed [168,169].
Large amounts of acquired data raise complex challenges for healthcare stakeholders, including patients. These challenges include the following: (i) sample collection, handling, storage, and transport; (ii) data analyses using multi-omics integration techniques; and (iii) collecting electronic medical record data. The integration of medical record data with biological data and their analysis are other issues. Finally, data sharing within the scientific community raises controversial legal, ethical, and privacy concerns as well [170,171].

Turning Data into Knowledge
Although molecular biomarkers have helped to unveil the underlying pathophysiological mechanisms of disease, only a few of the currently known biomarkers are clinically actionable [172]. When introducing a biomarker to the clinic, it is important to consider its functional characterization through pathways and network analysis, along with its implementation feasibility in terms of public health. Despite the progress in patient phenotyping and stratification, new methods are needed to address the PM era challenges, including analyses of large data [173], integration of multi-type data [174], and simulation of disease behaviors across multi-scale modeling in space and time [91,[175][176][177].

Clinical Research Enterprise and Embracing Multi-Disciplinary Sciences
The new omics revolution will play a central role in the post-genomic era of healthcare. Achieving this promise requires combining expertise from multiple disciplines, including clinicians, medical laboratory professionals, data scientists, computational biologists, biostatisticians, and lawyers. This need calls for new PM teams whose members develop overlapping expertise, enabling more effective medical interactions across all healthcare partners. Hence, medical professionals need diverse skill sets spanning clinical, biological, and computational knowledge to deliver on the promises of PM. Training the next generation of the medical workforce to manage and interpret omics data is one solution, and such initiatives have already begun [178-180]. Clinical bioinformatics provides a bridge between omics sciences and clinical practice [181]. We face an urgent need to transform all aspects of the healthcare system.

Informatics and New Pathways to Clinical Actionability
Informatics research and innovation are key drivers of the science underlying PM [181]. New frameworks for navigating multi-level evidence about whether and how a detected molecular abnormality constitutes a clinically relevant biomarker will help identify actionable biomarkers that aid clinical decision-making [11]. Thanks to databases, accurate annotations with contextual and actionable clinical information will enable decision support systems that provide intuitive, patient-specific, actionable reports [87,182,183]. Urgent areas for clinical bioinformatics research include biomarker discovery, computational phenotyping, and frameworks for evaluating clinical actionability and utility [181,184,185]. Furthermore, barriers related to standardization and harmonization may hinder interoperability and integration by making data aggregation a challenging task [87].
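The sketch below illustrates, in a schematic way, how curated annotations could feed a patient-specific report in a decision support system. The knowledge base, evidence tiers, and suggested actions are hypothetical placeholders and do not correspond to any existing clinical database or published actionability framework.

```python
# Minimal sketch of a biomarker annotation step for a decision support report.
# Knowledge base entries, tiers, and actions are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Annotation:
    biomarker: str
    evidence_tier: str     # e.g., "established", "research-only", "unannotated"
    suggested_action: str

KNOWLEDGE_BASE = {
    "phenylalanine_high": Annotation("phenylalanine_high", "established",
                                     "refer to metabolic clinic; consider PAH testing"),
    "metabolite_X_high": Annotation("metabolite_X_high", "research-only",
                                    "no validated clinical action"),
}

def report(findings):
    """Return annotated findings, flagging those without curated evidence."""
    return [KNOWLEDGE_BASE.get(f, Annotation(f, "unannotated", "manual review required"))
            for f in findings]

for annotation in report(["phenylalanine_high", "metabolite_Y_low"]):
    print(annotation)
```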

Paradigm Shift in IEM Investigations
Because IEM are linked to a genetic defect, their current characterization addresses the mutated gene and its products. However, genotype-phenotype correlations are lacking in several IEM, which leads to consideration of the influence of genetic or environmental modifying factors and of the impact of an altered pathway on metabolic flux as a whole. These diseases are related to the disruption of specific interactions within a highly organized metabolic network [91,186]. Thus, the impact of a given disruption is not easily predictable [6,187]. Therefore, a functional overview integrating both space and time dimensions is needed to assess the actors of the altered pathway and the potential interactions of each actor [4]. Systemic approaches may address IEM complexity and allow their diagnosis [10,91]. The effectiveness of such approaches was recently illustrated by van Karnebeek et al., who, using both genomics and metabolomics, observed a disruption of the N-acetylneuraminic acid pathway in patients with severe developmental delay and skeletal dysplasia. As a result, variants in the NANS gene, which encodes N-acetylneuraminic acid synthase, were identified [10].
The integration of omics-generated data with clinical data enables a paradigm shift in IEM management. An innovative global approach centered on extracting useful and actionable information may change screening and diagnostic practices. Therefore, a disruptive move from sequential, hypothesis-driven approaches to a global, hypothesis-generating approach is needed to embrace the PM era. The core idea of this paradigm shift in the IEM diagnostic workflow is presented in Figure 6.
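A toy version of this kind of cross-omics reasoning is sketched below: genes carrying candidate variants are intersected with genes linked to metabolites that deviate from reference values. The gene-metabolite links and z-scores are illustrative placeholders, not data from the cited study.

```python
# Minimal sketch of cross-omics prioritization: intersect genes with candidate
# variants (genomics) and genes linked to abnormal metabolites (metabolomics).
# All gene names, links, and z-scores below are illustrative placeholders.
candidate_variant_genes = {"NANS", "PAH", "GBA"}   # hypothetical exome/genome candidates

# Hypothetical metabolite z-scores versus a reference population.
metabolite_zscores = {"N-acetylmannosamine": 4.2, "phenylalanine": 0.3, "sialic acid": -3.8}
metabolite_to_gene = {"N-acetylmannosamine": "NANS", "sialic acid": "NANS",
                      "phenylalanine": "PAH"}

abnormal = {m for m, z in metabolite_zscores.items() if abs(z) > 2}
implicated_genes = {metabolite_to_gene[m] for m in abnormal}

# Genes supported by both genomic and metabolomic evidence are prioritized.
print("Prioritized genes:", candidate_variant_genes & implicated_genes)
```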

Figure 6. Paradigm shift in Inborn Errors of Metabolism (IEM) diagnosis workflow. A laboratory workflow using high-throughput analytical technologies, integrative bioinformatics, and computational frameworks recovers molecular information for more effective medical decision-making.

Conclusions
Current medical practice is being disrupted, and PM is profoundly reshaping the future of medicine through recent technological advances. Omics technologies enable the simultaneous measurement of a huge number of biochemical entities, including genes, gene transcripts, proteins, and metabolites. After decades of reductionism, holistic approaches have begun to address inborn errors of metabolism in a systemic fashion [9,64,91]. Despite some existing drawbacks, genomics and metabolomics seem to be taking the lead in the race into clinical practice. However, challenges such as data quality and integrity, reproducibility, and study sample sizes have to be addressed. The small number of multi-omics datasets in the field of IEM and the lack of standardized and harmonized protocols hamper the wide dissemination of these approaches. To overcome these drawbacks, attention should be given to validation strategies at all stages. Moreover, the development of new analytical and machine learning methods will facilitate the analysis of multi-tissue and multi-organ data, thus enabling a real investigation of systemic effects [95,141,163]. Extended and effective biobanking resources are also essential to ensure consistency. Addressing these challenges will improve the healthcare management of IEM by moving from a reactive, targeted, and reductionist approach to a more proactive, global, and integrative one.
Upgrading laboratory informatics infrastructures and training a new medical workforce in biomedical big data management are necessary for the successful integration of omics-based strategies. However, the potential of these strategies for IEM investigation has yet to be demonstrated to all IEM stakeholders worldwide. Laboratory workflows with high-quality data acquisition, mining, and visualization are fundamental for fully embracing the four Ps (predictive, personalized, preventive, and participatory) of PM [188] and for effectively translating the underlying biological knowledge into clinically actionable tools.
Author Contributions: Abdellah Tebani performed the literature review and wrote the manuscript, including tables and figures. Carlos Afonso critically revised and edited the manuscript. Stéphane Marret critically revised and edited the manuscript. Soumeya Bekri conceived the topic under review and critically revised and edited the manuscript. All authors approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.