Overview of Metabolomic Analysis and the Integration with Multi-Omics for Economic Traits in Cattle

Metabolomics has been applied to measure dynamic metabolic responses, to understand systemic biological networks and to reveal potential genetic architecture for human diseases and livestock traits. For cattle, published studies have already reported relevant candidate metabolites, metabolic pathways and potential systemic networks for different traits. These results can guide further metabolomic and integrated omics studies, so a summary of metabolomic applications for economic traits in cattle is needed. We here provide a comprehensive review of metabolomic analysis and its integration with other omics in five aspects: (1) characterization of the metabolomic profile of cattle; (2) metabolomic applications in cattle; (3) integrated metabolomic analysis with other omics; (4) methods and tools in metabolomic analysis; and (5) further potentialities. The review surveys existing metabolomic studies in cattle, together with studies integrating metabolomics with other omics, to understand the metabolic mechanisms underlying economic traits and to provide useful information for further research and practical breeding programs in cattle.


Introduction
Omics fields such as genomics, transcriptomics, epigenomics, proteomics and metabolomics have emerged, and the terms genome, transcriptome, epigenome, proteome and metabolome denote their respective objects of study [1][2][3][4][5]. The metabolome is the complete set of small molecules in a biological system, such as endogenous intermediates and metabolic products (metabolites). Metabolomics uses it to study the responses of biological systems, because metabolites are the end products of cellular regulatory processes [6].
Metabolomics is increasingly used to measure metabolic responses dynamically, identify biologically relevant candidate metabolic markers, reveal potential genetic architecture and understand the systemic networks underlying economic traits in cattle [7][8][9][10][11][12][13]. For example, potential metabolic biomarkers, pathways or networks were identified for milk protein yield (MPY) and feed efficiency traits in dairy cattle using serum and plasma samples [8,9]. Of 36 significant metabolites, hippuric acid, nicotinamide and pelargonic acid were identified as playing key roles in MPY metabolism [8], whereas α-ketoglutarate and succinic acid were found in the feed efficiency network [9]. Metabolomic signatures associated with residual feed intake (RFI) in beef cattle were also found using plasma, rumen fluid, muscle, liver and other samples [7,10,12,13], and the retinol metabolism pathway is considered to be associated with feed efficiency [12]. Furthermore, significant metabolites in liver (citrate, isocitrate, glucose-6-phosphate, reduced nicotinamide adenine dinucleotide (NADH) and creatine phosphate) and in muscle (choline, glycine, glycerol, malonate, glucose-6-phosphate and 3-hydroxybutyrate) were reported as useful metabolic signatures for Nellore cattle [13].
Given these findings for different traits (e.g., production, reproduction, nutrition, health, welfare), it is essential to summarize the major results of metabolomic analysis, as well as its integration with other omics, for further research and applications in cattle. Therefore, this review surveys existing metabolomic studies in cattle from five aspects, together with studies integrating metabolomics with other omics (e.g., genomics, transcriptomics, epigenomics, microbiomics), to understand the metabolic mechanisms underlying economic traits and to provide useful information for further cattle research and practical breeding programs.

Characterizations of Metabolomic Profiles in Cattle
Metabolome profiles in cattle vary with breed, trait, tissue, sampling time and other factors. To better understand the underlying metabolic mechanisms, this review summarizes candidate metabolic biomarkers in various tissues, and their enriched metabolic pathways, for important economic traits such as feed efficiency and disease.

Candidate Metabolic Biomarkers for Various Tissues Associated with Production and Health Traits in Cattle Identified by Previous Studies
Previous studies have investigated the metabolomics of plasma, serum, milk and rumen fluid for feed efficiency, body performance, disease and other traits in cattle (Table 1). Feed efficiency, the ability to convert feed into product, is an important trait that can be measured as gross feed efficiency (GFE), feed conversion ratio (FCR) or RFI [14,15]. Archer et al. (1999) [16] demonstrated that differences in RFI can reflect inherent metabolic differences between animals, and RFI variation is underpinned by a combination of factors, including metabolism [17]. Table 1 presents 24 metabolites that have been identified as related to RFI, of which citrate and succinic acid were detected repeatedly in several studies [7,9,10]. In addition, 1,3-dihydroxyacetone was associated with fat, lactose and somatic cell score [18], whereas lysine and succinate were associated with growth and feed efficiency traits [7,9,10] (Table 1).
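To make the three feed efficiency measures concrete, the following Python sketch computes FCR and GFE for one animal and RFI for a small group. It assumes the commonly used definition of RFI as the residual from an ordinary least squares regression of dry matter intake (DMI) on average daily gain (ADG) and mid-test metabolic body weight (BW^0.75). The function names and the six animal records are hypothetical, and real studies often add further covariates (e.g., contemporary group, or milk energy output in dairy cows).

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fcr(dmi: float, adg: float) -> float:
    """Feed conversion ratio: kg dry matter eaten per kg of gain (lower is better)."""
    return dmi / adg


def gfe(dmi: float, adg: float) -> float:
    """Gross feed efficiency: kg of gain per kg dry matter eaten (higher is better)."""
    return adg / dmi


def rfi(dmi: Sequence[float], adg: Sequence[float], mid_bw: Sequence[float]) -> List[float]:
    """Observed DMI minus DMI predicted by OLS on ADG and mid-test BW^0.75."""
    rows = [[1.0, g, w ** 0.75] for g, w in zip(adg, mid_bw)]
    # Normal equations (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, dmi)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, dmi)]


# Hypothetical records for six steers (DMI and ADG in kg/day, mid-test body weight in kg)
dmi = [8.5, 9.4, 10.1, 9.8, 8.9, 10.6]
adg = [1.0, 1.2, 1.4, 1.1, 1.3, 1.5]
mid_bw = [400.0, 420.0, 450.0, 500.0, 380.0, 460.0]
print(fcr(10.0, 1.25), gfe(10.0, 1.25))  # 8.0 0.125
print([round(v, 3) for v in rfi(dmi, adg, mid_bw)])  # negative RFI = eats less than predicted
```

Because RFI is a least squares residual, it is by construction uncorrelated with the ADG and body weight terms in the model, which is one reason it is used to rank efficiency independently of growth and size.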
Metabolic disease is another important factor affecting efficient cattle production, and metabolomics is helping to understand its mechanisms and to define predictive metabolic biomarkers for incident disease [19]. Many metabolomic studies have revealed metabolites associated with such diseases (Table 1). For instance, β-hydroxybutyrate (BHB) is mainly related to disease traits in dairy cattle that affect milk production [20][21][22], because an elevated blood BHB concentration defines hyperketonemia and can therefore be used for diagnosis [23]. Based on earlier recommendations [24][25][26][27][28], Benedet et al. (2019) [23] suggested three categories of blood BHB concentration: <1.2 mmol/L as no hyperketonemia; 1.2-2.9 mmol/L as subclinical ketosis; and ≥3.0 mmol/L as clinical ketosis.
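The three BHB categories can be applied as a simple rule. This minimal Python sketch assumes cut-offs of 1.2 and 3.0 mmol/L in blood, with values below 1.2 mmol/L treated as no hyperketonemia. Assigning readings between 2.9 and 3.0 mmol/L to subclinical ketosis is our own assumption, and the function names and readings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_bhb(bhb_mmol_per_l: float) -> str:
    """Assign a blood BHB concentration (mmol/L) to one of three categories."""
    if bhb_mmol_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if bhb_mmol_per_l < 1.2:
        return "no hyperketonemia"
    if bhb_mmol_per_l < 3.0:  # 1.2-2.9 mmol/L band; readings in [2.9, 3.0) also land here
        return "subclinical ketosis"
    return "clinical ketosis"


def herd_summary(values: Iterable[float]) -> Dict[str, int]:
    """Count cows per category, e.g., to estimate prevalence from a herd test."""
    return dict(Counter(classify_bhb(v) for v in values))


# Hypothetical blood BHB readings (mmol/L) from five cows
print(herd_summary([0.6, 1.1, 1.4, 2.5, 3.3]))
```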

Revealed Metabolic Pathways in Cattle
For feed efficiency, enrichment of the retinol metabolism pathway was revealed in beef cattle, where low-efficiency animals had a higher level of retinal and a lower level of retinoate, two metabolites in this pathway [12]. In dairy cows, three pathways (aminoacyl-tRNA biosynthesis; alanine, aspartate and glutamate metabolism; and the citrate cycle (TCA cycle)) were associated with RFI using two types of pathway analysis [9]. In this review, we used the metabolites associated with RFI (n = 24, Table 1) to conduct an over-representation analysis (ORA) of metabolic pathways. The ORA, based on Fisher's exact test, was performed in MetaboAnalyst software (version 5.0) [35] with the Bos taurus pathway library, which was also used for topology analysis to plot pathway impact values (derived from relative betweenness centrality). Nine significant metabolic pathways (FDR < 0.05) were revealed (Figure 1 and Supplementary Table S1); the most significant was aminoacyl-tRNA biosynthesis, followed by glyoxylate and dicarboxylate metabolism and phenylalanine metabolism (Figure 1). Six metabolites (glutamate, glycine, lysine, phenylalanine, threonine and tyrosine) were enriched in the aminoacyl-tRNA biosynthesis pathway (Supplementary Table S1), and their connections within the pathway were visualized in Supplementary Figure S1 using MetaboAnalyst software (version 5.0) [35].
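The ORA step can be illustrated with a small Python sketch. It assumes that each pathway is tested with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test, and that p-values are adjusted with the Benjamini-Hochberg FDR procedure. MetaboAnalyst's own implementation (compound-name matching, library filtering, topology scores) is not reproduced here, and the toy pathways and metabolite IDs are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, total: int, in_pathway: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, in_pathway, drawn).

    This equals the one-sided (over-representation) Fisher's exact test p-value.
    """
    top = min(in_pathway, drawn)
    hits = sum(comb(in_pathway, i) * comb(total - in_pathway, drawn - i) for i in range(k, top + 1))
    return hits / comb(total, drawn)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(query: Set[str], pathways: Dict[str, Set[str]],
        background: Set[str]) -> Dict[str, Tuple[int, float, float]]:
    """Return {pathway: (hits, p-value, FDR)} for the query metabolites."""
    mapped = query & background  # only metabolites present in the library count
    names = sorted(pathways)
    hit_counts, pvals = [], []
    for name in names:
        members = pathways[name] & background
        k = len(mapped & members)
        hit_counts.append(k)
        pvals.append(hypergeom_upper_tail(k, len(background), len(members), len(mapped)))
    fdr = benjamini_hochberg(pvals)
    return {n: (k, p, q) for n, k, p, q in zip(names, hit_counts, pvals, fdr)}


# Toy library of 10 metabolites and two hypothetical pathways
background = {f"m{i}" for i in range(10)}
pathways = {"pathway_A": {"m0", "m1", "m2"}, "pathway_B": {"m3", "m4", "m5", "m6"}}
result = ora({"m0", "m1", "m2"}, pathways, background)
print(result)
```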
The aminoacyl-tRNA biosynthesis pathway is related to amino acid metabolism and biosynthesis and has been associated with RFI in dairy cows [9] and pigs [36]. It is essential for normal growth and protein synthesis, and potentially influences cellular physiology and development [37,38]. The alanine, aspartate and glutamate metabolism and the citrate cycle (TCA cycle) pathways have also been reported in relation to feed efficiency traits [9], and alanine, aspartate and glutamate metabolism is sensitive to diet and breed, thereby affecting beef tenderness and meat sensory acceptability [34,39]. The aminoacyl-tRNA biosynthesis pathway (bta00970) is illustrated in Figure 2, derived from the KEGG pathway database (https://www.genome.jp/kegg/, accessed on 20 September 2021) for Bos taurus. Figure 2 indicates that this pathway is mainly linked to nine other pathways: alanine, aspartate and glutamate metabolism (bta00250); glycine, serine and threonine metabolism (bta00260); cysteine and methionine metabolism (bta00270); valine, leucine and isoleucine biosynthesis (bta00290); lysine biosynthesis (bta00300); arginine and proline metabolism (bta00330); histidine metabolism (bta00340); phenylalanine, tyrosine and tryptophan biosynthesis (bta00400); and tryptophan metabolism (bta00380). Alanine, aspartate and glutamate metabolism was found to be closely related to the aminoacyl-tRNA biosynthesis pathway in regulating feed efficiency through alanine, aspartate and glutamate metabolites [9].

Applications of Metabolomics in Cattle
Metabolomics has been applied to identify metabolic biomarkers, reveal genetic mechanisms, improve genomic prediction and understand nutritional physiology for economic traits in different species, which has encouraged its application in cattle.

Revealed Genetic and Metabolic Mechanisms
Metabolomics, together with integrated analyses of other omics data, leads to a clearer understanding of complex metabolic mechanisms [40]; for example, the metabolome varies across lactations [41,42]. Sun et al. (2017) [41] revealed five functionally enriched pathways (gluconeogenesis, pyruvate metabolism, TCA cycle, glycerolipid metabolism and aspartate metabolism) and suggested the TCA cycle, glutamate metabolism, and glycine biosynthesis and degradation pathways as potential key metabolic mechanisms of lactation in the mammary gland. Likewise, the aminoacyl-tRNA biosynthesis, alanine, aspartate and glutamate metabolism, and TCA cycle pathways play key roles in the biochemical mechanisms of feed efficiency that underlie variation in metabolic biomarkers (Table 1 and Figure 1). Notably, Wang and Kadarmideen [9] demonstrated a gene-metabolite network involving the TCA cycle as a potential mechanism for RFI, which modulates protein synthesis and regulates energy metabolism [43][44][45][46].

Improved Genomic Prediction for Complex Traits
Metabolomics-based prediction of complex traits has been conducted in plant species, such as wheat and barley, showing the potential of metabolites as predictor variables when no genotypes are available [47][48][49][50]. Gemmer et al. (2020) [47] used the Box-Cox power transformation [51] to transform the metabolic data and then designed three prediction scenarios: genomic prediction, metabolic prediction and combined genomic-metabolic prediction. They found that combining SNPs and metabolites produced predictive abilities similar to those of genomic prediction alone [47], which is consistent with other studies [48]. Nevertheless, Tong et al. (2020) [49] and Guo et al. (2016) [50] found that integrating metabolites with genotypes significantly improved prediction accuracy in maize and Arabidopsis, respectively. However, these gains were trait specific, so metabolic information is suggested as a predictor mainly for traits directly related to metabolism. In animal breeding programs, useful metabolic information has also been suggested for incorporation into genomic prediction models, for integration with phenotypes or for use as alternative phenotypes [52,53].
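A Box-Cox transformation, y(λ) = (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0, can be sketched as below. The sketch assumes strictly positive metabolite values and chooses λ for each metabolite by maximizing the normal profile log-likelihood over a coarse grid (-2 to 2 in steps of 0.5). The grid and the function names are assumptions, and Gemmer et al. may have estimated λ differently. In practice, `scipy.stats.boxcox` performs the same kind of maximum-likelihood estimation.

```python
import math
from typing import List, Optional, Sequence


def boxcox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda, assuming the transformed values are normal."""
    n = len(values)
    z = boxcox(values, lam)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / n  # maximum-likelihood variance estimate
    # Second term is the log-Jacobian of the transformation
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values: Sequence[float], grid: Optional[Sequence[float]] = None) -> float:
    """Grid-search maximum-likelihood lambda (default grid -2.0 to 2.0, step 0.5)."""
    if grid is None:
        grid = [i * 0.5 for i in range(-4, 5)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


# Hypothetical right-skewed intensities for one metabolite
intensities = [120.0, 95.0, 410.0, 150.0, 980.0, 130.0, 210.0]
lam = best_lambda(intensities)
print(lam, [round(v, 3) for v in boxcox(intensities, lam)])
```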

Improved Understanding of Nutritional and Biochemical Physiology
Diet-based rumen metabolomic analysis can help reveal nutritional and biochemical physiology after feeding different diets [29,54]. For instance, when dairy cows were fed diets with increasing proportions of barley grain, rumen concentrations of glucose, alanine, maltose, propionate, uracil, valerate, xanthine, ethanol, phenylacetate and methylamine increased, whereas 3-phenylpropionate decreased [54]. Similarly, Saleem et al. (2012) [29] showed that diets containing more than 30% grain could impair the health of dairy cattle, because concentrations of toxic or inflammatory compounds in rumen fluid, such as putrescine, methylamines, ethanolamine and short-chain fatty acids, increased. Different feeding systems (e.g., perennial ryegrass only, total mixed ration, and perennial ryegrass/white clover sward) can produce different metabolome profiles in milk and in derived products, such as the amino acid composition of milk and the metabolome of skim milk and whey powders [55]. In addition, significant metabolome changes across ages suggest that the identified metabolites could serve as potential biomarkers during early growth and fattening. Jeong et al. (2019) [56] revealed 19 metabolites and 3 metabolic pathways in beef cattle that helped clarify cattle growth physiology for designing appropriate feeding strategies.
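Diet comparisons like these typically start from per-metabolite contrasts between feeding groups. The Python sketch below computes a log2 fold change of group means and Welch's t statistic for each metabolite measured under two diets. The concentrations are invented for illustration (only the metabolite names come from the text), the cited studies may have used other tests, and in a real screen the resulting p-values would need multiple-testing correction.

```python
import math
import statistics
from typing import Dict, List, Tuple


def log2_fold_change(treatment: List[float], control: List[float]) -> float:
    """log2 ratio of group means (positive = higher under the treatment diet)."""
    return math.log2(statistics.mean(treatment) / statistics.mean(control))


def welch_t(treatment: List[float], control: List[float]) -> float:
    """Welch's t statistic for a difference in means with unequal variances."""
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    return (statistics.mean(treatment) - statistics.mean(control)) / se


def compare_diets(high: Dict[str, List[float]],
                  low: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {metabolite: (log2 fold change, Welch t)} for metabolites measured under both diets."""
    return {name: (log2_fold_change(high[name], low[name]), welch_t(high[name], low[name]))
            for name in sorted(high.keys() & low.keys())}


# Hypothetical rumen concentrations (arbitrary units) under high- and low-grain diets
high_grain = {"propionate": [4.0, 4.2, 3.8], "3-phenylpropionate": [0.9, 1.1, 1.0]}
low_grain = {"propionate": [2.0, 2.1, 1.9], "3-phenylpropionate": [2.0, 2.2, 1.8]}
for name, (lfc, t) in compare_diets(high_grain, low_grain).items():
    print(f"{name}: log2FC={lfc:.2f}, t={t:.2f}")
```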

Integrated Metabolomic Analysis with Other Omics
Metabolomic analysis integrated with other omics data could contribute to a better understanding of metabolic complexity from a systems biology perspective, but integrating multiple layers poses statistical challenges in formulating appropriate hypotheses [57,58]. Current integration analyses primarily focus on two-layer interplay, i.e., direct associations between two omics data sets, which can be used to identify relevant candidate biomarkers such as SNPs, genes, proteins, cytosine-phosphate-guanine (CpG) sites, microbial communities and lipids. These include genomic-metabolomic, transcriptomic/proteomic-metabolomic, epigenomic-metabolomic, microbiomic-metabolomic and lipidomic-metabolomic analyses.

Genomic-Metabolomic Analysis
Metabolomics links genotypes with phenotypes [6], so their relationships are currently interpreted by metabolome genome-wide association studies (mGWAS) that use metabolites as metabolic phenotypes. Integrated genomic-metabolomic analysis is considered a critical supplement to biology and physiology: because metabolites provide details of the physiological state, genetic variants associated with metabolite levels can display larger effect sizes, which allows quantitative trait loci (QTLs) affecting metabolite concentrations to be identified [53,59,60,61,62,63].
The mGWAS is a direct association model between genomics and metabolomics that tests candidate SNPs or QTLs for association with metabolites. It can be run with standard GWAS tools, such as EMMAX (efficient mixed-model association eXpedited), FaST-LMM (factored spectrally transformed linear mixed models), GCTA (genome-wide complex trait analysis) and GEMMA (genome-wide efficient mixed-model association) [64][65][66][67]. The mixed model is generally described as follows:

$$\mathbf{y} = \mathbf{W}\mathbf{a} + \mathbf{X}b + \mathbf{Z}\mathbf{g} + \mathbf{e}$$

where y is the vector of phenotypes (e.g., metabolite values), W is the design matrix of covariates for fixed effects (e.g., breed, RFI, principal components for genomic control [65,66,68,69]), a is the vector of fixed effects (i.e., the corresponding coefficients) including the intercept, X is the vector of marker covariates for the tested SNP (genotypes coded 0, 1 or 2), b is the additive (fixed) effect of that marker, Z is the design matrix for g, g is the vector of polygenic random effects representing the accumulated effects of all markers (i.e., captured by the genetic relationship matrix (GRM) calculated from all SNPs), and e is the vector of residual effects. The polygenic and residual variances are $\mathrm{Var}[\mathbf{g}] = \mathbf{G}\sigma_g^2$ and $\mathrm{Var}[\mathbf{e}] = \mathbf{I}\sigma_e^2$, where G and I are the GRM and the identity matrix, respectively.
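The GRM used for the polygenic term can be built from SNP genotypes coded 0, 1 or 2. This Python sketch assumes VanRaden's (2008) first method, G = ZZ'/(2Σp_j(1 - p_j)), where Z is the genotype matrix centred by twice each SNP's allele frequency p_j. This is one common choice, not necessarily what each listed tool computes (GCTA, for example, standardizes each SNP), and the three-animal genotype matrix is hypothetical.

```python
from typing import List


def vanraden_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotype codes.

    genotypes[i][j] is the count of the counted allele for animal i at SNP j.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample
    freq = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(f * (1.0 - f) for f in freq)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")
    # Centre genotypes by 2p so that every column of Z has mean zero
    z = [[row[j] - 2.0 * freq[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


# Hypothetical genotypes: 3 animals x 3 SNPs
G = vanraden_grm([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
for row in G:
    print([round(v, 3) for v in row])
```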

Transcriptomic-Metabolomic Analysis
A gene-metabolite interaction network can be constructed for transcriptomic-metabolomic analysis based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using MetaboAnalyst [70,71]; for example, one gene (2-hydroxyacyl-CoA lyase 1 (HACL1)) associated with two metabolites (α-ketoglutarate and succinic acid) was identified in high- versus low-feed-efficiency dairy cattle [9]. Likewise, tools such as IMPaLA, Metabox (R based) and XCMS [72][73][74][75] integrate metabolomic data with transcriptomics at the pathway level. Interactions between genes and metabolites in different combinations of biological networks can enhance our knowledge of underlying biological mechanisms by reflecting cellular regulation in different layers [72,73]. Based on the BioCyc [76], KEGG [77] and UniProt [78] databases, genes and proteins can be mapped onto predicted metabolic pathways [73]. The R package IntLIM [79] was used to integrate metabolomics and gene expression data for feed efficiency traits in pigs [80]; its model includes the interaction between phenotype and gene expression [79]:

$$m = \beta_0 + \beta_1 g + \beta_2 p + \beta_3 (g:p) + \varepsilon$$

where m, g and p are metabolite values (normalized), gene expression levels (log2-transformed) and phenotypes (case-control design), respectively, and ε is the residual. Here, g:p represents the statistical interaction between gene expression and the experimental phenotype, and a significant two-tailed p-value for its coefficient indicates that the gene-metabolite association differs between cases and controls [79,81].
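The IntLIM-style model can be fitted by ordinary least squares with an interaction column. This Python sketch uses a hypothetical helper (not IntLIM's API) that builds the design matrix [1, g, p, g·p] and solves the normal equations. It returns coefficients only, whereas IntLIM also reports the p-value of the interaction term. The data are synthetic and generated without noise, so the true coefficients are recovered.

```python
from typing import Dict, List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_interaction_model(m: List[float], g: List[float], p: List[int]) -> Dict[str, float]:
    """OLS fit of m = b0 + b1*g + b2*p + b3*(g*p) + error (hypothetical helper)."""
    rows = [[1.0, gi, float(pi), gi * pi] for gi, pi in zip(g, p)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, m)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear(xtx, xty)
    return {"intercept": b0, "gene": b1, "phenotype": b2, "gene:phenotype": b3}


# Synthetic data: log2 expression g and case-control status p (0 = control, 1 = case)
g = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
p = [0, 0, 0, 0, 1, 1, 1, 1]
m = [1.0 + 0.5 * gi + 2.0 * pi + 1.5 * gi * pi for gi, pi in zip(g, p)]
print(fit_interaction_model(m, g, p))
```

The gene-metabolite slope is β1 in controls (p = 0) and β1 + β3 in cases (p = 1), so a nonzero β3 is what a different association between cases and controls means in this model.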

Other Two-Layer Omics−Metabolomic Analysis
Among other two-layer integrations, epigenomic-metabolomic analysis could discover novel molecular targets, because epigenetic mechanisms regulate the expression of metabolic genes and thereby alter the metabolome [82,83]. Wong et al. (2017) [82] suggested that epigenetic drugs (e.g., DNMT and HDAC inhibitors) could be used to target metabolic reprogramming in cancer cells. In their review, they also proposed combining metabolism inhibitors with epigenetic modulators to achieve synergistic tumor inhibition as a developmental approach [82]. Petersen et al. (2014) [83] conducted an epigenome-wide association study of blood serum metabolites to investigate the relationship between DNA methylation and metabolic traits. They found that underlying genetic or environmental effects mainly drove the methylome-metabotype associations and identified several CpG site-specific associations with metabolites, suggesting that DNA methylation plays an important role in regulating metabolism [83].
Microbiome-metabolome analysis can predict whether compounds were produced by a bacterial community or by the host, for example with the tool AMON [84]. MIMOSA [85], by contrast, is a relatively quantitative tool that relates the relative abundance of genes in a metagenome to the abundance of particular compounds in a metabolome. Moreover, Mallick et al. (2019) [86] developed the MelonnPan algorithm to predict unobserved metabolite features in new microbial communities by incorporating biological knowledge.
The lipidome is a subset of the metabolome, like amino acids, sugars and nucleic acids, but lipidomics has emerged as an independent field because the structural and functional diversity and high endogenous abundance of lipids make organismal lipidomes complex [87,88]. Integrated metabolomic and lipidomic analyses are normally applied to understand cellular mechanisms and to reveal signatures of human diseases [88,89]. Wang et al. (2019) [88] summarized previous studies on the roles of lipids and metabolites in diseases and concluded that integrated analysis of metabolomics and lipidomics was critical for revealing cellular biology and disease pathology. Acharjee et al. (2016) [89] used a machine learning approach to integrate metabolomic, lipidomic and clinical data. They showed that the lipidomic data were the most predictive of the response to different doses and then established the relationships of the metabolic and lipidomic data with aspartate aminotransferase [89].

Multiple Integrated Omics−Metabolomic Analysis
In a previous study, integrated transcriptomic, proteomic and metabolomic analysis was used to investigate how overexpression and inhibition of miR-223 affect gene regulation in the cytoplasm of a monocyte-macrophage cell line [90]. The authors characterized the three-layer omics responses to miR-223 modulation and found that altering miR-223 changed the expression of genes (CARM-1, Ube2g2, Cactin and Ndufaf6) during macrophage differentiation and osteoclastogenesis, as well as the metabolic profile of the cells, potentially influencing their apoptotic and proliferative states [90]. Jamil et al. (2020) [91] also proposed three levels of integration for transcriptomic-proteomic-metabolomic data: element-based (e.g., correlation and clustering), pathway-based (e.g., pathway and co-expression) and more complex mathematical-based levels. Frau et al. (2019) [92] were the first to integrate metabolome, microbiome and mycobiome data in Crohn's disease (CD), aiming to investigate the correlation of fungal metabolites with fungal species in CD patients; they identified which microorganisms were likely active in CD and which produced the metabolites of interest.

Methods and Tools Applied in Metabolomics Analysis
Current metabolomic analysis methods and tools are widely applied for metabolic biomarker detection, cluster classification, pathway and network identification and two-layer data integration, but most are limited to metabolomics itself (Table 2). Statistical methods and tools for multiple-layer integration are still needed for further integrated analysis of metabolomics with other omics. The important features and working environments of current tools are listed in Table 2.
Generally, a linear regression model that fits phenotypes (e.g., RFI) as covariates is used to identify significant metabolites [9,80]. An elastic net regularization model has also been applied to relate metabolites to microbial community features [86]. The analysis tools are available as web applications or as R packages, such as MetabR, glmnet and IntLIM [79,93,94].
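The single-metabolite regression can be sketched as follows: each metabolite is regressed on the phenotype (here RFI), and the t statistic of the slope is used to rank metabolites. The metabolite values, RFI values and function names are hypothetical. A real analysis would add fixed effects (e.g., breed or batch), convert t statistics to p-values and correct for multiple testing.

```python
import math
from typing import Dict, List, Tuple


def regress_on_phenotype(metabolite: List[float], phenotype: List[float]) -> Tuple[float, float]:
    """OLS fit of metabolite = a + b * phenotype; returns (b, t statistic of b)."""
    n = len(metabolite)
    mx = sum(phenotype) / n
    my = sum(metabolite) / n
    sxx = sum((x - mx) ** 2 for x in phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(phenotype, metabolite))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(phenotype, metabolite))
    s2 = sse / (n - 2)  # residual variance; two parameters were estimated
    if s2 == 0.0:
        return slope, math.copysign(math.inf, slope)
    return slope, slope / math.sqrt(s2 / sxx)


def screen_metabolites(metabolites: Dict[str, List[float]],
                       phenotype: List[float]) -> Dict[str, Tuple[float, float]]:
    """Apply the single-metabolite regression to every metabolite, ranked by |t|."""
    results = {name: regress_on_phenotype(vals, phenotype) for name, vals in metabolites.items()}
    return dict(sorted(results.items(), key=lambda kv: -abs(kv[1][1])))


# Hypothetical RFI values (kg DM/day) and metabolite levels for six animals
rfi = [-0.8, -0.4, -0.1, 0.2, 0.5, 0.6]
metabolites = {
    "citrate": [2.1, 2.4, 2.6, 2.9, 3.3, 3.2],
    "succinic acid": [1.0, 0.7, 1.2, 0.9, 1.1, 0.8],
}
for name, (b, t) in screen_metabolites(metabolites, rfi).items():
    print(f"{name}: slope={b:.2f}, t={t:.2f}")
```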
To cluster metabolomic data, principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA) are commonly used, either in R packages or in tools such as MetaboAnalyst and VOCCluster [93,95]. Tools with new features have also been developed for clustering analysis, such as interactive time-series cluster analysis (R package MetaboClust) [96] and automated hierarchical clustering (R package hcapca) [97]. The Bayesian network method (BNM) can model interactions among metabolites to identify important metabolites in the optimal network, as demonstrated by Rogers et al. (2014) [98]. The predictive accuracy of BNM, measured as the area under the ROC curve convex hull (AUCCH), was higher than that of PLS-DA, which probably overfitted the data, as indicated by a permutation test [99].
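PCA scores, which are typically plotted to look for sample clusters, can be computed as in this minimal Python sketch. It finds only the first principal component, by power iteration on the covariance matrix, and assumes mean-centred but unscaled data (metabolomics workflows often also autoscale each metabolite). It also assumes that the all-ones start vector is not orthogonal to PC1 and that the data are not constant. The intensity matrix is hypothetical.

```python
import math
from typing import List, Tuple


def first_principal_component(data: List[List[float]],
                              n_iter: int = 500) -> Tuple[List[float], List[float]]:
    """Return (loading vector, sample scores) for PC1 via power iteration."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix of the metabolites (columns)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to PC1
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


# Hypothetical intensities: 4 samples (rows) x 2 metabolites (columns)
loading, scores = first_principal_component([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
print([round(x, 3) for x in loading], [round(s, 3) for s in scores])
```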
Notably, metabolomic data are complex and nonlinear, so machine learning methods based on random forest, support vector machine, artificial neural network and other algorithms are applied to interpret them [100][101][102][103][104]. For example, Ghaffari et al. (2019) [105] used machine learning methods to reveal 12 significant metabolites and 2 meaningful pathways in normal versus over-conditioned cows. Such methods, and the tools developed for them, can also be used for biomarker detection, classification, biochemical pathway identification and multi-omics integration [103].
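Classifiers such as these are commonly compared by the area under the ROC curve (AUC). The Python sketch below computes the empirical AUC as the probability that a randomly chosen positive animal scores higher than a randomly chosen negative one, with ties counted as one half. It is not the convex-hull variant (AUCCH) used for the Bayesian network comparison, and the scores are hypothetical.

```python
from typing import Sequence


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores: over-conditioned cows (positives) vs normal cows
over_conditioned = [0.9, 0.8, 0.7]
normal = [0.1, 0.2, 0.75]
print(empirical_auc(over_conditioned, normal))  # 8 of 9 pairs ranked correctly
```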
Some transcriptomic expression and co-expression analysis approaches can also be applied to metabolomic data by treating metabolite values as expression levels, for example, the R packages limma and WGCNA for analyzing metabolites and their interaction networks [106][107][108][109][110]. WGCNA (weighted correlation network analysis) constructs a similarity matrix from Pearson correlation coefficients between metabolite profiles, uses it to build a network, and then identifies metabolically relevant key modules [107].
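The first step of WGCNA applied to metabolites can be sketched as follows: Pearson correlations between metabolite profiles are turned into an unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^β, and each metabolite's connectivity is the sum of its adjacencies to the others. β = 6 is assumed here as a common default for unsigned networks; in practice β is chosen with the scale-free topology criterion. The topological overlap and module detection steps of WGCNA are not shown, and the profiles and names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles: Dict[str, List[float]],
                             beta: int = 6) -> Dict[str, Dict[str, float]]:
    """Unsigned adjacency |cor|^beta between every pair of metabolites."""
    names = list(profiles)
    return {a: {b: abs(pearson(profiles[a], profiles[b])) ** beta for b in names} for a in names}


def connectivity(adj: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Whole-network connectivity: sum of adjacencies to all other metabolites."""
    return {a: sum(w for b, w in row.items() if b != a) for a, row in adj.items()}


# Hypothetical metabolite profiles across four samples
profiles = {
    "met_A": [1.0, 2.0, 3.0, 4.0],
    "met_B": [2.0, 4.0, 6.0, 8.0],
    "met_C": [4.0, 3.0, 2.0, 1.0],
    "met_D": [1.0, -1.0, -1.0, 1.0],
}
adj = soft_threshold_adjacency(profiles)
print({k: round(v, 3) for k, v in connectivity(adj).items()})
```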

Implications and Further Potentialities
Metabolomic analysis, alone or integrated with other omics data, enables the measurement of dynamic metabolic responses, the identification of biological metabolic markers, the revelation of potential genetic architecture and the understanding of systemic networks. In cattle, metabolome clusters, relevant candidate metabolites, metabolic pathways and potential systemic networks have been studied with promising and meaningful results. Figure 3 summarizes the previous results of single-layer metabolomic analysis and the potential two-layer integration analyses discussed above. We believe these summarized results will be useful for applying metabolomics on cattle farms. Multiple-omics integration is likely to become critical in further studies, especially the integration of genomics, epigenomics, transcriptomics, proteomics, microbiomics and lipidomics to reveal metabolism-related mechanisms, identify multiple-layer biomarkers and improve genomic predictions. Theoretically, metabolite QTLs, DNA methylation QTLs, expression QTLs (eQTLs), protein QTLs (pQTLs) and microbe QTLs could be integrated at the genome level by constructing a multiple-omics network that uses these QTLs as joints. Meanwhile, genes or proteins annotated to QTLs affecting metabolite concentrations, which are regulated by genetic or epigenetic variants, could be connected through biological pathways (Figure 3). A network of genomics, epigenomics, transcriptomics, proteomics and metabolomics joined from the genomic level to the pathway level can thus be derived. However, four-layer integrations (QTLs, CpG-regulated genes/proteins, pathways and metabolites) are difficult to identify with the relatively small sample sizes currently available, so larger populations are still needed for further validation in cattle, focusing on one promising economic trait.
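The idea of using QTLs as joints between layers can be illustrated with a small Python sketch that groups QTL hits from different layers by SNP and keeps SNPs shared by at least two layers. All SNP IDs, layer labels and features (including the gene name GENE_X) are hypothetical. A real implementation would also need to match QTLs by genomic region (e.g., linkage disequilibrium blocks) rather than by identical SNP IDs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def join_qtl_layers(layers: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Dict[str, List[str]]]:
    """Group QTL hits by SNP and keep SNPs shared by at least two omics layers."""
    by_snp: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for layer, hits in layers.items():
        for snp, feature in hits:
            by_snp[snp][layer].append(feature)
    return {snp: {layer: sorted(feats) for layer, feats in per_layer.items()}
            for snp, per_layer in by_snp.items() if len(per_layer) >= 2}


# Hypothetical QTL hits per layer: (SNP ID, associated feature)
layers = {
    "mQTL": [("snp_001", "citrate"), ("snp_002", "succinate")],
    "eQTL": [("snp_001", "GENE_X")],
    "meQTL": [("snp_001", "cpg_0001"), ("snp_003", "cpg_0002")],
}
print(join_qtl_layers(layers))
```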

Conclusions
In summary, this review collates useful metabolic information from previous research, including the characterization of metabolomic profiles, applications of metabolomics, integrated analysis of metabolomics with other omics, methods and tools for metabolomic analysis, and further potentialities and implications in cattle. This information may contribute to improved production, reduced disease and more efficient farming for economically important traits in cattle.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/metabo11110753/s1, Supplementary Table S1: Significant metabolic pathways (FDR < 0.05) for the metabolites associated with feed efficiency, using Bos taurus as the library. Supplementary Figure S1: The connections of the metabolites glutamate, glycine, lysine, phenylalanine, threonine and tyrosine (red) enriched in the aminoacyl-tRNA biosynthesis pathway.