Utilising a Clinical Metabolomics LC-MS Study to Determine the Integrity of Biological Samples for Statistical Modelling after Long Term −80 °C Storage: A TOFI_Asia Sub-Study

Biological samples of lipids and metabolites degrade after extensive years in −80 °C storage. We aimed to determine if associated multivariate models are also impacted. Prior TOFI_Asia metabolomics studies from our laboratory established multivariate models of metabolic risks associated with ethnic diversity. Therefore, to compare multivariate modelling degradation after years of −80 °C storage, we selected a subset of aged (≥5-years) plasma samples from the TOFI_Asia study to re-analyze via untargeted LC-MS metabolomics. Samples from European Caucasian (n = 28) and Asian Chinese (n = 28) participants were evaluated for ethnic discrimination by partial least squares discriminative analysis (PLS–DA) of lipids and polar metabolites. Both showed a strong discernment between participants ethnicity by features, before (Initial) and after (Aged) 5-years of −80 °C storage. With receiver operator characteristic curves, sparse PLS–DA derived confusion matrix and prediction error rates, a considerable reduction in model integrity was apparent with the Aged polar metabolite model relative to Initial modelling. Ethnicity modelling with lipids maintained predictive integrity in Aged plasma samples, while equivalent polar metabolite models reduced in integrity. Our results indicate that researchers re-evaluating samples for multivariate modelling should consider time at −80 °C when producing predictive metrics from polar metabolites, more so than lipids.


Introduction
Metabolomics analyses provide a comprehensive profiling of lipids and polar metabolites in biological samples, allowing researchers to identify changes in metabolic pathways and identify biomarkers associated with conditions such as metabolic syndrome and type 2 diabetes (T2D) [1][2][3].
In 2021, approximately 537 million people worldwide were reported as having diabetes.Furthermore, 118 million of these cases were Chinese adults aged 20-79 years, making China the country with the highest number of adults with diabetes in the world [4].The prevalence of diabetes in China has increased rapidly over the past few decades, driven by a combination of factors, including changes in lifestyle, an aging population, and increased urbanization [5,6].In particular, the prevalence of T2D amongst the Chinese population has been linked with the "Thin on the Outside Fat on the Inside" (TOFI) phenotype, where lean individuals present with visceral (VAT) and ectopic adipose tissue deposited in and around key organs such as the liver and pancreas [7][8][9].Researchers have shown body mass index (BMI) to be a poor predictor of VAT and organ fat deposition and, in turn, a weaker indicator of T2D development within susceptible individuals [9][10][11].Rather, researchers have established that the ratio of VAT to subcutaneous adipose tissue (VAT/SAT) provides for a more accurate predictor of an individual's T2D susceptibility [12,13].Previously published TOFI_Asia studies from our laboratory have monitored VAT/SAT ratios to further discern biological disparities between Asian Chinese and European Caucasian cohorts, modelling fasting plasma glucose (FPG) and associated metabolic markers, plus lipids and polar metabolite features for prediction of T2D risk [11,14,15].This statistical and prediction modelling provides researchers with a non-invasive means to characterize complex conditions, such as the TOFI phenotype via easily accessible sample types.
Due to the complex nature of metabolomics analyses, including lack of standardization in sample collection, extraction protocols, instrument dependent parameters, and bioinformatics processing, inconsistencies and difficulties can arise when comparing results across studies, or evaluating prediction model outcomes in follow-up studies [16][17][18].Metabolomics studies can be largely impacted by the number of detectable features, as certain structures may be missed across studies due to instrumental or environmental factors [19,20].Metabolomics samples stored in freezers inevitably degrade over time due to temperature fluctuations, freeze-thaw cycles, and natural degradation of at-risk biomolecules [21,22].Constituent proteins and nucleic acids such as DNA and RNA can degrade over time due to residual enzymatic activity and exposure to light, while lipids and polar metabolites can undergo oxidative degradation with exposure to the surrounding atmosphere [23,24].Additionally, biological samples stored in freezers may be subject to contamination by microorganisms or other foreign materials, leading to changes in sample composition and compromised downstream analyses.As such, the rate and degree of sample degradation can depend on a multitude of factors, the type of sample, the storage conditions, and the length of time in storage [18,21,22].
Multivariate modelling is essential for dealing with large information rich datasets coming from clinical omics studies.Techniques such as principal component analysis (PCA) can identify and illustrate key sources of variation and clustered patterns among cohorts [25,26], while partial least squares discriminative analysis (PLS-DA) can formulate a weighted prediction towards a discrimination model within a cohort, highlighting the variables of most importance in establishing a classification [27].While the strength of a PLS-DA model can be evaluated by goodness of fit (R2) and goodness of prediction (Q2) values, the validity of a model's performance and its ability to predict a desired classification requires further due diligence [28].For instance, receiver operating characteristic (ROC) curves can provide clinicians and data scientists with a useful means for evaluating the performance of such classification model's, illustrating the trade-off between true positive and false positive rates across a range of thresholds [29,30].The validity of a models performance can be visualized with a ROC curve and its calculated area under the curve (AUC) can quantify it [31].
Given the limited life span of liquid chromatography and mass spectrometry components, its common that a laboratories instrumentation and protocols change over years.In 2014, the metabo-RING initiative addressed concerns for possible discrepancies when implementing different metabolomics platforms for the same model.They concluded that different machines and methods for metabolomics are sufficiently comparative when modeling from the same set of samples [32].While this study provided confidence in comparing metabolomics profiles across our different platforms, our concerns with the possible effects of long term −80 • C storage on metabolomics models remained.
We aimed to statistically project lipid and polar metabolite LCMS features before (Initial) and after (Aged) 5-year −80 • C storage to our previously published TOFI_Asia ethnicity (Caucasian vs. Chinese) model [15], to compare the preservation of predictive integrity among plasma samples.The comparison of multivariate models was achieved with PCA and PLS-DA modelling procedures, and further evaluation of the multivariate models was achieved by comparing the ROC curve and prediction error rates of each given model.

Study Cohort
The previously published TOFI_Asia studies were conducted at the Human Nutrition Unit, University of Auckland, New Zealand.All participants self-reported both parents of the same ethnic descent (i.e., European Caucasian or Asian Chinese) according to ethnic group profiles established by Statistics New Zealand [33].Participants were recruited for both genders across a wide range of ages (20-70 years) and BMI (20-45 kg/m 2 ) and were either normo-glycemic or had prediabetes based on the American Diabetes Association (ADA) criteria [34].Exclusion criteria included significant weight gain or loss (>10%) in the previous 3 months, prior bariatric surgery, pregnancy, breastfeeding, current glucoserelated medications (e.g., glucocorticoids) and significant current/prior history of disease including T2D.
A detailed description of the TOFI_Asia study cohort and relevant protocols can be found elsewhere [11,14,15].From the total cohort, we randomly selected 28 European Caucasian and 28 Asian Chinese participants of equal distribution in gender (14 males and 14 females within each ethnic group) for the current sub-study.All sub-study participants had anthropometry and clinical biomarker data available and were categorized as normoglycemic (n = 33) or with prediabetes (n = 23) based on FPG.

Metabolomic Analysis 2.2.1. Chemicals and Reagents
Chemical solvents used for sample preparation and LC-MS analysis (methanol, isopropanol, formic acid, chloroform and acetonitrile) were purchased from Thermo Fisher Scientific (Auckland, New Zealand).Ammonium formate (Fluka™, HPLC grade) was obtained from Sigma-Aldrich (Auckland, New Zealand).Milli-Q ® ultrapure water was purchased from Merck Millipore (Bedford, MA, USA).All chemicals were of LC-MS grade except chloroform, which was of analytical grade.

Sample Preparation
Both lipids and polar metabolites were extracted from our Initial TOFI_Asia plasma samples simultaneously using the Bligh and Dyer biphasic method [35].Briefly, 100 µL plasma was mixed with 800 µL pre-chilled (−20 • C) CHCl 3 :MeOH (1:1, v/v), agitated for 30 s and stored at −20 • C for 60 min to allow protein precipitation.An additional 400 µL H 2 O was added to the tube, followed by 30 s of vortex-mixing and 10 min of 11,000 rpm centrifugation at 4 • C (Eppendorf Centrifuge 5427 R, Eppendorf, Hamburg, Germany).Subsequently, 200 µL of the upper aqueous layer and 200 µL of the lower organic layer were transferred into two separate 2 mL microcentrifuge tubes to isolate the extracted polar metabolites and lipids, respectively.Resulting extracts were dried down under a nitrogen stream and stored at −80 • C. On the day of instrumental analysis, dried polar extracts were reconstituted in 200 µL ACN:H 2 O (1:1, v/v) and lipid extracts in modified Folch solution (CHCl 3 :MeOH:H 2 O, 66:33:1, v/v/v).
A monophasic protocol for lipid and polar metabolite extractions was implemented with the Aged TOFI_Asia plasma samples, as previously outlined [36,37].For the extraction of lipids, plasma samples were pre-thawed at 4 • C, agitated for 30 s and 10 µL was mixed with 100 µL of pre-chilled (−20 All plasma samples were randomized into sectioned batches to run across consecutive days to account for batch-to-batch variation; pooled quality control (QC) samples were prepared by pooling aliquots from extracted samples, at the two time points, into a 15 mL falcon tube to mix, and subsequently distributed into a series of HPLC vials for intermittent QC injections after every ten samples across each study's run order.A series of blank samples were prepared by substituting plasma for H 2 O within an extraction tube, which was subject to each respective extraction procedure, for incorporation into their relative run at the time point.

Instruments and Conditions
External mass calibrations of the LCMS instruments prior to sample analyses were performed by flow injection of the calibration mix solution according to respective manufacturer's instructions.
Initial TOFI_Asia lipids were analyzed with an Accela 1250 quaternary UHPLC pump coupled to a Q-Exactive hybrid quadrupole-Orbitrap (LTQ-Orbitrap) mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA).Chromatographic separation was carried out on a Acquity CSH™ C18 1.7 µm, 2.1 mm × 100 mm column (Waters, Milford, MA, USA) using the following solvent system: A = acetonitrile/H 2 O (6:4) with 10 mM ammonium formate and 0.1% formic acid, and B = isopropanol/acetonitrile (9:1) with 10 mM ammonium formate and 0.1% formic acid.An injection volume of 2 µL reconstituted sample was used and a mobile phase flow rate of 600 µL/min was implemented.A summary of the HPLC gradient program used is outlined in the table below (Table 1: Gradient programs).High resolution data (resolution 70,000) was acquired by full scan from m/z 200-2000 with source voltage of 3500 V electrospray ionization positive mode (ESI+) or −3600 V ESI negative mode (ESI−), capillary temperature of 275 • C, and sheath, auxiliary and sweep gas flow rates of 40, 10 and 5 arbitrary units, respectively.Data-dependent MS2 data were collected with a mass resolution set to 35,000 recording a mass range of m/z 200-2000 and maximum trap fill time of 250 ms (full scan mode) or 120 ms (MS2 scan mode).The isolation window of selected MS1 scans was ± 1.5 m/z with a normalized collision energy of 30 units.Initial TOFI_Asia polar metabolites were analyzed with an Accela 1250 quaternary UHPLC pump coupled to an Exactive Orbitrap mass spectrometry (Thermo Fisher Scientific, Waltham, MA, USA).Chromatographic separation was carried out at 25 • C on a SeQuant ® ZIC ® -pHILIC 5 µm 2.1 mm × 100 mm column (Merck, Darmstadt, Germany) using the following solvent system: A = H 2 O with 10 mM ammonium formate, B = ACN with 0.1% formic acid.An injection volume of 2 µL reconstituted sample was used and a mobile phase flow rate of 250 µL/min was implemented.A summary of the HPLC gradient program used is outlined in the Supplementary Materials (Supplementary Table S1).High resolution data (resolution 25,000) were acquired by full scan from m/z 55 to 1100 with source voltage of 4.0 kV for ESI+ and −4.0 kV for ESI−, capillary temperature of 325 • C, and sheath, auxiliary, and sweep gas flow rates of 40, 10, and five arbitrary units, respectively.Aged TOFI_Asia lipids were analyzed with a Shimadzu Nexera-x2 UHPLC system coupled to Shimadzu LCMS-9030 quadrupole time-of-flight (Q-TOF) mass spectrometer (Shimadzu Scientific Instruments, Columbia, MD, USA).Chromatographic separation was carried out on a Acquity CSH™ C18 1.7 µm, 2.1 mm × 100 mm column (Waters, Milford, MA, USA) using the following solvent system: A = H 2 O/acetonitrile/isopropanol (5:3:2) with 10 mM ammonium formate, and B = H 2 O/acetonitrile/isopropanol (1:9:90) with 10 mM ammonium formate.An injection volume of 4 µL of sample was used and a mobile phase flow rate of 400 µL/min was implemented.A summary of the HPLC gradient program used is outlined (Supplementary Table S1: Gradient programs).High resolution data (resolution 30,000) measured full MS1 spectra from 250 to 1250 m/z across the entire chromatogram and collecting independent acquisition (DIA) data in 20 m/z windows from 300 to 1100 m/z, with a 0.6 s cycle time and collision energy of 25 normalized collision energy units.The source voltage was 4.0 kV for ESI+ and −4.0 kV for ESI, with a nebulizing gas flow of 2.0 L/min, heater gas flow of 10 L/min, interface temperature of 300 • C, drying gas flow of 10 L/min, desolvation line temperature of 250 • C, and heater block temperature of 400 • C. All drying and collision gasses used were nitrogen.
Aged TOFI_Asia polar metabolites were analyzed with a Shimadzu Nexera-x2 UHPLC system coupled to Shimadzu LCMS-9030 Q-TOF mass spectrometer (Shimadzu Scientific Instruments, Columbia, MD, USA).Chromatographic separation was carried out on a Thermo Accucore HILIC 2.6 µm, 2.1 × 100 mm column (Thermo Fisher Scientific, USA) using the following solvent system: A = H 2 O with 10 mM ammonium formate B = acetonitrile with 0.1% formic acid.An injection volume of 4 µL of sample was used and a mobile phase flow rate of 400 µL/min was implemented.A summary of the HPLC gradient program used is outlined in a table below (Table 1: Gradient programs).High resolution data (resolution 30,000) measured full MS1 spectra from 70 to 1000 m/z across the entire chromatogram and collecting DIA data in 20 m/z windows from 70 to 900 m/z, with a 0.6 s cycle time and collision energy of 25 normalized collision energy units.The source voltage was 4.0 kV for ESI+ and −4.0 kV for ESI, with a nebulizing gas flow of 2.0 L/min, heater gas flow of 10 L/min, interface temperature of 300 • C, drying gas flow of 10 L/min, desolvation line temperature of 250 • C, and heater block temperature of 400 • C. All drying and collision gasses used were nitrogen.

Metabolomics Data Processing
Initial samples' lipid and polar metabolite raw datafiles were converted to mzXML format using ProteoWizard tool MSconvert (v 3.0.1818).Open mzXML data files were pre-processed for features by untargeted peak filtering, peak alignment, and peak filling parameters with the XCMS package (v3.0.2) in the R programming environment (v3.2.2) [38].Features not detected in 100% of the QC samples were excluded from the analysis and resulting extracted ion chromatograms were manually examined to filter poorly integrated peaks generated by the diffreport function.Signal drift and batch effects were corrected for by LOESS algorithm in the online W4M Galaxy environment, and feature filtering with a <30% coefficient of variation limit among QC samples was applied [39].
Aged samples lipid and polar metabolite raw datafiles were converted to centroid mzML format using the Shimadzu file converter and then explored using the open access software package MS-DIAL [40].MS-DIAL was used to perform peak detection, retention time alignment, grouping, gap filling, and run order and batch correction using the LOW-ESS algorithm.MS-DIAL was also used to search and annotate the aligned peaks found in the DIA MS/MS spectral data collected.The lipidomics results were searched against the built-in lipid library containing 257,000 in silico generated MS/MS lipid fragmentation spectra, and Metabolomics results were searched against the open access FeihnLib libraries and an in-house library of putative isotope ratio outlier analysis (IROA) standards [41,42].QC samples were used for a loess adjustment of any run-order effects or loss in instrumental performance.The data matrix was exported for further filter-ing of unreliably measured peaks relative to QC samples via relative standard deviation (RSD > 0.3) and the resultant data matrix was used for downstream statistical analyses.

Statistical Analysis
To determine if the quality of a statistical model is preserved among biological samples after long term −80 • C storage, we chose to replicate the ethnicity (European Caucasian vs. Asian Chinese) modeling presented in prior TOFI_Asia metabolomics studies [14,15].All lipid and polar metabolite data matrices were subject to k-nearest neighbors imputation for missing data points and were log transformed, summary scaled and mean centered to normalize the matrix in preparation for multivariate statistics [43].
PCA and PLS-DA were performed in SIMCA software version 16 (Sartorius, Umea, Sweden).All cross validations and sparse PLS-DA (sPLS-DA) procedures were performed within the Multivariate INTegrative (MINT) mixOmics package in R programming environment (v3.2.2) [28,44].Each sPLS-DA implemented a least absolute selection and shrinkage operator (LASSO) penalization when tuning the variable selection parameters.Each resulting sPLS-DA model's predictive performance was evaluated with ROC curves, as briefly outlined [31].By setting a range of thresholds to evaluate against the sPLSDA tuned Ethnicity models, false positive rates (FPR = FN FP+TN ) and true positive rates (TPR = TP TP+FN ) were calculated and plotted across the y and x-axis, respectively, forming a ROC graph for each Initial and Aged lipid and polar metabolite dataset for the prediction of ethnicity, and an AUC value was calculated for each ROC curve [29].By using the pre-tuned sparse parameters to store a prediction score against a single component for each model, a confusion matrix was established and prediction error scores were calculated, providing a quantifiable comparison of each sPLS-DA prediction model's performance [45].

General Trends
Equal parts male (n = 14) to female (n = 14), Caucasian (n = 14) and Chinese (n = 14) TOFI_Asia study participants, aged 18-68 years, with a BMI range between 21-37 kg/m 2 , had TOFI_Asia LCMS metabolomics data retrieved (Initial), and their respective -80 • C plasma samples thawed and re-analysed via current LCMS protocols (Aged) for comparative modelling.A total of 188 lipids and 78 polar metabolite features were identified from all Initial and Aged plasma samples when comparing each LCMS feature by their mass charge value (equal to 2 decimal points) and their relative retention time value (Supplementary Tables S1 and S2).
Implementation of PCA on both lipid and polar metabolite features showed a weak but detectable visual separation between Caucasian (green symbols) and Chinese (blue symbols) participants when projecting each Initial and Aged dataset to multiple components (Supplementary Figure S1).Visual separation by ethnicity was significantly improved with PLS-DA of all lipid and metabolite models (Figure 1).These results provided confidence in ethnicity as a valid choice for evaluating the degradation or preservation of multivariate models among the TOFI_Asia studies outcomes.The Aged lipid and Aged polar metabolite PCA models presented higher cumulative R2X values relative to their respective Initialomic model values, suggesting a moderate increase in global variation among the cohort's lipid and polar metabolites when the samples were thawed from −80 • C after five years (Table 1).
among the cohort's lipid and polar metabolites when the samples were thawed from −80 °C after five years (Table 1).

The Maintenance of Initial vs. Aged Discriminative Potential of TOFI_Asia Ethnicity Models
PLS-DA calculated cumulative R2Y values decreased in both Aged lipid and Aged polar metabolite ethnic discrimination models, relative to their Initial model projections (Table 1).This indicated a minor decrease in strength for predicting the European Caucasian vs. Asian Chinese model among the Aged samples features.Regardless, each model's performance value and score plot indicates strong separation of participants by ethnicity (Figure 1).By implementing the MixOmics block-sPLS-DA function (MINT), we tuned and evaluated the Initial and Aged plasma samples collectively to further evaluate the discrimination of the TOFI_sub-cohort by ethnicity (Figure 2A).

The Maintenance of Initial vs. Aged Discriminative Potential of TOFI_Asia Ethnicity Models
PLS-DA calculated cumulative R2Y values decreased in both Aged lipid and Aged polar metabolite ethnic discrimination models, relative to their Initial model projections (Table 1).This indicated a minor decrease in strength for predicting the European Caucasian vs. Asian Chinese model among the Aged samples features.Regardless, each model's performance value and score plot indicates strong separation of participants by ethnicity (Figure 1).By implementing the MixOmics block-sPLS-DA function (MINT), we tuned and evaluated the Initial and Aged plasma samples collectively to further evaluate the discrimination of the TOFI_sub-cohort by ethnicity (Figure 2A).
By using the LASSO penalty, MINT analysis of the combined lipidomics dataset reduced the number of variables required to classify ethnicity down to eight after 10-fold cross validation and parameter tuning.The refined sPLS-DA model revealed that the first component (x-variate 1) could explain considerably more co-variance by 17% across the x-axis than the second component (x-variate 2) with only 6% along the y-axis.In contrast, both components for the combined polar metabolites model explained relatively more co-variance between the first and second component with 13% and 16% explained covariance, respectively.The polar metabolites MINT model was a result of 40 LASSO selected variables classifying ethnicity, indicating more global variation among participants' metabolites other than those contributing to the ethnicity model, irrespective of Initial or Aged plasma sampling.By using the LASSO penalty, MINT analysis of the combined lipidomics dataset reduced the number of variables required to classify ethnicity down to eight after 10-fold cross validation and parameter tuning.The refined sPLS-DA model revealed that the first component (x-variate 1) could explain considerably more co-variance by 17% across the x-axis than the second component (x-variate 2) with only 6% along the y-axis.In contrast, The degree of disparity among the lipid and polar metabolites' datasets was further evaluated by splitting the MINT analysis into their respective Initial and Aged study panels (Figure 2B).Splitting the sPLS-DA MINT lipidomics datasets into their respective Initial and Aged analyses showed a clustering of participants within the Aged dataset with a weak reduction in co-variance along X-variate 2. This clustering of participants with Aged plasma sampling was apparent within the polar metabolite's panels, but the Aged metabolites' dataset presented a strong increase in variation across both X-variate axes, relative to their Initial participants' projections.

Evaluation of Initial and Aged TOFI_Asia Ethnicity Prediction Models
After penalty tuning and establishing each Initial and Aged sPLS-DA model for ethnic discrimination, we chose to evaluate the predictive quality of each trained model with ROC curves (Figure 3).ROC curves and their calculated AUC values indicated a greater performance in predicting the cohort's ethnic discrimination from both Initial lipid and Initial polar metabolite models relative to their respective Aged models.Additionally, ROC analysis determined that both Initial and Aged lipidomic models were stronger prediction models for ethnic discrimination than their polar metabolite models.Further evaluation of each model's potential was achieved by calculating a confusion matrix with modelling error rate scores (Table 2A,B).Both the confusion matrices and subsequent error rates supported a significant decline in the prediction power of polar metabolite markers for discrimination of ethnicity after prolonged −80 • C storage.Surprisingly, the Aged lipidomics profile resulted in a weak increase in prediction, indicating that regardless, lipids are more stable features for prediction modelling, in particular the discrimination of ethnicity among the TOFI_Asia study sub-cohort, relative to polar metabolites.

Discussion
To the best of our knowledge, this is the first study examining the integrity of multivariate models from plasma samples stored at −80 • C for several years, rather than the stability of the individual lipids or polar metabolites used to establish the multivariate models.Our study re-evaluated the TOFI_Asia ethnicity models projected from plasma sample lipids and polar metabolites, as previously published [14,15].A sub-cohort of an equal number of Asian Chinese and European Caucasian TOFI_Asia participants was selected at random from the original cohort, and their plasma samples thawed and re-analysed via our updated TOF mass spectrometry protocols.Lipids and polar metabolites were subjected to multivariate modelling to determine their homogeneous/heterogenous changes, and more importantly, their relative characterization of the TOFI_Asia ethnicity models.In turn, we compared the strength of ethnicity discrimination established from the same plasma samples before and after 5 years of −80 • C storage.
Interestingly, with respect to our calculated PCA, the R2X and Q2 values for both Aged plasma lipid and polar metabolite models indicated an increase in variance relative to respective Initial plasma models.When colored according to participants' "Ethnic" denotation within the metadata, our PCA plots indicated a weak but visible separation of participants' ethnicity among both Initial and Aged lipid and polar metabolite features (Supplementary Figure S1).This degree of unsupervised ethnicity discrimination resulted in a strong ethnicity discernment by PLS-DA (Supplementary Figure S2 and Table S1).Although both lipid and polar metabolite Aged plasma models R2Y values indicated a minor loss in model power when compared to their Initial plasma samples, all models provided moderate to high predictive power (Table 1), in turn providing bioinformaticians with a degree of confidence in −80 • C storage practices.
By implementing a sparse tuning process within the block PLS-DA procedure, we found that the combined polar metabolites model required significantly more features to classify ethnic cohorts, relative to the number required from lipids as modelling features (i.e., 40 vs. 8, respectively).This reflects the overall strength of stability among lipid features relative to polar metabolites when establishing the TOFI_Asia ethnicity models after pro-longed cryogenic conditions.ROC and confusion matrices supported the notion that lipids could discriminate ethnicity more so than polar metabolites, but also showed a reduction in modelling stability from Aged plasma samples.While ROC-AUC values indicated a minor loss in model stability between Initial and Aged plasma lipid models, confusion error rates suggested a minor increase in predictive integrity with the Aged plasma lipid model.In contrast, both ROC-AUC values and confusion error rates indicated a significant collapse in model stability and predictive power with the Aged polar metabolites model relative to the Initial polar metabolites model, and both Initial and Aged lipid models.This suggests a detrimental loss of predictive power from polar metabolite features after additional years of −80 • C storage.
The structural and annotative knowledge of small molecules and lipids has become increasingly comprehensive over the years.The Human Metabolome Database (HMDB) reported a total of 114,100 metabolites, detected and predicted, all with taxonomic and ontological mappings [46].LIPIDMAPS Consortium estimated a conservative number of discrete lipid structures in the order of 200,000, given the many variations in acyl/alkyl chains and structural permutations [47].The collective efforts of geneticists and clinicians have characterized a significant portion of the human metabolome, reporting 41,993 smallmolecule metabolites [48,49].Untargeted clinical studies generally report anywhere from 100-200 metabolites within batched LCMS runs of human plasma [14,50].Previous attempts at covering the human plasma lipidome report approximately 600 lipid species, comprising six main mammalian lipid categories [51].Several studies have evaluated the degradation potential of metabolites, lipid species and lipid classes with respect to freezing, thawing, storage conditions, time on a bench or time within an incubator [52][53][54].Regardless of the inevitable degradation of commonly reported features in clinical settings, we hypothesized that the discriminant degradation of such metabolites and lipids would have a minor impact on our chosen multivariate models, with the outcome being a change in predictive strength, more than a complete loss of predictive capability.
Although our study compared features derived from different methods of extraction (bi-vs.monophasic extraction) and means of acquisition (Orbitrap vs. TOF mass spectrometry), previous reports [55][56][57] along with our in-house optimizing [58] strongly indicated that a change from biphasic to monophasic extraction protocols for lipids and polar metabolites would yield comparable features for the current study outcomes.Furthermore, the metabo-RING study, which evaluated the reliability of untargeted metabolomics data across various laboratories with their respective LCMS instruments and in-house extraction protocols, reported comparable results from the same urine and plasma samples [32].In particular, the groups comparing Q-TOF and LTQ-Orbitrap performances reported no effect due to instrumentation.For more direct comparison of un-annotated LCMS spectra, a group has developed an iterative alignment algorithm to account for retention time drifts between batches when processing features for selection [59].Unfortunately, due to the algorithm's limitations with aligning spectra with severe retention time shifts (and in our case, two different manufacturers chromatography columns and slightly different solvent gradients), our features resulting from two methodologies of LC-MS acquisition were out of scope for current tools alignment capabilities, so required manual curation for final selection, posing the greatest limitation to direct comparison of the data.

Conclusions
In conclusion, the comparison of lipid and polar metabolite features from plasma samples, after varying lengths of −80 • C storage (i.e., post collection vs. 5 years), indicated that the predictive quality of the polar metabolite features were impacted by cryogenic "Ageing".The prediction of the previously established TOFI_Asia ethnicity model showed only a weak to negligible reduction in PLS-DA strength when modelling ethnicity by both lipid and metabolite features.However, more thorough evaluation of the lipid and metabolite models by sparse-PLS-DA and subsequent ROC analysis revealed a greater detriment in metabolite modelling after 5 years at −80 • C storage, while lipid modelling • C) BuOH:MeOH (1:1, v/v).Extraction samples were sonicated for 5 min and centrifuged at 11,000 rpm for 10 min at 4 • C. The supernatants were transferred to HPLC vials and stored at −80 • C. For the extraction of polar metabolites, 50 µL of the pre-thawed plasma was mixed with 450 µL pre-chilled (4 • C) ACN:H 2 O (9:1, v/v).Extraction tubes were sonicated for 5 min and centrifuged at 11,000 rpm for 10 min at 4 • C. Resulting supernatants were transferred to HPLC vials and stored at −80 • C.

Figure 1 .
Figure 1.Partial least squares discriminative analysis of both Initial and Aged lipid and metabolite datasets for Ethnicity discrimination (Caucasian vs. Chinese).For each model, participant symbols were colored according to ethnicity (Caucasian blue symbols, Chinese orange symbols).

Figure 1 .
Figure 1.Partial least squares discriminative analysis of both Initial and Aged lipid and metabolite datasets for Ethnicity discrimination (Caucasian vs. Chinese).For each model, participant symbols were colored according to ethnicity (Caucasian blue symbols, Chinese orange symbols).

Figure 2 .
Figure 2. Sparse tuning of block PLS-DA lipid and polar metabolite models for the discrimination of Ethnicity via the MixOmics MINT analysis.For each model, participant symbols were colored according to ethnicity (Caucasian blue symbols, Chinese orange symbols).(A) Sparse tuned block PLS-DA plots of respective lipid and metabolite Ethnicity models, calculated from the combination of Initial and Age study features as collective discrimination analysis.(B) Sparse tuned block PLS-DA plots of respective lipid and metabolite Ethnicity models as separated by Initial (Study 1) and Aged (Study 2) panels for comparison of discrimination analysis.

Figure 2 .
Figure 2. Sparse tuning of block PLS-DA lipid and polar metabolite models for the discrimination of Ethnicity via the MixOmics MINT analysis.For each model, participant symbols were colored according to ethnicity (Caucasian blue symbols, Chinese orange symbols).(A) Sparse tuned block PLS-DA plots of respective lipid and metabolite Ethnicity models, calculated from the combination of Initial and Age study features as collective discrimination analysis.(B) Sparse tuned block PLS-DA plots of respective lipid and metabolite Ethnicity models as separated by Initial (Study 1) and Aged (Study 2) panels for comparison of discrimination analysis.

Metabolites 2024 , 15 Figure 3 .
Figure 3. Receiver operator curves depicting the predictive accuracy of each metabolomics model for discrimination of ethnicity.The predictive accuracy of receiver operator curves presented as "outcomes" depicting area under the curve (AUC) values.

Figure 3 .
Figure 3. Receiver operator curves depicting the predictive accuracy of each metabolomics model for discrimination of ethnicity.The predictive accuracy of receiver operator curves presented as "outcomes" depicting area under the curve (AUC) values.

Table 1 .
Comparison of Initial vs. Aged lipid and metabolomics multivariate models.

Table 2 .
Confusion matrix and prediction error rates for determining the accuracy of sPLS-DA ethnic discrimination models for Initial and Aged lipidomic and metabolomic datasets.