OS-PCA: Orthogonal Smoothed Principal Component Analysis Applied to Metabolome Data

Principal component analysis (PCA) has been widely used in metabolomics. However, it is not always possible to detect phenotype-associated principal component (PC) scores. Previously, we proposed a smoothed PCA for samples acquired with a time course or rank order, but hypothesis testing to select significant metabolite candidates was not possible. Here, we modified the smoothed PCA as an orthogonal smoothed PCA (OS-PCA) so that statistical hypothesis testing in OS-PC loadings could be performed with the same PC projections provided by the smoothed PCA. Statistical hypothesis testing is especially useful in metabolomics because biological interpretations are made based on statistically significant metabolites. We applied the OS-PCA method to two real metabolome datasets, one for metabolic turnover analysis and the other for evaluating the taste of Japanese green tea. The OS-PCA successfully extracted similar PC scores as the smoothed PCA; these scores reflected the expected phenotypes. The significant metabolites that were selected using statistical hypothesis testing of OS-PC loading facilitated biological interpretations that were consistent with the results of our previous study. Our results suggest that OS-PCA combined with statistical hypothesis testing of OS-PC loading is a useful method for the analysis of metabolome data.


Introduction
Principal component analysis (PCA) [1] and partial least squares (PLS) [2,3] have been widely applied to metabolome data [4]. PCA is an unsupervised method that does not require group information for its computation, whereas PLS has been used as a supervised method. Various multivariate analysis methods that utilize additional information also have been proposed to analyze metabolome data [5]. These methods make it possible to extract features that are especially suited to a specific purpose. Smilde et al. [6] reviewed multivariate analysis methods for time-resolved and longitudinal metabolome data (which they called dynamic metabolomic data analysis), such as smooth-PCA, which combines PCA with smoothness, and analysis of variance (ANOVA) simultaneous component analysis (ASCA) [7,8]. Dynamic PCA was extended by including a probabilistic model-probabilistic dynamic PCA [9]. We previously combined PCA, PLS, and Fisher discriminant analysis (FDA) with smoothness as smoothed PCA, smoothed PLS, and smoothed FDA, and the kernel-based nonlinear extension for this type of data [10]. The smoothed PCA can extract features that are associated with phenotypes such as time course information in the principal component (PC) scores.

Results and Discussion
We applied the OS-PCA method to two real metabolome datasets to verify its usefulness.

Case Study 1: Metabolic Turnover Analysis
In this case study, we used the metabolic turnover data reported by Nakayama et al. [15]. This dataset contains data for three groups: Saccharomyces cerevisiae BY4742 cultured in sucrose-dextran (SD) medium with amino acid supplement and S. cerevisiae X2180 cultured in SD medium with or without amino acid supplement. In all three groups, the culture medium contained 13 C-labeled glucose. The culture fluid was sampled at 0, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, and 7200 s. Metabolome data measured by gas chromatography/mass spectrometry were converted to isotopomer ratios, i.e., the ratio of peak area of metabolites (amino acids) labeled with the 13 C isotope to peak area of metabolites with nonisotopic 12 C.
We performed a PCA for autoscaled data and confirmed the relation of incubation time to metabolic turnover in PC1 (Figure 1). In PC1, the sample order of the score was consistent with that of the incubation time (Figure 1b), which suggests that time course information can be extracted in PC1. However, differences between groups, such as strains and culture mediums, were not detected in PC1 and PC2. The contribution ratios of PC1 and PC2 were 65.96% and 9.596%, respectively, so the cumulative contribution ratio was over 75% for PC1 and PC2. We also calculated PC3, PC4, and PC5 scores ( Figure S1). We confirmed the fluctuation trend against incubation time of X2180 cultured in SD medium without amino acids differed from the trends of BY4742 and X2180 cultured in SD medium with amino acids in PC4 and PC5. However, the contribution ratios of PC4 and PC5 were very small-3.80% and 0.88%, respectively-so we did not consider these PCs in the subsequent analysis.
Metabolites 2021, 11, x FOR PEER REVIEW 3 of 12 of metabolites (amino acids) labeled with the 13 C isotope to peak area of metabolites with nonisotopic 12 C. We performed a PCA for autoscaled data and confirmed the relation of incubation time to metabolic turnover in PC1 ( Figure 1). In PC1, the sample order of the score was consistent with that of the incubation time (Figure 1b), which suggests that time course information can be extracted in PC1. However, differences between groups, such as strains and culture mediums, were not detected in PC1 and PC2. The contribution ratios of PC1 and PC2 were 65.96% and 9.596%, respectively, so the cumulative contribution ratio was over 75% for PC1 and PC2. We also calculated PC3, PC4, and PC5 scores ( Figure  S1). We confirmed the fluctuation trend against incubation time of X2180 cultured in SD medium without amino acids differed from the trends of BY4742 and X2180 cultured in SD medium with amino acids in PC4 and PC5. However, the contribution ratios of PC4 and PC5 were very small-3.80% and 0.88%, respectively-so we did not consider these PCs in the subsequent analysis.
Nakayama et al. [15] applied an ad-hoc transformation to this same data, whereby the average value was subtracted from every incubation time to show the differences between groups. As a result, they could confirm group differences in PC1 but not time course information [15]. Scatter plot of PC scores obtained by PCA of the metabolic turnover data of Nakayama et al. [15]. (a) Scatter plot of first and second PC scores (PC1 and PC2). The contribution ratios (variance) of PC1 and PC2 were 65.96% and 16.76%, respectively. (b) Scatter plot of PC1 score and incubation time. (○) S. cerevisiae BY4742 cultured in SD medium with amino acids (A.A.), (Δ) S. cerevisiae X2180 cultured in SD medium with amino acids, (+) S. cerevisiae X2180 cultured in SD medium without amino acids.
We also performed smoothed PCA and OS-PCA ( Figure 2) for this data with only autoscaling and no ad-hoc transformation. We subjectively set the smoothing parameter κ to 0.1 in the smoothed PCA and to 0.999 (i.e., close to 1) in the OS-PCA, and used the second differential matrix D (2) (see Section 3.1 for details).
The smoothed PC ( Figure 2a) and OS-PC scores ( Figure 2b) were almost the same, although the positive and negative directions were reversed. This result showed that OS-PCA was able to extract the same features as smoothed PCA and confirmed the PCA result as well as the relation of incubation time to metabolic turnover in smoothed PC1 ( Figure  2a) and OS-PC1 ( Figure 2b). However, unlike the PCA result of PC1 and PC2, the fluctuation trend against the incubation time of X2180 cultured in SD medium without amino acids differed from the trends of BY4742 and X2180 cultured in SD medium with amino Figure 1. Scatter plot of PC scores obtained by PCA of the metabolic turnover data of Nakayama et al. [15]. (a) Scatter plot of first and second PC scores (PC1 and PC2). The contribution ratios (variance) of PC1 and PC2 were 65.96% and 16.76%, respectively. (b) Scatter plot of PC1 score and incubation time. ( ) S. cerevisiae BY4742 cultured in SD medium with amino acids (A.A.), (∆) S. cerevisiae X2180 cultured in SD medium with amino acids, (+) S. cerevisiae X2180 cultured in SD medium without amino acids. Nakayama et al. [15] applied an ad-hoc transformation to this same data, whereby the average value was subtracted from every incubation time to show the differences between groups. As a result, they could confirm group differences in PC1 but not time course information [15].
We also performed smoothed PCA and OS-PCA ( Figure 2) for this data with only autoscaling and no ad-hoc transformation. We subjectively set the smoothing parameter κ to 0.1 in the smoothed PCA and to 0.999 (i.e., close to 1) in the OS-PCA, and used the second differential matrix D (2) (see Section 3.1 for details).  (Figure 2c). Together these results show that the PCA extracted only the time course information, but not group differences, within PC1 and PC2, and after ad-hoc transformation [15], the PCA extracted group differences, but not time course information. Smoothed PCA and OS-PCA successfully extracted both time course information and group differences as a major component. These results indicate the usefulness of smoothed PCA and OS-PCA applied to metabolome data with a time course and groups.  [15]. (a) Scatter plot of first and second smoothed PC scores (PC1 and PC2) obtained by smoothed PCA (κ = 0.1) with the second differential matrix D (2) . The contribution ratios (variance) of smoothed PC1 and smoothed PC2 were 33.83% and 9.596%, respectively. (b) Scatter plot of first and second OS-PC scores (OS-PC1 and OS-PC2) obtained by OS-PCA (κ = 0.999) with the second differential matrix D (2) . The contribution ratios (covariance) of OS-PC1 and OS-PC2 were 90.20% and 5.773%, respectively. (c) Scatter plot of OS-PC scores of auxiliary variables. (○) S. cerevisiae BY4742 cultured in SD medium with amino acids (A.A.), (Δ) S. cerevisiae X2180 cultured in SD medium with amino acids, (+) S. cerevisiae X2180 cultured in SD medium without amino acids.
Together these results show that the PCA extracted only the time course information, but not group differences, within PC1 and PC2, and after ad-hoc transformation [15], the PCA extracted group differences, but not time course information. Smoothed PCA and OS-PCA successfully extracted both time course information and group differences as a major component. These results indicate the usefulness of smoothed PCA and OS-PCA applied to metabolome data with a time course and groups.
Nakayama et al. [15] identified Peak-63 as a histidine derivative by narrowing down candidates using the hypothesis that the similarity of isotopomer ratios corresponded to distances on the metabolic pathway map. In the OS-PCA, Peak-63 had the highest OS-PC2 loading followed by Histidine with the second highest. No other statistically significant unidentified peaks were found. These results support those of Nakayama et al., who identified Peak-63 as a histidine derivative. We found that the OS-PC2 score decreased with incubation time only for X2180 cultured in SD medium with amino acids. Because Histidine and Lysine showed statistically significant positive correlations with the OS-PC2 score, we selected Histidine and Lysine as the metabolites that decreased with incubation time only for X2180 cultured in SD medium with amino acids. Because these metabolites were not labeled with the 13 C isotope, Nakayama et al. [15] concluded that they were not synthesized under this condition. Branched-chain amino acids and intermediates of the tricarboxylic acid (TCA) cycle, Isoleucine_2 trimethylsilyl (TMS) and Citric acid + Isocitric acid, were among the top 10 metabolites in OS-PC2 loading, but this result was not statistically significant. This may be because, although the difference in the fluctuation trend against incubation time was shown in OS-PC2, the group separation was not shown clearly.
The metabolites that we focused on in the OS-PCA were partially consistent with those of the previous study [15]. The OS-PCA and smoothed PCA both detected differences in the fluctuation trend with incubation time, but the OS-PCA did not completely separate the groups. This may be because OS-PCA and smoothed PCA are unsupervised, not supervised, methods. To separate groups more clearly, a supervised approach such as PLS also could be used.

Case Study 2: Metabolome Analysis for the Taste of Japanese Green Tea
In this case study, we used some of the metabolome data from the Platform for RIKEN Metabolomics (http://prime.psc.riken.jp/Metabolomics_Software/StatisticalAnalysisOn MicrosoftExcel/index.html, accessed on 4 March 2021) to evaluate the taste of Japanese green tea [16,17]. In the selected dataset, each green tea leaf that ranked 1, 6, 11, 16, 21, 31, 36, 41, 46, and 51 in tasting was measured three times.
We performed a PCA ( Figure 3) and confirmed that the differences among the green tea leaf samples were reflected in PC1 and the partial association with the taste ranking was reflected in PC2; however, no clear association with the taste ranking was detected. We also calculated PC3, PC4, and PC5 scores ( Figure S2), but their associations with the taste ranking were not confirmed. Then, we applied the OS-PCA for repeated measurement data (κ = 0.1) to the same data ( Figure 4). As described in Section 3.3, OS-PC scores for repeated measurement data were calculated by t = Xw x (Figure 4a) as was done in case study 1, whereas OS-PC scores for repeated data of auxiliary variables were calculated by Ms = MXw y (Figure 4b), which is the score for the average of repeated measures. Therefore, the OS-PC scores of auxiliary variables can be regarded as the same value for repeated data.
We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details).
The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely consistent with the samples' taste ranking. Sample order of OS-PC1 score of auxiliary variables that is consistent with the correct order (e.g., taste ranking) is important because OS-PC loading is the correlation coefficient between the OS-PC score of auxiliary variables and each metabolite level, as described in Section 3.4.  We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details).
The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely consistent with the samples' taste ranking. Sample order of OS-PC1 score of auxiliary variables that is consistent with the correct order (e.g., taste ranking) is important because  We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details).
The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely consistent with the samples' taste ranking. Sample order of OS-PC1 score of auxiliary variables that is consistent with the correct order (e.g., taste ranking) is important because  We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details).
The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely consistent with the samples' taste ranking. Sample order of OS-PC1 score of auxiliary variables that is consistent with the correct order (e.g., taste ranking) is important because  Figure S3). For this, we set κ coefficient did not change much for κ = 0.1 to 0.999, and used x D (2) (see Section 3.1 for details). ly reflected the taste ranking (Figure 4a), which suggests that ed metabolites that reflected the taste ranking. Furthermore, PC1 score of auxiliary variables (Figure 4b) was completely taste ranking. Sample order of OS-PC1 score of auxiliary varthe correct order (e.g., taste ranking) is important because  We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details).
The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely    We calculated the correlation coefficient between the OS-PC1 score and the taste ranking to confirm the effect of the smoothing parameter κ ( Figure S3). For this, we set κ = 0.1 because the correlation coefficient did not change much for κ = 0.1 to 0.999, and used the second differential matrix D (2) (see Section 3.1 for details). The OS-PC1 score roughly reflected the taste ranking (Figure 4a), which suggests that the metabolome data included metabolites that reflected the taste ranking. Furthermore, the sample order of the OS-PC1 score of auxiliary variables (Figure 4b) was completely consistent with the samples' taste ranking. Sample order of OS-PC1 score of auxiliary variables that is consistent with the correct order (e.g., taste ranking) is important because To select statistically significant metabolites, statistical hypothesis testing of OS-PC1 loading was performed (Table S2) In a previous study [16], quinic acid, amino acids, and sugars were associated with the taste ranking. The sugars raffinose, arabinose, and galactose showed statistically significant negative correlation with OS-PC1 scores (q < 0.25), which indicated these metabolites were present at high levels in the highly ranked teas. Among the amino acids, serine showed a statistically significant positive correlation with OS-PC1 scores (q < 0.25), which indicated this metabolite was present at a low level in the highly ranked teas. The statistical significance of quinic acid in green tea leaf was not confirmed in the OS-PCA.

Smoothed Principal Component Analysis (PCA)
In a previous study [10], we proposed a smoothed PCA method in which a smoothing term [18] was added to the constraint condition of PCA. The smoothed PCA is formulated as: where X is a mean-centered data matrix with a sample in each row and metabolites in each column; w x is a weight vector; var(t) indicates variance of PC score vector t = Xw x , which is a linear combination of each variables in data matrix X; n is the number of samples; and κ is a smoothing parameter. D is the first or second differential matrix that is set as: The first differential matrix D (1) is (n − g) × n and second differential matrix D (2) is (n − 2g) × n, where n is the number of samples and g is the number of groups. Finally, the smoothed PCA is written as a generalized eigenvalue problem: where λ is an eigenvalue and I is the identity matrix. When κ is set to 0, the smoothed PCA is consistent with the PCA. Theoretical details of the smoothed PCA are given in [10]. In the smoothed PCA, it is difficult to explain the eigenvector w x statistically, so statistically significant metabolites cannot be selected using loadings defined by the eigenvector.

Orthogonal Smoothed Principal Component Analysis (OS-PCA)
We introduced an auxiliary variable s (= Xw y ) and maximized the covariance between t and s instead of the variance of t in PCA as: maxcov(t, s) = (1/n)w x x Xw y s.t. w x w x = 1, (1 − κ)w y w y + κ(Ds) (Ds) = w y Pw y = 1 where w y is a weight vector of auxiliary variable; matrix P is (1 − κ)I + κX D DX; and cov(t,s) indicates covariance between score vectors t and s. This formulation is similar to that of PLS-ROG [14]. The main difference between smoothed PCA and PLS-ROG is that smoothed PCA does not use a response variable, such as group information. In PLS and PLS-ROG, the response variable has an important role when loadings are interpreted statistically. Similarly, the auxiliary variable s of OS-PCA is essential to interpret OS-PC loading statistically so that statistical hypothesis testing of OS-PC loadings can be performed, as explained in Section 3.4. Using the Lagrange multipliers method, Equation (3) was reformulated as the maximization of: where λ x and λ y are Lagrange multipliers. Partial differentiation of Equation (4) with respect to w x and w y followed by a transformation, yields two equations: So that both of these equations can express the eigenvalue problem for w x and w y , we rearranged them as: 1/n 2 X XP −1 X Xw x = λw x 1/n 2 X XX Xw y = λPw y (6) where λ = 4λ x λ y . Like PCA, OS-PCA has a unique solution because it was formulated as an eigenvalue problem for w x , and the i-th weight vector of OS-PCA corresponds to the i-th largest eigenvector in Equation (6). The smoothed PCA proposed previously [10] was formulated as a generalized eigenvalue problem, where the eigenvectors were not orthogonal with each other. For the OS-PCA, the formulation was written as an eigenvalue problem for w x , and the eigenvectors are orthogonal with each other. When κ is set to 0, the matrix P becomes the identity matrix and the two equations of eigenvalue problems for w x and w y in Equation (6) are the same. Therefore, w x and w y are the same eigenvector, which corresponds to a specific eigenvalue. When w x and w y are the same, the two equations of the eigenvalue problems in Equation (5) are the same and consistent with ordinary PCA. The contribution ratio of PCA is associated with the variance of PC scores because it corresponds to the ratio of variance of a specific PC score to the sum of variance of all PCs. Conversely, the contribution ratio of OS-PCA is associated with the covariance between OS-PC scores of explanatory and auxiliary variables. Because the contribution ratios of PCA and OS-PCA are associated with different statistics, they cannot be compared.

OS-PCA for Repeated Measurement Data
Both smoothed PCA and OS-PCA assume that all the samples are ordered. However, one sample may be measured repeatedly to reduce the effect of variability of measurement. In such a case, ordered and unordered samples of repeated measurements will be mixed in the data, so the simplest and most straightforward method for the OS-PCA is to use the averaged data for the repeated measurement. Alternatively, the averaging operation can be combined with the OS-PCA for repeated measurement data and formulated as: maxcov(Mt, Ms) = (1/n)w x X M MXw y s.t. w x w x = 1, (1 − κ)w y w y + κ(DMs) (DMs) = w y Qw y = 1, which is similar to the formulation in Section 3.2. The averaging matrix M is set as: The vector is m 1 = [1/n 1 , 1/n 1 , 1/n 1 , . . . , 1/n 1 ] and each element is the reciprocal of the number of measurement repetitions. Similar to the formulation used in Section 3.2, Equation (7) can be reformulated using the Lagrange multipliers method as: where matrix Q is (1 − κ)I + κX M D DMX. Partial differentiation of Equation (8) with respect to w x and w y yields two equations: These equations are rearranged to express the eigenvalue problem for w x and w y as: As we did for the OS-PCA in Section 3.2, the formulation of OS-PCA for repeated data was written as an eigenvalue problem for w x . OS-PC scores for repeated measurement data were calculated by t = Xw x , whereas OS-PC scores for repeated measurement data of auxiliary variables were calculated by Ms = MXw y , which is the score for the average of repeated measures. Therefore, the OS-PC scores of auxiliary variables can be regarded as the same value for repeated measurement data.
In our previous smoothed PCA [10], we did not consider repeated measurement of the same sample. In the OS-PCA, we applied smoothing for the averaged value of repeated measurements, and the OS-PCA result for repeated measurement data did not affect the order of samples within the repeated measurements.

Statistical Property of OS-PC Loading for Autoscaled Data
To select metabolites using statistical criteria, it is essential to clarify the statistical property of w x . The correlation coefficient between s and x p , the p-th variable of data matrix X, is written as: corr s, x p = cov s, x p / var(s) var x p , where corr(s,x p ) indicates the correlation coefficient between the OS-PC score vectors s and x p . When data matrix X is scaled to zero mean and unit variance for each variable (i.e., autoscaling), Equation (11) is written as: corr s, x p = cov s, x p / var(s) (12) because variance of x p is 1. Then, s = Xw y is substituted into Equation (12) as: corr s, x p = (1/n)w y X Xc/ (1/n)w y X Xw y = c (1/n)X Xw y / (1/n)w y X Xw y , where c is introduced as the column vector in which the p-th element is 1 and the other elements are 0, giving x p = Xc. Then, (1/n)X Xw y = 2λ x w x is substituted into Equation (13) as: corr s, x p = c (2λ x w x )/ (1/n)w y X Xw y = 2λ x w x,p / (1/n)w y X Xw y .
The denominator of Equation (14) is not affected by the p-th variable. Therefore, the eigenvector w x is proportional to the correlation coefficient between s and x p . We defined this statistic of the correlation coefficient between the OS-PC score of auxiliary variables and each metabolite level as OS-PC loading. We set r as corr(s,x p ) and performed statistical hypothesis testing of the correlation coefficient using a t-statistic as: which has a t-distribution with n − 2 degrees of freedom [11,14]. The result is used to select significant metabolites by statistical hypothesis testing of OS-PC loading in OS-PCA as well as PCA.

Statistical Property of OS-PC Loading for Repeated Measurements and Autoscaled Data
The correlation coefficient between averaged score Ms and averaged p-th metabolite levels Mx p is written as: corr Ms, Mx p = cov Ms, Mx p / var(Ms) var Mx p (16) In OS-PCA, the data matrix X is scaled by autoscaling for each metabolite level, whereas the averaged data matrix MX is transformed using autoscaling in OS-PCA for repeated measurement data. This means that the averaged data for repeated measurements are transformed by autoscaling. Then, Equation (16) and (1 − n)X M MXw y = 2λ x w x is substituted into Equation (18) as: corr Ms, Mx p = c (2λ x w X )/ (1/n)w y X M MXw y = 2λ x w x,p / (1/n)w y X M MXw y .
The denominator of Equation (19) is not affected by the p-th variable. Therefore, the eigenvector w x is proportional to the correlation coefficient between averaged score Ms and the averaged level of each metabolite Mx p for repeated measurement data, so statistical hypothesis testing of OS-PC loading for repeated measurement data can be performed in the same way as OS-PCA. The main features of the PCA, smoothed PCA, and OS-PCA methods are summarized in Table 1. Table 1. Main features of the PCA, smoothed PCA, and OS-PCA methods.

Method Equation Eigenvector
Hypothesis Testing

Conclusions
We developed OS-PCA as an improved version of the smoothed PCA. OS-PCA can be used to perform statistical hypothesis testing of OS-PC loading, which is very important in metabolomics because biological interpretations are made on the basis of the deter-mined significant metabolites. We applied OS-PCA to two metabolomics datasets, which confirmed the usefulness of OS-PCA in detecting significant metabolites. We expect that OS-PCA can be usefully applied to metabolome data that have external information such as time course and rank order of samples.
Supplementary Materials: The following are available online at https://www.mdpi.com/2218-1989 /11/3/149/s1, Figure S1: Scatter plot of PC scores obtained by PCA of the metabolic turnover data of Nakayama et al. [15], Figure S2: Scatter plots of PC scores obtained by PCA of the metabolome data for taste testing of Japanese green tea, Figure S3: Correlation coefficient plot between the OS-PC1 score and dummy variable of taste ranking of Japanese green tea, Table S1: Statistically significant metabolites correlated with the OS-PC2 score obtained by OS-PCA of metabolic turnover data of Nakayama et al. [15], Table S2: Statistically significantly metabolites correlated with the OS-PC1 score obtained by OS-PCA of metabolome data for taste tasting of Japanese green tea.

Data Availability Statement:
The data presented in this study are openly available from this link: https://github.com/hiroyukiyamamoto/os-pca.