Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules

: Principal component analysis (PCA) is used to reduce the dimensionalities of high-dimensional datasets in a variety of research areas. For example, biological macromolecules, such as proteins, exhibit many degrees of freedom, allowing them to adopt intricate structures and exhibit complex functions by undergoing large conformational changes. Therefore, molecular simulations of and experiments on proteins generate a large number of structure variations in high-dimensional space. PCA and many PCA-related methods have been developed to extract key features from such structural data, and these approaches have been widely applied for over 30 years to elucidate macromolecular dynamics. This review mainly focuses on the methodological aspects of PCA and related methods and their applications for investigating protein dynamics.


Historical Overview
Principal component analysis (PCA) is a widely used multivariate analysis approach, originally proposed about 100 years ago [1,2], that has found increasing applications since the widespread availability of digital computers to reduce the dimensionality of highdimensional datasets. This reduction is enabled by linear transformation from the original variables to new collective variables, so that a small number of "principal components" dominate the features of the dataset. Now PCA is considered as an unsupervised machine learning technique.
The structures of proteins and other biological macromolecules are well characterized by a set of multidimensional variables, such as atomic coordinates and dihedral angles, and information regarding the dynamics of these molecules is typically obtained as a time series of high-dimensional data or an ensemble of experimentally determined structures. Although such large ensembles of high-dimensional data contain useful information, they are not easily interpretable. Therefore, extracting important features from high-dimensional data is essential to understand the dynamics of biological macromolecules.
Similar to the increased application of PCA in other areas, the use of PCA in analyzing protein dynamics has gradually become more common as the performance of computers has improved, making the molecular simulations of proteins more accessible. The first molecular dynamics (MD) simulation of a small folded protein, bovine pancreatic trypsin inhibitor (BPTI), in vacuum was conducted in 1977 [3], and the first protein normal mode analysis (NMA) for BPTI was performed in 1983 [4][5][6]. NMA is a harmonic approximation of protein dynamics at a potential energy minimum, and clearly shows that the low-frequency normal modes of proteins are collective motions of the atoms spread over the entire protein (namely, global motions) and that the lowest normal mode frequency is a few cm −1 . Since the vibrational frequencies of bond stretching modes are higher than this by three orders of magnitude, the amplitudes of the lowest and highest modes also differ by three-fold, indicating the highly anisotropic nature of proteins even within the J 2022, 5 299 range of vibrational motions. This high anisotropy may be partly attributed to the highly packed structures of folded native proteins, whose packing densities are comparable to that of a face-centered cubic lattice [7]. In highly packed structures, local motions uncorrelated with the surroundings are limited to small amplitudes because of possible collisions, while concerted motions of groups of atoms such as protein domains or loops can move in certain directions largely without altering atomic packing. In 1981, Karplus and Kushick proposed a method to estimate the configurational entropy of macromolecules from NMA, MD and Monte Carlo (MC) simulations [8]. That publication also showed that simulations with (NMA) and without harmonic approximation (MD and MC) can be connected by PCA. The length of the first reported protein MD was 8.8 ps [3], which roughly corresponds to one period of the lowest-frequency normal mode of typical small globular proteins and thus was insufficiently long to sample large-amplitude motions of the protein. However, increasing simulation lengths allowed investigation of the quasi-harmonic features of butane and BPTI, mainly focusing on quasi-harmonic frequencies deduced from PCA [9,10]. Later, projecting simulation trajectories onto collective coordinates was shown to be very useful for characterizing dominant protein dynamics, but the early stages of this endeavor used low-frequency normal modes for the projected collective variables [11]. Since normal modes are determined based only on one energy minimum, they are not necessarily the best choice to investigate the anharmonic nature of protein dynamics. In contrast, PCA determines principal coordinates as the collective coordinates, which incorporate anharmonic features included in the MD or MC trajectory. Longer and more realistic MD simulations in solution were performed from the 1980s to the early 1990s, allowing the PCA of MD trajectories. In the early 1990s, the anisotropic and anharmonic nature of native protein dynamics was elucidated by PCA, focusing on principal components (PCs), defined as the projections onto the principal coordinates [12][13][14][15]. PCA was also shown to be useful for analyzing simulation trajectories of protein folding/non-folding dynamics [16,17]. The past three decades have been seen the frequent use of PCA to investigate the dynamic behavior of biopolymers, as well as many important methodological improvements and the elucidation of simulated dynamic features [18][19][20][21][22][23]. Since PCA employs a variancecovariance matrix for dimensionality reduction, it is useful to characterize large-amplitude conformational change in molecules, such as protein domain motion and folding. However, PCA may not be sensitive for detecting localized, small amplitude but functionally important motions, such as backrub motion [24], peptide-plane flip [25], the side-chain flip and path-preserving motions [26].
This review provides an overview of PCA and related methods and their applications for investigating protein dynamics, focusing mainly on methodological aspects. In addition, some basic concepts and important findings obtained during the early years of this field are revisited for the benefit of non-experts, as well as a review of the latest progress in PCA-related research. The following PCA applications demonstrate the examples in which macromolecular dynamics cannot be well characterized without the use of PCA.

Basic Concept behind PCA
The investigation of macromolecular dynamics by PCA requires the selection of certain degrees of freedom of the target molecule that characterize the dynamics well. Consider a vector of general coordinates of a target molecule or molecules, q, and suppose that q = {q i } is a column vector consisting of f variables (i = 1, . . . , f ). Thus, q = { q i } is the average and ∆q = q − q is the deviation from the average. To explicitly indicate an index of the mth data point among M (m = 1, . . . , M), we use the expression ∆q m . When the MD trajectory is considered, the expression ∆q(t m ) is employed to specify a coordinate set at time t m . To conduct PCA, a variance-covariance matrix C is introduced: where ∆q T represents the transpose of ∆q and · · · denotes the simple average over a given dataset. The matrix C = C ij is a positive semidefinite whose eigenvalues are non-negative. By introducing an f × M matrix of the whole dataset: A can be obtained as a matrix product: where Q T denotes the transpose of Q. By solving the standard eigenvalue problem with the orthonormal condition: we obtain V, λ and I, which are the eigenvector, eigenvalue and unit matrices, respectively. λ is a diagonal matrix whose αth diagonal element {λ α } (α = 1, . . . , f ) is the variance of the αth PC, and the αth column vector v α of V is the corresponding eigenvector. Typically, λ α is sorted in descending order such that the first PC shows the largest variance λ 1 and the corresponding column vector v 1 of V = v 1 · · · v f indicates the eigenvector of the first PC. Projection of ∆q m onto V provides: where σ m is a column vector of the projections (principal components). The overall linear transformation of Q using V gives a projection matrix Σ = σ 1 · · · σ f onto the PC: The f × M matrix Σ represents the matrix of principal components, which are collective variables defined as a linear combination of the original coordinates and the elements of V are coefficients for the transformation.
PCA can be conducted by solving the standard eigenvalue problem (Equation (4)), which requires the diagonalization of C ( f × f matrix). PCA can also be performed by the singular value decomposition (SVD) of Q ( f × M matrix). SVD directly provides a decomposition of Q into three matrices: where U T is an M × M matrix of normalized projections defined as: In this case, λ 1/2 is an f × M matrix whose non-diagonal elements are zero. From Equations (4), (5) and (9), it is straightforward to obtain the condition: To quantify the usefulness of PCA for a target dataset, the PC contribution to the total variance is examined, which is defined as: If the contributions from a small number of PCs to the total variance are dominant, PCA is useful for dimensionality reduction because most of the total variance originates from these PCs. Typically, folded proteins are highly anisotropic in nature and PCA is thus useful for analyzing protein dynamics [18][19][20]22].
It is also straightforward to consider hierarchy in the distribution of a target dataset, for example, the distribution of clusters and the distributions of data in each cluster. Suppose that the dataset of interest is clustered into L groups, each consisting of n l data points. In this case, · · · l indicates the mean over n l and f l = n l /K is the fraction of the lth group. The variance-covariance matrix can be divided into two terms as [27]: The first term represents the variance-covariance originating from the distribution of the means of the groups and the second term shows the f l weighted average of intra-group variance-covariance. Although Equation (12) shows a two-hierarchy model, extension to multiple hierarchy is straightforward. If the distribution of each group shows fluctuations in a certain local energy minimum, the first term originates from jumping among energy minima. Using this formulation, the jumping-among-minima (JAM) model shows that the JAM motions that contribute to the first term dominate a small number of anharmonic large-amplitude motions and the second term is attributed to nearly harmonic fluctuations around local energy minima that are detected as Gaussian-like distributions [19,27]. This hierarchical view of protein collective dynamics is useful for understanding the boson peak and glass transition of proteins [28], and for identifying multiple conformers in nuclear magnetic resonance (NMR)-derived structure ensembles of proteins [29,30].

Error in PCA
A dataset to be analyzed by PCA may contain an insufficient number of samples statistically, which can result in instability of the obtained eigenvectors. This situation frequently occurs when standard MD simulations of macromolecules are conducted with all-atom models because the accessible simulation time scale tends to be shorter than the characteristic time scale of macromolecular movement. Hess showed that random diffusions in high-dimensional space can result in cosine-shaped projections of the first few dominant PCs, indicating that short simulation trajectories should be carefully treated [31]. However, using random matrix theory (RMT) [32,33], Palese showed that protein dynamics is not truly Brownian even on a short time scale (~1 ns) while PCA leads to cosine-shaped projections, using apo Cox17 as a model protein [34]. In this method, C is considered to be the sum of the random component C r and non-random component C nr . C r is determined by an iterative method based on RMT, providing the cleaned C nr without the random component and its eigenvectors. Palese also proposed random component analysis (RCA) as a random projection (RP) algorithm [35]. In RCA, PCA of the correlation matrix P = C ij / C ii C jj is considered, and P is replaced by a random symmetric correlation matrix M as a dummy correlation. PCA of M and projection onto the obtained eigenvectors provides the random components. RCA provides dimensionality reduction and cluster detection comparable to that of PCA. Ref. [36] introduced and examined a parameter that evaluates the overlap between such subspaces, called the root mean squared inner product (RMSIP), and suggested that PCA for the concatenated equivalent trajectories achieves better reproducibility.

Relation with NMA
NMA is closely related to PCA as mentioned in Section 1. To conduct NMA, the second derivative matrix of potential energy E (Hessian) F should be calculated at a certain conformation, typically at a local potential energy minimum conformation where the first derivatives ∂E/∂q i = 0 for all i: when Cartesian coordinates are used for Equation (13), q should be mass weighted Cartesian coordinates. To obtain normal mode frequencies and eigenvectors, the standard eigenvalue problem of F is solved as: The βth column vector of W = w 1 · · · w f , w β , represents the βth eigenvector. The eigenvalue matrix ω 2 = ω 2 β determines the angular frequency of the normal modes ω β . Since the variance-covariance matrix C is related to F by C = k B TF −1 (k B : Boltzmann constant, T: absolute temperature), we obtain the relation for the harmonic system: Therefore, comparing λ α obtained by PCA of MD or MC trajectories to the NMAderived k B T/ω 2 α is a straightforward way to examine the anharmonicity or quasi-harmonic features of protein dynamics. W is determined for a potential energy minimum, while MD simulation can sample multiple energy minima. Therefore, V obtained from an MD trajectory can be significantly different from W calculated around a particular local energy minimum. This difference becomes larger as the MD length is increased. To consider the difference between two collective variables and to examine the anharmonicity of an energy surface, the variance expected from NMA along the αth PC is obtained by: λ har α is further used to define the anharmonicity observed in MD along the αth PC, namely, the anharmonicity factor: µ α is unity if the variance is equal to that expected from NMA, indicating that the energy surface along the αth PC is nearly harmonic [27,37]. For short MD trajectories up to 1 ns, the majority of PCs are harmonic and less than 1% of PCs show µ α > 2, and anharmonic motions dominantly contribute to the total variance [19,27].
NMA of proteins was originally conducted with full atomic models [4][5][6]. Over the past 20 years or so, NMA has been more widely used with coarse-grained models as well as with atomic models. Although this is an interesting topic related to PCA, NMA is beyond the scope of this review but is covered in reviews on NMA [38][39][40][41][42][43].

Solvent and Other Environmental Effects on Macromolecular Dynamics
To consider solvent effects on macromolecular dynamics, one of the simplest models is the consideration of the independent Langevin equation for each PC σ α (t) with harmonic potential: ..
where the first term of the right-hand side shows the harmonic force and γ α and R α (t) indicate the Stokes friction coefficient and random force acting on the αth PC, respectively. The autocorrelation functions of σ α (t) and velocity . σ α (t) are given by: .
is a real number and the correlation functions simply consist of two exponential decays, resulting in overdamping motion. If γ α < 2ω α , α consists of real and imaginary parts, with the former showing a single exponential decay and the latter providing the sum of cosine and sine terms, causing damped oscillation. Because of this relation, large-amplitude motions, i.e., low-frequency motions, tend to be γ α > 2ω α and overdamped [12,15]. The density of state (DOS) for this system is obtained as the spectrum of the velocity correlation (Equation (21)): As the ratio γ α /ω α becomes larger, the peak of the spectrum decreases and the area of the spectrum is shifted to the high-frequency region, resulting in lowered spectral density around the peak ω α and seeming disappearance of the peak in DOS [12]. This model also explains why the DOS of BPTI from neutron scattering is lower than that expected from the frequency distribution in the low-frequency region <~20 cm −1 [15].
Assuming each PC is a harmonic Langevin oscillator, the values of ω α and γ α can be estimated from the MD-derived time correlation function or its spectrum in different ways, but the obtained results can be method dependent. Considering Equation (21), one way to calculate γ α is the time derivative of the velocity autocorrelation function at t = 0 [12,15]: If γ α is calculated as the numerical derivative from a difference in the velocity correlation functions between t = 0 and ∆t, the numerical error should be carefully considered [15]. From Equation (21), the first-order approximation of the derivative for small t is obtained as: which indicates a parabolic behavior of "apparent" friction coefficients as a function of ω α deduced by this method, and such behavior is indeed observed [12,15]. From the parabolic feature of γ α (∆t), the "true" friction coefficient γ α = γ α (0) after correction based on Equation (24) can be considered to be almost constant for large-amplitude PCs for a given protein [15]. The value of γ α is quite dependent on protein size, consistent with Stokes-Einstein law, γ = 6πaη/M ∝ M −2/3 (a: radius, η: viscosity, M: mass) [27]. The MD-derived friction coefficient does not necessarily originate from solvent and it can be estimated for MD conducted in vacuum. Ref. [44] used fitting Equation (21) with MDderived data for friction calculations and showed the frequency dependence of the friction coefficient. Ref. [44] also reported that friction in vacuum is directly proportional to the intraprotein interaction of the collective mode but is also proportional to the accessible surface area of the mode in solution. In Ref. [45], MD simulations of myoglobin in aqueous solution between 120 and 300 K showed that the friction coefficient was shown to linearly increase as temperature increases up to 300 K, independent of the glass transition temperature. This tendency cannot be well explained only by the Stokes-Einstein law, at least at 0 • C and higher, because the viscosity of liquid water decreases as a function of temperature. Thus, this tendency must have a different origin. Notably, the Langevin-based time correlation function should be fitted with care because the real time correlation will likely deviate for longer t when a constant value of γ is used without considering the time dependence, as in the generalized Langevin equation. The range of 0-5 ps was used for fitting the time correlation function in Ref. [44] and a very short (2-6 fs) ∆t was used for the numerical derivative at t = 0 in Refs [15,27].
The Langevin mode is a multidimensional version of harmonic Langevin oscillators, formulated as a natural extension of NMA to include solvent effects [46,47]. Considering the f × f friction matrix Γ in addition to Hessian F, the f × 2 f eigenvector matrix S and the 2 f × 2 f diagonal eigenvalue matrix ξ = {ξ α } are determined by the relations: where the matrix elements of L and ξ are complex numbers. The friction matrix Γ can be modeled by diffusion tensors derived from hydrodynamics of polymers in solution [48][49][50].
Equations (25) and (26) correspond to (14) and (15) in NMA. It should be noted that the eigenvalue of the αth Langevin mode ξ α can have both real and imaginary parts, which determine the damping factor and oscillatory frequency, respectively. In this case, the mode is underdamping. If ξ α is a real number, the mode is overdamping. In the limit Γ → 0 , the equations for NMA are recovered as: A more general formulation based on the generalized Langevin equation and 3D reference interaction site model (3D-RISM) was proposed, called the Kim-Hirata theory [51,52]. This theory introduces the variance-covariance matrix obtained from the equilibrium free energy surface in solution determined by 3D-RISM and this matrix describes the force restoring an equilibrium conformation. Ref. [52] also proposed a protocol to evaluate friction based on the friction of an imaginary atom in solution from the site-site mode coupling theory [53,54], multiplied by the fraction of a protein atom contacting the solvent defined by the radial distribution function of solvent around the protein atom. The Kim-Hirata theory was further extended to analyze the temperature-dependent mean-square deviation of proteins [55].

Choice of Variables and Spaces for Better Representation of Macromolecular Dynamics in PCA
To successfully analyze macromolecular dynamics, a set of variables that well represent important features of dynamics should be employed for PCA. The Cartesian coordinates of atoms are frequently used. For a set of selected N atoms of the molecules of interest, the number of variables is f = 3N. The use of all-atom coordinates, including hydrogens, is useful for characterizing the anharmonic nature of protein dynamics directly compared to NMA [12,15,27,37,56]. The selection of C α atoms is useful for selecting a small number of large-amplitude motions, namely, "essential dynamics" [14]. Raw atomic coordinates from the original dataset typically reflect internal movements of the selected atoms, as well as overall translation and rotation. The translational and rotational components can be eliminated by the best fit of each dataset (typically a snapshot of a simulation trajectory) to a reference dataset so that the Eckart condition [57] is satisfied, for example, using the Kabsch method [58] or another method. To set the average q as the origin of the coordinates, q obtained after best fit should be used as the reference for the next round of best fit [12]. q quickly converges within about five cycles with this procedure. Once translational and rotational components are completely eliminated, Cartesian PCA results in (3N − 6) positive eigenvalues for internal motions and six zero eigenvalues corresponding to PCs of the translation and rotation.
Internal coordinates are also used frequently in PCA. In all-atom models of macromolecules, dihedral angles mainly determine the overall conformation of each molecule because the contributions of bond length and angle changes are relatively small. The significant movements of dihedral angles result in protein atoms moving non-linearly. Therefore, PCA in dihedral angle space can provide different information on protein dynamics compared to Cartesian PCA. If the deviation of dihedral angles from the average ∆θ is directly used as ∆q, the deviation of atomic coordinates from the average ∆r is related to ∆θ as a first-order approximation as: where L = dr/dθ is the Jacobian matrix that is evaluated for the average structure [59,60]. Omori et al. investigated the relation between the motions of atoms and dihedral angles and showed that the latter mutually move in a compensative manner, called "latent dynamics" [60]. The variance-covariance matrix of dihedral angles C θ was compared toC θ deduced from the atomic variance-covariance matrix C r as: Using backbone atoms (N, C α and C) and dihedral angles (φ, ψ, and ω) of small globular proteins, Omori et al. also showed thatC θ precisely recovers the information of C r and contains higher-order dihedral correlations, but C θ does not [60]. Additionally, the mean-square atomic displacements tended to be minimized upon rotation of the dihedral angles, indicating the compensative nature of dihedral dynamics. However, such latent dynamics behavior was not seen in dihedral PCs of deca-alanine, a short peptide. The non-Euclidean nature of dihedral angles is not sufficiently considered in a linear transformation. PCA with dihedral angles may also require careful treatment as they are singular between 180 • and −180 • , also called a periodic boundary. Stock and coworkers proposed using dihedral angles differently in their dihedral angle PCA (dPCA), which uses cosines and sines of dihedral angle θ l [61][62][63]: where l denotes the dihedral angle index. This treatment can be considered as using the real and imaginary parts of exp iθ l . Originally, Stock and coworkers focused on main chain φ and ψ [62], but it is straightforward to include other dihedral angles in this framework. In the case of short peptides, more rugged free energy landscapes were observed in the space spanned by the first few PCs in dPCA compared to the results of Cartesian PCA [61][62][63].
Based on the analysis of folding dynamics of the villin headpiece 35 (HP35), a small protein and native dynamics of BPTI up to the millisecond time scale, they reported that Cartesian PCA failed to capture important features of the free energy landscape and that dPCA gave a better presentation of the landscape [64]. Instead of projecting straight lines in the extrinsic tangent space of a mean, PCA for Riemannian manifolds was proposed based on geodesics of the intrinsic metric [65].
GeoPCA is a tool for dihedral angle-based principal component geodesics, in which angular data are projected on a sphere composed of the first two principal component geodesics [66]. GeoPCA was validated by using it to cluster a set of RNA conformations derived from a database comprising 73 RNA structures. Dihedral principal geodesic analysis (dPGA) was applied to reduce the dimension of the protein structure ensemble and the result was compared to the results obtained using PCA and dPCA [67].
The n-dimensional torus is a product space of n circles and can be used to characterize dihedral dynamics. Torus-PCA (T-PCA) was proposed and applied to the RNA benchmark used in GeoPCA [66], demonstrating the validity of T-PCA [68]. Another approach, dPCA+, minimizes residual projection error by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary of a dihedral angle [69]. Interestingly, this transformation also minimized the error of the covariance matrix. dPCA+ was also used to examine the non-equilibrium process simulated by targeted MD (TMD), and the free energy profiles of deca-alanine obtained by unbiased MD, Jarzynski identity and second-order cumulant approximation were compared [70]. In addition, the landscapes from unbiased MD, TMD and reweighted TMD were investigated in that report.
Other variables have been introduced to conduct PCA of simulation trajectories. To visualize MD and MC trajectories, Abagyan and Argos introduced a distance measure between two conformers, defined based on dihedral angles [71]. Java Essential Dynamics (JED) can use internal distance pair coordinates (dpPCA) as an option [72]. Ernst et al. introduced contact distance-based PCA (conPCA), reciprocal distance-based (iconPCA) and PCA based on inter-C α distances (C α PCA) and compared each result to those obtained using dPCA [73]. For conPCA, the distance between the closest heavy atom of each residue is considered as a contact if it is less than 4.5 Å and the residue pair of the contact is separated by more than three residues along the sequence [73,74]. Thus, conPCA can consider side chains in contact with each other but excludes information regarding local fluctuation along the sequence. Using 300 µs HP35 and 1 ms BPTI MD trajectories and examining the resolution of the free energy landscape and the decay of autocorrelation functions, Ernst et al. showed that distance-based PCAs, particularly C α PCA, tend to be versatile, but they exhibit fewer landscape details than dPCA does [73]. Recently, Ogata proposed grid-based PCA (GBPCA), which considers a grid system consisting of cubes with 5 Å edges [75]. This method uses a unit vector of mass-weighted averages of atoms in each cube to calculate the correlations to be diagonalized and was applied to bulk water, bulk methane and hydrated proteins.
This review focuses on the PCA of structural data Q, but it is worth mentioning that PCA is also used for analyzing multiple spectra measured under different conditions [76]. Consider a set of spectra obtained as a function of frequency or wavelength measured under different temperatures, pH, times, etc. Q can be constructed with i = 1, . . . , f for frequency or wavelength and m = 1, . . . , M for the conditions. For example, rapid scanning wavelength stopped-flow kinetics experiments on liver alcohol dehydrogenase (LADH) whose reaction is spectrally complex were analyzed by PCA and absorbing species were identified [77,78]. Yuan et al. analyzed Fourier transform near-infrared (FT-NIR) spectra of bovine serum albumin (BSA) at temperatures ranging 45-85 • C by PCA and evolving factor analysis (EFA) [79]. The contributions from the first PCs obtained for two frequency ranges were both greater than 99%, indicating that most of the spectral variations are explained by the first PCs. PCA and EFA also revealed temperatures of structure changes. Sakurai and Goto conducted the PCA of pH dependence of heteronuclear sequential quantum correlation (HSQC) spectra of β-lactoglobulin measure by NMR spectroscopy and identified three conformational transitions at different pH [80]. These results validate the application of PCA for characterizing various condition-dependent spectra and for investigating structure changes, which also enables the PCA under a combination of multiple conditions, such as temperature-dependent kinetics.
As shown in Section 2, PCA and SVD provide the same information, V, U and λ from Q, but SVD directly determines these matrices without calculating C. Thus, SVD is also employed for analyzing spectra similar to PCA. In spectral analysis, U quantifies condition-dependent components while V describes the condition-independent basis sets. By focusing on dominant components in λ, SVD can act as a mechanism-independent noise filter for spectra. Thus, SVD is regarded as an automated procedure of modeling of spectroscopic datasets [81,82] and is also used for the analysis of protein dynamics. For example, Hofrichter et al. measured time-dependent optical absorption spectra from 3 ns to 100 ms after the photolysis of the CO complex of hemoglobin and identified three significant basis spectra and five exponential relaxations from the time course of their amplitudes by SVD, which enabled the analysis of ligand rebinding kinetics [83]. Moffat and coworkers developed a method to analyze time-dependent difference electron density by SVD and analyzed structural intermediates of photoactive yellow protein (PYP) [84][85][86]. These works indicate the usefulness of SVD in spectral analysis.

The Fluctuation-Dissipation Theorem, Linear Response Theory and PCA
The fluctuation-dissipation theorem states that the linear response of a given system to an external perturbation is expressed in terms of fluctuation properties of the system in thermal equilibrium [87,88]. In a time-independent form, the linear response theory (LRT) shows that a perturbation applied to a system f results in response ∆q R mediated by the variance-covariance matrix C as: Using C obtained from MD simulations of an unliganded protein and f mimicking the protein-ligand interaction, LRT was shown to reproduce the response of the liganded protein [89]. Additionally, dihedral LRT based on the variance-covariance derived using Equation (30) was shown to better predict the ligand-bound form of ferric-binding protein [59]. Time-independent and time-dependent LRT showed agreement for the time response of myoglobin upon CO binding between LRT, ultraviolet resonance Raman spectroscopy and time-resolved X-ray crystallography, suggesting that the primary response can be described by LRT [90]. Hirata proposed a theory to evaluate a response function based on the aforementioned Kim-Hirata theory [91].
If LRT is considered in the PC space, the expected response (Equation (32)) in PC space σ R is obtained in a manner similar to Equation (6) as: Additionally, considering Equations (4) and (32), we obtain: If f is applied as an isotropic random perturbation, the perturbation in PC space f PC = V T f is also isotropic. However, the response σ R is scaled by λ, meaning that the perturbation force acting along the PC is proportional to λ α [92]. Equation (34) also indicates that random perturbations are expected to cause highly anisotropic responses in the protein because proteins fluctuate in a highly anisotropic manner in equilibrium. Transform and relax sampling (TRS) enhances anisotropic protein movements, implicitly expecting the response in Equation (34) but without actually calculating C [92]. TRS is carried out as cycles of transform, relax and sampling stages. In the transform stage, the protein is perturbed by random forces during MD, then the protein is relaxed during MD without perturbation in the relax stage and finally usual MD is conducted as the sampling stage. TRS successfully simulated open-close motions of domains of multi-domain proteins several times within a simulation time of 20 ns. Additionally, folding-unfolding transitions of the "mini-protein" chignolin were observed many times during a 100 ns simulation.
Linear response path following (LRPF) simulates global conformational changes in proteins upon ligand binding by periodically updating a linear response (LR) force with three phases of MD, enabling non-linear transformation to a target direction [93]. In the first phase, the LR force is obtained by computing C and a mean force acting from ligands during equilibrium MD. Biased MD subsequently induces conformational change in the second phase, then the final MD re-equilibrates the system without bias. LRPF predicted an inward-facing form of mitochondrial ADP/ATP carrier (AAC), a membrane transporter, starting from an outward-facing form determined by X-ray crystallography [94].

Non-Gaussianity and Non-Linearity in PCA
PCA performs best if the distribution of a dataset is a multidimensional Gaussian that depends only on the mean and variance-covariance (second moments), which are the only quantities considered to determine collective variables in PCA. However, a small number of dominant PCs show non-Gaussian distributions in protein dynamics. These non-Gaussian collective variables are believed to be important for protein function [12][13][14][15] and indicate a limitation of using second moments in determining new collective variables. Since non-Gaussian distributions cannot be well characterized with mean and second moments only, one solution may be to consider higher-order moments to determine collective variables other than PC. Independent component analysis (ICA) separates non-Gaussian signals that are independent from Gaussian noise and typically considers mutual information (MI) to quantify the correlations and determine the optimal coordinate transformation to minimize the MI [95][96][97]. Typical ICA employs preprocessing, "centering" and "whitening" [97]. "Centering" corresponds to the treatment already completed in PCA, as in Equation (1), in which the mean is subtracted. "Whitening" is typically the normalization of components by their standard deviations. Early applications of ICA for protein dynamics by full component analysis (FCA) used Cartesian coordinates [98] while the dihedral of Equation (31) was used for ICA in Ref. [99].
Most of the methods described in this review focus on extracting mutually independent components as much as possible, whereas independent subspace (ISA) determines significantly correlated collective motions. ISA features non-Gaussian behaviors similar to ICA, using fourth-order cumulants [100]. In this method, ISA based on the subspace joint approximate diagonalization of eigenmatrices algorithm (SJADE) [101] extracts several independent subspaces, in each of which collective modes are significantly correlated while the other modes are independent. Application of this method successfully detected the modes with long-tailed non-Gaussian probability distributions [100].
Another limitation is the linear transformation of PCA, whereas protein dynamics can be highly non-linear in nature. This problem was partially discussed in Section 6, mainly in relation to the use of dihedral angles in PCA and thus other PCA variants are discussed here. For example, Nguyen proposed the use of non-linear PCA (NLPCA) [102], enabled by non-linear mapping based on neutral networks [103].
Another possible solution for incorporating non-linearity in PCA is the introduction of kernel methods [104]. In kernel PCA [104], a new "feature space" F, which is nonlinearly related to the original space, is introduced and principal components in F are considered. ∆q m is non-linearly mapped to a function Φ(∆q m ) that satisfies the condition Φ(∆q m ) = 0. Consider the variance-covariance in F as: and the eigenvalue problem shown in Equation (4). The nth eigenvector v n is defined as a linear combination of Φ(∆q m ) with a coefficient matrix α = {α mn } as: Here, we introduce the kernel representation k(x, y) = Φ(x)Φ T (y) and an M×M matrix K as: As shown in Ref. [104], α is obtained by the relation: which can be solved by diagonalizing K using the condition: Using K and α, principal components in F space are obtained as the projection of Φ(∆q) onto the nth eigenvector by: Equations (38) and (40) indicate that eigenvalues and principal components in F space are determined from the kernel K without directly solving Equation (35), which is typically difficult. It is worth mentioning that the use of k(x, y) = xy T recovers the original PCA and the use of k(x, y) = xy T d (d > 1) considers higher moments. Additionally, Gaussian kernel k(x, y) = exp − x−y 2 2σ 2 with the adjustable parameter σ is frequently used in kernel methods. Kernel PCA can also be applied to the analysis of protein dynamics [21].
"Diffusion maps" is used as nonlinear dimensionality reduction method to embed data points into a Euclidean space in which the Euclidean distance is equal to "diffusion distance" and to conduct the reduction by neglecting certain dimensions in the diffusion space [105][106][107]. Diffusion maps considers a Markov chain random walk on normalized data points and the probability of one-step random walk between two data points (connectivity), which is proportional a kernel function (typically Gaussian kernel). For diffusion maps, "diffusion matrix" is obtained by normalizing the rows of the kernel matrix, eigenvectors of the diffusion matrix are calculated and the diffusion mapping is conducted by mapping the data points to dominant eigenvectors in the diffusion space. Ferguson et al. used diffusion maps for a protein simulation for the first time to investigate trajectories of replica-exchange molecular dynamics (REMD) simulations of pro-microcin J25 (pro-MccJ25), the 21-residue uncyclized analog of antimicrobial peptide microcin J25 (MccJ25), which identified two global order parameters and three distinct pathways of conformational change [108]. One of the pathways was shown to correspond to a conformational change to left-handed lasso coil conformations. Kim et al. employed diffusion maps to characterize MD trajectories of Trp-cage miniprotein folding [109]. Recently, Trstanova et al. demonstrated the use of diffusion maps to identify metastable states as well as to formalize the locality within the metastable states in the analysis of molecular systems [110].

Detecting Data Differences by PCA and Related Methods
PCA is also used to detect differences in a dataset or featurize the differences between datasets. Self-consistency of dipolar couplings analysis (SECONDA) is a PCA based on residual dipolar coupling (RDS) measured by NMR and was developed to examine the effects of different alignment media used for RDS measurements [111]. The results showed that if the structure and dynamics of the target molecule are the same, PCA gives at most five nonzero eigenvalues, but additional nonzero eigenvalues are obtained if differences exist.
Howe conducted Cartesian PCA of 49 NMR-determined structures of EF40, a 28-residue peptide and demonstrated that the structures are clustered and outliers are detected [112]. Using the online tool PCA_NEST, systematic analysis of 24 pairs of enzyme structures determined by both solution NMR and X-ray crystallography revealed differences in the solution and crystal structures of the proteins [113]. Notably, the X-ray structures were shown to be a conformational state along the dominant PCs derived from NMR models, consistent with the expectation from LRT, since the environmental differences (in crystal or in solution) can be considered as a perturbation to the protein.
Sakuraba and Kono employed linear discriminant analysis with iterative procedure (LDA-ITER) to compare two trajectories obtained under different conditions [114]. LDA-ITER was developed to consider the trace ratio optimization problem in supervised learning and maximizes the ratio of two matrices by unitary transformation [115,116]. This method finds the axis of projection that separates two trajectories while keeping each trajectory well clustered. LDA-ITER was applied to the wild-type and R96H mutant of T4 lysozyme, as well as to the liganded and unliganded PDZ2 domains of human phosphatase hPTP1E. The results showed very clear separation of two kinds of trajectories along the first dimension and a Gaussian-like distribution for each cluster. In contrast, the projections onto the first two PCAs significantly overlapped, and another method, partial least squares discrimination analysis (PLS-DA) [117], gave less overlapped results in the 2D projection compared to PCA. However, LDA-ITER gave the best results. In addition, important differences were characterized on a residue-by-residue basis.
Relative principal component analysis (RPCA) also featurizes the differences between two datasets, but the transformation is determined by maximizing the Kullback−Leibler divergence between the probability distributions between the two states and by simultaneously diagonalizing two variance-covariance matrices of the states with a single transformation matrix [118]. The application of RPCA to HIV-1 protease showed better performance compared to PCA and identified conformational hotspots.

Time Evolution of Collective Variables
The methods described in Sections 4 and 5 consider the time evolution of collective variables based on physical models, but another type of approach investigates evolution from a phenological perspective. Time-independent component analysis (tICA) is a variation of ICA that focuses on the time independence between time 0 and τ, instead of considering non-Gaussianity [119][120][121]. In addition to C defined in Equation (1), tICA introduces the time-lagged variance-covariance matrix: using the time lag parameter τ. Although C = C(0) is symmetric, C(τ) can be a nonsymmetric matrix. In tICA, the generalized eigenvalue problem is solved with the normalization condition: where ζ = {ζ α } and Y = y 1 · · · y f are eigenvalue and eigenvector matrices, respectively. Since matrix elements of ζ and Y are generally obtained as complex numbers, the symmetrization C(τ) = 1 2 C(τ) + C T (τ) conducted in Refs [120,121] and Equation (42) is modified as: such that ζ and Y comprise real values. Combining Equation (42) or (44) with (43) gives: respectively. Equations (43) and (45) indicate that the autocorrelation function of the αth time-independent component at time 0 is y T α Cy α = 1 and that at time τ is y T α C(τ)y α = ζ α . If the autocorrelation function is a single exponential decay with characteristic time T α , we obtain the relation exp(−τ/T α ) = ζ α , which results in: Dynamic component analysis (DCA) considers the time-lagged variance-covariance matrix for normalized PCs, but Equation (41) is recovered in the original coordinates [122], indicating the equivalence of tICA and DCA in this formulation. In practice, however, DCA uses the inter-residue distance (distance map) as the coordinates of PCA [122].
Relaxation mode analysis (RMA) [123][124][125][126][127] considers "whitened" signals as the αth relaxation mode φ α (t) as: where κ = {κ α } and δ αβ indicate the relaxation rate and Kroneker delta, respectively. Independence and single exponential decay of each relaxation mode are assumed in Equation (47). Using c(τ) values at time τ and c = c(0), the relaxation rates are determined as: These equations correspond to Equations (42) and (43) for tICA, indicating the equivalence of RMA and tICA, and that RMA in practice can be considered as tICA for the whitened signals. Similar to tICA, symmetrization c(τ) = 1 2 c(τ) + c T (τ) is also used in RMA. Both tICA and RMA can obtain T α or κ α , which characterize a single exponential decay.
In principle, the results of these methods depend on the time lag parameter τ. The rigid-body domain motion of lysine-, arginine-and ornithine-binding protein (LAO) was analyzed by tICA in the ligand-free open state during a 600 ns MD using a 10 ns lag time, and the IC vectors and relaxation rates were shown to be fairly robust on this time scale [120]. tICA of LAO backbone dynamics for 1 µs was conducted using a 1 ns lag time [121], whereas DMA of the folding/unfolding dynamics of the FiP35 WW domain for two 100 µs MD trajectories used a 10 ns lag time. RMA of a 10-residue peptide, chignolin, for a 750 ns MD simulation at 450 K in solution used a relatively short lag time (10 ps) [127]. A two-step RMA was proposed and used to conduct a 2 µs MD trajectory of hen egg-white lysozyme [128].
tICA projections of high-dimensional random walks were recently shown to resemble cosine functions, be strongly dependent on the lag time and be very similar to those of 1 µs MD trajectories of proteins, particularly for larger proteins [129]. Although the introduction of a lag time allows tICA to provide richer information than PCA, care must be taken in choosing the lag time.
Time-dependent PCA (TDPCA), proposed recently, conducts multiple PCAs for short segments taken from a single MD trajectory by shifting the time window and allowing overlap, which provides time-dependent eigenvalues and eigenvectors. This approach was applied to a bulk water model and a coarse-grained protein-G model [130].
tICA and PCA are widely used as dimension reduction methods in the Markov state model (MSM) [131][132][133]. Notably, kernel methods were combined with tICA to provide kernel tICA (ktICA) for MSM [134]. The use of tICA-related methods in MSM is described in detail in recent review articles on MSM [135,136].

Conclusions
This review reported the development of PCA and related methods for analyzing protein dynamics, from basic concepts to the latest advanced methods. Possible applications of PCA are now very broad and many variations in PCA have been developed for specific purposes. As is clear from this review, it is difficult to specify the best and most versatile PCA method. Rather, the most suitable method should be chosen based on the purpose of the simulation and analysis. In many of the examples described, improved methods provided better performance than "classical" PCA. Additionally, the choice of original coordinates is important, as shown in Section 6. In addition, different methods should be tested, allowing the best method to be selected or to examine the validity of the obtained conclusion.
It is worth mentioning that some of the advanced methods described in this review do not directly employ original coordinates for large molecules but rather use a two-step procedure. Specifically, standard PCA and dimensional reduction into a smaller subspace are conducted first, then more advanced component analysis is conducted. For example, FCA [98] and ISA [101] described in these references used 100 PCs. DCA described in [122] used dimensional reduction by PCA but the number of PCs employed is unclear to us. The top five PCs were used in kPCA [21]. Therefore, even in cases where more sophisticated methods than standard PCA are used, PCA can be used for dimensionality reduction and as a reference for the comparison of other PCA-related methods.
Author Contributions: Conceptualization, A.K.; investigation, A.K.; writing-original draft preparation, review and editing, A.K. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The author declares no conflict of interest.