# Implications of PCCA+ in Molecular Simulation

## Abstract


## 1. The Foundation of PCCA+

`cluster_by_isa`) is implemented and incorrectly denoted as PCCA+, as in [5]. PCCA+ ensures that the transformation $\mathcal{A}$ is feasible in the following sense: the resulting membership functions form a partition of unity and the membership values are non-negative. Briefly, the linearly constrained feasible set $\mathcal{F}$ of transformation matrices is given by:
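The two feasibility conditions can be written out as follows. This is a sketch derived only from the properties named above, under two assumptions: the first column of $X\in {\mathbb{R}}^{m\times n}$ is the constant vector with all entries 1, and $X$ has full column rank. The symbol $m$ (number of rows of $X$) and the Kronecker delta ${\delta}_{i1}$ are notation introduced here for illustration.

$$
\mathcal{F}=\left\{\mathcal{A}\in {\mathbb{R}}^{n\times n}\;:\;\sum_{j=1}^{n}{\mathcal{A}}_{ij}={\delta}_{i1}\;\;(i=1,\dots ,n),\qquad \sum_{i=1}^{n}{X}_{li}{\mathcal{A}}_{ij}\ge 0\;\;(l=1,\dots ,m;\;j=1,\dots ,n)\right\}
$$

The first condition follows from requiring that every row of $\chi =X\mathcal{A}$ sums to 1: the constant vector equals the first column of $X$, so the row sums of $\mathcal{A}$ must reproduce the first unit vector. The second condition states entry by entry that $\chi =X\mathcal{A}$ is non-negative. Both conditions are linear in $\mathcal{A}$.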

- Non-autonomous processes can also be treated, i.e., processes in which the transition probabilities of the Markov chain depend on external variables. In this case, these variables become additional coordinates of the state space of the Markov process. The analysis of metastable sets then turns into a coherent sets analysis [17].

## 2. Interpretation of $\chi =X\mathcal{A}$

**Example 1.**

- Then one has to linearly transform the basis X of this invariant subspace such that the transformed basis vectors form a non-negative partition of unity. This means one has to search for a suitable transformation matrix $\mathcal{A}\in {\mathbb{R}}^{n\times n}$ that yields $\chi =X\mathcal{A}$.
- With the aid of $\chi $ it is possible to construct the projected transfer operator. This will be exemplified later in Equation (5).
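The two steps above can be sketched numerically. The following is a minimal illustration, not the full PCCA+ optimization: it uses only the inner simplex algorithm, which picks n rows of X as simplex vertices and does not by itself guarantee non-negative memberships. The test chain (a 1D Metropolis walk in a triple-well potential), all function names, and the Galerkin-type formula used for the projected operator are assumptions made for this example. Whether that formula coincides with Equation (5) is not confirmed by the text here.

```python
import numpy as np


def metropolis_chain(U):
    """Nearest-neighbour Metropolis chain on a 1D grid, reversible w.r.t. exp(-U)."""
    m = len(U)
    P = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i + 1):
            if 0 <= j < m:
                P[i, j] = 0.5 * min(1.0, np.exp(U[i] - U[j]))
        P[i, i] = 1.0 - P[i].sum()  # remaining probability: stay in place
    return P


def leading_eigenvectors(P, pi, n):
    """Leading n eigenpairs of a reversible P via its symmetrized form."""
    d = np.sqrt(pi)
    S = d[:, None] * P / d[None, :]   # symmetric because of detailed balance
    S = 0.5 * (S + S.T)               # remove rounding asymmetry
    evals, V = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1][:n]
    X = V[:, order] / d[:, None]      # back-transform to eigenvectors of P
    X[:, 0] /= X[0, 0]                # scale first column to the constant vector 1
    return evals[order], X


def inner_simplex_vertices(X):
    """Indices of n rows of X that serve as simplex vertices (inner simplex algorithm)."""
    n = X.shape[1]
    idx = [int(np.argmax(np.linalg.norm(X, axis=1)))]  # first vertex: largest norm
    R = X - X[idx[0]]                                   # shift that vertex to the origin
    for _ in range(1, n):
        norms = np.linalg.norm(R, axis=1)
        k = int(np.argmax(norms))                       # farthest point from current span
        idx.append(k)
        q = R[k] / norms[k]
        R = R - np.outer(R @ q, q)                      # project out the new direction
    return idx


# Hypothetical test system: triple-well potential (in units of kT) on 30 grid points.
x = np.linspace(0.0, 1.0, 30)
U = 2.0 * np.cos(6.0 * np.pi * x)
pi = np.exp(-U)
pi /= pi.sum()
P = metropolis_chain(U)

n = 3
evals, X = leading_eigenvectors(P, pi, n)

# chi = X A, with A chosen so that the vertex rows become unit vectors.
idx = inner_simplex_vertices(X)
A = np.linalg.inv(X[idx])
chi = X @ A

# Assumed Galerkin-type projection of P onto the span of chi.
D = np.diag(pi)
P_c = np.linalg.solve(chi.T @ D @ chi, chi.T @ D @ P @ chi)

print("row sums of chi:", chi.sum(axis=1).min(), chi.sum(axis=1).max())
print("smallest membership value:", chi.min())
print("projected operator:\n", P_c)
```

Because X spans an invariant subspace of P, the projected matrix in this sketch equals $\mathcal{A}^{-1}\Lambda \mathcal{A}$, where $\Lambda$ is the diagonal matrix of the leading eigenvalues. Its rows sum to 1 even though individual entries are not guaranteed to be non-negative.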

## 3. Consequences of $\chi =X\mathcal{A}$

#### 3.1. Rebinding Effects

#### 3.2. Exit Rates

#### 3.3. Kinetic ITC

#### 3.4. Pericyclic Reactions

#### 3.5. Sequential Spectroscopy

- In order to transform the affine linear space into a linear subspace, one could, for example, subtract the mean vector $\overline{v}$ (taking the mean over all measurements) from all rows of V. This leads to a normalized matrix $\overline{V}=V-{\mathbb{1}}_{m}{\overline{v}}^{T}$, where ${\mathbb{1}}_{m}$ is the m-dimensional constant vector whose elements are all 1.
- If the assumptions for solving the problem are correct, then $\overline{V}$ should have rank $n-1$. Together with a singular value decomposition $\overline{V}=\overline{X}\Sigma \overline{Y}$, this can be used to determine n.
- We know that the columns of the matrix $\chi$ span a linear space that contains the vector ${\mathbb{1}}_{m}$. Thus, after determining n and $\overline{X}$, we construct the n-dimensional linear space spanned by $\{{\mathbb{1}}_{m},{\overline{x}}_{1},\dots ,{\overline{x}}_{n-1}\}$ by taking the first $n-1$ column vectors ${\overline{x}}_{i}$ of $\overline{X}$. With the aid of Gram-Schmidt orthogonalization, this basis can be transformed into an orthogonal basis with ${\mathbb{1}}_{m}$ as the first basis vector. The n orthogonalized basis vectors are organized in a matrix $X\in {\mathbb{R}}^{m\times n}$. This matrix provides the linear space for the convex combination factors, i.e., there must be a linear transformation $\mathcal{A}\in {\mathbb{R}}^{n\times n}$ such that $\chi =X\mathcal{A}$.
- Given the matrix X constructed from the spectral data, PCCA+ applied to X provides a row-stochastic matrix $\chi$, which is the proposed matrix of concentrations. Thus, PCCA+ solves the problem.
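The four steps above can be sketched on synthetic data. Everything specific in this example is an assumption made for illustration: the synthetic concentration profiles and Gaussian "pure" spectra, the rank tolerance `1e-10`, the function names, and the use of the inner simplex algorithm as a simple stand-in for the full PCCA+ optimization. The last line reads ${\chi}^{\#}$ from the caption of Figure 7 as the Moore-Penrose pseudo-inverse, which is also an assumption.

```python
import numpy as np


def gram_schmidt(B):
    """Classical Gram-Schmidt: orthonormalize the columns of B from left to right."""
    Q = np.zeros_like(B, dtype=float)
    for j in range(B.shape[1]):
        v = B[:, j] - Q[:, :j] @ (Q[:, :j].T @ B[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q


def inner_simplex_vertices(X):
    """Indices of n rows of X that serve as simplex vertices (inner simplex algorithm)."""
    n = X.shape[1]
    idx = [int(np.argmax(np.linalg.norm(X, axis=1)))]  # first vertex: largest norm
    R = X - X[idx[0]]                                   # shift that vertex to the origin
    for _ in range(1, n):
        norms = np.linalg.norm(R, axis=1)
        k = int(np.argmax(norms))                       # farthest point from current span
        idx.append(k)
        q = R[k] / norms[k]
        R = R - np.outer(R @ q, q)                      # project out the new direction
    return idx


# Synthetic measurements: m spectra (rows of V), each a convex combination
# of three hypothetical pure spectra (rows of W_true).
m, k = 41, 200
t = np.linspace(0.0, 1.0, m)
c1 = np.clip(1.0 - 2.0 * t, 0.0, 1.0)
c3 = np.clip(2.0 * t - 1.0, 0.0, 1.0)
chi_true = np.column_stack([c1, 1.0 - c1 - c3, c3])    # row-stochastic
wn = np.linspace(0.0, 1.0, k)
W_true = np.stack([np.exp(-((wn - c) / 0.05) ** 2) for c in (0.2, 0.5, 0.8)])
V = chi_true @ W_true

# Step 1: subtract the mean spectrum from every row of V.
V_bar = V - np.ones((m, 1)) * V.mean(axis=0)

# Step 2: the numerical rank of V_bar should be n - 1.
X_bar, s, _ = np.linalg.svd(V_bar, full_matrices=False)
n = 1 + int(np.sum(s > 1e-10 * s[0]))

# Step 3: basis {1_m, x_1, ..., x_(n-1)}, orthogonalized; scaling by sqrt(m)
# restores the constant vector of ones as the first column.
basis = np.column_stack([np.ones(m), X_bar[:, : n - 1]])
X = gram_schmidt(basis) * np.sqrt(m)

# Step 4: chi = X A, here with A from the inner simplex vertices.
idx = inner_simplex_vertices(X)
A = np.linalg.inv(X[idx])
chi = X @ A

# Pure spectra, reading chi^# as the pseudo-inverse of chi.
W = np.linalg.pinv(chi) @ V

print("detected number of components:", n)
print("vertex rows (time indices):", idx)
```

In this synthetic setting every pure component occurs at some time point, so the vertex rows are exactly the pure states and the recovered $\chi$ agrees with the true concentrations up to a permutation of columns. With real, noisy spectra the rank decision and the non-negativity of $\chi$ need more care than this sketch provides.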

## 4. Conclusions

## Supplementary Materials

## Acknowledgments

## Conflicts of Interest

## References

1. Noe, F.; Wu, H.; Prinz, J.H.; Plattner, N. Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. **2013**, 139, 184114.
2. Schütte, C.; Fischer, A.; Huisinga, W.; Deuflhard, P. A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comp. Phys. **1999**, 151, 146–168.
3. Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, C. Identification of Almost Invariant Aggregates in Reversible Nearly Uncoupled Markov Chains. Linear Algebra Appl. **2000**, 315, 39–59.
4. Weber, M.; Galliat, T. Characterization of Transition States in Conformational Dynamics using Fuzzy Sets; ZIB Report ZR-02-12; Zuse Institute Berlin: Berlin, Germany, 2002.
5. Kumar, P.; Ravindran, B.; Niveditha. Spectral Clustering as Mapping to a Simplex. ICML Workshop Spectr. Learn. **2013**, 1–9. Available online: https://www.researchgate.net/publication/261797963_Spectral_Clustering_as_Mapping_to_a_Simplex (accessed on 7 January 2018).
6. Röblitz, S. Statistical Error Estimation and Grid-Free Hierarchical Refinement in Conformation Dynamics. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2008.
7. Weber, M.; Rungsarityotin, W.; Schliep, A. An Indicator for the Number of Clusters Using a Linear Map to Simplex Structure. In Studies in Classification, Data Analysis, and Knowledge, Proceedings of the 29th Annual Conference of the German Classification Society, 9–11 March 2005, Magdeburg, Germany; From Data and Information Analysis to Knowledge Engineering; Springer: Berlin, Germany, 2006; pp. 103–110.
8. Martini, L.; Kells, A.; Covino, R.; Hummer, G.; Buchete, N.V.; Rosta, E. Variational Identification of Markovian Transition States. Phys. Rev. X **2017**, 7, 031060.
9. Deuflhard, P.; Weber, M. Robust Perron Cluster Analysis in Conformation Dynamics. Linear Algebra Appl. **2005**, 398c, 161–184.
10. Weber, M. Meshless Methods in Conformation Dynamics. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2006.
11. Berg, M. Laufzeitoptimierung der Robusten Perron Cluster Analyse (PCCA+). Master’s Thesis, FU Berlin, Berlin, Germany, 2012.
12. Röblitz, S.; Weber, M. Fuzzy Spectral Clustering by PCCA+. In Classification and Clustering: Models, Software and Applications; WIAS Report; Weierstrass Institute: Berlin, Germany, 2009; Volume 26, pp. 73–79.
13. Wu, H. Maximum margin clustering for state decomposition of metastable systems. Neurocomputing **2015**, 164, 5–22.
14. Voss, J.; Belkin, M.; Rademacher, L. The Hidden Convexity of Spectral Clustering. In Proceedings of the 13th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2108–2114. Available online: https://www.researchgate.net/publication/260520172_The_Hidden_Convexity_of_Spectral_Clustering (accessed on 7 January 2018).
15. Fackeldey, K.; Weber, M. GenPCCA—Markov State Models for Non-Equilibrium Steady States. In Big Data Clustering; WIAS Report; Weierstrass Institute: Berlin, Germany, 2017; Volume 29, pp. 70–80.
16. Weber, M. Eigenvalues of Non-Reversible Markov Chains—A Case Study; ZIB Report ZR-17-13; Zuse Institute Berlin: Berlin, Germany, 2017.
17. Fackeldey, K.; Koltai, P.; Nevir, P.; Rust, H.; Schild, A.; Weber, M. From Metastable to Coherent Sets—Time-discretization Schemes; ZIB Report ZR-17-74; ZIB: Berlin, Germany, 2017.
18. Weber, M. A Subspace Approach to Molecular Markov State Models via a New Infinitesimal Generator. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2011.
19. Haack, F.; Fackeldey, K.; Röblitz, S.; Scharkoi, O.; Weber, M.; Schmidt, B. Adaptive spectral clustering with application to tripeptide conformation analysis. J. Chem. Phys. **2013**, 139, 194110.
20. Weber, M.; Fackeldey, K.; Schütte, C. Set-free Markov State Model Building. J. Chem. Phys. **2017**, 146, 124133.
21. Schütte, C.; Sarich, M. A critical appraisal of Markov state models. Eur. Phys. J. **2015**, 224, 2445–2462.
22. Nielsen, A. Computation Schemes for Transfer Operators. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2016.
23. Bujotzek, A.; Weber, M. Efficient simulation of ligand-receptor binding processes using the conformation dynamics approach. J. Bio. Inf. Comp. Bio. **2009**, 7, 811–831.
24. Bujotzek, A. Molecular Simulation of Multivalent Ligand-Receptor Systems. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2013.
25. Fasting, C.; Schalley, C.; Weber, M.; Seitz, O.; Hecht, S.; Koksch, B.; Dernedde, J.; Graf, C.; Knapp, E.W.; Haag, R. Multivalency as a Chemical Organization and Action Principle. Angew. Chem. Int. Ed. **2012**, 51, 10472–10498.
26. Sarich, M. Projected Transfer Operators. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2011.
27. Weber, M.; Kube, S. Preserving the Markov Property of Reduced Reversible Markov Chains. In Numerical Analysis and Applied Mathematics, Proceedings of the 6th International Conference on Numerical Analysis and Applied Mathematics, Kos, Greece, 16–20 September 2008; AIP Conference Proceedings. 2008; Volume 1048, pp. 593–596.
28. Vanden-Eijnden, E.; Venturoli, M. Markovian milestoning with Voronoi tessellations. J. Chem. Phys. **2009**, 130, 194101.
29. Vauquelin, G. Effects of target binding kinetics on in vivo drug efficacy: koff, kon and rebinding. Br. J. Pharmacol. **2016**, 173, 1476–5381.
30. Röhl, S. Computing the minimal rebinding effect for nonreversible processes. Master’s Thesis, FU Berlin, Berlin, Germany, 2017.
31. Weber, M.; Bujotzek, A.; Haag, R. Quantifying the rebinding effect in multivalent chemical ligand-receptor systems. J. Chem. Phys. **2012**, 137, 054111.
32. Weber, M.; Fackeldey, K. Computing the Minimal Rebinding Effect Included in a Given Kinetics. Multiscale Model. Simul. **2014**, 12, 318–334.
33. Abendroth, F.; Bujotzek, A.; Shan, M.; Haag, R.; Weber, M.; Seitz, O. DNA-controlled bivalent presentation of ligands for the estrogen receptor. Angew. Chem. Int. Ed. **2011**, 50, 8592–8596.
34. Kijima, M. Markov Processes for Stochastic Modeling; Chapman & Hall: London, UK, 1997.
35. Pavliotis, G.A. Stochastic Processes and Applications; Texts in Applied Mathematics; Springer: Berlin, Germany, 2014; Volume 60.
36. Weber, M.; Ernst, N. A fuzzy-set theoretical framework for computing exit rates of rare events in potential-driven diffusion processes. arXiv, 2017; arXiv:1708.00679.
37. Gzyl, H. The Feynman-Kac formula and the Hamilton-Jacobi equation. J. Math. Anal. Appl. **1989**, 142, 77–82.
38. Vander Meulen, K.A.; Butcher, S.E. Characterization of the kinetic and thermodynamic landscape of RNA folding using a novel application of isothermal titration calorimetry. Nucleic Acids Res. **2011**, 40, 2140–2151.
39. Dumas, P.; Ennifar, E.; Da Veiga, C.; Bec, G.; Palau, W.; Di Primo, C.; Piñeiro, A.; Sabin, J.; Muñoz, E.; Rial, J. Chapter Seven-Extending ITC to Kinetics with kinITC. Methods Enzymol. **2016**, 567, 157–180.
40. Burnouf, D.; Ennifar, E.; Guedich, S.; Puffer, B.; Hoffmann, G.; Bec, G.; Disdier, F.; Baltzinger, M.; Dumas, P. kinITC: a new method for obtaining joint thermodynamic and kinetic data by isothermal titration calorimetry. J. Am. Chem. Soc. **2011**, 134, 559–565.
41. Igde, S.; Röblitz, S.; Müller, A.; Kolbe, K.; Boden, S.; Fessele, C.; Lindhorst, T.K.; Weber, M.; Hartmann, L. Linear Precision Glycomacromolecules with Varying Interligand Spacing and Linker Functionalities Binding to Concanavalin A and the Bacterial Lectin FimH. Macromol. Biosci. **2017**, 17, 1700198.
42. Bowman, G.; Pande, V.; Noé, F. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation; Advances in Experimental Medicine and Biology; Springer: Berlin, Germany, 2013; Volume 797.
43. Lie, H.; Fackeldey, K.; Weber, M. A Square Root Approximation of Transition Rates for a Markov State Model. SIAM. J. Matrix Anal. Appl. **2013**, 34, 738–756.
44. Donati, L.; Heida, M.; Weber, M.; Keller, B. Estimation of the infinitesimal generator by square-root approximation. arXiv, 2017; arXiv:1708.02124.
45. Schild, A. Electron Fluxes During Chemical Processes in the Electronic Ground State. Ph.D. Thesis, FU Berlin, Berlin, Germany, 2013.
46. Luce, R.; Hildebrandt, P.; Kuhlmann, U.; Liesen, J. Using Separable Nonnegative Matrix Factorization Techniques for the Analysis of Time-Resolved Raman Spectra. Appl. Spectrosc. **2016**, 70, 1464–1475.
47. Röhm, J. Non-Negative Matrix Factorization for Raman Data Spectral Analysis. Master’s Thesis, FU Berlin, Berlin, Germany, 2017.
48. Schmid, P. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. **2010**, 656, 5–28.
49. Chewle, S.; Thi, Y.N.; Weber, M.; Emmerling, F. How does choice of solvents influence crystallization pathways? An experimental and theoretical case study. Unpublished work. 2018.

**Figure 1.**

**Left:** A 2-dimensional state space $\Omega$ is decomposed into 57 subsets via a Voronoi tessellation. The potential energy surface has three local minima (contour lines). From the transition probabilities of a potential-driven diffusion process, a transition matrix $P\in {\mathbb{R}}^{57\times 57}$ is generated.

**Right:** The leading three eigenvectors of P are computed, i.e., $X\in {\mathbb{R}}^{57\times 3}$ is determined. The first eigenvector is a constant vector. The 57 entries of the second and third eigenvectors are plotted as 2-dimensional points. One can clearly see the simplex structure. The points located at the vertices of this simplex belong to Voronoi cells covering the energy minima of the potential energy landscape.

**Figure 2.** A clustering of micro-states based on membership functions $\chi$ leads to a definition of a macro-system. In order to derive a physical law for the time-dependent propagation of macro-states from the propagation of the micro-system, a projected transfer operator is constructed. For this projection, an invariant subspace X of the transfer operator is used. PCCA+ defines the relation between X and $\chi$.

**Figure 3.**

**Left:** Molecular example of a multivalent ligand based on a DNA backbone [33].

**Right:** In multivalent ligand-receptor complexes there exist molecular micro-states which are neither completely bound nor completely unbound. In a process with two macro-states, these micro-states partially correspond to both of the macro-states. $\chi =X\mathcal{A}$ quantifies this ratio. $\mathrm{det}\left(S\right)$ quantifies their contribution to the binding rate.

**Figure 4.** In a kinetic ITC experiment, the ligand concentration (in $\mathrm{mM}$) in solution increases stepwise via titration. From the ligand concentration and from the electric power used to keep the temperature constant, the binding rate of the process can be calculated (in $\mathrm{mM}^{-1}\,\mathrm{s}^{-1}$). The blue stars represent experimental results taken from [41]. If the binding were a one-step molecular process, the rate would be constant. The systematic decrease of the rate can be explained by assuming a projection from a multivalent reaction kinetics model onto a 2-dimensional invariant subspace of $\mathcal{L}$ (red line).

**Figure 5.** Formic acid dimer. Two formic acid molecules are arranged in a cycle such that the green hydrogen atom can move (along the dotted line) from the donor oxygen atom of one molecule to the acceptor oxygen atom of the other molecule. With the aid of the time-dependent Schrödinger equation, the movement of the electron density during this reaction is computed (results taken from [45]).

**Figure 6.** We applied SQRT-A to the electron density calculations of formic acid (Figure 5) to obtain L. Via PCCA+ based on 6 Schur vectors X of L, we computed the membership vectors $\chi =X\mathcal{A}$. The resulting membership vectors are plotted. It turns out that the CH moiety acts like a barrier for the electron density flux.

**Figure 7.**

**Top:** In sequential spectroscopy, many spectra are measured in a certain time interval (matrix V). Each line in the plot corresponds to one Raman spectrum measured at a certain point in time during a crystallization process of paracetamol in ethanol (wavenumbers in cm${}^{-1}$). The unpublished data V were measured by Surahit Chewle [49].

**Bottom Left:** With the aid of X and $\overline{P}$ obtained from V, one can derive the concentration $\chi$ of the “pure” spectra at each point in time via PCCA+.

**Bottom Right:** Using this matrix $\chi$, the “pure” spectra are calculated as $W={\chi}^{\#}V$. From this calculation one can identify the intermediate steps of the crystallization of paracetamol.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Weber, M.
Implications of PCCA+ in Molecular Simulation. *Computation* **2018**, *6*, 20.
https://doi.org/10.3390/computation6010020
