Möbius Transformation-Induced Distributions Provide Better Modelling for Protein Architecture

Proteins are found in all living organisms and constitute a large group of macromolecules with many functions. Proteins achieve their operations by adopting distinct three-dimensional structures encoded within the sequence of the constituent amino acids in one or more polypeptides. New, more flexible distributions are proposed for the MCMC sampling method for predicting protein 3D structures by applying a Möbius transformation to the bivariate von Mises distribution. In addition to this, sine-skewed versions of the proposed models are introduced to meet the increasing demand for modelling asymmetric toroidal data. Interestingly, the marginals of the new models lead to new multimodal circular distributions. We analysed three big datasets consisting of bivariate information about protein domains to illustrate the efficiency and behaviour of the proposed models. These newly proposed models outperformed mixtures of well-known models for modelling toroidal data. A simulation study was carried out to find the best method for generating samples from the proposed models. Our results shed new light on proposal distributions in the MCMC sampling method for predicting the protein structure environment.


Introduction
Proteins constitute a diverse set of biological macromolecules that are often referred to as the workhorses of cells because of their central role in most biological processes. Chemically, proteins are biopolymers consisting of linear sequences of amino acid covalently linked by peptide bonds, such that each polypeptide is a single large molecule. Nineteen of the natural amino acids (all but proline) have an amino group (-NH 2 ), a carboxylic acid group (-COOH), an amino acid-specific side-chain, and a hydrogen atom attached to a central carbon atom (C α ). Each peptide bond links the carboxylate group of one amino acid to the amino group of the next. Protein structure is often described in terms of four levels of organisation. The primary structure is the sequence of amino acids. The secondary structure refers to the local folding of the polypeptide backbone into helices, strands, or loops. The tertiary structure describes the complex three-dimensional folding of a polypeptide. Finally, the quaternary structure describes the involvement of one or more polypeptides in creating a functional protein. The amino nitrogen, C α , and the carbonyl carbon of all residues constitute the protein backbone.
The 3D coordinates of proteins, as provided by electron microscopy, NMR, or X-ray crystallography, directly reveal the conformation of the backbone atoms, with knowledge of standard chemical bond angles and lengths incorporated during the refinement process.
Generally, the backbone conformation is analysed using the backbone torsion or the dihedral angles, denoted by φ, ψ, and ω, as introduced by Ramachandran [1] (Figure 1A), where ω is usually close to 180 • or occasionally 0 • . Alternatively, virtual bond and torsion angles θ and τ may be used to describe a protein backbone representation based on only C α positions ( Figure 1B). A major challenge in molecular biology and computational biochemistry involves predicting protein 3D structure. The encoding gene provides the primary structure of a protein, and the secondary structure may be predicted computationally with high reliability using artificial neural networks [2], based on the propensity of amino acids to form different secondary structures.
However, predicting the 3D structure of a protein, especially if it is larger than 100 amino acids or if a homologue with a known structure and significant sequence identity is not available, remains challenging. This challenge is addressed by de novo structure prediction, which requires parametrized physical force fields. The probability of observing a particular conformation x of the molecule, p(x|β) is considered and expressed as the Boltzmann distribution: where Z β is the normalization constant, U(x) is the potential energy of the molecule, β = (k b T) −1 is the thermodynamic beta, k b is the Boltzmann constant, and constant T is the temperature. The 3D structure of a molecule can be derived from p(x|β) by determining the mode of the distribution. Molecular dynamics (MD) is a simulation-based method used to probe for the mode of distribution. However, many millions to trillions of steps are required to simulate a single folding event. By contrast with MD, Monte Carlo (MC)-based methods are more time-efficient. In the Markov Chain Monte Carlo (MCMC) method, a Markov chain is constructed using the Metropolis-Hastings (MH) algorithm ( [3,4]), with p(x|β) as the stationary distribution. A symmetric proposal distribution is utilized in the MH algorithm. Choosing a good proposal distribution is one of the challenges in MCMC-based simulation. Gaussian perturbations are the most straightforward proposal distributions that can be used [5]. The results are more accurate when the proposal distribution is closer to the stationary distribution; therefore, protein structural information is incorporated into most proposal distributions. Using the information on angles and bond lengths observed in real proteins is a simple way to define a suitable proposal distribution. Fragment libraries for backbone angles and rotamer libraries for side-chain angles can be selected as default choices for proposal distributions [6][7][8].
Various tractable statistical distributions for modelling protein dihedral angles are briefly reviewed. These models can be used as proposal distributions for MCMC protein sampling. They can also be utilized as a prior for determining a protein structure from data. However, these models do not generate folded proteins because they work under some simplifying assumptions, both in terms of their functional form and dependency structure (see [9]). The ultimate goal of our contribution is to propose more flexible models for the proposal distribution.

Brief Overview
An overview of the models available for toroidal data that forms the departure point for the investigation in this paper follows (see [10]).
The first probability distribution on the torus was proposed by Mardia in [11]. It is the bivariate von Mises distribution: f (θ 1 , θ 2 ) = C exp(κ 1 cos(θ 1 − ι 1 ) + κ 2 cos(θ 2 − ι 2 ) + (cos(θ 1 − ι 1 ), sin(θ 1 − ι 1 ))A(cos(θ 2 − ι 2 ), sin(θ 2 − ι 2 )) T ), where C is the normalizing constant, ι 1 , ι 2 ∈ [−π, π) are location parameters, κ 1 , κ 2 ≥ 0 are concentration parameters, and matrix A 2×2 is the circular-circular dependence parameter. To move beyond the complexity created by the large number of parameters in this founding distribution, a few special cases in the literature have been considered. Rivest in [12] introduced the subclass: where α, β ∈ R. Singh et al. in [13] proposed the sine model as a special case of (1) with one less parameter, letting α = 0 and β = κ 3 : where where I α (z) is the modified Bessel function of the first kind of order α. Another submodel of (1), the cosine model, was introduced by Mardia et al. in [14] by setting α = β = −κ 3 : where It is worth noting that Kent et al. in [15] introduced another version of the cosine model, with a negative interaction given by: with the same normalizing constant as for the model with a positive interaction in (4). Kent et al. in [15] also introduced a submodel of (1), which is a hybrid between the sine and cosine models, given by: where κ 1 , κ 2 ≥ 0, λ ∈ R, and for simplicity, β = 1. Mardia and Frellsen in [16] compared the properties of these three submodels in (2), (4), and (6). The multivariate extensions of the sine model can be found in [17]. In another attempt to expand the platform of toroidal distributions, Wehrly and Johnson in [18] used a marginal specification approach to construct bivariate models with more flexible specified circular marginals. Later, Jones et al. in [19] obtained various toroidal models using the general form in [18]. In this way, Fernández-Durán in [20] proposed another general toroidal model by using a copula pdf that García-Portugués imposed periodic restrictions on in [21], and Jones et al. [19] defined it as a circula pdf, arguing that it is characterised by a circular uniform distribution. For more details, see [22]. The main incentive for defining toroidal models in recent years has been the demand from other sciences, especially bioinformatics, to model dihedral angles in order to analyse protein structures ( [13,14,23,24]). However, toroidal data can also be observed in other fields, for example, in meteorology (wind directions at two different times of day) and medicine (peak systolic blood pressure during two separate time periods). For the interested reader, some applications of toroidal models can be found in [25][26][27][28][29].
In this paper, Möbius transformation will form the foundation for the construction of competitive models. A map T : C → C is a Möbius transformation if it has the following form: where C is the set of complex numbers, a, b, c, d ∈ C are complex numbers, and ad − bc = 0. Let S ⊂ C be a unit circle, then Möbius transformation maps a point on the unit circle θ onto anotherθ. Jones in [30] subsequently applied the Möbius transformation to introduce a new family of distributions on the disc. Kato and Jones in [31] used the Möbius transformation to introduce a new distribution on the circle by transforming the von Mises distribution. Wang and Shimizu in [32] applied the Möbius transformation to cardioid random variables. Kato and Pewsey in [33] employed this transformation to define the unimodal bivariate wrapped Cauchy distribution by transforming the bivariate circular distribution in [34]: 1), and −1 < ρ < 1. Kato and McCullagh in [35] introduced the Cauchy distribution on the sphere by using a Möbius transformation.

Our Contribution
In this paper, two new distributions are introduced on the torus by applying a restricted version of the Möbius transformation developed by Kato and Pewsey in [33], namely the circular Möbius transformation that transforms θ intoθ through the following mapping: where −π ≤ µ, ν ≤ π, r ∈ [0, 1), and µ is the rotation parameter. When µ = 0, ν and r attract the point θ towards ν. By increasing r, the concentration of the points around ν increases. If r = 0, the transformation is identity mapping, and when r → 1, (θ, µ, ν, r) tends to ν. More details about the circular Möbius transformation can be found in [29,36]. The inverse of (9) can be obtained as follows: More specifically, our novel contribution includes the following highlights: • New Möbius transformation-induced toroidal distributions are developed, acting as alternatives for existing models and efficiently outperforming them in the data application in this paper; • The proposed distributions reflect the protein structure more accurately than the existing models and can serve as proposal distributions for MCMC sampling of proteins since we should incorporate protein structure information into proposal distributions to obtain more accurate results; • Sine-skewed versions of these proposed models are introduced to meet the increasing demand for the modelling of asymmetric toroidal data; • The marginals of the new models lead to new multimodal circular distributions.
The remainder of this paper is organised as follows. Section 2 introduces two new distributions emanating from the sine and cosine models in (2) and (4), respectively. Section 3 introduces the sine-skewed versions of the newly proposed transformed sine and cosine models. Section 4 outlines the maximum likelihood method for obtaining the parameter estimates for the proposed models. Three real datasets, including information on angles in protein structures, are analysed in Section 5 to determine the performance of the proposed models relative to known competitors, and demonstrate their well-deserved designation as possible models for toroidal data. In Section 6, a simulation study is conducted for two reasons: (1) to explore the best method of generating samples from the newly transformed sine and cosine models, and (2) to evaluate the numerical method, followed by the acquisition of the maximum likelihood estimates (MLEs) of the parameters.

Two New Models on the Torus
This section highlights two new flexible models for toroidal data, obtained by transforming the sine and cosine models in (2) and (4) via a Möbius transformation.

Proof. See Appendix A.
In the following, the marginal pdf and conditional pdf of the transformed cosine model (10) and their properties are discussed. The marginal pdf of θ 1 for the transformed cosine model in (10) is as follows: where and C is as defined in (5). The marginal pdf of Θ 1 in (12) is symmetric to µ 1 , small values of κ 3 approximate the transformed von Mises distribution [31], and r 1 = 0, which simplifies to the marginal pdf of the cosine model [14]. It is clear that for r 1 = 0 and small values of κ 3 , the von Mises distribution is approximated. If κ 1 = κ 2 = κ 3 = 0 in (12), then the Möbius-transformed uniform distribution is obtained. For κ 1 = κ 2 = κ 3 = r 1 = 0, the distribution is uniform. When κ 1 = κ 2 = 0 in (12), the distribution is the transformed von Mises distribution [31], and when κ 1 = κ 2 = r 1 = 0, the von Mises distribution is obtained. The plots of this generalized marginal pdf of Θ 1 are shown in Figure 3 (left) for µ 1 = 0 and different values of κ 1 , κ 2 , κ 3 and r 1 , reflecting unimodal and bimodal graphs. In the following theorem, the modality of the marginal density function Θ 1 is addressed.
In this case, the marginal pdf of Θ 1 for the transformed sine model in (16) is as follows: where and C, as shown in (3). The marginal pdf of Θ 1 is symmetric around µ 1 . If κ 3 = 0, the distribution is the transformed von Mises distribution [31]. If r 1 = 0 in (18), the marginal distribution of the sine model [13] is obtained. The plots of the marginal pdf of Θ 1 in (18) are shown in Figure 3 (right) for µ 1 = 0 and different values of κ 1 , κ 2 , κ 3 , and r 1 . As can be seen, the distribution can be both unimodal and bimodal. In the following theorem, the modality of the marginal pdf of Θ 1 in (18) is explored.

Sine-Skewed Transformed Sine and Cosine Distributions
In practice, it is possible to have skewed toroidal datasets, despite the well-known toroidal distributions being pointwise symmetric. Therefore, it would be interesting to extend this methodology to the recent model of Ameijeiras-Alonso and Ley in [24]. In this section, the skewed versions of the proposed transformed sine and cosine models in (16) and (10) are introduced. In addition, Abe and Pewsey's skew model in [37] is applied to extend models on the circle manifold using marginal density functions.
To expand the skewed circular models, the following models are introduced based on the k sine-skewed model of [37]. The skewed version of the marginal distribution of Θ 1 in (12) is the following: where C is as defined in (5), h(θ 1 ) is as defined in (13), and −1 ≤ λ ≤ 1. λ > 0 leads to left-skewed distributions, and λ < 0 provides right-skewed distributions. The plots of the skewed pdf in (25) are shown in Figure 7 (left) for k = 1, µ 1 = 0, and different values of κ 1 , κ 2 , κ 3 , r 1 , and λ. Similarly, the sine-skewed version [37] of the marginal pdf of Θ 1 in (18) is of the same density as in (25), where C is as defined in (3), h(θ 1 ) is as defined in (19), and −1 ≤ λ ≤ 1. The plots of the sine-skewed version of the marginal pdf in (18) are shown in Figure 7 (right) for k = 1, µ 1 = 0, and different values of κ 1 , κ 2 , κ 3 , r 1 , and λ. As can be seen, the distribution is both unimodal and bimodal. Multimodal results for k > 1.

Maximum Likelihood Estimation
In this section, the maximum likelihood method is outlined to obtain the estimates of parameters for both the transformed cosine and sine models.

Protein Structure Application
To demonstrate the performance of the proposed models in modelling the dihedral angles and the planar and torsion angles in a protein structure, three datasets are considered, which are available at http://scop.mrc-lmb.cam.ac.uk/scop/. SCOP.1 contains 10,188 planar and torsion angles (θ, τ) (see Figure 1A) for about 63 protein domains that were randomly selected from three remote protein classes in the structural classification of proteins (SCOP). SCOP.3 includes 4607 planar and torsion angles (θ, τ) from approximately 40 protein chains, and the TCBIG.VAL.right set consists of 2673 dihedral angles (φ, ψ) (see Figure 7B) [41]. The Ramachandran plots [1] for each dataset are presented in Figure 8. As can be seen, the datasets are at least bimodal, so bimodal or mixture distributions will be good choices for fitting. The transformed sine and cosine models in (16) and (10), along with their competitorsthe sine model, and a mixture of sine models (see (2); [13]), the cosine model, and a mixture of cosine models (see (4); [14]), and a mixture of bivariate wrapped Cauchy models (see (8); [33])-were fitted to the SCOP.1 and SCOP.3 datasets. A mixture distribution with two components was investigated as follows: where p ∈ [0, 1] and f 1 (., .) and f 2 (., .) are two toroidal distributions. The estimation of parameters, identifiability, and choosing the number of mixing components and parameters are among the well-known challenges in the application of mixture distributions. Furthermore, when the empirical density of the data is highly asymmetric, it can result in a misleading statistical inference of the parameters [42]. Multimodal distributions, which represent the random behaviour of data with multi-mode presence, can provide better model fitting. This is observed here using the bimodal transformed sine model.
The sine-skewed versions of the aforementioned distributions [24] form part of these evaluations. The results, including the MLEs of parameters, log-likelihood, Akaike information criterion (AIC), and the Bayesian information criterion (BIC), are shown in Tables 1 and 2. Based on these results, the bimodal transformed sine model in (16) provides the best fit for the data, and its performance is better than that of the mixture models for these datasets. Based on the symmetry test of Ameijeiras-Alonso and Ley in [24] and the values of log-likelihood in Tables 1 and 2, there is no evidence that rejects the fact that underlying distributions for SCOP.1 and SCOP.3 are pointwise symmetric. The results of the mixture of transformed sine and the mixture of transformed cosine models are not reported in Tables 1 and 2 becausep ≈ 1. Scatter plots of the data, together with contour plots of the fitted distributions are provided in Figures 9 and 10.
With the last dataset TCBIG.VAL.right, good results are not observed upon application of the single component distributions. Therefore, a mixture model might offer a solution. Subsequently, only mixtures of the aforementioned distributions were considered. For comparison, goodness-of-fit was evaluated for mixtures of distributions from transformed sine and cosine models, and for mixtures of distributions from existing models. The results are listed in Table 3. As can be seen, the mixture of transformed sine models provides the best fitting of the data. Scatter plots of the data and contour plots of the fitted distributions are shown in Figure 11.     The kernel density plots of the three datasets and the best-fit models obtained for each dataset are shown in Figure 12. According to the levels of contours in the kernel densities of the data and fitted curves, our proposed models provide an accurate fit.  Figure 12. Kernel density plots of the data, and the best-fit models.

Simulation Study
The authors of Ref. [16] explored suitable methods for generating samples from cosine (with positive interaction) and sine models. They found that both Gibbs and rejection sampling approaches performed well, but the latter was more efficient. To simulate a sample from the newly proposed transformed sine and transformed cosine distributions in (16) and (10), four packages in R, which are generally based on rejection sampling, including MCMCpack [43], gibbs.met [44], LearnBayes [45], and MHadaptive [46], were used and the results were compared. These packages are based on Metropolis sampling, random walk Metropolis sampling, Metropolis-Hastings MCMC sampling, and Gibbs sampling with Metropolis steps. First, a sample of size n = 1000 was generated with each package from the transformed sine model in (16), with the parameters κ 1 = 2.1585, κ 2 = 0.3489, κ 3 = 3.1712, r 1 = 0.6036, r 2 = 0.0131, µ 1 = 1.8573, and µ 2 = 2.4321 (the best-fit model for the SCOP.1 dataset in the previous section). The results, including scatter plots of simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], which use the last 10 percent of the chain, are shown in Figure 13. The runtime of each method is shown in Figure 14 (left) for a sample size of n = 100 [48] (system: Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz RAM 8.00 GB). Second, the MLE of the parameters and bias and the mean squared error (MSE) of the estimates were calculated for each method using the Monte Carlo method, with 500 replications and n = 1,001,000. The results are listed in Table 4.
Similarly, for the transformed cosine model in (10) with parameters κ 1 = 3.9891, κ 2 = 0.6532, κ 3 = 1.7911, r 1 = 0.2305, r 2 = 0.5046, µ 1 = −1.5651, and µ 2 = 0.9878, the aforementioned R packages were applied, first to generate a sample size of n = 1000. The results, including scatter plots of simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], are shown in Figure 15. The runtime of each method is presented in Figure 14 (right) for a sample size of n = 100 [48]. Then, the MLE of the parameters and bias and the MSE of the estimates were calculated for each method using the Monte Carlo method, with 500 replications and n = 100,1000. The results are listed in Table 4, which support the performance of the selected approach for obtaining the MLEs of parameters. As shown in Figure 14, the MCMCmetrop1R is the highest-speed method, and gibbs_met is the lowest-speed method. According to the results in Table 4, rejection sampling provides accurate results. Gibbs sampling with Metropolis steps (gibbs_met) is also precise despite the low speed. With increasing n, bias and MSE decrease.  Figure 13. Scatter, trace, and compare-partial plots of the simulated data from the transformed sine model using "gibbs_met" in the "gibbs.met" package (first row), "MCMCmetrop1R" in the "MCMCpack" package (second row), "met_gaussian" in the "gibbs.met" package (third row), "Metro_Hastings" in the "MHadaptive" package (fourth row), and "rwmetrop" in the "LearnBayes" package (fifth row).    Figure 15. Scatter, trace, and compare-partial plots of the simulated data from the transformed cosine model using "gibbs_met" in the "gibbs.met" package (first row), "MCMCmetrop1R" in the "MCMCpack" package (second row), "met_gaussian" in the "gibbs.met" package (third row), "Metro_Hastings" in the "MHadaptive" package (fourth row), and "rwmetrop" in the "LearnBayes" package (fifth row).

Conclusions
In MCMC protein sampling for predicting the 3D structure, when the proposal distribution is closer to the stationary distribution, the results are more accurate. Therefore, a suitable proposal distribution can be defined using the angles and bond lengths observed in natural proteins. Statistical distributions for modelling protein dihedral angles can be used as proposal distributions for MCMC protein sampling. We gave a brief overview of existing symmetric models that formed the basis of the proposed models in this paper ( (2) and (4)). In addition, new Möbius transformation-induced toroidal distributions, together with skewed versions, were developed in this study as alternatives to proposal distributions for the MCMC sampling of proteins. We demonstrated their performance with three protein datasets of toroidal nature and graphically illustrated their flexible behaviour. The AIC and BIC confirmed the better performance of our proposed models in comparison with the existing models. These newly proposed models even outperformed mixtures of well-known models for modelling toroidal data. In comparison with the existing toroidal models, these proposed models reflect the protein structural information better and should be incorporated into proposal distributions. Lastly, to meet the need for sampling of proposal distribution in the MCMC algorithm, suitable methods for generating samples from these new models were explored using different types of the Metropolis sampling. In the future, one can investigate the performance of the Möbius transformation to obtain new cylindrical distributions.  decreases from 0 to π. It can be concluded that g(θ 1 ) is decreasing in [0, π) and increasing in [−π, 0); hence, if −2r 1 + κ 2 κ 3 A(|κ 2 −κ 3 |) |κ 2 −κ 3 | (1−r 2 1 ) 2 (1−r 1 ) 2 − κ 1 (1−r 2 1 ) 2 (1−r 1 ) 2 < 0, then f Θ 1 (θ 1 ) ≥ 0 for θ 1 ∈ [−π, 0) and f Θ 1 (θ 1 ) < 0 for θ 1 ∈ [0, π); which means that f Θ 1 (θ 1 ) is unimodal. If −2r 1 + (1−r 2 1 ) 2 (1−r 1 ) 2 > 0 and −2r 1 + κ 2 κ 3 A(κ 2 +κ 3 ) κ 2 +κ 3 (1−r 2 1 ) 2 (1+r 1 ) 2 − κ 1 (1−r 2 1 ) 2 (1+r 1 ) 2 ≤ 0, then f Θ 1 (θ 1 ) is first increasing and then decreasing in [−π, 0), which means that f Θ 1 (θ 1 ) is bimodal. A more detailed proof is provided by the authors upon request.