Abstract
Proteins are found in all living organisms and constitute a large group of macromolecules with many functions. Proteins carry out these functions by adopting distinct three-dimensional structures encoded within the sequence of the constituent amino acids in one or more polypeptides. By applying a Möbius transformation to the bivariate von Mises distribution, new, more flexible distributions are proposed for use as proposal distributions in the MCMC sampling method for predicting protein 3D structures. In addition, sine-skewed versions of the proposed models are introduced to meet the increasing demand for modelling asymmetric toroidal data. Interestingly, the marginals of the new models lead to new multimodal circular distributions. We analysed three large datasets consisting of bivariate information about protein domains to illustrate the efficiency and behaviour of the proposed models. These newly proposed models outperformed mixtures of well-known models for modelling toroidal data. A simulation study was carried out to find the best method for generating samples from the proposed models. Our results shed new light on proposal distributions in the MCMC sampling method for predicting the protein structure environment.
1. Introduction
Proteins constitute a diverse set of biological macromolecules that are often referred to as the workhorses of cells because of their central role in most biological processes. Chemically, proteins are biopolymers consisting of linear sequences of amino acids covalently linked by peptide bonds, such that each polypeptide is a single large molecule. Nineteen of the natural amino acids (all but proline) have an amino group (–NH$_2$), a carboxylic acid group (–COOH), an amino acid-specific side-chain, and a hydrogen atom attached to a central carbon atom ($C_\alpha$). Each peptide bond links the carboxylate group of one amino acid to the amino group of the next. Protein structure is often described in terms of four levels of organisation. The primary structure is the sequence of amino acids. The secondary structure refers to the local folding of the polypeptide backbone into helices, strands, or loops. The tertiary structure describes the complex three-dimensional folding of a polypeptide. Finally, the quaternary structure describes the involvement of one or more polypeptides in creating a functional protein. The amino nitrogen, $C_\alpha$, and the carbonyl carbon of all residues constitute the protein backbone.
The 3D coordinates of proteins, as provided by electron microscopy, NMR, or X-ray crystallography, directly reveal the conformation of the backbone atoms, with knowledge of standard chemical bond angles and lengths incorporated during the refinement process. Generally, the backbone conformation is analysed using the backbone torsion (dihedral) angles, denoted by $\phi$, $\psi$, and $\omega$, as introduced by Ramachandran [1] (Figure 1A), where $\omega$ is usually close to $180^\circ$ or occasionally $0^\circ$. Alternatively, the virtual bond angle $\theta$ and the virtual torsion angle $\tau$ may be used to describe a protein backbone representation based only on $C_\alpha$ positions (Figure 1B).
Figure 1.
Two representations of protein backbone structures based on torsion or pseudo-torsion angles.
A major challenge in molecular biology and computational biochemistry involves predicting protein 3D structure. The encoding gene provides the primary structure of a protein, and the secondary structure may be predicted computationally with high reliability using artificial neural networks [2], based on the propensity of amino acids to form different secondary structures.
However, predicting the 3D structure of a protein, especially if it is larger than 100 amino acids or if a homologue with a known structure and significant sequence identity is not available, remains challenging. This challenge is addressed by de novo structure prediction, which requires parametrized physical force fields. The probability of observing a particular conformation $x$ of the molecule, $P(x)$, is expressed as the Boltzmann distribution:
$$P(x) = \frac{1}{Z}\exp\{-\beta E(x)\},$$
where $Z$ is the normalization constant, $E(x)$ is the potential energy of the molecule, $\beta = 1/(kT)$ is the thermodynamic beta, $k$ is the Boltzmann constant, and $T$ is the temperature. The 3D structure of a molecule can be derived from $P(x)$ by determining the mode of the distribution. Molecular dynamics (MD) is a simulation-based method used to probe for the mode of the distribution. However, many millions to trillions of steps are required to simulate a single folding event. By contrast with MD, Monte Carlo (MC)-based methods are more time-efficient. In the Markov Chain Monte Carlo (MCMC) method, a Markov chain is constructed using the Metropolis–Hastings (MH) algorithm [3,4], with $P(x)$ as the stationary distribution. A symmetric proposal distribution is utilized in the MH algorithm.
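The Metropolis step described above can be sketched in a few lines. This is a toy illustration, not the sampler used in this paper: the single-angle energy function `energy`, its minimum `x_min`, the inverse temperature `beta`, and the Gaussian step size `step` are all hypothetical choices. The chain targets the Boltzmann density proportional to exp(−βE(x)) on the circle with a symmetric random-walk proposal.

```python
import math
import random


def energy(x, x_min=1.0, depth=3.0):
    """Hypothetical toy energy of a single angle x, minimised at x_min."""
    return -depth * math.cos(x - x_min)


def wrap(x):
    """Wrap an angle to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def metropolis(n_steps, beta=1.0, step=0.8, seed=0):
    """Random-walk Metropolis chain with stationary density ~ exp(-beta * E)."""
    rng = random.Random(seed)
    x = 0.0
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = wrap(x + rng.gauss(0.0, step))  # symmetric proposal
        log_ratio = -beta * (energy(proposal) - energy(x))
        # 1 - U lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x, accepted = proposal, accepted + 1
        chain.append(x)
    return chain, accepted / n_steps


def circular_mean(angles):
    """Mean direction of a sample of angles."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


chain, acc_rate = metropolis(20000)
print("mean direction:", circular_mean(chain), "acceptance rate:", acc_rate)
```

With this energy the target is a von Mises density centred at `x_min`, so the mean direction of the chain should settle near that value. A proposal closer to the target would raise the acceptance rate, which is the motivation for the models developed below.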
Choosing a good proposal distribution is one of the challenges in MCMC-based simulation. Gaussian perturbations are the most straightforward proposal distributions that can be used [5]. The results are more accurate when the proposal distribution is closer to the stationary distribution; therefore, protein structural information is incorporated into most proposal distributions. Using the information on angles and bond lengths observed in real proteins is a simple way to define a suitable proposal distribution. Fragment libraries for backbone angles and rotamer libraries for side-chain angles can be selected as default choices for proposal distributions [6,7,8].
Various tractable statistical distributions for modelling protein dihedral angles are briefly reviewed. These models can be used as proposal distributions for MCMC protein sampling. They can also be utilized as a prior for determining a protein structure from data. However, these models do not generate folded proteins because they work under some simplifying assumptions, both in terms of their functional form and dependency structure (see [9]). The ultimate goal of our contribution is to propose more flexible models for the proposal distribution.
1.1. Brief Overview
An overview of the models available for toroidal data that forms the departure point for the investigation in this paper follows (see [10]).
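The first probability distribution on the torus was proposed by Mardia in [11]. It is the bivariate von Mises distribution:
$$f(\theta_1,\theta_2) = C\exp\left\{\kappa_1\cos(\theta_1-\mu_1)+\kappa_2\cos(\theta_2-\mu_2)+[\cos(\theta_1-\mu_1),\ \sin(\theta_1-\mu_1)]\,A\,[\cos(\theta_2-\mu_2),\ \sin(\theta_2-\mu_2)]^{\top}\right\}, \tag{1}$$
where C is the normalizing constant, $\mu_1, \mu_2$ are location parameters, $\kappa_1, \kappa_2$ are concentration parameters, and the $2\times 2$ matrix $A$ is the circular–circular dependence parameter. To move beyond the complexity created by the large number of parameters in this founding distribution, a few special cases in the literature have been considered. Rivest in [12] introduced the subclass:
$$f(\theta_1,\theta_2) = C\exp\left\{\kappa_1\cos(\theta_1-\mu_1)+\kappa_2\cos(\theta_2-\mu_2)+\alpha\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2)+\beta\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\right\},$$
where $A=\mathrm{diag}(\alpha,\beta)$. Singh et al. in [13] proposed the sine model as a special case of (1) with one fewer parameter, letting $\alpha=0$ and $\beta=\lambda$:
$$f(\theta_1,\theta_2) = C\exp\left\{\kappa_1\cos(\theta_1-\mu_1)+\kappa_2\cos(\theta_2-\mu_2)+\lambda\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\right\}, \tag{2}$$
where
$$C^{-1} = 4\pi^2\sum_{m=0}^{\infty}\binom{2m}{m}\left(\frac{\lambda^2}{4\kappa_1\kappa_2}\right)^{m} I_m(\kappa_1)\,I_m(\kappa_2), \tag{3}$$
and $I_m$ is the modified Bessel function of the first kind of order $m$. Another submodel of (1), the cosine model, was introduced by Mardia et al. in [14] by setting :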
where
It is worth noting that Kent et al. in [15] introduced another version of the cosine model, with a negative interaction given by:
with the same normalizing constant as for the model with a positive interaction in (4). Kent et al. in [15] also introduced a submodel of (1), which is a hybrid between the sine and cosine models, given by:
where , , and for simplicity, . Mardia and Frellsen in [16] compared the properties of these three submodels in (2), (4), and (6). The multivariate extensions of the sine model can be found in [17]. In another attempt to expand the platform of toroidal distributions, Wehrly and Johnson in [18] used a marginal specification approach to construct bivariate models with more flexible specified circular marginals. Later, Jones et al. in [19] obtained various toroidal models using the general form in [18]. Along the same lines, Fernández-Durán in [20] proposed another general toroidal model based on a copula pdf, on which García-Portugués imposed periodic restrictions in [21]; Jones et al. [19] termed it a circula pdf, arguing that it is characterised by a circular uniform distribution. For more details, see [22].
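As a numerical sanity check on the sine model, the sketch below compares the series form of its normalizing constant from Singh et al. [13] with a brute-force integral of the unnormalized density over the torus. The parametrisation (concentrations `k1`, `k2` and dependence parameter `lam`) follows the usual notation in the literature and is an assumption about the symbols used here; the helper names are hypothetical.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), via its power series."""
    half = x / 2.0
    return sum(half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))


def sine_log_kernel(t1, t2, mu1, mu2, k1, k2, lam):
    """Exponent of the sine model, i.e. the log-density without the constant C."""
    return (k1 * math.cos(t1 - mu1) + k2 * math.cos(t2 - mu2)
            + lam * math.sin(t1 - mu1) * math.sin(t2 - mu2))


def sine_inv_constant(k1, k2, lam, terms=30):
    """1/C from the series representation; requires k1 > 0 and k2 > 0."""
    ratio = lam * lam / (4.0 * k1 * k2)
    total = sum(math.comb(2 * m, m) * ratio ** m * bessel_i(m, k1) * bessel_i(m, k2)
                for m in range(terms))
    return 4.0 * math.pi ** 2 * total


def grid_integral(log_kernel, n=120):
    """Midpoint rule on an n x n grid over the torus [-pi, pi)^2."""
    h = 2.0 * math.pi / n
    pts = [-math.pi + (i + 0.5) * h for i in range(n)]
    return sum(math.exp(log_kernel(a, b)) for a in pts for b in pts) * h * h


k1, k2, lam = 2.0, 1.5, 1.0
series = sine_inv_constant(k1, k2, lam)
numeric = grid_integral(lambda a, b: sine_log_kernel(a, b, 0.3, -0.5, k1, k2, lam))
print("series 1/C:", series, "numerical 1/C:", numeric)
```

The two values should agree closely, because the midpoint rule converges very quickly for smooth periodic integrands. The location parameters do not affect the constant, which is why the series does not depend on them.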
The main incentive for defining toroidal models in recent years has been the demand from other sciences, especially bioinformatics, to model dihedral angles in order to analyse protein structures ([13,14,23,24]). However, toroidal data can also be observed in other fields, for example, in meteorology (wind directions at two different times of day) and medicine (peak systolic blood pressure during two separate time periods). For the interested reader, some applications of toroidal models can be found in [25,26,27,28,29].
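Most of the proposed toroidal models are pointwise symmetric, whereas the data that they model usually exhibit asymmetric patterns. This inspired Ameijeiras-Alonso and Ley [24] to introduce bivariate sine-skewed distributions:
$$f(\theta_1,\theta_2) = f_0(\theta_1-\mu_1,\theta_2-\mu_2)\left\{1+\lambda_1\sin(\theta_1-\mu_1)+\lambda_2\sin(\theta_2-\mu_2)\right\}, \tag{7}$$
where $f_0(\theta_1-\mu_1,\theta_2-\mu_2)$ is a toroidal density symmetric (pointwise) about $(\mu_1,\mu_2)$, and the skewness parameters $\lambda_1, \lambda_2 \in [-1,1]$ satisfy $|\lambda_1|+|\lambda_2|\le 1$.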
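The sine-skewing construction can be checked numerically: multiplying a pointwise-symmetric density by the skewing factor leaves the total mass equal to one and keeps the density non-negative. The sketch below assumes a sine-model base density with hypothetical parameter values and a numerically computed normalizer; the names `skew1` and `skew2` stand for the two skewness parameters.

```python
import math


def base_log_kernel(t1, t2, k1=2.0, k2=1.5, lam=1.0):
    """Sine-model exponent centred at (0, 0); pointwise symmetric about the origin."""
    return k1 * math.cos(t1) + k2 * math.cos(t2) + lam * math.sin(t1) * math.sin(t2)


def make_grid(n=100):
    """Midpoints of n equal cells on [-pi, pi) and the cell width."""
    h = 2.0 * math.pi / n
    return [-math.pi + (i + 0.5) * h for i in range(n)], h


def base_normaliser(n=100):
    """Numerical integral of the unnormalised base density over the torus."""
    pts, h = make_grid(n)
    return sum(math.exp(base_log_kernel(a, b)) for a in pts for b in pts) * h * h


def sine_skewed_pdf(t1, t2, mu1, mu2, skew1, skew2, norm):
    """Base density shifted to (mu1, mu2) times the sine-skewing factor."""
    if abs(skew1) + abs(skew2) > 1.0:
        raise ValueError("need |skew1| + |skew2| <= 1")
    base = math.exp(base_log_kernel(t1 - mu1, t2 - mu2)) / norm
    return base * (1.0 + skew1 * math.sin(t1 - mu1) + skew2 * math.sin(t2 - mu2))


norm = base_normaliser()
pts, h = make_grid()
values = [sine_skewed_pdf(a, b, 0.3, -0.5, 0.6, -0.3, norm) for a in pts for b in pts]
total = sum(values) * h * h
lowest = min(values)
print("total mass:", total, "minimum density:", lowest)
```

The mass is preserved because the product of a pointwise-symmetric density and a sine term centred at the same location integrates to zero, so no new normalizing constant is required.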
In this paper, the Möbius transformation forms the foundation for the construction of competitive models. A map $M:\mathbb{C}\to\mathbb{C}$ is a Möbius transformation if it has the following form:
$$M(z)=\frac{az+b}{cz+d},$$
where $\mathbb{C}$ is the set of complex numbers, $a, b, c, d$ are complex numbers, and $ad-bc\neq 0$. With suitable coefficients, the Möbius transformation maps the unit circle onto itself, sending each point on the circle to another point on the circle. Jones in [30] subsequently applied the Möbius transformation to introduce a new family of distributions on the disc. Kato and Jones in [31] used the Möbius transformation to introduce a new distribution on the circle by transforming the von Mises distribution. Wang and Shimizu in [32] applied the Möbius transformation to cardioid random variables. Kato and Pewsey in [33] employed this transformation to define the unimodal bivariate wrapped Cauchy distribution by transforming the bivariate circular distribution in [34]:
where , , , , , , , , and . Kato and McCullagh in [35] introduced the Cauchy distribution on the sphere by using a Möbius transformation.
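To make the circle-preserving behaviour concrete, the sketch below implements one common parametrisation of the Möbius map on the unit circle, the form used by Kato and Jones. Treating this as the form relevant here is an assumption, and the names `mu`, `nu`, and `r` are illustrative. The example shows that points stay on the circle, that r = 0 gives a pure rotation, and that r close to 1 pulls most points towards the angle mu + nu.

```python
import cmath
import math


def mobius_circle(theta, mu, nu, r):
    """Image of the point e^{i theta} under a circle-preserving Möbius map.

    Assumed form: M(z) = e^{i mu} (z + r e^{i nu}) / (r e^{-i nu} z + 1), 0 <= r < 1.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    z = cmath.exp(1j * theta)
    a = r * cmath.exp(1j * nu)
    return cmath.exp(1j * mu) * (z + a) / (a.conjugate() * z + 1.0)


def angular_distance(w, angle):
    """Absolute angle between the unit complex number w and e^{i angle}."""
    return abs(cmath.phase(w / cmath.exp(1j * angle)))


thetas = [-math.pi + (j + 0.5) * 2.0 * math.pi / 360 for j in range(360)]
mapped = [mobius_circle(t, mu=0.5, nu=0.4, r=0.99) for t in thetas]
share_near = sum(angular_distance(w, 0.9) < 0.3 for w in mapped) / len(mapped)
print("share of points within 0.3 rad of mu + nu:", share_near)
```

Only points almost antipodal to `nu` escape the attraction, which is why the share is large but not equal to one.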
1.2. Our Contribution
In this paper, two new distributions are introduced on the torus by applying a restricted version of the Möbius transformation developed by Kato and Pewsey in [33], namely the circular Möbius transformation that transforms into through the following mapping:
where , , , and is the rotation parameter. When , and r attract the point towards . By increasing r, the concentration of the points around increases. If , the transformation is identity mapping, and when , tends to . More details about the circular Möbius transformation can be found in [29,36]. The inverse of (9) can be obtained as follows:
More specifically, our novel contribution includes the following highlights:
- New Möbius transformation-induced toroidal distributions are developed, acting as alternatives for existing models and efficiently outperforming them in the data application in this paper;
- The proposed distributions reflect the protein structure more accurately than the existing models and can serve as proposal distributions for MCMC sampling of proteins since we should incorporate protein structure information into proposal distributions to obtain more accurate results;
- Sine-skewed versions of these proposed models are introduced to meet the increasing demand for the modelling of asymmetric toroidal data;
- The marginals of the new models lead to new multimodal circular distributions.
The remainder of this paper is organised as follows. Section 2 introduces two new distributions emanating from the sine and cosine models in (2) and (4), respectively. Section 3 introduces the sine-skewed versions of the newly proposed transformed sine and cosine models. Section 4 outlines the maximum likelihood method for obtaining the parameter estimates for the proposed models. Three real datasets, including information on angles in protein structures, are analysed in Section 5 to assess the performance of the proposed models relative to known competitors and to demonstrate their suitability as models for toroidal data. In Section 6, a simulation study is conducted for two reasons: (1) to explore the best method of generating samples from the newly transformed sine and cosine models, and (2) to evaluate the numerical method used to obtain the maximum likelihood estimates (MLEs) of the parameters.
2. Two New Models on the Torus
This section highlights two new flexible models for toroidal data, obtained by transforming the sine and cosine models in (2) and (4) via a Möbius transformation.
2.1. Transformed Cosine Model
Let have pdf (4) with . Suppose that
where is defined in (9), , and without loss of generality . Then, has a pdf of
where , , C is defined in (5), and
where are location parameters, are concentration parameters, is the circular–circular dependence parameter, and and regulate the concentrations of the marginal distributions. In (10), when , the cosine model (4) is obtained. If yields the bivariate wrapped Cauchy distribution, then follows. The pdf and contour plots of (10) are shown in Figure 2 for and different values of and , and reveal unimodal and bimodal behaviour.
Figure 2.
Pdf and contour plots of the transformed cosine model (10) for and different values of , and .
Proposition 1.
Assuming the transformed cosine model (10), when , then has approximately a bivariate normal distribution if and only if .
Proof.
See Appendix A. □
In the following, the marginal pdf and conditional pdf of the transformed cosine model (10) and their properties are discussed. The marginal pdf of for the transformed cosine model in (10) is as follows:
where
and C is as defined in (5). The marginal pdf of in (12) is symmetric about , small values of approximate the transformed von Mises distribution [31], and , which simplifies to the marginal pdf of the cosine model [14]. It is clear that for and small values of , the von Mises distribution is approximated. If in (12), then the Möbius-transformed uniform distribution is obtained. For , the distribution is uniform. When in (12), the distribution is the transformed von Mises distribution [31], and when , the von Mises distribution is obtained. The plots of this generalized marginal pdf of are shown in Figure 3 (left) for and different values of and , reflecting unimodal and bimodal shapes. In the following corollary, the modality of the marginal density function is addressed.
Corollary 1.
Proof.
See Appendix A. □
The conditional pdf results in the transformed von Mises distribution [31] given by the following:
where is as defined in (13), and
2.2. Transformed Sine Model
Let have a bivariate pdf (2), with . Suppose that
where is as defined in (9), , , and without loss of generality . Then, has a pdf as follows:
where , , C is as defined in (3), and
where are location parameters, are concentration parameters, is the circular–circular dependence parameter, and and regulate the concentrations of the marginal distributions. If in (16), then the sine model in (2) follows. The pdf and contour plots of (16) are shown in Figure 4 for and for different values of and . As can be seen, this transformed sine pdf (16) can have both unimodal and bimodal forms.
Figure 4.
Pdf and contour plots of the transformed sine model (16) for and different values of , and .
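The change-of-variables idea behind the transformed sine model can be sketched numerically. The code below is not a transcription of (16): it assumes that each angle of a sine model centred at the origin is passed through the map M(z) = e^{i mu}(z + r)/(r z + 1), and it normalizes the base density on a grid rather than with a closed-form constant. All function names and parameter values are hypothetical.

```python
import cmath
import math


def inverse_mobius(theta, mu, r):
    """Angle phi with M(phi) = theta, for M(z) = e^{i mu} (z + r) / (r z + 1)."""
    u = cmath.exp(1j * (theta - mu))
    return cmath.phase((u - r) / (1.0 - r * u))


def jacobian(theta, mu, r):
    """|d phi / d theta| for the inverse map, a wrapped Cauchy-type factor."""
    return (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - mu))


def sine_kernel(p1, p2, k1, k2, lam):
    """Unnormalised sine-model density centred at the origin."""
    return math.exp(k1 * math.cos(p1) + k2 * math.cos(p2)
                    + lam * math.sin(p1) * math.sin(p2))


def torus_grid(n):
    """Midpoints of n equal cells on [-pi, pi) and the cell width."""
    h = 2.0 * math.pi / n
    return [-math.pi + (i + 0.5) * h for i in range(n)], h


def transformed_sine_pdf(t1, t2, mu1, mu2, k1, k2, lam, r1, r2, norm):
    """Density of the transformed pair: base density at the pre-image times Jacobians."""
    p1 = inverse_mobius(t1, mu1, r1)
    p2 = inverse_mobius(t2, mu2, r2)
    base = sine_kernel(p1, p2, k1, k2, lam) / norm
    return base * jacobian(t1, mu1, r1) * jacobian(t2, mu2, r2)


pts, h = torus_grid(150)
norm = sum(sine_kernel(a, b, 2.0, 1.5, 1.0) for a in pts for b in pts) * h * h
total = sum(transformed_sine_pdf(a, b, 0.3, -0.5, 2.0, 1.5, 1.0, 0.4, 0.2, norm)
            for a in pts for b in pts) * h * h
print("total mass of the transformed density:", total)
```

Because the transformation is a smooth bijection of each circle, the transformed density inherits the normalizing constant of the base model. Setting both `r1` and `r2` to zero recovers the shifted sine model.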
Proposition 2.
Assuming the transformed sine model in (16), when , then has an approximately bivariate normal distribution if and only if .
Proof.
The proof is similar to that of Proposition 1 and uses the results in [13]. □
In this case, the marginal pdf of for the transformed sine model in (16) is as follows:
where
and C is as defined in (3). The marginal pdf of is symmetric around . If , the distribution is the transformed von Mises distribution [31]. If in (18), the marginal distribution of the sine model [13] is obtained. The plots of the marginal pdf of in (18) are shown in Figure 3 (right) for and different values of , and . As can be seen, the distribution can be either unimodal or bimodal. In the following corollary, the modality of the marginal pdf of in (18) is explored.
Corollary 2.
Proof.
See Appendix A. □
Interestingly, the conditional distribution is the transformed von Mises distribution [31]. When in (20), the von Mises distribution with parameters and is obtained.
3. Sine-Skewed Transformed Sine and Cosine Distributions
In practice, it is possible to have skewed toroidal datasets, despite the well-known toroidal distributions being pointwise symmetric. Therefore, it would be interesting to extend this methodology to the recent model of Ameijeiras-Alonso and Ley in [24]. In this section, the skewed versions of the proposed transformed sine and cosine models in (16) and (10) are introduced. In addition, Abe and Pewsey’s skew model in [37] is applied to extend models on the circle manifold using marginal density functions.
By substituting (10) in (7), the sine-skewed transformed cosine () distribution can be defined as follows:
where , , C is as defined in (5), and – are as defined in (11). The pdf and contour plots of the sine-skewed transformed cosine model for , , , , , , and different values of and are shown in Figure 5 (top).
The marginal pdf of for in (22) is as follows:
where and are obtained from (13) and (15), respectively. When , is the Möbius-transformed sine-skewed version [37] of the marginal pdf of the cosine model. The plots of the skewed pdf in (23) are shown in Figure 6 (left) for and different values of , and . As can be observed, the distribution can be both unimodal and bimodal.
Figure 6.
Plots of the marginal pdf of for (left) and (right) for and different parameter values.
Similarly, from (16) and (7), the sine-skewed transformed sine () distribution can be obtained as follows:
where , , C is as defined in (3), and – are defined in (17). The pdf and contour plots of the sine-skewed transformed sine model for , , , , , , and different values of and are shown in Figure 5 (bottom).
The marginal pdf of for is of the same density as in (23), where and are obtained from (19) and (21). When , is the Möbius-transformed sine-skewed version [37] of the marginal pdf of the sine model. The plots of the skewed pdf in (23) are shown in Figure 6 (right) for and different values of , and . Figure 6 illustrates that the distribution can have both unimodal and bimodal forms.
To expand the skewed circular models, the following models are introduced based on the k sine-skewed model of [37]. The skewed version of the marginal distribution of in (12) is the following:
where C is as defined in (5), is as defined in (13), and . leads to left-skewed distributions, and provides right-skewed distributions. The plots of the skewed pdf in (25) are shown in Figure 7 (left) for , , and different values of , and .
Similarly, the sine-skewed version [37] of the marginal pdf of in (18) has the same form as (25), where C is as defined in (3), is as defined in (19), and . The plots of the sine-skewed version of the marginal pdf in (18) are shown in Figure 7 (right) for , , and different values of , and . As can be seen, the distribution can be either unimodal or bimodal. Multimodal densities result for .
4. Maximum Likelihood Estimation
In this section, the maximum likelihood method is outlined to obtain the estimates of parameters for both the transformed cosine and sine models. Suppose that are the parameters associated with the transformed cosine model (10). The log-likelihood function of the transformed cosine model is represented as follows:
where C is as defined in (5), and – are as defined in (11). The MLE of the parameters, , can be determined by maximizing (26) with respect to .
Supposing that are the parameters associated with the transformed sine model (16), the log-likelihood function of the transformed sine model can be represented as follows:
where C is as defined in (3), and – are as defined in (17). The maximization of (27) with respect to results in the MLE of the parameters, .
By setting the partial derivatives of the log-likelihood functions in (26) and (27) with respect to to zero, the MLEs of can be derived for the transformed cosine and sine models. Because no closed-form expressions exist, numerical methods are needed to obtain the MLEs. Operationally, (26) and (27) are maximized with respect to using the DEoptim package in R [38], which is based on the differential evolution (DE) algorithm [39]. Extensive studies have validated the strong performance of DE as a global optimization algorithm for continuous numerical minimization problems [40]. This package was also used to obtain the MLEs of the parameters for the sine-skewed versions and mixtures of the transformed cosine and sine models.
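The idea of maximizing a circular log-likelihood by differential evolution can be illustrated with a small stand-in. The sketch below is not DEoptim and does not fit the transformed models: it is a hand-written DE/rand/1/bin minimizer applied to the negative log-likelihood of a univariate von Mises distribution, chosen because its likelihood is short to write. The population size, number of generations, and the control values `f` and `cr` are hypothetical settings.

```python
import math
import random


def log_bessel_i0(x, terms=80):
    """log I_0(x) from the power series (adequate for moderate x)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return math.log(total)


def neg_log_lik(params, data):
    """Negative von Mises log-likelihood for params = (mu, kappa)."""
    mu, kappa = params
    const = len(data) * (math.log(2.0 * math.pi) + log_bessel_i0(kappa))
    return const - kappa * sum(math.cos(x - mu) for x in data)


def differential_evolution(objective, bounds, pop_size=20, n_gen=100,
                           f=0.8, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin minimiser; returns (best_vector, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(p) for p in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    lo, hi = bounds[d]
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(value, lo), hi))  # clip to the box
                else:
                    trial.append(pop[i][d])
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


data_rng = random.Random(1)
sample = [data_rng.vonmisesvariate(1.0, 4.0) for _ in range(1000)]
(mu_hat, kappa_hat), nll = differential_evolution(
    lambda p: neg_log_lik(p, sample),
    bounds=[(0.0, 2.0 * math.pi), (0.01, 20.0)])
print("mu_hat:", mu_hat, "kappa_hat:", kappa_hat)
```

The same pattern carries over to (26) and (27) by swapping in the corresponding negative log-likelihood and a bound for each parameter. A population-based search is attractive here because these likelihoods can be multimodal.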
5. Protein Structure Application
To demonstrate the performance of the proposed models in modelling the dihedral angles and the planar and torsion angles in a protein structure, three datasets are considered, which are available at http://scop.mrc-lmb.cam.ac.uk/scop/. SCOP.1 contains 10,188 planar and torsion angles (see Figure 1B) for about 63 protein domains that were randomly selected from three remote protein classes in the structural classification of proteins (SCOP). SCOP.3 includes 4607 planar and torsion angles from approximately 40 protein chains, and the TCBIG.VAL.right set consists of 2673 dihedral angles (see Figure 1A) [41]. The Ramachandran plots [1] for each dataset are presented in Figure 8. As can be seen, the datasets are at least bimodal, so bimodal or mixture distributions are natural choices for fitting.
Figure 8.
Ramachandran plots for each dataset.
The transformed sine and cosine models in (16) and (10) were fitted to the SCOP.1 and SCOP.3 datasets, along with their competitors: the sine model and a mixture of sine models (see (2); [13]), the cosine model and a mixture of cosine models (see (4); [14]), and a mixture of bivariate wrapped Cauchy models (see (8); [33]). A mixture distribution with two components was investigated as follows:
$$f(\theta_1,\theta_2) = p\,f_1(\theta_1,\theta_2) + (1-p)\,f_2(\theta_1,\theta_2),$$
where $0 \le p \le 1$ and $f_1$ and $f_2$ are two toroidal distributions. The estimation of parameters, identifiability, and the choice of the number of mixing components and parameters are among the well-known challenges in the application of mixture distributions. Furthermore, when the empirical density of the data is highly asymmetric, it can result in a misleading statistical inference of the parameters [42]. Multimodal distributions, which capture data with several modes directly, can provide a better fit. This is observed here using the bimodal transformed sine model.
The sine-skewed versions of the aforementioned distributions [24] form part of these evaluations. The results, including the MLEs of parameters, log-likelihood, Akaike information criterion (AIC), and the Bayesian information criterion (BIC), are shown in Table 1 and Table 2. Based on these results, the bimodal transformed sine model in (16) provides the best fit for the data, and its performance is better than that of the mixture models for these datasets. Based on the symmetry test of Ameijeiras-Alonso and Ley in [24] and the values of log-likelihood in Table 1 and Table 2, there is no evidence against the hypothesis that the underlying distributions for SCOP.1 and SCOP.3 are pointwise symmetric. The results of the mixture of transformed sine and the mixture of transformed cosine models are not reported in Table 1 and Table 2 because . Scatter plots of the data, together with contour plots of the fitted distributions, are provided in Figure 9 and Figure 10.
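For readers who want to reproduce the model comparison, the two criteria follow directly from the maximized log-likelihood, the number of free parameters, and the sample size. The sketch below uses the standard definitions; the log-likelihood values are hypothetical placeholders rather than the values in Table 1, and the parameter counts (7 for a single transformed model, 11 for a two-component mixture of five-parameter models) are stated as assumptions.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


n_obs = 10188  # size of the SCOP.1 dataset
candidates = {
    # name: (hypothetical maximized log-likelihood, assumed number of parameters)
    "single transformed model": (-25000.0, 7),
    "two-component mixture": (-25010.0, 11),
}
scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_bic = min(scores, key=lambda name: scores[name][1])
for name, (a, b) in scores.items():
    print(name, "AIC:", a, "BIC:", b)
print("smallest BIC:", best_by_bic)
```

Smaller values indicate a better trade-off between fit and complexity. BIC penalizes each extra parameter by log(n), which is larger than the AIC penalty of 2 at this sample size, so it is harsher on mixtures.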
Table 1.
Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for SCOP.1 (n = 10,188).
Table 2.
Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for SCOP.3 (n = 4607).
Figure 9.
Contour plots of fitted pdfs together with scatter plot for SCOP.1 (n = 10,188). The last row includes the proposed models.
Figure 10.
Contour plots of fitted pdfs together with scatter plot for SCOP.3 (n = 4607). The last row includes the proposed models.
With the last dataset, TCBIG.VAL.right, the single-component distributions do not give good results. Therefore, a mixture model might offer a solution, and only mixtures of the aforementioned distributions were considered. For comparison, goodness-of-fit was evaluated for mixtures of transformed sine and cosine models and for mixtures of existing models. The results are listed in Table 3. As can be seen, the mixture of transformed sine models provides the best fit to the data. Scatter plots of the data and contour plots of the fitted distributions are shown in Figure 11.
Table 3.
Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for TCBIG.VAL.right (n = 2673).
Figure 11.
Contour plots of fitted pdfs together with scatter plots for TCBIG.VAL.right (n = 2673). The last row includes the proposed models.
The kernel density plots of the three datasets and the best-fit models obtained for each dataset are shown in Figure 12. According to the levels of contours in the kernel densities of the data and fitted curves, our proposed models provide an accurate fit.
Figure 12.
Kernel density plots of the data, and the best-fit models.
6. Simulation Study
The authors of Ref. [16] explored suitable methods for generating samples from cosine (with positive interaction) and sine models. They found that both Gibbs and rejection sampling approaches performed well, but the latter was more efficient. To simulate a sample from the newly proposed transformed sine and transformed cosine distributions in (16) and (10), four R packages, which are generally based on rejection sampling, were used and the results were compared: MCMCpack [43], gibbs.met [44], LearnBayes [45], and MHadaptive [46]. These packages are based on Metropolis sampling, random walk Metropolis sampling, Metropolis–Hastings MCMC sampling, and Gibbs sampling with Metropolis steps. First, a sample of size was generated with each package from the transformed sine model in (16), with the parameters , , , , , , and (the best-fit model for the SCOP.1 dataset in the previous section). The results, including scatter plots of simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], which use the last 10 percent of the chain, are shown in Figure 13. The runtime of each method is shown in Figure 14 (left) for a sample size of [48] (system: Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz, RAM 8.00 GB). Second, the MLEs of the parameters, together with the bias and the mean squared error (MSE) of the estimates, were calculated for each method using the Monte Carlo method, with 500 replications and n = 100 and 1000. The results are listed in Table 4.
Figure 13.
Scatter, trace, and compare-partial plots of the simulated data from the transformed sine model using “gibbs_met” in the “gibbs.met” package (first row), “MCMCmetrop1R” in the “MCMCpack” package (second row), “met_gaussian” in the “gibbs.met” package (third row), “Metro_Hastings” in the “MHadaptive” package (fourth row), and “rwmetrop” in the “LearnBayes” package (fifth row).
Figure 14.
Execution times for generating a sample of size n = 100 from a transformed sine model (left) and a transformed cosine model (right) for each method.
Table 4.
Maximum likelihood estimates of parameters and bias, and the MSE of the estimates for the simulated data obtained from each method.
Similarly, for the transformed cosine model in (10) with parameters , , , , , , and , the aforementioned R packages were applied, first to generate a sample size of . The results, including scatter plots of simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], are shown in Figure 15. The runtime of each method is presented in Figure 14 (right) for a sample size of n = 100 [48]. Then, the MLEs of the parameters, together with the bias and the MSE of the estimates, were calculated for each method using the Monte Carlo method, with 500 replications and n = 100 and 1000. The results are listed in Table 4 and support the performance of the selected approach for obtaining the MLEs of the parameters. As shown in Figure 14, MCMCmetrop1R is the fastest method, and gibbs_met is the slowest. According to the results in Table 4, rejection sampling provides accurate results. Gibbs sampling with Metropolis steps (gibbs_met) is also precise despite its low speed. With increasing n, bias and MSE decrease.
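The Monte Carlo assessment of bias and MSE can be sketched as follows. To keep the example short it does not refit the transformed models: it assumes a univariate von Mises sample and uses the sample mean direction as the estimator of the location parameter, with 500 replications and the two sample sizes used above. The true values `mu` and `kappa` are hypothetical.

```python
import math
import random


def circular_mean(angles):
    """Mean direction of a sample of angles."""
    return math.atan2(sum(map(math.sin, angles)), sum(map(math.cos, angles)))


def angular_error(estimate, truth):
    """Signed difference between two angles, wrapped to [-pi, pi)."""
    return (estimate - truth + math.pi) % (2.0 * math.pi) - math.pi


def bias_and_mse(n, replications=500, mu=1.0, kappa=4.0, seed=0):
    """Monte Carlo bias and MSE of the mean direction for samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(replications):
        sample = [rng.vonmisesvariate(mu, kappa) for _ in range(n)]
        errors.append(angular_error(circular_mean(sample), mu))
    bias = sum(errors) / replications
    mse = sum(e * e for e in errors) / replications
    return bias, mse


results = {n: bias_and_mse(n) for n in (100, 1000)}
for n, (bias, mse) in results.items():
    print("n =", n, "bias:", bias, "MSE:", mse)
```

Errors are wrapped before averaging so that estimates on either side of the ±π cut are compared correctly. The MSE should shrink roughly in proportion to 1/n, in line with the pattern reported in Table 4.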
Figure 15.
Scatter, trace, and compare-partial plots of the simulated data from the transformed cosine model using “gibbs_met” in the “gibbs.met” package (first row), “MCMCmetrop1R” in the “MCMCpack” package (second row), “met_gaussian” in the “gibbs.met” package (third row), “Metro_Hastings” in the “MHadaptive” package (fourth row), and “rwmetrop” in the “LearnBayes” package (fifth row).
7. Conclusions
In MCMC protein sampling for predicting the 3D structure, the results are more accurate when the proposal distribution is closer to the stationary distribution. Therefore, a suitable proposal distribution can be defined using the angles and bond lengths observed in natural proteins. Statistical distributions for modelling protein dihedral angles can be used as proposal distributions for MCMC protein sampling. We gave a brief overview of existing symmetric models that formed the basis of the proposed models in this paper ((2) and (4)). In addition, new Möbius transformation-induced toroidal distributions, together with skewed versions, were developed in this study as alternative proposal distributions for the MCMC sampling of proteins. We demonstrated their performance with three protein datasets of toroidal nature and graphically illustrated their flexible behaviour. The AIC and BIC confirmed the better performance of our proposed models in comparison with the existing models. These newly proposed models even outperformed mixtures of well-known models for modelling toroidal data. In comparison with the existing toroidal models, these proposed models reflect the protein structural information better and should be incorporated into proposal distributions. Lastly, to meet the need for sampling from the proposal distribution in the MCMC algorithm, suitable methods for generating samples from these new models were explored using different variants of Metropolis sampling. In the future, one can investigate the performance of the Möbius transformation to obtain new cylindrical distributions.
Author Contributions
Conceptualization, M.A. and A.B.; methodology, M.A., N.N.R., A.B. and W.-D.S.; validation, M.A., N.N.R. and A.B.; formal analysis, M.A., N.N.R. and A.B.; investigation, M.A., N.N.R., A.B. and W.-D.S.; writing—original draft preparation, N.N.R.; writing—review and editing, M.A., N.N.R. and A.B.; visualization, M.A., N.N.R., A.B. and W.-D.S.; supervision, M.A. and A.B.; project administration, N.N.R.; funding acquisition, N.N.R. and A.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Research Foundation, grant numbers 71199, 109214, and 120839.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data are available at http://scop.mrc-lmb.cam.ac.uk/scop/ (accessed on 20 August 2020).
Acknowledgments
We would like to sincerely thank the three anonymous reviewers for their constructive comments that improved the paper. This work was based on research supported in part by the National Research Foundation (NRF) of South Africa, SARChI Research Chair UID: 71199; Ref.: IFR170227223754 grant No. 109214; Ref.: SRUG190308422768 grant No. 120839, STATOMET at the Department of Statistics at the University of Pretoria, and DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), South Africa. The opinions expressed and conclusions arrived at are those of the authors and are not necessarily attributed to the CoE-MaSS or the NRF.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Appendix A.1. Proof of Proposition 1
When , pdf (10) tends to the cosine distribution, which for large values of and is concentrated near 0. Suppose (without loss of generality), according to Theorem 1 in [14] and using Taylor expansions, , where , with .
Appendix A.2. Proof of Corollary 1
Without loss of generality, we consider . According to (12), we conclude that:
In (A1), if , then . Therefore, for , , and for , . Thus, is increasing in and decreases from 0 to . In addition, , which means that is symmetric around 0; thus, for , is unimodal. If , decreases from to 0 and increases from 0 to , and and . From Lemma 1 in [13], is a decreasing function of t; therefore, is increasing in and decreases from 0 to . It can be concluded that is decreasing in and increasing in ; hence, if , then for and for ; which means that is unimodal. If and , then is first increasing and then decreasing in , which means that is bimodal. A more detailed proof is available from the authors upon request.
Appendix A.3. Proof of Corollary 2
Suppose (without loss of generality). According to (18), the following result can be obtained:
In (A2), if , then and the sign of (A2) depends on the sign of . Hence, for , and for , . Thus, is increasing in and decreasing from to . In addition, , which means that is symmetric around 0; therefore, is unimodal. For , is an increasing function of , and according to Lemma 1 in [13], is a decreasing function of . We can conclude that if , then is a decreasing function from 0 to , and because is symmetric around 0, it increases from to 0. If , then first increases and then decreases in and (because it is symmetric around 0), which states that is bimodal.
References
- Ramachandran, G.N.; Sasisekharan, V. Conformation of polypeptides and proteins. Adv. Protein Chem. 1968, 23, 283–437.
- Holley, L.H.; Karplus, M. Protein secondary structure prediction with a neural network. Proc. Natl. Acad. Sci. USA 1989, 86, 152–156.
- Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092.
- Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109.
- Irbäck, A.; Mohanty, S. PROFASI: A Monte Carlo simulation package for protein folding and aggregation. J. Comput. Chem. 2006, 27, 1548–1555.
- Jones, D.T. Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. Proteins Struct. Funct. Bioinform. 1997, 29, 185–191.
- Jones, T.A.; Thirup, S. Using known substructures in protein model building and crystallography. EMBO J. 1986, 5, 819–822.
- Simons, K.T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 1997, 268, 209–225.
- Ley, C.; Verdebout, T. Applied Directional Statistics: Modern Methods and Case Studies; CRC Press: Boca Raton, FL, USA, 2018.
- Ley, C.; Verdebout, T. Modern Directional Statistics; CRC Press: Boca Raton, FL, USA, 2017.
- Mardia, K.V. Statistics of directional data. J. R. Stat. Soc. Ser. B (Methodol.) 1975, 37, 349–371.
- Rivest, L.P. A distribution for dependent unit vectors. Commun. Stat.-Theory Methods 1988, 17, 461–483.
- Singh, H.; Hnizdo, V.; Demchuk, E. Probabilistic model for two dependent circular variables. Biometrika 2002, 89, 719–723.
- Mardia, K.V.; Taylor, C.C.; Subramaniam, G.K. Protein bioinformatics and mixtures of bivariate von-Mises distributions for angular data. Biometrics 2007, 63, 505–512.
- Kent, J.T.; Mardia, K.V.; Taylor, C.C. Modelling strategies for bivariate circular data. In Proceedings of the Leeds Annual Statistical Research Conference; The Art and Science of Statistical Bioinformatics, Leeds University Press: Leeds, UK, 2008; pp. 70–73.
- Mardia, K.V.; Frellsen, J. Statistics of bivariate von Mises distributions. In Bayesian Methods in Structural Bioinformatics; Springer: Berlin/Heidelberg, Germany, 2012; pp. 159–178.
- Mardia, K.V.; Hughes, G.; Taylor, C.C.; Singh, H. A multivariate von Mises distribution with applications to bioinformatics. Can. J. Stat. 2008, 36, 99–109.
- Wehrly, T.E.; Johnson, R.A. Bivariate models for dependence of angular observations and a related Markov process. Biometrika 1980, 67, 255–256.
- Jones, M.C.; Pewsey, A.; Kato, S. On a class of circulas: Copulas for circular distributions. Ann. Inst. Stat. Math. 2015, 67, 843–862.
- Fernández-Durán, J.J. Models for circular–linear and circular–circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics 2007, 63, 579–585.
- García-Portugués, E.; Crujeiras, R.M.; González-Manteiga, W. Exploring wind direction and SO2 concentration by circular–linear density estimation. Stoch. Environ. Res. Risk Assess. 2013, 27, 1055–1067.
- Pewsey, A.; García-Portugués, E. Recent advances in directional statistics. TEST 2021, 30, 1–58.
- Di Marzio, M.; Panzera, A.; Taylor, C.C. Kernel density estimation on the torus. J. Stat. Plan. Inference 2011, 141, 2156–2173.
- Ameijeiras-Alonso, J.; Ley, C. Sine-skewed toroidal distributions and their application in protein bioinformatics. Biostatistics 2020. Available online: https://doi.org/10.1093/biostatistics/kxaa039 (accessed on 20 January 2021).
- Kato, S.; Shimizu, K.; Shieh, G.S. A circular–circular regression model. Stat. Sin. 2008, 18, 633–645.
- Shieh, G.S.; Johnson, R.A. Inferences based on a bivariate distribution with von-Mises marginals. Ann. Inst. Stat. Math. 2005, 57, 789–802.
- Shieh, G.S.; Zheng, S.; Johnson, R.A.; Chang, Y.F.; Shimizu, K.; Wang, C.C.; Tang, S.L. Modeling and comparing the organization of circular genomes. Bioinformatics 2011, 27, 912–918.
- Liu, D.; Peddada, S.D.; Li, L.; Weinberg, C.R. Phase analysis of circadian-related genes in two tissues. BMC Bioinform. 2006, 7, 87.
- Downs, T.D.; Mardia, K.V. Circular regression. Biometrika 2002, 89, 683–697.
- Jones, M.C. The Möbius distribution on the disc. Ann. Inst. Stat. Math. 2004, 56, 733–742.
- Kato, S.; Jones, M.C. A family of distributions on the circle with links to, and applications arising from, Möbius transformation. J. Am. Stat. Assoc. 2010, 105, 249–262.
- Wang, M.Z.; Shimizu, K. On applying Möbius transformation to cardioid random variables. Stat. Methodol. 2012, 9, 604–614.
- Kato, S.; Pewsey, A. A Möbius transformation-induced distribution on the torus. Biometrika 2015, 102, 359–370.
- Kato, S. A distribution for a pair of unit vectors generated by Brownian motion. Bernoulli 2009, 15, 898–921.
- Kato, S.; McCullagh, P. Some properties of a Cauchy family on the sphere derived from the Möbius transformation. Bernoulli 2020, 26, 3224–3248.
- McCullagh, P. Möbius transformation and Cauchy parameter estimation. Ann. Stat. 1996, 24, 787–808.
- Abe, T.; Pewsey, A. Sine-skewed circular distributions. Stat. Pap. 2011, 52, 683–707.
- Mullen, K.; Ardia, D.; Gil, D.L.; Windover, D.; Cline, J. DEoptim: An R package for global optimization by differential evolution. J. Stat. Softw. 2011, 40, 1–26.
- Storn, R.; Price, K. Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
- Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer: Berlin/Heidelberg, Germany, 2006.
- Najibi, S.M.; Maadooliat, M.; Zhou, L.; Huang, J.Z.; Gao, X. Protein structure classification and loop modeling using multiple Ramachandran distributions. Comput. Struct. Biotechnol. J. 2017, 15, 243–254.
- Moghimbeygi, M.; Golalizadeh, M. Spherical logistic distribution. Commun. Math. Stat. 2020, 8, 151–166.
- Martin, A.D.; Quinn, K.M.; Park, J.H.; Park, M.J.H. MCMCpack: Markov Chain Monte Carlo (MCMC) Package; Version 1.5-0; R Package: Vienna, Austria, 2020; Available online: https://cran.r-project.org/web/packages/MCMCpack/index.html (accessed on 25 August 2020).
- Li, L. gibbs.met: Naive Gibbs Sampling with Metropolis Steps; Version 1.1-3; R Package: Vienna, Austria, 2015; Available online: https://cran.r-project.org/web/packages/gibbs.met/index.html (accessed on 25 August 2020).
- Albert, J. LearnBayes: Functions for Learning Bayesian Inference; Version 2.15.1; R Package: Vienna, Austria, 2018; Available online: https://cran.r-project.org/web/packages/LearnBayes/index.html (accessed on 25 August 2020).
- Chivers, C.; Chivers, M.C. MHadaptive: General Markov Chain Monte Carlo for Bayesian Inference Using Adaptive Metropolis-Hastings Sampling; Version 1.1-8; R Package: Vienna, Austria, 2015; Available online: https://cran.r-project.org/web/packages/MHadaptive/index.html (accessed on 25 August 2020).
- Fernández-i-Marín, X. ggmcmc: Analysis of MCMC samples and Bayesian inference. J. Stat. Softw. 2016, 70, 1–20.
- Mersmann, O. Microbenchmark: Accurate Timing Functions; Version 1.4-7; R Package: Vienna, Austria, 2019; Available online: https://www.rdocumentation.org/packages/microbenchmark/versions/1.4-7/topics/microbenchmark (accessed on 25 August 2020).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).