Article

An Efficient Bayesian Method for Estimating the Degree of the Skewness of X Chromosome Inactivation Based on the Mixture of General Pedigrees and Unrelated Females

1 Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
2 Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Biomolecules 2023, 13(3), 543; https://doi.org/10.3390/biom13030543
Submission received: 4 February 2023 / Revised: 13 March 2023 / Accepted: 14 March 2023 / Published: 16 March 2023
(This article belongs to the Special Issue Systems Biology and Omics Approaches for Complex Human Disease)

Abstract

Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases. Several methods have been proposed to estimate the degree of XCI-S (denoted as $\gamma$) for quantitative and qualitative traits based on unrelated females. However, no method is available for estimating $\gamma$ based on general pedigrees. Therefore, in this paper, we propose a Bayesian method to obtain the point estimate and the credible interval of $\gamma$ based on the mixture of general pedigrees and unrelated females (called mixed data for brevity), which is also suitable for general pedigrees alone. We consider a truncated normal prior and a uniform prior for $\gamma$. Further, we apply the eigenvalue decomposition and the Cholesky decomposition to our proposed methods to accelerate the computation. We conduct extensive simulation studies to compare the performance of our proposed methods with that of two existing Bayesian methods that are only applicable to unrelated females. The simulation results show that incorporating general pedigrees can improve the efficiency of the point estimation and the precision and accuracy of the interval estimation of $\gamma$. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data to illustrate their practical use.

1. Introduction

X chromosome inactivation (XCI) is an important epigenetic phenomenon, which was described by Lyon [1] for the first time. In mammals, females have two X chromosomes, whereas males have only one X chromosome. During the early development of embryos in females, one of the two X chromosomes becomes a Barr body and remains inactivated in subsequent somatic cells to ensure the balance of transcriptional dosages on the X chromosome between females and males [2]. In general, the process of XCI is random. Specifically, in females, approximately 50% of the cells have the paternal allele at an X-chromosomal locus inactivated, and the remaining approximately 50% of the cells keep the maternal allele inactivated, which is called random XCI (XCI-R) [3]. However, there are still two other patterns of XCI: the escape from XCI (XCI-E) and the skewed XCI (XCI-S) [3]. XCI-E means that a female has a region of the X chromosome without inactivation, i.e., the alleles on both X chromosomes are kept active. In humans, approximately 15–30% of X-linked loci have been reported to undergo XCI-E [4,5]. As for XCI-S, more than 75% of the cells inactivate the same allele at an X-chromosomal locus in females [6,7,8]. In some extremely skewed cases, it is possible that more than 90% of the cells have the same allele being inactivated [9].
Some X-linked diseases have been reported to be associated with XCI-S, such as esophageal carcinoma, recurrent spontaneous abortion, and Klinefelter’s syndrome [10,11,12,13]. The degree of XCI-S can affect the severity of X-linked diseases in heterozygous females [14]. A larger proportion of cells with the deleterious allele activated in heterozygous females will cause more severe expression of the related diseases, whereas a smaller proportion can protect the females from the diseases [6,7]. For example, XCI-S towards mutant alleles of the F9 gene may cause moderately severe haemophilia B, whereas XCI-S against the same mutant alleles may cause mild haemophilia B in heterozygous females [15]. Thus, incorporating the XCI-S information into association analysis may improve the power of X-chromosomal association tests [16]. In fact, some methods that consider the XCI patterns have been proposed to test for the association between X-chromosomal single nucleotide polymorphisms (SNPs) and traits [17,18,19,20,21,22,23,24,25,26]. For unrelated data, Wang et al. [24] proposed a permutation-based maximum likelihood ratio association test for qualitative traits, which takes account of all the XCI patterns. More specifically, for XCI, the three female genotypes ($dd$, $Dd$, and $DD$) are encoded as 0, $\gamma$, and 2, respectively, and the two male genotypes ($d$ and $D$) are encoded as 0 and 2, respectively, where $d$ and $D$ are the normal and deleterious alleles, respectively, and $\gamma \in [0, 2]$ is the unknown genotypic value used to measure the degree of XCI-S. For XCI-E, the three female genotypes are encoded as 0, 1, and 2, and the two male genotypes are encoded as 0 and 1. For pedigree data, Ding et al. [21] put forward a Monte Carlo pedigree disequilibrium test for X-linked qualitative traits, and Zhang et al. [25] constructed an orthogonal model and used the kinship matrix to represent the correlation between the individuals in pedigrees for X-linked quantitative traits. Both methods take XCI-R or XCI-E into account; however, they are not suitable for XCI-S. Furthermore, the method of Ding et al. [21] cannot directly incorporate covariates, and the method of Zhang et al. [25] is time-consuming. On the other hand, there is an autosomal association test, named GEMMA, which can incorporate covariates and is computationally efficient for pedigree data [27]. Moreover, GEMMA can be easily extended to accommodate the XCI-R and XCI-E patterns.
Recently, some methods to measure the degree of XCI-S have become available. Based on family trios (parents and their affected daughter), Xu et al. [28] proposed a statistical index to measure $\gamma$ for qualitative traits. Wang et al. [29] and Li et al. [30] used unrelated females to estimate $\gamma$ and derive the corresponding confidence interval (CI) for qualitative and quantitative traits, respectively. In Wang et al. [29] and Li et al. [30], $\gamma$ was expressed as the ratio of two regression coefficients, and the CI was obtained using Fieller’s method. However, these methods may yield unbounded CIs when the denominator of the ratio is close to zero. It should be noted that Wang et al. [31] put forward a penalized Fieller’s method, which can obtain a bounded CI of a ratio by penalizing the denominator of the ratio away from zero. Therefore, Yu et al. [32] applied the penalized Fieller’s method to the estimation of the degree of XCI-S for unrelated females. However, the penalized Fieller’s method does not consider the constraint $\gamma \in [0, 2]$ and simply uses the interval [0, 2] to truncate the point estimate and the CI of $\gamma$, which may result in extreme point estimates (0 or 2), empty sets, non-informative intervals (i.e., [0, 2]), and discontinuous intervals. Therefore, Yu et al. [32] took the constraint $\gamma \in [0, 2]$ as the prior and further proposed a Bayesian method for estimating the degree of XCI-S based on unrelated females. The Bayesian method avoids extreme point estimates, empty sets, non-informative intervals, and discontinuous intervals. However, the above-mentioned methods are all based on family trios or unrelated females and cannot accommodate general pedigrees. It should be noted that general pedigrees are increasingly popular because pedigree designs are naturally equipped to control for population stratification [33,34]. Therefore, a method is needed for estimating the degree of XCI-S based on general pedigrees or the mixture of general pedigrees and unrelated females.
In this paper, we propose a Bayesian method to estimate the degree of XCI-S based on the mixture of general pedigrees and unrelated females for both quantitative and qualitative traits, which is also suitable for general pedigrees alone. We use the kinship matrix to represent the correlation between females in general pedigrees and construct the generalized linear mixed model. The prior of $\gamma$ is set to be either a truncated normal distribution or a uniform distribution. Samples from the posterior distribution of $\gamma$ are drawn using a Hamiltonian Monte Carlo (HMC) sampling algorithm. We regard the mode of the sample from the posterior distribution as the point estimate of $\gamma$, and consider the corresponding highest posterior density interval (HPDI) as the credible interval of $\gamma$ [35]. Because the posterior sampling process of the generalized linear mixed model is very computationally intensive [36], we additionally employ the eigenvalue decomposition (EVD) and the Cholesky decomposition to accelerate the computation. Further, we conduct extensive simulation studies to compare the performance of our proposed methods with that of the existing Bayesian methods. Finally, we apply our proposed methods to the Minnesota Center for Twin and Family Research (MCTFR) data to illustrate their practical use.

2. Materials and Methods

2.1. Notations

Consider an X-chromosomal locus with alleles $d$ and $D$ being the normal allele and the deleterious allele, respectively. Suppose that we have collected the X-linked traits (quantitative or qualitative) and the genotypes at the locus for $N_p$ pedigrees (including $n_p$ individuals, males or females) and for an additional $n_{If}$ independent/unrelated females. Note that the individuals in the same pedigree are genetically correlated. Since XCI only exists in females, we only select the $n_{pf}$ females in these pedigrees and the additional $n_{If}$ unrelated females to build the model, and we define $n_f = n_{pf} + n_{If}$. Let $Y_i$ be the trait of the $i$th female and $G_i \in \{dd, Dd, DD\}$ indicate the genotype of the $i$th female ($i = 1, 2, \ldots, n_{pf}, n_{pf}+1, \ldots, n_f$). Then, $Y = (Y_1, Y_2, \ldots, Y_{n_{pf}}, Y_{n_{pf}+1}, \ldots, Y_{n_f})^T$ is the trait vector of all the females, and $G = (G_1, G_2, \ldots, G_{n_{pf}}, G_{n_{pf}+1}, \ldots, G_{n_f})^T$ is the genotype vector of all the females. According to Wang et al. [24], we encode the genotypes $G_i \in \{dd, Dd, DD\}$ as the genotypic values $X_i \in \{0, \gamma, 2\}$, where $\gamma \in [0, 2]$ represents the degree of XCI-S. As such, the genotypic value vector of all the females can be expressed as $X = (X_1, X_2, \ldots, X_{n_{pf}}, X_{n_{pf}+1}, \ldots, X_{n_f})^T$. To account for the correlations among the $n_{pf}$ females selected from the $N_p$ pedigrees, we use the kinship matrix. To be specific, we first use both the males and the females in the pedigrees to construct an $n_p \times n_p$ kinship matrix $\psi$, which can be obtained using the algorithm of Lange [37] through the “kinship2” package of R software [38]. Then, we select the rows and columns of $\psi$ corresponding to the $n_{pf}$ females and obtain the $n_{pf} \times n_{pf}$ matrix $\psi_f$ of these females. As for the $n_{If}$ unrelated females, the genetic relatedness matrix can be expressed as the $n_{If} \times n_{If}$ identity matrix $I_{n_{If} \times n_{If}}$. Finally, the genetic relatedness matrix $\varphi$ of $Y$ can be denoted as the following block matrix:
$$\varphi = \begin{pmatrix} 2\psi_f & 0 \\ 0 & I_{n_{If} \times n_{If}} \end{pmatrix}$$
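As a minimal sketch of how this block matrix could be assembled, the following Python snippet builds $\varphi$ from a given $\psi_f$ and a number of unrelated females. The function name `relatedness_matrix` and the kinship values are hypothetical and purely illustrative; in the paper, $\psi$ comes from the “kinship2” package in R.

```python
def relatedness_matrix(psi_f, n_unrelated):
    """Return phi = blockdiag(2 * psi_f, I) as a list of rows."""
    n_ped = len(psi_f)
    n = n_ped + n_unrelated
    phi = [[0.0] * n for _ in range(n)]
    # Upper-left block: twice the kinship matrix of the pedigree females.
    for i in range(n_ped):
        for j in range(n_ped):
            phi[i][j] = 2.0 * psi_f[i][j]
    # Lower-right block: identity matrix for the unrelated females.
    for k in range(n_ped, n):
        phi[k][k] = 1.0
    return phi


# Hypothetical kinship values for two related females (illustrative only).
psi_f = [[0.5, 0.25],
         [0.25, 0.5]]
phi = relatedness_matrix(psi_f, n_unrelated=2)
for row in phi:
    print(row)
```

The off-diagonal blocks stay at zero because the unrelated females are assumed to be uncorrelated with everyone else.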
We build the generalized linear mixed model to describe the association between $G_i$ and $Y_i$:
$$h(\mu_i) = \beta X_i + a^T Z_i + b_i \tag{1}$$
where $\beta$ is the regression coefficient of $X_i$; $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{im})^T$ is the vector of $m$ covariates of the $i$th female, including 1 as the first element, and $Z = (Z_1, Z_2, \ldots, Z_{n_{pf}}, Z_{n_{pf}+1}, \ldots, Z_{n_f})^T$ is an $n_f \times m$ covariate matrix; $a = (a_1, a_2, \ldots, a_m)^T$ is the $m \times 1$ vector of the regression coefficients of $Z_i$, with $a_1$ being the intercept; $b_i$ is a random effect, and the random vector $b = (b_1, b_2, \ldots, b_{n_{pf}}, b_{n_{pf}+1}, \ldots, b_{n_f})^T$ follows a multivariate normal distribution, i.e., $b \sim MVN(0, \sigma_g^2 \varphi)$, where $\sigma_g^2$ is the variance of the polygenic effects; $h(\cdot)$ is the link function; and $\mu_i = E(Y_i \mid X_i, Z_i)$ is the conditional mean of $Y_i$ given $X_i$ and $Z_i$.
To estimate $\gamma$, we decompose $X_i$ in Equation (1) into $X_i = \gamma X_{1i} + (2 - \gamma) X_{2i}$ according to Wang et al. [29], where $X_{1i}$ and $X_{2i}$ are two indicator variables. $X_{1i} = I\{G_i = Dd \text{ or } DD\}$ indicates whether the $i$th female carries at least one deleterious allele $D$, and $X_{2i} = I\{G_i = DD\}$ denotes whether the $i$th female has two deleterious alleles. Then, we can rewrite Equation (1) as follows:
$$h(\mu_i) = \beta\gamma X_{1i} + \beta(2 - \gamma) X_{2i} + a^T Z_i + b_i \tag{2}$$
For quantitative traits, $h(\cdot)$ is the identity function and $Y_i$ has the residual error $\varepsilon_i$, so Equation (2) becomes a linear mixed model
$$Y_i = \beta\gamma X_{1i} + \beta(2 - \gamma) X_{2i} + a^T Z_i + b_i + \varepsilon_i \tag{3}$$
where $\varepsilon_i \sim N(0, \sigma_e^2)$ and $\sigma_e^2$ is the variance of $\varepsilon_i$. For qualitative traits, $h(\cdot)$ is the logit function, and Equation (2) can be written as
$$\mathrm{logit}(\mu_i) = \beta\gamma X_{1i} + \beta(2 - \gamma) X_{2i} + a^T Z_i + b_i \tag{4}$$
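The decomposition of the genotypic value into two indicator variables can be checked with a short Python sketch. The helper name `genotypic_value` and the value of `gamma` are illustrative assumptions, not part of the original method description.

```python
def genotypic_value(genotype, gamma):
    """Recover X_i = gamma * X_1i + (2 - gamma) * X_2i from a female genotype."""
    x1 = 1 if genotype in ("Dd", "DD") else 0  # at least one deleterious allele
    x2 = 1 if genotype == "DD" else 0          # two deleterious alleles
    return gamma * x1 + (2 - gamma) * x2


gamma = 1.4  # illustrative degree of XCI-S, must lie in [0, 2]
values = {g: genotypic_value(g, gamma) for g in ("dd", "Dd", "DD")}
print(values)
```

For any $\gamma$ in $[0, 2]$ the three genotypes map to 0, $\gamma$, and 2, so the indicator form carries the same information as the original coding while exposing $\gamma$ as an explicit parameter.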

2.2. Building Bayesian Models

For quantitative traits, $Y = (Y_1, Y_2, \ldots, Y_{n_{pf}}, Y_{n_{pf}+1}, \ldots, Y_{n_f})^T$ follows a multivariate normal distribution according to Equation (3), i.e.,
$$Y \sim MVN\left(\beta\gamma X_1 + \beta(2 - \gamma) X_2 + Za,\ \sigma_g^2 \varphi + \sigma_e^2 I_{n_f \times n_f}\right) \tag{5}$$
where $X_1 = (X_{11}, X_{12}, \ldots, X_{1n_{pf}}, X_{1(n_{pf}+1)}, \ldots, X_{1n_f})^T$ and $X_2 = (X_{21}, X_{22}, \ldots, X_{2n_{pf}}, X_{2(n_{pf}+1)}, \ldots, X_{2n_f})^T$. The unknown parameters are $\theta_1 = (\beta, \gamma, a^T, \sigma_g, \sigma_e)^T$, and we let $L_1$ be the likelihood function of $Y$ based on expression (5). The posterior distribution of $\theta_1$ can then be expressed as $f(\theta_1 \mid X_1, X_2, Z, \varphi) = \dfrac{f(\theta_1) L_1}{\int f(\theta_1) L_1 \, d\theta_1}$, where $f(\theta_1)$ is the joint prior of $\theta_1$.
As for qualitative traits, $Y_i$ follows a Bernoulli distribution based on Equation (4), i.e.,
$$Y_i \sim B(p_i) \tag{6}$$
where $p_i = \mathrm{logit}^{-1}(\eta_i)$ and
$$\eta_i = \beta\gamma X_{1i} + \beta(2 - \gamma) X_{2i} + a^T Z_i + b_i$$
The unknown parameters are $\theta_2 = (\beta, \gamma, a^T, \sigma_g)^T$, and we let $L_2$ be the likelihood function of $Y$ based on expression (6). The posterior distribution of $\theta_2$ can be expressed as $f(\theta_2 \mid X_1, X_2, Z, \varphi) = \dfrac{f(\theta_2) L_2}{\int f(\theta_2) L_2 \, d\theta_2}$, where $f(\theta_2)$ is the joint prior of $\theta_2$.
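To make the qualitative-trait likelihood concrete, here is a small Python sketch of the linear predictor and the Bernoulli log-likelihood conditional on the random effects. All function names and numeric values are hypothetical; the sketch evaluates the formula for fixed parameter values and does not perform any posterior sampling.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(beta, gamma, x1, x2, a, z, b):
    """eta_i = beta*gamma*X_1i + beta*(2 - gamma)*X_2i + a^T Z_i + b_i."""
    covariate_part = sum(aj * zj for aj, zj in zip(a, z))
    return beta * gamma * x1 + beta * (2.0 - gamma) * x2 + covariate_part + b


def bernoulli_log_lik(y, etas):
    """Sum of log p_i for affected and log(1 - p_i) for unaffected females."""
    total = 0.0
    for yi, eta in zip(y, etas):
        p = inv_logit(eta)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Illustrative values: one female per genotype, intercept-only covariates.
beta, gamma, a = 0.2, 1.4, [0.5]
females = [(0, 0), (1, 0), (1, 1)]  # (X_1i, X_2i) for dd, Dd, DD
etas = [linear_predictor(beta, gamma, x1, x2, a, [1.0], 0.0) for x1, x2 in females]
print(etas, bernoulli_log_lik([0, 1, 1], etas))
```

Note that the genotype contribution for $DD$ is always $2\beta$, whatever the value of $\gamma$, so information about $\gamma$ comes from the heterozygous females.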

2.3. Eigenvalue Decomposition and Cholesky Decomposition for Accelerating Computation Speed

It should be noted that, because the matrix $\varphi$ is high-dimensional, the Bayesian posterior sampling processes of $f(\theta_1 \mid X_1, X_2, Z, \varphi)$ and $f(\theta_2 \mid X_1, X_2, Z, \varphi)$ would be computationally intensive, especially when $n_f$ is large [36,39]. So, following Runcie and Crawford [40] and Zhao et al. [36], we use the EVD and the Cholesky decomposition to accelerate the sampling process for quantitative and qualitative traits, respectively. The transformed posterior distributions of $\theta_1$ and $\theta_2$ are denoted by $f^*(\theta_1 \mid X_1^*, X_2^*, Z^*, \Sigma)$ and $f^*(\theta_2 \mid X_1, X_2, Z, C, h)$, respectively, where $X_1^* = QX_1$, $X_2^* = QX_2$, and $Z^* = QZ$ are the transformed $X_1$, $X_2$, and $Z$ based on $\varphi = Q^T \Sigma Q$ from the EVD; $C$ is a lower triangular matrix satisfying $\varphi = CC^T$ from the Cholesky decomposition; and $h$ follows $MVN(0, I_{n_f \times n_f})$ and satisfies $\sigma_g C h \sim MVN(0, \sigma_g^2 \varphi)$. Details are given in Supplementary Appendices SA and SB. Table 1 shows that using the EVD and the Cholesky decomposition in the posterior sampling process can greatly reduce the running time (see Section 3 for details).
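The following lines sketch why the two transformations simplify the computation. They are not a reproduction of Supplementary Appendices SA and SB; they assume that $Q$ is orthogonal ($QQ^T = I$), that $\Sigma$ is the diagonal matrix of eigenvalues of $\varphi$, and they write $Y^* = QY$ as a shorthand introduced here for illustration.

```latex
\begin{aligned}
Y^* = QY &\sim MVN\!\left(\beta\gamma X_1^* + \beta(2-\gamma) X_2^* + Z^* a,\;
          Q\left(\sigma_g^2 \varphi + \sigma_e^2 I\right) Q^T\right), \\
Q\left(\sigma_g^2 \varphi + \sigma_e^2 I\right) Q^T
         &= \sigma_g^2 \, Q \varphi Q^T + \sigma_e^2 \, Q Q^T
          = \sigma_g^2 \Sigma + \sigma_e^2 I , \\
\operatorname{Cov}(\sigma_g C h)
         &= \sigma_g^2 \, C \operatorname{Cov}(h) \, C^T
          = \sigma_g^2 \, C C^T
          = \sigma_g^2 \varphi .
\end{aligned}
```

Under these assumptions the rotated covariance $\sigma_g^2 \Sigma + \sigma_e^2 I$ is diagonal, so the likelihood of $Y^*$ factorizes into $n_f$ univariate normal terms and no dense matrix needs to be inverted at each sampling step. In the same spirit, writing $b = \sigma_g C h$ lets the sampler work with independent standard normal variables $h$ instead of correlated random effects.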

2.4. HMC Algorithm and Priors

Note that it is difficult to derive closed forms of the posterior distributions $f^*(\theta_1 \mid X_1^*, X_2^*, Z^*, \Sigma)$ and $f^*(\theta_2 \mid X_1, X_2, Z, C, h)$, so we use the HMC algorithm [35] to sample the parameters from the approximate posterior distributions, which can be efficiently implemented through the “cmdstanr” package in R software. We choose the HMC algorithm because it can improve the independence of the samples and is more efficient than other Markov chain Monte Carlo methods [35].
According to Yu et al. [32], we set the priors of $\theta_1$ and $\theta_2$ as follows. For the nuisance parameters $\beta$ and $a$, we select non-informative priors to reduce their influence on the posterior distributions. Specifically, we assume that $\beta \sim N(0, 10^2)$ and $a \sim MVN(0, \mathrm{diag}(10^2, 10^2, \ldots, 10^2))$ [41], so that $\beta$ and $a$ take positive and negative values with equal probabilities. For the standard deviation $\sigma_g$ of the polygenic effects, we choose the exponential distribution with mean 1, i.e., $\sigma_g \sim \exp(1)$ [35]. For $\theta_1$ based on quantitative traits, there is an extra parameter $\sigma_e$, and we also suppose that $\sigma_e \sim \exp(1)$. For the parameter of interest $\gamma$, considering the constraint $\gamma \in [0, 2]$, we set two priors. One is the uniform distribution from 0 to 2, i.e., $\gamma \sim U(0, 2)$, which is a non-informative prior. The other assumes that more skewed values of $\gamma$ have lower probability and that the density of $\gamma$ is highest at 1, which is consistent with the genetic background [3]. In this way, we let $\gamma$ follow a truncated normal distribution with both parameters (location and scale) fixed at 1 and with values ranging from 0 to 2. The probability density function of this prior of $\gamma$ is
$$f(\gamma) = \begin{cases} \dfrac{\phi(\gamma - 1)}{\Phi(1) - \Phi(-1)}, & 0 \le \gamma \le 2 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $\phi$ is the probability density function of the standard normal distribution and $\Phi$ is its cumulative distribution function. We assume that the unknown parameters are a priori independent of each other because the HMC algorithm does not dramatically suffer from correlated parameters in the model [35]. Therefore, the prior distributions $f(\theta_1)$ and $f(\theta_2)$ can be calculated as the product of the priors of all the parameters. Moreover, $f(\theta_1)$ and $f(\theta_2)$ can also be flexibly set according to the practical background.
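The two priors of $\gamma$ can be written down directly in Python using only the standard library. This is a sketch for checking the densities numerically; the function names are hypothetical, and the midpoint-rule integration is only a sanity check that the truncated normal density integrates to one over $[0, 2]$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def truncated_normal_prior(gamma):
    """Density of a normal(1, 1) distribution truncated to [0, 2]."""
    if gamma < 0.0 or gamma > 2.0:
        return 0.0
    normaliser = STD_NORMAL.cdf(1.0) - STD_NORMAL.cdf(-1.0)
    return STD_NORMAL.pdf(gamma - 1.0) / normaliser


def uniform_prior(gamma):
    """Density of U(0, 2)."""
    return 0.5 if 0.0 <= gamma <= 2.0 else 0.0


# Midpoint-rule check that the truncated density integrates to about 1.
steps = 20000
width = 2.0 / steps
area = sum(truncated_normal_prior((k + 0.5) * width) for k in range(steps)) * width
print(round(area, 6), truncated_normal_prior(1.0), truncated_normal_prior(0.2))
```

The truncated normal prior puts its highest density at $\gamma = 1$ (random XCI) and less density on strongly skewed values, whereas the uniform prior treats all values in $[0, 2]$ alike.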
After we obtain the posterior samples of $\theta_1$ and $\theta_2$ through the HMC algorithm, we take the mode of the samples as the point estimate of $\gamma$ and the HPDI of the samples as the credible interval of $\gamma$. We denote the Bayesian methods with the truncated normal prior and the uniform prior for the mixture of general pedigrees and additional unrelated females as BNM and BUM, and the corresponding point estimates yielded by these two methods as $\hat{\gamma}_{BNM}$ and $\hat{\gamma}_{BUM}$, respectively.
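One possible way to turn posterior draws into a point estimate and an interval is sketched below in Python. The source does not state how the mode is computed, so the histogram-based mode here is an assumption, as are the function names and the simulated draws standing in for real HMC output. The HPDI is computed as the shortest interval containing 95% of the sorted draws, which is appropriate for a unimodal posterior.

```python
import random


def hpdi(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))  # number of draws inside the interval
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def histogram_mode(samples, lower=0.0, upper=2.0, bins=40):
    """Centre of the fullest histogram bin (an assumed mode estimator)."""
    counts = [0] * bins
    width = (upper - lower) / bins
    for x in samples:
        idx = min(int((x - lower) / width), bins - 1)
        counts[idx] += 1
    top = max(range(bins), key=counts.__getitem__)
    return lower + (top + 0.5) * width


# Stand-in for 8000 posterior draws of gamma (not real HMC output).
rng = random.Random(1)
draws = []
while len(draws) < 8000:
    x = rng.gauss(1.3, 0.2)
    if 0.0 <= x <= 2.0:
        draws.append(x)

mode = histogram_mode(draws)
low, high = hpdi(draws)
print(mode, low, high)
```

Because the draws are confined to $[0, 2]$, both the mode and the interval respect the constraint on $\gamma$ automatically, which is the property that motivates the Bayesian approach over truncating a Fieller-type interval.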

2.5. Situations When Considering General Pedigrees and Unrelated Females, Respectively

Notice that our proposed methods are also applicable to the situation with only general pedigrees and to that with only unrelated females. For the situation with only general pedigrees, the genetic relatedness matrix $\varphi$ reduces to twice the kinship matrix of all the $n_{pf}$ females from the pedigrees, i.e., $2\psi_f$. We denote the Bayesian methods with the truncated normal prior and the uniform prior for general pedigrees as BNP and BUP, and the corresponding point estimates as $\hat{\gamma}_{BNP}$ and $\hat{\gamma}_{BUP}$, respectively. For the situation with only unrelated females, our proposed methods still work, with the genetic relatedness matrix $\varphi$ reduced to the identity matrix $I_{n_{If} \times n_{If}}$. However, compared with the existing BN and BU methods, which take the prior of $\gamma$ to be the truncated normal distribution and the uniform distribution, respectively [32], our proposed methods additionally require estimating the random effects $b_i$, which may reduce the estimation accuracy and be time-consuming. Therefore, in practice, for unrelated females, we recommend using the existing BN and BU methods. Furthermore, as in Yu et al. [32], the point estimates of $\gamma$ based on the BN and BU methods are denoted by $\hat{\gamma}_{BN}$ and $\hat{\gamma}_{BU}$, respectively.

2.6. Situation When There Are Missing Genotypes for Some Individuals from General Pedigrees

It should be noted that our proposed methods are also suitable for the situation where the genotypes of some individuals from some pedigrees are missing: we simply exclude the individuals with missing genotypes and delete the corresponding rows and columns from the genetic relatedness matrix $\varphi$.
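The row-and-column deletion can be sketched in a few lines of Python. The function name `drop_missing`, the use of `None` to mark a missing genotype, and the small matrix are illustrative assumptions; the same index subset would also be applied to the trait and covariate data.

```python
def drop_missing(phi, genotypes):
    """Remove females with missing genotypes and the matching rows/columns of phi."""
    keep = [i for i, g in enumerate(genotypes) if g is not None]
    phi_kept = [[phi[i][j] for j in keep] for i in keep]
    genotypes_kept = [genotypes[i] for i in keep]
    return phi_kept, genotypes_kept


# Illustrative 3 x 3 relatedness matrix; the second female has no genotype.
phi = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
genotypes = ["Dd", None, "DD"]
phi_kept, genotypes_kept = drop_missing(phi, genotypes)
print(phi_kept, genotypes_kept)
```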

2.7. Simulation Settings

To evaluate the performance of our proposed methods (BNM and BUM for the mixture of general pedigrees and additional unrelated females, and BNP and BUP for only general pedigrees) and compare them with the existing methods (BN and BU for only unrelated females) [32] when estimating the degree of XCI-S, we conduct the following extensive simulation studies. When simulating general pedigrees, we consider three pedigree structures: (1) the nuclear family with 4 people, (2) the three-generation family with 10 people, and (3) the four-generation family with 12 people, as shown in Figure 1. We fix the sex ratio at 1:1 in our simulation study. A total of 50 pedigrees under each pedigree structure are simulated, which leads to $N_p$ being 150, $n_p$ being 1300, and $n_{pf}$ being approximately 650. For a larger sample size, we simulate 200 pedigrees under each pedigree structure, and the corresponding $N_p$ is 600, $n_p$ is 5200, and $n_{pf}$ is approximately 2600. Because there are two X chromosomes in females and only one in males, we first generate the genotypes $\{dd, Dd, DD\}$ of the female founders using probabilities $\{(1 - p_f)^2, 2p_f(1 - p_f), p_f^2\}$ and the genotypes $\{d, D\}$ of the male founders using probabilities $\{(1 - p_m), p_m\}$, where $p_f$ and $p_m$ are the frequencies of the deleterious allele $D$ in females and males, respectively. We first set $p_f$ to be 0.3 and 0.1 and keep $p_m$ equal to $p_f$. To simulate the situations with $p_f$ and $p_m$ being different, we further set $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$. Then, we simulate the genotypes of the nonfounders according to Mendelian inheritance. We consider a covariate $K$, which is generated from the standard normal distribution. Note that the estimation of the degree of XCI-S only needs the females. As such, let $K_{pi}$ be the value of $K$ for the $i$th female ($i = 1, 2, \ldots, n_{pf}$), and we only simulate the quantitative trait values of the $n_{pf}$ females in the pedigrees, which are generated based on the following multivariate normal distribution:
$$Y_p \sim MVN\left(\beta_0 + \beta X_p + \delta K_p,\ 2\sigma_g^2 \psi_f + \sigma_e^2 I_{n_{pf} \times n_{pf}}\right) \tag{8}$$
where $Y_p$ is the vector of the quantitative trait values of these $n_{pf}$ females; $X_p$ is the vector of their genotypic values, with the elements being 0, $\gamma$, or 2, corresponding to genotypes $\{dd, Dd, DD\}$, respectively, where the value of $\gamma$ represents the degree of XCI-S and is randomly sampled from $U(0, 2)$; $K_p = (K_{p1}, K_{p2}, \ldots, K_{pn_{pf}})^T$; $\psi_f$ is the kinship matrix of the $n_{pf}$ females; and $I_{n_{pf} \times n_{pf}}$ is an $n_{pf} \times n_{pf}$ identity matrix. $\beta_0$ is the intercept and $\delta$ is the regression coefficient of the covariate $K$, which are both fixed at 0.5 [42]. According to Schifano et al. [43], we set $\sigma_g^2 = \{1/3, 1\}$ and $\sigma_e^2 = 1$, which means that the polygenic heritability is $h_p^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2) = \{0.25, 0.50\}$. Furthermore, we set $\beta = 0.2$ so that the heritability due to the causal SNP, $h_c^2 = \beta^2 p_f(1 - p_f) / (\sigma_g^2 + \sigma_e^2)$, remains less than 2% for the chosen values of $p_f$, $\sigma_g^2$, and $\sigma_e^2$ mentioned above. As for a qualitative trait, we generate the corresponding values using the threshold model [44]. Specifically, once the quantitative trait values in Equation (8) are generated, a female is classified as affected if her value is less than the threshold and as unaffected otherwise. Here, we fix the prevalence of the disease under study at 0.3, and the threshold is then taken as the 30% quantile of the distribution of the quantitative trait. In addition, to consider the situation in which the genotypes of some individuals in the pedigrees are missing, the missing rate ($MR$) is set to be 0 and 0.4. $MR = 0$ means that the genotypes of all the individuals in the pedigrees are collected, and $MR = 0.4$ indicates that the genotype of an individual is randomly missing with probability 0.4.
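The genotype-generation and threshold steps can be illustrated with a small Python sketch. It assumes standard X-linked Mendelian inheritance (a daughter receives her father's single X allele and one randomly chosen maternal allele; a son receives one randomly chosen maternal allele), and it uses the empirical 30% quantile of the simulated trait values as the threshold, which is a simplification. All function names are hypothetical, and the polygenic correlation in Equation (8) is not simulated here.

```python
import random


def founder_female(p_f, rng):
    """Two independent alleles give probabilities (1-p)^2, 2p(1-p), p^2."""
    return [("D" if rng.random() < p_f else "d") for _ in range(2)]


def founder_male(p_m, rng):
    """A male carries a single X-chromosomal allele."""
    return "D" if rng.random() < p_m else "d"


def daughter(mother, father, rng):
    """One randomly chosen maternal allele plus the father's only allele."""
    return [rng.choice(mother), father]


def son(mother, rng):
    """One randomly chosen maternal allele."""
    return rng.choice(mother)


def count_D(female):
    """Number of deleterious alleles: 0 (dd), 1 (Dd), or 2 (DD)."""
    return sum(allele == "D" for allele in female)


def affected_by_threshold(trait_values, prevalence=0.3):
    """Affected (1) if the trait value lies below the empirical quantile."""
    ordered = sorted(trait_values)
    cutoff = ordered[int(prevalence * len(ordered))]
    return [1 if y < cutoff else 0 for y in trait_values]


rng = random.Random(2023)
mother = founder_female(0.3, rng)
father = founder_male(0.3, rng)
girls = [daughter(mother, father, rng) for _ in range(1000)]
status = affected_by_threshold([0.1 * k for k in range(10)])
print(mother, father, count_D(girls[0]), status)
```

As a quick arithmetic check of the stated design, $\beta = 0.2$, $p_f = 0.3$, and $\sigma_g^2 + \sigma_e^2 = 4/3$ give $h_c^2 = 0.04 \times 0.21 / (4/3) \approx 0.0063$, which is indeed below 2%.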
When simulating unrelated females, we directly generate their genotypes $\{dd, Dd, DD\}$ using probabilities $\{(1 - p_f)^2, 2p_f(1 - p_f), p_f^2\}$. To compare BNP and BUP for only general pedigrees with BN and BU for only unrelated females, respectively, we set the number of unrelated females ($n_{If}$) to be 650 and 2600, which is almost equal to the number of females in the 150 and 600 pedigrees mentioned above, and we fix the variance of the residual error in the unrelated females at $\sigma_g^2 + \sigma_e^2$ [45], which is the same as the total variance of the quantitative trait value in the females from the general pedigrees. Other parameters and simulation settings are kept the same as those used when simulating general pedigrees. Specifically, the quantitative trait values of the $n_{If}$ unrelated females are generated according to the following multivariate normal distribution:
$$Y_I \sim MVN\left(\beta_0 + \beta X_I + \delta K_I,\ (\sigma_g^2 + \sigma_e^2) I_{n_{If} \times n_{If}}\right) \tag{9}$$
where $Y_I$ is the vector of the quantitative trait values of the $n_{If}$ unrelated females; $X_I$ is the vector of their genotypic values, with the elements being 0, $\gamma$, or 2, corresponding to genotypes $\{dd, Dd, DD\}$; $K_I = (K_{I1}, K_{I2}, \ldots, K_{In_{If}})$ is the covariate vector, where $K_{Ii}$ is the value of the covariate $K$ for the $i$th female ($i = 1, 2, \ldots, n_{If}$); and $I_{n_{If} \times n_{If}}$ is an $n_{If} \times n_{If}$ identity matrix. As for a qualitative trait, just as when simulating the general pedigrees, we generate the corresponding values using the threshold model [44]. By combining the females in the general pedigrees and the additional unrelated females, we obtain the mixed data. We use the BNM and BUM methods, the BNP and BUP methods, and the existing BN and BU methods to obtain the point estimates and the HPDIs of $\gamma$ based on the mixed data, only general pedigrees, and only unrelated females, respectively.
Ma et al. [23] claimed that the variance of the quantitative trait under study for heterozygous females ($Dd$) may be higher than that for homozygous females ($dd$ and $DD$) due to XCI and other factors (e.g., gene-gene interactions and gene mutation), and that the increase can be up to 20%. However, our model does not currently consider heteroscedasticity of this kind because of the potential computational cost in Bayesian inference. To investigate whether our proposed methods are still robust in the presence of heteroscedasticity, we additionally simulate mixed data for quantitative traits with heteroscedasticity. Specifically, we use $\sigma_{e0}^2$, $\sigma_{e1}^2$, and $\sigma_{e2}^2$ to represent the residual variance $\sigma_e^2$ in females with genotypes $dd$, $Dd$, and $DD$, respectively. The simulation settings for the mixed data are the same as those under homoscedasticity, except that we assume $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1.2, 1)$ here. Furthermore, for comparison, we use $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ to represent the case where the variances across different genotypes are the same. We apply the BNM and BUM methods to the mixed data, and apply the BNP and BUP methods to only general pedigrees.
We conduct 500 replicates for each simulation setting. For each replicate, we run 4 chains simultaneously. For each chain, we draw 3000 samples, of which the first 1000 are used for warm-up. Therefore, we finally obtain 8000 samples in each replicate. To ensure convergence, the target acceptance rate is set to 0.9. We assess the convergence of the Markov chains by calculating the convergence diagnostic $\hat{R}$ [46]. Note that the $\hat{R}$ values of our proposed methods are all less than 1.05, which indicates good convergence and also means that drawing 8000 samples is enough. The above posterior sampling process is implemented using the “cmdstanr” package in R software (version 4.1.2, http://r-project.org, accessed on 2 February 2023). To evaluate the accuracy of the point estimates, we calculate their mean squared errors (MSEs). Here, $MSE = \sum_{w=1}^{500} (\hat{\gamma}_w - \gamma_w)^2 / 500$, where $\gamma_w$ is the $w$th true value of $\gamma$, and $\hat{\gamma}_w$ is the estimate of $\gamma_w$ ($w = 1, 2, \ldots, 500$). We also draw scatter plots to visually display the six point estimates ($\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$) against the true values of $\gamma$. To compare the interval estimation performance of the six methods (BNM, BUM, BNP, BUP, BN, and BU), we calculate the coverage probability (CP) as well as the median, the mean, the interquartile range, and the standard deviation of the widths of the 95% HPDIs of $\gamma$ (denoted by $W_{median}$, $W_{mean}$, $W_{iqr}$, and $W_{sd}$, respectively). Moreover, we draw scatter plots of the interval widths of the six methods against the true values of $\gamma$.
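The evaluation criteria above can be computed as in the following Python sketch. The function names and the four toy replicates are hypothetical (the study uses 500 replicates per setting), and the interquartile range relies on the default quantile method of Python's `statistics` module, which may differ slightly from the definition used by the authors.

```python
import statistics


def mse(estimates, truths):
    """Mean squared error of the point estimates over the replicates."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def coverage_probability(intervals, truths):
    """Fraction of replicates whose interval contains the true gamma."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)


def width_summary(intervals):
    """Median, mean, interquartile range, and standard deviation of widths."""
    widths = [hi - lo for lo, hi in intervals]
    q1, _, q3 = statistics.quantiles(widths, n=4)
    return {
        "W_median": statistics.median(widths),
        "W_mean": statistics.fmean(widths),
        "W_iqr": q3 - q1,
        "W_sd": statistics.stdev(widths),
    }


# Toy example with four replicates (illustrative numbers only).
truths = [0.5, 1.0, 1.5, 2.0]
estimates = [0.6, 0.9, 1.5, 1.6]
intervals = [(0.2, 1.0), (0.5, 1.4), (1.0, 2.0), (1.0, 1.9)]
print(mse(estimates, truths))
print(coverage_probability(intervals, truths))
print(width_summary(intervals))
```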

3. Results

3.1. Simulation Results under the Situations of Homoscedasticity and Allele Frequencies in Females and Males Being the Same

To assess the computational efficiency of our proposed methods based on the EVD and the Cholesky decomposition, we considered the BNP method for only general pedigrees as an example. Here, $N_p$ was taken to be 150 and 600, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = 0$ (i.e., there were no missing genotypes in the pedigrees) for both quantitative and qualitative traits. The other parameters were fixed in the same way as in the “Simulation Settings” subsection. A total of 500 replicates were conducted for each simulation setting. We compared two versions of the BNP method: (1) the BNP method with the posterior sampling process based on the EVD (for quantitative traits) or the Cholesky decomposition (for qualitative traits), and (2) the BNP method with the posterior sampling process based on the posterior distribution $f(\theta_1 \mid X_1, X_2, Z, \varphi)$ (for quantitative traits) or $f(\theta_2 \mid X_1, X_2, Z, \varphi)$ (for qualitative traits), which is called the original posterior sampling process in this paper. We computed the mean running time of the BNP method based on the EVD or the Cholesky decomposition over all 500 replicates. However, because the original posterior sampling process may take a huge amount of time, we only calculated its mean running time over the first 10 replicates. All the computations were performed on a Tsinghua Tongfang Z900 personal computer (Microsoft Windows 7 Enterprise (Service Pack 1), 4 GB of RAM and 3.60 GHz Intel(R) Core(TM) i7-4790 CPU). The results of the mean running time are given in Table 1. As shown in Table 1, the EVD and the Cholesky decomposition can greatly speed up the Bayesian sampling process, especially when $N_p$ is 600.
The MSEs of the six point estimates ($\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$) of $\gamma$ under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ are listed in Table 2. We found that the MSEs of $\hat{\gamma}_{BNM}$ and $\hat{\gamma}_{BUM}$ based on the mixed data are the smallest under all the simulated scenarios, which means that it is more efficient to estimate the degree of XCI-S by simultaneously using general pedigrees and additional unrelated females. The MSEs of $\hat{\gamma}_{BNP}$ and $\hat{\gamma}_{BUP}$ for only general pedigrees are slightly larger than those of $\hat{\gamma}_{BN}$ and $\hat{\gamma}_{BU}$ for only unrelated females in all the simulated situations. This probably indicates that general pedigrees provide less information for estimating the degree of XCI-S than unrelated females when the total number of females in the pedigrees and that of the unrelated females are the same. As for the two priors of $\gamma$, the point estimates with the truncated normal prior ($\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BN}$) have MSEs similar to those with the uniform prior ($\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BUP}$, and $\hat{\gamma}_{BU}$), with $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BN}$ performing slightly better than $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BUP}$, and $\hat{\gamma}_{BU}$, respectively. Furthermore, it can be observed from Table 2 that the MSEs of the six point estimates decrease when $N_p$ and $n_{If}$ increase, when $p_f$ and $p_m$ (the frequencies of the deleterious allele $D$) increase, and when $\sigma_g^2$ (the variance of the polygenic effects) decreases. As expected, compared to $MR = 0$ (i.e., there are no missing genotypes in the pedigrees), the MSEs of the six point estimates increase when $MR = 0.4$ (i.e., the genotypes of about 40% of the individuals in the general pedigrees are missing). In addition, the six point estimates have smaller MSEs for quantitative traits than for qualitative traits.
Figure 2 and Figure 3 show the scatter plots of the six point estimates ($\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$) against the true values of γ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for quantitative and qualitative traits, respectively. Supplementary Figures S1–S14 show the corresponding scatter plots under other simulation settings. The six rows of each figure represent the results of the six point estimates, and the two columns of each figure denote the corresponding results with $MR = 0$ and $0.4$, respectively (i.e., subplots (a), (c), (e), (g), (i) and (k) are the scatter plots of $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$ with $MR = 0$, respectively, whereas subplots (b), (d), (f), (h), (j) and (l) are the corresponding scatter plots with $MR = 0.4$). The marginal plots on the upper side and the right side of each subplot show the distribution of the true value of γ and that of the point estimate of γ, respectively. By comparing the six subplots in the same column of each figure, we found that $\hat{\gamma}_{BNM}$ and $\hat{\gamma}_{BUM}$ based on the mixed data are closer to the true value of γ than $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$. Moreover, noting that the distribution of the true value of γ is $U(0, 2)$, it can be seen that the distributions of $\hat{\gamma}_{BNM}$ and $\hat{\gamma}_{BUM}$ are more uniform than those of the four other point estimates. These results indicate that it is necessary to combine general pedigrees with unrelated females when estimating γ. The dispersion of $\hat{\gamma}_{BNM}$ is slightly smaller than that of $\hat{\gamma}_{BUM}$, and the dispersions of $\hat{\gamma}_{BN}$ and $\hat{\gamma}_{BU}$ are slightly smaller than those of $\hat{\gamma}_{BNP}$ and $\hat{\gamma}_{BUP}$, although these differences are not obvious in most figures.
By comparing the two subplots in the same row of each figure, it can be seen that the estimates with $MR = 0.4$ (in the subplot of the second column) have larger dispersion than those with $MR = 0$ (in the subplot of the first column), implying that the missing genotypes of some individuals in the collected pedigrees would increase the MSEs of the point estimates. Furthermore, comparing Figure 2 with Figure 3 (or comparing Supplementary Figures S1–S7 with Supplementary Figures S8–S14, respectively) shows that the six point estimates have better performance for quantitative traits than for qualitative traits. In addition, from these figures, the trend of the six point estimates with respect to $N_p$, $n_{If}$, $p_f$, $p_m$, and $\sigma_g^2$ is consistent with that in Table 2. Finally, it is observed from these figures that most of the point estimates are evenly distributed on both sides of the true value of γ, except for the situations with $N_p = 150$, $n_{If} = 650$, and $p_f = p_m = 0.1$, where the six point estimates may underestimate γ (Supplementary Figures S2, S3, S9 and S10). However, when $N_p = 600$, $n_{If} = 2600$, and $p_f = p_m = 0.1$, we can obtain point estimates which are much more evenly distributed around the true value of γ (Supplementary Figures S6, S7, S13 and S14). This suggests that when analyzing the SNPs with low frequencies of the deleterious allele, our proposed point estimates need large sample sizes.
Table 3 describes the CPs of the six interval estimation methods (BNM, BUM, BNP, BUP, BN, and BU) under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$. From Table 3, we find that all six methods control the CPs around 95% in all the simulated situations, which verifies their accuracy when estimating the degree of XCI-S. Table 4 and Supplementary Table S1 display the medians and the means of the widths of the 95% HPDIs ($W_{median}$ and $W_{mean}$), respectively, obtained by the six methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$. From these tables, we can see that the BNM and BUM methods based on the mixed data have smaller $W_{median}$ and $W_{mean}$ than the other four methods (BNP, BUP, BN, and BU) under all the simulated scenarios, which indicates that simultaneously using general pedigrees and additional unrelated females can improve the precision of the interval estimation of the degree of XCI-S. The $W_{median}$ and $W_{mean}$ of the BNP and BUP methods for only general pedigrees are slightly larger than those of the BN and BU methods for only unrelated females, which is consistent with the findings based on the MSEs of their corresponding point estimates from Table 2. As for the two priors of γ, the interval estimation with the truncated normal prior (the BNM, BNP, and BN methods) and that with the uniform prior (the BUM, BUP, and BU methods) perform similarly, although the BNM, BNP, and BN methods obtain slightly smaller $W_{median}$ and $W_{mean}$ than the BUM, BUP, and BU methods, respectively. When $N_p$ and $n_{If}$ increase, $p_f$ and $p_m$ (the frequency of the deleterious allele $D$) increase, $\sigma_g^2$ (the variance of the polygenic effects) decreases, $MR$ (the probability of the genotype of an individual in a pedigree being missing) changes from 0.4 to 0, or the trait changes from qualitative to quantitative, the $W_{median}$ and $W_{mean}$ of the six methods decrease.
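As a minimal sketch of how such summaries can be computed from simulation replicates, the Python function below takes one credible interval and one true value of γ per replicate and returns the coverage probability together with the median, mean, interquartile range, and standard deviation of the interval widths. The function name `summarize_intervals` and the dictionary keys are hypothetical, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption; the paper does not state which convention was used.

```python
import statistics


def summarize_intervals(intervals, true_values):
    """Summarise replicate credible intervals for gamma.

    intervals   : list of (lower, upper) pairs, one per replicate
    true_values : list of the true gamma used to simulate each replicate
    """
    if len(intervals) != len(true_values):
        raise ValueError("one true value is needed per interval")

    widths = [upper - lower for lower, upper in intervals]
    covered = [lower <= gamma <= upper
               for (lower, upper), gamma in zip(intervals, true_values)]

    # Quartiles under the default ('exclusive') convention; other
    # conventions give slightly different interquartile ranges.
    q1, _, q3 = statistics.quantiles(widths, n=4)

    return {
        "CP": 100.0 * sum(covered) / len(covered),  # coverage, in percent
        "W_median": statistics.median(widths),
        "W_mean": statistics.mean(widths),
        "W_iqr": q3 - q1,
        "W_sd": statistics.stdev(widths),           # sample standard deviation
    }
```

Because interval widths for γ are bounded above by 2, the median and interquartile range can behave differently from the mean and standard deviation when many widths pile up near that bound.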
Table 5 and Supplementary Table S2 show the interquartile range and the standard deviation of the widths of the 95% HPDIs ($W_{iqr}$ and $W_{sd}$), respectively, of the six methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$. Figure 4 and Figure 5 display the scatter plots of the widths of the 95% HPDIs based on the six interval estimation methods (BNM, BUM, BNP, BUP, BN, and BU) against the true values of γ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for quantitative and qualitative traits, respectively. Supplementary Figures S15–S28 give the corresponding scatter plots under other simulation settings. The six rows of each figure represent the results of the six methods, and the two columns of each figure denote the corresponding results with $MR = 0$ and $0.4$, respectively. From Table 5, we find that when the prior of γ is fixed to be the truncated normal distribution, the BNM method generally obtains smaller $W_{iqr}$ than the BNP and BN methods under all the simulated scenarios except for the cases of $N_p = 150$, $n_{If} = 650$, and $p_f = p_m = 0.1$ for quantitative traits and those of $N_p = 150$ and $n_{If} = 650$ for qualitative traits. Similarly, when the prior of γ is taken as the uniform distribution, the $W_{iqr}$ of the BUM method are generally smaller than those of the BUP and BU methods under all the simulated scenarios, except for the situations mentioned above. It can be seen in Supplementary Table S2 that the BNM (BUM) method generally derives smaller $W_{sd}$ than the BNP and BN (BUP and BU) methods except for the cases with $p_f = p_m = 0.1$ for both quantitative and qualitative traits and those with $N_p = 150$, $n_{If} = 650$, and $p_f = p_m = 0.3$ for qualitative traits.
This may be explained by the fact that the largest possible width of the 95% HPDIs of the six methods is 2, and when $N_p = 150$ and $n_{If} = 650$ or $p_f = p_m = 0.1$, the widths of the intervals obtained by the BNP, BUP, BN, and BU methods are very close to 2 (as can be observed in Supplementary Figures S16, S17 and S22–S24), which makes the dispersion of the interval widths of these four methods smaller and leads to their smaller $W_{iqr}$ and $W_{sd}$. It is important to note that the width of the 95% HPDI of γ does not follow the normal distribution under most of the simulated scenarios (Supplementary Figures S15–S17, S20–S24, S27 and S28), so the trend of the results of the $W_{sd}$ is not exactly the same as that of the $W_{iqr}$. On the other hand, the $W_{iqr}$ and $W_{sd}$ of the BNP (BUP) method are larger than those of the BN (BU) method. In addition, the $W_{iqr}$ and $W_{sd}$ of the six methods decrease with higher $p_f$ and $p_m$ when $N_p = 600$ and $n_{If} = 2600$, and increase when $N_p = 150$ and $n_{If} = 650$ and other parameters are unchanged. As for the two priors, the BNM, BNP, and BN methods obtain slightly smaller $W_{iqr}$ and $W_{sd}$ than the BUM, BUP, and BU methods. Some subplots of Figure 4 and Figure 5 and Supplementary Figures S15–S28 show scatter plots with an inverted V shape. This indicates that shorter intervals are obtained when the true values of γ are close to 0 or 2, noting that $\gamma \in [0, 2]$. On the other hand, in some figures (e.g., Supplementary Figures S16, S17 and S22–S24), most of the widths of the intervals based on the BNP, BUP, BN, and BU methods are very close to 2, which leads to the smaller dispersion of the interval widths. Other findings are similar to those from Table 4 and Table 5, and Supplementary Tables S1 and S2, and we do not discuss them here for brevity.

3.2. Simulation Results When Allele Frequencies in Females and Males Are Different

Supplementary Table S3 shows the MSEs of the point estimates $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BUP}$ under $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$. Supplementary Tables S4–S8 give the CP, $W_{median}$, $W_{mean}$, $W_{iqr}$, and $W_{sd}$, respectively, of the BNM, BUM, BNP, and BUP methods under $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$. It can be observed from Supplementary Table S4 that when $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$, all four methods control the CPs around 95%. From Supplementary Tables S3 and S5–S8, the MSE, $W_{median}$, $W_{mean}$, $W_{iqr}$ and $W_{sd}$ of the four methods with $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$ are generally smaller than those with $(p_f, p_m) = (0.1, 0.1)$ and larger than those with $(p_f, p_m) = (0.3, 0.3)$ (compared with Table 2 and Table 4, Supplementary Table S1, Table 5 and Supplementary Table S2, respectively), implying that our proposed methods still work when there are differences in the frequencies of the deleterious alleles between females and males.

3.3. Simulation Results under Heteroscedasticity

Supplementary Table S9 displays the MSEs of the point estimates $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BUP}$ under $p_f = p_m$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ and $(1, 1.2, 1)$. Supplementary Tables S10–S14 show the CP, $W_{median}$, $W_{mean}$, $W_{iqr}$, and $W_{sd}$, respectively, of the BNM, BUM, BNP, and BUP methods under $p_f = p_m$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ and $(1, 1.2, 1)$. As shown in Supplementary Table S10, our four proposed methods all control the CPs around 95% well when heteroscedasticity exists (i.e., $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1.2, 1)$). From Supplementary Table S9 and Supplementary Tables S11–S14, we find that the MSE, $W_{median}$, $W_{mean}$, $W_{iqr}$, and $W_{sd}$ of our proposed methods under heteroscedasticity are similar to the corresponding results under homoscedasticity (i.e., $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$) for all simulated situations, which indicates that our proposed methods are still robust when heteroscedasticity is present.

3.4. Application to MCTFR Data

The MCTFR Genome-Wide Association Study of Behavioral Disinhibition is a family-based study of substance abuse and related psychopathology [47]. The dataset can be downloaded from the database of Genotypes and Phenotypes with the accession number phs000620.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1, accessed on 2 February 2023). This dataset contains 2183 families, 7377 participants (3831 females and 3546 males), and 527,829 SNPs. There are five quantitative traits in the dataset: the nicotine composite score, the alcohol consumption composite score, the alcohol dependence composite score (DEP), the illicit drug composite score, and the behavioral disinhibition composite score [48]. Because we only use females for measuring the degree of XCI-S, 3831 females and 12,354 SNPs on the X chromosome were selected. We filtered the data using the following quality control criteria: (1) excluding SNPs with a missing rate > 10%, (2) removing SNPs with a minor allele frequency < 5%, and (3) excluding individuals with a genotype missing rate > 10%. After quality control, 850 families, 3195 females (including 1959 females from 850 families and additional 1236 unrelated females), and 11,344 SNPs were kept to conduct the subsequent analyses.
It is important to note that estimating γ requires the SNPs on the X chromosome to be associated with the traits under study. Therefore, borrowing the idea of the GEMMA method for association analysis on autosomes based on only general pedigrees [27], we propose an improved linear mixed model to test for association on the X chromosome based on the mixed data. We made the following two main modifications: Firstly, we set the relatedness matrix as the block matrix φ in the Materials and Methods section so that the proposed linear mixed model is applicable to the mixture of general pedigrees and additional unrelated females. Secondly, because the parameter γ is generally unknown, we accounted for the XCI by following Wang et al. [24] and using a grid search in which γ was taken over {0, 0.5, 1, 1.5, 2}, i.e., in increments of 0.5. We used the improved linear mixed model to calculate the p-value for each value of γ, and then combined these five p-values using Cauchy’s method [49] to obtain the final test statistic. We conducted some simulation studies and found that the proposed improved linear mixed model can control the type I error rate well (the details can be seen in Supplementary Appendix SC and Supplementary Table S15). It should be noted that the five quantitative traits in the MCTFR dataset do not follow normal distributions. Therefore, we transformed the traits using the rank-based inverse normal transformation [50] before carrying out association analysis. Furthermore, we incorporated two covariates, age and year of birth, into the improved linear mixed model. The significance level of the association tests was set to be $0.05/11344 = 4.41 \times 10^{-6}$ based on the Bonferroni correction.
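The combination step and the significance threshold can be sketched in a few lines of Python. This is a minimal illustration that assumes "Cauchy's method" refers to the standard Cauchy combination rule with equal weights: each p-value is mapped to a standard Cauchy variate, the variates are averaged, and the average is converted back to a p-value. The function names `cauchy_combine` and `bonferroni_threshold` are hypothetical, and p-values are assumed to lie strictly between 0 and 1.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Equal weights are used unless `weights` (non-negative, summing to 1)
    is supplied. Inputs are assumed to lie strictly inside (0, 1).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / k] * k

    # Map each p-value to a standard Cauchy variate and take the
    # weighted average; small p-values give large positive variates.
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values))

    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests
```

With five p-values, one per grid value of γ, `cauchy_combine` returns a single p-value that would then be compared with `bonferroni_threshold(0.05, 11344)`, which is approximately 4.41 × 10⁻⁶.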
The proposed linear mixed model identified three SNPs, rs10522027, rs12860832, and rs12849233, which are associated with the DEP trait at the $4.41 \times 10^{-6}$ level. The positions, alleles, minor allele frequencies, corresponding traits, p-values, and genes to which the three SNPs belong are presented in Table 6. SNP rs10522027 is found within the gene transmembrane protein 47 (TMEM47), which may be associated with the chemoresistance of breast cancer cells and hepatocellular carcinoma [51]. SNPs rs12860832 and rs12849233 are found in the gene PAS domain containing repressor 1 (PASD1), which might serve as a new target for the prognosis and the future treatment of glioma [52]. Furthermore, we calculated the point estimates ($\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$) and the 95% HPDIs of γ based on the proposed BNM, BUM, BNP, and BUP methods and the existing BN and BU methods for these three SNPs, where the BNM and BUM methods use the mixed data (850 families and an additional 1236 unrelated females), the BNP and BUP methods utilize only the 850 families with 1959 females, and the BN and BU methods are applied to only the additional 1236 unrelated females. The point estimates and the corresponding 95% HPDIs of γ obtained by the six methods for these SNPs are listed in Table 7. The six point estimates of γ for the three SNPs are not far from one, and the corresponding 95% HPDIs all contain one, which means that the XCI patterns of the three SNPs are the XCI-R or the XCI-E. In addition, we can observe the advantage of the BNM and BUM methods because they generally obtain shorter credible intervals than the other four methods, which is consistent with our simulation results. However, the BNP and BUP methods give shorter HPDIs than the BN and BU methods, which does not coincide with our simulation results.
This could be because the number of females in the 850 families is 1959, which is much larger than the number of additional unrelated females (1236).

4. Discussion

In this paper, we consider a generalized linear mixed model and propose two Bayesian methods (BNM and BUM) to estimate the degree of XCI-S (i.e., γ) based on the mixture of general pedigrees and additional unrelated females for both quantitative and qualitative traits, where the BNM method uses the truncated normal prior and the BUM method uses the uniform prior, both of which make full use of the constraint $\gamma \in [0, 2]$. When only general pedigrees are available, the BNM and BUM methods reduce to the BNP and BUP methods, respectively. We do not propose the corresponding Fieller’s method and the Penalized Fieller’s method to estimate the degree of XCI-S based on general pedigrees in this paper, as it has been confirmed that these two methods perform worse than Bayesian methods for only unrelated females [32]. It is important to note that the closed form of the posterior distribution of γ is not easily derived, so we applied the HMC algorithm to conduct the posterior sampling process, calculated the mode of the resulting samples as the point estimate of γ, and regarded the HPDI of γ as the credible interval of γ. However, the posterior sampling process based on general pedigrees is very computationally intensive, especially when the dimension of the relatedness matrix (i.e., φ in this paper) is over 1000 [36]. As such, we used the EVD and Cholesky decomposition of φ to speed up the posterior sampling process for quantitative and qualitative traits, respectively. We also considered the median and the percentile interval (the 2.5th and 97.5th percentiles) of the posterior samples as the point estimate and the credible interval of γ, respectively. However, they performed less well than the mode and the HPDI (data not shown for brevity), so we adopted the mode and the HPDI.
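To illustrate the difference between the two kinds of credible interval mentioned above, the following Python sketch computes both from a list of posterior samples. It is a generic sample-based construction, not the implementation in the BEMXCIS package: the HPDI is taken as the shortest window of sorted samples that contains the requested posterior mass, and the percentile interval cuts equal tails. The function names and the simple order-statistic rule used for the percentiles are assumptions made for illustration.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive order statistics and keep the
    # narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def percentile_interval(samples, mass=0.95):
    """Equal-tailed interval based on order statistics (no interpolation)."""
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - mass) / 2.0
    lower = xs[math.floor(tail * (n - 1))]
    upper = xs[math.ceil((1.0 - tail) * (n - 1))]
    return lower, upper
```

For a posterior that piles up near a boundary of [0, 2], the shortest window shifts towards the boundary and is never wider than the equal-tailed interval computed from the same samples, which is one reason an HPDI can be attractive for a bounded parameter such as γ.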
The simulation results demonstrate that the EVD and Cholesky decomposition can greatly speed up the posterior sampling process, which is important for allowing our proposed methods to accommodate large sample sizes, and may be useful to other researchers applying Bayesian methods. The simulation results under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ also show that the BNM and BUM methods have similar performances and are advantageous over the other four methods, which indicates that it is necessary to simultaneously analyze general pedigrees and additional unrelated females when estimating the degree of XCI-S in practice. More specifically, for the point estimation, the MSEs of $\hat{\gamma}_{BNM}$ and $\hat{\gamma}_{BUM}$ are close to each other and are smaller than those of the four other point estimates. The MSE of $\hat{\gamma}_{BNM}$ is the smallest in all the simulated situations. The MSEs of the existing point estimates $\hat{\gamma}_{BN}$ and $\hat{\gamma}_{BU}$ for unrelated females are slightly smaller than those of $\hat{\gamma}_{BNP}$ and $\hat{\gamma}_{BUP}$ for general pedigrees when the number of females is fixed. This suggests that general pedigrees provide less information for estimating the degree of XCI-S than unrelated females when the total number of females in all the pedigrees equals the number of unrelated females. For the interval estimation, all six methods (BNM, BUM, BNP, BUP, BN, and BU) control the CPs around 95%. The BNM and BUM methods perform similarly to each other and both obtain much shorter credible intervals (in terms of $W_{median}$ and $W_{mean}$) than the other four methods under all the simulated scenarios. The BNP and BUP methods perform slightly worse than the BN and BU methods when the number of females is fixed, which is consistent with the findings based on the point estimation ($\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$).
As for the two priors of γ, the performances of the BNM, BNP, and BN methods with the truncated normal prior are slightly better than those of the BUM, BUP, and BU methods with the uniform prior, although these differences are small for our proposed methods, suggesting that our proposed methods are not very sensitive to the choice of priors. Furthermore, our proposed methods perform better when $N_p$ and $n_{If}$ increase, $p_f$ and $p_m$ (the frequency of the deleterious allele $D$) increase, $\sigma_g^2$ (the variance of the polygenic effects) decreases, or the trait changes from qualitative to quantitative. When there are missing genotypes for some individuals in pedigrees, the SLINK software based on the peeling algorithm [53] could be used to impute these missing genotypes. However, to make the test statistics in hypothesis testing robust, or the parameter estimation accurate and precise, one may need to impute the missing genotypes repeatedly using the SLINK software (e.g., 50 imputations), which is very time-consuming for our proposed Bayesian methods. In addition, although it is easy to combine the 50 resulting point estimates of γ by taking their mean, median, or mode as the final point estimate, it is unclear how the 50 resulting credible intervals should be combined. Therefore, when the genotypes of some individuals in the collected pedigrees were missing, we did not impute these missing genotypes. Instead, we chose to delete the individuals with missing genotypes directly. In fact, the simulation results show that, even when the genotypes of approximately 40% of individuals in general pedigrees are missing, our proposed methods can still control the CPs well, indicating that our proposed methods are robust when there are missing genotypes in the data.
The simulation results also show that our proposed methods still work when the frequency of the deleterious allele in females and that in males are different (i.e., $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$). Furthermore, when heteroscedasticity exists (i.e., $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1.2, 1)$), our proposed methods remain robust.
The proposed methods have the following issues to be discussed: Firstly, it is well known that the prior distributions of unknown parameters are important in Bayesian inference, and the choice of them may affect the results. In this paper, we consider two priors for γ: a non-informative prior $U(0, 2)$, which has little effect on the posterior distribution of γ, and a truncated normal prior, $N(1, 1)$ truncated to $[0, 2]$, based on the genetic background of XCI. We also use non-informative priors for regression coefficients and weak priors for variances. In practical applications, researchers can choose appropriate priors according to their research background. Secondly, the Bayesian method adopts the HMC algorithm for the posterior sampling process, which is not greatly influenced by the correlations among unknown parameters. Therefore, for computational efficiency, we assume that all unknown parameters are uncorrelated. However, Bayesian methods should perform better if the correlation between parameters is considered. Thirdly, an HPDI that contains the number one can only indicate that the SNP undergoes the XCI-R or XCI-E pattern. Further distinguishing the XCI-R from the XCI-E pattern remains a problem to be solved. Fourthly, Ma et al. [23] claimed that the variance of the quantitative trait under study for heterozygous females may be higher than that for homozygous females in some cases. For computational efficiency, we assumed that the variances of quantitative traits for different genotypes in females are the same in our proposed methods.
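For readers who want to see what the two priors for γ look like numerically, the short Python sketch below evaluates both densities on [0, 2]. It assumes that the truncated normal prior has mean 1 and variance 1 before truncation; the function names are hypothetical and the code is not taken from the BEMXCIS package.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_pdf(x, mean=1.0, sd=1.0, lower=0.0, upper=2.0):
    """Density of a normal(mean, sd^2) distribution truncated to [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    # Probability mass of the untruncated normal inside [lower, upper];
    # dividing by it renormalises the density to integrate to one.
    mass = (std_normal_cdf((upper - mean) / sd)
            - std_normal_cdf((lower - mean) / sd))
    return std_normal_pdf((x - mean) / sd) / (sd * mass)


def uniform_pdf(x, lower=0.0, upper=2.0):
    """Density of the uniform distribution on [lower, upper]."""
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0
```

Under these assumptions the truncated normal density is highest at γ = 1 and symmetric around it, so it mildly favours values near 1, whereas the uniform density is flat at 0.5 over the whole interval.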
To address the issues mentioned above, we will consider the following improvements in the future: Firstly, we will take into account non-informative priors for variances, such as non-informative Gamma prior or inverse-Gamma prior [41], to improve our proposed methods. Secondly, we will use the Gibbs sampling algorithm [54] to conduct the Bayesian posterior sampling process when the parameters are correlated. Thirdly, the information from the XCI-E can be estimated using the difference of transcriptional dosage on the X chromosome between male hemizygotes and female homozygotes. Therefore, we will incorporate the information from males into our model to further distinguish the XCI-E from the XCI-R. Fourthly, although we have completed some simulation studies showing that our proposed methods are still robust in the presence of heteroscedasticity (Supplementary Tables S9–S14), we will extend our proposed methods to manage the situation of heteroscedasticity to further improve the precision and the accuracy of estimating the degree of XCI-S in the future. Finally, besides the GEMMA [27], we understand that the REGENIE method for autosomal SNPs [55] could take into account population stratification. Therefore, we will extend it to test for the association between X chromosomal SNPs and traits based on the mixed data in the future.

5. Conclusions

In summary, we propose a Bayesian method with two priors (the truncated normal prior and the uniform prior) to estimate the degree of XCI-S based on the mixture of general pedigrees and additional unrelated females, which are denoted by the BNM and BUM methods, respectively. We also develop the corresponding Bayesian method, which is suitable for only general pedigrees, denoted by the BNP and BUP methods. We conducted an extensive simulation study to compare the performance of our four proposed methods with the two existing BN and BU methods. The simulation results show that the BNM method obtains the smallest MSE, the shortest width of the HPDIs, and the most stable CPs, which indicates that it is more efficient in estimating the degree of XCI-S by simultaneously using general pedigrees and additional unrelated females. Finally, we applied the proposed methods to the MCTFR data, and found that three associated SNPs (rs10522027, rs12860832, and rs12849233) undergo the XCI-R or XCI-E pattern.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13030543/s1. Supplementary Appendix SA: Using the EVD to speed up the posterior sampling process for quantitative traits; Supplementary Appendix SB: Using Cholesky decomposition to speed up the posterior sampling process for qualitative traits; Supplementary Appendix SC: Simulation study of the type I error rate for our proposed improved linear mixed model; Tables S1 and S2: $W_{mean}$s and $W_{sd}$s of the BNM, BUM, BNP, BUP, BN, and BU methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively; Table S3: Mean squared errors (MSEs) of point estimates $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BUP}$ under $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data and only general pedigrees, respectively; Tables S4–S8: Coverage probabilities (CPs, in %), $W_{median}$s, $W_{mean}$s, $W_{iqr}$s and $W_{sd}$s of the BNM, BUM, BNP, and BUP methods under $(p_f, p_m) = (0.3, 0.1)$ and $(0.1, 0.3)$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data and only general pedigrees, respectively; Table S9: Mean squared errors (MSEs) of point estimates $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, and $\hat{\gamma}_{BUP}$ under $p_f = p_m$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ and $(1, 1.2, 1)$ among 500 replicates for quantitative traits; Tables S10–S14: Coverage probabilities (CPs, in %), $W_{median}$s, $W_{mean}$s, $W_{iqr}$s, and $W_{sd}$s of the BNM, BUM, BNP, and BUP methods under $p_f = p_m$, and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ and $(1, 1.2, 1)$ among 500 replicates for quantitative traits; Table S15: Type I error rate of our proposed improved linear mixed model under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 1000 replicates.
Figures S1–S14: Scatter plots of six point estimates of γ against true values of γ with $N_p = 150$ and 600, $n_{If} = 650$ and 2600, $p_f = p_m = 0.3$ and 0.1, $\sigma_g^2 = 1/3$ and 1, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for the quantitative or qualitative traits; Figures S15–S28: Scatter plots of widths of HPDIs based on six methods against true values of γ with $N_p = 150$ and 600, $n_{If} = 650$ and 2600, $p_f = p_m = 0.3$ and 0.1, $\sigma_g^2 = 1/3$ and 1, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for the quantitative or qualitative traits.

Author Contributions

Conceptualization, J.-Y.Z.; methodology, Y.-F.K. and S.-Z.L.; software, Y.-F.K., S.-Z.L. and J.-Y.Z.; validation, K.-W.W., B.Z., Y.-X.Y. and M.-K.L.; writing original draft, Y.-F.K., S.-Z.L. and K.-W.W.; review and edit, B.Z., Y.-X.Y., M.-K.L. and J.-Y.Z.; supervision, J.-Y.Z.; project administration, J.-Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 82173619, the Guangdong Basic and Applied Basic Research Foundation, grant number 2023A1515011242, and the Science and Technology Planning Project of Guangdong Province, grant number 2020B1212030008.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The R package BEMXCIS for the BNM, BUM, BNP and BUP methods is freely available at https://github.com/Yi-FanKong/BEMXCIS (accessed on 2 February 2023), which is implemented by R software (version 4.1.2). The MCTFR data used for this study can be found on the database of Genotypes and Phenotypes (dbGaP) with the accession number phs000620.v1.p1 and the dbGaP request number 86747-7 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1, accessed on 2 February 2023).

Acknowledgments

The Minnesota Center for Twin and Family Research (MCTFR) was supported by the National Institute on Drug Abuse grant no. U01 DA024417. The sample ascertainment and data collection in MCTFR data were supported by the National Institute on Drug Abuse grant nos. R37 DA05147 and R01 DA13240, the National Institute on Alcohol Abuse and Alcoholism grant nos. R01 AA09367 and R01 AA11886, and the National Institute of Mental Health grant no. R01 MH66140.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lyon, M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 1961, 190, 372–373. [Google Scholar] [CrossRef]
  2. Zito, A.; Davies, M.N.; Tsai, P.C.; Roberts, S.; Andres-Ejarque, R.; Nardone, S.; Bell, J.T.; Wong, C.; Small, K.S. Heritability of skewed X-inactivation in female twins is tissue-specific and associated with age. Nat. Commun. 2019, 10, 5339. [Google Scholar] [CrossRef] [Green Version]
  3. Amos-Landgraf, J.M.; Cottle, A.; Plenge, R.M.; Friez, M.; Schwartz, C.E.; Longshore, J.; Willard, H.F. X chromosome-inactivation patterns of 1,005 phenotypically unaffected females. Am. J. Hum. Genet. 2006, 79, 493–499. [Google Scholar] [CrossRef] [Green Version]
  4. Peeters, S.B.; Cotton, A.M.; Brown, C.J. Variable escape from X-chromosome inactivation: Identifying factors that tip the scales towards expression. Bioessays 2014, 36, 746–756. [Google Scholar] [CrossRef] [Green Version]
  5. Posynick, B.J.; Brown, C.J. Escape from X-chromosome inactivation: An evolutionary perspective. Front. Cell Dev. Biol. 2019, 7, 241. [Google Scholar] [CrossRef] [Green Version]
  6. Deng, X.; Berletch, J.B.; Nguyen, D.K.; Disteche, C.M. X chromosome regulation: Diverse patterns in development, tissues and disease. Nat. Rev. Genet. 2014, 15, 367–378. [Google Scholar] [CrossRef] [PubMed]
  7. Medema, R.H.; Burgering, B.M. The X factor: Skewing X inactivation towards cancer. Cell 2007, 129, 1253–1254. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Shvetsova, E.; Sofronova, A.; Monajemi, R.; Gagalova, K.; Draisma, H.; White, S.J.; Santen, G.; Chuva, D.S.L.S.; Heijmans, B.T.; van Meurs, J.; et al. Skewed X-inactivation is common in the general female population. Eur. J. Hum. Genet. 2019, 27, 455–465. [Google Scholar]
  9. Bajic, V.; Mandusic, V.; Stefanova, E.; Bozovic, A.; Davidovic, R.; Zivkovic, L.; Cabarkapa, A.; Spremo-Potparevic, B. Skewed X-chromosome inactivation in women affected by Alzheimer’s disease. J. Alzheimers Dis. 2015, 43, 1251–1259. [Google Scholar] [CrossRef]
  10. Giliberto, F.; Radic, C.P.; Luce, L.; Ferreiro, V.; de Brasi, C.; Szijan, I. Symptomatic female carriers of duchenne muscular dystrophy (DMD): Genetic and clinical characterization. J. Neurol. Sci. 2014, 336, 36–41. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Li, G.; Jin, T.; Liang, H.; Tu, Y.; Zhang, W.; Gong, L.; Su, Q.; Gao, G. Skewed X-chromosome inactivation in patients with esophageal carcinoma. Diagn. Pathol. 2013, 8, 55. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Sangha, K.K.; Stephenson, M.D.; Brown, C.J.; Robinson, W.P. Extremely skewed X-chromosome inactivation is increased in women with recurrent spontaneous abortion. Am. J. Hum. Genet. 1999, 65, 913–917. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Simmonds, M.J.; Kavvoura, F.K.; Brand, O.J.; Newby, P.R.; Jackson, L.E.; Hargreaves, C.E.; Franklyn, J.A.; Gough, S.C. Skewed X chromosome inactivation and female preponderance in autoimmune thyroid disease: An association study and meta-analysis. J. Clin. Endocrinol. Metab. 2014, 99, E127–E131. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Sun, Z.; Fan, J.; Wang, Y. X-chromosome inactivation and related diseases. Genet. Res. 2022, 2022, 1391807. [Google Scholar] [CrossRef]
  15. Okumura, K.; Fujimori, Y.; Takagi, A.; Murate, T.; Ozeki, M.; Yamamoto, K.; Katsumi, A.; Matsushita, T.; Naoe, T.; Kojima, T. Skewed X chromosome inactivation in fraternal female twins results in moderately severe and mild haemophilia B. Haemophilia 2008, 14, 1088–1093. [Google Scholar] [CrossRef]
  16. Ozbalkan, Z.; Bagislar, S.; Kiraz, S.; Akyerli, C.B.; Ozer, H.T.; Yavuz, S.; Birlik, A.M.; Calguneri, M.; Ozcelik, T. Skewed X chromosome inactivation in blood cells of women with scleroderma. Arthritis Rheum. 2005, 52, 1564–1570. [Google Scholar] [CrossRef]
  17. Chen, Z.; Ng, H.K.; Li, J.; Liu, Q.; Huang, H. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat. Methods Med. Res. 2017, 26, 567–582. [Google Scholar] [CrossRef]
  18. Chen, B.; Craiu, R.V.; Strug, L.J.; Sun, L. The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies. Genet. Epidemiol. 2021, 45, 694–709. [Google Scholar] [CrossRef]
  19. Clayton, D. Testing for association on the X chromosome. Biostatistics 2008, 9, 593–600. [Google Scholar] [CrossRef]
  20. Deng, W.Q.; Mao, S.; Kalnapenkis, A.; Esko, T.; Sun, L. Analytical strategies to include the X-chromosome in variance heterogeneity analyses: Evidence for trait-specific polygenic variance structure. Genet. Epidemiol. 2019, 43, 815–830. [Google Scholar] [CrossRef]
  21. Ding, J.; Lin, S.; Liu, Y. Monte carlo pedigree disequilibrium test for markers on the X chromosome. Am. J. Hum. Genet. 2006, 79, 567–573. [Google Scholar] [CrossRef] [Green Version]
  22. Gao, F.; Chang, D.; Biddanda, A.; Ma, L.; Guo, Y.; Zhou, Z.; Keinan, A. XWAS: A software toolset for genetic data analysis and association studies of the X chromosome. J. Hered. 2015, 106, 666–671. [Google Scholar] [CrossRef] [Green Version]
  23. Ma, L.; Hoffman, G.; Keinan, A. X-inactivation informs variance-based testing for X-linked association of a quantitative trait. BMC Genom. 2015, 16, 241. [Google Scholar] [CrossRef] [Green Version]
  24. Wang, J.; Yu, R.; Shete, S. X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genet. Epidemiol. 2014, 38, 483–493. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Zhang, L.; Martin, E.R.; Morris, R.W.; Li, Y.J. Association test for x-linked QTL in family-based designs. Am. J. Hum. Genet. 2009, 84, 431–444. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  26. Zheng, G.; Joo, J.; Zhang, C.; Geller, N.L. Testing association for markers on the X chromosome. Genet. Epidemiol. 2007, 31, 834–843. [Google Scholar] [CrossRef]
  27. Zhou, X.; Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012, 44, 821–824. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Xu, S.Q.; Zhang, Y.; Wang, P.; Liu, W.; Wu, X.B.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on family trios. BMC Genet. 2018, 19, 109. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Wang, P.; Zhang, Y.; Wang, B.Q.; Li, J.L.; Wang, Y.X.; Pan, D.; Wu, X.B.; Fung, W.K.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on case-control design. BMC Bioinform. 2019, 20, 11. [Google Scholar] [CrossRef]
  30. Li, B.H.; Yu, W.Y.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data. BMC Genom. Data 2021, 22, 24. [Google Scholar] [CrossRef] [PubMed]
  31. Wang, P.; Xu, S.; Wang, Y.X.; Wu, B.; Fung, W.K.; Gao, G.; Liang, Z.; Liu, N. Penalized fieller’s confidence interval for the ratio of bivariate normal means. Biometrics 2021, 77, 1355–1368. [Google Scholar] [CrossRef] [PubMed]
  32. Yu, W.Y.; Zhang, Y.; Li, M.K.; Yang, Z.Y.; Fung, W.K.; Zhao, P.Z.; Zhou, J.Y. BEXCIS: Bayesian methods for estimating the degree of the skewness of X chromosome inactivation. BMC Bioinform. 2022, 23, 193. [Google Scholar] [CrossRef] [PubMed]
  33. Zhou, H.; Zhou, J.; Sobel, E.M.; Lange, K. Fast genome-wide pedigree quantitative trait loci analysis using mendel. BMC Proc. 2014, 8, S93. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Zhou, H.; Blangero, J.; Dyer, T.D.; Chan, K.K.; Lange, K.; Sobel, E.M. Fast genome-wide QTL association mapping on pedigree and population data. Genet. Epidemiol. 2017, 41, 174–186. [Google Scholar] [CrossRef] [Green Version]
  35. Annis, J.; Miller, B.J.; Palmeri, T.J. Bayesian inference with stan: A tutorial on adding custom distributions. Behav. Res. Methods 2017, 49, 863–886. [Google Scholar] [CrossRef] [Green Version]
  36. Zhao, J.H.; Luan, J.A.; Congdon, P. Bayesian linear mixed models with polygenic effects. J. Stat. Softw. 2018, 85, 1–27. [Google Scholar] [CrossRef]
  37. Lange, K. Mathematical and Statistical Methods for Genetic Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2002; p. 384. [Google Scholar]
  38. Sinnwell, J.P.; Therneau, T.M.; Schaid, D.J. The kinship2 R package for pedigree data. Hum. Hered. 2014, 78, 91–93. [Google Scholar] [CrossRef] [Green Version]
  39. Bae, H.T.; Perls, T.T.; Sebastiani, P. An efficient technique for Bayesian modeling of family data using the bugs software. Front. Genet. 2014, 5, 390. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  40. Runcie, D.E.; Crawford, L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet. 2019, 15, e1007978. [Google Scholar] [CrossRef] [Green Version]
  41. Kruschke, J.K. Bayesian data analysis. Wiley Interdiscip. Rev.-Cogn. Sci. 2010, 1, 658–676. [Google Scholar] [CrossRef]
  42. Ma, C.; Boehnke, M.; Lee, S. Evaluating the calibration and power of three gene-based association tests of rare variants for the X chromosome. Genet. Epidemiol. 2015, 39, 499–508. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Schifano, E.D.; Epstein, M.P.; Bielak, L.F.; Jhun, M.A.; Kardia, S.L.; Peyser, P.A.; Lin, X. SNP set association analysis for familial data. Genet. Epidemiol. 2012, 36, 797–810. [Google Scholar] [CrossRef] [Green Version]
  44. Won, S.; Lange, C. A general framework for robust and efficient association analysis in family-based designs: Quantitative and dichotomous phenotypes. Stat. Med. 2013, 32, 4482–4498. [Google Scholar] [CrossRef] [PubMed]
  45. Saad, M.; Wijsman, E.M. Association score testing for rare variants and binary traits in family data with shared controls. Brief. Bioinform. 2019, 20, 245–253. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Vehtari, A.; Gelman, A.; Simpson, D.; Carpenter, B.; Bürkner, P. Rank-normalization, folding, and localization: An improved R for assessing convergence of MCMC. Bayesian Anal. 2020, 16, 667–718. [Google Scholar] [CrossRef]
  47. Miller, M.B.; Basu, S.; Cunningham, J.; Eskin, E.; Malone, S.M.; Oetting, W.S.; Schork, N.; Sul, J.H.; Iacono, W.G.; Mcgue, M. The Minnesota Center for Twin and Family Research genome-wide association study. Twin Res. Hum. Genet. 2012, 15, 767–774. [Google Scholar] [CrossRef] [Green Version]
  48. Mcgue, M.; Zhang, Y.; Miller, M.B.; Basu, S.; Vrieze, S.; Hicks, B.; Malone, S.; Oetting, W.S.; Iacono, W.G. A genome-wide association study of behavioral disinhibition. Behav. Genet. 2013, 43, 363–373. [Google Scholar] [CrossRef] [Green Version]
  49. Liu, Y.; Chen, S.; Li, Z.; Morrison, A.C.; Boerwinkle, E.; Lin, X. ACAT: A fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 2019, 104, 410–421. [Google Scholar] [CrossRef] [Green Version]
  50. Mccaw, Z.R.; Lane, J.M.; Saxena, R.; Redline, S.; Lin, X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 2020, 76, 1262–1272. [Google Scholar] [CrossRef]
  51. Ng, K.T.; Yeung, O.W.; Liu, J.; Li, C.X.; Liu, H.; Liu, X.B.; Qi, X.; Ma, Y.Y.; Lam, Y.F.; Lau, M.Y.; et al. Clinical significance and functional role of transmembrane protein 47 (TMEM47) in chemoresistance of hepatocellular carcinoma. Int. J. Oncol. 2020, 57, 956–966. [Google Scholar] [CrossRef]
  52. Li, R.; Guo, M.; Song, L. PAS domain containing repressor 1 (PASD1) promotes glioma cell proliferation through inhibiting apoptosis in vitro. Med. Sci. Monitor 2019, 25, 6955–6964. [Google Scholar] [CrossRef] [PubMed]
  53. Weeks, D.E.; Ott, J.; Lathrop, G.M. SLINK: A general simulation program for linkage analysis. Am. J. Hum. Genet. 1990, 47, A204. [Google Scholar]
  54. Cheng, H.; Qu, L.; Garrick, D.J.; Fernando, R.L. A fast and efficient Gibbs sampler for Bayes in whole-genome analyses. Genet. Sel. Evol. 2015, 47, 80. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  55. Mbatchou, J.; Barnard, L.; Backman, J.; Marcketta, A.; Kosmicki, J.A.; Ziyatdinov, A.; Benner, C.; O’Dushlaine, C.; Barber, M.; Boutkov, B.; et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 2021, 53, 1097–1103. [Google Scholar] [CrossRef]
Figure 1. Pedigree structure used for the simulation studies. The squares are males, the circles are females and the rhombus could be any gender. The numbers are used to encode the family members. (a) Nuclear family; (b) three-generation family; and (c) four-generation family.
Figure 2. Scatter plots of six point estimates of $\gamma$ against true values of $\gamma$ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for quantitative trait. The upper side and the right side of each subplot are the distribution of the true value of $\gamma$ and that of the point estimate of $\gamma$, respectively. (a) $\hat{\gamma}_{BNM}$ with $MR = 0$; (b) $\hat{\gamma}_{BNM}$ with $MR = 0.4$; (c) $\hat{\gamma}_{BUM}$ with $MR = 0$; (d) $\hat{\gamma}_{BUM}$ with $MR = 0.4$; (e) $\hat{\gamma}_{BNP}$ with $MR = 0$; (f) $\hat{\gamma}_{BNP}$ with $MR = 0.4$; (g) $\hat{\gamma}_{BUP}$ with $MR = 0$; (h) $\hat{\gamma}_{BUP}$ with $MR = 0.4$; (i) $\hat{\gamma}_{BN}$ with $MR = 0$; (j) $\hat{\gamma}_{BN}$ with $MR = 0.4$; (k) $\hat{\gamma}_{BU}$ with $MR = 0$; and (l) $\hat{\gamma}_{BU}$ with $MR = 0.4$.
Figure 3. Scatter plots of six point estimates of $\gamma$ against true values of $\gamma$ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for qualitative trait. The upper side and the right side of each subplot are the distribution of the true value of $\gamma$ and that of the point estimate of $\gamma$, respectively. (a) $\hat{\gamma}_{BNM}$ with $MR = 0$; (b) $\hat{\gamma}_{BNM}$ with $MR = 0.4$; (c) $\hat{\gamma}_{BUM}$ with $MR = 0$; (d) $\hat{\gamma}_{BUM}$ with $MR = 0.4$; (e) $\hat{\gamma}_{BNP}$ with $MR = 0$; (f) $\hat{\gamma}_{BNP}$ with $MR = 0.4$; (g) $\hat{\gamma}_{BUP}$ with $MR = 0$; (h) $\hat{\gamma}_{BUP}$ with $MR = 0.4$; (i) $\hat{\gamma}_{BN}$ with $MR = 0$; (j) $\hat{\gamma}_{BN}$ with $MR = 0.4$; (k) $\hat{\gamma}_{BU}$ with $MR = 0$; and (l) $\hat{\gamma}_{BU}$ with $MR = 0.4$.
Figure 4. Scatter plots of widths of HPDIs based on six methods against true values of $\gamma$ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for quantitative trait. The upper side and the right side of each subplot are the distribution of the true value of $\gamma$ and that of the width of the HPDI of $\gamma$, respectively. (a) BNM with $MR = 0$; (b) BNM with $MR = 0.4$; (c) BUM with $MR = 0$; (d) BUM with $MR = 0.4$; (e) BNP with $MR = 0$; (f) BNP with $MR = 0.4$; (g) BUP with $MR = 0$; (h) BUP with $MR = 0.4$; (i) BN with $MR = 0$; (j) BN with $MR = 0.4$; (k) BU with $MR = 0$; and (l) BU with $MR = 0.4$.
Figure 5. Scatter plots of widths of HPDIs based on six methods against true values of $\gamma$ with $N_p = 150$, $n_{If} = 650$, $p_f = p_m = 0.3$, $\sigma_g^2 = 1/3$, $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$, and $MR = \{0, 0.4\}$ for qualitative trait. The upper side and the right side of each subplot are the distribution of the true value of $\gamma$ and that of the width of the HPDI of $\gamma$, respectively. (a) BNM with $MR = 0$; (b) BNM with $MR = 0.4$; (c) BUM with $MR = 0$; (d) BUM with $MR = 0.4$; (e) BNP with $MR = 0$; (f) BNP with $MR = 0.4$; (g) BUP with $MR = 0$; (h) BUP with $MR = 0.4$; (i) BN with $MR = 0$; (j) BN with $MR = 0.4$; (k) BU with $MR = 0$; and (l) BU with $MR = 0.4$.
Table 1. Mean running time of the BNP method with a posterior sampling process based on EVD or Cholesky decomposition and an original posterior sampling process for general pedigrees.

| $N_p$ | Quantitative trait: EVD (a) | Quantitative trait: Original (b) | Qualitative trait: Cholesky (a) | Qualitative trait: Original (b) |
|---|---|---|---|---|
| 150 | 10 s | 1.8 h | 1.3 min | 7.2 h |
| 600 | 40 s | 7 days | 47 min | 30 days |

(a) The mean running time is based on 500 replicates; (b) the mean running time is based on 10 replicates.
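The speed-up reported for the EVD-based sampler can be illustrated with the standard rotation used in linear mixed models. The following is a minimal sketch, not the authors' implementation: it assumes a single residual variance, and the matrix `K` (a toy positive semi-definite stand-in for a kinship matrix), the design matrix `X`, and the function names are all hypothetical. After one eigendecomposition $K = U \operatorname{diag}(d) U^{T}$, the multivariate normal log-likelihood reduces to a sum of independent univariate terms, so each evaluation inside the sampler needs no matrix inversion.

```python
import numpy as np

def loglik_direct(y, X, beta, K, sigma_g2, sigma_e2):
    """Multivariate normal log-likelihood with V = sigma_g2 * K + sigma_e2 * I."""
    n = y.size
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)       # log|V|
    quad = r @ np.linalg.solve(V, r)       # r' V^{-1} r
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def loglik_evd(y_rot, X_rot, beta, eigvals, sigma_g2, sigma_e2):
    """Same log-likelihood on rotated data: independent normals with variance v_i."""
    v = sigma_g2 * eigvals + sigma_e2
    r = y_rot - X_rot @ beta
    return -0.5 * np.sum(np.log(2 * np.pi * v) + r**2 / v)

# Toy data (hypothetical; K only mimics the role of a kinship matrix).
rng = np.random.default_rng(0)
n = 50
A = rng.normal(size=(n, n))
K = A @ A.T / n
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])
y = rng.normal(size=n)

# One-off decomposition; the rotation is reused for every likelihood evaluation.
eigvals, U = np.linalg.eigh(K)
y_rot, X_rot = U.T @ y, U.T @ X

direct = loglik_direct(y, X, beta, K, 1 / 3, 1.0)
fast = loglik_evd(y_rot, X_rot, beta, eigvals, 1 / 3, 1.0)
print(np.isclose(direct, fast))
```

The direct evaluation costs on the order of $n^3$ operations per call, whereas the rotated version costs on the order of $n$ once the decomposition is available, which is one plausible reading of the gap between the EVD and original columns in Table 1.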
Table 2. Mean squared errors (MSEs) of point estimates $\hat{\gamma}_{BNM}$, $\hat{\gamma}_{BUM}$, $\hat{\gamma}_{BNP}$, $\hat{\gamma}_{BUP}$, $\hat{\gamma}_{BN}$, and $\hat{\gamma}_{BU}$ under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively.

| Trait | $(N_p, n_{If})$ | $p_f$ | $\sigma_g^2$ | $MR$ | Mixed data: $\hat{\gamma}_{BNM}$ | Mixed data: $\hat{\gamma}_{BUM}$ | Pedigrees: $\hat{\gamma}_{BNP}$ | Pedigrees: $\hat{\gamma}_{BUP}$ | Unrelated females: $\hat{\gamma}_{BN}$ | Unrelated females: $\hat{\gamma}_{BU}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | (150, 650) | 0.3 | 1/3 | 0 | 0.0643 | 0.0707 | 0.1167 | 0.1342 | 0.1123 | 0.1258 |
| | | 0.3 | 1/3 | 0.4 | 0.0943 | 0.1032 | 0.1564 | 0.1757 | 0.1532 | 0.1704 |
| | | 0.3 | 1 | 0 | 0.0889 | 0.0966 | 0.1528 | 0.1646 | 0.1391 | 0.1556 |
| | | 0.3 | 1 | 0.4 | 0.1323 | 0.1473 | 0.2086 | 0.2409 | 0.2017 | 0.2354 |
| | | 0.1 | 1/3 | 0 | 0.1850 | 0.1968 | 0.2959 | 0.3247 | 0.2284 | 0.2499 |
| | | 0.1 | 1/3 | 0.4 | 0.2455 | 0.2670 | 0.3781 | 0.4325 | 0.3304 | 0.3706 |
| | | 0.1 | 1 | 0 | 0.2010 | 0.2192 | 0.3399 | 0.3792 | 0.3260 | 0.3595 |
| | | 0.1 | 1 | 0.4 | 0.2754 | 0.3064 | 0.4224 | 0.4900 | 0.4046 | 0.4709 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.0229 | 0.0236 | 0.0407 | 0.0421 | 0.0377 | 0.0383 |
| | | 0.3 | 1/3 | 0.4 | 0.0359 | 0.0377 | 0.0570 | 0.0606 | 0.0558 | 0.0596 |
| | | 0.3 | 1 | 0 | 0.0256 | 0.0260 | 0.0543 | 0.0576 | 0.0540 | 0.0571 |
| | | 0.3 | 1 | 0.4 | 0.0416 | 0.0445 | 0.0805 | 0.0879 | 0.0764 | 0.0813 |
| | | 0.1 | 1/3 | 0 | 0.0786 | 0.0846 | 0.1210 | 0.1300 | 0.1169 | 0.1209 |
| | | 0.1 | 1/3 | 0.4 | 0.1147 | 0.1205 | 0.1689 | 0.1750 | 0.1593 | 0.1684 |
| | | 0.1 | 1 | 0 | 0.0962 | 0.1039 | 0.1650 | 0.1773 | 0.1512 | 0.1660 |
| | | 0.1 | 1 | 0.4 | 0.1353 | 0.1415 | 0.2080 | 0.2252 | 0.1910 | 0.2056 |
| Qualitative | (150, 650) | 0.3 | 1/3 | 0 | 0.0862 | 0.0926 | 0.1551 | 0.1774 | 0.1499 | 0.1621 |
| | | 0.3 | 1/3 | 0.4 | 0.1243 | 0.1298 | 0.2181 | 0.2593 | 0.1922 | 0.2197 |
| | | 0.3 | 1 | 0 | 0.1175 | 0.1249 | 0.2112 | 0.2381 | 0.1923 | 0.2138 |
| | | 0.3 | 1 | 0.4 | 0.1588 | 0.1863 | 0.2768 | 0.3405 | 0.2459 | 0.2793 |
| | | 0.1 | 1/3 | 0 | 0.2654 | 0.2815 | 0.3911 | 0.4343 | 0.3822 | 0.4107 |
| | | 0.1 | 1/3 | 0.4 | 0.4051 | 0.4250 | 0.5403 | 0.6122 | 0.5383 | 0.6103 |
| | | 0.1 | 1 | 0 | 0.2910 | 0.3316 | 0.4342 | 0.5159 | 0.4322 | 0.4921 |
| | | 0.1 | 1 | 0.4 | 0.4430 | 0.5020 | 0.6024 | 0.6967 | 0.5978 | 0.6838 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.0344 | 0.0347 | 0.0551 | 0.0599 | 0.0540 | 0.0562 |
| | | 0.3 | 1/3 | 0.4 | 0.0564 | 0.0581 | 0.0847 | 0.0921 | 0.0811 | 0.0885 |
| | | 0.3 | 1 | 0 | 0.0441 | 0.0459 | 0.0816 | 0.0872 | 0.0667 | 0.0700 |
| | | 0.3 | 1 | 0.4 | 0.0727 | 0.0743 | 0.1154 | 0.1261 | 0.1045 | 0.1091 |
| | | 0.1 | 1/3 | 0 | 0.1092 | 0.1214 | 0.2405 | 0.2546 | 0.1549 | 0.1615 |
| | | 0.1 | 1/3 | 0.4 | 0.1577 | 0.1665 | 0.3585 | 0.3722 | 0.2225 | 0.2273 |
| | | 0.1 | 1 | 0 | 0.1429 | 0.1496 | 0.2427 | 0.2570 | 0.1718 | 0.1832 |
| | | 0.1 | 1 | 0.4 | 0.1783 | 0.2017 | 0.3614 | 0.3857 | 0.2472 | 0.2614 |
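Table 3. Coverage probabilities (CPs, in %) of the BNM, BUM, BNP, BUP, BN, and BU methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively (a).

| Trait | $(N_p, n_{If})$ | $p_f$ | $\sigma_g^2$ | $MR$ | Mixed data: BNM | Mixed data: BUM | Pedigrees: BNP | Pedigrees: BUP | Unrelated females: BN | Unrelated females: BU |
|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | (150, 650) | 0.3 | 1/3 | 0 | 94.6 | 96.2 | 95.4 | 96.2 | 95.2 | 94.8 |
| | | 0.3 | 1/3 | 0.4 | 95.0 | 95.6 | 95.2 | 95.6 | 95.4 | 96.0 |
| | | 0.3 | 1 | 0 | 95.0 | 95.0 | 93.8 | 94.6 | 95.2 | 95.2 |
| | | 0.3 | 1 | 0.4 | 94.4 | 95.2 | 94.2 | 95.0 | 94.0 | 94.4 |
| | | 0.1 | 1/3 | 0 | 93.8 | 93.6 | 95.8 | 94.4 | 95.0 | 95.2 |
| | | 0.1 | 1/3 | 0.4 | 93.4 | 95.4 | 95.0 | 94.2 | 94.4 | 95.4 |
| | | 0.1 | 1 | 0 | 96.2 | 94.6 | 94.0 | 95.0 | 94.2 | 94.8 |
| | | 0.1 | 1 | 0.4 | 94.4 | 95.8 | 94.2 | 95.4 | 93.8 | 94.2 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 94.2 | 94.2 | 94.0 | 94.4 | 94.8 | 95.0 |
| | | 0.3 | 1/3 | 0.4 | 94.2 | 94.4 | 95.6 | 95.4 | 95.6 | 95.4 |
| | | 0.3 | 1 | 0 | 95.0 | 96.4 | 95.6 | 95.8 | 95.2 | 95.2 |
| | | 0.3 | 1 | 0.4 | 95.8 | 95.8 | 94.6 | 94.6 | 95.4 | 94.8 |
| | | 0.1 | 1/3 | 0 | 94.2 | 94.2 | 95.0 | 94.6 | 94.4 | 95.4 |
| | | 0.1 | 1/3 | 0.4 | 94.0 | 95.0 | 94.6 | 95.2 | 95.2 | 96.2 |
| | | 0.1 | 1 | 0 | 95.6 | 95.6 | 95.6 | 96.0 | 93.6 | 95.2 |
| | | 0.1 | 1 | 0.4 | 95.0 | 95.4 | 94.8 | 95.6 | 95.6 | 94.6 |
| Qualitative | (150, 650) | 0.3 | 1/3 | 0 | 95.8 | 94.8 | 95.0 | 94.8 | 94.2 | 95.2 |
| | | 0.3 | 1/3 | 0.4 | 94.8 | 95.0 | 94.2 | 95.0 | 94.6 | 95.2 |
| | | 0.3 | 1 | 0 | 95.2 | 95.6 | 95.4 | 94.8 | 95.0 | 95.6 |
| | | 0.3 | 1 | 0.4 | 94.8 | 96.4 | 94.2 | 95.4 | 94.8 | 93.8 |
| | | 0.1 | 1/3 | 0 | 95.4 | 94.8 | 95.2 | 94.8 | 95.4 | 94.6 |
| | | 0.1 | 1/3 | 0.4 | 95.2 | 95.0 | 94.8 | 95.2 | 94.8 | 95.6 |
| | | 0.1 | 1 | 0 | 93.6 | 94.6 | 95.0 | 95.8 | 95.4 | 94.8 |
| | | 0.1 | 1 | 0.4 | 94.8 | 95.2 | 94.6 | 94.8 | 95.0 | 95.4 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 95.4 | 94.8 | 95.0 | 94.8 | 96.2 | 95.6 |
| | | 0.3 | 1/3 | 0.4 | 93.8 | 95.2 | 94.2 | 95.4 | 95.0 | 94.2 |
| | | 0.3 | 1 | 0 | 94.2 | 94.2 | 95.0 | 94.8 | 94.2 | 95.4 |
| | | 0.3 | 1 | 0.4 | 95.6 | 95.2 | 95.0 | 95.6 | 95.8 | 94.4 |
| | | 0.1 | 1/3 | 0 | 93.6 | 94.6 | 96.0 | 94.4 | 94.6 | 95.8 |
| | | 0.1 | 1/3 | 0.4 | 95.0 | 95.0 | 94.4 | 95.6 | 94.0 | 95.0 |
| | | 0.1 | 1 | 0 | 94.2 | 93.8 | 95.2 | 93.4 | 95.2 | 95.0 |
| | | 0.1 | 1 | 0.4 | 95.6 | 95.2 | 94.4 | 95.4 | 95.0 | 95.8 |

(a) The empirical CP should be between 93.05% and 96.95% ($0.95 \pm 2\sqrt{0.95 \times 0.05 / 500}$) with 95% probability.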
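The band quoted in the Table 3 footnote follows from the binomial standard error of a proportion estimated from 500 replicates. As a small worked check, the sketch below recomputes it; the function name and its arguments are illustrative only and are not taken from the paper.

```python
import math

def cp_tolerance_interval(nominal=0.95, replicates=500, z=2.0):
    """Band nominal +/- z * sqrt(nominal * (1 - nominal) / replicates)."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / replicates)
    return nominal - half_width, nominal + half_width

lower, upper = cp_tolerance_interval()
print(f"{100 * lower:.2f}% to {100 * upper:.2f}%")  # prints "93.05% to 96.95%"
```

With this band, every empirical CP listed in Table 3 (minimum 93.4%, maximum 96.4%) lies inside the expected range.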
Table 4. $W_{median}$s of the BNM, BUM, BNP, BUP, BN, and BU methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively.

| Trait | $(N_p, n_{If})$ | $p_f$ | $\sigma_g^2$ | $MR$ | Mixed data: BNM | Mixed data: BUM | Pedigrees: BNP | Pedigrees: BUP | Unrelated females: BN | Unrelated females: BU |
|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | (150, 650) | 0.3 | 1/3 | 0 | 0.9770 | 0.9815 | 1.2152 | 1.2336 | 1.2103 | 1.2249 |
| | | 0.3 | 1/3 | 0.4 | 1.1601 | 1.1801 | 1.4467 | 1.4781 | 1.3937 | 1.4373 |
| | | 0.3 | 1 | 0 | 1.0627 | 1.0636 | 1.3667 | 1.3966 | 1.3405 | 1.3653 |
| | | 0.3 | 1 | 0.4 | 1.2572 | 1.2627 | 1.5525 | 1.6017 | 1.5260 | 1.5821 |
| | | 0.1 | 1/3 | 0 | 1.4258 | 1.4452 | 1.5863 | 1.6305 | 1.5729 | 1.6297 |
| | | 0.1 | 1/3 | 0.4 | 1.5502 | 1.5935 | 1.6720 | 1.7201 | 1.6584 | 1.7103 |
| | | 0.1 | 1 | 0 | 1.5453 | 1.5940 | 1.6713 | 1.7166 | 1.6637 | 1.7112 |
| | | 0.1 | 1 | 0.4 | 1.6493 | 1.6999 | 1.7109 | 1.7633 | 1.7106 | 1.7629 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.5350 | 0.5378 | 0.7332 | 0.7324 | 0.7292 | 0.7278 |
| | | 0.3 | 1/3 | 0.4 | 0.6658 | 0.6642 | 0.8894 | 0.8840 | 0.8639 | 0.8650 |
| | | 0.3 | 1 | 0 | 0.6216 | 0.6272 | 0.8754 | 0.8861 | 0.8388 | 0.8433 |
| | | 0.3 | 1 | 0.4 | 0.7755 | 0.7868 | 1.0389 | 1.0515 | 1.0265 | 1.0282 |
| | | 0.1 | 1/3 | 0 | 1.0073 | 1.0217 | 1.2340 | 1.2505 | 1.2201 | 1.2450 |
| | | 0.1 | 1/3 | 0.4 | 1.1494 | 1.1790 | 1.3633 | 1.3885 | 1.3538 | 1.3814 |
| | | 0.1 | 1 | 0 | 1.1208 | 1.1298 | 1.3468 | 1.3654 | 1.3392 | 1.3562 |
| | | 0.1 | 1 | 0.4 | 1.3067 | 1.3313 | 1.4664 | 1.5054 | 1.4634 | 1.5038 |
| Qualitative | (150, 650) | 0.3 | 1/3 | 0 | 1.0719 | 1.0821 | 1.4111 | 1.4353 | 1.3991 | 1.4332 |
| | | 0.3 | 1/3 | 0.4 | 1.2857 | 1.3103 | 1.6186 | 1.6573 | 1.5767 | 1.6258 |
| | | 0.3 | 1 | 0 | 1.2224 | 1.2394 | 1.5703 | 1.6178 | 1.5559 | 1.6176 |
| | | 0.3 | 1 | 0.4 | 1.4206 | 1.4531 | 1.6704 | 1.7288 | 1.6644 | 1.7193 |
| | | 0.1 | 1/3 | 0 | 1.5167 | 1.5485 | 1.6562 | 1.7093 | 1.6551 | 1.6972 |
| | | 0.1 | 1/3 | 0.4 | 1.5914 | 1.6449 | 1.7038 | 1.7543 | 1.6725 | 1.7236 |
| | | 0.1 | 1 | 0 | 1.6121 | 1.6757 | 1.7135 | 1.7649 | 1.6907 | 1.7450 |
| | | 0.1 | 1 | 0.4 | 1.6751 | 1.7284 | 1.7203 | 1.7690 | 1.7160 | 1.7642 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.6888 | 0.6920 | 0.8648 | 0.8684 | 0.8369 | 0.8448 |
| | | 0.3 | 1/3 | 0.4 | 0.8207 | 0.8274 | 1.0530 | 1.0555 | 1.0117 | 1.0057 |
| | | 0.3 | 1 | 0 | 0.7781 | 0.7777 | 1.0235 | 1.0199 | 0.9527 | 0.9559 |
| | | 0.3 | 1 | 0.4 | 0.9714 | 0.9739 | 1.1988 | 1.2089 | 1.1522 | 1.1660 |
| | | 0.1 | 1/3 | 0 | 1.1414 | 1.1550 | 1.3164 | 1.3447 | 1.3027 | 1.3380 |
| | | 0.1 | 1/3 | 0.4 | 1.3028 | 1.3184 | 1.4463 | 1.4831 | 1.4182 | 1.4421 |
| | | 0.1 | 1 | 0 | 1.2614 | 1.2848 | 1.4063 | 1.4336 | 1.4022 | 1.4294 |
| | | 0.1 | 1 | 0.4 | 1.4044 | 1.4328 | 1.5227 | 1.5681 | 1.5185 | 1.5555 |
Table 5. $W_{iqr}$s of the BNM, BUM, BNP, BUP, BN, and BU methods under $p_f = p_m$ and $(\sigma_{e0}^2, \sigma_{e1}^2, \sigma_{e2}^2) = (1, 1, 1)$ among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively.

| Trait | $(N_p, n_{If})$ | $p_f$ | $\sigma_g^2$ | $MR$ | Mixed data: BNM | Mixed data: BUM | Pedigrees: BNP | Pedigrees: BUP | Unrelated females: BN | Unrelated females: BU |
|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | (150, 650) | 0.3 | 1/3 | 0 | 0.3333 | 0.3700 | 0.4620 | 0.5268 | 0.4530 | 0.5122 |
| | | 0.3 | 1/3 | 0.4 | 0.4106 | 0.4680 | 0.4760 | 0.5466 | 0.4435 | 0.4987 |
| | | 0.3 | 1 | 0 | 0.4218 | 0.4755 | 0.4739 | 0.5418 | 0.4291 | 0.5261 |
| | | 0.3 | 1 | 0.4 | 0.4351 | 0.5176 | 0.4162 | 0.4673 | 0.4069 | 0.4314 |
| | | 0.1 | 1/3 | 0 | 0.4058 | 0.4620 | 0.3309 | 0.3603 | 0.3193 | 0.3392 |
| | | 0.1 | 1/3 | 0.4 | 0.3389 | 0.3812 | 0.2603 | 0.2690 | 0.2267 | 0.2243 |
| | | 0.1 | 1 | 0 | 0.3345 | 0.3817 | 0.2612 | 0.2806 | 0.2586 | 0.2585 |
| | | 0.1 | 1 | 0.4 | 0.2467 | 0.2585 | 0.1618 | 0.1658 | 0.1590 | 0.1488 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.1590 | 0.1629 | 0.2391 | 0.2655 | 0.2272 | 0.2582 |
| | | 0.3 | 1/3 | 0.4 | 0.2184 | 0.2344 | 0.3295 | 0.3803 | 0.2823 | 0.3190 |
| | | 0.3 | 1 | 0 | 0.1946 | 0.2087 | 0.3241 | 0.3795 | 0.2968 | 0.3297 |
| | | 0.3 | 1 | 0.4 | 0.2733 | 0.3080 | 0.3892 | 0.4434 | 0.3643 | 0.4125 |
| | | 0.1 | 1/3 | 0 | 0.3785 | 0.4380 | 0.3878 | 0.4345 | 0.3790 | 0.4319 |
| | | 0.1 | 1/3 | 0.4 | 0.4386 | 0.4744 | 0.3998 | 0.4649 | 0.3880 | 0.4338 |
| | | 0.1 | 1 | 0 | 0.3744 | 0.4297 | 0.4101 | 0.4489 | 0.3670 | 0.4449 |
| | | 0.1 | 1 | 0.4 | 0.3653 | 0.3961 | 0.3640 | 0.4160 | 0.3524 | 0.4152 |
| Qualitative | (150, 650) | 0.3 | 1/3 | 0 | 0.4290 | 0.4922 | 0.4841 | 0.5399 | 0.4561 | 0.5148 |
| | | 0.3 | 1/3 | 0.4 | 0.4542 | 0.5314 | 0.3988 | 0.4512 | 0.3902 | 0.4319 |
| | | 0.3 | 1 | 0 | 0.4539 | 0.5204 | 0.4009 | 0.4466 | 0.3970 | 0.4309 |
| | | 0.3 | 1 | 0.4 | 0.4424 | 0.5062 | 0.2980 | 0.3132 | 0.2893 | 0.3005 |
| | | 0.1 | 1/3 | 0 | 0.3811 | 0.4325 | 0.2866 | 0.3086 | 0.2694 | 0.2752 |
| | | 0.1 | 1/3 | 0.4 | 0.3118 | 0.3636 | 0.2822 | 0.3342 | 0.2245 | 0.2540 |
| | | 0.1 | 1 | 0 | 0.3195 | 0.3387 | 0.2239 | 0.2082 | 0.1856 | 0.1867 |
| | | 0.1 | 1 | 0.4 | 0.2468 | 0.2913 | 0.1973 | 0.2209 | 0.1932 | 0.2027 |
| | (600, 2600) | 0.3 | 1/3 | 0 | 0.1886 | 0.2109 | 0.2929 | 0.3232 | 0.2715 | 0.3036 |
| | | 0.3 | 1/3 | 0.4 | 0.2580 | 0.2884 | 0.3947 | 0.4478 | 0.3536 | 0.4152 |
| | | 0.3 | 1 | 0 | 0.2680 | 0.2937 | 0.3801 | 0.4009 | 0.3617 | 0.3981 |
| | | 0.3 | 1 | 0.4 | 0.3576 | 0.4061 | 0.4404 | 0.5107 | 0.4060 | 0.4630 |
| | | 0.1 | 1/3 | 0 | 0.3826 | 0.4368 | 0.3421 | 0.4035 | 0.3350 | 0.3903 |
| | | 0.1 | 1/3 | 0.4 | 0.3333 | 0.3700 | 0.4620 | 0.5268 | 0.4530 | 0.5122 |
| | | 0.1 | 1 | 0 | 0.4106 | 0.4680 | 0.4760 | 0.5466 | 0.4435 | 0.4987 |
| | | 0.1 | 1 | 0.4 | 0.4218 | 0.4755 | 0.4739 | 0.5418 | 0.4291 | 0.5261 |
Table 6. SNPs detected in association analysis for the MCTFR data.

| SNP | Position | Alleles | MAF (a) | Trait | p-Value | Gene |
|---|---|---|---|---|---|---|
| rs10522027 | 34630163 | G > A | 0.141 | DEP | $3.64 \times 10^{-7}$ | TMEM47 |
| rs12860832 | 151643064 | G > A | 0.263 | DEP | $2.00 \times 10^{-6}$ | PASD1 |
| rs12849233 | 151645704 | C > A | 0.329 | DEP | $1.26 \times 10^{-6}$ | PASD1 |

(a) MAF represents the minor allele frequency.
Table 7. Application of the six methods to SNPs detected in association analysis for the MCTFR data.

Point estimate:

| SNP | $\hat{\gamma}_{BNM}$ | $\hat{\gamma}_{BUM}$ | $\hat{\gamma}_{BNP}$ | $\hat{\gamma}_{BUP}$ | $\hat{\gamma}_{BN}$ | $\hat{\gamma}_{BU}$ |
|---|---|---|---|---|---|---|
| rs10522027 | 0.6922 | 0.6895 | 0.6394 | 0.6494 | 0.7238 | 0.7429 |
| rs12860832 | 0.8371 | 0.8288 | 0.9422 | 0.9448 | 0.7281 | 0.7200 |
| rs12849233 | 0.7633 | 0.7426 | 0.8843 | 0.8736 | 0.6906 | 0.6968 |

95% HPDI:

| SNP | BNM | BUM | BNP | BUP | BN | BU |
|---|---|---|---|---|---|---|
| rs10522027 | (0.2451, 1.3518) | (0.2316, 1.4420) | (0.0156, 1.5816) | (0.0063, 1.5567) | (0.1791, 1.6615) | (0.1870, 1.6384) |
| rs12860832 | (0.3266, 1.4935) | (0.3942, 1.5788) | (0.1878, 1.6258) | (0.2077, 1.6698) | (0.0945, 1.6294) | (0.1214, 1.6503) |
| rs12849233 | (0.2236, 1.2934) | (0.2133, 1.3054) | (0.1054, 1.5392) | (0.1361, 1.5964) | (0.0211, 1.5229) | (0.0764, 1.5490) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kong, Y.-F.; Li, S.-Z.; Wang, K.-W.; Zhu, B.; Yuan, Y.-X.; Li, M.-K.; Zhou, J.-Y. An Efficient Bayesian Method for Estimating the Degree of the Skewness of X Chromosome Inactivation Based on the Mixture of General Pedigrees and Unrelated Females. Biomolecules 2023, 13, 543. https://doi.org/10.3390/biom13030543