Article

Gene-Based Methods for Estimating the Degree of the Skewness of X Chromosome Inactivation

1 Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
2 Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
3 Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Genes 2022, 13(5), 827; https://doi.org/10.3390/genes13050827
Submission received: 14 April 2022 / Revised: 1 May 2022 / Accepted: 2 May 2022 / Published: 6 May 2022
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

Abstract

Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases, and several methods have been proposed to estimate the degree of the XCI-S (denoted as $\gamma$) for a single locus. However, no method has been available to estimate $\gamma$ for genes. Therefore, in this paper, we first propose the point estimate and the penalized point estimate of $\gamma$ for genes, and then derive the corresponding confidence intervals based on the Fieller’s and penalized Fieller’s methods, respectively. Further, we consider the constraint condition $\gamma \in [0, 2]$ and propose Bayesian methods to obtain the point estimates and the credible intervals of $\gamma$, where a truncated normal prior and a uniform prior are respectively used (denoted as GBN and GBU). The simulation results show that the Bayesian methods avoid extreme point estimates (0 or 2), empty sets, noninformative intervals ($[0, 2]$) and discontinuous intervals. GBN performs best in both the point estimation and the interval estimation. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data to demonstrate their practical use. In summary, we recommend using GBN to estimate $\gamma$ for genes in practical applications.

1. Introduction

X chromosome inactivation (XCI) is an important epigenetic phenomenon. Under the XCI, one of the two X chromosomes in females is silenced in the early stage of embryonic development so that the transcriptional dosage of the X chromosome is balanced between females and males [1]. Generally, there are three patterns of the XCI [2]: random X chromosome inactivation (XCI-R), skewed X chromosome inactivation (XCI-S) [3,4,5,6], and escape from X chromosome inactivation (XCI-E) [7,8]. The XCI-R means that the paternal and maternal X chromosomes in females have the same probability of being inactivated, i.e., for a locus on the X chromosome, approximately 50% of the cells inactivate one of the alleles, while the remaining 50% of the cells keep the other allele inactive. Under the XCI-E, the alleles on both X chromosomes in females are expressed, as at an autosomal locus. For humans, about 15-30% of the X-linked genes have been reported to undergo the XCI-E [7]. Finally, the XCI-S is defined as more than 75% of the cells in females inactivating the same allele [9]. In some extremely skewed cases, more than 90% of the cells may keep the same allele silenced [9,10]. As such, the difference in the number of X chromosomes between females and males and the complexity of the XCI make association tests for the X chromosome more complicated than those for the autosomes.
The skewness of the XCI can reflect, or cause, biological consequences for females [9]. The clonal expansion of a somatic cell in females may lead to a cell population with extremely skewed XCI [9]. For some X-linked disorders, there is strong selection for the cells which keep the mutant allele inactive in heterozygous carriers and, hence, assessing the degree of the skewness of the XCI helps to indicate the carrier’s disease status [11]. Further, the degree of the skewness of the XCI can determine the severity of certain X-linked diseases, such as haemophilia B [12,13]. On the other hand, even for the same mutant allele, the XCI-S in different tissues or cells may result in different clinical consequences. For example, in heterozygous females with a mutant FoxP3 allele, the XCI-S against the mutant allele in specific tissues can prevent autoimmune disease, while the XCI-S skewed towards the mutant allele in breast epithelial cells can cause breast cancer [14]. In addition, studies have shown that some diseases, such as ovarian cancer, Rett syndrome, Duchenne muscular dystrophy and recurrent miscarriage, are also related to the XCI-S [15,16,17,18]. Therefore, in recent years, researchers have proposed several methods to test the association between the alleles at an X-chromosomal single nucleotide polymorphism (SNP) locus and traits [19,20,21,22,23,24,25,26]. For example, Wang et al. [23] developed a permutation-based test statistic which considers all the XCI patterns. For the XCI-R and the XCI-S, this method codes the three female genotypes ($dd$, $Dd$ and $DD$) as 0, $\gamma$ and 2, respectively, at an X-chromosomal SNP with the major allele $d$ and the minor allele $D$, where $\gamma \in [0, 2]$ is an unknown genotypic value for heterozygous females, and codes the two male genotypes ($d$ and $D$) as 0 and 2, respectively. Here, $\gamma$ can be used to measure the degree of the XCI skewing.
For instance, $\gamma \in [0, 1)$ is indicative of the XCI-S skewed towards the minor allele $D$, $\gamma = 1$ means that the XCI pattern is the XCI-R, and $\gamma \in (1, 2]$ indicates the XCI-S skewed towards the major allele $d$. For the XCI-E, the three female genotypes are coded as 0, 1 and 2, and the two male genotypes are coded as 0 and 1, respectively. However, the X-chromosomal association tests mentioned above are only applicable to a single SNP and common variants, and are not suitable for genetic regions or genes with multiple SNPs and rare variants. Rare variants refer to the variants with a minor allele frequency (MAF) less than 1%, and those with MAF $\ge 1\%$ are called common variants [27,28]. Over the past few years, genome-wide association studies have identified many common variants associated with complex traits, but these variants usually explain only a small part of the estimated heritability for a given trait. On the other hand, it has been shown that rare variants play a key role in influencing traits [29]. Single-variant tests often have low power when applied to rare variants. Therefore, many statistical methods have been proposed which focus on testing the cumulative effect of rare variants in genetic regions or SNP sets (such as genes), including the burden test and the variance-component tests [27,30,31,32,33]. The burden test collapses all the rare variants in a genetic region into a single burden variable, and then regresses the trait on the burden variable to test the cumulative effect of the rare variants in that region [27]. The variance-component tests, such as the sequence kernel association test (SKAT), do not directly aggregate the variants in the modeling process, but aggregate the association between the variants and the trait through a kernel matrix [33]. Another method, SKAT-O, proposed by Lee et al. [34], has the advantages of both the burden and SKAT tests, but its time cost is higher than that of the previous two methods.
All these methods have one thing in common: they increase the weights of rare variants’ contributions and decrease the weights of common variants’ contributions. However, for a trait-related gene, the relative influence of rare and common variants is not known [35]. Therefore, Iuliana et al. [35] put forward several multi-locus association tests, such as the adaptive sum test, which consider the effects of both common and rare variants on the trait, and these methods are more powerful when the genes simultaneously contain rare and common variants. Note that these multi-locus association tests are all based on genetic regions or genes on autosomes, and may not be directly applicable to the X chromosome. Therefore, Clement et al. [36] improved the traditional burden test, SKAT and SKAT-O methods and suggested three gene-based X-chromosomal association tests. However, these methods only take account of the XCI-R and XCI-E patterns. In addition, the FxSKAT method, proposed by Asuman et al. [37], is not only applicable to pedigree data, but also takes the XCI-E into account during the analysis process.
Besides testing the association between the genes on the X chromosome and the traits under study, it is also important to develop methods to measure the corresponding degree of the skewness of the XCI (denoted as $\gamma$). At present, researchers have put forward several methods to estimate $\gamma$ for a single SNP, which give both the point estimates and the confidence intervals (CIs) of $\gamma$. Specifically, Xu et al. [38] proposed a statistical index for estimating $\gamma$ based on family trios (both parents and their daughter), which can be represented as the ratio of two relative risks in females, and derived the corresponding CI with the likelihood ratio (LR) test. Wang et al. [39] used the ratio of two regression coefficients of a logistic regression to estimate $\gamma$, and obtained the CIs with the LR, Fieller’s and delta methods, respectively. Li et al. [40] further extended the methods of Wang et al. so that they can accommodate quantitative traits. However, the above-mentioned methods are all constructed for a single SNP, and are not suitable for genetic regions or genes containing multiple SNPs. Furthermore, they perform poorly when applied to rare variants. In addition, it should be noted that the delta method cannot control the coverage probability (CP) well, and that the LR and Fieller’s methods have similar performance in the interval estimation, while the Fieller’s method is computationally more efficient. Thus, the Fieller’s method is recommended in practice. However, both the LR and Fieller’s methods may yield unbounded CIs when the denominators in the ratios used to estimate $\gamma$ are close to 0. Fortunately, the penalized Fieller’s (PF) method, which was proposed by Wang et al. [41], can be used to conduct the ratio estimation and always gives bounded CIs when an appropriate penalty parameter is chosen. However, it has not yet been applied to the estimation of the degree of the skewness of the XCI.
On the other hand, the above-mentioned methods do not consider the constraint condition $\gamma \in [0, 2]$, and simply truncate the point estimates and the CIs to $[0, 2]$, which may result in extreme point estimates (0 or 2) and empty sets or noninformative CIs (i.e., $[0, 2]$). In contrast, Bayesian methods can effectively utilize the prior information on each unknown parameter in the analysis, and have been widely used in statistical genetics [42].
Therefore, in this paper, we borrow the idea of the burden test, aggregate all the variants in a gene under study into a burden variable by selecting appropriate weights, and then estimate the mean degree of the skewness of the XCI over all the SNPs in the gene based on the burden variable. We first propose the point estimate and the penalized point estimate of $\gamma$ for the gene, and then derive the corresponding CIs based on the Fieller’s and PF methods, respectively. Then, by considering the constraint condition $\gamma \in [0, 2]$, we propose Bayesian methods to obtain the point estimates and the credible intervals of $\gamma$. Specifically, after getting enough samples drawn from the posterior distribution of $\gamma$, we calculate the mode of the samples as the point estimate of $\gamma$ and the highest posterior density interval (HPDI) as the credible interval of $\gamma$ [43]. We conduct extensive simulation studies to compare the performances of the proposed point estimation methods and interval estimation methods for $\gamma$. Finally, we demonstrate the practical utility of the proposed methods by applying them to the Minnesota Center for Twin and Family Research (MCTFR) data.

2. Materials and Methods

2.1. Notations

Suppose that we only collect $n$ female subjects, because male subjects provide no information on the XCI skewing. Consider an X-linked trait (quantitative or qualitative) and let $y_i$ represent the trait value of the $i$th female ($i = 1, 2, \ldots, n$); then $Y = (y_1, y_2, \ldots, y_n)^T$ is the vector of the trait values for all the females. Assume that a gene which contains $J$ SNPs is associated with this trait, and let $d_j$ and $D_j$ denote the major allele and the minor allele at the $j$th SNP ($j = 1, 2, \ldots, J$), respectively. Let $G_{ij}$ be the genotype at the $j$th SNP of the $i$th female (i.e., $G_{ij} = d_j d_j$, $D_j d_j$ or $D_j D_j$). If we use $\gamma \in [0, 2]$ to measure the mean degree of the skewness of the XCI for all the SNPs in the gene, then $g_{ij} = 0$, $\gamma$ and $2$ can be used to denote the genotypic values for genotypes $d_j d_j$, $D_j d_j$ and $D_j D_j$, respectively. As such, $G_i = (g_{i1}, g_{i2}, \ldots, g_{iJ})^T$ is the vector of the genotypic values at the $J$ SNPs of the $i$th female. Therefore, we consider the association between the gene and the trait based on the following generalized linear model
$$h(\mu_i) = \beta_0 + \beta^T G_i + b^T Z_i, \tag{1}$$
where $h(\cdot)$ is a link function; $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{im})^T$ is the vector of $m$ covariates of the $i$th female to be adjusted for, and $Z = (Z_1, Z_2, \ldots, Z_n)^T$ is an $n \times m$ covariate matrix; $\mu_i = E(y_i \mid G_i, Z_i)$ is the conditional mean of the $i$th female’s trait value given $G_i$ and $Z_i$; $\beta_0$ is the intercept, $\beta = (\beta_1, \beta_2, \ldots, \beta_J)^T$ is the vector of the regression coefficients of $G_i$, and $b = (b_1, b_2, \ldots, b_m)^T$ is an $m \times 1$ vector of the regression coefficients of $Z_i$.
Based on the idea of the burden test [27], we aggregate all the SNPs in the gene into a burden variable and let $X_i = \sum_{j=1}^{J} \omega_j g_{ij}$, where $\omega_j$ is a weight for the $j$th SNP. Here we assume that $\omega_j$ is a function of the MAF at the $j$th SNP (denoted as $\mathrm{MAF}_j$), i.e., $\omega_j = Beta(\mathrm{MAF}_j, 0.5, 0.5)$ [35]. So, model (1) can be rewritten as
$$h(\mu_i) = \beta_0 + \beta_c X_i + b^T Z_i, \tag{2}$$
where $\beta_c$ is the regression coefficient of $X_i$. Next, we consider two variables $g_{ij}^{(1)} = I\{G_{ij} = D_j d_j \text{ or } D_j D_j\}$ and $g_{ij}^{(2)} = I\{G_{ij} = D_j D_j\}$, where $I\{\cdot\}$ is the indicator function. Thus, $g_{ij}^{(1)} = 1$ means that the $i$th female carries at least one minor allele at the $j$th SNP, and $g_{ij}^{(2)} = 1$ denotes that the female is a homozygote $D_j D_j$ at the $j$th SNP. Through simple transformations, we get $g_{ij} = \gamma g_{ij}^{(1)} + (2 - \gamma) g_{ij}^{(2)}$, and $X_i$ can be expressed as $X_i = \sum_{j=1}^{J} \omega_j [\gamma g_{ij}^{(1)} + (2 - \gamma) g_{ij}^{(2)}] = \gamma X_i^{(1)} + (2 - \gamma) X_i^{(2)}$, where $X_i^{(1)} = \sum_{j=1}^{J} \omega_j g_{ij}^{(1)}$ and $X_i^{(2)} = \sum_{j=1}^{J} \omega_j g_{ij}^{(2)}$. Further, let $X^{(1)} = (X_1^{(1)}, X_2^{(1)}, \ldots, X_n^{(1)})^T$ and $X^{(2)} = (X_1^{(2)}, X_2^{(2)}, \ldots, X_n^{(2)})^T$. To estimate the mean degree of the XCI skewing for the gene (i.e., $\gamma$), we substitute $X_i = \gamma X_i^{(1)} + (2 - \gamma) X_i^{(2)}$ into model (2) and get
$$h(\mu_i) = \beta_0 + \beta_c [\gamma X_i^{(1)} + (2 - \gamma) X_i^{(2)}] + b^T Z_i. \tag{3}$$
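As an illustration of the burden construction, the following Python sketch computes $X_i^{(1)}$, $X_i^{(2)}$ and $X_i$ for one female. It is not the authors’ code: the function names are hypothetical, and the sketch assumes that $Beta(\mathrm{MAF}_j, 0.5, 0.5)$ denotes the Beta(0.5, 0.5) density evaluated at the MAF, as is usual for burden-type weights.

```python
import math


def beta_weight(maf, a=0.5, b=0.5):
    """Beta(a, b) density evaluated at the MAF (assumed meaning of the SNP weight)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1) * (1 - maf) ** (b - 1)


def burden_variables(minor_counts, mafs):
    """Return (X1, X2) for one female.

    minor_counts[j] is the number of minor alleles (0, 1 or 2) at SNP j.
    X1 sums the weights of SNPs with at least one minor allele (g1 = 1);
    X2 sums the weights of SNPs that are minor-allele homozygotes (g2 = 1).
    """
    x1 = x2 = 0.0
    for count, maf in zip(minor_counts, mafs):
        w = beta_weight(maf)
        if count >= 1:
            x1 += w
        if count == 2:
            x2 += w
    return x1, x2


def burden_score(x1, x2, gamma):
    """Burden variable X_i = gamma * X1 + (2 - gamma) * X2."""
    return gamma * x1 + (2 - gamma) * x2
```

With $\gamma = 1$ the burden score reduces to the weighted count of minor alleles, which is a quick sanity check on the decomposition.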
For quantitative traits, $h(\cdot)$ is the identity function, and model (3) can be written as $y_i = \beta_0 + \beta_c [\gamma X_i^{(1)} + (2 - \gamma) X_i^{(2)}] + b^T Z_i + \varepsilon_i$, where $\varepsilon_i$ is the random error and follows $N(0, \sigma^2)$. In this case, the unknown parameters are $\theta_1 = (\beta_0, \beta_c, \gamma, b^T, \sigma)^T$, and the corresponding likelihood function of the sample is
$$L_1(\theta_1) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{\sum_{i=1}^{n} \left[ y_i - \beta_0 - \gamma\beta_c X_i^{(1)} - (2 - \gamma)\beta_c X_i^{(2)} - b^T Z_i \right]^2}{2\sigma^2} \right\}.$$
As for qualitative traits, $h(\cdot)$ is the logit function, and model (3) is written as $\mathrm{Logit}(\Pr(y_i = 1 \mid X_i^{(1)}, X_i^{(2)}, Z_i)) = \beta_0 + \beta_c [\gamma X_i^{(1)} + (2 - \gamma) X_i^{(2)}] + b^T Z_i$. The unknown parameters are $\theta_2 = (\beta_0, \beta_c, \gamma, b^T)^T$ and the likelihood function is
$$L_2(\theta_2) = \prod_{i=1}^{n} \pi_i^{I\{y_i = 1\}} (1 - \pi_i)^{I\{y_i = 0\}},$$
where $y_i = 1$ and $0$ respectively indicate that the $i$th female is a case and a control, and $\pi_i = 1 / \{1 + \exp[-\beta_0 - \gamma\beta_c X_i^{(1)} - (2 - \gamma)\beta_c X_i^{(2)} - b^T Z_i]\}$. Let $\beta_c^{(1)} = \gamma\beta_c$ and $\beta_c^{(2)} = (2 - \gamma)\beta_c$, and we have
$$h(\mu_i) = \beta_0 + \beta_c^{(1)} X_i^{(1)} + \beta_c^{(2)} X_i^{(2)} + b^T Z_i. \tag{4}$$
As such, we obtain $\beta_c = (\beta_c^{(1)} + \beta_c^{(2)}) / 2$ and $\gamma$ can be expressed as
$$\gamma = \frac{\beta_c^{(1)}}{\beta_c} = \frac{2\beta_c^{(1)}}{\beta_c^{(1)} + \beta_c^{(2)}}. \tag{5}$$
By assuming that the degree of the skewness of the XCI at the $j$th SNP is $\gamma_j$, $\gamma$ satisfies, under a certain condition (the proof is given in Appendix A),
$$\gamma = \frac{\sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}) \gamma_j}{\sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)})},$$
where $g_{\cdot j}^{(1)} = \sum_{i=1}^{n} g_{ij}^{(1)}$ is the number of the females who carry at least one minor allele at the $j$th SNP, and $g_{\cdot j}^{(2)} = \sum_{i=1}^{n} g_{ij}^{(2)}$ is the number of the females whose genotypes at the $j$th SNP are $D_j D_j$, so that $g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}$ is the number of heterozygous females at the $j$th SNP. So, $\gamma$ is the weighted mean of the $\gamma_j$’s for all the SNPs in the gene with the weights being $\omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}) / \sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)})$. When there are rare variants at some SNPs or when the variation of the $\gamma_j$’s in the gene is large, $\gamma$ is still well defined for the whole gene. On the other hand, from Equation (5), $\gamma$ is well defined if there is an association between the gene and the trait (i.e., $\beta_c = (\beta_c^{(1)} + \beta_c^{(2)}) / 2 \ne 0$). Further, $\gamma = 0$ if and only if $\beta_c^{(1)} = 0$ and $\beta_c^{(2)} \ne 0$, which means that all the $\gamma_j$’s are 0 and the XCI-S is completely skewed towards the minor allele for each SNP, and $\gamma = 2$ only when $\beta_c^{(1)} \ne 0$ and $\beta_c^{(2)} = 0$, indicating that all the $\gamma_j$’s are 2 and the XCI-S is completely skewed towards the major allele for each SNP. Finally, $\gamma = 1$ means that, on average, the gene undergoes the XCI-R or the XCI-E. After obtaining the estimates of $\beta_c^{(1)}$ and $\beta_c^{(2)}$, respectively denoted by $\hat{\beta}_c^{(1)}$ and $\hat{\beta}_c^{(2)}$, which can be derived by the maximum likelihood method, the point estimate of $\gamma$ can be expressed as $\hat{\gamma} = 2\hat{\beta}_c^{(1)} / (\hat{\beta}_c^{(1)} + \hat{\beta}_c^{(2)})$.
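The weighted-mean interpretation of $\gamma$ can be illustrated with a small Python sketch. The function name and the toy numbers are hypothetical; the sketch only evaluates the weighted mean of SNP-level values $\gamma_j$ with weights proportional to $\omega_j$ times the number of heterozygous females, and does not reproduce the proof in Appendix A.

```python
def gene_level_gamma(weights, carriers, homozygotes, snp_gammas):
    """Weighted mean of the SNP-level gamma_j values.

    weights[j]     : omega_j, the weight of SNP j
    carriers[j]    : number of females with at least one minor allele at SNP j
    homozygotes[j] : number of females with genotype D_j D_j at SNP j
    carriers[j] - homozygotes[j] is the number of heterozygous females.
    """
    num = 0.0
    den = 0.0
    for w, g1, g2, gam in zip(weights, carriers, homozygotes, snp_gammas):
        het_weight = w * (g1 - g2)
        num += het_weight * gam
        den += het_weight
    if den == 0:
        raise ValueError("no heterozygous females, so the weighted mean is undefined")
    return num / den


# Toy example: two SNPs with 10 and 20 heterozygous females, equal omega_j.
print(gene_level_gamma([1.0, 1.0], [10, 30], [0, 10], [0.5, 1.5]))
```

SNPs with more heterozygous females, or with larger $\omega_j$, therefore contribute more to the gene-level $\gamma$.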

2.2. Point Estimate and CI of γ by Fieller’s Method

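Note that $\gamma$ should take values in the interval $[0, 2]$. So, the original estimate $\hat{\gamma} = 2\hat{\beta}_c^{(1)} / (\hat{\beta}_c^{(1)} + \hat{\beta}_c^{(2)})$ needs to be truncated to $[0, 2]$, and the resulting estimate is denoted by $\hat{\gamma}_{GF}$. Further, we utilize the Fieller’s method to get the CI of $\gamma$. Specifically, borrowing the idea of Wang et al. [39], we have $\hat{\beta}_c = (\hat{\beta}_c^{(1)} + \hat{\beta}_c^{(2)}) / 2$, $\widehat{\mathrm{Var}}(\hat{\beta}_c) = \frac{1}{4} [\widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)}) + \widehat{\mathrm{Var}}(\hat{\beta}_c^{(2)}) + 2\,\widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c^{(2)})]$ and $\widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c) = \frac{1}{2} \widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)}) + \frac{1}{2} \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c^{(2)})$. To construct the CI of $\gamma$, we first establish a Wald test under the null hypothesis $H_0: \gamma = \gamma_0$, where $\gamma_0$ is a pre-specified value (e.g., 1, which means that, on average, the gene undergoes the XCI-R or the XCI-E). Under $H_0$, we have $\beta_c^{(1)} - \gamma_0 \beta_c = 0$, and the Wald test statistic is
$$\frac{\hat{\beta}_c^{(1)} - \gamma_0 \hat{\beta}_c}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)}) + \gamma_0^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c) - 2\gamma_0\, \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c)}} \sim N(0, 1).$$
Therefore, the $100(1 - \alpha)\%$ CI of $\gamma$ can be derived by solving the following equation
$$\left[ \frac{\hat{\beta}_c^{(1)} - \gamma_0 \hat{\beta}_c}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)}) + \gamma_0^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c) - 2\gamma_0\, \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c)}} \right]^2 = Z_{1-\alpha/2}^2,$$
where $Z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution. Rearranging the above equation with respect to $\gamma_0$ gives the quadratic equation
$$A\gamma_0^2 + B\gamma_0 + C = 0,$$
where $A = \hat{\beta}_c^2 - Z_{1-\alpha/2}^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c)$, $B = 2[Z_{1-\alpha/2}^2\, \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c) - \hat{\beta}_c^{(1)} \hat{\beta}_c]$ and $C = (\hat{\beta}_c^{(1)})^2 - Z_{1-\alpha/2}^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)})$. When $\Delta = B^2 - 4AC = 0$ or $A = 0$, the CI of $\gamma$ degenerates to a point. The CI of $\gamma$ for the other cases is as follows
$$\begin{cases} \left( \dfrac{-B - \sqrt{\Delta}}{2A},\ \dfrac{-B + \sqrt{\Delta}}{2A} \right) \cap [0, 2], & \text{if } \Delta > 0 \text{ and } A > 0, \\ \left( \left( -\infty,\ \dfrac{-B + \sqrt{\Delta}}{2A} \right) \cup \left( \dfrac{-B - \sqrt{\Delta}}{2A},\ +\infty \right) \right) \cap [0, 2], & \text{if } \Delta > 0 \text{ and } A < 0, \\ [0, 2], & \text{if } \Delta < 0 \text{ and } A < 0, \\ \varnothing, & \text{if } \Delta < 0 \text{ and } A > 0. \end{cases}$$
It should be noted that even in the case of $\Delta > 0$, the CI of $\gamma$ obtained by the Fieller’s method may still be an empty set once it is intersected with $[0, 2]$. In the case of $\Delta > 0$ and $A < 0$, the CI may be composed of two parts, i.e., it is a discontinuous interval.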
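The case analysis above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors’ code: the function name `fieller_ci` is hypothetical, the inputs are the two estimated coefficients with their estimated variances and covariance, and the degenerate cases ($A = 0$ or $\Delta = 0$) are simply rejected.

```python
import math


def fieller_ci(b1, b2, var1, var2, cov12, z=1.959964):
    """Fieller-type CI for gamma = b1 / bc, where bc = (b1 + b2) / 2.

    Returns a list of (lower, upper) pieces inside [0, 2]. An empty list
    stands for the empty set; two pieces form a discontinuous interval.
    """
    bc = (b1 + b2) / 2
    var_c = (var1 + var2 + 2 * cov12) / 4      # estimated Var(bc)
    cov1c = (var1 + cov12) / 2                 # estimated Cov(b1, bc)
    z2 = z * z
    a = bc * bc - z2 * var_c
    b = 2 * (z2 * cov1c - b1 * bc)
    c = b1 * b1 - z2 * var1
    delta = b * b - 4 * a * c
    if a == 0 or delta == 0:
        raise ValueError("degenerate case (A = 0 or Delta = 0) is not handled")
    if delta < 0:
        # A < 0: every gamma_0 is accepted; A > 0: none is accepted.
        pieces = [(-math.inf, math.inf)] if a < 0 else []
    else:
        r1 = (-b - math.sqrt(delta)) / (2 * a)
        r2 = (-b + math.sqrt(delta)) / (2 * a)
        lo, hi = min(r1, r2), max(r1, r2)
        # A > 0: between the roots; A < 0: outside the roots.
        pieces = [(lo, hi)] if a > 0 else [(-math.inf, lo), (hi, math.inf)]
    truncated = []
    for lo, hi in pieces:
        lo, hi = max(lo, 0.0), min(hi, 2.0)
        if lo < hi:                            # drop pieces outside [0, 2]
            truncated.append((lo, hi))
    return truncated


# Precise estimates give a short interval around gamma = 1.
print(fieller_ci(1.0, 1.0, 0.01, 0.01, 0.0))
# Imprecise estimates with a denominator near 0 give the noninformative [0, 2].
print(fieller_ci(0.01, 0.01, 1.0, 1.0, 0.0))
```

The default `z` is the 0.975 quantile of the standard normal distribution, corresponding to a 95% CI.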

2.3. Penalized Point Estimate and CI of γ by PF Method

As mentioned above, we construct $\hat{\gamma} = \hat{\beta}_c^{(1)} / \hat{\beta}_c$ as the point estimate of $\gamma$, where $\hat{\beta}_c = (\hat{\beta}_c^{(1)} + \hat{\beta}_c^{(2)}) / 2$. However, if the denominator $\hat{\beta}_c$ is very close to 0, $\hat{\gamma}$ will tend to infinity, and the CI of $\gamma$ based on the Fieller’s method before the truncation is usually unbounded. To deal with this issue in the ratio estimate, we borrow the idea of Wang et al. [41] and propose the following PF method to obtain the penalized point estimate of $\gamma$ and the corresponding CI. Consider the penalized log-likelihood function of $\beta_c$ as follows: $pl = -(\hat{\beta}_c - \beta_c)^2 / (2\,\widehat{\mathrm{Var}}(\hat{\beta}_c)) + \lambda \log|\beta_c|$, where $\lambda > 0$ is a penalty parameter and is taken to be $Z_{1-\alpha/2}^2 / 4$ as suggested by Wang et al. [41], because the CI obtained by the PF method is always bounded with $\lambda = Z_{1-\alpha/2}^2 / 4$. By maximizing the function $pl$, we have the penalized denominator $\tilde{\beta}_c = \hat{\beta}_c / 2 + \mathrm{sign}(\hat{\beta}_c) \sqrt{\hat{\beta}_c^2 / 4 + \lambda\, \widehat{\mathrm{Var}}(\hat{\beta}_c)}$, where $\mathrm{sign}(\cdot)$ is the signum function. Further, we can get $\widehat{\mathrm{Var}}(\tilde{\beta}_c) = \xi^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c) + O(n^{-3})$, where $\xi = \tilde{\beta}_c / (2\tilde{\beta}_c - \hat{\beta}_c)$. If we replace $\hat{\beta}_c$ by $\tilde{\beta}_c$ to obtain the point estimate $\tilde{\gamma} = \hat{\beta}_c^{(1)} / \tilde{\beta}_c$, then $\tilde{\gamma}$ is a biased estimate of $\gamma$. To reduce this bias, we need to correct the numerator $\hat{\beta}_c^{(1)}$ by $\tilde{\beta}_c^{(1)} = \hat{\beta}_c^{(1)} + \tilde{\gamma} (\tilde{\beta}_c - \hat{\beta}_c)$. Correspondingly, we can get $\widehat{\mathrm{Var}}(\tilde{\beta}_c^{(1)}) = \xi^{-2}\, \widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)}) - 4(\xi^{-1} - 1) \tilde{\gamma}\, \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c) + 4(1 - \xi)^2 \tilde{\gamma}^2\, \widehat{\mathrm{Var}}(\hat{\beta}_c)$ and $\widehat{\mathrm{Cov}}(\tilde{\beta}_c^{(1)}, \tilde{\beta}_c) = \widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c) - 2\xi(1 - \xi) \tilde{\gamma}\, \widehat{\mathrm{Var}}(\hat{\beta}_c)$. After obtaining the corrected denominator $\tilde{\beta}_c$ and the corrected numerator $\tilde{\beta}_c^{(1)}$, $\hat{\gamma}^{*} = \tilde{\beta}_c^{(1)} / \tilde{\beta}_c$ truncated to $[0, 2]$ is the penalized point estimate of $\gamma$, which is denoted by $\hat{\gamma}_{GPF}$.
The corresponding CI is constructed in the same way as for the Fieller’s method, except that $\hat{\beta}_c$, $\hat{\beta}_c^{(1)}$, $\widehat{\mathrm{Var}}(\hat{\beta}_c)$, $\widehat{\mathrm{Var}}(\hat{\beta}_c^{(1)})$ and $\widehat{\mathrm{Cov}}(\hat{\beta}_c^{(1)}, \hat{\beta}_c)$ are respectively replaced by $\tilde{\beta}_c$, $\tilde{\beta}_c^{(1)}$, $\widehat{\mathrm{Var}}(\tilde{\beta}_c)$, $\widehat{\mathrm{Var}}(\tilde{\beta}_c^{(1)})$ and $\widehat{\mathrm{Cov}}(\tilde{\beta}_c^{(1)}, \tilde{\beta}_c)$ in Equation (6). However, it should be noted that although the CI of $\gamma$ based on the PF method is always bounded when $\lambda = Z_{1-\alpha/2}^2 / 4$, it may extend outside $[0, 2]$ and then needs to be truncated to $[0, 2]$.
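The penalized quantities can be computed step by step as in the following Python sketch. It is an illustration rather than the authors’ implementation: the function name and the returned keys are hypothetical, the $O(n^{-3})$ remainder is ignored, and $\mathrm{sign}(0)$ is taken to be $+1$ as an assumption to keep the denominator away from 0.

```python
import math


def penalized_estimates(b1, bc, var1, var_c, cov1c, z=1.959964):
    """Penalized Fieller quantities for gamma = b1 / bc.

    b1, bc : estimates of beta_c^(1) and beta_c
    var1, var_c, cov1c : estimated Var(b1), Var(bc) and Cov(b1, bc)
    """
    lam = z * z / 4                                  # penalty parameter
    sign = 1.0 if bc >= 0 else -1.0                  # assumption: sign(0) = +1
    bc_pen = bc / 2 + sign * math.sqrt(bc * bc / 4 + lam * var_c)
    xi = bc_pen / (2 * bc_pen - bc)
    gamma_tilde = b1 / bc_pen                        # biased ratio estimate
    b1_pen = b1 + gamma_tilde * (bc_pen - bc)        # corrected numerator
    var_c_pen = xi ** 2 * var_c
    var1_pen = (var1 / xi ** 2
                - 4 * (1 / xi - 1) * gamma_tilde * cov1c
                + 4 * (1 - xi) ** 2 * gamma_tilde ** 2 * var_c)
    cov1c_pen = cov1c - 2 * xi * (1 - xi) * gamma_tilde * var_c
    gamma_star = b1_pen / bc_pen
    return {
        "bc_pen": bc_pen, "b1_pen": b1_pen, "xi": xi,
        "var_c_pen": var_c_pen, "var1_pen": var1_pen, "cov1c_pen": cov1c_pen,
        "gamma_star": gamma_star,
        "gamma_gpf": min(max(gamma_star, 0.0), 2.0),  # truncated to [0, 2]
    }


res = penalized_estimates(b1=0.2, bc=0.1, var1=0.04, var_c=0.02, cov1c=0.02)
print(round(res["gamma_gpf"], 3))
```

In this toy input the unpenalized ratio is $0.2 / 0.1 = 2$, while the penalized estimate is pulled to about 1.51 because the denominator is moved away from 0.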

2.4. Point Estimate and Credible Interval of γ by Bayesian Method

Note that the point estimates ($\hat{\gamma}_{GF}$ and $\hat{\gamma}_{GPF}$) and the corresponding CIs mentioned above are truncated to the interval $[0, 2]$ and cannot directly incorporate the information $\gamma \in [0, 2]$. Therefore, in this subsection, we introduce the Bayesian method to give the point estimate and the credible interval of $\gamma$ by considering the prior information $\gamma \in [0, 2]$. Specifically, we have the posterior distribution of the unknown parameter $\theta_{\cdot}$ as follows
$$f(\theta_{\cdot} \mid Y, X^{(1)}, X^{(2)}, Z) = \frac{f(\theta_{\cdot}) L_{\cdot}(\theta_{\cdot})}{\int f(\theta_{\cdot}) L_{\cdot}(\theta_{\cdot})\, d\theta_{\cdot}},$$
where $f(\theta_{\cdot})$ is the joint prior distribution of $\theta_{\cdot}$; when the traits are quantitative, $\theta_{\cdot} = \theta_1$ and $L_{\cdot}(\theta_{\cdot}) = L_1(\theta_1)$; when the traits are qualitative, $\theta_{\cdot} = \theta_2$ and $L_{\cdot}(\theta_{\cdot}) = L_2(\theta_2)$. However, in general, we cannot get an analytical expression for $f(\theta_{\cdot} \mid Y, X^{(1)}, X^{(2)}, Z)$. Therefore, it is not feasible to directly sample from the posterior distribution. Fortunately, there are several algorithms for sampling from an approximate distribution of the posterior distribution, such as the Hamiltonian Monte Carlo (HMC) algorithm, which can be implemented by the “rstan” package in R [43]. On the other hand, according to Annis et al. [43], the correlation between the parameters has little influence on the HMC algorithm. To simplify the operations and improve the sampling efficiency, we assume that the unknown parameters in $\theta_{\cdot}$ are independent of each other, and use the HMC algorithm to sample from the approximate posterior distribution of $\theta_{\cdot}$. In other words, we choose the prior distribution for each unknown parameter separately.
The prior distributions of the parameters in $\theta_{\cdot}$ are selected as follows. To reduce the influence of the selection of the prior distributions on the results, for the nuisance parameters $\beta_0$, $\beta_c$ and $b$ (there is an additional nuisance parameter $\sigma$ when the trait is quantitative), we choose weak prior distributions [44]. Specifically, we assume that the prior distributions of $\beta_0$ and $\beta_c$ are both $N(0, 10^2)$, and that of $b$ is $MVN(0, \mathrm{diag}(10^2, 10^2, \ldots, 10^2))$. For quantitative traits, we also specify the prior distribution of $\sigma$ to be an exponential distribution, i.e., $\sigma \sim \exp(1)$. As for the parameter $\gamma$ of interest, which is used to measure the mean degree of the skewness of the XCI over all the SNPs in the gene and satisfies the constraint condition $\gamma \in [0, 2]$, we consider two possible prior distributions. The first one is the truncated normal distribution, with the mean and the standard deviation of the underlying normal distribution both being 1 and the values ranging from 0 to 2, and the probability density function of the prior distribution is
$$f(\gamma) = \begin{cases} \dfrac{\phi(\gamma - 1)}{\frac{1}{\sqrt{2\pi}} \int_0^2 \exp\left[ -\frac{1}{2}(x - 1)^2 \right] dx}, & 0 \le \gamma \le 2, \\ 0, & \text{otherwise}, \end{cases}$$
where $\phi(\cdot)$ is the probability density function of the standard normal distribution. In this way, $\gamma$ not only satisfies the constraint condition $\gamma \in [0, 2]$, but the prior density is also highest at $\gamma = 1$, which is consistent with the literature [2], i.e., most of the SNPs on the X chromosome undergo the XCI-R. Meanwhile, the selected truncated normal distribution of $\gamma$ also keeps the prior density near the extreme values (0 or 2) from being too low, which may be more suitable for practical applications. The second prior distribution of $\gamma$ is a uniform distribution, i.e., $\gamma \sim U(0, 2)$.
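As a small numerical check of the truncated normal prior, the Python sketch below evaluates its density with the standard library. The function name is hypothetical; the normalizing constant is computed from the standard normal distribution function via `math.erf`.

```python
import math


def truncated_normal_prior(gamma, mean=1.0, sd=1.0, lower=0.0, upper=2.0):
    """Density of a N(mean, sd^2) distribution truncated to [lower, upper]."""
    if gamma < lower or gamma > upper:
        return 0.0

    def std_pdf(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

    def std_cdf(x):
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    # Probability mass of the untruncated normal inside [lower, upper].
    mass = std_cdf((upper - mean) / sd) - std_cdf((lower - mean) / sd)
    return std_pdf((gamma - mean) / sd) / (sd * mass)


peak = truncated_normal_prior(1.0)
edge = truncated_normal_prior(0.0)
print(round(peak, 4), round(edge / peak, 4))
```

With the default parameters, the density at the end points 0 and 2 is $\exp(-1/2) \approx 0.61$ times the density at $\gamma = 1$, so the extreme values are down-weighted but not excluded.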
After specifying the prior distributions of all the unknown parameters, we can get enough samples of $\gamma$ through the HMC algorithm, and then calculate the mode of the samples as the point estimate of $\gamma$, and the highest posterior density interval (HPDI) as the credible interval of $\gamma$. Here, we denote the Bayesian methods with the truncated normal prior and the uniform prior as GBN and GBU, and the point estimates obtained by these two methods are denoted as $\hat{\gamma}_{GBN}$ and $\hat{\gamma}_{GBU}$, respectively.
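Given posterior draws of $\gamma$, an HPDI can be approximated as the shortest interval that contains the required fraction of the sorted draws. The Python sketch below shows this common sample-based construction; it is an assumption that this matches the routine used with “rstan”, the function name is hypothetical, and the mode estimate (which needs a density estimate) is not covered.

```python
import math


def hpdi(samples, prob=0.95):
    """Shortest interval containing at least `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))        # number of draws inside the interval
    best = (xs[0], xs[k - 1])
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best[1] - best[0]:    # keep the narrowest window
            best = (lo, hi)
    return best


draws = [0.0, 0.9, 1.0, 1.1, 1.2, 1.3, 1.9, 2.0]
print(hpdi(draws, prob=0.5))
```

Because the draws already lie in $[0, 2]$ under either prior, the resulting interval is a single non-empty interval inside $[0, 2]$.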

3. Results

3.1. Simulation Settings

We conducted extensive simulation studies to evaluate the performances of the proposed point estimation and interval estimation methods. The number of female subjects (i.e., the sample size $n$) is set to be 500 and 2000. Consider a gene associated with the trait under study, with the number of the SNPs in the gene (i.e., $J$) fixed at 100, i.e., we assume that all the 100 SNPs are associated with the trait. Meanwhile, we define $\eta$ as the proportion of rare variants among the 100 SNPs. To explore the effect of $\eta$ on the proposed methods, we set $\eta = 0$, $0.4$ and $1$, which correspond to the cases of the 100 SNPs only including common variants, the 100 SNPs simultaneously containing common and rare variants, and the 100 SNPs only consisting of rare variants, respectively. Among them, the MAFs for common variants are sampled from $U(0.01, 0.5)$, while the MAFs for rare variants are randomly simulated from $U(0.005, 0.01)$ [45,46,47]. We generate the genotypes of the $n$ female subjects by referring to the ideas of Wang et al. [45], Basu et al. [46], and Turkmen et al. [47]. We first generate a latent vector $V = (V_1, V_2, \ldots, V_{100})^T$ from the multivariate normal distribution with the mean vector being 0 and the elements of the variance-covariance matrix satisfying $\mathrm{Var}(V_j) = 1$ and $\mathrm{Corr}(V_j, V_k) = \rho^{|j - k|}$ ($j, k = 1, 2, \ldots, 100$) [45,47], so that the linkage disequilibrium among the SNPs is taken into consideration. For simplicity, we set $\rho = 0.5$ in our simulation studies. Once $V$ is generated, each of its elements is transformed to 0 (major allele) or 1 (minor allele) according to the corresponding MAF. This process is repeated twice, and the two simulated vectors of length 100 are put together to form the genotypes at the 100 SNPs for a female subject.
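The genotype generation can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the simulation code used in the paper: the latent vector is drawn through an AR(1) recursion, which yields $\mathrm{Corr}(V_j, V_k) = \rho^{|j - k|}$ with unit variances, and each latent value is dichotomized at the normal quantile of the MAF (the exact thresholding rule is an assumption). The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_haplotype(mafs, rho=0.5, rng=random):
    """One haplotype: 1 = minor allele, 0 = major allele at each SNP."""
    std_normal = NormalDist()
    haplotype = []
    v = rng.gauss(0.0, 1.0)
    for j, maf in enumerate(mafs):
        if j > 0:
            # AR(1) step keeps Var(V_j) = 1 and Corr(V_j, V_k) = rho^|j-k|.
            v = rho * v + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)
        # P(V_j < z_maf) = maf, so the minor allele appears with frequency maf.
        haplotype.append(1 if v < std_normal.inv_cdf(maf) else 0)
    return haplotype


def simulate_genotype(mafs, rho=0.5, rng=random):
    """Minor-allele counts (0, 1 or 2) from two independent haplotypes."""
    h1 = simulate_haplotype(mafs, rho, rng)
    h2 = simulate_haplotype(mafs, rho, rng)
    return [a + b for a, b in zip(h1, h2)]


rng = random.Random(2022)
common = [rng.uniform(0.01, 0.5) for _ in range(60)]
rare = [rng.uniform(0.005, 0.01) for _ in range(40)]
genotype = simulate_genotype(common + rare, rho=0.5, rng=rng)
print(len(genotype), sum(genotype))
```

The example draws 60 common and 40 rare variants, which corresponds to the setting $\eta = 0.4$.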
After simulating the genotypes of the $n$ female subjects, we have an $n \times 100$ genotypic value matrix $G = (G_1, G_2, \ldots, G_n)^T$ with the elements being 0, 1 or 2, and then we replace the elements of $G$ equal to 1 with $\gamma$ to simulate the information on the XCI-S. Note that to simplify the simulation and better evaluate the performances of our proposed methods (e.g., the calculation of the mean squared errors (MSEs) of the point estimates requires a single true value of $\gamma$ for each replicate; the details are given later), we set the degrees of the XCI skewing $\gamma_j$’s at all the 100 SNPs to be the same in the simulation study (i.e., $\gamma_j = \gamma$, $j = 1, 2, \ldots, 100$).
We only consider a covariate $Q$, which is generated from the standard normal distribution. For the quantitative trait, we simulate the trait value $y_i$ of the $i$th female according to the following model
$$y_i = \beta_0 + \beta_1 g_{i1} + \beta_2 g_{i2} + \cdots + \beta_{100} g_{i100} + \delta Q_i + \varepsilon_i,$$
where $\varepsilon_i$ is the random error, which is generated from the standard normal distribution; $\beta_0$ is the intercept and $\delta$ is the regression coefficient of the covariate $Q$, and both the parameters are set to be 0.5 [36]; $|\beta_j| = e\,|\log_{10} \mathrm{MAF}_j| / 2$ is the absolute value of the regression coefficient of the genotypic value $g_{ij}$ at the $j$th SNP [33,34,36], where $e$ is a tuning parameter used to avoid the effect of a SNP being too large or too small [36]. To highlight the effects of rare variants on the trait, we set $e = 1.5$ when the $j$th SNP has a rare variant, and $e = 1.1$ otherwise. Further, notice that the directions of the effects of different SNPs on the trait may be different. Therefore, we consider the proportion of the SNPs with positive effects among the 100 SNPs (denoted by $\tau$) and set $\tau$ to be 0.6 and 1, indicating that the effect directions of some SNPs are positive and some are negative, and that all the SNP effects are positive, respectively. We do not simulate the case of $\tau = 0$ (i.e., all the SNP effects are negative) because the results with $\tau = 0$ are similar to those with $\tau = 1$. As for the qualitative trait, the model for generating the affection status $y_i$ of the $i$th female is as follows
$$\mathrm{Logit}(\Pr(y_i = 1 \mid G_i, Q_i)) = \beta_0 + \beta_1 g_{i1} + \beta_2 g_{i2} + \cdots + \beta_{100} g_{i100} + \delta Q_i.$$
All of the parameters are the same as when simulating the quantitative trait, except that we need to set the case-control ratio to be 1:1.
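The effect sizes and the quantitative-trait model can be illustrated with the Python sketch below. The function names are hypothetical, the effect-size formula is read as $e \times |\log_{10} \mathrm{MAF}_j| / 2$, and the noise term is passed in explicitly so that the example is deterministic; in a simulation it would be a standard normal draw.

```python
import math


def effect_size(maf, rare_cutoff=0.01, e_rare=1.5, e_common=1.1):
    """|beta_j| = e * |log10(MAF_j)| / 2, with e depending on rare vs common."""
    e = e_rare if maf < rare_cutoff else e_common
    return e * abs(math.log10(maf)) / 2


def trait_value(minor_counts, mafs, signs, gamma, q, noise, beta0=0.5, delta=0.5):
    """Quantitative trait for one female.

    minor_counts[j] : number of minor alleles (0, 1, 2) at SNP j
    signs[j]        : +1 or -1, the direction of the effect of SNP j
    gamma           : genotypic value assigned to heterozygotes
    q, noise        : covariate value and random error for this female
    """
    y = beta0 + delta * q + noise
    for count, maf, sign in zip(minor_counts, mafs, signs):
        g = gamma if count == 1 else float(count)   # 0, gamma or 2
        y += sign * effect_size(maf) * g
    return y


print(round(effect_size(0.1), 3), round(effect_size(0.005), 3))
print(round(trait_value([1, 2, 0], [0.1, 0.1, 0.1], [1, -1, 1],
                        gamma=0.4, q=2.0, noise=0.0), 3))
```

Because $|\log_{10} \mathrm{MAF}_j|$ grows as the MAF decreases and $e$ is larger for rare variants, rarer variants receive larger absolute effects.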
After simulating the genotypes and the trait values, we use model (4) to obtain the estimates of $\beta_c^{(1)}$ and $\beta_c^{(2)}$, where $X_i^{(1)} = \sum_{j=1}^{100} \omega_j g_{ij}^{(1)}$, $X_i^{(2)} = \sum_{j=1}^{100} \omega_j g_{ij}^{(2)}$, $\omega_j = Beta(\widehat{\mathrm{MAF}}_j, 0.5, 0.5)$, and $\widehat{\mathrm{MAF}}_j$ is the estimate of the MAF at the $j$th SNP. Then, we get the point estimate $\hat{\gamma}_{GF}$, the penalized point estimate $\hat{\gamma}_{GPF}$, and the CIs of $\gamma$ derived by the Fieller’s and the PF methods. As for the Bayesian methods, the HMC algorithm is implemented through the “sampling” function in the R package “rstan”. We set 8 chains for the parallel sampling in the simulation. For each chain, we draw 10,000 samples, of which the first 5000 are used for warm-up. So, we finally get 40,000 samples. To ensure the convergence, the target acceptance rate is set to be 0.99.
The above simulation steps are all implemented in the R software (version 4.1.1, http://r-project.org, accessed on 5 January 2022). For each simulation setting, the number of the replicates is fixed to be 500, and for each replicate, the true value of $\gamma$ is sampled from the uniform distribution $U(0,\ 2)$. To evaluate the accuracy and the robustness of $\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$, we calculate the MSEs of these point estimates. Here, $\mathrm{MSE} = \sum_{s=1}^{500} (\hat{\gamma}_s - \gamma_s)^2 / 500$, where $\gamma_s$ represents the true value of $\gamma$ and $\hat{\gamma}_s$ is the point estimate in the $s$th replicate ($s = 1,\ 2,\ \ldots,\ 500$). Note that $\hat{\gamma}_{GBN}$ and $\hat{\gamma}_{GBU}$ are always between 0 and 2, so we only compute the proportions of $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ taking the extreme values (0 or 2), respectively. Meanwhile, scatter plots are used to show the relationship between the four point estimates and the true values of $\gamma$. To compare the performances of the GBN, GBU, PF and Fieller’s methods in the interval estimation, we calculate the CP as well as the mean, the median, the standard deviation and the interquartile range of the widths of the 95% HPDIs or CIs (denoted by $W_{mean}$, $W_{median}$, $W_{sd}$ and $W_{iqr}$), respectively. For the PF and Fieller’s methods, we also compute the proportions of the empty sets (EP), the noninformative intervals (NP), and the discontinuous intervals (DP) to further compare the effectiveness of these two methods, where the noninformative interval means the CI being $[0,\ 2]$. However, it should be noted that the GBN and GBU methods avoid the cases of empty sets, noninformative intervals, and discontinuous intervals occurring. In addition, we draw the scatter plots between the interval widths of the four proposed methods and the true values of $\gamma$.
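The evaluation measures above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the coverage function handles only single continuous intervals, and the HPDI is approximated in the usual sample-based way as the shortest window that contains the requested share of the sorted posterior samples.

```python
import math


def mse(estimates, truths):
    """Mean squared error over replicates: sum((est - true)^2) / S."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def coverage(intervals, truths):
    """Proportion of replicates whose interval (lo, hi) contains the truth."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)


def hpdi(samples, mass=0.95):
    """Shortest interval containing a share `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


example_mse = mse([1.0, 0.5], [1.0, 1.5])
example_cp = coverage([(0.0, 1.0), (1.0, 2.0)], [0.5, 0.5])
example_hpdi = hpdi([0, 10, 11, 12, 20, 40], mass=0.5)
```

In the setting described above, `hpdi` would be applied to the 40,000 retained posterior samples of $\gamma$ in each replicate, and `mse` and `coverage` to the 500 replicates of each simulation setting.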

3.2. Simulation Results

The proportions of the extreme values (0 or 2) for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ are shown in Table 1. It can be seen from the table that the proportions of the point estimates equal to 0 are the same for both $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$, while the proportion of the point estimates equal to 2 for $\hat{\gamma}_{GPF}$ is reduced. This is because before the truncation, both $\hat{\gamma}^{*} = \tilde{\beta}_c^{(1)} / \tilde{\beta}_c$ and $\hat{\gamma} = \hat{\beta}_c^{(1)} / \hat{\beta}_c$ always have the same sign, and $\hat{\gamma}^{*}$ is bounded. Specifically, when $\hat{\gamma}^{*}$ and $\hat{\gamma}$ are negative, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ are both 0. On the other hand, when $\hat{\gamma}^{*}$ and $\hat{\gamma}$ are positive, compared with $\hat{\gamma}$, the proportion of $\hat{\gamma}^{*}$ being greater than 2 decreases. Further, from Table 1, as the sample size increases or the trait changes from qualitative to quantitative, the proportions of the extreme values for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ both decrease. Next, we consider the effects of the proportion of the rare variants ($\eta$) and the proportion of the SNPs with the positive effects ($\tau$) among all the SNPs on the proportions of the extreme values for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$. Under the situation that the trait is quantitative and $\tau = 0.6$ (i.e., the effect directions of some SNPs are positive and some are negative), the proportions of the extreme values (0 and 2) for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ with $\eta = 0$ (all the SNPs are common variants) are less than those with $\eta = 1$ (all the SNPs are rare variants), irrespective of the sample size ($n$). As for the qualitative trait, when $n = 2000$ and $\tau = 0.6$, the proportion of the extreme values equal to 0 for $\hat{\gamma}_{GPF}$ and the proportions of the extreme values (0 and 2) for $\hat{\gamma}_{GF}$ with $\eta = 0$ are smaller than those with $\eta = 1$, while the proportion of the extreme values equal to 2 for $\hat{\gamma}_{GPF}$ with $\eta = 0$ (12.8%) is larger than that with $\eta = 1$ (10.4%). 
When the trait is qualitative, $n = 500$ and $\tau = 0.6$, the results are similar to those with $n = 2000$, except that the proportion of the extreme values equal to 2 for $\hat{\gamma}_{GF}$ with $\eta = 0$ (20.0%) and that with $\eta = 1$ (19.2%) are very close to each other. In addition, the proportions of the extreme values (0 or 2) for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ have no obvious trends for other cases of different values of $\eta$ and $\tau$.
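To make the role of the truncation concrete, the sketch below clips a raw ratio estimate to $[0,\ 2]$ and tabulates how often the extreme values occur. It is a minimal Python illustration with hypothetical names; the raw ratios are made-up inputs, not simulation results.

```python
def truncate(ratio, lower=0.0, upper=2.0):
    """Clip a raw ratio estimate to the admissible range [lower, upper]."""
    return min(max(ratio, lower), upper)


def extreme_proportions(raw_ratios):
    """Proportions of truncated estimates equal to 0 and equal to 2."""
    truncated = [truncate(r) for r in raw_ratios]
    n = len(truncated)
    at_zero = sum(t == 0.0 for t in truncated) / n
    at_two = sum(t == 2.0 for t in truncated) / n
    return at_zero, at_two


# Illustrative raw ratios only: one negative, one above 2, two inside [0, 2].
props = extreme_proportions([-0.3, 0.7, 2.5, 1.9])
```

Any negative raw ratio maps to 0 and any raw ratio above 2 maps to 2, which is why a penalized ratio that has the same sign as the unpenalized one but is bounded can reduce only the share of estimates equal to 2.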
The MSEs of the four point estimates ( γ ^ G B N , γ ^ G B U , γ ^ G P F and γ ^ G F ) are listed in Table 2. From Table 2, we can see that the MSEs of γ ^ G B N and γ ^ G B U are smaller than those of γ ^ G P F and γ ^ G F , and the MSE of γ ^ G B N is the smallest. When the sample size increases or the trait turns from qualitative to quantitative, the MSEs of these four point estimates decrease significantly. In general, the MSEs of the four point estimates gradually become larger when η changes from 0, 0.4 to 1 (i.e., higher proportion of rare variants) and other parameters are kept unchanged, except for the case when the trait is quantitative, n = 500 and τ = 1 . On the other hand, the MSEs of the four point estimates with τ = 0.6 (i.e., the effect directions of some SNPs are positive and some are negative) are smaller than those with τ = 1 (i.e., all the SNP effects are positive), when other parameters are fixed.
Figure 1, Figure 2 and Figures S1–S6 are the scatter plots of the four point estimates against the true values of γ under different simulation settings. These figures can more intuitively compare the performances of the four point estimates. For example, Figure 1 and Figure 2 are the scatter plots of the four point estimates against the true values of γ for the quantitative trait with n = 500 , and τ = 0.6 and 1, respectively. In each figure, subplots (a)–(d) (four subplots in the first row) are respectively the scatter plots of γ ^ G B N , γ ^ G B U , γ ^ G P F and γ ^ G F with η = 0 ; subplots (e)–(h) (four subplots in the second row) and subplots (i)–(l) (four subplots in the third row) are the corresponding scatter plots with η = 0.4 and 1, respectively. By comparing the four subplots in the same row of each figure, we find that the two point estimates ( γ ^ G B N and γ ^ G B U ) obtained by the Bayesian methods are closer to the true values of γ , and both perform better than γ ^ G P F and γ ^ G F . On the other hand, note that the distribution of the true value of γ is U ( 0 ,   2 ) , and it can be seen from the figures that the distributions of γ ^ G B N and γ ^ G B U are more uniform, while the distributions of γ ^ G P F and γ ^ G F are skewed towards the extreme values (0 and 2). Meanwhile, by respectively comparing subplots (a), (e) and (i) for γ ^ G B N with subplots (b), (f) and (j) for γ ^ G B U , there is a little greater dispersion for γ ^ G B U than γ ^ G B N . In addition, from subplots (c), (g) and (k) for γ ^ G P F and subplots (d), (h) and (l) for γ ^ G F , we observe that there exist many extreme point estimates for γ ^ G P F and γ ^ G F (represented by the blue points). Moreover, the scatter plots for γ ^ G P F and γ ^ G F provide the additional information that most of the extreme point estimates generally occur when the true values of γ are less than 0.5 or greater than 1.5. 
Further, by comparing the subplots in different rows of each figure when $\tau = 0.6$ (Figure 1, Figures S1, S3 and S5), i.e., as $\eta$ changes from 0 to 0.4 to 1, we see that the dispersions of the four point estimates generally increase, indicating that, in general, the MSEs of the four point estimates become larger, which is consistent with the results in Table 2. The numbers of the blue points in subplots (c) and (d) with $\eta = 0$ are much smaller than those in subplots (k) and (l) with $\eta = 1$, respectively. However, for those figures with $\tau = 1$ (Figure 2, Figures S2, S4 and S6), there is no obvious trend for the number of the blue points. Compared to Figure 1 ($\tau = 0.6$), the agreements between the four point estimates and the true values of $\gamma$ in Figure 2 ($\tau = 1$) are worse, which can also be seen in other figures (Figures S1, S3 and S5 vs. Figures S2, S4 and S6, respectively). Observing Figure 2, we find that the four point estimation methods may underestimate $\gamma$ when $\tau = 1$. Finally, these four point estimation methods perform better for the quantitative trait than for the qualitative trait (Figure 1, Figure 2, Figures S1 and S2 vs. Figures S3–S6, respectively), and when the sample size increases (Figures S1, S2, S5 and S6 vs. Figure 1, Figure 2, Figures S3 and S4, respectively).
Table 3 displays the EPs, NPs and DPs of the PF and Fieller’s methods. From Table 3, we observe that the EPs of the PF method are generally smaller than, or equal to, those of the Fieller’s method, except for the quantitative trait with $n = 500$, $\eta = 0.4$ or 1, and $\tau = 1$, and the qualitative trait with $n = 500$ or 2000, $\eta = 0.4$ or 1, and $\tau = 1$. However, the NPs of the PF method are always smaller than, or equal to, those of the Fieller’s method. Note that when we use the PF and Fieller’s methods to calculate the CIs of $\gamma$, we need to truncate the CIs by the interval $[0,\ 2]$. As such, compared to the Fieller’s method, the PF method can get shorter CIs, which means that the PF method reduces the possibility of the truncated CIs being the noninformative intervals. On the other hand, if the CIs before the truncation are disjoint from the interval $[0,\ 2]$, the PF method will increase the possibility that the truncated CIs are empty sets, which is the reason why the PF method may have bigger EPs than the Fieller’s method in some scenarios. In addition, all the DPs of the PF method are equal to 0. This is because we consider the penalty parameter $\lambda = Z_{1-\alpha/2}^{2}/4$, and the CIs derived by the PF method are always continuous. With the increase of the sample size, the NPs of the PF and Fieller’s methods and the DPs of the Fieller’s method become smaller. Moreover, under the same simulation settings, the NPs of both methods, and the DPs of the Fieller’s method, for the quantitative trait are less than those for the qualitative trait. Under the situation that $\tau = 0.6$, when $\eta$ changes from 0 to 0.4 to 1 and other parameters are kept unchanged, the EPs of both methods have no obvious trends, while the NPs of both methods and the DPs of the Fieller’s method generally become larger. 
As for $\tau = 1$, when $\eta$ changes from 0 to 0.4 to 1 and other parameters are fixed, the EPs of the PF method become larger except for the quantitative trait with $n = 2000$, while the DPs of the Fieller’s method are relatively stable, and the NPs of the PF and Fieller’s methods show a trend of first increasing and then decreasing on most occasions. On the other hand, when other parameters are fixed, the EPs and NPs of the PF and Fieller’s methods with $\tau = 0.6$ are smaller than those with $\tau = 1$ in most cases, and the DPs of the Fieller’s method with $\tau = 0.6$ are larger than or equal to those with $\tau = 1$.
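The three failure modes counted by EP, NP and DP can be illustrated with a short Python sketch. It assumes that a confidence set before truncation is represented as a list of `(lo, hi)` pieces (one piece for a bounded interval, two for a set made of two rays); this representation, the function names and the handling of single-point intersections are assumptions made for the example.

```python
import math


def truncate_pieces(pieces, lower=0.0, upper=2.0):
    """Intersect each (lo, hi) piece with [lower, upper]; drop empty ones."""
    kept = []
    for lo, hi in pieces:
        lo, hi = max(lo, lower), min(hi, upper)
        if lo <= hi:  # assumption: a single-point intersection is kept
            kept.append((lo, hi))
    return kept


def classify(pieces):
    """Label a truncated confidence set as in the EP/NP/DP tabulation."""
    kept = truncate_pieces(pieces)
    if not kept:
        return "empty"            # counted in EP
    if len(kept) > 1:
        return "discontinuous"    # counted in DP
    if kept[0] == (0.0, 2.0):
        return "noninformative"   # counted in NP
    return "informative"


labels = [
    classify([(-0.5, 2.7)]),                         # covers all of [0, 2]
    classify([(2.3, 4.0)]),                          # disjoint from [0, 2]
    classify([(-math.inf, 0.4), (1.6, math.inf)]),   # two rays
    classify([(0.3, 1.2)]),                          # ordinary interval
]
```

A shorter interval before truncation is less likely to cover all of $[0,\ 2]$ but, when it lies outside the range, more likely to miss it entirely, which matches the trade-off between NPs and EPs described above.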
The CPs, W m e a n and W m e d i a n of the GBN, GBU, PF and Fieller’s methods are displayed in Table 4, and the corresponding W s d and W i q r are given in Table 5. Table 4 demonstrates that, for the quantitative trait, the CPs of the GBN, GBU and Fieller’s methods are controlled around 95%. However, when n = 500 , η = 1 and τ = 1 , the CP of the PF method is underestimated (87.8%). As the sample size increases to 2000 and other parameters remain unchanged, the CP of the PF method is 96.6%. For the qualitative trait, when n = 500 , the CPs of the GBN, GBU and PF methods are underestimated in most situations. With the increase of the sample size to 2000, the CPs of these three methods generally increase to be around 95%, but the CPs when η = 1 and τ = 1 are still underestimated. Thus, for this simulation setting, we conduct an additional simulation study with larger sample sizes (3000 and 4000), and the corresponding results are presented in Table S1. It is shown in Table S1 that the CPs of these three methods are closer to 95% when the sample size continues to increase. This is explainable by the fact that qualitative traits generally require larger samples to achieve the same CPs than quantitative traits. In addition, we can see from Table 4 that the Fieller’s method has higher CPs under various simulation settings for the qualitative trait. However, according to Table 3, when the trait is qualitative, the NPs of the Fieller’s method are relatively high, which means that many CIs obtained by the Fieller’s method are the noninformative intervals (i.e., [ 0 ,   2 ] ). This may explain why the CPs of the Fieller’s method are on the high side. Further, from Table 4 and Table 5, the W m e a n , W m e d i a n , W s d and W i q r of the GBN and GBU methods are smaller than those of the PF and Fieller’s methods in most situations. 
The GBN method has the smallest W m e a n , W m e d i a n and W i q r in most cases, and it also has the smallest W s d under all the simulated settings. As can be seen from Table 4, when the trait is qualitative and n = 500 , the W m e d i a n ’s of the Fieller’s method are all 2, which indicates that in this case, more than half of the CIs based on the Fieller’s method are the noninformative intervals. This is consistent with the results of the NPs in Table 3. When the sample size increases, or the trait turns from qualitative into quantitative, the W m e a n ’s and W m e d i a n ’s of the four interval estimation methods greatly decrease. However, for the W s d and W i q r , there are different trends in some situations. For example, when the trait is qualitative, the W s d ’s and W i q r ’s of the four methods become larger in most cases as the sample size increases. Note that the widths of the intervals obtained by the four methods are closer to 2 and the corresponding variation will be smaller when n = 500 . With the sample size increasing, the widths of the intervals gradually decrease and the corresponding variation appears larger, which may cause the bigger W s d and W i q r .
In the case of τ = 0.6 , the four methods have larger W m e a n ’s and W m e d i a n ’s in most cases when η changes from 0, 0.4 to 1, while for the scenario of τ = 1 , the four methods show a trend of first increasing and then decreasing on most occasions, except that the W m e a n ’s and W m e d i a n ’s of the Fieller’s method are gradually larger for the qualitative trait. When the trait is quantitative and τ = 0.6 , the W s d ’s and W i q r ’s of the four methods become larger with η increasing from 0, 0.4 to 1, irrespective of the sample size. When the trait is qualitative, n = 500 and τ = 0.6 , as η is bigger, the W s d ’s and W i q r ’s of the four methods generally are smaller, while when n = 2000 , the W s d ’s of the four methods and the W i q r ’s of the GBN and GBU methods are relatively stable, and the W i q r ’s of the PF and Fieller’s methods generally become larger. For the quantitative trait with n = 500 and τ = 1 , with the increase of η , the W s d ’s and W i q r ’s of the GBN, GBU and Fieller’s methods appear smaller and those of the PF method are larger in most situations, while in the case of n = 2000 , the four methods usually have larger W s d ’s and W i q r ’s. When the trait is qualitative and τ = 1 , with η increasing, the W s d ’s and W i q r ’s of the GBN and GBU methods present a tendency of first decreasing and then increasing on most occasions, while those of the PF method are larger in most cases, and those of the Fieller’s method become smaller, irrespective of the sample size. On the other hand, when other parameters are fixed, the W m e a n ’s and W m e d i a n ’s of the four methods with τ = 0.6 are smaller than those with τ = 1 , except for the W m e a n ’s of the GBN, GBU and PF methods and the W m e d i a n ’s of the GBN and GBU methods for the quantitative trait with n = 500 and η = 1 , and those for the qualitative trait with n = 500 or 2000 , and η = 1 . 
Under the scenarios where η is kept unchanged, the W s d ’s and W i q r ’s of the GBN, GBU and Fieller’s methods with τ = 0.6 are generally larger than those with τ = 1 for the quantitative trait with n = 500 , and the qualitative trait with n = 500 or 2000 , while there are different trends for the quantitative trait with n = 2000 . In addition, the W s d ’s and W i q r ’s of the PF method with τ = 0.6 generally are smaller than those with τ = 1 , when other parameters are fixed.
Figure 3, Figure 4 and Figures S7–S12 are the scatter plots of the widths of the 95% HPDIs or CIs obtained by the four interval estimation methods (GBN, GBU, PF and Fieller) against the true values of γ under different simulation settings. We can clearly observe the distributions of the widths of the HPDIs or CIs through these figures. For example, Figure 3 and Figure 4 are the scatter plots of the widths of the HPDIs or CIs against the true values of γ for the quantitative trait with n = 500 , and τ = 0.6 and 1, respectively. In each figure, subplots (a)–(d) (four subplots in the first row) are respectively the scatter plots for the GBN, GBU, PF and Fieller’s methods with η = 0 ; subplots (e)–(h) (four subplots in the second row) and subplots (i)-(l) (four subplots in the third row) are the corresponding scatter plots with η = 0.4 and 1, respectively. It can be seen from the four subplots in the same row of each figure that the distributions of the widths of the HPDIs for the GBN and GBU methods are similar, and both have smaller dispersions than those of the CIs for the PF and Fieller’s methods. Furthermore, these figures display that the distributions of the interval widths for the PF and Fieller’s methods are greatly more skewed towards 2 than the GBN and GBU methods. We respectively compare subplots (a), (e) and (i) for the GBN method with subplots (b), (f) and (j) for the GBU method and find that the dispersions of the widths of the HPDIs for the GBN method are slightly smaller than the GBU method. Additionally, subplots (c), (g) and (k) for the PF method, and subplots (d), (h) and (l) for the Fieller’s method, show that the PF and Fieller’s methods may yield empty sets or noninformative intervals (displayed by the blue points), and the Fieller’s method may also get discontinuous intervals (shown by the orange points). 
By comparing the subplots in different rows of each figure (Figure 3 and Figure S7) when the trait is quantitative and $\tau = 0.6$, we find that the dispersions of the widths of the HPDIs or CIs become slightly larger as $\eta$ changes from 0 to 0.4 to 1, and it can also be seen from Figure 3 that the distributions of the interval widths are a little more skewed towards 2. On the other hand, when the trait is qualitative with $\tau = 0.6$ (Figures S9 and S11), there are no obvious trends in the dispersions of the interval widths, except that their distributions are more skewed towards 2. However, under the situation that $\tau = 1$ (Figure 4, Figures S8, S10 and S12), the points in these figures become less dispersed in most cases when $\eta$ increases, and the overall widths of the four interval estimation methods also somewhat decrease, except for the scenarios where the trait is quantitative and $n = 2000$, and the trait is qualitative and $n = 500$. Further, by comparing the figures for different values of $\tau$ (Figure 3, Figures S7, S9 and S11 vs. Figure 4, Figures S8, S10 and S12, respectively), it can be found that the overall widths of the HPDIs or the CIs obtained by the four interval estimation methods with $\tau = 0.6$ are generally smaller than those with $\tau = 1$, except for those with $\eta = 1$. Lastly, as the trait turns from qualitative into quantitative (Figures S9–S12 vs. Figure 3, Figure 4, Figures S7 and S8, respectively) or the sample size increases (Figure 3, Figure 4, Figures S9 and S10 vs. Figures S7, S8, S11 and S12, respectively), the performances of the four interval estimation methods are greatly improved.

3.3. Application to MCTFR Data

The MCTFR Genome-Wide Association Study of Behavioral Disinhibition is a family-based epidemiological study of substance abuse and related psychopathology. The dataset can be made available from the database of Genotypes and Phenotypes with accession numbers 86747-6 and 95621-5 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1, accessed on 5 January 2022). The dataset includes 2183 families and 7377 participants (3831 female subjects and 3546 male subjects). Among them, only 5960 subjects have both the phenotypic and genotypic data, while the others lack either the phenotypic data or the genotypic data. There are five quantitative traits included in this dataset: the nicotine composite score, the alcohol consumption composite score, the alcohol dependence composite score, the illicit drug composite score and the non-substance use related behavioral disinhibition composite score. To avoid the influence of family structure on the results, we exclude offspring from the real data application. At the same time, we only need the information of female subjects, so we also exclude male subjects from the analysis. Meanwhile, 12,354 SNPs on the X chromosome are included in the dataset. We use the following quality control exclusion criteria [48,49]: (1) genotype call rate being less than 99%, (2) MAF being smaller than $1 \times 10^{-5}$, (3) individual call rate being below 99%, and (4) the p value of the Hardy–Weinberg equilibrium test being less than $1 \times 10^{-6}$. Finally, 1994 female subjects and 12,342 SNPs on the X chromosome are utilized to conduct real data analysis. Since we estimate the degree of the skewness of the XCI based on genes, we first need to find the genes which each SNP belongs to. 
Based on the GRCh38 (Genome Reference Consortium Human Genome Build 38, https://uswest.ensembl.org/, accessed on 25 February 2022) reference, we use the “getBM” function in the R package “biomaRt” to match the SNPs to the genes on the X chromosome [45]. As such, we find 733 matched genes, some of which contain only a single SNP in the dataset. As there have been several methods available to estimate the degree of the skewness of the XCI for a single SNP, we exclude genes consisting of only one SNP. Therefore, only 493 genes are included in the subsequent analysis.
Note that estimating $\gamma$ requires the genes on the X chromosome to be associated with the traits. So, we need to test if the associations between the genes and the traits exist before using our proposed methods to estimate the degree of skewness of the XCI. Notice that the five traits in the MCTFR dataset do not follow normal distributions; therefore, we use the rank-based inverse normal transformation to transform the trait data [50]. Further, to adjust for the effects of other variables, we incorporate two covariates, age and year of birth, into the application [48]. Because we only use female subjects, the adaptive sum test proposed by Iuliana et al. [35] is still applicable, and we use it to test for the association between each gene and each trait. Unlike other multi-locus association analysis methods, when there are both rare and common variants in a gene, the adaptive sum test still maintains high test power. We set the significance level to be $\alpha = 0.05 / (5 \times 493) = 2.03 \times 10^{-5}$ based on the Bonferroni correction. After identifying the genes associated with the traits, we calculate the four point estimates of $\gamma$ ($\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$), and then use the GBN, GBU, PF and Fieller’s methods to obtain the corresponding HPDIs or CIs.
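Two of the preprocessing steps above are easy to check numerically: the Bonferroni threshold and the rank-based inverse normal transformation. The Python sketch below is illustrative only. The offset `c = 3/8` (the Blom variant) and the average-rank treatment of ties are assumptions, since this section does not state which variant of the transformation was used.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, n_traits, n_genes):
    """Per-test significance level for n_traits * n_genes tests."""
    return alpha / (n_traits * n_genes)


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform with average ranks for ties.

    Assumption: Blom offset c = 3/8; other offsets are also in common use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


alpha_star = bonferroni_alpha(0.05, n_traits=5, n_genes=493)
z = rank_inverse_normal([3.0, 1.0, 2.0])
```

Here `alpha_star` reproduces the threshold quoted in the text when rounded to three significant figures, and `z` shows that the middle observation of three is mapped to zero while the smallest and largest are mapped to symmetric values.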
We finally identify only one gene, TMEM47, statistically significantly associated with the alcohol dependence composite score (p value = $2.32 \times 10^{-6}$). There are two SNPs (rs10522027 and rs5928615) included in the gene. The estimated MAFs of these two SNPs are 0.1407 and 0.0998, respectively, which means that both SNPs only contain common variants. It has been confirmed that TMEM47 is located in the NC_000023.11 region and includes three exons. Studies have shown that the gene is expressed in the bladder, adipose and 23 other tissues and found that the overexpression of TMEM47 may induce resistance in patients to certain chemotherapy drugs [51,52]. The four point estimates ($\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$) of $\gamma$ for the gene are 0.4703, 0.4547, 0.4816 and 0.4847, and the 95% HPDIs or CIs derived by the GBN, GBU, PF and Fieller’s methods are $(0.0023,\ 1.2380)$, $(0.0337,\ 1.3083)$, $(0.0562,\ 1.2410)$ and $(0.0557,\ 1.3896)$, respectively. That is to say, the point estimates are all less than 0.5, while the 95% HPDIs or CIs all contain 1, which means that the XCI pattern for TMEM47 on the alcohol dependence composite score may be the XCI-R or the XCI-E. By comparing the interval widths of these four interval estimation methods, we find that the width of the CI obtained by the PF method is the shortest, followed by the HPDI obtained by the GBN method, and the longest is the CI yielded by the Fieller’s method.
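The width comparison for TMEM47 can be reproduced directly from the four reported intervals. The short Python sketch below uses only the numbers quoted in the paragraph above; the variable and function names are chosen for this example.

```python
# 95% HPDIs (GBN, GBU) and CIs (PF, Fieller) for TMEM47, as reported above.
intervals = {
    "GBN": (0.0023, 1.2380),
    "GBU": (0.0337, 1.3083),
    "PF": (0.0562, 1.2410),
    "Fieller": (0.0557, 1.3896),
}


def width(interval):
    """Length of a single continuous interval (lo, hi)."""
    lo, hi = interval
    return hi - lo


def contains(interval, value):
    """True when the value lies inside the closed interval."""
    lo, hi = interval
    return lo <= value <= hi


widths = {name: round(width(iv), 4) for name, iv in intervals.items()}
ranking = sorted(widths, key=widths.get)          # shortest to longest
all_contain_one = all(contains(iv, 1.0) for iv in intervals.values())
```

The resulting widths are 1.1848 (PF), 1.2357 (GBN), 1.2746 (GBU) and 1.3339 (Fieller), and every interval contains 1, in agreement with the ordering and the interpretation given in the text.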

4. Discussion

In this paper, we propose four point estimates ($\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$) and four interval estimation methods (GBN, GBU, PF and Fieller) of the degree of the skewness of the XCI for a gene (i.e., $\gamma$). Among the point estimates, $\hat{\gamma}_{GF}$ is constructed by truncating the ratio of the two regression coefficients by the interval $[0,\ 2]$. The estimate $\hat{\gamma}_{GPF}$ is obtained by choosing the penalty parameter $\lambda = Z_{1-\alpha/2}^{2}/4$ and respectively correcting the denominator and the numerator, and it is also truncated by $[0,\ 2]$. Both $\hat{\gamma}_{GBN}$ and $\hat{\gamma}_{GBU}$ are developed, based on the Bayesian theory, by considering the prior information of $\gamma \in [0,\ 2]$, and the corresponding prior distributions of $\gamma$ are respectively a truncated normal distribution and a uniform distribution. Using $\hat{\gamma}_{GBN}$ and $\hat{\gamma}_{GBU}$ avoids the extreme point estimates of $\gamma$ (0 or 2). Among the interval estimation methods, the Fieller’s method has been widely used to construct the CIs of a ratio estimate. The PF method can always get the bounded CIs by choosing an appropriate penalty parameter. The GBN and GBU methods calculate the HPDIs of the samples randomly drawn from the approximate posterior distributions of $\gamma$ as the credible intervals, which avoids empty sets, noninformative intervals (i.e., $[0,\ 2]$) and discontinuous intervals. We conducted extensive simulation studies to compare their performances, by simulating different types of traits (quantitative and qualitative), different sample sizes ($n = 500$ and 2000), different proportions of rare variants among all the SNPs considered ($\eta = 0$, 0.4 and 1), and different proportions of the SNPs with positive effects among all the SNPs considered ($\tau = 0.6$ and 1). The simulation results showed that there may exist some extreme point estimates for $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$, especially when the sample size is small or the proportion of rare variants is high. 
The least MSE, in most situations, is derived from γ ^ G B N , and the MSEs of γ ^ G B N and γ ^ G B U are smaller than those of γ ^ G P F and γ ^ G F . As for the interval estimation, the CIs derived by the Fieller’s method may be empty sets, noninformative intervals and discontinuous intervals. Although the PF method can avoid discontinuous intervals, the resulting CIs can be empty sets and noninformative intervals. In addition, most of the CPs of the GBN and GBU methods can be controlled around 95%, and a larger sample size is required only when the trait is qualitative and all the SNPs are rare variants. For qualitative traits, the CPs of the PF method appear a little low when the sample size is relatively small. However, the CPs of the Fieller’s method seem to be well controlled, which is due to the large proportion of noninformative intervals. The GBN method has the smallest W m e a n , W m e d i a n and W i q r in most situations, and the least W s d under all the simulation settings. Therefore, we recommend using γ ^ G B N and the GBN method to estimate the degree of the XCI skewing in practical applications.
On the other hand, concerning the simulation settings and the simulation results, we further discuss the following issues. Firstly, we consider the influence of the proportion of rare variants ( η ) and the proportion of the SNPs with positive effects ( τ ) among all the SNPs in the gene under study on the estimation results. When τ = 0.6 and other parameters are fixed, the proportions of the extreme values (0 and 2) for γ ^ G P F and γ ^ G F with η = 0 are generally less than those with η = 1 , while they have no obvious trends for other cases of different values of η and τ . In general, the MSEs of the four point estimates generally become larger as η changes from 0, 0.4 to 1 and other parameters are kept unchanged. The four point estimates with τ = 0.6 always have smaller MSEs than τ = 1 . The changing trends of the EPs, NPs and DPs of the PF and Fieller’s methods with the increase of η are related to τ . Furthermore, the EPs and NPs of the PF and Fieller’s methods with τ = 0.6 generally are smaller than τ = 1 , while the DPs of the Fieller’s method with τ = 0.6 are larger than or equal to those with τ = 1 . On the other hand, in the case of τ = 0.6 , the four interval estimation methods have larger W m e a n ’s and W m e d i a n ’s in most cases with η changing from 0, 0.4 to 1, while for the scenario of τ = 1 , those of the four methods show a trend of first increasing and then decreasing on most occasions. The changing tendencies of the W s d ’s and W i q r ’s of the four methods, with η increasing, are affected by the trait type, n and τ . When other parameters are kept unchanged, the W m e a n ’s and W m e d i a n ’s of the four methods with τ = 0.6 are smaller than those with τ = 1 in most cases. 
Besides this, the results of comparing the $W_{sd}$’s and $W_{iqr}$’s of the GBN, GBU and Fieller’s methods for $\tau = 0.6$ with those for $\tau = 1$ depend on the trait type and $n$, while the $W_{sd}$’s and $W_{iqr}$’s of the PF method with $\tau = 0.6$ are generally smaller than those with $\tau = 1$. Secondly, to better evaluate the performances of the proposed methods, we set the degrees of the XCI skewing at all the SNPs in the gene to be the same in our simulation studies. For example, when we calculate the MSEs of the point estimates and the CPs of the HPDIs or the CIs, a single true value of $\gamma$ for each replicate is required. However, note that there may be different degrees of the XCI skewing at different SNPs, and, actually, we can also consider this issue in our simulation studies, although we have no appropriate evaluation indexes to assess the performances of the proposed methods for this situation. Finally, when we simulate quantitative traits, the random error $\varepsilon_i$ is generated from the standard normal distribution, where the standard deviation ($\sigma$) is equal to 1. To further illustrate the effect of different values of $\sigma$ on the estimation results, we conducted additional simulation studies with $n = 2000$, assuming that $\varepsilon_i$ follows $N(0,\ 4)$, i.e., $\sigma = 2$. The corresponding results are presented in Tables S2–S4 and Figures S13–S16. As can be seen from these tables and figures, the Bayesian methods still have obvious advantages in both the point estimation and the interval estimation. Further, the four point estimation methods and the four interval estimation methods perform worse with $\sigma = 2$ than with $\sigma = 1$.
We applied the proposed methods to the MCTFR data and identified a gene, TMEM47, which is statistically significantly associated with the alcohol dependence composite score. However, although the four point estimates of γ for the gene TMEM47 on the alcohol dependence composite score are all smaller than 0.5, the corresponding 95% HPDIs or CIs all contain 1, which means that the XCI pattern for this gene may not be the XCI-S. Further, we observed that the width of the CI obtained by the PF method is the shortest, followed by the HPDI obtained by the GBN method, and the longest was the CI yielded by the Fieller’s method. However, it should be noted that the PF method may not control the CP well (e.g., Table S3).
Last, but not least, there are still some issues in our proposed methods which need to be discussed. Firstly, we would like to further discuss the effect of the truncation by the interval $[0,\ 2]$ on the point estimation and the interval estimation of $\gamma$. When we use $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ to estimate $\gamma$, both of them are truncated by $[0,\ 2]$. If the point estimates before the truncation ($\hat{\gamma}^{*}$ and $\hat{\gamma}$) lie outside $[0,\ 2]$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ become the extreme values (0 or 2). Correspondingly, when using the PF and Fieller’s methods to construct the CIs of $\gamma$, it is easy to obtain empty sets or noninformative intervals. On the contrary, the Bayesian methods can avoid extreme point estimates, empty sets and noninformative intervals by specifying the appropriate prior distributions of $\gamma$ and making full use of the constraint condition of $\gamma \in [0,\ 2]$. In addition, the extreme point estimate of 0 (2) means that the XCI is completely skewed towards the minor alleles (major alleles) at all the SNPs in a gene. However, these phenomena are not common in practice [2]. Meanwhile, it should be noted that empty sets and noninformative intervals are not informative, and the discontinuous CIs are also not useful, because the discontinuous CIs cannot be clearly explained in practice. Secondly, since the XCI patterns at different SNPs may be different, our estimated $\hat{\gamma}$ is just the mean degree of the skewness of the XCI over all the SNPs in the gene under study, and we cannot obtain the degree of the skewness of the XCI for each SNP in this gene. Meanwhile, in the process of estimating $\gamma$, the target allele is the minor one at each SNP, and it is not possible to distinguish the disease allele from the normal allele at each SNP. Therefore, we can only identify whether or not the XCI of the gene is skewed towards the minor alleles, but it is not possible to know whether the XCI is skewed towards the disease alleles or the normal alleles. 
Thirdly, the proposed Bayesian methods need to specify the prior distributions of all the unknown parameters in advance, and the selection of the prior distributions may have a certain impact on the results. For simplicity, we only considered two possible prior distributions for $\gamma$, and one prior distribution for each of the other unknown parameters. However, the prior distributions of these parameters are usually unknown, and we cannot guarantee that the weak prior distributions we used are the most appropriate. We provide an R package named GEXCIS, which is publicly available at https://github.com/Meng-KaiLi/GEXCIS (accessed on 30 April 2022), and can be used to estimate the degree of the skewness of the XCI for genes through the methods proposed in this paper. This R package also allows researchers to specify the prior distribution of each unknown parameter according to their own research backgrounds. Fourthly, the Bayesian methods use the HMC algorithm for the sampling, which is not affected by the correlation between unknown parameters. Therefore, to improve computational efficiency, we assumed that all the unknown parameters are independent. However, Bayesian methods that take the correlation between the parameters into account should perform better, and developing them is our future work. Fifthly, if the HPDIs or CIs we obtain contain 1, which means that the XCI pattern for the gene is either XCI-R or XCI-E, our proposed methods cannot distinguish between the two. Therefore, in our future study, we will consider including males' information to distinguish XCI-R from XCI-E. Finally, the proposed methods are only applicable to independent female subjects, and we will extend them in the future so that they can accommodate family data.
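How an interval that contains 1 is read can be sketched in Python with the usual sample-based HPDI, taken here as the shortest window that covers a given share of the sorted posterior draws. The helper names `hpdi` and `contains_one` and the synthetic draws are hypothetical. In practice the draws would come from an HMC sampler, and this sketch does not reproduce the GEXCIS package.

```python
import math


def hpdi(samples, mass=0.95):
    """Shortest interval covering a share `mass` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    # Number of draws the interval must cover; the small tolerance guards
    # against floating-point noise in mass * n.
    k = min(n, max(1, math.ceil(mass * n - 1e-9)))
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def contains_one(interval):
    """True when the interval covers gamma = 1, so XCI-S cannot be claimed."""
    lower, upper = interval
    return lower <= 1.0 <= upper


# Synthetic, right-skewed draws on [0, 2] standing in for posterior samples.
draws = [2.0 * (i / 100.0) ** 2 for i in range(100)]
interval = hpdi(draws)
print(interval, contains_one(interval))
```

Because the draws are confined to $[0, 2]$, the resulting interval cannot leave that range, which is the property the Bayesian methods rely on.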

5. Conclusions

We propose four point estimates and four interval estimation methods to estimate $\gamma$ of genes. Among the four point estimates, $\hat{\gamma}_{GF}$ may take the extreme values, and $\hat{\gamma}_{GPF}$ can only reduce the occurrence of the extreme point estimates equal to 2, while $\hat{\gamma}_{GBN}$ and $\hat{\gamma}_{GBU}$ can avoid the extreme point estimates. As for the four interval estimation methods, Fieller's method may yield empty sets, discontinuous intervals and noninformative intervals, and the PF method can avoid discontinuous intervals and yield fewer noninformative intervals, while the GBN and GBU methods do not yield any of these three types of intervals. However, it should be noted that through these proposed methods, we cannot obtain the degree of the skewness of the XCI for each SNP in the gene, and cannot know whether the XCI is skewed towards the disease alleles or the normal alleles. In summary, the point estimates obtained by the GBN method always have the smallest MSE, and the HPDIs of the GBN method generally have the shortest width and the lowest variation, so we recommend using the GBN method in practical applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13050827/s1, Table S1: Results of point estimations and interval estimations for $\gamma$ among 500 replications with $n = 3000$ and 4000, $\eta = 1$, $\tau = 1$ and $\sigma = 1$ for qualitative trait; Table S2: MSEs of $\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ among 500 replications with $n = 2000$ and $\sigma = 2$ for quantitative trait; Table S3: CPs (%), $W_{mean}$ and $W_{median}$ of GBN, GBU, PF and Fieller's methods among 500 replications with $n = 2000$ and $\sigma = 2$ for quantitative trait; Table S4: $W_{sd}$ and $W_{iqr}$ of GBN, GBU, PF and Fieller's methods among 500 replications with $n = 2000$ and $\sigma = 2$ for quantitative trait; Figures S1–S6: Scatter plots of point estimates of $\gamma$ against true values of $\gamma$ for quantitative ($\sigma = 1$) or qualitative trait with $n = 500$ and 2000, and $\tau = 0.6$ and 1; Figures S7–S12: Widths of HPDIs or CIs of GBN, GBU, PF and Fieller's methods against true values of $\gamma$ for quantitative ($\sigma = 1$) or qualitative trait with $n = 500$ and 2000, and $\tau = 0.6$ and 1; Figures S13 and S14: Scatter plots of point estimates of $\gamma$ against true values of $\gamma$ for quantitative trait with $n = 2000$, $\tau = 0.6$ and 1, and $\sigma = 2$; Figures S15 and S16: Widths of HPDIs or CIs of GBN, GBU, PF and Fieller's methods against true values of $\gamma$ for quantitative trait with $n = 2000$, $\tau = 0.6$ and 1, and $\sigma = 2$.

Author Contributions

Conceptualization, J.-Y.Z.; methodology, M.-K.L. and Y.-X.Y.; software, M.-K.L., Y.-X.Y. and J.-Y.Z.; validation, M.-K.L., Y.-X.Y., B.Z. and K.-W.W.; writing—original draft preparation, M.-K.L. and Y.-X.Y.; writing—review and editing, B.Z., K.-W.W., W.K.F. and J.-Y.Z.; supervision, W.K.F. and J.-Y.Z.; project administration, J.-Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 82173619 and 81773544, the Science and Technology Planning Project of Guangdong Province, grant number 2020B1212030008, and the Hong Kong Research Grants Council, grant number 17302919.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1 (accessed on 5 January 2022).

Acknowledgments

The Minnesota Center for Twin and Family Research (MCTFR) was supported by the National Institute on Drug Abuse, grant number U01 DA024417. The sample ascertainment and data collection for the MCTFR data were supported by the National Institute on Drug Abuse, grant numbers R37 DA05147 and R01 DA13240, the National Institute on Alcohol Abuse and Alcoholism, grant numbers R01 AA09367 and R01 AA11886, and the National Institute of Mental Health, grant number R01 MH66140.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We assume that $\gamma$ is the mean degree of the skewness of the XCI for the gene under study. For the $i$th female, we have $X_i = \sum_{j=1}^{J} \omega_j [\gamma g_{ij}^{(1)} + (2 - \gamma) g_{ij}^{(2)}]$. On the other hand, when supposing that the degree of the skewness of the XCI at the $j$th SNP is $\gamma_j$, the genotypic values of genotypes $d_j d_j$, $D_j d_j$ and $D_j D_j$ at the $j$th SNP of the $i$th female are 0, $\gamma_j$ and 2, respectively. Similar to the construction process of $X_i$, we can get $X_i^{*} = \sum_{j=1}^{J} \omega_j [\gamma_j g_{ij}^{(1)} + (2 - \gamma_j) g_{ij}^{(2)}]$. Under the assumption of $\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^{*}$, we have
$$\sum_{i=1}^{n} \sum_{j=1}^{J} \omega_j [\gamma g_{ij}^{(1)} + (2 - \gamma) g_{ij}^{(2)}] = \sum_{i=1}^{n} \sum_{j=1}^{J} \omega_j [\gamma_j g_{ij}^{(1)} + (2 - \gamma_j) g_{ij}^{(2)}],$$
and
$$\gamma \sum_{i=1}^{n} \sum_{j=1}^{J} \omega_j (g_{ij}^{(1)} - g_{ij}^{(2)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \omega_j (g_{ij}^{(1)} - g_{ij}^{(2)}) \gamma_j.$$
Then,
$$\gamma \sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}) = \sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}) \gamma_j,$$
where $g_{\cdot j}^{(1)} = \sum_{i=1}^{n} g_{ij}^{(1)}$ and $g_{\cdot j}^{(2)} = \sum_{i=1}^{n} g_{ij}^{(2)}$. Finally, we have
$$\gamma = \frac{\sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)}) \gamma_j}{\sum_{j=1}^{J} \omega_j (g_{\cdot j}^{(1)} - g_{\cdot j}^{(2)})}.$$
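The identity above can be checked numerically with a small Python sketch. The toy genotypes, weights and SNP-level values below are made up, and the function names `gene_level_gamma` and `burden` are hypothetical. The indicator coding used here ($g_{ij}^{(1)} = 1$ when the genotype carries at least one copy of $D_j$, $g_{ij}^{(2)} = 1$ when it carries two) is an assumption, chosen because it reproduces the genotypic values 0, $\gamma_j$ and 2 stated above.

```python
def gene_level_gamma(weights, g1_totals, g2_totals, gammas):
    """Weighted mean of SNP-level gamma_j, as in the last equation above."""
    num = 0.0
    den = 0.0
    for w, t1, t2, g in zip(weights, g1_totals, g2_totals, gammas):
        num += w * (t1 - t2) * g
        den += w * (t1 - t2)
    if den == 0.0:
        raise ValueError("denominator is zero; gamma is not defined")
    return num / den


def burden(g1_row, g2_row, weights, gammas):
    """Weighted score of one female: sum_j w_j [gamma_j g1 + (2 - gamma_j) g2]."""
    return sum(w * (g * a + (2.0 - g) * b)
               for w, a, b, g in zip(weights, g1_row, g2_row, gammas))


# Toy data: 30 females, 4 SNPs, genotype = number of copies of allele D.
n, J = 30, 4
genotypes = [[(i + j) % 3 for j in range(J)] for i in range(n)]
g1 = [[1 if x >= 1 else 0 for x in row] for row in genotypes]  # ASSUMED coding
g2 = [[1 if x == 2 else 0 for x in row] for row in genotypes]  # ASSUMED coding
weights = [1.0, 0.5, 2.0, 1.5]   # hypothetical SNP weights
gammas = [0.2, 0.8, 1.4, 1.9]    # hypothetical SNP-level degrees of skewness

g1_totals = [sum(row[j] for row in g1) for j in range(J)]
g2_totals = [sum(row[j] for row in g2) for j in range(J)]
gamma = gene_level_gamma(weights, g1_totals, g2_totals, gammas)

# Sum of X_i (common gamma) and sum of X_i* (SNP-specific gamma_j).
total_common = sum(burden(a, b, weights, [gamma] * J) for a, b in zip(g1, g2))
total_snp = sum(burden(a, b, weights, gammas) for a, b in zip(g1, g2))
print(gamma, total_common, total_snp)
```

With this coding the two totals agree up to rounding, which is exactly the assumption $\sum_{i} X_i = \sum_{i} X_i^{*}$ from which the weighted mean is derived.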

References

  1. Lyon, M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 1961, 190, 372–373.
  2. Amos-Landgraf, J.M.; Cottle, A.; Plenge, R.M.; Friez, M.; Schwartz, C.E.; Longshore, J.; Willard, H.F. X chromosome-inactivation patterns of 1,005 phenotypically unaffected females. Am. J. Hum. Genet. 2006, 79, 493–499.
  3. Plenge, R.M.; Stevenson, R.A.; Lubs, H.A.; Schwartz, C.E.; Willard, H.F. Skewed X-chromosome inactivation is a common feature of X-linked mental retardation disorders. Am. J. Hum. Genet. 2002, 71, 168–173.
  4. Shvetsova, E.; Sofronova, A.; Monajemi, R.; Gagalova, K.; Draisma, H.; White, S.J.; Santen, G.; Chuva de Sousa Lopes, S.M.; Heijmans, B.T.; van Meurs, J.; et al. Skewed X-inactivation is common in the general female population. Eur. J. Hum. Genet. 2019, 27, 455–465.
  5. Medema, R.H.; Burgering, B.M. The X factor: Skewing X inactivation towards cancer. Cell 2007, 129, 1253–1254.
  6. Deng, X.; Berletch, J.B.; Nguyen, D.K.; Disteche, C.M. X chromosome regulation: Diverse patterns in development, tissues and disease. Nat. Rev. Genet. 2014, 15, 367–378.
  7. Posynick, B.J.; Brown, C.J. Escape from X-chromosome inactivation: An evolutionary perspective. Front. Cell Dev. Biol. 2019, 7, 241.
  8. Peeters, S.B.; Cotton, A.M.; Brown, C.J. Variable escape from X-chromosome inactivation: Identifying factors that tip the scales towards expression. Bioessays 2014, 36, 746–756.
  9. Minks, J.; Robinson, W.P.; Brown, C.J. A skewed view of X chromosome inactivation. J. Clin. Invest. 2008, 118, 20–23.
  10. Chabchoub, G.; Uz, E.; Maalej, A.; Mustafa, C.A.; Rebai, A.; Mnif, M.; Bahloul, Z.; Farid, N.R.; Ozcelik, T.; Ayadi, H. Analysis of skewed X-chromosome inactivation in females with rheumatoid arthritis and autoimmune thyroid diseases. Arthritis Res. Ther. 2009, 11, R106.
  11. Sun, Z.; Fan, J.; Wang, Y. X-chromosome inactivation and related diseases. Genet. Res. 2022, 2022, 1391807.
  12. Okumura, K.; Fujimori, Y.; Takagi, A.; Murate, T.; Ozeki, M.; Yamamoto, K.; Katsumi, A.; Matsushita, T.; Naoe, T.; Kojima, T. Skewed X chromosome inactivation in fraternal female twins results in moderately severe and mild haemophilia B. Haemophilia 2008, 14, 1088–1093.
  13. Garagiola, I.; Mortarino, M.; Siboni, S.M.; Boscarino, M.; Mancuso, M.E.; Biganzoli, M.; Santagostino, E.; Peyvandi, F. X chromosome inactivation: A modifier of factor VIII and IX plasma levels and bleeding phenotype in Haemophilia carriers. Eur. J. Hum. Genet. 2021, 29, 241–249.
  14. Zuo, T.; Wang, L.; Morrison, C.; Chang, X.; Zhang, H.; Li, W.; Liu, Y.; Wang, Y.; Liu, X.; Chan, M.; et al. FOXP3 is an X-linked breast cancer suppressor gene and an important repressor of the HER-2/ErbB2 oncogene. Cell 2007, 129, 1275–1286.
  15. Li, G.; Jin, T.; Liang, H.; Tu, Y.; Zhang, W.; Gong, L.; Su, Q.; Gao, G. Skewed X-chromosome inactivation in patients with esophageal carcinoma. Diagn. Pathol. 2013, 8, 55.
  16. Simmonds, M.J.; Kavvoura, F.K.; Brand, O.J.; Newby, P.R.; Jackson, L.E.; Hargreaves, C.E.; Franklyn, J.A.; Gough, S.C. Skewed X chromosome inactivation and female preponderance in autoimmune thyroid disease: An association study and meta-analysis. J. Clin. Endocrinol. Metab. 2014, 99, E127–E131.
  17. Giliberto, F.; Radic, C.P.; Luce, L.; Ferreiro, V.; de Brasi, C.; Szijan, I. Symptomatic female carriers of Duchenne muscular dystrophy (DMD): Genetic and clinical characterization. J. Neurol. Sci. 2014, 336, 36–41.
  18. Sangha, K.K.; Stephenson, M.D.; Brown, C.J.; Robinson, W.P. Extremely skewed X-chromosome inactivation is increased in women with recurrent spontaneous abortion. Am. J. Hum. Genet. 1999, 65, 913–917.
  19. Zhang, Y.; Xu, S.Q.; Liu, W.; Fung, W.K.; Zhou, J.Y. A robust test for X-chromosome genetic association accounting for X-chromosome inactivation and imprinting. Genet. Res. 2020, 102, e2.
  20. Zhang, L.; Martin, E.R.; Morris, R.W.; Li, Y.J. Association test for X-linked QTL in family-based designs. Am. J. Hum. Genet. 2009, 84, 431–444.
  21. Zheng, G.; Joo, J.; Zhang, C.; Geller, N.L. Testing association for markers on the X chromosome. Genet. Epidemiol. 2007, 31, 834–843.
  22. Clayton, D. Testing for association on the X chromosome. Biostatistics 2008, 9, 593–600.
  23. Wang, J.; Yu, R.; Shete, S. X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genet. Epidemiol. 2014, 38, 483–493.
  24. Liu, W.; Wang, B.Q.; Liu-Fu, G.; Fung, W.K.; Zhou, J.Y. X-chromosome genetic association test incorporating X-chromosome inactivation and imprinting effects. J. Genet. 2019, 98, 99.
  25. Ma, L.; Hoffman, G.; Keinan, A. X-inactivation informs variance-based testing for X-linked association of a quantitative trait. BMC Genomics 2015, 16, 241.
  26. Gao, F.; Chang, D.; Biddanda, A.; Ma, L.; Guo, Y.; Zhou, Z.; Keinan, A. XWAS: A software toolset for genetic data analysis and association studies of the X chromosome. J. Hered. 2015, 106, 666–671.
  27. Madsen, B.E.; Browning, S.R. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5, e1000384.
  28. Li, B.; Leal, S.M. Methods for detecting associations with rare variants for common diseases: Application to analysis of sequence data. Am. J. Hum. Genet. 2008, 83, 311–321.
  29. Schork, N.J.; Murray, S.S.; Frazer, K.A.; Topol, E.J. Common vs. rare allele hypotheses for complex diseases. Curr. Opin. Genet. Dev. 2009, 19, 212–219.
  30. Han, F.; Pan, W. A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 2010, 70, 42–54.
  31. Ionita-Laza, I.; Buxbaum, J.D.; Laird, N.M.; Lange, C. A new testing strategy to identify rare variants with either risk or protective effect on disease. PLoS Genet. 2011, 7, e1001289.
  32. Price, A.L.; Kryukov, G.V.; de Bakker, P.I.; Purcell, S.M.; Staples, J.; Wei, L.J.; Sunyaev, S.R. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 2010, 86, 832–838.
  33. Wu, M.C.; Lee, S.; Cai, T.; Li, Y.; Boehnke, M.; Lin, X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 2011, 89, 82–93.
  34. Lee, S.; Emond, M.J.; Bamshad, M.J.; Barnes, K.C.; Rieder, M.J.; Nickerson, D.A.; NHLBI GO Exome Sequencing Project—ESP Lung Project Team; Christiani, D.C.; Wurfel, M.M.; Lin, X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 2012, 91, 224–237.
  35. Ionita-Laza, I.; Lee, S.; Makarov, V.; Buxbaum, J.D.; Lin, X. Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Hum. Genet. 2013, 92, 841–853.
  36. Ma, C.; Boehnke, M.; Lee, S.; GoT2D Investigators. Evaluating the calibration and power of three gene-based association tests of rare variants for the X chromosome. Genet. Epidemiol. 2015, 39, 499–508.
  37. Turkmen, A.S.; Lin, S. Detecting X-linked common and rare variant effects in family-based sequencing studies. Genet. Epidemiol. 2021, 45, 36–45.
  38. Xu, S.Q.; Zhang, Y.; Wang, P.; Liu, W.; Wu, X.B.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on family trios. BMC Genet. 2018, 19, 109.
  39. Wang, P.; Zhang, Y.; Wang, B.Q.; Li, J.L.; Wang, Y.X.; Pan, D.; Wu, X.B.; Fung, W.K.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on case-control design. BMC Bioinformatics 2019, 20, 11.
  40. Li, B.H.; Yu, W.Y.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data. BMC Genom. Data 2021, 22, 24.
  41. Wang, P.; Xu, S.; Wang, Y.X.; Wu, B.; Fung, W.K.; Gao, G.; Liang, Z.; Liu, N. Penalized Fieller’s confidence interval for the ratio of bivariate normal means. Biometrics 2021, 77, 1355–1368.
  42. Stephens, M.; Balding, D.J. Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 2009, 10, 681–690.
  43. Annis, J.; Miller, B.J.; Palmeri, T.J. Bayesian inference with Stan: A tutorial on adding custom distributions. Behav. Res. Methods 2017, 49, 863–886.
  44. Kruschke, J.K. Bayesian data analysis. Wiley Interdiscip. Rev. Cogn. Sci. 2010, 1, 658–676.
  45. Wang, C.; Deng, S.; Sun, L.; Li, L.; Hu, Y.Q. A nonparametric test for association with multiple loci in the retrospective case-control study. Stat. Methods Med. Res. 2020, 29, 589–602.
  46. Basu, S.; Pan, W. Comparison of statistical tests for disease association with rare variants. Genet. Epidemiol. 2011, 35, 606–619.
  47. Turkmen, A.S.; Yan, Z.; Hu, Y.Q.; Lin, S. Kullback-Leibler distance methods for detecting disease association with rare variants from sequencing data. Ann. Hum. Genet. 2015, 79, 199–208.
  48. McGue, M.; Zhang, Y.; Miller, M.B.; Basu, S.; Vrieze, S.; Hicks, B.; Malone, S.; Oetting, W.S.; Iacono, W.G. A genome-wide association study of behavioral disinhibition. Behav. Genet. 2013, 43, 363–373.
  49. Asadollahi, H.; Vaez Torshizi, R.; Ehsani, A.; Masoudi, A.A. An association of CEP78, MEF2C, VPS13A and ARRDC3 genes with survivability to heat stress in an F2 chicken population. J. Anim. Breed. Genet. 2022.
  50. McCaw, Z.R.; Lane, J.M.; Saxena, R.; Redline, S.; Lin, X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 2020, 76, 1262–1272.
  51. Ng, K.T.; Yeung, O.W.; Liu, J.; Li, C.X.; Liu, H.; Liu, X.B.; Qi, X.; Ma, Y.Y.; Lam, Y.F.; Lau, M.Y.; et al. Clinical significance and functional role of transmembrane protein 47 (TMEM47) in chemoresistance of hepatocellular carcinoma. Int. J. Oncol. 2020, 57, 956–966.
  52. Men, X.; Su, M.; Ma, J.; Mou, Y.; Dai, P.; Chen, C.; Cheng, X.A. Overexpression of TMEM47 induces tamoxifen resistance in human breast cancer cells. Technol. Cancer Res. Treat. 2021, 20, 15330338211004916.
Figure 1. Scatter plots of point estimates of $\gamma$ against true values of $\gamma$ for quantitative trait with $n = 500$ and $\tau = 0.6$. The blue points represent the extreme values (0 or 2). (a) $\hat{\gamma}_{GBN}$ with $\eta = 0$; (b) $\hat{\gamma}_{GBU}$ with $\eta = 0$; (c) $\hat{\gamma}_{GPF}$ with $\eta = 0$; (d) $\hat{\gamma}_{GF}$ with $\eta = 0$; (e) $\hat{\gamma}_{GBN}$ with $\eta = 0.4$; (f) $\hat{\gamma}_{GBU}$ with $\eta = 0.4$; (g) $\hat{\gamma}_{GPF}$ with $\eta = 0.4$; (h) $\hat{\gamma}_{GF}$ with $\eta = 0.4$; (i) $\hat{\gamma}_{GBN}$ with $\eta = 1$; (j) $\hat{\gamma}_{GBU}$ with $\eta = 1$; (k) $\hat{\gamma}_{GPF}$ with $\eta = 1$; (l) $\hat{\gamma}_{GF}$ with $\eta = 1$.
Figure 2. Scatter plots of point estimates of $\gamma$ against true values of $\gamma$ for quantitative trait with $n = 500$ and $\tau = 1$. The blue points represent the extreme values (0 or 2). (a) $\hat{\gamma}_{GBN}$ with $\eta = 0$; (b) $\hat{\gamma}_{GBU}$ with $\eta = 0$; (c) $\hat{\gamma}_{GPF}$ with $\eta = 0$; (d) $\hat{\gamma}_{GF}$ with $\eta = 0$; (e) $\hat{\gamma}_{GBN}$ with $\eta = 0.4$; (f) $\hat{\gamma}_{GBU}$ with $\eta = 0.4$; (g) $\hat{\gamma}_{GPF}$ with $\eta = 0.4$; (h) $\hat{\gamma}_{GF}$ with $\eta = 0.4$; (i) $\hat{\gamma}_{GBN}$ with $\eta = 1$; (j) $\hat{\gamma}_{GBU}$ with $\eta = 1$; (k) $\hat{\gamma}_{GPF}$ with $\eta = 1$; (l) $\hat{\gamma}_{GF}$ with $\eta = 1$.
Figure 3. Widths of highest posterior density intervals (HPDIs) or confidence intervals (CIs) of GBN, GBU, PF and Fieller’s methods against true values of $\gamma$ for quantitative trait with $n = 500$ and $\tau = 0.6$. The blue points represent the widths of the empty sets or the noninformative intervals, and the orange points represent the widths of the discontinuous intervals. (a) GBN with $\eta = 0$; (b) GBU with $\eta = 0$; (c) PF with $\eta = 0$; (d) Fieller with $\eta = 0$; (e) GBN with $\eta = 0.4$; (f) GBU with $\eta = 0.4$; (g) PF with $\eta = 0.4$; (h) Fieller with $\eta = 0.4$; (i) GBN with $\eta = 1$; (j) GBU with $\eta = 1$; (k) PF with $\eta = 1$; (l) Fieller with $\eta = 1$.
Figure 4. Widths of HPDIs or CIs of GBN, GBU, PF and Fieller’s methods against true values of $\gamma$ for quantitative trait with $n = 500$ and $\tau = 1$. The blue points represent the widths of the empty sets or the noninformative intervals. (a) GBN with $\eta = 0$; (b) GBU with $\eta = 0$; (c) PF with $\eta = 0$; (d) Fieller with $\eta = 0$; (e) GBN with $\eta = 0.4$; (f) GBU with $\eta = 0.4$; (g) PF with $\eta = 0.4$; (h) Fieller with $\eta = 0.4$; (i) GBN with $\eta = 1$; (j) GBU with $\eta = 1$; (k) PF with $\eta = 1$; (l) Fieller with $\eta = 1$.
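Table 1. Proportions (%) of extreme values of $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ among 500 replications.

| Trait | $n$ | $\eta$ a | $\tau$ b | $\hat{\gamma}_{GPF}$ = 0 | $\hat{\gamma}_{GPF}$ = 2 | $\hat{\gamma}_{GPF}$ Total | $\hat{\gamma}_{GF}$ = 0 | $\hat{\gamma}_{GF}$ = 2 | $\hat{\gamma}_{GF}$ Total |
|---|---|---|---|---|---|---|---|---|---|
| Quantitative | 500 | 0 | 0.6 | 8.6 | 10.6 | 19.2 | 8.6 | 11.8 | 20.4 |
| Quantitative | 500 | 0 | 1 | 7.6 | 19.2 | 26.8 | 7.6 | 21.4 | 29.0 |
| Quantitative | 500 | 0.4 | 0.6 | 9.6 | 8.2 | 17.8 | 9.6 | 10.6 | 20.2 |
| Quantitative | 500 | 0.4 | 1 | 11.2 | 16.0 | 27.2 | 11.2 | 21.2 | 32.4 |
| Quantitative | 500 | 1 | 0.6 | 13.4 | 11.8 | 25.2 | 13.4 | 15.0 | 28.4 |
| Quantitative | 500 | 1 | 1 | 9.0 | 9.0 | 18.0 | 9.0 | 15.8 | 24.8 |
| Quantitative | 2000 | 0 | 0.6 | 5.2 | 6.0 | 11.2 | 5.2 | 6.2 | 11.4 |
| Quantitative | 2000 | 0 | 1 | 5.0 | 9.4 | 14.4 | 5.0 | 9.6 | 14.6 |
| Quantitative | 2000 | 0.4 | 0.6 | 5.6 | 4.6 | 10.2 | 5.6 | 5.0 | 10.6 |
| Quantitative | 2000 | 0.4 | 1 | 6.4 | 10.8 | 17.2 | 6.4 | 11.2 | 17.6 |
| Quantitative | 2000 | 1 | 0.6 | 9.8 | 7.0 | 16.8 | 9.8 | 7.2 | 17.0 |
| Quantitative | 2000 | 1 | 1 | 1.4 | 12.2 | 13.6 | 1.4 | 13.8 | 15.2 |
| Qualitative | 500 | 0 | 0.6 | 19.6 | 12.8 | 32.4 | 19.6 | 20.0 | 39.6 |
| Qualitative | 500 | 0 | 1 | 23.8 | 17.0 | 40.8 | 23.8 | 20.4 | 44.2 |
| Qualitative | 500 | 0.4 | 0.6 | 18.8 | 12.8 | 31.6 | 18.8 | 22.0 | 40.8 |
| Qualitative | 500 | 0.4 | 1 | 29.2 | 10.0 | 39.2 | 29.2 | 19.2 | 48.4 |
| Qualitative | 500 | 1 | 0.6 | 22.0 | 9.0 | 31.0 | 22.0 | 19.2 | 41.2 |
| Qualitative | 500 | 1 | 1 | 27.8 | 0.6 | 28.4 | 27.8 | 7.8 | 35.6 |
| Qualitative | 2000 | 0 | 0.6 | 9.4 | 12.8 | 22.2 | 9.4 | 14.6 | 24.0 |
| Qualitative | 2000 | 0 | 1 | 8.0 | 19.4 | 27.4 | 8.0 | 21.4 | 29.4 |
| Qualitative | 2000 | 0.4 | 0.6 | 14.6 | 10.8 | 25.4 | 14.6 | 13.2 | 27.8 |
| Qualitative | 2000 | 0.4 | 1 | 13.4 | 16.4 | 29.8 | 13.4 | 20.0 | 33.4 |
| Qualitative | 2000 | 1 | 0.6 | 11.8 | 10.4 | 22.2 | 11.8 | 15.4 | 27.2 |
| Qualitative | 2000 | 1 | 1 | 16.2 | 5.0 | 21.2 | 16.2 | 13.0 | 29.2 |

a Proportion of rare variants among all the SNPs; b proportion of the SNPs with positive effects among all the SNPs.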
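Table 2. Mean squared errors of $\hat{\gamma}_{GBN}$, $\hat{\gamma}_{GBU}$, $\hat{\gamma}_{GPF}$ and $\hat{\gamma}_{GF}$ among 500 replications.

| Trait | $n$ | $\eta$ a | $\tau$ b | $\hat{\gamma}_{GBN}$ | $\hat{\gamma}_{GBU}$ | $\hat{\gamma}_{GPF}$ | $\hat{\gamma}_{GF}$ |
|---|---|---|---|---|---|---|---|
| Quantitative | 500 | 0 | 0.6 | 0.0976 | 0.1022 | 0.1236 | 0.1287 |
| Quantitative | 500 | 0 | 1 | 0.1409 | 0.1601 | 0.2344 | 0.2549 |
| Quantitative | 500 | 0.4 | 0.6 | 0.1335 | 0.1395 | 0.1579 | 0.1633 |
| Quantitative | 500 | 0.4 | 1 | 0.1953 | 0.2248 | 0.3008 | 0.3601 |
| Quantitative | 500 | 1 | 0.6 | 0.1414 | 0.1592 | 0.2079 | 0.2363 |
| Quantitative | 500 | 1 | 1 | 0.1623 | 0.1703 | 0.2690 | 0.3475 |
| Quantitative | 2000 | 0 | 0.6 | 0.0359 | 0.0379 | 0.0403 | 0.0405 |
| Quantitative | 2000 | 0 | 1 | 0.0541 | 0.0642 | 0.0793 | 0.0805 |
| Quantitative | 2000 | 0.4 | 0.6 | 0.0480 | 0.0512 | 0.0555 | 0.0558 |
| Quantitative | 2000 | 0.4 | 1 | 0.0755 | 0.0773 | 0.0922 | 0.0959 |
| Quantitative | 2000 | 1 | 0.6 | 0.0481 | 0.0509 | 0.0578 | 0.0591 |
| Quantitative | 2000 | 1 | 1 | 0.0687 | 0.0727 | 0.0962 | 0.1160 |
| Qualitative | 500 | 0 | 0.6 | 0.2765 | 0.3382 | 0.4849 | 0.5503 |
| Qualitative | 500 | 0 | 1 | 0.3100 | 0.4038 | 0.5286 | 0.5788 |
| Qualitative | 500 | 0.4 | 0.6 | 0.3320 | 0.4087 | 0.5785 | 0.6344 |
| Qualitative | 500 | 0.4 | 1 | 0.3826 | 0.4700 | 0.6416 | 0.7254 |
| Qualitative | 500 | 1 | 0.6 | 0.3405 | 0.4329 | 0.5915 | 0.6369 |
| Qualitative | 500 | 1 | 1 | 0.7519 | 0.7673 | 1.0190 | 1.0193 |
| Qualitative | 2000 | 0 | 0.6 | 0.1207 | 0.1367 | 0.1595 | 0.1668 |
| Qualitative | 2000 | 0 | 1 | 0.1362 | 0.1503 | 0.2133 | 0.2306 |
| Qualitative | 2000 | 0.4 | 0.6 | 0.1320 | 0.1492 | 0.1937 | 0.2090 |
| Qualitative | 2000 | 0.4 | 1 | 0.2168 | 0.2460 | 0.3347 | 0.3647 |
| Qualitative | 2000 | 1 | 0.6 | 0.1431 | 0.1615 | 0.2144 | 0.2364 |
| Qualitative | 2000 | 1 | 1 | 0.3163 | 0.3263 | 0.4684 | 0.5145 |

a Proportion of rare variants among all the SNPs; b proportion of the SNPs with positive effects among all the SNPs.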
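Table 3. Proportions (%) of empty sets (EPs), noninformative intervals (NPs), and discontinuous intervals (DPs) of PF and Fieller’s methods among 500 replications.

| Trait | $n$ | $\eta$ a | $\tau$ b | PF EP | PF NP | PF DP | Fieller EP | Fieller NP | Fieller DP |
|---|---|---|---|---|---|---|---|---|---|
| Quantitative | 500 | 0 | 0.6 | 0.0 | 7.2 | 0.0 | 0.8 | 16.6 | 1.0 |
| Quantitative | 500 | 0 | 1 | 0.0 | 19.0 | 0.0 | 0.2 | 21.8 | 0.0 |
| Quantitative | 500 | 0.4 | 0.6 | 0.2 | 10.2 | 0.0 | 0.2 | 22.2 | 0.4 |
| Quantitative | 500 | 0.4 | 1 | 1.4 | 27.2 | 0.0 | 0.4 | 33.8 | 0.0 |
| Quantitative | 500 | 1 | 0.6 | 0.0 | 14.8 | 0.0 | 0.8 | 31.2 | 2.8 |
| Quantitative | 500 | 1 | 1 | 6.8 | 3.6 | 0.0 | 1.0 | 3.6 | 0.0 |
| Quantitative | 2000 | 0 | 0.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Quantitative | 2000 | 0 | 1 | 0.6 | 0.0 | 0.0 | 0.6 | 0.2 | 0.0 |
| Quantitative | 2000 | 0.4 | 0.6 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 |
| Quantitative | 2000 | 0.4 | 1 | 0.0 | 2.4 | 0.0 | 0.4 | 4.2 | 0.0 |
| Quantitative | 2000 | 1 | 0.6 | 0.0 | 0.2 | 0.0 | 0.0 | 2.2 | 0.0 |
| Quantitative | 2000 | 1 | 1 | 0.2 | 0.2 | 0.0 | 0.2 | 0.6 | 0.0 |
| Qualitative | 500 | 0 | 0.6 | 0.0 | 43.4 | 0.0 | 0.6 | 65.0 | 2.8 |
| Qualitative | 500 | 0 | 1 | 1.4 | 58.2 | 0.0 | 1.4 | 64.4 | 0.0 |
| Qualitative | 500 | 0.4 | 0.6 | 0.0 | 45.4 | 0.0 | 0.0 | 68.2 | 4.0 |
| Qualitative | 500 | 0.4 | 1 | 1.8 | 55.2 | 0.0 | 1.2 | 64.0 | 1.0 |
| Qualitative | 500 | 1 | 0.6 | 0.0 | 44.0 | 0.0 | 0.4 | 75.0 | 3.6 |
| Qualitative | 500 | 1 | 1 | 10.4 | 53.4 | 0.0 | 0.0 | 54.2 | 0.0 |
| Qualitative | 2000 | 0 | 0.6 | 0.0 | 10.8 | 0.0 | 0.4 | 19.8 | 0.6 |
| Qualitative | 2000 | 0 | 1 | 0.4 | 20.8 | 0.0 | 0.6 | 25.2 | 0.0 |
| Qualitative | 2000 | 0.4 | 0.6 | 0.0 | 14.4 | 0.0 | 0.2 | 27.0 | 1.4 |
| Qualitative | 2000 | 0.4 | 1 | 1.2 | 26.2 | 0.0 | 0.6 | 31.0 | 0.2 |
| Qualitative | 2000 | 1 | 0.6 | 0.0 | 19.0 | 0.0 | 0.2 | 36.6 | 2.2 |
| Qualitative | 2000 | 1 | 1 | 12.4 | 4.8 | 0.0 | 0.2 | 16.0 | 0.0 |

a Proportion of rare variants among all the SNPs; b proportion of the SNPs with positive effects among all the SNPs.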
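Table 4. Coverage probability (CP, in %), $W_{mean}$ and $W_{median}$ of GBN, GBU, PF and Fieller’s methods among 500 replications.

| Trait | $n$ | $\eta$ a | $\tau$ b | CP GBN | CP GBU | CP PF | CP Fieller | $W_{mean}$ GBN | $W_{mean}$ GBU | $W_{mean}$ PF | $W_{mean}$ Fieller | $W_{median}$ GBN | $W_{median}$ GBU | $W_{median}$ PF | $W_{median}$ Fieller |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | 500 | 0 | 0.6 | 96.2 | 95.8 | 95.8 | 95.2 | 1.2357 | 1.2524 | 1.2338 | 1.2674 | 1.2439 | 1.2571 | 1.2072 | 1.2328 |
| Quantitative | 500 | 0 | 1 | 96.2 | 97.0 | 97.8 | 95.8 | 1.3536 | 1.3695 | 1.4593 | 1.4375 | 1.3959 | 1.4152 | 1.4749 | 1.5010 |
| Quantitative | 500 | 0.4 | 0.6 | 95.0 | 95.6 | 95.6 | 96.2 | 1.2663 | 1.2862 | 1.2815 | 1.3305 | 1.2662 | 1.2973 | 1.2449 | 1.2682 |
| Quantitative | 500 | 0.4 | 1 | 95.6 | 96.6 | 94.2 | 95.6 | 1.4718 | 1.4977 | 1.5555 | 1.5887 | 1.5158 | 1.5571 | 1.6734 | 1.6888 |
| Quantitative | 500 | 1 | 0.6 | 96.2 | 96.6 | 95.4 | 94.2 | 1.3457 | 1.3689 | 1.3363 | 1.3767 | 1.4001 | 1.4490 | 1.2991 | 1.3461 |
| Quantitative | 500 | 1 | 1 | 94.6 | 95.4 | 87.8 | 93.8 | 1.2841 | 1.2983 | 1.2918 | 1.3827 | 1.3135 | 1.3316 | 1.4814 | 1.4465 |
| Quantitative | 2000 | 0 | 0.6 | 94.6 | 94.2 | 94.8 | 94.6 | 0.7216 | 0.7258 | 0.7377 | 0.7413 | 0.7149 | 0.7230 | 0.7406 | 0.7425 |
| Quantitative | 2000 | 0 | 1 | 95.8 | 96.0 | 95.8 | 94.2 | 0.8934 | 0.8946 | 0.9184 | 0.9249 | 0.9068 | 0.9035 | 0.9396 | 0.9469 |
| Quantitative | 2000 | 0.4 | 0.6 | 94.0 | 95.4 | 94.4 | 94.6 | 0.7895 | 0.7958 | 0.8067 | 0.8152 | 0.7770 | 0.7850 | 0.8087 | 0.8124 |
| Quantitative | 2000 | 0.4 | 1 | 95.6 | 96.2 | 97.4 | 96.2 | 1.0439 | 1.0505 | 1.0800 | 1.0950 | 1.0415 | 1.0420 | 1.0857 | 1.0828 |
| Quantitative | 2000 | 1 | 0.6 | 95.8 | 96.6 | 96.2 | 96.2 | 0.8284 | 0.8325 | 0.8406 | 0.8539 | 0.7933 | 0.7974 | 0.8211 | 0.8190 |
| Quantitative | 2000 | 1 | 1 | 95.4 | 95.6 | 96.6 | 95.0 | 0.9483 | 0.9560 | 0.9750 | 1.0066 | 0.9988 | 0.9982 | 1.0294 | 1.0527 |
| Qualitative | 500 | 0 | 0.6 | 92.6 | 94.2 | 95.4 | 95.0 | 1.6289 | 1.6667 | 1.6720 | 1.7236 | 1.7202 | 1.7749 | 1.8354 | 2.0000 |
| Qualitative | 500 | 0 | 1 | 94.0 | 96.0 | 90.0 | 94.8 | 1.6575 | 1.6934 | 1.7053 | 1.7578 | 1.7387 | 1.7848 | 2.0000 | 2.0000 |
| Qualitative | 500 | 0.4 | 0.6 | 93.0 | 94.6 | 93.6 | 96.0 | 1.6782 | 1.7193 | 1.6986 | 1.7668 | 1.7516 | 1.8033 | 1.8721 | 2.0000 |
| Qualitative | 500 | 0.4 | 1 | 93.0 | 94.8 | 84.6 | 94.0 | 1.6775 | 1.7154 | 1.6108 | 1.7788 | 1.7360 | 1.7830 | 2.0000 | 2.0000 |
| Qualitative | 500 | 1 | 0.6 | 92.6 | 94.8 | 93.0 | 96.0 | 1.7318 | 1.7742 | 1.6981 | 1.7965 | 1.7837 | 1.8283 | 1.8659 | 2.0000 |
| Qualitative | 500 | 1 | 1 | 77.0 | 74.4 | 74.2 | 99.4 | 1.3896 | 1.3523 | 1.4088 | 1.8704 | 1.4854 | 1.4788 | 2.0000 | 2.0000 |
| Qualitative | 2000 | 0 | 0.6 | 94.6 | 95.8 | 96.6 | 95.0 | 1.2519 | 1.2686 | 1.2531 | 1.2774 | 1.2388 | 1.2710 | 1.1933 | 1.2177 |
| Qualitative | 2000 | 0 | 1 | 97.0 | 96.8 | 97.2 | 95.6 | 1.3832 | 1.4010 | 1.4869 | 1.4734 | 1.4162 | 1.4502 | 1.5404 | 1.5295 |
| Qualitative | 2000 | 0.4 | 0.6 | 96.2 | 96.6 | 96.8 | 95.2 | 1.3468 | 1.3682 | 1.3443 | 1.3908 | 1.4163 | 1.4514 | 1.3443 | 1.3965 |
| Qualitative | 2000 | 0.4 | 1 | 95.0 | 95.8 | 93.6 | 95.4 | 1.4765 | 1.5029 | 1.5565 | 1.5781 | 1.5153 | 1.5623 | 1.6985 | 1.6909 |
| Qualitative | 2000 | 1 | 0.6 | 96.4 | 96.8 | 94.2 | 95.0 | 1.4216 | 1.4488 | 1.3842 | 1.4516 | 1.5241 | 1.5772 | 1.3174 | 1.4640 |
| Qualitative | 2000 | 1 | 1 | 89.8 | 89.6 | 84.6 | 98.6 | 1.3833 | 1.3967 | 1.3764 | 1.6143 | 1.4576 | 1.4936 | 1.7096 | 1.6751 |

a Proportion of rare variants among all the SNPs; b proportion of the SNPs with positive effects among all the SNPs.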
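Table 5. $W_{sd}$ and $W_{iqr}$ of GBN, GBU, PF and Fieller’s methods among 500 replications.

| Trait | $n$ | $\eta$ a | $\tau$ b | $W_{sd}$ GBN | $W_{sd}$ GBU | $W_{sd}$ PF | $W_{sd}$ Fieller | $W_{iqr}$ GBN | $W_{iqr}$ GBU | $W_{iqr}$ PF | $W_{iqr}$ Fieller |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Quantitative | 500 | 0 | 0.6 | 0.3309 | 0.3619 | 0.4066 | 0.4851 | 0.5036 | 0.5697 | 0.5403 | 0.6862 |
| Quantitative | 500 | 0 | 1 | 0.3020 | 0.3364 | 0.4429 | 0.4948 | 0.4613 | 0.5274 | 0.6959 | 0.7625 |
| Quantitative | 500 | 0.4 | 0.6 | 0.3312 | 0.3624 | 0.4198 | 0.4868 | 0.5334 | 0.5910 | 0.5862 | 0.8516 |
| Quantitative | 500 | 0.4 | 1 | 0.2631 | 0.2917 | 0.4881 | 0.4498 | 0.3741 | 0.4244 | 0.6279 | 0.6390 |
| Quantitative | 500 | 1 | 0.6 | 0.3585 | 0.3890 | 0.4492 | 0.5487 | 0.5765 | 0.6382 | 0.7346 | 1.0386 |
| Quantitative | 500 | 1 | 1 | 0.2563 | 0.2891 | 0.6080 | 0.4568 | 0.3086 | 0.3487 | 0.7633 | 0.5616 |
| Quantitative | 2000 | 0 | 0.6 | 0.1961 | 0.2118 | 0.2251 | 0.2350 | 0.2369 | 0.2684 | 0.2520 | 0.2564 |
| Quantitative | 2000 | 0 | 1 | 0.2623 | 0.2874 | 0.3381 | 0.3514 | 0.3609 | 0.4000 | 0.4281 | 0.4336 |
| Quantitative | 2000 | 0.4 | 0.6 | 0.2214 | 0.2419 | 0.2500 | 0.2723 | 0.2874 | 0.3203 | 0.2952 | 0.3094 |
| Quantitative | 2000 | 0.4 | 1 | 0.3084 | 0.3386 | 0.4154 | 0.4447 | 0.3816 | 0.4537 | 0.5550 | 0.5927 |
| Quantitative | 2000 | 1 | 0.6 | 0.2720 | 0.2941 | 0.3049 | 0.3455 | 0.3455 | 0.3840 | 0.3589 | 0.3830 |
| Quantitative | 2000 | 1 | 1 | 0.3184 | 0.3442 | 0.4515 | 0.4661 | 0.3969 | 0.4519 | 0.6674 | 0.6647 |
| Qualitative | 500 | 0 | 0.6 | 0.2535 | 0.2727 | 0.3893 | 0.4565 | 0.2800 | 0.2841 | 0.5975 | 0.4816 |
| Qualitative | 500 | 0 | 1 | 0.2005 | 0.2194 | 0.5012 | 0.4440 | 0.2140 | 0.2336 | 0.4291 | 0.3656 |
| Qualitative | 500 | 0.4 | 0.6 | 0.1998 | 0.2129 | 0.3599 | 0.4105 | 0.2059 | 0.1966 | 0.5632 | 0.3726 |
| Qualitative | 500 | 0.4 | 1 | 0.1611 | 0.1782 | 0.6086 | 0.4317 | 0.1748 | 0.1658 | 0.6470 | 0.2657 |
| Qualitative | 500 | 1 | 0.6 | 0.1553 | 0.1632 | 0.3705 | 0.4144 | 0.1162 | 0.1055 | 0.5430 | 0.0354 |
| Qualitative | 500 | 1 | 1 | 0.2933 | 0.3707 | 0.8749 | 0.2417 | 0.3847 | 0.5508 | 1.9212 | 0.1898 |
| Qualitative | 2000 | 0 | 0.6 | 0.3501 | 0.3824 | 0.4415 | 0.5142 | 0.5624 | 0.6511 | 0.6639 | 0.8792 |
| Qualitative | 2000 | 0 | 1 | 0.2936 | 0.3261 | 0.4417 | 0.4911 | 0.4447 | 0.5120 | 0.7372 | 0.8589 |
| Qualitative | 2000 | 0.4 | 0.6 | 0.3518 | 0.3824 | 0.4411 | 0.5098 | 0.5682 | 0.6366 | 0.6747 | 1.0159 |
| Qualitative | 2000 | 0.4 | 1 | 0.2487 | 0.2780 | 0.4936 | 0.4545 | 0.3529 | 0.3963 | 0.6457 | 0.6883 |
| Qualitative | 2000 | 1 | 0.6 | 0.3456 | 0.3758 | 0.4350 | 0.5209 | 0.5482 | 0.6068 | 0.7529 | 0.9691 |
| Qualitative | 2000 | 1 | 1 | 0.2762 | 0.3174 | 0.7095 | 0.3578 | 0.2032 | 0.2535 | 0.7992 | 0.3615 |

a Proportion of rare variants among all the SNPs; b proportion of the SNPs with positive effects among all the SNPs.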
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Li, M.-K.; Yuan, Y.-X.; Zhu, B.; Wang, K.-W.; Fung, W.K.; Zhou, J.-Y. Gene-Based Methods for Estimating the Degree of the Skewness of X Chromosome Inactivation. Genes 2022, 13, 827. https://doi.org/10.3390/genes13050827
