Impact of Experimental Bias on Compositional Analysis of Microbiome Data

Yingtian Hu; Glen A. Satten; Yi-Juan Hu

doi:10.3390/genes14091777

,

and

¹

Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA

²

Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA

^*

Author to whom correspondence should be addressed.

Genes2023, 14(9), 1777;https://doi.org/10.3390/genes14091777

This article belongs to the Special Issue Statistical Analysis of Microbiome Data: From Methods to Application

Version Notes

Order Reprints

Review Reports

Abstract

Microbiome data are subject to experimental bias that is caused by DNA extraction and PCR amplification, among other sources, but this important feature is often ignored when developing statistical methods for analyzing microbiome data. McLaren, Willis, and Callahan (2019) proposed a model for how such biases affect the observed taxonomic profiles; this model assumes the main effects of bias without taxon–taxon interactions. Our newly developed method for testing the differential abundance of taxa, LOCOM, is the first method to account for experimental bias and is robust to the main effect biases. However, there is also evidence for taxon–taxon interactions. In this report, we formulated a model for interaction biases and used simulations based on this model to evaluate the impact of interaction biases on the performance of LOCOM as well as other available compositional analysis methods. Our simulation results indicate that LOCOM remained robust to a reasonable range of interaction biases. The other methods tend to have an inflated FDR even when there were only main effect biases. LOCOM maintained the highest sensitivity even when the other methods could not control the FDR. We thus conclude that LOCOM outperforms the other methods for compositional analysis of microbiome data considered here.

Keywords:

compositional effect; false-discovery rate (FDR); LOCOM; interaction bias; main effect bias; taxon ratios; test differential abundance

1. Introduction

Experimental bias is a pervasive feature of microbiome data because each step in the sequencing experiment workflow (i.e., DNA extraction, PCR amplification, amplicon sequencing, and bioinformatics processing) favors certain taxa over others [1,2,3]. Absent experimental procedures that produce unbiased data, it is therefore necessary to account for experimental bias when analyzing microbiome data. Fortunately, McLaren, Willis, and Callahan (MWC) [4] have proposed a model explaining how experimental bias affects microbiome data. In this model, the observed relative abundance of each taxon is a product of the taxon’s true relative abundance and a taxon-specific bias factor, normalized over all taxa observed in the sample. Each taxon-specific bias factor represents the accumulation of multiplicative biases over all steps in the experimental pipeline so that multiple sources of bias are described by a single factor. With the MWC model in mind, we developed LOCOM [5], a logistic regression-based compositional analysis method for detecting differentially abundant taxa, that only estimates parameters that are free from bias, i.e., are not affected by bias factors. To the best of our knowledge, LOCOM was the first method that accounted for experimental bias and was shown both analytically and numerically to be fully robust to any bias that follows the MWC model; none of the other existing methods for testing taxon differential abundance have considered experimental bias.

The MWC model assumes no between-taxon “interaction” bias, i.e., the presence or abundance of one taxon does not affect the bias factors of any other taxa in the sample. LOCOM may not be bias-robust if the “main effect” biases of the MWC model were supplemented by interaction biases. In fact, Zhao and Satten [6] investigated the existence of interaction biases by expanding the MWC model to include interaction terms and then fitting this generalized model to mock community samples [7]. They found a small amount of evidence for interaction biases, albeit with smaller magnitudes than the main effect biases in the MWC model. This finding is the motivation for this study. Here, we formulate a new model for generating interaction biases and use simulations based on this model to evaluate the impact of these interaction biases on the performance of LOCOM, as well as a number of existing approaches (ANCOM [8], ANCOM-BC [9], fastANCOM [10], ALDEx2 [11], WRENCH [12], DACOMP (v1.26) [13], LinDA [14], and the Wilcoxon rank-sum test of log-ratio transformed data after adding pseudo counts of either 0.5 or 1). Thus, this report provides the first assessment of the impact of interaction biases on the analysis of microbiome data. Note that this assessment can only be achieved by simulations instead of real data since we cannot control the nature of the bias even using model community data.

2. Methods

We generalize the MWC model using the framework of Zhao and Satten [6] by adding an interaction bias

θ_{j j^{'}}

to the log-linear model:

log (p_{i j}) = log (π_{i j}) + γ_{j} + \sum_{j^{'} \neq j} θ_{j j^{'}} π_{i j^{'}} + α_{i},

(1)

where

π_{i j}

is the true relative abundance of taxon j in sample i,

p_{i j}

is the expected value of the observed relative abundance,

γ_{j}

is the main effect bias for taxon j in the MWC model, and

α_{i}

is the sample-specific normalization factor that ensures the compositional constraint on

p_{i j}

. We followed [6] to introduce the effects of covariates

X_{i}

on taxon j by replacing

log (π_{i j})

in (1) with

log (π_{j}^{0}) + X_{i}^{T} β_{j}

, where

β_{j}

contains the effect sizes, and

π_{j}^{0}

is the true relative abundance of taxon j when

X_{i} = 0

. Note that, similar to all compositional analysis methods, the effect size parameters

β_{j}

here characterize differences in “absolute” abundances rather than “relative” abundances; relative abundances are further modulated by the normalization factor

α_{i}

which is a function of all the

β_{j}

s. The interaction

θ_{j j^{'}}

determines the extent to which taxon

j^{'}

affects the bias factor for taxon j. The assumption underlying the MWC model is

θ_{j j^{'}} = 0

for all j and

j^{'}

. We adopt the following model for generating the interaction biases relative to the main effect biases in our simulation studies:

θ_{j j^{'}} = - sign (γ_{j^{'}}) ϕ ϵ_{j j^{'}} | γ_{j} |,

(2)

where

ϕ

is a constant for all taxa pairs that we call the magnitude of interaction biases, and

ϵ_{j j^{'}}

is a non-negative error term with a mean of one.

The value of

ϕ

and the minus sign in (2) reflect the findings of Zhao and Satten [6], who considered a model similar to (1) but with an interaction that depended on the presence of a taxon rather than its relative abundance. In particular, they considered the model

log (p_{i j}) = log (π_{i j}) + γ_{j} + \sum_{j^{'} \neq j} {\tilde{θ}}_{j j^{'}} I [π_{i j^{'}} > 0] + α_{i},

(3)

where

{\tilde{θ}}_{j j^{'}}

determines the extent to which the presence of taxon

j^{'}

affects the bias factor for taxon j. Using data from all seven taxa in the Brooks mock community samples, they estimated the main effect biases

γ_{j}

s and interaction biases

{\tilde{θ}}_{j j^{'}}

s and presented the results in their Table 4. For the convenience of our readers, we display their results in Figure 1. First, we observe that the interaction biases have smaller magnitudes than the main effect biases; we calculated the mean of the interaction–main effect ratios (i.e.,

| {\tilde{θ}}_{j j^{'}} / γ_{j} |

) to be 0.204. Because

{\tilde{θ}}_{j j^{'}}

in model (3) corresponds to the product

θ_{j j^{'}} π_{i j^{'}}

in our model (1) and because most of the Brooks samples have three taxa with equal proportions, i.e.,

π_{i j^{'}} = 1 / 3

, we obtained the (approximate) mean of

| θ_{j j^{'}} / γ_{j} |

and, thus, an estimate for

ϕ

of 0.612. In our simulations, we varied the value of

ϕ

from 0 to 4, which includes our empirical estimate of 0.612 while also extending the range of

ϕ

to allow for stronger interaction biases than those found in the Brooks data. In addition, Figure 1 shows that the interaction bias in taxon j caused by taxon

j^{'}

has an opposite sign to the sign of the main effect bias in taxon

j^{'}

, except for some very small interaction biases. This implies that if taxon

j^{'}

has a measured abundance higher than its true abundance, it is associated with decreased measured abundances of other taxa, which seems plausible if we assume that only a relatively constant total number of amplicons are sequenced. Finally, interaction biases are not necessarily symmetric between a pair of taxa, as also noted by Zhao and Satten [6], so our model (2) also does not guarantee such symmetry.

Figure 1. Main effect biases (red dots) of the seven taxa in the Brooks mock community samples and interaction biases (black dots) that each taxon (in the x-axis) caused to the other six taxa, as estimated by Zhao and Satten [6].

Our simulation studies were based on models (1) and (2) and data on 856 taxa of the upper-respiratory tract (URT) microbiome by Charlson et al. [15]. We considered three scenarios for the distribution of interaction biases. In the first scenario (referred to as S-nondiff), we sampled

ϵ_{j j^{'}} / 2

from

N (0.5, 0 . 1^{2})

for all taxa pairs j and

j^{'}

. In the second scenario (referred to as S-diff-causal), we modified S-nondiff to sample

ϵ_{j j^{'}} / 2

from

B e t a (0.5, 0.5)

when taxon j was “causal” (i.e., associated with the trait of interest) and for all

j^{'}

. Both distributions have a mean of 0.5; however,

N (0.5, 0 . 1^{2})

has one mode at 0.5, whereas

B e t a (0.5, 0.5)

has two modes 0 and 1 and, hence, a variance of larger than

0 . 1^{2}

. In the third scenario (referred to as S-diff-half), we used

N (0.5, 0 . 1^{2})

for half of the randomly selected taxa js and

B e t a (0.5, 0.5)

for the remaining half of the taxa. Unlike S-nondiff, S-diff-causal has a modest (but trait-related) variation in

ϵ_{j j^{'}}

, while S-diff-half has a large variation in

ϵ_{j j^{'}}

that is not trait-related. The distribution of the bias factor due to taxon–taxon interactions,

η_{i j} = \sum_{j^{'} \neq j} θ_{j j^{'}} π_{i j^{'}}

, in each taxon j across all samples is displayed in Figures S1 and S2 and has the following features. In all scenarios and for all taxa js, the mean of

η_{i j}

was approximately zero due to the averaging of contributions with different directionalities. The variance of

η_{i j}

increased as

γ_{j}

increased at a given

ϕ

. S-nondiff and S-diff-causal had the same

η_{i j}

values at the null taxa and different

η_{i j}

values at the causal taxa. Finally, in S-diff-half,

η_{i j}

had a larger variance in one half of the taxa than in the other half of the taxa.

Additional details of the simulation settings generally follow the simulations in [5]. Specifically, we considered both binary and continuous traits of interest without any confounding covariates; we also considered a binary confounder when the trait was binary. We used the two sets of “causal” taxa (i.e., taxa that are associated with the trait) that were used in [5] and referred to as M1 and M2. In M1, 20 taxa were randomly sampled to be causal from the set of taxa having mean relative abundances in the URT data [15] greater than 0.005 (but excluding the most prevalent taxon). In M2, the top five most abundant taxa (having mean relative abundances of 0.105, 0.062, 0.054, 0.050, and 0.049) were selected to be causal. For simulations with a binary confounder, we assumed that the confounder was associated with 20 taxa under M1 (10 sampled at random from the causal taxa and 10 from the null taxa) and 5 taxa under M2 (2 from the causal taxa and 3 from the null taxa). We set the sample size to 100. Let

T_{i}

denote the trait and

C_{i}

denote the confounder for the ith sample so that

X_{i} = {(T_{i}, C_{i})}^{T}

. To generate a binary trait, we selected an equal number of samples with

T_{i} = 1

and

T_{i} = 0

. When a binary confounder was present, we drew

C_{i}

from

B e r n o u l l i (0.2)

in samples with

T_{i} = 0

and from

B e r n o u l l i (0.8)

in samples with

T_{i} = 1

. To generate a continuous trait, we sampled

T_{i}

from

U [- 1, 1]

. To simulate the read count data for the 856 taxa, we first sampled the baseline relative abundances

π_{i}^{(0)} = (π_{i 1}^{(0)}, π_{i 2}^{(0)}, \dots, π_{i J}^{(0)})

of all taxa for each sample from

D i r i c h l e t (\bar{π}, θ)

, in which

\bar{π}

and

θ

took the estimated mean and overdispersion (0.02) from fitting the Dirichlet–Multinomial model to the URT data. We formed the true relative abundances

π_{i j}

for all taxa by spiking the causal taxon

j^{'}

with an

exp (β_{j^{'}, 1} T_{i})

fold change, spiking the confounder-associated taxon

j^{″}

with an

exp (β_{j^{″}, 2} C_{i})

fold change, and normalizing the relative abundances, so that

π_{i j} = \frac{exp (β_{j, 1} T_{i} + β_{j, 2} C_{i}) π_{i j}^{(0)}}{\sum_{j^{*} = 1}^{J} exp (β_{j^{*}, 1} T_{i} + β_{j^{*}, 2} C_{i}) π_{i j^{*}}^{(0)}} .

We formed the observed relative abundances

p_{i j}

by additionally multiplying the bias factor

exp (γ_{j} + \sum_{j^{'}} θ_{j j^{'}} π_{i j^{'}})

, so that

p_{i j} = \frac{exp (γ_{j} + \sum_{j^{'}} θ_{j j^{'}} π_{i j^{'}} + β_{j, 1} T_{i} + β_{j, 2} C_{i}) π_{i j}^{(0)}}{\sum_{j^{*} = 1}^{J} exp (γ_{j^{*}} + \sum_{j^{'}} θ_{j^{*} j^{'}} π_{i j^{'}} + β_{j^{*}, 1} T_{i} + β_{j^{*}, 2} C_{i}) π_{i j^{*}}^{(0)}} .

Note that

β_{j, 1} = 0

for null taxa and

β_{j, 2} = 0

for confounder-independent taxa. For simplicity, we set

β_{j, 1} = β

, a common effect size of the trait, on all causal taxa. We generated the main effect bias

γ_{j}

from

N (0, 0 . 8^{2})

, which gave a range between 0.2 and 5 for most (95%) fold changes (

exp γ_{j}

) caused by the main effect bias. The scheme for generating the interaction bias

θ_{j j^{'}}

has been described earlier. Finally, we generated the taxon count data using the Multinomial model with the mean

(p_{i 1}, p_{i 2}, \dots, p_{i J})

and a library size sampled from an N(10,000, (10,000/3

)^{2})

distribution that was left-truncated at 2000.

Following [5], we applied the following compositional analysis methods: LOCOM, ANCOM, ANCOM-BC, ALDEx2, DACOMP (v1.26), WRENCH, Wilcox-alr-half, and Wilcox-alr-one. The latter two are Wilcoxon rank-sum tests of additive-log-ratio transformed count data after adding pseudocount 0.5 or 1 to all count data and using the most abundant null taxon (known in simulated data) as the reference; the p-values were adjusted for multiple testing by the Benjamini–Hochberg [16] procedure. In addition, we included two newly developed methods, fastANCOM and LinDA. LOCOM requires that the taxa present in fewer than 20% of samples are filtered out. Recall that, since LOCOM fits logistic regression models to taxa count data, zero read counts are naturally handled as possible outcome values. We applied all methods to each replicate of simulated data after using this filter. For ANCOM, ANCOM-BC, fastANCOM, and LinDA, we also applied them to data using their own filter with 10% presence as the cutoff and denoted them as ANCOM^o, ANCOM-BC^o, fastANCOM^o, and LinDA^o. The empirical FDR and sensitivity (proportion of truly causal taxa that were detected) were evaluated at the nominal level of 20% based on 1000 replicates of data. A relatively high nominal FDR level was chosen due to the small numbers of causal taxa in both M1 and M2.

3. Results

The empirical FDR and sensitivity of all aforementioned compositional analysis methods for detecting the “causal” taxa (i.e., taxa that are associated with the trait), for simulations with a binary trait and no confounder under scenarios M1 and M2, are displayed in Figure 2 and Figure 3, respectively (results for simulations with confounders and continuous traits showed similar patterns and are therefore deferred to Figures S3–S6 in Supplementary Materials). Note that

ϕ = 0

corresponds to no interaction bias at any taxa (i.e., main effect biases only), and

β = 0

corresponds to no differential abundance at any taxa (i.e., the global null). In all cases, the FDR of LOCOM remained at or close to the nominal level as long as the magnitude of the interaction bias

ϕ < 1

, which is substantially larger than the empirical value of 0.612 observed in the Brooks data [6]; under the global null in particular, LOCOM always controlled the FDR regardless of the value of

ϕ

. It was only when both

ϕ

and

β

became unrealistically large that we observed moderate inflation in the FDR of LOCOM. The FDR inflation of LOCOM was similar in S-nondiff and S-diff-causal because the interaction biases were similarly distributed at the majority of taxa, which were null taxa; the inflation was (slightly) larger in S-diff-half because the interaction biases had the largest variability among the three scenarios at a given

ϕ

. The interaction biases caused some loss of sensitivity for LOCOM, but the drop was relatively small, and LOCOM maintained the highest sensitivity among all methods in all of our simulation scenarios. The results of the other methods showed a similar trend to LOCOM with the FDR increasing and the sensitivity decreasing as

ϕ

increased, as well as an increase in FDR inflation as

β

increased.

Figure 2. FDR and sensitivity at the nominal FDR 20% (black dotted line) of LOCOM and other compositional methods for data simulated under M1 and with a binary trait. With superscript ^o, the filter for rare taxa that are present in less than 10% of samples as adopted by the original programs was used; without ^o, the more stringent filter with 20% cutoff as recommended by LOCOM was used.

Figure 3. Similar to Figure 2 except that the data were simulated under M2.

The other methods had very different performances even when there were only main effect biases (

ϕ = 0

). ANCOM-BC and fastANCOM performed the best among the other methods, controlling the FDR for the full range of

β

values we have considered when

ϕ

was small and having similar FDR inflation as LOCOM when both

β

and

ϕ

became large; however, this performance came at the cost of a substantially lower sensitivity than that of LOCOM. ANCOM had a moderately inflated FDR when

β

was small but performed better when

β

was increased; nevertheless, its sensitivity was among the lowest. ALDEx2 had an inflated FDR and poor sensitivity when

β

was large in M1 but not in M2. WRENCH, Wilcox-alr-half, and Wilcox-alr-one had highly inflated FDRs except in the global null case, and LinDA had a highly inflated FDR in all cases including the global null. These results were based on data after applying the filter of rare taxa as recommended by LOCOM; in general, ANCOM, ANCOM-BC, fastANCOM, and LinDA had worse FDR control when a less stringent filter was applied. Note that, unlike the LOCOM paper containing results from the older version (v1.1) of DACOMP, we used the latest version (v1.26) here, which yielded a highly inflated FDR and low sensitivity in M2 because some causal taxa were incorrectly selected to be among the reference set.

4. Discussion

In this study, we found that LOCOM was robust to not only main effect biases but also a reasonable range of interaction biases. The other methods tended to have an inflated FDR even when there were only main effect biases; many of them did not control the FDR even when there was no experimental bias at all (results shown in [5]). LOCOM maintained the highest sensitivity among all methods even when the other methods did not control the FDR. Therefore, we conclude that LOCOM outperforms most (if not all) existing methods for compositional analysis of microbiome data.

The robust performance of all methods to the interaction bias is likely due to each term

θ_{j j^{'}} π_{i j^{'}}

in (1) that governs the effect of taxon

j^{'}

on the bias factor of taxon j being small because the relative abundance

π_{i j^{'}}

is generally small. Further, even if all contributions from different taxa to the total interaction bias

\sum_{j^{'}} θ_{j j^{'}} π_{i j^{'}}

have the same sign, any non-zero mean interaction bias across taxa (or individuals) would be automatically included in the normalization factor

α_{i}

(or the main effect bias

γ_{j}

). Thus, the interaction bias is “automatically centered” by these constraints, which may also decrease the apparent effect of the interaction bias.

We used the Brooks mock community data to motivate our simulation studies, which may have two limitations. First, the Brooks samples contain at most seven taxa, all with equal true relative abundance; other mock community datasets, especially datasets that contain a large number of taxa and that mimic real microbiome data, should be used to study the interaction bias. Second, bacteria in a real microbial community may have different interactions from bacteria in a mock community. Therefore, the study of experimental biases and their impact on downstream analysis continues to be an important topic for the foreseeable future.

5. Key Points

Microbiome data are subject to experimental bias, which not only takes the form of taxon-specific main effects but also taxon–taxon interactions.
LOCOM is robust to all main effect biases and a reasonable range of interaction biases.
With the exception of LOCOM, the currently available methods tend to have inflated FDRs even when there are only main effect biases.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14091777/s1, Figure S1: Distribution of the bias factor due to taxon-taxon interactions; Figure S2: Distribution of the bias factor due to taxon-taxon interactions, based on one replicate of data that were generated under M2 with exp(

β

) = 3 and

ϕ = 4

and had a binary trait; Figure S3: Similar to Figure 2 except that the data were simulated under M1 and had a binary trait and a binary confounder; Figure S4: Similar to Figure 2 except that the data were simulated under M2 and had a binary trait and a binary confounder; Figure S5: Similar to Figure 2 except that the data were simulated under M1 and had a continuous trait; Figure S6: Similar to Figure 2 except that the data were simulated under M2 and had a continuous trait.

Author Contributions

Conceptualization, Y.-J.H. and G.A.S.; methodology, Y.-J.H., G.A.S. and Y.H.; simulations, Y.H.; writing—original draft preparation, Y.-J.H. and G.A.S.; writing—review and editing, Y.H.; funding acquisition, Y.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health awards R01GM116065 (Hu) and R01GM141074 (Hu).

Data Availability Statement

All simulation code are available on GitHub at https://github.com/yijuanhu/LOCOM (accessed on 7 September 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Brooks, J.P. Challenges for case-control studies with microbiome data. Ann. Epidemiol. 2016, 26, 336–341. [Google Scholar] [CrossRef] [PubMed]
Hugerth, L.W.; Andersson, A.F. Analysing microbial community composition through amplicon sequencing: From sampling to hypothesis testing. Front. Microbiol. 2017, 8, 1561. [Google Scholar] [CrossRef] [PubMed]
Pollock, J.; Glendinning, L.; Wisedchanwet, T.; Watson, M. The madness of microbiome: Attempting to find consensus “best practice” for 16S microbiome studies. Appl. Environ. Microbiol. 2018, 84, e02627-17. [Google Scholar] [CrossRef] [PubMed]
McLaren, M.R.; Willis, A.D.; Callahan, B.J. Consistent and correctable bias in metagenomic sequencing experiments. eLife 2019, 8, e46923. [Google Scholar] [CrossRef] [PubMed]
Hu, Y.; Satten, G.A.; Hu, Y.J. LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc. Natl. Acad. Sci. USA 2022, 119, e2122788119. [Google Scholar] [CrossRef] [PubMed]
Zhao, N.; Satten, G.A. A log-linear model for inference on bias in microbiome studies. In Statistical Analysis of Microbiome Data; Springer: Berlin/Heidelberg, Germany, 2021; pp. 221–246. [Google Scholar]
Brooks, J.P.; Edwards, D.J.; Harwich, M.D.; Rivera, M.C.; Fettweis, J.M.; Serrano, M.G.; Reris, R.A.; Sheth, N.U.; Huang, B.; Girerd, P.; et al. The truth about metagenomics: Quantifying and counteracting bias in 16S rRNA studies. BMC Microbiol. 2015, 15, 66. [Google Scholar] [CrossRef] [PubMed]
Mandal, S.; Van Treuren, W.; White, R.A.; Eggesbø, M.; Knight, R.; Peddada, S.D. Analysis of composition of microbiomes: A novel method for studying microbial composition. Microb. Ecol. Health Dis. 2015, 26, 27663. [Google Scholar] [CrossRef] [PubMed]
Lin, H.; Peddada, S.D. Analysis of compositions of microbiomes with bias correction. Nat. Commun. 2020, 11, 3514. [Google Scholar] [CrossRef] [PubMed]
Zhou, C.; Wang, H.; Zhao, H.; Wang, T. fastANCOM: A fast method for analysis of compositions of microbiomes. Bioinformatics 2022, 38, 2039–2041. [Google Scholar] [CrossRef]
Fernandes, A.D.; Reid, J.N.; Macklaim, J.M.; McMurrough, T.A.; Edgell, D.R.; Gloor, G.B. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2014, 2, 15. [Google Scholar] [CrossRef] [PubMed]
Kumar, M.S.; Slud, E.V.; Okrah, K.; Hicks, S.C.; Hannenhalli, S.; Bravo, H.C. Analysis and correction of compositional bias in sparse sequencing count data. BMC Genom. 2018, 19, 799. [Google Scholar] [CrossRef] [PubMed]
Brill, B.; Amir, A.; Heller, R. Testing for differential abundance in compositional counts data, with application to microbiome studies. Ann. Appl. Stat. 2022, 16, 2648–2671. [Google Scholar] [CrossRef]
Zhou, H.; He, K.; Chen, J.; Zhang, X. LinDA: Linear models for differential abundance analysis of microbiome compositional data. Genome Biol. 2022, 23, 95. [Google Scholar] [CrossRef]
Charlson, E.S.; Chen, J.; Custers-Allen, R.; Bittinger, K.; Li, H.; Sinha, R.; Hwang, J.; Bushman, F.D.; Collman, R.G. Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS ONE 2010, 5, e15216. [Google Scholar] [CrossRef]
Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Impact of Experimental Bias on Compositional Analysis of Microbiome Data

Abstract

1. Introduction

2. Methods

3. Results

4. Discussion

5. Key Points

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics