# MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data

## Abstract

## 1. Introduction

## 2. Model and Notation

#### 2.1. Model for Data without Zeros

#### 2.2. Model for Data with Zeros

#### 2.3. Mechanism for Observing Zeros of the Mediator

#### 2.4. Marginal Mediation Effect and Direct Effect

#### 2.5. Sequential Ignorability Assumption

## 3. Parameter Estimation

## 4. Simulation

#### 4.1. Simulation Setting 1: Univariate ZIB Distribution

#### 4.2. Simulation Setting 2: Multivariate Zero-Inflated Dirichlet-Multinomial Distribution

[…] NIE_{1} should be significant for ${M}_{1}$ and ${M}_{2}$, and NIE_{2} should be significant for ${M}_{2}$ in the analysis results of this simulation. This setting also mimicked the real study case where there were only two OTUs with significant NIE_{1}.

[…] NIE_{1} and NIE_{2} in terms of Recall (>77.5%), Precision (>87.2%) and F1 (>87.3%). MarZIC achieved the targeted Precision of 80% across all cases. CCMM performed well in terms of Recall, but its Precision rates (38.8–52.4%) were much lower than the targeted rate of 80%, which resulted in low F1 values (55.3–66.1%). This suboptimal performance is likely due to three factors: (a) CCMM was proposed to model the RA on the log scale, whereas Equation (12) is on the original scale of RA; (b) CCMM was not developed to incorporate the mediation effect of the binary variable ${1}_{({M}_{1}>0)}$; and (c) CCMM cannot handle interactions between the independent variable and mediators, such as $X{1}_{({M}_{1}>0)}$ in model (12). In addition, CCMM could not generate any results for scenarios with 300 or more taxa (see Table 2) because of computational issues, whereas MarZIC handled all cases. This is likely because the ${\ell}_{1}$ regularization algorithm in CCMM is too computationally demanding at such high dimensionality. IKT had good Precision rates (>99.7%) but low Recall rates (23.5–58.0%) compared with MarZIC, and thus also low F1 values.
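To make the relationship between the three metrics concrete, the following minimal sketch computes Recall, Precision and F1 for one simulated data set, assuming the standard definitions (Recall = TP/(TP + FN), Precision = TP/(TP + FP), F1 = their harmonic mean). The function name `selection_metrics` and the taxon indices are hypothetical and serve only as an illustration.

```python
def selection_metrics(selected, truth):
    """Return (recall, precision, f1) for a set of taxa flagged as mediators."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)                      # true positives
    recall = tp / len(truth) if truth else 0.0
    precision = tp / len(selected) if selected else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical replicate: taxa 1 and 2 are the true mediators,
# and a method flags five taxa in total.
recall, precision, f1 = selection_metrics(selected=[1, 2, 5, 7, 9], truth=[1, 2])
print(recall, precision, round(f1, 3))
```

In this toy replicate every true mediator is recovered, yet three false positives pull the F1 value well below the Recall, which is the same qualitative pattern reported above for CCMM. Metrics averaged over many replicates need not satisfy the harmonic-mean identity exactly.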

[…] NIE_{1} and one taxon having significant NIE_{2} and cases with 10 taxa having significant NIE_{1} and one taxon having significant NIE_{2}. The simulation results (see Table 3) also showed that MarZIC outperformed the other approaches. It had good recall rates for NIE_{1} (>85.3%) and NIE_{2} (>93%), and also achieved the target precision rate (80%) for both NIE_{1} and NIE_{2}, except that it was 77.10%, slightly lower than 80%, for the case with 300 taxa of which 10 taxa had significant NIE_{1}. Its F1 values were also good for both NIE_{1} (>79.6%) and NIE_{2} (>86.6%). CCMM had fair recall (>66.0%), but a much lower precision rate (19.0–66.2%) and therefore low F1 values (31.2–43.9%). IKT, on the other hand, achieved the target precision rate for all cases (>99.1%), but had a low recall rate (29.3–66.2%), and thus low F1 values (44.3–78.2%).

[…] NIE_{1} and the second taxon had non-zero NIE_{2}. The simulation results from 100 random data sets showed good performance for both NIE_{1} (Recall = 0.95, Precision = 0.96 and F1 = 0.94) and NIE_{2} (Recall = 1, Precision = 0.97 and F1 = 0.98).

## 5. Real Study Application

[…] NIE_{1} of two OTUs were found to be statistically significant. The first OTU was assigned to the family S24-7 under the order Bacteroidales and the second was assigned to the class Bacilli. The estimates of NIE_{1} were 0.27 (95% CI: 0.1, 0.42) and −1.28 (95% CI: −2.06, −0.49), respectively. The interpretation of these mediation effects is that the treatment had a marginal positive effect of 0.27 on the dysplasia score through changing the RA of the first OTU, and a marginal negative effect of −1.28 on the dysplasia score through changing the RA of the second OTU. The family S24-7 and the class Bacilli identified by our approach have also been reported to be related to colorectal cancer in the literature [44,45]. To give a full picture of the mediation effects in this data set, a heatmap based on p-values was constructed (see Figure 2) to illustrate the NIE_{1} of all OTUs. CCMM and IKT did not find any significant mediation effects of the OTUs.

## 6. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Marginal Association between Y and M_{j}

## Appendix B. Multivariate Delta Method for Obtaining 95% CI of NIE_{1}, NIE_{2}, NDE and CDE

NIE_{1}, NIE_{2}, NIE, NDE and CDE can be considered as functions of the full parameter vector $\zeta $. Let ${f}_{1}\left(\zeta \right)={\mathrm{NIE}}_{1}$ as derived in Section 2.4; then ${f}_{1}\left(\widehat{\zeta}\right)$ is the MLE of NIE_{1}, where $\widehat{\zeta}$ is the MLE of $\zeta $. We first calculate the observed Fisher information matrix, ${I}_{obs}=-\frac{{\partial}^{2}\ell}{\partial \zeta \partial {\zeta}^{\top}}{|}_{\zeta =\widehat{\zeta}}$, where $\ell$ is the log-likelihood function in Equation (11). By using the multivariate Delta method, we can calculate the variance of the estimator as follows:
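In the notation above, the standard first-order multivariate Delta method gives the approximation below. This is the textbook form, written under the assumptions that ${I}_{obs}$ is invertible and ${f}_{1}$ is differentiable at $\widehat{\zeta}$; it is offered as a reference expression rather than a quotation of the original display.

$$
\mathrm{var}\left({f}_{1}\left(\widehat{\zeta}\right)\right)\approx {\left(\frac{\partial {f}_{1}}{\partial \zeta}\right)}^{\top}{I}_{obs}^{-1}\left(\frac{\partial {f}_{1}}{\partial \zeta}\right){\Big|}_{\zeta =\widehat{\zeta}}
$$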

The 95% CI for NIE_{1} can be calculated as $({f}_{1}\left(\widehat{\zeta}\right)-{z}_{0.025}\sqrt{\mathrm{var}\left({f}_{1}\left(\widehat{\zeta}\right)\right)},\ {f}_{1}\left(\widehat{\zeta}\right)+{z}_{0.025}\sqrt{\mathrm{var}\left({f}_{1}\left(\widehat{\zeta}\right)\right)})$. The 95% CI for NIE_{2}, NDE and CDE can be calculated similarly.
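The calculation in this appendix can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the gradient is approximated by central differences rather than derived analytically, and the function `f_toy`, the parameter values and the information matrix are hypothetical stand-ins for ${f}_{1}$, $\widehat{\zeta}$ and ${I}_{obs}$.

```python
import numpy as np


def numerical_gradient(f, zeta, eps=1e-6):
    """Central-difference gradient of a scalar function f at zeta."""
    zeta = np.asarray(zeta, dtype=float)
    grad = np.zeros_like(zeta)
    for k in range(zeta.size):
        step = np.zeros_like(zeta)
        step[k] = eps
        grad[k] = (f(zeta + step) - f(zeta - step)) / (2 * eps)
    return grad


def delta_method_ci(f, zeta_hat, observed_info, z=1.959964):
    """Wald-type 95% CI for f(zeta) from the MLE and the observed information."""
    zeta_hat = np.asarray(zeta_hat, dtype=float)
    grad = numerical_gradient(f, zeta_hat)
    cov = np.linalg.inv(observed_info)   # estimated covariance of zeta_hat
    var = float(grad @ cov @ grad)       # Delta-method variance of f(zeta_hat)
    est = float(f(zeta_hat))
    half_width = z * np.sqrt(var)        # z is the upper 2.5% normal quantile
    return est - half_width, est + half_width


# Toy example with hypothetical numbers: f is a product of two parameters.
f_toy = lambda zeta: zeta[0] * zeta[1]
zeta_hat = np.array([0.4, 2.0])
info = np.array([[100.0, 0.0], [0.0, 25.0]])
lower, upper = delta_method_ci(f_toy, zeta_hat, info)
print(round(lower, 4), round(upper, 4))
```

An analytic gradient is preferable when it is available; the finite-difference version is used here only to keep the sketch independent of the specific form of the effect being estimated.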

## Appendix C. Microbiome Data Generation Process for Simulation Setting 2

**Step 1**: Generate true zeros for all taxa. The zeros for a taxon were generated using a Bernoulli distribution, $Ber(\Delta )$, with $\Delta $ given in Equation (7), so the probability of the taxon being 0 is equal to $\Delta $. For taxon 1, we set $\Delta =0$ so that there are no zeros in taxon 1. For taxon 2, we set ${\gamma}_{0}=1$ and ${\gamma}_{1}=-3$ in Equation (7) for $\Delta $. For all the other taxa, ${\gamma}_{0}$ was generated from $U(1,2)$ and ${\gamma}_{1}=0$ in Equation (7) for $\Delta $. Thus, only the absence (or presence) of taxon 2 was associated with X. The total percentage of zeros was between 68.8% and 81.6% with $K+1$ ranging from 10 to 500, which indicates high data sparsity.

**Step 2**: Generate RA for the non-zero taxa from a Dirichlet distribution. Assume we had P non-zero taxa (from Step 1) indexed by $({t}_{1},{t}_{2},\cdots ,{t}_{P})$ in ascending order, that is, ${t}_{1}<\cdots <{t}_{P}$. Here ${t}_{1}=1$ since the first taxon does not have any zeros from Step 1. The RA of those non-zero taxa was generated from the P-dimensional Dirichlet distribution with dispersion parameter $\varphi $ and mean parameters $({\mu}_{{t}_{1}},\cdots ,{\mu}_{{t}_{P}})$ that satisfy ${\sum}_{p=1}^{P}{\mu}_{{t}_{p}}=1$. The dispersion parameter $\varphi $ was set to 50 to mimic the over-dispersion in real data. The mean parameters were chosen to be smaller for taxa with larger $\Delta $’s in Step 1, so that taxa with lower abundance are more likely to have zeros. More specifically, the mean parameters were determined as follows:

**Step 3**: Generate sample absolute abundance (SAA) and false zeros from a multinomial distribution. Let $({\mathcal{R}}_{{t}_{1}},\cdots ,{\mathcal{R}}_{{t}_{P}})$ denote the RA generated in Step 2 for the P non-zero taxa and thus ${\sum}_{p=1}^{P}{\mathcal{R}}_{{t}_{p}}=1$. The P-dimensional multinomial distribution with the parameter vector $({\mathcal{R}}_{{t}_{1}},\cdots ,{\mathcal{R}}_{{t}_{P}})$ and the library size (randomly selected from real data) was used to generate SAA for all the P non-zero taxa. Those taxa with SAA = 0 generated from the multinomial distribution are false zeros.

**Step 4**: Get the final RA for all non-zero taxa. After the SAA were generated for all non-zero taxa in Step 3, they were divided by the library size to obtain the final RA for all non-zero taxa.

**Step 5**: Repeat the above Steps 1–4 for each subject to get a full data set of microbiome data for 200 subjects.
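Steps 1–5 can be sketched in code as below. This is an illustrative sketch under several assumptions that are not stated in this appendix: Equation (7) is taken to be a logistic model, $\mathrm{logit}(\Delta )={\gamma}_{0}+{\gamma}_{1}X$; the Dirichlet concentration is taken to be $\varphi \mu $; the mean parameters are set proportional to $1-\Delta $ (a placeholder for the formula in Step 2); X is binary; and the library size is fixed at 10,000 instead of being sampled from real data. The function name `simulate_subject` is hypothetical.

```python
import numpy as np


def simulate_subject(x, gamma0, gamma1, phi, library_size, rng):
    """Simulate the final RA vector of one subject (Steps 1-4)."""
    n_taxa = gamma0.size
    # Step 1: true zeros. ASSUMPTION: logistic form for Delta in Equation (7).
    delta = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * x)))
    delta[0] = 0.0                               # taxon 1 has no true zeros
    present = rng.random(n_taxa) >= delta        # P(absent) = delta
    idx = np.flatnonzero(present)
    # Step 2: Dirichlet RA for the non-zero taxa.
    # ASSUMPTION: means proportional to (1 - Delta), concentration = phi * mu.
    mu = (1.0 - delta[idx]) / np.sum(1.0 - delta[idx])
    ra_true = rng.dirichlet(phi * mu) if idx.size > 1 else np.array([1.0])
    # Step 3: multinomial counts (SAA); zero counts here are false zeros.
    saa = rng.multinomial(library_size, ra_true)
    # Step 4: final RA = SAA / library size; absent taxa stay at zero.
    ra = np.zeros(n_taxa)
    ra[idx] = saa / library_size
    return ra


# Step 5: repeat for 200 subjects (hypothetical settings, K + 1 = 10 taxa).
rng = np.random.default_rng(2022)
n_taxa = 10
gamma0 = np.concatenate(([0.0, 1.0], rng.uniform(1.0, 2.0, n_taxa - 2)))
gamma1 = np.zeros(n_taxa)
gamma1[1] = -3.0                                 # only taxon 2 depends on X
x = rng.integers(0, 2, size=200)                 # ASSUMPTION: binary X
data = np.vstack(
    [simulate_subject(xi, gamma0, gamma1, 50.0, 10_000, rng) for xi in x]
)
print(data.shape, round(float(np.mean(data == 0)), 3))
```

Each row of `data` sums to one because the counts are divided by the library size, and the printed proportion of zeros gives a quick check of the sparsity produced by the chosen settings.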

## References

1. Belkaid, Y.; Hand, T.W. Role of the microbiota in immunity and inflammation. Cell **2014**, 157, 121–141.
2. Wang, X.; Sun, G.; Feng, T.; Zhang, J.; Huang, X.; Wang, T.; Xie, Z.; Chu, X.; Yang, J.; Wang, H.; et al. Sodium oligomannate therapeutically remodels gut microbiota and suppresses gut bacterial amino acids-shaped neuroinflammation to inhibit Alzheimer’s disease progression. Cell Res. **2019**, 29, 787–803.
3. Jin, C.; Lagoudas, G.K.; Zhao, C.; Bullman, S.; Bhutkar, A.; Hu, B.; Ameh, S.; Sandel, D.; Liang, X.S.; Mazzilli, S.; et al. Commensal Microbiota Promote Lung Cancer Development via GammaDelta T Cells. Cell **2019**, 176, 998–1013.e16.
4. Tanoue, T.; Morita, S.; Plichta, D.R.; Skelly, A.N.; Suda, W.; Sugiura, Y.; Narushima, S.; Vlamakis, H.; Motoo, I.; Sugita, K.; et al. A defined commensal consortium elicits CD8 T cells and anti-cancer immunity. Nature **2019**, 565, 600–605.
5. Li, H. Statistical and Computational Methods in Microbiome and Metagenomics. Handb. Stat. Genom. **2018**.
6. Sohn, M.B.; Li, H. Compositional mediation analysis for microbiome studies. Ann. Appl. Stat. **2019**, 13, 661–681.
7. Wang, C.; Hu, J.; Blaser, M.J.; Li, H. Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data. Bioinformatics **2019**, 36, 347–355.
8. Zhang, H.; Chen, J.; Li, Z.; Liu, L. Testing for mediation effect with application to human microbiome data. Stat. Biosci. **2019**, in press.
9. VanderWeele, T.J. Marginal structural models for the estimation of direct and indirect effects. Epidemiology **2009**, 20, 18–26.
10. Imai, K.; Keele, L.; Tingley, D. A General Approach to Causal Mediation Analysis. Psychol. Methods **2010**, 15, 309–334.
11. VanderWeele, T.J. Explanation in Causal Inference: Methods for Mediation and Interaction; Oxford University Press: New York, NY, USA, 2015.
12. Baron, R.M.; Kenny, D.A. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic and statistical considerations. J. Personal. Soc. Psychol. **1986**, 51, 1173–1182.
13. MacKinnon, D.P. Introduction to Statistical Mediation Analysis; Erlbaum: New York, NY, USA, 2008.
14. MacKinnon, D.P.; Fairchild, A.J.; Fritz, M.S. Mediation analysis. Annu. Rev. Psychol. **2007**, 58, 593–614.
15. VanderWeele, T.J. Mediation Analysis: A Practitioner’s Guide. Annu. Rev. Public Health **2016**, 37, 17–32.
16. Lange, T.; Hansen, K.W.; Sørensen, R.; Galatius, S. Applied mediation analyses: A review and tutorial. Epidemiol. Health **2017**, 39, e2017035.
17. Dalrymple, M.L.; Hudson, I.L.; Ford, R.P.K. Finite mixture, zero-inflated Poisson and hurdle models with application to SIDS. Comput. Stat. Data Anal. **2003**, 41, 491–504.
18. Chai, H.; Jiang, H.; Lin, L.; Liu, L. A marginalized two-part Beta regression model for microbiome compositional data. PLoS Comput. Biol. **2018**, 14, e1006329.
19. Chen, E.Z.; Li, H. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics **2016**, 32, 2611–2617.
20. Tang, Z.Z.; Chen, G. Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis. Biostatistics **2018**, 20, 698–713.
21. Peng, X.; Li, G.; Liu, Z. Zero-Inflated Beta Regression for Differential Abundance Analysis with Metagenomics Data. J. Comput. Biol. **2016**, 23, 102–110.
22. Chen, J.; Li, H. Variable Selection for Sparse Dirichlet-Multinomial Regression with an Application to Microbiome Data Analysis. Ann. Appl. Stat. **2013**, 7, 418–442.
23. Martin, B.D.; Witten, D.; Willis, A.D. Modeling Microbial Abundances and Dysbiosis with Beta-Binomial Regression. Ann. Appl. Stat. **2020**, 14, 94–115.
24. Ferrari, S.; Cribari-Neto, F. Beta Regression for Modelling Rates and Proportions. J. Appl. Stat. **2004**, 31, 799–815.
25. Cribari-Neto, F.; Zeileis, A. Beta Regression in R. J. Stat. Softw. **2010**, 34, 24848.
26. Terhorst, H.J. On Stieltjes Integration in Euclidean-Space. J. Math. Anal. Appl. **1986**, 114, 57–74.
27. Efron, B.; Tibshirani, R. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Stat. Sci. **1986**, 1, 54–75.
28. Imai, K.; Keele, L.; Yamamoto, T. Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Stat. Sci. **2010**, 25, 51–71.
29. Pearl, J. Direct and indirect effects. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA, 26–29 August 2001; Breese, J., Koller, D., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 2001; pp. 411–420.
30. Robins, J. Semantics of causal DAG models and the identification of direct and indirect effects. In Proceedings of the Highly Structured Stochastic Systems; Green, P., Hjort, N., Richardson, S., Eds.; Oxford University Press: Oxford, UK, 2003; pp. 70–81.
31. Peterson, M.; Sinisi, S.; van der Laan, M. Estimation of Direct Causal Effects. Epidemiology **2006**, 17, 276–284.
32. Hafeman, D.M.; VanderWeele, T.J. Alternative Assumptions for the Identification of Direct and Indirect Effects. Epidemiology **2011**, 22, 753–764.
33. Tingley, D.; Yamamoto, T.; Hirose, K.; Keele, L.; Imai, K. mediation: R Package for Causal Mediation Analysis. 2017. Available online: https://cran.r-project.org/web/packages/mediation/vignettes/mediation.pdf (accessed on 6 June 2022).
34. Martinez, M.N.; Bartholomew, M.J. What does it “mean”? A review of interpreting and calculating different types of means and standard deviations. Pharmaceutics **2017**, 9, 14.
35. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A Practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B **1995**, 57, 289–300.
36. Gionchetti, P.; Rizzello, F.; Venturi, A.; Brigidi, P.; Matteuzzi, D.; Bazzocchi, G.; Poggioli, G.; Miglioli, M.; Campieri, M. Oral bacteriotherapy as maintenance treatment in patients with chronic pouchitis: A double-blind, placebo-controlled trial. Gastroenterology **2000**, 119, 305–309.
37. Sood, A.; Midha, V.; Makharia, G.K.; Ahuja, V.; Singal, D.; Goswami, P.; Tandon, R.K. The probiotic preparation, VSL#3 induces remission in patients with mild-to-moderately active ulcerative colitis. Clin. Gastroenterol. Hepatol. **2009**, 7, 1202–1209.
38. Madsen, K.; Cornish, A.; Soper, P.; McKaigney, C.; Jijon, H.; Yachimec, C.; Doyle, J.; Jewell, L.; De Simone, C. Probiotic bacteria enhance murine and human intestinal epithelial barrier function. Gastroenterology **2001**, 121, 580–591.
39. Pagnini, C.; Saeed, R.; Bamias, G.; Arseneau, K.O.; Pizarro, T.T.; Cominelli, F. Probiotics promote gut health through stimulation of epithelial innate immunity. Proc. Natl. Acad. Sci. USA **2010**, 107, 454–459.
40. Arthur, J.C.; Gharaibeh, R.Z.; Uronis, J.M.; Perez-Chanona, E.; Sha, W.; Tomkovich, S.; Mühlbauer, M.; Fodor, A.A.; Jobin, C. VSL#3 probiotic modifies mucosal microbial composition but does not reduce colitis-associated colorectal cancer. Sci. Rep. **2013**, 3, 2868.
41. Caporaso, J.G.; Kuczynski, J.; Stombaugh, J.; Bittinger, K.; Bushman, F.D.; Costello, E.K.; Fierer, N.; Pena, A.G.; Goodrich, J.K.; Gordon, J.I.; et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods **2010**, 7, 335.
42. Bokulich, N.A.; Subramanian, S.; Faith, J.J.; Gevers, D.; Gordon, J.I.; Knight, R.; Mills, D.A.; Caporaso, J.G. Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat. Methods **2013**, 10, 57.
43. Wang, Q.; Garrity, G.M.; Tiedje, J.M.; Cole, J.R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. **2007**, 73, 5261–5267.
44. Peters, B.A.; Dominianni, C.; Shapiro, J.A.; Church, T.R.; Wu, J.; Miller, G.; Yuen, E.; Freiman, H.; Lustbader, I.; Salik, J.; et al. The gut microbiota in conventional and serrated precursors of colorectal cancer. Microbiome **2016**, 4, 69.
45. Bråten, L.S.; Sødring, M.; Paulsen, J.E.; Snipen, L.G.; Rudi, K. Cecal microbiota association with tumor load in a colorectal cancer mouse model. Microb. Ecol. Health Dis. **2017**, 28, 1352433.
46. Gianola, D. Least-Squares Means Vs Population Marginal Means. Am. Stat. **1982**, 36, 65–66.
47. Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2014; Volume 333.
48. Aitchison, J. The Statistical Analysis of Compositional Data. J. R. Stat. Soc. Ser. B-Stat. Methodol. **1982**, 44, 139–177.

**Figure 2.** Heatmap of mediation strength based on NIE_{1} in the VSL#3 study. The mediation strength is measured by (1 − p), where p is the unadjusted p-value. A negative sign indicates a negative NIE_{1}. Taxonomic assignment is labeled on the vertical axis. Samples are labeled on the horizontal axis. Absence of an OTU in a sample is left blank in the heatmap.

**Table 1.** Simulation results for comparison between MarZIC and IKT with sample size of $n=200$. The bias, the percentage bias, the empirical standard error, the mean of the estimated standard errors and the empirical coverage probability of the $95\%$ CI for each estimator are reported under the columns Bias, Bias %, SE, Mean SE and CP (%), respectively. Mediation effects from the IKT approach are provided in the bottom part of the table.

Low RA = low relative abundance (mean = 0.0025); High RA = high relative abundance (mean = 0.5).

Parameter/Effect | Low RA: True | Low RA: Mean Estimate | Low RA: Bias | Low RA: Bias % | Low RA: SE | Low RA: Mean SE | Low RA: CP (%) | High RA: True | High RA: Mean Estimate | High RA: Bias | High RA: Bias % | High RA: SE | High RA: Mean SE | High RA: CP (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
**MarZIC** | | | | | | | | | | | | | | |
NIE_{1} | 0.10 | 0.11 | 0.01 | 10.0 | 0.08 | 0.07 | 91 | 9.30 | 9.11 | −0.18 | −1.98 | 2.68 | 2.70 | 96 |
NIE_{2} | 0.55 | 0.52 | −0.03 | −5.67 | 0.55 | 0.56 | 97 | 0.55 | 0.50 | −0.06 | −10.15 | 0.62 | 0.56 | 94 |
NIE | 0.65 | 0.63 | −0.02 | −3.31 | 0.58 | 0.58 | 96 | 9.85 | 9.61 | −0.24 | −2.44 | 3.25 | 3.20 | 95 |
${\beta}_{0}$ | −2.00 | −2.05 | −0.05 | −2.45 | 0.32 | 0.33 | 96 | −2.00 | −1.92 | 0.07 | 3.82 | 0.32 | 0.29 | 94 |
${\beta}_{1}$ | 100.00 | 101.89 | 1.89 | 1.89 | 18.04 | 19.04 | 97 | 100.00 | 99.96 | −0.04 | −0.04 | 1.89 | 1.74 | 91 |
${\beta}_{2}$ | 4.00 | 4.05 | 0.05 | 1.37 | 0.38 | 0.36 | 94 | 4.00 | 3.93 | −0.07 | −1.73 | 0.58 | 0.57 | 91 |
${\beta}_{3}$ | 5.00 | 5.08 | 0.08 | 1.53 | 0.53 | 0.51 | 94 | 5.00 | 4.97 | −0.03 | −0.62 | 0.46 | 0.46 | 99 |
${\beta}_{4}$ | 3.00 | 2.93 | −0.07 | −2.40 | 0.58 | 0.55 | 92 | 3.00 | 3.02 | 0.02 | 0.55 | 0.53 | 0.54 | 99 |
$\delta $ | 1.00 | 0.99 | −0.01 | −1.00 | 0.07 | 0.07 | 90 | 1.00 | 0.97 | −0.03 | −2.99 | 0.07 | 0.07 | 89 |
${\alpha}_{0}$ | −6.20 | −6.24 | −0.04 | −0.69 | 0.36 | 0.36 | 94 | −1.00 | −1.01 | −0.01 | −0.93 | 0.05 | 0.05 | 90 |
${\alpha}_{1}$ | 0.40 | 0.42 | 0.02 | 5.52 | 0.33 | 0.29 | 92 | 0.40 | 0.41 | 0.01 | 1.69 | 0.06 | 0.07 | 95 |
$\xi $ | 50.00 | 56.42 | 6.42 | 12.83 | 24.21 | 19.35 | 97 | 50.00 | 53.37 | 3.37 | 6.74 | 8.22 | 8.40 | 96 |
${\gamma}_{0}$ | −1.16 | −1.23 | −0.07 | −5.75 | 0.35 | 0.36 | 99 | −1.16 | −1.20 | −0.04 | −3.18 | 0.37 | 0.34 | 95 |
${\gamma}_{1}$ | −0.50 | −0.53 | −0.03 | −5.10 | 0.55 | 0.55 | 97 | −0.50 | −0.47 | 0.03 | 6.91 | 0.58 | 0.53 | 91 |
**IKT** | | | | | | | | | | | | | | |
NIE | 0.65 | 0.10 | −0.55 | −84.81 | - | - | 9 | 9.85 | 9.20 | −0.65 | −6.62 | - | - | 94 |

**Table 2.** Simulation results for the comparison of MarZIC with CCMM and IKT. Here n denotes the sample size and $K+1$ denotes the number of taxa.

$K+1$ | n | Recall (%): MarZIC (NIE_{1}) | Recall (%): MarZIC (NIE_{2}) | Recall (%): CCMM | Recall (%): IKT | Precision (%): MarZIC (NIE_{1}) | Precision (%): MarZIC (NIE_{2}) | Precision (%): CCMM | Precision (%): IKT | F1 (%): MarZIC (NIE_{1}) | F1 (%): MarZIC (NIE_{2}) | F1 (%): CCMM | F1 (%): IKT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 200 | 99.00 | 100.00 | 100.00 | 58.00 | 97.70 | 98.00 | 38.80 | 99.70 | 97.90 | 98.60 | 55.30 | 68.10 |
25 | 200 | 99.50 | 100.00 | 96.00 | 39.50 | 98.20 | 99.50 | 52.40 | 100.00 | 98.50 | 99.60 | 66.10 | 48.30 |
50 | 200 | 97.50 | 100.00 | 97.00 | 44.00 | 100.00 | 100.00 | 46.40 | 100.00 | 98.30 | 100.00 | 60.60 | 54.70 |
100 | 200 | 96.00 | 98.90 | 100.00 | 32.50 | 95.50 | 100.00 | 42.80 | 100.00 | 94.50 | 98.90 | 58.00 | 41.30 |
300 | 200 | 86.00 | 97.80 | - | 25.00 | 90.80 | 99.50 | - | 100.00 | 85.80 | 97.50 | - | 31.30 |
500 | 200 | 77.50 | 94.70 | - | 23.50 | 97.80 | 87.20 | - | 99.00 | 83.00 | 87.30 | - | 30.00 |

$K+1$ | Number of Taxa with Non-Zero NIE_{1} | Recall (%): MarZIC (NIE_{1}) | Recall (%): MarZIC (NIE_{2}) | Recall (%): CCMM | Recall (%): IKT | Precision (%): MarZIC (NIE_{1}) | Precision (%): MarZIC (NIE_{2}) | Precision (%): CCMM | Precision (%): IKT | F1 (%): MarZIC (NIE_{1}) | F1 (%): MarZIC (NIE_{2}) | F1 (%): CCMM | F1 (%): IKT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | 5 | 95.00 | 100.00 | 89.00 | 66.20 | 99.00 | 98.50 | 27.90 | 99.60 | 96.60 | 99.00 | 42.20 | 78.20 |
50 | 10 | 95.70 | 92.00 | 66.00 | 62.40 | 98.80 | 91.80 | 33.20 | 99.60 | 97.10 | 86.20 | 43.90 | 75.70 |
100 | 5 | 96.60 | 99.00 | 89.40 | 60.60 | 92.70 | 98.30 | 19.00 | 99.10 | 94.10 | 97.80 | 31.20 | 73.30 |
100 | 10 | 92.10 | 91.00 | 80.10 | 46.00 | 93.70 | 97.80 | 27.20 | 100.00 | 92.50 | 89.50 | 40.40 | 61.20 |
300 | 5 | 94.20 | 96.00 | - | 56.10 | 80.50 | 97.00 | - | 99.70 | 85.20 | 94.00 | - | 69.90 |
300 | 10 | 85.30 | 93.00 | - | 29.30 | 77.10 | 91.00 | - | 99.60 | 79.60 | 86.60 | - | 43.40 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wu, Q.; O’Malley, J.; Datta, S.; Gharaibeh, R.Z.; Jobin, C.; Karagas, M.R.; Coker, M.O.; Hoen, A.G.; Christensen, B.C.; Madan, J.C.;
et al. MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data. *Genes* **2022**, *13*, 1049.
https://doi.org/10.3390/genes13061049
