# Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data


## Abstract


## 1. Introduction

- We introduce an aggregation scheme that provably retains the original methods’ guarantees—see Theorem 1.
- We show numerically that the aggregation can increase the original methods’ power—see Section 3.1 and Section 3.2.
- We show that the resulting pipelines for FDR control can be readily applied to empirical data and lead to new discoveries—see Section 3.3.
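The quantities behind these claims can be illustrated with a minimal sketch. It assumes, as one plausible reading that this part of the text does not spell out, that aggregation takes the union of the index sets selected by $k$ separate knockoff runs, each run at its own level $q_i$ with $q_1 + \dots + q_k = q$. The helper names `split_level`, `aggregate_selections`, and `fdp_and_power` are hypothetical and are not taken from the paper or from any library; the equal split of $q$ is only one possible choice.

```python
from typing import Iterable, List, Set, Tuple


def split_level(q: float, k: int) -> List[float]:
    """Hypothetical helper: split a target FDR level q into k equal levels."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [q / k] * k


def aggregate_selections(selections: Iterable[Set[int]]) -> Set[int]:
    """Assumed aggregation rule: union of the index sets selected by each run."""
    aggregated: Set[int] = set()
    for selected in selections:
        aggregated |= set(selected)
    return aggregated


def fdp_and_power(selected: Set[int], true_support: Set[int]) -> Tuple[float, float]:
    """Empirical false discovery proportion and power of one selection.

    FDP   = (# selected but not in the true support) / max(# selected, 1)
    power = (# selected and in the true support) / max(size of true support, 1)
    """
    false_discoveries = len(selected - true_support)
    true_discoveries = len(selected & true_support)
    fdp = false_discoveries / max(len(selected), 1)
    power = true_discoveries / max(len(true_support), 1)
    return fdp, power


# Toy illustration with made-up index sets (not data from the paper).
levels = split_level(0.1, 4)
runs = [{0, 1}, {1, 2}, {5}]
aggregated = aggregate_selections(runs)
fdp, power = fdp_and_power(aggregated, true_support={0, 1, 2, 3})
```

Averaging the FDP over many simulated data sets gives an empirical estimate of the FDR, which is the kind of quantity a simulation study compares between the aggregated and the original procedure.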

## 2. Methods and Theory

#### 2.1. A Brief Introduction to the Knockoff Filter

#### 2.2. Aggregating Knockoffs

**Theorem 1.**

**Proof of Theorem 1.**

#### 2.3. Other Approaches

## 3. Simulations and a Real Data Analysis

#### 3.1. Simulation 1: Linear Regression

#### 3.2. Simulation 2: Logistic Regression

#### 3.3. Influence of the Gut Microbiome on Obesity

## 4. Discussion

## Appendix A. Additional Explanations

#### Appendix A.1. Further Simulations for Comparison to Multiple Knockoffs (MKO)

**Figure A1.** Our approach, AKO (solid, orange circles), has a similar FDR to the standard KO (hollow, purple circles) but has more power. MKO (solid, blue squares) is more conservative than AKO and has lower power.

#### Appendix A.2. Choice of $q_{1},\dots,q_{k}$

#### Appendix A.3. Various Settings for the Simulation Part

**Figure A3.** Our approach, AKO (solid, orange circles), has a similar FDR to the standard KO (hollow, purple circles) but has more power.

#### Appendix A.4. Better Than Other Competitors (under the AGP Data)

**Table A1.** Bacterial phyla selected by four methods: BH, TreeFDR, KO, and AKO (corresponds to Table 1 (i)).

(i) ALL | |||
---|---|---|---|

BH | TreeFDR | KO | AKO |

Actinobacteria | Actinobacteria | ||

Bacteroidetes | |||

Cyanobacteria | Cyanobacteria | Cyanobacteria | |

Proteobacteria | Proteobacteria | ||

Spirochaetes | |||

Synergistetes | Synergistetes | ||

Tenericutes | Tenericutes | ||

Verrucomicrobia | Verrucomicrobia | Verrucomicrobia |

## Appendix B. Additional Results at the Genus Rank

**Table A2.** Analysis at the genus level for the grouping (ii) uw + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Collinsella | Collinsella |

Firmicutes | Lachnospira | |

Acidaminococcus | ||

Catenibacterium | ||

Tenericutes | RF39 | RF39 |

**Table A3.** Analysis at the genus level for the grouping (iii) nor + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Actinomyces | Actinomyces |

Collinsella | Collinsella | |

Cyanobacteria | YS2 | YS2 |

Firmicutes | Bacillus | Bacillus |

Lactococcus | ||

Lachnospira | Lachnospira | |

Ruminococcus | Ruminococcus | |

Acidaminococcus | Acidaminococcus | |

Megasphaera | Megasphaera | |

Mogibacteriaceae | ||

Erysipelotrichaceae | ||

Catenibacterium | Catenibacterium | |

Proteobacteria | RF32 | RF32 |

Haemophilus | ||

Tenericutes | RF39 | RF39 |

ML615J-28 |

**Table A4.** Analysis at the genus level for the grouping (iv) ow + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Eggerthella | Eggerthella |

Cyanobacteria | YS2 | YS2 |

Streptophyta | Streptophyta | |

Firmicutes | Bacillus | |

Clostridium | Clostridium | |

Lachnospira | Lachnospira | |

Acidaminococcus | Acidaminococcus | |

1-68 | ||

Erysipelotrichaceae | Erysipelotrichaceae | |

Catenibacterium | ||

Proteobacteria | Haemophilus | Haemophilus |

**Table A5.** Analysis at the genus level for the grouping (v) uw + nor + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Actinomyces | Actinomyces |

Collinsella | Collinsella | |

Cyanobacteria | YS2 | YS2 |

Firmicutes | Bacillus | Bacillus |

Lactococcus | ||

Lachnospira | Lachnospira | |

Ruminococcus | Ruminococcus | |

Acidaminococcus | Acidaminococcus | |

Megasphaera | Megasphaera | |

Mogibacteriaceae | ||

SHA-98 | ||

Erysipelotrichaceae | ||

Catenibacterium | Catenibacterium | |

Proteobacteria | RF32 | RF32 |

Haemophilus | ||

Tenericutes | RF39 | RF39 |

ML615J-28 |

**Table A6.** Analysis at the genus level for the grouping (vi) uw + ow + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Eggerthella | Eggerthella |

Cyanobacteria | YS2 | YS2 |

Streptophyta | Streptophyta | |

Firmicutes | Bacillus | Bacillus |

Lactobacillus | ||

Clostridium | Clostridium | |

Lachnospira | Lachnospira | |

Veillonellaceaes | ||

Acidaminococcus | Acidaminococcus | |

1-68 | 1-68 | |

Erysipelotrichaceae | Erysipelotrichaceae | |

Catenibacterium | Catenibacterium | |

Eubacterium | Eubacterium | |

Proteobacteria | RF32 | |

Haemophilus | Haemophilus |

**Table A7.** Analysis at the genus level for the grouping (vii) nor + ow + ob. AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Actinomyces | Actinomyces |

Collinsella | Collinsella | |

Eggerthella | Eggerthella | |

Cyanobacteria | YS2 | YS2 |

Firmicutes | Bacillus | Bacillus |

Lachnospira | Lachnospira | |

Ruminococcus | Ruminococcus | |

Acidaminococcus | Acidaminococcus | |

Megasphaera | Megasphaera | |

Erysipelotrichaceae | Erysipelotrichaceae | |

Catenibacterium | Catenibacterium | |

Proteobacteria | RF32 | RF32 |

Haemophilus | Haemophilus | |

Tenericutes | RF39 |

**Figure 1.** Our approach, AKO (solid, orange circles), has a similar FDR to the standard KO (hollow, purple circles) but has more power.

**Figure 2.** Our approach, AKO (solid, orange circles), has a similar FDR to the standard KO (hollow, purple circles) but has more power.

**Table 1.** Bacterial phyla selected by our pipeline (AKO) and the original pipeline (KO) at FDR level $q=0.1$ for seven groupings. AKO consistently selects more phyla than KO.

(i) all | (ii) uw + ob | ||

KO | AKO | KO | AKO |

Actinobacteria | Actinobacteria | Actinobacteria | Actinobacteria |

Bacteroidetes | |||

Cyanobacteria | Cyanobacteria | Cyanobacteria | |

Firmicutes | |||

Proteobacteria | Proteobacteria | ||

Spirochaetes | |||

Synergistetes | Synergistetes | Synergistetes | |

Tenericutes | Tenericutes | Tenericutes | Tenericutes |

Verrucomicrobia | |||

(iii) nor + ob | (iv) ow + ob | ||

KO | AKO | KO | AKO |

Actinobacteria | Actinobacteria | Actinobacteria | |

Bacteroidetes | Bacteroidetes | ||

Cyanobacteria | Cyanobacteria | Cyanobacteria | Cyanobacteria |

Firmicutes | |||

Lentisphaerae | |||

Proteobacteria | Proteobacteria | Proteobacteria | |

Spirochaetes | Spirochaetes | ||

Synergistetes | Synergistetes | Synergistetes | |

TM7 | |||

Tenericutes | Tenericutes | Tenericutes | Tenericutes |

Verrucomicrobia | |||

Thermi | |||

(v) uw + nor + ob | (vi) uw + ow + ob | ||

KO | AKO | KO | AKO |

Actinobacteria | Actinobacteria | Actinobacteria | |

Bacteroidetes | Bacteroidetes | ||

Cyanobacteria | Cyanobacteria | Cyanobacteria | Cyanobacteria |

Firmicutes | |||

Lentisphaerae | |||

Proteobacteria | Proteobacteria | Proteobacteria | |

Spirochaetes | Spirochaetes | ||

Synergistetes | Synergistetes | Synergistetes | |

TM7 | |||

Tenericutes | Tenericutes | Tenericutes | Tenericutes |

(vii) nor+ow+ob | |||

KO | AKO | ||

Actinobacteria | |||

Bacteroidetes | |||

Cyanobacteria | Cyanobacteria | ||

Proteobacteria | Proteobacteria | ||

Spirochaetes | |||

Synergistetes | Synergistetes | ||

Tenericutes | Tenericutes | ||

Verrucomicrobia |

**Table 2.** Bacterial genera selected by our pipeline (AKO) and the original pipeline (KO) at FDR level $q=0.1$ for the grouping ALL (cf. (i) in Table 1). AKO selects more genera than the original KO.

Phylum | KO | AKO |
---|---|---|

Actinobacteria | Actinomyces | Actinomyces |

Collinsella | Collinsella | |

Eggerthella | Eggerthella | |

Cyanobacteria | YS2 | YS2 |

Streptophyta | ||

Firmicutes | Bacillus | Bacillus |

Lactobacillus | ||

Lactococcus | Lactococcus | |

Clostridium | ||

Lachnospira | Lachnospira | |

Ruminococcus | Ruminococcus | |

Peptostreptococcaceae | ||

Acidaminococcus | Acidaminococcus | |

Megasphaera | Megasphaera | |

Mogibacteriaceae | ||

Erysipelotrichaceae | Erysipelotrichaceae | |

Catenibacterium | Catenibacterium | |

Proteobacteria | RF32 | RF32 |

Haemophilus | Haemophilus | |

Tenericutes | RF39 |

