Sex-Specific Associations between Gut Microbiome and Non-Alcoholic Fatty Liver Disease among Urban Chinese Adults

Non-alcoholic fatty liver disease (NAFLD) has been linked to altered gut microbiome; however, evidence from large population-based studies is limited. We compared gut microbiome profiles of 188 male and 233 female NAFLD cases with 571 male and 567 female controls from two longitudinal studies of urban Chinese adults. History of NAFLD was assessed during surveys administered in 2004–2017. Microbiota were assessed using 16S rRNA sequencing of stool samples collected in 2015–2018. Associations of NAFLD with microbiome diversity and composition were evaluated by generalized linear or logistic regression models. Compared with controls, male cases had lower microbial α-diversity, higher abundance of genera Dialister and Streptococcus and Bifidobacterium species, lower abundance of genus Phascolarctobacterium, and lower prevalence of taxa including order RF39 (all p < 0.05). In contrast, female cases had higher α-diversity, higher abundance of genus Butyricimonas and a family of order Clostridiales, lower abundance of Dialister and Bifidobacterium species, and higher prevalence of RF39. Significant NAFLD–sex interactions were found for α-diversity and above taxa (all false discovery rate < 0.1). In conclusion, we observed sex-specific gut microbiome features related to history of NAFLD. Further studies are needed to validate our findings and evaluate the health effects of NAFLD-related gut microbiota.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is a highly prevalent metabolic disease, defined as ≥5% hepatic steatosis, not caused by excessive alcohol consumption or other secondary conditions such as viral hepatitis or hereditary liver diseases [1]. The estimated global prevalence of NAFLD was 25%, which varied significantly across countries from 4% to 41% [2]. In China, the prevalence of NAFLD has doubled in the past 20 years with a nationwide prevalence of 29% estimated in 2019; meanwhile, the prevalence was 33% in males and 22% in females [3]. The pathophysiology of NAFLD is complex; however, the gut-liver axis, i.e., the bidirectional relationship of the gut and its microbiota with the liver, has attracted increasing attention [4,5]. Gut microbiota can be involved in NAFLD development and progression through several mechanisms, including changing intestine permeability, changing energy harvest from diet, affecting lipogenesis and choline and bile acid metabolism, producing ethanol in the intestine, and linking to inflammation [4,6,7]. Recent animal models and human studies have linked gut dysbiosis with NAFLD [4,6]. Experiments using gut microbiota transplantation to germ-free mice showed that gut microbiota determine the development of NAFLD independent of obesity [8]. In addition, inflammasome-mediated gut dysbiosis was shown to be involved in NAFLD progression to non-alcoholic steatohepatitis (NASH) [9]. In human studies, as summarized by Safari and Gerard [4], several case-control studies have shown altered abundance of fiber-fermenting and inflammation-modulating bacteria, including Dorea, Lactobacillus, and Ruminococcus, in NAFLD patients compared with healthy controls. Increased abundance of genus Bacteroides and decreased Prevotella levels have been found in NASH compared with NAFLD patients, and Ruminococcus abundance increased in patients in fibrosis stage F ≥ 2 [10]. 
Furthermore, a random forest model comprising predominantly gut bacterial features showed strong diagnostic precision in detecting advanced fibrosis in NAFLD patients [11,12]. However, most previous studies had small sample sizes or inadequately controlled for potential confounding factors such as diet and lifestyle, and findings regarding individual taxa associations remain limited and inconsistent [4,6,13].
In the present study, we used resources from two large prospective cohorts of middle-aged to older urban Chinese adults and compared gut microbial diversity and taxonomic composition among over 1500 adults with or without a history of NAFLD. Such comparisons may help better understand the gut-liver axis and identify potentially important gut bacteria that may play a role in NAFLD development and progression, and thus offer innovative options for prevention and treatment of this leading liver disease.

Study Population
Participants of this study were selected from two population-based cohort studies, the Shanghai Women's Health Study (SWHS) and Shanghai Men's Health Study (SMHS). The designs and methods of the SWHS and SMHS have been described in detail elsewhere [14,15]. Briefly, the SWHS recruited 74,941 women aged 40-70 years between 1996 and 2000 from urban communities in Shanghai, China, with a response rate of 92.7% [14]. The SMHS recruited 61,480 men aged 40-70 years between 2002 and 2006 from the same communities, with a response rate of 74.0% [15]. In-person interviews were conducted at baseline to collect sociodemographic data, disease history, diet/lifestyles, and anthropometrics; biospecimens were also collected, including blood, urine, and/or oral rinse samples. Participants were followed up through in-person surveys every 2-4 years (response rates > 92%) with supplemental annual record linkages to Shanghai Vital Statistics and Shanghai Cancer Registry (completion rates > 99%) to collect information on the occurrence of cancer and other chronic diseases including liver diseases, as well as to update information on diet, lifestyle, and anthropometrics. Informed consent was obtained from all study participants. A participant inclusion/exclusion flow chart for the present study is shown in Supplementary Figure S1 and described in detail below.

NAFLD Assessment
Information on fatty liver diagnosis and ultrasound examination was collected during follow-up surveys conducted between 2004 and 2017 (the 3rd to 5th in-person visits of the SWHS and the 2nd and 3rd visits of the SMHS). In each survey, participants were asked whether they had been diagnosed with fatty liver disease by a physician (if yes, the time of diagnosis) and whether they had undergone an abdominal ultrasound. Given that NAFLD is usually asymptomatic, to reduce potential misclassification of NAFLD status, we limited our analysis to participants who had an abdominal ultrasound and answered the fatty liver question. In addition, we only included participants who had no history of viral hepatitis and zero to moderate alcohol consumption (≤1 drink/day for women and ≤2 drinks/day for men; 1 drink = 14 g ethanol), using data from baseline and follow-up surveys.
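The eligibility rule described above can be written as a short check. This is only an illustrative sketch: the function name, argument names, and the per-participant inputs are hypothetical and are not taken from the study's data dictionary; only the thresholds (≤1 drink/day for women, ≤2 drinks/day for men, 1 drink = 14 g ethanol) come from the text.

```python
GRAMS_PER_DRINK = 14.0  # 1 drink = 14 g ethanol, as stated in the text
MAX_DRINKS_PER_DAY = {"female": 1.0, "male": 2.0}  # moderate-intake limits from the text


def is_eligible(sex, ethanol_g_per_day, viral_hepatitis, had_ultrasound, answered_fatty_liver):
    """Return True if a participant meets the NAFLD-analysis inclusion criteria.

    All argument names are hypothetical placeholders for survey variables.
    """
    drinks_per_day = ethanol_g_per_day / GRAMS_PER_DRINK
    return (
        had_ultrasound
        and answered_fatty_liver
        and not viral_hepatitis
        and drinks_per_day <= MAX_DRINKS_PER_DAY[sex]
    )
```

For instance, a woman reporting 14 g of ethanol per day with an ultrasound and no viral hepatitis would pass the check, whereas one reporting 15 g per day would not.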

Stool Sample Collection and 16S rRNA Gene Sequencing
Stool sample collections were carried out in both cohort studies between 2015 and 2018 (the 5th visit of the SWHS and the 3rd visit of the SMHS). Stool samples were collected from a total of 10,655 participants (5526 women and 5129 men) using the 95% ethanol method, as described in detail in our previous publication [16]. At the time of stool collection, participants were also asked for the date and time of stool collection, antibiotic and medication use in the past 7 days and 6 months, and whether they had diarrhea in the last 7 days. Stool samples were shipped to the laboratory within 24 h after collection and stored at −80 °C.
Stool sample DNA of 3358 study participants was isolated using QIAGEN's DNeasy PowerSoil kit (Germantown, MD, USA). Sequencing libraries were prepared using NEXTflex 16S V4 Amplicon-Seq Kit (Bioo Scientific 4201-05, Austin, TX, USA). The 16S rRNA gene sequencing was performed at paired-end 250 bp using Illumina HiSeq System. For each 96-well plate, one negative control sample (distilled water) was included. The protocols for sequencing data processing and quality controls were published elsewhere [17]. Briefly, raw sequencing data were trimmed and filtered to remove low-quality bases and reads by using Sickle. BayesHammer was utilized to correct sequencing errors and PANDAseq to stitch paired-end reads. Clean reads were then clustered into Operational Taxonomic Units (OTUs) at 97% sequence identity using the closed reference OTU picking strategy, with Greengenes [18] as reference, via the taxonomy classification function "mothur" [19] implemented in Quantitative Insights into Microbial Ecology (QIIME) v1.9.1 [20].
As described previously, we obtained 16S rRNA sequencing data from 3194 participants after quality control procedures [16]. Among them, 2358 participants had information on NAFLD history and abdominal ultrasound. For the current study, we further excluded participants who used antibiotics or had diarrhea in the past 7 days before stool collection (n = 81) and those who were ever diagnosed with or self-reported diseases likely to affect the gut microbiome, including any cancer (n = 46), diabetes (n = 183), stroke (n = 366), or coronary heart disease (n = 234) at baseline or during follow-up. A total of 1559 adults, including 759 men and 800 women, were included in the final analysis.

Statistical Analysis
The analyses were conducted in men and women separately and in a combined dataset adjusting for sex. The sequencing reads per sample ranged between 17,013 and 244,929, with a mean of 134,520. We rarefied the OTU table using the minimal sequencing depth and estimated observed bacterial numbers and α-diversity indices, including Chao1, Shannon, and phylogenetic diversity (PD_whole_tree). A linear regression model was used to evaluate the differences in α-diversity between NAFLD cases and controls. Association between NAFLD and genus level Bray-Curtis β-diversity was evaluated using permutational multivariate analysis of variance (PERMANOVA) with the adonis2 function in R package vegan [21].
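To make the diversity measures above concrete, the following is a minimal pure-Python sketch of rarefaction, observed OTUs, the Shannon index, Chao1, and Bray-Curtis dissimilarity. It is not the QIIME or vegan implementation used in the study. Two choices here are assumptions: the Shannon index is computed with log base 2, and Chao1 uses the bias-corrected form. Phylogenetic diversity (PD_whole_tree) is omitted because it requires a phylogenetic tree.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a vector of OTU counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the total number of reads")
    # Expand the counts into one OTU label per read, then draw without replacement.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(random.Random(seed).sample(reads, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2.0):
    """Shannon index; base 2 is an assumption, not confirmed by the source."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed_otus(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))
```

In a workflow like the one described, each sample would first be rarefied to the minimum depth (17,013 reads) and the indices would then be computed on the rarefied counts.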
The presence of individual taxa was defined as their relative abundance ≥0.00588% in a sample (i.e., ≥1 read when there were 17,013 reads, the minimum sequencing depth of our samples). Common taxa were defined if present in (carrier frequency) >50% of control participants; rare taxa were defined if present in 10-50% of control participants; taxa present in <10% of control participants were excluded from analyses. For common taxa, sequencing counts for each taxon were normalized using centered log-ratio transformation after adding 1 as a pseudo-count [22,23]. General linear regression models were used to evaluate associations of NAFLD with each taxon. Logistic regression was used to evaluate associations between NAFLD and the presence (yes/no) of rare taxa. Potential confounders were adjusted for in two models: the basic model included age at stool collection, sex (for combined analysis), the season of stool collection, education, income, and sequencing batch; the full model further included body mass index (BMI), waist-to-hip ratio (WHR), smoking status, alcohol drinking status, physical activity, total energy intake, dietary fat intake, bowel movement frequency, history of hypertension, and history of dyslipidemia.
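The taxon definitions and the normalization step above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the presence threshold is taken as exactly 1/17,013 (which the text rounds to 0.00588%), the function names are hypothetical, and the centered log-ratio is computed over whatever count vector is passed in.

```python
import math

MIN_DEPTH = 17_013  # minimum sequencing depth reported in the text


def is_present(taxon_reads, total_reads, min_depth=MIN_DEPTH):
    """Presence means relative abundance >= 1 / min_depth (about 0.00588%)."""
    # Integer comparison avoids floating-point rounding at the threshold.
    return taxon_reads * min_depth >= total_reads


def classify_taxon(carrier_frequency):
    """Map carrier frequency among controls (0 to 1) to the analysis group."""
    if carrier_frequency > 0.5:
        return "common"    # analyzed by abundance with linear regression
    if carrier_frequency >= 0.1:
        return "rare"      # analyzed by presence/absence with logistic regression
    return "excluded"


def clr(counts, pseudo_count=1.0):
    """Centered log-ratio: log(count + pseudo_count) minus the mean of those logs."""
    logs = [math.log(c + pseudo_count) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

With this threshold, a single read counts as presence at the minimum depth of 17,013 reads, while at the mean depth of 134,520 reads a taxon needs at least 8 reads to be counted as present.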
Associations from the full model were presented as the main results. Sequencing depth was included as an additional covariate for analyses with rare taxa prevalence. Covariates were updated using data from follow-up surveys conducted between 2012 and 2017, except for education and income, which were assessed only at baseline. Stratified analyses were conducted by age (< or ≥65 years at stool collection), overweight (BMI < or ≥24 kg/m², according to recommendation for Chinese adults [24]), WHR (men: < or ≥0.9; women: < or ≥0.8), healthy diet score (< or ≥24.5 [median]), history of dyslipidemia, history of hypertension, and time between self-reported NAFLD diagnosis and stool collection (< or ≥9.5 years [median]; or <5, 5-15, or ≥15 years). An interaction term of NAFLD with the stratification variable was added to the regression model. The Benjamini-Hochberg false discovery rate (FDR) was applied to account for multiple comparisons at each taxonomic level. Significance was defined at an FDR < 0.1 at each taxonomic level. All analyses were carried out using QIIME [20], SAS Enterprise Guide 7.1 (SAS Institute Inc., Cary, NC, USA), or R version 3.6.3.
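As an illustration of the multiple-comparison step, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic textbook version, not the SAS or R routine the authors ran; the function name is hypothetical, and in the analysis described it would be applied separately to the p-values at each taxonomic level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, fdr=0.1):
    """Flag tests whose adjusted p-value falls below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(p_values)]
```

A taxon is then declared significant when its adjusted value is below 0.1, matching the threshold stated above.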

Characteristics of the Study Subjects
The current study included 188 men and 233 women with NAFLD and 571 men and 567 women without NAFLD. Compared with non-NAFLD controls, participants with NAFLD had higher BMI (mean: 25.8 vs. 23.9 among men; 26.1 vs. 23.5 among women), WHR (mean: 0.92 vs. 0.89 among men; 0.83 vs. 0.81 among women), and prevalence of dyslipidemia (19.2% vs. 6.0% among men; 44.6% vs. 14.6% among women) (Table 1; all p < 0.001). Meanwhile, female cases had a higher income level, lower dietary fat intake, and higher prevalence of hypertension. Otherwise, participants with or without a history of NAFLD did not differ by age (mean: 68 years at stool collection; range: 51-89 years), education level, smoking status, alcohol drinking status, overall diet quality, total energy intake, or bowel movement frequency.

Associations of NAFLD History with Gut Microbiome Alpha and Beta Diversity
As shown in Figure 1 and Supplementary Table S1, men with a NAFLD history had slightly lower microbiome α-diversity (including PD_whole_tree, Shannon index, Chao1, and observed OTUs) than men without a history of NAFLD, whereas women with NAFLD showed slightly higher α-diversity than women without NAFLD (all p < 0.05 compared with controls). A potential effect modification by sex on the NAFLD and α-diversity association was suggested (all p < 0.02 for interactions). The genus-level Bray-Curtis dissimilarities between NAFLD cases and controls were not significant in either sex; NAFLD status explained 0.23% and 0.09% of the Bray-Curtis variance among men and women, respectively.

Associations of NAFLD History with Individual Gut Microbial Taxa
Similar to the α-diversity results, we observed significant sex-specific associations between NAFLD history and individual taxa (Table 2). We examined 145 common taxa (5 phyla, 10 classes, 12 orders, 20 families, 38 genera, and 60 species). Among men, NAFLD was associated with increased abundance of genera Dialister (median relative abundance: 0.0554% in cases vs. 0.0214% in controls; p = 0.001) and Streptococcus (0.1144% vs. 0.0787%; p = 0.01), two Bifidobacterium species (both p = 0.03 for B. adolescentis and B. Other), and an unclassified Dialister species, and with decreased abundance of genus Phascolarctobacterium (0.9446% vs. 1.672%; p = 0.01). Among women, NAFLD was associated with increased abundance of genus Butyricimonas (0.1061% vs. 0.0463%; p = 0.003) and an unclassified species within it, an unclassified family and genus of order Clostridiales (0.0127% vs. 0.0052%; p = 0.003), and an Oscillospira species (0.0275% vs. 0.0146%; p = 0.009). Significant interactions between NAFLD history and sex were observed for all these associations (all FDR < 0.1 for interactions). In the combined dataset with additional adjustment for sex, the abundance of an unclassified Streptococcus species was higher, while the abundance of an unclassified Blautia species was lower in NAFLD cases than controls (all p < 0.05, Supplementary Table S2).

Figure 1. Gut microbiome α-diversity indexes (PD_whole_tree_distance and Shannon) in non-alcoholic fatty liver disease cases and healthy controls among males from the Shanghai Men's Health Study (SMHS) and females from the Shanghai Women's Health Study (SWHS). General linear regression was conducted, adjusting for age at stool collection, the season of stool collection, body mass index, waist-to-hip ratio, education, income, smoking status, alcohol drinking status, physical activity, total energy intake, fat intake, bowel movement frequency, history of hypertension, history of dyslipidemia, and sequencing batch. Abbreviation: PD, phylogenetic diversity.

Table footnotes:
a The rare taxa were defined as those with relative abundance ≥0.00588% and present in (carrier frequency) 10-50% of control participants.
b p_, c_, o_, f_, g_, and s_ indicate taxonomic levels of phylum, class, order, family, genus, and species, respectively.
c Logistic regression model for NAFLD association with rare taxa, adjusted for age at stool sampling, the season of sample collection, body mass index, waist-to-hip ratio, education, income, smoking status, alcohol drinking status, physical activity, total energy intake, fat intake, bowel movement frequency, history of hypertension, history of dyslipidemia, sequencing batch, and sequencing depth.
d False discovery rate (FDR) < 0.1 at each taxonomic level.
Abbreviations: NAFLD, non-alcoholic fatty liver disease; RA, relative abundance; se, standard error; SMHS, Shanghai Men's Health Study; SWHS, Shanghai Women's Health Study.

Discussion
In this study of 1559 predominantly elderly urban Chinese adults, we found that NAFLD was associated with gut microbiome α-diversity and several taxa differently in men and women, suggesting the importance of considering sex/gender in research of the gut-liver axis. Among men, NAFLD was associated with decreased microbial α-diversity, increased abundance of genera Dialister and Streptococcus and Bifidobacterium species, reduced abundance of genus Phascolarctobacterium, and reduced prevalence of order RF39 and unclassified genus/species of families (Mogibacteriaceae), Rikenellaceae, and Peptococcaceae. In contrast, among women, NAFLD was associated with increased microbial α-diversity and altered abundance and prevalence of the above taxa, generally in the opposite direction. We also found that age, BMI, WHR, diet quality, and history of hypertension may modify NAFLD association with specific taxa in men or women.
Increasing evidence supports sex differences in the gut microbiome and potential sex-dependent associations of gut microbiota with health outcomes [25–29]. In the present study, we observed that the associations of NAFLD with microbial α-diversity and individual taxa varied significantly by sex, but we found no significant sex differences in those microbial features themselves (i.e., diversity and abundance/prevalence were similar between men and women among NAFLD cases or controls). The underlying mechanisms for such findings are not clear, although sex differences in the gut microbiome, hormones, BMI, and lifestyles have been shown [27]. The observed sex-specific associations might be due to older age at stool collection in women than in men [30–32], far fewer smokers and alcohol drinkers, a higher prevalence of morbidities such as hypertension and dyslipidemia, or changes in diet and lifestyles after disease diagnosis among women than men; however, all those covariates had been adjusted for in our main models. Future studies are needed to examine the sex-specific gut microbiome associations with NAFLD and investigate underlying biological mechanisms.
In addition, we found a significantly increased abundance of genus Dialister in male NAFLD cases but a decreased level in female cases. Dialister is a genus of Firmicutes, which was found to increase among liver cirrhosis patients [48,49]. Genus Phascolarctobacterium has been associated with age and weight loss in NAFLD patients [50]. Its abundance has also shown a sex difference in metabolic syndrome patients, i.e., higher in female than in male patients [51]. We observed a reduced Phascolarctobacterium abundance among male NAFLD cases but an increased abundance among female NAFLD cases. Among rare taxa that showed sex-dependent associations with NAFLD, order RF39 has been positively associated with healthy diets [16,52] and negatively associated with BMI, blood triglycerides, and frailty among older adults [52–56], suggesting its potential health benefits.
This study has several strengths. First, this is the largest population-based study to date identifying gut microbiome features related to history of NAFLD in an Asian population. Second, information is available on a wide range of medical, sociodemographic, and lifestyle factors, allowing us to exclude participants with a history of other diseases (e.g., diabetes, cardiovascular disease, and hepatitis) and adjust for various covariates to minimize potential confounding. Third, for the first time, we found potential sex-dependent gut microbial features related to NAFLD, although further studies are needed to validate such findings. Several limitations should also be acknowledged. First, there may be misclassification of NAFLD status, which may attenuate the observed associations. Second, despite comprehensive covariate adjustments, the impact of residual confounding due to poorly measured or unmeasured variables such as other underlying diseases and medication use cannot be ruled out [57–59]. At the same time, some variables included in the full model may be confounders but may also lie on the causal pathways between NAFLD and gut microbiota (e.g., WHR and history of dyslipidemia). However, a minimal adjustment model yielded similar results to the full model. Third, stool samples were collected 2.2 to 35.3 years (median 9.5 years) after the first NAFLD diagnosis, and we did not know how the disease may have progressed during this time. However, we did not find significant effect modification by time period: the NAFLD-time interaction was not significant; the sex-specific associations presented in all tables generally remained when we limited NAFLD cases to those diagnosed <15 years before stool collection (n = 336), and association directions were consistent when limited to those diagnosed <5 years (n = 68) or ≥15 years (n = 85) before stool collection. Fourth, stool samples were stored at −80 °C for up to three years before sequencing. Although recent studies showed that long-term storage at −80 °C (i.e., up to five years) has limited effects on 16S rRNA sequencing results of human fecal samples [60,61], we do not know how the sample storage may have affected our results, particularly for rare or low-abundance taxa. Finally, due to the bidirectional relation between gut microbiota and NAFLD [62], how the observed microbiota alterations may affect NAFLD development or progression is unclear and needs to be clarified in future studies.
In summary, in a large cohort of older, urban Chinese adults, we found significant sex-specific associations of NAFLD history with gut microbiome α-diversity and composition. Further studies are needed to validate these findings and investigate whether those gut microbial changes may play a role in the development or progression of NAFLD.