Discovery of Candidate Stool Biomarker Proteins for Biliary Atresia Using Proteome Analysis by Data-Independent Acquisition Mass Spectrometry

Biliary atresia (BA) is a destructive inflammatory obliterative cholangiopathy of the neonate that affects various parts of the bile duct. If early diagnosis followed by Kasai portoenterostomy is not performed, progressive liver cirrhosis frequently leads to liver transplantation in the early stage of life. Therefore, prompt diagnosis is necessary for the rescue of BA patients. However, the prompt diagnosis of BA remains challenging because specific and reliable biomarkers for BA are currently unavailable. In this study, we discovered potential biomarkers for BA using deep proteome analysis by data-independent acquisition mass spectrometry (DIA–MS). Four patients with BA and three patients with neonatal cholestasis of other etiologies (non-BA) were recruited for stool proteome analysis. Among the 2110 host-derived proteins detected in their stools, 49 proteins were significantly higher in patients with BA and 54 proteins were significantly lower. These varying stool protein levels in infants with BA can provide potential biomarkers for BA. As demonstrated in this study, the deep proteome analysis of stools has great potential not only in detecting new stool biomarkers for BA but also in elucidating the pathophysiology of BA and other pediatric diseases, especially in the field of pediatric gastroenterology.


Introduction
Biliary atresia (BA) is a destructive inflammatory obliterative cholangiopathy of neonates that affects various parts of the intra- and extrahepatic bile duct and causes cholestasis, which manifests as jaundice with hyperbilirubinemia. BA is a pediatric emergency because progression frequently leads to cirrhosis and liver transplantation [1]. As there are many other rare diseases that present with cholestasis in infancy and specific biomarkers for BA have not yet been identified, the prompt diagnosis of BA remains challenging for pediatric surgeons and pediatric gastroenterologists. Operative cholangiogram, an invasive diagnostic procedure, is the gold standard for the definitive diagnosis of BA, as it

Patients
This was a retrospective observational study that analyzed stool proteins of BA and non-BA patients. Four BA patients before Kasai portoenterostomy and three non-BA patients were recruited for stool proteome analysis. Naturally defecated stools were preserved at −80 °C. All patients with BA were classified as type III (obstruction of the most proximal part of the extrahepatic biliary tract at the porta hepatis), whereas the three non-BA patients had neonatal intrahepatic cholestasis caused by citrin deficiency (NICCD), cholestasis after repair of gastroschisis, and veno-occlusive disease (VOD). The average ages of BA patients and non-BA patients at the time of stool collection were 56 and 80 days, respectively (Table 1).

Table 1 abbreviations: Ave, average; BA, biliary atresia; NICCD, neonatal intrahepatic cholestasis caused by citrin deficiency; GS, gastroschisis; VOD, veno-occlusive disease; AST, aspartate aminotransferase (U/L); ALT, alanine aminotransferase (U/L); T-Bil, total bilirubin (mg/dL); D-Bil, direct bilirubin (mg/dL); GGTP, gamma-glutamyl transpeptidase (U/L); RR, reference range. Non-BA: cholestasis other than BA. Age: days after birth.

Proteome Analysis
Stools were suspended in PBS containing protease inhibitors and incubated on ice for 30 min, after which soluble proteins were extracted by pipetting and inverting. After centrifugation at 15,000 × g for 15 min at 4 °C to remove insoluble matter, the supernatants were transferred to new tubes and subjected to trichloroacetic acid (TCA) precipitation (final concentration 12.5% v/v), followed by acetone washing and air-drying with the tube lid open. The dried samples were redissolved in 0.5% sodium dodecanoate and 100 mM Tris-HCl (pH 8.5) using a water-bath-type sonicator (Bioruptor UCD-200, SonicBio Corporation, Kanagawa, Japan). The pretreatment for shotgun proteome analysis was performed as reported previously [15].
Peptides were directly injected onto a 75 µm × 20 cm PicoFrit emitter (New Objective, Woburn, MA, USA) packed in-house with C18 core-shell particles (CAPCELL CORE MP 2.7 µm, 160 Å material; Osaka Soda Co., Ltd., Osaka, Japan) at 45 °C and then separated with an 80-min gradient at a flow rate of 100 nL/min using an UltiMate 3000 RSLCnano LC system (Thermo Fisher Scientific, Waltham, MA, USA). Peptides that eluted from the column were analyzed on a Q Exactive HF-X (Thermo Fisher Scientific) by overlapping-window DIA-MS [15]. MS1 spectra were collected in the range of 495-785 m/z at a resolution of 30,000, with an automatic gain control target of 3e6 and a maximum injection time of 55. MS2 spectra were collected from 200 m/z upward at a resolution of 30,000, with an automatic gain control target of 3e6, a maximum injection time of "auto", and stepped normalized collision energies of 22, 26, and 30%. The isolation width for MS2 was set to 4 m/z, and overlapping window patterns covering 500-780 m/z were used, with window placements optimized using Skyline.
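The overlapping window layout described above can be illustrated with a short sketch. It assumes a 50% stagger between alternating cycles, which is a common choice for overlapping-window DIA but is not stated in the text; the function names are hypothetical, and the actual window placements in this study were optimized in Skyline rather than tiled uniformly as done here.

```python
def tile_windows(start, end, width):
    """Tile [start, end) with contiguous isolation windows of fixed width."""
    n = round((end - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]


def staggered_cycles(start, end, width):
    """Return two window sets: a base cycle and one shifted by half a width.

    Assumption: the shifted cycle is extended by half a window on each side
    so that every m/z in [start, end) is covered in both cycles.
    """
    base = tile_windows(start, end, width)
    shifted = tile_windows(start - width / 2, end + width / 2, width)
    return base, shifted


def covering_windows(mz, cycles):
    """List every window, across all cycles, that contains the given m/z."""
    return [w for cycle in cycles for w in cycle if w[0] <= mz < w[1]]


# Values taken from the text: 4 m/z isolation width over 500-780 m/z.
base, shifted = staggered_cycles(500.0, 780.0, 4.0)
hits = covering_windows(601.0, (base, shifted))
```

Under these assumptions a precursor at 601 m/z falls in one window of each cycle, and the two windows share only a 2 m/z region, which is what allows overlapping acquisitions to be demultiplexed to a narrower effective isolation width.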
MS files were searched against a human spectral library using Scaffold DIA (Proteome Software, Inc., Portland, OR, USA). The human spectral library was generated from the human protein sequence database (UniProt proteome ID UP000005640, reviewed, canonical) using Prosit [16]. The Scaffold DIA search parameters were as follows: experimental data search enzyme, trypsin; maximum missed cleavage sites, 1; precursor mass tolerance, 8 ppm; fragment mass tolerance, 8 ppm; and static modification, cysteine carbamidomethylation. Identification thresholds were set at peptide and protein false discovery rates of less than 1%. Peptide quantification was calculated using the EncyclopeDIA algorithm [17] in Scaffold DIA. For each peptide, the four highest-quality fragment ions were selected for quantitation. Protein quantification was estimated from the summed peptide quantification.
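As a rough illustration of the search tolerance and the quantification roll-up described above, the following sketch shows an 8 ppm mass check, selection of four fragment ions per peptide, and summation of peptide quantities per protein. It is not the Scaffold DIA or EncyclopeDIA implementation: the function names and the per-fragment `quality` score are hypothetical stand-ins, since the text does not say how fragment quality is scored.

```python
from collections import defaultdict


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=8.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def peptide_quantity(fragments, n_best=4):
    """Sum the intensities of the n_best fragments ranked by a quality score.

    fragments: iterable of (intensity, quality) pairs; 'quality' is a
    hypothetical score used here only to rank fragments.
    """
    ranked = sorted(fragments, key=lambda f: f[1], reverse=True)
    return sum(intensity for intensity, _ in ranked[:n_best])


def protein_quantities(peptide_table):
    """Sum peptide quantities per protein from (protein, quantity) pairs."""
    totals = defaultdict(float)
    for protein, quantity in peptide_table:
        totals[protein] += quantity
    return dict(totals)


# Illustrative numbers only; they are not measurements from this study.
frags = [(100.0, 0.9), (50.0, 0.8), (30.0, 0.7), (20.0, 0.6), (1000.0, 0.1)]
pep_q = peptide_quantity(frags)
prot_q = protein_quantities([("RBP4", 10.0), ("RBP4", 5.0), ("CEACAM1", 2.0)])
```

In this toy example the most intense fragment is excluded because its hypothetical quality score is the lowest, which mirrors the idea of quantifying on the cleanest fragments rather than the largest ones.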

Data Analysis
The Gene Ontology (GO) enrichment analysis tool Enrichr was used to retrieve functional annotations [18]. Proteins that differed between BA and non-BA were determined using the Mann-Whitney U test, with p < 0.05 considered significant. Proteins elevated in plasma and liver tissue were identified by reference to the Human Protein Atlas (HPA; https://www.proteinatlas.org/) [19]. A heatmap was drawn based on Z-scores calculated from the DIA protein quantification using the R (version 3.5.1) function "heatmap2".
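The Z-score scaling behind the heatmap can be sketched as follows. This assumes that each protein (row) is standardized across the seven samples using the sample standard deviation, which is how row scaling is usually done for heatmaps; the text does not specify the scaling direction or any prior transformation, and the function name is hypothetical.

```python
from statistics import mean, stdev


def row_z_scores(values):
    """Standardize one protein's quantities across samples.

    Uses the sample standard deviation (n - 1 denominator). A protein with
    identical values in every sample is returned as all zeros rather than
    dividing by zero.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Illustrative quantities only; they are not data from this study.
z = row_z_scores([1.0, 2.0, 3.0])
flat = row_z_scores([5.0, 5.0, 5.0])
```

Scaling each row separately puts proteins with very different absolute abundances on a common color scale, so the heatmap shows relative differences between samples rather than raw intensity.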

Ethical Approval and Consent to Participate
This study was approved by the institutional review board (IRB) of the Faculty of Medicine, University of Tokyo (IRB No. 2019010NI), and informed consent was obtained from all subjects.

Results
Figure 1 shows the stool proteome analysis workflow. To exclude proteins not derived from the host as far as possible, proteins were extracted gently with PBS so as not to disrupt the bacteria and food debris in the stool, and were then purified by TCA precipitation. The proteins were then digested and measured using overlapping-window DIA-MS. Stool proteins were identified and quantified from the seven MS data sets (four samples from patients with BA and three from non-BA individuals). In our proteome analysis, 2110 host-derived proteins were identified in stool samples. Only approximately 50% of the host stool proteins overlapped with plasma proteins, and plasma and stools had different protein profiles (Figure 2A). Thus, unique stool biomarkers may be identified. In addition, the analysis covered a dynamic range of 10^7 or greater, and the number of identified proteins was approximately 2000 (Figure 2B). When HEK293 digests were measured with the same method, more than 8000 proteins were observed. Taken together, these results indicate that host stool protein concentrations span a wide dynamic range and that high-depth analyses such as DIA-MS are of great value for discovering biomarkers in stool samples.

Among the identified proteins, 103 were significantly different (p < 0.05) between the two groups (BA vs. non-BA) (Figure 3). Of these 103 proteins, 49 proteins were significantly higher in patients with BA (BA-dominant: Table 2), whereas 54 proteins were significantly lower in patients with BA (non-BA-dominant: Table 3).

Discussion
BA is a disorder of unknown etiology that occurs during infancy and may lead to liver cirrhosis [1]. BA requires prompt and accurate diagnosis because late Kasai portoenterostomy is one of the risk factors for inadequate bile drainage, which is an early indication for liver transplantation [20]. However, neonatal cholestasis has many causes other than BA; thus, the accurate diagnosis of BA is challenging. Although many examinations are available for diagnosing BA, including Sudan III staining of stool fat [2], measurement of duodenal bile acid [21], and hepatobiliary scintigraphy [22], more reliable examinations or biomarkers for BA are needed. Unfortunately, invasive procedures, such as operative cholangiogram [1], are eventually required to distinguish BA from non-BA causes of neonatal cholestasis, as was the case for patient non-BA.1 in this study, who underwent an operative cholangiogram to distinguish the condition from BA.
Many studies have attempted to determine the etiology of BA and discover new specific BA biomarkers. Recently, serum proteome analysis of patients with BA found that serum levels of matrix metalloproteinase-7 (MMP-7) are high in patients with BA, and MMP-7 has been considered a feasible BA biomarker [23-25]. These studies suggested that serum MMP-7 may help diagnose BA, but the diagnostic range of enzyme-linked immunosorbent assay (ELISA) kits for MMP-7 is inconsistent. Therefore, the role of MMP-7 as a feasible biomarker remains controversial [26].
We hypothesized that the stools of patients with BA contain fewer proteins produced in the biliary tract because the normal route of bile flow is obstructed, which is the pathophysiological mechanism of BA. Therefore, certain specific proteins that originate in the biliary tract may be absent or dramatically reduced in the stools of patients with BA compared with the stools of non-BA patients.
We found that specific proteins that are elevated in the liver tissues, such as RBP4, SHMT2, HMGCS1, ADH6, ALDH1A1, ACADS, ADK, KHK, ACAA2, PSAT1, AMACR, and PTGR1, presented significantly lower abundances in stool samples from patients with BA. This finding supported our hypothesis that stools in patients with BA contained fewer proteins produced in the biliary tract. Measurement of these specific proteins or a combination of these proteins may assist in the early diagnosis of BA.
In addition, among the 49 dominant proteins in the stools of the BA group, we found that specific proteins such as CEACAM1, CEACAM5, and CEACAM8 were significantly higher in patients with BA than in non-BA patients. CEACAM1, also known as biliary glycoprotein (BGP-I), is considered a cell adhesion molecule that is also distributed in the biliary tract [19,27,28]. Furthermore, soluble CEACAM1 is shed into human bile, where it can serve as an indicator of obstructive and inflammatory liver diseases [28]. Regarding CEACAM5, components of human bile from patients with biliary obstruction exhibited cross-reactivity with CEACAM5 antisera in the absence of gastrointestinal malignancies [29]. Moreover, CHI3L1 is correlated with worse outcomes of BA [30], and XDH plays a role in oxidative stress and hepatic disease pathogenesis [31]. Therefore, these dominant proteins in the stools of the BA group may contribute to the pathogenesis of BA, and measurement of these proteins in stool may also assist in the early diagnosis of BA.
This study has several limitations. First, the sample size of this preliminary study was small because BA is a rare disease, with a prevalence ranging from 1 in 5000 to 1 in 25,000 live births [1]. Second, the number of non-BA patients was also very small. Third, although our data were statistically analyzed, a study with larger cohorts may show different results, and it is possible that these results are not generalizable to larger, more diverse populations.
However, deep proteomic analysis of BA patient stools is a new method that has the potential to detect biomarkers and help elucidate the unknown etiology of BA on the basis of stool proteins. The stool biomarker candidates for BA found in this study could be applied clinically once they are validated by studies that use targeted high-throughput methods, such as ELISA and selected reaction monitoring (SRM). Therefore, it is important to validate our preliminary results with larger cohorts.

Conclusions
Our study is the first to establish deep proteome analysis of stools and apply it to infants with cholestasis, including both BA and non-BA cohorts. Our new method of deep proteome analysis by DIA-MS can detect over 2000 host-derived proteins in stools and provides a method for discovering new BA biomarkers. Further large-scale studies are needed to validate our results regarding the varying levels of host-derived stool proteins in BA as potential specific and reliable biomarkers for diagnosing this disease. Moreover, deep proteome analysis of stools has great potential to elucidate the pathophysiology of BA and other pediatric diseases, especially in the field of pediatric gastroenterology.

Conflicts of Interest:
The authors declare no conflict of interest.