Shotgun Proteomics Revealed Preferential Degradation of Misfolded In Vivo Obligate GroE Substrates by Lon Protease in Escherichia coli

The Escherichia coli chaperonin GroEL/ES (GroE) is one of the most extensively studied molecular chaperones. So far, ~80 proteins in E. coli have been identified as GroE substrates that obligately require GroE for folding in vivo. In GroE-depleted cells, these substrates tend to form aggregates when overexpressed, whereas GroE substrates expressed at low or endogenous levels are degraded, probably because they are misfolded. However, the protease(s) involved in this degradation have not been identified. We used a mass-spectrometry-based proteomics approach to investigate the effects of three ATP-dependent proteases, Lon, ClpXP, and HslUV, on the E. coli proteome under GroE-depleted conditions. Label-free quantitative proteomics revealed that Lon is the dominant protease that degrades the obligate GroE substrates in GroE-depleted cells. Deleting DnaK/DnaJ, the other major E. coli chaperone system, in the ∆lon strain did not cause major changes in the expression or folding of the obligate GroE substrates, supporting the idea that the folding of these substrates depends predominantly on GroE.


Introduction
Most proteins must fold into their native structures to become functional [1]. However, protein folding often competes with aggregation, which not only impairs protein function but can also cause cytotoxicity [2][3][4]. In cells, molecular chaperones facilitate the proper folding of various proteins by preventing aggregate formation [2,3]. Indeed, previous studies have revealed that a significant fraction of the proteins in any cell requires one or more chaperones. In Escherichia coli, GroEL/ES (GroE) and DnaK/DnaJ (DnaKJ)-GrpE are known to assist in the proper folding of a large subset of proteins [2,3].
GroE is the only chaperone essential for the growth of E. coli [5,6]. A decades-long goal in chaperonin biology has been to identify the substrate proteins that obligately depend on GroE for proper folding in the cell (in vivo obligate GroE substrates). Kerner et al. [6] identified hundreds of proteins that interact with GroEL in E. coli using a mass spectrometry (MS)-based proteomics approach. These GroEL interactors were classified into three categories (Classes I, II, and III) according to the amount of each protein bound to GroEL relative to its total cellular amount. In their study, the Class III substrates, which are the most enriched in the GroE complex, were classified as potential obligate GroE substrates. Subsequently, Fujiwara et al. [7] conducted a further systematic assessment using a conditional GroE expression strain, E. coli MGM100 [8], and found that only a subset of the Class III substrates obligately depends on GroE for folding in vivo. These substrates are regarded as in vivo obligate GroE substrates and are termed Class IV substrates. Further analyses based on a cell-free proteomics approach identified several additional in vivo obligate GroE substrates [9].

Figure 1 (partial caption). The horizontal axis indicates the log2 fold change upon GroE depletion, and the vertical axis indicates −log10 of the p-value from Welch's two-sided t-test with three technical replicates per sample; p-values are adjusted for multiplicity by the Benjamini-Hochberg method. Red dots indicate the in vivo obligate GroE substrates. (C) Proteomic changes upon GroE depletion in the MGM100-derived protease-deletion strains, depicted as volcano plots. Red dots indicate the in vivo obligate GroE substrates. (D) Distribution of the fold changes of the in vivo obligate GroE substrates in each strain, depicted as a boxplot. Boxes span the 25th to 75th percentiles, and central bands indicate the median. **** p-value < 0.001 (Wilcoxon's rank-sum test).

Obligate GroE Substrates Tend to Be Degraded by Lon under GroE-Depleted Conditions
To investigate the degradation of the in vivo obligate GroE substrates under GroE-depleted conditions, we employed the SWATH-MS acquisition method, which allows label-free relative quantification of proteins at the proteome scale in cells grown in nutrient-rich media. To deplete GroE, we used the MGM100 strain, in which the groESL genes are controlled by the arabinose-inducible araBAD promoter so that GroE expression can be switched on with arabinose and off with glucose, as in previous studies [7,9] (Figure 1A). First, we compared whole-cell protein levels under GroE-depleted conditions with those under GroE-normal conditions. This analysis yielded fold changes for ~1300 proteins (Supplementary Dataset S1), including ~20 in vivo obligate GroE substrates (Table 1). As reported previously, the MS analysis confirmed the large proteome alteration induced by GroE depletion, including the overexpression of MetE and of several heat shock proteins such as ClpB and DnaK [7] (Supplementary Dataset S1). Importantly, the levels of the obligate GroE substrates showed a strong tendency to decrease upon GroE depletion (Figure 1B). This result suggests that many obligate GroE substrates are degraded, rather than forming aggregates, in GroE-depleted cells. Next, we examined the roles of the cytosolic proteases Lon, ClpXP, and HslUV in the degradation of misfolded proteins in GroE-depleted cells. We constructed new deletion strains for each protease in the MGM100 background (MGM100∆lon, MGM100∆clpPX, and MGM100∆hslVU) and analyzed their proteome changes upon GroE depletion.
Strikingly, the volcano plots, which show the change in protein abundance on the horizontal axis and its statistical significance on the vertical axis, revealed that the decrease in the obligate GroE substrates under GroE-depleted conditions was largely reversed in the MGM100∆lon strain (Figure 1C and Supplementary Dataset S1). This recovery was statistically significant when the fold-change distributions of the obligate GroE substrates, shown as boxplots, were compared (p = 0.000942, Wilcoxon's rank-sum test; Figure 1D). In addition, some previously known Lon substrates, including LipA, one of the obligate GroE substrates, were increased in the MGM100∆lon strain (Supplementary Table S1), consistent with previous findings. In contrast, the volcano plots of the ClpXP and HslUV deletion strains did not show this trend (Figure 1C and Supplementary Dataset S1). The boxplot revealed a weak recovery of the GroE substrates in MGM100∆hslVU cells, but the difference was not statistically significant (p = 0.3273, Wilcoxon's rank-sum test; Figure 1D). No significant changes upon GroE depletion were observed for the GroE substrates belonging to the other classes (Class I, II, and III minus) in any of the E. coli strains (Supplementary Figure S1). Furthermore, five GroE substrates (FbaB, FtsE, NagZ, YbhA, and Tas) that were identified in GroE-normal cells but not in GroE-depleted cells were reproducibly identified in the MGM100∆lon strain under GroE-depleted conditions (Table 1 and Supplementary Figure S1B). This result suggests that these five proteins are degraded by Lon upon GroE depletion and escape proteolysis when Lon is absent.
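The comparison of fold-change distributions described above can be illustrated with a minimal Python sketch of the two-sided Wilcoxon rank-sum test. This is not the authors' in-house R code: it uses the normal approximation with tie and continuity corrections (the variant R's `wilcox.test(..., exact = FALSE)` reports), whereas R computes an exact p-value by default for small samples without ties, and the text does not say which variant was used. The function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, p), where W is the rank sum of x minus n1(n1 + 1)/2, i.e. the
    Mann-Whitney U statistic for x. Tied values receive average ranks, the
    variance is tie-corrected, and a 0.5 continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    n = n1 + n2
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    diff = w - n1 * n2 / 2.0
    z = max(abs(diff) - 0.5, 0.0) / sigma  # continuity-corrected |z|
    return w, math.erfc(z / math.sqrt(2.0))  # two-sided tail: 2 * (1 - Phi(|z|))
```

Applied to two lists of log2 fold changes of the obligate GroE substrates (for example, MGM100 vs. MGM100∆lon), the second return value is the two-sided p-value; because it is an approximation, it need not reproduce the p-values quoted above exactly.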

Deletion of DnaKJ Barely Affects the Folding of Most In Vivo Obligate GroE Substrates
Our previous analyses with a reconstituted cell-free translation system (PURE system) revealed that almost all of the obligate GroE substrates have a strong tendency to form aggregates without chaperones [7,16], but for many of them, aggregation is prevented by the DnaKJ system [17]. We therefore hypothesized that the folding of the in vivo obligate GroE substrates might depend not only on GroE but also on DnaKJ. If this is the case, deleting lon in DnaKJ-deficient cells would alter the abundance of the in vivo obligate GroE substrates, even in the presence of GroE. Accordingly, we compared the proteomes of the wild-type, dnaKJ-deleted (∆dnaKJ), and dnaKJ/lon-deleted (∆dnaKJ∆lon) strains. As reported previously, the deletion of dnaKJ caused drastic proteome changes [18] (Figure 2A and Supplementary Dataset S2). Among ~1300 evaluated proteins, 60~70 proteins were specifically up-regulated (fold change > 2 and adjusted p-value < 0.05) and 60~80 proteins were specifically down-regulated (fold change < 0.5 and adjusted p-value < 0.05) by the deletion of dnaKJ or of both dnaKJ and lon (Figure 2A and Supplementary Dataset S2). In contrast, the additional deletion of lon in the ∆dnaKJ background caused only small proteome changes (Figure 2A and Supplementary Dataset S2). Although 30~40 proteins were specifically up-regulated (fold change > 2 and adjusted p-value < 0.05) by the deletion of lon, their fold changes were small compared with those in the wild-type vs. ∆dnaKJ and wild-type vs. ∆dnaKJ∆lon comparisons (Figure 2A and Supplementary Dataset S2).
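The up- and down-regulation criteria stated above (fold change > 2 or < 0.5, together with an adjusted p-value < 0.05) amount to a simple filter. A minimal Python sketch, assuming a hypothetical `table` that maps each protein to its linear fold change and Benjamini-Hochberg-adjusted p-value:

```python
def classify_changes(table, up=2.0, down=0.5, alpha=0.05):
    """Split proteins into up- and down-regulated sets.

    table: dict protein -> (linear fold change, BH-adjusted p-value).
    Thresholds follow the text: fold change > 2 (up) or < 0.5 (down),
    each combined with an adjusted p-value < 0.05.
    """
    up_reg, down_reg = set(), set()
    for protein, (fold_change, p_adj) in table.items():
        if p_adj >= alpha:
            continue  # not statistically significant; ignore regardless of fold change
        if fold_change > up:
            up_reg.add(protein)
        elif fold_change < down:
            down_reg.add(protein)
    return up_reg, down_reg


# Hypothetical example values, not data from this study.
up_set, down_set = classify_changes({"A": (3.0, 0.01), "B": (0.2, 0.01), "C": (3.0, 0.2)})
```

Here protein "C" is excluded despite its large fold change because its adjusted p-value does not pass the 0.05 cutoff.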
In addition, only about five to nine proteins were down-regulated in the ∆dnaKJ∆lon strain relative to ∆dnaKJ (Figure 2A and Supplementary Dataset S2). Notably, the obligate GroE substrates did not show any marked changes in either deletion strain (Figure 2A, Table 2, and Supplementary Figure S2A). This result suggests that DnaKJ is not a major additional factor in the folding of the obligate GroE substrates in cells.
However, it remained possible that these substrates form aggregates instead of being degraded, in which case their total amounts would remain unchanged. To assess this possibility, we prepared the pellet fraction from the lysate of each strain by centrifugation and performed the same proteomic analysis. Although the fold change values were less reproducible than in the total proteome analysis (Supplementary Figure S2B), the results clearly showed that only a small subset of the obligate GroE substrates accumulated in the pellet fractions of ∆dnaKJ and ∆dnaKJ∆lon cells (Figure 2B and Table 2). In other words, the absence of DnaKJ does not induce aggregation of most obligate GroE substrates. In summary, these results suggest that DnaKJ provides no additional benefit for the folding of most obligate GroE substrates when GroE is present.

Metabolic Perturbations by Protease Deletions under GroE-Depleted Conditions Revealed by Clustering Analysis
The analyses above focused only on changes in the amounts of the obligate GroE substrates. Although GroE depletion alone causes drastic changes in protein expression and metabolism [7], the additional deletion of proteases may further perturb the proteome or metabolome. We therefore assessed the proteome changes caused by deleting each protease under GroE-depleted conditions. For this purpose, we clustered the fold change values, defined as the ratio of protein abundance between GroE-depleted and GroE-normal conditions in each strain. We selected ~1000 proteins whose fold changes were quantified in all four strains and clustered them by the k-means method, with the number of clusters set to six and the fold changes log-transformed beforehand. The clustering yielded four clusters containing small numbers of proteins (Clusters 1~4) and two clusters containing larger numbers of proteins (Clusters 5~6) (Figure 3A and Supplementary Dataset S3). Clusters 1~4 showed larger differences in fold change among the strains than the other two clusters, suggesting that they could reflect specific changes caused by the protease deletions. We then performed an enrichment analysis using annotations from the KEGG BRITE hierarchy [19] to characterize these four clusters (Table 3). As shown in Table 3, many metabolic pathways were enriched in each cluster. In particular, Cluster 3, in which the fold changes decreased only in MGM100∆lon, contained the largest number of perturbed metabolic pathways, including amino acid and nucleic acid metabolism. This result suggests that deleting Lon in GroE-depleted cells causes further metabolic perturbation beyond that caused by GroE depletion.
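The clustering step can be sketched in Python as follows. Assumptions: the fold changes are log2-transformed (the text says only "logarithmic"), and plain Lloyd's k-means with a fixed random seed is used. The authors' tool and algorithm are not stated (R's `kmeans()`, for instance, defaults to the Hartigan-Wong algorithm), so assignments from this sketch need not match Figure 3A. The function names and the strain-keyed input layout are hypothetical.

```python
import math
import random


def build_matrix(fold_changes, strains):
    """fold_changes: dict strain -> dict protein -> linear fold change.

    Keeps only proteins quantified in every strain, then log2-transforms the values
    so that each protein becomes one vector with one coordinate per strain.
    """
    common = set.intersection(*(set(fold_changes[s]) for s in strains))
    proteins = sorted(common)
    matrix = [[math.log2(fold_changes[s][p]) for s in strains] for p in proteins]
    return proteins, matrix


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For the analysis described above, one would call `build_matrix` with the four strains (MGM100, MGM100∆lon, MGM100∆clpPX, MGM100∆hslVU) and pass the result to `kmeans(matrix, k=6)`. Because k-means depends on initialization, trying several seeds and keeping the lowest within-cluster sum of squares is a common safeguard.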
To investigate the perturbations caused by the deletion of Lon more directly, we identified the proteins that changed specifically between MGM100 and MGM100∆lon on the basis of the fold-change distribution (Figure 3B). Enrichment analysis of the up-regulated and down-regulated proteins in MGM100∆lon showed that the deletion of Lon caused the up-regulation of some metabolic enzymes involved in amino acid synthesis under GroE-depleted conditions (Table 4 and Supplementary Dataset S4). Another, minor change was observed in Cluster 1, whose fold changes increased markedly in both MGM100∆lon and MGM100∆hslVU compared with the other two strains. This cluster included several proteins induced in the stationary growth phase, such as Sra, Dps, WrbA, ElaB, and OsmC.
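The enrichment analysis of a protein set against KEGG BRITE categories (performed with Fisher's exact test, according to the Methods) can be sketched as the upper tail of the hypergeometric distribution. Assumptions: the test is one-sided, testing over-representation only (R's `fisher.test()` is two-sided by default, and the text does not state the alternative); the background is the set of analyzed proteins; and no multiple-testing correction is applied here. The function names and the `annotation` mapping (term -> set of proteins) are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    k: proteins in the set that carry the annotation
    n: proteins in the set
    K: background proteins that carry the annotation
    N: background size
    This equals the one-sided ("greater") Fisher's exact test p-value for the 2x2 table.
    """
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


def enrich_cluster(cluster, background, annotation):
    """annotation: dict term -> set of proteins; returns dict term -> one-sided p-value."""
    background = set(background)
    cluster = set(cluster) & background
    results = {}
    for term, members in annotation.items():
        members = set(members) & background
        k = len(cluster & members)
        if k > 0:  # only test terms actually observed in the cluster
            results[term] = enrichment_p(k, len(cluster), len(members), len(background))
    return results
```

`math.comb` returns 0 when asked to choose more items than are available, so impossible table configurations contribute nothing to the sum.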

Discussion
In this study, we have shown that many in vivo obligate GroE substrates are degraded by cytosolic proteases under GroE-depleted conditions and that Lon protease is mainly responsible for this degradation (Figure 1). In contrast, DnaKJ does not act as a dominant factor in the folding of these substrates, although a few obligate GroE substrates tended to form aggregates upon DnaKJ deletion (Figure 2). Based on these results, a plausible scheme for the fate of the obligate GroE substrates under GroE-depleted conditions is depicted in Figure 4. When GroE is absent, these substrates cannot complete folding and are degraded by proteases such as Lon. In contrast, when DnaKJ is absent, most of the obligate GroE substrates can still complete folding with the aid of GroE, although a small fraction of them form aggregates in cells. However, because various chaperones, including GroE, are highly up-regulated in ∆dnaKJ cells [18], our observations might partly reflect the effects of these up-regulated chaperones. Our results therefore do not exclude the possibility that DnaKJ is also involved in the folding of the obligate GroE substrates under some conditions.
Figure 4. Under normal conditions, the in vivo obligate GroE substrates can fold into their native structures with the aid of GroE. Upon GroE depletion, these substrates cannot complete folding and tend to be degraded by proteases, preferentially by Lon.
The statistical analysis of the proteome changes in our experiments suggested that the deletion of Lon may cause additional metabolic perturbations, for example in amino acid synthesis. Note that the lon deletion does not cause large proteome changes in nutrient-rich medium (Niwa T. et al., in preparation). Moreover, because GroE depletion itself reportedly induces large metabolic changes, including the depletion of several amino acids, S-adenosylmethionine, and NADPH [7], this phenomenon may be significant only under such exceptional metabolic conditions, for example severe energy deficiency. Nevertheless, this observation raises the possibility that protein degradation affects metabolism in previously unappreciated ways.
In addition, HslUV may also be involved in the degradation of misfolded proteins and in additional metabolic perturbations, as suggested by Figure 1B,C, Figure 3A, and Table 3. Although HslUV is abundantly expressed in cells and is induced by heat stress, knowledge of its physiological role is limited. Our observations suggest that HslUV may have specific cellular functions that partly overlap with those of Lon. However, the evidence for this overlap is weak, and the details remain unclear.
The up-regulation of some stationary-phase-induced proteins in both MGM100∆lon and MGM100∆hslVU suggests that, under GroE-depleted conditions, deleting either protease may cause additional starvation-related changes, or that some other factor may trigger a transition toward the stationary growth phase. However, the underlying connection remains unclear. RpoS, the sigma factor responsible for various stress responses and the growth phase transition [20,21], does not appear to be involved in this change, because its expression was up-regulated only in MGM100∆clpPX (Supplementary Dataset S1). Further investigations are therefore needed to clarify in detail the physiological roles of these cytosolic proteases and how their functions overlap.
In summary, our study has partially uncovered the fate of the in vivo obligate GroE substrates under GroE-depleted conditions. Furthermore, our results suggest that the inability to degrade misfolded proteins may perturb intracellular metabolism. The intimate link between chaperones and proteases in cellular proteostasis is important but not well understood; the approach used here should therefore be valuable for analyses of other organisms, including eukaryotes.

Cell Culture and Sample Preparation for the LC-MS/MS Analysis
For the analysis of MGM100 and the MGM100-derived mutants, cells were grown in LB medium supplemented with 0.2% arabinose at 37 °C and harvested in the early logarithmic growth phase (OD660 of 0.2~0.3). After washing with LB medium, the cells were inoculated into LB medium supplemented with 1 mM diaminopimelate and either 0.2% arabinose or 0.2% glucose, at an initial OD660 of 0.04. After 3 h of cultivation at 37 °C, the cells were harvested at an OD660 of around 0.8~1.0. For the analyses of ∆dnaKJ and ∆dnaKJ∆lon, cells were grown in LB medium at 37 °C and harvested in the logarithmic growth phase (OD660 of ~1.0).
The harvested cells were resuspended in the PTS solution [24] (100 mM Tris-HCl (pH 9.0), 12 mM sodium deoxycholate, 12 mM sodium N-lauroylsarcosinate) and boiled at 95 °C for 5 min. The solution was then frozen at −80 °C for 10 min and sonicated in an ultrasonic bath for 20 min at room temperature to further disrupt the cells. After cell disruption, the protein concentration was measured using the BCA protein assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and adjusted to 50 µg in 100 µL or 25 µg in 50 µL by dilution with the PTS solution.
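Adjusting each lysate to a fixed protein amount is a C1V1 = C2V2 dilution. A minimal Python sketch; the helper name `dilution_volumes` and the 2.0 µg/µL concentration in the example are hypothetical, not values from this study:

```python
def dilution_volumes(stock_ug_per_ul, target_ug, final_ul):
    """Return (lysate µL, PTS solution µL) giving target_ug of protein in final_ul.

    Raises ValueError if the concentration is not positive or the lysate is too
    dilute to reach the target amount within the final volume.
    """
    if stock_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    lysate_ul = target_ug / stock_ug_per_ul
    if lysate_ul > final_ul:
        raise ValueError("lysate too dilute for the requested amount and volume")
    return lysate_ul, final_ul - lysate_ul


# Hypothetical BCA reading of 2.0 µg/µL; target of 50 µg in 100 µL as in the text.
lysate, pts = dilution_volumes(2.0, 50, 100)
print(f"{lysate:.1f} µL lysate + {pts:.1f} µL PTS")  # 25.0 µL lysate + 75.0 µL PTS
```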
The proteins were reduced with 10 mM dithiothreitol for 30 min at room temperature and then alkylated with 50 mM iodoacetamide for 20 min at room temperature in the dark. After reduction and alkylation, the solution was diluted 5-fold with 50 mM ammonium bicarbonate and digested with Lys-C protease (FUJIFILM-Wako, Osaka, Japan) at 1/100 of the total protein weight for 3 h at room temperature. The peptides were further digested with Trypsin Gold (Promega, Madison, WI, USA) at 1/50 of the total protein weight at 37 °C overnight. After digestion, an equal volume of ethyl acetate and 1/20 volume of 10% trifluoroacetic acid (TFA) were added to the peptide solution and mixed vigorously. The mixture was centrifuged at 15,700× g for 2 min, and the upper ethyl acetate layer containing the surfactants was removed. The remaining lower aqueous layer was dried in a centrifugal evaporator. The peptides were then redissolved in 0.1% TFA and 2% acetonitrile and desalted with a handmade Stage Tip [25] composed of an SDB-XC Empore disk (3M, Maplewood, MN, USA). The peptides bound to the Stage Tip were eluted with 0.1% TFA and 80% acetonitrile, dried again in a centrifugal evaporator, and redissolved in 0.1% TFA and 2% acetonitrile for LC-MS/MS measurements.
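The enzyme amounts and additive volumes in this protocol follow from simple ratios. A minimal Python sketch, assuming that the volumes added with dithiothreitol, iodoacetamide, and the enzymes are negligible (the text does not give stock concentrations); `digestion_plan` is a hypothetical helper:

```python
def digestion_plan(protein_ug, sample_ul):
    """Reagent amounts for the digestion and phase-transfer steps described above.

    Assumes reagent and enzyme additions do not change the volume appreciably.
    """
    diluted_ul = 5 * sample_ul  # 5-fold dilution with 50 mM ammonium bicarbonate
    return {
        "ammonium_bicarbonate_ul": diluted_ul - sample_ul,
        "lys_c_ug": protein_ug / 100,     # Lys-C:protein = 1:100 (w/w)
        "trypsin_ug": protein_ug / 50,    # trypsin:protein = 1:50 (w/w)
        "ethyl_acetate_ul": diluted_ul,   # equal volume of ethyl acetate
        "tfa_10pct_ul": diluted_ul / 20,  # 1/20 volume of 10% TFA
    }


plan = digestion_plan(50, 100)  # 50 µg protein in 100 µL PTS solution
# plan == {"ammonium_bicarbonate_ul": 400, "lys_c_ug": 0.5, "trypsin_ug": 1.0,
#          "ethyl_acetate_ul": 500, "tfa_10pct_ul": 25.0}
```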
For the proteome analysis of pellet fractions, cells were resuspended in lysis buffer (50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA) supplemented with a protease inhibitor cocktail (cOmplete™ mini, EDTA free, Roche, Basel, Switzerland) and disrupted by sonication. The lysate was centrifuged at 20,000× g for 10 min, and the supernatant was discarded. After washing twice with the lysis buffer, the pellet was dissolved in the PTS solution. The protein concentration was measured with a BCA protein assay kit and adjusted to 10 µg in 25 µL by dilution with the PTS solution. The subsequent steps were performed as described above.

LC-MS/MS Measurement and Data Analysis
The LC-MS/MS measurements were conducted with an Eksigent NanoLC Ultra and TripleTOF 4600 tandem-mass spectrometer or an Eksigent NanoLC 415 and TripleTOF 6600 mass spectrometer (AB Sciex, Framingham, MA, USA). The trap column used for nanoLC was a 5.0 mm × 0.3 mm ODS column with a particle size of 5 µm (L-column2, Chemical Evaluation and Research Institute, Tokyo, Japan). The separation column was a 12.5 cm × 75 µm capillary column packed with 3 µm C18-silica particles (Nikkyo Technos, Tokyo, Japan). The detailed settings for the LC-MS/MS measurements are summarized in Supplementary Table S3. The measurement was conducted three times for each sample. One biological replicate set was used for the analysis of the MGM100 and its derivative strains, and two biological replicate sets were used for ∆dnaKJ and ∆dnaKJ∆lon strains.
Data analysis was performed with the DIA-NN software (version 1.7.16, https://github.com/vdemichev/diann, accessed on 30 April 2021) [26]. The library for SWATH-MS was obtained from the SWATH atlas (http://www.swathatlas.org/, accessed on 30 April 2021); the original data were acquired by Midha et al. [27]. The fold changes between mean intensities and p-values by Welch's t-test were calculated by an in-house R script (R.app for Mac, version 3.6.2). For the correction of multiple testing, p-values were adjusted by the Benjamini-Hochberg method (using the "p.adjust" function). Only the proteins with intensities obtained in all three measurements in both samples were used to calculate fold changes. The enrichment analysis was performed with an in-house R script (using Fisher's exact test). The KEGG BRITE hierarchy information (E. coli MG1655 strain) was downloaded from the website (https://www.genome.jp/brite/eco00001) on 23 March 2016.
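The fold-change and p-value calculation described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the authors' R script: the fold change is taken as mean GroE-depleted intensity over mean GroE-normal intensity; Welch's t-test is applied to the intensities as given (the text does not state whether they were log-transformed first); and Student's t tail probability is computed from the regularized incomplete beta function via the standard continued fraction (as in Numerical Recipes). The Benjamini-Hochberg step follows the same convention as R's `p.adjust(p, method = "BH")`. All function names are hypothetical.

```python
import math
import statistics

TINY = 1e-300


def _nonzero(v):
    """Guard against division by ~0 inside the continued fraction."""
    return v if abs(v) > TINY else TINY


def betainc(a, b, x, max_iter=300, eps=3e-14):
    """Regularized incomplete beta function I_x(a, b) (modified Lentz continued fraction)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x, max_iter, eps)  # symmetry: faster convergence
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    c, d = 1.0, 1.0 / _nonzero(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _nonzero(1.0 + aa * d)
            c = _nonzero(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return front * h / a


def t_two_sided_p(t, df):
    """P(|T| > |t|) for Student's t distribution with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x, y):
    """Welch's unequal-variance t-test; returns (t, df, two-sided p).

    Requires non-zero variance in at least one group.
    """
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))  # Welch-Satterthwaite
    return t, df, t_two_sided_p(t, df)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (cumulative minimum from the largest p)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * n, 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_table(depleted, normal):
    """depleted / normal: dict protein -> replicate intensities (None = missing).

    Keeps only proteins quantified in every replicate of both samples, as in the text.
    Returns dict protein -> (log2 fold change, raw p, BH-adjusted p).
    """
    kept = [p for p in depleted if p in normal
            and None not in depleted[p] and None not in normal[p]]
    log2fc, pvals = [], []
    for prot in kept:
        fc = statistics.mean(depleted[prot]) / statistics.mean(normal[prot])
        log2fc.append(math.log2(fc))
        pvals.append(welch_t_test(depleted[prot], normal[prot])[2])
    padj = bh_adjust(pvals)
    return {prot: (lf, p, q) for prot, lf, p, q in zip(kept, log2fc, pvals, padj)}
```

Plotting the first element of each tuple against −log10 of the third gives a volcano plot like those described for Figure 1. In practice, `scipy.stats.ttest_ind(..., equal_var=False)` or R's `t.test()` would replace the hand-written t distribution; the stdlib version is shown only to keep the sketch self-contained.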