In Silico Prediction and Prioritization of Novel Selective Antimicrobial Drug Targets in Escherichia coli

Novel antimicrobials interfering with pathogen-specific targets can minimize the risk of perturbations of the gut microbiota (dysbiosis) during therapy. We employed an in silico approach to identify essential proteins in Escherichia coli that are either absent or have low sequence identity in seven beneficial taxa of the gut microbiota: Faecalibacterium, Prevotella, Ruminococcus, Bacteroides, Lactobacillus, Lachnospiraceae and Bifidobacterium. We identified 36 essential proteins that are present in hyper-virulent E. coli ST131 and have low similarity (bitscore < 50 or identity < 30% and alignment length < 25%) to proteins in mammalian hosts and beneficial taxa. Of these, 35 are also present in Klebsiella pneumoniae. None of the proteins are targets of clinically used antibiotics, and 3D structure is available for 23 of them. Four proteins (LptD, LptE, LolB and BamD) are easily accessible as drug targets due to their location in the outer membrane, especially LptD, which contains extracellular domains. Our results indicate that it may be possible to selectively interfere with essential biological processes in Enterobacteriaceae that are absent or mediated by unrelated proteins in beneficial taxa residing in the gut. The identified targets can be used to discover antimicrobial drugs effective against these opportunistic pathogens with a decreased risk of causing dysbiosis.


Introduction
Due to the worldwide increase in resistance observed among certain bacterial pathogens, there is a pressing need for novel antimicrobials. Most of the antimicrobial drugs approved for human use since the end of the antibiotic golden age in the 1960s belong to known antimicrobial classes and are primarily active against Gram-positive bacteria [1]. Although in recent years novel antimicrobials have become available for effective treatment of Gram-negative bacterial infections, such drugs belong to known antimicrobial classes, and only one antimicrobial with a novel mechanism of action, Murepavadin, has made it to Phase 3 trials [2,3]. Thus, there is a demand for truly novel antimicrobial compounds targeting Gram-negative bacteria. In 2017, the World Health Organization released a list of global priority antimicrobial-resistant (AMR) pathogens in order to guide research, discovery and development of new antibiotics [4]. The highest category ('Priority 1: Critical') comprises Acinetobacter baumannii, Pseudomonas aeruginosa and several Enterobacteriaceae, including Escherichia coli. The latter species is the most frequent cause of several common infections, such as urinary tract infection and septicaemia, that are increasingly difficult to treat due to the global spread of specific hyper-virulent and multidrug-resistant clones. In particular, ST131 is a major contributor to the global spread of fluoroquinolone resistance and extended-spectrum β-lactamase-mediated resistance to β-lactams, and is responsible for millions of multidrug-resistant infections each year [5,6].
Oral antimicrobial therapy impacts the healthy gut microbiota by inducing a loss of beneficial microbes followed by expansion of opportunistic pathogenic bacteria, such as Enterobacteriaceae [7]. This phenomenon, generally referred to as dysbiosis, can in extreme

Essential Genes in the Target Pathogen
Due to the lack of understanding of the essential genome in pathogenic strains such as ST131, the model strain BW25113 was used as a basis for our study. The predicted amino acid sequences of 353 out of the 358 essential genes identified by Goodall et al. [10] were retrieved (Figure 1). The remaining five genes (ttcC, yddL, yedN, ygeF and ygeN) were labelled as 'pseudogenes' or 'putative proteins'. None of these proteins had been identified as essential in the Keio collection, a systematic collection of single-gene knockout mutations previously used as the gold standard to define the essentialome of E. coli BW25113 [11]. Such genes were excluded from further analysis since their classification as essential by Goodall et al. [10] may be an artefact of the methodology employed in the original study.
The sequences of the 353 proteins were compared to those found in E. coli O25b:H4-ST131 using a pre-defined cut-off (bitscore ≥ 50 or ≥ 70% sequence identity and ≥ 75% alignment length, see Materials and Methods). All but two of the 353 essential BW25113 proteins (YmfE and YmiB) were associated with at least one hit in ST131 (Supplementary Table S1). Eleven additional proteins were excluded as they scored below the cut-off threshold, leading to a list of 340 essential and conserved proteins for further analyses. Inspection of the excluded sequences revealed that 10 were prophage-related or uncharacterized proteins. The presence of phages in a bacterial genome is expected to vary with the specific strain history, and this may explain the observed difference between the laboratory strain BW25113 and the hyper-virulent clonal lineage ST131. The essentiality status of some of these phage-related genes (e.g., relB, ydaS and racR) can be explained through their function as transcriptional regulators, and removal of these proteins allows the phage to activate and subsequently kill the cell.

Similarity to Proteins in Mammalian Hosts
The second step of the analysis aimed at removing E. coli targets homologous to the human proteome. A high degree of similarity between the pathogen-specific target and one or more proteins in the host proteome may result in off-target binding of a drug, leading to toxicity and unwanted side effects. The 340 selected essential proteins were therefore compared to the human proteome, leaving 181 proteins with no human hit above the same stringent cut-offs as described above (Figure 1, Table S2).

Similarity to Proteins in Beneficial Taxa of the Gut Microbiota
The next step in the selection pipeline aimed to exclude proteins with high similarity to those found in representatives of the beneficial gut microbiota. Given the complexity and variability of the gut microbiome, we decided to focus on seven taxa containing species previously shown to have beneficial and protective effects on the host [7,8,12-18]: Faecalibacterium, Prevotella, Ruminococcus, Bacteroides, Lactobacillus, Lachnospiraceae and Bifidobacterium (Tables S3-S9). The 181 proteins were searched by BLAST against the above-mentioned taxa using the same cut-off values as before (Figure 1). As expected, this step was the most selective, leaving just 26 proteins for further analysis (Table 1), and removed all targets of commercially available antibiotics, including FtsI and MrdA (targets of β-lactams), as well as parts of the 30S and 50S ribosome (targets of macrolides, aminoglycosides and tetracyclines) and RNA polymerase (target of rifamycins).
Among the identified 26 proteins, only PheM and TrpL, leader peptides in the Phe tRNA synthetase and Trp biosynthetic operon, respectively, were found to be missing completely in all taxa. The uncharacterized protein YqeL was found to be missing in all except for Lachnospiraceae and Lactobacillus. SafA (part of the low pH stress response) was found to be missing in Lachnospiraceae, Bifidobacterium and Faecalibacterium. MreD, a rod shape-determining protein, was missing from Bacteroides and Bifidobacterium. Furthermore, WzyE (probable ECA polymerase) lacked hits in Bifidobacterium and Ruminococcus. LolA, LolB (both involved in lipoprotein transport) and DnaT (primosomal protein 1) were not associated with any hits in Bifidobacterium; LptE (part of the lipopolysaccharide (LPS) assembly machinery) and TusE (a sulphur transferase) were missing in Faecalibacterium; FtsL (a cell division protein) was missing from Bacteroides; and CydX (cytochrome bd-I ubiquinol oxidase subunit X) from Ruminococcus. All other proteins were associated with hits below the cut-off in all taxa.
Due to the high stringency applied in the above step, potentially valuable targets may have been missed in the selection process. Thus, a second analysis of the microbiota BLAST results was undertaken to find proteins associated with only ten or fewer hits over the cut-off. Ten proteins (BamD, FabA, LptD, MukB, MukF, SecE, YdfO, YgfZ, YrfF and ZipA) were each found to have been excluded based on the bitscore cut-off rather than sequence identity. Thus, these proteins were included in further analyses, leading to 36 proteins as potential E. coli-selective targets (Table 1).
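The selection funnel described so far can be tallied in a few lines. The following is only a bookkeeping sketch in Python: the counts are those reported in the text, while the step labels and the variable name `funnel` are ours and purely illustrative.

```python
# Bookkeeping sketch of the selection funnel; all counts come from the text,
# the step labels are illustrative names and not part of the study.
funnel = [
    ("essential genes (Goodall et al.)", 358),
    ("protein sequences retrieved", 358 - 5),         # five pseudogenes/putative proteins dropped
    ("conserved in ST131", 358 - 5 - 2 - 11),          # 2 without hits, 11 below the cut-off
    ("low similarity to human proteome", 181),
    ("low similarity to beneficial gut taxa", 26),
    ("after second analysis of the BLAST results", 26 + 10),  # ten proteins re-included
]

for label, count in funnel:
    print(f"{label:45s} {count:4d}")
```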

Target Essentiality and Conservation in K. pneumoniae
We evaluated presence and essentiality of the selected targets in K. pneumoniae, which represents another global priority pathogen closely related to E. coli. The essentiality of the 36 proteins was checked against the library generated by Ramage et al. [29], together with conservation of the amino acid sequence as above (Figure 1, Table S10). Eighteen were found to be essential in both organisms, all displaying high sequence conservation. However, all of the remaining proteins not reported to be essential in K. pneumoniae fulfilled the similarity-based selection criteria, except for YqeL, which lacked hits completely (Table 1).

Biological Function of Selected Targets
Of the 36 identified targets (Table 1), 20 were found to share similar biological functions (Figure 2). The largest functional group comprised eight proteins involved in outer membrane (OM) biogenesis and maintenance (BamD, LptA, LptD, LptE, LolA, LolB, SecE and YciS) (Figure 2). Cell division made up the second largest functional group (MukB, MukE, MukF, FtsB, FtsL, FtsQ and ZipA), which is closely linked to the functional groups involved in cell shape (MreD) and DNA replication (HolD, PriB, DnaT and YgfZ) (Figure 2). Additionally, two members belonging to the functional group involved in transcriptional regulation (PheM and TrpL) were also identified, together with one member involved in translation (TusE). Furthermore, three proteins were involved in biosynthetic pathways (WzyE, FabA and HemD), three in stress response (CydX, IraM and SafA) and two in toxin-antitoxin systems (HipB and HigA) (Figure 2). Finally, five proteins with unknown functions were also identified (YrfF, YdhL, YcaR, YqeL and YdfO).

Target Localization
An essential requirement for developing an efficient antimicrobial drug is target accessibility. This is especially important in Gram-negative bacteria, where the double membrane structure acts as a permeability barrier, efficiently blocking many compounds from accessing intracellular targets. Subcellular localization (SCL) was therefore considered when evaluating each protein's druggability. The SCL for 25 of the 36 selected proteins could be determined using Swiss-Prot, the manually annotated section of UniProtKB (Figure 2, Table 1). For eleven proteins (DnaT, HemD, HigA, HipB, HolD, PheM, PriB, TrpL, YdfO, YdhL and YqeL), no information about SCL was available. Based on this criterion, the four OM-associated proteins (LptD, LptE, LolB and BamD) are promising potential targets, especially LptD, which contains extracellular domains.

Existence of Known Inhibitors
Next, a literature search was conducted to identify previously reported inhibitors of the selected E. coli-specific targets. As expected, none of the targets presented in Table 1 are inhibited by commercially available antibiotics. Through analysis of scientific literature, we were able to identify inhibitors targeting a few of the listed targets but, to our knowledge, none have gone beyond laboratory studies: the ZipA/FtsZ interaction has been reported to be inhibited by certain antimicrobial compounds [27,28]; the insect peptide Thanatin blocks LptA [23]; compound IMB-881 blocks the interaction between LptA and LptC [24]; JB-95 inhibits β-barrel proteins including LptD [25]; MAC13243 inhibits LolA [22]; BamD is inhibited by an inhibitory peptide [19] while the compound IMB-H4 has been shown to block BamA-BamD interaction [20]; MukB is inhibited by the small molecules Michellamine B and NSC260594 [26]; and FabA by 3-Decynoyl-NAC [21]. Thanatin has been shown to possess antimicrobial activity against several Gram-negative bacteria beyond E. coli, including K. pneumoniae, Salmonella typhimurium and Enterobacter cloacae [23]. IMB-H4 was also able to inhibit growth in K. pneumoniae, P. aeruginosa and A. baumannii [20]. NSC176319 was found to be active against Staphylococcus aureus, permeabilized P. aeruginosa and A. baumannii [26]. JB-95 was reported to have antimicrobial activity against A. baumannii, P. aeruginosa and S. aureus [25], and MAC13243 has been shown to be active against P. aeruginosa [22]. With the information provided in this study, some of these inhibitors may represent starting scaffolds for development into pathogen-specific antibacterials. In addition, they might represent useful tools in validating future target-based assays.

Target Structure
Structure-guided drug design is a powerful in silico approach that can rapidly screen millions of compounds for their ability to dock into a desired target, and identified hits can subsequently be tested in vitro. Thus, 3D structures at a high enough resolution represent an advantage for the targets identified in this study.
Information retrieved from the Protein Data Bank (PDB) [30] showed that a 3D structure at a resolution of <3 Å was available for 18 proteins and at >3 Å for 3 proteins, whereas no structure could be found for 13 proteins. YrfF and YdfO were each associated with a structure, but no resolution information was reported in the databases (Table 1).

Discussion
The originality of the present study lies in the identification of E. coli-selective cellular targets that may lead to the discovery of innovative antimicrobial drugs with limited effect on the healthy gut microbiota. Similar in silico studies have previously been conducted for verotoxigenic E. coli O157:H7, K. pneumoniae, Yersinia pseudotuberculosis and Enterobacteriaceae [31-35]. However, these studies were not designed to identify targets with low identity to the corresponding proteins in beneficial taxa residing in the intestinal tract, and some suffered from limitations related to the lack of a well-established essential genome for the target pathogen.
We identified 36 potential drug targets selective for E. coli based on protein sequence identity. A large proportion of the identified proteins are functionally related amongst themselves. Within the proteins involved in OM biogenesis, BamD is directly associated with the OM, and is part of the β-barrel assembly machinery (BAM); LptA, LptD and LptE are all part of the lipopolysaccharide transport (LPT) machinery; LolA and LolB are located in the periplasmic space and on the periplasmic side of the OM respectively, both belonging to the lipoprotein transport machinery responsible for delivering OM lipoproteins to the OM assembly machineries (LOL, BAM and LPT) [36]; SecE is part of the SecYEG protein translocation machinery responsible for transporting proteins into the periplasm [37]; and the inner membrane protein YciS (also known as LapA, lipopolysaccharide assembly protein A) is part of the machinery responsible for envelope stress response and regulation of LPS production [38].
Two functional groups responsible for DNA replication (HolD, PriB, DnaT and YgfZ) and cell division (MukB, MukE, MukF, FtsB, FtsL, FtsQ and ZipA) were identified. DNA replication is a tightly controlled mechanism where the DNA Polymerase III holoenzyme is the major replication complex in E. coli, and wherein HolD (ψ subunit) forms part of the clamp-loading complex [39]. DNA damage can cause this machinery to stall and disassemble on the chromosome, leading to replication failure. To restart replication, the cell must make use of the replication restart primosome, in which PriB and DnaT are found [40]. YgfZ has been shown to be part of the system regulating chromosomal replication [41]. The MukBEF complex is found only in a subset of γ-proteobacteria and is involved in cell division, making up the sole E. coli condensin for chromosome replication, segregation and organization [42]. Further downstream in this process, the transmembrane complex FtsBL is found [43], which, together with FtsQ [44] and ZipA, is involved in cell division [45]. In two related processes, MreD is involved in determining cell shape [46] and TusE in translation [47].
Three selected targets, CydX, IraM and SafA, have functions related to stress response. CydX is part of the CydAB cytochrome bd oxidase complex involved in aerobic respiration and maintaining the charge across the membrane used for ATP synthesis [48]. IraM is a regulator of σS, the stationary phase sigma factor responsible for controlling expression of a plethora of genes involved in stress response [49]. SafA is a protein that connects the signal transduction between the two-component systems EvgS/EvgA and PhoQ/PhoP in response to acid stress conditions [50].
Four selected targets are involved in biosynthetic pathways. WzyE has been implicated in the assembly of the enterobacterial common antigen [51]; FabA is a protein involved in fatty acid biosynthesis [52]; TrpL is involved in controlling tryptophan biosynthesis [53]; and HemD is a uroporphyrinogen III synthase [54]. The remaining targets include two anti-toxins of the Type II toxin-antitoxin system, HipB and HigA, which counteract the effect of their cognate toxins [55]. One protein involved in transcriptional regulation, PheM, which is responsible for attenuation of phenylalanyl-tRNA synthetase, was identified [56]. Finally, no information regarding biological function could be found for the remaining five proteins (YrfF, YcaR, YdhL, YqeL and YdfO), indicating that there is more to discover regarding E. coli biology.
When searching for homologues in the seven representative taxa of the healthy gut microbiota, only TrpL and PheM lacked hits in all these groups. However, no information on SCL and 3D structure is available for either of these two proteins. trpL is part of the essential tryptophan biosynthesis pathway, the costliest of the amino acid synthesis pathways. All organisms capable of this biosynthesis employ structurally similar proteins, but the organization within the trp operon and its regulatory mechanisms vary widely between different organisms. trpL functions as an operon leader peptide, which, under low levels of charged tRNATrp, causes the ribosome to stall during translation of this operon. This leads to the formation of an antiterminator structure, allowing transcription to continue [57]. Similarly, pheM acts as an operon leader peptide in the phenylalanine biosynthetic pathway and regulates transcription of the pheMS operon [56]. Although the mechanisms of control may differ between bacteria, targeting either of these gene products would be challenging as they exert their control on a transcriptional level. Interestingly, though these proteins are not thought to have any functions in trans, it has recently been shown that in the α-proteobacterial plant symbiont Sinorhizobium meliloti, TrpL can, upon antibiotic exposure, utilize antimicrobial compounds for post-transcriptional regulation of resistance operons, a trait that was further shown to be conserved in other α-proteobacteria [58]. If a similar role can be established for TrpL in E. coli, this may be an interesting target for development of helper drugs.
The potential to target an infecting pathogen without affecting the beneficial microbiota is clinically attractive due to increasing concerns regarding the impact of antimicrobial therapy on dysbiosis [59]. The clinical value of such antibiotics would be even higher if their spectrum covered other pathogenic bacterial species. Thus, the 36 target proteins selective for E. coli were analyzed for sequence conservation and essentiality in K. pneumoniae, a close relative of E. coli and another major contributor to multidrug-resistant infections worldwide [60]. Although 35 of the 36 proteins were highly conserved in both species, only 18 were found to be essential in this pathogen. This could be due to disparity of information, since the E. coli essential genome is well characterized, whereas the essentiality status of genes in K. pneumoniae is defined by a single study [29], which may be affected by different methods and conditions to those used to establish the E. coli 'essentialome'.
Due to the double-membrane structure in Gram-negative bacteria, the accessibility of a potential drug target is essential. Proteins located in the OM are therefore considered to be optimal targets [61]. Here, we identified LptD, LptE, LolB and BamD as OM-associated proteins. While LolB and BamD are associated with two separate OM biogenesis pathways, LptD and LptE, together with LptA, all belong to the Lpt pathway responsible for LPS transport. This pathway is integral for cell viability, as Gram-negative bacteria are dependent on it for LPS transport, and could potentially be efficiently targeted without requiring access to the intracellular environment. The Lpt pathway consists of seven essential proteins, of which three have been indicated through this study as appropriate potential drug targets, suggesting that this is a druggable pathway in E. coli. Furthermore, LptD also has extracellular domains, indicating that it may be possible to find an inhibitor that interferes with this protein without the need to cross the OM. All three Lpt proteins were found to be associated with hits in the selected gut microbiota taxa. Based on bitscore, the highest-ranking hit for LptA was found in Lachnospiraceae (bitscore = 38, % alignment = 37% and % id = 35%), whereas for LptD and LptE both were found in Bacteroides (bitscore = 172, % alignment = 11%, % id = 96%, and bitscore = 35, % alignment = 35% and % id = 34%, respectively).
Notably, the four proteins listed above have excellent 'druggability' potential, since in addition to their optimal SCL, they are also essential in K. pneumoniae and have known 3D structures. Furthermore, all four proteins displayed low levels of sequence identity to homologous proteins in the selected gut taxa used in this study; in particular, LolB and LptE, which both fell below the cut-offs. Both LptD and BamD were recovered by manual inspection of the proteins excluded. However, both proteins were excluded due to their bitscore, and were only associated with two and four hits, respectively. The selectivity of BamD and LptD can be further evaluated by future in vitro studies using known inhibitors that specifically interfere with these proteins [19,20,25]. Known inhibitors are also available for ZipA [27,28], LptA [23,24], LolA (MAC13243) [22], BamD (peptide and IMB-H4) [19,20], MukB (Michellamine B and NSC176319) [26] and FabA (3-Decynoyl-NAC) [21]. The fact that six inhibitors targeting OM-related processes were found further strengthens the hypothesis that these functions are viable for development of novel antimicrobials.
All in silico studies suffer from drawbacks related to arbitrary cut-off criteria that may lack biological relevance. Too stringent cut-offs potentially exclude valuable drug targets, while too loose criteria may result in an unmanageable list of targets. In the present study, we chose to integrate stringent cut-offs with manual revision in order to minimize exclusion of relevant proteins. Another limitation of this study is that essentiality may differ between in vitro and in vivo conditions. The essential genome established by Goodall et al. [10] was characterized in rich media conditions, and may therefore not include genes that are essential for metabolism inside the host [10,62]. Certain bacterial biosynthetic pathways may be downregulated as the pathogen instead relies on the host to supply nutrients such as amino acids, vitamins and nucleobases [62]. However, targeting biosynthetic pathways involved in maintenance of the cell is likely to represent a target relevant in vitro as well as in vivo [63].
In silico studies such as this are a first but essential step towards the discovery of novel pathogen-targeted antimicrobials. The results of our study provide a starting point towards the identification and development of novel specific antimicrobials targeting E. coli. Future wet-lab studies are required to validate the presumptive selective targets identified by the study. High-throughput screens can be applied to find inhibitors interfering with the specific protein targets, e.g. through 'knock-down' strains with reduced expression of the target protein. The antimicrobial activity of the identified inhibitors could subsequently be evaluated on a comprehensive strain collection representative of the healthy gut microbiota, or directly on faecal samples using a metagenomics approach in order to assess their selective toxicity towards E. coli and other pathogenic Enterobacteriaceae.

Protein Essentiality in E. coli
The GenBank record for E. coli BW25113 (GenBank: CP009273.1) was downloaded, and the protein sequences from the 358 genes found to be essential by Goodall et al. [10] were extracted. Five genes were removed due to being labeled as pseudogenes (ttcC, yedN, ygeF and ygeN) or putative protein (yddL).

Protein Homology in E. coli ST131, Humans and Gut Beneficial Taxa
To establish presence of proteins in E. coli O25b:H4-ST131 (NCBI:txid941322), NCBI BLAST+ BLASTp was used to search the 353 protein sequences against this organism, using the organism name as Entrez query against the refseq protein BLAST database. Percent alignment was calculated by dividing the length of the hit by the length of the query protein, extracted from the NCBI record. A dual cut-off was used, where hits with percent id ≥ 70% and percent alignment ≥ 75%, or with bitscore ≥ 50, were classed as above the cut-off. These cut-offs were selected to be equal to, or more stringent than, those used in previous in silico studies [31,32,35,62]. For this conservation step, proteins whose hits all fell below the cut-offs were removed, and the remaining proteins were taken on to the next step.
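The percent alignment calculation and the dual cut-off can be written down compactly. The following Python sketch assumes that the 'or' in the cut-off applies to the bitscore criterion as a whole, i.e. (identity and alignment) or bitscore. The class `Hit` and the function names are illustrative and were not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTp hit; field names are illustrative, not from the study."""
    percent_identity: float  # percent id reported by BLAST
    hit_length: int          # length of the hit
    bitscore: float


def percent_alignment(hit_length: int, query_length: int) -> float:
    """Length of the hit divided by the length of the query protein, in percent."""
    return 100.0 * hit_length / query_length


def is_above_cutoff(hit: Hit, query_length: int) -> bool:
    """Dual cut-off: (id >= 70% and alignment >= 75%) or bitscore >= 50."""
    aligned = percent_alignment(hit.hit_length, query_length)
    by_identity = hit.percent_identity >= 70.0 and aligned >= 75.0
    by_bitscore = hit.bitscore >= 50.0
    return by_identity or by_bitscore
```

Under this reading, a hit with bitscore 172 but only 11% alignment is still classed as above the cut-off through the bitscore criterion alone, whereas a hit with bitscore 38, 37% alignment and 35% identity is not.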
To find human homologues, the NCBI BLAST+ command-line remote BLAST tool was used to search the remaining protein sequences with the Entrez query 'Homo sapiens [Organism]' against the 'refseq_proteins' database, and the output was sorted using the same cut-offs as described above.
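A remote search of this kind could be assembled as follows. This is a sketch only: it builds a `blastp` argument list using standard BLAST+ options (`-remote`, `-db`, `-query`, `-entrez_query`, `-outfmt`, `-out`) but does not run it. The file names and the tabular column selection are assumptions, and the database name is copied from the text and may need adjusting to the name expected by the NCBI server.

```python
import shlex


def build_remote_blastp_command(query_fasta, entrez_query, out_path,
                                database="refseq_proteins"):
    """Return a BLAST+ blastp argument list for a remote, Entrez-restricted search.

    The default database name is taken verbatim from the text; the output
    columns are an assumption chosen to support the cut-off calculation.
    """
    return [
        "blastp",
        "-remote",                       # run the search on NCBI servers
        "-db", database,
        "-query", query_fasta,
        "-entrez_query", entrez_query,   # restrict hits to one organism or taxon
        "-outfmt", "6 qseqid sseqid pident length qlen bitscore",
        "-out", out_path,
    ]


# Hypothetical file names, for illustration only.
command = build_remote_blastp_command(
    "essential_proteins.fasta", "Homo sapiens [Organism]", "human_hits.tsv"
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine with BLAST+ installed; the same function can be reused for each taxon by changing the Entrez query.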
Remote BLASTp command line applications were used to assess protein homology in specific gut taxa using the Entrez queries 'Faecalibacterium  Tables S3-S9. The results were downloaded and analyzed using the same methodology and cut-offs as described above.
All hits sorted as above the cut-off in the similarity search against the simplified microbiota were collected and the number of hits for each protein evaluated. A table with the number of hits, together with the information for the highest scoring hit for each protein, was generated. The proteins with fewer than ten hits were manually inspected, and proteins with high-scoring similarity were removed.
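The hit-counting step that precedes manual inspection can be sketched as below. The function names and the input layout (pairs of protein name and cut-off flag) are illustrative assumptions. The threshold is left as a parameter, and the final decision to keep or remove a protein remains a manual step that is not modelled here.

```python
from collections import Counter


def count_hits_above_cutoff(hits):
    """Count, per protein, the hits classed as above the cut-off.

    `hits` is an iterable of (protein_name, above_cutoff) pairs.
    """
    counts = Counter()
    for protein, above in hits:
        if above:
            counts[protein] += 1
    return counts


def candidates_for_manual_inspection(counts, max_hits=10):
    """Proteins with at least one but fewer than `max_hits` hits above the cut-off."""
    return sorted(p for p, n in counts.items() if 0 < n < max_hits)


# LptD (two hits) and BamD (four hits) follow the counts given in the Discussion;
# 'ExampleX' with 25 hits is a made-up protein used only for contrast.
example_hits = (
    [("LptD", True)] * 2 + [("BamD", True)] * 4
    + [("ExampleX", True)] * 25 + [("LptD", False)] * 3
)
counts = count_hits_above_cutoff(example_hits)
print(candidates_for_manual_inspection(counts))
```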

Protein Conservation and Essentiality in K. pneumoniae
To assess protein conservation in K. pneumoniae, command line applications for BLASTp were used to BLAST remotely against 'Klebsiella pneumoniae [Organism]' using the 'refseq_proteins' database (Table S10). The supplementary dataset generated by Ramage et al. [29] was downloaded and used to search for the potential target genes using gene names.
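The gene-name lookup against the essentiality dataset amounts to a simple membership test. The sketch below assumes case-insensitive matching of gene names, which is our assumption and not a detail given in the text; the essential-gene list shown is a placeholder excerpt and not the actual dataset of Ramage et al. [29].

```python
def split_by_essentiality(target_genes, essential_genes):
    """Split targets into those found in an essential-gene list and those not found.

    Matching is by gene name, ignoring case (an assumption of this sketch).
    """
    lookup = {gene.lower() for gene in essential_genes}
    essential = [g for g in target_genes if g.lower() in lookup]
    not_reported = [g for g in target_genes if g.lower() not in lookup]
    return essential, not_reported


# Placeholder excerpt: the four OM-associated targets are described in the text as
# essential in K. pneumoniae; 'geneX' is a made-up entry.
placeholder_essential = ["lptD", "lptE", "lolB", "bamD", "geneX"]
targets = ["lptD", "lptE", "lolB", "bamD", "yqeL"]

essential, not_reported = split_by_essentiality(targets, placeholder_essential)
print(essential, not_reported)
```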

Identification of Inhibitors by Screening Scientific Literature
Scientific literature was screened to identify inhibitors by searching PubMed and Google Scholar using the phrases 'Escherichia coli inhibitor' or 'Escherichia coli antimicrobial' in combination with the protein of interest. Abstracts and manuscripts were screened to find inhibitors targeting the protein of interest in E. coli.

Protein SCL and 3D Structure
The SCL for each protein was manually checked by querying the UniProt/SwissProt database, and retrieving the information found under 'Subcellular Localisation'.
Information about 3D structure for proteins in E. coli K-12 was manually retrieved through the UniProt/SwissProt entries for each protein individually. The PDB accession number, molecules in complex and resolution were recorded.