Cell-Penetrating Peptide–Peptide Nucleic Acid Conjugates as a Tool for Protein Functional Elucidation in the Native Bacterium

Approximately 30% or more of the proteins annotated from sequenced bacterial genomes are annotated as hypothetical or uncharacterized proteins. However, elucidating the function of these proteins is hindered by the lack of simple and rapid screening methods, particularly for novel or hard-to-transform bacteria. In this report, we employed cell-penetrating peptide (CPP)–peptide nucleic acid (PNA) conjugates to elucidate the function of such uncharacterized proteins in vivo within the native bacterium, using the hard-to-transform genus Paenibacillus as a model. Two hypothetical genes showing amino acid sequence similarity to ι-carrageenases, termed cgiA and cgiB, were identified from the draft genome of Paenibacillus sp. strain YYML68, and CPP–PNA probes targeting the mRNAs of the acyl carrier protein gene, acpP, and of the two ι-carrageenase candidate genes were synthesized. Direct incubation with the CPP–PNA targeting acpP mRNA inhibited the growth of strain YYML68 in a concentration-dependent manner. Similarly, CPP–PNA probes were used to inhibit the expression of the candidate ι-carrageenases, allowing these hypothetical proteins to be confirmed and characterized. In summary, we believe that CPP–PNA conjugates can serve as a simple and efficient alternative approach to characterizing proteins in the native bacterium.


Introduction
Accurate functional analysis of newly discovered proteins is crucial for elucidating the role of each protein in a given metabolic reaction or cellular system. Thus far, characterization of such novel proteins has relied heavily on heterologous expression, whereby the gene encoding the protein is cloned into a suitable vector and the protein is overexpressed in an easily cultivable, genetically tractable host strain [1]. Because this approach is well established, with numerous commercially available protein expression systems [2][3][4], many newly discovered proteins can be characterized easily and efficiently. However, expression of a target protein with this approach still requires validation and optimization with respect to factors such as the physiological role and solubility of the expressed protein, the presence or absence of post-translational modifications, and codon usage. Furthermore, expression can be hindered by host incompatibility when the target protein originates from unknown or uncultivable strains [5], and some hosts are known to rearrange or alter foreign genes [6] as a defense against the cytotoxic effects of the foreign protein. To overcome these limitations, efforts are under way to establish new microbial expression hosts; however, this process is tedious and time-consuming, and it still does not guarantee expression of the target gene or gene cluster.
Because of the difficulties of conventional heterologous expression, researchers have alternatively exploited the vast amount of data generated by next-generation sequencing and deposited in online databases to perform protein functional analyses. Gene annotation can now be further facilitated by in silico structural and functional annotation algorithms, which predict the biological characteristics of new or candidate proteins based on orthology, conserved domains, subcellular targeting signals or tertiary structure [7,8]. This approach can also predict the function of novel proteins obtained from uncultivable hosts or of proteins that could not be expressed in conventional expression systems. However, despite their availability and applicability, a drawback of these in silico approaches is the possibility of incorrect protein characterization [9][10][11]: proteins whose functions were predicted from few available samples, or were never experimentally validated, can themselves serve as the basis for further predictions, propagating errors. Today, even with high-throughput sequencing and computationally advanced prediction tools, misannotated proteins in public databases remain a problem [12].
Given the challenges faced by approaches such as heterologous expression and in silico prediction, efforts to characterize proteins more effectively and accurately have turned toward enhancing in vivo expression of the proteins within the native bacterium. One approach is to alter the cultivation environment by applying extreme environmental factors [13]. Under such stresses, the microbe may express more of the target protein or switch to a different metabolic pathway to counter the stress, leading to the expression of "silent" or "dormant" proteins. Biosynthetic engineering approaches, such as the recombination of transcriptional regulators via gene refactoring, have also been introduced to activate "silent" biosynthetic gene clusters of microbial natural products [14][15][16]. Such approaches allow the identification of modification enzymes that can be employed as key components in the synthesis of novel lead compounds. Although these approaches are effective and have allowed more proteins to be discovered, validating the function of a protein of interest within the native bacterium remains challenging, and researchers must still resort to conventional methods to characterize the target protein functionally.
Our group focuses on cell-penetrating peptides (CPPs) as tools for delivering biomolecules into bacteria, and we have previously promoted CPP permeation into diverse bacterial strains by optimizing abiotic factors [17] and by using unnatural amino acid residues [18]. In this study, we explored the use of CPPs in combination with peptide nucleic acid (PNA) to elucidate protein function in vivo within the native bacterium, and we took this one step further by testing the approach with a hard-to-transform bacterium, Paenibacillus, as a model. We believe that this work offers a simpler and more efficient approach to characterizing proteins and may eventually contribute to a new standard approach in protein functional analysis.

Genomic DNA Extraction
Paenibacillus sp. YYML68 was cultured in Marine Broth 2216 (MB; BD Difco, Tokyo, Japan) at 30 °C for 2 days with agitation. Cells were collected by centrifugation at 7500× g for 10 min, and genomic DNA was extracted using the Genomic DNA Buffer Set and Genomic-tip 100/G (QIAGEN, Tokyo, Japan) according to the manufacturer's protocols. The DNA concentration was quantified using the Qubit dsDNA HS Assay Kit with a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Tokyo, Japan).

Genome Sequencing and Assembly
Genomic DNA was sequenced using the Oxford Nanopore Technologies (ONT; Oxford, UK) and Illumina (Tokyo, Japan) platforms. For ONT sequencing, a genomic library was prepared using the Rapid Sequencing Kit (SQK-RAD004) according to the manufacturer's protocols, and sequencing was performed on a MinION system with an R9.4 flow cell for a run time of 48 h. For Illumina sequencing, library preparation and sequencing were performed by Macrogen Japan Corporation (Tokyo, Japan): a library was prepared from 1 µg of genomic DNA using the TruSeq DNA PCR-Free kit, followed by a 151 bp paired-end sequencing run on a NovaSeq 6000 system.

Carrageenan Degradation by Paenibacillus sp. YYML68
Paenibacillus sp. strain YYML68 was cultured and collected as described above and washed twice with filter-sterilized seawater. The washed cells were counted using a bacteria-counting chamber (Erma Inc., Tokyo, Japan) and inoculated to a final concentration of 1.0 × 10⁵ cells/mL into 3 mL of 4× diluted MB supplemented with 1% ι-carrageenan, 1% κ-carrageenan or 2% λ-carrageenan, prepared in 15 mL centrifuge tubes. Each sample was incubated at 30 °C with gentle agitation, and degradation of each carrageenan type was measured at 0, 8, 16 and 24 h. MB-carrageenan samples without bacterial inoculation were used as negative controls. Because viscous carrageenan solutions lose their viscosity upon enzymatic degradation, carrageenan degradation was determined simply by measuring the increase in the height of the liquid level of each culture upon stirring and comparing it with that of the negative controls at the designated time points. Samples were stirred using a Vortex-Genie 2 Mixer (Scientific Industries, New York, USA) at the lowest speed setting.
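The inoculation step above (1.0 × 10⁵ cells/mL in 3 mL) reduces to a chamber-count conversion followed by a C1V1 = C2V2 dilution. The Python sketch below illustrates that arithmetic under stated assumptions: the per-square chamber volume (5 × 10⁻⁸ mL) and the example count are placeholders rather than values from this work or the Erma chamber specification, and the helper names are hypothetical.

```python
# Hypothetical helper for planning the inoculum of the carrageenan degradation assay.
# Assumption: the counting-chamber volume per square is read from the chamber's
# specification sheet; 5e-8 mL below is a placeholder, not the Erma chamber's value.


def cells_per_ml(mean_count_per_square: float, square_volume_ml: float,
                 dilution_factor: float = 1.0) -> float:
    """Convert a mean count per chamber square into cells/mL of the undiluted suspension."""
    return mean_count_per_square * dilution_factor / square_volume_ml


def inoculum_volume_ml(stock_cells_per_ml: float, target_cells_per_ml: float = 1.0e5,
                       final_volume_ml: float = 3.0) -> float:
    """Volume of washed suspension giving the target density in the final volume (C1*V1 = C2*V2)."""
    if stock_cells_per_ml <= target_cells_per_ml:
        raise ValueError("stock suspension must be denser than the target density")
    return target_cells_per_ml * final_volume_ml / stock_cells_per_ml


if __name__ == "__main__":
    # Hypothetical reading: 4 cells per square, undiluted suspension.
    stock = cells_per_ml(mean_count_per_square=4, square_volume_ml=5e-8)
    volume_ul = inoculum_volume_ml(stock) * 1000
    print(f"stock = {stock:.2e} cells/mL; add {volume_ul:.2f} uL and make up to 3 mL")
```

Because V2 here is the final volume including the inoculum, the result is exact when the culture is made up to 3 mL after adding the cells.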

CPP-PNA Design and Synthesis
The CPP-PNA probe design and sequences used in this work are provided in Figure 1. Briefly, the CPP sequence used in this work, (DabFF)3Dab, combines an unnatural cationic amino acid (2,4-diaminobutyric acid, Dab) with phenylalanine (F), arranged similarly to that in our previous report [17]. [2-(2-Aminoethoxy)ethoxy]acetic acid (AEEA) was used as the linker between (DabFF)3Dab and the PNA. The PNA sequences of the CPP-Pae68acp, CPP-Pae68cgiA and CPP-Pae68cgiB probes were designed to bind to the start codon of the mRNAs encoding the acyl carrier protein AcpP and the candidate ι-carrageenases CgiA and CgiB, respectively. All CPP-PNA probes were synthesized using standard solid-phase peptide synthesis protocols, purified with a reversed-phase high-performance liquid chromatography (RP-HPLC) system (JASCO Corp., Tokyo, Japan), and analyzed by MALDI-TOF MS using an autoflex speed TOF system (Bruker Japan K. K., Kanagawa, Japan) to confirm successful synthesis.
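Because PNA pairs antiparallel with its target RNA, a probe written N→C is the reverse complement of the targeted 5′→3′ mRNA window. The sketch below illustrates that step only. The 10-mer length follows the probe length discussed later in this paper, but the window offset (5 nt upstream of the start codon), the example sequence and the function name are illustrative assumptions, not the design of the probes shown in Figure 1.

```python
# Hypothetical sketch: derive an N->C antisense PNA sequence that covers a start codon.
# Assumptions: a 10-mer and a window starting 5 nt upstream of the start codon are
# illustrative choices, not the design rules behind the probes in Figure 1.

COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}  # PNA nucleobases use T rather than U
START_CODONS = {"AUG", "GUG", "UUG"}


def antisense_pna(mrna: str, start_index: int, length: int = 10, upstream: int = 5) -> str:
    """Return the PNA sequence (N->C) complementary to a window around the start codon.

    PNA pairs antiparallel with RNA (PNA N-terminus faces the RNA 3' end), so the
    N->C sequence is the reverse complement of the 5'->3' target window.
    """
    mrna = mrna.upper().replace("T", "U")  # also accept DNA-alphabet input
    if mrna[start_index:start_index + 3] not in START_CODONS:
        raise ValueError("no start codon at start_index")
    begin = start_index - upstream
    end = begin + length
    if begin < 0 or end > len(mrna):
        raise ValueError("window extends beyond the supplied sequence")
    window = mrna[begin:end]
    return "".join(COMPLEMENT[base] for base in reversed(window))


if __name__ == "__main__":
    # Made-up 5' region of an mRNA, for illustration only.
    target = "AAGGAGAUAUCAUGUCAGAAAUC"
    print(antisense_pna(target, target.find("AUG")))  # GACATGATAT
```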

Inhibition Assays Using CPP-PNA
The growth inhibition assay was performed using the CPP-Pae68acp probe. Starter cultures consisted of 3 mL of MB inoculated with strain YYML68 to a final concentration of 1.0 × 10⁵ cells/mL. The CPP-Pae68acp probe was added to the starter cultures at final concentrations of 1, 2, 4, 6, 8 and 10 µM, and the cultures were incubated at 30 °C with agitation. Culture turbidity was measured at 660 nm at 0, 2, 4, 6, 8, 10, 12, 24 and 36 h using an OD monitor miniphoto 518R (Taitec Corp., Saitama, Japan). Starter cultures without the CPP-Pae68acp probe were used as negative controls.
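Adding probe to an existing 3 mL culture slightly dilutes the probe itself, so the required stock volume is v = C_f·V0/(C_s − C_f) rather than C_f·V0/C_s. The sketch below tabulates volumes for the 1 to 10 µM series; the 1 mM stock concentration and the function name are assumptions for illustration only.

```python
# Hypothetical sketch: volume of CPP-PNA stock to add to a 3 mL starter culture.
# Assumptions: the probe is added to an existing 3 mL culture (so the addition
# slightly dilutes itself) and the stock is 1 mM; neither value is stated in the text.


def stock_volume_ul(final_um: float, culture_ml: float = 3.0, stock_um: float = 1000.0) -> float:
    """Microlitres of stock so that the culture reaches final_um (uM) after addition.

    Solves final = stock * v / (V0 + v) for v, giving v = final * V0 / (stock - final).
    """
    if not 0 < final_um < stock_um:
        raise ValueError("final concentration must be positive and below the stock concentration")
    culture_ul = culture_ml * 1000.0
    return final_um * culture_ul / (stock_um - final_um)


if __name__ == "__main__":
    for conc in (1, 2, 4, 6, 8, 10):  # CPP-Pae68acp concentration series (uM)
        print(f"{conc:>2} uM: add {stock_volume_ul(conc):.2f} uL of 1 mM stock")
```

At 10 µM the correction changes the volume from 30.00 to 30.30 µL, about 1%, so the simpler formula is usually adequate at these concentrations.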
To evaluate the function of the candidate carrageenases CgiA and CgiB within strain YYML68, four combinations of the CPP-Pae68cgiA and CPP-Pae68cgiB probes were tested: −/−, +/−, −/+ and +/+ (+ and − indicate the addition or omission of each probe, respectively). Each sample consisted of 3 mL of 4× diluted MB supplemented with 1% ι-carrageenan and inoculated with strain YYML68 to a final concentration of 1.0 × 10⁵ cells/mL, and each added probe was used at a final concentration of 8 µM. Samples were incubated at 30 °C with gentle agitation, and ι-carrageenan degradation was assessed from the medium viscosity, as described above, at 0, 8, 16 and 24 h. Medium without strain YYML68 and medium inoculated with a non-carrageenan-degrading bacterium, Bacillus megaterium, were used for comparison.

Graph Presentation and Analysis
All graphs were generated, and t-tests performed, using Prism 9 (v9.4.1; GraphPad Software Inc., San Diego, CA, USA).
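The text does not specify which t-test was used. As one possibility, the sketch below implements an unpaired, two-tailed Student's (pooled-variance) t-test for triplicate groups, which has 4 degrees of freedom; for this case the t distribution has a closed-form CDF, so no statistics package is required. The example readings and the function names are hypothetical, and the options actually chosen in Prism (for example Welch's correction) may differ.

```python
# Hypothetical sketch: unpaired, two-tailed Student's t-test for triplicate groups.
# Assumption: equal-variance (pooled) test with n = 3 per group, i.e. df = 4; the
# exact Prism settings are not stated. Student's t with 4 df has a closed-form CDF.
import math
from statistics import mean, variance


def two_tailed_p_df4(t: float) -> float:
    """Two-tailed p-value of Student's t distribution with 4 degrees of freedom."""
    u = t * t / (1 + t * t / 4)
    cdf = 0.5 + 0.375 * (abs(t) / math.sqrt(1 + t * t / 4)) * (1 - u / 12)
    return 2 * (1 - cdf)


def student_t_triplicates(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, two-tailed p) comparing two independent triplicate groups."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this sketch assumes triplicates (df = 4)")
    pooled_var = (2 * variance(a) + 2 * variance(b)) / 4  # sample variances, n - 1 = 2 each
    if pooled_var == 0:
        raise ValueError("zero variance in both groups")
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / 3 + 1 / 3))
    return t, two_tailed_p_df4(t)


if __name__ == "__main__":
    # Made-up liquid-level readings (arbitrary units), not data from this work.
    control = [1.0, 1.1, 0.9]
    inoculated = [1.5, 1.6, 1.4]
    t, p = student_t_triplicates(inoculated, control)
    print(f"t = {t:.2f}, p = {p:.4f}")
```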

Candidate Carrageenase Genes in Paenibacillus sp. YYML68
Paenibacillus sp. strain YYML68 efficiently degrades crude carrageenan (Figure S1). When the strain was isolated, we sequenced and annotated its genome to identify genes encoding carrageenan-degrading enzymes (carrageenases). However, we were unable to identify any genes annotated as carrageenases (unpublished data), and the project to isolate carrageenases from the bacterium was put on hiatus. Because CPP-PNA may offer a simple and efficient way to identify candidate carrageenase-encoding genes, strain YYML68 was used as the model bacterium for this work.
To refine the genome and improve the probability of identifying candidate genes, we re-sequenced the genome of strain YYML68 using a combination of ONT and Illumina sequencing. Assembly yielded a single contig of 5,091,020 bp with a GC content of 53.5%. BUSCO analysis identified 444 single-copy orthologous genes, accounting for 98.6% of the 450 genes in the Bacillales dataset, suggesting that the sequenced genome was nearly complete. Gene annotation using RAST yielded 4671 coding sequences (CDS), of which 1022 (21.9%) were classified into 293 subsystems and 1423 (30.5%) were uncategorized, whereas 2226 (47.6%), nearly half of all genes, were annotated as hypothetical proteins. Although no genes in this annotation were designated as carrageenases, we identified two candidate genes annotated as hypothetical proteins, HP60 and HP61, now termed cgiA and cgiB. Their products, CgiA and CgiB, showed 67.3% and 76.8% amino acid sequence similarity, respectively, to the GH82 family ι-carrageenase identified from Paenibacillus sp. HB172198 (QCT00926.1) (Figure S2). The amino acid sequence similarity between CgiA and CgiB was 69.5%. The draft genome of Paenibacillus sp. YYML68 and the gene sequences of cgiA and cgiB have been deposited at DDBJ/EMBL/GenBank under the accession numbers BQYI01000001 (first version), LC730803 and LC730804, respectively.
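The percentages above follow directly from the reported counts. The short Python check below recomputes them; the printed values agree with the text to within 0.1 percentage point, with the last reported digit depending on the rounding convention. The category labels paraphrase the text and are not RAST output field names.

```python
# Recomputes the annotation percentages from the counts reported in the text.
# The category labels are paraphrases of the text, not RAST's own output fields.

RAST_COUNTS = {"classified to subsystems": 1022, "uncategorized": 1423, "hypothetical": 2226}
TOTAL_CDS = 4671
BUSCO_FOUND, BUSCO_TOTAL = 444, 450


def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` in each category; the categories must partition the total."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not sum to the total number of CDS")
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in percentages(RAST_COUNTS, TOTAL_CDS).items():
        print(f"{name:>25}: {pct:.2f}%")
    print(f"{'BUSCO completeness':>25}: {100 * BUSCO_FOUND / BUSCO_TOTAL:.2f}%")
```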
In parallel, we tested the ability of strain YYML68 to degrade ι-, κ- and λ-carrageenan. A significant 1.5-fold relative reduction in culture medium viscosity was observed at 24 h only when strain YYML68 was incubated with ι-carrageenan (Figure 2); no reduction in viscosity was observed with κ- or λ-carrageenan. Together, the preliminary genome analysis and degradation test suggest that strain YYML68 harbors functional ι-carrageenases and that cgiA and cgiB are candidate genes encoding them.
Figure 2. Degradation of ι-, κ- or λ-carrageenan by Paenibacillus sp. strain YYML68. The concentrations of ι-, κ- and λ-carrageenan were 1%, 1% and 2%, respectively. Samples were measured at 0, 8, 16 and 24 h, and degradation was determined by the reduction in medium viscosity compared with samples not inoculated with strain YYML68. All assays were performed in triplicate. Statistical significance (p < 0.05) was determined by t-test; asterisks (****) indicate the level of significance. ns: non-significant.

CPP-PNA Efficiently Suppresses Protein Translation in Strain YYML68
In our attempt to use CPP-PNA as a simple preliminary approach to elucidating protein function, we first evaluated the ability of CPP-PNA to regulate protein translation in strain YYML68. The acpP gene encoding the acyl carrier protein AcpP was identified from the sequenced genome and used as the target; its sequence has been deposited at DDBJ/EMBL/GenBank under the accession number LC730802. Growth inhibition of strain YYML68 correlated with the concentration of the CPP-Pae68acp probe (Figure 3). The negative control grew to an OD660 of approximately 0.08, close to the final OD660 of approximately 0.1 reached by a 2-day culture. In contrast, cultures treated with the CPP-Pae68acp probe showed a gradual, concentration-dependent reduction in growth at probe concentrations as low as 1 µM, and growth was completely inhibited at 24 h at concentrations of 6 µM and above. Growth recovered at 36 h, however, suggesting that the CPP-Pae68acp probe was not toxic to strain YYML68, i.e., that its inhibitory effect was reversible rather than lethal.
Having shown that CPP-PNA may regulate protein translation in strain YYML68, we examined its effects on the ι-carrageenase candidate genes cgiA and cgiB by treating strain YYML68 with the CPP-Pae68cgiA probe, the CPP-Pae68cgiB probe, or both probes simultaneously. In the same carrageenan degradation test as before, cells treated only with CPP-Pae68cgiA showed an approximately 1.5-fold relative reduction in culture medium viscosity, similar to cells treated with neither probe (Figure 4). In contrast, no significant reduction in viscosity was observed when strain YYML68 was treated with CPP-Pae68cgiB alone or with both probes, or in the negative control containing the non-carrageenan-degrading bacterium B. megaterium. To further verify ι-carrageenan degradation in this test, we evaluated the growth inhibition effects of each probe and analyzed the degradation products of each sample by MALDI-TOF MS. Neither probe inhibited the growth of strain YYML68 (Figure S3). MALDI-TOF MS analysis revealed three identical, distinct peaks, with masses matching those calculated for ι-carrageenan oligosaccharides composed of 3,6-anhydro-D-galactose-2-sulfate (DA2S) and D-galactose-4-sulfate (G4S) units, in samples treated with strain YYML68 alone and with strain YYML68 in the presence of the CPP-Pae68cgiA probe (Figure S4).
Based on these results, we conclude that the candidate gene cgiB very likely encodes a functional, extracellularly secreted ι-carrageenase. In contrast, cgiA did not support significant ι-carrageenan degradation when cgiB was suppressed with the CPP-Pae68cgiB probe and was therefore regarded as either not expressed or non-functional.
Figure 4. Functional analysis of the ι-carrageenase candidate genes cgiA and cgiB via CPP-PNA regulation. The concentrations of ι-carrageenan and of each probe were 1% and 8 µM, respectively. Samples were measured at 0, 8, 16 and 24 h, and degradation was determined by the reduction in medium viscosity compared with a non-carrageenan-degrading bacterium, Bacillus megaterium (negative control; NC). All assays were performed in triplicate. Statistical significance (p < 0.05) was determined by t-test; asterisks (****) indicate the level of significance. ns: non-significant.

Carrageenase Gene Comparison
Having shown that CPP-PNA can identify the functional ι-carrageenase encoded by cgiB, we went a step further and investigated the structure of CgiB to determine whether it harbors unique features compared with known ι-carrageenases. We also examined the structure of CgiA for clues as to why it may be non-functional. For tertiary structure comparison, the protein sequences of CgiA and CgiB were submitted to the SWISS-MODEL online 3D modelling server, and the predicted structures of both proteins were most similar to the ι-carrageenase of Alteromonas fortis [29], with sequence coverage of 78% and 71% and sequence identities of 20.38% and 20.82%, respectively. Both predicted models clearly lacked domain A, a region known to be crucial for forming the catalytic tunnel required for ι-carrageenan degradation by ι-carrageenases [30]. Based on the predicted conformations of their β-sheets and α-helices, the two proteins were then aligned and compared at the secondary structure level with reported ι-carrageenases that also lack domain A, including those from M. thermotolerans JAMB-A94 and P. atlantica T6c [27].
The secondary structure comparison was used to identify amino acid residues important for ι-carrageenan degradation (Figure 5). When CgiA and CgiB from strain YYML68 were aligned with the ι-carrageenase from another Paenibacillus strain, HB172198 (isolated from brown agar, Hainan Province, China), listed in the GH82 family of the CAZy database, the three sequences were strikingly similar. The sequences averaged 480 amino acid residues in length with no prominent gaps, shared 296 identical sites (61.9%), and had a high pairwise identity of 71.0%. CgiA and CgiB also showed high structural similarity to the ι-carrageenases of M. thermotolerans JAMB-A94T and P. atlantica T6c, and all catalytic sites related to carrageenan degradation were highly conserved among these proteins, including that from A. fortis.
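Summary statistics such as "identical sites" and "pairwise identity" depend on how gaps are treated. The sketch below uses one common convention, stated here as an assumption: an identical site is a gap-free column in which all sequences agree, and pairwise identity is computed over columns where neither sequence has a gap, then averaged over all pairs. The toy alignment and function names are made up. If the 61.9% figure is computed over alignment columns, 296 identical sites correspond to an alignment of about 478 columns (296/478 ≈ 0.619).

```python
# Hypothetical sketch: identical sites and mean pairwise identity for a multiple alignment.
# Sequences must be equal-length aligned strings with "-" for gaps.
from itertools import combinations


def identical_sites(alignment: list[str]) -> int:
    """Columns where no sequence has a gap and all residues are the same."""
    return sum(1 for column in zip(*alignment) if "-" not in column and len(set(column)) == 1)


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100 * sum(x == y for x, y in compared) / len(compared)


def mean_pairwise_identity(alignment: list[str]) -> float:
    """Average of pairwise_identity over all sequence pairs in the alignment."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy = ["MKTAYIAK", "MKTAHIAK", "MRTAYIA-"]  # made-up aligned fragments
    print(identical_sites(toy), f"{mean_pairwise_identity(toy):.1f}%")  # 5 81.5%
```

Other tools may count gap columns or normalize by sequence length instead, which shifts both figures.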

Discussion
CPPs are short peptides that can act as carriers to deliver biomolecules into diverse prokaryotic and eukaryotic cells. In bacteria, CPPs have been employed to deliver numerous biomolecules, including nucleic acids, small compounds and proteins [31][32][33]. PNAs, on the other hand, are nucleic acid analogues with an artificial peptide backbone of N-(2-aminoethyl)glycine repeats; they bind DNA and RNA with high affinity and resist intracellular enzymes, including nucleases and proteases [34][35][36][37]. Owing to these features, CPP-PNA conjugates can serve as probes that bind specific nucleic acid sequences in live cells, notably mRNA in vivo, and inhibit protein translation. CPP-PNA conjugates are currently employed in the biomedical field, where growth inhibition of pathogenic and antibiotic-resistant bacteria has been reported [38][39][40]. Despite this potential, to our knowledge, CPP-PNA conjugates have not yet been fully exploited.
One major potential limitation in applying CPPs to bacteria is their limited versatility: reports thus far have shown that CPPs can be species-specific [41]. More versatile CPPs applicable to diverse bacterial strains, particularly novel or hard-to-transform strains, are therefore required. Drawing on properties of CPPs identified in our previous work, we applied our optimized CPP to a hard-to-transform bacterium, Paenibacillus sp. strain YYML68. The genus Paenibacillus is known to produce numerous industrially important secondary metabolites, but genetic manipulation has been achieved in only a few strains, mainly by electroporation [42][43][44] or conjugation [45,46]. In this work, we showed that the (DabFF)3Dab CPP efficiently and easily permeated the membrane of Paenibacillus sp. strain YYML68 (Figure S5). Then, by designing a PNA targeting the mRNA of the acyl carrier protein AcpP and conjugating it to our CPP, we showed that CPP-PNA arrested bacterial growth in a concentration-dependent manner within 24 h of application. We also verified that the CPP-PNA probe was not toxic to strain YYML68, as gradual growth recovery was observed at all probe concentrations at 36 h (Figure 3); full recovery of cells treated with 10 µM of the CPP-Pae68acp probe (OD660 equivalent to that of untreated cells) was observed at 96 h. Finally, we employed CPP-PNA as a tool to identify and elucidate the function of uncharacterized carrageenases within strain YYML68 (Figure 4).
Based on our tertiary and secondary structure comparisons of CgiA and CgiB with other ι-carrageenases, the most prominent feature of both proteins was the absence of domain A. Our most interesting finding, however, was the inactivity of CgiA: although CgiA harbored all the conserved catalytic sites considered crucial for ι-carrageenan degradation (Figure 5), it showed no activity when cgiB was suppressed with the CPP-Pae68cgiB probe (Figure 4). We propose two possible explanations for this phenomenon. The first is that CgiA is expressed but not functional. Our comparative analysis of the currently reported Clade C ι-carrageenases (Figure S1 and Figure 5) showed that the two proteins are highly similar in structure and have nearly identical catalytic sites; the difference in activity could therefore reflect other, as-yet-undiscovered catalytic sites. The second is that strain YYML68 switches between expressing and using CgiA and CgiB. Such switching could enable the host to degrade the polysaccharide efficiently while producing different degradation products. Given the high sequence and structural similarity of CgiA and CgiB, strain YYML68 may possess a mechanism that regulates the expression of both enzymes. Because the objective of this work was to demonstrate the applicability of CPP-PNA to in vivo protein functional analysis, these hypotheses were not investigated further.
In this work, we demonstrated the potential and applicability of CPP-PNA as a tool for identifying and elucidating uncharacterized proteins within a hard-to-transform bacterium. The method is simple: CPP-PNA probes designed to target the gene of interest are incubated with the target bacterium, and the effect can be assayed several hours after probe addition without generating or selecting mutant strains. CPP-PNA is also advantageous because gene regulation is concentration-dependent, allowing it to be used to elucidate and evaluate the function of enzymes in metabolic pathways whose complete knockout may cause cell death.
Although CPP-PNA conjugates are clearly applicable to protein functional analysis, several improvements remain to be considered. These include possible off-target effects arising from the short length of the PNA sequences used in this work (10-mers). We used 10-mer PNAs because this length has been shown to permeate cells at high efficiency and to be sufficient for gene regulation [47,48]. In analogous sequence-specific binding experiments with DNA probes, longer probes (approximately 20–30-mers) are preferred to increase binding specificity, but PNAs longer than 15-mers are currently technically challenging to synthesize [49]. Using standard solid-phase peptide synthesis protocols, we have synthesized PNA probes up to 15-mers, but in our experience, and as previously reported [47], CPP conjugates carrying PNAs longer than 12-mers permeate live cells much less efficiently. Methods for synthesizing longer PNA sequences have been introduced [50,51], but applying such probes will also require CPPs capable of delivering longer PNAs into bacteria. Moreover, because PNA–RNA duplexes are more thermally stable and bind with higher affinity than DNA–RNA duplexes, longer PNA probes may also bind partially but stably to non-target mRNAs, which could increase off-target effects. Therefore, the PNA length, its binding properties and their relationship to translational regulation should be further evaluated and optimized.
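The specificity concern for short probes can be put in rough numbers. Under a deliberately simple model, assuming independent, equiprobable bases and treating every position on one strand of the 5,091,020 bp YYML68 genome as a potential target site, the expected number of chance matches to a k-mer is (L − k + 1)·N_d/4^k, where N_d = Σ_{i≤d} C(k,i)·3^i counts the sequences within d mismatches. The sketch below computes this; it ignores the 53.5% GC content, the smaller size of the transcriptome and binding thermodynamics, so its output is an order-of-magnitude guide only, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of chance target matches for a k-mer PNA.
# Assumptions: independent, equiprobable bases and one genome strand as the search space.
from math import comb

GENOME_BP = 5_091_020  # assembled YYML68 genome size reported above


def expected_matches(k: int, max_mismatches: int = 0, length: int = GENOME_BP) -> float:
    """Expected chance occurrences of a fixed k-mer (within max_mismatches) on one strand."""
    neighbours = sum(comb(k, i) * 3 ** i for i in range(max_mismatches + 1))
    windows = length - k + 1
    return windows * neighbours / 4 ** k


if __name__ == "__main__":
    for k in (10, 12, 15, 20):
        print(f"k = {k:2d}: exact {expected_matches(k):.3g}, <=1 mismatch {expected_matches(k, 1):.3g}")
```

For a 10-mer this gives roughly five exact chance matches and about 150 within one mismatch, falling below one exact match at 12-mers. This supports the specificity argument for longer probes but does not model the stable partial binding of longer PNAs noted above.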
Another potential challenge in using CPP-PNA conjugates to characterize novel proteins in vivo is that, as with small interfering RNA (siRNA), the effects of CPP-PNA inhibition appear only once the intracellular abundance of the target protein has been lowered or depleted. This contrasts with knockout mutants, in which expression of the target gene is abolished and its function can be evaluated immediately. The time required to deplete the target protein depends on its expression and ranges from several hours to days, and depletion is in most cases best monitored with specific antibodies via western blotting or with assays that directly measure the activity of the target protein. Thus, CPP-PNA conjugates are currently most applicable for evaluating or elucidating the function of proteins for which antibodies are available or assays have been developed.
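Why depletion can take "several hours to days" can be seen from a minimal model, stated here as an assumption: translation of the target stops completely at t = 0, and the per-cell protein level then decays by first-order degradation (half-life t_half) plus dilution by growth (doubling time t_d), P(t) = P0·exp(−(ln2/t_half + ln2/t_d)·t). The hypothetical helper below returns the time needed to reach a given remaining fraction. Real knockdown is incomplete, so these times are lower bounds, and the model tracks per-cell protein only.

```python
# Minimal first-order model of target-protein depletion after translation is blocked.
# Hypothetical assumptions: complete translational block at t = 0, first-order
# degradation, and dilution by exponential growth; not measured in this work.
import math


def time_to_fraction(remaining: float, doubling_h: float, half_life_h: float = math.inf) -> float:
    """Hours until per-cell target protein falls to `remaining` (0 < remaining < 1) of its start."""
    if not 0 < remaining < 1:
        raise ValueError("remaining must lie strictly between 0 and 1")
    rate = math.log(2) / doubling_h + math.log(2) / half_life_h  # ln2/inf is 0.0 for a stable protein
    return math.log(1 / remaining) / rate


if __name__ == "__main__":
    for doubling in (1.0, 2.0, 12.0):
        print(f"doubling {doubling:>4} h: {time_to_fraction(0.1, doubling):.1f} h to reach 10%")
```

For a stable protein, reaching 10% of the initial level takes about 6.6 h at a 2 h doubling time but about 40 h at a 12 h doubling time.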
Despite some of the setbacks highlighted above, we believe that CPP-PNA conjugates have the potential to serve as a new approach to elucidate and characterize proteins in vivo within the native bacterium.