Development of Methods Derived from Iodine-Induced Specific Cleavage for Identification and Quantitation of DNA Phosphorothioate Modifications

DNA phosphorothioate (PT) modification is a novel modification that occurs on the DNA backbone, which refers to a non-bridging phosphate oxygen replaced by sulfur. This exclusive DNA modification widely distributes in bacteria but has not been found in eukaryotes to date. PT modification renders DNA nuclease tolerance and serves as a constitute element of bacterial restriction–modification (R–M) defensive system and more biological functions are awaiting exploration. Identification and quantification of the bacterial PT modifications are thus critical to better understanding their biological functions. This work describes three detailed methods derived from iodine-induced specific cleavage-an iodine-induced cleavage assay (ICA), a deep sequencing of iodine-induced cleavage at PT site (ICDS) and an iodine-induced cleavage PT sequencing (PT-IC-Seq)-for the investigation of PT modifications. Using these approaches, we have identified the presence of PT modifications and quantized the frequency of PT modifications in bacteria. These characterizations contributed to the high-resolution genomic mapping of PT modifications, in which the distribution of PT modification sites on the genome was marked accurately and the frequency of the specific modified sites was reliably obtained. Here, we provide time-saving and less labor-consuming methods for both of qualitative and quantitative analysis of genomic PT modifications. The application of these methodologies will offer great potential for better understanding the biology of the PT modifications and open the door to future further systematical study.


Introduction
Naturally occurring epigenetic modifications of DNA have been found prevalent in organisms from all domains of life [1]. Usually, those modifications involve a variety of chemical groups (such as methyl group, amino acids, polyamines, monosaccharides, and disaccharides) appended to the nucleobase portion of a nucleotide. These modifications do not alter the specificity of base pairing but and quantity in different bacteria, even in an individual cell of a population, because of the heterogeneity of PT modification [27,31].
Here, we describe a series of methods based on PT-specific iodine-induced cleavage and high-throughput next-generation sequencing technologies: ICA, ICDS, and PT-IC-Seq [27,31] (See Figure 1 for a schematic overview).
Biomolecules 2020, 10, x 3 of 15 for PT modification discovery [27][28][29][30]. As PT modification is a partial and highly dynamic modification, more techniques should be developed to locate the PT modification sites and uncover the diverse sequence selectivity and quantity in different bacteria, even in an individual cell of a population, because of the heterogeneity of PT modification [27,31].
Here, we describe a series of methods based on PT-specific iodine-induced cleavage and highthroughput next-generation sequencing technologies: ICA, ICDS, and PT-IC-Seq [27,31] (See Figure  1 for a schematic overview). Among them, the ICA, a method of rapidly detecting PT modification status in the genome, has been developed by using an electrophoresis pattern of the iodine-cleaved gDNA. The DNA samples were cleaved by iodine solution under an optimized condition, and the presence of PT modifications Among them, the ICA, a method of rapidly detecting PT modification status in the genome, has been developed by using an electrophoresis pattern of the iodine-cleaved gDNA. The DNA samples were cleaved by iodine solution under an optimized condition, and the presence of PT modifications was first detected by electrophoresis in agarose gel, and then confirmed by LC-MS/MS. The ICDS technology combining high-throughput Illumina sequencing with iodine-specific cleavage maps the locating of PT-modified sites in the genome. The PT-modified sites were cleaved specifically by iodine, and the resultant cutting sites were then marked with special labels. The tag location sites could be detected by the two-generation sequencing, which in turn fulfilled the determination of the distribution and localization of genomic PT modification sites. The PT-IC-Seq technology was an updated derivative of ICDS with higher throughput and resolutions. For the quantitative determination of PT modification percentage at each site, the DNA samples were first cleaved by iodine at selected phosphorothioate linkages, followed by tag ligations and high-throughput sequencing. No enrichment process was required for mapping sequenced reads to the reference genome. Taken together, these methods could be applied to investigate and quantitatively analyze PT modifications at whole genomic landscapes, and will expand our knowledge of PT modifications in the future. Harvest cells by centrifugation at 10,000 rpm for 2 min.

3.
Aspirate and discard the media.

4.
Extract gDNA using the Bacterial DNA Kit according to the manufacturer's instructions.

5.
Store eluted DNA at −80 • C until ready for use in following experiments. 6.

2.
Add~1 µg gDNA with 1 × ICA Reaction Buffer to clean microcentrifuge tube. Mix well, and incubate at 65 • C for 10 min and then slow cooled (0.1 • C/s) to 4 • C using a thermal cycler.
At the same time, analyze the above samples using the LC-MS/MS method (see Quantification of PT modifications in DNA by LC-MS/MS section). 5.
[Note: Always prepare a fresh iodine for every use, and confirm slow cooling process. Reduce voltage and increase gel running times possible during electrophoresis. [Note: The total number of A is equal to the number of T, and similarly the number of C is equal to the number of G. Thus, we could choose two mono nucleosides to draw the standard curve.] 3.
Analyze samples using the HPLC system with an Agilent ZORBAX SB-C18 column (3. Analyze the results: with the concentration of mono nucleosides standards (ng/µL) as the abscissa (X) and the peak area values of HPLC as the ordinate (Y), the standard curve was drawn. 6.

2.
Add~100 fmol/µL the Sp configuration of PT-modified dinucleotides to each standard used as internal reference. With the concentration of Rp configuration of the PT-modified dinucleotide (fmol/µL) as the abscissa (X) and the ratio of the standard peak area to internal reference peak area values measured by LC-MS/MS as the ordinate (Y), the standard curve was drawn. 5.
[ Add~20 µg purified DNA, 2 U of nuclease P1 and Hydrolysis Buffer to a clean microcentrifuge tube. Mix well, and incubate at 50 • C for 2 h using a thermal cycler.

3.
Then mix in 100 mM Tris-HCl, pH 8.0 and 15 U of FastAP Alkaline Phosphatase, and incubate at 37 • C for 2 h for the totally dephosphorylation.

4.
Remove the enzymes by ultrafiltration at 10,000 rpm for 10 min. 5.
[Note: This step is critical for removal of enzymes that may interfere with follow-up experiments.] 6.
Add~200 µL deionized water to ultrafiltration tube and repeat step 4 until all the sample has been transferred to the collection tube. 7.
Collect the filtrate in a clean microcentrifuge tube and dry the sample. 8.
[Note: For subsequent LC-MS/MS analysis, the volume of sample is as small as possible. Concentrated sample using the Freeze Dryer may yield better results.] 9.
Resuspend and elute the sample from step 6 in 40 µL deionized water. 10. Analyze sample quantitatively by LC-MS/MS system with an electrospray ionization source in positive mode as described previously [2], and the parameters as follow: 11. Gas flow, 10 L/min; nebulizer pressure, 30 psi; drying gas temperature, 325 • C; and capillary voltage, 3100 V. Multiple reaction monitoring mode was used for detection of product ions derived from the precursor ions, with all instrument parameters optimized for maximal sensitivity (retention time in min, precursor ion m/z, product ion m/z, fragmentor voltage, collision energy): Remove residual iodine and salts using MicroSpin G-25 columns. 3.
[Note: This step is critical for removal of residual iodine and salts that may interfere with follow-up experiments.] 4.
Add 10 U of FastAP Alkaline Phosphatase to the sample at 37 • C for 1h for remove terminal phosphate groups.

5.
Inactivate the enzyme by heating at 75 • C for 5 min and slow cooled (0.1 • C/s) to 4 • C to assure proper complementary re-annealing. 6.
Clean up the sample using the Cycle Pure Kit and elute it in 30 µL MilliQ water. 7.
Blunt end of the break sites using the Quick BluntingTM Kit at room temperature for 30 min. 8.
Inactivate the enzyme by heating at 75 • C for 10 min and slow cooled as before. 9.
Clean up and elute the sample as before. 10. Then mix in 1x NEB Buffer 2, 0.1 mM dATP and 15 U of Klenow Fragment (3 →5 exo-) and incubate at 37 • C for 30 min for the 3 -deoxyadenylation (namely A-tailing). 11. Inactivate the enzyme by heating at 75 • C for 20 min and slow cooled as before. 12. Clean up and elute the sample as before. 13 tag and done quality control as follows: • Clipping the adapter sequences.

•
Removing non-A, G, C, T bases of the 5 end.

•
Removing reads with more than 10% of "N" calls.

•
Filtering small fragments with less than 25 bp after clipping the adapter sequences and quality control, and then aligned to reference genomes by Burrows-Wheeler Aligner (BWA) and the position-wise coverage values were calculated by using the custom python script. The GAAC/GTTC sites will be defined as PT-modified sites if their reads above 50 and ended at this site. Meanwhile, 10 non-GAAC/GTTC sites were randomly selected as control.
[Note: This step is critical for removal of residual iodine and salts that may interfere with follow-up experiments.] 4.
Shear the DNA samples to fragment lengths of 150-350 bp by sonication. 5.
[Note: To fragment the DNA samples to a size range of 150-350 bp, using a probe sonicator as before.] 6.
According to instructions provided with the NEBNext DNA Library Prep Reagent Set for Illumina, the resulting fragments were end-repaired, adenylated at the 3 ends and ligated to Illumina paired-end adaptors. 7.
Amplify DNA fragments by PCR for 15 cycles using standard Illumina adapter-specific primers. 8.
Sequence the libraries constructed from step 5 on the Illumina HiSeq X Ten platform. 9.
[Note: Agilent Technologies 2100 Bioanalyzer is used to confirm successful library generation and Life Technologies Qubit 3.0 Fluorometer for quantification.] 10. Analyze the PT-IC-Seq data, and the details as follow: 11. All sequencing reads were trimmed for adaptor and low-quality bases and aligned to reference genomes by BWA for creating Sequence Alignment/Map (SAM) files. Then, SAM files were converted to Binary Alignment/Map (BAM) files that were piled up using samtools and the results were visually performed using the Integrated Genomics Viewer 2.3 software (IGV; Broad Institute, Cambridge, MA, USA). Meanwhile, the position-wise reads number obtained from fragment terminals and across the same site were calculated respectively by using the custom python script.
The PT modification frequency of each GAAC/GTTC sites were calculated by the number of reads ended at this site divided by all of the number of reads ended and crossed the same sites.
To eliminate the false-ended reads arising from random shearing the DNA, 10 non-GAAC/GTTC sites used as the control and their average PT modification frequency were calculated and used them as thresholds. The modification ratios at each PT sites in the whole genome will be calculated and analyzed if the modification frequency of those sites were above the set thresholds.

ICA Assay
For the feasibility assessment of ICA method, the gDNA isolated from E. coli B7A containing PT-modified genes, was tested as an example. The ICA assay showed that treatment of the PT-modified gDNA with iodine resulted in smaller fragment distribution compared with the control treated without iodine, whereas the gDNA from E. coli DH10B lacking PT gene cluster was not cleaved (Figure 2A). The results demonstrated that iodine could cleave PT-modified DNA with high efficiency. This finding was consistent with the previous cleavage studies [20][21][22][23], so this assay is feasible to detect the presence of PT modifications in the genome. To consolidate the above results, we re-analyzed PT modifications in gDNA of E. coli B7A by LC-MS/MS. As shown in Figure 2B,C, the PT-containing dinucleotides (GpsA or GpsT) after hydrolysis of the gDNA by nucleases is consistent with the retention time of the standard that possessed PT modifications with Rp configuration. These results confirmed that the ICA approach could rapidly detect PT modification status in the genome.

Establishment of Standard Curve
According to above the method, the standard curves of different mono nucleosides standards are shown in Figure S1A for T and Figure S1B for C, and the standard curves of different PT-modified dinucleotide standards are shown in Figure S1C (GpsA) and Figure S1D (GpsT).

Quantification of PT Modifications in DNA by LC-MS/MS
According to above the method, the dinucleotides of gDNA isolated from E. coli B7A were The results demonstrated that iodine could cleave PT-modified DNA with high efficiency. This finding was consistent with the previous cleavage studies [20][21][22][23], so this assay is feasible to detect the presence of PT modifications in the genome. To consolidate the above results, we re-analyzed PT modifications in gDNA of E. coli B7A by LC-MS/MS. As shown in Figure 2B,C, the PT-containing dinucleotides (GpsA or GpsT) after hydrolysis of the gDNA by nucleases is consistent with the retention time of the standard that possessed PT modifications with Rp configuration. These results confirmed that the ICA approach could rapidly detect PT modification status in the genome.

Establishment of Standard Curve
According to above the method, the standard curves of different mono nucleosides standards are shown in Figure S1A for T and Figure S1B for C, and the standard curves of different PT-modified dinucleotide standards are shown in Figure S1C (GpsA) and Figure S1D (GpsT).

Quantification of PT Modifications in DNA by LC-MS/MS
According to above the method, the dinucleotides of gDNA isolated from E. coli B7A were quantitively analyzed as an example. By using synthetic mono nucleosides and PT-modified dinucleotide standards, the amount of PT-containing dinucleotides after hydrolysis of the gDNA by nucleases was analyzed by LC-MS/MS ( Figure 3A,B) and HPLC ( Figure 3C). We were thus able to detect PT modification levels occurring in GpsA and GpsT contexts at 325 ± 8 and 361 ± 11 PTs per 10 6 nt, respectively in 20 μg gDNA ( Figure 3D), which is consistent with previous research [3].

ICDS Assay
The ICDS method was used to characterize the PT-modified on the gDNA isolated from E. coli B7A, which was tested as an example. By aligning the output reads to the reference genome sequence and visual analysis using the IGV, the partial data were visually depicted in Figure 4A, and the GAAC/GTTC sites in E. coli B7A genome were defined as PT-modified sites, with their reads above 50 with the unique tag and ending at this site (such as the red triangle mark in Figure 4A). We were thus able to detect PT modification levels occurring in GpsA and GpsT contexts at 325 ± 8 and 361 ± 11 PTs per 10 6 nt, respectively in 20 µg gDNA ( Figure 3D), which is consistent with previous research [3].

ICDS Assay
The ICDS method was used to characterize the PT-modified on the gDNA isolated from E. coli B7A, which was tested as an example. By aligning the output reads to the reference genome sequence and visual analysis using the IGV, the partial data were visually depicted in Figure 4A, and the GAAC/GTTC sites in E. coli B7A genome were defined as PT-modified sites, with their reads above 50 with the unique tag and ending at this site (such as the red triangle mark in Figure 4A). The distribution and localization of E. coli B7A genome PT modification sites were determined and shown in Figure 4B. These results demonstrated that the method was feasible for discerning PT sites in the whole genome and can draw the high-resolution genomic maps of PT modifications. However, this approach was not applied in single-stranded modifications, such as Vibrio cyclitrophicus FF75 [27].

PT-IC-Seq Assay
As the PT modifications have been enriched during the PCR amplification step, the ICDS method can only be used for determining PT sites, not for quantification. Taking the limits of ICDS method into consideration, a novel approach named PT-IC-Seq was developed. The distribution and localization of E. coli B7A genome PT modification sites were determined and shown in Figure 4B. These results demonstrated that the method was feasible for discerning PT sites in the whole genome and can draw the high-resolution genomic maps of PT modifications. However, this approach was not applied in single-stranded modifications, such as Vibrio cyclitrophicus FF75 [27].

PT-IC-Seq Assay
As the PT modifications have been enriched during the PCR amplification step, the ICDS method can only be used for determining PT sites, not for quantification. Taking the limits of ICDS method into consideration, a novel approach named PT-IC-Seq was developed.
PT-IC-Seq method was applied to quantify the PT modification on the gDNA isolated from Salmonella enterica serovar Cerro 87, which was tested as an example. By aligning the output reads to the reference genome sequence and visual analysis using the IGV, the partial data were visually depicted in Figure 4C. There are two main forms at the GAAC/GTTC site. One is that the intact GAAC/GTTC motifs appeared at the internal of sequencing reads, which originated from unmodified GAAC/GTTC sites, and the other is initiated with AAC or TTC, regarded as PT-modified sites, which resulted from the cleavage of the DNA strand at the PT-modified GpsAAC/GpsTTC site by iodine. The PT modification level at each modified site throughout the Salmonella enterica serovar Cerro 87 genome was calculated (Supplementary Table S2) and shown in the PT modification map ( Figure 4D). These results demonstrated the method was feasible for quantitatively characterizing the genomic landscape of PT modifications and determining PT modification frequency at each modified site.

Discussion
The nucleic acids contain diverse chemical modifications, which exerts important influence in a variety of life processes [32][33][34]. Among these modifications, DNA methylation is the best known and has essential roles in cellular processes, such as genome regulation, development, and disease [35][36][37]. The recently discovered PT modifications occurring on the DNA backbone was a novel DNA modification, in which the non-bridging phosphate oxygen is replaced by sulfur. This exclusive DNA modification is widespread in bacteria within a sequence selective and Rp stereo-specific manner, but not found in mammal cells, and their physiological functions remain poorly understood. Thus, determination and quantitative analysis of the PT modifications are essential for exploring and understanding their biological functions. We provide here a comprehensive description for the entire process of a series of approaches for analyzing PT modifications in the genome so that it can be easily adopted by researchers.
Since the initial report on PT modification detection using the ICA assay in 2014 [27], the technique has been widely used for PT modification discovery. An important advantage of this method is its simplicity, because the detection of PT modifications can be completed in a relatively short time and at low cost. The main goal of the ICA method is to rapidly detect the presence of PT modifications in the genome. The treatment of iodine introduced a strand break at the PT-modified site, providing the basis for new technology development. During the development of the method, we found that the quality of iodine was the key to successful experimentation according to previous results. From experience, a fresh iodine solution should be prepared every time. Meanwhile, the reduction of voltage and extending gel running time during electrophoresis process of ICA assay may help to yield better results. Importantly, we also used the LC-MS/MS method to further confirm the results after the ICA assay, which ensure the accuracy of the results.
To further understand the characteristics of PT modifications in the context of the genomic landscape, we developed two highly novel approaches: ICDS and PT-IC-Seq, which integrate the integration of PT-specific iodine-induced cleavage with high-throughput next-generation sequencing technology. As illustrated in Figure 1, ICDS was developed to map PT locations in bacterial genomes based on the adaptation of high-throughput next-generation sequencing technology. In the ICDS approach, iodine reagent was introduced to cleave DNA at PT modification sites, and then ligated to an adaptor with a specific index sequence for enriching the DNA fragments with PT modifications by PCR amplification. Since the PCR amplification step will result in enriched amplicons with PT modifications, ICDS sequencing approach can only be used for identifying PT sites but not for quantification. PT-IC-seq was then developed to quantitatively determine the PT modification percentage at each site without enrichment process. The PT-modified motifs would be cleaved and should be presented as the reads ends in the sequencing output. The unmodified motifs would not be cleaved and should be present in the internal locations of DNA fragments. The frequency of PT modification can be determined by the ratio of a sequence reads at the end versus internal, so the modification was quantified authentically. However, ICDS sequencing could not do this. Moreover, the sites with lower modification frequency missing in ICDS could also be detected by PT-IC-Seq with high sequencing depth. Compared with the conventional methods such as Dnd phenotype electrophoresis analysis and LC-MS/MS, ICDS and PT-IC-Seq technology were convenient and efficient approaches to characterize PT modification in the context of the genome-wide scale quantitatively and accurately. Unfortunately, neither the ICDS nor the PT-IC-Seq method can be used to analyze strains possessing single-stranded PT modifications, such as Vibrio cyclitrophicus FF75. In fact, when the PT modification occurs on a single strand, the treatment of iodine could not introduce a double-strand break at modified sites. Therefore, ICDS and PT-IC-Seq methods are bistranded PT modification-specific methods. Even so, this limitation may not impact the application for PT modifications research, because almost all reported PT modifications have occurred on the double-strand of DNA to date, except for a few bacterial such as the Vibrio cyclitrophicus FF75. However, these methods set important stages for new derivative methods, such as iodine-induced cleavage quantitative real-time PCR (IC-qPCR), which integrated the iodine-induced cleavage and the TaqMan real-time PCR, and was used to assess the frequency of PT modification at a given site in bacterial genome [38]. Other methods may be developed in the future for the study of PT modification.

Conclusions
In summary, we describe a series of methods coupling PT-specific iodine-induced cleavage with high-throughput next-generation sequencing technologies, to identify and quantify PT modifications in the genome. Using the ICA method, we have achieved rapid detection of PT modification status in the genome. Moreover, to characterize the PT modification on whole genomic landscapes, ICDS technology was developed. We have successfully applied it to identify features of this epigenetic mark at any one of genomic positions. Based on the success, a high-resolution map of locations for PT-modified sites has been achieved. Furthermore, the PT-IC-Seq technology was developed on the basis of the ICDS method, which was able to quantitatively analyze the frequency of specific PT-modified sites on the genome. Overall, the protocols not only pave a path to a better understanding of the biology of the PT modification, but also serve as a useful technique suitable for the investigation of PT modification related studies.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-273X/10/11/1491/s1, Table S1: ICDS sequencing analysis of phosphorothioate modifications throughout the E. coli B7A genome, Table S2: PT-IC-Seq sequencing analysis of phosphorothioate modifications throughout the Salmonella enterica serovar Cerro 87 genome, Figure S1. The establishment of standard curve. The standard curve of (A) T and (B) C, and standard curve of (C) GpsA and (D) GpsT.