Dynamics of Specific cfDNA Fragments in Plasma of Full Marathon Participants

Plasma cell-free DNA (cfDNA) is frequently analyzed using liquid biopsy to investigate cancer markers. Accordingly, we hypothesized this concept could be applied to the field of exercise physiology. Here, we aimed to identify specific cfDNA (spcfDNA) sequences in the plasma of non-treated human participants using next generation sequencing (NGS) and to clearly define the dynamics regarding the amounts of spcfDNA-fragments upon extreme exercise, such as running a full marathon. NGS analysis was performed using cfDNA of pooled plasma collected from non-treated participants. We confirmed the TaqMan-qPCR assay had a high sensitivity and found the spcfDNA sequence abundance was 16,600-fold higher than a normal genomic region. We then used the TaqMan-qPCR assay to investigate the dynamics of the levels of spcfDNA-fragments upon running a full marathon. Quantities of the spcfDNA fragments were significantly increased post marathon. Furthermore, the amounts of spcfDNA fragments strongly correlated with the numbers of white blood cells and plasma myoglobin concentrations. These results suggest the spcfDNA fragments identified in this study were highly sensitive response markers to extreme physical stress. The findings of this study may provide new insights into exercise physiology and genome biology on the human.


Introduction
In recent years, many studies using plasma cell-free DNA (cfDNA) have been conducted in cancer research and are associated with the concept of liquid biopsy being a minimally invasive method. These studies are based on the physiological phenomenon by which mutated DNA fragments of dead cancer cells leak into the blood and remain in the plasma. Therefore, these fragments can be detected as biomarkers of cancer outbreak and progression in patients with cancer [1][2][3].
For example, in a study of non-small cell lung cancer including a large number of patients worldwide, it was found that quantifying the abundance of circulating tumor DNA (ctDNA) present in plasma cfDNA enables the evaluation of early diagnosis, efficacy of molecular targeted drugs, and drug resistance [3]. Conventionally, the diagnosis of cancer and evaluation of malignant degeneration have been performed using tissue biopsy. However, in recent years, plasma cfDNA has also been used and allows cancer parameters to be evaluated by employing minimally invasive methods without the need of tissue biopsy. Therefore, this method is considered a promising approach for the medical examination of cancers in the future [4].
Based on the findings of the above-mentioned cancer studies, we conceived the possibility that plasma cfDNA could be applied in the field of exercise physiology, as it is well known that excessive oxidative stress occurs during severe exercise [5,6], which induces cell death [7][8][9]. Accordingly, we inferred that oxidative stress caused by extreme exercise would induce cell death in the tissues of an individual, which in turn would enable the leakage of genomic DNA into the blood. In support of this speculation, previous studies have reported that severe exercise increases the absolute amounts of plasma cfDNA, suggesting that cfDNA may become a marker of physical stress [10][11][12]. However, the previous studies are limited in the fact that they only quantified the absolute amount of cfDNA, and there are no reports on sequence information regarding the cfDNA. Genomic DNA exists as a nucleosome with various proteins being bound to it. Therefore, it can be inferred that depending on the binding site of the protein, there are likely DNA regions that are less susceptible to degradation by deoxyribonuclease (DNase). Therefore, it is important to clarify whether specific sequences tend to persist as a result of extreme exercise.
In the current study, we aimed to identify specific cfDNA (spcfDNA) sequences that may tend to remain in plasma by evaluating samples from non-treated individuals using next generation sequencing (NGS). We also aimed to clarify the dynamics regarding the amounts of scfDNA-fragments upon extreme exercise by evaluating runners before and at various time points after completing a full marathon. By clarifying these parameters, we expect to obtain new insights into the field of exercise physiology and genome biology.

Ethical approval and study overview
This study was approved by the Ethical Committee of the Faculty of Medicine at the University of Tsukuba in accordance with the Declaration of Helsinki (approval number: 274). Prior to performing the experiments, all participants received an explanation and documents describing the purpose of the study, its design details, and potential safety issues, and each provided informed consent. An overview of the experimental protocol is shown in Figure 1. The participants enrolled in the current study and their analyzed blood samples were the same as those reported in a previously published online article as our previous study [13]. However, the current study presents additional new findings that are not included in the previous article, particularly those pertaining to the NGS analysis.

Study participants
Twenty-six healthy males who performed aerobic exercise at least twice a week were recruited through a public notice. The enrolled participants planned to participate in the 38th Tsukuba Marathon, which is a spot event of a full marathon in Tsukuba City, Ibaraki Prefecture, Japan. The average age, height, and body weight (± standard deviation) of the participants were 25.2 (± 7.3) years, 172.0 (± 5.3) cm, and 64.9 (± 6.9) Kg, respectively. The participants were instructed to not consume alcohol, get a sufficient amount of sleep, and avoid binge eating prior to the full marathon. On the day of the full marathon, the participants freely performed warm-up exercises and drank water.

Blood sample collection
The conditions of the outside air on the day of the full marathon were a temperature of 12.7 °C with a relative humidity level of 60.6%. Blood samples of the participants were collected into EDTA blood collection tubes at four time points, immediately before the marathon (Pre), immediately after the marathon (Post), two hours post marathon (2 h), and 1 day post marathon (1 d). The participants were instructed to drink only water between the Post and 2 h collection points. The collected blood samples were centrifuged at 3,000 rpm for 15 min at 4°C. Aliquots of the plasma were then dispensed into 1.5 ml microtubes and stored at −80 °C until further analysis.

Measurement of general stress markers in the blood samples
Analysis of the blood samples was outsourced to a local clinical laboratory (Tsukuba i-Laboratory, Tsukuba, Ibaraki, Japan). The blood samples were evaluated for the number of white blood cells (WBCs) and concentrations of plasma myoglobin (MG) and creatine kinase (CK).

Extraction of cfDNA from pooled plasma samples
A total of 90 μL of individual plasma samples were pooled at each time point and the pooled plasma (2.07 mL) processed. Briefly, cfDNA in the pooled plasma samples was extracted using a NucleoSnap cfDNA Kit (Takara Bio, Shiga, Japan), according to the manufacturer's instructions. Concentrations and size distributions of the cfDNA to perfume pre-preparation for subsequent analyses were measured using an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and an Agilent High Sensitivity DNA Kit (Agilent Technologies), according to the manufacturer's instructions. The extracted cfDNA was stored at −20 °C until further analysis.

Extraction of cfDNA form individual plasma samples
Using 200 μL of individual plasma samples at each time point, cfDNA was extracted using a NucleoSpin cfDNA XS Kit (Takara Bio), according to the manufacturer's instructions. The final elution volume was 30 μL. Because the plasma cfDNA concentrations were very low and the concentrations could not be measured using a spectrophotometer, we chose to use undiluted plasma-cfDNA samples for subsequent analysis. The cfDNA was stored at −20 °C until further analysis.

Library preparations for NGS
Library for NGS was prepared using 3 ng of pooled plasma cfDNA from time point Pre (non-treated) and a SMARTer ThruPLEX Plasma-seq Kit (Takara Bio), according to the manufacturer's instructions. The concentrations and fragment sizes of the libraries were measured using an Agilent High Sensitivity DNA Kit, according to the manufacturer's instructions. The libraries were stored at −20 °C until further analysis.

NGS analysis
The plasma cfDNA libraries were pooled and the concentrations adjusted to 2 nM. The pooled libraries were then diluted to 1.8 pM for the denaturation step. NGS was performed using a NextSeq 500 System (Illumina, San Diego, CA, USA) and NextSeq 500/550 v2.5 (75 Cycles) Kits (Illumina). The sequencing conditions were paired-end reads of 36 bases. After the sequencing run, a quality score over 30 was confirmed for 89.63% of all reads, indicating the success of the run. The read number was 205 million paired-end reads.

Bioinformatics analyses
Bioinformatics analyses were performed using CLC Genomics Workbench 20.0.3 software (QIAGEN, Hilden, Germany). FastQ files obtained from NGS were imported to the software and poor-quality leads were trimmed or excluded using the program "Trim Reads" on the default settings. The trimmed reads were subjected to analysis with the "Map Read to Reference" program available on Genome Reference Consortium Human Build 37 (GRCh37; hg19). To identify plasma spcfDNA sequences, the "Transcription Factor ChIP-Seq" program as a peak call using the default settings was alternatively performed for the mapping data. We also evaluated the mapping data to visualize the genome regions mapped to spcfDNA sequences and normal sequences (glyceraldehyde 3-phosphate dehydrogenase; GAPDH cfDNA). A pre-sample BED file of pre-sample peaks was exported from CLC Genomics Workbench software and added GAPDH region (chr12:6,645,100-6,645,500). To quantify the alignments from the BAM file with the overlap region in a BED file, a bedtools multicov function (version 2.30.0) was performed as a default setting using a pre-sample BED file against a position-sorted and indexed BAM file of the pre-sample using samtools (version 1.7). The values were expressed as transcripts per million (TPM) in python (version 3.8.5).

TaqMan-qPCR assay
The TaqMan-qPCR primers and probe for the spcfDNA sequences (spcfDNA-1) identified by bioinformatics analysis and those for the genomic GAPDH (gGAPDH) region were designed using the Primer-BLAST (National Library of Medicine, Bethesda, MD, USA) web tool. The primers and probe as a double quencher system were synthesized by Integrated DNA Technologies (Coralville, IA, USA) and the sequences are shown in Table 1. The 1st TaqMan-qPCR assay was performed as duplicate measurements to quantify the spcfDNA-1 and GAPDH fragments present in the individual plasma-cfDNAs at the Pre time point (non-treated) using PrimeTime Gene Expression Master Mix (Integrated DNA Technologies) with the PCR primers and TaqMan probe on a QuantStudio 5 Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA). The template in a volume 2 μL, 200 nM of each primer, and 100 nM probe were included in a total reaction volume of 10 μL per well. Negative-control wells were also prepared by using distilled water (DW) in the assays instead of template and no amplification was confirmed. The threshold cycle (CT) values obtained were converted to relative quantification values using the 2 −ΔΔCt method. The spcfDNA-1 fragments were also quantified at each time point on individual plasma-cfDNAs as absolute quantification using same TaqMan-qPCR assay. Genomic DNA of the human embryonic fibroblast cell line JCRB 1006.7 (JCRB Cell Bank; original developers: Kouchi and Namba) was used to construct the standard curve for the absolute quantification. The R 2 of the standard curves was greater than 0.99.

Statistics
All data without the date on bioinformatics analyses (section 2.9.) were statistically analyzed using GraphPad Prism version 7.04 software (GraphPad, San Diego, CA, USA). To evaluate the values of each group, we first performed the Shapiro-Wilk normality test to verify normality of the distributions. We then performed non-parametric testing between all the groups. To test for differences between two groups, Mann-Whitney U test was performed. For testing among the four groups, Kruskal-Wallis H tests (one-way ANOVA of ranks) were performed followed by a two-stage Benjamini, Krieger, and Yekutieli False Discovery Rate (FDR) procedure, which was ran as a post hoc test with a defined pre-value as a control. For correlation analysis, we first converted the values for spcfDNA-1, WBC, MG, and CK to a common logarithm as log10. The correlation analyses were then performed between spcfDNA-1 and the WBC, MG, and CK values for all time points. Statistical significance was set at p < 0.05. In the graphs, the y-axes are displayed on normal or logarithmic scales.

Confirmation of cfDNA concentrations and fragment sizes in pooled plasma
Total  3.2. spcfDNA sequences were identified in the Pre pooled plasma cfDNA Bioinformatics analyses of the pooled plasma cfDNA collected at the Pre time point after the NGS run identified spcfDNA sequences as significant top 5 peaks (Figure 3a). We chose to focus on the top peak of chromosome no. 1 based on the p-value and its expression (Figure 3b). This spcfDNA was referred to as spcfDNA-1. TPM and coverage values of the spcfDNA-1 region was approximately 17,600-fold higher than that of the gGAPDH region (Figure 3b and c). Consistent with these results, the median of CT values on TaqMan-qPCR assay for the spcfDNA-1 on individual cfDNAs were lower than that of the median of CT value of the gGAPDH fragments (Figure 3d). Also, standard deviations (SD) of the individual CT values upon duplicate measurements for the gGAPDH fragments were significantly higher than that of spcfDNA-1 fragments (Fig.  3e). Additionally, the spcfDNA-1 fragments in individual cfDNAs were approximately 16,600-fold higher than the gGAPDH fragments as a median (Figure 3f).

spcfDNA-1 fragments and general stress markers increased with full marathon as extreme exercise
At the Post time point, the levels of spcfDNA-1 fragments were significantly increased (5-fold to 6-fold) compared to that at the Pre and 2 h time points. Moreover, the amounts of spcfDNA-1 fragments remained slightly higher (1.3-fold) at the 1 d time point compared to that of the Pre time point (Figure 4a). The values for WBCs and MG demonstrated similar dynamics to that of spcfDNA-1 (Figure 4b and 4c) with the high values being confirmed at Post and 2 h time points. In contrast, the CK values exhibited different dynamics (Figure 4d), with high values being confirmed at the 1 d time point. The data regarding general markers have been published previously online [13].

Strong correlations between spcfDNA-1 and general stress makers confirmed
The spcfDNA-1 values strongly correlated with the values for WBCs or MG ( Figure  5a and 5b). However, only a weak correlation was observed between the spcfDNA-1 values and CK values (Figure 5c).

Discussion
In this study, we investigated the presence of spcfDNA fragments in non-treated human participants. The results revealed some spcfDNA sequences through NGS and bioinformatics analyses (Figure 3a and b). We subsequently focused on the top spcfDNA fragment and found its amounts were approximately 16,600-fold higher than those of cfDNA of the gGAPDH region as normal (Figure 3d and 3e). This was probably due to the spcfDNA-1 region being degraded much less in the blood than the normal genomic regions. The genome of somatic cells has a chromatin structure that includes nucleosomes and many proteins bound to it, such as histones, transcription factors, and co-factors [14,15]. Nucleic acids and nucleosome proteins undergo various modifications for epigenetic regulation, such as acetylation and methylation [16][17][18]. Therefore, depending on the protein(s) binding or the type of chemical modification, regions would vary in their susceptible to degradation by DNase enzymes. It was not possible in this study to determine the mode of degradation of the plasma cfDNA in detail. Therefore, we will investigate the mode of cfDNA degradation using an in vitro model that mimics blood.
It is well known that the concentration of plasma cfDNA that can be extracted is very low, making it difficult to amplify target genome-DNA fragments using qPCR. In general, if the concentration of target DNA fragments is very low, the CT value would be high and the quantitative ability is significantly lost because variation would become large. In fact, concentrations of the cfDNA extracted from 200 μL plasma of the individual samples in this study could not be measured using a spectrophotometer as the concentrations were very low. In addition, when amplified by targeting a normal genomic region (gGAPDH) in this study, the median CT value was 34.4 and the SDs were significantly high (Figure 3d and e), which mean quantitative ability is significantly lost. On the other hand, the median CT value for the spcfDNA-1 fragments in this study was 20.4, a difference of -14 in CT value compared to that of the gGAPDH fragments ( Figure 3d). Moreover, the SDs were significantly low (Figure 3e), which mean quantitative ability is not lost. Therefore, quantifying spcfDNA-1 was high sensitivity and analyzing the dynamics of plasma cfDNA was quantitative stable. Moreover, because the spcfDNA-1 target was highly sensitive, it may be possible to analyze the dynamics of spcfDNA-1 fragments under various stresses using a single drop of blood collected from the fingertip.
In our dynamic analyses, concentrations of the pooled cfDNA were dramatically increased (almost 10-fold) at the Post time point compared to that of the Pre time point. The concentrations subsequently returned to baseline at the 2 h time point (Figure 2a, b). In contrast, spcfDNA-1 fragments were significantly increased (approximately 6-fold) at both the Post and 2 h time points compared to that of the Pre time point. Even after 1 d, there was a significant increase in spcfDNA-1 fragments (Figure. 4a). Therefore, we considered the spcfDNA-1 fragments to reflect stress conditions longer than that of total cfDNA on the full marathon.
In dynamic analyses including general stress makers, the spcfDNA-1 fragments showed similar dynamics to that of WBCs and MG with high values being confirmed at the Post and 2 h time points and then returning to near baseline at 1 d (Figure 4a-c). Meanwhile, CK demonstrated differential dynamics, with the highest values being observed at 1 d (Figure 4d). Myoglobin (MG) is known to leak into the blood immediately after muscle damage and WBCs reflect an acute inflammatory response. Previous reports have also shown that WBCs and MG level in the blood are significantly increased immediately after running a marathon race [19][20][21]. In addition, strong correlations were confirmed in our current study between spcfDNA-1 and WBC or MG (Figure 5a and 5b). On the other hand, although a significant correlation was observed between CK and spcfDNA-1, the correlation was not strong (Figure 5c). Therefore, it is suggested that spcfDNA-1 may reflect acute muscle damage and inflammation and that it may become a complex biomarker of physical stress.

Conclusions
We first identified spcfDNA sequences called spcfDNA-1 that were present in the plasma at 16,600-times higher levels than that of a normal genomic region and had high sensitivity for the TaqMan-qPCR assay. We also found that the levels of the spcfDNA-1 fragments fluctuated significantly upon extreme exercise and severe physiological stress, such as that caused by running a full marathon. The spcfDNA-1 fragment levels strongly correlated with WBC and MG levels. Overall, the findings of this study provide new insights into exercise physiology and genome biology on the human.
Author Contributions: Conceptualization, methodology, validation, formal analysis, investigation, data curation, writing of the original draft preparation, visualization, writing, reviewing and editing of the final manuscript, T.K., T.S., and S.F. Validation, formal analysis, investigation, data and curation, N.I., and K.Tamai. Supervision, project administration, and funding acquisition, Y.K. and K.Takekoshi.