A Simplified Cluster Analysis of Electron Track Structure for Estimating Complex DNA Damage Yields

Complex DNA damage, defined as at least two vicinal lesions within 10–20 base pairs (bp), induced after exposure to ionizing radiation, is recognized as fatal damage to human tissue. Because the aggregation of DNA damage at the nanometer scale is difficult to measure directly, many cluster analyses of inelastic interactions based on Monte Carlo simulation of radiation track structure in liquid water have been conducted to evaluate DNA damage. Meanwhile, experimental techniques for detecting complex DNA damage have evolved in recent decades, so both simulation and experiment are now used to investigate complex DNA damage. In this study, we propose a simplified cluster analysis of ionization and electronic excitation events within 10 bp based on track structure for estimating complex DNA damage yields for electron and X-ray irradiations. We then compare the computational results with the experimental complex DNA damage coupled with base damage (BD) measured by enzymatic cleavage and atomic force microscopy (AFM). The computational results agree well with experimental fractions of complex damage yields, i.e., single and double strand breaks (SSBs, DSBs) and complex BD, when the yield ratio of BD/SSB is assumed to be 1.3. Considering the comparison of complex DSB yields, i.e., DSB + BD and DSB + 2BD, between simulation and experimental data, we find that the aggregation degree of the events along electron tracks reflects the complexity of induced DNA damage, showing that 43.5% of DSBs induced after 70 kVp X-ray irradiation can be classified as a complex form coupled with BD. The present simulation enables us to quantify the types of complex damage that cannot be measured through in vitro experiments and helps us to interpret the experimental detection efficiency for complex BD measured by AFM. This simple model for estimating complex DNA damage yields contributes to a more precise understanding of the DNA damage complexity induced after X-ray and electron irradiations.


Introduction
Ionizing radiation within the human body causes DNA damage [1] both physically (i.e., through energy deposition) [2,3] and chemically (i.e., through free radicals reacting with the DNA target) [4–7]. Among various DNA damage types [8,9], DNA double-strand breaks (DSBs), defined as two strand breaks within 10 base pairs (bp) [2,10,11], are conventionally recognized as fatal DNA damage, which can lead to cell death with a certain probability [12]. Therefore, the relative biological effectiveness (RBE) at the endpoints of DSBs and cell survival has been investigated in vitro and in silico in previous reports [2,6,8,9,12–19]. Additionally, complex DNA damage composed of at least three vicinal lesions caused within 10–20 bp, such as DSBs coupled with strand breaks or base damage (BD), is believed to be more lethal to cells than simple DSBs [20,21] because such damage is refractory to repair [22,23]. To assess lethality, it is necessary to quantify the clustering degree of DNA damage with both experiments [24,25] and simulations [8,26–29]. It has been pointed out that a considerable amount of complex DNA damage can be induced after irradiation; however, the yield and nature of complex DNA damage are still difficult to measure experimentally [30]. Because of this difficulty, the validity of simulations has not yet been sufficiently demonstrated.
Experimental methods for detecting complex DNA damage have evolved in recent decades [24,30–36]. Among several techniques [24,30–36], microscopic observations coupled with an antibody against γ-H2AX [31–34] enable researchers to obtain spatial distributions of DSB induction in the cell nucleus. The proximity of DSB sites can be evaluated from the γ-H2AX foci volume in an assay [35,36]; however, the damage complexity at the nanometer scale (the scale of DNA) cannot be obtained because the spatial resolution is limited to between hundreds of nm and a few µm [35,36]. Meanwhile, complex DNA damage composed of BD can be quantified by means of gel electrophoresis after treatment with base excision repair enzymes [24,30,37,38] and by fluorescence resonance energy transfer [39,40]. The structure of complex DNA damage, i.e., the number of lesions per damaged site, was recently revealed with atomic force microscopy (AFM), where an individual BD in a complex damage site is specifically labelled with biotin/avidin coupled with an aldehyde reactive probe (ARP) [41]. However, whether all complex BD caused within a few bp can be detected in the AFM experiment [41] remains unknown. Thus, it is essential to compare simulations based on track structure with the corresponding experimental data, which contributes to interpreting the detection efficiency of complex BD.
A cluster analysis of inelastic interactions based on a radiation track structure in liquid water has been conducted as a powerful tool for estimating DNA damage yield [2,3,8,16,42]. The aim of this study is to develop a simple model for estimating complex DNA damage (i.e., isolated DSB, DSB + BD, DSB + 2BD, 2BD, 3BD, 4BD) based on previous simulation techniques [17,27,28], and to investigate the simulation accuracy and the nature of X-ray-(and electron-) induced complex damage in comparison with experimental data. This work finally quantifies the complexity of DNA damage under X-ray and electron irradiations.

Comparison between the Present Model and Experimental Complex Damage
We first checked the validity of the model, specifically the assumed yield ratio of base damage (BD) to single-strand break (SSB) of 1.3, by comparing the simulation results (Equations (2)-(5) defined in Materials and Methods) with experimental data on yield fractions of SSB, double-strand break (DSB), BD and complex BD (cBD) measured by enzymatic cleavage [37,38]. Figure 1A shows the fractions of SSB, DSB, BD and cBD obtained from our model and experiments [37,38]. Given the good agreement between the estimation (6.9% total complex damage, 2.6% for DSB and 4.3% for cBD) and the experimental data (7.2% total complex damage, 2.5% for DSB and 4.7% for cBD) in Figure 1A, the developed model for complex BD with BD/SSB = 1.3 seems reasonable. Additionally, the assumed induction ratio of BD/SSB = 1.3 could be further validated from the cross sections for impact to the DNA strand and base presented by Bernhardt et al. [43].
Figure 1. Comparison between the present model and experimental data: (A) fractions of SSB, DSB, BD and cBD compared with enzymatic cleavage data [37,38]; (B) complex DSB types compared with the direct observation technique with AFM after 70 kVp X-ray irradiation [41]. The calculation represents the present model estimation based on Equations (2)-(5) and cluster analysis based on Table 1, and the determination of isolated or complex damage also follows the summary in Table 1.
Table 1 (partial). BD/BD/BD/BD | Complex | 20 ≤ N_cl < 29 | Y_cBD × f(20 ≤ N_cl < 29) (Equation (5))
* The maximum inter-lesion distance, L_c, was set to be 10 bp for sampling the events per cluster.
To directly compare our simulation results with experimental complex damage data [41], we considered the experimental detection efficiency η in our simulation. Focusing on the type of complex damage, as shown in Figure 1B, the present cluster analysis for identifying complex DSB types, in which the efficiency η = 0.9 was considered, also accurately reproduced the experimental results [41]. It should be noted that the yields of a BD and a cBD, Y_BD and Y_cBD, are proportional to the efficiency η and its square η^2, respectively (i.e., Y_BD × η and Y_cBD × η^2). For η = 1.0, the simulation was in better agreement with the experimental data [41] than for η = 0.9, suggesting that the efficiency for detecting BD under atomic force microscopy (AFM) operation is over 90%, which is consistent with the experimental efficiency [41]. Moreover, the agreement shown in Figure 1B suggests (i) that the density of ionization and electronic excitation events clearly reflects the damage complexity, (ii) that the inter-lesion distance in the AFM experiment is within approximately 10 bp and (iii) that 9 ionization and electronic excitation events are needed to induce one additional BD at the DSB site.
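The efficiency scaling described above can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and chosen only for illustration; the only relations taken from the text are Y_BD × η and Y_cBD × η^2.

```python
def apply_detection_efficiency(y_bd, y_cbd, eta):
    """Scale isolated-BD and complex-BD yields by a detection efficiency eta.

    Follows the relations quoted in the text: an isolated BD is seen with
    probability eta, a cBD (two base lesions) with probability eta**2.
    The function name and interface are illustrative assumptions.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return y_bd * eta, y_cbd * eta ** 2


# Hypothetical yields in arbitrary units, used only to show the scaling.
detected_bd, detected_cbd = apply_detection_efficiency(100.0, 10.0, eta=0.9)
print(detected_bd, detected_cbd)
```

With η = 0.9 the complex BD yield is reduced by a factor of 0.81, so the detection efficiency affects complex lesions more strongly than isolated ones.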
Based on the comparison results shown in Figure 1, the simple methodology in the present DNA damage model is reasonable and sufficient for estimating complex BDs and for identifying the number of BDs at the DSB site. This indicates that the cluster analysis techniques [44,45] also are reasonable for predicting complex DNA damage.

Testing for Consistency with Other Simulations by Different Codes
We determined the complex damage type (i.e., double-strand break (DSB), DSB/base damage (BD), DSB/BD/BD) using the present model, which is based on a simple cluster analysis of the number of events, N_cl, in a small volume with a 10 bp radius (Table 1). In addition to the comparison between the simulation and the experimental data [37,38,41] (Figure 1), we tested whether this cluster model can reproduce previous simulation data on complex DSB [44,45], i.e., DSB+ and DSB++. Agreement with previous simulations, such as those by KURBUC [44] and Geant4-DNA [45], provides further support for the assumptions that 9 events are needed to induce one additional BD at the DSB site and that the yield ratio BD/SSB is 1.3.
The comparison between the present model coupled with the Particle and Heavy Ion Transport System (PHITS) (based on Table 2) and previous simulation data calculated by KURBUC [44] and Geant4-DNA [45] is summarized in Table 3. The present model with the PHITS calculation indicates that more than 20-30% of DSBs can be classified as complex forms. Among the 0.3 keV, 1 keV, 10 keV and 100 keV electrons, the yield of DSBs is highest for the 0.3 keV electron [17], while the fraction of complex DSBs (cDSBs, composed of DSB+ and DSB++) is highest for the 1 keV electron. This comparison indicates that the present model can reproduce the previous simulation results [44,45] even though it uses a different prediction model, suggesting that identifying the complex damage type from the number of events in sites with a 10 bp radius (9 and 12 events for inducing a BD and a SB at each DSB site, respectively) is reasonable. Note that the other codes provide the complex damage yield based on the events and energy deposition to the DNA cylinder [2,29,44,45] and on ion cluster size [46]. Considering the good agreement with other simulations [44,45] and the experimental data [41], this simplified model with a cluster analysis is, therefore, sufficient for identifying the complexity of DNA damage induced after X-ray and electron irradiations.
Table 2. Classification of complex DSB coupled with strand breaks in the present model.
DNA Damage Type | Symbol | Complexity | Event/Cluster N_cl * | Model for Yield Calculation
Single-strand breaks | SSB | Isolated | N_cl < 2 | Y_SSB (Equation (2))
Double-strand breaks (+ strand breaks)
* The maximum inter-lesion distance, L_c, was set to be 10 bp for sampling the events per cluster.
Abbreviations: (i) SSB = single-strand break; (ii) DSB = double-strand break; (iii) DSB+ and DSB++ = DSB coupled with a strand break and two strand breaks within 10 bp, respectively; (iv) cDSB is the sum of the percentages of DSB yields coupled with strand breaks (DSB+ and DSB++).

Interpretation of Complex Base Lesions Directly Measured by Atomic Force Microscopy (AFM) Imaging
The most recent atomic force microscopy (AFM) techniques for detecting complex base damage (BD) enable us to quantify the complex damage type, i.e., BD/BD, BD/BD/BD and BD/BD/BD/BD, in addition to complex double-strand breaks (DSBs) coupled with BD (e.g., DSB/BD and DSB/BD/BD) [41]. This detection technique for BD requires streptavidin labelling of DNA with an aldehyde reactive probe (ARP) followed by AFM imaging. To reproduce the experimental data on complex BD directly measured by AFM imaging [41], we must consider several experimental efficiencies, such as the enzymatic reaction efficiency and the spatial resolution for ARP. We therefore estimated the fractions of complex BD and complex DSBs coupled with BD, and tried to reproduce the experimental results measured by AFM imaging [41]. This comparison between simulation and experiments contributes to the interpretation of a direct observation technique for complex BD. Figure 2 shows the comparison between the simulation results by our model and the experimental data [41], where (A) is the fraction of isolated BD and the complex damage composed of DSB and cBD, and (B) is the fractions of DSB, BD/BD, DSB/BD, BD/BD/BD, DSB/BD/BD and BD/BD/BD/BD. It should be noted that an inter-lesion distance within 10 bp was used for estimating the yield of complex damage. For both η = 1.0 and η = 0.9, the calculated fractions of complex damage were in good agreement with the experimental results [41], as shown in Figure 2A. However, as shown in Figure 2B, highly complex forms (i.e., BD/BD/BD) and a high fraction of BD/BD can be seen in the simulation results, while the experiments did not show such tendencies.

Figure 2. Comparison between the simulation results and the experimental data measured by AFM imaging [41], where the energy of the X-ray is 70 kVp: (A) fractions of isolated BD and complex damage composed of DSB and cBD; (B) fractions of DSB, BD/BD, DSB/BD, BD/BD/BD, DSB/BD/BD and BD/BD/BD/BD. The calculation represents the estimation based on Equations (2)-(5), where the determination of isolated or complex damage follows the summary in Table 1.
From the comparisons in Figure 2, to reproduce the experimental results, we next considered the loss in detecting complex BD composed of two vicinal lesions within 5 bp, which arises from the large size of the ARP label (approximately a 10 bp diameter). The yield of complex BD with this detection loss, Y_cBD*, is given by Equation (1) in terms of Y_cBD(10 bp) and Y_cBD(5 bp), the yields of complex BD caused within 10 bp and 5 bp, respectively. Under this assumption, the detection of complex BD containing more than two BDs within 10 bp was estimated to be completely impossible from the simulation standpoint. The estimations based on Equation (1) are shown as the right bars in Figure 2A,B. These estimated fractions of complex damage were in good agreement with the experimental data, and closer to them than the simulation without the detection loss. Based on these results, we found that detecting complex BD (especially 3BD and 4BD) is still difficult by means of an in vitro experiment. The experimental results for cBD, i.e., 2BD, 3BD and 4BD, thus should be corrected using this simulation technique. In the experimental process for detecting BD by AFM [41], all types of BD are treated with DNA glycosylases, and the resulting apurinic/apyrimidinic (AP) sites can be labelled with an ARP that has both an alkoxyamine for the reaction with the aldehyde group of DNA and a biotin moiety for the subsequent labelling. Since the biotin moiety bound to DNA can be tagged with streptavidin (a large molecule of 53 kDa), the resulting ARP-streptavidin complex can be visualized with AFM. For this series of labelling processes, it was experimentally interpreted that overlapping and uncoupling of biotins might reduce the efficiency for visualizing complex BDs induced within a few bp to about 70-80% [41]. In this light, the evaluation shown in Figure 2 might reflect a 3D structure problem and a reduced labelling efficiency.
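The displayed form of Equation (1) is not reproduced in this copy. One form consistent with the description above (complex BD pairs closer than 5 bp are lost to detection, while pairs within 10 bp are otherwise counted) would be the following; it is an inferred sketch, not a quotation of the original equation.

```latex
% Assumed form of Equation (1): yield of detectable complex BD after
% removing pairs separated by less than 5 bp (inferred from the text).
\begin{equation}
  Y_{\mathrm{cBD}}^{*} = Y_{\mathrm{cBD}}(10\ \mathrm{bp}) - Y_{\mathrm{cBD}}(5\ \mathrm{bp})
\end{equation}
```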

Estimation of Complex DNA Damage for Mono-Energetic Electron
The model for estimating the yields of complex DNA damage was tested in the comparisons shown in Figure 1 and Table 3. Using the present model, we further calculated the yield of complex base damage (BD) including at least two BDs, Y_cBD, and the ratio of complex to isolated BD, Y_cBD/Y_BD, as functions of incident electron energy.
Y_cBD and Y_cBD/Y_BD, estimated with the maximum inter-lesion distance L_c = 10 bp (used for the comparisons) for mono-energetic electron exposure, are shown as the red circles in Figure 3. We also show Y_DSB and Y_DSB/Y_SSB as the blue squares in Figure 3 to compare Y_DSB with Y_cBD. As reported previously [17], the peak of the number of linkages per incident electron energy was found at around 0.3 keV. The maximum Y_DSB and Y_cBD for a 10 bp cluster size (L_c = 10 bp) were found to be 3.24 × 10^−11 Gy^−1 Da^−1 (Y_DSB/Y_SSB = 14.5%) and 4.58 × 10^−11 Gy^−1 Da^−1 (Y_cBD/Y_BD = 16.2%), respectively. The cluster size for complex BD (cBD) was conventionally set to be 3 bp, corresponding to 1.0 nm, in other simulations [8]. We therefore also calculated the cBD caused within 3 bp, which is shown as the green triangles in Figure 3. When the cluster size is reduced to 3 bp, the maximum Y_cBD becomes much lower than that with L_c = 10 bp (Y_cBD = 1.37 × 10^−11 Gy^−1 Da^−1, Y_cBD/Y_BD = 3.96% for the 0.3 keV electron).

Monte Carlo Simulations of X-ray and Electron Processes
To compare our model with the experimental yield of complex DNA damage after X-ray irradiation [37,38,41], we used two X-ray spectra, 150 kVp [37,38] and 70 kVp [41], both with a 0.2 mm Al filter, and simulated them with the Particle and Heavy Ion Transport System (PHITS) code [47]. The X-ray spectra were estimated according to the semiempirical model reported by Tucker et al. [48]. We adopted the electron gamma shower (EGS) [49] mode for photon transport and the electron track structure mode (etsmode) [50–55] for electron transport in the PHITS calculation. It should be noted that the "etsmode" implemented in the PHITS code was verified against various endpoints including range, stopping power, nanodosimetry and double-strand break (DSB) yield [17]. We sampled the secondary electron spectra generated by 150 kVp and 70 kVp X-rays and transported the electrons in liquid water. The cut-off energy of electrons was set to 1 eV. The coordinates of inelastic interactions were then output using a tally named "t-userdefined", as reported previously [17].

Model for Estimating Single- and Double-Strand Break Yields
Using the calculated electron spectrum, we estimated strand break yields. According to the DNA damage model previously developed [17], the number of events, N_event, and the number of linkages within 3.4 nm (10 bp), N_link(10), were sampled to calculate the yield of single-strand breaks (SSBs), Y_SSB, and that of double-strand breaks (DSBs), Y_DSB, in Gy^−1 Da^−1. Y_SSB and Y_DSB are calculated with Equations (2) and (3), respectively, using the coefficients k_SSB = 5.66 × 10^−12 (keV Gy^−1 Da^−1) and k_DSB = 1.61 × 10^−13 (keV Gy^−1 Da^−1) [17] and the energy deposited by electron inelastic events, E_dep, in keV. These coefficients were found to reproduce the experimental yields of SSB and DSB after exposure to 220 kVp X-rays in our previous report [17]. It should be noted that 10 bp was defined as the classical distance for two SSBs leading to a DSB [11]. Based on Equations (2) and (3), we calculated the DNA strand break yields under 150 kVp and 70 kVp X-rays and mono-energetic electron irradiations.
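As a rough illustration of this step, the sketch below counts events and linkages from a list of event coordinates and converts them to yields. Two points are assumptions rather than statements from the text: a "linkage" is taken to be any pair of events separated by at most 3.4 nm, and the yields are taken as Y = k × N / E_dep, which is the form implied by the stated units (k in keV Gy^−1 Da^−1, E_dep in keV, Y in Gy^−1 Da^−1). All function names are hypothetical.

```python
from itertools import combinations
from math import dist

K_SSB = 5.66e-12  # keV Gy^-1 Da^-1, coefficient quoted in the text [17]
K_DSB = 1.61e-13  # keV Gy^-1 Da^-1, coefficient quoted in the text [17]


def count_linkages(events_nm, max_dist_nm=3.4):
    """Count event pairs separated by at most max_dist_nm (assumed definition)."""
    return sum(1 for a, b in combinations(events_nm, 2)
               if dist(a, b) <= max_dist_nm)


def strand_break_yields(events_nm, e_dep_kev):
    """Return (Y_SSB, Y_DSB) in Gy^-1 Da^-1 under the assumed form Y = k * N / E_dep."""
    if e_dep_kev <= 0:
        raise ValueError("deposited energy must be positive")
    n_event = len(events_nm)
    n_link = count_linkages(events_nm)
    return K_SSB * n_event / e_dep_kev, K_DSB * n_link / e_dep_kev


# Three hypothetical inelastic events (coordinates in nm) depositing 0.03 keV in total.
events = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(count_linkages(events))
print(strand_break_yields(events, e_dep_kev=0.03))
```

The pairwise loop scales quadratically with the number of events, which is acceptable for a sketch; a production analysis of full tracks would more plausibly use a spatial index.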

Model for Estimating Base Damage Yields
We obtained the coefficient of base damage (BD) induction, k_BD (keV Gy^−1 Da^−1), in the presence of a 10 mM tris(hydroxymethyl)aminomethane-HCl buffer from the experimental literature reporting the yield ratio of BD to single-strand break (SSB), which was given as k_BD/k_SSB = 1.3 [37]. It should be noted that this tris-HCl concentration was almost equivalent to liquid water because of its low radical scavenging capacity [37]. Based on this ratio, we deduced the coefficients of isolated BD and complex BD (cBD) to be k_BD = k_SSB × 1.3 = 7.36 × 10^−12 (keV Gy^−1 Da^−1) and k_cBD = k_DSB × 1.3^2 = 2.72 × 10^−13 (keV Gy^−1 Da^−1), respectively. In the same manner as in the strand break case, the yields of BD and cBD, Y_BD and Y_cBD in Gy^−1 Da^−1, are expressed by Equations (4) and (5), in which L_c is the maximum distance in bp between two events used to sample a linkage. Because the maximum inter-lesion distance, L_c, for cBD depends on the experimental detection conditions, Y_cBD should be treated as a function of L_c. Complex BD can be detected as non-DSB in the enzymatic cleavage technique [37,38], while the diameter of an aldehyde reactive probe (ARP)-avidin label at a BD site is equal to about 1.5 times the width of a DNA ladder (2.3 nm) in the direct observation technique for BD by atomic force microscopy (AFM) [41]. Based on these, the parameter L_c was set to be 10 bp (3.4 nm) for comparing the simulation with experiments using both the enzymatic cleavage and AFM techniques. Considering that the distance between two adjacent ARPs can be within 10 bp, we assumed that L_c was equal to 10 bp in the simulation for the AFM experiment. Using this value of L_c, we calculated the yields of isolated and complex BD [37,38,41]. After comparison with the corresponding experimental data, we also estimated Y_cBD under the condition of L_c = 3 bp as an example of more closely spaced damage that is expected to be more toxic.
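The arithmetic behind the two BD coefficients can be checked directly from the quoted values. The variable names below are illustrative only; the numbers are those given in the text.

```python
K_SSB = 5.66e-12      # keV Gy^-1 Da^-1, from the text [17]
K_DSB = 1.61e-13      # keV Gy^-1 Da^-1, from the text [17]
BD_PER_SSB = 1.3      # yield ratio BD/SSB reported in [37]

# Isolated BD: one lesion, so the ratio enters once.
k_bd = K_SSB * BD_PER_SSB
# Complex BD: the text applies the ratio squared to the DSB coefficient.
k_cbd = K_DSB * BD_PER_SSB ** 2

print(f"k_BD  = {k_bd:.2e} keV Gy^-1 Da^-1")
print(f"k_cBD = {k_cbd:.2e} keV Gy^-1 Da^-1")
```

Rounded to three significant figures, these reproduce the values 7.36 × 10^−12 and 2.72 × 10^−13 stated above.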

A Cluster Analysis for Determining Complex Damage Type
Because the modelling for the yield estimation of double-strand break (DSB) and complex base damage (cBD) does not enable us to determine the damage complexity, we added a cluster analysis for calculating the event density around a DSB or cBD site. We counted the number of ionization and electronic excitation events within a sampling site of radius L_c at a DSB or cBD site to obtain the number of events per cluster, N_cl. Then, we determined the type of damage complexity from N_cl with reference to the cluster analysis reported by Yoshii et al. [27]. To reproduce the complex DSB coupled with base damage (BD) measured in the literature [41], it was estimated that approximately 9 events per cluster were needed on average to induce a BD within a 10 bp separation from a DSB or 2BD (cBD) site. Additionally, the mean N_cl to induce a simple DSB or 2BD site was calculated to be 6 in our previous study [17]. We therefore assumed that the ranges of N_cl to induce simple DSB (2BD), DSB + BD (3BD) and DSB + 2BD (4BD) are 2 ≤ N_cl < 11 (6 on average), 11 ≤ N_cl < 20 (15 on average) and 20 ≤ N_cl < 29 (24 on average), respectively. Under these assumptions, the mean deposition energy to cause DSB + BD within 10 bp, i.e., N_cl = 15, was estimated to be 121.7 eV, which was within the range of its reference value (102.5-122.6 eV) given by a different calculation [29]. The criteria for determining DNA damage type are summarized in Figure 5 and Table 1. Using the fraction of N_cl, f(N_cl), and the equations listed in Table 1, we calculated the yields of DNA damage, i.e., single-strand break (SSB), DSB (simple DSB), complex DSB (e.g., DSB + BD, DSB + 2BD), isolated BD, and complex BD (e.g., 2BD, 3BD, 4BD).
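The classification by N_cl and the yield-times-fraction rule can be sketched as follows. The N_cl ranges are the ones quoted above; the labels, the function names, and the choice to normalise f(N_cl) by all sampled clusters are assumptions made for illustration, and values outside 2 ≤ N_cl < 29 are simply left unclassified.

```python
from collections import Counter

# Half-open N_cl ranges quoted in the text for a DSB (or 2BD) site.
N_CL_RANGES = {
    "simple": (2, 11),     # simple DSB (or 2BD)
    "plus_1BD": (11, 20),  # DSB + BD (or 3BD)
    "plus_2BD": (20, 29),  # DSB + 2BD (or 4BD)
}


def classify(n_cl):
    """Return the complexity label for one cluster, or None if out of range."""
    for label, (low, high) in N_CL_RANGES.items():
        if low <= n_cl < high:
            return label
    return None


def yields_by_complexity(y_total, n_cl_samples):
    """Split a total yield into complexity classes as y_total * f(range).

    f(range) is taken here as the fraction of all sampled clusters that
    fall in the range (an assumption about the normalisation).
    """
    if not n_cl_samples:
        raise ValueError("need at least one sampled cluster")
    counts = Counter(classify(n) for n in n_cl_samples)
    total = len(n_cl_samples)
    return {label: y_total * counts[label] / total for label in N_CL_RANGES}


# Hypothetical N_cl samples at four DSB sites, with a unit total yield.
print(yields_by_complexity(1.0, [3, 6, 12, 25]))
```

The quoted averages (6, 15 and 24) coincide with the mean of the integers in each half-open range, which is a quick consistency check on the boundaries.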

Figure 5. Identification of DNA damage type in the present simulation. The illustration on the left represents an electron track structure and a DNA cylinder. Considering the distance between two ionization and electronic excitation events, we identified single-strand break (SSB), double-strand break (DSB), base damage (BD) and complex BD (cBD). The complexities of DSB and cBD were then determined from the number of events per cluster, N_cl, at the complex damage site (DSB and cBD). After estimating the yield of each DNA damage type, we compared the estimated results with experimental data [37,38,41] (right images are example pictures obtained with AFM [41]).

Comparison between Estimation and Experimental Data
We compared the estimated fractions of single-strand break (SSB), double-strand break (DSB), base damage (BD) and complex BD (cBD) with experimental results measured by enzymatic cleavage [37,38]. Using this comparison, we checked the model performance for estimating the yield ratios, BD/SSB and cBD/BD. Then, we compared the fractions of complex DSB (e.g., DSB, DSB/BD, DSB/BD/BD) estimated by this model with experimental data measured by the atomic force microscopy (AFM) imaging [41] to check the performance of this cluster analysis for estimating complex DSB.
After that, to interpret the most recent technique for detecting complex damage types using AFM [41], we compared the estimated fractions of isolated BD and complex damage (cBD and DSB) yields with those measured in the AFM experiment [41]. We also identified all complex damage types (e.g., DSB, BD/BD, DSB/BD, BD/BD/BD, DSB/BD/BD, BD/BD/BD/BD) shown in Figure 5 and calculated the fraction of each complex form. We then compared the estimated complex damage fractions with the experimental data [41]. It should be noted that the inter-lesion distance was set to ≤10 bp throughout this analysis.

Additional Benchmark Test of Cluster Analysis for Identifying Complex DSBs
Classically, the yields of complex double-strand breaks (DSBs) containing one or more strand breaks (SBs) within a 10 bp separation have been reported. These yields are designated as DSB+ (DSB coupled with one strand break) and DSB++ (DSB coupled with two strand breaks). In addition to the comparison of DSB coupled with base damage (BD) between this estimation and the experimental data [37,38,41], we also compared our calculation results of DSB+ and DSB++ with the computational results in the literature [44,45]. Under the assumptions that the mean N_cl to induce an additional BD was 9 and the BD/SSB ratio was 1.3 [37], we deduced that 12 events were needed on average to induce an additional strand break at a DSB site. Applying the present cluster analysis and the criteria for additionally induced strand breaks (listed in Table 2), we identified DSB+ from a DSB site on the basis of 14 ≤ N_cl < 26 and DSB++ from a DSB site on the basis of 26 ≤ N_cl < 38. For mono-energetic electron irradiation at 0.3 keV, 1 keV, 10 keV and 100 keV, we compared the DSB complexity (simple DSB, DSB+, DSB++) estimated by our simulation with other simulations calculated by the KURBUC [44] and Geant4-DNA [45] codes.
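The thresholds quoted above follow a simple pattern that can be written out explicitly. One reading consistent with the stated numbers is that the 12 events per additional strand break come from 9 × 1.3 = 11.7 rounded to the nearest integer; that reading, the lower edge of 2 for a simple DSB (carried over from the BD classification), and all names below are assumptions for illustration.

```python
from bisect import bisect_right

EVENTS_PER_BD = 9                            # quoted in the text
EVENTS_PER_SB = round(EVENTS_PER_BD * 1.3)   # 11.7 -> 12 (assumed derivation)


def thresholds(events_per_extra_lesion, base=2, levels=3):
    """Return the N_cl edges [base, base + step, ...] for `levels` classes."""
    return [base + i * events_per_extra_lesion for i in range(levels + 1)]


def dsb_class(n_cl, edges, labels=("DSB", "DSB+", "DSB++")):
    """Map N_cl to a DSB class using half-open bins [edges[i], edges[i+1])."""
    i = bisect_right(edges, n_cl) - 1
    return labels[i] if 0 <= i < len(labels) else None


sb_edges = thresholds(EVENTS_PER_SB)   # edges for additional strand breaks
bd_edges = thresholds(EVENTS_PER_BD)   # edges for additional base damage
print(sb_edges, bd_edges)
print(dsb_class(20, sb_edges))
```

The same helper reproduces both sets of boundaries, 2/14/26/38 for additional strand breaks and 2/11/20/29 for additional BD, so the two classifications differ only in the step size.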

Conclusions
In this work, we compared experimental data on complex DNA damage with computational results based on electron track structure calculated by the Particle and Heavy Ion Transport System (PHITS) code. The comparison between the simulations and the experimental data for complex DNA damage yields supports (i) a yield ratio of base damage (BD) to single-strand break (SSB) of 1.3; (ii) that the spatial pattern (density) of ionization and electronic excitation events reflects the damage complexity; and (iii) that 9 and 12 ionization and electronic excitation events are needed to induce one additional BD and one additional strand break, respectively, at a double-strand break (DSB) site within a 10 bp separation. The present results indicate that conventional cluster analysis of inelastic interactions is a powerful and reasonable tool for reproducing experimental complex DNA damage. Additionally, this model estimation can contribute to the interpretation of the experimental efficiency for detecting BD at complex damage sites and presents the fractions of complex DNA damage yields after X-ray and mono-energetic electron irradiations. While further development of the current model for high-LET irradiation is essential, this work provides a simplified model for estimating the yield of complex DNA lesions that connects experiments and track structure simulations.
Funding: This work was supported by the Japan Society for the Promotion of Science KAKENHI (Grant no. 16H02959, 17K07022 and 19K17215).