Twelve of the nineteen barley flanking regions (63%) aligned wholly or partly to repetitive elements, indicating insertion of the T-DNAs into repetitive regions of genomic DNA. As at least 60% of the barley genome is predicted to consist of repeats [28
], this is roughly the proportion of lines that would be expected to integrate into repetitive DNA by chance, if T-DNA integration occurred randomly throughout the genome. Zhao et al
] examined barley T-DNA flanking regions during the establishment of a gene tagging system using the maize Ds element. They found that 40% of the T-DNA insertions had significant homology to monocot EST database entries, around another 8% were close to coding regions and 36% of insertions were in repetitive regions, suggesting a preference for insertion into non-redundant, gene-containing regions of the genome. However, when we took a subset of these insertion sites (the first 21 lines) and searched them with BLAST against the TREP (Triticeae Repeat Sequence) database, 13 gave significant hits (75–100% homology). This equates to 62% inserting into, or near to, repetitive elements, a very similar figure to that obtained in the current study. It is likely that the inclusion of alignments to the TREP database in the current study explains the higher proportion of flanking regions aligning to repetitive elements compared to that seen by Zhao et al
]. The majority of repeat elements identified in our study were retrotransposons, but some DNA transposons were also identified. Retrotransposons replicate by transcription followed by reverse transcription and integration of the cDNA back into the genome. Repetitive sequences within cereals are often referred to as transcriptionally inactive “junk DNA”; however, recent studies have shown that many barley retrotransposons are active [28
]. This finding is supported by the current analysis, with 77.8% of the retrotransposons identified here aligning to cereal EST sequences within the NCBI database. The presence of active repetitive regions within EST sequences provides a possible explanation for the apparent discrepancy between our findings and the findings of Zhao et al
]. Our findings are not in disagreement with the conclusion that T-DNAs may preferentially insert into active genomic regions. Previous studies in Arabidopsis
], rice [27
], tobacco [20
] and barley [2
] have reported that transgenes tend to insert into or close to genes. It is therefore interesting that such a high proportion of the single-copy T-DNAs characterised here appear to have inserted into or close to repetitive elements. However, the identification of one or more repetitive elements in a T-DNA flanking region does not necessarily mean that the T-DNA has not integrated into or close to a gene. Many repetitive elements are positioned close to genes; for example, the miniature inverted-repeat transposable element (MITE), identified here in the flanking sequence of line 71-09-01, has a tendency to insert into the non-coding regions of genes [32
]. Differences between this study and those carried out in Arabidopsis
may also be the result of the differing genome compositions of Arabidopsis
and cereals. Whilst barley and other cereal genomes contain a high proportion of repetitive elements, only 5% of the Arabidopsis
genome is predicted to consist of repeats [28
]. Arabidopsis therefore has a much higher gene density, making it more likely that T-DNAs will insert into or close to a gene by chance.
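The comparison between the observed proportions and the chance expectation can be illustrated with a short calculation. The sketch below uses only the counts quoted above (12 of 19 lines in this study, 13 of 21 re-analysed lines from Zhao et al) and treats the 60% repeat content of the barley genome as an assumed probability of landing in a repeat under random integration. The exact binomial test and the helper names are illustrative assumptions and were not part of the reported analysis.

```python
from math import comb

# Assumption: 60% repeat content is used as the chance probability
# that a randomly integrating T-DNA lands in repetitive DNA.
ASSUMED_REPEAT_FRACTION = 0.60


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed outcome."""
    p_obs = binomial_pmf(k, n, p)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= p_obs * tolerance
    )
    return min(1.0, total)


# (lines aligning to repeats, lines examined), as quoted in the text
observations = {
    "current study": (12, 19),
    "Zhao et al subset": (13, 21),
}

results = {}
for label, (hits, total) in observations.items():
    proportion = hits / total
    p_value = binomial_two_sided_p(hits, total, ASSUMED_REPEAT_FRACTION)
    results[label] = (proportion, p_value)
    print(f"{label}: {hits}/{total} = {proportion:.1%}, two-sided p = {p_value:.3f}")
```

Under these assumptions neither proportion is distinguishable from 60%, which is in line with the interpretation given above, although with so few lines such a test has little power to detect a modest bias.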
Some previous reports have classified T-DNAs as being inserted into protein-coding regions solely on the basis of flanking sequence alignments to EST sequences [27
] without attempting to align the flanking sequences or ESTs to characterised repeat elements. As shown by the analyses reported here, this approach is unreliable, as many retrotransposons are represented in the EST databases. One may argue that an EST alignment at least confirms the presence of a T-DNA insertion close to a transcriptionally active region of the genome. However, similarity to a transcribed retrotransposon does not necessarily signify that the specific retrotransposon flanking the T-DNA is transcribed. Retrotransposons are present in high numbers within cereal genomes, and not all of them are active.
Salvo-Garrido et al
] concluded that transgenes preferentially integrate into gene-rich regions of the barley genome on the basis of flanking sequence alignments obtained for seven lines generated by particle bombardment. Lines generated by particle bombardment present additional difficulties for the analysis of flanking sequences because the plasmid break-point must be identified before proceeding to isolate flanking sequences. In addition, such lines are known to contain more complex transgene integrations [33
]. Physical mapping of transgene insertions suggested a non-random pattern of insertion [2
]. This is in agreement with the analysis of large numbers of rice T-DNA flanking sequences that revealed a non-random distribution of T-DNA insertions with a bias towards certain chromosomes [34
].
The flanking sequence data reported here appear to show no correlation between the local T-DNA integration site and transgene expression. Previous reports have suggested that integration within repetitive DNA can lead to low expression of both the gene of interest and the selection gene, which can bias the retrieval of transformed lines towards those with their T-DNA inserted within genic regions [4
]. However, in this study, insertion of T-DNA into repetitive DNA did not appear to have an obvious negative effect on luciferase activity, with many highly expressing lines appearing to have their T-DNA inserted within retrotransposons. As noted before, individual retrotransposons may be active or inactive, and therefore it is not possible to conclude whether or not a particular T-DNA has inserted into a transcriptionally active region of genomic DNA purely on the basis of a retrotransposon or EST alignment.
We therefore propose that the evidence for preferential integration of transgenes into gene-rich regions of barley is not as strong as previously reported. The data shown here suggest that the insertion of T-DNAs into repetitive regions of barley DNA does not have a negative effect on transgene expression, and hence does not lead to the biased selection of transformed lines with T-DNAs inserted within genes. A factor influencing this may be the high number of actively transcribed retrotransposons within the barley genome [28
]. Of the 15 lines with T-DNAs inserted into or close to regions aligning to repeats or ESTs, ten aligned strongly to repeats, three aligned only to ESTs, and two featured regions of alignment to both. A further four sequences showed no alignments to either, although they did show homology to unannotated cereal DNA. The three ESTs which did not align to repetitive elements showed no significant alignments to characterised proteins; it is therefore possible that some of them correspond to further unidentified repeat elements.
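The counts in this paragraph can be cross-checked against the figures given at the start of the discussion. The tally below is a minimal sketch that uses only numbers quoted in the text; the variable names are illustrative, and it assumes that the ten lines described as aligning strongly to repeats aligned to repeats and not to ESTs.

```python
# Counts as quoted in the text (variable names are illustrative).
repeats_only = 10       # lines aligning strongly to repeats
ests_only = 3           # lines aligning only to ESTs
repeats_and_ests = 2    # lines with regions aligning to both
no_alignment = 4        # lines aligning to neither repeats nor ESTs

# Lines with any repeat or EST alignment.
lines_with_alignment = repeats_only + ests_only + repeats_and_ests

# All single-copy lines analysed.
total_lines = lines_with_alignment + no_alignment

# Lines whose flanking regions aligned wholly or partly to repeats.
lines_with_repeats = repeats_only + repeats_and_ests

repeat_percentage = 100 * lines_with_repeats / total_lines

print(f"Lines with repeat or EST alignments: {lines_with_alignment}")
print(f"Total lines: {total_lines}")
print(f"Lines aligning to repeats: {lines_with_repeats} ({repeat_percentage:.0f}%)")
```

The totals agree with the twelve of nineteen flanking regions (63%) reported above as aligning to repetitive elements.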
It is now possible to target transgenes to specific genomic locations [35
]. However, until such technology is used for all transgenic crop production, analysis of transgene flanking regions is required to fully understand the transgene genomic environment and to detect rearrangements such as those highlighted in this study.