Genetic Variation in Soybean at the Maturity Locus E4 Is Involved in Adaptation to Long Days at High Latitudes

Soybean (Glycine max) cultivars adapted to high latitudes have a weakened or absent sensitivity to photoperiod. The purposes of this study were to determine the molecular basis for photoperiod insensitivity in various soybean accessions, focusing on the sequence diversity of the E4 (GmphyA2) gene, which encodes a phytochrome A (phyA) protein, and its homoeolog (GmphyA1), and to disclose the evolutionary consequences of two phyA homoeologs after gene duplication. We detected four new single-base deletions in the exons of E4, all of which result in prematurely truncated proteins. A survey of 191 cultivated accessions sourced from various regions of East Asia with allele-specific molecular markers reliably determined that the accessions with dysfunctional alleles were limited to small geographical regions, suggesting the alleles' recent and independent origins from functional E4 alleles. Comparison of nucleotide diversity values revealed lower nucleotide diversity at non-synonymous sites in GmphyA1 than in E4, although both have accumulated mutations at almost the same rate in synonymous and non-coding regions. Natural mutations have repeatedly generated loss-of-function alleles at the E4 locus, and these have accumulated in local populations. The E4 locus is a key player in the adaptation of soybean to high-latitude environments under diverse cropping systems.


Introduction
Flowering time determines the adaptability of plant species to diverse environments. Molecular dissection of flowering behavior in Arabidopsis thaliana has revealed that the transition from vegetative to reproductive growth is under the control of a complicated network involving more than 60 genes [1]. Natural allelic variation has been surveyed to explore the molecular mechanisms underlying the adaptation of Arabidopsis to diverse environments, but has been identified in only a few of these flowering genes [2]. Among the genes, FRIGIDA (FRI) and FLOWERING LOCUS C (FLC), which are pivotal regulators of the vernalization pathway, each exhibit a high degree of functional polymorphism, which likely underlies the extensive natural variation in flowering time [3-7]. More than 25 independent loss-of-function alleles have so far been described at the FRI locus [2]. Most early-flowering spring annual ecotypes have evolved multiple times from late-flowering winter annual ancestors through independent loss-of-function mutations in one or both genes.
Natural variations in flowering time have been explored in another model plant species, rice (Oryza sativa). In contrast to Arabidopsis, rice is a short-day plant with no response to vernalization. The association of variation in flowering time with DNA polymorphisms of major flowering genes has been assayed in a core collection of 64 rice cultivars [8]. Flowering time in rice is closely correlated with expression levels of Hd3a, a rice ortholog of Arabidopsis FLOWERING LOCUS T (FT). Variability in the expression of Hd3a is due in part to sequence variations in the Hd3a promoter region, but is also likely to be affected by different alleles of Hd1, a rice ortholog of Arabidopsis CONSTANS (CO). Sequencing Hd1 in the core collection identified 17 haplotypes, 9 of which are nonfunctional owing to frame-shift and nonsense mutations; the presence of these mutations suggests that polymorphism in Hd1 is one of the main causes of the diversity of flowering time in rice [8]. The findings in Arabidopsis and rice thus suggest that different genes underlie the natural variation in the control of flowering in these two species, and that independently induced mutations at a few key loci have repeatedly contributed to the natural variation in flowering of the two species.
Soybean (Glycine max) is cultivated over a wide range of latitudes, from the equator to high latitudes of at least 50°N. However, each cultivar is restricted to a relatively narrow range of latitudes. The wide adaptability of soybean has thus been created by natural variation in a number of major genes and quantitative trait loci (QTL) that control flowering behavior. Soybean is a short-day plant, and flowering is induced when the day length is shorter than a critical length. This sensitivity to photoperiod is weak or absent in soybean cultivars adapted to high latitudes, which should initiate flowering under long-day (LD) conditions of early summer to mature within limited frost-free seasons. Four major maturity loci (E1, E3, E4, and E7) have so far been reported to be involved in the control of this insensitivity [9-15, reviewed in 16]. Recent molecular analyses have revealed that E3 and E4 encode the phytochrome A (phyA) proteins GmphyA3 and GmphyA2, respectively [17,18]; and that E1 encodes a protein that contains a putative bipartite nuclear localization signal and a region distantly related to a B3 domain, and controls time to flowering by suppressing the expression of two soybean orthologs of FT, GmFT2a and GmFT5a, under the regulation of E3 and E4 [19]. A phyA-regulated E1-GmFT pathway is thus a key determinant of the adaptation of soybean to long-daylength environments.
The genetic mechanism of the photoperiod insensitivity varies among cultivars [16]. Genetic analyses have revealed that soybean cultivars and landraces that are adapted to the cool summers of northern Japan possess recessive genotypes at the E3 and E4 loci, namely e3e3e4e4 [20,21]. Another group of photoperiod-insensitive cultivars are grown mainly as a short-season crop across Japan and the Korean peninsula. One of these cultivars, Sakamotowase, has the e3e3E4E4 genotype [20,21] and an allele at or a gene tightly linked to the E1 locus that controls the insensitivity in the presence of E4 [21]. Xia et al. [19] analyzed the E1 sequence of Sakamotowase, and found that it possessed a dysfunctional allele, e1-fs, that produced a truncated protein that was unable to suppress the function of GmFTs owing to a premature stop codon due to a frame shift caused by a single-base deletion. The photoperiod insensitivity of Sakamotowase is thus most likely controlled by a dysfunctional allele at the E1 locus under the genetic background of the e3e3E4E4 genotype. Accordingly, at least two genetic mechanisms are known so far to be involved in the insensitivity to photoperiod of soybean.
In addition to these two cultivar groups, various other landraces and cultivars that are adapted to LD conditions of high latitudes are also insensitive to photoperiod, but the genetic mechanisms involved are unknown. Here, we report that most of these cultivars possess independently induced dysfunctional alleles at the E4 locus. Our data suggest that independent mutations at this locus have contributed to the adaptation of soybean to LD conditions of high latitudes.

Classification of Photoperiod-Insensitive Soybean Accessions
The genetic variation underlying photoperiod insensitivity was surveyed in 27 accessions collected from various regions of East Asia (Figure 1A). When sown in late May, these accessions flowered in mid- to late July in Sapporo, Japan (43°06′N, 141°35′E), where the natural daylength including twilight reached a maximum of 16.5 h, and exhibited no marked delay in flowering in artificially induced LD conditions of 20 h generated by incandescent lamps. They were classified into three distinct groups (I-III), two singletons (IV, V), and a separate group (VI), by means of UPGMA cluster analysis of the combined data for 11 isozymes and 9 SSRs (Figure 1B). Similar results were obtained in the analyses using isozymes and SSRs separately (data not shown). Group I consisted mainly of the landraces from Hokkaido, Japan, and far-eastern Russia (accessions 1-8), including Miharudaizu, whose genotype at the E1, E3, and E4 loci was determined as E1E1e3e3e4e4 [20]. The Group I accessions from northern Japan (accessions 1-7) possessed the same genotype at all of the marker loci tested, although they differed in their time to flowering and their seed coat colors. Group II consisted of landraces that are grown as a short-season crop across Japan and the Korean peninsula (accessions 9-15), including Sakamotowase, which has the genotype e3e3E4E4 and a dysfunctional allele (e1-fs) at the E1 locus [19-21]. Group III consisted of landraces collected in northern Honshu, Japan (accessions 16-21). Tsukue-4 (group IV, accession 22) and Otomewase (group V, accession 23) together formed a loose clade. The accessions from northeastern China and far-eastern Russia (group VI, accessions 24-27) formed a loose clade that was separate from the other five groups. The genotypes at the E1, E3, and E4 maturity loci of the accessions in groups III to VI have not yet been determined.
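The UPGMA grouping described above can be illustrated with a minimal sketch. The genotype vectors below are hypothetical, and the simple-mismatch distance is an assumption, since the similarity coefficient used for the 20 marker loci is not stated here. The sketch only shows how average-linkage clustering merges accessions that share marker genotypes first.

```python
from itertools import combinations

def mismatch(a, b):
    """Fraction of marker loci at which two genotype vectors differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cluster_distance(members_a, members_b, genotypes):
    """UPGMA distance: mean of all pairwise distances between the members of two clusters."""
    total = sum(mismatch(genotypes[i], genotypes[j])
                for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))

def upgma(genotypes):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = {name: [name] for name in genotypes}  # label -> member accessions
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], genotypes),
        )
        height = cluster_distance(clusters[a], clusters[b], genotypes)
        merges.append((a, b, height))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical accessions scored at five hypothetical marker loci.
toy = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 1],  # identical to A at every locus
    "C": [1, 0, 0, 1, 1],
    "D": [0, 0, 1, 1, 0],
}
merges = upgma(toy)
for a, b, height in merges:
    print(f"merge {a} + {b} at distance {height:.3f}")
```

Accessions with identical marker genotypes, as reported for the Group I accessions from northern Japan, join at distance zero under this metric.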

Sequence and DNA Marker Analyses of E4
A crossing experiment between Miharudaizu (Group I) and Kamaishi-17 (Group III) exhibited no transgressive segregation in flowering time under the artificially induced LD conditions (our unpublished data). This suggests that Kamaishi-17 possesses the same genotype (e3e3e4e4) at the E3 and E4 loci as Miharudaizu. The dysfunction of the e4 allele in Miharudaizu is caused by an insertion of a Ty1/copia-like retrotransposon, SORE-1, in exon 1 [17]. An analysis with allele-specific markers, which detect the presence or absence of the insertion, however, revealed that Kamaishi-17 did not have the e4 allele in which SORE-1 had been inserted [22]. This prompted us to analyze the E4 (GmphyA2) sequences of Kamaishi-17 and the other photoperiod-insensitive accessions.
The sequence analysis revealed that Kamaishi-17 had a single-base deletion at position 3085 from the adenine of the start codon in exon 2 (Figure 2A). This deletion resulted in a frame shift that led to premature termination of translation, and the gene was thus predicted to produce a truncated protein of 894 amino acids (AA) in length (Figure 2B). The result obtained from the sequence analysis was thus in good agreement with our expectation from the crossing experiment, indicating that Kamaishi-17 possessed a dysfunctional e4 allele, as was the case in Miharudaizu. We then extended the sequence analysis to three other accessions, one selected from each of groups IV to VI (Tsukue-4, Otomewase, and Keshuang). Interestingly, all had single-base deletions at different sites in exon 1 (Otomewase) or exon 2 (Tsukue-4 and Keshuang), and these variants were predicted to produce truncated proteins of different lengths: 456 AA in Otomewase, 759 AA in Tsukue-4, and 979 AA in Keshuang (Figure 2A,B). The SORE-1-inserted e4 allele produced a truncated protein of 237 AA [17]. The predicted AA sequences produced in Kamaishi-17 and Keshuang lacked a histidine-kinase domain required for phosphorylation, but retained the two PAS domains (PAS1 and PAS2) that are important for downstream signaling, whereas the Otomewase variant lacked all three domains, and the Tsukue-4 variant lacked both the PAS2 and histidine-kinase domains. No other DNA polymorphism was detected in the sequences, other than these deletions, among the accessions we tested or between those and Williams 82, a cultivar that was used for whole-genome sequencing (Glyma20g22160). We designated these variant alleles after the names of cultivars: e4-oto in Otomewase, e4-tsu in Tsukue-4, e4-kam in Kamaishi-17, and e4-kes in Keshuang.
We then developed markers to reliably determine which alleles the remaining photoperiod-insensitive accessions possessed (Figure 3). The five other accessions from Group III had a PCR product with the same digestion pattern as Kamaishi-17: when digested by AflII, the amplified 494-bp product was separated into fragments of 286 and 208 bp. Similarly, the three accessions of Group VI from far-eastern Russia (Zeya-2, Oktyabr-70, and Severnaya-4) had a PCR product with the same digestion pattern as Keshuang: when digested by BspHI, the amplified 494-bp product was separated into fragments of 399 and 95 bp. The digestion patterns observed in e4-oto and e4-tsu were not detected in the rest of the collection of 27 accessions. Furthermore, the marker analyses for the four alleles and the e4 allele containing the SORE-1 insert revealed that the accessions in Group II all possessed the dominant E4 allele, like Sakamotowase, whereas those in Group I all possessed the e4 allele containing the SORE-1 insert, like Miharudaizu. Accordingly, all of the photoperiod-insensitive accessions except for the Group II accessions had different loss-of-function alleles due to single-base deletions or the insertion of SORE-1.
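The effect of a single-base deletion on the reading frame can be sketched as follows. The coding sequence is a hypothetical toy, not the E4 sequence, and the sketch ignores introns, so positions here are simply counted along the toy coding sequence from the A of the start codon. It only illustrates why a one-base deletion shifts the frame and can expose an early stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def protein_length(cds):
    """Number of amino acids encoded before the first in-frame stop codon."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return len(cds) // 3  # no stop codon found: count complete codons

def delete_base(cds, position):
    """Remove the base at a 1-based position counted from the A of the start codon."""
    return cds[:position - 1] + cds[position:]

# Hypothetical toy coding sequence: ATG GCA TTG AAC GGT TAA
toy_cds = "ATGGCATTGAACGGTTAA"
mutant_cds = delete_base(toy_cds, 5)  # frame now reads ATG GAT TGA ...

print(protein_length(toy_cds))     # full-length toy protein
print(protein_length(mutant_cds))  # truncated by the frame shift
```

In the toy case the deletion shortens the predicted protein from 5 to 2 amino acids, which mirrors in miniature how deletions at different sites in E4 yield truncated proteins of different lengths.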

Survey of Genetic Variation Using Allele-Specific DNA Markers
To determine the geographical distributions of the newly detected dysfunctional alleles at the E4 locus, we extended the marker analysis to a total of 164 cultivated soybean accessions sourced from East Asia (64 from China, 30 from Korea, and 70 from Japan; Supplemental Table S1), in addition to the 27 photoperiod-insensitive accessions. The digestion patterns at each of the four markers indicated that all of the accessions except for a landrace from northern Japan (Wasekeburi) possessed the dominant E4 allele. Wasekeburi possessed the e4-kam allele, which was distributed mainly in northern Honshu (Figure 1A). The other loss-of-function alleles were not detected in this collection. Therefore, these loss-of-function alleles appear to be rare in the cultivated soybean germplasm.

Comparison of Nucleotide Diversity between E4 and GmphyA1
Sequencing and DNA marker analyses revealed that photoperiod-insensitive accessions, except for those in Group II, possessed dysfunctional alleles at the E4 (GmphyA2) locus. The E4 gene possesses a homoeologous copy, GmphyA1 (Glyma10g28170) [17], owing to the paleopolyploid nature of the soybean genome [23]. This raises questions about whether there are dysfunctional mutations responsible for earlier flowering in GmphyA1, and about the function, if any, of GmphyA1. To answer these questions, we sequenced E4 and GmphyA1 in wild and cultivated accessions collected from various regions of East Asia (Supplemental Tables S2 and S3).
Characteristics of the DNA polymorphisms in the E4 and GmphyA1 regions are summarized in Table 1. Recently released sequence data for 31 wild and cultivated soybeans [24], excluding those with missing or obscure data, were also included in the nucleotide diversity analysis. The E4 region comprised a total of 6341 aligned base pairs; across this region, 44 sites were polymorphic, comprising 33 SNPs, 9 single- or multiple-base insertion-deletions (indels), and 2 SSRs (Supplemental Figure S1). In addition to the 4 single-base deletions and the insertion of SORE-1, 10 SNPs occurred in exons, of which 4 generated amino acid substitutions. On the other hand, the GmphyA1 region comprised a total of 5517 aligned base pairs; across this region, 20 sites were polymorphic. Of these, 7 were detected in exons, of which only 1 SNP caused an amino acid substitution (Supplemental Figure S2). The analysis of GmphyA1 included 9 photoperiod-insensitive accessions that were analyzed for the E4 sequences. No sequence variation causing a dysfunction of GmphyA1 was detected in the 9 accessions; all had the same amino acid sequence as Williams 82, a photoperiod-sensitive cultivar (Supplemental Table S3 and Supplemental Figure S2).
Two common measures of nucleotide diversity, Tajima's estimator of diversity (π) [25] and Watterson's estimator (θ) [26], were calculated for synonymous sites and non-coding regions (s) and for non-synonymous sites (a) (Table 1). All of the mutations except the SSRs were collectively considered to be SNPs and were subjected to nucleotide diversity analysis. For the synonymous sites and non-coding regions, the two homoeologs showed similar nucleotide diversities for all accessions combined: π = 1.35 × 10⁻³ for GmphyA1 and 1.12 × 10⁻³ for E4, and θ = 1.36 × 10⁻³ and 1.65 × 10⁻³, respectively. Accordingly, mutations appear to have accumulated at almost the same rate in the two homoeologs since gene duplication occurred. On the other hand, the nucleotide diversity values for non-synonymous sites in GmphyA1 (π = 0.03 × 10⁻³, θ = 0.16 × 10⁻³) were only 8% and 23%, respectively, of the corresponding values in E4 (π = 0.39 × 10⁻³, θ = 0.70 × 10⁻³). Comparison of nucleotide diversity in the cultivated and wild soybeans further produced different results between the two genes: GmphyA1 had similar diversity in the cultivated and wild soybeans in all diversity parameters, whereas E4 (GmphyA2) had lower diversity in the cultivated soybean than in the wild soybean in all diversity parameters; depending on the estimator, cultivated soybean retained only 6% to 33% of the diversity present in the wild soybean population at synonymous sites and non-coding regions (s), and 23% to 56% at non-synonymous sites (a).

Haplotype Networks
Minimum-span haplotype networks were constructed using all of the observed polymorphisms to determine the origins of the dysfunctional alleles and to elucidate the structure of the variations observed in the cultivated and wild soybeans (Figure 4; Supplemental Figures S1 and S2). The haplotype network for the E4 region consisted of 17 haplotypes, including five non-functional alleles (the four alleles detected in this study and the allele containing SORE-1 [17]), with 2 putative unmapped recombinants. All of the loss-of-function alleles appear to have derived from haplotype 14. Wild soybeans possessed 12 haplotypes that were not found in cultivated soybeans. Only haplotype 14 was common to both wild and cultivated soybeans. All of the cultivated accessions that we tested except for a Chinese one (in haplotype 5) possessed haplotype 14 or non-functional alleles that were derived from the former haplotype. On the other hand, the haplotype network for the GmphyA1 region consisted of 13 haplotypes with 2 putative unmapped recombinants. Of these, 9 were specific to wild soybeans, 3 (haplotypes 3, 4, and 12) were common to both wild and cultivated soybeans, and 3 (haplotypes 1, 13, and 15) were specific to cultivated soybeans. The haplotypes for the GmphyA1 region that were observed in the cultivated soybeans were divided into two clusters (haplotypes 1 to 11 and haplotypes 12 and 13) that differed by at least 7 SNPs. The existence of such distantly related haplotypes in cultivated soybeans resulted in a higher nucleotide diversity in GmphyA1 than in wild soybeans (Table 1). In contrast, the reduction of nucleotide diversity in E4 in cultivated soybeans (Table 1) may be attributable to the predominant distribution of haplotype 14.

Figure 4. Haplotype networks for two phytochrome A homoeologs, GmphyA1 and E4 (GmphyA2). Light blue circles, haplotypes specific to wild soybean (WS); orange, those common to both wild and cultivated soybeans; green, those specific to cultivated soybean (CS). Black arrows, insertions; white arrows, deletions. Black dots represent differences (e.g., SNPs) separating the haplotypes. The number of accessions that belong to each haplotype is given in parentheses.

Discussion
Sequencing and DNA marker analyses of E4 revealed that all of the photoperiod-insensitive accessions analyzed, except for those in Group II (Figure 1), have a dysfunctional allele at the E4 locus. We detected four new, independent dysfunctional alleles (e4-oto, e4-tsu, e4-kam, and e4-kes), all of which exhibited single-base deletions in the first or second exon. These deletions generated premature stop codons as a result of a frame shift, resulting in truncated proteins of different lengths. It is therefore likely that the loss of function in the E4 gene played an important role in the evolution of insensitivity to photoperiod in early-flowering, photoperiod-insensitive accessions that are adapted to high latitudes. Like the natural variations observed in FLC and FRI in Arabidopsis [3-7] and in Hd1 in rice [8], natural mutations have repeatedly generated loss-of-function alleles at the E4 locus in soybean. The survey of 191 cultivated soybean accessions collected from various regions of East Asia further revealed that the 4 dysfunctional alleles were limited to relatively small geographical regions, as was the case in the e4 allele containing the SORE-1 insert, which was detected only in northern Japan out of 332 cultivated and 85 wild soybean accessions that were surveyed [22]. The loss-of-function alleles at the E4 locus may therefore have originated relatively recently and independently in different soybean landraces that possess the functional E4 allele of haplotype 14. Mutations leading to early flowering and the resultant early maturity would have permitted the use of diverse cropping systems and would consequently have extended the soybean production season. Under human selection, the loss-of-function alleles at the E4 locus may have accumulated multiple times in local populations. The E4 locus is therefore a key player in the adaptation of soybean to LD conditions of high latitudes and diverse cropping systems.
Soybean is a paleopolyploid species with a complex genome, which is estimated to have become duplicated both 59 and 13 million years ago [23, reviewed in 27]. Approximately 75% of the genes are present as multiple copies, some of which have diverged in their functions, as suggested by different expression patterns between homoeologs [28,29]. phyA is one such example, and consists of 2 sets of homoeologous partners, GmphyA1/GmphyA2 (E4) and GmphyA3 (E3)/GmphyA4 [16-18]. The presence of multiple copies of soybean phyA contrasts sharply with other legume species such as pea (Pisum sativum), Medicago truncatula, and Lotus japonicus, which all possess a single phyA gene [30]. Of the four copies, GmphyA2 and GmphyA3 correspond to the soybean maturity genes E4 and E3, respectively; however, neither a major gene nor a QTL controlling flowering time has so far been reported near the genomic positions of GmphyA1 and GmphyA4. In particular, GmphyA4 is most likely dysfunctional in Williams 82, a cultivar that has been used for whole-genome sequencing, because of a deletion in the third exon [16,18].
E3 and E4 were originally identified by different responses of flowering to LD conditions induced by light with a high red (R) to far-red (FR) quantum ratio generated by R-enriched fluorescent lamps and by light with a low R:FR ratio generated by FR-enriched incandescent lamps [9-12]. E3 controls flowering under LD conditions with a high R:FR ratio; e3e3 recessive homozygous plants can initiate flowering under these conditions [9]. E4 is involved in flowering under LD conditions with a low R:FR ratio; a recessive e4 allele is necessary for plants homozygous for the e3 allele to flower under these conditions [10-13]. Both genes thus control flowering under LD conditions with a wide range of R:FR ratios, but in a non-additive manner. phyA is an FR sensor that is involved, directly or via interactions with other photoreceptors, in various developmental processes [31]. It also acts as a red-light photoreceptor, particularly under R light with a high photon irradiance; in Arabidopsis, quadruple-null mutants for the phytochrome family (phyBphyCphyDphyE) that only contain functional phyA were able to respond to the R-mediated de-etiolation of seedlings and survive until flowering under continuous R light with a high photon irradiance [32,33]. The different responses of E3 and E4 to LD conditions with different light qualities may therefore indicate that the two genes participate in different aspects of phyA functions.
On the other hand, the function of GmphyA1, a homoeolog of E4, remains undetermined, because no genetic variants producing any phenotypic differences have been available at this locus. However, two findings suggest that like E4, GmphyA1 is also involved in both the de-etiolation response and flowering under FR-enriched LD conditions [13,17]. First, the e4 allele partially impaired the de-etiolation response to continuous FR light [17]. This is in sharp contrast to the phyA null mutants of Arabidopsis, pea, and rice, which show a complete loss of the de-etiolation response under continuous FR light [34-37]. E3 is not involved in the de-etiolation responses under either continuous R or FR light, suggesting that the redundancy in the de-etiolation response of the e4 allele may be attributable to GmphyA1 [17]. Second, when combined with a dominant allele at the E1 locus, a double-recessive genotype for the E3 and E4 loci retains the photoperiod sensitivity, particularly to LD conditions with a low R:FR ratio (<1.0), although it is insensitive to LD conditions with a relatively high R:FR ratio (1.0-5.0) [13]. These findings suggest that the homoeolog of E4, GmphyA1, itself functions redundantly with E4 in both de-etiolation responses and photoperiod responses under FR-enriched light.
The results obtained from our sequencing analyses of a diverse collection that included both wild and cultivated accessions introduced mainly from various regions of East Asia reveal that E4 and GmphyA1 exhibit almost the same nucleotide diversities at synonymous sites and in non-coding regions among all accessions combined, suggesting that the two phyA genes have accumulated mutations at almost the same rate since gene duplication. However, the nucleotide diversity at non-synonymous sites, as a whole, was lower in GmphyA1 than in E4. In particular, the dysfunctional mutations were concentrated in only E4, despite their predicted redundant functions in both photoperiod sensitivity and the de-etiolation response [13,17]. The low diversity in non-synonymous sites at GmphyA1 may therefore indicate that there are some differences in phyA functions between the two homoeologs, and that GmphyA1 might have been more amenable than E4 to purifying selection. Further understanding of the function of GmphyA1 will be needed before we can explain why the mutations are concentrated in only one of the two homoeologs.
Nucleotide diversity in a homoeologous gene pair has also been evaluated in the soybean orthologs of Arabidopsis TERMINAL FLOWER 1 (TFL1), a gene involved in the phase transition in the shoot apical meristem (SAM) [38]. The soybean TFL1 ortholog consists of two homoeologs, GmTFL1a and GmTFL1b, the latter of which is the determinate growth habit gene Dt1 [38,39]. The two homoeologs are expressed differently: GmTFL1b is expressed mainly in the vegetative SAM and the roots, whereas GmTFL1a is expressed mainly in the stem tip after flowering and in the immature cotyledons [39]. Arabidopsis TFL1 is highly expressed in the shoot apex and roots and weakly in the seeds and siliques (Arabidopsis eFP Browser [40,41]). Therefore, the different expression profiles of GmTFL1a and GmTFL1b may reflect the subfunctionalization of the Arabidopsis TFL1 gene. Tian et al. [38] found that at least four allelic variants at the Dt1 locus in the cultivated soybean population caused stem termination in the SAM as a result of single amino acid substitutions, whereas no non-synonymous mutation in GmTFL1a was detected in either wild or cultivated soybeans. Subfunctionalization following duplication of the multifunctional ancestral gene may have enabled one of the homoeologs to accumulate functional mutations under human selection without any constraints imposed by the other functions of the ancestral gene. The asymmetrical accumulation of dysfunctional mutations observed in the maturity gene E4 and its homoeolog may therefore reflect their subfunctionalization as well.

... and sequenced with an ABI PRISM 3100 Avant Genetic Analyzer using a BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems Japan, Tokyo, Japan). The sequences for novel dysfunctional alleles were further confirmed by cloning with the pGEM-T Easy vector system (Promega K. K. Japan, Tokyo, Japan), followed by sequencing as described above.

Analysis of the Distribution of Loss-of-Function Alleles Using DNA Markers
Allele-specific DNA markers were developed from sequences flanking the mutation sites. We used cleaved amplified polymorphic sequence (CAPS) markers and derived CAPS (dCAPS) markers. The targeted region for each mutation was amplified from the DNA preparations using ExTaq polymerase with primers specific to each mutation. The primers used were 5′-CCCAGACACTCTTGTGTGAT-3′ and 5′-CCATACTCTCGGTATCTTTG-3′ for e4-oto, 5′-CACCCTAGGAGTTGTGTTGTT-3′ and 5′-GCGGTTCTGTACAATTGCCTGATA-3′ for e4-tsu, and 5′-CTTAATAAAGCCATGACTGGTTTG-3′ and 5′-CTTGAGTTTCAATGAGGTTTCAAC-3′ for e4-kam and e4-kes. A marker analysis for the e4 allele containing inserted SORE-1 to detect amplification products of different lengths was carried out as described in [17], using a common forward primer, 5′-AGACGTAGTGCTAGGGCTAT-3′, and allele-specific reverse primers, 5′-GCATCTCGCATCACCAGATCA-3′ for E4 and 5′-GCTCATCCCTTCGAATTCAG-3′ for e4. The PCR products were digested with appropriate restriction enzymes for all of the alleles except for the SORE-1-inserted e4 (SacI for e4-oto, EcoRV for e4-tsu, AflII for e4-kam, and BspHI for e4-kes). The PCR products or digestion products were separated by electrophoresis in 0.8% or 3% agarose gel, and visualized under UV light.
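Scoring of the digested products can be sketched as a simple lookup. The fragment sizes are those reported in the Results for the 494-bp product (AflII for e4-kam, BspHI for e4-kes); sizes for the SacI and EcoRV markers are not given, so they are omitted. The function names and the sizing tolerance are hypothetical, and treating an undigested product as "allele absent" is an assumption based on the digestion pattern being described as shared with the mutant accessions.

```python
PRODUCT_BP = 494  # amplified product size reported for the e4-kam and e4-kes markers

# Fragment sizes after digestion, as reported in the text.
MARKERS = {
    "e4-kam": {"enzyme": "AflII", "cut_fragments": (286, 208)},
    "e4-kes": {"enzyme": "BspHI", "cut_fragments": (399, 95)},
}

def matches(observed_bp, expected_bp, tolerance_bp):
    """True if the observed band sizes match the expected ones within a tolerance."""
    observed, expected = sorted(observed_bp), sorted(expected_bp)
    return len(observed) == len(expected) and all(
        abs(o - e) <= tolerance_bp for o, e in zip(observed, expected)
    )

def call_allele(marker, observed_bp, tolerance_bp=10):
    """Classify one gel lane for one marker (tolerance_bp is a hypothetical sizing error)."""
    if matches(observed_bp, MARKERS[marker]["cut_fragments"], tolerance_bp):
        return marker                 # digested: pattern of the named e4 allele
    if matches(observed_bp, (PRODUCT_BP,), tolerance_bp):
        return "not " + marker        # undigested product (assumed: allele absent)
    return "ambiguous"                # unexpected band pattern; re-run the lane

print(call_allele("e4-kam", [290, 205]))
print(call_allele("e4-kes", [494]))
```

A useful sanity check on any such table is that the digested fragments sum to the product length, which holds for both markers here (286 + 208 = 399 + 95 = 494).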

Statistical Analyses
Sequence alignment for GmphyA1 and GmphyA2 was done using the CLUSTALW algorithm [46]. Sequence variability was estimated using the DnaSP software (v.5.0) [47]. Using this software, we calculated the number of segregating sites (S), the number of haplotypes (Hap), Tajima's estimator of diversity (π) [25], and Watterson's estimator (θ) [26] for synonymous sites and non-coding regions (s) and for non-synonymous sites (a). Haplotype networks were constructed from informative DNA polymorphisms, and then adjusted for haplotype-specific polymorphisms.
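For readers unfamiliar with these summary statistics, the following minimal sketch computes S, Hap, π, and θ per site from a toy alignment using the standard textbook definitions: π is the mean number of pairwise differences per site, and θ = S / (a_n · L) with a_n = 1 + 1/2 + ... + 1/(n-1) for n sequences of length L. The sequences are hypothetical. The sketch is not a substitute for DnaSP: it does not handle gaps or missing data, and it does not partition sites into synonymous and non-synonymous classes.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns with more than one nucleotide state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def n_haplotypes(seqs):
    """Hap: number of distinct sequences in the sample."""
    return len(set(seqs))

def pi_per_site(seqs):
    """Mean number of pairwise differences, divided by alignment length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])

def watterson_theta_per_site(seqs):
    """S / a_n, divided by alignment length, with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n / len(seqs[0])

# Hypothetical aligned sequences of equal length, without gaps.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]

print("S     =", segregating_sites(toy_alignment))
print("Hap   =", n_haplotypes(toy_alignment))
print("pi    =", pi_per_site(toy_alignment))
print("theta =", watterson_theta_per_site(toy_alignment))
```

Because π weights each variant by its frequency whereas θ depends only on the count of segregating sites, the two estimators can differ for the same data, which is why both are reported.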

Conclusions
We detected four novel dysfunctional alleles at the E4 locus in early-maturing, photoperiod-insensitive soybean accessions from various geographical origins in East Asia. These alleles have accumulated independently and repeatedly in local populations of northern Japan and northeastern China. The E4 locus may therefore be a key player in the adaptation of soybean to LD conditions of high latitudes and diverse cropping systems. The allele-specific markers developed in this study will be useful tools to assess the genotypes and facilitate marker-assisted selection in breeding of cultivars adapted to higher latitudes. Comparison of the two phyA sequences may provide insights into why the mutations have accumulated only in E4, and not in its homoeolog GmphyA1. The lower nucleotide diversity at non-synonymous sites in GmphyA1 relative to E4 suggests an unknown functional divergence between the two homoeologs despite their redundant functions in photoperiodic

Figure 1. Photoperiod-insensitive early-maturing soybean accessions analyzed in this study. (A) Geographical distribution of the photoperiod-insensitive accessions from East Asia; (B) Classification of the photoperiod-insensitive accessions by means of UPGMA, based on the similarity of 20 polymorphic isozyme and SSR marker loci.

Figure 2. Independent single-nucleotide deletions in the E4 gene that result in premature stop codons that produce dysfunctional truncated proteins. (A) Sites of deletions and of the insertion of a Ty1/copia-like retrotransposon, SORE-1, in the GmphyA2 exons. The deleted nucleotides are presented in parentheses; (B) Premature stop codons (*) generated by the deletions or by the insertion of SORE-1.

Table 1. DNA polymorphisms in two homoeologous phytochrome A genes, GmphyA1 and E4 (GmphyA2), in cultivated and wild soybeans.