Accelerating Genetic Gain in Sugarcane Breeding Using Genomic Selection

Sugarcane is a major industrial crop cultivated in tropical and subtropical regions of the world. It is the primary source of sugar worldwide, accounting for more than 70% of world sugar consumption. Additionally, sugarcane is emerging as a source of sustainable bioenergy. However, the increase in productivity from sugarcane has been small compared to other major crops, and the rate of genetic gains from current breeding programs tends to be plateauing. In this review, some of the main contributors for the relatively slow rates of genetic gain are discussed, including (i) breeding cycle length and (ii) low narrow-sense heritability for major commercial traits, possibly reflecting strong non-additive genetic effects involved in quantitative trait expression. A general overview of genomic selection (GS), a modern breeding tool that has been very successfully applied in animal and plant breeding, is given. This review discusses key elements of GS and its potential to significantly increase the rate of genetic gain in sugarcane, mainly by (i) reducing the breeding cycle length, (ii) increasing the prediction accuracy for clonal performance, and (iii) increasing the accuracy of breeding values for parent selection. GS approaches that can accurately capture non-additive genetic effects and potentially improve the accuracy of genomic estimated breeding values are particularly promising for the adoption of GS in sugarcane breeding. Finally, different strategies for the efficient incorporation of GS in a practical sugarcane breeding context are presented. These proposed strategies hold the potential to substantially increase the rate of genetic gain in future sugarcane breeding.


The Commercial Importance of Sugarcane and Production Trends
Sugarcane (Saccharum spp., Poaceae) is a perennial C4 [1,2] grass that is grown commercially in tropical and subtropical production regions worldwide [3]. Sugarcane is an industrial crop and is one of the oldest cultivated plants in the world. Sugarcane accounts for more than 70% of the total sugar produced globally, mostly consumed as refined sugar. Recently, sugarcane has received attention as an energy crop [4]; in many countries, including Australia [5] and Brazil [6], bagasse (the fibrous part remaining after juice extraction) is burnt by sugar mills to produce electricity to power the mills' operations. Among C4 plants, sugarcane is highly efficient in solar energy conversion and accumulates the highest [...]
There were no substantial gains in cane yield in the top sugarcane producing countries in the last two decades (Figure 1). Several countries have been facing yield plateaus, and there are several different potential explanations for these production trends. Pests and diseases, the production potential of the land being used, and climatic conditions are all likely to contribute substantially to the observed reduced yield increases.
Agronomy 2020, 10, 585
The occurrence of new diseases and pests could cause increased losses. Continuing monoculture cropping can build up soil pathogens and nematode pressure, which might be partly responsible for a lack of sugarcane yield increase worldwide [14]. Additionally, diseases have been observed to substantially impact sugarcane yield. Ratoon stunting disease (RSD) is one of the most economically important sugarcane diseases worldwide. Reported yield losses due to RSD are 15-50% in irrigated and rainfed trials in South Africa [15] and 29% in Fiji [16]. RSD primarily affects yield, while key quality characteristics like sugar content are only minimally affected. In 2000, a relatively new pathogenic race of orange rust destroyed the high yielding sugarcane variety Q124, which accounted for approximately 45% of the crop in Australia. It amounted to a loss of nearly $200M for the Australian sugarcane industry [17]. In this case, a considerable reduction in sugar content was also reported. Another major disease that affects sugarcane crops worldwide is sugarcane smut, which can have devastating impacts on yield. The estimated average potential loss due to sugarcane smut in the Herbert region in Australia was 26% [18]. Nearly 70% of the Australian sugarcane cultivars were susceptible to smut before 1998 [19]; sugarcane smut resistance is now one of the primary breeding objectives for Australian sugarcane. There was a significant increase in smut-resistance crosses in Australian breeding programs from 0.4 to 52% between 2000 and 2007 [20], nearly doubling the smut-resistant clones by the end of 2011 [21]. Many successful smut-resistant varieties are now bred in sugarcane breeding programs worldwide.
The expansion of the sugarcane industry onto marginal land could be another possible reason that yield per hectare has plateaued. Regions that require significantly more inputs, such as irrigation and fertilizers, and that incur high transportation costs are now used to grow sugarcane. The adoption of mechanical harvesting in some countries and long-term degradation of soil fertility associated with cultivation might also have limiting effects on the productivity trends [3].
Extreme weather can also have significant impacts on sugarcane yield. In Fiji, favorable growing conditions in 1994 resulted in 5.2M tons of national production. In subsequent years, sugarcane production was reported to be reduced by half in the same region because of extreme climatic fluctuation [22]. Similar observations were reported in China in 2003-2004, where drought decreased average cane yields by around 18% [23]. However, as there is no evidence that these negative impacts have increased over the periods of low productivity improvement, the impact of environment-management is not sufficient to explain the continuous slow rate of improvement in sugarcane yield over time.
In addition to improving management practices, the genetic improvement of modern cultivars is a main avenue to enhance productivity in sugarcane. To overcome static yield trends, intensified breeding efforts are needed to develop new, improved varieties. However, there are several factors inherent to sugarcane biology, management and breeding practices that impose difficulties on the realization of genetic improvement through breeding.

Development of Modern Cultivars and Inherent Challenges
Sugarcane (S. officinarum) has been cultivated in India, China, and Papua New Guinea for sugar production for 10,000 years. The first sugarcane breeding programs were established in Java and Barbados in the late 1800s after the discovery that sugarcane can produce viable seeds [3,24]. Until the first quarter of the 20th century, sugarcane varieties used in industrial-scale production of sugar were S. officinarum clones, also known as noble canes, originating from New Guinea. It is reported that S. officinarum was domesticated from wild S. robustum in New Guinea around 8,000 years ago [3]. Unlike S. officinarum, Indian cane (S. barberi) and Chinese cane (S. sinense) are derived from interspecific hybridization between octoploid S. officinarum (2n = 80) and S. spontaneum, which has varying ploidy levels (2n = 40-128) [25].
Historically, S. officinarum species had good commercial milling characteristics such as high sugar content, low impurity levels, and low fiber. However, this species lacked vigor, ratooning performance, and was susceptible to several diseases [24]. S. spontaneum is a genetically diverse wild species that is characterized by a lower commercial merit than S. officinarum, because of thin stalks and low sucrose content. Conversely, compared to S. officinarum, S. spontaneum has an increased ratooning capacity, a higher fiber level, and an overall superior adaptive capacity, characterized by an ability to perform better in unfavorable environmental conditions, such as drought, flood, or high salinity [26].
The genetic improvement of sugarcane can be divided into three main phases [27]. The first phase began with screening and intercrossing among S. officinarum clones. The major limitation of this approach was that noble canes, and hence progeny created from intercrossing, were susceptible to biotic and abiotic stresses. This led to the second phase, which involved the development of cultivars derived from interspecific hybridization between S. officinarum and S. spontaneum, and continuous backcrossing efforts with S. officinarum clones. Interspecific hybrids between S. officinarum and S. spontaneum were able to combine a high cane yield potential with increased disease resistance and improved ratooning ability.
An example is the cultivar "POJ2878," which led to a significant increment in productivity [28]. Many commercial cultivars used around the world today can be traced back to this cultivar [29]. In the third phase of modern genetic improvement of sugarcane, interspecific hybrids that were created in phase two were intensively exploited, through intercrossing among selected hybrids and recurrent selection among newly created progeny. This practice initially led to significant increases in genetic gain and still represents the main breeding strategy today.
Improved sugarcane varieties have played a pivotal role in the development of sugar industries throughout the world. Hawaii's sugar yield changed significantly from 1915 to 2003 through the continuous updating of sugarcane varieties. At the end of 2003, annual sugarcane production was around 15 t/ha in Hawaii. Approximately 50% of Hawaii's sugar yield gains resulted from the genetic improvement of varieties [30]. The sugar yield in Colombia increased from 5 t sugar/ha-year at the end of the 1950s to 8 t sugar/ha-year in the 1970s and recorded 12 t sugar/ha-year at the end of 2000 [31]. Sugarcane production in Brazil and India increased throughout the same period and reached nearly 64-70 t/ha by the end of 2000. Results of a long-term study investigating productivity trends from 1968 to 2000 in Florida demonstrated significant improvements in cane and sucrose yield across the plant-cane, first-ratoon, and second-ratoon crops. The positive impacts of genetic gain increases on Florida's sugarcane industry played a significant role in the country's economy across those years [32].
However, the observed increases in sucrose yield for the most recent varieties in Florida (unpublished data from a 2011 study) were associated with an increase in total cane yield, rather than improvements in CCS [13]. Similar results were reported from three small scale studies conducted in Australia where no significant differences for CCS could be found between older and new varieties [33]. Thus, genetic gain for key traits, particularly sucrose content and, to some extent, cane yield, has been stagnating in the past ten years in some countries. Conversely, genetic improvements for disease resistance achieved through traditional breeding programs have been very substantial.
On a global scale, most modern sugarcane cultivars are the product of only a few interspecific crosses between approximately 15-20 genotypes that can be traced back to ancestral sugarcane clones developed in Java and India [27]. In modern breeding programs, relatively old genetic material (>50 years old) is still widely used in crossing designs to create new varieties [34]. Thus, there have been few opportunities (~7-9 breeding generations) for chromosome recombination from the original founders. One consequence of the foundation bottleneck is strong genome-wide linkage disequilibrium (LD) patterns observed in elite germplasm [35] and a narrow genetic base in modern sugarcane germplasm [36].
Commercial hybrids originate from the initial hybrid (S. officinarum × S. spontaneum), which would have 2n transmission from the S. officinarum parent and n transmission from the S. spontaneum [37,38]. The hybrid is then crossed back to other hybrids to recover the high sugar phenotype, which breaks down the hybrid into n + n transmission [38]. Because of the narrow genetic base of important traits, genetic diversity could be reintroduced in sugarcane by utilizing the potential of wild relatives that are considered reservoirs of potentially useful alleles for important economic traits that might have been lost during domestication and breeding. Such practices of continual introgression of wild material into commercial breeding programs are used intensively in some breeding programs, e.g., in Louisiana.
New commercial hybrid cultivars have a complicated chromosome set, ranging between 2n = 100−130; 80% of the chromosomes are of S. officinarum origin, 10-15% of the chromosomes are of S. spontaneum origin, and the rest of the chromosomes are a combination of the two species [39][40][41][42][43]. Eight to 14 homo(eo)logous copies of alleles at a given locus in the hybrid genome are reported in the literature [44,45]. While the haploid genome of sugarcane is estimated at 1 Gb, the total size of the sugarcane nuclear genome is approximately 10 Gb [46,47], making it ten times larger than that of sorghum, the most closely related species with a sequenced genome [48].
The extreme polyploid genome of interspecific hybrids possesses irregular genetic characteristics that are passed from both parental species, making it more complicated than that of its precursors [40]. This phenomenon contributes substantially to the high level of heterozygosity observed between sugarcane cultivars [49]. Because of the random sorting of chromosomes in each crossing, the number of chromosomes varies between genotypes. The complex genetic composition of modern hybrids, which are referred to as poly-aneuploids, also results in inherent polygenic control of important agronomic traits. This complex genetic structure potentially makes the selection procedure slower and more complicated than in other major crop species.

Identifying and Overcoming Bottlenecks in Breeding Programs Using the Breeder's Equation
The overarching objective of any breeding program is to create new germplasm with improved genetic merit. The rate at which this improvement is realized in a given timeframe is referred to as "genetic gain." Increasing genetic gain in crop breeding has been identified as one of the key steps towards meeting the increasing future demand for plant-based products. In the context of the breeder's equation (Equation (1)), genetic gain ($\Delta G$) [50] can be understood as the improvement in the mean genetic value of a trait of interest for a population over a defined time period, e.g., one breeding cycle. Following this equation, the expected rate of genetic gain that can be achieved in a given breeding cycle can be calculated as
$$\Delta G = \frac{i \, h^2 \, \sigma_P}{L} \tag{1}$$
where $\Delta G$ is the rate of genetic gain, $i$ represents the selection intensity, $h^2$ represents the narrow-sense heritability of the desired trait, $\sigma_P$ is the observed phenotypic variation (expressed as the phenotypic standard deviation), and $L$ is the total interval in time units to complete one cycle of selection. Selection intensity is related to the proportion of individuals selected as parents of the next breeding cycle, usually expressed in standard deviation units from the mean (assuming that most phenotypes are normally distributed). The narrow-sense heritability is a measure of the heritable (additive) genetic variation in the population relative to the total observed phenotypic variance ($\sigma_P^2$) in the population. This equation shows that increasing the heritability, selection intensity, or phenotypic variation increases the rate of genetic gain, and that decreasing the breeding cycle length has the same effect. The breeder's equation (Equation (1)) provides a useful quantitative framework for the identification of potential bottlenecks in breeding programs that limit the rate of genetic gain, and for developing strategies to address these bottlenecks to accelerate gain in optimized breeding schemes.
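As a rough numerical illustration of the breeder's equation, the following minimal sketch evaluates $\Delta G = i h^2 \sigma_P / L$ for two cycle lengths. The function name and all input values are hypothetical and chosen only for illustration; they are not estimates from any sugarcane program.

```python
def genetic_gain_per_year(i, h2, sigma_p, cycle_years):
    """Expected genetic gain per year from the breeder's equation.

    i           : selection intensity (standard deviation units)
    h2          : narrow-sense heritability (0..1)
    sigma_p     : phenotypic standard deviation (trait units)
    cycle_years : length of one breeding cycle in years
    """
    if cycle_years <= 0:
        raise ValueError("cycle length must be positive")
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return i * h2 * sigma_p / cycle_years


# Hypothetical inputs (assumptions, not data from the review).
baseline = genetic_gain_per_year(i=1.755, h2=0.3, sigma_p=10.0, cycle_years=10)
shorter = genetic_gain_per_year(i=1.755, h2=0.3, sigma_p=10.0, cycle_years=5)

print(f"gain per year with a 10-year cycle: {baseline:.3f}")
print(f"gain per year with a 5-year cycle:  {shorter:.3f}")
```

With all other terms held constant, halving the cycle length doubles the expected gain per year, which is why cycle length is treated as a major bottleneck in the sections that follow.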

Practices and Limitations of Conventional Sugarcane Breeding
A typical sugarcane breeding scheme (Figure 2) follows four key steps: (i) the generation of a large progeny population from targeted crosses, (ii) the evaluation of those progeny in different phenotyping stages, (iii) the selection of clones with superior characteristics, and (iv) the recombination of selected clones to initiate the next breeding cycle [51].
To initiate a breeding cycle, parental clones are selected from a source population that has been characterized for key commercial and agronomic traits, such as tonnes of cane per hectare (TCH), sugar content measured as commercial cane sugar (CCS), fiber content, and resistance to important diseases in the target environments. These parental clones may be sourced from intra- or international breeding programs. The selected clones are crossed to create a large number of seedlings that are tested as families and then clonally propagated and selected throughout the remaining phenotypic testing phases [51] (Figure 2). Finally, the clones used as crossing parents are assessed based on the performance of their progeny. If the progeny perform relatively well, the corresponding crossing parents and the cross will be identified as "proven parents" and "proven cross," respectively, and may be used repeatedly to produce thousands of offspring that undergo selection.
Figure 2. The conventional sugarcane breeding scheme. Adapted from [51]. A large number of progeny are generated from the targeted crosses and evaluated in three different stages of selection; the best clones identified in the advanced stage of selection are intermated to initiate the new breeding cycle. PAT = progeny assessment trial; CAT = clonal assessment trial; FAT = final assessment trial.
For clonal improvement, the selection procedure depends on the crop-cycle length and number of ratoon cycles, which typically varies among breeding stations. In Australia, the conventional breeding program involves three stages that include seven years of selection, and three years of propagation [52]. At each stage, the top 5-10% of clones are progressed to the next stage of selection, and finally, a cultivar is released. In the process of releasing a variety, breeders test selected candidates in replicated multi-location trials to screen elite clones with high agronomic performances across a range of different environments. Because of the biology and management of sugarcane and the extensive phenotypic testing system, it can take more than ten years to complete a breeding cycle and even longer to commercially release a new cultivar.
In the context of increasing genetic gain using the breeder's equation framework (Equation (1)), it is widely reported that significant favorable genetic variation exists among the clones of Saccharum species. Since it is the additive genetic variance that selection acts on, this could potentially improve genetic gain in sugarcane. Increasing the selection intensity could also potentially lead to an improvement in genetic gain. However, simulation studies have shown that increasing the selection intensity can diminish the long-term selection response, decrease genetic diversity over time and increase inbreeding [53,54].
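To make the link between the proportion of clones advanced and the selection intensity concrete, the sketch below converts a selected proportion into $i$. It assumes truncation selection on a normally distributed trait in a large population, which is a textbook simplification rather than a description of any particular program; the function name is hypothetical.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Selection intensity i for truncation selection on a standard normal trait.

    i = phi(z) / p, where z is the truncation point that leaves a
    fraction p of the population above it and phi is the normal density.
    """
    p = proportion_selected
    if not 0.0 < p < 1.0:
        raise ValueError("proportion selected must lie strictly between 0 and 1")
    std_normal = NormalDist()          # mean 0, standard deviation 1
    truncation_point = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(truncation_point) / p


# Proportions matching the "top 5-10%" advanced at each stage.
for p in (0.10, 0.05):
    print(f"p = {p:.2f} -> i = {selection_intensity(p):.3f}")
```

Under these assumptions, advancing fewer clones raises $i$, but with diminishing returns, and it does so at the cost of the diversity and inbreeding effects noted above.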
Along with long breeding cycles and large proportions of non-additive genetic variance for key traits, breeders typically deal with other practical problems, for instance, the synchronization of flowering, which has been the focus of many studies [60][61][62]. Other factors, such as insufficient replication of new breeding materials in early generation trials, experimental errors, competition between adjacent plots [63,64], and G × E interaction effects [52,65], can also negatively affect the selection response for the target traits.
The most important traits in sugarcane are under quantitative genetic control, meaning that they are controlled by multiple genes along with environmental effects [66]. G × E interaction is an important source of phenotypic variation in sugarcane, especially for CCS and fiber content. G × E is difficult to account for in a breeding program, and therefore G × E interactions can reduce the rate of genetic improvement in sugarcane [52,67,68]. In the breeder's equation framework, this is due to the negative impact of G × E on the trait heritability. The genetic variance of a given trait can be biased by the variation caused by G × E interaction effects. Improved estimates of genetic variance can be obtained by partitioning the variation of G × E effects from the genetic variance [69].
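The effect of G × E on heritability can be sketched with a commonly used entry-mean formulation for balanced multi-environment trials, in which the G × E variance is treated as part of the noise around a genotype's mean performance. This formulation and the variance components below are assumptions introduced for illustration; they are not taken from the studies cited above, and the quantity computed is a broad-sense heritability of genotype means.

```python
def entry_mean_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Heritability of genotype means across a balanced set of trials.

    var_g     : genetic variance among genotypes
    var_ge    : genotype-by-environment interaction variance
    var_error : residual (plot error) variance
    n_env     : number of environments each genotype is tested in
    n_rep     : number of replicates per environment
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    # Phenotypic variance of a genotype mean: interaction and error
    # variances shrink as more environments and replicates are averaged.
    var_mean = var_g + var_ge / n_env + var_error / (n_env * n_rep)
    return var_g / var_mean


# Hypothetical variance components (assumptions for illustration only).
one_site = entry_mean_heritability(1.0, 1.0, 2.0, n_env=1, n_rep=2)
four_sites = entry_mean_heritability(1.0, 1.0, 2.0, n_env=4, n_rep=2)
print(f"1 environment:  {one_site:.2f}")
print(f"4 environments: {four_sites:.2f}")
```

In this sketch, a large G × E variance depresses heritability when testing is confined to one environment, and adding environments recovers part of it, which is the rationale for multi-environment testing.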
To deal with G × E interaction, breeders typically test their breeding germplasm in multi-environment trials (MET), which ideally are a representative sample of the target production environment (also referred to as the target population of environments, TPE) and cover several locations and years. Breeders can significantly reduce the risk associated with fluctuating environmental conditions and improve the efficiency of their breeding program by understanding G × E interaction for their specific genotype-environment system. Several statistical methods have been developed specifically to explore and account for G × E interaction in plant breeding, essentially aiming to minimize its negative impact on the selection accuracy [70]. The development of methodologies and strategies that enable performance prediction under G × E interaction, especially for situations in which the aim is to predict the performance of novel genotypes in new (i.e., untested) environments, is a wide and active field of research [71].
Today, the vast majority of sugarcane breeding programs (outlined in Figure 2), which are based on phenotypic selection, are very costly and time-consuming. Strategies that could enable the reduction of cycle length, as well as approaches that are more adequate for performance prediction and breeding value estimation, would be a major step forward for improving sugarcane breeding programs in the future.

Genomic Selection: A Powerful New Breeding Tool
Genomic selection (GS) is a relatively new breeding method in which individuals are selected based on their predicted breeding values that are calculated from genome-wide DNA marker profiles [72]. Decreasing costs of DNA marker screening methods such as high-density SNP arrays and genotyping by sequencing (GBS) approaches, and the development of statistical methods that can accurately predict marker effects, are the main reasons why GS has increasingly been implemented in modern animal and plant breeding programs [73,74]. The two main avenues by which GS can accelerate the rate of genetic gain are improving the accuracy with which individuals are selected and reducing the length of the breeding cycle. However, the incorporation of GS into a breeding program is not a trivial task. It depends strongly on several factors, such as the mating type, the genetic architecture and heritability of the target traits, the availability of genotyping platforms, and the total financial budget of the program to build large reference populations that are necessary to accurately estimate the typically small effects of DNA markers that are associated with the underlying causal mutations that affect the traits [53,75].
Conceptually, GS involves two main steps (Figure 3). The first step is to develop a prediction equation based on a training population (TP) that consists of individuals for which both high-quality phenotypes and genome-wide DNA marker profiles have been obtained.
The fundamental requirement for GS to work is that quantitative trait loci (QTL, the actual mutations) that are affecting the expression of the target trait are in LD with the DNA markers that are used for genotyping [72,75]. If this requirement is met, trait effects for DNA markers can be estimated and used in the prediction equation. In the second step, these marker effects are used to calculate the genomic estimated breeding values (GEBVs) of selection candidates (prediction population; PP) for which only genome-wide marker data (but no phenotypic data) are available. Genotypes can then be ranked based on their GEBVs to support selection decisions in a breeding program.
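The two steps described above can be sketched on simulated data. The example below is a minimal illustration, not the procedure of any cited study: it uses ridge regression as the prediction equation, diploid-style 0/1/2 marker codes (a strong simplification for a polyploid such as sugarcane), and a shrinkage parameter set from the simulated variances, which in practice would have to be estimated. All function and variable names are hypothetical.

```python
import numpy as np


def fit_marker_effects(Z_train, y_train, lam):
    """Step 1: estimate marker effects from the training population (TP)."""
    marker_means = Z_train.mean(axis=0)
    Zc = Z_train - marker_means                 # centre marker columns
    intercept = y_train.mean()
    lhs = Zc.T @ Zc + lam * np.eye(Zc.shape[1])  # ridge-penalised normal equations
    effects = np.linalg.solve(lhs, Zc.T @ (y_train - intercept))
    return intercept, marker_means, effects


def predict_gebv(Z_candidates, intercept, marker_means, effects):
    """Step 2: GEBVs for selection candidates (PP) with marker data only."""
    return intercept + (Z_candidates - marker_means) @ effects


# Simulated data (assumptions for illustration only).
rng = np.random.default_rng(0)
n_train, n_cand, n_markers = 200, 50, 500
marker_sd = 0.1
Z = rng.integers(0, 3, size=(n_train + n_cand, n_markers)).astype(float)
true_effects = rng.normal(0.0, marker_sd, n_markers)
g = Z @ true_effects                              # true genetic values
residual_sd = g.std()                             # gives heritability near 0.5
y = g + rng.normal(0.0, residual_sd, n_train + n_cand)

lam = residual_sd**2 / marker_sd**2               # ratio of residual to marker variance
intercept, means, effects = fit_marker_effects(Z[:n_train], y[:n_train], lam)
gebv = predict_gebv(Z[n_train:], intercept, means, effects)

ranking = np.argsort(-gebv)                       # best candidate first
accuracy = np.corrcoef(gebv, g[n_train:])[0, 1]   # only computable in simulation
print("top five candidates:", ranking[:5])
print(f"correlation of GEBV with true genetic value: {accuracy:.2f}")
```

The candidates' phenotypes are never used in the second step; the correlation with the true genetic value is available here only because the data are simulated.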
A number of statistical models and algorithms, including parametric, Bayesian, and non-parametric methods, have been developed to deal with the problem that, in most situations, the number of DNA markers for which effects are to be estimated strongly exceeds the number of phenotypic observations. The most commonly used statistical methods, RR-BLUP and GBLUP (which yield mathematically equivalent results), assume a normal distribution of SNP effects, while Bayesian approaches like BayesA, BayesB, BayesC(pi), and BayesR consider different variance distributions to allow for differences in marker effect sizes [76][77][78]. Kernel methods [79] utilize a distance (similarity) matrix, which is particularly useful for predicting non-additive effects. They also allow handling of complex multi-environment and/or multi-trait data and are therefore becoming very popular in plant breeding [80].
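The equivalence of RR-BLUP and GBLUP mentioned above can be checked numerically. The sketch below is a simplified illustration under stated assumptions: phenotypes and marker columns are centred, there are no other fixed effects, the shrinkage parameter is fixed rather than estimated, and the relationship matrix is the unscaled product of the marker matrix with its transpose (rescaling it only rescales the shrinkage parameter). Data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_markers, lam = 30, 100, 10.0

Z = rng.integers(0, 3, size=(n_ind, n_markers)).astype(float)
Z -= Z.mean(axis=0)                  # centre marker columns
y = rng.normal(size=n_ind)
y -= y.mean()                        # centred phenotypes, no other fixed effects

# RR-BLUP: estimate one effect per marker, then sum them per individual.
marker_effects = np.linalg.solve(Z.T @ Z + lam * np.eye(n_markers), Z.T @ y)
g_rrblup = Z @ marker_effects

# GBLUP: predict genetic values directly from a relationship matrix
# between individuals (n_ind x n_ind instead of n_markers x n_markers).
K = Z @ Z.T
g_gblup = K @ np.linalg.solve(K + lam * np.eye(n_ind), y)

max_abs_diff = np.max(np.abs(g_rrblup - g_gblup))
print(f"largest difference between the two predictions: {max_abs_diff:.2e}")
```

The two routes give the same predicted genetic values up to numerical error. The GBLUP form solves a system whose size is the number of individuals, which is convenient when markers greatly outnumber phenotypic records.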
A fundamental step for the implementation of GS is the development of the training population (TP). Numerous studies have demonstrated that in order to obtain high prediction accuracies, the TP has to be large and should include individuals with varying degrees of relationship [81][82][83]. Daetwyler et al. [84] reported an improvement in prediction accuracy of 50% by increasing the TP size from 500 to 2000. For wheat, Cericola et al. [85] observed an increment in prediction accuracy with an increase of the size of the TP, which included full-sibs, half-sibs, and less related lines from three continuous breeding cycles. This trend reached a plateau at around 700 breeding lines.
The expected prediction accuracy can be calculated as $r = \sqrt{\frac{N h^2}{N h^2 + M_e}}$ [86,87], in which $r$ (the expected prediction accuracy) is affected by the size of the TP ($N$), the heritability of the trait ($h^2$), and the effective number of independent chromosome segments in a given population ($M_e$), which is calculated as $2 \times N_e$ (effective population size) $\times L$ (the genome size in Morgans). To maximize GEBV accuracy, the TP should be related to the PP [88]. $M_e$ can be estimated empirically by the mean LD ($r^2$) between all pairwise SNPs [89] or by using specific family structures [90].
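As a rough illustration of how the expected accuracy responds to TP size, the formula above can be evaluated directly. This is a sketch only: the function names and all input values (effective population size, genome length, heritability) are hypothetical and are not taken from any sugarcane data set.

```python
import math


def effective_segments(n_e, genome_morgans):
    """M_e = 2 * N_e * L, as defined in the text."""
    return 2.0 * n_e * genome_morgans


def expected_accuracy(n_train, h2, m_e):
    """Expected accuracy r = sqrt(N*h2 / (N*h2 + M_e))."""
    return math.sqrt(n_train * h2 / (n_train * h2 + m_e))


# Hypothetical inputs: N_e = 50, L = 10 Morgans, h2 = 0.3
m_e = effective_segments(50, 10.0)
for n_train in (500, 1000, 2000):
    print(n_train, round(expected_accuracy(n_train, 0.3, m_e), 3))
```

Under these assumed values the accuracy rises with TP size but with diminishing returns, which is the qualitative pattern described for the empirical studies above.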
To maintain a high prediction accuracy in GS-based breeding programs, the TP must be frequently updated with new phenotyped and genotyped accessions [91,92]. This is mainly due to the decrease in marker-QTL LD because of recombination events over time. For example, Auinger et al. [93] trained a prediction model for a rye breeding program by using multiple breeding cycles and demonstrated that prediction accuracies were significantly increased when the prediction model was constantly updated as the breeding program advanced.
Good quality phenotypic and genotypic data are the key factors to take full advantage of GS [73]. Because of inevitable constraints in operating budgets, breeders are always interested in finding the minimum number of markers needed to obtain useful GEBVs. The extent of LD (affected by $N_e$ and population structure) helps to determine the number of markers required for GS. High marker densities are desired for the prediction of distantly related individuals [53] because LD between markers and QTL is lower across such individuals.
GS was implemented in animal breeding prior to its introduction to plant breeding. The implementation of GS in dairy cattle breeding programs has resulted in significant improvements compared to traditional phenotypic selection [94]. The reduction in total generation interval from 7 years to 1 year (young bulls are ranked based on their GEBVs and selected for artificial insemination) has almost doubled the rate of genetic gain. Furthermore, there has been a reduction in costs for progeny testing [94,95]. Interestingly, genetic gains were also reported for low-heritability traits such as disease resistance and fertility [73]. Consequently, GS has been implemented on a very large scale in other animal species such as beef cattle, pigs, sheep, and chicken [96,97].
In plant breeding, the potential of GS was first evaluated in corn (Zea mays L.) using simulations [98]. A range of simulation studies in different crop species such as wheat [92], barley [99], rice [100], and sorghum [101] have shown that implementing GS could result in a significant increase in genetic gain. However, only limited reports are available in crops on the realized genetic gains that were achieved as an outcome of implementing GS. One example is given by the drought-tolerant "AQUAmax" hybrid corn variety, which was created by integrating GS with enhanced phenotyping and crop growth modelling in a commercial maize breeding program [102]. Significantly higher yields were reported in the United States when growing "AQUAmax" maize hybrids under both drought and favorable conditions, with considerably improved yield stability under water limitation [103].

Implementation of Genomic Selection in Sugarcane Breeding
Increasing the rate of genetic gain is a big challenge in sugarcane breeding, as implied by the static or slowly increasing yield trends in most countries. Several reasons for the observed yield plateaus have been proposed, such as a narrow genetic base of modern elite germplasm [36], highly complex genetic architectures for agronomically important quantitative traits for which non-additive gene action is likely playing a significant role, and very long breeding cycle lengths [34].
The use of molecular markers has become a standard practice in most important crop species. Traditionally, plant breeders have incorporated molecular markers in phenotypic selection for mono- or oligogenic traits to increase the efficiency of the breeding program. For instance, marker-assisted selection (MAS) has proven to be a practical approach for single gene introgression or pyramiding multiple genes in elite cultivars, to improve disease resistance or grain quality [104]. Despite the fact that a range of QTL mapping studies has been undertaken in sugarcane [105], the size and complexity of the sugarcane genome have limited DNA marker-based selection in this crop [44]. Generally, MAS has been largely ineffective for the improvement of highly quantitative traits because of several technical reasons that have been discussed extensively in the literature [106,107]. Polygenic traits are typically controlled by a huge number of QTL, each having infinitesimally small effects, possibly with interactions among them as well as with environmental factors [108].
GS can be a promising tool for improving the rate of genetic gain for quantitative traits in sugarcane breeding. Since GS has not been extensively investigated in sugarcane and other highly polyploid crops, increased evaluation and validation efforts are needed to better understand the challenges associated with the implementation of the technology in breeding programs. A recent study investigated the potential use of GS in tetraploid potato and octoploid strawberry by the use of SNP markers and partial sequence data, respectively. The authors concluded that the actual advantage of GS depends on the underlying genetic architecture of the trait [109]. For genetic improvement of quantitative traits in octoploid strawberry (e.g., yield and fruit quality), GS has been strongly recommended in practical breeding programs because of high prediction accuracies found in true validation trials [110].
Gouy et al. evaluated the potential of GS for sugarcane breeding in two different panels from a commercial breeding program in Reunion Island and Guadeloupe consisting of 167 clones each [111]. All 334 clones were genotyped with 1499 DArT markers and phenotyped for ten agronomically important traits. Across four genomic prediction models (Ridge Regression, Bayesian Lasso, Partial Least Square Regression, Reproducing Kernel Hilbert Space), prediction accuracies ranged from 0.11-0.62 within the panels and 0.13-0.55 between panels across the ten investigated traits, which included morphological traits (stalk diameter and millable stalk number), technological traits (bagasse content, brix), lignocellulosic traits (acid detergent fiber, in vitro neutral detergent fiber digestibility of the bagasse, acid detergent lignin), and resistances to different diseases (yellow leaf disease, smut, and brown rust) [111]. These prediction accuracies seem promising, particularly when considering the relatively small size of the TP that was used in the study.
In another study, three different populations of clones from early and advanced selection stages of an established sugarcane breeding program were used to estimate the prediction accuracy for cane yield and sugar content. Different genomic prediction models (GBLUP, BayesA, BayesB, Bayesian LASSO, and RKHS) were compared with or without the use of pedigree information. The prediction accuracy for sugar content was highest in advanced stage trials, while it was lower for cane yield. The prediction accuracies ranged from 0.25-0.45 in most data sets, which is promising and strongly supports the potential usefulness of GS for sugarcane breeding [112].
In sugarcane, modern germplasm can be traced back to only a small number of founder clones, which suggests that the effective population size $N_e$ in elite germplasm is small. This is consistent with the high levels of LD reported in modern sugarcane breeding populations [35]. However, a considerable number of SNP markers still needs to be used to achieve accurate predictions due to the large size and complexity of the sugarcane genome.
Unlike in major crops such as corn, wheat, or rice, high-throughput genotyping is still relatively expensive in sugarcane (~AUD 95 per sample using the 50k Axiom SNP array). The cost associated with genotyping is still a major limiting factor for large-scale genomic evaluation in commercial breeding programs. In addition to genotyping, high-throughput and precision phenotyping, e.g., in multi-environment or managed trials, should be considered more seriously when GS is implemented because of potential negative effects of G × E interactions on genomic prediction accuracy [69]. Parameters that quantify critical environmental conditions could also be included in genomic prediction models to increase the heritability and hence the prediction accuracy for the target trait [113].
The use of advanced phenotyping methods might be helpful for improving the prediction accuracy in sugarcane. One main consideration is how to effectively use available information from modern high-throughput phenotyping in genomic prediction models [114]. An extensive review is given by Van Eeuwijk et al. [113] regarding a range of genotype-to-phenotype (G2P) modelling methods for the use of high-throughput phenotypes measured in field trials. The main idea is to collect data on secondary traits, e.g., time series traits such as dynamic measurements of canopy architecture or biomass, and include these data as covariates in genomic prediction models. Since this could make it possible to specifically target component traits that are important for performance under specific environmental conditions, approaches like this have the potential to better account for variation caused by environmental factors. Therefore, the accuracy of the estimated genetic merit of breeding germplasm in a given environmental context could ultimately be improved, which would directly translate into an increase in genetic gain.
Allelic and non-allelic (dominance and epistasis) interactions for target traits can create potential challenges for the implementation of GS in sugarcane breeding. The presence of dominance and epistatic genetic effects can change the average effects of allele substitution among populations that are targeted for selection in a breeding program because of the changes in allele frequencies that selection causes [115]. This results in a complicated situation in which the ranks of the genotypes change as a consequence of changes in marker effect estimates [73]. Thus, it is particularly necessary to update the training data set in the presence of strong epistatic effects [91]. This makes GS more expensive to implement for crop breeding.
The underlying assumption of most common genomic prediction approaches is that quantitative traits are determined by many additively acting genes. While approaches based on this assumption have been applied very successfully in plant and animal breeding, there is ample biological evidence that gene-gene interactions (epistasis) are important for agronomic traits. Because sugarcane cultivars are deployed as clones, all genetic effects could be utilized, and the accurate prediction of additive and non-additive genetic effects would be of great value in the future for predicting clonal performance and selecting parents for the next breeding cycle. Cheverud and Routman [116] proposed a new quantitative genetic parametrization for the analysis of physiological epistasis (i.e., on the genotype level) to understand the effect of gene-by-gene interaction on variance components that are important for quantitative genetics and breeding (additive, dominance and epistasis). They concluded that epistasis could be a source of increased additive genetic variance in populations that have undergone selection [117]. The use of extended statistical models that consider non-additive effects could be beneficial to derive precise marker effects and, ultimately, high prediction accuracies in crop breeding [118].
One challenge in polyploid species is to correctly distinguish between different types of heterozygotes. In polyploid species, pseudo-diploid models, which group all heterozygous classes into a single class, are commonly used to account for heterozygosity. Polyploidy can create phenotypic variation through allele dosage. For instance, significant phenotypic differences in fruit size in tomato and plant architecture in corn were associated with allele dosage [119]. Therefore, the inclusion of allele dosage information has become a matter of high interest for genetic studies in polyploid species. The explicit consideration of allele dosage in genomic prediction models might improve the prediction accuracy by providing a more realistic representation of genotypic class effects. For potato, an autotetraploid species, Endelman et al. showed significantly higher prediction accuracies by including digenic effects as well as accounting for allelic dosage using data from a SNP array [120]. In conclusion, the adequate treatment of non-additive effects and allele dosage in GS models could be very beneficial for sugarcane.
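The difference between pseudo-diploid and dosage-aware marker coding can be made concrete with a small sketch. The function names and the fixed ploidy of four (as in the autotetraploid potato example) are assumptions for illustration; sugarcane clones have higher and variable ploidy, so a real pipeline would need locus-specific dosage calls.

```python
def pseudo_diploid_code(dosage, ploidy):
    """Collapse all heterozygous classes into one class (coded 1)."""
    if dosage == 0:
        return 0                      # nulliplex: no copies of the counted allele
    if dosage == ploidy:
        return 2                      # all copies carry the counted allele
    return 1                          # any heterozygote, whatever its dosage


def additive_dosage_code(dosage, ploidy):
    """Keep the dosage information, scaled to the range 0..1."""
    return dosage / ploidy


# Hypothetical tetraploid genotypes with 0 to 4 copies of the counted allele
dosages = [0, 1, 2, 3, 4]
print([pseudo_diploid_code(d, 4) for d in dosages])
print([additive_dosage_code(d, 4) for d in dosages])
```

In the pseudo-diploid coding the simplex, duplex, and triplex genotypes become indistinguishable, so any phenotypic effect of dosage among heterozygotes cannot be captured by the prediction model.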

Recurrent Genomic Selection and Reciprocal Recurrent Genomic Selection: Two Strategies for the Incorporation of Genomic Selection in Sugarcane Breeding
Regarding the implementation of GS in sugarcane breeding, a key question is how to incorporate the technology into an existing breeding program. The first critical step in any breeding program is to create new genetic variation. In conventional sugarcane breeding, a large number of seedlings is created through targeted crossing, followed by several selection stages that aim to determine the relative genetic merit of the new germplasm in designed field trials. From the perspective of increasing genetic gain, a key bottleneck with this conventional approach is that alleles are only recombined in the crossing stage at the beginning of the breeding cycle. This could potentially be overcome by a breeding strategy called recurrent genomic selection (RGS) (Figure 4), which aims to rapidly improve the genetic merit of a population of heterozygous genotypes through rapid, recurrent selection and crossing of elite germplasm, and to simultaneously channel selected clones into advanced testing stages that ultimately develop commercial products.
Heffner et al. [53] first proposed the idea to separate population improvement from line development in a genomics-assisted plant breeding program. Later, Gaynor et al. [121] investigated RGS for a line breeding program using simulations by splitting the breeding program into a population improvement component and a product development component (cultivar release). They showed that an RGS-based program could generate up to 2.5 times more genetic gain than a conventional phenotypic selection scheme, and up to 1.5 times more genetic gain than the best-performing standard GS strategy in which GS is used to improve selection within the breeding cycle.
A key role of phenotyping in a genomics-assisted breeding program is to (re)estimate marker effects. Changes in allele frequencies in populations under selection and epistatic gene-action result in changes in marker effect estimates that might reduce selection accuracy and hence realized genetic gains from GS-based breeding strategies [73]. Thus, there is a need for constant updating of the prediction model in each selection cycle, especially in an RGS system in which generation turnover and hence the number of recombination events is accelerated.
RGS breeding schemes that prioritize parents with high general combining ability typically capture and improve additive genetic effects in each generation cycle. The use of RGS for interpopulation improvement may boost long-term selection gain in hybrid sugarcane breeding.
To maximize the response in crossbred populations, reciprocal recurrent selection (RRS) was proposed by Comstock et al. [122]. The RRS breeding scheme aims to simultaneously improve two genetically diverse, purebred populations that are used for targeted crossbreeding, ultimately aiming to maximally explore both general and specific combining ability. Individuals from purebred populations are selected based on their crossbred progeny performance. For instance, RRS was very successfully applied to improve general combining ability and specific combining ability for root yield and sucrose-content in sugarbeet [123], and grain yield and prolificacy in maize [124]. The main practical drawback of RRS is that generation intervals need to increase substantially, which can lead to a reduction in the overall genetic response to selection. An increase in generation intervals is necessary for RRS because selection decisions are made based on the performance of the crossbred progeny [122]. In the RRS scheme, GS can be used to predict crossbred performance and prioritize certain combinations of accessions from the distinct purebred pools. This practice is widely used in modern maize breeding [125].
In oil palm, Cros et al. [126] concluded that reciprocal recurrent genomic selection (RRGS) could increase annual gains by reducing the breeding cycle from 20 to six years compared to conventional RRS. Hence, RRGS seems to be a promising method to achieve long-term genetic gain under situations where traits are affected by heterosis, and when the breeding cycle is very long, as in the oil palm example. Rembe et al. [127] suggested that using RRGS-based breeding strategies that integrate product development and population improvement can increase long-term genetic gain in hybrid wheat breeding. Similar trends could be achieved in sugarcane.
Considering the importance of specific along with general combining ability effects in determining the performance of crosses, implementation of a modified version of RRGS, as shown in Figure 5, might improve long-term genetic gain in hybrid sugarcane breeding. Such a breeding scheme could begin with developing a genomic prediction model by using a reference population comprising a large number of progeny generated from a proven cross, say A × B, where parent A and parent B are unrelated. One of the parents (e.g., parent B) would be selfed, and its self-progeny would be selected based on their predicted breeding values using the previously developed genomic prediction equation. If selfing is not feasible, very closely related clones (e.g., from one family) could be used instead. Several self-clones derived from parent B with high predicted breeding values would then be crossed with the opposite parent (parent A). Potentially, the new crosses from the selected self-clones are better than the original high-value cross because of the improved genetic merit of the B-derived clones. The selected self-clones could also be crossed together and undergo further ongoing improvement cycles via rapid RGS. A similar breeding system could be initiated with a small number (2 or 3) of parents on one or both of the A and B sides (rather than single parents as in Figure 5), and progeny derived from crossing parents on one side would be selected for high predicted breeding values before crossing them with the opposite side. Extending the theory from Cheverud and Routman [116] to a situation in which a quantitative trait is controlled by many epistatic QTL, in a modified RRGS breeding scheme, the QTL alleles in the opposite heterotic group could be fixed (remain unchanged). This could result in a genetic model with increased additive genetic variance and reduced statistical epistasis, which could contribute to an increase in predictability, leading to improved selection efficiency and higher genetic gain.
The proposed GS-based breeding schemes can be advantageous when the desired alleles for the traits of interest are available in the breeding germplasm. However, it could be the case that genetic variation for the trait of interest is limited in the primary gene pool. In that situation, genetic variation in cultivated hybrid pools could be replenished by introgressing novel alleles from wild gene pools. This approach is time-consuming and cumbersome when a trait is affected by a large number of small-effect QTL. A well-designed pre-breeding program in which landraces and wild materials are exploited could be promising in maintaining and managing genetic diversity and long-term genetic gain in breeding programs. Pre-breeding programs could significantly benefit from GS approaches because they could help to prioritize accessions and track introgressions on the molecular level [128]. Incorporating GS without specific knowledge of the target QTL into a gene introgression program in fish was useful in preserving QTL. It sped up the process of introgression of a gene while increasing genetic gain compared to classical selection, especially for disease resistance [129]. Integration of GS with genome-wide association studies (GWAS) can prevent the loss of target genes and sustain increased genetic gain through an appropriate capture of large- and small-effect QTL underlying a trait of interest [130].
One main drawback of genomic selection is that it can increase the rate of inbreeding per generation. However, Daetwyler et al. [131] suggested that Mendelian sampling variation can be estimated more accurately using DNA markers, compared to traditional BLUP, and that GS could reduce the probability of selecting siblings. Consequently, the inbreeding rate per generation can be reduced when DNA markers are used in the selection process. Nevertheless, several simulation studies have shown that selection that is purely based on GEBVs can lead to a loss of genetic variance and hence an increase in the rate of inbreeding [75,126].
Figure 5. Flow diagram of a modified reciprocal recurrent genomic selection breeding scheme for sugarcane. The prediction model is trained by generating hundreds of offspring from a proven cross of unrelated parents that are known to combine well. Either one or both clones in the cross are selfed, and offspring are selected based on their genomic estimated breeding values. If selfing is not feasible, closely related clones (e.g., from the same family) can be used instead. The selected selfed offspring are crossed with the opposite parent. GEBV = genomic estimated breeding value.
To avoid inbreeding depression in parental populations, the maintenance of genetic variation is necessary. Increasing the number of selected individuals could slow down the inbreeding rate, but at risk of a reduction in selection response [132]. Many modified selection criteria have been proposed to allow balancing genetic gain and maintaining genetic diversity while applying GS [133][134][135][136][137]. The main idea behind these selection criteria is to determine the exact contribution of an individual to the following generation based on its genetic merit and its genetic relationship with other individuals. Expanding on that principle, Toro and Varona [138] highlighted the potential of mate-allocation within a population. They used genomic prediction models, including dominance effects, to predict the performance of offspring generated through mating pairs of individuals. This was followed by an optimization procedure in which a set of mate pairs that can maximize performance in the subsequent generation was selected. In this example, selection and mating were simultaneously performed for improving the management of inbreeding. The advantage of an adequate mate allocation strategy is particularly relevant for improving complex traits with a high amount of non-additive genetic variance [118].
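The optimization step of a mate-allocation strategy can be sketched as follows. This is an illustrative simplification rather than the procedure used by Toro and Varona: it assumes that a matrix of predicted progeny performance (here with hypothetical values) is already available from a genomic prediction model that includes dominance, that each parent is used in exactly one cross, and that the number of parents is small enough for exhaustive search.

```python
from itertools import permutations


def best_mate_allocation(predicted):
    """Return the one-to-one pairing with the highest total predicted performance.

    predicted[i][j] is the predicted mean progeny performance of the cross
    between parent i of one group and parent j of the other group.
    """
    n = len(predicted)
    best_pairs, best_total = None, float("-inf")
    for perm in permutations(range(n)):          # every one-to-one pairing
        total = sum(predicted[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_pairs, best_total


# Hypothetical predicted progeny means for 3 x 3 candidate crosses
predicted = [
    [10, 9, 5],
    [9, 4, 3],
    [6, 5, 8],
]
pairs, total = best_mate_allocation(predicted)
print(pairs, total)
```

In this example, pairing each parent greedily with its single best partner would start with the cross worth 10 and end with a lower total than the exhaustive search, which illustrates why selection and mating are optimized jointly. For realistic numbers of parents, an assignment algorithm or an optimization that also constrains coancestry would be needed instead of enumeration.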
There are only a few studies that have investigated GS for sugarcane, and the empirical evaluation of different implementation strategies is impractical. Breeding simulations are an elegant way to assess the potential impacts that GS can have on sugarcane breeding efficiency because they require only a few physical resources. Furthermore, simulations can accommodate different genetic models with varying numbers of genes/alleles, dominance, epistatic gene effects, and also handle genotype-environment interaction effects [139]. A breeding program typically operates on fixed budgets, making the optimal allocation of resources very critical for breeders. Simulations allow one to investigate and compare breeding methods in terms of genetic gain and cost-effectiveness. Extensive simulation studies are needed in sugarcane to identify the best potential GS-based breeding scheme designs, e.g., RGS or RRGS, as discussed above, that can generate the highest rate of genetic gain per unit cost and time. Empirical validation experiments are then critical to test the most promising strategy in a practical breeding context. Thus, increased simulation efforts could provide valuable information and decision support for the design of empirical validation experiments, and ultimately for the efficient implementation of GS in practical sugarcane breeding.
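The kind of comparison that breeding simulations make possible can be illustrated with a deliberately simple, single-cycle Monte Carlo sketch. It is not a substitute for a full breeding program simulator: the accuracies, cycle lengths, and population sizes are hypothetical, and the sketch ignores costs, loss of genetic variance, inbreeding, and non-additive effects.

```python
import random
import statistics


def simulate_gain_per_year(accuracy, cycle_years, n_candidates=2000,
                           n_selected=100, seed=1):
    """Mean true merit of the selected candidates, divided by cycle length."""
    rng = random.Random(seed)
    noise_sd = (1.0 - accuracy ** 2) ** 0.5
    candidates = []
    for _ in range(n_candidates):
        true_value = rng.gauss(0.0, 1.0)         # true genetic merit, SD = 1
        # Predictor built so that its correlation with true merit equals accuracy
        predicted = accuracy * true_value + noise_sd * rng.gauss(0.0, 1.0)
        candidates.append((predicted, true_value))
    candidates.sort(reverse=True)                # rank on predicted merit only
    selected = [true_value for _, true_value in candidates[:n_selected]]
    return statistics.mean(selected) / cycle_years


# Hypothetical schemes: accurate but slow versus less accurate but fast
slow = simulate_gain_per_year(accuracy=0.8, cycle_years=12)
fast = simulate_gain_per_year(accuracy=0.4, cycle_years=4)
print(round(slow, 3), round(fast, 3))
```

Under these assumed values the faster scheme delivers more gain per year despite its lower selection accuracy, which is the trade-off that more detailed simulations of RGS or RRGS designs would need to quantify together with cost.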
While GS has the potential to tackle fundamental challenges associated with improving important traits in sugarcane, increased research efforts are needed to enable the implementation of the technology. The RGS or RRGS breeding schemes proposed in this paper hold the potential to increase long-term genetic gain for complex quantitative traits in sugarcane, but further investigations are needed.

Conflicts of Interest:
The authors declare no conflict of interest.