Challenges and Opportunities in Applying Genomic Selection to Ruminants Owned by Smallholder Farmers

Genomic selection has transformed animal and plant breeding in advanced economies globally, resulting in economic, social and environmental benefits worth billions of dollars annually. Although genomic selection offers great potential in low- to middle-income countries because detailed pedigrees are not required to estimate breeding values with useful accuracy, the difficulty of effective phenotype recording, complex funding arrangements for a limited number of essential reference populations in only a handful of countries, questions around the sustainability of those livestock-resource populations, lack of on-farm, laboratory and computing infrastructure and lack of human capacity remain barriers to implementation. This paper examines those challenges and explores opportunities to mitigate or reduce the problems, with the aim of enabling smallholder livestock-keepers and their associated value chains in low- to middle-income countries to also benefit directly from genomic selection.


Introduction
Although major differences exist between the productivity and available resources of livestock producers in advanced and low- to middle-income countries (LMICs), several very significant challenges need to be overcome by all farmers, regardless of their location, if they are to capture the new opportunities that already exist and continue to emerge.
The world's population is expected to increase from 7 billion people in 2011 to 9 or 10 billion by 2050, with most of that growth occurring in Africa and Asia [1]. The incomes of many people in LMICs are now increasing and, with rising incomes, the demand for meat and dairy products is also growing [2]. To achieve food security by 2050, livestock enterprise and industry efficiency, as measured by total factor productivity, needs to increase by 2.0-2.5% per annum. This is the equivalent of doubling outputs from constant resource inputs through to 2050 [3]. Due to the pressures on agriculture in developed countries, a significant proportion of that increased production must occur in the regions of greatest need, i.e., in Africa and Asia. This increased demand for food is leading to greater competition for inputs such as land, water, grain and labor, driving up the cost of livestock production. Climate change is adding to this challenge [4], requiring animals that are productive under hotter and drier climates and, in the tropics and sub-tropics, requiring animals that can tolerate significant increases in ecto- and endo-parasitic burdens and vector-borne diseases. There is therefore an urgent need to greatly increase the productivity

• The very rapidly reducing costs of full genome sequencing [8];
• The momentous reductions in the time taken to sequence an entire genome, down from close to 13 years for the first human genome sequence [9] to a full sequence now being achieved in a single day [10], and with potential in the near future to achieve full genome sequencing on the same day in the field rather than at laboratory sites;
• The ability to accurately impute whole-genome sequence data from lower-density, lower-cost single nucleotide polymorphism (SNP) panels [11][12][13][14];
• The potential to use whole-genome sequence data to discover the mutations causing variations among animals and, in turn, using that knowledge of functional mutations to improve the accuracy of breeding value predictions [15];
• Resolution of the "missing heritability" problem [16], proving that genomic selection approaches account for significant proportions of the genetic variation for economically important complex traits;
• Vastly improved computational capacity that is now allowing the cost-effective storage and processing of petabyte-scale (10^15 bytes) data [17];
• The ability to use pooled DNA samples from groups of animals to identify the average genetic merit at low cost [18], thereby enabling the development of new, cost-effective management applications based on genomic information; and
• The increased ability to capture essential individual animal performance data (phenotypes) through the use of automated or semi-automated electronic data capture methods.
Traditional genetic improvement programs, based on measuring large numbers of pedigree-recorded animals in well-defined cohort groups for the full range of economically important productive and adaptive traits, are generally not possible for smallholder farmers in LMICs. Now, the use of genomic data, in conjunction with information and communication technologies, offers significant new opportunities to increase the rates of genetic gain by characterizing indigenous and crossbred animals for use in conservation, crossbreeding and within-breed selection programs, to improve economically important traits. Other technologies, such as genome editing, coupled with emerging reproductive technologies that enable rapid multiplication and decreased dependency on cold chains for the delivery of improved genetics, will potentially transform livestock breeding even further as causal mutations are found [19].
To date, there has been limited use of genomic technologies in grazing livestock in LMICs, due to several major challenges inhibiting their use. The following sections examine those challenges and identify opportunities to mitigate or remove them for the ruminant livestock species that predominate in those regions, i.e., beef and dairy cattle, sheep, and goats. Even though this paper focuses on the application of genomic selection in LMICs, no attempt is made to evaluate the ongoing refinement of the genomic selection methodology or the increasingly sophisticated demands on the computational capacity required to drive the method, because those challenges continue to be addressed more rapidly than the alternative constraints facing the use of genomic selection in LMICs.

The Need for Accurate Phenotyping and Record-Keeping
In both advanced economies and LMICs, the main limitation to genomic (and traditional) selection in extensively managed livestock is the difficulty and expense of measuring animals in appropriately sized contemporary groups for the full range of economically important productive and adaptive traits. As discussed by [20], technology may in the future provide the means of measuring animals, but it cannot replace the statistical imperative that, for these measurements to be beneficial for genetic improvement programs, contemporary groups of appropriate structure and sufficient size are required. Unless the design is adequate in terms of contemporary group size and structure, the measurements will not provide useful predictions of genetic merit. This applies to traditional genetic improvement programs as well as those capturing beneficial traits through genomic selection.
As suggested in Table 1, the measurement of most phenotypes required for genetic improvement programs in smallholder herds and flocks is generally not feasible in the field. Where measurement is feasible, there is an additional requirement that accurate records be maintained at the level of individual animals. Such record-keeping is often an additional challenge for smallholder farmers, mainly because platforms that can effectively collate and make sense of such highly fragmented data are lacking. This recording has been assisted in the past by animal breeding research projects, such as those described by [21]. However, with the short-term nature and eventual closure of many of those types of projects, the data capture has generally been discontinued by the smallholder farmers. More recent research projects such as the African Dairy Genetic Gains (ADGG), BAIF India and community-based breeding programs (CBBP) in Ethiopia and Malawi are now adapting digital tools, such as mobile phones and tablets, to capture performance data for easy-to-measure traits such as milk yield, body condition score and artificial insemination records [21].
By way of example, in the ADGG project, milk yield, heart girth (for predicting body weight), and body condition score are collected monthly by performance-recording agents, using software based on the Open Data Kit (ODK) that is installed on tablets and mobile phones. In addition, iCow (http://www.icow.co.ke/, accessed 15 November 2021), a technological platform owned by a private company, Green Dreams, a partner in the ADGG, has provided feedback information to farmers for herd management through text messages and web-based training. This performance data has enabled the genomic prediction and selection of first-rate young bulls for breeding in Tanzania [30]. The main challenges of the data-capture system are the high cost of employing performance-recording agents and poor internet connectivity to upload the data. The most obvious data issues relate to inconsistencies in the dates for various animal events, such as birth, calving, and milking dates. This leads to a large number of animals being rejected from any meaningful genetic analysis.

Table 1. Phenotypes that should ideally be included in livestock-breeding objectives and the feasibility of recording them in smallholder herds and flocks in low- to middle-income countries.

| Phenotype and Purpose | Options to Measure Key Traits for Use in Genetic Improvement Programs in Smallholder Livestock Populations |
|---|---|
| Product Quantity and Quality | |
| Animal live weights and weight gains for the genetic evaluation of potential meat quantity in meat-producing animals and to provide assessments of animal nutrition and the effect of environmental stressors and/or endemic diseases on individuals and groups of meat and dairy animals | Except for farmers directly engaged in well-funded genetic improvement research programs, a lack of animal-handling infrastructure and access to scales generally means that records of individual animal weights and weight gains are not feasible in smallholder herds/flocks. This is particularly true for meat animals. Future infrastructure development may enable remote walk-over weighing or similar options for measurements, although [22] concluded that current walk-over weighing systems did not justify the investment needed in individual animal electronic identification. An alternative approach is for farmers to measure the animal's circumference and length as an indicator of body weight, but this would also require access to appropriate handling facilities to enable accurate tape placement and length and height measurements. The accuracy of these assessments is reasonable in well-designed cohorts of the same breed of animals, but it varies markedly across breeds and animal size. Hence, in situations where animal breed composition is unclear, as is often the case in smallholder herds and flocks, this is not a reliable measurement for use in genetic evaluation [23]. Predictions from images based on deep learning may become available in the near future. |
| Milk volume as the primary product for dairy animals and a maternal trait for meat animals | Measuring milk volume is relatively straightforward in dairy animals in smallholder herds and flocks but is not feasible in meat-producing animals, other than indirectly through the offspring's weaning weights. |
| Meat (e.g., tenderness, flavor, juiciness) and milk quality (e.g., protein concentration, fat content, etc.) attributes | Although sensor-based meat quality assessment systems exist, they are not readily available and given the meat evaluation and pricing systems, such assessments may not provide value for money, except where animals from smallholder flocks and herds are sold through commercial value chains to meet market quality specifications. In those cases, the processor or retailer purchasing the carcasses or milk should be able to provide the phenotypes required. However, quality-based value chains are currently scarce in most LMICs. |
| Live animal fat depth, eye muscle area, etc., in meat animals | Subjective assessments of fat coverage or body condition scores are feasible but the value of doing so for genetic improvement programs in the absence of other measures, such as weights, is not high. |
| Efficiency of feed utilization, particularly while animals are grazing on pasture | While this phenotype represents the best opportunity to improve the amount of feed required by an animal to maintain its body weight, it is still not feasible to record even in sophisticated breeding programs in high-income countries, except when animals are fed grain-based diets in pens or assessments are made on monoculture crops/pastures. Those measures are unable to account for animals' diet selections and browsing behaviors, both of which impact feed utilization at pasture. |
| Female reproduction (most measures of male reproduction are not generally feasible in smallholder herds and flocks) | |
| Age at first estrus | Smallholder farmers can be trained to recognize the signs of the first estrus in female animals that are closely monitored and the use of digital cow calendar reminder systems enable subsequent signs to be easily and accurately timed and recognized. However, as the signs vary across animals, the records are unlikely to be sufficiently accurate for genetic improvement programs. Another option could be to use solar-powered sensor networks to remotely capture livestock data, such as estrus and pregnancy status, using animal ear- or neck tags (https://www.allflex.global/product/heatimepro/, accessed 15 November 2021), but to justify the expense, this type of data would need to be captured in relatively large herds or flocks. |
| Date of calving/lambing, age of first calving/lambing and inter-calving/lambing period | In closely managed smallholder herds and flocks, recording the date of calving/lambing of individual dams is feasible, though not currently routinely practiced except through research programs. Knowledge of this date enables the calculation of the age of first calving/lambing and inter-calving/lambing periods. |
| Pregnancy rate | If smallholder farmers have cost-effective access to veterinary services, pregnancy testing may be a feasible option for some. Use of estrus detection ear tags (see "Age at first estrus" section above) may also provide a useful indicator of pregnancy status if the tags are also used during the breeding season by indicating which females do not return to estrus. |
| Weaning rate and offspring mortality rate | If smallholder farmers practice calf/lamb weaning and the calves/lambs are routinely individually identified to their dams, it may be feasible for annual weaning rates (and, hence, annual offspring mortality rates) to be calculated if the number of females mated in the previous breeding season is also recorded. |
| Adaptive traits | |
| Resistance to ecto- and endo-parasites | As summarized by [24], "resistance" in this context refers to both the ability of the individual host to resist infection or control the parasite lifecycle (resistance) and also where an individual host may be infected by a parasite but suffer little or no harm (tolerance). These terms are used interchangeably here. An in-depth discussion of the ability of cattle to resist a wide range of individual parasites is given in [24], but measuring an individual animal's resistance to any of those parasites in advanced and low- to medium-income economies is not generally feasible due to both the intermittent nature of parasite infestations and the difficulty of measurement. |
| Resistance to endemic diseases transmitted by parasites | Measurement of disease resistance under pastoral conditions is generally very difficult, even in advanced economies. The simple presence/absence of disease can be subjectively assessed by a skilled recorder when the animals are observed during routine handling procedures. However, infrequent observation of animals means that incidence of disease generally goes unrecorded, except where animal deaths occur and a diagnosis giving the cause of death is possible. |
| Tolerance to heat stress | Traditionally, heat stress has been recorded using repeated measurements of the rectal temperature of animals under conditions of heat stress [25,26] or through subjective coat scores of animals during the summer months [27]. Increasingly, however, tolerance to heat stress is being assessed through the use of temperature-humidity indices based on the automatic meteorological recording of ambient temperatures and humidity in local regions [28,29]. |
Another consideration is how such record-keeping can be made sustainable beyond the life of the associated research projects, while recognizing that the farmers who provided the records remain the owners of that data beyond the life of the projects. This is an issue that needs to be directly addressed by each of those projects. ADGG has been examining several business model options for the sustainability of the record-capture system; these include piloting mobile phone-based systems for direct data capture from farmers through monthly alerts, engaging government officials in the respective country to encourage their involvement, and exploring private company participation. Currently, direct farmer-incentivized systems are being tested in Kenya as a possible long-term solution to this challenge.
Even where measurement is feasible, it is likely that many smallholder herds and flocks are unable to generate within-herd genetic linkages through the use of multiple sires to generate contemporary groups, meaning that the contemporary group design requirements also present significant challenges. For this reason, the most feasible option for smallholder herds and flocks to participate in genetic improvement programs is likely to be through the use of specifically designed reference, resource or nucleus populations aimed at the identification of genetically superior sires for subsequent use in smallholder herds and flocks. However, such reference populations need to be managed under conditions that are as similar to the smallholder and pastoral systems as possible. Past attempts, where government research centers were used, have generally failed.
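The inconsistencies in birth, calving and milking dates reported for the ADGG data capture are the kind of problem that simple automated checks can flag before records reach a genetic analysis. The following is a minimal sketch only: the record layout, the `check_event_dates` helper and the minimum age-at-calving threshold are hypothetical and are not taken from the ADGG software.

```python
from datetime import date

# Hypothetical threshold: minimum plausible age at first calving, in days.
MIN_AGE_AT_CALVING_DAYS = 500

def check_event_dates(record):
    """Return a list of problems found in one animal's event dates."""
    problems = []
    birth = record.get("birth")
    calving = record.get("calving")
    milking = record.get("milking")
    if birth is None:
        problems.append("missing birth date")
    if birth and calving:
        if calving <= birth:
            problems.append("calving on or before birth")
        elif (calving - birth).days < MIN_AGE_AT_CALVING_DAYS:
            problems.append("implausibly young at calving")
    if calving and milking and milking < calving:
        problems.append("milk record before calving")
    return problems

# Invented example records, one per animal.
records = [
    {"id": "cow1", "birth": date(2015, 3, 1), "calving": date(2018, 1, 10), "milking": date(2018, 2, 10)},
    {"id": "cow2", "birth": date(2017, 5, 1), "calving": date(2016, 4, 2), "milking": date(2016, 5, 2)},
    {"id": "cow3", "birth": None, "calving": date(2019, 6, 1), "milking": date(2019, 5, 1)},
]
flagged = {r["id"]: check_event_dates(r) for r in records}
usable = [r["id"] for r in records if not flagged[r["id"]]]
```

Running such checks at the point of data entry, rather than at analysis time, would allow a recording agent to correct a date while the animal and farmer are still at hand.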

The Role of Reference Populations
Over many decades, the dairy cattle industries in high-income countries have conducted successful genetic improvement programs using a model where individual dairy herds contributed pedigree and performance records (and more recently, genomic information) to national and international genetic evaluation schemes. These types of schemes have generally not been feasible for other livestock species, such as beef cattle and meat- and wool-yielding sheep and goats, which have traditionally focused on visual appraisal as an indicator of performance in the absence of objective, routinely recorded performance data, such as the daily milk volumes that exist in the dairy industries.
For this reason, an alternative approach was developed for use in those species, where large livestock populations were specifically designed and established to accurately manage and record animals, particularly for difficult- or expensive-to-measure phenotypes, within well-designed contemporary groups to capture data for the traits of interest. As part of the design of these populations, great effort was expended to generate strong genetic linkages within and across contemporary groups of animals and across herds and flocks being evaluated, whether by the exchange of specific bulls and rams or by the use of specified AI sires in reference herds and flocks. To achieve the levels of accuracy required for these difficult- or expensive-to-measure traits, very large animal resource populations that have been accurately recorded for the particular trait are needed [8]. These populations are known as reference, resource, or nucleus populations and, to date, have all been established as part of large and well-funded research projects.
The first beef cattle and then sheep reference populations established in Australia were needed because, at the time of their establishment, there were no breed associations or breeding companies interested in or able to undertake genetic improvement based on objective performance data (cf. the traditional visual appraisal approaches common at that time), particularly for hard- or expensive-to-record traits. Examples of such populations in beef cattle in Australia are described by [31] for growth, feed efficiency and carcass and beef quality, and by [32,33] for the full range of productive and adaptive traits in the breeding objective. More recent examples include the "Repronomics®" project [34], which builds on the populations described in [32,33], and the Northern Genomics Project [35]. The Northern Genomics Project works with 54 collaborating herds across northern Australia (including those farmed in some very challenging environments), with 26,000 heifers and cows now genotyped and trait-recorded [35]. The collaborators and associated veterinarians collect data on cohorts of heifers in well-defined, and in some cases very large, contemporary groups. In most cases, the herds are mixed-breed, crossbred or tropical composites (these composites being admixtures of three or four breed types, i.e., Bos indicus, tropically adapted Bos taurus, temperate Bos taurus-British and temperate Bos taurus-European, with many of the composites having ancestry from six or more individual breeds, as described in detail by [31,32]). The traits include heifer puberty (based on ultrasound scans to determine if the heifers have cycled or not), weight, height, and body condition score at approximately 600 days, whether they are pregnant or not four months after calving (a re-breed trait), farmer-scored temperament, tick score and buffalo fly lesion score.
All traits, except heifer puberty and being pregnant or not four months after calving, are farmer-recorded following some minimal training on field days. Breed composition and Bos indicus percentage, derived from SNP marker predictions, were used in the models used to derive SNP prediction equations for the traits. The project has estimated genomic heritabilities that are similar to those produced from pedigree-recorded herds and has also validated useful accuracies of genomic estimated breeding values (GEBV) across breeds and composites [35]. The project clearly demonstrates that useful GEBV can be produced from data collected in commercial herds. However, a clear difference between these herds and those in LMICs is in contemporary group size and, as indicated above, this is really the key challenge when using data from smallholder herds in LMICs.
An additional study in the USA developed specific populations to record resistance/susceptibility to bovine respiratory disease in beef and dairy cattle [36]. Similar populations designed to capture data for a range of productive attributes in meat- and wool-yielding sheep in Australia are described by [37,38]. International efforts have been expended in creating an international resource population of dairy cows for feed intake records, collected in research herds [39].
Similar populations have been established more recently for smallholder dairy farmers in countries in sub-Saharan Africa and India through externally funded, highly participatory research programs, such as the ADGG project. These programs use information and communication technologies (ICT) to digitally capture and submit data that are sufficiently large for use in genetic evaluation [40,41]. In ADGG, the dairy cattle population designated for monthly monitoring and data capture involves animals located in sites from six regions of Tanzania and Ethiopia, covering the major agro-ecological zones in those countries. Therefore, genomic predictions based on the ADGG data can be used to select genetically superior animals for use across their respective countries.
In India, the BAIF Development Research Foundation has set up an excellent smartphone-based herd recording system for use by farmers and specialized milk recorders [42]. The availability of high-quality data has resulted in GEBV with moderate accuracy (~0.45) for some breed/cross-breed groupings of Indian dairy cattle [42].
Alternative CBBP have been established specifically for indigenous breeds of sheep and goats in Latin America, Africa and Asia, primarily supported by national governments in conjunction with local organizations. The implementation of CBBP combines genetic improvement programs with infrastructure, community and market development. Examples of CBBP in local sheep and goat breeds across several LMICs are described by [43][44][45][46][47][48]. Guidelines for establishing CBBP focused on small ruminants are provided by [49].
There are currently no known resource populations for smallholder beef cattle, primarily due to a lack of the significant funding necessary for their establishment and the significant length of time required to achieve genetic improvement in those herds, which have an average generation interval of 4-6 years. An attempt was made to establish linkages with populations in South Africa through the government-funded "Beef Genomics Project" that services commercial seedstock herds [50]. However, even in those seedstock herds in South Africa, challenges remain when recording the more difficult- or expensive-to-measure phenotypes [50]. Nevertheless, the existence of that population may, in the future, provide opportunities for smallholder beef farmers across Africa and potentially elsewhere to link with it, to drive genetic improvement programs in their own regions. This opportunity is discussed further in subsequent sections of this paper.
Opportunities to maximize the accuracy of genomic selection using multi-breed reference populations and multi-omic data are provided by [51], while another report provides guidelines to minimize the loss of genetic diversity through the use of reference populations [52]. The issue of loss of genetic diversity is of critical concern, particularly as it relates to the indigenous livestock breeds of many LMICs.
While the existence of these resource populations is currently providing significant opportunities for smallholder dairy cattle, sheep, and goats in a small number of LMICs, the greatest challenge is their sustainability on a longer-term basis. In high-income countries, the existing resource populations are in the process of being migrated from research funding to a variety of co-investment models; this will ultimately result in a model that is funded by the beneficiaries of that genetic improvement. A similar transition will ultimately be required for the small number of existing resource populations in LMICs, but how and when that will be achieved is still not clear. Meanwhile, the vast majority of LMICs have no access to the resources needed to even establish suitable resource populations to target the very significant economic, social and environmental benefits derived from the genomic selection of livestock in advanced economies.

Data Analyses and Estimation of Genomic Breeding Values
The basic model of best linear unbiased prediction (BLUP) evaluations [53] is:

$$Y \sim \text{mean} + \text{contemporary group} + \text{fixed effects} + \text{animal} + e \quad (1)$$

where "animal" is a random effect with $\text{animal} \sim N(0, A\sigma_A^2)$, $A$ is the relationship matrix derived from pedigree and $\sigma_A^2$ is the additive genetic variance. The contemporary group is commonly also fitted as a random effect, and $e$ is always a random effect. Fixed effects depend on the trait, for example, lactation number in dairy cattle or kill-day for beef-quality traits. In a genomic evaluation, the second model is very similar, the only difference being that $\text{animal} \sim N(0, G\sigma_G^2)$, with $G$ being the genomic relationship matrix among animals constructed from the SNP genotypes [54]. The animal solutions from BLUP in a genomic model are usually referred to as GEBV, and the model itself as GBLUP.
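To make the genomic relationship matrix concrete, the sketch below builds G from SNP allele counts for a toy data set. It assumes one widely used construction, in which genotypes are centred by twice the allele frequency and scaled by 2Σp(1-p); whether this is the exact construction used in [54] is an assumption, and the function name and genotypes are invented for illustration.

```python
def genomic_relationship_matrix(genotypes):
    """Build G from allele counts (0, 1, 2), one row per animal.

    Assumes G = W W' / (2 * sum(p_j * (1 - p_j))), where W holds the
    genotypes centred by twice the allele frequency p_j of each SNP.
    """
    n_animals = len(genotypes)
    n_snp = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the genotyped animals.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snp)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNP are monomorphic; G is undefined")
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snp)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale for k in range(n_animals)]
        for i in range(n_animals)
    ]

# Invented genotypes: four animals by four SNP.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 1, 0],
]
G = genomic_relationship_matrix(toy_genotypes)
```

Because G is computed from genotypes alone, it can be formed for animals with no recorded parents, which is the property that makes the genomic model usable in smallholder herds.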
A third model for evaluations combines information from animals with pedigrees and phenotypes but no genotypes, and from animals with pedigrees, phenotypes and genotypes, in a "single-step approach" [55]. In this approach, an $H$ relationship matrix describes the relationships among animals and replaces the $A$ in the first model, i.e., $\text{animal} \sim N(0, H\sigma_H^2)$. The $H$ matrix includes the elements of $A$ for non-genotyped animals and elements of $G$ combined with $A$ for genotyped animals. This model has been implemented very successfully in several developed countries for dairy cattle, dairy goats, and pigs [56].
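For orientation, single-step evaluations are usually described in terms of the inverse of H rather than H itself. The expression below is the form commonly given in the single-step literature and is shown here on the assumption that it matches the approach in [55]; the subscript 2 denotes the genotyped animals, so $A_{22}$ is the block of the pedigree relationship matrix for those animals.

```latex
H^{-1} = A^{-1} +
\begin{bmatrix}
0 & 0 \\
0 & G^{-1} - A_{22}^{-1}
\end{bmatrix}
```

Read this way, the genomic information simply replaces the pedigree relationships among genotyped animals, while non-genotyped animals are still linked to them through the pedigree.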
A major problem in the genetic analysis of data from smallholder systems is usually the lack of pedigree information. For instance, the genomic prediction in data from Tanzania [30] was based on model two described above and involved 1906 genotyped cows. However, only 226 of those cows had one or both parents known; this clearly underlines the importance of genotypic information in enabling prediction of genetic merit in smallholder systems, as the pedigree relationships are inadequately recorded.
However, under the ADGG program, the combined use of the genotypic and pedigree information that is increasingly becoming available provides hope for a better future. The use of the genomic matrix, derived from SNP information, to infer relationships among animals, and the application of model three, as described above, has enabled the estimation of genetic parameters and genomic prediction in smallholder systems [30,40].
In general, the methods currently used for genomic prediction in smallholder dairy systems include GBLUP, single-step procedures and various Bayesian methods (see [30] for a detailed review). However, most of the genomic prediction systems are based largely on females and small datasets, making it very difficult to adequately define separate reference and validation populations.
Consequently, most studies have used cross-validation approaches rather than forward validation [30]. However, some studies have applied forward validation or both validation approaches [30,57,58]. The validation accuracies are mostly of low to medium value (0.21 to 0.60) for milk yield, backfat thickness and rear eye area [58][59][60], but some high estimates (0.71 to 0.83) have been reported for bodyweight and other beef traits [61,62].
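The difference between the two validation designs can be sketched as follows. This is an illustrative outline only: the helper names, the use of birth year as the forward-validation cutoff and the toy numbers are assumptions, and accuracy is taken here simply as the correlation between predicted and observed values in the validation animals.

```python
import random

def pearson(x, y):
    """Pearson correlation, used here as the validation accuracy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cross_validation_folds(animal_ids, k, seed=1):
    """Randomly assign animals to k folds; each fold is validated in turn."""
    ids = list(animal_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]

def forward_split(birth_year_by_id, cutoff_year):
    """Older animals form the reference set, younger ones the validation set."""
    reference = [a for a, yr in birth_year_by_id.items() if yr < cutoff_year]
    validation = [a for a, yr in birth_year_by_id.items() if yr >= cutoff_year]
    return reference, validation

# Invented animals and birth years.
birth_years = {"a": 2014, "b": 2015, "c": 2016, "d": 2017, "e": 2018, "f": 2019}
folds = cross_validation_folds(birth_years, k=3)
reference, validation = forward_split(birth_years, cutoff_year=2018)

# Hypothetical predictions and adjusted phenotypes for four validation animals.
accuracy = pearson([0.2, 0.5, 0.9, 1.4], [0.1, 0.7, 0.8, 1.5])
```

Forward validation mimics how GEBV are used in practice, predicting young animals from older ones, but it needs enough animals on both sides of the cutoff, which is why small datasets tend to fall back on cross-validation.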
The complexities of recording accurate pedigrees in LMICs make implementing either the single-step model, or even the original pedigree-based model, rather unattractive and, as described above, make the pure genomic model very attractive. For the routine production of GEBV, an alternative to the GBLUP model may be useful: a model can be fitted that estimates SNP effects directly. For example, BayesR [63] fits the model:

$$Y \sim \text{mean} + \text{contemporary group} + \text{fixed effects} + Zg + e \quad (2)$$

where $g$ is the vector of SNP effects, with $g \sim N(0, I\sigma_i^2)$ and four possibilities for $\sigma_i^2$: $0$, $0.0001\,\sigma_g^2$, $0.001\,\sigma_g^2$ or $0.01\,\sigma_g^2$, where $\sigma_g^2$ is the genetic variance of the trait. That is, each SNP effect comes from one of four possible normal distributions: $N(0, 0 \times \sigma_g^2)$, $N(0, 0.0001\,\sigma_g^2)$, $N(0, 0.001\,\sigma_g^2)$ and $N(0, 0.01\,\sigma_g^2)$. Four distributions are used so that the marker effects can be moderate to large (e.g., in the case of DGAT1), small, very small or zero. $Z$ is the animal-by-marker genotype matrix.
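The four-distribution prior can be illustrated by drawing SNP effects from it. The sketch below covers the prior only, not the Gibbs sampler that estimates the effects; the mixing proportions passed as `class_probs` are invented for illustration (a full analysis would estimate them from the data), and the function name is hypothetical.

```python
import random

# Variance of each mixture class as a multiple of the genetic variance,
# as listed in the text: 0, 0.0001, 0.001 and 0.01.
CLASS_SCALES = (0.0, 0.0001, 0.001, 0.01)

def sample_snp_effects(n_snp, genetic_variance, class_probs, seed=0):
    """Draw SNP effects from the four-component normal mixture prior.

    class_probs (the mixing proportions) are hypothetical inputs here.
    """
    rng = random.Random(seed)
    effects, classes = [], []
    for _ in range(n_snp):
        k = rng.choices(range(4), weights=class_probs)[0]
        sd = (CLASS_SCALES[k] * genetic_variance) ** 0.5
        # A zero-variance class is a point mass at zero.
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
        classes.append(k)
    return effects, classes

effects, classes = sample_snp_effects(
    n_snp=1000, genetic_variance=1.0, class_probs=(0.95, 0.03, 0.015, 0.005)
)
n_nonzero = sum(1 for g in effects if g != 0.0)
```

With proportions like these, most SNP have exactly zero effect and only a handful fall in the largest-variance class, which is how the model accommodates both a few genes of sizeable effect and a polygenic background.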
It has been demonstrated that BayesR results in a higher accuracy of GEBV compared to GBLUP in multi-breed populations when high-density markers are used [64].
A major advantage of the BayesR approach for LMICs is that GEBV for new selection candidates can be computed very quickly and with limited computing power. GEBV for these new candidates (i.e., young sires not in the reference population) can be calculated as $\text{GEBV} = Z\hat{g}$, which takes seconds or, at worst, minutes to compute on a laptop with a reasonable amount of random-access memory (RAM). The vector $\hat{g}$, estimated by running BayesR with Gibbs sampling, for example [63], can be re-estimated on a high-performance laptop with a much larger RAM, or on a high-performance computer, as large numbers of new reference animals become available, for example once or twice per year. The $\hat{g}$ can then be passed on to the evaluation centers for rapid routine evaluations.
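The routine step described above amounts to one dot product per candidate, as the following sketch shows. The SNP effects, sire names and genotypes are invented, and the genotypes are used as raw allele counts; centring them with fixed allele frequencies would shift every GEBV by the same constant and leave the ranking unchanged.

```python
def predict_gebv(genotypes, snp_effects):
    """GEBV = Z g_hat: one dot product per selection candidate."""
    if any(len(row) != len(snp_effects) for row in genotypes):
        raise ValueError("each genotype row must match the number of SNP effects")
    return [sum(z * g for z, g in zip(row, snp_effects)) for row in genotypes]

# Hypothetical SNP effects estimated in the reference population.
g_hat = [0.5, -0.25, 0.0, 1.0]
# Allele counts (0, 1, 2) for two young sires that are not in the reference set.
candidates = {"sire_A": [2, 0, 1, 1], "sire_B": [0, 2, 2, 0]}
gebv = dict(zip(candidates, predict_gebv(list(candidates.values()), g_hat)))
ranked = sorted(gebv, key=gebv.get, reverse=True)
```

Only the vector of SNP effects needs to be distributed to an evaluation center, which keeps the data transfer and computing requirements small.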
The reference populations to derive the g_hat can include data from multiple countries, as demonstrated by [65], in order to expand the reference population and, therefore, make GEBV more accurate.
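The routine evaluation step described above, in which GEBV for new candidates are obtained by multiplying their genotypes by the estimated SNP effects, can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are assumed to be coded as allele counts (0, 1 or 2) without centering, and the genotypes, the SNP effects (`g_hat` in the code) and the function name are hypothetical.

```python
def predict_gebv(Z, g_hat):
    """Return GEBV = Z * g_hat for new selection candidates.

    Z     : list of genotype rows, one per candidate (ASSUMED 0/1/2 coding)
    g_hat : list of estimated SNP effects, one per marker
    """
    if any(len(row) != len(g_hat) for row in Z):
        raise ValueError("each genotype row needs one entry per SNP effect")
    # One dot product per candidate; no model fitting is needed at this stage.
    return [sum(z * g for z, g in zip(row, g_hat)) for row in Z]


# Hypothetical SNP effects received from the center that ran the estimation.
g_hat = [0.5, -0.2, 0.1]

# Hypothetical genotypes of two young candidates at the same three SNP.
Z_new = [
    [0, 1, 2],
    [2, 1, 0],
]

gebv = predict_gebv(Z_new, g_hat)
```

Because this step is a single matrix-vector product, its cost grows only with the number of candidates times the number of markers, which is consistent with the point that it can be run on modest hardware.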
In theory, the highest possible marker density should be used in genomic evaluations, particularly in multi-breed populations, as this allows the SNP in the highest linkage disequilibrium (LD) with the actual mutations to be used in the predictions, and this LD should persist across breeds. The ultimate solution would be to use whole-genome sequence data in the predictions, as this would allow the actual causative mutation to be used in the prediction equation, rather than relying on LD with a random SNP. One problem is that it is still too expensive to sequence the whole genome of all animals in the reference set. An alternative is to impute the reference set from their low-density markers (e.g., 50 k) up to the whole-genome sequence using the 1000 Bull Genomes database [66].
The outcome from using these imputed genotypes would, however, be an enormous prediction equation, for example one that is 43 million SNP long. The practical alternative that has been adopted in industry is to use an SNP panel with a relatively small number of putative causal mutations identified from sequence data (in genome-wide association studies (GWAS), for example) plus the standard panel of high-density SNP (e.g., the bovine HD array). This is much more computationally tractable and, in many cases, gives better accuracy than full sequence data [67]. Furthermore, including genome annotation information to focus on those regions more likely to harbor causal mutations can increase the accuracy of genomic predictions using sequence data [68].
It has become increasingly clear that pooling data across regions and countries is beneficial for increasing the accuracy of genomic predictions. Hence, one critical consideration during the design phase of any breeding program is the need for consistent trait definitions across the countries planning to share data, to ensure that animals in multiple populations are recorded for the same trait(s). Alternatively, where resource populations are being developed, they need to be large enough to allow an estimation of genetic correlations with indicator traits, if consistent recording of the same trait(s) cannot be achieved across all populations. Regardless, estimating these genomic correlations and genotype-by-environment interactions becomes more straightforward with genomic information, as what is required is observations of the traits/environments on common chromosome segments, rather than on the sires' progeny [69,70].

Infrastructure and Human Capacity
Two problems of major significance to smallholder farmers in LMICs are: (i) the lack of infrastructure required to undertake the on-farm management and phenotyping of animals, laboratory testing of animal samples, data capture and storage, and lack of computing facilities, etc.; and (ii) lack of human capacity, particularly in areas of technological capability, data analysis and interpretation.
The issue of data capture and storage is starting to be addressed through the use of portable devices that do not require on-site internet connection (e.g., mobile phones, tablets). However, in the absence of research projects that can assist with infrastructure development, many of these issues remain significant challenges to the implementation of livestock genetic improvement programs in these countries, since it is unclear how business models might develop for data-recording in most LMICs. One opportunity that is currently being explored in conjunction with ADGG is the possibility of developing a web interface that would enable data from livestock resource populations in countries and industries not currently serviced by ADGG to be uploaded to the ADGG platform, with genomic prediction then undertaken using the pipeline developed by ADGG at the International Livestock Research Institute in Nairobi. If that opportunity could be realized, other livestock industries and countries would not need to develop their own separate software or pipeline, thereby generating some efficiencies.
The recent launch of the African Animal Breeders' Network (AABN; http://animalbreeding-africa.org/, accessed 15 November 2021) is in direct response to the second issue relating to the lack of human capacity, with the aim of strengthening collaboration among academia, industry, farmers' organizations, the public sector, philanthropic organizations, and development agencies to drive the development and implementation of genetic improvement programs across the African continent. Professional development and capacity-building across all sectors of the livestock genetic improvement chain, from smallholder farmers to service providers and academics, are key pillars of AABN. Similar networks will be required in LMICs in other areas of the world, such as Asia, the Pacific, Central America and the Caribbean, to build capacity among livestock keepers and service providers in those areas.

The Value of National and International Collaborations
One key lesson from the successes of genomic selection in the livestock industries in advanced economies is that strong and effective multi-organizational, multi-disciplinary and, often, multi-national partnerships are key to that success. Such partnerships need to be inclusive of all sectors of the animal breeding chain, from farmers through to the service providers and researchers who provide decision-making recommendations to farmers and continue to improve the technologies being used in the processes. Such partnerships are likely to be even more critical for genetic improvement programs in LMICs, where generating sufficient animal data independently is unlikely to be feasible for decades to come. This need for effective partnerships is behind the establishment of the AABN, referred to in the previous section. The usefulness of such partnerships can also be demonstrated using a recent example presented by [59], where genomic breeding values for a very difficult- and expensive-to-measure trait (cattle resistance to ticks) were successfully estimated in relatively small numbers of beef cattle in unrelated cattle breeds in South Africa (Nguni) and Australia (Tropical Composites comprising different admixtures of four breed types and at least six individual breeds) through the use of larger phenotyped populations of Angus, Hereford, Braford and Brangus cattle in Brazil. In effect, the Brazilian herds became reference populations for cattle in South Africa and Australia. This suggests that a very viable solution for genetic improvement programs in LMICs would be to formally link resource populations and genetic evaluations in LMICs with livestock-breeding programs in more advanced economies, to enable the effective implementation of genomic selection across all countries. This type of collaboration may have the added benefit of partially overcoming the lack of laboratory infrastructure that is a common constraint in LMICs.

The Ability of Genomic Information to Mitigate These Challenges
As outlined in earlier sections of this paper, the availability of genomic information is now providing exciting new opportunities to identify genetically superior animals in smallholder herds and flocks in LMICs and, based on well-documented evidence from advanced economies, to simultaneously deliver very significant economic, social and environmental benefits to those smallholder farmers and the communities and countries where they live.
The major benefit of genomic selection derives from the ability of genomic information to replace the need for pedigree recording and, specifically, generating genetic linkages within and across herds and flocks that record the same phenotypes. Another important benefit from the use of genomic information is that fewer animals are required using genomic selection approaches to achieve accurate GEBVs relative to traditional genetic improvement programs because the chromosome segments that are shared among the breeds now provide genomic linkages across the different populations. This will be particularly important if data from smaller reference populations in LMICs can in the future be combined for analysis with data from larger reference populations in more advanced economies, as occurred in the example given by [59]. The use of genomic information also enhances decision-making in crossbreeding programs by providing accurate information on the breed composition of individual animals and, in doing so, also provides a mechanism for identifying indigenous breeds that require conservation.
In the future, there is good potential for genomic information to replace an animal's phenotype, not only through the identification of causal mutations and regions of the genome impacting on particular traits but also through the use of new "-omics" technologies, such as functional genomics, gene expression, transcriptomics, proteomics and metabolomics. This will most likely occur initially for difficult- or expensive-to-measure traits with very high economic impacts, with these new technologies delivering simpler and more cost-effective diagnostic tests for both animal management and genetic improvement purposes. In that scenario, instead of data being primarily recorded for management purposes, it may in the future be more useful to imagine data being collected specifically for genetic improvement in nucleus farms (either centrally or distributed). As such, these "phenotype farms" would have a requirement to generate genomically improved genetic material for distribution to smallholder farmers in LMICs.

Future Opportunities
In addition to the new or adapted uses of genomic information described above, several new opportunities will become available over the coming years to assist smallholder farmers to capture some of the well-documented and very significant economic, social and environmental benefits of genomic selection that are already achieved by livestock farmers in advanced economies. These opportunities include the increasing use and availability of digital and possibly automated data capture through, for example, spatial technologies such as high-resolution satellite imaging and unmanned aerial vehicles (drones), or by using solar-powered sensor networks to remotely capture livestock data, such as live weights, estrus or pregnancy status, using animal ear or neck tags. These technologies will allow the real-time tracking of animals and animal products, providing new phenotypes for genetic improvement programs as well as improving efficiencies and data collection across the entire supply chain. The opportunity to effectively capture and analyze "big data", including publicly available information such as geographical location and meteorological information, will also allow new levels of insight and development of decision support tools, such as apps for the use of farmers in both advanced economies and LMICs.
Potentially, the greatest opportunity for smallholder farmers to capture the benefits of genomic selection over the coming years will, however, be through expansion of the very small number of existing livestock resource populations and the development of new populations in other LMICs and other livestock industries not currently serviced by the existing genetic improvement platforms in those regions. Linking those existing and new resource populations through collaborations with livestock populations in advanced economies, as outlined in Section 6, will also generate strong benefits for LMICs.
An operational framework to establish new resource populations could be along the lines of the following:

• Resource populations could be formed at relevant new regional (e.g., national or possibly even multi-national) levels within livestock species, with those populations being managed overall with input from the smallholder farmers contributing the animals, but with the responsibility for technical areas (phenotyping, genotyping, data upload, etc.) being the remit of technicians with appropriate training;
• Initial funding would be required to cover the costs of designing the populations (to ensure local relevance) and establishing them, for phenotyping the animals for the full range of economically important traits for each of the species, and for genotyping them, although only selected animals would require the use of higher-density (and thus more expensive) SNP panels;
• Designing the resource populations should be undertaken in direct collaboration with established resource populations in other countries with similar environmental systems, to ensure compatibility of the populations for the future pooling of data for genetic evaluation purposes; in LMICs, this generally means collaborations with other resource populations that operate in tropical or sub-tropical environments;
• However, due to the assumed use of genomic information, the design of the new populations does not need to specifically generate genetic linkages across the different populations, nor is there a need to restrict the design to animals of the same breed, as demonstrated by [65];
• Capture of data from the new resource populations would be achieved electronically in the field using mobile devices and subsequently uploaded to the data platform when internet coverage is available;
• Assuming that the opportunity described in Section 5 can be realized, a web interface would be built to enable the ADGG portal to capture the data from the new resource populations, thereby avoiding the need for the new populations to develop separate software and pipelines. That portal would also provide appropriate analytical models to enable multiple-country genetic evaluations within species, with data ownership continuing to be retained by the farmers who own the livestock being evaluated. It would also become the permanent repository for data collected through those new populations, beyond the life of any research projects that initially fund the data collection, thereby overcoming another of the challenges raised earlier in this manuscript;
• Many of the countries lacking existing resource populations do have trained animal geneticists with an interest in having greater involvement in data analysis. Ongoing capacity-building of those and other interested people through the AABN (also described in Section 5) would ensure that the analytical models are directly relevant to the countries or regions where they are employed;
• The animal geneticists identified to manage the new resource populations and to undertake the genetic evaluations would be encouraged to directly collaborate with researchers in other countries, to undertake cross-country genetic evaluations that generate value for farmers across all collaborating countries;
• In countries where artificial breeding centers do not exist, such centers would be established either by governments or the private sector to collect germplasm from animals proven to be genetically superior, with the germplasm (most likely semen) being made available to smallholder farmers, thereby enabling genetic improvement of their herds/flocks;
• Over time, the beneficiaries of the genetic improvement program would be expected to contribute to the ongoing costs of maintaining the genetic improvement programs, as is now occurring in Australia to sustain the earlier research-funded populations (e.g., livestock producers paying to have their own sires evaluated through the resource populations, by contributions to the costs of the resource populations or even, in some cases, establishing populations in their own herds or flocks);
• Establishing new resource populations using these guidelines would enable smallholder farmers to directly capture the benefits of genetic improvement through the use of genetically superior breeding animals but without the need to understand the complexities or overcome the major challenges of new technologies (e.g., hardware incompatibility; complexity; language barriers; lack of electricity, computers, internet access, etc.) that have proven to be major barriers to adoption in LMICs, as described by [71].
However, the expansion of existing resource populations and the development of new populations are entirely dependent on the availability of new funding for this purpose, and where that funding will come from is not at all clear. A recent presentation [72] comparing the public acceptance of biotechnologies such as genetic engineering and gene editing with that of genomic selection highlighted this major difficulty, indicating: "There are glaring disparities when it comes to the implementation of genomic selection in the developing world . . . it is expensive to develop large populations of genotyped, phenotyped animals. It is not a scale-neutral technology, advantaging large breeds and genetic providers over small ones. Such inequality concerns would derail a genetic engineering application, yet these concerns are rarely even discussed as it relates to genomic selection . . . ". Therefore, perhaps the greatest opportunity to secure the proven and very significant economic, social and environmental benefits of genomic selection for smallholder farmers in LMICs is to attempt to engage a range of government, non-government and philanthropic organizations to give priority to improving the rates of genetic gain in livestock farmed by smallholders in those countries.