Whole-Genome Sequencing and Characterization of Buffalo Genetic Resources: Recent Advances and Future Challenges

Simple Summary Recent advancements in high-throughput technologies like whole-genome sequencing, genome-wide association study (GWAS), gene expression profiling, next-generation sequencing (RNA and DNA), and genome-wide CHIP-seq scanning are used to detect genetic variants and study gene regulation, gene functioning, and single nucleotide polymorphism (SNP) ordering resources. These techniques offer a wide range of whole-genome data and high coverage to genomic, epigenomic, transcriptomic, and proteomic information related to cellular interactions, functioning, and behavior. In buffaloes, candidate gene studies use the available genetic resources to uncover the functional candidate genes and their interactions associated with buffalo productivity, including production, adaptation, and disease resistance. Thus, the whole-genome and candidate gene approach to next-generation data could help elucidate the inheritance of complex traits, full genomic coverage, and the genetic dissection of productivity-related attributes, which could ultimately help accelerate genetic progress in buffaloes. Abstract The buffalo was domesticated around 3000–6000 years ago and has substantial economic significance as a meat, dairy, and draught animal. The buffalo has remained underutilized in terms of the development of a well-annotated and assembled reference genome de novo. It is mandatory to explore the genetic architecture of a species to understand the biology that helps to manage its genetic variability, which is ultimately used for selective breeding and genomic selection. Morphological and molecular data have revealed that the swamp buffalo population has strong geographical genomic diversity with low gene flow but strong phenotypic consistency, while the river buffalo population has higher phenotypic diversity with a weak phylogeographic structure. The availability of recent high-quality reference genome and genotyping marker panels has invigorated many genome-based studies on evolutionary history, genetic diversity, functional elements, and performance traits. The increasing molecular knowledge syndicate with selective breeding should pave the way for genetic improvement in the climatic resilience, disease resistance, and production performance of water buffalo populations globally.


Introduction
The buffalo was domesticated around 3000-6000 years ago and has substantial economic significance as a meat, dairy, and draught animal [1][2][3]. This species is stereotypically distributed across wet grasslands, tropical and subtropical forests, swamps, and marshes. Though buffaloes are terrestrial mammals, they spend much of their time wallowing in rivers or mud. Wallowing is a comfort behavior that not only helps the animals to keep cool but also protects them from insect bites. Typically, buffaloes are inhabitants of mire holes, to keep cool but also protects them from insect bites. Typically, buffaloes are inhabitants of mire holes, rivers, streams, trees, and tall grasses that provide enough food, water, and coverage [2]. Bubalus is thought to have spread from Europe to southern Asia in the Pleistocene epoch, but a progressive dry climate became the ultimate reason for contracting its distribution area to Indonesia, India, and Southeast Asia. It is believed that the buffaloes were brought to Italy from central Europe in the 6th century or from the Gulf of Tunis in the 7th century, along with Arab conquests. There is, however, recent evidence of the introduction of buffaloes into Africa, America, and Australia [1,3].
The worldwide buffalo population is approximately 207 million heads, with a distribution of more than 200 million (97%) in Asia, 6.831 million (2%) in Africa (predominantly Egypt), 1.449 million (0.7%) in South America, and less than 0.144 million (0.2%) in Australia and Europe, as shown in Figure 1 [4]. Buffaloes constitute a littl more than 11.1% of the global bovid population, but larger human communities depend on buffaloes for their livelihood as compared to any other domesticated species [1]. Globally, in the last 20 years, about a 2% annual increase in the buffalo population has been observed. Buffaloes contribute about 13% of the global milk supply [5]. Buffalo milk contains a lower amount water but higher amounts of protein, fat, minerals, and lactose as compared to cow's milk [2,4]. Buffalo milk is considered more suitable for the production of butter, cheese, and other high-quality dairy products. Buffalo meat is lean with lower cholesterol and fat contents than beef and comparatively better taste [2]. In waterlogged conditions, like rice paddy fields, buffaloes are considered superior draught animals and have been frequently used for ploughing and can drag heavier loads than other cattle. Furthermore, their hide is quite useful in making a variety of leather products [2,4]. Buffalo manure is used as a natural fertilizer, which supplements the soil with organic matter and essential elements and successfully reduces the need for chemical fertilizers. Most importantly, in villages, the small farmers and poor people prefer to rear buffaloes owing to their strong ability to Buffalo milk is considered more suitable for the production of butter, cheese, and other high-quality dairy products. Buffalo meat is lean with lower cholesterol and fat contents than beef and comparatively better taste [2]. In waterlogged conditions, like rice paddy fields, buffaloes are considered superior draught animals and have been frequently used for ploughing and can drag heavier loads than other cattle. Furthermore, their hide is quite useful in making a variety of leather products [2,4]. Buffalo manure is used as a natural fertilizer, which supplements the soil with organic matter and essential elements and successfully reduces the need for chemical fertilizers. Most importantly, in villages, the small farmers and poor people prefer to rear buffaloes owing to their strong ability to efficiently use low-quality and less digestible roughage than other ruminants, which makes them easy to raise on locally available crop residues. Additionally, buffaloes are considered lucrative assets and ready cash, particularly for landless and smallholder families [6].
Buffaloes are a comparatively resilient and thrifty animal, but it is also vulnerable to parasites and diseases that affect cattle, including tuberculosis, piroplasmosis, brucellosis, trypanosomiasis, and rinderpest [6]. Owing to wallowing behavior, buffaloes are less susceptible to ectoparasites, like ticks, that cause diseases such as babesiosis, anaplasmosis, and theileriasis [4] and also resist the screwworm fly, which is the main pest of farm animals. After a buffalo has wallowed, the mud covering the buffalo's body coat suffocates the screwworm fly larvae, ultimately providing protection from screwworm fly larval infection [6]. Though wallowing behavior helps to resist some parasites, it increases susceptibility to liver fluke, as buffaloes are more vulnerable to the waterborne stage of liver fluke. Furthermore, buffaloes are less affected by mastitis than dairy cattle due to wallowing behavior [6].
Buffaloes are adaptable and productive domestic animals, having national and international importance and contributing significantly to the rural economy of many countries [7]. As a species, buffaloes have remained underutilized despite having many comparative advantages, as mentioned above. The major reasons behind this underutilization include many problems and challenges, such as a higher incidence of infertility, deprived reproductive efficiency, poor production potential, and lesser calf survival rates [8]. In spite of their major contribution to the livestock sector, the development of buffaloes as a dairy producer is handicapped, mainly due to delayed puberty [9], poor estrus expression [10], a distinct seasonal reproductive pattern [11,12], and prolonged calving intervals [13]. Moreover, quite a few calving disorders negatively affect the reproductive and productive performance of buffaloes [14].
This scenario requires intensive research work on buffalo genetic resources to improve and further develop available buffalo germplasm to enhance buffalo production and effective use. Genome-based research has created a wide array for endorsing and employing gene technologies in different areas of livestock production. Genome biotechnology provides an opportunity to develop sustainable animal production systems by manipulating the intra-and interbreed genetic diversity. Genomic characterization is a preliminary step required to differentiate molecular phenotypes and predict breeding values to screen superior mates to produce improved progeny. Buffaloes are a globally important genetic resource that need to be characterized at a genomic level, which necessitates the development of a well-annotated and assembled reference genome de novo for buffaloes because it is mandatory to explore genetics and understand biology to devise breeding strategies and perform the genomic selection.
The present review focuses on providing comprehensive insight into the available genomic resources and genome re-sequencing efforts related to buffaloes aimed to better understand buffalo physiology, which would help to optimize reproduction efficiency, production potentials, nutritional value, and product quality.

Chromosomal Array of Asian Buffaloes
The chromosomal array presents 25 pairs of chromosomes in river buffaloes (Bubalus bubalis bubalis), whereas swamp buffaloes (Bubalus bubalis carabanesis) have 24 chromosome pairs. Genetic diversity in the two buffalo subspecies, the swamp buffalo and the water or river buffalo, is spotted by the fusion of chromosomes 4 and 9 to chromosome 1 in swamp buffalos, while all the other chromosomes and chromosomal arms are conserved in both subspecies ( Figure 2) [17]. A recent study demonstrated that buffalo submetacentric chromosomes are a centric fusion of ancestral acrocentric chromosomes [18]. Furthermore, the hybrid of the two subspecies is fertile but has 49 chromosomes and in subsequent mating has lower reproductive capability. All the chromosomal pairs, including the sex chromosomes, are acrocentric in river and swamp buffaloes except for the first five chromosomal pairs, which are bi-armed ( Figure 2). Various studies have reported that domestic cattle and buffaloes, members of the Bovidae family, are closely linked. The corresponding chromosome in buffaloes and cattle can be matched from arm to arm ( Figure 2) as cytogenetic characterization presents chromosomal banding and gene ordering homology in both members ( Figure 2) [17,19].

Chromosomal Array of African Buffaloes
Cytogenetic studies have demonstrated that S.c. caffer has 26 pairs of chromosomes and S.c. nanus possesses 27 chromosome pairs. Both species can interbreed, while their offspring have 53 chromosomes, with a lower fertility rate due to gamete imbalance, which results in abnormal zygote development [17]. The major difference between the two African buffalo subspecies is the presence of three bi-armed chromosomes in S.c. nanus whereas S.c. caffer has four chromosomes. All the rest of the chromosomes, plus the sex chromosomes, in both subspecies are acrocentric. These bi-armed chromosomal pairs in S.c. caffer were created through the fusion of cattle chromosomes 1 and 13, 2 and 3, 5 and 20, and 11 and 29 [23].
Additionally, no bi-armed chromosome pair sharing between Bubalus and Syncerus is detected, which suggests that there are no crosses between these two genera and if it happens, the resulting hybrid would have a deranged set of chromosomes. Consequently, the chromosome morphology categorizes each as a separate genus [17].

Sex Chromosomes
A high degree of homology has been revealed in different studies among bovid autosomal chromosomes, chromosomal arms, banding patterns, and the order of genes in buffaloes, cattle, goats, and sheep [24,25]. The bovid sex chromosomes have a more complex sequence rearrangement as compared to the very similar autosomal chromosomes For instance, there are 29 acrocentric chromosomal pairs together with a pair of sex chromosomes (XY) in the cattle genome, whereas the buffalo genome consists of 19 acrocentric and 5 bi-armed chromosomes, in addition to a pair of sex chromosomes (XY) ( Figure 2). In cattle, two acrocentric chromosomes fuse to form the corresponding first five bi-armed chromosomal pairs of buffaloes (i.e., chromosomes 1 and 27 fuse to form buffalo chromosome 1, 2 and 23 fuse to form 2, 8 and 19 fuse to form 3, 5 and 28 fuse to form 4, and 16 and 29 fuse to form 5). All other buffalo chromosomes correspond to each of the remaining cattle chromosomes [18][19][20][21][22]. Synteny sequence confirmed the five chromosomal fusion events in cattle and river buffaloes (resulting in five bi-armed autosomal pairs of chromosomes), as shown in Figure 2A. Precisely, the chromosomal fusion of river buffalo chromosomes 4 and 9 results in the longest chromosome in swamp buffaloes ( Figure 2A) [21].

Chromosomal Array of African Buffaloes
Cytogenetic studies have demonstrated that S.c. caffer has 26 pairs of chromosomes and S.c. nanus possesses 27 chromosome pairs. Both species can interbreed, while their offspring have 53 chromosomes, with a lower fertility rate due to gamete imbalance, which results in abnormal zygote development [17]. The major difference between the two African buffalo subspecies is the presence of three bi-armed chromosomes in S.c. nanus whereas S.c. caffer has four chromosomes. All the rest of the chromosomes, plus the sex chromosomes, in both subspecies are acrocentric. These bi-armed chromosomal pairs in S.c. caffer were created through the fusion of cattle chromosomes 1 and 13, 2 and 3, 5 and 20, and 11 and 29 [23].
Additionally, no bi-armed chromosome pair sharing between Bubalus and Syncerus is detected, which suggests that there are no crosses between these two genera and if it happens, the resulting hybrid would have a deranged set of chromosomes. Consequently, the chromosome morphology categorizes each as a separate genus [17].

Sex Chromosomes
A high degree of homology has been revealed in different studies among bovid autosomal chromosomes, chromosomal arms, banding patterns, and the order of genes in buffaloes, cattle, goats, and sheep [24,25]. The bovid sex chromosomes have a more complex sequence rearrangement as compared to the very similar autosomal chromosomes [17]. Comparing the chromosomal bands that display conserved portions of these chromosomes, large blocks of constitutive heterochromatin are in X-chromosomes of buffaloes while cattle lack this. Cytogenetic studies have illustrated complex rearrangements of loci order on sex chromosomes that appeared in the karyotype evolution of buffaloes and cattle. X-chromosomes of both buffaloes and cattle have a variable position of the centromere with the same gene order ( Figure 2). This indicates the constitutive heterochromatin loss in cattle that distinguished it from buffaloes and, eventually, the centromere translocation event [17]. Comparative FISH mapping demonstrated the analogous positions of the Y-chromosomes in buffaloes and cattle. The differences observed in the Y-chromosomes of buffaloes and cattle are pericentric inversion (an inversion with the centromere and both arm breakage points) and that the Y-chromosome of buffaloes has more heterochromatin and is larger than the Y-chromosome of cattle [17].

Evolution and Domestication of Buffaloes
Previous studies have reported that the divergence time of swamp and river buffaloes ranges between 10,000 years and 1.7 million years [1,[26][27][28][29][30][31][32][33]. This variation is possibly due to the differences in the population size, source of samples, mitochondrial DNA (mtDNA) sequences, and rate of mutation. Whole mitogenomes have been employed to estimate the time of divergence, which ranged from 860,000 to 900,000 years [34], which is perhaps overestimated as population splitting time may have been much earlier based on mtDNA divergence [35]. Recently, a whole-genome study evaluated the divergence time between the river and swamp buffaloes by using Ks distribution and MCMC tree approaches and reported the estimated divergent time as 1.1 MYA and 3.5 MYA, respectively [21].
However, it is interesting to hypothesize that the estimated divergence time is the period when the Arakan Mountain system (east and west) separated the panmictic population into two regions; subsequently, chromosomal translocation occurred and diffused through the population of eastern B. arnee. Lau et al. [30] speculated that the B. arnee species originated in Southeast Asia mainland and afterward spread to the west Indian subcontinent, where river-type animals evolved. Instantly, the 4p/9 chromosome admixture occurred in the Southeast Asian mainland population. The Pleistocene glacial events affected the ancestral population of swamp buffaloes, which led to a decline in population size and development of remote refugia [34] and probably the river buffalo ancestors were also affected similarly.
During Holocene, the postglacial period (6000-11,000 years), the population size expanded and genetic data have indicated the independent event of domestication of swamp buffaloes in Southeast Asia and the river types in the Indian subcontinent [3,21,[30][31][32]36,37]. Molecular data provide the evidence that the divergence of swamp and river buffaloes preceded their domestication [36,38], where the swamp buffalo was domesticated in the Indo-China bordering area about 3000-7000 years ago and the river buffalo around 6300 years ago in the west of the Indian subcontinent [34,39,40]. Rapid population expansion and geographical distribution during the postglacial phase (Holocene) after domestication occurred due to climate improvement [34,41]. Historical and archaeological data have proved the westward migration of the river buffalo from its domestication epicenter to areas as far away as the Balkans, Italy, and Egypt [42].
Studies have confirmed this geographical expansion of buffaloes using molecular data, including SNPs and mtDNA [38], and archaeological evidence [37]. Generally, it looks like successive events of migration occurred at a geographical scale at different times. However, the buffalo population prevalent in the geologically adjoining areas of Pakistan and Iran are genetically different, as the population of Iran is closer to that of Turkey and Egypt. The Egyptian and Italian buffalo populations are also different from each other. Thus, Zeuner's [43] suggestion looks promising that the westward migration remained discontinuous, late, and slow. Colli et al. [37] proposed two independent events of migration that are well matched with population genetic diversity: the first one the proto-Mediterranean gene pool of Italy that came through the Balkans and the second the proto-Middle Eastern gene pool toward the Caspian Sea and Mesopotamia, later extended to Egypt and Turkey (Figure 3) with the expansion of Islam, as proposed by Unal et al. [44].  Microsatellite, mtDNA and skeletal fossils data of the domesticated buffalo suggest that Indo-China, south China, and north Thailand are the regions where swamp buffaloes were domesticated and spread to the Indonesian islands (Sulawesi, Sumatra, and Java) through Malaysia and then through northeast/north to central China and Philippines and Borneo through the eastern island route via Taiwan [39,40,46].
Sun et al. [47] described Y-chromosome-based distinctive riverian buffalo lineages with a base group (YR) unique to Pakistan and Indian breeds. The YR1 haplotype was a typical inhabitant of south Asia, but YR2 was only prevalent in Italy. The previous studybased scenarios indicated the likelihood early domestication of river buffaloes in the Indo-Pakistan region, followed by their subsequent migration to Europe [37]. Afterward, the YR1 group remained in south Asia, though it still is extremely diffused, while the YR2 buffalo group became unique to Italy. In the case of swamp buffaloes, the haplogroup YS2 Microsatellite, mtDNA and skeletal fossils data of the domesticated buffalo suggest that Indo-China, south China, and north Thailand are the regions where swamp buffaloes were domesticated and spread to the Indonesian islands (Sulawesi, Sumatra, and Java) through Malaysia and then through northeast/north to central China and Philippines and Borneo through the eastern island route via Taiwan [39,40,46].
Sun et al. [47] described Y-chromosome-based distinctive riverian buffalo lineages with a base group (YR) unique to Pakistan and Indian breeds. The YR1 haplotype was a typical inhabitant of south Asia, but YR2 was only prevalent in Italy. The previous study-based scenarios indicated the likelihood early domestication of river buffaloes in the Indo-Pakistan region, followed by their subsequent migration to Europe [37]. Afterward, the YR1 group remained in south Asia, though it still is extremely diffused, while the YR2 buffalo group became unique to Italy. In the case of swamp buffaloes, the haplogroup YS2 is distributed across Southeast Asia and southwest China (84.62%), whereas the YS1 buffaloes are dominating in the lower-middle and upper regions of Yangtze (76.09%). The early divergence of YS2 in the phylogeny indicates that the swamp buffalo population traveled from the south to the northern regions and YS1 experienced population expansion [47]. From the swamp buffalo mitochondrial gene pool, a pattern of geographical events has also been identified. Taking together the frequencies of swamp buffalo uniparental haplogroups as represented in Figure 4, a correlation between mtDNA and Y-chromosome haplogroups, i.e., between SA and YS1 and between SB and YS2 is observed, which points out the similarities between paternal and maternal histories. chromosome haplogroups, i.e., between SA and YS1 and between SB and YS2 is observed, which points out the similarities between paternal and maternal histories. Mitochondrial genome-based reports anticipated two sub-populations of swamp buffaloes [39], while the whole-nuclear-genome study suggested a monomorphic population of swamp buffaloes [21]. Other domesticated animals, including pigs, horses, and dogs, also followed the same phenomenon, representing hybridization in the past [48]. Furthermore, the time of divergence between the river and swamp buffaloes determined by Luo et al. [21] overlapped well-known geographical events, providing a hypothetical explanation for the forces responsible for this divergence. Particularly, in the Xixiabangma Mitochondrial genome-based reports anticipated two sub-populations of swamp buffaloes [39], while the whole-nuclear-genome study suggested a monomorphic population of swamp buffaloes [21]. Other domesticated animals, including pigs, horses, and dogs, also followed the same phenomenon, representing hybridization in the past [48]. Furthermore, the time of divergence between the river and swamp buffaloes determined by Luo et al. [21] overlapped well-known geographical events, providing a hypothetical explanation for the forces responsible for this divergence. Particularly, in the Xixiabangma glacial era, the descending sea-level created a migration passage for ancient buffaloes to cross the India-Myanmar bordering area, so initiating geographical separation and facilitating the fixation of genetic polymorphisms and chromosomal number variation into the genomes of two isolated populations. So, buffalo migration patterns, i.e., in Yunnan, Laos, and Myanmar, developed a hybrid zone harboring a genetic admixture of river and swamp buffaloes. This proposed scenario indicates a higher flow of genes among the buffalo population of the Southeast Asian region [21].

Recent Advances in Whole-Genome Sequencing of the Buffalo Genome
Buffaloes remained underutilized despite their excellent genetic potential. To make the best use of the buffalo, it is imperative to characterize genetic diversity and study its genomic architecture through high-throughput technologies. During the last five years, marvelous advancement in whole-genome sequencing of buffaloes has been made. As of March 14, 2021, a total of five annotated buffalo genome assemblies had been deposited in the National Center for Biotechnology Information (NCBI) [49]. A total of 1,921,573 nucleotide and 34,831 water buffalo gene sequences, including 471 mitochondrial sequences, have been submitted in the GenBank database. The nucleotide sequences deposited in GenBank include 273,061 whole-genome shotgun sequences, 538 microsatellite, 490 minisatellite, and 503 satellite sequences, while other sequences mainly consist of cytosine-phosphate-guanine (CpG) island, gene-like sequence cis-acting regulator, and repeat sequence and there are some additional genomic sequences.
Mintoo et al. [50] re-sequenced and assembled the river buffalo genome (2n = 50) from Bangladesh. The size of the genome was 2.77 Gb, with a scaffold N50 of 6.9 Mbp and contig N50 of 25 kb. From the assembled genome, 24,613 genes were annotated for further functional genomic studies. The assembled genomes were comparable with those of the water buffalo (Mediterranean) and the African buffalo. They used two different strategies to evaluate the genome completeness. Firstly, they downloaded EST/mRNA sequences from the NCBI and aligned them with the assembled genome by BLAST [51], which provided well-aligned data of approximately 98.15%. Secondly, they employed benchmarking against universal single-copy orthologs (BUSCO 2.0), which gave 94.3% core genes coverage from the assembled genome. More CpG island sequences were identified in the water buffalo genome (39,578) as compared to the cattle genome (12,120) [52]. This variation is attributed to the differential rate of recombination and chromosome size of these two species [53]. Mintoo et al. [50] stated segmental duplication of the buffalo genome, 94.5 Mbp which is comparable with about 94.4 Mbp of a previously reported cattle sequence [54], suggesting an event of duplication in the last common ancestor of cattle and water buffaloes [55]. In water buffaloes, 51.19% of the genome accounts for repetitive DNA (1418 Mbp) sequences, which is~13% higher than that of African buffaloes (37.21%) [56]. Notably, 49.06% of the genome were transposable elements (TEs), of which 41.50% accounted for long interspersed nuclear elements (LINEs) as the core TE component. Additionally, 38,483 transfer RNAs (tRNAs), 867 ribosomal RNAs (rRNAs), 1758 smalls nuclear RNAs (snRNAs), and 23,310 microRNAs (miRNAs) were annotated in the water buffalo genome. Moreover, expansion of 159 gene families was observed substantially in water buffaloes in contrast to other mammals [50]. A study also identified 382 candidate genes with positive selection sites that might play a role in the development of climatic adaptability in water buffaloes in different environments [50]. Mostly these genes were annotated to the metabolic pathways, the immune system functional pathway, and signal transduction pathways [50].
Recently, Dutta et al. [57] re-sequenced the genomes of 73 animals from six distinct buffalo breeds (Bhadawari, Banni, Murrah, Jaffarabadi, Surti, and Pandharpuri) to evaluate the genetic diversity of the water buffalo population in India by comparing them with 6 Mediterranean buffaloes as a distinct outgroup. Half of the animals from each of the breeds were sequenced at an average of 8× and the remainder at 37×, while all the Mediterranean buffaloes were sequenced at the coverage of a mean depth of 36×. The subsequent sequences of DNA were aligned with chromosome-level high-quality reference assembly of the water buffaloes (UOA_WB_1). Here, they identified 5,897,230 short insertions/deletions and 37,682,631 single nucleotide variants (SNVs). After the removal of low-quality variants, a final set of biallelic SNVs (26,247,559) was perceived of which 25,513,085 were autosomal [57].
The most comprehensive study on the buffalo genome to date re-sequenced 230 individuals (98 river buffaloes and 132 swamp buffaloes) across Europe and Asia [21]. Multiple approaches were used for sequencing and assembling the river and swamp buffalo reference genomes. The genomes of one Fuzhong swamp buffalo (female) and one river buffalo (Murrah female) were sequenced, and PacBio assemblies were constructed into contig N50 sizes of 3.1 Mb and 8.8 Mb for river and swamp buffaloes, respectively. The Illumina high-throughput/high-resolution chromosome conformation capture sequencing (Hi-C) was used to develop the scaffolds from contigs. These were developed into chromosomallevel scaffolds with N50 sizes of 116.1/117.3 Mb, the largest one 198.8/269.09 Mb for river/swamp buffaloes. The final assemblies covered 25 chromosomes, with 96.5% genome coverage, of the river buffalo, having an error rate of 2.13 × 10 −5 , and 24 chromosomes of the swamp buffalo, with 97.5% genome coverage, and an error rate of 9.22 × 10 −6 . The coverage and mapping rates for both buffaloes from Illumina reads reached levels >99%. Further, the repetitive sequences in the genomes of river and swamp buffaloes were 46.37% and 46.62%, respectively (Table 1), with a similar percentage of TE and ruminant-specific repeats as observed for cattle. No obvious chromosome fusion, breakpoint, or gene fusion events other than 4p/9 chromosome fusion were detected. Luo et al. [21] reported 20,202 and 19,279 gene models corresponding to river and swamp buffaloes by using repeat-masked genomes. BUSCO evaluations specified high-quality genomes with gene structure predictions representing 96%/96.8% completeness (Table 1) and identified 17,890 gene families. They also reported 78 and 99 positive selection genes in river and swamp buffaloes, respectively. In comparison to the Bos genus, Bubalus displayed more gene family contraction events (2117 vs. 1022) and fewer gene family expansion events (112 vs. 148). In the case of the two buffalo subspecies, 261 vs. 179 gene expansion events were identified in river and swamp buffaloes, respectively, which may have a functional effect related to the phenotypes of both buffalo subspecies [21].

Identification of Genes Affecting Important Buffalo Traits
Unraveling the possible role of genetic variants in and their physiological effects on phenotypic traits is the major challenge for genetics. To date, different studies have explored the association of genotypic variants in buffaloes, which are considerably related to heat stress, reproduction, behavior, coat color, and production traits (i.e., milk).

Heat Shock Protein Genes (HSPs)
Globally, heat stresses badly affect livestock production. HSPs are conserved molecular chaperones critical for protein maturation, refolding, and degradation, and in farm animals, including cattle and buffaloes, HSPs not only develop thermotolerance but also act as potential biological markers considered to measure the extent of heat stress in livestock animals [58]. Under stress conditions, HSPs are crucial for survival, protein homeostasis, and cellular responses. Molecular mass-based classification characterizes HSPs into HSPB1, HSPD, HSPH1, HSP40, HSP10, HSP70, and HSP90 gene families. A total of 64 HSP genes were reported in buffaloes, of which 39 genes belong to the HSP40 family; 4 to HSP90; 8 to HSPB; 10 to HSP70; and 1 each to HSPH1, HSP10, and HSPD [59]. In thermal stress conditions, the expression of HSPs distinctively increases to enhance the thermotolerance ability and serve as the first line of defense against heat shock to protect cells and tissues [60,61].
HSPs are known to have a crucial role in the regulation of immune response in buffaloes, as HSP-derived lymphocyte peptides are required to initiate/trigger the innate and adaptive immune responses [62]. The functional association of HSPs with reproductive efficiency has also been identified in different bovine species as variation in cattle HSP40 genes has been associated with early in vitro embryonic development. Likewise, polymorphism in the HSP70 family of cattle has also shown a potential association with observed differences in reproductive performance [63]. In different biological processes, HSPs (particularly HSP40 and HSP70) have exhibited their distinctive roles in several pathological disorders, such as neurodegeneration, muscular dystrophy, viral infectious diseases, and cancer [64][65][66]. HSP90, through its conformation and stability, is involved in stimulating oncogenic proteins. Thus, such oncogenic signaling pathways can be suppressed via the inhibition of HSP90 [65,66]. After heat shocks, the subsequent induction of HSP70 is effective in the regulation of different physiological parameters, i.e., pulse, respiratory rates, and rectal temperature, of the animal [67].
Moreover, the SNPs in the genic, untranslated, and promoter regions of HSPs have shown significant association with higher milk production, heat tolerance, stress resilience, and disease vulnerability in livestock [21,[68][69][70]. Furthermore, a recent whole-genome sequence-based study reported evidence of positive selection related to thermotolerance in buffaloes [21,59]. Luo et al. [21] reported HSP gene family expansion in both buffalo subspecies (river and swamp), which are stress-inducible molecular chaperones and responsible for environmental adaptation in buffaloes.

Reproductive Physiology-Related Genes
Buffalo productivity is primarily attributed to reproductive efficiency [71]. Different studies have associated fertility with potential genetic variants in both females and males [72]. Recently, in swamp buffaloes, the gonadotropin-releasing hormone receptor (GnRHR) and luteinizing hormone beta polypeptide (LHB) have been identified to be associated with semen quality parameters, including higher sperm count and ejaculate volume [73,74]. In the cytochrome P450 aromatase (CYP19A1) gene, four SNPs were identified, three in exon 10 and one in 5 UTR, which were associated with the fertility of Egyptian female river buffaloes [75]. Similarly, the Murrah buffaloes treated with melatonin exhibited variable rates of pregnancy due to melatonin receptor 1A (MTNR1A) gene diversity. Particularly, in animals having genotype TT in exon II (812 bp fragment) at position 72, the gene showed a high conception rate and estrus activity soon after the beginning of melatonin treatment [76]. A recent genome-wide association study (GWAS) detected and annotated 436 SNPs along with 34 indels in fertility-related 38 candidate genes of Murrah buffaloes [77].
Moreover, the candidate genes involved in the reproductive physiology of buffaloes at different stages are as follows: For age at first calving (AFC), interferon gene, including interferon-Tau (IFN-TAU), production in the early embryonic phase is the sign associated with maternal pregnancy confirmation in buffaloes [78]. The SNPs in the X-chromosomemapped SELP gene are significantly linked with AFC as their expression levels could have considerable effects on conception outcomes [79][80][81]. For calving interval (CI), the TPCNI gene is involved in the spermatozoa acrosomal reaction that is necessary for fertilization and is interesting to study in view that male fertility might be related to herd performance as compared to female fertility-related genes. The fertilization ability of bulls in a population could be studied based on increased conception rates and decreased CI [82]. For number of services per conception (NSC), the increased expression of the ABCC4 gene in the endometrium of pregnant cattle [83] and pigs [84] seems to act on prostaglandin efflux from cells and had a significant role in supporting pregnancy [83].
Buffaloes are a seasonal estrus species, and hormonal regulation has an impact on the animals' reproductive performance [85]. Li et al. [55] used Buffalo SNP Array (90 K Affymetrix Axiom) and identified five genes related to hormonal regulation: for thyrotropin TRHDE [55]; for prostrate hormones KCNMA1 [86], TBCB [87], and CDH10 [88]; and for thyroid hormones THRB [89]. In seasonal cycles of reproduction and body weight, thyroid hormones play a vital role [90] and thyroid dysfunction is linked with infertility, anovulation, and abortion [91]. Furthermore, thyroid hormone resistance (THR) is significantly associated with THRB mutation [89]. The KCNMA1 gene can enhance sexual behavior and erectile strength [92]. In buffalo granulosa cells, low expression of these hormone-related genes has been reported earlier but a lower hormone concentration may have a substantial effect on follicle growth regulation [93]. So, it could be assumed that buffalo fertility is somewhat controlled by genes involved in hormonal regulation [94].
Li et al. [94] reported five important genes, CSGALNACT1, GMDS, NDUFS2, HYAL4, and KYNU, related to metabolic pathways. The CSGALNACT1 gene reportedly affects the follicular growth of buffaloes by regulating the glucose metabolism level. CADM2 (cell adhesion molecule 2), which was detected in early follicular (granulosa cells) GCs, was associated with the reproductive performance of buffaloes [95]. The SNPs detected in CADM2 upstream (about 143 kb) and downstream regions are linked with AFC and SCA traits. The Nolz-1 or ZNF503, MTPN, and KRR1 genes were perceived to be highly expressed in the GCs of buffaloes, suggesting that these may have aided in dominant follicle selection. The Nolz-1 or ZNF503 plays an essential role in cell invasion and promotes the proliferation of mammary epithelial cells [96] and embryogenesis [97]. Moreover, KRR1 was stated to be related to polycystic ovary syndrome [98] and MTPN enhances skeletal muscle and cell growth [99] and plays an important role in the immune response by antigen recognition [100]. IGFBP7 is highly expressed in buffalo GCs of antral follicles, and IGFBP7 shares sequence identity with follistatin [101], which inhibits the secretion of follicle stimulating hormone (FSH) [102,103] and is critically involved in folliculogenesis and ovarian function [104,105]. Moreover, IGFBP7 knockdown in buffalo GCs reportedly affects cell cycles, cell proliferation, production of progesterone and estrogen, and the number of apoptotic cells. Therefore, it was speculated that IGFBP7 may be involved in follicular development and ovulation regulation [94].
In mammals, the reproduction system is regulated by the hypothalamic, pituitary, and gonad axis. Kisspeptin is an effective GnRH endogenous secretagogue that governs GnRH secretion, which ultimately drives spermatogenesis, steroidogenesis, and folliculogenesis and also activates ovulation in females [106].

Milk Production-Related Genes
Identification of genomic regions and respective candidate genes associated with milk production traits is imperative to devise strategies for the genetic improvement of buffaloes. So far, in different buffalo breeds, 19 candidate genes associated with milk production containing a total of 47 mutations have been reported [107]. Of these 47 mutations, 4 were present in the promoter region, 24 in intronic regions, and 19 in coding regions; of these 19 mutations in coding regions, 13 were non-synonymous and caused amino acid substitution. These 19 candidate genes for milk production traits are classified into four major groups: milk yield-associated genes comprise STAT1, STAT5A, LEP, MC4R, OXT, INSIG2, LALBA, BTN1A1, PRL, SCD, and SREBF1; milk fat yield-related genes include GHRL and A2M; milk fat percentage-related genes include STAT1, TG, A2M, DGAT1, GHRL, LEP, MC4R, PRL, SCD, and SREBF1; and milk protein percentage-related genes include CSN1S1, DGAT1, GHRL, ADRA1A, A2M MTNR1A, PRL and SPP1, INSIG2, and MC4R, as shown in Table 2 [107]. Previously, the bovine GWAS SNP chip has been used for studying buffalo milk production traits [95,108,109] due to the chromosomal homology between cattle and buffaloes [110,111]. For the first time, Wu et al. [95] identified seven SNPs associated with milk yield in buffalo populations by using a bovine SNP chip (Illumina BovineSNP50 BeadChip). Later on, Venturini et al. [108] applied a high-density bovine SNP chip (777 k SNPs, Illumina Infinium BovineHD BeadChip) to study the Brazilian buffalo population. A huge number of SNPs related to the milk production traits were identified, but after multiple testing and modification, only two SNPs (present on chromosomes 15 and 20) were confirmed to have significant association with milk yield. Afterward, Affymetrix (Axiom Buffalo Genotyping Array 90 K) also released a commercial buffalo SNP chip [107].
ESSRG was found in the Brazilian buffalo breed by both buffalo [111] and bovine [108] SNP chips. Both in Italian Mediterranean [113,115] and Brazilian buffalo breeds [108], APOB was identified by different SNP chips. The interactive relationship of the APOB was observed with candidate genes, including thyroglobulin (TG), leptin (LEP), and sterol regulatory element-binding transcription factor 1 (SREBF1) [107]. Low-density lipoproteins and chylomicrons apolipoprotein are the main products of the APOB gene. Recently, Lee et al. [116] reported that APOB is involved in the regulation of lipid metabolism.
In river buffaloes, a non-synonymous mutation in exon 10 of the insulin-like growth factor 2 (IGF2) gene has shown a potential association with higher daily weight gain from birth to 36 weeks [117]. Furthermore, in Mediterranean river buffalo females, the C > T substitution in STAT5A was associated with milk protein percentage [118]. El-Komy et al. [119] screened the GHR polymorphisms and their potential association with milk yield traits in Egyptian buffaloes (400 animals). The mutations in four exons (E4-E6 and E8) of the GHR gene were investigated, and in E4 no variation was detected, while two SNPs in E5 (c.380G > A/p.Arg127Lys and c.387C > T/p.Gly129), a single silent alteration (c.435A > G/p.Pro145) in E6, and an additional missense mutation (c.836T > A/p.Phe279Tyr) in E8 were spotted. Two SNPs c.380G > A and c.836T > A in the extracellular and transmembrane regions, respectively, were related with milk yield; protein percentage; fat percentage; and 305-day milk, protein, and fat yield, with higher levels in individuals possessing the mutant A allele. Remarkably, the animal with two mutant alleles (AA) gave a higher milk yield, with higher protein and fat percentages, by upregulating the expression of GH, GHR, prolactin (PRL), prolactin receptor (PRLR), diacylglycerol acyltransferase-1 (DGAT1), CSN2 gene-encoded beta-casein and insulin-like growth factor 1 (IGF1) genes, and proteins in milk-producing cells [119]. In the CSN1S2 gene, the coding sequence of river and swamp buffaloes, 13 SNPs were identified, including 8 non-synonymous substitutions. The amino acid variations due to c.580T > C and c.642C > G might have a physiological impact on the αS2-CN synthesis and ultimately the milk yield in buffaloes [120,121].
Recent genome-wide studies have detected 8 DGAT genes in buffaloes, which were grouped into two subfamilies, DGAT1 and DGAT2, in comparison to 15 DGAT homologous proteins in Bos taurus [122]. Association of DGAT genes with milk production traits was analyzed by using data from 489 buffaloes with 1424 lactations [122], and 20 SNPs associated with different milk production traits were identified. The most significant SNP in the DGAT1 genomic region (Affx-79,549,398) was associated with buffalo milk protein and fat percentages.

Body Coat Color
Visible skin-color phenotypes can be used to discover gene expression regulation and the patterns of coat color evolution in animals [123]. The dark-gray body color is common in swamp buffaloes, while in some populations, the white coat color variant is prevalent up to 10% [124]. In swamp buffaloes, genetic analysis revealed the dominance of white color over black [125]. Missense mutations in the MC1R gene of river buffaloes were identified, but these variations did not explain the difference between the dark-gray and white color of swamp buffaloes [126]. In Indonesia (Tana Toraja), mainly the white-spotted buffalo bulls are highly prized and slaughtered in funeral ceremonies [127]. In microphthalmiaassociated transcription factor (MITF), two independent mutations that caused functional loss, a donor splice-site alteration and a premature stop codon, were importantly linked with white-spotted skin color [127].
Liang et al. [123] conducted a whole-genome-sequencing-based study to map the swamp buffalo white coat phenotype. The comparative population-based genomic analyses (41 black and 22 white swamp buffaloes) detected the signatures of the selection underlying the white coat phenotype. In the agouti signaling protein (ASIP) gene, Liang et al. [124] reported in LINE-1 a 2809-bp-long insertion, which is the reason for the white coat pigmentation in swamp buffaloes. In the white body coat, the LINE-1 insertion acted as a potential proximal promoter, which increased the transcription of ASIP 10-fold. The transcribed 165 bp of 5 UTR from LINE-1 is combined with ASIP first coding exon, and a chimeric transcript is developed. The enhanced expression of ASIP prevents the maturation of melanocytes, resulting in the absence of color in white buffaloes' hairs and skin. The phylogenetic analyses specified recent genetic transposition events that evolve a specific ASIP allele related to buffalo white coloration in swamp buffaloes. Moreover, in cattle ASIP gene, similar insertion of LINE-1 has been identified, which is evident for the convergent mechanism of evolution for coat color in the Bovini tribe [123].

Disease Resistance
Water buffaloes' susceptibility or resistance to a specific disease, like tuberculosis or mastitis, has also been influenced by some genetic variants. The higher vulnerability of Mediterranean buffaloes to bovine tuberculosis was associated with an SNP (G > A) at position 4467 in the 3 UTR of the interferon-gamma (IFNG) gene, which caused the target sequence disruption for the micro-RNA (miR-125b) [128]. A substitution C > A in exon 27 of the C3 (complement component 3) gene was traced in the buffalo population that is significantly associated with the somatic cell score, which is a potential indicator of mastitis in dairy animals [129].
Globally, in animals, foot-and-mouth disease (FMD) is caused by single-stranded RNA virus (FMDV) infection, which leads to economic losses in livestock production. The resistance or susceptibility of FMD in buffaloes is owing to genetic variations in the BoLA-DRB3 gene. In exon 2 of the BoLA-DRB3 gene, 302-bp-amplified fragments were digested with HaeIII endonuclease enzyme and five BoLA-DRB3 genotypes were traced and the genotype AA was correlated with FMD resistance. However, the AC genotype might be associated with FMD susceptibility in Egyptian buffaloes [130]. Furthermore, in cattle, the natural resistance to brucellosis has been associated with the (GT)13 microsatellite allele of SLC11A1 at 3 UTR [131]. Various other reports have also discovered a strong association of (GT)n microsatellite alterations with vulnerability to or resistance against brucellosis in buffaloes [131][132][133][134].

The Future Perspective of Buffaloes
Domesticated water buffaloes are a proficient source of nutritious products, particularly milk and meat, and have an excellent capability to live on marginal resources under adverse environmental conditions [5,135]. Buffaloes are an important dairy species due to the peculiar taste of their milk and unique milk products, like Mozzarella cheese, in addition to their socio-cultural significance. Owing to better climate resilience and adaptability attributes, buffaloes would be important in the future climate change scenario for sustainable dairy production setups where Bos taurus cannot thrive or perform well, particularly in tropical areas.
To exploit the excellent potential of buffaloes in terms of superior production performance and climate resilience, it is imperative to accelerate genetic progress in this species by using genomic selection and improved breeding. Genomic selection has proven successful in cattle for making genetic improvements and increasing the rates of genetic gain and is equally feasible for buffalo improvement [136]. A simulated study has shown that using genomic selection in buffaloes can reduce the generation interval from 9.5 to 3.3 years (for male path) while increasing selection response by two times and reducing the cost of proving bulls by 88% [137]. This envisages future advancement in water buffalo breeding programs by exploiting genomic selection. No doubt the use of genomic information in genetic evaluation of buffaloes is still in its infancy, but it has a very bright future as remarkable progress is expected in buffalo breeding using genomic information. The advancement in the field of genomics will also facilitate a better understanding of the unique genetic features of water buffaloes and will also provide more influential tools for the genome-based identification of quantitative trait loci (QTL) and candidate genes. This would help to understand the regulatory mechanism of fundamental traits, including adaptation, disease resistance, and production.
Further, genome editing technologies are also likely to enhance the genetic potential of buffalo populations. So far, few studies involving the whole buffalo genome have been documented; therefore, buffalo genomic data are still scanty. Thus, there is a dire need to study the genomic annotation of buffaloes on a wider scale using high-throughput technologies. Furthermore, the identification of functional candidate genes should be given more consideration to explore the genetic variants associated with production performance, climate adaptation, and disease resistance. Taken together, genome-wide information would facilitate future initiatives focusing on genetic improvement and effective use of buffalo genetic resources globally.

Conclusions
Recent advancements in high-throughput technologies like whole-genome sequencing, GWAS, gene expression profiling, next-generation sequencing (RNA and DNA), and genome-wide CHIP-seq scanning are used to detect the genetic variants, elucidate gene regulation, and perform function profiling in buffaloes. These offer wide-ranging whole-genome data and high coverage to genomic, epigenomic, transcriptomic, and proteomic information related to cellular interactions, functioning, and behavior. In buffaloes, candidate gene studies have used the available genetic resources to uncover the functional candidate genes and their potential association with buffalo performance, including production, adaptation, and disease resistance. Thus, the whole-genome and candidate gene approach to next-generation data could be helpful in elucidating the complex traits, genomic coverage, and productivity-related attributes, which will facilitate improved breeding and better use of buffalo genetic resources.