Analysis of the Origin of Emiratis as Inferred from a Family Study Based on HLA-A, -C, -B, -DRB1, and -DQB1 Genes

In this study, we investigated HLA class I and class II allele and haplotype frequencies in Emiratis and compared them to those of Asian, Mediterranean, and Sub-Saharan African populations. Methods: Two hundred unrelated Emirati parents of patients selected for bone marrow transplantation were genotyped for HLA class I (A, B, C) and class II (DRB1, DQB1) genes using reverse sequence-specific oligonucleotide bead-based multiplexing. HLA haplotypes were assigned with certainty by segregation (pedigree) analysis, and haplotype frequencies were obtained by direct counting. HLA class I and class II frequencies in Emiratis were compared to data from other populations using standard genetic distances (SGD), Neighbor-Joining (NJ) phylogenetic dendrograms, and correspondence analysis. Results: The studied HLA loci were in Hardy–Weinberg Equilibrium. We identified 17 HLA-A, 28 HLA-B, 14 HLA-C, 13 HLA-DRB1, and 5 HLA-DQB1 alleles, of which HLA-A*02 (22.2%), -B*51 (19.5%), -C*07 (20.0%), -DRB1*03 (22.2%), and -DQB1*02 (32.8%) were the most frequent allele lineages. DRB1*03~DQB1*02 (21.2%), DRB1*16~DQB1*05 (17.3%), B*35~C*04 (11.7%), B*08~DRB1*03 (9.7%), A*02~B*51 (7.5%), and A*26~C*07~B*08~DRB1*03~DQB1*02 (4.2%) were the most frequent two- and five-locus HLA haplotypes. Correspondence analysis and dendrograms showed that Emiratis clustered with the Arabian Peninsula populations (Saudis, Omanis and Kuwaitis), West Mediterranean populations (North Africans, Iberians) and Pakistanis, but were distant from East Mediterranean (Turks, Albanians, Greeks), Levantine (Syrians, Palestinians, Lebanese), Iranian, Iraqi Kurdish, and Sub-Saharan populations. Conclusions: Emiratis were closely related to Arabian Peninsula populations, West Mediterranean populations and Pakistanis. However, the contribution of East Mediterranean, Levantine Arab, Iranian, and Sub-Saharan populations to the Emiratis’ gene pool appears to be minor.


Introduction
The human leukocyte antigen (HLA) complex is a group of over 200 closely linked genes, located on the short arm of human chromosome 6, spanning 3.6 Mb [1]. HLA genes are extremely polymorphic, with over thirty-six thousand alleles described across multiple loci [1]. HLA genes encode the cell-surface HLA molecules that present processed peptide antigens to T lymphocytes [2]. There is strong linkage disequilibrium (LD) between alleles at different HLA loci, under which specific alleles are inherited together more frequently than expected by chance. HLA allele and haplotype frequencies vary greatly among populations and ethnic groups [3,4], which makes comparison of HLA frequencies between populations valuable for anthropological studies and for understanding the history of human migrations.
The United Arab Emirates (UAE) is situated at the south-east corner of the Arabian Peninsula, west of the Strait of Hormuz, bordering Saudi Arabia to the west and south, and Oman to the east (Figure 1). The UAE covers an area of 83,600 km², includes 200 islands, and consists of 7 Emirates, which entered into federation in 1971. The current UAE population is about 9.60 million (2018) [5], 11.48% of whom are indigenous Emiratis [6]. Arabic is the official language of the country and is spoken by the indigenous population. Artifacts discovered at Jebel Faya in the Emirate of Sharjah show that the territory has been inhabited for the past 125,000 years. Neolithic village settlement and Late Stone Age artifacts were also discovered on the Marawah and Baynunah Islands [7]. Historically, the UAE is a central link connecting Africa with Asia, and the first human migration out of Africa to Asia passed across the UAE [8,9]. More recent migrations into the Arabian Peninsula from the Middle East and Asia took place across the Strait of Hormuz, a major trade channel linking the Indian subcontinent and the Gulf States. This migration out of Africa may have had a bearing on the genetic makeup of the present-day Emirati population. The Arab migrations in the 7th and 11th centuries contributed slightly to the homogenization of the Arabian Peninsula (Saudis, Kuwaitis, Emiratis) and African populations [10]. The notion of pre-Islamic relatedness between North Africans and Peninsular Arabs is favored by the documented relatedness of Emiratis with Berbers and Basques, who remained isolated and did not undergo external genetic exchanges during the 7th and 11th century Arab migrations [10]. It may also be attributed to the fact that Peninsular Arabs are distant from the Arab Levantines, although the Arab migrations were earlier, massive, frequent and effective in the Levant [10].
This paper aims to investigate the genetic relatedness between Emiratis and other populations. This is the largest study that examines the origin of present-day Emiratis using HLA family data. We hypothesize that Emiratis are more closely related to Arabian Peninsula, West Mediterranean and Indian subcontinent populations than to East Mediterranean, Levantine Arab, Iranian, and Sub-Saharan populations.

Study Subjects
Study subjects included 200 unrelated healthy parents from 100 indigenous UAE families, with the number of siblings per family ranging from 2 to 7. Only families with four haplotypes well-defined by segregation were included in this study. All grandparents of the study subjects lived in the UAE. For comparative purposes, populations from the Arabian Peninsula, Asia, Africa and Europe were included, and are detailed in Supplementary Table S1. The study received ethical approval from the Institutional Review Board (IRB) at Sheikh Khalifa Medical City in Abu Dhabi.

DNA Genotyping
Total genomic DNA was prepared from anti-coagulated peripheral venous blood using the QIAamp Blood Mini kit (Qiagen, Hilden, Germany) or the MagNA Pure 96 DNA and Viral NA small volume kit (Roche Diagnostics, Mannheim, Germany) according to the manufacturers' protocols. DNA samples were stored below −20 °C until analysis. The DNA concentration was quantified using a NanoDrop 2000c spectrophotometer (ThermoFisher Scientific, Wilmington, DE, USA). HLA class I (A, C and B) and class II (DRB1 and DQB1) genotyping was performed at low-to-intermediate resolution using Luminex LabType Sequence Specific Probe Hybridization (SSO) typing kits (One Lambda, Inc., Thermo Fisher, Canoga Park, CA, USA) on a Luminex 200 or Luminex Flexmap 3D instrument, following the manufacturer's instructions. Briefly, the target DNA was PCR-amplified for reverse sequence-specific oligonucleotide typing using group-specific biotinylated primers designed for exons 2 and 3 of HLA class I (HLA-A, -C and -B) and exon 2 of HLA class II (HLA-DRB1 and -DQB1) genes. The biotinylated PCR product was denatured, hybridized to probes on beads, and detected by R-Phycoerythrin-conjugated Streptavidin (SAPE) in the Luminex flow analyzer. The instrument has a red laser for bead identification and a green one for phycoerythrin detection. Positive and negative controls were included in each set of beads in order to subtract non-specific background signals and normalize the raw data for possible sample variability and reaction efficiency.
To ensure accuracy of the results, quality control measures comprised retyping 10% of samples with the most common haplotypes using the One Lambda NXType HLA sequencing assay and TypeStream Virtual NGS analysis software v3.0 (One Lambda, Inc., Canoga Park, CA, USA) on an Ion Torrent Sequencing System (Thermo Fisher Scientific, Waltham, MA, USA). Our HLA laboratory is accredited by the College of American Pathologists (CAP), the American Society of Histocompatibility and Immunogenetics (ASHI) and ISO15189.

Statistical Analysis
HLA haplotypes were assigned with certainty by segregation (pedigree) analysis, and haplotype frequencies were obtained by direct counting. The D [11], Wn [12], Wa/b and Wb/a [13] measurements of linkage disequilibrium (LD) were calculated for one-field (allele lineage level) haplotypes using the Phased Or Unphased Linkage Disequilibrium (POULD) R package (v1.0.1) (https://cran.r-project.org/web/packages/pould/pould.pdf, accessed on 23 November 2022) [14,15]. The conditional asymmetric LD (ALD) measurements, Wa/b and Wb/a, are extensions of the Wn measurement for highly polymorphic loci. Where a Wn value may over-estimate the LD between such loci, the ALD approach allows the investigation of LD for pairs of polymorphic loci in which alleles at one locus may display complete LD with alleles at a second locus, while alleles at the second locus are in less-than-complete LD with alleles at the first locus. The Wa/b measurement reflects variation of alleles at the a locus on any of the haplotypes conditioned on b locus alleles, while the Wb/a measurement reflects variation of alleles at the b locus on any of the haplotypes conditioned on a locus alleles. HLA allele frequency counts and Hardy-Weinberg Equilibrium (HWE) testing were done using PyPop (Python for Population Genomics), version 0.7.0 (http://www.pypop.org, accessed on 21 June 2021) [16,17]. For Ewens-Watterson homozygosity (EWH), PyPop was also used to calculate the observed (Fobs) and expected (Fexp) homozygosity, and the normalized deviate of the homozygosity (Fnd) for each locus [18][19][20][21]. DISPAN software was used to construct phylogenetic trees (dendrograms) [22] using the neighbor-joining (NJ) method [23][24][25] from matrices of standard genetic distances (SGDs) [25]. Three-dimensional correspondence analysis and bi-dimensional representation based on SGDs were performed using the VISTA version 7.2.8 software [26].
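To make the difference between Wn and the asymmetric measurements concrete, the following is a minimal Python sketch, not the POULD implementation. It assumes the usual definitions of these measures: Wn is the square root of the sum of D_ij²/(p_i q_j) divided by min(k, l) − 1, and Wa/b is the square root of (F_a/b − F_a)/(1 − F_a), where F_a is the homozygosity at locus a and F_a/b is the same quantity conditioned on locus b. The function name `ld_measures` and the toy haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from math import sqrt

def ld_measures(haplotypes):
    """Return (Wn, Wa/b, Wb/a) for phased two-locus haplotypes.

    haplotypes: list of (allele_at_a, allele_at_b) tuples.
    Both loci must carry at least two alleles.
    """
    n = len(haplotypes)
    hap = {h: c / n for h, c in Counter(haplotypes).items()}
    pa = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    pb = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}

    # Wn: normalized sum of D_ij^2 / (p_i * q_j) over all allele pairs.
    chi = sum((hap.get((a, b), 0.0) - pa[a] * pb[b]) ** 2 / (pa[a] * pb[b])
              for a in pa for b in pb)
    wn = sqrt(chi / (min(len(pa), len(pb)) - 1))

    # Homozygosity at each locus, overall and conditioned on the other locus.
    f_a = sum(p ** 2 for p in pa.values())
    f_b = sum(q ** 2 for q in pb.values())
    f_a_given_b = sum(f ** 2 / pb[b] for (_, b), f in hap.items())
    f_b_given_a = sum(f ** 2 / pa[a] for (a, _), f in hap.items())

    # max() guards against tiny negative values caused by rounding.
    w_ab = sqrt(max(0.0, f_a_given_b - f_a) / (1 - f_a))
    w_ba = sqrt(max(0.0, f_b_given_a - f_b) / (1 - f_b))
    return wn, w_ab, w_ba

# Toy data: the allele at locus a is fully determined by the allele at b,
# but a1 is split between b1 and b2, so the reverse does not hold.
toy = [("a1", "b1")] * 2 + [("a1", "b2")] * 2 + [("a2", "b3")] * 4
wn, w_ab, w_ba = ld_measures(toy)
```

In this toy case Wn and Wa/b both equal 1, while Wb/a is about 0.77, which is the kind of asymmetry that a single symmetric measure hides.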

HLA Allele Lineage Frequencies
The studied HLA-A, -C, -B, -DRB1 and -DQB1 frequencies were in Hardy-Weinberg Equilibrium (HWE) in the UAE population (Supplementary Table S2). Frequencies of the allelic lineages of HLA-A, -C, -B, -DRB1 and -DQB1 are shown in Table 1.
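As an illustration of what an HWE check compares, the sketch below computes expected genotype counts from allele frequencies and a Pearson chi-square statistic. This is a simplified stand-in and is not how PyPop performs its test; for loci as polymorphic as HLA, many genotype classes have very small expected counts and exact or resampling-based tests are generally preferred. The function name `hwe_chi_square` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for one locus.

    genotypes: list of unordered (allele1, allele2) pairs, one per individual.
    """
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Expected counts under HWE: n*p^2 (homozygote), 2*n*p*q (heterozygote).
        expected = n * freq[a] ** 2 if a == b else 2 * n * freq[a] * freq[b]
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freq)
    return chi2, k * (k - 1) // 2

# Toy data: one sample exactly at HWE proportions, one with only heterozygotes.
at_hwe = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
all_het = [("A", "B")] * 4
chi_eq, dof_eq = hwe_chi_square(at_hwe)
chi_het, dof_het = hwe_chi_square(all_het)
```

The sample at exact HWE proportions gives a statistic of zero, while the all-heterozygote sample gives a positive value, reflecting an excess of heterozygotes relative to expectation.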

Two- and Five-Locus HLA Haplotypes
Linkage Disequilibrium (LD) estimates for two-locus phased haplotypes are shown in Table 2. While D and Wn values are often comparable (cf. values for A~DQB1 and C~DQB1 haplotypes), the ALD values illustrate asymmetry in the LD for specific locus pairs. The ALD measurements (WDRB1/DQB1 and WDQB1/DRB1) dissect the variation at each locus conditioned on the other. The highest ALD values are observed between DRB1 and DQB1 and between HLA-B and HLA-C; WDRB1~DQB1 is 0.9, while WDQB1~DRB1 is 0.58, and WB~C is 0.78, while WC~B is 0.64. The lowest LD values are observed between HLA-C and DQB1 and between HLA-A and DQB1; WC~DQB1 is 0.31, while WDQB1~C is 0.19, and WA~DQB1 is 0.22, while WDQB1~A is 0.13. In most cases where there are large differences in the number of alleles at two loci, the less polymorphic locus displays a lower ALD when conditioned on the more polymorphic locus. Overall, the LD and ALD values are higher between more proximal loci (cf. B~C vs. A~C, and A~DQB1 vs. DRB1~DQB1), reflecting higher recombination rates over longer physical distances [10]. In the Emirati families, DRB1*03~DQB1*02 (21.3%), DRB1*16~DQB1*05 (17.3%), B*35~C*04 (11.7%), B*08~DRB1*03 (9.7%) and A*02~B*51 (7.5%) were the most frequent two-locus haplotypes. Those with significant linkage disequilibrium and frequencies exceeding 1% are shown in Table 3, and the complete list of HLA class I and class II two-locus haplotypes is shown in Supplementary Table S3. HLA-A*26~C*07~B*08~DRB1*03~DQB1*02 was the most frequent 5-locus HLA haplotype (4.25%) in the studied Emirati sample (Table 4).
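Because the haplotypes are phased by segregation, the frequencies reported here come from direct counting rather than statistical estimation. A minimal Python sketch of that counting step is given below; the function name `haplotype_frequencies`, the dictionary layout, and the toy haplotypes are hypothetical and are not the study data.

```python
from collections import Counter

def haplotype_frequencies(haplotypes, loci):
    """Count phased haplotypes restricted to the given loci.

    haplotypes: list of dicts mapping locus name -> allele lineage.
    loci: tuple of locus names defining the two- or multi-locus haplotype.
    Returns a dict of haplotype label -> relative frequency.
    """
    counts = Counter("~".join(h[locus] for locus in loci) for h in haplotypes)
    total = sum(counts.values())
    return {hap: c / total for hap, c in counts.most_common()}

# Toy data: four phased parental haplotypes (illustrative only).
toy = [
    {"B": "B*08", "DRB1": "DRB1*03", "DQB1": "DQB1*02"},
    {"B": "B*51", "DRB1": "DRB1*16", "DQB1": "DQB1*05"},
    {"B": "B*08", "DRB1": "DRB1*03", "DQB1": "DQB1*02"},
    {"B": "B*35", "DRB1": "DRB1*03", "DQB1": "DQB1*02"},
]
freqs = haplotype_frequencies(toy, ("DRB1", "DQB1"))
```

Assuming each of the 200 parents contributes two haplotypes, the denominator in the study would be 400, so a reported frequency of 4.25% would correspond to 17 observed copies.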

Ewens-Watterson Homozygosity Test of Neutrality
Ewens-Watterson homozygosity (EWH) test results for Emiratis revealed negative Fnd values for the five loci tested, indicating lower than expected homozygosity under selective neutrality (Supplementary Table S4). Except for the HLA-B locus (p = 0.15), significant deviations were observed for A (p = 0.006), C (p = 0.003), DRB1 (p = 0.005), and DQB1 (p = 0.006). These findings suggest that the HLA allele frequency distributions were shaped by balancing selection, consistent with previous work carried out on a worldwide sample of populations [27].
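The quantities behind this test can be sketched briefly in Python. The observed homozygosity is the sum of squared allele frequencies, and the normalized deviate is assumed here to be the difference between observed and expected homozygosity divided by the standard deviation of the expected value. The expected mean and variance under neutrality depend on sample size and number of alleles and are normally obtained by simulation or from tables; in this sketch they are simply passed in, and the numbers used are placeholders, not values from this study.

```python
def observed_homozygosity(allele_counts):
    """Sum of squared allele frequencies (Fobs) from raw allele counts."""
    n = sum(allele_counts)
    return sum((c / n) ** 2 for c in allele_counts)

def normalized_deviate(f_obs, f_exp, var_f_exp):
    """Fnd = (Fobs - Fexp) / sqrt(Var(Fexp)); negative when Fobs < Fexp."""
    return (f_obs - f_exp) / var_f_exp ** 0.5

# Placeholder inputs (illustrative only).
f_obs = observed_homozygosity([2, 2, 4])
f_nd = normalized_deviate(f_obs, f_exp=0.5, var_f_exp=0.01)
```

A negative Fnd, as in this toy example, means alleles are more evenly distributed than neutrality predicts, which is the pattern interpreted above as balancing selection.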

Phylogenetic and Correspondence Analysis
Standard genetic distance (SGD) values were used to construct Neighbor-Joining (NJ) dendrograms and correspondence plots. SGD values between Emiratis and other populations are shown in Table 5. Omani (0.0276), Saudi (0.0886), North African [Tunisian (0.1108) and Algerian (0.1127)] and Iberian populations had the smallest genetic distances from the investigated Emirati population, while East Mediterranean and Sub-Saharan populations had the largest. NJ dendrograms, based on SGD using HLA-DRB1 allele frequency data from 85 population datasets (Table 5), demonstrated relatedness between Emiratis and other populations (Figure 2). The NJ tree shows clustering of the Emirati population with the Arabian Peninsula (Omanis and Kuwaitis) and West Mediterranean (North Africans) populations. However, Emiratis appear distant from Turkish, Macedonian, Greek, Levantine (Syrians, Palestinians, and Lebanese), Iranian, Iraqi Kurdish, and African Sub-Saharan populations.
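For readers unfamiliar with the distance underlying Table 5, the sketch below computes a single-locus genetic distance between two populations from their allele frequencies. It assumes that SGD refers to Nei's standard genetic distance, D = −ln(Jxy / sqrt(Jx · Jy)), where Jx and Jy are the sums of squared allele frequencies in each population and Jxy is the sum of their products; this is an assumption about the measure, not a reproduction of the DISPAN code. The function name and the toy frequencies are hypothetical.

```python
from math import log, sqrt

def nei_standard_distance(freqs_x, freqs_y):
    """Single-locus Nei standard distance between two populations.

    freqs_x, freqs_y: dicts mapping allele -> frequency (each summing to 1).
    Returns infinity when the populations share no alleles.
    """
    alleles = set(freqs_x) | set(freqs_y)
    jx = sum(freqs_x.get(a, 0.0) ** 2 for a in alleles)
    jy = sum(freqs_y.get(a, 0.0) ** 2 for a in alleles)
    jxy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return float("inf")
    return -log(jxy / sqrt(jx * jy))

# Toy allele frequencies at one locus (illustrative only).
pop_a = {"DRB1*03": 0.5, "DRB1*16": 0.5}
pop_b = {"DRB1*03": 1.0}
d_same = nei_standard_distance(pop_a, pop_a)
d_diff = nei_standard_distance(pop_a, pop_b)
```

Identical frequency profiles give a distance of zero, and the distance grows as the profiles diverge; a matrix of such pairwise values is the input that a neighbor-joining algorithm turns into a dendrogram.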
Correspondence analysis based on SGD, presenting the relationship between Emiratis and several populations from the Arabian Peninsula, Asia, Africa and Europe according to HLA-A, -B, -DRB1, and -DQB1 allele frequency data, is shown in Figure 3. Similar trends were demonstrated, with clustering of the 35 populations into 3 distinct groups. Emiratis cluster with the Arabian Peninsula, Pakistani, North African and Iberian populations.
Supplementary Table S1. The studied population (Emiratis) is marked in red.

Discussion
This work constitutes an anthropological genetic study on Emiratis and is unique in that it involves HLA family data, a large sample size relative to the total indigenous population, and genotypes for five HLA loci. This is in contrast to earlier studies of the UAE population, which often had small sample sizes, and some of which relied on serological HLA data [28][29][30][31]. HLA allelic frequencies of the present study were compared with approximately 90 populations from the three continents.

The distribution of HLA-A, -C, -B, -DRB1, and -DQB1 genotypes in Emiratis was compared with those of other Arab, Mediterranean, and Sub-Saharan African populations using genetic distances, NJ dendrograms, and correspondence analysis (Figures 2 and 3). Our results showed that Emiratis appear to be related to West Mediterranean (North African, Iberian, and French) and Arabian Peninsula (Saudi, Kuwaiti, Omani) populations, and relatively distinct from Levantine, East Mediterranean European, Iranian, and Sub-Saharan communities. This suggests a low genetic contribution of Levantine Arabs, Iranians, and Sub-Saharans to the Emirati gene pool.
The relatedness of Emiratis to neighboring Peninsular Arab communities is explained by the fact that Arabian Peninsula countries share, with slight differences, a similar historical background and the same territory. On the other hand, the relatedness of Emiratis to North African and Iberian populations is likely the result of the mass eastern migration which took place 10,000 years earlier, after hyper-arid conditions set in across the African Sahara [39].

Conclusions
Our study, based on genetic distance, NJ dendrograms, and correspondence analysis, showed that Emiratis are related to the Arabian Peninsula, Pakistani, and West Mediterranean populations, but distinct from Levantine, Iranian, and Sub-Saharan communities. An important part of the Emirati genome came from the West.

Funding:
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors in the UAE. This work was supported by the National Institutes of Health (NIH) and National Institute of Allergy and Infectious Disease (NIAID), grant R01AI128775 (SJM). The content is solely the responsibility of the authors and does not necessarily reflect the official views of the NIAID, NIH, and United States government.

Institutional Review Board Statement:
The study protocol was approved by the Institutional Review Boards at SKMC (REC-25-10-2016 324 RS-445).

Informed Consent Statement:
The study received ethical approval from the Institutional Review Board (IRB) at Sheikh Khalifa Medical City in Abu Dhabi. The research used existing data from the hospital medical records, and therefore informed consent was waived.

Data Availability Statement:
The data presented in this study can be found in the article/Supplementary Material; further inquiries can be directed to the corresponding author.