Identification of Genetic Variants in 65 Obesity Related Genes in a Cohort of Portuguese Obese Individuals

Obesity is a major public health problem with a strong genetic component that interacts with environmental factors. Several genes are known to be involved in the regulation of body weight, and the identification of alleles associated with obesity is key to controlling this pandemic. In a Portuguese cohort, 65 obesity-related genes were sequenced by Next-Generation Sequencing (NGS) in 72 individuals with obesity, in order to identify variants associated with monogenic obesity and potential risk factors. A total of 429 variants were identified, 129 of which had already been associated with the phenotype. Comparing our results with the European and Global frequencies from the 1000 Genomes Project identified 23 potential risk variants. Six new variants were discovered in heterozygous carriers: four missense (genes ALMS1-NM_015120.4:c.5552C>T; SORCS1-NM_001013031.2:c.1072A>G and NM_001013031.2:c.2491A>C; TMEM67-NM_153704.5:c.158A>G) and two synonymous (genes BBS1-NM_024649.4:c.1437C>T; TMEM67-NM_153704.5:c.2583T>C). Functional studies should be performed to validate these new findings and evaluate their penetrance and pathogenicity. Although no cases of monogenic obesity were identified, this kind of investigational study is important while the aetiology and pathophysiology of obesity are still not fully understood, as it allows the identification of rare variants associated with obesity and the study of their prevalence in specific population groups.


Introduction
Obesity is a major public health problem. The World Health Organization estimated that in 2016 more than 650 million adults (13% of the world's adult population) were obese and 1.9 billion (39% of the world's adult population) were overweight [1]. In Portugal, 22.3% of the population was estimated to be obese in 2017 [2].
Obesity is a complex disease influenced by the interaction of genetic and environmental factors [3] and a major risk factor for the development of other conditions, such as type 2 diabetes mellitus, hypertension, cardiovascular diseases and even certain types of cancer. The presence of these severe co-morbidities is responsible for the increased risk of mortality in these individuals [4,5].
Obesity can also be one of the phenotypic features of syndromic diseases, such as Prader-Willi syndrome, Bardet-Biedl syndrome or Albright's hereditary osteodystrophy, but these account for only a small percentage of obesity cases.
Genes 2021, 12, 603
Moreover, non-syndromic monogenic obesity is also very rare, accounting for less than 5% of diagnoses in Europe [6]. Non-syndromic monogenic obesity may have dominant or recessive inheritance and results from a pathogenic mutation in a gene with an essential role in maintaining energy homeostasis through the leptin-melanocortin signalling pathway. These include the LEP, LEPR, MC4R, POMC and PCSK1 genes. A loss-of-function mutation in one of these genes in homozygous carriers disrupts this signalling pathway, resulting in monogenic obesity [3,4,6]. Approximately 140 variants in these genes associated with non-syndromic monogenic obesity are documented in the ClinVar (https://www.ncbi.nlm.nih.gov/ClinVar (accessed on 3 January 2020)) and Ensembl (https://www.ensembl.org (accessed on 3 January 2020)) databases.
Most often, obesity has a polygenic and multifactorial origin. It is polygenic because it results from the presence of several common variants: individually, these variants have little or no metabolic effect, but together they increase susceptibility to weight gain [7]. It is multifactorial because it is related not only to predisposing genetic factors but also to the "obesogenic" environment, associated with a sedentary lifestyle (an imbalance between energy intake and expenditure) and with social and cultural factors. These non-genetic factors, such as nutrition and physical activity, also appear to modulate gene activity through changes in epigenetic signatures [3].
Therefore, the identification of variants associated with obesity is highly relevant for differential diagnosis, for treatment selection in these patients and for prenatal diagnosis in families in which a known variant has been found.
Despite all the research already conducted, the aetiology of obesity is still far from fully understood. Further studies to identify new genes and variants involved in the regulation of body weight and obesity, and to determine their prevalence in specific populations, will allow us to move towards patient-centred care, with more effective prevention, follow-up and personalised treatment.
With this in mind, the primary aim of this research was to find new genetic variants linked to obesity and to assess their prevalence in a Portuguese population.

Materials and Methods
The exons and exon-intron junctions of 65 genes previously identified as risk factors for the development of obesity were sequenced by NGS in a Portuguese group of obese/overweight individuals.

Patient Selection
The study sample consisted of 72 individuals (55 females; aged between 21 and 67 years; body mass index (BMI) between 26.9 and 68.0 kg/m²) followed up at the Multidisciplinary Obesity Department of the Centro Hospitalar Universitário de Lisboa Central (Curry Cabral). Samples were collected after each individual signed an informed consent form. The protocol was approved by the Ethical Committee of ESTeSL (CE-ESTESL-No.41-2018).

Laboratory Analysis
Oral epithelium cells were collected with a swab, and DNA was extracted with the ExtractMe kit® (BLIRT SA). Extraction success was evaluated fluorimetrically on a Qubit® instrument with the Qubit dsDNA High Sensitivity assay (Thermo Fisher Scientific), which uses target-selective dyes that fluoresce when bound to DNA. The minimum DNA concentration required for subsequent sequencing of the 65-gene panel by NGS was reached for all samples, in some cases only after two elutions.
After the desired DNA concentrations were obtained, libraries were prepared with the TruSight One kit® (Illumina, Inc., San Diego, CA, USA) following the manufacturer's instructions. Sequencing was performed on a NextSeq 550 instrument (Illumina, Inc.) using the NextSeq 500/550 High Output Kit v2 (150 cycles).

Data Analysis
Data analysis began with sequence pre-processing and variant calling in the Enrichment and BWA Enrichment applications (Illumina, Inc.). Variants were annotated with Illumina Variant Studio 3.0. Only variants detected by both applications (Enrichment and BWA Enrichment) and passing the PASS filter of Illumina Variant Studio 3.0 were retained for further analysis. Variants outside exons and exon-intron junctions (up to 12 bases into the intron) were also excluded.
Read quality, coverage and read depth were checked with IGV (Integrative Genomics Viewer).
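The three retention rules above (detected by both pipelines, PASS filter, at most 12 intronic bases from an exon) can be sketched as follows. This is a minimal illustration assuming each pipeline's calls have already been parsed into simple records; the `Call` record, its field names and the example positions are hypothetical, the PASS status is taken from the first call set, and the sketch does not reproduce how Illumina's applications or Variant Studio work internally.

```python
from dataclasses import dataclass

# Exon-intron junction window used in this study (bases into the intron).
MAX_INTRONIC_DISTANCE = 12


@dataclass(frozen=True)
class Call:
    """Minimal, hypothetical representation of one variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str      # VCF-style FILTER value, e.g. "PASS"
    intronic_distance: int  # 0 if exonic, else bases into the flanking intron


def variant_key(call: Call) -> tuple:
    """Identify a variant independently of the pipeline that called it."""
    return (call.chrom, call.pos, call.ref, call.alt)


def filter_calls(calls_a: list[Call], calls_b: list[Call]) -> list[Call]:
    """Keep calls present in both sets, with FILTER == PASS, inside the target window."""
    keys_b = {variant_key(c) for c in calls_b}
    kept = []
    for call in calls_a:
        if variant_key(call) not in keys_b:
            continue  # reported by only one pipeline
        if call.filter_status != "PASS":
            continue  # failed quality filtering
        if call.intronic_distance > MAX_INTRONIC_DISTANCE:
            continue  # outside exons and exon-intron junctions
        kept.append(call)
    return kept


enrichment = [
    Call("2", 100, "C", "T", "PASS", 0),
    Call("2", 200, "G", "A", "PASS", 30),
    Call("2", 300, "A", "G", "LowQual", 0),
    Call("2", 400, "T", "C", "PASS", 5),
]
bwa_enrichment = [
    Call("2", 100, "C", "T", "PASS", 0),
    Call("2", 200, "G", "A", "PASS", 30),
    Call("2", 300, "A", "G", "LowQual", 0),
]
print([c.pos for c in filter_calls(enrichment, bwa_enrichment)])  # [100]
```

Matching on chromosome, position, reference and alternate allele (rather than position alone) keeps two different substitutions at the same site from being treated as one variant.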
To analyse the identified variants and predict their impact and clinical relevance, we used ClinVar (NCBI), a public database that reports scientifically supported associations between human genetic variants and phenotypes, and PolyPhen-2, an online tool that predicts the possible impact of amino acid substitutions on the stability and function of human proteins. PolyPhen-2 functionally annotates SNPs, maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles; it then combines these properties to estimate the probability that a missense mutation is damaging. PolyPhen-2 offers two classifiers, HumDIV and HumVAR. Only HumDIV predictions are presented in this study, since this classifier is considered preferable for rare alleles [20]. We also used Human Splicing Finder to evaluate the impact on splicing of intronic, missense and synonymous variants located in splicing regions (1-3 bases into the exon, 1-8 bases into the intron).
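Two of the rules in the paragraph above lend themselves to a short sketch: turning a HumDIV score into a qualitative prediction, and deciding whether a variant falls inside the splice window screened with Human Splicing Finder. The HumDIV cutoffs below are the ones commonly documented for this classifier (for example in the dbNSFP documentation), not values stated by the source; the function names and the distance convention (1-based distance from the nearest exon-intron boundary) are assumptions for illustration.

```python
def hdiv_prediction(score):
    """Map a PolyPhen-2 HumDIV score (0-1) to a qualitative prediction.

    Cutoffs as commonly documented for HumDIV (e.g. dbNSFP):
    >= 0.957 probably damaging; 0.453-0.956 possibly damaging; <= 0.452 benign.
    """
    if score is None:
        return "unknown"  # no prediction available for this substitution
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


# Splice-region window screened with Human Splicing Finder in this study.
EXON_WINDOW = 3    # 1-3 bases into the exon
INTRON_WINDOW = 8  # 1-8 bases into the intron


def in_splice_region(region, distance):
    """True if a variant lies in the screened splice region.

    region:   "exon" or "intron"
    distance: 1-based distance from the nearest exon-intron boundary
    """
    if distance < 1:
        raise ValueError("distance must be 1-based")
    if region == "exon":
        return distance <= EXON_WINDOW
    if region == "intron":
        return distance <= INTRON_WINDOW
    raise ValueError(f"unknown region: {region!r}")


print(hdiv_prediction(0.998), hdiv_prediction(0.60), hdiv_prediction(0.01))
# BBS1 c.724-8 lies 8 bases upstream of coding position c.724, i.e. intronic distance 8.
print(in_splice_region("intron", 8))  # True
```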
To identify risk variants associated with obesity, the frequencies of the variants identified in this study were compared with the European and Global frequencies in the 1000 Genomes Project (https://www.internationalgenome.org (accessed on 6 January 2020)) and, when these were not reported, in TOPMed (https://www.nhlbiwgs.org/ (accessed on 6 January 2020)). To be considered a risk allele, the difference between the frequency in the studied sample and both the European and the Global reference frequencies had to be equal to or greater than 1.41%, which corresponds to two alleles in the sample: 2/(2 × 71) = 1.41%.
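The risk-allele criterion above reduces to a small calculation: with 71 diploid individuals there are 142 alleles, so two alleles correspond to 2/142 ≈ 1.41%. The sketch below applies that cutoff against both reference frequencies. The function names are hypothetical, and treating a depletion of the same size as pointing to the other allele (the situation described for Table 6) with the same cutoff is an assumption, not a rule stated in the source.

```python
N_INDIVIDUALS = 71               # samples contributing genotypes
N_ALLELES = 2 * N_INDIVIDUALS    # diploid: 142 alleles
THRESHOLD = 2 / N_ALLELES        # two alleles = 0.01408..., reported as 1.41%


def sample_allele_frequency(heterozygotes, homozygotes_alt):
    """Alternate-allele frequency in the sample from genotype counts."""
    return (heterozygotes + 2 * homozygotes_alt) / N_ALLELES


def classify(sample_freq, eur_freq, global_freq, threshold=THRESHOLD):
    """Compare the sample frequency with BOTH reference frequencies.

    "enriched": more frequent in the sample by >= threshold than in both references.
    "depleted": less frequent by >= threshold (the other allele at the locus becomes
                the candidate; using the same cutoff here is an assumption).
    None:       neither condition holds.
    """
    diffs = (sample_freq - eur_freq, sample_freq - global_freq)
    if all(d >= threshold for d in diffs):
        return "enriched"
    if all(d <= -threshold for d in diffs):
        return "depleted"
    return None


print(round(100 * THRESHOLD, 2))  # 1.41
two_carriers = sample_allele_frequency(2, 0)
print(classify(two_carriers, 0.0, 0.0))  # enriched
```

One consequence of the arithmetic is that an allele carried by exactly two heterozygous individuals meets the cutoff only when it is absent from both references: any positive reference frequency pushes the difference just below 2/142.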

Results
Coverage below 80% was obtained for eight of the 72 samples analysed. After applying the PASS filter of Illumina Variant Studio, variants were detected in all samples except one. The low coverage of these samples was probably due to the non-invasive collection method (oral epithelium cells collected with swabs). The technique required a DNA concentration above 5 ng/µL per sample, which for some samples was only reached after a second extraction and elution. Indeed, the efficiency of DNA recovery and extraction from swabs is usually below 50%, regardless of the swab material [21].
No variants were detected in the PHF6 and PTEN genes. Checking the mean coverage of these sequences allowed us to rule out failure of the enrichment probes.
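The source does not state how the 80% coverage figure was computed. A common interpretation is breadth of coverage: the fraction of targeted bases reached by at least a minimum number of reads. The sketch below assumes that interpretation, with a hypothetical minimum depth of 20 reads; both the depth and the function names are illustrative assumptions.

```python
def breadth_of_coverage(depths, min_depth):
    """Fraction of targeted bases covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def passes_coverage(depths, min_depth=20, min_breadth=0.80):
    """Hypothetical QC rule: keep a sample if >= 80% of target bases reach min_depth."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


good_sample = [35] * 8 + [4] * 2   # 8 of 10 bases at >= 20 reads
poor_sample = [35] * 7 + [4] * 3   # 7 of 10 bases at >= 20 reads
print(passes_coverage(good_sample), passes_coverage(poor_sample))  # True False
```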

Variant Consequences
Of the 429 distinct variants identified (Table 2), missense variants were the most frequent (48.25%), followed by synonymous variants (41.49%). According to the 1000 Genomes Project, 52.09% of these variants have a European prevalence below 1%. According to PolyPhen-2 HumDIV, of the 207 missense variants identified, 125 are predicted to be benign, 35 possibly damaging and 44 probably damaging; the impact of the remaining three is unknown. Counting every occurrence across all samples, 1701 missense variants were detected: 1435 benign, 117 possibly damaging and 145 probably damaging. Table 3 describes the clinical associations that may be related to obesity, the number of variants identified and the number of samples in which these variants were detected. A total of 129 variants are potentially associated with obesity according to ClinVar and the consulted bibliography, some of them with more than one clinical association. Of these variants, seven are classified as risk factors, one as probably pathogenic and three as pathogenic (Table 4).
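The percentages in the paragraph above follow directly from the counts of distinct variants. The sketch below recomputes them; the category "other" is an aggregate introduced here for illustration (429 - 207 - 178 = 44 variants of the remaining consequence types), since the full breakdown in Table 2 is not reproduced.

```python
from collections import Counter


def percentages(labels):
    """Percentage of each category among the distinct variants, rounded to 2 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts reported in the text; "other" lumps all remaining consequence types.
variants = ["missense"] * 207 + ["synonymous"] * 178 + ["other"] * 44
print(percentages(variants))
# {'missense': 48.25, 'synonymous': 41.49, 'other': 10.26}

# PolyPhen-2 HumDIV breakdown of the 207 distinct missense variants.
hdiv = {"benign": 125, "possibly damaging": 35, "probably damaging": 44, "unknown": 3}
assert sum(hdiv.values()) == 207
```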

Table 3 (excerpt). Clinical associations that may be related to obesity.
Leptin receptor deficiency  5  138  59
Metabolic syndrome susceptibility  2  77  50
Thyroid hormone resistance  3  27  27

Variant Incidence
Considering the 64 samples with coverage greater than 80%, variants in three of the 65 genes studied were found in 100% of the samples: ALMS1, LRP2 and NEGR1 (Figure 1). These were followed by the BBS4 and PCK1 genes, with variants in 98.44% of the samples. Of all the identified variants, 16.80% were found in the LRP2 gene and 11.66% in the ALMS1 gene (Figure 2).
In the 64 samples, variants were found in an average of 26.69 genes, with an average of 66.2 variants per sample.
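The per-gene percentages and per-sample averages above are simple aggregations over sample-level variant lists. A minimal sketch, assuming the data are held in hypothetical dictionaries keyed by sample ID (samples without variants should still appear, with an empty entry, so that they count in the denominator); the toy data below are invented and only mirror the 63-of-64 case, which gives 98.44%.

```python
from collections import Counter


def gene_prevalence(genes_by_sample):
    """Percentage of samples with at least one variant in each gene.

    genes_by_sample: dict mapping sample ID -> set of genes carrying a variant.
    """
    n_samples = len(genes_by_sample)
    counts = Counter(g for genes in genes_by_sample.values() for g in genes)
    return {gene: round(100 * n / n_samples, 2) for gene, n in counts.items()}


def per_sample_means(variants_by_sample):
    """Mean number of distinct genes and of variants per sample.

    variants_by_sample: dict mapping sample ID -> list of (gene, variant ID).
    """
    n_samples = len(variants_by_sample)
    gene_total = sum(len({g for g, _ in calls}) for calls in variants_by_sample.values())
    variant_total = sum(len(calls) for calls in variants_by_sample.values())
    return gene_total / n_samples, variant_total / n_samples


# Toy data: 64 samples, all with an LRP2 variant, 63 of them also with a BBS4 variant.
toy = {f"S{i:02d}": ({"LRP2", "BBS4"} if i < 63 else {"LRP2"}) for i in range(64)}
prevalence = gene_prevalence(toy)
print(prevalence["LRP2"], prevalence["BBS4"])  # 100.0 98.44
```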

Identification of Potential Risk Alleles
The identified potential risk alleles associated with obesity are described in Table 5. Although they did not fully meet the criteria defined above, variants rs183867145, rs147058423 and rs189273089 (in the LRP2 gene) were also considered potential risk variants associated with obesity. Indeed, in the 1000 Genomes Project, rs183867145 was identified in only one heterozygous American individual, rs147058423 in one heterozygous European individual and rs189273089 in two heterozygous carriers, one American and one African. In our sample, each of the three variants was detected in two individuals.
For five variants, the frequency of the minor allele in our sample was lower than the European and Global frequencies reported by the 1000 Genomes Project (Table 6). These findings suggest that the c.724-8G allele of the BBS1 gene, c.196G of the BDNF gene, c.2512A of the BBS14/CEP290 gene, c.745C of the IGF2R gene and c.84T of the SLC6A14 gene constitute potential risk alleles for obesity.

New Variants
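Six new variants were identified, all in heterozygous carriers: four missense (ALMS1-NM_015120.4:c.5552C>T; SORCS1-NM_001013031.2:c.1072A>G and NM_001013031.2:c.2491A>C; TMEM67-NM_153704.5:c.158A>G) and two synonymous (BBS1-NM_024649.4:c.1437C>T; TMEM67-NM_153704.5:c.2583T>C). To validate these results, functional studies should be performed.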

Discussion
In the 72 individuals studied, no cases of monogenic obesity were recognised. In all samples, several variants were identified that constitute risk factors associated with obesity (between 3 and 12 variants per individual), which may influence the development of this phenotype in these individuals.
Of all the variants identified in this study, only five (Table 7), all found in heterozygous carriers, have a ClinVar classification of pathogenic or probably pathogenic. The four syndromes referenced in Table 6 have autosomal recessive transmission and, with the exception of Donnai-Barrow syndrome, obesity may be part of their phenotype [14,22,23]. The variant rs2229707 has been described as associated with severe obesity: Argyropoulos et al. detected it in heterozygosity in an obese woman (39 years), in homozygosity in three obese descendants (11, 14 and 20 years; BMI was higher in the older individuals) and in heterozygosity in a non-obese descendant (9 years) [24]. The information from this study and from the other consulted literature is not enough to determine whether this mutation is associated with monogenic obesity or whether it has autosomal dominant or recessive inheritance. In one of our samples, this variant was detected together with another possibly damaging variant in the UCP3 gene (rs8179180), both in heterozygosity. This sample belongs to an obese woman whose father, mother, three siblings and two children are obese. It would be interesting to sequence the same 65 genes in all family members, including non-obese relatives, who are not included in the available family history. This would allow us to test the penetrance of these two UCP3 variants and to evaluate their clinical significance and association with obesity.

Limitations
One of the limitations of this study was the use of samples collected by a non-invasive method (oral epithelium cells collected by swab). This was probably the reason for the low coverage of some samples and for the lack of results in one of them. The chosen NGS technique also cannot detect chromosomal structural changes such as large deletions, insertions, inversions or duplications, nor any epigenetic changes that may have occurred [9,25]. The lack of a detailed clinical and family history of the individuals under study (namely associated co-morbidities, cases of childhood obesity or the identification of non-obese family members), and the absence of sequencing of samples from relatives (ascendants and descendants), hindered the interpretation of the variants found, since such information would help establish correlations between alleles and phenotypes. The variant frequency analysis was also hampered by the lack of a control group and by the use of a reference database, the 1000 Genomes Project, in which obesity was not an exclusion criterion and Portugal was not represented [26].

Final Considerations
This study aimed to identify new genetic variants associated with obesity and to determine their prevalence in the Portuguese population. Similar studies may, in the future, make it easier to identify individuals at high risk of developing obesity and to define more effective preventive strategies, thereby bringing personalised medicine to the obesity field.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
None of the authors have any potential conflicts of interest associated with this research.