In-Silico Investigation of Effects of Single-Nucleotide Polymorphisms in PCOS-Associated CYP11A1 Gene on Mutated Proteins

Polycystic ovary syndrome (PCOS) is a reproductive disorder with multiple etiologies, mainly characterized by the excess production of androgens. Genes and environment contribute to it equally. The CYP11A1 gene is essential for steroidogenesis, so any dysregulation or mutation in this gene can lead to PCOS pathogenesis. Therefore, nucleotide diversity in this gene can help in estimating the likelihood of developing PCOS. The present study was initiated to investigate the effect of single nucleotide polymorphisms (SNPs) in the human CYP11A1 gene on different attributes of the encoded mutated proteins, i.e., sub-cellular localization, ontology, half-life, isoelectric point, instability index, aliphatic index, extinction coefficient, 3-D and 2-D structures, and transmembrane topology. For this purpose, the coding sequence (CDS) and SNPs of the gene were first retrieved from Ensembl, followed by translation of the CDS using the EXPASY tool. The protein sequence obtained was subjected to different tools including CELLO2GO, ProtParam, PHYRE2, I-Mutant, SIFT, and PolyPhen. Of the seventy-eight SNPs analyzed in this project, seventeen mutations, i.e., rs750026801 in exon 1, rs776056840, rs779154292 and rs1217014229 in exon 2, rs549043326 in exon 3, rs755186597 in exon 4, rs1224774813, rs757299093 and rs1555425667 in exon 5, rs1454328072 in exon 7, rs762412759 and rs755975808 in exon 8, and rs754610565, rs779413653, rs765916701, rs1368450780, and rs747901197 in exon 9 considerably altered the structure, sub-cellular localization, and physicochemical characteristics of the mutated proteins. Among the fifty-nine missense SNPs documented in the present study, fifty-five and fifty-three were found to be deleterious according to the SIFT and PolyPhen tools, respectively. Forty-nine missense mutations were predicted to decrease the stability of the mutant proteins. Hence, these genetic variants can serve as potential biomarkers in human females for determining the probability of being predisposed to PCOS.


Introduction
Polycystic ovary syndrome (PCOS) is a disease of the human female reproductive system associated with physiological, psychological, metabolic, endocrine, and menstrual disturbances. The disorder is called polycystic because ultrasonography (USG) detects many undeveloped follicles, called cysts, in the ovary. Cysts arise from primitive follicles that failed to develop due to abnormal ovarian function caused by hormonal disturbances [1]. PCOS affects 6-12% of females of reproductive age, i.e., 15-49 years [2]. According to another study, the proportion of females affected with PCOS is 4-20% [3]. Recently, a study was conducted on a population of 1960 eligible Iranian females to detect PCOS prevalence using three sets of diagnostic criteria, i.e., those of the National Institutes of Health (NIH), the Androgen Excess Society (AES), and the 2003 Rotterdam criteria [4,5]. According to these criteria, the prevalence rates were 13.6%, 17.8%, and 19.4%, respectively [6]. Another study of the global incidence of PCOS among females of susceptible age reported 1.55 million PCOS cases with 95% uncertainty intervals (UIs) [7].
Reproductive abnormalities include fluid-filled sacs in enlarged ovaries (polycystic ovaries), endometrial carcinoma, amenorrhea (complete absence of menstruation at puberty), dysmenorrhea (uterine and abdominal cramps during the reproductive cycle), and anovulation (the absence of the release of the ovum from the ovary due to a decreased amount of luteinizing hormone (LH) and follicle-stimulating hormone (FSH) associated with estrogen deficiency) [1]. Physiological symptoms include obesity, acne, thinning of hair, alopecia, hirsutism, and heavy hair growth on the face [8]. Psychological symptoms include decreased self-esteem and sexual satisfaction, depression, anxiety, and a negative perception of body image [9,10]. In older females, the disorder undergoes a transition from a reproductive to a more metabolic one. The metabolic symptoms are dyslipidemia, hypertension, cardiovascular diseases, hepatic steatosis, type-2 diabetes (T2DM), impaired glucose tolerance (IGT), and increased resistance to insulin [10,11].
Under normal conditions, sex hormone-binding globulin (SHBG) binds testosterone and controls its proportion in body circulation. However, a high concentration of insulin inhibits SHBG, leading to an excess of free circulating androgens (male sex hormones) which, in turn, increases hypothalamic gonadotropin-releasing hormone (GnRH) [12,13]. This leads to more secretion of gonadotropin hormone (GH) from the pituitary gland and causes a rise in luteinizing hormone (LH) concentration over follicle-stimulating hormone (FSH). LH binds to the LH receptor and promotes ovarian androgen production [14,15]. Over-production of anti-Müllerian hormone (AMH) from granulosa cells of the human ovary inhibits the development of primitive follicles, resulting in the formation of cysts [16].
PCOS is a multifactorial trait regulated by interactions between polygenes and environmental factors. The environmental factors that may contribute to PCOS may be physical, chemical, or related to diet and lifestyle. Chemical factors include exposure to mercury, cadmium, lead, pesticides, carbon disulfide, organic solvents such as toluene, ether and ethylene glycol, di-isobutyl phthalate (DIBP), and di-isononyl phthalate (DINP), as well as areca nut chewing and tobacco smoke [17-19]. Other external factors include stress and mood swings, difficulty in weight loss, weight gain, sleep apnea, intake of high-calorie food, and the use of plastic ware and indoor decorations [20,21].
CYP11A1 belongs to the cytochrome P450 family. It is alternatively known as cholesterol side chain cleavage enzyme, cholesterol desmolase, CYPXIA1, cytochrome P450 11A1, and cytochrome P450 (scc). This gene is functionally related to the synthesis of steroid hormones. It encodes an enzyme that catalyzes the transformation of cholesterol into pregnenolone (a precursor of steroid hormones) in mitochondria. This is the very first step of the steroidogenesis pathway [52].
SNPs in CYP11A1 can be useful in establishing the genetic architecture of PCOS. However, literature reports so far are scarce. Therefore, in view of the significant association of the CYP11A1 gene with PCOS, the present project was designed to analyze the effect of the SNPs reported in a transcript variant of this gene on the properties of the encoded enzyme. These SNPs might be useful in predicting susceptibility to PCOS in human females and can be tested further through experimental work.

Materials and Methods
In the present study, in-silico characterization of the CYP11A1 gene was performed to predict the effect of the reported SNPs on the encoded mutated proteins. The scheme is shown in Figure 1.

Retrieving CDS of CYP11A1
CYP11A1 comprises 10 transcripts. This study focused on one transcript variant of the gene, CYP11A1-201 (ENST00000268053.11), annotated on genome assembly GRCh38.p12. This splice variant is a product of gene ENSG00000140459.18. The transcript comprises 9 exons encoding a protein of 521 amino acids. The coding sequence and SNPs of this variant were retrieved from the ENSEMBL database (https://asia.ensembl.org/index.html, accessed on 23 March 2022).
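The retrieval step could also be scripted rather than done through the web interface. The following is a minimal sketch, assuming the public Ensembl REST service at rest.ensembl.org and its `sequence/id` endpoint with `type=cds`; the helper names `cds_url` and `fetch_cds` are hypothetical and not part of the original workflow. Note that the REST service serves the current Ensembl release, which may differ from the release accessed in March 2022.

```python
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server (not the asia.ensembl.org website used in the study).
ENSEMBL_REST = "https://rest.ensembl.org"


def cds_url(transcript_id: str) -> str:
    """Build the REST URL that requests the coding sequence of a transcript."""
    # Drop the version suffix (".11") so that the stable ID alone is queried.
    stable_id = transcript_id.split(".")[0]
    return f"{ENSEMBL_REST}/sequence/id/{stable_id}?type=cds"


def fetch_cds(transcript_id: str, timeout: float = 30.0) -> str:
    """Download the CDS as plain text (requires network access)."""
    request = Request(cds_url(transcript_id), headers={"Content-Type": "text/plain"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


# Example call (not executed here because it needs network access):
# wild_type_cds = fetch_cds("ENST00000268053.11")
```

Keeping the URL construction separate from the download makes the request easy to inspect and check without contacting the server.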

Retrieving SNPs and Construction of Mutated CDS
The ENSEMBL database was used for retrieving the SNPs. A total of seventy-eight SNPs were retrieved in the exonic regions (Table 1).
A total of twenty-two cases were designed for the seventy-eight SNPs. The SNPs were analyzed with the SIFT tool (https://sift.bii.a-star.edu.sg, accessed on 25 June 2022) and PolyPhen (genetics.bwh.harvard.edu/pph2/, accessed on 25 June 2022) to predict whether their effect is deleterious or benign.
For each case documented in the present study, a separate mutated CDS was prepared by incorporating the mutations into the CDS of the wild type gene sequence. The locations of the mutations discussed in the present study are shown in detail in Figure 2.
The symbol * shown in cases of stop-gained mutations represents the absence of an amino acid.
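Constructing a mutated CDS amounts to editing the wild type sequence at the variant position. A minimal sketch of that step is given below, assuming 1-based CDS coordinates; the functions `apply_substitution` and `apply_deletion` and the 12-nucleotide toy sequence are hypothetical illustrations, not the actual CYP11A1 data or the authors' procedure.

```python
def apply_substitution(cds: str, pos: int, ref: str, alt: str) -> str:
    """Return a copy of the CDS with the base at 1-based position `pos` changed from ref to alt."""
    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the CDS (length {len(cds)})")
    if cds[pos - 1] != ref.upper():
        # Guard against coordinate or strand mix-ups before editing.
        raise ValueError(f"reference base mismatch at {pos}: found {cds[pos - 1]}, expected {ref}")
    return cds[: pos - 1] + alt.upper() + cds[pos:]


def apply_deletion(cds: str, pos: int, length: int = 1) -> str:
    """Return a copy of the CDS with `length` bases removed, starting at 1-based position `pos`."""
    cds = cds.upper()
    if not 1 <= pos <= len(cds) - length + 1:
        raise ValueError("deletion falls outside the CDS")
    return cds[: pos - 1] + cds[pos - 1 + length:]


# Toy wild type CDS (hypothetical): ATG GCT TGG AAA
wild_type = "ATGGCTTGGAAA"
stop_gained = apply_substitution(wild_type, 9, "G", "A")  # TGG -> TGA
frameshift = apply_deletion(wild_type, 4)                 # removes one base, shifting the frame
```

Checking the reference base before editing catches off-by-one and strand errors, which are easy to introduce when many variants are combined into one case.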

Translation of CDS into Protein Sequence
The EXPASY translation tool (https://web.expasy.org/translate, accessed on 23 March 2022) was used to translate the wild type and mutated CDSs. A total of twenty-two mutated protein sequences were obtained for all the cases designed (Supplementary Data Figure S1).
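The translation step can be illustrated with a short script. This is a minimal sketch assuming the standard genetic code and translation in the first reading frame; the function `translate` and the toy sequences are hypothetical, and the study itself used the EXPASY web tool. It shows how a stop-gained variant yields a truncated product and how a frameshift changes every downstream residue.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = False) -> str:
    """Translate a CDS in frame 1; '*' marks a stop codon and 'X' an unrecognized codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # A trailing incomplete codon (possible after a frameshift) is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE.get(cds[i:i + 3], "X")
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


# Toy sequences (hypothetical, not CYP11A1):
wild_type = translate("ATGGCTTGGAAA")                    # ATG GCT TGG AAA
truncated = translate("ATGGCTTGAAAA", to_stop=True)      # TGG -> TGA introduces a stop
shifted = translate("ATGCTTGGAAA")                       # one-base deletion shifts the frame
```

With `to_stop=False` the stop codon is kept as `*`, matching the notation used for stop-gained cases in this study.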

Sub-Cellular Localization and Ontology of Wild Type and Mutated Proteins
To study the effect of SNPs in each case on the localization and ontology of mutated proteins, the CELLO2GO online tool was used [58]. The results obtained were compared with the localization and ontology of normal protein.

Physicochemical Features
The ProtParam tool (https://web.expasy.org/protparam/, accessed on 25 March 2022) was used to analyze the effect of the SNPs in question on the physicochemical parameters of the mutated proteins. The parameters include the number of amino acids, molecular weight, theoretical pI, half-life, instability index, aliphatic index, and extinction coefficient.
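Two of these parameters follow simple composition-based formulas and can be sketched directly. The example below assumes the definitions documented for ProtParam: the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], with X in mole percent, and the molar extinction coefficient at 280 nm as 1490 per Tyr, 5500 per Trp, and 125 per cystine. The function names and the toy peptides are hypothetical, and the values reported in this study come from the ProtParam tool itself.

```python
def aliphatic_index(protein: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]  # ignore '*' stop symbols
    if not residues:
        raise ValueError("empty protein sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * residues.count(aa) / len(residues)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))


def extinction_coefficient(protein: str, cystines_formed: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Tyr, Trp, and cystine counts."""
    seq = protein.upper()
    value = 1490 * seq.count("Y") + 5500 * seq.count("W")
    if cystines_formed:
        # Assumption: all cysteines pair up into cystines (two Cys per cystine).
        value += 125 * (seq.count("C") // 2)
    return value


# Toy peptides (hypothetical):
ai = aliphatic_index("AVIL")            # each residue contributes 25 mole percent
ec = extinction_coefficient("WYCC")     # one Trp, one Tyr, one cystine
```

Because both quantities depend only on composition, a truncated or frameshifted protein can shift them markedly, whereas a single missense change in a protein of about 500 residues moves them only slightly.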

3-D and 2-D Structures and Trans-Membrane Topology
The PHYRE2 tool (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index, accessed on 15 April 2022) was used to determine the effect of SNPs on 2-D and 3-D structures of mutated proteins and trans-membrane topology.

Effect of SNPs on Mutated Protein Stability
Fifty-nine missense SNPs addressed in the present study were examined for their influence on the stability of proteins. It was found that all the missense mutations except rs1258660118, rs1393077247, rs1424340465, rs139449608, rs867506250, rs1208632679, rs752776256, rs1416463682, and rs200726137 contributed to the destabilization of the corresponding mutated proteins (Table 2).

Effect of SNPs on Sub-Cellular Localization of Proteins
Mutations documented in cases 2, 9, 10, 11, 14, 15, 16 to 19, 21, and 22 changed the localization of the corresponding mutated proteins from mitochondrial to higher in cytoplasm and high in mitochondria. In case 20, mutation significantly affected the location of the mutant protein and changed it to cytoplasmic only. Mutations included in cases 3, 5, 12, and 13 changed localization from mitochondrial to higher in mitochondria and high in cytoplasm. Mutations documented in cases 6, 7, 8, and 9 did not affect the localization. Case 4 mutation changed localization from mitochondrial to highest in the nucleus, higher in cytoplasm, and high in mitochondria. Case 1 changed the localization to higher in mitochondria and high in the nucleus (Figure 3, Table 3).

Effect of SNPs on Ontology of Mutated Proteins
Ontology analysis revealed the effect of SNPs on three types of molecular and biological functions. The molecular functions included ion binding, oxidoreductase activity, and lipid binding, while the biological functions discussed are lipid metabolism, biosynthesis, and small-molecule metabolism. No prominent effect was observed on the ion binding and oxidoreductase activity in cases 16 to 22. Slight or no deviations from normal values were observed with reference to lipid binding, lipid metabolism, biosynthesis, and small-molecule metabolism in cases 17 to 22 (Supplementary Data Figure S2). The largest deviation from normal values of ion binding and oxidoreductase activity was recorded in cases 1, 9, 14, and 15. Case 1 showed the highest deviation from lipid binding followed by cases 4, 3, 7, 8, 10-13, 15, and 16. As far as lipid metabolism, biosynthesis, and small-molecule metabolism are concerned, the greatest variation was observed in cases 1, 3, and 4 (Figure 4, Table 4).
The aliphatic index of mutated proteins was affected by single nucleotide variations in case 1 and case 4 due to the SNPs rs750026801 and rs776056840 incorporated in exon 2. The values observed were 54.86 and 70.89. Other SNPs noticed in the remaining cases also affected the aliphatic index but the variation was not considerable.

Effect of SNPs on 3D Structures of Mutated Proteins
In case 1, a single frameshift mutation rs750026801 in exon 1 was documented, which changed the amino acid at position 85 of the mutated protein as well as the reading frame of the sequence. This mutation caused an abnormality in the overall 3D conformation of the protein. Fifteen missense mutations incorporated in exons 1, 2, and 3 in case 2, nineteen missense mutations documented in exons 4, 5, and 6 in case 9, and twenty-four missense SNPs incorporated in exons 7, 8, and 9 in case 14 did not affect the overall 3D conformation of the mutated protein (Supplementary Data Figure S3).
In case 3, a single stop-gained mutation rs776056840 was introduced in exon 2. It changed the encoding codon at position 120 of the protein sequence into a stop codon, resulting in a truncated protein. In case 4, a single frameshift mutation rs779154292 was incorporated in exon 2 of the CDS, resulting in a change of amino acid I > X at position 102. This led to an abnormal protein conformation. In case 5, a single frameshift mutation rs1217014229 was documented in exon 2, which caused a change in amino acid H > PX at position 130 of the protein sequence. This change resulted in the distortion of the overall protein structure. In case 6, a single stop-gained mutation rs549043326 was introduced in exon 3, which converted codon 144 into a stop codon. This resulted in the truncation of the protein. In case 7, a single frameshift mutation rs1421587886 was incorporated in exon 3, which resulted in a change of amino acid A > X at position 173 of the protein sequence. As a consequence, the 3D protein structure was altered. In case 8, a single frameshift mutation rs1178589612 was incorporated in exon 3 of the gene. This mutation replaced the amino acid L > X at position 170 of the protein sequence, which led to an alteration in the overall protein structure. In case 10, a single stop-gained mutation rs755186597 was documented in exon 4 of the CDS, which changed the encoding codon at position 232 of the protein sequence into a stop codon. This led to the formation of a truncated protein.
In case 11, a single stop-gained mutation incorporated in exon 5 of the gene resulted in a stop codon at position 282 of the protein sequence, leading to the formation of a truncated protein. In case 12, a frameshift mutation rs757299093 documented in exon 5 of the CDS changed the reading frame of the mRNA, leading to the formation of an abnormal protein. In case 13, a mutation rs1555425667 documented in exon 5 changed the amino acid E > X at position 314 of the protein sequence as well as the reading frame, resulting in a distorted protein structure. In case 15, a stop-gained mutation rs1454328072 in exon 7 of the gene converted the codon at position 405 into a stop codon. This led to the formation of a truncated protein.

Effect of SNPs on Secondary Structure of Mutated Proteins
Three characteristics of secondary structures, i.e., disorder, α helix, and β strand percentage were analyzed. SNPs documented in case 1 (rs750026801 in exon 1), case 3 (rs776056840 in exon 2), case 4 (rs779154292 in exon 2), case 5 (rs1217014229 in exon 2), and case 6 (rs549043326 in exon 3) caused larger deviations from normal disorder values in mutated proteins. On the other hand, mutations incorporated in case 2 and cases 7 to 22 did not affect the disorder (%) of mutated proteins.

Effect of SNPs on Trans-Membrane Topology of Mutated Proteins
In case 1, case 3, case 4, case 5, case 6, and case 8, no topology was observed through in-silico analysis. Mutation induced in case 3 resulted in the formation of three trans-membrane domains in mutant form as compared with two domains in normal protein.
In cases 2, 9, and 11 to 22, mutations did not affect the number of domains of mutated proteins (Supplementary Data Figure S4). Mutations addressed in cases 7 and 10 reduced the number of domains to 1 in mutant forms of protein compared with normal (Figure 6). A slight variation in the number of amino acids in signal peptide and trans-membrane segments of mutated proteins was also observed (Supplementary Data Table S2).

Discussion
According to the literature, increased expression of the CYP11A1 gene causes enhanced steroidogenesis in the ovaries of PCOS patients. Therefore, it is considered a potential candidate gene for PCOS [59]. Association studies of this gene have been performed in PCOS-affected British, Greek, Indian, Chinese, Iraqi, and Spanish females. In Iraqi females, PCOS was found to be associated with the incidence of three and five repeats in the promoter region of CYP11A1 [60].
PCOS-affected Caucasian females of America have been reported to exhibit pentanucleotides with nine repeats. On the other hand, Chinese and Spanish females with eight and four repeats, respectively, have been reported [61,62].
A study conducted in the Indian population revealed a connection between PCOS and a pentanucleotide polymorphic repeat. Females with more than eight repeats had PCOS, while those with fewer than eight were found to be healthy [63].
Most of the studies conducted so far dealt with SNPs located in the promoter region and UTRs of the gene. A study conducted on Chinese patients with PCOS reported the association of seven SNPs, i.e., rs1843090, rs11632698, rs4887139, rs4077582, D15S1547, D15S1546, and D16S520, with this disease [64]. A strong association of rs4077582 was also reported in another study. This SNP was found to induce a change in the level of testosterone by affecting LH production [53,65].
Another study reported a strong relation between a pentanucleotide repeat (TTTTA) found in the 5′ UTR of the gene and abnormally high testosterone levels in PCOS-affected British females [67].
Another investigation targeted this microsatellite pentanucleotide polymorphism and found it to be the cause of enhanced risk of PCOS in Egyptian females [67-69]. A project aimed at finding the association of CYP11A1 polymorphisms with PCOS revealed three genetic variations with a potential effect, i.e., rs4887139, rs4077582, and rs11632698 [70].
Hence, the literature strongly supports association of CYP11A1 gene polymorphism with incidence of PCOS.
Although mutational changes in promoter and untranslated regions might affect the transcription regulation of the gene [71], mutations in exons are more crucial because they may directly alter the structure and function of the protein. Therefore, the present study was initiated to determine the effect of polymorphisms localized in exons of this gene. For this purpose, polymorphisms were obtained from a genome-wide database of human variations produced by the HapMap project and investigated for their effect on protein structure and activity.
Other polymorphisms that altered the structure of protein included rs750026801 (exon 1), rs779154292 and rs1217014229 (exon 2), and rs757299093 and rs1555425667 (exon 5). These mutations were found to have a strong impact on the nature of encoded proteins and, hence, can be suggested to be considered for PCOS association studies as well as strong genetic biomarkers for the disease. These SNPs can also be helpful in determining the susceptibility of females for PCOS.

Conclusions
Comparing the literature with the present study regarding the CYP11A1 gene, it can be inferred that both the promoter and the exonic regions are prone to PCOS-related mutations. Exonic mutational changes may lead to the formation of proteins with faulty structure and function. In addition, polymorphism in the promoter region may contribute to up- or downregulation of gene expression. Both these cases may have deleterious consequences. Therefore, the study of SNPs in candidate genes of PCOS such as CYP11A1 might be helpful in the future for providing genomic markers of PCOS susceptibility, for tracking the inheritance pattern of these variants in families, and for elucidating the role of genes in disease pathogenesis.
Supplementary Materials: https://www.mdpi.com/article/10.3390/genes13071231/s1, Figure S1: Protein sequences of mutated proteins retrieved through Expasy tool for cases 1-22, containing variations documented in present study, Figure S2: Effect of SNPs documented in cases 17-22 on the ontology of mutated proteins, Figure S3: Effect of SNPs documented in cases 2, 9 and 14 on 3D structure of mutated proteins, Figure S4: Effect of SNPs documented in cases 2 and 11-22 on the topology of mutated proteins, Table S1: Variations in 2D structure of mutated proteins induced by SNPs documented in present study, Table S2: Effect of mutations on trans-membrane topology of mutated proteins.

Data Availability Statement:
The SNPs documented in the present study were retrieved from the ENSEMBL database (https://asia.ensembl.org/index.html).

Conflicts of Interest:
The authors declare no conflict of interest.