In Silico Analysis Identified Putative Pathogenic Missense nsSNPs in Human SLITRK1 Gene

Human DNA contains several variations, which can affect the structure and normal functioning of a protein. These variations could be single nucleotide polymorphisms (SNPs) or insertion-deletions (InDels). SNPs, as opposed to InDels, are more commonly present in DNA and may cause genetic disorders. In the current study, several bioinformatic tools were used to prioritize the pathogenic variants in the SLITRK1 gene. Out of all of the variants, 16 were commonly predicted to be pathogenic by these tools. All the variants had a very low frequency, i.e., <0.0001 in the global population. The secondary structure of all filtered variants was predicted, but no structural change was observed at the site of variation in any variant. Protein stability analysis of these variants was then performed, which indicated a decrease in protein stability for 10 of the variants. Amino acid conservation analysis revealed that all the amino acids were highly conserved, indicating their structural and functional importance. The 3D structures of wildtype SLITRK1 and all of its variants were predicted using I-TASSER, and the effect of each variation on the 3D structure of the protein was assessed using the Missense3D tool, which indicated probable structural damage in three variants, i.e., Asn529Lys, Leu496Pro and Leu94Phe. The wildtype SLITRK1 protein and these three variants were independently docked with the close interactor protein PTPRD, and marked differences were observed between the docking sites of the wildtype and variant proteins, which may affect the functional activity of the SLITRK1 protein. Previous studies have shown that mutations in SLITRK1 are involved in Tourette syndrome. The present study may assist molecular geneticists in interpreting variant pathogenicity in research as well as diagnostic settings.


Introduction
Human DNA contains several variations in its sequence, including single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels). SNPs are the most frequently occurring variations in the human genome. These variations may alter protein structure and function and can affect the normal character(s) of an organism [1][2][3]. InDels cause substantial genetic variation in the genome of an organism. Many InDels occur in functionally important parts of the genome and hence may also play a role in disease onset [4]. A single nucleotide substitution may have either a missense or a nonsense effect. The detailed classification of variants is shown in Figure 1. In a missense effect, one amino acid is replaced by another amino acid, while nonsense variants replace a coding codon with a stop codon, which eventually leads to truncation of the protein [5]. SNPs account for about 90% of human genome polymorphisms. Through genome-wide prioritization, 0.12% of all variants in the human genome are predicted to be pathogenic [6]. Many SNPs do not contribute to the causation of disease, but certain SNPs, called missense SNPs or non-synonymous SNPs (nsSNPs), are involved in genetic disorders [7]. nsSNPs are the major contributing factor in about 50% of the total known mutations [8,9].
Most disease-causing SNPs are reported in evolutionarily conserved regions of the human genome, which are of great importance for the structure and function of proteins. Identifying the pathogenicity of a specific SNP is very important for disease prognosis. Identifying the SNPs involved in a disease is difficult, as it requires multiple tests on hundreds to thousands of SNPs in candidate genes. Prioritizing SNPs using bioinformatics tools is one possible way to overcome this problem, because prediction tools help to discriminate disease-causing variants from neutral ones.
In the current study, several bioinformatics tools were used to investigate the structural and functional consequences of nsSNPs present in the coding region of the human SLITRK1 gene. We also predicted the 3D structure of the wildtype SLITRK1 protein and of its prioritized, predicted pathogenic variants. This is the first in silico study of the human SLITRK1 gene, and it is helpful in predicting pathogenic nsSNPs in the coding region of the gene.

Variant Recruitment
Variants of the SLITRK1 gene were recruited from the Ensembl genome browser (https://asia.ensembl.org/index.html, accessed on 28 February 2021). The variants were filtered manually in an MS Excel file listing all the variants of the SLITRK1 gene. Only nsSNPs (i.e., missense variants) that fell within the coding region of the SLITRK1 gene were selected for further analyses. These nsSNPs were analyzed with various bioinformatics tools to find putative pathogenic variants (Figure 2).
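The manual filtration step described above could also be scripted. The following is a minimal sketch, assuming the variant table was exported as a CSV file; the column names `variant_id` and `consequence`, the consequence labels, and the sample rows are hypothetical placeholders and do not reproduce the actual export format or the data of this study.

```python
import csv
import io


def select_missense(rows, consequence_col="consequence"):
    """Keep rows whose consequence annotation includes 'missense variant'."""
    kept = []
    for row in rows:
        # A variant may carry several comma-separated consequence terms.
        terms = [
            term.strip().lower().replace("_", " ")
            for term in row[consequence_col].split(",")
        ]
        if "missense variant" in terms:
            kept.append(row)
    return kept


# Hypothetical placeholder export (not real SLITRK1 variants).
EXAMPLE_CSV = """variant_id,consequence
var_A,missense variant
var_B,synonymous variant
var_C,3 prime UTR variant
var_D,"missense variant, NMD transcript variant"
var_E,stop gained
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
missense_ids = [row["variant_id"] for row in select_missense(rows)]
print(missense_ids)
```

Matching on the annotated consequence term, rather than on the amino acid change, keeps nonsense (stop gained) variants out of the missense set, which mirrors the selection rule stated in the text.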

Predicting Pathogenicity of Missense nsSNPs
Different online bioinformatics tools were used to predict the pathogenicity of filtered nsSNPs.

Variant Frequency
The frequency of the variants commonly predicted to be pathogenic by all the pathogenicity prediction tools was checked using dbSNP (https://www.ncbi.nlm.nih.gov/snp/, accessed on 2 October 2021).

Secondary Structure Prediction
The secondary structures of the wildtype SLITRK1 protein and the commonly predicted pathogenic variants were analyzed using the online tool PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/, accessed on 4 October 2021).
The ConSurf web server [10] shows the evolutionary conservation pattern of amino acids or nucleic acids and predicts structurally and functionally important regions. The predictions are based on conservation scores that range from 1 to 9, where 1 indicates variable regions, 5 indicates moderately conserved regions and 9 indicates highly conserved regions. Exposed residues with high scores are considered functional residues, whereas buried residues with high scores are considered structural.
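The interpretation rule described above can be written as a small function. This is only an illustrative sketch: the function name, the cutoff of 8 for "highly conserved" and the example inputs are assumptions made for illustration and are not taken from the ConSurf server itself.

```python
def classify_residue(score, buried, high_cutoff=8):
    """Interpret a ConSurf-style conservation score (1 = variable, 9 = conserved).

    high_cutoff is an assumed threshold for calling a residue highly conserved.
    """
    if not 1 <= score <= 9:
        raise ValueError("conservation score must be between 1 and 9")
    if score < high_cutoff:
        return "not highly conserved"
    # Highly conserved and buried -> structural; highly conserved and exposed -> functional.
    return "structural" if buried else "functional"


# Placeholder inputs for illustration only.
print(classify_residue(9, buried=True))
print(classify_residue(8, buried=False))
print(classify_residue(5, buried=False))
```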

Protein 3D Structure Prediction
The I-TASSER tool (https://zhanglab.ccmb.med.umich.edu/I-TASSER/, accessed on 1 December 2021) was used to predict the 3D structure of the SLITRK1 protein and its commonly predicted pathogenic variants. Missense3D (http://missense3d.bc.ic.ac.uk/missense3d/, accessed on 10 January 2022) was used to predict the structural changes in the protein caused by an amino acid substitution. UCSF Chimera (candidate version 1.15) was used to visualize the 3D structures retrieved from I-TASSER. The 3D structures of the wildtype and variant proteins were superimposed in UCSF Chimera to observe structural changes.

Protein-Protein Interactions
The interaction of the SLITRK1 protein with other proteins was studied using the online tool STRING (https://string-db.org, accessed on 20 January 2022), which predicts the top ten proteins that interact with the query protein. STRING predicts the interactors of a protein on the basis of gene fusion, co-expression, function and experimental data. It reports a combined score for each interacting protein, ranging from 0 to 1, where 0 indicates the lowest and 1 the highest confidence in the interaction [11].

Protein-Protein Docking
The online tool ClusPro was used to dock the wildtype and mutant SLITRK1 proteins with the close functional interactor of SLITRK1 [12].

Variant Recruitment
A total of 2255 variants were recruited from the Ensembl genome browser, consisting of synonymous, non-synonymous, intronic, 3′UTR and 5′UTR variants. The identified variants in SLITRK1 include 7 indels, 321 5′UTR variants, 624 3′UTR variants and 5 nonsense variants. Out of the non-synonymous variants, 7 were indels and 447 were SNPs. Of these nsSNPs, 442 were missense, i.e., causing a change in an amino acid, and 5 resulted in the formation of a stop codon, causing truncation of the protein. Only missense nsSNPs were selected for further bioinformatics analysis.

Pathogenicity Prediction of Variants
All the missense nsSNPs were submitted to ten bioinformatics tools to predict their pathogenicity. The number of variants predicted to be pathogenic by each tool was 146 by PolyPhen-2, 101 by SNPs&GO, 102 by Meta-SNP, 103 by PROVEAN, 177 by SIFT, 104 by MutationAssessor, 395 by PANTHER, 141 by PhD-SNP, 160 by SNAP2 and 387 by PMut. Among all the variants, 16 were predicted to be pathogenic by all ten tools (Table 1). These 16 variants were selected for further analysis because of their high likelihood of being pathogenic.
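The consensus step, in which only variants flagged by every tool are retained, amounts to a set intersection. The sketch below illustrates the idea under stated assumptions: the function name is hypothetical, only three of the ten tools are shown, and the variant labels are placeholders rather than the predictions obtained in this study.

```python
def consensus_pathogenic(predictions):
    """Return the variants flagged as pathogenic by every tool.

    predictions maps a tool name to an iterable of variant identifiers.
    """
    call_sets = [set(calls) for calls in predictions.values()]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Placeholder predictions for illustration only.
example_predictions = {
    "SIFT": ["v1", "v2", "v3"],
    "PROVEAN": ["v2", "v3"],
    "PolyPhen-2": ["v3", "v2", "v4"],
}

consensus = consensus_pathogenic(example_predictions)
print(sorted(consensus))
```

Requiring agreement among all tools is a strict criterion: it reduces false positives at the cost of discarding variants that only some predictors flag.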

Variant Frequency
dbSNP (https://www.ncbi.nlm.nih.gov/snp/, accessed on 2 October 2021) was used to check the frequency of the commonly predicted pathogenic variants. The frequency of all the variants was very low (<0.0001) in the global population, which is consistent with the findings of the pathogenicity prediction tools.

Secondary Structure Prediction
The secondary structure predictions indicated that all but one of the selected variants lie in coil regions of the SLITRK1 protein, while one lies in a helix. No structural change was observed at the site of substitution. However, some changes were observed upstream and downstream of the substitution sites.

Protein Stability Analysis
The I-Mutant and MUpro tools were used to assess the effect of the selected amino acid substitutions on the stability of the SLITRK1 protein. Of the 16 commonly predicted pathogenic variants, 10 were predicted by both tools to decrease protein stability, suggesting that these substitutions may be more damaging to the SLITRK1 protein (Table 2). The Clustal Omega tool was used for multiple sequence alignment of the human SLITRK1 protein with the SLITRK1 protein of other species. The results revealed that the amino acids at all variant positions were highly conserved in all the other species, indicating the evolutionary and functional importance of the selected amino acids (Figure 3).
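The agreement rule between the two stability predictors can be sketched as follows. The sketch assumes that both tools report a signed value in which a negative number indicates decreased stability; this sign convention, the function name and the numeric values are assumptions for illustration and should be checked against the output of the tool versions actually used.

```python
def both_predict_decrease(i_mutant_ddg, mupro_score):
    """True when both predictors indicate decreased stability.

    Assumes a negative value means 'decrease' for both tools.
    """
    return i_mutant_ddg < 0 and mupro_score < 0


# Placeholder scores for illustration only: (I-Mutant value, MUpro value).
example_scores = {
    "variant_A": (-1.2, -0.8),
    "variant_B": (-0.4, 0.3),
    "variant_C": (0.6, -0.1),
}

destabilizing = [
    name
    for name, (ddg, score) in example_scores.items()
    if both_predict_decrease(ddg, score)
]
print(destabilizing)
```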

ConSurf
The ConSurf tool was used to identify the evolutionary conservation of the amino acids of the SLITRK1 protein. ConSurf predicts which amino acids play structural or functional roles based on conservation and solvent accessibility. Residues are predicted to be functional when they are highly conserved and exposed, and structural when they are highly conserved and buried. The results indicate that all 10 nsSNPs predicted to be damaging affect highly conserved residues, i.e., nine have a conservation score of 9 and one has a conservation score of 8. Of these 10 amino acids, half were buried and predicted to be structural residues, while the rest were exposed and predicted to be functional residues (Figure 4).

3D Structure Predictions
Three-dimensional models of all SLITRK1 variants were generated and superimposed on the 3D structure of wildtype SLITRK1 (Figure 5). The Missense3D tool was used to detect structural changes caused by the amino acid substitutions. Missense3D predicted structural damage in variants I, III and VIII, while no structural damage was detected in the other variants. Hence, we selected variants I, III and VIII for further analysis. Manual comparison of all the structures revealed different changes in the folding pattern of the protein. The highest similarity index with the wildtype 3D model of the SLITRK1 protein, 71.12%, was shown by the 3D model of the mutant Asn529Lys, while the lowest similarity index was shown by the 3D model of the mutant Leu94Phe. The similarity index of the 3D model of the mutant Leu496Pro with the wildtype SLITRK1 protein was 31.32%.

Protein-Protein Interactions
The STRING tool was used to predict the close interactor proteins of SLITRK1. The results showed that SLITRK1 has close interactions with the PTPRD, PTPRS, PTPRF, OPCML, PTPRA, PTPRE, DLGAP3, PTPRT, IGHMBP2 and SGCE proteins. Among these, PTPRD was identified as the closest functional interactor of SLITRK1 (Figure 6).

Protein-Protein Docking
The wildtype SLITRK1 protein and the selected variants were docked with the close interactor protein PTPRD, and notable differences were observed between the sites at which wildtype SLITRK1 and its variants interact with PTPRD. The results showed that wildtype SLITRK1 interacts with PTPRD at eight amino acid residues, i.e., Ala680, Lys647, Phe641, Arg584, Thr604, Tyr582, Glu550 and Ser616. These eight interactions comprise seven hydrogen bonds and one unfavorable bond.
Variant Leu94Phe showed the fewest interactions with PTPRD, interacting at five different residues via four hydrogen bonds and one unfavorable bond, while variant Leu496Pro showed the most interactions with PTPRD, interacting at 14 different residues via 15 bonds, including 12 hydrogen bonds and 3 unfavorable bonds. The interactions of SLITRK1 (wildtype and variants) with the close interactor PTPRD are shown diagrammatically in Figure 7.
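Differences between docking interfaces of this kind can be summarized by comparing the sets of interacting residues. The sketch below is illustrative only: the wildtype residue list is the one reported in the text, whereas the variant residue list, the label `Res_X1` and the function name are hypothetical placeholders, not results of this study.

```python
def compare_interfaces(wildtype_residues, variant_residues):
    """Summarize which interface residues are shared, lost or gained."""
    wildtype, variant = set(wildtype_residues), set(variant_residues)
    return {
        "shared": sorted(wildtype & variant),
        "lost": sorted(wildtype - variant),
        "gained": sorted(variant - wildtype),
    }


# Interacting residues reported in the text for the wildtype complex.
WILDTYPE = ["Ala680", "Lys647", "Phe641", "Arg584",
            "Thr604", "Tyr582", "Glu550", "Ser616"]

# Hypothetical variant interface, for illustration only.
HYPOTHETICAL_VARIANT = ["Lys647", "Arg584", "Tyr582", "Res_X1"]

summary = compare_interfaces(WILDTYPE, HYPOTHETICAL_VARIANT)
for category, residues in summary.items():
    print(category, residues)
```

Listing lost and gained residues separately makes it clear whether a variant weakens the wildtype interface, creates new contacts, or both, which a simple count of interacting residues does not show.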

Discussion
SNPs, also known as single nucleotide variants (SNVs), are the most commonly found variants in the human genome. According to one estimate, the human genome contains at least 11 million SNPs (1 per 300 bp on average) [13]. SNPs are found in protein-coding as well as in non-protein-coding regions [14,15]. Research has shown that variations in non-coding elements may also be the cause of several genetic conditions [16]. Considerable evidence shows that some variants are found in functional non-coding regions, including chromatin marks, DNase hypersensitivity sites and enhancer elements [17,18]. The role of variations in non-coding regions, including intergenic sequences, non-coding RNAs and the non-coding elements of protein-coding genes, is challenging to determine and needs to be better understood. The non-coding variants involved in the causation of different disorders may be found in enhancer regions, promoter sites, or the 5′UTR or 3′UTR of a gene [19]. Non-coding variants also play an important role in the regulation of gene expression, although identifying the effect of a variation on the molecular mechanism of gene regulation is challenging [20]. Studies have also demonstrated that variations in non-coding regions are associated with the timing of DNA replication [21]. Variations in the non-coding genome are associated with various diseases, but much research is still required to fully understand their functional effects.
The human genome is estimated to contain around 24,000 to 60,000 coding SNPs [22,23]. nsSNPs are more significant because they have the potential to affect the structure and function of expressed proteins and are, therefore, likely to represent modifiers of inherited susceptibility to disease [24]. nsSNPs alter cellular functions in many ways. Indeed, nsSNPs often influence normal protein function through a combination of effects on protein stability, protein-protein interactions and many other features [25]. Numerous studies have shown that nsSNPs are responsible for about 50% of the mutations involved in various genetic disorders [9]. This supports the view that nsSNPs, especially missense SNPs, are associated with various human diseases. Recent computational studies of nsSNPs have shown the potential of such approaches for understanding the impact of mutations on the molecular mechanisms of various diseases [26][27][28].
SLITRK1 is a member of the SLITRK family and, similar to other members of the family, is an integral membrane protein with two N-terminal leucine-rich repeat (LRR) domains [29]. SLITRK1 has the LRR1 domain, through which it interacts with LAR receptor protein tyrosine phosphatases (PTPs) and controls synapse formation [30][31][32].
The SLITRK1 gene was mapped to a region of chromosome 13q31 [29][30][31][32][33]. The SLITRK1 gene is highly expressed in adult and fetal brains; moderately expressed in the lungs and pancreas; and expressed at very low levels in the ovaries, kidneys, heart and liver [33]. Mouse Slitrk1 cDNA was cloned by Aruga and Mikoshiba (2003), who discovered that the protein contains a signal peptide at the N-terminus, followed by LRR domains and a transmembrane domain at the C-terminus [29]. Aruga and Mikoshiba (2003) performed a Northern blot analysis of various mouse tissues and found very high expression of Slitrk1 only in the brain [29]. To date, 14 different types of mutations have been identified in the SLITRK1 gene (HGMD). Mutations in the SLITRK1 gene have been reported to be involved in Tourette syndrome (TS; OMIM #137580) [34]. TS is a neuropsychiatric disorder with an estimated onset in early childhood. It is characterized by vocal and motor tics and affects about 1 in every 100 individuals worldwide. TS patients often have obsessive compulsive disorder (OCD) as well as attention deficit hyperactivity disorder (ADHD), along with mood and sleep disorders, depression and anxiety [35,36]. Many genes (such as NTN4, SLC6A4, IMMP2L, CNTNAP2, NLGN4, HDC and SLITRK1) and some chromosomal loci are known to date to be involved in TS [37]. Dysfunction of the serotonin and dopamine neurotransmitter systems and defects of cortico-striatal-thalamic-cortical pathways are considered to be associated with TS. Despite extensive genetic research, an understanding of the pathogenetic mechanism of TS is still largely lacking, and the number of variants likely to cause TS is extremely small [38,39].
In the current study, we applied a bioinformatics approach to predict probably harmful nsSNPs and their possible consequences for the structure and function of the SLITRK1 protein. The identified variants in SLITRK1 include 7 indels, 321 5′UTR variants, 624 3′UTR variants and 5 nonsense variants. The analysis initially identified 442 missense variants out of 2255 total variants. The pathogenicity prediction tools commonly predicted 16 variants to be presumably harmful to protein structure and function. The frequency of all 16 variants was very low. The secondary structure of these 16 variants did not show any change at the site of variation. A protein stability analysis is necessary to assess the structural and functional activity of a protein [40]. Protein stability governs the conformational structure of the protein and thus determines its function. Any alteration in protein stability may cause misfolding, degradation or aberrant accumulation of proteins [41]. Out of the 16 variants, 10 showed decreases in protein stability based on the I-Mutant and MUpro tools. The conservation of the amino acids at the points of substitution was checked in nine different species, including Homo sapiens, using the Clustal Omega tool. The results showed that all the amino acids at the sites of variation were highly conserved, indicating their structural and functional importance. The Missense3D tool was used to check for possible damage to the 3D structure of the SLITRK1 protein caused by the amino acid substitutions. The results showed that variants I, III and VIII affect the structural conformation of the SLITRK1 protein. Hence, variants I, III and VIII were selected for further in silico analysis. The wildtype SLITRK1 protein and its three variants, i.e., Leu94Phe, Leu496Pro and Asn529Lys, were docked with the close interactor protein PTPRD. Molecular docking analysis revealed that the aforementioned variants can possibly affect the functional activity of the SLITRK1 protein.
A limitation of the current study is that the analysis was conducted without considering a disease model, because a single gene may independently cause different diseases with different segregation patterns. Furthermore, the alleles analyzed here (obtained from the genome browser) exist in the population but have not yet been associated with any disease, for several possible reasons; for example, an allele may be associated with a recessive condition in which a homozygous genotype is necessary for disease onset. Such an allele nevertheless circulates in the population and may, by chance or through extensive consanguinity, come together in a single individual (i.e., as a homozygous genotype) and cause the disorder. Therefore, in this study, we focused on predicting the detrimental effect of each allele regardless of the disease inheritance pattern or disorder type. It is thus speculated that, if such an allele segregates in an individual or family, it may have a negative impact. Moreover, the study did not include non-coding variants, in spite of their significant role in the spatio-temporal pattern of gene expression.
In conclusion, the present bioinformatic study may assist molecular geneticists in interpreting variant pathogenicity in research as well as diagnostic settings.