Soft Computing Methods in Bioinformatics: a Comprehensive Review

Applications of genomics, proteomics, epigenetics, pharmacogenomics, and systems biology have grown substantially, producing an explosion of high-dimensional, complex data. Bioinformatics data sets typically combine high dimensionality with small sample sizes. Genome-wide investigations generate large volumes of data, and there is a need for soft computing methods (SCMs), such as artificial neural networks, fuzzy systems, evolutionary algorithms, metaheuristic and swarm intelligence algorithms, and statistical model algorithms, that can cope with data of this size. SCMs are increasingly used across a variety of bioinformatics applications. They are used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and they are a principal tool in any biological (or biomarker) discovery process. The aim of this article is to introduce soft computing methods for bioinformatics. These methods provide supervised or unsupervised classification, clustering, and statistical or stochastic heuristic models for knowledge discovery. The current problems and prospects of SCMs in bioinformatics applications are also discussed.


INTRODUCTION
Bioinformatics researches, develops, and applies computational approaches for analyzing, and thus expanding the use of, biological, behavioral, and medical data. SCMs are applied in many biological domains of bioinformatics to extract knowledge from data on biological and medical problems. These problems can be classified into six domains: genomics, proteomics, microarrays, systems biology, evolution, and text mining [1].
Genomics is one of the most important domains in bioinformatics. It is the discipline of genetics that applies recombinant DNA and DNA sequencing methods to sequence, assemble, and analyze the function and structure of genomes. Genomic data require pre-processing before useful information can be extracted. As a first step, the location and structure of genes can be extracted from genome sequences. Sequence information can also be used to predict gene function and RNA secondary structure [2,3].
Proteomics is an essential application area of SCMs for protein structure prediction. Proteins are very complex macromolecules with thousands of atoms and bonds. The number of possible structures is therefore very large, which makes protein structure prediction a hard combinatorial problem that requires sophisticated optimization methods such as SCMs [1].
Genomic and proteomic data analysis is an essential tool for understanding the underlying factors involved in human disease [4]. Applications of genomic and proteomic technologies have increased sharply, producing a huge amount of multi-dimensional, complex data [5,6]. Artificial neural networks are well suited to these data because of their ability to cope with multi-dimensional, complex data sets such as those produced by protein mass spectrometry and DNA microarray experiments. As such, they have been applied to the diagnosis of disease and the validation of biomarkers. Feature selection is used together with classifier design to avoid over-fitting, to build more efficient classifiers, and to provide more insight into the underlying causal relationships [7].
The microarray domain concerns the management of complex experimental data for the application of computational methods in biology. Complex experimental data raise two types of problems. The first is that the data need pre-processing, i.e., they must be modified so that SCMs can use them. The second is the analysis of the data, which depends on what is being sought. For microarray data, the best-known applications are pattern recognition, classification, and genetic networks [1].
Systems biology is another important domain of biology in which soft computing methods are used. Systems biology is the study of systems of biological components, which may be molecules, cells, organisms, or entire species. The life processes that take place inside the cell are very difficult to model. SCMs are therefore extremely helpful for modeling biological networks, especially genetic networks, signal transduction networks, and metabolic pathways [8].
The evolution domain, especially phylogenetic tree reconstruction, can also take advantage of SCMs. A phylogenetic tree (or evolutionary tree) is a schematic representation of the evolution of organisms, showing the inferred evolutionary relationships among various biological species (or other entities) based on similarities and differences in their physical and genetic features. Biological sequences are aligned for many different reasons. Biological sequence alignment helps to discover functional and structural similarity between sequences. Scientists use aligned sequences to construct phylogenetic trees, characterize protein families, and predict protein structure [9][10]. Traditionally, phylogenetic trees were constructed from features such as morphological and metabolic characteristics. Nowadays, with the large number of genome sequences available, phylogenetic tree construction algorithms are based on the comparison of different genomes. This comparison is made by means of multiple sequence alignment, where optimization methods are very useful.
The text mining domain is a by-product of the application of SCMs in bioinformatics, driven by the increasing amount of data. This growth provides a new source of valuable information from which knowledge must be extracted. Text mining is therefore attracting growing interest in computational biology, and it is being applied to functional annotation, cellular location prediction, and protein interaction analysis [11].
Bioinformatics is a discipline built upon the fields of computer and information sciences. It relies mainly on strategies to acquire, store, organize, archive, analyze, and visualize data. Bioinformatics (or computational biology) encompasses the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, medical, and behavioral systems. SCMs are well suited to many bioinformatics problems, including gene selection, clustering and classification, signal processing, and image analysis. In bioinformatics, supervised or unsupervised classification with multi-dimensional input variables is frequently encountered, and SCMs are able to cope with such multi-dimensional, complex data sets [12]. SCMs are used to solve other bioinformatics problems as well. SCMs can be divided into two classes: supervised and unsupervised learning methods. Unsupervised (clustering) techniques are used to group similar genomic or proteomic profiles and thereby elucidate relationships within sample groups. These techniques are also used to assign biomarkers to sub-groups based on their expression profiles across patient samples. Although clustering is useful for exploratory analysis, it is limited by its inability to incorporate expert knowledge. In contrast, classification and feature ranking are supervised, knowledge-based soft computing methods that estimate the distribution of biological expression data and, in doing so, can extract important information about these experiments. Classification is closely coupled with feature ranking, a principal data reduction technique that uses estimated classification error or other statistical tests to score features [13]. Figure 1 shows a classification of the topics where soft computing methods are applied [1].

Figure 1. Classification of the topics where soft computing methods are applied [1].
This article is organized as follows. The first section is an introduction to the previous literature. The second section presents SCMs and briefly describes several well-known SCM algorithms. The third section discusses applications of SCMs to selected bioinformatics problems and surveys the related literature. The final section presents the conclusions of this review of SCMs in bioinformatics.

SOFT COMPUTING METHODS
Soft computing methods (SCMs) consist of programming computers to optimize a performance criterion using example data or past experience. Soft computing is not a homogeneous body of concepts and methods. Rather, it is a partnership of distinct techniques that, in one way or another, conform to its guiding principle: to exploit the tolerance for imprecision and uncertainty in order to achieve tractability, robustness, and low solution cost. SCMs thus deal with imprecision, uncertainty, partial truth, and approximation [14]. It should be noted that the efficiency of the learning and inference algorithms, their space and time complexity, and their transparency and interpretability can be as important as their learning accuracy [15]. The well-known SCMs are artificial neural networks, fuzzy systems, Bayesian networks, evolutionary algorithms, genetic algorithms, metaheuristic and swarm intelligence algorithms (such as ant colony optimization, bees algorithms, the bat algorithm, cuckoo search, harmony search, the firefly algorithm, artificial immune systems, and particle swarm optimization), and chaos theory. Generally speaking, SCMs resemble biological processes more closely than traditional methods, which are largely based on formal logical systems, such as sentential logic and predicate logic, or rely heavily on computer-aided numerical analysis (as in finite element analysis). SCMs are intended to complement one another. Unlike hard computing schemes, which strive for exactness and full truth, SCMs exploit the tolerance of imprecision, partial truth, and uncertainty available for a particular problem. Another common contrast is that inductive reasoning plays a larger role in soft computing than in hard computing.

Artificial Neural Networks
An artificial neural network (ANN) is an information processing model, implemented in hardware or software, that is modeled on the biological processes of the brain. ANNs can derive meaning from imprecise or complex data, extracting patterns and detecting trends that are not easily recognized by humans or by other computational techniques [16][17]. ANNs have mainly been used to examine complex relationships between input and output variables in many scientific and technological areas, including biomedicine and bioinformatics [18][19]. Some well-known ANN algorithms, such as back-propagation (BP), radial basis function (RBF) networks, and support vector machines (SVMs), are widely used to solve bioinformatics problems.
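The back-propagation algorithm, which uses the generalized delta learning rule, is an iterative gradient algorithm designed to minimize the mean squared error between the actual output of a multilayered feed-forward ANN and a desired output. Each layer is fully connected to the previous layer and has no other connections. The back-propagation classifier can be described by the following steps [20]:
• Initialization: Set all the weights and biases to small real random values.
• Presentation of input and desired outputs: Present the input vectors $x(1), x(2), \dots, x(N)$ and the corresponding desired responses $d(1), d(2), \dots, d(N)$, one pair at a time, where $N$ is the number of training patterns.
• Calculation of actual outputs: Propagate the input forward through the network, computing for each node $i$ of each layer $l$ the output $y_i = \varphi(net_i^{(l)})$ with $net_i^{(l)} = \sum_j w_{ij}^{(l)} x_j + b_i^{(l)}$.
• Adaptation of weights and biases:
$$ \Delta w_{ij}^{(l)}(n) = \mu\, \delta_i^{(l)}(n)\, x_j(n), \qquad \Delta b_i^{(l)}(n) = \mu\, \delta_i^{(l)}(n) $$
where
$$ \delta_i^{(l)}(n) = \begin{cases} \varphi'\big(net_i^{(l)}(n)\big)\,\big[d_i(n) - y_i(n)\big], & l = M \\ \varphi'\big(net_i^{(l)}(n)\big) \sum_k w_{ki}^{(l+1)}(n)\, \delta_k^{(l+1)}(n), & 1 \le l < M \end{cases} $$
in which $x_j(n)$ is the output of node $j$ at iteration $n$, $l$ is the layer index, $k$ indexes the nodes of the following layer (the output nodes of the network when $l = M - 1$), $M$ is the output layer, and $\varphi$ is the activation function [21]. Sigmoid and hyperbolic tangent activation functions are reported to be more effective than other activation functions [22]. The learning rate is denoted by $\mu$. Note that a large learning rate may lead to faster convergence but may also cause oscillation.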
A radial basis function (RBF) neural network is trained by supervised learning. RBFs are embedded in a two-layer neural network in which each hidden unit implements a radially activated function. The output layer computes a weighted sum of the hidden-unit outputs. All hidden units simultaneously receive the n-dimensional real-valued input vector $X$. The output of the jth hidden unit, $z_j$, is determined by the closeness of the input $X$ to an n-dimensional parameter vector $\mu_j$ associated with that unit. The response characteristic of the jth hidden unit ($j = 1, 2, \dots, J$) is assumed to be
$$ z_j = K\!\left( \frac{\lVert X - \mu_j \rVert}{\sigma_j^2} \right) $$
where $K$ is a strictly positive, radially symmetric function (kernel) with a unique maximum at its "centre" $\mu_j$ that drops off rapidly to zero away from the centre, and $\sigma_j$ is the width of the receptive field in the input space for unit $j$ [23]. This means that $z_j$ has an appreciable value only when the distance $\lVert X - \mu_j \rVert$ is smaller than the width $\sigma_j$. Given an input vector $X$, the output of the RBF network is the L-dimensional activity vector $Y$, whose lth component ($l = 1, 2, \dots, L$) is given by
$$ y_l(X) = \sum_{j=1}^{J} w_{lj}\, z_j(X) $$
The support vector machine (SVM) is used to solve binary classification problems in a supervised manner. The learning problem is formulated as a quadratic optimization problem whose error surface is free of local minima and has a global optimum [24]. The SVM builds an optimal separating hyperplane such that the margin of separation between the two classes is maximized. It achieves this desirable property on the basis of the principle of structural risk minimization. To construct an SVM classifier for linearly separable patterns, consider a training set $\{(x_j, y_j)\}$, $j = 1, \dots, N$, where $x_j$ is the n-dimensional input feature vector and $y_j$ is the desired (target) output. The input patterns with desired output $y_j = 1$ constitute the positive group and those with $y_j = -1$ constitute the negative group [25]. Now suppose we have a machine whose task is to learn the mapping $x_j \mapsto y_j$. The machine is defined by a set of possible mappings $x \mapsto f(x, \alpha)$, where the functions $f(x, \alpha)$ are labeled by the adjustable parameters $\alpha$. The machine is assumed to be deterministic: for a given input $x$ and choice of $\alpha$, it always gives the same output $f(x, \alpha)$. A particular choice of $\alpha$ yields what we call a "trained machine." Thus, for example, an ANN with a fixed structure, with $\alpha$ corresponding to the weights and biases, is a learning machine in this sense. The expectation of the test error for a trained machine is therefore
$$ R(\alpha) = \int \tfrac{1}{2}\, \lvert y - f(x, \alpha) \rvert \, dP(x, y) $$
where $P(x, y)$ is the (unknown) probability distribution from which the data are drawn. Note that, when a density $p(x, y)$ exists, $dP(x, y)$ can be written $p(x, y)\,dx\,dy$. This is a convenient way of writing the true mean error, but unless we have an estimate of $P(x, y)$, it is not very useful. The quantity $R(\alpha)$ is called the expected risk, or just the risk. Here we call it the actual risk, to emphasize that it is the quantity we are interested in. The "empirical risk" $R_{emp}(\alpha)$ is defined as the measured mean error rate on the training set (for a fixed, finite number of observations $N$):
$$ R_{emp}(\alpha) = \frac{1}{2N} \sum_{i=1}^{N} \lvert y_i - f(x_i, \alpha) \rvert $$
Note that no probability distribution appears here. $R_{emp}(\alpha)$ is a fixed number for a particular choice of $\alpha$ and a particular training set $\{(x_i, y_i)\}$. The quantity $\tfrac{1}{2} \lvert y_i - f(x_i, \alpha) \rvert$ is called the loss. For the case described here, it can only take the values 0 and 1. Now choose some $\eta$ such that $0 \le \eta \le 1$. Then, for losses taking these values, the following bound holds with probability $1 - \eta$ [26]:
$$ R(\alpha) \le R_{emp}(\alpha) + \sqrt{ \frac{ h \left( \log(2N/h) + 1 \right) - \log(\eta/4) }{ N } } $$
where $h$ is a non-negative integer called the Vapnik-Chervonenkis (VC) dimension, a measure of the capacity of the set of functions $f(x, \alpha)$. In the following, the right-hand side of this inequality is called the risk bound [27].
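To make the risk bound concrete, the following minimal Python sketch evaluates the empirical risk and the bound for given values. It assumes the standard form of the VC bound for 0/1 loss with labels in {-1, +1}, namely empirical risk plus sqrt((h(ln(2N/h) + 1) - ln(eta/4)) / N). The function names `empirical_risk`, `vc_confidence`, and `risk_bound` are illustrative and do not come from the cited references.

```python
import math


def empirical_risk(y_true, y_pred):
    """Mean of the loss 0.5 * |y - f(x)| over the training set (labels are -1 or +1)."""
    n = len(y_true)
    return sum(0.5 * abs(y - f) for y, f in zip(y_true, y_pred)) / n


def vc_confidence(h, n, eta):
    """Second term of the bound: sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n).

    h   : VC dimension (taken as a positive integer so that ln(2n/h) is defined)
    n   : number of training observations
    eta : the bound holds with probability 1 - eta, with 0 < eta <= 1
    """
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def risk_bound(y_true, y_pred, h, eta):
    """Upper bound on the actual risk: empirical risk plus the VC confidence term."""
    return empirical_risk(y_true, y_pred) + vc_confidence(h, len(y_true), eta)


if __name__ == "__main__":
    y_true = [1, -1, 1, 1, -1, -1, 1, -1]
    y_pred = [1, -1, -1, 1, -1, 1, 1, -1]  # two of the eight labels are wrong
    print("empirical risk:", empirical_risk(y_true, y_pred))
    print("risk bound (h=2, eta=0.05):", risk_bound(y_true, y_pred, h=2, eta=0.05))
```

For a fixed VC dimension, the confidence term shrinks as the number of observations grows, which is why the bound becomes informative only when the training set is large relative to the capacity of the machine.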

Fuzzy Systems
A fuzzy system is a knowledge-based model built on linguistic rules. Fuzzy sets are defined for all input and output variables, together with the set of rules. Fuzzy logic provides the means to process this knowledge and to compute output values for given input data. The main problem of this approach is to find a suitable set of linguistic rules that describes the system to be modeled [28]. Fuzzy systems are represented as if-then rules, or fuzzy conditional statements, of the form IF A THEN B, where A and B are labels of fuzzy sets. The rule set should be complete and provide an answer for every input value.
A fuzzy system operates in three steps: fuzzification, fuzzy inference, and defuzzification. The fuzzification module pre-processes the input values submitted to the fuzzy expert system. Fuzzification is the transformation of numerical variables into linguistic variables and the corresponding assignment of a grade of membership (between 0 and 1) to the different membership functions [29]. The inference engine uses the results of the fuzzification module and accesses the fuzzy rules in the fuzzy rule base to infer which intermediate and output values to produce. The linguistic combination of the features is carried out in the fuzzy inference system (FIS). There are two types of FIS model: the Mamdani model and the Sugeno model. Only the Mamdani model is described here. The rules are derived from human knowledge and have the form: if condition, then conclusion. The degree to which each part of the condition is fulfilled for each rule is given by the corresponding grades of membership. The final output of the fuzzy system is provided by the defuzzification module. The fuzzy values are converted back into a single output value by calculating the centre of gravity of the resulting areas, i.e., the centroid technique is used for defuzzification. The centroid of the composed shape is computed by
$$ z^{*} = \frac{\int \mu_c(z)\, z \, dz}{\int \mu_c(z)\, dz} $$
where $z$ is the consequent variable and $\mu_c(z)$ is the membership function of the composed shape [30].
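The centroid step can be illustrated with a short Python sketch. It approximates the centre of gravity, i.e. the integral of mu_c(z) * z divided by the integral of mu_c(z), on a uniform grid. The triangular membership functions, the two rule firing strengths (0.3 and 0.7), and the names `triangular`, `composed_output`, and `centroid_defuzzify` are assumptions made for illustration only. The composed shape is built in the Mamdani style by clipping each consequent set at its rule's firing strength and taking the maximum.

```python
def triangular(z, a, b, c):
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    if z <= a or z >= c:
        return 0.0
    if z <= b:
        return (z - a) / (b - a)
    return (c - z) / (c - b)


def composed_output(z):
    """Hypothetical composed shape from two rules with firing strengths 0.3 and 0.7."""
    low = min(0.3, triangular(z, 0.0, 25.0, 50.0))     # consequent "low", clipped at 0.3
    high = min(0.7, triangular(z, 50.0, 75.0, 100.0))  # consequent "high", clipped at 0.7
    return max(low, high)


def centroid_defuzzify(membership, z_min, z_max, steps=1000):
    """Approximate the centroid of a membership function on [z_min, z_max]."""
    dz = (z_max - z_min) / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps + 1):
        z = z_min + k * dz
        mu = membership(z)
        numerator += mu * z
        denominator += mu
    if denominator == 0.0:
        raise ValueError("membership function is zero everywhere on the grid")
    return numerator / denominator


if __name__ == "__main__":
    print("crisp output:", centroid_defuzzify(composed_output, 0.0, 100.0))
```

Because the rule with the larger firing strength contributes the larger area, the crisp output is pulled towards the "high" set while still reflecting the weaker "low" rule.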
The fuzzy c-means (FCM) clustering algorithm is often used as an initial step in fuzzy systems to find the membership value of each training data vector in each cluster. These membership values are assumed to represent the best partition of the given data set [31]. Formally, clustering an unlabeled data set $X = \{x_1, x_2, \dots, x_N\} \subset \mathbb{R}^h$, where $N$ is the number of data vectors and $h$ is the dimension of each data vector, is the assignment of $c$ partition labels to the vectors in $X$. A c-partition of $X$ consists of $c \times N$ membership values $\{u_{ik}\}$ that can be arranged as a $(c \times N)$ matrix $U = [u_{ik}]$. The problem of fuzzy clustering is to find the optimum membership matrix $U$. The most widely used objective function for fuzzy clustering is the weighted within-groups sum of squared errors $J_m$, which defines the following constrained optimization problem [32]:
$$ \min_{(U, V)} J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} (u_{ik})^m \, \lVert x_k - v_i \rVert_A^2, \qquad \text{subject to } \sum_{i=1}^{c} u_{ik} = 1 \ \ \forall k, \quad 0 \le u_{ik} \le 1 $$
where $V = \{v_1, v_2, \dots, v_c\}$ is the vector of (unknown) cluster centers and $\lVert x \rVert_A = (x^T A x)^{1/2}$ is an inner product norm. $A$ is an $h \times h$ positive definite matrix, which specifies the shape of the clusters. The matrix $A$ is generally chosen as the identity matrix, leading to the Euclidean distance and, consequently, to spherical clusters. Fuzzy partitions are obtained with the FCM algorithm through an iterative optimization of $J_m$ in the following steps [33]:
• Choose the number of clusters ($c$), the weighting exponent ($m$), the iteration limit (iter), the termination criterion ($\varepsilon > 0$), and the norm for the error $\lVert V_t - V_{t-1} \rVert$.
• Guess the initial positions of the cluster centers: $V_0 = \{v_{1,0}, v_{2,0}, \dots, v_{c,0}\}$.
• Iterate for $t = 1, \dots, \text{iter}$: calculate
$$ u_{ik,t} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_{i,t-1} \rVert_A}{\lVert x_k - v_{j,t-1} \rVert_A} \right)^{2/(m-1)} \right]^{-1} \qquad \text{and} \qquad v_{i,t} = \frac{\sum_{k=1}^{N} (u_{ik,t})^m \, x_k}{\sum_{k=1}^{N} (u_{ik,t})^m} $$
• IF error $= \lVert V_t - V_{t-1} \rVert \le \varepsilon$, THEN stop and put $(U_f, V_f) = (U_t, V_t)$; otherwise continue with the NEXT $t$.
There are special methods for finding the optimum number of clusters, such as the fuzzy function cluster validity index [34].
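The FCM iteration described above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes the Euclidean norm (A equal to the identity matrix), picks the initial centres from the data in a simple deterministic way, and measures the error as the largest displacement of any centre. The function names `fuzzy_c_means` and `_dist` are hypothetical.

```python
import math


def _dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data, c, m=2.0, max_iter=100, eps=1e-6):
    """Fuzzy c-means with the Euclidean norm.

    data : list of N points, each a list of h floats
    c    : number of clusters
    m    : weighting exponent (> 1)
    Returns (U, V): U[i][k] is the membership of point k in cluster i,
    and V[i] is the centre of cluster i.
    """
    n = len(data)
    h = len(data[0])
    # Assumption: initial centres are c data points spread evenly over the data order.
    centres = [list(data[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update from the current centres.
        for k, x in enumerate(data):
            # Distances are floored at a tiny value to avoid division by zero.
            d = [max(_dist(x, v), 1e-12) for v in centres]
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Centre update from the new memberships.
        new_centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            new_centres.append(
                [sum(w[k] * data[k][j] for k in range(n)) / total for j in range(h)]
            )
        # Error: largest centre displacement (one possible choice of norm).
        error = max(_dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if error <= eps:
            break
    return u, centres


if __name__ == "__main__":
    points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
    memberships, centres = fuzzy_c_means(points, c=2)
    print("centres:", centres)
```

Each column of the returned membership matrix sums to one, so a point lying between two clusters is shared between them instead of being forced into a single cluster.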

Statistical Model Algorithms
Statistical classification algorithms, such as K-nearest neighbors and Naïve Bayes, can also be used to solve bioinformatics problems.
K-nearest neighbors (K-NN) is a simple non-parametric algorithm that classifies cases based on their similarity to other cases. Similar cases are near each other and dissimilar cases are distant from each other. Thus, the distance between two cases is a measure of their dissimilarity. Training a nearest neighbor model involves computing the distances between cases based on their values in the feature set. The nearest neighbors of a given case are the cases with the smallest distances from it. The distance is calculated using one of the following measures [35]:
$$ \text{Euclidean: } d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad \text{city-block: } d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert $$
A Naïve Bayes classifier is a simple but effective probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model". Depending on the precise nature of the probability model, Naïve Bayes classifiers can be trained efficiently in a supervised learning setting. In many applications, parameter estimation for Naïve Bayes uses the method of maximum likelihood. In spite of their naive design and apparently over-simplified assumptions, Naïve Bayes classifiers often work much better in complex real-world situations than one might expect [36]. Note that the Naïve Bayes classifier assumes the conditional independence of features. This assumption does not hold in most cases. Despite this apparent violation, the Naïve Bayes classifier performs well on various natural language processing tasks. An advantage of the Naïve Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification [37]. The Naïve Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori decision rule. The corresponding classifier is the function classify, defined as follows [36]:
$$ \text{classify}(f_1, \dots, f_n) = \arg\max_{c} \; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c) $$
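The nearest-neighbor rule described above can be sketched in a few lines of Python. The sketch assumes the Euclidean and city-block (Manhattan) distances as the two distance measures and a simple majority vote among the k nearest cases. The function names `euclidean`, `city_block`, and `knn_classify`, and the toy data, are illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City-block (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train_x, train_y, query, k=3, distance=euclidean):
    """Return the majority label among the k training cases nearest to query.

    Ties in the vote are resolved in favour of the label seen first
    among the nearest cases.
    """
    order = sorted(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    cases = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(cases, labels, [0.5, 0.5]))
    print(knn_classify(cases, labels, [5.5, 5.5], distance=city_block))
```

There is no explicit training phase in this sketch: all the work happens at query time, which is why K-NN becomes expensive when the number of cases or features is large.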

Metaheuristic and Swarm Intelligence Algorithms
Recently, well-known modern heuristic algorithms such as the genetic algorithm (GA), differential evolution (DE), particle swarm optimization (PSO), artificial bee colony (ABC), and ant colony optimization (ACO) have been applied to bioinformatics problems.
Genetic algorithms (GAs) are good candidates for such problems, since they are most useful in multiclass, high-dimensional problems where heuristic knowledge is sparse or incomplete. Holland [38] defined a methodology for studying natural adaptive systems and designing artificial adaptive systems. It is now often used as an optimization technique, based on an analogy with the process of natural selection in biology. A GA requires a population of chromosomes, each representing a combination of features from the solution set, and a cost function (called an evaluation or fitness function) that computes the fitness of each chromosome. The algorithm manipulates a finite set of chromosomes (the population), based loosely on the mechanism of evolution. In each generation, chromosomes are subjected to certain operators, such as crossover, inversion, and mutation, which are analogous to processes that occur in natural reproduction. Crossover of two chromosomes produces a pair of offspring chromosomes that combine the traits of their parents [39]. The basic genetic algorithm consists of the following steps:
1. Generate a random population of n chromosomes.
2. Evaluate the fitness f(x) of each chromosome x in the population.
3. Generate a new population by repeating the following steps until the new population is complete:
• Select two parent chromosomes from the population according to their fitness.
• With a crossover probability, cross over the parents to form new offspring. If no crossover is performed, the offspring are exact copies of the parents.
• With a mutation probability, mutate the new offspring at each locus.
• Place the new offspring in the new population.
4. Use the newly generated population for a further run of the algorithm.
5. If the end condition is satisfied, stop and return the best solution in the current population.
6. Go to step 2.
Differential evolution (DE) is a population-based search strategy very similar to standard evolutionary algorithms. The main difference is in the reproduction step, where an offspring is created from three parents using an arithmetic crossover operator. DE is defined for floating-point representations of individuals. DE does not use a mutation operator that depends on some probability distribution function, but introduces a new arithmetic operator that depends on the differences between randomly selected pairs of individuals [40]. For each parent $x_i(t)$ of generation $t$, an offspring $x'_i(t)$ is generated as follows. Randomly select three individuals from the current population, namely $x_{i_1}(t)$, $x_{i_2}(t)$, and $x_{i_3}(t)$, with $i_1 \ne i_2 \ne i_3 \ne i$ and $i_1, i_2, i_3 \sim U(1, \dots, s)$, where $s$ is the population size. Select a random number $r \sim U(1, \dots, N_d)$, where $N_d$ is the number of genes of a single chromosome. Then, for all genes $j = 1, \dots, N_d$, if $U(0, 1) < P_r$, or if $j = r$, let
$$ x'_{i,j}(t) = x_{i_3,j}(t) + \gamma \left( x_{i_1,j}(t) - x_{i_2,j}(t) \right) $$
and otherwise let $x'_{i,j}(t) = x_{i,j}(t)$. Here, $P_r$ is the probability of reproduction (with $P_r \in [0, 1]$), $\gamma$ is a scaling factor with $\gamma \in (0, \infty)$, and $x'_{i,j}(t)$ and $x_{i,j}(t)$ denote the jth gene (or parameter) of the offspring and the parent, respectively. Thus, each offspring gene is a linear combination of three randomly chosen individuals when $U(0, 1) < P_r$; otherwise the offspring inherits the gene directly from the parent. Even when $P_r = 0$, at least one of the parameters of the offspring will differ from the parent, because gene $r$ is always recombined [41].
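The DE reproduction step described above can be sketched in Python as follows. The sketch makes two assumptions that are not stated in the text: the offspring replaces its parent only if it has better (lower) fitness, as in standard DE, and the population is updated in place within a generation. The default values for `pop_size`, `p_r`, `gamma`, and `generations`, the sphere test function, and the function name `differential_evolution` are illustrative.

```python
import random


def differential_evolution(fitness, bounds, pop_size=20, p_r=0.5, gamma=0.7,
                           generations=100, seed=0):
    """Minimise fitness using the DE reproduction step.

    bounds : list of (low, high) pairs, one per gene, used for initialisation
    p_r    : probability of reproduction, in [0, 1]
    gamma  : scaling factor, > 0
    Returns (best_individual, best_fitness, initial_best_fitness).
    """
    rng = random.Random(seed)
    n_d = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    initial_best = min(fit)
    for _ in range(generations):
        for i in range(pop_size):
            # Three mutually distinct individuals, all different from parent i.
            i1, i2, i3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            r = rng.randrange(n_d)  # gene that is always recombined
            child = list(pop[i])
            for j in range(n_d):
                if rng.random() < p_r or j == r:
                    child[j] = pop[i3][j] + gamma * (pop[i1][j] - pop[i2][j])
            # Assumption: greedy selection, the offspring replaces the parent
            # only if it is better.
            child_fit = fitness(child)
            if child_fit < fit[i]:
                pop[i], fit[i] = child, child_fit
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best], initial_best


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_fit, start_fit = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print("initial best:", start_fit, "final best:", best_fit)
```

Because of the greedy selection, the best fitness in the population can never get worse from one generation to the next.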
Particle swarm optimization (PSO) is an optimization technique developed by Kennedy and Eberhart, inspired by the social behavior of swarms such as bird flocks and fish schools [42]. In PSO, each potential solution is referred to as a particle, and each particle has a position ($x_{i,j}$) and a velocity ($v_{i,j}$) in a j-dimensional feature space [43]. The solution set, which consists of the particles, is called the swarm. At the beginning of the algorithm, each particle is generated by taking random values from the solution space. The success of each particle is determined using a fitness function. Throughout the iteration process, the best position found by each particle and by the swarm are kept as the local bests ($Pbest_{i,j}$) and the global best ($Gbest_{i,j}$), respectively. The velocity and position of each particle are updated using the following equations [44]:
$$ v_{i,j}(t+1) = w\, v_{i,j}(t) + c_1 R_1 \left( Pbest_{i,j} - x_{i,j}(t) \right) + c_2 R_2 \left( Gbest_{i,j} - x_{i,j}(t) \right) $$
$$ x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) $$
where $i$ is the index of the particle, $j$ is the index of the position within the particle, $t$ is the iteration number, $v_{i,j}(t)$ is the velocity of the ith particle of the swarm at the jth position index, and $x_{i,j}(t)$ is the corresponding position. $R_1$ and $R_2$ are random numbers uniformly distributed between 0 and 1, $c_1$ and $c_2$ are the acceleration coefficients, whose default values are 2, and $w$ is the inertia weight. The original procedure for implementing PSO is as follows [45]:
1. Generate each particle randomly within the j-dimensional feature space.
2. Evaluate the success of each particle using the fitness function.
3. If the success of the current particle is better than that of $Pbest_{i,j}$, set $Pbest_{i,j}$ to the current particle.
4. If the success of the current particle is better than that of $Gbest_{i,j}$, set $Gbest_{i,j}$ to the current particle.
5. Update the velocity and position of the particle using the equations given above.
6. Repeat steps 2 to 5 until the stopping criterion or the maximum number of iterations is reached.
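The PSO procedure can be sketched in Python as follows. The sketch minimises a fitness function and uses the basic update in which the new velocity is the inertia term plus two random attractions, one towards the particle's own best position and one towards the swarm's best position. The acceleration coefficients default to 2 as stated in the text. The inertia weight of 0.7, the swarm size, the iteration count, and the names `pso` and `sphere` are assumptions made for illustration.

```python
import random


def pso(fitness, bounds, n_particles=20, iterations=100, w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimise fitness with the basic PSO velocity and position updates.

    Returns (gbest_position, gbest_value, history), where history[t] is the
    global best value after iteration t (history[0] is the initial swarm).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    # Steps 2-4 for the initial swarm: evaluate and record the bests.
    pbest = [list(p) for p in x]
    pbest_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            # Step 5: velocity and position update for every dimension.
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (pbest[i][j] - x[i][j])
                           + c2 * r2 * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
            # Steps 2-4: evaluate and update the local and global bests.
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(x[i]), val
            if val < gbest_val:
                gbest, gbest_val = list(x[i]), val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, value, history = pso(sphere, [(-5.0, 5.0)] * 2)
    print("initial best:", history[0], "final best:", value)
```

The recorded history is non-increasing by construction, since the global best is replaced only when a particle finds a strictly better position.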
The artificial bee colony (ABC) algorithm was introduced by Karaboga for optimizing numerical problems. The algorithm simulates the intelligent foraging behavior of honey bee swarms. It is a very simple yet efficient and robust population-based stochastic optimization algorithm. In the ABC algorithm, the colony of artificial bees contains three groups of bees: employed bees, onlookers, and scouts. A bee waiting in the dance area to decide which food source to choose is called an onlooker, and a bee going to a food source it has visited before is called an employed bee. The third kind of bee, the scout, carries out a random search to discover new sources. Pseudo-code of the ABC algorithm is given in [46].
In ant colony optimization (ACO), agents mimic the foraging behavior of their biological counterparts in finding the shortest path to a food source. The first algorithm following the principles of the ACO metaheuristic is the Ant System, in which ants iteratively construct solutions and add pheromone to the paths corresponding to these solutions [47][48]. Path selection is a stochastic procedure based on two parameters, the pheromone value and the heuristic value. The pheromone value gives an indication of the number of ants that have selected the trail recently, while the heuristic value is a problem-dependent quality measure. When an ant arrives at a decision point, it is more likely to select the trail with the higher pheromone and heuristic values. Once the ant reaches its destination, the solution corresponding to the path it followed is evaluated and the pheromone value of the path is increased accordingly. In addition, evaporation causes the pheromone level of all trails to diminish gradually. Therefore, trails that are not reinforced gradually lose pheromone and will in turn have a lower probability of being chosen by subsequent ants. An ACO algorithm requires the specification of the following aspects [49]:
• An environment that represents the problem domain in such a way that it lends itself to incrementally building a solution to the problem.
• A problem-dependent heuristic evaluation function, which provides a quality measure for the different solution components.
• A pheromone updating rule, which takes into account the evaporation and reinforcement of the trails.
• A probabilistic transition rule, based on the value of the heuristic function and on the strength of the pheromone trail, that determines the path taken by the ants.
• A clear specification of when the algorithm converges to a solution.
Other algorithms are also used to solve bioinformatics problems, such as hidden Markov models, decision trees, stochastic optimization, dynamic programming, stochastic context-free grammars, multiple kernel learning, max-margin structured output learning, the Needleman-Wunsch algorithm, instance-based learning, case-based reasoning, self-organizing feature maps (SOM), principal component analysis, and independent component analysis. Some types of inductive learning, evolutionary programming, and combined (hybrid) models can be used to solve the same problems.

SCM Applications in Sequence Alignment
Sequence alignment is a common task in bioinformatics. It plays an essential role in detecting regions of significant similarity among a collection of primary sequences of nucleic acids or proteins. Sequences that are highly similar are likely to have similar 3D structures or to share similar functions. Formally, the problem is stated for a family S = (S1, ..., SN) of N sequences, each of which has its own length. The characters of the sequences are defined over an alphabet Σ that includes a gap symbol denoted by "-", which corresponds to an indel (insertion or deletion) in molecular biology terms. Indels indicate that some parts of a sequence have been inserted or deleted. A sequence is a DNA, ribonucleic acid (RNA), or amino acid (protein) sequence. The nucleotide bases are adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). The alphabet is {A, C, G, T} for DNA and {A, C, G, U} for RNA. There are two computational approaches to the sequence alignment problem: local alignment and global alignment. Global alignment uses the Needleman-Wunsch algorithm, and local alignment uses the Smith-Waterman algorithm. In global alignment, sequences are aligned as a whole, whereas in local alignment, similarities detected locally between sequences are aligned [50]. Assume that two DNA sequences are given as S1 = {GCTGAACG} and S2 = {CTATAATC}, with lengths |S1| and |S2|, respectively. This pair of sequences can be aligned as shown in Figure 2.

An alignment without gaps:
GCTGAACG
CTATAATC
An alignment with gaps:
GCTGA--A--CG
--CT--ATAATC
A gap is a sequence of g missing characters inserted into a string to achieve alignment. Gaps are assigned two kinds of negative scores:
• Gap-open penalty: a negative score associated with the initiation of a gap (i.e., with the first missing character).
• Gap-extension penalty: a negative score associated with each additional missing character.
For reasons of computational complexity, sequence alignment is divided into two categories:
• Pairwise alignment (i.e., the alignment of two sequences).
• Multiple-sequence alignment (i.e., the alignment of three or more sequences).
Pairwise alignment problems can be solved exactly by dynamic programming. Multiple-sequence alignment problems are solved approximately, by heuristics. The sum-of-pairs function is the most popular scoring method for evaluating the quality of an alignment, and the goal of general multiple-sequence alignment algorithms is to find the alignment with the highest sum-of-pairs score [50]. Numerous methods for sequence alignment exist, and the efficiency of an alignment can be assessed by applying SCMs. Table 1 summarizes some applications of sequence alignment together with the SCMs used, in chronological order (from 1993 to 2011). In this table, the first column lists the authors and the second column lists the soft computing methods that were used. According to Table 1, metaheuristic and swarm intelligence algorithms appear to be more useful than the other soft computing algorithms for sequence alignment.
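The two scoring ideas in this section can be sketched in a few lines of Python. The sketch below computes the optimal global (Needleman-Wunsch style) score of a pair of sequences by dynamic programming, and the sum-of-pairs score of a given multiple alignment. The function names, the linear gap penalty, and the score values are assumptions made for illustration; published methods typically use substitution matrices and affine gap penalties instead.

```python
from itertools import combinations


def needleman_wunsch_score(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of s and t (linear gap penalty, assumed scores)."""
    # prev[j] holds the best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length rows."""
    total = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # a gap aligned with a gap is not scored
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


print(needleman_wunsch_score("GCTGAACG", "CTATAATC"))
print(sum_of_pairs(["ACG", "A-G", "ACG"]))
```

The pairwise routine fills a table of size (|S1| + 1) × (|S2| + 1), which is why it is exact but does not scale to many sequences; a heuristic search such as a metaheuristic would instead propose candidate multiple alignments and use a function like `sum_of_pairs` as its objective.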

SCM applications to the single nucleotide polymorphism problem
Single nucleotide polymorphisms (SNPs) are sequence variations between individuals at a particular point in the genome. Genetic variants mostly consist of SNPs, and the human genome is estimated to include around 10 million SNPs [77][78]. Most genome-wide association (GWA) studies aim to identify genetic variants possibly related to complex diseases. Since SNPs are single-base-pair changes, the smallest unit of genetic variation, they are present in very small segments of DNA and are more likely to survive severe environmental degradation than any other form of genetic variation. For this reason, SNPs are generally preferred in GWA studies that use soft computing methods [79]. The number of individuals and the number of SNPs strongly affect the statistical significance of a GWA study. However, in large-scale GWA studies it is still very expensive and time-consuming to genotype all the SNPs found in the candidate region across a large population [80].
The set of SNPs found on a single chromosome is called a haplotype. High-throughput methods are not capable of distinguishing the source chromosome of each allele; such methods usually determine only the two alleles present at a SNP position. This combined allele information at a locus, in which the source chromosomes of the alleles are not identified, is called the genotype. The state of an individual, such as being healthy or being a patient, is called the individual's phenotype. Figure 3 illustrates haplotype, genotype, and phenotype [81]. Different SCMs have been applied to SNPs in recent years. These methods are based on the fact that the human genome can be divided into discrete blocks, and that small sets of common haplotypes in each block are shared by a specific population. Table 2 summarizes some applications to SNPs together with the SCMs used, in chronological order. According to Table 2, Artificial Neural Network (ANN) algorithms appear to be more useful than the other soft computing techniques for the single nucleotide polymorphism problem.
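The relationship between haplotypes and genotypes can be made concrete with a small Python sketch. The function names and the three-SNP example are hypothetical; the point is only that a genotype records an unordered pair of alleles per SNP, so several haplotype pairs can explain the same genotype.

```python
from itertools import product


def genotype(hap1, hap2):
    """Unordered allele pair at each SNP; the source chromosome is not recorded."""
    return [tuple(sorted(pair)) for pair in zip(hap1, hap2)]


def compatible_haplotype_pairs(geno):
    """Enumerate the unordered haplotype pairs that are consistent with a genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(geno)):
        # At each SNP, one allele goes to the first haplotype and the other to the second.
        h1 = "".join(site[c] for site, c in zip(geno, choice))
        h2 = "".join(site[1 - c] for site, c in zip(geno, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


g = genotype("ACG", "ATA")   # hypothetical haplotypes over three SNPs
print(g)
print(compatible_haplotype_pairs(g))
```

In this example two SNPs are heterozygous, so two different haplotype pairs are compatible with the observed genotype. The number of candidates doubles with each additional heterozygous SNP, which is one reason haplotype inference is treated as a search problem.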

CONCLUSIONS
This review suggests that there is wide scope for developing new methods that share the advantages of SCM analysis in implementing a parsimonious approach to detecting the patterns of multi-marker genotypes that can be observed when an associated susceptibility locus is present. Given how much must be invested in collecting samples of cases and controls and in obtaining genotypes, it seems sensible to argue that considerable effort should be expended on ensuring that the methods of analysis applied to the resulting data are as effective as possible.
We advise close collaboration between biologists and bioinformaticians to make available user-friendly software packages that can be used jointly by researchers with expertise in experimental biology and researchers with expertise in computer science, because soft computing methods may not be interpretable by a biologist. For example, a question that requires multiple processors to answer might need the assistance of someone with expertise in parallel computing. To be intuitive to a biologist, the software needs to be easy to use and needs to provide output that is visual and easy to navigate. As a result, if biologists want to solve bioinformatics problems more easily, they have to collaborate with computer scientists who are experienced in soft computing methods. Such collaboration makes it possible to solve complex bioinformatics problems. Future bioinformatics databases and analysis tools that successfully integrate with these collaborations will prove to be the most useful for biological and biomedical discovery.
• Euclidean Distance
• Minkowski Distance
• Mahalanobis Distance
The simple K-NN algorithm consists of the following steps:
• For each training example <x, f(x)>, add the example to the list of training examples.
• Given a query instance xq to be classified, let x1, x2, ..., xk denote the k instances from the training examples that are nearest to xq.
• Return the class that is most common among these k instances.
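The K-NN steps above can be sketched in Python as follows. The function names and the toy training set are assumptions made for illustration. Only the Minkowski distance is implemented (with p = 2 it reduces to the Euclidean distance); the Mahalanobis distance is omitted because it additionally needs a covariance matrix estimated from the data.

```python
from collections import Counter


def minkowski(x, y, p=2):
    """Minkowski distance between two vectors; p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_classify(training, query, k=3, p=2):
    """Return the most common class among the k training examples nearest to query.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda ex: minkowski(ex[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(training, (0.5, 0.5)))
print(knn_classify(training, (5.5, 5.5)))
```

Sorting the whole training list is the simplest choice for a sketch; for large data sets a partial selection or a spatial index would normally be preferred.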
• Load training samples.
• Generate the initial population $z_i$ $(i = 1, \ldots, SN)$.
• Evaluate the fitness $f_i$ of the population.
• Set cycle to 1.
• Repeat:
  • For each employed bee: produce a new solution $v_i$ by using $v_{ij} = z_{ij} + \phi_{ij}(z_{ij} - z_{kj})$, compute the value $f_i$, and apply the greedy selection process.
  • Compute the probability values $p_i$ for the solutions $z_i$.
  • For each onlooker bee: select a solution $z_i$ depending on $p_i$, produce a new solution $v_i$, calculate the value $f_i$, and apply the greedy selection process.
  • If there is an abandoned solution for the scout, replace it with a new randomly produced solution.
  • Memorize the best solution found so far.
  • cycle = cycle + 1.
• Until cycle = MCN.
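The bee-based steps above follow the pattern of an artificial bee colony (ABC) style search, and a minimal Python sketch of that pattern is given here. Several details are assumptions, because they are not stated in the text: the selection probability is taken proportional to a fitness of 1/(1 + cost), scouts draw new solutions uniformly within fixed bounds, a solution is abandoned after `limit` unsuccessful trials, and a toy sphere function replaces the training samples. The function and parameter names are hypothetical.

```python
import random


def abc_minimize(f, dim, lower, upper, sn=10, limit=20, mcn=200, seed=0):
    """Minimize f over [lower, upper]^dim with a basic bee-colony style search."""
    rng = random.Random(seed)

    def random_solution():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(cost):
        # Assumed mapping from cost to fitness (higher is better).
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    z = [random_solution() for _ in range(sn)]       # initial population
    cost = [f(s) for s in z]
    trials = [0] * sn                                # unsuccessful tries per solution
    i0 = min(range(sn), key=lambda i: cost[i])
    best, best_cost = list(z[i0]), cost[i0]

    def try_neighbour(i):
        # v_ij = z_ij + phi_ij * (z_ij - z_kj), one random dimension j, partner k != i
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.randrange(dim)
        v = list(z[i])
        v[j] = z[i][j] + rng.uniform(-1.0, 1.0) * (z[i][j] - z[k][j])
        v[j] = min(max(v[j], lower), upper)          # keep the solution inside the bounds
        c = f(v)
        if c < cost[i]:                              # greedy selection
            z[i], cost[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(sn):                          # employed bees
            try_neighbour(i)
        weights = [fitness(c) for c in cost]         # selection weights (p_i up to scaling)
        for i in rng.choices(range(sn), weights=weights, k=sn):   # onlooker bees
            try_neighbour(i)
        worst = max(range(sn), key=lambda i: trials[i])           # scout
        if trials[worst] > limit:
            z[worst] = random_solution()
            cost[worst] = f(z[worst])
            trials[worst] = 0
        ib = min(range(sn), key=lambda i: cost[i])   # memorize the best solution so far
        if cost[ib] < best_cost:
            best, best_cost = list(z[ib]), cost[ib]
    return best, best_cost


def sphere(x):
    """Toy objective standing in for a real fitness evaluation."""
    return sum(v * v for v in x)


best, best_cost = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_cost)
```

In a bioinformatics setting the objective `f` would be replaced by a problem-specific measure, for example a classification error on the training samples or the negative sum-of-pairs score of a candidate alignment.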

Table 1. Soft computing methods applied to the sequence alignment problem

Table 2. Soft computing methods applied to the single nucleotide polymorphism problem