Next Generation Sequencing Based Forward Genetic Approaches for Identification and Mapping of Causal Mutations in Crop Plants: A Comprehensive Review

The recent advancements in forward genetics have expanded the applications of mutation techniques in advanced genetics and genomics, ahead of direct use in breeding programs. The advent of next-generation sequencing (NGS) has enabled easy identification and mapping of causal mutations within a short period and at relatively low cost. Identifying the genetic mutations and genes that underlie phenotypic changes is essential for understanding a wide variety of biological functions. To accelerate the mutation mapping for crop improvement, several high-throughput and novel NGS based forward genetic approaches have been developed and applied in various crops. These techniques are highly efficient in crop plants, as it is relatively easy to grow and screen thousands of individuals. These approaches have improved the resolution in quantitative trait loci (QTL) position/point mutations and assisted in determining the functional causative variations in genes. To be successful in the interpretation of NGS data, bioinformatics computational methods are critical elements in delivering accurate assembly, alignment, and variant detection. Numerous bioinformatics tools/pipelines have been developed for such analysis. This article intends to review the recent advances in NGS based forward genetic approaches to identify and map the causal mutations in the crop genomes. The article also highlights the available bioinformatics tools/pipelines for reducing the complexity of NGS data and delivering the concluding outcomes.


Introduction
Availability of abundant genetic variability and diversity in the gene pool is the elementary need for genetic enhancement of any crop species. Conventional plant breeding is entirely dependent on the accessibility of sufficient genetic variations for crop improvement. However, the required genetic variations may not always be available in proper form [1,2]. Crop improvement through conventional plant breeding might be hampered due to a complicated and long breeding cycle, availability of

Need of Identification and Mapping of Causal Mutations
Mutagens create genome-wide DNA variations in crop plants, viz., single base substitutions, deletions, inversions, translocations, etc. Of these many mutations, only one or a few determine the function of the mutated gene. NGS techniques and analysis pipelines help in pinpointing such mutations by using mutant-derived populations. Thus, applying such NGS based pipelines to correctly identify the causal mutations will help us to understand the nature of the mutations, identify the genes involved, and assess the impact of such genes on phenotypes.
After establishing the nature of the mutations and mutant genes, gene-specific markers may be generated to transfer the mutant trait into the background of high yielding genotypes through marker (gene-based) assisted breeding (MAB). Such markers for climate resilient mutant traits are particularly helpful because they allow the trait to be transferred even in the absence of sophisticated screening facilities in the field. Transfers of sub1 and a salt tolerance gene in rice are prominent examples of such efforts [4,47]. Moreover, understanding of causal mutations will also help in applying the same knowledge to improve the traits of a mega-variety within a short span of time through new breeding technologies. In this way, more than 300 popular rice landraces have been targeted for revival and improvement through radiation induced mutation breeding [40,41]. Till now, more than 18 stable rice mutants have been developed in the background of 15 popular rice landraces, and several mutant lines are at various stages of stabilization and evaluation. Three mutant varieties, viz., Trombay Chhattisgarh Dubraj Mutant-1 (TCDM-1), Vikram-TCR and Chhattisgarh Jawaphool Trombay, have been released. Of these, TCDM-1 has been notified by the Government of India for commercial cultivation in Chhattisgarh state, India, and the other two varieties are under the process of notification [41,42]. TCDM-1 is becoming very popular among the farmers of Chhattisgarh state. Dubraj, the parent of TCDM-1, was famous for its aromatic short grains and excellent cooking quality; however, it disappeared from farmers' fields due to its tall stature, late maturity and poor yield potential [40,42]. Similarly, Vikram-TCR (parent: Safri-17) is popular for its higher yield potential, drought tolerance and excellent puffed rice making quality, in addition to its semi-dwarf stature and mid-early maturity.
With the help of mutation breeding, we have reduced the plant height and maturity duration and increased the yield potential of these mutants [41]. Therefore, mutation breeding is a powerful approach to improve one or two undesirable traits in crop plants [40,41].
A major bottleneck in plant mutation breeding is the need to generate and evaluate large mutant populations to increase the chance of recovering a desirable mutant [6]. This problem may be overcome by site-directed mutagenesis, the process of creating a mutation at a target site in a DNA molecule, and by insertional mutagenesis, i.e., the insertion of T-DNA or activation of transposable elements, or cisgenesis [43][44][45][46]. However, these approaches require considerable technical and genetic engineering expertise, along with the amenability of the plant species to tissue culture and callus differentiation [4].

Concept of Mapping, Sequencing, Resequencing, and Mapping by Sequencing
Genome mapping is an important tool for the sequential allocation of loci/genes along the chromosomes and for determining the relative distances between them. Genetic linkage mapping and physical mapping are two forward genetic approaches for genome mapping. The genetic linkage mapping method involves constructing genetic maps that show the relative position of genes/polymorphic markers along the chromosomes and is based on the Mendelian principles of segregation and recombination. The first genetic linkage map was constructed in 1913 for the fruit fly (Drosophila melanogaster) using phenotypic markers [48]. Since then, morphological markers have been used for genetic mapping in many studies. However, morphological markers have limited polymorphism and are strongly influenced by the environment, making them less useful in plant breeding. With the advent of DNA technology, different kinds of molecular markers such as restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), inter simple sequence repeat (ISSR), microsatellite or simple sequence repeat (SSR), amplified fragment length polymorphism (AFLP), and single nucleotide polymorphism (SNP) have been identified and deployed to accelerate genetic mapping studies in crop plants [49].
Most of the agriculturally important traits in crop plants are complex or quantitative in nature. Accurate and well-saturated genetic maps may serve as an important tool in genetic and genomic analysis of complex traits [50,51]. Furthermore, genetic mapping is a pre-requisite to facilitate high-resolution genetic mapping, map-based cloning, and construction of physical maps. Multiple genetic maps are currently available for most of the important crop species including cereals [52][53][54], legumes [55][56][57], oil seeds [58][59][60], etc. Genetic mapping requires the generation of large segregating populations by crossing two parents having contrasting phenotypic differences for one or more traits of interest. The segregating populations that can be utilized for genetic mapping usually consist of F2 populations, backcross (BC) populations, recombinant inbred lines (RILs), and doubled haploids (DH) [50].
For genetic linkage mapping, Michelmore et al. [61] developed a simple and rapid method called bulked segregant analysis (BSA) to identify the molecular markers linked to the gene of interest. In BSA, two DNA bulks contrasting for the target trait are prepared from a segregating bi-parental population and screened with molecular markers to identify the polymorphic markers that distinguish the two bulks. Based on segregation analysis, the identified polymorphic markers are then mapped relative to the target gene to determine the genetic distance between the markers and the gene. These genetic distances are calculated from the recombination frequency between the markers and the gene, and are usually expressed in centimorgans (cM). BSA is the most widely used method to map the genes controlling simple traits in plants, and can also be applied to the genetic dissection of QTLs by screening bulks of informative individuals. The BSA approach has been successfully employed to map many agronomically important traits in crop plants [62][63][64][65][66][67].
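The conversion from recombination frequency to map distance mentioned above can be illustrated with a minimal Python sketch. It assumes that recombinant individuals can be counted directly (for example in a backcross or doubled haploid population) and uses the standard Haldane and Kosambi mapping functions; the function names and the toy counts are illustrative assumptions and do not come from any of the cited studies.

```python
import math


def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of progeny that are recombinant between a marker and a gene."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane mapping function (no crossover interference), in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi mapping function (allows for interference), in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Toy example (hypothetical counts): 12 recombinants among 200 progeny.
r = recombination_frequency(12, 200)
print(f"r = {r:.3f}")
print(f"Haldane distance = {haldane_cm(r):.2f} cM")
print(f"Kosambi distance = {kosambi_cm(r):.2f} cM")
```

For small recombination frequencies both functions give distances close to 100 × r cM; they diverge as r approaches 0.5, where linkage can no longer be detected.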
While genetic mapping provides the location of the target genes/loci, physical mapping is required to get an estimation of the actual (physical) distance between loci/genes on a chromosome. A physical map consists of a linearly ordered array of genomic DNA fragments encompassing the whole genome or a particular genomic region of interest. The physical distances between loci are expressed as the number of base pairs between them. For physical mapping, large-insert genomic DNA libraries constructed using high capacity vectors such as yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC) and cosmids are required and have been constructed for most of the major crop and model plants [68][69][70][71][72][73][74]. Physical maps are important genomic resources for map based gene cloning, analyzing chromosome and genome structure, and establishing relationship between genetic (cM) and physical distances (bp).
DNA sequencing has been a high priority in genetics research to determine the sequence of individual genes, larger genomic regions, full chromosomes, or entire genomes of any organism. Initially, Sanger sequencing (chain termination method) and Maxam-Gilbert sequencing (chemical degradation method) were the two methods developed for DNA sequencing [75,76]. Because of its high efficiency and certain other advantages, Sanger sequencing gained importance and became the most preferred and widely used DNA sequencing technique among biologists. In this method, read lengths of up to 1000 bp can be obtained with an accuracy of 99.99%. With the development of automated Sanger sequencing platforms, Sanger sequencing has been extensively used for construction of reference genomes of several plant species like Arabidopsis [77], rice [78], maize [79], sorghum [80], and soybean [81]. However, high operational costs, low throughput, and longer time to output have limited the application of Sanger sequencing for whole genome sequencing (WGS) in many species, particularly in those having large genomes. The emergence of next-generation sequencing (NGS) technologies has reduced the sequencing cost several fold and greatly increased sequencing throughput. Over the past 12 years, advances in NGS technologies have led to the development of many commercial NGS platforms such as Roche/454 (GS20, GS FLX, GS FLX Titanium, GS Junior and GS Junior+), Illumina/Solexa (MiniSeq, MiSeq, HiSeq and HiSeqX), ABI/SOLiD (5500 W, 5500xl W), Ion Torrent (Ion PGM, Ion Proton, Ion S5), PacBio and Oxford Nanopore (MinION) [82,83]. NGS technologies utilize massively parallel sequencing of multiple samples at much reduced cost to generate several gigabases of sequence information per day with a low error rate [84].
In species for which whole genome sequences are available, NGS technologies allow rapid re-sequencing of the genome by using the already available genome sequence as a reference to guide the alignment of the reads. Re-sequencing facilitates identification of minor sequence variations such as SNPs and insertions/deletions (InDels) between the reference genome and the sample of interest by comparing the consensus sequence, leading to rapid mapping and identification of desirable mutations [85,86].

Role of NGS in Detection and Mapping of Mutated Genes/Locus
Mutagenesis-based screens are a powerful tool to identify novel genes or gene functions. Though it is not difficult to generate mutants using various chemical or physical mutagens, the molecular identification and characterization of genes involved in the altered phenotype/biological processes remain the primary goal of such mutant analyses. Identification and mapping of desired mutations in any genome involve several steps, viz., (i) genetic mapping of a wide chromosomal region containing the gene of interest (mutated gene); (ii) identification of candidate genes (mutated genes) contained in the identified chromosomal region; and (iii) validation of the identified candidate genes responsible for the mutant traits.
In the past, various strategies including transposon- or transgene-tagged mutagenesis and molecular markers were tried to find the mutated gene [87]. However, as mutations occur randomly, conventional DNA markers can define only a wide chromosomal region containing the desired mutation. Generation, identification, and characterization of mutants have always been an important part of plant breeding. However, identification of the causal gene for the mutant phenotype by classical linkage analysis and map based cloning is a costly, laborious, and time-intensive task that has imposed a significant limitation. Therefore, targeted sequencing within the selected chromosomal region may be required to identify the true causal mutation. The limitations of positional cloning and linkage mapping have been resolved with the advent of next generation sequencing, which may serve as an important tool for easy and rapid identification and mapping of the causal mutations through whole genome sequencing [88,89]. NGS has greatly improved the power and efficiency of mutant identification, as it not only allows the identification of genetic markers but also enables the simultaneous identification and mapping of causal mutations. In the last decade, the methods for mapping and cloning the mutations of interest have advanced quickly with the advent of NGS-based approaches [8,11,88,90].
NGS technologies have already gained widespread popularity in plant breeding. Apart from whole genome sequencing, NGS along with powerful computational pipelines has provided novel and rapid ways for transcriptome sequencing, molecular marker discovery, gene expression studies, and targeted re-sequencing to identify agronomically important genes in plants [91][92][93][94]. NGS has made it possible to sequence a whole genome in a matter of weeks rather than years and, in turn, has led to the development of several NGS-based mapping approaches in plants that have significantly reduced the time required for mutation identification [11,12,95]. The approaches that primarily rely on the principle of 'mapping by sequencing' have reduced the effort and time required to identify the causal mutations [11]. Many mapping by sequencing approaches that combine classical mapping strategies with NGS have been successfully applied to directly map causative mutations in plants [11][12][13][14][15][16][17][18][19][20].

NGS Based Forward Genetics for Identification and Mapping of Causal Mutations
Once a mutant is generated, it is important to know which genes have been altered to induce the phenotype of interest. In this context, quantitative trait locus (QTL) mapping and map-based cloning have proved to be effective forward genetic approaches to characterize point mutations or small insertions/deletions (InDels). Once a link between a QTL and the quantitative trait is established, the QTL may be further mapped or cloned individually to identify the gene of interest. Depending upon the size of the QTL, fine mapping, association mapping, or positional cloning is further performed with a large number of molecular markers to identify the candidate gene responsible for the trait of interest [96]. The NGS based forward genetic approaches have followed three main approaches, which are briefly described below:
(a) Genotyping by sequencing (GBS) approach Among various NGS platforms, genotyping by sequencing (GBS) has significantly increased the availability and applicability of molecular markers for crop improvement. GBS helps in the identification and genotyping of a huge number of SNPs. Candidate SNPs may be linked with desired traits with the help of genome-wide association mapping and/or QTL mapping and further utilized in marker-assisted breeding for gene introgression or to validate trait-linked haplotypes in crop plants. GBS approaches were used for identification and mapping of genes/QTLs in recombinant inbred lines of rice [97,98], maize, and barley [94] and in a doubled haploid population of wheat [99]. Though GBS does not necessarily require a reference genome, two major drawbacks, viz., the intrinsic error rate of the sequencing process and the low depth of sequencing, are associated with this approach [100].
(b) Whole genome resequencing (WGR) approach With the availability of whole genome sequences as references for many commercial crops, whole genome resequencing (WGR) has gained a lot of importance. In this approach, a new individual is sequenced and compared with its reference genome to identify not only polymorphisms (including SNPs) but also structural variants like insertions-deletions (InDels), gene conversions, etc. [92]. Using this approach, the constitutive photomorphogenic-9 (COP9) signalosome complex subunit 8 (CSN8) genes responsible for seed weight in chickpea were identified. Similarly, WGR was effectively utilized for identification of the desired gene in other crops including Setaria italica [81] and rice [86].
(c) RNA-Seq approach (whole exome sequencing approach) RNA-Seq, i.e., direct sequencing of cDNA derived from total transcripts, has turned out to be an important tool for comprehensive expression profiling of genes underlying QTLs. This approach focuses on the protein coding regions in the genome, comprising approximately 1-2% of the genome [101]. NGS assisted expression profiling in a mutant and its comparison with the parent can identify candidate genes associated with the desired phenotype. In sorghum (Sorghum bicolor L.), comparative RNA-Seq analysis of parents revealed that 108 differentially expressed genes (DEGs) were involved in plant hormone metabolism, glycolysis and nitrogen metabolism [102]. These DEGs were situated near QTLs for multiple agronomic traits under normal and low-nitrogen conditions. Similarly, RNA-Seq analysis and QTL analysis jointly helped to identify the gene of interest in maize [103] and Glycine max L. [104]. Further, integration of both approaches, i.e., RNA-Seq and QTL analysis (called expression QTLs or eQTLs), has helped in analyzing the expression of complex traits [105,106].
By following these methodologies, numerous NGS based forward genetic approaches have been developed and applied in various crops for identification and mapping of causal mutations (Table 3). Most of the approaches follow the principle of mapping by sequencing of bulk segregants of two populations. However, the NGS based approaches differ slightly in the type of base material used, the need for a reference genome, the imposition of custom filters, the bioinformatics tools/pipelines utilized, etc. (Table 2). These NGS based forward genetic approaches are described individually in the following sections.

SHOREmap
SHOREmap was developed by Schneeberger et al. [11] to identify the causal mutations. It helps in genome wide genotyping and sequencing of a candidate gene from a large pool of recombinant lines.
It follows the principle of mapping-by-sequencing. Selfing of mutagen induced M1 lines yields an M2 population from which the desired mutant is selected. This recessive mutant (after confirming its true breeding behavior) is crossed with a genetically diverse wild-type line, followed by selfing to produce F2 progenies. The resultant F2 population segregates for the mutant phenotype. The individuals displaying the mutant phenotype (~500) are isolated, pooled, and sequenced up to a genome coverage of 22×. The relative allele frequencies of the two mapping parents are then represented in an 'interval' plot, which reveals the candidate region harboring the site of mutation. INTERVAL generates a visual output that allows the user to define a mapping interval. By default, INTERVAL generates 10 different plots of all chromosomes by sliding window analysis. The plot(s) may contain the candidate region harboring the mutation, which can be used as input for ANNOTATE [11,107]. To select the smallest possible interval, the highest peak must be included within the selected region. ANNOTATE uses this peak to prioritize the mutations within the interval. A set of software tools has been developed for the analysis of whole-genome sequencing data obtained from the SHOREmap process. Currently, SHOREmap v3.0 is useful for analyzing WGS data [107].
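The idea behind a sliding window analysis of parental allele frequencies can be sketched in a few lines of Python. This is only an illustration of the principle and is not the SHOREmap implementation: the marker tuples, the window and step sizes, and the function names are hypothetical assumptions. In a pool of mutant F2 plants, the window in which the allele of the mutant parent approaches a frequency of 1 marks the candidate region.

```python
from typing import List, Tuple

# (position in bp, reads with mutant-parent allele, reads with other-parent allele)
Marker = Tuple[int, int, int]
# (window start, window end, frequency of the mutant-parent allele)
Window = Tuple[int, int, float]


def sliding_window_frequency(
    markers: List[Marker], chrom_length: int, window: int, step: int
) -> List[Window]:
    """Pooled mutant-parent allele frequency in overlapping windows."""
    results: List[Window] = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        inside = [(m, o) for pos, m, o in markers if start <= pos < end]
        mutant_reads = sum(m for m, _ in inside)
        total_reads = sum(m + o for m, o in inside)
        if total_reads > 0:  # skip windows without any marker coverage
            results.append((start, end, mutant_reads / total_reads))
        start += step
    return results


def peak_window(windows: List[Window]) -> Window:
    """Window with the highest mutant-parent allele frequency."""
    return max(windows, key=lambda w: w[2])


# Toy data for one chromosome (hypothetical read counts).
toy_markers: List[Marker] = [
    (50_000, 10, 10), (150_000, 12, 8), (250_000, 15, 5),
    (350_000, 18, 2), (450_000, 20, 0), (550_000, 19, 1),
    (650_000, 14, 6), (750_000, 11, 9), (850_000, 10, 10),
]
windows = sliding_window_frequency(toy_markers, 900_000, 200_000, 100_000)
print(peak_window(windows))
```

Summing reads over all markers in a window, rather than averaging per-marker frequencies, gives more weight to well-covered markers; this is a design choice of the sketch, not a statement about how SHOREmap weights its data.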
Schneeberger et al. [11] crossed, in Arabidopsis thaliana, a mutant with slow growth and light green leaves in the Columbia-0 (Col-0) accession with a wild-type plant of the Landsberg erecta (Ler-1) accession and used 500 mutant F2 progenies to identify the causal mutation in one next-generation sequencing run at 20× coverage of the genome. The causal mutation was a codon change in the AT4G35090 gene that changed the amino acid serine into asparagine. However, some limitations are also associated with SHOREmap. Crossing the mutant to a highly diverse line can alter or interfere with the mutant phenotype, resulting in incorrect phenotyping/pooling and thus in a considerably larger mapping interval. Compared with other NGS techniques like MutMap, SHOREmap has higher noise in SNP calling and poorer alignment. Moreover, the number of F2 progenies required for bulking in SHOREmap is much larger than in other techniques [142].

Table rows (partial):
Principle: estimation of the frequencies of k-mers (short subsequences) on the WGS data of two highly related genomes. Mapping population: M3. Reference genome required: No. Applicability: all organisms; especially useful for non-model organisms whose genome has not been sequenced and where mutagenesis is feasible. Reference: [21].
7. LNISKS (longer needle in a scanter k-stack). Principle: estimation of the frequencies of k-mers (short subsequences) on the WGS data of two highly related genomes with custom k-filters. Mapping population: BC1F2 or F2. Reference genome required: No. Applicability: all organisms; especially useful for complex genomes and large and repetitive crop genomes like wheat (17 Gbp).

NGM (Next-Generation Mapping)
NGM is an alternative method to SHOREmap that requires a much smaller mapping population (10-50 F2 plants) to identify and map the causal mutations [12]. The NGM approach exploits the fact that linkage disequilibrium (LD) between a mutant locus and its surrounding genomic region leads to reduced heterozygosity in that region in a mapping population such as an F2. Using sequencing and computational tools, the ratio of homozygous to heterozygous loci in a local area is measured to identify all types of variation, viz., causal point mutations, insertions, deletions, integration of T-DNA, etc. The approach was demonstrated by identifying three mutant genes in Arabidopsis. The method can identify all types of mutation (insertions, deletions, and SNPs) that cause the mutant phenotype in the homozygous state. Pooled DNA of F2 lines depicting contrasting phenotypes was sequenced using Illumina paired-end sequencing (Figure 2). The resulting sequences were aligned to the Arabidopsis reference genome (TAIR9 release) to identify SNP variation [77]. The identified SNPs were then localized on the genome to identify hotspots of variation and non-recombinant blocks [12].
A mathematical score termed the 'discordant chastity' statistic (ChD), derived from the Illumina chastity statistic, was used to estimate the proportion of reads at these SNP sites that differed from the reference genome. Using the ChD statistic, probability density estimates and linkage analysis, the causal SNP can be pinned down. Compared to map based cloning, which requires an F2 population of 100-200 plants, NGM may be carried out using 10-50 F2 plants. Even in cases where only a reduced representation of the genome is available for sequencing (e.g., the exome), NGM can be utilized for mutant trait identification. NGM analysis can directly take the mapping output generated by common packages like Mapping and Assembly with Quality (MAQ) or SAMtools as input. The approach has been demonstrated to identify causal SNPs that are missed by other pipelines and is highly cost and time effective. NGM in combination with map based cloning has been applied in many systems for identification of mutated genes. The NGM approach is flexible and can be utilized for mutant locus identification irrespective of the type of mutation, viz., SNP, insertion, deletion, etc., as LD between the mutant locus and the surrounding region leads to reduced heterozygosity regardless of the type of mutation [12,143,144].

dCARE (Deep CAndidate RE-Sequencing)
Hartwig et al. [14] proposed deep candidate resequencing (dCARE), which combines isogenic bulk sequencing with deep resequencing of candidate genes to identify and map the causal mutations in the Arabidopsis genome. They demonstrated the use of mutagen induced variation as segregating markers in mapping-by-sequencing for identification of candidate genes/causal mutations. dCARE uses the NGS based Ion Torrent Personal Genome Machine sequencing platform for identification and mapping of causal mutations. The assumption made in this technique is that, among all EMS-induced changes, the causative change occurs at the highest frequency in pools of bulked segregants (Figure 3). If only whole genome resequencing is used, it is not possible to distinguish between the subtly different allele frequencies of closely linked EMS-induced changes [14]. dCARE has the potential to characterize subtle phenotypes that were previously inaccessible. dCARE provides increased coverage for linked changes, which reduces the large number of candidate genes to a single causal gene.
In the dCARE technique, bulked DNA from an F2 population is first sequenced to identify the putative mutations or hot spot regions based on allele frequencies. Thereafter, the actual candidate mutations are identified through deep resequencing of the candidates with Ion Torrent sequencing technology. This is a WGS based approach to characterize mutants in model as well as non-model crop species. Using this technique, Hartwig et al. [14] unequivocally identified a mutation in At3g63270 that corresponded to the suppressor mutation in Arabidopsis.

MutMap Approach
MutMap is based on whole genome sequencing of bulked DNA from mutant progenies to identify the causal mutations for the trait of interest [13]. In the MutMap process, a mutant with an altered phenotype identified in the M2 or a later generation is crossed to the wild type parent. The resultant F1 plants are allowed to self-pollinate and the F2 population is screened for segregation of the mutant and wild type phenotypes. DNA from multiple F2 individuals (about 20-30 individuals) showing the mutant phenotype is pooled and subjected to NGS based WGS with substantial genomic coverage (>10×). Simultaneously, a parental reference sequence is constructed by re-sequencing the wild type parent and aligning the reads to the publicly available reference genome of the species (Figure 4). In the consensus sequence, nucleotides of the reference sequence are replaced with those of the parental line at all the detected SNP positions to make the parental reference sequence. The F2 bulked sequence reads are then aligned to this parental reference sequence and the result of the alignment is used to infer the genomic location of the causal SNPs responsible for the mutant phenotype. The majority of the SNPs that are unlinked or loosely linked to the mutant phenotype will segregate in a 1:1 mutant/wild type ratio, but SNPs that are linked to the mutant phenotype will show 0% wild-type and 100% mutant reads. The SNP index, which is the ratio between the number of reads carrying the mutant SNP and the total number of reads covering that SNP position, helps to predict the linkage of loci to the mutant phenotype. A SNP index of 1 or close to 1 indicates that the SNP is linked to the mutant phenotype, whereas values near 0.5 correspond to unlinked loci. MutMap can detect all kinds of nucleotide variations created by mutagenesis, such as SNPs, insertions and deletions.
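The SNP index defined above is simple to compute from read counts. The following Python sketch shows the calculation and a naive candidate filter; the minimum read depth, the index threshold, the tuple layout and the toy read counts are illustrative assumptions and are not parameters of the published MutMap pipeline.

```python
from typing import List, Tuple

# (chromosome, position, reads carrying the mutant allele, total reads at the site)
SnpCount = Tuple[str, int, int, int]


def snp_index(mutant_reads: int, total_reads: int) -> float:
    """SNP index = reads with the mutant SNP / all reads covering the SNP."""
    if total_reads <= 0 or not 0 <= mutant_reads <= total_reads:
        raise ValueError("need 0 <= mutant_reads <= total_reads and total_reads > 0")
    return mutant_reads / total_reads


def candidate_snps(
    snps: List[SnpCount], min_depth: int = 10, threshold: float = 0.9
) -> List[Tuple[str, int, float]]:
    """Keep well-covered SNPs whose index is close to 1 (assumed cut-offs)."""
    kept = []
    for chrom, pos, mutant_reads, total_reads in snps:
        if total_reads < min_depth:
            continue  # too few reads for a reliable index
        index = snp_index(mutant_reads, total_reads)
        if index >= threshold:
            kept.append((chrom, pos, index))
    return kept


# Toy read counts from a hypothetical mutant F2 bulk.
toy_snps: List[SnpCount] = [
    ("chr1", 1_200, 6, 12),    # index 0.5: unlinked
    ("chr1", 58_000, 14, 14),  # index 1.0: linked to the mutant phenotype
    ("chr2", 4_300, 3, 4),     # depth below min_depth, ignored
    ("chr2", 9_100, 7, 12),    # index about 0.58: unlinked
]
print(candidate_snps(toy_snps))
```

In practice, SNP indices are usually plotted along each chromosome and averaged over windows, so that a cluster of SNPs with an index near 1 indicates the candidate region rather than a single site.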
Abe et al. [13] demonstrated MutMap for the first time by characterizing two mutant rice genotypes (Hit1917-pl1 and Hit0813-pl2) with pale-green leaf phenotypes. They successfully identified a causative SNP in the chlorophyllide a oxygenase (OsCAO1) gene that led to an L253F mutation (codon CTT→TTT), resulting in the pale-green phenotype. Similarly, Takagi et al. [47] used MutMap to characterize a salt tolerant mutant (hst1) of the rice cultivar Hitomebore and identified a SNP in the third exon of the OsRR22 gene as the causative mutation. This SNP caused a nonsense mutation (codon TGG→TAG) in the OsRR22 gene and was linked with the salinity-tolerance phenotype of the hst1 mutant. The authors also developed a salt tolerant rice variety "Kaijin" by backcrossing

MutMap Approach
MutMap is based on whole-genome sequencing of bulked DNA from mutant progenies to identify the causal mutations for the trait of interest [13]. In the MutMap process, a mutant with an altered phenotype, identified in the M2 or a later generation, is crossed to the wild-type parent. The resultant F1 plants are allowed to self-pollinate, and the F2 population is screened for segregation of the mutant and wild-type phenotypes. DNA from multiple F2 individuals (about 20-30) showing the mutant phenotype is pooled and subjected to NGS-based WGS with substantial genomic coverage (>10×). Simultaneously, a parental reference sequence is constructed by re-sequencing the wild-type parent and aligning the reads to the publicly available reference genome of the species (Figure 4). To make the parental reference sequence, nucleotides of the public reference are replaced with those of the parental line at all detected SNP positions. The F2 bulked sequence reads are then aligned to this parental reference sequence, and the alignment is used to infer the genomic location of the causal SNPs responsible for the mutant phenotype. The majority of SNPs, which are unlinked or loosely linked to the mutant phenotype, will segregate in a 1:1 mutant/wild-type ratio, whereas SNPs linked to the mutant phenotype will show 0% wild-type and 100% mutant reads. The SNP index, which is the ratio between the number of reads carrying a mutant SNP and the total number of reads covering that SNP, helps to predict the linkage of a locus to the mutant phenotype. A SNP index of 1 or close to 1 indicates that the SNP is linked to the mutant phenotype, whereas values near 0.5 correspond to unlinked loci. MutMap can detect all kinds of nucleotide variation created by mutagenesis, such as SNPs, insertions, and deletions.
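The SNP index described above can be illustrated with a minimal sketch. The function name `snp_index` and the read counts are hypothetical and chosen only for illustration; in practice the counts come from the alignment of the bulked F2 reads to the parental reference sequence.

```python
def snp_index(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a SNP position that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must lie between 0 and total_reads")
    return mutant_reads / total_reads


# Hypothetical read counts at three SNP positions in the bulked F2 sample:
# (label, reads with the mutant allele, total reads covering the SNP)
example_sites = [
    ("unlinked", 6, 12),
    ("loosely_linked", 9, 12),
    ("causal_region", 12, 12),
]

indices = {name: snp_index(k, n) for name, k, n in example_sites}
for name, value in indices.items():
    print(f"{name}: SNP index = {value:.2f}")
```

In this toy data the unlinked site sits at 0.5, while the site in the causal region reaches 1, matching the interpretation given in the text.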
Abe et al. [13] first demonstrated MutMap by characterizing two rice mutants (Hit1917-pl1 and Hit0813-pl2) with pale-green leaf phenotypes. They identified a causative SNP in the chlorophyllide a oxygenase (OsCAO1) gene that led to an L253F substitution (codon CTT→TTT), resulting in the pale-green phenotype. Similarly, Takagi et al. [47] used MutMap to characterize a salt-tolerant mutant (hst1) of the rice cultivar Hitomebore and identified a SNP in the third exon of the OsRR22 gene as the causative mutation. This SNP caused a nonsense mutation (codon TGG → TAG) in OsRR22 and was linked with the salinity-tolerance phenotype of the hst1 mutant. The authors also developed a salt-tolerant rice variety, "Kaijin", by backcrossing the hst1 mutant to the parental line Hitomebore for two generations, and confirmed the presence of the mutant hst1 allele through Sanger sequencing. "Kaijin" took only two years to develop and was practically identical to Hitomebore except for the hst1 mutation. This demonstrated the power of genomics-based crop breeding approaches for accelerating the development of climate-ready improved cultivars. Chen et al. [120] mapped a novel nuclear male sterility mutant (ms9) in sorghum using MutMap and identified the causal mutation for male sterility in the Sobic.002G221000 gene (designated Ms9), which encodes a plant homeotic domain (PHD)-finger transcription factor critical for tapetum degeneration and pollen formation. Ms9 was the first nuclear male sterility gene identified in sorghum and provides an opportunity to control male sterility for the development of a two-line breeding system for hybrid sorghum.
MutMap has been used for identifying agronomically important genes in many crop plants such as rice, sorghum, soybean, wheat, maize, etc. (Table 1).

MutMap+ Approach
MutMap+ is an extension of MutMap for identifying mutated alleles and causal SNPs. MutMap requires a cross between the mutant and the wild type; when the mutants are lethal or sterile, such crosses are difficult to make and MutMap cannot be used to identify the causative genes. MutMap+ was developed to overcome this limitation [15]. It is an NGS-based approach that identifies genetic variation in wild-type and mutant bulks simultaneously by whole-genome sequencing. Like other genetic mapping methods, it relies on the principle of genetic linkage [90], and it follows the principle of mapping-by-sequencing through bulked segregant analysis [15].
In MutMap+, seeds are treated with the desired mutagen and sown to obtain M1 plants. The mother panicles/ears of the M1 plants are harvested and sown in a panicle/ear-to-row manner. To detect SNPs, the sequence reads obtained are aligned to the reference genome and a SNP index is generated. The SNP index is the ratio of the number of sequence reads carrying a mutant SNP to the total number of sequence reads covering that SNP. An index of 1 indicates close proximity to the causal gene, whereas an index of 0.5 indicates SNPs located away from the causal region. The whole genome is scanned for the SNP index, and a genomic region with a SNP index of 1 is the potential region harbouring the causal mutation site (Figure 4) [15].
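The genome-wide scan for regions with a high SNP index can be sketched as a sliding-window average along a chromosome. This is only an illustration under stated assumptions: the function name `sliding_window_mean`, the parameters `window_bp` and `step_bp`, and the toy positions are hypothetical, and real analyses work with much larger windows and many more SNPs.

```python
from statistics import mean


def sliding_window_mean(sites, window_bp, step_bp):
    """Average SNP index in fixed-width windows along one chromosome.

    sites: iterable of (position_bp, snp_index) tuples, in any order.
    Returns (window_start, window_end, mean_index) for every window
    that contains at least one SNP.
    """
    if window_bp <= 0 or step_bp <= 0:
        raise ValueError("window_bp and step_bp must be positive")
    ordered = sorted(sites)
    if not ordered:
        return []
    last_pos = ordered[-1][0]
    windows = []
    start = 0
    while start <= last_pos:
        end = start + window_bp
        values = [idx for pos, idx in ordered if start <= pos < end]
        if values:
            windows.append((start, end, mean(values)))
        start += step_bp
    return windows


# Hypothetical SNP positions (bp) and SNP indices on one chromosome.
toy_sites = [(100, 0.5), (900, 0.4), (1500, 0.9),
             (1900, 1.0), (2600, 1.0), (3500, 0.6)]

windows = sliding_window_mean(toy_sites, window_bp=1000, step_bp=1000)
peak = max(windows, key=lambda w: w[2])  # window with the highest mean index
print("candidate region:", peak[0], "-", peak[1])
```

The window whose mean SNP index approaches 1 is the one a follow-up analysis would inspect for candidate causal mutations.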
MutMap+ is applied to locate the causal mutation site and to identify the gene/allele responsible for a mutation-induced trait. Nakata et al. [127] screened a mutant population of the japonica cultivar Nipponbare and found two mutant lines with altered starch gelatinization properties. MutMap+ revealed that both lines harbour novel mutant alleles (age alleles) of the starch branching enzyme IIb (BEIIb) gene. Since MutMap+ involves genetic mapping without artificial crossing, it can be exploited for rapid gene identification in many crops where crossing is difficult. MutMap+ thus enables rapid identification of genes/QTLs from natural variants or mutants using NGS.

MutMap-Gap
A key requirement of MutMap is that the causative mutation lies within the parental reference sequence; MutMap therefore cannot identify mutations located in gap regions of the reference genome. To overcome this shortfall, Takagi et al. [16] proposed a modified approach called MutMap-Gap for the identification of causal mutations in the gap regions of the reference sequence. MutMap-Gap is a combination of MutMap and targeted de novo assembly of genomic gap regions. In MutMap-Gap, the parental reference sequence is prepared by re-sequencing the parental line and aligning the resulting reads to the publicly available reference genome, as in MutMap [13]. The majority of sequence reads from the parental line align with the reference genome. However, when the parental line displays significant genetic variation from the reference genome, reads derived from genomic regions specific to the parental line cannot be aligned to the reference genome and remain unmapped (Figure 4). Mutations present in these gap regions (unmapped reads) cannot be identified by MutMap, but the approximate position of the causative mutation can be delineated by the MutMap-Gap method.
Takagi et al. [16] demonstrated MutMap-Gap by isolating the blast resistance gene Pii from the rice cultivar Hitomebore, using a mutant line that had lost Pii function. The mutation was located in a gap region of the reference genome of the rice cultivar Nipponbare, and therefore the causative SNP could not be identified by MutMap. Using MutMap-Gap analysis, a causative SNP was identified in the second exon of the gene Os09t0327600-01, which was located in the gap region. This SNP represented a nonsense mutation, changing a tryptophan codon (TGG) to a stop codon (TGA), and led to the mutant phenotype. MutMap-Gap is a good approach for the identification of novel genes in cultivars that are genetically distinct from the reference genome.

RNA Sequencing (RNA-Seq) Based Mapping
The RNA sequencing-based approach is useful for identifying and mapping the genomic regions/candidate genes harbouring a mutation and the specific lesions causing an altered phenotype in a strain [17]. This approach may be applied to model organisms/plants whose genome sequences are available. In RNA-seq, the sample being sequenced is limited to the expressed portion of the genome. Hence, it reduces the amount of sequence data and effectively identifies possible mutations causing nonsense and missense changes, affecting splicing, or affecting gene expression [17].
A benefit of the RNA-seq based approach is that the consequences of splice-altering mutations can be identified and assessed directly [17]. RNA-seq based mapping offers several advantages over WGS: it limits the representation of the genome to the expressed portion and thus reduces the amount of sequencing data and its cost. The effect of a candidate mutation on the transcript and on its expression level can be assessed directly in the mutant. It facilitates the identification of a very small number of high-priority nonsense and missense candidates underlying a phenotype of interest. It has the potential to enhance the efficiency of forward genetic screens in model systems with large, polymorphic genomes [145].
The RNA-seq based approach has been successfully applied for the development of D-genome specific chromosomal markers in synthetic hexaploid wheat [141] and the identification of a stem resistance locus in Aegilops umbellulata [140]. It can also be applied to better understand the molecular mechanisms and genetic consequences of the domestication of crop plants [145].

QTL-Seq Approach
Most agronomically important traits are controlled by multiple genes with individually minor effects, located at what are known as quantitative trait loci (QTLs). The identification and mapping of QTLs by whole-genome resequencing (WGR) of DNA from two populations with extreme phenotypes for a given trait is called the QTL-seq approach [18]. Mapping QTLs by QTL-seq requires a mapping population generated from two genotypes with contrasting phenotypes for the desired trait. Doubled haploid (DH) and recombinant inbred line (RIL) populations show a high degree of homozygosity and are suitable for the identification of QTLs of minor effect. In other cases, BC1F2 progenies (10-20 individuals) from each of the two phenotypic extremes may be selected for DNA isolation, and the DNA of each group pooled for further processing through QTL-seq [61,146]. The DNA bulks are subjected to WGR at a coverage of more than 6× the genome, and the short reads are aligned to the reference genome to estimate the SNP-index. The bulked DNA is expected to contain a 1:1 ratio of the two parental genomes in the majority of genomic regions, whereas regions harbouring QTLs for the trait are expected to show unequal representation of the two parental genomes. At each SNP position, the number (k) of short reads harbouring a nucleotide that differs from the reference sequence is counted, and the SNP-index is the proportion of such reads among all reads covering the position. The SNP-index is therefore 0 if all the short reads contain genomic fragments from the parent that was used as the reference sequence, and 1 if all the short reads represent the genome of the other parent. A SNP-index of 0.5 means an equal contribution of both parents' genomes to the bulked progeny (Figure 5). QTLs can be identified as peaks or valleys of the SNP-index plot.
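The SNP-index calculation for two bulks can be sketched as follows. The SNP labels, read counts, and the symbol n (total reads covering a position) are hypothetical, and subtracting the two bulks' indices is an illustrative way of exposing the peaks and valleys mentioned above rather than a procedure spelled out in this section.

```python
def snp_index(k: int, n: int) -> float:
    """SNP-index = k / n: k reads differing from the reference out of n reads."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    return k / n


# Hypothetical (k, n) read counts per SNP for the two extreme bulks.
high_bulk = {"snp_A": (10, 20), "snp_B": (18, 20), "snp_C": (0, 16)}
low_bulk = {"snp_A": (8, 16), "snp_B": (2, 20), "snp_C": (0, 20)}

contrast = {}
for snp in high_bulk:
    hi = snp_index(*high_bulk[snp])
    lo = snp_index(*low_bulk[snp])
    contrast[snp] = (hi, lo, hi - lo)  # difference between the two bulks

for snp, (hi, lo, diff) in contrast.items():
    print(f"{snp}: high={hi:.2f} low={lo:.2f} difference={diff:+.2f}")
```

Here `snp_A` behaves like an unlinked position (0.5 in both bulks), `snp_C` is fixed for the reference parent in both bulks, and `snp_B` shows the opposite skew in the two bulks that would be expected near a QTL.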
The QTL-seq approach has been successfully applied in rice [18], cucumber [136], tomato [147], chickpea [135,148], peanut [149], watermelon [150,151], broccoli [152,153], and squash [139] for the identification and mapping of QTLs/genes.
QTL-seq allows an accurate quantitative evaluation of the genomic contribution of the two parents to the bulked DNAs by using the SNP-index. It does not require DNA marker development or genotyping and thus offers rapid QTL identification with much higher power than previous methods. It can be applied to any population for detecting genomic regions that underwent artificial or natural selection.

Exome Capture Approach
Mapping by traditional positional or map-based cloning is gradually being replaced by mapping-by-sequencing approaches. Numerous successes have been demonstrated in Arabidopsis and some other plant models using approaches such as SHOREmap and NGM. However, despite the tremendous success of NGS-based mapping in such systems, extension of the same methods to complex genomes such as wheat and barley has not been successful, owing to the very high proportion of repetitive regions and polyploidy in these genomes. The genomes of these important crops are very complex; hence, genomic data analysis and the isolation of a gene of interest are very cumbersome [132]. An alternative to whole-genome analysis is to analyse only the expressed part of the genome, the exome. Traditionally, the exome includes all protein-coding exons, small RNAs, and some additional loci of known function. In exome capture, the exome is fished out by hybridization with biotinylated probes that bind to the target regions. The sample is then amplified and used for NGS or a long-range sequencing approach. This reduction in complexity from the entire genome to only the coding regions greatly accelerates the mapping of target genes (Figure 2). For example, the wheat genome is 17 Gb in size, whereas its exome is only 84 Mb; similarly, the barley genome is 5 Gb, whereas its exome is much smaller (62 Mb) [154]. This has led to the identification of many important genes in both crops, such as the "MANY-NODED DWARF" gene responsible for dwarfness in an X-ray induced deletion mutant of barley, the Yr6 yellow rust resistance gene in wheat, and several others [132,155].
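The scale of this complexity reduction can be made explicit from the genome and exome sizes quoted above. The percentages and fold-reductions below are simple rounded arithmetic on those figures and are not values reported in [154].

```latex
\begin{align*}
\text{wheat:}\quad  & \frac{84\ \text{Mb}}{17\,000\ \text{Mb}} \approx 0.49\%\ \text{of the genome}, & \frac{17\,000}{84} &\approx 202\text{-fold reduction} \\
\text{barley:}\quad & \frac{62\ \text{Mb}}{5\,000\ \text{Mb}} = 1.24\%\ \text{of the genome},        & \frac{5\,000}{62}  &\approx 81\text{-fold reduction}
\end{align*}
```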
Exome sequencing has been extensively used in forward genetic screening and in cataloguing mutant loci [20,156,157]. The method is also useful in reverse genetic screening. Using exome capture with 84 Mb of capture probes in tetraploid and hexaploid wheat, over 10 million high-confidence point mutations were identified in the coding regions of an EMS-induced Targeting Induced Local Lesions in Genomes (TILLING) population. The average density of mutations was observed to be 35-40 mutations per kb, roughly equivalent to 24 non-synonymous mutations per annotated wheat gene [155]. The method also identified large-scale deletions in 29% of the wheat lines. A similar approach showed that the causal mutation of a tall mutant in wheat lies in the Rht-B1 gene, which is known to affect plant height [132].

NIKS (Needle in the k-Stack) Approach
Most NGS-based forward genetic approaches depend on the availability of a reference genome for comparison, which restricts their use to organisms for which a genetic map or a genome sequence is available. To overcome this problem, Nordstrom et al. [21] developed NIKS (needle in the k-stack) and experimentally validated its applicability in rice cultivars and Arabis alpina. In both species, they found similar mutations when comparing pooled F2 individuals as well as M3 individuals. They therefore concluded that NIKS may be applied to forward genetic screens in any species amenable to mutagenesis, without requiring segregating populations, genetic maps, or reference sequences. NIKS is a forward genetic approach for reference-free genome comparison, discovery of homozygous mutations, gene identification, and mapping, based exclusively on the frequencies of k-mers (short subsequences of length k) within the WGS data of two closely related genomes, such as mutant and wild-type genomes [21].
To identify candidate mutations while excluding or reducing undesirable background variation, the mutant genotype is crossed with its wild-type parent and the genomes of F2 individuals are used for WGS [13,14]. NIKS thus utilizes a bulked segregant pooling-based WGS strategy to identify unknown or novel mutations in the genome. However, in species where crossing is difficult, M3 seedlings of two allelic mutants may be utilized for the same purpose through NIKS.
In the first step of NIKS, the frequency of each k-mer within the WGS data of each sample is estimated using the k-mer counting software Jellyfish [158]. Native k-mers and k-mers overlapping sequencing errors can be distinguished by k-mer frequency histograms, so k-mers affected by sequencing errors can easily be separated from error-free ones. To reduce errors arising from amplification artifacts, filtering for identical k-mers may be performed before running NIKS. Differences between the genome sequences of the two populations generate numerous sample-specific, overlapping k-mers (Figure 6). NIKS first identifies the sample-specific k-mers and merges them into longer sequences, or seeds. Differences between the seeds of the two populations may arise from induced mutagenesis. NIKS considers only those seeds that are homologous but not identical, obtained by pairing the seeds of the wild-type and mutant populations. Small mutations (less than k − 1 bp) and small InDels are combined into one elongated seed, whereas larger InDels might not be assembled into one seed.
In the last step, NIKS generates local de novo assemblies, or contigs, to extend the sequences associated with the mutated region of the genome. Furthermore, de novo gene predictions or alignments to gene annotations may be applied to the generated contigs for functional analysis of putative causal mutations. On this basis, the seed pairs that accurately represent homozygous mutagen-induced changes are identified. The major advantages of the NIKS algorithm are that it does not require segregating populations, genetic maps, or reference sequences for bioinformatics analysis; that it can, to some extent, identify mutations within repetitive regions; and that it is useful for developing markers/SNPs/InDels in non-model organisms by reference-independent methods [21].
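The idea of sample-specific k-mers in NIKS can be illustrated with a toy sketch. This is not the NIKS implementation: a plain Python `Counter` stands in for Jellyfish, reverse complements are ignored, and the function names, the `min_count` threshold, and the reads are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(counts_a, counts_b, min_count=2):
    """k-mers seen at least min_count times in sample A and never in sample B.

    The min_count filter mimics the removal of rare k-mers that are likely
    to stem from sequencing errors.
    """
    return {kmer for kmer, c in counts_a.items()
            if c >= min_count and kmer not in counts_b}


# Toy reads: the mutant differs from the wild type by one C -> T change.
wild_reads = ["ACGTACGGA", "ACGTACGGA"]
mutant_reads = ["ACGTATGGA", "ACGTATGGA"]

k = 4
wild_counts = count_kmers(wild_reads, k)
mutant_counts = count_kmers(mutant_reads, k)

mutant_specific = sample_specific_kmers(mutant_counts, wild_counts)
wild_specific = sample_specific_kmers(wild_counts, mutant_counts)
print(sorted(mutant_specific))
print(sorted(wild_specific))
```

A single point mutation yields k overlapping sample-specific k-mers in each sample, which is what allows them to be merged into a pair of homologous but non-identical seeds.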

MutChromSeq (Mutant Chromosome Sequencing) Approach
DNA sequencing-based mutational genomics techniques are very costly and pose significant computational challenges in some important crops such as barley, rye, and wheat. The genomes of these crops are huge (in the Gb range) and highly complex, in part owing to polyploidy. Therefore, traditional map-based cloning has been applied to clone only a limited number of their genes [159]. Advances in NGS techniques have enabled several new gene cloning approaches, viz., methylation filtration [160], duplex-specific nuclease digestion [161], transcriptome sequencing [162], and exome capture sequencing [20]. These approaches reduce DNA sequence complexity and sequencing costs, but they are unable to identify all the potentially significant sequences.
To avoid these complications, the MutChromSeq (Mutant Chromosome Sequencing) approach was proposed by Sanchez-Martin et al. [22] for rapid gene isolation in barley and wheat. MutChromSeq is a powerful, sequence-unbiased, and reference-free forward genetics approach based on chromosome flow sorting and sequencing; it reduces genome complexity and identifies induced causal mutations without positional fine mapping [163]. It has been effectively applied to re-clone the Eceriferum-q gene (involved in the wax covering of the leaf sheath) and Rph1 (a leaf rust resistance gene) in barley [130], and to clone the Pm2 gene (powdery mildew resistance) in wheat [22].
It does not depend on recombination or fine mapping for gene cloning and causal mutation identification. The approach may be performed in any crop species in which mutagenesis is feasible, the target gene is associated with a phenotype, and the chromosomal location of the target gene is known [22,163]. Comparison of the chromosomal sequences of multiple independently derived mutants with that of their wild-type parent confirms the identification of causal mutations in a single candidate gene or non-coding sequence. Because MutChromSeq does not rely on recombination-based mapping and targets all DNA sequences, it is very useful for forward genetic screening in crop species that have complex genomes. The methodology of the MutChromSeq approach is briefly presented in a flow diagram (Figure 7).
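The comparison of multiple independently derived mutants can be sketched as a simple overlap count: a sequence that carries a mutation in every (or nearly every) independent mutant is a strong candidate. The contig names, mutant labels, and the `min_mutants` threshold below are hypothetical, and the sketch assumes that mutations have already been called against the wild-type parent.

```python
from collections import defaultdict


def candidate_contigs(mutated_contigs_by_mutant, min_mutants):
    """Return contigs that carry a mutation in at least min_mutants mutants.

    mutated_contigs_by_mutant maps a mutant label to the set of contigs in
    which that mutant differs from the wild-type parent.
    """
    hits = defaultdict(set)
    for mutant, contigs in mutated_contigs_by_mutant.items():
        for contig in contigs:
            hits[contig].add(mutant)
    return sorted(c for c, mutants in hits.items() if len(mutants) >= min_mutants)


# Hypothetical mutation calls for four independent mutants.
calls = {
    "mutant_1": {"ctg1", "ctg7"},
    "mutant_2": {"ctg7", "ctg9"},
    "mutant_3": {"ctg7"},
    "mutant_4": {"ctg2", "ctg7"},
}

shared = candidate_contigs(calls, min_mutants=len(calls))
print(shared)
```

Background mutations are scattered at random across mutants, so only the contig containing the target gene is expected to be hit in all of them.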

MutRenSeq Approach
MutRenSeq is a resistance gene (R-gene) cloning pipeline that integrates EMS-based mutational genomics with exome capture targeting R-genes to identify the causal mutations in a single candidate gene. It involves mutagenesis of R-genes followed by their identification and cloning through target capture sequencing. It was developed by Steuernagel et al. [23] to clone the stem rust resistance genes Sr22 and Sr45 from hexaploid bread wheat. MutRenSeq is an advanced version of the R-gene enrichment sequencing (RenSeq) approach [164]. The method compares the RenSeq data of EMS-derived loss-of-resistance (disease-susceptible) mutants with those of the wild-type parent, so that the R-gene complements can be compared to identify the R-gene and its causal mutations. It enables fast identification of disease resistance genes without positional cloning or fine mapping [163].
This approach was further utilized by Marchal et al. [131] to clone three major and distinct genes (Yr7, Yr5, and YrSP) for yellow rust resistance in wheat (Triticum aestivum L.). They reported that genes encoding nucleotide-binding and leucine-rich repeat (NLR) proteins may provide diverse resistance spectra to important fungal diseases [131]. MutRenSeq is particularly appropriate for plant species with large genomes (wheat, barley, rye), where whole-genome sequencing of multiple individuals is complicated and expensive [23].
MutRenSeq works on the principles of exome capture targeted to the R-gene complement (NLR sequences) and mutational genomics. A de novo assembly of the wild-type RenSeq data is generated and used as the reference against which the RenSeq data of susceptible mutants are mapped [163]. MutRenSeq therefore includes the generation, screening, and identification of disease-susceptible (loss-of-function) mutants from the M2 population of a disease-resistant line, followed by RenSeq. To identify susceptible mutants, EMS mutagenesis should be carried out in a genotype in which resistance is controlled by a single R-gene, because a susceptible or loss-of-function mutant arises only when the mutation occurs directly in that R-gene. A genotype harbouring two or more R-genes for disease resistance would not be appropriate for MutRenSeq, because susceptible mutants would be difficult to select [23].
MutRenSeq requires three major steps for the fast isolation of resistance genes, viz., (i) identification and selection of disease-susceptible mutants (loss of disease resistance) derived from the resistant wild-type parent; (ii) sequencing of the NLR-enriched genomes of both the susceptible mutants and the resistant wild-type plants; and (iii) comparison of these genes in mutants and wild type to identify the exact mutations responsible for the loss of disease resistance. The methodology of the MutRenSeq approach is briefly presented in a flow diagram (Figure 7).

SIMM (Simultaneous Identification of Multiple Causal Mutations)
Simultaneous Identification of Multiple Causal Mutations (SIMM) was developed to identify the causal mutations of multiple mutants at a time by analysing their sequence data simultaneously. The method does not require wild-type parental genome sequence information for the analysis. Each novel mutant (obtained from the same mutagenized population) is backcrossed with the parent, and DNA of 20-30 F2 individuals showing the mutant phenotype is pooled (a separate pool for each mutant) and sequenced to >20× of the genome size by NGS. Clean reads from each mutant pool are then aligned to the reference genome available for the crop using software such as SOAP2 [165], bwa [166], or Bowtie2 [167]. Exclusively mapped reads are retained for SNP calling using SOAPsnp (for SOAP2) [168] or SAMtools (for bwa and Bowtie2) [143]. The SNPs in the bulk of each mutant are identified and compared with one another to pinpoint the candidate SNPs for each mutant phenotype. To reduce sequencing errors and increase precision, SNPs with <5 supporting reads are excluded from further analysis. To overcome limitations of the SNP index used in MutMap and related techniques, an Allele Index (AI) was introduced to take into account reads supporting the wild-type allele in the background mutants. AI is calculated by dividing the number of reads supporting the wild-type allele by the total number of reads in the background mutants; an AI of 0.8 corresponds to tolerating 20% sequencing errors at a given SNP. Candidate SNPs may be further refined by Euclidean Distance (ED) analysis [169], which considers the SNP index (SI) in the test strain and in the reference mutants. ED was calculated as $ED = [2(SI_t - SI_{bc})^2]^{1/2}$, where $SI_t$ is the SI of the mutant allele in the test strain and $SI_{bc}$ is the SI of the same allele in the background mutants. Since ED ranges from 0 to $2^{1/2}$, it was raised to the power of 6 ($ED^6$), which was sufficient to enlarge the differences between causal mutations and closely linked mutations and to highlight candidate regions. Candidate regions harbouring the causal mutation were expected to show a cluster of SNPs with high SI and $ED^6$ values. Finally, the candidate mutations were validated by phenotype association analysis using high-resolution melting analysis [170].
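The effect of raising ED to the sixth power can be seen in a small numerical sketch. It assumes that the Euclidean distance takes the form ED = sqrt(2 (SI_t − SI_bc)^2), so that ED lies between 0 and sqrt(2); the function names and the SNP index values are hypothetical.

```python
import math


def euclidean_distance(si_test: float, si_background: float) -> float:
    """ED between the SNP index in the test strain and in the background mutants.

    Assumed form: ED = sqrt(2 * (SI_t - SI_bc) ** 2), ranging from 0 to sqrt(2).
    """
    return math.sqrt(2 * (si_test - si_background) ** 2)


def ed6(si_test: float, si_background: float) -> float:
    """ED raised to the sixth power, which stretches large ED values apart."""
    return euclidean_distance(si_test, si_background) ** 6


# Hypothetical SNP indices: (SI in the test strain, SI in background mutants).
examples = {
    "causal": (1.0, 0.0),
    "closely_linked": (0.8, 0.0),
    "shared_background": (1.0, 1.0),
}

for name, (si_t, si_bc) in examples.items():
    print(f"{name}: ED = {euclidean_distance(si_t, si_bc):.3f}, "
          f"ED^6 = {ed6(si_t, si_bc):.3f}")
```

With these toy values the causal and closely linked sites differ by a factor of 1.25 in ED but by a factor of almost 4 in ED^6, while a polymorphism shared with the background mutants scores 0.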
The main advantages of SIMM are: simultaneous identification of candidate mutations in several mutants derived from a single wild-type parent; no need for a reference sequence of the parental genotype or for assembly of sample-specific k-mers; exclusion of a large number of background polymorphisms by using the data of the other mutants; avoidance of wrongly retaining or excluding candidate sites; high precision in resolving causal mutations; and applicability to detecting allelic relationships among mutants with similar phenotypes.
The founding research paper [24] characterized seven mutants obtained from EMS mutagenesis of Huanghuazhan (HHZ) rice, four of which were male sterile. The study revealed mutations in three different genes (LOC_Os04g39470, LOC_Os03g58600, LOC_Os07g32480) responsible for male sterility in rice. Later, one such mutant, HT5763, which carried a mutation in LOC_Os04g39470, was characterized in depth to reveal the molecular mechanism of male sterility in rice. The locus LOC_Os04g39470 encodes OsMyb80, in which a G to A mutation in HT5763 caused substitution of Glu74 (GAG) by Lys (AAG). The mutation affects the activity of OsMyb80, and the compromised protein was not able to promote the expression of downstream genes involved in: synthesis of precursors for pollen wall formation; transport of small nutrient molecules that nurture pollen cell growth; degradation of the cell wall surrounding pollen mother cells (PMCs) and the tetrads for microspore separation; massive protein degradation, redox homeostasis, and cell death gene expression associated with the tapetum; and signal transduction and transcriptional regulation of downstream events in pollen development [171].

TACCA (Targeted Chromosome-Based Cloning via Long-Range Assembly)
NGS based approaches rely upon amplification of the target sequence and massively parallel sequencing, which results in high sequencing depth and genome coverage. However, NGS approaches struggle in regions of highly repetitive sequence or structural elements such as transposons. Due to the very short read length (20-30 bp) of the NGS approach, it is almost impossible to align millions of repetitive short reads to a reference genome or to a de novo assembled sequence [22]. This is a particular challenge in crops like wheat and barley, in which repetitive elements make up a very high percentage of the genome.
To overcome the short read length limitation of second-generation sequencing approaches, a few alternative technologies providing long read lengths have been utilized, termed third generation sequencing. Unlike second-generation technologies, third generation sequencing does not rely on amplification of the target sequence; instead, it sequences a single DNA molecule. The sequence is generated in real time, and read lengths are on average of the order of 12-15 kb, with claimed lengths of up to 100 kb. The long read lengths of this technology remove computational hurdles in genome assembly, and hence are very useful for de novo genome assembly, transcript assembly, and mapping of mutations [22]. Although long range sequencing is believed to accelerate functional genomics, it is limited by lower base-calling accuracy and a very high cost compared to short read sequencing. In general, as read length increases, read quality decreases, which then requires intervention either in the experimental design or in the computational approach to deal with accuracy. The error rates of third-generation long range sequencing are higher than those of second-generation short read sequencing, which may be critical in mutant identification and causal SNP identification studies. Several computational approaches have been devised to overcome the error rate issue; however, it still remains a challenge for long range sequencing.
Currently, two major technologies, viz. Pacific Biosciences Single Molecule Real Time (SMRT) sequencing and Oxford Nanopore's technology, are utilized for long range sequencing. A combination of short and long read approaches, termed hybrid sequencing, has been proposed to obtain the best of both techniques. This approach relies on fixing of DNA in the nucleus before sequencing using various approaches, followed by standard WGS. As a result, sequences which are closer together will have more read pairs than sequences which are far apart. Long reads from third generation approaches may be used to create long templates or scaffolds or to sequence repeat regions, and short reads from second generation approaches may then be used to remove the small errors from this sequence [22].
Targeted chromosome-based cloning via long-range assembly (TACCA) utilizes long range sequencing of a single desired chromosome isolated by flow cytometry (Figure 8) [22,25].

The TACCA approach was used to clone the wheat gene Lr22a, an important leaf rust resistance gene transferred to bread wheat from its wild relative Aegilops tauschii. Lr22a was mapped on chromosome 2D using microsatellite markers, and high resolution mapping further delimited it to a 0.48-cM interval flanked by two markers. Chromosome 2D from Lr22a carrier and non-carrier genotypes was isolated using flow cytometry and subjected to long range sequencing, along with short read sequencing to alleviate errors. Long range sequencing using this approach led to scaffolds with 50% of them having sizes of 9.76 Mb or more and the largest scaffold reaching 36.4 Mb, which is around 100× more than a BAC library. Using a combination of short read sequencing and mutant analysis, the Lr22a locus was positively identified as an NLR-type R-gene [25]. Target Enrichment Sequencing (TenSeq) approaches have recently been reported to be very useful in cloning of target genes in complex genomes like wheat. These approaches, such as MutRenSeq, MutChromSeq, and AgRenSeq, depend on NGS based sequencing and assembly. Long range sequencing is highly useful for cloning of target genes in such cases, especially where repeat regions are encountered.
The TACCA approach has many advantages over the traditional map based cloning approach, viz. no dependence on an enrichment library or a reference sequence. In addition, this approach is fast and cost effective compared to BAC based cloning [25].

AgRenSeq (Association Genetics with R-Gene Enrichment Sequencing) Approach
AgRenSeq is a technique that allows cheaper discovery and cloning of resistance (R) genes from a diverse germplasm panel and wild relatives of any crop species [26]. This technique does not require a reference genome and can directly identify the nucleotide-binding/leucine-rich repeat (NLR) regions which confer resistance, rather than identifying a genomic region encoding multiple paralogs. Wild plants can be screened for a variety of diseases and sequenced to identify resistance genes. Association analysis was combined with the RenSeq approach to screen for and identify the R-gene; RenSeq requires a bait library that targets R-genes in the particular plant species. A sequence capture bait library was designed and optimized for capturing the NLR sequences encoded by the R-genes in this population. The enriched R-gene sequences were then assembled into NLR contigs, and NLR k-mers were extracted for each accession. After a pre-filtering step, k-mer based association mapping was conducted to identify k-mers associated with the resistance trait. Phenotype scores are converted to AgRenSeq scores that assign positive values to resistance and negative values to susceptibility. An intermediate phenotype should have an AgRenSeq score close to zero (Figure 7). The approach was successfully applied in wheat, where four stem rust resistance genes, Sr33, Sr45, Sr46, and SrTA1662, effective against three races of the stem rust pathogen, were identified [26].
Subsequently, this approach was recommended for rapid cloning of R-genes, as it facilitates marker-assisted breeding and broad-spectrum resistance engineering in genetically modified crops without the need for a reference genome [172]. It also serves to interrogate pan-genome sequence variation in diverse germplasm to isolate uncharacterised R-genes. Recent examples have shown the utility of RenSeq for improving disease resistance in plants, and similar techniques for identifying genes for abiotic stress tolerance would greatly benefit crops [173]. AgRenSeq exploits the entire gene set of all strains of a species to isolate and clone uncharacterized R-genes. However, bias during NLR capture may mislead the isolation and cloning of R-genes through the AgRenSeq approach [26].
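The k-mer scoring idea described above can be sketched as follows. This is a deliberately simplified stand-in rather than the published pipeline: the phenotype scale, the `midpoint` parameter, and all function names are assumptions made for illustration, and the pre-filtering step and any correction for population structure are omitted. Each k-mer simply accumulates the transformed phenotype scores of the accessions in which it occurs.

```python
from collections import defaultdict


def extract_kmers(sequence: str, k: int) -> set:
    """Return the set of k-mers present in one NLR contig."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def to_association_score(phenotype: float, midpoint: float) -> float:
    """Hypothetical transform: low disease scores (resistant) become positive,
    high scores (susceptible) negative, intermediate scores close to zero."""
    return midpoint - phenotype


def score_kmers(accessions, k: int, midpoint: float) -> dict:
    """accessions: iterable of (list_of_nlr_contigs, phenotype_score) pairs."""
    scores = defaultdict(float)
    for contigs, phenotype in accessions:
        kmers = set()
        for contig in contigs:
            kmers |= extract_kmers(contig, k)
        weight = to_association_score(phenotype, midpoint)
        for kmer in kmers:  # each accession contributes once per k-mer
            scores[kmer] += weight
    return dict(scores)


if __name__ == "__main__":
    panel = [(["ACGTAC"], 0), (["ACGTTT"], 4), (["GGGTAC"], 2)]
    for kmer, score in sorted(score_kmers(panel, 4, 2).items()):
        print(kmer, score)
```

In this toy panel, k-mers found only in the resistant accession receive a positive score, those found only in the susceptible accession a negative one, and k-mers shared by both cancel out.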

LNISKS (Longer Needle in a Scanter K-Stack) Approach
LNISKS, an extension of NIKS, is a high throughput method for mutation discovery and mapping in crop genomes, especially large and repetitive genomes for which no reference genome is available. It follows the same principle as NIKS, except that a custom k-mer filter is applied to increase the precision with which causative mutations are identified (Figure 6). Other innovations in LNISKS pertain to the extension of k-mers to seeds, both before and after the seeds are clustered/paired [27]. Suchecki et al. [27] experimentally validated that filtering of k-mers significantly reduces the number of variant calls that have to be considered in WGS data of 16 Australian wheat cultivars. The method was used for identification and mapping of ms5 genic male sterility mutations in bread wheat. Furthermore, LNISKS also identified markers that helped to narrow down the Ms5/ms5 genomic region.
In LNISKS, once sample-specific k-mers are identified, a new customized k-mer filtering step is applied based on the availability of suitable data and the specific biological context of the input datasets [27]. This step eases the computational requirements by significantly reducing the total number of k-mers taken forward to the final assembly, and it also reduces the number of false positive calls by removing irrelevant loci. This helps in the identification of the desired candidate mutations in the genome. Higher k-values in combination with limited sequencing coverage increase the proportion of target sequences that are not captured by k-mers. Therefore, fixing the k-value at or slightly below that value is appropriate for further analysis. The main advantages of LNISKS are: simultaneous identification of homozygous as well as heterozygous mutations; reference- and alignment-free genotyping of second generation sequencing datasets for a pre-defined set of variable loci; applicability to mutation discovery and mapping in complex and large genomes like wheat (17 Gb); and reduced computational cost and time compared to NIKS [27].
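The notion of sample-specific k-mers followed by a filtering step can be illustrated with a short Python sketch. It is not the LNISKS implementation: the function names and the `min_count` threshold are assumptions, reverse-complement (canonical) k-mers are ignored for brevity, and the filter shown is a plain abundance cut-off rather than the context-specific filter described above.

```python
from collections import Counter


def count_kmers(reads, k: int) -> Counter:
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(target_reads, other_reads, k: int, min_count: int = 2) -> set:
    """K-mers seen at least min_count times in the target sample and never in the other.

    The abundance cut-off discards k-mers that probably stem from sequencing
    errors, which shrinks the set passed on to later assembly steps.
    """
    target = count_kmers(target_reads, k)
    other = count_kmers(other_reads, k)
    return {kmer for kmer, n in target.items() if n >= min_count and kmer not in other}


if __name__ == "__main__":
    wild_type = ["ACGTACGT", "ACGTACGT"]
    mutant = ["ACGTTCGT", "ACGTTCGT", "ACGTACGA"]  # last read carries an error
    print(sorted(sample_specific_kmers(mutant, wild_type, 5)))
```

Every k-mer that overlaps the A to T change is retained, whereas the single k-mer created by the erroneous read falls below the abundance threshold and is dropped.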

Bioinformatics Tools/Software/Pipelines Used in NGS Based Forward Genetic Screen for Mutation Identification and Mapping
Bioinformatics is essential for processing and analyzing large and complex genomic datasets and for obtaining functional insights from them. The advent of next-generation sequencing (NGS), together with sophisticated bioinformatics tools, has drastically changed the process of associating a phenotype with its causal gene/QTL [28,174]. NGS based forward genetic screening may still involve high cost and considerable complexity in analyzing high throughput sequencing data. Assembly of sequenced plant genomes is severely hampered by long repetitive segments, large genome sizes, and polyploidy [7,8,175]. Advances in sequencing technologies and bioinformatics tools have allowed rapid progress in the sequencing and assembly of crop genomes [7]. Various bioinformatics tools/pipelines such as SHOREmap [11]; MAQGene [29]; GenomeMapper [30]; Mapping and Assembly with Short Sequences (MASS) [31]; Next-Generation Mapping (NGM) [12]; the SNPTrack tool [32]; CloudMap [33]; CandiSNP [34]; the SIMPLE pipeline [35]; and artMAP [28] have been developed for reducing the complexity of NGS data and delivering the concluding outcomes (Table 4). The importance and applicability of these pipelines/tools are highlighted under the subsequent headings.

MAQGene
MAQGene is a simple, user-friendly web browser interface developed by Bigelow et al. [29], specifically to detect causative mutations and to further classify them based on the associated exon annotations in Caenorhabditis elegans.
MAQGene automatically launches the publicly available MAQ (Mapping and Assembly with Quality) software and collects customized summary and functional outputs (viz., position and specific features of sequence variants) from the WGS data of a mutant genome compared to a wild-type reference genome. It does not require any knowledge of command-line tools and can be run entirely through the web interface. MAQGene has a specific set of parameters for analyzing and interpreting WGS reads, from which the user may choose according to need. MAQGene can handle reads of up to 127 bases and map in both single-read and paired-end modes. Its output file is easily converted to an Excel spreadsheet for further processing. The various measures in the output files allow the user to browse sequence variants easily; compare different genomes; quickly assess the degree of coverage at a given nucleotide position; select desirable variants for validation by Sanger re-sequencing; and test the functional relevance of a nucleotide variant.
Bigelow et al. [29] used MAQGene to discover sequence variants in WGS reads generated in-house on the Illumina Genome Analyzer II from different C. elegans genomes, compared to the wild-type C. elegans reference genome. In addition, it may also be used to compare any input WGS reads (in fastq format) to any wild-type reference genome (FASTA format with General Feature Format annotation). MAQGene has been broadly used by scientists working with C. elegans [87,176]. However, it is no longer updated by its developers; the pipeline relies on an outdated aligner (MAQ) and requires technical expertise to install, which inevitably limits its general adoption.

GenomeMapper
GenomeMapper is a standalone algorithm that simultaneously aligns short reads against multiple genomes by integrating related genomes into a single graph structure [30]. It achieves accurate, high quality alignments by aligning a sequence against a graph of sequences rather than aligning two linear sequences. This algorithm was the first to provide a framework for tackling the problems that arise from aligning against multiple references. It was developed specifically for the Arabidopsis 1001 Genomes Project [177] and continues to provide the basic short read alignment option for the SHORE pipeline [178]. Schneeberger et al. [30] demonstrated the construction of a multiple genome sequence graph based on published polymorphisms of Arabidopsis and compared the results with the conventional approach of aligning the same set of reads against a single reference. GenomeMapper may, however, also be used to analyze sequence reads obtained from bacterial, plant, invertebrate, and mammalian genomes.
Moreover, GenomeMapper can also be used for alignments against a single target genome, and it provides access to regions that are highly divergent from the first reference. This approach reduces the number of false-positive SNP calls caused by misalignments near InDels [178]. Furthermore, it is a useful tool for precise alignment of longer reads to the whole genome sequence of a known or related species and for single step analysis of metagenomic samples. GenomeMapper broadly follows three steps. In the first step, GenomeMapper scans the hash index for k-mers that are identical between the sequence reads and the genome graph. In the second step, the location and sequence of Nearly Identical Maximal Substrings (NIMS) between the sequence reads and the genome graph are determined. Finally, a k-banded alignment is performed by dynamic programming to ensure consistent gap placement.
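The first of these steps, the k-mer hash lookup, can be sketched in Python as below. The sketch is a simplification under explicit assumptions: it indexes a single linear reference instead of a genome graph, the function names are hypothetical, and the NIMS extension and banded alignment steps are not shown. Candidate read placements are ranked by how many seeds agree on the same start position.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Hash index mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(read: str, index: dict, k: int) -> list:
    """All (read_offset, reference_position) pairs with an identical k-mer."""
    hits = []
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            hits.append((offset, ref_pos))
    return hits


def candidate_starts(read: str, index: dict, k: int) -> list:
    """Rank implied read start positions by the number of supporting seeds."""
    votes = Counter(ref_pos - offset for offset, ref_pos in seed_hits(read, index, k))
    return votes.most_common()


if __name__ == "__main__":
    reference = "TTGACCATGGCATTA"
    index = build_kmer_index(reference, 4)
    # The read matches the reference from position 4 except for its last base.
    print(candidate_starts("CCATGGA", index, 4))
```

Seeds that fall on the mismatching base find no hit in the index, so the read is still placed by the remaining exact seeds; a subsequent alignment step would then resolve the mismatch.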

MASS (Mapping and Assembly with Short Sequences)
MASS (Mapping and Assembly with Short Sequences) was created by Cuperus et al. [31] from the Carrington Laboratory, specifically for mapping and assembly of sequence data from Arabidopsis mutants. It is freely available for download and use at http://jcclab.science.oregonstate.edu/MASS. The MASS pipeline is used to identify a small number of candidate genes/causative mutations within a relatively small interval of 1-2 Mb from the DNA sequence of a bulk segregant population. The MASS package allows simultaneous mapping and sequencing at a genome-wide level. For the mapping process, it employs Illumina paired-end reads obtained from DNA pools of BC1F2 populations derived from the mutant and the wild-type parent.
The MASS package was used to identify the mir390a-1 mutation in Arabidopsis thaliana. It includes pipelines to create SNP enrichment plots, perform alignment with MAQ, and filter SNPs. The MASS pipeline contains preloaded scripts to run Cache Assisted Hash Search using XOR logic (CASHX) [179], Short Oligonucleotide Analysis Package (SOAP) [180], and Mapping and Assembly with Quality (MAQ) [29] for its various functions. MASS utilizes CASHX to map the Illumina-sequenced reads, whereas SOAP is utilized to define the syntactic roles of aligned Illumina reads. MASS filters the SNP data set (cns.snp) from the MAQ output and assembles the SNPs into short intervals. Cuperus et al. [31] described the application of MASS for identification of MIR390 mutants in Arabidopsis by direct genome sequencing. They reported that MASS is extremely useful for causal mutation identification in EMS derived mutant genomes, where hundreds or thousands of changes exist in addition to the causal mutation.

Next-Generation Mapping (NGM)
The NGM approach can be accessed on the web at http://www.bar.utoronto.ca/NGM/index.html. The application can be used for mapping of EMS based mutants in Arabidopsis. The experimental setup involves creating a mapping population (F2) from mutant and wild-type parents, NGS sequencing of pooled genomic DNA of the mapping population, aligning the resultant sequence with the reference genome, and identification of SNPs. The resulting SNP data are submitted to the web based interface of NGM [12]. The web-based system works in the following four-step process:
i. SNP data from the F2 mapping population: This involves obtaining sequence data from the sequencer, cleaning and pre-processing the sequence data, and uploading and filtering the SNP data on the website.
ii. Localization of SNPs: The mutation is localized to an Arabidopsis chromosome by identifying the non-recombinant (less heterozygous) area within the genomic region carrying mutations.
iii. Segregation of SNPs based on their variation relative to the reference genome.
iv. Localization and annotation of the causal SNP within the fine mapped region.
The NGM approach relies on two key parameter values for narrowing down the putative locus. These parameters, viz. the kernel size (used for smoothing of chastity threads) and the number of clusters used for k-means clustering, must be selected optimally, as they may influence the power to identify the mutant locus. A bigger kernel size provides greater smoothing of chastity threads at the cost of losing potential SNPs, while a smaller kernel size improves resolution but may lead to potential errors. The k parameter in k-means clustering is essential for separation of homozygous and heterozygous signals. Large k-values increase the homozygous to heterozygous ratio but may not correspond to the actual size of the candidate region [12].
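The trade-off governed by the kernel size can be illustrated with a toy smoother in Python. This is not the NGM algorithm: a simple boxcar average over per-SNP alternate allele frequencies is used as a stand-in for the kernel smoothing of chastity threads, and the function name and input layout are assumptions.

```python
def smooth_frequencies(snps, kernel_size: float):
    """Boxcar-smooth allele frequencies along a chromosome.

    snps: list of (position, alt_allele_frequency) tuples.
    kernel_size: full width of the averaging window, in base pairs.
    """
    half_width = kernel_size / 2
    smoothed = []
    for position, _ in snps:
        window = [freq for pos, freq in snps if abs(pos - position) <= half_width]
        smoothed.append((position, sum(window) / len(window)))
    return smoothed


if __name__ == "__main__":
    snps = [(100, 0.5), (200, 0.5), (300, 1.0), (400, 1.0), (500, 0.5)]
    for size in (0, 200, 800):
        print(size, [round(freq, 2) for _, freq in smooth_frequencies(snps, size)])
```

With a window of 0 the raw signal is returned, a moderate window keeps the peak around positions 300-400 visible, and a window spanning all SNPs flattens the profile completely, which mirrors the loss of resolution described for large kernel sizes.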

The SNPTrack Tool
Leshchiner et al. [32] developed the SNPTrack tool, which calculates the log likelihood based on a Hidden Markov Model of recombination breakpoints to enable rapid and accurate identification and mapping of causal mutations in model organisms. They used the SNPTrack tool for analysis of zebrafish sequencing data. SNPTrack adopts a client-server system that integrates data management, analysis, and interpretation into a single system [32].
SNPTrack was developed as a one-stop-shop bioinformatics solution capable of performing functional analysis of genetic data. The tool offers a full suite of data storage and management, analysis, and interpretation tools for genetic association studies [181,182]. The Oracle server stores and integrates phenotypic and genotypic data as well as annotations of genetic biomarkers from public resources about SNPs, quantitative trait loci (QTLs), genes, proteins, and pathways. SNPTrack was used to analyze the data and determine the causative mutation [32], and it was further applied to study the epigenetic control of intestinal barrier function and inflammation in zebrafish [183] and the role of MYB36 in regulating the transition from proliferation to differentiation in the Arabidopsis root [184].

CloudMap
CloudMap is an open source cloud computing resource for mapping of mutants, originally designed for C. elegans but applicable to Arabidopsis and other plants. CloudMap was originally available on the Galaxy web platform (http://www.usegalaxy.org/cloudmap) and has since been shifted to the MiModD system (http://mimodd.readthedocs.io/en/latest/). CloudMap may be run on the Galaxy cloud, on the Amazon Web Services platform, or as a local instance on a machine [33]. CloudMap uses custom Python scripts for mutant locus identification from NGS reads. CloudMap provides the following features: (i) alignment, variant calling, and annotation; (ii) variant subtraction and filtration; (iii) checking of candidate genes for mutations and creation of useful gene lists; (iv) in-silico complementation testing; and (v) identification of deletions.

CandiSNP
CandiSNP is a web based, user-friendly bioinformatics application to identify causal mutations/SNPs from high throughput sequencing (HTS) data of F2 progenies showing mutant and parental phenotypes. It was developed by Etherington et al. [34] to enable fast assessment of causal SNPs and their positions. CandiSNP creates density plots from the HTS data provided by the user; SNP positions must therefore be identified before running CandiSNP.
CandiSNP performs two important steps: it uses snpEff [185] to categorize the SNPs based on chromosomal position, and it plots the SNPs chromosome by chromosome on a graph based on a user-selected alternate allele frequency (AF) threshold, coloring the SNPs according to SNP category. The density and distribution of SNPs are visualized chromosome by chromosome in the output line graph to decipher the causal SNPs [34].
Candidate causative mutations/SNPs in annotated coding regions are highlighted on the plots/graph and listed in a table. Furthermore, CandiSNP gives annotations describing the genomic feature in which each SNP is located. This function is very useful in creating associations between SNPs/causal genomic regions and their molecular and biological functions, which makes selection and refinement of candidate genes easier and faster. CandiSNP is useful for identification of recessive mutations in homozygous F2 (BC1F2) segregants generated from a back-cross, as well as of dominant mutations in homozygous F2 (BC1F2) segregants after confirming their homozygosity in the F3 prior to bulk segregant analysis. By plotting homozygous and close-to-homozygous SNPs identified from HTS along the chromosome arms, the program visualizes areas of linkage and readily narrows down candidate mutation positions [34].
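The thresholding and density steps described above can be approximated with a few lines of Python. This is an illustrative sketch only, not CandiSNP code: the record layout (`chrom`, `pos`, `af`), the function names, and the fixed-size binning are assumptions, and annotation with snpEff is not reproduced.

```python
from collections import Counter


def filter_by_allele_frequency(snps, af_threshold: float) -> list:
    """Keep SNPs whose alternate allele frequency reaches the chosen threshold."""
    return [snp for snp in snps if snp["af"] >= af_threshold]


def density_by_bin(snps, bin_size: int) -> Counter:
    """Count SNPs per (chromosome, bin) as a crude density profile."""
    counts = Counter()
    for snp in snps:
        counts[(snp["chrom"], snp["pos"] // bin_size)] += 1
    return counts


if __name__ == "__main__":
    snps = [
        {"chrom": "1", "pos": 1500, "af": 0.95},
        {"chrom": "1", "pos": 1800, "af": 0.99},
        {"chrom": "1", "pos": 9000, "af": 0.40},
        {"chrom": "2", "pos": 100, "af": 0.90},
    ]
    near_homozygous = filter_by_allele_frequency(snps, 0.9)
    print(density_by_bin(near_homozygous, 1000))
```

Bins in which many near-homozygous SNPs accumulate point to the region linked to the causal mutation, which is the pattern the density plots are meant to expose.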
The web application CandiSNP is freely available online at http://candisnp.tsl.ac.uk. Users may also run the CandiSNP process on the command line as part of bioinformatics pipelines; a Perl module is available as part of the source code. At present, CandiSNP is available for the genome annotations of several plant species, viz. Arabidopsis thaliana TAIR9 and TAIR10, Oryza sativa v7, Solanum lycopersicum v2.40, Glycine max 1.09v8, Vitis vinifera v1, and Zea mays B73 v5b. This web application has been utilized by Wambugu et al. [186] and Xu et al. [187] to dissect the genetic control of amylose content and to study the Lincomycin (LIN)-mediated inhibition of protein synthesis in chloroplasts in rice, respectively.

A SIMPLE Pipeline
The SIMPLE pipeline (Simple Mapping Pipeline) is an NGS based, easy to use bioinformatics tool for identifying causal mutations in forward genetic screens. It was developed by Wachsman et al. [35] to identify and map causal mutations in a simple way. The pipeline takes NGS fastq reads generated from WGS of DNA pools of mutant-type and wild-type progenies and produces tables and plots containing information about all possible candidate genes and causal SNPs. The pipeline is based on a short BASH script that generates several variant call format files and three plots. The program may be operated on Mac OSX version 10.11.6 and Linux release 6.7 (GNOME 2.28.2) with Java 1.7 installed on the system. The SIMPLE pipeline is hosted on GitHub and may be downloaded from https://github.com/wacguy/Simple and installed without difficulty. It may be operated without prior understanding of NGS programming and bioinformatics tools. It requires only a few simple preparatory steps, viz. downloading the fastq files and specifying the species, to start the program.
Wachsman et al. [35] suggested that the pipeline may work with any paired-end or single-end fastq combination obtained from M2/F2 or M3/F3 generations. However, working with an F2/M2 generation rather than an F3/M3 generation gives more robust outcomes. It does not require NGS reads from any map cross or back cross [13,15]. An important consideration is the sequencing depth; the pipeline prefers a sequencing depth of up to 30× to give reliable results. Additionally, the pipeline may work with a small number of sampled individuals (a few dozen), which further reduces analytical complexity and analysis time. The SIMPLE pipeline can be useful for analysis of any diploid species; however, it has only been validated for Arabidopsis thaliana and Oryza sativa (rice) [35].

artMAP
artMAP, a user-friendly tool, was developed to discover and map EMS induced mutations in the Arabidopsis genome [28]. artMAP may be operated on Windows, Mac, and Linux operating system platforms, and its pipeline consists of several open source software tools integrated into a Docker container (https://www.docker.com/) and provided with a graphical user interface (GUI). The software is not restricted to a particular data generation platform and accepts data generated by all sequencing platforms. Input sequencing files generated from single or paired-end sequencing can be used by artMAP for the mapping of EMS induced mutations in plants. With artMAP, mapping and identification of causal mutations is possible with only a few mouse clicks, and the analysis results are presented as interactive graphs which display the annotation details of each mutation [28].
Owing to its GUI, artMAP can be run on a standard desktop/laptop, thereby limiting the bioinformatics expertise required. The artMAP pipeline consists of well-established tools including TrimGalore, BWA, BEDTools, SAMtools, and SnpEff, which were integrated in a Docker container. The pipeline consists of six steps performed by the integrated software, namely (i) pre-processing of the sequencing read files by TrimGalore; (ii) alignment of reads to the Arabidopsis genome by BWA; (iii) post-processing of aligned reads by SAMtools; (iv) identification of single-nucleotide polymorphisms (SNPs) specific to mutant samples through the combined use of SAMtools and the BEDTools suite; (v) annotation of the SNPs by SnpEff and visualization of the SNPs; and (vi) output of a list of SNPs along with their allele frequency, depth, and annotation in a tab separated file. artMAP also provides an additional option to control the removal of PCR duplicates from the control and mutant BAM files. This bioinformatics tool helps to map EMS-induced mutations in Arabidopsis and assess their association with the desired phenotype [28].

Limitations and Way Ahead
Next-generation sequencing platforms have enhanced our knowledge in sequencing and mapping of crop genomes, identifying causal mutations, gene regulation, and more [181]. NGS based techniques are gaining great achievements in functional genomics, agri-genomics, and plant breeding research and have prospects for their utilization in other potential fields of research in the future [188].
It is well known that most of the plant genomes are complex with relatively higher proportion of the repetitive sequences and transposons. Due to this, the short sequence reads (35-700 bp) generated by second generation sequencing platforms viz. Illumina, SOLiD, Roche (454) etc. are sometime not efficient, especially if the genome size is big [100,189]. Long read sequencing (~several Kb) produced by third generation sequencing platforms like Oxford Nanopore and PacBio appears to be more promising. This will enable identification of epigenetic marks such as DNA methylation in highly variable genomic regions and its expression. It is also helpful in constructing high quality reference genome and accelerate gene discovery in plants [109,190]. With the help of NGS techniques, genome wide SNP discovery, allele mining, developing molecular markers and genotyping can be performed in other species or on non-model organisms, facilitating its speedy use in research activities [93]. However, the detection of rare point mutations in plant genome through NGS remains challenging. To improve the accuracy of conventional NGS method, Stalhberg et al. [191,192] developed an improved version of sequencing called as Simple, Multiplxed, PCR-based bar-coding of DNA for selective mutation detection using sequencing (SiMSen-Seq). It can detect variants at or below 0.1% frequency with low DNA input. Similarly, to overcome the limitations of mutation detection in reduced representation sequencing, Monson-Miller et al. [193] demonstrated the use of Restriction Enzyme Sequence Comparative ANalysis (RESCAN) to detect single nucleotide polymorphism (SNP) in rice and Arabidopsis. Though the genotyping/sequencing has made significant progress in the last decade, phenotyping did not register a similar pace. Automated high throughput phenotyping platforms for greenhouse as well as field can definitely accelerate the gene discovery even further (Figure 9) [190]. 
A future perspective of NGS in plant breeding is obtaining new allelic variants in the genomes of many crops. NGS technologies have made genome sequences available for many important crops, which will facilitate genome editing approaches. These editing approaches enable site-directed mutagenesis to improve economically useful traits by modifying a targeted locus. Sequence-specific nucleases, such as zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and the CRISPR/Cas9 system, can be exploited for this purpose (Figure 9) [194]. Using diverse irradiation methods to generate mutants, and characterizing them with refined NGS pipelines, is likely to become more popular in future studies. Combining irradiation sources such as heavy ion beams, cosmic rays, etc., with DNA sequencing technologies will maximize mutagenesis efficiency and optimize the use of the developed genetic materials for plant breeding and functional genomics investigations [190][191][192].

Conclusions
Mutation breeding has become one of the major pillars of modern plant breeding, as it plays an important role in global nutritional and food security. With the advent of molecular marker techniques, mapping and cloning techniques, next-generation sequencing approaches, and functional genomics, induced mutagenesis has become more useful and feasible for crop improvement as well as for discovering novel candidate genes and their biological functions. NGS techniques have made mapping and sequencing procedures more feasible and have become an essential tool for crop geneticists to identify and characterize genomic variations associated with economically important traits. WGR and transcriptome profiling, which provide comprehensive information on genetic variability and its regulatory mechanisms, are the most popular applications of NGS. The NGS-based approaches presented throughout this review are applicable to classic mutations whose phenotypes fall into distinct categories compared to the wild type (qualitative traits). Moreover, many of them are applicable even in the absence of a reference genome, known single-nucleotide polymorphisms, or genetic tools. With these advancements, rapid identification and mapping of desired mutations are now possible in forward genetic screens. Additionally, these approaches provide the scientific community with a wealth of background mutations present in the germplasm collections that carry the mutations of interest. However, successful interpretation of NGS data depends on bioinformatics and statistical tools that deliver accurate assembly, alignment, and variant detection. Therefore, a modern plant breeding team should include scientists from multidisciplinary backgrounds, viz., plant biology, genetics, physiology, molecular biology, bioinformatics, statistics, and mathematics, to ensure the precision and success of research work in this direction.
Furthermore, the advent of third-generation sequencing technology, with its longer reads, will further improve the quality of variant and mutation identification. We hope that the information provided in this review will be useful to all scientific communities working on these aspects.