# Alt-Splice Gene Predictor Using Multitrack-Clique Analysis: Verification of Statistical Support for Modelling in Genomes of Multicellular Eukaryotes


## Abstract


## 1. Introduction

## 2. Background

#### 2.1. The Standard 1st Order HMM

- A hidden state alphabet, Λ, with “Prior” Probabilities $P(\lambda)$ for all $\lambda \in \Lambda$, and “Transition” Probabilities $P(\lambda_2|\lambda_1)$ for all $\lambda_1, \lambda_2 \in \Lambda$, where the standard transition probability is denoted $a_{kl} = P(\lambda_n = l \mid \lambda_{n-1} = k)$ for a 1st order Markov model on states with homogeneous stationary statistics (i.e., no dependence on position ‘n’).
- An observable alphabet, B, with “Emission” Probabilities $P(b|\lambda)$ for all $\lambda \in \Lambda$, $b \in B$, where the standard emission probability is $e_k(b) = P(b_n = b \mid \lambda_n = k)$, i.e., a 0th order Markov model on bases with homogeneous stationary statistics.

- Evaluation—Determine the probability of occurrence of the observed sequence.
- Learning (Baum–Welch)—Determine the most likely emission and transition probabilities for a given set of observational data.
- Decoding (Viterbi)—Determine the most probable sequence of states emitting the observed sequence.

The joint probability of a sequence of observables $B = b_0, b_1, \ldots, b_{n-1}$ being emitted by the sequence of hidden states $\Lambda = \lambda_0, \lambda_1, \ldots, \lambda_{n-1}$ is obtained using $P(B, \Lambda) = P(B|\Lambda)\,P(\Lambda)$ in the standard factorization, where the two terms in the factorization are described as the observation model and the state model, respectively. In the 1st order HMM, the state model has the 1st order Markov property and the observation model is such that the current observation, $b_n$, depends only on the current state, $\lambda_n$:

$$P(B, \Lambda) = P(b_0|\lambda_0)\,P(b_1|\lambda_1) \cdots P(b_{n-1}|\lambda_{n-1}) \times P(\lambda_0)\,P(\lambda_1|\lambda_0)\,P(\lambda_2|\lambda_0, \lambda_1) \cdots P(\lambda_{n-1}|\lambda_0 \ldots \lambda_{n-2})$$

With the 1st order Markov property applied to the state model, this reduces to:

$$P(B, \Lambda) = P(b_0|\lambda_0)\,P(b_1|\lambda_1) \cdots P(b_{n-1}|\lambda_{n-1}) \times P(\lambda_0)\,P(\lambda_1|\lambda_0)\,P(\lambda_2|\lambda_1) \cdots P(\lambda_{n-1}|\lambda_{n-2})$$
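The 1st order factorization above can be evaluated directly for a given labeling. The following is a minimal sketch, assuming a hypothetical two-state toy model (a single exon-like state `e` and a junk state `j`) whose probabilities are invented purely for illustration; it works in log space to avoid underflow on long sequences.

```python
import math

def joint_log_prob(obs, states, prior, trans, emit):
    """log P(B, Λ) under the 1st order HMM factorization:
    the prior on the first state, one emission per position,
    and one transition per adjacent state pair."""
    logp = math.log(prior[states[0]])
    for n, (b, lam) in enumerate(zip(obs, states)):
        logp += math.log(emit[lam][b])                  # P(b_n | λ_n)
        if n > 0:
            logp += math.log(trans[states[n - 1]][lam])  # P(λ_n | λ_{n-1})
    return logp

# Hypothetical toy parameters (not taken from the paper).
prior = {"e": 0.5, "j": 0.5}
trans = {"e": {"e": 0.9, "j": 0.1}, "j": {"e": 0.2, "j": 0.8}}
emit = {"e": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "j": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

# P(B="AG", Λ="je") = P(j) P(A|j) P(e|j) P(G|e) = 0.5 * 0.3 * 0.2 * 0.4
p = math.exp(joint_log_prob("AG", "je", prior, trans, emit))
```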

Define $v_k(n)$ as the probability of the most probable path ending in state ‘k’ with observation ‘$b_n$’. The recursive definition of $v_k(n)$ is then $v_l(n+1) = e_l(b_{n+1}) \max_k [v_k(n)\, a_{kl}]$, from which the optimal path information is recovered according to the (recursive) trace-back:

$$\Lambda^* = \operatorname{argmax}_{\Lambda} P(B, \Lambda) = (\lambda^*_0, \ldots, \lambda^*_{n-1})$$

where $\lambda^*_n \mid (\lambda^*_{n+1} = l) = \operatorname{argmax}_k [v_k(n)\, a_{kl}]$, and where $\lambda^*_{L-1} = \operatorname{argmax}_k [v_k(L-1)]$, for a sequence of length L.
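The recursion and trace-back can be sketched in a few lines. This is an illustrative log-space implementation of the standard Viterbi algorithm, assuming the same kind of hypothetical two-state toy model (`e`, `j`) with invented probabilities; it is not the authors' implementation.

```python
import math

def viterbi(obs, states, prior, trans, emit):
    """Most probable state path for a 1st order HMM (log space)."""
    # v[n][k]: log-probability of the best path ending in state k at position n
    v = [{k: math.log(prior[k]) + math.log(emit[k][obs[0]]) for k in states}]
    back = []  # back[n][l]: best predecessor k of state l at position n + 1
    for b in obs[1:]:
        scores, ptr = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[-1][k] + math.log(trans[k][l]))
            ptr[l] = best_k
            # v_l(n+1) = e_l(b_{n+1}) * max_k [ v_k(n) * a_kl ]
            scores[l] = (math.log(emit[l][b]) + v[-1][best_k]
                         + math.log(trans[best_k][l]))
        v.append(scores)
        back.append(ptr)
    # Trace-back: start from the best final state and follow the pointers.
    path = [max(states, key=lambda k: v[-1][k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy parameters (not taken from the paper).
states = ["e", "j"]
prior = {"e": 0.5, "j": 0.5}
trans = {"e": {"e": 0.9, "j": 0.1}, "j": {"e": 0.2, "j": 0.8}}
emit = {"e": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "j": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

gc_path = viterbi("CCCC", states, prior, trans, emit)  # → ['e', 'e', 'e', 'e']
at_path = viterbi("AAAA", states, prior, trans, emit)  # → ['j', 'j', 'j', 'j']
```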

#### 2.2. The Meta-State HMM

- 1) Non-negative integers L and R, denoting the left and right maximum extents of a substring, $w_n$ (with suitable truncation at the data boundaries, $b_0$ and $b_{N-1}$), are associated with the primitive observation, $b_n$, in the following way:
  - $w_n = b_{n-L+1}, \ldots, b_n, \ldots, b_{n+R}$
  - $\tilde{\mathrm{w}}_n = b_{n-L+1}, \ldots, b_n, \ldots, b_{n+R-1}$

- 2) Non-negative integers l and r are used to denote the left and right extents of the extended (footprint) states, f. Here, we show the relationships among the primitive states λ, dimer states δ, and footprint states f:
  - $\delta_n = \lambda_n \lambda_{n+1}$ (dimer state, length in λ’s = 2)
  - $f_n = \delta_{n-l+1}, \ldots, \delta_{n+r} \cong \lambda_{n-l+1}, \ldots, \lambda_n, \ldots, \lambda_{n+r+1}$ (footprint state, length in δ’s = l + r)
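The index arithmetic in these definitions is easy to get wrong by one, so a small sketch may help. It assumes observations and labels are plain Python strings with one character per base or primitive state (an illustrative simplification), and the helper names `window`, `dimers`, and `footprint` are hypothetical.

```python
def window(obs, n, L, R):
    """w_n = b_{n-L+1}, ..., b_n, ..., b_{n+R}, truncated at the data boundaries."""
    lo = max(0, n - L + 1)
    hi = min(len(obs), n + R + 1)
    return obs[lo:hi]

def dimers(labels):
    """delta_n = lambda_n lambda_{n+1} for each adjacent pair of primitive states."""
    return [labels[n:n + 2] for n in range(len(labels) - 1)]

def footprint(labels, n, l, r):
    """f_n = delta_{n-l+1}, ..., delta_{n+r}: l + r dimers spanning
    lambda_{n-l+1} .. lambda_{n+r+1}. No boundary truncation is applied here."""
    d = dimers(labels)
    return d[n - l + 1:n + r + 1]

obs = "ACGTACGT"
labels = "jjeeeii"  # hypothetical single-character state labels

w_mid = window(obs, 3, L=2, R=1)      # b_2, b_3, b_4
w_edge = window(obs, 0, L=2, R=1)     # truncated at the left boundary
f = footprint(labels, 1, l=1, r=1)    # delta_1, delta_2
```

With L = 2 and R = 1 the untruncated window holds L + R = 3 observations, and with l = r = 1 the footprint holds l + r = 2 dimers, matching the lengths stated in the definitions.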

The primitive observation $b_n$ is aligned with the n-th hidden state $\lambda_n$. Given the above, the clique-factorized HMM takes the form given in [11].

#### 2.3. HMM States and Transitions for Gene-Structure Identification

- Exon states = $\{e_0, e_1, e_2\}$, where the frame label is ‘real’, i.e., there are three emission tables;
- Intron states = $\{i_0, i_1, i_2\}$, where the frame label is a convenient implementation artifact (so one emission table);
- Junk state = $\{j\}$, the non-coding (non-exonic) nucleotides in the intergenic regions, while the non-coding nucleotides in the intragenic regions are the aforementioned introns.

- $jj \ldots j\,e_0 e_1 e_2 \ldots e_0\,i_0 i_0 \ldots i_0\,e_1 \ldots e_0 e_1 e_2\,jj \ldots j$ (intron follows exon base with frame 0)
- $jj \ldots j\,e_0 e_1 e_2 \ldots e_1\,i_1 i_1 \ldots i_1\,e_2 \ldots e_0 e_1 e_2\,jj \ldots j$ (intron follows exon base with frame 1)
- $jj \ldots j\,e_0 e_1 e_2 \ldots e_2\,i_2 i_2 \ldots i_2\,e_0 \ldots e_0 e_1 e_2\,jj \ldots j$ (intron follows exon base with frame 2)
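The three label patterns above follow from one bookkeeping rule: exon bases cycle through frames 0, 1, 2, and an intron carries the frame of the exon base it follows. A minimal sketch of that rule is given below; the function names and the `(kind, length)` segment format are hypothetical, and the three stop-codon variants of the exon-to-junk transition are collapsed into a single `e2 → j` pair for brevity.

```python
# Allowed adjacent state pairs on the forward strand (stop-codon variants collapsed).
ALLOWED = {
    ("j", "j"), ("j", "e0"),
    ("e0", "e1"), ("e1", "e2"), ("e2", "e0"),
    ("e0", "i0"), ("e1", "i1"), ("e2", "i2"),
    ("i0", "i0"), ("i1", "i1"), ("i2", "i2"),
    ("i0", "e1"), ("i1", "e2"), ("i2", "e0"),
    ("e2", "j"),
}

def label_gene(segments):
    """segments: list of (kind, length) with kind in {'j', 'e', 'i'}.
    Returns one frame-tracked label per base."""
    labels, frame = [], 0  # frame of the next exon base
    for kind, length in segments:
        for _ in range(length):
            if kind == "e":
                labels.append(f"e{frame}")
                frame = (frame + 1) % 3
            elif kind == "i":
                # The intron inherits the frame of the last exon base.
                labels.append(f"i{(frame - 1) % 3}")
            else:
                labels.append("j")
                frame = 0  # the next gene starts in frame 0
    return labels

def is_valid(labels):
    """True if every adjacent label pair is an allowed transition."""
    return all(pair in ALLOWED for pair in zip(labels, labels[1:]))

good = label_gene([("j", 2), ("e", 4), ("i", 3), ("e", 2), ("j", 2)])
bad = label_gene([("j", 1), ("e", 4), ("j", 1)])  # coding length not a multiple of 3
```

The first toy gene reproduces the frame-0 pattern (`… e0 i0 i0 i0 e1 …`); the second ends its coding region out of frame and is rejected by the pair check.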

The forward-read dimer states are then $\{jj,\ je_0,\ e_0e_1,\ e_1e_2,\ e_2e_0,\ e_0i_0,\ e_1i_1,\ e_2i_2,\ i_0i_0,\ i_1i_1,\ i_2i_2,\ i_0e_1,\ i_1e_2,\ i_2e_0,\ e_2j\_\mathrm{TAA},\ e_2j\_\mathrm{TAG},\ e_2j\_\mathrm{TGA}\}$. See Supplementary Section 2 for details on the 33-state model for the forward and reverse encoding together, to be described next.

A simple HMM with single-base state representation, where emissions enter only through the $P(b_{n-1}|\lambda_{n-1})$ terms, performs poorly in modeling the anomalous statistics in the transition regions between exon, intron, or junk regions unless additional side-rules are introduced that break with a purely HMM implementation. If a transition ‘$je_0$’ has occurred, for example, and we are looking at the base emission for the ‘$e_0$’ state, we cannot account for the prior state with the simple $P(b_{n-1}|\lambda_{n-1})$ conditional probabilities of the standard bare-bones HMM; we minimally need $P(b_{n-1}|\lambda_{n-2}, \lambda_{n-1})$, i.e., state modeling at the dimer level or higher.
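To make the dimer-level conditioning concrete, the sketch below estimates emission tables conditioned on the state pair rather than on the single state, by simple counting over a labeled sequence. The function name and the toy sequence are hypothetical, and no smoothing or pseudocounts are applied.

```python
from collections import Counter, defaultdict

def dimer_emissions(bases, labels):
    """Maximum-likelihood estimate of P(b_n | λ_{n-1}, λ_n) from one labeled sequence."""
    counts = defaultdict(Counter)
    for n in range(1, len(bases)):
        counts[(labels[n - 1], labels[n])][bases[n]] += 1
    return {
        dimer: {b: c / sum(cnt.values()) for b, c in cnt.items()}
        for dimer, cnt in counts.items()
    }

# Toy labeled data: two junk bases followed by two codons.
bases = "TTATGGCA"
labels = ["j", "j", "e0", "e1", "e2", "e0", "e1", "e2"]
probs = dimer_emissions(bases, labels)
```

Here the `e0` base that follows junk (the first base of the start codon) gets its own table under the key `("j", "e0")`, separate from the `e0` bases reached from `e2` inside the coding region, which a single-state table `P(b | e0)` would pool together.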

## 3. Experimental Section—Methods

#### 3.1. Genome Versions Used in Data Analysis (All from www.ensembl.org)

#### 3.2. Pre-Partitioning Training Data for Massively Parallel/Distributed Solutions

#### 3.3. Two-Track Annotation and Counting—Order of Annotation Governs Track Placement

## 4. Results

- (3′|i) V-transitions: i0ii, i1ii, i2ii, iii0, iii1, iii2, AIII, BIII, CIII, IIAI, IIBI, IICI.
- (5′|i) V-transitions: 0iii, 1iii, 2iii, ii0i, ii1i, ii2i, IAII, IBII, ICII, IIIA, IIIB, IIIC.
- (3′|e) V-transitions: 01i1, 12i2, 20i0, i020, i101, i202, AIAC, BIBA, CICB, BABI, CBCI, ACAI.
- (5′|e) V-transitions: 0i01, 1i12, 2i20, 010i, 121i, 202i, IABA, IBCB, ICAC, BAIA, CBIB, ACIC.
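Each four-character V-transition above is the track-1 dimer followed by the track-2 dimer at the same position. The sketch below shows one way to extract and classify them from two aligned label tracks; it assumes single-character labels (j, i, 0, 1, 2 forward; A, B, C, I reverse), and the function names and toy tracks are hypothetical.

```python
from collections import Counter

# The four splice-overlap classes, copied from the lists above.
CLASSES = {
    "3'|i": "i0ii i1ii i2ii iii0 iii1 iii2 AIII BIII CIII IIAI IIBI IICI".split(),
    "5'|i": "0iii 1iii 2iii ii0i ii1i ii2i IAII IBII ICII IIIA IIIB IIIC".split(),
    "3'|e": "01i1 12i2 20i0 i020 i101 i202 AIAC BIBA CICB BABI CBCI ACAI".split(),
    "5'|e": "0i01 1i12 2i20 010i 121i 202i IABA IBCB ICAC BAIA CBIB ACIC".split(),
}

def v_transitions(track1, track2):
    """Pair the dimer at each position on track 1 with the dimer on track 2."""
    if len(track1) != len(track2):
        raise ValueError("tracks must be aligned and of equal length")
    return [track1[n:n + 2] + track2[n:n + 2] for n in range(len(track1) - 1)]

def classify(vt):
    """Return the splice-overlap class of a V-transition, or None."""
    for name, members in CLASSES.items():
        if vt in members:
            return name
    return None

# Toy example: an alternatively spliced exon on track 1 inside an intron on track 2.
track1 = "iii120iii"
track2 = "iiiiiiiii"
counts = Counter(v_transitions(track1, track2))
```

In this toy case the exon boundaries produce one `i1ii` (a 3′|i V-transition) and one `0iii` (a 5′|i V-transition), with the interior exon dimers `12ii` and `20ii` falling outside the four splice classes.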

## 5. Discussion

#### 5.1. V–Transition Rules

- (1) Approximate frame agreement rule: j0 on track 1 cannot overlap 2j, 0i, 1i, i1, i2 on track 2, and only rarely overlaps with 01 or 12 on track 2 (a consensus agreement rule). {j0, jC, 2j, Aj} behave similarly, so this excludes 4 × 5 = 20 V-transitions. Similarly, 01 on track 1 does not overlap with 12, 20, 1i, 2i, i0, i2 on track 2 (and only rarely with j0 and 2j, as noted). {01, 12, 20, BA, CB, AC} behave similarly, so this excludes 6 × 6 = 36 V-transitions. Similarly, 0i on track 1 cannot overlap with j0, 2j, 12, 20, 1i, 2i, i0, i2 on track 2. {0i, 1i, 2i, i0, i1, i2, AI, BI, CI, IA, IB, IC} behave similarly, so this excludes 12 × 8 = 96 V-transitions.
- (2) No ‘eiie’ or ‘ieei’ rule (a consensus agreement rule): 0i on track 1 cannot overlap with i1. {0i, 1i, 2i, i0, i1, i2, AI, BI, CI, IA, IB, IC} behave similarly, so this excludes 12 × 1 = 12 V-transitions.
- (3) No exon boundary overlap with reverse coding region rule, where j0 cannot overlap BA, CB, AC, for example. {j0, jC, 2j, Aj} behave similarly, so this excludes 4 × 3 = 12. Similarly, 0i cannot overlap BA, CB, AC, and there are 12 splice types, so this excludes 12 × 3 = 36. In addition, 01 cannot overlap jC, Aj, AI, BI, CI, IA, IB, IC, giving 6 × 8 = 48 more exclusions. This rule suggests that a coevolutionary linkage between cis or trans regulatory regions and reverse coding regions is highly unfavorable.
- (4) Start/End consensus disagreement rule: Figure 15 shows how consensus agreement is possible for ‘eij0’ and ‘ie2j’, but not for flipped consensus EIj0 or IEj0 or IE2j or EI2j (so twelve cases). Treating Aj and jC similarly to j0 and 2j gives another 12, for 24 V-transition exclusions in total.
- (5) Avoid forward/reverse splice signal overlap. ‘0i’ cannot overlap AI, BI, CI, and would generally not favor overlap with IA, IB, IC. There are 12 × 6 = 72 similar exclusions.
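The exclusion counts quoted in rules (1) to (5) can be tallied mechanically. The sketch below only re-adds the products stated in the rules; it assumes nothing about whether exclusions from different rules coincide, so the grand total is an upper bound on the number of distinct excluded V-transitions and is not a figure reported in the text.

```python
# rule -> list of (number of similar track-1 dimers, excluded track-2 dimers for each)
EXCLUSIONS = {
    "1: approximate frame agreement": [(4, 5), (6, 6), (12, 8)],
    "2: no eiie / ieei": [(12, 1)],
    "3: no exon boundary over reverse coding": [(4, 3), (12, 3), (6, 8)],
    "4: start/end consensus disagreement": [(12, 1), (12, 1)],
    "5: forward/reverse splice signal overlap": [(12, 6)],
}

per_rule = {
    rule: sum(n_similar * n_excluded for n_similar, n_excluded in clauses)
    for rule, clauses in EXCLUSIONS.items()
}
naive_total = sum(per_rule.values())  # does not deduplicate across rules

for rule, n in per_rule.items():
    print(f"rule {rule}: {n}")
print("naive total:", naive_total)
```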

- (i) Zero counts found for: j0jC, j02j, j0Aj, jCj0, 2jj0, Ajj0, indicating a non-overlap with other start/end rule.
- (ii) Zero counts found for j0 overlap with reverse transitions except for II.
- (iii) Zero counts found for j0 overlaps with forward splice unless 3′ (dominated by base-frame 0, to be in agreement with the 0 frame in ‘j0’).

- (I) Zero counts found for: 2jj0, 2j0i, 2j1i, 2ji0, 2ji1, 2ji2, indicating a non-overlap with other start/end or splice rule, except for 2j2i. This end overlap with a 5′ splice appears in the more heavily spliced genomes, and only in-frame, showing slower growth in encumbered 2j than in encumbered j0. As with j0, there are indications of spliceosomally driven alt-splice gene extension via exon recruitment from the trans-side of the gene.
- (II) Zero counts found for 2j overlap with reverse transitions except for II.

#### 5.2. Impact of Annotation Errors

## 6. Conclusions

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Stanke, M.; Morgenstern, B. AUGUSTUS: A web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. **2005**, 33, W465–W467.
2. Rajapakse, J.C.; Ho, L.S. Markov Encoding for Detecting Signals in Genomic Sequences. IEEE/ACM Trans. Comput. Biol. Bioinform. **2005**, 2, 131–142.
3. Majoros, W.H.; Pertea, M.; Salzberg, S.L. TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics **2004**, 20, 2878–2879.
4. Taher, L.; Rinner, O.; Garg, S.; Sczyrba, A.; Brudno, M.; Batzoglou, S.; Morgenstern, B. AGenDA: Homology-based gene prediction. Bioinformatics **2003**, 19, 1575–1577.
5. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. **1990**, 215, 403–410.
6. Sonnenburg, S.; Zien, A.; Ratsch, G. ARTS: Accurate recognition of transcription starts in human. Bioinformatics **2006**, 22, e472–e480.
7. Do, J.H.; Choi, D.-K. Computational Approaches to Gene Prediction. J. Microbiol. **2006**, 44, 137–144.
8. Korf, I. Gene finding in novel genomes. BMC Bioinform. **2004**, 5, 59.
9. Mathe, C.; Sagot, M.-F.; Schiex, T.; Rouze, P. Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res. **2002**, 30, 4103–4117.
10. Allen, J.E.; Majoros, W.H.; Pertea, M.; Salzberg, S.L. JIGSAW, GeneZilla, and GlimmerHMM: Puzzling out the features of human genes in the ENCODE regions. Genome Biol. **2006**, 7 (Suppl. 1), S9.
11. Winters-Hilt, S.; Baribault, C. A Meta-state HMM with application to gene structure identification in eukaryotes. EURASIP J. Adv. Signal Process. **2010**, 2010, 581373.
12. Winters-Hilt, S.; Jiang, Z. A hidden Markov model with binned duration algorithm. IEEE Trans. Signal Proc. **2010**, 58, 948–952.
13. Winters-Hilt, S.; Jiang, Z.; Baribault, C. Hidden Markov model with duration side-information for novel HMMD derivation, with application to eukaryotic gene finding. EURASIP J. Adv. Signal Process. **2010**, 2010, 761360.
14. Noguchi, H.; Park, J.; Takagi, T. MetaGene: Prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. **2006**, 34, 5623–5630.
15. Kulp, D.; Haussler, D.; Reese, M.G.; Eeckman, F.H. A generalized hidden Markov model for recognition of human genes in DNA. Proc. Int. Conf. Intell. Syst. Mol. Biol. **1996**, 4, 134–142.
16. Van Baren, M.J.; Koebbe, B.C.; Brent, M.R. Using N-SCAN or TWINSCAN to predict gene structures in genomic DNA sequences. Curr. Protoc. Bioinform. **2007**.
17. Rogic, S.; Mackworth, A.K.; Francis Ouellette, B.F. Evaluation of Gene-Finding Programs on Mammalian Sequences. Genome Res. **2001**, 11, 817–832.
18. Dunham, I.; Shimizu, N.; Roe, B.A.; Chissoe, S. The DNA sequence of human chromosome 22. Nature **1999**, 402, 489–495.
19. Burset, M.; Guigo, R. Evaluation of Gene Structure Prediction Programs. Genomics **1996**, 34, 353–367.
20. Winters-Hilt, S.; Roux, B. Hybrid MM/SVM structural sensors for stochastic sequential data. In Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics, Oklahoma City, OK, USA, 23–24 February 2008. BMC Bioinform. **2008**, 9 (Suppl. 9), S12.
21. Liu, H.; Han, H.; Li, J.; Wong, L. DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences. Available online: http://sdmc.i2r.a-star.edu.sg/DNAFSMiner/ (accessed on 26 June 2016).
22. Sonnenburg, S.; Schweikert, G.; Philips, P.; Behr, J.; Rätsch, G. Accurate splice site prediction using support vector machines. BMC Bioinform. **2007**, 8 (Suppl. 10), S7.
23. Degroeve, S.; Saeys, Y.; de Baets, B.; Rouzé, P.; van de Peer, Y. SpliceMachine: Predicting splice sites from high-dimensional local context representations. Bioinformatics **2005**, 21, 1332–1338.
24. Muro, E.M.; Herrington, R.; Janmohamed, S.; Frelin, C.; Andrade-Navarro, M.A.; Iscove, N.N. Identification of gene 3′ ends by automated EST cluster analysis. PNAS **2008**, 105, 20286–20290.
25. Bellora, N.; Farre, D.; Alba, M.M. PEAKS: Identification of regulatory motifs by their position in DNA sequences. Bioinformatics **2007**, 23, 243–244.
26. He, X.; Ling, X.; Sinha, S. Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution. PLoS Comput. Biol. **2009**, 5, 1–14.
27. Winters-Hilt, S.; Baribault, C. A novel, fast, HMM-with-Duration implementation for application with a new, pattern recognition informed, nanopore detector. BMC Bioinform. **2007**, 8 (Suppl. 7), S19.
28. Winters-Hilt, S. Hidden Markov Model Variants and their Application. BMC Bioinform. **2006**, 7 (Suppl. 2), S14.
29. Lu, D. Motif Finding. Master’s Thesis, University of New Orleans, New Orleans, LA, USA, 2009.
30. Shinozaki, D.; Akutsu, T.; Maruyama, O. Finding optimal degenerate patterns in DNA sequences. Bioinformatics **2003**, 19 (Suppl. 2), ii206–ii214.
31. Frickey, T.; Weiller, G. Mclip: Motif detection based on cliques of gapped local profile-to-profile alignments. Bioinformatics **2007**, 23, 502–503.
32. De Hoon, M.J.L.; Imoto, S.; Nolan, J.; Miyano, S. Open source clustering software. Bioinformatics **2004**, 20, 1453–1454.
33. Wang, G.; Yu, T.; Zhang, W. WordSpy: Identifying transcription factor binding motifs by building a dictionary and learning a grammar. Nucleic Acids Res. **2005**, 33, W412–W416.
34. Durbin, R.; Eddy, S.; Krogh, A.; Mitchison, G. Biological Sequence Analysis, Probabilistic Models of Proteins and Nucleic Acids; Cambridge University Press: Cambridge, UK, 1998.
35. Rabiner, L.R.; Juang, B.H. An Introduction to Hidden Markov Models. IEEE ASSP Mag. **1986**, 3, 4–16.
36. Winters-Hilt, S. Machine-Learning Based Sequence Analysis, Bioinformatics & Nanopore Transduction Detection; Lulu.com Publishing: Chapel Hill, NC, USA, 2011.

**Figure 1.** Comparison of standard hidden Markov model (HMM) and the clique-generalized meta-state HMM. The upper graphical model is for the standard HMM and shows the ‘emission’ observation sequence $x_i$, and the associated hidden label sequence $\lambda_i$, and the arrows denote the conditional probability approximations used in the model (for the transition and emission probabilities). Focusing at the level of the core joint-probability construct at instant ‘i’ in the middle graph, the standard HMM is a subset of the joint probability construct $P(\lambda_i, \lambda_{i+1}, x_{i+1})$. The generalized-clique HMM is shown in the graphical model at the bottom for one particular clique generalization. The model can be exact on emission positionally, then extend via zone dependence and use of generalized HMM (gIMM) interpolation. The model can be exact to higher order in state, and using an HMMD generalization [12] also extends modeling to have HMM with duration modeling. When doing the latter, zone-dependent and position-dependent modeling can be incorporated via reference to the duration in the model, and can be directly incorporated into a generalized Viterbi algorithm (and other generalized HMM algorithms), as well as any other side-information of interest [13]. Reprinted with permission [11].

**Figure 2.** Top Panel. Sliding-window association (clique) of observations and hidden states in the meta-state hidden Markov model, where the clique-generalized HMM algorithm describes a left-to-right traversal (as is typical) of the HMM graphical model with the specified clique window. The first observation, $b_0$, is included at the leading edge of the clique overlap at the HMM’s left boundary. For the last clique’s window overlap we choose the trailing edge to include the last observation $b_{N-1}$. Bottom Panel. Graphical model of the clique-generalized HMM, where the interconnectedness on full joint dependencies is only partly drawn. Reprinted with permission [11].

**Figure 3.** The standard forward-read Gene Predictor with five state labels: j, i, 0, 1, 2; and 13 state transitions: jj, j0, 2j, 01, 12, 20, 0i, 1i, 2i, i0, i1, i2, ii. The arrow covers the extent of the exon bounded region.

**Figure 4.** The standard two-pass gene predictor. A forward pass is used to catch forward reads, followed by a reverse complement pass to catch reverse reads.

**Figure 5.** The problem with the standard two-pass gene predictor. Confusion can result in the forward pass across reverse read regions, as shown, that can obscure the true start of other, valid, forward reads.

**Figure 6.** The single-pass forward/reverse coding Gene Predictor with non-overlapping encoding. Top: the forward and reverse reads shown on two separate tracks. Bottom: the forward and reverse reads on a single forward-scan pass on an enlarged state and transition model (shown without refinement involving intron frame-pass and end-of-coding stop codon validation states).

**Figure 7.** Overlapping encoding requires more than one annotation track if using a single forward-pass gene-structure identifier.

**Figure 8.** Overlap of coding with reverse intronic regions, with individual base states shown below for the two tracks.

**Figure 9.** Alt-splice overlap encoding with alternatively spliced exon present on track 1. We find V-state labels: $\left(\begin{array}{c}0\\ \mathrm{i}\end{array}\right)$, etc., and V-transitions: $\left(\begin{array}{c}\mathrm{i}1\\ \mathrm{ii}\end{array}\right)$, compactly denoted ‘i1ii’, shown. Other 3-prime splice-site, intron-exon (ie), overlap with intron (ii) transitions include ‘i0ii’ and ‘i2ii’. We also have V-transitions: $\left(\begin{array}{c}0\mathrm{i}\\ \mathrm{ii}\end{array}\right)$, denoted ‘0iii’, for 5-prime splice-site, exon-intron (ei), overlap with intron (ii) transitions (other 5-prime ei overlap V-transitions include ‘1iii’ and ‘2iii’).

**Figure 10.** Pseudocode for two-track annotation conversion from GTF, together with counting on two-track states and transitions.

**Figure 12.** The relative number of counts on start-of-coding (j0 and Aj dimers) and on the different splice-sites are shown in relation to each other. The charts, from left-to-right, are for the human, mouse, fly, and worm genomes.

**Figure 13.** The relative number of counts on alternative start-of-coding. The charts, from left-to-right, are for the human, mouse, fly, and worm genomes.

**Figure 14.** The relative number of counts on alternative end-of-coding. The charts, from left-to-right, are for the human, mouse, fly, and worm genomes.

**Figure 15.** The eij0 and ie2j types of V-transitions can have consensus agreement in their overlap, thus are allowed.

**Figure 17.** End-of-coding alternative splicing, with the last case shown (bottom track) indicative of a process (possibly spliceosome mediated) for gene growth by way of new last exon recruitment.

Species | Release | GTF File |
---|---|---|
Human (Homo sapiens) | 75 | Homo_sapiens.GRCh37.75.gtf |
Mouse (Mus musculus) | 81 | Mus_musculus.GRCm38.81.gtf |
Worm (Caenorhabditis elegans) | 83 | Caenorhabditis_elegans.WBcel235.83.gtf |
Fly (Drosophila melanogaster) | 75 | Drosophila_melanogaster.BDGP5.75.gtf |

**Table 2.**Counts on start-of-coding (j0 and Aj dimers) and on the different splice-sites. Altsum is the sum total of the different splice types (5′|i, 5′|e, 3′|i, and 3′|e). The last column ‘Alt/j0’ is the ratio of altsum to the {j0 + Aj} counts.

Species | j0+Aj | 5′|i | 5′|e | 3′|i | 3′|e | altsum | Alt/j0 |
---|---|---|---|---|---|---|---|
Worm | 25,462 | 809 | 809 | 1,438 | 653 | 4,283 | 0.175 |
Fly | 18,730 | 768 | 768 | 1,501 | 699 | 4,385 | 0.234 |
Mouse | 33,561 | 2,260 | 2,260 | 7,922 | 1,540 | 18,473 | 0.550 |
Human | 36,620 | 12,075 | 3,186 | 14,317 | 2,002 | 31,580 | 0.862 |

V-trans | Human | Mouse | Fly | Worm |
---|---|---|---|---|
H-trans j0 | 18,911 | 16,899 | 9,389 | 12,938 |
{j0jj + jjj0} | 4,208 | 6,112 | 4,334 | 8,323 |
j0j0 | 5,892 | 4,623 | 2,178 | 1,750 |
{j001 + 01j0}* | 106 | 32 | 10 | 4 |
{j012 + 12j0}* | 65 | 14 | 0 | 2 |
{j020 + 20j0}* | 1,695 | 888 | 251 | 686 |
{j0i0 + i0j0}* | 88 | 55 | 6 | 32 |
{j0ii + iij0} | 873 | 490 | 237 | 204 |
{j0II + IIj0} | 118 | 59 | 193 | 181 |
{j0jj + jjj0}/j0 | 0.223 | 0.362 | 0.462 | 0.643 |
*/non-* | 0.176 | 0.088 | 0.038 | 0.069 |
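The two ratio rows of the j0 table can be recomputed from the count rows. The sketch below assumes an inferred reading of the ‘*/non-*’ row, namely the sum of the starred rows divided by the sum of the unstarred V-transition rows; under that reading the recomputed values agree with the tabulated ones to three decimals. The variable names are hypothetical.

```python
SPECIES = ("Human", "Mouse", "Fly", "Worm")
H_TRANS_J0 = (18911, 16899, 9389, 12938)

# V-transition counts copied from the table; starred rows keep their '*'.
ROWS = {
    "{j0jj + jjj0}": (4208, 6112, 4334, 8323),
    "j0j0": (5892, 4623, 2178, 1750),
    "{j001 + 01j0}*": (106, 32, 10, 4),
    "{j012 + 12j0}*": (65, 14, 0, 2),
    "{j020 + 20j0}*": (1695, 888, 251, 686),
    "{j0i0 + i0j0}*": (88, 55, 6, 32),
    "{j0ii + iij0}": (873, 490, 237, 204),
    "{j0II + IIj0}": (118, 59, 193, 181),
}

def ratios(col):
    """Return ({j0jj + jjj0}/j0, starred/unstarred) for one species column."""
    starred = sum(v[col] for k, v in ROWS.items() if k.endswith("*"))
    plain = sum(v[col] for k, v in ROWS.items() if not k.endswith("*"))
    jj_overlap_fraction = ROWS["{j0jj + jjj0}"][col] / H_TRANS_J0[col]
    return round(jj_overlap_fraction, 3), round(starred / plain, 3)

result = {species: ratios(col) for col, species in enumerate(SPECIES)}
```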

V-trans | Human | Mouse | Fly | Worm |
---|---|---|---|---|
H-trans 2j | 19,040 | 16,727 | 9,409 | 12,955 |
{2jjj + jj2j} | 6,641 | 7,347 | 4,355 | 7,843 |
2j2j | 3,442 | 3,359 | 2,163 | 2,249 |
{2j01 + 012j}* | 926 | 383 | 48 | 46 |
{2j12 + 122j}* | 908 | 406 | 39 | 79 |
{2j20 + 202j}* | 704 | 339 | 70 | 5 |
{2j2i + 2i2j}* | 51 | 55 | 0 | 3 |
{2jii + ii2j} | 2,749 | 1,371 | 378 | 306 |
{2jII + II2j} | 156 | 117 | 192 | 175 |
{2jjj + jj2j}/2j | 0.349 | 0.439 | 0.463 | 0.605 |
*/non-* | 0.199 | 0.097 | 0.022 | 0.013 |

© 2017 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Winters-Hilt, S.; Lewis, A.J. Alt-Splice Gene Predictor Using Multitrack-Clique Analysis: Verification of Statistical Support for Modelling in Genomes of Multicellular Eukaryotes. *Informatics* **2017**, *4*, 3.
https://doi.org/10.3390/informatics4010003
