We propose that miRNA localization in cellular compartments is an emergent property of interactions between a cloud of RNA-binding proteins and RNA sequences composed of nucleotide words. Here, words are extracted with sliding windows over all transcript sequences, using some functional window size. An emergent consequence of this cloud model is that anomalous diffusion can occur if target RNA transcripts undergoing a random walk interact with the surrounding protein scaffold as a cloud, and if the cloud relaxation time is long [6]. This would be similar to the clustering or trailing observed for objects falling through a fluid [7]. We propose that RNAs whose sequences are similar to the whole transcriptome will exhibit enhanced transport compared with RNAs that lack such similarity. Thus, miRNAs should partition into different cellular compartments according to the word composition of their sequences. We can determine the frequency of all words in the transcriptome as a matrix composed of RNA sequences and copy levels. For each transcript, we count the number of words it has in common with all others in the cloud list, or dictionary, as a measure of similarity to the transcriptome, and we can also compare these counts with those obtained for randomized sequence words.
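The word extraction and the comparison with randomized sequences described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (`extract_words`, `shared_word_count`, `shuffled`), the toy sequences, and the choice to count distinct shared words are all assumptions made for illustration.

```python
import random

def extract_words(seq, w):
    """Return all overlapping words of length w (sliding window, step 1)."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def shared_word_count(target, transcripts, w):
    """Count distinct words of the target that occur in any other transcript.

    Assumption: similarity is measured on distinct words; counting every
    occurrence would be an equally plausible reading of the text.
    """
    cloud = set()
    for t in transcripts:
        cloud.update(extract_words(t, w))
    return len(set(extract_words(target, w)) & cloud)

def shuffled(seq, rng):
    """Return a randomized sequence with the same nucleotide composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

# Toy sequences, invented for illustration only.
transcripts = ["AUGGCUAGCUAGGA", "GGCUAGCUUUAGCA", "CCAUGGCUAGAAAC"]
target = "UGGCUAGCU"
w = 4

rng = random.Random(0)  # fixed seed so the shuffle is reproducible
observed = shared_word_count(target, transcripts, w)
baseline = shared_word_count(shuffled(target, rng), transcripts, w)
print(observed, baseline)
```

In practice the shuffle would be repeated many times to obtain a distribution of baseline counts against which the observed count can be judged.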
RNA molecule diffusion, first in the nuclear and then in the cytoplasmic compartment, would lead to extracellular export of the RNA if the transcript half-life is greater than its transit time. Calculations at arbitrary transit distances could be made with a dynamical systems model consisting of a large set of partial differential equations that describe RNA mobility, as in Fick's equations, but this would be computationally prohibitive [8]. Instead, we pursue a thermodynamic approach based on the Fokker–Planck equation [5]. Consider that each transcript is affected by local protein scaffolds with an effective interaction window of some sequence length w. The closer the word set of the target (miRNA) is to that of the whole transcriptome, the more canonical its diffusion. As such, its mobility displays patterns consistent with the whole transcriptome. Anomalous RNA diffusion can give rise to emergent and patterned behavior in the cell [9]. Some transcripts will have specialized transport modes, which will show up as outliers in this algorithmic treatment. The transcriptome cloud dictionary is built as a collection of transcriptome word sets along with expression levels that depend on the cell state. Model parameters such as the optimum word size can be estimated from RNA datasets obtained from public data sources. Assume that the smallest reasonable word in the cloud is 4 nt long, corresponding to the lower size limit for a seed sequence in miRNA [10]. In this case, there are only $4^{4} = 256$ different words, so the transcriptome dictionary would have high expression values because of the many duplicate words. The upper limit for word size is set at 22 nt, corresponding to the size of a typical mature miRNA. This is the same as the MRE size in the related ceRNA hypothesis [11]. We determine the frequency of all words in the transcriptome with a matrix composed of RNA sequences and copy levels (e.g., normalized reads or RPKM). For each miRNA target transcript, we count the number of words in common with the cloud dictionary as a measure of similarity to the transcriptome (“tCount”), or multiply each word count (tCount) by its expression value to derive the “tWord” measure.
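The two measures can be sketched in Python under stated assumptions. The text leaves the exact definitions open, so this sketch assumes that tCount is the number of target word occurrences found in the dictionary and that tWord sums the expression-weighted dictionary value of each of those occurrences. The names `build_cloud`, `t_count`, and `t_word`, and the toy sequences and expression values, are hypothetical.

```python
from collections import Counter, defaultdict

def words(seq, w):
    """All overlapping words of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def build_cloud(transcripts, w):
    """Build the cloud dictionary: word -> expression-weighted frequency.

    transcripts is an iterable of (sequence, expression) pairs, where the
    expression is a copy level such as normalized reads or RPKM.
    """
    cloud = defaultdict(float)
    for seq, expr in transcripts:
        for word, n in Counter(words(seq, w)).items():
            cloud[word] += n * expr
    return dict(cloud)

def t_count(target, cloud, w):
    """Assumed tCount: target word occurrences present in the dictionary."""
    return sum(1 for word in words(target, w) if word in cloud)

def t_word(target, cloud, w):
    """Assumed tWord: sum of dictionary weights over target word occurrences."""
    return sum(cloud.get(word, 0.0) for word in words(target, w))

# Toy data, invented for illustration only.
transcripts = [("AUGGCUAG", 10.0), ("GGCUAGCA", 2.0)]
cloud = build_cloud(transcripts, w=4)
target = "GGCUAGCC"
print(t_count(target, cloud, 4), t_word(target, cloud, 4))  # 4 38.0
```

With these toy values, four of the five target words occur in the dictionary, and the three words shared by both transcripts dominate tWord because they carry the summed expression of both.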
The maximal number of possible 22 nt words would be $4^{22} \approx 1.76 \times 10^{13}$, since there are four possible nucleotide letters at each of the 22 positions. The actual transcriptome contains far fewer than that number of possibilities (Table 1). Assume there are $5 \times 10^{4}$ different tRNA, rRNA, mRNA, and ncRNA transcripts in a cell, with an average length of $2 \times 10^{3}$ nt; then, counting all overlapping words, there are about $1 \times 10^{8}$ words in a whole-transcriptome matrix. A data set of this size can be analyzed on a big-data-scale computer system.
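The arithmetic behind these estimates can be checked with a few lines of Python. The helper names `word_space` and `overlapping_words` are illustrative, and the transcript count and mean length are the assumed round figures from the text, not measured values.

```python
def word_space(w, alphabet=4):
    """Number of distinct words of length w over the given alphabet size."""
    return alphabet ** w

def overlapping_words(n_transcripts, mean_length, w):
    """Total overlapping words, assuming every transcript has the mean length."""
    return n_transcripts * (mean_length - w + 1)

smallest = word_space(4)    # 256 possible 4 nt words
largest = word_space(22)    # 17592186044416, about 1.76e13
observed_upper = overlapping_words(50_000, 2_000, 22)  # 98950000, about 1e8

# Even if every observed word were unique, only a tiny fraction of the
# 22 nt word space could be occupied by the transcriptome.
fraction = observed_upper / largest
print(smallest, largest, observed_upper, fraction)
```

The count of overlapping words is an upper bound on the number of distinct words, since repeated words within and between transcripts collapse into single dictionary entries.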