RNA isolated from three tissues of Siberian stone pine (cambium, needles, and buds) was sequenced and assembled separately for each tissue. The transcriptomes were converted into triplet frequency dictionaries (FDs) of two types. The first type of FD allows the strand specificity of clusters to be detected via Chargaff’s second parity rule. Clusters were identified by applying K-means to the FDs of contigs. We observed a four-cluster triangle structure in the distribution of the FDs. The observed symmetry of the clusters was apparently based on the strand specificity of the contigs.
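As an illustration of the first FD type, the following minimal Python sketch builds a triplet frequency dictionary and measures how far it departs from Chargaff’s second parity rule, under which a triplet and its reverse complement are expected to occur with nearly equal frequency. The function names, the reading step of 1, and the discrepancy measure (a sum of absolute differences) are assumptions made for this example and are not taken from the study.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def triplet_fd(seq, step=1):
    """Triplet frequency dictionary of a sequence.

    step=1 (overlapping triplets) is an assumption of this sketch.
    Triplets containing symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return {t: 0.0 for t in TRIPLETS}
    return {t: counts[t] / total for t in TRIPLETS}


def parity_discrepancy(fd):
    """Sum of |f(w) - f(reverse complement of w)| over all 64 triplets.

    Values near 0 are consistent with the second parity rule; larger
    values indicate a strand-specific triplet composition.
    """
    return sum(abs(fd[t] - fd[reverse_complement(t)]) for t in TRIPLETS)
```

A contig whose FD gives a large `parity_discrepancy` is strand-biased in this sense. The 64-dimensional vectors returned by `triplet_fd` are the kind of input a K-means routine would cluster; the clustering step itself is not reproduced here.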
The second type of FD was built separately for each transcriptome, using only well-annotated and sufficiently long contigs. Each sequence was tiled with a 300 bp moving window at an 8 bp step, and each window was converted into a triplet FD. Each FD was then labelled with a phase index: fragments falling outside a coding region were labelled 0; the others were labelled 1, 2, or 3 according to their reading-frame shift. All FDs were clustered by K-means, yielding a highly symmetric four-cluster triangle pattern in which each cluster comprised FDs with the same phase index.
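The tiling and labelling procedure can be sketched in Python as follows. This is an illustrative reconstruction only. The function names are hypothetical, as are the half-open coding-region coordinates `cds_start` and `cds_end`. Three further choices are assumptions of the sketch rather than details confirmed by the text: a window must lie entirely inside the coding region to receive a non-zero label, the phase is computed as the window offset from the coding start modulo 3, and triplets inside a window are counted without overlap.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def window_fd(fragment):
    """Triplet FD of one window, reading non-overlapping triplets.

    Non-overlapping reading (step 3) is an assumption of this sketch.
    """
    counts = Counter(fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return {t: 0.0 for t in TRIPLETS}
    return {t: counts[t] / total for t in TRIPLETS}


def tile_and_label(seq, cds_start, cds_end, window=300, step=8):
    """Tile a sequence and attach a phase index to each window.

    cds_start and cds_end are hypothetical 0-based, half-open coordinates
    of the annotated coding region. Returns a list of
    (window_start, phase_index, frequency_dictionary) tuples.
    """
    seq = seq.upper()
    records = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        if cds_start <= start and end <= cds_end:
            # Assumed rule: offset from the coding start modulo 3, mapped to 1..3.
            phase = (start - cds_start) % 3 + 1
        else:
            # Assumed rule: any window not fully inside the coding region gets 0.
            phase = 0
        records.append((start, phase, window_fd(seq[start:end])))
    return records
```

Because 8 is not a multiple of 3, consecutive windows inside a coding region cycle through all three reading-frame phases, so a sufficiently long contig contributes windows to every non-zero phase class.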
Conflicts of Interest
The authors declare no conflict of interest.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).