Article

4D-Dynamic Representation of DNA/RNA Sequences: Studies on Genetic Diversity of Echinococcus multilocularis in Red Foxes in Poland

1 Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk, Poland
2 Department of Nuclear Medicine, Medical University of Gdańsk, 80-210 Gdańsk, Poland
3 Department of Tropical Parasitology, Medical University of Gdańsk, 81-519 Gdynia, Poland
4 Department of Parasitology and Invasive Diseases, National Veterinary Research Institute, 24-100 Puławy, Poland
* Author to whom correspondence should be addressed.
Life 2022, 12(6), 877; https://doi.org/10.3390/life12060877
Submission received: 25 April 2022 / Revised: 20 May 2022 / Accepted: 8 June 2022 / Published: 10 June 2022

Abstract

The 4D-Dynamic Representation of DNA/RNA Sequences, an alignment-free bioinformatics method that we recently developed, is used to study the genetic diversity of Echinococcus multilocularis in red foxes in Poland. Sequences of three mitochondrial genes, i.e., NADH dehydrogenase subunit 2 (nad2), cytochrome b (cob), and cytochrome c oxidase subunit 1 (cox1), are analyzed. The sequences are represented by sets of material points in a 4D space, i.e., 4D-dynamic graphs. To visualize the sequences, projections of the graphs into 3D space are shown. The differences between the 3D graphs corresponding to European, Asian, and American haplotypes are small, but the numerical characteristics (sequence descriptors) applied in the study can resolve them. The concept of the descriptors of 4D-dynamic graphs is borrowed from classical dynamics: they are the coordinates of the centers of mass and the moments of inertia of the 4D-dynamic graphs. Based on these descriptors, classification maps are constructed. The concentrations of points in the maps indicate one Polish haplotype (EmPL9) of Asian origin.

1. Introduction

The recent rapid growth of experimental data in nucleotide databases has stimulated the development of mathematical methods for describing these large and complex objects. One group of approaches is formed by the so-called alignment-free bioinformatics methods (for reviews, see [1,2]). They are an alternative to standard, alignment-based sequence analysis approaches, e.g., ClustalW [3], BLAST [4], the Needleman–Wunsch algorithm [5], or T-Coffee [6]. Alignment-free methods are usually computationally simple and impose no restrictions on the sequences. They are particularly useful for Big Data analysis and for research on various aspects of similarity between biological (DNA, RNA, protein) sequences.
Similarity of complex objects is not unique. Multi-dimensional objects can be similar in one aspect/property and very different if other characteristics are taken into account. Different aspects of similarity may be relevant to different problems. Let us take a model example and consider two different pairs of DNA sequences:
1. G G T T
   G G A A
2. G T G T
   G A G A
In both cases, the similarity value is 50%, but the non-zero contributions to the final result come from different positions of G in the sequences. In the first case, the G's are clustered at the beginning of the sequences, and in the second one they alternate with the other base along the sequences. The same results are also obtained if, for example, G is replaced by C. Different structures therefore give the same result in standard alignment methods, and the degree of this non-uniqueness increases with the lengths of the sequences. Advantages of non-standard (alignment-free) methods include their suitability for Big Data analysis with no restrictions on the sequences, as already mentioned, as well as the variety of information that can be derived. In non-standard methods, we obtain a series of values characterizing different properties of a single sequence. The similarity of each of these properties can be studied separately and may be correlated with different biological consequences. Therefore, the creation of new methods is very important for revealing hidden properties of the sequences.
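The model example above can be checked with a few lines of code. The following is a minimal sketch that assumes the "similarity value" is a plain position-by-position identity score for equal-length sequences; the function names `percent_identity` and `match_positions` are illustrative and do not come from any alignment package.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def match_positions(a: str, b: str) -> list:
    """Zero-based positions at which the two sequences carry the same base."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]


# Both pairs score the same, although the matches sit at different positions.
pair_1 = ("GGTT", "GGAA")
pair_2 = ("GTGT", "GAGA")
score_1 = percent_identity(*pair_1)
score_2 = percent_identity(*pair_2)
positions_1 = match_positions(*pair_1)
positions_2 = match_positions(*pair_2)
```

The two scores coincide while the lists of matching positions differ, which is the non-uniqueness the example is meant to illustrate.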
Similarity/dissimilarity analysis is strictly related to classification studies, which is an interdisciplinary problem [7,8]. For example, we obtained information about different types of objects by examining their similarity in the quality of life research [9,10], or in bioinformatics [11,12].
In bioinformatics, there are many different alignment-free methods. For example, Zhou et al. constructed a complex network for similarity/dissimilarity analysis of DNA sequences [13]. We represented the protein sequence as a set of material points in a 20D space [14]. Saw et al. analyzed the similarity of DNA sequences using the fuzzy integral with a Markov chain [15]. Lichtblau applied frequency chaos game representation and signal processing for genomic sequence comparison [16]. He et al. introduced a numerical representation of a DNA sequence, called the Subsequence Natural Vector, and applied it for HIV-1 subtype classification [17].
A subgroup of alignment-free bioinformatics methods is formed by the so-called Graphical Representations of Biological Sequences, which are applicable to both graphical and numerical similarity/dissimilarity analysis of biological sequences. It is not obvious how to represent multidimensional objects graphically in two or three dimensions so as to reveal the most important features without losing information. A variety of approaches have been developed, bringing together ideas from different fields of science, and each of them focuses on different aspects of similarity. Method names are often associated with properties or ideas applied in the construction of the graphs or of the numerical characteristics describing them. The first graphical representation methods were based on walks in three [18,19] and two [20,21,22] dimensions. Since then, graphical bioinformatics has developed rapidly (for reviews see [23,24]). Let us mention just a few recent methods of graphical representation: in the “Spider representation of DNA sequences”, the graphs resemble a spider’s web [25]; in the method we called “Spectral-dynamic representation of DNA sequences”, the plots resemble atomic, molecular, or stellar spectra composed of sequences of sharp spectral lines [11]. For the numerical characterization of these plots, we applied some ideas used in classical dynamics. Hu et al. applied fractal interpolation in their graphical representation of protein sequences [26]. Graphical representations of protein sequences based on physicochemical properties may be found in works by Mahmoodi-Reihani et al. [27] and by Xie and Zhao [28]. A graphical representation of DNA sequences proposed by Xie et al. is based on trigonometric functions [29]. The 2D graphical representation of DNA sequences proposed by Liu is based on horizon lines [30]. Another 2D graphical representation of DNA sequences, proposed by Wu et al., is based on the variant map [31].
The goal is to create approaches in which both graphs and numerical characteristics, often referred to as sequence descriptors, represent a biological sequence in a unique (i.e., degeneracy-free) way. The first sequence descriptors related to graphical representation of sequences were designed by Raychaudhury and Nandy [32] and by Randić et al. [33]. Since then, many approaches have been created, e.g., spectral moments in the sequence similarity studies were considered by Agüero-Chapin et al. [34] (for review see [35]).
In the present work, we apply 4D-Dynamic Representations of DNA/RNA Sequences created by us [12]. This is a multidimensional alignment-free bioinformatics method, but it also offers some kind of visualization (for details see subsequent section). We applied this method for a characterization of SARS-CoV-2 and Zika viruses. In the present work, we perform analogous studies on genetic diversity of Echinococcus multilocularis in red foxes in Poland. Alveolar echinococcosis is a serious parasitic zoonosis caused by Echinococcus multilocularis, Leuckart 1863. E. multilocularis was found in Poland in relatively high percentages in red foxes; in some regions, the prevalence reached up to approximately 50% [36]. More than one hundred cases of human alveolar echinococcosis were described before 2013 [37]. The present study is a continuation of our previous work in which the results have been obtained using a standard ClustalW method [38].

2. Materials and Methods

In the present studies, we apply the 4D-Dynamic Representation of DNA/RNA Sequences, an alignment-free bioinformatics method proposed by us [12]. In this approach, the DNA/RNA sequence is represented as a set of material points in a 4D space, called the 4D-dynamic graph. The distribution of the points in the space is characteristic of the sequence. A 4D-dynamic graph is created using a method of shifts (a walk) starting from the point with coordinates $(0,0,0,0)$. The first shift is performed according to the unit vector representing the first nucleobase in the sequence. Starting from the end of this vector, the second shift is performed according to the unit vector representing the second nucleobase in the sequence. The process continues until the last nucleobase in the sequence. At the end of each vector, a material point with the mass $m_i = 1$ is located. The total mass of the 4D-dynamic graph is then equal to the length of the sequence ($N$):
$$N = \sum_{i=1}^{N} m_i.$$
We represent the nucleobases by the following unit vectors: adenine by the vector A = (1,0,0,0), cytosine by C = (0,1,0,0), guanine by G = (0,0,1,0), and thymine/uracil by T/U = (0,0,0,1). The final similarity relations between the sequences do not depend on which unit vector is assigned to which nucleobase. If a mass different from 1 is chosen, the relative similarity relations also remain the same. However, the mass of each material point and the unit vectors representing particular nucleobases must be the same for all the sequences being compared.
An example of the construction of the 4D-dynamic graph for a model sequence AUGAC is given in [12].
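The method of shifts described above can be sketched in a few lines. This is an illustrative sketch only: the names `UNIT_VECTORS` and `dynamic_graph_4d` are assumptions made here, and the points listed for AUGAC are derived from the unit-vector assignment stated in the text, not copied from [12].

```python
# Unit vectors assigned to the nucleobases, as stated in the text.
UNIT_VECTORS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
    "U": (0, 0, 0, 1),  # uracil shares the vector of thymine
}


def dynamic_graph_4d(sequence: str) -> list:
    """Return the material points of the walk, one per nucleobase.

    The walk starts at (0, 0, 0, 0); the starting point itself carries
    no mass, so the number of returned points equals the sequence length.
    """
    position = (0, 0, 0, 0)
    points = []
    for base in sequence.upper():
        step = UNIT_VECTORS[base]  # raises KeyError for unknown symbols
        position = tuple(p + s for p, s in zip(position, step))
        points.append(position)
    return points


augac_points = dynamic_graph_4d("AUGAC")
# -> [(1,0,0,0), (1,0,0,1), (1,0,1,1), (2,0,1,1), (2,1,1,1)]
```

The last point always equals the base composition of the sequence (here two A, one C, one G, one U), because each coordinate simply counts how often its nucleobase has occurred so far.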
As a visualization of the 4D-dynamic graphs, we apply their projections into 2D or 3D spaces. For example, if we set the $x_i^1$ and $x_i^2$ coordinates equal to zero, we obtain a 2D projection, i.e., the $x_3x_4$-graph. The distributions of the material points in the 3D or 2D spaces give some information about the locations of three or two nucleobases along the sequences.
As the numerical characteristics of the 4D-dynamic graphs (sequence descriptors), we apply values analogous to the ones used in the classical dynamics. One kind of such sequence descriptors are the coordinates of the center of mass of the 4D-dynamic graph:
$$\mu_k = \frac{\sum_{i=1}^{N} m_i x_i^k}{\sum_{i=1}^{N} m_i} = \frac{1}{N}\sum_{i=1}^{N} x_i^k.$$
Here, $x_i^k$ are the coordinates of the mass $m_i$ in the 4D space and $k = 1, 2, 3, 4$.
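As a small worked example of the center-of-mass formula, the sketch below evaluates it for the five points of the model sequence AUGAC. The point list is computed from the unit vectors given in the text (it is not taken from [12]), and the function name `center_of_mass` and its optional `mass` argument are assumptions introduced here for illustration.

```python
def center_of_mass(points, mass=1.0):
    """Coordinates mu_k of the center of mass of equal-mass points in 4D."""
    total_mass = mass * len(points)
    return tuple(
        sum(mass * point[k] for point in points) / total_mass
        for k in range(4)
    )


# Walk for AUGAC with A=(1,0,0,0), C=(0,1,0,0), G=(0,0,1,0), T/U=(0,0,0,1).
augac_points = [
    (1, 0, 0, 0),
    (1, 0, 0, 1),
    (1, 0, 1, 1),
    (2, 0, 1, 1),
    (2, 1, 1, 1),
]

mu = center_of_mass(augac_points)                # approximately (1.4, 0.2, 0.6, 0.8)
mu_heavy = center_of_mass(augac_points, mass=2.5)  # same result: the mass cancels
```

Because every point carries the same mass, the mass cancels between numerator and denominator, which is why the formula reduces to a plain average of the coordinates.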
Another kind of value analogous to the one used in the classical dynamics is the tensor of the moment of inertia of 4D-dynamic graph. It is given by the matrix
$$\hat{I} = \begin{pmatrix} I_{11} & I_{12} & I_{13} & I_{14} \\ I_{21} & I_{22} & I_{23} & I_{24} \\ I_{31} & I_{32} & I_{33} & I_{34} \\ I_{41} & I_{42} & I_{43} & I_{44} \end{pmatrix}$$
with the elements:
$$I_{jj} = \sum_{i=1}^{N} m_i \sum_{k=1}^{4} \left[ \hat{x}_i^k \left(1 - \delta_{jk}\right) \right]^2,$$
$$I_{jk} = I_{kj} = -\sum_{i=1}^{N} m_i \hat{x}_i^j \hat{x}_i^k,$$
where
$$\delta_{jk} = \begin{cases} 1, & j = k, \\ 0, & j \neq k \end{cases}$$
is the Kronecker delta. $\hat{x}_i^k$ are the coordinates of $m_i$ in the Cartesian coordinate system for which the origin has been selected at the center of mass:
$$\hat{x}_i^k = x_i^k - \mu_k.$$
The eigenvalue problem of the tensor of inertia is defined as:
$$\hat{I} \omega_k = I_k \omega_k, \quad k = 1, 2, 3, 4,$$
where $I_k$ are the eigenvalues and $\omega_k$ are the eigenvectors. The eigenvalues are obtained by solving the fourth-order secular equation:
$$\det(\hat{I} - I \hat{E}) = 0,$$
where $\hat{E}$ is the $4 \times 4$ unit matrix. The eigenvalues $I_k$ are called the principal moments of inertia.
As the sequence descriptors, we apply the normalized principal moments of inertia:
$$r_k^{4D} = \sqrt{\frac{I_k}{N}}, \quad k = 1, 2, 3, 4.$$
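The chain from sequence to descriptors can be illustrated with a short, dependency-free sketch. Several things here are assumptions rather than the authors' implementation: the off-diagonal tensor elements are taken with the minus sign usual in classical dynamics, the normalization is taken as the square root of $I_k/N$, unit masses are used, and the eigenvalues are found with a textbook cyclic Jacobi iteration (any symmetric eigenvalue solver from a numerical library would serve equally well). All function names are illustrative.

```python
import math

UNIT_VECTORS = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
                "G": (0, 0, 1, 0), "T": (0, 0, 0, 1), "U": (0, 0, 0, 1)}


def walk_points(sequence):
    """Material points of the 4D walk (one unit-mass point per nucleobase)."""
    position, points = (0, 0, 0, 0), []
    for base in sequence.upper():
        position = tuple(p + s for p, s in zip(position, UNIT_VECTORS[base]))
        points.append(position)
    return points


def inertia_tensor(points):
    """4x4 tensor of inertia about the center of mass, unit masses assumed."""
    n = len(points)
    mu = [sum(p[k] for p in points) / n for k in range(4)]
    tensor = [[0.0] * 4 for _ in range(4)]
    for p in points:
        x = [p[k] - mu[k] for k in range(4)]  # centered coordinates
        for j in range(4):
            for k in range(4):
                if j == k:
                    tensor[j][j] += sum(x[l] ** 2 for l in range(4) if l != j)
                else:
                    tensor[j][k] -= x[j] * x[k]
    return tensor


def symmetric_eigenvalues(matrix, max_sweeps=100, rel_tol=1e-24):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    scale = sum(v * v for row in a for v in row)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off <= rel_tol * scale:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0  # annihilated by construction
    return sorted(a[i][i] for i in range(n))


def normalized_principal_moments(sequence):
    """Descriptors r_k = sqrt(I_k / N), sorted in ascending order."""
    points = walk_points(sequence)
    eigenvalues = symmetric_eigenvalues(inertia_tensor(points))
    return [math.sqrt(max(v, 0.0) / len(points)) for v in eigenvalues]


r_aca = normalized_principal_moments("ACA")
r_gtg = normalized_principal_moments("GTG")
```

For the toy sequence ACA the principal moments are 1/3, 1, 4/3, and 4/3, so the descriptors are approximately 0.333, 0.577, 0.667, and 0.667. The sequence GTG gives the same values, which illustrates the statement that the result does not depend on which unit vector is assigned to which nucleobase.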
The presented method is applied to estimate the genetic diversity of the cestode Echinococcus multilocularis, Leuckart 1863, in Poland based on sequence analysis of the mitochondrial genes of worms isolated using the sedimentation and counting technique [39] from the intestines of red foxes Vulpes vulpes (Linnaeus). More details concerning the isolation of parasites, sample preparation, polymerase chain reactions (PCRs), and sequencing were described earlier [38]. The nucleotide sequence data used for the calculations are available in GenBank. Sequences of three mitochondrial genes, i.e., NADH dehydrogenase subunit 2 (nad2), cytochrome b (cob), and cytochrome c oxidase subunit 1 (cox1), are analyzed (for the accession numbers see subsequent section) [38,40].

3. Results and Discussion

Figure 1 shows examples of projections of the 4D-dynamic graphs into 3D space: $x_2x_3x_4$-graphs. The differences between the graphs representing the sequences for different countries (Poland, Slovakia, USA, China) are small. The corresponding principal moments of inertia for all the sequences used in the calculations are shown in Tables 1–6.
In our previous work, combined sequence analysis of the three genes (cob, nad2, cox1) revealed fifteen Polish haplotypes (EmPL1–EmPL15). Separate analyses of the individual genes showed less differentiation: the number of haplotypes is smaller for each of the cob, nad2, and cox1 genes. They are denoted by the letters A–J (see Tables 1–3) [38]. As a consequence, in some cases, the sequence descriptors are identical. For example, the descriptors of sequences No. 1 and No. 7 in Table 3 (haplotype A) are the same. All the values for a particular gene are similar. For example, the principal moments of inertia are similar for sequences No. 6 (EmPL6 cox_C) and No. 7 (EmPL7 cox_A) (Table 3): they are equal to 117.0, 117.0, 116.9, and 131.1 for sequence No. 6 and to 117.1, 117.1, 117.0, and 132.2 for sequence No. 7.
These small differences can be better observed in the classification maps (Figures 2–6). Figures 2–4 show the $r_k^{4D} r_l^{4D} \mu_m$ and $\mu_k \mu_l r_m^{4D}$ classification maps: Figure 2 represents the cob gene, Figure 3 the nad2 gene, and Figure 4 the cox1 gene. Figure 5 shows the $\mu_1 \mu_2 \mu_4$ classification maps for all three genes, and Figure 6 shows the $\mu_2 \mu_3 \mu_4$ maps, also for all three genes. The points in the maps corresponding to fourteen Polish haplotypes (EmPL1–EmPL8 and EmPL10–EmPL15) are concentrated close to the ones representing European clades. Several Polish haplotypes nearly overlap with some European clades (for example with Austria in Figure 2 or with Slovakia in Figure 4). The exception is the Polish haplotype EmPL9. The points representing this sequence are concentrated close to the points representing Asian clades. In particular, Kazakhstan is the closest point to EmPL9 in Figure 2 (all panels), Figure 3 (all panels), Figure 5 (top and middle panels), and Figure 6 (top and middle panels). This means that the largest similarities between EmPL9 and Kazakhstan are observed for the cob and nad2 genes in all the aspects considered. Figure 4, Figure 5 (bottom panel), and Figure 6 (bottom panel) show the classification maps for the cox1 gene. In these cases, China (Sichuan) and Japan (Hokkaido) are the closest points to EmPL9.
The results of our method can also be presented in a form similar to the phylogenetic trees of the standard methods. Figure 7 shows a cluster dendrogram for the cob gene obtained using $r_3^{4D}$, $r_1^{4D}$, $\mu_3$, and the Euclidean distance measure. This dendrogram is another representation of the results of the calculations shown in the top left panel of Figure 2.
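The distance computation that underlies both the "closest point" statements and a dendrogram of this kind can be sketched as follows. This is a hedged illustration only: the descriptor triples are made-up placeholders (they are not values from the tables or figures), the labels `seq_a`, `seq_b`, and `seq_c` are hypothetical, and the function names are assumptions. A full dendrogram would additionally require a linkage rule, which the text does not specify.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(label, descriptors):
    """Label of the sequence whose descriptor vector is closest to `label`."""
    target = descriptors[label]
    distances = {
        other: euclidean(target, vector)
        for other, vector in descriptors.items()
        if other != label
    }
    return min(distances, key=distances.get)


# Placeholder (r_3, r_1, mu_3) triples; NOT data from the paper.
toy_descriptors = {
    "seq_a": (1.0, 2.0, 3.0),
    "seq_b": (1.1, 2.0, 3.0),
    "seq_c": (4.0, 6.0, 3.0),
}

closest_to_a = nearest("seq_a", toy_descriptors)
distance_a_c = euclidean(toy_descriptors["seq_a"], toy_descriptors["seq_c"])
```

In this toy setting `seq_b` is the nearest neighbour of `seq_a`. Because the descriptors may differ in scale, which axes are combined in one distance affects the outcome, so the choice of descriptors should be reported alongside any such comparison.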
The method places no restriction on the lengths of the sequences. Within this method, it is also possible to combine the information coming from the three separate genes into one sequence. Figure 8 shows the $x_2x_3x_4$-graphs for the long sequences obtained by combining the cob, nad2, and cox1 genes. The same four examples (14; Slo; A-A; CHM) are displayed as in Figure 1. Calculations of the descriptors analogous to the ones shown in Figures 2–7 can be performed for these concatenated data from the three mitochondrial genes.

4. Conclusions

In the present work, non-standard bioinformatics studies on the genetic diversity of the cestode Echinococcus multilocularis in red foxes in Poland are performed. The 4D-Dynamic Representation of DNA/RNA Sequences, an alignment-free method proposed by us, has been applied [12].
Visualization within a multidimensional method is limited, but some aspects (appropriate projections into 3D space) are shown. The sequences corresponding to European, Asian, and American haplotypes are similar to each other, so the corresponding 3D projections nearly overlap [Figure 1, all panels (sequences No. 14 in Tables 1–3; No. 3, 6, and 8 in Table 4; and No. 3, 7, and 9 in Tables 5 and 6), and Figure 8].
We observed much larger differences for coronaviruses in our previous study [12]. That study showed that the distribution of clusters of points in the classification maps supports the hypothesis that SARS-CoV-2 may have originated in bats and in pangolins [12].
The considered sequence descriptors are sensitive enough to study the differences for Echinococcus multilocularis. Our first report based on the standard bioinformatics method indicated one Polish haplotype (EmPL9 found only in northeast Poland) of probable Asian origin [38]. The present studies indicate aspects of similarities (descriptors related to some properties of the sequences represented in the axes of the maps), in which Polish haplotypes are similar to sequences for different countries. By analyzing the clusters of points in the classification maps (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6), the Asian origin of one Polish haplotype (EmPL9) is confirmed.
In summary, by choosing the descriptors, we can reveal different properties of the sequences. In particular, the principal moments of inertia (quantities used in classical dynamics) are the moments of inertia associated with rotations around the principal axes. The moment of inertia of an object around a rotational axis describes how difficult it is to induce the rotation of the object around this axis. If the mass is concentrated far away from the axis, the object is difficult to spin up and the moment of inertia is large. As a consequence, the descriptors based on moments of inertia reflect the concentrations of masses of the 4D-dynamic graphs around the axes. In this way, we can compare the shapes of the graphs representing the sequences.
The correct interpretation of biological and medical data strongly depends on the accuracy of the mathematical models used. Because the accuracy of the presented method is very high (the descriptors used in this method can recognize a difference of a single nucleobase between the compared sequences), the medical importance of the presented approach is significant.
An attractive application of this approach in our future research is predicting the development of viral sequences. Building a predictive model can be crucial in dealing with future epidemics. Pilot calculations for the Zika virus showed that such an approach could be used to describe the time evolution of viral genome sequences [12].

Author Contributions

Conceptualization, D.B.-W., P.W., A.L. and J.K.; methodology, D.B.-W. and P.W.; software, P.W. and D.B.-W.; formal analysis, D.B.-W. and P.W.; writing (original draft preparation), D.B.-W.; visualization, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Science Centre, Poland (grant no. 2020/37/B/NZ7/03934).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The nucleotide sequence data used for the calculations are available in GenBank.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vinga, S.; Almeida, J. Alignment-free sequence comparison-a review. Bioinformatics 2003, 19, 513–523.
  2. Jin, X.; Jiang, Q.; Chen, Y.; Lee, S.J.; Nie, R.; Yao, S.; Zhou, D.; He, K. Similarity/dissimilarity calculation methods of DNA sequences: A survey. J. Mol. Graph. Model. 2017, 76, 342–355.
  3. Chenna, R.; Sugawara, H.; Koike, T.; Lopez, R.; Gibson, T.J.; Higgins, D.G.; Thompson, J.D. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31, 3497–3500.
  4. Altschul, S.; Gish, W.; Miller, W.; Myers, E.; Lipman, D. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
  5. Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, 48, 443–453.
  6. Notredame, C.; Higgins, D.G.; Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302, 205–217.
  7. Bielińska, A.; Majkowicz, M.; Bielińska-Wąż, D.; Wąż, P. Classification Studies in Various Areas of Science. In Numerical Methods and Applications; Nikolov, G., Kolkovska, N., Georgiev, K., Eds.; Conference Proceedings NMA 2018, Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11189, pp. 326–333.
  8. Bielińska, A.; Majkowicz, M.; Wąż, P.; Bielińska-Wąż, D. Mathematical Modeling: Interdisciplinary Similarity Studies. In Numerical Methods and Applications; Nikolov, G., Kolkovska, N., Georgiev, K., Eds.; Conference Proceedings NMA 2018, Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11189, pp. 334–341.
  9. Bielińska, A.; Bielińska-Wąż, D.; Wąż, P. Classification Maps in Studies on the Retirement Threshold. Appl. Sci. 2020, 10, 1282.
  10. Bielińska, A.; Wąż, P.; Bielińska-Wąż, D. A Computational Model of Similarity Analysis in Quality of Life Research: An Example of Studies in Poland. Life 2022, 12, 56.
  11. Bielińska-Wąż, D.; Wąż, P. Spectral-dynamic representation of DNA sequences. J. Biomed. Inform. 2017, 72, 1–7.
  12. Bielińska-Wąż, D.; Wąż, P. Non-standard bioinformatics characterization of SARS-CoV-2. Comput. Biol. Med. 2021, 131, 104247.
  13. Zhou, J.; Zhong, P.Y.; Zhang, T.H. A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks. Evol. Bioinform. 2016, 12, 229–235.
  14. Czerniecka, A.; Bielińska-Wąż, D.; Wąż, P.; Clark, T. 20D-dynamic representation of protein sequences. Genomics 2016, 107, 16–23.
  15. Saw, A.K.; Raj, G.; Das, M.; Talukdar, N.C.; Tripathy, B.C.; Nandi, S. Alignment-free method for DNA sequence clustering using Fuzzy integral similarity. Sci. Rep. 2019, 9, 3753.
  16. Lichtblau, D. Alignment-free genomic sequence comparison using FCGR and signal processing. BMC Bioinformatics 2019, 20, 742.
  17. He, L.L.; Dong, R.; He, R.L.; Yau, S.S.T. A novel alignment-free method for HIV-1 subtype classification. Infect. Genet. Evol. 2020, 77, 104080.
  18. Hamori, E.; Ruskin, J.H. Curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. J. Biol. Chem. 1983, 258, 1318–1327.
  19. Hamori, E. Novel DNA sequence representations. Nature 1985, 314, 585–586.
  20. Gates, M.A. Simpler DNA sequence representations. Nature 1985, 316, 219.
  21. Nandy, A. A new graphical representation and analysis of DNA sequence structure. I: Methodology and application to globin genes. Curr. Sci. 1994, 66, 309–314.
  22. Leong, P.M.; Morgenthaler, S. Random walk and gap plots of DNA sequences. Comput. Appl. Biosci. 1995, 11, 503–507.
  23. Randić, M.; Novič, M.; Plavšić, D. Milestones in graphical bioinformatics. Int. J. Quant. Chem. 2013, 113, 2413–2446.
  24. Mizuta, S. Graphical Representation of Biological Sequences. In Bioinformatics in the Era of Post Genomics and Big Data; Abdurakhmonov, I.Y., Ed.; IntechOpen: London, UK, 2018.
  25. Aram, V.; Iranmanesh, A.; Majid, Z. Spider representation of DNA sequences. J. Comput. Theor. Nanos. 2014, 11, 418–420.
  26. Hu, H.; Li, Z.; Dong, H.; Zhou, T. Graphical Representation and Similarity Analysis of Protein Sequences Based on Fractal Interpolation. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 182–192.
  27. Mahmoodi-Reihani, M.; Abbasitabar, F.; Zare-Shahabadi, V. A novel graphical representation and similarity analysis of protein sequences based on physicochemical properties. Phys. A 2018, 510, 477–485.
  28. Xie, X.L.; Zhao, Y.X. A 2D Non-degeneracy Graphical Representation of Protein Sequence and Its Applications. Curr. Bioinform. 2020, 15, 758–766.
  29. Xie, G.S.; Jin, X.B.; Yang, C.L.; Pu, J.X.; Mo, Z.X. Graphical Representation and Similarity Analysis of DNA Sequences Based on Trigonometric Functions. Acta Biotheor. 2018, 66, 113–133.
  30. Liu, H.L. 2D graphical representation of DNA sequence based on horizon lines from a probabilistic view. Biosci. J. 2018, 34, 744–750.
  31. Wu, R.X.; Liu, W.J.; Mao, Y.Y.; Zheng, J. 2D Graphical Representation of DNA Sequences Based on Variant Map. IEEE Access 2020, 8, 173755–173765.
  32. Raychaudhury, C.; Nandy, A. Indexing scheme and similarity measures for macromolecular sequences. J. Chem. Inf. Comput. Sci. 1999, 39, 243–247.
  33. Randić, M.; Vračko, M.; Nandy, A.; Basak, S.C. On 3-D graphical representation of DNA primary sequences and their numerical characterization. J. Chem. Inf. Comput. Sci. 2000, 40, 1235–1244.
  34. Agüero-Chapin, G.; Sánchez-Rodríguez, A.; Hidalgo-Yanes, P.I.; Pérez-Castillo, Y.; Molina-Ruiz, R.; Marchal, K.; Vasconcelos, V.; Antunes, A. An alignment-free approach for eukaryotic ITS2 annotation and phylogenetic inference. PLoS ONE 2011, 6, e26638.
  35. Agüero-Chapin, G.; Galpert, D.; Molina-Ruiz, R.; Ancede-Gallardo, E.; Pérez-Machado, G.; De la Riva, G.A.; Antunes, A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2020, 10, 26.
  36. Karamon, J.; Kochanowski, M.; Dąbrowska, J.; Sroka, J.; Różycki, M.; Bilska-Zając, E.; Cencek, T. Dynamics of Echinococcus multilocularis infection in red fox populations with high and low prevalence of this parasite in Poland (2007–2014). J. Vet. Res. 2015, 59, 213–217.
  37. Nahorski, W.L.; Knap, J.P.; Pawłowski, Z.S.; Krawczyk, M.; Polański, J.; Stefaniak, J.; Patkowski, W.; Szostakowska, B.; Pietkiewicz, H.; Grzeszczuk, A.; et al. Human alveolar echinococcosis in Poland: 1990–2011. PLoS Negl. Trop. Dis. 2013, 7, e1986.
  38. Karamon, J.; Stojecki, K.; Samorek-Pieróg, M.; Bilska-Zając, E.; Różycki, M.; Chmurzyńska, E.; Sroka, J.; Zdybel, J.; Cencek, T. Genetic diversity of Echinococcus multilocularis in red foxes in Poland: The first report of a haplotype of probable Asian origin. Folia Parasitol. 2017, 64, 007.
  39. Hofer, S.; Gloor, S.; Muller, U.; Mathis, A.; Hegglin, D.; Deplazes, P. High prevalence of Echinococcus multilocularis in urban red foxes (Vulpes vulpes) and voles (Arvicola terrestris) in the city of Zurich, Switzerland. Parasitology 2000, 120, 135–142.
  40. Nakao, M.; Xiao, N.; Okamoto, M.; Yanagida, T.; Sako, Y.; Ito, A. Geographic pattern of genetic variation in the fox tapeworm Echinococcus multilocularis. Parasitol. Int. 2009, 58, 384–389.
Figure 1. x 2 x 3 x 4 -graphs representing cob (top panel), nad2 (middle panel), and cox1 (bottom panel) genes. Notations: 14—Polish haplotype (sequences No. 14 in Table 1, Table 2 and Table 3); Slo—Slovakia; A-A—USA, Alaska (St. Lawrence Island); CHM—China (Inner Mongolia).
Figure 1. x 2 x 3 x 4 -graphs representing cob (top panel), nad2 (middle panel), and cox1 (bottom panel) genes. Notations: 14—Polish haplotype (sequences No. 14 in Table 1, Table 2 and Table 3); Slo—Slovakia; A-A—USA, Alaska (St. Lawrence Island); CHM—China (Inner Mongolia).
Life 12 00877 g001
Figure 2. Classification maps for cob gene: r k 4 D r l 4 D μ m (left panel) and μ k μ l r m 4 D (right panel); k , l , m = 1 , 2 , 3 , 4 . Colors: blue—Europe excluding Poland; red—Asia; green—America; black—Poland. Detailed notations: 1 , 2 , 15 —Polish haplotypes (Table 1); A-A—USA, Alaska (St. Lawrence Island); A-I—USA, Indiana; Aus—Austria; CHM—China (Inner Mongolia); Fra—France; Jap—Japan (Hokkaido); Kaz—Kazakhstan; Slo—Slovakia.
Figure 2. Classification maps for cob gene: r k 4 D r l 4 D μ m (left panel) and μ k μ l r m 4 D (right panel); k , l , m = 1 , 2 , 3 , 4 . Colors: blue—Europe excluding Poland; red—Asia; green—America; black—Poland. Detailed notations: 1 , 2 , 15 —Polish haplotypes (Table 1); A-A—USA, Alaska (St. Lawrence Island); A-I—USA, Indiana; Aus—Austria; CHM—China (Inner Mongolia); Fra—France; Jap—Japan (Hokkaido); Kaz—Kazakhstan; Slo—Slovakia.
Life 12 00877 g002
Figure 3. Classification maps for nad2 gene: r k 4 D r l 4 D μ m (left panel) and μ k μ l r m 4 D (right panel); k , l , m = 1 , 2 , 3 , 4 . Colors: blue—Europe excluding Poland; red—Asia; green—America; black—Poland. Detailed notations: 1 , 2 , 15 —Polish haplotypes (Table 2); A-A—USA, Alaska (St. Lawrence Island); A-I—USA (Indiana); Aus—Austria; CHM—China (Inner Mongolia); CHS—China (Sichuan); Fra—France; Jap—Japan (Hokkaido); Kaz—Kazakhstan; Slo—Slovakia.
Figure 3. Classification maps for nad2 gene: r k 4 D r l 4 D μ m (left panel) and μ k μ l r m 4 D (right panel); k , l , m = 1 , 2 , 3 , 4 . Colors: blue—Europe excluding Poland; red—Asia; green—America; black—Poland. Detailed notations: 1 , 2 , 15 —Polish haplotypes (Table 2); A-A—USA, Alaska (St. Lawrence Island); A-I—USA (Indiana); Aus—Austria; CHM—China (Inner Mongolia); CHS—China (Sichuan); Fra—France; Jap—Japan (Hokkaido); Kaz—Kazakhstan; Slo—Slovakia.
Life 12 00877 g003
Figure 4. Classification maps for the cox1 gene: $r_k^{4D} r_l^{4D} \mu_m$ (left panel) and $\mu_k \mu_l r_m^{4D}$ (right panel); $k, l, m = 1, 2, 3, 4$. Colors: blue = Europe excluding Poland; red = Asia; green = America; black = Poland. Detailed notations: $1, 2, \ldots, 15$ = Polish haplotypes (Table 3); A-A = USA, Alaska (St. Lawrence Island); A-I = USA (Indiana); Aus = Austria; CHM = China (Inner Mongolia); CHS = China (Sichuan); Fra = France; Jap = Japan (Hokkaido); Kaz = Kazakhstan; Slo = Slovakia.
Figure 5. Classification maps $\mu_1 \mu_2 \mu_4$ for the cob (top panel), nad2 (middle panel), and cox1 (bottom panel) genes. The colors and the detailed notations are the same as in Figures 2, 3, and 4.
Figure 6. Classification maps $\mu_2 \mu_3 \mu_4$ for the cob (top panel), nad2 (middle panel), and cox1 (bottom panel) genes. The colors and the detailed notations are the same as in Figures 2, 3, and 4.
Figure 7. Cluster dendrogram obtained using the Euclidean distance measure and the descriptors $r_3^{4D}$, $r_1^{4D}$, and $\mu_3$ for the cob gene (top left panel of Figure 2).
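How a dendrogram of this kind is assembled from Euclidean distances can be illustrated with a minimal sketch. The descriptor values $r_3^{4D}$, $r_1^{4D}$, and $\mu_3$ used for Figure 7 are not tabulated here, so the sketch uses the cob principal moments of Table 4 as stand-in coordinates, taken exactly as tabulated. The linkage rule is also an assumption: single linkage is chosen only for simplicity, and the function name `single_linkage` is hypothetical. The merge order it produces should not be read as a reproduction of Figure 7.

```python
import math
from itertools import combinations

# Stand-in data (assumption): cob principal moments as tabulated in Table 4,
# i.e. (I1/10^5, I2/10^5, I3/10^5, I4/10^2). Figure 7 itself uses other descriptors.
POINTS = {
    "Austria": (335.9, 335.8, 335.7, 325.5),
    "France": (336.6, 336.5, 336.4, 313.2),
    "Slovakia": (336.0, 335.9, 335.8, 325.7),
    "Kazakhstan": (338.5, 338.4, 338.3, 341.9),
    "Japan (Hokkaido)": (338.6, 338.5, 338.4, 359.2),
    "USA (Alaska)": (338.4, 338.4, 338.3, 328.7),
    "USA (Indiana)": (342.1, 342.1, 342.0, 281.8),
    "China (Mongolia)": (338.5, 338.5, 338.4, 298.7),
}


def single_linkage(points):
    """Agglomerative clustering with single linkage (assumed rule).

    Returns the merges in order as (members_a, members_b, distance);
    the distances are the heights at which a dendrogram would join branches.
    """
    clusters = [frozenset([name]) for name in points]
    merges = []

    def gap(pair):
        # Single linkage: smallest Euclidean distance between any two members.
        a, b = pair
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges


MERGES = single_linkage(POINTS)
for left, right, height in MERGES:
    print(f"{height:8.3f}  {left} + {right}")
```

Each printed line corresponds to one junction of a dendrogram, with the first column giving its height. A different linkage rule (complete, average, Ward) would generally give different heights and possibly a different tree.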
Figure 8. $x_2 x_3 x_4$-graphs representing the cob, nad2, and cox1 genes.
Table 1. Principal moments of inertia of 4D-dynamic graphs representing the cob gene for Poland ($N = 1068$).

| No. | Accession | Polish Haplotype | $I_1/10^5$ | $I_2/10^5$ | $I_3/10^5$ | $I_4/10^2$ |
|---|---|---|---|---|---|---|
| 1 | KY205662 | EmPL1 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 2 | KY205663 | EmPL2 cob_G | 335.9 | 335.8 | 335.7 | 325.5 |
| 3 | KY205664 | EmPL3 cob_E | 336.0 | 335.9 | 335.8 | 325.7 |
| 4 | KY205665 | EmPL4 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 5 | KY205666 | EmPL5 cob_D | 336.7 | 336.6 | 336.5 | 313.5 |
| 6 | KY205667 | EmPL6 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 7 | KY205668 | EmPL7 cob_H | 335.4 | 335.4 | 335.3 | 313.6 |
| 8 | KY205669 | EmPL8 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 9 | KY205670 | EmPL9 cob_B | 338.5 | 338.4 | 338.3 | 341.9 |
| 10 | KY205671 | EmPL10 cob_C | 335.9 | 335.8 | 335.7 | 333.7 |
| 11 | KY205672 | EmPL11 cob_F | 336.0 | 335.9 | 335.8 | 316.7 |
| 12 | KY205673 | EmPL12 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 13 | KY205674 | EmPL13 cob_A | 335.9 | 335.8 | 335.7 | 325.5 |
| 14 | KY205675 | EmPL14 cob_J | 336.2 | 336.1 | 336.0 | 332.8 |
| 15 | KY205676 | EmPL15 cob_I | 335.4 | 335.4 | 335.3 | 313.6 |
Table 2. Principal moments of inertia of 4D-dynamic graphs representing the nad2 gene for Poland ($N = 882$).

| No. | Accession | Polish Haplotype | $I_1/10^5$ | $I_2/10^5$ | $I_3/10^5$ | $I_4/10^2$ |
|---|---|---|---|---|---|---|
| 1 | KY205692 | EmPL1 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 2 | KY205693 | EmPL2 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 3 | KY205694 | EmPL3 nad_D | 207.8 | 207.8 | 207.7 | 189.3 |
| 4 | KY205695 | EmPL4 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 5 | KY205696 | EmPL5 nad_D | 207.8 | 207.8 | 207.7 | 189.3 |
| 6 | KY205697 | EmPL6 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 7 | KY205698 | EmPL7 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 8 | KY205699 | EmPL8 nad_C | 207.7 | 207.6 | 207.6 | 178.2 |
| 9 | KY205700 | EmPL9 nad_B | 208.4 | 208.4 | 208.3 | 176.9 |
| 10 | KY205701 | EmPL10 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
| 11 | KY205702 | EmPL11 nad_D | 207.8 | 207.8 | 207.7 | 189.3 |
| 12 | KY205703 | EmPL12 nad_C | 207.7 | 207.6 | 207.6 | 178.2 |
| 13 | KY205704 | EmPL13 nad_C | 207.7 | 207.6 | 207.6 | 178.2 |
| 14 | KY205705 | EmPL14 nad_D | 207.8 | 207.8 | 207.7 | 189.3 |
| 15 | KY205706 | EmPL15 nad_A | 207.7 | 207.7 | 207.6 | 181.5 |
Table 3. Principal moments of inertia of 4D-dynamic graphs representing the cox1 gene for Poland ($N = 1608$).

| No. | Accession | Polish Haplotype | $I_1/10^6$ | $I_2/10^6$ | $I_3/10^6$ | $I_4/10^3$ |
|---|---|---|---|---|---|---|
| 1 | KY205677 | EmPL1 cox_A | 117.1 | 117.1 | 117.0 | 132.2 |
| 2 | KY205678 | EmPL2 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 3 | KY205679 | EmPL3 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 4 | KY205680 | EmPL4 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 5 | KY205681 | EmPL5 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 6 | KY205682 | EmPL6 cox_C | 117.0 | 117.0 | 116.9 | 131.1 |
| 7 | KY205683 | EmPL7 cox_A | 117.1 | 117.1 | 117.0 | 132.2 |
| 8 | KY205684 | EmPL8 cox_D | 117.1 | 117.0 | 117.0 | 134.6 |
| 9 | KY205685 | EmPL9 cox_E | 117.3 | 117.3 | 117.2 | 140.4 |
| 10 | KY205686 | EmPL10 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 11 | KY205687 | EmPL11 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 12 | KY205688 | EmPL12 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
| 13 | KY205689 | EmPL13 cox_F | 117.2 | 117.2 | 117.1 | 138.9 |
| 14 | KY205690 | EmPL14 cox_G | 117.1 | 117.1 | 117.0 | 134.1 |
| 15 | KY205691 | EmPL15 cox_B | 117.1 | 117.1 | 117.0 | 134.1 |
Table 4. Principal moments of inertia of 4D-dynamic graphs representing the cob gene for different countries ($N = 1068$).

| No. | Accession | Country | $I_1/10^5$ | $I_2/10^5$ | $I_3/10^5$ | $I_4/10^2$ |
|---|---|---|---|---|---|---|
| 1 | AB461395 | Austria | 335.9 | 335.8 | 335.7 | 325.5 |
| 2 | AB461396 | France | 336.6 | 336.5 | 336.4 | 313.2 |
| 3 | AB461397 | Slovakia | 336.0 | 335.9 | 335.8 | 325.7 |
| 4 | AB461398 | Kazakhstan | 338.5 | 338.4 | 338.3 | 341.9 |
| 5 | AB461399 | Japan (Hokkaido) | 338.6 | 338.5 | 338.4 | 359.2 |
| 6 | AB461400 | USA (Alaska) | 338.4 | 338.4 | 338.3 | 328.7 |
| 7 | AB461401 | USA (Indiana) | 342.1 | 342.1 | 342.0 | 281.8 |
| 8 | AB461402 | China (Mongolia) | 338.5 | 338.5 | 338.4 | 298.7 |
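The tabulated moments for the cob gene can be compared directly between the Polish haplotypes (Table 1) and the reference sequences (Table 4). The sketch below is an illustration only, not the classification-map procedure of the paper, which works with the descriptors $r_k^{4D}$ and $\mu_k$. It finds, for a few Polish haplotypes, the reference entry at the smallest Euclidean distance in the space of tabulated values. The function name `nearest_reference` is hypothetical. One caveat is stated as an assumption of the sketch: the columns are used exactly as printed, so $I_4$ (scaled by $10^2$) and $I_1$ to $I_3$ (scaled by $10^5$) enter the distance on different footings.

```python
import math

# cob reference sequences, values copied from Table 4:
# (I1/10^5, I2/10^5, I3/10^5, I4/10^2)
REFERENCES = {
    "Austria": (335.9, 335.8, 335.7, 325.5),
    "France": (336.6, 336.5, 336.4, 313.2),
    "Slovakia": (336.0, 335.9, 335.8, 325.7),
    "Kazakhstan": (338.5, 338.4, 338.3, 341.9),
    "Japan (Hokkaido)": (338.6, 338.5, 338.4, 359.2),
    "USA (Alaska)": (338.4, 338.4, 338.3, 328.7),
    "USA (Indiana)": (342.1, 342.1, 342.0, 281.8),
    "China (Mongolia)": (338.5, 338.5, 338.4, 298.7),
}

# A few Polish cob haplotypes, values copied from Table 1.
POLISH = {
    "EmPL1 cob_A": (335.9, 335.8, 335.7, 325.5),
    "EmPL5 cob_D": (336.7, 336.6, 336.5, 313.5),
    "EmPL9 cob_B": (338.5, 338.4, 338.3, 341.9),
}


def nearest_reference(point, references):
    """Return (name, distance) of the reference closest to `point`.

    Distance is plain Euclidean distance on the tabulated (rescaled) values,
    which is a simplification and not the descriptor set used in the maps.
    """
    return min(
        ((name, math.dist(point, ref)) for name, ref in references.items()),
        key=lambda item: item[1],
    )


for haplotype, moments in POLISH.items():
    name, distance = nearest_reference(moments, REFERENCES)
    print(f"{haplotype}: nearest reference is {name} (distance {distance:.3f})")
```

In the tabulated values, EmPL9 cob_B coincides with the Kazakhstan entry and EmPL1 cob_A with the Austria entry, so both distances are zero at the printed precision.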
Table 5. Principal moments of inertia of 4D-dynamic graphs representing the nad2 gene for different countries ($N = 882$).

| No. | Accession | Country | $I_1/10^5$ | $I_2/10^5$ | $I_3/10^5$ | $I_4/10^2$ |
|---|---|---|---|---|---|---|
| 1 | AB461403 | Austria | 207.7 | 207.7 | 207.6 | 164.6 |
| 2 | AB461404 | France | 207.8 | 207.7 | 207.7 | 167.3 |
| 3 | AB461405 | Slovakia | 207.9 | 207.8 | 207.8 | 174.7 |
| 4 | AB461406 | Kazakhstan | 208.4 | 208.4 | 208.3 | 163.2 |
| 5 | AB461407 | Japan (Hokkaido) | 208.9 | 208.9 | 208.8 | 171.8 |
| 6 | AB461408 | China (Sichuan) | 209.0 | 208.9 | 208.8 | 162.1 |
| 7 | AB461409 | USA (Alaska) | 206.8 | 206.8 | 206.7 | 157.7 |
| 8 | AB461410 | USA (Indiana) | 207.7 | 207.6 | 207.6 | 161.7 |
| 9 | AB461411 | China (Mongolia) | 208.0 | 208.0 | 207.9 | 146.7 |
Table 6. Principal moments of inertia of 4D-dynamic graphs representing the cox1 gene for different countries ($N = 1608$).

| No. | Accession | Country | $I_1/10^6$ | $I_2/10^6$ | $I_3/10^6$ | $I_4/10^3$ |
|---|---|---|---|---|---|---|
| 1 | AB461412 | Austria | 117.0 | 117.0 | 116.9 | 139.0 |
| 2 | AB461413 | France | 117.3 | 117.3 | 117.2 | 134.3 |
| 3 | AB461414 | Slovakia | 117.1 | 117.1 | 117.0 | 134.1 |
| 4 | AB461415 | Kazakhstan | 117.4 | 117.4 | 117.3 | 142.8 |
| 5 | AB461416 | Japan (Hokkaido) | 117.3 | 117.3 | 117.2 | 138.8 |
| 6 | AB461417 | China (Sichuan) | 117.3 | 117.3 | 117.2 | 140.4 |
| 7 | AB461418 | USA (Alaska) | 117.3 | 117.3 | 117.2 | 145.2 |
| 8 | AB461419 | USA (Indiana) | 117.4 | 117.4 | 117.3 | 146.2 |
| 9 | AB461420 | China (Mongolia) | 117.1 | 117.1 | 117.0 | 137.9 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bielińska-Wąż, D.; Wąż, P.; Lass, A.; Karamon, J. 4D-Dynamic Representation of DNA/RNA Sequences: Studies on Genetic Diversity of Echinococcus multilocularis in Red Foxes in Poland. Life 2022, 12, 877. https://doi.org/10.3390/life12060877

