The arrangement of A, C, G and T nucleotides in large DNA sequences of many prokaryotic and eukaryotic cells exhibit long-range correlations with fractal properties. Chaos game representation (CGR) of such DNA sequences, followed by a multifractal analysis, is a useful way to analyze the corresponding scaling properties. This approach provides a powerful visualization method to characterize their spatial inhomogeneity, and allows discrimination between mono- and multifractal distributions. However, in some cases, two different arbitrary point distributions, may generate indistinguishable multifractal spectra. By using a new model based on multiplicative deterministic cascades, here it is shown that small-angle scattering (SAS) formalism can be used to address such issue, and to extract additional structural information. It is shown that the box-counting dimension given by multifractal spectra can be recovered from the scattering exponent of SAS intensity in the fractal region. This approach is illustrated for point distributions of CGR data corresponding to Escherichia coli
and Mouse mitochondrial
DNA, and it is shown that for the latter two cases, SAS allows extraction of the fractal iteration number and the scaling factor corresponding to “ACGT” square, or to recover the number of bases. The results are compared with a model based on multiplicative deterministic cascades, and respectively with one which takes into account the existence of forbidden sequences in DNA. This allows a classification of the DNA sequences in terms of random and deterministic fractals structures emerging in CGR.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited