Complex Network Characterization Using Graph Theory and Fractal Geometry: The Case Study of Lung Cancer DNA Sequences

Abstract: This paper discusses an approach developed for exploiting the local elementary movements of evolution to study complex networks in terms of shared common embedding and, consequently, shared fractal properties. This approach can be useful for the analysis of lung cancer DNA sequences and their properties by using the concepts of graph theory and fractal geometry. The proposed method advances a renewed consideration of network complexity on both local and global scales. Several researchers have illustrated the advantages of fractal mathematics, as well as its applicability to lung cancer research. Nevertheless, many researchers and clinicians remain unaware of its potential. Therefore, this paper aims to examine the underlying assumptions of fractals and to analyze the fractal dimension and related measurements for possible application to complex networks and, especially, to the lung cancer network. The strict relationship between the lung cancer network properties and the fractal dimension is proved. Results show that the fractal dimension decreases in the lung cancer network while its topological properties increase. Finally, the statistical and topological significance of the relationship between network complexity and the lung cancer network is shown.


Introduction
Theoretical models of complex networks have assumed a key role in numerous disciplines, ranging from computer science, physics, sociology, engineering, and medicine to molecular biology, population biology, and deoxyribonucleic acid (DNA) sequence analysis [1,2]. In the classical conception of DNA geometry, the double helix represents a ribbon constructed from smooth curves describing an idealized structure. How to read and recognize the primary structure of a DNA sequence is a fundamental problem. In recent years, DNA sequencing and fragment assembly have received much more specific attention, with the aim of improving the reconstruction of full strands of DNA from the recorded pieces of data. Fragment assembly from imperfect data sets constitutes one of the most interesting challenges for researchers. In this paper, concepts from graph theory and fractal geometry are proposed as a particular way of analyzing DNA sequences and their properties. Many scientific papers [3] dealing with DNA sequence analysis [4] have been published. However, the main aim of this work is to outline the advantage of applying graph theory and fractals in the study of lung cancer DNA sequences. This application reveals hidden geometries, which can help to show the structure of a DNA sequence more clearly. An important consideration of this research is that this geometry has fractal structures. In order to achieve this goal, the following basic steps must be observed: (1) converting DNA nucleotides into a new coordinate system; (2) presenting DNA nucleotides as a path in a graph in the new coordinate system; (3) connecting points into a continuous graph; (4) using a method for estimating the Hurst exponent H; (5) using fractal geometry to determine the complexity of the graph; (6) constructing visibility graphs; and (7) calculating statistical and topological properties of the DNA network.
In this perspective, this paper introduces an initial presentation of the concepts of graph theory, fractals, and pattern recognition for their possible use in the calculation of statistical and topological properties of the DNA network and the characterization of DNA sequences in lung cancer.
The study of graph theory [5,6] was much elaborated in the twentieth century when the use of modeling techniques was implemented, and it has grown significantly to be routinely applied in various branches of science and engineering. Obviously, as special applications have multiplied in number and scope, the theory has developed considerably. It is known in mathematics that through graphs, complex relationships between many objects are commonly modelled. It is right to state that a graph is defined by a connection of nodes (also known as vertices) and edges. As an abstract data point, a node could represent anything, like nucleotides of DNA sequences. The application of mathematics to systems (such as a database of DNA sequences) with a large number of vertices and edges seems possible now as computers can be effectively adopted to provide solutions to such large graphs.
Fractals [7][8][9][10] are complex patterns that exhibit self-similarity across different scales and are characterized by a fractal dimension. As mathematical descriptions of natural shapes, they are models generated by equations resulting in chaotic systems. In mathematical visualization, they are created from simple patterns that repeat over and over in a recursive process whose irregularity is very difficult to describe using classical geometry. Fractals can be thought of as images of dynamic systems. What makes fractal geometry more interesting for many scientists is its useful application in medicine, biology, and geology. In biology, chaotic systems can be used to show the rhythms of heartbeats, walking strides, and even the biological changes of aging. Fractals can be used to model the structures of nerve networks, circulatory systems, lungs, and even DNA [11]. Fractal geometry has been demonstrated to be a powerful tool for solving important problems in applied science [12]. Many more physical fractal systems are rapidly being identified. The use of concepts of fractal geometry in life sciences has contributed significantly to understanding the complexity and topological properties of networks that characterize DNA sequences.
Pattern recognition [13] involves the evolution of systems that are oriented to find a solution to a given problem by using a set of example instances represented by a number of features. Statistical pattern recognition basically treats the problem of the automatic identification of objects belonging to one of the possible classes. Pattern recognition, dealing with identifying regularities in data with algorithms that learn to solve a problem using a limited set of measurement data, is closely related to informatics education. Statistical DNA pattern recognition is a technique based on finding genetic code in the DNA sequence and has applications in forensics, genetic engineering [14], bioinformatics [15], DNA nanotechnology, history, and so on.
Hypoxia-inducible factor 1α (HIF1A) [16] is the master transcriptional regulator of the cellular and systemic homeostatic response to hypoxia, governing the execution of the program that makes cells capable of withstanding hypoxic stress [17]. As a protein-coding gene, HIF1A encodes the oxygen-sensitive subunit of the hypoxia-inducible factor (HIF), which is active under hypoxic conditions and plays an essential role in tumors. It functions by mediating transcription of over 40 genes involved in survival, glucose metabolism, invasion, metastasis, and angiogenesis (e.g., VEGF). Two subunits constitute the hypoxia-inducible factor 1 (HIF1) transcription factor as a heterodimer: hypoxia-inducible factor 1, α subunit (basic helix-loop-helix transcription factor) (HIF1A/HIF-1α), and aryl hydrocarbon receptor nuclear translocator (ARNT/HIF-1β). Although it is synthesized continuously, the HIF1A protein is degraded swiftly by the ubiquitin-proteasome system under normal oxygen concentrations (normoxia) [18]. An adaptive response to hypoxia is driven by activating the expression of genes that regulate erythropoiesis, angiogenesis, and glycolysis [19]. HIF1 is overexpressed in several cancers, is often connected to poor prognosis, and is considered an interesting target for pharmacological manipulation [20][21][22].
Globally, lung cancer [23] has been one of the most common types of cancer since 1985 in terms of incidence and mortality rates. Two general types of lung cancer are recognized: non-small cell lung cancer (NSCLC) [24][25][26][27][28][29][30] and small cell lung cancer (SCLC). Both start growing in the lungs: the most common type is NSCLC, while SCLC grows and spreads faster than NSCLC. The first application of massively parallel sequencing in lung cancer research was published in 2008; more recently, paired NSCLC and normal lung tissue from a never-smoking patient with adenocarcinoma was examined in [31].
Hypermethylation of cytosine guanine (CpG) islands [32] located in the promoter regions of tumor suppressor genes is now regarded as an important mechanism for gene inactivation. CpG island hypermethylation has been described in almost every tumor type. The idea that CpG island promoter hypermethylation is a mechanism for inactivating genes in cancer was not fully restored until 1994, following the discovery that the von Hippel-Lindau (VHL) gene also undergoes methylation-associated inactivation [33].
Therefore, in this study, the application of a novel method is proposed, focusing on network and graph theory, fractal geometry, and statistical pattern recognition, to define the prognostic value of HIF-1α expression in surgically treated lung cancer patients. The paper is structured into four sections, excluding the introduction: In the second section, the proposed method is illustrated, providing a definition of the selected concepts considered as useful tools for the analysis of DNA sequences in relation to the carcinoma of the lung; in Section 3, the results obtained in terms of the main differences of the topological and statistical properties between lung cancer and non-cancer DNA networks are presented; Section 4 is a discussion on the functional role of HIF1A with respect to the connection between networks and fractals in DNA sequencing; and, finally, the conclusions of this study are provided.

Methodology
The approach that has guided the development of this method focuses on the interaction between some theoretical concepts and assumptions that we consider relevant to the case of carcinoma of the lung. This section is thus divided into five subsections where these concepts are discussed. In the first subsection, we present carcinoma of the lung. In Section 2.2, a definition of the basic notions of graph theory and fractal geometry is provided. In Section 2.3, we describe the proposed method for DNA sequencing, which is based on graph theory and fractal geometry. Section 2.4 presents statistical DNA pattern recognition. The topological properties of DNA graphs are also presented. The last subsection concerns data preparation and application using DNA data from the Homo sapiens HIF1A base.

Carcinoma of the Lung
Carcinoma of the lung is the leading cause of cancer death worldwide. Hypermethylation of CpG islands in the promoter regions of genes occurs readily in lung cancer, as shown by investigating the methylation status of over 40 genes from lung cancer tumors, cell lines, patient sputum, and/or serum. A remaining challenge is to harness the power of methylation signatures, which involves recognizing possible markers, testing these markers through case-control association studies and prospective trials [34], and developing statistical tools for the analysis of complex methylation data [35]. The fact that lung cancer is a heterogeneous disease implies that effective lung cancer biomarkers will probably need to address patient-specific molecular defects, clinical characteristics, and elements of the tumor microenvironment. In [36], many emerging clinical bioinformatics-based strategies specifically related to lung cancer were studied. In [37], the authors suggested a new model based on epigenomics data with the aim of predicting transcriptome-level differential gene expression in lung cancers. Dropping feature sets by data type shows that CpG methylation features have a significant role in the prediction. In eukaryotic cells, DNA methylation occurs at the 5-carbon position of the cytosine residue in the cytosine guanine (CpG) dinucleotide context. In [38], the authors developed a computational model to detect CpG islands that are methylated in colon cancer and unmethylated in normal cells. They created a highly accurate prediction model for those CpG islands where methylation differentiation is involved in colon cancer, which was assessed by using extensive cross-validation and generalization testing experiments.
In [39], authors integrated the ordering networks, classification information, and pathway database to develop the topology-based pathway analysis for identifying cancer class-specific pathways, which might be essential in the biological significance of cancer. In our research, we present a new method using a hybrid system of two mathematical disciplines of fractal geometry and network theory. With a new topology classification and a new approach to determining complexity by using fractal geometry of a DNA network, we can accurately distinguish lung cancer DNA sequences from non-cancer DNA sequences.

Graph Theory and Fractal Geometry
First, let us introduce a few concepts from graph theory. A graph G = (V, E) consists of a set V of nodes (or vertices) and a set E of edges between these nodes, where each edge is an unordered pair of nodes. The number of nodes in a graph is the size of the graph. Nodes that are connected by edges are called neighbors and the number of neighbors a node has is called a node degree. We denote with N(v) the neighbors of v and with d(v) = |N(v)| the degree of the node v. Graphs are the main objects studied in graph theory.
In practice, graphs represent real-life objects and the relations among them. Statistical properties of a graph are often calculated as well, such as the clustering coefficient, which measures how strongly nodes tend to cluster together. For a graph G = (V, E), the local clustering coefficient of a node v is defined as

$$C(v) = \frac{2\,\left|\{\{u, w\} \in E : u, w \in N(v)\}\right|}{d(v)\,(d(v) - 1)},$$

and the global clustering coefficient of a graph, as defined by Watts and Strogatz, is the network average clustering coefficient, i.e.,

$$C(G) = \frac{1}{|V|} \sum_{v \in V} C(v).$$

Now, let us introduce fractal geometry, the branch of geometry that studies fractals. It offers a plethora of tools for describing and predicting natural phenomena by exploiting iterated mathematical concepts, such as complex number fractals, iterated function systems, and cellular automata. Perhaps the two best-known complex number fractals are the Julia and Mandelbrot sets. The latter is described by the recursive formula

$$z_{n+1} = z_n^2 + c,$$

where z_0 = 0 and c = x + y·i is an initial complex number; here, i denotes the imaginary unit.
Repeating the calculation of z_n and observing the values results in a fractal. In particular, we observe whether z_n remains near the origin or escapes to infinity. We say that a point c is a member of the Mandelbrot set if z_n remains near the origin. The process may be repeated for every point in the complex plane. Moreover, the set is often represented graphically by coloring the points depending on the result. Such representations exhibit one of the main properties of fractals, namely, self-similarity.
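The membership test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `in_mandelbrot`, the iteration cap `max_iter`, and the escape radius of 2 are choices made here, not part of the text, and a finite iteration cap can only approximate true membership.

```python
def in_mandelbrot(c: complex, max_iter: int = 100, escape_radius: float = 2.0) -> bool:
    """Approximate Mandelbrot membership: True if z_n stays bounded for max_iter steps."""
    z = 0j  # z_0 = 0
    for _ in range(max_iter):
        z = z * z + c  # z_{n+1} = z_n^2 + c
        if abs(z) > escape_radius:
            return False  # the orbit escapes, so c is outside the set
    return True  # the orbit stayed near the origin for every tested step


if __name__ == "__main__":
    for point in (0j, -1 + 0j, 1j, 1 + 0j, 0.5 + 0j):
        print(point, in_mandelbrot(point))
```

Evaluating the function on a grid of points c and coloring each point by the result gives the familiar picture of the set.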
Another important property of fractals is their fractal dimension, which describes the capacity of the fractal pattern to fill space: a fractal scales differently from the space in which it is embedded. It is common for a fractal to exhibit a non-integer fractal dimension.
One way to calculate a fractal dimension D is by relating it to the Hurst exponent H; in particular, D = 2 − H. The Hurst exponent is defined for a time series as

$$\mathbb{E}\left[\frac{R(n)}{S(n)}\right] = C\,n^{H} \quad \text{as } n \to \infty,$$

where R(n) is the range of the first n cumulative deviations from the mean, S(n) is their standard deviation, n is the time span of the observation, and C is a constant.
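As a short worked step, taking logarithms of the power-law relation turns the estimation of H into fitting a straight line. The numerical values of H below are illustrative assumptions only, not results from this study.

```latex
% Log-linear form: H is the slope of log E[R(n)/S(n)] plotted against log n.
\log \mathbb{E}\!\left[\frac{R(n)}{S(n)}\right] = \log C + H \log n

% Illustrative values (assumed, not measured):
% H = 0.5  =>  D = 2 - 0.5 = 1.5
% H = 0.7  =>  D = 2 - 0.7 = 1.3
```

A larger Hurst exponent therefore corresponds to a smaller fractal dimension, that is, to a smoother graph.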

A New Method for DNA Sequencing
We consider various DNA base sequences. For example, the coding sequence of the first exon of the human (Homo sapiens) β-globin gene (92 bases) is:

ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGAGTAAGTTGGTGGTGAGGCCCTGGGCAG
For each nucleotide of the DNA sequence, we place a point on an Archimedean spiral. The nucleotides are represented in the polar coordinate system, a two-dimensional coordinate system in which each point in the plane is determined by a distance r from the origin of the system and an angle ϕ.
Firstly, we determine the total number n ∈ N of nucleotides (the human sequence above has 92 bases, so n = 92). Then, we determine the numbers of the individual nucleotides n(A), n(C), n(G), and n(T) (in the human sequence, there are 17 nucleotides of A, 19 of C, 36 of G, and 20 of T). After determining these counts, we calculate an angle ϕi for each nucleotide i ∈ {A, C, G, T}. In Figure 1, the nucleotides of a DNA sequence on an Archimedean spiral are presented.
Secondly, we transform the Archimedean (polar) coordinates into Cartesian coordinates. The Archimedean spiral is given by

$$r = a + b\,\phi. \tag{1}$$

In Equation (1), we set a = 1 and b = 1 (any a and b can be used). Thus, r = 1 + ϕ and

$$(r, \phi) \to \big((1 + \phi)\cos\phi,\ (1 + \phi)\sin\phi\big).$$

In Figure 2, the transformation of Archimedean coordinates to Cartesian coordinates is presented.
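The counting step and the spiral-to-Cartesian transformation can be sketched as follows. The helper names `nucleotide_counts` and `spiral_to_cartesian` are hypothetical, and the angles in `EXAMPLE_ANGLES` are placeholder values chosen only to exercise the transformation; they are an assumption and are not the angles ϕi used by the method.

```python
import math
from collections import Counter

# First exon of the human beta-globin gene as quoted in the text (92 bases).
SEQUENCE = (
    "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG"
    "GAGTAAGTTGGTGGTGAGGCCCTGGGCAG"
)


def nucleotide_counts(sequence: str) -> dict:
    """Return n(A), n(C), n(G), and n(T) for a DNA sequence."""
    counts = Counter(sequence)
    return {base: counts[base] for base in "ACGT"}


def spiral_to_cartesian(phi: float, a: float = 1.0, b: float = 1.0) -> tuple:
    """Map an angle on the Archimedean spiral r = a + b*phi to Cartesian (x, y)."""
    r = a + b * phi
    return (r * math.cos(phi), r * math.sin(phi))


# Placeholder angles (assumption for illustration only).
EXAMPLE_ANGLES = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}

if __name__ == "__main__":
    print(len(SEQUENCE), nucleotide_counts(SEQUENCE))
    for base, phi in EXAMPLE_ANGLES.items():
        x, y = spiral_to_cartesian(phi)
        print(f"{base}: phi={phi:.4f} -> x={x:.4f}, y={y:.4f}")
```

With a = b = 1, the angle ϕ = 0 maps to the point (1, 0), which is the start of the spiral.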
As we have four different angles ϕi, i ∈ {A, C, G, T}, namely ϕ(A), ϕ(C), ϕ(G), and ϕ(T), the transformation to Cartesian coordinates is applied for each of them.
In the fourth step, we estimate the complexity of the network. The coordinates (r, ϕ) of the DNA sequence determine the DNA network, and the nucleotides of the sequence are the nodes of the network. We find how many nodes lie on a line with equal ϕ in the DNA network, where 0 < ϕ < 2π. For each ϕi, where i ∈ N, we find j nodes, where j ∈ N (Figure 3). Firstly, nodes that have equal ϕ are connected together sequentially. Secondly, each nucleotide C is connected with G in CpG islands in the DNA sequence (Figure 4).
In the next step, we determine the sequence (ϕi, j), where i, j ∈ N. The sequence (ϕi, j) defines the graph Γ(ϕi, j) shown in Figure 5.
Fractal mining is characterized by statistical properties that differ from conventional ones. It can present a probability distribution function, a slowly decaying autocorrelation function, and a power spectrum function of 1/f type. It can also show statistical dependence, either long-range or short-range, and global or local self-similarity. When self-similar, it is invariant with respect to the scale adopted to observe the data set. It is assumed that fractals have a scaling property that is displayed at every scale used to look at them, whereas in many data sets self-similarity is recognized only over a range of scales. The fractal dimension is useful for measuring self-similarity. As a relevant aspect of many complex systems, the fractal dimension can serve as a powerful representation technique. The DNA graph Γ(ϕi, j) of the sequence (ϕi, j) does not have self-similarity but statistical self-affinity; that is, the graph Γ(ϕi, j) is a statistically self-affine function.
Let Z be the function obtained by resampling a function Y(x): Z(x) = Y(ax), where a is the repeated sampling interval (2, 3, 4). As Y is a statistically self-affine function, it satisfies, in the statistical sense,

$$Y(x) \simeq a^{-H}\,Y(ax),$$

and this is the affine relation, in which H is the Hurst exponent. For the graph Γ(ϕi, j), we estimate the Hurst exponent H [40] with the R/S method [41]. After estimating the Hurst exponent H, we calculate the fractal dimension D = 2 − H, which determines the complexity of the DNA network.
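A minimal sketch of a rescaled-range (R/S) estimate of H, followed by D = 2 − H, is given below. It is one common textbook variant and not necessarily the exact procedure of [41]: the doubling window sizes, the population standard deviation, the least-squares fit, and the names `rescaled_range`, `hurst_rs`, and `fractal_dimension` are all assumptions made here. The white-noise input only demonstrates the call; in the method, the input would be the series of values taken from the graph Γ(ϕi, j).

```python
import math
import random


def rescaled_range(values):
    """R/S of one window: range of cumulative mean deviations over the std. deviation."""
    n = len(values)
    mean = sum(values) / n
    cumulative, running = [], 0.0
    for v in values:
        running += v - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return spread / std if std > 0 else 0.0


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    n = len(series)
    log_sizes, log_rs = [], []
    window = min_window
    while window <= n // 2:
        chunks = [series[i:i + window] for i in range(0, n - window + 1, window)]
        ratios = [r for r in (rescaled_range(c) for c in chunks) if r > 0]
        if ratios:
            log_sizes.append(math.log(window))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        window *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for an R/S estimate")
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_sizes)
    return num / den


def fractal_dimension(hurst):
    """Fractal dimension of a self-affine graph, D = 2 - H."""
    return 2.0 - hurst


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
h = hurst_rs(noise)
print(f"H = {h:.3f}, D = {fractal_dimension(h):.3f}")
```

For uncorrelated noise, the estimate is expected to fall near 0.5, with a known upward bias for short windows.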

Statistical DNA Pattern Recognition
Attributing relevance to different network properties allows the study of the internal organization of a biological network, the repartition of molecules among cellular processes, and the structure of DNA sequences. The fundamental properties that will be analyzed in DNA networks are briefly listed below. Firstly, we calculate the topological properties of cancer and non-cancer DNA networks: density, average degree, Zagreb group index 1, Zagreb group index 2, the Platt index, number of edges, triads of type 3-102, triads of type 16-300, the Watts-Strogatz clustering coefficient, and fractal dimension. Then, we calculate the statistical properties of the graphs Γ(ϕi, j) of the sequence (ϕi, j) for cancer and non-cancer DNA sequences: standard deviation, standard error, variance, mean absolute deviation, coefficient of variation, coefficient of dispersion, and Pearson's contingency coefficient. For calculating topological properties, we use the program Pajek [42].
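For readers who want to check a few of these indices by hand, the following sketch computes density, average degree, both Zagreb indices, and the Watts-Strogatz clustering coefficient for a small undirected graph. It assumes a simple graph (no loops, no multiple edges) and uses the standard textbook definitions; it does not reproduce Pajek, whose conventions may differ. The function names and the example edge list are illustrative assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors} from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def topological_properties(edges):
    """Compute basic indices of a simple undirected graph with at least two nodes."""
    adjacency = build_adjacency(edges)
    unique_edges = {frozenset(edge) for edge in edges}
    n, m = len(adjacency), len(unique_edges)
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}

    def local_clustering(v):
        d = degree[v]
        if d < 2:
            return 0.0  # convention: nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, w in combinations(adjacency[v], 2) if w in adjacency[u])
        return 2.0 * links / (d * (d - 1))

    zagreb_2 = 0
    for edge in unique_edges:
        u, v = tuple(edge)
        zagreb_2 += degree[u] * degree[v]

    return {
        "density": 2.0 * m / (n * (n - 1)),
        "average_degree": 2.0 * m / n,
        "zagreb_1": sum(d * d for d in degree.values()),
        "zagreb_2": zagreb_2,
        "clustering": sum(local_clustering(v) for v in adjacency) / n,
    }


# Example graph (illustrative): a triangle 0-1-2 with a pendant node 3 attached to 2.
EXAMPLE_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
props = topological_properties(EXAMPLE_EDGES)
print(props)
```

In the example, the degrees are 2, 2, 3, and 1, so the first Zagreb index is 4 + 4 + 9 + 1 = 18.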

Data Preparation-Application
We use the Homo sapiens HIF1A base. HIF-1α is encoded by the HIF1A and Hif1a loci, which map to regions of conserved synteny on human chromosome 14q21-q24 and mouse chromosome 12, respectively [43]. HIF1A consists of 15 exons and 14 introns. All the 5′ and 3′ splice junctions of the HIF1A gene conform to established consensus sequences [44]. Each of the 14 introns of the HIF1A gene interrupts the coding sequence at the same location as in the mouse Hif1a gene [45]. The length of introns between these exons on DNA is long enough to avoid DNA amplification. We use methods of statistical DNA pattern recognition for cancer and non-cancer DNA sequences for all exons, introns, and for 5′ flanking sequences and 3′ flanking sequences. Figure 6 presents the structure of the human HIF1A gene.

Results
Complexity in biological systems is predicted by describing interactions amongst different DNA structures. Graphs are emerging as indispensable tools in explaining how the DNA structure functions. We calculated topological properties that describe lung gene networks. In Table 1, we present the SNP ID/substitution, their genomic location, and association with diseases and phenotypes. In Table 2, we show the topological and statistical properties of the non-cancer DNA network. We calculate the total adjacency index, Zagreb index 1, Platt index, density, average degree, and fractal dimension of each DNA network. In Table 3, we show the topological and statistical properties of the lung cancer DNA network, for which we calculate the same properties as for the non-cancer DNA network. In Table 4, the statistical properties of the graphs Γ(ϕi, j) of the sequence (ϕi, j) from non-cancer DNA sequences are calculated: standard deviation, standard error, variance, mean absolute deviation, coefficient of variation, coefficient of dispersion, and Pearson's contingency coefficient. In Table 5, we show the same statistical properties of the graphs Γ(ϕi, j) of the sequence (ϕi, j) from lung cancer DNA sequences. Figure 7 compares the graphs Γ(ϕi, j) of the sequence (ϕi, j) from non-cancer and cancer DNA sequences for rs10873142 [C > T]. Identical results were also obtained when comparing the graphs Γ(ϕi, j) from non-cancer and cancer DNA sequences for rs41508050 [C > T] and rs10645014 [C > T].
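Several of the statistical properties listed for Tables 4 and 5 can be computed with the Python standard library, as sketched below. The sample (n − 1) formulas for variance and standard deviation and the definition of the coefficient of variation as standard deviation over mean are assumptions made here; the coefficient of dispersion and Pearson's contingency coefficient are left out because the text does not state which definitions were used. The input values are made up for illustration and are not data from the study.

```python
import math
import statistics


def descriptive_statistics(values):
    """Return basic dispersion measures of a numeric series (sample formulas)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    std_dev = math.sqrt(variance)
    return {
        "standard_deviation": std_dev,
        "standard_error": std_dev / math.sqrt(n),
        "variance": variance,
        "mean_absolute_deviation": sum(abs(v - mean) for v in values) / n,
        "coefficient_of_variation": std_dev / mean,  # undefined for a zero mean
    }


# Illustrative values only (not data from the study).
EXAMPLE_VALUES = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = descriptive_statistics(EXAMPLE_VALUES)
print(stats)
```

Applying the same function to the series from a cancer and a non-cancer sequence gives directly comparable dispersion measures.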

Discussion
This study investigated the functional role of HIF1A between exons and introns and the connection between these polymorphisms and lung cancer risk by using fractal geometry and graph theory. In network analysis applied to biology, the overall structure of the networks has turned out to be far from random; on the contrary, it appears to be very complex. A more sophisticated application of the theory of random graphs to the study of biological networks has been advancing [46][47][48][49][50]. This is needed in order to establish null models for assessing the statistical significance of subgraphs, paths, patterns, and motifs that are found in biological networks. The average probability that two genes are associated with cancer or with another disease has not been observed to be uniform. We need to be able to distinguish observed patterns and subgraphs from those that occur with a high probability in a random graph, under a biologically appropriate model of randomness. The overexpression of the complex structure of HIF1A mRNA in cancer is also correlated with cancer aggressiveness [51,52]. Perez's interdisciplinary school for the study of recursive systems [53] demonstrated that DNA is fractal at the DNA, codon, full chromosome set, and whole genome levels. Perez [54] presented a new bioinformatics bridge between genomics and mathematics. According to the "Universal Fractal Genome Code Law," the frequency of each of the 64 codons across the entire human genome is determined by the codon's position in the Universal Genetic Code table. In this paper, we discussed that the distribution of nucleotides along the sequence follows some hidden mathematical and geometrical rules of fractals that depend on the biological activity, if we use the presented method for DNA sequencing [55][56][57][58].
In this study, we investigated the relationship between eight HIF-1 alpha polymorphisms , and rs10645014 [C > T] among lung cancer patients by using a new method for the determination of the complexity of the DNA network. We calculated the total adjacency index, Zagreb index 1, Platt index, density, average degree, and fractal dimension by DNA network. A significant relationship was found in topological properties between the lung cancer and non-lung cancer DNA networks.
There was a significant difference in statistical properties between the lung cancer and non-lung cancer DNA networks. The same topological and statistical properties were calculated for both networks. Our results are indirectly supported by a study showing that the topological properties were increased in the lung cancer DNA network.
As a key network parameter, network density measures the proportion of the potential connections in a network that are actual connections. A potential connection is one that could exist between two nodes, regardless of whether it actually does. Measuring the density of a network gives a ready index of the degree of dyadic connection in the network. Our results showed that density increased in the lung cancer DNA network.
Network density and average degree differ in that the average degree emphasizes the factors influencing the number of links, while the network density reflects the factors that affect the number of nodes. The average degree increased in the lung cancer DNA network.
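As a minimal sketch of the two quantities discussed above, the following Python snippet computes density as the fraction of possible node pairs that are actually linked, and average degree as 2m/n, for a simple undirected graph without self-loops. The function name `density_and_average_degree` and the toy edge list are illustrative assumptions and are not taken from the DNA networks analyzed in this study.

```python
def density_and_average_degree(n_nodes, edges):
    """Return (density, average degree) of a simple undirected graph.

    Assumes no self-loops; duplicate or reversed edges are counted once.
    """
    m = len({frozenset(e) for e in edges})      # number of distinct edges
    possible = n_nodes * (n_nodes - 1) / 2      # number of potential connections
    density = m / possible if possible else 0.0
    avg_degree = 2 * m / n_nodes if n_nodes else 0.0
    return density, avg_degree


# Toy example: the path graph 0-1-2-3 (4 nodes, 3 edges)
print(density_and_average_degree(4, [(0, 1), (1, 2), (2, 3)]))  # (0.5, 1.5)
```

For a fixed number of nodes n, the two quantities are related by density = average degree / (n - 1), so they carry different information only when networks of different sizes are compared.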
When the Zagreb index was evaluated, the vascular network was more similar to the Erdős-Rényi graph than to the irregular bounded valence graph. The first Zagreb index M1 is one of the oldest and best-known topological molecular structure descriptors, defined as the sum of squares of the degrees of the vertices [59]. In our research, Zagreb index 1 increased in the lung cancer network.
The second Zagreb index M2 is defined as the sum of the products of the degrees of pairs of adjacent vertices of the underlying network (graph). In our research, Zagreb index 2 also increased in the lung cancer network.
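The two definitions can be illustrated with a short Python sketch. It assumes a simple undirected graph given as an edge list with each edge listed once; the function name `zagreb_indices` and the example graphs are hypothetical and serve only to make the formulas concrete.

```python
from collections import Counter


def zagreb_indices(edges):
    """Return (M1, M2) for a simple undirected graph given as an edge list.

    M1 = sum over vertices of deg(v)^2
    M2 = sum over edges (u, v) of deg(u) * deg(v)
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m1 = sum(d * d for d in deg.values())
    m2 = sum(deg[u] * deg[v] for u, v in edges)
    return m1, m2


path = [(0, 1), (1, 2), (2, 3)]   # degrees 1, 2, 2, 1
star = [(0, 1), (0, 2), (0, 3)]   # degrees 3, 1, 1, 1
print(zagreb_indices(path))  # (10, 8)
print(zagreb_indices(star))  # (12, 9)
```

Both toy graphs have three edges, yet the star, with its single high-degree vertex, scores higher on both indices, which is the sense in which the Zagreb indices respond to degree heterogeneity.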
The importance of two-bond molecular fragments for the properties of chemical compounds is easy to recognize. In chemical graph theory, the total number of these fragments constitutes Platt's index [60]. In general, Platt's index can be considered a more specific complexity measure than the number of edges. For the same number of edges, Platt's index increases rapidly when complexifying factors such as branches and cycles are present. Nevertheless, analysis in chemical graph theory shows that Platt's index still fails to mirror some complex structural patterns, and better measures need to be found. In our study, the lung cancer network is associated with an increasing Platt index.
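The sensitivity of Platt's index to branching and cycles at a fixed number of edges can be checked on small graphs. The sketch below assumes the common definition of the Platt index as the sum of edge degrees, deg(u) + deg(v) - 2 over all edges, which counts each two-bond fragment twice; whether this normalization matches the one used for the tables in this study is an assumption. The function name `platt_index` is illustrative.

```python
from collections import Counter


def platt_index(edges):
    """Sum over edges of the edge degree deg(u) + deg(v) - 2.

    Assumes a simple undirected graph with each edge listed once.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(deg[u] + deg[v] - 2 for u, v in edges)


# Three graphs with the same number of edges (3)
path = [(0, 1), (1, 2), (2, 3)]       # unbranched, acyclic
star = [(0, 1), (0, 2), (0, 3)]       # one branching vertex
triangle = [(0, 1), (1, 2), (0, 2)]   # one cycle
print(platt_index(path), platt_index(star), platt_index(triangle))  # 4 6 6
```

Under this definition the index equals M1 - 2m, where M1 is the first Zagreb index and m the number of edges, so the two descriptors are closely related.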
The total adjacency index gives the number of edges of a graph. A significant difference in statistical and topological properties was found between the lung cancer and non-lung cancer DNA networks. Moreover, the total adjacency index increased in the lung cancer network.
Since 1986, statistical methods have been developed for probability distributions of graphs related to triad or triplet counts, complemented with star counts and nodal variables. Moreover, it has recently become clear that adequate modeling of empirical network data requires the inclusion of higher-order configurations (subgraphs with more nodes). In our study, the lung cancer network is associated with an increasing number of triads of types 3-102 and 16-300.
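To make the triad types concrete, the following Python sketch counts triads in an undirected graph by brute force. It assumes that each undirected edge is treated as a mutual tie, so that only four of the sixteen standard triad types can occur: 003 (no edges), 102 (one edge), 201 (two edges), and 300 (a triangle). The function name `undirected_triad_census` and the toy graph are illustrative assumptions.

```python
from itertools import combinations


def undirected_triad_census(nodes, edges):
    """Count node triples by the number of edges they contain.

    Labels follow the MAN triad notation restricted to mutual ties:
    0 edges -> "003", 1 -> "102", 2 -> "201", 3 -> "300".
    Brute force over all triples, so suitable for small graphs only.
    """
    edge_set = {frozenset(e) for e in edges}
    labels = ["003", "102", "201", "300"]
    counts = {label: 0 for label in labels}
    for trio in combinations(nodes, 3):
        k = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        counts[labels[k]] += 1
    return counts


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
census = undirected_triad_census(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
print(census)  # {'003': 0, '102': 1, '201': 2, '300': 1}
```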
The clustering coefficient of an undirected graph is a measure of the number of triangles in the graph. It is computed as the probability that edges are connected to form triangles; in other words, it measures the proportion of incident edge pairs that are completed by a third edge to form a triangle. The Watts-Strogatz model can replicate a wide range of clustering coefficients and shortest path lengths simultaneously, but it is less effective at producing the observed types of node degree distributions. Nodes tend to attach to popular nodes; popularity is attractive. In our study, the lung cancer network is associated with an increasing Watts-Strogatz clustering coefficient.
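A minimal Python sketch of the Watts-Strogatz clustering coefficient follows. It assumes the usual definition as the mean of the local coefficients, where the local coefficient of a node is the fraction of pairs of its neighbors that are themselves linked, and it adopts the convention that nodes with fewer than two neighbors contribute zero. The function name `average_clustering` is illustrative, and nodes that appear in no edge are ignored.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient of a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # convention: such nodes contribute 0 to the mean
        # number of neighbor pairs that are closed into a triangle
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(round(average_clustering(edges), 4))  # 0.5833
```

In the toy graph the local coefficients are 1, 1, 1/3, and 0, giving a mean of 7/12.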
The goal of our research was to study network science from a mathematical point of view, especially concerning the connection between networks and fractals. In [61], the authors describe the architectural organization and associated emergent topological properties of gene regulatory networks (GRNs) that describe protein-DNA interactions (PDIs) in several model eukaryotes.
Characterization of the dimensionality of complex networks was first introduced by Csányi [62] and was further developed by Gastner and Newman [63]. In addition, the paper proposes a new method for determination of the network complexity. This new method meaningfully favors data analysis on the one hand, and perhaps more importantly, it permits the estimation of the dimension of a network without having global information of its geometric structure. In our study of the fractal dimension of DNA network conditions, we showed that the fractal dimensions decreased in the lung cancer DNA network.
We compared the statistical properties of graphs Γ(ϕi, j) of sequence (ϕi, j) from non-cancer DNA sequences and cancer DNA sequences, which are presented in Tables 4 and 5.
The standard deviation is well known to be a good measure of dispersion. Nevertheless, it is strongly influenced by outlying observations, as well as by the overall shape of the distribution. When the assumption of normality is valid, the confidence intervals for the mean, variance, and standard deviation are valid as well. The standard error of the mean is determined by the sample standard deviation and the sample size; it is the estimated standard deviation of the distribution of sample means for an infinite population. Another measure of dispersion is the sample variance, s², which measures how much the data spread around the mean and is defined as an average of the squared deviations from the mean. The mean absolute deviation is less affected by outliers than the standard deviation, because the differences from the mean are not squared. As a relative measure of dispersion, the coefficient of variation is most often used to compare the amount of variation in two samples. The coefficient of dispersion is a further relative measure of spread. Pearson's contingency coefficient is a measure of association independent of sample size, which ranges between 0 (no relationship) and 1 (perfect relationship). For any particular table, the maximum possible value depends on the size of the table, so it should only be used to compare tables with the same dimensions. In our study, the lung cancer network is associated with increasing statistical properties of graphs Γ(ϕi, j) of sequence (ϕi, j) from lung cancer DNA sequences.
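The dispersion measures described above can be sketched with the Python standard library. The snippet assumes the sample (n - 1) convention for the variance and standard deviation and computes the mean absolute deviation around the mean; the function name `dispersion_summary` and the example values are illustrative and are not data from Tables 4 and 5. The coefficient of dispersion and Pearson's contingency coefficient are omitted because the text does not fix their exact definitions.

```python
import math
import statistics


def dispersion_summary(values):
    """Return common dispersion measures for a sample of at least two values."""
    n = len(values)
    mean = statistics.mean(values)
    var = statistics.variance(values)          # sample variance s^2 (n - 1 denominator)
    sd = math.sqrt(var)                        # sample standard deviation
    return {
        "variance": var,
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),        # standard error of the mean
        "mean_abs_dev": sum(abs(x - mean) for x in values) / n,
        "coeff_variation": sd / mean,          # undefined when the mean is 0
    }


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
for name, value in dispersion_summary(sample).items():
    print(f"{name}: {value:.4f}")
```

Because the mean absolute deviation uses unsquared differences, replacing the largest value in `sample` with an extreme outlier changes it less, in relative terms, than it changes the standard deviation.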
Our work also raises the important question of how the fractal dimension of the DNA network correlates with the statistical properties of the cancer DNA network. The complexity of small networks nested within large networks is precisely what fractal geometry addresses, and its key quantity is the fractal dimension, which describes the complexity of DNA geometric structures.
These observations can be developed further by examining other studies [64,65], which in future work will allow us to outline the advantages of this study for the vitality of this research field.

Conclusions
This study addresses the potential of an applied analytical method for the prediction of lung cancer. The new method presented in this paper is based on a hybrid of two mathematical disciplines, fractal geometry and network theory, and is intended for application in bioinformatics and, especially, for the analysis of DNA structures. With a new topology classification and a new approach to determining complexity by using the fractal geometry of the DNA network, we can accurately distinguish lung cancer DNA sequences from non-cancer DNA sequences. Connecting graph theory and fractal geometry in the proposed method represents the first of many steps that can be taken to simplify the analysis of DNA sequencing. Compared to the method proposed in [66], our method is more rigorous, more independent, and more useful. Baish et al. used the box-counting method to estimate the fractal dimension from pictures. A critical point is that many published results claiming fractal behavior of natural objects may be flawed by poor attention to detail and by the omission of the multiple checks that have proved necessary. In our method, we obtain the fractal dimension directly from DNA sequences. From a conceptual and computational point of view, the approach is simple and can, therefore, be very useful in many fields of bioinformatics. The proposed approach appears advantageous because it allows visual inspection of the data and identification of major similarities among different DNA sequences. We believe that the analysis of motif profiles in complex network models can be a point of departure for open and challenging problems in biology. As future work, we aim to improve this research through a comparative study in order to enhance the accuracy of the investigation technique.
In the future, we aim to validate the results against other research and to improve the adopted method through experimental evaluation and numerical assessment of DNA sequencing. Further comparison with the methods used in other studies could improve data processing and analysis of DNA structures in lung cancer networks as well as in other types of cancer. With future improvements, the proposed method could also be extended, with robust performance, from bioinformatics to other research areas such as public transport, transit systems, communication systems, robot systems, and motion.