Molecular Sticker Model Stimulation on Silicon for a Maximum Clique Problem

Molecular computers (also called DNA computers), as an alternative to traditional electronic computers, are smaller and more energy efficient, and have massive parallel processing capacity. However, DNA computers may not outperform electronic computers owing to their higher error rates and the limitations of biological laboratory techniques. The sticker model, a typical DNA-based computing model, is computationally complete and universal, and can be viewed as a bit-vertically operating machine. This makes it attractive for silicon implementation. Inspired by the way the sticker computer processes information, we propose a novel parallel computing model called DEM (DNA Electronic Computing Model) on a System-on-a-Programmable-Chip (SOPC) architecture. Apart from the difference in computing medium (transistor chips rather than bio-molecules), the DEM works like a DNA computer in its massively parallel information processing. Additionally, a plasma display panel (PDP) is used to show how the solutions change, which lets us see the distribution of assignments directly. The feasibility of the DEM is tested by applying it to a maximum clique problem (MCP) with eight vertices. Owing to the limited computing resources of the SOPC architecture, the DEM is restricted to moderate-size problems, which it solves in polynomial time.


Introduction
DNA computers use DNA strands as the physical substrate in which information is represented, and the information is manipulated mainly through a set of operations on these strands. Their main advantages are the enormous storage capacity of DNA molecules and the immense parallelism of biochemical reactions. The computational power of DNA was first demonstrated by solving a seven-vertex instance of the Hamiltonian path problem, which is nondeterministic polynomial (NP)-complete [1]. Since then, many DNA-based models have been proposed, such as the sticker model [2], the splicing model [3][4][5], the hairpin model [6], the plasmid model [7,8], and the self-assembly model [9][10][11]. Most are parallel filtering models that start from a large combinatorial library of candidate solutions for the problem in question. These models are computationally complete and universal, and many NP-complete problems can be solved with them in polynomial time using exponential space [12][13][14][15][16].
However, DNA-based computers have limitations in convergence speed, adaptability, and effectiveness because of biochemical laboratory tools, molecule encoding, detection technology, and so on. Chang [17][18][19][20][21][22][23] proposed a biological computer inspired by molecular computing and solved several complex problems with it. The sticker model, a typical DNA-based computer, uses simple data structures called memory complexes, each composed of a single-stranded DNA molecule and its associated stickers. Following the principle of Watson-Crick complementarity, stickers can be annealed to or removed from the memory complexes to realize bit-vertical operations. Reference [24] described a sticker model for the maximum clique problem and its implementation on a field programmable gate array (FPGA) architecture.
Inspired by the way the sticker model processes information in parallel, we proposed a fundamental theoretical parallel model in previous work [25,26], where we theoretically solved the Boolean satisfiability problem and the 0-1 knapsack problem. In this paper, we realize that theoretical model and propose a novel parallel computing algorithm on a System-on-a-Programmable-Chip (SOPC) architecture. We first introduce the sticker model and then realize the DNA Electronic Computing Model (DEM) on the SOPC architecture. An experiment solving a maximum clique problem shows the feasibility and practical value of the model.

Results
The DEM uses a System-on-a-Programmable-Chip (SOPC) architecture in which a custom intellectual property (IP) soft core is embedded in an FPGA. The scan-driving controller and the address-driving controller are packaged as IP components of SOPC Builder. The PDP is treated as an external I/O device, and the instruction controller is implemented on the Nios II processor.
The experimental platform uses the CORE4E-6DF development kit, based on an Altera Cyclone IV FPGA, as the core board; its main FPGA chip is the EP4CE6F17C8N from Altera's low-cost Cyclone IV family. The target screen is a two-electrode PDP. On this platform, an 8-vertex maximum clique problem (MCP) is taken as an example.

The Procedure of Experiment
The undirected graph in Figure 1 has 8 vertices and 12 edges. The scan-driving controller for the graph has 8 input ports, x11, x10, x21, x20, x31, x30, x41, x40, and 16 output ports Xi, for i from 0 to 15. The address-driving controller has 8 input ports, x51, x50, x61, x60, x71, x70, x81, x80, and 16 output ports Yi, for i from 0 to 15. First, different grey scales are defined for the different numbers of edges, giving the sub-graphs different colors (Figure 2A). According to Equations (5) and (6), the relationship between the input and output signals of the scan-driving and address-driving controllers is shown in Table 1.
Then, the MCP solution space is obtained in 2M steps, where M is the number of edges in the graph. Take the edge E1, whose endpoints are x1 and x3, as an example. All sub-graphs with x1 = 1 and x3 = 1, regardless of the other variables, should be set to 1 in the first subfield, and all remaining sub-graphs should be set to 0 in this subfield. For convenience, all sub-graphs are first set to 0, and then the related sub-graphs are set to 1 in the first subfield. In four-letter logic, x1 = 10, x3 = 10, and the other variables are 11. Here, xi1xi0 denotes xi, so x11 = 1, x10 = 0, x31 = 1, and x30 = 0. According to Table 1, all addressing signals are 1, and the scanning signals X10, X11, X14, and X15 are 1; that is, lines 10, 11, 14, and 15 are all set to 1 in the first subfield. The result is displayed in Figure 2B. With M edges in the graph, all sub-graphs can be shown in different colors according to their number of edges after 2M steps. Figure 2C depicts the result. In the third step, all invalid sub-graphs are selected and set to 1 in the last subfield, one step per edge of the complementary graph. Table 2 details the operation steps of the MCP algorithm for Figure 1: steps 1 to 12, each consisting of two sub-steps, generate the MCP solution space, and steps 13 to 28 eliminate all infeasible solutions.
Table 2. Operation steps of the MCP algorithm for the graph in Figure 1.

Time Complexity of the Algorithm
The operation time complexity of the above algorithm is the number of operations needed to solve the MCP. The algorithm includes four main steps, and the first step requires no operation. The operation time depends not only on the number of vertices N but also on the number of edges M. Let the edge density be $\beta = M / C_N^2$, where M is the number of edges and $C_N^2 = N(N-1)/2$ is the maximum number of edges in a graph with N vertices.

Step 2 takes 2M operations and step 3 takes $C_N^2 - M$ operations (one per edge of the complementary graph), so the operation time is
$$T = 2M + (C_N^2 - M) = C_N^2 + M = (1 + \beta)\,C_N^2.$$
Hence the maximum clique problem for G can be solved with $O(N^2)$ operational time complexity. Figure 3 shows that the runtime grows polynomially with the number of vertices. For a fixed number of vertices, the operation time increases as the edge density β increases from 0 (smallest) to 1 (largest). This reflects the decisive advantage of the DEM: the exponential growth in time is shifted into space, and the runtime is polynomial.
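The operation counts behind this analysis can be checked with a few lines of Python. This is a sketch that assumes, as in the experiment section and Table 2, two sub-steps per edge in step 2 and one elimination step per edge of the complementary graph in step 3; the function name `dem_operation_count` is hypothetical.

```python
from math import comb


def dem_operation_count(N: int, M: int) -> tuple:
    """(step-2 operations, step-3 operations, total) for N vertices and M edges."""
    max_edges = comb(N, 2)
    if not 0 <= M <= max_edges:
        raise ValueError("M must lie between 0 and C(N, 2)")
    step2 = 2 * M            # two sub-steps per edge: clear, then set
    step3 = max_edges - M    # one step per edge of the complementary graph
    return step2, step3, step2 + step3


print(dem_operation_count(8, 12))  # (24, 16, 40)
```

For the graph of Figure 1 this reproduces the 12 two-sub-step steps and the 16 elimination steps (steps 13 to 28) of Table 2. Since the total never exceeds $2C_N^2 = N(N-1)$, it is $O(N^2)$.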

Space Complexity of the Algorithm
As it stands, to solve the maximum clique problem the DEM needs $2^N$ SOPC memory units to store the information and $2^N$ PDP pixels to show how the solutions change, where N is the number of vertices. The main weakness of this library is that the memory required to represent all solutions grows exponentially with the size of the problem. Consequently, large instances of NP-complete problems generally cannot be solved with the DEM on an SOPC architecture.
Reference [10] introduced small combinatorial input libraries, which may suggest ways to deal with the exponential growth of memory with problem size. Reference [24] introduced a heuristic algorithm to cope with limited FPGA memory; however, the cliques it finds are not guaranteed to be maximum. Other methods to overcome this limitation of the DEM will be studied in future research.

Comparison with Previous Studies
Many methods have been proposed for solving the MCP. It is difficult to compare the DEM implemented on an SOPC architecture with existing algorithms directly, because most of them are software methods run on different CPUs. However, operational time complexity can be used to compare their time efficiency.
Reference [27] introduced a sticker model based on an FPGA architecture. Its algorithm has a time complexity of $O(n^2)$ for the k-clique problem, so it needs $O(n^3)$ time to solve the MCP.
Reference [28] presented an algorithm for the MCP based on DNA computation, with a computation complexity of $O(n^2 - |E|)$, where |E| is the number of edges in the graph.
In reference [29], the maximum clique problem for G can be solved with $O(n^3)$ operational time complexity and $O(n^3)$ time complexity, where n is the number of edges of the complement of G.
In the DEM, the exponential growth in time is shifted entirely into the spatial dimension, and the MCP is solved with $O(N^2)$ operations. Time efficiency is therefore the outstanding advantage of the DEM over the other methods.

Representation of Information in the Sticker Model
It is generally accepted that the Watson-Crick complementarity principle plays an essential role in the massively parallel information processing of DNA-based computers. A DNA molecule is composed of four nucleotides: adenine, cytosine, guanine, and thymine, or A, C, G, and T for short. When two single-stranded oligonucleotides anneal to form a double-stranded molecule, these nucleotides always pair as A-T and C-G (the so-called Watson-Crick complementarity). Mathematically, DNA computers therefore use the four-letter alphabet {A, T, C, G} to encode information.
The typical sticker model [2] uses a memory strand N bases long, subdivided into K non-overlapping regions, each M bases long (thus N ≥ MK). Each region is identified with exactly one bit position (equivalently, one Boolean variable), and the regions are significantly different from each other. Each sticker is M bases long and is complementary to one and only one of the K memory regions. If a sticker is annealed to its matching region on a given memory strand, the bit corresponding to that region is on, or 1, for that strand. If no sticker is annealed to a region, that region's bit is off, or 0. In effect, memory strands serve as registers, and stickers are used to write and erase information in them. Figure 4 shows memory strands and associated stickers representing a bit string.
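As an illustration only, a memory complex can be modeled in Python as a strand of K regions plus the set of regions that currently carry a sticker; reading the complex gives the bit string it represents. The class and method names are hypothetical, and regions are indexed from 0 rather than 1.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryComplex:
    """A memory strand split into k regions, plus the regions covered by stickers."""

    k: int                                       # number of regions (bits)
    stickers: set = field(default_factory=set)   # indices of annealed regions

    def anneal(self, region: int) -> None:
        """Anneal the sticker for `region`, turning that bit on."""
        if not 0 <= region < self.k:
            raise IndexError(f"region {region} outside 0..{self.k - 1}")
        self.stickers.add(region)

    def remove(self, region: int) -> None:
        """Remove the sticker for `region` if present, turning that bit off."""
        self.stickers.discard(region)

    def bits(self) -> str:
        """Bit string read left to right: '1' where a sticker is annealed."""
        return "".join("1" if r in self.stickers else "0" for r in range(self.k))


m = MemoryComplex(k=5)
m.anneal(0)
m.anneal(3)
print(m.bits())  # 10010
```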

Operations on Sets of Strings
A DNA-based computer realizes computation through biochemical operations. There are four principal operations on sets of bit strings: combine two sets of strings into one new set, separate one set of strings into two new sets according to whether the i-th bit is on or off, and set or clear the i-th bit of every string in a set.
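The four operations can be sketched in Python by modeling a tube as a multiset of bit strings (`collections.Counter`), so that duplicate strands are preserved. This is a software abstraction under that assumption, not a model of the chemistry; bit positions are indexed from 0, and the function names are hypothetical.

```python
from collections import Counter


def combine(a: Counter, b: Counter) -> Counter:
    """Pour two tubes together into one new tube."""
    return a + b


def separate(tube: Counter, i: int) -> tuple:
    """Split a tube into (strings with bit i on, strings with bit i off)."""
    on, off = Counter(), Counter()
    for s, n in tube.items():
        (on if s[i] == "1" else off)[s] += n
    return on, off


def _write(tube: Counter, i: int, value: str) -> Counter:
    """Overwrite bit i of every string with `value`, keeping multiplicities."""
    out = Counter()
    for s, n in tube.items():
        out[s[:i] + value + s[i + 1:]] += n
    return out


def set_bit(tube: Counter, i: int) -> Counter:
    """Turn bit i on in every string (anneal sticker i everywhere)."""
    return _write(tube, i, "1")


def clear_bit(tube: Counter, i: int) -> Counter:
    """Turn bit i off in every string (remove sticker i everywhere)."""
    return _write(tube, i, "0")


tube = Counter(["00", "01", "10", "11"])
on, off = separate(tube, 0)
print(sorted(on), sorted(off))   # ['10', '11'] ['00', '01']
print(combine(on, off) == tube)  # True
```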

Representation of Information in the DEM
The DEM represents information with the binary code of a memory address (the memory address string) and a content string. The memory address string is subdivided into L bits, each identified with exactly one Boolean variable during the course of computation, so each bit of the binary code can be labeled as a variable. These L variables, x1, x2, ···, xL, can serve as the variables of NP-complete or other complex combinatorial problems. The content string, of length K, is used for internal calculations.
Suppose that the binary code of i is i1i2···iL; then the value of variable xj is ij, for j from 1 to L. For example, the binary code of the 5th memory address string is 000101, so x4 and x6 are 1 and the other variables are 0. Conversely, when x1 through x6 are 0, 0, 0, 1, 0, 1, respectively, the 5th memory is selected. The memory address string, labeled Xi, is literally the address of a memory and is similar in some respects to the address signal in electronic computers.
When Xi is set to 1, the i-th memory is selected and can be read or written. The i-th content string is Di, which can be written as d1d2···dK. The i-th information representation string is displayed explicitly in Figure 5.
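The address-to-variable mapping described above can be expressed as a pair of small Python functions. Following the 000101 example, x1 is taken as the most significant bit; the function names are hypothetical.

```python
def address_to_vars(i: int, L: int) -> list:
    """Return [x1, ..., xL] for memory address i, with x1 as the most significant bit."""
    if not 0 <= i < 2 ** L:
        raise ValueError(f"address {i} out of range for L = {L}")
    return [(i >> (L - j)) & 1 for j in range(1, L + 1)]


def vars_to_address(xs: list) -> int:
    """Inverse mapping: the memory address selected by the assignment x1..xL."""
    address = 0
    for x in xs:
        address = (address << 1) | x
    return address


print(address_to_vars(5, 6))                # [0, 0, 0, 1, 0, 1]
print(vars_to_address([0, 0, 0, 1, 0, 1]))  # 5
```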

Processing of Information
To simulate the representation and massively parallel processing of information, the DEM uses a four-letter input alphabet K = {φ, 0, 1, *} to encode the memory address string. Note that the memory address string itself is a binary code; in other words, the output alphabet is {0, 1}. The input symbols are related by inclusion: * includes both 0 and 1, and φ includes neither. If the input address string is "0*11", it can be separated into the 3rd (0011) and 7th (0111) strings according to this relationship; conversely, the strings "0011" and "0111" can be combined into "0*11". Consequently, X3 and X7 are set to 1, and the corresponding memories are selected. The input address strings can be regarded as double-stranded molecules in DNA computers, and the selected memory address strings as single-stranded molecules.
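The inclusion relationship can be sketched in Python by expanding an input address string into the decimal addresses of every memory it selects. Treating φ as admitting neither digit is an assumption, consistent with its two-bit code 00; the names `ADMITS` and `selected_addresses` are hypothetical.

```python
from itertools import product

# Binary digits each input letter admits (φ admitting none is an assumption).
ADMITS = {"φ": "", "0": "0", "1": "1", "*": "01"}


def selected_addresses(pattern: str) -> list:
    """Decimal addresses of all memories whose binary code is included in `pattern`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    choices = [ADMITS[c] for c in pattern]
    # product() yields nothing if any letter admits no digit, so φ selects nothing.
    return sorted(int("".join(bits), 2) for bits in product(*choices))


print(selected_addresses("0*11"))       # [3, 7]
print(len(selected_addresses("****")))  # 16
```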
Figure 6 compares the processing of information in a DNA computer and in the DEM. Figure 6A shows how information is processed in the DEM: combined address strings are shown in yellow, and single address strings in white. Figure 6B shows the processing of information in a DNA-based computer, where the yellow components represent double-stranded molecules. In the DEM, all 16 memories are selected if the input is "****", while in a DNA computer, all single-stranded molecules can bind their complementary molecules simultaneously during annealing. Both approaches can therefore select many memory addresses for reading or writing in a single step, which enables parallel processing of massive amounts of information.
Note that the input memory address string uses the four-letter alphabet, while the output memory address string uses only 0 and 1. The i-th memory is selected only when the binary code of its address is included in the input string. This defines a mapping from an input memory address string of length L to the $2^L$ memory address strings.
Formally, let the input address string be V = v1v2···vL and let the binary representation of i be i1i2···iL; then Xi = 1 if and only if, for every j, vj = ij or vj = *. When L is 4, there are 16 memory address strings. If the input address string is V = 0*11, the output memory address string X0X1···X15 is 0001000100000000; that is, X3 and X7 are 1, and memory logic units 3 and 7 are activated. Because electronic computers use two-letter logic (0 and 1), in practice 00 expresses φ, 01 denotes 0, 10 denotes 1, and 11 denotes *. The logical input address string "0*11" is thus translated into 01111010.
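The two-bit encoding just described is a fixed lookup table, so translating between logical and physical address strings is a one-line mapping in each direction. A minimal Python sketch, with hypothetical function names:

```python
# Two-bit physical codes for the four logical letters, as listed in the text.
ENCODE = {"φ": "00", "0": "01", "1": "10", "*": "11"}
DECODE = {code: letter for letter, code in ENCODE.items()}


def to_two_bit(pattern: str) -> str:
    """Translate a four-letter address string into the two-bit form fed to the ports."""
    return "".join(ENCODE[c] for c in pattern)


def from_two_bit(bits: str) -> str:
    """Inverse translation; `bits` must have even length."""
    if len(bits) % 2:
        raise ValueError("two-bit string must have even length")
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


print(to_two_bit("0*11"))        # 01111010
print(from_two_bit("01111010"))  # 0*11
```

Read as a pair xj1xj0, the first bit records whether digit 1 is admitted and the second whether digit 0 is admitted, which is what allows the controllers to decode the pairs with plain AND gates.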

Computation of DEM
The DEM has two special states, SW and SR. SW is the state of simultaneous writing, and SR is the state of simultaneous reading. In the SW state, the DEM writes data into all selected memories in one step; in the SR state, it reads data out of the selected memories. The data read is * if the selected memories contain both 0 and 1, and 0 or 1 if they contain only 0 or only 1.
Computation is implemented in the DEM as follows:
Step 1: A finite set of memories is selected according to the mapping relationship.
Step 2: Manipulate(S, V, d, i), where S is the state and V is the input address string. If the operation is SW, the DEM writes d ∈ {0, 1} into the i-th bit of every selected memory.
If the operation is SR, the DEM scans the selected memories and reads the i-th bit of their contents into d (0, 1, or *, as described above).
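The two-step computation can be sketched as a toy software model in Python, where `write` and `read` play the roles of Manipulate(S, V, d, i) with S = SW and S = SR. The class, its method names, and the 0-based bit index are assumptions for illustration; returning `None` when no memory is selected is also an assumption, since the text does not cover that case.

```python
from itertools import product
from typing import List, Optional

ADMITS = {"φ": "", "0": "0", "1": "1", "*": "01"}  # digits each input letter admits


class DEM:
    """Toy software model: 2**L memories, each holding a K-bit content string."""

    def __init__(self, L: int, K: int) -> None:
        if L < 1 or K < 1:
            raise ValueError("L and K must be positive")
        self.L = L
        self.memory: List[List[int]] = [[0] * K for _ in range(2 ** L)]

    def select(self, V: str) -> List[int]:
        """Step 1: addresses whose binary code is included in the input string V."""
        if len(V) != self.L:
            raise ValueError(f"V must have length {self.L}")
        return [int("".join(b), 2) for b in product(*(ADMITS[c] for c in V))]

    def write(self, V: str, i: int, d: int) -> None:
        """SW: write d into bit i of every selected memory in one step."""
        if d not in (0, 1):
            raise ValueError("d must be 0 or 1")
        for a in self.select(V):
            self.memory[a][i] = d

    def read(self, V: str, i: int) -> Optional[str]:
        """SR: '0' or '1' if all selected memories agree on bit i, '*' if they differ."""
        values = {self.memory[a][i] for a in self.select(V)}
        if not values:
            return None  # nothing selected (e.g. V contains φ)
        return "*" if len(values) == 2 else str(values.pop())


dem = DEM(L=4, K=2)
dem.write("0*11", 0, 1)     # sets bit 0 of memories 3 and 7
print(dem.read("0*11", 0))  # 1
print(dem.read("0**1", 0))  # *
print(dem.read("1***", 0))  # 0
```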

Example of DEM Computing
A maximum clique problem (MCP) is used as a benchmark problem to illustrate the power of the operations defined above.
Let G = (V, E) be an undirected graph with N vertices and M edges, where V = {v1, v2, ···, vN} is the vertex set of G and E ⊆ V × V is the edge set of G. A graph is complete if all its vertices are pairwise adjacent, and a clique of G is a subset of vertices that induces a complete subgraph. The maximum clique problem asks for a clique of maximum cardinality. For instance, the graph in Figure 1 has exactly one maximum clique, {x1, x4, x6, x7}. Finding a maximum clique in a graph is NP-hard, and its decision version is NP-complete. The problem is important because it appears in many real-world applications, and many important intractable problems, for example, the Boolean satisfiability problem and the vertex cover problem, turn out to be reducible to the MCP. Two vertices are adjacent in the complementary graph if and only if they are not adjacent in G, so two vertices joined by an edge of the complementary graph cannot both be 1 (that is, both belong to the same clique) at the same time.
The DEM algorithm for the MCP is as follows:
Step 1: Define the information representation of the graph.
Step 2: Generate the solution space (N, M), in which N is the number of vertices and M is the number of edges.
Step 3: Eliminate the invalid sub-graphs.
Step 4: Find the maximum clique.
The first step is realized by constructing the information representation of all $2^N$ candidate solutions for a given graph with N vertices. Every candidate sub-graph is denoted by a binary number, in which 1 stands for a vertex in the sub-graph and 0 for a vertex outside it. For instance, the maximum clique {x1, x4, x6, x7} is written as 10010110. The binary codes of the memory address strings serve as this set of $2^N$ binary numbers.
To solve the MCP, we use the M-bit content string E1E2···EM to indicate which edges are included in the sub-graph. The content string of the clique {x1, x4, x6, x7} is 010110000111, because the clique has the edges E2, E4, E5, E10, E11, and E12. The algorithm thus yields information representations of the form X1X2···X8|E1E2···E12, and the clique {x1, x4, x6, x7} is denoted as 10010110010110000111.
In the second step, we obtain all solutions, in which the binary address codes represent the vertices and the binary content codes represent the edges. To construct the content string for every address string, we first initialize the i-th bit of every content string to 0. The i-th bit of a content string is 1 only when the two endpoints xi1 and xi2 of edge Ei are both 1, so this bit can be set to 1 in one step by selecting all memory addresses with xi1 = 1 and xi2 = 1.
In the third step, sub-graphs containing two vertices xi1 and xi2 that are not adjacent in G (that is, adjacent in the complementary graph) are invalid. The elimination procedure loops over i from 1 to $C_N^2 - M$, where $C_N^2$ is the maximum number of edges of a graph with N vertices; in each step, all sub-graphs containing both endpoints of the i-th edge of the complementary graph are selected and marked invalid. After these three steps, the maximum cliques are the remaining valid sub-graphs with the most edges, that is, the memory address strings whose content strings contain the most 1s.
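The four steps can be simulated end to end in Python. This sketch is self-contained and makes several assumptions: each content string holds bits E1..EM followed by one extra invalid flag (the "last subfield"), every pattern write counts as one parallel operation, step 4 is done as a plain software scan, and the 5-vertex graph is hypothetical rather than the graph of Figure 1. All function names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def pattern_for(n: int, ones: Tuple[int, int]) -> str:
    """Address pattern with '1' at the given (1-based) vertices and '*' elsewhere."""
    return "".join("1" if v in ones else "*" for v in range(1, n + 1))


def matches(address: str, pattern: str) -> bool:
    return all(p in ("*", a) for a, p in zip(address, pattern))


def dem_max_clique(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[str], int]:
    """Return (maximum-clique address strings, number of parallel write operations)."""
    m = len(edges)
    addresses = [format(a, f"0{n}b") for a in range(2 ** n)]  # leftmost bit is x1
    content = {a: [0] * (m + 1) for a in addresses}  # bits E1..EM, then invalid flag
    ops = 0

    def write(pattern: str, bit: int, value: int) -> None:
        nonlocal ops
        ops += 1  # one SW step, however many memories it touches
        for a in addresses:
            if matches(a, pattern):
                content[a][bit] = value

    # Step 2: two sub-steps per edge: clear bit k everywhere, then set it in
    # every sub-graph containing both endpoints of edge k.
    for k, (u, v) in enumerate(edges):
        write("*" * n, k, 0)
        write(pattern_for(n, (u, v)), k, 1)

    # Step 3: for each edge of the complementary graph, flag every sub-graph
    # containing both of its endpoints as invalid.
    edge_set = {frozenset(e) for e in edges}
    for u, v in combinations(range(1, n + 1), 2):
        if frozenset((u, v)) not in edge_set:
            write(pattern_for(n, (u, v)), m, 1)

    # Step 4: among the valid sub-graphs, keep those with the most edges.
    valid = [a for a in addresses if content[a][m] == 0]
    best = max(sum(content[a][:m]) for a in valid)
    return [a for a in valid if sum(content[a][:m]) == best], ops


# Hypothetical 5-vertex graph (not the graph of Figure 1).
cliques, ops = dem_max_clique(5, [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
print(cliques, ops)  # ['11100'] 15
```

The answer 11100 is the clique {x1, x2, x3}, and the 15 operations are 2M = 10 for step 2 plus $C_5^2 - M = 5$ for step 3.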

Architecture of DEM
Functionally, the DEM is made up of three main parts: the PDP display model, the control model, and the result-analyzing model, as shown in Figure

Plasma Display Panel (PDP) Display Model
In bio-molecular computing, extracting the true solutions from the solution space (that is, detecting a solution) remains a complex issue. Detecting correct solutions in polynomial time is key to removing this obstacle to the development of DNA-based computers. For an NP-complete problem, if the false, true, and optimal solutions could be displayed directly in different colors, solution detection would become straightforward.
The PDP display model scans the whole panel once per subfield according to the input data, and all subfields together constitute the final picture. The binary codes of the display pixel addresses serve as memory address strings representing the solution space, and the contents of the display pixels serve as content strings representing the edges in the MCP.
Through progressive scanning, the address display period-separated subfield method (ADS) can display image data in different grey scales within one field. Each edge is treated as a subfield, so there are M subfields. A pixel is lit in a subfield when the associated edge exists in its sub-graph, and the grey scale of the pixel is the sum over these subfields, that is, the number of edges in the associated sub-graph. Next, the sub-graphs that contain both endpoints of an edge of the complementary graph must be deleted. Note that in principle these selected sub-graphs are set to red to mark them as infeasible in step 3. The brightest valid pixels correspond to the maximum cliques. Figure 8 shows the address display period-separated subfield method of the DEM for the MCP. Each of the subfields SP1 to SPM is operated in one step, and the subfield SP(M+1) is used to mark the invalid sub-graphs.
When N is even, N = 2n (n = 1, 2, 3, 4, ···), the display resolution of the PDP is $2^n \times 2^n$; that is, each line has $2^n$ pixels, and there are $2^n$ lines. When N is odd, N = 2n + 1 (n = 1, 2, 3, 4, ···), the display resolution is $2^n \times 2^{n+1}$; that is, each line has $2^{n+1}$ pixels, and there are $2^n$ lines.
Figure 8. Address display period-separated subfield method (ADS) of DEM for maximum clique problem (MCP). Yellow marks the preparation period, green the address period, and purple the display period.
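The panel-size rule and the per-pixel grey level can be sketched in Python as follows. The function names are hypothetical, and returning the string "red" for an invalid pixel is only a stand-in for the red marking described in the text.

```python
def pdp_resolution(N: int) -> tuple:
    """Return (number of lines, pixels per line) for N variables, per the even/odd rule."""
    if N < 2:
        raise ValueError("the rule is stated for N >= 2")
    n = N // 2
    if N % 2 == 0:
        return 2 ** n, 2 ** n
    return 2 ** n, 2 ** (n + 1)


def pixel_state(edge_bits: str, invalid: bool):
    """Grey level = number of edge subfields in which the pixel is lit; invalid pixels are red."""
    return "red" if invalid else edge_bits.count("1")


print(pdp_resolution(8))                     # (16, 16)
print(pdp_resolution(7))                     # (8, 16)
print(pixel_state("010110000111", False))    # 6
```

In both the even and odd cases the panel has exactly $2^N$ pixels, one per candidate sub-graph.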

Control Model
The Control Model is made up of three parts: the instruction-controlling model, address-driving controller (ADC), and scan-driving controller (SDC).
To make them suitable for implementation in SOPC, instructions are translated into machine code. The full instruction set and the corresponding machine code are listed in Table 3. Each instruction consists of a 4-bit opcode followed by the pixel address in the display model and one operand.
Suppose that the input variables are xi, for i from 1 to N; in four-valued logic, the bit pair xi1xi0 denotes the input variable xi. The scan-driving controller handles n of these variables, so it has 2n input ports and $2^n$ output ports connected to the scanning signals. Its input signals are x11, x10, x21, x20, ···, xn1, xn0. If the binary code of i is i1i2i3···in, with i between 0 and $2^n - 1$, then the scanning signal Xi is
$$X_i = x_{1 i_1} \wedge x_{2 i_2} \wedge \cdots \wedge x_{n i_n},$$
where ∧ denotes the logical AND operation. For example, if N is 6, the scan-driving controller has 6 input ports and 8 scanning signals. The binary code of 4 is 100, so $X_4 = x_{11} \wedge x_{20} \wedge x_{30}$. Figure 9 shows the circuit logic diagrams. Assume that x1 = 0, x2 = 0, and x3 = * in four-letter logic (i.e., x11 = 0, x10 = 1, x21 = 0, x20 = 1, x31 = 1, x30 = 1); then the circuit outputs X0 = 1 and X1 = 1, and all other outputs are 0. Thus, the first and second memories can be selected simultaneously to be read or written in parallel.
Figure 9. Scan-driving controller for N = 6.
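The AND-gate decoding rule of the scan-driving controller can be simulated bit by bit in Python. This is a behavioral sketch of the logic, not of the circuit in Figure 9; the function name is hypothetical, and i1 is taken as the most significant bit of i, as in the examples.

```python
from typing import List, Tuple


def scan_driver(pairs: List[Tuple[int, int]]) -> List[int]:
    """Outputs X_0 .. X_{2^n - 1} for input pairs (x_j1, x_j0), j = 1..n.

    X_i is the AND over j of x_{j, i_j}, where i_1 i_2 ... i_n is the binary
    code of i with i_1 as its most significant bit.
    """
    n = len(pairs)
    outputs = []
    for i in range(2 ** n):
        x = 1
        for j, (x1, x0) in enumerate(pairs):
            bit = (i >> (n - 1 - j)) & 1  # the digit i_{j+1}
            x &= x1 if bit else x0        # pick x_{j1} or x_{j0}
        outputs.append(x)
    return outputs


# x1 = 0, x2 = 0, x3 = * as in the N = 6 example: pairs 01, 01, 11.
print(scan_driver([(0, 1), (0, 1), (1, 1)]))  # [1, 1, 0, 0, 0, 0, 0, 0]
```

The same function reproduces the first-subfield example of the experiment: with x1 = 1, x2 = *, x3 = 1, x4 = *, the outputs that are 1 are X10, X11, X14, and X15.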

Result-Analyzing Model
The result-analyzing model obtains all feasible assignments of the MCP. Assume that the pixel in the i-th row and j-th column is the answer.
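A plausible way to turn a pixel position back into a vertex assignment is sketched below in Python. It assumes, consistent with the experiment (where the scan-driving controller decodes x1 to x4 into lines), that rows are decoded from the first ⌊N/2⌋ variables and columns from the rest; rows and columns are indexed from 0, and the function names are hypothetical.

```python
def pixel_to_assignment(row: int, col: int, N: int) -> str:
    """Vertex assignment x1..xN for the pixel at (row, col), both 0-based.

    Assumes rows are decoded from x1..x_{N//2} (scan-driving controller) and
    columns from the remaining variables (address-driving controller).
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    n_rows = N // 2
    n_cols = N - n_rows
    if not (0 <= row < 2 ** n_rows and 0 <= col < 2 ** n_cols):
        raise ValueError("pixel lies outside the panel")
    return format(row, f"0{n_rows}b") + format(col, f"0{n_cols}b")


def clique_vertices(assignment: str) -> list:
    """1-based indices of the vertices set to 1."""
    return [j + 1 for j, c in enumerate(assignment) if c == "1"]


a = pixel_to_assignment(9, 6, 8)
print(a, clique_vertices(a))  # 10010110 [1, 4, 6, 7]
```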

Conclusions
Inspired by the concept of information processing among bio-molecules, we propose a DNA Electronic Computing Model (DEM) that combines the advantages of traditional computers and DNA computing in a novel way. Unlike previous molecular models, it does not rely on biotechnology and requires no molecular strands or enzymes. The DEM offers massively parallel operation similar to that of DNA computing, but on a silicon-based computing medium.
The DEM seems well suited to hardware implementation because its operations are deterministic and fast. However, because the required space grows exponentially with problem size, it cannot be applied to large-scale data processing; further improving the model for large-scale data processing is our future work.