Multi-Probe Based Artificial DNA Encoding and Matching Classifier for Hyperspectral Remote Sensing Imagery

In recent years, a novel matching classification strategy inspired by artificial deoxyribonucleic acid (DNA) technology has been proposed for hyperspectral remote sensing imagery. The method describes both the brightness and the shape of a spectrum by encoding the spectral curve into a DNA strand, providing a more comprehensive basis for spectral similarity comparison. However, it suffers from two problems: data volume is amplified when all of the bands participate in the encoding procedure, and full-band comparison dilutes the importance of the bands carrying key information. In this paper, a new multi-probe based artificial DNA encoding and matching (MADEM) method is proposed. Spectral signatures are first transformed into DNA code words with a spectral feature encoding operation. Multiple probes for the classes of interest are then extracted to represent specific fragments of the DNA strands. During spectral matching, the probes are compared to obtain the similarity of different types of land cover. The absolute vector distance (AVD) is computed between the probes of an unclassified spectrum and the typical DNA code words from the database, and each pixel is assigned to the minimum-distance class. The main benefit of this strategy is that the influence of redundant bands is greatly reduced and critical spectral discrepancies are enlarged. Two hyperspectral image datasets were tested. Compared with the other classification methods, the overall accuracy improved by 1.22% to 10.09% and by 1.19% to 15.87%, respectively, and the kappa coefficient improved by 2.05% to 15.29% and by 1.35% to 19.59%, respectively. This demonstrates that the proposed algorithm outperformed the other traditional classification methods.


Introduction
Hyperspectral imagery (HSI) has found applications in many fields, such as military, agriculture, and mineralogy [1]. As one of the traditional hyperspectral analysis techniques, spectral matching classification is often used for signature discrimination, concentrating on recognizing the absorption bands and the shape of spectra [2]. Many spectral matching classification techniques have been proposed. They usually depend on a scalar metric to estimate how close two spectra are, and in general they fall into two categories [3][4][5][6]. In the first category, traditional algorithms estimate the similarity of two spectra by comparing signature distance. For example, the minimum Euclidean distance (MED) classification method separates land-cover classes by calculating the Euclidean distance between spectral vectors: the smaller the distance, the more similar the land covers. MED provides an efficient similarity scale for spectral brightness. Binary coding (BC) is another method that compares spectral brightness [7]. It calculates the mean value of the spectral curve and then encodes the curve as a sequence of zeroes and ones by thresholding at that mean. It may not provide a precise spectral matching result. Spectral distance based classification methods may lead to misclassification when illumination is insufficient or shadows exist [8,9]. In the second category, spectra are regarded as multidimensional vectors, and the similarity of two spectra is estimated by comparing their shapes. For example, the spectral angle mapper (SAM) method calculates the generalized angle between two vectors: the smaller the angle, the more similar the spectra [10,11]. Spectral shape based matching methods are insensitive to part of the noise, so a high spectral matching accuracy can be achieved. However, it is worth noting that some important details may be smoothed out by these kinds of methods [12].
In addition, several encoding approaches with successful performance have been proposed for image processing. Spatial pyramid matching (SPM) divides an image into blocks at different scales and represents the image by the histograms of local features in each block [13]. This approach achieved impressive image classification performance. SCSPM incorporates sparse coding (SC) into the SPM framework and achieves a higher recognition rate than traditional SPM [14]. However, it takes more time to encode the local descriptors. The low rank representation (Lrr) based spatial pyramid matching method (LrrSPM) encodes the descriptors under the SPM framework and calculates the representation in the matrix space directly [15,16]. It improves the robustness of the SPM variants and achieves better efficiency than they do.
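The distance-based and shape-based measures described in this introduction can be illustrated with a minimal Python sketch. It is not code from any of the cited works: the function names and the two toy spectra are hypothetical, and coding a value equal to the mean as 1 is an assumption.

```python
import math


def euclidean_distance(a, b):
    """Distance-based (brightness) measure: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spectral_angle(a, b):
    """Shape-based measure: angle in radians between two spectra seen as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)


def binary_code(spectrum):
    """Mean-threshold binary coding (assumption: values equal to the mean map to 1)."""
    mean = sum(spectrum) / len(spectrum)
    return [1 if value >= mean else 0 for value in spectrum]


# Hypothetical toy spectra: the second has the same shape at half the brightness,
# as might happen for a shadowed pixel.
reference = [0.2, 0.4, 0.6, 0.8]
shaded = [0.1, 0.2, 0.3, 0.4]

print(euclidean_distance(reference, shaded))  # clearly non-zero
print(spectral_angle(reference, shaded))      # close to zero
print(binary_code(reference))
```

On this toy pair the Euclidean distance is clearly non-zero while the spectral angle is essentially zero, which mirrors the remark that distance-based measures are sensitive to illumination whereas shape-based measures are not.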
Recently, a novel classification strategy inspired by artificial deoxyribonucleic acid (DNA) technology has been presented for hyperspectral remote sensing imagery [17][18][19][20]. As a branch of computational intelligence, the artificial DNA encoding and matching (ADEM) method has strong computing and matching capability: by encoding and matching at the molecular level, it can discriminate tiny differences in DNA strands [21]. The DNA encoding method plays an important role in this classifier. It not only reduces the influence of noise caused by sunlight and weather but also takes both brightness and shape information into consideration [7]. In short, ADEM has been shown to compare the similarity of two spectra comprehensively. However, some limitations affect the classification result in practice. When all of the bands participate in the encoding procedure, the data volume is expanded, which may amplify redundancy. Moreover, full-band comparison dilutes the importance of the key spectral bands that carry critical information for class separation [22].
To overcome these shortcomings, a new multi-probe based artificial DNA encoding and matching (MADEM) method is proposed in this paper. The biggest difference between ADEM and MADEM is that the latter looks for and extracts specific fragments of the encoded DNA strands, which replace the fully encoded strands in the matching and classification process for hyperspectral remote sensing imagery. In other words, MADEM aims at seeking out the temporal information of the encoded DNA strands. In this method, multiple probes, defined as extracted specific fragments of the DNA strand, are put forward for HSI. They are the temporal information of the DNA strands and are used to copy genes and find the most characteristic element of a spectrum. In the biological sciences, it is well known that only genes transfer the genetic information of creatures [23], and differences in genes lead to the variety of life. Thus, probes that represent unique fragments of the encoded strand can be extracted to distinguish different spectra. This helps to improve the division of regions by discriminating the spectra of different land covers, while reducing the impact of redundant bands. The number of probes generally varies, because there are numerous genes on a DNA strand. The probes are extracted randomly and do not overlap one another. To generate the most effective probes, the algorithm can be iterated many times to reach the best solution. The extracted probes provide a more precise discrimination of spectra, which benefits the matching and classification process.
The rest of this paper is organized as follows. Section 2 introduces the background of biological DNA and the DNA encoding method for hyperspectral remote sensing data. Section 3 provides the background of DNA probe technology and describes the multi-probe selection strategy in detail. Section 4 presents two experiments comparing MADEM with several traditional algorithms. The computational complexity of these methods is discussed in Section 5. Finally, conclusions are drawn in Section 6.

The Basic Theory of DNA
As shown in Figure 1a, DNA resides in the chromosomes of living creatures and contains their key genetic information. For human beings, however, 99.99% of DNA sequences are the same. Some specific fragments of DNA, called genes, make it possible to distinguish one individual from another [24]. DNA strands consist of four types of bases: thymine (T), adenine (A), cytosine (C), and guanine (G), as shown in Figure 1b. Not all of the bases on a DNA strand have a direct influence on life: only a few pieces of the strand, i.e., genes, play a critical role in biological difference [25]. Genes consist of the same four types of bases. They make creatures different from each other by conveying genetic information, while non-genomic sequences serve structural purposes.

As shown in Figure 2, genetic information stored on the DNA strand is transcribed to a ribonucleic acid (RNA) strand. After the RNA has been transcribed, the RNA translates the genetic information to amino acids, which are the components of proteins [28]. Different combinations of proteins determine different kinds of creature. On the RNA strand, three adjacent bases are called a codon, on which the type of amino acid depends. There are 61 kinds of codons including two kinds of initial codons, in which the amino acid starts to be translated. In sum, genes on the DNA strand determine the differences of creatures through two steps: transcription and translation.

Although the DNA strands of different creatures are similar overall, some key parts are significantly different. It is these key parts that lead to the variety of life. If these key parts can be located, the most critical information for discrimination can be extracted.

The DNA Encoding Method
A DNA strand can be likened to an array over four symbols {T, C, A, G}: combinations of these four code words are sufficient to describe the information carried by DNA [29]. Because DNA strands and spectra are similar in that both are high-dimensional data, the DNA encoding method was adopted for hyperspectral data. After DNA encoding, spectra, i.e., image pixels, become DNA strands consisting of {T, C, A, G}. To capture comprehensive information about a spectrum, the encoded DNA strand must include its brightness and shape information simultaneously [30]. Spectral DNA encoding is illustrated in Figure 3: Figure 3a shows the original spectral curve of the mineral alunite, and Figure 3b shows the DNA-encoded alunite spectrum. The figure shows that the DNA encoding method describes the brightness and shape information well.


DNA Encoding for Spectral Brightness Information
The continuous-valued reflectance must be converted into four types of gradients, and then the spectra can be encoded to an array of DNA code words according to the gradients. For the i-th band of the spectral signature, denoted $y_i$, three thresholds separate the spectral value into four gradients. After the spectral value of the i-th band is compared with the thresholds, it is encoded to the corresponding code word according to its gradient. The thresholds are calculated as follows. First, the middle threshold $T_{middle}$ is acquired; then the other two thresholds, $T_{lower}$ and $T_{higher}$, are calculated based on $T_{middle}$ [19,20].

In the expressions for $T_{higher}$ and $T_{lower}$,

$$\mathrm{Bigger}(y_i) = \begin{cases} y_i, & \text{if } y_i \ge T_{middle} \\ 0, & \text{else} \end{cases} \qquad \mathrm{Smaller}(y_i) = \begin{cases} y_i, & \text{if } y_i < T_{middle} \\ 0, & \text{else,} \end{cases}$$

$Nb$ is the number of image bands, $k$ is the number of bands whose values are bigger than $T_{middle}$, $p$ is the number of bands whose values are smaller than $T_{middle}$, and $\rho$ is the adaptive coefficient of DNA brightness encoding, with domain (0.5, 1.0). Since different values of $\rho$ lead to different codes, $\rho$ has to be tuned within this range. The encoding method is based on the four gradients:

$$y_i \mapsto \begin{cases} G, & \text{if } y_i \in [y_{min}, T_{lower}) \\ A, & \text{if } y_i \in [T_{lower}, T_{middle}) \\ C, & \text{if } y_i \in [T_{middle}, T_{higher}) \\ T, & \text{if } y_i \in [T_{higher}, y_{max}] \end{cases}$$

where $y_{min}$ is the minimum value of the spectral curve and $y_{max}$ is its maximum value.
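The four-gradient brightness rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the three thresholds are passed in by the caller rather than computed, since the threshold formulas themselves are not reproduced here. The threshold values in the example are arbitrary.

```python
def encode_brightness(spectrum, t_lower, t_middle, t_higher):
    """Map each band value to G, A, C or T using three ascending thresholds.

    The thresholds are supplied by the caller (assumption); the intervals are
    half-open exactly as in the four-gradient rule, with the top one closed.
    """
    if not (t_lower <= t_middle <= t_higher):
        raise ValueError("thresholds must be in ascending order")
    codes = []
    for value in spectrum:
        if value < t_lower:
            codes.append("G")   # [y_min, T_lower)
        elif value < t_middle:
            codes.append("A")   # [T_lower, T_middle)
        elif value < t_higher:
            codes.append("C")   # [T_middle, T_higher)
        else:
            codes.append("T")   # [T_higher, y_max]
    return "".join(codes)


# Hypothetical five-band spectrum and arbitrary thresholds.
strand = encode_brightness([0.1, 0.3, 0.5, 0.7, 0.9], 0.25, 0.5, 0.75)
print(strand)
```

A value that falls exactly on a threshold goes to the upper gradient, which follows from the half-open intervals in the rule.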

DNA Encoding for Spectral Shape Information
Considering spectral brightness alone may not discriminate different land covers well, so the shape information also has to be encoded. There are many ways to describe spectral shape. For example, the change of the spectral curve between two adjacent bands reflects the shape, but it can only be encoded to three kinds of code words. Three adjacent bands provide nine kinds of spectral shape change. More than three adjacent bands provide even more kinds of shape change, which is not efficient for the encoding process. Three adjacent bands are therefore appropriate for describing shape details in DNA encoding. Assume that $y = \{y_1, y_2, y_3, \ldots, y_{Nb}\}^T$ represents a hyperspectral signature and $\Delta$ represents the spectral value tolerance. Then the spectral shape details can be encoded as follows [19,20]:

Type 1: if $|y_i - y_{i-1}| \le \Delta$ and $|y_{i+1} - y_i| \le \Delta$
Type 2: if ($|y_i - y_{i-1}| \le \Delta$ and $|y_{i+1} - y_i| > \Delta$) or ($|y_i - y_{i-1}| > \Delta$ and $|y_{i+1} - y_i| \le \Delta$)
Type 3: if ($y_i - y_{i-1} < -\Delta$ and $y_{i+1} - y_i < -\Delta$) or ($y_i - y_{i-1} > \Delta$ and $y_{i+1} - y_i > \Delta$)
Type 4: if ($y_i - y_{i-1} < -\Delta$ and $y_{i+1} - y_i > \Delta$) or ($y_i - y_{i-1} > \Delta$ and $y_{i+1} - y_i < -\Delta$)

The tolerance $\Delta$ is set by means of the shape adaptive coefficient $\theta$. The DNA encoding technique is used to capture changes of the spectral texture feature over three consecutive adjacent bands. If $1 < i < Nb$, the DNA shape code word $\text{DNA Shape}_i$ is defined for band $i$ according to these types. Once the brightness and shape features have been encoded, the whole DNA strand of a spectrum is represented by the brightness code words together with the shape code words. The number of spectral shape code words is $Nb - 2$, because each code word is determined by three consecutive adjacent bands, so the first and last bands have no shape code word.
Remote Sens. 2016, 8, 645
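The four shape types can be checked with a short Python sketch. It returns the type index (1 to 4) for each interior band; how each type maps to a letter of {T, C, A, G} is not stated in the text above, so no letter is assigned. The function names and the example spectrum are hypothetical, and the tolerance is passed in directly rather than derived from $\theta$.

```python
def shape_type(prev_value, value, next_value, delta):
    """Return the shape type (1-4) of a band from its two neighbours.

    delta is the non-negative spectral value tolerance, supplied by the caller.
    """
    d1 = value - prev_value       # change entering the band
    d2 = next_value - value       # change leaving the band
    flat1 = abs(d1) <= delta
    flat2 = abs(d2) <= delta
    if flat1 and flat2:
        return 1                  # both changes within tolerance
    if flat1 or flat2:
        return 2                  # exactly one change within tolerance
    # Both changes exceed the tolerance, so both are strictly non-zero.
    if (d1 > 0) == (d2 > 0):
        return 3                  # monotonic rise or fall
    return 4                      # peak or valley


def shape_types(spectrum, delta):
    """Shape types for bands 2..Nb-1, giving Nb - 2 values."""
    return [
        shape_type(spectrum[i - 1], spectrum[i], spectrum[i + 1], delta)
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical six-band spectrum with an arbitrary tolerance.
types = shape_types([0.10, 0.11, 0.30, 0.50, 0.20, 0.21], 0.05)
print(types)
```

The three possible states of each change (flat, up, down) give nine combinations, which the four branches cover exhaustively: one for Type 1, four for Type 2, two for Type 3 and two for Type 4.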

The Multi-Probe Based Artificial DNA Encoding and Matching Method
The proposed MADEM algorithm consists mainly of the DNA encoding method and the DNA multi-probe extracting strategy. The former transforms spectral information into DNA information, and the latter optimizes the DNA information by selecting specific fragments of the DNA strand. In this section, the DNA probe and the multi-probe extracting approach are introduced.

The DNA Probe Technology
In 1975, DNA probe technology was first proposed in the field of biology by Southern. It has since become a basic technology of modern molecular biology and has been improved and developed in many fields [30]. DNA probe technology is also called the molecular hybridization technique. It exploits the denaturation and renaturation of DNA molecules and the high accuracy of complementary base pairing. These characteristics allow a DNA probe to be used to search for target DNA strands [30]. In practical applications, DNA probe technology can quickly detect pathogens by using a specific artificial single-stranded DNA fragment of the pathogen labeled with radioactivity or biomarkers. The probe is a single-stranded RNA, which detects the target DNA strand by testing for radiation after the bases of the probe and the target DNA strand have paired. Compared with traditional methods, DNA probe technology is fast and sensitive. Traditional methods take several days or even weeks to complete a single detection, with low precision, whereas DNA probe technology takes just one day, with high precision. It is reported that the DNA probe approach can detect ten viruses in one ton of water [30,31].

Figure 4 shows the mechanism of DNA probe technology in biology. Every letter denotes a kind of base (thymine (T), adenine (A), cytosine (C), and guanine (G)). Every three bases make up a codon, such as the ACA and CCT in Figure 3, which is the basic unit of a fragment of a DNA strand. A DNA probe is a continuous sequence of the RNA strand. It must be specific enough to discriminate the target DNA strand from others. The DNA probe detection process in biology is as follows. First, a probe is extracted from the reference DNA strand, with its specificity assured. Then, many copies of the probe are synthesized for detection. Although this increases the detection time, it yields a precise result. The hybridization environment (such as water, proper temperature, enzymes, and so on) of the probes and target DNA strands helps to perform the next step, and a proper environment is conducive to detection efficiency. If the probes find their pairing bases, the two match with each other; otherwise the probes remain dissociated. The pairing rules are: (1) A matches T; and (2) C matches G. The dissociated probes are dissolved. Finally, the remaining DNA strands are tested. If there are biomarkers or radiation in the remaining strands, the target DNA strands have been successfully detected.

The DNA Multi-Probe Extracting Strategy Based on Hyperspectral Remote Sensed Image
Based on this concept, hyperspectral data can be recognized by special band sequences if DNA probes can be extracted. As shown in Figure 5, the spectral curves of alunite (Figure 5a) and halloysite (Figure 5b) are both encoded to DNA strands, and the difference between the two DNA strands is shown in Figure 5c, where differing code words are represented as "1" (blue) and identical code words as "0" (white). It is clear that the two encoded strands differ only in some fragments, not along the whole strand. Only some sequences on the DNA strand distinguish different DNA strands, and the same holds for land-cover spectra: different land covers can be distinguished more precisely by specific fragments.
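The strand comparison described for Figure 5c (1 where two encoded strands differ, 0 where they agree) can be sketched as follows. The two short strands are made-up stand-ins, not the actual alunite and halloysite encodings, and the function names are hypothetical.

```python
def difference_mask(strand_a, strand_b):
    """Return 1 where the code words differ and 0 where they are the same."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    return [1 if a != b else 0 for a, b in zip(strand_a, strand_b)]


def differing_fragments(mask):
    """Group consecutive 1s of a mask into (start, length) fragments."""
    fragments = []
    start = None
    for index, bit in enumerate(mask):
        if bit and start is None:
            start = index                      # a differing run begins
        elif not bit and start is not None:
            fragments.append((start, index - start))
            start = None                       # the run has ended
    if start is not None:                      # run reaches the strand end
        fragments.append((start, len(mask) - start))
    return fragments


# Made-up eight-symbol strands used only for illustration.
mask = difference_mask("GGACCTTA", "GGATTTTC")
fragments = differing_fragments(mask)
print(mask)
print(fragments)
```

The resulting fragments are the positions where the two strands can be told apart, which is the kind of region a probe is meant to cover.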

The proposed DNA multi-probe extracting strategy includes four steps: DNA encoding, multi-probe extracting, matching classification, and parameter iteration. First, the hyperspectral image and the spectral library are both encoded to DNA strands using Equations (1)-(7), which converts the continuous spectra to discrete values. After that, the multi-probe is extracted for matching classification from the DNA strands instead of the RNA strands. This is the core step of the proposed algorithm. To compare the matching classification performance, several groups of probe numbers are set by the user. Meanwhile, the probes are selected randomly in order to extract specific probes that differentiate land covers, which means that the starting positions and lengths of the probes are decided randomly. The probe extraction strategy can reach the best selection result by iterating the process. To assure the validity of the probes, three rules are observed: (1) the starting position of each probe, which corresponds to the initial codon in Figure 2, should lie within the range of the strand; (2) the length of each probe should not exceed the strand length; and (3) each probe should be selected from the strands, which is the probe selection rule.
In MADEM, the starting positions of the probes cannot be set by the user unless he or she knows the locations of the specific fragments. After the number of probes is set, the starting points are determined randomly. Rule 1 yields the starting positions of the probes and at the same time decreases the risk of human intervention. Rule 2 restricts the probe length, because the length affects the probe extraction result and thus has a great impact on classification performance. If a probe is too long, it contains more nonspecific fragments; furthermore, some probes overlap, which affects the comparison of spectra. Conversely, if a probe is too short, some code words of the specific fragment may be missed, so that typical information for classification is lost. Both cases decrease the classification accuracy, so a proper probe length should be chosen based on Rule 2. Rule 3 assures the validity of the probes. Random selection of the start position and probe length may cause a probe to extend beyond the strand range. Such probes are invalid, so Rule 3 restricts the selection process.
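A minimal Python sketch of random probe selection under the three rules is given below. It is an illustration, not the authors' implementation: the function names, the `min_len` and `max_len` bounds, and the retry limit are assumptions introduced here, and non-overlap is enforced by simple rejection.

```python
import random


def extract_probes(strand_length, num_probes, min_len, max_len, rng, max_tries=1000):
    """Randomly pick non-overlapping (start, length) probes inside a strand.

    min_len, max_len and max_tries are hypothetical parameters; the text only
    states that starting positions and lengths are chosen at random.
    """
    probes = []
    tries = 0
    while len(probes) < num_probes and tries < max_tries:
        tries += 1
        length = rng.randint(min_len, max_len)
        if length > strand_length:
            continue                                    # Rule 2: probe no longer than strand
        start = rng.randint(0, strand_length - length)  # Rules 1 and 3: stay inside the strand
        end = start + length
        # Reject candidates that overlap an already accepted probe.
        if all(end <= s or start >= s + l for s, l in probes):
            probes.append((start, length))
    if len(probes) < num_probes:
        raise RuntimeError("could not place all probes; relax the settings")
    return sorted(probes)


def apply_probes(strand, probes):
    """Cut the probe fragments out of an encoded strand."""
    return [strand[start:start + length] for start, length in probes]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
probes = extract_probes(strand_length=100, num_probes=5, min_len=4, max_len=10, rng=rng)
print(probes)
```

Drawing the start from `0` to `strand_length - length` makes an out-of-range probe impossible by construction, so no separate validity check is needed afterwards.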
After probe extraction, matching classification is carried out. The spectral matching process compares the spectral similarity of the reference spectrum and a pixel spectrum. The similarity is defined as the total AVD over the probes, where $N_p$ is the number of probes, $L_p$ is the length of the $p$-th probe, and $f(p^1_i, p^2_i)$ is evaluated on the $i$-th code words of the two probes being compared. The result of the calculation is larger if the two spectra are more similar. Classification is carried out according to the matching result: a pixel is assigned to the class of the reference spectrum with which it has the largest similarity. Because the probes of the encoded DNA strands are selected randomly, the performance may not be stable. The iteration strategy therefore contains two aspects: iteration and a stopping criterion. Iteration increases the probability of obtaining proper probes by increasing the number of selections. During the process, a measure is needed to evaluate whether the classification result is satisfactory. In this paper, it is called the stopping criterion parameter (SCP).
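The probe-wise matching step can be sketched in Python as follows. The per-position function $f$ is not defined in the text above, so this sketch assumes the simplest choice consistent with "larger means more similar": 1 when two code words agree and 0 otherwise. That choice, the function names, and the tiny spectral library are all assumptions made for illustration.

```python
def match_score(code_a, code_b):
    """Assumed per-position comparison: 1 if the code words agree, else 0."""
    return 1 if code_a == code_b else 0


def probe_similarity(pixel_strand, reference_strand, probes, f=match_score):
    """Sum f over every position covered by every (start, length) probe."""
    total = 0
    for start, length in probes:
        for i in range(start, start + length):
            total += f(pixel_strand[i], reference_strand[i])
    return total


def classify(pixel_strand, library, probes):
    """Assign the class whose reference strand has the largest similarity."""
    return max(
        library,
        key=lambda name: probe_similarity(pixel_strand, library[name], probes),
    )


# Made-up reference strands and probes, used only for illustration.
library = {"alunite": "GGACCTTA", "halloysite": "GGATTTTC"}
probes = [(3, 2), (7, 1)]
pixel = "GAACCTTA"   # differs from the alunite strand only outside the probes

label = classify(pixel, library, probes)
print(label)
```

Because only probe positions are compared, the mismatch at the second symbol of `pixel` has no effect on the result, which illustrates how probes suppress bands that do not help separate the classes.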
$$SCP = Kappa_i = \frac{N \sum_{k} m_{kk} - \sum_{k} m_{k+} m_{+k}}{N^2 - \sum_{k} m_{k+} m_{+k}} \qquad (9)$$
where $m_{kk}$ is the number of observations in row $k$ and column $k$ of the confusion matrix, which is calculated from the test area and the $i$-th classified map; $m_{k+}$ and $m_{+k}$ are the marginal totals for row $k$ and column $k$ of the confusion matrix; and $N$ is the total number of observations. The value of SCP equals the kappa of the result of the $i$-th iteration. This measure evaluates the classification performance well.
The entire algorithm is illustrated in Figure 6a. In Step 1, the image and the spectral library are encoded to DNA strands. In Step 2, DNA probes are extracted from the encoded DNA strands of the spectral library and the image, based on the extraction rules mentioned previously. In Step 3, classification is carried out based on the AVD of the reference probes and the pixel probes. In Step 4, the SCP is calculated from the local classification result and the test samples. In Step 5, the iteration stopping criterion is tested: if the criterion is reached, the algorithm terminates; otherwise, Steps 2-5 are repeated. More details, including the pseudo-code of MADEM, are shown in Figure 6b. The stopping criterion contains two aspects: the iteration ends if the SCP of the partial result of the iteration reaches the user-set threshold, or if the number of iterations reaches the user-set value. Finally, the result with the best SCP is output.
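The stopping criterion can be sketched in Python, assuming the standard Cohen's kappa computed from a confusion matrix with the quantities named above (diagonal counts, row and column totals, and the total number of observations). The function names are hypothetical, and `run_once` stands for one pass of probe extraction and classification that returns an SCP value together with its result.

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix given as a list of rows."""
    size = len(confusion)
    n = sum(sum(row) for row in confusion)                      # N
    diagonal = sum(confusion[k][k] for k in range(size))        # sum of m_kk
    row_totals = [sum(row) for row in confusion]                # m_k+
    col_totals = [sum(row[k] for row in confusion) for k in range(size)]  # m_+k
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * diagonal - chance) / (n * n - chance)


def iterate_until(run_once, scp_threshold, max_iterations):
    """Repeat run_once until the SCP threshold or the iteration limit is reached.

    run_once is a hypothetical callback returning (scp, result); the best
    pair seen so far is kept and returned.
    """
    best_scp, best_result = float("-inf"), None
    for _ in range(max_iterations):
        scp, result = run_once()
        if scp > best_scp:
            best_scp, best_result = scp, result
        if best_scp >= scp_threshold:
            break
    return best_scp, best_result


# Made-up two-class confusion matrix for illustration.
example_kappa = kappa([[45, 5], [10, 40]])
print(example_kappa)
```

Keeping the best result rather than the last one matters when the loop ends on the iteration limit, since a later random probe set can score lower than an earlier one.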

Experiment 1
The image data in this experiment were acquired by the airborne ROSIS sensor over the Pavia urban area, northern Italy. The size of the whole data set is 1400 × 512 pixels, and the experimental area is a subset of 323 × 187 pixels. The data set was provided by the Data Fusion Technical Committee (DFTC) [32]. The original data contain 115 bands covering 0.43 to 0.86 μm with a spectral resolution of 4 nm. Thirteen bands were removed due to low signal-to-noise ratio. The spatial resolution is 1.3 m. The ground-truth image and the list of classes with their corresponding numbers of samples are shown in Figure 7 and Table 1, respectively. In order to better test the practicality of MADEM, only 119 training samples of the image are chosen. If such a small training set leads to a satisfactory classification result, this indicates that the proposed algorithm is more robust than the other methods. Figure 7a shows the false-color image with bands 10, 20 and 61. Figure 7b shows the ground truth, where the six classes are: roof, vegetation, asphalt, water, concrete, and shadow. Figure 7c-i shows the classification results of the seven classifiers.
Parameters of the algorithm are set as follows. The brightness adaptive coefficient of DNA encoding and the shape adaptive brightness coefficient are both set to 1.0. The SVM applies a radial basis function as the kernel function and a cross-validation approach to determine the optimal values of the parameters [33]. The probe number is set to 5. The maximum number of iterations is set to 1000. The artificially set SCP threshold is 0.99. Table 1 shows the training and testing pixels of experiment 1.
In terms of visual comparison, BC produces the worst result. It can distinguish the concrete road, but cannot recognize shadows. Moreover, BC misclassifies some water pixels as shadows. SAM distinguishes most classes well, except concrete road and asphalt road. The result of SAM has clear edges between different land covers, but it misclassifies too many concrete road and vegetation pixels. SCM performs better than SAM and recognizes more concrete pixels. The result of CCSM is similar to that of SCM. The result of SVM shows clear edges between different land covers and discerns the shadows well. However, SVM misses many concrete road and building pixels. Compared with MADEM, ADEM correctly identifies fewer shadow pixels. MADEM discerns the vegetation well and achieves an excellent separation of all the land covers.
The confusion matrices are shown in Figure 8. The kappa coefficient, overall accuracy (OA) and processing time are listed in Table 2. From Figure 8, it can be seen that every classifier except BC and SAM recognizes water with 100% accuracy, and all classifiers recognize vegetation with 100% accuracy. From Table 2, it is clear that BC performs the worst. The performances of SAM, SCM, CCSM and SVM are similar in this experiment. MADEM provides the best result, with a kappa of 94.35% and an OA of 96.62%, which correspond to gains of 3.25% and 1.99%, respectively, over SVM and of 2.66% and 1.56% over ADEM.

BC cannot distinguish water and shadow well because the two spectral curves are close in spectral brightness. BC discriminates land covers through binary codes of spectral brightness without extracting discriminative information from spectral shapes. SAM, SCM, and CCSM are based on spectral shape. They generate better classifications than BC because the shapes of the spectral curves of the land covers are more important in this experiment. The result of CCSM is better than those of SAM and SCM because the RMS can take the skewness and crest of the spectral curves into account. SVM performs similarly to the former traditional methods, mainly because of the small number of training samples. This affects the learning process of SVM, while it has little influence on the DNA methods. ADEM utilizes both the brightness and shape information of the spectral curves to recognize the land covers, so its performance is better than that of the traditional algorithms. From the experiment, it is clear that MADEM has a gain over SVM and ADEM, which means that the proposed algorithm performs better than the traditional ADEM and SVM methods in this experiment.

Experiment 2
Another hyperspectral remote sensing image was also used: the urban data set captured by the HYDICE sensor in October 1995, which includes 210 bands. The objects in this image are more diverse and include different types of roofs and roads, posing a greater challenge for the classifiers than experiment 1. The scene is located at Copperas Cove, near Fort Hood, Texas, U.S., with a size of 307 × 307 pixels. The spectral resolution is 10 nm and the spatial resolution is 2 m. The main classes in this area include mall roof (roof-1), house roof (roof-2), tree, concrete, grass and asphalt. In addition, due to the low solar altitude, trees and houses cast long shadows on the ground [34]. The ground truth image and the list of classes with their corresponding numbers of samples for this experimental data set are given in Figure 9 and Table 3, respectively. Before classification, bands 103-108, 139-152, and 208-210 are removed because they are regarded as noisy or water-absorption bands [35,36]. As for the Pavia urban area data set, the same strategy was used to separate the ground truth into training samples and test samples.
In MADEM, the probe number and the maximum number of iterations are set to 8 and 900, respectively. The artificially set SCP threshold is still 0.99. The parameters of SVM are set as in experiment 1, and the other parameters of the algorithm are also set as in experiment 1. The false-color image is shown in Figure 9a. Figure 9b shows the ground truth of the experimental data. Figure 9c-i shows the classification results of the seven classifiers. Table 3 shows the training and testing samples of experiment 2.
According to Figure 9, visual comparison shows that the result of BC contains plenty of misclassifications, and many pixels in the grass area are misclassified as tree. SAM cannot separate the asphalt road and the concrete road well, and it confuses grass and concrete. The results of SCM and CCSM are similar, and both are better than the result of SAM, although they contain many tree and grass pixels that are misclassified as concrete. SVM outperforms the aforementioned classifiers and provides clear edges between different land covers. ADEM separates many land covers well, except grass. MADEM yields the best classification result: it contains the fewest misclassifications, and the edges between different land covers are sharp.
From Figure 10 and Table 4, the performance of BC is again the worst. SAM is better than BC. The results of SCM and CCSM are close and much better than those of BC and SAM. SVM and ADEM perform better than the former traditional methods, but worse than MADEM. BC is based on spectral brightness, so it cannot distinguish grass, concrete road and trees. The spectral signatures of asphalt road and roof-2 have similar shapes, so SAM, SCM, CCSM and even SVM confuse asphalt road and roof-2. The result of ADEM is better mainly because the DNA encoded strands extract the brightness and shape information well. MADEM offers the best classification accuracy among all the algorithms. According to Table 4, MADEM improves the kappa from 70.62% to 91.94%.

Computational Complexity Analysis
The computational complexity of the compared methods, expressed as calculation times and space cost, is provided in this section. As shown in Table 5, BC has the fewest calculation times and the lowest space cost because of its simple encoding strategy. SAM and SCM have the same calculation times, and both cost more space than BC. CCSM iterates the spectral correlation calculation 21 times and thus needs more space to store the iteration results. As for SVM, because of its complex calculation process, the cross-validation and classification processes contribute most of the calculation times. The space cost of SVM is related to the iteration times, band number, image size and class number. The two DNA encoding methods, ADEM and MADEM, share a common part of the calculation times (6NB + 6SLB), mainly comprising the threshold calculation of T_middle, T_higher and T_lower. The additional iteration processes of ADEM and MADEM cost 2SLNB + SLN and I(NPB + PBSL + NBSL + 2SLN) times, respectively. In addition, the two methods have similar space costs. The mathematical variables are defined as follows: S, the samples of the HSI; L, the lines of the HSI; B, the bands of the HSI; N, the number of classes; P, the probe number in MADEM; and I, the iteration times of SVM and MADEM.
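To make the expressions above concrete, the following sketch evaluates them for given image and parameter sizes. The function names are illustrative assumptions. The example values come from experiment 1 (a 323 × 187 subset, 115 - 13 = 102 bands, six classes, 5 probes, 1000 iterations); which of 323 and 187 is S and which is L is an assumption here, although only the product S·L enters these formulas.

```python
def shared_encoding_cost(S: int, L: int, B: int, N: int) -> int:
    """Calculation times common to ADEM and MADEM: 6NB + 6SLB."""
    return 6 * N * B + 6 * S * L * B


def adem_extra_cost(S: int, L: int, B: int, N: int) -> int:
    """Additional calculation times of ADEM: 2SLNB + SLN."""
    return 2 * S * L * N * B + S * L * N


def madem_extra_cost(S: int, L: int, B: int, N: int, P: int, I: int) -> int:
    """Additional calculation times of MADEM: I(NPB + PBSL + NBSL + 2SLN)."""
    return I * (N * P * B + P * B * S * L + N * B * S * L + 2 * S * L * N)


# Experiment 1 sizes (assignment of 187 and 323 to S and L is assumed).
S, L, B, N, P, I = 187, 323, 102, 6, 5, 1000
exp1_shared = shared_encoding_cost(S, L, B, N)
exp1_adem = adem_extra_cost(S, L, B, N)
exp1_madem = madem_extra_cost(S, L, B, N, P, I)
ratio = exp1_madem / exp1_adem  # how much more the MADEM iteration costs
```

Because the MADEM term scales linearly with I, lowering the iteration count is the most direct way to reduce its cost in this model.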
Based on the above analysis, the actual time costs of all classifiers in the two experiments are listed in Table 6. It should be noted that SAM, SCM, SVM and CCSM operate on the float-type data of the original image, while the spectral encoding based classification methods, namely BC, ADEM and MADEM, operate on the transformed byte-type data. BC is the fastest algorithm in the two experiments because its calculation times and space cost are both the smallest. SAM and SCM both cost more than BC and take similar time. CCSM is the slowest algorithm because of its long iteration. In addition, the calculation of CCSM is based on float-type data, which is more expensive than byte-type data. SVM also costs much time because it contains a parameter optimization process. After parameter optimization, SVM rebuilds the model to classify the land covers. ADEM and MADEM both cost more time than BC, SAM and SCM because of the DNA encoding process, but less than SVM and CCSM. When the image size is large, they relieve more memory pressure than the non-encoding algorithms. The probe extraction iteration of MADEM leads to slightly more calculation time than ADEM, but the time cost is still acceptable.
Since the computational complexity of MADEM is affected by the iteration time (I) and the probe number (P), it is necessary to analyze the effect of these parameters when running the MADEM algorithm. In both experiments, the probe number is varied from 1 to 10, and the iteration time is varied from 100 to 1000 with a step length of 100. The overall accuracies (OA) of the two experimental images under these two factors are shown in Figure 11.
As can be seen in Figure 11, the best performance in experiment 1 appears when the probe number is 5 and the iteration time is 1000, and the best performance in experiment 2 appears when the probe number is 8 and the iteration time is 900. The line charts of OA for both factors fluctuate. However, the differences in accuracy among the different parameter settings are very small. For the probe number analysis, the difference between the overall accuracies of the best and the worst classifications is about 0.03% for both case studies. For the iteration time analysis, these differences are about 0.08% and 0.3% for the two case studies. Thus, although the computational complexity of MADEM is higher than that of the other methods, it can be reduced by decreasing the probe number and iteration times; in these experiments, MADEM reaches an acceptable result without large iteration and probe numbers.
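The sensitivity analysis above amounts to sweeping one parameter and comparing the best and worst OA. A minimal sketch is given below; `sweep_parameter` and the `evaluate` callable are hypothetical names, and the toy scoring function only stands in for a real MADEM run (its values are not results from the paper).

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_parameter(
    evaluate: Callable[[int], float],
    values: Iterable[int],
) -> Tuple[int, float, float]:
    """Return (best value, best OA, spread between best and worst OA)."""
    scores: Dict[int, float] = {v: evaluate(v) for v in values}
    best_value = max(scores, key=lambda v: scores[v])
    spread = max(scores.values()) - min(scores.values())
    return best_value, scores[best_value], spread


# Parameter grids as described in the text.
probe_numbers = range(1, 11)            # 1 to 10
iteration_times = range(100, 1001, 100)  # 100 to 1000, step 100


def toy_oa(p: int) -> float:
    """Placeholder for running MADEM with p probes; values are made up."""
    return 0.96 - 0.0001 * abs(p - 5)


best_p, best_oa, oa_spread = sweep_parameter(toy_oa, probe_numbers)
```

A small spread, as reported for both data sets, suggests that a cheaper setting can be chosen with little loss of accuracy.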

Conclusions
In this paper, a new algorithm called MADEM is proposed for hyperspectral remote sensing imagery. MADEM captures the rich information of spectral brightness and shape, which are both important for spectral matching and discrimination. It aims at finding separable fragments of the encoded DNA strands of the spectral curves in order to distinguish them. Each pixel spectrum is first encoded into a DNA strand. After that, DNA probes containing specific fragments are extracted for further matching and classification. The traditional ADEM method always compares the full strand of each pixel with that of the reference spectral curves to classify the land covers. Unlike ADEM, the main idea of MADEM is that fragmented and dispersed probes contain the most specific fragments and help to improve the classification performance. Because the strands contain rich spectral information, the probe extraction strategy prevents irrelevant fragments from participating in the matching procedure, which enhances the discrepancy between different spectra. The experiments demonstrate that the proposed algorithm outperforms the traditional classifiers, as well as the SVM classifier and the DNA encoding-based ADEM method.

A proper environment is conducive to detection efficiency. If the probes find the pairing bases, they match with each other; otherwise, they remain dissociative. The pairing rules are: (1) A matches T; and (2) C matches G. The dissociative probes are dissolved. Finally, the remaining DNA strands are tested. If biomarkers or radiation are found in the remaining strands, the target DNA strands have been successfully detected.

Figure 4. Mechanism of DNA probe in biology.

Figure 5. DNA encoded spectral curves of two different land covers: (a) DNA encoded spectral curve of alunite; (b) DNA encoded spectral curve of halloysite; and (c) different code words of the alunite and halloysite DNA strands.

Figure 10. Confusion matrix of all classifiers.

Figure 11. Overall accuracy (OA) under two factors, probe number and iteration times: (a) in experiment 1 (Exp1), the line chart of OA influenced by probe number; (b) in Exp1, the line chart of OA influenced by iteration times; (c) in experiment 2 (Exp2), the line chart of OA influenced by probe number; and (d) in Exp2, the line chart of OA influenced by iteration times.

Table 1. Training and testing pixels.

Table 2. Overall accuracy and kappa coefficient of all classifiers.

Table 3. Training and testing pixels.

Table 4. Overall accuracy and kappa coefficient of all classifiers.

Table 5. Computational complexity of the algorithms.

Table 6. Time costs of all classifiers in the two experiments.