Tongue Tumor Detection in Medical Hyperspectral Images

A hyperspectral imaging system to measure and analyze the reflectance spectra of the human tongue with high spatial resolution is proposed for tongue tumor detection. To achieve fast and accurate performance for detecting tongue tumors, reflectance data were collected using spectral acousto-optic tunable filters and a spectral adapter, and sparse representation was used for the data analysis algorithm. Based on the tumor image database, a recognition rate of 96.5% was achieved. The experimental results show that hyperspectral imaging for tongue tumor diagnosis, together with the spectroscopic classification method provide a new approach for the noninvasive computer-aided diagnosis of tongue tumors.


Introduction
Cancer of the tongue is a malignant tumor that begins as a small lump, a firm white patch, or an ulcer. If untreated, the tumor may spread throughout the tongue and to the gums. As the tumor grows, it becomes increasingly life threatening by metastasizing to lymph nodes in the neck and to the rest of the body. Early detection has an immense effect on outcome because cancer treatment is often simpler and more effective when diagnosed at an early stage. Tumor detection methods may help physicians diagnose cancers, to dissect the malignant region with a safe margin, and to evaluate the tumor bed after resection. Currently, histopathology is still the gold standard for cancer diagnosis. However, this OPEN ACCESS method is invasive, expensive, greatly depends on the judgment of pathologists, and needs time for preparing the results. Moreover, the biopsy specimens can only be captured from a few points. Therefore, a simple, noninvasive, and reliable technique for rapid cancer detection is required to aid physicians.
Computer vision technologies provide an approach to computer-aided diagnosis assisted by digital cameras. Conventional color cameras acquire color intensity from three broad spectral visible bands, i.e., red, green, and blue. However, the actual information from these three bands is very limited [1].
Hyperspectral image (HSI) sensors generate two-dimensional spatial images along a third spectral dimension. Each pixel in the hyperspectral image has a sequence of reflectance in different spectral wavelengths that display the spectral signature of that pixel; this indicates that this kind of sensor measures the intensity over a hundred or more narrow spectral bands. Currently, hyperspectral imaging has been used for medicine, where it is known as medical hyperspectral imaging (MHSI). MHSI is a novel, camera-based method of imaging spectroscopy that integrates spatial and spectroscopic data from tissue in a set of images. MHSI delivers near-real-time images of biomarkers in tissue, thereby providing an assessment of pathophysiology and the potential to distinguish different tissues based on their spectral characteristics. Therefore, MHSI is a promising method for noninvasive, rapid, and inexpensive evaluation of cancer in the tumor bed at the time of diagnosis [2].
There have been several previous studies on MHSI in the last decade [3]. Panasyuk used MHSI in distinguishing tumors from normal breast and other tissues, which demonstrated a sensitivity of 89% and a specificity of 94% for the detection of residual tumors [4]. Akbari [5] detected gastric cancer by MHSI. Klaessens et al. [6] measured the changes in O 2 Hb and HHb concentration in tissues. Liu [7] and Li [8] used MHSI for tongue diagnosis in Traditional Chinese Medicine. Marzani [9] used an artificial neural network-based multispectral imaging system to reconstruct the hyperspectral cutaneous data. Novakovic [10] presented their work on the phototherapy of psoriasis based on spectrophotometric intracutaneous analysis. Medina [11] worked on the human iris in vivo by MHSI. Larsen [12] studied atherosclerotic plaques by MHSI. All these research results demonstrate that MHSI has tremendous potential for detecting important biomarkers based on their unique spectral signatures during the early stages of disease. However, some bottlenecks have limited its use for in vivo screening applications, most notably their huge temporal cost and poor spatial resolution. In addition, their sensitivity and accuracy need to be improved. For tongue cancer, because of the instinctive squirming of the human tongue and the noise caused by saliva, detecting the tumor range accurately is difficult under MHSI. To address this problem, we present an MHSI system based on an acousto-optic tunable filter (AOTF) and the corresponding classification algorithm based on sparse representation (SR). The rest of this paper is organized in four sections. Section 2 presents the data acquisition of the proposed system. Section 3 introduces the proposed method for cancer area detection. Section 4 then presents the experiments conducted to evaluate the performance of the proposed system. Finally, concluding remarks are offered in Section 5.

Hyperspectral Image Acquisition
Over the last years, hyperspectral imaging devices have been mainly based on two sequential acquisition principles. The first is wavelength scanning; single images are recorded for each different wavelength by many discrete filters or tuneable filters. The second is spatial scanning, which requires relative movement between camera and sample [13]. For medical applications, especially for tongue tumor detection, tuneable filters are preferable because they are fast and versatile, and do not require any mechanically moveable parts. Furthermore, the spatial resolution of HSI systems is (within limits) independent of the tuneable filter and can be optimized by selecting optics and cameras [14]. AOTF is a rapid wavelength-scanning solid-state device that operates as a tunable optical band pass filter. The acoustic wave is generated by radiofrequency signals, which are applied to the crystal via an attached piezoelectric transducer [15].
AOTF is based on the acoustic diffraction of light in an anisotropic medium and it has several advantages over traditional spectrometers, which are typically based on a filter wheel or a grating, and therefore require careful handling and frequent calibration. They also suffer from lower scan speeds and lower reliability. AOTFs are solid-state tunable filters with no moving parts and are therefore immune to orientation changes or even severe mechanical shock and vibrations. Moreover, AOTFs are high-throughput and high-speed programmable devices capable of accessing wavelengths at rates of up to 100 kHz, making them excellent tools for spectroscopy. Other advantages of AOTF technology are their broad tuning range (0.4-5 µm), large field of view, and the fact that they are electronically programmable. Image capture can achieve real-time acquisition (30 frames per second or much faster). Unlike grating-based instruments, no motion of the imager or object is required to obtain a complete image cube. This feature makes the structure of the new AOTF-based system simpler and more compact. A schematic diagram of the proposed system for tongue tumor detection is shown in Figure 1. The hyperspectral imaging system consists of a spectral illuminator, which provides spectroscopic light to the sample and a focal plane array detector, which captures the reflected spatial image information, synchronized by a computer program. The AOTF unit is a VA210, with a wavelength range of 600-1,000 nm (Brimrose Corporation, USA), controlled by a PC controlled RF Driver, which is handled by the computer through the RS232 cable. The camera is a JAI BM141-GE with a GigE port and a 1,392 × 1,040 array with 6.45 µm × 6.45 µm pixels, a frame rate of 30 frames/s at full resolution in continuous operation, which provides good performance for transferring the huge amount image data. 81 mono-channel images with 5 nm spectral resolution were used for tongue tumor detection. A pair of 500 W halogen lamps, which can provide a fairly uniform lighting of the subject, were used as light sources for illumination. Computer software was specifically developed by the authors for managing the spectral illumination turn on/off, data collection, image processing, and classification. The optical part of the system and a standard Tamron 28-80 mm f//3.5 lens are mounted in front of the AOTF unit. The light source, battery backup systems, and power supplies are placed on a cart, which provides the system with portability within and between surgical and clinical suites. Using this system, every point on the surface of the tongue is represented on the matrix detector by a series of monochromatic points that produces a continuous spectrum in the direction of the spectral axis, which is illustrated in Figure 2.

Method for Tumor Detection
In this section, the method for tumor detection in medical hyperspectral images based on SR is presented. First, the preprocessing on the hyperspectral image is introduced. Second, the details of the sparse representation model used in the proposed algorithm are described. Finally, SR is extended for tumor target detection. The overview of the method is shown in Figure 3. The method includes two stages: training and testing. For training, the hyperspectral images of the tongue tumor are collected to learn the dictionary for SR after the denoising module and normalization module. Similar to the training stage, the test sample also has a sparse representation after denoising and normalization. Then, the reconstruction residuals are computed for comparison to get the decision. The following subsection describes these steps in detail.

Preprocessing
The preprocessing step mainly includes the denoising and normalization. For tongue tumor detection using a hyperspectral imager, the noise is introduced because of the saliva on the surface of the tongue and its instinctive squirming. A median filter is used to remove the noise effects.
Normalization of the hyperspectral image data is necessary to eliminate the influence of the dark current. A standard reference white board was placed in the scene of imaging, and its data were utilized as the white reference. This white reference is a standard reflectance that should be used for data normalization, which shows the maximum standard reflectance in each wavelength and in the capturing of time temperature. The reflectance from the board provides an estimate of the incident light on the tongue at each wavelength, which is used in the normalization of the spectrum. The dark current was captured by keeping the camera shutter closed. Then the data were normalized to determine the relative reflectance using the following equation: where ( ) R λ is the calculated relative reflectance value for each wavelength,

Sparse Representation
SR has proven to be an extremely powerful tool for representing and compressing high-dimensional signals. This success is mainly because important classes of signals, such as audio and image signals, have natural sparse properties [17]. SR is able to extract the simple but important properties of the data. SR has been used for face recognition in gray images successfully [18]. Recently, Chen [19] has extended SR for classification in hyperspectral images. For completeness, SR is briefly introduced as follows.
Suppose that there are L image classes and n training images with p × q pixels. Each image can be represented as a column vector with D = p × q dimensionality. This means that the image with p × q pixels can be represented as a column vector with p × q dimensionality by concatenating each column of the image. Let A test vector D y R = from an unknown class, which can be represented by a linear combination of the training vectors as: 1 1 where ij a R ∈ are the coefficients. Equation (3) can be written as: where 1 ⋅ denotes the 1 l norm. This problem is often known as basis pursuit and can be solved in polynomial time [20]. The 1 l norm is an approximation of the 0 l norm. The approximation is necessary because the optimization problem in Equation (6) with the 0 l norm, which is used to seek the sparsest α , is NP-hard and computationally difficult to solve [21]. Considering that noise is inevitable in natural images, Equation (6) can be written as: 1 arg min = α α α subject to 2 y ε − ≤ Aα (7) where ε is the error tolerance.Therefore, the test sample can be represented as:

Tumor Detection Based on SR
Based on the work by Chen [19], a method for tumor detection in MHSI is proposed. Let x be a hyperspectral pixel observation, which is a B-dimensional vector whose entries correspond to the spectral bands with B being the number of spectral bands. In the hyperspectral images of the tongue, if x is a noncancerous pixel, its spectrum approximately lies in a low-dimensional subspace spanned by the noncancerous training samples. Then, x can be approximately represented by a linear combination of the training samples as follows:  (9) where nc N is the number of noncancerous training samples, nc A is the B × N c noncancerous dictionary consisting of the noncancerous training pixels, and α is a sparse vector whose entries contain the abundances of the corresponding atoms in nc A . A cancerous pixel x can be sparsely represented by a linear combination of the training samples: where N c is the number of cancerous training samples, A c is the B × N c cancerous dictionary consisting of the cancerous training pixels, and β is a sparse vector whose entries contain the abundances of the corresponding atoms in c A . A test sample lies in the union of the noncancerous and cancerous subspaces. Therefore, by combining the two dictionaries nc A and c A , a test sample x can be written as a sparse linear combination of all training pixels: where A is a ( ) If the solution is sparse enough, the optimization problem in Equation (12) can be solved efficiently as a linear programming [20] or by greedy pursuit algorithms [22,23].
Once the sparse vector is obtained, the class of x can be determined by comparing the residuals || || and || || , where and represent the recovered sparse coefficients that correspond to the noncancerous and cancerous dictionaries, respectively. In the proposed approach, the algorithm output is calculated by If ( ) 0 D x > , then x is determined as a cancerous pixel; otherwise, x is labeled as noncancerous.

Experimental Results and Analysis
To the best of our knowledge, there is no public medical hyperspectral image database. Therefore, we constructed our own tongue tumor image database. The current database includes 65 tumors and performed partial resection of 34 tumors, which yields 34 full tumor/partial resection/tumor bed sets for analysis. For performance evaluation, both the results under MHSI and histopathology were recorded. Figure 4 presents examples of tongue tumor hyperspectral images in the database. Each pixel in the hyperspectral image has a sequence of brightness in different wavelengths, which constructs the spectral signature of that pixel. The difference in spectral signature between the tumor and the normal tissue can be determined. The curves in Figure 5 show the difference in spectral signatures between normal and cancerous tissues. The values are the averages of the reflectance of the pixels from the normal and cancerous tissue regions. The curves are smoothed for a clear image. The standard deviation in each wavelength is shown in Figure 6. In this figure, the red squares and blue circles represents the standard deviation of the reflectence values of the pixels from the normal tissue region and the cancerous tissue region respectively. The differences between the spectral signatures are strongly related to the protein changes, as shown in the paper by Tsenkova [24].
SR classifier is used to detect the cancerous tissues based on the spectral signature. As the spectral signatures of normal tissues are different from cancerous ones, 81 bands were used without compression to separate the normal from the cancerous parts. After this step, majority of the pixels were detected, although there were some that were lost because of glare. To address this problem, we used mathematical morphology method as a post processing step to fill the holes. In this study, to lend credibility to the performance analysis of the system, histopathologic analysis was performed by a physician on each sample and confirmed normal and malignant tissue locations as the basis for comparison. Figure 7 shows the performance of the detection using the proposed method based on SR. As shown in the figure, the system was able to identify the same cancerous regions as the medical expert. To evaluate the performance of our method, we randomly chose around 10% of the labeled samples for training and 90% for testing. The number of training and test samples for each class is shown in Table 1.   The proposed method, as well as the classical methods, e.g., support vector machine (SVM) [25], relevance vector machine (RVM) [26], were applied to the MHSI for tongue tumors and the results were compared quantitatively by the curves shown in Figure 8. The graph describes the probability of detection as a function of the percentage of training samples. In Figure 8, the proposed method based on a 2D medium filter and SR outperform the other two popular methods with 96.5% accuracy. The method worked well even in tumors up to a depth < 3 mm and was covered with mucosa [5]. This SR-based method, unlike SVM and RVM, search for dedicated atoms in the training dictionary for each test pixel (i.e., the support of the sparse vector is dynamic). Therefore, the sparsity-based algorithms are more computationally intensive than SVM and RVM. The computer used in this system was equipped with an Intel ® CPU i7 with a 4 GB random access memory. The comparison in terms of speed of classification is shown in Table 2. As shown, SR achieves much faster classification times than the other popular methods.

Methods
Our method SVM RVM Classification time (s) 3.4 7.8 6.9 The performance criteria for cancer detection were the false negative rate (FNR) and the false positive rate (FPR), which were calculated for each hyperspectral image. When a pixel was not detected as a tumor pixel, the detection was considered as false negative if the pixel was an actual tumor pixel in the pathological results. FNR refers to the number of false negative pixels divided by the total number of tumor pixels. When a pixel was detected as a tumor tissue, the detection was a false positive if the pixel was not a tumor. FPR refers to the number of false positive pixels divided by the total number of normal tissues. The numerical results of the FPR and FNR and a comparison among our method, SVM, and RVM is given in Table 3. Table 3. Evaluation results with FPR and FNR.

Conclusions
A hyperspectral image system for tongue tumor detection based on AOTF technology has been presented. Algorithms based on the spectral characteristics of the tissues and sparse representation, were proposed to distinguish between tumors and normal tissues. The capability of the system has been proven through an MHSI dataset. A best recognition rate of 96.5% was achieved. The experimental results demonstrate that the system has great potential as an important imaging technology for medical imaging devices that provide additional diagnostic information regarding tissues under investigation. Although the final diagnostic decision remains the burden of physicians, the system supports physicians during decision making.
Follow-up studies on patients are planned to allow the quantitative grading of tumors automatically according to their clinicopathologic features and to study further the spectrochemical properties of tumor tissues. This system has obvious applications as a computer-aided medical diagnostic tool. The modality of imaging combined with spectroscopic data will prove useful in tumor detection and in the assessment of tissue response to therapy.