Fast PET Scan Tumor Segmentation Using Superpixels, Principal Component Analysis and K-Means Clustering

Positron emission tomography (PET) scan images are used extensively in radiotherapy planning, clinical diagnosis, and the assessment of tumor growth and treatment. All of these rely on the fidelity and speed of the detection and delineation algorithm. Despite intensive research, segmentation remains a challenging problem because of diverse image content, resolution, shape, and noise. This paper presents a fast PET tumor segmentation method using superpixels. Principal component analysis is applied to the superpixels and their average. The distance of each superpixel from the average is computed in the principal component coordinate system. Finally, k-means clustering is applied to the resulting distance vector to separate tumor from non-tumor superpixels. The proposed approach is implemented in MATLAB R2016a and achieves promising accuracy with an execution time of 2.35 ± 0.26 s. Execution is fast because the number of superpixels, and hence the size of the distance vector on which clustering is performed, is small compared to the number of pixels in the image.


I. INTRODUCTION
Positron Emission Tomography (PET) is a non-invasive nuclear medicine functional imaging method that images the distribution of biologically targeted radiotracers with high sensitivity. PET imaging provides detailed quantitative information about many diseases and is often used to evaluate cancer, where segmentation plays a principal role. Image contrast enhancement is an essential pre-processing stage in image segmentation [1]. For several years, great effort has been devoted to image enhancement techniques, including the wavelet-contourlet transform [2], iterative denoising and partial volume correction [3], and iterative deconvolution [4].
Segmentation can be thought of as two consecutive processes: recognition and delineation. Recognition determines where the targeted object is in the image, while delineation defines the spatial extent of the recognized region [5]. Manual segmentation has been shown to be time-consuming, labor-intensive, operator-dependent, and subjective, which makes it less precise and less reproducible [6], [7]. In the recognition process, regions of high tracer uptake are identified either manually or automatically [8].
Although fewer PET image segmentation methods have been published than for CT and MRI [6], several approaches exist, such as graph cut and locally connected conditional random fields via energy minimization [9], and a binary and Gaussian filtering regularized level set method capable of detecting weak tumor boundaries [10]. In addition, [11] developed segmentation based on k-means and fuzzy c-means clustering; however, clustering was applied directly to the image pixels, which increases the execution time.
PCA-based analysis of the internal statistics of image patches gives considerable insight into recognizing patterns in an image [12], where it was applied to detect salient objects in natural images.
This paper presents an implementation of an unsupervised, automatic PET image segmentation system that detects tumor regions in PET scans. Section II presents the mathematical formulation and implementation of the proposed approach, which consists of contrast enhancement, superpixel extraction, and PCA followed by k-means clustering to recognize the cancerous superpixels. Section III is devoted to the discussion and evaluation of the simulation results. Finally, Section IV concludes the paper.

II. IMPLEMENTATION
The workflow of the proposed approach is divided into three stages: preprocessing, feature extraction, and clustering-based segmentation. The second stage is divided into three sub-steps and the third into two, as shown in Figure 1.

A. Preprocessing
Image enhancement is a subjective process that makes the image suitable for the next step.
In this paper, piecewise contrast enhancement is applied during preprocessing. After extensive observation of different images, it was found that stretching pixel values greater than 110 to the gray-value range 200 to 255 using piecewise linear stretching makes the image easier to cluster. This is expressed mathematically in equation (1), where $I$ is the input image and $I_{enh}$ is the contrast-enhanced image.
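As a rough illustration of the stretching step, the following Python sketch maps intensities above 110 linearly onto the range 200 to 255. It is not the MATLAB implementation used in this work: the function name `stretch_contrast`, the 8-bit input maximum of 255, and the choice to leave pixels at or below the threshold unchanged are assumptions made for the example.

```python
def stretch_contrast(image, threshold=110, out_low=200, out_high=255, in_high=255):
    """Piecewise linear stretch of a 2-D image given as a list of rows.

    Intensities in (threshold, in_high] are mapped linearly onto
    [out_low, out_high]. Pixels at or below the threshold are returned
    unchanged, which is an assumption of this sketch.
    """
    scale = (out_high - out_low) / (in_high - threshold)
    enhanced = []
    for row in image:
        new_row = []
        for value in row:
            if value > threshold:
                # Linear map of the bright range onto [out_low, out_high].
                value = round(out_low + (value - threshold) * scale)
            new_row.append(value)
        enhanced.append(new_row)
    return enhanced


print(stretch_contrast([[0, 110, 111, 255]]))  # [[0, 110, 200, 255]]
```

With these defaults, every pixel brighter than 110 ends up in the top of the gray range, which separates high-uptake regions from the background before clustering.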

B. Feature Extraction
Feature extraction simplifies the content of a large data set so that it can be described efficiently, which facilitates further processing, lowers storage requirements, and reduces dimensionality. In this paper, features are extracted using superpixels and Principal Component Analysis (PCA) as described below.
A superpixel is a group of neighboring pixels with similar intensity. The Simple Linear Iterative Clustering (SLIC) algorithm [13] is applied because of its short computation time [14], [15]. The superpixels extracted by SLIC differ in size: in some regions of the image (most often the tumor region) only a small number of adjacent pixels share a similar value, while in the non-tumor part of the image the superpixels are larger. However, PCA requires superpixels of equal size. This problem is solved as follows:
1) The average superpixel size is computed as shown in equation (2), where $N$ is the number of superpixels, $n_i$ is the number of pixels in the $i$-th superpixel, and $M$ is the average number of pixels per superpixel.
2) Each superpixel is then resized to the average size by padding the smaller superpixels and removing intensity values from the larger ones. Instead of appending random intensity values to the smaller superpixels, we pad by repeating the last pixel value of the superpixel itself. Finally, the superpixel matrix is generated as shown in equation (3), where each column holds the pixels of one superpixel, $M$ is defined in equation (2), and $N$ is the number of superpixels.
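The size equalization described in steps 1) and 2) can be sketched in Python as follows. The function name `equalize_superpixels`, the rounding of the average size to an integer, and the choice to drop the trailing pixels of oversized superpixels are assumptions; the text only specifies that padding repeats the last pixel value of the superpixel.

```python
def equalize_superpixels(superpixels):
    """Bring all superpixels to the average size M and stack them as columns.

    `superpixels` is a list of non-empty lists of pixel intensities,
    one list per superpixel. Returns (matrix, m), where `matrix` has
    m rows and one column per superpixel.
    """
    n = len(superpixels)
    # Average number of pixels per superpixel (rounding is an assumption).
    m = round(sum(len(sp) for sp in superpixels) / n)

    columns = []
    for sp in superpixels:
        if len(sp) >= m:
            # Too large: keep the first m pixels (assumed truncation rule).
            col = list(sp[:m])
        else:
            # Too small: pad by repeating the last pixel value.
            col = list(sp) + [sp[-1]] * (m - len(sp))
        columns.append(col)

    # Transpose so that each column of the matrix is one superpixel.
    matrix = [[col[r] for col in columns] for r in range(m)]
    return matrix, m


matrix, m = equalize_superpixels([[1, 2], [3, 4, 5, 6], [7, 8, 9]])
print(m)       # 3
print(matrix)  # [[1, 3, 7], [2, 4, 8], [2, 5, 9]]
```

Repeating the last pixel keeps the padded values consistent with the superpixel's own intensity, so the padding does not introduce variance that PCA would later pick up.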
The goal is to detect cancerous pixels, and in PET images the pixels that belong to the tumor have a distinct intensity due to high uptake of the radioactive tracer. We therefore need a method that analyzes the internal statistics of the data and makes the cancerous superpixels easy to distinguish. PCA is a widely used method for studying the internal statistics of data. In addition, PCA reduces the dimensionality of the data [17]. In our implementation, PCA of the superpixels is performed as follows:
1) Compute the average superpixel $S_a$, where $S_i$ denotes the $i$-th superpixel.
2) Determine the covariance of the superpixels, $C_s$, where $Y$ is the superpixel matrix after average superpixel padding and $Y^{t}_{a}$ is the mean of the transpose of $Y$.
3) Calculate the eigenvectors and eigenvalues of the covariance matrix, where $P$ is the matrix whose columns are the eigensuperpixels (principal components) and $\Sigma$ is the diagonal matrix of eigenvalues.
4) Project the superpixels onto the principal components that contain most of the variance of the data. Here, the number of principal components is the same as the number of superpixels. As stated in [19], the principal components (eigenvectors) that together contain at least 95% of the variance of the superpixels can represent the whole image with confidence. This reduces the dimensionality, since most of the information is carried by the components with the two or three largest eigenvalues. In our implementation, 95% of the variance of the superpixels was contained in the top two principal components for most of the images. Once the $K$ dominant eigenvectors are found, the superpixel matrix is projected onto these dominant eigensuperpixels using equation (7), where $P_k$ is the matrix of eigenvectors that contain at least 95% of the variation in the image and $P_{Proj}$ is the projection of the superpixel matrix onto $P_k$.
5) Calculate the distance of each superpixel from the average superpixel. When computing the distance, the distribution of the superpixels in the principal component coordinate system should be taken into account [12]. To incorporate this, we compute the distance along the principal components, which amounts to the $L_1$ norm distance in the principal component coordinate system, as shown in equation (8), where $\tilde{S}_i$ is the coordinate of $S_i$ relative to $S_a$ in the principal component coordinate system and $D(S_i)$ is the $L_1$ norm distance.
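The five steps above can be sketched with NumPy as follows. This is an illustrative sketch rather than the MATLAB code used in this work: the function name `pca_l1_distance`, the M × M covariance with 1/N normalization, and the cumulative-variance rule for choosing the number of components are assumptions, and the covariance may be formed differently in the original implementation.

```python
import numpy as np


def pca_l1_distance(Y, variance_kept=0.95):
    """L1 distance of each superpixel from the average in PCA coordinates.

    Y is an M x N array with one equal-length superpixel per column.
    Returns (distance, k): an array of N distances and the number of
    principal components retained.
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[1]

    # Step 1: average superpixel, kept as a column for broadcasting.
    mean_sp = Y.mean(axis=1, keepdims=True)
    centered = Y - mean_sp

    # Step 2: covariance of the superpixels (normalization is assumed).
    cov = centered @ centered.T / n

    # Step 3: eigen-decomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negatives
    eigvecs = eigvecs[:, order]

    total = eigvals.sum()
    if total == 0:
        # All superpixels are identical: every distance is zero.
        return np.zeros(n), 0

    # Step 4: keep the fewest components reaching the variance target.
    ratio = np.cumsum(eigvals) / total
    k = int(np.searchsorted(ratio, variance_kept)) + 1
    projected = eigvecs[:, :k].T @ centered       # k x N coordinates

    # Step 5: L1 norm along the retained principal components.
    distance = np.abs(projected).sum(axis=0)
    return distance, k
```

For a matrix in which one column is much brighter than the rest, the returned distance is largest for that column, which is the property the clustering step relies on.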

C. Tumor Detection and Contouring
A variety of PET segmentation methods currently exist. The most commonly used are Fuzzy Locally Adaptive Bayesian (FLAB) segmentation, classification/clustering methods, and mixtures of them. As stated in [6], there is a growing need for research in clustering-based methods, as they are capable of detecting tumors with complex shapes in heterogeneous PET images. In our work, after the distance vector is calculated in the principal component coordinate system, k-means clustering is applied. K-means is an algorithm that clusters a set of data based on a distance measure. In our case, it separates the superpixels into tumor and non-tumor, a binary classification posed as the minimization problem in equation (9), where $c_i$ is the set of points that belong to cluster $i$, $\mu_i$ is the center of the $i$-th cluster, $X$ is the distance vector extracted above, and $D$ is the squared Euclidean distance. Morphological operations (erosion and dilation) are then applied to delineate the spatial extent of the tumor.
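Because the distance vector is one-dimensional and only two clusters are needed, the clustering step reduces to a very small problem. The sketch below is a minimal two-cluster k-means in plain Python. The function name `kmeans_two_clusters` and the initialization of the centers at the minimum and maximum distance are assumptions, and the label 1 simply marks the cluster that lies farther from the average superpixel.

```python
def kmeans_two_clusters(distances, max_iter=100):
    """1-D k-means with k = 2 on a list of superpixel distances.

    Returns (labels, centers): labels[i] is 1 if distances[i] belongs to
    the cluster with the larger center and 0 otherwise.
    """
    # Deterministic initialization at the extremes (an assumption).
    low, high = min(distances), max(distances)
    labels = [0] * len(distances)

    for _ in range(max_iter):
        # Assignment step: nearest center in squared Euclidean distance,
        # which in one dimension is the nearest in absolute difference.
        labels = [1 if abs(d - high) < abs(d - low) else 0 for d in distances]

        # Update step: each center becomes the mean of its members.
        near = [d for d, lab in zip(distances, labels) if lab == 0]
        far = [d for d, lab in zip(distances, labels) if lab == 1]
        new_low = sum(near) / len(near) if near else low
        new_high = sum(far) / len(far) if far else high

        if new_low == low and new_high == high:
            break  # centers stopped moving
        low, high = new_low, new_high

    return labels, (low, high)


labels, centers = kmeans_two_clusters([1.0, 1.2, 0.8, 9.0, 10.0])
print(labels)  # [0, 0, 0, 1, 1]
```

Clustering a few hundred scalar distances in this way is far cheaper than clustering every pixel of the image.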

III. RESULT AND DISCUSSION
Figure 2 shows the input image, and Figure 3 the corresponding enhanced image. The contrast between the tumor and non-tumor regions of the image is clearly enhanced. Figure 5 illustrates the $L_1$ norm distance of the superpixels from their average in the principal component coordinate system. The horizontal axis represents the superpixel index and the vertical axis the distance from the average superpixel. For the input image in Figure 2, the distance vector has 692 elements, far fewer than the number of pixels in the image (233 × 328 = 76,424). This is why the execution time of the proposed approach is so short. Non-tumor superpixels (green points) lie near the average superpixel, while tumor superpixels (red stars) lie far from it. In addition, the heat map of the superpixel distances from the average in image space shows the superpixels with a large distance in yellow, as indicated by the color bar, which makes them clearly distinguishable from the other superpixels. The internal statistics of tumor superpixels differ strongly from the average, so their distance is very large. This classification can fail if the cancerous part of the image is larger than the normal part. To handle this case, we include an additional step that checks some pixel values from each class, so that the cluster to which the tumor belongs is identified more accurately.
Figure 7 shows the final result of our segmentation algorithm. As depicted in the figure, the cancer region is identified and delineated correctly.
Table 1 lists the sizes of three sample 2D images obtained from the three scans in the dataset used, together with the number of superpixels, the size of the distance vector, the execution time, and each scan's average Dice similarity. Although [16] and [18] tackled a problem similar to the one presented in our work, they did not report the execution time of their algorithms. The main aim of our paper was to design a fast PET tumor segmentation method. As Table 1 shows, the proposed approach executes very quickly, for the following reasons. First, the number of superpixels is usually small compared to the number of pixels in the image. Second, PCA further reduces the dimensionality of the data that is fed to the clustering step. In addition, MATLAB's vectorization capability is exploited extensively throughout our implementation.
Additionally, the Dice similarity of our algorithm was 84.2%, which is comparable to and competitive with the work in [16] and [18], which reported Dice similarity values of 80%-85% and 92%, respectively.
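For reference, the Dice similarity between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). A minimal Python sketch follows, assuming both masks are given as flat sequences of 0/1 labels of equal length. The function name `dice_similarity` is hypothetical, and returning 1.0 when both masks are empty is a convention chosen for the example.

```python
def dice_similarity(predicted, reference):
    """Dice similarity of two binary masks given as flat 0/1 sequences."""
    # Pixels labeled as tumor in both masks.
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    # Total number of tumor pixels in the two masks.
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Two empty masks are treated as a perfect match (a convention).
    return 2 * intersection / total if total else 1.0


print(dice_similarity([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A value of 1 means the two masks coincide exactly, and a value of 0 means they share no tumor pixels.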

IV. CONCLUSION
In this paper, we described and evaluated a PET image segmentation method that extracts the cancerous part of the image. Piecewise contrast enhancement was first applied to the input image. Then, superpixel extraction and PCA were performed to extract features for segmenting the image. After that, k-means clustering was applied to classify the image regions into cancerous and non-cancerous parts. The experimental results show that the proposed approach is capable of providing robust segmentation with fast execution time.
One of the major challenges encountered was the lack of publicly available PET datasets on which to test the algorithm's performance, which is why the algorithm was tested on only a small number of PET images. Testing the algorithm and tuning its parameters on other PET datasets would therefore help improve its generalization.

Figure 2: Input image

For the input image in Figure 2, it was found that 95% of the variance of superpixels is contained in the top two eigensuperpixels.

Figure 4: Scatter plot of the projection of the superpixels of the enhanced image onto the principal components

Figure 6: Heat map plot of superpixel distances in image space and k-means clustering of the superpixels

Figure 8: Input images and tumor segmentation results for some test PET images. The first column shows the input images and the second column the corresponding segmented images.

Table 1: Scan sample sizes, number of superpixels, size of the distance vector, execution time, and average Dice similarity per scan.