Supervoxels-Based Histon as a New Alzheimer’s Disease Imaging Biomarker

Alzheimer’s disease (AD) represents the prevalent type of dementia in the elderly, and is characterized by the presence of neurofibrillary tangles and amyloid plaques that eventually leads to the loss of neurons, resulting in atrophy in specific brain areas. Although the process of degeneration can be visualized through various modalities of medical imaging and has proved to be a valuable biomarker, the accurate diagnosis of Alzheimer’s disease remains a challenge, especially in its early stages. In this paper, we propose a novel classification method for Alzheimer’s disease/cognitive normal discrimination in structural magnetic resonance images (MRI), based on the extension of the concept of histons to volumetric images. The proposed method exploits the relationship between grey matter, white matter and cerebrospinal fluid degeneration by means of a segmentation using supervoxels. The calculated histons are then processed for a reduction in dimensionality using principal components analysis (PCA) and the resulting vector is used to train an support vector machine (SVM) classifier. Experimental results using the OASIS-1 database have proven to be a significant improvement compared to a baseline classification made using the pipeline provided by Clinica software.


Introduction
Alzheimer's disease (AD) is the predominant form of dementia in the elderly, and the number of patients is expected to multiply over the next few years [1]. Alzheimer's disease, at its earliest stage, involves small-scale alterations in the brain defined by the presence of neurofibrillary tangles and beta-amyloid plaque deposits (Aβ) [2]. These alterations result primarily in damage to synapses, followed by degeneration of the axons and, ultimately, atrophy of the dendritic tree and perikaryon and leading to atrophy in specific regions of the brain [3]. This process especially affects specific areas of the brain such as the right and left hippocampus, temporal gyri, cingulate and precuneus [4].
It is becoming increasingly apparent that, when a patient is diagnosed with Alzheimer's disease, the atrophy is already well established in the brain. The earliest clinical presentation of symptoms that can eventually progress to a clinical diagnosis of Alzheimer's disease is generally classified as mild amnesic cognitive impairment (MCI) [5]. MCI can be considered a state of clinical impairment, commonly memory loss, beyond that which can be expected for the subject's age and education without meeting the criteria for classification as dementia. Although not all patients with amnestic MCI will develop AD, cerebral atrophy is already present at this stage [6]. In mildly Alzheimer's disease affected individuals, entorhinal volumes have already been reduced by 20-30% and hippocampus volumes hippocampus, thalamus, and lateral ventricles, in order to perform a linear discriminant analysis for AD prediction.
Related to the methods derived from density maps, but based on the use of textural measurements instead of direct measurements, in this work, we propose the extension of the concept of histons [30] to volumetric images and its use as a textural feature in the classification of T1-weighted MRI images, in order to differentiate Alzheimer's disease (AD) patients from cognitive normal (CN) patients. Textural methods can identify voxel-intensity patterns and relationships hidden from the unaided human eye [31]. The histon concept represents a way to visualize information about color regions in an image. Compared to other textural characteristics, a histon is particularly sensitive to subtle variations in color in relation to the space [32]. The proposed work offers a simple whole-brain descriptor based on the relationship between the voxel probabilities corresponding to gray matter, white matter or cerebrospinal fluid-instead on a unique voxel intensity measurement as in [33] or [34]. The partitioning of the MRI image into these three probability volumes allows each of them to be equated to a spectral band and to characterize the MRI volume within the RGB color space.
The rest of the paper is organized as follows: Section 2 briefly reviews the concepts underlining this work: histons and supervoxels. The materials used and the general pipeline of the presented method are detailed in Section 3. The experimental results are presented and discussed, in Section 4, concluding in Section 5 with the final remarks and possible future work related to the proposed classification method.

Background
This section presents the underlying concepts behind our proposal, those related to the superpixel-based segmentation carried out, as well as specific features proposed for Alzheimer's disease classification.

Superpixel Segmentation and SLIC
A superpixel can be defined as a perceptually uniform region in the image (see [35]). The concept behind the idea of superpixels is the fact that the construction of an image using pixels is simply a technological limitation associated with the image capturing device, not a real property of the source of that image. A superpixel segmentation produces a set of small spectrally-constrained areas in the image, an over-segmentation, which can be used for the estimation of local scale image features. Superpixels capture redundancy in the image and significantly reduce the complexity of the subsequent image processing operations. Since their introduction, superpixels have been successfully used in applications such as pre-processing steps in image segmentation [36,37], depth estimation [38], tracking [39] or skeletonization [40]. Multiple techniques have been developed to generate superpixels. In this work, we will use SLIC. Simple Linear Iterative Clustering (SLIC, [41]) is a spatially constrained revision of the k-means algorithm. It begins by sampling a number of regularly-spaced positions as centres of the clusters, followed by a k-means clustering procedure. SLIC redefines the k-means distance D w by adding a second, colour-based component so the clustering distance between two different pixels is weighted by the colour distance and space distance, defined as: where both the colour and spatial distance between the pixels I (x 1 , y 1 , s i ) and I (x 2 , y 2 , s i ) in the spectral band s i are represented by d c and d s , respectively. Related to the spatial distance, the parameter In is the sampling interval of the cluster's centroids. Associated with the color distance, the parameter m controls the compactness of the superpixels, as the greater the value of m, the more the spatial proximity is emphasized. The spatial distance restrictions ensure superpixel homogeneity. SLIC produces an even superpixel distribution on the image that adheres to the object's limits as well as or better than the supepixels generated by others methods (see [42]). The SLIC generation algorithm can be easily modified to work on different spatial or spectral domains. Specifically, the extension of these ideas to a volumetric image is called "supervoxel".

Histon
A histon [43] is a contour added on the top of any of the existing histograms of the spectral components of the image. It exploits co-occurrences between neighbouring pixels, both on the same spectral plane as in adjacent planes, as a method of asserting an intra and extra planar correlation between the components of the image. In this context, a histon can be considered as a textural feature. A histon is defined by a similar colour sphere, known as similarity threshold or expanse, E, and a spatial distance measurement d t (x, y) that defines which pixels should be inserted into each bin of a histogram.
A similarity threshold defines an area in the spectral intensity space in which all the intensity values within this area can be considered part of the same colour value. For an intensity value g in the base histogram, this similarity threshold defines the set of points to be evaluated for its pertinence to the corresponding bin in the histogram. Given an image, I(x, y, s) of size M × N, where s are the spectral planes in the image, a histon can be expressed as: where, for each of the spectral components, L is the number of intensity levels, δ (·) is the Kronecker delta (consequently, δ (I (x, y, s i ) − g) that can be seen as a definition of a histogram) and S (x, y) is a similarity function that tests whether or not an element of the neighbourhood is part of the similar colour sphere (4). If the sum of the spectral distances of the planes is defined as s 1 , · · · s i , ∈ S p for the neighbourhood of sizes P × Q of any element in the image (x, y), the distance measurement d t (x, y) provides the following similarity function: In Section 3.3, we propose an extension of the concept of a histon based on a supervoxel segmentation.
The correlation between histogram and histon [43] has been used as an image segmentation method, both for photography [44] and in some modalities of medical imaging [45], including MRI [46]. However, the potential of a histon, in a way a context-aware histogram, as an element of characterization remains essentially unexplored.

Materials and Methods
The classification process presented is divided into three stages, as can be seen in Figure 1. In the first stage, the dataset is processed in order to carry out bias correction and spatial normalization, and to obtain a white matter/grey matter/cerebrospinal fluid segmentation. In the second stage, we calculate a set of histons to be used as a feature vector. To provide a natural neighbourhood for the histon-calculation process, we carry out an over-segmentation using the aggregate volume for GM, WM and CSF. Finally, feature reduction is achieved by means of PCA and the resulting feature vector is used to train an SVM-based classifier.
The subjects included in this study were obtained from the Open Access Series of Imaging Studies (OASIS-1).

Dataset: OASIS
The Open Access Series of Imaging Studies (OASIS) [47] is a project aimed at making MRI data sets of the brain freely available to the scientific community. OASIS is made available from the Washington University Alzheimer's Disease Research Center, Dr. Randy Buckner at the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).
For this study, the "Cross-sectional MRI Data in Young, Middle-Aged, Non-demented and Demented Older Adults" (OASIS-1) have been selected. This collection is a cross-sectional dataset of 416 individuals of both genders, all right-handed, aged between 18 and 96. The set includes 100 patients (aged over 60) with a clinical diagnosis of Alzheimer's disease ranging from very mild to moderate. A summary of the demographic characteristics of for the OASIS-1 is shown in Table 1.

Clinica Software
In order to work with a standardized pre-processing work-flow, compatible with multiple neuroimaging databases, the volume pre-processing and general dataset management would be carried out using the Clinica software (version 0.1.0). Clinica is a software platform for clinical neuroscience research studies, developed by the ARAMIS Lab at the Institut du Cerveau et de la Moelle épinière (ICM, Brain & Spine Institute) in Paris, using multimodal data (neuroimaging, clinical and cognitive evaluations, genetics, etc.) and, most often, a longitudinal follow-up.
The general image pre-processing process of a T1-weighted MRI image implies tissue segmentation, bias correction and spatial normalization to the Montreal Neurological Institute (MNI) space. The Clinica software wraps the segmentation procedure from SPM (Statistical Parametric Mapping) [48] that carries out all these processes simultaneously, in a procedure known as "Unified segmentation" [49].
SPM models the brain as a layer of cerebrospinal fluid surrounding the gray and white matter. The prior probability that any voxel contains grey or white matter can be determined using a probabilistic atlas of tissue types. The main idea of this method is to model image intensities as a mixture of k Gaussians, where each Gaussian cluster is modelled by its mean, variance and a known tissue mixing proportion. In the unified model, multiple tissue probability maps are used as a priori information of the tissue classes. The Bayes rule is used to produce the posterior probability of each tissue class. This posterior probability is then combined with the data from the image to determine the final tissue type. Using this approach, two voxels with identical intensities can be identified as different tissues. An example of the results obtained can be seen in Figure 2. The pipeline then computes a group template by applying the DARTEL (Diffeomorphic Anatomical Registration Through Exponentiated Lie. Algebra [50]) diffeomorphic method to the T1-weighted MRI image of each subject considered. In computational anatomy, a diffeomorphic system is a system designed to assign metric distances on the space of anatomical images, in order to permit the quantization and comparison of morphometric changes in anatomical structures. Diffeomorphic mapping is a broad term that may actually refer to a number of different algorithms, processes, and methods. DARTEL is based on the idea of producing a bidirectional "flow field" as the core for image "deformation" in the process of image registering.
The DARTEL process begins by taking the parameter produced by a GM/WM/CSF segmentation and aligning it as close as possible to a set of tissue probability maps, by means of rigid transformations. In the next step, from the average of all the images, an initial template is created that is then used for the simultaneous registration of tissues between images. This model is used to compute individual deformations to each of the individual images, and finally the inverse of the deformations are applied and averaged, in order to regenerate the template. This process is repeated several times. When comparing data from a number of scans, all cerebral volumes are required to be in the same 3D space. In this process, it is achieved by normalizing the volumes on the space defined by the Montreal Neurological Institute (MNI) template.
Finally, Clinica provides a modular way of making a classification based on machine learning by combining different inputs, algorithms, and validation strategies. These modules rely on scikit-learn for classification purposes [51].

Features Extraction: Supervoxel-Based Histons for Structural MRI Alzheimer Detection
As can be seen in [52], in the process of unified segmentation, there is a relationship between neuronal degeneration and the presence of voxels with a relatively low probability of it being part of a specific type of tissue. Since Alzheimer's disease tends to manifest itself as atrophy in specific areas of the brain, in these cases, the process of tissue segmentation will tend to show "ambiguous" areas difficult to classify as one or the other type of tissue, in addition to the decrease in the total volume of white matter and grey matter associated with ageing. In this work, we propose the possibility of exploiting both the decrease in the total volume of GM and WM and the relationship between GM, WM and CSF through the use of histons as a textural volumetric characteristic.
In [53], we propose the use of super pixel segmentation (using SLIC) as a way of mitigating some of the limitations of the original histon generation method [44] (histons calculated from a color sphere based on a predefined neighbourhood and colour distance), as a method oriented to the segmentation of multispectral images. In this case, we have extended the original method to MRI volumes.
Thus, we have used an aggregation of the probability volumes for GM, WM and CSF to carry out a segmentation using supervoxels. In this respect, for segmentation purposes, the different probability volumes (which are represented as 8-bit intensity maps) can be equated to the spectral bands of a colour volume (see Figure 3). Using a supervoxel segmentation (over-segmentation), we get a way of characterizing the local similarities within the aggregate volume, obtaining a natural set of neighbourhoods for the generation of histons. Similarly to what can be seen in [53], the use of supervoxels as a neighbourhood implies the adherence of the neighbourhoods set to the boundaries and features present in the image, so, when a histon is calculated, there is a direct spatial relationship between the voxel tested for its belonging and the color sphere. Furthermore, we can quantify the overall local homogeneity of a volume using the average intensity standard deviation of the supervoxel-defined space since segmentation using supervoxels already produces locally homogeneous areas. We will associate intensity with the probability of it being GM, WM and CSF in each of the probability volumes. Thus, when calculating a histon, a voxel will be considered to be inside the color sphere if the distance between the mean intensity of its corresponding supervoxel and the intensity of that voxel is less than the mean local deviation of the probability volume in the space defined by the supervoxels, in each of the probability volumes. Let us denote the set of resulting supervoxels in an image segmentation as Cs = {Cs 1 , Cs 2 , · · · , Cs Ns }, where Ns is the total number of supervoxels; I(x Cs j , y Cs j , z Cs j , v i ) represents the centroid of the supervoxel Cs j in the probability volume v i ∈ V p , where V p is the set of probability volumes V p = {W M, GM, CLF}; and N p j is the number of voxels in that supervoxel. Then, the similarity function S 2 (x, y, z) is defined as: The use of supervoxels as a neighbourhood allows the probability distribution to be represented in a natural and accurate way as a supervoxel represents a real volume within the image, created by taking into account the features present in the image. Thus, the histons will encode the relationships between the probabilities of their being GM, WM and CFS by taking into account their spatial distribution in a volume, based on a natural set of neighbourhoods.
The feature vector obtained consists of 768 components (a different histon for each probability volume, each with 256 levels).
Working with high-dimensional feature vectors makes a classifier prone to over-fitting by choosing the wrong dimension as a discriminatory feature. To decrease the high dimensionality represented by the aggregate histon vector, principal component analysis (PCA) is used. PCA [54] aims to transform a set of original variables into a new set of variables, a linear combination of the original ones, called principal components (PCs), without losing any information. For a standardized dataset, the principal components can be calculated as the normalized eigenvectors of the covariance matrix of the original variables and can be sorted by the amount of variation found in the data they explain. From a geometrical point of view, each component can be viewed as the maximizing direction of the variance of the samples, uncorrelated to previous components, when they are projected onto the component itself. The number of components extracted is equal to the number of variables being analysed, so only a subset of them are generally used in the classification. Usually, only the first few components account for meaningful amounts of variance, and the rest will tend to represent only trivial amounts of variance.

SVM Classifier
To carry out the categorization of the T1-weighted MRI volumes, in order to separate Alzheimer's disease patients and cognitive normal patients, the Support Vector Machine (SVM) [55] has been selected to train the classifier, as it generally yields good results and is remarkably robust to model bias or model variance [56].
SVM is a general supervised learning method able to carry out binary group separation. An SVM belongs to the category of linear classifiers, as the classification is carried out by finding the plane or (depending on the dimensionality of the problem) hyperplane that better differentiates the two classes. The idea is to obtain what is called a maximum margin on each side of the hyperplane by selecting an equidistant separation hyperplane from the closest samples of each class. Only the data that define the borders (the support vectors) of those margins are considered. The search for the separation hyperplane in these spaces, normally of very high dimension, will be implicitly made using the so-called kernel functions. From an algorithmic point of view, the problem of optimizing the geometric margin represents a quadratic optimization problem with linear constraints that can be solved using standard quadratic programming techniques.

Results
The experiments were carried out in a subset of the T1-weighted MRI transversal image part of the OASIS-1 dataset, using all the subjects aged 60 and over. To assess the differences in demographic and clinical characteristics between groups (AD and CN), we used a Student's t-test for age and MMSE (Mini-Mental State Examination) and Pearson's chi-square test for gender. The significance level was set at 0.05. Significant differences between controls and patients were found for the MMSE, and gender. The gender differences between groups, as well as the general large variability in age of the dataset, are factors that can result in a bias in the classification results. Taking this into account, a second reduced dataset will be used, their subjects selected randomly under the criteria of minimizing gender differences between groups and, as far as possible, discarding outliers to decrease standard deviation of the age. A summary of the subject's demographics and dementia status for the population of both subsets is detailed in Table 2. Segmentation and feature extraction is carried out with our own implementation of SLIC on MATLAB (MATLAB and Statistics Toolbox Release 2017b, The MathWorks, Inc., Natick, MA, USA), feature reduction and classification is carried out using R.

Validation Strategy
In order to obtain unbiased estimates of the performances, following the recommendations presented in [57], each dataset is randomly split ten times into two groups: training sets and testing sets (80/20%). The split division process preserves the distribution of age and gender. Each training set is used to train a classifier, and their corresponding testing sets are used for evaluation purposes. The training sets obtained from the aged 60 and over dataset are also used to determine the optimal kernel for the SVM classifier. Individual demographic information for each split can be seen in Appendix A.
For comparison purposes, the results provided by the machine learning-based classification modules are used as a baseline, following the lines proposed in [58]. In this specific case, DARTEL-modulated gray matter probability maps obtained from the T1-weighted MRI images are used to calculate a linear kernel using the Gram matrix from the feature vectors of the subjects provided (all the voxels in the volume). This kernel is used as input for a generic SVM whose cost parameter is optimized to improve the balanced accuracy by means of an exhaustive grid search. This process is repeated on both aged over 60 and reduced datasets, for each split.

SVM Parameters and Kernel Selection
To decide the suitable size of the PCA-based dimensionality reduction, the scree graph method is used. The scree graph [59] shows the the eigenvalues of the covariance against number of principal components. The scree test is used to decide on the size of the feature vector via visual analysis, by looking for a "break" between the components with relatively large eigenvalues and those with small eigenvalues. When the curve bends displaying an "elbow", it is assumed that the variance explained will not increase significantly with the addition of mere eigenvectors. Figure 4 shows the scree graph, plotting the variance explained in terms of the first 30 main components of the dataset analysed. As can be seen, the elbow occurs between the 5th and 6th principal components; therefore, the first five components appear to be enough to describe the variance in the data. Since we do not know the specific characteristics of the processed dataset and, therefore, we do not know which is the most appropriate kernel, generic linear, polynomial and radial kernels for the SVM classifier are tested. We apply a 10-fold cross-validation methodology, repeating the folding experiment 10 times for a total of 100 iterations of the algorithm for each of the training subsets of the dataset of subjects aged 60 and over.
Adjusting the parameters in an SVM classifier represents a compromise between achieving the model that best fits the training set and maintaining the classifier ability to generalize to new data (see [60]). A process of parameter optimization (model fitting) can lead to a hyperplane too focused on classifying each element of the training set correctly, resulting in a loss of generalization properties. While this does not have to result in an overfitting problem, it is a possibility that should be taken into account, so no optimization step is carried out on the presented models. The default parameters for the SVM in R are used ( 1 for the cost parameter, 1/(data dimension) for the gamma parameter and 3 for the degree parameter). From the results presented in Table 3, we will select a linear kernel to carry out the rest of the experiments, as it represents an improvement all the performance measurements evaluated.  Table 4 shows the means of the results obtained with the dataset of subjects aged 60 and over (upper table) as well as with the reduced subset (lower table) using both the Clinica baseline and the proposed histon-based feature classification methods. Individual evaluation results for each split can be located in Appendix B.

Classification
To assess whether the proposed method performs significantly better than the Clinica baseline classifier, we used McNemar's chi-square tests. The use of histons as features represents an improvement in all the proposed evaluation metrics (McNemar test p < 0.05 for all splits, except split 5). As can be appreciated from both the confusion matrix (see Table 4, upper row) and the negative and positive prediction values (NPV and PPV), the errors of classification in the Clinica baseline have a slight bias towards false negatives, whereas, for the proposed method, errors are mainly associated with false positives. It should be noted the remarkable differences between the results presented in Tables 3 and 4 for the aged 60 and over dataset. This is the result of the different evaluation strategies between the two cases. Cross-validation not only has a pessimistic bias (see [57]), but the cross-validation folds are completely random, not respecting the age or gender distribution of the original dataset.
As expected, using the reduced dataset without the bias imposed by significant differences in gender between AD and CN groups and less variability in age, the results improve for both the Clinica baseline and the proposed method, as can be seen in Table 4 (lower table). In this case, the proposed method provides better results for all evaluation metrics compared to the results with the aged 60 and over dataset. The distribution of errors does not differ from the previous scenario (see Table 5, lower row). In this case, we can not claim significance for the results using the McNemar test.It should be noted that the McNemar's chi-square test may be inadequate for small sample sizes [61].

Conclusions
In this study, we propose the use of histons as a textural characteristic to carry out the categorization of T1-weighted MRI volumes, in order to separate Alzheimer's disease and cognitive normal patients. Specifically, Clinica software is used to carry out a preprocessing stage: tissue segmentation, bias correction and spatial normalization to MNI space. After the normalization stage, we perform an over-segmentation using the aggregate volume for gray matter, white matter and cerebrospinal fluid, in order to provide a natural set of neighbourhoods for the histon-calculation process. Then, a subset of the vectors features is selected using PCA. Finally, we train SVM classifiers using the reduced features.
The use of a volume-based histon aims to exploit the relationship between gray matter, white matter and cerebrospinal fluid. For this purpose, the method for histon calculation presented in [53] has been extended to volumetric images. The concept of histons represents a mean for visualization of color information for the evaluation of similar color regions in an image. Compared to other textural features, a histon is especially sensitive to subtle variations of color in relation to space, particularly when we provide a natural set of neighbourhoods for its creation through the use of supervoxels. This allows for quantifying the colour variations associated with neuronal degeneration on a RGB interpretation of an aggregation of the probability volumes for GM, WM and CSF.
Experimental results, on both the aged 60 and over and the reduced subset, have demonstrated a significant improvement in performance for AD versus CN classification compared to the direct voxel classification of the T1-weighted MRI volumes (baseline provided by Clinica). Although given the differences in age, gender, impairment and/or image quality between study populations it is impossible to make a direct comparison, in general, we can affirm that the results obtained are comparable to or better than those of similar textural methods (see [33,62] or [34]). On the other hand, the current implementation of the method, where histons are calculated on the whole brain, does not assert specific spatial patterns of cerebral degeneration associated with Alzheimer's disease. This may limit the method's ability to discriminate in the presence of cerebral atrophy associated with other pathologies, or even in very elderly patients.
The use of a standardized work-flow, provided by Clinica, represents an important step towards the reproducibility of this research and its comparability with future developments. However, it should be noted that this pre-processing step can be computationally expensive, especially if large datasets are considered.
The remarkable results obtained with the method proposed suggest the extension of the study to other cases, such as discrimination between cognitive normal and mild cognitive disorder or to predict the evolution from mild cognitive disorder to Alzheimer's disease, as well as to the expansion and refinement of the study group by extending it to other databases. In the future, it is planned to improve the discrimination capacity of the textural feature presented by applying it only to specific areas of the brain.

Conflicts of Interest:
The authors declare no conflict of interest.