## 1. Introduction

Paper and ink are the two most important pieces of evidence in forensic document analysis, and understanding the characteristics of both plays a vital role in the investigation of document forgery. To determine the authenticity of a document, forensic experts need to examine both the paper and the inks used. In the case of a multipage document, the presence of different paper types may indicate potential forgery. To extract this information, forensic experts rely on different techniques. The most commonly used techniques to detect forgeries include ultraviolet (UV) and infrared (IR) imaging [1], chemical analysis and visual inspection [2]. Document analysts prefer non-destructive methods, which preserve the original evidence after the analysis. Unfortunately, chemical methods usually cause some damage to samples and are therefore less popular than non-destructive techniques. The major techniques that follow the non-destructive paradigm are Fourier transform infrared (FTIR) spectroscopy [3], Raman spectroscopy [4], the video spectral comparator (VSC) [5], multispectral imaging and hyperspectral imaging (HSI) [6,7].

HSI combines spectroscopy and imaging to record the spectral information of the sample across the spatial area of interest. HSI captures hundreds of narrowband images in the visible and near-infrared regions, which results in a large amount of data. Motivated by the possibility of non-destructive investigation of material properties, HSI has become one of the most popular and trusted analysis tools in many fields of science, including food quality inspection [8], medical imaging [9], material science [10], cultural heritage imaging [11] and forensic investigation [12]. Compared to traditional RGB images, which have only three colour channels, HSI images can be considered as three-dimensional data cubes, with the third dimension encoding the spectral range, as shown in Figure 1. Each pixel of HSI data represents the spectrum at that spatial point, and this information can be used as a material fingerprint for characterizing each point.
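As an illustration of this data structure, the sketch below flattens a cube indexed as `cube[row][col][band]` into a matrix with one spectrum per row, which is the form consumed by dimensionality reduction methods. The nested-list layout and the helper names `cube_to_spectra` and `spectrum_at` are hypothetical; with NumPy the same step is usually a `reshape(-1, n_bands)` on the array.

```python
from typing import List

# Hypothetical layout: cube[row][col][band], i.e. two spatial axes plus one
# spectral axis, stored as nested lists for illustration only.
Cube = List[List[List[float]]]


def cube_to_spectra(cube: Cube) -> List[List[float]]:
    """Flatten the two spatial axes so that each row is one pixel's spectrum."""
    n_bands = len(cube[0][0])
    spectra = []
    for row in cube:
        for pixel in row:
            if len(pixel) != n_bands:
                raise ValueError("all pixels must have the same number of bands")
            spectra.append(list(pixel))
    return spectra


def spectrum_at(cube: Cube, row: int, col: int) -> List[float]:
    """Return the spectral 'fingerprint' recorded at one spatial point."""
    return list(cube[row][col])


if __name__ == "__main__":
    # Toy 2 x 3 image with 4 bands (toy values; the paper's data have 186 bands).
    toy = [[[r + c + b / 10 for b in range(4)] for c in range(3)] for r in range(2)]
    X = cube_to_spectra(toy)
    print(len(X), len(X[0]))  # 6 pixels, 4 bands each
    print(spectrum_at(toy, 1, 2))
```

Each row of the flattened matrix is then treated as one high-dimensional sample.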

HSI data contain redundant information, and efficient methods are required to extract the most interesting and useful information [13]. For hyperspectral dimensionality reduction, a popular choice is the well-known principal component analysis (PCA) approach [14], as outlined in several papers [3,15,16]. Other traditional techniques include Independent Component Analysis (ICA) [17] and Linear Discriminant Analysis (LDA) [18], as well as statistical methods [19,20,21]. Aside from these methods, a newer method known as t-Distributed Stochastic Neighbor Embedding (t-SNE) [22] is gaining popularity for dimensionality reduction problems. t-SNE has already been deployed in HSI processing and has obtained better results than the traditional methods. However, this technique has not yet been evaluated on HSI data of paper; hence, we explore the power of the t-SNE algorithm for the dimensionality reduction and visualization of hyperspectral data of paper samples. The main contribution of this research is to test and evaluate a t-SNE based workflow for unsupervised clustering of HSI images of paper samples, and to benchmark the proposed method against PCA. To this end, we created an HSI dataset of 40 different paper samples.

t-SNE was chosen as a candidate because of the following advantages over conventional methods. Primarily, t-SNE is one of the few algorithms capable of simultaneously retaining both the local and the global structure of the data; in addition, it computes probability-based similarities between points both in the high-dimensional space and in the low-dimensional space. Since its introduction, t-SNE has been applied in many fields, and we present a few of them here to show the range of applications. Walid et al. found that t-SNE is better able to resolve bio-molecular intra-tumor heterogeneity from mass spectrometry images [23]. Erdogan et al. applied t-SNE to the visualization of human tissue relationships [24], whilst in another study t-SNE was used as a scalable alternative for creating visualizations (projections) that give insight into the structure of time-dependent data sets [25]. Another example is the report by Kunihiko et al., which suggests visualizing curricula using a combination of cosine similarity, t-SNE and scatter plots to help students select their courses [26]. In addition, Chen et al. found that the t-SNE algorithm can be used to optimize underwater target radiated noise spectrum features in order to improve the accuracy and efficiency of the classification algorithm [27]. A few studies have applied t-SNE to HSI data sets. One of them, by Pouyet et al. [28], uses t-SNE to visualize HSI data of paint pigments. Song et al. also demonstrated the capability of t-SNE for remote sensing data processing [29]. In addition, a few reports are not focused on dimensionality reduction but nevertheless use t-SNE with HSI data [30,31].
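The probability similarities mentioned above can be sketched following the standard t-SNE formulation: Gaussian conditionals p_{j|i} in the spectral space, symmetrised to p_ij, a Student-t kernel for q_ij in the embedding, and the Kullback-Leibler divergence KL(P || Q) as the cost that t-SNE minimises. This is a minimal illustration, not the implementation used in this study; in particular it assumes one shared bandwidth `sigma`, whereas t-SNE proper tunes a separate bandwidth per point from the perplexity.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_affinities(X: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """Joint probabilities p_ij built from Gaussian conditionals p_{j|i}.

    Simplification (assumption): one shared sigma for all points; real t-SNE
    calibrates sigma_i per point to match the chosen perplexity.
    """
    n = len(X)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])
    # Symmetrise: p_ij = (p_{j|i} + p_{i|j}) / (2n), so the matrix sums to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_affinities(Y: Sequence[Sequence[float]]) -> Matrix:
    """Joint probabilities q_ij from a Student-t kernel with one degree of freedom."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def kl_divergence(P: Matrix, Q: Matrix) -> float:
    """KL(P || Q) summed over all pairs i != j (diagonal terms are zero)."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q)
               for p, q in zip(prow, qrow)
               if p > 0.0)


if __name__ == "__main__":
    # Two tight groups of toy 3-band "spectra" (not data from this study).
    X = [[0.0, 0.1, 0.2], [0.1, 0.1, 0.2], [5.0, 5.1, 5.0], [5.1, 5.0, 5.1]]
    P = high_dim_affinities(X, sigma=1.0)
    kept_apart = [[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [4.1, 4.0]]
    mixed = [[0.0, 0.0], [4.0, 4.0], [0.1, 0.0], [4.1, 4.0]]
    print(kl_divergence(P, low_dim_affinities(kept_apart)),
          kl_divergence(P, low_dim_affinities(mixed)))
```

An embedding that keeps the two groups apart yields a lower cost than one that mixes them, which is the property that t-SNE's gradient descent exploits.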

The performance of the proposed method is evaluated against PCA [14], which is one of the most commonly used methods for dimensionality reduction. In addition to visual comparison, quantitative methods are used to assess the clustering quality of the data processed by both methods, using k-means clustering.
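For readers unfamiliar with the clustering step, a minimal Lloyd's k-means on reduced (for example two-dimensional) data is sketched below. The seeding strategy is an assumption made to keep the sketch deterministic (farthest-first from the first point); the text does not state how k-means was initialised, and k-means++ or repeated random restarts are common alternatives.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_first(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic seeding (assumption): start at points[0], then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = _farthest_first(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _sq(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D embedding with two obvious groups (hypothetical values).
    Y = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3]]
    labels, _ = kmeans(Y, k=2)
    print(labels)  # expected: [0, 0, 1, 1, 0]
```

In this study's setting, k would be the number of paper samples and the resulting labels would be compared against the known paper identities using the clustering indices.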

The remainder of this paper is organized into three parts: the first part explains the HSI acquisition, sample preparation, algorithms and evaluation methods; the following part discusses the results; and the paper ends with a conclusion that points to possible future work.

## 3. Results and Discussions

This experiment used 40 paper samples, from each of which spectral samples of between 25 and 2500 pixels were selected, with the perplexity tuned for each sample size. The clustering indices were measured 20 times for each combination of sample count and perplexity, and the average clustering indices obtained for the optimal perplexities are given in Table 1 below.

The NMI value for the t-SNE processed data is high (0.92), indicating good clustering, compared to 0.72 for the PCA processed data. The CI and HI indices of the clustering obtained from t-SNE processed data also achieved scores close to unity, demonstrating the efficiency of t-SNE dimensionality reduction compared to PCA. Finally, the SI index, which indicates the tightness of the clustering, also gives the t-SNE algorithm the upper hand over PCA.
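Two of the indices named above, NMI and SI, have standard definitions that can be sketched in a few lines. This sketch assumes that NMI is normalised by the arithmetic mean of the two label entropies and that SI is the mean Euclidean silhouette; the normalisation used in this study is not stated in this section, and other variants (geometric mean, maximum) exist. The CI and HI indices are not sketched here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (nats) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Normalised mutual information I(U;V) / mean(H(U), H(V)).

    Arithmetic-mean normalisation is an assumption made for this sketch.
    """
    n = len(truth)
    joint = Counter(zip(truth, pred))
    cu, cv = Counter(truth), Counter(pred)
    mi = sum((nij / n) * math.log(n * nij / (cu[u] * cv[v]))
             for (u, v), nij in joint.items())
    denom = (_entropy(truth) + _entropy(pred)) / 2
    return 1.0 if denom == 0 else mi / denom


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Mean silhouette s(i) = (b_i - a_i) / max(a_i, b_i), Euclidean distance.

    a_i: mean distance to the other members of i's cluster;
    b_i: smallest mean distance to the members of any other cluster.
    Needs at least two clusters; singleton clusters score 0 by convention.
    """
    sizes = Counter(labels)
    scores: List[float] = []
    for i, p in enumerate(points):
        own = labels[i]
        if sizes[own] == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == own and j != i) / (sizes[own] - 1)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / sizes[c]
                for c in sizes if c != own)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # cluster names do not matter, only the grouping
    print(silhouette([[0, 0], [0, 1], [10, 0], [10, 1]], truth))
```

Both indices approach 1 for well-separated clusters that match the ground truth, which is how the scores in Table 1 should be read.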

Figure 6 visualizes the results of dimensionality reduction with PCA and t-SNE (where the original spectral dimension of 186 bands has been reduced to two dimensions). A simple visual inspection is enough to conclude that the t-SNE clusters are more distinguishable than those of PCA. In this respect, t-SNE provides a better visualization than PCA, which helps to reveal the structure of the data.

The clustering indices and visual inspection show that, for HSI data of paper samples, the t-SNE algorithm outperforms PCA. These findings are not surprising [32], since PCA seeks linear relationships between data points, which may fail to capture the structure of highly non-linear data such as spectra with 186 dimensions. This is because PCA projects the n-dimensional data onto an m-dimensional (m < n) linear subspace spanned by the leading eigenvectors of the covariance matrix of the original data, yielding a global linear model [41]. t-SNE, in contrast, is designed to mitigate this problem by capturing non-linear relationships, which helps it to produce better clustering.
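The linear projection described above can be sketched directly: centre the spectra, form the covariance matrix, find its leading eigenvectors, and project. The sketch below finds eigenvectors by power iteration with deflation, a choice made only to stay dependency-free (it assumes distinct leading eigenvalues); a practical workflow would call a library eigensolver or, for example, `sklearn.decomposition.PCA`. The function name `pca_project` is hypothetical.

```python
import math
import random
from typing import List, Sequence


def pca_project(X: Sequence[Sequence[float]], m: int = 2, iters: int = 500,
                seed: int = 0) -> List[List[float]]:
    """Project n x d data onto the m leading eigenvectors of its covariance matrix.

    Eigenvectors come from power iteration with deflation (assumes a gap
    between consecutive eigenvalues); this is an illustrative sketch only.
    """
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    Xc = [[x - mu for x, mu in zip(row, mean)] for row in X]  # centre each band
    # Sample covariance matrix (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]

    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]

    # Scores: each centred spectrum expressed in the m-dimensional linear basis.
    return [[sum(r[k] * comp[k] for k in range(d)) for comp in components] for r in Xc]


if __name__ == "__main__":
    # Toy 3-band data lying on a straight line: one component captures everything.
    X = [[float(t), 2.0 * t, 0.0] for t in range(-3, 4)]
    print(pca_project(X, m=2)[-1])
```

Because every output coordinate is a fixed linear combination of the input bands, curved or otherwise non-linear cluster structure in the 186-band space cannot be unfolded by this projection, which is the limitation discussed above.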

The experiment uses different sample sizes between 25 and 2500 pixels, and for each sample size t-SNE is executed over a list of perplexities to find the optimal perplexity. The perplexities used are 5, 10, 25, 50, 100, 300, 600 and 1000, and the optimal perplexity is selected as the value that gives the highest values for all clustering quality parameters.
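To clarify what the perplexity controls, the sketch below shows the usual t-SNE calibration step for a single point: a bisection search for the Gaussian precision beta = 1/(2 sigma_i^2) such that the conditional distribution p_{j|i} has perplexity 2^H, with H the Shannon entropy in bits. The function names and tolerances are assumptions for illustration, not the settings used in this study. Since the perplexity at beta = 0 equals the number of neighbours, large values such as 600 or 1000 are only attainable when the data set contains more points than that.

```python
import math
from typing import List, Sequence


def conditional_probs(dists_sq: Sequence[float], beta: float) -> List[float]:
    """p_{j|i} proportional to exp(-beta * d_ij^2), with beta = 1 / (2 * sigma_i^2)."""
    d_min = min(dists_sq)
    # Shifting by the smallest distance leaves the normalised result unchanged
    # but keeps at least one weight equal to 1, so the sum never underflows to 0.
    w = [math.exp(-beta * (d - d_min)) for d in dists_sq]
    total = sum(w)
    return [x / total for x in w]


def perplexity(p: Sequence[float]) -> float:
    """Perp(P_i) = 2 ** H(P_i), with H the Shannon entropy in bits."""
    return 2 ** -sum(x * math.log2(x) for x in p if x > 0)


def calibrate_beta(dists_sq: Sequence[float], target: float,
                   tol: float = 1e-5, max_iter: int = 200) -> float:
    """Bisection on beta so that p_{j|i} has the target perplexity.

    dists_sq: squared distances from point i to every other point.
    Perplexity decreases as beta grows and equals len(dists_sq) at beta = 0.
    """
    if not 1.0 < target < len(dists_sq):
        raise ValueError("perplexity must lie between 1 and the number of neighbours")
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(dists_sq, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:  # distribution too flat: narrow the Gaussian
            lo = beta
            beta = beta * 2 if math.isinf(hi) else (beta + hi) / 2
        else:              # distribution too peaked: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2
    return beta


if __name__ == "__main__":
    # Hypothetical squared distances from one pixel spectrum to 50 others.
    dists = [float(k) for k in range(1, 51)]
    for target in (5, 10, 25):  # larger values from the sweep need more neighbours
        beta = calibrate_beta(dists, target)
        print(target, round(perplexity(conditional_probs(dists, beta)), 3))
```

Roughly speaking, a low perplexity makes each point attend to a handful of nearest spectra, while a high perplexity spreads attention over many, which is why the best value can shift with the number of samples.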

Table 2 lists the sample counts used and the optimal perplexity values obtained, along with the clustering index values corresponding to the optimal perplexity. It is observed that the optimal perplexity value depends on the sample size, as visualized in Figure 7.

A more detailed analysis of the results leads to the important finding that, when using t-SNE, the clustering indices are quite stable across varying sample sizes, as seen in Table 2 and visualized in Figure 8. This demonstrates that the sample size has little influence on the dimensionality reduction power of the t-SNE algorithm.

The t-SNE algorithm requires more computation time because of its quadratic time complexity, which may be its major disadvantage compared to PCA. In the present study, for 2500 samples from 40 different papers, t-SNE takes 3763.3 seconds on average, while PCA takes 10.2 seconds. Performance was measured on an Intel Core i7-8650U CPU with 16 GB of RAM, and Figure 9 shows the variation in computation time with sample size.
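A rough cost comparison makes the scaling argument concrete. It assumes exact t-SNE, which evaluates all pairwise affinities at every gradient iteration, and PCA computed from the covariance matrix; the implementations used are not stated here, and approximations such as Barnes-Hut t-SNE reduce the per-iteration cost to roughly O(N log N). The symbols N (number of pixel spectra), D = 186 (number of bands) and T (number of gradient iterations) are introduced for this sketch only.

$$
\underbrace{O(N^{2}D)}_{\text{input affinities } p_{ij}} \;+\; \underbrace{T \cdot O(N^{2})}_{\text{output affinities } q_{ij} \text{ and gradient}} \quad \text{(exact t-SNE)}
\qquad \text{vs.} \qquad
\underbrace{O(ND^{2})}_{\text{covariance}} \;+\; \underbrace{O(D^{3})}_{\text{eigendecomposition}} \quad \text{(PCA)}
$$

With D fixed, the PCA cost grows linearly in N, whereas doubling N roughly quadruples the cost of exact t-SNE.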

When processing with t-SNE, the perplexity parameter needs to be optimized for the particular data, in contrast to the straightforward processing of PCA. This parameter tuning introduces extra processing into the workflow that is not required for PCA.