Next Article in Journal
Ambient Intelligence Environment for Home Cognitive Telerehabilitation
Previous Article in Journal
A Method for Extrinsic Parameter Calibration of Rotating Binocular Stereo Vision Using a Single Feature Point

Sensors 2018, 18(11), 3670;

Application of Computational Intelligence Methods for the Automated Identification of Paper-Ink Samples Based on LIBS
Faculty of Physics, Mathematics and Computer Science, Cracow University of Technology, Warszawska 24, 31-155 Krakow, Poland
Laboratory for Forensic Chemistry, Faculty of Chemistry, Jagiellonian University, Gronostajowa 2, 30-387 Krakow, Poland
Faculty of Materials Science and Ceramics, AGH University of Science and Technology, Mickiewicza 30 Av., 30-059 Krakow, Poland
Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, 535 Clementi Rd, 599489 Singapore, Singapore
Department of Biomedical Engineering, School of Science and Technology, Singapore School of Social Sciences, 599494 Singapore, Singapore
School of Medicine, Faculty of Health and Medical Sciences, Taylor’s University, 47500 Subang Jaya, Malaysia
Department of Computer Engineering, Munzur University, 62000 Tunceli, Turkey
Author to whom correspondence should be addressed.
Received: 23 August 2018 / Accepted: 26 October 2018 / Published: 29 October 2018


Laser-induced breakdown spectroscopy (LIBS) is an important analysis technique with applications in many industrial branches and fields of scientific research. Nowadays, the advantages of LIBS are impaired by the main drawback in the interpretation of obtained spectra and identification of observed spectral lines. This procedure is highly time-consuming since it is essentially based on the comparison of lines present in the spectrum with the literature database. This paper proposes the use of various computational intelligence methods to develop a reliable and fast classification of quasi-destructively acquired LIBS spectra into a set of predefined classes. We focus on a specific problem of classification of paper-ink samples into 30 separate, predefined classes. For each of 30 classes (10 pens of each of 5 ink types combined with 10 sheets of 5 paper types plus empty pages), 100 LIBS spectra are collected. Four variants of preprocessing, seven classifiers (decision trees, random forest, k-nearest neighbor, support vector machine, probabilistic neural network, multi-layer perceptron, and generalized regression neural network), 5-fold stratified cross-validation, and a test on an independent set (for methods evaluation) scenarios are employed. Our developed system yielded an accuracy of 99.08%, obtained using the random forest classifier. Our results clearly demonstrates that machine learning methods can be used to identify the paper-ink samples based on LIBS reliably at a faster rate.
classification; computational intelligence methods; discrimination power; LIBS; machine learning; paper-ink analysis

1. Introduction

During the last decade, there have been important developments in laser induced breakdown spectroscopy (LIBS). This atomic emission spectroscopy technique, also known as laser induced plasma spectroscopy (LIPS), is used for qualitative and quantitative chemical analysis of samples in all states of matter [1,2]. In this technique, high-power, short-duration laser pulse causes an ablation of the analyzed material, which due to its high temperature (10,000 K) dissociates into excited ions and atoms. When plasma cloud cools down this excited species revert to lower energy states and emit optical radiation which can be recorded and analyzed, revealing information about the elemental composition of the sample [2]. LIBS spectra are generally very rich in emission lines coming from excited atoms and ions occurring in a high temperature plasma cloud. Physical and chemical phenomena behind LIBS are not fully understood yet, as they are very complex in nature [3,4]. Nevertheless, LIBS applications have recently been rapidly growing due to the number of advantages of this method. The most important ones are minimally destructive measurements with little or no sample preparation, no efficiency, or no possibility to analyze in real-time all elements in a single laser shot in the samples of all three states of matter [5]. For solids, both mapping (2D) and depth profiling (3D) can be obtained. LIBS can be used for quantitative chemical analysis, material identification, and discrimination [1]. This method can be applied in laboratories and industrial plants, and even at stand-off distances of tens of meters [6]. These properties predispose the LIBS technique for use in many fields: space exploration [7], remote analysis of hazardous materials [8], on-line quality control in various industries [9], cultural heritage studies [10], forensic chemistry [11,12], geology [13], weld quality assurance [14], robotics [15], and many others [16,17,18,19].
Although currently LIBS is already an established technology, spectrochemical LIBS analysis is not straightforward [20]. Identification of elemental constituents in the sample is usually based on the strongest lines present in the LIBS spectrum (so called persistent lines [1]), which are compared with a literature database collected for all the elements from the periodic table. This analysis is cumbersome and time-consuming because the emission spectrum is determined also by the properties of the plasma, not only by the composition of the examined sample [21].
At present, the great advantages of LIBS seem to be impaired to some extent by the main drawbacks: problems coming from often poor reproducibility, the impact of sample composition on spectra recorded for individual components (matrix effects), and the difficulty of performing an overall reliable data analysis (sometimes over 500 spectral lines need to be interpreted). Thus, a question is immediately raised as to whether the LIBS method can be supported by the modern achievements in the field of computational intelligence [22] in an attempt to overcome the limitations of the current LIBS spectrum analysis methodology. The analogous problem of classification of a one-dimensional series of data is well known in computer science. The methods developed for this problem have been successfully applied in many areas [23,24,25,26,27,28,29,30,31]. Therefore, the application of machine learning methods to LIBS spectra of samples may certainly give a strong impact for further developments and applications.
Motivated by the aforementioned arguments, we decided to apply computational intelligence methods to the classification of paper-ink samples for forensic purposes, based on LIBS spectra of the samples. The problem of discrimination of different paper-ink samples has already been addressed in previous studies based on the LIBS spectrum analysis. Trejos et al. showed [32] that the highest discrimination power (DP) (96.4%) was obtained when comparisons were done qualitatively by spectral overlap of the regions of interest (three different emission lines monitored per element) and quantitatively followed by pairwise comparisons (one emission line per element) using ANOVA (analysis of variance). Kula et al. [33] presented the LIBS method as a useful tool in qualitative elemental differentiation of ink samples. They obtained the discrimination power (DP) coefficient of 61%, 82%, and 83% for red, black, and blue inks, respectively. Elsherbiny and Nassef [34] studied the dependence of the obtained spectra of various black gel inks on the wavelength of laser excitation, reporting the DP in the range of 88–91%. In the next study [35], the elemental analysis with the use of LIBS was performed with inkjet inks and toners on office paper. The LIBS results supported by pairwise comparison analysis (ANOVA with Tukey’s post hoc test) provided discrimination power of 98.4% (3 indistinguishable/190 compared pairs) for the toners and 100% for the inkjet inks. Moreover, these three undifferentiated toner pairs were discriminated using a Student’s t-test at a 95% confidence limit. The authors claimed that LIBS as a tool for the determination of elemental composition of sample can be a part of a procedure for questioned document examination.
The problem of paper-ink classification, on the other hand, is less common in the literature. To the best of authors’ knowledge there are only two articles dealing with LIBS spectra of paper-ink samples employed in classification/fitting problems by means of computational intelligence techniques. In a paper by Hoehses et al. [36], the benefit of applying several independent chemometric methods to LIBS data was demonstrated. A consecutive methodology of applying soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) enabled the step-wise classification of data and separation of inks that were not identified by principal component analysis (PCA). The support vector machine (SVM) yielded a correct classification rate of 87%, and cross-validation accuracy amounted to 81%. In the second paper [37], multiple methods such as three comparative functions (linear correlation, overlapping integral, and sum of squared deviations) and two advanced statistical methods (multivariate curve resolution alternating least squares (MCR-ALS) with classification tree and discriminant analysis (DA)) were applied to statistically evaluate LIBS spectra. The newly introduced MCR-ALS/DA methodology showed identification of the paper and printer type with an accuracy of 96.3% and 83.3%, respectively. Computational intelligence methods analyzed in this study have a larger hypothesis space than SIMCA, PLS-DA, MCR-ALS, or SVM; therefore, they can fit the data more accurately.
In the present paper, we show that the similar problem can be solved using computational intelligence methods. The classification problem discussed in this paper is very difficult due to similarities between LIBS spectra from different paper-ink samples and differences between such spectra from one single class. Additionally, we operate over spectra with a large number of lines with significant noise. To solve this classification problem, we tested many preprocessing ways and computational intelligence methods, but the paper presents only selected and best ones.

2. Materials and Methods

2.1. Materials

Fifty ballpoint pens (10 items of each of 5 models) produced by four different manufacturers from Germany and Poland were purchased in Poland: Bic Orange fine blue (denoted B), Rystor Kropka 0.5 1986 (denoted R), Staedtler Stick 430F (denoted S), Staedtler Ball 432F (denoted SB), and Toma Sunny fine 050 (denoted T). Fifty sheets (10 sheets of each of 5 types) of five Canadian certified reference papers were used (papers denoted A, D, L, N, and O from the set “Fillers in paper” supplied by A.S.O. Design Canada).
Each particular ink from each of 10 pens of 5 types was deposited (as straight lines) on each of 10 sheets of 5 types of papers, using standard hand pressure. All 2500 (50 sheets of papers with 50 deposits each) paper-ink samples were placed in plastic bags and stored in darkness at room temperature.
In the experiment, data for 30 classes (A, A + B, A + R, A + S, A + SB, A + T, D, D + B, …, O + SB, O + T) based on the combination of ink, paper, and empty papers were recorded. They are summarized in Table 1.

2.2. LIBS

The analysis of all paper-ink samples was carried out using a laser induced breakdown spectroscopy system LIBS-6 (Applied Photonics, Skipton North Yorkshire, UK). It consists of an integrated Q-switched Quantel Ultra Nd-YAG laser (Quantel, Les Ulis, France) working at λ = 1064 nm emitting a maximum energy equal to 150 mJ due to one laser pulse (6 ns), and an Avaspec-2048-2-USB2 fibre optic Czerny–Turner spectrometer (6-channel) with a CCD detector (Avantes, NS Apeldoorn, The Netherlands). The system was also equipped with a camera enabled to observe the analyzed object and a movable sample table. The LIBS-6 system was operated by LIBSoft V6.0.1 software (Applied Photonics, Skipton North Yorkshire, UK). Under normal conditions, because of no moving elements inside, a wavelength calibration of the spectrometer was not required. Every measurement were conducted in air under atmospheric pressure. The spectrum of a new background was collected prior to the analysis of each new sample.
A Q-switch delay time of 165 μ s, the integration delay time of 1.27 μ s, and integration time of 1.2 ms were used. Samples were analyzed directly without any special preparation. They were placed at the sample table at the focal point of the focusing lens (at a distance determined by the nozzle about 70 mm from the optical head). The diameter of the ablation spot, ranging from 0.6 mm to 1 mm, was dependent on the analyzed material.
The results of LIBS analysis is an emission spectrum—a two-dimensional dependence between the intensity and the wavelength (nm) of electromagnetic radiation emitted by an atom or molecule making a transition from a high energy state to a lower energy state. There are many possible electron transitions for each atom, and each has a specific energy difference. These different transitions, leading to different radiated wavelengths, create a unique emission spectrum. The emission spectra were collected in the UV-vis range (185 nm to 904 nm, spectral resolution 0.1 nm).
In total, 30 classes of experimental data listed above were recorded. For each class, 100 LIBS spectra were acquired, resulting in total in 3000 LIBS emission spectra, each one constituting a single sample. There was one spectrum collected from each line on a piece of paper. A sample spectrum is shown in Figure 1. A detailed version of Figure 1 with descriptions of the most prominent lines is in Supplementary Materials. The spectrum of each sample is available on our website [38].

2.3. Spectral Line Identification

In the spectral line identification process (Table 2), the standard protocol was applied [1]. Different elemental compositions for different types of papers and inks were identified. In order to find which spectral line in a particular spectrum of a paper-ink sample originates from ink, a comparison with a spectrum of pure paper was made. The general observation was that the elemental profile of all papers is in accordance with the information (included in the certificate of analysis) about fillers used in analyzed standard papers. Thus, the identified elements were mainly corresponding to the following fillers: calcium carbonate ( CaCO 3 ) , kaolinite ( Al 4 [ Si 4 O 10 ] ( OH 8 ) ) , titanium dioxide ( TiO 2 ) , and talc ( Mg 3 ( OH ) 2 Si 4 O 10 ) . In spectra obtained for papers containing TiO 2 (L and N) due to the ubiquity of Ti spectral lines, no lines from ink components were observable. The best searching of ink spectral lines of elements originating from inks was in the case of O paper in which kaolinite was the main filler. Concerning ballpoint pen inks, four of them have relatively complicated elemental composition (at least three elements). In one of them (ink T) only one element was found.

2.4. Data Analysis

We first performed a visual inspection of data after using PCA. In Supplementary Materials, we include 14 figures presenting data after PCA for all classes on truncated and normalized data. As can be seen, various types of paper (A, D, L, N, and O) separated well. In contrast, various types of inks (B, R, S, SB, and T) were not distinct from each other. This visual analysis confirms the difficulty of the issue—the automated identification of paper-ink samples.
The analysis of the LIBS spectrum consisted of the following steps:
  • independent preprocessing of LIBS spectra;
  • selection of data for the cross validation and testing sets;
  • data analysis based on computational intelligence methods;
  • evaluation of the results.
Steps of the experiment are presented in the flowchart in Figure 2.

2.4.1. Signal Preprocessing

The initial preprocessing of the raw data was performed to reduce the number of data points corresponding to individual LIBS spectra. This preprocessing step was applied because the spectral range of LIBS is very broad (11,746 data points in this case). From the viewpoint of computational intelligence applied to classification tasks, it contains irrelevant information. Moreover, the removed spectral ranges are also not chemically relevant—there is only noise (the intensities close to zero) and there are no spectral lines. Therefore, these ranges do not provide any relevant information about the elemental composition of the sample. It can be expected that the truncation of the datasets enhances extraction of characteristic features of the distinctive classes (types of material) [39]. Four preprocessing steps were taken under consideration:
  • data truncation by removing the data points from the beginning (first 3746 data points) and from the end (last 1000 data points) of the LIBS spectrum, which do not contain relevant information (7000 data points are left for further analysis);
  • normalization of intensity values to the interval [ 0 , 1 ] ;
  • standardization of intensity values (therefore, the mean value becomes equal to 0 and standard deviation becomes equal to 1).
Four preprocessing ways based on these steps were constructed and evaluated: data truncation and either standardization (also known as standard normal variate) or normalization. These steps are depicted at the bottom of Figure 2.
The Simple Intuitive Language for Experiment Modeling (SILEM) [40] software was used for data preprocessing tasks.

2.4.2. Cross-Validation

The set of 3000 LIBS emission spectra was divided into two subsets [41]: a training subset containing 90% (2700) of spectra and a test subset containing 10% (300) of spectra. Selection of spectra for both subsets was performed in a stratified way, that is the numbers of spectra from different classes had the same proportions in each subset. The cross-validation subset was used for parameter optimization, and the test subset was then used for final evaluation of different types of classifiers with optimized parameters.
The 5-fold stratified cross-validation method was applied to build the training and validation data sets. In each of the five analyzed combinations, a training set of 72 LIBS spectra from each of 30 classes (2160 in total) and a validation set of 18 LIBS spectra from each of 30 classes (540 in total) were used.
Evaluation of the classifiers with optimized parameters was performed separately for each training subset using the test subset. Results of evaluation based on each training subset were obtained by averaging the folds.

2.4.3. Computational Intelligence Methods

The LIBS spectra, after preprocessing and cross-validation, were fed to the classifiers: generalized regression neural network (GRNN) [42], probabilistic neural network (PNN) [43], multi-layer perceptron (MLP) [44], support vector machine (SVM) [45], decision trees (DT) [46], k-nearest neighbor (kNN) classifier [47], and random forest (RF) [46].
Each method is potentially dependent on a set of parameters that are either quantitative of qualitative. They have an influence on the overall performance of the method. These parameters were separately optimized for each machine learning algorithm to receive the lowest number of erroneous classes. The basic categorical parameters for each method were set to constants listed in Table 3.
Methods based on an adaptive neuro-fuzzy inference system (ANFIS) [48] and Gaussian process [49] were also tested, but they are computationally intensive and take a long time to receive results even with a reduced data set.
The SILEM software uses the scikit-learn Python library [50] as a source of implementations of employed computational intelligence algorithms.

2.4.4. Evaluation Criteria

Evaluation of classification process was based on methodology described in [51,52]. The 5-fold cross-validation strategy was adopted in our experiment. Five performance parameters (figures of merit), namely accuracy (ACC), sensitivity (SEN), specificity (SPE), mean values, and Cohen’s kappa ( κ ), were calculated. The mean values were calculated to estimate the overall performance of the computational intelligence methods used in this study separately for the task of recognition of each of the different classes of LIBS spectra.
To test whether there are classes which are classified with higher accuracy, specificity, or sensitivity, we calculated for each class S the number of true positives TP ( S ) , false positives FP ( S ) , true negatives TN ( S ) , and false negatives FN ( S ) separately. Then, the accuracy ACC ( S ) , sensitivity SEN ( S ) , and specificity SPE ( S ) with respect to class S are defined as averages over five folds of cross-validation ( K = 5 is the number of folds):
ACC ( S ) = 1 K i = 1 K TP i ( S ) + TN i ( S ) TP i ( S ) + FP i ( S ) + TN i ( S ) + FN i ( S )
SEN ( S ) = 1 K i = 1 K TP i ( S ) TP i ( S ) + FN i ( S )
SPE ( S ) = 1 K i = 1 K TN i ( S ) TN i ( S ) + FP i ( S )
where T P i ( S ) , FP i ( S ) , TN i ( S ) , and FN i ( S ) are, respectively, the number of true positives, false positives, true negatives, and false negatives for the ith fold of cross-validation with respect to class S, i = 1 , 2 , , K .
The overall values of ACC , SEN , and SPE for the classification system are the arithmetic means of ACC ( S ) , SEN ( S ) , and SPE ( S ) over all classes.
To evaluate the degree of discrimination between two different samples discrimination power (also known as discrimination accuracy) coefficient (denoted DP) was proposed [53,54]. DP is the ratio of the number of correctly identified pairs of test samples (identified as from different classes) to the number of all possible pairs of test samples. It can be calculated by the following equation:
DP = 2 D T ( T 1 ) = 1 2 N T ( T 1 )
  • D—the number of differentiated pairs, that is the number of pairs correctly identified as belonging to the same class or correctly identified as belonging to different classes;
  • N—the number of non-differentiated pairs, N = T ( T 1 ) / 2 D ;
  • T—the total number of analyzed samples (the total number of possible pairs of samples is equal to T ( T 1 ) / 2 ).

3. Results

An example of a raw spectrum and the output from successive stages of spectrum preprocessing (data truncation and normalization) are shown in Figure 3. The data truncation stage is optional and normalization can be replaced by standardization.
Parameter selection is a key part in reaching the optimal overall performance of a classification system. There are many possible options available, so it is important to analyze the outcomes of experiments for different values of parameters to demonstrate the possibilities of machine learning methods. Particular classification methods depend on the various basic parameters set as listed in Table 3 and some other parameters that were optimized. Optimization of these parameters was performed in two steps. The first step was to find out the general range of values for each parameter for the fine-tuning procedure. Then, the detailed grid search of the selected ranges of parameters was performed.
The classification algorithms, the tuning parameters and range of these parameter values are described below.
  • Decision trees—the range of the number of features to consider when looking for the best split: from 100 to 7000 with a step equal to 100.
  • Random forest—two parameters were optimized during preprocessing. The number of trees in the forest was optimized in range from 10 to 200 with a step of 10 and from 200 to 1000 with a step equal to 50, and the number of features to consider when looking for the best split was optimized in the same range.
  • kNN—the number of neighbors was optimized in the range from 1 to 4 and exponent used to calculate the Minkowski distance was optimized from 1 to 10 with a step of 1.
  • SVM—the gamma parameter of the RBF kernel function was optimized in a range from 0.01 to 1.00 with a step of 0.01 and the nu parameter of the nu-SVC algorithm, related to the error tolerance of the SVM classification, was optimized in a range from 0.01 to 1.00 with a step of 0.01.
  • PNN—the radius (the spread) of the kernel function of the network (standard deviation for the probability density function of the normal distribution). This parameter was optimized in a range from 0.01 to 0.20 with a step equal to 0.01 when normalization in preprocessing was used and from 0.1 to 1.0 with a step of 0.1 when standardization was used.
  • GRNN—the spread parameter with an identical meaning and range as in PNN was optimized;
  • MLP—the number of neurons was optimized in a range from 10 to 200 with a step equal to 10. The activation function was selected from “identity,” “logistic,” “tanh,” “relu”. The solver for weight optimization was chosen from “lbfgs” (an optimizer in the family of quasi-Newton methods), “sgd” (a stochastic gradient descent), or “adam” (a stochastic gradient-based optimizer proposed in [55]).
These optimal parameters were selected in a series of experiments to maximize the mean ACC, SEN, and SPE values described in Section 2.4.4.
The best results of the experiments with different preprocessing methods and machine learning algorithms based on 5-fold cross-validation (3000 spectra were divided into training set containing 2160 spectra, a validation set with 540 spectra, and test sets containing 300) are presented in Table 4. The preprocessing involves data truncation and standardization or normalization. Generally, the classification results were worse when the preprocessing included normalization instead of standardization. In the table, we show the values of ACC, SEN, SPE (and their mean value denoted MEAN), κ , and DP.
Table 5 shows the values of ACC ( S ) , SEN ( S ) , and SPE ( S ) for all S classes and all classifiers with appropriate preprocessing methods. It can be seen from the table that the most easily distinguishable (highest ACC ( S ) , SPE ( S ) , and SEN ( S ) ) spectra are in six classes: A, A + T, D, N, O, and O + T ( ACC ( S ) , SPE ( S ) , and SEN ( S ) reached the value of 100%) using the random forest method. Additionally, spectra of samples from 2 classes—L and N + T—are well recognized by SVM and MLP methods as well. The least distinguishable classes are L + S and O + B with a value of SEN ( S ) less than 50% by the random forest method, but the SPE ( S ) value was over 99%. This means that the spectra of samples from these classes are often assigned to other classes.

4. Discussion

The results of experiments confirm that the computational intelligence methods can be used to analyze LIBS data and obtain accurate classification of paper-ink samples (please see Table 4). We have obtained an accuracy of 99.08% for the best classifier. The results of the present study cannot be directly compared with those obtained in [33,34] focusing on the problem of discrimination between two samples which were not assigned to a priori classes. However, the methods used in [33,34] used Plasus SpecLine 2.13 software for the identification of spectral lines and National Institute of Standards and Technology (NIST) spectral database, gave values of the DP in the range from 85 to 92%. The same standard methods with visual spectra was applied to the data collected in our study gave DP equal to 90.6%. After adopting our algorithms to the problem of discrimination between two different samples, we achieved the DP of 99.0% with SVM and more than 98% with decision trees, random forest, PNN, and MLP classifiers. The limitation of applying our solution to the problem of discrimination is that the classes must be predefined.
The detailed analysis of Table 4 shows that the best method for the class of problems addressed in the present study is the random forest classifier. Parameters for this classifier were optimized using the 5-fold cross-validation using standardized data. The classifier during test over independent test set was able to correctly recognize 1294 cases of LIBS spectra out of 1500 cases and had no erroneous recognitions in training sets (10,800 cases). The cases are summed across all five folds of cross-validation. This classifier achieved an accuracy of 99.08%, a sensitivity of 86.27%, a specificity of 99.53%, a kappa of 85.79% and a DP of 99.08% for the test set.
Taking into account the number of erroneous classifications under different settings, the worst classifier was GRNN.
Two preprocessing methods were applied on the raw data prior to the classification. Hence, we have shown that an appropriate selection of preprocessing procedures influences the performance of classification. However, although in most cases the actual selection of the pair preprocessing-classifier determines the overall performance of the classification method, for some specific selections of classifier the final classification performance depends only marginally on the preprocessing. For this reason, it is not possible to select a single best preprocessing method.
We have taken two other methods into consideration: resampling and wavelet decomposition. The former resamples the input data at given wavelengths of the original sequence. The latter performs one dimensional Haar wavelet decomposition [56]. The initial results were promising, but this method requires more detailed investigation in the future. This technique needs further research to determine which type of wavelet decomposition structure and which parameters will lead to the best results.
The results of the present study show that modern computational intelligence methods can be used for the classification of LIBS spectra into predefined classes, which solves a broad class of problems related to LIBS applications. The main limitation of this study is that with the chosen settings, another broad class of LIBS applications, that is analysis of elemental composition, cannot be solved. The problem of classification addressed in the present study is certainly simpler than the problem of detailed analysis of the composition of materials. However, a successful solution to this problem encourages further research in this area. We suppose that, because the problem of analysis of composition of materials based on a LIBS spectrum may be equivalent to the recognition of a presence of specific spectral lines, this problem can be brought to a series of classification subproblems.

5. Conclusions

The aim of this study was to solve the difficult problem of distinguishing paper-ink samples. Hence, we have designed a computational intelligence system to solve this problem using LIBS spectra. The difficulty of this problem is caused by the high variability of spectra within a single class of paper-ink samples and strong similarity of spectra from samples of different classes.
A machine learning system is developed and based on results presented in Table 4 and Table 5, we can conclude that the system performs well. The random forest classifier provided the most accurate classification with an accuracy of 99.08%, a sensitivity of 86.27%, a specificity of 99.53%, a kappa of 85.79% and a DP of 99.08% for the test set.
Our developed system can be used for other applications, e.g., the identification of metal alloys. The main limitation of this method is that we do not list particular elements of the tested material, but it may be considered as future work. Additional feature extraction and classification methods, such as principal component analysis [57], can be applied in the future. Hence, the sensitivity of the described method can also be further improved.

Supplementary Materials

The following are available online at

Author Contributions

The work presented in this paper was a collaboration of all authors. The manuscript was revised by all authors. Conceptualization, M.K., K.R., T.S., and P.P.; Methodology, K.R., M.K., and P.P.; Software, T.S.; Validation, K.R. and P.P.; Investigation, M.K., K.R., T.S., and P.P.; Resources, M.K., M.N., and T.Ł.; Data Curation, M.K., T.S., and P.P.; Writing—Original Draft Preparation, M.K., K.R., and P.P.; Writing—Review & Editing, M.B., K.R., O.Y., P.P., and U.R.A.; Visualization, T.S. and P.P.; Supervision, K.R., P.P., and U.R.A.


This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.


  1. Singh, J.; Thakur, S. Laser-Induced Breakdown Spectroscopy, 1st ed.; Elsevier: London, UK, 2007. [Google Scholar]
  2. Anabitarte, F.; Cobo, A.; Lopez-Higuera, J.M. Laser-Induced Breakdown Spectroscopy: Fundamentals, Applications, and Challenges. Int. Sch. Res. Not. Spectrosc. 2012, 2012, 285240. [Google Scholar] [CrossRef]
  3. Hahn, D.W.; Omenetto, N. Laser-induced breakdown spectroscopy (LIBS), part I: Review of basic diagnostics and plasma-particle interactions: Still-challenging issues within the analytical plasma community. Appl. Spectrosc. 2010, 64, 335–366. [Google Scholar] [CrossRef] [PubMed]
  4. Hahn, D.W.; Omenetto, N. Laser-induced breakdown spectroscopy (LIBS), part II: Review of instrumental and methodological approaches to material analysis and applications to different fields. Appl. Spectrosc. 2012, 66, 347–419. [Google Scholar] [CrossRef] [PubMed]
  5. Galbács, G. A critical review of recent progress in analytical laser-induced breakdown spectroscopy. Anal. Bioanal. Chem. 2015, 407, 7537–7562. [Google Scholar] [CrossRef] [PubMed]
  6. Stelmaszczyk, K.; Rohwetter, P.; Méjean, G.; Yu, J.; Salmon, E.; Kasparian, J.; Ackermann, R.; Wolf, J.P.; Wöste, L. Long-distance remote laser-induced breakdown spectroscopy using filamentation in air. Appl. Phys. Lett. 2004, 85, 3977–3979. [Google Scholar] [CrossRef]
  7. Wiens, R.; Maurice, S. The ChemCam investigation: Compositions at the curiosity rover landing site. In Proceedings of the 2012 GSA Annual Meeting in Charlotte, Charlotte, NC, USA, 7 November 2012; Volume 44, p. 190. [Google Scholar]
  8. Ramirez-Cedeno, M.; Ortiz-Rivera, W.; Pacheco-Londono, L.; Hernandez-Rivera, S. Remote Detection of Hazardous Liquids Concealed in Glass and Plastic Containers. IEEE Sens. J. 2010, 10, 693–698. [Google Scholar] [CrossRef]
  9. Noyel, M.; Thomas, P.; Charpentier, P.; Thomas, A.; Brault, T. Implantation of an on-line quality process monitoring. In Proceedings of the 2013 International Conference on Industrial Engineering and Systems Management (IESM), Rabat, Morocco, 28–30 October 2013; pp. 1–6. [Google Scholar]
  10. Ortiz, R.; Ortiz, P.; Colao, F.; Fantoni, R.; Gómez-Morón, M.; Vázquez, M. Laser spectroscopy and imaging applications for the study of cultural heritage murals. Constr. Build. Mater. 2015, 98, 35–43. [Google Scholar] [CrossRef]
  11. Rinke-Kneapler, C.; Sigman, M. Applications of laser spectroscopy in forensic science. In Laser Spectroscopy for Sensing; Baudelet, M., Ed.; Woodhead Publishing: Sawston, UK, 2014; pp. 461–495. [Google Scholar]
  12. Król, M.; Kowalska, D.; Kościelniak, P. Examination of Polish Identity Documents by Laser-Induced Breakdown Spectroscopy. Anal. Lett. 2018, 51, 1592–1604. [Google Scholar] [CrossRef]
  13. Gaft, M.; Sapir-Sofer, I.; Modiano, H.; Stana, R. Laser induced breakdown spectroscopy for bulk minerals online analyses. Spectrochim. Acta Part B At. Spectrosc. 2007, 62, 1496–1503. [Google Scholar] [CrossRef]
  14. Anabitarte, F.; Mirapeix, J.; Portilla, O.M.C.; Lopez-Higuera, J.M.; Cobo, A. Sensor for the Detection of Protective Coating Traces on Boron Steel With Aluminium–Silicon Covering by Means of Laser-Induced Breakdown Spectroscopy and Support Vector Machines. IEEE Sens. J. 2012, 12, 64–70. [Google Scholar] [CrossRef]
  15. Bukin, O.; Proschenko, D.; Chekhlenok, A.; Golik, S.; Bukin, I.; Mayor, A.; Yurchik, V. Laser Spectroscopic Sensors for the Development of Anthropomorphic Robot Sensitivity. Sensors 2018, 18, 1680. [Google Scholar] [CrossRef] [PubMed]
  16. Moncayo, S.; Manzoor, S.; Navarro-Villoslada, F.; Caceres, J.O. Evaluation of supervised chemometric methods for sample classification by Laser Induced Breakdown Spectroscopy. Chemom. Intell. Lab. Syst. 2015, 146, 354–364. [Google Scholar] [CrossRef]
  17. Zhang, T.; Xia, D.; Tang, H.; Yang, X.; Li, H. Classification of steel samples by laser-induced breakdown spectroscopy and random forest. Chemom. Intell. Lab. Syst. 2016, 157, 196–201. [Google Scholar] [CrossRef]
  18. Wang, N.; Wang, X.; Chen, P.; Jia, Z.; Wang, L.; Huang, R.; Lv, Q. Metal Contamination Distribution Detection in High-Voltage Transmission Line Insulators by Laser-induced Breakdown Spectroscopy (LIBS). Sensors 2018, 18, 2623. [Google Scholar] [CrossRef] [PubMed]
  19. Zhang, C.; Shen, T.; Liu, F.; He, Y. Identification of Coffee Varieties Using Laser-Induced Breakdown Spectroscopy and Chemometrics. Sensors 2018, 18, 95. [Google Scholar] [CrossRef] [PubMed]
  20. El Haddad, J.; Canioni, L.; Bousquet, B. Good practices in LIBS analysis: Review and advices. Spectrochim. Acta Part B At. Spectrosc. 2014, 101, 171–182. [Google Scholar] [CrossRef]
  21. Tognoni, E.; Palleschi, V.; Corsi, M.; Cristoforetti, G.; Omenetto, N.; Gornushkin, I.; Smith, B.W.; Winefordner, J.D. Laser-Induced Breakdown Spectroscopy (LIBS): Fundamentals and Applications, from Sample to Signal in LIBS; Cambridge University Press: New York, NY, USA, 2006; p. 122. [Google Scholar]
  22. Tadeusiewicz, R. Place and Role of Intelligent Systems in Computer Science. Comput. Meth. Mater. Sci. 2010, 10, 193–206. [Google Scholar]
  23. Pławiak, P.; Sośnicki, T.; Niedźwiecki, M.; Tabor, Z.; Rzecki, K. Hand Body Language Gesture Recognition Based on Signals From Specialized Glove and Machine Learning Algorithms. IEEE Trans. Ind. Inform. 2016, 12, 1104–1113. [Google Scholar] [CrossRef]
  24. Pławiak, P. An estimation of the state of consumption of a positive displacement pump based on dynamic pressure or vibrations using neural networks. Neurocomputing 2014, 144, 471–483. [Google Scholar] [CrossRef]
  25. Pławiak, P.; Maziarz, W. Classification of tea specimens using novel hybrid artificial intelligence methods. Sens. Actuators B Chem. 2014, 192, 117–125. [Google Scholar] [CrossRef]
  26. Rzecki, K.; Pławiak, P.; Niedźwiecki, M.; Sośnicki, T.; Leśkow, J.; Ciesielski, M. Person recognition based on touch screen gestures using computational intelligence methods. Inf. Sci. 2017, 415–416, 70–84. [Google Scholar] [CrossRef]
  27. Pławiak, P.; Tadeusiewicz, R. Approximation of phenol concentration using novel hybrid computational intelligence methods. Int. J. Appl. Math. Comput. Sci. 2014, 24, 165–181. [Google Scholar] [CrossRef][Green Version]
  28. Pławiak, P.; Rzecki, K. Approximation of Phenol Concentration Using Computational Intelligence Methods Based on Signals From the Metal-Oxide Sensor Array. IEEE Sens. J. 2015, 15, 1770–1783. [Google Scholar] [CrossRef]
  29. Pławiak, P. Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst. Appl. 2018, 92, 334–349. [Google Scholar] [CrossRef]
  30. Pławiak, P. Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals. Swarm Evol. Comput. 2018, 39, 192–208. [Google Scholar] [CrossRef]
  31. Abdar, M.; Zomorodi-Moghadam, M.; Das, R.; Ting, I.H. Performance analysis of classification algorithms on early detection of liver disease. Expert Syst. Appl. 2017, 67, 239–251. [Google Scholar] [CrossRef]
  32. Trejos, T.; Flores, A.; Almirall, J.R. Micro-spectrochemical analysis of document paper and gel inks by laser ablation inductively coupled plasma mass spectrometry and laser induced breakdown spectroscopy. Spectrochim. Acta Part B At. Spectrosc. 2010, 65, 884–895. [Google Scholar] [CrossRef]
  33. Kula, A.; Wietecha-Posłuszny, R.; Pasionek, K.; Król, M.; Woźniakiewicz, M.; Kościelniak, P. Application of laser induced breakdown spectroscopy to examination of writing inks for forensic purposes. Sci. Justice 2014, 54, 118–125. [Google Scholar] [CrossRef] [PubMed]
  34. Elsherbiny, N.; Nassef, O.A. Wavelength dependence of laser induced breakdown spectroscopy (LIBS) on questioned document investigation. Sci. Justice 2015, 55, 254–263. [Google Scholar] [CrossRef] [PubMed]
  35. Lennard, C.; El-Deftar, M.M.; Robertson, J. Forensic application of laser-induced breakdown spectroscopy for the discrimination of questioned documents. Forensic Sci. Int. 2015, 254, 68–79. [Google Scholar] [CrossRef] [PubMed]
  36. Hoehse, M.; Paul, A.; Gornushkin, I.; Panne, U. Multivariate classification of pigments and inks using combined Raman spectroscopy and LIBS. Anal. Bioanal. Chem. 2011, 402, 1443–1450. [Google Scholar] [CrossRef] [PubMed]
  37. Metzinger, A.; Rajkó, R.; Galbács, G. Discrimination of paper and print types based on their laser induced breakdown spectra. Spectrochim. Acta Part B At. Spectrosc. 2014, 94–95, 48–57. [Google Scholar] [CrossRef]
  38. Team of Science and Industrial Intelligent Applications. Available online: (accessed on 27 October 2018).
  39. Zieliński, T.P. Digital Signal Processing: From Theory to Applications; WK: Warsaw, Poland, 2005. [Google Scholar]
  40. Simple Intuitive Language for Experiment Modeling. Available online: (accessed on 27 October 2018).
  41. Murphy, K.P. Machine Learning: A Probabilistic Perspective, 1st ed.; The MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
  42. Specht, D.F. A general regression neural network. IEEE Trans. Neural Netw. 1991, 2, 568–576. [Google Scholar] [CrossRef] [PubMed]
  43. Specht, D.F. Probabilistic neural networks. Neural Netw. 1990, 3, 109–118. [Google Scholar] [CrossRef]
  44. Hinton, G.E. Connectionist Learning Procedures. Artif. Intell. 1989, 40, 185–234. [Google Scholar] [CrossRef]
  45. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef][Green Version]
  46. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin, Germany, 2014. [Google Scholar]
  47. Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185. [Google Scholar] [CrossRef]
  48. Sugeno, M. Industrial Applications of Fuzzy Control; Elsevier: London, UK, 1985. [Google Scholar]
  49. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning); The MIT Press: Cambridge, MA, USA, 2005. [Google Scholar]
  50. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  51. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  52. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
  53. Trejos, T.; Corzo, R.; Subedi, K.; Almirall, J. Characterization of toners and inkjets by laser ablation spectrochemical methods and Scanning Electron Microscopy-Energy Dispersive X-ray Spectroscopy. Spectrochim. Acta Part B At. Spectrosc. 2014, 92, 9–22. [Google Scholar] [CrossRef]
  54. Aitken, C. Statistical and the Evaluation of Evidence for Forensic Scientists; John Wiley & Sons: Chichester, UK, 1995. [Google Scholar]
  55. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv, 2014; arXiv:1412.6980. [Google Scholar]
  56. Chui, C.K. An Introduction to Wavelets; Academic Press Professional, Inc.: San Diego, CA, USA, 1992. [Google Scholar]
  57. Pořízka, P.; Klus, J.; Képeš, E.; Prochazka, D.; Hahn, D.W.; Kaiser, J. On the utilization of principal component analysis in laser-induced breakdown spectroscopy data analysis, a review. Spectrochim. Acta 2018, 148, 65–82. [Google Scholar] [CrossRef]
Sample Availability: LIBS spectra used in this study are available on the webpage:
Figure 1. First spectrum of each class.
Figure 1. First spectrum of each class.
Sensors 18 03670 g001
Figure 2. Flow chart of the experiment.
Figure 2. Flow chart of the experiment.
Sensors 18 03670 g002
Figure 3. Visualization of data preprocessing for single sample spectrum from Class A. Figure (a) depicts an example of a raw spectrogram with marked truncation points, (b) shows normalized data.
Figure 3. Visualization of data preprocessing for single sample spectrum from Class A. Figure (a) depicts an example of a raw spectrogram with marked truncation points, (b) shows normalized data.
Sensors 18 03670 g003
Table 1. List of papers and ballpoint pens examined in this study.
Table 1. List of papers and ballpoint pens examined in this study.
BicBA + BD + BL + BN + BO + B
RystorRA + RD + RL+RN + RO + R
Staedtler/StickSA + SD + SL + SN + SO + S
Staedtler/BallSBA + SBD + SBL + SBN + SBO + SB
TomaTA + TD + TL + TN + TO + T
Table 2. Spectral line identification.
Table 2. Spectral line identification.
Paper ClassIdentified Elements
ACa, Mg, Na, K
DCa, Na, K
LCa, Ti, Al, Si, Na, K
NCa, Ti, Si, Mg, Fe, Na, K
OCa, Al, Si, Mg, Na, K
Ink sample
BCr, Cu, Zn, Pb, La
RCr, Cu, Zn, Pb, Ni, Mn
SCr, Cu, Zn
SBCr, Cu, Zn, Pb
Table 3. Computational intelligence methods and their basic parameters used for LIBS spectra identification.
Table 3. Computational intelligence methods and their basic parameters used for LIBS spectra identification.
1Decision TreesCriterion: gini, splitter type: best, maximum depth: none
2Random ForestCriterion: gini, maximum depth: none
3kNNDistance metric: Minkowski
4SVMType: nuSVC, type of kernel function: radial basis function
5Neural NetworkType: PNN
6Neural NetworkType: GRNN
7Neural NetworkType: MLP
Table 4. Performance results for optimized parameters.
Table 4. Performance results for optimized parameters.
MethodParametersACC (%)SEN (%)SPE (%)MEAN (%) κ (%)DP (%)
Decision TreesMax features = 610098.0871.1399.0089.4070.1498.40
Random ForestN estimators = 700
Max features = 950
kNNN = 1
Exponent = 1.00
SVMNu = 0.17
Gamma = 0.03
PNNSpread = 0.296.8452.6098.3782.6050.9797.80
GRNNSpread = 0.596.2744.0098.0779.4542.0795.58
MLPN. of neurons = 12098.2273.2799.0890.1972.3498.36
Table 5. Results of classification per class.
Table 5. Results of classification per class.
Standardization Decision TreeStandardization Random ForestStandardization kNNNormalization SVMNormalization PNNStandardization GRNNStandardization MLP
A + B98.6099.4574.0099.33100.0080.0097.5398.9756.0099.2799.8682.0097.4099.1746.0098.5399.5968.0098.6099.5970.00
A + R97.9399.2460.0099.4099.7290.0097.8799.6646.0099.33100.0080.0097.4099.3840.0098.2799.4564.0097.8798.8370.00
A + S96.6097.9358.0099.4099.7290.0096.8097.8666.0099.3399.7288.0096.4798.6932.0097.2798.0774.0096.8797.8668.00
A + SB96.4097.9352.0098.6798.9092.0097.2098.1470.0098.6098.8392.0095.2795.9376.0097.4798.9056.0096.8098.6942.00
A + T98.8099.7272.00100.00100.00100.0099.6799.7298.0099.6099.59100.0098.3399.3170.0099.2799.3896.0099.6799.7298.00
D + B98.0799.0370.0099.3399.7288.0095.8098.4818.0098.4799.5966.0095.6798.2820.0096.2799.318.0095.7398.2822.00
D + R98.4099.1776.0099.9399.93100.0097.8099.7940.0099.7399.72100.0096.6798.2850.0098.2099.3166.0097.9399.5252.00
D + S98.5399.1780.0099.53100.0086.0096.4097.1774.0099.5399.8690.0095.4097.6630.0097.1397.6682.0095.6097.1052.00
D + SB96.6798.2152.0098.8099.5976.0096.0798.1436.0097.7398.7668.0094.6097.3116.0097.4098.5564.0095.7397.8634.00
D + T99.1399.5986.0099.0799.03100.0098.4099.1078.0099.2099.3196.0097.2098.3464.0098.3398.9780.0099.0099.1096.00
L + B98.0799.2464.0099.8799.9398.0096.4799.3812.0099.2799.5990.0095.8797.5946.0096.3399.592.0098.4799.5966.00
L + R97.7399.4548.0098.4799.9356.0096.8099.4520.0098.2099.2468.0096.8798.9038.0096.3399.592.0097.9399.7246.00
L + S96.0098.0736.0097.2099.1740.0096.0098.9710.0097.4799.5936.0095.8098.4818.0096.4799.4510.0096.7398.0758.00
L + SB95.6796.5570.0096.6797.1084.0091.5393.4536.0097.0097.5282.0094.8796.3452.0091.2092.9740.0097.2797.6686.00
L + T98.0799.4558.0099.2099.6686.0096.0098.6918.0099.5399.6696.0097.9399.6648.0095.2798.550.0099.6099.7994.00
N + B98.3399.6660.0098.7399.5974.0096.9399.2430.0099.67100.0090.0097.4798.6264.0095.6798.902.0098.8099.6674.00
N + R98.4099.5964.0099.53100.0086.0095.6798.970.0097.6799.4546.0095.3397.8622.0095.6098.900.0098.1399.5956.00
N + S97.6098.5570.0099.2799.9380.0097.6099.5242.0098.1399.4560.0095.8098.3422.0096.8799.7214.0098.4799.1778.00
N + SB96.3397.5960.0097.3398.0078.0092.4793.9350.0096.5396.9784.0094.4796.7628.0088.5390.1442.0096.7397.5274.00
N + T99.3399.4596.0099.6799.66100.0099.0799.3890.00100.00100.00100.0099.0799.1796.0096.6099.3816.00100.00100.00100.00
O + B96.7398.9034.0097.5399.9328.0096.6099.930.0098.0099.8644.0097.2099.3834.0096.3399.660.0097.80100.0034.00
O + R99.4799.8688.0099.67100.0090.0096.6099.930.0098.9399.8672.0096.5398.9028.0096.2799.590.0099.00100.0070.00
O + S98.7399.4578.0099.4799.8688.0096.2799.454.0098.0799.2464.0095.2798.0016.0096.3399.660.0098.2098.8380.00
O + SB95.8797.3154.0096.9396.9098.0087.9389.4544.0096.0096.4184.0092.8794.9034.0084.6786.5530.0096.1396.7678.00
O + T99.3399.6690.00100.00100.00100.0098.4098.7688.0099.9399.93100.0098.8098.8398.0095.4798.624.0099.6799.66100.00

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (
Back to TopTop