Hyperspectral Classification of Blood-Like Substances Using Machine Learning Methods Combined with Genetic Algorithms in Transductive and Inductive Scenarios

This study focuses on applying genetic algorithms (GAs) to model and band selection in hyperspectral image classification. We use a forensic-inspired data set of seven hyperspectral images with blood and five visually similar substances to test GA-optimised classifiers in two scenarios: when the training and test data come from the same image and when they come from different images, which is a more challenging task due to significant spectral differences. In our experiments, we compare GA with classic model optimisation through a grid search (GS). Our results show that GA-based model optimisation can reduce the number of bands and create an accurate classifier that outperforms the GS-based reference models, provided that, during model optimisation, it has access to examples similar to the test data. We illustrate this with experiments highlighting the importance of a validation set.


Introduction
Genetic optimisation, inspired by natural evolution, is a well-known heuristic optimisation and search procedure that can be used for both feature and model selection in machine learning (ML). The focus of this paper is the use of genetic algorithms (GA) to train accurate ML algorithms; i.e., hyperspectral classifiers. A hyperspectral classifier aims to assign pixels in a hyperspectral image to predefined classes; e.g., different types of crops in an image of an agricultural area. A hyperspectral pixel is a vector of measurements (typically reflectance values), each corresponding to a specific band: a narrow wavelength range of the electromagnetic spectrum. Since materials in the imaged scene reflect, absorb and emit electromagnetic radiation in characteristic ways that depend on their molecular composition and texture, hyperspectral classification allows them to be accurately distinguished [1]. However, the task poses several challenges, such as the huge volume of images, their high dimensionality, the redundancy of information in hyperspectral bands and the presence of noise introduced by the acquisition process and calibration procedures [2]. In addition, the observed spectra are mixtures (e.g., linear combinations) of the spectra of materials in the imaged scene [3].
One particular challenge lies in the availability and quality of training data; i.e., the selection of a training set. Due to the high cost of generating hyperspectral training examples [4], training sets in hyperspectral classification are typically small. However, when training pixels are randomly and uniformly sampled from the classified image itself, it is possible to achieve high accuracy even for very small training sets of 5-15 examples per class; e.g., by exploiting the spatial-spectral structure of the image and using semi-supervised learning [5]. This is because hyperspectral images provide highly distinctive features and classes usually occupy relatively large regions of the image. In such problems, we may be more interested in finding the best assignment of pixels to classes than in finding the classification function itself. Therefore, referring to the concept of transductive learning proposed by Vapnik [6], we call such a scenario a hyperspectral transductive classification (HTC) problem.
The challenge is greater when training pixels come from a different image than test pixels. In such a case, differences in the acquisition environment (e.g., light intensity, time of acquisition) and in in-class spectra (e.g., different background materials in spectral mixtures) may be perceived as complex noise. In such a scenario, the classifier is expected to generalise and compensate for the differences between the training set and the classified data. In contrast to the HTC scenario, which treats the image as a "closed world", we call this scenario hyperspectral inductive classification (HIC), emphasising the importance of finding the best classification function. The HIC scenario shares similarities with the hyperspectral target detection problem [7], where the spectra to be found in an image commonly come from spectral libraries.
Genetic algorithms [8,9] are well-established techniques for the selection of features and optimisation of classifier parameters. GAs are based on natural selection, inheritance and the evolutionary principle of the survival of the best-adapted individuals. Their advantages compared to classic feature and model selection procedures such as grid search (GS) include (a) their resistance to local extrema, (b) the ability to control selective pressure (exploration versus exploitation), shifting from global to local search, and (c) ease of application, since feature selection is combined with parameter optimisation. These advantages have resulted in GAs being frequently used for hyperspectral band selection [10] and the classification of multispectral [11] and hyperspectral data [12]. However, in most reference works, GAs are applied to a problem corresponding to the HTC scenario, typically using well-known hyperspectral datasets such as the "Indian Pines" or the "University of Pavia" images. Under such conditions, the simultaneous optimisation of classifier parameters and band selection allows researchers to achieve high classification accuracy [13]. To test both the HTC and HIC scenarios, in our experiments, we use a dataset described in [14] that consists of multiple hyperspectral images with blood and blood-like substances. The dataset is inspired by problems related to forensic analysis; e.g., the detection of blood. However, we focus on the problem of classification; i.e., distinguishing between classes corresponding to visually similar blood-like substances in the images. We use multiple images with the same classes but with significant spectral differences to compare the HTC and HIC scenarios. We analyse the impact of GAs on classification accuracy in comparison to grid-search parameter selection using multiple state-of-the-art hyperspectral classifiers.
Our thesis is that hyperspectral classification with GA-based model and band selection would yield more accurate classifiers than an approach in which parameters are selected with GS. To test this, we compare the accuracy of classifiers optimised with GA and GS in both the transductive and inductive hyperspectral classification scenarios. Our main contribution is the identification and experimental verification of the conditions under which GA outperforms GS in hyperspectral classification. We show that for this advantage to be significant, the classification problem must be sufficiently complex, as in the HIC scenario, which is more difficult than the HTC. In addition, the data in the validation set used for model selection must be sufficiently similar to the data in the test set, which is not always the case in the HIC. Since GA can be time-consuming compared to GS, our conclusions allow a more informed choice of model selection method in various hyperspectral classification problems.

State of the Art
Machine learning algorithms are currently popular and widely used in medical imaging [15]. Some of the main areas of ML application are image segmentation (e.g., of melanin [16] or epidermis [17]) and combined segmentation and classification (e.g., for the detection of pigment network in dermoscopic images [18]). Due to the visibility of haemoglobin in the spectra, hyperspectral imaging (HSI) has become useful in areas related to medical diagnosis [19]. In addition, the detection of blood and the estimation of its age in hyperspectral images [20] can be applied to forensic analysis [14]. However, the complexity of hyperspectral data makes the development of dedicated ML methods, especially classification algorithms, particularly important. Genetic algorithms are promising for the construction of hyperspectral classifiers as they enable simultaneous model selection and the reduction of data dimensionality.

Hyperspectral Classification
In this paper, we focus on spectral classification [1], which uses only spectral vectors. The leading approaches involve the use of Support Vector Machines (SVMs) [21], Extreme Learning Machines and their kernel-based variants [22] or Multinomial Logistic Regression [23]. To further improve classification accuracy, spectral-spatial approaches [24] are employed. They make use of both pixel spectra and their spatial position in the image. In particular, a combination of spatial-spectral and semi-supervised approaches allows high classification accuracy to be obtained, even for a small training set [5]. Recently, deep learning methods [25] have become popular, although their limiting factor is that they usually require relatively large training sets. However, some works, such as the approach based on residual networks presented in [26], seem to be able to significantly reduce this dependency.

Evolutionary Computation and Genetic Algorithms
The advantages of techniques based on computational intelligence [27] lie in the properties inherited from their biological counterparts: the learning and generalization of knowledge (artificial neural networks [28]), global optimization (evolutionary computation [29]) and the use of imprecise terms (fuzzy systems [30]). Research on evolutionary computation (EC) [29] was inspired by the imitation of nature's mechanisms of natural selection, inheritance and functioning. Genetic algorithms (GAs) [31] are a class of evolutionary computation techniques that have been used successfully in fields such as vehicle routing [32], feature selection [33], optimization [34], heart sound segmentation [35] and the traveling salesman problem [36].
Genetic algorithms are one of the leading approaches to solving optimisation problems [9]. Because many such problems are computationally complex, they are often solved with heuristic methods, which make it possible to find a near-optimal solution faster. A GA works by creating a population consisting of a selected number of individuals, each representing one solution to the problem. Then, from among all the individuals, those with the best results are selected and subjected to genetic operators, which create a new population. In particular, this technique can be applied to model selection, finding the parameters of a machine learning model while simultaneously performing feature selection, as in works on heart arrhythmia detection [37,38], early diagnosis of hepatocellular cancer [39] or credit scoring [40].

Hyperspectral Classification and Band Selection with GAs
GAs have been used many times for classification and for the selection of characteristic wavelengths in hyperspectral data. For example, in [10], the authors use GA to find small subsets of the most distinctive bands. In [12], GAs are applied for band selection in preprocessed hyperspectral images in order to classify them. In [41], GA optimization is used to divide hyperspectral bands into three classes related to their discriminative power in the classification task. The authors verify their results using three standard hyperspectral datasets; i.e., the "University of Pavia", "Indian Pines" and "Hekla". The use of GAs for the simultaneous optimization of SVM parameters and band selection in HSI classification is presented in [13]. A similar scheme for multispectral data is used in [11], in which the authors emphasize the advantage of genetic algorithms over parameter optimization using a grid search. An interesting use of a GA is presented in [42]: the authors apply a GA to a large number of hyperspectral cubes (111 images) in order to determine a subset of wavelengths suitable for identifying charcoal rot disease in soybean stems.

Dataset
We used the dataset described in [14], consisting of multiple hyperspectral images of blood and blood-like substances such as artificial blood, tomato concentrate or poster paint. Hyperspectral pixels in which these substances are visible were annotated by the authors of the dataset.
Images in the dataset were captured using a SOC710 hyperspectral camera operating in the spectral range of 377-1046 nm with 128 bands. Two types of images were used in our experiments: the "Frame" images, denoted as F in [14], which present classes on a uniform, white background; and the "Comparison" images (denoted as E), which present classes on diverse backgrounds consisting of multiple materials and fabrics.
We used images captured on days {1, 7, 21}. Following the convention from [14], we denote the day of acquisition in brackets after the scene name; e.g., F(1) for the scene "Frame" from day 1. The dataset is visualised in Figure 1. Figure 1a,b presents the acquisition scenes for two selected images, with the pixels of the different substances used in further experiments marked. Their mean spectra are presented in Figure 1c,d, while Figure 1e,f shows the first two components of the PCA projection. Pixels marked in the F(1) image as "uncertain blood" have principal component values similar to those of background pixels, while in the E(1) image, "uncertain blood" is more similar to "blood-like substances". Furthermore, the spectra of different classes in the F(1) image are more diverse than in the E(1) image, where pixels of various substances overlap in the PCA projection.

Figure 1. Upper panels present classes as a coloured ground truth on RGB images created from hyperspectral cubes. Middle panels present mean class spectra. Bottom panels present the PCA projection of data for the first two principal components. Images come from [14].

Data Preprocessing
The aim of the initial preprocessing applied to dataset images was to reduce noise and compensate for uneven lighting. The following sequence of transformations was applied to every image:

1. Median filter: Images were smoothed with a spatial median filter with a window size of one pixel. This operation was intended to reduce the noise in spectra, exploiting the fact that classes were spatially much larger than a single pixel.
2. Spectra normalisation: As suggested in [14], the spectrum of each pixel was divided by its median. The purpose of this normalisation was to compensate for uneven lighting in the image.
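The two preprocessing steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the authors' code. The window width `spatial_size` is a hypothetical parameter with an illustrative default of 3 (a width of 1 leaves the image unchanged), and the filter is applied spatially only, band by band.

```python
import numpy as np
from scipy.ndimage import median_filter


def preprocess_cube(cube, spatial_size=3):
    """Median-filter a hyperspectral cube spatially, then normalise each spectrum.

    cube: array of shape (rows, cols, bands) with positive values.
    spatial_size: hypothetical spatial window width (illustrative default of 3).
    """
    # Step 1: spatial median filter. The window spans a single band, so each
    # band is smoothed independently and values are not mixed across bands.
    smoothed = median_filter(cube, size=(spatial_size, spatial_size, 1))
    # Step 2: divide every pixel spectrum by its own median value.
    pixel_medians = np.median(smoothed, axis=2, keepdims=True)
    return smoothed / pixel_medians


rng = np.random.default_rng(0)
cube = rng.uniform(0.5, 1.5, size=(6, 6, 10))   # synthetic 6x6 image, 10 bands
out = preprocess_cube(cube)
brighter = preprocess_cube(3.0 * cube)          # same scene, uniformly brighter light
```

Because each spectrum is divided by its own median, multiplying the whole cube by a constant (uniformly brighter lighting) leaves the output unchanged, which is the effect the normalisation relies on.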

Feature Extraction
In our experiments, we used a derivative transformation to highlight important features of spectra. Derivative analysis [43] is a well-known method for transforming spectral signatures. Derivatives are sensitive to the shape of spectra; therefore, they are particularly effective in differentiating signals with characteristic spectral responses, such as the haemoglobin response in blood [44], visible as peaks at wavelengths of ∼542 nm and ∼576 nm (the α and β bands). We used first-order derivatives, computed as the difference between neighbouring bands.
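A minimal NumPy sketch of the first-order derivative, assuming spectra are stored along the last array axis. Whether the original implementation also divided by the wavelength step is not stated, so the sketch uses the plain band difference described above.

```python
import numpy as np


def first_derivative(spectra):
    """First-order derivative: difference between neighbouring bands.

    spectra: array of shape (..., bands); the result has shape (..., bands - 1).
    """
    return np.diff(spectra, n=1, axis=-1)


spectra = np.array([[1.0, 1.2, 1.1, 1.5],
                    [2.0, 2.2, 2.1, 2.5]])   # second row = first row + constant offset
deriv = first_derivative(spectra)
```

The two example rows differ only by a constant offset and yield identical derivatives, while the output has one band fewer than the input.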
A visualisation of the impact of preprocessing and feature extraction on example spectra is presented in Figure 2. Figure 2a,b presents the reflectance spectra of blood for different days after spilling, without and with the division of each pixel by its median value, respectively, while Figure 2c shows spectra after calculating first-order derivatives.

Support Vector Machines
In this work, we focus on the Support Vector Machine [45] (SVM) classifier, which is accurate in hyperspectral classification problems [1], including the classification of hyperspectral forensic data [46], and is well suited for optimisation with GA [13]. HSI classification with SVM can be described as follows. Let $X$ denote a set of examples (e.g., hyperspectral pixels) and $Y = \{-1, 1\}$ the set of labels. The SVM classifies a hyperspectral example $x \in X \subset \mathbb{R}^d$ using the function

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \beta_i y_i K(x_i, x) + b \right),$$

where $x_i$ and $y_i$ are the $n$ training examples and their labels, and $\beta_i \geq 0$ and $b$ are coefficients computed through Lagrangian optimisation (margin maximisation on the training set). The kernel function $K : X \times X \to \mathbb{R}$ is used to compute the similarity measure between the classified example $x$ and every training instance $x_i$. We use three kernel functions (see Table 1), among them the linear kernel, $K(x, x') = x^\top x'$, and the radial basis function (RBF) kernel, $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. In addition to the parameters of the chosen kernel, the SVM has a regularisation parameter, $C$, that controls the balance between maximising the margin between classes and misclassifying training examples. The value of this parameter must be fitted to a given problem, typically through cross-validation. However, the use of GA for selecting parameters is complicated by the fact that $C$ is unbounded from above. Therefore, in our experiments, we used the classifier proposed in [47], namely the ν-SVM, which uses a bounded regularisation parameter $\nu \in (0, 1]$; $\nu$ is an upper bound on the fraction of margin errors (training examples that are misclassified or fall inside the margin) and a lower bound on the fraction of support vectors.
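To illustrate the decision function, the sketch below fits scikit-learn's `NuSVC` (one available ν-SVM implementation, not necessarily the one used in this study) on synthetic two-class data and recomputes f(x) by hand from the fitted support vectors for a linear kernel. The data and the value ν = 0.3 are hypothetical.

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
d = 20  # hypothetical feature dimension (e.g. number of selected bands)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, d)),
               rng.normal(1.0, 1.0, size=(50, d))])
y = np.array([-1] * 50 + [1] * 50)            # labels from Y = {-1, 1}

clf = NuSVC(nu=0.3, kernel="linear").fit(X, y)

# f(x) = sgn(sum_i beta_i y_i K(x_i, x) + b). scikit-learn keeps only the
# support vectors (beta_i > 0) and stores the products beta_i * y_i in dual_coef_.
K = X @ clf.support_vectors_.T                # linear kernel K(x_i, x) = x_i^T x
scores = K @ clf.dual_coef_[0] + clf.intercept_[0]
labels = np.where(scores > 0, 1, -1)
frac_sv = len(clf.support_) / len(X)          # nu is a lower bound on this fraction
```

Only the support vectors enter the sum, because all other training examples have $\beta_i = 0$; this is why the hand-computed scores match the library's own decision values.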

K-Nearest Neighbour (KNN)
The K-nearest neighbour algorithm (KNN) [48] belongs to the family of non-parametric models. It predicts the label of an example from its closest neighbourhood: a new, unclassified sample is labelled by a majority vote among a fixed number of its nearest neighbours, with each vote weighted according to the neighbour's distance from the sample. In our experiments, we used the Euclidean, Manhattan and Chebyshev distance measures.
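The distance-weighted vote can be sketched from scratch in NumPy. The weighting function 1/d is an assumption (the text does not specify it), and the toy data are hypothetical, chosen so that weighting changes the outcome.

```python
import numpy as np


def distances(X, x, metric):
    """Distances from every row of X to the query x."""
    diff = np.abs(X - x)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=1))
    if metric == "manhattan":
        return diff.sum(axis=1)
    if metric == "chebyshev":
        return diff.max(axis=1)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(X_train, y_train, x, k=3, metric="euclidean", eps=1e-12):
    """Distance-weighted k-NN vote; weight = 1 / distance (an assumed scheme)."""
    d = distances(X_train, x, metric)
    nearest = np.argsort(d)[:k]
    votes = {}
    for i in nearest:
        # Closer neighbours contribute larger votes; eps avoids division by zero.
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d[i] + eps)
    return max(votes, key=votes.get)


X_train = np.array([[0.0, 0.0], [3.0, 0.0], [3.2, 0.0]])
y_train = np.array([0, 1, 1])
query = np.array([0.1, 0.0])
```

With k = 3, a plain majority vote for `query` would return class 1 (two of three neighbours), whereas the distance-weighted vote returns class 0 because its single neighbour is much closer.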

Multilayer Perceptron
A Multilayer Perceptron (MLP) [49] is a neural network composed of individual perceptrons that together form a multilayer structure. The most commonly distinguished layers are the input, hidden and output layers. Each layer may have a different number of neurons, and more advanced network models consist of multiple hidden layers. The MLP is typically trained using the backpropagation algorithm. Despite its simplicity, the MLP achieves high accuracy on hyperspectral data and is often used as a reference method for other algorithms [1].
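A forward pass through an MLP can be sketched in a few lines of NumPy. The layer sizes, the ReLU hidden activation, the softmax output and the random weights are illustrative assumptions; training by backpropagation is not shown.

```python
import numpy as np


def mlp_forward(x, weights, biases):
    """Forward pass: input -> hidden layers (ReLU, assumed) -> softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)          # hidden layer with ReLU
    logits = h @ weights[-1] + biases[-1]       # output layer
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)    # class probabilities


rng = np.random.default_rng(0)
sizes = [113, 64, 6]  # hypothetical: 113 inputs, one hidden layer, six classes
weights = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
probs = mlp_forward(rng.normal(size=(5, 113)), weights, biases)
```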

Model and Feature Selection with Genetic Algorithms
We used genetic optimisation [9] to simultaneously select parameters of a machine learning model and perform feature selection. The ν-SVM [45] classifier was chosen for this type of optimisation due to its bounded parameterisation of the margin (see Section 3.4).
Because a GA can optimise many parameters at once, our implementation selected the type of kernel function, the kernel parameters, the regularisation parameter and the features (hyperspectral bands) simultaneously. Table 1 presents the structure of a single individual. In our implementation, each individual consisted of one chromosome, with five genes responsible for the kernel type and its parameters and 113 genes responsible for hyperspectral bands.

Table 1. The structure of a chromosome corresponding to optimised parameters of the ν-SVM classifier along with selected hyperspectral bands. RBF: radial basis function.
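One possible decoding of such a chromosome is sketched below. The order and meaning of the five parameter genes (kernel index, ν, γ, polynomial degree, coef0) and the kernel list are hypothetical, since the contents of Table 1 are not reproduced here; only the split into 5 parameter genes and 113 band genes follows the text.

```python
import numpy as np

N_PARAM_GENES = 5
N_BAND_GENES = 113
# Hypothetical gene layout and kernel list; Table 1 defines the real ones.
KERNELS = ("linear", "poly", "rbf")


def decode(chromosome):
    """Split a chromosome into nu-SVM settings and a boolean band mask."""
    assert len(chromosome) == N_PARAM_GENES + N_BAND_GENES
    kernel_idx, nu, gamma, degree, coef0 = chromosome[:N_PARAM_GENES]
    params = {"kernel": KERNELS[int(kernel_idx)], "nu": float(nu),
              "gamma": float(gamma), "degree": int(degree), "coef0": float(coef0)}
    band_mask = np.asarray(chromosome[N_PARAM_GENES:], dtype=bool)
    return params, band_mask


chromosome = [0, 0.3, 0.01, 3, 0.0] + [1, 0] * 56 + [1]   # 5 + 113 genes
params, band_mask = decode(chromosome)
X = np.random.default_rng(0).normal(size=(10, N_BAND_GENES))  # 10 synthetic spectra
X_selected = X[:, band_mask]   # keep only the bands whose gene is 1
```

The `params` dictionary uses keyword names accepted by scikit-learn's `NuSVC`, so it could be passed as `NuSVC(**params)` and trained on `X_selected`.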

Figure 3 shows an example crossover between two individuals (i.e., classifiers). We observed that high probabilities of crossover and mutation had a positive effect on the search; i.e., they allowed the search space to be explored more thoroughly and more solutions to be checked, reducing the risk of converging to a locally optimal solution [50]. The elitist strategy guarantees that the best individual found is never lost. Mutation of an individual consists of modifying a single gene in the chromosome. If the gene encodes a parameter of the SVM, its value is replaced by a new value from the allowed range of that parameter (acceptable values are shown in Table 1). If the gene represents a feature, its value is flipped; e.g., from "not selected" (0) to "selected" (1). The values of our genetic algorithm parameters are presented in Table 2.

Figure 3. Visualisation of a one-point crossover between two individuals.
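The genetic operators described above can be sketched in plain Python. The parameter ranges, the default crossover and mutation probabilities and the use of binary tournament selection are assumptions made for illustration (the actual values are given in Tables 1 and 2), and the toy fitness function stands in for the cross-validated accuracy used in the experiments.

```python
import random

N_PARAM_GENES = 5
# Hypothetical ranges for the five parameter genes (the real ones are in Table 1).
PARAM_SAMPLERS = [
    lambda r: r.randrange(3),           # kernel type index
    lambda r: r.uniform(0.01, 1.0),     # nu
    lambda r: 10 ** r.uniform(-4, 1),   # gamma
    lambda r: r.randint(2, 5),          # polynomial degree
    lambda r: r.uniform(0.0, 1.0),      # coef0
]


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents after a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rng):
    """Modify one randomly chosen gene."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    if i < N_PARAM_GENES:
        child[i] = PARAM_SAMPLERS[i](rng)  # new value from the parameter's range
    else:
        child[i] = 1 - child[i]            # band gene: 0 <-> 1
    return child


def next_generation(population, fitness, rng, p_cross=0.9, p_mut=0.9):
    """One generation with elitism and (assumed) binary tournament selection."""
    scores = [fitness(c) for c in population]
    best = max(range(len(population)), key=scores.__getitem__)
    children = [list(population[best])]    # elitism: best individual survives

    def tournament():
        i, j = rng.randrange(len(population)), rng.randrange(len(population))
        return population[i] if scores[i] >= scores[j] else population[j]

    while len(children) < len(population):
        a, b = tournament(), tournament()
        if rng.random() < p_cross:
            a, b = one_point_crossover(a, b, rng)
        for c in (a, b):
            children.append(mutate(c, rng) if rng.random() < p_mut else list(c))
    return children[:len(population)]


rng = random.Random(0)
population = [[s(rng) for s in PARAM_SAMPLERS] + [rng.randint(0, 1) for _ in range(20)]
              for _ in range(10)]
toy_fitness = lambda c: sum(c[N_PARAM_GENES:])   # hypothetical stand-in fitness
new_population = next_generation(population, toy_fitness, rng)
```

Copying the best individual unchanged into the next generation is what guarantees that the best fitness never decreases from one generation to the next.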

Model Selection with Grid Search
In our experiments, grid search (GS) was used as a reference method for model selection. In many works, the SVM with the regularisation parameter C (denoted SVC) with an RBF kernel function has been used as a reference algorithm; therefore, we used this approach in addition to the ν-SVM. We also tested the KNN and MLP classifiers, as described in Section 3.4. Parameters of model selection with the GS are provided in Table 3.
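For comparison, grid-search model selection can be sketched with scikit-learn's `GridSearchCV`. The parameter grids and the synthetic data are hypothetical placeholders; the grids actually used are listed in Table 3.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

searches = {
    # SVC with an RBF kernel: exhaustive search over C and gamma (hypothetical grid).
    "SVC (RBF)": GridSearchCV(
        SVC(kernel="rbf"),
        {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
        scoring="accuracy", cv=5),
    # Distance-weighted KNN over the three distance measures named in the text.
    "KNN": GridSearchCV(
        KNeighborsClassifier(weights="distance"),
        {"n_neighbors": [3, 5, 9], "metric": ["euclidean", "manhattan", "chebyshev"]},
        scoring="accuracy", cv=5),
}
for name, search in searches.items():
    search.fit(X, y)   # every grid point is scored with 5-fold cross-validation
    print(name, search.best_params_, round(search.best_score_, 3))
```

Unlike a GA, grid search evaluates every combination in the grid and does not select bands, so its cost grows multiplicatively with each added parameter.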

Model Performance Metric
Because the numbers of examples in the classes of our dataset were similar, we used accuracy as the performance metric, defined as follows:

$$\mathrm{acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i},$$

where $N$ is the number of folds in cross-validation, and $TP_i$, $TN_i$, $FP_i$ and $FN_i$ denote the numbers of true positives, true negatives, false positives and false negatives in the $i$-th fold, respectively.
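The metric can be computed directly from per-fold counts; the sketch below is a minimal illustration with hypothetical counts for two folds.

```python
def cv_accuracy(fold_counts):
    """Mean accuracy over N folds; each fold is a (TP, TN, FP, FN) tuple."""
    per_fold = [(tp + tn) / (tp + tn + fp + fn) for tp, tn, fp, fn in fold_counts]
    return sum(per_fold) / len(per_fold)


folds = [(40, 50, 5, 5), (45, 40, 10, 5)]  # hypothetical counts, N = 2 folds
acc = cv_accuracy(folds)                   # (0.90 + 0.85) / 2 = 0.875
```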

Experiments
The main idea behind our experiments was to perform model and feature selection with GA and compare the results with a diverse set of classifiers trained classically; i.e., with a grid search. Referring to the classification scenarios introduced in Section 1, we considered three experimental scenarios: (1) HTC, in which training and test pixels were sampled from the same images; (2) HIC, in which classifiers were trained on "Frame" images and tested on "Comparison" images; and (3) HICVS, which followed HIC, except that model selection was performed using a separate validation set that was randomly, uniformly sampled from the "Comparison" scene. This last scenario was designed to test the capabilities of GA optimisation under conditions different from those in the HIC scenario, and it is discussed in detail in Section 6.

The Scheme of Experiments
The experiments can be divided into six stages:

1. Raw data: The data set consisted of seven hyperspectral images from the data set described in Section 3.1. Every image had 128 hyperspectral bands. The images represented two scenes, the "Frame" scene and the "Comparison" scene. Four of the seven images showed the "Frame" scene, captured on days {1, 1a, 7, 21}, where 1a denotes the afternoon of the first day. The three "Comparison" images were captured on days {1, 7, 21}.

2. Data preprocessing: Data were transformed in accordance with the methodology described in Section 3.2: in order to reduce the effect of noise and uneven lighting, spectra were smoothed with a median filter and normalised, and noisy bands were removed. Background (unannotated) pixels and pixels from the class "beetroot juice" (class 4), which was not present in all images, were removed. Finally, the problem was posed as a six-class classification with classes Y = {1, 2, 3, 5, 6, 7}.

3. Feature extraction: A derivative transformation was used, as described in Section 3.3.

4. Data split: Data were divided into training and test sets. A detailed description of this stage is included in Sections 4.2-4.4.

5. Model optimisation: Model and feature selection were performed as described in detail in Section 3.5. The reference method used for comparison was a grid search. In both cases, accuracy was chosen as the evaluation criterion. The settings and details of the cross-validation varied depending on the experimental scenario and are given in the descriptions of the individual scenarios.

6. Model evaluation: The final results were expressed in terms of classification accuracy. After the best model was found in stage 5, it was trained on the entire training set and tested on the test sets. The test sets were created from both scenes, "Frame" and "Comparison". The training and testing process was repeated five times, and the average accuracy with the standard deviation was calculated.
An overview of our experiments based on the above steps is presented in Figure 4, where transitions between successive stages are annotated with a short summary of the consecutive experimental phases.

Hyperspectral Transductive Classification (HTC)
In the HTC scenario, training pixels were randomly, uniformly sampled from the same images as the test pixels. This scenario resembled a common hyperspectral classification setting in which classifiers are tested on, e.g., the "Indian Pines" data set [1]. The aim of this experiment was to test the ability of classifiers to model the classes and distinguish between them.
The training set was a combination of examples from all images; i.e., the "Frame" and "Comparison" scenes from all days. It contained an equal number of examples from each class and each image. We used the size of the least numerous class among all the images (989); therefore, the training set consisted of 41,538 hyperspectral pixels (989 pixels × six classes × seven images).
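The balanced sampling behind these numbers can be sketched as follows. The label arrays are synthetic and hypothetical (one image is given a rarest class of 989 pixels to mirror the text), and sampling without replacement is an assumption.

```python
import numpy as np

CLASSES = (1, 2, 3, 5, 6, 7)


def balanced_sample(labels_per_image, classes=CLASSES, seed=0):
    """Sample the same number of pixels of every class from every image.

    The per-class count n is the size of the least numerous class over all
    images. Returns n and a list of (image_index, pixel_index) pairs.
    """
    rng = np.random.default_rng(seed)
    n = min(int(np.sum(labels == c)) for labels in labels_per_image for c in classes)
    picks = []
    for img, labels in enumerate(labels_per_image):
        for c in classes:
            idx = np.flatnonzero(labels == c)
            chosen = rng.choice(idx, size=n, replace=False)
            picks.extend((img, int(i)) for i in chosen)
    return n, picks


counts = [{c: 1200 for c in CLASSES} for _ in range(7)]   # seven synthetic images
counts[3][5] = 989                                         # hypothetical rarest class
labels_per_image = [np.concatenate([np.full(k, c) for c, k in cnt.items()])
                    for cnt in counts]
n, picks = balanced_sample(labels_per_image)
```

The sampled set then has 989 × 6 × 7 = 41,538 pixels, matching the arithmetic in the text.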
After selecting the best parameters and features using cross-validation on the training set, classifiers were trained on the whole training set and tested on the remaining examples.

Hyperspectral Inductive Classification (HIC)
In the HIC scenario, classifiers were trained on "Frame" images and tested on "Comparison" images. This scenario simulated a potential forensic application, where the model was prepared using laboratory samples and applied in the field in an unknown environment.
The training set consisted of 6000 examples (250 examples of each class from each of the four "Frame" images). The test set consisted of a total of 82,097 examples from the "Comparison" scenes.
Each model was optimised with 10-fold cross-validation, as visualised in Figure 5. In each iteration, one fold was used for training and the remaining nine for testing. Additionally, only a subset of 10 randomly selected examples of each class from the training set was used for training in a single cross-validation iteration. After the optimisation stage, the best model was trained on the entire training set and tested on the test set.
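The optimisation loop of this scenario can be sketched with scikit-learn. The roles of the folds are inverted relative to standard cross-validation: a single fold is used for training (subsampled to 10 examples per class) and the other nine for testing. The data are synthetic, the k-NN model is only a stand-in for any candidate classifier, and drawing the 10 examples per class from the training fold is our reading of the description.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


def inverted_cv_accuracy(model, X, y, n_folds=10, per_class=10, seed=0):
    """Train on one fold (subsampled per class), test on the other folds."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    # skf.split yields (other folds, single fold); the roles are swapped here.
    for other_idx, fold_idx in skf.split(X, y):
        train_idx = np.concatenate([
            rng.choice(fold_idx[y[fold_idx] == c], size=per_class, replace=False)
            for c in np.unique(y)
        ])
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[other_idx], y[other_idx]))
    return float(np.mean(scores))


rng = np.random.default_rng(1)
y = np.repeat(np.arange(6), 200)                    # six classes, 200 examples each
X = rng.normal(size=(1200, 20)) + 1.0 * y[:, None]  # synthetic, shifted per class
acc = inverted_cv_accuracy(KNeighborsClassifier(n_neighbors=3), X, y)
```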

Hyperspectral Inductive Classification with a Validation Set (HICVS)
In the HICVS scenario, classifiers were trained on "Frame" images and tested on "Comparison" images, but in the model optimisation stage, a separate validation set was used, consisting of a subset of randomly, uniformly sampled examples from the "Comparison" images. The aim of this experiment was to determine and discuss the impact of applying GA in the model optimisation stage; specifically, to test a scenario in which GA could perform feature selection while keeping model overfitting under control. This scenario is discussed in Section 6.

Figure 6. Nine folds formed a training subset, and the model was tested on a validation set. The remaining fold was not involved in the validation process. After the optimisation process, the best model was tested on a test set that did not contain examples from the validation set.
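The HICVS model-selection loop can be sketched as below. We assume that the nine-fold fit is repeated once per left-out fold and the validation accuracies are averaged; the synthetic data, the simulated shift of the validation set and the two candidate kernels are hypothetical. The test set (not shown) would exclude all validation examples.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import NuSVC


def validation_score(model, X_train, y_train, X_val, y_val, n_folds=10, seed=0):
    """Fit on nine folds, score on an external validation set; one fold is left out."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for nine_folds_idx, _unused_fold_idx in skf.split(X_train, y_train):
        model.fit(X_train[nine_folds_idx], y_train[nine_folds_idx])
        scores.append(model.score(X_val, y_val))   # validation data, not training folds
    return float(np.mean(scores))


rng = np.random.default_rng(2)
y_tr = np.repeat(np.arange(3), 60)
X_tr = rng.normal(size=(180, 8)) + y_tr[:, None]            # "laboratory" data
y_val = np.repeat(np.arange(3), 20)
X_val = rng.normal(size=(60, 8)) + y_val[:, None] + 0.3     # hypothetical domain shift
candidates = {k: NuSVC(nu=0.3, kernel=k) for k in ("linear", "rbf")}
best = max(candidates,
           key=lambda k: validation_score(candidates[k], X_tr, y_tr, X_val, y_val))
```

Scoring candidates on data drawn from the target domain, rather than on held-out training folds, is what lets the optimiser penalise models that fit the training images too closely.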

Results
This section presents our results divided into the three scenarios corresponding to experiments described in Section 4.

The HTC Scenario
The accuracies of all tested models on "Frame" images in the HTC scenario were close to 100%. Results for the "Comparison" images are presented in Table 4. The accuracy of all classifiers was the highest among the three tested scenarios (HTC, HIC and HICVS). Only the KNN classifier did not achieve an accuracy higher than 90%. The model based on the MLP classifier optimized with GS outperformed other classifiers in every case.
Interestingly, the accuracy for the image from the seventh day was higher than for the remaining images. This may result from time-induced changes in spectra, in particular from the oxidation of haemoglobin in the blood. On the first day, the spectra undergo significant changes, which may translate into high data variance and lower class cohesion. After a few days, the spectra (especially those of blood) become more uniform, as can be seen in Figure 2a. The lower accuracy for significantly aged data after 21 days may result from the equalisation of spectral responses between classes as well as additional noise resulting, for example, from deposited dust.

Table 4. Results of the HTC scenario for classification with GA and reference classifiers trained with a grid search (GS). The highest result for each day is denoted in bold. SVC: SVM with the regularisation parameter C.


The HIC Scenario
Results of the HIC scenario are presented in Table 5. The classifier trained with GA outperformed the reference methods only on the first day, and even then, the standard deviation ranges overlapped. For the remaining days, the SVM with a linear kernel scored best. Interestingly, the best kernel chosen by GA optimisation was also the linear kernel, and the number of bands was reduced from 113 to 61. We noticed that the training accuracy (i.e., the accuracy measured on the training set during model optimisation) was close to 100% for almost all models, including the classifier trained with GA, which is consistent with the results of the HTC experiment.

HIC Scenario with a Validation Set
Results of the HICVS scenario experiments are presented in Table 6. In this scenario, the accuracy of almost all classifiers improved compared to the HIC scenario (see Table 5), and the GA-optimised classifier outperformed the other methods. However, we also noticed an almost fourfold increase in the standard deviation for the GA-optimised model. Once again, the linear kernel was the winning model for GA, and the number of selected bands was 64. As in the HIC scenario, the training accuracy (i.e., the accuracy measured on the training set during model optimisation) was close to 100% for almost all models, including the classifier trained with GA.

Computation Time

Discussion

The Impact of Preprocessing
The preprocessing described in Section 3.3 aimed to extract class features that are similar in all images. To illustrate the impact of the proposed preprocessing and data transformation on classification accuracy in the HTC and HIC scenarios, we performed a simple experiment: we repeated the HIC scenario; i.e., we trained the ν-SVM classifier obtained in the optimisation process of the HIC scenario (including feature selection) with examples from all "Frame" images, but omitted step 3, "feature extraction", of the procedure described in Section 4; i.e., the classifier processed normalised spectra. The training set consisted of 6000 examples (250 examples of each class from each of the four "Frame" images). The accuracy for the combined "Comparison" images was acc_Comp = 54.02 ± 0.21%, which was lower than the corresponding value in Table 5; i.e., acc_Comp = 66.54 ± 0.45%. At the same time, the accuracy for the remaining pixels of the "Frame" images was acc_Frame = 99.61 ± 0.05%, which was similar to the results of the HTC experiments.
We conclude that, in the HTC scenario, where training and test examples came from the same scene, the classifier was able to model the classes and reach high classification accuracy even without the derivative transformation. However, the proposed preprocessing improved the accuracy in the HIC scenario, where the training and test data were more different.

Model Optimisation with GA in Hyperspectral Classification
Reference works on hyperspectral GA-based classification described in Section 2 report advantages such as the reduction in data dimensionality through band selection, resistance to overfitting and consistently higher accuracy than the reference model selected with GS [11]. However, most of these works consider only the HTC scenario, use similar airborne or satellite images and sometimes compare the method with a model trained with preset parameters [13]. Therefore, to better assess the capability of GA-based model selection, we compared GA and GS in two scenarios that differed in the complexity of the classification problem.
Our results show that in the HTC scenario, both model optimisation techniques resulted in comparable, highly accurate models. We noticed that the accuracy measured on the training set during model optimisation was very similar to the final accuracy on the test set. It seems that when training and test sets are created by randomly, uniformly sampling a hyperspectral image, the spectra in both sets are similar enough that GA- and GS-based models achieve comparable accuracy, and the major advantage of GA in this scenario is band selection, which more than halved the number of features in our experiments.
Compared to HTC, the HIC scenario proved significantly more challenging. The accuracy values in Table 5 are lower than those in Table 4, and the GA-trained classifier appears to be only slightly better than GS for images captured on the first day, while it scored second for test images captured on other days (although the number of features was once again halved). In the HIC scenario, training and test data came from images that differed in lighting conditions, spectral mixtures of the imaged classes and image background. We hypothesise that, although both images contained the same precisely applied and clearly visible substances, the differences between the training and test sets were so significant that the selected model was overfitted. This is supported by the fact that, as in the HTC scenario, the accuracy measured on the training set during model optimisation was very high in HIC. While GAs help avoid local maxima during model optimisation, when all training data are affected by noise in the same way, even the best solution on the training data does not correspond to a model that generalises to the test data. This hypothesis is further supported by the higher accuracy of the method on the first-day images. Images acquired on the first day were more similar, since ageing had a significant impact on the spectra; e.g., the "blood" class spectrum changed significantly [44] due to haemoglobin oxidation.
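The difference between the two scenarios lies in how pixels are assigned to the training and test sets. The sketch below contrasts a transductive (HTC-style) split, which uniformly samples pixels from the same image, with an inductive (HIC-style) split, which keeps whole images apart. The (image_id, label, spectrum) layout, the function names and the image identifiers are hypothetical.

```python
import random

def transductive_split(pixels, train_fraction, seed=0):
    """HTC-style split: uniform random sampling of pixels from the same image(s)."""
    rng = random.Random(seed)
    shuffled = pixels[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def inductive_split(pixels, train_images):
    """HIC-style split: each whole image goes either to training or to testing."""
    train = [p for p in pixels if p[0] in train_images]
    test = [p for p in pixels if p[0] not in train_images]
    return train, test

# Hypothetical identifiers: one "Frame" image and one "Comparison" image.
pixels = [(img, "blood", [0.0]) for img in ("frame_1", "comparison_1") for _ in range(10)]

tr, te = transductive_split([p for p in pixels if p[0] == "frame_1"], 0.5)
print({p[0] for p in tr} == {p[0] for p in te})  # True: same image on both sides

tr, te = inductive_split(pixels, {"frame_1"})
print({p[0] for p in tr}, {p[0] for p in te})  # {'frame_1'} {'comparison_1'}
```

In the transductive split, both sets share acquisition conditions; in the inductive split, any difference in lighting, background or substance ageing between images appears only at test time.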
To further explore the capabilities of GA in HSI model optimisation, we performed an additional experiment: the HICVS scenario described in detail in Section 4.4. In HICVS, the classifier was trained on a training set similar to that of the HIC scenario, but during model optimisation, the optimisation algorithm had access to validation examples similar to the test data. We expected GA to gain an observable advantage over GS in this situation: since the algorithm can now control model overfitting in every generation, it should be able to create a classifier that generalises better. The results in Table 6 confirm this hypothesis: while the results of GS also improved, the improvement for GA was larger, and it scored first for all images.
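The mechanism behind the HICVS result can be illustrated with a minimal generational GA over binary band masks whose fitness is supplied by the caller. In an HICVS-like setup, `fitness` would measure accuracy on the validation set rather than on the training set, so overfitting is penalised in every generation. This is a generic sketch, not the implementation used in the study: tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values are assumptions, and a realistic chromosome would also encode the ν-SVM hyperparameters, which are omitted here.

```python
import random

def evolve(fitness, n_bits, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimal generational GA over binary chromosomes (e.g. band masks).

    `fitness(chromosome)` should return an accuracy; in an HICVS-like setup
    it would be measured on the validation set, not on the training set.
    Returns the best final chromosome and the best fitness of each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []

    def tournament(scored):
        # pick the fitter of two randomly chosen individuals
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]
        best_fit, best = max(scored, key=lambda t: t[0])
        history.append(best_fit)
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation, i.e. switching a band on or off
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    final = max(pop, key=fitness)
    return final, history

useful = {2, 5, 11, 17}  # hypothetical informative bands

def toy_fitness(mask):
    # Stand-in for validation accuracy: reward informative bands, penalise band count.
    return sum(mask[i] for i in useful) - 0.05 * sum(mask)

mask, history = evolve(toy_fitness, n_bits=24)
print(history[0] <= history[-1])  # True: elitism never loses the best individual
```

The toy fitness merely stands in for validation accuracy; because its penalty term favours fewer bands, it also mimics how a GA can reduce the number of selected bands while searching.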
Regarding our initial hypothesis, introduced in Section 1, that GAs allow more accurate hyperspectral classifiers to be obtained than GS, in our opinion the presented results support it, provided that certain assumptions about the nature of the processed hyperspectral images are met. First, for a uniform data set, e.g., in the HTC scenario, when the training set is sufficiently large and uniformly sampled, both model optimisation methods can produce highly accurate, comparable classifiers. However, when spectra become noisy, resulting in differences between the training and test sets, GA can outperform GS and avoid model overfitting, provided that a subset of examples similar to the test data is available during model optimisation. When the difference between training and test data becomes too large, the advantage of GA over GS in terms of accuracy appears insignificant. Nevertheless, in all scenarios, GA can produce classifiers that are as accurate as or more accurate than those obtained with GS, while significantly reducing the dimensionality of the data through band selection.

Conclusions and Future Works
We compared a GA-based model selection with the classic approach based on a grid search in three different hyperspectral classification scenarios. In the hyperspectral transductive classification (HTC) scenario, the training and test data were taken from a single image, so they were similar. For this scenario, if a sufficiently large training set was available, both methods of model selection achieved comparable, very high accuracy. In the hyperspectral inductive classification (HIC) scenario, the training and test data came from different images, which negatively affected the accuracy of all tested classifiers. In this scenario, GAs gained an advantage over GS only for some images, e.g., the day 1 images, where the characteristic blood features associated with the haemoglobin spectral response were most visible. The third scenario, i.e., hyperspectral inductive classification with a validation set (HICVS), was built on the HIC scenario. In the HICVS scenario, the model selection algorithm had access to examples similar to those in the test set, which allowed the GA-based optimisation to outperform GS for all images.
Our results show that for noisy data, as in HIC, the advantage of GA over GS in terms of accuracy is not significant and that in order to achieve this advantage, GA must have examples representative of the test set at the model selection stage; e.g., in the HICVS scenario. On the other hand, for a typical HTC scenario, existing approaches such as [5] or [25] allow very high accuracy to be obtained without an extensive search of the parameter space. This suggests that GA is a promising solution to challenging problems of hyperspectral classification, but its effective use imposes certain requirements on the available training data. This problem shares similarities with the problem of domain adaptation, described, e.g., in [55]. However, in all tested scenarios, the GA was able to generate models that were similar to or more accurate than GS while reducing the number of spectral bands by almost half.
We plan to apply the GA-based approach to different models, in particular recurrent neural networks, deep neural networks and ensemble learning. We would also like to test different feature extraction methods dedicated to the GA-based classification of hyperspectral images, especially in the HIC scenarios.