Simulated Annealing-Based Hyperspectral Data Optimization for Fish Species Classification: Can the Number of Measured Wavelengths Be Reduced?

Relative to standard red/green/blue (RGB) imaging systems, hyperspectral imaging systems offer superior capabilities but tend to be expensive and complex, requiring either a mechanically complex push-broom line scanning method, a tunable filter, or a large set of light emitting diodes (LEDs) to collect images in multiple wavelengths. This paper proposes a new methodology to support the design of a hypothesized system that uses three imaging modes—fluorescence, visible/near-infrared (VNIR) reflectance, and shortwave infrared (SWIR) reflectance—to capture narrow-band spectral data at only three to seven narrow wavelengths. Simulated annealing is applied to identify the optimal wavelengths for sparse spectral measurement with a cost function based on the accuracy provided by a weighted k-nearest neighbors (WKNN) classifier, a common and relatively robust machine learning classifier. Two separate classification approaches are presented, the first using a multi-layer perceptron (MLP) artificial neural network trained on sparse data from the three individual spectra and the second using a fusion of the data from all three spectra. The results are compared with those from four alternative classifiers based on common machine learning algorithms. To validate the proposed methodology, reflectance and fluorescence spectra in these three spectroscopic modes were collected from fish fillets and used to classify the fillets by species. Accuracies determined from the two classification approaches are compared with benchmark values derived by training the classifiers with the full resolution spectral data. The results of the single-layer classification study show accuracies ranging from ~68% for SWIR reflectance to ~90% for fluorescence with just seven wavelengths. The results of the fusion classification study show accuracies of about 95% with seven wavelengths and more than 90% even with just three wavelengths. Reducing the number of required wavelengths facilitates the creation of rapid and cost-effective spectral imaging systems that can be used for widespread analysis in food monitoring/food fraud, agricultural, and biomedical applications.


Introduction
Over the past 20 years, hyperspectral imaging (HSI) has become an invaluable tool for food safety and quality applications [1,2]. Spoilage and contamination of food and agricultural products are ongoing concerns for the food industry. Recent applications of hyperspectral imaging for food safety include detection of mold in peanuts [3,4], lead pollution in lettuce leaves [5], and Fusarium head blight in wheat kernels and wheat flour [6]. Food fraud, the intentional misrepresentation of food or food ingredients for economic gain, is another major food safety issue that has been addressed with hyperspectral imaging. For example, this technology has been applied for identifying fillets of less expensive species of fish that have been marketed and sold as more expensive red snapper (Lutjanus campechanus) fillets [7,8].
Hyperspectral imaging has been a staple of agriculture monitoring, with initial applications dating back to the 1970s. Early applications include large-scale remote monitoring of land and agriculture from the Landsat-I satellite [9], monitoring of crop yield [10], and detection of plant disease and invasive species [11]. While agriculture applications have remained constant since these early examples, the methods have changed with new technologies enabling more localized analysis. Unmanned aerial vehicles (UAVs) have become attractive survey platforms for local, detailed aerial monitoring efforts [12] and advancements in computing technology and miniaturization of HSI devices have enabled the construction of new systems for in-field crop analysis [13].
Hyperspectral imaging devices are complex systems that can be characterized by the method with which the full spatial-spectral data cube is obtained. Data cubes can be acquired by spatial scanning, spectral scanning, or by a combination of these methods [14]. With spatial scanning imagers, light is collected at a point or along a line and dispersed into its spectral components by a dispersive optic such as prism or diffraction grating. This point or line is then scanned over the target area through the physical motion of the sensor, reflection from a scanning mirror, or physical motion of the target object. With spectral scanning imagers, the full spatial content is collected by the image sensor for individual wavelengths in sequence. Collection of the wavelengths is typically accomplished by switching wavelengths through filter wheels, electronically controlled liquid crystal tunable filters (LCTF), or acousto-optic tunable filters (AOTF) [15].
Despite successes in the food safety and agriculture industries, hyperspectral imaging does have disadvantages, mostly due to the data cube being constructed from individual components collected in a time-sequential manner. This can be an error-prone process, especially for high-speed imaging applications. Another category of the hyperspectral imager, the snapshot imager, overcomes these issues by combining an array of optics to collect both the spatial and spectral information simultaneously. Usually, this means some compromise in either the spectral or spatial domain. All of these solutions tend to be both complex and costly [16]. In research and discovery, it is unknown which wavelengths will be significant and which are redundant. In many cases, once the spectral characteristics for a particular targeted application are understood, there can be a significant reduction in the complexity of the spectral imaging system.
Issues common to all hyperspectral imager types are the significant computing power required and the large file sizes of the data cubes, especially in applications involving larger fields of view. Attempts to address these issues have included the application of compressive sensing [17][18][19], deep neural networks [20], and methods centered around principal component analysis (PCA) [21]. Each of these solutions has its own limitations in terms of heavy computational requirements and large file sizes for data cube analysis.
This paper shows proof of concept for a new method for selecting narrow wavelengths for the classification of material samples. This method could support the design of a hypothetical rapid spectral imaging system consisting of a focal plane array covered with a mosaic color filter array or illumination by selected wavelength LEDs. These can collect full spatial resolution images at a small number of narrow wavelengths for visible/nearinfrared (VNIR), shortwave infrared (SWIR) reflectance, and fluorescence. The proposed method has the potential to be applied in a hand-held, mobile device for rapid scanning of food products in wholesale or retail marketplaces or configured as a drone-deployable payload for low-altitude aerial scanning of crops and vegetation.
The aim of this study was to evaluate the potential of this new method for use in an application combating food fraud by determining the correct species of fish fillets that are often mislabeled to justify a higher selling price [8,22]. Specific objectives were to (1) develop and evaluate a heuristic wavelength selection algorithm, (2) develop and evaluate methods for classifying the species of a fillet using classifiers designed for both single-mode spectroscopy and a fusion of spectroscopy modes, and (3) compare the relative effectiveness of each spectral mode for this classification task.

Hyperspectral Imaging Systems
Full-resolution reflectance and fluorescence images were collected using an in-house developed visible and near-infrared (VNIR) hyperspectral imaging system [23]. The light source for the VNIR reflectance was a 150 W quartz tungsten lamp (Dolan Jenner, Boxborough, MA, USA). For fluorescence imaging, two UV narrowband light sources were used, each with four 10 W, 365 nm, LEDs (LED Engin, San Jose, CA, USA). VNIR reflectance images in 125 wavelengths within the 419-1007 nm spectral range and fluorescence images in 60 wavelengths within the 438-718 nm range were acquired using a 23 mm focal length lens, an imaging spectrograph (Hyperspec-VNIR, Headwall Photonics, Fitchburg, MA, USA), and a 14-bit electron-multiplying charge-coupled device (EMCCD) camera (Luca DL 604M, Andor Technology, South Windsor, CT, USA).
A separate hyperspectral imaging system was used to acquire reflectance images in the SWIR region. The illumination source for this system was a custom-designed two-unit lighting system, each with four 150 W gold-coated halogen lamps with MR16 reflectors. The detection unit included a 25 mm focal length lens and a hyperspectral camera, including a 16-bit mercury cadmium telluride array detector and an imaging spectrograph (Hyperspec-SWIR, Headwall Photonics, Fitchburg, MA, USA). The SWIR reflectance images were acquired in a wavelength range of 842-2532 nm (287 wavelengths).

Simulated Annealing
Rather than sensing the full resolution spectra in each of the three modes, the proposed method uses just a small number of narrow wavelength bands (referred to simply as "wavelengths" in this paper) that are specifically chosen to yield accurate species classifications. Simulated annealing, a heuristic optimization method modeled after the metallurgical annealing process in which the metal undergoes controlled cooling to remove defects and toughen it, was used to select the wavelengths. The simulated annealing algorithm consists of a discrete-time inhomogeneous Markov chain with current state s(i) and a cooling schedule defined by a starting temperature, T max , a final temperature, T min < T max , and a total number of steps, n [24]. The goal of the algorithm is to determine the minimum of a user-defined energy function, E(i).
At each iteration i ∈ 1, · · · , n, a new trial state is determined by randomly selecting a "neighbor" of the previous state and calculating its energy. If the resulting energy is less than the energy from the previous iteration, the trial state becomes the new state of the system. If the resulting energy exceeds the energy of the previous energy, the algorithm adopts the trial state with probability given by: where T(i) is the temperature at iteration i. Note that this equation allows the algorithm to occasionally accept states that result in an increase in energy. This can benefit the optimization by preventing it from becoming stuck in local minima. The probability of accepting such states is high at the beginning of the process when the temperature is high but gradually decreases with decreasing temperature. The output of the algorithm is the state with the lowest energy encountered throughout the annealing schedule. Figure 1 provides a summary of this algorithm. gradually decreases with decreasing temperature. The output of the algorithm is the state with the lowest energy encountered throughout the annealing schedule. Figure 1 provides a summary of this algorithm. For this wavelength selection problem, we define the state as an array of binary elements indicating the presence or absence of each wavelength in the full-resolution spectrum. Because the collected spectra may contain artifacts at the lowest and highest wavelengths, we institute a fixed buffer of size at either end of the spectrum. Thus, the state at iteration i can be expressed as where ( ) is 1 to indicate that the jth wavelength is selected and 0 to indicate it is not, and is the total number of wavelengths in the spectrum. Furthermore, because consecutive wavelengths are highly correlated and thus offer little additional information if both are selected, we institute a minimum separation of wavelength indices between selected wavelengths. Finally, we set a limit, , on the number of wavelengths selected such that: Under these three restrictions, we update the state for each iteration by generating a "neighbor" of the current system state. This is done by randomly de-selecting one wavelength index from the current state and selecting a new one. The energy of the trial state is then calculated as 1 − ( ) where ( ) is the average 4-fold cross validation accuracy (see Section 2.5) as determined using the weighted k-nearest neighbors (WKNN) classifier. WKNN is a variation of the familiar k-nearest neighbors algorithm where the training data points are weighted based on the squared inverse of their distances from the query point. It was chosen as the basis for the energy calculation because of its relatively high For this wavelength selection problem, we define the state as an array of binary elements indicating the presence or absence of each wavelength in the full-resolution spectrum. Because the collected spectra may contain artifacts at the lowest and highest wavelengths, we institute a fixed buffer of size m at either end of the spectrum. Thus, the state at iteration i can be expressed as where I(j) is 1 to indicate that the jth wavelength is selected and 0 to indicate it is not, and N is the total number of wavelengths in the spectrum. Furthermore, because consecutive wavelengths are highly correlated and thus offer little additional information if both are selected, we institute a minimum separation of q wavelength indices between selected wavelengths. Finally, we set a limit, k, on the number of wavelengths selected such that: Under these three restrictions, we update the state for each iteration by generating a "neighbor" of the current system state. This is done by randomly de-selecting one wavelength index from the current state and selecting a new one. The energy of the trial state is then calculated as 1 − a(i) where a(i) is the average 4-fold cross validation accuracy (see Section 2.5) as determined using the weighted k-nearest neighbors (WKNN) classifier. WKNN is a variation of the familiar k-nearest neighbors algorithm where the training data points are weighted based on the squared inverse of their distances from the query Appl. Sci. 2021, 11, 10628 5 of 20 point. It was chosen as the basis for the energy calculation because of its relatively high classification performance and its rapid training time. Accuracy, in this sense, is calculated as the percentage of correct classifications, weighted by the number of samples per class in the test sets to ensure equal contribution from each class.
The simulated annealing algorithm was implemented in Python 3.7 using the simanneal 0.5.0 library [25]. The temperature parameters were set to T max = 25 and T min = 0.05 and the number of steps was set to n = 5000. These temperature values were selected to ensure nearly 100% selection of new states in the initial steps, regardless of whether the energy decreased or increased, and nearly 0% selection of states that increased the energy during the final steps. The number of steps was chosen to balance the desire for rapid processing with the need for algorithm convergence.
We compared the performance of the proposed simulated annealing approach for wavelength selection with three common feature selection methods: analysis of variance (ANOVA), recursive features elimination (RFE), and Extremely Randomized Trees (i.e., Extra Tress) [26] classifier feature importance. The ANOVA method selects features based on their ability to provide separation between the target classes in a linear manner. The RFE method is a standard linear regression method which takes as inputs the desired number of features to select and the linear classification method (in this case, the linear discriminant classifier was used). Finally, the nonlinear Extra Trees method assigns a quantitative importance to each feature based on its relevance to correct classification. Performance comparison was conducted using the same WKNN classifier featured in the simulated annealing algorithm.

Classification of Fish Species
To evaluate the success of the optimal wavelength selection algorithm, a pair of classification studies were conducted with the goal to determine the correct species of a fillet based on spectral information from a single sample point on the fillet represented by one 10 × 10 pixel block (i.e., voxel). For both studies, a multi-layer perceptron (MLP) neural network served as the primary classifier. In the first study, each spectral mode (i.e., VNIR, fluorescence, and SWIR) was investigated separately and the results of the MLP classifier were compared with results from a collection of common machine learning classifiers. The classifiers were trained on the spectral values from the selected wavelengths and evaluated using 4-fold cross-validation. In the second study, the selected wavelengths from the three spectral modes were combined in the input layer of the MLP classifier, and this spectral fusion method was again evaluated with 4-fold cross-validation. Both studies were repeated for numbers of selected wavelengths k = 3, 4, 5, 6, and 7. Results using all available wavelengths were included as a benchmark for comparison.

Multi-Layer Perceptron (MLP) Classifier
An MLP neural network is a common feed-forward artificial neural network that determines its weight values through supervised learning to yield a nonlinear decision boundary designed to minimize a cost function. In this case, the cost function was defined as the complement of the multiclass classification accuracy (weighted by the number of samples per class). For each of the studies described in the subsequent sections, the same two-layered MLP network shown in Figure 2 was used. To protect against overfitting, dropout with a probability of 50% was applied to both hidden layers [27]. Additionally, L2 kernel regularization (with factor λ = 0.0001) was applied to both hidden layers to protect against overfitting by adding a term to the loss function that increases with the magnitude of the network's weight vector. The input and hidden layers featured the rectified linear unit (ReLU) activation function, and the output layer included the softmax activation function to yield the classification decision. Appl. Sci. 2021, 11, x FOR PEER REVIEW 6 of 22

Single-Mode Classification Study
In addition to the MLP classifier, four common machine learning classifiers-including support vector machine with a linear kernel (SVM), WKNN, linear discriminant (LD), and Gaussian Naïve Bayes (GNB)-were used to perform classification separately for each of the VNIR, fluorescence, and SWIR data. As with the first study, feature sets consisted of the k spectral samples with no further attempt at feature selection. A 4-fold cross-validation was conducted for each study as a robust estimation of multiclass classification accuracy (weighted by the number of samples per class).
SVM determines the set of maximum-margin hyperplanes to separate the classes in the feature space. WKNN, as explained above, is a variation on the k-nearest neighbors algorithm that weights the training points by the inverse square of their distances from the query point. LD classification makes simplifying assumptions about the data (i.e., Gaussian distributed with the same covariance matrix for all classes) to determine the separating hyperplanes. Finally, GNB combines the probabilities of obtaining the measured value for each input given each specific class and selects the class with the highest resulting probability. GNB assumes statistical independence between the inputs [28]. SVM was included due to its reputation as a high-performance classifier. WKNN, another robust classifier, was included for its performance and because of its use in the simulated annealing algorithm. LD was included for comparison to evaluate any performance degradation that might result from the expected violation of the Gaussian or identical covariance assumptions. GNB was included for comparison to evaluate performance degradation due to the expected violation of independence among the inputs (i.e., the selected wavelengths).
Each classifier was trained with the = 3, 4, 5, 6, and 7 wavelengths selected by the simulated annealing algorithm for each of the three spectral modes. To place the resulting classification accuracy values in context, the results of this study were compared with benchmark classification accuracies determined using all wavelengths in the full-resolution spectra.

Single-Mode Classification Study
In addition to the MLP classifier, four common machine learning classifiers-including support vector machine with a linear kernel (SVM), WKNN, linear discriminant (LD), and Gaussian Naïve Bayes (GNB)-were used to perform classification separately for each of the VNIR, fluorescence, and SWIR data. As with the first study, feature sets consisted of the k spectral samples with no further attempt at feature selection. A 4-fold cross-validation was conducted for each study as a robust estimation of multiclass classification accuracy (weighted by the number of samples per class).
SVM determines the set of maximum-margin hyperplanes to separate the classes in the feature space. WKNN, as explained above, is a variation on the k-nearest neighbors algorithm that weights the training points by the inverse square of their distances from the query point. LD classification makes simplifying assumptions about the data (i.e., Gaussian distributed with the same covariance matrix for all classes) to determine the separating hyperplanes. Finally, GNB combines the probabilities of obtaining the measured value for each input given each specific class and selects the class with the highest resulting probability. GNB assumes statistical independence between the inputs [28]. SVM was included due to its reputation as a high-performance classifier. WKNN, another robust classifier, was included for its performance and because of its use in the simulated annealing algorithm. LD was included for comparison to evaluate any performance degradation that might result from the expected violation of the Gaussian or identical covariance assumptions. GNB was included for comparison to evaluate performance degradation due to the expected violation of independence among the inputs (i.e., the selected wavelengths).
Each classifier was trained with the k = 3, 4, 5, 6, and 7 wavelengths selected by the simulated annealing algorithm for each of the three spectral modes. To place the resulting classification accuracy values in context, the results of this study were compared with benchmark classification accuracies determined using all wavelengths in the fullresolution spectra.

Spectral Fusion Classification Study
For this study, the wavelengths were selected for each of the three spectral modes independently, as discussed in the previous section, and then concatenated into a single vector, which formed a new input layer for the MLP classifier. This classifier was then trained and evaluated (using 4-fold cross-validation) for k = 3, 4, 5, 6, and 7 wavelengths and the results were compared with a benchmark determined by including all wavelengths from the full-resolution spectra. Due to concerns about the usefulness of the SWIR data for species classification, we also evaluated fusion with just the VNIR and fluorescence modes. Figure 3 shows an overview of the data acquisition and processing steps for the studies represented in this paper. The database for this study consisted of VNIR and SWIR reflectance and fluorescence spectra collected from 133 fish fillets representing a total of 25 different species groups ( Table 1). The species for each fillet was verified using DNA barcoding [8]. Each fillet was placed in a 150 × 100 × 25 mm sample holder created with a 3D printer (Fortus 250mc, Stratasys, Eden Prairie, MN, USA) using production-grade black thermoplastic. Image acquisition was conducted by the pushbroom method, where a linear motorized translation stage was used to move the sample holder incrementally across the scanning line of the imaging spectrograph. The length of the instantaneous field of view (IFOV) was made slightly longer than the length of the sample holder (150 mm) by adjusting the lens-to-sample distance. The resulting spatial resolution along this dimension was determined as 0.4 mm/pixel. Each fillet was sampled along the width direction (100 mm) of the holder with a step size of 0.4 mm to match the spatial resolution of the length direction [8].

Spectral Fusion Classification Study
For this study, the wavelengths were selected for each of the three spectral modes independently, as discussed in the previous section, and then concatenated into a single vector, which formed a new input layer for the MLP classifier. This classifier was then trained and evaluated (using 4-fold cross-validation) for = 3, 4, 5, 6, and 7 wavelengths and the results were compared with a benchmark determined by including all wavelengths from the full-resolution spectra. Due to concerns about the usefulness of the SWIR data for species classification, we also evaluated fusion with just the VNIR and fluorescence modes. Figure 3 shows an overview of the data acquisition and processing steps for the studies represented in this paper. The database for this study consisted of VNIR and SWIR reflectance and fluorescence spectra collected from 133 fish fillets representing a total of 25 different species groups ( Table 1). The species for each fillet was verified using DNA barcoding [8]. Each fillet was placed in a 150 × 100 × 25 mm sample holder created with a 3D printer (Fortus 250mc, Stratasys, Eden Prairie, MN, USA) using production-grade black thermoplastic. Image acquisition was conducted by the pushbroom method, where a linear motorized translation stage was used to move the sample holder incrementally across the scanning line of the imaging spectrograph. The length of the instantaneous field of view (IFOV) was made slightly longer than the length of the sample holder (150 mm) by adjusting the lens-to-sample distance. The resulting spatial resolution along this dimension was determined as 0.4 mm/pixel. Each fillet was sampled along the width direction (100 mm) of the holder with a step size of 0.4 mm to match the spatial resolution of the length direction [8]. Flat-field corrections were applied to the VNIR and SWIR reflectance images and the fluorescence images to convert the original absolute intensities in CCD counts to relative reflectance and fluorescence intensities [29]. An initial spatial mask was then created for each imaging mode to separate the fish fillets from the background. To filter out inaccurate measurements around the thinner edges and portions of the fillets near the bone structure, an outlier removal scheme was instituted. Outliers were handled by first calculating the mean (μ) and standard deviation (σ) of the fish pixel intensities over the entire fillet. Voxels of 10 × 10 pixels were considered to mimic independent fish fillet spectral point measurements using the field of view of a fiber optic spectrometer. Exclusion occurred if ≥10% of the constituent pixels in a voxel exceeded μ ± 2 σ to eliminate outliers. Figure 4 shows an example result of voxel processing where most of the excluded voxels are concentrated near the fillet edges. This approach produced a final set of spatial masks, one each for the VNIR and SWIR reflectance and fluorescence images, which determined  Flat-field corrections were applied to the VNIR and SWIR reflectance images and the fluorescence images to convert the original absolute intensities in CCD counts to relative reflectance and fluorescence intensities [29]. An initial spatial mask was then created for each imaging mode to separate the fish fillets from the background. To filter out inaccurate measurements around the thinner edges and portions of the fillets near the bone structure, an outlier removal scheme was instituted. Outliers were handled by first calculating the mean (µ) and standard deviation (σ) of the fish pixel intensities over the entire fillet. Voxels of 10 × 10 pixels were considered to mimic independent fish fillet spectral point measurements using the field of view of a fiber optic spectrometer. Exclusion occurred if ≥10% of the constituent pixels in a voxel exceeded µ ± 2 σ to eliminate outliers. Figure 4 shows an example result of voxel processing where most of the excluded voxels are concentrated near the fillet edges. This approach produced a final set of spatial masks, one each for the VNIR and SWIR reflectance and fluorescence images, which determined the blocks to be used for analysis. Finally, the fluorescence spectra were scaled by a constant factor of 6000, the approximate maximum of fluorescence spectral values in the database. This was done to set the range of fluorescence values to between zero and one. Alternative normalization methods such as z-score and area under the curve (AUC) normalization were tried as well and produced similar results. However, this simple scaling was chosen because, unlike these alternatives, it requires no knowledge of the entire spectrum and is thus consistent with the concept of collecting only a small number of wavelengths for analysis. Table 1 provides a summary of this database with the numbers of fillets per species and the number of valid voxels for each fillet and each collection mode.

Fish Fillet Data Collection
The reflectance and scaled fluorescence spectra for each of the 25 fish species are shown in Figure 5. The significant differences in the shapes and positions of the spectral averages for the various species and the homogeneous nature of the spectra (as indicated by the relatively short error bars) suggest that high classification accuracies can be achieved with this spectral information.
tabase. This was done to set the range of fluorescence values to between zero and one. Alternative normalization methods such as z-score and area under the curve (AUC) normalization were tried as well and produced similar results. However, this simple scaling was chosen because, unlike these alternatives, it requires no knowledge of the entire spectrum and is thus consistent with the concept of collecting only a small number of wavelengths for analysis. Table 1 provides a summary of this database with the numbers of fillets per species and the number of valid voxels for each fillet and each collection mode.

Cross-Validation Train and Test Datasets
For both the single-mode and the spectral fusion studies, 4-fold cross-validation was conducted by dividing the complete dataset (as described in Table 1) into four disjoint test sets, each of which contained voxels from at least one fillet of each of the 25 species. The corresponding training set for each test set was then composed of all data not in the test set. Four-fold cross-validation (as opposed to the more common 5-or 10-fold versions) was chosen because there was greater variability between fillets of the same species than between voxels of the same fillet. Thus, we wanted to ensure that each test set contained entire fillets that were not included in the corresponding training set. For those species with more than four fillets in the complete dataset (e.g., Malabar blood snapper), the fillets were divided into the four test sets with the goal of having the total number of fillets in each test set as equal as possible.

Data Imbalance Correction
To prevent classification biases due to data imbalances between the various species, we applied sampling with replacement to each training set to produce 8000 voxel samples per species for a total of 200,000 samples in each training set. No resampling was applied to the test sets, but the measured multiclass classification accuracies were weighted by the number of voxel samples per class to ensure an equal contribution from each species. The reflectance and scaled fluorescence spectra for each of the 25 fish species are shown in Figure 5. The significant differences in the shapes and positions of the spectral averages for the various species and the homogeneous nature of the spectra (as indicated by the relatively short error bars) suggest that high classification accuracies can be achieved with this spectral information.

Cross-Validation Train and Test Datasets
For both the single-mode and the spectral fusion studies, 4-fold cross-validation was conducted by dividing the complete dataset (as described in Table 1) into four disjoint test sets, each of which contained voxels from at least one fillet of each of the 25 species. The corresponding training set for each test set was then composed of all data not in the test set. Four-fold cross-validation (as opposed to the more common 5-or 10-fold versions) was chosen because there was greater variability between fillets of the same species than between voxels of the same fillet. Thus, we wanted to ensure that each test set contained entire fillets that were not included in the corresponding training set. For those species with more than four fillets in the complete dataset (e.g., Malabar blood snapper), the fillets were divided into the four test sets with the goal of having the total number of fillets in each test set as equal as possible.

Data Imbalance Correction
To prevent classification biases due to data imbalances between the various species, we applied sampling with replacement to each training set to produce 8000 voxel samples per species for a total of 200,000 samples in each training set. No resampling was applied to the test sets, but the measured multiclass classification accuracies were weighted by the number of voxel samples per class to ensure an equal contribution from each species.

Wavelength Selection
The purpose of wavelength selection is to enable classification with a limited number of wavelengths (3-7) that can be created using optical filters, LEDs, etc. to produce a simple, low-cost classification device. The robustness of the proposed simulated annealing approach was evaluated by running 10 iterations of the algorithm with the VNIR data for the k = 7 cases and examining the variation in the resulting selected wavelengths and the associated WKNN classification accuracies. Figure 6a shows the wavelengths selected for each of the 10 iterations, with each row of similarly colored dots representing a single iteration. Although some variability in the selected wavelengths is noticeable, the plot of multiclass classification accuracies for these iterations in Figure 6b shows little variability in the resulting accuracy. The standard deviation over these 10 accuracy values was 0.13%. Figure 7 shows the average VNIR reflectance spectrum for a red snapper fillet with the k = 3, 4, 5, 6, and 7 optimal wavelengths selected by the simulated annealing algorithm. For all k values, the selected wavelengths correspond to interesting peaks, valleys, and inflection points of the spectrum. Clearly, the region of wavelengths <600 nm is favored along with the trough near 950 nm.
The wavelength selections for the fluorescence data in relation to the average spectrum for one of the red snapper fillets are shown in Figure 8. For this mode, the initial wavelength selections are concentrated at the minima of the spectrum with no wavelengths near the large peak around 670 nm selected until the k = 6 case. approach was evaluated by running 10 iterations of the algorithm with the VNIR data for the k = 7 cases and examining the variation in the resulting selected wavelengths and the associated WKNN classification accuracies. Figure 6a shows the wavelengths selected for each of the 10 iterations, with each row of similarly colored dots representing a single iteration. Although some variability in the selected wavelengths is noticeable, the plot of multiclass classification accuracies for these iterations in Figure 6b shows little variability in the resulting accuracy. The standard deviation over these 10 accuracy values was 0.13%.  Figure 7 shows the average VNIR reflectance spectrum for a red snapper fillet with the k = 3, 4, 5, 6, and 7 optimal wavelengths selected by the simulated annealing algorithm. For all k values, the selected wavelengths correspond to interesting peaks, valleys, and inflection points of the spectrum. Clearly, the region of wavelengths <600 nm is favored along with the trough near 950 nm. The wavelength selections for the fluorescence data in relation to the average spectrum for one of the red snapper fillets are shown in Figure 8. For this mode, the initial wavelength selections are concentrated at the minima of the spectrum with no wavelengths near the large peak around 670 nm selected until the k = 6 case.   Table 2 shows the results of the comparison between the proposed simulated annealingbased wavelength selections method and the three alternative methods. For each combination of spectral mode and number of selected wavelengths, the simulated annealing method yields the set of wavelengths that produces the highest 4-fold cross validated classification accuracy with the WKNN classifier. The wavelength selections for the fluorescence data in relation to the average spectrum for one of the red snapper fillets are shown in Figure 8. For this mode, the initial wavelength selections are concentrated at the minima of the spectrum with no wavelengths near the large peak around 670 nm selected until the k = 6 case.    Table 2 shows the results of the comparison between the proposed simulated annealing-based wavelength selections method and the three alternative methods. For each combination of spectral mode and number of selected wavelengths, the simulated annealing method yields the set of wavelengths that produces the highest 4-fold cross validated classification accuracy with the WKNN classifier.  . Average SWIR reflectance spectrum for one of the red snapper fillets with the optimal k = 3, 4, 5, 6, 7 wavelength selections.

Results of the Single-Mode Study
Average cross-validated (4-fold) classification accuracies for the VNIR reflectance data are given in Table 3. The column labeled "Benchmark" gives the results for the case where all wavelengths are included. The set of columns under "Selected Wavelengths" list the resulting accuracies based on the spectral values at the k = 3, 4, 5, 6, 7 optimal wavelengths. Results for the fluorescence data are provided in a similar manner in Table 4 and for the SWIR reflectance data in Table 5. Values in bold denote the highest accuracy for each number of selected wavelengths. Looking first at the accuracies for the benchmark cases, MLP yields the highest accuracy for the fluorescence data but comes in second for the SWIR data and third for the VNIR data. The superior performance of LD, a relatively simple classifier, for the VNIR and SWIR benchmark cases suggests that overfitting is a significant problem for these cases. Accuracies for the SWIR data are far lower, with LD yielding the highest accuracy at just 80.7%. GNB yields the lowest accuracies for all three modes, reinforcing the notion that classification performance is not dependent upon the values from the selected wavelengths themselves but upon their values in relation to one another. The independence assumption of GNB results in low performance.
Looking next at the "Selected Wavelengths" cases, MLP outperforms the other classifiers for all k values and spectral modes (except for the k = 3 case with the VNIR data). Accuracies >85% are possible given spectral values at just seven or fewer wavelengths for the fluorescence data and >80% for the VNIR reflectance data. Most importantly, with MLP trained on only seven spectral values, the resulting accuracies are within 10 percentage points of the benchmark case for all three spectral modes. The highest performance (89.91%) is seen for the fluorescence data. Figure 10, Figure 11, and Figure 12 show confusion matrices for the k = 7 MLP results from the single-mode VNIR, fluorescence, and SWIR data, respectively. The classification performance is clearly best with the fluorescence data with accuracies >95% for many species. However, the accuracies for some other species are much lower. For example, goosefish has the lowest accuracy at 62.5%, being misclassified as rockfish 28.2% of the time. This is an indication that nearly an entire goosefish fillet was misclassified as rockfish in one of the folds. The overall classification performance is a little lower with the VNIR data. Winter skate shows the lowest classification accuracy at 39.4% in this case, being misclassified as goosefish 26.0% of the time and as almaco jack 13.0% of the time. Much worse performance is seen with the SWIR data, where we find a larger variety of misclassifications. Rockfish has the lowest classification accuracy at just 15.7% with high percentages of misclassification (>14%) as Atlantic cod, haddock, Pacific halibut, and Pacific cod. MLP trained on only seven spectral values, the resulting accuracies are within 10 percentage points of the benchmark case for all three spectral modes. The highest performance (89.91%) is seen for the fluorescence data. Figure 10, Figure 11, and Figure 12 show confusion matrices for the k = 7 MLP results from the single-mode VNIR, fluorescence, and SWIR data, respectively. The classification performance is clearly best with the fluorescence data with accuracies >95% for many species. However, the accuracies for some other species are much lower. For example, goosefish has the lowest accuracy at 62.5%, being misclassified as rockfish 28.2% of the time. This is an indication that nearly an entire goosefish fillet was misclassified as rockfish in one of the folds. The overall classification performance is a little lower with the VNIR data. Winter skate shows the lowest classification accuracy at 39.4% in this case, being misclassified as goosefish 26.0% of the time and as almaco jack 13.0% of the time. Much worse performance is seen with the SWIR data, where we find a larger variety of misclassifications. Rockfish has the lowest classification accuracy at just 15.7% with high percentages of misclassification (>14%) as Atlantic cod, haddock, Pacific halibut, and Pacific cod.    The variability of these single-mode classification results with each of the four crossvalidation folds can be seen in Figure 13. The lower and upper limits of the error bars in each plot represent the minimum and maximum accuracies, respectively, for the fourfolds. The red dashed line in each plot represents the benchmark accuracy obtained by MLP using all wavelengths in the spectrum.  The variability of these single-mode classification results with each of the four crossvalidation folds can be seen in Figure 13. The lower and upper limits of the error bars in each plot represent the minimum and maximum accuracies, respectively, for the fourfolds. The red dashed line in each plot represents the benchmark accuracy obtained by MLP using all wavelengths in the spectrum. The variability of these single-mode classification results with each of the four crossvalidation folds can be seen in Figure 13. The lower and upper limits of the error bars in each plot represent the minimum and maximum accuracies, respectively, for the four-folds. The red dashed line in each plot represents the benchmark accuracy obtained by MLP using all wavelengths in the spectrum. The results of the single-mode classification study prove that high accuracies can be obtained (especially with the MLP classifier) with just seven or fewer wavelengths. The best benchmark performance (92.9%) using all wavelengths was seen with the fluorescence mode with MLP followed by VNIR reflectance (91.7%) and then SWIR reflectance (80.7%), both with LD. The superior performance of LD in these cases suggests the inclusion of all wavelengths significantly increases the potential for overfitting. With seven wavelengths in the fluorescence case, the MLP accuracy came within ~3% of the benchmark accuracy. Review of the confusion matrices from this study reveal that although high overall accuracies can result from these single-mode classifications, each spectroscopic mode has its own unique set of strengths and weaknesses. Furthermore, highly concentrated misclassification results were seen in a few cases, suggesting that entire fillets in the test sets were sometimes misclassified. This is likely a consequence of the somewhat small size of our current dataset. We believe these misclassifications can be alleviated in future studies as we increase the number of fillets per species to better represent the within-species variability of the spectra. The results of the single-mode classification study prove that high accuracies can be obtained (especially with the MLP classifier) with just seven or fewer wavelengths. The best benchmark performance (92.9%) using all wavelengths was seen with the fluorescence mode with MLP followed by VNIR reflectance (91.7%) and then SWIR reflectance (80.7%), both with LD. The superior performance of LD in these cases suggests the inclusion of all wavelengths significantly increases the potential for overfitting. With seven wavelengths in the fluorescence case, the MLP accuracy came within~3% of the benchmark accuracy. Review of the confusion matrices from this study reveal that although high overall accuracies can result from these single-mode classifications, each spectroscopic mode has its own unique set of strengths and weaknesses. Furthermore, highly concentrated misclassification results were seen in a few cases, suggesting that entire fillets in the test sets were sometimes misclassified. This is likely a consequence of the somewhat small size of our current dataset. We believe these misclassifications can be alleviated in future studies as we increase the number of fillets per species to better represent the within-species variability of the spectra. Table 6 gives the resulting average 4-fold cross-validation accuracies for the MLP classifier with the spectral modes fused at the input layer. As with the single-mode study, the value in the "Benchmark" column is the accuracy obtained by fusing all wavelengths from the various modes. We present results from the fusion of all three modes as well as results of fusion without the SWIR mode. This latter iteration was included due to the poor performance with the SWIR data in the single-mode study. By fusing the modes, MLP is able to produce classification accuracies that exceed the highest accuracies from the single-mode study by >10% for k = 3 and by >4% at k = 7. An accuracy of >90% is obtained even with only three wavelengths. The fusion accuracies with all three spectral modes exceed the accuracies without the SWIR data only by 1-2 percentage points for the k = 3, 4, 5, 6, 7 cases (and is lower for the benchmark case), indicating that SWIR, in fact, does not contribute independent information for species classification. Figure 14 shows the confusion matrix for the fusion of all three modes with k = 7. Note that although the rates of correct classification are >99% for many species and >90% for 20 species, the large, concentrated misclassification errors seen in the single-mode study were found here as well. Tuna has the lowest classification accuracy at 61.8%, with 27.8% of the misclassifications as Malabar blood snapper. In this case, less than 8% of the voxels from the two tuna fillets in one of the test sets were classified correctly. Table 6. Resulting average 4-fold cross-validation accuracies for the fusion of spectral modes in the input layer of the MLP classifier. The values in the "Benchmark" column refer to accuracies obtained using all wavelengths in each spectral mode.

Fusion
Benchmark  Table 6 gives the resulting average 4-fold cross-validation accuracies for the MLP classifier with the spectral modes fused at the input layer. As with the single-mode study, the value in the "Benchmark" column is the accuracy obtained by fusing all wavelengths from the various modes. We present results from the fusion of all three modes as well as results of fusion without the SWIR mode. This latter iteration was included due to the poor performance with the SWIR data in the single-mode study. By fusing the modes, MLP is able to produce classification accuracies that exceed the highest accuracies from the single-mode study by >10% for k = 3 and by >4% at k = 7. An accuracy of >90% is obtained even with only three wavelengths. The fusion accuracies with all three spectral modes exceed the accuracies without the SWIR data only by 1-2 percentage points for the k = 3, 4, 5, 6, 7 cases (and is lower for the benchmark case), indicating that SWIR, in fact, does not contribute independent information for species classification. Figure 14 shows the confusion matrix for the fusion of all three modes with k = 7. Note that although the rates of correct classification are >99% for many species and >90% for 20 species, the large, concentrated misclassification errors seen in the single-mode study were found here as well. Tuna has the lowest classification accuracy at 61.8%, with 27.8% of the misclassifications as Malabar blood snapper. In this case, less than 8% of the voxels from the two tuna fillets in one of the test sets were classified correctly. Table 6. Resulting average 4-fold cross-validation accuracies for the fusion of spectral modes in the input layer of the MLP classifier. The values in the "Benchmark" column refer to accuracies obtained using all wavelengths in each spectral mode.  These results support the hypothesis that individual strengths of different spectroscopic modes can be combined to form a classifier with superior accuracy. Stated another way, the failure modes of each spectroscopic mode can be mitigated by the other two modes to significantly reduce all misclassification rates. Furthermore, Table 6 and Figure   Figure 14. Confusion matrix for the fusion of all three spectral modes with k = 7 selected wavelengths (overall classification accuracy = 94.5%).

Fusion
These results support the hypothesis that individual strengths of different spectroscopic modes can be combined to form a classifier with superior accuracy. Stated another way, the failure modes of each spectroscopic mode can be mitigated by the other two modes to significantly reduce all misclassification rates. Furthermore, Table 6 and Figure  14 reveal that significant improvements in accuracy are possible even with just three selected wavelengths from each mode. However, the low accuracies found for certain fish species suggest the need for an expansion to the proposed methodology to enable high classification accuracy for large numbers of fish species. We are currently investigating a cascading multiple-model approach that will be the subject of a future publication. Future work with a larger dataset will also include hyperparameter optimization to identify the optimal MLP architectures for each of the single-mode and fusion cases.

Conclusions
This effort was designed to evaluate the potential of a new methodology for selecting narrowband wavelengths from multiple spectroscopic modes and combining the spectral values at these wavelengths to enable the accurate classification of materials under investigation. The simulated annealing algorithm was found to robustly produce optimal sets of k wavelengths for k = 3, 4, 5, 6, 7. The results of the two classification studies confirm proof of concept for the proposed methodology to support the design of inexpensive hyperspectral imaging devices to classify fish species featuring homogenous spectral data. Future work will include a larger database of fillets for this same food fraud application and will consider agricultural and biomedical applications where the data is expected to be more heterogeneous. Both the optimization and the classification components of the algorithm will be revised and improved to meet the challenges of these more complex applications.