In this section, we show that HypPy has a set of features that sets it apart from other packages and makes it a versatile and flexible tool for processing hyperspectral data.
In the following sections, the points mentioned above will be discussed in more detail.
2.1. The Usage Levels: GUI, CLI, and API
Image processing packages support different types of interfaces to the software. For instance, the Khoros package [8] supports a GUI, CLI, and API. On top of that, Khoros offers an interface for visual programming, which is called Cantata [9]. Building a visual programming environment is complex and is not within the scope of HypPy. However, similar to Khoros, HypPy offers a GUI, CLI, and API.
The advantage of having these three levels is that it offers flexibility. GUIs make it easy to use the tools, even for beginners. From the interface, it is clear which inputs and outputs a certain tool may need. In addition to that, a GUI can be used to invoke multiple tasks at once. In this way, two or more programs may be run in parallel, thus enabling the use of multi-processing capabilities of the computer without the need for parallel coding.
The CLI can be used to run tools on the command line. However, the real power of the CLI is that it can be used to create scripts that run a sequence of tools as a complete processing chain. Furthermore, scripting can be used for time-consuming tasks or to automate the processing of large sets of images.
Most of the functionality of HypPy can also be imported into other Python programs using the API.
The GUI of a HypPy tool consists of a separate module, which imports the HypPy function, creates the GUI, and passes the parameters to the HypPy function. One module contains the GUI, while another module contains the API and the CLI. For instance, the edgy.py module contains the functions for hyperspectral edge filtering plus the command-line interface. The GUI is invoked using the tkEdgy.py module. In this case, the three levels are as follows:
GUI program: tkEdgy.py;
CLI program: edgy.py;
API function: edgy().
The HypPy top-level menu can be used to start the GUI program tkEdgy.py. The Edgy filter can be found as a ‘Spatial–Spectral Edge Filter’ in the ‘Filter’ menu.
2.1.1. GUI
Most HypPy tools are split into two programs, one for the GUI and one for the API and the CLI. Most GUIs are simple interfaces for setting input and output parameters, which are passed to the API. An example of a GUI can be found in Section 2.4.1, where the Band Math Tool is discussed.
The user interfaces for the GUI processing tools are created using Tkinter, the GUI (graphical user interface) package that is part of the official Python distribution. Tkinter is a thin object-oriented layer on top of Tcl/Tk [10]. There are many packages that can be used to develop GUIs in Python. For instance, wxPython offers a more extensive widget set than Tkinter. However, the Tkinter module (Tk interface) is the standard Python interface to the Tk GUI toolkit [11] and was chosen for HypPy for its simplicity and availability. The names of all HypPy GUI programs start with ‘tk’ to indicate that these have a GUI.
2.1.2. CLI and Scripting
The command-line interface is useful for running processes at the command prompt. However, the most important reason to have a CLI is to be able to run a processing chain from a script. Scripts are convenient for repeating a set of commands in an iterative manner. In this way, a potentially large set of images can be processed in exactly the same way. Running scripts reduces errors and offers the possibility to run commands that require a lot of processing time overnight.
HypPy adheres to the conventions used to supply command-line options and input and output on the command line. For that purpose, it uses the Python standard module called argparse, which offers a convenient way to build command-line interfaces that can be used in Unix as well as Windows shells.
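The split between an importable API function and a thin argparse-based CLI around it can be sketched as follows. This is a minimal illustration under stated assumptions, not HypPy's actual code: the function name `edge_filter`, its parameters, and all option names are hypothetical.

```python
import argparse


def edge_filter(input_name, output_name, sort_wavelengths=False, use_bbl=False):
    """Hypothetical API function; a real tool would process the image here."""
    return {
        "input": input_name,
        "output": output_name,
        "sort_wavelengths": sort_wavelengths,
        "use_bbl": use_bbl,
    }


def build_parser():
    """Build the command-line interface around the API function."""
    parser = argparse.ArgumentParser(description="Hypothetical edge filter tool")
    parser.add_argument("-i", "--input", required=True, help="input image file")
    parser.add_argument("-o", "--output", required=True, help="output image file")
    parser.add_argument("-s", "--sort-wavelengths", action="store_true",
                        help="sort bands by wavelength")
    parser.add_argument("-b", "--use-bbl", action="store_true",
                        help="hide bands flagged in the bad band list")
    return parser


def main(argv=None):
    """Parse the command line and pass the parameters on to the API function."""
    args = build_parser().parse_args(argv)
    return edge_filter(args.input, args.output,
                       sort_wavelengths=args.sort_wavelengths,
                       use_bbl=args.use_bbl)


# Simulate a command line; a script would call main() without arguments
# so that argparse reads sys.argv instead.
result = main(["-i", "in.dat", "-o", "out.dat", "-s"])
print(result)
```

Because `main()` accepts an explicit argument list, the same code path can be exercised from a shell script, from another Python program, or from a test.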
2.1.3. The API
An application programming interface (API) is a software interface by which one piece of software offers functionality to other software. In Python, functionality from one program can be imported into another program by making use of the import statement. An important part of the API is the API specification, which describes the functions and data structures that are needed to make use of the available functionality.
Currently, a dedicated API manual is not present in HypPy, but essential classes like Image and Spectrum are covered in the Programmers’ Manual and the Spectral Math Manual, respectively. Additionally, many HypPy functions are described in the HypPy Scripting Manual (CLI) and HypPy User Manual (GUI), as the CLI and GUI often act as a thin shell around these functions.
Because data objects are an important part of the API, HypPy’s header and image objects are discussed below; see Section 2.2.
2.1.4. The Top-Level Menu
The top-level menu of HypPy can be configured using the menu configuration file called hyppymenu.cfg. Functions can be Python scripts (typically the HypPy GUI programs), PDF or HTML files, or complete URLs. In this way, the top menu can be used to start scripts as well as present documents such as manuals or web pages.
The tools are executed as separate processes; therefore, multiple HypPy tools can be started and executed in parallel at the same time.
Figure 1 shows the top menu of HypPy.
Python scripts are executed by Python as subprocesses. The PDF, HTML, or URL is delegated to the default reader of the system, which will start the PDF reader or web browser on the system.
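The dispatch described above can be sketched with the standard library alone. This is an assumption-laden illustration: the function names `classify_target` and `launch` are hypothetical, the format of hyppymenu.cfg is not modeled, and `webbrowser` is used here as a stand-in for whatever mechanism HypPy uses to reach the system's default reader.

```python
import subprocess
import sys
import webbrowser
from pathlib import Path


def classify_target(target):
    """Return 'url', 'script', or 'document' for a menu target string."""
    if target.startswith(("http://", "https://")):
        return "url"
    if target.lower().endswith(".py"):
        return "script"
    return "document"


def launch(target):
    """Start a menu target without blocking the menu itself."""
    kind = classify_target(target)
    if kind == "script":
        # A separate interpreter process, so several tools can run in parallel.
        return subprocess.Popen([sys.executable, target])
    if kind == "url":
        return webbrowser.open(target)
    # Local PDF or HTML file: hand it over as a file:// URI.
    return webbrowser.open(Path(target).resolve().as_uri())


for example in ("tkEdgy.py", "manual.pdf", "https://www.python.org"):
    print(example, "->", classify_target(example))
```

Since `subprocess.Popen` returns immediately, the menu stays responsive while each tool runs in its own process.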
2.2. Image Formats
HypPy has adopted the ENVI formats as its native formats. The ENVI image and ENVI spectral library formats are binary files with an accompanying text header file. ENVI supports three different types of interleaving in its native format: band sequential (BSQ), band-interleaved by pixel (BIP), and band-interleaved by line (BIL).
HypPy supports all three ENVI formats: BSQ, BIP, and BIL. In addition, HypPy can map an ENVI spectral library onto an image. The spectral library is then represented as a y by x by z image, in which y is the number of spectra stored in the spectral library, x is always 1 (the spectral library is visible as an image with just one column of pixels), and z is the number of bands in the spectral library. The advantage is that spectral libraries can be viewed just like spectral images. On top of that, a processing chain meant for spectral images can be tested on a spectral library first to see what effect the processing chain will have on the various minerals present in the spectral library.
For consistency in coding, single-band images are mapped onto three dimensions as well, in which case the band dimension is always 1. There is one exception to this rule: classification images are mapped as two-dimensional one-band images. Classification images have a different use and semantics than hyperspectral data, and for this reason, classification images are treated in a different way in HypPy.
Internally, HypPy maps all three formats, BSQ, BIP, and BIL, onto an array with a BIP structure, thus relieving the programmer of having to deal with three different formats.
2.2.1. Header and Image Objects
HypPy’s header module is used to read and write ENVI header files. ENVI images and spectral libraries typically consist of two files, the data and the header. The data files contain raw binary data. The header files contain human-readable text.
Inside HypPy, the ENVI header is represented as a Python object of the class Header. Objects of this class contain attributes that reflect the attributes as they are found in the ENVI header files. In the Python code, there are a few differences. First, the names of the attributes in the ENVI header can contain spaces, which is inconvenient in a programming language like Python because variable names are not allowed to contain spaces. A choice was made to replace the spaces in ENVI attribute names with an underscore. For instance, the ‘data type’ attribute in the ENVI file header will be renamed to ‘data_type’ in Python. Second, some of the values inside the ENVI header may change their appearance in Python. For instance, IDL records, as they appear inside ENVI headers, such as values of the wavelength attribute, are changed into the standard Python list data type. The translation between IDL records and Python lists is handled by HypPy internally.
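The two translations, spaces to underscores in attribute names and brace-delimited records to Python lists, can be illustrated with a small stand-alone parser. This is a simplified sketch and not HypPy's Header class: the function names `parse_envi_header` and `convert` are hypothetical, and corner cases such as nested braces are ignored.

```python
import re
from types import SimpleNamespace

HEADER_TEXT = """ENVI
samples = 5
lines = 4
bands = 3
data type = 4
interleave = bsq
byte order = 0
wavelength = { 400.0, 410.0,
 420.0 }
"""

# key = value, where a value is either a (possibly multi-line) {...} record
# or the remainder of the line.
FIELD = re.compile(r"^[ \t]*([^=\n]+?)[ \t]*=[ \t]*(\{.*?\}|[^\n]*)",
                   re.MULTILINE | re.DOTALL)


def convert(item):
    """Turn a header token into int or float when possible, else keep the string."""
    item = item.strip()
    for cast in (int, float):
        try:
            return cast(item)
        except ValueError:
            pass
    return item


def parse_envi_header(text):
    """Parse ENVI header text into an object with Python-friendly attribute names."""
    attrs = {}
    for key, value in FIELD.findall(text):
        name = key.strip().lower().replace(" ", "_")  # 'data type' -> 'data_type'
        if value.startswith("{"):
            # Brace-delimited record -> Python list
            attrs[name] = [convert(v) for v in value[1:-1].split(",")]
        else:
            attrs[name] = convert(value)
    return SimpleNamespace(**attrs)


header = parse_envi_header(HEADER_TEXT)
print(header.data_type, header.byte_order, header.wavelength)
```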
HypPy supports all IDL/ENVI raw binary data types, from 8-bit unsigned integers to 64-bit signed integers and from 32-bit single-precision floating-point numbers to double-precision complex numbers (a real-imaginary pair of 64-bit double-precision values). Furthermore, through the use of the ‘byte order’ field name, HypPy supports both big-endian and little-endian byte order files.
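As an illustration of how the ‘data type’ and ‘byte order’ header fields can be combined into a single NumPy dtype, consider the sketch below. The numeric codes are the ones defined by the ENVI header format (byte order 0 is little endian, 1 is big endian); the function name `envi_dtype` is hypothetical, and HypPy's own lookup may be organized differently.

```python
import numpy as np

# ENVI 'data type' codes and their NumPy equivalents.
ENVI_TO_NUMPY = {
    1: np.uint8,
    2: np.int16,
    3: np.int32,
    4: np.float32,
    5: np.float64,
    6: np.complex64,    # pair of 32-bit floats
    9: np.complex128,   # pair of 64-bit floats
    12: np.uint16,
    13: np.uint32,
    14: np.int64,
    15: np.uint64,
}


def envi_dtype(data_type, byte_order=0):
    """Return a NumPy dtype for an ENVI data type code and byte order flag."""
    order = "<" if byte_order == 0 else ">"
    return np.dtype(ENVI_TO_NUMPY[data_type]).newbyteorder(order)


print(envi_dtype(2, byte_order=1), envi_dtype(9).itemsize)
```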
HypPy has two factory functions in its image module, the Open() function for opening existing images, and the New() function for setting up new (empty) images.
The factory function Open() returns one of the image objects of class ImageBIP, ImageBIL, ImageBSQ, Classification, or ImageSL (a spectral library mapped as an image). Images accessed using the Open() function can only be read, not written. To protect existing images, these are always opened as read-only.
The factory function New() returns one of the image objects of class ImageBIP, ImageBIL, ImageBSQ, or Classification. In addition to these classes, the class ImageSL is also supported, which is an image representation of an ENVI spectral library.
2.2.2. Memory Map
HypPy uses NumPy’s memory map, ‘numpy.memmap’, to map binary data files onto arrays. Memory-mapped files are used to access large files on a disk without having to read the entire file into memory first. NumPy memmap objects are array-like objects; they can be manipulated in the same way as NumPy arrays. Because the choice was made in HypPy to use the BIP structure for the internal array, the axes of the two other formats, BSQ and BIL, must be swapped to look like BIP in the returned memory map. This is achieved by calling the transpose function of the memory map. BSQ is mapped using a transpose(1, 2, 0). BIL is mapped using a transpose(0, 2, 1). The BIP format does not need to be transposed.
One would think that using the transpose function would create a copy of the data with the axes of the data swapped in a different way. However, this is not the case because, for certain operations, numpy supports what is called a ‘view’ on the data. A view creates a reference to an existing array, with the required reorganization of the data added to the metadata. This has two advantages: first, the view does not create a copy of the data in memory, and second, when the view is opened in read–write, any modifications to the view will be reflected immediately in the original data.
Furthermore, endianness and byte order can be changed by creating a view. The three formats (BSQ, BIP, and BIL), endianness (little endian and big endian), and data type (from 8-bit unsigned integers to 64-bit floating points, etc.) can be handled by creating a view of the data on file. For the input and output image data, no copy needs to reside in the memory of the computer. In such a way, HypPy can handle large files that would normally not fit in memory.
The following functions are applied to create views:
Creating a memory map:
numpy.memmap(fname, mode='r', shape=shape, dtype=data_type);
Changing the byte order: method newbyteorder();
Swapping the axes from BSQ to BIP: method transpose(1, 2, 0);
Swapping the axes from BIL to BIP: method transpose(0, 2, 1).
Unfortunately, changing the data type of an input file creates a copy in memory, not a view.
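The difference between a view and a copy can be checked directly with NumPy. The sketch below writes a small cube to temporary files in BSQ and BIL order, maps both back as BIP using the transposes listed above, and then converts the data type. File names and array sizes are arbitrary, and `astype()` is used here as one example of a data type conversion.

```python
import os
import tempfile

import numpy as np

lines, samples, bands = 4, 5, 3
dtype = np.dtype("<i2")

# Reference cube in BIP order: (lines, samples, bands).
cube = np.arange(lines * samples * bands, dtype=dtype).reshape(lines, samples, bands)

tmpdir = tempfile.mkdtemp()
bsq_path = os.path.join(tmpdir, "cube_bsq.dat")
bil_path = os.path.join(tmpdir, "cube_bil.dat")
cube.transpose(2, 0, 1).tofile(bsq_path)  # on disk as (bands, lines, samples)
cube.transpose(0, 2, 1).tofile(bil_path)  # on disk as (lines, bands, samples)

# Map the files with their on-disk axis order ...
bsq = np.memmap(bsq_path, mode="r", dtype=dtype, shape=(bands, lines, samples))
bil = np.memmap(bil_path, mode="r", dtype=dtype, shape=(lines, bands, samples))

# ... and present both as BIP without copying any data.
bsq_as_bip = bsq.transpose(1, 2, 0)
bil_as_bip = bil.transpose(0, 2, 1)

# Converting the data type allocates a new array in memory.
as_float = bsq_as_bip.astype(np.float32)

print(np.shares_memory(bsq, bsq_as_bip))  # view of the mapped file
print(np.shares_memory(bsq, as_float))    # independent copy
```

Note that in NumPy 2.0 the array method `newbyteorder()` was removed; the equivalent view can be obtained with `arr.view(arr.dtype.newbyteorder())`.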
However, there are a few caveats in the use of numpy memory maps. Not all operations and functions that can be used on numpy arrays can be used on memory maps, and only a limited number of operations or functions on a memory map are capable of returning a view.
Additionally, the use of a memory map itself has disadvantages as well. Setting up a memory map uses a bit of extra memory, and all the reorganization of a view must be executed on-the-fly whenever the data on file are accessed. In this way, the memory map is less efficient than making an array copy in memory.
The advantages of a memory map are obvious: it enables the handling of large datasets that cannot possibly fit in the physical memory of the computer. Furthermore, internally, the data are mapped on the same BIP structure all the time; in a program, there is no need to keep track of the real interleave format of the data file. All this is handled transparently by the memory map and the views thereof.
2.2.3. Spectral Library Formats
HypPy supports two basic types of spectral libraries: ENVI spectral libraries and ASCII or text-based spectral libraries. A major drawback of ENVI spectral libraries is that the list of wavelengths has to be the same for all the spectra in the library; that is, all included spectra must share the same number of bands and the same central wavelengths. An advantage of this format is that it is a binary format and can be addressed like an image, which offers the possibility to process spectral libraries in the same manner as spectral images. This can be used to test processing chains on spectral libraries before running them on images. A process for detecting certain minerals can be tested on the USGS spectral library [12], for instance, before applying it to an image. Problems with false positives of certain minerals can be caught in the early stages of developing a processing chain.
On the other hand, the ASCII spectral libraries offer more flexibility. Text files can be imported into other programs, and the spectral ranges of the individual spectra do not need to be all the same. A complication is that spectra must be resampled before use. HypPy resamples spectra on-the-fly, using the interp1d() function from the scipy.interpolate module to unite both sets of wavelengths. An intermediate step of resampling spectra is not needed.
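On-the-fly resampling with interp1d() can be sketched as follows. The wavelengths and values are made up, and the choice of linear interpolation with NaN outside the covered range is an assumption for the example; HypPy's own settings may differ.

```python
import numpy as np
from scipy.interpolate import interp1d

# A library spectrum sampled on its own wavelength grid (nm).
lib_wavelengths = np.array([400.0, 500.0, 600.0, 700.0, 800.0])
lib_values = np.array([0.10, 0.20, 0.30, 0.40, 0.50])

# Band centers of the image the spectrum has to be compared with (nm).
image_wavelengths = np.array([450.0, 550.0, 650.0, 750.0, 850.0])

# Linear interpolation; wavelengths outside the library range become NaN
# instead of raising an error.
resample = interp1d(lib_wavelengths, lib_values, kind="linear",
                    bounds_error=False, fill_value=np.nan)
resampled = resample(image_wavelengths)
print(resampled)
```

Returning NaN outside the overlap makes it explicit which bands of the second spectrum have no counterpart in the first.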
2.2.4. Bad Band Lists
The ENVI image format supports the notion of what is called a bad band list (BBL), indicated in an ENVI header file by the keyword BBL. There must be as many items in the BBL as there are bands in the image. Sequentially, the bands are marked as bad or good. A zero (0) for a particular band signifies that the data in this band should not be trusted because of a sensor failure or excessive noise. A one (1) for a band signifies valid data. HypPy can take into account the bad band list, such that the bad bands do not show up in array data. In most HypPy tools, this can be indicated by setting the use_bbl option to true. If this option is set to false, then all of the data are visible in the array, including the bad bands. Such a virtual image, on which the use_bbl is set to true, may have fewer bands than the original data file.
Internally, this is handled by an index-to-index array, which is used as an indexing array to access multiple elements at once. The disadvantage is that handling such virtual images is slower than using the full image. Another disadvantage is that the band numbers in the virtual image are not the same as the band numbers in the original image. However, one of the design ideas of HypPy is to use the wavelength rather than the band number to address a spectral band of an image (see next section).
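The index-to-index idea can be shown with plain NumPy indexing. The sketch below is an illustration with made-up data, not HypPy's implementation; it builds the list of good band numbers from a BBL and uses it as an indexing array on a BIP cube.

```python
import numpy as np

# Small BIP cube (lines, samples, bands) in which each pixel holds 0..5.
cube = np.arange(2 * 2 * 6).reshape(2, 2, 6) % 6

# Bad band list from the header: 0 marks a bad band, 1 a good band.
bbl = np.array([0, 1, 1, 0, 1, 1])

# Index-to-index array: virtual band number -> band number on file.
good_bands = np.flatnonzero(bbl)

# Indexing with an array selects all good bands at once.
virtual = cube[:, :, good_bands]

print(good_bands, virtual.shape)
```

In NumPy, indexing with an integer array returns a copy rather than a view, which is consistent with the speed penalty mentioned above.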
2.2.5. Sorting Wavelengths
Often, sensors consist of a number of detector banks, each covering a specific wavelength range. The data from these detectors may have overlapping wavelength ranges or may even appear in an unexpected order. For instance, the raw data from the OMEGA instrument of the European Mars Express have the data from the SWIR1 (0.93 µm to 2.69 µm) and SWIR2 (2.53 µm to 5.09 µm) detector banks before the data from the VNIR detector (0.36 µm to 1.07 µm) [13]. Another well-known hyperspectral instrument, AVIRIS, consists of four spectrometers with overlapping ranges [14]. Note that redundant bands in the spectral overlap of the AVIRIS spectrometers can also be removed or put in the bad band list; see, for instance, [15].
Obviously, the switched order and overlapping ranges may lead to problems when plotting or processing such data. Therefore, one of the design criteria of HypPy was to be able to sort the wavelengths of a hyperspectral data cube on-the-fly. In a trade-off between memory use and CPU use, HypPy does not convert the input image into memory; therefore, it must convert the image on-the-fly. Many HypPy tools offer the ‘sort_wavelengths’ option when opening an image.
HypPy sorts the wavelengths by keeping an index-to-index table as an attribute of the image objects. This table translates the band numbers of an input image into bands that are sorted by wavelength. The advantage is that, internally, the data look like they have their bands sorted by wavelength. The obvious disadvantage is that the band numbers of the image object may not correspond to the band numbers of the original input image.
However, if bands are addressed by wavelengths rather than by band numbers, then this last point is irrelevant. Wavelengths have physical meaning and should be the preferred way of addressing image bands. Band numbers may change and even have a different meaning from one remote sensor to another. Unfortunately, not all images have wavelengths, either because the wavelengths are missing or the images do not contain spectral data, which is why band numbers can still be used in HypPy.
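A sorted view of the bands, combined with addressing by wavelength, can be sketched with an argsort-based index table. The wavelengths below are invented to mimic detector banks stored out of order, and the helper name `band_at` is hypothetical.

```python
import numpy as np

# Band order as stored on file: a SWIR-like bank before a VNIR-like bank (µm).
file_wavelengths = np.array([1.0, 1.5, 2.0, 2.5, 0.4, 0.6, 0.8])
cube = np.arange(2 * 2 * 7, dtype=float).reshape(2, 2, 7)  # BIP cube

# Index-to-index table: sorted band number -> band number on file.
sort_index = np.argsort(file_wavelengths, kind="stable")
sorted_wavelengths = file_wavelengths[sort_index]


def band_at(wavelength):
    """Return the band whose center wavelength is nearest to `wavelength`."""
    nearest = int(np.argmin(np.abs(sorted_wavelengths - wavelength)))
    return cube[:, :, sort_index[nearest]]


print(sort_index, sorted_wavelengths)
```

A caller that asks for `band_at(0.61)` receives the 0.6 µm band regardless of where that band is stored in the file, which is why band numbering becomes irrelevant once wavelengths are used.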
2.4. Tools
In the following, a subset of HypPy’s tools will be discussed in more detail.
2.4.1. Band Math Tool
Multispectral and hyperspectral images can be regarded as a set of spectral bands, and functions are often created to process such data in a band-by-band fashion. The band math tool implements this.
An example of such a band math operation is the calculation of the NDVI, in which the difference between a near-infrared band and a red band is normalized by dividing by the sum of the near-infrared band and the red band. For the NDVI, the red band is usually taken from the range of 400 nm to 700 nm and the infrared band from the range of 700 nm to 1100 nm. In the following example, for red, we pick a band around 675 nm, and for infrared, a band around 1000 nm is chosen. In HypPy, the NDVI of an AVIRIS-NG image can be calculated using one of the following expressions: (1) or (2).
In band math, the sequence of image file names is linked to the variables i1, i2, and so on. In the example of the NDVI, only one input image is needed, which is linked to the variable i1. Images are numbered starting at 1, and band numbers are numbered starting at 0. In the example, band 59 is the band with a center wavelength of 676.9 nm and band 124 has a center wavelength of 1002.4 nm. Please note that, in this example, the BBL is used, and the first seven bands, which are marked as bad, are not present.
In band math, it is also possible, and even preferred, to use wavelengths instead of band numbers. The band with the nearest central wavelength will be automatically selected. Note that, in band math, when wavelengths are used rather than band numbers, these wavelengths must be enclosed by round brackets, not square brackets. In Python, this is implemented using a ‘call’ on the image class. Python objects can be designed to behave like lists or iterators, having an index between square brackets or a function with a parameter list between round brackets.
Figure 6 shows an example of a band ratio.
HypPy uses the Python eval() function to execute expressions like the NDVI example above. Band math expressions are basically Python expressions. HypPy sets up the variables i1, i2, and so on. These are HypPy image objects, which can be called to return bands of a certain wavelength. Linking the image variables to image files is an essential step in the use of band math expressions. Note that the syntax used in band math differs from the one used in spectral math. Band math uses image objects, while spectral math uses spectrum objects; see below.
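The combination of square brackets for band numbers, round brackets for wavelengths, and eval() can be imitated in a few lines. The class `BandImage` below is a hypothetical stand-in for HypPy's image class, the data are synthetic, and the expression strings only approximate the band math syntax, which may differ in detail.

```python
import numpy as np


class BandImage:
    """Toy image object: index with a band number, call with a wavelength."""

    def __init__(self, cube, wavelengths):
        self.cube = np.asarray(cube, dtype=float)            # (lines, samples, bands)
        self.wavelengths = np.asarray(wavelengths, dtype=float)

    def __getitem__(self, band):
        # Square brackets: address a band by its number.
        return self.cube[:, :, band]

    def __call__(self, wavelength):
        # Round brackets: address the band with the nearest center wavelength.
        band = int(np.argmin(np.abs(self.wavelengths - wavelength)))
        return self.cube[:, :, band]


wavelengths = [450.0, 550.0, 675.0, 850.0, 1000.0]           # nm
cube = np.empty((2, 2, 5))
cube[...] = [0.05, 0.08, 0.04, 0.45, 0.50]                   # same spectrum in every pixel
i1 = BandImage(cube, wavelengths)

# The tool links file names to variables and evaluates the user's expression.
namespace = {"i1": i1}
ndvi_by_band = eval("(i1[4] - i1[2]) / (i1[4] + i1[2])", namespace)
ndvi_by_wavelength = eval("(i1(1000) - i1(675)) / (i1(1000) + i1(675))", namespace)
print(ndvi_by_band[0, 0], ndvi_by_wavelength[0, 0])
```

Both expressions select the same two bands here, so the results are identical; only the wavelength form stays valid when the band numbering changes.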
2.4.2. Spectrum Class and Spectral Math
Spectral math is used to apply mathematical expressions to spectra of images or spectral libraries. Band math uses the band paradigm; spectral math uses the spectrum paradigm. In spectral math, the image is processed pixel by pixel. In band math, the image is processed band by band. Spectral math relies on the functionality as it is implemented in the spectrum class. The interface of the spectral math tool is similar to the band math tool (see Figure 6), but the expression must follow the syntax of the spectrum class.
The spectrum class was created for the manipulation of spectra. The idea is that a spectrum object contains information about wavelengths as well as values, be it reflectance, radiance, or other data. The spectrum class offers access to the data, as well as functions to operate on the spectral data.
The list below summarizes some of the properties of the spectrum container object:
The spectrum object contains wavelengths and values.
The spectrum object may contain a name and a description.
An index can be used to select values or make subsets of the spectrum.
An integer as index denotes a band number; a float denotes a wavelength.
Mathematical operators like +, −, *, and / can be used on spectra.
The spectrum object contains a wealth of operators and methods. Currently, some 150 functions and 20 operators are implemented (see HypPy’s spectral math manual).
When working with spectra with different wavelengths, the spectra are automatically resampled on-the-fly.
As shown previously, spectral indices can be implemented using band math. A spectral index is supposed to have a high value if a certain mineral or material is present and a low value if it is absent. An example of such an index was given in the previous section in which we presented the example of the NDVI.
However, if the image contains noise, then the resulting NDVI may also contain noise. In such a case, it may be beneficial to work with average values in band ranges rather than single bands. In band math, it is not possible to use band ranges because bands of a range must be addressed one by one. However, in spectral math, a spectral range can be obtained using a slice on the spectrum.
The spectral math expression for NDVI may now look like expression (3), in which S1 denotes the spectra from the first image. Spectra of subsequent input images are numbered S2, S3, and so on. However, in this example, only one image is used. The square brackets contain what in Python are called slices, denoted by the use of a colon ‘:’. Notice that floats are used as indices, which denote wavelengths. Integers as indices are interpreted as band numbers. The result of a slice on a spectrum is a new spectrum object that contains a subset of the original spectrum object. In this case, the subset will contain the spectral values roughly between 400 nm and 700 nm and between 700 nm and 1100 nm.
The wavelengths are approximate because these wavelengths are matched to the closest band numbers, which may not have the exact same wavelengths as central wavelengths. HypPy does not generate errors or warnings for the conversion of wavelengths to bands to prioritize speed over quality checks, as the checks would consume significant CPU time. However, users and programmers can verify the requested and obtained wavelengths using the functions wavelength2index() and index2wavelength().
Subsequently, the averaging function mean() is used on these subsets. In other words, the result will be the NDVI calculated from the average spectrum between 400 nm and 700 nm and the spectrum between 700 nm and 1100 nm. In this way, spectral indices that operate on ranges of wavelengths rather than single bands can be built. The obvious advantage is that this reduces noise in the output. The downside is that such expressions are slower than band-by-band operations. Furthermore, in the example above, the slices and means are calculated twice because the spectral math tool only supports one single expression. However, by making clever use of the walrus operator ‘:=’, which was introduced in Python version 3.8 [20], double calculation can be prevented. The expression for the NDVI would then look like expression (4), in which the mean of the infrared range is stored in the variable ir and the mean of the red range is stored in the variable red, and these values are retrieved later in the expression for calculating the denominator.
Please note that we use a broad range for red and infrared in Equations (3) and (4), using the chlorophyll absorption in the visible part of the spectrum (400 nm to 700 nm) as red and the reflection by the leaf cell structure in the near-infrared part of the spectrum (700 nm to 1100 nm) as ir. The specific wavelength ranges may vary depending on the remote sensing sensor or technology in use. For instance, Tucker [21] used the range 630 nm to 690 nm for red and 750 nm to 800 nm for ir, which are the ranges of the Landsat MSS spectral bands 5 and 7, respectively.
HypPy uses the Python eval() function to execute expressions like the averaging NDVI example above. The spectral math expressions are Python expressions. However, HypPy sets up the variables S1, S2, and so forth, which are HypPy spectrum objects which can be used as a Python container class. In that sense, the behavior of the spectrum object is similar to a Python list object, which can be indexed, sliced, looped over, and so on.
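The container behavior and the use of the walrus operator inside a single evaluated expression can be imitated with a toy class. `MiniSpectrum` below is hypothetical and far smaller than HypPy's spectrum class. Wavelengths are assumed to be in nm, the slice end is treated as exclusive, and the expression string only approximates expressions (3) and (4).

```python
import numpy as np


class MiniSpectrum:
    """Toy spectrum: float indices are wavelengths, int indices are band numbers."""

    def __init__(self, wavelengths, values):
        self.wavelengths = np.asarray(wavelengths, dtype=float)
        self.values = np.asarray(values, dtype=float)

    def _to_band(self, key):
        # A float is matched to the band with the nearest center wavelength.
        if isinstance(key, float):
            return int(np.argmin(np.abs(self.wavelengths - key)))
        return key  # int or None pass through unchanged

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop = self._to_band(key.start), self._to_band(key.stop)
            return MiniSpectrum(self.wavelengths[start:stop], self.values[start:stop])
        return self.values[self._to_band(key)]

    def mean(self):
        return float(self.values.mean())


wavelengths = np.arange(400.0, 1101.0, 100.0)                 # 400, 500, ..., 1100 nm
values = np.where(wavelengths < 700.0, 0.1, 0.5)              # dark red, bright infrared
S1 = MiniSpectrum(wavelengths, values)

# Each mean is computed once, stored with ':=', and reused in the denominator.
expression = ("((ir := S1[700.0:1100.0].mean()) - (red := S1[400.0:700.0].mean()))"
              " / (ir + red)")
ndvi = eval(expression, {"S1": S1})
print(ndvi)
```

Python evaluates the numerator before the denominator, so `ir` and `red` are already bound when the sum is computed. Assignment expressions require Python 3.8 or later.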
In HypPy, spectral math can be used in the spectral math tool as well as in the spectral library viewer.
2.4.3. Spatio-Spectral Filters
Spatio-spectral filters take into account both spectral information and spatial information. HypPy has three types of spatio-spectral filters: gradient filters, mean filters, and binning filters. Gradient filters are implemented by regarding a local spatial kernel of spectra as vectors and calculating the weighted difference between these vectors. The mean filters calculate the weighted sum of these vectors. Binning filters reduce the number of samples by aggregating values.
HypPy has a tool for hyperspectral edge filtering. Such filters can be used to detect spectral variability in a scene. Areas with a high value are spectrally inhomogeneous, and areas with low values are spectrally homogeneous. The results of such filters can serve as a first step in image quality assessment, classification, or segmentation. The result of the gradient filters is always a single-band grey-level image. The HypPy tool called hyperspectral gradient implements a number of directional (up, down, and two diagonals) and non-directional filters (edgy and Sobel). A number of distance measures have been implemented that can be used in conjunction with these filters.
The spectral angle (SA) distance measure only takes into account the spectral information of the input image because the spectral angle is insensitive to the intensity of the input spectrum. The intensity difference (ID) distance measure, on the other hand, only takes into account the intensity information of the spectrum, not the spectral information. The Euclidean distance (ED) measure is sensitive to both the spectral and intensity information of the spectra, although most of the time, the ED seems to be dominated by the intensity of the input spectra. In addition to the standard spectral difference measures discussed above, two other distance measures, which are widely used in hyperspectral analysis, are available: the Bray–Curtis (BC) distance and the spectral information divergence (SID) [22]. All these distance measures have their own properties regarding spectral similarity.
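The different sensitivities of these measures can be verified on a pair of spectra that differ only in brightness. The functions below use the textbook definitions of the spectral angle, the Euclidean distance, and the Bray–Curtis distance; whether HypPy normalizes or weights them in the same way is an assumption.

```python
import numpy as np


def spectral_angle(a, b):
    """Angle (radians) between two spectra regarded as vectors."""
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cosine, -1.0, 1.0)))


def euclidean_distance(a, b):
    """Length of the difference vector."""
    return float(np.linalg.norm(a - b))


def bray_curtis(a, b):
    """Sum of absolute differences divided by sum of absolute sums."""
    return float(np.abs(a - b).sum() / np.abs(a + b).sum())


spectrum = np.array([0.2, 0.4, 0.3, 0.5])
brighter = 2.0 * spectrum          # same shape, twice the intensity

print(spectral_angle(spectrum, brighter))      # close to zero
print(euclidean_distance(spectrum, brighter))  # clearly non-zero
print(bray_curtis(spectrum, brighter))
```

The spectral angle stays near zero for the scaled spectrum, while the Euclidean and Bray–Curtis distances respond to the intensity change.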
Where the gradient filters take into account a 3 × 3 two-dimensional set of full spectra, the mean filters take into account a smaller three-dimensional set of spectral values. The mean filters apply a 3 × 3 × 3 kernel on the hyperspectral data cube. The filter can take all the values of the kernel into account or a limited subset of them. Current kernels in HypPy calculate the weighted average of 27, 19, or 7 of the values of the kernel. Two averaging functions are implemented: the mean and median. A mean filter is a linear filter and, therefore, predictable in what the effect on the image will be. A median filter reduces the effect of outliers. In that respect, a median filter may be useful for filtering out salt-and-pepper noise. However, the effect of a median filter is not as easy to predict as a mean filter, which may lead to unexpected effects in the output image. In certain cases, median filters may have an effect comparable to morphological filters in the sense that areas seem to shrink or grow, which may be confusing and thus complicate the interpretation of the resulting image.
The result of a spatio-spectral mean filter is always a hyperspectral cube with the same dimensions as the input. However, like with regular two-dimensional spatial filters, edge effects should be taken into account. The first and last values in any of the three dimensions (x, y, and band) are treated differently than the rest of the hyperspectral cube.
In addition to these filters, HypPy has a number of binning tools, spatial, spectral, and spatial–spectral binning, to reduce the number of samples in the image cube. These tools can be used to decrease the data volume, reduce the noise by averaging, and speed up subsequent processing steps. For instance, spatial–spectral binning of 3 × 3 and 11 will reduce the data volume by almost a factor of 100 (3 × 3 × 11 = 99) and reduce the standard deviation of the noise by almost a factor of 10 (√99 ≈ 9.95).
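The arithmetic of this example can be checked numerically. The sketch below bins a cube of uncorrelated unit-variance noise by 3 × 3 pixels and 11 bands using a reshape followed by a mean. The function name `bin_cube` is hypothetical, and averaging is assumed as the aggregation function.

```python
import numpy as np


def bin_cube(cube, spatial=3, spectral=11):
    """Average non-overlapping spatial x spatial x spectral blocks of a BIP cube."""
    lines, samples, bands = cube.shape
    nl, ns, nb = lines // spatial, samples // spatial, bands // spectral
    # Drop incomplete blocks at the far edges, if any.
    trimmed = cube[:nl * spatial, :ns * spatial, :nb * spectral]
    blocks = trimmed.reshape(nl, spatial, ns, spatial, nb, spectral)
    return blocks.mean(axis=(1, 3, 5))


rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=(30, 30, 110))   # (lines, samples, bands)
binned = bin_cube(noise)

volume_factor = noise.size / binned.size
noise_factor = noise.std() / binned.std()
print(binned.shape, volume_factor, noise_factor)
```

The reduction of the standard deviation by √99 holds for uncorrelated noise; correlated noise in real sensor data will be reduced less.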
2.4.4. Wavelength Mapper
Mapping physical properties from hyperspectral images is complicated and often relies on the availability of spectral libraries of target materials. Many existing methods depend on the subjective decisions of the interpreter for the selection of endmembers and the post-processing of rule images. If no prior information is available, information should be extracted by characterization of spectral absorption features. On multi-spectral images, such features can be found using indices based on band ratios and band depths; see, for instance, [23]. However, hyperspectral images show more spectral detail of the features, which opens up the possibility of using more sophisticated methods.
Absorption features of spectral curves are important for analysis and interpretation. Such features may be parametrized as center wavelength position, depth, area, and asymmetry [24]. Usually, the most important features are the center wavelength position and the depth of the main absorption feature. The wavelength position of an absorption feature can be used for the identification of minerals because many minerals have their own unique absorption wavelengths. The depth of such a feature is an indication of the abundance of a mineral.
The center wavelength and depth of the deepest feature and consecutive deepest features can automatically be determined. These parameters can be calculated for all the pixels (spectra) of a hyperspectral image and presented as grey-level images representing the wavelength and depth of the feature.
Subsequently, the resulting wavelength and depth images can be fused using an HSI (hue–saturation–intensity) transform [17] in such a way that the resulting image shows the wavelength in color (the hue). The intensity represents the depth of the feature. The saturation is set to 1. The wavelength mapper tool of HypPy automatically determines the wavelength and depth of the deepest absorption feature of all the pixels of a hyperspectral cube and fuses the two into one color image.
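The fusion step can be sketched as follows. This is a strongly simplified illustration and not the published method: the input is assumed to be continuum-removed already, the feature position is taken as the band with the minimum value rather than an interpolated minimum, the HSV conversion from Python's colorsys module stands in for the HSI transform of [17], and the linear mapping of wavelength to hue is an arbitrary choice. The function name `wavelength_map` is hypothetical.

```python
import colorsys

import numpy as np


def wavelength_map(cr_cube, wavelengths, wl_range, depth_stretch):
    """Fuse position (hue) and depth (value) of the deepest feature per pixel.

    cr_cube: continuum-removed reflectance, shape (lines, samples, bands).
    wl_range: (low, high) wavelength range to focus on.
    depth_stretch: feature depth that maps to full brightness.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    low, high = wl_range
    in_range = np.flatnonzero((wavelengths >= low) & (wavelengths <= high))
    subset = cr_cube[:, :, in_range]

    deepest = subset.argmin(axis=2)               # band index of the minimum
    position = wavelengths[in_range][deepest]     # wavelength of the minimum
    depth = 1.0 - subset.min(axis=2)              # depth below the continuum

    hue = (position - low) / (high - low) * (2.0 / 3.0)   # hue from 0 to 2/3
    value = np.clip(depth / depth_stretch, 0.0, 1.0)

    rgb = np.empty(hue.shape + (3,))
    for idx in np.ndindex(hue.shape):
        rgb[idx] = colorsys.hsv_to_rgb(hue[idx], 1.0, value[idx])  # saturation 1
    return position, depth, rgb


wavelengths = [2100.0, 2200.0, 2300.0, 2400.0]            # nm
cr_cube = np.array([[[1.0, 0.6, 0.9, 1.0]]])               # one pixel, feature at 2200 nm
position, depth, rgb = wavelength_map(cr_cube, wavelengths, (2100.0, 2400.0), 0.4)
print(position[0, 0], depth[0, 0], rgb[0, 0])
```

The two parameters `wl_range` and `depth_stretch` mirror the two inputs that the wavelength mapper relies on: the wavelength range to focus on and the stretch for the feature depth.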
Figure 7a shows a rock sample in a natural color composite. Figure 7b shows the result of the wavelength mapper; Figure 7c shows the legend to the wavelength map. The REE-containing monazite is visible as orange grains. The greens and the blues are carbonates. For a further discussion of this rock sample, see [16].
The wavelength mapper is unsupervised, straightforward, and repeatable; large areas can be mapped at once, and details are preserved within the focus of the chosen spectral range. The result, called the wavelength map, can be understood by users with a limited image processing background. However, although the colors may suggest the presence of certain minerals, the resulting wavelength map is not a mineral map. The wavelength map can be used as a pre-classification step in order to understand the spectral variability in the scene. The knowledge obtained from the wavelength map is useful for the additional steps of producing a mineral map.
The wavelength mapper relies on only two input parameters: the wavelength range to focus on, and the stretch to use for the depth of the feature. The wavelength mapper was first published in van Ruitenbeek et al. [25], and a tutorial on the method was published in [24].
2.4.5. Decision Trees
A decision tree is a type of classifier that consists of a series of binary decisions used to determine the class for each input pixel. The decisions are based on the characteristics of the datasets. Every decision partitions data into either one of two potential classes or groups of classes. Decisions are expressions applied to a single image or a set of images. Each of these images represents a property of the input pixels.
One such property, for instance, could be the wavelength position of the deepest absorption feature of the spectrum of the input pixel. An expression could be built to check whether the input pixel might contain a clay mineral, whose deepest absorption feature lies at around 2.2 µm. Subsequent decisions could check the water features of this pixel to see whether the material has a high crystallinity index. After a number of decisions, a class will be assigned to the pixel.
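A binary decision tree of this kind can be sketched in a few lines of Python. The sketch below is not HypPy's decision tree format or API: the `Node` and `Leaf` classes, the property names `wavelength` and `crystallinity`, the class labels, and the thresholds are hypothetical and chosen only to mirror the clay example.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    label: str  # class assigned to pixels that end up here

@dataclass
class Node:
    test: Callable[[dict], bool]  # binary decision on the pixel properties
    yes: Union["Node", Leaf]      # branch taken when the test is true
    no: Union["Node", Leaf]       # branch taken when the test is false

def classify(tree, pixel):
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(pixel) else tree.no
    return tree.label

# Hypothetical tree: first check the feature position, then crystallinity.
tree = Node(
    test=lambda p: 2.19 <= p["wavelength"] <= 2.21,
    yes=Node(
        test=lambda p: p["crystallinity"] > 1.0,
        yes=Leaf("clay, high crystallinity"),
        no=Leaf("clay, low crystallinity"),
    ),
    no=Leaf("other"),
)
```

Each pixel is passed as a dictionary of property values, for example `classify(tree, {"wavelength": 2.2, "crystallinity": 1.5})`. In practice, each property would be a whole image and the decisions would be evaluated on arrays.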
Decision trees are useful for building classifiers that are based on physical properties rather than statistical inference. Of course, this can only be carried out if all the properties needed for the decision trees can be established with enough confidence.
A package like ENVI supports the construction and visualization of new decision trees. However, ENVI has a number of disadvantages. The way decision trees are implemented in ENVI is rather rigid because once a node in the tree is constructed, its place in the decision tree is fixed. In a finalized ENVI decision tree, it is impossible to move elements around. In ENVI, if you want to change the decision tree, you have to rebuild it from scratch.
HypPy can be used to execute ENVI decision trees. The only condition is that expressions used for decisions can be translated into expressions in Python.
In addition to the ENVI decision tree format, HypPy has its own decision tree format. This format is a text-based format that can be edited using a text editor.
For the visualization of decision trees, HypPy uses the Graphviz package [26]. To this end, the ENVI or HypPy decision tree must be converted to the ‘dot’ format first. The dot language is an extensive language for describing graphs. The binary decision trees used by ENVI and HypPy represent a tiny subset of all possible graphs and types of graphs supported by the Graphviz package.
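To give an impression of what such a conversion involves, the sketch below writes a binary decision tree as dot text. It is not HypPy's converter: the function name `tree_to_dot` and the tree representation, nested tuples of the form `(expression, yes_branch, no_branch)` with plain strings as class labels, are assumptions made for this example. Labels containing double quotes would need escaping, which is omitted here.

```python
def tree_to_dot(tree, name="decision_tree"):
    """Convert a nested-tuple binary decision tree into dot text."""
    lines = [f"digraph {name} {{"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        if isinstance(node, str):
            # Leaf: a class label, drawn as a box.
            lines.append(f'  {node_id} [shape=box, label="{node}"];')
        else:
            # Decision: an expression with a "yes" and a "no" branch.
            expression, yes_branch, no_branch = node
            lines.append(f'  {node_id} [shape=ellipse, label="{expression}"];')
            yes_id = visit(yes_branch)
            no_id = visit(no_branch)
            lines.append(f'  {node_id} -> {yes_id} [label="yes"];')
            lines.append(f'  {node_id} -> {no_id} [label="no"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tpng tree.dot -o tree.png`.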
2.4.6. Zonal Statistics
The zonal statistics tool calculates statistics on pixels of an input image within the zones defined by the zone image. The input image is a regular hyperspectral or multispectral image. The zone image can be a classification result or a mask image. For each distinct value in the zone image, an aggregation function is run on the input image for all the locations having this distinct value in the zone image. The currently implemented aggregation functions are as follows: mean, median, minimum, maximum, standard deviation, and mean plus or minus two times the standard deviation.
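The procedure can be sketched with NumPy as follows. This is an illustration and not HypPy's API: the function name `zonal_statistics`, the aggregation keys, and the array layout (rows, columns, bands) are assumptions, and the zone image is assumed to hold integer values.

```python
import numpy as np

# Aggregations applied per band over all pixels (rows of `s`) of a zone.
AGGREGATIONS = {
    "mean": lambda s: s.mean(axis=0),
    "median": lambda s: np.median(s, axis=0),
    "minimum": lambda s: s.min(axis=0),
    "maximum": lambda s: s.max(axis=0),
    "std": lambda s: s.std(axis=0),
    "mean_plus_2std": lambda s: s.mean(axis=0) + 2 * s.std(axis=0),
    "mean_minus_2std": lambda s: s.mean(axis=0) - 2 * s.std(axis=0),
}

def zonal_statistics(image, zones, aggregation="mean"):
    """Return {zone value: aggregated spectrum} for every zone.

    image: array of shape (rows, cols, bands).
    zones: integer array of shape (rows, cols).
    """
    func = AGGREGATIONS[aggregation]
    result = {}
    for zone in np.unique(zones):
        spectra = image[zones == zone]  # shape (n_pixels, bands)
        result[int(zone)] = func(spectra)
    return result
```

Calling `zonal_statistics(image, zones, "mean")` yields one mean spectrum per distinct zone value.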
The results of the zonal statistics are useful for determining the properties of areas in the input image. For instance, in the case of noise in the input image, the mean spectrum of an area may reveal more spectral detail than the spectrum of a single pixel, in which spectral details of features may be drowned out by noise.