The EnMAP-Box — A Toolbox and Application Programming Interface for EnMAP Data Processing

The EnMAP-Box is a toolbox that is developed for the processing and analysis of data acquired by the German spaceborne imaging spectrometer EnMAP (Environmental Mapping and Analysis Program). It is developed with two aims in mind in order to guarantee full usage of future EnMAP data, i.e., (1) extending the EnMAP user community and (2) providing access to recent approaches for imaging spectroscopy data processing. The software is freely available and offers a range of tools and applications for the processing of spectral imagery, including classical processing tools for imaging spectroscopy data as well as powerful machine learning approaches or interfaces for the integration of methods available in scripting languages. A special developer version includes the full open source code, an application programming interface and an application wizard for easy integration and documentation of new developments. This paper gives an overview of the EnMAP-Box for users and developers, explains typical workflows along an application example and exemplifies the concept for making it a frequently used and constantly extended platform for imaging spectroscopy applications.

Open access. Remote Sens. 2015, 7.

With the launch of the Environmental Mapping and Analysis Program (EnMAP), a German spaceborne imaging spectroscopy mission envisioned for 2018, a new wealth of full range spaceborne imaging spectroscopy data from all types of environments is expected to be made available [35]. As part of the mission preparation, several research projects are funded that aim at the development of new algorithms to explore the full potential of EnMAP data in different application fields. In addition, a series of EnMAP summer schools is organized for doctoral researchers from these projects and from other research groups, where the young scientists are trained to implement their ideas for shared use. The EnMAP-Box, a toolbox for processing imaging spectrometer data, has been developed since 2010 to make the algorithmic developments of the mission's preparation phase generally available. This toolbox will be freely distributed with EnMAP data and it is developed along three objectives: (i) complying with standards of common software for imaging spectroscopy data analysis, while extending functionality (e.g., for regression analysis); (ii) including very powerful approaches from the field of machine learning for state-of-the-art image processing; and (iii) offering interfaces to increasingly used scripting languages and thereby to the most rapidly evolving resources. The background and motivation for these objectives are briefly outlined in the following paragraphs.
Earth observation (EO) with imaging spectrometers started in 1982 with the first flights of the Airborne Imaging Spectrometer (AIS) followed by the Advanced Visible/Infrared Imaging Spectrometer (AVIRIS) in 1987 [36,37]. The high information content of the contiguous narrow spectral bands, which cover the visible (VIS), near infrared (NIR), and short-wave infrared (SWIR), provided valuable insights on surface characteristics and opened up new pathways of image analysis. The unprecedented spectral resolution of imaging spectrometers enabled analyzing a spectrum's shape, e.g., its slope or the depth and width of absorption features. In this context, the field of chemometrics offered a suite of techniques [38] that had already been applied to laboratory spectra from minerals and rocks (e.g., [39,40]). A first collection of applications for image data was available in the Spectral Image Processing System (SIPS), which was predominantly programmed in the Interactive Data Language (IDL) and could be used on various operating systems [41]. SIPS included routines for data input, visualization and analyses (e.g., spectral angle mapper, linear spectral unmixing [42,43]) and was the precursor for ENVI (Environment for Visualizing Images) [44]. ENVI addressed a broad range of users, allowing inexperienced users to easily adapt to an imaging spectroscopy workflow. At the same time, experienced users could include their own algorithms through the use of open and generic image file formats and the interface to IDL and its array-based data handling. The rapid methodological developments throughout the 1990s made ENVI the preferred choice for image processing (IP) from airborne instruments such as AVIRIS.
Looking at spaceborne EO, the first three decades of application development were focused on multispectral data with four to six spectral bands. Many of the methods used to analyze such EO data, e.g., broad-band vegetation indices or Gaussian maximum likelihood classification, are not necessarily suited for analyzing imaging spectroscopy data. They either do not exploit the wealth of spectral information [34] or require additional IP steps such as feature extraction or selection [45]. The introduction of machine learning algorithms in EO data analysis, e.g., kernel-based methods [26] or self-learning decision trees [46], mitigated such drawbacks and the rapid increase in hardware processing power made them applicable [47]. Machine learning approaches are mostly used for statistical learning, e.g., in the context of data classification or regression, or data mining, but also for the inversion of radiative transfer models [48]. Due to high processing efforts, the most powerful implementations of machine learning algorithms are usually written in C/C++ and Java (see http://jmlr.org/mloss/). A well-known example is the LIBSVM library for support vector machines (SVM) [49], which is used in many applications and regularly improved, e.g., by new optimization approaches. Nevertheless, such libraries are usually domain-specific and are not provided with disciplinary interfaces, e.g., to read and write standard EO data formats or to automate parameter tuning. At the same time, they are often not included in commercial software packages or only with limited parameter options. The use of, e.g., LIBSVM in typical environments for EO data processing thus requires programming skills beyond those of most EO data users. Other examples in this context are Breiman's Random Forests [46], Markov Random Fields [50], or Gaussian Processes [51].
In parallel to the development of machine learning algorithms, there is a growing community that uses scripting languages for a wide range of applications. Over the past 10-15 years, languages such as R [52] or Python [53] opened up new and very effective pathways for code generation. The advantage of scripting languages that make it "possible to specify programming tasks in a few lines of code that would otherwise take hundreds of lines in a lower-level language such as C or C++" was already pointed out by Aho in a Science Viewpoint in 2004 [54]. In particular, Python's increasing popularity caused by "simple syntax, abundant online resources and a rich ecosystem of scientifically focused toolkits with a heavy emphasis on community" has recently also been described by Perkel [55] in a Nature Editorial. The success and power of scripting languages is to a great extent based on the constant improvement of shared resources within an active community. Flexible interfaces are needed in order to make the strength of such shared development available in software products.
Against this background, the concept for the EnMAP-Box is developed with a focus on exploring the full potential of future EnMAP data from various environments. Overarching aims are (1) extending the EnMAP user community beyond the existing user community of airborne imaging spectroscopy and (2) providing free-of-charge and user-friendly access to recent approaches for imaging spectroscopy data processing for experienced users. The EnMAP-Box works stand-alone but may be integrated into the ENVI Classic menu to extend the range of available applications. It can be used with any multispectral imagery. It combines functionality for spectrally high dimensional data with the latest machine learning approaches and interfaces to R or Python. A standard procedure for effectively integrating machine learning approaches was created. This way, a suite of methods and interfaces is created that includes advantages from other commercial, e.g., ENVI, or non-commercial products, e.g., the Orfeo Toolbox [56], and combines them into a set of applications with functionality for imaging spectroscopy data analysis that does not exist elsewhere. The toolbox is delivered together with an application programming interface (API) for a standardized integration of new developments independent from the respective programming language.
This paper provides an overview of the EnMAP-Box (Section 2.1) and an application example (Section 2.2) followed by a description of the API (Section 3). The paper concludes with a general evaluation of the concept with regard to the aims and objectives and an outlook on future developments (Section 4).

Overview
The EnMAP-Box (currently version 2.1) is mainly developed in IDL for Windows, Mac, and Linux operating systems and distributed with an open source license via the mission's website www.enmap.org. It requires the free-of-charge IDL virtual machine or an IDL/ENVI license. It provides a graphical user interface (GUI) that is designed in a single window approach (Figure 1) and contains a menu to start all tools and applications, a file list for managing all open image or spectral library data as well as an arbitrary number of frames displaying spectral images or libraries, which can easily be arranged within the window. Single frames, which include image data or library spectra, may be detached and moved to a second screen. Drag-and-drop functionality exists between the file list and frames to easily open images/bands or libraries/spectra. Navigation and visualization within single frames are handled through common mouse gestures (e.g., direct pixel selection, panning and zooming with left mouse button and mouse-wheel, respectively) or the mouse context menu (e.g., displayed bands selection, data stretching, etc.). Users may also open a console at the bottom of the window, which gives an overview of processing tasks or tools for interactive manipulation, such as pixel/spectra labelling. A detailed description of the GUI functionality is given in manuals at www.enmap.org.
The EnMAP-Box uses generic file formats for storing image data and spectral libraries, e.g., binary spectral data in band sequential order with an associated header file that includes all metadata information. The design of the header files is compatible with the frequently used ENVI file format, and inputs and outputs are therefore compatible with other software products. The EnMAP-Box header partly extends the current standards, e.g., to integrate labels for continuous values when performing quantitative mapping. In addition, spectral libraries are treated as pseudo-images (i.e., single column images) to allow their direct usage in IP methods, e.g., for the training of a regression model. This image-like handling of spectral libraries also enables attributing spectra with values from label images, which offers possibilities for regression analysis that are not available, e.g., in ENVI.
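The file layout described above can be illustrated with a short sketch. It is written in Python with NumPy rather than IDL and is not part of the EnMAP-Box; the function names are hypothetical. It assumes a plain ENVI-style header with simple "key = value" entries (multi-line lists in curly braces are not handled) and reads a band sequential file into an array of shape (bands, lines, samples).

```python
import os
import tempfile

import numpy as np

# ENVI "data type" codes mapped to NumPy dtypes (subset only).
ENVI_DTYPES = {1: np.uint8, 2: np.int16, 3: np.int32,
               4: np.float32, 5: np.float64, 12: np.uint16}


def parse_envi_header(text):
    """Parse simple 'key = value' lines of an ENVI-style header."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


def read_bsq(data_path, header_path):
    """Return the image as an array of shape (bands, lines, samples)."""
    with open(header_path) as f:
        hdr = parse_envi_header(f.read())
    if hdr.get("interleave", "bsq").lower() != "bsq":
        raise ValueError("this sketch only handles band sequential data")
    shape = (int(hdr["bands"]), int(hdr["lines"]), int(hdr["samples"]))
    dtype = np.dtype(ENVI_DTYPES[int(hdr["data type"])])
    # ENVI byte order: 0 = little endian, 1 = big endian.
    dtype = dtype.newbyteorder("<" if hdr.get("byte order", "0") == "0" else ">")
    offset = int(hdr.get("header offset", "0"))
    return np.fromfile(data_path, dtype=dtype, offset=offset).reshape(shape)


def demo_roundtrip():
    """Write a tiny 2-band image plus header to a temp folder and read it back."""
    cube = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "demo.bsq")
        header_path = os.path.join(tmp, "demo.hdr")
        cube.astype("<f4").tofile(data_path)
        with open(header_path, "w") as f:
            f.write("ENVI\nsamples = 4\nlines = 3\nbands = 2\n"
                    "data type = 4\ninterleave = bsq\nbyte order = 0\n")
        return cube, read_bsq(data_path, header_path)
```

In this layout a spectral library handled as a pseudo-image would simply be a file with samples = 1, so that each line holds one spectrum and the same reading routine applies.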
Available basic utilities are designed for spectrally high dimensional data. These include tools for the generation of image statistics, scatter plots, random samples, spatial/spectral subsets or reclassified images, and for data scaling, data transformation (with linear and kernel-based principal component analysis) or image stacking, and for applying masks, e.g., to select pixels to generate a spectral library. For interactive plotting of spectra an R-interface is created, which gives users a variety of graphical options and allows saving the resulting figures directly to the clipboard or as a file. This exemplifies the potential of the EnMAP-Box IDL interface to other scripting languages (see Section 3 for details). The core part of the EnMAP-Box is a growing set of applications. This includes both universal applications that are generally very strong for the analysis of EnMAP-like data (e.g., support vector approaches) and application- or data-specific approaches (e.g., EnWaterMap for the automatic generation of water and shadow masks [57]). With the aim of providing most recent developments to the users, several steps were taken to make the standardized implementation of new applications as easy as possible (see Section 3.1). As a result, several well-known data processing approaches are already available in the EnMAP-Box (Table 1 [57][58][59][60][61][62][63]), which were developed and documented by a variety of laboratories involved in the EnMAP mission preparation. The standardized development scheme includes the option of separate license agreements for each included application. Authors of such applications retain all property rights for their work. Existing applications like imageSVM for SVM classification and regression and imageRF for Random Forest classification and regression [58] are already widely used for the analysis of EO data. In addition, a spatial-spectral calculator for pixel/band arithmetic is included (imageMath), which allows, e.g., the use of spectral filters to smooth or generate derivatives for full images in the spectral domain. The list of available applications shows a relatively high number of different regression approaches. Such quantitative approaches are especially effective on imaging spectroscopy data since absorption depths or other features of the contiguous spectral bands can be linked to environmental indicators with linear and non-linear empirical models. Still, most existing IP software only offers concepts for qualitative analysis, e.g., layers for discrete labels. The EnMAP-Box provides both qualitative and quantitative accuracy assessments for different mapping approaches. For qualitative evaluation, the EnMAP-Box integrates approaches for area-based normalization according to [64]. The quantitative accuracy assessment calculates metrics like root mean squared error or Pearson correlation and visualizes the error distribution. To the authors' knowledge, such comprehensive accuracy assessments for qualitative and quantitative results are not available in commercial IP software. In order to be fully independent from operating systems and to facilitate an easy use of textual, tabular and graphical outputs, all reports are printed to HTML documents. Section 2.2 shows an application example with an imageSVM regression with quantitative accuracy assessment.
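The quantitative accuracy measures named above can be sketched in a few lines. The following Python/NumPy example is an illustration only and does not reproduce the EnMAP-Box implementation; the function name, the dictionary keys and the number of histogram bins are assumptions. It computes the root mean squared error, the Pearson correlation and a histogram of the errors for all pixels selected by an optional mask.

```python
import numpy as np


def regression_accuracy(estimate, reference, mask=None, bins=10):
    """Compare an estimated image with a reference image.

    Returns RMSE, Pearson correlation and a histogram of the errors
    (estimate minus reference) for all pixels where the mask is true.
    """
    estimate = np.asarray(estimate, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    if mask is None:
        valid = np.ones(estimate.shape, dtype=bool)
    else:
        valid = np.asarray(mask, dtype=bool).ravel()
    est, ref = estimate[valid], reference[valid]
    errors = est - ref
    counts, edges = np.histogram(errors, bins=bins)
    return {
        "n": int(valid.sum()),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "pearson_r": float(np.corrcoef(est, ref)[0, 1]),
        "error_histogram": (counts, edges),
    }
```

For example, `regression_accuracy([1, 2, 3, 4], [1, 2, 3, 6])` evaluates four pixels with a single error of -2, which gives an RMSE of 1.0.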
With regard to the large share of ENVI users, the EnMAP-Box is programmed in a way that it can be integrated into the regular ENVI Classic menu (Figure 2). ENVI's available bands list then functions as the file list and users may access the EnMAP-Box applications without leaving their regular IP environment.

imageSVM: An Application Example for Quantitative Mapping in the EnMAP-Box
One of the first advanced applications in the EnMAP-Box was imageSVM for classification and regression. This machine learning application has been used for a variety of studies including both classification [14,65] and regression approaches [13,66]. Its implementation structure is outlined in Section 3.3. In the following, imageSVM is used to map vegetation fractions from simulated EnMAP data. Along this application example, the EnMAP-Box concept for integrating machine learning approaches as well as the functionality for quantitative mapping and accuracy assessment are explained and, this way, some key advantages of the EnMAP-Box compared to other available software products are illustrated.
The first step for mapping vegetation fractions with support vector regression (SVR) is the parameterization of the model. This includes the selection of model parameters and fitting the hyperplane during an optimization. This process is semi-automated in imageSVM by offering a grid search with default ranges that proved useful during many tests on remote sensing data. For each parameter combination, i.e., the kernel parameter γ, the penalization parameter C, and Vapnik's insensitive loss function parameter ε, model accuracy is tested using cross validation. After starting the imageSVM regression model parameterization from the main menu, the user is asked to specify a spectral image and reference information. The EnMAP-Box interprets spectral libraries as images and users may therefore train from library data, and reference information for classification or regression may be entered analogous to image data. The reference information is expected as a single-band image with floating point pixel values and the definition of a no data value. This file type was created for the EnMAP-Box and constitutes a regression equivalent to training data for classification. Using the advanced options, users may select all relevant options individually (Figure 3).
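The principle of a grid search with cross validation can be sketched as follows. This Python/NumPy example is not the imageSVM code and does not call LIBSVM: to stay self-contained it swaps in an RBF kernel ridge regression as the learner, so only γ and a penalization parameter C (used here as the inverse ridge term) are searched and ε does not appear. All function names, the number of folds and the parameter grids are assumptions for illustration.

```python
import itertools

import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_predict(x_train, y_train, x_test, gamma, c):
    """Stand-in learner: RBF kernel ridge regression with ridge term 1/c."""
    k = rbf_kernel(x_train, x_train, gamma)
    alpha = np.linalg.solve(k + np.eye(len(x_train)) / c, y_train)
    return rbf_kernel(x_test, x_train, gamma) @ alpha


def cross_validated_rmse(x, y, gamma, c, n_folds=3, seed=0):
    """RMSE over all samples, each predicted while held out of training."""
    order = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(order, n_folds)
    squared_errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(x[train_idx], y[train_idx], x[test_idx], gamma, c)
        squared_errors.append((pred - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(squared_errors))))


def grid_search(x, y, gammas, cs, n_folds=3):
    """Test every (gamma, C) combination and return the best one."""
    results = [
        {"gamma": g, "C": c, "rmse": cross_validated_rmse(x, y, g, c, n_folds)}
        for g, c in itertools.product(gammas, cs)
    ]
    best = min(results, key=lambda r: r["rmse"])
    return best, results
```

The list of all tested combinations is returned alongside the best one because a report on the full grid, like the one in Figure 4, needs every result and not only the winner.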
During all machine learning applications in the EnMAP-Box, the model parameterization or training is separated from its application to an image or spectral library. Therefore, the SVR parameterization ends with saving the best model and listing it in the file list. An HTML report on the model performance during cross validation is created for the user to evaluate the model (Figure 4). In addition, the separation allows users to apply the model to a variety of images, independent from the source of training data. For the model application, an SVR model and an image need to be specified together with an optional mask and a path and name for the output file (Figure 5). After successful model application, the resulting estimate is output and listed in the file list for further processing. In order to evaluate the accuracy of the result, the EnMAP-Box offers a comprehensive set of measures and visualizations specifically designed for the regression modelling of spectral image data. When choosing the accuracy assessment for regression, the user is asked to specify the regression estimate image and a reference image plus an optional mask. Afterwards, an HTML report appears showing information on the data sets, a variety of statistical measures and visualizations (Figure 6).
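The separation of training and model application can be illustrated with a minimal sketch of the application step. It is written in Python/NumPy and is not taken from the EnMAP-Box; the function name, the argument names and the no data value of -9999 are assumptions. Any trained model is represented by a function that maps an array of spectra (pixels by bands) to one estimate per pixel.

```python
import numpy as np


def apply_model(predict, cube, mask=None, nodata=-9999.0):
    """Apply a trained regression model to an image.

    predict : function mapping an array (pixels, bands) to estimates (pixels,)
    cube    : image array of shape (bands, lines, samples)
    mask    : optional array (lines, samples); zero pixels are skipped
    """
    bands, lines, samples = cube.shape
    pixels = cube.reshape(bands, -1).T  # one row per pixel
    if mask is None:
        valid = np.ones(lines * samples, dtype=bool)
    else:
        valid = np.asarray(mask, dtype=bool).ravel()
    # Pixels outside the mask keep the no data value.
    estimate = np.full(lines * samples, nodata, dtype=np.float32)
    if valid.any():
        estimate[valid] = predict(pixels[valid])
    return estimate.reshape(lines, samples)
```

Because the function only depends on the prediction rule and the number of bands, the same model can be applied to any number of images or, with samples = 1, to spectral libraries.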

Overview
Several measures are foreseen to make the EnMAP-Box an evolving toolbox where new applications can easily be integrated, regardless of the respective programming language. This way, the provision of latest developments to the user community shall be ensured. From a user perspective, such developments must ensure certain standards, e.g., data formats, but similarly the "look-and-feel" to enable the intuitive combination of several IP steps. Looking at the developer community needs, this may be achieved by offering wizards and pre-programmed code to quickly embed core parts of an application, which themselves do not have to be translated to IDL. Developers can therefore download an EnMAP-Box version with the full source code, the EnMAP-Box API, wizards to create base structures and code documentation, together with manuals for developers.
The concept for integrating external applications into the EnMAP-Box starts with a standardized strategy for all supervised approaches, which separates the model parameterization (Figure 3) and application of respective models to image data. In doing so, supervised approaches and most other procedures can be structured in five distinct steps: (1) the collection of all parameters and files; (2) reading spectral data from files; (3) analysis of spectral data using specified parameters; (4) writing results to files; (5) reporting and visualizing results (Figure 7). Interaction by the user with the GUI and application specific dialogues is limited to steps 1 and 5 and can be considered an outer shell for the full program integration. The API offers auto-managed widgets that allow selection of various types of parameters and communicate with the file list for quick file selection. Steps 2 and 4 include data input and output (data IO), for which API routines are provided, including tile-based reading and writing, which is required for the large data volume of imaging spectroscopy data. The actual data processing is performed in step 3. By introducing this standardized structure, an optimized use of the different API components (i.e., auto-managed widgets, (tile-based) data IO, reporting) is enabled. At the same time, the core part of external applications is decoupled from all user interaction and concentrated in step 3, which then requires no standardization and may often best remain in an external programming language. In the case of imageSVM, for example, the LIBSVM library is integrated via the IDL-Java Bridge interface to perform only the parameter optimization (see Section 3.2).
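The five steps can be sketched as a small processing skeleton with tile-based data handling. The example is written in Python/NumPy and does not use the EnMAP-Box API; all names, including the parameter key "tile_lines", are hypothetical. Reading and writing are reduced to array slicing here, whereas a real application would read each tile from file and write each result tile to file.

```python
import numpy as np


def iter_tiles(n_lines, tile_lines):
    """Yield (start, stop) line ranges that cover the image tile by tile."""
    for start in range(0, n_lines, tile_lines):
        yield start, min(start + tile_lines, n_lines)


def run_application(params, source, analyse):
    """Run an analysis over an image of shape (bands, lines, samples).

    Step 1 (collecting parameters) is assumed to have happened already
    and is represented by the params dictionary.
    """
    bands, lines, samples = source.shape
    result = np.empty((lines, samples), dtype=np.float32)
    n_tiles = 0
    for start, stop in iter_tiles(lines, params["tile_lines"]):
        tile = source[:, start:stop, :]        # step 2: read one tile
        output = analyse(tile, params)         # step 3: application core
        result[start:stop, :] = output         # step 4: write one tile
        n_tiles += 1
    # Step 5: collect the information that a report would present.
    report = {"tiles": n_tiles, "lines": lines, "samples": samples,
              "bands": bands}
    return result, report
```

Only the function passed as `analyse` is specific to an application, which mirrors the idea that the core of an application is concentrated in step 3 while the surrounding steps are shared.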

Figure 7. General framework for the standardized integration of embedded applications.
In the case of scripting languages, the bridging between IDL and languages like R or Python is more complicated as large binary spectral data cannot easily be passed on. In this case, the data IO is better performed in the scripting language (Figure 8 (top)). In addition, user defined parameters and filenames are easily transferred via JSON strings (Figure 8 (bottom)).
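A minimal sketch of the receiving side of such a JSON transfer is given below in Python. The key names ("input_file", "output_file", "title") are hypothetical and do not reflect the actual parameter names used by the EnMAP-Box; the point is only that filenames and settings travel as one string while the script itself opens the files.

```python
import json


def build_parameters(**kwargs):
    """Serialize user defined parameters and filenames to one JSON string."""
    return json.dumps(kwargs, sort_keys=True)


def parse_parameters(json_string):
    """Decode the JSON string and check that the required entries exist."""
    params = json.loads(json_string)
    missing = [key for key in ("input_file", "output_file")
               if key not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    params.setdefault("title", "")  # optional entry with a default
    return params
```

A call such as `parse_parameters(build_parameters(input_file="image.bsq", output_file="plot.png"))` returns a dictionary with both filenames and an empty title.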
Another component for the standardized implementation and consistent coding is the application development wizard. This IDL tool may be used to create a set of IDL routines which already follow the general framework for application development in Figure 7. This includes the creation of routine skeletons for input dialogues, image/data processing and HTML report creation, but also invokes source code documentation via IDLdoc to generate Javadoc-style HTML documents from comments in IDL [67]. Both measures are seen as incentives for all programmers to follow certain standards and this way enable joint work on improving and extending existing methods.

imageSVM: Implementing Java Code with the EnMAP-Box API
With regard to processing performance, scripting languages (e.g., R, Python) or interpreter languages (e.g., IDL, MATLAB) are often outperformed by C, C++ or Java. The integration of algorithms in these languages appears especially useful when optimization procedures and repetitive processes are needed. In the case of SVM, the use of LIBSVM [49] also appears useful with regard to constant improvements of the base code. In the imageSVM application, LIBSVM is used as a Java archive (i.e., JAR-file) via the IDL-Java Bridge for the optimization and evaluation of individual models with given training/test data and pre-defined parameters as part of an IDL-based grid search (Figure 9). imageSVM for regression makes full use of the concept for embedding external Java. Following the parameter selection with auto-managed widgets from the API (Figure 6), the spectral data and labels that are needed for the model parameterization are read. The application specific core part consists of an IDL-based data preparation for the cross validation. IDL code is used to pass the spectral data and labels for training and testing via the IDL-Java Bridge to LIBSVM separately for each set of parameters resulting from the previously defined ranges. LIBSVM returns the models' cross validation accuracy, and within IDL the best parameter combination is selected for the final training in LIBSVM. This model is saved and all results from the grid search are reported. The data output step is not required here because the application of the model to an image is performed separately.

Figure 9. Framework for embedding applications for the imageSVM regression example using Java-based LIBSVM and the IDL-Java Bridge.

Integration of Existing Libraries Using the Command Line Interface
Projects like the Geospatial Data Abstraction Library (GDAL, www.gdal.org) and the Orfeo ToolBox [56] have a long development history and provide efficient implementations of algorithms relevant to remote sensing. In addition to their core APIs, which often have a C/C++ interface and require a deeper understanding of implementation details, they offer stand-alone applications that can be run from the command line interface (CLI). The EnMAP-Box source code provides wrapper routines that allow developers to call these applications from inside IDL and to integrate them into their own workflows.
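The idea of such a wrapper can be sketched in Python with the standard library; the EnMAP-Box wrappers themselves are IDL routines and are not reproduced here. The function names are hypothetical. The example builds a call to the GDAL utility gdal_translate with its -of option for the output format and runs an arbitrary command while capturing its text output; it assumes that the called program is installed and on the search path.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command line application and return its standard output.

    command is a list such as ["gdalinfo", "image.bsq"].
    """
    completed = subprocess.run(command, capture_output=True, text=True,
                               check=False)
    if completed.returncode != 0:
        raise RuntimeError("command failed: " + completed.stderr.strip())
    return completed.stdout


def gdal_translate_command(source, target, output_format="ENVI"):
    """Build the argument list for a format conversion with gdal_translate."""
    return ["gdal_translate", "-of", output_format, source, target]


if __name__ == "__main__":
    # Demonstration with the Python interpreter itself, which is always
    # available, instead of an external tool that may not be installed.
    print(run_cli([sys.executable, "-c", "print('ok')"]).strip())
```

Passing the command as a list rather than as a single string avoids problems with spaces in filenames, which are common in user-selected paths.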

Conclusions and Outlook
The EnMAP-Box integrates powerful machine learning algorithms and bridges to increasingly popular scripting languages with well-established concepts of imaging spectroscopy software. Much of the available functionality of the EnMAP-Box is not available in ENVI, the industry standard in this field. Users may use support vector machines or random forests for regression and classification with spectral libraries or spectral images, and they may interface to Python toolkits or R script libraries. Various components of the EnMAP-Box were improved by sophisticated extensions, e.g., comprehensive HTML reports, the spatial/spectral calculator imageMath, etc. Developers, on the other hand, are provided with a concept and pre-programmed code to easily disseminate their implementation to a wider community.
Aiming at the extension of the user community and providing the most powerful algorithms for the analysis of EnMAP data, the needs of regular users as well as potential developers of new methods were integrated in the conceptual development over the past years. The EnMAP-Box development was thus driven by the idea of
• user-friendliness: achieved, e.g., by an intuitive GUI focusing on the handling and visualization of data with high spectral dimensions, widget controlled machine learning algorithms, common file formats, selected basic tools and easy-to-use advanced methods, the possible integration into the ENVI menu;
• comprehensiveness: the set of available tools and applications as well as interfaces to scripting languages make the constant change between different software obsolete;
• standardization: the implementation and use of applications is standardized to assist external developers and provide the users a common look-and-feel, which also constitutes a key component for user-friendliness;
• addressing external developers: by making well-documented source code available, offering an API and creation wizard.
Ever since the prototype of the EnMAP-Box in 2009, its applications have been used in the field of imaging spectroscopy and beyond. Since about 2012, more and more external applications have been added, all following the idea of a standardized implementation. It has been used for data processing within EnMAP summer schools since 2010 and has always received positive responses. Several universities use the EnMAP-Box as a freely available tool in basic teaching. Feedback from users was constantly used to extend and improve its functionality.
Prior to the start of EnMAP, the toolbox will be extended by a set of sensor product specific tools. This will include, e.g., a tool for atmospheric correction to transfer data from level 1B to 2A with user defined settings, or for working with rational polynomial coefficients to reconstruct detector pixels from level 1B data. In addition, import filters for currently defined EnMAP data products as well as complementary sensors, e.g., Sentinel-2/3, Landsat OLI, will be included. Moreover, a set of disciplinary applications is currently being developed at laboratories involved in the mission preparation [35]. This includes an advanced suite for geological mapping based on the tetracorder approach [68] as well as a soil mapping suite.
Several recent software developments in remote sensing were linked to new instruments. The BEAM application suite (Basic Envisat and ERS (A)ATSR and MERIS Toolbox) and the Orfeo ToolBox (for Pleiades) are successful examples. Both are open-source and programmed in Java and C++, respectively. They are not limited to data from the mentioned sensors. The EnMAP-Box constitutes a similar approach that fills a gap with regard to imaging spectroscopy data. The chance for the EnMAP-Box to become an evolving toolbox with a constantly growing set of applications is high, given the toolbox's availability with future EnMAP data, its sensor specific algorithms together with its flexibility of integrating new developments in a variety of programming languages. In his 2009 overview paper on imaging spectroscopy, Goetz [1] attributes the demise of the High Resolution Imaging Spectrometer (HIRIS) to the small number of scientists working with imaging spectroscopy data, and, in this context, he mentions a lack of readily available software and algorithms. Therefore, the early availability of simulated EnMAP data [69] in combination with the freely available EnMAP-Box can be seen as major prerequisites for the success of the EnMAP mission and a frequent use of its data in many environmental fields.

Figure 1. Graphical user interface of the EnMAP-Box version 2.1.1 with main menu, file list and selected interactively linked data frames. Images show simulated EnMAP data from Berlin, Germany, together with modelled vegetation cover fraction (see Section 2.2).

Figure 3. Dialogue for the advanced SVM regression settings in imageSVM. The dialogue for the regular settings is limited to the Input and Output fields.

Figure 4. imageSVM regression: after the parameter search is completed, the results from the grid search with internal cross-validation are displayed in an HTML report.

Figure 5. SVM models are separately stored and, this way, may be flexibly applied to series of images or spectral libraries.

Figure 6. For comprehensive quantitative accuracy assessment the EnMAP-Box generates various statistical performance measures and visualizes histograms in an HTML report (figure shows excerpt only).

Figure 8. Adapted framework for application integration using scripting languages (top) and example for scripting options (here the case of the plot interface) (bottom).

Table 1. List of available applications in EnMAP-Box version 2.1.