Large-domain modeling is an important advancement in the environmental sciences. Models covering broad spatial areas are expected to improve our ability to study, monitor and manage natural resources at regional, continental and even global scales [1]. Modeling at this scale is increasingly feasible thanks to the growing computational power of desktop and cloud computing platforms, as well as the availability of large-scale and spatially continuous meteorological and geospatial datasets [3]. Large-domain models facilitate research and management not only at broad scales but also at local scales by providing spatially consistent datasets for filling data gaps (e.g., estimating streamflow in ungaged basins) and supporting site-specific assessments and comparisons. However, along with the benefits of large-domain modeling come new challenges for how we best utilize the often large and complex datasets generated by these models.
In the era of Big Data, discovering meaningful patterns in large datasets is a common challenge in many fields [4]. In the environmental sciences, geospatial datasets and model outputs spanning large areas can contain a wealth of information. However, due to their sheer size and complexity, model datasets are often inaccessible to the vast majority of interested stakeholders, resource managers, policy makers and researchers. Consequently, it is common that only those who developed the models, or who have the experience and technical skills necessary to analyze the results, are able to use them to derive new insights and knowledge. Yet other stakeholders and researchers, whose backgrounds, goals and interests likely differ from those of the original model developers, could benefit from using these datasets to form their own hypotheses, discover new patterns and develop a better understanding of the processes and systems in their own area of interest.
One common and effective way of communicating model results is through the use of data visualization [6]. By representing data using one or more visual encodings (e.g., length, position, color, shape), visualizations leverage our visual perception system to help us rapidly and effortlessly identify patterns or anomalies in the data [10]. Static data visualizations have been used for centuries and today can be found in virtually every publication across all scientific fields [6]. Although effective and widely used, static visualizations are limited to the variables and spatial and temporal scales for which they were created [12]. However, advances in computer hardware and software have led to the emergence of interactive data visualizations that provide greater flexibility for directly interacting with and exploring datasets at multiple scales and across any number of variables.
Interactive data visualizations can be effective tools to help us better understand datasets and the phenomena they represent through a process known as visual analytics [13]. Card et al. [10] defined data visualization as “the use of computer-supported, interactive, visual representations of data to amplify cognition”. Liu and Stasko [15] argue that interactive data visualizations are useful because they facilitate the formation of mental models, which can play an important role in management and decision making [16]. Interactive data visualizations can also be useful for helping us iteratively form and test hypotheses [8]. As a result, instead of being the end-product of the research process and intended solely to communicate study results, data visualization has become an integral part of an iterative scientific workflow helping researchers form hypotheses and better understand their own datasets and analyses [22]. In short, interactive data visualizations are tools that help us not only to see the data but also to think about the systems and processes they represent.
Advances in web technologies and standards have led to a proliferation of free and open source software (FOSS) libraries for creating interactive data visualizations on the World Wide Web (the Web) [23]. The Web has long been recognized for its potential to improve environmental management by making data and models more accessible and for fostering cooperation and collaborative decision making between stakeholders [27]. This improved accessibility can also facilitate inter-disciplinary research by making datasets available to researchers from other fields who may not have the experience or skills necessary to access and analyze the data themselves [30]. These tools can help others discover new patterns that may not have been previously known even to the original creators of the dataset [13]. Furthermore, by linking to underlying data sources, Web-based interactive data visualizations can integrate new information and data as they become available [22].
Research into using interactive data visualizations for exploring geospatial datasets can be found dating back to the 1990s [31]. More recently, Steed et al. [33] presented a desktop-based software application for visualizing large-scale environmental datasets and model outputs called the Exploratory Data Analysis Environment (EDEN). Although EDEN provides powerful data visualization and analytical capabilities for exploring environmental models, it was designed primarily for scientists and engineers and may be less accessible to broader audiences including stakeholders and decision makers. Recent examples of web-based applications include the Stream Hydrology and Rainfall Knowledge System (SHARKS) for visualizing hydrologic and meteorological timeseries [34] and an application for viewing interactive animations of hydrologic model simulations [35]. While these examples demonstrate two approaches to making environmental data and models more accessible through user-friendly web-based interfaces, both provide limited interactive data visualization capabilities and were also developed for specific datasets and models covering small spatial scales. New approaches for creating web-based interactive data visualizations that can be generalized across different datasets and over large spatial scales are still needed.
In this paper, we present a web-based interactive data visualization framework for exploring spatial patterns in environmental datasets and model outputs called the Interactive Catchment Explorer (ICE). In developing the ICE framework, our objectives were to (1) create a generalized framework for making environmental and geospatial datasets and models easier to access, explore and understand; (2) demonstrate the application of this framework to different datasets and models through a series of case studies; and (3) determine what benefits and broader impacts, if any, these tools can have on research, management and decision making.
In the next section, we begin by describing the datasets, models and management issues associated with three separate research projects, which served as case studies to demonstrate the application of the ICE framework. We then describe the design and implementation of the core features and functionality that are central to this framework. In the results, we use a simplified user interface created solely for demonstration purposes to describe how an ICE application works and how various tasks can be performed. We then present the web application built for each of the three case studies, highlighting the unique aspects of each application and how it has been used in practice. The discussion focuses on the broader impacts of these applications and highlights the unique aspects of the ICE framework compared to similar efforts found in the literature. We also discuss some of the limitations of both the ICE framework itself as well as our understanding of how ICE applications affect user thinking and decision making, and we provide some suggestions for how these limitations could be addressed in future research. Lastly, the conclusions provide an overall summary of this work and the potential impacts of interactive data visualization tools on environmental research and management.
For large-scale environmental models and datasets to be effective in supporting management and decision making, we need new ways of making this information easier to access, explore and understand. The ICE framework demonstrates one approach to developing web-based interactive data visualizations for exploring the spatial patterns in environmental datasets and model outputs. Although we initially created the ICE framework for a specific dataset, we found that its primary components and architecture could be readily adapted to datasets from other research projects. Using this framework, we developed a series of Web applications designed to help users explore datasets and model results at multiple spatial scales, discover spatial patterns, evaluate relationships between variables and identify and prioritize areas for future management.
Based on our three case studies, we found that tools such as these can have a variety of broader impacts on research, management and decision making. From the northeast application, we found that, when linked to underlying datasets and models, these tools can play a central role in the iterative process of monitoring, modeling and management by motivating users to collect and contribute data to the project, which in turn improves the accuracy and quality of the underlying models. These tools can also provide a common platform for learning and exploring new datasets, which can be used to facilitate discussion and collaboration between stakeholders and decision makers across international jurisdictions, as demonstrated by the CCE application. Lastly, these tools can provide a unified interface for accessing and exploring multiple datasets that may be generated over the course of a single project and which otherwise may require substantial effort to process and analyze, as shown by the LMG application.
The use of interactive data visualizations for exploring geospatial datasets has been an area of research since as early as the 1990s [31]. However, only within the past decade or so has computer hardware coupled with new FOSS libraries and web standards made it possible to create interactive data visualizations of large datasets that can be accessed through a standard Web browser. While many data visualization tools have been created for exploring environmental datasets and models based on desktop software [33] or server-based Web architectures [24], the ICE framework demonstrates a new approach that uses a client-based architecture to create a responsive user experience for directly interacting with this kind of geospatial data.
One of the primary tradeoffs in using a client-based architecture is that the size of the dataset is limited to what can be reasonably downloaded, processed and rendered within the browser without causing excessive latency. Across our three case study applications, datasets ranged in size from approximately 100 kilobytes up to 50 megabytes, which we considered the upper limit for what can be supported by modern computer hardware. Although some datasets contained hundreds of variables, we selected only a subset of those variables in order to keep the dataset for each application to a reasonable size. To address this limitation, a server-side component could be added to the ICE framework that would generate datasets containing any combination of variables as chosen by the user. Although the number of variables would still be limited, this would provide greater flexibility for evaluating different combinations of variables depending on the interest of the user.
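The server-side component suggested above could be sketched as follows. This is a minimal illustration in Python and is not part of the current ICE framework: the function name `subset_variables`, the CSV layout and the column names are hypothetical, and the 50 MB default simply reuses the upper limit reported above.

```python
import csv
import io

# Hypothetical size budget, taken from the ~50 MB upper limit discussed in the text.
MAX_BYTES = 50 * 1024 * 1024


def subset_variables(csv_text, id_column, variables, max_bytes=MAX_BYTES):
    """Return CSV text containing only the feature ID and the requested variables.

    Raises KeyError for unknown columns and ValueError if the resulting
    payload would exceed the size budget for a client-side application.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []

    # Validate the request before doing any work.
    unknown = [name for name in [id_column, *variables] if name not in available]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")

    # Keep the ID column first, followed by the variables in the requested order.
    columns = [id_column] + [v for v in variables if v != id_column]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `columns` are dropped

    payload = out.getvalue()
    if len(payload.encode("utf-8")) > max_bytes:
        raise ValueError("requested subset exceeds the client-side size budget")
    return payload
```

Under these assumptions, a client would request only the variables of interest (for example, `subset_variables(full_csv, "id", ["temp_c"])`) and receive a file small enough to download and filter in the browser, while the full set of variables remains on the server.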
In addition to the dataset size, the applications were also limited by the number and complexity of the geospatial features that are rendered on the map. In two of our case studies, we addressed this limitation by using spatial aggregation in the northeast application to reduce the overall number of features and by using simpler geometries in the LMG application (HUC12 pour points instead of polygons representing the basin areas). However, this limitation could also be addressed using alternative Web-based graphics technologies. Currently, the ICE framework renders geospatial features on the map using the scalable vector graphics (SVG) format, which is the simplest but also the least performant graphics standard available. However, Web-based maps can also be rendered using raster-based methods such as canvas or Web Graphics Library (WebGL), which can have better rendering performance and scalability than SVG at the expense of increased code complexity.
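As an illustration of the spatial aggregation strategy, the sketch below collapses many small features into fewer parent units using an area-weighted mean. The aggregation method actually used in the northeast application is not described here, so the weighting scheme, the function name `aggregate_by_group` and the field names (`huc`, `temp`, `area`) are assumptions chosen for illustration only.

```python
from collections import defaultdict


def aggregate_by_group(features, group_key, value_key, weight_key):
    """Aggregate feature values to parent units with a weighted mean.

    `features` is a list of dicts, one per fine-scale feature (e.g., a
    catchment). Returns a dict mapping each group ID to the weighted mean
    of `value_key`, using `weight_key` (e.g., area) as the weight.
    """
    # group ID -> [sum of value * weight, sum of weight]
    totals = defaultdict(lambda: [0.0, 0.0])
    for feature in features:
        weight = float(feature[weight_key])
        totals[feature[group_key]][0] += float(feature[value_key]) * weight
        totals[feature[group_key]][1] += weight

    # Skip groups with zero total weight to avoid division by zero.
    return {
        group: weighted_sum / weight_sum
        for group, (weighted_sum, weight_sum) in totals.items()
        if weight_sum > 0
    }


# Hypothetical catchments nested within two parent basins.
catchments = [
    {"huc": "A", "temp": 10.0, "area": 1.0},
    {"huc": "A", "temp": 20.0, "area": 3.0},
    {"huc": "B", "temp": 5.0, "area": 2.0},
]
basin_means = aggregate_by_group(catchments, "huc", "temp", "area")
```

Rendering one feature per parent unit rather than one per catchment reduces the number of map elements the browser must draw, at the cost of spatial detail within each unit.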
Beyond the technical limitations of the ICE framework, we also recognize that our understanding of its impacts on end users is limited. Although user feedback suggests that these tools are useful and engaging, this feedback was provided voluntarily and therefore may be biased if it was only provided by those who find the tools most useful to begin with. Controlled user surveys and experimental studies would be beneficial for helping us to determine if and how these tools improve understanding and influence decision making. User assessments of similar environmental data visualization tools have yielded mixed results regarding whether these types of tools improve understanding. For example, Herring et al. [12] found that using an interactive data visualization tool for exploring local climate risks significantly changed the participants’ attitudes towards climate change. Furthermore, van Hardeveld et al. [73] found that their interactive simulation system fostered cooperation among stakeholders and improved their understanding of the problems and actions related to peat management. In contrast, Xexakis and Trutnevyte [74] found that use of an interactive web application for simulating electricity supply scenarios did not improve user engagement and actually led to poorer user understanding. Similarly, Arciniegas et al. [75] found that an interactive map-based decision support tool was not considered helpful by study participants. Experimental user studies are therefore necessary to evaluate what impacts, if any, a tool has on user understanding and decision making.
In addition to understanding the impacts and effectiveness of these tools, user studies would also be helpful for improving the design of the ICE framework by helping to identify areas that cause confusion. Initial development of this framework was greatly improved thanks to feedback from a select group of users. However, this feedback was primarily obtained from other researchers and managers with similar technical skills and experience. More feedback from users with varied backgrounds would likely lead to a better tool that is both useful and effective across broader audiences.
Finally, as we continue to develop the ICE framework and adapt it to new datasets and models, there are a few key areas that we will focus on to expand and improve on its capabilities. Currently, the ICE framework is designed primarily to explore only the geospatial patterns of large-scale environmental datasets. However, many modeling datasets contain timeseries of observed and simulated values. Although some support for time-varying datasets was added in the LMG application, the addition of greater timeseries support will allow users to see not only the long-term average conditions of each spatial feature but also a complete time history. Furthermore, while the ICE framework is currently aimed at the exploration of static model outputs, it could also incorporate interactive modeling tools for performing scenario simulations of individual features. Lastly, finding ways of incorporating uncertainty in the visualizations will help to improve the use of these tools for decision making and management.
The ICE framework demonstrates a Web-based interactive data visualization approach to explore the spatial patterns in environmental datasets and model outputs. We applied this framework to datasets and models from three separate research projects, each focusing on a unique research topic and management issue in a specific region of North America. Based on our experience developing these applications, we found that not only can web-based data visualization tools be used for accessing, exploring and understanding large geospatial datasets, but these tools can also provide a number of broader impacts that go beyond any one user.
Web-based interactive data visualization tools such as the ICE framework can promote a democratization of environmental datasets and models by placing this information directly in the hands of stakeholders, managers, decision makers and other researchers. They can make large and complex datasets easier to access and understand for broader audiences who otherwise may not be able to analyze the raw data themselves. They can provide a central platform for shared learning and discovery, which in turn fosters discussion and collaboration between users. When linked to their underlying data sources and models, these tools can motivate users to contribute their own data to the project in order to improve the quality of the model results.
Although there are some limitations to this approach and the ICE framework specifically, further research into hybrid client–server architectures and the application of alternative rendering strategies is likely to increase the capabilities of this framework as well as improve its performance for handling larger datasets. Overall, the development and application of the ICE framework demonstrate that interactive data visualization tools can play a central role in the iterative process of environmental data collection, exploration, modeling and decision making.