# A Review and Characterization of Progressive Visual Analytics


## Abstract


## 1. Motivation

The literature employs progression for a variety of purposes, for example:

- to realize responsive client-server visualizations using incremental data transmissions [4],
- to make computational processes more transparent through execution feedback and control [5],
- to steer visual presentations by prioritizing the display of regions of interest [6],
- to provide fluid interaction by respecting human time constraints [7], or
- to base early decisions on partial results, trading precision for speed [8].

This paper makes three contributions:

- a collection and review of scholarly publications on the topic of PVA from various domains;
- a characterization of PVA capturing the reasons, benefits, and challenges of employing it;
- a set of recommendations for implementing PVA sourced from a range of publications.

## 2. PVA Fundamentals

In contrast to traditional monolithic visual analytics (MVA), PVA offers two principal advantages:

- With PVA, we are able to start the interactive analysis right after ${t}_{\mathit{Response}}$, in parallel to the further refinement of the view. Yet with MVA, view generation and interactive analysis can only be done in sequence, so that the overall time spent on generating and utilizing the visualization extends well beyond ${t}_{\mathit{Complete}}$.
- With PVA, we obtain a sufficiently good partial result as early as ${t}_{\mathit{Reliable}}$ and no later than ${t}_{\mathit{Stable}}$, which is in most cases still well before MVA’s completion time. PVA’s ${t}_{\mathit{Complete}}$ is irrelevant, since if we wanted a final and polished result, we would have used MVA in the first place.

## 3. Review Procedure

**(Step 1) Gathering literature on PVA.** To collect PVA-related publications, we used the common approach of starting from a set of initial seed publications on the topic of PVA from various disciplines. Using Google Scholar, we retrieved the set of all papers they cite, as well as all papers citing them. For all PVA-related papers among them, we repeated this process until no further PVA-related papers were found. The set of initial publications included 15 papers from various disciplines: Databases [23,24,25,26,27], Information Visualization [3,28], Scientific Visualization [9,11], Visual Analytics [1,2,5,7], and Human–Computer Interaction (HCI) [4,8]. The results of this step are listed in the Reference Section at the end of this paper.

**(Step 2) Extracting different PVA concepts.** We then went over all gathered papers with the aim of extracting the different notions of PVA they describe. For the largest part, these notions align with the scientific community in which the respective papers are published. For example, PVA papers from the HCI community tend to focus on PVA as a means to facilitate interaction, whereas papers from the Database community tend to position PVA as a means to produce database query results. It soon became clear that most papers put their own twist on the basic idea of progression, resulting in a quite diverse and nuanced understanding of PVA, with each PVA system or algorithm instantiating its own PVA concept. Hence, the result of this step was the realization that the understanding of PVA is much too heterogeneous to discuss it bottom-up by merely collecting all its different notions.

**(Step 3) Consolidating concepts in a characterization of PVA.** To nevertheless get a grasp on the body of literature, we proceeded to no longer look for distinctions between the various notions, but for their commonalities, disregarding different terminologies and levels of detail. We found that most instances of PVA note some reasons why it is employed in the first place and some PVA benefits that are exploited to address these reasons. Moreover, papers often discuss additional challenges of using PVA. Breaking up PVA along these three aspects captures that there are different reasons for using PVA, which are addressed by different PVA benefits and lead to different PVA challenges. The characterization according to PVA’s reasons, benefits, and challenges is given in Section 4.

**(Step 4) Extracting practical requirements for PVA.** The existing body of literature offers various ways in which to provide the PVA benefits and to resolve, circumvent, or at least reduce the PVA challenges. To that end, a number of publications give practical advice on which additional features to provide and which design criteria to observe to realize an effective and usable instance of PVA. We collected this advice as requirements from these papers and present this “raw” collection in Appendix A1.

**(Step 5) Aligning requirements with PVA characterization.** Having the collection of requirements is just one side of the coin. We still need to know when to follow which one. For this reason, we drew connections between the various requirements by grouping them in a process-driven way and in a user-driven way. The results of this step are detailed in Section 5. The process-driven grouping is done with respect to the aspect of the visual analytics process to which the requirements pertain, and the user-driven grouping is done with respect to the PVA characterization given in Section 4, i.e., the PVA benefits they aim to provide and the PVA challenges they aim to address. As a result of this step, we distill nine high-level recommendations from the various requirements—one for each identified benefit and challenge.

**(Step 6) Exemplification of PVA characterization and recommendations.** In this final step, we aim to put our PVA characterization and the proposed recommendations in the context of a real-world use case. For this step, we have chosen one of our own PVA solutions for the simple reason that we have all the necessary inside information on why and how we built it, so that we can discuss it in the necessary depth. This discussion is given in Section 6.

## 4. A Characterization of PVA

#### 4.1. Reasons for Using PVA

**Long-lasting computations.** In this case, the main problem is the completion time ${t}_{\mathit{Complete}}$ of the computation being well beyond the acceptable. Possible instances are:

- Quasi-indefinite computations: These are computations that will theoretically terminate at some point, but reaching this point is a few thousand years out, e.g., due to combinatorial explosion [32].
- Delayed computations: These types of computations can be completed within reasonable time, but still too late to meet a given deadline by which the result is needed.

**Slow computations.** Here, the shortcoming is that the completion time ${t}_{\mathit{Complete}}$ is not within bounds for fluid interaction [33], including guaranteed response times to queries and continuous manipulation of views. Expected response times are commonly subdivided into three levels [2,34,35]:

- Task completion: Initiating a computational task, such as a query or complex filter operation on large datasets, should not stall the flow of analysis for more than 10 s.
- Immediate response: In an interactive setting, such as tuning computational parameters in a GUI, feedback to the changes made should appear within 1 s.
- Perceptual update: Computations initiated through direct interaction with the view should complete in under 0.1 s to ensure smooth updates without noticeable stutter or flickering.

**Non-transparent computations.** Papers arguing from this perspective criticize the monolithic, one-step nature of the computation that gives MVA its name. They note that MVA presents an algorithmic black box without any means to observe, interject, or reconfigure it on the fly. The only way to understand the computation is after the fact, by inferring what might have happened from the result produced at completion time ${t}_{\mathit{Complete}}$. This argument is delivered from two different angles:

- Monolithic computations: In this instance, the focus lies on the complexity and opacity of the computational process that cannot be observed, understood, or steered while running its course [5].
- Monolithic visualizations: Here the focus lies on the visualization produced by the process, which is shown in all its cluttered, overplotted detail as a single monolithic end result [6].

#### 4.2. Benefits of Using PVA

**Early partial results.** These are produced between ${t}_{\mathit{Response}}$ and ${t}_{\mathit{Reliable}}$. Therefore, they are meaningful, but not necessarily reliable yet. An example of such an early partial result would be a preview that already gives a first visual indication of the overall look and feel of the final visualization, but without yet showing enough data to draw conclusions from it. Showing such a preview can be used to orient oneself in the view before it gets too cluttered. If the preview is generated quickly enough—i.e., ${t}_{\mathit{Response}}$ stays within an acceptable update interval—it can provide fluid interaction.

**Mature partial results.** These are produced between ${t}_{\mathit{Reliable}}$ and ${t}_{\mathit{Stable}}$. They already reflect the final result within an acceptable margin of error. Hence, they can be used for an early start of the visual analysis, as they show enough data and detail to make first observations in the still unfinished, yet already trustworthy intermediate result. In that sense, mature partial results can be used to gain a head start in time-critical analysis scenarios even before the results fully stabilize. The decision when a partial result is reliable enough to do so is:

- process-specific: a monotonically converging progressive computation is likely to yield useful results earlier than one that is highly fluctuating and bound to produce “surprises”;
- task-specific: a high-level overview task requires less detail to be shown than an in-depth comparison;
- domain-specific: a social media analysis can accept a higher margin of error than analyzing patient records to make a clinical treatment decision;
- user-specific: some users with experience in progressive computations may be able to see early on “where this is going”, while others wait a little longer before feeling comfortable to work with a partial result.

**Definitive partial results.** These are produced between ${t}_{\mathit{Stable}}$ and ${t}_{\mathit{Complete}}$. In this case, the computation is not entirely finished yet and we are still dealing with partial results. These results may not yet look as polished as the final result, with last minor layout optimizations still needing to be computed, or they may still miss some data points, as a few remaining data chunks have not yet been processed. Yet, these results leave no more doubt about the final outcome. That final outcome can be assumed as settled at this stage and the partial results can be used in its place for all practical purposes. Hence, they can be used for an early completion of the visual analysis, confirming observations made on mature partial results and basing early analytic decisions on them. There exist two scenarios of such early completions, which we term:

- hard early completion: this terminates the running computation and starts with the next analytical step based on the findings made up to that point;
- soft early completion: this also starts with the next analytical step, but the computation is only paused and not terminated [1].
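The difference between the two completion modes can be sketched with standard threading primitives. The class and method names below are our own illustrative assumptions, not an API from the surveyed literature: soft completion merely pauses the progression so it can be resumed later, while hard completion terminates it for good.

```python
import threading
import time

class ProgressiveProcess(threading.Thread):
    """Toy progressive computation supporting both early-completion modes."""

    def __init__(self):
        super().__init__()
        self._running = threading.Event()   # cleared -> paused (soft completion)
        self._running.set()
        self._stopped = threading.Event()   # set -> terminated (hard completion)
        self.iterations = 0

    def run(self):
        while not self._stopped.is_set():
            self._running.wait(timeout=0.1)  # block while paused
            if self._running.is_set():
                self.iterations += 1         # one progression step
                time.sleep(0.001)

    def soft_complete(self):
        """Pause: results are treated as final for now, but work can resume."""
        self._running.clear()

    def resume(self):
        self._running.set()

    def hard_complete(self):
        """Terminate: discard the remaining computation entirely."""
        self._stopped.set()

proc = ProgressiveProcess()
proc.start()
time.sleep(0.05)
proc.soft_complete()          # analyst moves on; progression kept on standby
time.sleep(0.02)              # let any in-flight step finish
snapshot = proc.iterations
time.sleep(0.05)              # paused: no further progress happens here
proc.hard_complete()          # now discard the remaining work for good
proc.join()
```

The soft mode keeps the option open to return and let the progression continue toward ${t}_{\mathit{Complete}}$, which the hard mode gives up in exchange for freeing the computational resources.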

**Succession of results.** Besides providing new start and end points for visual analysis, PVA also provides a constant outflow of results that lie in between. These form a string of natural “break points” that can be used to

- monitor the computation in its course for understanding and possibly debugging the algorithmics behind it (cp. software visualization);
- steer the computation through interactive reparametrization of algorithms or reprioritization of data chunks, adapting the PVA process to early observations and emerging analysis interests;
- observe the build-up of complex visualizations step by step, showing increasing numbers of visual elements to convey even dense, detailed, and cluttered visualizations.

#### 4.3. Challenges of Using PVA

**Parametrizing the progression.** Running a PVA solution asks for additional efforts at both design phase and utilization phase for a suitable (for the task to be carried out) parametrization of the progressive pipeline. Common parameters for data chunking concern the sampling technique, the sample size, the processing order or prioritization of sampled data chunks, and how to combine them back together through buffering, binning, or aggregation. Common parameters for process chunking concern the step size for the iterations, the stopping criterion for when to end the progression, and sometimes also a random displacement that governs the degree of intentionally introduced fluctuation to avoid convergence towards local optima. Managing this additional parameter space, not present in MVA, clearly demands additional efforts [39]. But PVA also helps in doing so, by allowing interactive readjustment and fine-tuning of parameters while the progression is already under way.
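As a hypothetical illustration of this parameter space, the sketch below groups the parameters named above into data-chunking and process-chunking settings; all names and default values are our own assumptions, not taken from any particular PVA system.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class DataChunkingParams:
    sampling_technique: str = "uniform"    # e.g., uniform, stratified, none
    sample_size: int = 1_000               # records drawn per chunk
    prioritize: Optional[Callable] = None  # ordering key for pending chunks
    combine: str = "aggregate"             # buffering, binning, or aggregation

@dataclass
class ProcessChunkingParams:
    step_size: int = 10                    # iterations per progression step
    stopping_criterion: float = 1e-4       # end when change falls below this
    random_displacement: float = 0.0       # fluctuation to escape local optima

@dataclass
class ProgressionConfig:
    data: DataChunkingParams = field(default_factory=DataChunkingParams)
    process: ProcessChunkingParams = field(default_factory=ProcessChunkingParams)

# Parameters can be readjusted while the progression is already running:
cfg = ProgressionConfig()
cfg.data.sample_size = 500     # smaller chunks -> faster, coarser updates
cfg.process.step_size = 5
```

Making the configuration an explicit, mutable object is what enables the interactive readjustment mentioned above, since the running progression can re-read it between steps.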

**Judging partial results.** Partial results, while being displayed promptly, come at the cost of being only approximate. They are either incomplete in the case of data chunking, as not all data has yet been processed, or inaccurate in the case of process chunking, as not all iterations have yet been computed. When dealing with intermediate results, the analyst has the additional burden of judging whether they are “good enough” for the analytic task at hand—i.e., whether ${t}_{\mathit{Reliable}}$ has been reached. This challenge does not occur in MVA, as it is clearly rooted in the progression. There are different strategies for judging partial results, such as looking at absolute completion rates (How much of the full dataset is shown? How many out of all iterations have been computed?) or looking at relative completion rates (How much additional information did a new result add to a previous one?). Note that this particular challenge gains additional complexity when multiple views are involved that exhibit different states of maturity. In this case, it is not clear on which view the user should rely to judge the current result’s trustworthiness.
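The two judging strategies can be sketched as simple metrics; this is a minimal illustration under the assumption that partial results are vectors of aggregate values, not a method prescribed by the surveyed papers.

```python
def absolute_completion(processed: int, total: int) -> float:
    """Fraction of all data chunks (or iterations) processed so far."""
    return processed / total

def relative_change(prev: list, curr: list) -> float:
    """Mean absolute change between two successive partial results.

    A small value suggests the progression is stabilizing; a large one
    warns that the current view may still shift considerably.
    """
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

# Successive partial results of some aggregate (illustrative numbers):
results = [[9.1, 10.4], [9.8, 10.1], [9.9, 10.0]]
changes = [relative_change(a, b) for a, b in zip(results, results[1:])]
print(changes)  # shrinking changes hint that t_Reliable may be near
```

Neither metric alone is conclusive: a high absolute completion rate with large relative changes still warns against trusting the current result.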

**Monitoring the succession of results.** The partial results alone do not tell the full story. Only in the context of the series of results are we able to judge the current one. Is the computation converging towards a stable result or is there no such trend? If it is converging, can we estimate how long until ${t}_{\mathit{Stable}}$ is reached? And when the output is no longer changing, have we actually reached ${t}_{\mathit{Stable}}$? Or is it just that the progression is “jammed”, dealing with the latest interactive changes we made or processing an unexpectedly complex data chunk? So, while keeping an eye on the running computation can be seen as an additional burden, it also provides additional information on the provenance of the currently shown result as an intermediate point of that computation. This can be used to further inform the reliability assessment of the current partial result, but also as an indication of the need to adjust or steer the progression.

**Steering the progression.** Like MVA, PVA can be run in a fire-and-forget manner: once set up and parametrized, it is never changed and runs its predefined course. Yet this would mean disregarding most of PVA’s inherent flexibility and thus reducing its use to only a fraction of what it could be: a responsive mechanism with which we can interact while it is running and which can be adapted alongside our growing understanding of the data and our evolving analytic interests. Leveraging this full potential of PVA also means that the analyst cannot sit back and wait for the final result to arrive, but has to actively intervene and steer the computation. At the minimum, the analyst has control over starting, pausing, slowing down and speeding up, as well as stopping (early cancellation, early completion) the process. This can also encompass branching an analysis into two concurrently run processes—e.g., running the same algorithm on different data, or different algorithms on the same data to compare or interleave their results. More involved forms of steering include narrowing down and refocusing the process on certain subspaces of interest within the dataset, or halting the processing of data subspaces irrelevant to the current task or analytic question.

**Handling fluctuating or even diverging progressions.** Many PVA approaches assume a “well-behaved”, monotonically converging progression of results that proceeds smoothly from a crude first response to a refined final result. In practice, this is hardly ever the case: data may be non-uniformly distributed across the chunks, the computation may perform random moves to escape local optima, or the user may interact with the computation, readjusting it and thus introducing discontinuities in the process. Governing concepts like data consistency (intermediate visualizations should become increasingly representative of the complete dataset) and visual consistency (intermediate visualizations should become increasingly representative of the completed visualization) are difficult to enforce under such circumstances [40,41]. This makes it particularly hard to judge the reliability of preliminary findings, which can turn out to be mere artifacts of the progression and not of the data.

## 5. Requirements

- $\mathbb{R}$Hel1: Hellerstein et al., 1999 [10]

#### 5.1. Process-Driven Characterization

- Data requirements concerning aspects from the ingestion and subdivision of the data, to prioritization and aggregation strategies for data in a PVA solution;
- Processing requirements dealing with all aspects from the progressive implementation of the computation to its execution and control;
- Visualization requirements regarding all aspects from visual feedback about the running process to the dynamic presentation of the incremental outcome;
- Interaction requirements including all aspects from meeting human time constraints to providing structured interaction with the process.

#### 5.2. User-Driven Characterization

#### 5.2.1. Requirements for Providing PVA Benefits

**Provide early partial results**

**Immediacy:** The whole idea of early partial results is that they are provided promptly, which usually means that ${t}_{\mathit{Response}}$ must respect the human time constraints ($\mathbb{R}$T32) listed under Section 4.1. Otherwise, the PVA process will be perceived as lagging or stalling. To ensure immediacy, PVA systems should be designed to start computations immediately after being invoked by the user ($\mathbb{R}$T35). They can also employ the aforementioned adaptive sampling mechanisms to ensure that only as much data as there is time for gets processed ($\mathbb{R}$T34).

**Significance:** A partial result, as early as it may be delivered, is only useful if it shows what the analyst needs to see. This notion is captured by various concepts, such as meaningfulness, interestingness, and relevance:

- Meaningfulness ($\mathbb{R}$S14): Partial results should reflect the overall result by coming in the same format (sometimes called structure preservation, $\mathbb{R}$M24) and be appropriate to be taken in by a human analyst.
- Interestingness ($\mathbb{R}$Hel1, $\mathbb{R}$S15, $\mathbb{R}$M29): Partial results are of no use if those parts of the data which interest the analyst most get processed and shown last. Hence, it is important to be able to prioritize data and process it in order of decreasing interestingness.
- Relevance ($\mathbb{R}$S16): Some parts of the data may not only be of lesser interest to the user, but actually be entirely irrelevant to the analysis task at hand. Being able to exclude irrelevant data from processing can further streamline the creation of useful partial results by making it faster and less cluttered.
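Interestingness-ordered processing with relevance filtering can be sketched with a priority queue; the chunk format and scoring function below are illustrative assumptions, not taken from any of the cited requirements.

```python
import heapq

def by_interestingness(chunks, score, relevant=lambda c: True):
    """Yield data chunks in order of decreasing interestingness,
    dropping chunks irrelevant to the task entirely."""
    queue = [(-score(c), i, c) for i, c in enumerate(chunks) if relevant(c)]
    heapq.heapify(queue)
    while queue:
        _, _, chunk = heapq.heappop(queue)
        yield chunk

# Chunks tagged with a region and an interest score (illustrative):
chunks = [{"region": "A", "score": 0.2},
          {"region": "B", "score": 0.9},
          {"region": "C", "score": 0.0},   # irrelevant to the task
          {"region": "D", "score": 0.5}]
order = [c["region"] for c in by_interestingness(
    chunks, score=lambda c: c["score"], relevant=lambda c: c["score"] > 0)]
print(order)  # ['B', 'D', 'A']
```

Because the queue is rebuilt from scores, reprioritization during the progression amounts to rescoring the not-yet-processed chunks.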

**Actionability:** In addition to having a significant and early partial result, the analyst must also be able to act upon it. Hence, the literature stresses the point that early partial results should already provide full interactivity ($\mathbb{R}$H5). This way, analysts can interact with them as they would interact with the completed result. In addition, a PVA system should allow analysts to perform an early cancellation of the running process ($\mathbb{R}$M28) if they perceive the early partial results as inadequate for their analytical needs.

**Recommendation I:** To provide early partial results—i.e., to establish ${t}_{\mathit{Response}}$—first processing results should be delivered promptly, while maintaining their significance and interactivity.

**Provide mature partial results**

**Uncertainty of the results:** To judge how trustworthy a partial result is, and thus how trustworthy any first observations are, it is of principal importance to communicate the uncertainty ($\mathbb{R}$M26). Uncertainty displays can range from minimally invasive measures, such as adding confidence bounds [8], to switching out the entire visualization for binned alternatives that prevent micro-readings in still uncertain areas of the plot [7] (Section 2.4.3). For a simpler communication, the results’ uncertainty can also be condensed into numerical aggregates or quality metrics ($\mathbb{R}$M25).
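For a progressively computed mean, such confidence bounds can be derived from the standard error of the records processed so far. This sketch is our own illustration, under the assumption that processed records form a random sample of the full dataset; it is not the specific method of the cited works.

```python
import math
import random

def progressive_mean_ci(data, chunk_size=200, z=1.96):
    """Yield (mean, half_width) of a ~95% confidence interval after each
    chunk, treating processed records as a random sample of the dataset."""
    n, total, total_sq = 0, 0.0, 0.0
    for start in range(0, len(data), chunk_size):
        for x in data[start:start + chunk_size]:
            n += 1
            total += x
            total_sq += x * x
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0)  # population variance
        half_width = z * math.sqrt(var / n)         # z * standard error
        yield mean, half_width

random.seed(7)
data = [random.gauss(50.0, 5.0) for _ in range(20_000)]
intervals = list(progressive_mean_ci(data))
# Confidence bounds tighten as more chunks are processed:
print(f"after 1 chunk:  ±{intervals[0][1]:.2f}")
print(f"after all data: ±{intervals[-1][1]:.2f}")
```

The shrinking half-width is exactly the signal an uncertainty display would encode, e.g., as narrowing error bars around the partial aggregate.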

**Uncertainty of the process:** Uncertainty is not only a characteristic of the partial result, but also of the process that generated it. As uncertainties in the computations may influence the trustworthiness of the results without being directly quantifiable, it is of equal importance to inform analysts about them ($\mathbb{R}$T41). Such provenance information can stretch from traits of the data preprocessing (e.g., the currently used sampling strategy) to the algorithmic parameters and any simplifications made during the process ($\mathbb{R}$M27).

**Recommendation II:** To provide mature partial results—i.e., to establish ${t}_{\mathit{Reliable}}$—the inherent uncertainty of the partial results and of the progressive computation should be communicated truthfully and comprehensively for judging the results’ trustworthiness.

**Provide definitive partial results**

**State of the process:** The analyst should be able to discern the aliveness of a process—i.e., whether a current result is no longer visibly changing because the result is actually stable, or because the process is stalling due to a deadlock or a lost connection ($\mathbb{R}$M21).

**Progress of the process:** It can also be beneficial for the analyst to know how much processing has been done overall, either in absolute or relative terms ($\mathbb{R}$M22, $\mathbb{R}$M23). Another way to indicate progress is to convey an estimated time to completion ($\mathbb{R}$T40). In some cases, the complete result may not be that far out, and analysts may be willing to wait another minute for it, but not hours.
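A simple way to obtain such an estimate is to extrapolate the average per-chunk processing rate. The tracker below is a minimal sketch under the assumption of a roughly constant rate; real processes with varying chunk complexity would need smoothing.

```python
import time

class ProgressTracker:
    """Track absolute/relative progress and estimate time to completion
    by extrapolating the average per-chunk processing rate."""

    def __init__(self, total_chunks: int):
        self.total = total_chunks
        self.done = 0
        self.start = time.monotonic()

    def update(self, chunks_finished: int = 1):
        self.done += chunks_finished

    @property
    def fraction(self) -> float:
        return self.done / self.total          # relative progress

    def eta_seconds(self) -> float:
        """Estimated remaining seconds; assumes a roughly constant rate."""
        elapsed = time.monotonic() - self.start
        if self.done == 0:
            return float("inf")
        return elapsed * (self.total - self.done) / self.done

tracker = ProgressTracker(total_chunks=100)
time.sleep(0.05)
tracker.update(25)     # a quarter of the chunks processed so far
print(f"{tracker.fraction:.0%} done, ~{tracker.eta_seconds():.2f} s remaining")
```

Both `fraction` and `eta_seconds` map directly onto the absolute/relative progress indicators and the time-to-completion display discussed above.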

**Recommendation III:** To provide definitive partial results—i.e., to establish ${t}_{\mathit{Stable}}$—the process’ state and progress should be communicated for judging its aliveness, convergence, and time of completion.

**Provide a succession of results**

**Monitoring the succession:** When observing the running process, it is paramount to have an expressive visual representation not only of the data and any associated uncertainties, but also of the amount of change between updates. As a guiding principle, we want the change in the visualization to be proportional to the change in the underlying data ($\mathbb{R}$F11). Otherwise, we might see change where there is none and vice versa. The literature also points out the importance of supporting various monitoring tasks, such as spotting general changes with each update ($\mathbb{R}$H2, $\mathbb{R}$S18), or tracking specific items of interest across multiple updates ($\mathbb{R}$H6).

**Steering the succession:** In some cases, it may be necessary to pause the progression on demand in order to inspect a particularly complex update. This is usually supported via pause/play mechanisms ($\mathbb{R}$H3, $\mathbb{R}$T36), with which users are familiar from animated visualizations and video playback. In addition, it may also be of interest to change the time span between updates, e.g., by adapting the step size or chunk size of the progression ($\mathbb{R}$T36). More sophisticated modifications of the running process usually involve manual adjustments of intermediate results that are then used as input for the next processing step ($\mathbb{R}$M30)—e.g., to “help along” a clustering or a network layout.

**Recommendation IV:** To provide a succession of partial results, visualizations with change-proportional updates should be used together with interaction mechanisms for steering the progression dynamics and their visual representation.

#### 5.2.2. Requirements for Mitigating PVA Challenges

**Support the parametrization of the progression**

**Interactive parameter changes:** Turkay et al. [7] (Section 4.1.3) have observed that PVA users tend to be quite preoccupied with manual parameter changes—in particular in multi-view setups. To make manual parameter adjustments in an informed manner, one can employ multiple consecutive executions of the same progressive computation ($\mathbb{R}$M31). By using different parameters for each, their early partial results can be compared and the most promising can be chosen for continued processing.

**Automated parameter changes:** It is also possible to use adaptive mechanisms for auto-adjusting the parameters to match external constraints, such as a desired response time ($\mathbb{R}$T34). Such an automated parameter adjustment can not only help to cope with input data of varying complexity, but also keep the user’s attention on the data analysis instead of being distracted by the multitude of PVA parameters and options ($\mathbb{R}$H4, $\mathbb{R}$S13).
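One possible adaptive mechanism is a proportional controller that resizes chunks to stay near a target response time. This is a sketch under our own assumptions (the generator protocol, bounds, and target are illustrative), not a mechanism prescribed by $\mathbb{R}$T34.

```python
import time

def adaptive_chunks(data, target_seconds=0.01, initial_size=100):
    """Yield chunks whose size is auto-adjusted so that processing each
    one stays close to a target response time (proportional control)."""
    size, pos = initial_size, 0
    while pos < len(data):
        chunk = data[pos:pos + size]
        start = time.monotonic()
        yield chunk                      # caller processes the chunk here
        elapsed = time.monotonic() - start
        if elapsed > 0:
            # Scale the next chunk toward the time budget, within bounds.
            size = max(10, min(len(data), int(size * target_seconds / elapsed)))
        pos += len(chunk)

data = list(range(50_000))
sizes = []
for chunk in adaptive_chunks(data, target_seconds=0.005):
    sizes.append(len(chunk))
    sum(x * x for x in chunk)            # stand-in for real processing
print(f"{len(sizes)} chunks, sizes from {min(sizes)} to {max(sizes)}")
```

Because the generator measures the time the consumer spends on each chunk, the chunk size automatically grows on fast hardware and shrinks on slow or complex data, without any user intervention.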

**Recommendation V:** To support the parametrization of PVA, changes to process and data parameters should yield predictable results that lend themselves to interactive and adaptive parametrization.

**Support judging partial results**

**Minimize added visual complexity:** Communicating the uncertainty in addition to the currently shown result puts extra complexity into the visualization. If possible, this extra information should be rendered as an unobtrusive visual cue, adding minimal complexity to the base visualization and thus easing its interpretation ($\mathbb{R}$F9). Furthermore, these cues should use a representation distinct from the base visualization, so as not to add visual noise ($\mathbb{R}$F12). Error bars for column charts are a good example of both: while they also use height to encode the uncertainty value, they do not reuse the column representation itself, which could be confused with the columns of the base chart. Instead, they consist only of a few extra lines per column that grow less pronounced as the uncertainty decreases.

**Consistency across views:** In multi-view setups, uncertainty should be displayed in a consistent fashion across all views ($\mathbb{R}$B45). Consistency does not only relate to using similar visual uncertainty representations across different views. It also concerns the effect of different views reaching stability at different points during the progression. Having a partial result that is apparently stable in one view and still changing in another is confusing to the user, which is why such discrepancies need to be taken into account ($\mathbb{R}$T37). One way of doing so is to compute the uncertainty in a global manner and to display it in a dedicated view as a summary statistic over all views.

**Recommendation VI:** To support judging partial results, their uncertainty should be visualized in a way that minimizes added visual complexity and ensures consistency across views.

**Support monitoring the succession of results**

**Stabilizing dynamic views:** The main challenge of a “live” display of the progression is that constant changes of the view make it hard to keep one’s orientation, as the layout still moves data points around or simply adjusts the color scale as new values are added. While in some sense these changes are representative of the progression, view changes still need to be limited ($\mathbb{R}$S17), as otherwise the animated view is nothing more than a flickering of seemingly unconnected partial results. A common way of doing this is by using visual anchors ($\mathbb{R}$B44) that introduce more stable landmarks in the view to provide a frame of reference for the analyst’s mental map. For example, in online dynamic graph drawing, this is done by assigning so-called pinning weights to central nodes, keeping them in place while the layout around them changes [44] (Section 4.1).
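The pinning-weight idea can be sketched as scaling each node's layout displacement by one minus its pinning weight; the data structures below are our own illustrative simplification of the technique, not the algorithm of [44].

```python
def apply_layout_step(positions, displacements, pinning_weights):
    """Apply one incremental layout step, scaling each node's displacement
    by (1 - pinning weight): fully pinned nodes (weight 1.0) stay in place
    and act as stable landmarks for the analyst's mental map."""
    new_positions = {}
    for node, (x, y) in positions.items():
        dx, dy = displacements.get(node, (0.0, 0.0))
        scale = 1.0 - pinning_weights.get(node, 0.0)  # pinned -> no movement
        new_positions[node] = (x + scale * dx, y + scale * dy)
    return new_positions

positions = {"hub": (0.0, 0.0), "leaf": (1.0, 1.0)}
forces = {"hub": (0.5, 0.5), "leaf": (0.5, -0.5)}
pins = {"hub": 1.0}                                   # central node fully pinned
stepped = apply_layout_step(positions, forces, pins)
print(stepped)  # the hub keeps its place; only the leaf moves
```

Fractional weights between 0 and 1 give a continuum between fully anchored landmarks and freely moving nodes.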

**Dynamic features in static views:** In the case of static views, it is still important to convey the ongoing progression, so that analysts know if the view they are currently working on is outdated and requires an update on demand. Some PVA systems use a global highlighting of the UI elements that are used for updating on demand—e.g., a “cautionary yellow background” to indicate that new results are available [28] (Section 3.2). Others provide more detail, for example by dynamically adding visual cues directly into the static view to mark those regions where new results are waiting to be incorporated [1] (Section 5.3). For the special case of asynchronous updates—i.e., updates that come in a different order than they were triggered—a color-coding of visual changes has been proposed to aid users in maintaining an overview of the succession [45].

**Recommendation VII:** To support monitoring the succession of results, the visualization should ensure stability for constantly changing dynamic views and embed dynamic features in fixed views.

**Support steering the progression**

**Information about the computational process:** It is hard to steer a computational process that is a mere black box to the analyst. While it helps with the parametrization if this black box behaves deterministically ($\mathbb{R}$C8), actually steering it by switching algorithms or excluding certain parts of the data requires more information. To support the steering, provenance information should be shown that details the current computational pipeline ($\mathbb{R}$B42) and its current parametrization ($\mathbb{R}$M27).

**Immediate feedback on changes made:** Steering the progression in a particular direction by prioritizing certain data or adjusting algorithmic modules of the pipeline requires the analysts to see the effects of their changes to further fine-tune their adjustments. On one hand, this relates back to $\mathbb{R}$T32 and $\mathbb{R}$T35, which require swift responses without delays. On the other hand, this also means that the analysts must be able to observe any, possibly minor, effects in the output due to their steering of the process. Yet for more complex changes that involve adjustments of many intricate UI controls at the same time, this is hard to do in parallel. In these cases, structured interaction sequences ($\mathbb{R}$T38) can help to lessen the burden of the interactive steering and allow analysts to better focus on observing its output. These sequences automate low-level interactions, such as moving sliders across a particular interval or brushing/selecting data in certain regions.

**Recommendation VIII:** To support steering the progression, PVA systems should make the current processing pipeline explicit and employ mechanisms that reduce the interaction costs of steering as well as of reverting back to previous configurations.

**Support handling fluctuating progressions**

**Data-induced fluctuations:** Fluctuations can originate from data chunks with a skewed data distribution that is not representative of the overall distribution in the dataset [46]. This happens when employing an inadequate sampling technique, but can in particular be observed when performing no sampling at all and simply partitioning the data as is. One solution is to randomize the data in a preprocess before loading it into the PVA system [19] (Section 4). This simple, yet effective solution is suited for scenarios in which one does not have access to the source code of a PVA system and cannot add improved sampling functionality to it. Note that the opposite approach of sorting the data according to one’s own prioritization strategy can likewise be employed for PVA systems that do not support prioritization.
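Such a randomizing preprocess can be as simple as shuffling the rows of the input file before the PVA system ingests it; the function and file layout below are our own illustration of the idea, not the implementation of [19].

```python
import os
import random
import tempfile

def randomize_rows(in_path, out_path, seed=0, has_header=True):
    """Shuffle the rows of a text/CSV file so that sequential chunking
    yields roughly representative samples, even without proper sampling
    support in the downstream PVA system."""
    with open(in_path) as f:
        lines = f.readlines()
    header, rows = (lines[:1], lines[1:]) if has_header else ([], lines)
    random.Random(seed).shuffle(rows)    # seeded for reproducibility
    with open(out_path, "w") as f:
        f.writelines(header + rows)

# Demo on a small temporary file with a skewed (sorted) row order:
src = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False)
src.write("value\n" + "".join(f"{i}\n" for i in range(1000)))
src.close()
dst = src.name + ".shuffled"
randomize_rows(src.name, dst)
with open(dst) as f:
    shuffled = f.read().splitlines()
os.remove(src.name)
os.remove(dst)
print(shuffled[:4])   # header first, then rows in randomized order
```

The same function with a sort instead of a shuffle would implement the opposite approach mentioned above, imposing one's own prioritization order on a system without prioritization support.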

**Computational fluctuations:** A common source of such fluctuations is the inclusion of random displacements in the computation to escape local optima. In some computational approaches, such as genetic algorithms, random elements are an essential part of the principal approach and cannot be eliminated or reduced. A generic way of handling highly fluctuating processes and enforcing their gradual stabilization is to employ simulated annealing [47]: a cooling factor gradually limits changes as the results become better, but not necessarily more stable.
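To illustrate the mechanism, a generic annealing loop of this kind could be sketched as follows (the scalar state, the user-supplied `propose`/`score` pair, and the cooling constant are all illustrative assumptions, not taken from [47]):

```python
import math
import random

def annealed_updates(initial, propose, score, steps=1000, t0=1.0, cooling=0.995):
    """Simulated-annealing loop: random displacements escape local optima
    early on, while the decaying temperature gradually damps fluctuations."""
    rng = random.Random(0)
    current, temp = initial, t0
    for _ in range(steps):
        candidate = propose(current, temp, rng)   # displacement scaled by temp
        delta = score(candidate) - score(current)
        # always accept improvements; accept worse moves with shrinking probability
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
        temp *= cooling                           # the cooling factor at work
        yield current
```

Writing it as a generator makes it a stream of intermediate results, which matches the PVA setting.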

**Visual fluctuations:** Visualizations with absolute positioning that use the Euclidean space as a fixed frame of reference—e.g., scatterplots—remain reasonably stable unless the data items themselves move around. Network diagrams, in contrast, use relative positioning that places data items (nodes) in relation to each other. Such relative positioning is prone to change with every added data item, and it is thus not surprising that the idea of simulated annealing has also been transferred to network visualizations [48] (Chapter 12.2). If fluctuations cannot be avoided, it is common practice to at least smooth between successive updates using animated transitions [7] (Section 2.4.1) or staged animations [19] (Section 4.2).
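The core of such an animated transition can be sketched with plain linear interpolation between two successive layouts (the function name and the fixed frame count are illustrative; real systems would additionally ease and stage these transitions):

```python
def transition_frames(old_pos, new_pos, n_frames=10):
    """Linearly interpolate item positions between two successive partial
    layouts, so that items glide to their new places instead of jumping."""
    frames = []
    for k in range(1, n_frames + 1):
        t = k / n_frames
        frame = {}
        for node, (x1, y1) in new_pos.items():
            x0, y0 = old_pos.get(node, (x1, y1))  # unseen items appear at their target
            frame[node] = ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
        frames.append(frame)
    return frames
```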

**Interaction-induced fluctuations:** Interactions, such as reparametrizing or steering the process, are one of the main causes of discontinuities in the progression. The literature notes that interactions should take fluctuations into account ($\mathbb{R}$T37). Turkay et al. [7] (Section 2.3.2) suggest using structured interaction sequences ($\mathbb{R}$T38) to ensure steady parameter changes without the intermediate discontinuities introduced by manual adjustment of sliders or brushes.
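For a single parameter, such a structured interaction sequence can be as simple as a scripted, evenly spaced sweep that replaces jittery manual slider dragging (a sketch of the concept; Turkay et al. describe the idea, not this code):

```python
def slider_sweep(start, stop, steps):
    """Generate an evenly spaced sequence of parameter values,
    yielding steady parameter changes without manual discontinuities."""
    delta = (stop - start) / steps
    return [start + i * delta for i in range(steps + 1)]
```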

**Recommendation IX:** To handle fluctuations, they should be reduced through proper data sampling, enforced process stability, absolute positioning and animation, and steadied interactions.

## 6. A Use Case Scenario

#### 6.1. The Visual Analysis Setting

- The analyst selects 30 to 50 candidate provinces in a scatterplot. This scatterplot displays the provinces according to numerical properties that will likely have an influence on the success of the marketing campaign—e.g., market penetration and average income as shown in Figure 3.
- Among those candidate provinces, a Top-10 subset is computed that maximizes the objective function. The subset is displayed in a Sankey diagram, which allows the user to explore the numeric properties of these provinces as well as their relation to the provinces not included in the Top-10—see Figure 4.
- Depending on the interactive assessment of the current set of Top-10 provinces, the analyst can either go back to the scatterplot to choose a different candidate set, or conclude the analysis with the current result and launch the marketing campaign in those ten provinces.

#### 6.2. Characterization as a PVA Scenario

**Reason for using PVA:** Finding the Top-10 is a combinatorial optimization problem for which no better solution exists than testing all possible combinations of 10 provinces out of the selected candidate set. Computing the Top-10 on all 110 provinces and assuming we could test 3000 combinations per second (actual throughput on an Intel Core i7 with 3.1 GHz), getting the exact result would require around 500 years, which is certainly **quasi-indefinite**. Under the same assumption, given a candidate set of only 40 provinces, this computation would still require 75 h. While this may still be doable, ${t}_{\mathit{Complete}}$ lies well beyond the expected **task completion** time of around 10 s, effectively hindering a fluent analysis.
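These back-of-the-envelope figures can be reproduced with `math.comb`; the throughput of 3000 combinations per second is the one stated above, and the exact runtimes naturally depend on hardware:

```python
import math

COMBOS_PER_SEC = 3_000                 # throughput reported in the text
SECONDS_PER_YEAR = 365 * 24 * 3600

full = math.comb(110, 10)              # all Top-10 combinations of 110 provinces
years = full / COMBOS_PER_SEC / SECONDS_PER_YEAR   # on the order of 500 years

subset = math.comb(40, 10)             # a 40-province candidate set
hours = subset / COMBOS_PER_SEC / 3600             # on the order of 75 h
```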

**Benefits of using PVA:** The iterative assessment of the result for different, manually tested and refined candidate sets poses clear time constraints on the computation of the Top-10 provinces. To keep the analyst engaged in a fluid back and forth between scatterplot and Sankey diagram, we need to provide a result within a time frame of at most 30 s. PVA can be used to achieve this by producing **early partial results** within such an acceptable time span, which are then further refined into **mature partial results** while the analyst already starts exploring.

**Challenges of using PVA:** Using PVA to yield a swift first response from the computation creates several challenges that our PVA design has to address. The foremost challenge is the inexact nature of the partial results and the analyst's **need to judge** them. In addition, the generated partial results are **likely to fluctuate**, so that a province might disappear from the Top-10 only to be reintroduced later. Finally, the **parametrization of the data partitioning** affects the runtime until the first result is produced. Choosing it inappropriately can easily result in extremely long wait times even for the first early partial result.

#### 6.3. A Solution Design following the PVA Recommendations

**inherent uncertainty** of the process, which we infer from a metric that captures how many out of the 10 optimal provinces are already included in the current Top-10 (see below for its computation). Overall, the users are not much interested in the process of the computation itself and have no concern for monitoring the succession of results, let alone steering it. Hence, our PVA solution, shown in Figure 4, follows the recommendations relating to the PVA benefits and challenges identified in Section 6.2.

**Providing PVA Benefits:** A prerequisite of any PVA solution is to employ a procedure that produces a stream of intermediate results. As a general approach, our PVA solution samples subsets of increasing size from the candidate set, computes their Top-10, and replaces the current best solution with the new one if the newly produced Top-10 performs better with respect to the objective function.
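This general loop might be sketched as a generator that emits only improving partial results (the names, the toy exact solver via full enumeration, and the sampling schedule are our own illustrative assumptions):

```python
import itertools
import random

def progressive_top_k(candidates, objective, k=10, seed=0):
    """Sample subsets of increasing size, solve Top-k exactly on each
    subset, and emit every result that beats the best one seen so far."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for size in range(k, len(candidates) + 1):
        subset = rng.sample(candidates, size)
        # exact Top-k on the (still small) subset by full enumeration
        combo = max(itertools.combinations(subset, k), key=objective)
        score = objective(combo)
        if score > best_score:                 # replace only if better
            best, best_score = combo, score
            yield best, best_score
```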

**Recommendation I:** While this general approach could be used to **deliver a prompt early result**, that result would most likely be far from significant and worth exploring, as it initially only covers a small subset of the candidate set. Hence, we employ a slightly different strategy for the first partial result that **ensures significance** by covering the whole candidate set while maintaining acceptable response times: We divide the candidate set into two disjoint subsets of equal size, independently compute their Top-5 provinces, and merge them into the very first Top-10 shown in the **interactive** Sankey diagram. Subdivisions into even more subsets (e.g., Top-3) are also possible, depending on the size of the candidate set and the desired ${t}_{\mathit{Response}}$.
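The two-subset strategy can be sketched as follows; note that merging two independently computed Top-5s is exact only for objectives that decompose over the merged set, otherwise it is the approximation described above (all names here are illustrative):

```python
import itertools

def first_partial_result(candidates, objective, k=10):
    """Split the candidate set into two disjoint halves, compute the
    Top-k/2 of each half exactly, and merge them into the first Top-k."""
    half = len(candidates) // 2
    first, second = candidates[:half], candidates[half:]
    top_a = max(itertools.combinations(first, k // 2), key=objective)
    top_b = max(itertools.combinations(second, k // 2), key=objective)
    return top_a + top_b
```

For a candidate set of size $n$, this costs $2\binom{n/2}{5}$ evaluations instead of $\binom{n}{10}$, which is why it can cover the whole candidate set and still respond quickly.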

**Recommendation II:** To provide means for estimating the **inherent uncertainty** of the current partial result and of the progressive computation, we have implemented an adaptive strategy that samples the whole design space with different selection sizes and process granularities to find suitable trade-offs. This strategy is implemented by defining three intervals for the process granularity and three intervals for the selection size, effectively obtaining nine possible configurations as their pairwise combinations (see Figure 5). For each of these nine configurations, we collected and averaged two measures:

**Mitigating PVA Challenges:** Our main design decision to counter the challenges was the use of a minimalistic visual interface that masks the complexity of the underlying computation and its parametrization. Its only visual feature is the Sankey diagram showing the current best result, while all other information is given in textual or numerical form so as not to distract the user and to keep possible fluctuations in the graphics to a minimum.

**Recommendation V:** The two parameters with the most influence on ${t}_{\mathit{Response}}$ are the size of the candidate set and the number of subsets into which the candidate set is partitioned. As the candidate set is chosen by the user, we have no control over it. Based on the benchmarks from Figure 5, we use an **adaptive parametrization** for the number of subsets, which does not subdivide if the candidate set contains $\le 19$ provinces, subdivides into two subsets for anything between 19 and 50 provinces, and into three subsets for $\ge 50$ provinces. This way, we can guarantee a ${t}_{\mathit{Response}}$ of 29 s, 34 s, and 18 s, respectively.
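The stated thresholds translate directly into a lookup (how the overlapping boundary cases at 19 and 50 provinces are resolved is not specified in the text; the sketch below picks one resolution):

```python
def num_subsets(candidate_set_size):
    """Adaptive parametrization: map the size of the user-chosen candidate
    set to the number of partitions, following the stated thresholds."""
    if candidate_set_size <= 19:
        return 1   # no subdivision,  t_Response <= 29 s
    if candidate_set_size < 50:
        return 2   # two subsets,     t_Response <= 34 s
    return 3       # three subsets,   t_Response <= 18 s
```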

**Recommendation VI:** To **minimize added visual complexity**, our PVA solution shows the uncertainty of the current result only in numerical form. We make this numerical uncertainty information easy to interpret by not just computing abstract quality/error metrics, but actually providing indicators of how good the current solution already is (first confidence value) and what could still be gained by waiting for a better result (second confidence value).
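One possible reading of the two indicators, sketched under the assumption that the set of already-stable provinces and an estimate of the optimal objective value are available from the progression (their estimation is not shown here, and the function name is hypothetical):

```python
def confidence_values(current_top, stable_provinces, current_f, estimated_optimal_f):
    """First value: fraction of the current Top-k that is already stable.
    Second value: how close the current objective value is to the
    estimated optimum (1.0 means nothing is left to gain by waiting)."""
    stability = len(set(current_top) & set(stable_provinces)) / len(current_top)
    quality = current_f / estimated_optimal_f
    return stability, quality
```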

**Recommendation IX:** While new results are continuously produced by our procedure, only results that are better than the current best result update the view. As a consequence, view updates arrive irregularly and unforeseeably, and there is not much we can do about it. At least the analysts know that any update is an improvement over the current view and that they do not have to worry about a good intermediate result being replaced by a worse one. To **smooth the visual changes** resulting from the transition to an updated result, the Sankey diagram shows not only the current Top-10, but all candidate provinces. This way, provinces do not appear and disappear, but merely move up to join the Top-10 or down if excluded again. This movement is animated to help the user follow the transition. To **stabilize the interaction**, the system automatically blocks the rendering of a new partial result while the user is interacting with the current one.
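This interaction-stabilizing behavior amounts to a small gate between the computation and the view (a sketch; the actual system's event handling is not described at this level of detail):

```python
class UpdateGate:
    """Withhold the rendering of new partial results while the user is
    interacting with the current one, then flush the latest deferred result."""
    def __init__(self):
        self.interacting = False
        self.pending = None
        self.displayed = None

    def submit(self, result):
        if self.interacting:
            self.pending = result      # defer: do not disturb the interaction
        else:
            self.displayed = result    # render immediately

    def begin_interaction(self):
        self.interacting = True

    def end_interaction(self):
        self.interacting = False
        if self.pending is not None:   # flush the newest withheld result
            self.displayed, self.pending = self.pending, None
```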

#### 6.4. User Feedback

## 7. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
HCI | Human–Computer Interaction |
IVA | Instantaneous Visual Analytics |
MDS | Multi-Dimensional Scaling |
MVA | Monolithic Visual Analytics |
PVA | Progressive Visual Analytics |
RoI | Return on Investment |
SIM | Subscriber Identity Module |
SNE | Stochastic Neighbor Embedding |
TIM | Telecom Italia Mobile |

## Appendix A. List of requirements

Requirement | Description | Source |
---|---|---|
$\mathbb{R}$Hel1 | Process interesting data early, so users can get satisfactory results quickly, halt processing, and move on to their next request | Hellerstein et al., 1999 (Preferential data delivery: online reordering) |
$\mathbb{R}$H2 | Enable monitoring the visualization and seeing what’s new each time a new increment of data is processed and loaded into the system | Hetzler et al., 2005 (Section 3.1) |
$\mathbb{R}$H3 | Allow the explicit control of updates arrival | Hetzler et al., 2005 (Section 3.2) |
$\mathbb{R}$H4 | Minimize the disruption to the analytic process flow and the interaction flow | Hetzler et al., 2005 (Section 3.2) |
$\mathbb{R}$H5 | Provide full interactivity for dynamic datasets | Hetzler et al., 2005 (Section 3.3) |
$\mathbb{R}$H6 | Provide dynamic update features | Hetzler et al., 2005 (Section 3.4) |
$\mathbb{R}$C7 | Allow users to communicate progressive samples to the system | Chandramouli et al., 2013 (Section 1.1) |
$\mathbb{R}$C8 | Allow efficient and deterministic query processing over progressive samples, without the system itself trying to reason about specific sampling strategies or confidence estimation | Chandramouli et al., 2013 (Section 1.1) |
$\mathbb{R}$F9 | Uncertainty visualization should be easy to interpret | Ferreira et al., 2014 (Design Goals) |
$\mathbb{R}$F10 | Visualizations should be consistent across tasks | Ferreira et al., 2014 (Design Goals) |
$\mathbb{R}$F11 | Maintain spatial stability of visualizations across sample size | Ferreira et al., 2014 (Design Goals) |
$\mathbb{R}$F12 | Minimize visual noise | Ferreira et al., 2014 (Design Goals) |
$\mathbb{R}$S13 | Managing the partial results in the visual interface should not interfere with the user’s cognitive workflow | Stolper et al., 2014 (Section 1) |
$\mathbb{R}$S14 | Produce increasingly meaningful partial results | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S15 | Allow users to focus the algorithm to subspaces of interest | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S16 | Allow users to ignore irrelevant subspaces | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S17 | Minimize distractions by not changing views excessively | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S18 | Provide cues to indicate where new results have been found by analytics | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S19 | Support an on-demand refresh when analysts are ready to explore the latest results | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$S20 | Provide an interface to specify where analytics should focus, as well as the portions of the problem space that should be ignored | Stolper et al., 2014 (Section 4.3) |
$\mathbb{R}$M21 | Provide feedback on the aliveness of the execution | Mühlbacher et al., 2014 (Section 3.1) |
$\mathbb{R}$M22 | Provide feedback on the absolute progress of the execution | Mühlbacher et al., 2014 (Section 3.1) |
$\mathbb{R}$M23 | Provide feedback on the relative progress of the execution | Mühlbacher et al., 2014 (Section 3.1) |
$\mathbb{R}$M24 | Generate structure-preserving intermediate results | Mühlbacher et al., 2014 (Section 3.2) |
$\mathbb{R}$M25 | Provide aggregated information | Mühlbacher et al., 2014 (Section 3.2) |
$\mathbb{R}$M26 | Provide feedback on the uncertainty of a result | Mühlbacher et al., 2014 (Section 3.2) |
$\mathbb{R}$M27 | Provide provenance information, including any meta-information concerning simplifications made for generating a partial result | Mühlbacher et al., 2014 (Section 3.2) |
$\mathbb{R}$M28 | Allow for execution control by cancellation | Mühlbacher et al., 2014 (Section 3.3) |
$\mathbb{R}$M29 | Allow altering the sequence of intermediate results through prioritization | Mühlbacher et al., 2014 (Section 3.3) |
$\mathbb{R}$M30 | Provide inner result control for steering a single ongoing computation before it eventually returns a final result | Mühlbacher et al., 2014 (Section 3.4) |
$\mathbb{R}$M31 | Provide outer result control to generate a result from multiple consecutive executions of a computation | Mühlbacher et al., 2014 (Section 3.4) |
$\mathbb{R}$T32 | Employ human time constants | Turkay et al., 2017 (Section 2.1, DR1) |
$\mathbb{R}$T33 | Employ online learning algorithms | Turkay et al., 2017 (Section 2.2, DR2) |
$\mathbb{R}$T34 | Employ an adaptive sampling mechanism (convergence & temporal constraints) | Turkay et al., 2017 (Section 2.2.2, DR3) |
$\mathbb{R}$T35 | Facilitate the immediate initiation of computations after user interaction | Turkay et al., 2017 (Section 2.3.1, DR4) |
$\mathbb{R}$T36 | Provide interaction mechanisms enabling management of the progression | Turkay et al., 2017 (Section 2.3.1, DR5) |
$\mathbb{R}$T37 | Design interaction taking into account fluctuations | Turkay et al., 2017 (Section 2.3.1, DR6) |
$\mathbb{R}$T38 | Provide interaction mechanisms to define structured investigation sequences | Turkay et al., 2017 (Section 2.3.2, DR7) |
$\mathbb{R}$T39 | Support the interpretation of the evolution of the results through suitable visualizations | Turkay et al., 2017 (Section 2.4.1, DR8) |
$\mathbb{R}$T40 | Inform analysts on the progress of computations and indicate the time-to-completion | Turkay et al., 2017 (Section 2.4.3, DR9) |
$\mathbb{R}$T41 | Inform analysts on the uncertainty in the computations and the way the computations develop | Turkay et al., 2017 (Section 2.4.3, DR10) |
$\mathbb{R}$B42 | Show the analysis pipeline | Badam et al., 2017 (Section 7.3) |
$\mathbb{R}$B43 | Support monitoring mode and exploration mode | Badam et al., 2017 (Section 7.3) |
$\mathbb{R}$B44 | Provide similarity anchors | Badam et al., 2017 (Section 7.3) |
$\mathbb{R}$B45 | Use consistently visualized quality measures | Badam et al., 2017 (Section 7.3) |

## References

- Stolper, C.; Perer, A.; Gotz, D. Progressive visual analytics: User-driven visual exploration of in-progress analytics. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 1653–1662.
- Fekete, J.D.; Primet, R. Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis. arXiv, 2016; arXiv:1607.05162.
- Schulz, H.J.; Angelini, M.; Santucci, G.; Schumann, H. An Enhanced Visualization Process Model for Incremental Visualization. IEEE Trans. Vis. Comput. Graph. **2016**, 22, 1830–1842.
- Glueck, M.; Khan, A.; Wigdor, D. Dive in! Enabling progressive loading for real-time navigation of data visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Toronto, ON, Canada, 26 April–1 May 2014; Schmidt, A., Grossman, T., Eds.; ACM: New York, NY, USA, 2014; pp. 561–570.
- Mühlbacher, T.; Piringer, H.; Gratzl, S.; Sedlmair, M.; Streit, M. Opening the black box: Strategies for increased user involvement in existing algorithm implementations. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 1643–1652.
- Rosenbaum, R.; Schumann, H. Progressive refinement: more than a means to overcome limited bandwidth. In Proceedings of the Conference on Visualization and Data Analysis (VDA), San Jose, CA, USA, 18–22 January 2009; Börner, K., Park, J., Eds.; SPIE: Bellingham, WA, USA, 2009; p. 72430I.
- Turkay, C.; Kaya, E.; Balcisoy, S.; Hauser, H. Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis. IEEE Trans. Vis. Comput. Graph. **2017**, 23, 131–140.
- Fisher, D.; Popov, I.; Drucker, S.M.; Schraefel, M. Trust Me, I’m Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Austin, TX, USA, 5–10 May 2012; Konstan, J.A., Chi, E.H., Höök, K., Eds.; ACM: New York, NY, USA, 2012; pp. 1673–1682.
- Song, D.; Golin, E. Fine-grain visualization algorithms in dataflow environments. In Proceedings of the IEEE Conference on Visualization (VIS), San Jose, CA, USA, 25–29 October 1993; Nielson, G.M., Bergeron, D., Eds.; IEEE: Piscataway, NJ, USA, 1993; pp. 126–133.
- Hellerstein, J.M.; Avnur, R.; Chou, A.; Hidber, C.; Olston, C.; Raman, V.; Roth, T.; Haas, P.J. Interactive Data Analysis: The Control Project. IEEE Comput. **1999**, 32, 51–59.
- Frey, S.; Sadlo, F.; Ma, K.L.; Ertl, T. Interactive Progressive Visualization with Space-Time Error Control. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 2397–2406.
- Kim, H.; Choo, J.; Lee, C.; Lee, H.; Reddy, C.K.; Park, H. PIVE: Per-Iteration Visualization Environment for Real-time Interactions with Dimension Reduction and Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 1–9 February 2017; Singh, S., Markovitch, S., Eds.; AAAI: Menlo Park, CA, USA, 2017; pp. 1001–1009.
- Moritz, D.; Fisher, D.; Ding, B.; Wang, C. Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Denver, CO, USA, 6–11 May 2017; Lampe, C., Schraefel, M.C., Hourcade, J.P., Appert, C., Wigdor, D., Eds.; ACM: New York, NY, USA, 2017; pp. 2904–2915.
- Kwon, B.C.; Verma, J.; Haas, P.J.; Demiralp, Ç. Sampling for Scalable Visual Analytics. IEEE Comput. Graph. Appl. **2017**, 37, 100–108.
- Zgraggen, E.; Galakatos, A.; Crotty, A.; Fekete, J.D.; Kraska, T. How Progressive Visualizations Affect Exploratory Analysis. IEEE Trans. Vis. Comput. Graph. **2017**, 23, 1977–1987.
- Lins, L.; Klosowski, J.T.; Scheidegger, C. Nanocubes for Real-Time Exploration of Spatiotemporal Datasets. IEEE Trans. Vis. Comput. Graph. **2013**, 19, 2456–2465.
- Nocke, T.; Heyder, U.; Petri, S.; Vohland, K.; Wrobel, M.; Lucht, W. Visualization of Biosphere Changes in the Context of Climate Change. In Proceedings of the International Conference on IT and Climate Change (ITCC), Berlin, Germany, 25–26 September 2008; Wohlgemuth, V., Ed.; Trafo Wissenschaftsverlag: Berlin, Germany, 2009; pp. 29–36.
- Sacha, D.; Stoffel, A.; Stoffel, F.; Kwon, B.C.; Ellis, G.; Keim, D.A. Knowledge Generation Model for Visual Analytics. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 1604–1613.
- Badam, S.K.; Elmqvist, N.; Fekete, J.D. Steering the Craft: UI Elements and Visualizations for Supporting Progressive Visual Analytics. Comput. Graph. Forum **2017**, 36, 491–502.
- Marai, G.E. Activity-Centered Domain Characterization for Problem-Driven Scientific Visualization. IEEE Trans. Vis. Comput. Graph. **2018**, 24, 913–922.
- Amar, R.A.; Stasko, J.T. Knowledge Precepts for Design and Evaluation of Information Visualizations. IEEE Trans. Vis. Comput. Graph. **2005**, 11, 432–442.
- Fink, A. Conducting Research Literature Reviews: From the Internet to Paper, 4th ed.; SAGE Publishing: Thousand Oaks, CA, USA, 2014.
- Hellerstein, J.M.; Haas, P.J.; Wang, H.J. Online Aggregation. SIGMOD Record **1997**, 26, 171–181.
- Ding, X.; Jin, H. Efficient and Progressive Algorithms for Distributed Skyline Queries over Uncertain Data. IEEE Trans. Knowl. Data Eng. **2012**, 24, 1448–1462.
- Chandramouli, B.; Goldstein, J.; Quamar, A. Scalable Progressive Analytics on Big Data in the Cloud. Proc. VLDB Endow. **2013**, 6, 1726–1737.
- Im, J.F.; Villegas, F.G.; McGuffin, M.J. VisReduce: Fast and responsive incremental information visualization of large datasets. In Proceedings of the IEEE International Conference on Big Data (BigData), Silicon Valley, CA, USA, 6–9 October 2013; Hu, X., Lin, T.Y., Raghavan, V., Wah, B., Baeza-Yates, R., Fox, G., Shahabi, C., Smith, M., Yang, Q., Ghani, R., et al., Eds.; IEEE: Piscataway, NJ, USA, 2013; pp. 25–32.
- Procopio, M.; Scheidegger, C.; Wu, E.; Chang, R. Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time. In Proceedings of the Workshop on Data Systems for Interactive Analysis (DSIA), Phoenix, AZ, USA, 1–2 October 2017.
- Hetzler, E.G.; Crow, V.L.; Payne, D.A.; Turner, A.E. Turning the bucket of text into a pipe. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis), Minneapolis, MN, USA, 23–25 October 2005; Stasko, J., Ward, M.O., Eds.; IEEE: Piscataway, NJ, USA, 2005; pp. 89–94.
- Wong, P.C.; Foote, H.; Adams, D.; Cowley, W.; Thomas, J. Dynamic visualization of transient data streams. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis), Seattle, WA, USA, 20–21 October 2003; Munzner, T., North, S., Eds.; IEEE: Piscataway, NJ, USA, 2003; pp. 97–104.
- García, I.; Casado, R.; Bouchachia, A. An Incremental Approach for Real-Time Big Data Visual Analytics. In Proceedings of the IEEE International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), Vienna, Austria, 22–24 August 2016; Younas, M., Awan, I., Haddad, J.E., Eds.; IEEE: Piscataway, NJ, USA, 2016; pp. 177–182.
- Crouser, R.J.; Franklin, L.; Cook, K. Rethinking Visual Analytics for Streaming Data Applications. IEEE Int. Comput. **2017**, 21, 72–76.
- Angelini, M.; Corriero, R.; Franceschi, F.; Geymonat, M.; Mirabelli, M.; Remondino, C.; Santucci, G.; Stabellini, B. A Visual Analytics System for Mobile Telecommunication Marketing Analysis. In Proceedings of the International EuroVis Workshop on Visual Analytics (EuroVA), Groningen, The Netherlands, 6–7 June 2016; Andrienko, N., Sedlmair, M., Eds.; Eurographics Association: Aire-la-Ville, Switzerland, 2016.
- Elmqvist, N.; Vande Moere, A.; Jetter, H.C.; Cernea, D.; Reiterer, H.; Jankun-Kelly, T.J. Fluid interaction for information visualization. Inform. Vis. **2011**, 10, 327–340.
- Shneiderman, B. Response time and display rate in human performance with computers. ACM Comput. Surv. **1984**, 16, 265–285.
- Card, S.; Robertson, G.; Mackinlay, J. The information visualizer, an information workspace. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), New Orleans, LA, USA, 27 April–2 May 1991; Robertson, S.P., Olson, G.M., Olson, J.S., Eds.; ACM: New York, NY, USA, 1991; pp. 181–188.
- Liu, Z.; Heer, J. The Effects of Interactive Latency on Exploratory Visual Analysis. IEEE Trans. Vis. Comput. Graph. **2014**, 20, 2122–2131.
- Qu, H.; Zhou, H.; Wu, Y. Controllable and Progressive Edge Clustering for Large Networks. In Graph Drawing. GD 2006. Lecture Notes in Computer Science; Kaufmann, M., Wagner, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 399–404.
- Jo, J.; Seo, J.; Fekete, J.D. A Progressive k-d tree for Approximate k-Nearest Neighbors. In Proceedings of the Workshop on Data Systems for Interactive Analysis (DSIA), Phoenix, AZ, USA, 1–2 October 2017; Chang, R., Scheidegger, C., Fisher, D., Heer, J., Eds.; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5.
- Fisher, D. Big Data Exploration Requires Collaboration Between Visualization and Data Infrastructures. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA), San Francisco, CA, USA, 26 June–1 July 2016; Binnig, C., Fekete, A., Nandi, A., Eds.; ACM: New York, NY, USA, 2016; pp. 16:1–16:5.
- Crotty, A.; Galakatos, A.; Zgraggen, E.; Binnig, C.; Kraska, T. The Case for Interactive Data Exploration Accelerators (IDEAs). In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA), San Francisco, CA, USA, 26 June–1 July 2016; Binnig, C., Fekete, A., Nandi, A., Eds.; ACM: New York, NY, USA, 2016; pp. 11:1–11:6.
- Angelini, M.; Santucci, G. On Visual Stability and Visual Consistency for Progressive Visual Analytics. In Proceedings of the International Conference on Information Visualization Theory and Applications (IVAPP), Porto, Portugal, 27 February–1 March 2017; Linsen, L., Telea, A., Braz, J., Eds.; SciTePress: Setúbal, Portugal, 2017; pp. 335–341.
- Ferreira, N.; Fisher, D.; König, A.C. Sample-oriented task-driven visualizations: Allowing users to make better, more confident decisions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France, 27 April–2 May 2014; Schmidt, A., Grossman, T., Eds.; ACM: New York, NY, USA, 2014; pp. 571–580.
- Angelini, M.; Santucci, G. Modeling Incremental Visualizations. In Proceedings of the International EuroVis Workshop on Visual Analytics (EuroVA), Leipzig, Germany, 17–18 June 2013; Pohl, M., Schumann, H., Eds.; Eurographics Association: Aire-la-Ville, Switzerland, 2013; pp. 13–17.
- Frishman, Y.; Tal, A. Online Dynamic Graph Drawing. IEEE Trans. Vis. Comput. Graph. **2008**, 14, 727–740.
- Wu, Y.; Xu, L.; Chang, R.; Hellerstein, J.M.; Wu, E. Making Sense of Asynchrony in Interactive Data Visualizations. arXiv, 2018; arXiv:1806.01499.
- Rahman, S.; Aliakbarpour, M.; Kong, H.K.; Blais, E.; Karahalios, K.; Parameswaran, A.; Rubinfield, R. I’ve Seen “Enough”: Incrementally Improving Visualizations to Support Rapid Decision Making. Proc. VLDB Endow. **2017**, 10, 1262–1273.
- Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. Optimization by Simulated Annealing. Science **1983**, 220, 671–680.
- Kobourov, S.G. Force-Directed Drawing Algorithms. In Handbook of Graph Drawing and Visualization; Tamassia, R., Ed.; CRC Press: Boca Raton, FL, USA, 2013; Chapter 12; pp. 383–408.
- Pezzotti, N.; Höllt, T.; van Gemert, J.; Lelieveldt, B.P.; Eisemann, E.; Vilanova, A. DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks. IEEE Trans. Vis. Comput. Graph. **2018**, 24, 98–108.
- Zhao, H.; Zhang, H.; Liu, Y.; Zhang, Y.; Zhang, X. Pattern Discovery: A Progressive Visual Analytic Design to Support Categorical Data Analysis. J. Vis. Lang. Comput. **2017**, 43, 42–49.
- Boukhelifa, N.; Perrin, M.E.; Huron, S.; Eagan, J. How Data Workers Cope with Uncertainty: A Task Characterisation Study. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Denver, CO, USA, 6–11 May 2017; Lampe, C., Schraefel, M.C., Hourcade, J.P., Appert, C., Wigdor, D., Eds.; ACM: New York, NY, USA, 2017; pp. 3645–3656.
- Pham, D.T.; Dimov, S.S.; Nguyen, C.D. An Incremental k-means Algorithm. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. **2004**, 218, 783–795.
- Williams, M.; Munzner, T. Steerable, Progressive Multidimensional Scaling. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis), Austin, TX, USA, 10–12 October 2004; Ward, M.O., Munzner, T., Eds.; IEEE: Piscataway, NJ, USA, 2004; pp. 57–64.
- Pezzotti, N.; Lelieveldt, B.; van der Maaten, L.; Hollt, T.; Eisemann, E.; Vilanova, A. Approximated and user steerable tSNE for progressive visual analytics. IEEE Trans. Vis. Comput. Graph. **2017**, 23, 1739–1752.
- Rosenbaum, R.; Hamann, B. Progressive Presentation of Large Hierarchies Using Treemaps. In Advances in Visual Computing; Bebis, G., Boyle, R., Parvin, B., Koracin, D., Kuno, Y., Wang, J., Pajarola, R., Lindstrom, P., Hinkenjann, A., Encarnação, M.L., et al., Eds.; Number 5876 in Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 71–80.
- Heinrich, J.; Bachthaler, S.; Weiskopf, D. Progressive Splatting of Continuous Scatterplots and Parallel Coordinates. Comput. Graph. Forum **2011**, 30, 653–662.
- Rosenbaum, R.; Zhi, J.; Hamann, B. Progressive parallel coordinates. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis), Songdo, Korea, 28 February–2 March 2012; Hauser, H., Kobourov, S., Qu, H., Eds.; IEEE: Piscataway, NJ, USA, 2012; pp. 25–32.

**Figure 1.** Instantaneous, Monolithic, and Progressive Visual Analytics described by their temporal characteristics. Dotted lines indicate wait times in which the analysis is stalled. Note that in the case of PVA, partial results of different quality can be used for different stages of visual analysis, roughly aligning with the different levels or loops of the knowledge generation model for visual analytics proposed by Sacha et al. [18].

**Figure 2.** The requirements from Appendix A linked to the different aspects of PVA: The main diagonal lists requirements that relate to a single PVA aspect. Other cells list requirements concerning two PVA aspects. The shorthands relate to the following publications: $\mathbb{R}$Hel [10], $\mathbb{R}$H [28], $\mathbb{R}$C [25], $\mathbb{R}$F [42], $\mathbb{R}$S [1], $\mathbb{R}$M [5], $\mathbb{R}$T [7], $\mathbb{R}$B [19].

**Figure 3.** The scatterplot allows selecting a group of provinces that captures some high-level campaign scenarios. In the shown example, the analyst has selected provinces characterized by high income and a market penetration close to the median. The objective is to promote additional non-essential features (e.g., faster network services or more data volume) that will likely be accepted by existing TIM customers, but that also have a chance to win over new customers. As the potential customers need to be able to pay for these additional features, the analyst purposefully chose high-income provinces as a suitable subset from which to extract the Top-10 provinces.
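The selection criterion behind this scenario (high income, market penetration near the median) can be sketched in a few lines of Python. This is a minimal illustration only; the thresholds, field layout, and data values are our own assumptions, not the data or code used in the paper:

```python
import statistics

# Hypothetical province records: (name, income, market_penetration)
provinces = [
    ("A", 32000, 0.41), ("B", 28500, 0.55), ("C", 35500, 0.48),
    ("D", 21000, 0.52), ("E", 30800, 0.50), ("F", 26000, 0.73),
]

median_pen = statistics.median(p[2] for p in provinces)

# Keep high-income provinces whose penetration lies close to the median
selection = [
    p for p in provinces
    if p[1] >= 30000                     # income threshold (assumed)
    and abs(p[2] - median_pen) <= 0.05   # "close to the median" band (assumed)
]
print([p[0] for p in selection])  # → ['C', 'E']
```

In the actual system this filtering is done interactively by brushing the scatterplot; the sketch only makes the underlying predicate explicit.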

**Figure 4.** The currently best Top-10 set of provinces is highlighted at the top of the parallel-Sankey plot. The user can easily distinguish provinces above the market penetration median (yellow) and those below (blue). The numerical quality indicators above the plot help the analyst judge the current result: the first confidence value of $0.85$ means that between 8 and 9 provinces of the current Top-10 are already stable. The second confidence value shows that the current result is at $0.673$ of the optimal result—i.e., if the analyst waits for the remaining two provinces of the Top-10 to stabilize, the estimated gain over the current objective function $F=8.648$ will still be more than $30\%$. Put in relation to the small fraction of all combinations tried so far (only 2/10,000), the current result already looks quite good, but leaves room for further improvement.
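The two quality indicators described in this caption reduce to simple ratios. The following sketch shows one plausible way to compute them; the function names and the estimated-optimum value are our own illustrative assumptions, not the authors' implementation:

```python
def stability_confidence(current_top_k, stable_items):
    """Fraction of the current Top-k result already known to be stable."""
    return sum(1 for p in current_top_k if p in stable_items) / len(current_top_k)

def optimality_confidence(current_value, estimated_optimum):
    """Ratio of the current objective value to the estimated optimal value."""
    return current_value / estimated_optimum

# Hypothetical values mirroring the figure: 8 of 10 provinces stable,
# current objective F = 8.648 against an assumed estimated optimum.
top10 = [f"province_{i}" for i in range(10)]
stable = set(top10[:8])

print(stability_confidence(top10, stable))                 # → 0.8
print(round(optimality_confidence(8.648, 12.853), 3))      # → 0.673
```

As the progression advances, both ratios approach 1, which is what lets the analyst decide when a partial result is good enough to act on.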

**Figure 5.** Exploring different PVA configurations of chunking data and process. The y axis represents the province selection size, ranging from 10 to 110 provinces; the x axis represents the process, ranging from (1) a monolithic process computing $\binom{n}{10}$ on the whole selection, via (2) a process splitting the selection into 2 chunks, computing $\binom{n}{5}$ to produce an early partial result and then $\binom{n}{10}$ on the whole selection in a longer time, to (3) a process splitting the selection into 3 chunks and composed of the sequence of computations $\binom{n}{3}$, $\binom{n}{5}$, $\binom{n}{10}$: producing an early partial result, refining it using two chunks, and then computing the optimal solution on the whole selection. The ${t}_{\mathit{Response}}$ values report the time span needed to produce the first meaningful result for each configuration. The confidence measures TP and FR indicate, respectively, the mean ratio between the estimated objective function and the optimal one, and the mean proportion of provinces belonging to both the estimation and the optimum. The orange tiles represent the strategies selected in the current implementation: they respect the time constraints and minimize errors.
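The chunking schedules compared in this figure can be expressed as a small generator that yields increasingly refined partial results. This is a hedged sketch of the general pattern only, with assumed names and a toy objective, not the system's actual optimization code:

```python
from itertools import combinations

def progressive_top_set(provinces, score, schedule):
    """Yield increasingly refined 'best subset' partial results.

    `schedule` is a list of (chunk_fraction, subset_size) pairs, e.g.
    [(1/3, 3), (2/3, 5), (1.0, 10)]: evaluate all size-3 subsets of the
    first third of the selection, then size-5 subsets of two thirds,
    and finally size-10 subsets of the full selection.
    """
    for fraction, k in schedule:
        # Grow the chunk with each step; never smaller than the subset size.
        chunk = provinces[: max(k, int(len(provinces) * fraction))]
        best = max(combinations(chunk, k), key=score)
        yield k, best

# Toy example: 12 "provinces" scored by their plain sum.
provinces = list(range(12))
for k, best in progressive_top_set(provinces, sum, [(1/3, 3), (2/3, 5), (1.0, 10)]):
    print(k, best)
```

Each yielded result corresponds to one column of the figure: the earlier, smaller computations trade optimality for a short ${t}_{\mathit{Response}}$, while the final step computes the exact optimum on the whole selection.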

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite


Angelini, M.; Santucci, G.; Schumann, H.; Schulz, H.-J. A Review and Characterization of Progressive Visual Analytics. *Informatics* **2018**, *5*, 31.
https://doi.org/10.3390/informatics5030031
