Visual Analytics for Predicting Disease Outcomes Using Laboratory Test Results

Laboratory tests play an essential role in the early and accurate diagnosis of diseases. In this paper, we propose SUNRISE, a visual analytics system that allows the user to interactively explore the relationships between laboratory test results and a disease outcome. SUNRISE integrates frequent itemset mining (i.e., the Eclat algorithm) with extreme gradient boosting (XGBoost) to develop more specialized and accurate prediction models. It also includes interactive visualizations that allow the user to interact with the model and track the decision process. SUNRISE helps the user probe the prediction model by generating input examples and observing how the model responds. Furthermore, it improves the user's confidence in the generated predictions and provides them with the means to validate the model's response by illustrating the underlying working mechanism of the prediction models through visual representations. SUNRISE offers a balanced distribution of processing load through the seamless integration of analytical methods with interactive visual representations to support the user's cognitive tasks. We demonstrate the usefulness of SUNRISE through a usage scenario of exploring the association between laboratory test results and acute kidney injury, using large provincial healthcare databases from Ontario, Canada.


Introduction
Accurate and early clinical diagnoses play an important role in the successful treatment of diseases. Every disease stems from or causes changes at the molecular and cellular level, and some of these changes can be detected through changes in urine and blood parameter values [1]. Patterns within laboratory test results may contain additional information relevant to patient care that is not detected or appreciated by even the most experienced physicians [2,3]. Laboratories typically report test results as individual categorical and numerical values, but some individual results, particularly when studied in isolation, may have limited clinical value. Physicians often integrate several individual tests from a patient and interpret them in the context of medical knowledge and experience to use them for disease diagnosis and management. Furthermore, patients might have many individual tests, spanning years. As the number of parameters that laboratories measure grows, so does the chance of overlooking important patterns. While manual test interpretation remains the routine procedure in most cases, data analytics offers the potential to improve the diagnostic value of laboratory tests [4]. Several studies have been conducted to develop risk prediction models using laboratory test data, and some of these models were developed using data analytics techniques [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19]. These studies rely solely on performance metrics, such as high accuracy scores, to assess model performance. Furthermore, due to their unclear working mechanisms and incomprehensible functions, the analytics techniques used in these studies are often treated by users as black boxes. The question therefore arises whether the user can trust these analytics techniques, especially in medical settings, where the model makes critical decisions about patients [20,21]. One way to increase a model's interpretability is to involve the user in the analytics process through an integrated approach called visual analytics [22,23].
Visual Analytics (VA) is an emerging research discipline that integrates data analytics and interactive visualizations [22,24]. It has the potential to enhance the user's confidence in the prediction results by improving their understanding of the modeling process and output [25][26][27]. VA is capable of illustrating the model's rationale, presenting the prediction result, and providing the user with the means to validate the model's response. In addition, VA allows the user to access, modify, and restructure the displayed data as well as guide the data analytics techniques. This, in turn, sets off internal computational reactions that result in further analytical processes. VA aims to make the best possible use of the massive amounts of data stored in electronic health records (EHRs) by combining the strength of analytical processes, the user's visual perception, and analysis capabilities of the model [28][29][30][31]. It enhances the user's ability to accomplish data-driven tasks by allowing them to analyze EHRs in ways that would be difficult to do otherwise [32,33].
The goal of this paper is to show how VA systems can be developed systematically to create disease prediction models using laboratory test result data. To this end, we present a novel proof-of-concept system called SUNRISE (viSUal aNalytics for exploring the association between laboRatory test results and a dIsease outcome using xgbooSt and Eclat). SUNRISE allows healthcare providers to examine associations between different groups of laboratory test results and a specific disease by tweaking the test values and inspecting how the predictive model responds. It aims to support the user to go beyond judging predictive models based on their performance measures. Instead of just relying on the evaluation metrics, SUNRISE helps the user better understand how predictions are generated by illustrating their underlying working mechanisms. While several VA systems have been developed for other areas in healthcare [30,31,[34][35][36][37][38][39][40][41][42][43][44][45][46], SUNRISE is novel in that it incorporates the extreme gradient boosting technique (i.e., XGBoost), frequent itemset mining (i.e., Eclat algorithm), visualization, and human-data interaction in an integrated manner. We demonstrate the usefulness of SUNRISE through a case study of exploring associations between laboratory test results and acute kidney injury (AKI) using large provincial healthcare databases from Ontario, Canada stored at ICES (ICES is an independent, non-profit, world-leading research organization that uses population-based health and social data to produce knowledge on a broad range of healthcare issues).
The rest of this paper is organized as follows. In Section 2, we provide a summary of the conceptual background that is required to understand the design of SUNRISE. In Section 3, we explain the methods used for the design of SUNRISE by providing a description of its structure and modules. In Section 4, we present a usage scenario of SUNRISE to demonstrate the potential utility of the system. In Section 5, we discuss the usefulness and limitations of the proposed VA system. Finally, Section 6 concludes the paper.

Background
This section presents the necessary concepts for understanding the design of SUNRISE. First, we describe the components of visual analytics. Afterwards, we briefly describe the machine learning techniques used in this paper.

Visual Analytics
Visual analytics is a multidisciplinary field that helps the user gain insights from data via the integration of analytics techniques and interactive visualization with human judgment [47]. It can support the execution of data-driven cognitive tasks such as sensemaking, knowledge discovery, and decision-making, to name a few [32,48,49]. The primary challenge these tasks present is that the user needs to rapidly analyze, interpret, compare, and contrast large amounts of information. VA systems are capable of providing cognitive and computational assistance to the user in performing these cognitive tasks by combining machine learning techniques, analytical processes, visualizations, and various interaction mechanisms [50,51]. In summary, VA is composed of two integrated modules: an analytics module and an interactive visualization module [49,52].
The analytics module combines machine learning with data processing techniques to reduce the cognitive load of the user when performing data-intensive tasks [33,51,[53][54][55]. The analytics module is technology-independent, and it includes the use of processing techniques and data mining algorithms that best fit the needs of a domain. This module is composed of three primary steps: pre-processing, transformation, and analysis [56,57]. In the pre-processing step, the raw data retrieved from multiple sources gets pre-processed. This includes tasks such as cleaning, integration, fusion, and synthesis [57]. Then, the pre-processed data gets transformed into forms that are more appropriate for analysis. Examples of tasks that can be integrated into this stage are feature construction, normalization, aggregation, and discretization [57]. Finally, in the analysis stage, different data mining algorithms and machine learning techniques are applied to discover useful, unknown patterns in the data. Despite all these benefits, most of these computational techniques are treated as black-box models and are not developed with interpretability constraints. VA can expose the underlying working mechanisms of these models to make them more trustworthy, informative, and easier to understand through interactive visualization.
Interactive visualization in VA involves mapping processed and derived data from the analytics module to visual structures [49,52]. It allows the user to interactively control and validate the analytical processes towards better interpretability and performance. It provides the user with new analytical possibilities that can be utilized in an iterative manner [58]. In the context of VA, these iterations can be regarded as discourses between the user and the VA. This back-and-forth communication supports the user by distributing the processing load between the user and the VA system during their analysis and exploration of the data [49,59,60].

Machine Learning Techniques
In this section, we provide a brief overview of the machine learning techniques used in this paper.

Frequent Itemset Mining (Eclat)
Frequent itemset mining, which was first introduced by Agrawal and Srikant [61], is a task of discovering features that frequently appear together in a database. Although frequent itemset mining was initially proposed to find groups of items that frequently co-occur in transactions made by customers, it is now viewed as a general mining task that can be applied in many other domains, such as image classification [62], bioinformatics [63], network traffic analysis [64,65], customer reviews analysis [66], activity monitoring [67], and disease prediction [68,69], to name just a few.
Frequent itemset mining can be formally defined as follows. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. A transactional database $T$ includes a set of transactions $\{t_1, t_2, \ldots, t_n\}$, where every transaction is a set of items ($t_i \subseteq I$) that can be identified by a unique transaction identifier (TID). An itemset $x$ is a collection of items, and it can be characterized by a notion called the support value, $sup(x)$, defined as the ratio of the number of transactions in $T$ that contain $x$ to the total number of transactions in $T$. It shows the frequency of appearance of an itemset in the database. For example, if the itemset $\{a, b\}$ appears in 3 of 10 transactions, then $sup(\{a, b\}) = 0.3$. An itemset is considered frequent if its support value is greater than or equal to a minimum support threshold (minsup) defined by the user. The task of frequent itemset mining consists of extracting all frequent itemsets from database $T$, given a minimum support threshold. Several techniques have been proposed to address this task; one of the most common is the Eclat algorithm.
Eclat [70] is a depth-first search approach that uses a vertical database format. A vertical database format stores, for each item, the list of transactions in which it appears (i.e., its tid-list, denoted $tid(x)$ for itemset $x$). The main benefit of this format is that the tid-list of an itemset can be obtained by simply intersecting the tid-lists of its constituent items, without requiring a full scan of the dataset. The main idea of Eclat is to use these tid-list intersections to obtain the support value of an itemset through the property $sup(x) = |tid(x)|/n$. The algorithm first scans the dataset to obtain all frequent itemsets with $k$ items; it then generates all candidate itemsets with $k + 1$ items from the frequent $k$-itemsets. In the next step, it obtains all frequent $(k + 1)$-itemsets by discarding the non-frequent candidates. It repeats these steps until no further candidate itemsets can be generated.
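As an illustration of how this mining step is typically invoked, the following is a minimal sketch using the "arules" R package (the same package used later in this paper) on a toy transaction database; the transactions and support threshold are invented for demonstration purposes only.

# Minimal sketch: frequent itemset mining with Eclat via the "arules"
# package on a toy transaction database of laboratory tests.
library(arules)

txns <- list(
  c("SCr", "SK", "SCl"),
  c("SCr", "SK"),
  c("SCr", "SCl", "HGB"),
  c("SK", "SCl"),
  c("SCr", "SK", "SCl", "HGB")
)
trans <- as(txns, "transactions")

# All itemsets appearing in at least 40% of transactions (2 of 5 here).
freq <- eclat(trans, parameter = list(supp = 0.4, maxlen = 4))
inspect(sort(freq, by = "support"))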

Extreme Gradient Boosting
Extreme Gradient Boosting (i.e., XGBoost) belongs to a class of learning algorithms that aim to create a strong classifier by combining many "weak" classifiers, namely boosting techniques [71]. XGBoost is chosen due to its scalability, excellent performance, and efficient training speed [72][73][74]. This technique is an enhancement of the gradient boosting decision tree, and it is used for both regression and classification problems [75].
The idea of XGBoost is to build decision trees sequentially such that each subsequent tree seeks to reduce the residuals of the previous trees. At each iteration, the tree that grows next in the sequence learns from its predecessors by fitting a new model to the last predicted residuals and then minimizing the loss when adding the latest prediction. XGBoost adds an additional custom regularization to the loss function to establish the objective function.
The objective function can be written as:

$$Obj = \sum_{i=1}^{n} L(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k) \quad (1)$$

$L$ is the loss function that measures how well the model fits the training data, and $\Omega$ represents the regularization term that measures the model's complexity. For a training data set with $n$ samples, the model is given by the sum of $K$ trees:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \quad (2)$$

where $x_i$ is the independent variable, $\hat{y}_i$ is the predicted value corresponding to the dependent variable $y_i$, $f_k$ represents the tree structure, and $\mathcal{F}$ is the collection of all possible trees. Since the model is additive, we can write the prediction value at iteration $t$ as:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \quad (3)$$

Then, the objective function at iteration $t$ can be defined as:

$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \quad (4)$$

where $n$ represents the number of samples. Chen et al. [71] define the regularization term as:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2 \quad (5)$$

where $\gamma$ is the minimum loss reduction required to make a further split at a terminal node of the tree, $T$ is the number of terminal nodes in the tree, $\lambda$ is the regularization parameter, and $w$ represents the vector of scores on the terminal nodes.

We can re-write the objective function as:

$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \gamma T + \frac{1}{2} \lambda \|w\|^2 \quad (6)$$

At each iteration, a tree that optimizes the objective function defined in Equation (6) is created. In order to optimize this function, the second-order Taylor expansion of the loss function is taken:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \lambda \|w\|^2 \quad (7)$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss function, and constant terms have been dropped. By solving this equation, the optimal value of $w_j$ (the weight of the $j$-th terminal node for a given tree structure) can be calculated as:

$$w_j^* = -\frac{G_j}{H_j + \lambda} \quad (8)$$

where $G_j$ and $H_j$ are defined as:

$$G_j = \sum_{i \in I_j} g_i \quad (9)$$

$$H_j = \sum_{i \in I_j} h_i \quad (10)$$

where $I_j$ represents all the samples assigned to the $j$-th terminal node of the tree. Now that we have a way to learn the weights for a given tree structure, the next step is to learn the structure of the tree itself. A set of candidate splits is proposed for each node, and the one that minimizes the loss function is selected. Substituting the optimal weights back into the objective gives the criterion that we seek to minimize to find the optimal split in the tree:

$$\widetilde{Obj}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \quad (11)$$

Equivalently, we seek the split that maximizes the gain:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \quad (12)$$

where $L$ and $R$ index the left and right child nodes of the candidate split. Given an input, each tree identifies a root-to-leaf path (i.e., a decision path), which determines the prediction generated by that tree. If we assume the decision path is composed of $M$ non-leaf nodes $d(p) = \{d_1, \ldots, d_m, \ldots, d_M\}$, the path can be represented as:

$$p = \bigwedge_{m=1}^{M} \left( f_{d_m} \otimes \tau_{d_m} \right) \quad (13)$$

where $f_{d_m}$ is the feature at node $d_m$, $\tau_{d_m}$ is the corresponding threshold used to split node $d_m$ into two child nodes, and $\otimes \in \{\leq, >\}$ represents the boolean condition at each node $d_m$ [76]. The final prediction is the weighted sum of the predictions of the individual trees.
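As a brief worked example of Equations (8) and (12), with invented values: suppose a candidate split sends samples with $G_L = -1.5$ and $H_L = 2$ to the left child and $G_R = 0.5$ and $H_R = 1$ to the right, with $\lambda = 1$ and $\gamma = 0.1$. Then

$$Gain = \frac{1}{2}\left[\frac{(-1.5)^2}{2+1} + \frac{(0.5)^2}{1+1} - \frac{(-1.0)^2}{3+1}\right] - 0.1 = \frac{1}{2}\left[0.75 + 0.125 - 0.25\right] - 0.1 = 0.2125,$$

so the split yields a positive gain and would be kept, with leaf weights $w_L^* = -(-1.5)/(2+1) = 0.5$ and $w_R^* = -0.5/(1+1) = -0.25$ by Equation (8).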

Materials and Methods
In this section, we explain the methods used to design SUNRISE. In Section 3.1, we describe the design process and participants. Then, in Section 3.2, we briefly explain how the overall system works. Sections 3.3 and 3.4 describe the analytics and interactive visualization modules of SUNRISE, respectively.

Design Process and Participants
We adopted a participatory design approach in the development of SUNRISE. The participatory design approach helps with a better understanding of EHR-driven tasks from the perspective of the healthcare providers. It helps generate design solutions, collect feedback iteratively, and thus, it is conducive to the continuous improvement of our proposed VA system such that it can meet the needs and expectations of the healthcare providers [77][78][79]. Participatory design is an iterative group effort that requires all the stakeholders (e.g., end-users, partners, or customers) to work together to ensure the end product meets their needs and expectations [80]. Several computer scientists, data scientists, an epidemiologist, and a clinician-scientist were involved in the conceptualization, design, and evaluation of SUNRISE. It is critical to enhance the communication between all the members of the design team since healthcare experts might have a limited understanding of the technical background of the analytical processes, and medical terms might not be very comprehensible to the members of the team with a technical background. In light of this, we asked healthcare providers to provide us with their feedback on different design decisions and performed formative evaluations at every level of the design process. In our collaboration with healthcare providers, we discovered that they want SUNRISE to enable them to perform two essential tasks: (1) to examine the relationship between different groups of laboratory test results and the disease and (2) to investigate the prediction result and track the decision path to determine how reliable the prediction is, based on their domain knowledge.

Workflow
As shown in Figure 1, SUNRISE has two components: the Analytics module and the Interactive Visualization module. The Analytics module utilizes Eclat and XGBoost to generate prediction models. The Interactive Visualization module encodes the data items generated by the Analytics module into four main sub-visualizations: (1) selection panel, (2) control panel, (3) probability meter, and (4) decision path panel. These sub-visualizations support multiple interactions to assist the user in achieving their tasks. These interactions include selecting, drilling, searching, measuring, and inserting/removing (for a list of possible interactions, see [81]).
The basic workflow of SUNRISE is as follows. First, we create an integrated dataset from different databases. Next, features in the laboratory test data are encoded and transformed into appropriate forms for analysis. In the next step, we apply the Eclat algorithm to the pre-processed dataset to obtain the frequent laboratory groups (i.e., frequent combinations of laboratory tests). For each laboratory group, we then create a subset of data with all the tests included in the group. In each subset, we only include rows where all the tests in the group are available. We then split each subset into train, validation, and test sets. We use the validation set to adjust the tuning parameters. Then, the XGBoost technique is applied to each subset with its corresponding tuning parameters. We develop four sub-visualizations in the Interactive Visualization module to allow the user to examine associations between laboratory groups and the outcome. The user can choose multiple laboratory tests using the selection panel based on the result of the Eclat algorithm. When the user selects a test from the selection panel, the system then inserts a slider, associated with the selected test, in the input control panel. The user can probe the prediction models by creating input examples with their desired values using the input control panel. Upon clicking the "submit" button, SUNRISE passes the input (i.e., the chosen laboratory group and selected test values) to the Analytics module. The Analytics module uses the XGBoost model, corresponding to the chosen laboratory group, to predict the patient outcome and returns the results to the probability meter and the decision path panel. Finally, the user is able to observe the final prediction outcome and track the decision process that leads to the outcome to gain a deeper insight into the working mechanism of the prediction model.

Analytics Module
The Analytics module of SUNRISE generates prediction models using laboratory test data stored in EHRs by integrating the XGBoost technique with Eclat. In this section, we describe how these techniques are combined to build the prediction models.
First, we create an integrated dataset from different databases. This dataset includes laboratory test data and the outcome for every patient. For a laboratory test, a patient might have multiple values from different times. Therefore, a sequence of laboratory test results can be formed. In order to represent this sequence for each patient, we use the average result. The outcome is considered positive if the patient develops the disease, and it is considered negative otherwise. If there is a large number of laboratory tests available, we cannot consider every possible combination of these tests because of limited memory and computational resources. Therefore, we use a frequent itemset mining technique to obtain the most frequent combinations and make the computations manageable. In order to generate more specialized prediction models, we use the Eclat algorithm to obtain frequent combinations of laboratory tests. Eclat is a fast algorithm that reduces memory requirements due to the use of the depth-first search technique. We use the "arules" library to implement the Eclat algorithm with a specified minimum support to create several laboratory groups (i.e., frequent itemsets) from laboratory tests included in the dataset. Then, for each group, we create a subset of data with all the laboratory tests that were included in the group. In order to get more accurate predictions, we only include rows where all the laboratory variables in the group are available in each subset. This approach allows us to deal with a more specialized model based on the available laboratory tests in the prediction phase rather than a generalized model using the whole dataset.
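The following is a minimal sketch of this group-generation step under stated assumptions: a data frame labs with one row per patient, one (possibly missing) column per laboratory test, and an outcome column; the data frame and all column names are hypothetical.

# Minimal sketch: mine frequent laboratory groups with Eclat and build a
# complete-case subset per group; `labs` and its columns are illustrative.
library(arules)

test_cols <- c("SCr", "SNa", "SK", "SBC", "SCl", "HGB", "WBC", "Pl", "ACR")

# Treat each patient as a "transaction" of the tests they have results for.
avail <- !is.na(labs[, test_cols])
trans <- as(avail, "transactions")
groups <- eclat(trans, parameter = list(supp = 0.05,
                                        maxlen = length(test_cols)))

# For each frequent group, keep only rows where every test is available.
group_items <- LIST(items(groups))  # list of character vectors of test names
subsets <- lapply(group_items, function(tests) {
  keep <- complete.cases(labs[, tests, drop = FALSE])
  labs[keep, c(tests, "outcome")]
})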
In the next stage, we apply the XGBoost technique to each group. For each laboratory group, we split its corresponding subset into train, validation, and test sets to generate the prediction model. We use 80% of patients for training the model, 10% for validation and 10% for testing. The validation set is used to tune the hyperparameters, when building the XGBoost model, to avoid overfitting and to control the "bias-variance" trade-off. We adjust the complexity of the model by modifying values of the tuning parameters maximum tree depth (max_depth) and minimum leaf weight (min_child_weight). Minimum leaf weight is the minimum weight that is required to generate a new node in the tree. Generation of children that correspond to fewer samples can be achieved by selecting a smaller value for this parameter, which allows for creation of more complex trees that are more likely to overfit. Maximum tree depth is defined as the maximum number of nodes that are allowed from the root of the tree to its farthest leaf. A large value for this parameter makes models more complex by letting the algorithm create more nodes. However, as we go deeper in the tree, splits become less relevant, thus causing the model to overfit. Another approach to avoid overfitting is to add randomness to make the model more robust to noise. Randomness is tuned by setting the sub-sampling rate (i.e., subsample parameter) at each sequential tree. Another parameter that can get adjusted is the model's learning rate (i.e., eta), which determines the contribution of each tree to the overall model. A low learning rate should result in better performance, but it will increase the computational cost. The final XGBoost model is a linear combination of all individual decision trees in the series, along with their contributions to the model, weighted by the learning rate. In order to detect the best combination of parameters for each laboratory group, we use the random search approach, which is shown to have higher efficiency compared to a manual search and grid trials when given the same computation time. Another advantage of random search is that, as opposed to the manual search, results obtained through random search are reproducible [82]. We use the combination of parameters with the best performance on the validation set to train the final model for each laboratory group.
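A minimal sketch of this tuning loop, assuming training and validation matrices x_tr and x_va with 0/1 labels y_tr and y_va (all names and search ranges are illustrative, not the paper's actual settings):

# Random search over XGBoost hyperparameters, scored by validation AUROC.
library(xgboost)

dtrain <- xgb.DMatrix(x_tr, label = y_tr)
dvalid <- xgb.DMatrix(x_va, label = y_va)

best <- list(auc = -Inf, params = NULL)
set.seed(42)  # makes the random search reproducible
for (i in 1:50) {
  params <- list(
    objective        = "binary:logistic",
    eval_metric      = "auc",
    max_depth        = sample(3:10, 1),     # maximum tree depth
    min_child_weight = sample(1:10, 1),     # minimum leaf weight
    subsample        = runif(1, 0.5, 1.0),  # row sub-sampling rate
    eta              = runif(1, 0.01, 0.3)  # learning rate
  )
  bst <- xgb.train(params, dtrain, nrounds = 100,
                   watchlist = list(valid = dvalid), verbose = 0)
  auc <- tail(bst$evaluation_log$valid_auc, 1)
  if (auc > best$auc) best <- list(auc = auc, params = params, model = bst)
}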
We use the XGBoost library in R to implement XGBoost and use the area under the receiver operating characteristic curve (i.e., AUROC) [83,84] to measure the performance of all the models and choose the best combination of tuning parameters. A ROC curve shows the trade-off between specificity and sensitivity across different decision thresholds (i.e., the thresholds used to convert predicted probabilities into class labels). Sensitivity measures how often a model correctly classifies a patient as "at-risk". Specificity, on the other hand, is the capacity of a model to correctly classify a patient as "risk-free" [85].
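For instance, a held-out test set could be scored as follows, assuming the tuned model best$model from the sketch above and test data x_te, y_te (hypothetical names); the pROC package is one common way to compute the AUROC in R:

# Evaluate the final model on the held-out test set with AUROC.
library(pROC)
library(xgboost)

pred <- predict(best$model, xgb.DMatrix(x_te))
roc_obj <- roc(response = y_te, predictor = pred)
auc(roc_obj)                      # area under the ROC curve on the test set
coords(roc_obj, "best",           # threshold balancing sensitivity/specificity
       ret = c("threshold", "sensitivity", "specificity"))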

Interactive Visualization Module
The Interactive Visualization module is composed of four main sub-visualizations: the selection panel, control panel, probability meter, and decision path panel (Figure 2). In this section, we describe how data items that are generated in the Analytics module are mapped into visual representations to allow healthcare providers to accomplish their tasks.

Selection Panel
The selection panel displays the hierarchical structure of the laboratory data using horizontally stacked rectangles ordered from left to right (Figure 2A). The first rectangle on the left (i.e., the root), which represents all laboratory tests, takes the entire height. Each child node is placed to the right of its parent, with a height proportional to the percentage it consumes relative to its siblings.
The selection panel utilizes the result of the Eclat algorithm from the Analytics module to allow the user to select their desired group of laboratory tests. The user can choose a test by clicking on its corresponding rectangle in the selection panel. This action changes the color of the selected rectangle from green to blue. When a test is selected, all the other rectangles, corresponding to tests that are not in any laboratory group with the selected test, become un-clickable and greyed out. The user can also insert/remove a slider corresponding to a test in the control panel by clicking/unclicking the rectangle corresponding to that test in the selection panel. We will describe the control panel in more detail in the next section.
The selection panel allows the user to observe the full names of laboratory tests that belong to a category by clicking on the rectangle corresponding to that category. In addition, when the user hovers the mouse over any of the rectangles, a tooltip appears with information about the test. The selection panel is supported by a search bar. If the user enters the name of a specific laboratory test in the search bar, the border of the rectangle corresponding to the specified test becomes orange.

Control Panel
The control panel includes sliders corresponding to the laboratory tests that the user has chosen in the selection panel (Figure 2B). It allows the user to probe the prediction models by creating input examples with their desired values and observing the output the model generates. When the user selects a test from the selection panel, the system inserts a slider, associated with the selected test, in the input control panel. Each slider is composed of a label including the full name and unit of measurement of its corresponding test, a horizontal axis with a linear scale representing the possible values of its associated test, and a rectangular handle that allows the user to change the value of the test. This panel allows the user to interactively tweak the values of the selected tests and see how the predictive model responds. The user can hover the mouse over the handle to observe the chosen value in any of the sliders.
When the user clicks on the "submit" button, after selecting multiple tests from the selection panel and choosing the values of each test using their corresponding sliders, the system passes the information regarding the chosen laboratory group and selected test values to the Analytics module. The Analytics module uses the corresponding XGBoost prediction model that is associated with the selected group to predict the outcome and returns the results to the probability meter and the decision path panel.

Probability Meter
The probability meter is a radial gauge chart with a circular arc that shows the probability of developing the outcome (Figure 2C). This probability is the outcome prediction after the system feeds the input (i.e., the chosen laboratory group and laboratory test values) to its corresponding XGBoost model. The value inside the arc represents the probability. If the probability is less than 50 percent, the shading of the arc is green; otherwise, it is red.
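To make the scoring step concrete, the following sketch shows how a submitted input might be turned into the displayed probability, assuming bst is the XGBoost model trained for the group {SCr, SK, SCl}; the values and names are illustrative:

# Score one user-constructed input and derive the meter's arc color.
library(xgboost)

input <- matrix(c(70, 4, 102), nrow = 1,
                dimnames = list(NULL, c("SCr", "SK", "SCl")))
prob <- predict(bst, xgb.DMatrix(input))          # probability of outcome
arc_color <- if (prob < 0.5) "green" else "red"   # meter shading rule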

Decision Path Panel
The decision path panel allows the user to audit the decision process behind a prediction to make sure the corresponding XGBoost model works appropriately for a given input (i.e., the chosen laboratory group and laboratory test values) (Figure 2D). The final prediction outcome of the XGBoost model is the additive sum of the interim predictions of each individual tree, where each interim prediction follows a unique decision path. Summarizing the structure of all the decision paths that lead to the final prediction can therefore deepen the understanding of the working mechanism of the model. This panel is designed to help the user audit the decision paths by summarizing the critical ranges of the laboratory tests involved in the chosen laboratory group and providing detailed information about the decision paths layer by layer.
In order to reveal the structure and properties of the decision paths that lead to the final prediction, we first summarize the features (i.e., the laboratory tests included in the chosen group) at the layer level. For an input data point $x = \{x_1, x_2, \ldots, x_H\}$ with $H$ features $Q = \{q_1, q_2, \ldots, q_H\}$, a feature may occur multiple times at each layer of all the decision paths (Equation (13)). In each layer $l_i$ of all the decision paths, for each feature $q_h \in q_{l_i}$, we merge the ranges on $q_h$ into a single interval $[\tau_{min}^{q_h}, \tau_{max}^{q_h}]$. We represent these summarized features using feature nodes. Each feature node summarizes the feature ranges for each laboratory test in each layer using a horizontal bar chart. The x-axis uses a linear scale to represent the possible values of the laboratory test associated with the feature node. The vertical bar represents the laboratory test value of the current input. The color of the feature node is identical to the color of its corresponding laboratory test slider in the input control panel. The user can hover the mouse over the feature node to observe the summarized ranges associated with the node.
We create a decision path flow by connecting the feature nodes from different layers using ribbons. The tooltip of a ribbon displays the pair of feature nodes that are connected by the hovered ribbon. This allows the user to examine the order in which features appear in the decision paths, which is critical in measuring the importance of each feature. In the decision path panel, each column represents a layer, with columns further to the right representing greater layer depth. This supports the user in understanding how the ranges of each feature evolve from the root layer to the terminal node (i.e., leaf). We append a circle to the decision path flow to encode the leaf that represents the final prediction outcome. If the probability of developing the outcome for the input data point is less than 50 percent, the color of the circle is green; otherwise, it is red (mirroring the probability meter). The tooltip of the circle displays the probability of the outcome for the given input.
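The information this panel visualizes can be recovered from a trained model; one possible way, sketched here under the assumption that bst and input come from the earlier sketches, is to combine the model's tree table with per-tree leaf predictions:

# Recover per-tree structure and the leaf reached by a given input.
library(xgboost)

# One row per node across all trees: Tree, Node, Feature, Split,
# child IDs, and leaf values.
tree_tbl <- xgb.model.dt.tree(model = bst)

# Index of the leaf reached in every tree for this input; walking from
# each tree's root to its leaf in `tree_tbl` yields the (feature,
# threshold) pairs of the decision path in Equation (13).
leaves <- predict(bst, xgb.DMatrix(input), predleaf = TRUE)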

Usage Scenario
In this section, we demonstrate how SUNRISE can assist healthcare experts in studying associations between laboratory test results and acute kidney injury (AKI) using the data stored at ICES.

Data Description
We used a data cut that contained nine laboratory test results and the outcome of AKI for 229,620 patients, which were obtained from three health administrative databases (as shown in Table A1) at ICES. These datasets were linked using unique, encoded identifiers derived from patient health card numbers and were analyzed at ICES. We obtained outpatient albumin/creatinine ratio (ACR), serum creatinine (SCr), serum sodium (SNa), serum potassium (SK), serum bicarbonate (SBC), serum chloride (SCl), hemoglobin (HGB), white blood cell count (WBC), and platelet (Pl) measurements from the Dynacare medical laboratories, which account for around one third of outpatient laboratory results for Ontarians. A 365-day lookback window was used to obtain the outpatient laboratory test data. Emergency department visits and hospital admissions were identified from the National Ambulatory Care Reporting System (ED visits) and the Canadian Institute for Health Information Discharge Abstract Database (hospitalizations). ICD-10 (i.e., International Classification of Diseases, post-2002) codes were used to identify the incidence of AKI from ED visit and hospital admission data. The cohort included senior patients, aged 65 years or older, who visited the emergency department (ED) or were admitted to hospital between 1 April 2014 and 31 March 2016. The hospital admission date or ED visit date served as the index date. If an individual had multiple hospital admissions or ED visits, the first incident was selected.

Outcome
AKI was the outcome variable for all the prediction models in this case study [80,82,83,[85][86][87]. AKI is defined as a sudden deterioration of kidney function over a short period of time [87,88]. The management and diagnosis of AKI can be challenging because of its complex etiology and pathophysiology. In the process of AKI diagnosis, the available information is complemented by additional data obtained from the patient's medical history and different diagnostic tests, including laboratory tests. Laboratory tests play a crucial role in the detection and diagnosis of AKI. The incidence of AKI was captured using the National Ambulatory Care Reporting System and the Canadian Institute for Health Information Discharge Abstract Database, based on the ICD-10 (International Classification of Diseases, Tenth Revision) diagnostic code "N17". If an individual had multiple episodes of AKI, the first episode was selected. Positive cases were those in which AKI was acquired at the index date (n = 6743), and negative cases were those in which AKI never developed (n = 222,877).

Case Study
First, the features in the laboratory test data are encoded and transformed into appropriate forms for analysis. For instance, if there is more than one result for a test on a patient, the average result is used. Thus, we create nine variables for each patient, one per laboratory test reported in the year prior to the index date. Then, we apply Eclat with a minimum support of 0.05 to obtain the most frequent combinations of laboratory tests. At this stage, a total of 263 laboratory groups (i.e., frequent itemsets) are created from the nine laboratory tests, as shown in Table A2. Next, we create a subset of data for each group, including only the rows where all the tests in the group are available.
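This encoding step could look like the following, assuming a long-format data frame raw with columns patient_id, test_name, and result covering the year before the index date (all names are illustrative):

# Average repeated results per patient and test, then pivot to one row
# per patient with one column per laboratory test (NA if never measured).
avg <- aggregate(result ~ patient_id + test_name, data = raw, FUN = mean)
wide <- reshape(avg, idvar = "patient_id", timevar = "test_name",
                direction = "wide")  # columns are named result.<test_name>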
Generally, in most of the laboratory groups, the prevalence of AKI was lower than 2.5 percent, which led to an imbalanced class ratio. This issue can severely reduce prediction performance, as most classifiers are developed to maximize the total number of correct predictions and are thus more sensitive to the majority class. Therefore, if the imbalance issue is not addressed properly, the classification result can be biased towards the majority class, leading to poor performance on the prediction of AKI. The misclassification of AKI, including false positive and false negative cases, affects the choice of treatment and prognosis, which consequently might increase the overuse of clinical resources and the risk of deterioration in the patient's condition. To address this issue, we set the weight of the positive class (i.e., the scale_pos_weight parameter) in the XGBoost models using the following equation:

$$\text{scale\_pos\_weight} = \sqrt{\frac{\text{number of non-AKI cases in each subset}}{\text{number of AKI cases in each subset}}} \quad (14)$$

After adjusting the tuning parameters for each subset, we applied XGBoost with 100 trees and its corresponding tuning parameters, as well as the scale_pos_weight parameter, to each subset (Table A1). Thus, we created 263 XGBoost prediction models in total.
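A sketch of this weighting step for one subset, assuming a 0/1 outcome vector y and the tuned parameters and training matrix from the earlier random-search sketch (hypothetical names):

# Compute the class weight of Equation (14) and train the final model.
library(xgboost)

spw <- sqrt(sum(y == 0) / sum(y == 1))  # sqrt(non-AKI cases / AKI cases)
params <- c(best$params, list(scale_pos_weight = spw))
final_model <- xgb.train(params, dtrain, nrounds = 100)  # 100 trees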
As shown in Figure 3, the laboratory tests are classified into four categories: Creatinine, Complete Blood Count (CBC), Serum electrolyte, and Urine. Creatinine refers to SCr. CBC is composed of HGB, WBC, and Pl. Serum electrolyte contains SNa, SK, SBC, and SCl, and Urine includes ACR. Now, let's assume the user is interested in exploring the relationship between SCr, SK, SCl, and AKI. The user can first select the rectangle corresponding to serum electrolytes to open it up and observe the full laboratory names included in that group, and then select the rectangles corresponding to SCr, SK, and SCl in the selection panel. Upon selection, the system inserts a slider corresponding to each chosen test in the control panel. The system allows the user to probe the prediction model by generating input examples for their chosen tests using sliders in the control panel. As shown in Figure 3, the user has selected an SCr value of 70 umol/L, SK of 4 mmol/L, and SCl of 102 mmol/L through the corresponding sliders. Upon submission, the Analytics module uses the XGBoost model generated with the subset of data including SCr, SK, and SCl to predict AKI with the input values and returns the result to the probability meter and the decision path panel. The probability meter in Figure 3 shows that the probability of developing AKI, for a patient with the chosen values for SCr, SK, and SCl, is 22 percent.
To ensure that the prediction is reliable, the user examines the decision path panel to check the result (Figure 4). As shown in Figure 4, the user can observe that SCr is the only feature that appears in the first and second layers. Since we expect features near the root of the path to be more important than features near the leaves, SCr has higher importance than SCl and SK when predicting AKI, given the input. If the user hovers the mouse over the SCr feature node in the root layer, they can see the split threshold for that specific node (i.e., SCr > 121.1 umol/L). This information can guide the user to observe how the probability of developing AKI changes if they increase the SCr from 70 to 140 umol/L (i.e., a value greater than 121.1 umol/L). Figure 5 shows that the AKI probability rises to 68 percent when the SCr value is increased. The user might then be curious to explore the association between SK, SCl, and AKI. In this case, they can click on the SCr rectangle in the selection panel to remove its corresponding slider from the control panel.
Let's assume the user wants to observe how the changes in SK level would affect the probability of AKI. If the user increases SK to 6 mmol/L (high potassium level), the probability of AKI becomes 80 percent (Figure 6). The user can then observe the feature ranges and the path that led to this probability. For instance, they can observe that the split points for SK are around 4.7 to 5.1 mmol/L, which suggests that this range is critical when using SK in predicting AKI. They can also observe that SK and SCl have similar importance in AKI prediction based on the order they appear on the decision path.

Figure 6. How the system is updated when the user removes SCr from the control panel and increases the value of SK to 6 mmol/L.

Discussion and Limitations
The purpose of this paper is to: (1) show how VA systems can be designed to examine relationships between laboratory test results and a specific disease outcome and (2) study the structure and working mechanism of the resulting risk prediction models. To accomplish these tasks, we have reported the development of SUNRISE, a VA system designed to support healthcare providers. SUNRISE incorporates two main components: an analytics module and an interactive visualization module. The analytics module integrates a frequent itemset mining technique (i.e., Eclat) with XGBoost to develop risk prediction models. The interactive visualization module then maps the data items generated by the analytics module to four main sub-visualizations, namely the selection panel, control panel, probability meter, and decision path panel. SUNRISE is unique in how it integrates XGBoost with Eclat to develop prediction models, and it allows the user to interact with the model and audit the decision process through multiple interactive sub-visualizations. SUNRISE provides a balanced distribution of processing load through the seamless integration of computational techniques (i.e., frequent itemset mining and XGBoost in the analytics module) with interactive visual representations (i.e., the sub-visualizations in the interactive visualization module) to support the user's cognitive tasks. It provides the user with the means to probe the prediction model by creating input instances and observing the model's output. Furthermore, it allows the user to examine how a particular input example's risk might change if it had different values. Finally, SUNRISE helps the user gain deeper insight into the underlying working mechanism of the model, increasing their confidence in the generated predictions.
Through a case study using the ICES-KDT dataset, we have shown that outputs generated by SUNRISE are consistent with what has been found in the literature. For instance, Chen et al. [89] showed that higher serum potassium and lower serum sodium levels were more likely to lead to the development of AKI. A similar output is observed in the results generated by our system. As shown in Figure 7, in the case study presented in this paper, if the user selects an SK value of 6 mmol/L and an SNa of 96 mmol/L through the corresponding sliders, upon submission, the probability meter shows that the probability of developing AKI for a patient with a high SK value and a low SNa level is 81 percent. Another example can be seen in Figure 8, where a low serum bicarbonate level (16 mmol/L) is shown to be significantly associated with the development of AKI (74%), while a normal serum bicarbonate level (26 mmol/L) is less likely to progress to AKI (43%). A similar association has been shown by Lim et al. [90], where patients with low serum bicarbonate levels were estimated to develop AKI at 1.57 times the rate of patients with normal bicarbonate levels. According to the study by Oh et al. [91], the incidence of AKI was significantly higher for patients with high and low serum chloride levels compared to patients with normal chloride levels. As shown in Figure 9, a similar conclusion can be reached using SUNRISE. When the user selects a high chloride level of 110 mmol/L (Figure 9A) or a low chloride level of 86 mmol/L (Figure 9B), the probability of development of AKI is 62% and 60%, respectively, both significantly higher than the probability of developing AKI (23%) with a normal chloride level (Figure 9C). Several studies have shown an association between lower hemoglobin, which is frequent in hospitalized patients, and AKI [92,93]. A similar outcome is generated using our tool. As shown in Figure 10, a low hemoglobin level (64 g/L) is shown to be significantly associated with the development of AKI (82%), while a normal hemoglobin level (166 g/L) is less likely to progress to AKI (22%). In addition to these examples, there are many other hypotheses that can be generated from the results of the system. Although most of them are aligned with theories from the medical literature, some of them have not been studied yet. This is especially true for combinations of different test results and their associations with the outcome. SUNRISE can be used by domain experts to identify such hypotheses, which can further be verified through formal clinical studies. It is important to note that the system can be used with any dataset containing laboratory test results to interactively explore the relationships between test results and an outcome. The accuracy of the generated hypotheses depends on the quality of the data used to train the models.
In terms of generalizability, SUNRISE is designed in a modular way to make sure new data sources and data types can be incorporated easily. SUNRISE can be used to study other healthcare problems, such as exploring the association between medication dosage and diabetes. Although SUNRISE focuses on making XGBoost interpretable, a similar approach can be applied to other tree-based ensemble techniques such as Random forest. Random forest uses several decision trees and generates a final prediction by aggregating the output of all internal trees. Unlike XGBoost, the decision trees in Random forest are trained independently.
One potential enhancement to support Random forest is to summarize the paths based on whether they generate positive predictions or negative ones and then let the user compare them in the same view.
One of the primary considerations in the design of SUNRISE is scalability. To make the control and decision path panels less cluttered (because of the user's limited visual capacity when the number of laboratory tests increases), we restrict the maximum number of tests that can get inserted into the control panel by adjusting the minimum support parameter of Eclat.
This research has several limitations. The first is that, although we used a participatory design approach and medical researchers have assessed SUNRISE and found it valuable, we did not conduct usability studies to assess SUNRISE's performance and the efficiency of its interaction mechanisms. Second, the decision path panel sometimes does not function properly when the number of layers in the XGBoost trees grows large, due to screen space limitations and computational resources. Third, as we use curves to link the feature cells from different layers in the decision path panel, these curves may overlap. Another limitation is that the prediction models might be prone to overfitting, because a small validation set might lead to an unstable model at a particular hyperparameter set; this results in overoptimistic validation error measurements. Additionally, large variations in the dataset may drive the input vector to the model outside the probability density functions of the training data, so the system may show inaccurate results in low probability density areas. Finally, we aggregated laboratory test results for a patient by taking the average of their test results in the 365 days before the index date. As such, we might have lost vital information regarding laboratory tests. To address this issue, in future versions, we plan to offer the user different aggregation functions, such as the trend of change (i.e., increase/decrease) in tests over a certain period of time.

Conclusions
The overall goal of this paper is to show how VA systems can be designed systematically in order to support the investigation of various clinical problems. To achieve this, we report the development of SUNRISE and demonstrate how it can be employed to assist healthcare providers in exploring associations between laboratory test results and a disease outcome. SUNRISE's novelty and usefulness stem from its design, as it incorporates frequent itemset mining, XGBoost, visualization, and human-data interaction in an integrated manner to support complex EHR-driven tasks. We illustrate SUNRISE's value and usefulness through a usage scenario of investigating and exploring the relationship between laboratory test results and AKI using the data stored at ICES. We demonstrate how it can help clinicians and researchers at ICES probe the AKI risk prediction models by hypothesizing input examples and observing the model's output. Researchers can also audit the decision process to verify the reliability of the prediction models. Finally, the design concepts employed in SUNRISE are generalizable. These concepts can be utilized to systematically design any VA system whose purpose is to support clinical tasks involving investigation and analysis of EHR data using XGBoost and frequent itemset mining.

Informed Consent Statement:
ICES is a prescribed entity under PHIPA. Section 45 of PHIPA authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to, or planning for all or part of the health system.

Data Availability Statement:
The study dataset is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access might be granted to those who meet prespecified criteria for confidential access, available at www.ices.on.ca/DAS (email das@ices.on.ca). The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs might rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or require modification.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. List of databases held at ICES (an independent, non-profit, world-leading research organization that uses population-based health and social data to produce knowledge on a broad range of healthcare issues).