Automated Recommendation of Aggregate Visualizations for Crowdfunding Data

Abstract: Analyzing crowdfunding data has been the focus of many research efforts, where analysts typically explore this data to identify the main factors and characteristics of the lending process as well as to discover unique patterns and anomalies in loan distributions. However, the manual exploration and visualization of such data is clearly an ad hoc, time-consuming, and labor-intensive process. Hence, in this work, we propose LoanVis, an automated solution for discovering and recommending valuable and insightful visualizations. LoanVis is a data-driven system that utilizes objective metrics to quantify the “interestingness” of a visualization and employs such metrics in the recommendation process. We demonstrate the effectiveness of LoanVis in analyzing and exploring different aspects of the Kiva crowdfunding dataset.


Introduction
Crowdfunding, also called peer-to-peer lending, social lending, or crowd lending, is an internet-based fundraising mechanism that solicits small monetary contributions from crowd donors to help others in need [1,2]. The importance of crowdfunding is further underscored as economies worldwide race to meet the Sustainable Development Goals (SDGs) by 2030. Particularly, crowdfunding plays an important role in achieving those goals, especially the ones related to poverty, hunger, health, education, and gender equality [3].
Among crowdfunding platforms, Kiva, which is the focus of this work, is becoming increasingly popular. Kiva's goal is to use charitable loans to combat and eradicate poverty [5]. Lenders provide money with the understanding that they will only recoup their initial investment and forgo any profit [6]. Kiva has provided people and groups with modest incomes with loans totalling more than $1.4 billion [4].
The rapid development of crowdfunding platforms, such as Kiva, has attracted much attention from the data analytics research community. Particularly, solutions have been proposed to address some of the interesting research problems that arise in crowdfunding platforms, such as predicting project success [7], tracking the funding dynamics [8], recommending donors [9], and recommending projects for donors [10-12]. Orthogonal to the existing work mentioned above, our focus in this paper is to utilize visual data exploration techniques, and in particular visualization recommendation systems, to unlock valuable insights from crowdfunding data, as described next.
Visual data exploration is an essential step in the data science pipeline, in which analysts examine datasets up-close to extract valuable insights [13-16]. This process has traditionally been performed manually, where the analyst interactively applies various exploratory queries (such as SQL-based filtering, aggregation, joins, etc.). The results of those queries are presented as data-driven visualizations (e.g., bar or line charts, scatter plots, etc.). The analyst then examines those visualizations looking for insights, which are used as a springboard to decide their next analytical query.
However, unlocking those insights has been anecdotally compared to "finding a needle in a haystack" [17]. That is particularly true for big high-dimensional datasets with tens to hundreds of attributes and measures, as is typically the case in financial data warehouses and scientific datasets [13]. Particularly, the "curse of dimensionality" leads to analysts having to manually construct a prohibitively large number of queries and visually explore their results looking for insights, which is clearly an ad hoc and labor-intensive process. That challenge motivated multiple research efforts that focused on automatic recommendation for data exploration, that is, recommender systems dedicated to providing the user with suggestions for specific, high-utility visualizations (e.g., [17-28]). Such systems are data-driven (also known as discovery-driven) systems, which use heuristic notions of "interestingness" and employ them in the recommendation. The main idea underlying those solutions is to automatically generate "all possible" exploratory queries of the data, generate their corresponding visualizations, and recommend the top-k interesting ones, where k is a user-defined parameter. Meanwhile, the interestingness of a query/visualization is quantified using some utility metric over its result. For instance, an exploratory query that applies filters such as WHERE country = 'Kenya' AND borrower-gender = 'Female'; is considered interesting if it maximizes some quantifiable metric (e.g., skewness, surprisingness, diversity, or deviation from an expected distribution [19,26,27]).
In exploring crowdfunding datasets, analysts are typically interested in identifying unique patterns, anomalies, and discrepancies across the different aspects of such data (e.g., borrower-gender, country, project-activity, etc.) [29-31]. For instance, the data visualization tool Tableau features a visualization-based case study of the Kiva dataset called Kiva Loan Story [32]. Furthermore, in an attempt to gain insights from crowdfunding data, Kiva deploys its own data visualization and analytics dashboard [30]. However, such a dashboard mainly supports basic data filtering operations and rudimentary visualizations based on the user's search parameters. That is, it is expected that the user knows exactly in advance the insights they are looking for! However, such insights become clear only in "hindsight" after spending a long time exploring the data.
Hence, in this work, we present LoanVis, our visual analytics platform, which employs well-studied utility metrics (e.g., [19,21,24,26,27]) to automatically recommend interesting data visualizations that reveal hidden insights (e.g., discrepancies and anomalies in the distribution of loans across different countries or by gender). Accordingly, we have designed LoanVis to automatically provide recommended visualizations in two forms: (1) Value-based recommendation, in which LoanVis recommends high-utility visualizations based on some specific values provided by the analyst (e.g., it recommends aggregate visualizations based on the analyst's manual choice of country = 'Kenya'), and (2) Aspect-based recommendation, in which the analyst's manual interaction is further minimized and they only have to specify their aspect of analysis (e.g., country or borrower-gender). LoanVis then automatically recommends both: (i) a specific value for the user-specified aspect (e.g., country = 'Congo'), and (ii) high-utility insightful visualizations based on that recommended value (e.g., a visualization of loan distribution across gender in the country of Congo).
Our extensive experimental evaluation demonstrates LoanVis' ability to automatically detect and recommend interesting visualizations based on the Kiva crowdfunding dataset. Particularly in this work, we present some of the recommended visualizations that show unique and often unexpected patterns in the distribution of loans across different aspects of analysis (e.g., country, gender, sector, etc.). Our findings complement and expand on existing related works that aim at studying the discrepancies and factors affecting crowdfunding (e.g., [2,29,30,33]). For instance, the work in [33] investigates how the gender, country, and type of the borrower's business affect the lenders' lending decisions. The research in [2] examines the impact of forming groups on receiving fast funds for loan requests. Both of the works in [6,34] focus on the loan characteristics, either through examining the factors that motivate lenders to contribute to donations or those that lead to the success or failure of crowdfunding.
However, while those studies relied on time-consuming manual data exploration and visualization, our proposed LoanVis facilitates understanding the characteristics of crowdfunding by automatically recommending visualizations that reveal unique patterns and discrepancies.
The rest of this paper is organized as follows. In Section 2, we present our methodology, which is based on a discussion of the Kiva dataset (Section 2.1), followed by the details of our proposed LoanVis system (Section 2.2). In Section 3, we present our results based on the visualizations recommended by our LoanVis system. We conclude in Section 4.

Materials and Methods
Differently from existing work that relies on manual data exploration for discovering insights from crowdfunding databases, in this work we propose LoanVis, which is an automated solution for discovering and recommending valuable and insightful visualizations. LoanVis is a data-driven system that utilizes objective metrics to quantify the "interestingness" of a visualization and employs such metrics for recommending insightful visualizations, as shown in Figure 1. Particularly, the main idea underlying LoanVis is to automatically generate "all possible" visualizations and recommend the top-k interesting ones, where k is a user-specified parameter. To identify those interesting visualizations, LoanVis employs well-studied utility functions that assign each visualization a utility score [21,24,27], and then recommends the top-ranked visualizations according to that score. Notice that Figure 1 illustrates only a single iteration of that visualization recommendation process. In practice, that process is typically repeated multiple times throughout a data exploration session, along with other data exploration tools. For a comprehensive overview of the workflow involved in an end-to-end data exploration session, we refer the reader to [14]. In the following, we first describe the Kiva crowdfunding dataset, then we present the visualization recommendation methodology employed by LoanVis.

The Kiva Dataset
The Kiva dataset has attracted the attention of multiple research studies in the area of data science and analytics (e.g., [2,6,9,35]). The overarching goal of such research is to utilize the Kiva dataset for extracting knowledge, gaining valuable insights into the crowdfunding model, and understanding its loan funding characteristics. In this paper, we expand on such research efforts and present our LoanVis system, which automates the process of exploring the Kiva crowdfunding dataset and provides analysts with fast data-driven recommendations of insightful visualizations.
There are mainly four types of participants in Kiva, namely: loans, borrowers, lenders, and field partners. The loans are in the form of fund-raising campaigns posted by field partners on behalf of the borrowers. Particularly, field partners are typically local non-profit organizations that act as the link between borrowers and Kiva. Lenders participate in a donation-based crowdfunding model (i.e., donors receive back their principal investment without profits).
When exploring the Sector aspect of the Kiva dataset, the top-1 visualization recommended by LoanVis according to the employed deviation metric is the one based on the Entertainment sector, which has been discussed in the previous section and is presented in Figure 5. Recall that this recommended visualization shows the significant discrepancy associated with the Entertainment sector, where for most countries the loans provided for that sector are minimal, except for the USA and Israel. In addition to Figure 5, that insight can be further understood by examining Figure 6. Particularly, Figure 6 shows the normalized distribution of the amount of loans for all sectors (i.e., comparison view) vs. the normalized distribution of loans directed to the Entertainment sector (i.e., target view). The figure clearly emphasizes and clarifies the discrepancy highlighted earlier in Figure 5. For instance, while a country such as the Philippines receives almost 10% of the total Kiva loans distributed worldwide, its share of the Entertainment loans does not exceed 1%. In comparison, projects in the USA receive less than 6% of the total distributed loans, whereas its share of the loans directed to the Entertainment sector is the highest among all countries at almost 10%. More interestingly, as Figure 6 shows, Israel receives only xx% of the total loans worldwide, but receives xx% of the Entertainment loans. While both Figures 5 and 6 deliver the same insights, Figure 5 utilizes two scales for plotting the absolute values on the Y-axis, whereas Figure 6 uses normalized values derived based on Eq. ??. For the sake of simplicity, and to enhance readability, all of the remaining visualizations are presented using absolute values, similar to Figure 5.

Figure 7 shows the top-2 visualization recommended by LoanVis along the Sector aspect. In contrast to the top-1 visualization, which is based on the Country dimension (Figure 5), this top-2 recommendation is based on the Gender dimension (Figure 7). Particularly, Figure 7 shows the total amount of loans funded for projects in all sectors per gender (i.e., comparison view) vs. the loans funded in the specific Wholesale sector (i.e., target view). From Figure 7, it is interesting to notice that, in general, female-led projects receive most of the funding (about $200M vs. $60M for male-led projects). However, for projects in the Wholesale sector, male-led projects seem to receive the higher share (about $300k for female-led wholesale projects vs. $400k for male-led ones).

Interestingly, that same discrepancy applies to projects in the Construction sector, which was the top-3 visualization recommended by LoanVis, and is shown in Figure 8. During our analysis, we initially thought that male-led projects would receive the highest amount of loans in the Construction sector. However, as automatically discovered by LoanVis, that discrepancy is more pronounced in the Wholesale sector (Figure 7), which scored a higher deviation.
In particular, Figure 10 shows the amounts of loans paid under each of the different repayment methods. Interestingly, for all countries (i.e., the comparison view), monthly repayment is the most popular method for paying back the funded loans, followed by irregular repayment, then bullet repayment (i.e., paying back the loan all at once in full). However, for Namibia (i.e., the recommended target view) the pattern is completely different! Particularly, as the figure shows, all loans directed to projects in Namibia were paid back as bullet repayments! That significant discrepancy between how loans are paid for Namibian projects vs. the rest of the world led to that view achieving a high deviation value of x.xx, making it a top recommendation.

Figure 11 shows another visualization that LoanVis recommended among the top ones during our analysis along the Country aspect. As is already known from previous studies of the Kiva dataset, most funded projects are led by females, which is also confirmed by the general distribution of loans in the comparison view shown in Figure 11. However, LoanVis automatically discovered an interestingly different pattern for the country of Congo, which is shown as the target view in Figure 11. Particularly, and differently from the general pattern, Figure 11 shows that in Congo the vast majority of funded projects are led collaboratively by both males and females. In fact, upon further analysis we realized that those mixed projects constitute xx% of the funded projects in Congo, while they constitute only xx% of the funded projects worldwide.

In this experiment we focus on automatically generating recommendations based on the Year aspect. Figure 12 shows the top-1 visualization recommended by LoanVis for that aspect. Particularly, as the figure shows, LoanVis recommended the selection of Year = 2016, and also recommended a visualization in which the Y-axis is the Average Loan Amount and the X-axis is the Country. Figure 12 shows the overall distribution of the average loan amount received by each country over all the recorded years (i.e., the comparison view, shown in black). However, that general distribution is significantly different in 2016, the year recommended by LoanVis. Specifically, looking at the distribution of loans in 2016 (i.e., the target view, shown in red), we notice that while the average loan amount of most countries followed the same general pattern as in the comparison view, some discrepancies stand out; for example, a few countries did not receive any loans in 2016 (e.g., Bhutan, Chile, Congo, and Iraq).
Kiva provides open public access to its data through daily snapshots and an Application Programming Interface (API). The Kiva data contains a set of heterogeneous information (i.e., data attributes) about the loans, lenders, borrowers, and field partners [34]. In particular, the main "objects" in the Kiva dataset are the borrower, the lender, and the partner, which are all connected to the loan object [1]. That is, the dataset is centered around the loan data object, namely the "Loans" table. Each loan listing includes the key details about that loan, such as the industry for which the loan is intended, together with information about the borrowers, as well as the loan financial information (e.g., amount, term, repayment interval, etc.). Table 1 provides a summary of the main attributes of the Loans dataset.
The utilized Kiva dataset contains information about more than 671,000 loans disbursed in 87 countries. A quick and simple exploration of the Kiva dataset can reveal some basic and interesting insights. For instance, out of the 15 funded sectors, projects related to the Food and Agriculture sectors receive the most loans. In terms of countries, borrowers from the Philippines and Kenya top the list of 87 countries for the total number of funded loans. Gender-wise, the data indicates that women make up the majority of borrowers, with 64% of borrowers being female.
Manually exploring the Kiva dataset and discovering some basic insights, similar to the ones mentioned above, has been the focus of multiple works (e.g., [2,29,30,33]). For instance, the work in [29] explores the Kiva dataset to understand the impact of the project sector on the distribution of loan amount (i.e., the relationship between attributes Funded Amount and Sector in Table 1). Similarly, the work in [30] attempts to understand the relationship between lending activity and features that characterize the loan, including the country of the loan, the loan sector, and the gender of the borrowers (i.e., attributes listed in Table 1 as Country, Borrower Genders, and Sector). In fact, Kiva provides its own data visualization and analytics dashboard [36], which allows users to explore the underlying crowdfunding data and facilitates conducting studies similar to the ones listed above. However, in this work, our goal goes beyond providing such basic statistics, and our focus is on leveraging visual analytics to automatically recommend data-driven, insightful visualizations of the Kiva dataset, as described in the next sections.

Table 1. Summary of the main attributes of the Loans dataset.

| Category | Attribute | Description |
| --- | --- | --- |
| Borrower information | Country | The name of the country in which the loan was disbursed. |
| Borrower information | Borrower Genders | Comma-separated list of Male, Female, where each instance represents a single male/female in the group. |
| Loan Usage information | Sector | High-level category of the loan usage field. |
| Loan Usage information | Activity | Granular category of the loan usage field. |
| Loan Usage information | Use | Exact usage of the loan amount. |
| Loan Dates | Posted Time | The time at which the loan is posted on Kiva by the field agent. |
| Loan Dates | Funded Time | The time at which the loan posted to Kiva gets funded by lenders completely. |
| Loan Dates | Disbursed Time | The time at which the loan is disbursed by the field agent to the borrower. |
| Loan Amount | Funded Amount | The amount disbursed by Kiva to the field agent (USD). |
| Loan Amount | Loan Amount | The amount disbursed by the field agent to the borrower (USD). |
| Loan Amount | Lender Count | The total number of lenders that contributed to this loan. |
| Loan Repayment | Term in Months | The duration for which the loan was disbursed, in months. |
| Loan Repayment | Repayment Interval | Loan repayment pattern: either monthly, irregular, or bullet (one time). |

The LoanVis Visualization Recommendation System
The process of visual data exploration is typically initiated by an analyst specifying an exploratory query Q on a database D, as shown in Figure 1. The result of query Q, denoted as R, represents a subset of the database D, which the analyst can further transform into data visualizations that might reveal some interesting insights. For instance, an analyst exploring the Kiva dataset using our LoanVis system might pose some specific queries that are based on the following general query structure Q:
Q: SELECT * FROM D WHERE T;
where, in the exploratory query Q, D specifies the explored dataset (i.e., the Loans table) and T specifies a combination of predicates, which selects a subset of D for visual analysis.
For instance, an analyst who is reproducing the results in [29] might want to study the disparity in loan distribution for projects in the Entertainment sector, and in turn will pose a query Q in which T is specified as: sector = Entertainment. Similarly, exploratory analysis of the other attributes and values of the Kiva dataset can be conducted using alternative settings of the predicate T (e.g., gender = Female, or country = Kenya AND sector = Food, etc.).
A visual representation of the query Q is basically the process of generating different aggregate views of its result (i.e., R), which are then plotted using some of the popular data visualization representations (e.g., bar charts). An analyst typically examines those visualizations looking for insights, which are used as a springboard to decide their next analytical query. For instance, the results for a query in which T is sector = Entertainment might trigger further analysis based on gender, where an analyst would pose a subsequent query in which T is sector = Entertainment AND gender = Female.
Accordingly, we employ a multi-dimensional data model of D, which consists of a set of dimension attributes $\mathbb{A}$ (e.g., country, sector, etc.) and a set of measure attributes $\mathbb{M}$ (e.g., loan amount, lender count, etc.). Additionally, $\mathbb{F}$ is the set of possible aggregate functions over the measure attributes $\mathbb{M}$, such as SUM, COUNT, AVG, MIN, and MAX.
Hence, an aggregate view $V_i$ over R is represented by a tuple $(A, M, F)$ where $A \in \mathbb{A}$, $M \in \mathbb{M}$, and $F \in \mathbb{F}$, as shown in Figure 1. That is, R is grouped by dimension attribute A and aggregated by function F on measure attribute M.
A possible view $V_i$ of the example query Q above would be expressed as:
$V_i$: SELECT A, F(M) FROM D WHERE T GROUP BY A;
where the GROUP BY clause specifies the dimension A for aggregation, and F(M) specifies both the aggregated measure M and the aggregate function F.
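The view template above can be sketched as a small string-building helper. This is only an illustration under stated assumptions: the function name `view_sql`, the table name `Loans`, and the column names are hypothetical and are not taken from the LoanVis implementation.

```python
# Hypothetical helper (not part of LoanVis): builds the SQL text of an
# aggregate view V_i = (A, M, F) following the template
#   SELECT A, F(M) FROM D WHERE T GROUP BY A;
ALLOWED_FUNCS = {"SUM", "COUNT", "AVG", "MIN", "MAX"}


def view_sql(dimension, measure, func, table="Loans", predicate=None):
    """Return the SQL string for one aggregate view.

    Identifiers and the predicate are interpolated directly, so they are
    assumed to come from a trusted schema, never from raw user input.
    """
    func = func.upper()
    if func not in ALLOWED_FUNCS:
        raise ValueError(f"unsupported aggregate function: {func}")
    where = f" WHERE {predicate}" if predicate else ""
    return (f"SELECT {dimension}, {func}({measure}) "
            f"FROM {table}{where} GROUP BY {dimension};")


# Target view (subset selected by T) and comparison view (no predicate).
target_sql = view_sql("country", "loan_amount", "SUM",
                      predicate="sector = 'Entertainment'")
comparison_sql = view_sql("country", "loan_amount", "SUM")
print(target_sql)
print(comparison_sql)
```

Omitting the predicate yields the same view over the whole table, which is convenient when the same (A, M, F) combination has to be evaluated on both R and D.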
Clearly, a large number of possible aggregate views can be generated from the results of each posed exploratory query Q. In fact, the number of those views/visualizations is equal to the number of all possible combinations of dimensions, measures, and aggregate functions, that is, $|\mathbb{A}| \times |\mathbb{M}| \times |\mathbb{F}|$ (i.e., the product of the sizes of the dimension, measure, and aggregate-function sets). For instance, Figure 2a shows one of the possible visualizations of the result of the query Q above, in which sector = Entertainment. That visualization is equivalent to the following aggregate view $V_i$:
$V_i$: SELECT country, SUM(loan_amount) FROM Loans WHERE sector = 'Entertainment' GROUP BY country;
Notice that to enhance the readability of a visualization, the user might include an ORDER BY clause in the view definition described above. For example, the visualization shown in Figure 2a is generated after extending $V_i$ with ORDER BY SUM(loan_amount). However, including such a clause would only change the order of the visualized bars, not the insights revealed by the visualization. Moreover, a HAVING clause might be considered to specify a condition that must be met by each group (i.e., bar) in the visualization, for instance, HAVING SUM(loan_amount) > 5000. However, since in LoanVis those views are generated automatically, as we discuss in the next section, all the possible conditions for the HAVING clause would have to be considered during the view generation process. Clearly, there is an infinite number of such conditions, which would render the process of automated recommendation infeasible. Hence, a HAVING clause is excluded from our view generation model, and we adopt the basic aggregate view definition described above.
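The size of the view space can be illustrated by enumerating every (dimension, measure, function) combination. The attribute lists below are assumptions chosen for illustration (loosely following Table 1); the actual sets used by LoanVis are not specified here.

```python
from itertools import product

# Illustrative (assumed) attribute sets; the real sets depend on the schema.
dimensions = ["country", "sector", "activity",
              "borrower_genders", "repayment_interval"]
measures = ["loan_amount", "funded_amount", "lender_count", "term_in_months"]
functions = ["SUM", "COUNT", "AVG", "MIN", "MAX"]

# Every candidate aggregate view is one (A, M, F) triple.
views = list(product(dimensions, measures, functions))

# The number of candidates is the product of the three set sizes.
assert len(views) == len(dimensions) * len(measures) * len(functions)
print(len(views))  # 5 * 4 * 5 = 100 candidate views
```

Even with these small sets, a single exploratory query already yields 100 candidate views, which is why manual inspection does not scale.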

Recommending Insightful Visualizations
Typically, a data analyst is keen to find visualizations that reveal some interesting insights about the analyzed data. For instance, to conduct studies similar to [2,29,30,33], an analyst would be exploring the Kiva dataset looking for visualizations that might reveal interesting discrepancies or anomalies in loan disbursement and distribution. To that end, analysts need to manually construct a prohibitively large number of queries and visually explore their results looking for insights, which is clearly an ad hoc and labor-intensive process. Particularly, the complexity of the manual visual data exploration process is attributed to: (1) the large number of possible visualizations, and (2) the uncertainty about the interestingness of each visualization. The challenges mentioned above motivated multiple research efforts that focused on automatic recommendation for visual data exploration, that is, recommender systems that provide analysts with suggestions of interesting visualizations based on some objective, well-defined quantitative metrics (e.g., [21,24,26,27]).
For example, DeepEye is a visual insight recommendation system that employs a supervised machine learning approach to capture human perception by understanding existing examples [20]. QuickInsights [27] supports multiple types of data-driven insights for a comprehensive analysis (e.g., correlation, skewness in data distribution, diversity, etc.), and our work [37] studies the impact of data quality problems on discovering those insights. Meanwhile, SeeDB is one of the first visual insight recommendation systems that recommends top-k aggregate visualizations based on a data-driven, deviation-based approach [19,21]. Other works that leverage a deviation-based approach include MuVE [23], which addresses binning problems in visualization recommendation systems. For further details, we refer the reader to a comprehensive recent survey on this topic [16].
In this work, and similar to several existing approaches (e.g., [21,23,27]), we adopt a deviation-based metric, which is able to provide analysts with interesting visualizations that highlight some of the particular patterns of the analyzed datasets. In particular, the deviation-based metric measures the distance between $V_i(R)$ and $V_i(D)$. That is, it measures the deviation between the aggregate view $V_i$ generated from the subset data R vs. that generated from the entire database D. As such, $V_i(R)$ is denoted as the target view (e.g., sector = Entertainment), whereas $V_i(D)$ is denoted as the comparison view (e.g., sector = ALL). The premise underlying the deviation-based metric is that a view $V_i$ that results in a higher deviation is expected to reveal some interesting insights that are very specific to the subset R and distinguish it from the general patterns in D. That is particularly important when exploring the Kiva dataset since the deviation-based metric is naturally able to capture and quantify anomalies and discrepancies in loan distribution, which has been one of the main focuses of existing work (e.g., [2,29,30,33]).
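The distinction between the target and comparison views can be made concrete with a toy in-memory table. The rows below are made-up values for illustration only (they are not Kiva data), and the three-column schema is an assumption.

```python
import sqlite3

# Toy Loans table with made-up rows, only to illustrate target vs. comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Loans (country TEXT, sector TEXT, loan_amount REAL)")
rows = [
    ("Philippines", "Food", 500.0),
    ("Philippines", "Retail", 300.0),
    ("Kenya", "Agriculture", 400.0),
    ("Kenya", "Entertainment", 100.0),
    ("USA", "Entertainment", 900.0),
    ("USA", "Food", 200.0),
]
conn.executemany("INSERT INTO Loans VALUES (?, ?, ?)", rows)

# The same view (A = country, M = loan_amount, F = SUM) evaluated twice.
view = ("SELECT country, SUM(loan_amount) FROM Loans {where} "
        "GROUP BY country ORDER BY country")

# Comparison view V_i(D): the whole table, no predicate.
comparison = dict(conn.execute(view.format(where="")).fetchall())
# Target view V_i(R): only the rows selected by T.
target = dict(conn.execute(
    view.format(where="WHERE sector = 'Entertainment'")).fetchall())

print(comparison)
print(target)
conn.close()
```

Note that a group can be absent from the target view altogether (here, the Philippines has no Entertainment rows); when the two views are compared, such a group is naturally treated as having an aggregate value of zero.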
To ensure that all views are of the same scale, each target view $V_i(R)$ is normalized into a probability distribution $P[V_i(R)]$, and similarly, each comparison view $V_i(D)$ into $P[V_i(D)]$. Particularly, consider an aggregate view V = (A, M, F). A bar chart visualization of that aggregate view can be represented as the sequence of pairs $\langle (a_1, f_1), (a_2, f_2), \ldots, (a_l, f_l) \rangle$, where $l$ is the number of distinct values (i.e., groups) in the dimension attribute A, $a_i$ is the i-th group in attribute A, and $f_i$ is the aggregated value F(M) for the group $a_i$. For example, in Figure 2a, each $a_i$ is a country, whereas each $f_i$ is the amount of loans disbursed to that country $a_i$ for projects related to the entertainment sector. Finally, V is scaled by the sum of aggregate values $U = \sum_{p=1}^{l} f_p$, leading to the probability distribution P[V], which is computed as:

$$P[V] = \left\langle \left(a_1, \frac{f_1}{U}\right), \left(a_2, \frac{f_2}{U}\right), \ldots, \left(a_l, \frac{f_l}{U}\right) \right\rangle \tag{1}$$

For an arbitrary view $V_i$ (i.e., a specific combination of V = (A, M, F)), given the probability distributions of its target and comparison views (i.e., $P[V_i(R)]$ and $P[V_i(D)]$), the deviation $S(V_i)$ is computed as the distance between those probability distributions. Formally, for a given distance function dist (e.g., Euclidean distance, Earth Mover's distance, etc.), $S(V_i)$ is computed as:

$$S(V_i) = dist\left(P[V_i(R)], P[V_i(D)]\right) \tag{2}$$

In this work, we adopt Euclidean distance as our distance function. Hence, the deviation-based metric for a view $V_i$ is computed as:

$$S(V_i) = \sqrt{\sum_{j=1}^{l} \left(P[V_i(R)]_j - P[V_i(D)]_j\right)^2} \tag{3}$$

where $P[\cdot]_j$ denotes the normalized value of the j-th group. Consequently, the deviation $S(V_i)$ of each possible view $V_i$ is computed, and the k views with the highest deviation are recommended (i.e., top-k), as shown in Figure 1.
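The normalization and Euclidean deviation described above can be illustrated with a minimal Python sketch. It is not the LoanVis implementation: the function names `normalize` and `deviation` and the aggregate values are hypothetical, and a view is assumed to be a mapping from group to aggregated value.

```python
import math


def normalize(view):
    """Scale the aggregate values f_i by their sum U into a probability distribution."""
    total = sum(view.values())
    if total == 0:
        return {group: 0.0 for group in view}
    return {group: value / total for group, value in view.items()}


def deviation(target_view, comparison_view):
    """Euclidean distance between the normalized target and comparison views."""
    p_target = normalize(target_view)
    p_comparison = normalize(comparison_view)
    # A group missing from one view contributes probability 0 for that view.
    groups = set(p_target) | set(p_comparison)
    return math.sqrt(
        sum((p_target.get(g, 0.0) - p_comparison.get(g, 0.0)) ** 2 for g in groups)
    )


# Hypothetical aggregate values, not taken from the Kiva dataset.
target = {"USA": 800.0, "Kenya": 100.0, "Peru": 100.0}
comparison = {"USA": 500.0, "Kenya": 2500.0, "Peru": 2000.0}
score = deviation(target, comparison)
```

With these made-up values, the normalized target view is (0.8, 0.1, 0.1) and the normalized comparison view is (0.1, 0.5, 0.4), so the score is the square root of 0.49 + 0.16 + 0.09 = 0.74.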
Illustrative Example: Consider a data analyst trying to gain insights into the loans disbursed to projects in the Entertainment sector. Particularly, the analyst poses an exploratory query:

Q: SELECT * FROM loans WHERE sector = 'Entertainment';

Clearly, the query Q above will return all the information about all the loans related to the Entertainment sector. Such information includes different dimensions (e.g., country, repayment interval, gender, etc.) and different measures (e.g., loan amount, funded amount, lender count, etc.). Hence, the analyst can manually try creating different visualizations based on the different combinations of dimension and measure attributes, hoping that some of those visualizations would reveal interesting insights.
Alternatively, using our proposed LoanVis system, those insightful visualizations are quickly and automatically recommended to the analyst. In particular, LoanVis applies different SQL aggregate functions (i.e., F) on the views resulting from all the possible pairwise combinations of dimensions and measures (i.e., A and M), then the most interesting views are presented to the analyst (please see Figure 1). That is, LoanVis presents the top-k views with the highest deviation-based utility scores, where the value of k is set by the user.
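To give a sense of the size of that search space, the following sketch enumerates every (A, M, F) combination and selects the top-k by score. The attribute names, the list of aggregate functions, and the scores are illustrative assumptions; in LoanVis the score would come from the deviation-based metric rather than from a lookup table.

```python
from heapq import nlargest
from itertools import product

# Attribute names are illustrative; the actual Kiva schema may differ.
DIMENSIONS = ["country", "repayment_interval", "gender"]
MEASURES = ["loan_amount", "funded_amount", "lender_count"]
AGGREGATES = ["SUM", "AVG", "COUNT"]


def candidate_views(dimensions, measures, aggregates):
    """Enumerate every (A, M, F) combination that defines an aggregate view."""
    return list(product(dimensions, measures, aggregates))


def top_k(views, score, k):
    """Return the k views with the highest utility according to `score`."""
    return nlargest(k, views, key=score)


views = candidate_views(DIMENSIONS, MEASURES, AGGREGATES)

# Made-up utility scores standing in for the deviation-based metric.
hypothetical_scores = {
    ("country", "loan_amount", "SUM"): 0.90,
    ("gender", "lender_count", "AVG"): 0.40,
}
best = top_k(views, lambda view: hypothetical_scores.get(view, 0.0), k=2)
```

Even with three dimensions, three measures, and three aggregate functions, there are already 27 candidate views for a single query, which is why manual exploration scales poorly.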
Figure 2a shows the top-1 target view recommended by LoanVis based on the user input query Q. In particular, out of all the possible combinations of A, M, and F, the view recommended by LoanVis is based on a visualization in which A (x-axis) = country, M (y-axis) = loan amount, and F = SUM(). Such a view is equivalent to the following SQL query V_t:
V_t: SELECT country, SUM(loan_amount) FROM loans WHERE sector = 'Entertainment' GROUP BY country;

Essentially, LoanVis recommends the view shown in Figure 2a because it achieves the highest score according to our ranking utility function (i.e., the deviation-based metric). Specifically, the visualization based on the entertainment sector (the target view shown in Figure 2a) shows the highest deviation from the same visualization when generated for the aggregation of all sectors combined (the comparison view in Figure 2b), where the comparison view is equivalent to the following SQL query V_c:

V_c: SELECT country, SUM(loan_amount) FROM loans GROUP BY country;

To help understand that recommendation, we combine the target view V_t and comparison view V_c in Figure 3, which reveals some very interesting observations regarding the disparity of loan distribution over the different sectors across different countries. As Figures 2b and 3 show, countries such as the Philippines, Kenya, Peru, Rwanda, Uganda, Colombia, Pakistan, Lebanon, Mexico, and Samoa are the ones that received the biggest share of total loans (summed over all the different sectors). However, when it comes to the specific loans to the entertainment sector, it is the United States of America that received most of the loans in that sector, at roughly $800,000. In fact, examining Figure 3 shows that the entertainment sector loans constitute about 2.67% of the total USA loans, whereas in the Philippines, that percentage drops to only 0.1%. That is, while projects related to the entertainment sector constitute a significant percentage in the USA, they are of lesser significance in other countries, where most loans are related to other sectors (e.g., agriculture, etc.).
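The target and comparison queries can be tried on a toy table, as in the sketch below. This is only an illustration: the paper does not state which database engine LoanVis uses, so SQLite is an assumption here, as are the `loans` table layout, the `loan_amount` column name, and all of the rows.

```python
import sqlite3

# Hypothetical rows (country, sector, loan_amount); not taken from the Kiva dataset.
rows = [
    ("United States", "Entertainment", 800.0),
    ("United States", "Agriculture", 200.0),
    ("Philippines", "Entertainment", 50.0),
    ("Philippines", "Agriculture", 4950.0),
    ("Kenya", "Agriculture", 3000.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (country TEXT, sector TEXT, loan_amount REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)

# Target view: aggregate only over the subset selected by the analyst's predicate.
target_view = dict(conn.execute(
    "SELECT country, SUM(loan_amount) FROM loans WHERE sector = ? GROUP BY country",
    ("Entertainment",),
))

# Comparison view: the same aggregate over the entire table.
comparison_view = dict(conn.execute(
    "SELECT country, SUM(loan_amount) FROM loans GROUP BY country"
))
conn.close()
```

Note that the two views need not contain the same groups: in this toy data, Kenya appears only in the comparison view, so a distance computation would have to treat its target value as zero.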

Aspect-Based Recommendation
Notice that in the previous discussion, the analyst had to specify two inputs in their exploratory query: (1) an aspect for analysis (e.g., sector), and (2) a specific value within that aspect (e.g., entertainment). Such specification is realized using the exploratory query predicate T (e.g., T: WHERE sector = 'Entertainment'). However, working with the Kiva dataset, we have learned firsthand that it is often challenging to specify the aspects and corresponding values that would eventually lead to interesting visualizations being recommended. For instance, an analyst might assume that an exploration based on WHERE sector = 'Education' or WHERE sector = 'Agriculture' would lead to some interesting recommended visualizations. However, during our analysis, we realized that all the possible visualizations based on those two particular sectors exhibit very low deviation, including the top-k ones. That is, there was nothing unique about the loans disbursed to those sectors, and their patterns followed the same pattern as that of the aggregated loans disbursed to all sectors. It was only after several rounds of experimenting that we discovered that the visualizations based on the Entertainment sector are the ones that reveal some interesting insights. The same holds for the visualizations based on the Wholesale and Construction sectors, which are presented in the next section.
However, current visualization recommendation systems (e.g., [18,26,27]) assume that the analyst is able to formulate a well-defined query that selects a subset of data, which leads to insightful visualizations being recommended (i.e., visualizations with a high utility score). That is, they are limited to only recommending interesting visualizations based on a precise exploratory query for which the analyst provides all the necessary query filters. In reality, however, it is typically challenging to pose an exploratory query that immediately reveals some insights. Hence, exploration becomes a continuous process of trial and error, in which the analyst keeps refining their query filters manually and iteratively until some interesting visualizations are recommended. Therefore, in our design of LoanVis, we emphasize that, in addition to the existing techniques for automatically recommending interesting views, there is an equal need for additional techniques that can also automatically select subsets of data that would potentially provide such interesting views. Hence, our goal in this work is not only to recommend interesting visualizations but also to recommend exploratory queries that lead to such visualizations.
To achieve that goal, LoanVis expands and explores a larger search space of possible visualizations in order to recommend the top-k most insightful ones. In particular, we introduce the aspect-based recommendation, where an aspect could be any of the dimension or measure attributes of the analyzed dataset. More formally, in addition to the set of dimensions A and measures M, we introduce the set of aspects C, where C = A ∪ M.
Hence, for a given aspect C ∈ C, LoanVis explores its distinct values searching for those that might result in high-utility visualizations. Particularly, for an aspect C (e.g., sector), which takes a set of distinct values c_1, c_2, ..., LoanVis iterates through all the distinct values in C (Algorithm 1 line 1). Then, for each distinct value c_i, it generates all the possible visualizations that are based on selecting the subset of data that satisfies that value c_i (e.g., Entertainment) (Algorithm 1 lines 4-6). That process is repeated for all values in C, and the top-k visualizations with the highest deviation values are recommended to the analyst (Algorithm 1 line 10). That is, instead of recommending a visualization only in terms of the tuple (A, M, F), LoanVis expands the recommendation process and recommends (c, A, M, F), where c is a distinct value along an analyzed aspect C. For instance, Figure 3 shows a LoanVis recommendation, which is equivalent to the tuple (Entertainment, Country, LoanAmount, SUM()). Our detailed analysis presented in the next section is fully based on the aspect-based recommendation provided by LoanVis.

Algorithm 1 Aspect-based Recommendation
1: for each distinct value c_i ∈ C do
2:   for each dimension A ∈ A do
3:     for each measure M ∈ M do
4:       Generate target view V_t based on c_i, A, M, C
5:       Generate comparison view V_c based on A, M, C
6:       Calculate the deviation dist(P[V_c], P[V_t])
7:     end for
8:   end for
9: end for
10: Sort the generated views V based on their deviation score.
Output: Top-k views.
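The aspect-based search can be sketched in Python as follows. This is a simplified illustration rather than the LoanVis code: it assumes the data fits in memory as a list of dictionaries, fixes the aggregate function to SUM, and uses hypothetical helper names (`aggregate`, `deviation`, `recommend`) and made-up rows.

```python
import math
from collections import defaultdict


def aggregate(rows, dimension, measure):
    """SUM(measure) grouped by dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def deviation(target, comparison):
    """Euclidean distance between the two views after normalizing each to sum to 1."""
    t_sum, c_sum = sum(target.values()), sum(comparison.values())
    if t_sum == 0 or c_sum == 0:
        return 0.0
    groups = set(target) | set(comparison)
    return math.sqrt(sum(
        (target.get(g, 0.0) / t_sum - comparison.get(g, 0.0) / c_sum) ** 2
        for g in groups
    ))


def recommend(rows, aspect, dimensions, measures, k):
    """Score every (c, A, M, SUM) view along `aspect` and return the top-k."""
    scored = []
    for value in sorted({row[aspect] for row in rows}):  # Algorithm 1 line 1
        subset = [row for row in rows if row[aspect] == value]
        for dimension in dimensions:
            if dimension == aspect:
                continue  # grouping by the filtered aspect yields a single bar
            for measure in measures:
                # Algorithm 1 lines 4-6: target view, comparison view, deviation
                target = aggregate(subset, dimension, measure)
                comparison = aggregate(rows, dimension, measure)
                score = deviation(target, comparison)
                scored.append((score, value, dimension, measure, "SUM"))
    scored.sort(key=lambda item: item[0], reverse=True)  # Algorithm 1 line 10
    return scored[:k]


# Hypothetical rows; not taken from the Kiva dataset.
loans = [
    {"sector": "Entertainment", "country": "USA", "gender": "male", "loan_amount": 400.0},
    {"sector": "Entertainment", "country": "USA", "gender": "female", "loan_amount": 400.0},
    {"sector": "Entertainment", "country": "Philippines", "gender": "female", "loan_amount": 50.0},
    {"sector": "Agriculture", "country": "USA", "gender": "female", "loan_amount": 200.0},
    {"sector": "Agriculture", "country": "Philippines", "gender": "female", "loan_amount": 4950.0},
]
top = recommend(loans, "sector", ["country", "gender"], ["loan_amount"], k=1)
```

On this toy data the top-1 recommendation is the tuple (Entertainment, country, loan_amount, SUM), because the Entertainment subset is concentrated in one country while the overall loans are concentrated in the other. Note that the comparison view does not depend on the aspect value, so a practical implementation would likely compute it once per (A, M) pair rather than inside the outer loop.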

System and Results
We have conducted an extensive exploratory analysis of the Kiva dataset using our proposed LoanVis system. In this section, we present some of the visualizations recommended by LoanVis, together with some of the insights derived from those visualizations. The presented visualizations are the ones that received the highest utility score, according to the employed data-driven, deviation-based metric (please see Section 2.2).
Figure 4 shows a screenshot of LoanVis, which enables analysts to explore the Kiva dataset and recommends to the analysts visualizations that suit their exploratory analysis and are based on the deviation-based metric. LoanVis is developed using Python 3.7 under the PyCharm IDE. Our user interface is developed using the Dash package, which allows for the creation of an interactive web-based data application. Finally, the Plotly library is utilized for generating dynamic data visualizations.
As Figure 4 shows, LoanVis enables two forms of visual data exploration: (1) manual exploratory search and (2) automated recommendation-based exploration. Particularly, as shown in the top part of the interface, LoanVis allows analysts to specify parameters to manually construct different visualizations of the Kiva dataset across its different dimensions and measures. Alternatively, and as shown in the lower part of the interface, analysts can rely on LoanVis to automatically recommend insightful visualizations based on their selection of the explored aspects (e.g., sector, gender, etc.). In the rest of this section, we focus on the automated recommendations generated by LoanVis.
Notice that in this work we focus on the effectiveness of LoanVis, that is, the interesting insights it discovers, whereas efficiency issues (i.e., query execution time) are beyond the scope of this work. Meanwhile, techniques for optimizing the query processing time of visualization recommendation systems have been proposed in some of our related work (e.g., [23][24][25]) and are directly applicable to our LoanVis system. Table 2 presents a summary of all the results presented in this section (i.e., the recommended visualizations). For each result, the table shows the aspect of the Kiva dataset explored by the analyst, together with the different elements that constitute the corresponding visualizations recommended by LoanVis.
Particularly, for each recommended visualization, the table lists the following: (1) the aspect explored by the analyst (e.g., sector, country, etc.), (2) the particular value along that aspect recommended by LoanVis (e.g., sector = entertainment), (3) the dimension, measure, and aggregate function employed in the visualization recommended by LoanVis, and (4) the utility value of that recommended visualization. Notice that the maximum possible utility value for any visualization under the Euclidean distance measure is √2 [24].
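The √2 bound can be checked with a short derivation, sketched here under the assumption that both views are normalized as in Equation (1). The symbols $p_j$ and $q_j$ are shorthand introduced only for this sketch and denote the normalized values of the j-th group in the target and comparison views, respectively.

```latex
\begin{align*}
S(V_i)^2 &= \sum_{j=1}^{l} (p_j - q_j)^2
          = \sum_{j=1}^{l} p_j^2 + \sum_{j=1}^{l} q_j^2 - 2 \sum_{j=1}^{l} p_j q_j \\
         &\le 1 + 1 - 0 = 2
\end{align*}
```

The inequality holds because all $p_j$ and $q_j$ are non-negative and each set sums to 1, so $\sum_j p_j^2 \le (\sum_j p_j)^2 = 1$ (likewise for $q$) and $\sum_j p_j q_j \ge 0$. Equality is reached only when each view places all of its mass on a single group and the two groups differ, which is the most extreme disagreement two bar charts over the same dimension can show.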

Automated Recommendations for the Sector Aspect
When exploring the Sector aspect of the Kiva dataset, the top-1 visualization recommended by LoanVis according to the employed deviation metric is the one based on the Entertainment sector, which has been discussed in the previous section and presented in Figure 3. Recall that the recommended visualization shows the significant discrepancy associated with the entertainment sector, where for most countries, loans provided for that sector are minimal, except for the USA.
In addition to Figure 3, that insight could be further understood by examining Figure 5. Particularly, Figure 5 shows the normalized distributions of the amount of loans for all sectors (i.e., comparison view) vs. the normalized distribution of loans directed to the entertainment sector (i.e., target view).
As expected, and as shown in the figure, the sum of all the normalized values in each of the target and comparison views adds up to 1.0 (please see Equation (1)). Further, the figure clearly emphasizes and clarifies the discrepancy highlighted earlier in Figure 3. For instance, while a country such as the Philippines receives almost 10% of the total Kiva loans distributed worldwide, its share of the entertainment loans does not exceed 1%. In comparison, projects in the USA receive less than 6% of the total distributed loans, whereas its share of the loans directed to the entertainment sector is the highest among all countries at almost 10%. Similarly, as Figure 5 shows, Israel receives only 0.12% of the total loans worldwide but receives 3.12% of the entertainment loans.
While both Figures 3 and 5 deliver the same insights, Figure 3 utilizes two scales for plotting the absolute values on the Y-axis, whereas Figure 5 uses normalized values derived based on Equation (1).For the sake of simplicity and to enhance readability, all of the remaining visualizations are presented using absolute values, similar to Figure 3.
Figure 6a shows the top-2 visualization recommended by LoanVis along the Sector aspect. In contrast to the top-1 visualization, which is based on the Country dimension (Figure 3), this top-2 recommendation is based on the Gender dimension (Figure 6a). Particularly, Figure 6a shows the total amount of loans funded for projects in all sectors per gender (i.e., comparison view) vs. the loans funded in the specific Wholesale sector (i.e., target view). From Figure 6a, it is interesting to notice that, in general, female-led projects receive most of the funding (about $200 M vs. $60 M for male-led projects), as shown in the comparison view. However, for projects in the particular Wholesale sector, male-led projects seem to receive the higher share (about $300 k for female-led wholesale projects vs. $400 k for male-led ones). Interestingly, a similar discrepancy applies to projects in the Construction sector, which was the top-3 visualization recommended by LoanVis, and is shown in Figure 6b. In fact, during our analysis, we initially thought that male-led projects would receive the highest amount of loans in the Construction sector. However, as automatically discovered by LoanVis, that discrepancy is more pronounced in the Wholesale sector (Figure 6a), which scored a deviation value of 0.428, whereas the Construction sector (Figure 6b) came next with a deviation value of 0.4204.

Automated Recommendations for the Country Aspect
Figure 7a shows the top-1 recommended visualization for the Country aspect. That is, when the analyst utilizes LoanVis to recommend visualizations based on a country selection, LoanVis recommends selecting the country of Namibia and also recommends visualizing the distribution of the loans over the different repayment methods (as shown in Figure 7a). Particularly, the figure shows the number of loans paid under each of the different repayment methods. Interestingly, for all countries (i.e., the comparison view), monthly repayments are the most popular method for paying back the funded loans, followed by irregular repayments, then bullet repayments (i.e., paying back the loan all at once in full amount). However, for Namibia (i.e., the recommended target view), the pattern is completely different. Particularly, as the figure shows, all loans directed to projects in Namibia were paid back as bullet repayments. That significant discrepancy between how loans are paid for Namibian projects vs. the rest of the world led to that view achieving a high deviation value of 1.12, making it a top recommendation (recall that the maximum possible deviation under the Euclidean distance measure is √2).
Figure 7b shows another visualization that LoanVis recommended among the top ones during our analysis along the Country aspect. As is already known from previous studies of the Kiva dataset, most funded projects are led by females, which is also confirmed by the general distribution of loans in the comparison view shown in Figure 7b. However, LoanVis automatically discovered a unique pattern for the country of Congo, which is shown as the target view in Figure 7b. Particularly, and differently from the general pattern, Figure 7b shows that in Congo the vast majority of funded projects are led collaboratively by both males and females. In fact, upon further analysis, we realized that those mixed-gender projects constitute 92% of the funded projects in Congo, while they constitute only 0.15% of the funded projects worldwide.

Automated Recommendations for the Year Aspect
In this experiment, we focus on automatically generating recommendations based on the Year aspect. Figure 8 shows the top-1 visualization recommended by LoanVis for that aspect. Particularly, as the figure shows, LoanVis recommended the selection of Year = 2016 and also recommended a visualization in which the Y-axis is the Average Loan Amount and the X-axis is the Country. Figure 8 shows the overall distribution of the average loan amount received by each country over all the recorded years (i.e., the comparison view, which is shown in black). However, that general distribution is significantly different in 2016, the year recommended by LoanVis.
Specifically, looking at the distribution of loans in 2016 (i.e., the target view, which is shown in red), we notice that while the distribution of the average loan amount of most countries followed the same general pattern as in the comparison view, some discrepancies stand out, namely: (1) a few countries did not receive any loans in 2016 (e.g., Bhutan, Chile, Congo, and Iraq), and (2) the country of South Sudan received loans in an average amount much higher than the loans it received in the other years. Digging deeper into that discrepancy, we realized that in 2015/2016, Kiva agreed to allow a one-year grace period for loans disbursed in South Sudan and also agreed to restructure repayment plans [38].
Interestingly, our observation was further emphasized by the top-2 visualization recommended by LoanVis, which is shown in Figure 9. In that recommended visualization, the comparison view shows the average number of lenders per project over the different years for all countries, whereas the target view shows the average number of lenders per project for all countries in 2016 (in red). Looking at the general distribution captured by the comparison view, we notice a skewed distribution, in which projects in Bhutan got the highest number of lenders per project (about 200 lenders per funded project), followed by Chile and Namibia, while Nigeria comes last in terms of the average number of lenders per project. However, while the distribution captured by the target view for 2016 still shows a skewed pattern, the details of that pattern are significantly different from the general one. Particularly, as Figure 9 shows, in 2016, the country with the maximum number of lenders per project was South Sudan, followed by Somalia, then Namibia. Noticing that projects in South Sudan received the highest number of lenders in 2016 (as shown in Figure 9) might provide a potential explanation for those projects receiving loans in high amounts (as shown in Figure 8).

Automated Recommendations for the Gender Aspect
In this last analysis, we examine generating recommendations based on the Gender aspect. Figure 10 shows LoanVis' recommendation, in which it automatically selected Gender = Male and also recommended the Y-axis to be the number of loans (i.e., COUNT()) and the X-axis to be the Repayment Interval. As the figure shows, in the general pattern (i.e., comparison view), irregular payment is the most popular method for paying back loans, followed by monthly, then bullet payment. However, the figure also shows that male-led projects exhibit a different pattern, in which monthly payment is the most popular, followed by bullet, then irregular. Discovering and studying that discrepancy could potentially assist decision-makers and lenders in structuring the repayment options for their loans.

Conclusions and Future Work
Motivated by the need for unlocking valuable insights from crowdfunding data, in this paper we propose our LoanVis solution for visualization recommendation. Unlike existing work that relies on manual data exploration for discovering insights from crowdfunding databases, LoanVis is an automated solution that utilizes objective metrics to quantify the utility of the recommended visualization. Our experimental evaluation demonstrated the effectiveness of LoanVis in recommending high-utility visualizations that reveal some interesting insights into some of the crowdfunding loan distribution patterns, based on the Kiva dataset.
Currently, LoanVis relies only on the deviation-based metric for capturing the interestingness of a visualization. To address that limitation in the future, in addition to the deviation-based measure, we plan to explore an expanded set of utility metrics and incorporate them into our LoanVis solution. Examples of such data-driven metrics include correlation, skewness in data distribution, and diversity [26,37]. Moreover, we will investigate combining and integrating several of those data-driven utility metrics into hybrid multi-objective functions so that we can recommend visualizations that satisfy different requirements and expectations.
Furthermore, notice that while our data-driven approach has its clear advantages in visualization recommendation, it suffers from a lack of personalization, as recommendations may not be tailored to individual user preferences and needs. As such, in the future, we will investigate learning the user preference for visualization recommendations. That is, we will use ML-based classification techniques to learn what makes a certain visualization interesting from the user's perspective. In turn, we also plan to conduct a real-world user study to further assess the effectiveness of our proposed LoanVis.


Figure 2. (a) Loans for the Entertainment sector. (b) Loans for all sectors combined.

Figure 3. Deviation in loans distribution for all sectors (comparison view in black) vs. loans for the entertainment sector (target view in red).

Figure 5. Normalized probability distribution in loans disbursed for all sectors (comparison view in blue) vs. loans for the entertainment sector (target view in green).

Figure 6. LoanVis top recommendations for the Sector aspect. (a) Deviation in loans distribution for all sectors vs. loans for Wholesale based on Gender. (b) Deviation in loans distribution for all sectors vs. loans for Construction based on Gender.

Figure 7. LoanVis top recommendations for the Country aspect. (a) In Namibia (target view in red), most loans are paid back at once as bullet repayments, in contrast to the repayment methods prevailing worldwide (comparison view in black). (b) In Congo (target view in red), most projects are led by mixed-gender teams, whereas worldwide (comparison view in black) most projects are female-led.

Figure 8. Deviation in loan distribution for all years (comparison view in black) vs. loans for 2016 (target view in red).

Figure 9. Deviation in number of lenders for all years vs. 2016.

Figure 10. Distribution of repayment interval for all projects (comparison view) vs. male-led projects (target view).


Table 1. The Schema for the Kiva Dataset.

Table 2. Summary of the LoanVis Recommendations.