Digitalisation and Big Data Mining in Banking

Banking, as a data-intensive subject, has been progressing continuously under the influence of the era of big data. Exploring advanced big data analytic tools such as Data Mining (DM) techniques is key for the banking sector, which aims to reveal valuable information from the overwhelming volume of data and achieve better strategic management and customer satisfaction. In order to provide sound direction for future research and development, a comprehensive and up-to-date review of the current research status of DM in banking will be extremely beneficial. Since existing reviews only cover applications until 2013, this paper aims to fill this research gap and presents the significant progressions and most recent DM implementations in banking post 2013. By collecting and analyzing the trends of research focus, data resources, technological aids, and data analytical tools, this paper contributes valuable insights with regard to the future developments of both DM and the banking sector, along with a comprehensive one-stop reference table. Moreover, we identify the key obstacles and present a summary for all interested parties that are facing the challenges of big data.


Introduction
The era of big data has brought both big opportunities and challenges; almost all science subjects are experiencing overflowing information at unpredictable volumes and speeds [1]. As such, revealing the hidden information in big data via Data Mining (DM) techniques has become an emerging trend and ultimate objective for a wide range of studies [2][3][4][5]. As a data-intensive subject, banking has been a popular implementation field for researchers with DM skills over the past decades of the information science revolution. Banks have acknowledged that knowledge, rather than financial resources, is the new biggest asset [6]. Moreover, the development and popularization of e-banking and mobile banking add to the exponential growth of real-time banking information. These continuous developments and the rapidly increasing availability of big data make mastering relevant big data analytics tools one of the most crucial tasks for the banking sector.
Following a comprehensive investigation of existing literature, to the best of our knowledge, only two review papers focused on DM applications in banking [7,8], and both covered DM implementations before 2013. For such a rapidly developing subject, it is important to provide researchers and interested parties with the most up-to-date status of DM and banking collaborations. As such, we thoroughly review the DM applications in banking, especially for the recent years, post 2013. It is noteworthy that we will not repeat the contents covered in [7,8], but instead focus on the most recently developed DM applications in the banking sector. This paper aims to serve as the most up-to-date one-stop directory guide for relevant researchers and to apprise them of the evolution of big data analytics in banking with an outlook for future research.
A summary of the recent applications shows that big data in banking have been exploited for improving customer satisfaction and marketing, and for optimizing strategic management. Specifically, the recent applications collected in this paper mainly targeted four topics: security and fraud detection, risk management and investment banking, customer relationship management (CRM), and other advanced supports. Moreover, this paper contributes by gathering and analyzing information on DM techniques, software, and data resources (among other factors). As such, the contents of this review ensure its novelty and its meaningful contribution towards filling the existing gap in academic literature.
The remainder of this paper is organized such that the methodology is presented in Section 2; the value creation of DM in banking by topics along with a summarized reference table are reviewed in Section 3; the key techniques and software trends are briefly summarized in Section 4; Section 5 finally concludes and proposes directions and challenges for future research.

Methodology
We adopt the methodology presented in [9] considering its sophisticated research design, which is illustrated in Figure 1. Firstly, as clarified in the previous section, the research scope is defined as the DM applications in banking post 2013. The search process follows the usual approach of defining a range of keywords, where we employed the significant terms for both banking and DM techniques, including banking, fraud detection, credit card, credit scoring, risk management, deposit, mortgage, debit, loan, CRM, bank marketing; and data mining, clustering, text mining, classification and other specific DM technique terms. It is noteworthy that a pairwise searching approach is conducted, ensuring that at least one term from each aspect is represented. Moreover, the scope of the search focuses solely on reliable academic resources from leading journals (e.g., Expert Systems with Applications, Decision Support Systems, Data Mining and Knowledge Discovery) and leading conferences (e.g., IEEE series, and Knowledge Discovery and Data Mining). As a result, over 100 applications were identified to be manually reviewed and summarized in the following section.
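The pairwise keyword search can be illustrated with a minimal sketch. The two term lists are taken from the text above (the "other specific DM technique terms" are not enumerated in the paper and are therefore omitted); the quoted `AND` query syntax and the function name `pairwise_queries` are assumptions for illustration only, not the search tool actually used.

```python
from itertools import product

# Terms listed in the Methodology section; further DM technique terms are
# not enumerated in the paper and are left out here.
BANKING_TERMS = [
    "banking", "fraud detection", "credit card", "credit scoring",
    "risk management", "deposit", "mortgage", "debit", "loan", "CRM",
    "bank marketing",
]
DM_TERMS = ["data mining", "clustering", "text mining", "classification"]


def pairwise_queries(banking_terms, dm_terms):
    """Build one query per (banking term, DM term) pair.

    Every query contains exactly one term from each aspect, so both
    aspects are always represented. The AND syntax is hypothetical.
    """
    return [f'"{b}" AND "{d}"' for b, d in product(banking_terms, dm_terms)]


queries = pairwise_queries(BANKING_TERMS, DM_TERMS)
```

With 11 banking terms and 4 DM terms, this yields 44 candidate queries, each of which would then be run against the selected journals and conference proceedings.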

Value Creation of DM in Banking by Topics
Having reviewed over 100 recent DM applications in banking post 2013, it can be generally concluded that the banking sector mainly adopts DM techniques for the following purposes:
Security and fraud detection: Big secondary data such as transaction records are monitored and analysed to enhance banking security and to distinguish unusual behaviour and patterns indicating fraud, phishing, or money laundering (among others).
Risk management and investment banking: Analysis of in-house credit card data, which are freely accessible to banks, enables credit scoring and credit granting, which are among the most popular tools for risk management and investment evaluation.
CRM: DM techniques have been widely applied in banking for marketing and customer relationship management purposes such as customer profiling, customer segmentation, and cross/up selling. These help the banking sector to better understand its customers, predict customer behaviour, accurately target potential customers, and further improve customer satisfaction with a strategic service design.
Other advanced supports: A few less mainstream applications focus on branching strategy, and efficiency and performance evaluation, which can significantly assist in achieving strategic branch locating and expansion plans.
In what follows, we briefly review the collected publications with respect to the areas of interest. Moreover, to serve as the one-stop reference directory for all recent DM applications in banking post 2013, a summary is provided in Table 1, where the literature is grouped by means of value creation, along with detailed information such as data resources/regions and the DM techniques adopted. Note that most implementations apply more than one DM technique, and some applications did not clarify the specifics due to confidentiality restrictions.

Security and Fraud Detection
In order to maintain high standards of security amidst the overwhelming flow of big banking data and the rapidly growing scale and complexity of cyber crimes, researchers have been exploring advanced DM techniques for effectively identifying unusual fraud behaviour. Note that an existing review that targeted credit card processing can be found in [10]. From an internal perspective, survey data of bank employees in India are collected in [11] to analyse their perceptions with regard to fraud.
Many researchers worked with transaction data, seeking better approaches to distinguish fraudulent patterns from genuine behaviour with higher efficiency and accuracy [12][13][14][15][16][17][18][19][20][21][22][23][24][25]. Among these, Wei et al. [12] proposed a framework named i-Alertor for major Australian banks; a semi-supervised decision support system named BankSealer was proposed in [14] for an Italian bank; the authors in [15] proposed a hybrid DM method to predict network intrusions and detect fraud activities; the FraudMiner model, which integrated frequent itemset mining, was introduced in [16] and verified with the data set from the UCSD DM contest 2009; a comparative study [17] addressed the ensemble approach to building classifiers; as a recent advancement of FraudMiner, the authors in [18] introduced the LINGO clustering technique [26] for the pattern matching process, and this enhancement helped maintain a satisfactory accuracy while further reducing the false alarm rate; Behera and Panigrahi [19,20] demonstrated a hybrid approach for credit card fraud detection by combining Fuzzy Clustering and NN techniques, and achieved over 93% accuracy with the dataset generated in [27]; APATE was proposed in [21] for automated fraud detection within a large credit card issuer in Belgium; both Luhn's and Hunt's algorithms were employed in [22] to propose a novel system of credit card fraud detection; the authors in [23] illustrated the use of DM techniques on customer data to add a higher level of authentication to banking processes for real-time fraud detection; a hybrid approach combining a genetic algorithm and NN was proposed in [24] for Greek companies in the banking sector; a framework named FDiBC was developed in [31] for fraud detection within the Saman Bank in Iran; and an e-banking security system employing Cryptography and Steganography was introduced in [25] for preventing online banking fraud.
Apart from the main implementations on transaction data, the authors in [28] focused on phishing detection from official banking websites and applied a multi-label classifier based on associative classification DM for detecting phishing in websites with high levels of accuracy. In order to improve customer credit card churn prediction for a Latin-American bank, the authors in [29] adopted improved DM techniques based on K-means clustering and support vector machines (SVM). Blog mining (text mining and cluster analysis) was applied in [30], where security risks, protection strategies and security trends of mobile banking were summarized from more than 200,000 results of the Google blog search engine.
There are also researchers who paid extra attention to money laundering detection. For instance, a DM model is presented in [6] that applied K-means clustering and Association Rule Mining for identifying suspected sequences of money laundering processes. A novel technique named Bitmap Index-based DT was proposed in [32] for evaluating the risk factor of money laundering with the Statlog German credit data.
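For readers unfamiliar with Luhn's algorithm mentioned above, the following is a minimal sketch of the standard Luhn checksum used to validate card numbers. It shows only the generic, publicly known checksum; it is not the detection system of [22], whose details are not given here, and the function name `luhn_is_valid` is an assumption.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits in card_number pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A checksum of this kind only screens out mistyped or malformed numbers; it says nothing about whether a transaction is fraudulent, which is why the reviewed systems combine it with classification or clustering techniques.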
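To make the Association Rule Mining step more concrete, the sketch below computes the two basic rule measures, support and confidence, on a toy set of account event baskets. The event names and data are entirely hypothetical and the code is not the model of [6]; it only illustrates how a rule such as "cash deposit and wire transfer implies offshore account" would be scored.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical event baskets, one per account and time window.
baskets = [
    {"cash_deposit", "wire_transfer", "offshore_account"},
    {"cash_deposit", "wire_transfer"},
    {"cash_deposit", "card_payment"},
    {"wire_transfer", "offshore_account"},
    {"cash_deposit", "wire_transfer", "offshore_account"},
]

rule_support = support(baskets, {"cash_deposit", "wire_transfer"})
rule_confidence = confidence(
    baskets, {"cash_deposit", "wire_transfer"}, {"offshore_account"}
)
```

In this toy data the antecedent appears in three of five baskets (support 0.6) and is followed by the consequent in two of those three (confidence of about 0.67). In practice, rules would be mined with minimum support and confidence thresholds, possibly within clusters produced by K-means.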

Table 1 (excerpt). CRM, customer satisfaction [56]. DM techniques: K-means clustering, classification (NN). Region: Spain. Value creation: make the most strategic investment in maintaining and enhancing customer satisfaction.

Risk Management and Investment Banking
Those interested in DM applications in credit scoring up until 2012 are referred to the review in [33].
A large proportion of recent applications applied DM techniques for credit scoring across banking sectors globally [34][35][36][37][38][39][40][41][42]. A bank in Indonesia was investigated in [34]. Chen et al. [35] analyzed data from 16 listed Chinese commercial banks, whilst the data set from the Export Development Bank of Iran was evaluated in [37]. Koh et al. [36] constructed a two-step method for credit scoring using the database of a German bank. Similarly, with the German credit scoring dataset, Harris [38] demonstrated the clustered SVM classifier for credit scoring, and Zhao et al. [39] presented an improved Multi-layer Perceptron NN model employing the back propagation algorithm. The authors in [40] proposed a credit risk evaluation approach with the assistance of external evaluation and sliding window testing; it was verified on real-life data from EDGAR. Later, Alaraj and Abbod [41,42] adopted the classifiers consensus system and proposed a hybrid model for credit scoring. The authors applied the classifier combination rule based on the consensus approach in experiments with seven real-world credit data sets.
It is noteworthy that a few research studies specifically targeted classification techniques and their applications in credit scoring, leading to significant research developments. For instance, Lessmann et al. [43] reported on relevant research up until 2014 and conducted comprehensive experiments with real-life Australian and German credit data sets to seek the optimal classifier. Louzada et al. [44] recently produced a systematic review that specifically focused on the applications of classification techniques for credit scoring. Here, the main classification methods for credit scoring were summarized and introduced along with a detailed analysis of theoretical and paradigm trends. A recent comparative study in [45] conducted experiments on credit data sets from six different regions (Australia, Germany, Iran, Japan, Poland and the US) to rank 25 different classifiers and identify the best classifier for credit scoring.
There are also researchers who focused on the decision making process of credit granting [46][47][48]. Specifically, a personal bankruptcy system was proposed in [46] for bad account prediction and was implemented on the credit card data set from a Canadian bank. The authors in [47] analyzed the capacity of credit union members to settle their commitments. A decision support system for banks was proposed in [48] for lending institutions to monitor account receivables and maintain profitability.
With regard to DM applications in risk management of peer-to-peer (P2P) lending, the authors in [49] focused on profit scoring by predicting the internal rate of return for a decision support system of P2P lending. The proposed model was verified by an experiment using US Lending Club data. Similarly, the research in [50] adopted LR and K-means clustering techniques for detecting bad credit scores in P2P lending data. Recently, Xia et al. [51] employed three real-life credit data sets and two P2P lending data sets for evaluating the performance of a newly proposed approach that employed extreme gradient boosting and Bayesian hyper-parameter optimization.
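Profit scoring relies on the internal rate of return (IRR) of a loan, i.e., the rate r at which the net present value of its cash flows, the sum over periods t of CF_t / (1 + r)^t, equals zero. The following minimal sketch computes the IRR of a known cash flow series by bisection; it illustrates the target quantity only and is not the predictive model of [49]. The function names, the bracket defaults and the example cash flows are assumptions.

```python
def npv(rate, cash_flows):
    """Net present value of cash_flows, where cash_flows[t] occurs at period t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9, max_iter=200):
    """Find a rate in [low, high] where the NPV is zero, using bisection.

    Assumes the NPV changes sign exactly once on the bracket, which holds
    for a loan with one initial outflow followed by repayments.
    """
    f_low, f_high = npv(low, cash_flows), npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on the given bracket")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid  # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2


# Hypothetical loan: lend 1000 now, receive 1100 after one period.
example_irr = irr([-1000.0, 1100.0])
```

For the hypothetical loan above the IRR is 10% per period. A profit scoring model would be trained to predict this value from borrower and loan attributes before the loan is funded, rather than computing it after repayment.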

Customer Relationship Management (CRM)
CRM is "a comprehensive strategy and process of acquiring, retaining and partnering with selective customers to create superior value for the company and the customer" [85], and it has been strongly influenced by DM techniques [86]. A previous review of DM applications in CRM in general was published in [87], which thoroughly reviewed relevant literature up until 2008, and a recent general review can be found in [88]. However, with regard to the specific interest of the banking sector, only one short review can be found in [89], covering literature up until 2013.
The representative framework of customer analytics in banking is iCARE supported by IBM; more details of its solutions and a real case study for a commercial bank in Southeast China can be found in [90].

Customer Profiling and Knowledge
To build up an accurate customer profile, it is crucial for banks to extract valuable information from customer behaviour with the assistance of DM techniques [91]. Mansingh et al. [52] demonstrated the use of DM techniques on survey data of internet banking users in Jamaica, which aided the decision making process by analyzing attitudinal, behavioral and demographic variables collectively for the purposes of prediction and profiling. Focusing on the data obtaining process, an online data imputation framework incorporating DM techniques was introduced in [53] and verified via an application to real banking data sets.

Customer Segmentation
In order to achieve a better understanding of mobile banking customers and better implementation of customer-centric strategies, Noori [54] proposed a customer segmentation model for an Iranian bank. Later, the authors in [55] introduced a framework based on transaction-driven parameters for proper segmentation of a bank's customers.
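As an illustration of how customer segmentation can work, the sketch below applies plain K-means clustering (Lloyd's algorithm) to a toy set of customers described by two transaction-driven features. It is a generic example and not the model of [54] or the framework of [55]; the features, data values and initial centroids are hypothetical, and a real application would scale the features and choose the number of segments with care.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical customers: (monthly transactions, average balance in thousands).
customers = [(2, 1.0), (3, 1.5), (2.5, 1.2), (20, 15.0), (22, 14.0), (21, 16.0)]
labels, centroids = kmeans(customers, init_centroids=[customers[0], customers[3]])
```

On this toy data the algorithm separates low-activity from high-activity customers, and the resulting centroids can be read as the profile of each segment.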

Customer Satisfaction
DM techniques have also been applied for maintaining and enhancing banking customer satisfaction. For instance, the core determinants of the level of trust of banking customers in Spain are analyzed using DM techniques in [56].

Customer Development and Customization
A number of DM applications in customer development and customization focused on marketing-related tasks. The bank direct marketing data sets from Portuguese banking institutions have been popular data resources and were investigated in [57][58][59], which compared the performances of four different DM classification techniques. The same dataset was used for verifying the approach proposed in [60], which employed a correlation-based feature subset selection algorithm along with a data set balancing technique. Later, this model was further developed in [61] with an ensemble framework. Moreover, a profit-driven artificial NN approach was proposed in [62], and a similar study applied a two-step model of K-means clustering and classification [63]. Recently, Lahmiri [64] proposed a two-step system that combined a NN ensemble model and Particle Swarm Optimization for optimizing the initial weights of each NN in the ensemble framework. This was also verified on the bank direct marketing data, with outstanding performance in relation to the baseline approaches.
Apart from the continuous exploration of the above Portuguese direct marketing dataset, Shih et al. [65] presented a target marketing model for the personal loan service of commercial banks, and the experiment was conducted with data from a bank in Taiwan. With the direct marketing data set of a Turkish bank, Mitik et al. [66,67] proposed a two-step hybrid system and achieved promising accuracy and a large increase in the overall profit/cost ratio. Another regionally focused study by Wang and Petrounias [68] analyzed the relationships between demographic characteristics and mobile banking in China with big data collected through questionnaires. This application contributes to guiding the development of marketing strategies for domestic banks in China.

Customer Retention and Acquisition
DM techniques have been widely applied to detect early warnings in customer behaviour, such as reduced transactions and account dormancy, which helps banks take proactive and strategic steps to prevent customers from churning and switching banks.
Data relating to private banking customers in a European bank was analyzed in [69]; He et al. [70] employed the SVM technique for customer churn and attrition prediction with a real life data set from a Chinese commercial bank; customer records of a major bank in Nigeria were investigated in [71]; later, in [72] a NN classification technique was applied on the customer database of an international bank for customer churn prediction; similar research was also conducted in a small Croatian bank in [73] and the electronic banking service data set in [74].
More recent research by Azad [75] investigated mobile banking adoption in Bangladesh by employing the NN technique on a structured questionnaire data set. The research identified the most influential factor in mobile banking adoption, which then assisted in attracting potential customers and in the design of future services.

Other Advanced Supports
Social media has been identified as another top trend in the banking sector [76]. For instance, text mining was applied in [77] to extract hidden information from social media data in Nigeria (Twitter and Facebook) to assist banking sector decision making and internet marketing. Specifically, the experiments analysed the unstructured data extracted from Facebook and Twitter for five of the largest Nigerian banks via Text Mining and K-means clustering.
A less mainstream application by Batmaz et al. [78] focused on the DM application for deposit pricing and identifying its main determinants. This research was conducted on the customer-level data set of a commercial bank in Turkey, and beneficial conclusions for strategic deposit pricing were achieved, contrary to existing evidence obtained from macro-level bank data.
With regard to the evaluation of bank branch performance and providing early warning for failing banks, DM techniques such as clustering and classification were adopted in [79][80][81], where [80] conducted the experiment on a real data set of a Canadian bank, while [81] focused on the Ziraat bank in Turkey. These studies assisted the regulatory bodies with early signals of banks/branches that require immediate attention; they also contributed to achieving strategic expansion design. Similarly, Wanke et al. [82][83][84] have conducted a series of research studies into banking efficiency evaluation by adopting NN techniques. Their applications have covered banking sectors from ASEAN, Islamic and BRICS countries.

Key DM Techniques, Software for Banking and Trends
Following the detailed review of over 100 publications by topics, the key DM techniques adopted in banking are identified as cluster analysis [92], association rule mining (ARM) [93] and classification techniques [94], the latter including but not limited to Decision Trees (DT) [95,96], Neural Networks (NN) [97], Support Vector Machines (SVM) [98], Naive Bayes (NB) [99], and Logistic Regression (LR) [100]. Note that a brief introductory summary of these DM techniques can be found in [3,4].
This research also reveals the trends of DM applications and techniques in the context of banking based on the key information manually extracted from the identified recent applications. Note that the following statistics and diagrams are derived from the manually filtered information from the reviewed publications only, and some applications did not clarify this specific information due to confidentiality-related restrictions.
According to Figure 2, CRM applications account for about 35% of the reviewed publications, which is consistent with the finding reported in [101] that over 80% of financial service organizations globally list customer experience as their top priority. Within CRM applications, about half of the implementations target customized marketing and cross/up selling, followed by customer retention and acquisition, which cover about one third of CRM implementations. Recent research also addressed the significance of fraud detection and risk management, as they account for 28% and 26% of the overall applications, respectively. This is due to the emerging need for combating cyber crime and developing more advanced technologies [102] (review papers specifically targeting DM applications in fraud detection can be found in [103][104][105]). Weka, Matlab and SPSS are the most popular software packages adopted, followed by R and RapidMiner. Although about 30% of the publications did not clarify the software information, these facts are expected to assist the relevant research parties in finding corresponding analytical solutions or researchers who hold relevant expertise. The most frequently adopted DM techniques are classification (60%) and clustering (28%). However, it is noted that most of the applications employ more than one DM technique, and the papers adopting classification techniques generally use more than one specific classification technique for comparison purposes. Specifically, K-means clustering is the most frequently applied clustering technique, and the top three classification techniques are NN, DT and SVM. ARM seems rarely exploited considering its 5% proportion, and it is hardly ever seen integrated with text mining or social network analysis. Considering the availability of unstructured big banking data from customer profiles, feedback and call centre records [106], there exists huge potential for many DM techniques that have not been investigated before.

Conclusions
This paper captured and systematically reviewed nearly 100 DM applications in banking post 2013. It fills the literature gap and serves as a quick reference guide for recent DM implementations in banking. Having reviewed these recent publications, it can be concluded that the banking sector has adopted DM mainly for fraud detection, risk management and CRM. In addition, most of the applications use more than one DM technique, among which clustering and classification have shown sufficient evidence of both applicability and popularity.
Although the growing interest and promising performances reflect the value and potential of DM applications in banking, the obstacles to applying these techniques to big banking data are still noteworthy, for instance, the costly and time-consuming process of personnel training towards pattern identification and data preprocessing, variable (feature) selection, the complexity and difficulty of data quality assurance, and large dataset storage and maintenance.
Apart from the comprehensive summary of recent developments of DM applications in banking, this study also aims to present insights into the challenges and directions for future research. Firstly, it is noted that although big banking data consist of large volumes of unstructured data, there are many DM techniques which continue to be rarely exploited, e.g., text mining, entity extraction, and social network analysis. This unbalanced exploration status may be caused by the limited access to big banking data, the shortage of researchers with the relevant skill set, system constraints, and the lack of advanced data analytic tools [107]. Specifically, the confidentiality restrictions on banking-related data have limited the progression of research. Therefore, seeking a proper solution for data availability will make a significant difference for future research.
In terms of the means of value creation by DM applications in banking, the banking sector has obtained sufficiently rich customer information, yet the current implementations focus only on the marketing aspect. There exists significant potential and valuable information waiting to be discovered. Moreover, a large proportion of available data channels, such as call centres, customer surveys, and social media, are still waiting to be fully exploited. As a trending approach, machine learning methodology, especially deep learning, has been the emerging focus of a lot of scientific research. Accordingly, it can be expected to be another key direction for the banking sector in order to better embrace the era of big data.
Finally, as another suggestion for future work, the study of new analytical trends is also crucial for the banking sector; these trends aim to provide solutions for three types of use (i.e., Embedded Analytics, Citizen Data Science and Analytics for Data Scientists). Meanwhile, new technological trends in the era of big data can also continuously alter the research directions of DM applications in banking. For instance, the development of cloud computing can significantly improve the computational performance of most existing frameworks, whilst the popularization of the Internet of Things further enriches the big data resources, and it may also positively influence embedded analytics and the development of dynamic big data analytics networks.

Figure 1. Research framework for DM applications in banking.

Figure 2. Key facts on DM applications in banking since 2013.

Table 1. Summary table of Data Mining applications in banking since 2013.