Proceeding Paper

The Use of Support Vector Machine to Classify Potential Customers for the Wealth Management of a Bank †

1 Department of Electronic Engineering, National Taipei University of Technology, Taipei 106, Taiwan
2 Department of Business Administration, Takming University of Science and Technology, Taipei 114, Taiwan
3 Department of Wealth Management, Cathay United Bank, Taipei 110, Taiwan
* Author to whom correspondence should be addressed.
† Presented at the 2024 IEEE 7th International Conference on Knowledge Innovation and Invention, Nagoya, Japan, 16–18 August 2024.
Eng. Proc. 2025, 89(1), 32; https://doi.org/10.3390/engproc2025089032
Published: 3 March 2025

Abstract

We developed a method for the evaluation and selection of customer business analysis in two stages. First, using the bank’s existing expert model, artificial rules of thumb were used to evaluate the value of each field of the data and establish screening rules. Secondly, the machine learning feature screening method was applied based on the customer’s transaction data to find out whether the customer’s contribution to the bank had a significant impact as a feature of the model. Based on the results, the best classification model was selected through data verification. The effectiveness of the proposed model was validated through actual case analysis, taking wealth management in banks as an example. The classification method, using support vector machines (SVMs), effectively assists banks in identifying potential customers efficiently and in planning to manage customers. This method helps to avoid the traditional blind spots, which emerge based on subjective judgment, and allows bank wealth managers to promote customer relationship management (CRM).

1. Introduction

The stock markets in Taiwan and Japan have been booming. Banks' wealth management services aim to attract deposit holders to invest large amounts of money by trading stocks and bonds. Banks hope to create revenue by charging trading fees and deducting a share of customers' surpluses. The key is to identify people who have money in their accounts, are interested in investment, and have a mutual agreement with the bank. Such people need to be identified as potential customers so that wealth management services can be promoted to them.
However, banks have many customers, and each person's financial management concept, income, and expenditure are different. It is difficult to select qualified potential customers from many records. The application of machine learning in customer relationship management (CRM) has been verified [1,2,3,4,5,6]. Previous methods solicited customers through marketing channels such as calls, flyers, or questionnaires, but these were not effective. To improve effectiveness, banks target specific customers. There are a variety of screening methods, each with different conditions. Most of the screening conditions are based on the bank's financial relationship with the customer and on the managers' decisions, which are subjective and lack mathematical logic [3]. The accuracy of such screening was only 56%, so it is not appropriate as a management strategy. The features of the data must be engineered to achieve high accuracy, as shown in Table 1.
Accuracy, recall, and precision are calculated using (1)–(3). The results are classified into true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP and TN describe the number of correct classifications. On the other hand, FP and FN represent the number of misclassifications.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$
$$\mathrm{Recall} = \frac{TP}{FN + TP}, \tag{2}$$
$$\mathrm{Precision} = \frac{TP}{FP + TP}. \tag{3}$$
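The three metrics can be sketched directly from the four confusion-matrix counts. The function name and the counts in the usage lines are hypothetical and chosen only for illustration; they are not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no actual or no predicted positives).
    recall = tp / (fn + tp) if (fn + tp) else 0.0
    precision = tp / (fp + tp) if (fp + tp) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(m)
```

Accuracy uses all four counts, whereas recall and precision ignore TN, which is why they are more informative when potential customers are a small minority of the records.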
In this study, we cooperated with Bank C in Taiwan and used machine learning algorithms to analyze the records of wealth management customers. Features were extracted to build a model that identifies potential customers among numerous customer records so that business strategies can be formulated.

2. Experiment Method

2.1. Analysis

Machine learning algorithms were proposed in Refs. [1,2,3,4,5] in order to study CRM and customer purchasing behavior. We used methods in Refs. [1,3]. Five algorithms were chosen, as listed in Table 1.

2.2. Algorithm

The multilayer perceptron classifier (MPC) consists of multiple connected hidden layers [7]. Neurons in each layer have different weights, and together they form the function that computes the output layer. As a result, an MPC lacks transparency and interpretability [2]. Therefore, a decision tree classifier (DTC) and a random forest classifier (RFC) were introduced [8]. A DTC chooses the splits that maximize information gain, measured by entropy or the Gini coefficient, but it is not easy to determine the importance of features from it. An RFC is a combination of many DTCs and shows better results than a single DTC by establishing multiple DTCs and voting among them. In general, the more DTCs are established, the better the RFC performs. However, as the amount of data increases, the depth and number of DTCs also increase, and the execution efficiency of the RFC becomes too low to process millions of data points.
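As an illustration of how a DTC scores a candidate split, the following minimal sketch computes the entropy, the Gini coefficient, and the information gain of a binary split. The function names and the toy labels are assumptions made for this example and do not come from the bank data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy labels (1 = potential customer), for illustration only.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
weak_gain = information_gain(parent, [1, 1, 1, 0], [1, 0, 0, 0])
perfect_gain = information_gain(parent, [1, 1, 1, 1], [0, 0, 0, 0])
print(weak_gain, perfect_gain)
```

A tree-building procedure would evaluate this gain for every candidate field and threshold and keep the split with the largest value.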
Logistic regression [9] and the support vector machine (SVM) [10] are commonly used classification methods. By finding common conditions in the customer records of wealth management services, banks can use them as features to build models that predict potential customers. Since the model in this study needs to predict the results of a binary classification, we compared the number of features, n, with the size of the training sample, m, and applied the following strategy.
  • If n > m, either of the two is used.
  • If n < m, SVM is used.
In practice, the number of customer records in the bank, m, is larger than the number of fields in the customer information table, n. Therefore, based on the above strategy, we used SVM.

3. Implementation and Results

For feature selection, different screening methods are used depending on whether the learning is supervised or unsupervised. Unsupervised learning removes features with low correlation to reduce interference from irrelevant features. The SVM algorithm belongs to supervised learning. It retains highly correlated features and uses them to find effective fields for classifying potential customers. Three types of feature screening methods are used in supervised learning.
  1. Filter Method
Each feature is scored according to divergence and correlation and sorted according to its importance. A threshold is then used to eliminate unimportant features and keep the required ones [11,12]. Common filter screening techniques include correlation analysis and univariate feature selection.
  2. Wrapper Method
The quality of the selected features is judged from the prediction results obtained using the algorithm. Various feature combinations are tested to exclude unnecessary features until the best combination is selected [13]. The screening results are better than those of the filter method, but obtaining them is time-consuming [11,13]. Common wrapper screening techniques include forward selection, backward elimination, and recursive feature elimination (RFE).
  3. Embedded Method
Machine learning algorithms and models are used for training. Through training, the contribution of each feature to the model is determined and represented by a specific weight. Features are sorted based on these weights. Although the screening effect of this method is not better than that of the filter method, it is more efficient than the wrapper method [14]. Common embedded screening techniques include the assessment of feature importance and L1 regularization.
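The filter method described above can be sketched as a score-sort-threshold procedure. This is a minimal illustration that uses the absolute Pearson correlation with the label as the score; the field names, the toy values, and the threshold of 0.5 are assumptions made for the example and are not the settings used in this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)


def filter_select(features, target, threshold):
    """Score each feature, sort by score, and keep those at or above the threshold."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold], scores


# Hypothetical toy data: 1 = potential customer.
target = [0, 0, 0, 1, 1, 1]
features = {
    "balance": [1.0, 2.0, 1.5, 8.0, 9.0, 7.5],
    "products_held": [1, 2, 1, 3, 2, 4],
    "noise": [5, 1, 4, 2, 5, 3],
}
selected, scores = filter_select(features, target, threshold=0.5)
print(selected)
```

A wrapper method would instead retrain the classifier for each candidate subset, and an embedded method would read the weights from a single trained model, which is why their costs differ.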

3.1. Implementation

We tracked customers' recent investment transactions and their records of browsing investment products on Bank C's official website to identify and label them. Labels include whether the customer recently invested in portfolio products, bonds, or time deposits, recently remitted foreign currency, browsed financial management or insurance products, or set a stop-loss or take-profit reminder for invested funds. There are more than ten types of stop-loss and take-profit reminders for financial management. However, the primary consideration for financial managers is to identify potential customers who intend to invest heavily in bank products and who pay large subscription fees. Therefore, customers whose contribution in the past year (set as variable Y) was at least five times the average contribution of all customers (set as variable X), that is, Y ≧ 5X, were defined as potential customers in this study.
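The labeling rule Y ≧ 5X can be sketched as follows. The function name, the customer identifiers, and the contribution values are hypothetical; only the rule itself is taken from the text.

```python
def label_potential_customers(contributions, multiple=5.0):
    """Label a customer 1 if their contribution Y is at least `multiple` times
    the average contribution X of all customers, and 0 otherwise."""
    if not contributions:
        return {}
    average = sum(contributions.values()) / len(contributions)  # X
    return {cid: int(y >= multiple * average) for cid, y in contributions.items()}


# Hypothetical yearly contributions per customer, for illustration only.
contributions = {
    "A": 100.0, "B": 200.0, "C": 300.0,
    "D": 400.0, "E": 500.0, "F": 10500.0,
}
labels = label_potential_customers(contributions)
print(labels)
```

Because the average X includes the high contributors themselves, the rule can only be satisfied by a small fraction of customers, so the resulting label is strongly imbalanced.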
We randomly selected more than 144,000 records from all customers with the following conditions: the asset balance was more than NT$3 million and Y ≧ 5X. We established a reference that can be used by various banks for wealth management services. We referred to segmentation, targeting, and positioning theory to determine the factors [15], considering demographics, geographical location, beliefs and values, lifestyle, and consumer behavior. Based on Bank C's internal analysis, the following three aspects were defined:
  • Basic customer information: In addition to age, gender, education, and occupation category, a "professional investor flag" records whether the customer is qualified to invest in specific high-risk products. This indicates whether the customer has a higher risk tolerance and participates in the bank's issuance of funds. Products with high investment returns contribute to the performance of banks' wealth management services. In addition, a "life cycle segmentation" method customized by Bank C is also added.
  • Customer asset account: 9 fields are determined, including the total average asset balance, the foreign currency saving deposit balance, Taiwanese dollar deposits, foreign currency deposits, mutual fund assets under management, insurance, and structured notes. Based on the balance of each field, we identify customer investment behavior and preference characteristics. Factors such as the number of products held and whether the customer is a check deposit holder are added to reflect the customer's enthusiasm for investment and their risk-taking awareness.
  • Customer management of bank transactions: The "bank-period-years" field is used to determine the degree of the customer's loyalty to the bank, that is, customer stickiness. The "relationship manager's code" is added to represent the experience level of the specialist who provides the service. The "customer's major district code" is used to capture differences in marketing approaches across geographic areas.
Table A1 in Appendix A lists the 21 fields. The name, data type, and range of the data were used to train the model.

3.2. Results

More than 144,000 randomly selected items were confirmed by Bank C's wealth management department. Among them, 12,499 records (80%) were used as training data, and the other 3125 records (20%) were used as testing data, to train and test the SVM model developed in this study. In addition to Bank C's rules of thumb, three feature screening methods were used: a correlation analysis method, a feature importance assessment method, and L1 regularization. The selected fields were applied to the SVM model to test its accuracy and to find the best feature screening method in comparison with the current rules of the wealth management department. The accuracy and F1 score of the four classification models after training are listed in Table 2. Table 3 shows the retest results of the SVM models using a test set of 3125 records. The F1 score is calculated using (4). Figures 1–4 show the receiver operating characteristic (ROC) curves of the SVM model with the four screening methods. Figures 5–8 show the confusion matrices.
$$F_1\ \mathrm{Score} = \frac{2}{\dfrac{1}{\mathrm{Precision}} + \dfrac{1}{\mathrm{Recall}}} \tag{4}$$
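As a quick consistency check, the F1 score can be recomputed from the precision and recall values reported in Table 3. The function below is a minimal sketch of the harmonic mean in (4); the dictionary simply copies the table values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in equation (4)."""
    if precision <= 0 or recall <= 0:
        return 0.0
    return 2.0 / (1.0 / precision + 1.0 / recall)


# (precision, recall) pairs copied from Table 3.
table3 = {
    "Rules of thumb": (0.35764, 0.971),
    "Correlation": (0.41748, 0.936),
    "Feature importance": (0.57538, 0.977),
    "L1 regularization": (0.33333, 1.0),
}
f1 = {name: f1_score(p, r) for name, (p, r) in table3.items()}
for name, value in f1.items():
    print(f"{name}: {value:.5f}")
```

Because the harmonic mean is dominated by the smaller of the two values, the high recall of every method cannot compensate for its low precision.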

4. Conclusions

The SVM model with feature importance screening shows the best predictive ability, with an AUC of 0.96 (Table 3). It was the only model that judged TP and FP accurately at different thresholds. The AUC of the rule of thumb was 0.7, so its ROC curve was farther from the boundary than those of the other models. The ROC curve of the feature importance model was close to the boundary, which showed excellent discriminative power. The SVM model had an excellent ability to correctly identify the target category, with a recall of 0.977, but the precision of even the best model was only about 0.58 (Table 3). It is therefore still challenging to accurately classify potential customers based on the results, but the model helps to identify non-potential customers, avoid ineffective customer management, and improve efficiency. The F1 score was also used to judge the results, and the SVM model with feature importance again showed the best prediction. The results are consistent with the judgment using the AUC.
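To make the AUC comparison concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name, labels, and scores are hypothetical and serve only as an illustration; they are not outputs of the models in this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.7 indicates only moderate separation between the two classes.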
We used the existing practices of Bank C as the experimental control group to observe whether the model enables efficient analysis. The accuracy of the SVM model trained with Bank C's current rules of thumb was 0.83. Although this was lower than the accuracy of the other models, it was still satisfactory. After retesting, the AUC was 0.7, which showed that the current rules of thumb of Bank C were satisfactory. The accuracy of the SVM models with machine learning feature screening was higher than 0.9, showing that the model improves efficiency by avoiding subjective judgments. The number of important features was also reduced from the 17 stipulated in Bank C's current rules to 7 with feature importance screening (Table 2).
The data contained invalid information due to subjective judgment. If an automated calculation mechanism is developed, the waste of computing resources can be reduced. The machine learning model can be incorporated into the future operating mechanisms of the bank to improve management efficiency.

Author Contributions

Conceptualization, C.-H.L. and J.-W.H.; methodology, C.-H.L.; software, C.-H.L. and J.-W.H.; validation, C.-H.L., J.-W.H. and Y.L.; formal analysis, Y.L.; investigation, J.-W.H. and Y.L.; resources, J.-W.H.; data curation, J.-W.H. and Y.L.; writing—original draft preparation, J.-W.H.; writing—review and editing, C.-H.L.; visualization, J.-W.H. and Y.L.; supervision, Y.-S.H.; project administration, Y.-S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not available because the bank must protect customer privacy and business secrets.

Acknowledgments

We sincerely thank the Wealth Management Department of Cathay United Bank in Taiwan for providing all the assistance required during the writing of this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Current wealth management customer database fields of Bank C.
| Field Type | Field Name | Data Scope |
|---|---|---|
| Prediction (the answer) | C1, potential customer | No (0), yes (1) |
| Basic customer information | C2, age | 20 to 70 |
| | C3, gender | Male (1), female (2), enterprise (0) |
| | C4, education | 1. Ph.D. 2. Master. 3. Bachelor. 4. Associate bachelor. 5. High school. 6. Others. |
| | C5, professional investor flag | No (0), yes (1) |
| | C6, occupation | 1. Unstable. 2. Public agencies. 3. Teaching position/business. 4. Housekeeping. 5. Financial industry. 6. Others. |
| | C7, life cycle segmentation | 1. Top rich class. 2. Rich second generation. 3. Middle class. 4. Young. 5. Salary class. 6. Asset appreciation. 7. Retired. |
| Customer asset account | C8, total average asset balance | 3 to 615 million |
| | C9, number of products held | 0 to 8 |
| | C10, check deposit holder | No (0), yes (1) |
| | C11, Taiwan foreign currency saving deposit balance | 0 to 220 million |
| | C12, Taiwanese dollar saving deposit balance | 0 to 337 million |
| | C13, Taiwanese dollar time deposit balance | 0 to 208 million |
| | C14, foreign currency saving deposit balance | 0 to 169 million |
| | C15, foreign currency time deposit balance | 0 to 350 million |
| | C16, mutual fund net assets under management | 0 to 569 million |
| | C17, insurance balance | 0 to 647 million |
| | C18, structural notes balance | 0 to 584 million |
| Customer management of bank transactions | C19, bank-period-years | 0 to 104 years |
| | C20, relationship manager's code | 1. Financial relationship management SRM1. 2. Financial relationship management SRM2. 3. Financial relationship management RM1. 4. Financial relationship management RM2. 5. PFB. 6. YS. 7. Comprehensive relationship management SRM. 8. Comprehensive relationship management RM. 9. Private banking RM. |
| | C21, customer's major district code | 1. North District 1. 2. North District 2. 3. North District 3. 4. North District 4. 5. Tao-zhu District. 6. Middle District 1. 7. Middle District 2. 8. South District. |

References

  1. Amnur, H. Customer Relationship Management and Machine Learning Technology for Identifying the Customer. JOIV Int. J. Inform. Vis. 2017, 1, 12–15.
  2. Ledro, C.; Nosella, A.; Pozza, I.D. Integration of AI in CRM: Challenges and guidelines. J. Open Innov. Technol. Mark. Complex. 2023, 9, 100151.
  3. Choudhury, A.M.; Nur, K. A Machine Learning Approach to Identify Potential Customer Based on Purchase Behavior. In Proceedings of the International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 10–12 January 2019.
  4. Velu, A. Machine Learning Techniques for Customer Relationship Management. Int. J. Creat. Res. Thoughts (IJCRT) 2021, 9, 753–763.
  5. Ledro, C.; Nosella, A.; Vinelli, A. Artificial intelligence in customer relationship management: Literature review and future research directions. J. Bus. Ind. Mark. 2022, 37, 48–63.
  6. Hsieh, J.-W.; Lai, C.-H.; Hwang, Y.-S. Application of Artificial Intelligence Technology in Bank Wealth Management Customer Operation Analysis-Utilizing Support Vector Machine to Predict and Classify Potential Customers. Master's Thesis, Executive Master of Business Administration (EMBA), National Taipei University of Technology (NTUT), Taipei, Taiwan, 19 January 2024.
  7. Pal, S.K.; Mitra, S. Multilayer perceptron, fuzzy sets, classification. IEEE Trans. Neural Netw. 1992, 3, 683–697.
  8. Esmaily, H.; Tayefi, M.; Doosti, H.; Ghayour-Mobarhan, M.; Nezami, H.; Amirabadizadeh, A. A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes. J. Res. Health Sci. 2018, 18, 412.
  9. Zou, X.; Hu, Y.; Tian, Z.; Shen, K. Logistic Regression Model Optimization and Case Analysis. In Proceedings of the IEEE 7th International Conference on Computer Science and Network Technology, Dalian, China, 19–20 October 2019.
  10. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
  11. Zhang, R.; Nie, F.; Li, X.; Wei, X. Feature selection with multi-view data: A survey. Inf. Fusion 2019, 50, 158–167.
  12. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
  13. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. (JMLR) 2003, 3, 1157–1182.
  14. Salesi, S.; Cosma, G.; Mavrovouniotis, M. TAGA: Tabu Asexual Genetic Algorithm embedded in a filter/filter feature selection approach for high-dimensional data. Inf. Sci. 2021, 565, 105–127.
  15. Moutinho, L. Strategic management in tourism. In Segmentation, Targeting, Positioning and Strategic Marketing; Moutinho, L., Ed.; CABI: Wallingford, UK, 2000; pp. 121–166.
Figure 1. ROC of rules of thumb, dashed line represents the threshold.
Figure 2. ROC of correlation, dashed line represents the threshold.
Figure 3. ROC of feature importance, dashed line represents the threshold.
Figure 4. ROC of L1 regularization, dashed line represents the threshold.
Figure 5. Confusion matrix of rules of thumb.
Figure 6. Confusion matrix of correlation.
Figure 7. Confusion matrix of feature importance.
Figure 8. Confusion matrix of L1 regularization.
Table 1. Performance after data engineering [3].
| Algorithm | Accuracy (%) | Recall (%) | Precision (%) |
|---|---|---|---|
| Logistic regression | 98.49 | 99.56 | 97 |
| Decision tree classifier | 97.95 | 96.98 | 98.22 |
| Support vector classifier | 97.3 | 97.99 | 95.82 |
| Random forest classifier | 98.14 | 98.49 | 97.2 |
| Multilayer perceptron classifier | 99.41 | 98.93 | 99.68 |
Table 2. Comparison of models after training using SVM and classification methods.
| Method | Chosen Fields | Accuracy | F1 Score |
|---|---|---|---|
| Rule of thumb | C2, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C20, C21 | 0.83008 | 0.83 |
| Correlation | C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C14, C15, C16, C17, C18, C19, C20, C21 | 0.95424 | 0.95 |
| Feature importance | C5, C8, C9, C10, C14, C16, C18 | 0.95296 | 0.95 |
| L1 regularization | C8, C14, C16, C18 | 0.92064 | 0.92 |
Table 3. Comparison of results obtained by retesting four methods.
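| Method | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Rules of thumb | 0.7 | 0.35764 | 0.971 | 0.52274 |
| Correlation | 0.87 | 0.41748 | 0.936 | 0.57742 |
| Feature importance | 0.96 | 0.57538 | 0.977 | 0.72424 |
| L1 regularization | 0.82 | 0.33333 | 1 | 0.5 |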

Share and Cite

MDPI and ACS Style

Lai, C.-H.; Lin, Y.; Hsieh, J.-W.; Hwang, Y.-S. The Use of Support Vector Machine to Classify Potential Customers for the Wealth Management of a Bank. Eng. Proc. 2025, 89, 32. https://doi.org/10.3390/engproc2025089032