1. Introduction
At present, the development of global fintech is unstoppable [1,2,3]. A large amount of capital has been poured into the fintech field, which has driven the rapid growth of the fintech industry and strongly affects the high-quality development of the world economy [4]. According to the report “The Pulse of Fintech H1 2021” released by Klynveld Peat Marwick Goerdeler (KPMG), one of the four major international accounting firms, global fintech investment reached 98 billion USD in the first half of 2021, and financial technology has become an engine of the development of the global financial industry. However, with its rapid development, fintech also faces many potential risks [5,6], such as information system vulnerabilities and leaks of customer data and private information [7,8,9]. If criminals exploit loopholes in the application of advanced technology to make huge profits, the losses to society could be incalculable. Thus, studying the risks faced by the fintech industry is significant for the industry’s subsequent risk measurement and management.
Generally, corporations are exposed to various risks [10]. Understanding these risks is essential for measuring corporate risks [11]. Many studies have engaged in identifying the risks faced by corporations. For instance, Mirakur [12] manually classified 29 risk types for more than 100 companies. Grundke [13] and Bellini [14] made contributions to identifying bank risks. Thus, risk identification has become an important basis for corporate risk management, and the comprehensive identification of risk sources is of the utmost importance for explaining fintech corporate risks. The accuracy of risk assessment and the management of fintech corporations depend on comprehensively identifying the correct risk factors [15].
Current research on fintech risks falls mainly into two categories. In the first, researchers summarize the risks or challenges faced by fintech by analyzing the characteristics of fintech [16,17,18,19,20]. In the second, researchers make risk judgments based on the specific business of fintech companies. Specifically, Lee and Shin [21] argued that fintech start-ups have to deal with both financial and regulatory risks. Vives [22] found that fintech has the potential to disrupt established financial intermediaries as new business models based on the use of big data emerge. Systemic issues arising from operational risk and cyber risk will intensify as fintech activities are carried out. Fintech credit is growing rapidly while the relevant regulatory development is insufficient, which can lead to regulatory risks [23].
Thus, some qualitative studies on the risks and challenges of fintech have appeared in recent years. However, there is no unified understanding in academia of the risks faced by fintech companies. Furthermore, identifying the risk sources of corporations is difficult and complex [11,24]. Previous studies identifying fintech risks mainly depended on the researchers’ own judgment. However, as the fintech industry becomes increasingly complex, relying on researchers’ judgment is infeasible for comprehensively and accurately identifying fintech corporate risks. Therefore, a comprehensive discovery of fintech risk factors is essential for managing the risks faced by fintech corporations.
Since 2005, the US Securities and Exchange Commission (SEC) has required listed companies to add a separate Item 1A to their Form 10-K to disclose “the most important factors that make the offering speculative or risky” [25]. The disclosure text of a listed company provides a detailed description of the risks faced by the company, and several studies [26,27,28] have also shown that the risk factor disclosures in the Form 10-K are highly related to the basic situation of the company. Therefore, a great number of studies have identified the risks faced by enterprises from the risk factor part of financial statements [29].
Specifically, Wei et al. [30] used a semi-supervised text mining algorithm to identify bank risk factors from financial statements. Some studies identified the risk factors of energy companies by analyzing risk disclosure texts from Form 10-K [31], and Li et al. [32] further measured the risk dependence between energy companies. In addition, many researchers have analyzed the risk disclosure texts of corporations to discover the risks faced by corporations [33,34,35]. Among them, Bao and Datta [34] proposed an unsupervised Sentence Latent Dirichlet Allocation (Sent-LDA) method to comprehensively discover and quantify the risk types from textual risk disclosures. Thus, analyzing the textual risk disclosures of the fintech industry with text mining methods is a feasible way to identify the risk factors of fintech companies. However, to the best of our knowledge, no existing work applies text analysis to the textual risk disclosures of the fintech industry.
The development of fintech in the US has always been at the leading level in the world. At present, the business of the fintech industry in the US has expanded to many areas, such as payment, big data analysis, trading platforms, rating agencies, software service, online loan platforms, online banks, and information consulting [36]. As the first stock market in the world to adopt electronic trading, the Nasdaq has now become one of the world’s largest securities trading markets, and its indexes have extensive influence. The KBW Nasdaq Financial Technology Index (KFTX) is the fintech index jointly announced by KBW Investment Bank, Stifel Financial Corporation, and Nasdaq in 2016. It includes influential fintech companies (e.g., Visa and LendingClub) mainly engaged in big data, exchange, transaction, and payment. The KFTX aims to accurately track the performance of companies that use high technology to issue financial products and services. According to KBW, these companies account for 18% of the U.S. financial sector and have a market value of 785 billion USD. Thus, the KFTX provides a reliable research sample for studying the risk sources of the fintech industry and the key risk factors in different sub-industries.
Therefore, this paper proposes a new perspective for comprehensively identifying the risk factors of fintech corporations by introducing, for the first time, the Sent-LDA text mining approach to analyze the textual risk disclosures of the fintech industry. Based on the typical fintech companies included in the KFTX, this paper comprehensively identifies the risks faced by fintech companies from the textual risk factor part of financial statements, using 53,452 sentences in 169 Form 10-K filings of 34 fintech companies over the period 2015–2019. Furthermore, this paper analyzes the importance of fintech risk factors, which can help bank managers and regulators to focus on the most important risks. In addition, this paper studies the differences in risk factors among fintech companies, both across the whole fintech industry and within different fintech sub-industries, by analyzing the similarity of risk factor types and risk factor contents. Based on the identification results in this paper, researchers can comprehensively and effectively select risk factors when measuring the risks of the entire fintech industry or of fintech subsectors. Hence, our identified fintech risk factors constitute fundamental support for further fintech corporate risk measurement.
The structure of this paper is as follows. In Section 2, we introduce our approach. Section 3 explains the data collection and description. In Section 4, we give the empirical results based on sample companies. Finally, we provide some concluding remarks in Section 5.
2. Methods
In this paper, we use Sent-LDA to comprehensively identify the risk factors of fintech companies. Then, we use a commonly used indicator to measure risk factor importance [30,31] and two widely used methods to measure the similarity of risk factor types and the similarity of risk factor contents, in order to analyze the similarity of the risks faced by the fintech industry [32,37]. Next, we introduce these three methods in detail.
2.1. Sent-LDA Topic Model
This paper uses the Sent-LDA topic model [34] to identify the risk factors faced by fintech companies from the textual risk factors disclosed in Form 10-K reports. As a commonly used topic model, LDA assumes that each document is a mixture of multiple topics and that each word has its own corresponding topic. However, in short texts, a sentence may express only one topic. Sent-LDA therefore assumes that each sentence in a document comes from a single topic. The risk disclosure part of a fintech company’s 10-K report consists of short texts in which one sentence expresses only one topic. Thus, previous studies proposed and used the Sent-LDA model to identify risk factors from textual risk disclosures [30,31,34].
Specifically, let $K$, $M$, $S$, $N$, and $V$ denote the number of topics, the number of documents, the number of sentences in the document, the number of words in the document, and the size of the vocabulary in a corpus, respectively. Dirichlet(·) and Multinomial(·) represent the Dirichlet distribution and the multinomial distribution with parameter (·), respectively. $\beta_k$ is the $V$-dimensional word distribution of topic $k$, while $\theta_d$ is the $K$-dimensional topic ratio of document $d$. In addition, $\alpha$ and $\eta$ represent the hyperparameters of the corresponding Dirichlet distributions, and $w$ is a list of words in a sentence $s$. The graphical representation of the Sent-LDA model is shown in Figure 1, and the specific generation process is as follows:
- (1) For each topic $k \in \{1, 2, \ldots, K\}$, draw a distribution over the vocabulary words, $\beta_k \sim \mathrm{Dirichlet}(\eta)$;
- (2) For each document $d$, draw a distribution over topics, $\theta_d \sim \mathrm{Dirichlet}(\alpha)$;
- (3) For each sentence $s$ in document $d$, draw a topic $z_{d,s} \sim \mathrm{Multinomial}(\theta_d)$;
- (4) For each word in sentence $s$, draw a word $w_{d,s,n} \sim \mathrm{Multinomial}(\beta_{z_{d,s}})$.
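The four generation steps can be simulated directly. The following is a minimal sketch, assuming NumPy and a fixed number of sentences per document and words per sentence (both simplifications made here for illustration); the function name and argument names are hypothetical and are not taken from the paper.

```python
import numpy as np

def generate_sent_lda_corpus(K, M, S, N, V, alpha, eta, seed=0):
    """Sample a toy corpus from the Sent-LDA generative process.

    K topics, M documents, S sentences per document, N words per sentence,
    vocabulary size V. Fixed S and N are simplifying assumptions.
    """
    rng = np.random.default_rng(seed)
    # Step (1): one word distribution per topic, drawn from Dirichlet(eta).
    beta = rng.dirichlet(np.full(V, eta), size=K)      # shape (K, V)
    # Step (2): one topic proportion per document, drawn from Dirichlet(alpha).
    theta = rng.dirichlet(np.full(K, alpha), size=M)   # shape (M, K)
    topics = np.empty((M, S), dtype=int)
    words = np.empty((M, S, N), dtype=int)
    for d in range(M):
        for s in range(S):
            # Step (3): a single topic is drawn for the whole sentence.
            z = rng.choice(K, p=theta[d])
            topics[d, s] = z
            # Step (4): every word of the sentence is drawn from that topic.
            words[d, s] = rng.choice(V, size=N, p=beta[z])
    return beta, theta, topics, words

beta, theta, topics, words = generate_sent_lda_corpus(
    K=3, M=4, S=5, N=6, V=10, alpha=0.5, eta=0.5
)
```

The only difference from standard LDA in this sketch is that the topic draw sits in the sentence loop rather than in a per-word loop, which is the one-topic-per-sentence assumption described above.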
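According to Bao and Datta [34], who proposed the Sent-LDA model, the key problem to solve when using the model is to calculate the posterior distribution of the hidden variables $\theta$ (topic proportion) and $z$ (topic assignment), given the model parameters $\alpha$ and $\beta$ (the topic word distributions) and the set of words $w$ observed from the sentence:

$$p(\theta, z \mid w, \alpha, \beta) = \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)} \quad (1)$$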
Based on the above Sent-LDA model, we can comprehensively identify risk factor topics and the assignment of sentences in these identified topics. These topics clustered through the Sent-LDA model represent the risk factors of fintech companies.
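To illustrate how a sentence is assigned to a topic, the sketch below computes the conditional topic distribution of one sentence when the document's topic proportion and the topic word distributions are assumed to be already known. This is only one conditional step: full Sent-LDA inference must also estimate those quantities, which is not shown here. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def sentence_topic_posterior(word_ids, theta_d, beta):
    """Return p(topic | sentence words) for a single sentence.

    word_ids: indices of the words in the sentence.
    theta_d:  topic proportion of the document, shape (K,), assumed known.
    beta:     topic word distributions, shape (K, V), assumed known.
    """
    theta_d = np.asarray(theta_d, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # All words share one topic, so the log-likelihoods add up per topic.
    log_post = np.log(theta_d) + np.log(beta[:, word_ids]).sum(axis=1)
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: 2 topics, vocabulary of 3 words.
toy_beta = [[0.7, 0.2, 0.1],
            [0.1, 0.2, 0.7]]
posterior = sentence_topic_posterior([0, 0, 1], [0.5, 0.5], toy_beta)
assigned_topic = int(posterior.argmax())
```

In the toy example the sentence contains word 0 twice, which the first topic favors, so nearly all of the probability mass falls on the first topic.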
2.2. Method of Measuring the Similarity between Risk Factor Types
In recent years, cosine similarity has often been used to measure text similarity [38,39]. Specifically, it uses the cosine value of the angle between two vectors to reflect the similarity between the two vectors [38]. Since it can be applied to any number of dimensions, this similarity measure has been applied in many studies [39]. This paper uses the cosine similarity between risk factor vectors to measure the similarity of risk factor types between two companies.
Based on the risk factors identified by Sent-LDA and the risk factor distribution of the sentences in the text disclosure, we can construct a risk factor vector for each corporation [32]. Formally, let $A$ represent the total number of identified risk factors, $I$ the total number of companies, and $T$ the number of sample years. For company $i \in \{1, 2, \ldots, I\}$ in year $t \in \{1, 2, \ldots, T\}$, we construct an $A$-dimensional vector, denoted as $R_{i,t}$. Each element of $R_{i,t}$ is either 1 or 0, indicating whether the risk factors disclosed by the company in year $t$ include the particular risk factor $a \in \{1, 2, \ldots, A\}$, as in the following Expression (2):

$$R_{i,t}^{a} = \begin{cases} 1, & \text{if company } i \text{ discloses risk factor } a \text{ in year } t, \\ 0, & \text{otherwise,} \end{cases} \quad (2)$$

where $R_{i,t}^{a}$ stands for the $a$th element in the risk factor vector $R_{i,t}$. As a result, the company’s textual risk factors are converted into numerical vectors, which can be used for further risk similarity measures.

Let $Sim_{i,j,t}$ represent the risk similarity of enterprises $i$ and $j$ in year $t$. This can be calculated as Formula (3):

$$Sim_{i,j,t} = \cos(\theta_{i,j,t}) = \frac{R_{i,t} \cdot R_{j,t}}{\lVert R_{i,t} \rVert \, \lVert R_{j,t} \rVert} \quad (3)$$

where $\theta_{i,j,t}$ represents the angle between these two vectors, and $\cos(\theta_{i,j,t})$ is the cosine of this angle.

According to the above method, the more risk factors the two vectors have in common, the greater the degree of risk similarity between the two companies, and the higher the value of $Sim_{i,j,t}$. If the risk factor types of two companies are the same, their risk similarity is 1. In contrast, if the risk factor types of the two companies are completely different, their risk similarity is 0.
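As a worked illustration of this measure, the following minimal sketch computes the cosine similarity between two binary risk factor vectors with NumPy. The function name, the toy vectors, and the choice to return 0 when a company discloses no risk factor are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def risk_type_similarity(r_i, r_j):
    """Cosine similarity between two binary risk factor vectors."""
    r_i = np.asarray(r_i, dtype=float)
    r_j = np.asarray(r_j, dtype=float)
    denom = np.linalg.norm(r_i) * np.linalg.norm(r_j)
    if denom == 0.0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return float(r_i @ r_j / denom)

# Two hypothetical companies over five risk factors in the same year.
company_a = [1, 1, 0, 1, 0]
company_b = [1, 0, 0, 1, 1]
sim_ab = risk_type_similarity(company_a, company_b)   # 2 shared of 3 each -> 2/3
sim_same = risk_type_similarity(company_a, company_a)
sim_disjoint = risk_type_similarity([1, 0, 0], [0, 1, 1])
```

Here the two companies share two risk factors and each discloses three, so the similarity is 2/3; identical vectors give 1 and vectors with no common risk factor give 0, matching the boundary cases described above.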
2.3. Method of Measuring the Similarity between the Risk Factor Contents
The commonly used method to measure the similarity between textual contents is to calculate the vocabulary similarity score of two texts [37]. In this paper, we calculate the vocabulary similarity score of the risk factor contents disclosed in the Form 10-K reports of two companies to reflect the similarity of the two companies’ risk factor contents. Specifically, we first extract the vocabulary used in the risk disclosure section of each company’s Form 10-K reports, pool the vocabulary of all companies in the sample, and construct a total phrase vector $P$. The length of the vector $P$ is equal to the number of unique words used in the risk disclosure sections of all companies’ Form 10-K reports. To focus on the risk, we remove common words on a stop-word list, including articles, conjunctions, personal pronouns, and abbreviations. For a given company $i$, a risk description vocabulary vector $P_i$ can then be constructed. The constructed vector is a binary vector. Let the length of each vector be $L$ (that is, there are $L$ components). Each component of the vector $P_i$ is compared in turn with the corresponding component of the total vector $P$. If company $i$ uses the word given in $P$ in its risk disclosure to describe the risk it faces, this component of $P_i$ is set to 1; otherwise, it is set to 0. Then, each vector is normalized to unit length, giving the following expression:

$$V_i = \frac{P_i}{\sqrt{P_i \cdot P_i}} \quad (4)$$

To obtain the risk similarity between two companies in the industry, we use the vectors $V_i$ and $V_j$ to represent a pair of companies $i$ and $j$ and calculate the cosine similarity of the two companies’ risk factor contents (or the companies’ pairwise similarity), as follows:

$$\text{Content Cosine Similarity}_{i,j} = V_i \cdot V_j \quad (5)$$

Since the components of $V_i$ and $V_j$ are non-negative and both are unit vectors, the value of $\text{Content Cosine Similarity}_{i,j}$ is a real number in the interval [0, 1]. Intuitively, when companies $i$ and $j$ use more of the same words to describe the risks they face, the calculated cosine similarity is higher and closer to 1.
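The content similarity can be sketched without materializing the total vocabulary vector: for binary word vectors normalized to unit length, the dot product equals the number of shared words divided by the square root of the product of the two vocabulary sizes. The sketch below relies on that identity; the tokenizer, the tiny stop-word list, and the example sentences are hypothetical stand-ins, not the preprocessing used in the paper.

```python
import math
import re

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "our", "we", "may"}

def risk_vocabulary(text):
    """Set of lowercase words in a risk disclosure, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def content_cosine_similarity(text_i, text_j):
    """Cosine similarity of the binary word vectors of two disclosures."""
    words_i, words_j = risk_vocabulary(text_i), risk_vocabulary(text_j)
    if not words_i or not words_j:
        return 0.0  # assumption: an empty disclosure is similar to nothing
    shared = len(words_i & words_j)
    # Unit-normalized binary vectors: dot product = shared / sqrt(|i| * |j|).
    return shared / math.sqrt(len(words_i) * len(words_j))

disclosure_i = "Cyber attacks may disrupt our payment systems"
disclosure_j = "Our payment systems may suffer cyber attacks"
content_sim = content_cosine_similarity(disclosure_i, disclosure_j)
```

After stop-word removal each example disclosure keeps five words and four of them are shared, so the similarity is 4/5 = 0.8.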
3. Data
To analyze the risks faced by the fintech industry, we select the fintech companies included in the KFTX. The KFTX is the fintech index jointly announced by KBW Investment Bank, Stifel Financial Corporation, and Nasdaq in 2016. It includes 49 fintech companies (e.g., Visa and LendingClub) mainly engaged in big data, exchange, transaction, and payment [40]. According to KBW, these companies account for 18% of the U.S. financial sector and have a market value of 785 billion USD. The KFTX aims to accurately track the performance of companies that use high technology to issue financial products and services.
Therefore, to analyze the fintech industry, we collect the textual risk factor disclosures reported in Item 1A of the Form 10-K reports of all 49 fintech companies included in the KFTX. Each textual risk disclosure includes headings and the descriptions that follow them. According to the EDGAR database on the US Securities and Exchange Commission website, some fintech companies did not disclose risk factors in their Form 10-K. After removing these companies, we obtain a total of 34 sample companies. Since the Form 10-K filings as of 2019 are publicly disclosed, the risk disclosure sentences collected in this paper are all from the disclosures of sample companies in the EDGAR database from 2015 to 2019, and the description of the sample risk factor sentences for the corporations is shown in Table A1 of Appendix A. Our data set contains a total of 169 Form 10-K documents and 53,452 sentences describing risk factors from 34 fintech companies from 2015 to 2019. Table 1 shows five examples of sentences describing risk factors in our data set.
5. Conclusions
This paper comprehensively identifies the risk factors of the fintech industry for the first time. Furthermore, this paper analyzes the fintech risk factors from the perspective of risk factor importance and risk factor similarity of types and contents. Besides analyzing the risk factors of the whole industry, this paper also studies the risk factors of different fintech sub-sectors, including payment, information consulting, trading platforms, rating agencies, big data, software service, online loan platforms, and online banks. The identification of fintech risk factors can provide suggestions for the effective selection of risk factors, laying foundational support for further fintech corporate risk estimation.
From a theoretical perspective, through the empirical analysis of 53,452 textual risk factor sentences disclosed in 169 Form 10-K filings of 34 fintech companies from 2015 to 2019, we identify 20 fintech risk factors. In order of importance from high to low, the 20 identified fintech risk factors are information system security risk, product risk, investment risk, business risk, legal risk, compliance risk, transaction payment security risk, infringement risk, economic and market condition risk, capital risk, acquisition risk, tax risk, personnel risk, data security risk, foreign exchange risk, credit risk, regulatory risk in the international market, global financial market risk, credit rating risk, and information disclosure risk.
For the similarity analysis, the similarity of risk factor types and the similarity of risk factor contents both increased from 2015 to 2019, which shows that the risks faced by fintech companies are becoming increasingly similar. The similarity of risk factor types is high, with an average value of 80.93%, while the average similarity of risk disclosure contents is only 42.13%, indicating that although the risk factor types faced by fintech companies are very similar, their descriptions of risks are still very different. Furthermore, the similarity results of different fintech subsectors show that, in general, companies belonging to the same fintech subsector have a higher similarity in both risk factor types and risk factor contents.
From the perspective of practical management, the risks faced by the fintech industry have a huge impact on the high-quality development of the world economy. This paper comprehensively identifies fintech risk factors based on textual risk disclosures, which addresses the important problem of selecting risk factors for fintech corporate risk measurement. The identified fintech risk factors can help financial regulators and managers of fintech companies to better measure and manage risks, which has practical significance for the robust operation of the fintech industry.
This study is not without limitations. A comprehensive selection of risk factors is of the utmost importance for explaining corporate risks, and the comprehensive identification of fintech risk factors lays the foundation for effective risk estimation. However, we have not analyzed how to use the identified fintech risk factors to measure the risks faced by fintech companies. In future research, we will therefore measure the risks of the fintech industry based on the identified fintech risk factors.