Crypto Exchanges and Credit Risk: Modeling and Forecasting the Probability of Closure

Abstract: While there is increasing interest in crypto assets, the credit risk of these exchanges is still relatively unexplored. To fill this gap, we considered a unique dataset of 144 exchanges, active from the first quarter of 2018 to the first quarter of 2021. We analyzed the determinants surrounding the decision to close an exchange using credit scoring and machine learning techniques. Cybersecurity grades, having a public developer team, the age of the exchange, and the number of available traded cryptocurrencies are the main significant covariates across different model specifications. Both in-sample and out-of-sample analyses confirm these findings. These results are robust to the inclusion of additional variables, considering the country of registration of these exchanges and whether they are centralized or decentralized.


Introduction
A cryptocurrency is generally defined as a digital asset designed to work as a medium of exchange, where cryptography is used to protect transactions and to control the creation of additional units of currency 1 . Over the past ten years, since the advent of Bitcoin in 2009, cryptocurrency research has become one of the most relevant topics in the field of finance; see Burniske and Tatar (2018), Fantazzini (2019), Brummer (2019), and Schar and Berentsen (2020) for more details.
Some studies show that cryptocurrencies have been used not only as an alternative way to carry out transactions, but also as investment assets. According to Glaser et al. (2014), users view their cryptocurrency investments as speculative assets rather than a means of payment. Moreover, Baur et al. (2018) show that the largest cryptocurrency, Bitcoin, is not related to traditional asset classes, such as stocks or bonds, thus indicating the possibility of diversification. Fama et al. (2019) used the empirical strategy originally proposed by Baek and Elbeck (2015), and found that it is more reasonable to consider Bitcoin as a highly speculative financial asset rather than a peer-to-peer cash system. Furthermore, White et al. (2020) found that Bitcoin is diffusing as a technology-based product rather than a currency, so it seems Bitcoin and other cryptocurrencies can mostly be considered assets rather than currencies. However, we should also note that some authors recently derived the fundamental value of Bitcoin as a means of payment; see Schilling and Uhlig (2019), Biais et al. (2020), Giudici et al. (2020), Chen and Vinogradov (2021), and references therein. Therefore, as of the writing of this paper, a clear distinction between being an asset and a payment mechanism cannot be made.
One of the most popular ways to trade and hold cryptocurrencies is by using crypto exchanges. Moore and Christin (2013) were the first to notice that traders face the risk of a crypto exchange closing down with accounts wiped out. They showed that nearly 45 percent of exchanges that opened before 2013 failed, taking the users' money with them. This result shows the need to develop models that can discriminate between safe and vulnerable exchanges. This goal is important because crypto exchanges are the most popular way to exchange fiat currencies for cryptocurrencies and vice versa, and it is therefore essential to know which exchange to use based on its security and safety profile. Moreover, the risks of crypto exchanges may significantly contribute to the value of cryptocurrencies as assets, as the famous bankruptcy of the Mt. Gox exchange and the hacks of several exchanges highlighted; see Feder et al. (2017), Gandal et al. (2018), Chen et al. (2019), Twomey and Mann (2020), and Alexander and Heck (2020) for a detailed discussion.
To the best of our knowledge, this topic has received little attention so far. The few studies that focus on it analyze data from before 2015 (at the latest); see Moore and Christin (2013), Moore et al. (2018), and Fantazzini (2019). A quick look at CoinMarketCap 2 highlights that the total cryptocurrency market capitalization in 2021 has grown more than 400 times since 2015, with the total number of listed cryptocurrencies exceeding 10,000. Consequently, there is no doubt that the cryptocurrency market has experienced major changes over the past 6 years.
This paper aims to forecast the probability of a crypto exchange closure using previously identified factors, as well as new ones that have emerged recently. In this regard, recent IT research has suggested that, instead of focusing on specific procedures, it is better to pay attention to the overall security grade of the crypto exchange, as well as to new factors, such as the possibility of sending money to the exchange by wire transfer and/or credit card, the presence of a public developer team, etc.; see Votipka et al. (2018) and Hacken Cybersecurity Services (2021) for more details. Therefore, to reach the paper's objective, we first employed a set of models to forecast the probability of closure, using a unique set of covariates (some of which were never used before), including both traditional credit scoring models and more recent machine learning models. The latter are employed because recent literature shows their superiority over traditional approaches for credit risk forecasting; see Barboza et al. (2017) and Moscatelli et al. (2020) for more details.
The second contribution of this paper is a forecasting exercise, using a unique set of 144 exchanges that were active from the beginning of 2018 until the end of the first quarter of 2021. Our results show that the cybersecurity grades, having a public developer team, the age of the exchange, and the number of available traded cryptocurrencies are the main factors across several model specifications. Both in-sample and out-of-sample forecasting confirm these findings.
The third contribution of the paper is a set of robustness checks to verify that our results also hold when considering the country of registration of the crypto exchanges and whether they are centralized or decentralized.
The paper is organized as follows: Section 2 briefly reviews the (small amount of) literature devoted to the risks of exchange closure, while the methods proposed to model and forecast the probability of closure are discussed in Section 3. The empirical results are reported in Section 4, while robustness checks are discussed in Section 5. Section 6 briefly concludes.

Literature Review
The financial literature dealing with the credit risk of crypto exchanges is extremely limited and, as of the writing of this paper, only three works have examined the main determinants that could lead to the closure of an exchange 3 . Moore and Christin (2013) highlighted that fraudsters can hack the exchanges instead of trying to hack the cryptocurrency system directly, taking advantage of a specific property of several cryptocurrencies (Bitcoin included): transactions are irrevocable, unlike most payment mechanisms, such as credit cards and other electronic fund transfers, so that fraud victims cannot get their money back after discovering the scam; see also Moore et al. (2012) for more details. In this regard, we should note that, when investing in a crypto asset, there are two types of credit risk: the possibility that the asset "dies" and the price goes to zero (or close to zero) 4 , and the possibility that the exchange closes, taking most of its users' money with it. The latter is an example of counterparty risk, where the exchange may not fulfill its part of the contractual obligations. In this regard, Moore et al. 
(2018) examined 80 Bitcoin exchanges established between 2010 and 2015 and found that 38 have since closed: of these 38, 5 fully refunded customers, 5 refunded customers only partially, 6 exchanges did not reimburse anything, while there is no information for the remaining 22 exchanges. These numbers show that closed/bankrupt crypto exchanges imply losses given default (LGD) comparable to subordinated bonds, if not public shares; see Shimko (2004) for more details about classical LGDs estimated using the data from Moody's Default Risk Service Database. The best example of the credit risk associated with crypto exchanges is likely the bankruptcy of Mt. Gox in 2014. At that time, this exchange had the most traded volume worldwide (>70%); it dealt with the most important cryptocurrency (Bitcoin), and it was based in a developed country with a sophisticated and advanced legal system (Japan). Moreover, the Bitcoin price increased more than 20 times from the moment the bankruptcy was declared until the moment the available exchange assets were liquidated. Despite these premises, creditors that sued Mt. Gox (not all of them did) will probably be refunded according to the price in April 2014, but it is not clear when, due to competing (and conflicting) legal claims; see the full Reuters and Bloomberg reports by Harney and Stecklow (2017) and Leising (2021), respectively, for more details. Moore and Christin (2013) first used a Cox proportional hazards model to estimate the time it takes for Bitcoin exchanges to close down, and to discover the main variables that can affect the closure. They found that exchanges that processed more transactions were less likely to shut down, whereas past security breaches and an anti-money laundering indicator were not statistically significant. Secondly, they ran a separate logistic regression to explain the probability that a crypto exchange experienced a security breach, and they found that a higher transaction volume significantly 
increased this probability, while the age of the exchange was not significant. Moore et al. (2018) extended the work of Moore and Christin (2013) by considering data between 2010 and March 2015 and up to 80 exchanges. They built quarterly indicators and estimated a panel logit model with an expanded set of explanatory variables. They found that a security breach increases the odds that the exchange will close in the same quarter, while an increase in the daily transaction volume significantly decreases the probability that the exchange will shut down that quarter. Interestingly, they found that exchanges that get most of their transaction volume from fiat currencies traded by few other exchanges are 91% less likely to close than other exchanges that trade fiat currencies with higher competition. Moreover, they reported a significant negative time trend decreasing the probability of closure over time, thus implying that the quality of crypto exchanges may be improving. Instead, an anti-money laundering indicator and two-factor authentication were not significant, similar to what was reported by Moore and Christin (2013). Fantazzini (2019) showed that crypto exchanges belong to a large 'family' known as small and medium-sized enterprises (SMEs), which represent the vast majority of businesses in most countries. Credit risk management for SMEs is a challenging process due to a lack of data and poor financial reporting; see the report by the European Federation of Accountants (Federation des Experts Comptables Europeens (2005)) for a specific analysis of this problem, the textbooks by Ketz (2003) and Hopwood et al. 
(2012) for a broader discussion of financial frauds, while Reurink (2018) provides a recent literature review. Given this background and using the dataset by Moore and Christin (2013), Fantazzini (2019) proposed several alternative approaches to forecast the probability of closure of a crypto exchange, ranging from credit scoring models to machine learning methods. However, intensive in-sample and out-of-sample forecasting analyses were not performed, and the dataset used is now almost ten years old, thus reflecting a completely different market for crypto assets.
Therefore, given the past literature and professional practice, we expect that older exchanges should have greater experience in terms of system security and a larger user base providing higher transaction fees, which should result in a smaller probability of closure. Similarly, the possibility of sending money to the exchange by wire transfer and/or credit card should highlight a higher security level and, thus, a lower probability of default. Moreover, a mature and experienced exchange should be transparent, and the team running it should be composed of accountable individuals whose identities are publicly available. Furthermore, crypto exchanges with higher overall security grades are expected to show a lower probability of closure, whereas exchanges with a smaller number of tradable assets and a smaller volume of transaction fees may have less funding for exchange security and thus a higher probability of closure. Finally, a past security breach should increase the probability that the exchange will close or go bankrupt.

Materials and Methods
To analyze the determinants behind the decision to close an exchange, we consider the two main approaches: credit scoring models and machine learning. The literature on credit scoring models is quite large; see Baesens and Van Gestel (2009) and Joseph (2013). Machine learning techniques have been extensively used in finance; see James et al. (2013), De Prado (2018), and Dixon et al. (2020). Another important contribution of this paper involves comparing the classification accuracy of credit scoring models and machine learning techniques. To do so, in this section we briefly review the models that will be used in the empirical analysis. We remark that our paper employs credit scoring and machine learning models to estimate the probability of closure of crypto exchanges with a cross-sectional dataset. Some of these models could also be used for time series forecasting and portfolio management with crypto assets; see Borges and Neves (2020), Sebastião and Godinho (2021), and references therein for more details.

Credit Scoring Models
Scoring models employ statistical techniques to combine different variables into a quantitative score. Depending on the model, the score can either be interpreted as a probability of default (PD) or used as a classification system. In the former case, a scoring model takes the following form:

PD i = F(β X i ), (1)

where PD i is the probability of default for firm i (in our case, a crypto exchange), X i is a vector of financial ratios or indicators of various kinds, and β is a vector of parameters. If we use a logit model, F(β X i ) is given by the logistic cumulative distribution function,

F(β X i ) = exp(β X i ) / [1 + exp(β X i )].

The maximum likelihood method is usually used to estimate the parameter vector β in Equation (1); see McCullagh and Nelder (1989) for more details. The logit model is the most widely used benchmark for scoring models because it often shows good performance in out-of-sample analyses; see Fuertes and Kalotychou (2006), Rodriguez and Rodriguez (2006), Fantazzini and Figini (2008), Fantazzini and Figini (2009), and references therein.
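For illustration only (this code is not part of the original analysis), the logit scoring model above can be sketched in a few lines of Python. The single covariate, the toy data, and all variable names are invented, and the maximum likelihood estimate is approximated by plain gradient ascent on the log-likelihood rather than a production routine:

```python
import math

def logistic(z):
    # F(z) = exp(z) / (1 + exp(z)), the logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.1, steps=5000):
    """Approximate the MLE of PD_i = F(b0 + b1*x_i) by gradient ascent.
    The log-likelihood gradient for the logit model is sum((y_i - p_i) * x_i)."""
    b0, b1 = 0.0, 0.0
    n = len(y)
    for _ in range(steps):
        g0 = sum((yi - logistic(b0 + b1 * xi)) for xi, yi in zip(X, y)) / n
        g1 = sum((yi - logistic(b0 + b1 * xi)) * xi for xi, yi in zip(X, y)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Toy data: a higher covariate value (e.g., a security grade)
# is associated with a lower closure probability
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = closed, 0 = alive
b0, b1 = fit_logit(x, y)
pd_low  = logistic(b0 + b1 * 2)   # PD for a weakly secured exchange
pd_high = logistic(b0 + b1 * 8)   # PD for a strongly secured exchange
```

With real data, a dedicated fitting routine (e.g., iteratively reweighted least squares, as used by standard statistical software) would replace the plain gradient ascent shown here.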
The linear discriminant analysis (LDA) proposed by Fisher (1936) uses a set of variables to find a threshold able to separate the reliable firms from insolvent ones.LDA builds a linear combination of these variables for the two populations of firms (alive and defaulted), with the weights chosen to maximize the average distance between the two populations.Once the weights are computed, the observations of the different variables are transformed into a single score for each firm, which is then used to classify the firm based on the distance of the score from the average scores for the two populations.The variables of the two groups must be distributed as a multivariate normal with the same variance-covariance matrix.
If we have a set of n variables X, the group of alive firms is separated from the group of defaulted firms based on a discriminating function of this type:

Z = a X,

where Z is the so-called Z-score and a is the vector of discriminant coefficients (weights); the average scores for the two groups (not defaulted and defaulted) are E(a X) = a X1 and E(a X) = a X2 . The best discriminant function is found by choosing a so that the squared distance between the sample means of the two groups, weighted by the variance/covariance matrix Σ, is maximized:

max a [a ( X1 − X2 )] 2 / (a Σ a),

while the optimal threshold is given by

Z* = ( Z1 + Z2 ) / 2,

and, supposing that Z1 > Z2 , the discriminant rule is: assign a firm to group 1 if its score Z > Z*, and to group 2 otherwise. The Altman (1968) Z-score model is arguably the most well-known classification model for credit risk that uses linear discriminant analysis, and it is still widely used nowadays; see Altman and Sabato (2007) for more details.
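A minimal numerical sketch of the discriminant rule described above, with invented two-dimensional toy data (not from the paper's dataset), using a pooled covariance matrix in place of the common Σ assumed by LDA:

```python
import numpy as np

# Two groups of firms described by n = 2 indicators (toy values):
X1 = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5]])  # alive
X2 = np.array([[0.5, 1.0], [1.0, 0.5], [0.0, 1.5], [1.5, 1.0]])  # defaulted

# Discriminant weights a = Sigma^{-1} (mean1 - mean2), pooled covariance
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S = (np.cov(X1.T) + np.cov(X2.T)) / 2.0
a = np.linalg.solve(S, m1 - m2)

# Average group scores and the midpoint threshold Z* = (Z1 + Z2) / 2
Z1, Z2 = a @ m1, a @ m2
threshold = (Z1 + Z2) / 2.0

def classify(x):
    # Score above the threshold -> group 1 (alive), assuming Z1 > Z2
    return "alive" if a @ x > threshold else "defaulted"
```

Note that Z1 > Z2 holds by construction here, since a (X̄1 − X̄2) = (X̄1 − X̄2)′ Σ⁻¹ (X̄1 − X̄2) is positive for a positive definite Σ.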

Machine Learning Techniques
Machine learning (ML) is a subfield of artificial intelligence that deals with the development of systems able to recognize complex patterns and make correct choices using a dataset already analyzed. We will consider methods that can be useful for forecasting the probability of closure for a set of crypto exchanges, which is a specific case of supervised learning dealing with a classification problem, where the outputs are discrete and divided into two classes. In general, supervised learning considers all the algorithms where the user provides examples of what the algorithm must learn, containing both the input data and the corresponding output value. The goal is to generate an inference function known as a "classifier" that can be used to predict an output value given a certain input.
The supervised learning algorithm known as the Support Vector Machine (SVM) was originally developed by V. Vapnik and his team in the 1990s at the AT&T Bell Laboratories; see Boser et al. (1992) and Cortes and Vapnik (1995). An SVM interprets the training data as points in space, maps them into an n-dimensional space, and builds a hyperplane to separate these data into different classes. The subsets of points closest to the separating hyperplane are called support vectors. A classification problem mapped into a vector space can be linearly or non-linearly separable. More specifically, SVM binary classification problems can be formulated using the decision function y = w φ(x) + b, where x i ∈ R n are the training variables, y i ∈ {−1, 1} their corresponding labels from two classes, φ is the feature-space transformation function, w is the vector of weights, and b is the classification bias. The SVM looks for the optimal hyperplane with a maximum margin between the nearest positive and negative samples, which is found by solving

arg min w,b (1/2) ‖w‖ 2 , subject to y i (w φ(x i ) + b) ≥ 1, i = 1, . . . , n.

If the dataset is large and/or the data are noisy, the usual optimization with the Lagrange multipliers α = {α i } i=1,...,n may become computationally challenging. To deal with this issue, it is possible to introduce control parameters that allow the violation of the previous constraints, using the following dual formulation:

max α ∑ i α i − (1/2) ∑ i ∑ j α i α j y i y j k(x i , x j ), subject to 0 ≤ α i ≤ C and ∑ i α i y i = 0,

where k is the radial kernel k(x, y) = exp(−γ ‖x − y‖ 2 ) with parameter γ, while the parameter C is a regularization term: small values of C determine a hyperplane with a large-margin separation and several misclassified points, and the opposite is true for large values of C. Other kernel functions can be used, but we chose the radial kernel due to its past success in dealing with non-linear decision boundaries; see Steinwart and Christmann (2008) and Hastie et al. (2009) for more details.
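As a small illustration of the kernel used here (the toy points and the γ value below are ours, not from the paper), the radial kernel simply maps squared Euclidean distances into similarities in (0, 1]:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Properties relevant to the dual SVM problem:
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> exactly 1
k_near = rbf_kernel([1.0, 2.0], [1.2, 2.1])   # nearby points    -> close to 1
k_far  = rbf_kernel([1.0, 2.0], [5.0, 9.0])   # distant points   -> close to 0
```

In the dual problem above, these kernel evaluations k(x i , x j ) replace the inner products of the transformed features, so the mapping φ never has to be computed explicitly.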
A classification decision tree is one of the approaches most commonly used in machine learning. It is similar to a reversed tree diagram that forks each time a choice is made based on the value of a single variable, or a combination of several variables. It consists of two types of nodes: non-terminal nodes, which test the value of a single variable (or a combination of variables) and have two direct branches that represent the outcome of a test; and terminal nodes (or leaves), which do not have further branches and hold a class label. The classification tree performs an exhaustive search at every step among all possible data splits, and the best partition is chosen to create branches that are as homogeneous as possible. This procedure continues until a predefined stopping criterion is satisfied, which can be, for example, a minimum number of units beyond which a node cannot be further split. This operation is performed by optimizing a cost function, such as the Gini index: suppose we have a classification outcome taking values k = 1, 2, ..., K, and p mk represents the proportion of class k observations in node m; then the Gini index is given by

G m = ∑ K k=1 p mk (1 − p mk ).

The Gini index is a measure of total variance across the K classes, and it also represents the expected training error if we classify the observations to class k with probability p mk . When the recursive algorithm ends, it is possible to classify the dependent variable into a specific class using the path determined by the individual tests at each internal node. In our case, the estimated probability of closure for a specific crypto exchange is given by the proportion of closed exchanges in the terminal node where the exchange is included. We refer to Hastie et al. (2009), Maimon and Rokach (2014), and Smith and Koning (2017) for more details about decision trees.
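The Gini index itself is straightforward to compute; a minimal sketch (toy node proportions, not from the paper):

```python
def gini_index(proportions):
    """Gini impurity G_m = sum_k p_mk * (1 - p_mk) for a node m,
    given the class proportions p_mk observed in that node."""
    return sum(p * (1.0 - p) for p in proportions)

g_pure  = gini_index([1.0, 0.0])   # homogeneous node -> impurity 0
g_mixed = gini_index([0.5, 0.5])   # 50/50 binary node -> impurity 0.5 (the maximum)
```

A split is chosen to minimize the (weighted) impurity of the resulting child nodes, so perfectly homogeneous children (G = 0) are the ideal outcome.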
Decision trees have several well-known drawbacks: their performance is poor in the case of too many classes and/or relatively small datasets. They can be computationally intensive, particularly if a "pruning" procedure is required to make their structure interpretable and to avoid overfitting. Moreover, the pruning procedure may suffer from a certain degree of subjectivity and does not fully solve the problem of overfitting. Furthermore, decision trees can be highly unstable, with small changes to the dataset resulting in completely different trees. Random forests solve the problem of the instability and overfitting of a single tree by aggregating several decision trees into a so-called "forest", where each tree is obtained by introducing a random component into its construction. More specifically, each decision tree in a forest is built using a bootstrap sample from the original data, where approximately 2/3 of these data are used to build the tree, while the remaining 1/3 is used as a control set, known as the out-of-bag (OOB) data. At each node of the tree, m variables out of the original n variables are randomly selected, and the best split based on these m variables is used to split the node. The random selection of variables at each node decreases the correlation among the trees in the forest, so that the algorithm can deal with redundant variables and avoid model overfitting. Moreover, each tree is grown to its maximum size and not pruned; its resulting instability is neutralized by the high number of trees created to form the "forest". Note that, for a given i-th exchange in the OOB control set, the forecasts are computed using a majority vote: in simple terms, the probability of closure is given by the proportion of trees voting for the closure of exchange i. This procedure is repeated for all observations in the control set, which leads to the computation of the overall OOB classification error. The main drawback of random forests is interpretability, which is not immediate 
as it is for decision trees; see Hastie et al. (2009) and Smith and Koning (2017) for more details about random forests.
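The bootstrap/OOB mechanics and the majority vote described above can be sketched as follows (all numbers are toy values; the vote vector is invented for illustration):

```python
import random

random.seed(42)
n = 1000
sample_ids = list(range(n))

# One bootstrap draw: sampling n observations with replacement leaves
# roughly 1 - (1 - 1/n)^n ~ 36.8% of observations out-of-bag (~1/3)
in_bag = set(random.choices(sample_ids, k=n))
oob = [i for i in sample_ids if i not in in_bag]
oob_fraction = len(oob) / n

# Majority vote over the trees of a forest for one OOB exchange:
# the estimated closure probability is the share of trees voting "closed" (1)
votes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical votes of 10 trees
closure_prob = sum(votes) / len(votes)
```

Each tree casts one vote per OOB observation; aggregating these votes across all trees and all observations yields the overall OOB classification error mentioned above.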
Finally, we will also consider the random forest with conditional inference trees proposed by Strobl et al. (2007), Strobl et al. (2008), and Strobl et al. (2009), which performs better than the original random forest in the case of variables of different types (both discrete and continuous). Fantazzini (2019) showed that this approach was the best among the machine learning methods used to forecast the probability of closure with the dataset collected by Moore and Christin (2013).

Model Evaluation
Several evaluation metrics can be used to compare a set of forecasting models for binary variables. These metrics usually employ a dataset different from the one used for estimation, and they can be applied to all the models considered, even if they belong to different classes; see Section 5 in Giudici and Figini (2009) for a review. Given the size of our dataset, after in-sample forecasting, we will also consider Leave-One-Out Cross-Validation (LOOCV): one observation is left out for forecasting purposes, while the model is estimated using all other observations in the training dataset. This process is then repeated for all observations in the dataset. Once the predicted values for the validation dataset are computed, we can check the forecasting performance of a model using the confusion matrix by Provost and Kohavi (1998), see Table 1:

Observed/Predicted   Closed Exchange   Alive
Closed Exchange      a                 b
Alive                c                 d

In our case, the entries in the confusion matrix have the following meaning: a is the number of closed/bankrupt exchanges correctly predicted as closed, b is the number of closed/bankrupt exchanges incorrectly predicted as alive, c is the number of open/solvent exchanges incorrectly predicted as closed, while d is the number of open/solvent exchanges correctly predicted as alive. The confusion matrix is then used to compute the area under the receiver operating characteristic curve (AUC or AUROC) proposed by Metz (1978), Metz and Kronman (1980), and Hanley and McNeil (1982) for all forecasting models. The ROC curve is computed by plotting, for any probability cut-off value between 0 and 1, the proportion of correctly predicted closed/bankrupt exchanges a/(a + b) on the y-axis, also known as the sensitivity or hit rate, against the proportion of open/solvent exchanges incorrectly predicted as closed/bankrupt c/(c + d) on the x-axis, also known as the false positive rate or 1 − specificity, where the specificity is d/(d + c). The AUC lies between zero and one, and the closer it is to one, the more accurate the forecasting model; see Sammut and Webb (2011), pp. 869-875, and references therein for more details.
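A minimal sketch of how the entries of Table 1 and the two ROC coordinates are computed (the observed/predicted vectors below are invented toy values, not from the paper's dataset):

```python
def confusion_counts(observed, predicted):
    """Return (a, b, c, d) as in Table 1; closed = 1, alive = 0."""
    a = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    b = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    c = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    d = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    return a, b, c, d

obs  = [1, 1, 1, 0, 0, 0, 0, 1]   # 1 = closed, 0 = alive (toy labels)
pred = [1, 1, 0, 0, 0, 1, 0, 1]   # model predictions at one cut-off
a, b, c, d = confusion_counts(obs, pred)
sensitivity = a / (a + b)   # hit rate, y-axis of the ROC curve
fpr         = c / (c + d)   # 1 - specificity, x-axis of the ROC curve
```

Recomputing (sensitivity, fpr) for every cut-off between 0 and 1 traces out the full ROC curve.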
It is possible to show that the area under an empirical ROC curve, when calculated by the trapezoidal rule, is equal to the Mann-Whitney U-statistic for comparing distributions of values from the two samples; see Bamber (1975). DeLong et al. (1988) used this nonparametric statistic to test the equality of two or more ROC areas, and we used this test in our analysis. This method has become popular because it does not make the strong normality assumptions required by alternative approaches, such as those proposed by Metz (1978) and McClish (1989).
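The Mann-Whitney equivalence can be illustrated directly: the AUC equals the probability that a randomly chosen closed exchange receives a higher predicted score than a randomly chosen alive one (ties counting one half). The toy scores below are invented:

```python
def auc_mann_whitney(scores_closed, scores_alive):
    """AUC as the Mann-Whitney U-statistic: the fraction of (closed, alive)
    pairs where the closed exchange gets the higher score (ties count 1/2).
    This equals the trapezoidal area under the empirical ROC curve."""
    pairs = [(sc, sa) for sc in scores_closed for sa in scores_alive]
    wins = sum(1.0 if sc > sa else 0.5 if sc == sa else 0.0
               for sc, sa in pairs)
    return wins / len(pairs)

# Predicted closure probabilities for closed vs. alive exchanges (toy values)
auc = auc_mann_whitney([0.9, 0.8, 0.6], [0.7, 0.3, 0.2])
```

Here 8 of the 9 cross-group pairs are correctly ordered, giving an AUC of 8/9.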
Even though the AUC is one of the most common measures used to evaluate the discriminative power of a predictive model for binary data, it also has some drawbacks, as discussed in detail by Krzanowski and Hand (2009), p. 108. Therefore, we also computed the model confidence set (MCS) proposed by Hansen et al. (2011), and extended by Fantazzini and Maggi (2015) to binary models, to select the best forecasting models among a set of competing models with a specified confidence level. The MCS procedure selects the best forecasting model and computes the probability that the other models are indistinguishable from the best one, using an evaluation rule based on a loss function that, in the case of binary models, is given by the Brier (1950) score. More specifically, at each iteration the MCS approach tests whether all models in the set of forecasting models M = M 0 have equal forecasting accuracy, using the following null hypothesis for a given confidence level 1 − β:

H 0 : E(d ij ) = 0, for all i, j ∈ M,

where d ij,h = L i,h − L j,h is the sample loss differential between forecasting models i and j, and L i stands for the loss function of model i (in our case, the Brier score). If the null hypothesis cannot be rejected, then M * 1−β = M. If the null hypothesis is rejected, an elimination rule is used to remove the worst forecasting models from the set M. The procedure is repeated until the null hypothesis cannot be rejected, and the final set of surviving models defines the so-called model confidence set M * 1−β . Among the different equivalence tests proposed by Hansen et al. 
(2011), we briefly discuss the T-max statistic that will be used in the empirical analysis. First, the following t-statistics are computed:

t i• = d i• / √var(d i• ),

where d i• = m −1 ∑ j∈M d ij is the simple loss of the i-th model relative to the average losses across the models in the set M, d ij = H −1 ∑ H h=1 d ij,h measures the sample loss differential between models i and j, and H is the number of forecasts. The T-max statistic is then calculated as T max = max i∈M (t i• ). This statistic has a non-standard distribution that is estimated using bootstrapping methods with 2000 replications; see Hansen et al. (2011) for details. If the null hypothesis is rejected, one model is eliminated using the following elimination rule: e max,M = arg max i∈M { d i• / √var(d i• ) }.
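The Brier score used as the loss function L i in the MCS procedure is simply the mean squared difference between predicted probabilities and realized 0/1 outcomes; a minimal sketch with invented forecasts:

```python
def brier_score(probs, outcomes):
    """Brier (1950) score: mean squared difference between predicted
    closure probabilities and realized 0/1 outcomes (lower is better)."""
    n = len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n

# Two hypothetical models forecasting the same four exchanges
loss_good = brier_score([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])  # confident, correct
loss_bad  = brier_score([0.4, 0.5, 0.6, 0.5], [1, 1, 0, 0])  # uninformative
```

The per-observation loss differentials between two such models are exactly the d ij,h terms that enter the MCS equivalence test above.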

Data
The dataset examined in this paper was collected using four sources of information:
• CoinGecko 5 : a platform that aggregates information from different crypto exchanges and offers a free application programming interface (API) with access to its database;
• Cybersecurity Ranking and Certification platform 6 : an organization performing security reviews and assessments of crypto exchanges;
• Cryptowisser 7 : a site specialized in the comparison of different crypto exchanges, including those closed and bankrupt;
• Mozilla Observatory 8 : a service allowing users to test the security of a particular website.
The dataset consisted of 144 crypto exchanges that were alive or closed between the beginning of 2018 and the first quarter of 2021. We discarded earlier data because the cryptocurrency market has changed dramatically since 2015; see also Section 4.1 in Fantazzini and Kolodin (2020) and references therein for a discussion of structural changes in Bitcoin markets.
Safety is essential for crypto exchanges because it builds trust among users. The more customers are sure that their money is safe on a specific crypto exchange, the more they will use that crypto exchange, which explains why several crypto exchanges try to improve their security. Moreover, in the case of a security breach, a crypto exchange may be obliged to compensate users for the lost money. Consequently, security grades can affect the probability that a crypto exchange will close. Past studies focused on the presence of particular security procedures, such as a two-step authentication process or a security audit, but most of these variables turned out to be statistically insignificant. Therefore, following the latest professional IT research (see Hacken Cybersecurity Services (2021)), we decided to use aggregated overall grades of an exchange's cybersecurity in place of single testing procedures.
The Cybersecurity Ranking and Certification platform developed a methodology to assess the overall cybersecurity grade of different exchanges. This grade depends on the results of testing procedures performed in six different categories. The final cybersecurity grade takes all of these security factors into account and assigns an aggregated score between 0 and 10. It is important to note that these cybersecurity grades changed over time for most crypto exchanges, particularly for the exchanges that closed. Therefore, in the case of closed crypto exchanges, we considered the cybersecurity grades published in the periods before the closure, using cached versions of the certification platform.
We also considered a second variable to measure the security of a crypto exchange, using data collected from the so-called Mozilla Observatory. The Mozilla Observatory developed a grading system that allows a user to check a website's security level, with grades ranging from A+ to F. Moreover, it is possible to transform these grades into numerical variables. The grades for the crypto exchanges that are alive refer to the first quarter of 2021, while the grades for the closed crypto exchanges refer to the last quarter in which they operated. The possible grades and the corresponding numerical values are reported in Table 2 9 . Moore et al. (2018) found that a negative time trend significantly affected the probability of a crypto exchange closure. As a consequence, we included in the analysis a variable named "age" to measure the operational longevity of exchanges: in the case of alive exchanges, this variable is equal to the number of years from their foundation until the first quarter of 2021, while for closed exchanges it is the number of years between their launch and their closure 10 . Moore et al. (2018) also discovered that a security breach increased the odds of an exchange closing in the same quarter. Therefore, we added a binary variable to record whether the crypto exchange was hacked or not 11 .
Crypto exchanges allow users to trade different cryptocurrencies: a higher number of available assets may result in higher transaction volumes and higher income from fees. Thus, the number of traded cryptocurrencies may potentially decrease the probability of closure, so we added this variable to our analysis.
Finally, recent professional research has suggested studying whether an exchange's developer team is public or anonymous, because this information can be a potential harbinger of future scams; see Digiconomist (2016), Reiff (2020), and Sze (2020) for more details. A mature and experienced exchange should be transparent, and the team running it should be composed of accountable individuals. Unfortunately, it is common for scammers to create fake identities and biographies for their projects, so it is important to check whether the members of the development team and their qualifications are real. Therefore, we also added a binary variable, which is 1 if the team information is public and 0 otherwise 13. For similar reasons, we also considered two dummy variables that are equal to 1 if the exchange supports credit card or wire transfers, respectively, and zero otherwise.
The final dataset consisted of 144 exchanges 14, active from the beginning of 2018 until the first quarter of 2021 (though some started operating before 2018): 51 exchanges closed, while 93 were still active. A brief description of the variables used in the empirical analysis is reported in Table 3. The variance inflation factors of the regressors, reported in Table A2, and their correlation matrix, reported in Table A3 (both in Appendix A), show that collinearity is not a problem in our dataset 15. Their box plots are reported in Figure 1.

In-Sample Analysis
Table 4 reports the results for the logit model, together with its traditional diagnostics and goodness-of-fit tests, such as the McFadden (1974) pseudo R2, the Hosmer and Lemesbow (1980) test, the Osius and Rojek (1992) test, and the Stukel (1988) test, where the latter two are robust variants of the original Hosmer and Lemesbow (1980) test; see Bilder and Loughin (2014), Section 5, for a detailed discussion at the textbook level. The logit diagnostics show a fairly good fit and the absence of major misspecification problems, while the signs of all coefficients correspond to what we expected. Interestingly, only the presence of a public team and the CER security grade are strongly significant at the 5% probability level, while the possibility of a wire transfer, the exchange age, and the presence of a security breach are only weakly significant at the 10% level. All other regressors are not statistically significant.
The estimated coefficients of the linear discriminant function used to classify the two response classes are reported in Table 5: the signs and sizes of the coefficients are rather similar to those of the logit model. Figure 2 reports a stacked histogram of the values of the discriminant function for each group (alive and closed exchanges in our case), which is a common way to display the results of an LDA: positive values are generally associated with closed exchanges, while negative values are associated with alive exchanges.
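The LDA scoring logic behind Figure 2 can be sketched as follows on synthetic two-group data (the group means and features are illustrative assumptions): discriminant scores are computed for every observation, and their sign leans toward one of the two classes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# Two synthetic groups: alive (0) and closed (1), with shifted means.
X_alive = rng.normal(loc=[5.0, 4.0], scale=1.0, size=(90, 2))
X_closed = rng.normal(loc=[3.0, 1.5], scale=1.0, size=(50, 2))
X = np.vstack([X_alive, X_closed])
y = np.array([0] * 90 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
scores = lda.decision_function(X)  # positive scores lean toward class 1 ("closed")
print(scores[:5])
```

Plotting one histogram of `scores` per group then reproduces the kind of stacked display shown in Figure 2.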
The estimated decision tree with our dataset is reported in Figure 3. The plot can be read as follows: 51 exchanges closed (∼35% of the total sample), while 93 exchanges remained alive (∼65% of the total sample). In the dataset, there were 89 exchanges (∼62% of the total sample) that had a public developer team: out of these 89, 14 exchanges closed (∼16%), while 75 remained alive (∼84%). Out of the 55 exchanges (∼38% of the total sample) that did not have a public team, 37 closed (∼67%), while 18 remained alive (∼33%). In the last row:
• 51% of exchanges (=73 exchanges) had a public team and an age greater than 2.5 years (68 remained alive and 5 closed, 93% and 7%, respectively);
• 11% of exchanges (=16 exchanges) had a public team and an age smaller than 2.5 years (7 remained alive and 9 closed, 44% and 56%, respectively);
• 11% of exchanges (=16 exchanges) did not have a public team and had a number of tradable assets greater than 35 (11 remained alive and 5 closed, 69% and 31%, respectively);
• 27% of exchanges (=39 exchanges) did not have a public team and had a number of tradable assets smaller than 35 (7 remained alive and 32 closed, 18% and 82%, respectively).
Summarizing: an exchange that has a public team, has operated for more than 2.5 years, and offers more than 35 tradable assets has a high probability of surviving and continuing to operate.
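A depth-two classification tree of the kind shown in Figure 3 can be sketched as follows; the synthetic data and coefficients are illustrative assumptions, so the fitted split points will not reproduce the ones reported above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 144
public_team = rng.integers(0, 2, n)
age = rng.uniform(0, 6, n)
n_assets = rng.integers(5, 100, n)
# Closure odds are assumed to drop with a public team, higher age, more assets.
p_close = 1 / (1 + np.exp(-(1.0 - 2.5 * public_team - 0.5 * age - 0.02 * n_assets)))
y = rng.binomial(1, p_close)

X = np.column_stack([public_team, age, n_assets])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["public_team", "age", "n_assets"]))
```

Restricting `max_depth` keeps the tree readable, mirroring the two-level structure of Figure 3.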
Support vector machines, random forests, and conditional random forests do not have straightforward interpretations. To compare these models with the previous ones, we followed Fantazzini and Figini (2008) and Moscatelli et al. (2020): we first report in Table 6 the models' AUCs, together with their 95% confidence intervals for the in-sample forecasting performance, their Brier scores, and whether the models were included in the MCS or not. Table 7 reports the joint test for the equality of the AUCs estimated for all models, using the test statistic proposed by DeLong et al. (1988). Finally, Table 8 reports the difference (in %) between the models' AUCs (with all variables included) and the AUCs of the same models with a specific variable excluded: this approach was proposed by Moscatelli et al. (2020) as a measure of variable importance across different models. The random forest is the best model (though the conditional random forest and the SVM are close), while the age of the exchange, the number of tradable assets, and a public developer team seem to be the most important variables for modeling the probability of closure. The reported high values of the AUCs were expected, given that we performed in-sample forecasting with a small dataset, so out-of-sample forecasting should give better insight into the real forecasting capabilities of the models.
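The exclude-one-variable importance measure of Moscatelli et al. (2020) can be sketched as follows, with a logistic model standing in for our six classifiers and synthetic data in which only the first two columns carry signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 144
X = rng.normal(size=(n, 4))
# Only the first two columns are assumed to carry signal about closure.
y = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] + 1.0 * X[:, 1]))))

def in_sample_auc(cols):
    m = LogisticRegression().fit(X[:, cols], y)
    return roc_auc_score(y, m.predict_proba(X[:, cols])[:, 1])

full_auc = in_sample_auc([0, 1, 2, 3])
# AUC drop (in %) when each variable is excluded, one at a time.
drops = {j: 100 * (full_auc - in_sample_auc([k for k in range(4) if k != j])) / full_auc
         for j in range(4)}
print(full_auc, drops)
```

The larger the drop, the more the excluded variable contributed to the model's discriminatory power.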

Out-of-Sample Analysis
After the in-sample analysis, we implemented leave-one-out cross-validation (LOOCV), where one observation is left out for forecasting purposes, while the model is estimated using all other observations in the dataset. This process is then repeated for all observations in the dataset.
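The LOOCV scheme, together with the AUC and Brier score used to evaluate the forecasts, can be sketched as follows on synthetic data (a logistic model stands in for the six models compared in the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(5)
n = 144
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0] - 0.8 * X[:, 1]))))

# Leave-one-out: each observation is predicted by a model fit on the other n-1.
probs = np.empty(n)
for train_idx, test_idx in LeaveOneOut().split(X):
    m = LogisticRegression().fit(X[train_idx], y[train_idx])
    probs[test_idx] = m.predict_proba(X[test_idx])[:, 1]

auc = roc_auc_score(y, probs)
brier = brier_score_loss(y, probs)
print(auc, brier)
```

Each of the n out-of-sample probabilities comes from a model that never saw that observation, which is what makes the resulting AUC and Brier score genuine out-of-sample measures.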
Table 9 reports the models' AUCs, together with their 95% confidence intervals for the LOOCV forecasting performance, their Brier scores, and whether the models were included in the MCS or not. Table 10 reports the joint test for the equality of the AUCs estimated for all models, using the test statistic proposed by DeLong et al. (1988), while Table 11 reports the difference (in %) between the models' AUCs (with all variables included) and the AUCs of the same models with a specific variable excluded. The performance criteria highlight that no single model strongly outperforms the others, since they all show a similar AUC close to 85-90%. An exception is the decision tree model, which had the worst performance, thus confirming the well-known problem of model instability under small changes to the dataset. However, the MCS shows that the random forest and the SVM have significantly better forecasts than the competing models according to the Brier score.
This empirical evidence seems to partially confirm past evidence and the theoretical discussion reported by Hand (2006), who showed that "the marginal gain from complicated models is typically small compared to the predictive power of the simple models", and that "simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm". Moreover, simple classification models may be preferred thanks to their interpretability, which may be a legal requirement in some cases (such as credit scoring).
As for the main determinants of the decision to close an exchange, a public developer team is the most important variable across all models, followed by the number of tradable crypto assets, the age of the exchange, and the CER cybersecurity grade. The evidence that a public developer team is by far the most important determinant did not come as a surprise: scammers and fraudsters alike always try to hide their identity to avoid being discovered (and prosecuted).

Robustness Checks
We wanted to verify that our previous results also hold with different model specifications. Therefore, we performed a series of robustness checks considering the additional information of whether the exchanges are centralized or decentralized, as well as their country of registration.

Centralized or Decentralized Exchanges: Does It Matter?
Decentralized exchanges allow for direct peer-to-peer cryptocurrency transactions without the need for an intermediary, thus reducing the risk of theft from hacking that can take place in centralized exchanges. Moreover, they can prevent price manipulation or faked trading volumes through wash trading 16, and they are more anonymous than centralized exchanges, which require "know your customer" (KYC) procedures 17. However, they also have some drawbacks, such as slippage and front-running; see Lin et al. (2019), Daian et al. (2020), Johnson (2021), and Alkurd (2021) for more details.
The number of decentralized exchanges in our dataset is less than 5%, so their influence on the probability of closure can be minor at best. Nevertheless, given the increasing interest in these exchanges, we added a binary variable to our dataset that is 1 if the exchange is decentralized and zero otherwise, and we repeated our analysis. Table 12 reports the models' AUCs, together with their 95% confidence intervals for the LOOCV forecasting performance, their Brier scores, and whether the models were included in the MCS or not. Table 13 reports the joint test for the equality of the AUCs estimated for all models, using the test statistic proposed by DeLong et al. (1988), while Table 14 reports the difference (in %) between the models' AUCs (with all variables included) and the AUCs of the same models with a specific variable excluded. The models' performances are very close, if not identical, to the baseline out-of-sample forecasting case. The only small difference is in the Brier scores, which are now slightly higher, so the MCS includes all models except the decision tree. The noise introduced by an additional insignificant regressor worsened the models' performances just enough to make them no longer statistically different from each other, so the MCS was unable to separate good and bad models. This outcome was expected due to the small sample size involved and the small number of decentralized exchanges present in the dataset.

Country of Registration: Does It Matter?
To verify the effect of the country of registration of crypto exchanges on their probability of closure, we followed Moore and Christin (2013) and Moore et al. (2018), and we used an index computed by World Bank economists to identify each country's compliance with "Anti-Money Laundering and Combating the Financing of Terrorism" (AML-CFT) regulations; see Yepes (2011) for more details.
Table 15 reports the models' AUCs, together with their 95% confidence intervals for the LOOCV forecasting performance, their Brier scores, and whether the models were included in the MCS or not. Table 16 reports the joint test for the equality of the AUCs estimated for all models, using the test statistic proposed by DeLong et al. (1988), while Table 17 reports the difference (in %) between the models' AUCs (with all variables included) and the AUCs of the same models with a specific variable excluded. The models' performances and the test statistics are almost identical to the baseline out-of-sample forecasting case, thus confirming that the AML-CFT index is not a statistically significant variable, as reported by Moore and Christin (2013) and Moore et al. (2018).

Conclusions
This paper investigated the determinants surrounding the decision to close an exchange, using a set of variables consisting of previously identified factors and new ones that emerged from the latest professional IT research.
To reach this objective, we first proposed a set of models to forecast the probability of closure of a crypto exchange, including both traditional credit scoring models and more recent machine learning models. Second, we performed a forecasting exercise using a unique dataset of 144 exchanges that were active from the beginning of 2018 until the end of the first quarter of 2021. We found that having a public developer team is by far the most important determinant, followed by the CER cybersecurity grade, the age of the exchange, and the number of traded cryptocurrencies available on the exchange. Both the in-sample and out-of-sample analyses confirm these findings. The fact that having a public developer team is the most important factor is probably a confirmation that cryptocurrencies' returns mainly depend on financial conventions and that these assets have become part of the traditional financial system, as discussed in Fama et al. (2019).
The general recommendation for investors that emerged from our analysis is to choose an exchange with a public developer team (scammers and fraudsters always try to hide), a high CER cybersecurity grade, preferably several years of operating experience, and a large number of available tradable assets, which can guarantee a large volume of transaction fees and, thus, better funding for exchange security.
Finally, we performed a set of robustness checks to verify that our results also hold when considering whether the exchanges are centralized or decentralized, and when considering their country of registration, using an index of each country's compliance with the AML-CFT regulations. We found that the models' performances and the test statistics were almost identical to the baseline out-of-sample forecasting case, thus showing that neither the decentralized status of an exchange nor the AML-CFT index is a statistically significant variable.
We should note that the number of exchanges that we used is rather low compared to traditional studies dealing with credit risk for SMEs, despite our analysis being the largest so far in this field of research. We are aware that this limitation may make our models suffer from a certain degree of selection bias. For example, some small exchanges were discarded from our dataset because we were unable to collect all the regressors required for our analysis: it was not possible to find information about their public team, past hacks, age, methods of money transfer, etc. However, we are confident that the addition of these exchanges, mostly small and no longer operating, would strengthen our results rather than weaken them, because they would likely confirm the need to choose exchanges with a public team, without past hacks, and with several years of experience. The retrieval and analysis of additional exchange data are left as an avenue for future research.
Another avenue for future work will be to check how the credit risk of crypto exchanges changes when the number of decentralized exchanges and their trading volumes increase to a more sizable level. The recent crackdown in China, where both crypto mining and transactions involving crypto assets are now fully prohibited, may stimulate the growth of decentralized exchanges. Their development may spread a form of "fully denationalized financial money" from which only a few social groups will benefit, with increasing social inequalities, but it may also stimulate financial circuits that enable a more equitable distribution of the wealth created by social cooperation, as recently discussed by Fama et al. (2019). This is why this phenomenon will have to be monitored.

13
Information about the exchanges' developer team is available at CoinGecko.

14
The names of these exchanges are reported in Table A1 in the Appendix A.

15
The variance inflation factors (VIF) are used to measure the degree of collinearity among the regressors in an equation. They can be computed by dividing the variance of a coefficient estimate in the equation with all the other regressors included by the variance of the same coefficient estimated from an equation with only that regressor and a constant. Classical "rules of thumb" to get rid of collinearity are to eliminate those variables with a VIF higher than 10, or to eliminate one of two variables with a correlation higher than 0.7-0.8 (in absolute value).

16
Wash trading is a process whereby a trader buys and sells an asset to feed misleading information to the market. It is illegal in most regulated markets; see James Chen (2021) and references therein for more details. However, there is recent evidence that up to 30% of all traded tokens on two of the first popular decentralized exchanges on the Ethereum blockchain (IDEX and EtherDelta) were subject to wash trading activity; see Victor and Weintraud (2021) for more details.

17
The "know your customer" or "know your client" (KYC) check is the process of identifying and verifying a client's identity when opening a financial account; see https://en.wikipedia.org/wiki/Know_your_customer (accessed on 1 August 2021) and references therein for more details.

Figure 1 .
Figure 1.Box plots of the regressors.

Figure 2 .
Figure 2. Stacked histogram of the scores of the discriminant function separately for each group.

Table 1 .
Theoretical confusion matrix. Number of: a = true positives, b = false positives, c = false negatives, d = true negatives.

Table 3 .
Description of the explanatory variables used in the analysis.

Table 6 .
AUC and 95% confidence intervals for each model, Brier scores, and model inclusion in the MCS.

Table 7 .
Joint test of equality for the AUCs of the six models.

Table 8 .
Difference (in %) between the baseline AUCs and the AUCs of the same models without a specific variable.

Table 9 .
AUC and 95% confidence intervals for each model, Brier scores, and model inclusion in the MCS.

Table 10 .
Joint test of equality for the AUCs of the six models.

Table 11 .
Difference (in %) between the baseline AUCs and the AUCs of the same models without a specific variable.

Table 12 .
AUC and 95% confidence intervals for each model, Brier scores, and model inclusion in the MCS.

Table 13 .
Joint test of equality for the AUCs of the six models.

Table 14 .
Difference (in %) between the baseline AUCs and the AUCs of the same models without a specific variable.

Table 15 .
AUC and 95% confidence intervals for each model, Brier scores, and model inclusion in the MCS.

Table 16 .
Joint test of equality for the AUCs of the six models.

Table 17 .
Difference (in %) between the baseline AUCs and the AUCs of the same models without a specific variable.