A Method for Visualizing Posterior Probit Model Uncertainty in the Early Prediction of Fraud for Sustainability Development

Corporate fraud not only curtails investors' rights and privileges but also disrupts the overall market economy. For this reason, a model that could help detect unusual market fluctuations would be essential for investors. Thus, we propose an early warning system for predicting fraud associated with financial statements based on the Bayesian probit model, examining historical data from 1999 to 2017 on 327 businesses in Taiwan to create a visual method that aids decision making. In this study, we utilize parametric estimation via Markov Chain Monte Carlo (MCMC). The results show that the proposed approach can reduce the over- or under-confidence that arises in the decision-making process when standard logistic regression is utilized. In addition, the Bayesian probit model in this study is found to offer more accurate calculations, representing not only the predicted value of the responses but also the possible ranges of these responses via a simple plot.


Introduction
In the last few decades, many senior managers have been caught using phony financial statements to cheat stakeholders or manipulate stock prices in an attempt to funnel profits. As such, corporate fraud has long been a serious problem, particularly when it involves financial statements. Ironically, the information contained in these documents has remained one of the key indicators that fraud has taken place [1,2]. Fraudulent activities have not only directly resulted in significant losses for stakeholders and severe punishments for the accounting institutions involved, but they have also significantly altered trading practices in the financial market. According to the Association of Certified Fraud Examiners (ACFE), in 2020 a total of 2504 cases worldwide involved corporate fraud, with an average loss of 5% of revenue, equivalent to a loss of about USD 3.6 trillion of Gross World Product (GWP) [3]. Although it is possible to detect corporate fraud, the ACFE still holds that it is indeed ubiquitous and that no organization can be completely immune from this threat. The complex causes of fraud are explained in agency problem theory, earnings management, the fraud triangle, and the GONE theory [4]. According to the fiduciary norm, managers must act solely in the interests of the principal, neglecting all others [5]. If the principal and agent are at odds, the latter will tend to focus on his or her own interests, an issue that has attracted much attention over the years. Moreover, Song, et al. [6] have voiced concern over the privatization of many state-owned businesses in China, which may be problematic because, previously, the interests of state-owned companies have aligned with those of the nation. However, with privatization come market-oriented goals, meaning effective performance and profit become the primary objectives for the corporation.

Related Studies of Fraud Detection
Fraud detection has been studied for a long time, with many techniques and models such as logistic models, decision trees, artificial neural networks [28-30], support vector machines [31], random forests [32], and data engineering methods [33], which have proven to be quite precise. The most famous model is the Z-score, which is still commonly used today for predicting financial distress and fraud [34]. Summers and Sweeney [35] used the logit model to study 51 companies that were under investigation by The Wall Street Journal for financial statement fraud from 1980 to 1987. The researchers matched the samples with the same number of no-fault companies following the standard industry classification (SIC) code. They found that company insiders who commit fraud tend to sell significant numbers of shares in order to reduce the quantity available for others to buy, which also reduces the percentage of shares they hold in the company. Imhoff [36] suggests that substantive change is necessary to improve corporate governance: problems with accounting or auditing procedures will not be solved until boards are given sufficient information to operate independently and are allowed to act on behalf of the shareholders.
In practice, fraudulent financial reporting is associated with managers who can easily override or change internal control procedures while appearing to be loyal to the company [15]. Under these circumstances, managers can easily manipulate earnings and present falsified financial reports. Desai [37] suggests that many corporate scandals have been caused by the exaggeration of profits. Managers tend to report gross profits in the capital market while reporting lower taxable profits to governmental agencies to avoid paying taxes, which leads to the creation of fraudulent financial statements. Davidson, et al. [38] studied the effect of corporate governance on earnings management by analyzing 434 companies listed on the exchange. They discovered that companies are less likely to manipulate earnings when the board is independent and when most members of the board and audit committee are non-executive directors. Perols and Lougee [39] argue that managers engaged in acts of fraud begin to manipulate earnings a few years before the crime is detected. The level of adjustment may even exceed that of predicted growth, or they may exaggerate their revenues to commit financial statement fraud.
Many researchers also suggest that the quality of audits can be better guaranteed [19], and fraud will be much less likely [40], if financial statements are audited by large accounting firms. Although this effect is not directly observable, Hribar, Kravet, and Wilson [20], who used accounting fees as a surrogate variable, found that the size of the fees may reflect the level of reliability of the statements. Kamarudin, Ismail, and Mustapha [22] have a different perspective on this controversial issue. After analyzing data from 184 companies from 2003 to 2010, they found that most of those guilty of fraud tended to practice "aggressive accounting," including claiming revenue prematurely, over-optimistic estimates, and opportunistic timing of loss recognition. Although these practices are not against the law, they are considered negligent because their presence indicates that financial statements must be compiled a second time, which calls into question the reliability and quality of financial reporting. In recent years, data mining and machine learning techniques have shown many advantages over traditional statistical tools in fraud detection, but researchers are still trying to explain this "black box" to reduce model bias [26]. Perols [28] found that logistic regression outperforms neural networks and decision trees. Furthermore, the Bayesian Belief Network model outperforms decision trees and neural network models for identifying fraudulent financial statements, and it can also utilize ten-fold stratified cross-validation [30]. Many scholars and practitioners also prefer Bayesian methods to machine learning or deep learning models because of data limitations and the lack of interpretability [41]. After analyzing the development of fraud detection models, it becomes clear that the accuracy of a model depends heavily on gauging financial indicators.

Comparing the Bayesian Probit Model to the Standard Logistic Model
Over the years, many scholars have performed logistic analysis using the Bayesian model [42-45] to correct parametric estimation errors and establish a more realistic model. This method has been extensively applied to various domains of research. Gerlach, et al. [46] applied the Bayesian probit model to 63 items within financial statements and used stepwise regression to select appropriate variables for a logistic model specifically designed to forecast changes in corporate earnings. Lately, the Bayesian probit model, widely used in the domain of statistics, has attracted much attention in the field of social science [47]. In a similar vein, Rossi, et al. [48] adopted this model to analyze many marketing problems and help managers make more informed decisions. The difference between the Bayesian probit model and the standard logistic model is that the estimation of parameters under the latter is based on Maximum Likelihood Estimation (MLE). An iterative method of calculation is necessary because the solution is non-linear, which means the parameter estimates cannot be expressed in closed form. After calculating the coefficients, the chi-square test can be used to assess their significance. Another common method is the Wald test, which conforms to the standard normal distribution under the null hypothesis [49].
Although some researchers argue that the minimum effective sample size for the standard logistic model is only ten or more [50], the process of statistical inference requires a sample size substantial enough to effectively approximate the chi-square or normal distribution. However, this assertion cannot be ignored: researchers typically use sample sizes of less than 100 for corporate fraud studies because of the prolonged time it takes to reach verdicts in such cases. For this reason, there are often not enough samples to conduct a fully valid study, and these limited sample sizes remain one of the inherent shortcomings of the standard logistic model. In addition, the model operates through the pairing of samples. The common ratio for pairing companies accused of fraudulent activities with no-fault companies is 1:1 or 1:2. In reality, it is difficult to find two companies of similar size in the same industry; in an oligarchic market, for example, company sizes vary significantly. Because it is so difficult to find companies in good standing to use for analysis, the results of such a study may be somewhat biased, which is another shortcoming of the logistic model. Although it is unnecessary to assume that the independent variables follow a normal distribution, after fitting the model and computing the confidence intervals between the independent and dependent variables, the standard-normal-based Wald test is still required. Therefore, this model may not be stable enough to detect fraud, which is a third shortcoming of the standard logistic model. Whether the results from this model can effectively map the relationship between the variables is another issue to be explored in the future.
Due to these shortcomings, we adopted the Bayesian probit model in conjunction with MCMC in this study to overcome the aforementioned constraints [51]. After using simulation to redistribute the parameters, we compared the posterior probability to the prior probability via the Bayesian probit model to create a realistic scenario. This model is also more effective and stable than others for determining early signs of fraud. In summary, it can help to effectively eliminate the bias of parametric estimation. However, it has not been widely applied by researchers, particularly to financial statement fraud. Thus, the objective of this study is to use the Bayesian probit model to analyze financial statement fraud more effectively and to compare its results with those of the standard logistic model, providing a more accurate reference and decision-making guide.

Notations
In the Bayesian probit model, we denoted corporate fraud by y = 1 and all other cases by y = 0. Therefore, the probability of fraud is F(x_i; β) = P(y_i = 1 | x_i; β), and the probability of non-fraud is P(y_i = 0 | x_i; β) = 1 − F(x_i; β). The logistic function g(x_i), also referred to as the log odds ratio, is expressed in the equation below:

g(x_i) = ln[ F(x_i; β) / (1 − F(x_i; β)) ] = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_p X_ip + ε_i,   (1)

where i = 1, 2, …, n indexes the samples of the model; j = 1, 2, …, p indexes the individual variables; F(x_i; β) is the probability of fraud; and ε_i represents the residual effect. For the logistic function, the parameter β was calculated via MLE, and the likelihood of the i-th observation was determined as l_i(β) = F(x_i|β)^(y_i) [1 − F(x_i|β)]^(1−y_i), which expands into Equation (2):

l_i(β) = [ e^(β_0+β_1 X_i1+…+β_p X_ip) / (1 + e^(β_0+β_1 X_i1+…+β_p X_ip)) ]^(y_i) [ 1 − e^(β_0+β_1 X_i1+…+β_p X_ip) / (1 + e^(β_0+β_1 X_i1+…+β_p X_ip)) ]^(1−y_i).   (2)

We assumed that the observations were independent, so the likelihood function of the model is the product of all terms, as shown in Equation (3):

L(β) = ∏_{i=1}^{n} l_i(β).   (3)

According to Bayesian inference, the posterior probability is directly proportional to the product of the likelihood function and the prior probability, as shown in Equation (4):

π(β | y) ∝ L(β) π(β).   (4)

Furthermore, we summarize the sequence of the proposed method as a flowchart in Figure 1.
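As a numerical illustration of Equations (2) and (3), the following Python sketch (our own, not part of the original R analysis) evaluates the log of the likelihood product in a numerically stable form:

```python
import numpy as np

def logistic_log_likelihood(beta, X, y):
    """Log of Equation (3): sum_i [ y_i*log F(x_i;beta) + (1-y_i)*log(1-F(x_i;beta)) ],
    where F is the logistic CDF of the linear predictor x_i' beta."""
    eta = X @ beta  # linear predictor, shape (n,)
    # y*eta - log(1 + e^eta) is the numerically stable per-observation log-likelihood
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))
```

A quick sanity check: at β = 0, every F(x_i; β) equals 0.5, so the log-likelihood reduces to −n·ln 2.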

MCMC Parameter Estimation
There has recently been a resurgence in the use of Bayesian regression methods, in part due to the popularity of the MCMC approach [48]. In this study, our model was derived from a combination of the Markov Chain and the Monte Carlo methodologies. Based on random sampling from the Markov Chain, the Monte Carlo method is used to estimate the integration of problems that have no analytical solutions or to analyze difficult and complicated probability distributions.
When employing the Markov Chain, we assumed that if β_0, β_1, β_2, … are a series of random variables, then β_{t+1} is generated from the conditional probability P(β_{t+1} | β_t), and its value depends only on β_t, not on β_0, β_1, β_2, …, β_{t−1}. As time t increases, the distribution becomes stationary and independent of t and β_0. However, if the probability does not fit a standard distribution, we need to apply the Monte Carlo method to obtain an accurate estimation. For instance, if β is the random variable of the model parameter and we assume that it conforms to the posterior probability distribution π(β), then the expected value of f(β) under that distribution is given by Equation (5):

E[f(β)] = ∫ f(β) π(β) dβ.   (5)

Sometimes it is too difficult, or even impossible, to calculate the integration in Equation (5) directly. We then employ Monte Carlo integration, which draws random samples β_1, β_2, …, β_m from π(β) and uses the mean of the samples to approximate the expected value, as in Equation (6):

f̂ = (1/m) Σ_{t=1}^{m} f(β_t),   (6)

where β_t represents the t-th sampling result when t ≥ 0.
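A toy illustration of this Monte Carlo integration, with an illustrative posterior of our own choosing (not the paper's): if π(β) were N(2, 0.5²) and f(β) = β², the sample mean of f over draws from π approximates the integral in Equation (5):

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative posterior pi(beta) = N(2, 0.5^2); target is E[f(beta)] with f(beta) = beta^2
draws = rng.normal(loc=2.0, scale=0.5, size=100_000)
mc_estimate = np.mean(draws ** 2)  # Equation (6): sample mean approximates the integral
exact = 2.0**2 + 0.5**2            # for a normal, E[beta^2] = mu^2 + sigma^2 = 4.25
```

With 100,000 draws the Monte Carlo estimate typically lands within a few hundredths of the exact value 4.25, illustrating why the method is attractive when the integral itself is intractable.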

It becomes clear that if the initial value were different, the average estimation result would also change. Thus, once we establish that the chain's stationary distribution φ(β) equals the target posterior π(β), we can discard the burn-in samples from the first r iterations, retain the sampling results at intervals of k, and solve the above problem via Equation (7).
In this study, we applied the Gibbs sampling method (an MCMC algorithm), a special case of the Metropolis-Hastings algorithm proposed by [52], to obtain the following observations. According to this method, we determined the i-th sampling result of β = (β_0, β_1, …, β_p) from the m samplings as β^i = (β^i_0, β^i_1, …, β^i_p) by following the three steps shown below.
Step 3: We used the parametric values from the sampling to repeat step 2 until we reached the end of the m th sample.
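The equations behind these steps are not reproduced here; for orientation, a minimal sketch of a standard data-augmentation Gibbs sampler for the Bayesian probit model (the Albert-Chib construction, which samplers of this kind follow) might look as below. The prior N(0, 100·I), the default iteration counts, and all function and variable names are our own illustrative choices, not the paper's:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_probit(X, y, n_iter=2000, burn_in=500, keep=2, seed=0):
    """Albert-Chib Gibbs sampler for a Bayesian probit model.
    Prior: beta ~ N(0, 100*I). Returns thinned posterior draws of beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B0_inv = np.eye(p) / 100.0            # prior precision
    B = np.linalg.inv(B0_inv + X.T @ X)   # posterior covariance of beta | z
    L = np.linalg.cholesky(B)
    beta = np.zeros(p)
    draws = []
    for t in range(n_iter):
        # Step 1: draw each latent z_i from a truncated normal given beta and y_i
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)  # z > 0 when y = 1
        hi = np.where(y == 1, np.inf, -mu)   # z <= 0 when y = 0
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # Step 2: draw beta from its full conditional N(B X'z, B)
        beta = B @ (X.T @ z) + L @ rng.standard_normal(p)
        # Step 3: repeat, keeping every `keep`-th draw after burn-in (thinning)
        if t >= burn_in and (t - burn_in) % keep == 0:
            draws.append(beta.copy())
    return np.asarray(draws)
```

The key design point is that conditioning on the latent z turns the probit update for β into a standard conjugate normal draw, which is why Gibbs sampling applies here without any tuning.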
After estimating via Gibbs sampling, in order to verify that the Markov Chain had reached stationarity, we used the Autocorrelation Function (ACF) to monitor the convergence of the chain [48,53]. We then selected the number series {β_m : m = 0, 1, 2, …} from the Markov Chain. As m approaches infinity, β_m converges to β; at this point, β is the random variable from the joint probability distribution f(β), and the estimation goal is accomplished.
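The ACF check can be sketched as follows; this helper (our own, with a hypothetical name) computes the sample autocorrelation of one parameter's chain, which should decay toward zero as the lag grows if the chain mixes well:

```python
import numpy as np

def chain_acf(chain, max_lag=20):
    """Sample autocorrelation of a 1-D MCMC chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    c0 = np.dot(x, x) / len(x)  # lag-0 autocovariance
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * c0)
                     for k in range(max_lag + 1)])
```

For an independent (white-noise) chain, the lag-0 value is exactly 1 and all other lags hover near zero; slowly decaying autocorrelations signal that more thinning or a longer burn-in is needed.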

Creating the Fraud Detection Model
During the data-gathering phase, n represents the total number of companies and X_i signifies all the predictive variables of the i-th company. These could include continuous or discrete variables, such as financial indicators, corporate governance variables, principles of stability, and the size of the company, which will be explained in detail in Section 3.4. In the model, y_i = 1 indicates that an act of fraud took place at the i-th company, while y_i = 0 indicates that employees at the i-th company were innocent of this crime. In this study, our analysis was based on the binary probit model implemented in the R statistical software for sampling and estimation, as shown in Equation (9).
where Y_i = (y_1, y_2, …, y_n) is an n × 1 vector used to determine whether the i-th company engaged in fraud, and Z_i = (z_1, z_2, …, z_n) is also an n × 1 vector, the aggregate of the continuous latent variables that correspond to Y_i. The model structure that corresponds to the i-th company is shown in Equation (10).
In this model, the cutoff value used in the judgment of {Y_i} differs from that in the logistic model. Thus, before beginning any analysis, we converted the range covered by {X_i} to the closed interval [−1, 1] [48], as shown in Equation (11).
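Equation (11) is not reproduced here; a plausible column-wise min-max version of such a transform (our assumption of its form, following the general practice in [48], not necessarily the paper's exact formula) is:

```python
import numpy as np

def rescale_to_pm1(X):
    """Linearly map each column of X onto [-1, 1]: column min -> -1, column max -> 1."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0
```

Rescaling all predictors to a common range keeps no single financial ratio from dominating the prior scale of the coefficients.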
In the fraudulent financial statement prediction model proposed in this study, the only observed values were {X_i} and {Y_i}. The estimation parameters were the p + 1 components of β, denoted {β} = (β_0, β_1, …, β_p). Since the posterior distribution of {β} is not available in closed form, we used Gibbs sampling from the posterior probability distribution to estimate the joint probability distribution f(β) of {β}.

Description of Variables
In this study, the fitted model was constructed with 14 variables similar to those of Lin [23]. The operational definitions are discussed below: • Dependent Variable: We used binary classification to categorize the response. A fraudulent company was coded as 1 and a no-fault company as 0.
• Independent Variables: In this study, there were 13 independent variables from the following categories: the "five financial ratios" proposed by Bernstein [54], covering profitability, liquidity, growth, utility, and financial structure (Table 1); corporate governance variables (Table 2); and conservative accounting variables. The corporate governance variables are defined as follows:
Ratio of external directors: the number of external directors divided by the total number of director seats.
The chairman also holds the position of general manager (β_8): dummy variable; a chairman who also holds the position of general manager is represented by 1, otherwise 0.
Percentage of shareholding by directors (β_9): the quantity of shares held by the directors divided by the total outstanding shares at the end of the period.
Percentage of shareholding by institutional investors (β_10): the ratio of institutional investors' shareholding in the company.
Deviation between voting rights and earnings rights (β_11): voting rights minus earnings distribution rights.
For the conservative accounting variables, we adopted Givoly and Hayn's [21] hypothesis of stable variables, which states that the greater the Conservative Accounting (CONACC) value, the more conservative the accounting policy of the company:
CONACC (β_12) = (earnings before extraordinary items + depreciation − cash flow from operations) / total assets at the beginning of the study timeframe.
Size of the company: β_13 = ln(asset size).

Sample Data
In this section, we applied the same data organization as Lin [23], adapting the framework and utilizing the MCMC method for a thorough analysis. Lin chose income before extraordinary gain (loss); however, since enterprises in Taiwan have since adopted the IFRS accounting standards, income (loss) from continuing operations is now more appropriate.
TA(β_12) = [income (loss) from continuing operations + depreciation − cash flow from operations] / average total assets. We analyzed companies that had been convicted of fraud in a court of law for crimes such as insider trading, stock price manipulation, and fraudulent financial statements between 1999 and 2017. We used the dataset only up to 2017 because most of the more recent investigations had not yet been completed. Of the 327 companies investigated, 109 were found guilty. The 1:2 ratio method was used to match them with 218 companies that had not engaged in fraud (see Table 3). The 109 companies that had engaged in fraud spanned a total of 35 different industries. Although the crimes covered a wide range of industry categories, none of the sampled industries involved special financial statement layouts, such as the financial, securities, or insurance industries, and they were very similar in this way. The selection criteria used for pairing companies were the industry to which the fraudulent company belonged and an asset gap not exceeding 40% during the same year; the goal was to match each company guilty of fraud with two innocent companies. Corporate information data published by the Taiwan Economic Journal (TEJ) were used in the study. We collected all the data from the year the fraudulent activities took place (T), 1 year prior (T-1), 2 years prior (T-2), and 3 years prior (T-3). Data from 327 enterprises, totaling 981 data items, were used to establish the analysis model. The fraud distribution by industry is shown in Table 4. According to Table 4, the largest portion of detected fraud is in the semiconductor industry with 10.1%, followed by the motherboard industry with 7.3%, out of the 35 different industries. In addition, most of the frauds were detected in the 2005-2009 period. Around 30% of the industries had only one company detected for fraud from 1999 to 2017, such as glass ceramics, communication equipment, and foods and animal feed.

Prior Distributions
Prior probability distributions were assigned to all unknown parameters in the model, including the 14 coefficient terms. The prior for β in this study was set as a normal distribution with mean β̄ and variance A⁻¹, calculated using Equation (13) with ν_0 = 3 [48].

Sampling and Modeling
The parameters of the Bayesian probit model used in this study were estimated according to the MCMC procedure described in the previous chapter. The number of Gibbs samplings was set to 1 million (R = 1 million) with a sampling interval of 10 (keep = 10), yielding a total of 100 thousand thinned iterations. The first 20 thousand sampling results were then discarded (burn-in = 20 thousand), and the remaining 80 thousand were taken as the joint probability distribution of the parameters, which was used to calculate the detection capacity and range of the fraud warning model. K-fold cross-validation was used to establish and analyze the model. The 327 companies were divided into 10 groups according to the three different years, maintaining the 1:2 ratio between fraudulent and non-fraudulent companies. The first nine groups were made up of 33 companies each, and the last group contained the remaining 30. We used one group as the test set and the remaining nine as the training set. The testing was carried out 10 times, with a different group chosen as the test set each time, to most efficiently gauge the predictive ability of the model. Besides the first-order terms, full second-order interaction terms were also added, following Allen and Tseng [55].
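The bookkeeping behind these sampling settings can be verified directly; the variable names below are illustrative:

```python
# Gibbs sampling bookkeeping for the settings described above
R = 1_000_000        # total Gibbs iterations
keep = 10            # thinning interval: retain every 10th draw
burn_in = 20_000     # thinned draws discarded as burn-in

thinned = R // keep            # 100,000 thinned iterations
retained = thinned - burn_in   # 80,000 draws form the joint posterior sample
```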

Prediction Results from the Standard Logistic and Bayesian Probit Models
The results of the first-order model are shown in Figures 2-4. Each box-and-whisker plot was drawn according to the prediction results estimated from 80,000 iterations using MCMC. The red dot represents the prediction result of the general logistic model. According to the cross-validation results in Figure 2, only Set 4 and Set 8 are stable under the general logistic model, while the others are uncertain in the T-1 period. In the T-2 period, most of the logistic model predictions are more stable than in the T-1 period, but the uncertainty appears to increase again in the T-3 period. Overall, the figures show that the single point estimate of the logistic model fell at varying positions within the distribution of the 80,000 iterations estimated using MCMC, which indicates that the logistic model results were quite unstable. The MCMC, in contrast, was able to estimate the overall distribution and provided much richer information.

Comparison of the First-Order Model and the Interaction-Term Model
Moreover, Figures 5-7 illustrate the results of the first-order and the interaction-term models. Each box-and-whisker plot was also drawn based on the prediction results from 80,000 iterations estimated using the MCMC. The red dot represents the prediction result of the general logistic model. The figures again indicate that the logistic model's results were quite unstable and often produced over- or under-estimations. Furthermore, the predictive accuracy of the interaction-term model was generally higher than that of the first-order model.

The results of the comparison are shown in Table 5. The T-test confirmed that there was a significant difference between the two models, and the T-2 and T-3 phases showed a higher accuracy rate based on the overall average, as seen at the end of Table 5. Comparisons of the predictive results from the traditional logistic and MCMC models for the 109 fraudulent companies are shown in Table 6. A logistic prediction of "1" indicates that fraud had occurred, while "0" indicates no fraud.
Using the MCMC method, there were 80,000 iterations for each sample, and the ratios in the fields represent the proportions of the 80,000 iterations predicted to be fraud. According to Table 6, the MCMC clearly provided more information than the standard logistic model. For example, the 7th, 58th, 139th, 169th, and 322nd samples were predicted to be normal by the logistic model during the T-1 period; however, the MCMC predictions revealed fraud with probabilities over 76%, as highlighted in grey. Differences between the MCMC and the logistic model also occur in the T-2 period for the 64th, 238th, 250th, and 256th samples. During the T-3 period, eight samples were predicted as normal by the logistic model, but the MCMC indicated fraud, such as the 202nd sample with 82.9% or the 322nd sample with 88%.

Model Error Analysis
Concerning the limitations of the models, the errors in the prediction results can be divided into false negatives and false positives. The error analysis results for the interaction models are shown in Figures 8-13. The box plots, shown in black, correspond to the sets of errors from the 80,000 iterations via the MCMC method, while the red solid dots represent errors from the logistic model. In this study, we defined a false positive as a case in which a company was falsely accused of fraud, and a false negative as a case in which a guilty company was judged to be innocent of fraud.
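Under these definitions, the two error rates can be computed as follows (a sketch with our own function name; 1 denotes fraud, as elsewhere in the paper):

```python
import numpy as np

def error_rates(y_true, y_pred):
    """False-positive rate: share of no-fault companies flagged as fraud.
    False-negative rate: share of fraudulent companies judged innocent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fp_rate = np.mean(y_pred[y_true == 0] == 1)
    fn_rate = np.mean(y_pred[y_true == 1] == 0)
    return fp_rate, fn_rate
```

With the MCMC output, the same function can be applied to each of the 80,000 posterior predictions in turn, yielding a distribution of error rates rather than the single point the logistic model provides.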

Furthermore, Figures 8-10 show the false positive errors in the T-1 to T-3 periods. More than half of the 30 results (26 groups) obtained with the logistic model deviated from the overall distribution of the 80,000 iterations. Figures 11-13 show the false negatives in the T-1, T-2, and T-3 periods, where 28 groups of results from the logistic model deviated from the overall distribution.

The figures above clearly show that the results of the standard logistic prediction model are also unstable (i.e., prone to overestimation or underestimation) in the error analysis, meaning that if only the logistic model's error results were analyzed, the error rate would most likely be miscalculated. Both the standard logistic model and the Bayesian probit model have their strengths and weaknesses. For example, the former may be too simplistic to handle such complicated data, while the latter, despite its complexity, can correct parametric estimation errors and reduce the problem of over- or under-confidence. As always, the best analysis method depends on the problem to be solved.
Although the Bayesian probit model often yields clearer results, it is complicated and computationally expensive. We therefore recommend the integrated use of the two methods: the standard logistic model for a preliminary analysis of the sample, followed by the Bayesian probit model for more precise calculations. Because over-fitting interferes with the accuracy of the predictions yielded by the standard logistic model, that model alone would be less useful in real-world scenarios. The Bayesian probit model, by contrast, yields more accurate predictive results when the model is correctly fitted to the data, offering more realistic predictive power along with a visual component that helps users understand the distribution of prediction values. Above all, if the logistic model is used for prediction, a single result represents only one prediction point within the distribution space, whereas the MCMC approach uses multiple iterations to offset the uncertainty of the parameters (the dispersion of the predicted results).
Thus, the MCMC model may be more appropriate for helping researchers understand the complexity of corporate fraud.
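The recommended two-stage workflow can be sketched as follows. The logistic weights, posterior draws, and thresholds below are illustrative assumptions, not values estimated in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-stage screen, following the recommendation above:
# stage 1 uses a cheap logistic point estimate to shortlist suspicious
# firms; stage 2 runs the expensive MCMC only on that shortlist.

def logistic_score(X, w):
    """Stage 1: point-estimate fraud probability (cheap)."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def mcmc_fraud_ratio(X, beta_draws):
    """Stage 2: fraction of posterior draws flagging each firm as fraud."""
    return (X @ beta_draws.T > 0.0).mean(axis=1)

n, p = 200, 3
X = rng.normal(size=(n, p))
w_hat = np.array([0.8, -0.5, 0.3])           # assumed logistic estimate
shortlist = np.where(logistic_score(X, w_hat) > 0.3)[0]

# Assumed posterior draws centered near the logistic estimate.
beta_draws = w_hat + rng.normal(0.0, 0.2, size=(5_000, p))
ratios = mcmc_fraud_ratio(X[shortlist], beta_draws)
print(f"{len(shortlist)} firms shortlisted; "
      f"{int((ratios > 0.5).sum())} flagged by the posterior")
```

The design choice is purely economic: the posterior simulation, which dominates the cost, is spent only on firms the cheap screen could not clear.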

Theoretical Implications
In this study, we primarily employed the standard logistic model supplemented by Bayesian inference to counteract the uncertainty of the model parameters. This study may be the first to use box plots to visualize the effects of model uncertainty and to help users make decisions based on the simulation results for the model coefficients. The proposed method can also eliminate the bias of parametric estimation in regular statistical models. The standard logistic model readily revealed the analytical results, while the parameters of the Bayesian probit model, estimated via the MCMC, showed stable convergence. We also found that, unlike in the standard logistic model, the distribution of the unknown parameters cannot be expressed in closed form and must be approximated with simulated samples in order to interpret the exact distribution of the parameter values.
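Posterior simulation for the Bayesian probit model is commonly implemented with the Albert-Chib data-augmentation Gibbs sampler; the following is a minimal sketch of that standard sampler, not the study's exact implementation. The prior variance `tau2`, the iteration count, and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy.special import ndtr, ndtri  # standard normal CDF and its inverse

rng = np.random.default_rng(3)

def gibbs_probit(X, y, n_iter=2000, tau2=100.0):
    """Minimal Albert-Chib Gibbs sampler for a Bayesian probit model.

    Latent z_i ~ N(x_i'beta, 1) with y_i = 1{z_i > 0}; prior beta ~ N(0, tau2 I).
    The posterior of beta has no closed form, so we simulate from it.
    """
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # posterior covariance
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # Sample latent z ~ N(mu, 1) truncated to z > 0 (y=1) or z < 0 (y=0),
        # via the inverse-CDF method on the standard normal.
        u = rng.uniform(size=n)
        lo = np.where(y == 1, ndtr(-mu), 0.0)
        hi = np.where(y == 1, 1.0, ndtr(-mu))
        z = mu + ndtri(lo + u * (hi - lo))
        # Sample beta | z ~ N(V X'z, V).
        beta = V @ (X.T @ z) + L @ rng.normal(size=p)
        draws[t] = beta
    return draws

# Simulated check: the posterior mean should land near the true coefficients.
X = rng.normal(size=(300, 2))
beta_true = np.array([1.0, -1.0])
y = (X @ beta_true + rng.normal(size=300) > 0).astype(float)
draws = gibbs_probit(X, y)
print("posterior mean:", draws[500:].mean(axis=0))  # close to beta_true
```

The `draws` array is exactly the kind of simulated sample referred to above: summaries such as box plots of functions of `draws` express the parameter uncertainty that a single point estimate hides.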
Combining these two models to analyze the data yielded ideal predictive results. We found that the predictive power of the standard logistic model was stronger than that of the Bayesian probit model, which was more appropriate for approximating the maximum value. However, the predictive power of the standard logistic model is unstable because parametric estimation bias is inherent in the model. In this study, we used the MCMC method to calculate an unbiased estimate, enhancing the predictive power and ameliorating the effects of model uncertainty.

Implications for Managers
For the investigation of fraud, the predictive results from the standard logistic model tended to be overly optimistic. The Bayesian probit model, however, significantly drives up the cost of analysis; thus, although the full-range application of this model is ideal, it is not practical in the real world. For this reason, we suggest the integrated use of both models for the detection of fraud, in which the over-fitting of one model is balanced against the under-fitting of the other. After a preliminary sorting of the data, the Bayesian probit model can be used for more precise calculations, providing not only the predicted values of the responses but also the possible ranges of these responses via a simple plot. This helps users make informed decisions. In this way, the strengths of both models can be retained and utilized to their best advantage, and the resulting system is much more accurate than applying the logistic model on its own to predict corporate fraud.
In this study, both models were run independently. The findings from both models, using the same set of data, unanimously indicated that data from two years before the fraud occurred could most effectively predict this crime. As such, we can infer that indirect signs of fraud begin to surface two years before the fraud becomes obvious. Therefore, issues related to corporate fraud, particularly fraudulent financial statements, require not only impeccable professional ethics and patience but also a viable model that allows for systematic analysis and reduces false accusations of fraud. Accordingly, companies that have been wrongly accused could be freed from unnecessary legal trouble, and these resources could be used more efficiently elsewhere. Most importantly, such a model could accurately detect companies engaged in acts of fraud, helping to protect the rights and privileges of stakeholders and maintain stability within the market. Limitations of the proposed method remain, such as the cost of analysis due to the computationally expensive posterior distributions in the MCMC. In addition, the proposed model could be extended to the multinomial probit model in future studies, and further studies could explore other techniques to increase the efficiency of the MCMC algorithm, such as those in [56,57].