Predicting Audit Opinion in Consolidated Financial Statements with Artificial Neural Networks

Abstract: Models for predicting audit opinion analyze the variables that affect the probability of obtaining a qualified opinion. This helps auditors to plan review procedures and control their performance. Despite their apparent relevance, existing models have focused only on individual financial statements, and none have addressed consolidated financial statements. Consolidated information is essential for decision-making and for understanding the true financial situation of a company. Our objective is to provide a new audit opinion prediction model for consolidated financial statements. To this end, a sample of groups of Spanish companies was chosen and an artificial neural network technique, the multilayer perceptron, was used. The results show that the developed method predicted the audit opinion with accuracy above 86%. Moreover, there are important differences from individual accounts in the most significant variables for predicting the audit opinion: when consolidated financial statements are used, the variables referring to industry, group size, auditor, and board members become the main explanatory parameters of the prediction.


Introduction
Previous studies have developed models to predict audit opinion. Auditors can use these models to plan audit procedures, as a quality control tool in the review stage of an engagement, and to analyze the variables that affect the probability of obtaining a qualified opinion [1,2]. The existing audit opinion prediction models have been developed only in the context of individual financial statements, and none have addressed consolidated financial statements. However, the audit literature recognizes the greater relevance of consolidated information [3][4][5][6]. Consolidated information is essential to decision-making processes [7], to report the true financial situation of the economic entity [8], and to obtain the value of a company [9].
Given the importance that the previous literature places upon consolidated information, the present study aims to fill the gap in audit opinion prediction models, providing a new model that is capable of predicting the audit opinion in consolidated financial statements. To this end, a random sample of 298 groups of companies operating in the Spanish market was chosen. The sample was divided into unqualified opinion groups and qualified opinion groups, providing an initial set of explanatory variables of their consolidated financial statements. These variables include financial parameters [...] to predict the audit opinion of a sample of companies from the Istanbul Stock Exchange. The results show that retained earnings/total assets is the most predictive variable. The authors of [2] improved the fit of these models by introducing corporate governance variables. This study used a multilayer perceptron and a probabilistic neural network, obtaining a 98.10% accuracy for the testing sample. Finally, the authors of [15] carried out a study on the influence of financial variables on the audit opinion. The results showed a positive relationship between liquidity, loss in the current year, loss of the previous year, and qualified audit reports.

Characteristics of Consolidated Financial Statements
The purpose of consolidated financial statements is to present the economic and financial position of the group of companies owned by the parent company as that of a single entity. The consolidation process requires a series of procedures, such as the elimination of transactions between group companies, so that the assets, liabilities, income, and expenses reflected in the consolidated financial statements are only those arising with third parties. Individual financial statements, by contrast, are affected by transactions with other group companies, which in many cases alter the economic and financial situation intentionally. As a result, the true result obtained from third parties is not shown in the individual financial statements. In the European Union, the financial statements of companies are addressed in Regulation 1606/2002, which requires International Financial Reporting Standards (IFRS) for consolidated reports and introduces a member state option to apply IFRS to other entities and to separate financial statements.
Consolidated information is widely recognized as essential to support decision-making processes and to ensure public accountability [7,26]. According to the authors of [8], many economic and empirical studies of companies have been based on consolidated data. Two main reasons drive the adoption of consolidated figures. Firstly, consolidated reports are the true financial statements of the economic entity. Secondly, in most countries, particularly for non-listed companies, only the consolidated financial statements are available to the public; therefore, all research is done on consolidated data. Much has also been written about the quality of consolidated financial statements and how they differ from individual financial statements. Most of these studies recognize the greater relevance and predictive capacity of consolidated information [3][4][5][6]. According to some authors, individual financial statements are less relevant because companies prepare them with an eminently fiscal purpose [27][28][29][30][31][32]. Similarly, the authors of [9], analyzing a sample of German companies, found greater predictive capacity in the consolidated data for obtaining the value of the companies, and were unable to identify any relationship between the value of the companies and the accounting data of the parent companies analyzed. Finally, other empirical studies have shown the greater relevance and impact of consolidated information on the market value of shares [33][34][35][36][37].

Sample
To meet the empirical objective of this paper, a random sample of 298 groups of companies that operated during 2017 in the Spanish market was chosen. Groups in the financial sector were excluded from the selection because of the special accounting characteristics of financial institutions. The audit opinions and the groups' financial information were obtained from the Iberian Balance Sheet System database of Bureau Van Dijk, which contains corporate information on more than two million Spanish companies. Audit opinions are classified broadly into two groups. On the one hand are the reports with an unqualified opinion, which are those on financial statements that adequately comply with the accounting standards. On the other hand are the reports with a qualified opinion, which are those on financial statements that do not comply with the accounting standards.
From the sample, 87 financial statements received a qualified opinion and 211 an unqualified opinion. Table 1 provides a breakdown of the industry classification of the groups in the sample, according to CNAE-09 (Spanish National Classification of Economic Activities). The sector of activity with the greatest representation is architectural and engineering technical services (92 groups of companies), followed by manufacture of transport equipment (72 groups of companies) and telecommunications (47 groups of companies).

Table 1. Industry classification of the groups in the sample (CNAE-09).

Code   Industry                                      Groups
       Textile industry                              9
14     Clothes manufacturing                         10
16     Wood and cork industry                        1
17     Paper industry                                6
18     Graphic arts                                  3
2      Chemical industry                             2
3      Manufacture of transport equipment            72
4      Building construction                         6
5      Air and maritime transport                    4
6      Telecommunications                            47
7      Architecture and engineering activities       92
8      Security and research activities              13
9      Creative, arts and entertainment activities   9

Variables
The present study uses 35 independent variables obtained from previous studies on audit opinion prediction models [1,2]. These independent variables are presented in Table 2. They include financial parameters, non-financial parameters, and other qualitative parameters. The use of financial parameters related to profitability, liquidity, efficiency, solvency, productivity, and size is justified because previous studies indicate that a high level of financial risk and larger company size are related to a greater probability of receiving a qualified audit opinion [1,37]. The previous literature also indicates that non-financial parameters related to the number of board members helped to improve the accuracy of audit prediction models [2,38,39]. Likewise, other qualitative parameters have been significant in previous audit opinion prediction models. For example, the auditor code and audit fees indicate quality differences between audit firms. The authors of [11] hypothesize that larger audit firms provide higher quality audit reports. Finally, the industry variable denotes the industry classification of the companies. The activity of the companies has been significant in previous studies on the prediction of audit opinions [1]. The dependent variable is a dummy variable that takes a value of 1 if the opinion was qualified and 0 otherwise.

Table 2. Independent variables.

Variable       Definition
Auditor code   1 = "Big 4"; 0 = Others
Audit fees     Log (audit fees, 10^3 €)
Industry       Industry classification (CNAE-09)

1: Earnings before interest, taxes, depreciation and amortization.

Model Research Design
This section presents the details of the NN model used in the present study. As noted, this is a binary classification problem that uses a set of explanatory variables as inputs. The NN will try to classify the companies in the sample set as qualified or unqualified. Since the set of patterns contains the target, it is best to use a supervised training model so that it learns from the available patterns. We have tested several different NN models, among which the multilayer perceptron (MLP), the radial basis function network (RBF), and deep networks with convolutional layers (CNN) stand out. The use of CNN does not significantly improve the results for this problem and has the disadvantage of being a more complex model than MLP, with a greater resource requirement for training. MLP has also obtained better results than RBF, so MLP was finally selected to build the model in the present work. The authors of [40] and [41] confirmed that learning in MLP constitutes a special case of function approximation, where there is no assumption about the underlying model of the analyzed data.
As is well known, one of the drawbacks of using NN to solve classification problems is the need to fix in advance a set of values for the so-called model hyperparameters. In the case of MLP, the hyperparameters that must be predefined before training are the training algorithm, the initial learning rate, the number of hidden layers, the number of hidden neurons in each hidden layer, the activation functions of each layer, and the number of training epochs. In addition, to improve the generalization capacity of the network, some regularization method must be implemented to avoid overtraining, which in turn also has hyperparameters. To this end, we have used two methods: firstly, the so-called dropout [42] and, secondly, training through k-fold cross-validation [43].
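The dropout regularization mentioned above can be illustrated with a minimal sketch. It assumes the "inverted" variant, in which surviving activations are rescaled during training so that nothing changes at evaluation time; the text does not state which variant was used, and the function name `dropout` and the example activations are hypothetical.

```python
import random


def dropout(activations, p, training, rng=random):
    """Inverted dropout (assumed variant): during training, zero each
    activation with probability p and rescale the survivors by 1 / (1 - p).
    At evaluation time the activations pass through unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical hidden-layer activations, for illustration only.
hidden = [0.2, -0.7, 0.5, 0.9]
rng = random.Random(0)
dropped = dropout(hidden, p=0.5, training=True, rng=rng)
unchanged = dropout(hidden, p=0.5, training=False)
```

Because the rescaling happens during training, the same network can be evaluated on validation and testing patterns without any further adjustment.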
As a preliminary step, and because the independent variables have very different magnitudes, their values have been standardized using Equation (1) for the set of training patterns and Equation (2) for the testing set:

$$\mathrm{Train\_std}_{j,i} = \frac{\mathrm{Train}_{j,i} - \mu(\mathrm{Train})_j}{\sigma(\mathrm{Train})_j} \tag{1}$$

$$\mathrm{Test\_std}_{j,i} = \frac{\mathrm{Test}_{j,i} - \mu(\mathrm{Train})_j}{\sigma(\mathrm{Train})_j} \tag{2}$$

In these equations, Train_std j,i is the standardized feature j of training pattern i, Test_std j,i is the standardized feature j of testing pattern i, Train j,i and Test j,i are the corresponding raw values, µ(Train) j is the mean value of feature j over the whole training set, and σ(Train) j is its standard deviation. Note that the standardized test patterns do not use the mean or standard deviation of the testing set, only information from the training patterns. This keeps the evaluation of the results on the testing set unbiased.
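The standardization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the use of the population standard deviation are assumptions, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def fit_standardizer(train):
    """Compute per-feature mean and standard deviation on the training set only."""
    columns = list(zip(*train))
    mu = [mean(col) for col in columns]
    # Population standard deviation is an assumption; a zero deviation is
    # replaced by 1.0 so that constant features do not divide by zero.
    sigma = [pstdev(col) or 1.0 for col in columns]
    return mu, sigma


def standardize(patterns, mu, sigma):
    """Apply (x - mu) / sigma feature by feature, with training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in patterns]


# Toy data: two features with very different magnitudes.
train = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
test = [[2.0, 800.0]]

mu, sigma = fit_standardizer(train)
train_std = standardize(train, mu, sigma)
test_std = standardize(test, mu, sigma)  # reuses the training mean and deviation
```

The testing patterns never contribute to `mu` or `sigma`, which is the property that keeps the evaluation unbiased.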
To find appropriate values for the hyperparameters of the model, a thorough sweep was performed using the following configurations. First, with respect to training algorithms, different variants of the gradient descent optimization algorithm have been used, specifically Adagrad [44][45][46]; RMSprop [47]; Adadelta [48]; adaptive moment estimation (Adam) [49]; AdaMax [50]; Nesterov-accelerated adaptive moment estimation (Nadam) [51]; and AMSGrad [52]. In addition, 1, 2, or 3 hidden layers have been used, varying the number of neurons in each layer from 1 to 200, in steps of 5. As the probability of elimination in the dropout method, the values 0, 0.1, 0.2, 0.3, 0.5, and 0.6 have been used. Hyperbolic tangent, logistic, and ReLU activation functions have been tested. The best models were obtained with the adaptive moment estimation training algorithm, 60 neurons in a single hidden layer, a dropout probability of 0.5, the hyperbolic tangent activation function for the hidden neurons, and the softmax activation function for the output layer neurons.
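The sweep described above can be enumerated with a short sketch. Two points are assumptions rather than statements from the text: the neuron grid is taken as 1, 5, 10, ..., 200 (so that the reported 60-neuron optimum is on the grid), and all hidden layers of a configuration share the same width. The dictionary keys are hypothetical names.

```python
from itertools import product

OPTIMIZERS = ["Adagrad", "RMSprop", "Adadelta", "Adam", "AdaMax", "Nadam", "AMSGrad"]
HIDDEN_LAYERS = [1, 2, 3]
NEURONS = [1] + list(range(5, 201, 5))  # assumed grid: 1, 5, 10, ..., 200
DROPOUT = [0.0, 0.1, 0.2, 0.3, 0.5, 0.6]
ACTIVATIONS = ["tanh", "logistic", "relu"]


def configurations():
    """Yield every combination of the swept hyperparameters.

    Assumes the same number of neurons in every hidden layer."""
    for opt, layers, width, p, act in product(
        OPTIMIZERS, HIDDEN_LAYERS, NEURONS, DROPOUT, ACTIVATIONS
    ):
        yield {
            "optimizer": opt,
            "hidden_layers": layers,
            "neurons_per_layer": width,
            "dropout": p,
            "activation": act,
        }


grid = list(configurations())

# Configuration reported as best in the text.
best = {
    "optimizer": "Adam",
    "hidden_layers": 1,
    "neurons_per_layer": 60,
    "dropout": 0.5,
    "activation": "tanh",
}
```

Under these assumptions the grid holds 7 × 3 × 41 × 6 × 3 configurations, each of which would then be trained and scored by cross-validation.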
In turn, to implement the k-fold cross-validation, the following process has been repeated N times: (a) split the set of input patterns into k random groups, and (b) for each i = 1, ..., k, use pattern set i for testing and the rest (k−1 sets) for training. In addition, 20% of the training pattern set has been chosen as the validation set.
The repetition of the procedure N times minimizes the possible dependence of the results concerning the initial random division of the patterns since each repetition begins with the random selection of the set of patterns. In our case, we have found that k = 5 is a good balance between the diversity of configurations and size of pattern sets. The methodological process used in the present study ends with the evaluation of the results obtained for training, validation, and testing sets. In this evaluation, the percentage of correctly classified patterns (accuracy) and the mean square error have been used.
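The repeated k-fold procedure can be sketched as an index generator. This is an illustration under stated assumptions: the function name `repeated_kfold`, the seed, and the way the 20% validation subset is drawn (the first part of the shuffled training indices) are hypothetical choices, not details given in the text.

```python
import random


def repeated_kfold(n_patterns, k=5, repeats=100, val_fraction=0.2, seed=0):
    """Yield (repetition, fold, train, validation, test) index lists.

    Each repetition starts from a fresh random shuffle of the patterns."""
    rng = random.Random(seed)
    for rep in range(repeats):
        indices = list(range(n_patterns))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
        for i in range(k):
            test = folds[i]
            rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            n_val = round(val_fraction * len(rest))  # 20% of the training patterns
            yield rep, i, rest[n_val:], rest[:n_val], test


# 298 patterns as in the sample; only 2 repetitions here to keep the example small.
splits = list(repeated_kfold(298, k=5, repeats=2))
```

Within one repetition every pattern is used for testing exactly once, so the averaged testing accuracy covers the whole sample.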

Exploratory Analysis
In the exploratory analysis, a descriptive study was carried out to determine the main statistical parameters of the independent variables (mean and standard deviation). This was intended to indicate which variables could be relevant in the prediction model to be developed. This analysis will be verified later with a specific confirmatory analysis using the MLP model. Likewise, these results will be contrasted with the models estimated by other authors in previous studies. Table 3 provides information on the main statistics of the sample. The groups with qualified opinions, in comparison to those with unqualified opinions, appear to be smaller and in a worse position in terms of financial variables. These results are similar to those obtained by the authors of [1,2,10,18] for individual financial statements. As argued by the authors of [17], the possibility of obtaining qualifications in the reports increases when companies have worse financial health. In this sense, some studies also detected that companies with qualified audit opinions suffered from lower levels of profitability, liquidity, solvency, and productivity [2,18,53,54,55].

Model Evaluation
To evaluate our system, and considering that the number of patterns is not very high, we have implemented a five-fold cross-validation procedure repeated 100 times. For each of the hundred tests, the patterns for training, validation, and testing are chosen randomly, and the model is trained and evaluated. The results of the MLP model developed are shown in Figure 1, which represents the average accuracy for each of the 100 tests performed. Each test contains three results (for the training, validation, and testing sets). The continuous blue graph shows the average percentage of correctly classified patterns in the five-fold cross-validation in each of the 100 tests performed. The dashed blue line shows the average value. In the same way, the green curve shows the results for the set of validation patterns and the red one for the set of training patterns. Figure 2 shows the standard deviations of the results shown in Figure 1, indicating that, in all tests performed, the standard deviation is very limited (less than 9%).
Table 4 shows a summary of the results for training, validation, and testing. The most widely used metrics for machine learning classification problems have been calculated. In particular, accuracy, F-measure, precision, sensitivity, and specificity are reported. The results obtained indicate the great predictive capacity of MLP, with accuracy very similar to that provided by similar methodologies in the previous literature on audit opinion prediction models from individual financial statements. For example, in [2], a successful classification of 74% is reported; in [18], 82.22% is reached; in [43], 73% is reached; and in [15], 80% is reached.
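The classification metrics named above follow from the confusion matrix. The sketch below uses the standard definitions, with the qualified opinion (coded 1) as the positive class; the function name and the toy labels are hypothetical and are not the values of Table 4.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity, specificity, and F-measure
    computed from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall on qualified opinions
    specificity = ratio(tn, tn + fp)  # recall on unqualified opinions
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels: 1 = qualified opinion, 0 = unqualified opinion.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

With an unbalanced sample such as 87 qualified against 211 unqualified opinions, sensitivity and specificity are worth reading alongside accuracy, since accuracy alone can hide poor performance on the minority class.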

Sensitivity Analysis
Sensitivity analysis is the technique most used to interpret the weights and parameters of a NN model [56,57]. The objective is to determine to what extent variations in the input values or parameters influence the output of the model; analyzing these variations makes it possible to determine the importance of each variable, since each one has a proportional representation in the model. To carry out this sensitivity analysis, a model-agnostic interpretability approach for machine learning [58] has been implemented. To do this, the accuracy of the MLP has been evaluated after setting the value of each independent variable to 0 in the training patterns. Subsequently, the MLP has been reassessed after setting the value of those variables to 0 in the testing patterns.
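A minimal sketch of this zeroing procedure follows. It assumes that each variable is zeroed one at a time and that impact is measured as the drop in accuracy relative to the unmodified patterns; the text does not state how the impact level is computed. `toy_predict` is a hypothetical stand-in for the trained MLP.

```python
def accuracy(predict, X, y):
    """Fraction of patterns whose predicted class matches the target."""
    return sum(1 for row, target in zip(X, y) if predict(row) == target) / len(y)


def zeroing_sensitivity(predict, X, y):
    """Set one feature at a time to 0 and record the resulting accuracy drop.

    On standardized inputs, 0 corresponds to the training mean of a feature."""
    baseline = accuracy(predict, X, y)
    drops = []
    for j in range(len(X[0])):
        X_zeroed = [row[:j] + [0.0] + row[j + 1:] for row in X]
        drops.append(baseline - accuracy(predict, X_zeroed, y))
    return baseline, drops


def toy_predict(row):
    """Hypothetical classifier that only looks at the first feature."""
    return 1 if row[0] > 0 else 0


X = [[1.2, -0.3], [-0.8, 0.9], [0.5, 0.1], [-1.1, -0.4]]
y = [1, 0, 1, 0]
baseline, drops = zeroing_sensitivity(toy_predict, X, y)
```

In this toy case the classifier depends only on the first feature, so zeroing it lowers accuracy while zeroing the second feature changes nothing.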
The sensitivity results obtained appear in Figure 3. The variables of greatest impact are industry, group size, auditor code, audit fees, and board members, all of which have an impact level above 83%. These more sensitive variables therefore become the main explanatory parameters for the prediction of audit opinion in consolidated financial statements. In contrast, the models developed for individual financial statements require other types of explanatory variables, such as financial variables, to reach an acceptable predictive power [2,15,25]. Possibly, the aggregation of elements of the heterogeneous companies that make up the consolidation perimeter, the homogenization processes, and the accounting adjustments carried out as a consequence of the consolidation mechanics have reduced the significance of these variables in the consolidated model. Furthermore, the variance inflation factor (VIF) and tolerance value have been used to test multicollinearity and to support the model's results (Table 5). In cases where the VIF value is under 10 and the tolerance value is not very close to 0, the model is considered to be free from a multicollinearity problem [59]. All variables have good VIF and tolerance values. Therefore, there are no multicollinearity problems or autocorrelation in the model, and this shows the soundness and reliability of the model.
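The VIF and tolerance checks can be illustrated with a short sketch. It uses the standard definitions, tolerance = 1 − R² and VIF = 1 / tolerance, where R² comes from regressing one predictor on the others; the two-predictor toy data (for which R² equals the squared Pearson correlation) and the function names are hypothetical and unrelated to Table 5.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def vif_and_tolerance(r_squared):
    """Return (VIF, tolerance) for a predictor whose regression on the
    remaining predictors has the given R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance


# With exactly two predictors, R^2 is the squared correlation between them.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif, tolerance = vif_and_tolerance(pearson_r(x1, x2) ** 2)
no_multicollinearity = vif < 10  # rule of thumb quoted in the text
```

With more than two predictors, R² would come from a multiple regression of each variable on all the others, which is what statistical packages compute.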

Conclusions
This study fills the existing gap in the literature on audit opinion prediction models by developing a new model specifically for consolidated financial statements. The results show that it is possible to predict the audit opinion for consolidated financial statements and to identify the differences and similarities between the individual and consolidated prediction models.
The results obtained with MLP reached an accuracy of around 82.50% with the testing sample, indicating that the developed model has managed to predict the audit opinion with consolidated financial statements, as occurred in previous studies for individual financial statements. Furthermore, important differences from individual financial statements have been detected in the most significant variables for predicting the audit opinion. The variables referring to industry, group size, auditor code, audit fees, and board members became the main explanatory parameters of the prediction of audit opinion for consolidated financial statements. Economic-financial variables related to liquidity, solvency, profitability, or productivity lost predictive capacity compared to other explanatory and corporate governance variables. In contrast, the models developed for individual financial statements require other types of explanatory variables, such as financial variables and other qualitative variables, to reach an acceptable predictive power. Possibly, the singularity of consolidated financial statements has conditioned the results of this study. In this way, a collection of variables of a very diverse nature has proved relevant for the prediction of audit opinions of consolidated financial statements. Therefore, the greater disintegration of heterogeneous activities and their subsequent aggregation as a result of the diversification observed in the consolidation perimeters, as well as the proliferation and constant regulatory adaptation at the individual and consolidated levels, may have been the factors behind the characteristics observed in the set of variables that were significant for the consolidated model.
In addition, the results have important implications for the analysis of consolidated information. Approximately 20-30 years ago, business groups were practically an extension of a parent company. There was no group diversification, and real financial differences between the individual companies and the consolidated group were not detected. Consolidated financial statements were simply an aggregation of equal activities under the assumption of a similar level of risk. Over time, groups of companies have diversified their activities and assumed different risk levels according to the activity concerned. A group of companies is no longer an extension of the parent company. Consolidated statements are, at present, the union of a set of activities of a different nature and with different financing and investment structures, and they provide more relevant and accurate information than individual statements. Hence, the financial statements of groups of companies and those of companies considered individually are, in principle, completely different statements. For these reasons, the model developed in this study confirms the singularity of the predictor variables of audit opinion for consolidated groups, as opposed to those observed at the individual level.
The results obtained point to an eminently practical application of these tools. Account auditors, financial institutions, credit rating agencies, and regulators of audit activity, among others, could apply these methodologies in their work systems to increase profitability, productivity, and efficiency. Knowing the type of audit opinion a priori would allow auditors to plan their work in a more orderly, systematic, and accurate way. If the models predict an audit report with a qualified opinion, work programs could be designed with a greater number of tests and larger samples to be analyzed. These models would help to complete the risk analysis in the planning phase. In addition, they would serve as a decision tool when deciding whether or not to accept certain assignments, a matter of considerable controversy in the auditing profession.
Financial institutions would gain new tools to anticipate the audit opinion when they analyze unaudited financial statements, or to know the auditor's opinion in advance when the audit of the financial statements is not yet finalized. This is very common for small and medium-sized companies, which, according to the legal provisions of each country, are not usually subject to a mandatory audit of accounts. Likewise, granting loans and lines of financing to companies requires the analysis of a great deal of financial information, such as the examination of the financial statements and the analysis and interpretation of business plans. With the use of these methodologies, analysis times would be reduced and decision-making would be streamlined. The audit opinion is a measure of whether the financial statements give a true and fair view, and knowing it in advance would improve the productivity of the work teams and the business profitability of this type of institution. The result would be a reduction of errors when granting financing, as well as faster achievement of these entities' objectives.
Credit rating agencies would be another sector that would benefit from the use of these methodologies. These agencies handle a great deal of information to develop their ratings. Knowing the opinion predicted by these computational tools could enrich and complement the conclusions obtained from the analysis of the documents. In addition, the regulatory bodies that monitor and supervise the auditing sector would have a methodology to determine whether the opinion of an auditor in their report coincides with the opinion predicted by these statistical tools. This would have a great impact on the oversight work of these organizations, since they could detect afterward whether an auditor has distorted their professional opinion when compared with the opinion provided by these models. In this way, more indications of possible incidents or errors in the work of the auditor could be detected for analysis, inspection, and, where appropriate, the opening of a sanctioning procedure.
On the other hand, our models showed high prediction accuracy, similar to the results obtained for predicting the audit opinion in individual financial statements. The auditor's professional judgment must assess whether this accuracy is sufficient to use these computational methodologies. The inherent risk of the sector, the volatility in the market, or the risks detected in the internal control analysis of the company could be the most important factors that influence whether the auditor accepts the accuracy that these models give. In many cases, the auditor may consider it acceptable based on the circumstances outlined above.
Finally, the results suggest future lines of investigation. Firstly, it would be very enlightening to know how financial cycles affect audit opinion prediction in consolidated financial statements. Working with time series and differentiating which variables are predictive according to the economic cycle would have important practical applications. Secondly, looking deeper into the professional field of account auditing, it is also important to explore the predictive effect of the financial variables on each type of opinion and/or qualification, since the dependent variable used here is dichotomous (qualified or unqualified opinion). Achieving these results would add value to the financial statement audit sector: one could not only predict whether the opinion is favorable or unqualified but also determine which kind of qualification would be reached and which type of opinion the auditor would issue. Lastly, it would be interesting to verify the model in other international legislative environments, which would show the effect that local economies may have on audit opinion predictions. In this context, a global database may offer models from which to make international comparisons. The conclusions of these global models may serve as powerful tools for decision-making processes such as adopting international legislative improvements.