Managerial Self-Attribution Bias and Banks' Future Performance: Evidence from Emerging Economies

The objective of the study was to predict the future performance of banks based on the contextual information provided in annual reports. The European Central Bank has observed that performance prediction models in earlier studies rely mainly on quantitative financial data, which are insufficient for a comprehensive assessment of banks' performance. Qualitative information needs to be incorporated along with numerical data for better prediction. In this context, this study employed attribution theory to understand the contextual information on management's behavioral biases towards expected outcomes. The sample consisted of 58 banks from 16 emerging economies, and the period covered 2007–2015. Unsupervised hierarchical clustering was performed to identify the latent groups of banks within the data. For performance prediction, system GMM was employed, because it helps to deal with endogeneity and heterogeneity problems. The results of the study were consistent with attribution theory: management took credit for favorable expected outcomes and distanced itself from bad outcomes. An important policy implication of the study is that the prevalence of managerial self-attribution bias in annual reports provides an additional source of information for regulators to identify banks at risk and take preventive measures to avoid the expected cost of failure. It can also help investors and give analysts a better tool for a comprehensive analysis of the profitability of prospective investments.


Introduction
Banks are important financial intermediaries within the financial system and play a significant role in the economic growth of a country by mobilizing the savings of depositors and making them available to borrowers for productive ventures (Jasevičienė et al. 2013). Bank failures lead to contagion effects, as witnessed in the global banking crisis of 2007-2008, which affected many sectors of the world's major economies (Edey 2009; Laeven and Valencia 2010; Fethi and Pasiouras 2010; Jasevičienė et al. 2013). Therefore, prediction of bank performance is important, because bank failures create vulnerabilities in the financial system (De Haan and Vlahu 2016). The Federal Deposit Insurance Corporation of the US reported that almost 400 commercial banks went bankrupt in the US from 2008 to 2012. Furthermore, 10 large banking groups survived with the help of government bailout packages. The total estimated cost of the financial crisis to the US economy ranged between $10 trillion and $20 trillion (Seamans 2013).
In this context, a large body of literature has emerged that investigates banks' future performance with methods ranging from traditional econometric models to advanced machine learning (see Horváthová and Mokrišová (2018) for more details). However, these models were built exclusively on quantitative financial data (Gaganis et al. 2006). In addition, the focus of the earlier studies was to classify banks into two groups, 'failed' and 'non-failed' (Lin et al. 2009). The 'immeasurably important' place of banks at the center of economic activity in the emerging economies underscores the merit of studies that exclusively investigate banks' future financial performance. To the best of my knowledge, no study has used the self-attribution bias of management to predict the future performance of banks in the emerging economies. This study attempts to fill this gap.
This study poses the following research question: does the self-attribution bias of management in annual reports offer incremental predictive power over and above models based on traditional quantitative financial data alone?
The study contributes in two important ways. First, it contributes toward a comprehensive performance prediction model for banks, in which contextual information captured by the self-attribution bias of management is combined with traditional numerical data to predict banks' future performance. This could give potential, as well as existing, investors more comprehensive information about future profitability. Second, bank managers have the advantage of superior information and are aware that regulators mostly focus on a few financial ratios, which can be distorted (Gandhi et al. 2019). This study helps to reduce the information asymmetry between management and shareholders about expected future performance by extracting signals from management's self-attribution in the annual reports of banks (principal-agent model).
The remainder of the paper is organized as follows. The next section sheds light on the existing literature, taking up each issue within the scope of attribution theory. Section 3 presents the research methodology: the sample selection, the measurement of managerial attribution bias, and the econometric models. Section 4 explores the data with the help of descriptive statistics, a correlation matrix, and a scatter plot matrix. Section 5 presents the exploratory data analysis, using agglomerative hierarchical cluster analysis to find the latent groups within the data. Section 6 provides the estimates of the model and discusses the results. Section 7 concludes.
Canbas et al. (2005) explained that it is important to predict the future performance of banks, so that regulators can take timely action to mitigate the disastrous outcomes of a banking crisis. In this context, a large body of literature emerged that investigated the future performance of banks. Methods developed earlier for manufacturing firms were also adopted for financial firms with some modifications. The methods included the analysis of financial ratios, discriminant analysis (Beaver 1966; Altman 1968), logistic models, data envelopment analysis or DEA (Fethi and Pasiouras 2010), and machine learning algorithms, particularly neural networks (Ravi Kumar and Ravi 2007). These methods have provided the foundation for researchers to predict the future financial performance of banks for more than two decades (Board et al. 2003). In addition to the earlier models, stress testing has become more widespread since the financial crisis of 2007-2008. A stress test is performed under hypothetical adverse economic conditions to check whether a bank has the capital required to bear their impact (Petrella and Resti 2013). However, this test is performed by management and is based on hypothetical situations, which managers may strategically manage to show a better financial condition of the bank (principal-agent problem).
Gaganis et al. (2006) criticized the earlier models on the basis that they relied exclusively on quantitative financial data. The European Central Bank (2010) also observed that researchers were developing models based only on financial data for predicting banks' performance. However, there is a large quantity of unstructured qualitative information that should also be used in conjunction with numerical data for a comprehensive performance prediction model. In this perspective, Smith and Taffler (2000) argued that the qualitative sections of the annual report provide "nearly twice the quantity of information as do the basic financial statements". Moreover, qualitative information could also help in understanding the financial statements, and it signals future financial performance (Li 2008, 2010b). Craig et al. (2013) likewise suggested that textual data should be considered in examining whether firms are healthy or at risk.

Literature Review
Earlier studies on non-financial firms have provided empirical evidence that future performance can be predicted using qualitative information in corporate documents (Li 2008, 2010a; Craig et al. 2013). Thus, qualitative data seem to be a valuable additional information source to supplement the financial data available in corporate financial statements. Research in the manufacturing sector has already begun to explore the narratives in textual disclosures in many ways, including complexity of text, sentiment analysis, and self-attribution bias. In this study, the self-attribution bias of management is used to predict the future financial performance of banks.
Managerial attribution bias can be understood in the context of attribution theory (Bloomfield 2008; Hooghiemstra 2001; Clatworthy and Jones 2003). Attribution theory was first proposed by Heider in 1958, but Weiner and colleagues developed a theoretical framework that has become a major research paradigm in social psychology (Sparck Jones 1972; Li 2008; Baginski et al. 2000; Clatworthy and Jones 2003; Merkl-davies et al. 2014; Leary and Kowalski 1990; Moosa and Ramiah 2017). The theory explains that achievers believe success is attained through their own abilities and efforts (internal attribution), whereas failure is ascribed to bad luck and is not their fault (external attribution). Therefore, Bloomfield (2008) claims that attribution theory in psychology predicts that the use of internal or external attribution in textual disclosures is an important indication of a firm's future performance.
The literature shows that management's attributions provide involuntary signals about the expected outcomes of firms in the textual sections of annual reports, especially in the CEO's letter to shareholders and in the management discussion and analysis (Aerts 1994; Clatworthy and Jones 2003; Merkl-davies et al. 2014). Amernic and Craig (2006) showed that when the outcomes of firms were good, the CEO took credit and attributed them to internal factors (ability, skills, vision, and foresight). In contrast, when the outcomes were unfavorable, CEOs made external attributions (bad economic and market conditions) in an attempt to distance themselves from bad performance, so that they would not be held personally accountable.
The accounting literature has widely analyzed management's self-attribution bias from different perspectives. For instance, Craig et al. (2013) analyzed the bankrupt Indian firm 'Satyam' and found a shift from first-person singular pronouns to first-person plural pronouns in the CEO's letter to shareholders. The letters also showed blame-shifting signals about the bad outcomes of the firm. Similarly, Clatworthy and Jones (2003) examined the chairman's letter to shareholders of the 50 top and bottom UK companies, and found that management took credit for good news and blamed the external environment for bad outcomes. Likewise, Li (2010a) analyzed management's tendency towards self-serving attribution in the management discussion and analysis (MD&A) with the help of a computational linguistic technique, and found that management's use of attributional words was positively related to firms' future performance. More recently, Lehmberg and Tangpong (2018) examined how top management communicates bad or good firm performance to stakeholders. The authors found a significant and positive relationship between subsequent good performance and internal attribution, while bad performance was attributed to external factors.
Researchers have also studied self-attribution bias in contexts other than firm performance. For instance, Adam et al. (2015) analyzed whether managers' past speculative forecasts led to increased investment, using data on 92 North American gold mining firms over the period 1989 to 1999. The results of the study demonstrated that bad forecasts did not reduce investment, because managers attributed successful outcomes to their abilities, while losses were attributed to bad luck or some other external factor.
From the investors' perspective, Mushinada and Veluri (2018) analyzed investors' investment decisions in terms of behavioral biases. The authors used 1290 stocks traded on the Bombay Stock Exchange of India during 2004 to 2012 to examine whether self-attribution bias existed among investors regarding their earnings forecasts. The results of the study revealed that when the forecast was accurate, the investors took credit, while a wrong forecast was attributed to external factors, especially excessive volatility in the stocks. Likewise, Chen et al. (2016) investigated the relationship between managers' attributions for their earnings forecasts and investors' use of those attributions. The study found that investors followed the internal attribution of management in their investments. In another study, Asay et al. (2018) investigated attribution through the CEO's use of personal pronouns in predicting the winning of lawsuits or good/bad news from the CEO. The results indicated that participants' assessed likelihood of winning lawsuits or of good performance was associated with the use of more personal pronouns.

Data
The author selected all the emerging economies reported in the IMF emerging economies list. Specialized banks were excluded to keep the sample homogeneous. Annual reports were downloaded from the websites of banks, and the financial data were obtained from Bureau van Dijk (BvD). Another condition for sample selection was the availability of the CEO's letter to shareholders and the management discussion and analysis in annual reports. Moreover, banks that published annual reports in their national language were excluded from the sample, because this could create reliability problems in the construction of the self-attribution bias indices.
After accounting for all changes, the data consisted of 58 banks from 16 emerging economies, and the period covered 2007 to 2015. The list of banks is provided in Appendix A.

Bank Performance Indicator
Return on Average Equity (ROAE): Return on average equity was taken as the performance indicator of banks. It was calculated as net income divided by average total equity. ROAE is an internal performance measure of shareholder value and has been widely used for performance prediction of banks (Aerts 2001; Petria et al. 2015; Beccalli et al. 2015; Yao et al. 2018; Akhisar et al. 2015). It is a fundamental ratio that tells investors how effectively management uses their money. It offers a direct assessment of the financial return on shareholders' investment. This ratio shows whether management is growing the bank's value at an acceptable rate.
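The ROAE definition above can be illustrated with a short Python sketch. The source does not state how average total equity was computed; the sketch assumes the common convention of averaging opening and closing equity, and the function name and input figures are hypothetical.

```python
def return_on_average_equity(net_income, equity_begin, equity_end):
    """Return ROAE in percent: net income divided by average total equity.

    Assumption (not stated in the paper): average equity is the simple
    mean of the opening and closing total equity for the year.
    """
    average_equity = (equity_begin + equity_end) / 2.0
    if average_equity == 0:
        raise ValueError("average equity is zero; ROAE is undefined")
    return 100.0 * net_income / average_equity


# Hypothetical bank-year, figures in millions of US dollars.
roae = return_on_average_equity(net_income=160.0,
                                equity_begin=900.0,
                                equity_end=1100.0)
print(f"ROAE = {roae:.1f}%")
```

A negative net income yields a negative ROAE, which is how loss-making bank-years appear in the sample.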

Performance Determinants
Self-Serving Attribution Bias (CEO letter to shareholders): In the corporate context, greater use of first and second-person pronouns, compared to third-person pronouns, is an indication of taking credit for good outcomes. Researchers have widely analyzed corporate narratives to explore the relationship between the CEO's self-attribution bias in letters to shareholders and the firm's future financial performance (Clatworthy and Jones 2003; Amernic and Craig 2006; Craig et al. 2013; Lehmberg and Tangpong 2018). In this study, a positive relationship is expected between the self-attribution bias of the CEO in the letter to shareholders and the bank's future performance.
Self-Serving Attribution Bias (Management Discussions and Analysis): This section of the annual report explains the firm's overall performance, the challenges faced by management, the internal and external risks involved in operations, and indications about future prospects. The self-attribution bias of management in the management discussions and analysis was calculated with the same methodology as described for the CEO letter to shareholders. Earlier studies have found a positive relation between the self-attribution of management in management discussions and analysis and firms' future financial performance (Li 2010a; Lehmberg and Tangpong 2018; Aerts 2001, 2005). A similar positive relationship is expected for banks' financial performance.
Total Assets: In this study, total assets represent the size of the bank, with absolute values in millions of US dollars. There was huge variation in the assets of banks in the emerging economies. Thus, the logarithm of total assets (Log_Assets) was employed as a proxy for bank size. Larger assets could provide higher profit up to a certain level; thereafter, profitability could be lower than that of small banks. Thus, the relationship between total assets and bank performance could be negative, because profit does not increase in proportion to assets (Trujillo-Ponce 2013; Panta 2018; Terraza 2015; Shehzad et al. 2013). Some other studies have shown an insignificant relationship between total assets and firm performance (Athanasoglou et al. 2008; Petria et al. 2015).
Assets Growth Ratio: The assets growth ratio was calculated as current-year assets minus last-year assets, divided by last-year assets. It indicates the percentage increase or decrease over the prior year. The increase in a bank's profitability depends upon the increase in the quality of its assets. Higher quality assets would increase the profitability of banks. Earlier studies suggested that as assets increased, the profitability of banks also increased (Mathuva 2009; Ahamed 2017; Bougatef 2017; Yao et al. 2018). Thus, the relationship between assets growth and banks' future performance is expected to be positive.
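As a worked illustration of the assets growth ratio defined above, the following minimal Python sketch applies the formula to two hypothetical bank-years. The function name and the figures are illustrative only and do not come from the sample.

```python
def assets_growth_ratio(assets_current, assets_previous):
    """(current-year assets - last-year assets) / last-year assets."""
    if assets_previous <= 0:
        raise ValueError("last-year assets must be positive")
    return (assets_current - assets_previous) / assets_previous


# Hypothetical total assets in millions of US dollars.
growth_up = assets_growth_ratio(assets_current=125.0, assets_previous=100.0)
growth_down = assets_growth_ratio(assets_current=78.0, assets_previous=100.0)
print(f"growth: {growth_up:.0%}, contraction: {growth_down:.0%}")
```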
Non-Performing Loans to Gross Loans Ratio: The ratio of non-performing loans to gross loans was calculated as total non-performing loans divided by the gross loans of the bank. Loans are classified as non-performing when the borrower defaults or declares bankruptcy. The ratio measures the effectiveness of a bank in receiving repayments on its loans. The higher the ratio, the lower the profitability of the bank (Trujillo-Ponce 2013; Panta 2018; Petria et al. 2015).
Tier 1 Capital Ratio: The tier 1 capital ratio was calculated as total tier 1 capital divided by total risk-weighted assets. The tier 1 capital ratio measures a bank's core capital. In 2015, under Basel III, the minimum tier 1 capital ratio was 6 percent. Regulators use this ratio to determine whether a bank is well capitalized, adequately capitalized, or undercapitalized relative to the minimum requirements. Since net income is spread over increased equity, the relationship between bank performance and tier 1 capital is expected to be negative (Stovrag 2017).
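The tier 1 capital ratio and the 6 percent minimum mentioned above can be sketched as follows. This is only an illustration: the function and constant names are hypothetical, the input figures are made up, and the sketch checks the minimum only, because the text does not give the thresholds that separate well capitalized from adequately capitalized banks.

```python
# Minimum tier 1 capital ratio in 2015 under Basel III, as stated in the text.
BASEL_III_MIN_TIER1 = 0.06


def tier1_capital_ratio(tier1_capital, risk_weighted_assets):
    """Total tier 1 capital divided by total risk-weighted assets."""
    if risk_weighted_assets <= 0:
        raise ValueError("risk-weighted assets must be positive")
    return tier1_capital / risk_weighted_assets


def meets_minimum(ratio, minimum=BASEL_III_MIN_TIER1):
    """True when the ratio is at or above the regulatory minimum."""
    return ratio >= minimum


# Hypothetical banks, figures in millions of US dollars.
for name, capital, rwa in [("Bank A", 80.0, 1000.0), ("Bank B", 50.0, 1000.0)]:
    ratio = tier1_capital_ratio(capital, rwa)
    print(f"{name}: {ratio:.1%}, meets minimum: {meets_minimum(ratio)}")
```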
Loans to Asset Ratio: The loans to asset ratio was calculated as total loans held by the bank's borrowers divided by the total assets of the bank. The loans included cash deposits at other banks, financial assets, securities, and advances to borrowers. This ratio indicates the extent to which assets are devoted to loans. The literature has shown that the relationship between the loans to assets ratio and bank performance could be either positive or negative. The higher the ratio, the lower the liquidity position of the bank, which may then face a higher risk of failure (Goddard et al. 2013). In addition, a bank holding more liquid assets (a lower loans to asset ratio) may suffer from lower profitability (Demirgüç-Kunt and Huizinga 1999; Yao et al. 2018). On the other hand, higher loans to borrowers could provide more interest income to banks (Trujillo-Ponce 2013).
GDP Growth Rate: GDP growth is the rate of change in the gross domestic product of a country. A positive relationship between bank performance and the GDP growth rate is expected, because an increase in economic activity would raise the demand for bank transactions, resulting in higher profitability of banks (Petria et al. 2015; Trujillo-Ponce 2013; Yao et al. 2018; Athanasoglou et al. 2008). Other studies showed that there was no relationship between GDP growth and bank profitability, and that the effect was country specific (Shehzad et al. 2013; Djalilov and Piesse 2016).
Interest Rate Spread: Interest rate spread is a country-level variable that refers to the difference between the borrowing and the lending rates of banks. A higher spread shows that banks are advancing loans at a higher premium. Thus, the relationship between interest rate spread and bank performance is expected to be negative.
Exchange Rate (1 $ equals local currency): The exchange rate is the price of one country's currency in terms of a foreign currency. The financial data for this study were obtained from Bureau van Dijk (BvD) and were available in US dollars. The exchange rate has an indirect impact on the profitability of banks. Some countries tend to keep the local currency weak to stimulate exports. Because export transactions are executed through banking channels, this increases the non-interest income of banks (Medura 2006; Pagratis et al. 2014). Thus, a positive relationship is expected between the exchange rate and return on average equity.
The list of variables and their definitions are provided in Table 1.

Measurement of Managerial Self-Attribution Bias
Text preprocessing is an exhaustive process, especially when unstructured qualitative information needs to be analyzed. The purpose of preprocessing is to convert qualitative text into a usable form for further analysis. Various software programs have built-in functions that convert text into predefined objects, but many of them do not give users customized options to process text according to specific requirements. For instance, two software programs commonly used in the literature, (i) Diction and (ii) Linguistic Inquiry and Word Count (LIWC), contain predefined built-in functions for text analysis. The text analysis in this study required customized functions to measure managerial self-attribution bias, because the annual reports of the selected sample countries did not follow any standard pattern. Therefore, the open-source R software was used to measure the attribution bias indices, because it provides a rich selection of text preprocessing packages.
I also used SAS software for agglomerative hierarchical clustering, and estimation of system GMM.
After downloading the annual reports from the websites of the banks, two sections were extracted, namely the CEO's letter to shareholders and the management discussions and analysis (MD&A).
At an initial stage, all the documents were converted from PDF to plain, UTF-8 encoded text using the "pdftools" package of R and were stored in a "Corpus". A corpus is considered to be a "library" of all the original documents that have been converted to plain, UTF-8 encoded text. Figure 1 exhibits the whole process for the measurement of the self-attribution bias indices. Self-attribution bias was constructed as the count of first and second-person pronouns minus the count of third-person pronouns. To construct the self-attribution bias from the CEO's letter to shareholders (CEO SAB), two separate dictionaries were built, one of first and second-person pronouns and one of third-person pronouns. These two dictionaries were matched against the corpus of CEO letters to shareholders to obtain the score. Each pronoun was counted as many times as it appeared in the document, which reflected the stress placed by management on taking credit for good performance, and vice versa.
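The dictionary-matching step described above can be sketched in a few lines. The study itself used R; the sketch below is a Python illustration only, and the two pronoun dictionaries, the tokenizer, and the function names are assumptions, since the paper does not list the exact word lists it used.

```python
import re
from collections import Counter

# Assumed dictionaries; the paper does not publish its exact word lists.
FIRST_SECOND_PERSON = {
    "i", "me", "my", "mine", "myself", "we", "us", "our", "ours",
    "ourselves", "you", "your", "yours", "yourself", "yourselves",
}
THIRD_PERSON = {
    "he", "him", "his", "himself", "she", "her", "hers", "herself",
    "it", "its", "itself", "they", "them", "their", "theirs", "themselves",
}


def tokenize(text):
    """Split text into lowercase word tokens.

    The all-caps token "US" is dropped so that the country abbreviation
    is not counted as the pronoun "us" (a simplification).
    """
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text) if t != "US"]


def sab_score(text):
    """First and second-person pronoun count minus third-person count."""
    counts = Counter(tokenize(text))
    first_second = sum(counts[word] for word in FIRST_SECOND_PERSON)
    third = sum(counts[word] for word in THIRD_PERSON)
    return first_second - third


# Hypothetical excerpt from a CEO letter.
letter = ("We delivered record profits and our strategy worked. "
          "They faced difficult markets.")
print(sab_score(letter))
```

A positive score indicates a text dominated by first and second-person pronouns (internal attribution), and a negative score indicates a text dominated by third-person pronouns (external attribution).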

A similar process was adopted to obtain the score of first and second-person pronouns and third-person pronouns of the management discussions and analysis (MD&A SAB).

Econometric Model Using System GMM
For performance prediction, three models were estimated. The first model predicted bank performance using only the two self-attribution bias (SAB) indices. The second model was estimated with the SAB indices along with bank-level quantitative financial variables. The third model was estimated with the full set of SAB indices, bank-level variables, and macroeconomic indicators. The notion behind estimating three models was to observe the predictive power of the SAB indices over and above models based on quantitative financial data alone.
Dynamic panel models are linear regression models that include individual effects, individual-level errors, and overall model residual errors. They allow the dependent variable to depend on its own value in the previous period, thus making the model dynamic. The following Equation (1) is specified for model 1.
where $ROAE_{it}$ is the return on average equity one year ahead, taken as the performance indicator of banks. CEO_SAB is the self-attribution bias calculated from the CEO letter to shareholders, and MD&A_SAB is the self-attribution bias calculated from the management discussions and analysis. The $v_i$ are the individual effects, and the $\varepsilon_{it}$ are the observation-level regression errors.
The second model is shown in Equation (2): The second model consisted of the SAB indices and bank-level quantitative financial variables, which include the log of assets, the assets growth ratio, the ratio of non-performing loans to gross loans, the tier 1 capital ratio, and the loans to assets ratio.
The third model is shown in Equation (3): The third model consists of the full set of SAB indices, bank-level quantitative financial variables, and macroeconomic indicators, which include interest rate spread, GDP growth, and exchange rate.
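The equations themselves are not recoverable from the text as given. The following LaTeX block is a hedged sketch of what the three specifications plausibly look like, assembled only from the variable definitions in this section and from the statement that the lagged dependent variable appears on the right-hand side. The coefficient symbols ($\alpha$, $\gamma$, $\beta$, $\delta$, $\theta$), the vector notation $\mathbf{X}_{it}$ and $\mathbf{M}_{it}$, and the time subscripts on the regressors are assumptions, not the author's notation.

```latex
\begin{align}
ROAE_{it} &= \alpha + \gamma\, ROAE_{i,t-1}
           + \beta_1\, CEO\_SAB_{it} + \beta_2\, MD\&A\_SAB_{it}
           + v_i + \varepsilon_{it} \tag{1} \\
ROAE_{it} &= \alpha + \gamma\, ROAE_{i,t-1}
           + \beta_1\, CEO\_SAB_{it} + \beta_2\, MD\&A\_SAB_{it}
           + \boldsymbol{\delta}'\mathbf{X}_{it}
           + v_i + \varepsilon_{it} \tag{2} \\
ROAE_{it} &= \alpha + \gamma\, ROAE_{i,t-1}
           + \beta_1\, CEO\_SAB_{it} + \beta_2\, MD\&A\_SAB_{it}
           + \boldsymbol{\delta}'\mathbf{X}_{it}
           + \boldsymbol{\theta}'\mathbf{M}_{it}
           + v_i + \varepsilon_{it} \tag{3}
\end{align}
```

Here $\mathbf{X}_{it}$ stands for the five bank-level variables (log of assets, assets growth ratio, non-performing loans to gross loans, tier 1 capital ratio, and loans to assets ratio), and $\mathbf{M}_{it}$ stands for the three macroeconomic indicators (interest rate spread, GDP growth, and exchange rate) of the country in which bank $i$ operates.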

Diagnostic Checks
Initially, the model was estimated using fixed effects and random effects, and the individual effects were examined using the F-test. The Hausman test was used to select between the fixed-effects and the random-effects specification. The problem with both the fixed-effects and the random-effects estimators is that the lagged dependent variable is correlated with the error term, which creates an endogeneity problem even if the error term is not autocorrelated (Greene 1996, p. 536). Both models are presented in Appendix C. Regressors are said to be endogenous when they are correlated with the random errors. In fact, endogeneity is a major methodological concern in many areas of research in corporate finance (Abdallah et al. 2015). If endogeneity is present in the model, then the statistical inference from the analysis may be biased (Abdallah et al. 2015). Endogeneity may occur due to omitted variables, which may result in the error term being correlated with the explanatory variables. Alternatively, the endogeneity may be of the dynamic type, whereby the past realizations of the dependent variable influence current realizations of one or more of the explanatory variables. Finally, endogeneity can be of the simultaneous type, where the contemporaneous realizations of both the dependent variable and the explanatory variables affect each other (Abdallah et al. 2015; Roberts and Whited 2012).
The model in this study potentially faces two types of endogeneity issues, i.e., omitted variable bias and the past realization of the dependent variable in terms of earnings persistence. For example, the SAB indices were developed from the textual data to capture the private information of management. Nevertheless, these indices may not be a perfect proxy for private information, and they are affected by the agency problem. This is referred to as endogeneity from omitted variable bias. Moreover, the lagged dependent variable is included on the right-hand side of the model, making it dynamic. The reason for including the lagged dependent variable is the performance persistence of banks, that is, the continuity of current earnings, which is affected by the magnitude of accruals. More persistent earnings are accompanied by a greater ability to maintain current earnings (Lipe 1990). Hence, failure to address endogeneity may lead to poor statistical inference (Abdallah et al. 2015).
Some researchers have argued that there should be theoretical reasoning for considering a regressor endogenous. As a robustness check, the Durbin-Wu-Hausman test was used to test whether the theoretical reasoning for the endogeneity problem is supported empirically.
Similarly, the model could also face a heteroscedasticity problem. The White test was used to detect heteroscedasticity of the residuals. The null hypothesis is that the error variances are homoscedastic, $H_0: \sigma_i^2 = \sigma^2$ for all $i$. In the presence of heteroscedasticity, the estimates are still unbiased but become inefficient. However, the standard errors of the estimates are wrong, leading to incorrect inferences (White 1980).
Arellano and Bover (1995) and Blundell and Bond (1998) proposed system GMM, which can help deal with endogeneity problems (Abdallah et al. 2015) and heterogeneity problems (Baum et al. 2003). System GMM consists of two sets of equations, each with its own internal instruments: the first set is the levels equations, and the second set is the difference equations (Zheng et al. 2017).
A generalization of the linear regression model is an autoregressive (AR) model. The AR test was conducted to identify serial correlation in the residuals. The null hypothesis is that there is no serial correlation within the residuals.
The validity of the instruments was confirmed via the Sargan test, which checks the overidentifying restrictions. The test statistic is asymptotically distributed as $\chi^2(n)$ with $n$ degrees of freedom under the null hypothesis that the instrument set is appropriate for the data at hand.
In addition, robust standard errors were calculated, which are more reliable than conventional standard errors (Gutierrez and El-khattabi 2017).
All models were estimated with the SAS 'proc panel' procedure.

Descriptive Statistics
Table 2 provides the descriptive statistics of the 58 banks from 16 emerging economies. Furthermore, detailed descriptive statistics were calculated at the country level, to gain deeper insight into how one country's statistics differed from another's. Country-level statistics are shown in Appendix B. The mean of ROAE was 16 percent, which was close to the median, whereas the lowest value was −27 percent, the highest was 41 percent, and the standard deviation was 6.75. The highest ROAE belonged to the NDB bank of Sri Lanka in 2012, because in that year NDB bank became the first investment bank. It divested AVIVA insurance and made a new investment with A/A corporation. Further, economic growth and the post-war period in Sri Lanka also helped to increase profitability. The lowest ROAE, −27.80, belonged to Askari Bank of Pakistan, due to huge non-performing loans and the bank's write-off of non-banking assets. The SAB of the CEO ranged between −139 and 181, which means there was polarity in managerial attribution: in some banks, the CEO used more first and second-person pronouns, and in other cases, the CEO used more third-person pronouns. The SAB of MD&A showed even greater dispersion and ranged between −191 and 1128. The highest values of the SAB of MD&A related to banks that had published annual reports of more than 700-1000 pages, meaning that a large number of pages were also allocated to the MD&A section.
Financial variables also showed great variability. Absolute values were reported in millions of US dollars, converted from local currency on the last day of the financial year of the respective country. The mean value of total assets was 187,225 million US dollars, ranging from 143 million US dollars to 3,421,363 million US dollars, with a standard deviation of 486,273. The largest value belonged to ICBC, one of the largest banks in China, whereas the lowest assets were reported by PABC bank of Sri Lanka. Notably, the Chinese banks were much larger than any other banks in the selected sample. The lowest value of assets growth, −22%, belonged to Bank Alfalah of Pakistan in 2008, the year of the financial crisis, and the highest, 56%, belonged to ThanaChart bank of Thailand. PABC bank of Sri Lanka had the highest non-performing loans, at 27%, whereas the average value was 4%, close to the median.
Among the macroeconomic indicators, the Do-Brazil bank had the highest interest rate spread, ranging between 30 and 35 percent. The high interest rate spread was due to Brazil's large public debt and debt servicing. Another reason for the high interest rate spread was the history of default, which meant that the government had to pay a high default risk premium to attract foreign capital. Moreover, the exchange rate was lowest in Indonesia, where 1 dollar equaled 13,389 Indonesian Rupiah. The GDP growth rate was lowest in Hungary in 2009, at −6.56, and highest in Singapore in 2010, at 15.24.
The correlation matrix in Table 3 shows that there was no multicollinearity between the variables. Figure 2 provides the scatter plot matrix of all the SAB indices, quantitative financial variables, and macroeconomic indicators.

Agglomerative Hierarchical Clustering Analysis
Cluster analysis is an important form of exploratory data analysis that seeks hidden groups within the data. It is an unsupervised machine learning method that is widely used for classification problems and for exploring data based on the similarity of features (that is, variables, as they are called in machine learning) through a structured pattern (Wu et al. 2009; Myatt and Johnson 2014). As a result, observations in the same cluster are more alike than those in other clusters (Hsu et al. 2007). More importantly, clustering helps to discover latent natural groups and to categorize data into a hierarchical set of clusters organized in a tree structure (Loewenstein et al. 2008).
There are different clustering techniques, including k-means and k-modes clustering, but this study adopted hierarchical clustering, because it is mainly used for small data sets, where each observation forms part of a hierarchy. The SAS 'proc cluster' procedure was used for agglomerative hierarchical cluster analysis.
To calculate the similarities between observations, all variables must be normalized, so that the distances between observations are computed without disproportionate weights and biases. The data used in this study consisted of total assets reported in millions of dollars, while the other financial variables are in percentages. Therefore, the whole dataset was normalized using the following formula:

$$z_n = \frac{x_n - \bar{x}}{\sigma_X}$$
where $z_n$ is the standardized value of observation n, $x_n$ is the original value of observation n, and $\bar{x}$ and $\sigma_X$ are the mean and standard deviation of the variable X.
The sample data consisted of a panel of 58 banks covering 2007-2015. Clustering the panel data directly could match one bank-year observation with an observation of another bank, which could distort the data and the interpretation of the clusters. Therefore, the mean of each bank was taken, so that each bank represented a single observation. Moreover, the data consisted of 11 variables, which could make it difficult to differentiate the variables used to form the clusters. Principal component analysis was performed to identify the variables that loaded on the first factor (see Appendix C). Four variables loaded on factor-1 and were used to form clusters: assets growth, total assets, NPL to gross loans, and GDP growth.
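The two preparation steps described above (collapsing the panel to one mean per bank, then standardizing) can be sketched as follows. This is a hedged illustration in Python rather than the SAS workflow used in the study; the bank labels and values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def bank_means(panel):
    """Collapse bank-year values {(bank, year): value} to one mean per bank."""
    by_bank = {}
    for (bank, _year), value in panel.items():
        by_bank.setdefault(bank, []).append(value)
    return {bank: mean(values) for bank, values in by_bank.items()}

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (assumed convention)
    return [(x - centre) / spread for x in values]

# Hypothetical assets-growth panel for three banks over two years.
panel = {
    ("A", 2007): 10.0, ("A", 2008): 14.0,
    ("B", 2007): 2.0, ("B", 2008): 4.0,
    ("C", 2007): 6.0, ("C", 2008): 6.0,
}
means = bank_means(panel)
z = standardize(list(means.values()))
print(means)
print([round(value, 3) for value in z])
```

After standardization each variable has mean zero and unit standard deviation, so total assets in millions of dollars no longer dominate ratios expressed in percent when distances are computed.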

Measurement of Distance between Observations
The distance between observations was calculated using the Euclidean distance metric (Myatt and Johnson 2014), as follows.

$$d = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
where d is the distance between p and q, two observations of n variables. Thus, the Euclidean method calculates the distance for each pair of observations. Then, the linkage rule creates clusters by comparing the distances between clusters and observations. This process continues until a diagram called a dendrogram is created, which shows how the observations are connected to each other based on their similarities. The dendrogram is shown in Figure 3.
To determine the optimum number of clusters, (a) the cubic clustering criterion (CCC), (b) the pseudo t² statistic, and (c) the pseudo F statistic were used. The highest peak of the CCC is taken as the optimum number of clusters. In our case, the peak of the plot, at a CCC of 4, indicated 9 as the optimal number of clusters, as shown in Figure 4. The pseudo F statistic (PSF) supported the selection of 9 clusters by showing its highest peak there (middle graph). For the pseudo t² statistic, reading the graph from right to left up to the first peak also indicated about 9 clusters. Table 4 shows the number of banks in each cluster, the percent frequency, and the cumulative percent frequency.
Figures 5 and 6 show the distribution of clusters in a pie chart and a bar chart, respectively.
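The merging process described in this section can be illustrated with a small pure-Python sketch. It is not the SAS 'proc cluster' procedure used in the study: the average-linkage rule is an assumption (the linkage rule is not named here), the points are hypothetical, and the number of clusters is passed in directly rather than chosen by CCC, pseudo F, or pseudo t².

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between clusters a and b (index lists)."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members of the merged cluster, merge distance)
    while len(clusters) > n_clusters:
        pairs = [
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        distance, i, j = min(pairs)
        merged = sorted(clusters[i] + clusters[j])
        merges.append((merged, distance))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical standardized observations (two variables per bank).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, 3)
print(sorted(clusters))
print(merges)
```

The recorded merges, with their distances, carry the information that a dendrogram displays: which observations join, and at what height.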

Cluster Profile
To profile the clusters, descriptive statistics were calculated on the unstandardized dataset, grouped by the banks' cluster membership, to explore the reasons for similarities and differences between clusters.
This helps to understand the patterns between clusters and the rationale behind how these clusters were formed. The statistics, shown in Table 5, included the mean, median, standard deviation, skewness, and kurtosis.
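As a rough illustration of such a cluster profile, the sketch below computes the five statistics per cluster in plain Python. The data and cluster labels are hypothetical, and the skewness and kurtosis use simple moment-based formulas (excess kurtosis, no small-sample correction), which is an assumption and may differ from the bias-corrected values a statistical package reports.

```python
from statistics import mean, median, stdev

def profile(values):
    """Mean, median, sample SD, moment skewness and excess kurtosis."""
    n = len(values)
    centre = mean(values)
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    m4 = sum((x - centre) ** 4 for x in values) / n
    return {
        "mean": centre,
        "median": median(values),
        "sd": stdev(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }

def profile_by_cluster(values, labels):
    """Group unstandardized values by cluster label and profile each group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: profile(group) for label, group in sorted(groups.items())}

# Hypothetical assets growth (percent) for six banks in two clusters.
growth = [8.0, 10.0, 12.0, 15.0, 18.0, 30.0]
labels = [1, 1, 1, 2, 2, 2]
result = profile_by_cluster(growth, labels)
for label, stats in result.items():
    print(label, {name: round(value, 3) for name, value in stats.items()})
```

Comparing each cluster's mean and median with those of the whole sample is what allows a cluster to be described as, for example, "high assets growth, low NPL".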
Looking at the mean statistics of the four variables used in the cluster analysis, the mean value of assets growth was 10 percent and that of GDP growth was 5.16 percent. However, non-performing loans (NPL) to gross loans were only 4.24 percent. These values were also close to the median, meaning that the distributions of these variables were symmetrical. These statistics were compared with the cluster statistics.
Table 5 shows that cluster-1 included banks with smaller mean and median values than the whole dataset, except for total assets. It included 13 banks: four banks from the Philippines, four banks from Sri Lanka, two banks from Indonesia, and one bank each from India, Bangladesh, and Malaysia. Cluster-2 consisted of banks with high assets growth, from countries with a better GDP growth rate, with less NPL and lower average total assets than the whole sample. This cluster consisted of 6 banks: four Indian banks, and one bank each from Sri Lanka and Bangladesh. Cluster-3 comprised banks with the highest assets growth, from countries with the highest GDP growth rate, with the lowest NPL, and the largest banks in the selected sample. Interestingly, all three banks in cluster-3 belonged to China.
Cluster-4 included 16 banks: five from Turkey, three each from Malaysia and Singapore, and one bank each from Thailand, Indonesia, and Nepal. Banks in this cluster had lower assets growth, which was also reflected in the lower GDP growth of their countries. Banks in cluster-4 had low average NPL and total assets compared with the whole sample. Cluster-5 contained banks with low average assets, GDP growth, and NPL compared with the whole sample, which was also supported by the other statistics. Banks in this cluster were relatively small. Three Chinese banks also formed cluster-6, which included no bank from any other emerging economy. Cluster-3 and cluster-6 both contained Chinese banks, but the major differentiators between them were assets growth and bank size. Cluster-3 banks were smaller in size but had the highest assets growth, as compared with the banks of cluster-6.
Cluster-7 included banks with the highest non-performing loans and the second-lowest assets growth. The data also revealed that Askari bank and NBP bank from Pakistan, and OTP bank from Hungary, had persistently high non-performing loans. In cluster-8, the major differentiator was assets growth, which had the lowest mean value and the highest coefficient of variation. Finally, cluster-9 contained only one bank, which did not merge into any other cluster. The reason was that this bank had the highest mean assets growth, 18.14 percent, and was the smallest bank in terms of total assets. The list of banks in each cluster, along with their countries, is shown in Table 6.

Pre-Diagnostics Checks
Initially, fixed effects and random effects models were estimated. The results of both models, fit statistics, and diagnostic tests are reported in Appendix D. The Hausman test was used to select between the fixed effects and random effects models. Based on the results, we rejected the null hypothesis, which suggested employing the fixed effects model. In addition, an F-test was performed to check for bank effects in the model, with the null hypothesis that the bank effects are all zero. Based on the results, we rejected the null hypothesis at 1 percent (p-value < 0.01), suggesting that bank effects exist in the data. In both the fixed effects and random effects models, the CEO SAB and MD&A SAB indices were insignificant. However, assets growth was significant at 1 percent; NPL to gross loans, the Tier 1 capital ratio, and, among the macroeconomic indicators, the exchange rate were negative and significant at 5 percent. Assets growth was positive and significant at 5 percent.
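For reference, the Hausman statistic used for this kind of model selection is usually written as below. The notation is generic and introduced here for illustration: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ denote the fixed effects and random effects estimates of the k coefficients being compared, and $\widehat{V}$ their estimated covariance matrices.

```latex
\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{V}(\hat{\beta}_{FE}) - \widehat{V}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^2(k)
\]
```

Under the null hypothesis the bank-specific effects are uncorrelated with the regressors, so both estimators are consistent and H should be small; a large H, and hence a rejection, favors the fixed effects model.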
Although the Hausman test suggested the fixed effects model, the problem with fixed effects was that the error term was correlated with the lagged dependent variable, creating an endogeneity problem (Greene 1996, p. 536). Moreover, the model could also face a heterogeneity problem. The White test was used to detect heteroscedasticity of the residuals. Based on the results, the null hypothesis was rejected at 1 percent significance (p < 0.01), suggesting that the error variances were heteroscedastic.⁷ Thus, the results produced could be biased and lead to poor statistical inferences.
In addition, the self-attribution bias indices were theoretically endogenous, which was further tested using the Durbin-Wu-Hausman test, also known as the Hausman specification test. The null hypothesis is that there is no measurement error. The test rejected the null hypothesis at 5 percent (p < 0.026), suggesting the use of instrumental variables.⁸ To deal with the endogeneity and heterogeneity problems, the author used system GMM, developed for dynamic panel models by Arellano and Bover (1995) and Blundell and Bond (1998). System GMM deals with the endogeneity problem by using instrumental variables. It also deals with heteroskedasticity by using the orthogonality conditions to allow for efficient estimation in the presence of heteroskedasticity, even of unknown form (Baum et al. 2003).
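As a generic illustration of how system GMM builds its instruments, consider a dynamic panel model in textbook notation. The symbols below are assumptions introduced for this sketch and are not the study's Equation (1): $y_{it}$ is the outcome for bank i in year t (here it would be ROAE), $x_{it}$ the regressors, $\mu_i$ the bank effect, and $\varepsilon_{it}$ the idiosyncratic error.

```latex
\begin{align*}
y_{it} &= \alpha\, y_{i,t-1} + x_{it}'\beta + \mu_i + \varepsilon_{it}
  && \text{(levels equation)} \\
\Delta y_{it} &= \alpha\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta \varepsilon_{it}
  && \text{(first-differenced equation)} \\
E\left[ y_{i,t-s}\, \Delta \varepsilon_{it} \right] &= 0, \quad s \ge 2
  && \text{(lagged levels instrument the differences)} \\
E\left[ \Delta y_{i,t-1}\, (\mu_i + \varepsilon_{it}) \right] &= 0
  && \text{(lagged differences instrument the levels)}
\end{align*}
```

Differencing removes the bank effect $\mu_i$, and the two sets of orthogonality conditions are estimated jointly as a system, which is what distinguishes system GMM from the difference-only estimator.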
Table 7 shows the results of model-1 (Equation (1)) estimated by system GMM: parameter estimates and fit statistics such as mean squared error (MSE), root MSE, and R-squared.

Post Diagnostics Checks
Post-diagnostic checks were conducted to determine the robustness of the empirical results. The Sargan test provided the test of overidentifying restrictions. The null hypothesis could not be rejected, suggesting that the instrument set is appropriate for the data (p-value < 0.4695). Similarly, the AR test was conducted to identify serial correlation in the residuals, with the null hypothesis that there is no serial correlation; the results suggested no serial correlation at the second order (p-value < 0.6918).
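Because both diagnostics are passed when the null hypothesis is not rejected, the reading of these p-values can be spelled out with a short sketch. The 5 percent threshold and the helper name `decide` are assumptions for illustration; the p-values are those reported above for model-1.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision at significance level alpha (assumed 5 percent)."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# p-values reported for model-1, with the null hypothesis of each test.
reported = {
    "Sargan (H0: instrument set is appropriate)": 0.4695,
    "AR(2) (H0: no second-order serial correlation)": 0.6918,
}
decisions = {name: decide(p) for name, p in reported.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

For these two tests a large p-value is the desired outcome: the data give no evidence against valid instruments or against the absence of second-order serial correlation.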
In model-1, lagged ROAE was positive and statistically significant at 1 percent (p-value < 0.01). The relationships of CEO SAB and MD&A SAB with ROAE were positive and statistically significant at 1 percent (p-value < 0.01). This suggested that these variables could be used to predict bank performance at least one year prior to the announcement of the results. These results were consistent with the attribution theory that management takes credit for expected favorable outcomes and distances itself when expected outcomes are poor (Bloomfield 2008).
Table 8 provides the results of model-2, which combines the SAB indices and five financial ratios. Fit statistics showed a small improvement in MSE, RMSE, and SSE compared with model-1. In addition, there was no serial correlation within the residuals at the second order (p < 0.9051). The Sargan test also showed that the instrument set was appropriate (p-value < 0.5275). In Table 8, MD&A SAB was positive and highly significant at 1 percent (p-value < 0.01), without a major change in coefficients. This means that the self-attribution bias of management provided additional information, over and above the traditional bank-level quantitative variables. However, CEO SAB became insignificant once the financial ratios were added in model-2. This suggested that CEO managerial attribution provided indications about future performance when it was used without financial variables. All the bank-level financial variables were significant at 1 percent (p-value < 0.01). For example, the size of banks in terms of total assets was negative and statistically significant at 1 percent. Non-performing loans to gross loans showed a negative relationship with future ROAE and was significant at 1 percent. Likewise, the Tier 1 capital ratio and the loans to assets ratio were negative and significant at 1 percent. In contrast, assets growth was positive and statistically significant at 1 percent (p-value < 0.01).
Finally, Table 9 presents the results of model-3, using the full set of SAB indices, quantitative financial variables, and macroeconomic indicators. Fit statistics improved further in MSE, RMSE, and SSE relative to both model-1 and model-2. The AR test indicated no serial correlation at the second order within the residuals (p-value < 0.9422), and the Sargan test demonstrated that the instruments used in the model were appropriate (p-value < 0.3189). With macroeconomic indicators added to the model, the CEO attribution bias index became positive and significant at 5 percent (p-value < 0.05). The management discussion and analysis index remained persistently positive and significant at 1 percent (p < 0.001). Adding the macroeconomic indicators did not change the bank-level financial variables compared with model-2. Some of the coefficient values changed, but all the financial variables remained significant at 1 percent (p-value < 0.01). However, all the macroeconomic indicators were insignificant in model-3.

Discussions
In model-1 and model-3, the results suggested that managerial self-attribution bias in the CEO letter to shareholders predicted future ROAE, either alone or in combination with quantitative financial variables and macroeconomic indicators. This suggests that bank CEOs use more first- and second-person pronouns in the letter to shareholders to take credit for expected positive outcomes. It also suggests that the self-attribution bias of the CEO provides additional predictive power, over and above the traditional quantitative financial data that researchers have been using for many decades. The results of this study were consistent with prior studies conducted on non-financial firms (Clatworthy and Jones 2003; Amernic and Craig 2006; Craig et al. 2013; Lehmberg and Tangpong 2018).
Similarly, the self-attribution bias of the management discussion and analysis also had predictive power, over and above quantitative financial and macroeconomic indicators. This index was positive and statistically significant in all three models. The management discussion and analysis is a comprehensive document in which management discusses overall bank performance, the challenges it faces, future plans, and expected outcomes. The results were supported by previous studies that found a positive relationship between the self-attribution of management and a firm's future financial performance (Li 2010b; Lehmberg and Tangpong 2018; Aerts 2001, 2005). The results for both indices were also consistent with the attribution theory that managerial attribution bias provides contextual information for understanding the behavioral bias of management, which can be linked to the expected outcomes of banks (Bloomfield 2008; Hooghiemstra 2001; Clatworthy and Jones 2003).
Among the financial variables, the size of the banks in terms of assets was negative and statistically significant at 1 percent and 5 percent in model-2 and model-3, respectively, meaning that as the size of banks increased, the future ROAE of the bank decreased. Thus, bank size in terms of assets could bring higher profit up to a certain level; thereafter, profitability could be lower. Moreover, this also indicates more unproductive, non-interest-earning assets on the banks' balance sheets. Contrary to the studies of Athanasoglou et al. (2008) and Petria et al. (2015), the results of both models were supported by earlier studies that found an inverse relationship between total assets and future ROAE, because profit did not increase in the same proportion as assets (Trujillo-Ponce 2013; Panta 2018; Terraza 2015; Shehzad et al. 2013). Likewise, asset growth was positive and significant at 1 percent in model-2 and model-3. It is directly related to banks' profitability: as the assets of a bank increase, the bank has more loanable investments with which to earn profits. The positive relationship also shows that banks have more interest-earning assets. Thus, an increase in customer borrowing means an increase in the interest income of the banks. These results were in line with earlier studies showing that, with an annual increase in assets, the profitability of banks also increased (Mathuva 2009; Ahamed 2017; Bougatef 2017; Yao et al. 2018).
Non-performing loans to gross loans was also negative and significant at 1 percent in model-2 and model-3, which means that profitability decreases as non-performing loans increase, whether used with the self-attribution bias of management alone or in combination with macroeconomic indicators. As discussed in the methodology section, loans were classified as non-performing if the borrower defaulted or declared bankruptcy. Our results were supported by previous studies finding that the higher the ratio, the lower the ROAE of the banks (Trujillo-Ponce 2013; Panta 2018; Petria et al. 2015).
The relationship between the performance of banks and Tier 1 capital was negative, because return on average equity was calculated by dividing net income by equity. Thus, net income is spread over increased equity. The coefficient of Tier 1 capital was the largest of all the variables used in the models. The results supported the view that the higher the ratio, the lower the future performance of banks (Stovrag 2017). Finally, the loans to assets ratio also showed a negative and significant relationship with future ROAE. This ratio indicates the extent to which assets were devoted to loans as opposed to other assets, including cash, securities, and equipment. The results presented in this study are consistent with Goddard et al. (2013): the higher the ratio, the lower the liquidity position of the bank, and the higher the risk of failure it may face.
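The mechanical part of the Tier 1 capital argument, that the same net income spread over more equity yields a lower ROAE, can be shown with a small worked example. The figures are hypothetical, and averaging opening and closing equity is an assumed convention for "average equity".

```python
def roae(net_income, equity_start, equity_end):
    """Return on average equity in percent (assumed: mean of opening and closing equity)."""
    average_equity = (equity_start + equity_end) / 2
    return 100 * net_income / average_equity

# Hypothetical bank, same net income, before and after raising more capital.
base = roae(net_income=16.0, equity_start=90.0, equity_end=110.0)
more_capital = roae(net_income=16.0, equity_start=90.0, equity_end=230.0)
print(base, more_capital)
```

Holding net income fixed, the larger equity base lowers the ratio, which is the direction of the negative coefficient reported for Tier 1 capital.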
Among the economic indicators, a positive relationship was expected between banks' future performance in terms of ROAE and the GDP growth rate. However, the author failed to find any relationship, as witnessed by Shehzad et al. (2013) and Djalilov and Piesse (2016). The reason for the insignificance may be the direction of the relationship between the dependent and independent variables. It was argued in the introduction that banks promote economic growth. However, the present study considered GDP growth as a determinant of future ROAE. Similarly, the relationship between the interest rate spread and banks' future performance was also insignificant. Finally, contrary to the studies of Medura (2006) and Pagratis et al. (2014), this study failed to find any relationship between future ROAE and the exchange rate.
Based on the analysis, the study answers the research question: the self-attribution bias of management in banks' annual reports provided additional predictive power, over and above the quantitative financial variables. The results of this study were more robust, because the models were estimated in a way that reduced the endogeneity and heterogeneity problems, using valid instruments.

Conclusions
Banks are important financial intermediaries within the financial system because they help to promote the economic growth of a country. Nevertheless, the banking sector has a history of contributing to financial turmoil, especially in the 2007-2008 sub-prime crisis. Such a banking crisis not only reduced industrial production, entrepreneurial innovation, and trade, but also had knock-on effects on the rest of the world. Therefore, the prediction of bank performance is important for regulators, so that they can take pre-emptive actions to avoid huge losses. However, the models developed for predicting bank performance were based only on quantitative financial data.
In this research, self-serving attribution bias, which is a text analysis technique based on attribution theory, was used for a contextual understanding of managerial behavioral bias towards the outcomes of banks. The notion behind self-attribution was that management uses more first- and second-person pronouns than third-person pronouns in annual reports if they anticipate better future performance.
The sample consisted of 58 banks from 16 emerging economies for the period 2007-2015. For exploratory data analysis, hierarchical clustering, an unsupervised machine learning method, was performed to detect latent groups within the data. It was observed that some banks joined clusters based on asset growth, while other banks formed clusters due to high NPL. GDP growth also worked as a differentiator for grouping the banks into clusters. Finally, the size of the banks in terms of assets distinguished the small banks from the medium and large banks.
To predict the future performance of banks, system GMM, proposed by Blundell and Bond (1998), was used to estimate the models. System GMM helps to deal with the endogeneity and heterogeneity problems within the data. The results of the study showed a strong relationship between managerial attribution bias and the future performance of banks in emerging economies. The results were consistent with the attribution theory, which predicts that managers take credit for good outcomes and distance themselves from bad outcomes. Therefore, the study concludes that the self-attribution bias of management signals the future performance of banks, over and above the quantitative financial data provided in financial statements.

Policy Implications
Regulators: Any technique that could even marginally improve the ability of regulatory authorities to assess overall bank performance would be beneficial, because supervisors could then intervene in a timely manner to avoid bank failure (Gandhi et al. 2019). The findings of the study show that the self-attribution bias of management in annual reports provides incremental information, over and above the quantitative data provided in the financial statements of banks. Such information could be used as an early warning indicator and help the regulatory authorities of emerging economies to identify poorly performing banks. As a result, banks could be supervised more efficiently, and regulators could take preventive and corrective measures to avoid huge losses to investors and government.
Investors and Analysts: The findings also show that the contextual information provided by management in emerging economies' banks can help reduce information asymmetry between shareholders and management (principal-agent theory). It can give existing as well as potential investors a better tool for a comprehensive assessment of banks' profitability for prospective investments.
Researchers: The use of textual information from the annual reports of banks is a relatively unexplored area, where researchers may gain rich insights for testing further hypotheses of interest.

Limitation of the Study
The sample was relatively small, because the author had to follow certain criteria for the inclusion of banks in the sample. For example, the annual reports of banks had to include a CEO letter to shareholders and a management discussion and analysis (MD&A) section. CEO letters were generally available in the annual reports; however, MD&A was not a regulatory requirement in emerging economies. In addition, the banks in emerging economies were either state owned, family owned, or had only recently become public limited companies. Therefore, most of the banks did not have these two sections. These issues reduced the sample size.
Cultural differences between countries may have individual effects on management's use of first-person and second-person pronouns in annual reports. For instance, it might be the convention in some sample countries that management uses more plural pronouns in these two sections.
Likewise, English was not the first language in the annual reports of banks in emerging economies. Therefore, the use of pronouns in writing annual reports may have implications for the construction of the self-attribution bias indices.
Finally, the literature on textual analysis has mainly focused on developed economies. Most banks in emerging economies are either state owned or family owned. Thus, management might not necessarily take credit for good performance and attribute bad performance to external factors, owing to the restricted power of management relative to the board of directors and shareholders.

Figure 2. Scatter Plot Matrix of Independent and Dependent Variables.

Figure 4. Criteria for Selecting Number of Clusters.

Table 1. Variables and Definitions.

Table 2. Descriptive Statistics of SAB, Financial, and Macroeconomic Variables.

Table 3. Correlation Matrix of SAB, Financial and Macro-Economic Variables.

Table 4. Frequency of Banks in Each Cluster.

Table 6. List of Banks in Each Cluster.