1. Introduction
Since the publication of the United Nations’ report ‘Who Cares Wins’ more than two decades ago, the evaluation and management of environmental, social and governance (ESG) risks has proven challenging. These challenges can be attributed to multiple factors. Conceptually there has been a great deal of ambiguity and a multitude of perspectives [
1], which have resulted in the politicization of the issue [
2], questions about purpose and goals [
3], and issues with reporting [
4,
5] and ESG ratings [
6,
7,
8]. While existing research generally demonstrates the existence of a relationship between ESG and financial performance, the direction and size of the financial effects are mixed and conditional [
9,
10]. Furthermore, significant and rapid changes relating to ESG can be observed from a variety of sources. These include changes in political sentiment [
2], as well as the easing of regulations before they come into effect in Europe [
11] and the United States [
12]. This has resulted in changes to financial flows and the names of funds [
13]. Surveys have also demonstrated a decline in support for ESG factors [
14], and a shift towards a shorter [
15] and more selective [
16] list of investor-relevant factors. This is important because the impact of uncertainty relating to policy changes [
17,
18] and events has been proven [
19,
20,
21]. Although these studies are not directly related to ESG, based on our observations of the economy, we believe that a parallel can be drawn. This is important in the context of evaluating ESG risks, because we assume that such uncertainty will influence corporate behavior, including companies’ strategy and actions, and the quantity and content of the information they provide. Similarly, we assume that it will influence the way investors evaluate ESG risks and make investment decisions.
In the context of evaluating ESG risks and performance, a significant divergence has been observed among rating providers [
7,
8]. This raises concerns about the reliability and practical application of ESG ratings [
6]. The existing literature explores various directions, with the most prominent focusing on methodologies [
7,
8], disclosures [
6,
22], and biases [
23]. Investor and corporate surveys reinforce the importance of these findings, with the most significant factors in ESG ratings being the quality of the methodology, the relevance and materiality of issues, and the credibility of data sources. The experience and research competence of the team are also significant factors [
24,
25]. Further analysis of the problem and the relevant literature emphasizes the underlying issues that arise from the fact that ESG reports are the main source of data. These reports also exhibit a high level of divergence, as well as an absence of clear definitions and understanding [
1,
26]. These issues underpin all the others, which has significant implications for how investors use ESG ratings to evaluate risk.
Existing research (e.g., [
7,
8,
27]) provides valuable insights into issues surrounding ESG ratings and the reasons for divergence. However, when considering the investor perspective on evaluating ESG risks, there are evident gaps in the literature on this topic. There is also a notable absence of literature that would summarize these findings based on a common definition of these risks. This paper primarily focuses on evaluating ESG risks and performance, taking into account the investor’s perspective. Gaps in the literature are addressed by providing a critical review that combines a broader scope of literature on ESG evaluation relevant to ESG risks with existing literature on these risks. The purpose of this paper is therefore twofold. Firstly, it aims to conduct a critical literature review based on a robust framework of clear definitions. Secondly, it aims to aid the evaluation and development of ESG risk measures or ratings. We approach the research process by defining ESG risks, which aids the subsequent deconstruction and critical analysis of the ESG risk evaluation process. The aims of this research are to provide the following: (1) a working definition of ESG risks and performance; (2) an understanding of the issues with data and reporting; and (3) an understanding of the evaluation of ESG risk and performance. The analysis demonstrates that there is a lack of consensus and clarity in definitions, reporting, and data, which manifests in the evaluation of ESG risk. The evaluation process is further complicated by differences in rating construction, disclosures, and biases, resulting in divergent ratings, which must be understood in order to be used effectively. The present paper contributes to our ongoing research, the objective of which is to enhance the evaluation of ESG risks. It achieves this by providing a framework that clarifies ambiguities relating to the concept of ESG. As mentioned, these ambiguities are significant factors in the evaluation process.
Although this review considers the investor’s perspective, it is also applicable to other financial institutions and stakeholders, such as banks. This analysis also provides investors with practical value.
The rest of this paper is structured as follows. In
Section 2 we define ESG risks and performance.
Section 3 addresses issues relating to ESG reporting and data.
Section 4 critically evaluates issues with ESG ratings and scores.
Section 5 provides discussion,
Section 6 outlines future research directions, and
Section 7 concludes the paper.
2. Definition of ESG Risks and Performance
The aim of this section is to address the absence of consensus on ESG issues by developing an understanding of, and working definitions for, ESG. It should be noted that the intention is not to imply that the provided working definitions are superior to other possibilities. The proposed working definitions provide the concise framework required for this research, which seeks to address the view that meaning should be considered based on the research frame [
26].
In the United Nations report entitled ‘Who Cares Wins’ [
28], the acronym ‘ESG’ was introduced alongside the conceptualization of ESG factors. ESG was set broadly, without a specific definition or goals, to allow for broad support, which came at the cost of a clearly defined purpose and fundamental issues [
1]. There is still no consensus and clarity on the differences between the terms ESG, sustainability, socially responsible investing (SRI) and corporate social responsibility (CSR) [
26]. However, the results of a text mining review of the literature reveal commonalities between the terms CSR and ESG, as well as clear differences. CSR mainly refers to corporate activities focusing on “responsibilities and obligations”, while ESG refers to corporate activities focusing on the “performance of companies, shareholder, and stakeholders” [
29]. Starks [
26] proposes a similar distinction, differentiating based on motivations in the context of investment returns, and how expectations may differ based on ‘value’ versus ‘values’. A value-focused investor will incorporate ESG in the context of returns and risk management. Conversely, values investors adopt a negative screening strategy and evaluate the consequences of business operations. This is further reflected in the varied uses of the term ESG. Initially, it referred to the integration of ESG factors in investment analysis. However, the term has evolved to encompass risk management, functioning as a synonym for corporate social responsibility and sustainability or as an expression of ideological preference [
1].
To illustrate the importance of ambiguity in ESG definitions, we provide two examples. Firstly, it affects the interpretation of the relationship between ESG and financial performance. A meta-analysis of over 2200 empirical studies revealed non-negative results in 90% of cases, with more than 2100 of these demonstrating positive outcomes [
9]. Similarly, a comprehensive review of over 1000 studies produced similar mixed results (approximately 90% non-negative for both financial and investment performance) while also highlighting issues relating to the unclear specification of different elements [
10]. While the results demonstrate an impact, it is important to acknowledge Starks’ [
26] observation that the discrepancies between investment funds’ objectives and approaches, and the absence of differentiation between these different types of funds in empirical studies, complicate the assessment of their effects on investment decisions and society. Secondly, it results in diverging ESG ratings [
7,
8,
30], as is discussed in
Section 5.
We define ESG at two levels. Firstly, we define ESG in relation to performance and risk at a conceptual level. Secondly, we define the individual factors within the scope of ESG. As Wan et al. [
31] conclude, there is no academic definition of ESG. Therefore, for the purpose of this paper, we relate the definitions to existing research that contextualizes ESG within existing frameworks. We assume that this approach provides robustness and leverages the relationship between existing frameworks and outcomes.
From an ESG performance standpoint, we adopt Edmans’ [
32] argument that ESG factors are extremely important and, at the same time, nothing special in comparison to other intangible assets (e.g., management quality, corporate culture, and innovation capability) when it comes to creating long-term financial and social value. Although IFRS [
33] (p. 8) defines an intangible asset as “an identifiable non-monetary asset without physical substance”, which is open to some interpretation, ESG factors do not meet the other requirements. While the validity of this approach in defining ESG performance remains uncertain, it could be argued that ESG factors and intangible assets are somewhat similar in nature and behavior. For example, both ESG and intangible assets impact brand and reputation. Additionally, a key point of view is that the impact of ESG can stem from both risks and opportunities [
28]. In this context, we define ESG performance as the quality of company practices relating to environmental, social, and governance factors that are improved through the management of opportunities and risks to create long-term value for the company. The proposed definition also incorporates risks, enabling us to relate the performance literature to the scope of risks. It also allows for open interpretation in relation to sustainable investing strategies.
A systematic review of literature on ESG risks indicated that, although the literature on the topic is growing, it is currently scarce and scattered [
34]. Searches in academic databases reveal that the most frequently cited papers are related to either ESG risks in asset allocation or ESG practices and their impact on evaluation, which is in line with the existing review [
34]. As Pollman [
1] and Starks [
26] point out, ESG risks are considered in the context of risk and return opportunities at company and portfolio levels. Based on this, we can infer that this is the reason why the literature on ESG risks is somewhat limited. Similar to the definition of ESG performance, the proposed definition of ESG risks is set within an existing framework. The definition of risk is a key issue when researching ESG and risks, because different studies use different types of risk, which limits general understanding of ESG risks [
34]. One of the new terms in the risk taxonomy of the financial services industry is ‘non-financial risk’, which denotes new or emerging risks that are defined based on exclusion [
35]. An example of this is Khan [
36] (p. 14), who states in an International Monetary Fund working paper that “nonfinancial risk management deals with all risks that are not financial (credit, liquidity, market, interest)”. However, the financial impact does need to be measured [
35]. As a starting point, we use the definition of ESG risks provided by the European Banking Authority [
37] (p. 6), which defines them as “risks of any negative financial impact on the institution stemming from the current or prospective impacts of ESG factors on its counterparties or invested assets”, and which materialize through the traditional categories of financial risk (credit risk, market risk, operational and reputational risks, and liquidity and funding risks). Additionally, ESG risks can be categorized as novel risks, which Kaplan, Leonard, and Mikes [
38] (p. 2) define as follows: “novel risks arise from unforeseen events, from complex combinations of apparently routine events, and from apparently familiar events occurring at unprecedented scale and speed”. The European Central Bank [
39] also classifies environmental risks as novel risks due to the lack of historical data. Based on this, the proposed working definition is that environmental, social and governance risks are risks of any negative financial impact that stem from ESG factors, having the characteristics of non-financial risks due to their origin, and of novel risks due to their lack of history. In the context of ESG ratings, one of the identified purposes is to evaluate risks and identify misconduct [
40], which can then be used for screening [
41].
Within the scope of individual factors, studies point to differences between ESG and sustainability frameworks [
42,
43]. The European Banking Authority goes further by defining ESG factors as “Environmental, social or governance matters that may have a positive or negative impact on the financial performance or solvency of an entity, sovereign or individual”, and listing frameworks that address these factors. Additionally, it provides examples of ESG factors in the most commonly used frameworks, and summarizes common areas for environmental (water usage and consumption, waste management and production, energy consumption, pollution, biodiversity, GHG emissions), social (labour and workforce considerations, human rights, inequality, discrimination, gender equality), and governance (rights and responsibilities of directors, remuneration, bribery and corruption) factors [
37]. Although we advocate for clear definitions because they facilitate understanding and enable easier comparisons, we also advocate for some flexibility regarding the factors, unless they are regulated. This argument is based on the aforementioned trend of investors becoming more selective about the relevance of factors. Therefore, we argue that investors and other stakeholders, including values investors, should prioritize topics that generate the most significant value. Moreover, this assertion is reinforced by the politicization of ESG as a whole [
2,
32], with social factors playing a particularly prominent role in this regard [
44].
3. ESG Reporting and Data
This section provides an overview of the characteristics, issues, and developments related to this topic, with a particular focus on ESG reporting, which is the primary means through which companies disclose data [
45].
Most of the actions taken to improve reporting and data have focused on establishing reporting guidelines and standards. The first steps were taken by various non-governmental and private organizations, including the Global Reporting Initiative (GRI), the International Integrated Reporting Council (IIRC), the Climate Disclosure Standards Board (CDSB), the Task Force on Climate-related Financial Disclosures (TCFD) and the Sustainability Accounting Standards Board (SASB) [
46]. Survey data show that reporting grew amongst the G250 (the world’s 250 largest companies by revenue based on the 2023 Fortune 500 ranking) and the N100 companies (the top 100 companies worldwide by revenue) in the 2010s [
47], remaining stable at 96% (77% of which used GRI and 56% SASB) and 79% (71% of which used GRI and 41% SASB), respectively. Significant differences were observed based on the level of development of the country or region [
48]. A similar trend was observed among Russell 1000 companies, with 93% of companies reporting on ESG matters in 2023, including 98.6% of the larger half and 87% of the smaller half of companies reporting. Of these, 81% used GRI and 56% used TCFD [
49].
While some of the proposed guidelines are compatible (for instance, the GRI covers stakeholders’ interests, while the SASB covers investors’ interests [
50]), differences remain, which are being addressed through the alignment of different standards. Examples include the Corporate Sustainability Reporting Directive (CSRD) and the GRI [
51], as well as the European Sustainability Reporting Standards (ESRS) and the International Sustainability Standards Board (ISSB) [
52]. However, it is suggested that the probability of convergence in the short term is limited [
53]. As mentioned above, the CSRD came into effect in the European Union on 5 January 2023. This would make the CSRD mandatory for 50,000 companies that are either based in the EU, have subsidiaries there, or are listed on a regulated market. It would require the use of the EU’s reporting standards and taxonomy, as well as the application of the ‘double materiality’ principle and independent assurance [
54]. This changed on 26 February 2025 with the first omnibus package, which delayed the inclusion of a broader scope of companies and simplified the scope of reporting. The proposal aims to reduce the reporting burden and to limit the trickle-down effect of obligations to smaller companies [
11]. Similarly, the final rule adopted in the USA on 6 March 2024 required large, publicly traded companies to disclose information on climate action, greenhouse gas (GHG) emissions, and the financial impacts of severe weather events [
55]. On 27 March 2025, the Securities and Exchange Commission ended its defense of the rule [
12]. We can assume that these actions will probably slow down improvements in ESG reporting and ratings. However, given the current economic situation and the absence of the required infrastructure, it makes sense to reduce the reporting burden on companies.
ESG reports are not the only source of ESG data. Visalli et al. [
56] categorize ESG data as either primary or secondary, and then further categorize the data according to source. Primary data is divided into three categories: self-reported ESG data (i.e., all the data disclosed by the company itself), third-party ESG data (i.e., data from non-governmental and governmental websites and reports) and real-time ESG signals (e.g., news, social media and company reviews). Secondary data comes from ESG data vendors or providers, who collect, systematize, and analyze ESG attributes from primary data sources. Li and Polychronopoulos [
57] developed a three-tiered framework to help categorize the providers and understand the different types of ESG rating data. Fundamental providers collect and aggregate publicly available data from the aforementioned primary sources but do not provide company-wide ESG scores. Comprehensive providers utilize objective and subjective data from ESG market segments and have their own overall rating methodology. This methodology combines publicly available data with data produced by their own analysts through company interviews, questionnaires, and independent analysis. This produces hundreds of different metrics. Specialist providers focus on specific ESG issues such as environmental/carbon, corporate governance or human rights. Most providers fall into the comprehensive category.
Issues with data can arise at several points. Firstly, they can arise during the creation process. For example, self-reported data may be subject to biases, since companies can choose what information to report, while real-time signals may be influenced by misleading news from competitors. Secondly, immediate issues may arise regarding format and access, resulting from different reporting standards and a common source. This relates to primary data being available but unstructured, while secondary data is limited, untimely, and lacking in transparency [
56].
Research into reporting has identified many different challenges, which have been categorized into four groups: (1) behavioral challenges highlighting resistance to governance-related issues, (2) data-based challenges relating to operational issues concerning data collection, costs and handling, (3) methodological challenges emphasizing the technical limitations of various impact valuation accounting methodologies in the context of environmental and social issues, and (4) contextual challenges arising from the complexities of socio-ecological economic systems characterized by heterogeneity, interconnectedness and the emergence of global patterns [
58].
With regard to reporting and measurement, Kaplan and Ramanna [
5] (p. 3) state that reporting “must be based on the distinctive measurement issues in each of its dimensions to become as relevant and reliable as financial reporting”. Environmental reporting should start with topics that are easier to measure such as greenhouse gases (GHGs), rather than equally relevant topics that are more difficult to measure, such as non-GHGs. These findings could then be applied to other topics. When reporting on social topics, the focus should be on areas of general consensus, such as reducing unsafe working conditions, eliminating child and slave labor, and preventing bribery and corruption. Conversely, governance is viewed as a process rather than an outcome, and as such, it should be incorporated into financial, environmental, and social reporting [
5].
The general quality of ESG data reveals similar issues from a different perspective. When it comes to ESG data quality, several key factors must be taken into consideration, many of which overlap. Jonsdottir et al. [
59] identify four principles: materiality, accuracy, reliability, and comparability. The Global Reporting Initiative [
60] provides a more extensive list of reporting principles, including accuracy, balance, clarity, comparability, completeness, sustainability context, timeliness and verifiability. Alternatively, Monk, Prins, and Rook [
61] define reliability (covers accuracy, precision, and verifiability), granularity, freshness, comprehensiveness, actionability and scarcity. In addition, Kotsantonis and Serafeim [
4] identify four limitations of ESG data: (1) inconsistencies in metrics; (2) lack of clarity in peer group definitions; (3) differences in imputation methods; and (4) increased disagreement as more information becomes available.
Research has found that companies implementing SASB disclose more on topics related to their business, which has a positive effect on disclosed information. Companies disclosed 48% more information on material topics and provided more information on topics deemed material for their sector. Additionally, they also increased the proportion of material information by 11%. However, reports remain varied due to the absence of audits, mandates and regulations [
45]. This is further supported by the finding that companies that adhere to guidelines disclose 39% more information. It is also noted that most guidelines use process-focused verification rather than content-focused verification. However, companies prefer the former, which also results in better information [
62]. In addition, a study on the topic of audits found that the scores of audited companies remained stable, whereas those of non-audited companies tended to decline. This suggests that third-party auditing of reports helps to ensure the quality of a company’s ESG information [
63].
As this paper focuses on evaluation, it is important to make an additional note about the continuity and consistency of ESG reporting, and therefore also about the ESG data. Firstly, it is important to note that regulatory reporting, both in the EU and the United States, is not a standalone occurrence, as there have been developments in all regions [
47]. While there is a certain level of alignment between regulatory reporting [
64], and, as mentioned above, also between voluntary standards, this brings changes to reporting. Secondly, voluntary standards such as the GRI [
65] and the SASB [
66] have also evolved over the years. While the topic of changes to reporting standards and requirements falls outside the scope of this paper, it is very important when it comes to the evaluation of ESG risks, given that inputs are subject to change. This is particularly important in the context of utilizing artificial intelligence methods, especially large language models.
4. Evaluation of ESG Risks and Performance
The evaluation of ESG risks in the form of ratings is important because it summarizes a company’s risk or performance within a single measure. While these ratings are of great conceptual importance, academic research and the media are questioning their effectiveness due to a high degree of disagreement between providers [
6]. It is important to note that the literature does not specifically discuss ESG risks, but as discussed in
Section 2, these risks are inherently captured within the scope of ESG and, therefore, within the scope of the ratings. Examining some of the largest providers, such as MSCI [
67], Sustainalytics [
68], S&P Global [
69] and LSEG [
70], it is evident that they mention measuring risk in their methodologies. The focus of the existing literature is largely on issues and divergence, and the research is recent and scarce. For this reason, we organize the literature review around these topics, breaking down the issues of ESG ratings step by step.
Starting with the widely discussed issue of divergence, the existing literature often refers to credit ratings, which are highly correlated, with a correlation around 90% [
30] and higher [
71]. The correlation among ESG ratings is substantially lower, with average values ranging from around 30% to 50%, and the highest values reaching approximately 70% [
7,
8,
30]. As demonstrated by Charlin, Cifuentes and Alfaro [
30] in their study, applying measure theory techniques results in a reliability and a level of agreement of 18.3% and 5.4%, respectively. This emphasizes the inherent subjectivity of any ESG evaluation. While the diverse needs of clients are an argument for different methodologies and approaches, the varying degree of transparency of rating methodologies limits the comparability and understanding of the reasons for divergence [
72]. Another view on divergence points out that explanations for differences lie in social contexts, such as history, ownership, establishment, and the purpose of providers. These contexts influence values and the definition of materiality, as well as the social context of the forming assessments, all of which are reflected in the final evaluation [
2]. The studies mentioned in this paragraph were conducted using a sample of 10–15 major providers, including MSCI, Sustainalytics, RobecoSAM, LSEG (formerly Refinitiv), FTSE Russell, Thomson Reuters, CDP and ISS. However, some studies suggest that the total number of providers is actually between 125 and 150 [
24], or even up to 600 [
73]. While the consolidation of providers is generally considered positive in terms of heterogeneity and increased standards, it also carries the risk of creating an oligopoly, as has been observed with credit ratings [
72].
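The correlation figures quoted above are averages over pairs of providers rating the same companies. As a minimal sketch of that calculation, assuming each provider's scores are available as a list aligned by company, the pairwise Pearson correlations and their average could be computed as follows. The provider names and scores are hypothetical placeholders, not data from any of the cited studies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(ratings):
    """Return {(provider_a, provider_b): correlation} for every provider pair."""
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical scores for the same five companies from three made-up providers.
ratings = {
    "provider_a": [72, 55, 40, 88, 63],
    "provider_b": [65, 60, 35, 70, 80],
    "provider_c": [50, 45, 62, 75, 58],
}

correlations = pairwise_correlations(ratings)
average_correlation = sum(correlations.values()) / len(correlations)
for pair, value in correlations.items():
    print(pair, round(value, 2))
print("average", round(average_correlation, 2))
```

With real data, the same companies would have to be matched across providers first, and scores on different scales can be compared directly only because correlation is unaffected by linear rescaling.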
In terms of the construction of ratings, ESMA [
72] (p. 106) categorizes them as either ESG risk ratings, which measure “the exposure of entities to ESG risks and how these risks are managed”, or ESG impact ratings, which measure “the impact of entities on ESG factors”. It is further explained that, as they are built on comparable methodologies and metrics, there are only small differences between them, and that they can also be categorized as either backward-looking or forward-looking. Building on the definitions framework of this paper, we adopt the definition of ratings provided by ESMA [
74] (p. 3), according to which an “ESG rating means an opinion regarding an entity, issuer, or debt security’s impact on or exposure to ESG factors, alignment with international climatic agreements or sustainability characteristics issued using a defined ranking system of rating categories”. Based on our review of the literature, we have identified three common themes in ESG ratings, relating to (1) the construction or methodology of ratings, (2) disclosures, and (3) biases.
4.1. Construction or Methodology of Ratings
Differences have been identified in almost all aspects of ratings construction. This begins with the various ESG definitions of different providers. Stewart [
27] analyzed the definitions of thirteen rating agencies. Based on this analysis, he found that only Moody’s, RepRisk, and Sustainalytics explicitly measure risk, albeit under different definitions of risk. The definitions and ratings from CDP and Just Capital focus only on a subsegment, while those of the remaining providers are less explicit about what the ratings intend to measure. The definitions of Bloomberg, EcoVadis, FTSE Russell, Refinitiv and Sustainable Fitch do not explicitly address risk exposure and focus on performance. ISS, MSCI and S&P Global, however, capture both risk and performance. Chatterji et al. [
75] analyzed rating divergence based on two preconditions. The first precondition is theorisation relating to what raters choose to measure, or common theorisation referring to agreement on a common definition. The second precondition is commensurability, which refers to an overlap in how they measure. The results showed limited common theorisation and low commensurability, even after adjusting for explicit differences. Research built on these findings notes that the decomposition of ESG ratings is not straightforward, as the structures relating to the different indicators and how these are organized into hierarchies differ [
21]. While some providers break down the first level of the hierarchy only into environmental (E), social (S) and governance (G) factors, others take different approaches [
7,
27]. Others break them down further to the level of ESG subcomponents [
27], and even to the level of metrics or indicators [
8]. This reveals additional differences between ratings, as these include from 37 to 300 or more metrics [
7,
8], as well as different main risk factors [
8]. To enable meaningful comparisons between five providers, Berg, Kölbel and Rigobon [
7] imposed their taxonomy and assigned 708 indicators to 64 categories. Raters’ unique indicators were labelled as ‘unclassified’. The authors further studied the measurement divergence by comparing the assessment of the categories. Due to the common categorization, the resulting differences can be attributed to measurement divergence, which is also present in unambiguous categories and increases with lower granularity. In the final decomposition, the weights are also considered, resulting in contributions to the total divergence. Measurement accounts for 56% of the divergence, scope accounts for 38%, and weights account for 6%. Differences between pairs of providers also provide insight into the areas of difference. The analysis also identifies the categories that contribute most to the overall divergence: Climate Risk Management, Product Safety, Corporate Governance, Corruption and Environmental Management Systems. A different study that decomposes ESG ratings into values and weights at the first and second levels produces different results. The main driver of divergence in this study is weight divergence of social and governance indicators [
76].
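To make the three sources of divergence concrete, the following minimal sketch splits the gap between two ratings into scope, measurement and weight terms. It assumes that each rating is a simple weighted sum of category scores and uses an exact two-rater identity based on average weights and average scores. This is an illustrative simplification, not the estimation procedure used in the cited studies, and all category names, weights and scores are hypothetical.

```python
def aggregate(weights, scores):
    """Weighted sum of category scores for one rater."""
    return sum(weights[c] * scores[c] for c in weights)


def decompose_gap(weights_a, scores_a, weights_b, scores_b):
    """Split rating_a - rating_b into scope, measurement and weight terms."""
    common = set(weights_a) & set(weights_b)
    only_a = set(weights_a) - common
    only_b = set(weights_b) - common

    # Scope: categories that only one of the two raters covers.
    scope = (sum(weights_a[c] * scores_a[c] for c in only_a)
             - sum(weights_b[c] * scores_b[c] for c in only_b))
    # Measurement: different scores on shared categories, at the average weight.
    measurement = sum(
        (weights_a[c] + weights_b[c]) / 2 * (scores_a[c] - scores_b[c])
        for c in common)
    # Weight: different weights on shared categories, at the average score.
    weight = sum(
        (scores_a[c] + scores_b[c]) / 2 * (weights_a[c] - weights_b[c])
        for c in common)
    return {"scope": scope, "measurement": measurement, "weight": weight}


# Hypothetical raters: two shared categories and one exclusive category each.
weights_a = {"emissions": 0.5, "labour": 0.3, "board": 0.2}
scores_a = {"emissions": 60.0, "labour": 70.0, "board": 80.0}
weights_b = {"emissions": 0.4, "labour": 0.4, "corruption": 0.2}
scores_b = {"emissions": 50.0, "labour": 75.0, "corruption": 40.0}

parts = decompose_gap(weights_a, scores_a, weights_b, scores_b)
gap = aggregate(weights_a, scores_a) - aggregate(weights_b, scores_b)
print(parts)
print("total gap:", gap)
```

The three terms add up to the total gap by construction, which is what allows each source to be expressed as a share of the divergence. Individual terms can be negative, so the shares in a real application depend on how offsetting effects are treated.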
Furthermore, we can identify additional differences contributing to the divergence. One example is the definition of materiality, which differs between providers and is proprietary. This can lead to further divergence in weighting, as the final ratings are adjusted by integration of industry-specific issues. However, very few providers are transparent about the weights [
8]. Boffo and Patalano [
77] also consider materiality in the construction of ratings, while highlighting factors relating to the qualitative expert judgement involved in the derivation of indicators and the weighting of inputs. Furthermore, they highlight the importance of incorporating controversies, which some providers include in the final rating, whereas others treat them as a standalone rating. Differences can also be found in the data sources and the rating specifications (e.g., CCC–AAA or 0–100) [
8]. Another aspect is the categorization of metrics by type (input, output, outcome and process), which also varies between providers [
78]. It is also important not to overlook the lack of transparency in some providers’ methodologies [
72], as this makes identifying the differences much more difficult, or even impossible. Studies analyzing consistency and convergent validity [
79], and dimensionality, reliability and validity [
80], also reach similar conclusions. Another problematic aspect of methodologies is the rewriting and amendment of ratings. Changes to historical ESG scores from Refinitiv have been documented for the same firms over the same periods. This occurred due to changes in the calculation methodology and unannounced data modifications. Depending on the data used, there can be significant changes in rankings and classification, moving towards ex-post upgrades. The argument is that this creates an incentive to introduce a positive relationship with returns [
81]. It is also emphasized that changes in methodologies can lead to rating upgrades, as well as commonalities in upgrades, such as the adoption of different policies [
82].
4.2. Disclosures
While disclosures can relate to the reporting and data topics discussed in
Section 3, they can also relate more generally to how much and how companies disclose. Opinions on these disclosures are not fully aligned. The study by Christensen, Serafeim and Sikochi [
6] provides extensive insight into the factors that affect disagreement. Firstly, they demonstrate that increased disclosure is associated with greater disagreement, as it provides more opportunities for interpretation and subjectivity. Conversely, when faced with limited data, raters tend to rely on similar imputation methods or rules of thumb, for example assuming that a lack of reporting indicates poor performance. Secondly, there is more disagreement regarding environmental and social factors, as well as regarding companies with poor ESG scores. Thirdly, disagreement also depends on the type of metric. There is less disagreement about ratings based on inputs (i.e., what companies are trying to achieve) than on outputs (i.e., actual company performance), as it is unclear what constitutes good or bad performance. The discrepancies are the greatest when the two are compared [
6]. Similarly, it was found that greater quantitative disclosures, particularly with respect to the environmental and social pillars, resulted in greater divergence, whereas standardized, comparable, and more numerical information reduced disagreement [
83]. Greenwashing practices and compromised information quality can inadvertently lead to more disagreement [
84]. In addition, there was an increase in divergence, particularly in the context of environmental factors. This relationship was found to be more pronounced in circumstances involving greater availability of private information, lower rating agency accuracy, more volatile and complex operating activities, and regions experiencing heightened ESG concerns [
85].
However, much recent research indicates positive effects. Voluntary ESG reporting, particularly with respect to the environmental and social dimensions, and longer reports reduced disagreement [
22]. Furthermore, improved information on biodiversity has also been found to reduce divergence. The results are more pronounced for companies that are less covered by analysts and for those that report voluntarily or independently [
86]. Similarly, voluntary disclosures relating to the environmental and social dimensions effectively reduced divergence, with more pronounced results for companies with higher quality disclosure, better governance practices, external auditor assurance, and compliance with GRI standards [
84]. Equity-based compensation was also found to be associated with lower disagreement, as equity pay improves the quality of disclosures and practices. However, this depends on additional factors [
87]. ESG rating divergence was found to have a negative impact on performance forecasts. Therefore, companies are compelled to enhance precision and adapt to the external environment in relation to strategic information disclosures [
88]. Positive expressions in ESG reports lead to greater divergence when the reports are considered low in credibility [
89]. In contrast, negative expressions enhance consistency and remain unaffected by credibility. A positive tone and ‘sticky’ words are also associated with greater disagreement [
22]. Cross-listing has been found to significantly reduce disagreement through multiple channels that improve disclosure [
90]. Although adopting reporting standards does not affect disagreement over ESG ratings, adopting assurance practices reduces it [
91]. Similar findings also show that there is less disagreement when firms obtain third-party attestations, especially when the external assurers are accounting firms and when the firms adhere to higher levels of GRI reporting standards [
22].
4.3. Biases
Biases are important because they skew the ratings based on certain criteria. Within the scope of risk management, this implies the possibility of undervaluing certain risks. Based on existing research, the most important criteria are size, geography, and industry [
23,
40,
92].
In terms of company size, Doyle [
40] found that companies with higher market capitalizations had higher ratings than those with lower ones. Similarly, portfolios consisting of larger companies tend to have higher average ESG ratings than those consisting of smaller companies [
93]. These differences can be explained by greater resources available for ESG initiatives and the more extensive disclosure of data [
94]. In addition, there are two interpretations of the difference. One is that larger companies are more sustainable than smaller ones. The other is that larger companies have an advantage when it comes to measuring sustainability, due to their size, as well as the resources to provide ESG data and make company ESG performance available [
95]. A comparative study suggests that large multinational corporations are particularly advanced in making extensive public commitments to CSR and publishing comprehensive reports. In contrast, small and medium-sized enterprises (SMEs) are particularly advanced in implementing CSR-related practices into organizational processes and procedures, such as engaging employees [
96].
There is agreement that a geographical bias exists, with European companies receiving higher ratings [
40,
93]. These ratings are attributed to mandatory reporting requirements, which improve the quality of the data [
23,
93]. This is further supported by research on the risks associated with ESG scores, in which Europe is classified as low risk; North America, Africa, Australia and Oceania as medium risk; and Asia and South America as high risk [
97].
In the case of industries, the majority of sustainability-related ratings are normalized by industry to account for differences in materiality. However, this approach oversimplifies the situation by assuming that all companies in the same industry face the same risks, rather than considering company-specific factors. This approach introduces biases, such as favoring sectors where emissions are easier to reduce, penalizing extractive sectors, and disadvantaging companies that do not fit well into the categorization [
92].
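The effect of industry normalization can be illustrated with a minimal sketch. It assumes a simple within-industry z-score, which is only one possible normalization; providers' actual procedures are proprietary and may differ. The company names, industries and raw scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_industry(companies):
    """Return company -> z-score of its raw score relative to industry peers.

    `companies` is a list of (name, industry, raw_score) tuples.
    The z-score is an assumed, illustrative normalization.
    """
    by_industry = defaultdict(list)
    for _, industry, score in companies:
        by_industry[industry].append(score)

    normalized = {}
    for name, industry, score in companies:
        peers = by_industry[industry]
        spread = pstdev(peers)
        # With no variation among peers, every company sits at the average.
        normalized[name] = 0.0 if spread == 0 else (score - mean(peers)) / spread
    return normalized


# Hypothetical raw scores on a 0-100 scale.
companies = [
    ("A", "energy", 30),
    ("B", "energy", 50),
    ("C", "software", 60),
    ("D", "software", 80),
]

normalized = normalize_by_industry(companies)
print(normalized)
```

In this example, company B has a lower raw score than company C but a higher normalized score, because each company is compared only with peers in its own industry. The normalized figure therefore says nothing about company-specific risks that differ from the industry pattern.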
Some other biases are not discussed as widely. The first of these is the relationship between companies and providers, whether through the data verification process [
92], through consulting [
98], or a bias towards existing clients of credit rating agencies [
99]. Additionally, the literature highlights the ‘Rater Effect’, whereby evaluators tend to rate companies more highly in other categories if they have rated them positively in one category. However, the evidence for this effect is inconclusive [
7]. Lastly, there is the language bias, whereby companies that only report in the local language could receive worse ratings as documents would be excluded. However, this is supposedly not the case for larger rating providers [
92]. Additionally, broader factors, such as operating in a critical sector or being located in a country with better institutions, have been shown to reduce disagreement on ESG issues. This could have further implications for other biases [
100].
5. Discussion
In reviewing the literature for the purpose of this paper, we found that the literature on the topic of ESG risks is scarce and scattered. This is consistent with the result of the systematic literature review [
34]. As mentioned above, ESG risks are largely integrated within ESG performance [
26], with similarities identified in terms of definitions, data sources, and reports. Additionally, as Pollman [
1] points out, risk management is considered one approach to applying ESG and is something that is frequently referenced by rating providers. Thus, we did not conduct a systematic literature review but instead focused on analyzing a broader range of literature relevant to the topic of evaluating ESG risks. This broader scope of literature includes papers related to ESG performance, based on the assumption that, because performance and risk factors are integrated, the studies of ESG performance will have implications for the evaluation of ESG risks. Additionally, systematic reviews have already been conducted [
34,
101,
102]. Based on an analysis of existing literature, we can identify numerous issues and challenges related to ESG risks, from fundamental issues to the evaluation process, which are discussed throughout this paper. Consequently, this paper is structured to address these issues sequentially, beginning with definitions that provide a framework for the rest of the paper.
Firstly, existing research highlights the lack of consensus and clarity surrounding the definition of ESG in general [
26] and ESG risk specifically [
35]. Additionally, it is noted that lack of consensus and clarity makes it difficult to compare different studies [
26,
34]. The main divisions are based on terminology (e.g., ESG, CSR, sustainability) and impact (e.g., value versus values, impact on financial returns of investors versus impact on society). However, it is important to note that there is no clear distinction between them, as discussed in the literature [
1,
26]. Because this paper focuses on the evaluation of ESG risks, we use the term ESG and ESG value for investment returns, which are more in line with the literature [
1,
26,
37]. In developing the definitions, we aligned them with the literature that leverages the existing frameworks, which are the intangible assets for ESG performance [
32], and the scope of financial, non-financial and novel risks for ESG risks [
37,
38,
39]. Considerable differences exist between the frameworks regarding specific factors to be included [
37]. This creates another layer of issues relating to two questions: (1) What can impact financial performance? and (2) What is relevant to investors and stakeholders? In the context of ESG risks, the topics should relate to issues that have a negative financial impact, such as regulation, reputation and controversies. Similarly, there is no clear way to distinguish between these questions. Although aligning with existing frameworks and distinguishing between ‘value’ and ‘values’ can be helpful, these approaches do not solve the fundamental problem. Therefore, the proposed working definitions are not a solution to the complex problem at hand. Instead, they serve as tools and frameworks for analysis, and as guidance for other researchers and practitioners, demonstrating that definitions must be sufficiently precise and unambiguous.
Secondly, issues relating to reporting and data can be traced back to a lack of clear definitions and objectives. Existing research points to this in discussion of metrics, materiality and reporting [
5,
35]. Without a clear scope, it is difficult to determine what and how to measure. Additionally, differences in materiality between sectors introduce an additional dimension, making alignment even more challenging. Based on our analysis, there are four main issues related to reporting and data. These are (1) a lack of a clear definition of ESG and relevant factors [
4,
5]; (2) a lack of a clear understanding of what needs to be measured [
4,
5]; (3) too many approaches to measuring all of the factors, a lack of clear understanding of what is material, and a lack of regard for what can be easily measured [
5]; and (4) a lack of mechanisms for ensuring the accuracy of this data [
62,
63]. Another issue is that because of its novelty, the reporting process can be a significant burden, particularly for small and medium-sized companies. This issue has also been recognized by the European Commission [
11]. Based on our analysis, the issues of data and reporting represent a major link between definition and evaluation of ESG because of the lack of unified information from companies and data providers. However, it is important to note that the issue of ESG rating disclosures is widely discussed.
Thirdly, despite the limited literature, existing studies provide a good overview of the issues associated with the evaluation of ESG risks or ESG ratings. Based on our analysis, common themes emerge between the studies and the issues, which we have grouped into three categories: construction of ratings, disclosures and biases. Additionally, there are commonalities between the specifics of the issues. In the construction of ratings, this relates to definitions [
27], the metrics used [
7,
8,
27], contributions to divergences [
7,
76], differences between providers, the rewriting of ratings [
81,
82], and other elements. Furthermore, we identify three additional issues: (1) a lack of transparency in rating methodologies, which makes comparisons difficult; (2) the role of disclosures, where different types of disclosures can worsen the ratings and there is a high degree of dependence on the characteristics of disclosures [
6], which can skew the evaluation; and (3) a number of biases that have been proven to affect company ratings, most notably size, geography and industry [
23,
40,
92]. Investors and corporations also articulate their issues and expectations in surveys [
24,
25]. The recurring issues and their specifics, identified across multiple studies, indicate that there are common issues that need to be addressed. The main contribution of this paper lies in its deconstruction of issues related to the evaluation of ESG risks. Elements such as evaluation [
7,
8,
27] and disclosure [
5,
56,
58] have been studied, but currently no overview of these issues exists. In short, the issues are relatively easy to understand, but difficult to resolve. As discussed extensively in this paper, issues that stem from the definition, data, and ratings themselves lead to numerous rating issues. Additionally, factors such as materiality, transparency, expert judgement, controversies, disclosures and biases further complicate these issues. To accurately measure ESG risks, it is essential to account for all of these dimensions wherever possible.
In this paper, we offer a step-by-step deconstruction of the issues surrounding the evaluation of ESG risks, drawing on existing literature. The main issues identified relate to a severe lack of common understanding among all stakeholders. Although measurement theory provides the fundamentals of measurement for the social sciences, we examine metrology to address these issues. Specifically, we refer to the definitions of calibration and measurement provided by the Joint Committee for Guides in Metrology [
103] (pp. 16, 28), which defines calibration as “operation that, under specified conditions, in a first step, establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication” and measurement as “process of experimentally obtaining one or more quantity values that can reasonably be attributed to a quantity”. The application of these principles to ESG risks requires a measurement standard (e.g., greenhouse gas emissions or the negative financial impact of these) to serve as a reference point for what constitutes low risk, along with controlled conditions (e.g., the same scope of emissions, reporting standard and country). Different instruments can then be used to measure elements of ESG risk and subsequently weighted according to preference.
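The two-step logic of calibration followed by weighted aggregation can be sketched as follows. This is a minimal illustration under strong assumptions: the relation between an instrument's indications and the reference values is taken to be linear and is fitted by ordinary least squares, measurement uncertainties are omitted, and all function names, element names and numbers are hypothetical.

```python
def fit_calibration(indications, reference_values):
    """Step 1: relate instrument indications to reference (standard) values.

    Fits reference = slope * indication + intercept by least squares.
    Linearity is an assumption of this sketch; uncertainties are ignored.
    """
    n = len(indications)
    mean_x = sum(indications) / n
    mean_y = sum(reference_values) / n
    sxx = sum((x - mean_x) ** 2 for x in indications)
    if sxx == 0:
        raise ValueError("indications must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(indications, reference_values)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def measure(indication, calibration):
    """Step 2: obtain a measurement result from a new indication."""
    slope, intercept = calibration
    return slope * indication + intercept


def weighted_risk(results, weights):
    """Combine calibrated element results using preference weights."""
    total_weight = sum(weights[name] for name in results)
    return sum(results[name] * weights[name] for name in results) / total_weight


# Hypothetical calibration of one instrument against a reference standard.
calibration = fit_calibration([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
emissions_result = measure(3.0, calibration)

# Hypothetical calibrated results for two elements and investor preferences.
overall = weighted_risk(
    {"environmental": 0.2, "social": 0.6},
    {"environmental": 3.0, "social": 1.0},
)
print(emissions_result, overall)
```

Keeping the calibration step separate from the weighting step reflects the distinction drawn above: the former depends on a shared reference and controlled conditions, whereas the latter expresses the preferences of the user of the rating.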
As previously mentioned, this literature review, together with the discussed principles of measurement, contributes to our ongoing research on evaluating ESG risks based on ESG reports. This review has important implications for the evaluation process. Firstly, it provides us with working definitions that form the basis of the research process and the model. Secondly, it establishes a foundation for understanding ESG reporting by analyzing reporting methodologies and actual company reports. Thirdly, it highlights the limitations of current solutions. The prevailing methodological shortcomings of contemporary solutions are characterized by a lack of transparency and flexibility. As previously stated, this is becoming paramount considering the increasing integration of investor-relevant factors. It should also be noted that the subjectivity of human evaluators is a further significant factor to consider. To address these issues, we propose a model based on artificial intelligence methods that can be flexible in relation to topics, while also providing transparency regarding what is evaluated, and removing the subjectivity of human evaluation.
6. Future Directions of Research
As there are many identified issues and challenges, there are also many potential future directions of research. In general, our analysis demonstrated a limited body of literature on ESG risks. There are many different approaches to understanding ESG risks. We provide specific examples within the scope of the structure used throughout the paper.
Firstly, in terms of definitions, primary efforts should be made to develop a taxonomy that clarifies definitions and differentiates between terms and impacts. Additionally, we believe that all ESG research should take this into consideration to minimize the need for consideration and interpretations based on the research question.
Secondly, in terms of reporting, we have identified the following steps that should be taken to improve ESG reporting: (1) a clear definition of ESG and relevant factors, (2) a clear understanding of what needs to be measured, (3) the construction of a limited number of comparable, generally applicable, and material measures for all companies that provide a basis for a ‘good company’, such as respecting human rights, and which can be measured without requiring excessive resources, and (4) a mechanism for ensuring the accuracy of this data. In this context, an additional step could be to refine the measures to include those that are relevant to different industries, and to construct infrastructure that would limit the reporting burden on companies. We propose that future research should focus on the suggested steps with the aim of simplifying the measurement process and enabling companies to provide this data.
Thirdly, in terms of evaluation, research addressing specific issues related to construction of ratings, disclosures and biases would contribute to the field of ESG risks. This is due to the lack of existing research, as our analysis was largely based on more general papers. However, our research explores the development of ESG ratings as part of academic research. This approach would reduce dependence on ESG rating providers while also enhancing transparency, generating new solutions and establishing a benchmark for analyzing providers’ ratings. This type of research has the potential to make theoretical and practical contributions.
Finally, it is imperative to note that all future research on ESG risks, and related subjects, should align with the aforementioned trends of selectiveness and integration of ESG factors.
7. Conclusions
Since its inception, the field of ESG risks has lacked consensus and conceptual clarity, which complicates understanding, particularly in light of the rapid pace of change. This affects academic research as well as investors and other stakeholders, who must adapt to these changes. In this paper, we deconstruct the issues related to the evaluation of ESG risks. Firstly, the definition of ESG is addressed, and working definitions of ESG risks and performance are developed. Secondly, the issues with data and reporting are addressed. Finally, the issues directly relating to the evaluation of ESG risks are addressed, including construction of ratings, measurement, weights, disclosures and biases.
The results of our analysis show that the literature on ESG is limited, particularly in respect to definitions, data, and evaluation. Therefore, it is necessary to consider a broader range of literature in relation to performance. Firstly, deconstructing the evaluation demonstrates that issues stem from a lack of consensus and clarity, which undermines understanding and comparability of everything built upon these definitions. Secondly, issues with definitions also affect reporting and data, where a lack of consensus creates issues with measures, understanding, and accuracy. Finally, the aforementioned issues converge in ESG risk evaluation and are further complicated by methodological differences, transparency, the influence of disclosures, and various biases. Overall, we conclude that there are numerous issues on multiple levels, resulting in ratings that must be clearly understood before use and used with caution.
This paper makes theoretical and practical contributions. The theoretical contribution lies in the deconstruction of ESG risk evaluation, which provides a framework for identifying issues and possible solutions. The practical contributions of this study, which bear implications for investors and other stakeholders, pertain to the use of existing ESG ratings. The findings of our analysis indicate that investors must clearly understand the definitions and construction of ESG ratings, as significant differences between them may potentially result in suboptimal decision-making.