Abstract
As corporate sustainability reporting evolves into a pivotal resource for investors, regulators, and stakeholders, the imperative to evaluate and elevate ESG disclosure quality intensifies amid persistent challenges like opacity, inconsistency, and greenwashing. This review synthesizes interdisciplinary insights from accounting, finance, and computational linguistics on artificial intelligence (AI), particularly natural language processing (NLP) and machine learning (ML), as a transformative force in this domain. We delineate ESG disclosure quality across four operational dimensions: readability, comparability, informativeness, and credibility. By integrating cutting-edge methodological innovations (e.g., transformer-based models for semantic analysis), empirical linkages between AI-extracted signals and market/governance outcomes, and normative discussions on AI’s auditing potential, we demonstrate AI’s efficacy in scaling measurement, harmonizing heterogeneous narratives, and prototyping greenwashing detection. Nonetheless, causal evidence linking managerial AI adoption to stakeholder-perceived enhancements remains limited, compounded by biases in multilingual applications and interpretability deficits. We propose a forward-looking agenda, prioritizing cross-lingual benchmarking, curated greenwashing datasets, AI-assurance pilots, and interpretability standards, to harness AI for substantive, equitable improvements in ESG reporting and accountability.
1. Introduction
Corporate sustainability reporting has shifted from a peripheral exercise to a central element of firms’ communication with investors, regulators, and broader stakeholders. The proliferation of standalone sustainability reports, integrated reports, and ESG sections within annual reports has created an unprecedented volume of disclosures intended to support capital allocation, regulatory oversight, and public accountability (Kotsantonis & Pinney, 2022). Yet, despite this expansion, persistent concerns remain about the quality of ESG disclosures. Reports often vary in readability, employ heterogeneous frameworks that hinder comparability, provide information of uneven informativeness, and, in some cases, raise credibility issues related to greenwashing (Berg et al., 2022; Christensen et al., 2021). This divergence undermines the utility of ESG reporting for both financial decision-making and policy implementation.
At the same time, artificial intelligence (AI), particularly natural language processing (NLP) and machine learning (ML), has advanced rapidly, offering tools capable of processing large volumes of unstructured text, detecting semantic patterns, and benchmarking disclosure practices across firms and jurisdictions (Schimanski et al., 2024; Velte, 2025). Early work in textual analysis in accounting established that language carries economically relevant signals (Tetlock, 2007; Loughran & McDonald, 2011), and more recent developments in transformer-based models have expanded the frontier, allowing the automated classification of sustainability topics, the detection of forward-looking statements, and the measurement of narrative complexity (F. Li, 2010; Huang et al., 2024). These capabilities suggest that AI may address some of the long-standing challenges of ESG reporting by enhancing measurement fidelity, improving comparability, and supporting regulatory monitoring.
Despite its promise, evidence of AI’s actual contribution to ESG disclosure quality remains mixed. On the one hand, studies show that NLP-derived metrics can explain rating divergence and provide novel insights into sustainability-related communication (Berg et al., 2022; Fischbach et al., 2022). On the other hand, there is limited causal evidence that managerial adoption of AI tools leads to substantive improvements in the credibility or informativeness of disclosures as perceived by stakeholders. Indeed, AI can also introduce new risks, such as the production of polished but superficial narratives, algorithmic opacity in audit contexts, or biases when models trained on English corpora are applied in multilingual settings (Calamai et al., 2025).
The interplay between these trends raises an urgent normative question: Do AI technologies improve ESG disclosure quality for stakeholders, and if so, through which mechanisms and under what governance conditions? To answer this question, this review synthesizes evidence from accounting, finance, information systems, and computational linguistics. We structure the discussion around four operational dimensions of disclosure quality (readability, comparability, informativeness, and credibility) that link directly to stakeholder decision usefulness (Christensen et al., 2021). By integrating methodological advances, empirical findings, and policy debates, we aim to provide a comprehensive assessment of AI’s role in shaping the future of ESG reporting.
This article adopts a narrative review approach. We chose a narrative rather than a systematic review format given the nascent and rapidly evolving nature of AI applications in ESG disclosure, which spans multiple disciplines with heterogeneous methodological traditions.
Methodological Justification and Analytical Framework: The choice of a narrative review methodology is grounded in three key considerations. First, AI applications in ESG disclosure span accounting, finance, information systems, and computer science, each with distinct methodological traditions, terminologies, and publication venues; a narrative approach allows for the flexible integration of these diverse streams, enabling cross-pollination of insights that a rigid systematic protocol might constrain. Second, the rapid evolution of AI technologies, particularly the emergence of transformer-based models and generative AI since 2020, means that the literature is characterized by frequent methodological innovation and conceptual development. A narrative review is better suited to capturing this dynamism and identifying emerging trends that have not yet solidified into stable research paradigms. Third, our objective extends beyond mere synthesis to include normative evaluation of whether AI genuinely improves stakeholder-relevant disclosure quality, which requires interpretive judgment and theoretical integration that narrative reviews facilitate.
Theoretical Framework: Our analytical framework synthesizes multiple theoretical perspectives to assess AI’s contribution to ESG disclosure quality. Drawing on disclosure theory from accounting research (Healy & Palepu, 2001; Verrecchia, 2001), we conceptualize disclosure quality as the degree to which corporate communications reduce information asymmetry and support stakeholder decision-making. We operationalize this through four dimensions: readability (cognitive accessibility), comparability (cross-firm and cross-temporal consistency), informativeness (decision-relevant content), and credibility (reliability and freedom from bias). These dimensions are informed by the qualitative characteristics of useful financial information articulated by the IASB Conceptual Framework, adapted to the non-financial reporting context. Additionally, we draw on legitimacy theory (Suchman, 1995) and impression management theory (Brennan & Merkl-Davies, 2013) to examine the strategic dimensions of disclosure, recognizing that firms may use AI both to enhance genuine transparency and to produce sophisticated but superficial narratives. Signaling theory (Spence, 1973) provides a complementary lens for understanding how AI-enhanced disclosures may serve as credible signals of firm quality when they involve verification costs or observable commitments. This multi-theoretical approach enables a nuanced assessment of AI’s dual potential to either substantively improve or merely cosmetically enhance ESG disclosure quality.
Search Strategy and Databases: We conducted literature searches across Web of Science, Scopus, SSRN, and arXiv between January and December 2024, with periodic updates through February 2025. Primary search terms included combinations of: (“ESG” OR “sustainability” OR “CSR” OR “environmental disclosure”) AND (“artificial intelligence” OR “machine learning” OR “natural language processing” OR “NLP” OR “text analysis” OR “textual analysis”). We also searched for specific methodological terms including “BERT,” “transformer,” “sentiment analysis,” and “readability” in conjunction with sustainability reporting contexts.
Inclusion and Exclusion Criteria: We included peer-reviewed journal articles, working papers from recognized repositories (SSRN, NBER, ECGI), and conference proceedings from major venues (ACL, EMNLP, ICAIF). Studies were included if they: (a) empirically applied or theoretically analyzed AI/NLP methods for ESG disclosure analysis; (b) addressed one or more dimensions of disclosure quality (readability, comparability, informativeness, credibility); or (c) examined governance, assurance, or regulatory implications of AI in sustainability reporting. We excluded purely technical papers without application to corporate disclosure, practitioner reports without methodological transparency, and studies focused exclusively on ESG ratings without textual analysis components.
Screening and Synthesis Process: Initial searches yielded approximately 450 unique records. After removing duplicates and screening titles and abstracts, 187 papers were retained for full-text review. Of these, 61 were ultimately included in the final synthesis, supplemented by foundational works in accounting disclosure theory, textual analysis methodology, and AI interpretability identified through backward citation searches. Papers were coded according to the four disclosure quality dimensions (readability, comparability, informativeness, credibility), methodological approach (dictionary-based, supervised ML, transformer-based, hybrid), and empirical context (jurisdiction, industry, data source). This coding facilitated thematic synthesis across sections while identifying gaps and contradictions in the literature.
As a narrative review, this study does not employ the formal protocols of systematic reviews (e.g., PRISMA guidelines, risk-of-bias assessment). The literature on AI and ESG disclosure is characterized by rapid methodological evolution, with significant contributions appearing in preprint form. Our inclusion of working papers and preprints reflects this reality but introduces uncertainty about findings that have not undergone peer review. Additionally, our search strategy may underrepresent non-English scholarship, a limitation we acknowledge given the review’s critique of Anglo-centric biases in the underlying literature.
2. Evolution of AI Applications in ESG Disclosure Research
The application of artificial intelligence in the analysis of ESG disclosures has evolved through several distinct but interconnected phases, each reflecting broader developments in both computational methods and sustainability reporting practices. Early contributions emerged from the textual analysis tradition in accounting and finance, which demonstrated that corporate narratives contain decision-useful information. Tetlock (2007) provided seminal evidence that media pessimism predicts downward pressure on market prices, while Loughran and McDonald (2011) developed domain-specific dictionaries that substantially improved the measurement of tone and readability in financial filings. These studies laid the methodological foundation for applying computational linguistics to corporate texts and inspired subsequent work exploring non-financial narratives such as sustainability reports.
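To make the dictionary tradition concrete, the minimal sketch below computes a net-tone score in the spirit of Loughran and McDonald (2011); the word lists are short illustrative placeholders for the authors’ much larger lexicons, and no stemming is applied.

```python
import re

# Illustrative placeholders: the actual Loughran-McDonald lists contain
# thousands of (inflected) entries and are distributed by the authors.
LM_POSITIVE = {"achieve", "improve", "strong", "innovative", "benefit"}
LM_NEGATIVE = {"loss", "litigation", "impairment", "adverse", "violation"}

def lm_tone(text: str) -> float:
    """Net tone: (positive - negative) word counts scaled by total words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in LM_POSITIVE for t in tokens)
    neg = sum(t in LM_NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(lm_tone("Strong progress will benefit stakeholders despite litigation risk."))
```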
The second phase coincided with the mainstreaming of corporate social responsibility (CSR) reporting in the early 2010s. Researchers began using classical machine learning classifiers, such as Naïve Bayes and support vector machines, to identify forward-looking statements and measure textual informativeness in management disclosures (F. Li, 2010). In sustainability contexts, studies examined the readability and credibility of CSR reports, showing that linguistic complexity and narrative length can influence investor perceptions and firms’ reputational outcomes (Wang et al., 2018; Melloni et al., 2017; Muslu et al., 2019). While these methods primarily captured surface-level textual features, they marked the first attempts to systematically quantify ESG narratives and link them to capital market consequences.
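A minimal sketch of this classical pipeline pairs TF-IDF features with a linear support vector machine; the four hand-labeled sentences are hypothetical stand-ins for the thousands of annotated examples such studies require.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labels: 1 = forward-looking statement, 0 = not.
sentences = [
    "We aim to cut Scope 1 emissions by 40% by 2030.",
    "Last year we published our first sustainability report.",
    "The company expects to expand renewable sourcing next year.",
    "Our 2022 water usage totaled 1.2 million cubic meters.",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["We plan to achieve net-zero operations by 2040."]))
```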
The third phase emerged in the late 2010s, characterized by the diffusion of big data techniques and the early adoption of deep learning. Large-scale studies began integrating ESG disclosures with structured datasets from rating agencies and financial markets, revealing substantial divergences in measurement outcomes. For example, Berg et al. (2022) documented how differences in measurement choices contribute to ESG rating divergence, raising concerns about comparability and the reliability of ESG indicators. Simultaneously, advances in auditing research emphasized how data analytics and AI could reshape evidence collection and monitoring, suggesting potential applications in sustainability assurance (Appelbaum et al., 2017; Gepp et al., 2018). This phase highlighted both the promise and the limitations of AI in addressing systemic problems of ESG information heterogeneity.
The current phase, beginning in the early 2020s, has been driven by transformer-based language models and domain-specific ESG-NLP pipelines. Pretrained models such as BERT and its financial derivatives (e.g., FinBERT) have enabled sentence-level classification of sustainability topics, improved the measurement of disclosure specificity, and facilitated large-scale benchmarking across industries and jurisdictions (Schimanski et al., 2024; Fischbach et al., 2022). These models not only enhance measurement fidelity but also open possibilities for detecting greenwashing by cross-referencing narrative claims with hard environmental metrics (Calamai et al., 2025). At the same time, bibliometric reviews document the rapid growth of this research domain, highlighting the need for methodological standardization and more cross-lingual validation (Pombinho et al., 2024; Velte, 2025).
Recent bibliometric analyses further illuminate this rapid expansion. K. Li (2025) provides a comprehensive survey of big data and machine learning applications in ESG research, documenting how techniques have evolved from simple bag-of-words approaches to sophisticated BERT variants including ClimateBERT and ESG-BERT. The study reports that transformer-based models can explain substantial variance in ESG ratings when properly calibrated to industry-specific contexts, though exact explanatory power varies considerably across samples and specifications. Similarly, El Aziz and Asdiou (2025) employ AI-powered clustering analysis on 71 research papers, revealing that institutional pressures, legitimacy motivations, and signaling effects converge to create isomorphic patterns in ESG disclosure practices across organizational domains.
The sophistication of current applications is exemplified by studies utilizing retrieval-augmented generation (RAG) approaches. Gupta et al. (2024) demonstrate how knowledge graph-aided LLMs can extract ESG insights from news sources with 89% accuracy, while Angioni et al. (2024) show that combining knowledge graphs with NLP enables detection of subtle greenwashing patterns that traditional methods miss. These advances suggest that AI is moving beyond simple classification toward contextual understanding and cross-reference validation.
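The retrieval step at the heart of RAG can be sketched as follows; the embedding checkpoint and prompt format are illustrative assumptions rather than the pipelines of the cited studies, and the final generation call is left to whichever LLM API is available.

```python
from sentence_transformers import SentenceTransformer, util

passages = [
    "We reduced Scope 2 emissions by 12% through renewable power purchases.",
    "Our board established a sustainability committee in 2021.",
    "Employee volunteering hours doubled compared to the prior year.",
]
query = "What progress has the firm made on emissions?"

# Embed query and passages, then retrieve the closest passage as context.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(encoder.encode(query), encoder.encode(passages))[0]
context = passages[int(scores.argmax())]

prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)  # this prompt would be sent to a generative model
```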
Importantly, this trajectory reflects a broader convergence of accounting, finance, and computer science. While accounting research has traditionally emphasized decision usefulness and assurance, computer science contributions stress model accuracy, scalability, and benchmarking. Legitimacy theory provides an additional lens, suggesting that firms use disclosures strategically to maintain social contracts with stakeholders (Suchman, 1995), a dynamic that AI tools may either expose or inadvertently facilitate. Recent studies suggest that combining these perspectives—by embedding materiality considerations into NLP classifiers or by aligning AI-derived signals with governance frameworks such as the ISSB standards—can enhance both academic rigor and policy relevance (Christensen et al., 2021; Kotsantonis & Pinney, 2022).
Looking forward, the literature indicates a dual challenge. On the one hand, AI has clearly advanced our ability to process sustainability disclosures at scale, thereby improving comparability and measurement precision. On the other hand, fundamental issues of credibility and decision usefulness remain unresolved, particularly in contexts where generative AI is used to draft reports or where datasets are biased toward English-language disclosures. This tension underscores the need for research that not only refines methods but also rigorously tests whether AI-driven disclosures are substantively more informative and reliable for stakeholders.
3. Operational Dimensions of ESG Disclosure Quality
To provide an analytical roadmap for the ensuing discussion, Table 1 summarizes the core AI methodologies, key benefits, and recurring limitations associated with each of the four operational dimensions of ESG disclosure quality: readability, comparability, informativeness, and credibility. This framework, derived from the synthesis of interdisciplinary literature, guides our comprehensive assessment of AI’s current role and its challenges.
Table 1. Summary of AI Contributions to ESG Disclosure Quality Dimensions.

| Dimension | Core AI Methodologies | Key Benefits | Recurring Limitations |
|---|---|---|---|
| Readability | Domain-specific dictionaries; contextual embeddings; transformer measures of coherence, hedging, and boilerplate | Finer-grained clarity metrics; lower information-processing costs for stakeholders | Polish without substance; English-centric bias; largely correlational evidence |
| Comparability | Supervised classifiers; BERT/FinBERT topic mapping to SASB/GRI/ISSB taxonomies; SHAP-based explainability | Large-N benchmarking; replicable topic coding; fine-grained indicators | Measurement without materiality; taxonomy-embedded assumptions; proprietary opacity |
| Informativeness | Tone and forward-looking statement detection; materiality-aligned classification; cross-modal linkage to hard metrics | Improved signal-to-noise ratio; predictive signals for capital markets | Scarce causal identification; impression-management risk; context dependence |
| Credibility | Textual anomaly detection; cross-document and multimodal consistency checks; greenwashing detection pipelines | Scalable screening; risk-based assurance support | No ground-truth labels; black-box outputs; algorithmic greenwashing risk |
3.1. Readability: Textual Clarity and Cognitive Accessibility
3.1.1. Conceptual Background
Readability has long been a central concern in accounting and corporate communication research, as the ease with which stakeholders can process disclosures shapes their ability to extract decision-useful information. Traditional readability measures, such as the Fog Index or the Flesch–Kincaid score, capture surface-level features like sentence length and word complexity. However, these indices have been criticized for neglecting semantic and contextual aspects of clarity (Loughran & McDonald, 2014). In the context of ESG disclosures, readability assumes particular importance because sustainability reports are often directed not only at financial analysts but also at broader stakeholder groups, including civil society organizations, regulators, and non-specialist investors (Hahn & Kühnen, 2013). Reports that are overly technical, jargon-laden, or lengthy may obscure rather than clarify firms’ sustainability strategies, thus undermining the very purpose of transparency (Wang et al., 2018). From an impression management perspective (Brennan & Merkl-Davies, 2013), readability choices may reflect strategic communication decisions rather than neutral information provision, with managers potentially using linguistic complexity to obfuscate poor performance or employing simplified narratives to highlight favorable outcomes.
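For concreteness, the Gunning Fog index referenced above is 0.4 × (average sentence length + percentage of complex words); the minimal sketch below approximates “complex” words by a crude vowel-group syllable count.

```python
import re

def gunning_fog(text: str) -> float:
    """0.4 * (words per sentence + 100 * share of 'complex' words),
    where complexity is proxied by three or more estimated syllables."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0

    def syllables(word: str) -> int:
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    complex_share = sum(syllables(w) >= 3 for w in words) / len(words)
    return 0.4 * (len(words) / sentences + 100 * complex_share)

print(round(gunning_fog("We pursue comprehensive decarbonization initiatives. Emissions fell."), 1))
```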
3.1.2. AI Advances in Measurement
Artificial intelligence has substantially expanded the methodological toolkit for analyzing readability. Early applications adapted domain-specific dictionaries to sustainability contexts, showing that generic readability indices often misrepresent the complexity of corporate texts (Loughran & McDonald, 2014; Bonsall et al., 2017). More recently, advances in natural language processing (NLP) have enabled the use of contextual embeddings and transformer-based models to capture semantic coherence, discourse structure, and rhetorical framing (Huang et al., 2024). These models provide finer-grained measures of linguistic clarity, such as the prevalence of hedging expressions, the specificity of commitments, or the degree of boilerplate repetition across documents.
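Two of these finer-grained features, hedging prevalence and commitment specificity, can be approximated even with simple lexical proxies, as the sketch below illustrates; the hedge list is illustrative rather than a validated lexicon.

```python
import re

# Illustrative hedge list, not a validated lexicon.
HEDGES = {"may", "might", "could", "aim", "aspire", "approximately", "potentially"}

def hedging_ratio(text: str) -> float:
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in HEDGES for t in tokens) / max(1, len(tokens))

def commitment_specificity(text: str) -> int:
    # Count numeric tokens as a rough proxy: targets, years, quantities.
    return len(re.findall(r"\b\d+(?:\.\d+)?\b", text))

vague = "We aspire to reduce our environmental footprint where possible."
firm = "We will cut Scope 1 emissions 40% by 2030, retiring 3 coal units."
print(hedging_ratio(vague), commitment_specificity(vague))  # hedged, unspecific
print(hedging_ratio(firm), commitment_specificity(firm))    # direct, quantified
```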
Empirical studies illustrate the potential of these AI-based measures. For example, Huang et al. (2024) analyze over 10,000 Chinese sustainability reports and find that readability indicators derived from NLP models are significantly associated with ESG ratings, suggesting that clearer disclosures are rewarded in capital markets. Similarly, Melloni et al. (2017) show that integrated reports with more concise and balanced narratives are perceived as more credible, a finding consistent with the argument that readability enhances stakeholder trust. These insights underline the added value of AI techniques that move beyond surface metrics to capture substantive dimensions of clarity.
3.1.3. Empirical Insights and Benefits
A critical question, however, concerns whether improvements in measured readability reflect substantive information gains or merely rhetorical polish. On one hand, firms may genuinely use AI-assisted drafting or editorial support to enhance the clarity of their sustainability communication, thereby reducing information-processing costs for stakeholders (Velte, 2025). On the other hand, managers may strategically deploy readability-enhancing techniques to present a favorable image without disclosing additional hard metrics, effectively engaging in impression management (Brennan & Merkl-Davies, 2013). This distinction is particularly salient as generative AI tools become increasingly available for drafting corporate narratives.
Evidence suggests that readability improvements are often concentrated in sections of reports that serve a marketing function, such as executive summaries, rather than in sections containing material quantitative indicators (Bonsall et al., 2017). Moreover, Wang et al. (2018) find that while CSR reports with higher readability scores are associated with stronger corporate social performance, the relationship weakens when controlling for independent third-party assurance, suggesting that readability alone does not guarantee substantive informativeness.
Recent empirical work using generative AI provides nuanced evidence on this tension. Shimamura et al. (2025) analyze sustainability reports from S&P 500 companies using GPT-4, finding that context-dependent readability scores show positive correlation with ESG rating convergence, whereas traditional word-level metrics show weaker relationships. This suggests that AI can capture semantic clarity dimensions that conventional readability indices miss. Furthermore, their analysis indicates that companies with lower social visibility may benefit more from readability improvements, though the magnitude of these effects varies across samples. Kimbrough et al. (2024) extend this analysis by examining temporal stability, showing that firms with consistently high AI-measured readability experience less volatility in ESG ratings over multi-year periods. The study employs a difference-in-differences design around mandatory ESG disclosure regulations, providing quasi-experimental evidence that readability improvements may contribute to reduced rating disagreement, though the authors acknowledge limitations in ruling out confounding firm characteristics.
3.1.4. Limitations
Despite AI’s advancements in assessing ESG disclosure readability, several limitations hinder its full potential. Measurement often overemphasizes superficial features such as sentence length without ensuring semantic depth or stakeholder alignment, and generative tools can yield polished but unsubstantive narratives (Loughran & McDonald, 2014). Correlational evidence linking NLP scores to ESG ratings stops short of demonstrating reduced information asymmetry (Huang et al., 2024; Melloni et al., 2017), while English-centric training biases models against multilingual reports, misrepresenting clarity in diverse linguistic contexts. Interpretability problems with “black-box” NLP hinder regulatory adoption (Doshi-Velez & Kim, 2017), and without governance mechanisms such as third-party assurance, enhanced readability may enable persuasive greenwashing rather than genuine transparency (Brennan & Merkl-Davies, 2013; Wang et al., 2018).
3.1.5. Future Directions
Several gaps remain in the literature on AI and readability in ESG reporting. First, more causal evidence is needed to establish whether improvements in readability translate into actual decision-usefulness for investors and stakeholders. Experimental studies or natural experiments around AI-tool adoption could provide stronger inference than the current correlational evidence. Second, the cross-lingual dimension of readability is underexplored: most readability indices and NLP models are developed for English texts, but ESG disclosure is increasingly global, requiring tools that can capture linguistic clarity in multiple languages. Third, the role of assurance and governance mechanisms in mediating the value of readability enhancements remains insufficiently studied. Without credible verification, improved readability may simply make greenwashed narratives more persuasive, exacerbating credibility problems.
The literature demonstrates that AI has advanced the measurement of readability in sustainability reporting, moving beyond superficial proxies toward richer semantic indicators. Yet, whether such improvements represent genuine gains in stakeholder understanding or strategic impression management remains an open question. Addressing this tension requires interdisciplinary approaches that combine NLP innovations with accounting theories of disclosure and assurance.
3.2. Comparability: Structural and Topical Alignment of ESG Disclosures
3.2.1. Conceptual Background
Comparability is widely regarded as a cornerstone of disclosure quality, as it enables stakeholders to evaluate performance across firms, industries, and jurisdictions. In financial reporting, comparability is essential for efficient capital allocation and is explicitly emphasized by standard setters such as the IASB and FASB (De Franco et al., 2011). In the ESG domain, however, comparability has been persistently undermined by the coexistence of multiple reporting standards, voluntary disclosure regimes, and diverse linguistic practices (Christensen et al., 2021). Firms may use different terminologies to describe similar sustainability initiatives, while ESG rating agencies often apply divergent measurement frameworks (Serafeim & Yoon, 2023), producing inconsistent ratings that confuse investors and policymakers (Berg et al., 2022). This heterogeneity reduces the usefulness of ESG information, raising concerns about “aggregate confusion” in the marketplace. From an institutional theory perspective, this heterogeneity reflects competing institutional logics—financial versus stakeholder orientations—that shape how organizations interpret and respond to sustainability expectations (DiMaggio & Powell, 1983).
3.2.2. AI Advances in Measurement
Artificial intelligence, particularly natural language processing (NLP), has emerged as a powerful tool to enhance comparability by systematically mapping unstructured sustainability disclosures into structured categories. Early efforts focused on dictionary-based approaches and supervised classifiers that aligned textual content with predefined sustainability dimensions, but these methods suffered from limited scalability and cross-context validity (Khan et al., 2016). The advent of transformer-based models, such as BERT and FinBERT, has significantly improved classification accuracy by capturing contextual semantics rather than relying solely on keyword frequency (Liu et al., 2020).
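A minimal sketch of transformer-based topic mapping via zero-shot classification appears below; the checkpoint and the four candidate labels are illustrative assumptions, whereas production pipelines typically fine-tune domain models (e.g., FinBERT variants) on labeled ESG sentences.

```python
from transformers import pipeline

# Generic NLI checkpoint used zero-shot; an ESG-tuned model would replace it.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "We installed rooftop solar at 60% of our distribution centers."
labels = ["GHG emissions", "energy management", "labor practices", "board governance"]

result = classifier(sentence, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 2))
```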
Recent empirical work demonstrates the potential of these methods. Schimanski et al. (2024) apply a domain-specific ESG-NLP pipeline to a large corpus of sustainability reports and news articles, showing that machine learning classifiers can explain a substantial portion of the variance in ESG ratings, thus providing a replicable and transparent benchmark for topic coverage. Similarly, Fischbach et al. (2022) introduce “ESG-Miner,” an NLP-based tool that derives ESG scores from media coverage and demonstrates scalability across firms and industries. These applications highlight how AI can reduce subjectivity in ESG evaluation by harmonizing textual evidence with standardized taxonomies such as SASB or the ISSB standards.
The technical sophistication of comparability tools has advanced significantly. Birti et al. (2025) introduce an optimized LLaMA-based architecture for ESG activity detection that achieves high accuracy in mapping corporate narratives to EU Taxonomy categories, outperforming baseline models while requiring substantially fewer computational resources. Their approach combines fine-tuning on synthetic data with instruction hierarchy methods to resist adversarial prompts, addressing concerns about model manipulation. Bingler et al. (2022) develop LinkBERT for climate contexts, incorporating document structure and hyperlink information to improve cross-document consistency checks compared to standard BERT implementations.
Machine learning explainability has become crucial for regulatory acceptance. Ballotta et al. (2023) apply SHAP (SHapley Additive exPlanations) values to decompose ESG rating predictions, revealing that governance disclosures contribute 47% to rating variance, while environmental narratives contribute only 28% despite receiving more textual space. This interpretability enables stakeholders to understand not just what AI models predict but why, facilitating audit trails and regulatory compliance. Zanin (2022) demonstrates that multivariate ordinal logit regression combined with precision lasso can identify the 15 most material disclosure features that explain 82% of rating variations, providing a parsimonious yet powerful comparability framework.
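The decomposition logic behind such SHAP analyses can be sketched on synthetic data; the features stand in for text-derived disclosure signals and are not those of the cited studies.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-ins for text-derived signals (e.g., governance,
# environmental, and specificity scores) driving a simulated ESG rating.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# Mean |SHAP| per feature indicates each signal's contribution to predictions.
print(np.abs(shap_values).mean(axis=0))
```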
3.2.3. Empirical Insights and Benefits
From a stakeholder perspective, AI-enhanced comparability offers several benefits. First, it enables large-N benchmarking, allowing investors to assess thousands of firms using a consistent methodological framework (Velte, 2025). Second, it provides replicable topic coding, which enhances transparency and scientific reproducibility in sustainability research. Third, it generates fine-grained indicators, such as the specificity of emission targets or the presence of quantified commitments, that can be compared across firms and industries (Huang et al., 2024). By providing these structured signals, AI helps bridge the gap between narrative disclosures and the demand for standardized, decision-useful ESG metrics.
3.2.4. Limitations
Despite these advances, significant limitations remain. First, measurement does not equal substance: while AI improves the coding of disclosures into comparable categories, this does not ensure that the underlying information is material or decision-useful. Comparability may increase superficially while informativeness remains stagnant (Christensen et al., 2021). Second, most AI models are trained on English-language corpora, limiting their validity in multilingual or emerging market contexts. Third, AI tools often rely on existing taxonomies (e.g., SASB, GRI, ISSB), which themselves embody normative assumptions about materiality and may not fully capture local stakeholder concerns (Kotsantonis & Pinney, 2022). Finally, the proliferation of proprietary AI scoring models risks creating new opacity, replacing human subjectivity with algorithmic subjectivity if methods are not transparently disclosed.
3.2.5. Future Directions
Future research should move beyond demonstrating measurement improvements to address three critical gaps. First, scholars should test whether AI-driven comparability metrics actually improve market outcomes, such as analysts’ forecast accuracy or investors’ portfolio efficiency. Second, there is an urgent need for cross-lingual and cross-institutional benchmarks that validate the portability of ESG-NLP models across languages and reporting frameworks. Third, research should interrogate the governance implications of AI-driven comparability: Who defines the categories into which AI maps sustainability disclosures, and whose interests do these categories serve? Addressing these questions will ensure that AI enhances not only the technical consistency but also the normative legitimacy of ESG comparability.
The literature suggests that AI represents a major methodological advance in tackling the long-standing problem of ESG disclosure heterogeneity. By improving the alignment of narratives with standardized taxonomies, AI reduces measurement error and enhances benchmarking capacity. However, whether these gains translate into more meaningful stakeholder comparisons depends on broader governance structures, dataset diversity, and the integration of AI tools with substantive assurance mechanisms.
3.3. Informativeness: Value-Relevant Content and Decision Usefulness
3.3.1. Conceptual Background
Informativeness refers to the extent to which disclosures provide new, decision-relevant information that helps stakeholders forecast future performance, assess risks, or evaluate corporate strategies. In financial accounting, informativeness has been closely linked to the “decision usefulness” paradigm, where information is valuable if it improves resource allocation decisions (Beaver, 1968). Applied to sustainability disclosures, informativeness concerns whether ESG narratives convey credible insights into firms’ environmental, social, and governance performance beyond what is already observable in quantitative metrics or market data (Dhaliwal et al., 2011). The challenge lies in distinguishing between narratives that merely reiterate known facts and those that provide incremental predictive power.
From a signaling theory perspective (Spence, 1973), informative disclosures serve as credible signals of firm quality when they are costly to imitate or verify. AI may enhance the signaling value of disclosures by enabling more precise measurement of commitment specificity, forward-looking content, and alignment with observable outcomes.
3.3.2. AI Advances in Measurement
Foundational textual analysis research in finance demonstrated that narrative disclosures can contain predictive signals. Tetlock (2007) showed that media tone forecasts stock market movements, while F. Li (2010) found that forward-looking statements in annual reports are informative about future earnings. These insights motivated scholars to examine whether sustainability reports, CSR disclosures, and integrated reports also contain incremental information relevant for investors. Khan et al. (2016) provided early evidence that disclosures on material sustainability issues are value-relevant, demonstrating that markets reward firms that align reporting with industry-specific materiality.
Artificial intelligence has expanded the ability to extract informative signals from ESG narratives. Modern NLP methods enable the identification of topic intensity, specificity of commitments, and sentiment framed in sustainability contexts, which can be linked to capital market reactions. For instance, Huang et al. (2024) find that textual features such as complexity and tone are correlated with ESG ratings, thereby shaping perceptions of firms’ sustainability quality.
AI also facilitates more fine-grained measures of informativeness by mapping disclosures onto materiality taxonomies. By aligning narrative content with frameworks such as SASB or ISSB, NLP models filter out immaterial information and improve the signal-to-noise ratio (Khan et al., 2016). Moreover, combining textual features with structured ESG performance indicators enhances explanatory power: disclosures that mention emissions targets alongside quantitative emission reductions are more strongly associated with market responses than purely narrative claims (Christensen et al., 2021).
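This corroboration logic can be illustrated with a simple interaction regression on synthetic data: the question is whether an NLP-detected target mention is associated with stronger announcement returns when accompanied by an actual emissions reduction. Variable names and magnitudes are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "target_mention": rng.integers(0, 2, n),        # NLP-detected target claim
    "emission_change": rng.normal(-0.02, 0.05, n),  # hard metric (log change)
})
# Simulated announcement return: claims matter more when corroborated.
df["car"] = (0.01 * df.target_mention
             - 0.3 * df.emission_change * df.target_mention
             + rng.normal(0, 0.02, n))

model = smf.ols("car ~ target_mention * emission_change", data=df).fit()
print(model.params["target_mention:emission_change"])  # corroboration effect
```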
Advanced machine learning applications have refined informativeness measurement. Sautner et al. (2023a) develop a firm-level climate exposure measure using earnings call transcripts, finding that climate-related discussions predict future cash flow volatility, substantially outperforming traditional risk metrics. Their subsequent work (Sautner et al., 2023b) employs “chain-of-thought” prompting techniques (Wei et al., 2022) to extract climate value drivers from earnings calls, identifying forward-looking statements that help explain abnormal returns around climate policy announcements. Robinson et al. (2023) examine the intersection of textual disclosure and fund ownership, showing that NLP-derived ESG content scores predict institutional investor allocation decisions with reasonable accuracy. Notably, they find that sentiment consistency across disclosure channels matters more than absolute positivity, with firms exhibiting aligned messaging across sustainability reports, annual reports, and earnings calls attracting more ESG-focused investment. Nagar and Schoenfeld (2024) introduce a weather exposure index derived from annual reports that correlates with physical climate risks, demonstrating that AI can extract decision-useful environmental information even from standardized financial filings not explicitly designed for climate disclosure.
3.3.3. Empirical Insights and Benefits
Empirical evidence suggests that the informativeness of AI-derived textual features depends on three conditions. First, materiality alignment: Disclosures must pertain to issues that are financially material to the firm’s industry. AI models that classify text into material versus immaterial topics help clarify which narratives are likely to move markets (Khan et al., 2016). Second, corroboration with hard metrics: Narrative claims become more informative when supported by quantitative data. For example, specific targets for carbon reduction are more credible if corroborated by emission figures, a relationship that AI models can test through cross-modal analysis (Calamai et al., 2025). Third, institutional context: The effect of ESG narratives is stronger in markets where regulatory oversight increases the cost of misrepresentation or where analysts actively incorporate sustainability signals into valuation models (Christensen et al., 2021).
3.3.4. Limitations
Despite these advances, several limitations constrain the informativeness of AI-enhanced ESG analysis. First, causal evidence is scarce: while correlations between textual features and market outcomes abound, few studies convincingly establish that ESG narratives themselves, rather than correlated firm characteristics, drive the observed effects (Christensen et al., 2021). Second, the risk of impression management remains: managers may strategically craft narratives to appear informative without disclosing substantively new content (Brennan & Merkl-Davies, 2013). Third, informativeness is often time- and context-dependent: what appears informative in one jurisdiction or period may be redundant elsewhere, raising concerns about external validity.
3.3.5. Future Directions
Future work should address several gaps. First, researchers should design natural experiments around regulatory changes (e.g., EU Corporate Sustainability Reporting Directive) to test whether AI-detected textual shifts produce incremental market responses. Second, more cross-modal approaches are needed, integrating textual analysis with non-textual ESG data such as emissions, supply chain audits, or satellite imagery, to test whether narratives align with observable outcomes (Lagasio, 2024). Third, the role of stakeholder heterogeneity should be explored: informativeness may differ across audiences, with investors focusing on financial materiality while NGOs emphasize ethical or environmental implications. Finally, experimental research involving analysts and non-professional investors could clarify whether AI-filtered narratives actually improve user comprehension and decision-making.
The literature shows that AI has unlocked new dimensions of informativeness in ESG disclosures by enabling the systematic extraction of nuanced textual signals. However, without stronger causal identification, triangulation with hard performance metrics, and attention to stakeholder heterogeneity, the promise of AI-enhanced informativeness remains only partially realized.
3.4. Credibility: Reliability, Greenwashing Detection, and Assurance
3.4.1. Conceptual Background
Credibility concerns the extent to which ESG disclosures can be trusted as truthful, complete, and free from misleading claims. While readability, comparability, and informativeness address the form and content of disclosures, credibility addresses their substance and integrity. In the absence of credibility, even highly readable and comparable reports may fail to support decision-making, as stakeholders cannot distinguish between genuine sustainability efforts and opportunistic impression management (Lyon & Montgomery, 2015). Greenwashing—defined as the gap between positive sustainability communication and actual environmental or social performance—poses a particular challenge for ESG reporting (Delmas & Burbano, 2011). The growth of voluntary reporting and the lack of standardized assurance mechanisms exacerbate credibility risks, making this dimension central to both academic inquiry and regulatory practice.
The accounting assurance literature provides important foundations for understanding credibility. Simnett et al. (2009) demonstrate that assurance enhances the credibility of sustainability reports, particularly when provided by accounting professionals rather than consultants. Hodge et al. (2009) find that investor perceptions of report credibility increase with assurance, but only when users are aware that assurance has been obtained. These findings suggest that AI-based credibility tools must be integrated with, rather than substitute for, professional assurance mechanisms to be effective.
3.4.2. AI Advances in Assessment
Artificial intelligence offers a promising toolkit for addressing credibility challenges. Unlike traditional auditing or manual content analysis, AI methods can systematically cross-validate narrative disclosures against quantitative data, identify inconsistencies across time, and flag potentially misleading statements. Three types of AI applications stand out. First, textual anomaly detection: Models that identify unusual narrative structures, overly promotional language, or linguistic patterns associated with deceptive communication can flag disclosures for closer scrutiny. Second, cross-document and multimodal checks: AI models can compare sustainability claims across different disclosure channels (e.g., sustainability reports vs. press releases) or between narrative disclosures and non-textual data such as emissions figures, regulatory filings, or satellite imagery. This approach strengthens the credibility assessment by linking words to observable outcomes (Lagasio, 2024). Third, greenwashing detection pipelines: Recent surveys (Calamai et al., 2025) summarize emerging AI-based frameworks that integrate topic classification, specificity scoring, and consistency checks to flag potential cases of greenwashing at scale.
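As a minimal sketch of the first application, an unsupervised detector can screen reports on simple text-derived features; the features and values below are synthetic illustrations, not a validated screening model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Rows are reports; columns are synthetic stand-ins for text-derived features:
# promotional-word ratio, numeric density, year-over-year claim change.
rng = np.random.default_rng(2)
features = rng.normal(loc=[0.05, 0.10, 0.0], scale=[0.01, 0.03, 0.05], size=(300, 3))
features[::50] = [0.15, 0.01, 0.4]  # inject promotional, number-light outliers

detector = IsolationForest(contamination=0.02, random_state=0).fit(features)
flags = detector.predict(features)  # -1 marks reports for human review
print(np.where(flags == -1)[0])
```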
Practical implementations of these advances are emerging. The International Finance Corporation’s (2024) MALENA platform utilizes NLP to analyze ESG documents in multiple languages and identify risk terms with context-dependent sentiment analysis. The system’s ability to process emerging market disclosures addresses the Anglo-centric bias of most AI tools, with validation studies suggesting comparable performance across linguistic contexts, though formal benchmarking studies remain limited. Rane et al. (2024) document how blockchain integration with AI can create immutable audit trails for ESG claims, with smart contracts automatically flagging discrepancies between reported and verified performance, showing promise in pilot implementations.
3.4.3. Empirical Insights and Benefits
Early applications of AI for credibility assessment are promising but not yet conclusive. Lagasio (2024) constructs an NLP-based ESG-washing index using over 700 sustainability reports, finding that certain sectors exhibit systematically higher risks of misrepresentation. Similarly, Fischbach et al. (2022) show that ESG signals derived from media coverage can highlight discrepancies between external narratives and firm-reported data. These approaches provide regulators and auditors with scalable screening tools to prioritize high-risk cases.
In the audit domain, Appelbaum et al. (2017) argue that big data and AI can reshape evidence collection by enabling continuous monitoring of disclosures. More recent contributions suggest that AI can support risk-based assurance, whereby auditors use machine learning to identify anomalies and allocate resources more efficiently (Kokina & Davenport, 2017). Such applications position AI not merely as an analytical tool but as a potential complement to professional judgment in sustainability assurance.
3.4.4. Limitations
Despite these advances, significant obstacles constrain the credibility-enhancing role of AI. First, lack of ground truth: Greenwashing is inherently difficult to label, as it involves subjective judgments about intent and context. Without high-quality labeled datasets, supervised models risk overfitting or misclassification (Calamai et al., 2025). Second, explainability and auditability: Many high-performing NLP models, especially deep learning architectures, operate as “black boxes.” For auditors and regulators, however, credibility assessments must be explainable and legally defensible. Third, cross-context generalizability: Models trained on English disclosures or specific industries often perform poorly in other languages or contexts without costly retraining. Fourth, risk of algorithmic greenwashing: Generative AI could be used by firms to produce polished sustainability narratives that appear consistent and credible to algorithms but still lack substantive performance backing, amplifying rather than reducing credibility problems.
3.4.5. Future Directions
The credibility challenge highlights the importance of governance structures that integrate AI tools with professional and regulatory oversight. From an auditing perspective, AI can serve as a complement rather than a substitute for human judgment, providing anomaly flags that auditors evaluate in light of contextual knowledge (Appelbaum et al., 2017). Regulators, meanwhile, must establish procedural safeguards for algorithmic monitoring, including requirements for model transparency, version control, and independent validation (Christensen et al., 2021). Data vendors and rating agencies also play a role by disclosing methodological details of their AI models, thereby reducing aggregate confusion among information users (Berg et al., 2022).
Several research avenues emerge at the intersection of AI, credibility, and assurance. First, labeled datasets for greenwashing: A coordinated effort among academics, regulators, and practitioners to develop benchmark datasets with adjudicated labels of greenwashing would significantly advance model training and validation (Calamai et al., 2025). Second, explainable AI for assurance: Future work should prioritize interpretable NLP architectures that can provide feature-level attributions for flagged disclosures, enhancing the defensibility of audit and regulatory actions. Third, field experiments in AI-assisted auditing: Collaborations with audit firms could test whether AI-driven anomaly detection improves assurance quality, detection rates, and cost efficiency. Fourth, cross-lingual benchmarking: Research should examine how credibility models perform across languages and regulatory regimes, addressing the risk of bias and ensuring equitable application in global markets. Fifth, normative implications: Scholars should interrogate the broader societal consequences of delegating credibility judgments to AI: who defines credibility, whose interests are prioritized, and how algorithmic decisions interact with legal accountability.
In sum, AI represents both an opportunity and a risk for credibility in ESG disclosure. It offers unprecedented tools for large-scale anomaly detection, multimodal consistency checks, and greenwashing flagging. Yet without reliable labels, explainable outputs, and governance frameworks, these tools may provide only the illusion of credibility or even facilitate new forms of algorithmically disguised greenwashing. The future of credible ESG reporting thus hinges not solely on technological innovation but on institutional arrangements that embed AI within transparent, accountable, and stakeholder-responsive assurance systems.
4. Cross-Cutting Challenges
4.1. Integrative Synthesis Across Dimensions
The four dimensions of disclosure quality—readability, comparability, informativeness, and credibility—are conceptually distinct yet empirically intertwined. Improvements in one dimension do not necessarily translate into gains in others, and in some cases, they may even generate trade-offs. For example, advances in AI-based readability measures reveal that disclosures can become more accessible and polished, yet such improvements may mask underlying deficiencies in informativeness if firms strategically enhance linguistic clarity without disclosing substantive metrics (Brennan & Merkl-Davies, 2013; Huang et al., 2024). Similarly, enhanced comparability through standardized topic classification may facilitate benchmarking across firms, but without credibility checks, greater alignment could simply standardize greenwashed narratives, amplifying rather than resolving stakeholder confusion (Lyon & Montgomery, 2015). Informativeness itself often depends on credibility: narratives that appear value-relevant may lose informativeness if stakeholders suspect opportunistic exaggeration or inconsistency with observed outcomes (Christensen et al., 2021).
These interactions suggest that the four dimensions are not additive but conditional. Gains in readability and comparability must be validated through credibility mechanisms, and informativeness can only be realized when narratives align with material performance. AI has the potential to strengthen all four dimensions simultaneously, but whether these benefits materialize depends on methodological choices, data availability, and governance arrangements.
These findings point to a fundamental tension that underlies AI applications in ESG disclosure: technological sophistication in measurement does not automatically translate into improvements in the substantive quality of information available to stakeholders. Legitimacy theory suggests that organizations may adopt AI-enhanced disclosure practices ceremonially—to appear modern and transparent—without fundamentally changing their sustainability performance or communication strategies (Meyer & Rowan, 1977). This dynamic underscores the importance of embedding AI tools within governance structures that hold firms accountable for the substance, not merely the form, of their disclosures.
4.2. Interpretability and Explainability
Across all dimensions, interpretability emerges as a central constraint. While deep learning models such as transformers deliver high classification accuracy, their “black-box” nature poses challenges for both academic research and regulatory adoption (Doshi-Velez & Kim, 2017). Readability metrics based on contextual embeddings may capture semantic nuance, but without feature-level explanations, it is unclear which linguistic attributes drive differences. Similarly, anomaly detection models may flag disclosures as high-risk, but auditors and regulators must provide defensible reasoning in enforcement contexts. The tension between performance and interpretability runs across all four dimensions, raising questions about how to balance methodological sophistication with institutional legitimacy.
Recent advances in interpretable AI offer partial solutions to the black-box problem. Ballotta et al. (2023) demonstrate that attention visualization techniques can highlight which textual segments drive ESG classifications, providing auditors with traceable decision paths. Their analysis of 10,000 sustainability reports reveals that models consistently prioritize quantitative commitments (38% of attention weights), governance structures (27%), and forward-looking statements (21%), offering insights into AI decision-making processes. Furthermore, gradient-based attribution methods enable feature-level explanations that satisfy regulatory documentation requirements in 73% of tested scenarios.
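The basic mechanics of attention-based inspection can be sketched as follows, using a generic checkpoint rather than the ESG-tuned models discussed above:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

inputs = tokenizer("We will cut Scope 1 emissions 40% by 2030.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions  # tuple: one tensor per layer

# Average attention received by each token in the final layer (over heads
# and query positions), a crude proxy for which tokens the model weighs.
weights = attentions[-1].mean(dim=1)[0].mean(dim=0)
for token, w in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), weights):
    print(f"{token:>12s} {w.item():.3f}")
```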
4.3. Multilinguality and Institutional Diversity
Most AI applications in ESG disclosure analysis are developed using English-language corpora and applied to firms operating in developed capital markets. This concentration introduces systematic biases and limits the generalizability of findings. For instance, readability models calibrated to English syntax may misrepresent textual clarity in Chinese or German disclosures. Similarly, comparability frameworks derived from SASB or ISSB taxonomies may overlook context-specific sustainability concerns in emerging markets, such as local pollution control or labor migration issues. Without cross-lingual benchmarks and cross-institutional validation, AI risks reinforcing an Anglo-American bias in ESG disclosure practices (Kotsantonis & Pinney, 2022).
Cross-linguistic validation studies reveal substantial challenges. Webersinke et al. (2021) find that ClimateBERT, trained primarily on English climate disclosures, experiences a 31% performance degradation when applied to German sustainability reports, even after translation. This degradation is not merely linguistic but cultural—German reports emphasize different sustainability dimensions reflecting stakeholder capitalism traditions versus Anglo-American shareholder primacy. Huang et al. (2024) address this by developing multilingual transfer learning approaches that achieve 85% of monolingual model performance across Chinese, Japanese, and Korean contexts, though significant gaps remain for languages with limited training data.
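Zero-shot cross-lingual application can be sketched with a publicly available multilingual NLI checkpoint (an assumption, not one of the cited models); as the studies above caution, gaps relative to in-language training should be expected.

```python
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")

# "We reduced our direct emissions by 15% in the reporting year."
satz = "Wir haben unsere direkten Emissionen im Berichtsjahr um 15% gesenkt."
# Labels: GHG emissions, labor practices, corporate governance.
labels = ["Treibhausgasemissionen", "Arbeitspraktiken", "Unternehmensführung"]

print(clf(satz, candidate_labels=labels)["labels"][0])
```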
Institutional diversity compounds these challenges. Dmuchowski et al. (2023) document how Poland’s ESG reporting practices, shaped by EU directives but implemented within post-transition economic contexts, require fundamentally different AI approaches than those developed for Western markets. Their analysis shows that models must account for state-owned enterprise dynamics, different materiality thresholds, and distinct stakeholder expectations to achieve meaningful comparability.
4.4. Data Bias and Scarcity of Labeled Datasets
Another cross-cutting challenge is the lack of high-quality labeled datasets. For readability, there is no consensus corpus linking textual features to stakeholder comprehension across diverse audiences. For comparability, training corpora often rely on a limited set of industries or standards. For informativeness, ground truth about what constitutes “value-relevant” content is difficult to establish outside of market reactions, which are themselves noisy signals. Credibility research faces the greatest data gap: labeling greenwashing requires subjective judgments about intent and context, making it costly and contested (Calamai et al., 2025). Without better labeled datasets, supervised AI models risk producing outputs that reflect training biases rather than stakeholder-relevant truths.
4.5. Governance and Normative Concerns
Beyond technical limitations, AI applications in ESG disclosure raise fundamental governance questions. Who decides the categories into which AI maps disclosures, and whose interests do these categories serve? Comparability frameworks may privilege investor materiality while neglecting broader societal concerns (Khan et al., 2016). Readability improvements may prioritize linguistic simplicity over nuanced contextual detail. Informativeness assessments based on capital market reactions may overlook non-financial stakeholders. Moreover, credibility assessments conducted by proprietary AI tools risk creating new forms of opacity: stakeholders may be asked to trust algorithms without understanding their methodological assumptions (Christensen et al., 2021).
From a political economy perspective, the adoption of AI in ESG disclosure may reshape power dynamics in sustainability governance. Large technology firms and data vendors may gain influence over what counts as “high-quality” disclosure, potentially marginalizing smaller firms, emerging market issuers, or stakeholder groups whose concerns are underrepresented in training data. Critical accounting scholars (Cho et al., 2015; Milne & Gray, 2013) have long emphasized that sustainability reporting serves rhetorical and political functions beyond mere information provision. AI tools that focus narrowly on technical measurement may overlook these dimensions, reinforcing rather than challenging existing power structures.
In sum, while AI has delivered substantial methodological advances across all four dimensions of disclosure quality, its contribution to stakeholder-relevant outcomes remains contingent on overcoming cross-cutting challenges. Interpretability, multilinguality, dataset quality, and governance legitimacy are recurring constraints that cut across readability, comparability, informativeness, and credibility. Addressing these challenges requires not only technical innovation but also institutional experimentation and normative debate. The next section outlines a research agenda that prioritizes causal identification, cross-lingual benchmarking, AI-assisted audit pilots, and the development of labeled datasets, with the goal of ensuring that AI advances translate into meaningful improvements in ESG disclosure quality.
5. Research Agenda
The review of AI applications in ESG disclosure demonstrates both significant progress and persistent challenges. To move the field forward, future research should pursue six complementary directions: causal identification, dataset development, cross-contextual benchmarking, institutional experimentation, stakeholder-centered evaluation, and the anticipation of generative-AI risks.
5.1. Methodological Priorities
Most existing studies document correlations between AI-derived textual measures and market or stakeholder outcomes, but causal inferences remain scarce. For example, informativeness is often proxied by market reactions, yet such reactions may be driven by concurrent firm characteristics rather than the disclosure itself (Christensen et al., 2021). Future work should leverage natural experiments around exogenous shocks, such as the EU Corporate Sustainability Reporting Directive or China’s mandatory ESG reporting pilot, to test whether AI-detected disclosure changes causally affect capital allocation, cost of capital, or stakeholder trust. Instrumental variable approaches and difference-in-differences designs could further strengthen identification, bringing ESG textual research closer to the identification standards of mainstream empirical finance and accounting research.
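One stylized way to formalize such a difference-in-differences design is sketched below, where $Q_{it}$ denotes a hypothetical AI-derived disclosure-quality score for firm $i$ in year $t$; the variable names are illustrative assumptions, not a specification drawn from the cited studies.

```latex
% Stylized two-way fixed-effects DiD specification (illustrative notation).
\begin{equation}
Q_{it} = \alpha_i + \gamma_t
       + \beta \,(\mathrm{Treated}_i \times \mathrm{Post}_t)
       + \delta' X_{it} + \varepsilon_{it}
\end{equation}
```

Here $\mathrm{Treated}_i$ flags firms subject to a mandate such as the CSRD, $\mathrm{Post}_t$ marks the post-adoption period, $\alpha_i$ and $\gamma_t$ are firm and year fixed effects, and $X_{it}$ collects time-varying controls; under the parallel-trends assumption, $\beta$ identifies the causal effect of the mandate on measured disclosure quality.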
Progress in credibility and greenwashing detection is constrained by the absence of benchmark datasets. Unlike sentiment analysis in finance, which benefited from labeled news corpora (Tetlock, 2007), ESG disclosure research lacks systematic training data. Coordinated efforts among regulators, scholars, and civil society organizations should aim to construct gold-standard corpora that classify disclosures along dimensions such as readability, comparability, informativeness, and credibility. In particular, greenwashing labels, validated by expert panels and triangulated with quantitative performance metrics, would enable the training of supervised models with greater validity (Calamai et al., 2025). Public availability of such datasets would accelerate research diffusion and facilitate replication.
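As a concrete illustration of what one record in such a gold-standard corpus might contain, consider the schema sketched below. Every field name is a hypothetical assumption introduced for illustration, not an existing standard; the greenwashing flag presumes the expert-panel validation and quantitative triangulation described above.

```python
# A minimal sketch of an annotation record for a hypothetical labeled
# ESG-disclosure corpus; the schema is an assumption, not an existing format.
from dataclasses import dataclass, field

@dataclass
class DisclosureRecord:
    firm_id: str                  # issuer identifier, e.g., an LEI code
    fiscal_year: int
    passage: str                  # the disclosure excerpt being annotated
    readability: float            # normalized readability composite score
    framework_tag: str            # comparability mapping, e.g., "GRI 305-1"
    informative: bool             # judged decision-useful by the expert panel
    greenwashing: bool            # claim contradicted by performance metrics
    annotator_ids: list[str] = field(default_factory=list)  # inter-rater checks
```

Retaining annotator identifiers alongside each label would allow researchers to report inter-rater reliability, which is essential given how contested greenwashing judgments are.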
5.2. Cross-Lingual and Cross-Institutional Benchmarking
Given the global scope of ESG disclosure, models developed in English-speaking contexts cannot be assumed to generalize. Future research should develop multilingual NLP benchmarks covering major reporting languages, including Chinese, Spanish, and German, with attention to syntactic and semantic variation. Beyond language, institutional diversity must also be addressed. Comparative studies should test whether AI-based comparability and credibility metrics function consistently across jurisdictions with different enforcement regimes and cultural expectations (Kotsantonis & Pinney, 2022). Such benchmarking would help avoid an Anglo-American bias in ESG AI research and ensure more equitable applicability.
5.3. Institutional Experimentation
AI should not only be evaluated in academic settings but also tested in real-world assurance environments. Partnerships between researchers, audit firms, and regulators could implement pilot programs of AI-assisted auditing, where anomaly detection models flag high-risk disclosures and auditors evaluate them using professional judgment (Appelbaum et al., 2017). Such field experiments would provide evidence on whether AI improves detection accuracy, reduces costs, and strengthens credibility. Moreover, they would highlight the governance requirements for explainable AI in legally binding contexts. Expanding these pilots across different regulatory regimes would illuminate context-dependent performance and inform global policy debates.
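The flag-then-review division of labor described above can be sketched minimally as follows, with an off-the-shelf isolation forest standing in for the anomaly detector; the feature set, contamination rate, and review threshold are illustrative assumptions rather than a tested pilot design.

```python
# A minimal sketch of anomaly-based triage for audit pilots, under the
# assumptions stated above; features here are random stand-ins.
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy feature matrix: one row per report, columns for NLP-derived signals
# such as sentiment, share of quantitative claims, readability, and
# boilerplate similarity. Real pilots would compute these from report text.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

detector = IsolationForest(contamination=0.05, random_state=0).fit(X)
risk_scores = -detector.score_samples(X)   # higher score = more anomalous

# Route only the ten highest-risk reports to human auditors, who apply
# professional judgment to the flagged cases.
flagged = np.argsort(risk_scores)[-10:]
print("Reports flagged for auditor review:", flagged)
```

The design choice worth noting is that the model only prioritizes auditor attention; it never renders the assurance judgment itself, which preserves the human accountability that legally binding contexts require.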
Most informativeness and comparability research focuses on financial markets, but ESG disclosures serve multiple stakeholders, including employees, consumers, NGOs, and local communities. Future research should design stakeholder-centered evaluations of AI-detected disclosure quality, examining whether different audiences interpret readability, informativeness, or credibility in divergent ways (Khan et al., 2016). Survey and experimental methods could be combined with textual analysis to reveal how AI-derived metrics align or misalign with diverse stakeholder priorities. This broader perspective would strengthen the normative grounding of ESG research and prevent disclosure quality from being reduced to investor-centric definitions.
5.4. Anticipating Emerging Risks
Finally, researchers must anticipate new risks created by generative AI itself. As firms adopt large language models to draft sustainability reports, disclosures may become uniformly fluent, comparable, and even superficially credible while masking substantive deficiencies in performance. Scholars should explore detection frameworks for AI-generated ESG narratives, testing whether linguistic fingerprints or cross-modal checks can identify algorithmically enhanced greenwashing. This forward-looking agenda is critical to ensure that AI remains a solution to, rather than a source of, credibility challenges.
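One candidate linguistic fingerprint is perplexity under a reference language model: machine-generated prose often scores systematically lower than human-written text. The sketch below illustrates the heuristic with GPT-2; it is an untested illustration, not a validated greenwashing detector, and any cutoff would need calibration against labeled corpora.

```python
# A minimal sketch of a perplexity-based fingerprint heuristic; this is an
# illustrative assumption, not a method proposed by the cited literature.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Perplexity = exp(mean token cross-entropy) under the reference model.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Unusually low perplexity on long, fluent sustainability passages could
# trigger closer scrutiny by researchers or assurance providers.
print(perplexity("We are committed to advancing a sustainable, net-zero future."))
```

Cross-modal checks, such as comparing fluent narrative claims against the firm's quantitative emissions data, would complement purely linguistic signals of this kind.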
Collectively, these six research directions outline a path toward consolidating AI-based ESG disclosure analysis as a robust academic field. By moving beyond correlation to causation, developing benchmark datasets, ensuring cross-lingual validity, embedding AI in assurance practice, incorporating stakeholder pluralism, and addressing generative AI risks, future work can transform methodological innovations into meaningful improvements in global sustainability reporting.
6. Conclusions and Policy Implications
This review demonstrates that artificial intelligence (AI) has already reshaped the measurement and monitoring of ESG disclosures, but its capacity to normatively improve disclosure quality remains conditional on governance, interpretability, and stakeholder alignment. Across the four operational dimensions (readability, comparability, informativeness, and credibility), AI provides scalable tools for transforming unstructured sustainability narratives into decision-useful signals. Transformer-based natural language processing (NLP) models now deliver fine-grained readability metrics (Huang et al., 2024) and cross-firm topic alignment (Schimanski et al., 2024), while anomaly detection frameworks offer early warning of greenwashing (Calamai et al., 2025). Yet, improvements in one dimension may obscure weaknesses in others. Greater readability, for instance, can coexist with low informativeness if AI merely polishes rhetoric (Brennan & Merkl-Davies, 2013), and enhanced comparability can standardize greenwashed narratives unless supported by credibility checks (Lyon & Montgomery, 2015).
From a policy perspective, three priority areas emerge with specific implications for different stakeholders.
First, for regulators and standard setters, assurance and oversight must evolve in parallel with AI adoption. Regulators should require transparent documentation of model architecture, training data, and update cycles to ensure that AI outputs remain auditable and legally defensible (Christensen et al., 2021; Appelbaum et al., 2017). AI can complement but not replace professional judgment; hybrid human–machine assurance pilots are needed to evaluate cost-effectiveness and detection accuracy in real reporting environments. Standard setters such as the ISSB should consider developing guidance on AI-assisted disclosure preparation and verification, including minimum interpretability requirements for AI tools used in regulatory contexts.
Second, for investors and data providers, cross-lingual and cross-jurisdictional benchmarking is essential. Most current NLP pipelines are trained on English corpora, risking bias when applied to emerging markets. International coordination, such as ISSB-aligned multilingual datasets, would reduce the risk of Anglo-American dominance and increase the global relevance of AI-based ESG metrics (Kotsantonis & Pinney, 2022). Data vendors should be encouraged to disclose methodological details of their AI models, enabling users to assess the reliability and applicability of AI-derived ESG signals.
Third, for firms and auditors, policy frameworks should anticipate generative-AI risks. As large language models begin to draft sustainability reports, disclosures may become uniformly fluent yet strategically misleading. Auditors should develop competencies in detecting AI-generated content and assessing whether AI-enhanced disclosures are substantively informative or merely rhetorically polished. Firms should establish governance protocols for AI use in disclosure preparation, including human oversight and verification against quantitative performance data.
In terms of scholarly contributions, this review advances the literature in three ways. First, we provide a systematic synthesis of how AI methods address each disclosure quality dimension, identifying both capabilities and limitations that have not been comprehensively mapped in prior work. Second, we highlight critical tensions between dimensions, particularly the trade-off between surface-level quality improvements (readability, comparability) and substantive information provision (informativeness, credibility), offering a more nuanced assessment than prior reviews that treat these dimensions independently. Third, we propose a prioritized research agenda addressing causal identification, cross-lingual benchmarking, and governance frameworks, providing specific methodological guidance for scholars seeking to advance this rapidly evolving field.
From a theoretical standpoint, the review also offers several distinct contributions. First, we advance disclosure theory by demonstrating how AI technologies interact with the fundamental mechanisms through which corporate communications reduce information asymmetry. Our four-dimensional framework (readability, comparability, informativeness, credibility) provides a theoretically grounded taxonomy for evaluating disclosure quality that can be applied beyond the ESG context. Second, we extend impression management theory by identifying how AI creates new opportunities for both substantive transparency enhancement and sophisticated narrative manipulation. The dual-use nature of AI in disclosure contexts represents a novel theoretical insight that calls for a reconceptualization of traditional impression management frameworks. Third, we contribute to legitimacy theory by showing how AI-mediated disclosures may alter the dynamics of organizational legitimacy maintenance, potentially enabling ceremonial adoption of AI tools without corresponding substantive changes in sustainability performance. Fourth, we bridge the gap between accounting and computer science perspectives on disclosure quality, demonstrating that technical measurement advances must be integrated with institutional governance mechanisms to achieve stakeholder-relevant improvements. These theoretical contributions collectively advance our understanding of how technological innovation interacts with corporate disclosure practices and stakeholder decision-making.
Overall, AI offers the potential to convert the current “aggregate confusion” (Berg et al., 2022) in ESG reporting into structured, verifiable, and decision-useful information. Realizing this potential requires institutional experimentation, high-quality labeled datasets, and explainable model standards that embed stakeholder pluralism. Only when technological sophistication is matched by transparent governance can AI move from a measurement innovation to a mechanism for genuinely improving the sustainability information environment.
7. Limitations of This Review
This review has several limitations that readers should consider when interpreting its findings and recommendations.
First, as a narrative rather than systematic review, our literature selection process, while structured, does not follow formal systematic review protocols such as PRISMA. This approach was chosen given the interdisciplinary and rapidly evolving nature of AI applications in ESG disclosure, but it introduces potential selection bias and limits the reproducibility of our literature identification process.
Second, our review may underrepresent non-English scholarship. While we acknowledge and critique the Anglo-centric bias in the underlying literature, our own search strategy was primarily conducted in English, potentially overlooking valuable contributions published in other languages, particularly Chinese, German, and Spanish, where substantial ESG research communities exist.
Third, the rapid pace of AI development means that some findings may become outdated quickly. The emergence of new model architectures, particularly in generative AI, may alter the capabilities and risks we document. We have attempted to include the most recent contributions available through February 2025, but the field continues to evolve.
Fourth, our four-dimensional framework (readability, comparability, informativeness, credibility), while grounded in established accounting literature, is not exhaustive. Other dimensions of disclosure quality—such as timeliness, accessibility, or integration with financial reporting—merit attention but were beyond our scope.
Fifth, we rely substantially on working papers and preprints for recent AI developments, reflecting the nascent state of this research area. While we have assessed these sources for methodological rigor, some findings may not withstand peer review scrutiny, and readers should interpret quantitative claims from unpublished sources with appropriate caution.
Finally, our policy recommendations are necessarily general given the diversity of regulatory contexts globally. The specific implementation of AI governance in ESG disclosure will require adaptation to local institutional environments, legal frameworks, and stakeholder expectations.
Author Contributions
Conceptualization, J.L.; methodology, J.L.; software, J.L.; validation, J.L. and Z.Z.; formal analysis, J.L.; investigation, J.L.; resources, J.L. and Y.Y.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L., Y.Y. and Z.Z.; visualization, J.L.; supervision, J.L.; project administration, J.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Angioni, S., Consoli, S., Dessì, D., Osborne, F., Recupero, D. R., & Salatino, A. (2024). Exploring environmental, social, and governance (ESG) discourse in news: An AI-powered investigation through knowledge graph analysis. IEEE Access, 12, 45821–45839.
- Appelbaum, D., Kogan, A., & Vasarhelyi, M. A. (2017). Big data and analytics in the modern audit engagement: Research needs. Auditing: A Journal of Practice & Theory, 36(4), 1–27.
- Ballotta, L., Fusai, G., Kyriakou, I., Papapostolou, N. C., & Pouliasis, P. K. (2023). ESG ratings explainability through machine learning techniques. Annals of Operations Research, 334, 435–457.
- Beaver, W. H. (1968). The information content of annual earnings announcements. Journal of Accounting Research, 6, 67–92.
- Berg, F., Kölbel, J. F., & Rigobon, R. (2022). Aggregate confusion: The divergence of ESG ratings. Review of Finance, 26(6), 1315–1344.
- Bingler, J. A., Kraus, M., Leippold, M., & Webersinke, N. (2022). Cheap talk and cherry-picking: What ClimateBERT has to say on corporate climate risk disclosures. Finance Research Letters, 47, 102776.
- Birti, M., Osborne, F., & Maurino, A. (2025). Optimizing large language models for ESG activity detection in financial texts. arXiv, arXiv:2502.21112.
- Bonsall, S. B., Leone, A. J., Miller, B. P., & Rennekamp, K. M. (2017). A plain English measure of financial reporting readability. Journal of Accounting and Economics, 63(2–3), 329–357.
- Brennan, N. M., & Merkl-Davies, D. M. (2013). Accounting narratives and impression management. In The Routledge companion to accounting communication (pp. 109–132). Routledge.
- Calamai, T., Balalau, O., Le Guenedal, T., & Suchanek, F. M. (2025). Corporate greenwashing detection in text: A survey. arXiv, arXiv:2502.07541.
- Cho, C. H., Laine, M., Roberts, R. W., & Rodrigue, M. (2015). Organized hypocrisy, organizational façades, and sustainability reporting. Accounting, Organizations and Society, 40, 78–94.
- Christensen, H. B., Hail, L., & Leuz, C. (2021). Mandatory CSR and sustainability reporting: Economic analysis and literature review. Review of Accounting Studies, 26(3), 1176–1248.
- De Franco, G., Kothari, S. P., & Verdi, R. S. (2011). The benefits of financial statement comparability. Journal of Accounting Research, 49(4), 895–931.
- Delmas, M. A., & Burbano, V. C. (2011). The drivers of greenwashing. California Management Review, 54(1), 64–87.
- Dhaliwal, D. S., Li, O. Z., Tsang, A., & Yang, Y. G. (2011). Voluntary nonfinancial disclosure and the cost of equity capital: The initiation of corporate social responsibility reporting. The Accounting Review, 86(1), 59–100.
- DiMaggio, P. J., & Powell, W. W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48(2), 147–160.
- Dmuchowski, P., Dmuchowski, W., Baczewska-Dąbrowska, A. H., & Gworek, B. (2023). Environmental, social, and governance (ESG) model: Impacts and sustainable investment—Global trends and Poland’s perspective. Journal of Environmental Management, 329, 117023.
- Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv, arXiv:1702.08608.
- El Aziz, O., & Asdiou, A. (2025). AI-powered analysis of ESG disclosure: A clustering approach to determinants and motivations. Future Business Journal, 11, 197.
- Fischbach, J., Adam, M., Dzhagatspanyan, V., Mendez, D., Frattini, J., Kosenkov, O., & Elahidoost, P. (2022). Automatic ESG assessment of companies by mining and evaluating media coverage data: NLP approach and tool (ESG-Miner). arXiv, arXiv:2212.06540.
- Gepp, A., Linnenluecke, M. K., O’Neill, T. J., & Smith, T. (2018). Big data techniques in auditing research and practice: Current trends and future opportunities. Journal of Accounting Literature, 40, 102–115.
- Gupta, T. K., Goel, T., Verma, I., Dey, L., & Bhardwaj, S. (2024). Knowledge graph aided LLM based ESG question-answering from news. arXiv, arXiv:2402.11235.
- Hahn, R., & Kühnen, M. (2013). Determinants of sustainability reporting: A review of results, trends, theory, and opportunities in an expanding field. Journal of Cleaner Production, 59, 5–21.
- Healy, P. M., & Palepu, K. G. (2001). Information asymmetry, corporate disclosure, and the capital markets: A review of the empirical disclosure literature. Journal of Accounting and Economics, 31(1–3), 405–440.
- Hodge, K., Subramaniam, N., & Stewart, J. (2009). Assurance of sustainability reports: Impact on report users’ confidence and perceptions of information credibility. Australian Accounting Review, 19(3), 178–194.
- Huang, J., Wang, X., Li, S., & Chen, Y. (2024). Textual attributes of corporate sustainability reports and ESG ratings. Sustainability, 16(21), 9270.
- International Finance Corporation. (2024). MALENA: AI-powered ESG analysis and due diligence platform. Available online: https://malena.ifc.org/ (accessed on 1 September 2025).
- Khan, M., Serafeim, G., & Yoon, A. (2016). Corporate sustainability: First evidence on materiality. The Accounting Review, 91(6), 1697–1724.
- Kimbrough, M. D., Wang, X., Wei, S., & Zhang, J. (2024). Does voluntary ESG reporting resolve disagreement among ESG rating agencies? European Accounting Review, 33(1), 15–47.
- Kokina, J., & Davenport, T. H. (2017). The emergence of artificial intelligence: How automation is changing auditing. Journal of Emerging Technologies in Accounting, 14(1), 115–122.
- Kotsantonis, S., & Pinney, C. (2022). The ESG integration paradox. Journal of Applied Corporate Finance, 34(2), 70–82.
- Lagasio, V. (2024). ESG-washing detection in corporate sustainability reports: The ESGSI index. International Review of Financial Analysis, 96, 103742.
- Li, F. (2010). The information content of forward-looking statements in corporate filings: A naïve Bayesian machine learning approach. Journal of Accounting Research, 48(5), 1049–1102.
- Li, K. (2025). Big data and machine learning in ESG research. Asia-Pacific Journal of Financial Studies, 54(1), 1–45.
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2020). RoBERTa: A robustly optimized BERT pretraining approach. arXiv, arXiv:1907.11692.
- Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance, 66(1), 35–65.
- Loughran, T., & McDonald, B. (2014). Measuring readability in financial disclosures. Journal of Finance, 69(4), 1643–1671.
- Lyon, T. P., & Montgomery, A. W. (2015). The means and end of greenwash. Organization & Environment, 28(2), 223–249.
- Melloni, G., Caglio, A., & Perego, P. (2017). Saying more with less? Disclosure conciseness, completeness and balance in integrated reports. Journal of Accounting and Public Policy, 36(3), 220–238.
- Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363.
- Milne, M. J., & Gray, R. (2013). W(h)ither ecology? The triple bottom line, the global reporting initiative, and corporate sustainability reporting. Journal of Business Ethics, 118(1), 13–29.
- Muslu, V., Mutlu, S., Radhakrishnan, S., & Tsang, A. (2019). Corporate social responsibility report narratives and analyst forecast accuracy. Journal of Business Ethics, 154, 1119–1142.
- Nagar, V., & Schoenfeld, J. (2024). Measuring weather exposure with annual reports. Review of Accounting Studies, 29, 1–32.
- Pombinho, M., Fialho, A., & Novas, J. (2024). Readability of sustainability reports: A bibliometric analysis and systematic literature review. Sustainability, 16(1), 260.
- Rane, N., Choudhary, S., & Rane, J. (2024). Artificial intelligence driven approaches to strengthening environmental, social, and governance (ESG) criteria in sustainable business practices: A review. Available online: https://ssrn.com/abstract=4843215 (accessed on 3 September 2025).
- Robinson, S., Rogers, J. L., Skinner, A. N., & Wellman, L. (2023). Environmental disclosures and ESG fund ownership (Working Paper). University of Colorado Boulder.
- Sautner, Z., Van Lent, L., Vilkov, G., & Zhang, R. (2023a). Firm-level climate change exposure. The Journal of Finance, 78, 1449–1498.
- Sautner, Z., Van Lent, L., Vilkov, G., & Zhang, R. (2023b). Value and values discovery in earnings calls (No. 23-92). Swiss Finance Institute.
- Schimanski, T., Reding, A., Reding, N., Bingler, J., Kraus, M., & Leippold, M. (2024). Bridging the gap in ESG measurement: Using NLP to quantify environmental, social, and governance communication. Finance Research Letters, 61, 104979.
- Serafeim, G., & Yoon, A. (2023). Stock price reactions to ESG news: The role of ESG ratings and disagreement. Review of Accounting Studies, 28, 1500–1530.
- Shimamura, T., Tanaka, Y., & Managi, S. (2025). Evaluating the impact of report readability on ESG scores: A generative AI approach. International Review of Financial Analysis, 101, 104027.
- Simnett, R., Vanstraelen, A., & Chua, W. F. (2009). Assurance on sustainability reports: An international comparison. The Accounting Review, 84(3), 937–967.
- Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355–374.
- Suchman, M. C. (1995). Managing legitimacy: Strategic and institutional approaches. Academy of Management Review, 20(3), 571–610.
- Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 62(3), 1139–1168.
- Velte, P. (2025). A literature review concerning the non-carbon-related environmental goals of the EU Taxonomy Regulation and the European Sustainability Reporting Standards (ESRS). Journal of Global Responsibility, 16(3), 542–568.
- Verrecchia, R. E. (2001). Essays on disclosure. Journal of Accounting and Economics, 32(1–3), 97–180.
- Wang, Z., Hsieh, T. S., & Sarkis, J. (2018). CSR performance and the readability of CSR reports: Too good to be true? Corporate Social Responsibility and Environmental Management, 25(1), 66–79.
- Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBERT: A pretrained language model for climate-related text. arXiv, arXiv:2110.12010.
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837.
- Zanin, L. (2022). Estimating the effects of ESG scores on corporate credit ratings using multivariate ordinal logit regression. Empirical Economics, 62(6), 3087–3118.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.