Developing a Novel Audit Risk Metric Through Sentiment Analysis

Wang, Xiao; Sun, Feng; Kim, Min Gyeong; Na, Hyung Jong

doi:10.3390/su17062460

¹ Enterprise Compliance Research Center, Binzhou Polytechnic, No. 919 Yellow River Twelve Road, Bincheng District, Binzhou 256619, China

² School of Business Administration, Binzhou Polytechnic, Binzhou 256619, China

³ S.K.K. Business School, Sungkyunkwan University, Seoul 03063, Republic of Korea

⁴ Department of Accounting and Taxation, Semyung University, 65, Semyeong-ro, Jecheon-si 27136, Republic of Korea

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Sustainability 2025, 17(6), 2460; https://doi.org/10.3390/su17062460

Submission received: 21 November 2024 / Revised: 28 February 2025 / Accepted: 3 March 2025 / Published: 11 March 2025

(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract

This study introduces the Audit Risk Sentiment Value (ARSV), a novel audit risk proxy that leverages sentiment analysis to address limitations in traditional audit risk measures such as audit fees (LNFEE), audit hours (LNHOUR), and discretionary accruals (|MJDA|). Traditional proxies primarily capture quantitative dimensions, overlooking qualitative insights embedded in audit report narratives. By systematically analyzing sentiment and tone, ARSV captures nuanced audit risk dimensions that reflect the auditor’s risk perception. The study validates ARSV using a dataset of South Korean firms listed on the KOSPI from 2018 to 2023. The results demonstrate ARSV’s superior explanatory power, as confirmed through the Vuong test, showing consistent performance across binary and continuous measures of explanatory language. ARSV bridges the gap between qualitative and quantitative audit risk assessments, offering significant benefits to auditors, regulators, and investors. Its ability to enhance the interpretability of audit reports improves transparency and trust in financial reporting, addressing stakeholder demands for actionable, forward-looking information. Furthermore, ARSV aligns with global trends emphasizing sustainability and accountability by integrating qualitative insights into audit practices. While this study provides robust evidence supporting ARSV’s effectiveness, its focus on South Korean firms may limit its generalizability. Future research should explore the application of ARSV in diverse regulatory and cultural contexts and refine the sentiment analysis tools using advanced machine learning techniques. Expanding ARSV to include other unstructured data, such as management commentary, could further enhance its applicability. This study marks a significant step toward modernizing audit methodologies, aligning them with evolving demands for comprehensive and transparent financial reporting.
The empirical analysis reveals that ARSV outperforms traditional audit risk proxies with significantly higher explanatory power. Specifically, ARSV achieved a pseudo R² of 0.786, compared to 0.608 for LNFEE, 0.604 for LNHOUR, and 0.578 for |MJDA|. The Vuong test results further validate ARSV’s superiority, with Z-statistics of −12.168, −12.492, and −9.775 when compared against LNFEE, LNHOUR, and |MJDA|, respectively. The model incorporating ARSV demonstrated an F-value of 62.454 and an Adjusted R² of 0.599, highlighting its robustness and reliability in audit risk assessment. These quantitative metrics underscore ARSV’s effectiveness in capturing qualitative audit risk dimensions, offering a more precise and informative measure for stakeholders.

Keywords:

Audit Risk Sentiment Value; sentiment analysis; audit risk proxies; explanatory language; financial reporting transparency

1. Introduction

Audit risk assessment plays a pivotal role in ensuring the transparency and reliability of financial reporting. As a fundamental component of the auditing process, it enables stakeholders to evaluate the integrity and accuracy of financial statements. However, despite the issuance of clean audit opinions, the level of audit risk can vary significantly across firms, exposing the limitations of traditional audit risk proxies. Commonly used metrics, such as audit fees and hours, primarily capture the quantitative dimensions of audit risk but neglect qualitative aspects, such as linguistic expressions in audit reports, which could provide critical insights into audit risk [1]. This gap underscores the need for more robust measures to integrate the rich qualitative information embedded within audit reports to enhance risk assessment.

The prevalence of unqualified audit opinions further emphasizes the necessity for enhanced audit risk measurement. Approximately 97–98% of listed companies in South Korea and the United States receive such opinions, raising concerns about the objectivity and fairness of audit assessments [2,3]. This high frequency of unqualified opinions has led to skepticism regarding the informational value of audit reports and whether they effectively communicate the underlying risks. Choi et al. (2023) demonstrated that explanatory language in audit reports often conveys the auditor’s perception of audit risk [2]. Their study utilized advanced big data analysis techniques to show how qualitative details in audit reports provide critical risk insights. However, the complexity of this language often limits its interpretability for non-expert users.

Similarly, Choi et al. (2022) explored the potential of deep neural networks in estimating abnormal audit fees and highlighted the value of advanced computational methods in improving audit-related assessments [4]. Their findings suggest that incorporating machine learning techniques can enhance the accuracy and reliability of audit measures, further supporting the need for innovation in audit risk proxies.

The growing emphasis on sustainability and transparency in accounting and auditing compounds these concerns. Sustainability reporting, encompassing environmental, social, and governance (ESG) dimensions, highlights the importance of integrating financial and non-financial information into actionable insights [5]. However, while sustainability reporting has advanced in addressing stakeholder needs, audit reports often fail to provide meaningful insights into audit risks. Loughran and McDonald (2011) emphasized that narrative disclosures in financial reports, including audit reports, contain valuable information but require advanced analytical tools, such as sentiment analysis, to extract actionable insights [6]. The complexity of qualitative details in audit reports further emphasizes the need for accessible and reliable audit risk measures.

In response to these challenges, this study introduces a novel audit risk proxy, the Audit Risk Sentiment Value (ARSV), which leverages sentiment analysis to quantify qualitative data from audit reports. Sentiment analysis has been recognized as a powerful tool for analyzing unstructured textual data in financial disclosures [7]. By converting textual data into quantifiable metrics, ARSV addresses the limitations of traditional proxies and provides a more precise measure of audit risk. Utilizing sentiment mining and advanced techniques such as the Vuong test, this study demonstrates the superiority of ARSV over existing proxies, including audit fees, audit hours, and discretionary accruals. The findings consistently show that ARSV offers enhanced explanatory power and reliability, making it a valuable tool for assessing audit risk. The contributions of this study can be summarized as the following key points.

First, this study introduces an innovative methodology incorporating artificial intelligence (AI) techniques, particularly sentiment analysis, into audit risk assessment, enabling a novel approach to evaluating audit risks. Second, by systematically analyzing unstructured textual data, the study broadens the traditional boundaries of audit research, offering new opportunities to explore narrative disclosures in audit reports. Third, the methodology aligns with sustainability principles by fostering greater transparency and accountability in financial reporting, addressing stakeholders’ growing demands for reliable and forward-looking audit information. Fourth, the creation of an audit report lexicon provides a robust framework for extracting meaningful insights from unstructured data, bridging the gap between qualitative narratives and quantitative metrics. Fifth, the proposed approach offers significant value for scholarly research and practical applications, providing stakeholders with actionable insights and enhancing the reliability of audit methodologies.

By examining South Korean firms listed on the Korean Composite Stock Price Index (KOSPI) between 2018 and 2023, this study provides empirical evidence supporting the effectiveness of ARSV in assessing audit risk. The results highlight the potential of incorporating qualitative data into audit risk measurement to enhance audit reports’ informativeness and meet stakeholders’ growing demands for precise, reliable, and forward-looking information. By addressing the limitations of traditional audit risk proxies and integrating advanced computational techniques, this study contributes to the ongoing evolution of audit methodologies. It sets a foundation for future innovations in the field.

2. Related Previous Studies and Research Questions Development

2.1. Related Previous Studies

Traditional audit risk proxies, such as audit fees, audit hours, and discretionary accruals, have long been used to assess audit risk. However, their reliance on quantitative dimensions limits their ability to capture the qualitative nuances embedded in audit reports. DeFond and Zhang (2014) provided a foundational critique of these measures, emphasizing that audit quality, a critical element of reliable financial reporting, cannot be fully captured through these proxies [1]. They highlighted the potential of qualitative data, such as textual content in audit reports, to provide deeper insights into audit risk. Their findings call for innovative methodologies that integrate unstructured data into audit analyses to better address the complexity of modern auditing environments.

Similarly, Hope et al. (2017) explored the narrative disclosures in financial reporting, particularly within audit reports [3]. Their study demonstrated that these narratives often contain valuable information about audit risk and quality but are underutilized due to their complexity. This limitation highlights the need for advanced tools to systematically extract and quantify insights from textual data, underscoring the inadequacy of traditional proxies in providing a comprehensive understanding of audit risk.

Advanced analytical techniques, particularly sentiment analysis, have been recognized as powerful tools for uncovering insights from textual data. Loughran and McDonald (2011) pioneered sentiment analysis in financial disclosures, developing a framework for extracting sentiment from textual data and introducing the concept of domain-specific lexicons to enhance the relevance of sentiment-based metrics [6]. Their study demonstrated that sentiment analysis could reveal critical insights into audit risk that are often overlooked by traditional quantitative proxies.

Building on this foundation, Li (2010) explored applying computational text-mining techniques to analyze unstructured data in corporate disclosures [7]. His research emphasized the potential of domain-specific approaches for identifying hidden risks and patterns, laying the groundwork for extending sentiment analysis to audit reports. These studies collectively highlight the value of sentiment analysis as a method for addressing the qualitative dimensions of audit risk, offering a robust alternative to traditional proxies.

Audit reports often include complex narrative disclosures, such as Key Audit Matters (KAM), intended to enhance transparency and stakeholder understanding of significant risks. Christensen et al. (2018) [8] examined the effectiveness of KAM disclosures and found that their qualitative nature often lacks clarity and consistency, limiting their communicative value. The authors proposed sentiment analysis to quantify and better articulate the qualitative aspects of KAM, improving their utility for stakeholders.

Choi, Na, and Lee (2023) extended this discussion by demonstrating that explanatory language in audit reports frequently reflects auditors’ perception of audit risk [2]. They used big data techniques to analyze this qualitative information, showing how it provides critical risk-level insights. However, they also acknowledged that the complexity of this language poses challenges for non-expert users, emphasizing the need for tools that enhance interpretability.

Machine learning has emerged as a transformative tool for addressing the limitations of traditional audit risk proxies. Choi et al. (2022) explored the use of deep neural networks to estimate abnormal audit fees, showcasing the potential of machine learning to improve audit-related assessments [4]. Their findings emphasized that integrating advanced computational methods can enhance the accuracy and reliability of audit risk measures, particularly when combined with qualitative data.

Finally, Chopra et al. (2024) examined the intersection of sustainability reporting and audit risk assessment, highlighting how environmental, social, and governance (ESG) factors influence stakeholders’ perceptions of audit quality [5]. Their study underscores the importance of combining financial and non-financial data into cohesive frameworks, aligning audit risk methodologies with broader sustainability-focused practices.

These studies collectively highlight the significant limitations of traditional audit risk proxies and emphasize the untapped potential of advanced analytical techniques such as sentiment analysis. They demonstrate that integrating qualitative and unstructured data into audit risk assessments can improve the informativeness and utility of audit reports, providing stakeholders with clearer and more actionable insights into audit risk. Together, these advancements pave the way for a more comprehensive and robust framework for audit risk assessment.

2.2. Research Questions Development

Building on the insights and gaps identified in the literature, this study seeks to address the following research questions:

(1): How can sentiment analysis techniques be applied to audit reports to develop a novel audit risk metric that captures qualitative dimensions of audit risk?
(2): What are the advantages of the Audit Risk Sentiment Value (ARSV) over traditional audit risk proxies, such as audit fees, audit hours, and discretionary accruals?
(3): How does incorporating qualitative data into audit risk measurement enhance the interpretability and usefulness of audit reports for diverse stakeholders, including non-expert users?
(4): What implications does adopting sentiment-based audit risk measures have for aligning audit practices with sustainability and transparency goals?

These research questions aim to guide the development and validation of the proposed audit risk proxy, emphasizing the importance of qualitative data and advanced analytical methodologies in addressing the limitations of traditional metrics.

3. Research Design of Analyzing Texts in the Audit Report

3.1. Data Processing

The sample for this study comprises firms listed on the Korean Composite Stock Price Index (KOSPI) from 2018 to 2023, a setting that provides a rich context for examining the relationship between audit report narratives and audit risk. The dataset was selected to align with the research objectives and to ensure the robustness of the findings. South Korea was chosen because of its mandatory disclosure requirements: audit fees and audit hours are consistently reported, enabling a thorough comparison between traditional audit risk proxies and the newly developed Audit Risk Sentiment Value (ARSV). Additionally, the availability of comprehensive audit reports, including detailed narrative disclosures, ensures the dataset is well-suited for sentiment analysis.

Another critical reason for selecting this dataset is that 2018 introduced the requirement to disclose Key Audit Matters (KAM) in audit reports in South Korea. KAM disclosures provide detailed insights into areas of significant risk and auditor judgment, aligning closely with the objectives of this study. These disclosures have been incorporated as a control variable in the empirical analysis, ensuring that the findings account for the influence of KAM on audit outcomes. The availability of KAM data adds depth to the analysis, enabling a more nuanced understanding of how ARSV interacts with other qualitative elements of audit reports.

The analysis is restricted to firm-year observations with unqualified audit opinions to ensure consistency and reduce heterogeneity. This approach minimizes variability between firms with clean and modified opinions, enabling a more focused investigation of audit risk. The Korean audit environment provides a particularly suitable research setting, as listed companies must disclose audit hours and fees. These mandatory disclosures facilitate the empirical analysis of existing audit risk measures, such as audit fees and hours, and enable a robust verification of the novel audit risk proxy developed in this study.

The data collection process is conducted as follows. First, annual audit reports, including explanatory paragraphs, are collected directly from the DART (Data Analysis, Retrieval, and Transmission) system (https://dart.fss.or.kr/ accessed on 12 October 2024) managed by the Financial Supervisory Service of Korea. This database serves as a comprehensive repository for corporate disclosures, ensuring the reliability and completeness of the collected audit report data.

Second, financial data are sourced from the Value-Search database (https://www.nicevse.com/vse/main.html accessed on 12 October 2024) and the TS-2000 system (http://www.kocoinfo.com/ accessed on 12 October 2024), both of which are recognized platforms for financial information in South Korea. These sources provide the necessary financial variables for the empirical analysis.

The initial dataset consists of 4242 firm-year observations spanning the sample period. To ensure data uniformity and accuracy, observations pertaining to financial industries, non-December fiscal year-ends, and firms with missing financial data are excluded. After these exclusions, the final sample comprises 3795 firm-year observations. The sample selection process is summarized in Table 1, providing a clear overview of the criteria applied during data refinement.
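The three exclusion rules described above can be sketched as a simple filter. This is a minimal illustration only: the `FirmYear` record, its field names, and the use of a single `total_assets` field to stand in for "missing financial data" are hypothetical and are not taken from the study's data files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FirmYear:
    """Hypothetical firm-year record; field names are illustrative only."""
    firm_id: str
    year: int
    is_financial: bool               # True for firms in financial industries
    fiscal_year_end_month: int       # 1-12
    total_assets: Optional[float]    # None stands in for missing financial data


def keep(obs: FirmYear) -> bool:
    """Apply the three exclusion criteria to one observation."""
    return (
        not obs.is_financial
        and obs.fiscal_year_end_month == 12
        and obs.total_assets is not None
    )


def select_sample(observations: List[FirmYear]) -> List[FirmYear]:
    """Return only the firm-year observations that survive all exclusions."""
    return [obs for obs in observations if keep(obs)]


raw = [
    FirmYear("A", 2018, False, 12, 1_000.0),  # retained
    FirmYear("B", 2018, True, 12, 2_000.0),   # financial industry
    FirmYear("C", 2019, False, 3, 500.0),     # non-December fiscal year-end
    FirmYear("D", 2020, False, 12, None),     # missing financial data
]
sample = select_sample(raw)
```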

This rigorous data collection and refinement process ensures the reliability of the dataset and supports the validity of the study’s findings. By leveraging the unique attributes of the Korean audit environment, this study provides valuable insights into audit risk assessment and the practical applicability of the proposed audit risk proxy.

3.2. Keyword Extraction from the Audit Report

The methodology for extracting and processing keywords from audit reports involves several systematic steps to ensure the precision and relevance of the data for analysis. The process is described as follows:

First, keywords are extracted from the audit report text and automatically collected using Java-based functions. To prepare the text for analysis, morpheme segmentation is performed using Java-embedded code that incorporates Porter’s algorithm [9]. Following this, Part-of-Speech (POS) tagging is applied to classify nouns, verbs, adjectives, and adverbs [10]. Only nouns are retained for the analysis, as they are the most indicative of contextual meaning. All other elements, including verbs, adjectives, adverbs, special characters, and non-lexical symbols (such as punctuation marks), are systematically removed. Additionally, basic HTML tags are stripped from the text during extraction to eliminate extraneous formatting.
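The cleaning and noun-filtering steps above can be sketched as follows. The study reports a Java implementation; this Python sketch is only an illustration, and it makes several assumptions: the text is English, the POS tagger is replaced by a toy lookup table (`TOY_POS`, hypothetical), and the stemming step is omitted. Korean-language reports would need a different character pattern and a real morphological analyzer.

```python
import re
from typing import Dict, List

TAG_RE = re.compile(r"<[^>]+>")          # basic HTML tags
NON_WORD_RE = re.compile(r"[^A-Za-z\s]")  # punctuation, digits, special characters

# Hypothetical stand-in for a real POS tagger: maps a token to its tag.
TOY_POS: Dict[str, str] = {
    "uncertainty": "NOUN",
    "liquidity": "NOUN",
    "exists": "VERB",
    "significant": "ADJ",
}


def extract_nouns(raw_text: str, pos_lookup: Dict[str, str]) -> List[str]:
    """Strip markup and symbols, then keep only tokens tagged as nouns."""
    text = TAG_RE.sub(" ", raw_text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = text.lower().split()
    # Tokens missing from the lookup are dropped along with non-nouns.
    return [tok for tok in tokens if pos_lookup.get(tok) == "NOUN"]


nouns = extract_nouns(
    "<p>Significant uncertainty exists regarding liquidity (see Note 3).</p>",
    TOY_POS,
)
```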

Next, the importance of each word is quantified using Term Frequency–Inverse Document Frequency (TF-IDF) values [11]. This process is based on the Bag-of-Words (BoW) model, which provides term frequency information [12]. TF-IDF is a weighted metric that measures the importance of a word within a document relative to its occurrence across a corpus of documents. Unlike raw frequency counts, TF-IDF accounts for the distribution of words, ensuring that commonly occurring yet insignificant terms do not dominate the analysis [13]. Specifically, the Inverse Document Frequency (IDF) component adjusts the weight of a word based on how frequently it appears across multiple documents, reducing the influence of generic terms that lack distinctive significance [14].
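A minimal sketch of TF-IDF weighting over a Bag-of-Words representation is given below. The text does not state which TF-IDF variant was used, so this assumes one common form: term frequency as a share of the document's tokens, and IDF as the natural log of (number of documents / number of documents containing the term). The function name and toy corpus are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)   # Bag-of-Words term counts for this document
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = [
    ["audit", "uncertainty", "audit"],
    ["audit", "stability"],
    ["audit", "uncertainty", "regulation"],
]
weights = tf_idf(corpus)
```

In this toy corpus "audit" occurs in every document, so its IDF is log(1) = 0 and its weight vanishes, which is the down-weighting of generic terms described above.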

Words appearing with a frequency of less than 1% are excluded to manage computational efficiency and maintain focus on meaningful terms. This threshold reduces the computational burden while retaining words likely to indicate the unique characteristics of the documents. Frequency calculations are performed on each file, ensuring that high-frequency words that capture the traits of specific files are prioritized. It is important to note that the inclusion of words is file-specific; a term present in one file may not appear in others, reflecting variations in document content.
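The per-file frequency cut-off can be sketched as below. The text does not define the denominator of the 1% threshold, so this assumes it means a term's share of all retained tokens in that file; `min_share` and the function name are illustrative.

```python
from collections import Counter
from typing import Dict, List


def frequent_terms(tokens: List[str], min_share: float = 0.01) -> Dict[str, int]:
    """Keep terms whose share of the file's tokens is at least min_share."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: c for term, c in counts.items() if c / total >= min_share}


# Toy file: 200 tokens, one of which appears only once (0.5% share).
file_tokens = ["audit"] * 150 + ["uncertainty"] * 49 + ["misprint"]
kept = frequent_terms(file_tokens)
```

Because the filter runs per file, a term dropped from one file may still be retained in another where it is more frequent.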

To facilitate further analysis, the words used in each file are aligned to avoid an overlap, and their frequencies are systematically recorded. Since the TF-IDF values of identical words can vary across files based on their contextual relevance, these values are collectively recorded to ensure comprehensive representation.

This robust text-processing methodology ensures that the extracted keywords accurately reflect audit reports’ contextual and semantic characteristics. By leveraging TF-IDF weighting, the analysis emphasizes the most relevant terms, enabling meaningful differentiation among documents and enhancing the reliability of subsequent evaluations.

4. Development of a New Audit Risk Proxy

4.1. Quantification of Audit Report Text Data Using Sentiment Mining

To develop a novel measure of audit risk that reflects the qualitative data embedded in audit reports, we apply sentiment mining techniques to analyze the textual content of these reports. Five audit experts, comprising two accounting and auditing professors and three certified public accountants, independently score keywords from the audit reports. Each expert assigns scores based on the contextual meaning of the keywords within the audit report rather than relying solely on dictionary definitions.

The scoring scale ranges from +3 to −3, reflecting varying levels of audit risk. Keywords indicative of the lowest audit risk are assigned a score of +3, while those associated with the highest audit risk receive a score of −3. Neutral keywords are scored as 0, resulting in seven consecutive integer score categories. This scoring process ensures that the qualitative nuances of the audit report are systematically captured and quantified.

For example, positive keywords in audit reports, such as “improvement”, “stability”, “optimum”, “raise”, “conclusion”, and “solution”, are indicative of reduced audit risk and are scored positively. Conversely, negative keywords such as “capital reduction”, “joint management”, “relation”, “uncertainty”, “regulation”, and “ability” suggest elevated audit risk and are scored negatively. The quantification process is carried out in three rounds to ensure consistency, with each expert independently scoring the significant keywords.

4.2. Establishment of the Audit Report Lexicon

The development of the new audit risk proxy, the Audit Risk Sentiment Value (ARSV), involves calculating a weighted average of the TF-IDF values and expert-assigned scores of keywords. Specifically, the ARSV is computed by combining the frequency of keywords as determined by their TF-IDF values with the contextual sentiment scores assigned by audit experts. This weighted averaging approach ensures that keywords’ importance and sentiment are incorporated into the calculation.

To address potential biases in the scoring process and enhance the reliability of the ARSV metric, several additional measures were implemented alongside the Intra-Class Correlation (ICC) test. While the ICC test result at the 100% confidence level demonstrates a high degree of consistency among expert evaluations, ensuring the robustness and objectivity of the scoring process required further safeguards against potential biases.

First, the scoring process incorporated a diverse panel of five experts, comprising two accounting and auditing professors and three certified public accountants (CPAs). This diversity in professional expertise and academic perspectives ensured a balanced evaluation of the sentiment of keywords within audit reports. By leveraging a broad range of knowledge and experience, the potential for individual biases was minimized, enhancing the representativeness and credibility of the scoring outcomes.

Second, the keywords were presented to the experts in a randomized and blinded format to mitigate contextual or cognitive biases. This approach ensured that the scoring was based solely on the contextual meaning of the keywords, independent of the identity of the audit report or the firm from which the keywords were extracted. Such a blinded process reduced the risk of extraneous factors influencing the experts’ evaluations.

Third, an iterative scoring process was employed to build expert consensus. Conducted over three rounds, this process allowed for identifying and discussing discrepancies in the assigned scores. Through iterative refinement, ambiguities were clarified, and a shared understanding of the sentiment associated with the keywords was achieved. This iterative approach facilitated convergence in scores and strengthened the reliability of the final evaluations.

Fourth, cross-validation techniques were applied as an additional measure to validate the robustness of the scoring process. Subsets of the keywords were re-evaluated independently by different experts to confirm the consistency of the sentiment scores. This cross-validation helped identify any systematic biases and reinforced the credibility of the scoring methodology.

Collectively, these measures ensured that the sentiment scores used in the ARSV calculation were objective, consistent, and dependable. By incorporating a diverse expert panel, implementing a blinded scoring process, engaging in iterative consensus-building, and utilizing cross-validation techniques, the scoring process was refined to minimize subjective influences and maximize reliability. These efforts contribute to the methodological rigor of ARSV, establishing its validity as a novel audit risk proxy.

Moving forward, the operationalization of ARSV could be further enhanced by diversifying expert panels to include participants from varying jurisdictions and industries. Additionally, integrating automated sentiment analysis tools, such as machine learning algorithms, could complement expert evaluations, enabling greater scalability and reducing the reliance on manual scoring processes. Such advancements would strengthen the ARSV methodology and extend its applicability across diverse regulatory and organizational contexts.

As an example of the ARSV calculation, suppose that in the audit report of Company ABC the keyword “capital reduction” has a sentiment score of −3 and a TF-IDF value of 5, while the keyword “stable” has a sentiment score of +2 and a TF-IDF value of 4. The ARSV for the company’s audit report is then calculated as (−3 × 5) + (2 × 4) = −7.
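The Company ABC example can be reproduced with a few lines of code. This sketch follows the worked example, which sums the products of TF-IDF value and sentiment score without a normalizing denominator; whether the published metric applies any further normalization is not stated here, so treat the function as illustrative. The names `arsv`, `report_tfidf`, and `lexicon` are hypothetical.

```python
from typing import Dict


def arsv(report_tfidf: Dict[str, float], lexicon: Dict[str, int]) -> float:
    """Sum TF-IDF x sentiment score over keywords found in the lexicon."""
    return sum(
        weight * lexicon[term]
        for term, weight in report_tfidf.items()
        if term in lexicon  # keywords without an expert score are skipped
    )


# Values taken from the Company ABC example in the text.
lexicon = {"capital reduction": -3, "stable": 2}
report_tfidf = {"capital reduction": 5.0, "stable": 4.0}

company_abc_arsv = arsv(report_tfidf, lexicon)  # (-3 * 5) + (2 * 4) = -7
```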

Through these processes, we have established an Audit Report Lexicon, which integrates qualitative data from audit reports into a structured framework for audit risk assessment. To the best of our knowledge, this lexicon represents the first application of sentiment mining in the auditing field to develop an audit risk proxy. It provides a valuable resource for academic research and practical applications in auditing. The lexicon facilitates a more nuanced understanding of audit risk by reflecting the qualitative dimensions of audit reports, thereby addressing limitations in existing measures.

Table 2 presents selected examples of ARSVs derived from the audit report lexicon, demonstrating its applicability in capturing and quantifying audit risk indicators. This novel lexicon is anticipated to significantly enhance audit research and practice by offering a robust tool for integrating qualitative insights into audit risk assessment.

Figure 1 summarizes the calculation of the new audit risk proxy and the construction of the audit report lexicon, schematizing each step from audit report data collection to the calculation of the ARSV (Audit Risk Sentiment Value). Using sentiment analysis techniques, the methodology combines text analytics with domain expertise, resulting in a systematic, data-driven metric for evaluating audit risk. Each process stage is designed to ensure accuracy and reliability, providing auditors with a quantitative tool for assessing risk.

The process begins with collecting audit reports as the primary data source. These reports contain critical information that forms the foundation for identifying key elements indicative of audit risk. By systematically gathering audit reports, the methodology ensures a rich and comprehensive dataset for analysis.

Following data collection, the methodology extracts keywords and phrases using sentiment mining techniques. This phase identifies linguistic elements in the audit reports closely associated with audit risk. Sentiment mining enables the detection of words and phrases with significant sentiment polarity—either positive or negative—providing a nuanced understanding of the report’s content.

The extracted text undergoes preprocessing and Part-of-Speech (POS) tagging to prepare the data for analysis. Preprocessing involves cleaning and formatting the text, ensuring consistency, and removing irrelevant elements. POS tagging refines the data by labeling words according to grammatical roles, facilitating more accurate subsequent analysis. This stage is critical in ensuring the extracted data are well-structured and ready for vectorization.

The next step involves vectorization using the Bag-of-Words model, weighted by TF-IDF. This method converts textual data into numerical representations, emphasizing frequent and uniquely significant terms within the corpus. The application of TF-IDF weighting ensures that commonly used but less meaningful words (e.g., “the” or “and”) are downplayed, while terms relevant to audit risk are prioritized.

Once the text data have been vectorized, domain experts evaluate the extracted keywords by assigning sentiment scores. Each keyword is scored as either positive or negative, depending on its potential contribution to audit risk. The ICC test ensures reliability and consistency in this scoring process. This test assesses the degree of agreement among experts, with a conformity threshold set at 99%. Adjustments are made if the conformity level is not achieved, and the scoring process is reiterated until consensus is reached. This rigorous validation process ensures the credibility of the sentiment scores.
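The agreement check can be illustrated with an intraclass correlation computed from a keywords-by-experts score matrix. The text does not say which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); the example scores are invented for illustration.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1) for a matrix with one row per keyword, one column per expert."""
    n = len(ratings)        # number of keywords (targets)
    k = len(ratings[0])     # number of experts (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented scores on the -3..+3 scale: four keywords rated by five experts.
scores = [
    [3, 3, 2, 3, 3],
    [-3, -3, -3, -2, -3],
    [0, 0, 1, 0, 0],
    [2, 2, 2, 2, 1],
]
agreement = icc_2_1(scores)
```

In a workflow like the one described, a value of `agreement` below the chosen conformity threshold would trigger another scoring round.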

After the expert agreement is achieved, the methodology calculates weighted average values for the sentiment scores. These weighted values aggregate the sentiment polarity of the identified keywords, providing a numerical basis for constructing a specialized lexicon tailored to audit reports.
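A weighted average of this kind can be sketched as follows. The weighting scheme is not specified in this section, so the per-expert weights, the keyword scores, and the helper name `weighted_average` are hypothetical and serve only to show the arithmetic.

```python
def weighted_average(scores, weights):
    """Weighted mean of the expert scores for one keyword."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Hypothetical expert scores; negative values indicate higher audit risk.
expert_scores = {
    "going concern": [-2, -2, -1],
    "fairly presented": [1, 2, 1],
}
expert_weights = [0.5, 0.3, 0.2]  # assumed weights, not from the study

# One aggregated sentiment value per keyword.
lexicon = {
    keyword: weighted_average(scores, expert_weights)
    for keyword, scores in expert_scores.items()
}
```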

The construction of the audit report lexicon marks a critical milestone in the methodology. This lexicon incorporates the weighted sentiment values, creating a customized database of terms relevant to audit risk contexts. The lexicon’s tailored nature enables more precise sentiment analysis in future applications, aligning closely with the unique linguistic characteristics of audit reports.

Finally, the ARSV metric is calculated by applying the constructed lexicon to the audit reports. The ARSV metric quantifies audit risk by synthesizing sentiment analysis results, offering a novel, data-driven proxy for risk evaluation. This metric enables auditors to assess audit risk quantitatively, addressing the inherent limitations of traditional qualitative methods.
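The final scoring step can be sketched as a lexicon lookup over the tokens of a report. The exact aggregation formula is not restated in this section, so the length-normalised sum used here is an assumption, and the function name `sentiment_value`, the lexicon entries, and the sample sentences are hypothetical.

```python
def sentiment_value(tokens, lexicon):
    """Average lexicon score per token; 0.0 for an empty report.

    Tokens missing from the lexicon contribute a score of zero.
    Dividing by report length is an assumption of this sketch.
    """
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)

# Hypothetical single-word lexicon; negative values indicate higher risk.
lexicon = {"doubt": -1.8, "uncertainty": -1.5, "fairly": 1.3}

risky = "substantial doubt and material uncertainty exist".split()
clean = "the statements are fairly presented".split()

risky_score = sentiment_value(risky, lexicon)
clean_score = sentiment_value(clean, lexicon)
```

With a lexicon signed in this way, a report that dwells on doubt and uncertainty receives a negative score, while a report with routine wording receives a score at or above zero.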

In summary, this methodology represents an innovative approach to audit risk evaluation. By integrating sentiment analysis into each stage of the process, the ARSV proxy provides a robust and systematic tool for assessing audit risk with greater precision and consistency. This approach not only enhances the reliability of risk assessments but also contributes to the advancement of audit practices by applying cutting-edge analytical techniques.

5. Research Design of Empirical Test

5.1. Comparing the New Audit Risk Proxy with Existing Audit Risk Measures by Vuong Test Using a Dummy Variable of Explanatory Language as a Dependent Variable

The explanatory paragraph in an audit report can indicate potential issues identified during the audit process. In this study, the presence of an explanatory paragraph indicates relatively higher audit risk. The study employs the model specified in Equation (1) to validate this relationship empirically. This model is designed to examine whether higher audit risk is associated with the presence of explanatory paragraphs in audit reports.

Specifically, the analysis incorporates traditional audit risk measures widely used in prior research, including LNFEE (log-transformed audit fees), LNHOUR (log-transformed audit hours), and |MJDA| (absolute value of modified Jones discretionary accruals). Several studies highlight the relationship between audit fees and firm-specific risk. Demartini and Trucco (2016) demonstrated that higher audit risk correlates with increased audit fees due to intensified auditor effort [15]. Kim et al. (2024) explain that higher audit risk in a firm leads to an increase in the time required by auditors to conduct the audit [16]. Modified Jones discretionary accruals, another prominent indicator of audit risk, tend to increase as audit risk rises [17].

However, these traditional audit risk indicators do not accurately reflect the specific audit content presented in the audit report. Linsley and Shrives (2006) emphasized the importance of qualitative disclosures in audit reports, arguing that such disclosures often provide deeper insights into firm risk profiles than purely quantitative measures [18]. Furthermore, Velte and Issa (2019) investigated the impact of key audit matter (KAM) disclosures, demonstrating their potential to enhance the transparency of audit risks [19].

This study uses empirical analysis to compare the effectiveness of ARSV (Audit Risk Sentiment Value), a newly developed qualitative audit risk measure obtained by applying sentiment analysis to the contents of the audit report, with that of existing audit risk indicators. The empirical analysis aims to demonstrate the relative utility of ARSV in capturing audit risk and its potential to enhance the understanding of audit risk factors reflected in audit report disclosures. This aligns with the findings of McChlery and Hussainey (2021), who found that integrating both qualitative and quantitative measures significantly improves stakeholders’ ability to comprehensively assess audit risks [20]. To compare the explanatory power of the new audit risk measure with that of the existing ones, we specified the regression model in Equation (1) and conducted the Vuong test.

EXL_dum = β0 + β1 (ARSV/LNFEE/LNHOUR/|MJDA|) + ΣControl + Fixed Effects + ε.

(1)

The dependent variable, EXL_dum, is a dummy variable that equals one if the audit report contains at least one explanatory paragraph and zero otherwise. This reflects whether auditors included additional explanatory language due to significant risks or concerns. The independent variables include ARSV, LNFEE, LNHOUR, and |MJDA|. ARSV is a sentiment-based audit risk score derived from audit report text analysis and expert sentiment scores. LNFEE and LNHOUR represent the natural logs of audit fees and audit hours, respectively, indicating audit effort. |MJDA| is the absolute value of discretionary accruals, capturing the magnitude of earnings management.
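For readers who wish to see how a model of this form is estimated, the sketch below fits a logistic regression with a single regressor by Newton-Raphson. It is a deliberate simplification: the control variables and fixed effects in Equation (1) are omitted, the data are hypothetical, and the function name `fit_logit` is illustrative. In practice a statistical package would be used.

```python
import math

def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Single-regressor simplification: no controls, no fixed effects.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(h) @ g for the 2x2 system.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical data: sentiment score and explanatory-paragraph dummy.
sentiment = [-0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
exl_dum = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
b0, b1 = fit_logit(sentiment, exl_dum)
```

Because explanatory paragraphs in the toy data cluster at the more negative sentiment scores, the fitted slope is negative.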

Control variables include client firm characteristics such as firm size (SIZE), leverage (LEV), sales growth (GRW), return on assets (ROA), and operating cash flow (CFO) to account for financial performance and complexity. Additional controls include business risk (RISK), tangible asset ratio (PPER), inventory and receivables ratio (INVREC), and loss incidence (LOSS). Corporate governance is captured by major shareholder ownership (LARGEST) and foreign shareholder ownership (FOREIGN), while auditor-related factors such as Big Four affiliation (BIG4) and the inclusion of Key Audit Matters (KAM) address audit firm size and report complexity.

The control variables in this study model are selected based on the following rationales. SIZE reflects the scale of a company and is a key variable affecting audit risk and quality. Davidson and Neu (1993) found that larger firms exhibit higher audit quality, as firm size impacts the allocation of audit resources and the level of auditor attention [21]. LEV represents a firm’s financial risk and is a critical factor in assessing a firm’s sustainability. Broye and Weill (2008) demonstrated that higher leverage increases audit risk, requiring additional auditor effort to mitigate such risks [22]. GRW serves as an indicator of a company’s growth potential. According to Bae et al. (2021), firms with higher sales growth are associated with increased audit fees, reflecting the heightened effort required by auditors [23]. ROA measures a firm’s profitability and is a key metric in evaluating financial health during audits. Swandewi and Badera (2021) highlighted that ROA influences auditors’ effort levels and resource allocation [24]. CFO reflects a firm’s financial stability and plays a significant role in audit risk assessment. Kannan and Skantz (2014) showed that CFO levels significantly affect audit fees and the scope of audit work [25]. RISK captures a firm’s financial uncertainties and is a crucial consideration in assessing audit risk. Johnstone (2000) provided evidence that business risk is a significant determinant of audit scope and cost, highlighting its central role in auditors’ evaluations [26]. PPER represents the composition of a firm’s assets and affects auditors’ evaluation of asset valuations. Visvanathan (2017) found that a higher proportion of tangible assets increases audit risk and costs [27]. INVREC reflects operational and financial stability and is critical to assessing audit risk. An et al. (2023) demonstrated that higher inventory and receivables ratios necessitate additional auditor effort, leading to increased audit time [28]. 
LOSS is utilized to evaluate a firm’s sustainability and financial soundness. Firms reporting losses require careful evaluation by auditors, often leading to the inclusion of explanatory comments in audit reports [29]. LARGEST is closely tied to corporate governance and can impact auditor independence and audit risk assessment [30]. FOREIGN is associated with a firm’s international transparency. Firms with higher levels of foreign investment typically require additional resources from auditors to meet international reporting standards [31]. BIG4 auditors are widely regarded as indicators of high audit quality and transparency. Big Four auditors contribute significantly to the reliability and credibility of audit reports [32]. KAM reflects the complexity and transparency of audit reports. KAM disclosures often require significant additional effort from auditors, highlighting critical areas of focus [33]. Lastly, the fixed effects in this study model account for year-specific and industry-specific effects. To ensure the derivation of more objective and reliable research results, year dummy variables and industry dummy variables were included as control variables. The control variables used in our model are shown in Table 3 below.

5.2. Robustness Check: Vuong Test Comparison of Audit Risk Proxies with Continuous Explanatory Language

To ensure the robustness of our research findings, we designed Equation (2) and performed the Vuong test. In this section, the continuous variable of explanatory language (Ln_EXL) replaces the dummy variable (EXL_dum) as the dependent variable. This variable is the natural logarithm of the number of explanatory paragraphs in the audit report. The independent and control variables are the same as in Equation (1).

Ln_EXL = β0 + β1 (ARSV/LNFEE/LNHOUR/|MJDA|) + ΣControl + Fixed Effects + ε.

(2)

The result of Equation (1) is presented in Table 4, and Equation (2) is presented in Table 5. If the results of Equations (1) and (2) are consistent, it can be inferred that ARSV, which analyzes the textual content of audit reports using sentiment analysis, reflects audit risk more sensitively than traditional audit risk measures.

In this study, the empirical analysis employs a regression model, utilizing traditional audit risk measures such as LNFEE, LNHOUR, and |MJDA|, alongside the newly developed audit risk indicator, ARSV, as independent variables. The dependent variables include the dummy variable representing explanatory paragraphs in audit reports (EXL_dum) and the continuous variable measuring the extent of explanatory language (Ln_EXL).

The interpretation of the results is as follows: as audit risk increases, the regression coefficients (β₁) of the traditional audit risk measures, LNFEE, LNHOUR, and |MJDA|, are expected to exhibit significantly positive (+) values. In contrast, the regression coefficient (β₁) of ARSV, the newly developed audit risk measure, is anticipated to show a significantly negative (−) value under higher audit risk conditions.

Additionally, the study conducted the Vuong Test to evaluate which audit risk measure provided a better model for explaining the relationship between audit risk and explanatory paragraphs in audit reports. This approach allows for a robust comparison of the predictive capabilities of the various audit risk measures.
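The Vuong statistic itself can be sketched from the per-observation log-likelihoods of two competing non-nested models fitted to the same sample. The version below omits the optional parameter-count correction, which cancels when both models have the same number of parameters. The function name `vuong_z` and the log-likelihood values are hypothetical.

```python
import math

def vuong_z(loglik_a, loglik_b):
    """Vuong z-statistic for two non-nested models.

    Inputs are per-observation log-likelihoods on the same sample.
    Positive values favour model A; negative values favour model B.
    """
    n = len(loglik_a)
    if n != len(loglik_b) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    mean = sum(diffs) / n
    # Variance of the differences with divisor n, as in Vuong's statistic.
    var = sum((d - mean) ** 2 for d in diffs) / n
    return math.sqrt(n) * mean / math.sqrt(var)

# Hypothetical per-observation log-likelihoods for two models.
ll_a = [-0.2, -0.3, -0.1, -0.4]
ll_b = [-0.7, -0.6, -0.9, -0.5]
z_ab = vuong_z(ll_a, ll_b)
z_ba = vuong_z(ll_b, ll_a)
```

The sign of the statistic depends only on the order in which the two models are passed, so a negative value favours whichever model is listed second.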

6. Research Results

Comparing the New Audit Risk Proxy with Existing Audit Risk Measures by Vuong Test

Table 4 presents the findings of a logistic regression analysis that examines the relationship between various audit risk proxies (ARSV, LNFEE, LNHOUR, and |MJDA|) and the presence of explanatory language in audit reports, represented by the dependent variable EXL_dum. Additionally, the table includes the results of the Vuong test, which compares the explanatory power of models employing different audit risk proxies. These results provide valuable insights into the effectiveness of the newly developed audit risk metric, ARSV, in capturing audit risk when compared to traditional proxies.

The coefficient for ARSV is negative and statistically significant (−452.232, p < 0.01), indicating that ARSV is highly effective in capturing audit risk, as reflected in the inclusion of explanatory language. Specifically, the findings suggest that negative sentiment embedded in audit reports correlates strongly with the presence of explanatory language, demonstrating ARSV’s ability to quantify qualitative dimensions of audit risk. In contrast, the coefficients for traditional audit risk proxies, such as LNFEE (0.805, p < 0.01), LNHOUR (0.670, p < 0.01), and |MJDA| (6.828, p < 0.01), are all positive and statistically significant. These results align with prior research, which posits that higher audit fees, longer audit hours, and larger discretionary accruals reflect increased audit risk. Thus, the results confirm the validity of traditional measures in capturing quantitative aspects of audit risk.

The Vuong test further enhances the analysis by comparing the explanatory power of models using ARSV and traditional proxies. The Z-statistics from the Vuong test are highly significant for all comparisons, including ARSV versus LNFEE (−12.168 ***), ARSV versus LNHOUR (−12.492 ***), and ARSV versus |MJDA| (−9.775 ***). These results demonstrate that the model incorporating ARSV provides superior explanatory power in predicting the presence of explanatory language compared to models relying on traditional proxies. Furthermore, the ARSV model exhibits lower squared error and higher pseudo R² values, reinforcing its predictive strength. These findings underscore ARSV’s capability to capture qualitative dimensions of audit risk more effectively than conventional proxies.

The results carry significant implications for the field of audit risk assessment. First, ARSV’s negative and significant coefficient and strong performance in the Vuong test validate its robustness as a novel audit risk proxy. By leveraging sentiment analysis to incorporate qualitative aspects of audit reports, ARSV addresses the inherent limitations of traditional proxies, which predominantly focus on quantitative measures. These findings highlight the importance of integrating textual data into audit risk assessments to provide a more comprehensive understanding of audit risk.

Second, the results demonstrate that ARSV captures the nuanced relationship between the tone of audit reports and the presence of explanatory language. This finding suggests that ARSV effectively quantifies the auditor’s perception of risk as reflected in narrative disclosures, offering a more complete representation of audit risk. The use of sentiment analysis in constructing ARSV aligns with emerging trends in accounting research, which emphasize the growing importance of qualitative data in enhancing audit-related assessments.

Third, the Vuong test results illustrate that models incorporating ARSV outperform traditional proxies in predicting explanatory language in audit reports. This indicates that ARSV provides a more precise and informative measure of audit risk, enabling stakeholders to gain deeper insights from audit reports. The higher pseudo R² and lower squared error values further support the effectiveness of ARSV in capturing and explaining audit risk.

From a practical perspective, ARSV’s ability to extract actionable insights from qualitative data substantially benefits various stakeholders, including auditors, regulators, and investors. By improving the interpretability of audit reports, ARSV bridges the communication gap between auditors and non-expert users, fostering greater transparency and trust in financial reporting.

Moreover, adopting ARSV contributes to sustainability and transparency by integrating qualitative and quantitative data into audit practices. This approach enhances the reliability of audit risk assessments and supports stakeholders’ growing demand for forward-looking, actionable information in an increasingly complex business environment.

In conclusion, the statistical analysis provides compelling evidence of ARSV’s superiority as an audit risk proxy. The Vuong test results confirm that ARSV outperforms traditional measures such as LNFEE, LNHOUR, and |MJDA|, offering enhanced explanatory power and reliability. By capturing qualitative dimensions of audit risk through sentiment analysis, ARSV represents a significant advancement in audit methodologies. By addressing the limitations of conventional proxies and aligning with modern demands for transparency and accountability, ARSV lays the foundation for future innovations in audit risk assessment and financial reporting.

Figure 2 presents the Vuong test results, which compare ARSV’s explanatory power against three traditional audit risk proxies: LNFEE, LNHOUR, and |MJDA|. The analysis is based on the dummy variable of explanatory language (EXL_dum) as the dependent variable, consistent with the Z-statistics reported in Table 4.

The Z-statistics for all comparisons (ARSV vs. LNFEE, ARSV vs. LNHOUR, and ARSV vs. |MJDA|) are significantly negative, ranging from approximately −9.8 to −12.5. These results indicate that ARSV demonstrates statistically superior explanatory power compared to traditional audit risk measures. The consistently negative Z-values strongly favor ARSV as the more effective model for explaining the relationship between audit risk and the presence of explanatory language in audit reports.

This comparison underscores ARSV’s ability to capture qualitative dimensions of audit risk more comprehensively and reliably than traditional proxies. Its superior performance reinforces its potential as a robust and innovative audit risk metric, particularly in addressing the limitations of existing measures by incorporating insights derived from narrative disclosures. These findings further support ARSV’s relevance in advancing audit methodologies, offering practical value for auditors, regulators, and stakeholders in improving risk assessment practices.

Table 5 provides the results of a linear regression analysis that examines the relationship between various audit risk proxies (ARSV, LNFEE, LNHOUR, and |MJDA|) and a continuous measure of explanatory language (Ln_EXL) in audit reports. Additionally, the table includes the results of the Vuong test, which evaluates the comparative explanatory power of models using ARSV against those using traditional proxies. These findings provide critical insights into the effectiveness of ARSV as a novel audit risk metric, particularly in capturing the nuanced dimensions of audit risk.

The coefficient for ARSV is negative and statistically significant (−33.707, t = −31.472 ***), indicating its strong ability to capture audit risk as reflected in the extent of explanatory language. The negative coefficient implies that lower ARSV scores, representing more negative sentiment in audit narratives, are closely associated with a greater degree of explanatory language. This finding validates ARSV’s capability to quantify qualitative aspects of audit risk that are often difficult to measure using conventional methods.

In contrast, the coefficients for traditional proxies such as LNFEE (0.099, t = 7.095 ***), LNHOUR (0.080, t = 5.013 ***), and |MJDA| (0.969, t = 6.854 ***) are positive and statistically significant. These results demonstrate that traditional measures, including audit fees, audit hours, and discretionary accruals, effectively capture the quantitative dimensions of audit risk, aligning with prior research. However, their focus on numerical aspects limits their ability to provide insights into the qualitative nature of audit risk.

The Vuong test results further underscore ARSV’s superiority in explaining audit risk. The Z-statistics for the comparisons between ARSV and traditional proxies are highly significant, with Z = −10.889 *** (ARSV vs. LNFEE), Z = −11.333 *** (ARSV vs. LNHOUR), and Z = −8.801 *** (ARSV vs. |MJDA|). These results demonstrate that models incorporating ARSV significantly outperform those using traditional proxies in explaining variations in explanatory language. The negative Z-values indicate ARSV’s enhanced explanatory power, which is further supported by its higher Adjusted R² value (0.599) compared to those of LNFEE (0.460), LNHOUR (0.455), and |MJDA| (0.446). The F-value for the ARSV model (62.454 ***) also highlights its robustness, confirming its reliability and effectiveness as an audit risk proxy.
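For reference, the adjusted R² used in such comparisons penalises the ordinary R² for the number of regressors. The sketch below applies the standard formula; the function name `adjusted_r2` and the numerical inputs are hypothetical and are not the study's sample size or regressor count.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared for a linear model with an intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical inputs: R^2 of 0.5 with 101 observations and 10 regressors.
example = adjusted_r2(0.5, 101, 10)
```

Because the competing models in this design differ only in the audit risk proxy, they contain the same number of regressors, so differences in adjusted R² reflect differences in fit rather than in model size.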

These findings have significant implications for the field of audit risk assessment. First, the strong and statistically significant relationship between ARSV and explanatory language demonstrates its ability to capture qualitative audit risk dimensions often overlooked by traditional proxies. By leveraging sentiment analysis, ARSV incorporates narrative insights, bridging the gap between qualitative and quantitative measures in audit risk assessment.

Second, ARSV’s superior performance in the Vuong test validates its methodological advancements and highlights the importance of integrating textual data into audit risk assessments. ARSV effectively quantifies the auditor’s perception of risk as reflected in the tone and sentiment of audit narratives, offering a more comprehensive representation of audit risk.

From a practical perspective, ARSV benefits stakeholders, including auditors, regulators, and investors. By enhancing the interpretability of audit reports, ARSV fosters greater transparency and trust in financial reporting, addressing the growing demand for actionable and forward-looking information. Furthermore, adopting ARSV aligns with emerging trends in financial reporting that emphasize the integration of qualitative and quantitative data, contributing to sustainability and accountability in audit practices.

In conclusion, the results presented in Table 5 provide compelling evidence of ARSV’s superior performance as an audit risk proxy. Its ability to capture the nuanced dimensions of audit risk through sentiment analysis represents a significant advancement in audit methodologies. By addressing the limitations of traditional proxies and meeting modern demands for transparency, ARSV sets the stage for future innovations in audit risk assessment and financial reporting practices.

Figure 3 presents the Vuong test results, which compare the explanatory power of the ARSV metric against traditional audit risk proxies, including LNFEE, LNHOUR, and |MJDA|. The results demonstrate that the Z-statistics are negative across all comparisons, indicating that ARSV consistently outperforms the traditional proxies in explaining variations in the continuous value of explanatory language in audit reports.

Specifically, the Z-statistic for ARSV compared to LNFEE is approximately −10.889, highlighting that the ARSV model provides significantly better explanatory power than the LNFEE model. Similarly, the Z-statistic for ARSV versus LNHOUR is approximately −11.333, reinforcing ARSV’s superior performance over LNHOUR in capturing audit risk. Finally, the Z-statistic for ARSV versus |MJDA| is approximately −8.801, further confirming ARSV’s greater explanatory power compared to |MJDA|.

These results underscore ARSV’s effectiveness in capturing the qualitative dimensions of audit risk, which are often overlooked by traditional metrics that primarily focus on quantitative measures such as fees, hours, or discretionary accruals. By leveraging sentiment analysis to assess narrative disclosures in audit reports, ARSV provides a more nuanced and comprehensive understanding of audit risk.

The findings validate ARSV as a robust and innovative audit risk proxy, capable of reflecting narrative-based risks with greater accuracy than traditional measures. This analysis highlights ARSV’s significant utility in audit risk assessments, emphasizing its potential to enhance stakeholders’ understanding of audit risks through its superior explanatory power and integration of qualitative data.

7. Discussions

The findings of this study provide robust evidence supporting the effectiveness of the newly developed Audit Risk Sentiment Value (ARSV) as a novel audit risk proxy. By leveraging sentiment analysis to integrate qualitative dimensions of audit reports, ARSV addresses significant limitations inherent in traditional audit risk measures such as audit fees (LNFEE), audit hours (LNHOUR), and discretionary accruals (|MJDA|). The study’s results, validated through consistent performance across both binary and continuous measures of explanatory language, underscore the reliability, versatility, and practical applicability of ARSV in audit risk assessment.

While widely used, traditional proxies for audit risk primarily capture quantitative dimensions and fail to account for the rich qualitative information embedded in audit reports. Measures such as LNFEE, LNHOUR, and |MJDA| focus on audit effort and financial manipulations but overlook the nuanced contextual elements of audit risk. DeFond and Zhang (2014) emphasized that these quantitative metrics cannot fully capture audit quality, underscoring the need for innovative approaches incorporating textual data [1]. In line with these findings, this study demonstrates that ARSV, by analyzing the linguistic content of audit reports, effectively fills this gap.

As reflected in binary and continuous dependent variables, the significant and negative relationship between ARSV and explanatory language validates its sensitivity to qualitative audit risk indicators. Loughran and McDonald (2011) highlighted the potential of sentiment analysis to reveal insights embedded in financial narratives, which ARSV successfully leverages [6]. These results validate ARSV’s ability to quantify the auditor’s perception of risk as conveyed through the sentiment and tone of audit narratives, offering a more comprehensive measure of audit risk. Unlike traditional measures, ARSV captures the subtle ways auditors communicate risks through narrative disclosures, demonstrating its ability to reflect both explicit and implicit dimensions of audit risk.

The findings consistently show that ARSV outperforms traditional proxies in explaining the presence and extent of explanatory language in audit reports. The Vuong test results highlight ARSV’s superior explanatory power, with statistically significant Z-statistics across all comparisons. ARSV’s ability to integrate linguistic content aligns with the study by Hope et al. (2017), which showed that qualitative disclosures often contain valuable information about audit risk but remain underutilized in traditional metrics [3].

Additionally, the ARSV model demonstrates higher pseudo R² values and lower squared errors than models using LNFEE, LNHOUR, and |MJDA|. These results indicate that ARSV provides a more precise and informative measure of audit risk, effectively capturing qualitative dimensions that traditional proxies cannot. The superior performance of ARSV aligns with Choi et al.’s (2023) findings, which emphasized the value of explanatory language in reflecting auditors’ perceptions of risk [2]. This enhanced explanatory power positions ARSV as a critical tool for advancing audit methodologies and improving the informativeness of audit reports. Notably, the results show that ARSV excels at identifying audit risks that might not be reflected in purely quantitative measures, enabling a more comprehensive understanding of the factors influencing audit outcomes.

The practical benefits of ARSV extend to a diverse range of stakeholders, including auditors, regulators, and investors. By extracting actionable insights from unstructured textual data, ARSV enhances the interpretability of audit reports, particularly for non-expert users who may struggle to derive meaning from complex narrative disclosures. This increased clarity fosters greater transparency and trust in financial reporting, aligning with stakeholder demands for forward-looking and reliable information.

This study’s findings echo the recommendations of Christensen et al. (2018), who argued for more structured and interpretable qualitative data in audit reports to serve stakeholders better [8]. Additionally, ARSV’s ability to bridge the gap between qualitative and quantitative audit assessments provides auditors with a more holistic understanding of client risk, enabling them to allocate resources more effectively and address areas of concern with greater precision. The development of ARSV also ensures that textual nuances, which might otherwise be overlooked, are systematically analyzed and incorporated into the overall risk assessment process.

Adopting ARSV aligns with broader trends in sustainability and transparency within financial reporting. As stakeholders increasingly prioritize integrating qualitative and quantitative data, ARSV provides a robust framework for addressing these demands. Chopra et al. (2024) emphasized combining financial and non-financial data to meet sustainability-focused objectives [5]. By incorporating sentiment analysis into audit practices, ARSV enhances auditor accountability and the informativeness of audit reports, contributing to the transparency and reliability of financial disclosures.

This alignment with sustainability objectives underscores ARSV’s potential to support broader regulatory and market expectations for high-quality, actionable audit information. The emphasis on integrating ESG factors into audit assessments, as highlighted by Chopra et al. (2024), further reinforces ARSV’s relevance in addressing emerging stakeholder demands [5]. ARSV’s ability to quantify qualitative data also supports a transition towards more nuanced and comprehensive approaches to financial reporting, reflecting the evolving priorities of regulators and market participants.

This study contributes to the advancement of audit methodologies by introducing an innovative approach that combines sentiment analysis with traditional audit risk measures. The development and application of ARSV represent a significant expansion of traditional audit research, enabling the systematic analysis of unstructured textual data to capture nuanced risk indicators.

The construction of the Audit Report Lexicon, which integrates domain-specific linguistic insights, further enhances ARSV’s applicability. Loughran and McDonald (2011) underscored the importance of tailoring sentiment analysis tools to specific financial contexts, a principle that guided the development of ARSV [6]. By validating its performance across multiple specifications and dependent variables, this study lays the foundation for future research exploring sentiment-based approaches to audit risk assessment. Importantly, ARSV’s consistent performance highlights its versatility as an audit risk proxy, bridging the gap between traditional metrics and the growing need for qualitative data integration.

Overall, this discussion emphasizes ARSV’s superior ability to reflect audit risk through sentiment analysis, highlighting its contributions to audit methodologies and practical applications and its alignment with modern demands for transparency and comprehensive risk assessment.

The development and application of ARSV hold significant practical implications for auditors, regulators, and stakeholders within the financial reporting ecosystem. By utilizing sentiment analysis to integrate qualitative dimensions of audit reports, ARSV provides actionable insights that can meaningfully enhance audit practices. One of the key advantages of ARSV lies in its ability to complement existing audit risk assessment frameworks when used alongside traditional proxies such as audit fees and discretionary accruals. This integration enables audit firms to use ARSV scores to identify clients with elevated qualitative risk factors, allowing for a more nuanced and effective allocation of audit resources.

Furthermore, the sentiment-based analysis embedded in ARSV improves the interpretability of audit reports for non-expert stakeholders. By offering a quantifiable measure of risk derived from narrative disclosures, ARSV helps bridge the communication gap between auditors and the users of financial statements, ultimately fostering greater transparency and trust. Additionally, regulators could leverage ARSV as a diagnostic tool to monitor systemic risks within the audit market. By analyzing sentiment patterns in audit reports across industries or firms, regulators can proactively identify emerging risk trends and intervene as necessary to ensure the maintenance of audit quality.

Audit firms can also use ARSV as part of their training and decision-support systems. For instance, ARSV scores could serve as illustrative case studies to educate auditors about the implications of narrative disclosures, enhancing their capacity to assess and document risks effectively. By systematically quantifying qualitative risks, ARSV could improve the precision of audit procedures, potentially reducing the likelihood of misstatements or audit failures and supporting higher-quality audits. This, in turn, could increase stakeholder confidence in financial reporting.

To operationalize ARSV effectively, audit firms may need to invest in infrastructure capable of processing textual data, such as natural language processing (NLP) tools and sentiment analysis software. Additionally, periodic updates to the Audit Report Lexicon will be necessary to ensure that ARSV remains aligned with evolving reporting practices and linguistic trends. Collectively, these measures underscore the practical relevance of ARSV and its potential to improve both the methodology and outcomes of audit practices.

The study primarily focuses on publicly listed firms, which tend to have more structured and detailed audit reports than small and medium-sized enterprises (SMEs) or non-profit organizations. As a result, the applicability of ARSV to these other contexts may require further investigation, particularly given the variations in reporting practices and the complexity of narrative disclosures across different types of entities.

Future research could expand upon this study by applying ARSV to datasets from other jurisdictions with varying regulatory frameworks to assess its adaptability and effectiveness in diverse environments. Similarly, extending the analysis to SMEs or non-profit organizations could provide additional evidence of ARSV's versatility and utility in capturing qualitative audit risks across a broader spectrum of entities. These avenues for further research would strengthen the generalizability of ARSV and contribute to the ongoing evolution of audit methodologies. By addressing these potential limitations, future studies can enhance the practical relevance and impact of ARSV across different organizational and regulatory contexts.

8. Conclusions

This study introduces and validates the Audit Risk Sentiment Value (ARSV), a novel audit risk metric that leverages sentiment analysis to address the limitations of traditional audit risk proxies. By analyzing qualitative data embedded in audit report narratives, ARSV captures nuanced dimensions of audit risk that are often overlooked by conventional measures such as audit fees (LNFEE), audit hours (LNHOUR), and discretionary accruals (|MJDA|). The findings of this study provide robust evidence supporting the effectiveness, reliability, and practical applicability of ARSV in assessing audit risk. This innovative approach enhances the reliability of audit risk assessments and fosters greater transparency and sustainability in financial reporting, setting a strong foundation for future advancements in the field. However, this study is not without its limitations. The ARSV metric was developed and validated using data from South Korean firms, which may limit the generalizability of the findings to other regulatory environments or cultural contexts. Future research could explore the application and adaptation of ARSV across different countries and industries to validate its effectiveness further.

Additionally, while ARSV leverages sentiment analysis to quantify qualitative dimensions of audit reports, its reliance on domain-specific lexicons and expert judgment means that the lexicon must be continuously refined and updated to reflect changes in language and reporting practice. Future studies could integrate machine learning techniques to automate and enhance the development of sentiment analysis tools, improving their scalability and applicability. Expanding the scope of ARSV to include other unstructured data, such as management commentary or earnings call transcripts, could also provide additional insights into audit risk, offering a more holistic view of risk assessment in financial reporting.

The study demonstrates that ARSV significantly outperforms traditional audit risk proxies in explaining the presence and extent of explanatory language in audit reports. Consistent results across binary and continuous measures of explanatory language, validated through the Vuong test, highlight the superior explanatory power of ARSV. By integrating sentiment analysis into audit risk assessment, ARSV reflects the auditor’s perception of risk and provides stakeholders with a more comprehensive understanding of the factors influencing audit outcomes. This aligns with prior research emphasizing the potential of qualitative data in enhancing audit methodologies and the informativeness of audit reports.

ARSV's ability to bridge the gap between qualitative and quantitative dimensions of audit risk offers substantial benefits for diverse stakeholders, including auditors, regulators, and investors. It enhances the interpretability of audit reports, particularly for non-expert users, by systematically analyzing and quantifying the sentiment and tone of audit narratives. This increased transparency fosters greater trust in financial reporting and supports stakeholder demands for actionable, forward-looking information. Furthermore, ARSV aligns with broader trends in sustainability and transparency, reinforcing its relevance in addressing the evolving priorities of financial reporting and regulatory practices.

This study’s methodological contributions are significant. The development of the Audit Report Lexicon, tailored specifically for audit contexts, underscores the importance of domain-specific tools in extracting meaningful insights from unstructured textual data. By systematically combining sentiment analysis with expert evaluations, ARSV represents a methodological advancement that expands the scope of traditional audit research. Its consistent performance across multiple specifications establishes a foundation for future innovations in sentiment-based approaches to audit risk assessment.

In conclusion, the ARSV metric demonstrates that qualitative data embedded in audit reports can provide a robust and reliable measure of audit risk when analyzed systematically through sentiment analysis. By addressing the limitations of traditional proxies and aligning with contemporary demands for transparency and accountability, ARSV marks a significant step forward in audit methodologies. This innovative approach enhances the reliability of audit risk assessments and fosters greater transparency and sustainability in financial reporting, setting a strong foundation for future advancements in the field.

Author Contributions

Methodology, F.S.; Software, H.J.N.; Validation, X.W.; Formal analysis, H.J.N.; Investigation, X.W.; Data curation, H.J.N.; Writing—original draft, M.G.K.; Writing—review & editing, X.W., F.S. and H.J.N.; Visualization, F.S.; Supervision, M.G.K.; Project administration, M.G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source of the dataset presented in the study is included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. DeFond, M.; Zhang, J. A review of archival auditing research. J. Account. Econ. 2014, 58, 275–326.
2. Choi, S.U.; Na, H.J.; Lee, K.C. Does explanatory language convey the auditor’s perceived audit risk? A study using a novel big data analysis metric. Manag. Audit. J. 2023, 38, 783–812.
3. Hope, O.-K.; Hu, D.; Lu, H. The benefits of specific risk-factor disclosures. Rev. Account. Stud. 2017, 22, 809–839.
4. Choi, S.U.; Lee, K.C.; Na, H.J. Exploring the deep neural network model’s potential to estimate abnormal audit fees. Manag. Decis. 2022, 60, 3304–3323.
5. Chopra, S.S.; Senadheera, S.S.; Dissanayake, P.D.; Withana, P.A.; Chib, R.; Rhee, J.H.; Ok, Y.S. Navigating the challenges of environmental, social, and governance (ESG) reporting: The path to broader sustainable development. Sustainability 2024, 16, 606.
6. Loughran, T.; McDonald, B. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Financ. 2011, 66, 35–65.
7. Li, F. The information content of forward-looking statements in corporate filings—A naïve Bayesian machine learning approach. J. Account. Res. 2010, 48, 1049–1102.
8. Christensen, B.E.; Glover, S.M.; Wolfe, C.J. Do critical audit matter disclosures improve the informational value of audit reports? Account. Rev. 2018, 93, 59–79.
9. Willett, P. The Porter stemming algorithm: Then and now. Program 2006, 40, 219–223.
10. Chiche, A.; Yitagesu, B. Part of speech tagging: A systematic review of deep learning and machine learning approaches. J. Big Data 2022, 9, 10.
11. Addiga, A.; Bagui, S. Sentiment analysis on Twitter data using term frequency-inverse document frequency. J. Comput. Commun. 2022, 10, 117–128.
12. Akuma, S.; Lubem, T.; Adom, I.T. Comparing Bag of Words and TF-IDF with different models for hate speech detection from live tweets. Int. J. Inf. Technol. 2022, 14, 3629–3635.
13. Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794.
14. Jain, S.; Jain, S.K.; Vasal, S. An Effective TF-IDF Model to Improve the Text Classification Performance. In Proceedings of the 2024 IEEE 13th International Conference on Communication Systems and Network Technologies (CSNT), Jabalpur, India, 6–7 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
15. Demartini, C.; Trucco, S. Does intellectual capital disclosure matter for audit risk? Evidence from the UK and Italy. Sustainability 2016, 8, 867.
16. Kim, J.; Kim, M.; Yoon, Y.; No, W.G.; Vasarhelyi, M.A. Assessing audit effort in response to exogenous shocks: Evidence from Korea on the impact of enhanced audit standards and COVID-19. Int. J. Audit. 2024, 28, 695–716.
17. Duramany-Lakkoh, E.K. An Assessment of the Relationship between Audit Tenure and Audit Quality using a Modified Jones Model. Eur. J. Account. Audit. Financ. Res. 2022, 10, 14–35.
18. Linsley, P.M.; Shrives, P.J. Risk reporting: A study of risk disclosures in the annual reports of UK companies. Br. Account. Rev. 2006, 38, 387–404.
19. Velte, P.; Issa, J. The impact of key audit matter (KAM) disclosure in audit reports on stakeholders’ reactions: A literature review. Probl. Perspect. Manag. 2019, 17, 323–334.
20. McChlery, S.; Hussainey, K. Risk disclosure behaviour: Evidence from the UK extractive industry. J. Appl. Account. Res. 2021, 22, 194–210.
21. Davidson, R.A.; Neu, D. A note on the association between audit firm size and audit quality. Contemp. Account. Res. 1993, 9, 479–488.
22. Broye, G.; Weill, L. Does leverage influence auditor choice? A cross-country analysis. Appl. Financ. Econ. 2008, 18, 715–731.
23. Bae, G.S.; Choi, S.U.; Lamoreaux, P.T. Auditors’ fee premiums and low-quality internal controls. Contemp. Account. Res. 2021, 38, 153–179.
24. Swandewi, N.; Badera, I.D.N. The effect of audit opinion, audit delay, and return on assets on auditor switching. Am. J. Humanit. Soc. Sci. Res. 2021, 5, 593–600.
25. Kannan, Y.H.; Skantz, T.R. The impact of CEO and CFO equity incentives on audit scope and perceived risks as revealed through audit fees. Audit. A J. Pract. Theory 2014, 33, 111–139.
26. Johnstone, K.M. Client-acceptance decisions: Simultaneous effects of client business risk, audit risk, auditor business risk, and risk adaptation. Audit. A J. Pract. Theory 2000, 19, 1–25.
27. Visvanathan, G. Intangible assets on the balance sheet and audit fees. Int. J. Discl. Gov. 2017, 14, 185–202.
28. An, R.; Li, W.; Wang, D.; Wang, Y.; Yu, L. Do key audit matters affect operating activities? Evidence from inventory management. Abacus 2023, 59, 300–339.
29. Pittman, J.; Stein, S.E.; Valentine, D.F. The importance of audit partners’ risk tolerance to audit quality. Contemp. Account. Res. 2023, 40, 2512–2546.
30. Guo, F.; Lin, C.; Masli, A.; Wilkins, M.S. Auditor responses to shareholder activism. Contemp. Account. Res. 2021, 38, 63–95.
31. Dang, V.C.; Nguyen, Q.K. Internal corporate governance and stock price crash risk: Evidence from Vietnam. J. Sustain. Financ. Invest. 2024, 14, 24–41.
32. Martani, D.; Rahmah, N.A.; Fitriany, F.; Anggraita, V. Impact of audit tenure and audit rotation on the audit quality: Big 4 vs non big 4. Cogent Econ. Financ. 2021, 9, 1901395.
33. Baatwah, S.R. Key audit matters and big4 auditors in Oman: A quantile approach analysis. J. Financ. Report. Account. 2023, 21, 1124–1148.

Figure 1. Procedure of calculating the new audit risk proxy.

Figure 2. Vuong Test Results: ARSV vs. Other Audit Risk Measures (a dummy variable of explanatory language as a dependent variable).

Figure 3. Vuong Test Results: ARSV vs. Other Audit Risk Measures (a continuous value of explanatory language as a dependent variable).

Table 1. Sample selection.

Criteria						N
Initial firm-years listed on the KOSPI market from 2018 to 2023						4242
Excluding: Financial industries (KIS code beginning with ‘K’)						(278)
Excluding: Non-December fiscal year-end						(62)
Excluding: Required financial data unavailable in the database						(107)
Final observations						3795
Observations by year
Year	2018	2019	2020	2021	2022	2023
N	656	615	618	644	657	665

Table 2. Scoring process of sentiment mining.

Variable	Word (English, translated)	TF-IDF Frequency	Scoring Value	ARSV
Positive Words	unqualified	0.01	3	0.03
	financing	0	3	0
	terminate	0.32	3	0.96
	improvement	0.21	2	0.42
	revenue	0.65	2	1.3
	current assets	0	2	0
	receivable	0	2	0
	retained	0.12	2	0.24
	pushing	0.51	2	1.02
	land	0.23	1	0.23
	common stock	0.43	1	0.43
	negotiation	0.02	1	0.02
	…	…	…
Negative Words	uncertainty	0	−3	0
	litigation	0	−3	0
	loss	0	−3	0
	aggravated	0.03	−3	−0.09
	restatement	0.32	−3	−0.96
	expense	0	−2	0
	exposure	0.16	−2	−0.32
	govern	0.23	−2	−0.46
	delay	0.21	−2	−0.42
	reason	0.71	−1	−0.71
	calculation	0.22	−1	−0.22
	symptom	0.31	−1	−0.31
	accrual	0.23	−1	−0.23
	…	…	…
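
The rows of Table 2 imply that each word's ARSV contribution equals its TF-IDF frequency multiplied by its lexicon scoring value (for example, 0.32 × 3 = 0.96 for "terminate"). The sketch below reproduces that arithmetic for a few rows copied from the table. The names `LEXICON_ROWS`, `word_contribution`, and `report_score` are hypothetical, and the simple summation used to combine word-level values into one report-level number is an assumption made for illustration; the table alone does not state how the final ARSV is aggregated or normalized.

```python
# Rows copied from Table 2: (word, TF-IDF frequency, lexicon scoring value).
LEXICON_ROWS = [
    ("unqualified", 0.01, 3),
    ("terminate", 0.32, 3),
    ("revenue", 0.65, 2),
    ("land", 0.23, 1),
    ("aggravated", 0.03, -3),
    ("restatement", 0.32, -3),
    ("delay", 0.21, -2),
    ("reason", 0.71, -1),
]


def word_contribution(tfidf: float, score: int) -> float:
    """Word-level value as shown in the ARSV column of Table 2."""
    return tfidf * score


def report_score(rows) -> float:
    """Assumed aggregation: plain sum of the word-level values."""
    return sum(word_contribution(tfidf, score) for _, tfidf, score in rows)


if __name__ == "__main__":
    for word, tfidf, score in LEXICON_ROWS:
        print(f"{word:12s} {word_contribution(tfidf, score):+.2f}")
    print(f"{'total':12s} {report_score(LEXICON_ROWS):+.2f}")
```

Under this assumed aggregation, positive words raise the total and negative words lower it, so a report dominated by high-weight negative terms would receive a lower value.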

Table 3. Explanation of the control variables used in the research model.

Variable	Definition
SIZE	natural log of total assets;
LEV	leverage ratio (= total debt/total assets);
GRW	revenue growth rate (= (revenue of period t − revenue of period t − 1)/revenue of period t − 1);
ROA	return on total assets (= net income/total assets);
CFO	operating cash flow ratio (= operating cash flow/total assets);
RISK	market risk (= standard deviation of return on sales over the most recent five years);
PPER	tangible assets ratio (= tangible assets/total assets);
INVREC	sum of inventory and receivables scaled by total assets;
LOSS	dummy variable equal to 1 if the firm reports a net loss, otherwise 0;
LARGEST	ownership share of the largest shareholder;
FOREIGN	ownership share of foreign shareholders (reported as FORN in Tables 4 and 5);
BIG4	dummy variable equal to 1 for a BIG4 client, otherwise 0;
KAM	dummy variable equal to 1 if the audit report includes a Key Audit Matter, otherwise 0.
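
As a rough illustration of the ratio definitions in Table 3, the sketch below computes several of the control variables for a single firm-year. The `FirmYear` container, its field names, and the example figures are hypothetical and are not taken from the study's data; RISK, PPER, and the ownership and auditor variables are omitted because they need inputs the table does not fully specify.

```python
import math
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical inputs for one firm-year (all in the same currency unit)."""
    total_assets: float
    total_debt: float
    revenue: float
    prior_revenue: float
    net_income: float
    operating_cash_flow: float
    inventory: float
    receivables: float


def control_variables(fy: FirmYear) -> dict:
    """Compute a subset of the Table 3 controls from raw financial figures."""
    return {
        "SIZE": math.log(fy.total_assets),
        "LEV": fy.total_debt / fy.total_assets,
        "GRW": (fy.revenue - fy.prior_revenue) / fy.prior_revenue,
        "ROA": fy.net_income / fy.total_assets,
        "CFO": fy.operating_cash_flow / fy.total_assets,
        "INVREC": (fy.inventory + fy.receivables) / fy.total_assets,
        "LOSS": 1 if fy.net_income < 0 else 0,
    }


# Invented example values, for illustration only.
example = FirmYear(
    total_assets=1000.0, total_debt=400.0, revenue=550.0, prior_revenue=500.0,
    net_income=-20.0, operating_cash_flow=30.0, inventory=100.0, receivables=150.0,
)

if __name__ == "__main__":
    for name, value in control_variables(example).items():
        print(f"{name:7s} {value:.4f}")
```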

Table 4. The relation between risk proxies and reporting explanatory languages: applying Vuong test.

Dependent Variable: EXL_Dummy
Variable	Audit Risk = ARSV			Audit Risk = LNFEE			Audit Risk = LNHOUR			Audit Risk = |MJDA|
Variable	Coefficient	Wald χ²		Coefficient	Wald χ²		Coefficient	Wald χ²		Coefficient	Wald χ²
Intercept	−1.327	0.030		−4.687	1.632		−1.787	0.244		−0.543	0.114
Audit Risk	−452.232	298.960	***	0.805	23.846	***	0.670	11.776	***	6.828	22.897	***
SIZE	−0.291	9.428	***	−0.502	24.807	***	−0.434	16.253	***	−0.119	2.033
LEV	2.060	15.102	***	2.510	36.784	***	2.686	42.826	***	3.379	55.775	***
GRW	0.166	0.415		0.250	1.793		0.234	1.583		0.346	2.902	*
ROA	0.368	0.067		−0.114	0.010		−0.216	0.037		1.402	1.337
CFO	−0.727	0.251		−1.462	1.582		−1.503	1.680		−0.478	0.154
RISK	3.022	5.624	**	3.286	11.184	***	3.497	12.806	***	3.315	7.918	***
INVREC	−0.360	0.197		−1.332	4.558	**	−1.379	4.957	**	−0.151	0.049
PPER	0.108	0.037		−0.749	2.842	*	−0.739	2.780	*	−0.410	0.673
LOSS	0.198	0.549		0.276	1.860		0.279	1.938		0.364	2.677
LARGEST	−57.329	1.021		−14.255	0.107		−31.831	0.547		−83.725	3.043	*
FORN	62.076	0.388		−78.294	0.918		−75.396	0.841		−65.162	0.494
BIG4	−0.541	7.213	***	−0.735	20.526	***	−0.764	20.082	***	−0.703	17.293	***
KAM	8.748	45.470	***	9.293	56.716	***	9.335	56.290	***	19.633	0.004
Fixed Effect	Included			Included			Included			Included
pseudo R2	0.786			0.608			0.604			0.578
−2 Log L	974.066			1583.733			1595.956			1275.020
HL Chisq	0.182			0.072			0.021			0.043
N_obs	3795			3795			3795			3795
Vuong Test
vs. ARSV				Z	Preferred Model		Z	Preferred Model		Z	Preferred Model
vs. ARSV				−12.168 ***	ARSV		−12.492 ***	ARSV		−9.775 ***	ARSV

Note: *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.
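
The Vuong test rows in Table 4 report Z statistics comparing each traditional proxy's model with the ARSV model. As a rough illustration of how such a statistic is commonly formed, the sketch below computes an uncorrected Vuong z from the per-observation log-likelihoods of two non-nested models fitted to the same sample. The function name `vuong_z`, the toy log-likelihood values, and the sign convention (first model minus second, so a negative z favors the second model) are assumptions for illustration; the source does not state which variant or correction term the authors used.

```python
import math
from statistics import NormalDist, fmean, pstdev


def vuong_z(loglik_a, loglik_b):
    """Uncorrected Vuong z-statistic and two-sided p-value.

    loglik_a, loglik_b: per-observation log-likelihoods of two non-nested
    models evaluated on the same observations.
    Assumed convention: z < 0 favors model B, z > 0 favors model A.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be scored on the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    spread = pstdev(diffs)  # population standard deviation of the differences
    if spread == 0:
        raise ValueError("log-likelihood differences have zero variance")
    z = math.sqrt(len(diffs)) * fmean(diffs) / spread
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy per-observation log-likelihoods (invented, not from the paper).
proxy_model = [-0.90, -1.10, -0.80, -1.20, -0.95]
arsv_model = [-0.50, -0.60, -0.55, -0.70, -0.40]

if __name__ == "__main__":
    z, p = vuong_z(proxy_model, arsv_model)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

For a logit model such as the one behind EXL_Dummy, the per-observation log-likelihood would be y·log(p̂) + (1 − y)·log(1 − p̂), where p̂ is the fitted probability for that observation.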

Table 5. Additional Vuong test results using a continuous value of explanatory language as a dependent variable.

Dependent Variable: EXL_Number
Variable	Audit Risk = ARSV			Audit Risk = LNFEE			Audit Risk = LNHOUR			Audit Risk = |MJDA|
Variable	Coefficient	t-Value		Coefficient	t-Value		Coefficient	t-Value		Coefficient	t-Value
Intercept	0.194	1.597		−0.019	−0.129		0.327	2.277	**	0.104	0.850
Audit Risk	−33.707	−31.472	***	0.099	7.095	***	0.080	5.013	***	0.969	6.854	***
SIZE	−0.007	−1.422		−0.053	−6.290	***	−0.043	−4.851	***	−0.003	−0.548
LEV	0.164	5.399	***	0.266	7.509	***	0.285	8.053	***	0.326	8.472	***
GRW	0.007	0.499		0.008	0.495		0.006	0.395		0.016	0.797
ROA	−0.084	−0.930		−0.183	−1.739	*	−0.200	−1.893	*	−0.123	−1.020
CFO	0.002	0.018		−0.115	−1.171		−0.110	−1.115		0.017	0.154
RISK	0.341	4.333	***	0.486	5.321	***	0.514	5.610	***	0.495	4.565	***
INVREC	−0.073	−1.623		−0.175	−3.335	***	−0.171	−3.244	***	−0.047	−0.821
PPER	−0.050	−1.624		−0.107	−2.977	***	−0.110	−3.043	***	−0.085	−2.120	**
LOSS	0.018	1.186		0.026	1.431		0.027	1.469		0.033	1.631
LARGEST	−8.032	−2.590	***	−4.986	−1.364		−6.852	−1.874	*	−11.040	−2.720	***
FORN	6.937	1.446		−1.565	−0.280		0.772	0.138		3.007	0.488
BIG4	−0.034	−3.001	***	−0.068	−4.963	***	−0.072	−5.049	***	−0.059	−4.023	***
KAM	0.495	26.043	***	0.703	34.069	***	0.707	34.153	***	0.700	29.221	***
Fixed Effect	Included			Included			Included			Included
F-Value	62.454 ***			35.939 ***			35.229 ***			49.857 ***
Adj_R2	0.599			0.460			0.455			0.446
N_obs	3795			3795			3795			3795
Vuong Test
vs. ARSV				Z	Preferred Model		Z	Preferred Model		Z	Preferred Model
vs. ARSV				−10.889 ***	ARSV		−11.333 ***	ARSV		−8.801 ***	ARSV

Note: *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, X.; Sun, F.; Kim, M.G.; Na, H.J. Developing a Novel Audit Risk Metric Through Sentiment Analysis. Sustainability 2025, 17, 2460. https://doi.org/10.3390/su17062460

