1. Introduction
Over the past few years, technological advancements, especially in artificial intelligence (AI) and data analytics, have influenced auditing across sectors. This progress has revolutionized the traditional auditing process and helped auditors provide services with greater precision, efficiency, and transparency. The evolution of auditing technologies is crucial given the complexity of global financial systems, regulations, and business environments. AI adoption, especially machine learning and natural language processing (NLP), is transforming financial and performance audits, reshaping audit methods and improving efficiency and accuracy. NLP significantly advances accounting and auditing by overcoming communication barriers in financial documents, extracting insights from reports, and aiding decision-making [1].
In parallel, the rapid development of machine learning techniques has provided auditors with tools to identify patterns of financial misreporting and detect fraud. Machine learning methods, such as artificial neural networks (ANNs) and decision trees, have outperformed traditional audit practices in detecting financial statement fraud [2]. Further advancements have shown that machine learning can be employed to predict accounting misstatements by identifying irregularities that are not immediately apparent through conventional auditing techniques [3]. These innovations have improved fraud detection and introduced upgraded auditing models, thereby allowing for real-time monitoring of financial transactions and reducing the risk of discrepancies.
With remote work and social distancing, audit firms have turned to AI and blockchain to maintain quality and adapt to changing business conditions [4]. These technologies enable remote data management while preserving audit integrity. Blockchain solutions improve data reliability and transparency in smart cities by removing centralized auditors [5]. As digital transformation progresses, efficient and secure audit practices are essential. The ethics of AI in auditing is equally crucial, as biases in AI systems can lead to discriminatory outcomes; high-stakes audits need frameworks for fairness, transparency, and accountability [6]. Transparency in algorithms and the ability to explain decisions are vital for trust in AI audits [7,8]. Probabilistic neural networks (PNNs) add sophistication to AI in auditing by predicting audit opinions and assessing financial statement quality [9]. Explainable AI (XAI) advancements, such as the Transparent Open-Box (TOB) network, enhance model clarity, which is critical for auditors to validate AI decisions [10]. Auditors must therefore understand the algorithms they rely on to preserve transparency and accountability.
LDA-generated topics enhance AI-era auditing by providing insights into unstructured data and identifying risks missed by traditional methods [11,12,13]. LDA supports the detection of financial fraud and compliance violations, automates the processing of large datasets, and improves auditing speed and accuracy [12,14,15]. These capabilities enable thorough audits and help auditors manage complex business environments, making LDA vital for modern auditing [12,16,17]. As the auditing profession navigates this rapidly evolving landscape, understanding the interplay between AI and auditing practices is pivotal for both practitioners and regulators [13,18,19].
This review seeks evidence of the ongoing transformation, offering guidance for the profession in adapting to and leveraging the full potential of AI technologies while addressing emerging challenges. It also investigates the influence of AI within modern auditing practices. Drawing on records from two major academic databases, Scopus and Web of Science, the study applies Latent Dirichlet Allocation (LDA), a method designed to uncover latent topics within a document collection through word co-occurrence patterns, to identify significant themes in the literature from 1982 to 2024. The results provide an in-depth summary of the thematic elements propelling changes in auditing, emphasizing technological progress along with the social and ethical factors influencing the future of the field.
To the best of our knowledge, this is one of the first studies to use AI-based techniques, specifically Latent Dirichlet Allocation (LDA), to explore and map the thematic transformation of auditing in the AI era. Unlike earlier reviews, this study adopts a machine learning approach to a large dataset, enabling a more data-driven and comprehensive understanding of emerging research trends, key themes, and knowledge gaps. Furthermore, the accelerating integration of AI technologies in auditing, alongside new ethical, regulatory, and professional considerations, creates a timely need for an updated and methodologically advanced review.
The results provide a comprehensive overview of the key thematic areas identified by the LDA model, revealing a strong emphasis on the technological transformation of auditing alongside a recognition of the ethical, specialized, and potentially sensitive societal factors that are becoming increasingly relevant in the field. The varying percentages of tokens associated with each topic offer a quantitative perspective on the prevalence of these themes within the analyzed text data. The AI era presents both challenges and opportunities for auditing: reliance on machine learning, NLP, and blockchain can transform the profession by increasing efficiency, accuracy, and transparency, while addressing ethical and regulatory challenges remains crucial for responsible AI use. As AI evolves, its role in financial and compliance audits will expand, and developing AI auditing tools alongside regulatory and ethical frameworks is key to harnessing these technologies' full potential.
2. Literature Review
Incorporating AI into auditing enhances efficiency, precision, and the ability to detect fraud. Auditing is being transformed by machine learning, natural language processing, and data mining, introducing both new opportunities and obstacles. NLP helps process large volumes of unstructured financial data, speeding up document analysis and improving decision-making, especially in large audits [1]. It also enhances audit quality by spotting anomalies in financial statements, providing more reliable results. Machine learning identifies financial misreporting by analyzing content in financial statements such as 10-K filings, and combining topic modeling with financial variables enhances the detection of misleading disclosures [20]. It also detects accounting misstatements by recognizing patterns in financial and market variables, allowing auditors to focus on high-risk areas and streamline the process [3].
Robotic process automation (RPA) and blockchain are transforming auditing by automating repetitive tasks and enhancing data integrity and transparency [5,21]. RPA's success relies on a strong implementation framework [21]. Big data analytics (BDA) automates tasks and improves decision-making. In Australia, BDA, AI, and robotics aid task automation despite skepticism toward blockchain [22]. This integration helps auditors manage large datasets. XAI tools like ExMatrix improve AI model transparency [23]. AI systems must meet ethical standards for responsible automated decision-making [24].
Artificial intelligence has significantly impacted the field of auditing through enhanced fraud detection capabilities. Advanced techniques such as artificial neural networks (ANNs) and decision trees outperform conventional methods in identifying fraudulent financial reports, thereby upholding integrity [2]. Machine learning algorithms, like probabilistic neural networks (PNNs), contribute to greater transparency and precision in audits [9]. The COVID-19 pandemic underscored the necessity of digital tools for remote auditing, which improves quality, lowers expenses, and increases flexibility [4]. This development is integral to a wider digital transformation that incorporates big data analytics (BDA) and AI, making auditing processes more continuous and efficient [25].
AI enhances auditing but introduces ethical and regulatory challenges, such as algorithmic bias, privacy concerns, and the need for transparency. In sectors like e-commerce that rely heavily on real-time data, continuous auditing has become crucial. Computer-assisted audit tools (CAATs) enable auditors to monitor transactions as they happen, providing immediate verification of financial reports [26]. In a fast-paced digital environment, these tools are indispensable. The shift from periodic to continuous audits allows for the prompt detection of errors and fraud, thereby improving the timeliness and reliability of financial statements. As automated systems expand, adopting ethical auditing practices is vital to address biases in AI decision-making. In critical domains like employment and the justice system, audits must reflect cultural nuances and maintain integrity [27]. AI systems should be transparent and accountable, spurring the creation of solutions such as the Transparent Open-Box (TOB) learning network to elucidate algorithmic predictions [10]. Developing frameworks to ensure ethical compliance, particularly in financial reporting and fraud prevention, is essential [6].
A study on AI in auditing shows that AI improves audit quality and lowers fees but may displace workers as automation increases [28]. Machine learning models enhance predictions of financial outcomes and audit opinions, with ANNs offering precise tools for assessing the likelihood of qualified audit opinions [29]. Integrating blockchain and fintech technologies in auditing enhances transparency and accountability, making them well suited to financial reporting and fraud prevention. By using blockchain, smart contracts, and the Internet of Things (IoT), auditors can ensure data integrity, boosting investor confidence and reducing fraud risk [30]. Blockchain also tackles decentralization, security, and privacy challenges, particularly benefiting industries like healthcare, supply chains, and energy by streamlining operations and ensuring regulatory compliance [31]. In healthcare, studies of AI in medical diagnostics stress the need to audit for accuracy and fairness and to maintain continued oversight so that errors and biases do not affect patient care [32]. In geotechnical engineering, AI predicts soil behavior, providing auditors with predictive tools for structural integrity assessment [33].
One of the primary areas where AI impacts auditing is vulnerability detection. A hybrid machine learning model combining static and dynamic code analysis has been proposed to improve vulnerability prediction in web applications, particularly for Structured Query Language (SQL) injection and cross-site scripting (XSS) attacks. This approach leverages both supervised and semi-supervised learning to handle scenarios with sparse labeled data, making it highly effective in security audits [3]. Additionally, reinforcement learning (RL) has shown promise in enhancing penetration testing processes: an AI-driven system replicates human penetration testing, demonstrating superior efficiency, accuracy, and reliability in security audits [34]. Optimization algorithms such as ant colony optimization (ACO) and particle swarm optimization (PSO) enhance credit risk assessment and fraud detection by improving feature selection for accurate financial models, which is crucial for handling large datasets [24].
Adopting AI in auditing faces challenges, depending on audit firms' technological capabilities and organizational settings [35]. Remote auditing during the COVID-19 pandemic emphasizes AI's role in improving sustainability audits and certifications [36]. Machine learning models, like Beetle Antennae Search (BAS), efficiently and accurately detect fraud in financial data, surpassing traditional methods [37]. These techniques also aid in identifying anomalies in transactions, assisting auditors in spotting discrepancies and potential fraud [38].
Auditing must consider AI outcome accuracy and alignment with societal values and legal standards [39]. AI could disrupt the auditing labor market by improving efficiency but risking job displacement. It will not fully replace auditors but will change their roles and required skills [28]. Organizations should focus on workforce reskilling and upskilling for adaptation. The regulation of AI and its role in auditing has also evolved in response to this technological innovation. The European Union's AI Regulation emphasizes the need for robust auditing mechanisms. The regulatory framework ensures that AI systems are developed and applied in compliance with established standards of fairness and transparency, and it introduces mechanisms for conformity assessments and post-market monitoring to enforce compliance [40]. These developments indicate that as AI becomes more integrated into auditing practices, regulators will need to continuously refine and adapt rules to ensure the integrity of the audit process.
While AI advancements in auditing present benefits, ethical concerns remain significant. Researchers emphasize auditing content-personalization systems for transparency, especially in political contexts, and propose frameworks to prevent bias [41,42]. Ethical considerations ensure fairness in AI auditing systems and help avoid discrimination. Integrating AI and blockchain boosts efficiency and accuracy in financial reporting but poses adoption challenges. Balancing technology with ethical standards is vital. The future of auditing will involve continued technological evolution, demanding integrity and transparency, and success depends on addressing ethical, regulatory, and operational challenges [28,34,43].
3. Research Methodology
This study uses Latent Dirichlet Allocation (LDA) to examine AI's role in auditing by identifying hidden topics in the text. LDA assumes that documents are mixtures of topics and identifies the main themes and trends. Articles are collected from the Scopus and Web of Science databases, and the text data are cleaned by removing stopwords, stemming, and normalizing. The number of topics is set by testing candidate models and evaluating their coherence, highlighting key themes such as AI systems, machine learning, ethics, and automation. LDA assigns probability distributions to identify dominant topics, and analysis over time reveals AI's roles in auditing, aiding research, policy, and applications. The dominant word weight expresses word importance through frequency and co-occurrence, while tokens, the total word counts, are used in NLP and LDA to recognize patterns and themes.
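As an illustration of this workflow, the following Python sketch shows how a small set of documents could be cleaned (stopword removal, stemming, normalization) and passed to an LDA model. It is a minimal sketch assuming the gensim library; the toy documents and variable names (docs, texts, dictionary, corpus, lda) are illustrative and not drawn from the study's actual code.

```python
# Minimal sketch of the preprocessing + LDA pipeline described above (gensim assumed).
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import STOPWORDS
from gensim.parsing.porter import PorterStemmer
from gensim.utils import simple_preprocess

docs = [
    "Artificial intelligence and machine learning transform the audit process.",
    "Blockchain improves transparency and data integrity in financial audits.",
    "Ethical concerns arise when automated systems make audit decisions.",
]

stemmer = PorterStemmer()

def preprocess(doc):
    # Tokenize and lowercase, drop stopwords and very short tokens, then stem.
    return [stemmer.stem(tok) for tok in simple_preprocess(doc)
            if tok not in STOPWORDS and len(tok) > 2]

texts = [preprocess(d) for d in docs]

dictionary = corpora.Dictionary(texts)            # vocabulary
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words representation

# The study settles on 10 topics via coherence; with real data this would be tuned.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=10,
               passes=100, alpha="auto", eta="auto", random_state=42)

for topic_id, words in lda.show_topics(num_topics=10, num_words=5, formatted=False):
    print(topic_id, [(w, round(p, 3)) for w, p in words])
```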
Blei, Ng, and Jordan (2003) introduced Latent Dirichlet Allocation (LDA) [12], a probabilistic model to discover topics in large text collections. The model assumes documents are mixtures of topics, each represented by a distribution over words. LDA is widely used in natural language processing and text mining to uncover latent structures in textual data and provide insights into the themes of corpora. It infers hidden structures from observed data (the words in documents). Below is a detailed breakdown of how LDA works, covering the generative process, mathematical formulation, and key components.
For each document $d$ in the corpus, the generative process of LDA can be described by the three following steps:
Step 1: Choose a topic distribution. For document $d$, a topic distribution $\theta_d$ is chosen from a Dirichlet distribution with hyperparameter $\alpha$: $\theta_d \sim \mathrm{Dir}(\alpha)$, where $\alpha$ is a vector of parameters that determines the prior belief about the distribution of topics in a document. This controls how sparse or uniform the topics are in the document.
Step 2: For each word in the document, a topic $z_{d,n}$ (a latent variable) is selected from the multinomial distribution $\mathrm{Mult}(\theta_d)$. This means each word in the document is assigned to one of the $K$ topics. Once the topic is chosen, the word is sampled from the word distribution of the chosen topic. This is done by sampling $w_{d,n} \sim \mathrm{Mult}(\phi_{z_{d,n}})$, where $\phi_k$ is the multinomial distribution over words for topic $k$.
Step 3: Repeat for all words in the document. The process continues for all $N_d$ words in the document, and this is carried out for each document in the corpus.
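To make this generative process concrete, the short NumPy sketch below samples a single toy document: it draws $\theta_d$ from a Dirichlet prior, then for each word position draws a topic $z_{d,n}$ and a word $w_{d,n}$ from the chosen topic's word distribution. The number of topics, vocabulary, document length, and hyperparameter values are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

K, V, N_d = 3, 6, 8           # topics, vocabulary size, words in the document
alpha = np.full(K, 0.5)       # Dirichlet prior over topics per document
beta = np.full(V, 0.1)        # Dirichlet prior over words per topic
vocab = ["audit", "ai", "fraud", "data", "risk", "ethics"]

# Topic-word distributions phi_k ~ Dir(beta), one row per topic.
phi = rng.dirichlet(beta, size=K)          # shape (K, V)

# Step 1: document-topic distribution theta_d ~ Dir(alpha).
theta_d = rng.dirichlet(alpha)             # shape (K,)

doc = []
for n in range(N_d):
    z = rng.choice(K, p=theta_d)           # Step 2a: sample topic z_{d,n}
    w = rng.choice(V, p=phi[z])            # Step 2b: sample word w_{d,n} from phi_z
    doc.append(vocab[w])

print("theta_d:", np.round(theta_d, 2))
print("generated document:", doc)
```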
Components of the LDA Model include the following:
$\theta_d$: The topic distribution for document $d$. This is a vector of probabilities, each corresponding to the probability of a topic in that document.
$\phi_k$: The word distribution for topic $k$. This is a vector of probabilities over the vocabulary, representing how likely each word is for topic $k$.
$\alpha$: The parameter of the Dirichlet prior for the topic distributions $\theta_d$. This controls the distribution of topics within a document. A higher value of $\alpha$ indicates that documents are likely to contain many topics, while a lower value of $\alpha$ implies that each document will be dominated by a few topics.
$\beta$: The parameter of the Dirichlet prior for the word distributions $\phi_k$. This controls the distribution of words within a topic. A higher value of $\beta$ indicates that topics are more likely to have many words, while a lower value of $\beta$ suggests that topics are more focused on a small set of words.
$z_{d,n}$: The latent topic assignment for the $n$-th word in document $d$. This indicates which topic generated the word.
$w_{d,n}$: The $n$-th word in document $d$.
The goal of LDA is to infer the topic distributions $\theta_d$ and word distributions $\phi_k$ from a collection of documents. LDA assumes that $\theta$ and $\phi$ are latent, and the task is to infer them given the observed data (the words in the documents).
The joint probability of the observed words $\mathbf{w}_d$ and the latent topic assignments $\mathbf{z}_d$, given the topic distribution $\theta_d$ and the word distributions $\phi$, is expressed as:
$$p(\mathbf{w}_d, \mathbf{z}_d, \theta_d \mid \alpha, \phi) = p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}}),$$
where $p(\theta_d \mid \alpha)$ is a distribution over the possible values of $\theta_d$, controlled by the parameter $\alpha$; it represents the prior distribution of topic proportions for document $d$. The term $p(z_{d,n} \mid \theta_d)$ represents the probability of assigning topic $z_{d,n}$ to the word $w_{d,n}$, given the topic distribution $\theta_d$. The term $p(w_{d,n} \mid \phi_{z_{d,n}})$ represents the probability of word $w_{d,n}$ given the topic $z_{d,n}$, using the word distribution $\phi_{z_{d,n}}$.
To infer the topics from the data, LDA uses Bayesian inference. The posterior distribution is given by:
$$p(\theta, \phi, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \phi, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)},$$
where the goal is to estimate the most likely values for the topic distributions $\theta$, the word distributions $\phi$, and the topic assignments $\mathbf{z}$. Because the normalizing term $p(\mathbf{w} \mid \alpha, \beta)$ is intractable to compute exactly, the posterior is approximated in practice using methods such as variational inference or Gibbs sampling.
When applying topic modeling algorithms like Latent Dirichlet Allocation (LDA), the number of tokens represents the total count of words (or terms) in the entire corpus that the model analyzes. The fundamental formula for determining the number of tokens is as follows:
$$\text{Total tokens} = \sum_{d=1}^{D} N_d,$$
where $D$ is the number of documents in the corpus and $N_d$ is the number of words in document $d$.
In topic modeling, the connection between topics and words is shown through the topic-term distributions:
$$\phi_{k,w} = P(w \mid \text{topic } k).$$
This refers to the significance of a particular word within a specific topic. This weight is represented by the probability that indicates how likely it is that a given word belongs to a specific topic relative to all other words in the model’s vocabulary.
In LDA, each topic is represented as a distribution over words, a set of probabilities that describe the likelihood of each word appearing in the topic. The dominant word for a given topic is the word with the highest probability in the topic’s distribution.
Mathematically, the dominant word weight for a word $w$ in topic $k$ is given by the value of $\phi_{k,w}$:
$$\phi_{k,w} = P(w \mid \text{topic } k),$$
where $\phi_{k,w}$ is the probability that the word $w$ appears in topic $k$. The higher the value of $\phi_{k,w}$, the stronger the association between the word $w$ and topic $k$, making it a dominant word in that topic.
The dominant word for a topic is typically the word with the highest probability in that topic's word distribution. If $w_1, w_2, \ldots, w_V$ are the words in the vocabulary and their respective weights are $\phi_{k,w_1}, \phi_{k,w_2}, \ldots, \phi_{k,w_V}$, the dominant word is the one with the highest $\phi_{k,w}$:
$$w_k^{*} = \arg\max_{w \in \{w_1, \ldots, w_V\}} \phi_{k,w}.$$
This means that the dominant word $w_k^{*}$ is the word that has the highest weight in topic $k$.
The weight can be interpreted as the relevance of a word in a specific topic. In the context of the entire dataset, dominant words help characterize or label topics.
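Continuing from the earlier pipeline sketch (and reusing its fitted lda model and dictionary, which are assumed names), the dominant word of each topic can be read off the topic-word matrix as the highest-weight entry in each row, mirroring the argmax described above.

```python
import numpy as np

# phi has shape (num_topics, vocabulary_size); each row is a topic's word
# distribution and sums to 1.
phi = lda.get_topics()

for k in range(phi.shape[0]):
    w_star = int(np.argmax(phi[k]))   # index of the dominant word for topic k
    print(f"Topic {k + 1}: dominant word = '{lda.id2word[w_star]}' "
          f"(weight = {phi[k, w_star]:.3f})")
```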
As a generative probabilistic model, LDA analyzes large text datasets to group frequently co-occurring words into topics, making it possible to identify dominant themes such as artificial intelligence in fraud detection, risk assessment, data analytics, blockchain for assurance, and ethics in automated auditing (Blei, Ng, and Jordan, 2003 [12]). By assigning probabilities to each document's association with different topics, LDA quantitatively clarifies which areas are most prevalent in the literature, enabling researchers to visualize the distribution and evolution of research themes over time [44].
LDA also tracks topic trends, revealing shifts such as the increasing focus on machine learning and data ethics within auditing scholarship. This temporal analysis can highlight emerging areas needing further research, such as the application of AI in sustainability audits (Blei, Ng, and Jordan, 2003 [12]). By systematically mapping the landscape of academic discourse, LDA helps identify knowledge gaps, trends, and interdisciplinary connections, thereby guiding future research directions and enhancing understanding of auditing's technological evolution [45].
4. Data Analysis
To explore the intersection between artificial intelligence (AI) and auditing, datasets were selected from two major academic databases, Scopus and Web of Science. The search combined relevant keywords to capture the broad and evolving landscape of AI applications in auditing: "AI and auditing", "auditing and machine learning", "auditing and deep learning", "auditing and neural networks", and "auditing and generative AI". The search was limited to publications in English and ensured coverage of recent advancements. After removing duplicates and irrelevant articles, a final dataset of 465 papers was obtained. Covering research from 1982 to 2024, the analysis thoroughly explores the field's evolution.
Table 1 provides an overview of AI and auditing publications, showing long-standing and evolving research trends. With 465 papers, the field has extensive literature covering various topics. An average paper age of 0.89 years indicates active, recent developments in combining AI and auditing. With an average of 18.86 citations per paper, the research in this field shows notable scholarly impact. A total of 8774 references indicates substantial reliance on existing research. The 360 indexed keywords highlight the diversity and rich research potential at the AI and auditing intersection.
A total of 1804 authors participated in the research, highlighting the field’s interdisciplinary nature and innovation. There are 70 single-author papers, but most research is co-authored, with an average of 3.88 authors per paper. The international collaboration rate is 21.87%, showing significant global academic exchange and cooperation in AI and auditing.
Table 2 shows the publication trends in AI and auditing research from 1982 to 2024. Initially, from 1982 to 1991, only eight papers (1.72%) were published, reflecting AI’s early application to auditing. From 1992 to 2001, there were 16 papers (3.44%). This growth continued from 2002 to 2011, with 26 papers (5.59%).
A major increase occurred from 2012 to 2021, with 147 papers (31.61%) indicating a surge in interest as AI technologies advanced. The most dramatic rise was from 2022 to 2024, with 268 papers (57.63%), highlighting the rapid expansion of AI in auditing due to new advancements in machine learning, automation, and data analytics. AI and auditing research has significantly grown, mostly in the past decade, reflecting AI’s increasing integration into auditing.
Figure 1 illustrates a co-authorship network representing international collaborations specifically within the scientific domain of “AI and auditing”, a specialized area focusing on the application of artificial intelligence in auditing practices, adhering to the parameters of a minimum of six documents per country and selecting the top 26 countries based on co-authorship links.
The network depicts countries as nodes, with their size reflecting their research output or centrality within this niche field, while the connecting lines represent co-authorship ties, with thicker lines indicating stronger collaborative relationships. The color gradient from blue to red indicates the temporal aspect of these collaborations, highlighting more recent activity. The prominent positions of the United States (blue) and China (orange) suggest their leading roles in this domain, demonstrating significant research output and extensive collaborations. Furthermore, the network reveals clusters of collaborations among European countries and growing activity in Asian nations, indicating a global interest in developing and implementing AI-driven auditing solutions.
Relating to the key concepts, relationships, and temporal dynamics within the research literature on AI and auditing,
Figure 2 presents a co-occurrence network of keywords extracted from a dataset related to “AI and auditing”, as indicated by the prominent presence of “auditing” and “artificial intelligence” nodes. The network visualizes the relationships between 46 keywords, selected from a pool of 3478, each appearing at least 10 times in the data. The “full counting” method ensures that each keyword is counted every time it appears in a document, regardless of other co-occurring keywords. Nodes represent these keywords, with their size corresponding to their frequency or centrality within the network.
The edges connecting the nodes signify co-occurrence relationships, with thicker lines indicating stronger associations. The color of each node is significant, representing the average publication year of the documents in which the keyword appears. Purple nodes, like “auditing”, “risk assessment”, and “finance”, suggest these keywords were prevalent in earlier publications closer to 2018. Conversely, brighter yellow nodes, particularly “machine learning”, “data set”, “ethics”, “blockchain”, and “human”, indicate keywords that are more prominent in recent publications closer to 2022. The gradual transition from blue to yellow across the network reflects the evolving focus and terminology within the field of AI in auditing over time. The shift towards “machine learning” suggests an increasing emphasis on its application in auditing practices.
5. Empirical Results
Figure 3 presents two key plots for selecting the optimal topic model. The left plot indicates that coherence scores peak at 10 topics, suggesting the best balance of interpretability. Meanwhile, the right plot shows that perplexity decreases as the number of topics increases, stabilizing after 14 topics. Considering both coherence and perplexity, 10 topics emerges as the most suitable choice for model fit and interpretability. This is further reflected in the LDA model configuration, where the number of topics, $K$, is set to 10, alongside the Dirichlet hyperparameters $\alpha$ and $\beta$. The model passes over the entire dataset 100 times to optimize the topics and allows each word in a document to have its own topic distribution, providing more flexibility in modeling the data and fine-tuning the model's performance and topic coherence.
Based on the analysis of the coherence scores and perplexity for different topic models, 10 topics emerges as the optimal choice. The coherence score for 10 topics is the highest at 0.5121, indicating that the topics are semantically meaningful and consistent, making them the most interpretable. Additionally, the perplexity for 10 topics is very low at 0.0001, which suggests that the model generalizes well and can predict unseen data with minimal uncertainty. While other topic models, such as those with 14 topics with a coherence score of 0.4803, also exhibit strong performance, the coherence score of 10 topics stands out as the best balance between semantic quality and model generalization. Therefore, selecting 10 topics offers the most reliable and coherent representation of the underlying themes in the dataset, ensuring both interpretability and accuracy in the results.
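The selection procedure described above, comparing coherence and perplexity across candidate numbers of topics, can be reproduced with gensim's CoherenceModel, as in the sketch below. It reuses texts, corpus, and dictionary from the earlier pipeline sketch, and the candidate range of topic counts is illustrative.

```python
from gensim.models import LdaModel, CoherenceModel

results = []
for k in range(2, 21, 2):                    # candidate numbers of topics
    model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                     passes=100, alpha="auto", eta="auto", random_state=42)
    coherence = CoherenceModel(model=model, texts=texts,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
    log_perplexity = model.log_perplexity(corpus)   # per-word likelihood bound
    results.append((k, coherence, log_perplexity))

for k, c, p in results:
    print(f"K={k:2d}  coherence={c:.4f}  log-perplexity bound={p:.4f}")
```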
Figure 4 presents an analysis of topics generated by an LDA model, focusing on the top 30 most salient terms and their relationships within a reduced two-dimensional space. The Intertopic Distance Map, visualized through multidimensional scaling on the left, displays four prominent topics as circles, with the size of each circle indicating the marginal topic distribution, suggesting that Topic 1 is the most prevalent. The proximity of Topic 5 to Topic 10 suggests a degree of thematic overlap between them.
Overlapping topics manifest as circles positioned close to each other on the map, indicating a higher degree of similarity in their word distributions. Topics 5, 6, and 7 appear to exhibit the most significant overlap, as their circles are closely clustered together. This proximity suggests that the documents associated with these topics likely share a considerable number of common terms.
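Intertopic distance maps of this kind are the standard output of the pyLDAvis package; a minimal sketch is shown below, assuming the fitted lda, corpus, and dictionary from the earlier pipeline sketch (older pyLDAvis releases expose the same helper as pyLDAvis.gensim rather than pyLDAvis.gensim_models).

```python
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis  # pyLDAvis.gensim in older releases

# Project topics into two dimensions via multidimensional scaling and build the
# interactive view with the relevance slider (lambda) and term bar charts.
vis = gensimvis.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(vis, "lda_intertopic_map.html")
```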
Figure 5 and
Figure 6 show topic modeling results, featuring an Intertopic Distance Map and top 30 terms ranked by their salience score for Topics 1 and 2, respectively. For each term, the blue bar represents its overall term frequency, while the shorter red bar overlaid on it indicates the estimated term frequency within the selected data. The provided lists for Topic 1 and Topic 2 offer the top 30 most relevant terms for each, along with their associated weights, revealing the key concepts defining these individual topics.
The marginal topic distribution in the bottom left corner provides context for the overall prevalence of topics in the dataset. The "slide to adjust relevance metric (λ)" control at the top allows for interactive exploration of term relevance, with λ currently set to 1. Concentric circles display the topic distribution, and a bar chart shows the top 30 terms for Topic 1, with blue and red bars highlighting overall and topic-specific frequency, respectively. A slider facilitates topic exploration. Topic 1, accounting for 33.4% of tokens, prominently features AI, audit, accounting, and data, reflecting its focus on applying AI and data analysis in auditing and financial research.
Topic 1 highlights a theme of auditing and financial scrutiny, possibly using advanced analytical techniques. The intertopic distance map shows Topic 1 as distinct from the others, indicating a unique focus in the documents. The fact that the red (topic-specific) bars nearly fill the blue (overall-frequency) bars for terms like "audit", "auditing", and "financial" strongly emphasizes their specificity to Topic 1: these terms appear much more frequently within documents associated with Topic 1 than in the corpus as a whole, reinforcing the central theme of auditing and finance. The top terms also point to potential sub-themes within Topic 1. The presence of "artificial" and "intelligence" alongside "data" suggests a focus on the application of AI and data analytics in auditing or financial contexts, while "research", "used", and "using" might point toward discussions of methodologies or tools employed in these fields. Terms like "analysis", "information", "tools", "potential", "management", "process", "systems", "risk", and "fraud", while having some overall frequency (indicated by the blue bars), are still relevant to Topic 1. They suggest a broader context encompassing analytical processes, information management, the tools utilized, potential applications, management aspects, systemic considerations, risk assessment, and the detection of fraud within the domain of auditing and finance.
Figure 6 presents the topic model results for Topic 2. The presence of terms like “data”, “learning”, “model”, “accuracy”, “training”, “process”, and “security” strongly suggests that Topic 2 is related to machine learning models and performance. A total of 21.2% of tokens indicate that the top 30 terms account for a significant portion of the words in the documents assigned to Topic 2, emphasizing data-driven approaches using machine learning models and methods for tasks such as security, detection, and validation.
Topic 2 is distinct from the other topics, as indicated by its relative separation. However, it shows proximity to the cluster of Topics 3 and 4, suggesting potential shared vocabulary or conceptual overlap with these themes. The size of the red circle for Topic 2, coupled with its representation in the marginal topic distribution, indicates its notable presence within the analyzed documents. The accompanying bar chart of the top 30 most relevant terms for Topic 2 highlights keywords such as “data”, “learning”, “model”, “accuracy”, “training”, and “proposed”, strongly suggesting a central theme related to machine learning, statistical modeling, or data analysis methodologies. The presence of terms like “blockchain”, “neural”, “network”, and “machine” further reinforces this interpretation, pointing towards discussions of specific algorithms and technologies within this domain. Therefore, the overall picture of Topic 2 is a significant theme centered on data and modeling techniques, with potential connections to the topics represented by clusters 3 and 4, while maintaining a degree of thematic distinction from Topic 1.
This LDA topic model has several potential implications. It could be used for document classification, where documents are categorized based on the topics they contain. It also facilitates content analysis, helping to identify the main themes within a collection of texts. In the context of information retrieval, this model could improve the search and retrieval of relevant documents. Further investigation into other topics, as well as an understanding of the context and source of the documents, would provide a more comprehensive picture of the data and its themes.
Table 3 summarizes the dominant themes, keywords, focus, topic name, dominant word weight, and topic prevalence for each topic. Each topic is given an auditing-related name, and its dominant word weight and percentage of tokens are reported. An analysis of 10 distinct topics within the field of AI and auditing is presented, each focusing on a different aspect of AI integration in auditing practices.
Topic 1, “AI and Audit Technologies”, accounts for 33.4% of tokens, emphasizing interest in AI within auditing. Topic 2, “Data Security in Auditing”, with 21.2% of tokens and “data” as its top word, highlights the importance of data security. Topics 3 and 4 delve into technology in auditing, with Topic 3 on “Auditing and Accounting Technologies” focusing on “auditors”, and Topic 4 on “AI and Machine Learning in Auditing” emphasizing “model”.
Beyond AI’s direct application, the model identifies critical aspects of modern auditing. Topic 5, “Ethical AI in Audit Systems”, highlights concerns with AI’s role in decision-making. Despite a lower percentage of tokens (7.8%), its importance remains significant. Additionally, topics like “Image Processing in Audit” (Topic 6) and “Precision in Audit Technology” (Topic 8) indicate a focus on technological tools and their impact on auditing. The model captures non-technical topics indicating societal influences on auditing. Topic 7, “Political Influence in Auditing”, and Topic 10, “Racial and Ethnic Disparities in Auditing”, with lower token percentages (4.4% and 0.4%), highlight external factors impacting audit practices. Topic 9, “Environmental Ethics in Auditing”, suggests growing concern for sustainability and ethics with “climate” as the keyword at 0.4% token percentage.
Figure 7 offers a concise visual of key vocabulary across the 10 topics, highlighting word importance and distribution from the LDA model for a better understanding of topic characteristics and word–topic relationships. The x-axis shows prominent words; the y-axis lists the 10 topics. The blue color intensity in each cell represents a word's weight in a topic, with darker blue indicating a stronger association. Lighter blue suggests a weaker link, and white cells mean negligible or no weight. In Topic 1, words like "ai", "audit", "study", "auditing", "artificial", and "intelligence" appear with high density and weight, confirming the "AI in Auditing" theme. Topic 2 strongly associates with "data", reinforced by a dark blue cell and high weight, supporting the "Data Security in Auditing" theme.
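A word–topic weight heatmap of the kind shown in Figure 7 can be assembled directly from the topic–word matrix. The matplotlib sketch below reuses the lda model and dictionary from the earlier pipeline sketch; the choice of the top three words per topic for the x-axis is an illustrative assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

phi = lda.get_topics()                           # shape (num_topics, vocab_size)

# Use the union of each topic's top-3 words as the x-axis vocabulary.
top_ids = sorted({int(i) for row in phi for i in np.argsort(row)[-3:]})
labels = [lda.id2word[i] for i in top_ids]
weights = phi[:, top_ids]                        # topic-by-word weight matrix

fig, ax = plt.subplots(figsize=(10, 4))
im = ax.imshow(weights, cmap="Blues", aspect="auto")   # darker = higher weight
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(range(phi.shape[0]))
ax.set_yticklabels([f"Topic {k + 1}" for k in range(phi.shape[0])])
fig.colorbar(im, ax=ax, label="word weight in topic")
fig.tight_layout()
plt.show()
```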
Auditing is significantly transforming due to advanced technologies, particularly AI and data handling. Topics like “AI in Auditing”, “Data Security in Auditing”, “Auditing and Accounting Technologies”, and “AI and Machine Learning in Auditing” show a strong shift toward these technologies. High token percentages (33.4%, 21.2%, 12.7%, and 11.1%) reveal a focus on AI, machine learning, and data-centric methods. Words like “ai”, “data”, and “auditors” in technology contexts emphasize new tools and skills in auditing.
AI and data analytics are leading to more specialized tech applications in auditing, like “Image Processing in Audit” and “Precision in Audit Technology”. This shift moves past traditional methods, using advanced tools to enhance audit efficiency and accuracy. The rise of “Ethical AI in Audit Systems” highlights the importance of ethical considerations in AI use for auditing. As AI becomes integral to decision-making, ethical frameworks and governance are essential, marking a key shift in auditing responsibilities. Technological themes dominate, but topics like “Political Influence in Auditing” and “Racial and Ethnic Disparities in Auditing” suggest auditing is broadening to consider external factors and societal issues. Despite lower token percentages, their inclusion shows growing awareness of the wider context of auditing.
6. Conclusions
This study explores the transformation of auditing in the era of artificial intelligence (AI) through a topic modeling analysis of 465 academic papers published between 1982 and 2024. The findings reveal a research landscape centered on integrating AI and data-driven technologies into auditing practices. The most dominant theme, "AI in Auditing" (Topic 1), accounts for 33.4% of the content analyzed, underscoring the central role of artificial intelligence in redefining audit methodologies and decision-making processes. The next most prominent theme, "Data Security in Auditing" (Topic 2), accounts for 21.2%, reflecting critical concerns around data integrity, privacy, and governance in automated audit systems.
Secondary but significant themes such as “Auditing and Accounting Technologies” (12.7%) and “AI and Machine Learning in Auditing” (11.1%) point to sustained interest in the application of advanced digital tools for improving audit accuracy, efficiency, and responsiveness. While these findings highlight the profession’s commitment to technological advancement, the analysis also surfaces emerging but less dominant themes that warrant further exploration. Topics like “Ethical AI in Audit Systems”, “Image Processing in Auditing”, and “Political Influence in Auditing” illustrate growing attention to non-technical issues such as algorithmic fairness, audit independence, and the risks of bias in automated decision-making. However, themes like “Environmental Ethics in Auditing” and “Racial and Ethnic Disparities”, though present, remain significantly underexplored—each comprising less than 0.5% of the total content—suggesting important gaps in the literature where auditing intersects with broader societal concerns.
Theoretically, this study contributes to the field by providing a structured, data-driven map of the evolving research priorities in AI and auditing, helping scholars understand the shifting focus of academic inquiry over time. Practically, the findings inform regulators, audit firms, and technology developers about current trends and underrepresented areas that may benefit from targeted innovation, training, or policy development. The results indicate that while the profession advances technologically, ethical and social dimensions are only beginning to receive adequate attention.
Despite its contributions, the study has limitations: the analysis was restricted to English-language, peer-reviewed publications, potentially excluding valuable regional or practitioner-based insights.
Future research should expand the dataset to include non-English and grey literature to provide a more global and inclusive perspective. Further qualitative or mixed-methods studies could deepen our understanding of emerging issues like algorithmic bias, environmental responsibility in auditing, and racial equity. Empirical work examining how AI tools are practically implemented across firms of different sizes, industries, or regulatory environments would also offer valuable insights. The interest in AI’s potential to transform auditing is increasing, prompting the auditing profession to engage with innovation and help develop ethical and social frameworks for these technologies, which ensures auditing remains a transparent, fair, and trusted accountability pillar.