Abstract
The transformative impact of AI technologies on the financial sector has been a topic of increasing interest. This study investigates ChatGPT’s applications in financial reasoning and analysis and evaluates ChatGPT-4o’s effectiveness and limitations in conducting both basic and complex financial analysis tasks. By designing a series of multi-step, advanced reasoning tasks and establishing task-specific evaluation metrics, we assessed ChatGPT-4o’s performance compared to human analysts. Results indicate that while ChatGPT-4o demonstrates proficiency in basic and some complex financial tasks, it struggles with deep analytical and critical thinking tasks, especially in specialized finance areas. This study underscores the need for meticulous task formulation and robust evaluation in AI financial applications. While ChatGPT enhances efficiency, integrating it with human expertise is crucial for effective decision-making. Our findings highlight both the potential and limitations of ChatGPT-4o in financial analysis, providing valuable insights for future AI integration in the finance sector.
1. Introduction
In 1950, Alan Turing published his seminal paper, “Computing Machinery and Intelligence” (Turing 1950), posing a profound question, “Can machines think?” Over seventy years later, on 30 November 2022, OpenAI launched ChatGPT (Chat Generative Pre-trained Transformer), a revolutionary Artificial Intelligence (AI) language model that has transformed various sectors in a remarkably short time.
Trained on extensive datasets using advanced Natural Language Processing (NLP) techniques and refined through Reinforcement Learning from Human Feedback (RLHF), ChatGPT can perform a wide array of tasks. Unlike traditional search engines, it provides specific, concise answers. It also features an advanced data analysis tool that enables it to write and execute code, perform complex financial analyses, and produce downloadable outputs, which makes it potentially valuable for precise and efficient financial analysis. Recent updates have further introduced audio and video interaction capabilities, expanding its functionality.
Throughout history, transformative technologies like manufacturing automation and the rise of e-commerce have ushered in new epochs. ChatGPT’s rapid adoption reflects this historical pattern. For instance, in Europe, the travel company Expedia has harnessed AI chatbots to help users plan cost-effective, eco-friendly trips (Blesiada 2023). According to Enterprise Apps Today, the technology and education sectors are among the foremost adopters of OpenAI’s solutions, with industries such as business services, manufacturing, and finance also integrating AI into their operations (Elad 2024). A 2023 Goldman Sachs report highlighted AI’s potential to displace up to 300 million full-time jobs (Kelly 2023), sparking debates among financial analysts about the future relevance of their roles in an increasingly automated economy.
Despite these benefits, current AI models, including ChatGPT, have notable limitations. For instance, the accuracy and quality of ChatGPT’s responses can vary with the question posed, the available training data, the complexity of the topic, and the instructions or prompts given (Kocoń et al. 2023). Furthermore, current AI models still struggle with tasks requiring deep understanding and critical thinking (Roumeliotis and Tselikas 2023). Evaluating ChatGPT-4o’s performance in financial analysis is therefore important. Automating financial tasks can enhance efficiency, reduce costs, and provide consistent, objective analysis, and understanding ChatGPT-4o’s capabilities and limitations helps address regulatory and ethical concerns, informs workforce transition strategies, and drives innovation. This study investigates ChatGPT-4o’s effectiveness in performing financial analysis tasks traditionally handled by human analysts, offering insights into its potential and constraints in the financial sector.
To achieve this objective, we designed a set of multi-step and advanced reasoning financial tasks and established specific evaluation metrics. We then conducted empirical experiments to assess ChatGPT-4o’s performance on these tasks compared to human analysts. Our results indicate that while ChatGPT-4o can effectively perform basic and some complex financial tasks, it has limitations in tasks that involve managing complex financial information and specialized finance areas such as derivatives.
This research contributes to the understanding of AI’s role in finance by providing insights into ChatGPT’s financial applications and highlighting its potential limitations. These findings enhance the knowledge base for academics, developers, and stakeholders interested in integrating ChatGPT into financial practice.
2. Artificial Intelligence Techniques and Related Studies in Financial Analysis
2.1. Historical Evolution and Technological Advancements
Technological advancements have profoundly influenced the evolution of the financial services industry. Innovations such as telegrams and Morse code in the late 19th century revolutionized monetary transactions, setting the stage for further technological progress (Saunders et al. 2021). The transition to digital banking in the 20th and 21st centuries marked a pivotal shift, with financial technology (FinTech) fundamentally transforming trading practices and financial management. The 1970s introduced algorithmic trading in financial institutions, leveraging computer models to automate trading strategies. This technological evolution enabled the development of advanced trading models that could analyze extensive datasets, identify patterns, and make informed trading decisions (Burgess 2021).
2.2. AI Applications in Financial Analysis
AI stands out as a significant and expanding field of interest among scholars and practitioners. Its applications extend across traditional areas like financial markets, trading, banking, investments, optimization, and insurance. Additionally, AI is increasingly pivotal in burgeoning FinTech sectors, including big data analytics, blockchain, and data mining. These applications are crucial for risk management and regulatory compliance (Ahmed et al. 2022; Cao 2022; Farooq and Chawla 2021; Lin 2019).
2.2.1. Enhancing Market Efficiency and Risk Management
AI-driven trading strategies have been shown to outperform human traders under various market conditions, including during crises such as the COVID-19 pandemic (Burgess 2021). The integration of machine learning and AI techniques has further refined algorithmic trading, significantly influencing market dynamics, liquidity, and trading strategies, thereby enhancing overall market efficiency (Chaboud et al. 2014). Moreover, advanced algorithms and machine learning models have demonstrated their efficacy in analyzing extensive datasets to identify potential risks (Demajo et al. 2020; Yu et al. 2023) and detect patterns of fraudulent activities (Jullum et al. 2020).
2.2.2. Predictive Analytics and Financial Stability
Since the 1990s, AI methodologies such as artificial neural networks, support vector machines, ensemble methods, generalized boosting, AdaBoost, and Random Forests have been employed to predict financial distress and failures in banks (Liu et al. 2021). The implementation of Explainable AI (XAI) in credit models within the banking sector, such as credit scoring and credit default prediction, has facilitated greater transparency and understanding of complex financial concepts, promoting their adoption in the finance industry (Demajo et al. 2020; de Lange et al. 2022).
2.2.3. Modeling Behavioral Biases and Sentiment Analysis
The use of AI to model behavioral biases has also gained prominence. The integration of Natural Language Processing (NLP) has become increasingly vital in finance studies since the early 21st century, covering areas such as text classification, sentiment analysis, and natural language generation. Research by Tetlock et al. (2008) and Bollen et al. (2011) has shown the predictive power of sentiment analysis in determining stock market trends, establishing a significant correlation between news sentiment and market behavior. Similarly, Félix et al. (2020) have employed machine learning-based models to construct implied volatility sentiment, further highlighting the utility of AI in financial analytics.
2.3. Emergence of ChatGPT in Financial Analysis
Since its inception in November 2022, ChatGPT has sparked considerable academic interest in its application to finance. Researchers have explored its utility in a variety of financial tasks, including financial document classification, sentiment analysis, named entity recognition in financial texts, and financial data extraction (Zaremba and Demir 2023). Traditional keyword-based methods in financial sentiment analysis have shown weaknesses, particularly in handling complex texts, as these methods are susceptible to adversarial manipulation (Boukes et al. 2020; Hartmann et al. 2023; Leippold 2023a).
ChatGPT’s ability to interface with explainable AI models and demystify complex financial concepts for lay audiences underscores its potential in enhancing financial analysis and research (Wenzlaff and Spaeth 2022; Yue et al. 2023). However, Leippold (2023b) cautioned that large language models (LLMs) like GPT-3 might generate unfounded content, as demonstrated in tests of GPT-3’s responses on climate change topics. Furthermore, Lopez-Lira and Tang (2023) found a significant correlation between ChatGPT’s interpretations of corporate news and subsequent stock market reactions, suggesting that its assessments of financial news carry meaningful information.
In finance research, Dowling and Lucey (2023) highlighted ChatGPT’s contributions across various stages of the research process, particularly in the study of cryptocurrencies. Hansen and Kazinnik (2023) demonstrated ChatGPT’s effectiveness in analyzing central bank communications, underscoring its value for comparative studies and its zero-shot learning capabilities.
The market for AI in finance is experiencing significant growth and is driven by key players who are facilitating this transformation. Services such as KAI, AlphaChat, Growthbotics, and FinChat have been developed to meet the specific requirements of the financial sector. FinChat, in particular, leverages generative AI to provide investment research, offering fundamental investors relevant data through an interactive conversational interface.
Ethical and Regulatory Considerations
Despite its advantages, the deployment of AI models such as ChatGPT in financial settings presents significant ethical and regulatory challenges. Ensuring the responsible use of AI is crucial, particularly in areas of risk management and regulatory compliance (Zaremba and Demir 2023). The increasing acknowledgment of ChatGPT’s potential to influence financial practices and research necessitates robust measures to address these challenges and fully harness AI’s potential to enhance financial analysis.
3. Empirical Design
3.1. Financial Analysis and Reasoning in a Nutshell
Financial analysis can range from simple to complex, depending on the context and specific goals of the analysis. It involves the systematic examination of financial data to assess the performance of a business or investment and to make forecasts.
Unlike basic mathematical calculations, most simple financial analyses involve multi-step processes. A prime example is the concept of present value, a fundamental principle in finance widely used to determine the value of shares, bonds, projects, or entire businesses. Calculating present value requires several steps: identifying future cash flows, selecting the appropriate discount rate, determining the number of periods for each cash flow, and computing the present value for each cash flow. Other simple financial analyses include ratio and metric calculations, as well as simple budgeting and forecasting using historical data.
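The four present-value steps above can be sketched in Python as a minimal illustration, assuming end-of-period cash flows and a single constant discount rate. The cash-flow figures and the function names `present_value`, `present_value_of_stream`, and `net_present_value` are hypothetical and are not taken from the study’s tasks.

```python
def present_value(cash_flow: float, rate: float, period: int) -> float:
    """Discount one cash flow received at the end of `period` back to today."""
    return cash_flow / (1.0 + rate) ** period


def present_value_of_stream(cash_flows: list[float], rate: float) -> float:
    """Sum the PVs of cash flows received at the end of periods 1, 2, ..., n."""
    # Step 3: each cash flow's period is its position in the list, starting at 1.
    # Step 4: discount each cash flow, then add the results.
    return sum(present_value(cf, rate, t) for t, cf in enumerate(cash_flows, start=1))


def net_present_value(initial_outlay: float, cash_flows: list[float], rate: float) -> float:
    """NPV = PV of future cash flows minus the initial (time-0) outlay."""
    return present_value_of_stream(cash_flows, rate) - initial_outlay


# Step 1: identify future cash flows; Step 2: select the discount rate (hypothetical values).
flows = [100.0, 100.0, 100.0]
rate = 0.10
print(round(present_value_of_stream(flows, rate), 2))   # 248.69
print(round(net_present_value(240.0, flows, rate), 2))  # 8.69
```

Subtracting an initial outlay from the summed present values gives the net present value used in capital budgeting decisions; a positive result indicates the project earns more than the discount rate.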
As we progress to more advanced or complex financial analysis, the necessity for precise reading comprehension, logical interpretation, and the application of financial principles becomes evident. For instance, evaluating a company’s operational status entails interpreting comprehensive financial statements to extract meaningful insights, identifying data patterns and relationships, and subsequently analyzing and formulating strategies. Furthermore, when making investment decisions, it is imperative to consider the cross-temporal and cross-domain characteristics of financial investments, conduct both fundamental and technical analyses, and select the optimal investment strategy amidst various uncertainties. Moreover, financial analysts need to navigate the complexities of financial regulations and compliance requirements.
In the realm of financial analysis, reasoning is of paramount importance. It involves utilizing available financial data, information, and pertinent factors to make judgments, draw conclusions, and infer insights about companies, businesses, projects, investments, or financial markets. This process demands critical thinking, analytical, and problem-solving skills. Financial reasoning further augments context and depth by considering broader economic, industry, and company-specific factors. Collectively, financial analysis and reasoning are indispensable for effective financial management and strategic planning, facilitating the examination, interpretation, and application of financial data to make well-informed decisions.
3.2. Rationale of Human Analysts and ChatGPT in Financial Analysis and Reasoning
Financial professionals, including analysts, traders, and investors, typically engage in reasoning to scrutinize financial statements, evaluate performance metrics, forecast future outcomes, and formulate strategies for investment, budgeting, and financial planning. These professionals must have a solid foundation in algebra and mathematics to excel in their roles. They employ a range of sophisticated tools to support their research, analysis, and investment management, such as charting software, technical analysis applications, options and derivatives analyzers, portfolio management solutions, and algorithmic trading platforms. Excel is fundamental for tasks like ratio analysis, risk management, investment analysis, and asset valuation. When managing extensive datasets, a deep understanding of mathematical and statistical techniques is crucial for drawing accurate conclusions from financial data. Figures 1 and 2 illustrate the process followed by human analysts, from simple to complex financial analysis. Figure 1 outlines a simple financial analysis process: calculating net present value.
Figure 1.
Steps for Calculating Net Present Value (NPV) by Human Analysts.
For complex financial analysis, such as effective financial statement analysis, financial analysts should possess a blend of knowledge and tools as outlined in Figure 2 (Masson 2018; Brealey et al. 2022; Wahlen et al. 2018):
Figure 2.
Steps for Financial Statement Analysis by Human Analysts.
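One quantitative step commonly performed in financial statement analysis is ratio calculation. The sketch below assumes a few standard ratio definitions and uses hypothetical statement figures; the `Statements` class and `key_ratios` function are illustrative names, and the sketch does not reproduce the specific steps shown in Figure 2.

```python
from dataclasses import dataclass


@dataclass
class Statements:
    """Hypothetical subset of balance-sheet and income-statement items (same currency units)."""
    current_assets: float
    current_liabilities: float
    total_liabilities: float
    shareholders_equity: float
    revenue: float
    net_income: float


def key_ratios(s: Statements) -> dict[str, float]:
    # Liquidity: ability to cover short-term obligations with short-term assets.
    current_ratio = s.current_assets / s.current_liabilities
    # Leverage: liabilities financed per unit of equity.
    debt_to_equity = s.total_liabilities / s.shareholders_equity
    # Profitability: earnings per unit of sales and per unit of (period-end) equity.
    net_margin = s.net_income / s.revenue
    roe = s.net_income / s.shareholders_equity
    return {"current_ratio": current_ratio, "debt_to_equity": debt_to_equity,
            "net_margin": net_margin, "roe": roe}


example = Statements(current_assets=500.0, current_liabilities=250.0,
                     total_liabilities=600.0, shareholders_equity=400.0,
                     revenue=1000.0, net_income=80.0)
print(key_ratios(example))
```

The ratios themselves are mechanical; the interpretive work described in the text lies in comparing them across periods and peers and explaining what drives them.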
Furthermore, these professionals are responsible for developing advanced financial models and conducting extensive research. Proficiency in specialized financial software and programming languages like C++, R, SAS, and Python is vital for effectively navigating the financial landscape.
Conversely, AI, a branch of computer science, focuses on developing systems and machines capable of performing tasks that typically require human intelligence, such as learning, reasoning, problem-solving, perception, language understanding, and decision-making (Sokolov 2019). Research by Son et al. (2023) on the application of large language models (LLMs) in financial reasoning confirms their capability to generate coherent investment opinions. Although Son et al. do not detail the reasoning process LLMs follow in financial analysis, their study underscores the importance of task formulation, synthetic data generation, prompting methods, and evaluation capability in shaping the quality of LLM responses. Complementing this, Wei et al. (2022) found that eliciting a chain of thought, that is, a series of intermediate reasoning steps, significantly enhances the complex reasoning capabilities of LLMs.
ChatGPT, a prominent example of such AI, has been reported to perform well on complex multi-step reasoning tasks, including financial reasoning. Based on work by Cheng et al. (2023), Son et al. (2023), and Wei et al. (2022), we develop the following financial analysis and reasoning framework for ChatGPT-4o acting as a financial analyst, as outlined in Figure 3.
Figure 3.
The Flow of Analysis for ChatGPT-4o as a Financial Analyst.
3.3. Tasks/Prompt
The principles guiding our task selection are comprehensive coverage of financial concepts, realistic and complex task designs, and the integration of both quantitative and qualitative assessments. This approach allows us to test how effectively AI models handle a wide range of financial analysis tasks in real-world settings.
We chose our tasks based on several key criteria to ensure the dataset’s suitability for empirical testing of AI-driven financial analysis. Firstly, the tasks span diverse financial scenarios, from basic savings and investment calculations to complex option pricing and portfolio optimization, which exposes the model to varied financial problems and tests its robustness and versatility. Secondly, the tasks are grounded in realistic financial activities, such as calculating future values, present values, and internal rates of return. These tasks mirror the analyses conducted by financial professionals, ensuring the dataset’s relevance to practical applications.
Thirdly, we prioritized tasks that require complex reasoning and multi-step calculations, such as portfolio construction, capital budgeting analysis, and financial statement analysis. This complexity is well suited to testing AI systems’ ability to handle sophisticated financial models and analyses. Additionally, the tasks integrate various financial theories and models, including the Black–Scholes model for option pricing, the Gordon growth (dividend discount) model for valuing stocks, and the Modigliani–Miller theorem on capital structure, so that the model’s understanding of financial principles is tested broadly.
Lastly, the tasks involve both quantitative calculations (e.g., yield to maturity) and qualitative assessments (e.g., the impact of financial leverage), both of which are crucial for evaluating whether AI can interpret and analyze financial data effectively.
To gain deeper insights into ChatGPT-4o’s performance on these tasks, we categorized them by the complexity of the reasoning process into multi-step reasoning tasks and complex reasoning tasks. The results obtained with traditional tools used by human analysts, such as mathematical equations, Excel, Refinitiv, Stata, and other resources, serve as benchmarks for evaluating ChatGPT-4o’s output.
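Two of the models named above can be written compactly. The following is a minimal sketch, assuming a constant dividend growth rate below the required return for the Gordon growth model, and a European option on a non-dividend-paying stock with continuous compounding for Black–Scholes. The function names and inputs are hypothetical, chosen only for illustration.

```python
import math


def gordon_growth_price(next_dividend: float, required_return: float, growth: float) -> float:
    """Gordon growth model: P0 = D1 / (r - g), valid only when r > g."""
    if required_return <= growth:
        raise ValueError("required return must exceed the growth rate")
    return next_dividend / (required_return - growth)


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S: float, K: float, r: float, sigma: float, T: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def black_scholes_put(S: float, K: float, r: float, sigma: float, T: float) -> float:
    """European put from put-call parity: P = C - S + K * exp(-rT)."""
    return black_scholes_call(S, K, r, sigma, T) - S + K * math.exp(-r * T)


print(round(gordon_growth_price(2.0, 0.10, 0.05), 2))           # 40.0
print(round(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))  # 10.4506
```

Both formulas are closed-form, so a model that states the correct formula and substitutes inputs correctly should reproduce these values exactly; errors in such tasks therefore point to formula selection or arithmetic rather than to modeling judgment.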
3.3.1. Multi-Step Reasoning Tasks
The multi-step reasoning task consists of 32 questions covering various topics in corporate finance, investments, and derivatives. These topics include, but are not limited to, present value, future value, annuities, payment schedules, investment accumulation, and rate calculations. The task also explores basic futures and options pricing models, value calculation, risk management, and forecasting. Additionally, it includes qualitative assessments leading to decision-making or recommendations regarding future dividend payouts and capital structure. A detailed overview of these tasks can be found in Appendix A.
These tasks primarily require straightforward calculations or judgments involving a series of logical or computational steps to reach a specific conclusion. They are usually solvable through explicit logic and analysis without subjective judgments. It is expected that ChatGPT-4o will provide accurate computational formulas and resultant values when addressing such tasks. The aim is to evaluate ChatGPT’s ability to apply logical and analytical reasoning in finance and investment, focusing on precision and objectivity in computations and assessments.
3.3.2. Complex Reasoning Tasks
Complex reasoning tasks require advanced calculations, extensive analysis, and creative thought processes, demanding a higher level of critical thinking compared to multi-step reasoning tasks.
To assess these analyses, we have developed six primary tasks. The first task evaluates ChatGPT’s ability to perform technical analysis of randomly selected stocks and provide stock recommendations based on each technical indicator used. The second task aims to determine if ChatGPT can act as a portfolio manager by constructing an investment portfolio that meets the client’s needs, with a focus on the application of Modern Portfolio Theory. The third task centers on corporate finance, emphasizing cash flow analysis and capital budgeting analysis. The fourth and fifth tasks are about financial statement analysis. The sixth task involves a binomial tree analysis. A detailed description of these tasks is available in Appendix B.
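The sixth task’s binomial tree analysis can be sketched as follows. This is a minimal illustration, assuming the Cox–Ross–Rubinstein parameterization (u = e^{σ√Δt}, d = 1/u), a non-dividend-paying underlying, and continuous compounding; the task in Appendix B may specify a different tree, and the function name `binomial_option` and its inputs are hypothetical.

```python
import math


def binomial_option(S0: float, K: float, r: float, sigma: float, T: float,
                    steps: int, kind: str = "call", american: bool = False) -> float:
    """Price an option on a recombining Cox-Ross-Rubinstein binomial tree."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))  # up factor
    d = 1.0 / u                          # down factor
    growth = math.exp(r * dt)
    p = (growth - d) / (u - d)           # risk-neutral probability of an up move
    disc = 1.0 / growth                  # one-step discount factor

    def payoff(s: float) -> float:
        return max(s - K, 0.0) if kind == "call" else max(K - s, 0.0)

    # Terminal nodes: node j has j up-moves and (steps - j) down-moves.
    values = [payoff(S0 * u**j * d**(steps - j)) for j in range(steps + 1)]
    # Step backwards through the tree, discounting risk-neutral expected values.
    for i in range(steps - 1, -1, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j]) for j in range(i + 1)]
        if american:
            # Early exercise: keep the larger of continuation and immediate exercise.
            values = [max(v, payoff(S0 * u**j * d**(i - j))) for j, v in enumerate(values)]
    return values[0]


euro_call = binomial_option(100.0, 100.0, 0.05, 0.2, 1.0, steps=500)
amer_put = binomial_option(100.0, 100.0, 0.05, 0.2, 1.0, steps=500, kind="put", american=True)
print(round(euro_call, 3), round(amer_put, 3))
```

With many steps the European call price approaches the Black–Scholes value (about 10.45 for these inputs). The `american` flag adds an early-exercise check at each node, which matters for puts but not for calls on non-dividend-paying stocks.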
We conduct our evaluations using ChatGPT-4o, the latest and most advanced model available at the time of the study, equipped with a code interpreter and data analysis capabilities. ChatGPT-4o is designed to perform complex analyses and computations and to interact with various platforms and applications, which enables exploration and execution of tasks in finance and data analysis.
3.3.3. Evaluation Metrics
When financial analysts tackle a financial task, their approach typically begins with reasoning based on previously acquired specialized financial knowledge. They identify relevant concepts, formulas, and solutions applicable to the task at hand. Subsequently, they employ various professional tools to code and execute the task, culminating in the output of results. To scientifically compare the capabilities of large models like ChatGPT with traditional financial professionals, it is essential that these models also adopt a similar workflow. This workflow consists of logical reasoning followed by coding and modeling.
In evaluating the financial mathematics and decision-making performance of ChatGPT-4o, we will assess several metrics that encompass both quantitative and qualitative dimensions. These metrics are derived from generalized university rubrics, specifically tailored for elements of financial mathematics, designed to assess students’ proficiency in comprehension, reasoning, modeling, data analysis, and critical thinking capabilities (Selke 2013). Consequently, we divide the key steps of task processing into two primary modules: reasoning and modeling. The reasoning module includes evaluative dimensions such as task understanding and task deconstruction, while the modeling module encompasses calculation ideas and formulas as well as accuracy. Additionally, we have incorporated an extra metric for critical thinking to assess ChatGPT-4o’s ability in the application of knowledge and the level of critical thinking.
Task Understanding: This dimension gauges the ability to assimilate the prerequisites and objectives of a designated task or problem, evaluating the comprehension of the foundational concepts and principles inherent to the task.
Task Deconstruction: This dimension assesses the capability to fragment a task or problem into manageable and resolvable components or steps, focusing on the identification and isolation of pertinent variables and elements within a task.
Calculation Ideas and Formulas: This dimension scrutinizes the aptness and pertinence of the mathematical concepts, calculations, and formulas employed to decipher tasks, assessing the comprehension and application of mathematical models in problem resolution.
Accuracy: This metric quantifies the correctness and precision of the provided solutions relative to those of human analysts.
Critical Thinking: This dimension evaluates the capacity to objectively dissect information and formulate reasoned judgments, applying logical and reflective thinking to draw coherent conclusions and make informed decisions. The depth, quality, and efficacy of critical thinking can be assessed using diverse terminology that delineates the level of critical thinking applied (Stevens and Levi 2023).
For the criteria of task understanding, task deconstruction, and calculation ideas and formulas, we use a qualitative scale with three levels: basic, intermediate, and advanced. The basic level identifies some components or steps of the task but lacks clarity and coherence in breaking it down and struggles to isolate pertinent variables and elements. The intermediate level represents an effective breakdown of the task into clear, manageable components, with pertinent variables and elements accurately identified and isolated. The advanced level presents a skillful and coherent deconstruction of the task into detailed, manageable components, demonstrating precise identification and isolation of all pertinent variables and elements.
For the assessment of critical thinking/application of knowledge in questions 31 and 32 of the multi-step reasoning tasks, we employ the descriptors practical, applicable, functional, operational, and useful. The practical descriptor evaluates whether the knowledge applied is realistic and can be implemented in real-world scenarios. The applicable descriptor assesses whether the knowledge is relevant and suitable for the given task. The functional descriptor evaluates whether the applied knowledge effectively performs its intended purpose within the task. The operational descriptor checks whether the knowledge can be actively used in real-world operations, considering all practical constraints and requirements. The useful descriptor measures the overall utility of the knowledge in achieving the task’s objectives.
Conversely, for complex reasoning tasks such as investment suggestions and corporate strategy, the evaluative process is anchored in varying levels of critical thinking to appraise performance with terms including advanced, moderate, basic, superficial, and naive. The advanced level signifies a deep and thorough understanding, with the ability to analyze, synthesize, and evaluate information critically. It involves strategic thinking and insightful judgment. The moderate level indicates a reasonable level of critical thinking, where the individual can interpret and analyze information adequately but may not demonstrate the same depth of insight as at the advanced level. The basic level shows a fundamental understanding and ability to apply critical thinking but with limited depth and complexity in reasoning. The superficial level suggests a shallow approach to critical thinking, where the individual’s analysis and evaluation lack depth and are primarily surface-level. The naive level indicates a very simplistic and undeveloped approach to critical thinking, often characterized by a lack of understanding and basic reasoning skills.
This comprehensive evaluative framework ensures a nuanced and multifaceted assessment of both human analysts and ChatGPT in the domains of financial mathematics and decision-making. It allows for a robust comparison and analysis of competencies and proficiencies across diverse tasks and scenarios.
4. Empirical Results and Findings
4.1. Data Collection/Retrieval
First, we found that contemporary AI models, including ChatGPT and similar models, lacked the functionality for real-time data retrieval. Consequently, they cannot directly generate the datasets required for specific financial analyses and are primarily limited to guiding users to sources from which pertinent data can be acquired, as illustrated in Appendix C.
For academic purposes, users ranging from students to seasoned professionals such as analysts, traders, and investors might consider platforms like Yahoo Finance, which offers free access to a vast array of financial data. For more comprehensive datasets, however, one may turn to institutional databases: organizations often provide access to premium platforms such as S&P Capital IQ, Bloomberg, and LSEG Refinitiv Workspace, among other specialized software, to facilitate in-depth financial analysis.
Accordingly, the data used in our complex reasoning tasks were sourced from S&P Capital IQ and LSEG Workspace for the following ASX-listed stocks: Chalice Mining (CHN), Vulcan Energy Resources (VUL), Fineos Corporation (FCL), Southern Cross Gold Ltd. (SXG), Liontown Resources (LTR), Neuren Pharmaceuticals (NEU), WiseTech Global Ltd. (WTC), Aristocrat Leisure Limited (ALL), NextDC Ltd. (NXT), and Pro Medicus Limited (PME). Additionally, we retrieved Australian 10-year bond yields from Bloomberg on 15 May and divided them by 252 trading days to obtain daily yields.
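The annual-to-daily conversion described above is a simple division. A minimal sketch is shown below using a hypothetical 4% annual yield rather than the yield actually retrieved; the compounding-consistent daily rate is included for comparison only, since the study itself used the simple division. Function names are illustrative.

```python
TRADING_DAYS_PER_YEAR = 252


def daily_yield_simple(annual_yield: float) -> float:
    """Daily yield as annual yield / 252, the conversion described in the text."""
    return annual_yield / TRADING_DAYS_PER_YEAR


def daily_yield_compounded(annual_yield: float) -> float:
    """Daily rate that compounds to the annual yield over 252 trading days (comparison only)."""
    return (1.0 + annual_yield) ** (1.0 / TRADING_DAYS_PER_YEAR) - 1.0


annual = 0.04  # hypothetical 10-year yield, not the value retrieved from Bloomberg
print(daily_yield_simple(annual), daily_yield_compounded(annual))
```

For yields of a few percent the two daily rates differ only slightly, with the simple division giving the marginally larger figure.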
4.2. Multi-Step Reasoning Tasks Results and Findings
Based on the multi-step reasoning assessment presented in Table 1, ChatGPT-4o demonstrates proficiency in basic or standard financial analysis reasoning. It follows a step-by-step procedure, working through sequential processes to find solutions, much as highly capable human analysts do. In most cases (27 out of 30 calculation-focused tasks), ChatGPT-4o reaches accurate conclusions and exhibits a strong understanding of the task at hand. Due to space constraints, Table 1 displays only the task results that differ from those of human analysts; results for the other tasks are available upon request.
Table 1.
Multi-step reasoning task evaluation results for ChatGPT-4o. Task understanding, task deconstruction, and calculation ideas and formulas are evaluated with basic, intermediate, and advanced descriptors. Critical thinking/application of knowledge is evaluated with practical, applicable, functional, operational, and useful descriptors.
Several noteworthy insights emerge from the observations. Firstly, the importance of prompts cannot be overstated. Prompts are instructions or queries entered into the AI’s interface to elicit responses, and they require careful wording and specific instruction. Inadequate instructions or poorly aligned Excel files often result in error messages and failure to achieve meaningful results. During our experiment, we observed that unclear instructions led to such issues.
Secondly, ChatGPT-4o demonstrates the ability to learn from instructions, consistent with the study of Son et al. (2023), which shows that instruction-tuning plays a significant role in enhancing model performance. Of the 30 calculation-focused multi-step reasoning tasks, ChatGPT-4o’s answers diverged from those of human analysts in only three instances: Tasks 9, 12, and 19. However, with appropriate instructions or hints, ChatGPT-4o eventually arrived at the correct solutions, matching those produced by skilled human analysts. For instance, in Task 9, ChatGPT-4o initially struggled with the exponential calculation, repeatedly arriving at an incorrect answer of 22.73%. After we asked a follow-up question, it corrected its answer to 19%, aligning with the human analysts’ solution. However, when the same question was asked again the following day, ChatGPT-4o produced another incorrect answer using a different approach. Detailed information can be found in Appendix D.
Task 12 involved calculating the internal rate of return (IRR). In the initial attempt, ChatGPT-4o used a trial-and-error method but kept trying ever larger rates even as the net present value continued to diminish. A human analyst had to intervene and provide guidance, after which ChatGPT-4o completed the task. When the same task was subsequently entered again, ChatGPT-4o immediately produced the correct answer. However, in a fresh trial the next day, ChatGPT-4o generated an incorrect result using Python. More detailed information is provided in Appendix E.
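The search logic that went wrong in the first attempt can be made explicit. For a conventional project (an initial outflow followed by inflows), NPV falls as the discount rate rises, so a trial rate that yields a negative NPV means the IRR lies below that rate, not above it. The sketch below uses bisection, one of several standard root-finding methods and not necessarily the approach used by ChatGPT-4o or the human analysts; the cash flows and the helpers `npv` and `irr_bisection` are hypothetical.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """NPV with cash_flows[0] at time 0 (typically negative) and later flows at t = 1, 2, ..."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows: list[float], low: float = 0.0, high: float = 1.0,
                  tol: float = 1e-10, max_iter: int = 200) -> float:
    """Find a rate in [low, high] where NPV = 0, assuming NPV changes sign on the interval."""
    npv_low = npv(low, cash_flows)
    if npv_low == 0.0:
        return low
    if npv_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV has the same sign at both ends; widen the search interval")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        npv_mid = npv(mid, cash_flows)
        if abs(npv_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        if npv_low * npv_mid < 0:
            # Sign change lies in the lower half: for a conventional project NPV is
            # already negative at mid, so the IRR is below mid and we search lower.
            high = mid
        else:
            low, npv_low = mid, npv_mid
    return (low + high) / 2.0


project = [-1000.0, 400.0, 400.0, 400.0]  # hypothetical conventional project
rate = irr_bisection(project)
print(round(rate, 4))
```

Bisection only needs an interval over which NPV changes sign, so it cannot drift in the wrong direction the way an unguided trial-and-error search can.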
The issue with Task 19 concerned the choice of weighted average cost of capital (WACC) for a merger and acquisition (M&A). ChatGPT-4o initially applied the acquired firm’s WACC, producing results that differed from those of a human financial analyst. After receiving prompts about selecting the appropriate WACC for M&A, ChatGPT-4o correctly identified that the acquiring firm’s WACC should be used. Thus, with proper instructions, it reached the correct conclusion.
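Because the choice of WACC in Task 19 drives the result, it may help to recall the standard formula, WACC = (E/V)r_E + (D/V)r_D(1 − t), and to see how strongly a valuation responds to the rate applied. The sketch below uses purely hypothetical firm figures and cash flows; it illustrates sensitivity only and does not reproduce Task 19.

```python
def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital with an after-tax cost of debt."""
    total = equity_value + debt_value
    return (equity_value / total) * cost_of_equity + (debt_value / total) * cost_of_debt * (1 - tax_rate)


def present_value(rate, cash_flows):
    """PV of cash flows received at the end of years 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


# Hypothetical inputs for two firms (market values in $m, rates as decimals).
rate_a = wacc(600, 400, 0.12, 0.06, 0.30)   # 0.6*12% + 0.4*6%*(1 - 0.3) = 8.88%
rate_b = wacc(800, 200, 0.14, 0.07, 0.30)
synergy_flows = [50, 55, 60, 65, 70]        # hypothetical incremental cash flows
for name, rate in [("firm A", rate_a), ("firm B", rate_b)]:
    print(f"{name}: WACC = {rate:.2%}, PV of flows = {present_value(rate, synergy_flows):.1f}")
```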
Additional observations include instances where ChatGPT-4o does not directly provide final answers. In such cases, it recommends using tools like a financial calculator, Excel, or Python to complete the task.
For conceptual or qualitative tasks, such as Tasks 31 and 32, ChatGPT-4o produces responses that are logical and consistent with recognized standards. However, these answers tend to be brief and may require further investigation. For example, in Task 31, which asks for an understanding of, and insights into, the dividend growth rate, ChatGPT-4o simply applied the average growth rate, overlooking other factors that may affect it.
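Task 31 illustrates why a single average can mislead. Even a historical “average” growth rate can be computed arithmetically or geometrically, and the constant-growth dividend discount model also ties growth to fundamentals through the sustainable growth rate g = ROE × b, where b is the plowback (retention) ratio (Bodie et al. 2022). The sketch below compares the three using hypothetical dividends and ratios; it does not reproduce Task 31.

```python
from statistics import mean


def arithmetic_avg_growth(dividends):
    """Simple mean of year-on-year dividend growth rates."""
    growth = [curr / prev - 1 for prev, curr in zip(dividends, dividends[1:])]
    return mean(growth)


def geometric_avg_growth(dividends):
    """Compound annual growth rate from the first to the last dividend."""
    periods = len(dividends) - 1
    return (dividends[-1] / dividends[0]) ** (1 / periods) - 1


def sustainable_growth(roe, payout_ratio):
    """g = ROE x b, where the plowback ratio b = 1 - payout ratio."""
    return roe * (1 - payout_ratio)


divs = [1.00, 1.10, 1.05, 1.20]  # hypothetical dividends per share
print(f"arithmetic mean growth: {arithmetic_avg_growth(divs):.2%}")
print(f"geometric mean growth:  {geometric_avg_growth(divs):.2%}")
print(f"sustainable growth:     {sustainable_growth(roe=0.12, payout_ratio=0.40):.2%}")
```

With volatile dividends the arithmetic mean exceeds the geometric mean, and neither need match the growth implied by the firm’s profitability and payout policy.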
Moreover, responses vary each time a task is given, even though the main theme is maintained. This variability is characteristic of language models: chatbots built on them operate as probabilistic rather than deterministic systems, generating each response by sampling, so posing the same question can lead to different responses. For these tasks, the wording and structure of the prompt significantly affect the response the model generates.
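A toy illustration of why the same prompt can yield different answers: a language model scores candidate next tokens (logits), converts the scores into probabilities, and samples one. The Python sketch below uses a softmax with a temperature parameter; the tokens, logits, and temperatures are invented for illustration, and we make no claim about the decoding settings ChatGPT actually uses.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the softmax probabilities."""
    return rng.choices(tokens, weights=softmax(logits, temperature), k=1)[0]


tokens = ["hold", "sell", "buy"]   # hypothetical candidate next tokens
logits = [2.0, 1.5, 0.5]           # hypothetical model scores
rng = random.Random()              # unseeded: results change from run to run
for temperature in (1.0, 0.05):
    draws = [sample_next_token(tokens, logits, temperature, rng) for _ in range(10)]
    print(temperature, draws)
```

At temperature 1.0 the draws typically mix several tokens; at a very low temperature nearly every draw is the highest-scoring token, so the output approaches deterministic behaviour.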
Conversely, for computational or quantitative tasks, the responses, including any incorrect outputs, tend to be consistent across multiple repetitions until intervention occurs. This consistency in computational tasks contrasts with the variability seen in responses to qualitative or conceptual tasks, underscoring the different response mechanisms inherent to artificial intelligence models in different task environments.
Overall, financial analysis is a critical task in which even a small error can result in significant financial losses. Ongoing refinement of LLMs and close collaboration between LLMs and human experts are crucial to combining analytical precision with human intuition. We therefore recommend using ChatGPT for financial analysis with caution and always double-checking its results for accuracy.
4.3. Complex Reasoning Tasks Results and Findings
In this section, we compare the results of human analysts with those produced by ChatGPT-4o for six complex reasoning tasks covering three broad areas: technical analysis and portfolio construction, capital budgeting and financial statement analysis, and derivatives (option pricing). Table 2 summarizes ChatGPT-4o’s performance on these tasks. For technical analysis and portfolio construction, we asked ChatGPT-4o to select the 10 best ASX-listed stocks based on their performance between January 2024 and 14 May 2024. The prompts and ChatGPT-4o’s results are presented in Appendix F. We then extracted the daily stock prices from LSEG Workspace.
Table 2.
Complex reasoning task evaluation results for ChatGPT-4o. Task understanding, task deconstruction, and calculation ideas and formulas are evaluated with basic, intermediate, and advanced descriptors. Critical thinking is evaluated with advanced, moderate, basic, superficial, and naïve descriptors.
For the technical analysis (Tasks 1.1 to 1.3), ChatGPT-4o showed proficiency in computing Bollinger Bands, the Moving Average Convergence/Divergence (MACD), and the Relative Strength Index (RSI). It then offered individual stock recommendations (buy, sell, or hold) based on the latest technical indicators available in our data sample. To validate ChatGPT-4o’s outputs, we used LSEG/Refinitiv Workspace to produce comparable results, as human analysts typically would. We use Chalice Mining Limited (CHN) as an example. As shown in Table 3 and Appendix G, the Bollinger Bands and MACD generated by ChatGPT-4o aligned with those from Workspace, whereas the RSI charts from ChatGPT-4o and Workspace differed.
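The three indicators can each be computed in a few lines. The pure-Python sketch below uses conventional default parameters (a 20-day window with ±2 standard deviations for Bollinger Bands, 12/26/9 periods for MACD, and a 14-period RSI); these are assumptions, since we do not know the exact settings used by ChatGPT-4o or Workspace. It also implements two common RSI smoothing conventions, Wilder’s recursive smoothing and a simple moving average of gains and losses, because implementation choices of this kind are one plausible (unverified) source of the RSI discrepancy.

```python
def bollinger_bands(prices, window=20, k=2.0):
    """(lower, middle, upper) bands: SMA +/- k population standard deviations.

    Some tools use the sample standard deviation instead, which widens the bands slightly.
    """
    bands = []
    for i in range(len(prices)):
        if i + 1 < window:
            bands.append((None, None, None))  # not enough history yet
            continue
        win = prices[i + 1 - window:i + 1]
        mid = sum(win) / window
        sd = (sum((p - mid) ** 2 for p in win) / window) ** 0.5
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


def ema(values, span):
    """Exponential moving average seeded with the first value (seeding conventions vary)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """MACD line (fast EMA - slow EMA) and its signal line (EMA of the MACD line)."""
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    return macd_line, ema(macd_line, signal)


def rsi(prices, period=14, wilder=True):
    """RSI = 100 - 100 / (1 + average gain / average loss).

    wilder=True uses Wilder's recursive smoothing; wilder=False uses a simple
    moving average of the last `period` gains and losses.
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    result = [None] * len(prices)
    if len(changes) < period:
        return result
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for j in range(period - 1, len(changes)):
        if j >= period:
            if wilder:
                avg_gain = (avg_gain * (period - 1) + gains[j]) / period
                avg_loss = (avg_loss * (period - 1) + losses[j]) / period
            else:
                avg_gain = sum(gains[j + 1 - period:j + 1]) / period
                avg_loss = sum(losses[j + 1 - period:j + 1]) / period
        # changes[j] is the move into price j + 1; RSI is 100 when there are no losses
        result[j + 1] = 100.0 if avg_loss == 0 else 100 - 100 / (1 + avg_gain / avg_loss)
    return result


prices = [10, 11, 10.5, 11.5, 12, 11.8, 12.5, 13, 12.7, 13.4, 13.1, 13.9,
          14.2, 13.8, 14.5, 14.1, 14.8]  # hypothetical closing prices
print(rsi(prices)[-1], rsi(prices, wilder=False)[-1])
```

The two RSI variants agree on their first value and then drift apart, which is enough to move a chart line without necessarily changing a threshold-based recommendation.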
Table 3.
Complex Reasoning Task 1 demonstration: technical analyses and indicators for Chalice Mining Limited (CHN).
Further, ChatGPT-4o was able to offer investment recommendations, providing rational justifications for the recommendation derived from each technical indicator. For instance, for CHN it proposed a “hold/sell” recommendation: although the MACD showed a potential bullish crossover, the RSI was approaching its upper limit and the price had moved above the upper Bollinger Band, indicating that the stock was overbought. Results for other stocks are available upon request.
For Complex Reasoning Task 2 (Tasks 2.1 to 2.5), ChatGPT-4o mirrored the responses of human analysts in constructing a global minimum variance portfolio and an optimal risky portfolio, determining the weight of each stock in these portfolios, and combining the portfolios. However, as shown in Table 4, the stock weights of the global minimum variance portfolio computed by ChatGPT-4o differ from those computed in Excel/Stata. For the optimal risky portfolio, the stock weights provided by ChatGPT-4o are almost identical to those computed in Excel and Stata. ChatGPT-4o also quickly constructed an efficient frontier. Both ChatGPT-4o’s calculations and our analyses are presented in Appendix H.
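Without short-sale constraints, both portfolios have closed-form solutions: the global minimum variance (GMV) weights are proportional to Σ⁻¹1 and the optimal risky (tangency) weights to Σ⁻¹(μ − r_f), each rescaled to sum to one, and every portfolio on the unconstrained minimum-variance frontier is a combination of the two. The NumPy sketch below is a textbook illustration with hypothetical inputs, not the procedure used by ChatGPT-4o, Excel, or Stata. Choices such as imposing a no-short-sale constraint or using a different estimation window change the weights and are one plausible reason tools disagree.

```python
import numpy as np


def gmv_weights(cov):
    """Global minimum variance weights: Sigma^-1 1, normalised to sum to one (shorting allowed)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()


def tangency_weights(mean_returns, cov, risk_free):
    """Optimal risky (maximum Sharpe ratio) weights: Sigma^-1 (mu - rf), normalised (shorting allowed)."""
    x = np.linalg.solve(cov, mean_returns - risk_free)
    return x / x.sum()


def portfolio_stats(weights, mean_returns, cov):
    """Expected return and standard deviation of a portfolio."""
    return float(weights @ mean_returns), float(np.sqrt(weights @ cov @ weights))


def frontier(mean_returns, cov, risk_free, alphas):
    """Points on the minimum-variance frontier as mixes of the GMV and tangency portfolios."""
    w_gmv = gmv_weights(cov)
    w_tan = tangency_weights(mean_returns, cov, risk_free)
    return [portfolio_stats(a * w_tan + (1 - a) * w_gmv, mean_returns, cov) for a in alphas]


# Hypothetical annualised inputs for three stocks.
mu = np.array([0.10, 0.14, 0.08])
cov = np.array([[0.040, 0.012, 0.010],
                [0.012, 0.090, 0.015],
                [0.010, 0.015, 0.030]])
print("GMV weights:", gmv_weights(cov).round(4))
print("Tangency weights:", tangency_weights(mu, cov, risk_free=0.03).round(4))
for ret, sd in frontier(mu, cov, 0.03, alphas=np.linspace(0, 1.5, 4)):
    print(f"return {ret:.4f}, std dev {sd:.4f}")
```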
Table 4.
Complex Reasoning Task 2 demonstration: portfolio construction.
However, ChatGPT-4o faced challenges in completing Complex Reasoning Tasks 3, 4, and 5. Task 3 assessed ChatGPT-4o’s ability in capital budgeting analysis, evaluating its proficiency in interpreting extensive information, distinguishing relevant from irrelevant information, and thinking critically. As Table 5 shows, ChatGPT-4o’s final NPV answers were inconsistent with the human analysts’ calculations. Appendix I further shows errors in analyzing the information, recognizing irrelevant costs, and computing depreciation expenses. Moreover, ChatGPT-4o failed to create a detailed capital budgeting template setting out each cash inflow and outflow item by year.
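A year-by-year template of the kind ChatGPT-4o did not produce can be sketched in a few lines. The Python example below is a minimal illustration under simplifying assumptions (straight-line depreciation to zero salvage, no working capital, constant annual figures); all numbers, including the sunk research cost that is deliberately excluded, are hypothetical rather than taken from Task 3.

```python
def straight_line_depreciation(cost, salvage, life):
    """Annual depreciation expense under the straight-line method."""
    return (cost - salvage) / life


def operating_cash_flow(revenue, cash_costs, depreciation, tax_rate):
    """After-tax operating cash flow: (revenue - cash costs - depreciation)(1 - t) + depreciation."""
    ebit = revenue - cash_costs - depreciation
    return ebit * (1 - tax_rate) + depreciation


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project inputs.
equipment_cost, life, salvage = 1000.0, 4, 0.0
revenue, cash_costs, tax_rate, discount_rate = 600.0, 250.0, 0.30, 0.10
sunk_research_cost = 50.0  # already spent: irrelevant to the decision, so excluded below

dep = straight_line_depreciation(equipment_cost, salvage, life)
flows = [-equipment_cost]
print(f"{'Year':>4} {'Revenue':>8} {'Costs':>7} {'Deprec.':>8} {'OCF':>8}")
for year in range(1, life + 1):
    ocf = operating_cash_flow(revenue, cash_costs, dep, tax_rate)
    flows.append(ocf)
    print(f"{year:>4} {revenue:>8.1f} {cash_costs:>7.1f} {dep:>8.1f} {ocf:>8.1f}")
print(f"NPV at {discount_rate:.0%}: {npv(discount_rate, flows):.2f}")
```

Depreciation enters only through its tax shield: it is subtracted to compute taxes and then added back, because it is not a cash outflow.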
Table 5.
Complex Reasoning Task 3 demonstration: capital budgeting.
In Tasks 4 and 5, we asked ChatGPT-4o to conduct financial statement analyses (Task 4 is a basic and Task 5 a complex financial statement analysis). These tasks were drawn from the CFA problems test bank in Essentials of Investments (Bodie et al. 2022). ChatGPT-4o’s results differed significantly from those of a human analyst. For example, without explicit instruction, ChatGPT-4o applied the three-component DuPont analysis in Task 4 instead of the commonly used five-component method (Table 6). In Task 5, ChatGPT-4o computed the DuPont components incorrectly (Table 7). ChatGPT-4o’s step-by-step calculations are presented in Appendices J and K. Additionally, ChatGPT-4o appears to struggle to accurately retrieve data from tables formatted as images: the Excel template it created displays values that differ from the source tables. Any error in an initial step led to markedly different results or interpretations in subsequent steps.
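The two DuPont variants decompose the same ROE differently. The three-component version is ROE = profit margin × asset turnover × equity multiplier; the five-component version in Bodie et al. (2022) further splits the profit margin into tax burden (net income/pretax income), interest burden (pretax income/EBIT), and operating margin (EBIT/sales). The sketch below, using hypothetical figures, checks that both products equal net income/equity. Whether year-end or average balance-sheet values are used is a further choice that changes the individual ratios.

```python
from math import isclose, prod


def dupont_three(net_income, sales, total_assets, equity):
    """Profit margin, asset turnover, equity multiplier."""
    return {
        "profit_margin": net_income / sales,
        "asset_turnover": sales / total_assets,
        "equity_multiplier": total_assets / equity,
    }


def dupont_five(net_income, pretax_income, ebit, sales, total_assets, equity):
    """Tax burden, interest burden, operating margin, asset turnover, leverage."""
    return {
        "tax_burden": net_income / pretax_income,
        "interest_burden": pretax_income / ebit,
        "operating_margin": ebit / sales,
        "asset_turnover": sales / total_assets,
        "leverage": total_assets / equity,
    }


# Hypothetical income statement and balance sheet figures ($m).
ni, pretax, ebit, sales, assets, equity = 60.0, 80.0, 100.0, 1000.0, 800.0, 400.0
three = dupont_three(ni, sales, assets, equity)
five = dupont_five(ni, pretax, ebit, sales, assets, equity)
roe = ni / equity
print("3-factor:", three, "product:", round(prod(three.values()), 4))
print("5-factor:", five, "product:", round(prod(five.values()), 4))
assert isclose(prod(three.values()), roe) and isclose(prod(five.values()), roe)
```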
Table 6.
Complex Reasoning Task 4 demonstration: financial statement analysis.
Table 7.
Complex Reasoning Task 5 demonstration: financial statement analysis.
Task 6 assessed whether ChatGPT-4o could calculate the price of an American call option using the binomial tree approach. As shown in Table 8, ChatGPT-4o concluded that early exercise is not optimal at any step, which contradicts the output from DerivaGem. Appendix L further shows that ChatGPT-4o could not display the binomial tree diagram, even after we instructed it to follow the DerivaGem diagram. Lastly, we tried the new voice interaction feature in ChatGPT-4o. It produced a better tree diagram, but the option prices and early exercise decisions remained incorrect.
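For reference, a Cox-Ross-Rubinstein binomial tree for an American call can be written compactly. Two points are worth noting. First, an American call on a non-dividend-paying stock is never optimally exercised early, so early-exercise nodes appear only when the underlying pays an income; the sketch therefore takes a continuous yield q, with up probability p = (e^{(r − q)Δt} − d)/(u − d) as in Hull (2015). Second, the inputs in the usage lines are hypothetical; Task 6’s parameters and DerivaGem’s settings are not reproduced here.

```python
import math


def american_call_binomial(s0, strike, r, sigma, t, steps, q=0.0):
    """Price an American call on a CRR binomial tree with continuous yield q.

    Returns (price, up_probability, early_exercise_nodes), where each node is
    (step, number_of_up_moves) at which immediate exercise beats continuation.
    """
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1 / u
    p = (math.exp((r - q) * dt) - d) / (u - d)  # risk-neutral probability of an up move
    if not 0 < p < 1:
        raise ValueError("time step too coarse: risk-neutral probability outside (0, 1)")
    disc = math.exp(-r * dt)
    # Payoffs at maturity; index j = number of up moves.
    values = [max(s0 * u**j * d**(steps - j) - strike, 0.0) for j in range(steps + 1)]
    early = []
    for i in range(steps - 1, -1, -1):  # roll back through the tree
        new_values = []
        for j in range(i + 1):
            continuation = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = s0 * u**j * d**(i - j) - strike
            if exercise > continuation:
                early.append((i, j))
            new_values.append(max(continuation, exercise))
        values = new_values
    return values[0], p, early


# Hypothetical inputs: S0 = 100, K = 95, r = 5%, sigma = 25%, T = 1 year, 4 steps.
for q in (0.0, 0.08):
    price, p_up, nodes = american_call_binomial(100, 95, 0.05, 0.25, 1.0, 4, q=q)
    print(f"q = {q:.0%}: price = {price:.4f}, p = {p_up:.4f}, early exercise at {nodes}")
```

With q = 0 the list of early-exercise nodes is always empty; with a positive yield, deep in-the-money nodes may be exercised early.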
Table 8.
Complex Reasoning Task 6 demonstration: Option Pricing–Binomial Tree.
During our complex reasoning evaluation, we identified several issues with ChatGPT-4o. First, even when the same methods were applied, the charts produced by ChatGPT-4o and Workspace differed. Because both are tools that human analysts use to draw conclusions, and their implementations may differ in details such as smoothing or initialization conventions, slightly different charts are plausible. Despite these discrepancies, the stock recommendations based on the RSI from ChatGPT-4o and Workspace are consistent (the RSI value lies between the lower and upper thresholds).
Second, ChatGPT-4o relies mainly on Python programming. According to Dilmegani (2024), “the code interpreter only supports Python as a language”. Differences in programming methods may lead to different results, such as the stock weights of the global minimum variance portfolio. Moreover, detecting such discrepancies in the calculation method requires Python experts or analysts with strong Python skills.
Third, the capital budgeting and financial statement analyses exposed a shortfall in ChatGPT-4o’s capability to replicate human analytical processes, particularly in presenting sequential calculations and in creating Excel-like templates that set out each cash flow item. This suggests that ChatGPT-4o generates responses based on patterns learned during training and does not understand context or infer meaning in the way humans do. It also highlights a limitation in ChatGPT-4o’s ability to process comprehensive information, which may hinder its capacity to assimilate and analyze complex data sets accurately.
Lastly, ChatGPT-4o may not provide accurate results in specialized finance areas such as derivative securities. Although ChatGPT-4o was able to perform step-by-step calculations like a human analyst, some results, such as the probability of an up move, were incorrect. Furthermore, it had to rely on Python programming to display the tree diagram, and the resulting structure differed somewhat from a standard binomial tree diagram. The new voice and video model introduced by OpenAI on 14 May 2024 generated a better tree diagram; however, the option value it computed was also incorrect.
4.4. Discussion and Practical Application and Implementation
Consistent with Cheng et al. (2023), ChatGPT-4o can achieve performance comparable to that of human analysts, at least entry-level analysts. Our findings also align with Kocoń et al. (2023), who showed that the more difficult the task, the greater ChatGPT’s performance loss; we observe the same pattern for ChatGPT-4o. The qualitative analysis revealed ChatGPT-4o’s lack of deep thinking and comprehensive analysis. Our results provide a basis for a fundamental discussion of whether high-quality financial analysis and reasoning by ChatGPT can be effectively applied in real-life scenarios.
First, to judge whether ChatGPT has provided correct answers, users need sufficient prerequisite knowledge. Second, ChatGPT-4o enhances financial analysis efficiency by performing both basic and complex tasks, thereby automating repetitive calculations and allowing financial analysts to concentrate on more strategic decision-making. Financial institutions can deploy ChatGPT-4o to handle routine tasks such as present value calculations, ratio analysis, and basic forecasting, thus streamlining operations and optimizing human resource utilization. Additionally, ChatGPT-4o contributes to cost reduction by automating numerous financial analysis processes, which is particularly advantageous for small and medium-sized enterprises that may lack extensive financial analysis teams. In the realm of investment strategies, ChatGPT-4o’s ability to conduct technical analysis and portfolio construction enables it to assist in developing and optimizing investment strategies, including analyzing stock performance, recommending buy/sell/hold actions, and constructing diversified portfolios based on Modern Portfolio Theory. Furthermore, ChatGPT-4o serves as a valuable educational tool, offering finance students and professionals step-by-step explanations and analyses of various financial concepts and tasks, thus aiding in the comprehension of complex financial models and theories.
However, given ChatGPT’s current limitations, several issues should be kept in mind. Firstly, data accuracy and reliability can be a concern, as ChatGPT might sometimes provide incorrect or outdated information based on its training data. Secondly, contextual understanding can be limited, with the AI potentially misinterpreting complex financial scenarios or nuances that a human analyst would catch. Thirdly, input quality is crucial; the outputs generated by ChatGPT are only as good as the data and queries it receives, necessitating careful and precise input from users. Fourthly, a lack of real-time updates means that ChatGPT cannot access the latest data or trends beyond its training cutoff, limiting its usefulness for dynamic, real-time financial analysis. Fifthly, security and privacy are important considerations, as using AI for financial analysis involves handling sensitive financial data, requiring robust measures to protect against data breaches. Lastly, ethical considerations arise from the potential biases inherent in AI models, which can impact the fairness and objectivity of the analysis. Addressing these issues is essential for effectively leveraging ChatGPT in financial analysis while mitigating potential risks.
5. Conclusions
This study has examined the analytical and reasoning capabilities of ChatGPT-4o through various financial tasks, providing significant insights into the strengths and limitations of LLMs in financial analysis. ChatGPT-4o has demonstrated considerable skill in performing standard financial reasoning tasks, closely aligning its analytical approach with that of human analysts. It excels in logical reasoning, task decomposition, and generating solutions, which are essential for tasks like financial modeling and forecasting. However, the study also highlights several challenges and limitations.
The variability in ChatGPT-4o’s responses, especially for qualitative tasks, underscores the importance of explicit instructions and careful task formulation. The discrepancies observed in some tasks between ChatGPT-4o and human analysts emphasize the need for robust evaluation metrics to ensure consistent and reliable outputs. Additionally, ChatGPT-4o encountered difficulties with complex tasks requiring a higher level of analytical depth and comprehensive understanding, indicating its limitations in replicating intricate human analytical methods.
Despite these challenges, the prospective integration of ChatGPT-4o with specialized financial data providers and tools, such as Bloomberg, S&P Capital IQ, and statistical packages, represents a transformative shift in the financial sector. This integration is poised to significantly enhance human analytical processes, enabling financial professionals to concentrate more on critical decision-making elements.
This research contributes to the understanding of AI’s role in finance by providing detailed insights into the applications and limitations of ChatGPT-4o in financial analysis. It establishes that while ChatGPT-4o can effectively perform basic and some complex financial tasks, it struggles with tasks requiring analytical depth and critical thinking. The study’s findings enhance the knowledge base for academicians, developers, and stakeholders interested in integrating AI into financial practices, demonstrating the potential for AI to enhance efficiency and accuracy in financial analysis when combined with human expertise.
This study is not without its limitations. While the tasks tested are grounded in real-life scenarios, there is a need to incorporate more high-level, practical, and specific tasks to further evaluate the capabilities of AI models. Expanding the dataset to include these advanced tasks will provide a more rigorous assessment of the models’ performance in complex financial environments. Furthermore, the study utilized only one AI model, ChatGPT-4o. Future research should consider using a variety of AI models, such as LLaMA, Galactica, and Pythia, including those developed by specific financial firms, to enable comprehensive comparisons and determine which models produce the most accurate and reliable results.
Additionally, the study assumed a general classification of human analysts as senior and expert analysts. However, human analysts vary widely in expertise, including junior, mid-level, and high-level analysts. Identifying and incorporating specific levels of human analysts in future evaluations could provide deeper insights and more nuanced comparisons of AI model performance against varying levels of human expertise.
Future research should focus on enhancing the deep thinking and comprehensive analysis capabilities of AI models like ChatGPT-4o. This could involve the development of hybrid models that combine the strengths of AI and human intelligence, leveraging AI’s computational power and efficiency with human intuition and contextual understanding. Real-time data integration and continuous learning mechanisms could be explored to improve AI’s adaptability to dynamic financial environments. Additionally, ethical considerations and overcoming biases in AI models should be a priority, ensuring fair and objective financial analysis. Continued interdisciplinary research will be essential to fully realize the potential of AI in finance.
Author Contributions
Conceptualization, L.X.L.; methodology, L.X.L. and K.X.; validation, L.X.L. and C.C.; formal analysis, L.X.L. and Z.S.; data curation, L.X.L. and Z.S.; writing—original draft preparation, writing—review and editing, and supervision, L.X.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Informed Consent Statement
Not applicable.
Data Availability Statement
The raw data and the detailed empirical results are available from the corresponding author on request.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Multi-Step Reasoning Tasks/Prompt
Appendix B. Complex Reasoning Tasks/Prompt
Appendix C. Data Collection/Retrieval

Appendix D. Multi-Step Reasoning Task 9 Demonstration
Figures: The First ChatGPT-4o Trial; The Second Trial with Instruction; The Third Fresh Trial.
Appendix E. Multi-Step Reasoning Task 12 Demonstration
Figures: The First Trial; The Second Trial with Instruction (Correct Result); The Third Fresh Trial (Wrong Result).
Appendix F. 10-Stock Selection by ChatGPT-4o

Appendix G. Complex Reasoning Task 1 Demonstration: Technical Analyses and Indicators for Chalice Mining Limited (CHN)
Human analyst result (LSEG Workspace)
Figure: red is the MACD line and blue is the signal line. Source: LSEG Refinitiv Workspace.
Recommendation
- Bollinger Bands: the price is close to the upper band and a bullish reversal has recently occurred, suggesting a “hold” at this stage.
- RSI: the RSI is approaching the upper limit of 70, which may indicate an “overbought” or “sell” situation.
- MACD: the MACD line is above the signal line, indicating a “bullish” signal.
ChatGPT-4o result
Stock: CHN. Recommendation: Hold/Sell. Explanation: Overbought. MACD bullish. Price above upper Bollinger Band.
Appendix H. Complex Reasoning Task 2 Demonstration: Portfolio Construction
Global Minimum Variance Portfolio
Figures: human analyst result (Stata and Excel); ChatGPT-4o result.
Optimal Risky Portfolio
Figures: human analyst result (Stata and Excel); ChatGPT-4o result.
Efficient Frontier
Figures: human analyst result (Stata); ChatGPT-4o result.
Appendix I. Complex Reasoning Task 3 Demonstration: Capital Budgeting
Figures: human analyst result; ChatGPT-4o result.
Appendix J. Complex Reasoning Task 4 Demonstration: Financial Statement Analysis
Figures: human analyst result; ChatGPT-4o result.
Appendix K. Complex Reasoning Task 5 Demonstration: Financial Statement Analysis
Figures: human analyst result; ChatGPT-4o result, comprising ChatGPT-4o’s Part a(i) calculations, Part a(ii) calculations, Part b explanations, and the Excel template created by ChatGPT-4o.
Appendix L. Complex Reasoning Task 6 Demonstration: Option Pricing—Binomial Tree
Figures: human analyst result; ChatGPT-4o result (voice model).
In the ChatGPT-4o voice model diagram, the black boxes contain the asset prices at each node and the dashed boxes contain the option prices at each node. Blue dashed lines indicate the option prices when holding the option is optimal; red dashed lines indicate the option prices when early exercise is optimal.
References
- Ahmed, Shamima, Muneer M. Alshater, Anis El Ammari, and Helmi Hammami. 2022. Artificial Intelligence and Machine Learning in Finance: A Bibliometric Review. Research in International Business and Finance 61: 101646.
- Blesiada, Jamie. 2023. Expedia Group Gives Users the Opportunity to Test New Technology. Travel Weekly. Available online: https://www.travelweekly.com/Travel-News/Travel-Technology/Expedia-Group-gives-users-opportunity-test-new-technology (accessed on 16 May 2024).
- Bodie, Zvi, Alex Kane, and Alan J. Marcus. 2022. Essentials of Investments, 12th ed. New York: McGraw Hill LLC.
- Bollen, Johan, Huina Mao, and Xiaojun Zeng. 2011. Twitter Mood Predicts the Stock Market. Journal of Computational Science 2: 1–8.
- Boukes, Mark, Bob Van de Velde, Theo Araujo, and Rens Vliegenthart. 2020. What’s the Tone? Easy Doesn’t Do It: Analyzing Performance and Agreement between Off-the-shelf Sentiment Analysis Tools. Communication Methods and Measures 14: 83–104.
- Brealey, Richard, Stewart C. Myers, Alex Edmans, and Franklin Allen. 2022. Principles of Corporate Finance, 14th ed. New York: McGraw-Hill US.
- Burgess, Nicholas. 2021. Machine Earning–Algorithmic Trading Strategies for Superior Growth, Outperformance and Competitive Advantage. International Journal of Artificial Intelligence and Machine Learning 2: 38–60.
- Cao, Longbing. 2022. AI in Finance: Challenges, Techniques, and Opportunities. ACM Computing Surveys (CSUR) 55: 1–38.
- Chaboud, Alain P., Benjamin Chiquoine, Erik Hjalmarsson, and Clara Vega. 2014. Rise of the Machines: Algorithmic Trading in the Foreign Exchange Market. The Journal of Finance 69: 2045–84.
- Cheng, Liying, Xingxuan Li, and Lidong Bing. 2023. Is GPT-4 a Good Data Analyst? arXiv arXiv:2305.15038.
- de Lange, Petter Eilif, Borger Melsom, Christian Bakke Vennerød, and Sjur Westgaard. 2022. Explainable AI for Credit Assessment in Banks. Journal of Risk and Financial Management 15: 556.
- Demajo, Lara Marie, Vince Vella, and Alexiei Dingli. 2020. Explainable AI for Interpretable Credit Scoring. arXiv arXiv:2012.03749.
- Dilmegani, Cem. 2024. ChatGPT Code Interpreter Plugin: Use Cases & Limitations in 2024. AIMultiple Research. Available online: https://research.aimultiple.com/chatgpt-code-interpreter/ (accessed on 16 May 2024).
- Dowling, Michael, and Brian Lucey. 2023. ChatGPT for (Finance) Research: The Bananarama Conjecture. Finance Research Letters 53: 103662.
- Elad, Barry. 2024. OpenAI Statistics 2024 By Demographics, Products, Revenue and Growth. Available online: https://www.enterpriseappstoday.com/stats/openai-statistics.html (accessed on 11 June 2024).
- Farooq, Akeel, and Priyanka Chawla. 2021. Review of Data Science and AI in Finance. Paper presented at International Conference on Computing Sciences (ICCS), Phagwara, India, December 4–5.
- Félix, Luiz, Roman Kräussl, and Philip Stork. 2020. Implied Volatility Sentiment: A Tale of Two Tails. Quantitative Finance 20: 823–49.
- Hansen, Anne Lundgaard, and Sophia Kazinnik. 2023. Can ChatGPT Decipher Fedspeak? Federal Reserve Bank of New York. United States of America. Available online: https://policycommons.net/artifacts/5671671/can-chatgpt-decipher-fedspeak/6437313/ (accessed on 11 June 2024).
- Hartmann, Jochen, Mark Heitmann, Christian Siebert, and Christina Schamp. 2023. More Than a Feeling: Accuracy and Application of Sentiment Analysis. International Journal of Research in Marketing 40: 75–87.
- Hull, John C. 2015. Options, Futures, and Other Derivatives, Global Edition. London: Pearson Education.
- Jullum, Martin, Anders Løland, Ragnar Bang Huseby, Geir Ånonsen, and Johannes Lorentzen. 2020. Detecting Money Laundering Transactions with Machine Learning. Journal of Money Laundering Control 23: 173–86.
- Kelly, Jack. 2023. Goldman Sachs Predicts 300 Million Jobs Will Be Lost or Degraded by Artificial Intelligence. Forbes. Available online: https://www.forbes.com/sites/jackkelly/2023/03/31/goldman-sachs-predicts-300-million-jobs-will-be-lost-or-degraded-by-artificial-intelligence/?sh=43cb004a782b (accessed on 16 May 2024).
- Kocoń, Jan, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, and et al. 2023. ChatGPT: Jack of All Trades, Master of None. Information Fusion 99: 101861.
- Leippold, Markus. 2023a. Sentiment Spin: Attacking Financial Sentiment with GPT-3. Finance Research Letters 55: 103957.
- Leippold, Markus. 2023b. Thus Spoke GPT-3: Interviewing a Large-Language Model on Climate Finance. Finance Research Letters 53: 103617.
- Lin, Tom C. 2019. Artificial Intelligence, Finance, and the Law. Fordham Law Review 88: 531.
- Liu, Li Xian, Shuangzhe Liu, and Milind Sathye. 2021. Predicting Bank Failures: A Synthesis of Literature and Directions for Future Research. Journal of Risk and Financial Management 14: 474.
- Lopez-Lira, Alejandro, and Yuehua Tang. 2023. Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models. arXiv arXiv:2304.07619.
- Masson, Dubos J. 2018. 6 Steps to An Effective Financial Statement Analysis. Association for Financial Professionals. Available online: https://www.afponline.org/training-resources/resources/articles/Details/6-steps-to-an-effective-financial-statement-analysis (accessed on 16 May 2024).
- Roumeliotis, Konstantinos I., and Nikolaos D. Tselikas. 2023. ChatGPT and Open-AI Models: A Preliminary Review. Future Internet 15: 192.
- Saunders, Anthony, Marcia Cornett, and Otgo Erhemjamts. 2021. Financial Institutions Management: A Risk Management Approach, 10th ed. New York: McGraw-Hill Education.
- Selke, Mary J. Goggins. 2013. Rubric Assessment Goes to College: Objective, Comprehensive Evaluation of Student Work. Lanham: R&L Education.
- Sokolov, I. A. 2019. Theory and Practice in Artificial Intelligence. Vestnik Rossiiskoi Akademii Nauk [Herald of the Russian Academy of Sciences] 89: 365–70.
- Son, Guijin, Hanearl Jung, Moonjeong Hahm, Keonju Na, and Sol Jin. 2023. Beyond Classification: Financial Reasoning in State-of-the-Art Language Models. arXiv arXiv:2305.01505.
- Stevens, Dannelle D., and Antonia J. Levi. 2023. Introduction to Rubrics: An Assessment Tool to Save Grading Time, Convey Effective Feedback, and Promote Student Learning. Abingdon-on-Thames: Routledge.
- Tetlock, Paul C., Maytal Saar-Tsechansky, and Sofus Macskassy. 2008. More than Words: Quantifying Language to Measure Firms’ Fundamentals. The Journal of Finance 63: 1437–67.
- Turing, Alan M. 1950. Computing Machinery and Intelligence. In The Essential Turing: The Ideas That Gave Birth to the Computer Age. Oxford: Clarendon Press. Available online: https://academic.oup.com/book/42030/chapter-abstract/355746326?redirectedFrom=fulltext (accessed on 16 May 2024).
- Wahlen, James Michael, Stephen P. Baginski, and Mark Thomas Bradshaw. 2018. Financial Reporting, Financial Statement Analysis, and Valuation: A Strategic Perspective. Boston: Cengage Learning.
- Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems 35: 24824–37.
- Wenzlaff, Karsten, and Sebastian Spaeth. 2022. Smarter than Humans? Validating How OpenAI’s ChatGPT Model Explains Crowdfunding, Alternative Finance and Community Finance. Available online: https://ssrn.com/abstract=4302443 (accessed on 16 May 2024).
- Yu, Lining, Wolfgang Karl Härdle, Lukas Borke, and Thijs Benschop. 2023. An AI Approach to Measuring Financial Risk. The Singapore Economic Review 68: 1529–49.
- Yue, Thomas, David Au, Chi Chung Au, and Kwan Yuen Iu. 2023. Democratizing Financial Knowledge with ChatGPT by OpenAI: Unleashing the Power of Technology. Available online: http://dx.doi.org/10.2139/ssrn.4346152 (accessed on 11 May 2024).
- Zaremba, Adam, and Ender Demir. 2023. ChatGPT: Unlocking the Future of NLP in Finance. Modern Finance 1: 93–98.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).