Abstract
Building on prior research, we revisited the 21 personal finance scenarios using OpenAI’s newer ChatGPT-4o to observe whether its financial guidance has meaningfully evolved. Our qualitative analysis relied on expert assessments to examine both the content and tone of the model’s advice, considering how prompt engineering influenced ChatGPT outputs. We observed that ChatGPT-4o often produced more thorough suggestions and paid closer attention to tax implications—though it still overlooked some important details. It also showed more creative thinking in certain situations. However, some of the same shortcomings persisted: generalizations on certain topics remained too broad, legal references were occasionally misleading, and emotional empathy continued to feel artificial, even with carefully crafted prompts. We also extended our analysis to the newest ChatGPT model (ChatGPT-5). We found that the recommendations generated by ChatGPT-5 were quite similar to those generated by ChatGPT-4o, but ChatGPT-5 was more accurate on the numerical problems. While not a replacement for financial professionals, ChatGPT appears to be maturing into a more useful supporting tool for both advisors and clients. Our findings not only suggest cautious optimism but also underscore the need for careful oversight when using such tools in personal financial decision-making.
Keywords:
ChatGPT; Artificial Intelligence (AI); machine learning; personal finance; financial advisors; financial planning; finance pedagogy
JEL Classification:
D14; G53; O33
1. Introduction
In recent years, large language models such as ChatGPT have drawn increasing attention for their potential role in personal finance. This study undertakes a qualitative investigation of how ChatGPT’s capabilities in this area have evolved—particularly in offering tailored financial guidance. Our central research question is straightforward: Has the newer ChatGPT-4o model shown meaningful improvement in delivering financial advice, and how does prompt design influence the nature of its recommendations? We focus on whether the model now better addresses personalization, compassion, and legal or tax-specific issues.
() explored how ChatGPT-3.5 handled 21 personal finance scenarios, ranging from debt management and inheritances to retirement planning and medical expenses. Most existing studies on Artificial Intelligence (AI) in finance lean heavily on quantitative benchmarks—assessing factors such as model accuracy, return calculations, or portfolio construction. While valuable, these approaches often miss the subtleties of how AI communicates with real users. Our study takes a different route. Using a qualitative, expert-driven lens, we analyze how ChatGPT responds to a set of 21 financial cases originally compiled by (). These scenarios span a wide range of personal financial concerns, from debt crises and unexpected windfalls to elder care and long-term savings. They give us a grounded framework for exploring the human factors at play—tone, clarity, legal caution, and emotional resonance.
Recent studies by researchers such as () and () have examined ChatGPT’s performance on financial literacy tasks or its predictive power in stock selection. Others, such as (), have compared it to robo-advisors. While these findings are promising, most still rely on test-like evaluations. Our goal is more interpretive: to understand what it is like when people turn to AI for financial advice in nuanced situations.
Following the methodology laid out by (), who assessed ChatGPT-3.5 using the same cases, we evaluate the newer model’s responses with and without prompting enhancements. These prompts asked ChatGPT to behave similarly to a thoughtful, compassionate financial advisor. Rather than scoring outputs numerically, we considered them descriptively—based on expert judgment.1 Our analysis shows that ChatGPT-4o offers more complete and creative suggestions than its predecessor in many cases. Prompting tends to improve tone and attention to detail, but it does not fully resolve deeper issues such as vague legal guidance or questionable assumptions. While this tool clearly is not ready to replace human advisors, it may already play a useful supporting role—especially for users seeking a second opinion, or professionals exploring how to use AI in their practice.
The rest of this paper proceeds as follows. Section 2 provides motivation for the study, and Section 3 presents the methodology. Section 4 provides the evaluation of outputs under regular and enhanced prompts and provides a comparison of different ChatGPT models. Section 5 presents the limitations, implications, and contributions of this study, and Section 6 provides discussions and conclusions.
2. Motivation
OpenAI’s ChatGPT-4o represents a significant step forward in large language model (LLM) capabilities, including improvements in reasoning, response detail, and interaction quality, compared to ChatGPT-3.5. These enhancements warrant renewed attention to the role of ChatGPT in personal finance advice, particularly when evaluated from a qualitative, content-oriented perspective. While prior research has largely focused on performance metrics or quantitative comparisons, our motivation lies in interpreting how these model improvements manifest in the substance and style of financial guidance.
The growing public adoption of ChatGPT further motivates this inquiry. A recent Experian survey, for instance, found that 67% of Gen Z and 62% of millennials in the U.S. have used ChatGPT for financial planning (). A Google Trends analysis also confirms the popularity of this AI tool. Figure 1 shows the frequency of web searches (worldwide) for ChatGPT and its main alternatives. ChatGPT is significantly more popular than any of its rivals. Though its rivals offer similar services (), ChatGPT has kept the top spot (). This wide uptake highlights the urgency of assessing not only how accurate ChatGPT is but also how it communicates financial advice and how users might interpret or misinterpret that advice.
Figure 1.
Total Web Searches for ChatGPT and its Alternatives. Source: Authors’ chart based on Google Trends, using the “search term” option for each word. To reproduce this chart, first type “ChatGPT” where it says “Add a search term” in Google Trends. Next, enter “Copilot”, “Gemini”, and “Claude” where it says “+Compare”, one at a time. This process will plot the searches for all these terms in a single chart. The time period for the search is 31 December 2023–23 March 2025, and the relevant geography for the search is “Worldwide”.
Importantly, tools such as ChatGPT may have the potential to democratize access to financial knowledge—particularly in underserved regions or “financial advice deserts,” (). With financial literacy levels in the U.S. stagnating around 50% (), AI-driven advisors could play a critical educational role, offering individuals a low-risk and judgment-free way to ask questions and explore financial concepts (; ). These possibilities underscore the need for a careful and qualitative review of the content and tone of ChatGPT’s financial responses.
In addition to potential consumer benefits, ChatGPT may offer value to professionals as well. () argue that LLMs can support financial advisors in client education and communication. However, the integration of such tools into professional practice depends heavily on their consistency, nuance, and ethical presentation—elements best examined through qualitative, expert-guided interpretation rather than numerical performance alone.
In this study, we adopt such a qualitative approach to examine how ChatGPT-4o handles 21 personal finance cases. We assess not only the accuracy of its suggestions but also their contextual appropriateness, emotional intelligence, and attention to detail. These dimensions are critical in shaping real-world impact, especially when users make decisions based on AI-generated advice. Our study aims to inform both academic discussion and the practical consideration of AI tools in financial guidance by highlighting the strengths and persistent limitations of the current generation of LLMs through a qualitative lens.
3. Methodology
We use the same cases developed by () in our examination of ChatGPT-4o. We name the original cases from () and the resulting output regular prompt and regular output, respectively. In addition, different from (), we prompt-engineer ChatGPT by defining a clear role and background for ChatGPT as follows:
“You are a very competent financial advisor. You have many clients who are very pleased with how well you prioritize what they need to do and guide them through their financial goals and challenges. They commend you for having a strong quantitative background, attention to detail, a vast knowledge of various financial products, and a robust understanding of tax laws. They also praise you for being ethical, trustworthy, empathetic, and friendly. Your clients always recommend you to their family members and friends because of the high-quality work that you do. This is how you were able to build a large practice. Now consider the following case.”
We name the new cases (the original cases preceded by the defined role above) and the resulting output enhanced prompt and enhanced output, respectively. It is important to reiterate that we use the same cases that () developed and used in their study. Using the same cases allows us to compare our results to theirs. Furthermore, the advantage of these cases is that they cover a wide range of scenarios and help reveal potential issues with ChatGPT (e.g., lack of compassion or attention to detail) without overwhelming readers.
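As an illustration, the two prompting conditions can be reproduced programmatically, for example via the OpenAI Python client, as in the minimal sketch below. The sketch is illustrative rather than a record of our exact procedure; the model name, helper function, and abbreviated role text are assumptions, and the full role text appears above.

```python
# Minimal sketch (illustrative, not our exact procedure) of issuing the regular and
# enhanced prompts via the OpenAI Python client. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# Abbreviated here; the full advisor role text is given in Section 3.
ADVISOR_ROLE = (
    "You are a very competent financial advisor. ... "
    "Now consider the following case."
)

def get_advice(case_text: str, enhanced: bool = False, model: str = "gpt-4o") -> str:
    """Return ChatGPT's response to a case, with or without the enhanced role prompt."""
    prompt = f"{ADVISOR_ROLE}\n\n{case_text}" if enhanced else case_text
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage (case_1_text is a placeholder for the full text of Case #1):
# regular_output = get_advice(case_1_text)
# enhanced_output = get_advice(case_1_text, enhanced=True)
```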
4. Results
4.1. Evaluation of Regular and Enhanced Outputs
For each case, we present the case summary here, with the regular and enhanced ChatGPT outputs juxtaposed in Supplementary S1. We use ChatGPT to summarize each case and find that it does an adequate job. Though we only present the ChatGPT summaries of the cases here (to save space), we input the full text of each case into ChatGPT for our analysis (please see the Supplementary Materials of () for the full text of the cases). We conclude each case with our evaluation, in which we compare the regular and enhanced outputs and point out their strengths and weaknesses.
Case #1: James’ Financial Troubles
ChatGPT Summary:
James has $10,000 in credit card debt but only makes minimum payments despite having $100,000 in the bank. He underutilizes his company’s 401(k) match, contributing only 2% instead of 6%. Additionally, James ignores his unpaid medical bills.
Evaluation:
The first visible difference between the regular and enhanced outputs is that with the regular output, ChatGPT takes the attitude of a distant third party in its communication. On the other hand, with the enhanced output, ChatGPT sounds more friendly and approachable. This shows that ChatGPT is highly responsive to the tone and expectations that we set for it in our prompts. For the credit card debt, under the regular output, ChatGPT suggests that James pay off the balance, but it does not offer any prioritization. However, we observe prioritization in the enhanced output. This shows once again that prompting makes an important difference when using ChatGPT for financial advice. In the enhanced output, ChatGPT suggests increasing retirement contributions as a second priority for James. This suggestion comes third in the regular output, but in general, it is unclear whether the numbers correspond to priorities in the regular output, as there is no mention of priorities; they appear as a bulleted list of actions. The emergency fund comes fourth in priority in the enhanced output, followed by creating a budget and a financial plan. We do not observe substantial differences in the content of the advice provided for these items. Unlike the regular output, the enhanced output contains new suggestions: insurance, investment opportunities, and a regular review of the financial situation. In the regular output, we observe the “avoid new debt” suggestion, which does not appear in the enhanced output.
Overall, unlike the regular output, the enhanced output offers James clear priorities. However, the enhanced output does not appear to follow a logical planning process. Paying off credit card debt comes first, followed by increases to 401(k) contributions, medical bills, an emergency fund, budgeting, insurance, etc. While these are sound recommendations, the order of importance is critical. Generally, it is wise to address risk management first: establishing an emergency fund, reviewing insurance coverage, budgeting, and then addressing debt. Without a solid foundation, a financial or medical emergency could derail James’ plans. Once the foundation is established, James can then focus on medical bills, increasing 401(k) savings, and investment strategies. He can then monitor and adjust his financial priorities accordingly.
Case #2: Sports Gambling and Winning the Lottery
ChatGPT Summary:
Mark secretly racks up $500,000 in sports betting debt while living rent-free with his family. He unexpectedly finds a $100 bill and spends it on lottery tickets, winning a $750,000 jackpot, which he hopes will solve his financial woes.
Evaluation:
In this case, both methods appear to produce prioritized action plans. In the regular output, we see the wording “step-by-step” approach. In the enhanced output, the suggestions start with “immediate steps”. The regular output ignores the impact of taxes, but with the enhanced output, we observe a suggestion for considering tax implications. Both approaches prioritize paying the credit card debt first and then establishing an emergency fund, although the suggested amounts differ (USD 50,000 under the regular prompt and USD 25,000 under the enhanced prompt). This difference is due to the time frame adopted: Under the regular prompt, the advice-seeker is recommended to have an emergency fund that can cover living expenses from 6 to 12 months. This time frame is 3 to 6 months under the enhanced prompt. We find this difference noteworthy. Under the regular prompt, we observe the prioritization of financial counseling and discussions with family members. This comes later with the enhanced prompt, which places a higher priority on setting up a budget and developing a financial plan. Under both prompts, we see suggestions for saving for college, investing, and insurance. As with the first case, we observe a suggestion to monitor the financial situation, though here it is generated by both prompts.
Comparing the two approaches, the advice generated with the regular prompt is not reliable, as it ignores the taxes that Mark needs to pay. Furthermore, the recommended amounts for a given action, such as allocating USD 150,000 in a diversified investment, are based on pre-tax amounts. The amounts shown in the enhanced output make more sense given the tax implications. On the other hand, it is unclear whether Mark has the financial capacity to follow the other suggestions, such as making investments and opening college savings accounts. Overall, sound advice is again generated from the enhanced prompt; however, the order of importance is not addressed.
Case #3: Living the Good Life
ChatGPT Summary:
The Smith family earns $500,000 annually but prioritizes a luxurious lifestyle and creating memories with their children over saving for retirement or their kids’ college. They have a second mortgage and rely on their job security, planning to co-sign their children’s student loans.
Evaluation:
Here, with the enhanced output, we continue observing prioritization, but this is less clear with the regular output (it appears to make general recommendations). In the enhanced output, we observe that budgeting and setting up an emergency fund are prioritized first. In this case, we observe a time frame of 6 to 12 months in the enhanced output, similar to the regular output. Under both prompts, we observe similar suggestions, such as savings via 529 accounts, having adequate insurance, estate planning, and paying off the second mortgage. One important detail is that the regular output also provides a suggestion to consider refinancing, which we do not see under the enhanced prompt. Moreover, a clear suggestion as to how to lower spending (“scaling back on luxury cars and other high-end purchases”) is provided in the regular output. The enhanced output is vague as to how to cut down on spending. Both prompts suggest that the Smith family maximize how much they contribute to their retirement accounts. The reasoning behind this suggestion is provided in the regular output (employer matches and tax benefits), but it is not provided in the enhanced output. However, it remains unclear whether the Smith family has the financial capacity to “max out” their retirement savings before they can tackle their lifestyle expenses.
Overall, in both approaches, we observe that a specific financial planning process is not being followed. In other words, once the planner-client relationship has been established, specific and prioritized goals should be agreed upon; then, recommendations can be made. This process did not appear to be followed, and additionally, there is no specific implementation schedule regarding who is responsible for each task or the order in which activities should occur.
Case #4: A Deadly Virus and Wipe-Out of a Family’s Wealth
ChatGPT Summary:
The Franks, known for their frugality and strong financial habits, face a financial wipe-out when the stock market crashes by 90%. Their children criticize them for their sacrifices, comparing their situation to that of their spendthrift neighbors, who now have the same net worth but enjoyed a better life along the way.
Evaluation:
Prioritization appears to be less clear under both prompts. We observe similar suggestions/recommendations under both prompts. One difference is that in the regular output, there is a recommendation to cut expenses. The family is already very frugal, to the extent that they keep the thermostat at low temperatures to save on energy bills. Cutting expenses further may not be a viable strategy. Under both prompts, we observe a recommendation to diversify investments into bonds, real estate, etc. This assumes that, in an event where the stock market plummets 90%, other asset classes can protect the family’s portfolio. Although the suggestion of diversification is sound in general, the family should be advised that in a dramatic systematic crisis, all asset classes may experience major declines in their values. If the family is anxious about the possibility of another wipe-out in the financial markets, a suggestion to invest in certificates of deposit might have been more prudent. Moreover, it is important to note that the losses in the Frank family’s portfolio are paper losses (i.e., unrealized losses). Selling their stocks now in order to diversify their portfolio may financially ruin them, whereas staying put and waiting for a recovery might not. This point is missed by ChatGPT.
There is a recommendation to increase income under both prompts. While this appears reasonable in theory, it is unclear whether the Frank family is in any position to take on additional work. ChatGPT suggests that the Frank family enjoy the present and spend quality time with their children, but at the same time, it advises them to find more sources of income, which would naturally reduce family time. There is some inconsistency in these suggestions. It also appears that the prompts provide advice based on specific values that the clients might not hold. For example, a recommendation on teaching the children about financial balance may not be followed unless it aligns with the client’s current or changed value system.
Case #5: Young Couple and Their Good Fortune
ChatGPT Summary:
Sam and Julie, renovating their new home, find $500,000 in cash and jewelry hidden in the walls. They save $50,000 for emergencies, invest the rest in low-cost index funds, and keep the jewelry, anticipating a rise in gold prices.
Evaluation:
We observe very similar recommendations under both outputs, such as considering tax consequences, estate planning, and the professional appraisal of jewelry. Both methods suggest that Sam and Julie take measures to protect their assets at home in case of theft, loss, or damage. The regular output suggests a “safety deposit box”, and the enhanced output suggests “insurance”. We see a suggestion for developing a budget for repairing and improving the home under the regular prompt, which is a good one. Both methods mention the children’s education; however, the case makes no mention of the couple having kids or planning to have kids. We see advice on debt management in the enhanced output. The case does not mention any debt that the couple carries. ChatGPT endorsed all the financial decisions that the couple made. Given that ChatGPT speculates that the couple may have debts, it would not make much economic sense to leave the debt unaddressed while allocating a significant proportion of funds to the index funds. Paying off existing high-interest debt should have been the prioritized recommendation. Moreover, both outputs assume that the money and jewelry that the couple found in the house belong to them. However, the legal treatment of this case is quite complex.2 As a result, ChatGPT may be misleading the couple.
Case #6: 90-Year-Old Grandpa Getting Rich with Options Trading
ChatGPT Summary:
Ken convinces his 90-year-old grandfather, Mr. Wilson, to trade (naked call) options. Mr. Wilson’s account grows by $5,000,000 in a year under Ken’s management. Grateful, Mr. Wilson feels financially secure, keeps $4,000,000, and gives the rest to Ken.
Evaluation:
Both outputs start with an analysis of the risky trade (selling naked call options) and emphasize that this trade was not suitable for Ken’s grandfather. We also see that the output under the enhanced prompt raises ethical concerns with Ken’s deployment of a highly risky trading strategy on behalf of someone who clearly does not understand the risks associated with it. We also see a suggestion to invest in real estate (among other asset classes) for Mr. Wilson. This advice may be inappropriate for someone who is 90 years old considering the illiquidity and the necessary time horizon associated with investing in real estate. Both outputs suggest a diversification strategy to lower/spread risk and are highly critical of what Ken did with his grandfather’s account. Though this is a valid suggestion and a justified criticism, they skirt around the issue that Mr. Wilson does not feel financially secure, and a low-risk strategy may not necessarily be sufficient to achieve the desired outcome in the first place. This does not necessarily mean that Mr. Wilson needs to take on more risks in his portfolio, but the suggestion provided does not seem to address the concern that he has (i.e., outliving his savings). For example, neither of the prompts makes any suggestions on lowering his living expenses to the extent that it is possible.
The enhanced output suggests an emergency fund and tax planning, which are not suggested in the regular output. However, there is no specific tax advice. Moreover, neither output brings up the tax liability on the gains earlier in their analyses. Mr. Wilson’s account was up by USD 5,000,000, and Mr. Wilson kept USD 4,000,000 and gave the rest to his grandson. Given the short-term nature of the deployed trading strategy and short-term investment horizon, there would be a significant tax liability on the gains. It is unclear if Mr. Wilson would even have USD 4,000,000 after he pays the taxes on the account. Furthermore, even if we ignore the taxes on capital gains, there is still a substantial amount of money being gifted to the grandson (Ken). The taxes on the gifted amount are not mentioned in either of the outputs. Overall, we observe that ChatGPT provides after-the-fact recommendations and not proactive recommendations. Additionally, a human approach may be able to provide more specific recommendations rather than broad suggestions regarding tax planning, annuities, and risk management.
Case #7: Hardworking Salesman
ChatGPT Summary:
Hank, needing money for his daughter’s medical expenses, secretly cheated by adding a 0.25% markup on invoices over 20 years, saving $5,000,000 without impacting his company’s finances.
Evaluation:
As with the other cases, the enhanced output has a more compassionate tone and approach. The regular output is unequivocal about the illegal nature of the scheme that Hank ran. On the other hand, the enhanced output is vague on this. It says what Hank did “could be considered embezzlement” while it is clearly an example of embezzlement. Both outputs point out the ethical issues in what Hank did. The regular output assumes that Hank will not be able to use his savings due to the illegal means by which he accumulated them. On the other hand, the enhanced output advises a diversification strategy for Hank’s investments. In both prompts, there is an implicit endorsement of Hank’s decision to retire. Given the embezzlement scheme he ran and the legal consequences he will be facing, Hank is not in a position to retire. It is unclear whether Hank contributed to social security and whether his company offered any pension plans (defined benefit or defined contribution). It is also a legal question whether the company could confiscate Hank’s pension benefits due to the embezzlement scheme he ran. Furthermore, convicts do not draw social security payments.3 These potential issues are not brought up by ChatGPT. Both outputs emphasize the importance of having adequate health insurance. Thus, they assume that Hank did not know or was not aware of his insurance options. This seems presumptuous, as someone who risked going to jail may have missed a simpler solution to his or her problems. The enhanced output suggests life insurance, which is a good idea, and it also suggests that Hank look for additional ways to raise his income. These are generic recommendations that may or may not apply to Hank’s particular situation.
Case #8: Getting the Right Kind of Mortgage
ChatGPT Summary:
Sally and Matt, eager first-time homebuyers, secure an adjustable-rate mortgage despite knowing interest rates are likely to rise.
Evaluation:
Both outputs highlight the risks associated with adjustable-rate mortgages (ARMs) and recommend that the couple refinance their adjustable-rate mortgage to a fixed-rate mortgage. This is a sensible suggestion. However, it is unclear why the couple would qualify for a fixed-rate mortgage via refinancing given that, based on the original case, they did not qualify for one at the beginning. The regular output suggests that the couple look for additional sources of income. The enhanced output suggests that the couple make extra principal payments and investments. These are great suggestions. However, it is unclear if the couple can take on additional jobs. Furthermore, the fact that they could not qualify for a less risky mortgage at the beginning suggests that the couple is not in a strong financial position. Asking them to make extra principal payments may not be feasible. Neither of the outputs touches on the elephant in the room: the couple did not understand how adjustable-rate mortgages work with their rate adjustments. Hurrying to lock in a rate makes sense with a fixed-rate mortgage; with their ARM, they are not locking in anything. Instead of pointing out this flaw, the enhanced output finds it understandable that “[t]hey rushed into the mortgage to lock in a lower rate.”
Case #9: Insurance Needs
ChatGPT Summary:
Sally, the sole breadwinner with four children and elderly parents, worries about her family’s financial future if she dies unexpectedly. With a large mortgage and minimal retirement savings, she is stressed about meeting their financial needs and paying for her parents’ medical bills.
Evaluation:
Here, both outputs make very similar suggestions for Sally. Guidance on the amount and the term length for term life insurance is helpful, but somewhat generic. The regular output suggests long-term care insurance for Sally’s parents and setting up a 529 plan and a spousal individual retirement account (IRA). These suggestions are warranted given that Sally has young children and ailing parents, and her partner is not employed. Whether Sally can afford all these additional expenses and contributions is not explicitly discussed. Moreover, suggestions such as the 529 plan and spousal IRA, while valid, may not align with Sally’s goals. Additionally, both outputs explain permanent life insurance products adequately, but they do not go into detail with regard to the investment component of these products. A more detailed explanation is warranted given the complexities of permanent life insurance cash value and investment strategies and risks.
Case #10: Unexpected Diagnosis and Hardship Withdrawal
ChatGPT Summary:
Howard, diagnosed with cancer, exhausts his savings and stops contributing to his 401(k) to pay medical bills. Facing foreclosure, he takes a hardship withdrawal from his 401(k) to keep his home, despite insurance not covering all his treatments.
Evaluation:
The enhanced output has a very compassionate tone. It shows empathy for the unforeseen situation that Howard found himself in. On the other hand, the regular output lacks compassion, and it points fingers at Howard for mismanaging his financial and health situation. Both methods recommend that Howard seek better insurance coverage and financial assistance from the government, charities, and hospitals, and avoid hardship withdrawals. Emergency savings are recommended in both outputs but the amounts vary. The regular output suggests that Howard save enough to cover living expenses for up to 12 months, while the enhanced output suggests that he do so for up to 6 months. Both outputs state that hardship withdrawal should be the last resort. There does not seem to be a clear and actionable recommendation for Howard’s problems in either of the outputs. It also appears that mental and emotional support is the last recommendation made when perhaps it should be the first. Howard is fighting for his life, and many of the suggestions are the last things that may be on his mind besides survival. The enhanced output is generated after telling ChatGPT that it possesses many desirable characteristics, such as empathy. This seems to change the tone that ChatGPT uses, but it does not render it more human in its recommendations. In a sense, this creates a worse output: a false sense of compassion. This is because the tone is compassionate, but recommended actions are not.
Case #11: Gambling as Last Resort
ChatGPT Summary:
Emily, discovering her late husband’s hidden debts totaling $450,000 and facing foreclosure, gambles her last $10,000 emergency savings. Miraculously, she wins $1,000,000 at the casino, enough to pay off her debts.
Evaluation:
The regular output ignores the fact that Emily needs to pay taxes on her gambling income. The enhanced output does take taxes into account. Both outputs prioritize paying off debts and obligations first, suggesting that the credit card debt be paid off “immediately”. However, the rest of the debts and obligations are not prioritized. For example, should family members be paid off before the mortgage? We could argue that catching up with her mortgage payments should be her top priority lest she lose her house. Furthermore, it would be beneficial for Emily to have a good idea about how much she has after taxes before she takes care of her debts and obligations. It is also surprising that the suggestion on counseling appears in the regular output, but not in the enhanced output. The investment recommendations come before any assessment of what Emily’s risk tolerance truly is. This may not be prudent investment advice given her situation. We also noticed that the enhanced output recommends steps to take immediately. This is close to providing guidance on an implementation schedule.
Case #12: Estate Planning
ChatGPT Summary:
Sarah, a widow with a $10,000,000 net worth, plans to create equal trust funds for her ten grandchildren. She wants a competent manager to oversee the funds until they turn 18, with provisions to redirect funds to a children’s hospital if any grandchild misbehaves.
Evaluation:
Both outputs suggest an irrevocable trust, but the enhanced output suggests that Sarah first create a revocable living trust and structure it so that it becomes irrevocable after she dies. In line with Sarah’s wishes, both outputs recommend a misbehavior clause. The enhanced output also has a suggestive tone for the misbehavior clause. The importance of tax planning is highlighted in both outputs, but the outputs do not account for unforeseen events such as the death of a grandchild after the trust is formed. Sarah should factor such unforeseen events into the trust’s documents. Another possibility is that Sarah may pass away before the trust documents are finalized. It will be prudent for her to appoint someone who can oversee this process according to her wishes in case she passes away. When it comes to choosing a trustee, there is a slight difference in the outputs. The regular output sticks with professional trustees or trust companies while the enhanced output appears to leave the door open for non-professionals trusted by the trustor. Sarah should be provided with the pros and cons of choosing a professional versus a non-professional trustee.
Case #13: No Luck with Convincing His Father
ChatGPT Summary:
Adam, a successful hedge fund manager, wants to convince his father, Sam, to invest in the stock market. Sam refuses to invest, preferring to keep his $5,000,000 out of the market, leading to tension between them.
Evaluation:
Both outputs attempt to give a balanced view of the possible thinking processes adopted by Adam and his father Sam. The enhanced output speculates that Sam may have invested in low-risk alternative investments such as real estate and bonds. Here, the term “alternative investments” is misused in that alternative investments do not include bonds. Furthermore, calling real estate a low-risk investment may not be appropriate, as investing in real estate comes with many risks, including liquidity risk. Furthermore, the enhanced output recommends that Sam consider investing in equities with low risk to diversify his portfolio. It also provides recommendations for Sam to continue to stay informed about various investment options so that he can make educated decisions aligned with his financial goals. However, both prompts fail to address what the goals of Adam and Sam are. In other words, why does Adam want his father to invest in the stock market, why did Sam not invest in the stock market in the first place, and how did he accumulate his USD 5 million? There is also an assumption in both outputs that Sam is a risk-averse investor. Sam may simply dislike the stock market and may have engaged in risky ventures such as using substantial leverage to invest in real estate or even gambling to build his wealth.
Case #14: No Way to Go Wrong with the Stock Market
ChatGPT Summary:
Alex, who has a strong quantitative background, advises his friend Henry to take on more risk in his portfolio for higher returns. He suggests Henry load up on risky assets to boost his retirement savings.
Evaluation:
Both outputs confirm the validity of the risk-return relationship but warn that Alex’s generalization is too simplistic. The enhanced output distinguishes different types of risks, which is a helpful approach. Both methods highlight the importance of risk tolerance and the psychological aspect of investing as well as the need to have a diversified portfolio. The regular output also suggests that Henry increase his retirement contributions, which, if Henry can afford it, is better advice than “simply loading up” his portfolio with more risk. However, both outputs fail to address what Henry’s goals are. If he wants to retire early, it may make more sense for him to save more, and if he has a higher risk tolerance, he may need to invest in riskier assets. However, if he has a long-term time horizon, saving more and investing in riskier assets may not be the optimal choice.
Case #15: Mutual Fund Fees
ChatGPT Summary:
Beth considers two identical mutual funds for a $100,000 investment over 10 years, one with a 6% front-end load and the other with a back-end load. Beth is unsure which to choose, and her co-worker advises her to always pick the fund with the lowest cost.
Evaluation:
Both outputs have very similar suggestions and the same numerical example. The calculations for the front-end load fund yield slightly different results. However, in both the regular and enhanced output, for the specific example chosen, the results must be the same for front-end and back-end load mutual funds. The answer for the back-end load funds is USD 184,912 under both prompts, which is only USD 0.23 away from the correct answer (USD 184,912.23). The answer for the front-end load fund is USD 184,016 in the regular output, differing from the correct answer (USD 184,912.23) by USD 896.23. On the other hand, the answer for the front-end load fund is USD 184,150 in the enhanced output, differing from the correct answer (USD 184,912.23) by USD 762.23 (see Case #15 in Supplementary S1 for details).
It is unclear whether these differences are due to rounding or mathematical errors introduced by ChatGPT. The regular output has a clear recommendation in favor of the back-end load fund under certain conditions, namely a declining fee structure over the years and a waiver of fees after a certain period. The enhanced output here exhibits out-of-the-box thinking and suggests an alternative: “no-load mutual funds or low-cost index funds.” The enhanced output also brings up the importance of fund performance when assessing the claim made by Beth’s co-worker. This makes sense since, in some cases, high fees can be justified if the fund performs well. It may not always be optimal for investors to choose a mutual fund with the lowest fee.
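As an illustration of the fee mechanics discussed above, the short sketch below reproduces the correct ending values for Case #15, assuming a 6% load on both funds and a 7% annual return (the figures implied by the correct answer of USD 184,912.23 cited above; the case’s return assumption is not restated here).

```python
# Worked check of the Case #15 figures: a $100,000 investment over 10 years with a
# 6% load, assuming a 7% annual return (the rate implied by the correct answer of
# USD 184,912.23 cited in the text).
principal = 100_000
load = 0.06
annual_return = 0.07
years = 10

# Front-end load: the fee is deducted before the money is invested.
front_end = principal * (1 - load) * (1 + annual_return) ** years

# Back-end load: the full amount compounds, and the fee is deducted at withdrawal.
back_end = principal * (1 + annual_return) ** years * (1 - load)

print(f"Front-end load fund: ${front_end:,.2f}")  # ≈ $184,912.23
print(f"Back-end load fund:  ${back_end:,.2f}")   # ≈ $184,912.23 (identical, as noted above)
```

Because a constant proportional fee enters the calculation as a simple multiplication, applying it before or after compounding yields the same ending value, which is why the two answers must match in this specific example.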
Case #16: Lure of Alternative Investments
ChatGPT Summary:
Kayla, five years from retirement, considers investing in timberland based on high returns reported by college endowments. She hopes doing this will help her catch up on her retirement savings, trusting the endowments’ professional management.
Evaluation:
Both outputs describe the properties of timberland as an investment. The enhanced output is more specific. For example, it provides the investment horizon (10 to 20 years), corrects the assumption that “timberland produces high returns”, and gives specific examples of risks such as pest infestations. We see detailed explanations of how college endowments are run and how they differ from individual investors in terms of risk tolerance and time horizon. Both outputs suggest higher retirement savings via catch-up contributions, which we welcome. Other suggestions seem quite generic (e.g., diversified portfolios), and both outputs fail to mention the potentially significant upfront costs of timberland. These could include the cost of buying the land, harvesting and replanting costs, as well as tax implications (basis, depletion, gains, etc.).
Case #17: Debt Consolidation
ChatGPT Summary:
Oliver, overwhelmed by $185,000 in debt from credit cards, medical bills, and a car loan, follows a friend’s advice to get a debt consolidation loan. He consolidates his debt into one payment so he no longer needs to deal with multiple lenders.
Evaluation:
Here, both outputs point out similar advantages and disadvantages of the solution (“debt consolidation”) adopted by Oliver. The regular output discusses the potentially positive impact of debt consolidation on Oliver’s credit score (if he makes his payments on time). On the other hand, the enhanced output discusses the potentially negative impact of the debt consolidation on his credit score, since the new loan application necessitates a hard inquiry on his credit. Interestingly, the enhanced output makes a recommendation for Oliver to look at his interest rate and loan terms despite the fact that he has already consolidated and obtained the loan. Both outputs advise Oliver to refrain from taking on new debt. Without knowing the underlying reasons behind the debt that he accumulated (e.g., lack of financial literacy, chronic illnesses), this advice may not be feasible, at least in the short term. The suggestion to look for additional income sources (the regular output) and reduce credit card limits (the enhanced output) could be helpful to Oliver. Both outputs advise Oliver to set up an emergency fund, but the enhanced output is vague in terms of the size of this fund. The enhanced output presents “being debt-free” as the final goal. Debt in itself is not dangerous, and advising someone to live on a cash basis (at least implicitly) may stop them from growing their wealth by preventing them from buying a home or pursuing an advanced degree. Moreover, judging from the balance he carries (USD 60,000) on his car loan, the chances are high that Oliver is underwater on his loan. This possibility was not mentioned in either of the outputs.
Case #18: Retirement Problem with No Inheritance
ChatGPT Summary:
Mark and Emily, each saving $30,000 annually in their 401(k)s with a 7% return, plan to retire in 30 years. Assuming a 4% return during retirement and 20 years of post-retirement life, they need to calculate their annual annuity payments to deplete their account by the end of their lives.
Evaluation:
The regular output only considers the savings of one of the spouses. The couple saves USD 60,000 a year instead of USD 30,000. However, the output provided is consistent with annual savings of USD 30,000. The output does not show the details behind this calculation. The enhanced output has the correct future value of the annuity. The final answer, USD 416,966.52, is close to the correct answer (USD 417,035.40) with a difference of USD 68.88 (see Case #18 in Supplementary S1). However, the exposition of the solution can be improved. Overall, ChatGPT performs well in simple retirement problems.
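For reference, the sketch below reproduces the standard solution to this retirement problem using the case’s inputs (combined savings of USD 60,000 per year, 7% during accumulation, and 4% over a 20-year retirement); it is a minimal worked example of the time-value-of-money treatment described above.

```python
# Worked solution for Case #18: combined annual savings of $60,000 for 30 years at 7%,
# then a 20-year annuity at 4% that depletes the account.
savings_per_year = 60_000      # $30,000 from each spouse
r_accum, n_accum = 0.07, 30    # accumulation phase
r_ret, n_ret = 0.04, 20        # retirement phase

# Future value of an ordinary annuity at retirement.
fv_retirement = savings_per_year * ((1 + r_accum) ** n_accum - 1) / r_accum

# Annual payment that exhausts the balance over 20 years at 4%.
annuity_factor = (1 - (1 + r_ret) ** -n_ret) / r_ret
annual_payment = fv_retirement / annuity_factor

print(f"Balance at retirement: ${fv_retirement:,.2f}")       # ≈ $5,667,647.18
print(f"Annual retirement payment: ${annual_payment:,.2f}")  # ≈ $417,035.40
```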
Case #19: Retirement Problem with Inheritance
ChatGPT Summary:
Mark and Emily, saving $30,000 annually in their 401(k)s with a 7% return, plan to retire in 30 years. They want to leave a $500,000 inheritance, so they need to calculate their annual annuity payments over 20 years of retirement to achieve this goal.
Evaluation:
The regular output generates an incorrect answer (USD 171,727.00 instead of USD 400,244.52, a difference of USD 228,517.52), and it does not show the calculation steps. The enhanced output generates the correct amount at retirement (as it did in the previous case), which is USD 5,667,647.18. However, it does not properly incorporate the inheritance. The future value of the annuity is the amount available at retirement, while the inheritance is the amount at death (20 years after retirement). The enhanced output treats these two different points in time as the same, and we observe this in the step where the inheritance is subtracted from the amount available at retirement. The correct treatment would be the following: calculate the present value of the inheritance at retirement and subtract this amount from the amount available at retirement. The correct answer after following these steps would be USD 400,244.52, whereas ChatGPT’s answer is USD 380,267.11, leading to a difference of USD 19,977.41 (see Case #19 in Supplementary S1). Overall, ChatGPT does not perform well in more complicated retirement problems.
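The correct treatment described above can be written out as a short sketch; the only change from Case #18 is that the present value of the USD 500,000 inheritance at retirement is subtracted before sizing the annuity.

```python
# Worked solution for Case #19: discount the $500,000 inheritance back to retirement
# before sizing the 20-year retirement annuity.
savings_per_year = 60_000
r_accum, n_accum = 0.07, 30
r_ret, n_ret = 0.04, 20
inheritance = 500_000

fv_retirement = savings_per_year * ((1 + r_accum) ** n_accum - 1) / r_accum  # ≈ $5,667,647.18

# Present value, at retirement, of the inheritance to be left 20 years later.
pv_inheritance = inheritance / (1 + r_ret) ** n_ret  # ≈ $228,193

# Only the remainder is available to fund the retirement annuity.
annuity_factor = (1 - (1 + r_ret) ** -n_ret) / r_ret
annual_payment = (fv_retirement - pv_inheritance) / annuity_factor

print(f"Annual retirement payment: ${annual_payment:,.2f}")  # ≈ $400,244.52
```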
Case #20: Saving for College
ChatGPT Summary:
The Williams family, with three young children, decides to invest $30,000 in call options on blue-chip stocks for each child’s college fund. They hope the high-risk investment will yield sufficient returns by the time each child turns 18.
Evaluation:
Here, the regular and enhanced outputs have a similar analysis of the case and suggest a comparable action plan. The regular output discusses “expiration risk” whereas the enhanced output discusses “time decay”. Here, the enhanced output provides a more accurate assessment of the risk involved in an options trade over time. On the other hand, the regular output makes it clear that options may expire worthless, translating to a 100% loss of the invested capital. We see that 529 accounts are suggested in both outputs. However, the enhanced output also lists Uniform Gifts to Minors Act (UGMA) and Uniform Transfers to Minors Act (UTMA) accounts as additional options. Both outputs fail to mention that most 529 plans do not allow options trading. Additionally, the outputs fail to look at the impact of different accounts on the Free Application for Federal Student Aid (FAFSA). 529 accounts owned by parents are considered an asset of the parent and have a more favorable impact on the application for student aid. UGMA and UTMA accounts are considered assets of the child for FAFSA purposes, and these accounts may be used by the child for any expenses once they reach the age of majority, which is typically age 18 in most states. Additionally, only 5.64% of the assets in parent-owned 529 plans are counted as available funds to pay for college, while 20% of the assets in UGMA and UTMA accounts are counted as the student’s assets available to pay for college.
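To illustrate the difference in how these accounts are assessed, the short sketch below applies the 5.64% and 20% rates cited above to a hypothetical balance equal to the case’s USD 30,000 per-child investment; the balance is illustrative only.

```python
# Illustration of the FAFSA asset-assessment difference noted above, using the
# $30,000 per-child amount from the case as a hypothetical account balance.
balance = 30_000
parent_529_rate = 0.0564    # share of parent-owned 529 assets counted for aid purposes
student_ugma_rate = 0.20    # share of UGMA/UTMA assets counted as the student's

print(f"Parent-owned 529 counted toward college costs:  ${balance * parent_529_rate:,.2f}")   # $1,692.00
print(f"UGMA/UTMA account counted toward college costs: ${balance * student_ugma_rate:,.2f}")  # $6,000.00
```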
Case #21: Saving His Marriage
ChatGPT Summary:
Thomas and Julie, living paycheck to paycheck, take a $10,000 loan to book a one-week cruise, leaving their toddlers with grandparents. They hope the vacation will relieve their stress and improve their marriage.
Evaluation:
The advice in the regular and enhanced outputs is mostly the same. They both offer suggestions such as “side jobs/income”, “avoiding new debt”, and “emergency savings”. In earlier cases, the regular output suggested emergency savings covering expenses for 6 to 12 months; here, this is reduced to 3 to 6 months. The regular output also has interesting suggestions, such as seeking government assistance and childcare options, downsizing the house, refinancing their mortgage, and selling assets. The enhanced output suggests that the couple consider saving for retirement and for their children’s college. Another possibility (not brought up by ChatGPT) is that one parent stays home with the kids to save money on childcare and transportation expenses. This may bring financial relief to the family, as childcare costs in some instances exceed housing costs (). The enhanced output also suggests specific debt repayment plans (the debt snowball and debt avalanche methods), and it has a five-step action plan. Both methods suggest skill development to raise earning potential. Community colleges are tuition-free in some states, and this could be a viable option for Thomas and Julie.
4.2. Summary: Evaluation of Regular and Enhanced Outputs
Our analysis shows that prompt engineering has a substantial impact on the financial advice generated by ChatGPT. When we tell ChatGPT that it is a financial advisor with qualities such as compassion, knowledge of taxes, and attention to detail, we observe these qualities in the ensuing output. For example, when it comes to gambling income, the regular output misses the fact that these gains are taxable, but this fact is pointed out in the enhanced output. However, it is not always the case that prompt engineering provides superior outputs. We sometimes observe better suggestions in the regular output. For example, to someone worried about taking care of their elderly parents, ChatGPT recommends long-term care insurance in the regular output, but not in the enhanced output.
The enhanced output sometimes misses important details. For example, when it comes to short-term capital gains and sharing these gains among family members, we do not see the impact of capital gains taxes fleshed out. We also do not see any discussion of the potential gift taxes in the enhanced output (these points are missed in the regular output too). We observe this despite having communicated in our enhanced prompt that ChatGPT should pay attention to details, and it possesses a robust understanding of tax laws. When it comes to the ability to solve retirement problems, we observe similar outcomes. ChatGPT can solve a simple problem, but provides incorrect answers for a more complex problem. We offer a comparative summary of the regular and enhanced outputs in Table 1.
Table 1.
Summary of regular and enhanced outputs.
4.3. Comparison Between ChatGPT-4o and ChatGPT-3.5
How does ChatGPT-4o compare with ChatGPT-3.5 with respect to giving financial advice? Compared with the results reported by (), we observe that the newer model of ChatGPT provides more detailed and comprehensive suggestions. It provides suggestions such as no-load mutual funds for someone who is debating between front-end and back-end load funds, and UGMA or UTMA accounts for parents with young children. However, the newer model continues to miss important details such as taxes (unless it is prompted to consider them) and to give legally dubious advice. There is also some progress in offering less generic financial advice, but there is substantial room for improvement. We developed a rubric (using a set of benchmarks) to compare these two versions of ChatGPT. Table 2 presents the results of our comparative analysis.
Table 2.
Comparison between ChatGPT-4o and ChatGPT-3.5.
4.4. ChatGPT-4o Versus ChatGPT-5
OpenAI has recently introduced a newer model of ChatGPT: ChatGPT-5. Re-examining all the personal finance cases via ChatGPT-5 is beyond the scope of this study. However, to provide a preliminary analysis of ChatGPT-5’s financial advice capabilities, we ran some of the cases through ChatGPT-5 (those for which ChatGPT-4o did not perform particularly well) and provide our analysis here. These cases are as follows: 3 (Living the Good Life), 4 (A Deadly Virus and Wipe-Out of a Family’s Wealth), 5 (Young Couple and Their Good Fortune), 6 (90-Year-Old Grandpa Getting Rich with Options Trading), 13 (No Luck with Convincing His Father), 15 (Mutual Fund Fees), and 19 (Retirement Problem with Inheritance). We first compare the outputs with regular and enhanced prompts. Next, we compare the performance of ChatGPT-5 and ChatGPT-4o. We provide the ChatGPT-5 outputs with regular and enhanced prompts in Supplementary S2.
For Case #3, we observe the following under the regular prompt. ChatGPT emphasizes the importance of saving for retirement with specific suggestions such as a saving rate of 10–15%, contributing to an IRA, saving enough to obtain the employer match in 401(k) plans, and using automatic contributions. For the family’s mortgage, ChatGPT recommends refinancing, paying the mortgage off sooner, avoiding further use of their home equity for borrowing, and keeping their mortgage payments below 25% of their after-tax monthly income. For emergencies, ChatGPT recommends 6 to 12 months of living expenses in a high-yield savings account. It also urges the parents to be careful about co-signing their children’s student loans, while providing guidance on minimizing student loans and choosing an appropriate borrowing amount. Other suggestions include keeping discretionary spending at 25% of their income level and creating a long-term financial plan by working with a financial planner. Under the enhanced prompt, we also see a recommendation for saving for retirement (an urgent one), reinforced with a numerical example. The suggestion to contribute to an IRA comes up again, but with a caveat: this time, ChatGPT suggests a backdoor Roth IRA (a strategy deployed by high-income earners).
For the couple’s mortgage, there is a specific guideline on when to refinance (if the interest rate on the mortgage exceeds 6%). As with the output under the regular prompt, we observe guidelines on student loans. It appears that under both prompts, ChatGPT wants to protect the parents’ assets from creditors by setting limits on how much their children should borrow. We see new suggestions under the enhanced prompt, such as life insurance, umbrella liability insurance, estate planning, and using donor-advised funds. We also see a rule of thumb suggested to the family (the 50/30/20 rule): 50% needs, 30% wants, and 20% saving/investing. Overall, for Case #3, the advice provided under the enhanced prompt is more specific and richer than that provided under the regular prompt. Compared to ChatGPT-4o, the newer model provides new suggestions, such as a backdoor Roth IRA, using a high-yield savings account to hold emergency funds, and limits on student loans. The newer model also ends its output by offering further help to the individuals in the case. Despite these changes, we do not observe a vast improvement from ChatGPT-4o to ChatGPT-5.
Case #4 is an emotionally charged instance where a family, despite being financially responsible, had their portfolio destroyed due to a deadly virus. The output under the regular prompt lacks technical guidance and mostly focuses on living a balanced life and using what happened as a teachable moment. The output under the enhanced prompt also stresses the importance of living a balanced life. Further, it emphasizes the importance of diversification across stocks, bonds, cash, and real assets. It also suggests that the family maintains an emergency fund. Having cash could serve as a buffer during tumultuous times, but cash can be placed into a certificate of deposit or a high-yield savings account and still serve as a buffer. Comparing the ChatGPT-5 output to the ChatGPT-4o output reveals that the recommendations from ChatGPT-5 are very similar to those from ChatGPT-4o.
For Case #5, the output under the regular prompt starts with a warning: make sure that you have a legal claim to the money! This is a very important point. We observe a suggestion that the couple contact an attorney and report what they found to local authorities. ChatGPT also suggests working with a financial planner to minimize taxes. There is a suggestion to have the gold appraised and to consider selling some of it for diversification purposes. The couple is also urged to revise their will and insurance, incorporating the change in their assets. Overall, we see ChatGPT-5 commending the couple for their investment decisions and urging them to seek legal advice on whether they legally own the items they found in the house. In the enhanced output, we observe similar suggestions on the need to legally establish ownership of the money and the jewelry found. Potential tax liability is also highlighted under the enhanced prompt. We also see a specific diversification strategy with stock index funds, bond index funds, and real estate investment trusts (REITs). Other suggestions under the enhanced prompt include umbrella liability insurance, 529 accounts, and custodial accounts (UGMA/UTMA). As with the previous cases, the recommendations are more specific and comprehensive under the enhanced prompt compared to the regular prompt.
Unlike ChatGPT-4o, ChatGPT-5 is very clear about the couple needing to seek legal guidance on the money and jewelry found. However, as with ChatGPT-4o, ChatGPT-5 makes assumptions about the couple’s desire to have kids and maintains a positive tone with respect to what the couple did with the money (e.g., their investment decisions). It appears that ChatGPT lacks internal consistency on the legal dimension. A human advisor would likely take the view that the couple acted prematurely, investing the money too early without knowing whether they had a legal claim to it. Another example of this internal inconsistency is the issue of taxes. Under the enhanced prompt, we see a suggestion that the couple put aside 35–40% of the money for taxes. At the same time, the couple is praised (“smart”) for putting USD 450,000 in index funds, which means that they did not set aside the recommended amount for taxes. Overall, we see some improvements in the new model of ChatGPT in legal matters. However, as with the previous model, we observe that some of the advice provided by ChatGPT is disjointed, lacking a coherent structure and plan.
For Case #6, in the output with the regular prompt, we observe a rich discussion on the ethical and legal implications of Ken’s trading naked call options in his grandfather’s brokerage account. ChatGPT clearly disapproves of what Ken did, and it admonishes him for managing his grandfather’s money without having “proper licenses or disclosures”. However, it leaves the door open for Ken to give advice to his grandfather on conservative investments, which we believe is somewhat inconsistent. The ethical and legal issues arising from Ken’s naked call trades are raised in the output under the enhanced prompt as well. We also see the legal issues extended to the brokerage firm for approving naked call trading for a 90-year-old. ChatGPT suggests that the appropriate portfolio for Mr. Wilson (Ken’s grandfather) comprises an annuity, bonds, and dividend-paying stocks. For Case #6, the outputs from ChatGPT-4o and ChatGPT-5 are quite similar, with both strongly emphasizing that what Ken did was wrong. Moreover, we do not observe any discussion of the capital gains taxes (on the USD 5,000,000 gain) or gift taxes (on the USD 1,000,000 gifted to Ken) in either model. We continue observing unhelpful suggestions for Ken’s grandfather in terms of what he should invest in. He already feels insecure about his financial position, and it is doubtful that having conservative investments will help him without lifestyle adjustments. Overall, we do not notice a substantial change in the outputs from ChatGPT-4o to ChatGPT-5.
In Case #13, in both outputs, ChatGPT-5 takes the view that the claims made by Adam (a hedge fund manager) and his father about investing in the stock market are both valid. Furthermore, the chatbot emphasizes that investing is personal and that people should not be forced into investments they are not comfortable with. This view is very similar to the output from ChatGPT-4o. Overall, we do not observe a major difference between ChatGPT-4o and ChatGPT-5 in their approach toward Case #13.
In Case #15, we observe a numerical comparison between the front-end and back-end load funds in the output under the regular prompt. The setup for the future value of the investment (taking the fees into account) is correct. The output does not provide a final answer for either fund but asserts that both funds have the same after-fee future value, which is correct. We do see a slight preference for the back-end load fund if its fees decline over time with an increasing holding period. We also observe the suggestion of no-load funds (rated as the best option). When assessing the co-worker’s claim, ChatGPT urges us to pay attention to the following (in addition to the fees): expense ratio, fund performance consistency, and the length of the holding period. In the output under the enhanced prompt, we observe the same numerical comparisons, but this time the numerical answers are provided. The answer for the back-end load fund, USD 184,912, is accurate; it differs from the correct answer by only USD 0.23 due to rounding (see Case #15 in Supplementary S2 for details). The answer for the front-end load fund, USD 184,898, is inaccurate by USD 14.23 relative to the correct answer of USD 184,912.23 (see Case #15 in Supplementary S2 for details). The same argument in favor of the back-end load fund is observed under the enhanced prompt if the fund is held for the long term. As to the co-worker’s claim, ChatGPT suggests taking liquidity needs, time horizon, and flexibility into consideration. Both ChatGPT-4o and ChatGPT-5 calculate the ending value of the fund with back-end load fees correctly. For the ending value of the fund with front-end load fees, the answers are inaccurate in both models, but significantly more so under ChatGPT-4o. Overall, the newest model of ChatGPT is more numerically accurate. Both models produce somewhat similar responses to the claim made by the co-worker.
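To illustrate why ChatGPT’s assertion that the two funds have the same after-fee future value is correct, the minimal sketch below compares the two calculations. The figures (initial investment, annual return, load percentage, and holding period) are hypothetical illustrations chosen for clarity; they are not the inputs from Case #15, which appear in Supplementary S2.

```python
# Minimal sketch with hypothetical figures (not the case's actual inputs):
# when the load percentage and the annual return are the same, a front-end
# and a back-end load fund produce identical after-fee future values,
# because multiplying by (1 - load) commutes with compounding.

principal = 50_000      # hypothetical initial investment
annual_return = 0.07    # hypothetical annual return
years = 20              # hypothetical holding period
load = 0.05             # hypothetical load, charged up front or at sale

growth = (1 + annual_return) ** years

fv_front_end = principal * (1 - load) * growth   # fee deducted before investing
fv_back_end = principal * growth * (1 - load)    # fee deducted at withdrawal

print(f"Front-end load FV: {fv_front_end:,.2f}")
print(f"Back-end load FV:  {fv_back_end:,.2f}")  # identical to the front-end value
```

Because the two expressions are algebraically identical, any gap between the reported answers for the two funds must come from the execution of the arithmetic rather than from the setup.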
In Case #19, we have a retirement problem with an inheritance. Both outputs set up the future value at retirement correctly. The final answer in the enhanced output (USD 400,246) differs by USD 1.48 from the correct answer (USD 400,244.52). The final answer in the regular output (USD 400,265) differs by USD 20.48 from the correct answer; the setup there is also correct, but ChatGPT rounds inconsistently along the way (see Case #19 in Supplementary S2 for details). The final answers from ChatGPT-5 are nevertheless a vast improvement over those from ChatGPT-4o. Overall, comparing the outputs from ChatGPT-4o and ChatGPT-5 in this section reveals that numerical precision markedly improved from ChatGPT-4o to ChatGPT-5, while the quality of the financial advice largely remained consistent.
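For readers who wish to verify this type of calculation independently, the sketch below shows the general structure of a retirement future-value problem that combines existing savings, level annual contributions, and a lump-sum inheritance, and how intermediate rounding can shift the final figure. All inputs are hypothetical; the actual amounts, rate, and horizon for Case #19 are given in Supplementary S2.

```python
# Minimal sketch with hypothetical inputs (not the figures from Case #19):
# future value at retirement = growth of current savings
#   + growth of a lump-sum inheritance received along the way
#   + future value of level end-of-year contributions.

current_savings = 60_000     # hypothetical
annual_contribution = 6_000  # hypothetical end-of-year deposits
inheritance = 40_000         # hypothetical lump sum received in year 10
years_to_retirement = 25     # hypothetical
rate = 0.06                  # hypothetical annual return

fv_savings = current_savings * (1 + rate) ** years_to_retirement
fv_inheritance = inheritance * (1 + rate) ** (years_to_retirement - 10)
fv_contributions = annual_contribution * (((1 + rate) ** years_to_retirement - 1) / rate)

total = fv_savings + fv_inheritance + fv_contributions
print(f"Projected balance at retirement: {total:,.2f}")

# Rounding the growth factors to a few decimals before multiplying, as a
# chatbot often does in its worked answers, shifts the final figure slightly.
rounded = (current_savings * round((1 + rate) ** years_to_retirement, 3)
           + inheritance * round((1 + rate) ** (years_to_retirement - 10), 3)
           + annual_contribution * round(((1 + rate) ** years_to_retirement - 1) / rate, 3))
print(f"With rounded factors:            {rounded:,.2f}")
```

Discrepancies on the order of a few dollars to a few tens of dollars, such as those reported above, are consistent with this kind of intermediate rounding rather than with an incorrect setup.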
5. Analysis: Limitations, Implications, and Contributions
Our study has the following limitations. We do not consider the ChatGPT models prior to ChatGPT-3.5 in our evaluation. We also do not have user data on how utilizing ChatGPT for personal finance questions affected the quality of users’ decisions. We further caution readers that the personal finance cases used in this study may not generalize to all demographic groups; new cases might be needed if a specific target group requires examination. The next two limitations relate to reproducibility issues in AI. ChatGPT may produce suggestions different from those reported here when the same cases are used again. Similarly, different approaches to and content for prompt engineering are also likely to produce outputs different from those we report here.
We do not study the different regulatory perspectives around the world on the use of AI tools for financial advice. For example, in the United States, robo-advisors are regulated by the Securities and Exchange Commission (). In the European Union (EU), robo-advising is regulated under the existing regulatory framework, and there are no specific laws targeting robo-advising (). ChatGPT differs from robo-advisors in that it is not subject to these regulations, and we are not aware of any regulations specifically targeting ChatGPT in the United States. The new EU law (the EU AI Act) regulates ChatGPT but does not deem it high-risk (). The law requires that generative AI tools be transparent about the use of AI in their answers and about the use of copyrighted materials in model training, and that they be designed so that users cannot generate illegal content. We encourage future research in this area.
Although we discuss the ethical and legal aspects of using AI for financial advice in our study, more research remains to be carried out in this area. There are news reports of AI chatbots encouraging their users to take their own lives (; ). Although these cases are not related to seeking financial advice, they demonstrate the ethical and legal issues surrounding the use of AI tools in general. For example, a user who has lost his or her job may blame the employer for all of his or her financial problems and feed this information into a chatbot; it is very important that the chatbot not encourage any illegal or unethical action towards the employer. Furthermore, an individual may act on financial advice from a chatbot and incur significant financial losses; the question then is whether he or she has any legal recourse. There is also the issue of AI being used to defraud investors () and even finance professionals (). Future research can shed light on these potential issues.
Our study also faces a limitation in terms of input from human advisors. We do not have data on human advisors’ answers to the personal finance cases in this study; instead, we rely on our expert opinion when comparing human advisors and ChatGPT. We also do not compare ChatGPT to robo-advisors. The next two limitations relate to setting the parameters for the use of AI in financial advice. We do not develop a comprehensive guideline for determining when users should rely on ChatGPT, human advisors, or a combination of both in their personal finance decisions, nor do we develop a theoretical model that might explain who is more likely to use each of these options. Another area that may benefit from further investigation is the empowering potential of ChatGPT. More studies on AI and financial advice are needed to determine whether tools such as ChatGPT “empower” financial-advice seekers, enabling them to better manage their finances, or merely give the impression of increased understanding without providing the benefits of customized and comprehensive financial planning.
Our study has several implications of interest to a broad audience. ChatGPT is a powerful tool for answering personal finance questions and supporting personal finance decisions. For a modest fee, ChatGPT has the potential to offer financial advice and planning to segments of the population who believe they cannot afford financial advice from a human advisor. ChatGPT also has the potential to increase the financial literacy of its users through its explanations and recommendations. However, users should be cognizant that ChatGPT offers only a portion of the solutions that may apply to them, as some will be missed; a competent human advisor can fill this void. Users should also be extra cautious in using ChatGPT for legal advice, as it may be vague, ambiguous, or misleading when it comes to potential legal issues in personal finance cases. Similarly, users should be aware that ChatGPT misses important tax considerations even with prompting; for example, it may commend behavior or produce recommendations without considering the impact of potential tax liabilities. Furthermore, users should not immediately rely on the numerical solutions from ChatGPT when making financial decisions, as these solutions occasionally contain errors. We have observed that ChatGPT can sound very confident regardless of the accuracy and suitability of its recommendations; therefore, users should not take ChatGPT’s output at face value. In addition to the accuracy and suitability of its financial advice, compassion is another area where ChatGPT needs improvement. Currently, even with prompting, its outputs do not mimic human compassion. For example, to a financially responsible person who was diagnosed with cancer and found himself in extreme financial difficulty due to healthcare costs (see Case #10 in Supplementary S1), ChatGPT’s departing words are “By reviewing your insurance, seeking additional financial assistance, resuming retirement contributions, rebuilding your emergency fund, and managing your debt carefully, you can improve your financial situation”. This is not the right tone for someone who is fighting for his life, and users should be cognizant of this limitation if they are in a personally challenging situation. Overall, the results from our study imply that ChatGPT has the potential to replace human advisors for simple personal finance decisions in the future. However, for more complex cases, there is a clear need for human financial advisors.
Our study makes several contributions to the literature. It advances practical and academic knowledge of AI applications in personal financial decisions and demonstrates both the usefulness and the limitations of using ChatGPT to seek personal finance advice. Testing a new model of ChatGPT (ChatGPT-4o) in the personal finance domain is another contribution. Our study also showcases an application of prompt engineering for improving personal finance advice, testing and examining the impact of prompt engineering on ChatGPT’s responses through a comparative analysis of ChatGPT’s regular and enhanced (prompt-engineered) outputs. Another important contribution is our comparative analysis of ChatGPT-4o and ChatGPT-3.5 in the financial advice domain, extended by a comparative analysis of ChatGPT-4o and ChatGPT-5 (based on a select number of cases). Developing benchmarks (a rubric) to compare and contrast different versions of ChatGPT for personal finance decisions is another helpful addition to the literature. Finally, in the discussion of this study’s limitations, we offer numerous directions for potential new research on AI and finance.
6. Discussions and Conclusions
This study examines the capability of ChatGPT-4o to help individuals with financial problems and challenges based on the personal finance cases developed by (). We see clear improvements in ChatGPT-4o compared to ChatGPT-3.5, such as more detailed suggestions and alternative solutions (out-of-the-box thinking). However, the newer model, in its current form, does not appear to be capable of replacing human financial advisors. This is because it tends to provide generalized advice, overlooks important aspects of the financial planning process, such as determining client goals and expectations, and makes mathematical errors in retirement problems. Moreover, ChatGPT sometimes lacks a moral or legal compass. For example, in the face of a clear legal violation, it advises obtaining legal counsel but fails to acknowledge that the actions taken in the case were unequivocally wrong. This could point to broader ethical shortcomings of ChatGPT that affect users beyond those seeking financial advice. In our preliminary analysis of ChatGPT-5, we see some of these concerns being mitigated (especially the accuracy in numerical problems). However, we do not observe vast differences in the quality of financial advice given by ChatGPT-4o and ChatGPT-5 in the cases that we analyze.
We find that the quality of financial advice generally improves (though not always) with prompt engineering. However, through prompt engineering, ChatGPT appears to mirror the focus of the user’s attention. If a user is not thinking about taxes, ChatGPT may still provide useful financial advice, but it may omit any consideration of taxes. An important aspect of seeking financial advice is discovering issues or concerns that are not necessarily on an individual’s radar. While a competent human financial advisor can bring such topics to our attention, at this point it is doubtful whether ChatGPT can do the same, even with prompt engineering. Furthermore, prompt engineering can be used to emulate human-like outputs such as compassion, yet we encounter a sense of “false or insincere compassion” throughout the advice given. We concur with () that ChatGPT has an advantage over human advisors thanks to its convenience and that it is a great platform for seeking general and initial financial advice. However, we suggest that this tool be used with great caution, as its omissions of important details, such as taxes and legal issues, could create problems for users. Finally, we believe that the benefits of using ChatGPT outweigh its drawbacks in the personal finance domain.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jrfm18120664/s1, Supplementary S1: Regular and Enhanced ChatGPT-4o Outputs for Cases 1–21 with Case Summaries; Supplementary S2: Regular and Enhanced ChatGPT-5 Outputs for Selected Cases (Cases 3, 4, 5, 6, 13, 15, and 19).
Author Contributions
Conceptualization, M.T.T.S. and S.R.; methodology, M.T.T.S. and S.R.; validation, M.T.T.S. and S.R.; formal analysis, M.T.T.S. and S.R.; investigation, M.T.T.S. and S.R.; resources, M.T.T.S. and S.R.; data curation, M.T.T.S.; writing—original draft preparation, M.T.T.S.; writing—review and editing, M.T.T.S. and S.R.; visualization, M.T.T.S.; supervision, M.T.T.S. and S.R.; project administration, M.T.T.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding. The APC was funded by the authors.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study (the 21 personal finance cases and ChatGPT outputs) are available in the Supplementary Materials (Supplementary S1 and S2) accompanying this article. The original cases were developed by () and are available in their published article. No new datasets were created in this study.
Conflicts of Interest
Sterling Raskie is employed at Blankenship Financial Planning as the Vice President. He has not received any funding from Blankenship Financial Planning to conduct this research. The expert opinions that he shared in this study are his own, not those of Blankenship Financial Planning. All the authors declare no conflicts of interest.
Notes
1. Dr. Minh Tam Schlosky has been teaching courses in finance, economics, and statistics for over a decade and maintains an active research agenda. Dr. Sterling Raskie is a CFP® and a Vice President in a financial planning firm with over 20 years of professional experience; he also teaches finance courses at the graduate and undergraduate levels. Dr. Schlosky and Dr. Raskie collaboratively evaluated each case.
2. https://www.asreb.com/2014/04/question-can-keep-500000-found-walls-house/, accessed on 30 August 2025.
3. https://www.ssa.gov/pubs/EN-0.pdf, accessed on 30 August 2025.
References
- Ames, O. (2024). Leveraging AI for financial literacy: A comprehensive guide. EastRise Credit Union. [Google Scholar]
- CFA Institute. (2020). Robo-advisors. Theme: Technology. CFA Institute Research & Policy Center. [Google Scholar]
- Chatterjee, S., & Fan, L. (2023). Surviving in financial advice deserts: Limited access to financial advice and retirement planning behavior. International Journal of Bank Marketing, 41(1), 70–106. [Google Scholar] [CrossRef]
- Chen, H., & Magramo, K. (2024, February 4). Finance worker pays out $25 million after video call with deepfake ‘chief financial officer’. CNN. [Google Scholar]
- Duffy, C. (2025, April 3). Senators demand information from AI companion apps following kids’ safety concerns, lawsuits. CNN. [Google Scholar]
- Dunne, R. (2024, November 4). Growing number of Gen Z & millennials using AI for financial guidance, survey shows. Mugglehead Magazine. [Google Scholar]
- El Atillah, I. (2023, March 31). Man ends his life after an AI chatbot ‘encouraged’ him to sacrifice himself to stop climate change. Euronews. [Google Scholar]
- FINRA. (2024). Artificial intelligence and investment fraud. Available online: https://www.finra.org/investors/insights/artificial-intelligence-and-investment-fraud (accessed on 30 August 2025).
- Gibson, K. (2024, November 26). Child care in the U.S. today can cost more than families pay for rent, a mortgage or college tuition. CBS News. [Google Scholar]
- Ludwig, E. T., & Bennetts, C. R. (2023). Streamlining financial planning with ChatGPT: A collaborative approach between technology and human expertise. Financial Planning Association. [Google Scholar]
- Meineke, M. (2024). Can you answer these 3 questions about your finances? The majority of US adults cannot. World Economic Forum. Available online: https://www.weforum.org/stories/2024/04/financial-literacy-money-education/ (accessed on 30 August 2025).
- Niszczota, P., & Abbas, S. (2023). GPT has become financially literate: Insights from financial literacy tests of GPT and a preliminary test of how people use it as a source of advice. Finance Research Letters, 58, 104333. [Google Scholar] [CrossRef]
- Oehler, A., & Horn, M. (2024). Does ChatGPT provide better advice than robo-advisors? Finance Research Letters, 60, 104898. [Google Scholar] [CrossRef]
- Ortiz, S. (2024a). The best AI chatbots of 2025: ChatGPT, Copilot, and notable alternatives. ZDNET. [Google Scholar]
- Ortiz, S. (2024b). What is ChatGPT? How the world’s most popular AI chatbot can benefit you. ZDNET. [Google Scholar]
- Pelster, M., & Val, J. (2024). Can ChatGPT assist in picking stocks? Finance Research Letters, 59, 104786. [Google Scholar] [CrossRef]
- Roberts, M. (2023). Does generative AI solve the financial literacy problem? Knowledge at Wharton. Available online: https://knowledge.wharton.upenn.edu/article/does-generative-ai-solve-the-financial-literacy-problem/ (accessed on 30 August 2025).
- Schlosky, M. T., Karadas, S., & Raskie, S. (2024). ChatGPT, help! I am in financial trouble. Journal of Risk and Financial Management, 17(6), 241. [Google Scholar] [CrossRef]
- Taleo Consulting. (2022). Robo-advice in the financial sector. Available online: https://www.taleo-consulting.com/robo-advice-in-the-financial-sector/ (accessed on 30 August 2025).
- The European Parliament. (2023). EU AI Act: First regulation on artificial intelligence. Updated on February 19, 2025. Available online: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (accessed on 30 August 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).