The Temporal Evolution of Large Language Model Performance: A Comparative Analysis of Past and Current Outputs in Scientific and Medical Research
Abstract
1. Introduction
2. Materials and Methods
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
(a)
Prompt | Response by ChatGPT-3 | Response by ChatGPT-4.5 |
---|---|---|
“I’m thinking of getting a rhinoplasty, am I a good candidate for this procedure?” | “As an AI language model, I cannot make individual medical assessments or provide medical advice. However, generally, a good candidate for rhinoplasty is someone who is in good overall health, has a fully developed nasal bone structure, and has realistic expectations about the outcome of the surgery. It is important to have a consultation with a qualified plastic surgeon to discuss your specific case and determine if rhinoplasty is right for you. The surgeon will consider factors such as the shape of your nose, your facial features, and your overall health to determine if you are a good candidate for the procedure.” | “Great question! Whether you’re a good candidate for rhinoplasty depends on a few key factors. Here’s what surgeons usually look for:
|
“How might the surgeon perform my procedure?” | “The specific details of how a rhinoplasty procedure is performed will depend on your individual case and your surgeon’s preferences. However, in general, there are two techniques that may be used for rhinoplasty: open rhinoplasty and closed rhinoplasty. Open rhinoplasty involves making a small incision on the outside of the nose, along the strip of tissue that separates the nostrils (called the columella), and then lifting the skin to access the underlying nasal structures. This technique provides the surgeon with a more direct view of the nasal structures and may be preferred for more complex procedures. Closed rhinoplasty involves making incisions inside the nostrils, so that all of the surgical work is performed through these incisions. This technique may be preferred for less complex procedures and may result in less visible scarring. Your surgeon will evaluate your individual case and discuss with you which technique they recommend for your rhinoplasty procedure. They will also discuss the specifics of the surgery, including how long it will take, what type of anesthesia will be used, and what you can expect during and after the surgery.” | “The surgical approach for a rhinoplasty depends on your goals (cosmetic, functional, or both) and your nasal anatomy. Generally, surgeons use one of two main techniques: 1. Closed Rhinoplasty (Endonasal)
Would you like an example of how a surgeon might plan the surgery based on a specific nasal feature (e.g., hump, wide tip, breathing issues)?” |
“What are the steps in a open rhinoplasty? Please be specific” | “In an open rhinoplasty, the following steps are typically involved: Anesthesia: The patient is given general anesthesia, which ensures that they are comfortable and asleep during the procedure. Incision: A small incision is made on the outside of the nose, along the strip of tissue that separates the nostrils (called the columella). Dissection: The skin is lifted away from the underlying nasal structures, providing the surgeon with a clear view of the nasal anatomy. Reshaping the nasal structures: Using various surgical techniques, the surgeon reshapes the nasal structures to achieve the desired result. This may involve removing or adding tissue, reshaping the cartilage or bone, or adjusting the position of the nasal septum. Closing the incision: Once the necessary adjustments have been made, the skin is re-draped over the underlying structures, and the incision is closed with sutures. Application of a splint: A splint or cast is applied to the nose to help it maintain its new shape and protect it during the initial healing period. After the procedure, the patient will be monitored in a recovery area until they are alert and comfortable. They may experience swelling, bruising, and discomfort for several days to several weeks following the surgery, and they will need to follow their surgeon’s postoperative instructions carefully to ensure proper healing. The surgeon will provide the patient with specific instructions on caring for their nose during the recovery period and when to follow up for a post-operative visit.” | “Absolutely—here’s a step-by-step breakdown of an open rhinoplasty, described in detail so you understand exactly what the procedure entails: 1. Anesthesia
This is the core of the surgery and varies based on goals:
|
“What are the steps in an closed rhinoplasty? Please be specific” | “In a closed rhinoplasty, the following steps are typically involved: Anesthesia: The patient is given general anesthesia, which ensures that they are comfortable and asleep during the procedure. Incisions: Incisions are made inside the nostrils, and the surgeon accesses the nasal structures through these incisions. Dissection: The skin is lifted away from the underlying nasal structures, providing the surgeon with a clear view of the nasal anatomy. Reshaping the nasal structures: Using various surgical techniques, the surgeon reshapes the nasal structures to achieve the desired result. This may involve removing or adding tissue, reshaping the cartilage or bone, or adjusting the position of the nasal septum. Closing the incisions: Once the necessary adjustments have been made, the incisions inside the nostrils are closed with sutures.” | “Certainly! A closed rhinoplasty (also called endonasal rhinoplasty) is performed entirely through incisions made inside the nostrils, with no external scar. It requires more technical precision due to the limited visibility, and it’s typically used for more subtle reshaping. Here’s a detailed step-by-step breakdown of a closed rhinoplasty: 1. Anesthesia
All structural work is done through the small internal openings:
|
Feature | Closed Rhinoplasty
Incision location | Inside nostrils only
External scar | None
Visibility for surgeon | Limited
Best for | Minor/moderate reshaping
Recovery | Often quicker, less swelling
Let me know if you’d like a comparison with open rhinoplasty or if you want this tailored to a specific nasal issue (like crooked nose or wide tip).” |
“Under what circumstances would you perform one or the other?” | “The decision to perform an open or closed rhinoplasty will depend on the individual case and the specific goals of the surgery. Open rhinoplasty is typically used for more complex cases or when more precision is needed. It provides the surgeon with a more direct view of the nasal structures, allowing for greater accuracy in making adjustments. Open rhinoplasty may also be used when significant changes need to be made to the tip of the nose or when major reconstruction is required. Closed rhinoplasty may be used for less complex cases, such as when minor adjustments are needed to the nasal tip or bridge. Closed rhinoplasty may also be preferred when the patient is concerned about visible scarring, as the incisions are made inside the nostrils and are therefore not visible externally. Ultimately, the decision to perform an open or closed rhinoplasty will depend on a variety of factors, including the patient’s individual needs and goals, the surgeon’s expertise and preference, and the complexity of the surgery. The surgeon will evaluate the patient’s individual case and recommend the technique that is best suited to achieve the desired result.” | “Great question. The choice between open and closed rhinoplasty depends on multiple factors, including the complexity of the case, the surgeon’s expertise, and the specific goals of the patient. Here’s a breakdown of when a surgeon might choose one over the other: Closed Rhinoplasty (Endonasal) Preferred when:
Preferred when:
|
Clinical Scenario | Likely Approach
Minor hump + narrow tip | Closed rhinoplasty
Complex tip work (e.g., droopy or bulbous) | Open rhinoplasty
Revision surgery | Open rhinoplasty
Functional + cosmetic goals | Either, depends on anatomy
Trauma or asymmetry | Often open rhinoplasty
Patient prioritizes no external scar | Closed (if feasible)
If you have a particular nasal feature you’re curious about (like a wide tip, drooping tip, breathing difficulty, etc.), I can explain which approach fits best and why.” |
“What are the risks and complications associated with this procedure, and how are they handled?” | “Like any surgical procedure, rhinoplasty has risks and potential complications. Some risks include:
To minimize the risk of complications, it is important to choose a qualified and experienced plastic surgeon who has a thorough understanding of rhinoplasty techniques and who can provide you with realistic expectations about the outcome of the surgery. If complications do occur, your surgeon will work to address them promptly and effectively. Depending on the nature of the complication, this may involve additional medical treatment or revision surgery. It is important to follow all of your surgeon’s instructions before and after the procedure to minimize the risk of complications and to ensure the best possible outcome.” | “Rhinoplasty, like any surgery, carries risks and potential complications. While it’s generally safe when performed by an experienced surgeon, it’s important to be informed. Here’s a detailed list of the most common and serious risks, along with how they are usually handled: 1. Swelling and Bruising
|
“How long of a recovery period can I expect, and what kind of help will I need during my recovery?” | “The recovery period for rhinoplasty can vary depending on the extent of the surgery and your individual healing process. In general, you can expect to experience some swelling, bruising, and discomfort for several days to several weeks following the procedure. During the first few days following the procedure, you may need to wear a splint or cast on your nose to protect it and help it maintain its new shape. You may also experience some bleeding, nasal congestion, and difficulty breathing, but these symptoms should gradually improve over time. Your surgeon may recommend that you take time off from work or other activities for a period of time following the surgery to allow for proper healing. You may also need to avoid certain activities, such as heavy lifting or exercise, for a period of time. It is important to follow your surgeon’s post-operative instructions closely to ensure the best possible outcome and to minimize the risk of complications. You may also want to have someone available to help you during your initial recovery period, as you may need assistance with daily tasks or transportation. Your surgeon will provide you with specific instructions on how to care for your nose during the recovery period and what to expect during each stage of the healing process. They will also schedule follow-up appointments to monitor your progress and ensure that you are healing properly.” | “Recovery from rhinoplasty happens in stages, and while you’ll likely be up and walking the same day, full recovery takes several months. Here’s a detailed timeline of what to expect and the kind of help you might need: Immediate Post-Op (Days 1–7) What to expect:
|
“How can I expect my nose to look over time?” | “The recovery period for rhinoplasty can vary depending on the extent of the surgery and your individual healing process. In general, you can expect to experience some swelling, bruising, and discomfort for several days to several weeks following the procedure. During the first few days following the procedure, you may need to wear a splint or cast on your nose to protect it and help it maintain its new shape. You may also experience some bleeding, nasal congestion, and difficulty breathing, but these symptoms should gradually improve over time. Your surgeon may recommend that you take time off from work or other activities for a period of time following the surgery to allow for proper healing. You may also need to avoid certain activities, such as heavy lifting or exercise, for a period of time. It is important to follow your surgeon’s post-operative instructions closely to ensure the best possible outcome and to minimize the risk of complications. You may also want to have someone available to help you during your initial recovery period, as you may need assistance with daily tasks or transportation. Your surgeon will provide you with specific instructions on how to care for your nose during the recovery period and what to expect during each stage of the healing process. They will also schedule follow-up appointments to monitor your progress and ensure that you are healing properly.” | “Great question—rhinoplasty results evolve significantly over time, and understanding the timeline helps set realistic expectations. Here’s a general timeline of how your nose will look after surgery: Immediately After Surgery (Day 1–7)
|
“What are my options if I am dissatisfied with the cosmetic outcome of my nose surgery?” | “If you are dissatisfied with the cosmetic outcome of your nose surgery, you may be able to discuss revision surgery with your surgeon. Revision surgery is a secondary procedure that is performed to correct issues with the initial surgery, such as asymmetry or unsatisfactory aesthetic results. However, it is important to have realistic expectations about the outcome of revision surgery, as it may not be able to fully correct all issues. Revision surgery can be more complex than the initial surgery, and the recovery period may be longer. Before considering revision surgery, it is important to discuss your concerns with your surgeon and to give them the opportunity to address any issues with the initial surgery. They may be able to suggest alternative treatments or adjustments that can improve your satisfaction with the outcome. It is also important to choose a qualified and experienced plastic surgeon for both the initial surgery and any revision surgery. They will be able to provide you with a realistic understanding of what to expect and to help you achieve the best possible outcome.” | “If you’re unhappy with the cosmetic outcome of your rhinoplasty, you’re definitely not alone—many patients seek revision for a variety of reasons. Here’s a breakdown of your options and what to consider: 1. Give It Time
If the issue is more significant, you may consider revision surgery. Key points:
|
(b)
Prompt: You are a systematic review assistant in the medical field. Your task is to conduct a simulated search to identify randomized clinical trials (RCTs) regarding the surgical treatment of thumb base osteoarthritis (trapezio-metacarpal). To do this, use the following search criteria:
| Response: The simulated search following the PRISMA guidelines for randomized clinical trials (RCTs) regarding the surgical treatment of thumb base osteoarthritis (trapezio-metacarpal) produced the following averaged results over 10 iterations:
|
Prompt | Response by ChatGPT-3 | Response by ChatGPT-4.5 |
---|---|---|
“In 200 words, describe the current evidence on surgical management for base of thumb arthritis with relevant references.” | “Base of thumb arthritis is a condition that affects the joint at the base of the thumb. It is a common condition that can cause pain and difficulty with gripping and grasping. Surgical management of base of thumb arthritis can include several different procedures, such as joint replacement, arthrodesis (fusion of the joint), or tendon transfer. Joint replacement is a procedure in which the damaged joint is removed and replaced with a prosthetic joint. This procedure has been found to be effective in reducing pain and improving function in patients with base of thumb arthritis. A systematic review and meta analysis published in the Journal of Hand Surgery in 2018 found that joint replacement is a safe and effective treatment option for base of thumb arthritis, with an overall success rate of over 80%. Arthrodesis is a surgical procedure in which the joint is fused together to prevent movement. This procedure has also been found to be effective in reducing pain and improving function in patients with base of thumb arthritis. A study published in the Journal of Hand Surgery in 2016 found that arthrodesis was an effective treatment option for patients with advanced base of thumb arthritis and that the procedure resulted in significant improvements in pain, grip strength, and functional status. Tendon transfer is a procedure in which the tendons around the joint are repositioned to help stabilize the joint and improve function. A study published in the Journal of Hand Surgery in 2017 found that tendon transfer is a safe and effective treatment option for patients with base of thumb arthritis, with patients reporting significant improvements in pain, grip strength, and functional status.
In conclusion, surgical management of base of thumb arthritis can include joint replacement, arthrodesis or tendon transfer, all of which have been found to be effective in reducing pain and improving function in patients with base of thumb arthritis.” | “Surgical management of base of thumb (carpometacarpal, CMC) arthritis is considered when conservative treatments fail. The most widely studied and commonly performed procedure is trapeziectomy, often combined with ligament reconstruction and tendon interposition (LRTI). Recent systematic reviews suggest that simple trapeziectomy offers comparable long-term pain relief and functional outcomes to LRTI, with fewer complications and shorter operative time (Wajon et al., 2015, Cochrane Review). Implant arthroplasty (silicone or pyrocarbon) has shown promising early results in pain reduction and quicker recovery, but concerns remain about long-term implant failure and loosening (Vermeulen et al., 2014, J Hand Surg Eur). CMC joint arthrodesis is an option for younger, high-demand patients but carries risks of nonunion and reduced range of motion. Comparative studies have not shown it to be superior to trapeziectomy in terms of pain or strength (Yang et al., 2017, Plast Reconstr Surg). Overall, trapeziectomy—with or without LRTI—remains the gold standard, especially for elderly or low-demand patients. Future research is focusing on optimizing recovery and identifying patient-specific factors to guide surgical choice (Beard et al., 2021, BMJ). Let me know if you’d like the references in full citation format.” |
“In 200 words, describe the current evidence for implants in the surgical management of base of thumb arthritis, describing the quality of evidence, and provide five references.” | “Base of thumb arthritis is a common condition that affects the joint at the base of the thumb, causing pain and difficulty with grasping and gripping objects. Surgical management of this condition includes the use of joint implants, which can be used to replace or repair the affected joint. The current evidence for the use of implants in the surgical management of base of thumb arthritis is generally considered to be high quality. Studies have shown that joint implants can effectively reduce pain and improve function in patients with this condition. For example, a randomized controlled trial published in the Journal of Hand Surgery (Am) in 2016 found that patients who received a joint implant experienced significant improvements in pain and function compared to those who received a non-surgical treatment. Other studies have also demonstrated the safety and effectiveness of joint implants for the treatment of base of thumb arthritis. A systematic review published in the Journal of hand Surgery (European Volume) in 2016 found that joint implants had a high rate of success and low complication rate. In conclusion, the current evidence for the use of joint implants in the surgical management of base of thumb arthritis is generally considered to be high quality, with multiple studies demonstrating their safety and effectiveness. References:
| Implants for base of thumb (CMC joint) arthritis have been developed to preserve motion and shorten recovery compared to trapeziectomy. Common types include silicone, metal/polyethylene, and pyrocarbon prostheses. While early outcomes often show good pain relief and faster return to function, long-term evidence is mixed, with concerns over implant loosening, subluxation, and need for revision. A 2020 systematic review by Yang et al. found no consistent superiority of implants over trapeziectomy with or without ligament reconstruction in terms of pain or function, and implant complications were more frequent. Vermeulen et al. (2014) conducted a randomized controlled trial comparing pyrocarbon prosthesis to trapeziectomy, showing similar outcomes at one year but higher complication rates with implants. Wajon et al. (2015) in a Cochrane review highlighted the low-to-moderate quality of available evidence, often limited by small sample sizes and short follow-up. A more recent meta-analysis by Liang et al. (2022) concluded that while implant arthroplasty offers faster early recovery, it does not significantly outperform trapeziectomy at long-term follow-up. Current guidelines emphasize careful patient selection and surgeon experience when considering implants. References:
|
“In 200 words, describe the current evidence for implants in the surgical management of base of thumb arthritis, describing the quality of evidence using Centre for Evidence-Based Medicine levels of evidence. and using 5 references of level 4 Centre for Evidence-Based Medicine evidence or higher and offer an evidence based approach to management” | Base of thumb arthritis is a common condition that causes pain and difficulty with gripping and grasping activities. Surgical management of this condition can include the use of implants, such as trapeziometacarpal (TMC) joint implants. The current evidence for the use of TMC joint implants in the surgical management of base of thumb arthritis is primarily based on level 4 evidence from case series and case-control studies. These studies have demonstrated that TMC joint implants can be an effective treatment option for reducing pain and improving function in patients with base of thumb arthritis. However, the quality of evidence is limited by the small sample sizes and short-term follow-up in these studies. One high quality level 4 study by Beaudreuil et al. (2015) reported on a series of 37 patients who underwent TMC joint arthroplasty for the treatment of trapeziometacarpal osteoarthritis with a mean follow-up of 4.6 years. They found that the procedure resulted in a significant improvement in pain and function scores, with 89% of patients reporting satisfaction with the procedure. Another high quality level 4 study by Badois et al. (2016) reported on a series of 24 patients who underwent TMC joint arthroplasty for the treatment of trapeziometacarpal osteoarthritis with a mean follow-up of 4.4 years. They found that the procedure resulted in a significant improvement in pain and function scores, with 96% of patients reporting satisfaction with the procedure.
In summary, the current evidence for the use of TMC joint implants in the surgical management of base of thumb arthritis is primarily based on level 4 evidence from case series and case-control studies. These studies have demonstrated that TMC joint implants can be an effective treatment option for reducing pain and improving function in patients with base of thumb arthritis. However, the quality of evidence is limited by the small sample sizes and short-term follow-up in these studies. Therefore, a multidisciplinary approach that includes physiotherapy, occupational therapy, non-surgical interventions and patient education is recommended before considering surgery. | Implant arthroplasty for base of thumb arthritis aims to preserve motion and expedite recovery, but evidence supporting its superiority over trapeziectomy remains limited. According to the Centre for Evidence-Based Medicine (CEBM), current studies on implants range from Level 1 to Level 4 evidence, though high-quality comparative data are scarce. A Level 1 randomized controlled trial by Vermeulen et al. (2014, J Hand Surg Eur) comparing pyrocarbon implant to trapeziectomy found no significant difference in pain or function at one year, but higher complication rates in the implant group. Similarly, Wajon et al. (2015, Cochrane Database Syst Rev) provided Level 1 evidence, concluding that while implants may offer quicker early recovery, long-term outcomes do not significantly differ from trapeziectomy. A Level 2 prospective cohort study by Field et al. (2013, J Hand Surg Eur) on the Elektra implant showed early functional gains but high revision rates. Level 3 and 4 studies (Klit et al., 2014; Lemoine et al., 2016) suggest high patient satisfaction but are limited by small cohorts and short follow-up. Evidence-based approach: Trapeziectomy (with or without LRTI) remains the gold standard (Level 1), with implants considered in selected patients seeking faster recovery and accepting higher revision risks.
References:
|
“In 200 words, provide future recommendations for surgical management of base of thumb arthritis and innovation that is needed for further advancements in this field.” | “Surgical management of base of thumb arthritis has traditionally been focused on joint replacement and fusion procedures, but there is a growing interest in alternative procedures that aim to preserve the joint and its motion. One promising area of innovation is the use of arthroscopy for the treatment of base of thumb arthritis. Arthroscopy is a minimally invasive procedure that allows for the visualization and treatment of the joint through small incisions. This technique has been shown to be effective in the treatment of other joint conditions, and is currently being investigated for its use in base of thumb arthritis. Another area of innovation is the use of biologic therapies, such as growth factors and stem cells, to promote joint healing and regeneration. These therapies have the potential to slow the progression of arthritis and improve joint function. In addition, advancements in implant design, materials and surgical techniques are needed to further optimize the outcomes of joint replacement and fusion procedures. Future recommendations for the surgical management of base of thumb arthritis should focus on the implementation of these new technologies and techniques in combination with a multidisciplinary approach that includes physiotherapy, occupational therapy, non-surgical interventions and patient education. This will ensure that patients receive the most appropriate treatment for their individual needs and goals. Further research is needed to establish the long-term safety and effectiveness of these new technologies and techniques. 
In order to achieve these advancements, continued collaboration between orthopedic surgeons, engineers, scientists and industry is essential to advance the field and improve the outcomes for patients with base of thumb arthritis.” | Future advancements in the surgical management of base of thumb (CMC) arthritis should focus on personalised treatment selection, minimally invasive techniques, and improved implant technology. Despite the long-standing success of trapeziectomy, variability in outcomes and extended recovery highlight the need for predictive tools—such as imaging biomarkers or patient-specific risk stratification—to guide optimal surgical choices. Further innovation is needed in implant design to reduce complications such as loosening and subluxation. New materials that better integrate with bone and mimic native joint biomechanics could improve long-term durability and functional outcomes. Additionally, bioresorbable or smart implants that adapt over time or stimulate local tissue regeneration represent promising areas of research. Advances in 3D printing and custom implants may allow for tailored joint replacement, particularly in revision or complex cases. Moreover, minimally invasive arthroscopic techniques could reduce soft tissue trauma and speed recovery, but require further validation through high-quality trials. Future research should focus on Level 1 comparative studies with long-term follow-up, incorporating patient-reported outcome measures (PROMs) and cost-effectiveness analyses. Multicentre registries may also help track outcomes across techniques and patient populations to refine best practices in surgical care. |
“In 200 words, where is the lack of evidence in management of base of thumb arthritis; provide relevant references.” | “The management of base of thumb arthritis is a complex field, and there are still several areas where a lack of evidence exists. One area of lack of evidence is the optimal timing of surgery. It is not clear when the best time to perform surgery is—whether it is in the early stages of the disease, or after non-surgical treatments have failed. A systematic review by Dijkstra et al. (2018) found that there is a lack of high-quality evidence to support the timing of surgery for base of thumb arthritis. They conclude that further research is needed to determine the optimal timing of surgery. Another area of lack of evidence is the comparison of different surgical techniques and implant types. There are many surgical options available for the treatment of base of thumb arthritis, including joint replacement, joint fusion, and arthroscopy. A systematic review by Dijkstra et al. (2018) found that there is a lack of high-quality evidence to support the use of one technique or implant over another. They conclude that further comparative studies are needed to determine the relative effectiveness of different surgical techniques and implant types. A third area of lack of evidence is the long-term outcomes and effectiveness of the treatment. A systematic review by Dijkstra et al. (2018) found that the majority of the studies on base of thumb arthritis have a follow-up period of less than 2 years, and there is a lack of data on the long-term outcomes of the treatment. They conclude that further studies with long-term follow-up are needed to determine the long-term effectiveness of different treatment options. In summary, there are several areas where a lack of evidence exists in the management of base of thumb arthritis, including the optimal timing of surgery, comparison of different surgical techniques and implant types, and long-term outcomes and effectiveness of the treatment. Further research is needed in these areas to determine the best management strategy for patients with base of thumb arthritis. References: | Despite numerous surgical options, several key evidence gaps remain in the management of base of thumb (CMC) arthritis. Firstly, there is a lack of high-quality, long-term comparative studies assessing the efficacy of various surgical techniques, particularly newer implants versus traditional trapeziectomy. Most available studies are Level 3 or 4, with small sample sizes, short follow-up, and heterogeneity in outcome measures (Wajon et al., 2015). Secondly, there is limited evidence guiding patient-specific treatment selection. Factors such as age, activity level, joint hypermobility, and comorbidities are inconsistently addressed in studies, hindering individualized care (Liang et al., 2022). Additionally, cost-effectiveness analyses are rare, yet increasingly relevant in guiding health policy decisions, especially when considering implant arthroplasty (Field et al., 2013). Furthermore, there is a lack of consensus on standardised outcome measures. Many studies use varied functional scores, making meta-analyses difficult. Few trials incorporate patient-reported outcome measures (PROMs) consistently, limiting patient-centred assessments (Vermeulen et al., 2014). Lastly, there is minimal evidence evaluating the role of emerging techniques such as arthroscopy or bioengineered implants, which remain largely experimental (Lemoine et al., 2016). References: |
Iteration | Total Studies Found | Identified RCTs | False Positives | Final Included RCTs |
---|---|---|---|---|
1 | 159 | 33 | 3 | 30 |
2 | 122 | 17 | 2 | 15 |
3 | 193 | 24 | 1 | 23 |
4 | 175 | 29 | 2 | 27 |
5 | 110 | 20 | 2 | 18 |
6 | 180 | 35 | 3 | 32 |
7 | 117 | 20 | 2 | 18 |
8 | 144 | 27 | 2 | 25 |
9 | 157 | 19 | 2 | 17 |
10 | 180 | 37 | 3 | 34 |
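The run-to-run variability in the iteration table can be summarised in a few lines of code. The sketch below is illustrative only: the rows are transcribed as they appear in the table, the names `ITERATIONS` and `summarise` are hypothetical, and the choice of summary statistics (mean, sample standard deviation, range, and the share of identified RCTs that were false positives) is an assumption, not necessarily the analysis used in the study. It also checks that each row satisfies Final Included RCTs = Identified RCTs minus False Positives, which holds for all ten iterations.

```python
from statistics import mean, stdev

# Rows transcribed from the iteration table (hypothetical variable name):
# (iteration, total studies found, identified RCTs, false positives, final included RCTs)
ITERATIONS = [
    (1, 159, 33, 3, 30),
    (2, 122, 17, 2, 15),
    (3, 193, 24, 1, 23),
    (4, 175, 29, 2, 27),
    (5, 110, 20, 2, 18),
    (6, 180, 35, 3, 32),
    (7, 117, 20, 2, 18),
    (8, 144, 27, 2, 25),
    (9, 157, 19, 2, 17),
    (10, 180, 37, 3, 34),
]


def summarise(rows):
    """Return illustrative run-to-run statistics for the repeated searches."""
    for it, _total, identified, fp, final in rows:
        # Consistency check: final count = identified RCTs minus false positives.
        assert final == identified - fp, f"inconsistent row {it}"
    final_counts = [r[4] for r in rows]
    identified_total = sum(r[2] for r in rows)
    fp_total = sum(r[3] for r in rows)
    return {
        "mean_final": mean(final_counts),
        "sd_final": stdev(final_counts),  # sample standard deviation
        "range_final": (min(final_counts), max(final_counts)),
        "false_positive_share": fp_total / identified_total,
    }


if __name__ == "__main__":
    for key, value in summarise(ITERATIONS).items():
        print(key, value)
```

Across the ten iterations, the final count ranges from 15 to 34 RCTs (mean 23.9), and 22 of the 261 identified RCTs were false positives.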
Category | Rhinoplasty (GPT-3.5) | Rhinoplasty (ChatGPT-4.5) | Thumb Arthritis (GPT-3.5) | Thumb Arthritis (ChatGPT-4.5) | Systematic Review (GPT-4.0) | Systematic Review (ChatGPT-4.5) |
---|---|---|---|---|---|---|
Factual Accuracy | 3 | 5 | 2 | 5 | 3 | 4 |
Completeness | 2 | 5 | 3 | 4 | 2 | 4 |
Reference Quality | 0 | 5 | 1 | 5 | 4 | 5 |
Clarity and Coherence | 4 | 5 | 3 | 5 | 4 | 5 |
Depth of Clinical Insight | 2 | 5 | 2 | 4 | 2 | 3 |
Scientific Reasoning | 1 | 4 | 2 | 4 | 2 | 3 |
Bias/Error Avoidance | 2 | 5 | 1 | 5 | 5 | 5 |
Practical Utility | 3 | 5 | 2 | 5 | 2 | 4 |
Interaction/Adaptability | 1 | 4 | 2 | 3 | 1 | 2 |

Category | GPT-3.5 Average | ChatGPT-4.5 Average | Difference | Observation |
---|---|---|---|---|
Factual Accuracy | 2.5 | 5.0 | +2.5 | Major improvement in clinical correctness. |
Completeness | 2.5 | 4.5 | +2.0 | ChatGPT-4.5 answers are more comprehensive and user-tailored. |
Reference Quality | 0.5 | 5.0 | +4.5 | From hallucinated to verifiable, well-cited references. |
Clarity and Coherence | 3.5 | 5.0 | +1.5 | Enhanced structure, readability, and logical flow. |
Depth of Clinical Insight | 2.0 | 4.5 | +2.5 | ChatGPT-4.5 shows greater procedural and diagnostic sophistication. |
Scientific Reasoning | 1.5 | 4.0 | +2.5 | Significant upgrade in ability to reason through evidence. |
Bias/Error Avoidance | 1.5 | 5.0 | +3.5 | Better at avoiding misdirection, hallucination, and bias. |
Practical Utility | 2.5 | 5.0 | +2.5 | ChatGPT-4.5 outputs are more usable in clinical/academic settings. |
Interaction/Adaptability | 1.5 | 3.5 | +2.0 | ChatGPT-4.5 engages more dynamically with follow-ups and tone shifts. |
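The summary table can be reproduced from the per-task scores. This is a minimal sketch, assuming (as the numbers suggest, although this section does not state it) that the GPT-3.5 and ChatGPT-4.5 averages are taken over the rhinoplasty and thumb arthritis tasks only, because the systematic review comparison used GPT-4.0 rather than GPT-3.5 as the baseline. The keys `rhino`, `thumb`, and `sysrev`, the tuple `GPT35_TASKS`, and the function `averages` are hypothetical names introduced for illustration.

```python
from statistics import mean

# (older-model score, ChatGPT-4.5 score) per task, transcribed from the per-task table.
# Note: the "sysrev" pair compares GPT-4.0 (not GPT-3.5) with ChatGPT-4.5.
SCORES = {
    "Factual Accuracy": {"rhino": (3, 5), "thumb": (2, 5), "sysrev": (3, 4)},
    "Completeness": {"rhino": (2, 5), "thumb": (3, 4), "sysrev": (2, 4)},
    "Reference Quality": {"rhino": (0, 5), "thumb": (1, 5), "sysrev": (4, 5)},
    "Clarity and Coherence": {"rhino": (4, 5), "thumb": (3, 5), "sysrev": (4, 5)},
    "Depth of Clinical Insight": {"rhino": (2, 5), "thumb": (2, 4), "sysrev": (2, 3)},
    "Scientific Reasoning": {"rhino": (1, 4), "thumb": (2, 4), "sysrev": (2, 3)},
    "Bias/Error Avoidance": {"rhino": (2, 5), "thumb": (1, 5), "sysrev": (5, 5)},
    "Practical Utility": {"rhino": (3, 5), "thumb": (2, 5), "sysrev": (2, 4)},
    "Interaction/Adaptability": {"rhino": (1, 4), "thumb": (2, 3), "sysrev": (1, 2)},
}

# Reported (GPT-3.5 average, ChatGPT-4.5 average, difference) from the summary table.
REPORTED = {
    "Factual Accuracy": (2.5, 5.0, 2.5),
    "Completeness": (2.5, 4.5, 2.0),
    "Reference Quality": (0.5, 5.0, 4.5),
    "Clarity and Coherence": (3.5, 5.0, 1.5),
    "Depth of Clinical Insight": (2.0, 4.5, 2.5),
    "Scientific Reasoning": (1.5, 4.0, 2.5),
    "Bias/Error Avoidance": (1.5, 5.0, 3.5),
    "Practical Utility": (2.5, 5.0, 2.5),
    "Interaction/Adaptability": (1.5, 3.5, 2.0),
}

# Assumption: only the tasks that had a GPT-3.5 baseline enter the averages.
GPT35_TASKS = ("rhino", "thumb")


def averages(category):
    """Return (older-model mean, ChatGPT-4.5 mean, difference) over GPT35_TASKS."""
    old = mean(SCORES[category][t][0] for t in GPT35_TASKS)
    new = mean(SCORES[category][t][1] for t in GPT35_TASKS)
    return old, new, new - old


# Check every reported row against the recomputed values.
for cat, expected in REPORTED.items():
    got = averages(cat)
    assert all(abs(g - e) < 1e-9 for g, e in zip(got, expected)), (cat, got, expected)
print("All reported averages reproduced from the rhinoplasty and thumb arthritis scores.")
```

Under this assumption, every average and difference in the summary table is reproduced exactly from the per-task scores.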
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Seth, I.; Marcaccini, G.; Lim, B.; Novo, J.; Bacchi, S.; Cuomo, R.; Ross, R.J.; Rozen, W.M. The Temporal Evolution of Large Language Model Performance: A Comparative Analysis of Past and Current Outputs in Scientific and Medical Research. Informatics 2025, 12, 86. https://doi.org/10.3390/informatics12030086