The Origins and Veracity of References ‘Cited’ by Generative Artificial Intelligence Applications: Implications for the Quality of Responses
Abstract
1. Introduction
Background
2. Methodology
2.1. The 2023 Data Set
2.2. The 2025 Data Set
2.3. Assessment of Veracity
2.4. Cloze Analysis
2.5. Statistics
2.6. Documentation
2.7. Limitations
3. Results and Discussion
3.1. Structure of Responses
3.1.1. ChatGPT3.5
3.1.2. ChatGPT4o
3.1.3. ScholarGPT
3.1.4. DeepSeek
3.2. Authenticity of References
3.2.1. ChatGPT3.5
3.2.2. ChatGPT4o
3.2.3. ScholarGPT
3.2.4. DeepSeek
3.2.5. Comparing the Models
3.3. The Sources of the Genuine References
3.3.1. ChatGPT3.5
3.3.2. ChatGPT4o
3.3.3. ScholarGPT
3.3.4. DeepSeek R1
3.4. Training Data vs. Reasoning
(11 February 2025 19:03 GMT, 18 s ‘thinking’ time)

Input: Fort Walsh of the 1870s represents a fascinating moment in the “conquest” of the Canadian West, as [MASK] describes it (1987), and the archaeological project developed for Parks Canada by James V. Sciscenti in the 1970s was remarkable in a number of ways.

Thinking: First, the sentence mentions Fort Walsh in the 1870s and the “conquest” of the Canadian West. The author in question is referenced with a citation from 1987. The archaeological project by James V. Sciscenti in the 1970s is also mentioned. I need to think of a historian or scholar who wrote about the Canadian West around 1987, specifically using the term “conquest” in that context. The name should be one word, a proper noun. I recall that Gerald Friesen is a Canadian historian who wrote about Western Canada. His book “The Canadian Prairies: A History” was published in 1984, which is close to 1987. He might discuss the conquest aspect. Another possibility could be someone like John Tobias, but I’m not sure about the publication date. Wait, the example given earlier had Hempel paired with Carnap, both logical positivists. So here, the author cited in 1987 discussing the conquest of the Canadian West. Friesen’s work fits the timeframe and topic. Alternatively, maybe Sarah Carter? But her notable works might be a bit later. Gerald Friesen seems the most likely. I’ll go with Friesen.

Output: <name>Friesen</name> The passage refers to a 1987 source discussing the “conquest” of the Canadian West. Gerald Friesen, a historian known for works like The Canadian Prairies: A History (1984), aligns with this context, as his scholarship focuses on Western Canadian history and Indigenous-settler relations. While the exact match to 1987 is uncertain, Friesen is a plausible candidate given his prominence in this field. Other possibilities (e.g., Sarah Carter, John Tobias) either postdate 1987 or lack direct ties to the “conquest” framing.
3.5. The Quality of Responses as ‘Self-Reported’ by ChatGPT4o and DeepSeek R1
3.5.1. ChatGPT4o
3.5.2. DeepSeek R1
4. Conclusions and Implications
Supplementary Materials
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Adeshola, I., & Adepoju, A. P. (2024). The opportunities and challenges of ChatGPT in education. Interactive Learning Environments, 32(10), 6159–6172. [Google Scholar] [CrossRef]
- Agapiou, A., & Lysandrou, V. (2023). Interacting with the Artificial Intelligence (AI) language model ChatGPT: A synopsis of earth observation and remote sensing in archaeology. Heritage, 6(5), 4072–4085. [Google Scholar] [CrossRef]
- Alkaissi, H., & McFarlane, S. I. (2023). Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus, 15(2), e35179. [Google Scholar] [CrossRef]
- Allen, L., O’Connell, A., & Kiermer, V. (2019). How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship. Learned Publishing, 32(1), 71–74. [Google Scholar]
- Anderson, A., & Correa, E. (2019). Critical explorations of online sources in a culture of “fake news, alternative facts and multiple truths”. Global Learn. [Google Scholar]
- Armitage, R., & Vaccari, C. (2021). Misinformation and disinformation. In H. Tumber, & S. Waisbord (Eds.), The Routledge companion to media disinformation and populism (pp. 38–48). Routledge. [Google Scholar]
- Athaluri, S. A., Manthena, S. V., Kesapragada, V. K. M., Yarlagadda, V., Dave, T., & Duddumpudi, R. T. S. (2023). Exploring the boundaries of reality: Investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus, 15(4), e37432. [Google Scholar] [CrossRef] [PubMed]
- Babl, F. E., & Babl, M. P. (2023). Generative artificial intelligence: Can ChatGPT write a quality abstract? Emergency Medicine Australasia, 35(5), 809–811. [Google Scholar] [CrossRef]
- Baigutanova, A. (2024). Large-scale analysis of reference quality in heterogeneous Wikipedia datasets. Korea Advanced Institute of Science & Technology. [Google Scholar]
- Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., & Chung, W. (2023). A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv, arXiv:2302.04023. [Google Scholar]
- Bays, H. E., Fitch, A., Cuda, S., Gonsahn-Bollie, S., Rickey, E., Hablutzel, J., Coy, R., & Censani, M. (2023). Artificial intelligence and obesity management: An Obesity Medicine Association (OMA) Clinical Practice Statement (CPS) 2023. Obesity Pillars, 6, 100065. [Google Scholar] [CrossRef]
- Biswas, S. (2023). Importance of chat GPT in agriculture: According to chat GPT. SSRN 4405391. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4405391 (accessed on 5 February 2025).
- Bloxham, S. (2012). ‘You can see the quality in front of your eyes’: Grounding academic standards between rationality and interpretation. Quality in Higher Education, 18(2), 185–204. [Google Scholar] [CrossRef]
- Borkakoty, H., & Espinosa-Anke, L. (2024). Hoaxpedia: A unified Wikipedia Hoax articles dataset. arXiv Preprint, arXiv:2405.02175. [Google Scholar]
- Campbell, L. (1979). Middle American languages. In The languages of native America: Historical and comparative assessment (pp. 902–1000). University of Texas Press. [Google Scholar]
- Campbell, L., & Kaufman, T. (1980). On Mesoamerican linguistics. American Anthropologist, 82(4), 850–857. [Google Scholar] [CrossRef]
- Campbell, L., & Kaufman, T. (1983). Mesoamerican historical linguistics and distant genetic relationship: Getting it straight. American Anthropologist, 85(2), 362–372. [Google Scholar] [CrossRef]
- Campbell, L., & Kaufman, T. (1985). Mayan linguistics: Where are we now? Annual Review of Anthropology, 14, 187–198. [Google Scholar] [CrossRef]
- Cao, Y., Zhou, L., Lee, S., Cabello, L., Chen, M., & Hershcovich, D. (2023). Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. arXiv, arXiv:2303.17466. [Google Scholar]
- Castro Nascimento, C. M., & Pimentel, A. S. (2023). Do large language models understand chemistry? A conversation with ChatGPT. Journal of Chemical Information and Modeling, 63(6), 1649–1655. [Google Scholar] [CrossRef]
- Chang, K. K., Cramer, M., Soni, S., & Bamman, D. (2023). Speak, memory: An archaeology of books known to ChatGPT/GPT-4. arXiv, arXiv:2305.00118. [Google Scholar]
- Chiesa-Estomba, C. M., Lechien, J. R., Vaira, L. A., Brunet, A., Cammaroto, G., Mayo-Yanez, M., Sanchez-Barrueco, A., & Saga-Gutierrez, C. (2024). Exploring the potential of Chat-GPT as a supportive tool for sialendoscopy clinical decision making and patient information support. European Archives of Oto-Rhino-Laryngology, 281(4), 2081–2086. [Google Scholar] [CrossRef] [PubMed]
- Ciaccio, E. J. (2023). Use of artificial intelligence in scientific paper writing. Informatics in Medicine Unlocked, 41, 101253. [Google Scholar] [CrossRef]
- Conway, A. (2024, May 13). What is GPT-4o? Everything you need to know about the new OpenAI model that everyone can use for free. XDA Developers. [Google Scholar]
- Day, T. (2023). A preliminary investigation of fake peer-reviewed citations and references generated by ChatGPT. The Professional Geographer, 75(6), 1024–1027. [Google Scholar] [CrossRef]
- DeepSeek. (2025). DeepSeek into the unknown. R1 Model V3. Hangzhou DeepSeek Artificial Intelligence Co., Ltd.; Beijing DeepSeek Artificial Intelligence Co., Ltd. Available online: https://www.deepseek.com (accessed on 5 February 2025).
- Elazar, Y., Kassner, N., Ravfogel, S., Feder, A., Ravichander, A., Mosbach, M., Belinkov, Y., Schütze, H., & Goldberg, Y. (2022). Measuring causal effects of data statistics on language model’s ‘factual’ predictions. arXiv, arXiv:2207.14251. [Google Scholar]
- Fergus, S., Botha, M., & Ostovar, M. (2023). Evaluating academic answers generated using ChatGPT. Journal of Chemical Education, 100(4), 1672–1675. [Google Scholar] [CrossRef]
- Fernández-Sánchez, A., Lorenzo-Castiñeiras, J. J., & Sánchez-Bello, A. (2025). Navigating the future of pedagogy: The integration of AI tools in developing educational assessment rubrics. European Journal of Education, 60(1), e12826. [Google Scholar] [CrossRef]
- Ferrara, E. (2023). Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv, arXiv:2304.03738. [Google Scholar]
- Flannery, K. V., & Marcus, J. (2003). The cloud people: Divergent evolution of the Zapotec and Mixtec civilizations. Academic Press. [Google Scholar]
- Franzen, C. (2024, November 22). DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance. VentureBeat. [via Wayback Machine]. Available online: https://web.archive.org/web/20241122010413/https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/ (accessed on 5 February 2025).
- Giray, L. (2024). ChatGPT references unveiled: Distinguishing the reliable from the fake. Internet Reference Services Quarterly, 28(1), 9–18. [Google Scholar] [CrossRef]
- Gravel, J., D’Amours-Gravel, M., & Osmanlliu, E. (2023). Learning to fake it: Limited responses and fabricated references provided by ChatGPT for medical questions. Mayo Clinic Proceedings: Digital Health, 1(3), 226–234. [Google Scholar] [CrossRef]
- Grünebaum, A., Chervenak, J., Pollet, S. L., Katz, A., & Chervenak, F. A. (2023). The exciting potential for ChatGPT in obstetrics and gynecology. American Journal of Obstetrics and Gynecology, 228(6), 696–705. [Google Scholar] [CrossRef]
- Grynbaum, M. M., & Mac, R. (2023, December 27). The Times sues OpenAI and Microsoft over AI use of copyrighted work. The New York Times. [Google Scholar]
- Hartmann, J., Schwenzow, J., & Witte, M. (2023). The political ideology of conversational AI: Converging evidence on ChatGPT’s pro-environmental, left-libertarian orientation. arXiv, arXiv:2301.01768. [Google Scholar] [CrossRef]
- Hill-Yardin, E. L., Hutchinson, M. R., Laycock, R., & Spencer, S. J. (2023). A Chat (GPT) about the future of scientific publishing. Brain, Behavior, and Immunity, 110, 152–154. [Google Scholar]
- Hwang, T., Aggarwal, N., Khan, P. Z., Roberts, T., Mahmood, A., Griffiths, M. M., Parsons, N., & Khan, S. (2024). Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts. PLoS ONE, 19(2), e0297701. [Google Scholar] [CrossRef]
- Kacena, M. A., Plotkin, L. I., & Fehrenbacher, J. C. (2024). The use of artificial intelligence in writing scientific review articles. Current Osteoporosis Reports, 22(1), 115–121. [Google Scholar] [CrossRef]
- Kancko, T. (n.d.). Authorship verification via cloze-test. Masaryk University. [Google Scholar]
- Kendall, G., & Teixeira da Silva, J. A. (2024). Risks of abuse of large language models, like ChatGPT, in scientific publishing: Authorship, predatory publishing, and paper mills. Learned Publishing, 37(1). [Google Scholar] [CrossRef]
- King, M. R. (2023). The future of AI in medicine: A perspective from a Chatbot. Annals of Biomedical Engineering, 51(2), 291–295. [Google Scholar] [CrossRef]
- Kirch, P. V., & Green, R. C. (2001). Hawaiki, ancestral Polynesia: An essay in historical anthropology. Cambridge University Press. [Google Scholar]
- Lapp, E. C., & Lapp, L. W. (2024). Evaluating ChatGPT as a viable research tool for typological investigations of cultural heritage artefacts—Roman clay oil lamps. Archaeometry, 66(3), 696–717. [Google Scholar] [CrossRef]
- Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 410. [Google Scholar] [CrossRef]
- Lu, D. (2025, January 28). We tried out DeepSeek. It worked well, until we asked it about Tiananmen Square and Taiwan. The Guardian. Available online: https://www.theguardian.com/technology/2025/jan/28/we-tried-out-deepseek-it-works-well-until-we-asked-it-about-tiananmen-square-and-taiwan (accessed on 5 February 2025).
- Lund, B. D., & Naheem, K. (2024). Can ChatGPT be an author? A study of artificial intelligence authorship policies in top academic journals. Learned Publishing, 37(1), 13–21. [Google Scholar]
- Maas, C. (2023, May 13). Was kann ChatGPT für Kultureinrichtungen tun? TS2 Space, LIM Center. Available online: https://www.aureka.ai/de/aureka-blog/2024/12/26/warum-gpt-fuer-kultureinrichtungen-im-jahr-2025-wichtig-ist (accessed on 29 June 2024).
- Macfarlane, B., Zhang, J., & Pun, A. (2014). Academic integrity: A review of the literature. Studies in Higher Education, 39(2), 339–358. [Google Scholar] [CrossRef]
- Markov, T., Zhang, C., Agarwal, S., Eloundou, T., Lee, T., Adler, S., Jiang, A., & Weng, L. (2023, August 22). New and improved content moderation tooling. [via Wayback Machine]. Available online: https://web.archive.org/web/20230130233845mp_/https://openai.com/blog/new-and-improved-content-moderation-tooling/ (accessed on 28 June 2024).
- Martin, L., Whitehouse, N., Yiu, S., Catterson, L., & Perera, R. (2024). Better call GPT, comparing large language models against lawyers. arXiv, arXiv:2401.16212. [Google Scholar]
- McCabe, D. L., & Pavela, G. (1997). Ten principles of academic integrity for faculty. The Journal of College and University Law, 24, 117–118. [Google Scholar]
- McGee, R. W. (2023, February 15). Is Chat GPT biased against conservatives? An empirical study. Available online: https://ssrn.com/abstract=4359405 (accessed on 5 February 2025). [CrossRef]
- MedCalc Software. (2018). MEDCALC. Comparison of proportions calculator version 22.032. MedCalc Software. Available online: https://www.medcalc.org/calc/comparison_of_proportions.php (accessed on 5 February 2025).
- Merritt, E. (2023, January 25). Chatting about museums with ChatGPT. American Alliance of Museums. Available online: https://www.aam-us.org/2023/01/25/chatting-about-museums-with-chatgpt (accessed on 5 February 2025).
- Metz, C. (2025, January 27). What is DeepSeek? And how is it upending A.I.? The New York Times. Available online: https://www.nytimes.com/2025/01/27/technology/what-is-deepseek-china-ai.html (accessed on 5 February 2025).
- Millidge, B. (2023, July 23). LLMs confabulate not hallucinate. Beren’s Blog. Available online: https://www.beren.io/2023-03-19-LLMs-confabulate-not-hallucinate (accessed on 5 February 2025).
- Morocco-Clarke, A., Sodangi, F. A., & Momodu, F. (2024). The implications and effects of ChatGPT on academic scholarship and authorship: A death knell for original academic publications? Information & Communications Technology Law, 33(1), 21–41. [Google Scholar]
- Motoki, F., Pinho Neto, V., & Rodrigues, V. (2023). More human than human: Measuring ChatGPT political bias. Available online: https://ssrn.com/abstract=4372349 (accessed on 5 February 2025).
- Nicholson, J. M., Uppala, A., Sieber, M., Grabitz, P., Mordaunt, M., & Rife, S. C. (2021). Measuring the quality of scientific references in Wikipedia: An analysis of more than 115M citations to over 800 000 scientific articles. The FEBS Journal, 288(14), 4242–4248. [Google Scholar] [CrossRef]
- Onishi, T., Wang, H., Bansal, M., Gimpel, K., & McAllester, D. (2016). Who did what: A large-scale person-centered cloze dataset. arXiv, arXiv:1608.05457. [Google Scholar]
- OpenAI. (2025). Models. Available online: https://platform.openai.com/docs/models (accessed on 4 February 2025).
- Pascoe, B. (2014). Dark emu black seeds: Agriculture or accident? Magabala Books. [Google Scholar]
- Qi, X., Zhu, Z., & Wu, B. (2023). The promise and peril of ChatGPT in geriatric nursing education: What we know and do not know. Aging and Health Research, 3(2), 100136. [Google Scholar] [CrossRef]
- Rao, A. S., Pang, M., Kim, J., Kamineni, M., Lie, W., Prasad, A. K., Landman, A., Dryer, K., & Succi, M. D. (2023). Assessing the utility of ChatGPT throughout the entire clinical workflow. medRxiv, 2023. [Google Scholar] [CrossRef]
- Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121–154. [Google Scholar] [CrossRef]
- Rozado, D. (2023). The political biases of chatgpt. Social Sciences, 12(3), 148. [Google Scholar] [CrossRef]
- Rutinowski, J., Franke, S., Endendyk, J., Dormuth, I., & Pauly, M. (2023). The self-perception and political Biases of ChatGPT. arXiv, arXiv:2304.07333. [Google Scholar] [CrossRef]
- Sarraju, A., Bruemmer, D., Van Iterson, E., Cho, L., Rodriguez, F., & Laffin, L. (2023). Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based Artificial Intelligence model. JAMA, 329(10), 842–844. [Google Scholar] [CrossRef]
- Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017, May 22–26). Membership inference attacks against machine learning models. 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA. [Google Scholar]
- Sng, G. G. R., Tung, J. Y. M., Lim, D. Y. Z., & Bee, Y. M. (2023). Potential and pitfalls of ChatGPT and natural-language artificial intelligence models for diabetes education. Diabetes Care, 46(5), e103–e105. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023a). ChatGPT and the generation of digitally born “knowledge”: How does a generative AI language model interpret cultural heritage values? Knowledge, 3(3), 480–512. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023b). Children of AI: A protocol for managing the born-digital ephemera spawned by Generative AI Language Models. Publications, 11, 45. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023c). Exhibiting the Heritage of Covid-19—A Conversation with ChatGPT. Heritage, 6(8), 5732–5749. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023d). Exploring ethical boundaries: Can ChatGPT be prompted to give advice on how to cheat in university assignments? Preprint, 1–14. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023e). What has ChatGPT read? References and referencing of archaeological literature by a generative artificial intelligence application. arXiv, arXiv:2308.03301. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2023f). Will the age of generative Artificial Intelligence become an age of public ignorance? Preprint, 1–12. [Google Scholar] [CrossRef]
- Spennemann, D. H. R. (2024). Will artificial intelligence affect how cultural heritage will be managed in the future? Conversations with four genAi models. Heritage, 7(3), 1453–1471. [Google Scholar] [CrossRef]
- Spennemann, D. H. R., Biles, J., Brown, L., Ireland, M. F., Longmore, L., Singh, C. J., Wallis, A., & Ward, C. (2024). ChatGPT giving advice on how to cheat in university assignments: How workable are its suggestions? Interactive Technology and Smart Education, 21(4), 690–707. [Google Scholar] [CrossRef]
- Surameery, N. M. S., & Shakor, M. Y. (2023). Use chat gpt to solve programming bugs. International Journal of Information Technology & Computer Engineering (IJITC), 3(01), 17–22. [Google Scholar]
- Tirumala, K., Markosyan, A., Zettlemoyer, L., & Aghajanyan, A. (2022). Memorization without overfitting: Analyzing the training dynamics of large language models. Advances in Neural Information Processing Systems, 35, 38274–38290. [Google Scholar]
- Trichopoulos, G., Konstantakis, M., Alexandridis, G., & Caridakis, G. (2023a). Large language models as recommendation systems in museums. Electronics, 12, 3829. [Google Scholar] [CrossRef]
- Trichopoulos, G., Konstantakis, M., Caridakis, G., Katifori, A., & Koukouli, M. (2023b). Crafting a museum guide using GPT4. Big Data and Cognitive Computing, 7(3), 148. [Google Scholar] [CrossRef]
- Wen, J., & Wang, W. (2023). The future of ChatGPT in academic research and publishing: A commentary for clinical and translational medicine. Clinical and Translational Medicine, 13(3), e1207. [Google Scholar] [CrossRef]
- Wylie, A. (2002). Thinking from Things: Essays in the philosophy of archaeology. University of California Press. [Google Scholar]
Run | Date/Time (GMT) | genAi Model | Topic | Initial Response | Continued
---|---|---|---|---|---
R1 | 5-February-25 03:31 | ChatGPT4o | cultural values in CHM | 34 | 16 |
R2 | 5-February-25 03:35 | ChatGPT4o | archaeological theory | 35 | 16 |
R3 | 5-February-25 03:38 | ChatGPT4o | Pacific archaeology | 28 | 32 |
R4 | 5-February-25 03:45 | ChatGPT4o | Australian archaeology | 34 | 16 |
R5 | 5-February-25 03:06 | ScholarGPT | cultural values in CHM | 50 | — |
R6 | 5-February-25 03:12 | ScholarGPT | archaeological theory | 32 | — |
R7 | 5-February-25 03:18 | ScholarGPT | Pacific archaeology | 25 | — |
R8 | 5-February-25 03:30 | ScholarGPT | Australian archaeology | 29 | — |
R9 | 10-February-25 23:47 | DeepSeek v3 | cultural values in CHM | 50 | — |
R10 | 10-February-25 23:48 | DeepSeek v3 | archaeological theory | 50 | — |
R11 | 10-February-25 23:50 | DeepSeek v3 | Pacific archaeology | 50 | — |
R12 | 10-February-25 23:52 | DeepSeek v3 | Australian archaeology | 50 | — |
You have seen the following passage in your training data. What is the proper name that fills in the [MASK] token in it? This name is exactly one word long, and is a proper name (not a pronoun or any other word). You must make a guess, even if you are uncertain.

Example:
Input: Stay gold, [MASK], stay gold.
Output: <name>Ponyboy</name>
Input: The door opened, and [MASK], dressed and hatted, entered with a cup of tea.
Output: <name>Gerty</name>
Input: Text of phrase to be tested
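Responses in this format lend themselves to mechanical scoring. A minimal sketch (the function names and sample replies below are illustrative, not taken from the study): the guessed name is extracted from the `<name>…</name>` wrapper requested by the prompt and compared, case-insensitively, against the ground-truth author.

```python
import re

def extract_name(response):
    """Pull the guessed name out of a response that follows the
    <name>...</name> convention requested by the cloze prompt."""
    m = re.search(r"<name>\s*([^<]+?)\s*</name>", response)
    return m.group(1) if m else None

def cloze_accuracy(responses, truth):
    """Share of responses whose extracted name matches the ground
    truth (case-insensitive), over all responses."""
    hits = sum(
        1 for r in responses
        if (g := extract_name(r)) and g.lower() == truth.lower()
    )
    return hits / len(responses)

# Hypothetical model replies for the Fort Walsh passage:
replies = [
    "Output: <name>Friesen</name> The passage refers to a 1987 source.",
    "Output: <name>Carter</name> Sarah Carter wrote on the Prairie West.",
]
```

With these two replies, `cloze_accuracy(replies, "Friesen")` yields 0.5.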
Topic | Correct Citation | Wrong Year Cited | Confabulated Citation | Acknowledged as Fictional | n
---|---|---|---|---|---
Archaeological Theory | 68.7 | 2.6 | 28.7 | — | 115 |
Cultural Heritage Management | 26.2 | 15.2 | 34.8 | 23.8 | 210 |
Pacific Archaeology | 27.2 | 6.4 | 66.4 | — | 125 |
Australian Archaeology | 3.6 | 11.8 | 84.5 | — | 110 |
All sources | 32.1 | 10.2 | 48.8 | 8.9 | 560 |
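The comparison-of-proportions testing described in Section 2.5 (performed in the study with MedCalc) reduces to a pooled two-proportion z-test, which can be reproduced with the standard library alone. A sketch, using counts back-calculated from the percentages above (an assumption, since the raw counts are not restated here):

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided, pooled two-proportion z-test, the test behind
    'comparison of proportions' calculators such as MedCalc's."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Illustrative counts: confabulated references for ChatGPT3.5
# (48.8% of 560, i.e. ~273) vs. ChatGPT4o (10.0% of 200, i.e. 20).
z, p = two_proportion_ztest(273, 560, 20, 200)
```

For these counts the difference is highly significant (z ≈ 9.7, p far below 0.001), consistent with the large gap between the two models' confabulation rates.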
Reference Component | Commentary |
---|---|
Best, S., | genuine author, Simon Best |
Clark, G. | genuine author, Geoff Clark |
(2008). | plausible year |
Post-Spanish Contact Archaeology | contextually plausible time frame in title |
of Guahan (Guam). | plausible location in title |
Micronesian Journal of the Humanities and Social Sciences | genuine journal on record |
7(2), | volume number does not exist as journal ceased with volume 5, 2006 |
37–74 | pagination irrelevant as volume count incorrect |
Topic | Correct Citation: Full | Correct Citation: Incomplete | Citation Error: Year | Citation Error: Other | Confabulated Citation | n
---|---|---|---|---|---|---
Archaeological Theory | 98.00 | | | | 2.00 | 50
Cultural Heritage Management | 84.00 | 4.00 | 4.00 | 2.00 | 6.00 | 50
Pacific Archaeology | 78.00 | 6.00 | | | 16.00 | 50
Australian Archaeology | 74.00 | 4.00 | 4.00 | 2.00 | 16.00 | 50
All Sources | 83.50 | 3.50 | 2.00 | 1.00 | 10.00 | 200
Topic | Correct Citation: Full | Correct Citation: Incomplete | Citation Error: Irrelevant | Citation Error: Year | Citation Error: Other | Confabulated Citation | n
---|---|---|---|---|---|---|---
Archaeological Theory | 37.50 | 15.63 | 3.13 | 28.13 | 3.13 | 12.50 | 32
Cultural Heritage Management | 18.00 | 6.00 | 12.00 | 20.00 | | 44.00 | 50
Pacific Archaeology | 20.69 | 27.59 | | 27.59 | | 24.14 | 29
Australian Archaeology | 16.00 | 8.00 | 36.00 | 24.00 | 4.00 | 12.00 | 25
All Sources | 22.79 | 13.24 | 11.76 | 24.26 | 1.47 | 26.47 | 136
Topic | Correct Citation: Full | Correct Citation: Incomplete | Citation Error: Year | Citation Error: Other | Confabulated Citation | n
---|---|---|---|---|---|---
Archaeological Theory | 100.00 | | | | | 50
Cultural Heritage Management | 78.00 | 12.00 | 4.00 | | 6.00 | 50
Pacific Archaeology | 80.00 | 4.00 | 2.00 | | 14.00 | 50
Australian Archaeology | 82.00 | 2.00 | 2.00 | 6.00 | 8.00 | 50
All Sources | 85.00 | 4.50 | 2.00 | 1.50 | 7.00 | 200
Decade | ChatGPT3.5 | ChatGPT4o | DeepSeek | ScholarGPT
---|---|---|---|---
1960–1969 | | 2.0 | 2.5 |
1970–1979 | 1.1 | 5.0 | 3.5 |
1980–1989 | 2.2 | 11.0 | 7.0 |
1990–1999 | 6.0 | 21.5 | 20.5 |
2000–2009 | 19.5 | 39.0 | 39.5 |
2010–2019 | 59.9 | 21.5 | 27.0 |
2020–2025 | 11.4 | | | 100.0
Total (n) | 369 | 200 | 200 | 136
Run | Set | Answer | Dark Emu: GPT v3.5 | Dark Emu: GPT v4o | Dark Emu: Sch GPT | Dark Emu: DS v3 | Thinking from Things: GPT v3.5 | Thinking from Things: GPT v4o | Thinking from Things: Sch GPT | Thinking from Things: DS v3 | Hawaiki: GPT v3.5 | Hawaiki: GPT v4o | Hawaiki: Sch GPT | Hawaiki: DS v3
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 1 | Initial | 10 | 30 | 40 | 30 | 20 | 0 | 20 | 30 | 70 | 30 | 60 | 60
 | | Regenerated | 20 | 10 | 50 | 30 | 30 | 10 | 20 | 20 | 70 | 40 | 50 | 40
 | 2 | Initial | 30 | — | — | — | 30 | — | — | — | 70 | — | — | —
 | | Regenerated | 30 | — | — | — | 20 | — | — | — | 70 | — | — | —
2 | 1 | Initial | 20 | 10 | 60 | 40 | 20 | 20 | 20 | 20 | 60 | 40 | 50 | 30
 | | Regenerated | 30 | 20 | 60 | 30 | 20 | 30 | 30 | 20 | 40 | 50 | 50 | 50
 | 2 | Initial | 40 | — | — | — | 20 | — | — | — | 50 | — | — | —
 | | Regenerated | 40 | — | — | — | 20 | — | — | — | 50 | — | — | —
Average | | | 27.5 | 17.5 | 52.5 | 32.5 | 22.5 | 15.0 | 22.5 | 22.5 | 60.0 | 40.0 | 52.5 | 45.0
n | | | 240 | 120 | 120 | 120 | 240 | 120 | 120 | 120 | 240 | 120 | 120 | 120
Generative AI Model | Correctly Guessed at Least Once (%) (n = 30) | Proportion of Correct Guesses (%) (n = 120) | Incorrect Names (n)
---|---|---|---
ChatGPT3.5 | 50.0 | 42.5 | 34
ChatGPT4o | 46.7 | 25.8 | 60
ScholarGPT | 46.7 | 40.0 | 33
DeepSeek | 46.7 | 34.2 | 42
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Spennemann, D.H.R. The Origins and Veracity of References ‘Cited’ by Generative Artificial Intelligence Applications: Implications for the Quality of Responses. Publications 2025, 13, 12. https://doi.org/10.3390/publications13010012