Assessment Validity in the Age of Generative AI: A Natural Experiment
Abstract
1. Introduction
2. Theory
2.1. Assessment as a Measurement System
2.2. AI as a Low-Cost Cognitive Substitute and Performance Mediator
2.3. Asymmetric Benefit Distribution, Variance Compression, and Grade Instability
3. Method
3.1. Analytical Strategy
3.2. Statistical Analysis
3.3. Ethics and Use of AI
4. Results
4.1. Descriptive Grade Distributions
4.2. Baseline Versus 2025 Comparison
4.3. Category-Specific Contributions to the Chi-Square Statistic
5. Discussion
5.1. Credibility and the University’s Certification Function
5.2. AI Dependency and the Fragility of Unaided Competence
5.3. Authenticity, Relevance, and the Limits of AI Restriction
6. Limitations and Future Research
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| Sample Questions and Main Grading Criteria | 2024 Exam (Take-Home Format) | 2025 Exam (In-Person Exam Format) |
|---|---|---|
| Sample question | Discuss strategy and efficiency in [case company] in light of the challenges the company faces in its environment. Use theories and models from the course literature to justify your answer. | Explain the concept of learning and discuss what can be done to strengthen learning in [case company], in light of relevant theories and models from the syllabus. |
| Sample question | [Case company] faces challenges related to the practice of leadership. Identify some of these challenges and discuss how leadership practice can be developed to address them. Use theories and models from the course literature to justify your answer. | Explain the concept of leadership and discuss how leadership can be exercised to ensure continued strong performance in [case company], in light of relevant theories and models from the syllabus. |
| Sample question | Discuss how [case company] takes care of its employees in terms of motivation and performance, and what can be done to promote motivation and retain employees. Use theories and models from the course literature to justify your answer. | Discuss factors that may promote and hinder employee motivation in [case company], in light of relevant theories and models from the syllabus. |
| Main grading criteria | For all questions, the case must be actively incorporated and discussed in the response. To pass, students are required to apply theories and models from the syllabus when discussing and analyzing various situations in the organization. It should be clearly evident that they are “consulting” the course literature. Responses in which the presentation of relevant theories and models from the syllabus is entirely absent will be considered a fail. | For all questions, the case must be actively incorporated and discussed in the response. To pass, students are required to apply theories and models from the syllabus when discussing and analyzing various situations in the organization. It should be clearly evident that they are “consulting” the course literature. Responses in which the presentation of relevant theories and models from the syllabus is entirely absent will be considered a fail. |

| Grade | 2021 (N = 263) | 2022 (N = 230) | 2023 (N = 211) | 2024 (N = 187) | 2025 (N = 179) |
|---|---|---|---|---|---|
| A | 18 (6.8%) | 16 (7.0%) | 19 (9.0%) | 13 (7.0%) | 13 (7.3%) |
| B | 56 (21.3%) | 53 (23.0%) | 53 (25.1%) | 46 (24.6%) | 30 (16.8%) |
| C | 85 (32.3%) | 68 (29.6%) | 59 (28.0%) | 58 (31.0%) | 34 (19.0%) |
| D | 59 (22.4%) | 50 (21.7%) | 45 (21.3%) | 45 (24.1%) | 40 (22.4%) |
| E | 34 (12.9%) | 32 (13.9%) | 23 (10.9%) | 20 (10.7%) | 29 (16.2%) |
| F | 8 (3.0%) | 11 (4.8%) | 12 (5.7%) | 4 (2.1%) | 33 (18.4%) |
| Total | 263 (100%) | 230 (100%) | 211 (100%) | 187 (100%) | 179 (100%) |

| Grade | Baseline (n) | 2025 (n) | Expected 2025 | χ² Baseline | χ² 2025 |
|---|---|---|---|---|---|
| A | 66 | 13 | 13.3 | 0.001 | 0.005 |
| B | 208 | 30 | 40 | 0.5 | 2.48 |
| C | 270 | 34 | 51.1 | 1.15 | 5.69 |
| D | 199 | 40 | 40.1 | 0 | 0 |
| E | 109 | 29 | 23.2 | 0.3 | 1.47 |
| F | 35 | 33 | 11.4 | 8.23 | 40.79 |
| Sum | 887 | 179 | 179.1 | 10.18 | 50.44 |
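The expected counts and per-cell contributions above are consistent with a standard two-sample chi-square test of homogeneity on the pooled baseline (2021–2024) versus 2025 grade counts. The following is a minimal sketch of that computation, assuming SciPy is available; the counts are taken from the table, and the homogeneity framing is inferred from the reported expected values and degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Grade counts A-F from the table: pooled baseline 2021-2024 vs. 2025.
baseline = np.array([66, 208, 270, 199, 109, 35])
year2025 = np.array([13, 30, 34, 40, 29, 33])

# 2 x 6 contingency table; row 0 = baseline, row 1 = 2025.
table = np.vstack([baseline, year2025])
chi2, p, dof, expected = chi2_contingency(table)

# Per-cell contributions (O - E)^2 / E, as reported in the table.
contrib = (table - expected) ** 2 / expected

print(round(chi2, 3), dof)         # ~60.618, df = 5
print(np.round(expected[1], 1))    # expected 2025 counts (F ~ 11.4)
print(np.round(contrib[1], 2))     # 2025 contributions (F ~ 40.79)
```

The dominant contribution comes from the F cell in 2025 (33 observed against roughly 11.4 expected), matching the pattern reported in the table.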

| Comparison | Chi-Square | df | Cramér’s V | p |
|---|---|---|---|---|
| Baseline 2021–2024 vs. 2025 | 60.618 | 5 | 0.238 | <0.001 |
| 2021 vs. 2025 | 36.095 | 5 | 0.287 | <0.001 |
| 2022 vs. 2025 | 24.294 | 5 | 0.244 | <0.001 |
| 2023 vs. 2025 | 22.531 | 5 | 0.24 | <0.001 |
| 2024 vs. 2025 | 34.185 | 5 | 0.306 | <0.001 |
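The Cramér’s V values in the table can be recovered from the chi-square statistics and the combined sample sizes; for these 2 × 6 comparisons, min(r − 1, c − 1) = 1, so V reduces to √(χ²/N). A quick check, with sample sizes taken from the grade-distribution table above:

```python
import math

# Cramér's V for an r x c contingency table:
#   V = sqrt(chi2 / (N * min(r - 1, c - 1)))
# For these 2 x 6 grade comparisons, min(r - 1, c - 1) = 1.
def cramers_v(chi2: float, n: int, r: int = 2, c: int = 6) -> float:
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))

# Pooled baseline (N = 887) vs. 2025 (N = 179), chi2 = 60.618:
v_pooled = cramers_v(60.618, 887 + 179)
print(round(v_pooled, 3))  # 0.238

# 2024 (N = 187) vs. 2025 (N = 179), chi2 = 34.185:
v_2024 = cramers_v(34.185, 187 + 179)
print(round(v_2024, 3))  # 0.306
```

Both values reproduce the table’s reported effect sizes, which fall in the small-to-medium range by conventional benchmarks.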

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Brattli, H.; Utne, A.; Lynch, M. Assessment Validity in the Age of Generative AI: A Natural Experiment. Informatics 2026, 13, 56. https://doi.org/10.3390/informatics13040056

