Search Results (8)

Search Parameters:
Keywords = essay exam

25 pages, 502 KiB  
Article
Passing with ChatGPT? Ethical Evaluations of Generative AI Use in Higher Education
by Antonio Pérez-Portabella, Mario Arias-Oliva, Graciela Padilla-Castillo and Jorge de Andrés-Sánchez
Digital 2025, 5(3), 33; https://doi.org/10.3390/digital5030033 - 6 Aug 2025
Abstract
The emergence of generative artificial intelligence (GenAI) in higher education offers new opportunities for academic support while also raising complex ethical concerns. This study explores how university students ethically evaluate the use of GenAI in three academic contexts: improving essay writing, preparing for exams, and generating complete essays without personal input. Drawing on the Multidimensional Ethics Scale (MES), the research assesses five philosophical frameworks—moral equity, relativism, egoism, utilitarianism, and deontology—based on a survey of undergraduate social sciences students in Spain. The findings reveal that students generally view GenAI use as ethically acceptable when it is used to improve or prepare content, but they express stronger ethical concerns when authorship is replaced by automation. Gender and full-time employment status also influence ethical evaluations: women respond differently from men on utilitarian dimensions, while working students tend to adopt a more relativist stance and are more tolerant of full automation. These results highlight the importance of context, individual characteristics, and philosophical orientation in shaping ethical judgments about GenAI use in academia.

27 pages, 3562 KiB  
Article
Automated Test Generation and Marking Using LLMs
by Ioannis Papachristou, Grigoris Dimitroulakos and Costas Vassilakis
Electronics 2025, 14(14), 2835; https://doi.org/10.3390/electronics14142835 - 15 Jul 2025
Cited by 1 | Viewed by 506
Abstract
This paper presents an innovative exam-creation and grading system powered by advanced natural language processing and local large language models. The system automatically generates clear, grammatically accurate questions from both short passages and longer documents across different languages, supports multiple formats and difficulty levels, and ensures semantic diversity while minimizing redundancy, thus maximizing the proportion of the material covered in the generated exam paper. For grading, it employs a semantic-similarity model to evaluate essays and open-ended responses, awards partial credit, and mitigates bias from phrasing or syntax via named entity recognition. A major advantage of the proposed approach is its ability to run entirely on standard personal computers, without specialized artificial intelligence hardware, promoting privacy and exam security while maintaining low operational and maintenance costs. Moreover, its modular architecture allows models to be swapped seamlessly with minimal intervention, ensuring adaptability and the easy integration of future improvements. A requirements-compliance evaluation, combined with established performance metrics, was used to review and compare two popular multilingual LLMs and monolingual alternatives, demonstrating the system's effectiveness and flexibility. The experimental results show that the system grades within a 17% normalized error margin of human experts, with generated questions reaching up to 89.5% semantic similarity to the source content. The full exam generation and grading pipeline runs efficiently on consumer-grade hardware, with average inference times under 30 s.
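The partial-credit grading idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: a simple bag-of-words cosine similarity stands in for the semantic-similarity model, and the `grade_answer` helper and its 10-point scale are illustrative assumptions.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over bag-of-words counts -- a lightweight
    stand-in for the sentence-embedding similarity used in the paper."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def grade_answer(student: str, reference: str, max_points: float = 10.0) -> float:
    """Award partial credit in proportion to similarity with a reference answer."""
    return round(max_points * cosine_similarity(student, reference), 1)
```

A production system would replace the bag-of-words similarity with an embedding model so that paraphrases of the reference answer also earn credit.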

17 pages, 274 KiB  
Article
Investigating Metacognitive Strategies and Exam Performance: A Cross-Sectional Survey Research Study
by Jolie V. Kennedy and David R. Arendale
Educ. Sci. 2023, 13(11), 1132; https://doi.org/10.3390/educsci13111132 - 13 Nov 2023
Viewed by 2229
Abstract
This investigation used cross-sectional survey research methods in a high-enrollment undergraduate history course to examine test performance and the metacognitive strategies that subjects self-selected before class, during class, and during the exam. The study examined differences in exam scores between students who self-reported completing specific metacognitive strategies and those who self-reported not completing them. An online survey instrument was used to collect data from 121 students on the frequency of specific behaviors. Frequency counts and an independent-samples t-test were used to analyze metacognitive strategies and exam performance. The results showed that the following strategies were statistically significant at the 0.05 alpha level: (1) reading or listening to assigned readings and audio files before they were discussed in class; (2) frequently taking part in small-group discussion at the table during the class session; (3) creating outlines for each of the potential essay questions to prepare for the examination; and (4) making an outline of the essay question before beginning to write during the exam. Limitations of the study, implications of the results, and recommendations for future research are provided. Given the challenge of supporting students to earn higher grades and persist toward graduation, faculty members need to join the rest of the campus as active agents, supporting students through simple learning strategies and effective behaviors embedded in their courses. This may require extra time and effort in professional development to learn how to embed practice with metacognitive strategies into class sessions.
33 pages, 729 KiB  
Article
ChatGPT and the Generation of Digitally Born “Knowledge”: How Does a Generative AI Language Model Interpret Cultural Heritage Values?
by Dirk H. R. Spennemann
Knowledge 2023, 3(3), 480-512; https://doi.org/10.3390/knowledge3030032 - 18 Sep 2023
Cited by 43 | Viewed by 10306
Abstract
The public release of ChatGPT, a generative artificial intelligence language model, caused widespread public interest in its abilities, but also concern about the implications of the application for academia, depending on whether it was deemed benevolent (e.g., supporting analysis and simplification of tasks) or malevolent (e.g., assignment writing and academic misconduct). While ChatGPT has been shown to provide answers of sufficient quality to pass some university exams, its capacity to write essays that require an exploration of value concepts is unknown. This paper presents the results of a study in which ChatGPT-4 (released May 2023) was tasked with writing a 1500-word essay discussing the nature of the values used in the assessment of cultural heritage significance. Based on an analysis of 36 iterations, ChatGPT wrote essays of limited length (about 50% of the stipulated word count) that were primarily descriptive and lacked depth or complexity. The concepts, which are often flawed and suffer from inverted logic, are presented in an arbitrary sequence with limited coherence and without any defined line of argument. Given that it is a generative language model, ChatGPT often splits concepts and uses one or more words to develop tangential arguments. While ChatGPT provides references as tasked, many are fictitious, albeit with plausible authors and titles. At present, ChatGPT can critique its own work but seems unable to incorporate that critique meaningfully to improve a previous draft. Setting aside conceptual flaws such as inverted logic, several of the essays could possibly pass as junior high school assignments but fall short of what would be expected in senior school, let alone at a college or university level.
(This article belongs to the Special Issue New Trends in Knowledge Creation and Retention)

18 pages, 3057 KiB  
Article
Automatic Essay Scoring Method Based on Multi-Scale Features
by Feng Li, Xuefeng Xi, Zhiming Cui, Dongyang Li and Wanting Zeng
Appl. Sci. 2023, 13(11), 6775; https://doi.org/10.3390/app13116775 - 2 Jun 2023
Cited by 16 | Viewed by 4858
Abstract
Essays are a pivotal component of conventional exams; grading them accurately, efficiently, and effectively is a significant challenge for educators. Automated essay scoring (AES) is a complex task that uses computer technology to assist teachers in scoring. Traditional AES techniques focus only on shallow linguistic features derived from the grading criteria, ignoring the influence of deep semantic features. AES models based on deep neural networks (DNN) can eliminate the need for feature engineering and achieve better accuracy. In addition, DNN-AES models that combine different scales of an essay have recently achieved excellent results. However, they have the following problems: (1) sentence-scale features are mainly extracted manually and cannot be fine-tuned for specific tasks; (2) they do not consider the shallow linguistic features that DNN-AES cannot extract; and (3) they do not capture the relevance between the essay and the corresponding prompt. To solve these problems, we propose an AES method based on multi-scale features. Specifically, we use Sentence-BERT (SBERT) to vectorize sentences and connect them to the DNN-AES model. Furthermore, typical shallow linguistic features and prompt-related features are integrated into the distributed features of the essay. The experimental results show that the Quadratic Weighted Kappa of the proposed method on the Kaggle ASAP competition dataset reaches 79.3%, verifying the efficacy of the extended method for the AES task.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
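Quadratic Weighted Kappa, the evaluation metric reported in the abstract above, measures agreement between model scores and human scores, corrected for chance and with larger penalties for larger disagreements. A minimal self-contained implementation (a sketch of the standard metric, not the paper's code; it assumes at least two distinct rating levels):

```python
from collections import Counter

def quadratic_weighted_kappa(human, model, min_rating, max_rating):
    """Chance-corrected agreement between two lists of integer scores,
    with a quadratic penalty that grows with the distance between scores."""
    n = max_rating - min_rating + 1
    num_items = len(human)
    # Observed joint distribution of (human, model) score pairs
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_rating][m - min_rating] += 1
    # Marginal histograms give the expected-by-chance distribution
    hist_h = Counter(h - min_rating for h in human)
    hist_m = Counter(m - min_rating for m in model)
    penalty_obs = penalty_exp = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2   # quadratic weight
            penalty_obs += w * observed[i][j]
            penalty_exp += w * hist_h[i] * hist_m[j] / num_items
    return 1.0 - penalty_obs / penalty_exp
```

Perfect agreement yields 1.0, chance-level agreement 0.0, and systematic disagreement negative values; the paper's 79.3% corresponds to a kappa of 0.793.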

12 pages, 291 KiB  
Article
Perceptions about the Assessment in Emergency Virtual Education Due to COVID-19: A Study with University Students from Lima
by Iván Montes-Iturrizaga, Gloria María Zambrano Aranda, Yajaira Licet Pamplona-Ciro and Klinge Orlando Villalba-Condori
Educ. Sci. 2023, 13(4), 378; https://doi.org/10.3390/educsci13040378 - 7 Apr 2023
Cited by 1 | Viewed by 2232
Abstract
The COVID-19 pandemic forced a large proportion of Peruvian universities to design systems for emergency virtual education, which required professors to quickly learn to use teaching platforms, digital tools, and a wide range of technological skills. In this context, formative assessment may well have been the pedagogical activity that posed the most challenges, tensions, and problems, owing to many professors' lack of preparation for administering performance tests and providing effective feedback. It is therefore presumed that these shortcomings (already present in face-to-face education) were carried over into virtual classrooms during the health emergency. A survey study was carried out with 240 students from a private university in Lima to ascertain their perceptions of and preferences regarding the tests their professors administered in virtual classrooms. The students were, for the most part, assessed with multiple-choice tests. In addition, the students recognized that essay tests were the most important for their education, yet they preferred multiple-choice tests. Finally, law students were mostly assessed with essay tests and psychology students with oral tests.
27 pages, 489 KiB  
Article
Automated Discourse Analysis Techniques and Implications for Writing Assessment
by Trisevgeni Liontou
Languages 2023, 8(1), 3; https://doi.org/10.3390/languages8010003 - 21 Dec 2022
Cited by 2 | Viewed by 3095
Abstract
Analysing writing development as a function of foreign language competence is important in secondary school children because developmental patterns are strongest at a young age, when successful interventions are needed. Although a number of researchers have explored the degree to which specific textual characteristics in EFL students' essays are associated with high and low ratings by teachers, the extent to which such characteristics are associated with rater-mediated assessment under standard exam conditions remains relatively unexplored. Motivated by this gap in the literature, the overall aim of the present study was to investigate the relationship between specific discourse features present in the writing scripts of EFL learners sitting the British Council's APTIS for TEENS exam and the scores assigned during operational scoring by specially trained raters. A total of 800 international EFL students aged 13 to 15 years old took part in the study, and 800 scored essays, produced under standard exam conditions in response to the same task prompt, were analysed. The results showed statistically significant differences (p ≤ 0.05) between the linguistic features identified in the essays produced by young EFL learners at different levels of language competence. The text features repeatedly found to make a significant contribution to distinguishing the scores assigned to texts, both within and across levels, were word frequency, word abstractness, lexical diversity, and lexical and semantic overlap, all of which could be used to obtain a numerical cut-off point between proficiency levels. These findings support the notion that progress in L2 writing is primarily associated with producing more elaborate texts with more sophisticated words, more complex sentence structure, and fewer cohesive features as language competence increases.
The findings could provide practical guidance to EFL teachers, material developers, and test designers as to the kind of linguistic strategies young EFL learners develop as a function of their level of language competence, along with suggestions to consider when designing EFL classroom curricula, writing-skills textbooks, and exam papers on written production.
(This article belongs to the Special Issue Recent Developments in Language Testing and Assessment)
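Two of the text features named in the abstract above, lexical diversity and lexical overlap, are straightforward to compute. The sketch below uses common simple definitions (type-token ratio and shared-word ratio), which may differ from the exact operationalizations used in the study:

```python
def lexical_diversity(text: str) -> float:
    """Type-token ratio: distinct words divided by total words (0..1).
    Higher values suggest a more varied vocabulary."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_overlap(sentence_a: str, sentence_b: str) -> float:
    """Share of words in sentence_b that also appear in sentence_a,
    a simple proxy for cohesion between adjacent sentences."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b) / len(words_b) if words_b else 0.0
```

Tools such as Coh-Metrix compute more robust variants (e.g., lemmatized overlap, moving-average type-token ratio), but the intuition is the same.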
12 pages, 263 KiB  
Essay
Changes in Mathematics Core Curriculum and Matriculation Exam in the Light of the COVID-19-Shock
by Csaba Csapodi and Miklós Hoffmann
Educ. Sci. 2021, 11(10), 610; https://doi.org/10.3390/educsci11100610 - 2 Oct 2021
Cited by 2 | Viewed by 3072
Abstract
The new National Core Curriculum came into force in Hungarian schools in September 2020, and the COVID-19 pandemic had a deep impact on the final stages of its development. In this paper we select two areas for analysis: the fundamental principles of the mathematics curriculum and the matriculation exam in mathematics. We propose improvements in both fields, further emphasizing the importance of skills in displaying, understanding, and processing information, including visual information obtained as a source or outcome of a problem. We argue that the representation, interpretation, and critical evaluation of data and information must be essential parts of the mathematics curriculum. In this context, we also propose a new type of task for the matriculation exam: a complex essay task. The ultimate goal is the development of cross-cutting competencies that support students in becoming citizens who can make responsible decisions based on the available data and knowledge.