1. Introduction
Project-Based Learning (PBL) is a constructivist pedagogical approach in which students acquire knowledge and skills through the active exploration of problems and the execution of meaningful projects. Unlike traditional teaching methods, where information is presented in a fragmented way and with a predominantly theoretical focus, PBL suggests that students work on projects that reflect real-life situations, promoting contextualized and applied learning (
Vogler et al., 2018). In this model, students are not passive recipients of information, but active agents who develop competencies through research, experimentation, and the application of concepts in concrete contexts. This approach aligns with the theory of meaningful learning (
Ausubel, 1968), which emphasizes the importance of connecting new information with prior knowledge to facilitate deeper and more lasting understanding.
Designing a project-based course requires careful planning to guarantee that students not only gain disciplinary knowledge but also develop transversal skills such as problem-solving, critical thinking, effective communication, and teamwork (
Ngereja et al., 2020). Clear learning objectives must be defined, and the project should revolve around an open-ended question or challenge, complex enough to require the application of multiple concepts and methodologies. As students progress through the project, they are required to investigate, analyze data, synthesize information, and make informed decisions, fostering active and autonomous learning. A key aspect of PBL is assessment, which must go beyond traditional exams to incorporate methods such as formative assessment, self-assessment, and peer assessment. These strategies enable students to reflect on their own learning process and receive continuous feedback, contributing to the improvement of their competencies. The final project deliverable—which can take various forms such as technical reports, prototypes, presentations, or simulations—not only serves as evidence of learning but can also have an impact beyond the classroom by addressing real problems and proposing innovative solutions (
Alves et al., 2016).
In engineering and applied science education, PBL has proven particularly effective, as it allows students to tackle complex technical problems that simulate challenges found in professional environments (
Bédard et al., 2012). The integration of computational tools, the use of simulations, and the implementation of agile methodologies in project development have been shown to improve both academic performance and students’ preparedness for the job market. One of the main benefits of PBL in the training of electronic engineers is that it promotes the integration of theoretical and practical knowledge, enabling students to better understand the fundamentals of the discipline and their application in the development of electronic devices and systems (
Etchepareborda et al., 2018). In a traditional approach, concepts such as electrical circuits, digital systems, microcontroller programming, and communications are usually taught in isolation, which can hinder the perception of their interrelation. With PBL, the designed projects require the combination of these areas, forcing students to apply principles from different branches of engineering to solve a specific problem. This improves their synthesis ability and fosters deeper and more meaningful learning (
de Sales & Boscarioli, 2021).
In this paper, we present the case of the course *Advanced Digital Systems and Applications* (SDAA), part of the Master’s Degree in Electronic, Robotic and Automation Engineering at the University of Seville. This course brings together students from various engineering backgrounds, such as electronics, computer science, and telecommunications, among others. The course is designed so that students develop a research project in which they apply the knowledge acquired throughout the course, which includes the Internet of Things (IoT), Artificial Intelligence (AI), and the design of electronic and software applications using a Raspberry Pi 4 (RPi4) (
Balon & Simić, 2019). The RPi4 system is a low-cost, small-form-factor microcomputer that has gained popularity in educational and professional settings due to its versatility and processing capabilities (see
Figure 1). Throughout the course, students must design a project that integrates these skills and concepts, using the RPi4 as the core platform. The topic of the projects is open, but must focus on the knowledge covered in the course. This means students can choose an area of application that interests them, as long as it relates to the use of the RPi4 and the concepts of IoT, AI, and electronic systems. This thematic freedom encourages student creativity and motivation, as they can explore areas they are passionate about and apply their knowledge in real-world contexts (
Fitrah et al., 2025). The only requirement is that the project must be validated by the instructor, who ensures that the chosen topic is relevant and feasible within the course framework. This validation process is essential to guarantee that students work on projects that meet the course objectives and that they have the necessary resources and support to carry them out successfully. The proposed system is expected to be useful for instructors because it serves as an automatic tool for this validation process. It can prevent issues such as lecturers forgetting past projects or changes in the department’s teaching staff. Even lecturers who are new to the course could otherwise be “easily misled”, because they would not know what has been done previously.
To evaluate the projects, the assessment criteria should focus on the originality and innovation of the proposals, as well as the technical quality and the presentation of the final work. Students must demonstrate their ability to research, design, and develop a project that not only meets technical requirements but also provides added value in terms of originality and creativity. This last aspect is critical, as one of the indicators of project quality is its capacity to offer novel solutions to existing problems or to address challenges in an innovative way. Originality refers not only to the novelty of the approach or proposed solution but also to how students articulate their idea and endow it with practical utility and relevance in the current context.
In order to carry out their projects, students have access to a knowledge base that includes works done by other students in prior years. In application and software development, it is common to reuse code from public repositories such as GitHub to accelerate the development process. This practice is valid and accepted in the developer community, as long as the usage licenses are respected and appropriate credit is given to the original authors; it is also motivated by the fact that it encourages the reproducibility of results. However, in the academic context, the reuse of previous works raises an ethical and pedagogical dilemma. On the one hand, it allows students to learn from past experiences and benefit from the work done by others. On the other hand, it can lead to the temptation to copy or plagiarize ideas without adding value or offering an original perspective (
Chu et al., 2021). This dilemma is exacerbated by the pressure students feel to meet deadlines and quality expectations, which can lead to unethical decisions in the pursuit of quick solutions.
In this regard, over the years the teaching team has observed that students tend to draw inspiration from previous works, which can lead to projects that, although technically correct, lack originality and creativity. Evaluating these works becomes a challenge, as the line between inspiration and plagiarism can be blurred (
McIntire et al., 2024). In addition, the instructor must assess, year after year, whether a work presents an original research line or is simply an adaptation of previous works. This task becomes unfeasible when the number of prior works is high (the SDAA course already has 100 previous works), which can lead to subjective evaluations. Furthermore, students are not always able to identify which previous works are relevant to their project and which are not, which can cause information overload and difficulty in discerning what is useful. They may also feel overwhelmed by the amount of available information and have trouble integrating the knowledge acquired into their own project.
This paper proposes a framework to address the evaluation of project originality. The framework allows (1) students to identify prior works relevant to their project in order to reduce similarity, and (2) instructors to evaluate the originality of projects in an objective and verifiable manner. To achieve this, a project originality assessment system based on Retrieval-Augmented Generation (RAG) is proposed. Large language models (LLMs) serve as the semantic foundation for understanding the content and underlying idea of prior works. Due to the context window limitation of LLMs, an information retrieval stage is needed to identify the prior works relevant to a given project. The instructor provides the reports and code of previous projects as a knowledge base and opens a communication channel with the LLM so that students can ask questions about the previous works. The LLM, in turn, generates a summary of the relevant prior works and provides an evaluation of the originality of the new project. This approach not only facilitates the identification of relevant works but also allows students to reflect on their own learning process and improve their research and project development skills. Additionally, by using an LLM-based retrieval system, the cognitive load on students is expected to be reduced, improving their ability to integrate prior knowledge into their own projects.
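A minimal sketch of this workflow is shown below; the function names are purely illustrative and do not correspond to the actual implementation described later in the paper:

```python
# Illustrative sketch of the proposed originality-assessment workflow.
# Function names are placeholders, not the actual implementation.

def build_knowledge_base(prior_projects, summarize):
    """Summarize each prior project (report + code) into a structured record."""
    return [summarize(project) for project in prior_projects]

def assess_originality(new_proposal, knowledge_base, retrieve, llm_score):
    """Retrieve the most similar prior summaries and ask the LLM to score originality."""
    similar = retrieve(new_proposal, knowledge_base, top_k=5)  # information retrieval step
    return llm_score(new_proposal, similar)                    # 0 (near copy) .. 10 (fully original)
```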
The main contribution of this paper lies in the development and evaluation of a framework that combines information retrieval techniques and language models to assess the originality of projects in an educational context. This approach not only addresses the issue of plagiarism and lack of originality but also promotes more meaningful and contextualized learning, aligned with the principles of PBL.
The paper is structured as follows:
Section 2 presents a state-of-the-art review on the use of LLMs in education and the evaluation of originality.
Section 3 describes the proposed framework and its methodology.
Section 4 reports the results obtained from its implementation.
Section 5 discusses the findings, limitations, and ethical implications. Finally,
Section 6 presents the conclusions and the implications for future research.
4. Results
This section presents the evaluation results of the RAG system for assessing the originality of student projects. The evaluation is divided into two parts. First, the ability of the system to retrieve and assess the originality of 91 prior works from previous academic years (2019–2022) is evaluated. These works naturally span a range of originality, as students are encouraged to develop new ideas but are not prevented from building on past projects. Second, the evaluation focuses on 10 test cases created specifically for this study to assess the system’s ability to assign originality scores to unseen project proposals.
To construct the test set, a third instructor—who was not involved in the blind scoring phase—reviewed the entire historical corpus of projects and created 10 new project descriptions with controlled levels of originality. These were designed to span the full originality scale and were labeled based on the official originality criterion used in the course and communicated to students. According to this criterion, a project with a score of 0 is one that closely replicates an existing work, with the same modules, objectives, and technical structure. A project with a score of 10 introduces a novel concept, implementation, or application scenario that has not been seen in any previous work, while still respecting the course constraints (in this case, the development of an IoT application based on a Raspberry Pi 4). During the experiment, the two primary instructors rated each test case independently, without access to each other’s scores or to the output of the RAG system. Their scores were then averaged to obtain the final instructor score per case, which was compared against the RAG-assigned score. This evaluation setting was chosen to reflect real teaching conditions, where originality evaluation is inherently subjective but guided by shared pedagogical criteria. By averaging the scores, we reduce the impact of individual bias and provide a fairer ground for comparison with the system’s output.
4.1. Structured Summaries Generation
For the generation of summaries, the DeepSeek-R1 model with 14 billion parameters was used, which generated a structured summary of each previous work. All summaries were compiled into a single PDF and uploaded to the NotebookLM database. This platform supports up to 50 PDF documents, so its limit is well beyond what is needed for the previous works (see Figure 6). The input prompt described in the previous section was used for generating the summaries. The summaries were generated using an Nvidia RTX 4090 graphics card with an AMD Ryzen 9 7950X3D processor, on a local machine with 128 GB of RAM and a 2 TB NVMe SSD. The average generation time is 3.5 s per work, which allows the summaries of all the previous works to be created in less than 5 min. Although the computing system is powerful, the DeepSeek-R1 model, which only takes up 9 GB, is optimized to be efficient in computational resources, making it suitable for inference on a laptop or desktop computer.
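As an orientation, the following minimal sketch shows how such a summary can be generated locally, assuming DeepSeek-R1 is served through Ollama (as mentioned later for the self-hosted setup); the prompt text is only a placeholder for the structured-summary prompt described in the previous section:

```python
# Sketch of local summary generation via Ollama's HTTP API.
# The model tag and prompt are placeholders; the actual structured-summary
# prompt is the one described in the methodology section.
import requests

SUMMARY_PROMPT = "Summarize the following student project into the structured fields ..."  # placeholder

def summarize_project(report_text: str, model: str = "deepseek-r1:14b") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": f"{SUMMARY_PROMPT}\n\n{report_text}", "stream": False},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]  # the generated structured summary text
```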
In
Figure 7, an example of a structured summary generated by the DeepSeek-R1 model is presented. In this case, the model has generated a summary of a previous work that meets the course requirements. The summary includes the project metadata, project description, project objectives, modules and code libraries used, hardware used, and contextual keywords.
It can be seen that the model is capable of generating a condensed description of a previous work that comprises more than 200 lines of raw code and a 10-slide presentation. The project metadata (file names and project description) is important for indexing, while the code modules and libraries, the hardware used, and the contextual keywords help the LLM understand the content of the work. Furthermore, the summary provides the student or the instructor with a reference to relevant previous works without having to read the entire report or delve into the code.
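For illustration, the fields listed above could be organized as the following structured record; the field names are an assumption based on the description in the text, not the exact layout of the generated summaries:

```python
# Illustrative structure for the fields contained in each generated summary.
# Field names are assumptions derived from the description in the text.
from dataclasses import dataclass
from typing import List

@dataclass
class ProjectSummary:
    file_names: List[str]             # project metadata used for indexing
    description: str                  # short description of the project
    objectives: List[str]             # project objectives
    modules_and_libraries: List[str]  # code modules and libraries used
    hardware: List[str]               # hardware platforms and sensors used
    keywords: List[str]               # contextual keywords
```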
4.2. Examples of Proposal Originality Evaluation
Once all the works were summarized and the knowledge base was created with NotebookLM, the originality evaluation itself was tested. For this purpose, three types of project proposals were created manually by inspecting the knowledge base: (1) proposals very similar to previous works, (2) inspired proposals, and (3) completely original proposals never seen before. In
Table 1, two examples for each type of idea are presented, which were used to evaluate the RAG. In
Table 2, the system’s responses are shown when each of these proposals is submitted as a new project proposal using the prompt described in
Figure 5. The model evaluated the originality of each proposal and returned a score between 0 and 10, where 0 represents a work very similar to previous ones and 10 represents a completely original work. It can be observed that the model evaluated the originality of the proposals successfully and provided consistent scores.
For proposals nearly identical to existing works, the system identifies the closest semantic match with ease. In the cases of inspired works, the model returned an originality score between 5 and 7, indicating that the work is original but shares similarities with previous works. In the cases of completely original works, the model returned an originality score between 8 and 10, indicating that the work does not resemble the previous ones; in this range, the model only references previous works with superficial similarities, such as those also using a Raspberry Pi.
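For illustration, a proposal submitted to the knowledge base could be phrased as in the following sketch; the wording is ours and does not reproduce the actual prompt shown in Figure 5:

```python
# Hypothetical evaluation query sent to the RAG knowledge base.
# The actual prompt used in the study is the one described in Figure 5.
proposal = "Connected perimeter alarm based on magnetic door/window sensors and a Raspberry Pi 4."

query = (
    "Acting as the course instructor, compare the following project proposal against the "
    "indexed summaries of previous works. Return: (1) an originality score from 0 (near copy) "
    "to 10 (fully original), (2) the most similar prior works, and (3) suggestions to "
    "increase originality.\n\n"
    f"Proposal: {proposal}"
)
```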
4.3. Comparative Evaluation of Originality
To validate the system, the 10 ad hoc proposals described above, designed by a course professor to cover the full range of originality with varied characteristics, were processed using the RAG tool (see
Table 2), and two professors from the course were asked to evaluate the originality of the works. These two evaluator-instructors involved in this study have taught the SDAA course since its inception and possess a detailed recollection of past student projects. Their evaluations are therefore based on this accumulated internal dataset, which reflects years of exposure to project work in the course. This memory-driven assessment is, in practice, how originality is evaluated in real classrooms: instructors compare new proposals against what they remember having seen before. The results of the RAG evaluation have been compared with the assessments provided by the professors to check for alignment and consistency in the evaluation of originality.
The RAG-based system is specifically designed to address this issue. By building a semantically indexed and explorable memory of all prior works, it ensures that all instructors—regardless of tenure or recollection—can operate with access to the same complete and objective context. The system not only identifies the most similar past works but justifies its judgment and provides suggestions for improvement when originality is low. In this sense, it functions as a cognitive prosthesis for evaluators, not a replacement, but a correction mechanism for memory limitations and subjective gaps.
Rather than reproducing instructor bias, the system serves to expose it. By comparing human-assigned originality scores with those generated by the RAG tool, discrepancies can be analyzed to understand where instructor evaluations may have overlooked significant similarity. This not only improves the fairness of the assessment process but also contributes to the standardization of evaluation criteria in project-based learning environments.
In
Figure 8, the results of each processed project and the originality scores assigned by the instructors and the proposed RAG system are presented. It can be observed that the RAG system shows a good correlation with the instructors’ evaluations. In
Figure 9, the correlation between the originality scores assigned by the instructors and the RAG system is also presented. To support the statistical validity of this correlation, we computed the Pearson correlation coefficient between the originality scores assigned by the RAG system and the mean score of the two instructors. The result was r = 0.87 with a p-value below 0.05, indicating that the correlation is statistically significant at the 95% confidence level. This correlation indicates that the RAG system is capable of evaluating the originality of project proposals in a way that is consistent with the instructors’ evaluations.
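As a reference for reproducing this analysis, the following minimal sketch (assuming SciPy; the per-case score vectors reported in Figure 8 are not repeated here) shows how the averaged instructor scores are correlated with the RAG scores:

```python
# Sketch of the correlation analysis: instructor scores are averaged per test
# case and compared with the RAG scores. Score vectors are passed in by the caller.
import numpy as np
from scipy.stats import pearsonr

def originality_correlation(instructor_a, instructor_b, rag_scores):
    """Average the two instructors' scores per test case and correlate with the RAG scores."""
    mean_instructor = (np.asarray(instructor_a, dtype=float) + np.asarray(instructor_b, dtype=float)) / 2
    r, p_value = pearsonr(mean_instructor, np.asarray(rag_scores, dtype=float))
    return r, p_value
```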
Regarding the differences between the originality scores given by the instructors and the RAG system, it can be observed that the RAG system tends to be stricter in its evaluation. The average discrepancy between the RAG score and the instructors’ scores is −1.1. This discrepancy is due to several reasons: (1) the RAG system judges originality strictly in terms of similarity to the indexed prior works, whereas instructors may forget that some previous works were themselves influenced by others, which can lead to a more lenient evaluation; and (2) the RAG system tends to underrate proposals because most of them contain latent ideas related to the course content, and every work proposed by a student inevitably shares common foundations with previous works. As a result, the RAG system evaluates originality more strictly than the instructors do.
An example of this divergence can be found in the work titled “IoT Predictive Maintenance System”. It was evaluated by the RAG system with a score of 8/10, while the instructors rated the originality of the work with a score of 10/10. This work incorporates a novel system that has not been implemented before, and the instructors recognize it as such. However, the RAG system evaluated the originality of the work with a lower score because the work deals with topics such as “monitoring” and “sensing”, which are core themes in all the projects. It has been observed that it is difficult to come up with a project that the RAG identifies with a 10/10 score. Nevertheless, the RAG system does identify this proposal as the most original.
A particular situation is that of the work titled “Connected Perimeter Alarm”. This work was evaluated by the RAG system with a score of 2/10, while the instructors gave it an average score of 6/10, as they did not particularly remember any work dealing with home security and alarm systems based on magnetic sensors. The RAG system assigned the lower score because it found a significant number of matches with previous works: the proposal appears to be an adaptation of a previous project titled “Automated Surveillance System with Raspberry Pi and Sensors (2020)”, which uses PIR motion sensors instead of magnetic sensors. This type of similarity detection is a desirable feature because it moderates the bias effect in the instructor’s grading of the student. In a real scenario, the student could use this system and reframe their objectives to achieve a higher grade.
4.4. User Feedback and Perceived Utility
To complement the technical evaluation, we conducted a small-scale qualitative survey to assess the perceived usefulness and usability of the system among stakeholders. Two instructors who participated in the originality rating process and eight students who had previously completed the SDAA course were invited to interact with the system and provide open-ended feedback. All participants were asked to identify the most valuable and most limiting aspects of the tool from their perspective.
This exploratory feedback reinforces the utility of the system as both a pedagogical aid and an evaluation support tool. It also points to areas for refinement, such as balancing the strictness of the scoring function and improving prompt engineering support for novice users.
4.5. Generalization to Other Courses and Disciplines
The RAG framework presented in this study was developed and evaluated in the context of a master’s course in electronic engineering. However, the design of the system is intentionally modular and domain-agnostic. Its capacity to evaluate the originality of student work does not depend on the specific technical content of the projects, but on the ability to semantically compare new proposals against a corpus of previously completed works. The core premise of this system is simple: originality is always defined relative to prior knowledge. Consequently, the framework can be deployed in any context where a collection of previous student outputs is available, regardless of discipline. The key adaptation step required for generalization is the design of an appropriate summarization prompt that reflects the instructional objectives and originality criteria relevant to the target course.
Preparing a new course or domain for RAG deployment involves two main steps:
Definition of Originality Criteria: The instructor must define what constitutes originality in their specific context. This may include novel combinations of methods, application to previously unaddressed problems, creative re-interpretation of known tools, or any domain-specific marker of innovation.
Prompt Engineering for Summarization: Once these criteria are defined, the summarization prompt must be adapted to extract and structure the relevant information from prior works. This includes metadata (e.g., project title, year), descriptive elements (objectives, methodology, implementation), and any features relevant to originality (e.g., design decisions, thematic scope). The summarization LLM—such as DeepSeek—can be instructed via prompt alone, with no retraining required.
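As an illustration of this second step, the following sketch shows what an adapted summarization prompt might look like for a non-engineering course; it is a hypothetical template, not the prompt used in this study:

```python
# Hypothetical summarization prompt for a different discipline (e.g., a social
# science course). The field list mirrors the originality criteria defined by
# the instructor in the first step.
SUMMARY_PROMPT_TEMPLATE = """
You are indexing prior student projects for the course "{course_name}".
Summarize the following project into these fields:
- Metadata: title, year, file names
- Research question and objectives
- Methodology and data sources
- Main findings or deliverables
- Features relevant to originality: {originality_markers}
- Contextual keywords

Project content:
{project_text}
"""
```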
The summaries generated using this process are then compiled into a PDF (or multiple PDFs) and uploaded into the retrieval platform—in this study, NotebookLM. At this point, no further customization is needed. The contextualization engine of the RAG system operates over the provided summaries, and its behavior does not change across disciplines. Whether the indexed documents describe engineering systems, social science interventions, or literary analyses, the mechanism for similarity detection, explanation generation, and originality scoring remains unchanged. While the semantic features extracted from the documents are course-specific, the RAG evaluation mechanism itself is fully general. Its modularity, reliance on prompt-based customization, and decoupled summarization–retrieval architecture make it highly transferable across domains. The only requirement is that the corpus of prior works be sufficiently documented and structured so that meaningful summaries can be produced.
5. Discussion
Project-Based Learning has established itself as a highly valuable pedagogical approach, especially in disciplines like engineering, where it encourages the acquisition of knowledge and skills through the active resolution of problems and the completion of meaningful tasks that simulate real professional challenges. This method fosters contextualized and applied learning, aligned with constructivist theories and theories of meaningful learning. However, PBL presents an inherent challenge in assessment, particularly when it comes to originality. In environments where students have access to prior work or vast repositories of code and ideas, it becomes crucial, both from an ethical and pedagogical perspective, to distinguish between legitimate inspiration and genuine innovation, as opposed to mere reuse or plagiarism. Ensuring that students develop novel solutions and demonstrate a deep understanding, rather than superficially assembling pre-existing components, is essential for validating the learning objectives of PBL. Solving this problem is critical for maintaining academic integrity and ensuring that student effort translates into real competencies. Here, an important distinction should be made between plagiarism detection and originality evaluation. While existing tools like Turnitin are effective in identifying exact or near-exact matches in text or code, they are not suitable for assessing whether a student project introduces a novel idea or solution—especially in project-based engineering education where the reuse of libraries or hardware configurations is often legitimate and necessary. Our framework addresses this gap by providing instructors and students with a transparent and explainable assessment of conceptual similarity, including contextualized reasoning and improvement suggestions.
Thus, this article presents and validates a framework based on Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) as a viable and effective solution to address this originality assessment challenge in the context of the Advanced Digital Systems and Applications course. The implementation demonstrated that the system can efficiently process the documentation of previous projects (reports, code, presentations) to generate structured summaries and create an indexed knowledge base. When querying this base, the RAG system provided originality evaluations for new project proposals that showed a notable correlation (0.87) with the assessments of expert instructors, also identifying the most relevant previous works and justifying the evaluation. A significant aspect of the proposal is its accessibility; the use of an efficient open-source language model such as DeepSeek for summarization, combined with a platform like NotebookLM by Google for knowledge base management and RAG interaction, means that the implementation of this system does not necessarily require a massive investment in proprietary computational resources. This makes it a feasible option for educational institutions that may not have dedicated high-performance computing infrastructure, democratizing access to advanced evaluation support tools.
5.1. Practical Implications
The proposed framework has immediate and specific implications for educational practice, particularly in courses where students are expected to produce original project-based work. Instructors frequently face the challenge of assessing whether a student project genuinely introduces a new idea or merely reuses known structures. This task becomes increasingly difficult as the number of prior works grows and institutional memory becomes fragmented. The RAG-based system presented here provides a scalable and explainable support tool that makes the historical context of a course explicit and accessible. By indexing prior works and contextualizing new proposals semantically, the system allows instructors to make more informed and consistent originality judgments. It is particularly helpful for reducing variability between instructors and for assisting those who are new to a course and lack full exposure to past student work.
From the student perspective, the tool functions as a formative assistant during the ideation phase. It provides feedback about the novelty of an idea, retrieves similar previous projects, and suggests refinements. This promotes self-assessment and iteration, and aligns well with pedagogical models that emphasize autonomy and reflection. Integration into the course workflow is lightweight. The summarization pipeline requires minimal preparation once the summarization prompt has been tailored to the domain, and systems like NotebookLM enable direct interaction without complex setup. For institutions concerned with platform dependency, open-source alternatives exist that allow the same logic to be reproduced locally, as discussed in Section 6.
Ultimately, the system does not replace the human evaluator but augments their capacity to reason with the full historical context. It contributes to a fairer and more transparent originality evaluation process and offers a structured path toward scalable co-evaluation in project-based learning environments.
5.2. Limitations
Although the RAG-based originality evaluation system shows strong alignment with instructor judgment overall, a closer inspection of divergence cases and user feedback reveals important limitations, both conceptual and practical.
A first class of failure involves mismatches between the RAG score and the instructors’ perception of novelty. In one case, a project focused on predictive maintenance was scored 10/10 by instructors, who considered it an entirely novel proposal. However, the RAG system assigned it a score of 8/10, likely due to the presence of thematic overlap with many prior projects involving sensor monitoring and diagnostics. This suggests that the system tends to penalize projects that share vocabulary or components with common patterns in the corpus, even when the conceptual framing is new. Conversely, in the case of a perimeter alarm system using magnetic sensors, the instructors gave a moderate originality score (6/10), seeing no clear precedent, while the RAG system rated it much lower (2/10). The system retrieved a similar prior project based on PIR sensors, interpreting the structural similarity as significant, despite the difference in sensor technology. This shows the system’s capacity to recall structural analogies that instructors may miss, but also exposes its limitation in recognizing meaningful technical distinctions.
These cases illustrate a general pattern: the system is conservative near the upper end of the scale and sensitive to recurring terminology or architectures, sometimes underestimating pedagogically novel work. Qualitative feedback from users reinforces these observations. Several students reported that the tool was useful for shaping ideas but noted that its performance strongly depended on how explicitly the idea was described. If the proposal was vague or underspecified, the system often failed to interpret it meaningfully. Others pointed out that when asked for highly original ideas, the system occasionally responded with suggestions that were too complex or impractical for the scope of the course. These behaviors suggest that the system is effective when properly guided, but may not be well-suited to early-stage ideation without support. Instructors, on the other hand, highlighted that the interface was simple and accessible, and praised the system’s ability to explain its scores via contextual comparisons with past work. However, they also noted that the system tended to be overly strict, rarely assigning top scores even to projects they considered outstanding. These observations point to three key limitations: (1) the system’s reliance on surface-level similarity may obscure recognition of conceptual novelty, (2) its suggestions and evaluations are highly sensitive to input precision, and (3) the scoring model lacks adaptability to the pedagogical goals of each specific course. While these constraints do not invalidate its usefulness, they define the boundary conditions under which the tool should be interpreted as a co-evaluation aid rather than an autonomous assessor.
5.3. Ethical Implications
The use of artificial intelligence in evaluative educational settings raises valid ethical concerns that must be addressed explicitly. In the context of originality assessment, these concerns include transparency of the evaluation criteria, explainability of the decision process, and the potential for incorrect judgments—either in the form of false positives (flagging a novel project as derivative) or false negatives (failing to identify unoriginal work). The system described in this work is not intended to operate autonomously, nor to replace human evaluators. It is designed as a human-in-the-loop co-evaluation tool. For students, it functions as a formative assistant: it offers feedback on project ideas during the proposal phase, allowing students to iteratively improve and refine their work. It is not used for grading or ranking, but for guidance.
For instructors, the system provides a second opinion: it retrieves potentially similar past projects and explains its judgment, including rationale and suggestions. However, the final decision about originality remains entirely with the human instructor. This mitigates the risk of unjustified automation and places the tool in a support role, rather than a prescriptive one. In terms of explainability, the system provides not only a similarity score, but also specific comparisons and contextual comments that justify its assessment. This transparency allows users to understand why a score was assigned, challenge its validity, or disregard it if it lacks relevance.
Concerning false judgments, it is important to highlight that such misclassifications can and do occur in human-only assessments as well. The novelty of this system lies in making the basis for the evaluation explicit and reviewable. In cases where the system is overly strict or lenient, its outputs can be compared with instructor judgment, and the discrepancy analyzed. This dual-track approach helps to identify biases, blind spots, and inconsistencies on both sides.
Thus, the ethical design of the system hinges on three pillars:
Non-autonomy: The system should never make binding decisions on its own.
Explainability: All judgments should be accompanied by transparent justification.
Support for fairness: The system reduces dependence on fallible human memory and helps newcomers to evaluate with the same context as experienced instructors.
We believe that maintaining a human-in-the-loop design, together with full transparency and optionality, allows this system to be used ethically and responsibly in educational environments.
6. Conclusions and Future Work
Regarding future work, there are still many lines to explore. A key aspect is the technical improvement of the system. This includes further fine-tuning of the evaluation models to better align with specific pedagogical criteria, as well as integrating source code similarity analysis tools to complement the textual semantic analysis and gain a more comprehensive view of originality. Additionally, an important strategic objective would be to evolve towards a fully open-source RAG system. While DeepSeek is already open-source, replacing proprietary elements such as NotebookLM with open-source alternatives (e.g., using libraries like LangChain or LlamaIndex along with vector databases such as ChromaDB or FAISS, and open LLMs) would give institutions greater control, transparency, customization, and long-term sustainability, eliminating dependencies on external platforms.
Another fundamental expansion line concerns enriching the knowledge base used by the RAG system. It is crucial to go beyond simply adding more projects from previous years. The proposal is to integrate core course materials, such as lecture notes, fundamental theoretical concepts, and even solutions or statements from mandatory lab practices and exercises. This would transform the RAG system: it would no longer only compare a new proposal to previous final projects, but could also help check if a student is improperly reusing fragments from mandatory practices or if, on the contrary, they are correctly applying the theoretical concepts and techniques learned in class to build their original solution. Evaluating originality in this broader context would allow for valuing not only novelty compared to other projects but also the student’s ability to integrate and apply the core knowledge taught in the course in a creative and independent manner.
Another promising direction for future development is the reimplementation of the retrieval and interaction component using open-source tools. While the current system relies on Google NotebookLM due to its ease of use and integration with institutional infrastructure (licensed through the University of Seville), this dependency may limit long-term reproducibility and cross-institutional deployment. Alternative solutions based on LlamaIndex, ChromaDB, or other local vector database frameworks could be used to replicate the RAG workflow in a fully self-hosted environment. In this setup, the summarization would continue to be performed locally using DeepSeek via Ollama, and a minimal interface layer could be developed to replace the proprietary frontend. Although this would require additional computational infrastructure and engineering effort, it would provide a more sustainable and adaptable foundation for future applications in other educational contexts.
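As a reference for such a reimplementation, the following minimal sketch combines the ChromaDB Python client (with its default embedding function) and a local Ollama server; the model tag, prompt wording, and collection name are illustrative assumptions, not the configuration used in this study:

```python
# Hypothetical self-hosted replacement for the retrieval and evaluation step,
# using ChromaDB for vector search and a local Ollama server for generation.
import chromadb
import requests
from typing import Dict

client = chromadb.PersistentClient(path="./rag_store")
collection = client.get_or_create_collection("project_summaries")

def index_summaries(summaries: Dict[str, str]) -> None:
    """Add structured summaries (id -> summary text) to the vector store."""
    collection.add(ids=list(summaries.keys()), documents=list(summaries.values()))

def evaluate_proposal(proposal: str, top_k: int = 5, model: str = "deepseek-r1:14b") -> str:
    """Retrieve the most similar prior summaries and ask a local LLM to score originality."""
    hits = collection.query(query_texts=[proposal], n_results=top_k)
    context = "\n\n".join(hits["documents"][0])
    prompt = (
        "Given these summaries of previous student projects:\n"
        f"{context}\n\n"
        "Rate the originality of the following proposal from 0 to 10, cite the most "
        f"similar prior works, and justify the score.\n\nProposal: {proposal}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```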
Finally, it remains essential to investigate the actual pedagogical impact of these tools. Longitudinal studies are needed to understand how students interact with the RAG system, whether it modifies their ideation and development strategies, and its real impact on the quality and originality of student projects. The development of more intuitive user interfaces adapted to the academic workflow, as well as the exploration of the applicability of this framework across different courses, educational levels, and disciplines that employ PBL, are necessary steps to validate its robustness and maximize its potential as a tool that supports fairer, more efficient, and formative assessment, driving true innovation in project-based learning.