1. Introduction
Numerous studies highlight the importance of evaluating trust in interactions with digital systems, as well as the need to develop mechanisms tailored to the unique characteristics of artificial intelligence (Fikardos et al., 2025). Researchers such as Vorm and Combs (2022) argue that trust is not merely a factor influencing usage, but rather an intrinsic attribute of the user that requires prior assessment. Just as users are often categorized based on their level of experience, they should likewise be classified according to their level of trust, as both dimensions directly affect system use (Martín-Moncunill et al., 2024). Moreover, while experience with a system typically improves over time, trust may fluctuate in either direction.
Similarly, although there is a growing body of research on the use of AI as a learning tool for both professors (Khlaif et al., 2025) and students (Swidan et al., 2025), specific studies focusing on trust remain scarce. This is particularly striking given that trust—in instructors, methodologies, and tools—is a key element in the learning process (Chavarria et al., 2025; Đerić et al., 2025; Ma et al., 2024; Pan et al., 2024), yet it is often overlooked or not explicitly addressed.
During the 2023/2024 academic year, Microsoft Copilot Pro was made available to all students at the Camilo José Cela University as a learning support tool. A few weeks after its implementation, several professors—including those involved in this article—reported that some students had informally expressed their uncertainty about whether the university was promoting the use of this tool to monitor their activity and detect potential misuse.
Although this question was never formally raised, it sparked interest regarding the trust that students place in Copilot and other Artificial Intelligence (AI) agents that they commonly use as learning support tools. This article describes a study that, in addition to assessing trust in terms of privacy, examines how this group of students uses the results provided by AI agents.
We assume that, if students distrust the privacy offered by the tool, they will use it differently than they would if they were convinced of their anonymity (Phua et al., 2025; Croes & Antheunis, 2020; Polyportis & Pahos, 2024). Some students may simply choose not to use it, rendering the university's investment useless, while, paradoxically, others may use it more responsibly (Chiu, 2025; Ramirez et al., 2024). Student bias, specifically the complacency bias defined in several previous studies (Parasuraman & Manzey, 2010), may help explain this behavior, particularly in relation to overconfidence and reduced vigilance. Moreover, various studies have highlighted the role of metacognitive deficits in explaining such behaviors (D. Lee et al., 2025).
Another issue to assess is the trust that students place in the results provided by the AI agent and how they verify their accuracy (Acosta-Enriquez et al., 2024; Hartmann & Schumann, 2022). Based on the evaluation meetings held at the end of the 2023/2024 academic year, professors commonly stated that students seemed to be performing very poor checks. They reported the following:
Answers contained true elements mixed with what appeared to be AI inventions.
Students did not verify the relationship between the tools used and those presented in class, as occurred, for example, with students in a structured programming course.
In most of these cases, the students were unable to explain how they had solved the exercises, and their work showed the following:
The use of inappropriate citations that were outdated, incorrect, fabricated, or of dubious reliability (for example, due to commercial or political interests).
The lack of citations to support key data.
A presentation with a structure and expressions they recognized to be produced by AI.
The above points were collected as part of assessment meetings, based on the professors’ perceptions. These comments are a motivating element of the study, not part of it, which is why these data are not analyzed.
As in any other context, students may rely too much or too little on the results provided by AI. In this case, distrust would not be as problematic since, whether the result is correct or incorrect, the student's effort to verify it should be a learning experience (Yamaguchi et al., 2025).
Our problem lies in overconfidence, with the two most serious cases occurring when the following factors arise:
Case 1:
The result provided by AI is correct.
The student has not assimilated it correctly, as in the case of the structured programming subject explained above.
When producing a correct answer, if the misuse of AI is not detected by the professor, it will not be possible to guide the student.
Case 2:
The result provided by AI is incorrect.
The student has assimilated and studied it, believing the answer to be correct.
The professor is unable to correct the student; for example, when the student is submitting an exercise, simply studying a topic, completing a self-assessment activity, taking part in peer correction within a group of students, etc.
Overconfidence in AI—ignoring purely fraudulent use—will negatively impact the learning process. Therefore, it is essential to know the level of trust students have in AI agents, how they check the veracity of the results, and to offer them guidance in this regard.
2. Materials and Methods
The aim of this article is to explore in greater depth students’ trust in AI tools and their perceptions of privacy when using them—an issue regarded as essential to the future of learning. Additionally, the study seeks to better understand the methods students employ to verify the accuracy of the results generated by these tools, as initial observations suggest that such practices are, at best, insufficient. Accordingly, two primary hypotheses are proposed:
H1: Students do not trust that their interactions with AI tools provided by the university are private.
H2: Students do not adequately verify the results obtained from AI tools.
As a reference tool, Microsoft Copilot Pro will be used, as it is the AI assistant officially provided by Camilo José Cela University (UCJC) to its students. At the time of writing, this tool is regarded as the premium licensed version of Microsoft's AI, based on the GPT-4 and GPT-4 Turbo models. The research was planned during a period in which the free version of OpenAI's ChatGPT was still based on GPT-3.5—a less advanced iteration of the same underlying architecture. For the sake of simplicity, the term Microsoft Copilot will be used to refer to Microsoft Copilot Pro throughout this text.
The study builds upon the application of the Perceived Operational Trust Degree in AI (POTDAI) assessment system (Martín-Moncunill et al., 2024), which is based on the Technology Acceptance Model (TAM) (Davis, 1989) and follows the recommendations proposed by Vorm and Combs (2022) regarding the role of transparency factors, how they influence trust and ultimately acceptance, and the need to include trust-related factors in TAM studies.
POTDAI is designed to assess the user’s trust perception in intelligent systems and their willingness to follow recommendations from those systems, based on criteria related to trust and mistrust. It can provide valuable insights into how to increase the trustworthiness of an intelligent system or how to implement those recommendations by understanding users’ perceptions of that system. It can be used as a standalone tool to evaluate perception, included as a new part of a TAM study or combined with any other HCI (Human–Computer Interaction) techniques tailored to the specific objectives at hand. Thus, using POTDAI as a base for our study, we can evaluate the perception of trust, both in terms of privacy and the accuracy of the AI’s results, as well as the perceived ability of students to detect errors made by AI.
Although the confirmation of the aforementioned hypotheses could have been supported by a simpler study design, the opportunity to engage students—whose participation is often difficult to secure—was leveraged to conduct a more comprehensive evaluation of their trust in and use of AI as an educational tool. For this reason, the POTDAI assessment framework was selected as the foundation for the study.
Several additional questions were included to gather information about the frequency of AI use (Q1), the main AIs used by our students (Q2), and how they verify the results provided by the AI (Q3, Q4, Q5, Q6, Q7). Those questions were not intended to enrich or modify POTDAI in any way, but to gather additional information not related to trust perception, in line with what was stated in the original POTDAI article (Section 5) about the use of complementary techniques.
Given that one of the study’s objectives was to evaluate the distrust regarding the privacy of Microsoft Copilot’s use, we severely restricted the request for participant profiling data. Although obtaining additional participant data could have been useful for establishing relationships, this was completely beyond the scope of our study. In doing so, we avoided issues of distrust and bias that could have directly interfered with our purpose.
We adapted the POTDAI questions following their authors' recommendations, modifying the text in brackets [ ] as shown in Table 1. Considering that the adaptation was minimal, consistency metrics were not calculated for this sample, which may limit the comparability of the results.
The POTDAI scale is useful for assessing the perceptions of students, which may not necessarily correspond to reality. To explore this aspect, we asked students to answer the following questions:
[PAGE 1]
Q1. I use AI as a learning support tool [5-point Likert scale for frequency].
Q2. Rank the AI tools you use most often.
Q3. Describe how you usually verify that the results provided by the AI are accurate.
[PAGE 2]
Q4. I ask the AI about the references it used to establish its answer and consult them [5-point Likert scale for frequency].
Q5. When the AI offers me a solution to an exercise, I verify that it matches the tools, methods, and solution processes proposed in the subject [5-point Likert scale for frequency].
Q6. If the AI offers me a solution or solution process that is different but valid from what is proposed in class, then, most often, I: [Study the new solution. | Ask the AI to solve the problem based on what was seen in class. | Discard the AI for resolving my question and try another way].
Q7. When I used AI to help me complete a project, the professor detected errors or deficiencies [5-point Likert scale for frequency].
[PAGE 3]
Q8. I believe Microsoft Copilot is likely to provide inadequate responses for my learning [5-point Likert scale for frequency].
Q9. I believe I could easily detect when Microsoft Copilot is giving me inadequate guidance and redirect the situation [5-point Likert scale for frequency].
Q10. I believe working with Microsoft Copilot could lead to overconfidence and dependency on the AI tool [5-point Likert scale for frequency].
Q11. I feel observed, monitored, or judged by Microsoft Copilot during my learning process [5-point Likert scale for frequency].
Q12. When I am unsure how to proceed, I usually follow Microsoft Copilot’s recommendations [5-point Likert scale for frequency].
Q13. I believe Microsoft Copilot will help me progress more safely in my learning, avoiding errors in the study and understanding of the subjects I am working on [5-point Likert scale for frequency].
These questions are related to the issues presented in the introduction regarding the professors' perceptions of students' use of AI. Question Q3 could have been formulated with options such as "I check with another AI," "I search for internet sources," "I consult with my classmates," etc. It was left open-ended to prevent students from being tempted to mark options they had rarely or never used. It was considered that, even if the subsequent analysis would be more complex, the answers would thereby be more precise. For this same reason, the remaining questions regarding the use of AI were placed on a second page, while the third page contained the POTDAI evaluation, which the student could not see until completing the previous pages.
For the open-ended questions, two standardization procedures were applied: in the case of question Q2, “Rank the AI tools you use most often,” the names of AI tools and their possible variants (e.g., ChatGPT and GPT) were collected and unified under a single standardized label. For question Q3, “Describe how you usually verify that the results provided by the AI are accurate”, responses were reviewed based on the description of the procedure and grouped into standardized categories (ex post facto). Any response indicating that no verification methods were used was grouped under the category “no verification”. Responses that explicitly stated the absence of verification methods and mentioned that there was no need for them due to reliance on the student’s own judgment were recorded as “rely on personal knowledge”. Responses mentioning the use of new searches within the context of the AI itself, or cross-checking with other AIs, were grouped under the category “use the same or supplementary AI tools”. Reported verification methods that referred to internet searches without mentioning specialized databases or reliable sources (e.g., bibliographic), or that explicitly referenced specific search engines or other non-academic sources, were grouped under the category “conduct an internet search”. Responses referring to verification using academic databases or bibliographic sources were recorded as “search academic databases”. Responses referencing comparison with class exercises were grouped under the category “compare with class exercises or assignments”, while those referring to comparison with notes, class lessons, or asking the teacher were grouped under the category “compare with class lessons”. An “other” category was included to record any responses that did not fit into the previous categories.
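To illustrate this standardization procedure, the following minimal Python sketch shows one possible way to unify tool names (Q2) and map free-text verification descriptions (Q3) onto the categories described above. The keyword lists, aliases, and helper functions are hypothetical approximations of the manual coding that was actually performed.

```python
# Minimal sketch of the standardization applied to the open-ended answers (Q2, Q3).
# Keyword lists and helper names are illustrative; the actual coding was done manually.

TOOL_ALIASES = {
    "chatgpt": "ChatGPT", "gpt": "ChatGPT", "openai": "ChatGPT",
    "copilot": "Microsoft Copilot", "gemini": "Gemini", "deepseek": "DeepSeek",
}

Q3_CATEGORIES = [
    ("no verification", ["do not verify", "no verification", "don't check"]),
    ("rely on personal knowledge", ["my own knowledge", "my judgment"]),
    ("use the same or supplementary AI tools", ["another ai", "ask the ai again", "other ai"]),
    ("conduct an internet search", ["google", "internet", "search online"]),
    ("search academic databases", ["scholar", "academic database", "bibliograph"]),
    ("compare with class exercises or assignments", ["exercise", "assignment"]),
    ("compare with class lessons", ["notes", "class lesson", "ask the teacher"]),
]

def standardize_tool(name: str) -> str:
    """Map a raw tool mention (e.g., 'GPT', 'chat gpt') to a single standardized label."""
    key = name.strip().lower().replace(" ", "")
    for alias, label in TOOL_ALIASES.items():
        if alias in key:
            return label
    return name.strip()  # keep unrecognized tools as reported

def categorize_verification(answer: str) -> list[str]:
    """Assign one or more standardized Q3 categories to a free-text answer."""
    text = answer.lower()
    hits = [cat for cat, keywords in Q3_CATEGORIES if any(k in text for k in keywords)]
    return hits or ["other"]  # fall back to the residual category
```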
3. Results
Responses were collected from a total of 132 students, aged between 18 and 53 years, who were recruited anonymously via a survey link distributed through class delegates and degree coordinators. Of these, 86 were enrolled in programs at the School of Engineering and Technology at UCJC, and 46 in Social Sciences. All UCJC students have received training in the AI tools provided by the institution.
The distribution of students across academic levels was as follows: within the Engineering group, there were 22 first-year students, 18 second-year students, 14 third-year students, 5 fourth-year students, and 27 postgraduate students. In the Social Sciences group, there were 3 first-year students, 9 second-year students, 12 third-year students, 11 fourth-year students, and 11 postgraduate students.
Responses were analyzed by academic field, year of study, and age group. The results showed a high degree of consistency across the different segmentations.
The findings presented below are organized according to each question block in the study. Each section begins with a direct description of the results associated with each question, followed by a more interpretative summary paragraph that synthesizes and connects the various points of interest identified.
In response to the first question, “I use AI as a learning support tool,” 67 participants answered “frequently” (50.8%), 38 “very frequently” (28.8%), 21 “occasionally” (15.9%), 6 “rarely” (4.5%), and none selected “never.” Overall, 79.5% of respondents reported using AI frequently or very frequently, while 20.4% used it occasionally or rarely. Notably, no participants indicated never having used AI in their studies.
The second question, “Rank the AI tools you use most often” (Table 2), was open-ended. A total of 109 participants identified OpenAI’s ChatGPT as their primary tool, with 36 of them listing it as their sole option. Thus, 82.6% of respondents considered ChatGPT their main AI tool, and 27.3% used it exclusively. Microsoft Copilot, the university-licensed AI tool, was ranked first by 9 participants (6.8%) and second by 25 (18.8%), with an overall usage incidence of 16.8%. Other frequently mentioned tools included DeepSeek and Google’s Gemini, each cited 32 times (9.2% of total mentions). A wide variety of additional specialized AI assistants, not corresponding to the aforementioned tools, were also referenced, accounting for 88 of the 345 total mentions.
As previously stated, regarding the question “Describe how you usually verify that the results provided by the AI are accurate” (Table 3), seven distinct verification strategies were identified: “No verification” (4.3%), “Rely on personal knowledge” (9.9%), “Use the same or supplementary AI tools” (25.9%), “Conduct an internet search” (35.4%), “Search academic databases” (3.3%), “Compare with class exercises or assignments” (10.4%), and “Compare with class lessons” (9.4%). Some responses included multiple strategies, resulting in 212 total mentions, exceeding the number of participants. These different strategies were often mentioned in relation to specific usage contexts (personal use, class use, etc.), reflecting the varied ways in which participants approached verification depending on the situation. Only 3 responses (1.4%) did not align with any of the identified categories. Overall, 76.9% of responses reflected verification practices that may be considered insufficiently rigorous (e.g., no verification, reliance on personal judgment or AI tools, or unverified internet sources). Fewer than one in four participants reported using academic databases or class materials for verification.
In response to the fourth question, “I ask the AI about the references it used to establish its answer and consult them,” 42 participants answered “very frequently” (31.8%), 41 “frequently” (31.1%), 31 “occasionally” (23.5%), 11 “rarely” (8.3%), and 7 “never” (5.3%). Thus, 62.9% of respondents reported frequently or very frequently checking the sources cited by AI, while 13.6% did so rarely or never. These findings are consistent with the previous results.
The fifth question, “When the AI offers me a solution to an exercise, I verify that it matches the tools, methods, and solution processes proposed in the subject,” yielded the following responses: 52 “very frequently” (39.4%), 60 “frequently” (45.5%), 13 “occasionally” (9.8%), 4 “rarely” (3%), and 3 “never” (2.3%). A total of 84.9% of participants reported frequently or very frequently verifying that AI-generated solutions aligned with classroom methodologies, while 5.3% did so rarely or never.
To the question “If the AI offers me a solution or solution process that is different but valid from what is proposed in class, then, most often I:” (Table 4), 41 participants reported studying the new solution (31.1%), 83 asked the AI to adapt to classroom criteria (62.9%), and 8 chose to disregard the AI and seek alternative methods (6%).
The seventh question, “When I used AI to help me complete a project, the professor detected errors or deficiencies,” received the following responses: 2 “very frequently” (1.5%), 5 “frequently” (3.8%), 29 “occasionally” (22%), 65 “rarely” (49.2%), and 31 “never” (23.5%). Nearly one-quarter of students stated that professors never identified errors in AI-generated content, while 71.2% reported that such errors were rarely or occasionally detected. Only 5.3% indicated that errors were frequently or very frequently identified.
This initial round of questions—as shown in Table 5—outlines scenarios in which a significant majority of students (79.5%) report using AI frequently or very frequently in their academic work, with none indicating complete non-use. All participants had used AI tools at some point, with OpenAI’s ChatGPT being the predominant choice (82.6% as the primary tool, and 27.3% as the sole tool). Microsoft Copilot, despite being institutionally licensed and integrated into university-provided Microsoft tools, was used by only 16.8% of students. More than three-quarters of participants (76.9%) reported either not verifying or inadequately verifying AI-generated content. When AI responses diverged from classroom instruction, 62.9% asked the AI to conform to class methodologies, 31.1% explored the alternative solution, and 6% rejected the AI response altogether. Finally, nearly three-quarters of students (72.7%) stated that professors rarely or never detected errors in AI-assisted work.
The following set of questions—Table 6—focused on Microsoft Copilot, the AI tool officially provided by the university to its students. Despite being the second most frequently used AI tool according to the survey results, it lagged significantly behind OpenAI’s ChatGPT, with 56.1% of participants indicating they did not use it at all. The institution has actively promoted Microsoft Copilot as the preferred AI tool, offering several seminars on its advanced use. For this reason, it was assumed that all participants were sufficiently informed to respond to specific questions about the tool, regardless of their frequency of use.
In response to the statement “I believe Microsoft Copilot is likely to provide inadequate responses for my learning,” 9 participants answered “very frequently” (6.8%), 30 “frequently” (22.7%), 63 “occasionally” (47.7%), 28 “rarely” (21.2%), and 2 “never” (1.5%). More than a quarter of students (29.5%) believed Copilot frequently or very frequently provided inadequate responses, a figure that rises to 77.2% when occasional inadequacies are included. Only 22.7% believed that such inadequacies were rare or nonexistent.
The ninth question, “I believe I could easily detect when Microsoft Copilot is giving me inadequate guidance and redirect the situation,” yielded 24 “very frequently” (18.2%), 56 “frequently” (42.4%), 40 “occasionally” (30.3%), 12 “rarely” (9.1%), and none selected “never.” A total of 60.6% of students expressed confidence in their ability to frequently or very frequently detect errors from Microsoft Copilot, and this figure increased to 90.9% when occasional detection was included. Notably, no participant reported being unable to detect such errors, and only 9.1% expressed uncertainty.
In response to the statement “I believe working with Microsoft Copilot could lead to overconfidence and dependency on the AI tool,” 23 participants answered “very frequently” (17.4%), 21 “frequently” (15.9%), 35 “occasionally” (26.5%), 38 “rarely” (28.8%), and 15 “never” (11.4%). One-third of students (33.3%) believed that using Microsoft Copilot could frequently or very frequently lead to overconfidence or dependency. Conversely, 40.2% considered this outcome unlikely or impossible.
The eleventh question, “I feel observed, monitored, or judged by Microsoft Copilot during my learning process,” received 7 “very frequently” (5.3%), 13 “frequently” (9.8%), 26 “occasionally” (19.7%), 34 “rarely” (25.8%), and 52 “never” (39.4%). Only 39.4% of respondents firmly rejected the idea of being observed by the AI tool, while 60.6% acknowledged this possibility, with 15.1% experiencing it frequently or very frequently.
The statement “When I am unsure how to proceed, I usually follow Microsoft Copilot’s recommendations,” yielded 8 “very frequently” (6.1%), 37 “frequently” (28%), 44 “occasionally” (33.3%), 37 “rarely” (28%), and 6 “never” (4.5%). Just over one-third of students (34.1%) reported that they would frequently or very frequently follow Copilot’s recommendations when uncertain, while 32.5% indicated they would rarely or never do so.
Finally, in response to the statement “I believe Microsoft Copilot will help me progress more safely in my learning, avoiding errors in the study and understanding of the subjects I am working on,” 19 participants answered “very frequently” (14.4%), 49 “frequently” (37.1%), 44 “occasionally” (33.3%), 16 “rarely” (12.1%), and 4 “never” (3%). More than half of the students (51.5%) believed that Microsoft Copilot could frequently or very frequently support their learning process, while only 15.1% considered this unlikely or impossible.
These results reveal that 77.2% of students perceive Microsoft Copilot as a potentially unreliable source of information, although over 90% feel capable of identifying its errors—none reported being unable to do so. Opinions were more divided regarding the risk of overconfidence and dependency: nearly one-third of students viewed this as a real possibility, while a slightly larger proportion considered it unlikely. This polarization was also evident in responses adopting Copilot’s suggestions. Surprisingly, only 39.4% of students categorically rejected the idea of feeling observed by the AI tool, despite being informed that neither Microsoft nor the university retains records of their interactions. Paradoxically, only 15.1% of participants believed that Copilot would not contribute positively to their learning.
Survey results were also analyzed by age, academic year, and academic background. Overall, the findings were relatively homogeneous, with differences generally remaining below 10 percentage points relative to segmented participation rates, except in the specific cases discussed below. The data were also compared using the non-parametric Mann–Whitney U test. When the POTDAI questions were compared by academic background (Social Sciences vs. School of Engineering), only one significant difference was detected, in question Q9 (“I believe I could easily detect when Microsoft Copilot is giving me inadequate guidance and redirect the situation”), with an effect size of r = 0.22, which falls within the category of small effects according to Rosenthal’s r. When compared by age (25 years or younger vs. 26 years or older), only a minor difference was found in question Q11 (“I feel observed, monitored, or judged by Microsoft Copilot during my learning process”), with an effect size of r = 0.216, which is likewise considered small. The comparison between undergraduate and postgraduate groups did not yield any significant differences.
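As an illustration of this type of analysis, the following Python sketch shows how a Mann–Whitney U comparison and Rosenthal's r could be computed with SciPy, assuming the Likert responses are coded 1 to 5; the arrays shown are hypothetical placeholders, not the study data.

```python
# Minimal sketch of the between-group comparison (Likert scores assumed coded 1-5).
# The arrays are hypothetical placeholders, not the collected survey data.
import numpy as np
from scipy.stats import mannwhitneyu

q9_engineering = np.array([4, 3, 5, 2, 4, 3, 4, 5])   # hypothetical Q9 answers, Engineering
q9_social_sci = np.array([3, 4, 4, 5, 4, 5, 3, 4])    # hypothetical Q9 answers, Social Sciences

u_stat, p_value = mannwhitneyu(q9_engineering, q9_social_sci, alternative="two-sided")

# Rosenthal's r: convert U to a z score (normal approximation, no tie correction)
# and divide by the square root of the total sample size; |r| < 0.3 is a small effect.
n1, n2 = len(q9_engineering), len(q9_social_sci)
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_stat - mu_u) / sigma_u
r_effect = abs(z) / np.sqrt(n1 + n2)

print(f"U = {u_stat:.1f}, p = {p_value:.3f}, r = {r_effect:.2f}")
```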
To explore correlations between the POTDAI questions and the rest of the survey items, the non-parametric Spearman correlation test was used. A weak relationship (Spearman’s rho between 0.24 and 0.28) was found between question Q1 (“I use AI as a learning support tool”) and questions Q11, Q12 (“When I am unsure how to proceed, I usually follow Microsoft Copilot’s recommendations”), and Q13 (“I believe Microsoft Copilot will help me progress more safely in my learning, avoiding errors in the study and understanding of the subjects I am working on”). A weak negative relationship was also detected (p-value of 0.016 and Spearman’s rho of −0.209) between questions Q4 (“I ask the AI about the references it used to establish its answer and consult them”) and Q8 (“I believe Microsoft Copilot is likely to provide inadequate responses for my learning”). The correlations between question Q5 (“When the AI offers me a solution to an exercise, I verify that it matches the tools, methods, and solution processes proposed in the subject”) and questions Q9 and Q10 (“I believe working with Microsoft Copilot could lead to overconfidence and dependency on the AI tool”) were also weak (p-values of 0.003 and 0.007, respectively), as was the correlation between questions Q7 (“When I used AI to help me complete a project, the professor detected errors or deficiencies”) and Q9, although in the latter case the relationship was also negative. No other correlations yielded significant results, suggesting that the findings are consistent across groups.
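Similarly, the correlation analysis can be reproduced with SciPy's Spearman test, as sketched below on hypothetical, illustratively coded answers (not the actual survey data).

```python
# Minimal sketch of the Spearman correlation between two survey items (coded 1-5).
# The lists are hypothetical placeholders, not the collected survey data.
from scipy.stats import spearmanr

q4_scores = [5, 4, 3, 5, 2, 4, 1, 3]   # hypothetical answers to Q4
q8_scores = [2, 2, 3, 1, 4, 2, 5, 3]   # hypothetical answers to Q8

rho, p_value = spearmanr(q4_scores, q8_scores)
print(f"Spearman's rho = {rho:.3f}, p = {p_value:.3f}")   # weak negative correlation expected here
```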
In response to the question “I use AI as a learning support tool,” students from Social Sciences programs selected “very frequently” 14.6% more often than their peers in Engineering, who more commonly selected “occasionally” (12.7% more). While overall AI usage levels were similar across both groups, Social Sciences students reported higher usage of the DeepSeek tool (10.6% more) and other specialized AI assistants (15.4% more).
For the question “Describe how you usually verify that the results provided by the AI are accurate” (Q3), Social Sciences students were 15.4% more likely to report using the same or supplementary AI tools for verification. In contrast, Engineering students were 15.7% more likely to verify results by comparing them with exercises or assignments completed in class.
In response to “If the AI offers me a solution or solution process that is different but valid from what is proposed in class, then, most often I:”, Social Sciences students were 10% more likely to study the new solution, whereas Engineering students were 11% more likely to request a revised response aligned with classroom criteria.
Regarding the question “When I used AI to help me complete a project, the professor detected errors or deficiencies,” 10.7% more Engineering students selected “never.”
In response to “I believe Microsoft Copilot is likely to provide inadequate responses for my learning,” Social Sciences students showed a more moderate stance, with 16.5% more selecting “occasionally.”
For the question “I believe I could easily detect when Microsoft Copilot is giving me inadequate guidance and redirect the situation,” 11.7% more Social Sciences students selected “frequently,” while Engineering students more often selected “occasionally.”
In response to “I believe working with Microsoft Copilot could lead to overconfidence and dependency on the AI tool,” 14.2% more Engineering students selected “rarely.”
Regarding the statement “When I am unsure how to proceed, I usually follow Microsoft Copilot’s recommendations,” 15.6% more Social Sciences students selected “occasionally,” whereas Engineering students’ responses were more polarized across “rarely,” “frequently,” and “very frequently.”
A similar pattern emerged in response to “I believe Microsoft Copilot will help me progress more safely in my learning, avoiding errors in the study and understanding of the subjects I am working on,” with 12.2% more Social Sciences students selecting “occasionally.”
4. Discussion
Based on the results obtained throughout the study, several key points can be identified that support the validation of the proposed hypotheses. First and foremost, the impact of AI on the educational domain is evident, with nearly 80% of students reporting frequent or very frequent use of these tools during their learning process. Not a single respondent stated that they had never used AI tools for academic purposes. This usage is primarily centered on general-purpose models not specifically trained for particular tasks. Over 82% of participants identified OpenAI’s ChatGPT as their primary AI assistant, and in more than one-quarter of cases, it was the only tool used. This suggests that domain-specific specialization is not a decisive factor in the selection of AI tools, as the results showed less than a 10% difference between Engineering and Social Sciences students.
Only a small fraction of the roughly 25% of mentions corresponding to alternative tools referred to models focused on specific areas (grouped under the “other” category in this study), and almost none were cited as the primary option. Even when the university provided a licensed tool—potentially perceived as superior to free and open alternatives—its use was limited to 2.6% as a first choice and 14.2% as a secondary or lower-tier option. These findings indicate that students tend to favor widely recognized or dominant tools over those tailored to their field of study, which may reflect either a lack of concern for the reliability of AI-generated responses or a lack of awareness regarding the differences among available tools.
Regarding the verification of AI-generated results and sources, more than 75% of students reported using methods considered unreliable for this purpose. Only 62.9% stated that they frequently or very frequently requested sources from the AI—regardless of whether they subsequently verified them. Just 5.3% reported rarely or never checking whether AI responses aligned with the methods, tools, and problem-solving processes taught in class. This suggests that students’ trust in AI is based more on the plausibility of its responses and their alignment with typical procedures than on rigorous verification. This notion is further supported by the methods students reported using to validate AI outputs (such as “I contrast the information provided with videos recommended by the AI”, “I compare with several AI tools” or “I ask the AI for the same information several times”), which reveal a problematic landscape given the tendency of these tools to “hallucinate”—producing false but plausible answers. It is important to mention that Q3, “Describe how you usually verify that the results provided by the AI are accurate”, and Q5, “When the AI offers me a solution to an exercise, I verify that it matches the tools, methods, and solution processes proposed in the subject”, differ in their usage context. Question Q5 specifically states that verification is carried out in a learning context (an “exercise”), whereas question Q3 refers to general usage, including personal contexts. This distinction became evident in responses to Q3, where various verification methods were specified for different usage contexts. This differentiation would be an interesting subject for future research.
Only 6% of respondents stated that they would discard AI-generated responses if they differed from classroom methods, while just over 31% used such discrepancies as opportunities to deepen their understanding of alternative perspectives. Nearly 63% instructed the AI to conform to specific methodologies or formats taught in class, indicating that a clear majority primarily tend to seek plausible answers within their educational context. This tendency is reinforced by students’ perceptions regarding the likelihood of AI providing inappropriate results—or of professors being able to detect such errors, which was deemed unlikely by 72.7% of respondents, rising to 94.7% when occasional detection was included. In summary, the findings point to an overreliance on AI tools and insufficient verification of their outputs, underpinned by the belief that instructors are unlikely to identify errors.
Interestingly, when asked whether Microsoft Copilot might provide inadequate responses, over 75% of respondents considered this a possibility, ranging from very frequent to occasional. It is important to note that the university has provided training on this tool, explicitly stating that Copilot uses a more advanced version of OpenAI’s ChatGPT engine than the free version. This reinforces the conclusion that students’ trust is not based on the technical capabilities of the tools themselves, but rather on their perception of brand reliability or on practices that have become normalized through habitual use. ChatGPT seems to be perceived as more trustworthy than Microsoft Copilot Pro, despite the latter being a technically superior implementation of the same model—at least when compared to the commonly used free version. It may be that the lack of trust is not directed at the model itself but at the branding associated with Microsoft Copilot, although this hypothesis would require further investigation.
On the other hand, students’ self-perceived ability to detect errors in AI responses is notably high. Over 90% of respondents stated that they could identify inadequate responses at least occasionally, and none reported being unable to do so. This figure contrasts sharply with the perceived ability of professors to detect such errors, which was considered unlikely in nearly 95% of cases. This suggests that students either conduct a thorough initial screening of AI outputs or have greater confidence in their own judgment than in that of their instructors, as can be seen in Figure 1.
These findings align with other studies that link the ability to detect errors in AI-generated outputs to the Dunning–Kruger effect (Guan et al., 2025; Yousef et al., 2025; Besigomwe, 2025) or similar constructs (Nosrati & Motaghi, 2025), whereby a lack of experience in a given domain of knowledge may lead individuals to place disproportionate confidence in their own evaluative abilities.
According to the data collected, one-third of the students acknowledged that using AI tools—particularly Copilot—could lead to overconfidence, while more than 40% considered this outcome unlikely or impossible. Over 60% reported feeling at least occasionally observed or judged by Copilot, indicating a tendency to evaluate interactions with AI tools subjectively and irrationally. This is noteworthy given that university training explicitly assured students that their conversations with the assistant are neither stored nor used to train the model. This perception may also reflect a degree of mistrust toward the institution itself, as it is responsible for providing both the tool and the training.
Regarding whether students followed Copilot’s recommendations when unsure how to proceed, responses were nearly evenly split: one-third reported doing so frequently or very frequently; another third responded “occasionally;” and the remaining third stated they rarely or never followed such guidance. It is also worth noting that one-third of respondents expressed ambivalence about whether Copilot could support their learning process. However, in this case, positive responses outweighed negative ones (51.5% vs. 15.1%, respectively). Once again, a degree of irrational subjectivity is evident in the responses: while more than half of the students believed that AI could frequently or very frequently support their learning, their confidence in the tool was called into question by several of their previous answers.