1. Introduction
Data analysis and artificial intelligence methods are increasingly indispensable skills across many types of industries, as they are essential for boosting the development of new services and products, fueling decision-making, driving innovation, and steering strategic planning (
Iansiti & Lakhani, 2020;
Jagatheesaperumal et al., 2022). Therefore, higher education institutions are challenged to develop innovative educational solutions that foster these cross-disciplinary competencies in their students, preparing professionals able to proficiently and confidently navigate the unknown challenges of their future careers (
Kuleto et al., 2021;
Luan et al., 2020). Traditionally, data analysis and machine learning competencies are taught to freshman, sophomore, junior, and senior college students depending on the career and the target level of specialization. For instance, descriptive statistics and linear regression topics are taught to freshman and sophomore students in all engineering careers (
Balady & Taylor, 2024;
Legaki et al., 2020), while more advanced topics such as digital signal processing and classification models are taught to junior and senior students in computing and robotics careers (
Åkerfeldt & Petersen, 2024;
Terboven et al., 2020). In general, prerequisites are first- and second-year mathematics and statistics, along with a programming language. For more specialized courses, additional prerequisites include linear algebra, differential equations and numerical analysis. Regardless of the academic year, data analysis and machine learning courses primarily focus on theoretical concepts accompanied by practical activities using a programming language and previously recorded and pre-processed datasets (
Bin Tan & Cutumisu, 2024).
This conventional approach can hinder the development of practical problem-solving abilities and reduce intrinsic motivation, as students may not fully grasp the relevance or application of the concepts; it also limits students’ engagement with the complete data lifecycle, from collection and preparation to contextualization in real-world applications. In addition, students are expected to develop the skills to design and execute experiments in unexpected application fields, as required in their professional lives, often without having had the opportunity to do so in their courses. Developing the competency to design and implement real-life solutions on the basis of courses with a strong theoretical component and, crucially, using data collected and prepared by others rather than by the students themselves, is a highly challenging pedagogical task.
The development of competencies in engineering students has been investigated in various domains, such as in a multidisciplinary project-based learning method for a mechanism course (
Guajardo-Cuéllar et al., 2022), in a classical product development project versus a challenge-based project to enhance innovation skills (
Charosky & Bragós, 2022), in the fabrication of a boomerang device using analytical and computational tools (
Guajardo-Cuéllar et al., 2020), and with mixed reality for control engineering (
Navarro-Gutiérrez et al., 2023). Several studies have also proposed learning activities and reported best practices to develop data analysis and machine learning skills in university students (
Padilla & Campos, 2023;
Tsai, 2024). A method to integrate data analysis and machine learning in a learning management system was also proposed to improve student learning in an online education model (
Villegas-Ch et al., 2020). Other efforts advocate for scholars teaching data science-related topics to possess, in addition to theoretical expertise, experience in solving real-world problems, collecting data, and considering ethical implications (
DeMasi et al., 2020;
Hicks & Irizarry, 2018). In addition, contemporary educational strategies aim to enhance student engagement and motivation. For instance, participatory teaching methods in hands-on courses have been shown to positively impact learning motivation by allowing students to actively participate in content planning and evaluation (
Ma, 2023). Similarly, gamification has been explored as an approach to increase motivation and reduce cognitive load, thereby improving learning outcomes in various settings (
Baah et al., 2024). Furthermore, the application of machine learning models to predict student success in online courses also underscores the increasing integration of ML in education itself, validating the relevance of these competencies within pedagogical research (
Arévalo-Cordovilla & Peña, 2024).
Despite significant efforts by educators and the widespread availability of online courses and tools for teaching data analysis and machine learning models (
Alam & Mohanty, 2023;
Ismail et al., 2023;
Donoghue & Ellis, 2021), these common teaching methodologies often do not incorporate domain-specific contextual information relevant to the application area. Although these approaches contribute significantly to enhancing educational experiences, many still rely on structured or pre-processed datasets, limiting the authentic experience of managing raw, noisy data from real-world experiments. This drastically limits students’ ability to understand and leverage the relationship between theory and practice, which is essential for developing the competencies needed to address real-world situations demanded by the labor market. Consequently, the challenge lies in fostering an educational environment that bridges the gap between theoretical knowledge and practical, hands-on experience, ensuring that students are active participants in their learning journey. Research indicates that active engagement significantly enhances learning outcomes and performance, highlighting the need for innovative pedagogical strategies that promote deep interaction with the subject matter (
Oliver & McNeil, 2021;
Savonen et al., 2024). Hence, to promote the development of knowledge, skills, and attitudes related to data analysis and machine learning, we designed and implemented a learning activity that requires students’ active participation in the entire process of data recording, processing, analysis, and application of machine learning methods. The activity targets engineering students from diverse careers through hands-on experiences, with the goal of making them active participants in the learning process using their own data.
Tecnologico de Monterrey, a top-ranked higher education and research institution in Mexico, offers a new teaching model based on competency development, which fosters learning and knowledge discovery through solutions to real challenges proposed by external and independent educational partners. This educational model, called
TEC21 (
ITESM, 2018;
Olivares Olivares et al., 2021), allows the development of skills, attitudes and values to address real problems through the creation of a natural link between theory and practical situations. This model aligns with current trends of higher education institutions that promote moving from the objective-based learning model towards a competency-based learning model (
Gooding, 2023;
Thompson & Harrison, 2023). This new teaching trend aims for students to develop skills and attitudes focused on understanding and bridging the gap between theory and practice. This is precisely what needs to be encouraged in the teaching of data analysis and machine learning for engineering students.
Aligned with our educational model, this paper presents the implementation of a practical learning activity consisting of the collection and visualization of students’ electroencephalographic (EEG) brain signals in real settings, the application of signal processing techniques, and the use of machine learning methods to decode the gathered data. The use of non-invasive brain signals was motivated by the strong appeal and interest it generates in students regardless of their career, its user-friendliness for conducting experiments, and its suitability for exploring and understanding data analysis and machine learning concepts and tools. The activity was implemented in six courses across four engineering careers over two consecutive academic years. The learning gain was used to measure the grade change brought about by the activity, and the MUSIC model inventory was used to quantify the students’ perceptions of empowerment, usefulness, success, interest, and caring generated by the activity. In addition, contextual examinations were applied to spark interest and to measure students’ pre- and post-activity experience related to the learning activity’s topics. This practical activity fostered learning by doing, which in turn enhanced the development of data analysis and machine learning competencies in the courses where it was applied, and increased students’ motivation and appreciation for these topics.
Our work distinguishes itself by proposing and implementing a novel learning activity that immerses engineering students in the entire cycle of data analysis and machine learning by having them collect their own electroencephalographic (EEG) signals through hands-on experiments. This distinctive element ensures that students are not merely processing data but are intimately involved in its generation and contextualization, providing an understanding of data characteristics and their implications for analysis. Unlike studies focusing solely on pre-existing data or simulated environments, our methodology directly addresses the critical gap in developing the competency to design and execute experiments in unexpected application fields, a skill highly valued in professional life.
2. Materials and Methods
2.1. Learning Activity
The learning activity consists of four stages, which are described in the following subsections.
2.1.1. Stage 1: EEG Acquisition and Visualization Experiment
In this stage students are first instructed on how to use the hardware and software components for acquisition and visualization of electroencephalographic (EEG) brain signals. Students are then organized into groups and instructed to connect all hardware components and make all necessary software configurations to acquire and visualize EEG signals on their own. For this, one of the students adopts the role of experimental subject (from whom the EEG signals will be recorded) while the rest adopt the role of experimenters (responsible for executing the experiment).
With the signals displayed in real time, all students are instructed to recognize noise-free EEG signals in a relaxed state with open and closed eyes, as well as EEG signals contaminated by artifacts such as blinking, eye movement, jaw clenching and body movement. This practical activity is used to introduce and describe technical concepts such as continuous-time and discrete-time signals, analog-to-digital conversion, sampling frequency, Nyquist frequency, frequency band, and digital filtering, among others.
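To make these concepts concrete, a minimal Python sketch such as the one below can accompany the discussion. It is illustrative only: the synthetic sine, the sampling rates, and the variable names are choices made for the example and are not part of the activity materials.

```python
import numpy as np

fs = 250.0          # sampling frequency of the EEG amplifier used in the activity (Hz)
duration = 2.0      # seconds of signal
f_signal = 10.0     # a 10 Hz sine standing in for an alpha-band oscillation

# Discrete-time signal: samples of the "continuous" sine taken at instants n / fs
n = np.arange(int(duration * fs))
x = np.sin(2 * np.pi * f_signal * n / fs)

# Nyquist frequency: the highest frequency representable without aliasing
print("Nyquist frequency:", fs / 2.0, "Hz")        # 125.0 Hz

# Sampling below 2 * f_signal aliases the 10 Hz sine to an apparent 2 Hz oscillation
fs_low = 12.0
n_low = np.arange(int(duration * fs_low))
x_aliased = np.sin(2 * np.pi * f_signal * n_low / fs_low)
```

Plotting x against n / fs and x_aliased against n_low / fs_low makes the aliasing effect visible at a glance.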
2.1.2. Stage 2: Evoked Potentials Experiment
The purpose of this stage is for students to carry out EEG experiments to develop data recording skills and to obtain their own dataset of EEG signals, which will be used in the next stages for data analysis and machine learning purposes. Once again, students are organized into teams where one member (different from the one in the first stage) is assigned as the subject and the rest are assigned as experimenters. Experimenters first configure the hardware and software to acquire and display the EEG signals from the subject (as previously carried out in stage 1). The experimenters then start a graphical user interface (GUI) that will guide the subject through the tasks to be performed while the EEG signals are recorded. The GUI presents a classic P300 evoked potential experimental protocol (see
Chailloux Peguero et al. (
2020);
Picton (
1992) for a detailed description of the P300 paradigm). In this protocol, the subject is seated in front of a computer screen wearing the EEG system and is instructed to follow the instructions presented in the GUI. The GUI displays five symbols: four arrows located at the left, right, top, and bottom of the screen and a “STOP” traffic sign located in the center.
The students then initiate the experiment, which proceeds as follows. One of the symbols is randomly highlighted with a blue background for a couple of seconds, indicating the target symbol on which the subject has to focus. The symbols are then randomly highlighted, one by one, by superimposing a yellow smiling cartoon face on them. The subject is requested to mentally count each time the target symbol is highlighted while ignoring the instances when the other, non-target symbols are highlighted. This experiment lasts almost 5 min, and at the end the students obtain a data file with the raw EEG data along with a marker signal that indicates the exact time instants at which the target and non-target symbols were highlighted.
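The stimulation GUI used in the activity is in-house C++ software (see Section 2.6.2) and is not reproduced here; the hedged Python sketch below only illustrates the logic of the protocol, i.e., how a randomized highlighting sequence and its marker times could be generated. All names and timing values are illustrative assumptions.

```python
import random

SYMBOLS = ["left", "right", "up", "down", "stop"]   # four arrows plus the central STOP sign
FLASH_DURATION = 0.2    # seconds each symbol stays highlighted (illustrative value)
N_ROUNDS = 30           # number of randomized highlighting rounds (illustrative value)

def build_trial(rng: random.Random):
    """Return the target symbol and a list of (onset_time, symbol, is_target) markers."""
    target = rng.choice(SYMBOLS)            # symbol the subject must silently count
    markers, t = [], 0.0
    for _ in range(N_ROUNDS):
        order = SYMBOLS[:]
        rng.shuffle(order)                  # symbols highlighted one by one in random order
        for symbol in order:
            markers.append((t, symbol, symbol == target))
            t += FLASH_DURATION
    return target, markers

target, markers = build_trial(random.Random(42))
print(target, markers[:3])
```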
2.1.3. Stage 3: EEG Data Preparation and Processing
In this stage, students are instructed to work with the previously collected data using software for data analysis such as Python 3.10 or Matlab R2022b. Students can work individually or in pairs. The first task is to load and plot the raw data using code they develop, guided by the professor. They inspect the EEG signals and the marker signal that indicates the time instants at which the target and non-target symbols were presented. This visual analysis is used to relate the experiment to the recorded data. They then proceed to write code to extract segments of EEG signals relative to the presentation of the happy face superimposed on the symbols. These data segments are extracted with a duration of one second and are labeled as target or non-target depending on whether or not the subject was attending to the highlighted symbol, as encoded in the marker signal. Note that this task promotes the acquisition of skills and technical details about data preparation and manipulation, which are not typically exercised when data is pre-recorded and prepared by someone else.
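As an illustration of this first task, the minimal Python sketch below extracts labeled one-second segments around each stimulus onset. The array names, the marker encoding (a single pulse per onset, 1 = non-target, 2 = target), and the placeholder data are assumptions made so that the example runs on its own; the actual file format depends on the recording software.

```python
import numpy as np

fs = 250                     # sampling rate of the EEG system (Hz)
epoch_len = fs               # 1-second segments -> 250 samples

# Placeholder data: eeg is (n_samples, n_channels); marker is (n_samples,) with a
# single pulse at each stimulus onset (0 = none, 1 = non-target, 2 = target).
rng = np.random.default_rng(0)
n_samples, n_channels = 60 * fs, 8
eeg = rng.normal(size=(n_samples, n_channels))
marker = np.zeros(n_samples)
onset_idx = np.arange(fs, n_samples - fs, fs)          # one synthetic stimulus per second
marker[onset_idx] = rng.integers(1, 3, size=onset_idx.size)

epochs, labels = [], []
for onset in np.flatnonzero(marker > 0):               # sample index of each stimulus onset
    if onset + epoch_len <= n_samples:                 # keep only complete 1 s windows
        epochs.append(eeg[onset:onset + epoch_len, :])
        labels.append(1 if marker[onset] == 2 else 0)  # 1 = target, 0 = non-target

epochs = np.stack(epochs)          # shape: (n_epochs, 250, n_channels)
labels = np.array(labels)
print(epochs.shape, labels.shape)
```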
In the second task, students are instructed to apply data analysis techniques such as descriptive statistical analyses, and signal processing techniques such as band-pass filtering, resampling and artifact-contaminated segment identification and rejection. Technical details about these data analysis and signal processing methods are discovered by and/or suggested to the students before they are asked to implement and apply them. During the application of these techniques, students are always asked to identify and understand their effects by comparing the signals before and after the application of the method. Finally, students are asked to apply the ensemble averaging technique to compute the evoked responses (
Chailloux Peguero et al., 2020;
Delijorge et al., 2020). This technique is separately applied for the segments of target and non-target stimulation, independently for each EEG channel.
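A minimal sketch of the ensemble averaging step is shown below; it assumes epoch and label arrays shaped as in the previous sketch and uses random placeholder data so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_channels = 250, 8
# Placeholder epochs/labels; in the activity these come from the segment-extraction step.
epochs = rng.normal(size=(150, fs, n_channels))      # (n_epochs, n_samples, n_channels)
labels = rng.integers(0, 2, size=150)                # 0 = non-target, 1 = target

# Ensemble averaging: average the epochs separately per condition and per channel
erp_target = epochs[labels == 1].mean(axis=0)        # (n_samples, n_channels)
erp_nontarget = epochs[labels == 0].mean(axis=0)

# Latency (ms) of the maximum of the target ERP on each channel; in real recordings
# a P300 component should appear around ~300 ms after stimulus onset.
peak_latency_ms = 1000 * erp_target.argmax(axis=0) / fs
print(peak_latency_ms)
```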
2.1.4. Stage 4: Machine Learning Model
This final stage aims to develop students’ knowledge and skills in feature extraction techniques, classification models, training and testing of the classification models, as well as methodologies and metrics for the assessment of machine learning models. As described in stage 3, students can work individually or in pairs using software for data analysis. They use the pre-processed and cleaned EEG data segments previously extracted for the target and non-target conditions.
The first task here is to build their own code to extract (compute) features or attributes. Since the experiment is based on evoked potentials (as described in stage 2), this task focuses on extracting temporal features (
Torres-García et al., 2022). Students are instructed and guided to perform downsampling and to append the data across all EEG channels. This is repeated for all data segments, and as a result they construct and organize a feature matrix whose columns represent the features. They are also guided to build a class label vector indicating whether each feature vector belongs to a non-target condition (defined as class 0) or a target condition (defined as class 1). Hence, students learn by doing and encounter various important concepts such as feature vector, dimension, class label, feature selection, and feature transformation, among others. Notice that all of these machine learning concepts are derived from and related to the experiment, and therefore students learn to directly associate these concepts with real-life applications.
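The feature-construction task can be sketched as follows in Python; the decimation factor and the placeholder epochs are illustrative assumptions, and students may choose different downsampling strategies.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_channels, n_epochs = 250, 8, 150
epochs = rng.normal(size=(n_epochs, fs, n_channels))    # placeholder clean 1 s epochs
labels = rng.integers(0, 2, size=n_epochs)              # 0 = non-target (class 0), 1 = target (class 1)

decimation = 10                                         # keep every 10th sample (illustrative factor)
downsampled = epochs[:, ::decimation, :]                # (n_epochs, 25, n_channels)

# Append (concatenate) the downsampled samples of all channels into one feature vector per epoch
X = downsampled.reshape(n_epochs, -1)                   # feature matrix: (n_epochs, 25 * 8 = 200 features)
y = labels                                              # class label vector

print(X.shape, y.shape)
```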
In the next task, students learn to use a classification algorithm to distinguish between target and non-target conditions using a feature vector extracted from an EEG data segment as input. They are instructed to split the feature matrix and its associated class label vector into two parts, one for training purposes and the other for testing purposes. They are requested to explore and select a classification model suitable for discriminating between target and non-target situations. Technical details of the classifiers are explored and discovered by the students. Through coding, they initialize the selected classification model and fit its parameters using the training data (this constitutes the model’s training). The testing data is then fed into the tuned classifier to generate predictions. Finally, students discover performance metrics to assess the effectiveness of the classification model in recognizing target and non-target EEG data segments. They are also encouraged to propose graphical ways to present the results. Again, this hands-on learning procedure allows students to discover more machine learning concepts. As a closing activity, a discussion is held about how to deploy the machine learning model.
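As one possible realization of this task (students are free to select other models and tools), the hedged sketch below uses scikit-learn with a hold-out split, a linear discriminant analysis classifier, and two common performance metrics; the feature matrix and labels are random placeholders.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 200))          # placeholder feature matrix from the previous step
y = rng.integers(0, 2, size=150)         # class labels: 0 = non-target, 1 = target

# Split into training and testing portions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One possible classifier choice; students may instead pick an SVM, logistic regression, etc.
clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)                # training: fit the model parameters

y_pred = clf.predict(X_test)             # testing: predictions on unseen data
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```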
Figure 1 depicts the four stages of the learning activity. Notice that all four stages involve hands-on practical tasks performed by the students, exploration and discovery of technical concepts, and computational and programming tasks, guided by the professor. Moreover, the four stages are time-flexible and can be adapted for implementation in one or more teaching sessions depending on the career, course, competencies, learning goals, thematic content, or any other relevant aspects. For example, more emphasis can be placed on stage 4 (machine learning model) for data science students, or on stage 2 (evoked potentials experiment) for biomedical engineering students. In other words, the number of sessions and the duration of the learning activity might vary depending on the particular characteristics and needs of the career and the course topics.
2.2. Validation Instruments
To assess the impact of the learning activity, two instruments were employed: (1) the learning gain (LG), measured by comparing grades before and after the activity; and (2) the dimensions of the MUSIC model of academic motivation inventory. Details regarding the application of these two instruments are described below.
2.2.1. Learning Gain (LG) Based on Academic Performance
The first validation instrument aimed to quantify the learning gain (LG), defined as the change in academic performance, i.e., students’ grades after versus before the implementation of the learning activity. To accomplish this, an examination was administered to the students to assess their performance before (
Pre) and after (
Post) the initiation of the stages. This validation instrument, based on the academic performance, has been successfully applied in previous studies to assess improvements achieved through various learning methods and tools (
Hartikainen et al., 2019;
Vermunt et al., 2018).
In our work, the Pre and Post examinations consisted of several closed-ended questions directly related to the topics of the course and the learning activity. The professor responsible for a course proposed the examination questions, while other professors reviewed them and offered suggestions for improvement. Thus, examinations were independently designed and administered for each course. This was done to account for the specific concepts and terminology of each course, even though all courses address similar data analysis and machine learning topics and aim to develop competencies with similar characteristics. Note that the academic performance or grade achieved by a student (either Pre or Post) represents a generic measurement across all courses and careers, thus enabling overall evaluations and fair comparisons.
The number of questions ranged from sixteen to twenty; nonetheless, seven questions were the same across all the courses where the learning activity was implemented. These seven common questions, with their corresponding response options and their correct responses (highlighted in bold and italic), were as follows:
- Q1:
Electroencephalography is a method of recording changes in blood flow in the brain.
- R1:
True/False
- Q2:
The sampling frequency of a signal is the number of samples of the signal.
- R2:
True/False
- Q3:
What sampling frequency of EEG signals is sufficient to be able to analyze delta, alpha, beta and gamma rhythms?
- R3:
16 Hz/30 Hz/40 Hz/80 Hz/256 Hz
- Q4:
A typical amplitude range of EEG signals is:
- R4:
−100 to 100 mV/−100 to 100 μV/−1 to 1 μV/−1 to 1 A/−1 to 1 μA
- Q5:
It is not a source of artifacts in EEG signals:
- R5:
Blinking/Head movement/Chewing/Cognitive load/Bad electrode placement
- Q6:
Where could the reference and ground electrodes be placed for the acquisition of EEG signals?
- R6:
Heart and elbow/Earlobe and neck/Wrist and right foot/Head and earlobe/Head and neck
- Q7:
If a signal was recorded at a sample rate of 1200 Hz and there are 4 s of that signal, how many samples does the signal have in total?
- R7:
300/600/1200/2400/4800
Note that these questions are translated here into English, but they were administered in Spanish since all students were native Spanish speakers. The examinations were provided through an online form. We hypothesized that the scores would be low in the Pre examination and that they would significantly increase in the Post examination as a consequence of the learning activity the students were involved in.
2.2.2. Motivation Based on the MUSIC Model Inventory (MMI)
As a second validation instrument, we assessed students’ perceptions of the motivational climate through the college student version of the MUSIC Model of Academic Motivation Inventory (
MMI) (
Jones, 2009,
2012,
2018,
2019). This self-report instrument consists of 26 items that are easily completed on a six-level scale ranging from strongly disagree to strongly agree (the response options for each item are 1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Somewhat Agree; 5 = Agree; 6 = Strongly Agree). These items are used to compute five scores that assess students’ motivational perceptions in five dimensions: empowerment, usefulness, success, interest, and caring. These dimensions measure, respectively, the degree to which a student believes or perceives the following: (i) She/he is in control of the learning environment; (ii) The coursework or the learning activities are useful to her/his future; (iii) She/he can succeed at the coursework or the learning activities; (iv) The methods and tasks in the coursework or the learning activities are interesting and enjoyable; (v) The instructor cares about the student’s success and well-being during the coursework and learning activities.
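For illustration, dimension scores can be computed as the mean of the items assigned to each dimension, as in the sketch below. Note that the item-to-dimension assignment shown is a hypothetical placeholder, and the official MMI scoring key (Jones, 2012) must be used in practice; the responses are random placeholders on the 1–6 scale.

```python
import numpy as np

# Hypothetical item-to-dimension mapping used only for this example; replace it with
# the official MMI scoring key, which assigns each of the 26 items to one dimension.
DIMENSIONS = {
    "empowerment": [0, 1, 2, 3, 4],
    "usefulness":  [5, 6, 7, 8, 9],
    "success":     [10, 11, 12, 13],
    "interest":    [14, 15, 16, 17, 18, 19],
    "caring":      [20, 21, 22, 23, 24, 25],
}

rng = np.random.default_rng(0)
responses = rng.integers(1, 7, size=(185, 26))   # placeholder responses on the 1-6 agreement scale

# Each dimension score is the mean of its items for each student
scores = {dim: responses[:, items].mean(axis=1) for dim, items in DIMENSIONS.items()}
for dim, vals in scores.items():
    print(f"{dim}: median = {np.median(vals):.2f}")
```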
The
MMI has been extensively used to assess students’ perceptions of the motivational climate after coursework or learning activities in different learning environments and situations (Amato-Henderson & Sticklen, 2022), and its validity has been established in health science courses (Jones et al., 2023; Gladman & Ali, 2020), English and other second-language courses, and pharmacy programs, among others (Jones, 2020; Jones et al., 2023;
Pace et al., 2016). In the present work, all students that decided to be part of the research by signing the informed consent form and that responded to the
Pre and
Post examinations were also requested to respond to the
MMI items provided through an online form.
2.3. Participants and Courses
Students from several courses and undergraduate programs at the School of Engineering and Sciences (EIC) at Tecnologico de Monterrey participated in this study. All students enrolled in the courses were informed about the goals of the research in a session prior to the implementation of the learning activity; also, they were informed that their participation was completely voluntary. No academic grade or penalty was assigned for accepting or declining to participate in the research. Students accepted participation by signing a written informed consent form that detailed the goals and procedures for the corresponding learning activity.
Table 1 presents information on the implementations of the learning activity, including academic year, career, course ID, semester in which the course was taught, number of students enrolled in the course, and the number of participants who signed the consent form and completed all procedures of the learning activity. Note that the course ID encodes the name of the course itself as follows: BI2010B for design and development in neuroengineering; MA3001B for development of mathematical engineering projects; TC3002B for development of advanced data science applications; and TE3003B for integration of robotics and intelligent systems.
Despite belonging to different careers, all courses share similar elements and characteristics in the topics of data analysis and machine learning. The list below shows the competency most closely associated with data analysis and machine learning within each course; other topics and competencies declared in each course are omitted because they are not directly related to data analysis and machine learning.
BI2010B: Design and development in neuroengineering.
- Competency: Recording and analysis of neural signals.
MA3001B: Development of mathematical engineering projects.
- Competency: Data analysis of natural phenomena.
TC3002B: Development of advanced data science applications.
- Competency: Implementation of computational algorithms.
TE3003B: Integration of robotics and intelligent systems.
- Competency: Signal processing and data analysis.
2.4. Experimental Procedure
2.4.1. Description of the Implementations
The implementations were carried out in the 2023 and 2024 academic years. The number of sessions and the duration of each stage of the learning activity (see
Figure 1) were decided by the professor responsible for the course; however, for a given course, the sessions and their durations were kept the same within and across academic years (see
Table 1). A description of the goals, the research procedures, and an invitation to participate were presented to the students in the first session of the learning activity. Students who agreed to participate in the study signed the written informed consent form. All students were then invited to answer several contextual questions, as described in
Section 2.4.2 along with an examination to assess their knowledge (
Pre examination). They were informed to respond individually and honestly. Subsequently, the four stages of the learning activity were implemented following the descriptions and procedures presented in
Section 2.1. Immediately after all the activities were completed, the corresponding examination to assess their knowledge (
Post examination) and the MMI items were administered.
2.4.2. Context Questions
To examine students’ perceived baseline level in data recording in general, and in EEG data recording specifically, contextual questions were asked both before (Pre) and after (Post) the implementation of the learning activity. These questions focused entirely on their previous experiences with data recording and were designed to increase students’ interest and engagement in the learning activity. Consequently, three questions were created and administered to all students across all courses and careers where the learning activity was implemented.
The questions with their corresponding response options are as follows:
- C1:
Have you previously carried out experiments to obtain your own data with which you studied and applied data analysis and machine learning techniques?
- R1:
Yes/No
- C2:
From 0 to 5, where 0 is “not important” and 5 is “very important”, how essential do you consider the data recording process to be for data analysis and machine learning?
- R2:
0/1/2/3/4/5
- C3:
Have you participated and/or performed experiments to acquire and record electroencephalogram (EEG) signals?
- R3:
Yes/No
Note that there are no correct or incorrect responses, and students were informed about this prior to the examination. These contextual questions were administered via an online form along with the assessment questions.
2.5. Data Analysis and Statistical Tests
We only considered data from students who signed the informed consent form and fully completed the Pre and Post examinations, the contextual questions, and the MMI items. Data from participants who did not complete all of these elements were discarded and not used in the remainder of this study. Student IDs were used only to match the Pre and Post responses with the MMI scores; immediately after this matching, student ID information was discarded to maintain full anonymity of the data. The data from all participants were then merged into a single dataset that was subjected to the following analyses.
For the context questions, we computed the proportion of responses (for instance, the proportion of Yes and No responses, in the case of questions with this type of response) before and after the learning activity. Note that there are no correct or incorrect responses, and therefore these results are used to establish how students’ personal experiences evolved as a result of the learning activity. For the examination questions, the percentage of correct and incorrect responses was computed separately for each question in the Pre and Post examinations. The improvement percentage for each question was computed as the difference in the percentage of correct responses between Post and Pre. The total grade for each student, before and after the learning activity, was computed to obtain the distribution of grades in the Pre and Post examinations. These analyses are essential to quantify the learning gain and to evaluate its statistical significance. In the case of the MUSIC inventory, the responses to the 26 items were used to compute the scores of the five dimensions (empowerment, usefulness, success, interest, and caring) that assess students’ motivational perceptions.
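A minimal sketch of these computations is given below; the correctness matrices are random placeholders, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_questions = 185, 7

# Hypothetical binary correctness matrices (1 = correct, 0 = incorrect), one column per question
pre = rng.integers(0, 2, size=(n_students, n_questions))
post = (rng.random((n_students, n_questions)) < 0.8).astype(int)

pct_correct_pre = 100 * pre.mean(axis=0)       # % of correct responses per question, Pre
pct_correct_post = 100 * post.mean(axis=0)     # % of correct responses per question, Post
improvement = pct_correct_post - pct_correct_pre

# Total grade per student (0-100 scale), Pre and Post
grade_pre = 100 * pre.mean(axis=1)
grade_post = 100 * post.mean(axis=1)
print(improvement.round(1), grade_pre.mean().round(1), grade_post.mean().round(1))
```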
To examine significant differences between the distributions of grades before and after the learning activity, given their ordinal nature, the Wilcoxon rank-sum test was employed. All statistical tests were performed at a 95% confidence level (significance level α = 0.05).
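In Python, the rank-sum comparison of the Pre and Post grade distributions can be carried out with scipy.stats.ranksums, as in the hedged sketch below; the grade arrays are synthetic placeholders.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# Hypothetical grade distributions on a 0-100 scale
grades_pre = rng.normal(36, 25, size=185).clip(0, 100)
grades_post = rng.normal(81, 21, size=185).clip(0, 100)

# Wilcoxon rank-sum test comparing the Pre and Post grade distributions
stat, p_value = ranksums(grades_pre, grades_post)
print(f"statistic = {stat:.3f}, p = {p_value:.3g}")
```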
2.6. Learning Activity Deployment
2.6.1. Hardware
All activities involving the acquisition of the EEG signals in our implementations utilized the Unicorn Hybrid Black EEG system (from the manufacturer g.tec medical engineering GmbH, Schiedlberg, Austria). We selected this system because of its wireless capabilities, wearability, ease of use, and its capability to provide high-quality signals (
Pontifex & Coffman, 2023).
Note however that any other EEG recording system can be used. The Unicorn Hybrid Black system is a 24-bit amplifier that digitizes brain signals at a sampling rate of 250 Hz from the eight scalp locations FZ, C3, CZ, C4, PZ, PO7, OZ, and PO8 according to the international EEG 10-20 system. Reference and ground electrodes are fixed on the left and right mastoids of the participants using disposable electrodes. In the implementation of the learning activities, each work team was provided with one of these systems to carry out their own practical activities as described below. The use of the EEG recording system was learned by the students in stage 1 of the learning activity.
2.6.2. Software
To guide the students during the recording and visualization of the EEG signals, we used in-house software implemented in C++, which included the graphical user interface (GUI) for the P300 experiments. This software has been previously used in our lab in several brain–computer interface (BCI) experiments for movement recovery and rehabilitation based on P300 and motor imagery (
Hernandez-Rojas et al., 2022;
Peguero et al., 2023).
Importantly, in addition to the raw EEG signals, the software records a marker signal that indicates the exact time of stimulus presentation for target and non-target symbols. This marker signal is essential for the practical activities of the learning activity because it is used by the students to obtain the epochs of EEG signals. The use of the software and the GUI is learned by the students in stages 1 and 2 of the learning activity.
2.6.3. EEG Preprocessing
The technical details of the EEG data preparation and processing methods that students learn and apply in stage 3 are as follows. The initial step in processing the EEG signals involves filtering to remove low- and high-frequency artifacts, which are typically caused by eye movements, muscle activity, or environmental noise. Specifically, a 6th-order digital Butterworth band-pass filter with cutoff frequencies from 1 Hz to 20 Hz is proposed to the students. This frequency band has been widely used in the literature for the analysis of event-related potentials (ERPs), particularly the P300 component, as it retains the low-frequency content critical for the detection of this potential while effectively attenuating slower drifts and high-frequency noise such as electromyographic activity (
Chailloux Peguero et al., 2020;
Delijorge et al., 2020).
Although some studies suggest using lower cutoff frequencies (e.g., 0.5 Hz), we found that using a 1 Hz lower bound preserves sufficient information for reliable P300 detection, as confirmed by empirical results in prior implementations of this activity. Students were also encouraged to experiment with alternative filter settings to assess the impact on signal quality and classification performance. Additionally, a Notch filter centered at 60 Hz may be applied to suppress power line interference. However, students were informed that the 60 Hz component lies outside the band of interest defined by the band-pass filter, and thus, its removal has a negligible impact on the ERP signal or the outcomes of the P300 detection process. In addition to filtering, another important pre-processing aspect is EEG referencing. In this activity, the EEG signals were originally referenced to mastoid electrodes, which is a standard and widely accepted reference in ERP research, particularly for visual paradigms involving the P300 component. Given the limited number of electrodes (8 channels) and their focused spatial distribution, we did not apply common average referencing (CAR), as it could introduce distortions in ERP morphology rather than improve signal quality. However, more advanced students were encouraged to explore the impact of alternative referencing schemes, such as CAR, as part of the open-ended exploration suggested in stage 3.
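A possible Python implementation of this filtering step is sketched below using scipy.signal; the zero-phase (forward-backward) application and the placeholder data are choices made for the example rather than requirements of the activity.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

fs = 250.0                                           # Unicorn sampling rate (Hz)

# 6th-order Butterworth band-pass, 1-20 Hz, applied here as zero-phase filtering
sos = butter(6, [1.0, 20.0], btype="bandpass", fs=fs, output="sos")

rng = np.random.default_rng(0)
raw = rng.normal(size=(int(60 * fs), 8))             # placeholder: 60 s of 8-channel EEG
filtered = sosfiltfilt(sos, raw, axis=0)             # filter each channel along the time axis

# Optional 60 Hz notch; largely redundant here because 60 Hz lies outside the 1-20 Hz band
b_notch, a_notch = iirnotch(60.0, Q=30.0, fs=fs)
notched = filtfilt(b_notch, a_notch, filtered, axis=0)
```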
2.6.4. Epoching and Rejection of Noisy Windows
After the band-pass filtering step, the EEG signals are segmented into epochs associated with the target and non-target conditions. This segmentation is performed based on the time markers provided by the user interface of the application used to implement the experimental protocol. Each epoch is time-locked to the stimulus onset and spans a window in which the evoked P300 response is expected to occur. All resulting epochs are then analyzed to assess signal quality and detect the presence of artifacts. To determine whether an epoch is contaminated by noise, students are instructed to compute, for each electrode, two metrics: (i) the peak-to-peak voltage, defined as $V_{pp,i} = \max_{n} x_i[n] - \min_{n} x_i[n]$, and (ii) the standard deviation, $\sigma_i = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \left( x_i[n] - \bar{x}_i \right)^2}$, where $x_i[n]$ is the band-pass-filtered signal of the $i$-th electrode, $n = 1, \dots, N$, $N$ is the total number of samples in $x_i[n]$, and $\bar{x}_i$ is the average of $x_i[n]$. Therefore, an epoch is considered as noisy if in at least one electrode the following conditions are fulfilled: first, the peak-to-peak voltage $V_{pp,i}$ is greater than 100 μV, and second, the standard deviation $\sigma_i$ is greater than 50 μV.
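The rejection rule described above can be implemented compactly as in the following sketch; the placeholder epochs are assumed to be expressed in microvolts.

```python
import numpy as np

PP_THRESHOLD = 100.0     # peak-to-peak threshold (microvolts)
STD_THRESHOLD = 50.0     # standard deviation threshold (microvolts)

def is_noisy(epoch: np.ndarray) -> bool:
    """epoch: (n_samples, n_channels) band-pass-filtered EEG, in microvolts."""
    peak_to_peak = epoch.max(axis=0) - epoch.min(axis=0)      # per-electrode V_pp
    std = epoch.std(axis=0)                                   # per-electrode sigma
    # Noisy if, on at least one electrode, both thresholds are exceeded
    return bool(np.any((peak_to_peak > PP_THRESHOLD) & (std > STD_THRESHOLD)))

rng = np.random.default_rng(0)
epochs = rng.normal(scale=20.0, size=(150, 250, 8))           # placeholder epochs in microvolts
clean_epochs = np.stack([e for e in epochs if not is_noisy(e)])
print(f"{len(clean_epochs)} of {len(epochs)} epochs kept")
```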
2.6.5. ERP Computation and Classification
After selecting the artifact-free epochs, the EEG signals for the target and non-target conditions are averaged separately to obtain the corresponding event-related potentials (ERPs). Students are encouraged to inspect the averaged waveforms for each channel to identify the presence of an evoked response, particularly the P300 component, which is typically visible in the target condition approximately 300 milliseconds after stimulus onset. In addition to ERP averaging, students implement a linear Support Vector Machine (SVM) classifier using features extracted from the clean epochs. The model is evaluated using k-fold cross-validation (k = 5) to estimate its ability to distinguish between the target and non-target conditions. Through this process, students examine whether the classifier achieves performance above chance level, linking signal quality and ERP presence with classification success.
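A hedged sketch of this classification step, using scikit-learn’s cross_val_score with a linear SVM, is shown below; the feature matrix is a random placeholder, and the feature standardization step is an optional choice added for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 200))          # placeholder feature matrix (clean epochs x features)
y = rng.integers(0, 2, size=150)         # 0 = non-target, 1 = target

# Linear SVM evaluated with 5-fold cross-validation
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.2f} (chance level ~= {max(np.bincount(y)) / len(y):.2f})")
```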
All these EEG signal processing and analysis methods are implemented in Python 3.10 or Matlab R2022b depending on the analysis software required to be used in each career and course where the learning activity was implemented. Note however that, as indicated in the description of stage 3 of the learning activity, students can also discover and apply other methods for EEG signal analysis.
4. Discussion
We proposed and implemented a learning activity designed to encourage and develop data analysis and machine learning knowledge, skills, and attitudes in college engineering students across diverse careers. The growing demand for professionals with robust training and experience in these competencies is a critical need in many industries (
Smaldone et al., 2022). Additionally, these competencies are also considered critical for individuals ranging from the K-12 population to life-long learners (
Sanusi et al., 2022;
Srinivasan, 2022). Therefore, it is necessary for students and young professionals across various disciplines to understand and effectively use these methods and techniques. Given their responsibility in preparing students for the labor market, higher education institutions play an essential role in developing data analysis and machine learning competencies.
In our learning-by-doing methodology, we chose to conduct experiments involving the recording of students’ own electroencephalogram (EEG) brain signals given the interest and engagement it generates and its suitability for exploring and discovering data analysis and machine learning concepts. The use of EEG devices in learning scenarios has been previously explored in medical education. For instance, they have been used to assess the efficacy of the flipped classroom model with neurology residency students (
Novroski & Correll, 2018) and to investigate the effectiveness of combining problem-based learning (PBL) with case-based learning (CBL) in teaching clinical EEG interpretation to residents and refresher physicians (
Li et al., 2024). Other studies have also enhanced knowledge and skills in medical students through EEG experiments (
Nascimento et al., 2021). However, our learning activity, utilizing an EEG apparatus, differs in its focus, as it does not aim to develop medical skills. Instead, we use the device to collect students’ own data in hands-on situations for non-medical training. Specifically, we aim to develop competencies in students from various engineering disciplines, such as Biomedical Engineering (measurement analysis and modeling), Data Science and Mathematics Engineering (pattern recognition and AI), Computer Technology Engineering (implementation of computational algorithms), and Robotics and Digital Systems Engineering (signal processing and data analysis).
The learning activity consisted of four stages: experiments for acquisition and visualization of brain signals, experiments to record brain signals in real situations, preparation and processing of the collected data, and application of machine learning models using the collected brain signals. This cross-disciplinary active learning activity aimed to motivate and engage students in their own learning process, promoting learning by doing, knowledge discovery, problem-solving, teamwork, and group discussions. The activity was implemented with nearly 300 students from four different engineering disciplines. Nevertheless, the results reported in this work are based on data from a total of 185 students who completed all procedures, including signing the informed consent form and completing both the Pre and Post examinations. To assess the impact of the activity, we employed two validation instruments: the learning gain, determined by comparing students’ grades before and after the activity, and the self-reported measures of empowerment, usefulness, success, interest, and caring, as calculated from inquiries using the MUSIC model inventory.
Considering the context questions, an increase in students’ personal experience in the subject area was observed. First, it is notable that the proportion of students with hands-on experience collecting data increased from 23% before the learning activity to 90% after (Context question C1,
Figure 2a), which indicates the effectiveness of carrying out practical activities. The 10% of students who did not report this experience warrant further analysis to determine specific reasons (e.g., absence of that session or lack of interest in practical activities), which should be addressed in the future.
Second, a significant increase was observed in students’ appreciation of the importance of data collection for data analysis and machine learning (Context question C2,
Figure 2b). Before the activity, 62% of students rated data collection very important, a percentage that rose to 89% afterward. Additionally, low importance rates decreased from 7% to 2%, revealing a trend towards a better personal appreciation of the importance of data collection.
Finally, there was a substantial increase in students’ experience with EEG experiments, rising from 17% before the activity to 94% afterward (Context question C3,
Figure 2c). This increase demonstrates the effectiveness of the learning activity in providing practical EEG experiences to students. The initially low but non-zero percentage of students (17%) with prior EEG experience is explained by some of them having previously participated in EEG-based brain–computer interface research. The remaining 6% who reported no EEG experience post-activity possibly indicate a misunderstanding of the question, or that these students did not, in fact, participate (again, absence from that session). This is critical and should be explored further to ensure that all students participate.
In line with our hypothesis, we found a significant increase in the learning gain associated with the application of the learning activity. This was observed individually for each question, with the percentage of improvement ranging from 30% in the worst case to 60% in the best case (see
Figure 3c). Additionally, there was a significant improvement in students’ grades, with the average grade rising from 36.47 ± 24.79 in the
Pre examination to 81.17 ± 21.13 in the
Post examination (see
Figure 4), resulting in an average increase of 44.70 points. Indeed, the statistical analysis showed a significant improvement in students’ grades following the learning activity. These findings indicate that the learning activity had a substantial positive impact on students’ academic performance (consistent with the context questions), demonstrating its effectiveness in enhancing their understanding and application of data analysis and machine learning concepts.
On the other hand, the results of the MUSIC inventory revealed high motivation and appreciation for the learning activity. The empowerment, usefulness, success, interest, and caring dimensions reached average and median values around 5 points. This indicates that students found the learning activity enjoyable and felt it provided them with control over their learning environment. Additionally, they perceived the activity as valuable for their academic development and felt it fostered an environment of success. Similar to the context questions, a few students self-reported low motivational scores (as indicated by the atypical values in the distributions presented in
Figure 5). These cases require further analysis, and a contingency plan should be formulated to identify and address these situations. However, our findings indicate that this active, experimental approach significantly improved student grades and fostered high levels of self-reported motivation, curiosity, and satisfaction, particularly concerning the relevance and engagement aspects of the learning experience. The integration of real-world data collection, processing, and application appears to be a powerful pedagogical tool for complex, interdisciplinary subjects. Overall, this study highlights the effectiveness of the learning activity in motivating students for learning.
To our knowledge, this is the first study that formally introduces the use of an EEG brain signal recording device as part of an active learning methodology for developing data analysis and machine learning competencies in engineering (non-medical) students. While many studies explore innovative teaching methods for data science and machine learning, a significant number still rely on curated datasets or theoretical instruction. For example, studies on using machine learning to predict student success in online programming courses highlight the application of ML in education, but often do not involve students in the full data lifecycle themselves. Similarly, research into gamification and learning pathways focuses on enhancing engagement and motivation through structured digital environments, but may not incorporate the hands-on collection and processing of raw, experimental data that is central to our approach. Our unique contribution lies in directly embedding the real-world data acquisition of EEG signals into the learning process, thereby providing a more authentic and comprehensive experience of the data analysis pipeline. This contrasts with approaches where data is simply provided, pushing students beyond mere algorithmic application to a deeper understanding of data source, quality, and contextualization, a critical skill set often underdeveloped in traditional curricula.
Nonetheless, several aspects require further consideration or improvement for future implementations. First, a control group was not included. Hence, it would be advisable to assess the true effect of our educational innovation in the experimental group by comparing the results of the validation instruments with those obtained from groups of students who did not participate in the implementation. In this case, the students in the control group would not participate in stages 1 and 2 of the learning activity. Second, as mentioned above, there remains a small proportion of students who reported no experience with the collection of EEG brain signals and low motivational scores. We did not find associations between these factors, and a more detailed and personalized follow-up needs to be incorporated to ensure that all students are engaged in the activity. Third, it would be interesting to investigate to what extent the proposed educational innovation enhances the development of the same competencies in students from different engineering disciplines. For instance, Biomedical Engineering students may be more inclined and receptive to performing EEG experiments compared to Computer and Data Sciences students, who might have less interest in hands-on experimentation and field experiences. Despite these limitations, the pedagogical framework underpinning this learning activity demonstrates strong potential for generalizability. The core principle of active, experiential learning, where students engage with the full data analysis lifecycle through real data collection, is highly adaptable. This approach could be transferred to other STEM disciplines by simply changing the type of data collected (e.g., environmental sensor data for civil engineering, financial data for business analytics). The emphasis on critical thinking, problem-solving, and interdisciplinary application of data science and machine learning is universally valuable. The findings on enhanced motivation and perceived relevance align with studies on learning pathway systems and gamification, suggesting that integrating engaging elements and practical relevance can broadly benefit student learning outcomes, regardless of the specific data modality.
5. Conclusions
The proposed active learning activity, based on the collection and processing of students’ own brain signals to develop data analysis and machine learning competencies, has proven to be effective in enhancing students’ knowledge, skills, attitudes, and motivation. By guiding students to conduct their own experiments and to record and process their own data before exploring data analysis and machine learning methods, this activity addresses the limitations inherent in using pre-recorded data, which is typical of traditional teaching methods with instructors or video lectures. Therefore, this activity promotes a shift from the traditional passive learning model, where students listen to an expert (either face-to-face or via video lectures) and use existing datasets recorded by others, toward a collaborative environment that fosters higher-order thinking and empowers students as owners of their learning process.
The learning activity comprises four stages, adaptable to various teaching sessions of differing lengths and formats, depending on the specific career, course, learning goals, thematic content, or other factors. The learning activity was implemented in six different courses across four engineering careers and facilitated the development of competencies for students in Biomedical Engineering (measurement analysis and modeling), Data Science and Mathematics Engineering (pattern recognition, natural language and AI), Computer Technology Engineering (implementation of computational algorithms), and Robotics and Digital Systems Engineering (signal processing and data analysis). The positive outcomes, demonstrated by significant learning gain and high self-reported empowerment, usefulness, success, interest, and caring, indicate that this hands-on, experiential learning approach can significantly improve students’ understanding and appreciation of data analysis and machine learning concepts.
This study has significant broader impacts on computing education and beyond. First, it offers a replicable model for teaching complex computational skills by fostering a deeper, more intuitive understanding of data science and machine learning concepts through active engagement rather than passive reception. Second, the positive influence on student motivation and appreciation for data analysis suggests that similar active learning methodologies can contribute to a more engaging and effective learning environment in higher education, potentially leading to increased student retention and success in challenging STEM fields. Our findings align with broader educational trends emphasizing active learning and technology integration for enhanced pedagogical effectiveness, demonstrating a pathway to cultivate crucial competencies for the future workforce.