3.1. Group Performance Evaluation
Table 4 summarizes the facilitators’ evaluation of student group performance in terms of participation, engagement, interest, and behaviour, rated on a 10-point Likert scale at the end of the workshop sessions.
The results of the evaluation grid revealed generally high levels of group performance. Several groups received very similar ratings across indicators, which is plausible with bounded rubric scores and may reflect ties/ceiling effects.
Table 4 was then interpreted descriptively and integrated with the facilitators’ field note patterns.
The highest average rating was observed for Behaviour (M = 8.18, SD = 1.25), indicating that most groups consistently adhered to rules, collaborated effectively, and demonstrated respectful classroom conduct. Ratings for Interest (M = 7.91, SD = 1.29) and Engagement (M = 7.83, SD = 1.34) were also strong, suggesting that students showed substantial curiosity and sustained attention throughout the workshop activities. Participation received the lowest mean score (M = 7.55, SD = 1.52) and exhibited the greatest variability, possibly reflecting differences in individual willingness to contribute during discussions (Figure 1). Overall, the average group performance score was M = 7.97 (SD = 1.24), with scores ranging from 5.88 to 9.75. Groups 8 (M = 9.75, SD = 0.15) and 6 (M = 9.50, SD = 0.19) showed the highest consistency and performance across dimensions, whereas Group 3 scored the lowest (M = 5.88, SD = 0.65). As an internal check, Friedman’s test in Table A5 and pairwise post hoc comparisons in Table A6 indicated no significant differences across the four within-group indicators (p = 0.148; Kendall’s W = 0.18).
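The internal check described above (a Friedman test across the four within-group indicators, with Kendall’s W as effect size) can be sketched in a few lines of Python. The score matrix below is an illustrative placeholder, not the study’s data, so the resulting statistics will not match the reported values; the Kendall’s W formula, W = χ²/(n(k − 1)), is the standard conversion from the Friedman chi-square.

```python
# Friedman test across four repeated indicators, plus Kendall's W.
# The scores below are illustrative placeholders, not the study's data.
import numpy as np
from scipy.stats import friedmanchisquare

# rows = groups, columns = indicators
# (participation, engagement, interest, behaviour)
scores = np.array([
    [7.5, 7.8, 8.0, 8.2],
    [7.2, 8.0, 7.6, 7.9],
    [5.5, 5.9, 6.0, 6.1],
    [8.6, 8.4, 8.5, 8.8],
    [6.8, 6.5, 7.0, 7.2],
    [9.4, 9.5, 9.5, 9.6],
    [8.8, 8.7, 8.9, 9.0],
    [9.7, 9.7, 9.8, 9.8],
])

stat, p = friedmanchisquare(*scores.T)  # unpack the four indicator columns

# Kendall's W from the Friedman chi-square: W = chi2 / (n * (k - 1))
n, k = scores.shape
W = stat / (n * (k - 1))
print(f"chi2 = {stat:.2f}, p = {p:.3f}, Kendall's W = {W:.2f}")
```

Kendall’s W is bounded in [0, 1], with values near 0 indicating little agreement in the ranking of indicators across groups.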
To complement the rubric-based evaluation, facilitators’ qualitative comments and brief observational notes were analysed to further contextualize the numerical scores assigned to student group participation, engagement, interest, and behaviour.
The analysis related to the participation indicator revealed three primary patterns that closely align with the distribution of quantitative ratings. Several groups were described as showing consistently high and widespread participation, with students actively contributing to discussions, asking pertinent questions, and demonstrating curiosity throughout the activities. For instance, facilitators noted that “the entire group showed high participation, expressing observations and curiosity relevant to the activity” and that “most students actively engaged in discussions and responded to questions”. These qualitative insights are consistent with the high participation scores observed for Groups 4, 6, 7, and 8 (scores ≥ 8.5). Participation in some groups was described as unevenly distributed, with a portion of students actively involved while others remained more passive. For example, Group 2 was noted to have “participated in all proposed activities, although not all students were equally active”. This pattern supports the intermediate participation score observed for this group (7.25), suggesting that numerical averages may mask internal variability. Finally, lower levels of participation were observed in a few groups, often characterised by limited student involvement or the need for external prompting by facilitators. Comments such as “except for a few students, the group showed low participation” and “some students followed attentively, while others tended to be distracted” correspond with the lower numerical ratings of Groups 3 and 5.
The results related to the engagement indicator confirm consistently high levels of engagement in several groups. Some groups were described as “very attentive to the topics and highly participatory”, or as having “fluid and productive interactions with the instructor and peers”. These observations support the high engagement scores attributed to groups such as 6, 7, and 8 (Engagement ≥ 8.5). Other groups showed more heterogeneous patterns of engagement, with some students actively involved while others were more passive. For example, Group 2 was described as having “some students with high attention and interaction, others with moderate levels, and about half with low attention”, which aligns with their intermediate engagement rating (Engagement = 8) and highlights how averages can mask within-group variability. Lastly, a few groups demonstrated lower or fluctuating engagement, as reflected in comments such as “the group showed low levels of attention and limited interaction” (Group 3) or “engagement was inconsistent… some students were attentive while others remained passive”. These remarks are consistent with the lower engagement scores assigned to Groups 3 and 5.
The notes related to the interest indicator revealed patterns consistent with the quantitative data, helping to clarify variations in student motivation and curiosity across groups. Several groups were described as exhibiting high levels of interest, characterised by enthusiasm, spontaneous curiosity, and active questioning. For instance, one facilitator noted that “students participated with enthusiasm and curiosity throughout the proposed activities”, aligning with the highest numerical ratings attributed to Groups 6, 7, and 8. Other groups displayed a moderate level, where some students were actively engaged while others showed limited involvement. In these cases, comments such as “only some students expressed strong curiosity” (Group 2) or “in general, the group was motivated” (Group 5) supported a more than sufficient level. Finally, a few groups showed low or inconsistent levels of interest, with facilitators reporting limited curiosity and reduced engagement with the subject matter. These remarks aligned with the lowest interest scores in the dataset (e.g., Group 5 and Group 3).
Further insights come from the behaviour indicator, capturing nuances in students’ ability to follow rules, collaborate, and demonstrate appropriate conduct in a professional research setting. The comments confirmed the variation in behavioural performance reflected in the numerical ratings. Several groups were noted for their excellent behavioural conduct, characterised by rule-following, mutual respect, and positive collaboration with both peers and researchers. For instance, facilitators described groups as “maintaining an excellent attitude, demonstrating respect for rules, instructors, and classmates” and “creating a calm and productive atmosphere”. These comments support the high behaviour ratings attributed to Groups 4, 6, 8, 7, and 1. Other groups showed moderate or heterogeneous behavioural performance, with some students fully respecting expectations and others requiring reminders or behaving less appropriately: “Group 2 displayed heterogeneous behaviour and respect for rules; only some students behaved appropriately given the professional context”; “Although the group respected the rules overall, in some cases it was necessary to call students to attention”. Such comments align with middle-range behaviour scores, including those of Groups 2 and 5. A few groups were described as having significant difficulties, including lack of respect for the research environment, poor collaboration, and inattentiveness. A facilitator noted: “Except for a few students, the group was disrespectful toward the work environment and showed limited cooperation”. This observation is consistent with the lower behaviour scores assigned to Groups 3 and 9.
3.2. A Case Study: Chemistry Pathway Questionnaire Evaluation
The chemistry pathway was structured as follows:
Step 1. Brief presentation of the ISMN Institute and our chemistry dissemination group “ChimiCom@CNRPA”.
Step 2. Introduction to the issue of water pollution and its main causes.
Step 3. Overview of porous, high-surface-area silica-based nanostructured materials and their laboratory preparation.
Step 4. Explanation of the planned experiment: “Water purification from methylene blue dye by adsorption on three silica-based materials, and quantitative analysis of the treated water using UV–Vis Spectroscopy”.
Step 5. Execution of the experiment in small groups (of 2–4 students) under the supervision of the instructor and tutor, followed by quantitative analysis of the treated water.
Step 6. Group discussion to identify the most sustainable material among the three tested.
Step 7. Administration of the post-activity satisfaction and learning questionnaire (via QR code).
Figure 2 shows some relevant steps of the experimental activity.
The main learning objectives of this pathway were to capture students’ attention and curiosity toward the issue of water pollution and to demonstrate how chemistry can make a concrete contribution to addressing this global challenge. In addition, the activity aimed to introduce the concepts of surface area and porosity in relation to the materials used. The workshop adopted a broader STEM approach by explicitly addressing, besides chemistry, aspects pertaining to mathematics (spectral analysis and interpretation of qualitative/quantitative patterns), geology (natural silica-based materials and their origin), ecology (circular-economy scenarios such as waste valorisation and water remediation), and engineering (characterisation techniques and the production of manufactured goods, including prototyping through 3D printing).
Although the chemistry workshop was primarily framed as a STEM-oriented learning experience, it intentionally incorporated selected STEAM-related dimensions, particularly creative problem-solving, design-based thinking, and reflective collaboration, which complemented scientific inquiry and technology-enhanced experimentation. The goal was to stimulate students’ scientific creativity in all its dimensions, i.e., product, process, and trait, as discussed by Pinar et al. (2025). Indeed, scientific creativity plays a significant role in shaping students’ future scientific careers (Pinar et al., 2025). Students worked with porous and nanostructured silica-based materials and explored their potential applications in environmentally relevant contexts (e.g., waste recycling and wastewater treatment), making evidence-based choices that foregrounded sustainability considerations (e.g., selecting materials and procedures by weighing performance and environmental impact). A key technological component was the use of UV–Vis spectroscopy for material characterisation and data acquisition, which enabled students to connect experimental observations to qualitative/quantitative representations (spectral curves) and to discuss how measured signals relate to underlying material properties. Moreover, the activity was operationalised through communication and argumentation: students had to explain and justify group decisions, interpret results collectively, and present reasoned conclusions, thereby integrating scientific evidence with collaborative meaning-making.
This chemistry workshop differed from traditional school chemistry laboratories because the experiments were carried out using silica-based materials developed and optimised at the CNR-ISMN Palermo laboratories as part of ongoing research activities. Moreover, the workshop involved specialised equipment and expertise that are not typically available in school science laboratories.
Table 5 shows the 128 students distributed across the eight groups participating in the chemistry workshop. Due to time constraints, it was not possible to administer the questionnaire to Groups 1 and 2. Of the students in the remaining groups, 69 completed the chemistry questionnaire; the other 25 were absent or chose not to complete it.
As shown in Figure 3, students reported high levels of satisfaction with the activity. The distribution of satisfaction scores (left panel) revealed a strong skew toward the upper end of the scale, with most students selecting 4 or 5 on a five-point scale. Descriptive statistics confirmed this trend (M = 4.16, SD = 0.83), indicating that the overall perception of the activity was highly positive.
The central panel illustrates students’ willingness to participate again, which was also high. The average re-participation score (M = 7.22, SD = 2.37), with most responses clustering between 7 and 10, suggests strong interest in repeating the experience.
Finally, the right panel shows that most students were first-time participants (Yes; 85.5%), yet their evaluations were comparably positive, indicating that the activity was engaging regardless of prior experience.
Taken together, these results suggest that students not only expressed very high satisfaction but also demonstrated a strong intention to re-engage, highlighting the perceived value of the experience.
Responses to the item assessing prior knowledge indicated that approximately half of the students (50.7%) reported having heard about pollution from organic dyes before. In contrast, more than one third (36.2%) stated that they had not, while a smaller proportion (13.0%) expressed uncertainty. These findings suggest a heterogeneous background among participants, with some students already familiar with the topic and others encountering it for the first time.
Table A1 reports the eight-item knowledge score, whose modest internal consistency (KR-20 = 0.51; α ≈ 0.50) is consistent with the brevity of the subtest. Item difficulty (proportion correct) and corrected item–total correlations are reported in Table A2.
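As a point of reference, the KR-20 coefficient reported above for dichotomous items can be computed directly from a student-by-item response matrix of 0/1 scores. The sketch below uses a simulated response matrix (not the study’s data) purely to illustrate the formula KR-20 = k/(k − 1) · (1 − Σp_j q_j / σ²_X).

```python
# Minimal sketch of the KR-20 reliability computation for a k-item test
# of dichotomous (0/1) items; the response matrix is illustrative only.
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    n, k = responses.shape
    p = responses.mean(axis=0)            # item difficulty (proportion correct)
    q = 1.0 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Simulated 69 students x 8 items, roughly 55% correct per item.
rng = np.random.default_rng(0)
fake = (rng.random((69, 8)) < 0.55).astype(int)
print(f"KR-20 = {kr20(fake):.2f}")
```

Values near 0.5, as reported here, are common for short subtests: the coefficient grows with test length, all else being equal.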
Students’ overall performance on the eight knowledge items indicated a moderate level of accuracy. On average, participants answered correctly slightly more than half of the items (M = 4.46, SD = 1.84). The median score was 5, with scores ranging from 1 to 8. Interquartile values showed that 50% of students obtained between 3 and 6 correct answers. These results suggest variability in knowledge acquisition, with some students demonstrating high accuracy while others exhibited substantial difficulties (Figure 4).
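The descriptive summary above (mean, standard deviation, median, and interquartile bounds) can be reproduced with a few NumPy calls; the score vector below is a made-up placeholder over the same 1–8 range, not the study’s data.

```python
import numpy as np

# Illustrative total-score vector (1-8 items correct); not the study's data.
rng = np.random.default_rng(3)
scores = rng.integers(1, 9, size=69)

mean, sd = scores.mean(), scores.std(ddof=1)   # M and sample SD
median = np.median(scores)
q1, q3 = np.percentile(scores, [25, 75])       # interquartile bounds
print(f"M = {mean:.2f}, SD = {sd:.2f}, median = {median}, IQR = [{q1}, {q3}]")
```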
Group-level analysis revealed variation in performance, with mean scores ranging from M = 3.11 (Group 9) to M = 5.62 (Group 7). All groups except Groups 4 and 9 answered on average more than 50% of the questions correctly (Figure 5).
Table 6 summarizes the average percentage of correct responses in the four knowledge domains. DIE (Data Interpretation and Evaluation) was the strongest area (M = 71.01%, SD = 32.55), followed by PK (Procedural Knowledge; M = 54.35%, SD = 37.12), OS (Observation Skills; M = 51.45%, SD = 35.33), and DK (Declarative Knowledge; M = 46.38%, SD = 38.65). A Friedman test indicated a significant difference among the domains, χ2(3) = 19.32, p < 0.001, with a small effect (Kendall’s W = 0.093). Post hoc Wilcoxon signed-rank tests with Holm correction revealed that DIE scores were significantly higher than DK (p < 0.001), PK (p = 0.001), and OS (p < 0.001). No other pairwise comparison reached significance (Table A3 and Table A4).
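The domain analysis reported above (Friedman test, then Holm-corrected pairwise Wilcoxon signed-rank tests) can be sketched as follows. The per-domain scores are simulated placeholders with roughly the reported accuracy levels (two items per domain), so the resulting p-values will not reproduce those in the tables; the Holm step-down adjustment is implemented by hand, since it is a simple monotone correction of the sorted raw p-values.

```python
# Friedman test across four knowledge domains, then Holm-corrected
# Wilcoxon signed-rank post hocs. Simulated data, not the study's.
import itertools
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
n = 69
# Each domain covered 2 of the 8 items; scores are per-student proportions.
domains = {
    "DIE": rng.binomial(2, 0.71, n) / 2,
    "PK":  rng.binomial(2, 0.54, n) / 2,
    "OS":  rng.binomial(2, 0.51, n) / 2,
    "DK":  rng.binomial(2, 0.46, n) / 2,
}

stat, p = friedmanchisquare(*domains.values())
W = stat / (n * (len(domains) - 1))            # Kendall's W effect size
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}, W = {W:.3f}")

# Post hoc: all pairwise Wilcoxon tests, then Holm step-down correction.
pairs = list(itertools.combinations(domains, 2))
raw = [wilcoxon(domains[a], domains[b]).pvalue for a, b in pairs]
m = len(raw)
holm = {}
running_max = 0.0
for rank, idx in enumerate(np.argsort(raw)):   # smallest raw p first
    adj = min(1.0, (m - rank) * raw[idx])      # Holm multiplier
    running_max = max(running_max, adj)        # enforce monotonicity
    holm[pairs[idx]] = running_max

for pair, p_adj in holm.items():
    print(pair, f"Holm-adjusted p = {p_adj:.3f}")
```

With four domains there are six pairwise comparisons, so the smallest raw p-value is multiplied by 6, the next by 5, and so on.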
Comparison between testing time slots revealed a significant effect. The Mann–Whitney U test (U = 393.50, p = 0.015) showed that students assessed in the second time slot (12:00–14:00) had significantly higher total correct scores (M = 4.97, SD = 1.89) than those assessed in the first time slot (10:00–12:00; M = 3.91, SD = 1.63).
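The time-slot comparison above is a two-sample rank test. A minimal sketch, with simulated score vectors (the group sizes and score distributions are assumptions for illustration, not the study’s data):

```python
# Mann-Whitney U test on total correct scores for two testing sessions.
# The score vectors are simulated placeholders, not the study's data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
slot1 = rng.normal(3.91, 1.63, 34).clip(0, 8).round()  # 10:00-12:00 session
slot2 = rng.normal(4.97, 1.89, 35).clip(0, 8).round()  # 12:00-14:00 session

U, p = mannwhitneyu(slot1, slot2, alternative="two-sided")
print(f"U = {U:.1f}, p = {p:.4f}")
```

The two-sided alternative matches the reported usage; a rank test is appropriate here because the bounded 0–8 scores are unlikely to be normally distributed.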
Regarding prior knowledge of organic dye pollution, no significant differences emerged when comparing students who responded “Yes” (M = 4.66, SD = 1.94), “No” (M = 4.04, SD = 1.86), or “I don’t know” (M = 4.89, SD = 1.17). Thus, self-reported prior exposure to the topic did not significantly influence students’ performance, even when stratified by time of assessment.