3.1. Demographic Characteristics
According to the 2021 census [50], the population of the Municipality of Patras is 215,922 residents, making it the third-largest city in Greece, while the population of the regional unit of Achaia, excluding the Municipality of Patras, is 90,057 residents.
According to the demographic data, participants were almost equally split by gender: 47.37% boys and 46.81% girls, while 5.82% chose not to disclose their gender. The largest group of students was in the first year of USE (33.40%), followed by the second year (21.11%). Participation of third-year USE students was lower (6.00%), likely due to preparation for the Panhellenic University Entrance Examinations, the national standardized tests required for admission to Greek universities. In terms of school type, 42.31% of participants attended USE, 37.90% LSE, and 18.57% USVE. Regarding parental occupation, the most common response for fathers’ employment was “self-employed” (34.05%), while for mothers the most frequent response was “unemployed” (22.33%). In terms of parents’ marital status, most students reported that their parents live together (74.30%). Almost half of the participants (48.41%) indicated that they reside in Patras (capital of the regional unit of Achaia). Finally, 46.25% of the participants stated that their household includes two children (including themselves).
Of the 39 questions in the questionnaire, second-grade USE students gave the most correct answers on 15 questions, whereas third-grade LSE students gave the most incorrect answers on 12 questions.
3.2. Performance on Key Geological Misconceptions
The results showed that students demonstrated low levels of understanding across several key geological concepts. Some of the most common misconceptions identified in the present study are presented below.
In the “Metals” category, the majority of students (60.19%) incorrectly believed that “All metals are hard (have high hardness)”, while only 3.76% selected the correct answer, “In some cases”. Similarly, in the “Differentiation of minerals–rocks” category, 75.12% of the students incorrectly agreed that “Quartz is rock”. In the “Rock Life Cycle” category, only 35.12% disagreed with the misconception that “Rocks are always solid”. Misconceptions were also prevalent in the “Rocks” category: only 28.54% correctly disagreed with the statement “The soil is not a rock; it is something completely different”, and just 24.41% rejected the misconception that “Rocks are created by catastrophic events (earthquakes, volcanic eruptions)”. Furthermore, only 33.05% understood that hardness is a valid characteristic “… used to identify rocks”. In the “Minerals” category, very few students knew that “luster affects the color we see” (16.53%) and that “brightness affects the color we see” (15.59%), and even fewer knew that glass is not a mineral (10.52%) or that it breaks easily (9.86%). Finally, only 26.10% were aware that some rocks are used as fuel.
By contrast, there were a few specific areas in which students showed higher levels of understanding. In the “Mineral–rock differentiation” category, most students correctly disagreed with misconceptions such as “Minerals and rocks do not have an important role in our lives” (72.86%), “Liquids are always light” (68.92%), and “Rocks and minerals are the same” (67.79%). Additionally, in the “Rocks” category, 84.88% rejected the idea that “All rocks are the same” and 69.67% disagreed with the misconception that “All rocks are heavy”. Finally, in the “Minerals” category, 61.31% rejected the misconception that “If the mineral is heavy, it is also hard”. These results suggest that while students may possess accurate knowledge in certain basic areas, significant misconceptions persist, particularly regarding mineral properties, rock formation, and geological processes.
3.3. Statistical Analysis of the Survey
Statistical analysis was carried out using non-parametric tests, namely the Mann–Whitney and Kruskal–Wallis tests, which compare groups on the basis of ranks rather than assuming normally distributed data. Post hoc pairwise comparisons were adjusted with the Bonferroni method to correct for multiple comparisons and control the overall (family-wise) Type I error rate. Such rank-based methods are especially suitable when data do not meet parametric assumptions and are well established in quantitative research methodology [46].
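To make the rank-based logic concrete, the following is a minimal Python sketch (standard library only) of a two-sided Mann–Whitney U test. The function names are hypothetical, and the p-value uses the tie-corrected normal approximation without continuity correction; the software used in the study is not specified here, and statistical packages can differ in such details (for example, exact p-values for small samples).

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for the first group
    u = min(u1, n1 * n2 - u1)
    # tie correction: sum of (t^3 - t) over groups of tied values
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p
```

With real data, `x` and `y` would be the category scores (e.g., Q1 totals) of the two groups being compared, such as boys and girls.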
The statistical analysis was conducted as follows. For each pair of variables involving two groups (for example, Q1 and gender, or Q2 and gender), a Mann–Whitney test was performed. For comparisons involving more than two groups (such as Q1 and grade level, or Q2 and grade level), the Kruskal–Wallis test was applied to determine whether there were statistically significant differences between groups. This test assesses whether the groups differ in their rank distributions, which is commonly interpreted as a difference in medians. For the purposes of the statistical analysis, the questions were grouped into four categories, as presented in Table 1.
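The Kruskal–Wallis step can be sketched in the same spirit. This hypothetical Python helper ranks all observations together, computes the tie-corrected H statistic, and approximates its p-value with a chi-square distribution on k - 1 degrees of freedom (k = number of groups). The chi-square tail is evaluated from a series expansion of the incomplete gamma function rather than a statistics library; that is an implementation choice for this sketch only.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df, terms=500):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)
```

Each argument would hold the category scores of one group (e.g., one grade level), so comparing six classes means passing six lists.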
In Table 2, the column “School” distinguishes LSE (404 students) from USE (comprising USE, USVE, and Other; 661 students). The column “City” distinguishes Patras (516 students) from Outside Patras, that is, anywhere else within the prefecture of Achaia (549 students).
We calculated the mean number of correct responses in each question category, broken down by participants’ school type and place of residence. According to this analysis, City and School appear to play a role in students’ performance (Table 3).
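A minimal sketch of this descriptive step, assuming each student is stored as a dictionary with hypothetical keys `school`, `city`, and one score per question category; it returns the mean and sample standard deviation of a category score for every school × city cell. The toy records below are invented for illustration and are not study data.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_cell(records, score_key):
    """Mean and sample SD of one category score for each (school, city) cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["school"], r["city"])].append(r[score_key])
    return {
        cell: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for cell, scores in cells.items()
    }


# Hypothetical toy records; real data would hold one record per student.
students = [
    {"school": "LSE", "city": "Patras", "Q1": 4},
    {"school": "LSE", "city": "Patras", "Q1": 2},
    {"school": "USE", "city": "Outside Patras", "Q1": 5},
]
summary = describe_by_cell(students, "Q1")
best_cell = max(summary, key=lambda cell: summary[cell][0])  # highest mean
```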
For each question category (Q1, Q2, Q3, Q4), Mann–Whitney tests were performed for City and for School. The results showed that students attending LSEs in Patras performed best in question category Q1 (Metals and Differentiation of minerals–rocks), with a mean score of 3.85 (standard deviation 1.21). In question category Q2 (Rock Life Cycle), students from USEs in Patras achieved the highest mean score, 2.77 (standard deviation 1.18).
In category Q3 (Minerals), students from USEs outside Patras performed slightly better, with a mean score of 10.8 (standard deviation 2.33). Finally, in category Q4 (Rocks), students from LSEs in Patras achieved the highest mean score, 3.32 (standard deviation 1.03).
Furthermore, the Kruskal–Wallis test revealed statistically significant differences in categories Q1 and Q4 across the combinations of school type and city. These differences are also summarized in Table 3.
Kruskal–Wallis tests of categories Q1, Q2, Q3, and Q4 as a function of gender and school also revealed statistically significant differences. The procedure was as follows. For each pair of variables examined (e.g., Q1 and gender, or Q2 and class), a Kruskal–Wallis test was carried out. If the p-value was below 0.05, the result was considered statistically significant and was followed by pairwise comparisons with Bonferroni correction; otherwise, no further testing was performed for that pair.
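The gated procedure (omnibus test first, Bonferroni-corrected pairwise comparisons only if p < 0.05) can be sketched as follows. The raw pairwise p-values are assumed to come from Mann–Whitney tests, and the class labels are hypothetical placeholders for the six grade-by-school groups.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


def post_hoc(omnibus_p, pairwise_p, alpha=0.05):
    """Run the pairwise follow-up only when the omnibus Kruskal-Wallis test is significant."""
    if omnibus_p >= alpha:
        return None  # omnibus test not significant: the procedure stops here
    return {pair: (p, p < alpha) for pair, p in bonferroni(pairwise_p).items()}


# Hypothetical labels: six classes give C(6, 2) = 15 pairwise comparisons.
classes = ["LSE1", "LSE2", "LSE3", "USE1", "USE2", "USE3"]
pairs = list(combinations(classes, 2))
```

With 15 comparisons, a raw p-value must fall below 0.05 / 15 ≈ 0.0033 to remain significant after adjustment, which is why Bonferroni is regarded as a conservative correction.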
Thus, according to Figure 2 (SumQ1–Class), where SumQ1 denotes the Metals and Mineral–Rock Differentiation subcategories of questions (Table 1), and the Bonferroni-corrected Mann–Whitney comparisons (statistically significant, p = 0.001), the second grade of USE differs from all other groups. The second grade of LSE also differs from the second grade of USE (based on the medians), but not from the remaining groups. Figure 1 and the mean values indicate that the second grade of USE gave more correct answers to the questions on Metals and Mineral–Rock Differentiation than the other classes.
In particular, the largest difference is observed between the second grade of USE and the second grade of LSE.
For the Q4 question group (Rocks) by class (Figure 3), the Kruskal–Wallis test was statistically significant (p = 0.001), with the median in all six classes being approximately 3.
Bonferroni-corrected pairwise Mann–Whitney comparisons revealed statistically significant differences between the first grade of LSE and both the first and second grades of USE, with USE students giving more correct answers (Table 4). A significant difference was also found between the third grade of LSE and the second grade of USE, in favor of the latter.
According to Table 4, statistically significant differences were found in favor of second-grade USE students, who performed significantly better than the other student groups. This difference is most pronounced in comparison with their LSE peers. Overall, USE students gave more correct responses, with second-grade USE students outperforming even third-grade students of the same school type.
Figure 4 presents the relationship between the number of correct answers in Q1 (Mineral–Rock Differentiation) and Q2 (Life Cycle).
According to Figure 4, the most frequent combination (9.55% of students) is three correct answers in both Q1 and Q2, i.e., (3, 3). There also appears to be a weak positive association, since high Q1 scores tend to co-occur with high Q2 scores, and low Q1 scores with low Q2 scores. In other words, students who answered correctly in the Q1 question category tended to answer correctly in the Q2 question category as well, and students who did not answer correctly in Q1 tended not to do so in Q2. Overall, most students answered with “moderate” accuracy, i.e., they scored between 3 and 4 in Figure 4.
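The percentages read off Figure 4 (for example, 9.55% of students in cell (3, 3)) are joint relative frequencies of score pairs. A minimal sketch of how such a table could be computed, using hypothetical toy scores rather than the study's data:

```python
from collections import Counter


def joint_percentages(scores_a, scores_b):
    """Percentage of students for each (score_a, score_b) combination."""
    n = len(scores_a)
    counts = Counter(zip(scores_a, scores_b))
    return {pair: 100 * c / n for pair, c in counts.items()}


# Hypothetical toy scores (number of correct answers per student).
q1 = [3, 3, 4, 2]
q2 = [3, 3, 4, 1]
table = joint_percentages(q1, q2)
mode_cell = max(table, key=table.get)  # most frequent combination
```

A concentration of mass along the diagonal of such a table is what the text describes as high Q1 scores co-occurring with high Q2 scores.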
Figure 5 presents the relationship between the number of correct answers in Q4 (Rocks) and Q1 (Mineral–rock differentiation).
Responses are concentrated at Q4 = 3 combined with Q1 = 3 or 4. There also appears to be a weak positive association, since high Q4 scores tend to co-occur with high Q1 scores, and low Q4 scores with low Q1 scores. In Figure 5, most students answered with “moderate” accuracy, i.e., they scored between 3 and 4. Students who answered correctly in the Q1 category tended to answer correctly in the Q4 category as well, and students who did not answer correctly in Q1 tended not to do so in Q4.
Figure 6 presents the relationship between the number of correct answers in Q4 (Rocks) and Q2 (Life Cycle).
Responses are concentrated at Q4 = 3 combined with Q2 = 2 to 4. There also appears to be a weak positive association, since high Q4 scores tend to co-occur with high Q2 scores, and low Q4 scores with low Q2 scores. As before, most students answered with “moderate” accuracy, i.e., they scored between 3 and 4. Students who answered correctly in the Q2 question category tended to answer correctly in the Q4 question category as well (values 11 and 13.03 in Figure 6).
To complement these results, Table 5 reports the Spearman correlation matrix for the four categories under study. The analysis indicates weak but consistently positive associations across categories, which nevertheless reach statistical significance. This finding suggests that, although the relationships are limited in strength, they are unlikely to be due to chance alone and warrant further consideration in the interpretation of the data.
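Spearman's rho is the Pearson correlation of the rank-transformed scores. The sketch below computes it with average ranks for ties and assembles a matrix for a dictionary of category scores (the names are hypothetical). It does not reproduce the significance test reported in Table 5, which statistical packages typically derive from a t-distribution approximation.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def spearman_matrix(columns):
    """Pairwise Spearman correlations for a dict of equally long score lists."""
    return {(a, b): spearman_rho(columns[a], columns[b]) for a in columns for b in columns}
```

Because it works on ranks, rho captures any monotonic association, which suits ordinal counts of correct answers better than Pearson's r on raw scores.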
3.4. Clustering
Clustering analysis was applied to identify groups of students with similar response patterns across the four main categories of misconceptions (Q1: Mineral–rock differentiation, Q2: Rock Life Cycle, Q3: Minerals, Q4: Rocks) and selected demographic variables (school type, city of residence, siblings, family background, and gender). For each category, the total score was calculated by summing the correct responses to the relevant items. These total scores, together with the demographic variables, served as the input for the clustering procedure (Table 6).
The following variables were used to construct the cluster profiles: Total Sum Q1, Total Sum Q2, Total Sum Q3, Total Sum Q4, school, city, siblings, family, and gender (Figure 7). Each Total Sum is a student’s total score in the corresponding category; for example, Total Sum Q1 is the sum of a student’s response values across the Q1 questions, and so on.
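Each Total Sum is a per-student sum of item scores. As a sketch, assuming correct answers are coded 1 and incorrect answers 0, and using a hypothetical item-to-category mapping (the actual grouping is the one given in Table 1):

```python
def category_totals(responses, categories):
    """Sum one student's item scores (1 = correct, 0 = incorrect) within each category."""
    return {cat: sum(responses[item] for item in items) for cat, items in categories.items()}


# Hypothetical item-to-category mapping and responses, for illustration only.
categories = {"Q1": ["item1", "item2"], "Q2": ["item3"]}
student = {"item1": 1, "item2": 0, "item3": 1}
totals = category_totals(student, categories)
```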
The optimal number of clusters was determined using the “Silhouette” method (Figure 7).
This method indicated that two clusters provided the best separation. Subsequently, k-means clustering was performed, and two distinct student profiles emerged. Cluster 1 included students with relatively higher performance across categories, while Cluster 2 consisted of students with comparatively lower scores.
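The cluster-number selection and k-means steps can be illustrated with a standard-library Python sketch: Lloyd's k-means with randomly chosen data points as initial centroids, and the mean silhouette width s(i) = (b - a) / max(a, b), where a is a student's mean distance to their own cluster and b the smallest mean distance to another cluster. It assumes numeric, already scaled features (categorical variables such as school or gender would first need encoding). The initialization, number of restarts, and software used in the study are not stated, so this is not a reproduction of the reported analysis.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with k randomly chosen data points as initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: cluster means (keep the old centroid if a cluster empties)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        others = [sum(d) / len(d) for c, d in dists.items() if c != labels[i]]
        if not own or not others:
            continue
        a, b = sum(own) / len(own), min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates=range(2, 6), seed=0):
    """Return the k with the highest mean silhouette width, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k, seed=seed)[0]) for k in candidates}
    return max(scores, key=scores.get), scores
```

With real data, `points` would hold one feature vector per student; `choose_k` keeps the k with the highest mean silhouette, which is the criterion the text describes.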
Figure 7 presents a visualization of the clustering results using principal component analysis (PCA), showing the distribution of students across the two clusters.
It is important to note that all four categories (Q1–Q4), including Q3, were incorporated in the clustering analysis, ensuring that the grouping reflects combined performance across the entire set of questions. The resulting two clusters, based on the Total Sum values for Q1–Q4, are shown in Table 7. Cluster 1 mainly included students with overall higher performance across Q1–Q4, particularly in mineral–rock differentiation (Q1), whereas Cluster 2 consisted of students with lower scores across categories. Moreover, the distribution of students across the two clusters was not random: although USE students outnumbered LSE students in both clusters, reflecting their larger share of the sample, the proportions of the two school types differed between clusters, suggesting that school type contributed to the observed grouping.
Cluster Profiles
The profile of each cluster summarizes the characteristics shared by its members and the underlying structure of the group.
Figure 8 shows the two clusters projected onto the first two principal components derived from Q1, Q2, Q3, and Q4. The two axes (Dim1 and Dim2) represent the first two principal components, which explain 40% and 22% of the total variance, respectively. Each point corresponds to an individual student projected onto this reduced two-dimensional space, while colors indicate the two clusters obtained by k-means. The polygons delineate the area covered by each cluster, highlighting the separation between student groups. The PCA results revealed distinct clustering patterns across the Q1–Q4 question groups, indicating clear differences in how students conceptualize minerals, rocks, and the rock cycle. This clustering underscores meaningful variation in students’ responses and supports a targeted interpretation of domain-specific misconceptions. Principal Component Analysis is a well-established multivariate technique that reduces dimensionality by generating uncorrelated components that capture maximal variance, an approach frequently applied to geochemical and sedimentological datasets to interpret compositional stratigraphy [51].
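The variance percentages reported for Dim1 and Dim2 (40% and 22%) are eigenvalues of the data's covariance (or correlation) matrix divided by the sum of all eigenvalues, i.e., its trace. A minimal standard-library sketch using power iteration with deflation, assuming well-separated leading eigenvalues; whether the original analysis standardized the variables first is not stated here, and standardizing them would turn this into correlation-matrix PCA.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equally long numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def power_iteration(m, iters=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(m)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))  # Rayleigh quotient
    return lam, v


def explained_variance_ratio(rows, n_components=2):
    """Share of total variance captured by each leading principal component."""
    m = covariance(rows)
    total = sum(m[i][i] for i in range(len(m)))  # trace = sum of all eigenvalues
    ratios = []
    for _ in range(n_components):
        lam, v = power_iteration(m)
        ratios.append(lam / total)
        # Hotelling deflation: remove the component just found
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(m))] for i in range(len(m))]
    return ratios
```

For a student-by-category matrix, `rows` would contain one (Q1, Q2, Q3, Q4) tuple per student.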
The results are shown in detail in Table 7 below. Cluster 2, in which the mean value of every indicator is higher, concentrates the students who generally perform better.
Figure 9 below presents the distribution of students by school type (LSE vs. USE) within the two identified clusters. The results indicate that the clustering structure partially reflects school-related differences, with USE students being more numerous in both groups.
The χ² test of independence between cluster and school type yielded a statistically significant p-value (0.001). The LSE group comprises 404 students and the USE group (USE, USVE, Other) 661 students. In both clusters (Figure 9), USE students are more numerous. In percentage terms, however, the blue segment is slightly larger in Cluster 1 than in Cluster 2. This means that a randomly selected USE student is, on the basis of their characteristics, more likely to belong to Cluster 2.
For the independence test between the clusters and school type (i.e., LSE and USE), the independence hypothesis is rejected: the distribution of LSE and USE students differs between the two clusters, and, more specifically, the second cluster contains a higher percentage of USE students than the first cluster.
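The independence test between cluster membership and school type is reported as a χ² test on a contingency table (the same form of test could be applied to city, family status, or gender). A minimal sketch, assuming an r × c table of counts with rows as clusters and columns as categories. It applies no continuity correction, whereas some software (for example, R's `chisq.test`) applies Yates' correction to 2 × 2 tables by default. The counts underlying Figure 9 are not reproduced here, so any table passed in is hypothetical.

```python
import math


def chi2_sf(x, df, terms=500):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)
```

Rejecting independence means only that the category proportions differ between clusters; the direction of the difference has to be read from the table itself, as the text does with Figure 9.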
In “Patras”, 516 students participated, while 549 students participated from “anywhere else in the prefecture of Achaia” (Figure 10).
Here, City appears to differ between clusters: Patras and the rest of the regional unit of Achaia are not evenly distributed across them.
In Cluster 1 (first row), the blue segment is much smaller than in the cluster shown in the second row. If City were not differentiated between the clusters, the two graphs would look almost the same; however, Cluster 1 contains more values from outside Patras (549 responses in total).
For the independence test between the clusters and the city (i.e., the urban center of Patras versus all other areas of the regional unit of Achaia), the independence hypothesis is rejected: place of residence is distributed differently across the two clusters, and, more specifically, the second cluster contains a higher percentage of students from outside the urban center of Patras than the first cluster.
Parents’ marital status does not appear to play a role, as the distributions are similar across clusters (p = 0.625; Figure 11). Most students reported that their parents live together (792 responses), compared with 273 responses for all other cases (separated and other arrangements). Likewise, no statistically significant differences were found between clusters with respect to gender (p = 0.069) or siblings.
Finally, regarding the profile of the second cluster (Table 8), it consisted of USE students from Patras who achieved higher scores, i.e., gave more correct answers, in Total Sum Q1, Total Sum Q2, Total Sum Q3, and Total Sum Q4.