Modeling Vocational Preferences in STEM Students Through Explainable and Fuzzy AI to Support Personalized Learning

Marín Díaz, Gabriel

doi:10.3390/educsci16060917


by Gabriel Marín Díaz ¹,²

¹ Faculty of Statistics, Complutense University, Puerta de Hierro, 28040 Madrid, Spain

² Science and Aerospace Department, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain

Educ. Sci. 2026, 16(6), 917; https://doi.org/10.3390/educsci16060917 (registering DOI)

Submission received: 5 May 2026 / Revised: 29 May 2026 / Accepted: 8 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Rethinking STEM Education in the Era of Artificial Intelligence: Pedagogical Challenges, Assessment and Student-Level Variables)


Abstract

Understanding students’ vocational preferences in STEM domains is a complex challenge characterized by uncertainty, subjectivity, and overlapping interests. Traditional profiling approaches often rely on rigid categorizations that fail to capture the hybrid and dynamic nature of learners. This study proposes FAS-XAI, a reproducible learning analytics framework that integrates fuzzy logic and explainable artificial intelligence for interpretable profiling of STEM vocational preferences. The methodology combines fuzzy AHP for criterion weighting, Fuzzy C-Means clustering to identify overlapping profiles, and XGBoost for supervised validation, complemented by SHAP and LIME to provide global and local explanations of model behavior. The study is framed as a methodological simulation under controlled conditions, using synthetic data to evaluate the internal coherence, transparency, and transferability of the proposed pipeline. The results show that the framework can generate multidimensional and interpretable learner profiles, with resilience, communication, and commitment emerging as relevant discriminative dimensions within the simulated setting. Overall, the proposed approach provides a reproducible methodological basis for future empirical applications in personalized learning, vocational guidance, and AI-supported educational decision-making.

Keywords:

fuzzy logic; explainable artificial intelligence (XAI); personalized learning; learning analytics; student profiling; STEM education; Fuzzy C-Means; analytic hierarchy process (AHP); educational data mining

1. Introduction

In recent years, the integration of data-driven methodologies into educational contexts has gained increasing relevance as a means of fostering analytical thinking, personalization, and evidence-based decision-making (Ferguson, 2012; Gašević et al., 2015; Siemens, 2013). Within STEM education, understanding students’ vocational preferences and learning profiles has become a key challenge, particularly due to the inherent uncertainty, subjectivity, and multidimensional nature of individual abilities and motivations. Traditional approaches to student profiling often rely on rigid categorizations that fail to capture the hybrid and evolving characteristics of learners.

The emergence of learning analytics has provided new opportunities to model and interpret student behavior through data. By leveraging computational techniques, educators can move beyond descriptive assessments toward more adaptive and personalized strategies. However, many existing approaches are based on crisp classification models, which assume clear boundaries between categories and overlook the gradual and overlapping nature of real educational phenomena (Baker, 2014; Zimmerman, 2002). This limitation is especially relevant in STEM domains, where students often exhibit mixed profiles that combine analytical, creative, and practical competencies (Goguen, 1973; Pintrich, 2000).

In this context, fuzzy logic offers a natural framework for representing uncertainty and partial membership, enabling more flexible and realistic modeling of student profiles (Zadeh, 1965). Techniques such as Fuzzy C-Means clustering allow individuals to belong simultaneously to multiple groups with varying degrees of membership, reflecting the continuous and non-binary nature of learning processes (Bezdek et al., 1984). Similarly, multi-criteria decision-making methods, such as the Analytic Hierarchy Process (AHP) (Saaty, 1980), can be extended using fuzzy approaches to capture the subjective importance that each student assigns to different criteria.

Despite these advances, a critical challenge remains: the interpretability of data-driven models in educational settings. The adoption of artificial intelligence in education requires not only predictive performance but also transparency and explainability, so that both educators and learners can understand the reasoning behind the results. In this regard, explainable artificial intelligence (XAI) techniques, such as SHAP (Lundberg & Lee, 2017) and LIME (Ribeiro et al., 2016), provide valuable tools for identifying the contribution of each feature to model outputs, bridging the gap between complex algorithms and human understanding (Molnar, 2019).

To address these challenges, this study proposes a unified framework based on fuzzy logic and explainable artificial intelligence (FAS-XAI) for profiling STEM vocational preferences (Marín Díaz, 2025). The proposed approach integrates fuzzy AHP for individualized weighting of criteria, Fuzzy C-Means clustering for identifying overlapping student profiles, and supervised learning models for validation. An explainability layer is incorporated to provide transparent insights into the factors that shape each profile, supporting interpretation and informing potential decision-making applications.

The framework is designed as a reproducible and adaptable pipeline implemented on a synthetic dataset structure that can later be extended to real educational contexts. Accordingly, this study focuses on methodological development and validation under controlled simulation conditions, prioritizing internal coherence, interpretability, flexibility, and transferability. From a pedagogical perspective, the proposed methodology offers a basis for future applications in student guidance, curriculum design, and the early identification of educational trajectories, once validated with real educational datasets. In this context, interpretability is understood not merely as a technical property, but as a condition for making profile-based information understandable, discussable, and potentially useful for educational guidance.

Given the methodological and simulation-based nature of the study, the analysis is guided by the following research questions:

RQ1. Can the proposed hybrid Fuzzy AHP weighting scheme balance global preference structure and individual variability under controlled conditions?
RQ2. Can Fuzzy C-Means identify interpretable and overlapping STEM vocational profiles from the preference space?
RQ3. Can supervised learning internally validate the fuzzy clustering structure by approximating dominant profile assignments?
RQ4. How do SHAP and LIME complement centroid-based interpretations in explaining the resulting profiles?
RQ5. How stable is the proposed pipeline under controlled data degradation scenarios?

The methodological structure of this study follows a progressive five-phase process inspired by the Knowledge Discovery in Databases (KDD) paradigm (Shafique & Qaiser, 2014). First, a synthetic survey-based dataset is constructed to represent key dimensions related to STEM preferences. Second, preprocessing and normalization steps are applied to ensure consistency and comparability across variables. Third, fuzzy AHP is used to derive individualized weights that reflect the relative importance of each criterion. Fourth, Fuzzy C-Means clustering is applied to identify overlapping student profiles based on these weighted features. Finally, supervised learning and explainable AI techniques are employed to validate and interpret the resulting clusters.

Overall, this structured approach provides a coherent pathway from simulated educational data to interpretable methodological insights, linking computational modeling with future pedagogical applications and contributing to the advancement of explainable and uncertainty-aware learning analytics.

2. Related Work

2.1. Personalized Guidance and Vocational Discovery in Education

The identification and development of students’ vocational preferences have long been central to educational research, particularly in career guidance and academic decision-making. A systematic Web of Science search reveals a sustained increase in publications and citations from 2015 onwards, reflecting the growing relevance of data-informed and student-centered approaches.

As shown in Figure 1, the field comprises 1243 publications and 12,231 citations. The literature is concentrated mainly in education (648), psychology (523), and behavioral sciences (309), with smaller contributions from healthcare (109), business and economics (103), and engineering (59). This distribution reflects a predominantly cognitive and motivational orientation, while also revealing the comparatively limited presence of computational, uncertainty-aware, and explainable approaches.

Traditional vocational guidance models often align individual interests and abilities with predefined pathways through psychometric and standardized frameworks. However, many approaches still assume well-defined and mutually exclusive profiles. This limitation is especially relevant in STEM contexts, where students may combine analytical, creative, social, and practical competencies. This calls for approaches capable of modeling overlapping vocational preferences while producing interpretable outputs.

2.2. Fuzzy Multi-Criteria Decision-Making in Education

Multi-criteria decision-making approaches, particularly those based on the Analytic Hierarchy Process (AHP), have been widely used to structure complex educational decision problems. A Web of Science analysis of fuzzy AHP applications shows a growing body of research (Figure 2), reflecting increasing interest in uncertainty-aware decision-making.

The literature is distributed across computer science and mathematics (52 publications each), education (51), and engineering (33), with more moderate contributions from business and economics (23) and environmental sciences (15). This pattern suggests that fuzzy AHP has been mainly developed from a methodological perspective, with fewer applications addressing personalized learning or individualized student modeling.

The most-cited studies apply fuzzy AHP to evaluate e-learning systems, prioritize critical success factors, assess educational quality, or support institutional decision-making (Ahmad & Qahmash, 2020; Atıcı et al., 2022; Aliyev et al., 2020; Naveed et al., 2020). While these approaches demonstrate the effectiveness of fuzzy multi-criteria weighting in uncertain environments, they are generally oriented toward aggregate evaluation and ranking processes.

Consequently, applications in education tend to focus on comparing alternatives, such as platforms, methods, or institutional performance, rather than modeling individualized, context-dependent student preferences (P. Li et al., 2023; Xu et al., 2023). This leaves its potential for representing personalized preference structures in vocational guidance relatively underexplored, particularly when students assign different importance levels to interests, abilities, motivation, and expectations.

2.3. Fuzzy Clustering and Uncertainty Modeling in Education

The application of fuzzy clustering techniques, particularly Fuzzy C-Means (FCM), in educational data mining and learning analytics remains relatively limited. Existing studies are primarily oriented toward performance prediction, system evaluation, or the clustering of learning behaviors, with less emphasis on the construction of interpretable student profiles.

Previous contributions have applied FCM to classify student performance and identify patterns in educational datasets (Y. Li et al., 2019), while more recent approaches combine fuzzy clustering with neural networks or optimization techniques to improve predictive accuracy (Malik et al., 2025). Other studies have begun to integrate fuzzy models with XAI, such as evolving fuzzy classification frameworks for interpreting student behavior in virtual environments (Casalino et al., 2025), or have extended fuzzy clustering to career-related contexts using social network analysis (Khousa & Atif, 2018).

Systematic reviews in educational data mining also highlight the growing use of data-driven approaches for evaluation and prediction tasks, while pointing to methodological fragmentation and limited integration of interpretable fuzzy models into educational decision-making (Ordoñez-Avila et al., 2023). Overall, this suggests that fuzzy clustering remains underexplored as a tool for representing student profiles in a flexible, interpretable, and personalized way.

2.4. Explainable Artificial Intelligence in Educational Contexts

The increasing adoption of data-driven models in education has intensified the need for transparency and interpretability, fostering the use of Explainable Artificial Intelligence (XAI) in learning analytics and educational data mining. Recent studies show a clear upward trend in the application of XAI techniques, particularly since 2022 (Figure 3), reflecting the demand for more transparent and accountable decision-support systems.

Most contributions focus on explaining predictive models related to student performance, dropout risk, and learning behavior. XAI has been applied to early warning systems (Alwarthan et al., 2022; Melo et al., 2022), student success prediction (Tiukhova et al., 2024), dashboards, recommender systems, and self-regulation support (Afzaal et al., 2024; Susnjak et al., 2022). Other studies examine how explanations influence trust, motivation, and decision-making (Brdnik et al., 2023; Conijn et al., 2023), while personalized approaches use explainable models to capture individual learning behaviors and support precision education (Saqr et al., 2024).

Despite these advances, XAI in education is still commonly applied as a post hoc layer for supervised prediction, rather than being integrated into the construction of interpretable student profiles. Only a limited number of studies integrate explainability with clustering or profiling approaches (Alvarez-Garcia et al., 2024; Guleria & Sood, 2023), and the joint use of fuzzy clustering, multi-criteria weighting, and global/local explainability remains underdeveloped.

2.5. Literature Synthesis Matrix and Methodological Gap

To complement the narrative review, Table 1 summarizes the main methodological lines identified in the literature according to data type, methodological approach, explainability mechanisms, and unresolved gaps.

The synthesis shows that previous research generally treats vocational guidance, fuzzy multi-criteria decision-making, fuzzy clustering, and explainable AI as separate methodological strands. Across these areas, the main limitations are the reliance on predefined or crisp profiles, the predominance of aggregate evaluation and ranking, the orientation toward classification or behavioral grouping, and the frequent use of XAI as a post hoc explanatory layer for supervised prediction.

In contrast, the proposed FAS-XAI framework integrates fuzzy weighting, membership-based clustering, supervised validation, and SHAP/LIME explanations within a single simulation-based methodological pipeline for uncertainty-aware STEM vocational profiling.

3. Methodology

This study applies the FAS-XAI (Fuzzy–Adaptive System for Explainable Artificial Intelligence) framework as a reproducible educational methodology for modeling and interpreting students’ STEM vocational preferences (Marín Díaz, 2025). The main objective is to identify heterogeneous and overlapping student profiles by integrating fuzzy multi-criteria weighting, fuzzy clustering, and explainable machine learning.

The methodological design is grounded in the assumption that vocational preferences are not fixed or mutually exclusive categories, but dynamic and context-dependent constructs shaped by multiple factors, including interests, perceived abilities, motivation, and expectations. To address this complexity, the proposed framework combines fuzzy logic and explainable AI within a structured analytical pipeline that supports both methodological rigor and pedagogical interpretability.

The framework emphasizes the modeling of individualized preference structures, in which each student is characterized not only by their responses but also by the relative importance they assign to different criteria. This perspective enables a more realistic representation of vocational identity, capturing both dominant tendencies and overlapping orientations.

Following the Knowledge Discovery in Databases (KDD) paradigm (Shafique & Qaiser, 2014), the methodology unfolds in sequential yet interconnected phases, from data collection and preprocessing to fuzzy weighting, clustering, predictive validation, and educational interpretation, as shown in Figure 4. This workflow enables the transformation of raw survey-based information into interpretable and transferable profiles that can support personalized educational guidance.

3.1. Dataset Design and Construction

The dataset used in this study is based on a simulated educational scenario designed to model students’ vocational preferences in STEM-related domains. Its purpose is not to produce empirical or generalizable findings for a specific population, but to support the development and validation of a reproducible methodological framework applicable to real educational contexts.

The simulation generates the dataset synthetically and stores it in an open and transparent CSV format, ensuring reproducibility while avoiding concerns related to privacy, ethics, or personal data protection. This design keeps the focus on the analytical process rather than on sample-specific characteristics, and provides a controlled data foundation for subsequent fuzzy multi-criteria modeling and explainable analysis.

3.1.1. Structure of the Dataset and Reproducibility

The dataset consists of $n$ simulated students evaluated across $p$ criteria representing key dimensions of vocational preference. These criteria include interest in STEM-related activities, perceived ability or self-efficacy, motivation and engagement, and expectations regarding future development or career pathways.

Each student $i$ is represented by a feature vector,

x_{i} = (x_{i1}, x_{i2}, \dots, x_{ip})

where $x_{ij} \in \mathbb{R}$ denotes the response of student $i$ to criterion $j$, typically measured on an ordinal or Likert-type scale.

The full dataset can be expressed as:

X = [x_{ij}] \in \mathbb{R}^{n \times p}

(1)

The analysis processes the dataset within a reproducible computational environment, such as Jupyter Notebook or Google Colab, ensuring traceability from data generation to analysis.

3.1.2. Preprocessing and Normalization

Prior to analysis, the pipeline preprocesses the dataset to ensure comparability across variables. This includes:

Handling scale differences across criteria.
Normalizing variables to a common range.
Preparing the dataset for fuzzy modeling.

The normalization is defined as:

x_{ij}^{(\mathrm{norm})} = \frac{x_{ij} - \min(x_{j})}{\max(x_{j}) - \min(x_{j})}

(2)

The normalized dataset is:

X^{(\mathrm{norm})} = [x_{ij}^{(\mathrm{norm})}]

(3)

This step ensures consistency across variables and facilitates the subsequent fuzzy transformation and multi-criteria weighting stages.
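The min-max scaling of Equation (2) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the study's code: the function name, the toy Likert-type responses, and the handling of constant columns (mapped to 0) are assumptions introduced here.

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling to [0, 1], following Equation (2)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant column (zero range) is mapped to 0
    # instead of triggering a division by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Hypothetical Likert-type responses (1-5): 4 students, 3 criteria.
X = np.array([[1, 3, 5],
              [2, 3, 4],
              [5, 3, 1],
              [3, 3, 2]])
X_norm = min_max_normalize(X)
print(X_norm)
```

Each criterion is rescaled independently, so a value of 1 always marks the highest response observed for that criterion in the sample rather than the top of the original scale.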

3.1.3. Methodological Scope

This study functions as a controlled simulation environment for validating the internal coherence, reproducibility, and interpretability of the proposed methodological pipeline. Within this scope, the dataset is used to:

Explore the modeling of vocational preferences.
Analyze interactions between multiple criteria under controlled conditions.
Evaluate the integration of fuzzy logic and explainable AI.
Assess the robustness of the proposed pipeline under controlled assumptions.

The proposed framework therefore establishes a transferable analytical process that can later be applied and validated using real educational datasets.

3.2. Fuzzy Multi-Criteria Weighting Through Hybrid AHP

To model the relative importance of the criteria involved in students’ vocational preferences, this study adopts a hybrid fuzzy Analytic Hierarchy Process (AHP) approach. Unlike traditional approaches, in which weights are either defined globally or derived individually, the proposed method combines both perspectives to balance collective structure and individual variability.

This strategy addresses two key methodological challenges:

Avoiding circularity when weights are derived solely from individual responses.
Preserving personalization in the representation of student preferences.

3.2.1. Global and Individual Weighting Structure

Let $X = [x_{ij}] \in \mathbb{R}^{n \times p}$ be the dataset. A global representation is obtained through aggregation:

\bar{x}_{j} = \frac{1}{n} \sum_{i=1}^{n} x_{ij}

(4)

From this, the group-based comparison matrix is constructed:

a_{jk}^{(g)} = \frac{\bar{x}_{j}}{\bar{x}_{k}}

(5)

The corresponding global weight vector is:

w^{(\mathrm{global})} = (w_{1}^{(g)}, \dots, w_{p}^{(g)})

(6)

In parallel, an individual comparison matrix is derived for each student:

a_{jk}^{(i)} = \frac{x_{ij}}{x_{ik}}

(7)

This leads to an individualized weight vector:

w^{(\mathrm{individual}, i)} = (w_{1}^{(i)}, \dots, w_{p}^{(i)})

(8)

This captures the relative prioritization implicit in each student’s responses.
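The construction in Equations (4) to (8) can be illustrated with a short sketch. The text does not state how weights are extracted from the comparison matrices, so the row geometric mean used below is an assumption (it is one common AHP choice); the function name and the toy data are likewise hypothetical. The sketch also assumes strictly positive responses, since the ratios are undefined otherwise.

```python
import numpy as np

def weights_from_ratios(v):
    """Weights from the ratio matrix a_jk = v_j / v_k.

    Assumption: priorities are extracted with the row geometric mean;
    the source does not specify the extraction method.
    """
    v = np.asarray(v, dtype=float)
    if np.any(v <= 0):
        raise ValueError("ratio matrices need strictly positive values")
    A = v[:, None] / v[None, :]             # pairwise comparison matrix
    gm = np.exp(np.log(A).mean(axis=1))     # row geometric means
    return gm / gm.sum()                    # normalize so weights sum to 1

# Hypothetical raw responses: 3 students, 3 criteria.
X = np.array([[4.0, 2.0, 2.0],
              [1.0, 3.0, 4.0],
              [5.0, 5.0, 2.0]])

w_global = weights_from_ratios(X.mean(axis=0))                  # Equations (4)-(6)
W_individual = np.vstack([weights_from_ratios(x) for x in X])   # Equations (7)-(8)
print(w_global)
print(W_individual)
```

Because a ratio matrix of this form is perfectly consistent, the extraction reduces to dividing each value by the sum of the vector; the explicit matrix is kept only to mirror the equations.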

3.2.2. Fuzzy Representation

Both global and individual components are mapped into a fuzzy linguistic framework using the 2-tuple linguistic representation. In this study, the ordered linguistic term set is defined as $S = \{VL, L, LM, M, MH, H, VH\}$, where $VL$ denotes Very Low, $L$ Low, $LM$ Low–Medium, $M$ Medium, $MH$ Medium–High, $H$ High, and $VH$ Very High.

Each normalized value $x \in [0,1]$ is projected onto this ordered linguistic scale by computing its relative position over the scale and assigning the closest linguistic label. The 2-tuple representation $(s_{k}, α_{k})$ combines the assigned linguistic label $s_{k}$ with a symbolic adjustment $α_{k}$, which captures the signed deviation between the numerical value and the selected linguistic term. Thus, the linguistic label provides an interpretable qualitative category, while the adjustment term preserves additional numerical information within the linguistic scale.
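A minimal sketch of this projection is given below. The text does not spell out the formula, so the sketch assumes the usual 2-tuple construction: the normalized value is scaled to the index range of the term set, the nearest index gives the label, and the remainder gives the symbolic adjustment. The function and constant names are hypothetical.

```python
import math

TERMS = ["VL", "L", "LM", "M", "MH", "H", "VH"]  # ordered term set S

def to_two_tuple(x, terms=TERMS):
    """Map a normalized value x in [0, 1] to a (label, adjustment) pair."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    beta = x * (len(terms) - 1)              # relative position on the scale
    # Round half up so the adjustment falls in [-0.5, 0.5).
    k = min(int(math.floor(beta + 0.5)), len(terms) - 1)
    return terms[k], beta - k                # signed deviation from the label

for value in (0.0, 0.5, 0.8, 1.0):
    print(value, to_two_tuple(value))
```

For example, a normalized value of 0.8 sits slightly below the position of High on the seven-term scale, so it is labeled `H` with a small negative adjustment.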

3.2.3. Hybrid Weighting Model

The final weighting vector for each student is defined as a convex combination:

w^{(i)} = α w^{(\mathrm{global})} + (1 - α) w^{(\mathrm{individual}, i)}

(9)

where:

$α \in [0,1]$ controls the balance between global and individual influence;
$w^{(\mathrm{global})}$ captures the collective structure;
$w^{(\mathrm{individual}, i)}$ captures personalized priorities.

The hybrid weights are applied to each student:

z_{ij} = w_{j}^{(i)} x_{ij}

(10)

yielding the transformed dataset,

Z = [z_{ij}]
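Equations (9) and (10) amount to a broadcasted convex combination followed by an element-wise product. The sketch below uses made-up weight vectors and responses, and the value of the mixing parameter is an arbitrary illustration, not a setting reported in the study.

```python
import numpy as np

def hybrid_weights(w_global, W_individual, alpha):
    """Convex combination of global and individual weights, Equation (9)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    w_global = np.asarray(w_global, dtype=float)
    W_individual = np.asarray(W_individual, dtype=float)
    # The global vector is broadcast across all students (rows).
    return alpha * w_global[None, :] + (1.0 - alpha) * W_individual

# Hypothetical inputs: 2 students, 3 criteria.
w_global = np.array([0.5, 0.3, 0.2])
W_individual = np.array([[0.2, 0.3, 0.5],
                         [0.6, 0.2, 0.2]])
X = np.array([[1.0, 0.5, 0.0],
              [0.4, 0.8, 1.0]])
alpha = 0.5  # illustrative value only

W = hybrid_weights(w_global, W_individual, alpha)
Z = W * X    # Equation (10): z_ij = w_j^(i) * x_ij
print(Z)
```

Since both inputs sum to one per student, each row of `W` also sums to one for any admissible value of the mixing parameter.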

3.2.4. Methodological Implications

The hybrid formulation introduces a multi-level representation of preferences:

The global component ensures consistency and robustness.
The individual component preserves personalization.
The parameter $α$ enables controlled interpolation between both.

This approach overcomes the limitations of purely individual or purely global weighting schemes, providing a more flexible and realistic modeling of student preferences.

3.3. Fuzzy C-Means Clustering

After transforming the original dataset into a weighted representation $Z$, the next stage of the methodology focuses on identifying latent structures within the student population. Given the gradual and overlapping nature of vocational preferences, fuzzy clustering is adopted to capture the continuum of student profiles.

3.3.1. Fuzzy Membership Modeling

Traditional hard clustering methods assume that each observation belongs exclusively to a single group. However, in educational contexts, particularly in STEM vocational orientation, students often exhibit hybrid profiles, combining multiple tendencies such as analytical reasoning, creativity, or practical inclination.

To address this complexity, the Fuzzy C-Means (FCM) algorithm is employed, allowing each student to belong to multiple clusters with varying degrees of membership.

Let $Z = [z_{ij}] \in \mathbb{R}^{n \times p}$ be the weighted dataset obtained in Section 3.2, where each row $z_{i}$ represents a student profile.

The FCM algorithm partitions the data into $C$ clusters by minimizing the objective function:

J_{m} = \sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{m} \| z_{i} - c_{j} \|^{2}

(11)

where:

$n$ is the number of students;
$C$ is the number of clusters;
$u_{ij} \in [0,1]$ is the membership degree of student $i$ to cluster $j$;
$c_{j}$ is the centroid of cluster $j$;
$m > 1$ is the fuzzification parameter controlling the degree of overlap.

The minimization of the objective function is achieved through an iterative process that updates membership degrees and cluster centroids.

The membership update is defined as:

u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\| z_{i} - c_{j} \|}{\| z_{i} - c_{k} \|} \right)^{\frac{2}{m-1}}}

(12)

The result of the clustering process is a membership matrix:

U = [u_{ij}] \in [0,1]^{n \times C}

(13)

subject to:

\sum_{j=1}^{C} u_{ij} = 1, \quad \forall i

(14)

This matrix provides a soft segmentation of the student population, in which each student is characterized by a distribution of memberships. The centroid update is given by:

c_{j} = \frac{\sum_{i=1}^{n} u_{ij}^{m} z_{i}}{\sum_{i=1}^{n} u_{ij}^{m}}

(15)

These steps are repeated until convergence, typically defined by a threshold on the change in membership values.
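The alternating updates of Equations (12) and (15) can be sketched from scratch in NumPy. This is an illustrative implementation under stated assumptions, not the study's code: the random initialization, the tolerance, the iteration cap, the small distance floor, and the two-blob toy data are all choices made here.

```python
import numpy as np

def fuzzy_c_means(Z, C, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Plain Fuzzy C-Means; returns memberships U (n x C) and centroids (C x p)."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    U = rng.random((n, C))
    U /= U.sum(axis=1, keepdims=True)        # enforce the row-sum constraint (14)
    for _ in range(max_iter):
        Um = U ** m
        centroids = (Um.T @ Z) / Um.sum(axis=0)[:, None]        # Equation (15)
        dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)          # assumption: floor to avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)            # Equation (12)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:                        # stop when memberships stabilize
            break
    return U, centroids

# Hypothetical weighted data: two well-separated groups of 20 students each.
data_rng = np.random.default_rng(42)
Z = np.vstack([data_rng.normal(0.2, 0.03, (20, 2)),
               data_rng.normal(0.8, 0.03, (20, 2))])
U, centroids = fuzzy_c_means(Z, C=2)
print(U[:3].round(3))
print(centroids.round(3))
```

The membership step is written as a normalized inverse-distance power, which is algebraically the same as Equation (12) but avoids an explicit double loop over clusters.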

3.3.2. Cluster Validation and Interpretation

To evaluate the quality of the fuzzy partition, the Fuzzy Partition Coefficient (FPC) is used (Verma et al., 2023):

FPC = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{2}

(16)

Higher values of FPC indicate better-defined clusters, while lower values reflect higher overlap.

The validation stage complements this measure by computing the Xie–Beni index (Ghosh et al., 2011):

XB = \frac{\sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{m} \| z_{i} - c_{j} \|^{2}}{n \cdot \min_{j \neq k} \| c_{j} - c_{k} \|^{2}}

(17)

where $c_{j}$ denotes the centroid of cluster $j$, and $m$ is the fuzzification parameter. In contrast to FPC, lower XB values indicate a better trade-off between compactness and separation.
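Both validity indices follow directly from the membership matrix and the centroids. The sketch below implements Equations (16) and (17) on a hand-made partition; the function names and the toy values are illustrative assumptions.

```python
import numpy as np

def fuzzy_partition_coefficient(U):
    """Equation (16): mean of squared memberships per student."""
    return float((U ** 2).sum() / U.shape[0])

def xie_beni(Z, U, centroids, m=2.0):
    """Equation (17): compactness divided by n times the minimum centroid separation."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    c2 = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(c2, np.inf)             # exclude j == k from the minimum
    return float(compactness / (Z.shape[0] * c2.min()))

# Hypothetical partition: 4 students, 2 clusters.
Z = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
centroids = np.array([[0.0, 0.5], [4.0, 0.5]])
U = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])

fpc = fuzzy_partition_coefficient(U)
xb = xie_beni(Z, U, centroids)
print(fpc, xb)
```

A crisp partition gives an FPC of exactly 1, and a fully uniform one gives 1/C, which is why the two indices are usually read together when comparing candidate values of the number of clusters.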

From a pedagogical perspective, fuzzy clustering enables the identification of non-exclusive student profiles, reflecting the continuous and evolving nature of vocational preferences.

Each cluster can be interpreted as a latent profile, such as analytically oriented, creatively inclined, or application-driven, while the membership degrees describe how strongly each student aligns with these profiles. This representation supports personalized recommendations, the identification of mixed or transitional profiles, and the avoidance of rigid or deterministic classification.

Within the overall methodology, the FCM stage bridges weighted preferences and explainable modeling. The fuzzy memberships derived at this stage constitute the target structure for the subsequent predictive and interpretability analyses, in which machine learning models are trained to approximate and explain these soft clusters.

3.4. Predictive and Explainable Modeling

Following the identification of fuzzy clusters, the final stage of the methodological framework focuses on validating and interpreting the resulting structures through supervised machine learning and explainable artificial intelligence (XAI). This step transforms the descriptive clustering results into an interpretable predictive layer, enabling both internal validation and deeper understanding of student profiles.

3.4.1. Predictive Modeling of Fuzzy Structures

The fuzzy clustering stage produces a membership matrix given by Equation (13), which encodes the degree to which each student belongs to each cluster. To assess the consistency and learnability of these structures, supervised models are trained using the weighted dataset $Z$ as input.

A classifier is trained to predict the dominant cluster label:

y_{i} = \arg\max_{j} (u_{ij})

(18)

The training objective minimizes the multinomial logistic loss:

L = - \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{C} y_{ik} \log (p_{ik})

(19)

where $y_{ik}$ is the true class indicator, equal to 1 if sample $i$ belongs to class $k$ and 0 otherwise, and $p_{ik}$ is the predicted probability of class $k$.
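The label construction of Equation (18) and the loss of Equation (19) can be checked on a toy case. In the sketch below the probability matrix `P` stands in for the output of whatever classifier is trained (XGBoost in this study); its values, like the memberships, are invented for illustration, and the clipping constant is an assumption added for numerical safety.

```python
import numpy as np

def dominant_labels(U):
    """Equation (18): index of the largest membership for each student."""
    return np.asarray(U).argmax(axis=1)

def multinomial_log_loss(y, P, eps=1e-12):
    """Equation (19), using integer labels instead of one-hot indicators."""
    y = np.asarray(y)
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0)   # avoid log(0)
    # With one-hot y_ik, the inner sum keeps only the true-class probability.
    return float(-np.log(P[np.arange(len(y)), y]).mean())

# Hypothetical memberships and predicted class probabilities.
U = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
P = np.array([[0.80, 0.10, 0.10],
              [0.25, 0.25, 0.50]])

y = dominant_labels(U)
loss = multinomial_log_loss(y, P)
print(y, loss)
```

Taking the arg max discards the secondary memberships, so this validation step measures how learnable the dominant assignment is, not how well the full membership distribution is reproduced.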

3.4.2. Explainable Artificial Intelligence (XAI)

To ensure interpretability, the predictive models are complemented with XAI techniques that provide both global and local explanations.

Global explanations aim to identify the most influential variables in the prediction process. Feature importance measures derived from ensemble models, such as gradient boosting, quantify the contribution of each criterion $z_{j}$ to the model output.

Global importance can be expressed as the average contribution of each feature across all observations:

I_{j} = \frac{1}{n} \sum_{i=1}^{n} | ϕ_{ij} |

(20)

where $ϕ_{ij}$ represents the contribution of feature $j$ to the prediction of instance $i$, as estimated by model-agnostic explanation methods such as SHAP.
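Given a matrix of per-instance attributions, Equation (20) is a column-wise mean of absolute values. The sketch below assumes the attributions have already been computed (for instance by a SHAP explainer) for a single output class; the attribution values and the feature names are made up for illustration.

```python
import numpy as np

def global_importance(phi):
    """Equation (20): mean absolute attribution per feature."""
    return np.abs(np.asarray(phi, dtype=float)).mean(axis=0)

# Hypothetical attributions: 2 instances, 3 features, one output class.
phi = np.array([[0.2, -0.1, 0.0],
                [-0.4, 0.3, 0.1]])
feature_names = ["interest", "ability", "motivation"]  # hypothetical labels

importance = global_importance(phi)
ranking = [feature_names[j] for j in np.argsort(-importance)]
print(dict(zip(feature_names, importance)))
print(ranking)
```

Taking absolute values before averaging matters: a feature that pushes some students toward a profile and others away from it would otherwise cancel out and appear unimportant.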

To analyze individual predictions, local explanation methods such as LIME are employed. These methods approximate the model behavior around a specific instance $z_{i}$ using a simple surrogate model:

\hat{f}(z) = w_{0} + \sum_{j=1}^{p} w_{j} z_{j}

(21)

where $\hat{f}(z)$ is the local approximation of the model, and $w_{j}$ represents the contribution of each feature.

This enables analysis of how local variations in the input variables influence the predicted profile.
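The idea behind the local surrogate of Equation (21) can be sketched without any explanation library: perturb the instance, query the model, weight the samples by proximity, and fit a weighted linear model. This is a from-scratch illustration of the principle, not the `lime` package's API; the Gaussian perturbation, the exponential kernel, all parameter values, and the stand-in black-box function are assumptions.

```python
import numpy as np

def local_linear_surrogate(predict, z_i, n_samples=500, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit f_hat(z) = w0 + sum_j w_j z_j around z_i by weighted least squares."""
    rng = np.random.default_rng(seed)
    z_i = np.asarray(z_i, dtype=float)
    Zp = z_i + rng.normal(0.0, scale, size=(n_samples, z_i.size))  # perturbed samples
    y = predict(Zp)                                                # black-box outputs
    dist = np.linalg.norm(Zp - z_i, axis=1)
    sample_w = np.exp(-(dist ** 2) / kernel_width ** 2)            # proximity kernel
    A = np.hstack([np.ones((n_samples, 1)), Zp])                   # intercept + features
    sw = np.sqrt(sample_w)
    # Scaling rows by sqrt(weight) turns ordinary into weighted least squares.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

def black_box(Z):
    """Stand-in for a trained model's score for one profile (hypothetical)."""
    return 0.3 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1]

w0, w = local_linear_surrogate(black_box, z_i=[0.5, 0.2, 0.9])
print(w0, w)
```

With this linear stand-in the surrogate recovers the coefficients exactly, including a zero weight for the unused third feature; for a nonlinear model the coefficients would describe only the neighborhood of the chosen instance.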

3.4.3. Model Evaluation and Interpretation

The integration of predictive modeling and XAI transforms the fuzzy clustering results into an interpretable decision-support system. Instead of treating clusters as abstract mathematical constructs, this approach enables:

Explaining why a student belongs to a given profile.
Identifying key factors influencing preferences.
Supporting personalized educational guidance.

From a pedagogical perspective, this stage reinforces the role of artificial intelligence as a transparent decision-support tool for reflective learning and informed educational guidance.

Within the FAS-XAI methodology, this stage integrates fuzzy logic for uncertainty modeling, machine learning for predictive validation, and XAI for interpretability. Together, these components bridge data-driven analysis and educational insight, enabling a deeper understanding of student preferences and their underlying structure.

3.5. Educational and Strategic Interpretation of Student Profiles

The final stage of the FAS-XAI framework translates the analytical outputs into actionable educational insights. Rather than treating clustering and predictive results as purely computational outcomes, this phase highlights their role in supporting personalized guidance, reflective learning, and informed decision-making.

3.5.1. Interpretation of Fuzzy Profiles

The fuzzy clustering process produces a set of latent student profiles represented by cluster centroids

{c_{1}, \dots, c_{C}}

and membership degrees

u_{i j}

. Each cluster can be interpreted as a prototypical vocational orientation, characterized by specific combinations of criteria, such as high-interest–high-motivation profiles, balanced configurations, or ability-driven tendencies.

Unlike crisp classifications, the fuzzy representation allows each student $i$ to be described by a membership vector $u_{i} = (u_{i1}, \dots, u_{iC})$, capturing the degree to which the student belongs to each latent profile. This enables the identification of dominant profiles, where one membership clearly prevails; hybrid profiles, where multiple memberships coexist; and transitional profiles, indicating potential evolution in preferences.

This representation reflects the complexity of real educational trajectories while avoiding rigid categorizations.

3.5.2. From Data Analysis to Pedagogical Value

The integration of explainable artificial intelligence makes fuzzy clustering results interpretable at the individual level. By combining fuzzy memberships with local explanation methods, the framework helps identify why a student is associated with a given profile, which criteria most influence their vocational orientation, and how variations in input variables could modify their profile.

This turns the model into a feedback mechanism that helps students and educators explore the underlying structure of preferences.

From a practical perspective, once validated with real educational data, the framework may support personalized guidance based on dominant and secondary profiles, help identify misalignments between interest, ability, and motivation, and inform flexible pathway design.

Importantly, the model does not prescribe decisions; rather, it provides structured and interpretable information to support them.

3.5.3. Generalization and Transferability

Although this study uses simulated data, the proposed methodology is designed to be transferable to real educational contexts. The separation between data representation, weighting, clustering, and explainability enables the framework to be adapted to real student data, learning analytics systems, and institutional decision-support tools.

In this sense, the present study should be understood as a methodological validation under controlled conditions that provides a foundation for future empirical applications. The following section presents the results obtained from the implementation of the proposed framework, illustrating its analytical and interpretive capabilities.

4. Results and Discussion

This section presents the main outcomes obtained by applying the proposed FAS-XAI framework to the simulated dataset of students’ STEM vocational preferences. The results are organized according to the methodological stages described in Section 3: data transformation, fuzzy multi-criteria weighting, clustering, and explainable modeling.

The first part focuses on the exploratory and structural analysis of the dataset, including variable distributions, relationships, and the effects of normalization and weighting. This provides an initial view of how individual responses are transformed into a representation suitable for fuzzy modeling.

The second part presents the fuzzy clustering results, including the identification of latent student profiles and the distribution of membership degrees. These results illustrate the model’s ability to capture dominant, hybrid, and transitional profiles within overlapping vocational preferences.

The third part presents the predictive and explainable modeling results, showing how machine learning models approximate the fuzzy structures and how XAI techniques provide global and local interpretability. This analysis identifies the most influential criteria and supports the transparent interpretation of individual student profiles.

From an educational perspective, the results show how fuzzy logic and explainable artificial intelligence can support a nuanced, non-rigid understanding of student preferences. Each subsection connects the analytical findings with their pedagogical implications, positioning the proposed methodology as a bridge between data-driven modeling and STEM-oriented educational practice.

4.1. Exploratory Data Analysis and Variable Justification

The exploratory data analysis (EDA) stage serves a dual purpose in this study. It provides a descriptive understanding of the dataset and supports the conceptual validation of the selected variables as key dimensions of students’ vocational preferences.

The dataset emulates a survey-based scenario in which students report their perceived preferences and competencies across several dimensions. Although synthetically generated, the data are constructed to reflect realistic educational patterns, allowing the methodological framework to be evaluated in a controlled yet meaningful setting.

The dataset includes seven variables (passion, technical aptitude, communication, commitment, creativity, leadership, and resilience) to reflect a multidimensional perspective of vocational orientation. These dimensions are consistent with established frameworks in career development and educational psychology, such as Social Cognitive Career Theory (Hackett, 2006) and expectancy–value models (Eccles & Wigfield, 2002), which emphasize the interaction between interest, ability, and motivation in vocational decision-making.

Prior research also highlights the importance of self-efficacy and perceived competence in STEM domains, together with transversal skills such as communication, leadership, and resilience in both academic and professional contexts. This supports the integration of domain-specific and cross-cutting competencies within the proposed modeling framework.

From a data perspective, the EDA examines variable distributions, variability, and relationships. Figure 5 presents the variable distribution. The results show a concentration of values in the upper range for technical aptitude, commitment, and passion, indicating high perceived competence and motivation. In contrast, leadership and communication exhibit greater dispersion, suggesting higher heterogeneity among students.

The correlation analysis shown in Figure 6 reveals complementary and contrasting relationships among variables. Moderate positive correlations between communication, creativity, and leadership suggest a transversal competency dimension. Conversely, technical aptitude shows negative correlations with several of these variables, suggesting a potential differentiation between technical and socio-cognitive profiles. Resilience displays a distinct association pattern, reinforcing its role as an independent behavioral dimension.

Figure 7 further summarizes the data structure by presenting the mean values and variability of each criterion. The results show that technical aptitude and passion exhibit the highest average scores, while leadership shows the lowest mean and higher variability. This combination of central tendency and dispersion highlights the need to consider both dominant trends and heterogeneity when modeling student preferences.

Overall, the EDA stage links theoretical constructs with data representation, showing that the selected variables are conceptually meaningful and analytically suitable for the proposed FAS-XAI framework.

4.2. Fuzzy Multi-Criteria Weighting Results

Following the exploratory analysis, the fuzzy weighting stage transforms the survey-based responses into a multi-criteria representation that captures uncertainty and relative importance among the considered dimensions. This stage operationalizes the methodology described in Section 3.2 by combining global and individual perspectives through a hybrid fuzzy AHP approach.

The objective is to move from raw survey responses to a structured preference representation in which each criterion contributes according to its relative importance at both collective and individual levels.

4.2.1. Global Weighting Structure

The global weighting stage derives the relative importance of each criterion from the aggregated responses of the full cohort.

As shown in Table 2, the global weights are relatively balanced across criteria, although slight differences can be observed. Passion, technical aptitude, commitment, and resilience exhibit higher weights, suggesting a more central role in the collective perception of vocational preferences. In contrast, communication, creativity, and leadership receive comparatively lower weights, indicating a secondary but still relevant role within the overall structure.

From a methodological perspective, this global weighting vector provides a stable reference frame for the analysis, capturing the shared preference structure of the cohort and serving as the baseline component of the hybrid model.

4.2.2. Individual and Hybrid Weighting Patterns

To account for individual variability, the hybrid weighting stage computes fuzzy AHP weights for each student, capturing how each student implicitly prioritizes the criteria based on their normalized responses.

However, relying exclusively on individual weights may introduce instability and circularity, because the same data would define both the weighting structure and its interpretation. To address this limitation, the proposed methodology integrates both perspectives through the hybrid formulation in Equation (9). The resulting matrix contains the final weighting vectors used in the analysis.

Figure 8 shows the distribution of hybrid weights across all students. The results show that the hybrid model preserves substantial individual variability while maintaining a shared global structure. This is particularly evident in criteria such as communication and leadership, whose wider distributions indicate heterogeneous prioritization among students.

In contrast, passion and technical aptitude show more concentrated distributions, suggesting a stronger consensus in their relative importance.

These findings suggest that the hybrid approach successfully balances robustness and personalization by preserving individual differences within a coherent global structure.

4.2.3. Construction of the Transformed Dataset $Z$

The pipeline then applies the hybrid weights to the original responses and constructs the transformed dataset $Z$ according to Equation (10). This transformation integrates the observed responses and their contextual importance, producing a representation that reflects not only what students report but also how these responses are weighted within the multi-criteria decision-making framework.

To ensure comparability between the original survey responses and the transformed dataset $Z$, the pipeline normalizes both representations to a common $[0, 1]$ scale. This normalization enables a consistent comparison of the relative contribution of each criterion, independent of the original measurement ranges.
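The rescaling step can be sketched as follows. The text does not state which normalization is applied, so min-max scaling is assumed here; the per-criterion mean values and the function name `min_max_scale` are likewise hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] (min-max scaling, assumed here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector carries no ranking information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical per-criterion means on two different measurement ranges.
original_means = [4.0, 3.0, 2.0]     # e.g. raw survey scale
weighted_means = [0.60, 0.50, 0.20]  # e.g. after weighting

original_scaled = min_max_scale(original_means)
weighted_scaled = min_max_scale(weighted_means)
```

After rescaling, both vectors lie on the same range, so the relative position of each criterion can be compared across the two representations.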

Figure 9 presents the normalized comparison of mean values for both representations. The results indicate that the weighting process reshapes the relative importance of the criteria. While the original values reflect raw student evaluations, the transformed values emphasize dimensions assigned greater relevance by the hybrid AHP model.

Although the overall criterion structure is preserved, the transformation introduces a redistribution effect: passion, commitment, and resilience gain relative importance, whereas communication and creativity are slightly attenuated. Leadership exhibits the lowest relative contribution after normalization, indicating that it remains the lowest-weighted dimension in both representations.

This effect highlights the role of the weighting mechanism in refining the representation of vocational preferences, moving beyond raw perceptions toward a more structured and decision-oriented interpretation.

From an analytical perspective, the dataset $Z$ constitutes the input for the subsequent fuzzy clustering stage. By incorporating weighted preferences, it supports the identification of latent and overlapping student profiles while avoiding the limitations of treating all variables as equally important.

4.2.4. Synthesis of the Weighting Process

Taken together, the global, individual, and hybrid weighting structures define a progressive transformation of the data:

The global weights provide a stable and interpretable reference structure.
The individual weights capture personalized prioritization patterns.
The hybrid model integrates both perspectives into a coherent weighting framework.
The transformed dataset $Z$ represents an enriched feature space for clustering.

This multi-level representation constitutes a key contribution of the proposed FAS-XAI framework, bridging raw survey data and interpretable analytical modeling.

4.3. Fuzzy Clustering and Profile Identification

This section presents the results of the fuzzy clustering stage applied to the transformed dataset $Z$. The aim is to identify latent and potentially overlapping student profiles from the weighted multi-criteria representation produced by the hybrid fuzzy AHP model.

Unlike crisp clustering approaches, Fuzzy C-Means (FCM) allows each student to belong simultaneously to multiple clusters with varying degrees of membership. This property is particularly suitable for modeling vocational preferences, which are gradual, uncertain, and multidimensional. In the proposed framework, clustering is performed on the numerical representation of $Z$, while the fuzzy linguistic 2-tuple representation is used for interpretation.

The analysis is organized into four stages: cluster-number selection through fuzzy validity indices, examination of the membership structure, characterization of centroids and latent profiles, and inspection of representative cases in numerical and linguistic terms.

4.3.1. Cluster Selection Using Fuzzy Validity Indices

The validation stage determines the appropriate number of clusters using two complementary fuzzy validity indices: the Fuzzy Partition Coefficient (FPC) and the Xie–Beni (XB) index, defined in Equations (16) and (17).

Figure 10 shows the evolution of both indices across different numbers of clusters. As expected, FPC decreases monotonically as the number of clusters increases, reflecting its tendency to favor simpler partitions. The XB index provides a more informative perspective, with relatively stable values in the intermediate range and a sharp increase for larger numbers of clusters, indicating over-segmentation and loss of partition quality.

In this context, the selection of $C = 6$ provides a suitable compromise between partition quality, interpretability, and consistency with the six latent vocational orientations embedded in the simulated survey scenario.
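For reference, the two validity indices can be sketched in a few lines. Equations (16) and (17) are not restated in this section, so the snippet uses the commonly cited definitions of the Fuzzy Partition Coefficient and the Xie–Beni index; the fuzzifier value `m=2.0`, the squared Euclidean distance, and the toy data are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_partition_coefficient(U):
    """FPC = (1/N) * sum_i sum_k u_ik^2, where U[i][k] is a membership degree."""
    return sum(u ** 2 for row in U for u in row) / len(U)


def xie_beni(X, centers, U, m=2.0):
    """Compactness divided by N times the minimum squared centroid separation.

    Requires at least two distinct centroids; lower values are better.
    """
    n, c = len(X), len(centers)
    compactness = sum(
        (U[i][k] ** m) * sq_dist(X[i], centers[k])
        for i in range(n) for k in range(c)
    )
    separation = min(
        sq_dist(centers[k], centers[l])
        for k in range(c) for l in range(k + 1, c)
    )
    return compactness / (n * separation)


# Toy one-dimensional example with two clusters (illustrative only).
X = [[0.0], [2.0]]
centers = [[0.0], [2.0]]
U_crisp = [[1.0, 0.0], [0.0, 1.0]]
U_fuzzy = [[0.5, 0.5], [0.5, 0.5]]

fpc_crisp = fuzzy_partition_coefficient(U_crisp)
fpc_fuzzy = fuzzy_partition_coefficient(U_fuzzy)
xb_crisp = xie_beni(X, centers, U_crisp)
xb_fuzzy = xie_beni(X, centers, U_fuzzy)
```

In this toy case the crisp partition reaches the FPC upper bound and a zero XB value, while the fully ambiguous partition scores worse on both, which is the direction of preference used when comparing candidate values of $C$.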

4.3.2. Fuzzy Membership Structure

Once the optimal number of clusters is established, the pipeline applies the Fuzzy C-Means (FCM) algorithm to the transformed dataset $Z$, producing a soft partition in which each student is characterized by a vector of membership degrees across the six clusters. This fuzzy representation captures the ambiguity and overlap present in vocational preferences while avoiding the limitations of rigid assignments.
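To make the notion of a soft partition concrete, the sketch below computes the membership vector of a single observation for fixed centroids using the standard FCM membership update. The fuzzifier `m=2.0`, the Euclidean distance, and the example coordinates are assumptions, since this section does not restate the configuration used in the pipeline.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Standard FCM update: u_k is proportional to d_k^(-2/(m-1)), normalized to sum to 1."""
    d = [math.dist(x, c) for c in centers]
    coincident = [k for k, dk in enumerate(d) if dk == 0.0]
    if coincident:
        # The point sits exactly on one or more centroids: share full membership among them.
        return [1.0 / len(coincident) if k in coincident else 0.0
                for k in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inverse = [dk ** (-power) for dk in d]
    total = sum(inverse)
    return [v / total for v in inverse]


# Toy example: a point twice as far from the second centroid as from the first.
u = fcm_memberships([1.0], [[0.0], [3.0]])
```

The memberships always sum to one, and closer centroids receive larger degrees; a complete FCM run alternates this update with a recomputation of the centroids until convergence.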

Figure 11 shows the number of students assigned to each dominant cluster, defined as the cluster with the highest membership degree for each student. The distribution is relatively balanced, although Cluster 5 presents the largest number of dominant assignments. This suggests that the model identifies multiple coexisting profiles without dominance of a single structure, reinforcing the diversity of preference patterns in the dataset.

Figure 12 presents the distribution of maximum membership degrees among students. Most values are concentrated in an intermediate range, indicating that many students exhibit partial memberships across several clusters rather than belonging exclusively to one. This behavior is consistent with the multidimensional nature of vocational orientation, where interests and competencies tend to overlap.

To facilitate interpretation, students are categorized into three profile types based on their maximum membership degree:

Dominant profiles, with membership values above 0.70, reflect a clear alignment with a specific cluster.
Transitional profiles, with values between 0.55 and 0.70, indicate partial but not exclusive association.
Hybrid profiles, with membership values below 0.55, represent highly distributed preferences across multiple clusters.
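The three-way categorization above can be expressed as a small rule on the maximum membership degree. The thresholds come from the text; the treatment of values exactly equal to 0.55 or 0.70 (assigned to the transitional band here) and the function name `profile_type` are assumptions.

```python
def profile_type(memberships, dominant_cut=0.70, hybrid_cut=0.55):
    """Classify a membership vector by its largest membership degree."""
    top = max(memberships)
    if top > dominant_cut:
        return "dominant"
    if top >= hybrid_cut:
        # Boundary values 0.55 and 0.70 fall here by assumption.
        return "transitional"
    return "hybrid"


def dominant_cluster(memberships):
    """Return the 1-based index of the cluster with the highest membership."""
    return max(range(len(memberships)), key=lambda k: memberships[k]) + 1


# Illustrative membership vectors (not taken from Table 3).
examples = {
    "clear": [0.80, 0.10, 0.10],
    "partial": [0.60, 0.30, 0.10],
    "distributed": [0.40, 0.35, 0.25],
}
labels = {name: profile_type(u) for name, u in examples.items()}
```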

In addition to membership strength, the uncertainty associated with each student’s cluster assignment is quantified using fuzzy entropy:

H_{i} = - \sum_{k = 1}^{C} u_{i k} \log (u_{i k})

(22)

This measure captures the dispersion in the membership vector. Lower entropy values correspond to well-defined profiles with strong cluster affiliation, while higher values indicate more ambiguous or hybrid configurations. The combination of maximum membership and entropy provides a complementary perspective, enabling a more nuanced interpretation of each student’s positioning within the fuzzy partition.
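A minimal sketch of Equation (22) follows. The logarithm base is not specified in the text, so the natural logarithm is assumed, together with the usual convention that terms with zero membership contribute nothing.

```python
import math


def fuzzy_entropy(memberships):
    """H_i = -sum_k u_ik * log(u_ik), skipping zero memberships (0 * log 0 := 0)."""
    return -sum(u * math.log(u) for u in memberships if u > 0.0)


# A crisp assignment has zero entropy; a uniform vector over C clusters reaches log(C).
h_crisp = fuzzy_entropy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
h_uniform = fuzzy_entropy([1.0 / 6.0] * 6)
h_mixed = fuzzy_entropy([0.6, 0.3, 0.1, 0.0, 0.0, 0.0])
```

With six clusters and natural logarithms, the entropy therefore ranges from 0 to $\log 6 \approx 1.79$, which gives a reference scale for reading the entropy values reported for individual students.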

To support interpretation, Table 3 presents a reduced set of representative student profiles for each cluster. Specifically, the analysis selects three students per cluster to illustrate prototypical, intermediate, and transitional cases. For each profile, the table reports the dominant cluster (CI), maximum membership (MaxM), entropy (Ent), the gap between the two highest memberships (Gap), and the weighted criterion values in dataset $Z$ (Pas = Passion; Tech = Technical Aptitude; Comm = Communication; Comt = Commitment; Creat = Creativity; Lead = Leadership; Res = Resilience). The complete membership vectors are provided in the Supplementary Material to ensure reproducibility and analytical transparency.

Overall, the results suggest that the proposed approach captures both distinct and overlapping vocational profiles. By combining fuzzy membership, entropy-based uncertainty, and representative cases, the model provides an interpretable characterization of student preferences that supports subsequent cluster interpretation and explainable modeling.

4.3.3. Cluster Centroids and Linguistic Characterization

To characterize the fuzzy clusters, the analysis examines the FCM centroids in numerical and linguistic terms. The numerical centroid values summarize the average weighted profile of each cluster in the transformed feature space $Z$, while the linguistic 2-tuple representation facilitates qualitative interpretation.

Table 4 presents the numerical centroid values for the six clusters. These centroids reveal differentiated configurations across the criteria, showing that the clusters are structured combinations of motivational, technical, social, and resilience-related dimensions.

The centroid analysis projects the centroid values onto a fuzzy linguistic framework using the 2-tuple representation. This linguistic characterization follows the interval-based mapping rules defined in Section 3.2.2, assigning qualitative labels according to explicit numerical thresholds rather than subjective interpretation. The resulting representation, shown in Table 5, allows each criterion within a cluster to be interpreted in terms such as very low, low, medium, medium high, high, and very high, together with a symbolic adjustment. This dual representation bridges the numerical output of the clustering model with a human-readable description of the latent profiles.

Taken together, the numerical and linguistic centroid analyses provide the basis for assigning semantic labels to the clusters and for identifying dominant profile tendencies, as discussed in the following subsection.

4.3.4. Semantic Labeling of Clusters

The semantic labeling process uses centroid values and representative student profiles to assign a semantic label to each fuzzy cluster. This labeling process translates numerical and fuzzy outputs into educational categories that can be interpreted in applied contexts.

Figure 13, Table 4, and Table 5 summarize the centroid values associated with each cluster, revealing distinct configurations across the considered criteria. The analysis uses these patterns to identify dominant traits and assign descriptive labels to each cluster.

Based on these centroid values, the following cluster interpretations are proposed:

Cluster 1—Technical–Resilient Profile. This cluster is characterized by very high levels of Technical Aptitude and Resilience, combined with notably low values in Communication and Leadership. This configuration reflects a strongly analytical and task-oriented profile, with a high capacity to persist under demanding conditions, but with limited emphasis on social interaction and collaborative dynamics.
Cluster 2—Balanced Social Profile. This cluster exhibits relatively balanced values across all criteria, with slightly higher levels of Communication and Commitment. It represents a consistent and socially engaged profile, where interpersonal skills and sustained effort play a central role in performance and development.
Cluster 3—Creative–Communicative Profile. This profile is defined by high levels of Passion, Creativity, Communication, and Leadership. The combination of these dimensions indicates a strong orientation toward expression, collaboration, and idea generation, suggesting affinity with creative and socially interactive environments.
Cluster 4—Structured Technical Profile. Characterized by high Technical Aptitude and Commitment, this cluster reflects a disciplined and goal-oriented profile. Social variables remain moderate to low, indicating a preference for structured tasks and performance-driven contexts.
Cluster 5—Specialized Technical Profile. This cluster combines high Technical Aptitude and Passion with low Communication and Leadership. It represents a specialized and focused technical orientation, potentially associated with deep domain expertise and a preference for individual work.
Cluster 6—Transitional Profile. This cluster presents moderate values across all criteria, without a clear dominance of any dimension. This pattern suggests a diffuse or evolving profile, where preferences are not yet fully defined and may still be in transition.

These labels should be understood as interpretative constructs derived from the dominant tendencies observed in each cluster. The fuzzy nature of the model allows each student to belong simultaneously to multiple profiles with varying degrees of membership, reinforcing the dynamic and multidimensional nature of vocational preferences.

4.4. Supervised Validation and Explainability of the Fuzzy Clustering Structure

The pipeline integrates a supervised learning framework to evaluate the robustness and interpretability of the fuzzy clustering results. The supervised validation stage employs an XGBoost classifier (Chen & Guestrin, 2016) to model the relationship between the input features and the dominant cluster assignments obtained through Fuzzy C-Means.

This step serves two objectives. First, it supports the validation of the clustering structure by assessing whether the cluster assignments can be accurately predicted from the transformed feature space. Second, it enables the incorporation of Explainable Artificial Intelligence (XAI) techniques, which provide insights into the contribution of each variable to the classification process.

Unlike purely unsupervised approaches, this hybrid strategy links clustering and interpretability by transforming the fuzzy partition into a supervised learning problem. In this context, the dominant cluster is treated as the target variable, while the normalized weighted criteria (Z-domain features) act as predictors.

XGBoost is particularly suitable due to its strong predictive performance and compatibility with XAI methods such as SHAP and LIME. These techniques provide global and local interpretability, making it possible to analyze the drivers of each cluster and assess whether the learned patterns are consistent with the previously identified centroids.

Therefore, this stage evaluates the stability of the clustering structure and supports a deeper interpretative analysis of the profiles through the XAI methods developed in the following sections.

4.4.1. Classification Performance and Confusion Matrix Analysis

The percentage-based confusion matrix shown in Figure 14 provides a detailed view of the classification performance across clusters. Each row represents the distribution of predicted labels for a given true cluster, allowing normalized interpretation independent of class size.
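The row-wise normalization described above can be sketched as follows. The zero-based integer labels and the toy label vectors are assumptions for illustration and do not reproduce the values in Figure 14.

```python
def confusion_percentages(y_true, y_pred, n_classes):
    """Return a row-normalized confusion matrix in percent.

    Row t gives the distribution of predicted labels among samples whose true
    label is t, so each non-empty row sums to 100 regardless of class size.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


# Toy labels: four samples of class 0 and two of class 1 (illustrative only).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
percentages = confusion_percentages(y_true, y_pred, n_classes=2)
```

The diagonal entries of this matrix correspond to the per-cluster accuracy figures discussed in the remainder of this subsection.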

The results show a strong diagonal structure, indicating a high agreement between the fuzzy clustering assignments and the predictions generated by the supervised model.

Cluster 1 exhibits perfect classification performance, with 100% of instances correctly identified. This suggests that the profile associated with this cluster is highly distinctive and well-separated from the others.

Clusters 2 and 3 also show high classification accuracy, above 90%, with only minor confusion between them. This slight overlap suggests that these clusters may share certain intermediate characteristics, leading to soft boundaries consistent with the fuzzy nature of the model.

Cluster 4 presents a slightly lower accuracy, approximately 83%, with some misclassifications toward neighboring clusters. This behavior is coherent with profiles that may lie in transitional regions of the feature space.

Cluster 5 maintains strong performance above 94%, indicating a well-defined structure, although a small proportion of instances are confused with Cluster 6.

Cluster 6 shows the lowest classification accuracy, approximately 62.5%, with notable dispersion toward Clusters 4 and 5. This result suggests that Cluster 6 may represent a more heterogeneous or transitional profile, characterized by overlapping membership patterns rather than sharply defined boundaries.

This pattern supports the validity of the fuzzy clustering approach, as it reflects gradual transitions between profiles rather than rigid separations. These findings suggest that the fuzzy clustering structure is internally consistent and can be effectively approximated by a supervised learning model.

4.4.2. SHAP-Based Interpretation of the XGBoost Model

The explainability layer uses SHAP (SHapley Additive exPlanations) to interpret the XGBoost classification model. SHAP assigns contribution values to each feature, allowing the analysis of both global feature importance and cluster-specific directional effects.

First, the global absolute SHAP values summarize the contribution of each feature to cluster classification across the full dataset. Figure 15 shows that Communication, Resilience, and Technical Aptitude are the most influential variables, followed by Passion and Leadership, whereas Creativity and Commitment exhibit comparatively lower global importance. This suggests that the model relies primarily on interpersonal, adaptive, and technical dimensions when differentiating learner profiles.
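A global ranking of this kind is typically obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes a precomputed matrix of SHAP values for a single output; the numbers are made up for illustration and are not the values behind Figure 15. For a multi-class model, an additional aggregation over classes would be needed, and how the study performs it is not stated here.

```python
def mean_abs_shap(shap_values, feature_names):
    """Rank features by the mean absolute SHAP value across samples.

    shap_values[i][j] is the SHAP value of feature j for sample i.
    """
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values for two samples and two criteria (illustrative only).
toy_shap = [
    [0.2, -0.1],
    [-0.4, 0.1],
]
ranking = mean_abs_shap(toy_shap, ["Communication", "Creativity"])
```

Taking absolute values before averaging prevents positive and negative effects from cancelling, so the ranking reflects the magnitude of influence rather than its direction.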

Second, the SHAP summary plots display the distribution, magnitude, and direction of feature effects within each cluster, thus offering a more detailed view of how feature values influence the model predictions (Figure 16, Figure 17 and Figure 18).

Cluster 1—Technical–Resilient Profile is mainly influenced by Resilience, with secondary contributions from Communication and Passion. Although the cluster is conceptually associated with technical aptitude, SHAP indicates that persistence and adaptability are the strongest discriminative factors in the model.
Cluster 2—Balanced Social Profile is primarily driven by Technical Aptitude, followed by Communication and Commitment. This suggests that, despite its balanced conceptual definition, the model identifies a stronger technical component combined with social and effort-related dimensions.
Cluster 3—Creative–Communicative Profile is predominantly explained by Leadership and Creativity, which is consistent with its expressive and collaborative nature. Leadership emerges as a central feature in distinguishing this profile.
Cluster 4—Structured Technical Profile exhibits a more multidimensional pattern, with Resilience, Creativity, and Technical Aptitude as the main explanatory variables. This indicates that structured performance is not only associated with technical competence, but also with adaptive and innovative capabilities.
Cluster 5—Specialized Technical Profile is mainly shaped by Passion and Communication, rather than by Technical Aptitude alone. This suggests that specialization may be more strongly linked to motivation, engagement, and communicative capacity than to purely technical skill.
Cluster 6—Transitional Profile is primarily driven by Commitment, with additional contributions from Communication and Leadership. This finding suggests that persistence and social-regulatory traits play a stabilizing role in this evolving profile.

Overall, the SHAP analysis shows a partial but meaningful correspondence between the centroid-based characterization of the fuzzy clusters and the feature patterns learned by the XGBoost model. In some cases, the model confirms the conceptual definition of the clusters, whereas in others it uncovers additional relationships not fully captured by centroid values alone. This suggests that the classifier captures non-linear interactions among variables, providing a more nuanced representation of learner profiles than centroid-based clustering alone.

4.4.3. Local Interpretability with LIME

To complement the global interpretation provided by SHAP, the explainability layer uses LIME to analyze selected test instances, capturing local decision behavior and showing how specific feature values influence cluster assignment at the individual level.

Rather than presenting multiple redundant explanations, two representative instances are analyzed to illustrate model consistency and local variability within the FAS-XAI framework.

Figure 19 shows the LIME explanations for these representative cases.

Representative Instance 3—Technical–Resilient Profile (Cluster 1). The classification is primarily driven by high Technical Aptitude and Resilience, together with low Communication. This result is fully consistent with the conceptual definition of the cluster, where analytical capability and persistence dominate over social dimensions. The explanation confirms that the model captures the expected structure of this profile.
Representative Instance 25—Balanced/Transitional Profile (Cluster 4). In this case, the prediction is influenced by a combination of Commitment, Leadership, and moderate Communication, while Technical Aptitude plays a less dominant role. This reveals a more nuanced decision pattern in which motivational and social factors outweigh purely technical characteristics. The result illustrates how the model captures complex interactions that are not fully represented by centroid-based descriptions.

4.5. Cross-Method Consistency Analysis

The comparison between fuzzy centroid-based profiles, SHAP global explanations, and LIME local interpretations shows a high level of overall consistency, while also uncovering important differences that enhance the understanding of the underlying data structure.

In several clusters, such as the Technical–Resilient and Creative–Communicative profiles, centroid definitions and model-driven explanations are strongly aligned, suggesting that the classifier captures the expected structure of these profiles.

In other cases, SHAP and LIME reveal additional relationships that are not explicitly reflected in the centroid values. For instance, Resilience emerges as a dominant discriminative factor in the Technical–Resilient profile, while motivation- and communication-related variables play a central role in specialized and transitional profiles.

These differences suggest that the XGBoost model captures non-linear interactions and higher-order dependencies among variables, complementing the average-based nature of centroid clustering.

Overall, the combined use of Fuzzy C-Means, SHAP, and LIME provides a multi-level interpretability framework, in which:

FCM defines the structural profiles.
SHAP explains global and cluster-level feature importance.
LIME refines interpretations at the individual level.

SHAP and LIME make the model’s decisions interpretable at global and local levels, thereby potentially supporting future teacher-mediated interpretation and student reflection.

This integrated perspective constitutes a key contribution of the FAS-XAI framework.
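The distinction between global and local explanations can be sketched with plain arithmetic. Global importance in the style of Figure 15 is the mean absolute SHAP value per feature, whereas a local explanation keeps the signed contributions for one instance. The SHAP values below are hypothetical placeholders for a single class, not outputs of the study's model, and the function names are illustrative.

```python
CRITERIA = ["Pas", "Tech", "Comm", "Comt", "Creat", "Lead", "Res"]

# Hypothetical per-instance SHAP values for one class (rows: instances, columns: criteria).
SHAP_ROWS = [
    [0.02, 0.30, -0.25, 0.05, -0.03, -0.04, 0.40],
    [-0.01, 0.10, 0.20, 0.15, 0.02, 0.01, -0.30],
    [0.03, -0.20, 0.10, -0.05, 0.04, 0.02, 0.20],
]


def global_importance(shap_rows, names):
    """Mean absolute SHAP value per feature, sorted from most to least important."""
    n = len(shap_rows)
    scores = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(names))]
    return sorted(zip(names, scores), key=lambda item: item[1], reverse=True)


def local_top(shap_row, names, k=2):
    """Signed contributions for one instance, largest magnitude first."""
    return sorted(zip(names, shap_row), key=lambda item: abs(item[1]), reverse=True)[:k]


print(global_importance(SHAP_ROWS, CRITERIA)[:3])
print(local_top(SHAP_ROWS[0], CRITERIA))
```

The global ranking discards the sign, so a feature that pushes some instances toward a profile and others away from it can still rank highly; the local view is what shows the direction for a given student.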

4.6. Comparative Benchmark with a Crisp Clustering Baseline

To provide a quantitative reference against a simpler non-fuzzy alternative, the benchmarking stage compares the proposed FAS-XAI/FCM pipeline with a K-Means baseline. K-Means provides a widely used crisp clustering reference because it assigns each observation to a single cluster and does not model partial membership or transitional profiles.

As shown in Table 6, the K-Means baseline obtains higher Silhouette (0.265 vs. 0.227) and accuracy (0.935 vs. 0.887) values, together with a lower log-loss (0.148 vs. 0.394). This result is expected, since crisp partitions are generally easier for a supervised classifier to reproduce. However, this advantage comes at the cost of losing information about membership degrees, hybrid profiles, and transitional cases.
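To make the two predictive metrics concrete, the following sketch computes accuracy and multi-class log-loss from predicted class probabilities. The probability vectors are toy values chosen for illustration, not results from the study; they show that two classifiers with identical accuracy can differ in log-loss when one of them is less confident.

```python
import math


def accuracy(y_true, proba):
    """Share of instances whose most probable class equals the true class."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for y, p in zip(y_true, proba))
    return hits / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for y, p in zip(y_true, proba):
        total -= math.log(min(max(p[y], eps), 1.0 - eps))
    return total / len(y_true)


y_true = [0, 1, 2]
confident = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
hedged = [[0.50, 0.30, 0.20], [0.30, 0.50, 0.20], [0.20, 0.30, 0.50]]

for label, proba in (("confident", confident), ("hedged", hedged)):
    print(label, accuracy(y_true, proba), round(log_loss(y_true, proba), 3))
```

Both toy classifiers label every instance correctly, yet the hedged one has the higher log-loss. A similar effect may contribute to the gap in Table 6 if overlapping fuzzy profiles lead the classifier to spread probability across neighboring classes, although the source does not analyze this directly.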

In contrast, the FAS-XAI/FCM pipeline provides a softer representation of vocational preferences, allowing each student to be described through degrees of membership across multiple profiles. Therefore, the added value of the proposed framework does not lie in maximizing conventional predictive metrics alone, but in providing an interpretable and uncertainty-aware profile structure.
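The difference between a crisp assignment and degrees of membership can be sketched with the standard Fuzzy C-Means membership formula, u_k = 1 / Σ_j (d_k / d_j)^(2/(m−1)). The example uses the centroids of Table 4 and the criterion values listed for student 0089 in Table 3. The fuzzifier m = 2 and the use of unweighted Euclidean distance are assumptions, so the output is not expected to reproduce the membership values reported in Table 3.

```python
import math

# Centroids from Table 4, clusters 1 to 6 (columns: Pas, Tech, Comm, Comt, Creat, Lead, Res).
CENTROIDS = [
    [0.511, 0.907, 0.190, 0.748, 0.284, 0.140, 0.895],
    [0.647, 0.502, 0.625, 0.737, 0.449, 0.281, 0.553],
    [0.802, 0.615, 0.628, 0.536, 0.655, 0.610, 0.459],
    [0.659, 0.786, 0.444, 0.663, 0.477, 0.208, 0.623],
    [0.739, 0.838, 0.278, 0.687, 0.381, 0.173, 0.713],
    [0.669, 0.803, 0.394, 0.658, 0.439, 0.188, 0.664],
]
STUDENT_0089 = [0.483, 0.938, 0.184, 0.751, 0.184, 0.184, 0.938]  # criterion values from Table 3


def crisp_assignment(x, centroids):
    """K-Means-style assignment: index of the single nearest centroid."""
    distances = [math.dist(x, v) for v in centroids]
    return distances.index(min(distances))


def fcm_memberships(x, centroids, m=2.0):
    """Standard FCM membership degrees of x for fixed centroids (m is an assumed fuzzifier)."""
    distances = [math.dist(x, v) for v in centroids]
    if 0.0 in distances:  # x coincides with a centroid: full membership there
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in distances) for dk in distances]


memberships = fcm_memberships(STUDENT_0089, CENTROIDS)
print("crisp cluster:", crisp_assignment(STUDENT_0089, CENTROIDS) + 1)
print("memberships:", [round(u, 3) for u in memberships])
```

The crisp rule returns a single label, while the membership vector also shows how much of the profile is shared with neighboring clusters, which is the information the K-Means baseline discards.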

4.7. Robustness Analysis Under Controlled Data Degradation

To further assess the stability of the proposed FAS-XAI pipeline, the robustness analysis introduces three controlled data degradation scenarios: Gaussian noise, missing values, and high-variance anomalies. The analysis examines whether the methodological structure remains consistent when the input data are progressively degraded. In the Gaussian noise scenario, random perturbations are added to the normalized input variables. In the missing-values scenario, a fixed proportion of entries is removed at random and subsequently imputed using the median value of each criterion. In the anomaly scenario, a proportion of observations is perturbed using high-variance noise to simulate atypical or inconsistent response patterns.
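The three perturbation scenarios can be sketched as follows. This is a minimal illustration under stated assumptions: the noise level is treated as a standard deviation on the normalized scale, perturbed values are clipped back to [0, 1], and the function names and toy data are hypothetical. The exact parameterization used in the study's notebook may differ.

```python
import random
import statistics


def add_gaussian_noise(X, sigma, rng):
    """Add N(0, sigma^2) noise to every entry and clip to [0, 1] (clipping is an assumption)."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in X]


def mask_and_impute(X, fraction, rng):
    """Remove a fixed fraction of entries at random, then fill each with its criterion median."""
    n, p = len(X), len(X[0])
    cells = [(i, j) for i in range(n) for j in range(p)]
    dropped = set(rng.sample(cells, round(fraction * len(cells))))
    medians = []
    for j in range(p):
        observed = [X[i][j] for i in range(n) if (i, j) not in dropped]
        # Fall back to the full column if every entry of a criterion was dropped.
        medians.append(statistics.median(observed or [row[j] for row in X]))
    return [[medians[j] if (i, j) in dropped else X[i][j] for j in range(p)] for i in range(n)]


def inject_anomalies(X, fraction, sigma, rng):
    """Perturb a fraction of observations (rows) with high-variance noise."""
    chosen = set(rng.sample(range(len(X)), round(fraction * len(X))))
    return [
        [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] if i in chosen else list(row)
        for i, row in enumerate(X)
    ]


rng = random.Random(42)
X = [[rng.random() for _ in range(7)] for _ in range(50)]  # toy normalized data, not the study's dataset
noisy = add_gaussian_noise(X, 0.10, rng)
imputed = mask_and_impute(X, 0.10, rng)
anomalous = inject_anomalies(X, 0.05, 0.50, rng)
```

Each function returns a new matrix and leaves `X` untouched, so the same baseline can be reused across scenarios and levels.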

For each scenario, the pipeline re-executes the complete analytical sequence, including fuzzy weighting, Fuzzy C-Means clustering, and supervised validation. The analysis then compares the results with the baseline configuration using fuzzy validity indices and predictive validation metrics: Fuzzy Partition Coefficient (FPC), Xie–Beni index (XB), dominant-cluster stability, classification accuracy, and log-loss, as shown in Table 7.
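The fuzzy validity indices named above have standard definitions that can be sketched directly. In the code below, `U` is an N × c membership matrix (one row per observation), `V` holds the centroids, and `m` is the fuzzifier, assumed here to be 2. Dominant-cluster stability is interpreted as the share of observations whose highest-membership cluster is unchanged after perturbation; this interpretation, and the assumption that cluster labels have already been matched between runs, are not confirmed by the source.

```python
def fpc(U):
    """Fuzzy Partition Coefficient: mean of squared memberships, ranging from 1/c to 1."""
    return sum(u * u for row in U for u in row) / len(U)


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index: membership-weighted compactness divided by N times the minimum centroid separation."""
    compactness = sum(
        U[i][k] ** m * sq_dist(X[i], V[k]) for i in range(len(X)) for k in range(len(V))
    )
    separation = min(sq_dist(V[j], V[k]) for j in range(len(V)) for k in range(j + 1, len(V)))
    return compactness / (len(X) * separation)


def dominant_cluster_stability(U_base, U_perturbed):
    """Share of observations keeping the same dominant cluster (labels assumed aligned)."""
    def dominant(row):
        return max(range(len(row)), key=row.__getitem__)
    same = sum(dominant(a) == dominant(b) for a, b in zip(U_base, U_perturbed))
    return same / len(U_base)


# Toy one-dimensional example with two clusters.
X = [[0.0], [0.1], [0.9], [1.0]]
V = [[0.05], [0.95]]
U = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print(fpc(U), xie_beni(X, V, U), dominant_cluster_stability(U, U))
```

Because the Xie–Beni denominator is the smallest squared distance between two centroids, the index grows without bound when two centroids nearly coincide, which is one way very large XB values can arise.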

The baseline configuration produced an FPC of 0.308, an XB index of 4.644, and a classification accuracy of 0.887. Under Gaussian noise, the stability of the dominant-cluster assignments decreased progressively from 0.850 at 5% noise to 0.377 at 20% noise, indicating that strong continuous perturbations substantially affect the separability of the fuzzy partition. Over the same range, the FPC fell from 0.267 to 0.171, which is close to the lower bound of 1/6 ≈ 0.167 that the coefficient can take for the six-cluster partition reported in Table 4. This effect is also reflected in the sharp increase in the XB index, which exceeds 350 at every noise level and 270,000 at the highest one, suggesting reduced centroid separation and loss of compactness–separation balance.

Missing values produced a more gradual degradation pattern. The framework maintained high dominant-cluster stability at 5% and 10% missingness, with values of 0.911 and 0.858, respectively. At 20%, stability decreased to 0.632, indicating that the pipeline remained relatively robust under mild and moderate missingness but became less stable under stronger information loss.

High-variance anomalies showed a different pattern. Although the presence of atypical observations affected predictive performance, dominant-cluster stability remained high, with values above 0.93 for both degradation levels. This suggests that the fuzzy representation can absorb isolated anomalous responses without substantially altering the dominant partition structure, although uncertainty may increase in the supervised validation layer.

Overall, the robustness analysis shows that the proposed pipeline remains stable under mild and moderate perturbations, whereas stronger degradation reduces cluster separability and predictive consistency. These findings support the internal robustness of the methodological pipeline and identify the conditions under which its performance begins to degrade.

5. Conclusions

This study introduces a structured and interpretable framework for analyzing student profiles by integrating fuzzy clustering and explainable artificial intelligence within the FAS-XAI methodology. The proposed approach extends beyond traditional data analysis by providing a profile-based representation of learner characteristics, with potential implications for future studies on academic development, project-based learning, and career orientation.

By combining fuzzy clustering, supervised predictive modeling, and explainability techniques including SHAP and LIME, the framework enables the identification of latent behavioral patterns while explaining how and why individuals are associated with specific profiles. This dual perspective provides a methodological basis for moving from descriptive analysis toward interpretable and personalized decision-support scenarios that require future empirical validation.

From an educational perspective, the results illustrate the multidimensional nature of student profiles within the simulated setting, where technical competencies interact with social, motivational, and adaptive factors. Variables such as resilience, communication, and commitment emerge as relevant discriminative dimensions in the classification process, suggesting that future empirical applications should consider broader competencies alongside technical aptitude.

A key contribution of this work lies in the consistency between the centroid-based fuzzy clustering structure and the model-driven explanations obtained through SHAP. While the centroids define the structural characteristics of each cluster, SHAP reveals the underlying discriminative factors that drive classification, uncovering additional relationships and non-linear interactions between variables. This complementary perspective strengthens the interpretability of the identified profiles and provides a more comprehensive understanding of the simulated profile structure.

Building on this multi-level interpretability, the identified profiles may suggest potential academic and professional orientations. The Technical–Resilient profile may correspond to contexts requiring persistence and technical problem-solving, such as software development, data analysis, or engineering. The Balanced Social profile may inform collaborative and coordination-oriented pathways, including project management, consulting, and team-based environments. The Creative–Communicative profile may be relevant to innovation-oriented contexts, such as design, entrepreneurship, and leadership in multidisciplinary teams.

The Structured Technical profile may correspond to disciplined, system-oriented contexts involving planning, execution, and process optimization. The Specialized Technical profile may suggest pathways toward deep technical expertise in domains such as artificial intelligence, cybersecurity, or specialized engineering fields. Finally, the Transitional profile may indicate evolving preferences, highlighting the potential value of adaptive guidance in future empirical implementations.

These associations should be understood as interpretative hypotheses derived from the methodological pipeline, rather than as empirically validated career predictions. The profiles generated by the framework should therefore be treated as provisional and revisable representations that may support reflection and future decision-making. In real educational contexts, their responsible use would require transparency, human oversight, careful avoidance of stigmatizing labels, and empirical validation with real learners.

The integration of explainable AI also enhances transparency by providing global (SHAP) and local (LIME) explanations of the model’s decision-making process. In future educational applications, this interpretability could support reflective analysis, data literacy, and informed discussion around student profiling, provided that appropriate ethical and pedagogical safeguards are implemented.

Finally, the results suggest that the FAS-XAI framework constitutes an extensible and transferable methodological approach that connects data-driven analysis with human-centered interpretation. Its main contribution lies in establishing a transferable analytical process for uncertainty-aware and interpretable profiling. Beyond education, the methodology may offer a basis for future exploratory applications in domains such as organizational decision-making, talent management, professional collaboration, and healthcare, where understanding complex, multidimensional profiles is essential.

6. Limitations and Future Work

Despite the promising results, several limitations should be acknowledged when interpreting the findings and guiding future research.

First, the analysis is based on a synthetic dataset specifically designed to represent diverse student profiles and test the proposed methodology. While this enables controlled experimentation and enhances interpretability, it does not fully capture the complexity, variability, and noise of real-world data. Synthetic data limit ecological validity and prevent direct extrapolation to real student populations. However, this choice is deliberate and methodologically grounded: the purpose of this study is not to draw statistical inferences about a specific population, but to validate a reproducible, transparent, and extensible analytical framework that can later be applied to real educational datasets. Accordingly, future work should validate the framework using real-world data to assess its practical applicability and contextual robustness.

Second, the study is primarily oriented towards STEM students, with particular emphasis on computational and data-driven disciplines. Although this context provides an appropriate environment for testing the integration of clustering and explainable AI techniques, it limits the generalizability of the findings. Extending the framework to other educational domains, such as social sciences, humanities, or interdisciplinary programs, would enable a broader understanding of how different competencies and learning dynamics interact.

Third, the current implementation considers a static representation of student profiles and does not account for their temporal evolution. However, learning processes and individual competencies are dynamic. Future research should explore longitudinal and time-aware approaches that capture the evolution of profiles over time, enabling the development of adaptive and continuously updated guidance systems.

Additionally, while the framework integrates fuzzy clustering with explainable machine learning techniques, further work is needed to strengthen the interaction between fuzzy logic and XAI, particularly in terms of enhancing the semantic interpretability of clusters and improving the linkage between quantitative model outputs and qualitative insights.

Beyond these methodological limitations, the simulation-based nature of the study also raises educational and epistemological considerations. The framework demonstrates internal coherence, reproducibility, and interpretability, but it does not provide evidence of empirical educational effectiveness in authentic learning environments. Therefore, the generated profiles should be understood as provisional analytical representations rather than fixed descriptions of students’ vocational identities or trajectories. The value of the simulation lies in showing how fuzzy weighting, fuzzy clustering, supervised validation, and explainability can be combined into a transferable process that can later be adapted to specific educational settings, survey instruments, student populations, disciplinary contexts, and institutional objectives. The framework provides a methodological basis for future data-informed decision-making by teachers and learners once validated with real educational data.

The explainability layer also requires pedagogical and ethical mediation. SHAP and LIME do not generate educational recommendations automatically; rather, they provide interpretable evidence about which variables contribute to profile assignments at global and local levels. In future real-world applications, these explanations could help educators understand patterns of student preferences, refine communication strategies, and promote student reflection. However, responsible use would require real-data validation, human oversight, fairness assessment, avoidance of stigmatizing labels, and careful attention to risks such as algorithmic bias, overclassification, and reductionism.

In light of these limitations and considerations, future research should move in two complementary directions. First, the framework should be validated with real educational datasets and context-specific survey instruments, including different educational levels, disciplinary areas, and student populations. Second, the framework’s potential transferability could be explored in professional and organizational contexts where interpretable profile analysis is relevant, such as talent management, team composition, leadership development, and career orientation. In all cases, these applications should be understood as prospective extensions that require empirical validation and contextual adaptation before practical deployment. Thus, the framework should be understood as a methodological foundation for future validated applications, rather than as a ready-to-deploy educational profiling system.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/educsci16060917/s1, File S1: supplementary dataset and Python notebook implementing the FAS-XAI analysis pipeline.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. Ethical review and approval were waived for this study, as it is based on a synthetic dataset and does not involve human participants, personal data collection, or experimental procedures requiring ethical oversight.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset and Python 3.12.13 code supporting the findings of this study are available as Supplementary Materials. This ensures full reproducibility and transparency of the proposed methodology.

Conflicts of Interest

The author declares no conflicts of interest.

References

Afzaal, M., Zia, A., Nouri, J., & Fors, U. (2024). Informative feedback and explainable AI-based recommendations to support students’ self-regulation. Technology, Knowledge and Learning, 29(1), 331–354. [Google Scholar] [CrossRef]
Ahmad, N., & Qahmash, A. (2020). Implementing fuzzy AHP and FUCOM to evaluate critical success factors for sustained academic quality assurance and ABET accreditation. PLoS ONE, 15(9), e0239140. [Google Scholar] [CrossRef]
Aliyev, R., Temizkan, H., & Aliyev, R. (2020). Fuzzy analytic hierarchy process-based multi-criteria decision making for universities ranking. Symmetry, 12(8), 1351. [Google Scholar] [CrossRef]
Alvarez-Garcia, M., Arenas-Parra, M., & Ibar-Alonso, R. (2024). Uncovering student profiles. An explainable cluster analysis approach to PISA 2022. Computers & Education, 223, 105166. [Google Scholar] [CrossRef]
Alwarthan, S., Aslam, N., & Khan, I. U. (2022). An explainable model for identifying at-risk student at higher education. IEEE Access, 10, 107649–107668. [Google Scholar] [CrossRef]
Atıcı, U., Adem, A., Şenol, M. B., & Dağdeviren, M. (2022). A comprehensive decision framework with interval valued type-2 fuzzy AHP for evaluating all critical success factors of e-learning platforms. Education and Information Technologies, 27(5), 5989–6014. [Google Scholar] [CrossRef]
Baker, R. S. (2014). Educational data mining: An advance for intelligent systems in education. IEEE Intelligent Systems, 29(3), 78–82. [Google Scholar] [CrossRef]
Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The Fuzzy C-Means clustering algorithm. Computers & Geosciences, 10(2), 191–203. [Google Scholar] [CrossRef]
Brdnik, S., Podgorelec, V., & Šumak, B. (2023). Assessing perceived trust and satisfaction with multiple explanation techniques in XAI-enhanced learning analytics. Electronics, 12(12), 2594. [Google Scholar] [CrossRef]
Casalino, G., Castellano, G., Kaczmarek-Majer, K., Schicchi, D., Taibi, D., & Zaza, G. (2025). Evolving fuzzy classification for human-centered explainable learning analytics in virtual environments. Evolving Systems, 16(4), 119. [Google Scholar] [CrossRef]
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). Association for Computing Machinery. [Google Scholar] [CrossRef]
Conijn, R., Kahr, P., & Snijders, C. (2023). The effects of explanations in automated essay scoring systems on student trust and motivation. Journal of Learning Analytics, 10(1), 37–53. [Google Scholar] [CrossRef]
Eccles, J., & Wigfield, A. (2002). Motivational beliefs, values and goals. Annual Review of Psychology, 53, 109–132. [Google Scholar] [CrossRef]
Ferguson, R. (2012). Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), 304–317. [Google Scholar] [CrossRef]
Gašević, D., Dawson, S., & Siemens, G. (2015). Let’s not forget: Learning analytics are about learning. TechTrends, 59(1), 64–71. [Google Scholar] [CrossRef]
Ghosh, A., Mishra, N. S., & Ghosh, S. (2011). Fuzzy clustering algorithms for unsupervised change detection in remote sensing images. Information Sciences, 181(4), 699–715. [Google Scholar] [CrossRef]
Goguen, J. A. (1973). L. A. Zadeh. Fuzzy sets. Information and control, vol. 8 (1965), pp. 338–353.—L. A. Zadeh. Similarity relations and fuzzy orderings. Information sciences, vol. 3 (1971), pp. 177–200. Journal of Symbolic Logic, 38(4), 656–657. [Google Scholar] [CrossRef]
Guleria, P., & Sood, M. (2023). Explainable AI and machine learning: Performance evaluation and explainability of classifiers on educational data mining inspired career counseling. Education and Information Technologies, 28(1), 1081–1116. [Google Scholar] [CrossRef]
Hackett, G. (2006). Social cognitive career theory. In J. H. Greenhaus, & G. A. Callanan (Eds.), Encyclopedia of career development (Vol. 2, pp. 750–754). SAGE Publications. [Google Scholar] [CrossRef]
Khousa, E. A., & Atif, Y. (2018). Social network analysis to influence career development. Journal of Ambient Intelligence and Humanized Computing, 9(3), 601–616. [Google Scholar] [CrossRef]
Li, P., Edalatpanah, S. A., Sorourkhah, A., Yaman, S., & Kausar, N. (2023). An integrated fuzzy structured methodology for performance evaluation of high schools in a group decision-making problem. Systems, 11(3), 159. [Google Scholar] [CrossRef]
Li, Y., Gou, J., & Fan, Z. (2019). Educational data mining for students’ performance based on Fuzzy C-Means clustering. The Journal of Engineering, 2019(11), 8245–8250. [Google Scholar] [CrossRef]
Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30). Curran Associates, Inc. [Google Scholar]
Malik, S., Mahanty, C., Mohanty, J., Vaghela, K., Narmadha, T., Sivaranjani, R., Bhutto, J. K., Islam, S., Khan, A., & Zewdie, A. (2025). Enhancing education quality with hybrid clustering and evolutionary neural networks in a multi phase framework. Scientific Reports, 15(1), 21323. [Google Scholar] [CrossRef]
Marín Díaz, G. (2025). Integrating exploratory data analysis and explainable AI into astronomy education: A fuzzy approach to data-literate learning. Education Sciences, 15(12), 1688. [Google Scholar] [CrossRef]
Melo, E., Silva, I., Costa, D. G., Viegas, C. M. D., & Barros, T. M. (2022). On the use of eXplainable artificial intelligence to evaluate school dropout. Education Sciences, 12(12), 845. [Google Scholar] [CrossRef]
Molnar, C. (2019). Interpretable machine learning. A guide for making black box models explainable. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 20 April 2026).
Naveed, Q. N., Qureshi, M. R. N., Tairan, N., Mohammad, A. H., Shaikh, A., Alsayed, A. O., Shah, A., & Alotaibi, F. M. (2020). Evaluating critical success factors in implementing E-learning system using multi-criteria decision-making. PLoS ONE, 15(5), e0231465. [Google Scholar] [CrossRef]
Ordoñez-Avila, R., Salgado Reyes, N., Meza, J., & Ventura, S. (2023). Data mining techniques for predicting teacher evaluation in higher education: A systematic literature review. Heliyon, 9(3), e13939. [Google Scholar] [CrossRef]
Pintrich, P. R. (2000). Chapter 14—The role of goal orientation in self-regulated learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 451–502). Academic Press. [Google Scholar] [CrossRef]
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the Predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144). Association for Computing Machinery. [Google Scholar] [CrossRef]
Saaty, T. L. (1980). The analytic hierarchy process: Planning, priority setting, resource allocation. McGraw-Hill. [Google Scholar]
Saqr, M., Cheng, R., López-Pernas, S., & Beck, E. D. (2024). Idiographic artificial intelligence to explain students’ self-regulation: Toward precision education. Learning and Individual Differences, 114, 102499. [Google Scholar] [CrossRef]
Shafique, U., & Qaiser, H. (2014). A comparative study of data mining process models (KDD, CRISP-DM and SEMMA). International Journal of Innovation and Scientific Research, 12(1), 217–222. [Google Scholar]
Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist, 57(10), 1380–1400. [Google Scholar] [CrossRef]
Susnjak, T., Ramaswami, G. S., & Mathrani, A. (2022). Learning analytics dashboard: A tool for providing actionable insights to learners. International Journal of Educational Technology in Higher Education, 19(1), 12. [Google Scholar] [CrossRef]
Tiukhova, E., Vemuri, P., Flores, N. L., Islind, A. S., Oskarsdottir, M., Poelmans, S., Baesens, B., & Snoeck, M. (2024). Explainable learning analytics: Assessing the stability of student success prediction models by means of explainable AI. Decision Support Systems, 182, 114229. [Google Scholar] [CrossRef]
Verma, R. K., Tiwari, R., & Thakur, P. S. (2023). Partition coefficient and partition entropy in fuzzy C means clustering. Journal of Scientific Research and Reports, 29(12), 1–6. [Google Scholar] [CrossRef]
Xu, S., Yeyao, T., & Shabaz, M. (2023). Multi-criteria decision making for determining best teaching method using fuzzy analytical hierarchy process. Soft Computing, 27, 2795–2807. [Google Scholar] [CrossRef]
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353. [Google Scholar] [CrossRef]
Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2), 64–70. [Google Scholar] [CrossRef]

Figure 1. Publications (1243) and citations (12,231) related to career guidance and student profiling.

Figure 2. Publications (113) and citations (1630) on fuzzy AHP applications in education-related decision-making.

Figure 3. Growth of research on explainable artificial intelligence in learning analytics and educational data mining (45 publications, 574 citations).

Figure 4. FAS-XAI Framework.

Figure 5. Distribution of Survey Responses across Criteria.

Figure 6. Correlation Matrix of Vocational Criteria.

Figure 7. Mean and Variability of Criteria. Dashed line: scale midpoint.

Figure 8. Distribution of Hybrid Weights across Criteria.

Figure 9. Normalized Comparison of Original and Weighted Means.

Figure 10. Fuzzy validity indices as a function of the number of clusters.

Figure 11. Students Assigned to Each Dominant Cluster.

Figure 12. Distribution of Maximum Membership Values across Students.

Figure 13. Fuzzy cluster centroid profiles across the evaluated criteria.

Figure 14. Confusion Matrix–XGBoost Classification.

Figure 15. SHAP-Global Mean Absolute Feature Importance.

Figure 16. SHAP-based summary plot, Clusters 1, 2.

Figure 17. SHAP-based summary plot, Clusters 3, 4.

Figure 18. SHAP-based summary plot, Clusters 5, 6.

Figure 19. Local LIME explanations, representative instances from selected clusters. Green bars increase and red bars decrease the predicted class probability.

Table 1. Literature synthesis and methodological positioning of the proposed framework.

Research Line	Typical Data	Main Approach	Explainability	Main Gap
Vocational guidance and student profiling	Surveys, psychometric instruments	Career guidance, student-centered profiling	Limited	Often assumes predefined or crisp profiles
Fuzzy AHP in education	Expert judgments, educational surveys	Multi-criteria weighting and ranking	Limited	Mainly focused on aggregate evaluation rather than individual preference modeling
Fuzzy clustering in education	Performance data, learning analytics data	FCM and hybrid clustering	Limited	Often oriented toward classification or behavioral grouping
XAI in learning analytics	LMS, performance, dropout-risk data	Supervised ML with SHAP/LIME or dashboards	High	Usually explains predictions post hoc rather than constructing interpretable profiles
This study	Controlled synthetic survey-based data	Fuzzy AHP + FCM + XGBoost	High, SHAP/LIME	Integrates weighting, fuzzy profiles, validation, and explainability in a single pipeline

Table 2. Global Fuzzy AHP Weights by Criterion.

Criterion	Global_Weight
Passion	0.165788
Technical_Aptitude	0.165788
Communication	0.112282
Commitment	0.165788
Creativity	0.112282
Leadership	0.112282
Resilience	0.165788

Table 3. Representative Student Profiles for Each Cluster.

Stud	CI	MaxM	Ent	Gap	Pas	Tech	Comm	Comt	Creat	Lead	Res
0089	1	0.808	0.768	0.737	0.483	0.938	0.184	0.751	0.184	0.184	0.938
0116	1	0.582	1.299	0.440	0.492	0.958	0.187	0.958	0.187	0.075	0.958
0150	1	0.282	1.705	0.095	0.457	0.708	0.376	0.708	0.130	0.376	0.890
0029	2	0.591	1.303	0.474	0.642	0.404	0.711	0.642	0.535	0.324	0.642
0031	2	0.434	1.562	0.265	0.653	0.653	0.724	0.858	0.330	0.330	0.411
0011	2	0.240	1.737	0.043	0.675	0.675	0.568	0.848	0.126	0.358	0.675
0208	3	0.844	0.669	0.792	0.810	0.609	0.676	0.609	0.676	0.676	0.383
0235	3	0.619	1.235	0.473	0.827	0.393	0.694	0.624	0.694	0.694	0.393
0204	3	0.221	1.752	0.027	0.879	0.879	0.341	0.421	0.341	0.563	0.421
0192	4	0.485	1.389	0.225	0.627	0.828	0.520	0.627	0.520	0.314	0.627
0188	4	0.293	1.691	0.082	0.636	0.839	0.529	0.839	0.529	0.320	0.401
0069	4	0.253	1.707	0.025	0.867	0.867	0.583	0.867	0.366	0.127	0.446
0074	5	0.605	1.211	0.461	0.710	0.888	0.164	0.710	0.379	0.164	0.710
0064	5	0.348	1.591	0.106	0.729	0.729	0.167	0.470	0.390	0.167	0.729
0152	5	0.229	1.727	0.003	0.708	0.890	0.376	0.708	0.130	0.376	0.457
0186	6	0.385	1.398	0.067	0.675	0.848	0.358	0.675	0.568	0.126	0.675
0136	6	0.286	1.660	0.036	0.468	0.728	0.387	0.468	0.387	0.132	0.728
0143	6	0.234	1.717	0.005	0.447	0.691	0.584	0.691	0.367	0.128	0.868
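The membership summary columns of Table 3 can be illustrated with a short sketch. It assumes that MaxM is the largest membership degree, Ent is the Shannon entropy of the membership vector using the natural logarithm, and Gap is the difference between the two largest memberships. These readings are inferred from the column names and are not confirmed here; the Ent values in the table stay below ln 6 ≈ 1.792, the maximum for six clusters, which is consistent with the entropy reading. The membership vectors below are hypothetical.

```python
import math


def membership_summary(u):
    """Return the largest membership, Shannon entropy (natural log), and top-two gap."""
    ordered = sorted(u, reverse=True)
    entropy = -sum(p * math.log(p) for p in u if p > 0.0)
    return {"max": ordered[0], "entropy": entropy, "gap": ordered[0] - ordered[1]}


# Hypothetical membership vectors over six clusters (each sums to 1).
clear_case = [0.808, 0.071, 0.040, 0.030, 0.030, 0.021]
transitional_case = [0.23, 0.22, 0.18, 0.15, 0.12, 0.10]

for label, u in (("clear", clear_case), ("transitional", transitional_case)):
    summary = membership_summary(u)
    print(label, {key: round(value, 3) for key, value in summary.items()})
```

A high maximum with a large gap and low entropy indicates a clearly assigned student, while a small gap with entropy near ln 6 indicates a hybrid or transitional profile.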

Table 4. Numerical centroids of the fuzzy clusters.

Cluster	Pas	Tech	Comm	Comt	Creat	Lead	Res
1	0.511	0.907	0.190	0.748	0.284	0.140	0.895
2	0.647	0.502	0.625	0.737	0.449	0.281	0.553
3	0.802	0.615	0.628	0.536	0.655	0.610	0.459
4	0.659	0.786	0.444	0.663	0.477	0.208	0.623
5	0.739	0.838	0.278	0.687	0.381	0.173	0.713
6	0.669	0.803	0.394	0.658	0.439	0.188	0.664

Table 5. Linguistic 2-tuple representation of the fuzzy cluster centroids.

Cluster	Pas	Tech	Comm	Comt	Creat	Lead	Res
1	(M, 0.066)	(H, 0.442)	(L, 0.140)	(MH, 0.488)	(LM, −0.296)	(L, −0.160)	(H, 0.370)
2	(MH, −0.118)	(M, 0.012)	(MH, −0.250)	(MH, 0.422)	(M, −0.306)	(LM, −0.314)	(M, 0.318)
3	(H, −0.188)	(MH, −0.310)	(MH, −0.232)	(M, 0.216)	(MH, −0.070)	(MH, −0.340)	(M, −0.246)
4	(MH, −0.046)	(H, −0.284)	(M, −0.336)	(MH, −0.022)	(M, −0.138)	(L, 0.248)	(MH, −0.262)
5	(MH, 0.434)	(H, 0.028)	(LM, −0.332)	(MH, 0.122)	(LM, 0.286)	(L, 0.038)	(MH, 0.278)
6	(MH, 0.014)	(H, −0.182)	(LM, 0.364)	(MH, −0.052)	(M, −0.366)	(L, 0.128)	(MH, −0.016)
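The entries of Table 5 appear to follow the usual 2-tuple linguistic construction applied to the centroids of Table 4: a value v in [0, 1] is scaled to β = 6v, the nearest label index is taken, and the remainder α = β − index is kept as the symbolic translation. The sketch below is inferred from the two tables rather than taken from the study's code. The seven-label scale is an assumption, and the names "VL" and "VH" for the two extreme labels are hypothetical because neither appears in Table 5.

```python
import math

# Seven-label scale inferred from Table 5; "VL" and "VH" are assumed names for the extremes.
LABELS = ["VL", "L", "LM", "M", "MH", "H", "VH"]


def to_two_tuple(value, labels=LABELS):
    """Map a value in [0, 1] to (label, alpha), with alpha in [-0.5, 0.5)."""
    g = len(labels) - 1
    beta = value * g
    index = min(g, max(0, math.floor(beta + 0.5)))  # round half up to the nearest label
    alpha = round(beta - index, 3)
    return labels[index], alpha


# Cluster 1 centroid from Table 4.
CRITERIA = ["Pas", "Tech", "Comm", "Comt", "Creat", "Lead", "Res"]
CLUSTER_1 = [0.511, 0.907, 0.190, 0.748, 0.284, 0.140, 0.895]

for name, value in zip(CRITERIA, CLUSTER_1):
    print(name, to_two_tuple(value))
```

Applied to the Cluster 1 centroid, this mapping yields the same labels and translations as the first row of Table 5, for example (M, 0.066) for Passion and (LM, −0.296) for Creativity.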

Table 6. Comparative benchmark between K-Means and the proposed FAS-XAI/FCM pipeline.

Pipeline	Clustering Type	Fuzzy Membership	Silhouette	Accuracy	Log-Loss
K-Means baseline	Crisp	No	0.265	0.935	0.148
FAS-XAI/FCM	Fuzzy	Yes	0.227	0.887	0.394

Table 7. Robustness analysis of the FAS-XAI pipeline under controlled data degradation.

Scenario	Level	FPC	XB	Cluster Stability	Accuracy	Log-Loss
Baseline	0%	0.308	4.644	1.000	0.887	0.394
Gaussian noise	5%	0.267	508.232	0.850	0.839	0.367
Gaussian noise	10%	0.218	353.389	0.709	0.790	0.486
Gaussian noise	20%	0.171	274,465.469	0.377	0.806	0.626
Missing values	5%	0.279	7.795	0.911	0.839	0.538
Missing values	10%	0.263	28.635	0.858	0.839	0.530
Missing values	20%	0.223	2969.859	0.632	0.839	0.492
High-variance anomalies	5%	0.296	23.081	0.951	0.871	0.318
High-variance anomalies	10%	0.288	4.670	0.935	0.790	0.560

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
