1. Introduction
Choosing a career path is one of the most significant decisions a student may face. For young women, especially those from low socioeconomic or marginalised backgrounds, this pathway often feels like a solitary passage through turbulent waters. Vivid dreams accompany the journey, but it is also overshadowed by social expectations and systemic bias.
Recent studies have investigated the intersection of artificial intelligence, gender, and education, revealing both progress and persistent inequalities. Sharma [1] contends that AI-driven platforms can enhance women’s access to education and professional advancement, while Conde-Ruiz et al. [2] highlight the persistent underrepresentation of women in STEM disciplines despite technological progress. Ofosu-Ampong [3] also found that men and women perceive and use AI-based tools differently, indicating that social variables continue to influence educational and career decisions. Other studies have examined strategies to retain women in higher education STEM programs [4] and to address the digital gender gap in academic institutions [5]. These investigations focus primarily on behavioural and institutional aspects rather than computational or data-driven approaches.
Recent literature also highlights the potential of artificial intelligence to promote gender equality through inclusive and adaptive educational environments. Systematic reviews [6,7] demonstrate that AI and digital technologies can enhance participation and inclusion in STEM education, although their implementation often faces pedagogical and cultural challenges. Empirical studies show that AI-based interventions can improve performance and reduce gender bias, including enhancing outcomes for girls in male-dominated learning contexts [8] and raising awareness of gender stereotypes in STEM careers [9]. Additional research reveals that institutional design and feedback mechanisms strongly influence gendered academic choices [10,11]. Despite these advances, the existing literature remains focused largely on awareness initiatives and educational reforms, with limited attention to algorithmic fairness and personalization in career guidance.
To address this gap, the present study introduces SAIL-Y, a Socioeconomic and Gender-Aware Recommender System designed to operationalize equity in educational decision-making. By integrating the socioeconomic context, bias-controlled bootstrapping, and fairness metrics within a unified recommendation framework, SAIL-Y advances the existing literature by demonstrating how data-driven and fairness-aware artificial intelligence can effectively promote women’s participation in STEM degree programs. The main contributions of this work are summarised as follows:
- SAIL-Y, a novel socioeconomic-aware recommendation framework that integrates standardised test data with gender-focused bootstrapping techniques and collaborative filtering, explicitly designed to promote STEM career paths among underrepresented female students.
- A multi-strategy recommendation architecture, in which one layer leverages collaborative patterns among similar students and another incorporates bias-controlled sampling to address historical imbalances in academic preferences.
- An evaluation on a large-scale Colombian educational dataset (based on Saber 11 and Saber Pro results), demonstrating that SAIL-Y consistently outperforms baseline models in both recommendation accuracy and fairness, particularly in scenarios involving underrepresented groups and cold-start users.
The remainder of this article is organised as follows.
Section 2 reviews the related works, covering three key dimensions of the field: recommender systems for university major selection, gender bias and fairness in educational recommendations, and systems designed to promote women’s participation in STEM disciplines.
Section 3 describes the methodology, including the dataset obtained from the Saber 11 and Saber Pro examinations, the structure of the SAIL-Y recommender framework, the experimental setup, and the performance metrics used for evaluation.
Section 4 presents the results, providing a detailed analysis to address each of the research questions and to assess the system’s fairness and predictive capacity.
Section 5 discusses the findings in comparison with previous research, highlighting both theoretical implications and contextual limitations of the study. Finally,
Section 6 concludes the paper by summarizing the main contributions and outlining future research directions for gender-aware educational recommender systems.
3. Methodology
The SAIL-Y framework is designed as a multi-layered recommendation system that promotes equitable access to STEM (Science, Technology, Engineering, and Mathematics) careers for female students. By combining data-driven personalisation, bias-controlled data augmentation, and fairness-aware evaluation, SAIL-Y aims not only to improve predictive accuracy but also to mitigate gender disparities in academic guidance.
3.1. Dataset
We use a large-scale dataset composed of 332,933 records from the Colombian Saber 11 standardised exam, collected between 2010 and 2021.
Each student u is represented by a feature vector that concatenates: (i) academic performance (e.g., Saber 11 scores in mathematics, natural sciences, critical reading, English, and citizen competencies), (ii) demographics (gender, region, school type), and (iii) socioeconomic indicators (urban/rural, socioeconomic stratum, parental education). This structured representation allows the model to compare students holistically rather than only on grades.
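As a rough sketch of how such a feature vector might be assembled, the snippet below concatenates scores, demographics, and socioeconomic indicators into one numeric vector. The field names, encodings, and scalings here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def build_feature_vector(student):
    """Concatenate Saber 11 scores, demographics, and socioeconomic
    indicators into one numeric vector (encodings are illustrative)."""
    scores = [student["math"], student["science"], student["reading"],
              student["english"], student["citizenship"]]
    gender = [1.0 if student["gender"] == "F" else 0.0]
    rural = [1.0 if student["area"] == "rural" else 0.0]
    stratum = [student["stratum"] / 6.0]               # Colombian strata 1-6, scaled
    parent_ed = [student["parental_education"] / 5.0]  # ordinal level, scaled
    return np.array(scores + gender + rural + stratum + parent_ed)

student = {"math": 82, "science": 75, "reading": 68, "english": 60,
           "citizenship": 70, "gender": "F", "area": "rural",
           "stratum": 2, "parental_education": 1}
x_u = build_feature_vector(student)
```

A simple binary/ordinal scheme like this keeps vectors comparable across students, which the similarity computations in the recommender layer rely on.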
3.2. Recommender Framework
Figure 1 provides an overview of the sequential structure of the SAIL-Y recommendation system. The framework is composed of three key modules. First, the bootstrapping stage increases the presence of female students in STEM careers within the training data to address historical bias. Then, collaborative filtering generates preliminary career recommendations based on user similarity in the augmented dataset. Finally, the socioeconomic conditioning module adjusts these recommendations by incorporating the student’s socioeconomic context. This layered architecture ensures that the system balances accuracy with fairness and contextual sensitivity, leading to more equitable and realistic career suggestions.
(a) Collaborative Filtering Layer
User-based collaborative filtering is applied in this component. It groups students with comparable academic and socioeconomic backgrounds and recommends majors based on the decisions of their closest peers. This approach captures behavioural similarity and works well for approximating latent preferences.
We quantify the similarity between two students u and v using cosine similarity:

$$\mathrm{sim}(u,v) = \frac{\mathbf{x}_u \cdot \mathbf{x}_v}{\lVert \mathbf{x}_u \rVert\, \lVert \mathbf{x}_v \rVert}$$

where $\mathbf{x}_u$ is the vector of user characteristics, composed of results in the Saber 11 standardised tests and socioeconomic attributes.

The predicted score for user $u$ and degree $i$ is calculated as:

$$\hat{r}_{u,i} = \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\, y_{v,i}}{\sum_{v \in N(u)} \mathrm{sim}(u,v)}$$

where $y_{v,i}$ is 1 if neighbour $v$ chose degree $i$, and 0 otherwise, and $N(u)$ denotes the set of $u$'s nearest neighbours.
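The similarity-weighted vote described above can be sketched as follows; the neighbour vectors and degree labels are made-up toy values, not data from the study:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two student feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_score(x_u, neighbours, degree):
    """Similarity-weighted vote over neighbours: each (vector, chosen_degree)
    pair contributes sim(u, v) to the numerator if the neighbour chose `degree`."""
    num = den = 0.0
    for x_v, chosen in neighbours:
        s = cosine_sim(x_u, x_v)
        num += s if chosen == degree else 0.0
        den += s
    return num / den if den > 0 else 0.0

# Toy example with three neighbours (values are illustrative)
x_u = np.array([0.8, 0.9, 0.6])
neighbours = [(np.array([0.8, 0.85, 0.55]), "Engineering"),
              (np.array([0.7, 0.9, 0.6]), "Engineering"),
              (np.array([0.9, 0.4, 0.9]), "Business")]
score = predict_score(x_u, neighbours, "Engineering")
```

Because the prediction is a normalised weighted vote, the scores for all degrees in a neighbourhood sum to 1 and can be ranked directly.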
(b) Bootstrapped Bias Correction Layer
Historical data underrepresent women choosing STEM majors, which biases any model trained on them. To counter this imbalance and the resulting scarcity of female role models in STEM fields, we employ bootstrapped bias correction: stratified oversampling artificially augments the profiles of female students who chose STEM fields, shifting the model towards a more equitable representation without deleting any original data.
Let $D$ be the original training set, $D_F \subset D$ the subset of records of female students who chose STEM majors, and $\lambda$ the oversampling factor. The balanced set is then obtained by appending $\lambda$ resampled copies of $D_F$ to $D$:

$$D' = D \cup \underbrace{D_F \cup \cdots \cup D_F}_{\lambda \text{ copies}}$$
This procedure increases the influence of successful trajectories of women in STEM careers without removing original data.
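A minimal sketch of this oversampling step, assuming records are dicts with a gender flag and a STEM indicator (field names are illustrative):

```python
import random

def bootstrap_balance(dataset, lam, seed=0):
    """Append `lam` extra copies of every female-STEM record while
    keeping all original rows (stratified oversampling sketch)."""
    minority = [r for r in dataset if r["gender"] == "F" and r["stem"]]
    augmented = list(dataset) + minority * lam
    random.Random(seed).shuffle(augmented)
    return augmented

# Toy dataset: 2 female-STEM, 4 female-non-STEM, 4 male-STEM records
data = ([{"gender": "F", "stem": True}] * 2
        + [{"gender": "F", "stem": False}] * 4
        + [{"gender": "M", "stem": True}] * 4)
balanced = bootstrap_balance(data, lam=2)
```

Note that no record is ever removed: the minority trajectories simply gain weight in whatever model is trained downstream.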
(c) Socioeconomic Conditioning Module
Even with personalisation and debiasing, recommendations can overlook structural barriers (e.g., rurality, parental education). We therefore learn a context-aware adjustment that modulates each major’s score according to the student’s socioeconomic profile.
Learned adjustment. For each major $i$, we fit a logistic model over the socioeconomic attributes $\mathbf{s}_u$ to obtain an accessibility/feasibility score:

$$g_i(u) = \sigma\!\left(\mathbf{w}_i^{\top}\mathbf{s}_u + b_i\right)$$

where $\sigma$ denotes the logistic function.

Interpretation. $g_i(u)$ quantifies how feasible or context-aligned major $i$ is for student $u$ given $\mathbf{s}_u$. A higher $g_i(u)$ indicates that, historically, students with similar contexts have had greater exposure or access to $i$.

Combining signals. The final score for major $i$ blends the CF signal with the socioeconomic adjustment:

$$\mathrm{score}(u,i) = \alpha\, \hat{r}_{u,i} + (1-\alpha)\, g_i(u)$$
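The conditioning and blending logic can be sketched as below. The weights, the blending coefficient alpha, and the convex-combination form are assumptions made for illustration, not fitted values from the study:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def accessibility(s_u, w_i, b_i):
    """Accessibility/feasibility score g_i(u) for one major, modelled as a
    logistic function of the socioeconomic attributes s_u (weights illustrative)."""
    return sigmoid(sum(w * s for w, s in zip(w_i, s_u)) + b_i)

def final_score(cf_score, s_u, w_i, b_i, alpha=0.7):
    """Blend the collaborative-filtering signal with the socioeconomic
    adjustment. The convex combination and alpha=0.7 are assumptions."""
    return alpha * cf_score + (1.0 - alpha) * accessibility(s_u, w_i, b_i)

s_u = [1.0, 0.33, 0.2]    # e.g. rural flag, scaled stratum, parental education
w_bio = [0.4, -0.1, 0.2]  # hypothetical weights for one major
score = final_score(cf_score=0.62, s_u=s_u, w_i=w_bio, b_i=0.1)
```

Because the adjustment is bounded in (0, 1), a major with strong CF support but low contextual feasibility is demoted rather than discarded outright.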
3.3. Experimental Setup
Let $U$ denote the set of users and $I$ the set of possible careers. For each user $u \in U$, the recommender system generates a ranked list $R_u^k$ of the top-$k$ recommended careers, and the ground-truth selection is denoted $T_u$.
The dataset was split into training (70%), validation (15%), and test (15%) subsets using stratified sampling based on gender. All models were trained using the training subset, hyperparameters were tuned on the validation set, and final evaluations were conducted on the test set.
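The 70/15/15 gender-stratified split can be sketched in plain Python as follows (a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would serve equally well):

```python
import random
from collections import defaultdict

def stratified_split(records, key, fracs=(0.70, 0.15, 0.15), seed=42):
    """Split records into train/validation/test subsets while preserving
    the distribution of `key` (here gender) in each subset."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r[key]].append(r)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_group.values():
        rng.shuffle(group)
        n = len(group)
        n_train = round(fracs[0] * n)
        n_val = round(fracs[1] * n)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Toy population: 40% female, 60% male
records = [{"gender": "F"}] * 40 + [{"gender": "M"}] * 60
train, val, test = stratified_split(records, key="gender")
```

Splitting within each gender group guarantees that fairness metrics computed on the test set are not distorted by an accidental gender skew.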
3.4. Experiments
This section presents the experimental evaluation of SAIL-Y, our proposed gender-aware academic recommender system. We aim to answer the following research questions (RQ):
- RQ1: How does SAIL-Y perform in terms of recommendation accuracy and fairness compared to standard models?
- RQ2: What is the contribution of each component (bootstrapping, collaborative filtering, and socioeconomic conditioning) to overall performance?
- RQ3: How sensitive is the system to variations in sampling strategies and neighbourhood size?
- RQ4: To what extent is SAIL-Y interpretable in its recommendations, particularly in understanding its bias-aware behaviour?
3.4.1. Benchmark Methods
We benchmark SAIL-Y against the following baselines:
- Random: A naive recommender that assigns majors uniformly at random.
- Popularity-Based: Recommends the most frequently chosen majors across all students.
- Collaborative Filtering (CF): Standard user-based collaborative filtering using nearest neighbours.
- Content-Based Filtering: Recommends majors based on the closest match to students’ academic profiles.
- CF + Bootstrapping: CF trained on a bootstrapped dataset where female STEM choices are oversampled.
- SAIL-Y (Full Model): Combines CF, bootstrapping, and socioeconomic conditioning.
3.4.2. Performance Metrics
The evaluation metrics used in this study are standard in the recommender systems literature [40,41]. Fairness metrics such as the Disparate Impact Ratio [42] and adaptations like the Gender Fairness Ratio are inspired by recent advances in fair recommendation systems [43].
Precision@k: measures the proportion of relevant items in the top-$k$ recommendations:

$$\mathrm{Precision@}k = \frac{|R_u^k \cap T_u|}{k}$$

where $R_u^k$ denotes the top-$k$ list recommended to user $u$ and $T_u$ the user's actual choice.

Recall@k: measures the proportion of relevant items that are successfully recommended:

$$\mathrm{Recall@}k = \frac{|R_u^k \cap T_u|}{|T_u|}$$

Coverage: indicates the system’s ability to recommend diverse items:

$$\mathrm{Coverage} = \frac{\left|\bigcup_{u \in U} R_u^k\right|}{|I|}$$

Gender Fairness Ratio (GFR): compares the rate of STEM recommendations between genders.

Disparate Impact Ratio (DIR): adapted from the fairness literature, is defined as:

$$\mathrm{DIR} = \frac{P(\text{STEM recommended} \mid \text{female})}{P(\text{STEM recommended} \mid \text{male})}$$
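These metrics can be sketched over per-user top-k lists as below; the DIR operationalisation used here (any STEM major appearing in the top-k) is one plausible reading of the adapted definition, not necessarily the exact one used in the study:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return len(set(recommended[:k]) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k list."""
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

def coverage(all_recommendations, catalogue):
    """Share of the catalogue that is ever recommended to someone."""
    seen = set()
    for recs in all_recommendations:
        seen.update(recs)
    return len(seen) / len(catalogue)

def disparate_impact_ratio(recs_by_user, gender_by_user, stem_majors):
    """DIR = P(some STEM major in top-k | female) /
             P(some STEM major in top-k | male)."""
    def stem_rate(g):
        users = [u for u in recs_by_user if gender_by_user[u] == g]
        hits = sum(any(m in stem_majors for m in recs_by_user[u]) for u in users)
        return hits / len(users)
    return stem_rate("F") / stem_rate("M")

p = precision_at_k(["Bio", "Eng", "Law"], ["Eng"], k=3)
```

Under this reading, a DIR of exactly 1.0 would mean female and male users are equally likely to see a STEM major in their top-k list.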
3.4.3. Implementation Settings
All models were implemented in R, using the caret and Statsmodels packages. Bootstrapping was performed using stratified oversampling with ratios of 1:1, 2:1, and 3:1 for female STEM:non-STEM entries. Student similarity was computed using cosine similarity, and the top-5 recommendations (k = 5) were retained for evaluation.
4. Results
This section presents and analyses the results obtained from implementing the SAIL-Y recommender framework. Each subsection addresses one of the research questions outlined in the introduction, detailing the model’s predictive performance, fairness indicators, and gender-aware recommendation outcomes. The results are organised to illustrate how the system balances accuracy with equity objectives, providing quantitative evidence of its potential to guide more inclusive participation in STEM degree programs.
4.1. Performance Analysis (RQ1)
To assess the impact of SAIL-Y on increasing the likelihood of women selecting STEM careers, we evaluated the overall performance of the system across key metrics, including Precision@5, Recall@5, and the Gender Fairness Ratio (GFR).
Table 1 presents a comparative analysis of multiple recommendation models, showing a clear progression in both performance and fairness metrics as the system evolves. The baseline Collaborative Filtering (CF) model yields a Precision@5 of 0.243 and a Gender Fairness Ratio (GFR) of 0.68, indicating notable gender disparity in recommendations. As enhancements such as bootstrapping and bias correction are added, the GFR improves. The full implementation of the SAIL-Y system achieves the highest scores across all metrics, with a Precision@5 of 0.269, Recall@5 of 0.437, GFR of 1.13, and Disparate Impact Ratio (DIR) of 1.21. These results demonstrate not only an improvement in recommendation accuracy but also a substantial increase in gender fairness. A GFR exceeding 1.0 implies that women receive more STEM career recommendations than men, directly supporting the system’s goal of mitigating underrepresentation. This evidence supports the conclusion that SAIL-Y is effective in increasing the likelihood of STEM career recommendations for women, thus answering RQ1 affirmatively.
The full SAIL-Y model thus achieves the highest overall precision and recall while substantially improving gender fairness. Notably, a DIR above 1.0 indicates that female users receive more STEM-oriented recommendations than male users, confirming that the model shifts the distribution of STEM recommendations toward female students and corrects historical underrepresentation, a desirable outcome for a gender-aware system.
Specifically, the full SAIL-Y model improves accuracy by 10% over traditional collaborative filtering and raises the Disparate Impact Ratio (DIR) to 1.21, meaning that for every man who receives a STEM recommendation, 1.21 women receive one. In other words, SAIL-Y helps women see careers in areas such as engineering or technology among their most likely options, something that classic systems do not achieve.
4.2. Stepwise Study (RQ2)
To understand each module’s contribution, we conducted a stepwise study, removing one component at a time from the full SAIL-Y pipeline.
Table 2 reports the results.
Removing the bootstrapping component resulted in decreased fairness and performance: Precision@5 dropped to 0.246, Recall@5 to 0.406, and the fairness metrics GFR and DIR fell to 0.74 and 0.79, respectively. This indicates that bootstrapping plays a critical role in enhancing both accuracy and fairness. Excluding socioeconomic conditioning, while retaining bootstrapping and collaborative filtering, led to slightly higher accuracy (Precision@5 = 0.254; Recall@5 = 0.417) but diminished fairness (GFR = 0.94; DIR = 0.95), suggesting that this component is particularly important for promoting equitable recommendations. The full SAIL-Y model outperforms all other variants across every metric (0.269 Precision@5, 0.437 Recall@5, 1.13 GFR, and 1.21 DIR), demonstrating the synergistic benefit of integrating all three components. These results confirm that each module (bootstrapping, collaborative filtering, and socioeconomic conditioning) adds distinct and complementary value to the system’s overall performance and fairness, thereby providing a robust response to RQ2.
4.3. Parameter Sensitivity Analysis (RQ3)
To address RQ3, we conducted a parameter sensitivity analysis to examine how key hyperparameters affect the balance between fairness and accuracy in the SAIL-Y system. Specifically, we varied three components:
- Oversampling ratios (female STEM:non-STEM): {1:1, 2:1, 3:1}
- Top-N recommendation sizes: k = {3, 5, 10}
- Neighbourhood sizes in CF: k-nearest neighbours = {10, 20, 50}
The results reveal that an oversampling ratio of 2:1 provides the most effective trade-off between fairness and accuracy. Although a 3:1 ratio further improves fairness metrics, it introduces signs of overfitting, suggesting diminishing returns beyond moderate oversampling. Similarly, a top-k value of 5 strikes the optimal balance between recommendation relevance and breadth, in line with findings from the educational recommender systems literature.
In the collaborative filtering layer, a neighbourhood size of 20 yields the most stable performance. Smaller neighbourhoods (e.g., k = 10) result in sparsity-related issues, while larger ones (e.g., k = 50) introduce noise and reduce personalization. These findings underscore the importance of moderate debiasing and context-aware calibration, which collectively support the system’s ability to enhance fairness without sacrificing recommendation quality.
4.4. Interpretability (RQ4)
One of the fundamental objectives of the SAIL-Y system is not only to provide accurate recommendations, but also to ensure that these recommendations can be clearly explained, especially in sensitive educational contexts such as vocational guidance. To illustrate this aspect, we conducted a detailed case study with a student selected from the test set.
The student is a 17-year-old girl living in a rural area, from a family with low levels of education (neither parent finished high school). Despite her context, she obtained outstanding results in mathematics and natural sciences on the Saber 11 exam, which indicates a high potential to perform in STEM (Science, Technology, Engineering, and Mathematics) areas.
Initially, the system compares this student with other students who have similar academic and demographic profiles. This process, known as collaborative filtering, identifies patterns of career choice among similar students. However, in the original dataset, many high-performing students from rural areas tend to select traditional careers such as accounting or administration. For this reason, the first recommendation generated by the system was precisely Business Administration, since it was the most common option among her “educational neighbours”.
This is where the equity component of the system comes into play. Through a technique called “bootstrapping”, the SAIL-Y system over-represents positive cases of women who, despite their context, decided to study STEM careers. This allows the student to have access to a new set of references: other girls like her who opted for less traditional paths. The result is that the recommendation changes significantly: Electronic Engineering comes to occupy the first place as a suggestion.
But SAIL-Y does not stop there. At this stage, the system revises the recommendation again, this time considering aspects of the student’s socioeconomic environment: her stratum, whether she lives in a rural or urban area, and the educational level of her parents. Based on this information, the recommendation is adjusted, seeking an option that, besides matching her academic profile, is viable and accessible given her conditions.
In this case, the system identifies Biology as an equally scientific path with a lower barrier to entry for a rural student, since it can offer more routes into public programs, scholarships, or agreements with nearby institutions.
Thus, after passing through the three layers of analysis, the system suggests to the student the following order of careers: 1. Biology, 2. Electronic Engineering, 3. Mathematics.
Figure 2 illustrates the evolution of recommendation scores for the case study student $u^*$ across the three stages of the SAIL-Y system. Initially, under collaborative filtering alone (“CF only”), the highest score is assigned to Engineering, followed by Math and Biology, reflecting historical patterns among students with similar academic performance and socio-demographic background. After applying the bootstrapping module (“CF + Bootstrapping”), which increases the visibility of successful women in STEM fields, Engineering becomes even more strongly recommended, indicating that this module amplifies equity-aware signals in the training data. However, once the socioeconomic conditioning module is applied (“SAIL-Y Final”), Biology emerges as the top recommendation. This shift demonstrates the system’s sensitivity to contextual barriers and opportunities: although Engineering aligns with academic potential, Biology may represent a more feasible and accessible career path given the student’s rural background and low parental education. The figure thus highlights the cumulative and corrective effect of each SAIL-Y module in shaping not only accurate but also socially responsible recommendations.
5. Discussion
The development and evaluation of SAIL-Y address a critical gap in educational recommender systems: the need for fair, gender-conscious guidance in university major selection. In this section, we contextualise our findings by comparing SAIL-Y’s methodological design, data usage, and fairness outcomes with prior work and explicitly responding to the research questions.
RQ1—Does the system increase the likelihood of recommending STEM programs to women?
Most conventional academic recommenders rely on personalization based on exam scores, interests, and educational history, employing collaborative filtering, fuzzy logic, or content-based filtering techniques [1,3,9]. While these models often achieve high accuracy, with some reporting up to 98% precision in university degree recommendations [5], they frequently reproduce historical biases. For example, collaborative filtering tends to reinforce gender-stereotypical choices, underrepresenting women in STEM [10].
In contrast, SAIL-Y significantly improves the representation of women in STEM recommendations, as evidenced by its Gender Fairness Ratio (GFR = 1.13) and Disparate Impact Ratio (DIR = 1.21), while also increasing predictive accuracy (Precision@5 = 0.269; Recall@5 = 0.437). These results directly address RQ1 and demonstrate that the system meets its intended fairness goals without compromising user relevance.
RQ2—What is the contribution of each component—bootstrapping, collaborative filtering, and socioeconomic conditioning—to overall performance?
SAIL-Y’s layered architecture explicitly integrates equity-enhancing mechanisms. A bootstrapped oversampling layer increases the proportion of women choosing STEM majors in the training data, while the socioeconomic-aware conditioning module introduces structural context such as school type, income bracket, and parental education.
To evaluate the contribution of each component, we conducted an ablation study. Removing bootstrapping or conditioning reduced both accuracy and fairness metrics (e.g., GFR dropped from 1.13 to 0.74 without bootstrapping, and DIR fell from 1.21 to 0.79), confirming that each module is essential. This directly responds to RQ2: the bootstrapping and conditioning layers are synergistic in enhancing system performance and fairness. Recent research has explored approaches to mitigating gender bias in recommending educational materials using adversarial learning, fairness-through-unawareness, and sample reweighting [11,14]. These methods have shown that fairness-enhancing interventions and bias mitigation are possible without sacrificing predictive accuracy. SAIL-Y is aligned with this research but introduces a simpler, more interpretable approach based on bootstrapped oversampling, which allows more straightforward implementation on large public datasets and is better suited to low-resource educational contexts.
Additionally, unlike some fairness-aware systems that focus on algorithmic neutrality, SAIL-Y employs constructive bias: deliberately increasing the chances that women are recommended for STEM fields to counter persistent underrepresentation. This bias embodies feminist rationale and social identity theories that call for structural changes rather than neutral models [20].
RQ3—How sensitive is the system to parameter tuning (e.g., oversampling ratio, top-N recommendation size, neighbourhood size)?
The model’s performance was further examined under different configurations. We tested various oversampling ratios (1:1, 2:1, 3:1), top-k values (3, 5, 10), and neighbourhood sizes (10, 20, 50) in the collaborative filtering layer.
Findings show that an oversampling ratio of 2:1 offers the best trade-off between fairness and overfitting. Similarly, a top-k value of 5 balances recommendation diversity and relevance, while a neighbourhood size of 20 provides stable and interpretable predictions. These results support RQ3, confirming that moderate parameter tuning is sufficient to maintain robustness and enhance fairness.
RQ4—Does the fairness-enhancing approach compromise system performance?
Some fairness-aware systems emphasise algorithmic neutrality, which may reduce accuracy. In contrast, SAIL-Y intentionally introduces constructive bias by increasing the likelihood that women receive STEM recommendations. This strategy draws on feminist and social identity theories, advocating for structural correction rather than neutrality [20].
Our results show that this constructive bias does not degrade performance. On the contrary, incorporating fairness-aware mechanisms improves both fairness and accuracy compared to baseline models. This finding addresses RQ4, confirming that it is possible to enhance fairness without incurring a trade-off in predictive performance.
Overall, SAIL-Y engages with the social and technical dimensions identified in recent literature on recommender systems, educational fairness, and gender inclusivity in STEM. It is designed to transform the status quo and foster more equitable outcomes through its layered architecture, fairness-first design, and context-specific personalised recommendations.
The limitations of the study relate to the context of the data used in the recommender system, which restricts the generalizability of the findings. The dataset represents the socioeconomic and academic factors of Colombian students; although the Saber 11 and Saber Pro exams are aligned with the PISA test, the results of the study may not fully transfer to other nations where educational evaluations, gender norms, and access to STEM programs vary considerably. Future research should implement recommendation approaches similar to SAIL-Y, harmonising the academic and socioeconomic context that shapes young people’s career choices and reflecting cultural and policy differences in access to higher education.
Consequently, the fairness logic underlying the SAIL-Y framework is bias-corrective rather than strictly neutral. This approach rests on the idea that structural inequalities cannot be resolved through neutrality alone, since machine learning models tend to reproduce the imbalances present in the data on which they are trained. SAIL-Y therefore deliberately counteracts women’s historical underrepresentation in STEM fields; the resulting Gender Fairness Ratio greater than 1 reflects a deliberate equity-oriented design choice rather than a statistical artefact. Nevertheless, such designs can raise perceptions of reverse discrimination, underscoring the importance of contextual adjustment and ethical oversight in future applications of fairness-aware recommender systems.
6. Conclusions
This paper introduced SAIL-Y, a gender-sensitive recommendation framework designed to increase the representation of women in STEM programs. Using real-world educational data and fairness-aware mechanisms, the system demonstrated strong performance across key evaluation metrics.
In direct response to the research questions:
RQ1: SAIL-Y outperforms baseline models in recommending STEM careers to women, achieving higher accuracy and fairness scores.
RQ2: The contribution analysis confirms that both bootstrapping and socioeconomic conditioning play critical roles in enhancing fairness without degrading performance.
RQ3: Parameter sensitivity analysis reveals that moderate oversampling and calibrated neighbourhood sizes offer the best fairness–accuracy trade-offs.
RQ4: The final model achieves improved gender fairness while preserving the quality of recommendations, indicating that fairness and performance are not mutually exclusive.
The findings contribute to the growing field of algorithmic fairness in education and demonstrate the feasibility of embedding equity-aware design into recommender systems. Future work may explore extending SAIL-Y to other underrepresented groups and educational contexts.
The results demonstrated increased predictive accuracy with SAIL-Y and marked improvements in equity-focused fairness metrics such as the Gender Fairness Ratio and Disparate Impact Ratio. This supports prior recommendations in the fairness-aware recommendation literature that precision/recall-optimised models should incorporate bias mitigation strategies. Unlike most systems, SAIL-Y embeds equity aims in the design phase rather than applying post hoc adjustments, which situates it within holistic frameworks of gender inclusion in STEM education.