Is There a Real Need for the Preparatory Years in Higher Education? An Educational Data Analysis for College and Future Career Readiness

Universities seek to qualify students for their academic and career futures and to meet labor market requirements. Hence, a preparatory year is provided to bridge the gap between high school outcomes and the needs of university study plans. The preparatory year is the first year of support in the life of university students, and for decades, it has been recognized as important. It is considered the most crucial stage in the life of university students, where they build and refine their skills and choose the academic major in which they will pursue their academic and career life. Because this year requires the full attention and care of the higher authorities in terms of preparation, development, and renewal, this research outlines the importance of the preparatory year at a local level and in international institutions. Moreover, it sheds light on the details of King Abdulaziz University (KAU) students as a case study. It measures the relationship between the admission weighted ratio (AWR), the college enrollment allocation weighted ratio (CEAWR), and the performance of three batches of male and female students (three consecutive years), with details of students' college allocation after the end of the preparatory year. More importantly, it aims to assess students' progress through their weighted averages during their preparatory year, and the extent to which the goals of the preparatory year are achieved. After an analytic survey of the reality of the preparatory year, based on the statistical tests conducted, this study found that the weighted ratio alone is not sufficient for directly allocating high school students to colleges. The tests showed a difference between the AWR and the CEAWR, which indicates a change in the level of students' performance from high school to university, due to the positive impact of the preparatory year. More precisely, it was noted that the sufficiency of the weighted ratio for direct allocation to some colleges could be studied in future research.


Introduction
In some countries, higher education institutions offer a preparatory year for high school graduates to bridge the gap between school and university education and to integrate students well into the university environment (Batterjee Medical College 2021). Seen as an essential qualitative leap that moves students intellectually and socially from preuniversity to academic study, the preparatory year requires effort, research, and creativity (Imam Abdulrahman Bin Faisal University 2021). The preparatory year program is responsible for developing newcomers' scientific and personal capabilities to allow them to choose a specialization appropriate to their abilities and inclinations (Batterjee Medical College 2021; Imam Abdulrahman Bin Faisal University 2021; Good Universities Guide 2021; King Faisal University 2021). The program works on obtaining the learning outcomes

Student Level Standard for Admission
The following data are used at admission to generate a student's AWR: the General Secondary School Certificate (GSC) ratio, the General Aptitude Test (GAT) score, and the Academic Achievement Test (AAT) score (see Figure 1).
For this research, the authors found advanced technological platforms in King Abdulaziz University (Brdesee 2021; Assiri et al. 2020; Brdesee 2019b; Brdesee 2018). This technical dimension reflects the rapid advancement of technologies, networks, and communications. King Abdulaziz University has integrated academic services and systems, built with the latest programming languages, databases, and modern web technologies, into its website applications (Brdesee 2019a; Brdesee et al. 2017; Alsaggaf et al. 2017; Noaman et al. 2017; Brdesee and Alsaggaf 2015). Thus, for the purpose of this research and to provide meaningful feedback based on these data, the authors used three consecutive years of results from students' preparatory years.
Weighted ratio = (GSC ratio × 50%) + (GAT score × 30%) + (AAT score × 20%).
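The weighted ratio formula can be illustrated with a minimal Python sketch. The class name `AdmissionScores`, the function name, and the assumption that all three inputs are on a 0 to 100 scale are hypothetical choices made for illustration; only the 50/30/20 weights come from the text.

```python
from dataclasses import dataclass

# Weights from the formula in the text: 50% GSC, 30% GAT, 20% AAT.
GSC_WEIGHT = 0.50
GAT_WEIGHT = 0.30
AAT_WEIGHT = 0.20


@dataclass(frozen=True)
class AdmissionScores:
    """Hypothetical container; all scores are ASSUMED to be on a 0-100 scale."""
    gsc: float  # General Secondary School Certificate ratio
    gat: float  # General Aptitude Test score
    aat: float  # Academic Achievement Test score


def admission_weighted_ratio(scores: AdmissionScores) -> float:
    """Return the admission weighted ratio (AWR) for one applicant."""
    for name, value in (("gsc", scores.gsc), ("gat", scores.gat), ("aat", scores.aat)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return (scores.gsc * GSC_WEIGHT
            + scores.gat * GAT_WEIGHT
            + scores.aat * AAT_WEIGHT)


# Illustrative applicant with made-up scores (not data from the study).
example = AdmissionScores(gsc=95.0, gat=80.0, aat=70.0)
print(round(admission_weighted_ratio(example), 2))
```

For the made-up applicant above, the three terms are 47.5, 24, and 14, giving a weighted ratio of 85.5.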

Literature Review
This section reviews the literature relevant to this study along several axes. The first axis addresses several factors that affect student performance in university education, especially in the preparatory year. Bruinsma and Jansen (2007) examined the nine factors of the Walberg educational productivity model, measuring the academic achievement of first-year university students. According to the study, almost a quarter of the variance in achievement, and the variables of prior achievement and expectancy, are explained by eight factors, namely, grades, motivation, age, prior achievement, home environment, support from peers, classroom environment, quality of instruction, and quantity of instruction. To examine differences in success among first-year university students, Van Der Zanden et al. (2019) analyzed three different domains: academic achievement, critical thinking disposition, and social-emotional adjustment to university life. The study found that student success is a multi-domain concept, as different groups of students show different patterns of success. Therefore, the study recommended that universities adapt their support to suit student needs.
By analyzing all courses taken by all students in major Canadian universities for a decade, Beaulac and Rosenthal (2019) maintained that machine learning algorithms can utilize extensive data to create useful tools. The study examined how a student's first two semesters may predict whether they obtain a university degree, and how a student's major may be predicted from their registration for the first few courses. In the study, the researchers used random forests, which allowed for reliable variable measurements. Culver and Bowman (2020) tackled the issue of first-year seminars that are meant to assist students in their university studies. Using quasi-experimental analyses within a large dataset to investigate the connection between students' participation in seminars and their success, the study found that seminars in the first year affect first-year students, but have no effect on students and grades in, for example, the fourth year. The study also found that some factors, such as race, sex, and ACT score, may affect first-year seminars. In a recent study, Kabathova and Drlik (2021) discussed the use of learning analytics to explore the levels of student dropout from university studies using the available educational data. The study attempted to identify appropriate features for machine learning classifiers used in e-learning courses, which provide access to tests, assignments, exams, and projects. The accuracy rates of the prediction of students who complete vs. drop out of their studies reached 77-93%. The study concluded that many machine learning algorithms may be applied to a scarce educational dataset, while universities need to consider classification performance metrics before using the best-performing classification model to predict dropout cases and to suggest intervention mechanisms.
Universities also tend to perform data analyses regarding the link between students' performance in high school and admissions. Niu and Tienda (2010) investigated how a U.S. university used the administrative data derived from the period between 1990 and 2003 to assess the claims that students granted automatic admission underperform academically compared to students who had a competitive environment in high school. The study found that differences in university performance roughly depend on the levels of competitiveness in high schools, rather than test scores. Wolniak and Engberg (2010) also stated that high school impacts academic achievement in the early stages of university education. The researchers found that high school infrastructure and context affect the performance of first-year university students.
Meanwhile, Bone and Reid (2011) investigated how high school affects the performance of university students in their first year, with a focus on biology. When the results were compared to high school results in similar subjects, it was observed that students who completed biology at high school do not perform better than those who did not in a biology course at the first university level. However, in the biochemistry course, learning biology in high school makes a difference. This suggests a need to address the biology curricula at the high school level. In a similar study focusing on physics and math courses, Adamuti-Trache et al. (2013) employed a two-level hierarchical model to find the relationship between university students' performance in their first university year and during their high school education. The researchers maintained that this helps university teachers identify the gap between prescribed and achieved learning outcomes and supports policies and decisions made by high schools and university administrators. The study concluded that university performance in physics can largely be determined by a student's gender and their performance in high school.
In their study, Mennen and van der Klink (2017) investigated predictions of the success of novice music higher education students. The study found a strong relationship between study progress in the first year and in subsequent years, as the findings provided more insight into the predictive ability of the first year. Furthermore, Cyrenne and Chan (2012) maintained that while admission decisions rely principally on high school grades, they can estimate the chances of students' success based on high school grades. The researchers used several alternative estimators in a case study, such as a least square dummy variable model and a hierarchical linear model. The study recommended that admission officials adopt this method to estimate the subsequent performance of first-year students.
Numerous studies have also looked at ways to anticipate students' performance in their first and/or preparatory year at university, helping to identify students that are at risk of failure earlier, so that universities may address their potential problems more quickly. Probing the issue, Namoun and Alshanqiti (2021) studied the literature published over a decade (2010-2020), predicting students' performance based on measurements of learning outcomes. The study found that learning outcomes are calculated based on students' class ranks and their achievement scores, i.e., grades. The study also found that while machine learning models are used to classify performance, the main predictors of students' attaining LOs include online learning activities and assessment grades. Figueira (2016) investigated the issue from another perspective, using data from Moodle logs to predict students' grades. Using a component analysis approach to stem a decision tree, the study found that the data need to be applied as a whole for a successful prediction of grades, particularly to predict failure. Koretz and Langi (2018) found that grading standards in high schools vary significantly; therefore, they are not sufficient to predict performance at university. As a result, the researchers used regression models to predict the college performance of freshmen based on scores in high school and admission and state tests. The study found that admission tests are significant in offsetting differences in high school grading standards. Allensworth and Clark (2020) argued that students with the same high school GPA or the same test score (such as ACT test) graduate at very different rates, and that this variation is due to the school attended. The study found that the relationship between high school GPA and graduation is strong, unlike the relationship between ACT score and graduation, which is weak. They concluded that the slope of the relationship varies according to the high school attended. 
Mayers et al. (2017) studied the influence of novice students' recreation participation, GPA, and engagement. The study found that participation in campus recreation activities positively influences students, which may help university decision-makers make the transition from high school easier, as such participation boosts engagement and yields positive outcomes. Alamri and Alharbi (2021) probed explainable models to predict students' performance by analyzing and synthesizing the literature on the topic. The study found a need for more research and deeper studies on explainable prediction models for students' performance, as the main predictors used in general for similar current studies are socioeconomic features and pre-course performance. Dorta-Guerra et al. (2019) suggested a new predictive model for academic performance dedicated to science students in their first term of the first year at a major Spanish university. The study explored the most prominent factors for predicting students' results using multiple linear regression models. The study found that the best prediction indicator is high school GPA, and then, by using predictive models, academic performance can be predicted so that early interventions can be implemented to boost students' academic achievement.
Researchers have also paid more attention to the selection of a major in universities, and how a student's performance in his/her first university year and the following years can be reflected by the selection of a major. For example, Pinxten et al. (2015) used multinomial regression to examine how several factors, such as prior subjects, occupational interests, gender, socioeconomic status, and future hopes, may affect major selection. The study found that the main predictor of major selection is prior subject uptake in high school. The study used a binary logistic regression model, which proved that higher achievement in high school boosts academic performance in the first university year.
Jamelske (2009) discussed the first-year experience (FYE) program conducted in American Midwest universities at the end of the 1990s, designed to incorporate curricular and extracurricular components into core courses. The program aimed to integrate students into the university community, and the study examined how this affects GPA and retention ratios. The study found that FYE has some effect on GPA, but not on retention. Wille et al. (2020) studied the reasons why more students tend not to choose STEM careers, using two theories of career choices. The study found that both expectancy value theory constructs and vocational interests contribute differently to STEM major choices. Ludwikowski et al. (2019) examined how far ability provides incremental validity when predicting the choice of major, without resorting to predictions based on personality, self-efficacy, and interests. The study concluded that interests and self-efficacy are not impactful predictors, while personality has some influence, which demonstrates that career counselors need to assess clients' interests and self-efficacy when helping them make career decisions.
The previous research varied in the means of studying the preparatory year and its impact on students' performance and college enrollment. However, this study addresses the assumption that a higher ratio after the preparatory year indicates a higher academic performance than the direct admission ratio of high school students who applied to university. Therefore, the study objectives were:

• To find out how well the admission ratio is balanced with the CEAWR in fulfilling students' college selection;
• To study the relationship between the CEAWR and the AWR;
• To study the possibility of the weighted ratio being satisfactory for use when admitting students directly into colleges.

Research Methods
For the purpose of this research, the authors gained the proper data from the Deanship of Admission and Registration at King Abdulaziz University. The nature of such data can be analyzed using statistical techniques; thus, a quantitative research approach was used.

Data Preparation
All course grades obtained by students were extracted from the university database through the educational information (BANNER) system. This research was conducted between the years 2019 and 2020, and the data were collected for three specific batches, i.e., 2015, 2016, and 2017, in addition to the cumulative rate of students by the end of the preparatory year, taking the track type (scientific or administrative and humanities) into account.
It should be noted that all students surveyed had already completed the preparatory year and had gone on to specialize in colleges. This section of the report focuses solely on presenting statistics, in the form of charts, for preparatory year KAU students from three consecutive batches (2015, 2016, and 2017). The study samples included approximately 47,000 students (18,391 male and 28,016 female students). We ensured that their results were obtained at the time of admission, during the preparatory year study, and upon receiving their CEA results. The statistics and data included detailed figures for the admission, CEA, and student rates at colleges.

Analysis of Student Levels with CEA and Admission
First, we presented students' weighted ratio at admission, together with the GSC percentage and the GAT and AAT scores. For the preparatory year, we listed the average grades in detail for all preparatory year courses in both tracks: the scientific track (ST) and the administrative and humanities track (AHT). The relationship between students' weighted ratio at admission (the AWR, comprising GSC, GAT, and AAT) and the CEAWR obtained after the preparatory year was then presented, followed by the allocation to colleges for the years 2015 to 2017. The comparison between student levels was made both in general and in more detail. At the general level, we listed the rate for all study years; at the detailed level, we linked the student's weighted ratio at admission to the cumulative rate after the preparatory year and examined the relationship between them. For the detailed course grades, we listed the rate for each year, linked it with previous years, and then linked it with scientific and literary grades.

Data Description and Statistical Methods
This analysis aimed to discover the following:
1. The relationship between gender and type of GSC;
2. The relationship between gender and the college to which the student is admitted;
3. The relationship between the type of GSC and the allocated program after the admission stage;
4. The relationship between the college and the accepted student's ranking of it.
The following tests were applied and analyzed (George and ...): the chi-square test and Cramer's V, to study the abovementioned relationships. After organizing the data, we deleted records with unavailable information on the following variables: the admission ratio, the admitted program, the college, the CEAWR, and the order of desire when admitted (see Table 1).

Admission Ratio
The graphs show the admission ratio of students over the three years. The weighted ratio ranges from about 60 to just under 100; notably, no student received an admission ratio of 100 (see Figure 2 and Table 2).

College Enrollment Allocation (CEA)
The bar charts show the distribution of students' enrollment allocation to colleges after the preparatory year. The weighted ratios fall within a narrow range, from the 80s to 100, which indicates a substantial improvement in students' ratios compared with the admission ratios based on high school results. A large number of students also received a CEAWR of 100, which is a good indicator that these students will be allocated to their desired college (see Figure 3 and Table 3).

The Relationship between the Admission and CEA Weighted Ratios
To study the relationship between the admission and CEA weighted ratios, Pearson's correlation test (PCT) was applied and analyzed based on the following hypotheses:
• Null hypothesis: There is no relationship between the AWR and the CEAWR;
• Alternative hypothesis: There is a relationship between the AWR and the CEAWR.
Based on the analysis shown in Table 4 and Figure 4:
• We reject the null hypothesis (p < α = 0.05);
• Therefore, there is a relationship between the admission ratio and the CEAWR in all three years;
• In 2015, the relationship between the admission ratio and the CEAWR was stronger than in the other years, with a correlation coefficient of 0.797;
• As shown in the graph of the admission and CEA weighted ratios, there is a positive linear relationship, whose strength varies from year to year (see the correlation coefficients in the table).
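As a rough illustration of the correlation test described above, the following Python sketch computes Pearson's correlation coefficient using only the standard library. The function names and the sample values are hypothetical and are not the study's data or software. The p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples such as the one in this study.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate p-value for H0: no linear relationship.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and ASSUMES n is large enough
    for the t distribution to be approximated by the standard normal.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Made-up AWR and CEAWR values for five students (illustration only).
awr = [70.0, 75.0, 80.0, 85.0, 90.0]
ceawr = [82.0, 85.0, 91.0, 90.0, 97.0]
r = pearson_r(awr, ceawr)
print(r, approx_two_sided_p(r, len(awr)))
```

With only five pairs the normal approximation is crude; an exact p-value would use the t distribution with n − 2 degrees of freedom.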

Paired Samples t-Test
To study the difference between the AWR and the CEAWR, a paired samples t-test was applied and analyzed based on the following hypotheses:
• Null hypothesis: Average admission weighted ratio = average CEA weighted ratio; there is no difference between the average admission and CEA weighted ratios.
• Alternative hypothesis: Average admission weighted ratio ≠ average CEA weighted ratio; there is a difference between the average admission and CEA weighted ratios.
To use this test, the sample must follow the normal distribution; by the central limit theorem, when the sample size is greater than 30, the sampling distribution of the mean is approximately normal.
Based on the analysis shown in Table 5:
• We reject the null hypothesis (p < α = 0.05);
• As there is a difference between the admission and CEA weighted ratios for all three years, we cannot be satisfied with the direct college admission (from high school) ratio.
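The paired samples t-test described above can be sketched in Python with the standard library alone. The function names and sample values are hypothetical and are not the study's data. Because the standard library has no t distribution, the p-value below relies on a normal approximation, an assumption that holds only for large samples.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired samples t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test works on the per-student differences, not on the raw scores.
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def approx_two_sided_p(t):
    """Two-sided p-value, ASSUMING a sample large enough for a normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Made-up AWR (before) and CEAWR (after) values for four students.
awr = [70.0, 75.0, 80.0, 85.0]
ceawr = [80.0, 83.0, 92.0, 93.0]
t, df = paired_t_statistic(awr, ceawr)
print(t, df, approx_two_sided_p(t))
```

A positive t here means the CEAWR is higher on average than the AWR for the same students.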

Nonparametric Sign Test
To study the differences between the admission and CEA weighted ratios, a nonparametric sign test for two related samples was applied and analyzed based on the following hypotheses:
• Null hypothesis: Average admission weighted ratio = average CEA weighted ratio; there is no difference between the average admission and CEA weighted ratios.
• Alternative hypothesis: Average admission weighted ratio ≠ average CEA weighted ratio; there is a difference between the average admission and CEA weighted ratios.
Based on the analysis shown in Table 6:
• We reject the null hypothesis (p < α = 0.05);
• As there is a difference between the admission and CEA weighted ratios for all three years, we cannot rely on the weighted ratio alone for direct college allocation.
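A sign test for paired samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the statistical package used in the study: the function name and data are hypothetical, ties are dropped, and the p-value is the exact two-sided binomial probability under a 50/50 chance of a positive or negative difference.

```python
import math


def sign_test(before, after):
    """Return (positives, negatives, two-sided exact p-value)."""
    if len(before) != len(after):
        raise ValueError("samples must have the same length")
    pos = sum(1 for b, a in zip(before, after) if a > b)
    neg = sum(1 for b, a in zip(before, after) if a < b)
    n = pos + neg  # tied pairs carry no sign information and are dropped
    if n == 0:
        raise ValueError("all pairs are tied; the test is undefined")
    k = min(pos, neg)
    # Probability of a split at least this lopsided under Binomial(n, 0.5),
    # doubled for the two-sided test and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1))
    p_value = min(1.0, (2 * tail) / (2 ** n))
    return pos, neg, p_value


# Made-up AWR (before) and CEAWR (after) values for four students.
awr = [70.0, 75.0, 80.0, 85.0]
ceawr = [80.0, 83.0, 92.0, 93.0]
print(sign_test(awr, ceawr))
```

Unlike the paired t-test, this test uses only the direction of each difference, so it does not depend on the normality assumption.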

The Relationship between Gender and the Type of GSC
To study the relationship between gender and the type of General Secondary Certificate (GSC), a chi-square test (Cramer's V) was applied and analyzed based on the following hypotheses:
• Null hypothesis: There is no relationship between gender and GSC type.
• Alternative hypothesis: There is a relationship between gender and GSC type.
Based on the analysis shown in Table 7:
• We reject the null hypothesis (p < α = 0.05);
• There is a relationship between gender and GSC type;
• The value of the Cramer's V coefficient is similar across all years, which indicates a moderate relationship between gender and GSC type.
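The chi-square statistic and Cramer's V used in this and the following sections can be sketched in Python as shown here. The function name and the contingency table are hypothetical and do not come from the study. The sketch computes only the statistic and the effect size; the p-value, which requires the chi-square distribution, is left out.

```python
import math


def chi_square_and_cramers_v(table):
    """Return (chi-square statistic, Cramer's V) for an r x c table of counts."""
    rows = len(table)
    cols = len(table[0]) if rows else 0
    if rows < 2 or cols < 2 or any(len(row) != cols for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Cramer's V rescales chi-square to the range 0 (none) to 1 (perfect).
    v = math.sqrt(chi2 / (n * (min(rows, cols) - 1)))
    return chi2, v


# Made-up counts: rows are gender, columns are GSC type (illustration only).
counts = [[10, 20],
          [20, 10]]
print(chi_square_and_cramers_v(counts))
```

Reporting Cramer's V alongside the test matters with samples of this size, because even a very weak association can yield p < 0.05.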

The Relationship between Gender and the College
To study the relationship between gender and the college to which the student is admitted, a chi-square test (Cramer's V) was applied and analyzed based on the following hypotheses:
• Null hypothesis: There is no relationship between gender and the college to which the student is admitted.
• Alternative hypothesis: There is a relationship between gender and the college to which the student is admitted.
Based on the analysis shown in Table 8:
• We reject the null hypothesis (p < α = 0.05);
• There is a relationship between gender and the college to which the student is admitted;
• The value of Cramer's V is similar across all three years, which indicates a moderate relationship between gender and the college to which the student is admitted.

The Relationship between the Type of GSC and the Allocated Program
To study the relationship between the type of GSC and the program allocated after the admission stage, a chi-square test (Cramer's V) was applied and analyzed based on the following hypotheses:
• Null hypothesis: There is no relationship between the GSC type and the program allocated after the admission stage.
• Alternative hypothesis: There is a relationship between the GSC type and the program allocated after the admission stage.
Based on the analysis shown in Table 9:
• We reject the null hypothesis (p < α = 0.05);
• There is a relationship between the GSC type and the program allocated after the admission stage;
• The value of the Cramer's V coefficient is similar across all years, indicating a moderate relationship between GSC type and the program allocated after the admission stage.

The Relationship between the Allocated College and the Student's Ranking of It
To study the relationship between the allocated college and its ranking by the student when admitted, a chi-square test (Cramer's V) was applied and analyzed based on the following hypotheses:
• Null hypothesis: There is no relationship between the allocated college and the student's ranking of it when admitted.
• Alternative hypothesis: There is a relationship between the allocated college and the student's ranking of it when admitted.
Based on the analysis shown in Table 10:
• We reject the null hypothesis (p < α = 0.05);
• There is a relationship between the allocated college and the student's ranking of it when admitted;
• The value of the Cramer's V coefficient is similar across all years, which indicates a weak relationship between the allocated college and the student's ranking of it when admitted.

Discussion and Conclusions
The preparatory year, which is considered a transitional phase, allowing students to grow accustomed to the university system, may impact the college enrollment allocation weighted ratio (CEAWR) by either increasing or decreasing it compared to the admission weighted ratio (AWR). This study aimed to understand the differences between the university AWR and the CEAWR after completing the preparatory year, and whether the preparatory year is useful for increasing students' achievement through their weighted averages.
Through the intensive statistical analyses carried out in this study, it was found that the CEAWR after the preparatory year is a more convincing measure of students' performance, and that college enrollment based on it is therefore better matched to that performance. These findings align with other research demonstrating the usefulness of the first year for predicting students' success (Bruinsma and Jansen 2007; Van Der Zanden et al. 2019; Beaulac and Rosenthal 2019; Culver and Bowman 2020; Kabathova and Drlik 2021).
In contrast, many studies have focused on the effect of high school on first-year university students. They found that students' admission and/or performance in university depend on the level of competitiveness in high school (Niu and Tienda 2010), high school context (Wolniak and Engberg 2010), or high school grades (Cyrenne and Chan 2012; Dorta-Guerra et al. 2019; Pinxten et al. 2015). However, the comparison made in this study between the performance of students coming from high school and those who passed the preparatory year shows that direct admission from high school to colleges is less convincing in assessing students' readiness to enroll in colleges. These results are consistent with Koretz and Langi (2018), who found that high school grades are not sufficient to predict students' performance at university because grading standards vary significantly.
This indicates a positive change in the level of student performance between high school and university. Therefore, we can answer the main question of this research: does the preparatory year in higher education have a significant impact on students' performance and results? The answer is yes. This positive impact supports the process of student college enrollment allocation.
In future work, the sufficiency of the AWR for direct college enrollment allocation can still be studied, and it may be assessed independently for individual colleges. This study therefore recommends considering whether the admission ratio is sufficient for allocating students directly to some colleges.
This research was conducted between the years 2019 and 2020, and the data were collected for three specific batches, 2015, 2016, and 2017, in addition to the cumulative rate of students by the end of the preparatory year, taking the track type (scientific or administrative and humanities) into account. The results of this study had a great impact on the university decision making between 2020 and 2021, as the university decided to expand the preparatory year track by adding the medical track (MT). Moreover, the study supported the university's decision to open the direct admission for some specific colleges and programs.
Undoubtedly, this analytical study is limited by the sample used, coming only from King Abdulaziz University. However, it can be applied to many universities to generalize its outcomes, thus supporting decision-makers in adopting a variety of types of programs and study plans according to the nature of the colleges.