1. Introduction
The 1970s witnessed a rapid expansion of tertiary education worldwide. Based on data encompassing gross enrollment ratios in higher education across 195 countries, the global gross enrollment ratio surged from under 10% in 1970 to exceeding 35% by 2014, and within this period, 70 countries have achieved massification in their higher education systems [
1], indicating a notable shift towards more inclusive tertiary education provision. However, concomitant with this expansion has been a significant deterioration in the employment prospects of college graduates. According to the report by the International Labour Organization (ILO), the global proportion of young people in employment has declined from 46.4% to 34.6% as of 2023. In 2019, approximately 30% of employed youth were experiencing extreme or moderate poverty, with only 23% engaged in formal employment [
2]. This scenario underscores a prevalent issue where many graduates find themselves in jobs that do not match their attained education levels, a phenomenon known as overeducation [
3]. With rates reaching up to 40%, overeducation has become prevalent in developed countries [
4] and has gradually spread to developing countries [
5], particularly in China.
The overeducation rate reached 35.09% in 2015 [
6], while the overeducation rate among college graduates exceeded 90% [
7]. On the one hand, the trend of knowledge intensification in China’s labor market is becoming more pronounced. Chinese society exhibits characteristics of a typical credential-based society, with a cultural tradition encapsulated by the saying, “All pursuits are of lower status; only studying is esteemed.” This tradition contributes to an overvaluation of college diplomas, which are regarded as the most effective means of achieving success and upward social mobility [
8]. Since the reform and opening-up policy, especially since the radical education expansion policy started in 1999, China’s gross enrollment rate in higher education has increased from 3.7% in 1988 to 60.2% in 2023, and the number of students enrolled in higher education institutions has reached 47.63 million. Enrollment in master’s degree programs rose from 742,500 in 2018 to 1.1484 million in 2023, and doctoral student enrollment increased from 95,500 to 153,300, with average annual growth rates of approximately 8.54% and 9.93%, respectively [
9]. On the other hand, the phenomenon of knowledge-based unemployment in China is also intensifying. Although the government has actively promoted industrial upgrading and implemented employment incentive policies such as “mass entrepreneurship and innovation”, high-skilled jobs remain in short supply even though the overall job market appears steady [
10].
Since 2018 in particular, the GDP growth rate has declined from 6.75% to 5.2% in 2023, and in 2023, 20.4% of the young labor force (aged 16–24) was unemployed [
11]. This rapid increase in educated labor forces, with 11.79 million graduates in 2024 hitting a historical high since the early 21st century, has exerted considerable pressure on the labor market, and the risk of overeducation is projected to become increasingly severe over the coming decade [
12]. Overeducation has emerged as a pressing concern in China, yet it remains inadequately addressed by policymakers. The increasing incidence of overeducation leads to the underutilization and devaluation of human capital, shifting training costs from firms to governments or individuals and diminishing overall performance and productivity [
13]. More seriously, it can elevate overall unemployment rates by crowding out less educated individuals and hinder economic growth [
14]. Consequently, overeducation poses significant challenges to economic growth and social stability, challenging the assumption that education invariably benefits economic development [
13]. However, the causes of overeducation in China remain unresolved.
Addressing the aforementioned issues requires revisiting the theoretical exploration of the economic functions of education, primarily represented by human capital theory and signaling theory. Human capital theory posits that, at the micro level, education increases individual income through its core mechanism of enhancing individual productivity by imparting knowledge and skills [
15]. At the macro level, education promotes economic development by improving the quality of the labor force, technological advancement, and management efficiency through talent cultivation and scientific research [
16]. Therefore, human capital theory asserts that productivity is the main economic function of education. This theory gained significant recognition in the 1960s, leading the United States to implement proactive education expansion policies, resulting in a rapid increase in the supply of knowledgeable labor and economic development. However, the stagflation phenomenon of the 1970s, characterized by concurrent unemployment and high inflation, was difficult to explain using human capital theory.
In this context, signaling theory was proposed, positing that an individual’s ability is innate and that individuals rationally choose their optimal level of education subject to a cost-benefit constraint. The value of the education system lies in its ability to select and classify individuals through selective examinations, enabling them to fit into different social roles and occupations [
17]. Thus, the economic function of education primarily involves sending signals about the diploma holder’s ability to potential employers, known as the “signaling effect”, “diploma effect”, or “sheepskin effect”. Signaling theory’s policy recommendation that “the state should improve the quality and equity of education and guide individuals to rationally invest in education to ensure the effectiveness of educational signals” has some explanatory power for the economic stagflation phenomenon in the United States during the 1970s. However, its assumption that ability is innate has not gained widespread acceptance. This led to a theoretical debate in academia about whether the economic function of education is primarily productive or signaling [
18]. Over time, a consensus has emerged that education has dual economic functions, both productive and signaling [
19]. Nonetheless, the relative importance of these functions in determining the economic returns to education varies depending on the historical, cultural, and institutional context of the labor market.
The two conflicting theoretical propositions provide an appropriate framework for analyzing overeducation in China. The individual returns to education constitute a pivotal factor behind the positive reception of the Chinese government’s policies aimed at expanding education. Over the past few decades, education has increasingly become a valuable asset for individuals in China. The private returns to education have surged dramatically, rising from under three percent in the 1980s to five percent in the 1990s [
20] and reaching ten percent by the end of the twentieth century [
21]. Although the growth rate of returns to education in China has since moderated and even declined after 2006 [
22], the returns remain substantial. Previous studies have not strictly differentiated between the productive and signaling effects of education on individual income. Subsequent research has validated that the signaling value of education in China is notably positive, facilitating job acquisition for applicants [
23]. This provides new evidence supporting the phenomenon of the Chinese public’s enthusiastic response to education expansion policies, although these findings do not directly address the issue of overeducation.
Therefore, understanding the economic function of surplus education, or overeducation, remains significant in China under the current context. Especially given the coexistence of “knowledge deepening” and “knowledge unemployment”, if additional years of education primarily convey signaling value, then overeducation serves as a means for individuals to enhance their competitiveness in the labor market, thereby intensifying educational competition. For enterprises, overeducation merely improves the efficiency of recruitment and selection without enhancing the productivity levels of positions. For the state, overeducation represents a misallocation of public educational resources that needs to be curbed. Conversely, if excessive years of education only have productive value, overeducation results from an oversupply of educated labor relative to demand, leading to potentially negative effects on individuals but contributing to the improvement of labor quality for both the state and enterprises. This study aims to distinguish between the productivity effects and signaling effects of overeducation in the Chinese labor market by employing sub-sample estimation for both employed and self-employed individuals and controlling for sample selection issues. This research provides empirical evidence that has profound implications for individual educational investment strategies and career development planning, corporate human resource management, and national education policy formulation.
2. Literature Review
Numerous studies have examined the factors influencing overeducation, yielding a variety of conclusions. First, in terms of the macro environment, economic downturns have been shown to increase the prevalence of short-term contracts in the labor market, thereby heightening the risk of overeducation [
24]. Technological revolutions can alleviate overeducation by boosting the demand for highly skilled labor and improving job search convenience, but they can also increase the risk of overeducation among the moderately educated due to job polarization [
25]. It has been suggested that beyond economic factors, educational expansion does not necessarily increase the risk of overeducation, which has been validated in most European countries except Spain [
26].
Second, regarding organizational characteristics, overeducation is more likely to occur in smaller companies in the UK [
27], whereas the opposite trend is observed in China, where larger firms are more likely to employ overeducated workers [
28]. The type of organization also plays a significant role; in Iran, the public sector exhibits a higher risk of overeducation compared to the private sector [
29]. Furthermore, factors such as the industry category of the employing organization, the proportion of vulnerable groups within the workforce, the non-economic benefits associated with the job, and the nature of employment contracts significantly influence the incidence of overeducation [
5,
24,
30].
Third, individual characteristics significantly influence the likelihood of overeducation. Most studies confirm that men are more prone to overeducation than women [
13]. Additionally, individuals with shorter work experience, lower educational attainment, and those who majored in humanities and social sciences are at a higher risk of overeducation [
14,
30,
31]. However, these findings are not universally applicable.
Fourth, family characteristics are also influential. Some studies confirm that children from families with higher socio-economic status are more susceptible to overeducation [
32], but in Malaysia, the opposite is true, with children from lower socio-economic backgrounds being more prone to overeducation [
30]. Additionally, immigrants are more likely to experience overeducation, but the risk of overeducation significantly decreases for their offspring [
33].
Most studies on overeducation primarily address its socio-economic impacts, with a significant emphasis on quantifying its economic effects. Many empirical studies have concentrated on estimating the productivity effects of overeducation using the ORU (Overeducation-Required-Undereducation) model or VV (Verdugo-Verdugo) model [
34,
35,
36]. Over several decades, these studies have consistently concluded, based on empirical evidence, that the rate of return to required years of education is significantly higher than the rate of return to actual years of education. Furthermore, while the rate of return to surplus years of education is positive, it is markedly lower than that for required years. Conversely, the rate of return to inadequate years of education is negative, albeit with a smaller absolute value compared to required years [
4,
14,
27,
37,
38,
39]. Hence, individuals who are overeducated exhibit lower personal productivity compared to their matched counterparts with the same level of schooling [
34,
40], although they display higher personal productivity relative to those matched at the same job position. While some of these differences may be attributed to ability heterogeneity, the overall direction of the discrepancy remains unchanged [
41]. However, these studies do not strictly distinguish between the productivity effects and the signaling effects of overeducation. Only a few studies have focused on the signaling effects of overeducation, but they lack precise quantification.
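For reference, the ORU specification discussed above is commonly written in the following form. This is a sketch of the standard textbook formulation, not necessarily the exact equation estimated in this study; the symbols ($S^{r}$, $S^{o}$, $S^{u}$, $X$, and the coefficient names) are notation assumed here for illustration.

```latex
% Illustrative ORU wage equation; all symbols are assumed notation.
% S_i: actual years of education; S_i^r: years required by the job;
% S_i^o = max(0, S_i - S_i^r): surplus years; S_i^u = max(0, S_i^r - S_i): deficit years.
\begin{align}
S_i &= S_i^{r} + S_i^{o} - S_i^{u}, \\
\ln w_i &= \alpha + \beta_{r} S_i^{r} + \beta_{o} S_i^{o} + \beta_{u} S_i^{u}
          + X_i^{\prime}\gamma + \varepsilon_i .
\end{align}
% The pattern summarized in the text corresponds to
% \beta_r > \beta_o > 0 and \beta_u < 0 with |\beta_u| < \beta_r.
```

Under this notation, a conventional Mincer equation is the restricted case $\beta_{r} = \beta_{o} = -\beta_{u}$, which is why the return to actual years of education falls below the return to required years when the restriction does not hold.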
In general, studies on the economics of higher education provide valuable insights into the signaling effects of overeducation. For instance, Chatterji et al. discussed how employers balance supervision costs and employment costs. When supervision costs surpass those associated with additional education signals, employers tend to hire more highly educated employees, thereby increasing the risk of overeducation. Building on this research [18], Li et al. explored the relationship between overeducation and education signals in China. They argue that overeducation is a rational choice for both graduates and employers during higher education expansion. Graduates seek to enhance their job competitiveness through education signals, while employers use these signals to reduce supervision costs [
42]. However, these studies do not directly analyze the signaling effects of overeducation, nor do they quantify the specific signaling values or improve the accuracy of traditional estimation methods. Kedir et al. (2012) were the first to do so [
43]. They combined Wolpin’s method with the ORU model to differentiate between the productivity and signaling effects of overeducation by estimating and comparing rates of return to overeducation for the self-employed and employees in the UK and Cyprus. They found that overeducation has a significant positive impact on productivity and that the signaling effects of required years of education are greater than those of surplus years. The difference in returns between required and surplus years of education stems from their distinct signaling effects. Moreover, after correcting for ability heterogeneity and sample selection bias, Kedir et al. concluded that the overeducated do not experience significant productivity loss [
43]. The main contribution of Kedir et al.’s study is the explicit disentanglement of the signaling effects from the productivity effects of overeducation through sub-sample estimations, thereby improving the accuracy of estimating the productivity effects of overeducation.
Previous studies have several deficiencies. First, while evidence supporting the Weak Screening Hypothesis (WSH) underscores the importance of accounting for the signaling effects of education in estimating productivity effects, many earlier studies fail to separate these signaling effects from productivity effects. This oversight compromises the accuracy of the estimations. In other words, the observed productivity effects in the aforementioned studies may be partially or entirely attributed to the signaling effects of overeducation [
43]. Second, most estimations overlook sample selection bias, even though Tsai (2010) has indicated that sample selection significantly impacts the accuracy of traditional estimations [
44]. Third, although some studies have comprehensively examined the signaling effects of overeducation in developed labor markets, there is a notable lack of research focusing on developing countries, particularly in the context of the Chinese labor market. This market is currently experiencing a critical transition characterized by an increasing supply of highly educated workers and slowing economic growth.
3. Hypotheses
The debate between human capital theory and screening theory provides a fitting analytical framework for measuring the productivity and signaling effects of overeducation. Scholars proposed the Strong Screening Hypothesis (SSH) and the Weak Screening Hypothesis (WSH) [
45]. SSH claims that education does not have any production effects but only signaling effects, whereas WSH asserts that education has both production and signaling effects. Due to the varying strength of the signaling value of education among different groups, such as the employed and self-employed individuals [
46], competitive and non-competitive sectors [
47], and professionally matched and mismatched individuals [
48], scholars often test the aforementioned hypotheses by comparing whether the returns to education differ among these groups. Some studies do not support the signaling effect of education, thereby rejecting both the SSH and the WSH [
49]. Other studies support the notion that education only has a signaling effect, thus supporting the SSH [
50]. Most empirical findings have supported the Weak Screening Hypothesis, which gradually framed a more consistent understanding of the dual functions of education [
12,
23].
Based on the SSH and WSH, Wolpin and others [46] argue that there is an essential difference in the expression of ability signals between self-employed and employed individuals in the labor market. They suggest that comparing self-employed individuals with employees can help decompose the productive and signaling effects of overeducation. It is important to clarify that self-employment refers to all forms of work where individuals work for themselves in non-agricultural sectors, as opposed to being employed and receiving wages from others [
51]. Examples include skilled artisans, taxi drivers, retailers, individual entrepreneurs with or without employees, independent accountants and actuaries, physicians, freelancers, and heads of privately operated companies or enterprises.
For employees, investing in additional education serves to enhance their competitiveness in the labor market, either by increasing productivity or by enhancing signaling. Employers, in turn, use the educational attainment of applicants as a tool to assess their abilities, thereby mitigating information asymmetry and reducing hiring costs [
19,
46]. Consequently, the education of employees partially functions as a signaling mechanism. In contrast, for the self-employed, there is no information asymmetry in hiring themselves, meaning that their education solely contributes to productivity effects. Similarly, in the context of overeducation among employees, their surplus years of education likely convey two types of information to employers: the ability, skills, or knowledge possessed by the workforce [
52] and the signals indicating their ability level. Especially in China, the signaling effect of education may contribute more significantly to income increases [
53]. In contrast, for the self-employed, surplus years of education exclusively convey productivity-related information. Therefore, by assuming that the productivity effects of education are homogeneous between employees and the self-employed, we can disentangle the signaling effects of education from its productivity effects. By measuring whether the impact of excessive years of education on personal income exists among employees and self-employed individuals, three hypotheses can be derived for this study:
Hypothesis 1. Surplus years of education only have signaling effects instead of productivity effects.
In this case, the excessive years of education have no significant impact on the personal income of the self-employed group, but they do have a significant impact on the income of the employed group. This difference originates from the signaling effect pointed out by Wolpin (1977) and others [46]. The verification of this hypothesis indicates that overeducation is an active strategy individuals use to leverage educational signals to compensate for disadvantages such as lack of work experience and low ability [
4]. Their actual ability level does not exceed the requirements of their positions, which is categorized as “Apparent Overeducation” [
54]. In the long term, employer learning eliminates the information asymmetry between employers and employees [
55], revealing the true productivity level of overeducated individuals. The signaling effect diminishes, and compared to their peers with the same level of education, overeducated individuals have lower ability levels and work in positions that require less skill. Overeducated individuals find it harder to escape the state of overeducation through career mobility or internal promotions. As the value of educational signals fades, their income will either decrease further or grow slowly, aligning with conclusions about the scarring effect of overeducation and lower future labor market achievements [
56]. Thus, overeducation is a long-term phenomenon; it increases hiring costs for companies due to the disruption of the signaling effect of education and represents a waste of public education resources for society, which is detrimental to the social return on education.
Hypothesis 2. Surplus years of education have both productivity and signaling effects.
At this point, the positive effects of overeducation on productivity observed in previous studies likely stem from its signaling value. To further confirm this, after eliminating the influence of the signaling effect, the productive effects of overeducation can be assessed from both individual productivity and social productivity perspectives [
54]. The former involves a comparison with peers of the same education level to verify whether overeducation has impaired the individual’s productivity level, while the latter involves a comparison with peers in the same job position to verify whether overeducation has enhanced the job’s productivity level. Therefore, Hypothesis 2 can be further refined as follows:
Hypothesis 2a. Overeducation leads to a certain degree of productivity loss.
In this context, surplus years of education have a significantly positive impact on productivity, though this impact is less pronounced than the productivity effects of the required years of education. This would corroborate previous findings, but the previously observed “income penalty” [
34,
35,
36] is not entirely caused by the productive effects of overeducation; part of it also stems from its signaling value. Once this hypothesis is supported, overeducation can be seen as an individual’s compromise to cope with employment uncertainties, leveraging the signaling and productivity advantages of higher education to enhance their competitiveness. However, due to the adverse environment of the labor market where “supply exceeds demand” and the possibility that overeducated individuals might have weaker abilities compared to their peers, employers are unwilling to fully compensate for the additional education. Consequently, overeducated individuals must bear a certain degree of “income loss”. This loss is further exacerbated by employer learning [
55] and the slower accumulation of skills in lower-requirement job positions compared to peers with the same education level [
56]. Overeducation thus has a long-term negative impact on their future labor market performance, making it difficult to completely escape “overeducation”. However, because overeducated individuals possess higher productivity compared to other groups in the same positions, it is beneficial to society.
Hypothesis 2b. Overeducation does not lead to any loss of productivity.
In this context, surplus years of education have a significantly positive impact on productivity, comparable to the productivity effects of the required years of education. At this point, overeducated individuals have the same productivity level as their peers with the same education level. Although employers fully compensate for their higher productivity, they only partially compensate for the signaling value of their education. Specifically, the signaling value of excessive education years in the job market is weaker than the signaling value of the education years required for the job. The “income penalty” observed in early research is entirely due to the difference in educational signaling value between overeducated individuals and those with matched education. Once this hypothesis is supported, overeducation is more likely caused by temporary factors such as having less experience, tenure, and on-the-job training [
37] rather than due to a lack of ability or an imbalance in the supply and demand of the labor market. This is a short-term phenomenon in the labor market, which can be resolved through internal promotions and job mobility. This hypothesis supports the occupational mobility hypothesis proposed by Sicherman (1993) [
31]. Individuals may suffer some loss in signaling value, but there is no adverse impact on companies or society.
Hypothesis 3. Surplus years of education only have productivity effects instead of signaling effects.
In this context, the excessive years of education do not serve their expected signaling function in the labor market; instead, their productive effects play a decisive role. Similarly, to further evaluate the productive effects of overeducation, Hypothesis 3 can be further divided into:
Hypothesis 3a. Overeducation leads to a certain degree of productivity loss.
At this point, the impact of excessive years of education on individual income is lower than the impact of the years of education required for the job, indicating that overeducated individuals indeed suffer a certain “income penalty”, and this loss is entirely determined by productive effects. This hypothesis aligns with most previous studies [
4,
14,
27,
34,
35,
36,
37,
38,
39]. Although these studies overlooked the signaling effect of education, their reliability remains unaffected. Similar to Hypothesis 2a, the actual productivity level of overeducated individuals falls between their peers with the same education level and colleagues in the same job position. This is detrimental to an individual’s long-term career development but helps enhance overall societal productivity levels.
Unlike Hypothesis 2a, in Hypothesis 3a, the causes of overeducation are more attributable to the education system rather than the labor market. Particularly in the context of China’s rapid educational expansion, overeducation results from both diploma devaluation and the decline in educational quality [
57]. The substantial increase in the number of workers with the same diplomas diminishes their signaling effect, and the inadequacy of quality assurance in education leads to a mismatch between the abilities of some educated individuals and their education level, therefore triggering overeducation.
Hypothesis 3b. Overeducation does not lead to any loss of productivity.
At this point, the impact of excessive years of education on individual income is comparable to that of the years of education required for the job, meaning that overeducation does not necessarily entail an income loss. This contradicts the mainstream conclusion that overeducated individuals suffer an income penalty but aligns with some research findings. For instance, Bauer found that, after controlling for individual heterogeneity, the income difference between overeducated and properly educated individuals reduces or even disappears [
58]. Chevalier discovered that the income loss for genuinely overeducated individuals is only 22.7% to 42.3% of that for apparently overeducated individuals [
54]. Levels et al. also confirmed that the so-called income loss mainly stems from skill heterogeneity [
59].
If this hypothesis is supported, overeducation is no longer caused by problems or deficiencies in the education system. This conclusion is drawn because overeducated individuals possess productivity that matches their education level, indicating that the expansion of education has not necessarily led to a corresponding decline in educational quality. The oversupply of knowledge workers reduces the scarcity of a given diploma, making it difficult for its holders to obtain job returns that match their qualifications [
60]. However, the positive trend of industrial upgrading demands an increase in productivity for jobs that originally did not match, resulting in “apparent overeducation” for some groups. In other words, Hypothesis 3b suggests that educational expansion can improve the quality of the labor force, enabling it to better adapt to the trend of economic and industrial upgrading, or that overeducation represents a positive coupling between economic and industrial upgrading and educational expansion [
61].
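The decision logic implied by Hypotheses 1 to 3b can be summarized in a short sketch. It is one possible reading of the hypotheses above rather than the authors' estimation procedure: the function name and its four boolean inputs are hypothetical, and each input is assumed to come from a separate significance test on the sub-sample estimates.

```python
def classify_hypothesis(surplus_sig_self_employed: bool,
                        surplus_sig_employees: bool,
                        employees_exceed_self_employed: bool,
                        surplus_below_required: bool) -> str:
    """Map a pattern of test outcomes to the hypothesis it would support.

    Assumed inputs (hypothetical names, not from the paper):
      surplus_sig_self_employed      - surplus years raise income among the
                                       self-employed (productivity effect)
      surplus_sig_employees          - surplus years raise income among employees
      employees_exceed_self_employed - the return to surplus years is larger for
                                       employees than for the self-employed
                                       (signaling effect)
      surplus_below_required         - the productive return to surplus years is
                                       below the return to required years
                                       (productivity loss)
    """
    if not surplus_sig_self_employed:
        # No productivity effect: only signaling can explain an employee return.
        return "H1" if surplus_sig_employees else "none"
    family = "H2" if employees_exceed_self_employed else "H3"
    suffix = "a" if surplus_below_required else "b"
    return family + suffix


# Productive and signaling effects both present, no productivity loss.
print(classify_hypothesis(True, True, True, False))  # H2b
```

The sketch treats the self-employed return as the pure productivity benchmark, which rests on the assumption stated earlier that productivity effects are homogeneous between employees and the self-employed.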
4. Research Design
4.1. Measurement of Overeducation
Overeducation describes the extent to which one possesses a certain level of education more than that required for one’s job [
62]. According to the definition, the required years of education for a job is the key to identifying overeducation. Empirical studies typically employ three main methods to define the required years of education. First, the workers’ self-report (SR) method estimates the required education level by directly or indirectly asking respondents about the educational level needed to perform or obtain their job or whether they are over-, under-, or appropriately educated for their job or the skills required [
34,
63,
64]. Second, the job analysis (JA) method estimates the required educational level for a specific job based on skill or education requirements defined by job analysts, such as those found in the United States Dictionary of Occupational Titles (DOT) and the Standard Occupational Classification in the UK [
35]. Third, the realized matches (RM) method estimates the required education by analyzing the distribution of educational attainments within each occupation. This method includes two approaches: one calculates the mean and standard deviation of educational attainments to determine the required years of education for each occupation, categorizing individuals as undereducated or overeducated if their educational attainment deviates by more than one standard deviation from the mean [
36]; the other uses the mode of educational attainments as the benchmark for the required education for each occupation [
65].
The aforementioned measurement methods each have distinct advantages and disadvantages, although Hartog (2000) concluded that the estimation results do not vary significantly across these measurement approaches. The SR method, for instance, is limited to the survey sample, making it difficult to generalize to the broader population [
66]. Additionally, an individual’s response to the required educational level for a job may be influenced by their own educational expectations and the educational distribution among both new and existing staff in the same department [
64]. The JA method, while more standardized, is costly to implement. As occupational structures evolve rapidly, the need for frequent updates to job analyses increases correspondingly. Moreover, the complexity of jobs across different countries adds to the difficulty of performing comprehensive job analyses [
63]. The RM method directly reflects the supply–demand dynamics of the labor market, aligning with the theoretical foundations of resource allocation [
25]. However, it overlooks the fact that equilibrium may be influenced by endogenous factors within the labor market, potentially leading to over- or underestimation of overeducation.
Therefore, the selection of a measurement method should be guided by the specific objectives of the research and the quality of the available data. In this study, we employ the mode method to measure overeducation. Given that this paper analyzes the current state of the labor market, the RM method aligns well with our research objectives. The use of the mean and standard deviation as the required education levels is more susceptible to market distribution changes and, in terms of validity, is less stable than the modal value. Although the mode method may be influenced by market changes—such as underestimating the rate of overeducation during periods of educational expansion—this study is a cross-sectional analysis rather than a time-series analysis, allowing us to disregard this impact. Additionally, in the robustness test section, we utilize other measurements of overeducation to ensure the reliability of our findings.
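As an illustration of the mode variant of the RM method, the following minimal sketch derives the required years of education for each occupation and splits each worker's schooling into required, surplus, and deficit years. The function names, the toy records, and the tie-breaking rule (taking the smallest modal value) are assumptions made for this example and are not drawn from the CGSS data or from the authors' code.

```python
from collections import Counter, defaultdict


def required_years_by_mode(records):
    """Return {occupation: modal years of schooling} from (occupation, years) pairs."""
    years_by_occupation = defaultdict(list)
    for occupation, years in records:
        years_by_occupation[occupation].append(years)

    required = {}
    for occupation, years_list in years_by_occupation.items():
        counts = Counter(years_list)
        top = max(counts.values())
        # Assumed tie-break: if several values share the top frequency,
        # use the smallest one as the required level.
        required[occupation] = min(y for y, c in counts.items() if c == top)
    return required


def decompose(actual_years, required_years):
    """Split schooling into (required, surplus, deficit) years."""
    surplus = max(0, actual_years - required_years)
    deficit = max(0, required_years - actual_years)
    return required_years, surplus, deficit


# Toy data, purely illustrative.
sample = [
    ("clerk", 12), ("clerk", 12), ("clerk", 16),
    ("engineer", 16), ("engineer", 16), ("engineer", 19),
]
required = required_years_by_mode(sample)
print(required)                          # {'clerk': 12, 'engineer': 16}
print(decompose(16, required["clerk"]))  # (12, 4, 0)
```

In this toy example, a clerk with 16 years of schooling is classified as overeducated by four years, since the modal level among clerks is 12 years.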
4.2. Data and Variables
The data utilized in this study were derived from the 2015 China General Social Survey (CGSS), which is part of the international General Social Survey (GSS) family. The General Social Survey (GSS) project began in the United States in 1972, and to date, over 40 countries and regions have their own GSS projects. Examples include the International Social Survey Program (ISSP), the European Social Survey (ESS), and the China General Social Survey (CGSS). These surveys are among the most widely used datasets in the social sciences. The CGSS began in 2003 and, by 2018, had completed 12 annual surveys. It is jointly conducted by the Sociology Department of Renmin University of China and the Survey Research Center of the Hong Kong University of Science and Technology. For CGSS2015, a multistage, stratified, and PPS (Probability Proportional to Size) random sampling method was employed nationwide. A total of 100 county-level units plus five major metropolitan areas, 480 villages/neighborhood committees, and 12,000 individuals were selected for the survey. A household face-to-face interview approach was used to complete 10,968 valid questionnaires. Detailed information on the sampling design, reliability, validity, implementers, and processes can be found in the attached Link 11.
The CGSS 2015 dataset includes comprehensive information on the education, employment, and familial contexts of the respondents. For this study, overeducation is considered an employment phenomenon among the labor force population entering the job market, thus defining the research sample. First, according to Chinese law, individuals below 16 and above 60 years of age, who are considered a non-labor force population, are excluded.
Second, groups such as farmers, part-time workers, and students are not strictly considered to be engaged in “employment behaviors”, so only employees, the self-employed, and employers are retained. Employers and the self-employed are combined into the self-employed cohort. This is because the terms “self-employed” and “employers” are simply different expressions used in the CGSS survey to denote self-employment. “Employer” tends to refer to more formal, larger-scale economic activities, such as being a business owner or individual business operator, while “self-employed” tends to refer to an independently employed individual. In different contexts in China, these terms can be used interchangeably [
67]. Similar classifications can be found in other large-scale social surveys, such as the China Social Comprehensive Survey (CSS), which classifies non-agricultural employment into categories such as employees or wage earners, employers/owners (i.e., owners/investors/partners of enterprises), self-employed workers (individual business operators and freelancers without employees), and family helpers (working for their family/household business but not as the boss).
Third, considering the adverse effects of extreme income values in survey responses, a conventional deletion method is used [
68]. The lower limit is set at the minimum monthly wage standard for each municipality, province, or autonomous region in 2015 (12,000 RMB), and the upper limit is set at the income discontinuity point (1.2 million RMB).
The final study sample comprised 2417 valid cases (see
Table 1), accounting for 22.04% of the total CGSS sample, while the employment population in the 2018 China Economic Census (364 million) accounted for 25.82% of China’s total population (1.41 billion) [
69], a relatively small difference. Additionally, the proportion of self-employed individuals in the final sample is approximately 35.75%, comparable to the 32.33% proportion of individual and private economic employment nationwide at the end of 2015 [
70].
It is worth noting that the first and second steps above merely define the research subjects and do not disrupt the structure of the CGSS sample. The third step accounts for the heterogeneity of China’s large population and yields a more representative sample of the general population by excluding 498 individuals with extreme income values. Among these, 494 individuals had income below the lower limit, including 125 individuals with a reported income of zero. A total of 415 individuals reported working as “service workers and shop and market sales workers” or in “elementary occupations”, which fall under skill levels 1 and 2 in the ISCO-88 classification. In China, these individuals are more likely to be in informal employment or to conceal sensitive income information, which makes them fundamentally different from the research subjects. Including them could jeopardize the robustness of the conclusions. Additionally, the four individuals with income above the upper limit had an income more than 24 times the per capita GDP of China in 2015 (499,000 RMB), and their group characteristics and data scale do not contribute to forming conclusions of general significance. Overall, the sample in this study is highly representative.
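The three sample restrictions described above can be sketched as a simple filter. This is only an illustration: the field names (`age`, `status`, `income`), the status labels, and the treatment of boundary values as inclusive are assumptions, not the actual CGSS variable names or the authors' code.

```python
# Income bounds in RMB per year, as stated in the text.
LOWER_INCOME = 12_000
UPPER_INCOME = 1_200_000

# Employers are merged into the self-employed cohort; other groups are dropped.
STATUS_MAP = {
    "employee": "employee",
    "self-employed": "self-employed",
    "employer": "self-employed",
}


def restrict_sample(rows):
    """Apply the age, employment-status, and income restrictions in turn."""
    kept = []
    for row in rows:
        if not 16 <= row["age"] <= 60:          # step 1: labor-force ages only
            continue
        status = STATUS_MAP.get(row["status"])  # step 2: keep employment groups
        if status is None:
            continue
        if not LOWER_INCOME <= row["income"] <= UPPER_INCOME:  # step 3: trim income
            continue
        kept.append({**row, "status": status})
    return kept


# Hypothetical respondents for demonstration.
rows = [
    {"age": 30, "status": "employer", "income": 50_000},
    {"age": 15, "status": "employee", "income": 20_000},
    {"age": 40, "status": "farmer", "income": 30_000},
    {"age": 35, "status": "employee", "income": 0},
    {"age": 60, "status": "employee", "income": 12_000},
]
kept = restrict_sample(rows)
print(len(kept), [r["status"] for r in kept])
```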
The employed and the self-employed differ in the nature of their income: the former earn wage income, while the latter earn operating income. However, the survey data do not include detailed income information, and very few participants report having both operating income and wages, so income is measured by total annual income in this study. Furthermore, Wooldridge pointed out that applying a logarithmic transformation to the dependent variable can effectively alleviate issues such as potential heteroscedasticity and skewed data distribution, therefore improving the model’s fit and explanatory power [
71]. Moreover, with a logarithmic dependent variable, each coefficient can be read as the approximate proportional change in income associated with a one-unit change in the independent variable. This makes the coefficients more intuitive and easier to understand. Therefore, the logarithm of total annual income is used as the dependent variable. The core independent variables are years of education and employment status. Years of education comprises the required years of education, the surplus years of education, and the inadequate years of education; employment status distinguishes self-employment from being employed.
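As a small worked illustration of how a coefficient from a log-income regression is read, the sketch below converts a coefficient on a regressor measured in levels (such as years of education) into an exact percentage change in income. The function name and the example coefficient of 0.05 are hypothetical and are not estimates from this study.

```python
import math


def pct_change_from_log_coef(b):
    """Exact percentage change in income for a one-unit increase in a regressor
    when the dependent variable is log income and the coefficient is b."""
    return (math.exp(b) - 1.0) * 100.0


# A hypothetical coefficient of 0.05 on years of education.
approx = 0.05 * 100.0                   # common approximation: about 5 percent
exact = pct_change_from_log_coef(0.05)  # slightly larger than the approximation
print(round(approx, 3), round(exact, 3))
```

For small coefficients the approximation and the exact value are close, which is why coefficients in such models are usually read directly as percentage returns.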
4.3. Methodology
The ORU model used in this study is based on the classic Mincer earnings function and decomposes the actual years of education into the required years of education, the surplus years of education, and the inadequate years of education [
34,
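35]. As shown in Equation (1):

$$\ln Y_i = \alpha + \beta_r S_i^{r} + \beta_o S_i^{o} + \beta_u S_i^{u} + \gamma X_i + \varepsilon_i \qquad (1)$$

In the above formula, $\ln Y_i$ is the logarithm of annual total income; $X_i$ is all other explanatory variables; $\varepsilon_i$ is the random error term; $S_i^{r}$ is the required years of education; $S_i^{o}$ is the surplus years of education and $S_i^{u}$ is the inadequate years of education. The relationship between these three variables and the actual schooling years $S_i^{a}$ is: $S_i^{a} = S_i^{r} + S_i^{o} - S_i^{u}$. Accordingly, the coefficients $\beta_r$, $\beta_o$, and $\beta_u$ represent the marginal change in income when the required years of education, the surplus years of education, and the inadequate years of education increase by one year, respectively.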
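The ORU specification can be illustrated with a minimal, self-contained sketch that decomposes schooling and then recovers the three returns by ordinary least squares. Everything here is an assumption for demonstration: the toy workers, the "true" coefficients (9.0, 0.08, 0.04, -0.03), and the hand-written solver are hypothetical and noise-free, and no control variables are included. In practice a statistics package would be used instead of solving the normal equations by hand.

```python
def decompose(actual, required):
    """Split actual schooling into (required, surplus, inadequate) years."""
    return required, max(actual - required, 0), max(required - actual, 0)


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical (actual, required) years of schooling for eight workers.
workers = [(12, 12), (16, 12), (9, 12), (16, 16),
           (19, 16), (12, 16), (12, 9), (9, 9)]
# Hypothetical coefficients: intercept, required, surplus, inadequate.
TRUE = [9.0, 0.08, 0.04, -0.03]

X, y = [], []
for actual, required in workers:
    s_req, s_over, s_under = decompose(actual, required)
    row = [1.0, s_req, s_over, s_under]
    X.append(row)
    # Noise-free log income generated from the assumed coefficients.
    y.append(sum(c * v for c, v in zip(TRUE, row)))

coefs = ols(X, y)
print([round(c, 4) for c in coefs])
```

Because the toy data contain no noise, the estimated coefficients coincide with the assumed ones up to floating-point error; with survey data they would be estimates with sampling uncertainty.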
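To distinguish the signaling effects and productivity effects of overeducation, Kedir’s model adds the employee dummy variable and its interaction terms to the ORU model. The detailed model is shown below:

$$\ln Y_i = \alpha + \beta_r S_i^{r} + \beta_o S_i^{o} + \beta_u S_i^{u} + \delta E_i + \theta_r (S_i^{r} \times E_i) + \theta_o (S_i^{o} \times E_i) + \theta_u (S_i^{u} \times E_i) + \gamma X_i + \varepsilon_i \qquad (2)$$

In Formula (2), $E_i$ is the dummy variable of the employee; $\theta_r$, $\theta_o$, and $\theta_u$ are the coefficients of the corresponding interaction terms, indicating the signaling effects of the required years of education, the surplus years of education, and the inadequate years of education, respectively. $\beta_r$, $\beta_o$, and $\beta_u$ indicate the productivity effects of the corresponding years of education, respectively.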
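The logic of the interaction model can be sketched as follows: for the self-employed (employee dummy equal to 0) the return to a type of schooling is its productivity effect alone, while for employees (dummy equal to 1) it is the productivity effect plus the signaling effect. The function names and the column ordering are assumptions for illustration. The numeric inputs are the figures for the UK and Cyprus attributed to Kedir et al. in the Discussion (surplus years: 5.6% productivity, 1.8% signaling; required years: 5.4% productivity, 3.8% signaling), not estimates from this study.

```python
def design_row(s_req, s_over, s_under, employee):
    """Regressors for one worker: intercept, three schooling terms,
    the employee dummy, and the three schooling-by-employee interactions."""
    e = 1 if employee else 0
    return [1.0, s_req, s_over, s_under, e, s_req * e, s_over * e, s_under * e]


def split_effects(productivity, signaling):
    """Read one schooling type's coefficients as productivity, signaling,
    and the total return earned by an employee (their sum)."""
    return {
        "productivity": productivity,               # return for the self-employed
        "signaling": signaling,                     # extra return for employees
        "employee_return": productivity + signaling,
    }


# Figures reported for the UK and Cyprus in the Discussion, in log points.
surplus = split_effects(productivity=0.056, signaling=0.018)
required = split_effects(productivity=0.054, signaling=0.038)

employee_row = design_row(12, 4, 0, employee=True)
self_employed_row = design_row(12, 4, 0, employee=False)
print(surplus, required)
```

A zero or insignificant interaction coefficient would mean that employees earn no premium over the self-employed for that type of schooling, which is how the absence of a signaling effect shows up in this specification.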
In cross-sectional survey data, employees and self-employed individuals may not be randomly distributed, making it difficult to ensure that confounding variables are balanced across the two groups. This lack of randomization can introduce bias in classical OLS estimations. In this study, we control for this bias using the Propensity Score Matching (PSM) method. It is essential to note that the PSM method must adhere to the Common Support Assumption (CSA) and the Conditional Independence Assumption (CIA). The CSA requires that both the treated group and the control group have a positive probability for each value of the confounding variables, meaning that the propensity scores must share overlapping regions. The larger the overlap, the higher the preservation rate of the sample and the better the matching quality. The CIA stipulates that the treatment variable and the outcome variable must be independent once the confounding variables are controlled for.
The specific steps are as follows: first, the confounding variables are selected. Rubin & Thomas (1996) suggest that “only variables that are determined to be uncorrelated with both the outcome variable and the intervention variable should be excluded”. To satisfy this assumption, it is necessary to avoid endogeneity in the treatment variable by including sufficient confounding variables [
72]. The confounding variables should be related to both employment status and individual income, and they should be determined before employment status (i.e., be pre-treatment variables). This study selected personal characteristics, human capital characteristics, family social background characteristics, and regional economic characteristics as confounding variables [
25,
73]. The distribution characteristics of the variables are presented in
Table 1; second, employee status is selected as the treatment variable, and the Logit model is utilized to estimate the probability of receiving treatment for each individual in the sample [
74]; third, kernel matching, local linear matching, caliper matching, and nearest neighbor matching are used to match the control group and the treated group, with the matching effect tested statistically; fourth, the matching quality is evaluated [
74].
Although the CIA cannot be tested directly, its plausibility can be assessed through balance tests, which examine whether the differences in the confounding variables between the control and treated groups remain significant after matching. Finally, robust OLS estimation is conducted after balancing the control group and the treated group. By eliminating the distribution differences of confounding variables between the groups, the matched sample enhances the comparability between self-employed and employed individuals, better meeting the OLS assumptions and improving the reliability of the estimation results.
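The matching steps described above can be sketched in a few functions: a Logit propensity model, the common support region, one-to-one nearest neighbor matching within a caliper, and a standardized bias measure for the balance test. This is a minimal sketch under stated assumptions, not the estimation routine used in this study: the gradient-ascent fitting, the caliper of 0.05, matching with replacement, and all toy numbers are illustrative choices, and kernel and local linear matching are not shown.

```python
import math
import statistics


def propensity(w, x):
    """Logit probability of treatment for covariate vector x (x[0] = 1 is the intercept)."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))


def fit_logit(X, t, lr=0.1, steps=2000):
    """Fit the Logit model by batch gradient ascent on the log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            err = ti - propensity(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def common_support(ps_treated, ps_control):
    """Overlap region of the two propensity score distributions."""
    return (max(min(ps_treated), min(ps_control)),
            min(max(ps_treated), max(ps_control)))


def nearest_neighbor_match(ps_treated, ps_control, caliper=0.05):
    """Match each treated unit to its closest control (with replacement),
    discarding pairs whose score distance exceeds the caliper."""
    pairs = []
    for i, p in enumerate(ps_treated):
        j = min(range(len(ps_control)), key=lambda k: abs(ps_control[k] - p))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
    return pairs


def standardized_bias(x_treated, x_control):
    """100 * difference in means / square root of the average sample variance."""
    pooled = math.sqrt((statistics.variance(x_treated)
                        + statistics.variance(x_control)) / 2.0)
    return 100.0 * (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled


# Hypothetical data: intercept plus one covariate, and a treatment indicator.
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 1], [1, 2]]
t = [0, 0, 1, 1, 1, 0]
w = fit_logit(X, t)
scores = [propensity(w, x) for x in X]

# Hypothetical propensity scores used to demonstrate matching.
ps_treated = [0.30, 0.56, 0.90]
ps_control = [0.28, 0.50, 0.60]
support = common_support(ps_treated, ps_control)
pairs = nearest_neighbor_match(ps_treated, ps_control)
print(support, pairs)
```

In the toy example the treated unit with a score of 0.90 has no control within the caliper and is dropped, which illustrates how a narrow overlap reduces the preserved sample. A standardized bias close to zero after matching indicates that a confounding variable is balanced across the two groups.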
6. Discussion
The above results differ from Kedir et al.’s empirical findings for the UK and Cyprus [
43], who proposed the SOH (Strong Overeducation Hypothesis: years of overeducation do not increase workers’ productivity) and the WOH (Weak Overeducation Hypothesis: years of overeducation have a positive but smaller impact on workers’ productivity than years of required education). Kedir et al. found that both the surplus and required years of education show significant positive signaling effects, 1.8 and 3.8 percent, respectively; the actual productivity of surplus years of education (5.6%) is even slightly higher than that of the years of required education (5.4%). Therefore, overeducation in the UK and Cyprus does not bring about a personal productivity loss, and the productivity discrepancy between the overeducated and the matched observed in early studies is mainly a result of the difference in their signaling values. This rejects both the SOH and the WOH, but Kedir et al. did not provide sufficient explanations.
Kedir’s findings are consistent with Hypothesis 2b of this study. The evidence suggests that individuals lacking work experience may choose to pursue additional education to obtain the necessary experience for better-matched employment [
31]. This is supported by Vecchi et al., who found that despite an increase in the number of graduates entering the UK job market since the 1990s, the incidence of mismatches has not increased over time, remaining around 31% [
75]. A significant proportion of graduates have not developed the necessary skills suitable for finding employment in graduate-level jobs. However, they also found a negative correlation between tenure and the probability of mismatch, indicating that overeducation is mitigated as work experience increases.
The phenomenon of overeducation in China has not produced the expected signaling effect, which is a thought-provoking finding but aligns with the current context in China. Empirical studies have indicated that the rapid expansion of education and the significant increase in graduates holding diplomas have diminished the signaling value and quality of education [
76]. The civil service examination system, whose influence lasted for nearly 1300 years, produced a highly diploma-oriented society [
77]. After the Reform and Opening-up policy, when millions achieved social mobility through education [
78], the public’s reverence for educational credentials reached a peak. However, with the rising unemployment rate among university graduates, the devaluation of diplomas has led to widespread skepticism about the utility of education, frequently reported in various media, and even to instances of students opting out of the national college entrance examination [
79].
As previously mentioned, the aggressive educational expansion policies and the inadequate education quality assurance system are significant reasons for the decline or disappearance of the signaling effect of diplomas. From associate and bachelor’s degrees to master’s and doctoral levels, the expansion of higher education, while alleviating some employment pressure through further studies, has led to a significant increase in the number of graduates holding the same diplomas. It is particularly noteworthy that the expansion of higher education has had a minimal impact on key universities, with the admission rate of China’s 985 universities remaining around 1.5% from 2007 to 2016 [
80,
81]. The impact of increased enrollment is more pronounced in non-key universities, which often lack sufficient funding and thus cannot avoid the risk of declining education quality due to increased scale. For instance, because their training costs are relatively lower than those of science, engineering, and medical programs, humanities and social sciences programs absorbed much of the expansion and have been the hardest hit by it. Additionally, both the state and society paid little attention to employment rates. It was not until recent years that the Chinese Ministry of Education made graduate employment rates a crucial metric for evaluating schools, disciplines, and teaching and used it as a basis for funding allocation. This change has driven many universities to adjust their programs. For example, in 2022, a total of 1641 programs were newly added nationwide, while 925 programs were discontinued [
82]. Among the discontinued programs, humanities and social sciences were the most affected. Between 2020 and 2022, 75 universities canceled their Public Administration programs in an effort to align proactively with market demands and improve overall employment rates.
Evidence of the failure of educational signaling also comes from the labor market. In the early stages of educational expansion, Chinese companies did not rely on professional assessments in recruitment. While this was partly due to the slow development of third-party evaluations, it was more because the number of university graduates was relatively small, and the hierarchical mechanism based on the Chinese college entrance examination was a more effective and economical screening method. However, with the continued expansion and the application of digital technology, achieving efficient screening through scientific evaluation methods and rational selection has become a key element of campus recruitment planning that is aligned with corporate talent strategies [
83]. Under this trend, the information asymmetry between employers and employees has been significantly reduced. Overeducated individuals often involuntarily reveal lower capabilities, therefore legitimizing the assessment and recruitment processes [
54].
On the other hand, the common phenomena of “difficulty in recruiting” and “difficulty in finding employment” in the labor market reflect a severe mismatch between the supply of education and market demand in China [
10], directly leading to a discrepancy between the skills graduates have acquired and the actual requirements of their jobs. This makes it challenging for educational signals to translate into employment opportunities [
84]. Lastly, the inadequacy of labor market institutions is even more critical. In China, social relationships and networks play a significant role in obtaining employment opportunities [
85]. Even if an individual has a high level of education, the signaling effect of education may be limited if they lack relevant social background and networks. Additionally, the uneven economic development levels across different regions in China, along with the severe segmentation among regions, sectors, and industries [
86], exacerbate the incompatibility between educational attainment and economic development. This leads to an imbalance in skill structures between supply and demand, causing the effect of educational signals to vary across different regions [
23,
76].
This study stripped away the potential impact of educational signaling and verified the conclusion of earlier research that “overeducation leads to a loss in individual productivity”. The phenomenon of overeducation in China manifests dual productivity effects: a loss of individual productivity and a gain in social productivity. Due to the prevalent trend of overeducation, individuals with higher educational qualifications are increasingly employed in lower-tier positions, therefore augmenting the overall human capital stock within these roles. This, in turn, facilitates the expansion of knowledge and technology, aids in the gradual upgrading of industrial structures, and promotes social innovation. However, this societal advancement occurs at the expense of individual sacrifice. Empirical studies have indicated that overeducation can reduce cognitive and non-cognitive abilities [
56], signifying a long-term loss of individual productivity. Additionally, for individuals with lower capacities, the risk associated with educational investment increases significantly due to the diminished signaling value of education and the uncertainties in securing well-matched jobs. This exacerbates economic and non-economic losses for these individuals. Consequently, individuals must make rational decisions regarding their educational investments. From a societal perspective, however, the vigorous development of education remains the optimal strategy in the long run. This provides micro-level evidence for the rapid economic development in China following the reform and opening-up. To some extent, overeducation has served as a “talent reservoir” for industrial upgrading. This dual approach ensures that while individuals manage their personal educational investments prudently, society continues to benefit from an increasingly educated populace, driving forward innovation and economic development.
7. Implications and Conclusions
This study investigates the signaling and productivity effects of overeducation in China, examining its impact on social productivity. The findings challenge the Weak Screening Hypothesis (WSH) [
45], demonstrating that enhancing educational signals through overinvestment in education is not an effective strategy for securing employment within the Chinese context. Overeducation does not result from the signaling function of education but arises due to graduates’ low abilities. This indicates that overeducation in China is more apparent than genuine and may be considered a long-term phenomenon within the broader trajectory of career development. The implications of this phenomenon are significant, leading to a series of chain reactions. The presence of overeducation and the disappearance of educational signaling may intensify competition in the job market, resulting in greater employment pressures and challenges for graduates. Educated individuals from lower-income families may find it difficult to achieve social mobility through education, which could exacerbate societal inequality in the long term [
13]. However, the dual productivity effects of overeducation show that it is not always disadvantageous. The level of overeducation in a society needs to be maintained at an appropriate level, striking a balance between individual losses and societal benefits. Empirical studies suggest that the proportion of overeducation should be kept between 9% and 26% [
25].
This study provides insights into the following aspects. Policymakers should promote the balanced allocation of educational resources. While expanding access to education, they should also enhance the assessment and monitoring of educational quality, promote vocational education and technical training, and ensure that the education system can adapt to changes in economic development and social needs. University administrators should strengthen communication and cooperation with industries and enterprises, understand market demands and industry trends, encourage schools to carry out internships and practical activities, and adjust educational content and curricula to align education more closely with actual employment needs.
A sound education evaluation system should be established, leveraging digital technology to innovate educational models, providing students with more flexible learning options to meet the needs of different students, and guiding their comprehensive development. Individuals should advocate for diversified career choices, actively participate in vocational skills training and practical experiences, reduce the excessive pursuit of traditional higher education, and choose suitable educational paths based on their interests and abilities.
Enterprise managers should actively cooperate with educational institutions, participate in the reform of the education system and curriculum design, and provide guidance and support for the training of future employees. They should focus on candidates’ actual performance and achievements, weaken the emphasis on academic background in recruitment and promotion, clearly define the actual skills and abilities required for various positions, and actively conduct internal training to allow employees to gain practical experience and enhance their abilities on the job. They should also provide opportunities and resources for employees to continue their education, helping them to improve their skills and knowledge in their spare time to meet the needs of career development.
It is essential to acknowledge the limitations inherent in this study. Despite efforts to correct for partial selection bias, the endogeneity of the model remains unresolved. Instrumental variable estimation of the ORU model would require robust instruments for actual years of education, the three decomposed years of education, and employment type, which restricts the applicability of instrumental variables in this research. Furthermore, the model implicitly assumes that both signaling and productivity effects are homogeneous between the self-employed and the employed. This assumption rests on evidence that, although decisions regarding employment type are frequently made after graduation, individuals may be uncertain about pursuing self-employment during the early stages of their education [
87]. For the self-employed, customers often assess the quality of their goods or services based on their educational qualifications. Consequently, in the context of future planning and risk mitigation, individuals tend to invest in education even when their current qualifications are sufficient for the job. This is supported by research indicating no significant correlation between the average years of education of the self-employed and employed in China [
88]. However, in reality, these effects may exhibit heterogeneity across different groups. This potential heterogeneity represents an important area for future research. Finally, as mentioned earlier, despite the massive expansion of higher education, admission opportunities at China’s elite universities remain scarce, and the educational signals from these institutions are significantly different from those of other universities. Research on graduates from these elite universities may yield new findings. Therefore, the neglect of educational quality might affect the reliability of the results of this study. However, in 2016, the enrollment scale of China’s 985 universities accounted for about 4% of the total higher education enrollment scale [
81], indicating that the impact of educational quality remains within an acceptable range.
Addressing these limitations in subsequent studies will provide a more nuanced understanding of the dynamics of overeducation and its impacts. Future research should focus on identifying more effective instrumental variables and examining the differential effects of signaling and productivity across regions, employment sectors (such as the private sector), and demographic groups (such as graduates from universities of different tiers). This will enhance the robustness and generalizability of the findings, contributing to a deeper understanding of overeducation and its broader implications in the labor market.