Identifying which households have consumption potential is crucial, and we do so by exploiting the inconsistency between income and consumption classes. In this paper, consumption potential is assessed by comparing each household's income class with its consumption class. Households whose income class is equal to or higher than their consumption class are recognized as having considerable consumption potential. Conversely, households whose income class is below their consumption class may be prone to overconsumption, indicating an increased risk of financial instability or vulnerability. Our identification assumption is that once we control for the income class at the year and household level, real consumption based on the consumption class is comparable. The expectation-maximization algorithm for mixture distribution models, applied to data with a sizable panel component, enables the estimation of sophisticated income and consumption processes and establishes who constitutes the poor, the working, the middle and the upper classes [24]. Specifically, households that have a lower income class than their consumption class are categorized as overconsumption, while those whose income and consumption fall within the same class are categorized as equal consumption. Households whose income class is higher than their consumption class are classified as underconsumption.
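The three-way categorization described above can be sketched in a few lines of Python. This is only an illustration: the class labels, their ordering from poor to upper, the short status keys and the function names are assumptions introduced here, not part of the original analysis.

```python
# Assumed labels, ordered from the lowest to the highest class.
CLASS_ORDER = ["poor", "working", "middle", "upper"]


def consumption_status(income_class: str, consumption_class: str) -> str:
    """Compare a household's income class with its consumption class."""
    income_rank = CLASS_ORDER.index(income_class)
    consumption_rank = CLASS_ORDER.index(consumption_class)
    if income_rank < consumption_rank:
        return "over"    # consumes above its income class
    if income_rank > consumption_rank:
        return "under"   # consumes below its income class
    return "equal"       # income and consumption classes coincide


def has_consumption_potential(income_class: str, consumption_class: str) -> bool:
    """Income class equal to or above the consumption class."""
    return consumption_status(income_class, consumption_class) != "over"
```

For example, a household that is middle class by income and working class by consumption would be labeled `"under"` and counted as having consumption potential.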
2.1. The Data
The analysis of class membership is based on the household income and consumption dataset sourced from the China Family Panel Studies (CFPS). The CFPS is a nationally representative biennial survey of China’s communities, families, and individuals; we use the waves conducted between 2010 and 2018. The CFPS, which covers a vast array of domains for families and individuals across 162 counties in China, is carried out by the Institute of Social Science Survey at Peking University. The survey collects information on various aspects of the participants’ lives, including their economic activities, education outcomes, family dynamics and relationships, and health status. Household income is measured as household disposable income from all sources, defined as the total personal income of all members of the family. Household consumption covers eight types of nonproductive expenditure identified through the questionnaire: (1) daily food, (2) clothing, (3) housing, (4) household appliances and daily used commodities and necessities, (5) transportation and communication, (6) entertainment and education, (7) medical care, and (8) other expenditures. In terms of consumption structure, this variable can be divided into survival consumption, which includes daily necessities, clothing, water and electricity, food, and local transportation and communication expenses, and non-survival consumption, which includes all other expenditures. In this study, all nominal values, including household consumption, income, and total assets, were deflated using the national Consumer Price Index (CPI) with 2010 as the base year to obtain real terms. Given the micro-level nature of the survey data, we prioritized data reliability. Specifically, any observation with missing values in the variables of interest was removed from the analysis.
To mitigate the influence of extreme outliers, we applied 1% winsorization to household consumption, income, and total assets, capping the lower and upper 1% of the distribution. These preprocessing steps ensure the robustness of our empirical analyses and reduce potential biases arising from missing or extreme values.
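The two preprocessing steps described above, CPI deflation and 1% winsorization, can be illustrated with a short sketch. The function names are hypothetical, and the quantile rule (linear interpolation between order statistics) is an assumption, since the text does not state which rule was used.

```python
def deflate(nominal: float, cpi: float, base_cpi: float) -> float:
    """Express a nominal value in base-year prices."""
    return nominal * base_cpi / cpi


def quantile(ordered: list, q: float) -> float:
    """Quantile of an already sorted list, using linear interpolation (assumed rule)."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def winsorize(values: list, share: float = 0.01) -> list:
    """Cap values below the `share` quantile and above the `1 - share` quantile."""
    ordered = sorted(values)
    low = quantile(ordered, share)
    high = quantile(ordered, 1.0 - share)
    return [min(max(v, low), high) for v in values]
```

In this sketch, observations are capped rather than dropped, so the sample size is unchanged by winsorization; only the most extreme 1% in each tail is pulled back to the cutoff.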
2.2. Estimation of Mixture Distributions via the EM Algorithm
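Assume that there is a finite number, $K$, of societal classes whose behavior is governed by their unique circumstances. The path of the outcome vector $y$, which describes their behavior, follows a log-normal distribution for each of these classes. With $K$ such groups in a society, the overall distribution $f(y; \Theta)$ will be a mixture of these distributions as follows:
$$f(y; \Theta) = \sum_{k=1}^{K} \pi_k f_k(y; \theta_k),$$
where $\pi_k$ is the weight of the $k$-th distribution in the total distribution, also known as the probability that the sample data belong to the $k$-th distribution, $\theta_k = (\mu_k, \sigma_k)$ is the parameter of the $k$-th subdistribution, and $\pi_k \geq 0$, $\sum_{k=1}^{K} \pi_k = 1$.
Given the income or consumption variable $y_i$, it is not known with certainty which subdistribution it belongs to. Consider a latent variable $z_i \in \{1, \dots, K\}$, where $z_i = k$ represents that $y_i$ comes from the $k$-th distribution.
The log-likelihood of the parameter $\Theta = \{\pi_k, \theta_k\}_{k=1}^{K}$ for $n$ observations under the log-normal mixture is as follows:
$$\ell(\Theta) = \sum_{i=1}^{n} \log \left( \sum_{k=1}^{K} \pi_k f_k(y_i; \theta_k) \right).$$
The expectation-maximization (EM) algorithm is an iterative method for maximum likelihood (ML) estimation in problems with latent variables, and it makes the estimation of mixture model parameters considerably easier.
First, we use the k-means method to obtain the initial parameter values for the EM algorithm. Then, the EM algorithm proceeds by iterating through multiple steps, each of which is divided into two fundamental stages: the E-step and the M-step. In the generic iteration $t$, the E-step calculates the expected log-likelihood of the joint distribution of the data and the latent variable $z$, given the parameter $\Theta^{(t)}$ of the previous iteration:
$$Q(\Theta \mid \Theta^{(t)}) = \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik}^{(t)} \left[ \log \pi_k + \log f_k(y_i; \theta_k) \right], \qquad \gamma_{ik}^{(t)} = \frac{\pi_k^{(t)} f_k(y_i; \theta_k^{(t)})}{\sum_{j=1}^{K} \pi_j^{(t)} f_j(y_i; \theta_j^{(t)})},$$
where $\gamma_{ik}^{(t)}$ is the posterior probability that $z_i = k$ under $\Theta^{(t)}$.
In the M-step, we solve for the parameter that maximizes $Q(\Theta \mid \Theta^{(t)})$ and complete the parameter update $\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})$. The parameters are updated through iteration for $t = 0, 1, 2, \dots$ until the change between successive iterations falls below a preset convergence threshold.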
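The procedure described above (k-means initialization followed by alternating E- and M-steps) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits a normal mixture to the logarithm of the data, which is equivalent to a log-normal mixture in levels; the k-means starting centers (evenly spaced quantiles), the stopping rule (change in log-likelihood below `tol`), the floor on the standard deviation, and all function names are choices made here for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kmeans_1d(x, k, iters=50):
    """Lloyd's algorithm in one dimension, started from evenly spaced quantiles."""
    xs = sorted(x)
    centers = [xs[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in x:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return groups


def fit_lognormal_mixture(y, k, tol=1e-6, max_iter=500):
    """EM for a K-component log-normal mixture; returns (pi, mu, sigma, loglik)."""
    x = [math.log(v) for v in y]  # log-normal in y is normal in ln(y)
    n = len(x)
    groups = kmeans_1d(x, k)
    if any(not g for g in groups):
        raise ValueError("empty k-means cluster; try a smaller k")
    # Initial values taken from the k-means partition.
    pi = [len(g) / n for g in groups]
    mu = [sum(g) / len(g) for g in groups]
    sigma = [max(math.sqrt(sum((v - m) ** 2 for v in g) / len(g)), 1e-3)
             for g, m in zip(groups, mu)]
    prev_ll, ll = -math.inf, -math.inf
    for _ in range(max_iter):
        # E-step: posterior class probabilities and log-likelihood of ln(y).
        gamma, ll = [], 0.0
        for v in x:
            w = [pi[j] * normal_pdf(v, mu[j], sigma[j]) for j in range(k)]
            total = sum(w)
            gamma.append([wj / total for wj in w])
            ll += math.log(total)
        if abs(ll - prev_ll) < tol:  # assumed convergence rule
            break
        prev_ll = ll
        # M-step: weighted updates of the weights, means and standard deviations.
        for j in range(k):
            nj = sum(g[j] for g in gamma)
            pi[j] = nj / n
            mu[j] = sum(g[j] * v for g, v in zip(gamma, x)) / nj
            var = sum(g[j] * (v - mu[j]) ** 2 for g, v in zip(gamma, x)) / nj
            sigma[j] = max(math.sqrt(var), 1e-3)
    return pi, mu, sigma, ll
```

The returned log-likelihood refers to the logged data; it differs from the log-likelihood in levels only by a term that does not depend on the parameters, so the maximizer is the same.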
2.3. Potential of Household Consumption by Interclass Comparison
Table 1 provides the parameters of the model fits, categorized by years, for each class in income and consumption, including the estimated mean $\mu$ and standard deviation $\sigma$ of each normal distribution along with its corresponding mixing proportion $\pi$. Based on the results obtained from k-means, a four-component mixture appears to be the best and most efficient model. Economically, these four latent classes map naturally onto widely used socio-economic categories (poor, working class, middle class, and upper class) that are well grounded in the development literature [25,26]. To further demonstrate the robustness of our class specification, we provide additional evidence in Appendix A Figure A1, which compares the fitted distributions under alternative specifications with three classes ($K = 3$) and four classes ($K = 4$). The results clearly show that the four-class model provides a superior fit to the data and more accurately reflects meaningful socio-economic heterogeneity. In contrast, a five-class specification ($K = 5$) generates an additional subgroup that is statistically identifiable but economically ambiguous, for example by splitting the upper class into two subcategories that lack clear policy relevance. Because this would weaken the interpretability and policy implications of our results, we do not pursue $K = 5$ further in our analysis.
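One common way to complement a visual comparison of alternative class counts is an information criterion. The sketch below computes the Bayesian information criterion (BIC) for a univariate mixture. The paper does not state that BIC was used; it is shown here only as an assumed, illustrative check, and the function name is hypothetical.

```python
import math


def mixture_bic(log_likelihood: float, n_obs: int, k: int) -> float:
    """BIC of a univariate K-component mixture (lower values are preferred)."""
    # Free parameters: K means, K standard deviations, K - 1 mixing weights.
    n_params = 3 * k - 1
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

Under this criterion, adding a fifth class would be favored only if the gain in log-likelihood outweighed the penalty for its three extra parameters; economic interpretability, as discussed above, is a separate consideration.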
In each class, income grows much more rapidly than consumption. Consumption habits are formed over time, so changes in household consumption are relatively small compared with changes in income. It is worth noting that, according to Table 1, the poor consume more than their income. The poor often struggle to make ends meet in their daily lives. Overall, the gap between income and consumption among the poor is narrowing year by year. Household consumption is influenced by many factors, of which income is only one. Grouping households by income alone therefore does not capture consumption habits or increases in consumption potential. For this reason, research on household consumption needs to compare income and consumption classes and examine where they are inconsistent.
According to Table 1, the estimated means serve as class thresholds, and the resulting groups can be interpreted as the “poor”, “working class”, “middle class” and “upper class” income and consumption groups. Figure 1 shows the potential of household consumption determined by interclass comparison, that is, by comparing the income and consumption classes estimated by the EM algorithm.
Table 2 presents the distribution of household consumption potential across different income classes, categorizing households based on their consumption behavior. The rows represent different income classes, while the columns categorize households according to their consumption patterns: underconsumption, equal-consumption, and overconsumption. Each row shows the percentage of households within each income class that fall into one of the three consumption categories. The Class column shows the total proportion of each income class in the sample. The last row (Potential) shows the overall distribution of household consumption potential across all income classes in the sample, indicating the proportion of households exhibiting underconsumption, equal-consumption, and overconsumption on a national scale.
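The layout of such a table (row percentages by income class, a class-share column, and an overall row) can be reproduced from household-level labels as sketched below. The input format, the short status keys and the function name are assumptions made for illustration; no figures from Table 2 are used.

```python
from collections import Counter

STATUSES = ["under", "equal", "over"]


def potential_table(households):
    """Build row percentages from (income_class, status) pairs.

    Returns (rows, overall): rows[c] holds the within-class percentage of each
    status plus the class's share of the sample; overall holds sample-wide
    percentages of each status.
    """
    households = list(households)
    n = len(households)
    by_class = Counter(c for c, _ in households)
    by_cell = Counter(households)
    by_status = Counter(s for _, s in households)
    rows = {}
    for c, size in by_class.items():
        rows[c] = {s: 100.0 * by_cell[(c, s)] / size for s in STATUSES}
        rows[c]["class"] = 100.0 * size / n
    overall = {s: 100.0 * by_status[s] / n for s in STATUSES}
    return rows, overall
```

Because each row is normalized by its own class size, the rows sum to 100% individually, while the overall row is a weighted average of the rows with the class shares as weights.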
Unlike approaches based on the marginal propensity to consume (MPC), which aim to raise the consumption of low-income populations, it seems more appropriate to focus on household consumption potential as revealed by the inconsistency between income and consumption classes. First, among the income classes, poor and working-class households show a distinctive tendency to consume above their income class, which we call overconsumption; the within-class shares of such households are 46.5% and 22.4%, respectively. For the poor, insufficient income is a primary driver of poverty and leaves many households unable to meet the expenses necessary to maintain a basic standard of living. These households often resort to borrowing or to excessive spending. To address this issue and increase household consumption, it is essential to raise their income, which remains a significant challenge. Some households in the working and middle classes also persist in excessive consumption habits. In addition, the income and consumption class definitions agree (equal-consumption) for only 37.1% of the overall population, suggesting that a large portion of households are not consistently categorized by income and consumption. Taking the middle class as an example, 28.9% of households are middle class by income. Of these, 36.7% are also middle class by consumption (approximately 10.6% of the overall population), and 55.2% are working class or poor by consumption (approximately 15.9% of the overall population). The proportion of underconsuming households rises with income class, which suggests that maintaining savings remains an inherent preference for many households in China. Even when household income increases, these households tend to allocate the additional funds to savings rather than consumption.
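The population shares quoted for the middle class follow from multiplying the share of households that are middle class by income by the within-class shares. As a quick check, reading 28.9% as the share of all households that are middle class by income:

$$0.289 \times 0.367 \approx 0.106, \qquad 0.289 \times 0.552 \approx 0.160.$$

The first product matches the reported 10.6%, and the second matches the reported 15.9% up to rounding of the inputs.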
An interesting policy issue is how many households are able to increase their consumption while remaining within the same income class. A total of 86.9% of the examined households have room to increase consumption (their income class is equal to or above their consumption class), which is 48.1 percentage points more than the share covered by the lower income classes (poor and working class) that an MPC-based view would target. As a result, the potential of household consumption (PHC) offers a useful perspective for stimulating consumption in China, because households with such potential are less subject to budget constraints.
PHC in China is considerable: households with consumption potential account for 86.9% of the overall population, and in every income class more than half of the households have consumption potential. In addition, 49.8% of households exhibit inadequate levels of consumption (underconsumption). Both the high-income and middle-income groups continue to exhibit relatively low consumption habits and should be encouraged to consume more, with underconsuming households comprising 76.7% and 55.2% of their class, respectively. Together, these facts suggest that many households earning upper-class incomes still consume like middle-class households, and many households earning middle- or working-class incomes still consume like working- or poor-class households.
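The headline share can be decomposed using the figures reported in this section. Assuming that households with consumption potential are those with equal-consumption or underconsumption, and that the three categories are exhaustive:

$$37.1\% + 49.8\% = 86.9\%, \qquad 100\% - 86.9\% = 13.1\%.$$

Under this reading, the remaining 13.1% of households are those classified as overconsumption.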