2.1. Empirical Method
Personal medical expenditure is a continuous variable on the interval [0, +∞). Its distribution has at least three characteristics that make it unsuitable for ordinary regression analysis. First, about 20% of the population has no medical care expenditure in a given year; second, the remaining 80% of the population has positive medical expenditures, and these expenditures are highly skewed; third, about 10% of the population has hospitalization expenditures, and the right tail of the distribution of medical expenditures is not normally distributed [26]. In brief, the highly skewed distribution of medical expenditures, with a large number of non-spenders, makes reliable estimation and prediction difficult [26]. Traditional linear regression methods are therefore not suitable for analyzing residents’ medical service utilization.
Duan et al. [26] reviewed alternative models for cross-sectional medical cost data. Specifically, they discussed three models often used in health economics: (1) a one-part model, which uses a two-parameter Box–Cox transformation of expenditures; (2) a two-part model with a pair of equations: a probit model for the probability of a positive cost and a linear model for the level of the (log-transformed) positive cost; and (3) a four-part model, with four separate equations, that further distinguishes inpatient from outpatient costs. The four-part model describes: (a) the probability of a positive expense; (b) the probability of a positive inpatient expense conditional on a positive expenditure; (c) the level of expense for those with only outpatient medical service; and (d) the level of expense for those hospitalized. They applied these models to a large portion of the RAND Health Insurance Experiment (RHIE) data and found that both the two-part model and the four-part model performed better than the one-part model in terms of consistency and forecast accuracy. They expected the four-part model to outperform the two-part model if more data were available, since the two-part model has been shown to lead to inconsistent predictions [27].
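The four parts (a) to (d) combine into a single prediction by the law of total expectation. The sketch below shows only that arithmetic. The function name and the numbers in the usage line are hypothetical, and the two conditional means are assumed to be already expressed on the original (not log) expenditure scale.

```python
def expected_expenditure(p_any, p_inpatient_given_any,
                         mean_outpatient_only, mean_hospitalized):
    """Combine four-part components into an unconditional expected expenditure.

    p_any                 -- part (a): P(expense > 0)
    p_inpatient_given_any -- part (b): P(inpatient expense > 0 | expense > 0)
    mean_outpatient_only  -- part (c): mean expense of outpatient-only users
    mean_hospitalized     -- part (d): mean expense of hospitalized users
    """
    for p in (p_any, p_inpatient_given_any):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Mean expense among users: a mixture of the two user groups.
    mean_given_any = ((1.0 - p_inpatient_given_any) * mean_outpatient_only
                      + p_inpatient_given_any * mean_hospitalized)
    # Non-users contribute zero, so scale by the probability of any use.
    return p_any * mean_given_any


# Hypothetical inputs: 80% spend something and 10% of everyone is hospitalized,
# so P(inpatient | any) = 0.10 / 0.80 = 0.125. The two means are made up.
example = expected_expenditure(0.80, 0.125, 100.0, 2000.0)
```

Keeping parts (c) and (d) separate lets the hospitalized group, which drives the right tail, have its own expenditure level instead of being averaged with outpatient-only users.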
In this paper, the four-part model is adopted to investigate the influence of the NRCMS on the medical service utilization and medical expenditure of rural elderly chronic disease patients in China. It divides the research sample into three groups: a zero medical expenditure group (with no medical expenditure in a given period of time), an outpatient group, and a hospitalization group. Non-zero medical expenditure is log-transformed to reduce the skewness of its distribution and bring it closer to normality, which leads to more accurate estimation [26]. The specifics are as follows:
Model 1: Probability model of the medical treatment choice of chronic disease patients, namely the ‘whether to seek medical treatment’ model. This model describes the probability that an observed individual seeks medical treatment in a given period. It distinguishes those who see a doctor from those who do not, which is the basis for investigating the medical service utilization and cost burden of the observed samples. In this model, the probability that medical treatment (outpatient or hospitalization) occurs in a given period is given by Formula (1).
In Formula (1), the dependent variable equals 1 if there is medical service utilization and medical expenditure, and 0 otherwise. This part takes all the samples, and the samples participating in the NRCMS, as the research objects and investigates the factors that influence the treatment-seeking decision of elderly chronic disease patients. Because the outcome is either ‘choose to see a doctor’ or ‘not choose to see a doctor’, coded as a ‘0–1’ variable, the binary probit regression model is adopted.
Model 2: Probability model of hospitalization choice. For the sample seeking medical treatment, whether they choose outpatient care or hospitalization is also a ‘0–1’ variable, and the model is given by Formula (2).
As in Model 1, the dependent variable in Formula (2) is binary, so the binary probit regression model is used again.
Model 3: The medical expenditure model for the outpatient sample is given by Formula (3).
Model 4: The medical expenditure model for the hospitalization sample is given by Formula (4).
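One way Formulas (1) to (4) might be written, following the standard four-part structure described for Models 1 to 4, is sketched here. All symbols are assumptions introduced for illustration and are not taken from the source: $D_{1i}$ indicates whether individual $i$ seeks any treatment, $D_{2i}$ indicates hospitalization among those treated, $Y_i$ is medical expenditure, $X_i$ collects the explanatory variables, $\Phi$ is the standard normal distribution function, and $\varepsilon_{3i}$, $\varepsilon_{4i}$ are error terms.

```latex
\begin{align}
\Pr(D_{1i} = 1 \mid X_i) &= \Phi(X_i \beta_1) \tag{1} \\
\Pr(D_{2i} = 1 \mid D_{1i} = 1,\, X_i) &= \Phi(X_i \beta_2) \tag{2} \\
\ln Y_i &= X_i \beta_3 + \varepsilon_{3i}
  \quad \text{if } D_{1i} = 1,\ D_{2i} = 0 \tag{3} \\
\ln Y_i &= X_i \beta_4 + \varepsilon_{4i}
  \quad \text{if } D_{1i} = 1,\ D_{2i} = 1 \tag{4}
\end{align}
```

Under this assumed notation, (1) and (2) are the two probit equations, and (3) and (4) are the two log-expenditure equations estimated by least squares on the outpatient and hospitalization subsamples.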
Because the dependent variables in Models 3 and 4 are continuous, the least squares method is used for the regression. In Formulas (1)–(4), the decision variable denotes the individual’s decision on medical behavior, including whether to seek medical treatment and whether to choose hospitalization or outpatient care; the expenditure variable refers to the individual’s medical expenditure burden in medical treatment; the explanatory variables cover the NRCMS, number of visits, travel distance, and the socio-demographic and economic characteristics of the individual and the family; and the remaining term is the error term.
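The three-way sample split and the log transformation that feed Models 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the expenditure values are hypothetical, and anyone with a positive inpatient cost is assigned to the hospitalization group even if outpatient costs are also present.

```python
import math


def assign_group(outpatient_cost, inpatient_cost):
    """Assign one observation to the zero, outpatient or hospitalization group."""
    if inpatient_cost > 0:
        return "hospitalization"   # hospitalized, with or without outpatient use
    if outpatient_cost > 0:
        return "outpatient"        # outpatient service only
    return "zero"                  # no medical expenditure in the period


def skewness(values):
    """Moment-based sample skewness (third moment over variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positive expenditures with one very large hospital bill.
raw = [10, 20, 30, 50, 80, 150, 400, 5000]
# The log is defined only for positive values, hence the split comes first.
logged = [math.log(v) for v in raw]
```

Comparing `skewness(raw)` with `skewness(logged)` on data like this shows how the log transformation pulls in the right tail before least squares is applied.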
Because the NRCMS emphasizes voluntary participation, the participation of the rural elderly is not an exogenous random decision; it is influenced by economic conditions, health and other factors, and is therefore endogenous. A common solution is the instrumental variable method. A commonly used way to select an instrument is to adopt a macro-level variable, such as the provincial NRCMS participation rate published annually by the Ministry of Human Resources and Social Security of China. This rate, calculated by dividing the number of participants by the number of household registrations, is related to the individual participation decision but not directly related to individual decisions on medical service utilization and expenditure. Therefore, this paper employs the provincial participation rate as an instrumental variable.
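As a small illustration of how such an instrument could be built and attached to individual records, consider the sketch below. The province labels, counts, and field names are hypothetical and do not come from CHARLS or any official publication.

```python
def participation_rate(participants, registered):
    """Provincial rate: participants divided by household registrations."""
    if registered <= 0:
        raise ValueError("registered population must be positive")
    return participants / registered


# Hypothetical province-level counts: (participants, household registrations).
province_counts = {"A": (900, 1000), "B": (450, 600)}
rates = {prov: participation_rate(*counts)
         for prov, counts in province_counts.items()}

# Every respondent inherits the rate of his or her province as the instrument.
respondents = [{"id": 1, "province": "A"}, {"id": 2, "province": "B"}]
for person in respondents:
    person["iv_rate"] = rates[person["province"]]
```

Because the rate varies only across provinces, the instrument takes the same value for all respondents in a province.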
In addition, the instrumental variable method can be implemented by two-stage residual inclusion (2SRI). In health economics research, dependent variables are often limited dependent variables, count variables or variables with skewed distributions. As a consistent and effective estimation method for addressing endogeneity in non-linear models, 2SRI is widely used in health economics research [28,29,30,31]. Furthermore, to ensure the robustness and consistency of the results, bootstrapping with 500 replications is used to estimate the regression parameters.
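The mechanics of 2SRI with a bootstrap can be sketched as follows. This is a simplified illustration, not the estimation used in the paper: both stages are linear here (in which case 2SRI coincides with two-stage least squares), whereas Models 1 and 2 would use a probit second stage. The helper names, the simulated data, and the single covariate are all assumptions.

```python
import random
import statistics


def ols(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def two_sri(rows):
    """rows: (instrument z, participation d, covariate x, outcome y) tuples.

    Returns the second-stage coefficient on d.
    """
    # Stage 1: regress the endogenous variable on the instrument and covariate.
    X1 = [[1.0, z, x] for z, d, x, y in rows]
    d_vals = [d for z, d, x, y in rows]
    gamma = ols(X1, d_vals)
    resid = [d - sum(g * v for g, v in zip(gamma, r))
             for d, r in zip(d_vals, X1)]
    # Stage 2: include the first-stage residual as an extra regressor.
    X2 = [[1.0, d, x, e] for (z, d, x, y), e in zip(rows, resid)]
    return ols(X2, [y for z, d, x, y in rows])[1]


def bootstrap_se(rows, n_boot=500, seed=0):
    """Bootstrap standard error: resample rows and rerun both stages."""
    rng = random.Random(seed)
    n = len(rows)
    draws = [two_sri([rows[rng.randrange(n)] for _ in range(n)])
             for _ in range(n_boot)]
    return statistics.stdev(draws)


def simulate(n=1000, effect=1.0, seed=1):
    """Hypothetical data in which an unobserved factor u drives both d and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, u, x = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        d = 1.0 if 2 * z + u > 0 else 0.0
        y = 1.0 + effect * d + 0.5 * x + u + rng.gauss(0, 0.5)
        rows.append((z, d, x, y))
    return rows
```

Calling `two_sri(simulate())` and `bootstrap_se(simulate())` gives a point estimate and a bootstrap standard error. Both stages are rerun inside every bootstrap replication so that the sampling error of the first stage is carried into the second.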
2.2. Data Sources and Main Variables
The primary database used in this work is drawn from the China Health and Retirement Longitudinal Study (CHARLS). CHARLS is a nationally representative longitudinal survey of persons in China 45 years of age or older and their spouses, including assessments of the social, economic, and health circumstances of community residents [32]. All data are made public one year after the end of data collection. CHARLS adopts multi-stage stratified probability-proportional-to-size (PPS) sampling. As an innovation of CHARLS, a software package (CHARLS-GIS, CHARLS, Peking University, Beijing, China) was created to make village sampling frames. The baseline national wave of CHARLS was fielded in 2011 and includes about 10,000 households and 17,500 individuals in 150 counties/districts and 450 villages/resident committees from 28 provinces. The CHARLS respondents are followed up every two years using a face-to-face computer-assisted personal interview. More details are offered by Zhao et al. [32].
This study uses the 2011 and 2013 waves of CHARLS, which surveyed people over 45 years old and their spouses in 28 provinces of China and yielded final samples of 17,708 and 18,605, respectively. Because this study focuses on the medical service utilization and medical expenditure of the rural elderly, we restrict our attention to the subsample of the elderly in China and further limit the sample to respondents aged 60 or above. After eliminating observations with missing key variables and the urban samples as classified by the National Bureau of Statistics, 2418 samples are used, of which 1411 and 1007 are from 2011 and 2013, respectively. The main variables involved in this study are as follows:
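The sample restrictions described above amount to three filters, sketched here. The record layout and field names (`age`, `urban`, `expenditure`) are hypothetical and are not actual CHARLS variable names.

```python
def restrict_sample(records, key_vars):
    """Keep rural respondents aged 60 or above with no missing key variables."""
    kept = []
    for rec in records:
        age = rec.get("age")
        if age is None or age < 60:
            continue                      # outside the elderly subsample
        if rec.get("urban") is not False:
            continue                      # urban, or urban/rural status missing
        if any(rec.get(var) is None for var in key_vars):
            continue                      # a key variable is missing
        kept.append(rec)
    return kept


# Hypothetical records; a zero expenditure is a valid value, not a missing one.
records = [
    {"age": 65, "urban": False, "expenditure": 10.0},
    {"age": 58, "urban": False, "expenditure": 5.0},
    {"age": 70, "urban": True, "expenditure": 0.0},
    {"age": 72, "urban": False, "expenditure": None},
    {"age": 80, "urban": False, "expenditure": 0.0},
]
sample = restrict_sample(records, ["expenditure"])
```

Zero expenditures are kept on purpose, since the zero medical expenditure group is part of the four-part model.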
2.2.1. Chronic Diseases
In the CHARLS questionnaire, the chronic disease information of main respondents is obtained by asking ‘Did a doctor ever tell you that you had the following chronic diseases, including hypertension, dyslipidemia (elevation of low density lipoprotein, triglycerides and total cholesterol, or a low high density lipoprotein level), diabetes or high blood sugar, cancer or malignant tumor (excluding minor skin cancers), chronic lung diseases (such as chronic bronchitis and emphysema, but excluding tumors or cancer), liver disease (except fatty liver, tumors and cancer), heart attack (including coronary heart disease, angina, congestive heart failure, or other heart problems), stroke, kidney disease (except for tumor or cancer), stomach or other digestive disease (except for tumor or cancer), emotional (nervous or psychiatric) problems, memory-related disease, arthritis or rheumatism, asthma?’ and ‘Did you know you have hypertension, chronic lung disease, emotional and mental problems?’ Three indicators are built from the answers. If the main respondent has been informed by a doctor, and knows, that he or she has any of these chronic diseases, the first indicator takes the value ‘1’, otherwise ‘0’. If the main respondent suffers from chronic lung disease, heart disease, stroke, or cancer or another malignant tumor, the second indicator takes the value ‘1’, otherwise ‘0’. Respondents with hypertension, diabetes or other chronic diseases are considered to suffer from mild chronic diseases, for which the third indicator takes the value ‘1’, otherwise ‘0’.
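The coding rules above can be sketched as three dummy variables. The disease labels, the indicator names, and the reading of ‘other chronic diseases’ as any reported condition outside the four listed serious ones are assumptions made for illustration.

```python
# Conditions the text groups together for the second indicator (labels assumed).
SEVERE = {"chronic_lung_disease", "heart_disease", "stroke", "cancer"}


def code_chronic(diagnosed):
    """Return the three 0/1 indicators for one respondent.

    diagnosed -- iterable of condition labels the respondent reports.
    """
    diagnosed = set(diagnosed)
    return {
        "any_chronic": int(bool(diagnosed)),              # any reported condition
        "severe_chronic": int(bool(diagnosed & SEVERE)),  # one of the four listed
        "mild_chronic": int(bool(diagnosed - SEVERE)),    # hypertension, diabetes, other
    }


# A respondent with both stroke and diabetes is flagged on all three indicators.
example = code_chronic({"stroke", "diabetes"})
```

Under this reading the second and third indicators are not mutually exclusive, so a respondent with conditions from both groups is coded ‘1’ on each.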
2.2.2. Medical Service Utilization
This index is divided into two parts: outpatient and hospitalization. For outpatient medical service utilization, the CHARLS questionnaire provides relevant data through questions such as ‘In the past month, did you go to a medical institution to see an outpatient clinic or receive on-site medical services (excluding physical examination)?’, ‘Have you ever been ill in the past month?’, ‘What’s the main reason for you not to see a doctor?’, ‘What medical institutions did you go to for outpatient treatment in the past month?’ and ‘How many times did you go to this medical institution in the past month?’ For hospitalization medical service utilization, questions such as ‘In the past year, did a doctor say that you should have been hospitalized?’, ‘What are the main reasons why you did not go to hospital?’, ‘Have you ever been hospitalized in the past year?’ and ‘How many hospitalizations have you received in the past year?’ are asked to understand the inpatient medical service utilization of respondents.
2.2.3. Medical Expenditure
As with medical service utilization, the medical expenditure of the sample is divided into outpatient and hospitalization expenditure, which is measured by asking ‘What was the total cost of visiting this medical institution in the past month? How much did you pay yourself?’ and ‘What is the total cost of hospitalization in the past year (excluding escort, family transportation and accommodation)? How much of it was self-paid?’
2.2.4. NRCMS
First, according to the two waves of CHARLS data, more than 91% of farmers participate in the NRCMS, while about 9% do not. Therefore, the following part investigates the differences in health status, utilization of medical services and expenditure between the participating and the non-participating elderly by distinguishing whether the farmers have participated in the NRCMS. Second, using CHARLS, this study also investigates how NRCMS policy affects the medical service utilization and expenditure of the elderly, especially chronic disease patients, along the dimensions of supplementary medical insurance (such as major illness medical treatment) and reimbursement methods.
2.2.5. Other Variables
Other control variables are also included, such as travel distance, regional dummy variables, per capita family income, gender, age, education level, marital status, whether the respondent lives with children, family size (excluding the main respondents and their spouses) and other socio-economic and demographic characteristics. Notably, travel distance refers to the distance between the participating farmers’ homes and the places where they seek medical treatment, which reflects not only the cost of seeking treatment but also the accessibility of medical service resources, the level of medical institutions and the service quality.