A Model for Risk Prediction of Cerebrovascular Disease Prevalence—Based on Community Residents Aged 40 and above in a City in China

Cerebrovascular disease (CVD) is the leading cause of death in many countries, including China. Early diagnosis and risk assessment are among the effective approaches to reducing CVD-related mortality. The purpose of this study was to understand the prevalence and influencing factors of cerebrovascular disease among community residents in Qingyunpu District, Nanchang City, Jiangxi Province, and to construct a cerebrovascular disease risk index model suitable for local community residents. A stratified cluster sampling method was used to sample 2147 community residents aged 40 and above, and the prevalence of cerebrovascular disease and possible risk factors were investigated. The prevalence of cerebrovascular disease among local residents was 4.5%. Poisson regression analysis found that old age, lack of exercise, hypertension, diabetes, smoking, and family history of cerebrovascular disease were the main risk factors for cerebrovascular disease in the local population, with relative risks (ORs) of 3.284, 2.306, 2.510, 3.194, 1.949, and 2.315, respectively. For these six risk factors, a cerebrovascular disease risk prediction model was established using the Harvard Cancer Index method. The optimal cut-off of the model's risk ratio R was 1.80 (sensitivity 81.8%, specificity 47.0%), at which the model was able to predict the risk of cerebrovascular disease among local residents. This provides a scientific basis for the further development of local cerebrovascular disease prevention and control work.


Introduction
Cerebrovascular disease mainly refers to abnormal manifestations of the nervous system caused by cerebral hemorrhage, infarction, or ischemia. It is characterized by sudden onset, high danger, difficulty of treatment, and irreversible sequelae, and it is the main disease endangering the health and quality of life of middle-aged and elderly people [1]. Reports show that cerebrovascular disease is the leading cause of death globally and also the main cause of disability in adults [2]. Because of its high morbidity, disability, and mortality, cerebrovascular disease has become one of the major global public health problems [3]. Numerous studies have demonstrated that high blood pressure [4-6], diabetes [4,7], dyslipidemia [8], smoking [9], and other unhealthy behaviors and lifestyles are recognized risk factors for cerebrovascular disease. Practice has shown that health education, interventions for high-risk behaviors, and active treatment of hypertension and other related diseases can effectively control the occurrence of cerebrovascular disease, reduce its complications, and improve quality of life [10]. However, studies have revealed low public awareness and a low control rate of risk factors for cerebrovascular disease [11], which seriously reduces acceptance of and cooperation with interventions and thereby weakens their effect. For this reason, researchers around the world have collected relevant risk factors in light of local conditions and used different mathematical models to predict the occurrence of and death from cerebrovascular disease [12-18]. Such efforts yield quantitative estimates, such as relative risk and the risk of disease or death, that provide important support for cerebrovascular disease interventions. Predicting the risk of cerebrovascular disease is therefore of great significance.
When predicting the occurrence and death risks of cerebrovascular disease, many scholars tailor their predictions to local conditions, mainly because the absolute risk of cerebrovascular disease varies from race to race [19]. In addition, risk assessment tools developed in one region may not be suitable for other regions [20]. This supports carrying out cerebrovascular disease risk prediction specifically for the Qingyunpu District of Nanchang City, Jiangxi Province. Some researchers have set up logistic regression [21] and Cox regression [22] predictive models. Because cerebrovascular disease has a low incidence and is a chronic non-communicable disease, its occurrence is well described by the Poisson distribution; this study therefore employed a Poisson regression model to explore the risk factors and relative risks of cerebrovascular disease. Some studies have suggested that the Harvard Cancer Index risk method can better predict the risk of chronic diseases and tumors [23,24]. In this study, community residents aged 40 and above in Qingyunpu District of Nanchang City were the research subjects, and the Poisson regression model was used to explore the risk factors and relative risks of cerebrovascular disease in local residents. The Harvard Cancer Index risk method was then used to convert the relative risks into risk scores to predict the risk of cerebrovascular disease of local residents. Establishing this model will help local primary health service agencies screen for groups at high risk of cerebrovascular disease in a way that fits actual local conditions. In addition, it will provide a scientific basis for establishing comprehensive prevention and treatment measures for cerebrovascular disease with local characteristics.

Participants
Nanchang City has six districts and three counties under its jurisdiction. Among them, Qingyunpu is a central urban district of Nanchang and an important industrial zone of the area. The former large state-owned enterprises in Nanchang City were all located in this district. As a result of enterprise restructuring, the residents of the area are mostly enterprise employees and their families. These characteristics of the community residents provide a specific basis for this research.
The participants in this study were community residents aged 40 years and above living in Qingyunpu District, Nanchang, Jiangxi Province, China. The inclusion criteria were: (1) having lived in the area for more than half a year at the time of the survey, (2) being aged 40 and above, and (3) being willing to participate in the study and sign the informed consent form. The exclusion criteria were: (1) not being a resident, and (2) having lived in the area for no more than half a year.

Sampling Methods
The stratified cluster sampling method was used in this study. One district in the central city of Nanchang, Jiangxi Province, was selected. Two towns were randomly selected from the six towns in the district, and one community was selected from each of the two selected towns. All residents of the sampled communities who met the inclusion criteria were recruited into the survey. A total of 2147 participants were enrolled. After data review and double-checking, nine cases with logical inconsistencies were removed. The final sample comprised 2138 cases, giving an overall valid response rate of 99.6%.

Survey Content and Method
The questionnaire used in this study was adapted from the template uniformly developed by the Jiangxi Provincial Center for Disease Control and Prevention. The questionnaire's content includes (1) demographics such as gender, age, marital status, occupation, and education level; (2) lifestyle such as exercise habits, smoking, and alcohol drinking; (3) family history of cerebrovascular disease; (4) past medical history such as cerebrovascular disease, hypertension, and diabetes; (5) physical examination such as height, weight, systolic blood pressure (SBP), diastolic blood pressure (DBP), and pulse; (6) laboratory tests, which include fasting blood glucose, glycosylated hemoglobin, blood cholesterol, triglycerides, high-density lipoprotein, and low-density lipoprotein. This household survey was conducted from April 2018 to December 2018. The investigation was organized by the Qingyunpu Center for Disease Control and Prevention and conducted by trained general practitioners and general nurses in a community health service center. Laboratory tests were carried out by the laboratory of the local community health service center.

Criteria
Hypertension was defined as follows: (1) Respondents who reported a previous diagnosis by a clinician and had taken antihypertensive drugs were counted as hypertensive patients regardless of their blood pressure at the on-site examination; for those with no history of taking antihypertensive drugs, the blood pressure measured in this survey was used as the basis for diagnosis. (2) For respondents who had not previously been diagnosed by a clinician, the blood pressure measured in this study was decisive. The diagnostic criteria were SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg [25]. Dyslipidemia means one or more of the following: total cholesterol ≥ 6.22 mmol/L (240 mg/dL), triglycerides ≥ 2.26 mmol/L (200 mg/dL), high-density lipoprotein < 1.04 mmol/L (40 mg/dL), or current use of lipid-lowering drugs [26]. Diabetes refers to fasting plasma glucose ≥ 7.0 mmol/L or 2-h postprandial plasma glucose ≥ 11.1 mmol/L, or current use of hypoglycemic agents or insulin [27]. Type of stroke was categorized into four groups: intracerebral hemorrhage, subarachnoid hemorrhage, ischemic stroke, and undetermined stroke [28]. Transient ischemic attack refers to a sudden, focal neurological deficit of vascular cause lasting <24 h [28].
In the present study, age was grouped as 40-59 and ≥60 years because the number of cases in China was relatively high in these two age groups [29]. Smokers were defined as those who had smoked continuously or cumulatively for ≥6 months in their lifetime. Alcohol drinking was defined as the consumption of alcohol every day for more than one year. The frequency of alcohol drinking was classified into three groups: never drink, drink occasionally, and drink frequently (liquor, ≥3 times per week and ≥100 mL each time). Physical exercise was categorized into two groups: sufficient physical exercise (exercising ≥3 times a week with ≥30 min of moderate-intensity exercise each time, or engaging in moderate or heavy physical labor) and lack of physical exercise. Obesity was defined as a body mass index ≥ 28 kg/m².
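The thresholds above can be written as simple classification rules. The following is a minimal sketch, not the study's actual data-processing code: the `Exam` record and all field and function names are hypothetical, and the "previously diagnosed and treated" flag is an assumed simplification of the self-reported history described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    """Hypothetical record of one participant's survey and examination data."""
    sbp: float                    # systolic blood pressure, mmHg
    dbp: float                    # diastolic blood pressure, mmHg
    diagnosed_and_treated: bool   # prior clinical diagnosis plus antihypertensive drugs
    total_cholesterol: float      # mmol/L
    triglycerides: float          # mmol/L
    hdl: float                    # high-density lipoprotein, mmol/L
    on_lipid_lowering: bool
    fasting_glucose: float        # fasting plasma glucose, mmol/L
    on_glucose_lowering: bool     # hypoglycemic agents or insulin
    height_m: float
    weight_kg: float
    postprandial_glucose: Optional[float] = None  # 2-h value, mmol/L, if measured


def has_hypertension(e: Exam) -> bool:
    # Treated patients count regardless of the measured value.
    return e.diagnosed_and_treated or e.sbp >= 140 or e.dbp >= 90


def has_dyslipidemia(e: Exam) -> bool:
    return (e.on_lipid_lowering or e.total_cholesterol >= 6.22
            or e.triglycerides >= 2.26 or e.hdl < 1.04)


def has_diabetes(e: Exam) -> bool:
    high_postprandial = (e.postprandial_glucose is not None
                         and e.postprandial_glucose >= 11.1)
    return e.on_glucose_lowering or e.fasting_glucose >= 7.0 or high_postprandial


def is_obese(e: Exam) -> bool:
    # Body mass index = weight (kg) / height (m) squared; threshold 28 kg/m^2.
    return e.weight_kg / e.height_m ** 2 >= 28
```

For instance, an untreated participant with a measured blood pressure of 150/85 mmHg would be classified as hypertensive under these rules because the SBP criterion alone is met.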

Grouping
The sample was randomly divided into a modelling group (n = 1069) and a validation group (n = 1069). Each participant was assigned a number by lottery, and allocation followed the parity of that number: participants with odd numbers were allocated to the modelling group (i.e., 1, 3, 5, 7, 9, . . . , 2137), and those with even numbers were allocated to the validation group (i.e., 2, 4, 6, 8, 10, . . . , 2138). The modelling group was used to establish the risk prediction model, while the validation group was used to validate it.
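The allocation rule can be illustrated with a short sketch. It assumes that the lottery amounts to a random ordering of the 2138 valid participants followed by consecutive numbering from 1; the function name and the fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random


def lottery_split(participants, seed=0):
    """Assign lottery numbers 1..N at random, then split by odd/even number."""
    rng = random.Random(seed)      # fixed seed only for reproducibility
    shuffled = list(participants)
    rng.shuffle(shuffled)          # random order stands in for the lottery draw
    modelling, validation = [], []
    for number, person in enumerate(shuffled, start=1):
        if number % 2 == 1:        # odd lottery number -> modelling group
            modelling.append(person)
        else:                      # even lottery number -> validation group
            validation.append(person)
    return modelling, validation


modelling, validation = lottery_split(range(2138))
print(len(modelling), len(validation))  # 1069 1069
```

With an even total of 2138 participants, the parity rule necessarily yields two groups of equal size.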

Quality Control
First, the questionnaire and survey plan were developed on the basis of the template formulated by the Provincial Center for Disease Control and Prevention, which is widely used and validated in Chinese settings. Second, in the sampling process, residents were sampled using a stratified cluster sampling method, and participants were randomly allocated to the modelling group and the validation group according to odd and even numbers. These steps were used to ensure the representativeness of the sample and to control bias from non-study factors between the groups. Third, all staff participating in the on-site investigation were medical staff from the local community health service center who had a medical background and received unified training. Fourth, during organization and implementation, the research team relied on the local health administrative department to promote the smooth progress of the project through administrative means. The research team also designated professionals to supervise and review the on-site investigation process. Fifth, double entry and logical review were both carried out at the data analysis stage.

Statistical Analysis
Count data were expressed as rates or composition ratios, and the χ² test was used for comparisons between groups. Trends were analysed with the trend χ² test. The Wilcoxon rank sum test was used to compare the risk levels of two groups. The area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, and other indicators were used to evaluate the predictive effect of the model. The significance level was set at two-tailed α = 0.05. The risk factor model was constructed using a Poisson regression model. The Poisson distribution is often used to describe the random distribution of the number of rare events or particles in a unit of time, area, or space. Since the prevalence of cerebrovascular disease in this study is relatively low and the prevalence data are discrete, the data meet the requirements of the Poisson regression model. Tables 1 and 2 show the conversion standards for risk scores and risk levels based on the Harvard Risk Index [23,24] and Colditz's standard [23,30]. The calculation of the individual risk level of cerebrovascular disease was conducted as follows:

Calculation of Individual Risk Level of Cerebrovascular Disease
(1) Conversion of ORs into risk scores. The odds ratio (OR) of each risk factor screened by the Poisson regression model was converted into a risk score according to the conversion standard shown in Table 1. Suppose there are n risk factors that affect cerebrovascular disease, recorded as $f_1, f_2, f_3, \ldots, f_n$. The Poisson regression model is used to calculate the relative risk (OR) of each risk factor, giving $OR_{f_1}, OR_{f_2}, OR_{f_3}, \ldots, OR_{f_n}$. According to the range in Table 1 into which each OR falls, the corresponding risk score (RC) is found, giving $RC_{f_1}, RC_{f_2}, RC_{f_3}, \ldots, RC_{f_n}$.

(2) Calculation of the average risk score of the population (AROP). For the factors $f_1, f_2, f_3, \ldots, f_n$ screened by the Poisson regression, the relevant literature was consulted to find the exposure rate (ER) of each factor in the population, recorded as $ER_{f_1}, ER_{f_2}, ER_{f_3}, \ldots, ER_{f_n}$. The AROP was then calculated from the risk scores and exposure rates of the influencing factors:

$$AROP = \sum_{i=1}^{n} RC_{f_i} \times ER_{f_i}$$

In our study, the following exposure rates were used:
(a) According to the China health statistics yearbook 2019 [31], the exposure rate (ER) of CVD patients aged 60 or above in China in 2018 was 0.17.
(b) According to the Chinese guidelines for the prevention and treatment of hypertension (revised edition 2018) [32], the ER of CVD patients who had a medical history of hypertension in China in 2017 was 0.28.
(c) According to the Guidelines for the Prevention and Treatment of Type 2 Diabetes in China (2017 edition) [33], the ER of CVD patients who had a medical history of type 2 diabetes in China in 2016 was 0.11.
(d) According to a Chinese report on stroke prevention and treatment 2017 [34], the ER of CVD patients with a medical history of stroke in China in 2016 was 0.28.

(3) Calculation of the risk score of the individual (RCI). Referring to the scoring standard in Table 1 and the questionnaire answered by the individual, the score of a risk factor was counted if the factor was present; otherwise, it was recorded as 0. The individual risk factor scores were $RCI_{f_1}, RCI_{f_2}, RCI_{f_3}, \ldots, RCI_{f_n}$, and their sum formed the total risk value of the individual:

$$RCI = \sum_{i=1}^{n} RCI_{f_i}$$

(4) Calculation of the individual risk of cerebrovascular disease R. The risk R is the ratio of the individual risk score to the average risk score of the population:

$$R = \frac{RCI}{AROP}$$

According to the conversion standard (Table 2), we divided the individual cerebrovascular disease risk R into different levels.

The Prevalence of Cerebrovascular Disease
This study showed a cerebrovascular disease prevalence of 4.5% among the participants. As shown in Table 3, the prevalence of the disease increased with age. The prevalence was significantly higher among males (6.3%), smokers (8.4%), participants who lacked exercise (8.0%), and those with hypertension (6.6%), diabetes (10.5%), or a family history of the disease (7.9%).

The Risk Factors of Cerebrovascular Disease
The Poisson regression analysis showed that age, exercise habits, hypertension, diabetes, smoking, and family history of cerebrovascular disease were related to the participants' cerebrovascular disease (Table 4).
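For reference, a Poisson regression with a log link is conventionally written as below; this is the standard textbook form rather than a specification taken from the study. The symbols $Y_i$ (disease status of participant $i$), $x_{ij}$ (indicator of exposure to factor $f_j$), $\beta_j$ (regression coefficients), and $\mathrm{SE}$ (standard error) are notation introduced here for illustration. The exponentiated coefficient is usually interpreted as a relative risk or prevalence ratio, and it corresponds to the quantity this paper reports as the OR of each factor.

```latex
\[
\log \mathrm{E}\left(Y_i \mid x_{i1}, \ldots, x_{in}\right)
  = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij},
\qquad
\mathrm{RR}_{f_j} = e^{\beta_j},
\qquad
95\%\ \mathrm{CI} = \exp\left(\beta_j \pm 1.96\,\mathrm{SE}(\beta_j)\right)
\]
```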

Development of the Risk Prediction Model
The Poisson regression results of this study (Table 4) were used to determine the OR of each risk factor, and the ORs were converted into risk scores according to Table 1. The risk scores of age and diabetes were both 25 points, and those of the other risk factors were all 10 points (Table 5). The average risk score of the population (AROP) used for the modelling group was determined to be 22.20 in this study.
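The scoring step can be sketched as follows. The scores (25 points for age and diabetes, 10 points for the other four factors) and the average population score of 22.20 are taken from the text; the dictionary keys, the function names, and the reading of the age factor as "aged 60 or above" are assumptions made for illustration.

```python
# Risk scores reported in the text (Table 5); key names are hypothetical.
RISK_SCORES = {
    "age_60_or_above": 25,
    "diabetes": 25,
    "lack_of_exercise": 10,
    "hypertension": 10,
    "smoking": 10,
    "family_history": 10,
}
AROP = 22.20  # average risk score of the population reported in the text


def individual_risk_score(exposures):
    """RCI: sum the scores of the risk factors present; absent factors add 0."""
    return sum(score for factor, score in RISK_SCORES.items()
               if exposures.get(factor, False))


def risk_ratio(exposures, arop=AROP):
    """R: individual risk score divided by the population average score."""
    return individual_risk_score(exposures) / arop


person = {"age_60_or_above": True, "hypertension": True, "smoking": True}
print(individual_risk_score(person))   # 45
print(round(risk_ratio(person), 2))    # 2.03
```

In this illustrative case, a resident aged 60 or above who smokes and has hypertension scores 45 points, which gives R ≈ 2.03.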

Validation of the Risk Prediction Model
As shown in Table 6, the prevalence of cerebrovascular disease increased with the risk level in the modelling group, the validation group for predicting risk level, and the validation group of actual risk level (χ² trend = 22.583, p < 0.001; χ² trend = 21.149, p < 0.001; χ² trend = 16.144, p < 0.001, respectively), indicating that the classification of risk levels is reasonable. The Wilcoxon rank sum test showed no significant difference in the distribution of risk levels between the modelling group and the validation group for predicting risk level (z = 0.895, p = 0.371). There was also no significant difference in the distribution of risk levels between the validation group for predicting risk level and the validation group of actual risk level (z = 1.124, p = 0.261). This shows that the effect of the risk prediction model is stable.

The ratio R was obtained by substituting the validation group's data into the developed risk prediction model. As shown in Table 7, for the total population an R value of 1.80 (sensitivity = 81.8%, specificity = 47.0%) was considered the optimal positive cut-off point for predicting the onset of cerebrovascular disease. The optimal positive cut-off points for males and females were 2.03 (sensitivity = 53.9%, specificity = 69.6%) and 1.58 (sensitivity = 72.2%, specificity = 56.5%), respectively. As shown in Figure 1a

Table 6. The risk levels and the prevalence of the two groups.

Table 7. Sensitivity, specificity, Youden index, positive predictive value (PPV), and negative predictive value (NPV) of the model cut-off points in identifying cerebrovascular disease in men and women in this study.
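The indicators listed in Table 7 follow from a 2×2 classification of predicted against actual disease status at a given cut-off of R. The sketch below shows the standard definitions and a cut-off search by the Youden index. It assumes that R at or above the cut-off counts as a positive prediction, that the optimal cut-off is the one maximizing the Youden index, and that both diseased and non-diseased participants are present; the function names and the toy data are hypothetical and are not study data.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(r_values, has_disease, cutoff):
    """Sensitivity, specificity, Youden index, PPV and NPV at one cut-off."""
    tp = fp = tn = fn = 0
    for r, diseased in zip(r_values, has_disease):
        predicted = r >= cutoff          # assumed rule: R >= cut-off is positive
        if predicted and diseased:
            tp += 1
        elif predicted and not diseased:
            fp += 1
        elif not predicted and diseased:
            fn += 1
        else:
            tn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def best_cutoff(r_values, has_disease):
    """Candidate cut-off (an observed R value) with the largest Youden index."""
    candidates = sorted(set(r_values))
    return max(candidates,
               key=lambda c: diagnostic_metrics(r_values, has_disease, c)["youden"])


# Toy data for illustration only.
r = [0.5, 1.0, 2.0, 2.5, 1.9, 0.9]
disease = [False, False, True, True, False, True]
print(diagnostic_metrics(r, disease, cutoff=1.8))
print(best_cutoff(r, disease))  # 2.0
```

Applied to the values reported for the total population, the Youden index at R = 1.80 is 0.818 + 0.470 − 1 = 0.288.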

Discussion
Previous studies have shown that hypertension, hyperlipidemia, lack of exercise, and diabetes are risk factors for cardiovascular disease [35,36]. Similar findings were found in this study showing that age, hypertension, diabetes, lack of exercise, smoking, and family history of cerebrovascular disease were risk factors for cerebrovascular disease. Thus, the timely detection of these risk factors and interventions is of great significance to preventing and controlling cerebrovascular diseases.
This study found that older age was associated with a higher prevalence of CVD. Roughead et al. [37] described a similar finding: the older the age, the higher the hospitalization rate. This may be related to the physiology of the elderly. The complexity and severity of disease increase with age, which leads to greater utilization of inpatient health services.
This study found that hypertension and diabetes were significant risk factors for cerebrovascular disease, consistent with previous studies [38-40]. Previous studies have demonstrated that about 54% of strokes and 47% of ischemic heart disease worldwide are attributable to hypertension [38,39]. The incidence and mortality of cerebrovascular disease in diabetic patients are significantly higher than those in non-diabetic patients [40]. Diabetes can thicken the basement membrane of small blood vessels and promote atherosclerosis of small, medium, and large blood vessels, leading to secondary hypertension and dyslipidemia, which are the pathological basis of ischemic stroke.
This study showed that smokers are 1.949 times more likely to suffer from cerebrovascular disease than non-smokers (95%CI: 1.015~3.534). Some reports have shown that both active and passive smoking can increase the occurrence of atherosclerosis [41]. Pirie et al. [42] reported that the risk of stroke decreases as the time since quitting smoking increases, showing that smoking cessation is essential to preventing cerebrovascular disease.
This study shows that people who lack physical exercise have a higher risk of cerebrovascular disease (95%CI: 1.258~4.068) than people who exercise regularly. Large cohort studies have confirmed that lack of exercise can increase cardiovascular morbidity and mortality as well as the risk of stroke [43,44]. This may be because physical activity can effectively regulate blood lipids and blood pressure, thereby reducing the risk of cerebrovascular disease. These results indicate that the community should actively encourage adults to exercise. This study also found that people with a family history of cerebrovascular disease had a higher risk than those with no family history of stroke (OR = 2.315; 95%CI: 1.167~4.593). Other studies have demonstrated that a family history of stroke is a significant risk factor for stroke onset [45-47]. The influence of family history may operate through two pathways [48]: (a) increased genetic susceptibility to stroke risk factors such as hypertension and (b) family eating habits that affect relatives or offspring.
The primary purpose of this study was to develop a risk prediction model for cerebrovascular disease. We found six risk factors (age, exercise habits, hypertension, diabetes, smoking, and family history of cerebrovascular disease) related to cerebrovascular disease, and a risk prediction model was developed on this basis. The model was tested in the present study and showed reasonable validity in detecting cerebrovascular disease.
The results showed that the AUC for identifying cerebrovascular disease in the community residents was 0.686 for the total sample and 0.664 in both men and women. The AUC values were all between 0.5 and 0.7, indicating that the model has a reasonable degree of discrimination. For the total sample, an R value of 1.80 (sensitivity = 81.8%, specificity = 47.0%) was determined to be the optimal positive cut-off point for predicting the onset of cerebrovascular disease. The sensitivity for the total population in this study was 81.8%, which is higher than the 53.0% reported in a recent study, indicating the potential clinical value of the model.
This study has several limitations. First, the data came from only one district in Jiangxi Province, China, so the model may be suitable only for the sub-cultural background of the local area. The sample size needs to be increased further to improve the representativeness and accuracy of the model. Second, the cerebrovascular disease risk prediction model constructed in this study was based on a cross-sectional study. Optimizing the model requires further follow-up validation in community populations. Third, the multifactor model did not include other risk factors for cerebrovascular disease, such as dyslipidemia. The general applicability of the risk prediction model is also limited by differences in regional economic conditions and eating habits. Fourth, the positive cut-off values of the model differ considerably between males and females.

Conclusions
In summary, this study showed that the prevalence of cerebrovascular disease among community residents aged 40 and above in Qingyunpu District, Nanchang City, Jiangxi Province, was 4.5%, which provides basic data on cerebrovascular disease in this area. In addition, this study found that the important risk factors for cerebrovascular disease among local residents were old age, lack of exercise, high blood pressure, diabetes, smoking, and family history of cerebrovascular disease, with relative risks (ORs) of 3.284, 2.306, 2.510, 3.194, 1.949, and 2.315, respectively. The cerebrovascular disease risk prediction model established on the basis of these six risk factors had an optimal R cut-off of 1.80 (sensitivity 81.8%, specificity 47.0%) and can be used to predict the risk of cerebrovascular disease of local residents. Findings from this study may provide a scientific basis for the further development of local cerebrovascular disease prevention and control strategies. At the same time, they may form a basis for larger-scale research and verification in the Chinese population.

Acknowledgments: Thanks to all staff members of the Center for Disease Control and Prevention of Qingyunpu District for their assistance and support in the data collection.

Conflicts of Interest:
All of the authors declare that there is no conflict of interest.

Abbreviations
AUC: Area under the curve
CVD: Cerebrovascular disease
DBP: Diastolic blood pressure
OR: Odds ratio
ROC: Receiver operating characteristic
SBP: Systolic blood pressure