Internal Differentiation within the Rural Migrant Population from the Sustainable Urban Development Perspective: Evidence from China

Population mobility and its attendant issues, especially housing, have a major impact on sustainable urban development. During urbanization, the rural migrant population (RMP) has come to comprise a number of micro-communities with distinct social characteristics, a process referred to as internal differentiation. This study aims to reveal the demographic structure of this group and, taking housing demand as a starting point, to analyze how population flow trends interact with sustainable urban development. To this end, a clustering model for mixed-type data based on partitioning around the medoid (PAM) is proposed, and the linked characteristics and underlying patterns of the RMP are analyzed using dynamic survey data on the migrant population in eastern China. The locational preferences of the inflowing micro-communities, and the corresponding coping strategies by city type, are then examined with a view to sustainable urban development. The results show that the RMP can be divided into four strongly representative groups that differ significantly in population structure and housing demand. Super-large and medium-sized cities are the main migration destinations. Based on these results, two suggestions are proposed: housing security policies should be designed according to the housing demand characteristics and spatial distribution of the different groups, and they should play a full and positive role in reasonably guiding RMP movement.


Introduction
Because of China's dual household registration system, there is a special migrant group known as the rural migrant population (RMP) [1]. Since the reform and opening-up, a growing share of surplus rural labor has moved to cities during China's accelerated urbanization, and this group has become the main body of industrial workers [2][3][4]. According to the National Bureau of Statistics, the number of peasant workers in China has reached 286.52 million, of whom 171.85 million (about 60%) are migrant workers [5]. Given the country's large population base and its rapid growth, the conversion of the RMP from farmers to urban residents must be a step-by-step process. In this process, the RMP differentiates by social class, which generates a series of micro-communities with distinct characteristics. These micro-communities develop along different trajectories, depending on social factors. As a result, their demands for social resources, particularly housing, also differ, and these demands must receive attention as a critical part of sustainable urban development [6]. At the same time, household registration restrictions make the RMP ineligible for the social benefits resulting from economic growth [7,8]. This population sits on the edge of urban society; that is, it is excluded from it [6,9]. Although housing security policy has improved this situation, current policy does not fully match the population's needs [10]. The resulting contradiction between the social status quo and actual demand poses a severe challenge to the sustainable development of urban population and housing [11].
In RMP research, previous studies of the complex relation between housing demand and housing security patterns have shown the importance of improving housing policies on the basis of actual demand. However, most researchers approach this issue through theoretical analysis or case studies, from the perspective of the housing or security status quo of the RMP [1,12,13]. This may be because data on the RMP are difficult to collect. Empirical analyses based on statistical data from systematic surveys are scarce, and few studies link the heterogeneity of the RMP to housing demand.
The major objectives of this paper are (1) to analyze the actual housing demand of the RMP from the perspective of internal differentiation, and (2) to put forward proposals for improving housing security policies in order to promote the sustainable development of urban housing. To achieve these goals, we propose an improved clustering analysis algorithm that demographically differentiates groups within the RMP, which helps to detect the defects of macroeconomic housing policies in China. To make the results more persuasive, our empirical analysis is based on microdata from the China Migrants Dynamic Survey (CMDS), which is the novelty of this study.
This paper is structured as follows. Section 2 introduces the previous research and background on the internal differentiation within the RMP and the population's housing demands. Section 3 presents the data sources and the definitions of the influencing factors. Section 4 elaborates the principle of the data processing algorithm used. Section 5 outlines the experimental process and discusses the analytical results. Finally, the major conclusions are summarized, and the corresponding policy suggestions are given on that basis.

Internal Differentiation of Migrants
In the academic literature, rural migrants have been systematically defined as populations that float from rural to urban areas [14]. This paper focuses on non-agricultural workers or surplus labor from rural areas (the so-called "peasant workers"). The serious social issues associated with this special population have long been a focus of academic attention. Some researchers have developed concepts and related theories to uncover the social roots and inherent development characteristics of this heterogeneous population. However, the majority of studies are in sociology and demography. One line of research found significant social class differences within RMP groups, known as social stratification, and the initial understanding of the RMP as a single whole was gradually displaced [15,16]. Another study found that distinctions between individuals, and sometimes between micro-communities, are becoming universally acknowledged [17]. Essentially, this conclusion indicates that the groups within the RMP are not homogeneous. As a result of internal differentiation, the RMP can generally be divided into many contrasting micro-communities, also called clusters, by using certain criteria (see Section 2.2) [18].
Another research approach is to analyze micro survey data with statistical methods in order to explore population classification and its related social problems. Previous studies with different aims have discussed this issue from multiple perspectives, differentiating the population by aspects such as gender [19], race [20], social services [18], generation [21], occupation [22], and urban identity [23]. The internal differentiation of migrants may be one reason why their demands for social resources differ. Sustainable housing is the fundamental material prerequisite for living, as well as a key factor in livelihood issues [24]. Exploring the internal connection between differentiation and housing demand is becoming a new research hotspot in sociology.

Influencing Factors of Housing Demand
The social problems of the RMP are regional, so there has been little academic research on them outside China [25]. However, there is an abundant literature on housing demand that our study can draw on. Previous research on this issue falls into two categories: macroscopic analysis and microscopic analysis. From the macroscopic perspective, the connection between statistical indicators and sustainable supply and demand in the housing market is discussed at the national or societal level. Kagochi and Mace examine the influencing factors of housing demand for a single household, based on panel time series data for 1988-2007, and identify factors including population growth, sales of existing houses, building costs, and unemployment rates [26]. Similarly, Turkish scholars report that real mortgage interest rates and household expenditure are the major demand-related factors, on the basis of a survey of recent housing projects from 2016 to 2018 [27]. In addition, some researchers have found that the housing loan rate and inflation have a significant effect on housing prices, thereby further affecting housing demand [28]. Macroscopic analysis is sufficient to identify demand changes in the real estate market, but it cannot reflect individual housing demand. To solve this problem, microdata are used to study the housing demand of a specific population group and to obtain requirements at the individual level.
Because the datasets used for analysis are so diverse, previous studies reach significantly different conclusions. Research on housing demand may have started with Mankiw and Weil's empirical analysis. Their results were obtained with a statistical regression model applied to American census data for 1970, an approach regarded as a powerful tool for describing the reasons for changes in housing prices and housing demand. The intergenerational difference, namely age, has been shown by many researchers to be an influencing factor of housing demand [29][30][31][32][33]. As models and datasets have improved, more potential confounders have gradually been identified. Han presented a method for estimating housing preference that can simulate the actual housing demand of Shanghai residents under assumed conditions related to household income, family size, and generation. The price-income ratio, age composition, and household size were found to have significant effects on housing demand [30]. In addition to household size and income, career has been found to be a potential stimulus for housing demand. This argument was supported by Oktay, using survey data on households living in the Erzurum city centre [34]. Eichholtz studied the housing conditions of English residents to describe the connection between demographic characteristics and the demand for residential real estate, and found that housing demand is influenced by the price-income ratio, age composition, and household size [31]. Flambard estimated housing demand in northern France according to sociodemographic factors, and noted that residential choices depend strongly on the rent-to-income ratio and the distance to work. Moreover, housing preferences display striking differences across locations [32]; we elaborate on this issue in Section 5.2. A recent study of rural-urban migration in China evaluated the current dynamics of living, and showed that an individual's native place might have an enormous impact on the housing demand of Chinese migrants. The idea that improving the housing system is an effective approach to resolving the housing problem for migrants in China is elaborated at the end of that paper [6]. From the above, we find that the influencing factors identified in regional research differ greatly, although some factors, such as age, income, education, and household size, are widely recognized. Beyond these common factors, further potential influencing factors should be identified according to local characteristics. Such an analysis can clearly reflect the demand for housing, which is the first and most critical step in promoting the sustainable development of urban housing.

The Data
Our empirical analysis of the internal differentiation of the RMP is based on the microdata on the RMP in the CMDS [35], a continuous research project designed by the National Health and Family Planning Commission of China and implemented by the Population and Development Research Center. The survey collected valuable information on the RMP and its households, including demographic characteristics, housing attributes, income structure, and consumer status. The survey subjects were individuals belonging to the RMP (aged 19 to 59) who had lived in the survey area for more than one month.
We selected the CMDS as our basic dataset for three reasons. First, the CMDS dataset covers up to 320 dimensions, which is enough for the empirical analysis. Second, the CMDS adopts stratified, three-stage, probability-proportionate-to-size sampling, a well-established and credible survey sampling method. Third, the CMDS excludes temporary floating populations, such as students and visiting relatives. The screening of the migrant population by the CMDS was therefore consistent with the data conditions required in this paper, and left sufficient room for further screening in the empirical analysis.
We collected the latest CMDS data for 2014 as the sample of empirical analysis. The CMDS 2014 dataset consisted of 82,985 valid data items collected from 10 cities in eastern China. According to the definition of the RMP by the National Bureau of Statistics and this paper, the qualified data should meet the following requirements:
1. The value of simp_type (sample type) must be the residents' committee, which means these respondents lived in urban areas.
2. The value of acc_nat_1 (account nature) must be an agricultural registered permanent residence.
3. The value of flo_reason_1 (flow reason) must be to do business or to seek a job in the city.
We obtained a preliminary dataset consisting of 36,764 RMP samples on the basis of the above principles. Then, we filtered out 1,260 samples with either missing critical values or logical errors from the preliminary dataset. Our final dataset consisted of 35,504 samples.
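The three screening rules and the subsequent removal of incomplete records can be sketched as a simple two-stage filter. This is a minimal illustration rather than the procedure actually used: the field names come from the text, but the coded values (`"residents_committee"`, `"agricultural"`, `"business"`, `"job_seeking"`), the `required` field list, and the toy records are hypothetical placeholders, because the CMDS codebook is not reproduced here. Checks for logical errors are omitted.

```python
# Hypothetical encodings: the survey's real category codes are not given in the
# text, so these labels are placeholders for illustration only.
URBAN_SAMPLE = "residents_committee"
AGRICULTURAL = "agricultural"
WORK_REASONS = {"business", "job_seeking"}


def is_rmp(record):
    """Apply the three screening rules to one survey record (a dict)."""
    return (
        record.get("simp_type") == URBAN_SAMPLE
        and record.get("acc_nat_1") == AGRICULTURAL
        and record.get("flo_reason_1") in WORK_REASONS
    )


def is_complete(record, required):
    """True when every required field is present and non-empty."""
    return all(record.get(name) not in (None, "") for name in required)


def screen(records, required):
    """Return (preliminary, final) sample lists, mirroring the two-step filter."""
    preliminary = [r for r in records if is_rmp(r)]
    final = [r for r in preliminary if is_complete(r, required)]
    return preliminary, final


# Toy records (invented) that exercise each rule.
toy = [
    {"simp_type": "residents_committee", "acc_nat_1": "agricultural",
     "flo_reason_1": "job_seeking", "birt_year": 1990},
    {"simp_type": "residents_committee", "acc_nat_1": "agricultural",
     "flo_reason_1": "business", "birt_year": None},
    {"simp_type": "village_committee", "acc_nat_1": "agricultural",
     "flo_reason_1": "job_seeking", "birt_year": 1985},
    {"simp_type": "residents_committee", "acc_nat_1": "non_agricultural",
     "flo_reason_1": "job_seeking", "birt_year": 1979},
]
preliminary, final = screen(toy, required=["birt_year"])
print(len(preliminary), len(final))
```

In the toy data, two records pass the three rules and one of them is then dropped for a missing birth year, which parallels the reduction from the preliminary to the final dataset.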

Feature Selection and Descriptive Statistics
The original dataset contained 320 variables related to individuals and households, some of which were redundant for this study and had to be eliminated. Because clustering algorithms are sensitive to irrelevant inputs, redundant input variables lead to inaccurate results. Previous research has analyzed the factors that influence the heterogeneity of the RMP and this population's housing demand. From a survey of the relevant literature, we obtained a list of influencing factors ordered by their number of occurrences (see Table 1). To improve the soundness of the final empirical results, we selected and adapted the 10 most frequent factors (Top 10) as research variables, based on previous findings and the purpose of this study, as shown in Table 1. In addition, the results of the basic descriptive statistical analysis are shown in Tables 2 and 3.

Table 1 (continued).
Variable | Type | Description | References
Decision for Settlement | categorical | The answer to the question "Do you want to live locally for more than five years?"; includes "yes", "no" and "undefined" | [37,42]
Number of Minor Children | continuous | Number of children under 18 years of age raised by the respondent | [39,43]
Housing Nature | categorical | The nature of the respondent's dwelling; includes "rental house", "commercial house", "free house" and "other" | [37,44]
Notes: birt_year represents the birth year of respondents; flo_year indicates the migrating year of respondents. Source: Own elaboration.

Partitioning around the Medoid (PAM)
The PAM algorithm, a variant of the K-medoids algorithm, is a partition-based clustering algorithm. The crucial step in PAM is determining the center of each cluster. Unlike the traditional K-means algorithm, PAM represents the center of a cluster by a "medoid", an actual sample, instead of the "mean value"; thus, PAM is less sensitive to outliers and noisy data [45], and has better clustering accuracy in their presence. In addition, the algorithm has the advantages of simple implementation, low space complexity, and stronger applicability to various types of data [46]. Although the time complexity of the algorithm grows with the sample size, PAM is still suitable for cluster analysis of multivariate cross-sectional data [46]. In the first step of PAM, the initial medoids of each cluster are randomly generated according to the number of clusters (generally denoted k). The algorithm then iteratively modifies the medoids to reduce the value of the objective function. The final clustering result is obtained when the objective function reaches its optimum.
In previous studies, this algorithm has been widely applied in different fields [47][48][49]. Implementations are available as packages in many data analysis environments. Therefore, the detailed procedure will not be described here. For more information about the principle and implementation of PAM, please refer to a previous study [50].
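The iteration described above (random initial medoids, then repeated modification of the medoids until the objective stops improving) can be illustrated with a short sketch. This is a simplified k-medoids procedure written for illustration under stated assumptions; it is not the implementation in any particular package, and the function names, the `seed` parameter, and the toy points are invented here. The objective is taken to be the sum of each sample's dissimilarity to its nearest medoid.

```python
import random


def total_cost(dist, medoids):
    """Objective: sum of each sample's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def k_medoids(dist, k, seed=0):
    """Random initial medoids, then swaps that lower the objective.

    `dist` is a symmetric n x n dissimilarity matrix (list of lists).
    Returns (medoid indices, cluster label per sample, final cost).
    """
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid sample.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign every sample to its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels, best


# Toy one-dimensional data (invented): two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = k_medoids(dist, k=2)
print(sorted(medoids), labels, cost)
```

Because the procedure only needs a dissimilarity matrix, it works unchanged with any distance, which is why a medoid-based method pairs naturally with a mixed-type dissimilarity measure.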

Between-Class Distance Computation
After choosing the clustering algorithm, we have to select a method for calculating the between-class distance that suits the characteristics of our data. This choice is particularly important: a suitable distance not only improves the credibility of the final clustering results, but also makes the conclusions drawn from them more valuable. There are various between-class distance measures, for instance, the Euclidean, Manhattan, and Chebyshev distances. These measures apply only to numerical data, whereas the dataset in this paper contains two types of data (namely, numbers and characters). We therefore introduced Gower's dissimilarity (GD) coefficient into the empirical model.
GD, a measure of the dissimilarity between samples, was proposed by J.C. Gower in 1971. Introducing GD gives us more choices when determining a clustering algorithm, because it can measure the between-class distance of a dataset with both continuous and categorical data.
According to Gower [51], samples $x_i$ and $x_j$ are written as $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})'$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})'$, respectively, where the index $k = 1, \ldots, n$ runs over the dimensions. For continuous data, the GD between sample i and sample j on dimension k can be calculated by Equation (1):

$$d_{ijk} = \frac{|x_{ik} - x_{jk}|}{R_k} \qquad (1)$$

where $R_k$ is the difference between the maximum and the minimum on the k-th dimension of sample x.
For categorical data, variables with m categorical attributes are broken down into m 0-1 variables.
Then, the distances between each sub-variable are weighted and summed by using the strategy of the Dice coefficient [52] to obtain the final GD between samples i and j, which can be calculated by Equation (2):

$$d_{ij} = \frac{\sum_{k=1}^{n} \delta_{ijk}\, s_{ijk}}{\sum_{k=1}^{n} \delta_{ijk}} \qquad (2)$$

where $s_{ijk}$ and $\delta_{ijk}$ are the markup and weight values, respectively, whose values depend on the value of certain attributes. The markup and weight values of different cases are reported in Table 4.
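A small sketch may help make the mixed-type dissimilarity concrete. It range-normalises continuous variables and averages the per-dimension contributions with 0/1 weights, as described above. Two simplifications are assumptions of this sketch and not taken from the text: categorical variables are compared by simple matching (0 if equal, 1 otherwise) instead of the 0-1 expansion with Dice-style weighting, and a weight of 0 is used only when a value is missing (`None`). The function name and the toy records are invented.

```python
def gower_matrix(rows, is_categorical):
    """Pairwise Gower dissimilarity for records mixing numbers and categories.

    rows: list of tuples, one per sample; None marks a missing value.
    is_categorical: one bool per column.
    """
    n, p = len(rows), len(is_categorical)
    # R_k: range of each continuous column (None for categorical columns).
    ranges = []
    for k in range(p):
        if is_categorical[k]:
            ranges.append(None)
        else:
            vals = [r[k] for r in rows if r[k] is not None]
            ranges.append(max(vals) - min(vals) if vals else 0)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            num = den = 0.0
            for k in range(p):
                a, b = rows[i][k], rows[j][k]
                if a is None or b is None:
                    continue  # weight 0: this dimension is skipped
                if is_categorical[k]:
                    d = 0.0 if a == b else 1.0  # simple matching (assumption)
                else:
                    d = abs(a - b) / ranges[k] if ranges[k] else 0.0
                num += d
                den += 1.0  # weight 1 for a valid comparison
            dist[i][j] = dist[j][i] = num / den if den else 0.0
    return dist


# Toy records (invented): (age, housing nature).
rows = [(20, "rental"), (30, "rental"), (40, "commercial")]
d = gower_matrix(rows, is_categorical=[False, True])
print(d[0][1], d[0][2], d[1][2])
```

For the first pair, the age difference of 10 over a range of 20 contributes 0.5 and the matching housing category contributes 0, so the dissimilarity is their average, 0.25.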

Table 4. Markup and weight values for different attributes of the samples.
Source: Own elaboration based on [51].

Number of Clusters k
In practical clustering analysis, the k value, which is the number of clusters set initially, has a profound impact on the final clustering result. It is rather difficult to determine an appropriate k value beforehand. To address this problem, a series of solutions based on practical experience and theoretical knowledge have been proposed. On the one hand, the k value must be less than the number of characteristic variables, because the features of each cluster are expected to be a subset of the characteristic variables of the source data. On the other hand, a number of statistical indices, such as the Silhouette coefficient (SC) index, have been proposed to estimate the clustering performance for different k values [53].
Based on these solutions, we set a possible range of k values according to previous studies and the specific conditions of the dataset, after which the clustering results are evaluated by the SC. The calculation of the SC involves the individual SC and the global SC. For each sample in the dataset, the individual SC can be expressed as:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (3)$$

where the subscript i indicates the sample, $a_i$ is the average distance between sample i and the other samples belonging to the same cluster as sample i, and $b_i$ is the average distance between sample i and the samples belonging to the nearest cluster other than its own. According to Equation (3), the maximum and minimum values of $s_i$ are 1 and −1, respectively, and the closer $s_i$ is to 1, the more accurate the clustering result is. The global SC is the average of the individual SC of all samples. We can use it to evaluate the rationality of the clustering results. The larger the value is, the better the performance.
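The individual and global SC can be computed directly from a dissimilarity matrix and a list of cluster labels, as in the following sketch. It assumes at least two clusters and, as a convention chosen here and not stated in the text, assigns an individual SC of 0 to a sample that is alone in its cluster. The function name and the toy data are illustrative.

```python
def silhouette(dist, labels):
    """Return (individual SC per sample, global SC).

    dist: symmetric n x n dissimilarity matrix with zeros on the diagonal.
    labels: cluster label of each sample; at least two clusters are assumed.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters (assumption)
            continue
        # a_i: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores, sum(scores) / len(scores)


# Toy one-dimensional data (invented): two tight, well-separated clusters.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
scores, overall = silhouette(dist, labels=[0, 0, 1, 1])
print([round(s, 3) for s in scores], round(overall, 3))
```

For the first sample, the mean within-cluster distance is 1 and the mean distance to the other cluster is 10.5, giving an individual SC of 9.5/10.5, close to 1, as expected for a clean separation.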

A Mixed-Type Data Clustering Analysis Model
The general solution to mixed-type data clustering consists of two steps: choosing a similarity measure and selecting an algorithm. For the purposes of this study, an improved model based on PAM is provided for solving the mixed-type data clustering problem. In this model, the similarity of samples is calculated by using the Gower coefficient. Then, the clustering experiments are carried out with different initial numbers of clusters, after which the clustering results are verified based on the theory of the SC. The best result is the one with the maximum global SC value. The flowchart of the model is shown in Figure 1.
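The flow of the model (compute Gower dissimilarities, cluster with a medoid-based method for several candidate k values, and keep the k with the largest global SC) can be sketched end to end as follows. This is a compact illustration under assumptions, not the model's actual implementation: categorical variables use simple matching, the clustering step is a simplified k-medoids with random initial medoids and greedy swaps, singleton clusters receive an individual SC of 0, and all function names and the toy records are invented.

```python
import random


def gower(rows, categorical):
    """Gower dissimilarity matrix (simple matching on categorical columns)."""
    p = len(categorical)
    spans = [None if categorical[k]
             else (max(r[k] for r in rows) - min(r[k] for r in rows)) or 1.0
             for k in range(p)]

    def pair(a, b):
        parts = [float(a[k] != b[k]) if categorical[k]
                 else abs(a[k] - b[k]) / spans[k] for k in range(p)]
        return sum(parts) / p

    return [[pair(a, b) for b in rows] for a in rows]


def k_medoids_labels(dist, k, seed=0):
    """Simplified k-medoids (random start, greedy swaps); returns labels."""
    n = len(dist)

    def cost(meds):
        return sum(min(row[m] for m in meds) for row in dist)

    medoids = random.Random(seed).sample(range(n), k)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]


def global_sc(dist, labels):
    """Global silhouette coefficient: mean of the individual values."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0 (assumption)
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)


def choose_k(rows, categorical, k_values):
    """Cluster for each candidate k and return the k with the largest global SC."""
    dist = gower(rows, categorical)
    scores = {k: global_sc(dist, k_medoids_labels(dist, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy mixed-type records (invented): (age, housing nature).
rows = [(22, "rental"), (24, "rental"), (26, "rental"),
        (45, "commercial"), (47, "commercial"), (49, "commercial")]
best_k, scores = choose_k(rows, categorical=[False, True], k_values=[2, 3])
print(best_k, scores)
```

On the toy data, splitting either natural group in two lowers the global SC, so the two-cluster solution is selected.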

Empirical Analysis
Both practical experience and the regularities in the data should be considered when the k value is determined subjectively. Previous researchers have suggested that two to five clusters might realistically represent the RMP class differentiation [54]. Based on the requirements of the model discussed in Section 4, the initial k value must be less than the number of influencing factors that we selected. The dataset is analyzed with the proposed model for k values in the interval [2,10]. The SC values of each clustering result are calculated and compared, and the results are displayed as the curve in Figure 2. The best k value can be obtained from Figure 2. As k increases up to 4, the SC rises with the k value, which indicates that the clustering result moves closer to reality. Nevertheless, the SC falls sharply when k = 5. After a subsequent increase, it falls again for k > 6. According to the above analysis, the SC peaks at 0.683 when k = 4, which means that aggregating the overall samples into four clusters is the most suitable choice.
RStudio v.1.1.456 (Copyright RStudio Inc., Boston, MA, USA), an integrated development environment for the R language, was used for statistical analysis. It offers a concise and effective programming environment with code editing, and its extensive program packages reduce the time cost of modelling. Two functions in the cluster package were used: daisy and pam. The former computes the pairwise dissimilarities between samples with mixed-type variables, while the latter is an implementation of the PAM algorithm. Furthermore, the SC can be obtained from the results computed by the pam function.
Sustainability 2018, 10, 4839

Population Clustering Results based on Housing Demand
The empirical results for the 10 variables of the RMP in our study are presented in the descriptive statistical summary in Table 5. To highlight the unique characteristics of each cluster, and to make the results more intuitive, some variable groupings with extremely small sample sizes were merged or modified. As shown in Table 5, the class stratification of the RMP is mainly manifested in the following aspects.
First, the four clusters share some common characteristics. People with low housing consumption levels and renters of private housing make up a relatively high proportion of all four clusters of the RMP. This indicates that the living conditions of the RMP are likely to be poor, as most of this population has little capacity to pay for houses in the city. Worse, this difficult situation has not improved.
Second, while the above characteristic is common to the four groups, other characteristics display significant differences. For instance, the population of the first cluster (#1) is similar to that of the third cluster (#3) in terms of income level, education level, employment status, and settlement intention. The difference between them is that the former consists mainly of the new generation, while the latter consists mainly of the older generation. Table 5 shows the descriptions of the four clusters, from which some preliminary conclusions can be drawn. The population of the first cluster, with an average age of 26.21 years, has a considerable proportion with a junior high school education, is generally engaged in service occupations, and has a low income level, on average. However, because the average household dependency burden is small, the level of household consumption is also lower. Both individual characteristics and urban migration characteristics are weak in this group. The third cluster, limited by a low education level and an advanced average age of 47.90 years, earns meagre pay from manual labor (such as construction work). Because of heavy family obligations, the people in this cluster are forced to look for better-paying jobs in the city. When their living conditions improve, they will most likely return to the countryside to settle down, because they lack a sense of identity and belonging in the urban setting. According to the above analysis, the first and third clusters are defined as "new generations just entering" (#1) and "old generations just entering" (#3), respectively.
Third, each cluster has some distinguishing characteristics. Taking the second cluster (#2) and the fourth cluster (#4) as examples, the keywords of the second cluster include middle-aged, married (at least 87.67%) with a child (81.4%), skilled in a technical profession (analysis based on the original dataset), and a higher level of education (71.73% with senior high school education). Therefore, a larger proportion of this cluster is engaged in technology-based occupations, and its income is higher. Further analysis based on the original dataset indicates that the migration pattern of this cluster is family migration, involving peasant workers who migrate with their children. This group pays more attention to the supporting public resources than to other aspects, and is more likely to settle in cities. The age composition of the fourth cluster is more complex and consists chiefly of middle-aged and elderly people. Most of them are employers or self-employed workers, and they have a strong desire to settle in cities. Both the income level and the social position of the population of the fourth cluster are higher than those of the people in the other clusters. Their long-term work experience and strong city identity make it possible for these people to hold a stable job and to live comfortably in cities. These are the reasons why their homeownership rate is the highest among the four clusters. According to the above analysis, the second and fourth clusters are defined as the "technical employee class" (#2) and the "employer or self-employed class" (#4), respectively.

Distribution of RMP Groups in Various Cities
In the process of formulating and implementing social security policies in China, obvious regional differences are found in the distribution of the various clusters; that is, the population distribution of these clusters differs across cities [55]. Practical experience suggests that social security policy in China should be devolved to the local level to balance urban sustainability [56,57]. At present, there are obvious weaknesses in housing security for the RMP in China. Both urban residents and the RMP are regarded as security objects under the interim measures for the management of public rental housing, published in 2012. However, the major housing security resources go to the urban population because of their urban identity (known in China as "urban hukou"). In fact, problems with housing security policy for the RMP persist, especially the neglect of regional differences in the RMP across cities. Observing the regional distribution of the RMP from the city perspective is therefore essential for increasing housing policy efficiency at the stage of macroeconomic policy-making. The State Council classifies cities by urban population scale, measured by resident population size, as super cities, mega cities, large cities, middle cities, and small cities [58]. In this study, the numbers of primary sample units per city classification are 5, 8, 14, 40, and 15, respectively. The spatial distribution of the population at the city level is reported in Table 6. From the city perspective, it is obvious that major cities (namely, super cities, mega cities, and large cities), and especially super cities, attract the main portion of the RMP. On the one hand, major cities provide many employment opportunities, which create considerable private wealth to meet the household needs of low-income RMP families. On the other hand, major cities have many more housing security resources and public service resources than other cities, which makes them more attractive for the long-term development of the RMP. However, blind population mobility may aggravate the contradiction between rapid population influx and housing shortage, which is detrimental to the coordinated and orderly development of cities. Consequently, the countermeasures for major cities and medium-sized cities should be different. For major cities with strong population pressure, the inflow of the RMP should be restricted reasonably, and the threshold of housing security should be raised appropriately. However, positive housing security policies are appropriate for small and medium-sized cities with vast development potential, as such policies enhance the attractiveness of these cities. In other words, housing security policy should be based on the local city situation to guide the reasonable flow of the RMP, which may effectively relieve pressure on population and housing in major cities. At the same time, this would promote the sustainable development of small and medium-sized cities.
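As an illustration of how a tabulation such as Table 6 might be produced, the following minimal Python sketch assigns each city to a size class and computes the percentage share of each cluster within each class. The population thresholds are an assumption (they follow the commonly cited State Council city-size standard and should be checked against [58]), the function names are hypothetical, and the reading of Table 6 as within-class shares is only one possible interpretation. No survey data are reproduced here.

```python
from collections import Counter, defaultdict

# ASSUMPTION: thresholds on urban resident population (persons), lower bound
# inclusive. They are not stated in this section; verify against [58].
CITY_TIERS = [
    (10_000_000, "super city"),
    (5_000_000, "mega city"),
    (1_000_000, "large city"),
    (500_000, "middle city"),
    (0, "small city"),
]


def classify_city(resident_population):
    """Return the size class of a city given its resident population."""
    for lower_bound, tier in CITY_TIERS:
        if resident_population >= lower_bound:
            return tier
    raise ValueError("population must be non-negative")


def cluster_shares_by_tier(records):
    """Compute the percentage share of each cluster within each city class.

    records: iterable of (city_resident_population, cluster_label) pairs,
    one pair per sampled migrant (hypothetical input layout).
    Returns {tier: {cluster_label: percent}}.
    """
    counts = defaultdict(Counter)
    for population, cluster in records:
        counts[classify_city(population)][cluster] += 1
    shares = {}
    for tier, counter in counts.items():
        total = sum(counter.values())
        shares[tier] = {c: 100.0 * n / total for c, n in counter.items()}
    return shares
```

Calling `cluster_shares_by_tier` on pairs of (city population, cluster label) yields one row of percentages per city class, which sum to 100 within each class.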
From the perspective of the population, the characteristics of settlement in large cities are demonstrated by the first and third clusters, which are defined by their original intentions to migrate. People in these clusters migrate to cities in search of more job opportunities and higher income levels. To accumulate wealth rapidly, these people may be more willing to compromise on housing conditions. Makeshift houses, villages in cities, and shanty towns are currently the primary housing conditions of the people in these clusters. An effective approach to improving the situation is to provide more policy-based public housing under strict management. In contrast, most of the people in the second cluster are married, and favor mega-cities and middle cities as migration destinations. Both harmonious family life and high income are highly valued by those in this cluster, which makes services such as educational resources and employment opportunities the key factors in their housing choice. The reality in China, however, is that major cities have more social public security resources, but the access threshold for housing buyers is relatively low in small and medium-sized cities. As a result, mega-cities become the best place for these people to migrate, while people who are limited by insufficient housing affordability will migrate to middle cities as an alternative. It is critical to distribute social public security resources fairly, and to provide better economic conditions and policy support to small and medium-sized cities, which can promote the balanced development of urbanization between cities of different sizes. Moreover, this is an essential requirement for the sustainability of urban housing.

Conclusions and Suggestions
In this paper, a model for mixed-type data clustering analysis that combines the GD coefficient with the PAM algorithm was proposed. Empirical analysis was conducted to explore the demographics of the RMP from a sustainable urban development perspective, based on the microdata of the RMP in eastern China. Four clusters of the RMP were identified: "new generations just entering", "old generations just entering", "technical employee class", and "employer or self-employed class". In general, the consumption level of all four clusters is low, and they prefer rental housing to other forms of housing, partly due to their meagre income. Further analysis indicates that large cities are much more attractive to the RMP, while different clusters have distinct preferences concerning the place to settle. Some suggestions for the formulation of housing security policy based on this study are presented below, to promote sustainable urban development.
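To make the modelling steps concrete, the following is a minimal Python sketch of mixed-type clustering in the spirit of the model summarized above: a Gower-type dissimilarity (assumed here to be what GD denotes), a simple swap-based PAM, and selection of the number of clusters by the mean Silhouette coefficient (SC). It is an illustration under stated assumptions rather than the authors' implementation: the dissimilarity is unweighted, the function names are hypothetical, and the six records are invented toy data, not survey data.

```python
import random


def gower_distance(a, b, is_numeric, ranges):
    """Unweighted Gower-type dissimilarity between two mixed-type records."""
    total = 0.0
    for x, y, numeric, rng in zip(a, b, is_numeric, ranges):
        if numeric:
            total += abs(x - y) / rng if rng > 0 else 0.0  # range-scaled gap
        else:
            total += 0.0 if x == y else 1.0  # simple mismatch for categories
    return total / len(a)


def distance_matrix(rows, is_numeric):
    """Pairwise dissimilarity matrix for all records."""
    cols = list(zip(*rows))
    ranges = [max(c) - min(c) if numeric else 0.0
              for c, numeric in zip(cols, is_numeric)]
    n = len(rows)
    return [[gower_distance(rows[i], rows[j], is_numeric, ranges)
             for j in range(n)] for i in range(n)]


def total_cost(dist, medoids):
    """Sum of each record's dissimilarity to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k, seed=0):
    """Swap-based PAM: start from random medoids, accept improving swaps."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


def silhouette(dist, labels):
    """Mean Silhouette coefficient; singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(dist, k_values, seed=0):
    """Run PAM for each candidate k and keep the k with the highest SC."""
    results = {k: pam(dist, k, seed) for k in k_values}
    scores = {k: silhouette(dist, results[k][1]) for k in k_values}
    k = max(scores, key=scores.get)
    return k, results[k], scores


# Invented toy records: (age, monthly income, marital status, occupation).
rows = [
    (22, 3000, "single", "worker"),
    (24, 3200, "single", "worker"),
    (23, 3100, "single", "worker"),
    (45, 8000, "married", "self-employed"),
    (47, 8500, "married", "self-employed"),
    (46, 8200, "married", "self-employed"),
]
is_numeric = [True, True, False, False]
```

A typical call is `best_k(distance_matrix(rows, is_numeric), [2, 3])`, which returns the selected number of clusters, the medoids and labels for that choice, and the SC for every candidate. Attribute weights such as those reported in Table 4 could be added to `gower_distance` as an extra argument; they are omitted here to keep the sketch short.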
At present, the contradiction between the single supply mode of housing security and the varied housing demand of the RMP has become the main problem in the sustainable development of urban housing, and resolving it requires a diversified housing security system based on population stratification. Previous researchers have tended to regard the RMP as a homogeneous group when making recommendations for building the housing security system. With the advancement of urbanization, demographic stratification has emerged within the RMP. The diversity of actual housing demand caused by social status and household background is reflected in intrinsic and extrinsic factors, which include age, education, occupation, duration of mobility, city identity, and income. The unitary housing security pattern cannot solve the current challenges, which may instead be resolved by diverse modalities, particularly the preferential policy of public housing. The most fundamental measure for the relevant administrative departments is to analyze and explore the actual housing security demand of the RMP, based on diverse group features, which is conducive to improving the housing affordability of the RMP through multiple approaches.
From the perspective of the city, both the demographic structure and local regional features should be considered comprehensively when designing housing security policy. Meanwhile, it is necessary to ensure that both common and individual demands are embodied in the policy, so that it can play an active role in sustainable urban development and urbanization. Metropolises, such as Beijing, offer many employment opportunities and high income levels, which have a strong appeal for members of the RMP who are interested in high income but who lack city identity. However, a consequence of this is that serious contradictions in housing have caused a decline in living standards. Tighter policies for housing security can be adopted to limit the population inflow. We suggest that the access threshold of the policy could be raised appropriately; for example, applicants could be required to satisfy the criteria applied to local urban residents, such as joining social insurance schemes and signing work contracts. In contrast, the housing resources of small and medium-sized cities cannot be fully utilized, because of the insufficient urban attractiveness of these cities. We suggest that the application conditions for security housing in these cities should be relaxed. Meanwhile, diverse forms of housing security, such as housing subsidies and monetary subsidies, can be explored. Increasing the supply of public housing can make it possible to incorporate the RMP into the urban housing security system. Therefore, effective policy for housing security consists of multiple ways to meet diverse needs and to promote sustainable city development.
The group of participants concerned with housing security policy is large and complex. Because such policy concerns distribution, there may be information gaps. Faced with this situation, the dominant approaches are to establish methodical cooperation between suppliers of RMP housing security, and to strengthen the interdepartmental sharing of information resources, which is beneficial for improving governance capacity. Therefore, the irregular changes in the housing demand of the RMP should be analyzed based on the population's demographic structure. In addition, the important prerequisite for solving the series of social security problems for the RMP, especially the problem of sustainable urban housing, is to build a big data management platform that is based on information related to housing security and RMP households. On the one hand, it is useful to

Figure 1. Flowchart of the model. Source: Own elaboration.
Flowchart steps: (1) filter the original dataset and select critical factors; (2) compute the similarity of samples using GD; (3) randomly determine the range of k values and generate k initial medoids; (4) run clustering analysis based on PAM; (5) check whether the SC of the clustering result is optimal (Yes/No branch); (6) output the result as the final conclusion.

Figure 2. Line chart of the Silhouette coefficient with different numbers of clusters. Source: Own elaboration.

Table 1. Definitions of the influencing factors.

Table 2. Descriptive statistics for continuous variables.

Table 3. Descriptive statistics for categorical variables.

Table 4. Markup and weight values.
Notes: #1~#4 represent the first to the fourth populations after clustering analysis, respectively. Source: Own elaboration.

Table 6. The proportion of the four RMP clusters in various cities (%).
Note: The result is the structure of the RMP in different urban types, based on Table 5. The meaning of each cluster (#1, #2, #3, #4) is the same as in Section 5.1. Source: Own elaboration.