Analysis of Public Complaints to Identify Priority Policy Areas: Evidence from a Satellite City around Seoul

Abstract: Conventional studies on policy demand identification that are anchored in big data on urban residents are limited in that they mostly involve the top-down and government-oriented use of such data. This restricts treatment to specific issues (e.g., public safety and disaster management) from the very beginning of data collection. Scant research has emphasized the general use of data on civil complaints, which are independent of areas of application, in the examination of sustainable cities. In this work, we hypothesized that analyses of civil complaint data and big data effectively identify what urban residents want from local governments with respect to a broad range of issues. We investigated policy demand by applying big data analytics to unstructured civil complaint data on safety and disaster management. We extracted major keywords associated with safety and disaster management via text mining to inquire into the relevant matters raised in the civil complaints. We also conducted a panel analysis to explore the effects exerted by the characteristics of 16 locally governed towns on residents’ policy demands regarding safety and disaster management-related complaints. The results suggest that policy needs vary according to local sociocultural characteristics such as the age, gender, and economic status of residents as well as the proportion of migrants in these localities, so city governments need to provide customized services. This research extends prior work with more advanced big data analysis techniques such as text mining and data fusion and integration. These techniques allow the government to identify citizens’ policy needs more specifically.


Introduction
With the advent of the Fourth Industrial Revolution, interest in smart cities and big data has been increasing. Big data generally refers to large and complex sets of data that represent digital traces of human activities and can be defined in terms of scale or volume, analytical methods [1], or organizational effects [2]. In particular, the demand for science-based urban policies built on big data has grown, as has the value of big data and the possibilities of using it as an analytical tool for rational urban public policy making and evidence-based administration [3-9]. For example, cities around the world collect massive amounts of data related to urban life, from objects (e.g., energy infrastructure) and people. [...] Satellite cities near Seoul, such as Namyangju city (hereafter shortened to Namyangju) [34], attract an increasing number of external and young migrants. In such contexts, there is a growing demand for new policies, particularly related to safety and disaster preparedness and mitigation.
In this study, we investigate how town characteristics affect the demand for policies on safety and disaster management through civil complaints. In particular, we analyze the major issues raised in civil complaints as a reflection of policy demand and citizen priorities. We apply text mining and panel analysis to a large dataset of textual data to identify real alternatives for enhancing urban sustainability and improving the quality of urban life. We discuss the applicability of the results for identifying data-driven solutions to social problems and for customizing policies for residents. We believe that handling civil complaints through the textual analysis of big data can provide a scientific basis for identifying policy directions, formulating and implementing policies effectively, and improving the related systems and institutions.

Study Site: Namyangju as a Smart City
Namyangju is a new city in South Korea whose development has been guided by the concepts of the Fourth Industrial Revolution and the smart city. The current population is approximately 670,000, of which about 15% are new residents from Seoul and about 27% come from other parts of the country [35]. Namyangju has easy access to downtown Seoul, as it is the gateway of the northeastern transportation network linking Gyeonggi province and Seoul (Figure 1). Train lines, highways, a beltway around Seoul, and the Han River's northernmost road between Seoul and other satellite cities all pass through Namyangju, and national highways and local roads connect the city to other northern rural areas. The economy of the broader area mainly focuses on fruit and dairy farming because about 70% of the land is mountainous. In addition, because the city is conveniently located near Seoul, it has a well-developed service sector, including tourism [35].
The city has recently simplified its administration system and is aiming for balanced development between its urban and rural areas. The city is also exploring customized services related to the transport and welfare systems for its residents. Additionally, the influx of external populations increases the policy demand for aspects related to crisis and safety, with an increasing focus on safety and disaster management policies.

Figure 1. [Map of the study site; image and caption text not recovered] [34,36]. Note: Administrative districts: five eups (one branch office), four myeons, and seven dongs (eight administrative welfare centers). [Map symbol not recovered] represents Seoul, the capital of Korea [34].
The Namyangju social survey was conducted by the Gyeonggi local government in 2015 and provides secondary information about the living conditions and perceptions of urban residents across the study area. The sample comprises household members aged 15 or older who live in Namyangju. A total of 1000 household members were sampled from 20 households in each of the 50 counties across Namyangju (n = 20 × 50 = 1000) using the sampling framework of the Korean Population and Housing Census. The survey was conducted between August 17 and 31, 2015 (The 4th Namyangju-si Social Survey Report), and the raw data include responses from 2274 interviewees [37]. The survey data show citizen perceptions about crisis management and policy priority areas, and can inform the creation of customized services at the city level. For example, citizens' perceptions about safety vary according to age, residence period, education level, marital status, and residence type [37]. About half (51.6%) of the respondents felt that the city was very safe or relatively safe in terms of security, and 54.1% felt it was very safe or relatively safe from natural disasters. However, the residents were dissatisfied with the traffic problems, the poor educational environment, and the lack of alternative residential and parking facilities.
When breaking down these perceptions, the fear of fire or crime was the main reason for dissatisfaction among women. In the 20-30 age group, the causes of dissatisfaction were mainly traffic discomfort (50.7%), lack of convenient facilities (25.4%), and public safety, crime, and anxiety (23.9%) [37]. The perceived level of risk increased as the residents' age increased, and was higher among women than men [37]. Overall, the demand for policies on traffic, safety, and health seems to have increased in the city.

Data Collection
We used basic statistical tools to analyze some of the data from the 2015 social survey to understand the safety-related perceptions of urban residents (Section 2.1). The results vary depending on the age of the town and on demographic characteristics such as gender and age. We used this preliminary study to understand whether the safety-related perceptions of urban residents have changed and whether they articulate a policy demand.
For this purpose, we used two datasets: a civil complaint dataset and a demographic dataset. The former consists of telephone-based complaints and the latter of population movement data, covering about 9 years from January 2009 to June 2017.
From the civil complaint text data, we built two further datasets. First, we analyzed the content of the written complaints filed through the official Namyangju complaint site. From this text, we drew the safety-related passages, measured word frequencies, and constructed a dataset that classifies the frequencies by month and by place of residence. The telephone-based data are essentially big data that the local government accumulates on its own platform (for more information, refer to https://www.nyj.go.kr/minwon/2828?space=main_favorites, accessed on 17 January 2019). Second, 396,658 complaints were categorized according to the policy types suggested by Namyangju and classified by month. We then constructed a panel dataset by merging these data with the demographic dataset for the 16 towns in Namyangju.
For the panel analysis (Section 2.3.2), we used statistical tools to determine the impact of the local sociocultural characteristics on safety-related civil complaints. We used the 16 towns of Namyangju as the units of analysis for four types of complaints, namely hygiene, traffic safety, lighting facilities, and disaster management. The average population of the 16 towns is 38,881, and the average monthly number of civil complaints per town is 249.
We integrated the civil complaints according to the complainants' broad residential area. Because the personal information protection law prohibits the direct use of the civil complainants' personal information, we merged and analyzed the demographic data of the 16 towns instead. First, we divided the telephone complaint data into 69 categories, and then arranged the dataset by month according to the complainants' area of residence. Second, based on the annual population change data of the study region, we merged the demographic data (e.g., gender, population composition, migrant population) by town with the monthly telephone-based civil complaint data obtained from Namyangju for research purposes. Finally, we constructed the panel data from January 2009 to September 2017 for all of the 16 towns. The town of Byeollaedong was established on 22 December 2011; therefore, the civil complaint data for this town run from February 2012 onward.
Sustainability 2019, 11, 6140
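The merging steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the record layout, the town labels (`town_a`, `town_b`), the category names, and all numbers are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical complaint records: (month "YYYY-MM", town, category).
complaints = [
    ("2012-02", "town_a", "traffic safety"),
    ("2012-02", "town_a", "hygiene"),
    ("2012-02", "town_b", "disaster"),
    ("2012-03", "town_a", "traffic safety"),
]

# Hypothetical annual demographics keyed by (year, town); values are made up.
demographics = {
    (2012, "town_a"): {"population": 30000, "female_pct": 50.8},
    (2012, "town_b"): {"population": 4500, "female_pct": 48.9},
}


def build_panel(complaints, demographics):
    """Return one row per (town, month) with complaint counts and demographics."""
    counts = defaultdict(lambda: defaultdict(int))
    for month, town, category in complaints:
        counts[(town, month)][category] += 1

    panel = []
    for (town, month), by_category in sorted(counts.items()):
        year = int(month[:4])
        demo = demographics.get((year, town))
        if demo is None:
            continue  # skip town-months without matching annual demographics
        row = {"town": town, "month": month, "total": sum(by_category.values())}
        row.update(by_category)  # monthly counts per complaint category
        row.update(demo)         # annual town-level demographics
        panel.append(row)
    return panel


panel = build_panel(complaints, demographics)
```

Joining on town and year keeps individual complainants out of the analysis, which matches the town-level aggregation the privacy constraint requires.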

Text Mining
The main purpose of text mining is to elicit high-quality information by applying natural language processing and statistical learning methods [3,38]. In particular, we used frequency analysis to track the use of frequent non-common words such as nouns, which are perceived to represent the main points that an individual respondent aims to convey [3,38]. We examined the frequencies of nouns and noun phrases in a semi-automatic manner through coded data and commands (see below for the different analytical steps). We used the R program on the original Korean text, as it can process Korean text without altering its meaning. The final results were translated into English.
In the preprocessing stage, we used R packages to decompose the civil complaint data into words [3,38]. For word extraction, we used the Korean Natural Language Processing (KoNLP) package and the Sejong dictionary package in R [17,32,39]. The unit of analysis was the individual complaint. In the first refinement process, a total of 111,105 words were extracted, excluding wrong words and number combinations. We retained only nouns by manually processing figures, symbols, and duplicate words, and used Microsoft Excel and coded equations to calculate the frequency of each word. In the second refinement process, we deleted words with a frequency lower than 10 that were erroneously extracted or had incorrect syllable combinations. In the third refinement process, we excluded unclear or incorrectly extracted words from the top 500 words, and then extracted the word with the highest frequency for each town and for all towns. The 200 words used for the final analysis were obtained through this iterative refinement process.
Language processing is very important in text mining because a word can carry several meanings or have homonyms. Since the analysis is based on frequency, the accuracy of the language processing is essential, as it can affect the results. Previous research has verified that low-frequency words do not affect the final analysis [40,41]. Words ranked below the top 500 occur too rarely, often just once or twice, to affect the final decisions or rankings [40,41]. Therefore, among the top 500 words, spelling variants and homonyms were rearranged and merged according to the characteristics of the Korean script, Hangul. We then chose the top 200 words for the analysis.
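The refinement steps above can be illustrated with a short frequency-counting sketch. The original work was done in R (KoNLP) and Microsoft Excel, so this Python version only mirrors the logic. It assumes the nouns are already extracted and, as a simplification, drops every word below the frequency threshold. The function name, the `merge_map` and `exclude` arguments, and the sample words are hypothetical.

```python
from collections import Counter


def refine_frequencies(nouns, min_count=10, first_cut=500, final_cut=200,
                       merge_map=None, exclude=()):
    """Rank nouns by frequency after thresholding, merging, and exclusion."""
    merge_map = merge_map or {}
    raw = Counter(nouns)
    # Second refinement: drop words that occur fewer than min_count times.
    kept = Counter({w: c for w, c in raw.items() if c >= min_count})
    # Third refinement: work only within the most frequent first_cut words.
    merged = Counter()
    for word, count in kept.most_common(first_cut):
        if word in exclude:
            continue  # unclear or incorrectly extracted word
        merged[merge_map.get(word, word)] += count  # merge spelling variants
    # Final list used for the analysis.
    return merged.most_common(final_cut)


# Toy input: "parking" is treated as a variant of "parking lot".
nouns = (["parking"] * 12 + ["parking lot"] * 11 + ["noise"] * 15
         + ["typo"] * 2 + ["123a"] * 10)
ranking = refine_frequencies(
    nouns,
    merge_map={"parking": "parking lot"},
    exclude={"123a"},
)
```

Merging variants before the final cut matters because two spellings of the same word could each fall outside the top 200 while their combined count would rank highly.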

Panel Analysis
As discussed above, panel analysis can determine the relationship between local sociocultural characteristics and safety-related civil complaints. In particular, we explored whether gender, age, external transfer rate, and income level (which are all factors related to safety perception) are actually reflected in safety-related complaints and articulated policy demand. Panel analysis can control for unobserved heterogeneity across towns [42]. This means that it helps overcome the limitations of missing variables by controlling the estimation errors that occur in the time series process and in the individual unit of analysis. The panel model is divided into a fixed effects (FE) model and a random effects (RE) model, depending on the assumptions about the error term that controls for the missing variables.
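The idea of controlling for unobserved town-level heterogeneity can be illustrated with the within (fixed effects) transformation, in which each variable is demeaned by town before the slope is estimated. The sketch below uses a single regressor and made-up numbers, so it is an illustration of the principle rather than the model estimated in this study.

```python
from statistics import mean


def pooled_slope(x, y):
    """OLS slope that ignores which town each observation belongs to."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def within_slope(towns):
    """Fixed effects slope: demean x and y within each town, then regress."""
    xs, ys = [], []
    for obs in towns.values():
        mx = mean([x for x, _ in obs])
        my = mean([y for _, y in obs])
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Hypothetical data: y = town effect + 0.5 * x, with town effects 10 and 0.
towns = {
    "town_a": [(1, 10.5), (2, 11.0), (3, 11.5)],
    "town_b": [(11, 5.5), (12, 6.0), (13, 6.5)],
}
all_x = [x for obs in towns.values() for x, _ in obs]
all_y = [y for obs in towns.values() for _, y in obs]

pooled = pooled_slope(all_x, all_y)  # distorted by the town effects
within = within_slope(towns)         # recovers the common slope of 0.5
```

In this toy case the pooled slope even has the wrong sign, because the town with larger x values has the lower baseline, whereas demeaning by town removes that time-invariant difference.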
The FE model takes into account the inherent heterogeneity of the panel entity that does not change with time (that is, the effect of individual characteristics) and assumes that these inherent characteristics are latent [42,43]. The RE model, on the other hand, assumes that the inherent characteristics of an individual are not fixed but vary stochastically [42]. The choice between the two models depends on the type of study; in general, however, the Hausman test can be used to determine which model is more suitable [42,43]. In this study, we analyzed the pooled ordinary least squares (OLS) model, the FE model, and the RE model. We used the F test to compare the pooled OLS model with the FE model [42,43], and the Breusch-Pagan LM test to test the significance of the random effects [42,43]. Finally, we used the Hausman test to compare the FE and RE models. The results verified that the pooled OLS model is not suitable because of autocorrelation and heterogeneity [42].
In the regression equation for hygiene-related complaints, H_it is the dependent variable and all independent variables are explained in Table 1. Season_it denotes the vector of season variables, and i and t index town and time, respectively. µ_it and e_it are the town-specific error and the total error term, respectively, and β_0, β_1, ..., β_8 are regression coefficients. The traffic safety, lighting facility, and disaster-related complaint ratios are expressed with the same regression equation, using T_it, L_it, and D_it as the dependent variables. Based on the Hausman test results, the hygiene-related and lighting facility complaint rates were estimated with the FE model, and the traffic safety and disaster-related complaint rates were estimated with the RE model. However, the FE model cannot estimate the influence of time-invariant characteristics of each town. Therefore, the estimates of the FE and RE models are presented together later.
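The regression equation itself is not legible in this text. A sketch consistent with the symbols defined above and with the covariates discussed in the results might read as follows. Only $H_{it}$, $\mathrm{Season}_{it}$, $\mu_{it}$, $e_{it}$, and $\beta_0, \ldots, \beta_8$ come from the text; the covariate names and their ordering are assumptions made for illustration, and Table 1 remains the authoritative list of variables.

```latex
% Assumed form; covariate names and order are illustrative, not taken from Table 1.
\begin{equation*}
\begin{aligned}
H_{it} ={}& \beta_0
  + \beta_1\,\mathrm{Female}_{it}
  + \beta_2\,\mathrm{Under40}_{it}
  + \beta_3\,\mathrm{Car}_{it}
  + \beta_4\,\mathrm{IntMigrant}_{it}
  + \beta_5\,\mathrm{ExtMigrant}_{it} \\
 &+ \beta_6\,\mathrm{Complaints}_{it}
  + \beta_7\,\mathrm{NewTown}_{i}
  + \beta_8\,\mathrm{Season}_{it}
  + \mu_{it} + e_{it}
\end{aligned}
\end{equation*}
```

Under this reading, $\mathrm{NewTown}_{i}$ does not vary over time, which is why its coefficient can only be estimated in the RE model, and $\beta_8$ stands for one coefficient per season indicator.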
According to the Namyangju administration, civil complaints are classified into 29 types, including traffic, architecture, road, education, tax, disaster, hygiene, lighting facility, farm, library, sewage, daycare center, land, park, and forest. In this study, we focused on safety-related complaints because Korean local government policy has gradually paid more attention to the safety of urban residents. In particular, satellite cities near Seoul, such as Namyangju, face demand on safety issues as they try to attract migrants from outside and to minimize emigration.
The dataset was generated monthly from the civil complaints received from the 16 towns in Namyangju. For the 9 years of data, each town-level complaint variable was measured as the proportion of safety-related complaints among the total complaints in each month. A larger number of safety-related complaints in one town does not by itself mean that safety is a bigger concern there than in other towns, because the total number of complaints differs between towns.
It is therefore more precise to measure the share of safety-related complaints among all complaints in a town so that towns can be compared. In this paper, the dataset is composed of the ratio (%) of safety-related complaints (H, T, L, D) to the monthly total complaints by town, to maintain the equivalence of the data.
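A small sketch shows why shares rather than raw counts are compared. The category labels follow the four safety-related types named in the text, while the function name and the counts are hypothetical.

```python
SAFETY_TYPES = ("hygiene", "traffic safety", "lighting facility", "disaster")


def safety_shares(counts_by_type):
    """Return each safety-related type as a percentage of all complaints."""
    total = sum(counts_by_type.values())
    if total == 0:
        return {t: 0.0 for t in SAFETY_TYPES}  # no complaints that month
    return {t: 100.0 * counts_by_type.get(t, 0) / total for t in SAFETY_TYPES}


# Hypothetical monthly counts for two towns of very different size.
town_a = {"traffic safety": 20, "hygiene": 2, "parking": 178}  # 200 in total
town_b = {"traffic safety": 10, "disaster": 5, "parking": 35}  # 50 in total

shares_a = safety_shares(town_a)
shares_b = safety_shares(town_b)
```

Here the larger town files twice as many traffic safety complaints in absolute terms (20 versus 10), yet its share is only 10% against 20% in the smaller town, which is the comparison the ratio variables are meant to capture.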

Text Mining Analysis
We used computerized techniques to classify, cluster, and analyze large volumes of textual data to find useful patterns in this section. Figures 2 and 3 summarize the main findings of the text mining analysis. Figure 2 shows the most common words after the different refinement processes outlined in Section 2.3.1. Some of the most common words include "action", "parking lot", "construction", "noise", and "installation". These words are essentially related to the citizens' dissatisfaction with the inconveniences they experience or with the urban government's late response to their complaints. Several civil complaints called for the construction of roads, parking lots, and new facilities for the new towns, as many parts of the city are currently under development (Section 2.1). Due to the construction of roads, constantly recurring themes in the complaints include noise nuisance, traffic problems, and the lack of parking lots. Although Namyangju is not yet a smart city, its expansion has led to an influx of migrants, which has led to some conflicts and inconveniences between older residents and new migrants. However, these results do not show the diversity of civil complaints because they do not account for approximately 70-80% of the complaints, such as those related to sewage, factories, and cleaning. Therefore, to examine the policy demand related to safety and disaster management, we extracted the relevant words and investigated the trends by year (Figure 3). In Figure 3, hygiene-related words represent problems related to water, garbage, smell, and sewage, while words such as "dust", "damage", and "manhole" are related to disasters and crises.
Examining the results makes it possible to identify a variety of problems. Apart from "parking", "noise", and "road construction" problems (which are consistent with the contents of the telephone-based civil complaints), other problems such as "garbage", "elementary school", "moving", "bus lines", and "pavement construction" are also raised. Minor inconveniences are also mentioned, such as problems with "apartment entrances", "unprocessed complaints", "infant care", and the "location of civil service centers". Figure 3 also suggests that citizen complaints have shifted from technical and infrastructure-related matters (e.g., facility installation or maintenance) to social policy requirements such as the welfare system. Initially, citizens tended to express complaints about public services, but the complaints seem to have shifted toward more constructive ones related to policy requests or suggestions.
Table 2 contains the scores for the main variables outlined in Table 1. As noted above, we used the safety-related complaints among all 29 complaint types, in accordance with the criteria provided by Namyangju. These account for about 16% of total complaints.

Table 1 (fragment). External migrants: number of external migrants × 100 [remainder of the definition not recovered].
A large portion of the complaints concern limited parking lots and illegal parking, and previous research has already analyzed this issue extensively [11-17,32]. Even though they account for a small portion of complaints, safety-related policy demands are important for keeping the city government stable, because both migrants and long-term residents increasingly care about safety. The local government is considering how to make the city sustainable and how to satisfy its residents.

Panel Analysis
To analyze the factors, we used four types of complaints as the dependent variables (H, T, L, D), each expressed as a ratio of safety-related complaints. There are two reasons for using ratio variables in this study [42,43]: (1) to ensure normality, because the raw numerical values are skewed, and (2) to identify changes in safety-related complaints (four types: H, T, L, D) among all complaints over time (monthly).
The safety-related complaints were divided into four areas: hygiene, traffic safety, lighting facilities, and disaster prevention. Each ratio is computed for each town from the total number of complaints filed in the relevant month. The hygiene complaint ratio is the share of hygiene-related complaints among the total complaints per month. In the same way, the traffic safety complaint ratio is the share of civil complaints related to traffic safety among all civil complaints. The lighting facility complaint ratio is the share of security-related civil complaints in the month in which the complaints occurred. The disaster complaint ratio is the share of complaints related to general safety, such as floods, in that month.
Table 2 shows that the average proportions of hygiene-related complaints and complaints about lighting facilities are extremely low, at 0.42% and 0.51%, respectively. The average proportion of disaster-related complaints is higher at 2.82%, and the average proportion of traffic safety complaints is the highest at 12.46%. On average, about half of the citizens that lodged the complaints are female and below 40 years old. An average of 28.8% owned cars during the survey period. The average rate of internal migrants within the sample is 2.41%, and the average rate of external migrants is 1.01%. On average, about 0.836 complaints are lodged per person per month.
We analyzed the variables derived from the civil complaints through regression analysis to determine the factors explaining the prevalence of different types of complaints (Table 3). The FE analysis for hygiene-related complaints (H) shows that the proportions of females and of internal and external migrants, and the total ratio of civil complaints, do not have a significant effect on the hygiene-related complaint ratio. On the other hand, the proportion of people under 40 years old has a statistically significant positive effect on the hygiene-related civil complaint ratio (b = 0.0264, SE = 0.0065, p < 0.01). Car ownership also has a positive effect on the hygiene-related complaint ratio (p < 0.05). The period of lodging complaints (i.e., summer, winter) had a statistically significant negative effect on the hygiene-related complaint ratio. The additional RE model analysis shows that residence in new towns does not have a statistically significant effect on the hygiene-related complaint ratio. These results suggest that the higher the proportion of people under 40 years old and the higher the economic status of the town residents, the more likely they are to file hygiene-related complaints. In addition, hygiene-related complaints are more common during the spring than in other seasons. However, there are no differences in the complaint rate between new and old towns.
For traffic safety complaints (T), the results suggest that the higher the proportion of women, the greater the inflow of external migrants, and the higher the economic status of the residents, the lower the traffic safety complaint ratio. Traffic safety complaints were more common in the winter, and the traffic safety complaint ratio rose as the number of civil complaints relative to the population increased.
The FE model for lighting facilities complaints (L) shows that the proportion of women has a negative effect (b = 0.206, SE = 0.052, p < 0.01) on the ratio of civil complaints related to lighting facilities. The proportion of the younger generation has a positive effect on the lighting-related complaint ratio (b = 0.035, SE = 0.007, p < 0.01). Therefore, there are relatively few civil complaints related to lighting facilities in towns with more women and relatively many in towns with many young people. The ratio of complaints related to lighting facilities is high in towns where the number of civil complaints per population is high, due to the negative influence of the complaints on perceived safety. Among the seasonal variables, summer has a negative effect, while winter has a positive effect, which means that complaints about lighting facilities are relatively frequent in the winter. The RE model does not reveal any significant difference in the complaints related to lighting facilities between new and old towns. However, external migration has a statistically significant positive effect (b = 0.0735, SE = 0.022, p < 0.01). Thus, the higher the proportion of external migrants, the higher the number of complaints about lighting facilities.
Notes: Standard errors are in parentheses. The Hausman test is for the null hypothesis that the difference in coefficients is not systematic (i.e., the coefficients are equal). The POLS test is the Breusch and Pagan Lagrangian multiplier test for random effects, which tests the null hypothesis of a common intercept (i.e., that pooled OLS is appropriate). *** p < 0.01. ** p < 0.05. * p < 0.10.
The RE model analysis for disaster-related complaints (D) shows that the proportion of females (b = 0.339, SE = 0.133, p < 0.05) and new towns (b = 3.464, SE = 0.482, p < 0.01) have a statistically significant positive effect on the ratio of disaster-related civil complaints. The proportion of people below 40 years old (b = −0.184, SE = 0.026, p < 0.01) and external migrants (b = −0.303, SE = 0.111, p < 0.01) have a negative effect, whereas the seasonal variables have a positive effect. Therefore, for overall disaster-related civil complaints, relatively high rates are found in new towns and towns with a high proportion of women. Relatively low complaint rates are found in towns with a large number of young people and external migrants. The overall prevalence of these complaints is higher in most seasons apart from spring.
We conducted t-tests to determine whether demographic characteristics and the different types of complaints (e.g., hygiene-related, traffic safety, lighting facility, and disaster-related) differ between new and old towns. We considered Hopyung-dong, Pyungnae-dong, and Byulnae-dong as new towns, and Joan-myeon, Soodong-myeon, and Chingun-eup as old towns. The t-test results in Table 4 suggest that the proportions of women, young people, and both internal and external migrants are significantly higher in new towns than in old towns. On the other hand, car ownership is generally higher in old towns. As a proxy for economic status, we measured car ownership per population rather than per household. (Car ownership has limitations as an economic indicator. However, in South Korea, automobiles are counted as part of income and property, which are the standard metrics for calculating the premiums of regional members of the national health insurance (Enforcement Decree of the National Health Insurance Act, § 42). Therefore, given data constraints, we used the car ownership ratio as a proxy for regional income levels.) The total number of complaints is higher in new towns than in old towns, although it is lower in Pyungnae-dong (a new town) than in Chingun-eup (an old town). The number of complaints per population, however, is significantly lower in new towns than in old towns. In other words, new towns, which are characterized by large numbers of women, young people, and external migrants, file more complaints in total but fewer complaints per population because of population differences. Compared to old towns, new towns have more disaster-related complaints, hygiene-related complaints, and complaints related to lighting facilities, whereas traffic safety complaints are less common.
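The new-town versus old-town comparison can be sketched as a two-sample t-test. The example below assumes Welch's unequal-variance form, since the text does not state which variant was used, and the per-town values are hypothetical placeholders, not the figures in Table 4.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# HYPOTHETICAL complaints per 1,000 residents (illustrative only, not the study's data)
new_towns = [1.0, 2.0, 3.0]  # stand-ins for Hopyung-dong, Pyungnae-dong, Byulnae-dong
old_towns = [4.0, 5.0, 6.0]  # stand-ins for Joan-myeon, Soodong-myeon, Chingun-eup

t_stat, dof = welch_t(new_towns, old_towns)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative statistic here means that the first group (new towns) has the lower mean. With only three towns per group the degrees of freedom are small, so tests of this kind have limited power.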

Synthesis of Main Findings
Previous studies have mostly taken a top-down approach, using big data held by national governments, and have addressed issues in specific areas of urban policy such as transportation, public safety, and sustainability [10,20,26,27]. However, few studies have emphasized the general applicability of civil complaint data to inform urban policies [24,33]. In particular, there have been limited efforts to reap the benefits of civil complaint data for urban policy making, although such work would be key in developing real-world applications. This study aimed to narrow this research gap by using civil complaint data at the town level to represent the policy demand and priorities of citizens.
Using civil complaint data from Namyangju, a satellite city in South Korea, as an example, we showed that it is possible to analyze unstructured, non-spatialized, and non-specialized data using big data analytical techniques. Such analyses could provide powerful and cost-effective tools for formulating policy recommendations that identify and solve social problems, as well as for customizing policy priorities for each town. This could help overcome the limitations and higher cost of conventional studies on policy demand identification, which rely on the top-down and government-oriented use of structured big data [40].
Specifically, using text mining, we analyzed the linguistic and semantic characteristics of the civil complaints to determine which policy needs for safety and disaster management they express. The panel analysis then identified the socioeconomic factors that influence this policy demand. The main results of this two-stage analysis show that some specific factors do seem to affect the probability of complaint articulation.
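The first stage, keyword extraction and categorization, can be illustrated with a simple frequency count. This is a minimal sketch under stated assumptions: the complaint texts and keyword lists are hypothetical English stand-ins, and real Korean-language complaints would need a morphological analyzer rather than the regular-expression tokenization used here.

```python
import re
from collections import Counter

# HYPOTHETICAL keyword dictionary; category names follow the complaint types discussed in the text
CATEGORY_KEYWORDS = {
    "hygiene": {"garbage", "odor", "dust"},
    "traffic_safety": {"crosswalk", "speeding", "parking"},
    "lighting_facility": {"streetlight", "cctv"},
    "disaster": {"flood", "landslide", "snow"},
}

def categorize(text):
    """Return a Counter of keyword hits per category for one complaint text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits[category] = sum(token in keywords for token in tokens)
    return hits

# HYPOTHETICAL complaints (illustrative only, not taken from the Namyangju data)
complaints = [
    "The streetlight near the park is broken and there is no CCTV.",
    "Cars are speeding past the school crosswalk.",
    "Flood water blocked the road after heavy rain.",
]

totals = Counter()
for complaint in complaints:
    totals.update(categorize(complaint))

# Share of keyword hits falling in each category
ratios = {c: totals[c] / sum(totals.values()) for c in CATEGORY_KEYWORDS}
print(ratios)
```

Category ratios of this kind, computed per town and period, are the sort of quantity that a panel stage could then relate to town characteristics.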
First, in towns with a high proportion of women, the rate of civil complaints related to hygiene, closed-circuit television (CCTV), and street lighting is low. Specifically, the rates of complaints about hygiene, traffic safety, and lighting facilities were low in towns with many female residents. On the other hand, the rate of complaints about disaster-prevention infrastructure was high in new towns such as Byulnae-dong, Hopyung-dong, and Pyungnae-dong.
This result was the opposite of what we expected. We had hypothesized that the larger the female population, the higher the policy demand for street lighting and hygiene. However, the result reflects the characteristics of Namyangju's new towns, which have large female populations. The new towns analyzed in this study were already well designed for safety, so policy demand was expressed as interest in disaster-related facilities rather than in street lighting and hygiene-related facilities.
Second, when the types of civil complaints are analyzed by age, the proportion of complaints related to hygiene, traffic safety, and lighting facilities is higher in towns with a high proportion of young people. In these towns, demand for traffic safety, security, and everyday safety is high, while policy demand for overall safety management, such as for natural disasters, is relatively low [41]. Relatively young people thus express high policy demand for safety infrastructure. This shows high demand for safety-related facilities that are needed in daily life rather than for the prevention of possible future events such as disasters.
Third, when looking at the types of complaints by type of migrant population, we found that (a) the more people move within Namyangju, the higher the demand for traffic safety; and (b) the more people move in from outside Namyangju, the greater the demand for security relative to traffic safety or natural disasters. This suggests that security is an important reason for the influx of people from Seoul to Namyangju [41]. Finding (a) reflects policy demand for safety measures where the traffic situation has become relatively inconvenient because of the influx of migrants from the city. Finding (b) indicates that the policy needs of those who have migrated from outside focus on necessary security-related facilities such as street lighting rather than on well-planned transportation.
Fourth, when analyzing the types of complaints by income level, we found that hygiene and disaster-prevention complaints are more likely where the car ownership rate is higher. On the other hand, the ratios of complaints related to traffic safety and lighting facilities are low. This shows that citizens with higher income levels express a higher demand for safety management related to hygiene and natural disasters than for traffic safety and security [41]. Migrants from outside have relatively higher income levels than natives. In particular, some people who have moved to Namyangju have families with kindergarten or elementary school students. Their complaints express expectations of high hygiene standards linked to the natural environment, citing concerns such as fine dust and dermatitis.
Fifth, town characteristics also seem to affect the rate of safety-related complaints. The t-test results reveal that the articulation of safety complaints is higher in new towns with large numbers of women, younger people, and migrants. On the other hand, new towns have relatively lower articulation of traffic complaints than old towns because they largely have an adequate transportation system, as they are in a more mature stage of urban development. This suggests that urban safety policy needs to cover the construction of traffic-related infrastructure at an early stage. The above results also imply that policy priorities should be adjusted according to local sociocultural characteristics such as the age, gender, and economic status of the residents, as well as the proportion of migrants.
Finally, we should note that the overall framework presented in this paper is not limited to specific data (i.e., the civil complaint data) or methodologies (i.e., text mining and panel analysis) but is applicable to any unstructured and readily available data. It can be extended with more advanced big data analysis techniques such as sentiment analysis, supervised learning, and data fusion and integration [45]. Such techniques would allow governments to identify more specifically whether citizens' views on particular policy needs are positive or negative. In addition, they could increase the speed and accuracy with which large volumes of data are analyzed.

Implications for Local Governments
This study discussed how local governments can use complaints to understand real policy demand and make customized policies to enhance urban sustainability. First, our results show high variability in the types of civil complaints due to socioeconomic, geographic, and other factors. They confirm that policy demand varies with gender, age, income, period of stay, and whether a town is new or old. Despite these diverse citizen needs, the city government has made policy in a single, top-down manner. This increases the inefficiency of policy formation and undermines the sustainability of cities because, if cities do not meet the needs of their citizens, those citizens can move to another city. Our results therefore suggest that policy demand might vary significantly within cities, and thus a uniform approach to urban policy formulation might fail to take into consideration important context-specific problems. The use of civil complaint data can offer a powerful lens to identify more context-specific issues and assist local governments in developing appropriate interventions to tackle them.
Second, local governments should have the skills and capacity to use citizen data, including complaint data, effectively. As complaint data accumulate in real time, local governments should invest in technology for data storage and utilization. Similarly, efforts should be made to create or reorganize appropriate data collection and utilization departments within local government structures, possibly by investing in data expertise.
Third, there is a need to create appropriate and ethical data collection and sharing mechanisms for local governments. For example, the national government of South Korea currently owns the data of local governments. When local governments need citizen data, they must get approval from the central government. This procedure increases the time and budget needed for urban policy making based on citizen data, thus reducing the efficiency of policy-making processes. As a result, local governments often adopt unnecessarily repetitive policies that merely supplement existing policies rather than innovative ones [42].

Conclusions
This research makes two major contributions. First, this study demonstrated that using computational tools such as text mining to analyze large volumes of text data allows city governments to uncover citizens' latent policy demand, enabling a richer analysis. Specifically, this research involved the analysis of big data on citizen complaints, which reflect what urban residents desire from their local governments across a wide range of issues [43]. The unstructured civil complaints regarding safety and disaster management served as the basis for investigating policy demand. To this end, text mining was performed to extract major keywords associated with the safety and disaster management matters raised in the civil complaints. A panel analysis was also carried out to ascertain the effects of the characteristics of 16 locally governed towns on residents' policy demands regarding safety and disaster-related complaints. This examination revealed that differences in such characteristics are related to population change arising from urban development, which, in turn, is connected to population differences between newly founded towns and existing localities. On this basis, policy priorities should be adjusted in accordance with local sociocultural characteristics.
Second, with regard to civil complaint data, this study illustrated how local governments can use citizens' complaints. Such data are increasingly common (see, for example, http://change.org, http://gopetition.com/, http://avaaz.org [46,47]), yielding vast numbers of texts that now appear to be amenable to analysis. Questions about the linguistic and semantic features of texts that are relevant to the analysis of citizens' complaints should continue to be explored. Such analysis provides a promising approach for capturing emergent policy demands within these bodies of text and lends itself to continuing refinement in the discovery of local issues. Civil complaint data can serve as a useful foundation for policy recommendations that are designed to eliminate social problems and customize policy priorities for local governments.
We believe that explorations along these lines will yield useful strategies to address unresolved issues such as the diverse complaints of citizens, to enhance citizen participation in policy making, and to promote sustainable local governments.
Author Contributions: E.L. and J.S. conceived and designed the research study and analyzed empirical findings; E.L. conducted text mining, and J.S. coded panel data and analyzed statistics. K.S.K. and S.L. checked and clarified the results and made major revisions to the manuscript. V.H.P. contributed to data gathering and summarization from a database perspective. All authors read and approved the final manuscript.
Funding: This research received no external funding.