AI Augmented Approach to Identify Shared Ideas from Large Format Public Consultation

Public data, contributed by citizens, stakeholders and other potentially affected parties, are becoming increasingly used to collect the shared ideas of a wider community. Having collected large quantities of text data from public consultation, the challenge is often how to interpret the dataset without resorting to lengthy, time-consuming manual analysis. One approach gaining ground is the use of Natural Language Processing (NLP) technologies. Based on machine learning technology applied to the analysis of human natural languages, NLP provides the opportunity to automate data analysis for large volumes of text at a scale that would be virtually impossible to analyse manually. Using NLP toolkits, this paper presents a novel approach for identifying and visualising shared ideas from large format public consultation. The approach analyses grammatical structures of public texts to discover shared ideas from sentences comprising subject + verb + object and verb + object that express public opinions. In particular, the shared ideas are identified by extracting noun, verb and adjective phrases and clauses from subjects and objects, which are then categorised by urban infrastructure categories and terms. The results are visualised in a hierarchy chart and a word tree using cascade and tree views. The approach is illustrated using data collected from a public consultation exercise called “Share an Idea” undertaken in Christchurch, New Zealand, after the 2011 earthquake. The approach has the potential to upscale public participation to identify shared design values and associated qualities for a wide range of public initiatives including urban planning.


Introduction
Public data, contributed by citizens, stakeholders and other potentially affected parties, are becoming increasingly used to collect the shared ideas of a wider community. Online surveys with questions and answers powered by e-participation tools are being used to identify the public's new ideas, preferences and opinions through public consultation exercises [1]. For example, "The Quality of Life Survey" undertaken recently in New Zealand asked more than 7000 residents for their opinions on a wide range of different aspects of urban life in four major cities [2]. Likewise, social media, such as Twitter, is another data source to observe public response to news events, such as an earthquake or presidential election, and collect public opinions [3][4][5]. Once ideas are collected from public data, the policy formulation process is undertaken to map the individual ideas of self-interests to the shared values of public interests, and then translate into action plans [6][7][8]. However, large format data from public consultation exercises are often organised and analysed manually for reporting to relevant parties or published online for public access. For example, this was the case for the "Share an Idea" public consultation exercise by the Christchurch City Council after the 2011 earthquake. The interpreted results were reported as "Common Themes" as a guide for rebuilding Christchurch Central City [9]. Not surprisingly, manual interpretation of large format data sets can be a slow and laborious process and even infeasible when large volumes of public data are involved [10]. Natural Language Processing (NLP) technologies provide the opportunity to automate a large proportion of data analysis when processing large volumes of text data at a scale that would be difficult to achieve manually, such as identifying shared ideas from online surveys. 
In practice, NLP comprises a set of analytical methods and tools that facilitates computational interpretation of written languages by being trained on large datasets [11]. NLP has been widely used in modern life in the area of machine translation [12,13], smart search engines accepting long-form and complicated queries [14], high-quality summary generation from newspapers, product or movie reviews and restaurant ratings analysis [15][16][17][18]. Not surprisingly, various applications involving NLP have increasingly appeared in public service domains, such as analysing Twitter messages to identify opinions about the quality of public services [19,20] or to communicate emergency alerts to citizens such as during Hurricane Sandy in the USA [21,22]. NLP-empowered analysis shines in examining the public's attitudes towards events (climate change, electricity price, political protest, etc.) and helping policy makers understand public concerns and interests for policy formation [23][24][25]. Similar examples can be found for healthcare applications, where NLP is applied to process health-related texts from social media and extract the medical insights, such as patient self-treatment experiences (symptoms and therapies) [26].
This paper describes a novel approach to augment large format public consultation analysis using NLP technologies in order to identify people's shared ideas about future needs of urban infrastructures. The use of NLP in this manner is still in its infancy, although there are examples where it has been used for topic modelling [27] or sentiment analysis [18], which provides a general overview of public opinions at coarse-grained level. In comparison, our aim was to achieve a greater textual understanding at fine-grained granularity of people's shared interests. In this instance, the shared interests were analysed, aggregated and interpreted using NLP to reveal preferences of soft and hard urban infrastructures [28]. This paper is part of ongoing development of the "Urban Narrative" project [29] that aims to aggregate individual contributions into common narratives to foster collaborative problem solving as advocated by Nabatchi [30].
The approach is illustrated using a dataset obtained from the Christchurch "Share an Idea" initiative following the 2011 earthquake, when the City of Christchurch (NZ) sought citizens' views on priorities for the future redevelopment of the city to improve livelihood and liveability post-earthquake.

Shared Ideas in Public Consultation
"Ideas" expressed online as short written or spoken messages communicate people's thoughts and priorities. These are sometimes short-term issues or long-term concerns. These grassroot "ideas" can frequently offer valuable insights into "thorny" public problems that can yield innovative solutions grounded in underlying cultural beliefs and values systems [7]. Evidence shows that these "shared ideas" can play a crucial role in helping to shape public policymaking [30]. These "shared ideas" have at times gone on to create "world images" by determining political agendas and influencing future actions at times on a global scale [31]. Scholars [32,33] have highlighted the important role played by "shared ideas" when drawing attention to people's needs and priorities that in turn can be used by local or national governments to shape public policy. For example, "Share an Idea", conducted by Christchurch City Council, was used to collect grassroot ideas for rebuilding in the central city district following the 2010 and 2011 earthquakes. The community-wide conversation comprised a combination of online web-based forum together with face-to-face workshops with more than 100 community groups. In total, over 10,600 ideas were collected across a wide range of interests covering public space utilisation, transport choices, city life, marketplaces, earthquake memorials, etc. The shared ideas were manually analysed by the City Council and aggregated into four urban themes, namely space, movement, market and life. In turn, these broad themes were employed to produce high-level principles for rebuilding the central city district.
In recent years, similar digital platforms have been employed to upscale public participation, such as European eParticipation projects [1] that utilise information and communication technologies to develop online surveys or e-petition/e-voting systems for citizens to express their opinions and ideas online (such as the systems named "We the People", "UK Parliament Petition" and "European Parliament Petition"). Social media (Twitter, Facebook, Instagram, etc.) have been popular data sources for observing public opinions and perceptions. For example, the case study by Kirilenko and Stepchenkova [34] collected millions of Twitter messages to understand the public's concerns about climate change. The interactivity of social media has enabled the public sector to engage more frequently with citizens to co-produce public services and public policy. The case studies by Criado and Villodre [19] and Mergel [20] demonstrated strategies for using Twitter to increase satisfaction with public service delivery. Owing to their wide availability, user-friendly interactivity and fast-growing data, Twitter and other social media are widely considered by research communities to be a useful source of public opinion, as an alternative to survey-based public consultation exercises. However, the success of public participation ultimately depends on allowing shared decision-making, as highlighted by Nabatchi [30].

Identifying Shared Ideas with Natural Language Processing
Data from social media and government-sponsored e-participation platforms are being collected and analysed on a massive scale. However, the data often originate in unstructured form from several different sources (texts, web pages, PDF, Word documents, etc.), written in several different human languages (English, Chinese, Spanish, etc.) and often containing numbers, dates, names of things, URLs, hashtags and quotes. This diverse, unstructured format poses serious challenges for analysis and interpretation using traditional analytic techniques. To address these challenges, Natural Language Processing (NLP) techniques have been increasingly used to process public data in order to gain insights into public opinions about public services [35]. The analysis has often taken the form of Topic Modelling and Sentiment Analysis, where Topic Modelling extracts a number of topics from the texts and Sentiment Analysis detects public moods as positive or negative. Other techniques include Named Entity Recognition, which detects the names of public services in tweets to help improve the quality of public administration [10]. However, Topic Modelling and Sentiment Analysis are the most relevant to our research.
Topic Modelling identifies the topics in a large body of texts, which are then classified and organised by topic [27]. For example, Hagen et al. [36] employed the LDA (Latent Dirichlet Allocation) topic modelling technique to detect petition patterns in the "We the People" e-petitioning system. In this case, the top 20 issues detected from public petitions concerned topics familiar to the public (e.g., freedom of religion) or related to important social events (e.g., gun control), whereas less familiar topics such as student visas and a variety of children's health care issues (breastfeeding, abduction, baby care) received less attention from the public. Likewise, Lock and Pettit [37] employed LDA models to analyse online surveys in order to identify the factors that influence customer satisfaction with public transit systems. However, LDA has drawbacks, chiefly the significant effort required to fine-tune the model and the manual interpretation needed to select meaningful topics from automatically generated topic candidates.
In comparison, Sentiment Analysis assists the interpretation and understanding of public preferences on key issues, ranging from awareness of climate change to protest movements, by detecting the mood of opinions as "positive", "negative" or "neutral". For example, Kirilenko et al. [38] demonstrated the concept of "people as sensors" by measuring public attitudes towards climate change from Twitter messages. Likewise, Raza et al. [24] identified public sentiment towards the 2019 Azadi political protest in Pakistan. However, Sentiment Analysis is limited by the ambiguity introduced by word order, slang and context (e.g., "don't like car"), so several sentiment analysis approaches (such as lexicon-based, rule-based or neural language models) often need to be combined to achieve a balanced understanding of the sentiments [17,18,39].
To overcome some of these limitations, the two methods have been used in tandem to evaluate public satisfaction with large-scale infrastructure projects, where public satisfaction is measured with sentiment analysis and topic modelling is used to identify the causes of the different sentiments [25,37,40,41,42]. At the same time, the performance of NLP has been benefiting from ongoing advances in deep learning techniques [43].
Even so, researchers still face a number of challenges when using NLP to analyse public data for opinion gathering. As reported by Kuflik et al. [44], analysing transport-related data collected from Twitter requires a domain ontology that defines the domain-specific terms (such as bus, car, traffic, road, park, etc. for transport) in order to identify and filter the relevant messages from a large amount of Twitter data. Building a comprehensive ontology is difficult because it requires domain expertise to provide all the relevant terms. Twitter messages are also short and informal in nature, containing hashtags, URL addresses and slang, which affect the accuracy of NLP tools. As a result, most NLP tools produce poor-quality analysis results on Twitter data because they are designed and tuned for more formal written texts.
In this project, we take a different approach to identifying shared ideas in public consultation from Topic Modelling, which relies on identifying a set of topic words, and Sentiment Analysis, which captures public feeling towards an event or opinion. Instead, we use state-of-the-art NLP tools to analyse public texts and extract their linguistic features (part-of-speech tags, phrases, sentence structures), from which shared ideas are identified and grouped in terms of soft and hard urban infrastructures, or as shared interests compared to fixed positions. Two visualisation methods (Cascade View and Tree View) are developed to view shared ideas hierarchically or in fine detail. These differ from the graphs used by Topic Modelling or Sentiment Analysis, which provide a general overview or the trends of public opinions. The detailed methodology is described in the next section.

Methodology
Sentiment Analysis offers coarse-grained perspectives on public attitudes, and Topic Modelling identifies individual topic words such as "bus", "train" or "tram" to describe public transport and "travel", "time" or "delay" to detect traveller complaints. As an alternative, our study focuses on developing a methodology that provides a fine-grained analysis of public texts using grammatical sentence structures. Such analysis examines the subject, verb and object of sentences that take the form subject + verb + object or verb + object, such as "I wanted little mini electric buses for transport" and "I believe the river is an absolute asset to our city". The methodology involves six steps as shown in Figure 1. In Step 1, the public text is cleaned to make it suitable for NLP tools. In Step 2, the texts are split into individual sentences, from which linguistic features (such as the part-of-speech of words, phrases, subjects and objects and their relations) are extracted.
Step 3 selects sentences that follow the grammatical patterns subject + verb + object and verb + object, and extracts their subjects, objects and verbs to generate people's ideas.
Step 4 summarises the generated ideas by urban infrastructure category (such as Transport, Space, Building, People).
Step 5 groups the ideas into shared interests and fixed positions by using the text analysis.
Step 6 presents the grouped ideas in a set of visual and interactive charts and graphs to facilitate communication and exploration.
The methodology was developed using the public texts extracted from Christchurch City Council's "Share an Idea" public consultation exercise, which took place over six weeks following the earthquakes in 2010 and 2011. The community-wide conversation involved more than 1000 people with over 106,000 ideas about how to rebuild Christchurch City Centre, collected from a variety of online and face-to-face public engagement activities, including community workshops, forums, the "Share an Idea" website, surveys and social media. The results were published in a report entitled "Christchurch Common Themes" [9]. The report itself reproduced approximately 2795 quotes (4636 sentences and 64,981 words) as a distillation of the original database. For the purposes of this article, the Christchurch Common Themes Dataset is abbreviated as CCTD.

Step 1: Data Cleaning
Step 1 removes the repeated texts in CCTD and cleans up non-uniform formatting and inconsistent use of symbols and punctuation that would otherwise reduce the accuracy of text processing. In particular, tabs and duplicated white spaces are replaced with a single space, back quotes (`) and double quotes (") with single quote marks, and hyphens (-) with commas to unify the symbols.
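The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual program: the function names `clean_text` and `drop_repeats` are hypothetical, and the sketch assumes that only free-standing hyphens (used as dashes) are turned into commas, since hyphenated words such as eco-friendly appear intact in later examples.

```python
import re


def clean_text(text: str) -> str:
    """Normalise quotes, dashes and white space before NLP processing."""
    # Back quotes and double quotes (straight or curly) become single quotes.
    text = re.sub(r'[`"\u201c\u201d]', "'", text)
    # Assumption: only hyphens surrounded by white space are replaced by commas.
    text = re.sub(r"\s+-\s+", ", ", text)
    # Tabs and runs of white space collapse to a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def drop_repeats(quotes):
    """Remove repeated quotes (after cleaning), keeping first-seen order."""
    seen = set()
    unique = []
    for quote in quotes:
        cleaned = clean_text(quote)
        if cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique
```

Comparing quotes after cleaning means that two submissions differing only in spacing or quote style are treated as the same text.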

Step 2: Linguistic Feature Extraction
In Step 2, the Stanford CoreNLP toolkit [45], a deep learning-based natural language processing toolkit, is used to analyse and extract linguistic data. First, CCTD texts are split into individual sentences. The text "I want more public transport. Pedestrians take predominance." becomes two sentences: "I want more public transport." and "Pedestrians take predominance." Then the sentences are tokenised and Part-Of-Speech (POS) tags are assigned to individual words. Figure 2 shows the tagged version of the sentence "I want more public transport." with POS tags at the top, where PRP stands for personal pronoun, VBP for verb, JJR for comparative adjective, JJ for adjective, and NN for noun. Finally, four types of linguistic data are extracted and stored as the metadata (i.e., data that describes the data) of a sentence. The linguistic metadata comprise the following types of POS, phrases, clauses and sentence structures:
• Verbs, adjectives, nouns and pronouns. As shown in Figure 2, want is a verb and transport is a noun, which are identified from the POS tags associated with each word. The POS tags also show that more and public are adjectives that modify the noun (transport), and that I is a pronoun, a word that refers to persons or things (she, we, they, you, it, me, her, us, them, that, those, these, etc.).
• Noun, verb and adjective phrases. Phrases are extracted using the Stanford Constituency parser [46]. A basic noun phrase comprises at least one noun (e.g., transport), which is often modified by other nouns or adjectives (e.g., more public in Figure 2). A noun phrase can itself contain other noun phrases, as shown in the following sentence, where green spaces and beautifully colourful gardens is a noun phrase made up of the two noun phrases green spaces and beautifully colourful gardens.

I want green spaces and beautifully colourful gardens.
A verb phrase constitutes a verb and its dependents that can be a noun, adjective or verb phrase. As shown in the following text, the main verb is have, followed by the noun phrase free Wi-Fi and the verb phrase to encourage tourists and travellers to linger that in turn contains the main verb encourage, the noun phrase tourists and travellers and the verb phrase to linger.

Central cities should have free Wi-Fi to encourage tourists and travellers to linger.
An adjective phrase is a sequence of words with a head adjective. In the first example below, the adjective phrase is very eco-friendly with eco-friendly as the head adjective. In the second example, totally accessible for bikes is an adjective phrase with accessible as the head adjective.
Every building must be very eco-friendly.
The street layout should be totally accessible for bikes.
• Object clause. In English, an object clause acts as the object of a verb. In the following sentences, that the clubs should be more classy and that CBD should be of high density with medium rise buildings are the object clauses of the verbs think and believe, respectively.
I think that the clubs should be more classy.
I believe that CBD should be of high density with medium rise buildings.
• Sentence structure. A complete sentence typically has the grammatical structure Subject + Verb + Object, as illustrated in Figure 3. The subject is usually a noun phrase or a pronoun (e.g., I, we, you) and the object can be a noun, verb or adjective phrase, or a clause. This includes sentences that convey a command, called imperative sentences, as illustrated in Figure 2. This type of sentence always takes the second person (you) as the subject, but most of the time the subject remains hidden. The Stanford NLP Open Information Extractor [47][48][49] makes it easy to identify sentence structures. However, not all sentences in the Christchurch Common Themes Dataset (CCTD) were complete sentences with the grammatical structure subject + verb + object or verb + object. Instead, the public also submitted incomplete sentences, including suggestions such as quality and affordability, great central city apartment living, a safe environment, car-free shopping streets, were quite intimidating, don't want, too good to lose, close second, not so stiflingly conservative, sustainable. These suggestions evidently represent public ideas on the design values or design attributes wanted for the central city development plan for Christchurch. However, they were not considered in this study because the focus was on compiling automated methods of analysis, and the unstructured suggestions would have required manual interpretation at this stage. Nevertheless, the incomplete sentences will be the focus of a later study, given their obvious importance.
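To make the phrase-level metadata listed above more concrete, the sketch below groups adjacent adjective and noun tokens into base noun phrases, starting from tokens that have already been POS-tagged. It is a deliberately simplified stand-in for the constituency parser used in the study, not a call into Stanford CoreNLP; the function name `noun_phrases` is hypothetical, and the tags are the Penn Treebank tags quoted for Figure 2.

```python
def noun_phrases(tagged):
    """Group runs of adjective (JJ*) and noun (NN*) tokens into base noun phrases.

    `tagged` is a list of (word, POS tag) pairs. A run is kept only if it
    contains at least one noun.
    """
    phrases = []
    current = []
    has_noun = False
    for word, tag in tagged:
        if tag.startswith("JJ") or tag.startswith("NN"):
            current.append(word)
            has_noun = has_noun or tag.startswith("NN")
        else:
            # Any other tag closes the current run.
            if has_noun:
                phrases.append(" ".join(current))
            current = []
            has_noun = False
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


# Tagged version of "I want more public transport" (tags as quoted for Figure 2).
figure2 = [("I", "PRP"), ("want", "VBP"), ("more", "JJR"),
           ("public", "JJ"), ("transport", "NN")]
print(noun_phrases(figure2))
```

A flat grouping like this cannot recover nested phrases (for example, a coordinated noun phrase containing two smaller ones), which is one reason a full constituency parse is preferable in practice.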

Step 3: Identifying Core Words to Characterise Thematic Urban Infrastructures of Importance to the Public Using Linguistic Metadata
In Step 3, the grammatical structures of the sentences from Step 2 are further examined to investigate the relationships within subject + verb + object or verb + object. In the examples given below from the CCTD dataset, the words I and CBD are subjects connected to the objects 'green spaces and beautifully colourful gardens' and 'of high density with medium rise buildings' by the verbs want and be, respectively. This allows us to begin formulating a public narrative based on ideas from public consultation. In this case, the corresponding common or proper nouns used to label different components of urban infrastructure (spaces, gardens, CBD and buildings) are compiled as core words to characterise thematic urban infrastructures of importance to the public.
I want green spaces and beautifully colourful gardens.
CBD should be of high density with medium rise buildings.
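The idea behind Step 3 can be illustrated with a toy heuristic that works on POS-tagged tokens: the first verb splits the sentence into subject and object, and the nouns on either side are collected as candidate core words. This is only a sketch under simplifying assumptions, not the Open Information Extractor used in the study. The names `split_svo` and `core_words` are hypothetical, and because every noun is kept, the sketch can return more candidates than the curated core words listed above.

```python
def split_svo(tagged):
    """Split (word, tag) pairs at the first verb.

    Returns (pattern, subject, verb, object), or None when there is no verb
    or nothing follows it (an incomplete sentence). Modal auxiliaries (MD)
    are not treated as the verb and are dropped from the subject.
    """
    verb_index = next(
        (i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")), None
    )
    if verb_index is None or verb_index == len(tagged) - 1:
        return None
    subject = [(w, t) for w, t in tagged[:verb_index] if t != "MD"]
    verb = tagged[verb_index][0]
    obj = tagged[verb_index + 1:]
    pattern = "subject + verb + object" if subject else "verb + object"
    return pattern, subject, verb, obj


def core_words(tagged_span):
    """Collect common and proper nouns (NN*) as candidate core words."""
    return [word for word, tag in tagged_span if tag.startswith("NN")]


sentence = [("I", "PRP"), ("want", "VBP"), ("green", "JJ"), ("spaces", "NNS"),
            ("and", "CC"), ("beautifully", "RB"), ("colourful", "JJ"),
            ("gardens", "NNS")]
pattern, subject, verb, obj = split_svo(sentence)
print(pattern, verb, core_words(obj))
```

Sentences for which `split_svo` returns None correspond to the incomplete sentences that were set aside for later study.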

Step 4: Categorising Ideas from Public Consultation in Terms of Soft and Hard Urban Infrastructures
In Step 4, a framework for thematic urban infrastructures is adopted based on the notion of soft and hard urban infrastructures in Table 1 proposed by Dyer et al. [28]. The term "soft" refers to public administrative, organisational and social structures present in a city, whereas the term "hard" describes the physical components of a city that enable the soft infrastructure to function. The framework allows core words extracted from the text analysis in Step 3 to identify the type(s) of soft and hard urban infrastructures highlighted by public consultation. This cataloguing of public "ideas" as soft or hard urban infrastructures enables the desired design attributes or design qualities to be further explored by examining relationships expressed between subjects and objects commonly using adjectives, e.g., green spaces, colourful gardens, medium rise buildings.

Table 1. Soft and hard urban infrastructures after Dyer et al. [28].

Utilities
Utilities are considered to be physical services such as transportation, water and waste systems, ICT, etc. These utilities connect and operate equally across all urban scales, including national and international interconnectivity.

Urban Space
Urban spaces are considered largely as bounded space, in the form of streets, urban plazas or local squares, playgrounds, parks, etc. Urban space is typically identifiable at the neighbourhood or district scale, depending on the nature of the open space and pattern of land ownership.

Buildings
Buildings are considered to be architectural space defined as single or grouped buildings forming part of an urban block. This will include dwellings, educational buildings, healthcare buildings, etc.

Institutional
Institutional infrastructure refers to public and private systems which provide certain services within the city such as local government, legal frameworks including land ownership, healthcare services, or educational services. It may also include sporting, art and culture, or official community support organisations. These institutions are typically top-down and more formal in nature.

Community
Community infrastructure refers to formal and informal networks, community or local business groups that occur within neighbourhoods or districts. These infrastructures rely on bridging and linking social capital. While "Communities of Interest" or online communities may not be location specific, many community organisations will relate to a specific physical community delineated by political, parish or physical boundaries (a river, large street, etc.). In this regard community infrastructures will often operate within the district scale and arguably at a more identifiable level at the neighbourhood scale.

Personal
Personal infrastructure refers to the support systems a person will have at an individual, family, or friendship level. This will often involve bonding social capital where membership of a family or social group is critical to a sense of belonging. It will also include educational attainment and other support systems that occur at an individual level.

Step 5: Categorising Ideas as Shared Interests Compared to Fixed Positions
A parallel step in the analysis is to differentiate between shared interests and fixed positions as expressed through public consultation. As highlighted by Nabatchi [30], successful public consultation depends on creating collaborative environments whilst avoiding adversarial ones. Collaborative environments are those where participants willingly work together towards a common goal based on shared interests (often based on shared values), as opposed to fixed positions that potentially lead to conflict. The design of the public consultation exercise itself can influence how participants frame contributions. The Christchurch "Share an Idea" public consultation produced a combination of demands that could be labelled as fixed positions and expressions of belief, appreciation and fondness, which would fall into the category of shared interests. With this distinction in mind, the text analysis allows the verbs want, like, love and believe to be identified as early indicators of fixed positions (want) compared to shared interests (like, love, believe). Examples of the different categories of verbs and the implied fixed positions or shared interests are given below from the CCTD dataset.
"I would like to use some of the bricks and material from buildings lost in the quake in new buildings."
"I would love to see the inner city car free with a lot of cycle ways, bus lanes and pedestrian only areas."
"I believe that more green alternatives should be utilised to improve essential services."
"I want recycling bins and more trees."
"I want to see more rainwater reused for irrigating green spaces."
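The verb-based distinction in Step 5 can be sketched as a simple lookup. The mapping below follows the four indicator verbs named in the text; the function name `classify_stance` is hypothetical, and the sketch matches plain words rather than POS-tagged verbs, so a word such as like used as a preposition would be mislabelled without the linguistic metadata from Step 2.

```python
import re

# Indicator verbs named in the text, mapped to the stance they suggest.
STANCE_BY_VERB = {
    "want": "fixed position",
    "like": "shared interest",
    "love": "shared interest",
    "believe": "shared interest",
}


def classify_stance(sentence):
    """Label a sentence by the first indicator verb it contains, else None."""
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in STANCE_BY_VERB:
            return STANCE_BY_VERB[word]
    return None


for quote in [
    "I want recycling bins and more trees.",
    "I believe that more green alternatives should be utilised to improve essential services.",
]:
    print(classify_stance(quote), "|", quote)
```

Returning None for sentences without an indicator verb keeps unclassified contributions visible instead of forcing them into one of the two groups.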

Step 6: Visualisation of a Common Narrative Evolving from Public Consultation
Having undertaken text analysis of the large volumes of words collected from public consultation and interpreted the meaning in terms of urban infrastructures and shared interests, a final step is to communicate the results ideally in real time. Data visualisation tools were chosen to communicate a common narrative that could ultimately lead to design personas being created by sorting shared interest in relation to demographic information about contributors. At this early stage in the study, the visualisation techniques chosen offer a Cascade View and Tree View. The Cascade View comprises a combination of hierarchy plots to indicate overall trends towards different soft and hard infrastructures linked to frequency plots of specific infrastructure terms within these broad categories. Ultimately, the frequency plots provide access to individual sentences within a single display that reveal individual contributions for the reader to view. In comparison, the Tree View illustrates a pattern of dialogue around a common subject, verb or object such as I want as a collection of fixed positions or composting as an object of interest. Ultimately, the visualisation techniques are intended to create a common narrative that ideally reveals shared interests that can better inform future decision-making around a more sustainable and resilient future.
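As an illustration of the data behind a Tree View, the sketch below nests the words that follow a shared root phrase (such as I want) into a prefix tree of dictionaries, so that sentences with a common opening share branches. It shows the underlying structure only and does not call any charting library; the function name `build_word_tree` is hypothetical.

```python
def build_word_tree(sentences, root):
    """Nest the words that follow `root` into a prefix tree of dicts.

    Sentences that do not start with the root phrase are ignored.
    Matching is case-insensitive and final punctuation is stripped.
    """
    tree = {}
    root_tokens = root.lower().split()
    for sentence in sentences:
        tokens = sentence.lower().rstrip(".!?").split()
        if tokens[:len(root_tokens)] != root_tokens:
            continue
        node = tree
        for token in tokens[len(root_tokens):]:
            # Reuse an existing branch or start a new one.
            node = node.setdefault(token, {})
    return tree


quotes = [
    "I want recycling bins and more trees.",
    "I want to see more rainwater reused for irrigating green spaces.",
    "I believe that more green alternatives should be utilised to improve essential services.",
]
tree = build_word_tree(quotes, "I want")
print(sorted(tree))
```

Each top-level key is a distinct continuation of the root phrase, and deeper levels follow each sentence word by word, which is the branching pattern a word tree displays.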
In this project, we used two programming languages, Python and JavaScript, along with a set of publicly accessible software libraries. We developed two Python programs and one JavaScript program. The first Python program cleans up the public text in Step 1, uses the Stanford CoreNLP Toolkit to extract linguistic features from the public texts and stores them together with the original text (Step 2). The second Python program is used in Steps 3, 4 and 5 to identify shared ideas and group them in terms of soft and hard urban infrastructures or shared interests and fixed positions. The JavaScript program uses visualisation libraries to display the grouped shared ideas in Cascade and Tree View. Table 2 provides an overview of the libraries used, including the name, version, the purpose of use and in which step (please refer to the detailed documentation on how to use these libraries on their websites). For example, Step 2 uses the Stanford CoreNLP toolkit to extract the linguistic data. In Step 6, D3.js and Plotly.js are used to produce the Cascade View and Google Chart to produce the Tree View. Note that the code implementation of our project is included in the Supplementary Materials.

Produce Cascade View Visualisation (Step 6)
Google Chart (v50): Produce Tree View Visualisation (Step 6)

Topics of Public Importance Highlighted by Text Analysis of Christchurch Common Themes Dataset
The Christchurch Common Themes Dataset (CCTD) used for the pilot study contains 3969 sentences; of which 53% (2113 sentences) had the grammatical structure subject + verb + object and 21% (824 sentences) comprised verb + object. The residual 26% (1032 sentences) were incomplete sentences and excluded from the pilot study. The remaining 2937 complete sentences were used to explore topics of interest raised by the public in the Christchurch "Share an Idea" public engagement initiative. The topics of interest were identified as subjects and objects and categorised in relation to soft and hard urban infrastructures using the dictionary in Table 3.  Table 3 gives the urban infrastructure dictionary with four categories and a number of associated terms. The key infrastructure categories were selected based on four common themes that emerged from the "Christchurch Common Themes" report [9] and grouped into soft and hard urban infrastructure [28]. For each category, we collected a list of associated terms after manually reviewing the summaries and topics in the "Common Themes" report. The Public Space category describes people's strong desires for people-centred and green spaces in the new city centre. The Building category is for the development of safe buildings and new marketplaces that encourage businesses and retailers to bring people back to the city. The Transport category aims to provide people with better transport choices (bus, cycle, tram, etc.) to easily commute from/to the new city centre. The People and Communities category transforms the city into a safe and vibrant city accessible for all people (families, ethnicities, communities, etc.).
The sentences in the CCTD dataset were clustered into these four categories based on the terms in the dictionary (Table 3). For example, the sentence "I want social spaces" is categorised under Public Space because it contains the term space. The dictionary was built from the "Common Themes" report, in which the Christchurch City Council used its own approach to summarise people's ideas. Our approach uses this dictionary to categorise the texts in the CCTD dataset and to identify the subjects and objects in the sentences. The results are presented in Tables 4 and 5 for subjects and objects, respectively.
Table 4 shows that 7.9% of sentences were concerned with Public Space, compared with 6.3% for Utilities (in the form of Transport), 4.2% for Building and 4.8% for People. A further 17.3% (such as industrial businesses, amphitheatre, alternative energy system, the council) fell outside the dictionary of terms compiled for soft and hard urban infrastructures. Examining the results in terms of infrastructure showed that the subject phrases mostly relate to specific urban spaces (city centre, market square, green space, dog park, inner courtyard, etc.), private dwellings and public buildings (safe building, affordable apartment, restaurant, school, public library, etc.), transport services (public transport system, bus exchange, free tram system, cycle lane, light rail, etc.) or a social group of people (older people, family, young parents, community, etc.). In comparison, subjects in the miscellaneous category covered topics such as wind power, water, waste system and lighting.
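The dictionary-based categorisation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the category names come from the text, but the term lists are a small hypothetical subset (the full lists are in Table 3), and the plural handling is deliberately naive.

```python
import re

# Hypothetical subset of the urban infrastructure dictionary.
# The category names follow the paper; the term lists are illustrative only.
URBAN_DICTIONARY = {
    "Public Space": ["space", "park", "garden", "square"],
    "Building": ["building", "apartment", "library"],
    "Transport": ["bus", "cycle", "tram", "rail"],
    "People and Communities": ["people", "family", "community"],
}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(token, term):
    """Naive match: the term itself or a regular plural (-s, -es)."""
    return token in (term, term + "s", term + "es")


def categorise(sentence, dictionary=URBAN_DICTIONARY):
    """Return every category whose terms occur in the sentence.

    Sentences that match no term fall into a miscellaneous group.
    """
    tokens = tokenize(sentence)
    hits = [
        category
        for category, terms in dictionary.items()
        if any(matches(token, term) for token in tokens for term in terms)
    ]
    return hits or ["Miscellaneous"]


print(categorise("I want social spaces"))
print(categorise("use of electric buses in the CBD"))
```

A sentence may match several categories, so the function returns a list. A real implementation would presumably match on lemmas produced by the NLP toolkit instead of the suffix check used here.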

Analysis of topics expressed as a subject in
Lastly, the study showed that 59.5% of subjects were pronouns rather than common or proper nouns. The pronoun you was the most frequently cited in verb + object sentences (29.3%), followed by the first person singular I (15.6%), we (5.2%) and other pronouns (9.4%) such as it, this, they, that, there, one, these, those.

In comparison, the objects identified in complete sentences were expressed as noun, verb, adjective or adverb phrases, object clauses or pronouns. The results are presented in Table 5, where objects are grouped in relation to different categories of soft and hard urban infrastructures. The analysis revealed that 31.4% of objects were concerned with Public Space, followed by Transport (12.0%), Buildings (9.2%) and People and Communities (5.7%), leaving a miscellaneous category (34.6%). The results indicate the public's desire for (1) a people-friendly space where families and communities can gather and hold cultural activities to celebrate the heritage of Maori and English culture in New Zealand, (2) more transport options to travel to and around the city centre, such as biking, light rail, tram, etc., (3) stronger and safer buildings that follow the low-rise and earthquake-resilient building standard, and (4) public safety, fun family activities and an inclusive community. The miscellaneous category covered topics ranging from business incubators and cultural festivals to building structures and cultural recognition.

Table 5. Objects grouped by category of soft and hard urban infrastructure (percentage of objects, with example phrases).
Public Space (31.4%): [example phrases not recovered]
Transport (12.0%): allow for more disabled/short term car parking, less cars and more emphasis on walking and cycling and public transport, electric buses, integrated bus light rail tram and bicycle transport, to see more dedicated cycleways and safer intersections for cyclists, to slow traffic to make it safe for everyone, to separate bike lanes with bike and pedestrian priority
Buildings (9.2%): all new buildings to have rooftop gardens, a variety of more modern buildings, 5 storey building height limit, to build safe tall buildings in an earthquake prone location, a certificate on every commercial building that says what level of earthquake code this building is now at, more buildings converted to apartments, most of the old heritage buildings to be rebuilt, to use some of the bricks and material from buildings lost in the quake in new buildings
People and Communities (5.7%): people safety, public safety and peace of mind, lots of fun activities for families like ice skating in winter and big playgrounds, to become a global community, a new deep sense of community, able to include all people regardless of their physical abilities
Miscellaneous (34.6%): publicly owned business incubators with low rent and leases limited to three years, more drinking fountains, lots of free buskers festival events/firework shows/outdoor cinema, more recognition of Maori culture, more foreign investment, services such as banks/government departments/clusters of dentists and lawyers, low rise ecologically sound structures
Pronouns (7.1%): it, them, us, you, me, there, one, that, this, those, these
Lastly, a variety of verbs were used in the subject + verb + object and verb + object sentences. The verb be in its different forms (be, is, are, was, were, been, etc.) was the most common, used in about 15% of sentences. Other frequently appearing verbs (occurring more than 20 times) included want, like, love, believe, have, need, make, use, encourage, keep, create, get, do, bring, think, walk, move, live, take, give, go. The verbs want, like, love and believe offered some indication of shared interests compared with fixed positions, as explained earlier. The influence of these verbs is best illustrated using the Tree Views presented in the following section.

Visualisation of Topics Highlighted by Text Analysis of Christchurch Common Themes Dataset (CCTD)
Having summarised the main topics raised by text analysis in Tables 4 and 5, the next step is to visualise the results in a more direct and easily digested manner for public engagement. The aim has ultimately been to create a data story that can be updated in real time to promote community conversation online. With this aim in mind, illustrations based on Cascade View and Tree View have been developed as follows.

Cascade View
Cascade View (Figure 4a,b) has been developed to depict a cascading stream of information, beginning with a hierarchy chart (Figure 4a) that illustrates the broad categories of interest captured by public consultation. For the Christchurch "Share an Idea" case study, the hierarchy chart focuses on different categories of soft and hard urban infrastructures, encompassing utilities (transport, energy, water, telecommunications), public spaces (garden parks, rivers, plazas, playgrounds), buildings (housing, schools, hospitals, libraries, etc.), and people and communities. Having depicted relative interests, the view then illustrates, at the next stage, the relative frequency of responses within the different categories of soft and hard urban infrastructures. For example, urban utilities encompassed different modes of urban transportation. The third and fourth levels of visualisation provide fine-grained information about individual terms, visualised in a word cloud, and about individual sentences. At each stage, the screen provides an interactive link to access information from the next stage, i.e., the word cloud provides access to the individual sentences containing a given word. The generation of Cascade View is based on information extracted by NLP text analysis of the Christchurch Common Themes Dataset (CCTD). The illustration reveals public interest in transportation, with a particular interest shown in improved public transportation with more buses and fewer cars in the city centre. The information was extracted from the noun phrases of the subject + verb + object and verb + object sentences. The size of each term under a category reflects the number of times the term was mentioned. In this case, bus is the most frequently mentioned term under Transport (49 times), followed by street, car, cycle, transport, etc. By "clicking" on the term bus (shown in green), a word cloud (Figure 4b) is displayed for the objects in which bus appears.
The relative font sizes for the word cloud reflect citation rates. Further "clicking" on individual words in the word cloud (such as CBD) allows original sentences from the CCTD dataset to be displayed such as "use of electric buses in the CBD".
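The data behind such a cascade can be sketched as a nested structure that maps categories to term counts and terms to their original sentences. The following Python sketch is an assumption about how the categorised output might be prepared for a hierarchy chart, not the project's actual code: the function name, the record layout and the name/children/value keys (a convention common in hierarchy-chart examples) are all illustrative.

```python
from collections import Counter, defaultdict


def build_cascade(records):
    """Build cascade data from (category, term, sentence) records.

    Returns a nested hierarchy (category -> term with mention count) and a
    lookup table from (category, term) to the original sentences.
    """
    counts = defaultdict(Counter)
    sentences = defaultdict(list)
    for category, term, sentence in records:
        counts[category][term] += 1
        sentences[(category, term)].append(sentence)

    hierarchy = {
        "name": "Urban infrastructure",
        "children": [
            {
                "name": category,
                # Most frequently mentioned terms come first.
                "children": [
                    {"name": term, "value": n}
                    for term, n in term_counts.most_common()
                ],
            }
            for category, term_counts in counts.items()
        ],
    }
    return hierarchy, dict(sentences)


# Sample records; the phrases are taken from the examples quoted in this paper.
SAMPLE_RECORDS = [
    ("Transport", "bus", "use of electric buses in the CBD"),
    ("Transport", "bus", "integrated bus light rail tram and bicycle transport"),
    ("Transport", "car", "allow for more disabled/short term car parking"),
    ("Public Space", "space", "I want social spaces"),
]

hierarchy, lookup = build_cascade(SAMPLE_RECORDS)
print(hierarchy["children"][0])
print(lookup[("Transport", "bus")])
```

The counts drive the size of each term in the chart, while the lookup table supports the final drill-down from a word to the sentences that contain it.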

Tree View
To complement Cascade View, a parallel mode of visualisation was developed using Tree Views. Tree Views (Figure 5a-c) provide an opportunity to cluster public suggestions together into urban themes using word trees. The word trees can be constructed based on verbs indicating fixed positions or shared interests (i.e., want, need, like, love, think, believe) or on grammatical structures, which highlight objects as noun phrases, recommended actions as verb phrases, or combinations of objects with associated actions as object clauses. In each case, the sentences are ordered under themes for soft and hard infrastructures. For example, the three Tree Views presented in Figure 5 are plotted from a total of 327 sentences (8.2% of the total sentences). For the purpose of this article, the Tree Views in Figure 5 have been created based solely on the urban themes of public space, transport and buildings. As such, they offer a clear insight into the public's preferences and dislikes in relation to transport (e.g., wanting fewer cars in the central city), public space (e.g., green parks for cycling and dog walking plus park and ride facilities for multi-mode transport) and buildings (e.g., attractive timber buildings, not glass skyscrapers).
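The idea behind a word tree can be illustrated with a small prefix tree: every sentence containing a chosen root verb contributes the words that follow it, and shared prefixes are merged into common branches. The sketch below is a hedged illustration only; the function name is hypothetical, and the sample sentences are shortened or invented for the example rather than taken verbatim from the dataset.

```python
import re


def build_word_tree(sentences, root):
    """Nest the words that follow `root` in each sentence into a prefix tree.

    Sentences that do not contain the root word are skipped. Each node is a
    dict mapping a word to the sub-tree of words that follow it.
    """
    tree = {}
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if root not in words:
            continue
        node = tree
        for word in words[words.index(root) + 1:]:
            node = node.setdefault(word, {})
    return {root: tree}


# Illustrative sentences (shortened or invented for this sketch).
SAMPLE_SENTENCES = [
    "I want social spaces",
    "I want less souvenir shops",
    "We want less cars",
    "Avoid visual pollution",
]

print(build_word_tree(SAMPLE_SENTENCES, "want"))
```

Here the two sentences beginning "want less" share a branch, which is what lets a reader see at a glance what the public wants less of.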

Novel Approach for the Analysis, Interpretation and Visualisation of Large Format Dataset from Public Consultation
This paper presents an innovative approach for the analysis, interpretation and visualisation of large format datasets from public consultation. The objective is to facilitate large-scale, community-wide conversations about shared values and priorities. At its core, the approach uses NLP tools to augment a sentence-based dataset and extract usable phrases that express public insight and ideas, which are then visualised through Cascade and Tree Views. The terms Cascade View and Tree View refer to a combination of hierarchical visualisation methods that gradually reveal ever greater detail as part of a data storytelling approach. More information about this data visualisation is given below.
This novel technique has been trialled using the CCTD dataset from the Christchurch "Share an Idea" public consultation exercise undertaken in 2011 following a series of highly destructive earthquakes [9]. In comparison with the manual analysis undertaken by Christchurch City Council, the NLP-driven analysis provides a semi-automated process to readily analyse the 106k public contributions in a matter of minutes. Perhaps more important than the rapid analysis of large single datasets, the novel approach provides an opportunity to analyse and visualise public contributions in real time. This capability enables deliberative public discussion to take place in the here and now, as advocated by Nabatchi [30] in her seminal paper on designing public consultation. The aim is to use a deliberative model of communication to support structured problem solving based on shared knowledge and insights from both the user (public) and the provider (local authority or private service provider). In comparison, one-way communication is often used for information-sharing purposes, which provides little opportunity to discuss public values.
In a similar vein, the semi-automated NLP-driven process offers an opportunity to explore public "interests" compared with "positions". As highlighted again by Nabatchi [30] and supported by Fung [50,51], interest-based processes are more likely than position-based processes to generate a level of cooperation needed to help public administrators identify and understand public values and priorities in contentious situations. In this instance, NLP analysis of the CCTD dataset allowed combinations of pronouns and verbs to provide an approximate indication of interest-based suggestions compared with position-based ones, such that the pronoun and verb I want would tend to indicate a public position compared with the combination of pronoun and verbs I believe/think/love, etc., being more inclined to indicate a public interest. The Tree View visualisation tool enabled these combinations of pronouns and verbs to be grouped together and classified in terms of urban infrastructure, such as transportation.
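The pronoun and verb heuristic described above can be sketched as a simple surface-pattern check. This is a rough illustration under stated assumptions, not the project's implementation: the verb sets contain only the examples named in the text, the inclusion of we alongside I is an assumption, and the check looks only at adjacent words rather than at a grammatical parse.

```python
import re

# Verb sets limited to the examples named in the text.
POSITION_VERBS = {"want"}
INTEREST_VERBS = {"believe", "think", "love"}
# Assumption: first-person plural is treated the same way as "I".
PRONOUNS = {"i", "we"}


def classify_stance(sentence):
    """Label a sentence as 'position', 'interest' or 'unclassified'.

    Only a first-person pronoun immediately followed by a listed verb counts.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for pronoun, verb in zip(tokens, tokens[1:]):
        if pronoun in PRONOUNS:
            if verb in POSITION_VERBS:
                return "position"
            if verb in INTEREST_VERBS:
                return "interest"
    return "unclassified"


print(classify_stance("I want social spaces"))
# The next sentence is invented for illustration.
print(classify_stance("I believe the city needs more parks"))
```

As the text notes, such combinations give only an approximate indication, so the labels are best treated as a way of grouping sentences for the Tree View rather than as a definitive classification.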
The term semi-automated process is used because the analysis of augmented sentence data depends on the use of a predefined dictionary. In the Christchurch "Share an Idea" case study, the dictionary characterised a framework for soft and hard urban infrastructure that classified elements of cities into six categories, namely institutions, community and personal for soft urban infrastructure, and public space, utilities and buildings for hard urban infrastructure. For the Christchurch case study, the approach enables the dataset to be readily interpreted in terms of these key elements of urban infrastructure. A similar approach has been used to interpret the data using a dictionary for the Circular Economy (CE) [52], where greater emphasis was placed on recycling, reuse and reduction of waste.

Technical Details of the NLP Driven Process
Returning to the technical details of the NLP-driven process, the case study restricted itself to 74% of the original Christchurch dataset. This was because the study focused only on analysing complete sentences comprising subject + verb + object and verb + object. The remaining 26% involved incomplete sentences, as shown below. These incomplete sentences did convey public ideas, but the NLP tools failed to identify their grammatical components because of misused punctuation and incorrect grammar. It was possible to define a set of grammatical rules to capture some irregular structures, but these rules had to be handcrafted after manual scanning of the raw texts, which becomes infeasible when dealing with large format data.
Model on treatment of Seine: Art, cafes, gardens all the way: The Avon Walk! for a great Central City? People-bringing community together.
Dining experiences, especially evening on/by the Avon River, Oxford Terrace, The Arts Centre, culture.
Similarly, there were a number of short passages that were hard to understand even by manual analysis, as shown below.
small park like areas.
Education centre, displays.

Ballantynes plus a 3?
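The filter that separates complete from incomplete sentences, described at the start of this section, can be sketched as a check on the output of a dependency parser. The sketch below does not call the Stanford CoreNLP toolkit; it assumes each token has already been annotated with a part-of-speech tag and a dependency relation, and the hand-written parses are plausible illustrations rather than actual toolkit output. It also ignores copular and passive constructions, which a fuller rule set would need to cover.

```python
# Dependency labels assumed for this sketch; "dobj" and "obj" are the
# direct-object labels used by different dependency schemes.
SUBJECT_RELATIONS = {"nsubj"}
OBJECT_RELATIONS = {"dobj", "obj"}


def classify_structure(parsed_tokens):
    """Classify a parsed sentence by its grammatical structure.

    parsed_tokens is a list of (word, pos_tag, dependency_relation) triples.
    A sentence is complete when its root is a verb and it has an object.
    """
    has_verb_root = any(
        rel.lower() == "root" and pos.startswith("VB")
        for _, pos, rel in parsed_tokens
    )
    has_subject = any(rel in SUBJECT_RELATIONS for _, _, rel in parsed_tokens)
    has_object = any(rel in OBJECT_RELATIONS for _, _, rel in parsed_tokens)

    if has_verb_root and has_object:
        return "subject + verb + object" if has_subject else "verb + object"
    return "incomplete"


# Hand-written parses (assumed, for illustration only).
I_WANT = [("I", "PRP", "nsubj"), ("want", "VBP", "ROOT"),
          ("social", "JJ", "amod"), ("spaces", "NNS", "dobj")]
AVOID = [("Avoid", "VB", "ROOT"), ("visual", "JJ", "amod"),
         ("pollution", "NN", "dobj")]
FRAGMENT = [("small", "JJ", "amod"), ("park", "NN", "compound"),
            ("like", "JJ", "amod"), ("areas", "NNS", "ROOT")]

for parse in (I_WANT, AVOID, FRAGMENT):
    print(classify_structure(parse))
```

Fragments such as "small park like areas." have a noun as their root and no object relation, which is why they fall into the incomplete group even though they clearly express an idea.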
Leaving aside the difficulties of dealing with incomplete sentences, the analysis of negative sentiments also proved challenging. The word "not" is commonly used to detect negative sentiments, yet the following sentences express negative sentiments without using it. To avoid overlooking these negative sentiments, a broader set of cue words is needed to capture such indirect views (for example, detecting the words less, avoid, limited, deter).
I want less souvenir shops and more high streets and one off businesses to show NZ and locally made products to new Zealanders and tourists.
Avoid visual pollution created by cheap loud signage.
Parking is very limited in town and traffic congestion makes roads difficult and dangerous to cross when shopping.
It deters footpath life, such as café's spilling out onto footpaths.
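A cue-word check of the kind suggested above can be sketched as follows. The cue list contains only the words named in the text plus "not"; the function name and the simple handling of a trailing "s" are assumptions made for illustration, and a word such as less is not negative in every context, so the output should be read as a flag for review rather than a sentiment score.

```python
import re

# Cue words named in the text, plus "not"; an illustrative list only.
NEGATIVE_CUES = ("not", "less", "avoid", "limited", "deter")


def find_negative_cues(sentence, cues=NEGATIVE_CUES):
    """Return the cue words found in the sentence, in order of appearance.

    A token matches a cue if it equals the cue or the cue plus a trailing
    "s" (e.g. "deters").
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found = []
    for token in tokens:
        for cue in cues:
            if token in (cue, cue + "s") and cue not in found:
                found.append(cue)
    return found


EXAMPLES = [
    "Avoid visual pollution created by cheap loud signage.",
    "Parking is very limited in town and traffic congestion makes roads "
    "difficult and dangerous to cross when shopping.",
    "It deters footpath life, such as café's spilling out onto footpaths.",
]

for sentence in EXAMPLES:
    print(find_negative_cues(sentence))
```

None of the example sentences contains "not", yet each is flagged by one of the additional cue words.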

Technical Guidelines for Large Format Public Data Collection
The performance of our fine-grained analysis approach depends on the accuracy of the Stanford CoreNLP toolkit used in the project. The toolkit is widely recognised as one of the state-of-the-art natural language processing tools that can robustly and efficiently extract linguistic data from large volumes of free-format text, with reported accuracy above 80% [46,48,49]. The accuracy of the toolkit decreases when the text input contains incomplete sentences (discussed in Section 5.2) or sentences with irregular punctuation or grammar. To achieve a satisfactory result and to utilise the full potential of NLP tools, we recommend adopting the following guidelines for large format public data collection. They are similar to the template proposed by Rubin et al. [53] and offer additional guidance on closed and open questions.

1. Encourage complete sentence contributions by providing a default set of pronouns and verbs (I/we . . . etc.).

2. Differentiate between interest-based and position-based suggestions using a default set of verbs (We must have . . . /We would like . . . etc.).

3. Link contributions where possible to underlying cultural values through a drop-down menu to better understand public motives and priorities (sense of community, sense of connections, sense of identity, etc.).

4. Encourage ranking of cultural values and translation into associated qualities when implementing those cultural values.

5. For topic-specific public engagement, structure discussion around existing frameworks such as soft and hard urban infrastructures for urban planning and design [28] or the 9R Framework for Circular Economy [54].

Conclusions
The pilot study has demonstrated both the potential and the limitations of using NLP tools to analyse large format public data from public consultation exercises. At the policy level, it provides a novel digital platform to enable greater public engagement at the grassroots and to bridge the perennial gap between top-down and bottom-up approaches. This challenges the status quo, in which local and national governments prefer top-down approaches. At a time of significant social and political change, however, these traditional approaches to policy-making are being challenged. Interestingly, the EU sustainable goal advocates greater social inclusion, which is in line with the motivation for digital platforms. Nevertheless, significant technical and political limitations still exist for sharing the power of political decision-making.
One notable technical limitation was the dependency on the availability of complete sentences for the text analysis to be carried out. In the study, this limitation resulted in 26% of the data being discarded. To overcome this constraint, guidelines have been proposed to format the collection of text data by encouraging complete sentence structures and attempting to differentiate between shared interests and fixed positions, ideally in a format that facilitates collaboration.
It should also be noted that interpretation of the NLP analysis required a thematic dictionary, which in this pilot study was based on terms used to characterise soft and hard urban infrastructure. Hence, the novel method is a semi-automated technique reliant on thematic dictionaries compiled by theme experts. Even so, the technique eases the burden on experts of manually interpreting large format data and offers the prospect of greater transparency and instantaneous two-way communication with the public.
On the positive side, NLP has been shown to provide a rapid means of objectively analysing relatively large amounts of text data which, when coupled with a thematic dictionary, offers insight into public ideas for transformational change. The choice of visualisations is still under development, with numerous visualisation tools available to illustrate hierarchies, frequencies and networks, ranging from sunburst plots and chord charts to word clouds and word trees. All of these methods provide a means of communicating complex data as a data story for public engagement, enabling a purposeful, deliberative community discussion based on shared interests rather than fixed positions. However, uptake of the approach depends on the political will to share power and decision-making between traditional top-down methods and grassroots engagement.