Big Data Analytics in Australian Local Government

Australian governments at all three levels—local (council), state, and federal—are beginning to exploit the massive amounts of data they collect through sensors and recording systems. Their aim is to enable Australian communities to benefit from “smart city” initiatives by providing greater efficiencies in their operations and strategic planning. Increasing numbers of datasets are being made freely available to the public. These so-called big data are amenable to data science analysis techniques including machine learning. While there are many cases of data use at the federal and state level, local councils are not taking full advantage of their data for a variety of reasons. This paper reviews the status of open datasets of Australian local governments and reports progress being made in several student and other projects to develop open data web services using machine learning for smart cities.


Introduction
Open datasets are now being collected by many local governments in Australia such as City of Melbourne (COM), City of Adelaide [1], and City of Wyndham [2]. However, while Australia's federal and state government departments are embracing the new data age, it appears that most local councils are not taking advantage of the massive amounts of data they collect and manage to operate more efficiently [3]. Some councils do not have automated work systems, instead relying on manual approaches; others do not utilize their systems properly. In either case, these councils lack the information needed to make the best decisions for their community. Section 2 of this paper briefly reviews projects being undertaken in other countries which use data science and big data techniques to assist local government councils. Section 3 briefly reviews the local government open datasets in Australia, and those for the state of Victoria in particular. Section 4 examines how open datasets can be analyzed by predictive machine learning (ML) models and the results deployed on the Internet to better support decision makers.
Several student projects making use of Australian local government open datasets have been sponsored by the authors. One of these was undertaken in 2019 by final year Business students at Swinburne University of Technology in Melbourne, Victoria. Two projects are currently being undertaken by final year Software Engineering students at Swinburne and Monash Universities. Section 5 of the paper gives an overview of this work. Our conclusions are summarized in Section 6.

Local Government Big Data Projects in Other Countries
"Big data" is an amorphous, catch-all label for a wide selection of data [4]. It is often considered to be data that possesses certain broad characteristics (volume, velocity, variety, etc.), but many other descriptors such as value and veracity have been applied. It has been considered to be a

Australian Government Open Data
The Commonwealth of Australia is constituted as a federation of states and territories with power divided between the federal government (centered in the Australian capital city of Canberra) and the six state and two territory governments. Each state is further divided into councils. Thus, there are three levels of government (federal, state, and council) with differing levels of responsibility. Australia is moving into the big data age at all three levels.
Australian government open data is stored and managed on the data portal: https://data.gov.au [24]. Anyone can access this public data published by federal, state, and local government agencies. It is a national resource that holds considerable value for growing the economy, improving service delivery, and transforming policy outcomes. In addition to government data, there is publicly funded research data and datasets from private institutions that are in the public interest. The site has over 30,000 publicly available datasets from federal, state, and local levels of government and continues to grow. The federal government's public data policy statement [25] requires all government agencies to make nonsensitive data open by default. In addition to free, open datasets, data.gov.au now includes information about unpublished data and data available for purchase.
The website data.gov.au is managed by the Australian Digital Transformation Agency (DTA). The redeveloped platform MAGDA (Making Australian Government Data Available) was developed in partnership with the Australian Commonwealth Scientific & Industrial Research Organization (CSIRO)'s data and digital specialist data sciences arm, Data61 (https://data61.csiro.au/). MAGDA is a fully open source project. Australian states also maintain their own websites for state-related data. The state of Victoria, for example, has a data portal, DataVic, at: https://data.vic.gov.au. Similarly, New South Wales has an open data portal at: https://data.nsw.gov.au/, as do the other states and territories. Further, some local councils maintain open data portals, although open data are generally stored on data.gov.au. Several open data platforms are in use in Australia, including CKAN, OpenDataSoft, Socrata, and ArcGIS [26]. Most of these come within the MAGDA project.
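Because CKAN is one of the platforms named above, a dataset search can in principle be scripted rather than done through the web page. The following is a minimal sketch only: `package_search` is a standard CKAN Action API call, but the base URL used here for data.gov.au is an assumption that should be checked against the portal's current documentation, and the function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed CKAN Action API base for data.gov.au; verify before use.
CKAN_BASE = "https://data.gov.au/data/api/3/action"


def build_search_url(query, rows=10, base=CKAN_BASE):
    """Return a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/package_search?{params}"


def search_datasets(query, rows=10):
    """Fetch matching dataset titles (performs a network request)."""
    with urlopen(build_search_url(query, rows), timeout=30) as resp:
        payload = json.load(resp)
    # CKAN wraps search hits in result -> results, one dict per dataset.
    return [pkg["title"] for pkg in payload["result"]["results"]]
```

Calling `search_datasets("smart bins")` would return a list of dataset titles if the assumed endpoint is reachable; only the URL builder is free of network access.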
Australian local councils from all states and the ACT had published 2441 datasets as at 11 April 2020, the largest number (967) being from Victoria. The Victorian datasets are all published on data.gov.au, which includes a link to the COM's open data portal (https://data.melbourne.vic.gov.au/). Figure 1 below shows the numbers of open datasets published by all Victorian local councils. These open datasets mainly cover operational areas of local government including roads and footpaths, libraries, council property, and garbage collection, although some published datasets relate to more strategic matters such as population projections and long-term regional development. The COM has the largest number of open datasets (193) amongst Victorian local councils, and they include datasets derived from IoT sensors such as pedestrian traffic counters [27], a trend that can be expected to increase in the future. The numbers of most Victorian local council open datasets have been counted by the authors using a Microsoft Excel ® spreadsheet, and the most common names are shown graphically in Figure 2 below.
Smart Cities 2020, 3 FOR PEER REVIEW 5

Many of these datasets can be visualized using built-in mapping tools. Datasets that contain a geospatial field (such as latitude and longitude) can be mapped and viewed in the visualization utility NationalMap. The City of Boroondara, for example, has a database of significant trees as shown in Figure 3 below.
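The per-council tally and the ranking of common dataset names, which the authors produced in a spreadsheet, could equally be scripted. The sketch below is illustrative only: the records are hypothetical placeholders rather than the authors' data, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical sample of (council, dataset title) pairs; not the authors' data.
records = [
    ("City of Melbourne", "Pedestrian counting system"),
    ("City of Melbourne", "Street furniture"),
    ("City of Boroondara", "Significant trees"),
    ("City of Wyndham", "Smart bins fill level"),
    ("City of Wyndham", "Garbage collection zones"),
    ("City of Melbourne", "Garbage collection zones"),
]


def count_by_council(rows):
    """Number of published datasets per council."""
    return Counter(council for council, _ in rows)


def most_common_titles(rows, n=3):
    """The n most frequent dataset titles, compared case-insensitively."""
    return Counter(title.lower() for _, title in rows).most_common(n)


per_council = count_by_council(records)
top_titles = most_common_titles(records, n=1)
```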

Analysis of Big Data by Machine Learning
While collecting data is important, web services (also termed web Application Programming Interfaces, or APIs) using ML are needed to answer "what if" questions and make predictions about future needs. These can be developed using Python [28] or machine-learning-as-a-service (MLaaS), which may enable code-free development. The major players in this space are Amazon, Microsoft, Google, and IBM [29], but there are smaller players including DataRobot ® [30], RStudio ® [31], and BigML ® [32]. Note, however, that these MLaaS services are generally not free. Development may be cloud-based, such as on IBM Cloud ® , or may use local hardware and software. An ML project comprises several stages: strategy, dataset preparation and preprocessing, dataset splitting, modeling, and model deployment [33].

Data Analytics with Python Libraries
One approach to big data analytics and ML is to use Python with the analytics and ML libraries Pandas, NumPy, and Scikit-learn [34]. The process to analyze the data follows the steps depicted in Figure 4.
A pilot study using ML was performed by one of the authors using road accident statistics from Victoria [35]. Figure 5 below shows some outcomes of this initial analysis: the number of persons killed in accidents in different speed zones and the severity of accidents in each speed zone within Victoria.
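The two summaries described for the pilot study (fatalities per speed zone and accident severity per speed zone) map naturally onto Pandas grouping operations. The following sketch assumes hypothetical rows and column names; only SPEED_ZONE appears in the source, and the other names are placeholders rather than the actual dataset schema.

```python
import pandas as pd

# Hypothetical rows; column names other than SPEED_ZONE are assumptions.
crashes = pd.DataFrame({
    "SPEED_ZONE": [50, 50, 60, 100, 100, 110],
    "SEVERITY": [3, 2, 3, 1, 2, 1],
    "PERSONS_KILLED": [0, 0, 0, 1, 0, 2],
})

# Total persons killed in each speed zone.
killed_by_zone = crashes.groupby("SPEED_ZONE")["PERSONS_KILLED"].sum()

# Count of accidents for every (speed zone, severity) combination.
severity_by_zone = pd.crosstab(crashes["SPEED_ZONE"], crashes["SEVERITY"])
```

Either result can be passed to a plotting call such as `killed_by_zone.plot(kind="bar")` to produce charts of the kind the text describes.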

Predictive Machine Learning with Python Libraries
An ML model can then be created using the Python scikit-learn libraries as reported in [34,36,37] following these steps:
1. predict outcomes using Naïve Bayes, Linear Support Vector Classification, K Neighbors Classifier, Random Forest, or other models
2. compare model accuracy, which can then inform the choice of ML model
3. save the model using the joblib library
4. create an API using the Flask framework
5. test the model over the web using the Postman ® client.
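Steps 1 to 3 above can be sketched with scikit-learn as follows. This is not the authors' code: the data are synthetic (severity is tied to speed zone by construction, with added noise), the hyperparameters are library defaults or arbitrary choices, and the file name is a placeholder. The sketch also reads per-feature importances from the Random Forest, one common way to judge how much each input contributes.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

FEATURES = ["LIGHT_CONDITION", "SPEED_ZONE", "ROAD_GEOMETRY"]

# Synthetic stand-in data (assumption): severity loosely tied to speed zone.
rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.integers(1, 10, n),                     # LIGHT_CONDITION code
    rng.choice([40, 50, 60, 80, 100, 110], n),  # SPEED_ZONE in km/h
    rng.integers(1, 10, n),                     # ROAD_GEOMETRY code
])
y = np.where(X[:, 1] >= 80, 2, 3)               # 2 = serious, 3 = other injury
flip = rng.random(n) < 0.1                      # 10% label noise
y[flip] = np.where(y[flip] == 2, 3, 2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: candidate classifiers.
models = {
    "Naive Bayes": GaussianNB(),
    "Linear SVC": LinearSVC(max_iter=10000),
    "K Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Step 2: fit each model and record accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
best_name = max(scores, key=scores.get)

# Impurity-based importances from the Random Forest (they sum to 1).
importances = dict(zip(FEATURES, models["Random Forest"].feature_importances_))

# Step 3: persist the best model and check it reloads.
model_path = os.path.join(tempfile.gettempdir(), "accident_model.joblib")
joblib.dump(models[best_name], model_path)
reloaded = joblib.load(model_path)
```

Accuracy on a single held-out split is the simplest comparison; cross-validation would give a more stable ranking at the cost of extra fitting time.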
The different models achieved the scores shown in Table 1, which indicate that the Random Forest Classifier provided the highest predictive accuracy. A higher score indicates a higher predictive accuracy. The next stage is to determine how much each feature contributed to the model accuracy. For the sample data, the main features of importance were found to be as shown in Table 2. This shows that the speed zone is of the greatest importance for this model, with light condition and road geometry also of significance.
The model can then be tested over the web using the Postman ® client. In the request screen, usually at the top, the query is entered in JavaScript Object Notation (JSON) format. On clicking Send, the prediction is output in the response screen, usually at the bottom, also in JSON format. As an example:
Input: {"LIGHT_CONDITION":5, "SPEED_ZONE":110, "ROAD_GEOMETRY":5}, {"LIGHT_CONDITION":9, "SPEED_ZONE":30, "ROAD_GEOMETRY":9}
Output: {"prediction": "[2, 3]"}
Note: Accident Severity 2 means Serious Injury, Accident Severity 3 means Other Injury.
The API used for this pilot study was very similar to that given by Paul [37]. The endpoint was http://IPaddress:12345/predict, and the Request and Response parameters were as above.
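A prediction endpoint of the kind described above might look like the following Flask sketch. It is not the pilot study's code: the `StubModel` class is a hypothetical stand-in for a model restored with `joblib.load()`, its rule is invented for illustration, and wrapping the two input records in a JSON list is an assumption about how the request body was framed. Flask's built-in test client is used in place of Postman ® so the example runs in-process.

```python
from flask import Flask, jsonify, request

FEATURES = ["LIGHT_CONDITION", "SPEED_ZONE", "ROAD_GEOMETRY"]


class StubModel:
    """Hypothetical stand-in for a model restored with joblib.load()."""

    def predict(self, rows):
        # Toy rule (assumption): higher speed zones map to severity 2.
        return [2 if row[1] >= 80 else 3 for row in rows]


app = Flask(__name__)
model = StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    records = request.get_json()
    if not isinstance(records, list):
        return jsonify({"error": "expected a JSON list of records"}), 400
    try:
        # Order the fields the same way the model was trained.
        rows = [[rec[name] for name in FEATURES] for rec in records]
    except (KeyError, TypeError):
        return jsonify({"error": "each record needs " + ", ".join(FEATURES)}), 400
    prediction = [int(p) for p in model.predict(rows)]
    # The list is returned as a string, matching the response format above.
    return jsonify({"prediction": str(prediction)})


def demo():
    """Exercise the endpoint in-process with Flask's test client."""
    client = app.test_client()
    payload = [
        {"LIGHT_CONDITION": 5, "SPEED_ZONE": 110, "ROAD_GEOMETRY": 5},
        {"LIGHT_CONDITION": 9, "SPEED_ZONE": 30, "ROAD_GEOMETRY": 9},
    ]
    return client.post("/predict", json=payload).get_json()
```

To serve the endpoint over the network, `app.run(host="0.0.0.0", port=12345)` would expose it at the `/predict` path on the port quoted in the text.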

Big Data Projects Sponsored by the Authors
The authors have recently sponsored and themselves worked on several big data projects:
1. Analysis of pedestrian traffic in the COM (Swinburne University)
2. Victorian local government open datasets web APIs (Swinburne University)
3. Development of web APIs to help local government manage waste disposal and recycling (being done by one of the authors and Monash University)
4. Analysis of COM social indicator survey (currently being done only by one of the authors)
These are discussed in the following subsections.

City of Melbourne Pedestrian Traffic Analysis
As an example of local council initiatives, analysis of pedestrian traffic in the COM was reported by Carter et al. [27]. While COM is not a suburban council, it is still the local council for the inner-city Central Business District (CBD). The work was sponsored by the authors and carried out by a Swinburne University team, using COM datasets and Microsoft Power BI ® for data analysis and visualization. A sample analysis is shown in Figure 6 below, which gives the average pedestrian count by quarter over the past five years (2014-2019). The COM has a network of pedestrian sensors that upload their data regularly to a COM server, thus enabling analysis. The aim of the project was to inform COM on options to improve pedestrian mobility in the city.
Figure 6. Weekday pedestrian count by quarter over past five years in City of Melbourne (adapted from [27]).
Future work was suggested that could link these pedestrian flow data with social media data from smartphones and potentially wearable devices such as fitness monitors to correlate pedestrian satisfaction with traffic flow. The 'happiness' effect of pedestrians passing through green areas such as city parks can also be quantified. Expansion of the sensor network to include more of the city and to extend the pedestrian counting system to include more features such as age and sex of pedestrians was also suggested.
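The quarterly weekday averaging behind Figure 6 was done in Power BI ® , but the same aggregation can be sketched in a few lines of Python. The counts below are hypothetical values for a single sensor, not COM data, and the function name is illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily totals for one sensor; not COM data.
daily_counts = [
    (date(2018, 1, 15), 1200),
    (date(2018, 2, 20), 1400),
    (date(2018, 4, 3), 900),
    (date(2018, 4, 7), 5000),   # a Saturday, excluded from the weekday average
    (date(2018, 7, 11), 1100),
]


def weekday_average_by_quarter(rows):
    """Mean weekday count keyed by (year, quarter)."""
    buckets = defaultdict(list)
    for day, count in rows:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            quarter = (day.month - 1) // 3 + 1
            buckets[(day.year, quarter)].append(count)
    return {key: mean(values) for key, values in sorted(buckets.items())}


averages = weekday_average_by_quarter(daily_counts)
```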

Victorian Local Government Web API Project
A Swinburne University student team was assigned to develop one or more web APIs using Victorian local government open datasets. A web API is an interface for software applications analogous to a graphical user interface used by humans [38]. The Victorian road accident analysis described above used the Python scikit-learn library to develop an ML model and deployed the model as a web API using the Python Flask framework. The Swinburne student team assigned to this project is planning to use MLaaS, and has learned to use IBM Watson ® Studio [39]. As no funding is available for this project, they initially worked with the free (Lite) version of Watson Studio, but unfortunately this has proven to be unsuitable for this project. A similar experience was encountered with Microsoft Azure ® Machine Learning Studio: the free version quickly expired. Both IBM Watson ® Studio and Microsoft Azure ® have comprehensive analytics and ML capabilities; however, no funding was available for paid subscriptions. Other MLaaS packages being evaluated include DataRobot ® , RStudio ® , and BigML ® . RStudio ® is most likely to be adopted as it is free software and has an integrated development environment for the R programming language, which is good for visualization and statistical analysis.
This project is currently a work in progress, and the team is considering traffic, transport, and parking as the preferred use cases. As far as we know, not much work of this nature has been undertaken in Australia, so it will potentially be of great benefit to Australian local government councils and members of the public who must deal with them. A prototype web system focused on transport operations has been developed with the home page shown in Figure 7. When complete, users will be able to access predictive models for traffic, transport, and parking, based on ML from open COM datasets.
Figure 7. Home page of prototype Smart City website developed by Swinburne University student project team.

Waste Management Web API Project
This project, which is being undertaken by several student project teams at Monash University, is planned to apply ML or other big data analytics techniques to local council waste management problems. Waste management is one of the biggest challenges posed by the rapid growth of urban populations. This project is thus of potentially great benefit to the environment as well as local government councils. A waste management system has many stakeholders including the local council administration, waste truck owners, and managers of dumps and recycling factories [40]. These are depicted in Figure 8 below. Many decisions need to be made, including when to collect waste from bins (scheduling) and what route trucks will follow (routing). A survey on the various decisions needed and supporting IoT-based models proposed is given by Anagnostopoulos et al. [41]. As discussed by Esmaeilian et al. [42], waste management should really be seen as part of the whole product life-cycle, and IoT-based data collected to enable tracking of products from production to disposal. Further, many barriers exist to adoption of smart waste management systems, including lack of standards and policy norms, as well as lack of knowledge by policy makers [43].
Big data analytics including ML and AI can be applied to many aspects of waste management. Gupta et al. [44] reviewed ML models of the scheduling of waste collection from bins and the sorting and recycling of waste. Another comprehensive review of waste management models in the literature is given by Pardini et al. [45]. A case study of applying ML techniques to predicting fill levels of rubbish bins is reported by Rutqvist et al. [46]. This study showed that ML methods greatly improved the detection accuracy of emptying recycling containers using data from sensors mounted on top. Further, Al-Masri et al. [47] describe an IoT-enabled waste management system, recycle.io, which uses the Microsoft Azure ® IoT hub and enables councils to better regulate waste disposal. Idwan et al. [48] developed a garbage truck optimal routing algorithm using IoT data and agent-based models. Chaudhari and Bhole [49] described an application using IoT data to monitor real-time garbage bin status and enable collection trucks to find efficient routes. This application is implemented on the Android ® operating system and uses the ThingSpeak ® platform to visualize the data. An assessment of the savings resulting from a similar system in an Indian city is given by Fataniya et al. [50].
Figure 8. Stakeholders of the waste management system (reproduced with permission from [40]).
The Monash University student projects are at the time of writing still in the planning stage. Two should be mentioned: one on optimizing garbage collection routes based on the type of garbage and urgency of collection, and one on identifying and classifying recyclables using image recognition. The authors of this paper plan to evaluate the products produced by these project teams and seek to present them for consideration by local government councils.
Smart bin systems have been adopted by many local councils throughout the world, including the City of Bristol, UK [51]. In Australia, they have been adopted by the City of Hobart, Tasmania [52], and by the Victorian cities of Melbourne, Wyndham, and Hume [53]. Wyndham is a council in the western suburbs of Melbourne, and the council-administered area includes the older suburb of Werribee, plus several newly developed suburbs including Point Cook. The City of Wyndham's smart bins data are stored in three datasets on the data.gov.au portal. A smart city maturity assessment [54] found that this council has a strong smart city culture, and is rapidly developing new technologies and embedding them in its business processes.
A preliminary analysis of the Wyndham City Council (WCC) smart bins data has been carried out by one of the authors. These bins are currently only located in the Werribee CBD and in Point Cook near Boardwalk Park. The datasets include the fill levels and other data about 32 large bins, which are used for either waste or recyclables such as bottles and cans. The fill levels recorded by sensors each day are stored in one dataset, and another dataset stores daily records going back to 2018. The analysis included visualization of the bin fill levels over time, and geospatial maps of bin locations. Figure 9 below shows a histogram of fill levels of all the smart bins on one day in 2020. Figure 10 below shows the fill levels of one of the bins over a 13-day period in 2018. The data appear to be very coarse grained, and therefore may not lend themselves to the ML techniques discussed in [46]. Further investigation of this smart bins system is planned.
A program to draw geospatial maps of the WCC smart bin locations and type (waste or recyclables) was written using the Python libraries basemap and matplotlib. This uses a Python Flask API which can be accessed by a Representational State Transfer (REST) client such as Postman ® using the GET command. The endpoint is https://IPaddress:80/plot/plot.jpeg and, on sending the GET command, this API returns a street map of the Werribee CBD with the smart bins shown, as in Figure 11 below.
The full Python code is given below:

    # API to download map of Werribee CBD with smart bin locations plotted
    # Needs to be adapted to your Python environment
    # Author: Richard Watson

    # import libraries
    import os

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from pandas import DataFrame
    from flask import Flask, request, abort, jsonify, send_from_directory

    # PROJ_LIB must be set before basemap is imported
    os.environ['PROJ_LIB'] = 'C:/Users/Miniconda/Library/share'
    from mpl_toolkits.basemap import Basemap

    # Folder holding the saved plot; not defined in the published listing,
    # so this value is an assumption
    DOWNLOAD_DIRECTORY = os.path.join(os.getcwd(), 'plot')

    # Serve a previously saved plot with flask
    app = Flask(__name__)

    @app.route("/plot/<path:path>")
    def get_file(path):
        """Download plot."""
        return send_from_directory(DOWNLOAD_DIRECTORY, path, as_attachment=True)

    if __name__ == "__main__":
        # ssl_context='adhoc' serves HTTPS with a self-signed certificate
        app.run(debug=True, host='0.0.0.0', port=80, ssl_context='adhoc')

It is planned to develop an API which extracts smart bin fill levels from the WCC open datasets and works out when emptying is due for each bin, and the optimal route for trucks which empty the bins and transport waste to landfill and recyclables to recycling factories. The above API is only the start of what we intend to produce; many further stages of development have still to be carried out. The final set of APIs will probably need to be containerized and deployed on a cloud [55]. A bespoke client for accessing the APIs, probably a REST one suitable for mobile as well as PC, will also be needed.
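As a rough illustration of the planned scheduling and routing logic, the sketch below flags bins whose fill level exceeds a threshold and orders them with a nearest-neighbor heuristic. Everything here is an assumption for illustration: the bin records, depot location, and 70% threshold are hypothetical, and the greedy ordering is a simple heuristic that does not guarantee an optimal route.

```python
import math

# Hypothetical bins: (id, latitude, longitude, fill fraction 0..1).
bins = [
    ("B1", -37.9000, 144.6600, 0.85),
    ("B2", -37.9010, 144.6620, 0.40),
    ("B3", -37.9025, 144.6590, 0.95),
    ("B4", -37.8990, 144.6650, 0.75),
]
DEPOT = (-37.9005, 144.6580)   # assumed truck starting point
THRESHOLD = 0.7                # assumed fill level at which emptying is due


def distance_km(a, b):
    """Equirectangular approximation, adequate over a few kilometres."""
    lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(lat)
    dy = math.radians(b[0] - a[0])
    return 6371.0 * math.hypot(dx, dy)


def bins_due(rows, threshold=THRESHOLD):
    """Bins whose fill level has reached the emptying threshold."""
    return [row for row in rows if row[3] >= threshold]


def greedy_route(rows, start=DEPOT):
    """Visit the nearest unvisited bin each time (heuristic, not optimal)."""
    remaining, here, order = list(rows), start, []
    while remaining:
        nxt = min(remaining, key=lambda row: distance_km(here, row[1:3]))
        remaining.remove(nxt)
        order.append(nxt[0])
        here = nxt[1:3]
    return order


route = greedy_route(bins_due(bins))
```

A production system would use road-network distances and a proper vehicle routing solver; straight-line distance is used here only to keep the example self-contained.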

Analysis of City of Melbourne Social Indicator Survey and Livability Datasets
ML is beginning to be applied to social surveys. Buskirk et al. provided an introduction to the potential for ML techniques for survey research [56]. Ramirez et al. applied ML to public health surveys to explore the use of language for English and non-English responses [57]. This study showed that there are differences between responses in different languages with heterogeneity among the Asian languages. These authors also applied ML to interpret the 2016 US Presidential Election, concluding that registration and voting procedures along with political issues were significant features but that only the age demographic factor was strongly linked to voter participation [58].
Recently, the COM conducted a survey and a related study to measure city performance with regard to the health, wellbeing, participation, and connection of its communities. The COM Social Indicators Survey (CoMSIS) [59] was conducted in 2018, while the COM Liveability and Social Indicators study [60] was conducted in 2019. The latter dataset includes social indicators determined from CoMSIS. These data were recorded in two medium-sized open datasets: one contains responses to questions about lifestyle and health, while the other focuses on livability indicators for city services. A third, smaller dataset provides indicators of wellbeing by year [61]. The COM uses these datasets to measure city performance against other cities that are members of the World Council on City Data (https://www.dataforcities.org/wccd/).
The first dataset, CoMSIS, is a CSV file with 666 rows and 10 columns of mainly text data. There are 18 groups of respondents for the survey. Note that while these datasets are not strictly big data, the same analysis techniques described in Section 4 can be applied. Indeed, these datasets are complex: for example, there are 14 topics, and one of them, physical activity, has 108 rows covering the following 6 questions asked of each of the 18 groups (6 × 18 = 108):
2. Participate in sports and exercise activities
3. Participate in sports and exercise activities in COM
4. Participate in organized physical activity
5. Participate in physical activity organized by a fitness, leisure, or indoor sports center
6. Participate in physical activity organized by a sports club or association
A sample of the CoMSIS data analysis is shown in Figure 12 for indigenous cultural awareness. The fraction (%) of each of the 18 surveyed groups that correctly identified the two traditional indigenous tribes in the Melbourne area (Wurundjeri and Boonwurrung) is displayed in red, with an average value of 3.5%. The blue bars denote the percentage of each group that rated the relationship between indigenous Aboriginal and Torres Strait Islander peoples and other Australians as very important, with a much higher average value of 92.7%.
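Averages of this kind can be computed directly from the CSV with the Python standard library. The sketch below is illustrative only: the column names (`indicator`, `respondent_group`, `result`) and all values are made up, the real CoMSIS headers may differ, and an unweighted mean across groups is assumed.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an assumed layout; these are not actual CoMSIS values
SAMPLE = """indicator,respondent_group,result
named_both_groups,group_1,2.0
named_both_groups,group_2,4.0
named_both_groups,group_3,6.0
relationship_very_important,group_1,90.0
relationship_very_important,group_2,92.0
relationship_very_important,group_3,94.0
"""


def mean_by_indicator(csv_text):
    """Return {indicator: unweighted mean result across respondent groups}."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["indicator"]].append(float(row["result"]))
    return {name: sum(v) / len(v) for name, v in values.items()}


averages = mean_by_indicator(SAMPLE)
```

If the groups differ greatly in size, a mean weighted by each group's sample size may be more appropriate than the unweighted one assumed here.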
This difference in response is likely due to the difference between a specific, challenging question and a more general question on culture. It also indicates that most residents are unaware of the two local indigenous tribes or at least cannot name them both. Indeed, the Wurundjeri tribe was historically more populous and has a far higher profile in the modern city than the Boonwurrung. For example, there is a famous 25 m eagle sculpture in the Docklands area, Bunjil, the spirit creator of the Wurundjeri people (https://www.onlymelbourne.com.au/bunjil).
Can ML also be applied here? One area of interest would be to develop an ML model to predict physical activity indicators based on respondent group, sample size, and other features. A preliminary investigation was carried out with the "physical activity" result set as the target variable and several standard algorithms tested on the COM data. Both Random Forest and Naïve Bayes achieved over 98% accuracy in predicting physical activity; however, the sample size is small, so this figure should be treated with caution. The analysis also showed that the important features included the respondent group and the sample size.
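To make the modelling idea concrete, the sketch below implements a small categorical Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. It is not the code used in the preliminary investigation: the feature values, the labels, and the helper names `train_nb` and `predict_nb` are hypothetical, and in practice a library implementation with cross-validation would probably be preferred.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model by counting feature values per class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Return the class with the highest smoothed log-probability for `row`."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for i, v in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values
            count = feature_counts[(i, y)][v] + 1
            score += math.log(count / (n_y + len(values[i])))
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical training data: (respondent group, sample-size band) -> activity level
rows = [
    ("young", "large"), ("young", "small"), ("middle", "large"),
    ("older", "small"), ("older", "large"), ("middle", "small"),
]
labels = ["high", "high", "high", "low", "low", "low"]

model = train_nb(rows, labels)
prediction_young = predict_nb(model, ("young", "large"))
prediction_older = predict_nb(model, ("older", "small"))
```

With so few rows, accuracy measured on the training data says little about generalization, which is the same caution that applies to the small COM sample.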
The second dataset is a CSV file with 319 rows and 10 columns. Unlike the first dataset, these data were not collected through a survey of residents but rather from a variety of sources such as the Australian Bureau of Statistics. Like CoMSIS, this is a complex dataset, with 15 topics each having multiple indicators. Figure 13 shows the 15 livability topics studied and the number of questions (indicators) for each topic.
Further analysis and ML combining the two datasets is planned. This will be presented in a separate paper.

Conclusions
Australian governments at all three levels, local (council), state, and federal, are exploiting their open datasets to enable Australian communities to benefit from smart city initiatives. These datasets are amenable to big data analytics and ML techniques that can help governments improve the efficiency of their operations and planning. In this paper, we briefly reviewed the open datasets published to date by Australian local government councils. We then explored the fields of big data analytics and ML and demonstrated how local councils can benefit from them. A pilot study using Python-based data analytics and ML on a Victorian road accident dataset was described. This produced a prototype model that could be used to predict crash probabilities given various factors.
Student projects sponsored by the authors at two Melbourne universities relating to waste management, traffic, transport, and parking were then described, and preliminary results of a social indicators survey analysis (performed by one of the authors) were also provided. It was found that the MLaaS systems, while having great potential, were free only for limited use and required paid subscriptions for their full services. While these projects are yet to deliver usable applications, the potential is enormous. A smart city or council would benefit from a web-served system hosting a set of big data analytics models to manage operations such as waste disposal, parking, and energy consumption.

Funding: This research received no external funding. The research was done in the authors' own time out of interest in the topic. One of the authors is an Honorary Research Fellow at the Australian Defence Science and Technology Group (DSTG) and some research was carried out within that organization.