Big Data: The Engine to Future Cities—A Reﬂective Case Study in Urban Transport

: In an era of smart cities, artiﬁcial intelligence and machine learning, data is purported to be the ‘new oil’, fuelling increasingly complex analytics and assisting us to craft and invent future cities. This paper outlines the role of what we know today as big data in understanding the city and includes a summary of its evolution. Through a critical reﬂective case study approach, the research examines the application of urban transport big data for informing planning of the city of Sydney. Speciﬁcally, transport smart card data, with its diverse constraints, was used to understand mobility patterns through the lens of the 30 min city concept. The paper concludes by offering reﬂections on the opportunities and challenges of big data and the promise it holds in supporting data-driven approaches to planning future cities.


Introduction
Clive Humby (2008) [1]-of Tesco Clubcard fame once declared that 'data is the new oil', broadly connecting the digital collection and analysis of 'big' data-and behavioural data generated by human activity in particular-to a nascent Fourth Industrial Revolution [2]. In an urban context, big data has been linked to the smart city, the Internet of Things (IoT), and, more recently, to the revival of the 'Digital Twin" [3]. This 'new oil' has fuelled increasingly complex and dynamic city analytics, as well as displacing survey data historically collected to understand demographic change and human behaviour. In its progress, there is promise of big data to better support management of existing infrastructures and, in theory, the planning and sustainable development of the cities of tomorrow. Of course, a more nuanced understanding of the strengths and limitations of big data point to the essential ongoing role of purposive data collection, such as via the national census and household travel survey, in providing insights that underpin a rich tapestry of urban research and real-world policy and decision-making.
In this research, we trace a number of opportunities and challenges in the exploitation of big data within a land-use and transportation planning context. We take a long view of 'big data', considering a number of definitions, and developing a 'potted history' of the co-evolution of data, tools, methods, and research inquiries. We then introduce a case study using public smartcard transport data and explore its application to the '30-min city' concept as a cornerstone of Sydney's Metropolitan Planning Strategy. Lastly, we provide a critical reflection on the promise of big data, and on the opportunity and challenges in its access and its use in planning our future cities. Batty (2013) [4] once defined big data as 'any data that cannot fit into an Excel spreadsheet'. In this new age, this tongue-in-cheek definition now seems feels very much like saying the definition of an ocean is that it does not fit into a teacup. However, it is clear

A History of Big Data in Understanding Cities
The application of data to the governance of cities and regions can be traced back far beyond the origins of computers: Van Rijmenam (2014) [17] highlights the earliest use of written records, dating back more than 7000 years, as part of the administration of ancient Mesopotamia. The Romans too are well known for their administrative acumen, with the census perhaps best known for its role in Christianity's origin story. Around the same time, in 2CE, the Han Dynasty were performing their own census recording data about a Chinese population of more than 57 million in 12 million households, larger than the current Australian census, without the benefit of computers [18,19].
The modern census-with its goal of recording every member of the population at regular intervals-is often associated with the rise of the nation state and the use of statistical tables for the purpose of governance and planning [20]. We suggest, however, that it is perhaps more relevant to trace the modern idea of 'big data' back to the US Census of 1896 and the use of Herman Hollerith's punch card tabulation machines [21]. These machines, the precursor to IBM's mainframes, were central to the development of modern computers [22] and marked a radical shift towards the use of 'machines'-mechanical, electrical, and, finally, digital-in the processing of data.

Urban Modelling and Big Data
The history of Large-Scale Urban Models (LSUM) reflects the linkage of new data and new tools to new planning approaches. Rooted in the premise that computer models of urban land use and transportation could contribute to more rational urban planning, early LSUM improved our understanding of model building, its limitations, and its relationship to policy. Strong contemporary criticism by Lee (1973) [24] stressed that these efforts largely failed to deliver: they were too 'hungry' for disaggregated data and, at the same time, too coarse in their results to be useful in addressing issues that they were intended to solve. Lee also noted that too much emphasis was put on data and rationality, and that this was not supported by the theoretical underpinnings of such models.
Twenty years later, considering changes in society, information technologies, and computing, researchers began to revisit LSUMs [25,26]. The desktop computer gave planners ready access to more data, greater computational power, and dramatically better graphics, all accompanied by new software tools. Wegener (1994) [27] emphasised that by the mid-1990s many LSUMs were being applied to real-life metropolitan regions for the purpose of research and/or policy analysis. Batty (1994) [28] argued that the evolution of urban modelling, integrating theory and method, was an essential element of the movement towards a 'science of planning'. New paradigms, such as complexity, started affecting the way urban systems were conceptualised and were accompanied by new modelling techniques such as cellular automata (CAs) and agent-based modelling (ABMs), used to understand the complex interplay between land use and transport dynamics and flows across the city.
A further twenty years later, Batty (2014) [29] revisited this theme in the light of the smart cities concept (see also Townsend 2013 [30]), noting the same kind of 'euphoria' as was associated with LSUMs in the 1960s, and stressing the need to learn from history and to recognise the limits of such approaches. Yes, computational and sensing devices-both fixed and mobile-are now distributed across the city fabric and are connected to one another by digital networks; however, much of the data generated via these interactions is unstructured and the analyst must either discover order, or impose it, using other data and/or theory for guidance.
The 'Digital Twin' initiative can be seen as a way to reintegrate the concept of LSUMs in the current context and it and embodiment of urban big data. At its most basic, the digital twin is a virtual model of physical assets that can be used for predictive maintenance. Crucially, through the use of big data generated by IoT devices, high resolution city imagery of building and real-time traffic feeds the model is no longer really distinct from the city that it represents. The data generated by this infrastructure enables the model to continuously adapt to changes in its environment and the model can, in turn, inform the environment and its infrastructures. In other words, through feedback effects the distinction between the physical world and its digital simulacrum is breaking down. This might not be a new idea for readers of post-modern theorists such as Baudrillard (1994) [31], but the convergence of IoT, big data, and machine learning puts planning on a very different footing (again) and holds out the promise of predicting and shaping the future [32], even if the event horizon is measured in minutes, hours, or days, rather than the more usual planning horizon of years or decades [33].

Data for Future Cities
This trend towards more real-time data and analytics seems to point to the ultimate emergence of a 'smart' city in which individuals and instrumentation serve as the eyes, ears, and actuators of a kind of city-scale artificial intelligence (AI). If we consider Townsend's (2013) [30] definition of smart cities as being 'where information technology is wielded to address the myriad problems facing society through the collection and analysis of data about the behaviour of people, precincts, and infrastructure', the promise here is of 'better' decision-making and the realisation of more liveable cities where citizens spend less time commuting and have more time for work and leisure. However, as more critical scholars have pointed out, this can strip plan-making of its political features and reframe it as a purely technocratic problem that is centred on simply selecting the most appropriate optimisation criteria.
Regardless, data-driven approaches to urban management are unquestionably becoming more common. Engin et al. (2019) [14] mapped this landscape and highlighted three categories of application: (i) real-time management; (ii) evidence-based planning decisions; and (iii) framing the future. The data required to support these data-driven approaches are captured through combination of traditional government and industry data capture programs coupled with volunteered geographic information [34],which Ratti and Claudel (2016) [35] term 'opportunistic sensing' and with data from authoritative data capture programs.
However, the ethics of such (re)use of VGI and opportunistic data is of considerable interest to government and citizens; most notable and controversial has been the use of harvested Facebook profiles by Cambridge Analytica [36]. Nevertheless, the approach seems here to stay and will likely play a significant role in future collection and curation of urban big data: even Mastercard has become an urban mobiilty solutiona provider through theirr City Possible programme, partnering with Cubic to delivering a contactless payment system for both 'open loop' and 'closed loop' systems in cities such as Sydney and London. s. Such urban mobility data products are made possible through the collection and aggregation of individual transactions which can be considered as digital traces consumers leave in daily activity. In the last two decades alone, data from providers, such as Mastercard, have uncovered previously unrealised patterns of activities (e.g., in e-commerce), which taken as a whole, has challenged what was previously understood of land-use planning, urban mobility, and other demographic and location connections [37].
Owing to this, questions of data ownership and access for a variety of purposescommercial, academic, and governmental-have been widely debated in relation to online services, but similar notions can also be applied to data on the built-environment itself. Valuable information, ranging from property values and building characteristics, to the demographic make-up of their residents and users are often still held tightly by institutions, but alternative systems of data collection, such as Colouring London [38], have sought to 'open' big data by leveraging on crowdsourcing and participatory mapping. The democratisation of urban data by such platforms heralds a new evolution of informed decision-making, replacing outdated legacy systems with novel connected and accessible models that have the potential to reshape administrative and municipal functions. Indeed, if used judiciously, we believe the availability of such data might address some of the complex problems facing our cities by enabling an informed approach towards tackling congestion and realising more integrated public transport solutions for our rapidly urbanising planet.

Urban Data for Transport Planning
Indeed, urbanisation generates no shortage of complex problems: it has been estimated that in Australian cities alone, the cost of avoidable congestion was $12.8 billion in 2015. This figure is expected to more than double to approximately $30 billion by 2030, driven by the rapid growth of Sydney, Melbourne, Brisbane, and Perth [39]. Getting commuters off roads and using public transport has also been reported to have significant environmental benefits in realising more sustainable cities [40] However, this is not a new problem. As early as the 1970s, the Sydney Coordinated Adaptive Traffic System (SCATS) was deployed sensors at each traffic signal to detect vehicle and pedestrian presence. The systemoperating in real time-adjusts traffic signals in response to variations in demand and system capacity. SCATS is now the leading traffic management system in the world, which is used to quietly monitor and control over 37,000 intersections across 27 countries [41]. In this sense, it may perhaps be fair to frame SCATS as a big data, smart city 'solution' that predates the existence of these terms.

Rise of the Mobile Phone
A step change in the availability of data about urban mobility came with the maturation of the GPS system. It then became possible not just to count, but to track, vehicles including private cars, freight, and public transport. Following suit, the uptake of GPS-and data-enabled smartphones has since furthered this process. The inherently personal nature of such devices holds out promise of understanding individual behaviours through their movement. Certainly, smartphone applications, such as Riderlog [42] and SafetiPin [43] has enabled the collection, analysis, and visualisation of individual movement at scale; and, collectively, these apps point towards the possibility of a 'cellular census' [44].
Social media is another frequently used source of big data for urban analytics with the analysis of geographically encoded 'tweets' as one research pathway [45]. These novel datasets have seen applications in the quantification of transport-and demographic migration models, often exhibiting more relevance to planning than traditional models hinging on cyclical data collection [46,47]. Volunteered geographic data, in tandem with the widespread use of mobile phones has, more than ever before, enabled researchers to begin to make sense of this unpredictability. Following this line of thinking, other investigators have also incorporated natural language processing techniques such as sentiment analyses [48] and topic modelling [49] to provide a level of rationalisation of the complexity associated with big data.
In the context of transport, the role of big data consumers in the standardisation of data about urban mobility has often been overlooked. Principal amongst these has been Google and its role in the development of the General Transit Feed Specification (GTFS) in collaboration with Portland's TriMet public transport operator [50]. Designed with multi-modal journey planning in mind, GTFS has become the de facto standard for sharing data about subway, rail, and bus schedules because of Google's pre-eminence in end-user mapping and trip planning.
The standard has, of course, made Google Maps more useful to its customers, reinforced the value of the standard to the operators and encouraged yet more of them to converge on this format as the means for publishing their own timetables. Google can, in turn, combine journey planner requests with real-time location data from phones to prompt a user to input valuable real-time information on the crowdedness of the train or bus and any delays impacting the service. This is an unusually virtuous circle: the phone serves both to access and generate data, yielding ever-more nuanced and exhaustive insights into city-scale mobility and, in turn, allowing users and planners to make ever-better use of a limited resource though few outside of Google are ever likely to see this data 'in full'.
Not all data aggregators have approached this in quite the same way: Strava collects and provides data-for a price-to help cities understand people's cycling patterns and drive new research into active travel behaviour [51,52]. Their data has, for instance, assisted with cycle infrastructure planning by Miami-Dade County, Florida [53]. Of course, the degree to which an exercise app's user base can substitute for the population of pedestrians or cyclists in an urban area is very much an open question, and one to which we will return below.

The Mixed Value of Smartcard Data
A source of (relatively) more accessible data on urban mobility, particularly in a public transport context, has been the Smart Card Automated Fare Collection (SCAFC) systems such as Hong Kong's Octopus, London's Oyster, Sydney's Opal, and Singapore's EZ-link cards (see, for example, Zhong et al., 2016 [54]; Reades et al., 2016 [55]). Unlike app and network data, SCAFC data is often ultimately owned by the transport operator, although in the early days of the smart ticketing 'revolution' not all operators negotiated with their suppliers the right to exploit their 'own' data.
The contractual structures governing smart card ticketing create new opportunities for data sharing and re-use for planning, research, and even commercial, purposes. Transport for London was one of the first movers in this area, making some aggregated data available through the London Data Store and other tranches of disaggregated data available to research partners in academia to support modelling of multi-modal journeys, the analysis of variability at the individual level [56], and the behavioural changes induced by megaevents such as the London Olympics [57].
A recurrent challenge in such analysis, however, is that these systems were designed to bill users correctly-potentially with a bias towards under-or over-billing them depending on political judgements about their acceptability-not to support fine-grained origin/destination (O/D) analysis. It might seem obvious that users of smart card systems 'tap in' at one location and 'tap out' at another, but real life is rarely so simple: London does not require users to tap out of buses but does for trains. New York, meanwhile, has not normally collected data about destinations because the system was designed with a fixed charge in mind. Intermediate validations, re-entries, re-exits, automatic termination, and charge-free transfers add to the complexity. Each infrastructure is unique and, consequently, each carries its own set of limitations and qualifications.
This complexity is a challenge often overlooked by proponents of 'open data' and the provision of such data to researchers can carry substantial costs: the maintenance and validation of data feeds, and the costs associated with ad hoc requests for support documents or updated feeds, are far from inconsequential. It is imporant to note that many open data sites also have a disclaimer about the veracity and timeliness of the data, asking users "to use at their own risk".
The proliferation of 'curiosity-driven research' by academics and the lack of obvious planning or efficiency return benefits to operators can lead to the creation of gatekeepers who filter out requests not seen as directly contributing to the operator's bottom line. This approach reduces costs but also the range of actors working with these data and organisations. However, operators are beginning to release new products that are much more relevant to researchers than anything previously available.
Historically, transport studies-such as those of individual travel budgets-were based on sample surveys using travel diaries [58], but big data cover larger areas of the city and proportions of the population and typically capture date, time and location with greater accuracy than self-reporting methods [59]. With these benefits in mind, it's easy to see why transport has become one of the most significant components of the contemporary smart city.  [59] using public transport smart card data to better understand travel behaviour through cities. At the same time, such work has also highlighted the limits of big data, and the challenges that need to be addressed to overcome these.

Big Data Applications in Practice
We here present a case study focussed on Sydney's SCAFC, Opal, the case study from the perspective of a big data user/urban researcher (ref. Figure 1). This enables critical reflection and the researchers' experiential insights to be systematically and logically documented. We here present a case study focussed on Sydney's SCAFC, Opal, introduced in 2012 and made compulsory in 2016. Between 2016 and 2020, a series of planning exercises underpinned by Opal Smart Card and other datasets were undertaken to better understand accessibility to jobs and services as part of the Greater Sydney Commission's vision of a 30 min city. A standard analytical workflow developed by Pettit et al. (2017) [62] frames the case study from the perspective of a big data user/urban researcher (ref. Figure 1). This enables critical reflection and the researchers' experiential insights to be systematically and logically documented.

Research/Policy Question
The first step of this approach is to define a research and policy question. The city's vision builds on the idea that people aim to optimise their travel, budgeting for no more than one hour per day (see for example Marchetti, 1994 [63]); this figure has become a cornerstone of the Sydney Metropolitan Plan [64] and Sydney's Future Transport Strategy [65]. Essentially, the idea is that that anyone within the city should be able to reach essential services and opportunities, such as employment, healthcare, recreational activities and education, within 30 minutes' travel time (see Figure 2). As defined by the Greater Sydney Commission's 30 min city indicator, their policy purpose is to measure: 'the proportion of residents able to reach their nearest metropolitan centre/cluster or strategic centre using public transport and/or walking within 30 minutes' [66]. This policy has inherent sustainability drivers too, in focusing on reducing travel time to within 30 min and thereby reducing the carbon footprint of a city induced by travel.

Research/Policy Question
The first step of this approach is to define a research and policy question. The city's vision builds on the idea that people aim to optimise their travel, budgeting for no more than one hour per day (see for example Marchetti, 1994 [63]); this figure has become a cornerstone of the Sydney Metropolitan Plan [64] and Sydney's Future Transport Strategy [65]. Essentially, the idea is that that anyone within the city should be able to reach essential services and opportunities, such as employment, healthcare, recreational activities and education, within 30 min' travel time (see Figure 2). As defined by the Greater Sydney Commission's 30 min city indicator, their policy purpose is to measure: 'the proportion of residents able to reach their nearest metropolitan centre/cluster or strategic centre using public transport and/or walking within 30 min' [66]. This policy has inherent sustainability drivers too, in focusing on reducing travel time to within 30 min and thereby reducing the carbon footprint of a city induced by travel. and made compulsory in 2016. Between 2016 and 2020, a series of planning exercises underpinned by Opal Smart Card and other datasets were undertaken to better understand accessibility to jobs and services as part of the Greater Sydney Commission's vision of a 30 min city. A standard analytical workflow developed by Pettit et al. (2017) [62] frames the case study from the perspective of a big data user/urban researcher (ref. Figure 1). This enables critical reflection and the researchers' experiential insights to be systematically and logically documented.

Research/Policy Question
The first step of this approach is to define a research and policy question. The city's vision builds on the idea that people aim to optimise their travel, budgeting for no more than one hour per day (see for example Marchetti, 1994 [63]); this figure has become a cornerstone of the Sydney Metropolitan Plan [64] and Sydney's Future Transport Strategy [65]. Essentially, the idea is that that anyone within the city should be able to reach essential services and opportunities, such as employment, healthcare, recreational activities and education, within 30 minutes' travel time (see Figure 2). As defined by the Greater Sydney Commission's 30 min city indicator, their policy purpose is to measure: 'the proportion of residents able to reach their nearest metropolitan centre/cluster or strategic centre using public transport and/or walking within 30 minutes' [66]. This policy has inherent sustainability drivers too, in focusing on reducing travel time to within 30 min and thereby reducing the carbon footprint of a city induced by travel.  Sydney's 30 min city puts forth the notion of planning by time [59], which brings to the forefront the relationships between mobility accessibility, and the city's locational attributes, built structures, and transport networks. Critically, however, these relation-ships are complex, and the requisite data to describe these relationships may yet still be fragmented and may not display any discernible patterns. As such, it can be argued how big data, facilitated by current technological advancements, may bridge the gap in data collection and analysis of these dimensions still remains to be seen-particularly in the realms of land-use planning, transport planning, and service delivery.
However, as opined by Moreno et al. (2021) [67], there may perhaps still be value in understanding many different outcomes of such chrono-urban models in these spheres as is the case of Sydney. Further work is therefore required to understand such complexities in the city in light of the city's ambitious planning futures; thus, indicating the opportunities for big data to better aid and disentangle these uncertainties, in this vein, the present paper provides a critical reflection of the use of big data for urban planning and policy-making with a case study of the 30 min city to support the NSW Government in understanding and planning for these. In particular, this research sought to identify to what extent current infrastructure was encouraging and enabling the policy objective. For an in-depth analysis of the 30 min city methodology and data, refer to the paper by Leao et al., (2021) [59].

Data Discovery
For big data to be used, it needs to be found (or 'discovered') in the first place, and this requires an infrastructure to support data storage, search functionalities, metadata records, and tools to remotely access to data, including APIs (Application Programming Interfaces). Examples of such infrastructure at a national level in Australia include the Data Gov Portal (https://data.gov.au/data/) (accessed on 15 November 2021) and AURIN, the Australian Urban Research Information Network (https://aurin.org.au/) (accessed on 15 November 2021). The AURIN online workbench provides access to over 5000 urban datasets to support data-driven urban studies [68]. While supporting the open data policies by providing easy access to published data, AURIN also works as a catalyst triggering the publication of more and better-quality data.
Opal data is discoverable and accessible in one form through the Transport for NSW Open Data Gov Portal (https://opendata.transport.nsw.gov.au/) (accessed on 15 November 2021). To protect the privacy of travellers, the raw data collected through the ticketing system is simplified and aggregated to stop-based rather than trip-based, giving nonidentifiable metrics in reduced sample time bins. The 'raw' Opal Card data is also available through 'human' access via a data request process, which is assessed by the Department. Data provided through the 'human' rather than 'automated' channel provides a richer source of trip-based data and is the basis of the urban analysis discussed in the ensuing sections of this paper.

Data Acquisition
Both the open and closed versions of the Opal card data is supplied as a cross-sectional output, rather than as a live data stream for specific stops, routes, or areas of the city. For this work, an agreement was reached between the research teams at the University and the NSW Government. The initial data license provided access to an anonymised dataset spanning a three-month period in 2016 and retained the origin-destination link. The data license was established over a two-week period and data was provided on a secure USB stick for the research team to access.
A further request in 2018 for an update to this data for time-series analysis resulted in a much longer processes for data acquisition. Securing this next license took nearly six months of negotiation and documentation. Due to more strict data privacy policies, further aggregation and simplification of the data were required, even for licensed data, and more recent releases obtained for the project did not provide the same level of resolution as the 2016 data cut. This presented challenges in undertaking temporal analysis. In short, the absence of an agreed format for the life of a project can potentially hamper research, even though the government itself supports the research. As data privacy concerns continue to escalate, access to consistent urban big data, such as the Opal Card data, is not necessarily possible. This limits the spatial-temporal resolution of research analysis and its potential to support the collective planning and policymaking of more sustainable future cities.

Data Preparation
A survey by Forbes in 2016 reported that data scientists typically spend 80% of their time on preparing data, before analysing it [69]. As Opal Card data is an official government system used in the whole State of NSW, it has several advantages: it is a structured, complete, and maintained dataset. The Opal Card data required relatively little cleaning but did include more tap-ons than tap-offs (potentially due to the traveller forgetting to tap-off or to malfunction of the Opal reader), and a few outliers with short distance trips taking a very long travel time (potentially due to disruption of the transport system, or malfunction of the card system). In contrast, crowd sourced mobility data from fitness apps, for example, is intrinsically incomplete, often as a result of recruitment biases.
This does not mean that Opal Card data are without their challenges: there was only a limited period of time period during which Opal was the primary form of payment for public transport in Sydney. When Opal was rolled out in 2012, it did not immediately become the only method of payment, magnetic stripe tickets were still in wide circulation until August 2016. Since mid-2019 American Express, Mastercard and Visa credit or debit card users-as well those with phones supporting Near Field Communications-have been able to pay for their travel without an Opal Card, further complicating the analysis. The fragmentation of payment technologies and vendors can create access and ownership issues, reducing the utility of big data as a replacement for traditional survey instruments.
Balancing the desire for recency and completeness, we chose to use August 2016 data since it captured the most complete record of the public transport system though, of course, it did not capture annual cycles in transport use. Additional considerations include whether to suppress special event or weather conditions. Data linkage to generate O/D matrices, incorporating different levels of resource access, also included allowance for transfer times in multi-modal journeys (e.g., bus + train or ferry + bus). An additional step was the association of Opal data with Census information by assigning an 'SA1 area'-a standard Census unit-to each public transport trip. The costs of data preparation are high and sometimes prohibitive: our study required a highly skilled team of data scientists and used a range of specialist software including R studio and ESRI's ArcGIS.
It is important to note some limitations of the data preparation process. First, the calculated travel time based on Opal Card data is indeed an underestimation, as walking or driving time to and from train/bus stations was not included. Further, 'tapping on' happens on entering a train station, usually well before boarding, but when a bus arrivespotentially making bus trips seem shorter relative to trains. The simplification of using a weekday and morning peak as reference for commuting does not account for weekend or night-shift workers, and may also include non-commuters.

Data Analysis
Data analysis is the process of extracting 'value' from the data, producing meaningful insights or assisting in decision-making by individuals or organisations. In this study, we first calculated the traditional 'mean travel time budget' as an aggregated indicator for Greater Sydney, as a way to ensure comparability with conventional travel budget methodologies. The literature, however, indicates that although results are robust at the aggregate level, the idea of a stable travel budget has not been as strongly confirmed in disaggregated studies in contemporary cities [61]. We therefore proposed and calculated five new travel time related indicators for the 30 min city [59] in the hopes of bringing to the surface spatial variations and inequalities which were, prior to access to big data, hidden inside the aggregated travel budget. For each destination, among 15 strategic town centres across Greater Sydney, the five new indicators included: (1) median travel time budget; (2) interquartile range of the travel time budget; (3) percentage of workers with travel time shorter or equal to 30 min; (4) percentage of workers with travel time shorter or equal to 30 min by range of worker's income; and (5) correlation coefficient between workers' travel time and mean income. Together, these indicators provided a rich picture of geographic differences and inequalities in urban accessibility.
One challenge was the storage and computing power required: although not enormous in terms of length, the complete data for a single day in Greater Sydney nonetheless spanned 5 million rows (individual trips), each with 60 columns of 'width' (the attributes of each trip, including date, time, card type, locations, lines, operators, area codes, etc.). This certainly passed Batty's 'Excel test' and also required a balance between subject knowledge and advanced digital skills for meaningful interpretation along the data analysis pipeline.
Moreover, understanding the limits of individual datasets, and how data integration processes can overcome some of these limitations, is an important dimension in big data analytics. Enrichment with survey instruments is one way to add other dimensions or resolve challenges. For example, in an area with low frequency services a customer may wait 15 min at the start of their supposed 30 min journey. This can be ameliorated by combining O/D data with scheduling or frequency data (such as GTFS, mentioned earlier). Data on weather conditions, public transport disruptions, major sporting or cultural events, school holidays, and so on can provide analytical context and interpretive depth.
The potential of big data is endless, within the capabilities, goals, skills and analytical maturity of the organisation as a consumer, ranging from descriptive, inquisitive, predictive and prescriptive, to pre-emptive analytics [70]. We have focused mostly on a descriptive analysis (identifying travel times across the region for selected employment hubs), and slowly moved towards more inquisitive approaches (How do social spatial inequalities relate to accessibility across the region?).
Interestingly, we found a large variation of travel time, and also a geographic divide, in Greater Sydney. In the East/Centre region, the higher the income of the commuter, the shorter their travel time; while in the West, the opposite was found, with higher income commuters leading having longer commuting times [59]. This divide reflects complex socio-economic inequalities, that manifest geographically in the city shape, and should be taken into consideration in future plans for urban growth and accessibility.

Data Visualisation
In the early stages of an analysis, data visualisation can assist in the identification of missing, erroneous or duplicate values. In intermediate stages, it can assist users in 'sensemaking' through querying of the data and assessing the results through visual outputs. In the final stages, it can be a means to organise the data to best communicate insights. In short, although we've positioned it as the penultimate stage in the pipeline for simplicity, data visualisation can be integral to many of the stages in this workflow. Agrawal et al. (2015) [71] divide the challenges of big data visualisation in three categories: perceptual scalability (associated with the human eye's perceptual limits and also the limited screen sizes of some of our ubiquitous devices); real-time scalability (related to limits of computational memory, query, and processing for big real-time data streams); and interactive scalability (associated with the computation limits of complex algorithms being interactively applied by multiple simultaneous, potentially mobile, users). Additional data visualisation complexities include privacy protection of subjects illustrated graphically, and security against hacking of the datasets behind visualisation dashboards.
Using the 'semi-interactive' ESRI StoryMap platform, we developed a visualisation allowing users to select specific job destinations in Greater Sydney and review a series of related data analyses through graphs and maps. Visualising every origin-destination data point of the study area would lead to over-plotting, overlapping lines, and would overwhelm users' perceptual and cognitive capacities. We therefore combined nearby stations and stops into the standard Travel Zone geography used for Census analysis and eliminated origins within Greater Sydney with less than five trips. The minimum threshold serves to reinforce the key findings while also acting to protect individual user privacy. Presenting the results of the 30 min city analysis in this aggregated form makes it possible to communicate the results via an online urban dashboard to government, industry and citizens. Such dashboards potentially encourage and support greater participation of citizens in the planning process [72].

Policy and Planning Support
The full cycle of big data analytics is completed when the insights from the analytical process are used to inform policy and planning decision-making. The literature on big data suggests abundant potential for better decision-making. In the private sector, big data analytics for decision making have significantly increased (4-6%) productivity [73]. In the public sector, big data's potential for rational decision-making has been both advocated and criticized in the public administration debate [74]. City planning can be very complex, due to the mix between technical and political contexts, and the multiple actors with different roles and goals. In cities, what is generally sought is not simply an optimised solution, but a compromise that generates the best possible overall 'good' for a varied society, subject to existing built stock and infrastructure, socio-economic context, and limited resources. Beyond the goal of increased revenues or reduced costs pursued by businesses, the public sector needs to deal with goals related to sustainability, equity, productivity, and liveability, while incorporating participation and transparency.
The adoption of digital planning tools to support evidence-based policy and decision making thus remains a challenge [75]. The analysis of the Opal card data characterised Greater Sydney as a 32 min city. However, based on the proposed five indicators for individual destinations, accessibility varied from a maximum in which 3 4 of the public transport commuters reached work in less than 30 min, to a minimum in which only 1 4 of the workers could do the same. Moreover, a distinctive pattern dividing Sydney in two halves was found when analysing the disaggregated correlation between workers' travel time and mean income. In the East/Centre region, the higher the income of the commuter, the shorter their travel time; while in the West, the opposite was found, with higher-income commuters having longer-duration commuting trips. In other words, Sydney is a 'somewhat 30-min city' on the west region only for the lower-income workers, and on the centre/ eastern regions only for the higher-income workers. This divide found in Sydney reflects the complex socio-economic inequalities that manifest geographically in the city shape. The 30 min city dashboard is a digital planning tool developed with the specific aim of providing datadriven insights into the planning of Future Sydney, communicating the findings described above, see Figure 3. The dashboard has provided insights to many planners and policymakers who have interacted with the dashboard, both in co-design workshops hosted in the City Analytics Lab [76] and through the ability of individuals to directly explore the visualisations via the publicly accessible online dashboard in their own time. Furthermore, the findings from the 30 min analysis has informed the formulation of district level strategic planning by the Greater Sydney Commission. The District Plans, later publicly published (https://www.greater.sydney/western-city-district-plan) (accessed on 30 November 2021), reference the goal of increasing 30 min access to key employment clusters, as incorporated into the 30 min city dashboard. Further, the dashboard has been published for open online access, since the release of the District Plans, accompanied by the support of the government and media coverage (https://www.smh.com.au/national/nsw/opal-card-data-turns-upsurprise-for-sydneysiders-wanting-30minute-commute-20170215-gudakb.html) (accessed on 30 November 2021). It remains a resource for government, industry and citizens to use.
The study summarised here was an early attempt to utilise new, big data (Opal card) to approach a new government goal (30 min city). Since then, multiple dashboards, analytics and interpretations on the same theme have been developed, making it challenging to provide a clear and definitive message. These include ARUP's 30 min city analytics and visualisation work, based on accessibility modelling using GTFS data [77], and many others. In fact, one of the opportunities and challenges in providing access to open city data is that a 'thousand flowers' may bloom with different results prevailing. In relation to big data, this is a key challenge-ensuring quality control, cross-validation, and scalability [78]. employment clusters, as incorporated into the 30 min city dashboard. Further, the dashboard has been published for open online access, since the release of the District Plans, accompanied by the support of the government and media coverage (https://www.smh.com.au/national/nsw/opal-card-data-turns-up-surprise-for-sydneysiders-wanting-30minute-commute-20170215-gudakb.html) (accessed on 30 November 2021). It remains a resource for government, industry and citizens to use. The study summarised here was an early attempt to utilise new, big data (Opal card) to approach a new government goal (30 min city). Since then, multiple dashboards, analytics and interpretations on the same theme have been developed, making it challenging to provide a clear and definitive message. These include ARUP's 30 min city analytics and visualisation work, based on accessibility modelling using GTFS data [77], and many others. In fact, one of the opportunities and challenges in providing access to open city data is that a 'thousand flowers' may bloom with different results prevailing. In relation to big data, this is a key challenge-ensuring quality control, cross-validation, and scalability [78].

Conclusions
As we plan for more liveable, productive, sustainable and resilient cities the power of urban big data should help politicians, planners and the community work together to address the myriad of 'complex problems' facing our cities: including congestion, social inequality, an ageing population and the impacts of a changing climate. However, one of the challenges is that significant big data repositories are driven by strong commercial interests. Social media data from companies such as Twitter, Facebook and WeChat have not been created to solve the challenges facing our cities, rather they have arisen to connect

Conclusions
As we plan for more liveable, productive, sustainable and resilient cities the power of urban big data should help politicians, planners and the community work together to address the myriad of 'complex problems' facing our cities: including congestion, social inequality, an ageing population and the impacts of a changing climate. However, one of the challenges is that significant big data repositories are driven by strong commercial interests. Social media data from companies such as Twitter, Facebook and WeChat have not been created to solve the challenges facing our cities, rather they have arisen to connect people and generate profit through advertising and selling products. Likewise, Smart Card systems have not been devised specifically to assist city planners in shaping future cities, rather as an efficiency mechanism to reduce friction and people costs in public transport systems. Nevertheless, it is the ability to access and analyse such repositories of big data that can potentially provide new insights into the form and function of our cities. The 'opportunistic' use of such data is an engine to power our future cities. Yet, the growth of data as a commodity through e-commerce and social media platforms is creating a digital divide for those who cannot afford to access such data. Thus, there is an opportunity, and somewhat of an imperative, for government and industry to address this rising social inequality and to forge new partnerships with communities and researchers to inclusively and collaboratively work on solving urban problems.
In this paper we have considered the evolution of data sources and analytics towards the so called 'Smart City'. This is fuelled by rapidly increasing data volumes from an eclectic array of sensors. In this research, we have taken a critical reflective approach focusing on Sydney as a case study. Specifically, we examined how big data, derived from the Opal Smart Card, is being used in planning for a 30 min city, which is a cornerstone of the Greater Sydney Commission's and Transport for NSW's Metropolitan Plan. The paper documents several important insights and lessons from this Sydney case study including the increasingly important and complex problem of accessing big data and deriving new insights without comprising the privacy of individuals. There remain many opportunities and challenges arising from urban big data; only time will tell how these unfold as the divide between the digital and real world continues to diminish.