Simulating Spatio-Temporal Patterns of Bicycle Flows with an Agent-Based Model

Abstract: Transport planning strategies regard cycling promotion as a suitable means for tackling problems connected with motorized traffic such as limited space, congestion, and pollution. However, the evidence base for optimizing cycling promotion is weak in most cases, and information on bicycle patterns at a sufficient resolution is largely lacking. In this paper, we propose agent-based modeling to simulate bicycle traffic flows at a regional scale level for an entire day. The feasibility of the model is demonstrated in a use case in the Salzburg region, Austria. The simulation results in distinct spatio-temporal bicycle traffic patterns at high spatial (road segments) and temporal (minute) resolution. Scenario analysis positively assesses the model's level of complexity, where the demographically parametrized behavior of cyclists outperforms stochastic null models. Validation with reference data from three sources shows a high correlation between simulated and observed bicycle traffic, where the predictive power is primarily related to the quality of the input and validation data. In conclusion, the implemented agent-based model successfully simulates bicycle patterns of 186,000 inhabitants within a reasonable time. This spatially explicit approach of modeling individual mobility behavior opens new opportunities for evidence-based planning and decision making in the wide field of cycling promotion.


Introduction
Cycling is widely regarded as an appropriate alternative to motorized traffic, without the adverse ecological, economic, and societal effects. Thus, administrations and policy makers are aiming to strengthen sustainable mobility and to promote cycling, especially in urban environments [1,2]. Evidence from the literature suggests a positive correlation between a comprehensive promotion of cycling and a modal shift towards active transportation modes [3][4][5]. However, valid data on where and when persons are cycling within urban transportation networks are still rare [6]. Bicycle mobility data are still difficult to acquire and process despite advancing sensor technologies. This leads to deficiencies in data integrity and representativeness [7,8]. Consequently, cycling infrastructure is widely planned in the absence of robust data, while the effect of cycling promotion measures cannot be monitored appropriately.
A computer model, as the abstract representation of an actual system, allows researchers to investigate processes without conducting experiments within a real environment [9]. Transport models focused on bicycle mobility suggest solutions to questions about transport demand and supply [10,11], route choice [12], lane-changing, and queuing behavior [13,14], etc.
Different methods for modeling bicycle traffic flow patterns exist in the literature. Traditional trip-based models are geared towards modeling motorized trips based on aggregated traffic data between large-scale traffic analysis zones (TAZ). However, there are examples of integrating active modes [15,16]. Another approach, such as the direct
In this paper, we present a methodology to simulate the trips of a heterogeneous population over one complete day, with bicycle traffic flows emerging from an agent-based modeling approach. This study therefore sets out to implement context-dependent individual decision making that results in bicycle patterns, to interpret the model results for the case study, to check the model's complexity, and to validate its results against observed data. The decision making is enhanced by spatial reasoning and covers activity type, activity duration, activity location, trip starting time, mode, speed, and route. Mode choice provides six transportation options so that a person can change modes over a daily schedule. Nevertheless, only persons traveling by bicycle move through the simulated space, while all others are teleported to their destinations. The testbed for our proof-of-concept (POC) is the city of Salzburg, Austria, with its adjacent municipalities.
The rest of the paper is structured as follows. The methods section outlines the list of data acquired for the study, continued by the model description and methods of statistical testing. The results section presents the outcome of the bicycle model and validation analysis for our POC. This is followed by the final section, which offers a discussion of findings, limitations, conclusions, and an outlook considering possible future research.

Model Specification
In brief, the simulation of bicycle mobility in the presented model consists of several steps: the generation of environment and residential population, the dynamic assignment of activities, and the simulation of trips. Activity assignment includes decisions about activity type, starting time, duration of an activity, mode, speed, target location, and route. As a result, system-level traffic flows of cyclists emerge from complex decision making and spatial reasoning of every individual resident in regard to their mobility behavior.
We use the model of Wallentin and Loidl [28] as the conceptual basis and add complexity. We extend the model through the generation of a more heterogeneous population that includes all residents of the study area. The simulated persons are not grouped into specific categories but rather have different socio-demographic characteristics and mobility preferences. Another advancement is the dynamic scheduling of activities. Moreover, destination locations are selected within distances that can reasonably be covered by the chosen mode. The modal split is extended to six transportation modes: bicycle, walk, car, car passenger, public transport, and other transport. Daily schedules thus include both bicycle and non-bicycle trips.
The GAMA RC1.8 platform, which is available on Windows, Mac OS, and Ubuntu operating systems, was selected as the programming environment [32]. It offers extensive functionality for working with data models in spatially explicit simulations. Moreover, the improved computational capacity of the platform allows for the simulation of large and complex systems, such as a regional transportation system. The model code is implemented in the GAMA modeling language (GAML), a high-level language provided by the platform. The source code of the model is published in the CoMSES Net model library (https://www.comses.net/ (accessed on 19 February 2021)) under the open CC-BY-NC-SA-3.0 license [33]. Running the model code requires a minimum of 4 GB of random-access memory (RAM).
The detailed model specification below follows the standard overview, design concepts, and details (ODD) protocol [34]. The protocol is designed to communicate agent-based models in a standard and reproducible way. It describes the model's purpose, components of a simulated transportation system, spatial and temporal extents, behavioral rules of agents, and scheduling of processes.

Purpose
The bicycle model aims to generate bicycle traffic flows of one day by executing the trips of self-ruling persons within a regional transportation system. The resulting dynamics of bicycle traffic are an estimation of the transportation system's overall performance. The model can be used to run scenarios to observe the impact of policy interventions, infrastructure design, and other urban planning strategies.

Entities, State Variables, and Scales
Several entities constitute the system of the model: persons, facilities, roads, intersections, and counting stations. Persons are heterogeneous by demographic attributes. Distinctions include age, gender, and employment status. Age ranges between 6 and 100 years. In terms of employment, there are employed, unemployed, students, pupils, pensioners, and inactive persons.
Roads and facilities represent the built environment that constrains the movements of persons and defines the areas of attraction. Roads are characterized by the number of cyclists that have traversed them, a restriction level, a safety index, and a weight. For cycling, there are restricted, partially restricted, and unrestricted roads. On partially restricted pedestrian roads, cyclists are only allowed to push their bicycles. The safety index as proposed by Loidl and Zagel [35] describes how suitable a road is for safe cycling according to its category and quality. Roads are weighted for routing. Weights can represent perimeters or safety indices depending on the type of routing, i.e., shortest or safest paths, respectively. Road intersections are also modeled to characterize the selected routes of persons as additional trip information.
Various facilities serve as locations for specific activities and are classified into ten types. These are workplaces, schools, universities, kindergartens, authorities, doctors, shops, recreation places, and residents' homes. Facilities do not have a limiting factor to attract visitors, except for workplaces that are characterized by the number of employees. Persons traveling to work select target locations depending on this information. Lastly, a name, position, and the number of passing cyclists characterize counting stations. We use the latter component for verification and validation purposes.
Throughout a simulation, the model collects several high-level state variables that describe the simulated world and its dynamics. Population-level aggregates describe the demographic composition, such as the age split or the distribution by employment. The total number of cyclists actively traveling through the simulated space characterizes the traffic dynamics. The model additionally records active cyclists by trip purpose.
The environment of the model covers the extent of Salzburg city and adjacent municipalities, except one on the Bavarian side of the national border (Figure 1). The selected extent facilitates the inclusion of commuting trips from areas outside the city and prevents edge effects in the area of interest (City of Salzburg). The temporal extent of the simulation is an average day in October-November. The day was determined by the time frame of the statistical data that underlie behavioral rules. The time-step of the simulation is one minute. Figure 2 illustrates the key processes that are reenacted in the model every cycle or when conditions are met. Throughout the simulation, persons act asynchronously. They iteratively choose and accomplish desired activities by traveling to facilities by different transportation modes. A person does not have an initial activity chain for the whole day but complements it by assigning a new activity at the end of the current one. The selection of the last activity is integrated as a probability during the activity assignment. Thus, the total amount of activities that a person can undertake depends on the derivatives from the mobility survey. The maximum is eight activities per day. During the process of activity assignment, behavioral rules determine several activity attributes, such as type, starting time, duration, mode, distance restrictions, speed, and target location.
A rule is represented in the form of a probability distribution and characterizes the likelihood of a particular option being selected. After activity selection, only cyclists whose trips cross the city area travel along the network and register themselves at traversed counting stations and roads. All other persons are transferred directly to their destinations because the model focuses on bicycle traffic patterns. At the end of each trip, trip-related data, such as synthetic trajectory, distance, travel time, etc., are saved in the "trips" dataset.

Process Overview and Scheduling
Three synchronous processes collect data about traveling cyclists at user-defined time intervals. The "active cyclists" dataset stores the aggregate totals of cyclists traveling around a simulated world according to their trip purposes. Counting stations and a network register the amounts of traversing cyclists into datasets named "counting data" and "traffic volume heatmap", respectively.

Design Concepts
Several design concepts that define agent-based models were implemented in the bicycle model. The emergence of bicycle traffic flows over space and time is given by the complexity of persons' diverse mobility behavior but, at this stage of the model development, not by interactions with each other. Nevertheless, heterogeneity of agents and the probabilistic nature of behavioral rules that govern agents' decision making make the prediction of traffic volumes emergent. Fitness-seeking behavior is incorporated in the route selection, as persons optimize their paths by safety and distance preferences.
Sensing as the notion of agent awareness of itself and an environment is also implemented in the model. Persons reason about choice options available to them based on the knowledge about their age, gender, and employment. They sense the network quality when navigating to their destinations according to their preferences. Stochasticity is present in the distribution of a population by demographic attributes. Moreover, the modeled rules of activity assignment are based on probability distributions adding the uncertainty of human behavior.
Observation includes monitoring and storing model output for testing and analysis. Active cyclists and the cyclists traversing counting stations and the network are monitored for verification and validation purposes. For model analysis, trip information and the bicycle volumes of roads are collected.

Initialization
During the initialization, the model creates a world with persons and a built environment. The spatial distribution of persons by age, gender, and employment status is calculated from real-world estimates in residential data. For example, if a grid cell has a registered number of female residents aged between 20 and 24, the model will create that number of female persons and assign age randomly within that range.
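The population step described above can be sketched as follows. This is a minimal illustration in Python rather than the GAML code of the model; the function name, the cell identifier, and the uniform draw of ages within the class are assumptions made for the example.

```python
import random

def create_persons(cell_id, gender, age_min, age_max, count, rng):
    """Create `count` persons for one grid cell.

    Ages are drawn uniformly from [age_min, age_max]; the uniform draw is an
    assumption of this sketch, the text only states that age is random.
    """
    return [
        {"cell": cell_id, "gender": gender, "age": rng.randint(age_min, age_max)}
        for _ in range(count)
    ]

# Hypothetical grid cell with five registered female residents aged 20-24.
rng = random.Random(42)
persons = create_persons("cell_001", "female", 20, 24, 5, rng)
```

Seeding the random generator keeps a synthetic population reproducible between runs, which is convenient when comparing scenarios.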
The model requires a topologically correct network of links for routing operations. Thus, simulated roads connect to a coherent bidirectional network with one-way roads, two-way roads, and restrictions. The network is weighted with a safety index to provide safety-oriented cycling. The adopted calculation of an index follows the indicator-based assessment [35]. Several road indicators are incorporated into the safety index calculation. These are road category, presence of bicycle infrastructure and established cycling routes, vehicle restrictions, gradient, pavement, and maximum motorized speed from the network shapefile.
Counting stations and network intersections are also initialized directly from shapefiles with no additional computation. Next, the model imports probabilities of activity types, modes, and starting and duration times from comma-separated values (CSV) files. Before the simulation day starts, persons are distributed to the initial activity locations.

Submodels
Activity type choice. There are several activity options: staying at home, working, shopping, recreation activity, business-related activity, visiting authorities, visiting doctor, accompanying people, and staying at other places. The probabilities of these options are different for every individual; firstly, because they vary depending on the position of activity in an activity chain (0-7). In Table 1, every column represents the probability distribution of activity types based on the numerical order of a calculated next activity, i.e., activity position. Secondly, an individual's employment status restricts activity options. For example, a pupil below a certain age can travel to school but cannot go to work. At the beginning of a simulation, every person calculates an initial activity by taking a probability distribution where an activity position is null.
Table 1. Probability distributions of activity types (%) in columns depending on an activity position in an activity chain (0-7).
Starting and duration time choices. The process of activity assignment continues with the selection of temporal characteristics. The probability of every activity's starting time depends on the activity's type and position in an activity chain. Table A1 in Appendix A shows an example of probabilities for work activity. Throughout a simulated day, passed hours are eliminated from the probability distributions. There are simplified assumptions for school and university activities. Departure time for school is calculated between 7:00 and 8:00, for university, between 8:00 and 18:00. Furthermore, activity types are restricted to possible durations a person can spend at an activity location. In the case of work activity, the duration varies by gender. Moreover, pupils stay at school depending on age. Additional information about duration values can be found in Appendix A (Tables A2-A4).
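The activity type choice can be illustrated with a weighted random draw. The sketch below is written in Python and is not the model's GAML code; the probability values are hypothetical placeholders rather than the values of Table 1, and the restriction table and the renormalization over the remaining options are assumptions of the example.

```python
import random

# Hypothetical probabilities (%) per activity position; NOT the values of Table 1.
ACTIVITY_PROBS = {
    0: {"home": 60.0, "work": 25.0, "shopping": 10.0, "recreation": 5.0},
    1: {"home": 10.0, "work": 50.0, "shopping": 25.0, "recreation": 15.0},
}

# Hypothetical restriction table: activity types excluded per employment status.
FORBIDDEN = {"pupil": {"work"}}

def choose_activity(position, employment, rng):
    """Draw an activity type for the given position in the activity chain.

    Options excluded by the employment status are dropped and the remaining
    weights are renormalized implicitly by random.choices (an assumption).
    Returns None when no option is left.
    """
    probs = ACTIVITY_PROBS[position]
    blocked = FORBIDDEN.get(employment, set())
    options = [a for a in probs if a not in blocked]
    weights = [probs[a] for a in options]
    if not options or sum(weights) == 0:
        return None
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random(1)
first_activity = choose_activity(0, "employed", rng)
```

A `None` result corresponds to the case in which no activity type can be selected.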

Mode choice. There are six available modes: walking, bicycle, car, car passenger, public transport, and other transport (Table 2). For the sake of simplification, we assume that a person can only change their mode of transportation at home. The probability distributions of modes vary for every activity type. Moreover, there is a difference between modal splits for region-based and city-based trips.
Distance and speed choice. Another assumption is that modes have different travel distance capacities and speeds. There is a 71% chance that a walking trip is shorter than 1 km, and a 29% chance that it is between 1 and 5 km. Similar probabilities are given to bicycle trips: distance restrictions can be up to 2 km and 2-8 km with probabilities of 73 and 27%, respectively. Other modes are not restricted by distances. Speed also varies by mode. Walking speed is in the range of 0.7-2.0 m/s, and bicycle speed is in the range of 1.6-5.5 m/s. Car, car passenger, and public transport all travel at 4.9-14.9 m/s. Other undefined transport has a speed of 2.4-13.6 m/s.
Target choice. A person selects a target location from an available number of relevant facilities that fulfill distance restrictions. The calculation of a suitable workplace has an additional constraint. It takes into account the number of employees every workplace is characterized by. Throughout an activity assignment, a person might not select any activity type or starting time. In such a case, a person returns home and does not travel anymore.
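The distance and speed rules can be written down compactly. The following Python sketch uses the distance bands, probabilities, and speed ranges stated above; the dictionary and function names are hypothetical, and drawing the speed uniformly within its range is an assumption, since the distribution within the range is not specified here.

```python
import random

# Distance bands in km with their probabilities, as stated in the text.
DISTANCE_BANDS_KM = {
    "walk": [((0.0, 1.0), 0.71), ((1.0, 5.0), 0.29)],
    "bicycle": [((0.0, 2.0), 0.73), ((2.0, 8.0), 0.27)],
}

# Speed ranges in m/s, as stated in the text.
SPEED_RANGE_MS = {
    "walk": (0.7, 2.0),
    "bicycle": (1.6, 5.5),
    "car": (4.9, 14.9),
    "car_passenger": (4.9, 14.9),
    "public_transport": (4.9, 14.9),
    "other": (2.4, 13.6),
}

def choose_distance_band(mode, rng):
    """Return a (min_km, max_km) band, or None for modes without a restriction."""
    bands = DISTANCE_BANDS_KM.get(mode)
    if bands is None:
        return None
    ranges = [band for band, _ in bands]
    weights = [weight for _, weight in bands]
    return rng.choices(ranges, weights=weights, k=1)[0]

def choose_speed(mode, rng):
    """Draw a speed in m/s; the uniform draw is an assumption of this sketch."""
    low, high = SPEED_RANGE_MS[mode]
    return rng.uniform(low, high)

rng = random.Random(3)
band = choose_distance_band("bicycle", rng)
speed = choose_speed("bicycle", rng)
```

The selected band can then be used to filter candidate facilities during target choice.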
Route choice. The safest and shortest routing options characterize the movement of cyclists and non-cyclists, respectively. The safest routes are calculated using the Dijkstra algorithm and the safety index as impedance. For the shortest routes, the same algorithm uses distance as impedance.
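Both routing options use the same algorithm with a different impedance, which can be shown on a toy network. The sketch below is a plain Python implementation of Dijkstra's algorithm and does not reproduce the routing operators of the GAMA platform. The graph, the attribute names `length` and `safety`, and the convention that lower safety values denote safer links are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source, target, impedance):
    """Return the least-cost path as a list of nodes, or None if unreachable.

    graph: {node: [(neighbor, {"length": ..., "safety": ...}), ...]}
    impedance: name of the edge attribute that is summed along the path.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, attrs in graph.get(node, []):
            nd = d + attrs[impedance]
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(queue, (nd, neighbor))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical network: the direct route is short but unsafe, the detour is safe.
graph = {
    "A": [("B", {"length": 100.0, "safety": 0.9}),
          ("C", {"length": 150.0, "safety": 0.1})],
    "B": [("D", {"length": 100.0, "safety": 0.9})],
    "C": [("D", {"length": 150.0, "safety": 0.1})],
}
shortest = dijkstra(graph, "A", "D", "length")  # ['A', 'B', 'D']
safest = dijkstra(graph, "A", "D", "safety")    # ['A', 'C', 'D']
```

Switching the impedance attribute is enough to move from the shortest to the safest route, which is why a single routing procedure can serve both cases.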
Move. Cyclists whose routes overlap a city area start traveling at calculated departure times along a network toward destinations. The movements are performed according to routing preferences and speed values calculated in prior steps. Non-cyclists and cyclists with trips outside a city do not physically move through a simulated space. They teleport directly to their destinations, and travel times are calculated to represent the real time they would spend on traveling.

Data
In the proposed agent-based model, we make use of extensive geospatial data sets to define the population, environment, and decision-making rules. Several datasets in Table 3 were acquired to parameterize the bicycle model. The residential data from Statistik Austria [36] provides the basis for modeling the socio-demographic heterogeneity of the underlying population. The data of residency by age, gender, and employment status consists of grid cells with a spatial resolution of 250 m. The mobility survey data of the province of Salzburg [37] defines behavioral rules. Around 40,000 respondents from the province reported their daily trips. The trips of respondents are ordered in activity chains. Every trip includes attributive information, such as activity ID (sequential number of activity in an activity chain), activity type, timing, distance, and transport mode. The derived distributions of trips by each attribute form probabilistic rules concerning activity types, starting times, modes, distances to a target, and speeds. Furthermore, every simulated activity is characterized by duration time. The corresponding probabilities are inferred from the guidelines and reports of research institutions [38][39][40]. All probability distributions are assembled into separate CSV files for use by the simulation model.
The built environment is represented in the model by an authoritative road network graph [41] and points of interest (POI) from Open Street Map and authoritative sources [42][43][44][45][46]. The workplace facilities are defined by the workplace census information provided by Statistik Austria [47], where employment values are represented as the number of employees per 100 m grid cell. The city's and region's outlines are gathered from the official dataset of administrative boundaries in Austria [48]. Lastly, the dataset with existing locations of counting sites in Salzburg [49][50][51] is used to create counting stations in the model.

Design of Experiments
Given the stochastic nature of probability distributions incorporated in the decision making of agents, a model's response may vary and include an experimental error [52]. It is recommended to run one simulation several times to achieve meaningful results. The required number of runs is distinct to every model. In order to obtain it, the mean and variability of response values for an increasing number of runs are calculated and analyzed. A response variable can be any record of the model's non-deterministic output. Lorscheid et al. [52] propose to use the coefficient of variation as an accuracy measure of the mean and standard deviation. To define it, the standard deviation is divided by the mean value. The coefficients of variation fluctuate according to the number of runs until they stabilize, meaning that additional runs do not minimize an error considerably.
In our model, the average execution time of one simulation run is around 40 min. Such an amount of time impedes a 1000-times repetition. A much higher repetition of simulations would be extremely time consuming and just lead to pseudo-accuracy, given the structural model uncertainty due to assumptions. We, thus, consider one simulation run sufficient for the purpose of the model if experimental errors for low simulation runs stay small.
We carry out an analysis to check the variability of errors in model results depending on the increasing number of runs. Restricted by the cost-benefit factor, we limit the maximum number of simulation runs to 10. We choose the daily bicycle volume as a response variable and record it at 20 random locations. At every step, we raise the number of simulation runs and calculate the mean of daily bicycle volumes at single locations as well as the coefficient of variation. The variability of coefficients, such as standard deviation, can then show how the number of additional runs changes the variation in response variables.
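The calculation can be sketched in Python for a single location. The daily volumes below are hypothetical, the use of the sample standard deviation is an assumption, and returning 0.0 for a zero mean is a convention chosen for this example to cover locations without any recorded bicycle traffic.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; 0.0 when the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean

def cv_by_number_of_runs(volumes):
    """Coefficient of variation for the first n runs, n = 2..len(volumes).

    volumes: daily bicycle volumes at one location, one value per run.
    """
    return {
        n: coefficient_of_variation(volumes[:n])
        for n in range(2, len(volumes) + 1)
    }

# Hypothetical daily volumes at one location over five runs.
volumes = [100, 110, 90, 105, 95]
cvs = cv_by_number_of_runs(volumes)
spread = statistics.stdev(cvs.values())  # variability of the coefficients
```

A small spread of the coefficients across the numbers of runs indicates that additional runs change the result only marginally.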

Verification and Validation
Following the common practice in ABM research, the modeling process consists of designing a conceptual model followed by formalizing it into an executable program. Wilensky and Rand [23] emphasize the principle of ABM design to progressively implement agents and rules in a model. Thus, at every step of additional complexity, we check if it improves the model according to its stated purpose. This process is coupled with monitoring state variables to verify that the model is built correctly. The verification testing is done by displaying the dynamic content of a simulation output on charts and a map within the platform interface. A map displaying the entire environment and moving agents is used to demonstrate that all components of a transportation system are present and responsive as expected. To verify that agents move, graphs show the temporal distributions of traveling cyclists.
Patterns observed in a real system can facilitate the design and validation of its simplified representation. They often express the structure and underlying processes. The pattern-oriented modeling (POM) framework suggests using patterns to define a model in terms of its components and processes, test its internal organization, and validate results [53]. We use multiple patterns, such as spatial and temporal distributions of cyclists over the study area and their relative frequencies from observed datasets, to indicate the validity of model output.
For the validation, we consider conceptual and operational validities [54]. Conceptual validity tests cause-effect relationships of underlying model concepts using scenario analysis. This step is similar to the POM's strategy to test how well the alternative theories of decision-making processes reproduce some patterns [53]. We formulate four alternative scenarios (Table 4) to test the impact of the behavioral concepts in our model (behaviorally realistic reference scenario) against the respective null models: activity type choice, target location choice, and starting time choice are tested against random selections, and multi-criteria route choice is tested against the shortest path calculation as its null model. The comparison of the scenarios is done by investigating the operational validities of their results. Operational validity checks whether the bicycle model and four alternative scenarios are acceptable for the intended purpose. It examines how well model results imitate observed data. According to Kang and Aldstadt [55], validating the output of a spatially explicit model at different spatio-temporal scales increases its reliability. Such validation eliminates the risk of faulty generative mechanisms when a model reproduces patterns either on a macro scale or a micro scale.
Validation data were acquired from the stationary and mobile sensors that captured the movements of cyclists in the region during October-November. The patterns derived from the data serve as validation criteria at different spatial and temporal scales. The acquired counting data represent counts of cyclists at nine stations between 2012 and 2019 [49][50][51]. The minimum temporal granularity is 15 min. The first criterion is the total number of cyclists traversing counting stations over a day. The selected pattern is at a spatially local and temporally long-term scale. The mean absolute error is used as a comparison method. The absolute error is the difference between simulated and observed daily counts at one station. The average of absolute errors at all stations comprises the mean absolute error. This measure of error demonstrates the magnitude of inaccuracy in model results. The second criterion at a spatially local and temporally short-term scale is the hourly number of cyclists traversing counting stations throughout the day. Pearson's correlation analysis is used to show the strength of the association between observed and simulated hourly values.
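The two comparison measures can be stated in a few lines of Python. The counts used below are hypothetical and serve only to show the calculation; the function names are chosen for this example.

```python
import math

def mean_absolute_error(simulated, observed):
    """Average of the absolute differences between paired station counts."""
    errors = [abs(s - o) for s, o in zip(simulated, observed)]
    return sum(errors) / len(errors)

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical daily counts at two stations (first criterion).
mae = mean_absolute_error([10, 20], [12, 17])  # (2 + 3) / 2 = 2.5

# Hypothetical hourly counts at one station (second criterion).
r = pearson_r([1, 2, 3], [2, 4, 6])  # perfectly linear, r is approximately 1.0
```

The mean absolute error keeps the unit of the counts (cyclists), whereas the correlation coefficient is unitless and insensitive to a constant scaling of the simulated volumes, so the two measures complement each other.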
The two mobile applications that collected spatio-temporal bicycling data are Bike Citizens and Strava [56,57]. Both datasets were spatially matched to the network used in the model. The derived daily totals of traffic volumes per network link are the third validation criterion. The pattern is at a spatially local and temporally long-term scale. Again, Pearson's correlation analysis is employed.
Finally, validation examines the patterns of relative frequencies in the observed and simulated bicycle traffic at counting stations. Mobility can be influenced by the spatial distribution of attractions and the landscape of a city. Therefore, we inspect the ratio of cyclists between two sides of the city divided by the river Salzach. The next pattern is the ratio of cyclists traversing stations in the city center vs. on the outskirts. Lastly, we use the morning and afternoon peak ratio to validate bicycle traffic at stations.
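These relative-frequency patterns reduce to simple ratios, as the Python sketch below illustrates. The counts are hypothetical, and the definition of the peak ratio (highest hourly count before noon divided by the highest hourly count from noon onwards) is an assumption of this example.

```python
def side_shares(count_a, count_b):
    """Shares of two groups of stations, e.g., west vs. east of the river."""
    total = count_a + count_b
    if total == 0:
        return (0.0, 0.0)
    return (count_a / total, count_b / total)

def peak_ratio(hourly_counts, split_hour=12):
    """Morning peak divided by afternoon peak (assumed definition).

    hourly_counts: {hour: count}; assumes a nonzero afternoon peak.
    """
    morning = max(c for h, c in hourly_counts.items() if h < split_hour)
    afternoon = max(c for h, c in hourly_counts.items() if h >= split_hour)
    return morning / afternoon

# Hypothetical totals on the two sides of the river.
west, east = side_shares(49, 51)

# Hypothetical hourly counts at one station.
ratio = peak_ratio({7: 120, 8: 200, 12: 90, 17: 160})  # 200 / 160 = 1.25
```

The same `side_shares` function applies to the comparison of stations in the city center and on the outskirts.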

Results
The model simulated the mobility of around 186,000 persons in the extended region of Salzburg city for one day. Simulated counting stations gathered information about how many cyclists traversed them every hour. The temporal distributions of cyclists and their means are shown in Figure 3 for comparison with the observed data from actual counting stations. The model at the majority of stations captures two distinct peaks of busiest traffic per day. The ratios of cyclists at the morning and afternoon peaks are similar to the observed ratios (Table 5) Figure 4a shows the spatial distribution of bicycle traffic flows on the network. It demonstrates heavily used bicycle corridors, such as the one that emerges along the river Salzach. This corridor attracts most cyclists, because it connects northern and southern parts of the city, providing bicycle infrastructure separated from motorized traffic. Minimum obstacles, such as traffic lights and junctions, also characterize the corridor. Several arteries that gather at the Salzach corridor connect the east and west of the city. They represent the evidence of daily relocations of residents between the city halves split by the river. The concepts of activity type choice and target location choice affect activity locations and the total number of trips carried out by individuals. In turn, the first validation criterion, which is daily counts at stations, is responsive to changes in the distribution of activities. Thus, it can best demonstrate the soundness of these two concepts. Figure 6 illustrates the observed and simulated daily numbers of passing cyclists at each counting station for every scenario. In the reference scenario, the mean absolute error of simulated daily counts compared to observed counts are 1002.07 cyclists on average. In the random activity type choice scenario, the random assignment of activity types changes the spatial and temporal distribution of attractive locations. 
The error increases by 71.04% to 1713.9 cyclists. A greater increase of 255.12% is observed in the unrestricted target location choice scenario. The error is 3257.9 cyclists. According to scenario rules, persons are unrestricted in the selection of destinations in regards to distances. Hence, bicycle trips can be 500 m or as long as 20 km long.
The ratio of registered cyclists at stations on the west side vs. the east side is 0.49:0.51, which corresponds well to the observed ratio of 0.53:0.47.
The uncertainty in results caused by the model's stochasticity is assessed with coefficients of variation in bicycle traffic volumes at 20 random stations. Figure 5 summarizes the coefficients obtained from different numbers of runs by their mean, standard deviation, minimum, and maximum. The majority of coefficients indicate small variability in the response values. A few locations have null coefficients because null bicycle traffic volumes were recorded in all simulation runs. The standard deviations of the coefficients are close to 0.0 at most locations. This means that different numbers of runs in the low range produce similar results. These outcomes indicate that one simulation run is sufficient for using the model.
The scenario analysis compares the behaviorally realistic reference model with four alternative scenarios to demonstrate the significance of the implemented model concepts. Figure 4b-e shows the traffic volumes resulting from the alternative scenarios. The pattern is more pronounced in the random activity type choice and unrestricted target location choice scenarios, as more trips are simulated. The random starting time choice scenario shows less traffic on the network. Lastly, a spatially different pattern emerges in the shortest path route choice scenario.
The concepts of activity type choice and target location choice affect activity locations and the total number of trips carried out by individuals. In turn, the first validation criterion, the daily counts at stations, is responsive to changes in the distribution of activities. Thus, it can best demonstrate the soundness of these two concepts. Figure 6 illustrates the observed and simulated daily numbers of passing cyclists at each counting station for every scenario. In the reference scenario, the mean absolute error of simulated daily counts compared to observed counts is 1002.07 cyclists. In the random activity type choice scenario, the random assignment of activity types changes the spatial and temporal distribution of attractive locations. The error increases by 71.04% to 1713.9 cyclists. A greater increase of 225.12%, to 3257.9 cyclists, is observed in the unrestricted target location choice scenario. According to the scenario rules, persons are unrestricted in the selection of destinations with regard to distance. Hence, bicycle trips can be as short as 500 m or as long as 20 km.
The next model concept is the starting time choice, which is responsible for the temporal distribution of trips. Its validity is checked by the second validation criterion, i.e., the hourly distribution of cyclists at counting stations. The results of the correlation analyses between simulated and observed data for each scenario are listed in Table 6.
In the reference (behaviorally realistic) scenario, the correlation coefficients range between 0.60 and 0.89, showing highly significant relationships with p < 0.001 at all stations. Persons in the random starting time choice scenario travel at random times of the day, which results in insignificant relationships. Its correlation coefficients range between 0.09 and 0.35, much lower than in the reference scenario.
Figure 5 shows how the concept of route choice, in particular, affects the spatial distribution of cyclists over the network. Thus, we compare the daily traffic volumes over the network with the observed data to justify the concept of route choice (Table 7). Moderate and weak correlations are observed in the reference (behaviorally realistic) scenario. The coefficients are 0.64 and 0.43 with p < 0.001 for the Bike Citizens and Strava datasets, respectively. In the shortest path route choice scenario, shortest paths are preferred over safest paths during navigation. The coefficients are 0.35 and 0.14 with p < 0.001. There is a weak positive correlation with the Bike Citizens dataset and virtually no correlation with the Strava dataset.
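The two validation measures used above, the mean absolute error of daily counts and the correlation of hourly distributions, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and the toy hourly counts are hypothetical, and Pearson's coefficient is an assumption, since the text does not name the correlation measure. Only the three mean absolute errors are taken from the text.

```python
from math import sqrt


def mean_absolute_error(simulated, observed):
    """Mean absolute difference between paired simulated and observed counts."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)


def relative_increase(reference_error, scenario_error):
    """Percentage by which a scenario error exceeds the reference error."""
    return (scenario_error / reference_error - 1.0) * 100.0


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed measure, not confirmed by the text)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Mean absolute errors reported in the text (cyclists per station and day).
REFERENCE_MAE = 1002.07
RANDOM_ACTIVITY_MAE = 1713.9
UNRESTRICTED_TARGET_MAE = 3257.9

print(round(relative_increase(REFERENCE_MAE, RANDOM_ACTIVITY_MAE), 2))      # 71.04
print(round(relative_increase(REFERENCE_MAE, UNRESTRICTED_TARGET_MAE), 2))  # 225.12

# Hypothetical hourly counts at one station, for illustration only.
observed_hourly = [20, 150, 310, 180, 120, 140, 290, 330, 160, 60]
simulated_hourly = [10, 120, 280, 200, 100, 150, 250, 300, 190, 40]
print(round(mean_absolute_error(simulated_hourly, observed_hourly), 1))
print(round(pearson_r(simulated_hourly, observed_hourly), 2))
```

The daily-count criterion penalizes magnitude errors, whereas the correlation only rewards a matching temporal shape, which is why a scenario can score well on one criterion and poorly on the other.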

Discussion and Conclusions
In our research, we simulated bicycle traffic flows as a phenomenon emerging from individual mobility behavior. Such simulation of emergent traffic flows integrates the strengths of two strands of research. First, there are the behaviorally realistic bicycle models that have mainly focused on emergent phenomena at local scale levels [13,14]. Second, there is the traffic-flow modeling at a regional level that has, thus far, mainly been based on aggregated data [15].
Part of the experimental design is the assessment of uncertainty in the results caused by the model's stochastic decision making. The estimated experimental errors are small and do not vary strongly with the number of simulation runs. Given the low coefficients of variation, the model produces results of sufficient accuracy with one simulation run.
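The coefficient of variation used for this assessment can be computed per station from the traffic volumes of repeated runs. The sketch below is illustrative only: the station names and volumes are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator. Stations with zero traffic in every run are given a null coefficient, in line with the reported results.

```python
from statistics import mean, stdev


def coefficient_of_variation(volumes):
    """Ratio of standard deviation to mean for one station across runs."""
    average = mean(volumes)
    if average == 0:
        # No cyclists recorded in any run: report a null coefficient.
        return 0.0
    return stdev(volumes) / average  # sample standard deviation (assumption)


# Hypothetical daily volumes at two stations over five simulation runs.
runs_per_station = {
    "station_a": [1200, 1185, 1213, 1198, 1204],
    "station_b": [0, 0, 0, 0, 0],
}

# Compare the coefficient obtained from different numbers of runs.
for name, volumes in runs_per_station.items():
    for n_runs in (2, 3, 5):
        cv = coefficient_of_variation(volumes[:n_runs])
        print(name, n_runs, round(cv, 4))
```

If the coefficients stay small and barely change as runs are added, a single run gives a representative result for that station.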
Through scenario analysis, we could demonstrate that models with demographically parametrized behavior outperform stochastic null models. We conclude that dynamic activity assignment allows for the inclusion of the uncertainty of human behavior, as opposed to the assignment of pre-defined activity chains. Nevertheless, the strong trends in activity chains are still preserved in the probabilistic rules. The sequential number of the next activity is a significant factor in determining its type and start time. Its consideration allows the majority of the active population to start their day with long-term utilitarian activities in the morning, leaving other activities for the rest of the day. The inclusion of distance limitations for cyclists and pedestrians prevents non-city residents from taking longer trips across the city in the simulation. This is consistent with the tendency of persons to choose closer facilities, especially when walking or bicycling. Lastly, the safest routing for cycling corresponds better to the observed data than the shortest routing.
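The difference between safest and shortest routing can be illustrated with the same shortest-path search applied to two different edge weights. The sketch below is a toy example under stated assumptions: the four-node network, the `unsafety` values, and the cost formula `length * (1 + unsafety)` are hypothetical and are not the safety measure used in the model.

```python
import heapq


def dijkstra(graph, source, target, weight):
    """Return (cost, path) minimizing the sum of the chosen edge attribute."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, attrs in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    queue, (cost + attrs[weight], neighbor, path + [neighbor])
                )
    return float("inf"), []


def add_path(graph, a, b, length_m, unsafety):
    """Add an undirected segment; unsafety in [0, 1] is a hypothetical index."""
    attrs = {"length": length_m, "safety_cost": length_m * (1.0 + unsafety)}
    graph.setdefault(a, {})[b] = attrs
    graph.setdefault(b, {})[a] = attrs


network = {}
# Direct main road: short but shared with motorized traffic.
add_path(network, "A", "B", 1000, 0.8)
add_path(network, "B", "D", 1000, 0.8)
# Separated riverside path: longer but safer.
add_path(network, "A", "C", 1300, 0.1)
add_path(network, "C", "D", 1300, 0.1)

print(dijkstra(network, "A", "D", "length")[1])       # ['A', 'B', 'D']
print(dijkstra(network, "A", "D", "safety_cost")[1])  # ['A', 'C', 'D']
```

Keeping both weights on each edge lets the routing preference vary by trip purpose without changing the search itself.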
While scenario analysis justified the model's complexity, the empirical validation showed significant relationships of varying strength between simulated and observed data: strong relationships in terms of temporal patterns and moderate to weak relationships in terms of the spatial distribution of flows, depending on the reference dataset. We are aware of the inaccuracies in the model's results. However, considering the quality of currently available input and validation data, the model presented in this paper is a step forward in terms of methodology (ABM for macroscopic simulation of bicycle flows) and predictive power. Moreover, the validation of the relative frequencies in bicycle patterns supports the strength of the model's internal design. The cycling ratios of morning vs. afternoon peaks, city center vs. outskirts, and west vs. east sides of the city were reproduced well in comparison to the observed patterns.
The model is based on a coherent travel theory with a complete set of individual choices. Thus, it simulates the mobility of cyclists in a way that can respond to planned policy strategies and changes in urban infrastructure. Bicycle traffic flow is an emergent phenomenon that results from individual mobility. In contrast, widely used traditional four-step models lack such a behavioral background and calculate traffic flows solely based on aggregated assignments of trips. They represent a static modeling system that cannot express individual decisions or connect the characteristics and preferences of travelers with their trips [16].
Including individual preferences leads us from models that are "just" phenomenologically correct to behaviorally realistic simulations. The major advantage of behaviorally realistic models is that they can adapt to city planning scenarios and thus provide city planners with a powerful toolset to design holistic concepts for cycling mobility. Questions such as "how will bicycle traffic patterns change if we build a new company with 1000 new employees at location xy?", "what if we add a bicycle lane to road x?", or "where should bike-sharing stations be placed?" can only be answered with a behaviorally realistic model.
Although behavioral realism improves the model, there are still deviations from real bicycle traffic. The model explains general patterns well but needs further improvement to answer more specific questions. Despite the strongly correlated temporal distributions of cyclists, the problem with the magnitudes of error persists. Assuming that cyclists always prefer the safest route can be a fallacy. In contrast to the implemented routing, Leao and Pettit [29] determined that around 80% of recorded GPS tracks of cycling commuters matched a synthesized sample of shortest path trips. The fact that commuting trips are more sensitive to distance is also evident in the study by Broach et al. [58]. Thus, assuming a shortest path preference for commuting trips is a potential model improvement.
Furthermore, in the real world, the path a cyclist takes might not always be the optimal option from a safety and distance point of view. Such behavior can be an example of unique cognitive processes in human behavior that an agent-based model is not able to capture [59]. The model would also benefit from the consideration of traffic lights in routing decisions, since cyclists tend to choose less disturbed paths [60].
There are also deficiencies in the quality of available input and validation data. Currently, travel decisions depend on parameters derived from the regional mobility survey. The main concern with such regional surveys is the risk of incompleteness, the biased representation of a population, or the lack of necessary information. For example, the workplace census data has its limits for modeling workplace facilities. The selection of workplace destinations depends on the number of employees at the locations of their official registration and not where they actually work. This is a possible cause of the observed spatial imbalance of simulated traffic between the eastern and western parts of the city. The incomplete representativeness of crowd-sourced mobility data may contribute to some discrepancies in the validation results [61,62]. Data collection by the mobile applications used in this study depends on user participation. Users are often health-oriented people interested in recording bicycle trips. Thus, the data could omit utilitarian trips to work, schools, and universities by other groups of people. Moreover, the number of trips recorded by the two applications is small (the trajectories from both applications together amount to 0.17% of the average daily number of cyclists registered by the Rudolfskai counting station) and, thus, of limited representativeness.
Apart from specific data quality, a problem arises from the fact that the demographic parameterization of mobility behavior is generally data hungry. Thus, transferability may be limited. The remaining question is how robust the model is if transferred to other regions: which components must be re-parameterized with local data, and which parameters are generalizable and can be transferred? If transferability is provided for the key components, "data hungriness" is less problematic.
The results highlight the added value of using transport ABMs at a regional level, where spatio-temporal traffic patterns emerge from the individual behavior of residents. The output data help to assess the usability of existing and potential bicycle paths, which is necessary for planning strategies and investments. Although only bicycle movement is simulated, the simulation of mode choice includes all major transportation modes. This facilitates the potential integration of additional concepts to capture travel patterns of other transportation modes. Moreover, the model can serve as an integrated transport module in the investigation of other urban dynamic processes.