Cyclist Injury Severity in Spain: A Bayesian Analysis of Police Road Injury Data Focusing on Involved Vehicles and Route Environment

This study analyses factors associated with cyclist injury severity, focusing on vehicle type, route environment, and interactions between them. The data analysed were collected by Spanish police during 2016 and include records relating to 12,318 drivers and cyclists involved in collisions with at least one injured cyclist, of whom 7230 were injured cyclists. Bayesian methods were used to model relationships between cyclist injury severity and circumstances related to the crash, with the outcome variable being whether a cyclist was killed or seriously injured (KSI) rather than slightly injured. Factors in the model included those relating to the injured cyclist, the route environment, and involved motorists. Injury severity among cyclists was likely to be higher where a Heavy Goods Vehicle (HGV) was involved, and certain route conditions (bicycle infrastructure, 30 kph zones, and urban zones) were associated with lower injury severity. Interactions exist between the two: collisions involving large vehicles in lower-risk environments are less likely to lead to KSIs than collisions involving large vehicles in higher-risk environments. Finally, motorists involved in a collision were more likely than the injured cyclists to have committed an error or infraction. The study supports the creation of infrastructure that separates cyclists from motor traffic. In addition, action needs to be taken to address motorist behaviour, given the imbalance between responsibility and risk.


Introduction
Cyclists are considered 'vulnerable road users' because, like pedestrians, they are at relatively high risk of serious injury compared to drivers of motor vehicles. However, recent research highlights the overall population health benefits that result from cycling, implying the need to increase active travel as well as to make it safer [1].
Much previous research on cyclist injury severity has examined cyclist characteristics, often focusing on helmet wearing and head injuries [2]. Another group of studies examined (mis)behaviours including use of alcohol or drugs [3]. Other research has targeted demographic correlates of injury risk, finding older people to be more vulnerable to severe injury [4]. Another strand of work concentrated on route conditions, from permanent fixed infrastructure to temporary conditions such as weather or light levels [5][6][7][8]. In some studies, bicycle infrastructure has been found to reduce injury severity, as have street characteristics such as lower speed limits, and secondary or tertiary roads compared to primary roads. There is relatively little work studying factors relating to drivers, although some work did examine vehicle types, finding that larger vehicles, particularly lorries, are associated with higher injury severity [9].
Int. J. Environ. Res. Public Health 2020, 17, 96
Most research effort has focused on Anglophone countries, or on high-cycling contexts such as Denmark. Here the study setting is Spain, a European country with generally low cycling rates. The first priority listed in the Spanish Road Safety Strategy 2011-2020 [10] is "to protect the most vulnerable users". One of the Strategy's 13 goals is to achieve 1,000,000 more cyclists without an increase in what is described as the 'cyclist death rate' at baseline (2009). Yet while the number of cycling deaths remained stable from 2007 to 2016 (66 ± 12), Spain saw a 60% increase in hospitalised cyclists and a 215% increase in cyclist injuries not requiring a hospital stay [11]. Over the same period (2007 to 2016), cyclists' share of all injured casualties almost doubled, from 4% to 7% [12]. Alongside this substantial growth in injuries, growth in cycling remains uneven across Spain. While some cities, such as Seville, have seen major growth from a low base [13], others, such as Madrid, continue to have very low cycling levels. Given this uneven take-up alongside growing injury numbers, there is debate about how best to increase cycling uptake and cycle safety.
Unfortunately, Spain no longer conducts a regular national travel survey. The 2006/2007 Movilia survey is the most recently available data, ten years earlier than the injury data we are analysing. Hence, we are unable to assess injury risk in relation to exposure. Instead, the paper aims to explore whether a diverse range of factors are associated with cyclist injury severity in the Spanish context. In this way it contributes to discussion about how to reduce the risk of severe injury for cyclists involved in collisions.
This study investigates the following questions:
(i) What route environment, vehicle, and rider/driver-related factors are associated with elevated cyclist KSI risk (the risk of being killed or seriously injured, if involved in an injury collision recorded by the police)?
(ii) What interactions can be identified between these factors?

Approach
Authors have investigated and studied the causes of road traffic collisions using diverse methods and techniques [14]. Traditionally, classical statistical methods such as regression models, ordered probit models, and decision trees have been used to predict the severity of traffic collisions and to determine contributing factors. Moreover, artificial intelligence techniques like genetic algorithms, artificial neural networks, principal component analysis and fuzzy logic have been widely used in injury prediction models [15].
Recently, the number of studies using Bayesian networks in a safety context has been rising, as this method provides reliable inferences regarding safety issues [16][17][18][19][20][21][22][23]. Applications include evaluating the severity of traffic collisions, analysing their causes, and/or predicting the probability of fatal and serious injuries. Previous research has demonstrated that Bayesian networks predict collision severity better than traditional methods such as regression models [24,25].
This study proposes a Bayesian network model in which the outcome variable is 'KSI', which represents the injury severity (killed or seriously injured versus slightly injured) experienced by an injured cyclist in a collision. By using vehicle-level data (both cyclists and other involved vehicles), the study examines the extent to which factors related to injured cyclists and other parties, alongside route environment factors, are associated with a cyclist being killed or seriously injured in a collision.

Variables
Data used in this study were taken from the 2016 traffic collision database provided by the Directorate General of Traffic, which is responsible for managing the "National Registry of Victims of Traffic Accidents" in Spain [26]. Variables used in this study are fully described in Table 1 and relate to cyclist/motorist behaviour and route environment conditions. The rest of this subsection provides context for international readers about three key infrastructure and zonal characteristics: whether a location is in an urban zone, the presence of bicycle infrastructure, and traffic calming. The dataset classifies a location as a 'calle' ("street") or not. 'Calle' is translated here as 'urban zone'; other locations comprise inter-urban zones and urban highways, which are also classed as inter-urban.
Spanish traffic regulations [27] refer to several types of bicycle infrastructure: "cycle lane" (on-road cycle path), "protected cycle lane" (on-road cycle path with some kind of physical protection), "sidewalk cycle path" (delimited cycle path located on pedestrian spaces), "cycle track" (completely separated from the rest of the traffic) and "cycle path" (separated from traffic, shared with pedestrians, and within green spaces). The variable '30-zone' might be best defined as 'traffic calmed', including streets where speed limits are 30 kph or lower or where motor traffic is excluded or restricted. Within urban areas generally, the default national speed limit is 50 kph. However, municipalities can install 30 kph and 20 kph zones and some, including Barcelona and Madrid, have implemented 30 kph limits across much of the city.

Model
This section describes the principle of the Bayesian network model, which was implemented in MATLAB (Matlab 2014b). In the proposed Bayesian network model, the outcome variable is cyclist injury severity (KSI). The Bayes Classifier (BC) minimizes the probability of misclassification by solving the following optimization problem, which in its standard form assigns each case to the class with the highest posterior probability given the observed values $\mathbf{e}$ of the factors:

$$\hat{k} = \arg\max_{k} \; p(\mathrm{KSI} = k \mid \mathbf{e})$$

Discrete Bayesian Networks (BN) are probabilistic graphical models used to learn the joint probability distribution (JPD) of a multivariate problem involving multinomial variables [22]. The model is based on a directed acyclic graph (DAG) that expresses the direct/conditional dependencies/independencies between the variables. The factorization associated with the independencies given by the DAG (Equation 1) simplifies both the learning of the JPD and the interpretation of the resulting model. Equation 1 represents the joint probability function of the Bayesian network:

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \pi_i) \qquad (1)$$

where $\{x_1, \ldots, x_n\}$ are the variables considered in the model and $\pi_i$ is the set of parents of the variable $x_i$ given by the DAG.
As a result, the learning is divided into two phases: structural and parametric. First, the DAG is obtained by applying the greedy learning algorithm proposed by Buntine [28]. This is a score-based algorithm that seeks the DAG with the lowest Bayesian Information Criterion (BIC), a measure of the goodness of fit of a Bayesian model based on the likelihood function that penalizes model complexity to avoid overfitting (see Schwarz [29] and Wit [30] for a detailed explanation of the score definition). To this end, at each step the algorithm evaluates all possible links between the variables and adds the one yielding the DAG with the lowest BIC, which best represents the independencies in the data. Note that we have not included a minimum improvement threshold for adding new links to the graph, because the penalty term of the BIC already limits the inclusion of new links. Other algorithms introduce a pre-ordering of the variables (K2 algorithm [31]) that limits the possible parents of a particular variable in order to reduce the computational cost, but this introduces a dependence on the chosen order, so we decided to discard this option. Second, the parameters given by the DAG are obtained by maximum likelihood, as those that best explain the observed data. Note that the DAG does not reflect causality but rather the statistical dependencies between the variables. From a mathematical point of view, an equivalent factorization can be obtained by keeping the undirected graph and the v-structures relating three variables (e.g., Age → Gender ← Helmet); it is not necessary to maintain the direction of the other links between variables [31].
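The score-based structure search described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function name `bic_score`, the two-variable toy dataset, and its counts are hypothetical, chosen only to show how the BIC trades goodness of fit against the number of free parameters when a single candidate link (here Zone → KSI) is evaluated, as one step of a greedy search would do.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """BIC of a discrete Bayesian network (lower is better).

    data: list of dicts mapping variable name -> observed state.
    parents: dict mapping each variable -> tuple of its parent names.
    """
    n = len(data)
    n_states = {v: len({row[v] for row in data}) for v in parents}
    log_lik, n_params = 0.0, 0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood of p(var | parents) at its maximum-likelihood estimate
        log_lik += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (states - 1) for every parent configuration
        n_params += (n_states[var] - 1) * math.prod(n_states[p] for p in pa)
    return -2.0 * log_lik + n_params * math.log(n)

def rows(zone, n_ksi, n_slight):
    """Build hypothetical records for one zone (illustrative counts only)."""
    return [{"zone": zone, "ksi": 1}] * n_ksi + [{"zone": zone, "ksi": 0}] * n_slight

# Hypothetical toy data, not the study's database
data = rows("inter-urban", 190, 810) + rows("urban", 160, 1840)

no_link = {"zone": (), "ksi": ()}            # Zone and KSI independent
with_link = {"zone": (), "ksi": ("zone",)}   # candidate link Zone -> KSI

bic_no_link = bic_score(data, no_link)
bic_with_link = bic_score(data, with_link)
print(f"BIC without link: {bic_no_link:.1f}, with link: {bic_with_link:.1f}")
```

In this toy case the link lowers the BIC, so a greedy search would keep it; a link whose likelihood gain did not outweigh the `log(n)` penalty per extra parameter would be rejected, which is why no separate improvement threshold is needed.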
Based on the resulting JPD and DAG, new knowledge about one or several variables of the model (evidence) can be easily propagated through the BN, yielding updated probabilities for the remaining variables included in the model (inference).
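As an illustration of evidence propagation, the following Python sketch performs exact inference by enumeration on a three-node network (Zone → HGV, and Zone and HGV → KSI). The network structure, the conditional probability values, and the function names are all hypothetical and are not taken from the fitted model; the sketch only shows how entering evidence changes the probability of KSI.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative values only)
p_zone = {"urban": 0.7, "inter-urban": 0.3}
p_hgv = {"urban": {True: 0.02, False: 0.98},
         "inter-urban": {True: 0.06, False: 0.94}}
p_ksi = {("urban", True): 0.14, ("urban", False): 0.08,
         ("inter-urban", True): 0.32, ("inter-urban", False): 0.18}

def joint(zone, hgv, ksi):
    """Joint probability from the factorization p(zone) p(hgv|zone) p(ksi|zone,hgv)."""
    p = p_ksi[(zone, hgv)]
    return p_zone[zone] * p_hgv[zone][hgv] * (p if ksi else 1.0 - p)

def prob_ksi(**evidence):
    """P(KSI = True | evidence), summing the joint over unobserved variables."""
    num = den = 0.0
    for zone, hgv, ksi in product(p_zone, (True, False), (True, False)):
        state = {"zone": zone, "hgv": hgv}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # configuration contradicts the evidence
        pr = joint(zone, hgv, ksi)
        den += pr
        if ksi:
            num += pr
    return num / den

prior = prob_ksi()                                   # no evidence
given_hgv = prob_ksi(hgv=True)                       # evidence on one factor
given_hgv_urban = prob_ksi(hgv=True, zone="urban")   # evidence on two factors
print(prior, given_hgv, given_hgv_urban)
```

Enumeration is only practical for very small networks; toolboxes for larger models use more efficient exact or approximate propagation algorithms.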
Finally, the resulting JPD of both the factors and the target variable (KSI) allows us to define a natural BC by establishing a probability threshold above which KSI is predicted to occur and below which it is predicted to be absent.
According to the objectives of the study, two experiments were defined. First, a 10-fold cross-validation experiment was carried out to assess the skill and generalization capability of our Bayesian network and to identify possible biases. To this end, the database was randomly partitioned into 10 subsets. Each subset in turn was held out for prediction while the remaining 90% of the data were used for training. A prediction for the full sample is then obtained by joining the ten held-out predictions. Several measures were considered to evaluate the resulting model. The Area Under the Receiver Operating Characteristic Curve (AUC, [32]) was used to evaluate the skill both for each fold and for the full series obtained by joining the ten predictions. The ROC curve plots the Hit Rate against the False Alarm Rate as the probability threshold varies, and the AUC is obtained by integrating this curve. The score varies between 0 (opposite predictor) and 1 (perfect predictor), with 0.5 equivalent to a random predictor. In the present study the AUC was between 0.91 and 0.95.
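The cross-validation and AUC procedure can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's code: the helper names, the toy labels, the single binary factor, and the stand-in "model" (the training-set KSI rate within each factor level) are all hypothetical. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve.

```python
import random

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_scores(n, fit_predict, k=10, seed=0):
    """Predict every case from a model trained on the other k-1 folds."""
    scores = [None] * n
    for test_idx in k_fold_indices(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        for i, s in zip(test_idx, fit_predict(train_idx, test_idx)):
            scores[i] = s
    return scores

# Hypothetical toy data: one binary factor and a binary KSI label
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 5
factor = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0] * 5

def fit_predict(train_idx, test_idx):
    """Stand-in model: training-set KSI rate within each factor level."""
    rate = {}
    for level in (0, 1):
        ys = [labels[i] for i in train_idx if factor[i] == level]
        rate[level] = sum(ys) / len(ys)
    return [rate[factor[i]] for i in test_idx]

scores = cross_validated_scores(len(labels), fit_predict)
print(f"AUC of joined out-of-fold predictions: {auc(scores, labels):.2f}")
```

Joining the out-of-fold scores before computing the AUC mirrors the "full series" evaluation described above; the same `auc` function can also be applied to each fold separately.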
As the AUC can be biased towards one of the categories, mainly when the sample is imbalanced towards one state of the variable, the sensitivity and specificity were also computed, defined as follows:

$$\mathrm{Sensitivity} = \frac{TP}{P}, \qquad \mathrm{Specificity} = \frac{TN}{N}$$

where TP/TN stands for the number of predicted True Positives/Negatives, and P/N for the number of observed Positives/Negatives, respectively. Furthermore, the accuracy index was defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{P + N}$$

The sensitivity and specificity for KSI in the present study were 0.60 and 0.99, respectively, and the accuracy index was 0.95.
Second, taking advantage of the properties of BNs, a sensitivity analysis was carried out by evaluating how the probability of KSI changes when evidence is entered for different factors.
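The relationship between these three measures can be checked with a short Python sketch. The function names are illustrative, and the prevalence used below is an assumption: it takes the 66 killed and 711 seriously injured cyclists reported in the Descriptive Statistics section as a share of the 7230 injured cyclists, and assumes the evaluation sample has the same KSI share. Under that assumption, the reported sensitivity and specificity imply an accuracy close to the reported value.

```python
def classification_metrics(tp, tn, p, n):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / p
    specificity = tn / n
    accuracy = (tp + tn) / (p + n)
    return sensitivity, specificity, accuracy

def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)

# Assumption: KSI prevalence approximated from counts reported in the paper
prevalence = (66 + 711) / 7230

implied_accuracy = accuracy_from_rates(0.60, 0.99, prevalence)
print(f"KSI prevalence: {prevalence:.3f}, implied accuracy: {implied_accuracy:.3f}")
```

Because roughly nine in ten injured cyclists are slightly injured, accuracy is dominated by specificity; this is why a model can reach high accuracy while still missing a substantial share of KSI cases, and why sensitivity is reported alongside it.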
Note that only events without missing data in both the factors and the target variable, which correspond to 99.7% of the sample, have been considered. This approach lets us consider a single model for the sensitivity analysis, removes bias related to the sample, avoids problems in fitting the model, and prevents noise being introduced into the results by any gap-filling procedure or by the availability of different variables for each event. In addition, once the model had been evaluated and its predictive skill tested, 100% of the database was used to train the model used for the sensitivity analysis.
Many programs have been developed to train Bayesian networks efficiently, such as Netica, Hugin Investigator, GeNIe, Matlab, R, and Microsoft's MSBNx software. For our study, we used the Bayesnet toolbox for Matlab (Matlab 2014b).

Descriptive Statistics
In 2016 there were 102,362 injury collisions on Spanish roads, involving 179,295 vehicles and 174,679 drivers or riders, of whom 12,318 were involved in a collision with at least one injured cyclist. These included 7488 cyclists, of whom 66 were killed and 711 seriously injured. A collision that injures a cyclist is unlikely to injure a motorist: of the 4830 drivers or motorcyclists involved in collisions with cyclists, 4626 (95.8%) were uninjured, 184 (3.8%) sustained a slight injury, 17 (0.4%) were seriously injured, and 3 (0.1%) were killed (see Table 1). As presented in the last row of Table 2, of the 7488 cyclists involved, 4880 (65.2%) were involved in collisions with motor vehicles. The other 2608 cyclists were involved in falls or in collisions involving other cyclists (the injury data contained records relating to 258 cyclists who were not injured but were involved in a collision that injured other cyclists) (see the last variable in Table 2). Here we focus on factors related to motorised vehicle involvement, while also presenting analysis related to the 2608 (34.8%) cyclists injured in falls or in collisions involving other non-motorised users. Within non-motorised incidents the number of pedestrian-cyclist collisions is unknown, and a pedestrian-cyclist collision is therefore treated as a cycle-only collision. Of the motor vehicles involved in cycle collisions, 4262 (88.2%) were cars; 313 (6.5%) were motorcycles; 155 (3.2%) were HGVs; 53 (1.1%) were buses; and 47 (1.0%) were other types of vehicle. Table 1 shows the distribution of our modelled variables in relation to cyclists and to the motorists involved in these cyclist injury collisions. Key descriptive findings highlight the distributions of different crash types. While 7% of cyclists were involved in an incident taking place on cycling infrastructure, only 0.2% (10) of motor vehicles collided with a cyclist on cycling infrastructure.
In other words, cycle infrastructure seems to sharply reduce the likelihood of collision with a motor vehicle, with non-motorised falls/collisions being more typical. Similar but less striking findings are true for urban zones (where non-motorised crashes are more typical than in inter-urban zones), but not for 30 kph zones. Over three-quarters of collisions took place in clear, dry weather, without any surface related issues, with the same true for visibility, although with a very high number of missing values.
Cyclists and motorists have different age profiles; specifically, children are unsurprisingly almost absent among the latter category. By contrast, those aged under 18 made up 10.8% (806) of injured cyclists. Men dominated among both cyclists and motorists, but even more so among cyclists (83.1% vs. 73.4%). Just under half of the involved cyclists (44.9%) were recorded as definitely having worn a helmet, albeit with a high level of missing data.
Finally, the data record the prevalence of infractions and errors attributed both to cyclists and to involved motorists. Motorists, while unlikely to have sustained an injury, are relatively more likely than cyclists to have committed an infraction or error. Among other drivers and riders, 32.0% (1544) were recorded as having committed an infringement of some type, while for cyclists the figure was 14.1% (1057). For errors, the respective figures are 22.4% (1084) and 12.8% (960).

Factors Associated with High Risk
Table 3 highlights factors associated with high KSI risk for cyclists, based on our Bayesian network model (see the directed acyclic graph in Figure 1). The table shows the probability of a cyclist being killed or seriously injured given each variable. The involvement of an HGV is associated with an elevated risk of death or serious injury (23.9%, compared to 13.3% for buses and 10.1% for cars), as is the involvement of 'other' vehicles. Conversely, collisions involving motorcycles are associated with lower risk of death or serious injury to the cyclist (8.1%). Perhaps surprisingly, non-motorised collisions are associated with a higher KSI risk (12.0%) than collisions involving cars (10.1%) or motorcycles (8.1%). However, this overstates the severity of the risks related to these collisions, as few deaths (as opposed to serious injuries) occur in non-motorised incidents. Of 541 KSI cyclist incidents involving motor vehicles, 71 (13.1%) were deaths, while of 320 KSI cyclist incidents not involving motor vehicles, only 11 (3.4%) were deaths.

Other notable findings relate to the location. A location with bicycle infrastructure is associated with a somewhat lower risk of KSI compared to one without (8.8% vs. 12.2%), while an urban zone has a lower risk of KSI than an inter-urban zone (7.9% vs. 19.0%), as does a 30 zone (8.3% vs. 11.9%). Other factors had little relationship to injury severity, apart from poor visibility.
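To make the size of these contrasts easier to compare, the reported KSI probabilities can be converted into risk ratios and odds ratios. The Python sketch below uses only the percentages quoted in this section; the function names are illustrative, and the resulting ratios are simple descriptive contrasts between the reported probabilities, not adjusted effect estimates from the model.

```python
def risk_ratio(p_exposed, p_reference):
    """Ratio of two probabilities."""
    return p_exposed / p_reference

def odds_ratio(p_exposed, p_reference):
    """Ratio of the odds p / (1 - p) of two probabilities."""
    def odds(p):
        return p / (1.0 - p)
    return odds(p_exposed) / odds(p_reference)

# KSI probabilities as reported in the text (higher-risk state first)
reported = {
    "HGV vs car": (0.239, 0.101),
    "inter-urban vs urban zone": (0.190, 0.079),
    "no bicycle infrastructure vs bicycle infrastructure": (0.122, 0.088),
    "outside 30 zone vs 30 zone": (0.119, 0.083),
}

for label, (p_high, p_low) in reported.items():
    rr = risk_ratio(p_high, p_low)
    or_ = odds_ratio(p_high, p_low)
    print(f"{label}: risk ratio {rr:.2f}, odds ratio {or_:.2f}")
```

On this reading, HGV involvement and inter-urban location each correspond to a KSI risk well over twice that of their reference categories, whereas the contrasts for bicycle infrastructure and 30 zones are smaller.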
Table 4 illustrates the probability of severe cyclist injury associated with factors specific to the motorist or the cyclist involved in the incident. The Bayesian network inference was generated after turning 'vehicle type' into two discrete states (bicycles and other vehicles). Motorists aged 40-60 and those under 18 had an elevated risk of seriously injuring a cyclist, while middle-aged adults had a somewhat elevated risk of being severely injured. No gender differences were found, nor were there notable differences in behavioural terms, such as whether the driver was wearing a seatbelt or carrying the correct driving licence.
Motorists not using a helmet (mostly motorcyclists) were associated with a higher probability of seriously injuring a cyclist (16.1% versus 13.6%). Perhaps surprisingly, helmet use among cyclists was associated with higher risk of severe injury (14.2% vs. 10.8%). Infringements and responsibility for the incident did not appear to influence the injury severity of the cyclist but, as demonstrated above, the level of culpability is higher for drivers than for cyclists.

Interactions between Vehicle Type and Route Environment
Table 4 has shown that certain types of route environment (bicycle infrastructure, 30 kph/traffic calmed zone, and urban zones) are associated with lower risk of serious injury for people cycling, while larger vehicles (particularly HGVs) are associated with elevated KSI risk. This section provides data on how vehicle type and route environment interact in relation to injury severity. For instance, traffic calmed areas reduce KSI risk in general, but do they specifically mitigate risks for crashes involving the most dangerous vehicles, such as HGVs?
Before presenting the analysis, Table 5 shows the likelihood of (i) an injured cyclist and (ii) an involved motor vehicle being present in different types of location. Except for motorcycles (which are not allowed to use bicycle infrastructure, but nevertheless may sometimes do so), few motor vehicles were involved in collisions with cyclists on bicycle infrastructure. Urban zone-based collisions dominated across vehicle groups except HGVs, where slightly more than half of all collisions took place in non-urban zones. Collisions with HGVs were particularly unlikely to happen both on dedicated cycle infrastructure and in 30 zones.
Table 6 presents the KSI risk by zone type (urban/non-urban and 30-zone/others) based on the involvement of different motor vehicles. A gradient can be seen both for vehicle type (if ordered by weight: motorcycles, cars, and then trucks and buses) and for location type. For instance, the KSI risk associated with truck involvement is 14.1% for urban zones: higher than for all other road user types except 'others', but lower than the KSI risk associated with truck involvement in inter-urban zones (32.5%). As there were few cases of collisions with motor vehicles on bicycle infrastructure, they could not be split up by type of motor vehicle involved. Instead, Table 7 separates collisions by whether or not they involved motor vehicles and compares KSI risk by presence of cycle infrastructure. In both cases, the presence of bicycle infrastructure is associated with lower injury severity.

Discussion
This study found a higher risk for cyclists in Spain of being killed or seriously injured where HGVs are involved in a collision, compared to other vehicles, which is consistent with previous studies that used national-level data [33]. Motorists involved in collisions that injure cyclists are highly unlikely to be killed or seriously injured; in 95.8% of cases they are uninjured. However, according to the police, involved motorists are around twice as likely as the injured cyclists to have committed an infraction or made an error, which is consistent with the findings of Bíl et al. for the Czech Republic [34]. The study did not find a protective effect associated with helmets; instead, helmet use was associated with increased risk. Most studies have reported a protective effect of helmet wearing in relation to head injuries and fatality [35]; however, other studies have found that helmet wearing may increase the risk of other types of injury [36].
The research did find a reduction in KSI risk associated with three infrastructure categories: bicycle infrastructure, urban zones (excluding major roads within these), and 30 kph or less zones (reduced speed limit, pedestrianised, and/or residential areas). Few collisions involving motor vehicles happened at locations with bicycle infrastructure, and where bicycle infrastructure was present both collisions involving motor vehicles and those not involving motor vehicles were less likely to be serious. Sensitivity analysis focused on urban and 30 kph zones (due to only 10 cases of motor vehicle collisions on bicycle infrastructure) and found that this reduction in KSI risk held for all vehicle types.
The results for each of the categories found to be protective are aligned with findings in the literature. Reynolds et al.'s review [8] documented the protective effects of bicycle infrastructure. No review comparing cycling injury risk on urban vs. rural roads has been found, but our results are consistent with studies that used national databases [5]. Cleland et al. reported that the introduction of 20 mph (approx. 30 kph) zones would decrease cyclist-involved collisions [37].

Limitations and Generalisability
It is well known that police injury data do not capture all injuries, and in particular slight injuries are under-represented. Under-representation of cyclist-involved collisions has been evidenced over time and at international level [38,39], and it is an intrinsic limitation of this study due to the use of police data [40]. Police definitions of 'serious injury' also cover quite diverse levels of injury.
As not all regions in Spain used the same road collision data collection system during 2014 and 2015, this study was carried out only with data for 2016, which contain few deaths. Therefore, we were unable to define the target variable with four states (no injury, minor injury, serious injury, or death). Instead, we used the KSI variable, combining serious and fatal injuries. The resulting model has a favourable accuracy index (0.95), which supports the consistency of the study results.
Some findings may not be specific to the Spanish context. For instance, the low KSI risk following a motorcycle collision may not be transferable to other countries with different motorcycle usage: in Spain, 10% of registered motor vehicles are motorcycles [12].
The counter-intuitive finding for helmets may be related to Spanish helmet laws, as helmets are compulsory on non-urban roads, where the risk is higher. The probability of an injured cyclist having used a helmet in inter-urban zones, calculated with Bayesian network inference, was 81.6%, compared to only 28.1% in urban areas. This is in turn likely to be related to different types of cyclist and cycling, not controlled for in this analysis. Hence the results showing a higher KSI risk for helmet-wearing cyclists may not transfer to other contexts.
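The zone-mix explanation can be explored with a rough back-of-envelope calculation in Python. The sketch rests on two loud assumptions that are not part of the study's model: (1) the overall helmet share of 44.9% reported in the Descriptive Statistics section, which is subject to a high level of missing data, is used to back out the inter-urban share of injured cyclists; and (2) KSI risk is taken to depend on zone only (19.0% inter-urban, 7.9% urban), with no effect of the helmet itself. The variable and function names are illustrative.

```python
def share_inter_urban_given_helmet(p_helmet_iu, p_helmet_urban, p_iu):
    """Bayes' rule: P(inter-urban | helmet worn)."""
    p_helmet = p_helmet_iu * p_iu + p_helmet_urban * (1.0 - p_iu)
    return p_helmet_iu * p_iu / p_helmet

# Values reported in the text
P_HELMET_IU, P_HELMET_URBAN = 0.816, 0.281   # helmet use by zone
KSI_IU, KSI_URBAN = 0.190, 0.079             # KSI risk by zone

# Assumption (1): overall helmet share used to infer the zone split
P_HELMET_OVERALL = 0.449
p_iu = (P_HELMET_OVERALL - P_HELMET_URBAN) / (P_HELMET_IU - P_HELMET_URBAN)

# Assumption (2): KSI depends on zone only, so the helmet-wearers' risk is
# the zone risks weighted by where helmet wearers were injured
w = share_inter_urban_given_helmet(P_HELMET_IU, P_HELMET_URBAN, p_iu)
ksi_helmet_no_effect = KSI_IU * w + KSI_URBAN * (1.0 - w)

print(f"Implied inter-urban share of injured cyclists: {p_iu:.2f}")
print(f"Share of helmet wearers injured in inter-urban zones: {w:.2f}")
print(f"KSI risk for helmet wearers from zone mix alone: {ksi_helmet_no_effect:.3f}")
```

Under these assumptions, zone mix alone yields a KSI risk for helmet wearers of roughly 14%, close to the 14.2% reported. This is consistent with the interpretation that the helmet finding reflects where helmeted cyclists ride, although such a calculation cannot establish it.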

Conclusions
The study suggests that separating cyclists from motorised traffic (as via bicycle paths) and/or reducing levels and speeds of motor traffic (as 30 zones, which include pedestrianised and residential streets, aim to do) can help reduce cyclist injury severity. For example, in collisions involving motor vehicles, the presence of cycle infrastructure is associated with a reduction in KSI risk from 11.8% to 8.6%. This happens partly by reducing the likelihood of collisions with motor vehicles, particularly HGVs: in Spain, HGVs are often restricted in pedestrian and residential areas. However, where interactions with other vehicles do occur (which might happen, for instance, where pedestrianised areas allow timed loading by larger vehicles), the risk of serious injury to the cyclist is reduced (9.6% KSI risk in collisions not involving motor vehicles versus 8.6% KSI risk in collisions with other vehicles).
The study thus supports the creation of bicycle infrastructure and/or traffic-calmed/30 kph zones in urban areas. It highlights the relatively high risk associated with major roads, often outside urban zones (19.0% KSI risk in inter-urban zones versus 7.9% in urban areas). Despite eight of ten injured cyclists in inter-urban zones having worn a helmet, these zones are associated with high risk of severe injury, perhaps partly due to risky overtaking manoeuvres on rural roads, with Spain's legal minimum passing distance of 1.5 m being insufficient at high speeds [41] or on poorly maintained roads. Implementation of cycle infrastructure on these roads (rare in Spain, where bicycle infrastructure has mostly been built in urban areas and most of the country lacks supra-local cycle network planning) is further recommended.
Finally, the study finds higher rates of infractions (32.0% versus 14.1%) and errors (22.4% versus 12.8%) among involved motorists than among involved cyclists. Although those operating motor vehicles must pass a test and undergo licensing, they seem more likely than cyclists to be culpable in collisions that injure cyclists. While not associated with higher injury severity, such poor driving may contribute to the higher overall collision risk that cyclists experience, per km, on the roads, compared with motorists. Hence, as well as infrastructure, there is a role for driver education and enforcement focused on behaviour towards cyclists.