Real-Time Nanoscopic Rider Safety System for Smart and Green Mobility Based upon Varied Infrastructure Parameters

Abstract: To create a safe bicycle infrastructure system, this article develops an intelligent embedded learning system using a combination of deep neural networks. The learning system is applied as a case study in the Northumbria region in England’s northeast. It is made up of three components: (a) input data unit, (b) knowledge processing unit, and (c) output unit. It is demonstrated that various infrastructure characteristics influence bikers’ safe interactions, and this relationship is used to estimate the riskiest age and gender rider groups. Two accurate prediction models are built, with a male accuracy of 88 per cent and a female accuracy of 95 per cent. The findings show that different infrastructures pose varying levels of risk to users of different ages and genders. Certain aspects of the infrastructure are hazardous to all bikers. However, the cyclist’s characteristics determine the level of risk that any infrastructure feature presents. Following validation, the built learning system is interoperable under various scenarios, including current heterogeneous and future semi-autonomous and autonomous transportation systems. The results contribute towards understanding the risk variation of various infrastructure types. The study’s findings will help to improve safety and lead to the construction of a sustainable integrated cycling transportation system.


Introduction
One of the significant challenges that the present world is facing is the uptake of sustainable modes of transport. The transportation sector is the second largest contributor to global greenhouse emissions [1]. Another challenge that needs immediate attention is the number of fatalities and casualties due to road traffic crashes. Road traffic crashes are the leading cause of death among young people in the 15-29 age group [2]. Hence, it is essential to improve safety as well as decrease the emissions from transportation. This will help save lives, decrease the number of injuries, and improve the cities' likeability. These objectives can only be achieved if we promote cycling as a mode of travel. The share of the cycling mode should increase manyfold from the present low base [3]. Such a measure will have social, economic, as well as environmental benefits.
The primary hurdle that affects the uptake of cycling as a mode of travel is the high inequity of cyclists' safety. Cyclists face a 13 times higher risk for the same distance traversed in Great Britain [4]. A study on mode shift elasticity [5] concluded that the elasticity is more significant for cyclists, i.e., a given safety improvement attracts a proportionally higher number of cyclist road users. Identifying the physical and environmental threats to the cyclist in the natural urban environment provides an insight into the preferences and choices of cyclists [6]. The built environment, weather, work-related factors, and attitudes affect the everyday commute by bicycle [7]. In addition to the varying infrastructure, environmental, and traffic flow conditions, cycling hazards are also dependent upon the cyclist-specific variables of age, experience, and gender [8]. It is widely established in the literature (see [9,10]) that the way males and females use the infrastructure varies significantly, with varied safety implications.
Women are likely to make short journeys, and the spatial and temporal structure of their journeys differs from that of men. They rarely prefer large multi-lane roads and busy junctions. Instead, they prefer selected areas of the city with narrow streets and traffic-calming measures. They generally cycle at lower speeds, are more likely to make recreational rather than commuter trips, and have a stronger liking for quiet streets [11]. A study [12] investigating crashes in the Czech Republic reported that males account for around 69% of the crashes and are more likely to be involved in a fatal crash (80%). Similarly, a study in the USA [13] found that males are at a higher risk than females (around five times more for the same distance travelled). It is common speculation that men drive less safely and more recklessly than women. Another variable reported in the literature is the age of the trip maker. The work on cyclist near-misses in London [14] concluded that the age group of the rider directly affects their daily near-misses: the daily near-misses decrease with age, from 2.47 (20-29 age group) to 1.85 (>60 age group). These near-misses are correlated with crashes (see [15]). Similar results were obtained in Germany, where a study concluded that cyclists of varying age groups use the infrastructure differently and exhibit varying microscopic road traffic behaviour [16,17]. Men are more reluctant to modal shift to cycling than women [18,19], and it takes much more improvement in the infrastructure and environment for women to consider cycling [20]. The British cycling report (Transport Research Lab report 490) argued that gender, age, and cycling experience are critical variables affecting cyclists' safety. However, these variables do not influence how the cyclist rates a particular cyclable route. The qualitative evaluation of the infrastructure is the same across age and gender [21].
Cycling safety is an important topic; however, limited studies effectively model the risk in terms of exposure [22]. The present need of the transportation system requires cycling mode share to increase significantly. The major hurdle in this process is insufficient evidence to understand the relationship between cyclist safety and the identified parameters [23]. The literature is abundant with evidence of variables affecting cycling safety, which are presently not modelled in the prevalent general framework. The Italian crash study [24] demonstrated that the variable gender is a significant primary variable affecting the risk faced in terms of road type, type of interacting vehicle, riskiest vehicle manoeuvres, collision type, time, day, and season of the journey. The varied safety implication of gender is compounded by varied meteorological conditions that present different risks to different genders for varied infrastructure types [25]. Similarly, different lighting conditions also have different safety implications [26]. The needs of the cyclists are peculiar while interacting with other road users and infrastructure and its corresponding safety implications. The present road safety modelling framework implies a trade-off between prediction accuracy and knowledge of the causal link between the essential variables [27]. A practical model for examining road safety should determine causation, be highly predictive, and be scalable to large data sets [28]. This gap in the literature needs to be filled so that effective cycling infrastructure design and planning is not compromised. The transportation sector is at the brink of the fourth industrial revolution, with the movement toward autonomous/semi-autonomous vehicles and infrastructure to start soon. It is high time that the work on real-time embedded safety systems is explored. 
Presently, very few works in the literature undertake the development of an embedded learning system that can accurately model cyclist safety. Hence, the study aims to develop an intelligent embedded learning system for modelling cycling infrastructure. Such a system will be able to take the input data in real-time, model the variable, and result in an output of the predicted safety. As a result, the objectives are to:

1. Develop an intelligent framework to construct a real-time embedded learning system that accurately models cyclist safety.
2. Apply this learning system as a case study on an investigation area.
3. Construct a nanoscopic model for a cyclist to predict the safety for a particular age and gender.
4. Identify and quantify the significance of the variables affecting the unsafeness of the rider based upon personal attributes.
The proposed embedded learning system will consist of specific hardware and software. This will result in a composite system that can continuously take the data and undertake modelling to present the policymakers/city planners with the final desired output that is ready to use. Such a proactive approach is necessary to achieve the 2030 vision of zero road traffic fatalities and embark on a pathway towards a smart, green, and integrated transportation system. It is high time that intelligent embedded systems are incorporated into transportation research as well as practice. The proposed real-time embedded learning system is described in the next section, followed by the results in Section 3, and the conclusions are drawn in Section 4.

Real-Time Intelligent Embedded Learning System
This section describes the proposed real-time intelligent embedded learning system in detail (Figure 1). It consists of three units: (a) the input data unit, which continuously collects data in real-time; (b) the knowledge processing unit, which develops an embedded learning system in real-time; and (c) the output unit, which delivers the predictive model and the variable interaction model.

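[Figure 1. The proposed real-time intelligent embedded learning system: Input Unit, Knowledge Processing Unit (KPU), Output Unit.]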

Input Unit
The input unit consists of an automatic data collection unit comprising: (a) meteorological data, (b) lighting data, (c) traffic flow data, (d) a crash data logger, and (e) Digimap. The north-east of England is used as the case study area. Through the partnership with the city council, the meteorological, lighting, and crash datasets, traffic cameras, and counters were accessed. The flow characteristics for the study area were obtained from the traffic flow database system (TRADS). Through this system, the traffic cameras and counters were accessed (Figure 2). The exact WGS84 coordinates were extracted for each crash and then used as an input to obtain the relevant infrastructure parameters. Digimap is an online map and data delivery service available to the research group, operated by EDINA at the University of Edinburgh. This platform was used to extract the infrastructure information based upon the WGS84 coordinates. It hosts accurate infrastructure maps depicting the present as well as past conditions. This ensured that the exact infrastructure parameters at the time of each crash, rather than the present conditions, were used for modelling. The sensors continuously feed the data into the KPU in the form of a combined base input file.

Knowledge Processing Unit
The first step in the knowledge processing unit (KPU) is data cleaning. Because multiple datasets are combined, the resulting dataset contains noise, which is removed before the data analysis proceeds. The predictive models were developed through deep learning with neural network classifiers. The final input data were randomly divided into three subsets for training, validation, and testing, in the ratio of 6.5:3.0:0.5. This is the recommended division for the network to develop accurate prediction properties [29]. Such a division provides adequate data for the network to learn correctly, evaluate the trained model, and apply the generated models to unseen circumstances. A Bernoulli distribution was utilised to ensure that the data were divided randomly. Two different models (one each for male and female riders) were constructed using the input infrastructure variables listed in Table 1. The output variables of the model are described in Table 2, and the network structure used to construct the model is given in Table 3. The following four-step iterative learning process, mapping the input to the output variables, was used.
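Step 1: Random weights and activation. First, random weights were assigned to each connection, i.e., between the input and hidden layers, within the hidden layers, and between the hidden and output layers. For signal transmission across these connections, the hyperbolic tangent activation function, Equation (1), was used for the hidden layers, and the softmax function, Equation (2), for the output layer.
$$ f(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} \qquad (1) $$
$$ f(a_{\epsilon}) = \frac{\exp(a_{\epsilon})}{\sum_{j=1}^{m} \exp(a_{j})} \qquad (2) $$
where $a_{\epsilon}$ is the activation of the $\epsilon$th output neuron, and $m$ is the number of output neurons.
Step 2: Error modelling. The cross-entropy error function, Equation (3), was used to model the error between the output obtained and the desired output.
$$ E = -\sum_{l=1}^{L} d_{l} \ln(y_{l}) \qquad (3) $$
where $y_{l}$ is the actual output obtained for the output node $l$, $d_{l}$ is the corresponding desired output, and $L$ is the largest value of $l$.
Step 3: Synaptic weight update. The initially randomly assigned synaptic weights were updated based on the error of Equation (3). The backpropagation algorithm calculates the gradient of the training error in each training case (epoch), and the weights are updated as
$$ w_{h\epsilon}^{(t+1)} = w_{h\epsilon}^{(t)} + \Delta w_{h\epsilon}^{(t)} \qquad (7) $$
where $w_{h\epsilon}^{(t)}$ is the synaptic weight between neurons $h$ and $\epsilon$ at iteration $t$, $\Delta w_{h\epsilon}^{(t)}$ is the weight change obtained from the error gradient, ∅ is the learning rate, and x is the input variable.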
Step 4: Scaled conjugate gradient learning. The above steps were continuously repeated (iterated) until either the maximum number of these iterations (epochs) or minimum training error change was achieved.
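The four steps above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the network size, learning rate, stopping thresholds, and toy data are assumptions chosen for illustration, and plain gradient descent is used in place of scaled conjugate gradient learning to keep the example short. The function names (`softmax`, `forward`, `cross_entropy`, `train`) are hypothetical.

```python
import math
import random

def softmax(a):
    """Softmax over a list of output activations, as in Equation (2)."""
    m = max(a)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def forward(x, w1, w2):
    """One hidden layer with tanh activation, output layer with softmax."""
    h = [math.tanh(sum(xi * wi for xi, wi in zip(x, row))) for row in w1]
    y = softmax([sum(hj * wj for hj, wj in zip(h, row)) for row in w2])
    return h, y

def cross_entropy(y, d):
    """Cross-entropy between obtained outputs y and desired outputs d."""
    return -sum(dl * math.log(yl + 1e-12) for yl, dl in zip(y, d))

def train(samples, n_hidden=4, lr=0.1, max_epochs=300, min_change=1e-6, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Step 1: random initial weights on every connection.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    history = []
    for _ in range(max_epochs):
        total = 0.0
        for x, d in samples:
            h, y = forward(x, w1, w2)
            total += cross_entropy(y, d)  # Step 2: error modelling
            # Step 3: backpropagate the error gradient and update the weights.
            delta_out = [yk - dk for yk, dk in zip(y, d)]
            delta_hid = [(1.0 - h[j] ** 2) *
                         sum(delta_out[k] * w2[k][j] for k in range(n_out))
                         for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden):
                    w2[k][j] -= lr * delta_out[k] * h[j]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * delta_hid[j] * x[i]
        history.append(total / len(samples))
        # Step 4: stop at the epoch limit or when the error change is minimal.
        if len(history) > 1 and abs(history[-2] - history[-1]) < min_change:
            break
    return w1, w2, history

# Toy data (made up); the last input is a constant bias term.
SAMPLES = [([1.0, 0.0, 1.0], [1, 0]), ([0.0, 1.0, 1.0], [0, 1]),
           ([1.0, 1.0, 1.0], [1, 0]), ([0.0, 0.0, 1.0], [0, 1])]
w1, w2, history = train(SAMPLES)
```

The list `history` holds the mean cross-entropy error per epoch, so comparing its first and last entries shows whether the iterative weight updates reduced the training error.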
The performance of the constructed predictive model was evaluated through the area under the curve (AUC) of the receiver operating characteristics (ROC). These are the evaluation metrics used for checking the network's classification performance. The ROC is a probability curve, and the AUROC measures the separability power of the network. When calculating the risk, the higher the AUROC value, the better the distinguishing power of the network. In addition, gain and lift charts are used for qualitative evaluation; these visual aids assess the model's predictive capability compared with a non-model-based probability evaluation. After model construction and performance measurement, the next step is to validate the model through validation datasets. This process ensures an unbiased evaluation of the model fit on the training dataset while tuning the model hyperparameters, followed by checking the model's performance on unseen data, and providing an unbiased evaluation of the final developed predictive model.
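As a rough illustration of the AUROC measure, the sketch below computes a one-vs-rest AUROC for each output class and sums the values, which is one plausible reading of how an AUROC total over several output values can be formed. The pairwise-comparison formula is the standard rank interpretation of AUROC; the function names and the summation scheme are assumptions, not the study's code.

```python
def auroc(scores, labels):
    """Probability that a random positive case scores above a random
    negative case; ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative cases are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summed_auroc(prob_rows, true_classes, n_classes):
    """One-vs-rest AUROC per output class, plus the sum over classes."""
    per_class = []
    for k in range(n_classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if c == k else 0 for c in true_classes]
        per_class.append(auroc(scores, labels))
    return per_class, sum(per_class)

def mean_auroc(total, n_classes):
    """Convert a summed AUROC into a mean per-class value."""
    return total / n_classes
```

Under this reading, a summed AUROC over seven output classes has a maximum of seven, and dividing the total by seven gives the mean per-class value, for example `mean_auroc(6.19, 7)` is roughly 0.88.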
The critical variables in the data learning model were identified through variable importance. The normalised significance of each variable relative to the most important variable was also evaluated so that the variables can be compared with each other. The independent variable importance is a measure of how much the predicted output value changes in response to a change in the input variable. The normalised significance of each input variable is its importance value divided by the largest importance value, expressed as a percentage. This is followed by the Boolean logic, which presents the output in the form of the single most critical variable affecting the safety of a particular group.
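The normalisation and the Boolean selection described above can be sketched as follows. The importance values in the example are invented for illustration and are not taken from the study's tables; the function names are hypothetical.

```python
def normalised_importance(importance):
    """Importance of each variable divided by the largest importance,
    expressed as a percentage."""
    top = max(importance.values())
    return {name: value / top * 100.0 for name, value in importance.items()}

def most_critical(importance):
    """Boolean flag per variable (True only for the largest importance),
    reduced to the single most critical variable name."""
    top = max(importance.values())
    flags = {name: value == top for name, value in importance.items()}
    return next(name for name, is_top in flags.items() if is_top)

# Made-up importance values, for illustration only.
example = {"RHLD": 0.20, "vehicle manoeuvre": 0.18, "junction detail": 0.10}
ranked = sorted(normalised_importance(example).items(),
                key=lambda item: item[1], reverse=True)
critical = most_critical(example)
```

Sorting by the normalised percentage gives the rank of importance, while the Boolean step reduces the ranking to one variable per rider group.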
Specific hardware was used for the embedded system. It has wireless connectivity and an inbuilt memory of 32 GB. Although the embedded system has its own processor, this processor is not used in the KPU, which has a separate processor and dedicated RAM. The use of the processor in the embedded system itself should be explored in future research.
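The random 6.5:3.0:0.5 partition of the input data used in the knowledge processing unit can be sketched with successive Bernoulli draws. How the Bernoulli distribution was applied in the study is not specified, so the two-stage scheme below is an assumption, and `bernoulli_split` is a hypothetical helper.

```python
import random

def bernoulli_split(records, p_train=0.65, p_valid=0.30, seed=0):
    """Assign each record to training, validation, or testing.

    A first Bernoulli draw with probability p_train selects training cases.
    For the remainder, a second Bernoulli draw with probability
    p_valid / (1 - p_train) selects validation cases; the rest are testing.
    """
    rng = random.Random(seed)
    p_valid_given_rest = p_valid / (1.0 - p_train)
    train, valid, test = [], [], []
    for record in records:
        if rng.random() < p_train:
            train.append(record)
        elif rng.random() < p_valid_given_rest:
            valid.append(record)
        else:
            test.append(record)
    return train, valid, test

train_set, valid_set, test_set = bernoulli_split(list(range(10000)))
```

Because each assignment is an independent random draw, the realised shares only approximate 65%, 30%, and 5%, and the approximation tightens as the number of records grows.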

Results and Discussion
There were 3325 bike collisions registered in the investigation area: 2638 being slight, 661 serious, and 26 fatal.

Predictive Model
The findings of the deep learning prediction model that predicts the riskiest age and gender group based upon infrastructure input factors are reported in this section. Two deep learning models were built. Their accuracy is measured by comparing their distinguishable power between the riskiest and non-riskiest age and gender groups using AUROC, as shown in Table 4. The output predicted versus the observed for the testing datasets is presented in Figure 3, and the ROC curves for each output variable in Figure 4. It can be inferred from the ROC curves and the AUROC values that both the models attained a high level of accuracy. Each model can have up to seven different output values. As a result, with perfect precision, the highest value of the summed AUROC is seven. The male and female AUROC values obtained were 6.19 and 6.64, respectively, i.e., about 88% and 95% of the maximum. This suggests that the prediction and distinguishable power is high and consistent. As the physical and cognitive abilities of riders of different ages and gender groups vary, there are safety implications for each rider sub-group. Hence, it is possible to estimate the most dangerous age and gender groups based on the individual input data.
The literature widely reports that current safety models cannot be directly utilised to simulate bike infrastructure, due to their inability to appropriately represent safety. In one comprehensive work, the Finnish crash model was gauged for its efficacy in modelling cyclist safety [30]. The study revealed an inaccuracy of around 65%. Similarly, the evaluation of the primary transportation simulation software, including PTV VISSIM, AIMSUN, TEXAS, and PARAMICS, exhibited an incapacity to mimic riders adequately and efficiently [31]. These studies concluded that cyclist interaction cannot be simulated using such packages, which were developed on the premise of motorised modes of transport.
Hence, the accuracy achieved through the developed learning system is many times higher than that of the conventional models in the literature. To further demonstrate the benefit of using the complex methodology for prediction, lift and gain charts are presented in Figures 3-6. These charts compare the efficacy of the output with that of the traditional probability-based statistical model. For all the variable cases, the ROC curve is close to the upper left corner and far from the 45° baseline, which depicts a significantly high prediction capability, as is evident from the AUC values. The gain is a measure of the effectiveness of the constructed model, calculated as the percentage of the correct predictions obtained with the model versus the correct predictions obtained without the model, i.e., the baseline. A significantly higher gain is obtained for all the output variables (on average, 80% at the 10% point), i.e., if the predictions are sorted by their pseudo-probabilities, the top 10% of the dataset contains 80% of the cases of improper predictions. Similarly, from the lift chart, the average lift value for the output at 10% of the data is eight, i.e., the average accuracy of the models is eight times higher than the base case at this point.
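The relationship between the gain and lift figures quoted above (80% gain at the top 10% of the data, hence a lift of eight) can be checked with a short sketch. The data are synthetic and constructed to reproduce that ratio; `cumulative_gain` and `lift` are hypothetical helper names.

```python
def cumulative_gain(scores, hits, fraction):
    """Share of all target cases found in the top `fraction` of the data
    when the cases are ranked by descending score."""
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(fraction * len(ranked)))
    captured = sum(hit for _, hit in ranked[:cutoff])
    return captured / sum(hits)

def lift(scores, hits, fraction):
    """Gain divided by the share expected from random selection."""
    return cumulative_gain(scores, hits, fraction) / fraction

# Synthetic example: 100 cases, 10 target cases, 8 of them ranked in the top 10.
scores = list(range(100, 0, -1))
hits = [1] * 8 + [0] * 90 + [1] * 2
gain_at_10 = cumulative_gain(scores, hits, 0.10)
lift_at_10 = lift(scores, hits, 0.10)
```

With these synthetic data, `gain_at_10` is 0.8 and `lift_at_10` is approximately 8, i.e., the ranked selection finds target cases about eight times more often than a random 10% sample would.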

Variable Interaction Model
The critical variables in the data learning model are identified through the variable importance and normalised significance (Table 5). This is based upon both testing and validation data sets. From the variable interaction model, it can be inferred that the overall effect of the infrastructure is not significantly different for riders of a different gender. However, the critical variables differ by a small proportion in their rank of importance and their relative effect. The Boolean logic presents the most critical variable for males as road hierarchy level and direction (RHLD), and for females as vehicle manoeuvre. For males, the variables in rank of importance are RHLD, followed by vehicle manoeuvre, junction location of the vehicle, junction detail, and road location of the vehicle. For females, the critical variables are vehicle manoeuvre, followed by RHLD, junction detail, road location of the vehicle, and road type. The importance rank of the rest of the variables is similar; however, their normalised importance values vary slightly. This leads us to conclude that there are specific attributes of the infrastructure that are risky for all cyclists. However, the level of risk that each infrastructure attribute poses is dependent upon the gender of the rider. The results agree with the findings in the literature: the infrastructural hazards present different levels of risk to the cyclist based upon their gender (see [20,32]), and a bad infrastructure design/condition is rated poorly by cyclists irrespective of their gender (see [23]). The novel variables introduced in the study, i.e., road hierarchy level and direction, are significant variables. It is recommended that these variables be considered in cyclists' road safety investigations. The sudden change in the road hierarchy requires a shift in how cyclists need to interact with the infrastructure and other road users.
Furthermore, the direction of change, i.e., whether the difference in the hierarchy is from a low class of road to a higher level or vice-versa, is a critical externality.

Conclusions
This paper proposes a real-time nanoscopic rider safety system based on the varied infrastructure variables. Such a system will help improve safety, hence the attractiveness of the mode of travel. An embedded real-time learning system was developed, whose knowledge processing unit is primarily based upon deep neural networks. This is applied as a case study on the Northumbria region in England's northeast. Two accurate prediction safety models were developed that can predict the riskiest age and gender groups.
An average accuracy of 92% was obtained in these prediction models. This accuracy is several times higher than the available models in the literature. In addition, a variable interaction model was developed that ranks the input variable and estimates the variable importance.
The research's primary contribution is the development of a comprehensive embedded system that models safety in real-time. The system includes an input unit, knowledge processing unit, and output unit. Not only were prediction models developed, but an understanding of the critical variables was also established. Presently, very few such systems exist in the literature that have affected infrastructure systems' design and planning. The present models in the literature are primarily based upon the probabilistic functions of human error, which model variables such as miles traversed, speed limits, intoxication, and others. These variables model road users' behaviour at an aggregate level, such as a city or a county. The learning system developed in this work models the road user at a nanoscopic level. The local authorities and city planners can directly apply such a system. Through the application of the learning system to the north-east of England, it is established that the varied types of infrastructure impose differing risks on riders of various ages. The interaction between cyclists and infrastructure can cause both physical and cognitive stressors, to which riders of different ages and genders respond differently, allowing us to estimate the riskiest age and gender group depending on the specific input factors. Furthermore, certain infrastructural characteristics are hazardous to all cyclists. However, the rider's personal attributes such as age and gender determine the amount of danger that any infrastructure poses. The critical variable affecting the safe usage of infrastructure for females is the rider manoeuvre that they are required to perform. This is due to the varying physical and cognitive abilities that females possess compared with males. However, for males, it is a sudden change in the road hierarchy level that presents the highest level of risk.
The sudden change in the road hierarchy requires a shift in how cyclists need to interact with the infrastructure and other road users.
The study's findings will assist in improving safety and contribute to the development of a sustainable integrated bicycle transportation system. The study's findings will aid in better understanding the risk variance of various infrastructure types. Following validation, the developed real-time learning system is interoperable in several situations, including present heterogeneous and future semi-autonomous and autonomous transportation systems. The embedded learning system will improve the attractiveness of cycling as a mode of travel and contribute to a smart and green transportation system.