Citizen Science for Traffic Monitoring: Investigating the Potentials for Complementing Traffic Counters with Crowdsourced Data

Abstract: Traffic counts are among the most frequently employed data to assess the traffic patterns and key performance indicators of next generation sustainable cities. Automatised counting is often based on conventional traffic monitoring systems such as inductive loop counters (ILCs). These are costly to install, maintain, and support. In this paper, we investigate the possibilities to complement and potentially replace the existing traffic monitoring infrastructure with crowdsourcing solutions. More precisely, we investigate the capabilities to predict the ILC-obtained data using Telraam counters, low-cost camera counters voluntarily employed by citizens and freely accessible by the general public. In this context, we apply different exploratory data analysis approaches and demonstrate a regression procedure with a selected set of regression models. The presented analysis is demonstrated on different urban and highway road segments in Slovenia. Our results show that the data obtained from low-cost and easily accessible counters can be used to replace the existing traffic monitoring infrastructure in different scenarios. These results confirm the prospect of directly applying citizen engagement in the process of planning and maintaining sustainable future cities.


Introduction
Diminished traffic performance with strong modal-split imbalance is one of the key issues affecting sustainable development and the principles of liveability in urban environments. Planning strategies in many cities today are struggling with recurrent road congestion, extended travel duration, less reliable travel times, rat-run traffic, air and noise pollution, or decreased traffic safety [1,2]. The first steps in addressing these issues are data collection and analysis of traffic parameters, which in turn guide strategic planning and maintenance of the traffic network and public services; support environmental, financial, or travel pattern analysis; and assist long-term urban planning or the planning of individuals' daily commutes. Here, some new actors also enter the stage and bring new approaches to the field. Perhaps the most significant developments have happened in the domain of data-intensive methodologies, where great amounts of machine-readable information are generated by the socio-technical systems in which citizens are increasingly entangled, by choice or by necessity [3,4].
Engaging members of the public in the development of effective monitoring systems by providing digital data 'from the field' is gaining relevance in transport monitoring [5]. Following the lead of environmental studies, other professional fields have also opened their gates to new ways of capturing data. Several terms have emerged in the literature in the past decade describing public contribution of data, from collective sensing, public crowdsourcing, or community-based monitoring to citizen science. Although these terms are often used inconsistently and there is limited consensus on how different approaches could be classified within one conceptual framework, there are certain criteria [5][6][7] that classify them according to estimated data accuracy and reliability rates, expertise needed for operation, necessary investments in hardware and software, the rate of cleaning and filtering required to eliminate redundant or irrelevant data, etc.
Amongst the different practices of citizen engagement, citizen science is regarded as a directed effort towards the active participation of the general public and its contribution to scientific research [8][9][10]. In comparison to the concepts of collective sensing, public crowdsourcing, or community-based monitoring, this approach proposes more intentional and pre-planned collecting of particular data, which results in better control of the data quality [11]. Several studies comparing scientifically collected versus citizen-collected data showed acceptable levels of correlation [12][13][14]. Since the mobility and accessibility aspects are closely connected to our daily routines, such data become valuable resources by also demonstrating some well-appreciated concepts, such as integration, interchangeability, and reusability of data. The implementation of these concepts in the transport domain and the analytical methods used in this context seem to lag far behind their potential.
This paper introduces a novel analytical approach to the examination of road vehicle counts in Slovenia based on two different publicly available data sets: governmental traffic data, based on inductive loop counters (ILCs), and the Telraam crowdsourced data, based on low-cost video sensors used by the interested public [15]. Our research aims to increase the understanding of how the conventional monitoring of traffic flows and traffic diagnosing can be integrated with a concept of publicly collected data to monitor up-to-date traffic situations and performance. Based on the regression analysis, we elaborate on the possibility to replace the selected ILCs by the geo-equivalent Telraam counters. We open the discussion on the potential to complement or even substitute the existing traffic monitoring infrastructure with emerging citizen-based solutions, such as Telraam counters.

Background
Commonly, the network travel performance and the efficiency of road systems have been estimated by roadway flow rates directly rendered from the vehicle counts. The three crucial variables characterising traffic streams on roadways are flow, density, and speed, which establish an interrelation in the distribution, often referred to as a network fundamental diagram [16,17]. A range of metrics can be derived from this setting to track the network congestion or to estimate the efficiency once the travel flow distribution has been recognised. Ultimately, these parameters are combined to estimate typical traffic situations or events such as free-flow, bottleneck effect, stop-and-go waves, and similar [18]. In practice, network performance is often reflected by travel times on certain routes and the travelling reliability referred to.
Traditionally, the prevailing ways in which traffic agencies receive information on traffic patterns are through traffic counts collected by temporary or permanent sensors. The state-of-practice procedures propose stationary on-road or over-road counting devices such as ILCs, microwave or laser radar sensors, ultrasonic and passive infrared sensors, etc. The ILC loops embedded in the pavement of the roadway are by far the most widely used sensors in conventional traffic control systems [19,20]. They present a mature, well-verified technology for estimating traffic parameters (vehicle count, type, density, speed), providing count data 24 h per day, 365 days per year with minimal privacy and security concerns. Their prolific use in the transportation management industry has also been a result of their detection accuracy [21,22] and the high predictability of their error rates and possible difficulties, e.g., inaccuracies classifying vehicles in conditions where vehicles do not operate at a constant speed, in dense traffic, or where stop-and-go traffic occurs [23]. However, the installation and operation of inductive-loop detectors bring significant costs. Every mounting requires pavement cutting and lane closures, and it can be influenced by pavement deterioration, improper installation, or pavement repair, which may impair loop integrity [24]. Thus, constant acceptance testing, repair, and maintenance are required to sustain the operational status of the inductive-loop-based vehicle detection systems [25]. These cannot be performed without additional and continuous resources. To keep traffic monitoring at a reasonable cost, ILCs are commonly positioned individually on main roads and strategic points, with their coverage being relatively wide but still considerably limited in comparison to the dynamic data sources and other collection methods available nowadays.
The last decade has seen strong development in both sensor-based solutions (remote and in situ) and cellular network monitoring. These range from satellite data acquisition [26] and laser remote sensing methods such as LiDAR [27], through GPS location services supporting data collection via large networks of connected vehicles such as floating car data [18,28], data collected by network-enabled devices [29], and social media networks and platforms such as Google Traffic, Twitter, and Instagram [30,31], to smart traffic cameras with innovative image/video analytical solutions [32]. In particular, the visual monitoring of dynamic objects, especially vehicles on the road, has been an active research topic in computer vision and intelligent transportation systems over the past decade. Important advances have been achieved in vehicle detection, tracking, and speed analysis [33]. Different motion-based algorithms were developed to overcome the issues related to the appearance, shape, shades, or disparity by using dynamic background modelling, optical flow, and occupancy grids [34][35][36][37]. Several reviews of the current advances and applications in this field are available in the literature (see, e.g., [38][39][40]). Object detection algorithms are commonly divided into conventional machine learning and deep learning methods. Deep learning methods such as convolutional neural networks mostly improve prediction performance using big data and plentiful computing resources and have pushed the boundaries of what was possible [41].
Recently, Yang et al. [42] introduced a fast and accurate vehicle counting and traffic volume estimation method based on a convolutional neural network. Wang et al. [43] introduced the detection and classification of moving vehicles from video using multiple spatio-temporal features and further developed a system, termed the improved spatio-temporal sample consensus algorithm, which addresses the intrusion of brightness variation and vehicle shadows. Zhang et al. [44] proposed real-time vehicle detection and tracking using improved histogram of gradient features and Kalman filters. Azimjonov and Özmen developed a real-time traffic flow data extraction system by processing camera images and using vehicle detection and tracking algorithms on highway videos [45]. Their study proposes improving the vehicle classification accuracy of the you only look once (YOLO) object detector and introduces a novel bounding box (Bbox)-based vehicle tracking algorithm. The authors also provide a concise literature review on existing vehicle detection and vehicle tracking methodologies applied in different related works. Considering some of the possible limitations of the above-mentioned methods, such as demanding processing, deficient validation, uncertain accuracy, or privacy concerns, a deliberate combination with more verified methods seems a reasonable next step in acquiring a broader understanding of the traffic conditions [19,46]. As [47] points out, the combination of conventional and new techniques opens up many possibilities to generate a more complete and comprehensive picture of traffic flows in areas that were previously difficult to map.
Furthermore, participatory concepts where citizens voluntarily contribute data using location-based technologies also strengthen individuals' sense of commitment and concern about the issues of local environments. Previous research in this field has demonstrated how the participation and engagement of citizens is reflected in stronger environmental awareness [48], a higher sense of commitment, and more motivated engagement in local decision making [7,49]. However, there is also a growing demand from citizens to engage in sensing as a means to answer their own questions and to benefit in terms of time, costs, or increased productivity by gaining information using mobile devices or other technologies [50]. To this end, this research aims to increase the understanding of how conventional monitoring of traffic flows and traffic diagnosing can work reliably alongside the targeted participatory data captured by low-cost, camera-based automated sensors, introduced by the WeCount (citizen science) platform Telraam [15].

Data Collection
We gathered the available traffic data from August 2020 to the end of April 2021. We employed traffic data from four different micro-locations equipped with ILCs and Telraam counters. Three of these are located within the city of Ljubljana (see Table 1, the first six segments). The fourth micro-location is located on a highway exit near the city of Koper (see Table 1, the last two segments denoted as Škofije). While Ljubljana is the capital of Slovenia, Koper is the largest coastal city in the country, located in the proximity of the Italian border. We observed the traffic data in both directions on each micro-location. Eight different road segments were thus used in our analysis (see Table 1). Data gathered from the ILCs are publicly available on request either from the Municipality of Ljubljana (MOL) for the roads within the city of Ljubljana or from the Ministry of Infrastructure of the Republic of Slovenia for the highway counters. Data gathered within the WeCount project by Telraam counters are accessible through the application programming interface (API), which can be accessed at https://telraam-api.net (accessed on 25 November 2021).
Inductive loop counters automatically collect data 24 h per day and are able to detect and categorise different types of motorised vehicles. We merged these into a single category. Telraam counters are composed of a low-resolution camera and a Raspberry Pi module which processes the sensor and camera inputs and sends the count data to the central database [15]. Since the counting is based on visual inputs, the counting can be performed only in the daytime. Telraam counters can detect traffic in both directions. Similar to the ILCs, these counters are able to categorise different types of motorised vehicles as well as pedestrians and bicycles. Only the former were aligned with the data obtained with the inductive loop counters.
We matched the counter data with the weather data obtained from the VisualCrossing weather history API [51]. Two weather locations were used, namely, Ljubljana for the road segments in the city of Ljubljana and Koper for the road segments around Koper (denoted as Škofije).

Data Preprocessing
We eliminated the outliers for each counter. Namely, we removed the measurements that deviated from the mean by more than 3 standard deviations. For each hour, the Telraam counters report the percentage of the up-time (pctup), which is then used to rescale the measured number of observed entities (vehicles, bicycles, pedestrians), $n_m$, to the whole hour ($n = n_m / \mathrm{pctup}$). We removed the counts (hours) for which zero vehicles were reported. Moreover, to increase the reliability of the data, we removed all the counts (hours) for which the counter up-time was lower than 50%.
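The cleaning steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the analysis: the function name, the representation of each hour as a `(measured_count, pctup)` pair, the assumption that `pctup` is a fraction in (0, 1], and the order in which the steps are applied are all assumptions made for this example.

```python
from statistics import mean, stdev


def preprocess_hourly_counts(records, min_uptime=0.5, z_max=3.0):
    """Clean hourly Telraam records given as (measured_count, pctup) pairs.

    Assumptions (not taken from the paper): pctup is a fraction in (0, 1],
    and filtering and rescaling happen before the outlier removal.
    """
    # Drop hours with zero reported vehicles or with up-time below 50%.
    kept = [(n_m, up) for n_m, up in records if n_m > 0 and up >= min_uptime]
    # Rescale the measured count to the whole hour: n = n_m / pctup.
    scaled = [n_m / up for n_m, up in kept]
    if len(scaled) < 2:
        return scaled
    # Remove values further than z_max standard deviations from the mean.
    mu, sigma = mean(scaled), stdev(scaled)
    if sigma == 0:
        return scaled
    return [n for n in scaled if abs(n - mu) <= z_max * sigma]
```

For example, an hour with 50 measured vehicles and an up-time of 0.5 is rescaled to 100 vehicles, while an hour with an up-time of 0.4 is discarded.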
We selected segments with a single ILC and several Telraam counters. However, some of the latter reflected very limited reliability even after the removal of outliers and were thus eliminated from further analysis. Since the Telraam counters only operate during the day, zero counts should be sparse when the counters are operating normally. Telraam counters reflecting right-skewed distributions can thus be eliminated from further analysis (for example, see Figure 1, counters 0529 and 1783). The same does not hold for ILCs. These operate throughout the whole day and thus often reflect an excess of zeros. In normal operating conditions, bimodal distributions are observed in these counters, with one peak close to zero and the other near the expected traffic flow (see Figure 2). If the latter is close to zero (less frequented roads), the two peaks can merge into a single peak (see Figure 2, counters 1025-116-1 and 1026-136-1).
ILCs 1003-116-1, 1004-136-1, 1040-236-1, and 1040-236-2 are located at main arterial roads leading in and out of Ljubljana, Slovenia's largest city. Their histograms show relatively high vehicle count. They also exhibit bimodality, with one peak corresponding to low traffic (close to zero count, e.g., at night) and the other one corresponding to the rush hour traffic. Devices 1025-116-1 and 1026-136-1 count vehicles near the centre of Ljubljana, where traffic is partially restricted, thus the count is comparably low, and there is no distinct rush hour peak. Finally, ILCs 686-1 and 686-2 monitor the highway exit near Koper, the country's largest coastal city, but 10 times less populous than Ljubljana.
The count distributions of matching Telraam sensors are presented in Figure 1. The distributions shown aggregate the measurements in both directions, so we subsequently separated the data for each direction, as explained in Section 3.3. Devices 0619, 0655, 0656, 0820, 1029, and 1506 are matched with ILCs as declared in Table 1. Additionally, three anomalous distributions are displayed to illustrate the unsuitable characteristics for which we excluded these counters from further analysis. Right-skewed distributions from counters 0529 and 1783 indicate unreliable operation, while the low frequency of counts at sensor 1950 reveals its recurrent inactivity.
According to the typical weather situations in central and coastal Slovenia, we transformed the weather data to two categories. Namely, weather was classified as bad in case of rain/fog (precipitation > 2 mm/day) or snow (precipitation > 0 mm/day) and in case of low temperatures (T < 5°C if dry; T < 10°C if humid). Weather was classified as good in the remaining cases.
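The two-category rule above can be written as a small function. The sketch below is only one possible reading of the thresholds: the function name and its arguments are hypothetical, and since the text does not specify how snow or humid conditions are determined from the weather records, they are passed in as boolean flags.

```python
def classify_weather(precip_mm, temp_c, is_snow=False, is_humid=False):
    """Return "bad" or "good" for one day of weather data.

    is_snow and is_humid are assumed to be supplied by the caller; how they
    are derived from the raw weather records is not specified in the text.
    """
    # Any amount of snow counts as bad weather.
    if is_snow and precip_mm > 0:
        return "bad"
    # Rain or fog counts as bad weather above 2 mm/day.
    if precip_mm > 2:
        return "bad"
    # Low temperatures: below 5 °C if dry, below 10 °C if humid.
    cold_limit = 10.0 if is_humid else 5.0
    if temp_c < cold_limit:
        return "bad"
    return "good"
```

Under this reading, a dry day at 7 °C is classified as good, whereas a humid day at the same temperature is classified as bad.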

Matching the Counters
We observed 8 different road segments with a single ILC and at least 1 Telraam counter. Some of the observed counters measure the traffic data in both directions. We matched only the subsets of data that describe the traffic counts measured in the same direction (see Table 1). We labelled the data describing the traffic in the primary directions with a postfix -1 and the data describing the traffic in the secondary direction with a postfix -2. In the context of the Telraam counter, the primary direction is defined by the side of the road on which the counter is located. Even though counting the traffic on the same side of the road should reflect higher accuracy, this is not always the case. The accuracy of the counter is also strongly affected by the local configuration of the counter (e.g., orientation of the camera). In the context of ILCs, the accuracy between the primary and the secondary direction should be the same, since inductive loops are installed on both sides of the road. The observed segments together with their labels and corresponding counters are presented in Table 1.

Additional Features
We supplemented the count data obtained from ILCs and Telraam counters (see Table 1) with additional features. Namely, we additionally observed the time, the type of day (weekend or workday), and the weather (good or bad) when each measurement was obtained. We examined whether including these features increases the prediction accuracy for the ILC data. Moreover, we examined the prediction of ILC values based on these features only (i.e., without Telraam counters) on the selected segments.
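As an illustration of how such basic features might be encoded for a regression model, the following sketch derives them from a timestamp and a weather label. The function name and the numeric encoding (hour as an integer, binary flags for weekend and bad weather) are assumptions made for this example; public holidays are not handled.

```python
from datetime import datetime


def basic_features(timestamp, weather):
    """Return [hour, is_weekend, is_bad_weather] for one hourly measurement."""
    # Monday is 0 and Sunday is 6, so 5 and 6 mark the weekend.
    is_weekend = 1 if timestamp.weekday() >= 5 else 0
    is_bad_weather = 1 if weather == "bad" else 0
    return [timestamp.hour, is_weekend, is_bad_weather]


# A Saturday morning in good weather and a Monday afternoon in bad weather.
print(basic_features(datetime(2021, 4, 24, 8), "good"))
print(basic_features(datetime(2021, 4, 26, 17), "bad"))
```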

Regression of Inductive Loop Counter Data
We tested the accuracy of different regression models in predicting the ILC data (labels) from different features. More precisely, we applied kernel ridge [52], support vector [53], random forest [54], Gaussian process [55], Bayesian ridge [56], k-nearest neighbors [57], elastic-net [58], LASSO (Least Absolute Shrinkage and Selection Operator) [59], AdaBoost [60], bootstrap aggregating (bagging) [61], and gradient boosting regression [62] in our analysis. All regression models were trained on 70% of the data and tested on the remaining 30% of the data. Hyperparameter tuning for each model was performed using the grid search cross-validation as implemented in the scikit-learn Python library [63]. The prediction accuracy of each model was evaluated with the coefficient of determination ($R^2$), which describes the proportion of predictable variation of the observed label (ILC) from the features (Telraam counters, hour of the day, type of day (workday or weekend), and weather).
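The procedure described above (70/30 split, grid search cross-validation on the training part, $R^2$ on the testing part) can be sketched with scikit-learn for a single model as follows. This is a minimal sketch, not the code used in the analysis: the hyperparameter grid, the number of folds, the random seed, and the helper name `fit_and_score` are assumptions, and the data are synthetic values generated only to make the example runnable.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_and_score(X, y, seed=0):
    """Tune a kernel ridge model on 70% of the data, score R^2 on the rest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # Hypothetical grid; the grids used in the analysis are not given here.
    grid = {"alpha": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
    search = GridSearchCV(KernelRidge(), grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    # GridSearchCV refits the best model on the whole training set.
    return search.best_estimator_, r2_score(y_test, search.predict(X_test))


# Synthetic stand-in data: a Telraam-like count and the hour of the day.
rng = np.random.default_rng(0)
telraam = rng.uniform(0, 500, size=200)
hour = rng.integers(6, 20, size=200)
X = np.column_stack([telraam, hour])
y = 1.2 * telraam + 5 * hour + rng.normal(0, 10, size=200)

model, r2 = fit_and_score(X, y)
print(model.get_params()["kernel"], round(r2, 3))
```

The same helper could be repeated for each of the listed regressors by swapping the estimator and its grid, keeping the train/test split fixed so that the resulting $R^2$ values remain comparable.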

Telraam Counters Positively Correlate with Inductive Loop Counters
Using the matching from Table 1, we analysed the correlations and the normalised mean absolute error (NMAE) between the ILC and Telraam counters on each segment (see Figure 3). Overall, the correlation coefficients indicate strong positive correlations for most of the segments. However, the data exhibit heteroscedasticity, meaning the variance is not constant: it increases with the growing vehicle count. On the other hand, low counts (near zero) by some Telraam counters (e.g., those on Dunajska segments) correspond to relatively large values (above 100) measured by ILCs. This indicates that the Telraam sensors can miss some vehicles in low traffic. For these reasons, to improve the prediction accuracy, we used advanced regression models instead of simple linear regression.
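The two measures can be computed as in the sketch below. The normalisation used for the NMAE is not spelled out in the text, so dividing the mean absolute error by the mean ILC count is an assumption of this example, as are the function names and the made-up counts.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def nmae(reference, estimate):
    """Mean absolute error divided by the mean reference count (assumed)."""
    mae = sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)
    return mae / (sum(reference) / len(reference))


# Made-up hourly counts: the second counter consistently sees 80% of vehicles.
ilc = [100, 200, 300, 400]
telraam = [80, 160, 240, 320]
print(pearson(ilc, telraam), nmae(ilc, telraam))
```

In this made-up case the error is sizeable (an NMAE of 0.2) while the correlation is perfect, which illustrates why a consistently undercounting sensor can still be a useful predictor.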

Prediction Accuracy Increases with the Number of Features
Secondly, we analysed the coefficient of determination ($R^2$) values in dependence on the included features (see Section 3.4). We ran the regression process using the selected regression models in combination with grid search cross-validation to perform the hyperparameter tuning on the training dataset (70% of the data). Furthermore, we identified the best model based on the testing dataset (remaining 30% of the data) for each segment and for each feature set. The results of this analysis are presented in Figure 4. These indicate that selecting all the available features considerably increases the coefficient of determination in the great majority of the cases. However, in most cases, a single (more accurate) Telraam counter is able to predict the state of the ILC accurately in combination with basic features (namely, time of day, type of day, and weather conditions). Overall, basic features should be included in the regression model since they substantially increase the $R^2$ values. Moreover, their collection does not present any additional costs.

Optimal Regression Models Are Consistent through Different Segments
Finally, we identified the best regression model for a given segment using a similar procedure as described in Section 4.2. For each regression model and for each segment, we identified the feature set that yielded the best $R^2$ score on the testing dataset. The best performing models for each segment are presented in Table 2. All possible features were selected in all of the cases. However, an additional (less reliable) Telraam counter did not have a significant contribution to the $R^2$ increase in the majority of the cases (see Figure 4). Even though the kernel ridge regression model performed the best on the majority of the segments, random forest regression, bagging, and gradient boost regression had very similar scores in most cases (see Figure 5).
Figure 4. Dependence of $R^2$ of a testing dataset on the feature set included in a regression model. Each subplot presents the results obtained on a selected road segment. The feature set denoted as basic presents the basic features, which incorporate time of day, weather conditions (good or bad), and type of day (workday or weekend). Other features describe the data obtained from Telraam counters. The results obtained with the best regression model for a given scenario are presented. Different colours indicate different feature sets.
A more accurate description of the best performing models together with their parameters is available at https://github.com/SusTra/TraCo/blob/master/best_models.pdf (accessed on 25 November 2021). We also provide these models in a pickle format to allow an interested user to reuse them. A brief description of how to load these models is provided in a README file of the repository supplementing this paper (please see https://github.com/SusTra/TraCo#readme, accessed on 25 November 2021).

Discussion and Conclusions
Traffic monitoring is still mostly based on conventional traffic sensors governed by traffic agencies. These are frequent on major roadways, but since their installation is difficult and costly, they are not present at all locations where vehicle counting is desired. Our goal in this research was to predict the vehicle counts based on data that are collected at low cost and are easily accessible, while at the same time fostering the engagement of wider audiences in data collection and involving them in the scientific process through monitoring protocols. We developed a methodology to accurately predict the vehicle count data with inexpensive and less reliable sources, namely, Telraam counters combined with weather and time data. We showed that in places where reliable sensors are absent, they can be substituted with simple and low-cost, although possibly less precise, sensor systems.
A Telraam counter offers a simple device that can be used by citizens to monitor the traffic for different types of vehicles as well as pedestrians. However, these measurements can be inaccurate due to a variety of reasons, e.g., bad placement of the camera or poor outdoor visibility. Some of these factors also affect the traffic conditions and were included in our models. We used regression analysis to evaluate relations between the observed variables and to increase the accuracy of an individual Telraam counter. Data gathered by reliable ILCs were taken as a reference point. Namely, the number of vehicles detected by an ILC in a given timeframe (i.e., within an hour) was regarded as the dependent variable. Initially, the independent variables were the measurements by Telraam devices at approximately the same location as the ILCs. To improve the prediction accuracy, we added further features. These include the weather conditions (good or bad), the time of the day, and the type of the day (workday or weekend) when each measurement was collected. To avoid relying on the specificity of a single micro-location, we included eight different road segments with an installed ILC and one or more Telraam counters.
For each analysed segment, we established several regression models which predicted the values of the dependent variable based on independent variables. We tested how well each model fits the observed data by calculating the respective coefficients of determination ($R^2$). In most cases, all models performed well and produced similar $R^2$ values (see Figure 5). This can be seen especially in segments Ižanska (from centre) and Slovenska (from centre), where all 11 models presented $R^2$ values above 0.8, indicating a good fit. On the other hand, the models for segment Dunajska (to centre) showed substantially distinct relative performance. Overall, the best model with the highest $R^2$ was consistently the kernel ridge regression, winning in all cases but one (see Table 2). However, as evident from Figure 5, random forest regression, bagging, and gradient boost regression had very similar scores to kernel ridge regression in most cases.
Kernel ridge regression combines ridge regression with the kernel trick. It learns a linear function in the space induced by the kernel and the data. If the kernel is not linear, this corresponds to a nonlinear function in the original space. Kernel ridge regression can reduce variance by shrinking parameter estimates, which makes it less susceptible to overfitting. It increases prediction accuracy even when dealing with noisy data. This is also evident in our study, where kernel ridge regression consistently yields the best results. Boosting algorithms train several models in sequence, each trying to improve the performance of its predecessor. Gradient boosting solves an optimisation problem, reducing a loss function by adding a weak learner at each step. Random forest consists of several simpler decision trees (weak learners), combining their results to make a better prediction. Each tree operates on a different subset of training data, so it observes different patterns. The combination of trees produces more robust predictions and often yields better results than linear regression, which is evident in our case. Bagging regression is another ensemble learning method that fits regressors on different random subsets of the training data and then aggregates their predictions. The techniques presented above help to reduce overfitting and gave good results in our study (see Figure 5). They also handle the collinear data well, e.g., a linear relation between two independent variables corresponding to counts by two Telraam devices. Unlike linear regression, the models that we used are nonparametric, i.e., they do not assume anything about the underlying distribution of the data. This is useful in vehicle count data since we cannot expect it to be normally distributed.
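For reference, the standard textbook formulation of kernel ridge regression can be written as follows. The symbols used here (feature map $\phi$, kernel $k$, kernel matrix $K$, regularisation strength $\lambda$, dual coefficients $\alpha$) are introduced only for this illustration and are not taken from the analysis above.

$$
\min_{w} \; \sum_{i=1}^{n} \bigl(y_i - w^{\top}\phi(x_i)\bigr)^2 + \lambda \lVert w \rVert^2,
\qquad
\hat{f}(x) = \sum_{i=1}^{n} \alpha_i \, k(x, x_i),
\qquad
\alpha = (K + \lambda I)^{-1} y,
\quad K_{ij} = k(x_i, x_j).
$$

The penalty term $\lambda \lVert w \rVert^2$ is what shrinks the parameter estimates, and a nonlinear kernel $k$ makes $\hat{f}$ nonlinear in the original features even though it is linear in $\phi(x)$.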
Additionally, we examined which independent variables contribute most to the goodness of model fit, i.e., how the inclusion of different variables increases $R^2$. The highest $R^2$ value was always achieved by considering all features (all Telraam counters, weather conditions, time of day, and type of day), as shown in Table 2. However, the impact of individual features varied drastically in some road segments (see Figure 4). In the Slovenska (from centre) segment, the Telraam counter 0619-1 alone sufficed for accurate prediction. In other cases (e.g., Slovenska (to centre) and Škofije (towards Trieste)), adding further features to the Telraam counter significantly raised the $R^2$. Thus, our results showed that prediction by the less reliable Telraam counter can be substantially improved by including extra features relevant to traffic conditions. In addition, the analysis singled out the unreliable Telraam counters that do not contribute much to the model's performance. For example, counter 0820-1 in segment Ižanska (from centre) predicted almost as well as both counters 0820-1 and 1506-1 or all features, but considering only counter 1506-1 and excluding other variables yielded much lower values of $R^2$. Moreover, even in segments where each Telraam counter is highly unreliable, e.g., Dunajska segments, combining all these counters with basic features immensely increased the $R^2$.
Our analysis identified distinct characteristics of models relative to different road segments and various included features. Some of them were "well-behaved", e.g., the Telraam counter 0619-1 in segment Slovenska (from centre) and counter 0820-1 in segment Ižanska (from centre). Both counters were sufficient to achieve good model fit with high $R^2$, attaining it even without considering other features (see Figure 4). Additionally, this was accomplished by all tested regression models (see Figure 5). This indicates that some Telraam counters are very reliable, i.e., they correlate highly with ILCs, as can be observed in Figure 3. The plots of counters 0820-1 and 0619-1 show high values of both Pearson's and Spearman's correlation coefficients and also homoscedastic behaviour. Telraam counters with these characteristics can be used to substitute ILCs with little or no loss in accuracy. Conversely, a few other counters exhibited poor performance, e.g., the aforementioned Telraam counters on the Dunajska segments. The regression models for the Dunajska (to centre) segment produced a wide range of $R^2$ values, yet even the best model achieved a relatively low score. Using either counter 0656-2 or 0655-1 alone resulted in extremely low $R^2$; thus, they had to be coupled with other features to yield useful predictions. Low correlation is noticed in the corresponding plots in Figure 3, showing low Pearson's and Spearman's correlation coefficients and high heteroscedasticity. We also observed the normalised mean absolute error (NMAE) of each Telraam counter with respect to the observed ILC counter. This can be interpreted as a relative accuracy of a Telraam counter. However, the detection accuracy should be analysed together with correlation coefficients. Lower but consistent values of detection accuracy might produce better predictions than higher but inconsistent detection accuracy values.
The segment Dunajska (to centre) has a significantly higher detection accuracy than the segment Škofije (towards Koper). However, in the latter case, Telraam measurements are much better correlated with ILC measurements than in the former, and thus a more accurate prediction can be obtained. The reduced detection accuracy on the Škofije (towards Koper) segment is probably due to the higher vehicle speeds there, as the segment is located on a highway exit. However, these errors seem to be consistent throughout the experiment and do not reduce the predictive power of the observed Telraam counter.
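The distinction between detection accuracy and correlation can be made concrete with a small synthetic sketch. It assumes that NMAE is the mean absolute error divided by the mean reference (ILC) count, which is one common normalisation and may differ from the one used in the study. Both counters below are hypothetical: counter A undercounts consistently, while counter B is unbiased but noisy.

```python
import math
import random

def nmae(pred, ref):
    """Mean absolute error normalised by the mean reference count (assumed definition)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / sum(ref)

def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)
ilc = [rng.uniform(200, 1000) for _ in range(300)]  # reference hourly counts

# Counter A: misses about 40% of vehicles, but does so consistently.
counter_a = [0.6 * c + rng.gauss(0, 10) for c in ilc]
# Counter B: correct on average, but with large random errors.
counter_b = [c + rng.gauss(0, 150) for c in ilc]

nmae_a, nmae_b = nmae(counter_a, ilc), nmae(counter_b, ilc)
r_a, r_b = pearson(counter_a, ilc), pearson(counter_b, ilc)
print(f"A: NMAE = {nmae_a:.2f}, Pearson r = {r_a:.3f}")
print(f"B: NMAE = {nmae_b:.2f}, Pearson r = {r_b:.3f}")
```

In this construction, counter A has the larger NMAE (lower detection accuracy) yet the higher correlation with the reference, so a regression model can correct its systematic undercount and predict the reference counts more accurately than it could from counter B.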
Although there is a growing belief that data collected by citizens promises to transform how we live, move, work, and think, there is also a growing demand to establish straightforward procedures to validate these data. Namely, realising the potential of citizen science relies not only on the ability to extract information and interpret massive data by applying data analytics and machine learning, and on the ability to provide data-driven decisions and predictions [64], but also on the constant validation of the obtained data. Telraam data include inaccuracies due to different factors. These inaccuracies are also recognised and documented by the Telraam developers [15]. They relate to the positioning of the camera, the viewing angle of the camera, window transparency, etc. However, they also relate to road characteristics such as the number of lanes, road levelling, and the vicinity of traffic lights or crossroads, which can cause queuing at the counting point and thus poor vehicle recognition. Many of these challenges can be addressed by a suitable installation of the counters and by providing information about the street profile. The accuracy of car detection is currently estimated to be at least 85% in most scenarios if the device placement follows the basic requirements. The quality of the collected data and the potential to detect errors thus also depend on the end-user. The question, however, is how much support and stimulation users receive from the scientific community to set up the devices accurately, to track the data consciously, and to intervene when deviations are detected. In this regard, Telraam encourages citizen users to provide additional information about their street profiles and to follow the ranking and benchmarking of their street segments.
If a particular street has to absorb significantly more car traffic than other similar streets, then residents have an extra objective argument to enter into a dialogue with the local government and to engage in the planning process. This can also be considered a step towards more verified data. Nevertheless, there is also a declared objective to extend the Telraam system with automatic solutions that filter out bad data and to advance the extrapolation techniques used to estimate the typical traffic and relative traffic counts. In the near future, the Telraam sensor version 2 is planned to be released with redesigned hardware and software and an overhauled detection algorithm adopting artificial intelligence for increased accuracy of the traffic count data. As estimated by the developers, this would additionally improve the accuracy of detection and classification of different objects. Currently, the more evident difficulties in object recognition are associated with distinguishing between cars and large vehicles and between bikes and pedestrians. Since in our study we only observe the vehicle counts and do not distinguish between the vehicle categories, this should not affect the observed accuracy. Another aspect to note is the visibility issues related to sunlight and weather conditions. During the night hours, the camera is not active. Since the counts are rescaled and reported for each full hour, the first and the last hour of counting in a day can have a very low up-time. For example, if the camera activates near sunrise at 6:50 in the morning, only a small portion of this full hour is available for counting, and consequently, the rescaled data for this hour have a relatively large uncertainty. This can be managed by removing the count data for hours with camera up-times lower than some threshold, as was done in this study.
However, if the up-time of the first and the last hour of daylight is nonetheless adequate even though the light conditions are poor, the low visibility can still affect the obtained results. The same can happen in extreme weather conditions, such as heavy showers or extremely dense fog. The described situations are either rare or well expected (recurrent) and thus can be managed by removing such data. Moreover, the Telraam community is planning to develop and apply a sensor that will be capable of traffic counting even in dark conditions, which would resolve the above issues.
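The up-time filtering step can be sketched as follows. The record layout (`hour`, `raw_count`, `uptime`), the threshold value, and the rescaling rule (raw count divided by the fraction of the hour the camera was active) are assumptions made for illustration; they are not taken from the Telraam data format or from the threshold applied in this study.

```python
MIN_UPTIME = 0.5  # hypothetical threshold: camera active for at least half the hour

# Hypothetical hourly records; uptime is the fraction of the hour spent counting.
records = [
    {"hour": "06:00", "raw_count": 9, "uptime": 0.17},   # camera activated at about 6:50
    {"hour": "07:00", "raw_count": 300, "uptime": 0.75},
    {"hour": "12:00", "raw_count": 250, "uptime": 0.5},
    {"hour": "20:00", "raw_count": 20, "uptime": 0.2},   # camera switched off at dusk
]

def rescale_and_filter(records, min_uptime=MIN_UPTIME):
    """Drop hours with low up-time and rescale the remaining counts to a full hour."""
    kept = []
    for rec in records:
        if rec["uptime"] < min_uptime:
            continue  # too little observation time: rescaled value would be unreliable
        kept.append({"hour": rec["hour"], "count": rec["raw_count"] / rec["uptime"]})
    return kept

kept = rescale_and_filter(records)
for rec in kept:
    print(rec["hour"], round(rec["count"]))
```

The lower the up-time, the larger the factor by which a small raw count is multiplied, which is why hours with short observation windows carry a relatively large uncertainty and are discarded here rather than rescaled.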
One of the contributions of this paper is the introduction of a methodology that can be used to validate emergent, less reliable data sources (i.e., Telraam counters) against a well-established platform (i.e., ILCs). Using the proposed methodology, we identified anomalies in the data and sought their causes by gathering information on the actual positions of the cameras and the characteristics of each road section, in order to gain a wider understanding of the traffic records collected by Telraam sensors.
Another issue regarding the open data used in this study is finding the reasons for possible interruptions of counting for shorter or longer periods, which may be due to technical problems or to the subjective decision of a participating citizen to start or stop using the sensor and contributing data. While the location and operation of ILCs are constant and provide frequent and regular data, the number of Telraam sensors, their locations, and their periods of operation cannot be well predicted. Broader use of these sensors by citizens would to some extent solve this problem, as possible interruptions could be compensated for by redundant sensors in the direct vicinity. At the time of this study, the distribution of Telraam counters was still relatively sparse, which limited the possibilities of including more sensors per location (road segment) in the study. Furthermore, matching the ILCs and the Telraam sensors required some compromises regarding the sensors' vicinity and location. We were not always able to find a micro-location with fully overlapping matches. This might not represent a large drawback when the road segment is homogeneous and not subject to major inflows or outflows between two or more counters. However, in the case of the Dunajska (to centre) segment, we recognised a significant inflow/outflow between the ILC (1004-136-1) and Telraam (0655-1) counters, which might be the cause of the poor correlation between them.
Overall, the reported regression analysis showed that conventional traffic monitoring systems can to a large degree be substituted with emergent, affordable, and citizen-based distributed solutions. We demonstrated this in a case study with ILCs as the current platform and Telraam counters as a potential future platform. One of the main benefits of the latter is its scalability. Namely, the number of Telraam counters on a given (relevant) road segment can be easily increased by simply engaging the general public. This also enhances the participatory role of the citizens and their concern for their living spaces. In our opinion, increasing the number of Telraam counters to achieve a certain degree of redundancy could match or even surpass the accuracy of conventional traffic monitoring infrastructure. Using a large number of less reliable sensors to improve the accuracy of obtained data has already been successfully employed in different engineering disciplines [65]. An example vividly illustrating the concept of improving measurement accuracy by redundancy was reported by Weiss et al., who presented a highly accurate clock implemented by integrating data obtained from a set of inexpensive and imprecise watches [66]. In our case, we were able to obtain relatively accurate predictions of count data by employing one or two Telraam counters. We believe that increasing their numbers would additionally improve the prediction accuracy and would thus provide a reliable infrastructure for traffic monitoring. In the near future, this infrastructure could also be supplemented with alternative citizen-engaged projects and initiatives directed towards planning and maintaining sustainable cities.
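The redundancy argument can be illustrated with a short simulation. It is a sketch under a strong simplifying assumption: every sensor reports the true count plus independent, zero-mean Gaussian noise. Real Telraam errors may be systematic or correlated between neighbouring devices, in which case plain averaging would help less. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def mean_abs_error(n_sensors, true_count=500.0, noise_sd=100.0, trials=2000, seed=0):
    """Average absolute error of the mean reading of n_sensors noisy counters."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each sensor observes the same hour with its own independent error.
        readings = [rng.gauss(true_count, noise_sd) for _ in range(n_sensors)]
        errors.append(abs(statistics.fmean(readings) - true_count))
    return statistics.fmean(errors)

e1 = mean_abs_error(1)
e4 = mean_abs_error(4)
e16 = mean_abs_error(16)
print(f"1 sensor:   {e1:.1f}")
print(f"4 sensors:  {e4:.1f}")
print(f"16 sensors: {e16:.1f}")
```

Under these assumptions the error of the averaged reading shrinks roughly with the square root of the number of sensors, so quadrupling the number of counters approximately halves the error.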