Private Car O-D Flow Estimation Based on Automated Vehicle Monitoring Data: Theoretical Issues and Empirical Evidence

Data on the daily activity of private cars form the basis of many studies in the field of transportation engineering. In the past, in order to obtain such data, a large number of collection techniques based on travel diaries and driver interviews were used. Telematics applied to vehicles and to a broad range of economic activities has opened up new opportunities for transportation engineers, allowing a significant increase in the volume and detail level of data collected. One of the options for obtaining information on the daily activity of private cars now consists of processing data from automated vehicle monitoring (AVM). Therefore, in this context, and in order to explore the opportunity offered by telematics, this paper presents a methodology for obtaining origin–destination flows through basic information extracted from AVM/floating car data (FCD). The benefits of such a procedure are then evaluated through its implementation in a real test case, i.e., the Veneto region in northern Italy, where full-day AVM/FCD data were available with about 30,000 vehicles surveyed and more than 388,000 trips identified. The goodness of the proposed methodology for O-D flow estimation is validated through assignment to the road network and comparison with traffic count data. Taking into account aspects of vehicle-sampling observations, this paper also points out issues related to sample representativeness, both in terms of daily activities and spatial coverage. A preliminary descriptive analysis of the O-D flows was carried out, and the analysis of the revealed trip patterns is presented.

Typically, to estimate current demand flows, surveys can be carried out, usually by interviewing a sample of users (direct estimation), and demand can be derived using results from sampling theory. Estimation of current origin-destination demand flows can be improved by combining the estimators with aggregate information related to O-D demand flows [20], e.g., traffic counts (counts of user flows on some elements, or links, of the transportation supply system, i.e., the transportation network). Alternatively, demand (present and future) can be obtained through a system of models. While the former provides current matrices that, when assigned to the network, reproduce the flows closest to traffic counts, the latter allows O-D flows to be linked with land use, socio-economic factors, and level-of-service attributes. Therefore, the effects of changes in such factors in future scenarios can be assessed.
As summarised below, O-D flow estimation using traffic counts is the focus of great interest due to the amount of data coming from the transportation network [21]. Indeed, there are currently issues in capturing door-to-door travel characteristics using traditional methods. Avenues of progress are being opened up by the development of new technologies to automate and facilitate disaggregate data collection, such as GPS devices, cellular-network positioning from call and activity data and smartphone sensors, point-to-point detection sensors [22], licence plate recognition [23], Bluetooth [24], and radio-frequency identification [25]. The opportunity offered by telematics and the penetration of tracing/tracking services on car movements (i.e., floating car data, FCD), which allow continuous information in time and space to be obtained, gave rise to the prime objective of this paper: to investigate urban private transport through automated vehicle monitoring (AVM/FCD) data in order to identify trip characteristics and infer origin-destination demand flows from cars (private vehicles).
The development of telematics during the last 30 years has significantly enhanced research possibilities in the field of transportation, also impacting on traditional survey methods. Classical survey methods for collecting travel data have been adapted to the digital era. For instance, travel diary-based approaches have been transformed into electronic travel diaries with global positioning systems (GPS), thereby speeding up data transfer from users to research groups and providing details for a more in-depth understanding of travel patterns [26][27][28][29][30]. Innovation has also driven transport system modelling: reverse assignment procedures have been developed to update network and demand model parameters [11,18]. In particular, in transit modelling, smart-card data were used to estimate the origin-destination matrix as well as to calibrate and validate assignment models [17][31][32][33][34][35].
By the same token, freight demand modelling benefits from telematics, especially in revealing empirical evidence [36][37][38][39] and in modelling delivery tours [40][41][42][43][44]. Of course, the challenge is to obtain data from private freight operators to measure and evaluate the market. This is non-trivial, given that operators avoid sharing data, as they are aware that they risk losing competitive advantage, and given that such technology is usually used for supporting fleet management and insurance issues [42,45,46]. The collected data may refer only to vehicle location tracked second by second, allowing the spatial and temporal features of the tours made to be assessed.
While automated vehicle monitoring (AVM) data (including GPS) for vehicles are increasingly available due to the deployment of telematics in vehicles with the aim of providing further services [47], as well as for insurance purposes, methods to process such data with the objective of analysing and modelling trip chains (i.e., sequence of daily trips for performing daily activities) have not been fully explored with regard to the implication of underlying origin-destination demand flows by car.
To investigate origin-destination flows, the travel patterns followed by private vehicles (cars) have to be identified, as summarised in Section 2. Trip chains are an important element of vehicle movements in an urban or metropolitan setting for estimating origin-destination flows that, once assigned to the network, allow link flows to be obtained and traffic impact to be estimated. Thus, as stated above, the main objective of the paper is to study the chief characteristics of private vehicles (cars), identifying their trip patterns and their daily activity, as well as using the information collected to estimate an origin-destination matrix to be used for analysing transport systems. In particular, the sample O-D matrix is extracted from AVM/FCD data, and subsequently, its representativeness is evaluated, as discussed in Section 4. In fact, the devices from which transportation data can be obtained concern a portion of a population whose representativeness is not always easily available to researchers/technicians (e.g., private cars equipped with on-board units to reduce insurance costs [48]). Secondly, the statistical significance of the estimates obtained [49][50][51][52], as well as their goodness in reproducing vehicle link flows, should be pointed out. Thus, the opportunity offered by AVM data is explored. The data source consisted of a sample of about 30,000 cars (388,000 trips) travelling on five working days within the Veneto region (Northern Italy) in October-November 2018.
The paper is organised as follows. Section 2 outlines the literature on the usage of AVM/FCD for route analysis and simulation as well as on O-D estimation, while Section 3 presents the methodology and its implementation in a real test case. Section 4 summarises our results, and some conclusions and further developments are discussed in Section 5.

Literature Review on FCD Usage
An emerging research challenge is to use FCD to describe users' mobility behaviour as well as route choices. Therefore, the literature on the use of FCD for simulating route choice is reviewed below, followed by an examination of the procedures developed for estimating origin-destination demand flows.

Path Choice and Route Attribute Evaluation
In general, AVM/FCD contain three different levels of data: basic information (vehicle ID, location, time attributes and driving direction); vehicle information, which includes a vehicle's status data (gear, brakes, etc.); and extended information that provides video frames [53]. To assess travel demand patterns, the first-level data, i.e., basic information, are commonly used. This dataset is collected based on GPS systems, which allow definition of the space-time features of the trips undertaken. Along the same lines, Ehmke et al. [54], Rahmani et al. [55,56], and Tu et al. [57] studied the time attributes of trips using FCD. Tu et al. [57] proposed a new approach to investigate the time-varying shortest path. The approach reconstructs traffic conditions, from discrete space-time points to traffic flows, by using a map-matching procedure. This allows the paths used by vehicles to be tracked, together with their speeds. More than 11,000 taxi-based floating car data in Wuhan (China) showed that the shortest paths of the same O-D pair change with the spatio-temporally varying traffic state.
Ehmke et al. [54] showed that FCD can be a very useful source for describing time-dependent travel times in a network. They used FCD from a taxi service in Stuttgart (Germany) and identified 100 O-D pairs. Different levels of aggregation in determining time-dependent travel times from a database of historical FCD are presented and evaluated with regard to routing quality.
Rahmani et al. [55] developed a non-parametric method to estimate route travel time distribution using low-frequency floating car data. Considering the problem of low polling frequency for FCD, Rahmani et al. [55] proposed a new method for reconstructing the vehicles' travelled paths and merging them into entire routes. Subsequently, given that estimating urban network link travel times from sparse FCD usually requires preprocessing (mainly map-matching and path inference to find the most likely vehicle paths), these tasks are dealt with jointly; in fact, paths have to be consistent with reported locations. Path inference requires a priori assumptions about link travel times that can be unrealistic and biased, causing issues in shortest path identification. Therefore, Rahmani et al. [56] developed a combined procedure for path inference and travel time estimation using FCD from taxis in Stockholm (Sweden).
Dewulf et al. [58] studied FCD-based trips made by 400,000 vehicles between 6651 traffic zones of Flanders (Belgium). They traced the travel times for every monitored vehicle, which were used to obtain an aggregated model of generic travel time for peak and off-peak periods for the complete road network of the case study. Thus, using FCD, they revealed the commuting patterns for inter-city and intra-city trips and detected the congested directions of the study area, thereby contributing to more precise prediction of travel times compared with free flow-based models.
Yamamoto et al. [59] and Cao et al. [60] estimated link flows using observed link speeds from FCD, while Croce et al. [61] focused on the use of FCD to define perceived alternative paths (choice set) and the choice of one path in the path choice set. Stipancic et al. [62] examined vehicle manoeuvres using GPS data from smartphones of vehicle drivers and explored their potential as surrogate safety measures through correlation with historical collision frequency and severity across different facility types.

O-D Demand Flow Estimation
FCD are also considered as the basis for estimating O-D matrices [16,39,58][63][64][65][66][67][68]. Carrese et al. [64] proposed a dynamic O-D matrix estimation approach using FCD. Based on a district road network in Rome, they monitored 12 O-D flows with FCD to assess route choice probabilities and compare them with travel demand generated for 38,000 vehicles in one hour. The results show the great potential of FCD for the dynamic demand estimation problem, allowing enhancements in the accuracy of O-D travel times and route choice probability reproduction. Moreover, the authors revealed the greater importance of FCD than information received from traffic counts, although they used a limited number of monitored vehicles and a small study area (only 54 traffic zones and 400 regular nodes).
Another step to justify the benefits of FCD usage was made by Yang et al. [65], who developed an O-D flow model based on sampled GPS positions of probe vehicles. Using a map-matching procedure, they described the sampled O-D flows and, using the generalized least squares method, defined the actual travel demand. The traffic count data were taken as the basis for O-D demand updating. Sbaï et al. [66] argued for the necessity of fusing data from several telematics sources such as FCD (taxis with GPS detectors), loop detectors, and smartphones. Among the sources studied, Sbaï et al. [66] judged FCD to be more reliable for travel demand assessment but noted the problem of the small share of taxis in total traffic. This resulted in difficulties in extrapolating the data to the whole population. Sbaï et al. [66] also revealed the differences between taxi drivers' behaviour and that of regular car drivers, such as higher travel speed and more frequent stops to enable passengers to board or alight.
Guo et al. [69] focused on estimation of origin-destination trips and proposed an approach to discovering and understanding the spatio-temporal patterns of movements. They used a large dataset from Shenzhen (China) to test and validate the proposed methodology. Tang et al. [70] focused on taxi trips extracted from GPS data. The travel distance, time, and average speed in occupied and non-occupied status were then used to investigate human mobility. They estimated the O-D matrix of the inner area of Harbin city and modelled the traffic distribution patterns based on the entropy-maximizing method. Nuzzolo et al. [39] used sampled FCD on 310 taxis in Rome to assess a list of space-time attributes, revealing the behaviour of taxi trips. Among the main findings of the study, the authors identified demand peaks on Monday morning and Saturday night as well as a very small number of requests for service on Sunday. Analysis of the spatial features of the trips made revealed a high-density O-D flow distribution within the Inner Railway Circle (i.e., city centre) and that Fiumicino airport was a huge attractor. The distribution of trip distance showed the prevalence of O-D flows undertaken in the interval from 2 to 5 km. Thus, the basic space-time features of the taxi service in Rome were identified, and the benefits of using autonomous vehicle services to provide an on-demand service in the future were investigated.
Vogt et al. [67] and Dabbas et al. [68] implemented the O-D matrix calculation according to a doubly constrained gravity model, using FCD to formalise the impedance function. Tracking single vehicle trajectories and defining the number of turns made according to FCD allowed Vogt et al. [67] and Dabbas et al. [68] to form the values of the impedance function for every O-D pair in the matrix. Thus, by tracking route choice along with zone capacity estimation based on FCD, the authors of both studies implemented the O-D flow calculation. The data were calibrated with information obtained from traffic counts according to an information minimization model. The results presented in both studies indicate an increase in computation accuracy if FCD are used for travel demand assessment. Finally, Mitra et al. [16] developed a methodology for obtaining demand matrices without any prior information but starting from FCD including information on vehicle trajectories. The procedure was successfully tested in Turin (Italy).
In summary, it may be concluded that FCD/AVM have been successfully used both for monitoring the paths used by each vehicle and for obtaining link flows (microscopic level) as well as for large-scale simulation of transportation systems (macroscopic level). However, further research efforts are required to go beyond such preliminary results, especially to obtain timely O-D demand flows. Therefore, a methodology for obtaining present O-D demand flows by car is presented below.

Methodology
To identify private vehicle (car) O-D matrices from AVM/FCD samples, a procedure was developed and tested. Its main feature is that it only uses raw GPS data to detect user activity stops, thus estimating the speed, distance travelled, and status of the engine. Such data, as discussed in Section 2, are anonymised to avoid privacy issues and have been used for investigating travel patterns; however, only a few studies have pointed out the opportunity to use them to identify trips and subsequently to estimate O-D flows. It therefore emerges that further work is needed to investigate technical challenges as well as theoretical issues related to sampling.
The proposed methodology consists of three main steps depicted in Figure 1: car trip detection, sample O-D matrix, and O-D matrix. The first two steps are preparatory to the last, which, instead, represents the focus of this research. Therefore, a description of each step is given below.

Consider a region divided into a set of zones in which the monitored vehicles drive. Once the study area is identified and zoning is performed, for each survey day, the procedure for the daily O-D demand flow estimation consists of the following steps (Figure 1):

1. Car trip detection: according to some predefined rules, this stage aims to detect the activity stops performed by each sampled vehicle; therefore, the individual trips (with origin and destination) undertaken by each surveyed vehicle can be obtained;
2. Sample O-D matrices: according to study area zoning, the origin and destination of each sample vehicle trip are identified; then, through an aggregation procedure, the sampled O-D vehicle trips are merged in order to obtain the daily (or time-dependent) O-D matrix;
3. Expansion to the universe of investigation: in this step, the sample daily (or time-dependent) O-D matrices need to be expanded to the universe of observation in order to obtain the daily/time-dependent vehicle O-D matrices of the study area. This step can be considered the core of the procedure, given that the statistical significance of the sample needs to be determined.

Car Trip Detection
According to the purpose of the procedure developed (i.e., estimation of the O-D matrices from AVM/FCD), and given that such O-D matrices represent the spatial characterisation of trips made by grouping them by place (zone or centroid) of origin and destination, the first stage is to identify such places of origin and destination. In this context, a trip is defined as the act of moving from one place (origin) to another (destination) in order to carry out one or more activities.
Let (ID, t, s, x, y) be the probe (monitored/surveyed) vehicle datum. It allows vehicle location and status to be identified. ID is the vehicle identification code, t is the time when the datum is obtained, s is the status of the vehicle engine, and x, y are the GPS coordinates, i.e., latitude and longitude. By comparing two consecutive data of a given vehicle, significant changes in vehicle position (i.e., x, y) according to the relevant status s can be detected, and then, the origin and destination of a trip can be identified. A sequence of trips, following each other in such a way that the destination of one trip coincides with the origin of the next, is referred to as a journey or trip chain [17]. Therefore, from the fine-grained AVM/FCD, the activity stops need to be identified.
The procedure proposed in this paper is based on the speed and engine status of a vehicle measured minute by minute. The procedure, shown in Figure 1, evaluates these measures so as to determine whether the vehicle has completely stopped or is moving at a very slow speed. The most significant source of errors in classifying stopping events from vehicle data is observations in which the vehicle has stopped at a bottleneck but, judging by its speed alone, appears parked. By evaluating speed during the previous time interval, as well as the GPS data and engine status, the procedure ensures that only activity stops (e.g., longer than a pre-fixed threshold and far from while-travel intermediate/service sites, such as petrol stations) are classified as such in the result.
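The stop-classification logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Figure 1 itself: the names `Record` and `detect_trips`, the on/off encoding of the engine status s, the time unit (minutes) and the speed tolerance `speed_eps` are all hypothetical, the 20-minute default merely mirrors the example threshold quoted in the test case, and the check against petrol stations and other service sites is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    vehicle_id: str   # ID
    t: float          # time of the datum, in minutes (assumed unit)
    engine_on: bool   # status s of the engine (assumed on/off flag)
    speed: float      # instantaneous speed, km/h
    x: float          # latitude
    y: float          # longitude


def detect_trips(records: List[Record],
                 min_stop_min: float = 20.0,
                 speed_eps: float = 1.0) -> List[Tuple[Record, Record]]:
    """Return (origin, destination) record pairs for one vehicle.

    `records` must be time-ordered. A halt shorter than `min_stop_min`
    (e.g. a queue at a bottleneck) does not end the trip.
    """
    trips = []
    origin = None       # record where the current trip started
    halt_start = None   # first record of the halt in progress, if any
    for r in records:
        moving = r.engine_on and r.speed > speed_eps
        if not moving:
            if halt_start is None:
                halt_start = r
            continue
        if halt_start is not None:
            long_halt = r.t - halt_start.t >= min_stop_min
            if origin is None:
                origin = halt_start          # the log starts with the car parked
            elif long_halt:
                trips.append((origin, halt_start))
                origin = halt_start          # next trip starts where the car was parked
            # otherwise: short halt, the same trip continues
        elif origin is None:
            origin = r                       # the log starts with the car moving
        halt_start = None
    # Assumption: a halt still in progress when the log ends closes the last trip.
    if origin is not None and halt_start is not None:
        trips.append((origin, halt_start))
    return trips
```

Because each new trip starts at the record that closed the previous one, the returned pairs form a trip chain in the sense defined above: the destination of one trip coincides with the origin of the next.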

Sample O-D Matrices
Once the trips belonging to a trip chain have been identified, the next step is the spatial characterization of trips made by grouping them by place (transport zone) of origin and destination. This information (demand flows) can be arranged in tables as O-D matrices, whose rows and columns correspond to the different origin and destination zones, respectively. Then, we can discretize every transport zone as the array of the following coordinates:

$$Z_i = \{(\phi, \lambda)_j\}$$

where $Z_i$ is transport zone $i$, presented as the array of coordinates $(\phi, \lambda)_j$ relative to a generic point $j$ spatially within the zone border.
Hence, having obtained the vehicle datum $(ID, t, s, \phi, \lambda)_j$, classified as origin or destination of a trip according to the earlier step, the location $(\phi, \lambda)_j$ is evaluated in order to identify the relative traffic zone. This means that $(\phi, \lambda)_j$ belongs to $Z_a$ only if

$$(\phi, \lambda)_j \in Z_a$$

Accordingly, zones $Z_j$ can be grouped into larger administrative units, such as municipalities and provinces:

$$\Omega_p = \{Z_j\}$$

where $\Omega_p$ is the total set of transport zones within the administrative district $p$.
Subsequently, the number of trips from origin zone $o$ to destination zone $d$ can be obtained as follows:

$$T_{od} = \sum_j T_{j,od}$$

where $T_{j,od}$ is the generic trip $j$ revealed with origin zone $o$ ($Z_o$) and destination zone $d$ ($Z_d$).
The O-D matrix obtained can be characterised by time slice h (selecting only trips belonging to a given time slice h), vehicle type (selecting only trips undertaken by a given vehicle type), and so on.
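The aggregation step can be illustrated with a short sketch. It assumes, purely for illustration, that each zone border is available as a polygon of (latitude, longitude) vertices and that zone membership is tested by ray casting; the source does not state how membership is evaluated, and the names `point_in_polygon`, `zone_of` and `sample_od_matrix` are hypothetical.

```python
from collections import Counter


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; `polygon` is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the line of constant longitude through the point?
        if (lon1 > lon) != (lon2 > lon):
            # Latitude at which the edge crosses that line.
            lat_cross = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < lat_cross:
                inside = not inside
    return inside


def zone_of(lat, lon, zones):
    """Return the id of the first zone containing the point, or None."""
    for zone_id, polygon in zones.items():
        if point_in_polygon(lat, lon, polygon):
            return zone_id
    return None


def sample_od_matrix(trips, zones):
    """Count trips per (origin zone, destination zone) pair.

    `trips` is an iterable of ((lat_o, lon_o), (lat_d, lon_d)) pairs.
    Trips with an end outside every zone are discarded (an assumption).
    """
    od = Counter()
    for origin, destination in trips:
        o = zone_of(origin[0], origin[1], zones)
        d = zone_of(destination[0], destination[1], zones)
        if o is not None and d is not None:
            od[(o, d)] += 1
    return od
```

A matrix for a given time slice or vehicle type is obtained by filtering `trips` before calling `sample_od_matrix`; grouping zones into provinces amounts to mapping each zone id to its district before counting.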

Expansion to the Universe of Investigation
Once the sample O-D matrices for different survey days or times are determined, current travel demand can be estimated starting from these results. Then, knowledge of the sampling units (vehicles) and of the method for enumerating the population universe (e.g., lists of registered vehicles in a traffic zone or counts of passing vehicles) is required. This represents one of the main issues in adopting such FCD/AVM data for travel demand forecasting. Indeed, the issue of estimating O-D demand flows depends on the sampling strategies used, and it is also important to investigate sample representativeness both in terms of trip production and attraction.
Assume that the total statistical population (vehicles) is divided into $K$ classes (e.g., vehicles registered in a province) or strata. Let $k$ be the generic stratum with a population of $N_k$ vehicles, from which $n_k$ elements are drawn.
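If $T^{ik}_{od}$ is the number of trips with the required characteristics (e.g., starting in the morning) undertaken by the $i$-th element (vehicle) in the sample of stratum $k$, an estimate of the total number of trips can be obtained as follows [20]:

$$\hat{T}_{od} = \sum_{k=1}^{K} N_k \, \frac{\sum_{i=1}^{n_k} T^{ik}_{od}}{n_k} = \sum_{k=1}^{K} N_k \, \bar{T}^{k}_{od} = N \sum_{k=1}^{K} w_k \, \bar{T}^{k}_{od}$$

where $\bar{T}^{k}_{od}$ is the average number of trips observed in the $k$-th stratum, and $w_k = N_k / N$ is the weight of stratum $k$ with respect to the universe $N$.
According to Cascetta [17], the variance of the stratified sampling estimate, $VC_{od}$, can be estimated as follows:

$$VC_{od} = N^2 \sum_{k=1}^{K} w_k^2 \, (1 - \alpha_k) \, \frac{\hat{s}_k^2}{n_k}$$

where $\hat{s}_k^2$ is the sample estimate of the variance of the variable $T^{ik}_{od}$:

$$\hat{s}_k^2 = \frac{\sum_{i=1}^{n_k} \left( T^{ik}_{od} - \bar{T}^{k}_{od} \right)^2}{n_k - 1}$$

and $\alpha_k = n_k / N_k$ is the sampling rate in the $k$-th stratum.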
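As a worked illustration of the expansion step, the sketch below applies the standard stratified random sampling estimator with finite population correction, which is assumed here to be the form intended by the text. The function name `expand_od`, the input layout and the numbers in the usage example are hypothetical and are not taken from the Veneto dataset.

```python
from statistics import mean, variance


def expand_od(strata):
    """Expand per-vehicle sample trip counts for one O-D pair.

    `strata` is a list of dicts with keys:
      "N"     - number of vehicles in the stratum population (N_k)
      "trips" - trips on the O-D pair for each sampled vehicle (length n_k)
    Returns (estimated total trips, estimated variance of that total).
    """
    total = 0.0
    var_total = 0.0
    for s in strata:
        N_k = s["N"]
        n_k = len(s["trips"])
        alpha_k = n_k / N_k                      # sampling rate of the stratum
        mean_k = mean(s["trips"])                # average trips per sampled vehicle
        s2_k = variance(s["trips"]) if n_k > 1 else 0.0  # sample variance (n_k - 1)
        total += N_k * mean_k
        var_total += N_k ** 2 * (1.0 - alpha_k) * s2_k / n_k
    return total, var_total


# Hypothetical two-stratum example (e.g. two provinces of registration).
estimate, var_estimate = expand_od([
    {"N": 1000, "trips": [0, 1, 1, 2]},
    {"N": 500, "trips": [0, 0, 2, 2]},
])
```

Since $N w_k = N_k$, summing $N_k^2 (1 - \alpha_k) \hat{s}_k^2 / n_k$ over strata is the same quantity as the $N^2 \sum_k w_k^2 (\cdot)$ form; the code uses the former only to avoid carrying $N$ explicitly.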
Such issues are discussed below, and empirical evidence is shown to verify the goodness of O-D demand flow estimation using FCD/AVM data.

Application to a Real Test Case
This section reports the application of the methodology used to estimate O-D demand flows in the Veneto region (northern Italy). The application was carried out to ascertain to what extent the objective of using FCD/AVM data to estimate transportation demand flows might be realistic. Our investigation was performed by analysing a large dataset of private cars driving in the region. The results allowed trip-chaining patterns to be identified and helped reconstruct O-D trip flows as well as road link flows.

The Study Area and Available Data
The study area is the Veneto region with its seven provinces. It has an area of 18,391 km² and 4,905,037 inhabitants. The main socio-economic characteristics of the region are summarised in Table 1 (sources: ISTAT [71], ACI [72]). The provinces of the Veneto region are roughly equivalent in population terms, except for Belluno and Rovigo, which have approximately four times fewer inhabitants. These provinces also have the smallest number of registered cars in the region. The car motorization level varies by province within a range from 552 to 673 cars per 1000 inhabitants. As shown in Table 1, the number of cars per inhabitant is quite similar in the whole region, except in Venice, where it is about 14% less than the regional average.

Car Trip Detection
Mobility by car within the Veneto region was investigated through FCD/AVM data, which, according to some criteria, such as their large volume and by-product nature, can be considered big data. The collected data consist of information related to car trips within the Veneto region (i.e., at least one survey datum inside the region on the survey day) from the first to the last trip performed in the whole day. The data were analysed to identify travel patterns, thereby obtaining indications on the trips performed. The available database consists of observations on five different working days spread over the autumn months (i.e., October-November 2018). For each sampled vehicle, the information form contains the basic vehicle data such as vehicle class, brand, year, type, fuel type and gross weight. The daily car operation logs contain all trips made by the surveyed vehicle in chronological order: vehicle identifier, date (date the record is logged), timestamp (time the record is logged), coordinates (geographical location: latitude and longitude), instantaneous speed, type of road (urban, extra-urban, freeway), and direction angle. No information on the type of activity carried out or the registered trip purpose of surveyed vehicles (e.g., work or university) was available. After extensive cleaning and elimination of observations with missing data, the remaining data were processed in order to investigate travel patterns and trips undertaken. In all, 29,158 vehicles were analysed, corresponding to about 70,000 trip chains undertaken in five days.
Subsequently, the daily vehicle operation logs were analysed in order to identify trips, as introduced in Section 3. Empirical rules (e.g., stopping for longer than 20 min far from a petrol station) were used to determine whether the trip reached its destination to perform an activity. This allowed trip origin and destination to be identified, and the dataset pictured in Figure 2 was built.
The dataset "vehicles" reports information about private cars for each observation day and allows the sampling rate to be obtained. The primary key, the vehicle identification number, allows vehicles to be combined with the daily trips performed, which are stored in the dataset "trip description". This dataset stores all the information on every trip undertaken on the basis of origin and destination coordinates. It represents the input for the second stage of the procedure described in Section 3 above for the definition of the sample O-D matrix. Finally, the dataset "trip details" reports the detailed information on the intermediate GPS data from trip origin towards trip destination. This dataset allows us to track the trip chains made by every sampled vehicle through km-by-km or minute-by-minute information.
As stated above, our dataset refers to data from private vehicles (cars) driving along at least one road link of the Veneto region during one of the days in question. In particular, data covering different working days provided the opportunity to point out daily variations in terms of number of trips performed as well as origin-destination (spatial) coverage: 15.10 (Monday), 22.10 (Monday), 7.11 (Wednesday), 15.11 (Thursday), and 23.11 (Friday). It should be noted that each surveyed vehicle was studied for twenty-four hours, with the possibility of extending recording to the next day in case the last trip had not been finished or of including the days before if the travel started on those days and concluded on the day of investigation. The database characteristics in terms of cardinality and day of investigation are summarised in Table 2. The distribution of vehicles sampled by day of investigation and province of registration is reported in Table 3. It shows that the number of sampled vehicles is fairly constant, with a very low variance, as shown by the standard deviation and by a coefficient of variation that is about 0.01 for all provinces. The share of vehicles registered outside the Veneto region is less than 12%, and these come mainly from neighbouring provinces; as emerged from in-depth analysis, such vehicles are engaged in exchange O-D trips with municipalities near the regional boundary. Then, comparing the list of registered vehicles in the Veneto region (Table 1) with the vehicles monitored, we observed that the daily sampling rate is, on average, 0.93%, which is congruent with the suggestions provided for O-D travel estimates by Smith [63]. Service cars are also surveyed (i.e., the "unknown location" in Table 3 reflects the vehicles that belong to enterprises with different forms of ownership), and their trip characteristics differ from those of private cars. Therefore, they were not considered in analysing the trip characteristics.

Sample O-D Matrices
Once the trips as well as their origin and destination have been defined, the following step provides their aggregation for O-D matrix building. To model the system, the study area (and possibly portions of the external area) was subdivided (Section 3.2) into a number of discrete geographic units called traffic analysis zones (TAZs). The zones were defined on the basis of official administrative areas called ACE (census areas), which are the aggregation of the census geographic units and are defined by the Italian National Institute of Statistics (ISTAT). This allows each zone to be associated with the statistical data (population, employment, etc.) usually available for such areas. In all, 675 TAZs related to 574 municipalities were identified. Therefore, for each sampling day, the O-D matrix was built (Figure 3).
From preliminary analysis of the O-D matrices shown in Figure 3, the prevalence of intra-province trips emerges (Figure 4), as it does in the O-D flow patterns reported by ISTAT (Table 4). On analysing the results plotted in Figure 5, the low variability in terms of spatial distribution of daily trips emerges, confirming that the surveyed trips are mainly systematic.

• The O-D matrices reproduce the spatial distribution revealed by ISTAT quite well, with very limited daily variation.
The difference $\Delta_{od}$ between the province-based O-D matrix of the sample ($SW_{od}^{sample}$, Figure 4) and that of ISTAT ($SW_{od}^{ISTAT}$, Table 4) is calculated as

$$\Delta_{od} = SW_{od}^{sample} - SW_{od}^{ISTAT} \qquad (6)$$

Little difference emerges between the O-D flow structure of the sample and that of the wider population: the calculated values of $\Delta_{od}$ fall within a small interval. This empirically indicates the similarity of the sampling data with the real picture of car activity within the Veneto region.
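As an illustration of this comparison, the sketch below computes specific weights and their differences from a reference matrix. It assumes that the specific weight of an O-D pair is its percentage share in the trips generated by the origin province; this normalisation is an assumption made here for the example, and the counts and reference shares are invented, not ISTAT values.

```python
def specific_weights(od_counts):
    """Share (%) of each O-D pair in the trips generated by its origin.

    Normalising by origin is an assumption of this sketch.
    """
    generated = {}
    for (o, _), n in od_counts.items():
        generated[o] = generated.get(o, 0) + n
    return {(o, d): 100.0 * n / generated[o] for (o, d), n in od_counts.items()}

def share_difference(sw_sample, sw_reference):
    """Difference between sample and reference shares for every O-D pair."""
    pairs = set(sw_sample) | set(sw_reference)
    return {p: sw_sample.get(p, 0.0) - sw_reference.get(p, 0.0) for p in pairs}

# Invented sample counts and reference shares for two provinces.
sample_counts = {("VR", "VR"): 60, ("VR", "VI"): 15,
                 ("VI", "VR"): 5, ("VI", "VI"): 20}
reference_sw = {("VR", "VR"): 78.0, ("VR", "VI"): 22.0,
                ("VI", "VR"): 21.0, ("VI", "VI"): 79.0}

sample_sw = specific_weights(sample_counts)
delta = share_difference(sample_sw, reference_sw)
for pair in sorted(delta):
    print(pair, round(delta[pair], 2))
```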
Based on these initial results, the analysis below identifies the features needed to reproduce O-D coverage and to infer trip chains in terms of number of stops.

Spatial coverage of O-D flows and trip characteristics
Attention was paid to the spatial coverage of the sampled O-D trips and to its variability over the five days of investigation. Table 5 reports the coefficient of variation (CV) of the number of O-D pairs with at least one trip revealed and of the average number of daily O-D trips. These values are quite low, confirming the highly systematic nature of the demand flows. The CV is less than 0.05 for all intra-province O-D flows, which suggests high stability in trip patterns.
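A minimal sketch of the stability indicator follows. It assumes the CV is computed as the sample standard deviation divided by the mean of a daily series; whether the paper uses the sample or the population standard deviation is not stated here, and the daily counts are invented.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Invented number of trips observed on one O-D pair over five survey days.
daily_trips = [100, 102, 98, 101, 99]
cv = coefficient_of_variation(daily_trips)
print(round(cv, 4))
```

A value well below 0.05, as in this invented series, would correspond to the stable intra-province patterns discussed above.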
Thus, attention was paid to the following travel patterns: single direct trips and trip chains. In a single direct trip pattern, a vehicle makes only one intermediate stop to perform any activity, while a trip chain involves multiple single direct trips from the base (home) location. In the investigated dataset, the average number of trips is four, with two stops being the highest share of trip chains. This result is fairly constant during the survey days, as shown in Figure 6 and Table 6. Our descriptive analysis of the trip chains reveals the predominance of even values for the number of trips made by a car in one day, confirming the systematic, home-based nature of the trips: the number of daily trips most frequently undertaken by the sampled cars was two (20.36%) or four (19.17%). Although the trip purpose was not known in advance, the revealed shares suggest the predominance of "home-work-home" and "home-work-shopping/pleasure-home" trips, as shown by other studies such as Jiang et al. [73] and Vrtic et al. [19]. These findings could steer further analysis with a view to inferring trip purpose from such data combined with land use data.
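The distribution of daily trips per vehicle can be derived from a list of trip records as sketched below. The sketch assumes each trip record carries an identifier of the vehicle that made it; the identifiers and the resulting shares are invented for illustration only.

```python
from collections import Counter

def trips_per_vehicle_distribution(trip_vehicle_ids):
    """Return {number of daily trips: share of vehicles in %}."""
    trips_per_vehicle = Counter(trip_vehicle_ids)       # vehicle -> trips
    histogram = Counter(trips_per_vehicle.values())     # trips -> vehicles
    n_vehicles = len(trips_per_vehicle)
    return {k: 100.0 * c / n_vehicles for k, c in sorted(histogram.items())}

# One entry per recorded trip; the value is a hypothetical vehicle id.
trip_vehicle_ids = ["a", "a", "b", "b", "b", "b", "c", "c", "d"]
distribution = trips_per_vehicle_distribution(trip_vehicle_ids)
print(distribution)
```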

Expansion to the Universe of Investigation
According to the empirical evidence summarised above, a further step concerns estimating O-D demand flows. Given the empirical representativeness of the sample in terms of province, and assuming that the sample does not contain a systematic distortion of the information provided, the procedure presented in Section 3.3 was applied. The stratum (or group of vehicles) k is represented by vehicles registered in the same province. Then, once the sampled and total populations of each stratum are known, the sampling rate $\alpha_k$ can be obtained.
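The expansion can be sketched as follows, under the assumption that each sampled trip of stratum k is weighted by the inverse of the sampling rate of that stratum. This is a common stratified expansion and is used here only for illustration; the exact form of the procedure of Section 3.3 is not reproduced in this section, and all names and figures in the example are hypothetical.

```python
def sampling_rates(sampled, registered):
    """Sampling rate of each stratum: sampled vehicles / registered vehicles."""
    return {k: sampled[k] / registered[k] for k in sampled}

def expand_od(sample_od_by_stratum, rates):
    """Weight the sampled trips of each stratum by 1 / sampling rate.

    sample_od_by_stratum : {stratum: {(origin, destination): sampled trips}}
    """
    expanded = {}
    for stratum, od in sample_od_by_stratum.items():
        weight = 1.0 / rates[stratum]
        for pair, trips in od.items():
            expanded[pair] = expanded.get(pair, 0.0) + weight * trips
    return expanded

# Invented strata (province of registration) and sampled trips.
sampled = {"VR": 50, "VI": 20}
registered = {"VR": 5000, "VI": 4000}
sample_od = {
    "VR": {("VR", "VR"): 30, ("VR", "VI"): 10},
    "VI": {("VI", "VI"): 8, ("VR", "VI"): 2},
}

rates = sampling_rates(sampled, registered)
expanded_od = expand_od(sample_od, rates)
print(expanded_od)
```

Grouping by province of registration rather than by trip origin mirrors the definition of the stratum given above.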

O-D Matrix Validation
This section reports the results of the verification performed by applying the O-D demand flow estimation procedure presented in an earlier section. The aim of the application was to test the ability of the procedure to reproduce revealed road link flows. A large sample of automated traffic counts was available in the Veneto region, as shown in Figure 7, comprising traffic counts on several motorways and main roads and allowing us to characterise flows in terms of vehicle numbers during 13 working days.
Hence, statistical performance is measured by the "divergence" between the estimate $f^*$ (the road flow vector whose elements, $f_l^*$, are obtained by assigning to the network the O-D flows calculated through Equations (3) and (4)) and the true road link flow vector $f$, whose elements are $f_l$. The mean square error between the two vectors, $MSE(f^*, f)$, is one of the most commonly used divergence measures:

$$MSE(f^*, f) = \frac{1}{m_l} \sum_{l} \left( f_l^* - f_l \right)^2 \qquad (7)$$

where $m_l$ is the number of road links. An alternative measure is the ratio between the square root of the mean square error and the average link flow, which is analogous to the coefficient of variation of a random variable:

$$RMSE(f^*, f) = \frac{\sqrt{MSE(f^*, f)}}{\sum_{l} f_l / m_l} \qquad (8)$$

Obviously, the lower the MSE and RMSE, the better the estimator $f^*$. Table 7 summarises the mean square error (MSE) and the ratio between the square root of the mean square error and the average link flow (RMSE), while Figure 8 reports a comparison between the revealed and estimated vehicle link flows. The estimates are slightly scattered. However, the model yields good results, especially since the results show limited fluctuation. Further analyses were then developed in order to verify the dispersion of estimates.
The results plotted in Figure 8 show the modelled and observed link flows for the road sections available in the Veneto region and summarise the estimation accuracy. The ordinate and the abscissa have the same scale, so that a perfect forecast is represented by a point on the 45-degree line, for which forecast = observed. The 45-degree line is drawn to facilitate interpretation of the scatter plot: the estimates are slightly scattered, and the model reproduces actual link flows quite well. Correspondence between the regression line and the 45-degree line can be taken as a simple measure of reliability. A comparison of the orientation of the regression line and the 45-degree line gives a visual representation of the relative quality of the forecasts: the goodness of fit is quite high, with a value of 0.96 (link flows), showing that the model yields good results.
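The two divergence measures described in this section (mean square error, and root mean square error divided by the average observed link flow) can be computed as in the short sketch below. The link flows are invented and the function names are illustrative.

```python
from math import sqrt

def mse(estimated, observed):
    """Mean square error between estimated and observed link flows."""
    m = len(observed)
    return sum((e - o) ** 2 for e, o in zip(estimated, observed)) / m

def rmse_ratio(estimated, observed):
    """Square root of the MSE divided by the average observed link flow."""
    average_flow = sum(observed) / len(observed)
    return sqrt(mse(estimated, observed)) / average_flow

# Invented flows on three counted links (vehicles per day).
estimated_flows = [110.0, 190.0, 305.0]
observed_flows = [100.0, 200.0, 300.0]

print(mse(estimated_flows, observed_flows))
print(round(rmse_ratio(estimated_flows, observed_flows), 4))
```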
Therefore, the procedure to improve the estimates of present O-D demand flows by combining the FCD/AVM estimator with traffic counts was implemented (Cascetta [20]):

$$d^* = \arg\min_{x \ge 0} \left[ z_1(x, d) + z_2(v(x), f) \right] \qquad (9)$$

where $x$ is the unknown demand vector. The two functions $z_1(x, d)$ and $z_2(v(x), f)$ are "distance" measures: $z_1$ measures the "distance" of the unknown demand $x$ from the a priori estimate $d$ (from AVM data), and $z_2$ measures the "distance" of the flows $v(x)$, obtained by assigning $x$ to the network, from the traffic counts $f$. Thus, the problem is to search for the vector $d^*$ that is closest to the a priori estimate $d$ and that, once assigned to the network, produces the flows $v(d^*)$ closest to the counts $f$. The results of this step are summarised in Table 7 and Figure 9, showing the limited increase in performance (less than 3%) that can be obtained. This confirms the accuracy of the AVM/FCD estimates in reproducing the current origin-destination matrix, as also evidenced by the small difference (about 1%) between the expanded matrix and the estimation with traffic counts.
The results discussed in this section were obtained at a reasonable computational cost, i.e., about 20 s using a routine implemented within a commercial macrosimulation tool (running in a Windows environment) on a desktop PC with an Intel(R) Core(TM) i7-9700F CPU @ 3.00 GHz and 32 GB of RAM. The routine time includes the computational times for estimating the initial network costs and for running the updating procedure of Equation (9). Such a performance opens new opportunities for its integration within real-time procedures. In fact, real-time traffic counts coming from the network could feed the proposed procedure to produce real-time (e.g., updated every 15 min) and dynamic O-D matrices.
Further analyses are in progress to verify the dispersion of estimates, developing models according to departure time and trip purpose, and including other socio-economic data in sampling estimators.
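To make the structure of the updating problem concrete, the following is a minimal sketch under strong assumptions: both distance functions are taken as unweighted sums of squares, the assignment is linear with a fixed 0/1 link-path incidence matrix `A` (so that v(x) = A x), and the minimisation uses a plain projected gradient method with a fixed step. None of these choices is claimed to match the routine of the commercial tool used in the paper; the network, prior demand, and counts are invented.

```python
def assign(x, A):
    """Link flows obtained from O-D flows x through a fixed assignment matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def update_od(d, A, f, step=0.1, iterations=500):
    """Minimise sum((x - d)^2) + sum((A x - f)^2) subject to x >= 0.

    Projected gradient with a fixed step; the step must be small enough
    for the given A (0.1 is adequate for the toy matrix below).
    """
    x = list(d)
    for _ in range(iterations):
        residual = [v - fl for v, fl in zip(assign(x, A), f)]
        gradient = [
            2.0 * (x[i] - d[i])
            + 2.0 * sum(A[l][i] * residual[l] for l in range(len(A)))
            for i in range(len(x))
        ]
        # Gradient step followed by projection onto non-negative demand.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, gradient)]
    return x

# Toy case: three O-D pairs, two counted links.
prior_demand = [100.0, 50.0, 80.0]      # a priori estimate d
A = [[1, 1, 0],                         # link 1 is used by O-D pairs 1 and 2
     [0, 1, 1]]                         # link 2 is used by O-D pairs 2 and 3
counts = [160.0, 140.0]                 # observed link flows f

updated_demand = update_od(prior_demand, A, counts)
print([round(v, 3) for v in updated_demand])
print([round(v, 3) for v in assign(updated_demand, A)])
```

In this toy case the updated demand moves part of the way from the prior towards the counts, which reflects the trade-off between the two distance terms; weighting the terms differently would shift that balance.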

The Road Ahead and Open Research Challenges
The goodness of the results obtained provides indications for further development, especially on the statistical significance of the estimates that can be obtained by using such data. As shown in Cascetta [20], Ortúzar and Willumsen [51], Tsekeris and Tsekeris [74], and references quoted therein, estimation of current and future O-D demand flows can require user/driver-based surveys to obtain travel-related data (e.g., destination, activity purpose, departure time), which are resource (time and money) consuming. As recalled in the previous sections, they require, for example, the driver to stop the vehicle and answer questions posed by the surveyor, or other types of interviews performed by phone or face-to-face at home, in the office, or at the transportation terminal. Nowadays, thanks to the opportunity offered by telematics, FCD/AVM data allow vehicle movements to be tracked passively. As happens with commercial vehicles [21,36], where it is not uncommon for freight fleet owners to track their vehicles using fleet tracking services with the aim of locating their assets, planning routes, and monitoring vehicle and driver productivity, private vehicles are currently monitored for insurance purposes or to provide extra travel services. Vehicle tracking can reveal routes taken, tours, and stops made, and it has been widely used to ascertain user travel movements as well as to obtain the shares/probabilities of routes chosen [61], providing an opportunity to have timely data. The previous sections described the results obtained by analysing a large dataset of private vehicles operating in the Veneto region. They allow travel patterns to be identified and guide the specification of O-D flow estimation; in the future, this will be extended to the route chosen and to the calibration and verification of route choice modelling.
However, in order to exploit the opportunity offered by such data, their representativeness and especially their penetration rate need to be determined. Below, the unresolved issues in terms of sample size and number of survey days (which help to point out daily variation as well as systematic travel activity) are discussed on the basis of the data from the Veneto region.

Sample Size
Estimation of sample size can be performed following traditional approaches to O-D surveys under conditions of population stratification [20,50-52]. As hypothesised in such studies, it is reasonable to assume that each stratum consists of a province within the study area (seven strata in the Veneto region). As part of the sampling process, the representative number of cars to be observed during the day within each area (province) is to be determined.
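When identifying the sample, it should be borne in mind that not all cars randomly included in the sample undertake a trip during the survey day. In this case, a confidence interval can be constructed to predict the fluctuations in the number of active cars. Given a generic province o, the number of sampled vehicles travelling (VT) during each survey day (s), and statistical data from the public authorities (e.g., ISTAT), the confidence interval for active cars can be estimated as follows:

$$\overline{VT}_o - \gamma_{\alpha/2} \frac{\sigma_{VT_o}}{\sqrt{S_o}} \le \mu_o \le \overline{VT}_o + \gamma_{\alpha/2} \frac{\sigma_{VT_o}}{\sqrt{S_o}} \qquad (10)$$

where $\overline{VT}_o$ is the mean of VT for province o, $\mu_o$ is the estimated mean of the travelling vehicles, $\gamma_{\alpha/2}$ is the quantile of the distribution (1.96 for a probability of 0.95), $\sigma_{VT_o}$ is the sample standard deviation of $VT_o$, and $S_o$ is the number of observation days for $VT_o$.
Then, using the daily O-D trips by car between provinces in the survey days, the statistical characteristics of observed O-D flows (e.g., average number of trips made by vehicles between zones-provinces, variance, and coefficient of variation) can be evaluated.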
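A confidence interval for the mean number of active cars can be computed as in the sketch below. It assumes the usual normal-approximation interval (mean plus or minus the quantile times the sample standard deviation over the square root of the number of days), with 1.96 as the quantile for a probability of 0.95; the daily counts of travelling vehicles are invented.

```python
from math import sqrt
from statistics import mean, stdev

def active_cars_interval(vt_by_day, quantile=1.96):
    """Confidence interval for the mean number of travelling vehicles."""
    average = mean(vt_by_day)
    half_width = quantile * stdev(vt_by_day) / sqrt(len(vt_by_day))
    return average - half_width, average + half_width

# Invented number of sampled vehicles travelling in one province on five days.
vt_by_day = [4000, 4100, 3900, 4050, 3950]
lower, upper = active_cars_interval(vt_by_day)
print(round(lower, 1), round(upper, 1))
```

With only five observation days a Student t quantile would give a wider interval; the value 1.96 is kept here because it is the one quoted in the text.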
Subsequently, according to the sample size definition by Cascetta [20] and Ortúzar and Willumsen [51], given a province-based zoning and all the O-D pairs od representing the trips between or within the provinces (i.e., exchange, crossing, and internal province-based trips), the number of cars $n^{cars}_{od(s)}$ that should be monitored for O-D pair od according to the sampling statistics of day s is given by Equation (11). Such values should guarantee the representativeness of the necessary sample. However, given that it could be operatively easier to obtain information on the start place of travel, and since Equation (4) gives the matrix results, the minimum number of vehicles to monitor should refer to the origin. Therefore, the necessary sample of vehicles (cars) for province o in day s, $n^{cars}_{o(s)}$, can be obtained as follows:

$$n^{cars}_{o(s)} = \sum_{d} n^{cars}_{od(s)}$$

where the sum is extended to all possible destinations d of the study area reached by trips starting from origin (province) o.
As FCD/AVM data can easily be obtained for more than one day, the number of sampled cars can be evaluated taking into account several days of observation (S); $n^{cars}_{o}$ denotes the resulting final sample size of cars that should be monitored from zone (province) o.

Sampling Days
The next step in determining the sample size could be to evaluate the minimum number of survey days needed to infer the spatial distribution of O-D patterns. A possible methodology could be based on the estimation of O-D flow values across survey days: if the trip patterns are relatively stable during the survey days, the number of monitoring days can be reduced without negative consequences for the data obtained. To do this, the variability of the O-D flows within S survey days is evaluated. Given the statistics on observed sampled trips, i.e., the average O-D flow $\varphi_{od}$ within S days and its variance $\sigma^2_{od(\varphi)}$, the number of survey days required for every O-D pair od can be found according to Ortúzar and Willumsen [54]. Subsequently, a research challenge in using FCD/AVM data would be to find the minimum number of sampling days required. The calculations can be made through Equations (15)-(17). The results are presented in Table 9.
The results reported in Table 9 indicate that the necessary days of observation for Verona, Vicenza, Belluno, Treviso, and Padua provinces are less than or equal to five. Only Venice and Rovigo provinces need more than five days of investigation, i.e., 8 and 31, respectively. This may be explained by a large variance in average O-D trip values between Rovigo-Verona, Rovigo-Venice, and Venice-Rovigo. The flows between these O-D pairs are not stable within weekdays. With reference to Figure 4, it may be seen that $SW_{od}$ varies for Rovigo-Verona from 1.62% to 2.35%, for Rovigo-Venice from 1.28% to 1.96%, and for Venice-Rovigo from 0.17% to 0.24%, reflecting the specific weight in the total trip values generated by the provinces. If only the O-D flows with the highest specific weight are considered (the intra-province trips), the number of days required does not exceed five. Accordingly, it may be concluded that five survey days gave representative information on O-D flows within the study area.
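The link between flow variability and the number of survey days can be illustrated with the textbook sample size expression n = CV² z² / E², where CV is the coefficient of variation of the daily flows, z the normal quantile, and E the accepted relative error. This expression is swapped in here for illustration and is not claimed to be the paper's Equation (15), which is not reproduced in this section; the tolerance of 10% and the daily flows are assumptions.

```python
from math import ceil
from statistics import mean, variance

def required_survey_days(daily_flows, z=1.96, relative_error=0.1):
    """Days needed so that the mean flow is within the relative error.

    Uses n = CV^2 * z^2 / E^2 (illustrative textbook formula).
    """
    average = mean(daily_flows)
    cv_squared = variance(daily_flows) / average ** 2
    return ceil(cv_squared * z ** 2 / relative_error ** 2)

# Invented daily flows on a stable and on an unstable O-D pair.
stable_pair = [100, 110, 90, 105, 95]
unstable_pair = [100, 160, 60, 130, 50]

print(required_survey_days(stable_pair))
print(required_survey_days(unstable_pair))
```

The contrast between the two invented series mirrors the pattern discussed above: O-D pairs with stable flows need few days, while pairs with a large variance relative to their mean need many more.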

Conclusions
The paper presented recent developments in O-D flow estimation through AVM/FCD, hence dealing with link flows to be forecast and road network performance to be assessed. We reviewed the main modelling approaches to be implemented in order to estimate daily O-D flows through the new opportunities offered by telematics, presented analyses that exploit AVM data to identify and assess car travel patterns, and tested the estimation framework through comparison with traffic counts.
The first part of the paper gave an overview of the procedures developed and presented an estimation procedure to obtain car O-D flows using an aggregate approach; the second part reported evidence on trips/trip chains performed by cars belonging to a large dataset, which operate in the Veneto region, followed by the validation/verification results. This allowed the mechanisms driving trip generation to be captured more accurately and the order of trip chains to be reproduced.
The proposed O-D estimation procedure was structured into three levels: car trip identification, sample O-D matrix, and O-D matrix expansion (i.e., expansion to the universe of investigation). The car trip identification procedure was implemented in order to obtain the trips, and hence the trip chains, for each surveyed vehicle. The sample O-D matrix was obtained by means of a trip-grouping procedure, and the result was compared with an O-D matrix available from the census, revealing little difference between them. The expansion to the universe made it possible to obtain the average O-D matrix for the study area. In order to validate this result, the matrix was assigned to the network, and the flows were compared with the traffic counts available on a subset of links, showing small differences between them.
These results indicate that AVM/FCD allow us to estimate O-D demand flows and to reproduce quite well the critical patterns along roads. In addition, they enable us to provide a continuous picture of the network status as well as to exploit the potential of data-driven approaches for simulating transportation systems. Vehicle activity pattern identification is crucial to characterise passenger operations from AVM/FCD/GPS data.
The results of this paper show the potential of the proposed procedure to identify vehicle trips as well as to characterise demand flows spatially. Future work will focus on testing and calibrating the procedure with passenger diary data, to assess its efficacy in detecting stops of very short duration and in identifying passenger activity. With the growing availability of data, there is great potential for the use of this procedure to characterise passenger trips and as a tool in transportation planning in general.
The proposed methodology of O-D matrix sample estimation may be used for urban and interurban transportation policy making. The field of application is huge, covering attributes of urban logistics [75][76][77] and the determination of mobility patterns for the sustainable development of metropolitan areas and intercity road connections [78,79]. Current trends and challenges for future mobility based on electric automated vehicles [80,81] and carsharing services [82] need to be anticipated. The efficient operation of such systems cannot be implemented without robust data on trip chain formation. As one of the ways to achieve this goal, we propose to estimate trip behaviour using AVM/FCD/GPS data with the sampling survey option.
Although the obtained statistics confirm the value of the proposed approach, further developments are in progress to improve the results and to apply more advanced machine learning techniques that allow further features to be included in model specification and improve modelling accuracy. The modelling framework that can be developed by exploiting AVM/FCD could benefit from investigation of the influence of socio-economic attributes on tour/trip chain definition, the inclusion of the size function in delivery location choice, modelling of the choice set generation within the delivery location model, and inclusion of departure time choice in order to investigate the relationship with time-window access restrictions.

Figure 2. Structure of the AVM/FCD database.

Figure 3. Veneto O-D matrices for the five survey days.

Figure 4. Shares of O-D flows between provinces of the Veneto region (sample-based).

Figure 5. Comparison of O-D flows specific weight between sampling and ISTAT data.

Figure 6. Distribution of daily trip number made by sampled vehicles.

Figure 7. Traffic counts available in the Veneto region (background source: OpenStreetMap).

Figure 8. Comparison between revealed and modelled road link flows.

Figure 9. Comparison between revealed and modelled road link flows (after updating).
where:
• the left-hand side of Equation (15) is the number of survey days required for the flows between zones (provinces) o and d;
• $\varphi_{od}$ is the average value of flows on O-D pair od within S days;
• $\sigma^2_{od(\varphi)}$ is the variance of flows on O-D pair od within S days;
• $\omega_{od(s)}$ is the average O-D flow between zone (province) pair od for the survey day s.
Equation (15) gives the number of survey days needed for a representative sample for every O-D pair considered. In this case, for every origin zone (province) o,

Figure 10. Samples of monitored vehicle numbers under different conditions of activity.

Table 1. Statistical data for the Veneto region.
* sample rate in brackets.

Table 3. Characteristics of sampled vehicles.
Information 2021, 12, x FOR PEER REVIEW 13 of 29

Table 5. Coefficients of variation on O-D pairs of trips for the five survey days.
n.a. = not available/not applicable.

Table 6. Basic statistical attributes of trip chains.
Mean, standard deviation, min, and max refer to the number of trips made by one vehicle.
Sampling/Surveyed Day    Mean    Standard Deviation    Min    Max    Number of Estimated Vehicles
15.10.2018               3.98    1.51                  1      19     16,760

Table 7. Link flow estimation: accuracy of estimates.

• $\overline{VT}_o$ is the mean of VT for province o;
• $\mu_o$ is the estimated mean of the travelling vehicles $VT_o$;
• $\gamma_{\alpha/2}$ is the quantile of the distribution (under Prob = 0.95 equals 1.96);
• $\sigma_{VT_o}$ is the sample standard deviation for $VT_o$;
• $S_o$ is the number (days) of observations for $VT_o$.

where
• $n^{cars}_{od(s)}$ is the number of cars that should be monitored for O-D pair od according to sampling statistics data in day s;
• $\bar{\omega}_{(s)}$ is the average number of trips made by vehicles between or within provinces during the observation day s;
• $\sigma^2_{\omega(s)}$ is the variance of the number of trips made by vehicles between or within provinces during the observation day s;
• $CV^2_{\omega\,o(s)}$ is the coefficient of variation of the number of trips made by vehicles between or within provinces during the observation day s;
• $\tau_s$ is the average number of trips made by one vehicle during day s;
• $\omega_{od(s)}$ is the detected (revealed) flow between zone o and zone d between or within the provinces in survey (observation) day s;
• $M_{o(s)}$ is the detected number of origin zones in day s for province o;
• $W_{o(s)}$ is the detected number of destination zones in day s for province o.

Table 8. Data on VT values for the observation days.

Confidence Interval 15.10 22.10 07.11 15.11 23.11 Left Side [%]