Utilizing MapReduce to Improve Probe-Car Track Data Mining

With the rapidly increasing popularization of the automobile, challenges and greater demands have come to the fore, including traffic congestion, energy crises, traffic safety, and environmental pollution. To address these challenges and demands, enhanced data support and advanced data collection methods are crucial and highly in need. A probe-car serves as an important and effective way to obtain real-time urban road traffic status in the international Intelligent Transportation System (ITS), and probe-car technology provides the corresponding solution through advanced navigation data, offering more possibilities to address the above problems. In addition, massive spatial data-mining technologies associated with probe-car tracking data have emerged. This paper discusses the major problems of spatial data-mining technologies for probe-car tracking data, such as true path restoration and the close correlation of spatial data. To address the road-matching issue in massive probe-car tracking data caused by the strong correlation combining road topology with map matching, this paper presents a MapReduce-based technology in the second spatial data model. The experimental results demonstrate that by implementing the proposed spatial data-mining system on distributed parallel computing, the computational performance was effectively improved by five times and the hardware requirements were significantly reduced.


Introduction
With the rapid development of urbanization and the economy, the automobile has become essential in everyday life. For example, China's national motor vehicle ownership reached 310 million as of the end of 2017, which includes 217 million cars [1]. As a consequence, various problems have emerged, including traffic congestion, energy crises, traffic safety, and environmental pollution. The increasing traffic demands and tightened automotive emission standards urge infrastructure operators and the automotive industry to act. Faced with these problems, urgent demands have been set on green driving, safe driving, congestion relief, and so on. To meet these demands, more advanced navigational data are needed, including dynamic traffic status information, passage cost information, road accident information, and road fuel consumption information. Probe-cars provide an opportunity to obtain these data by participating in the traffic flow and determining self-experienced traffic conditions, and transmitting these to a traffic center.
The probe-car, known as the Global Positioning System (GPS) rover, is an important and effective way to obtain urban road traffic status in the international Intelligent Transportation System (ITS) [2]. It is an advanced road traffic information collection technology in the field of international ITS [3]. By mounting a GPS device on a vehicle, the vehicle's position information, behavioral information, and event information are transferred to a data center in real time or offline. By combining massive probe-car tracks and behavioral data obtained from the data center with the existing map data, a variety of advanced navigation data can be mined to address certain problems, including the discovery of new roads, the actual cost of road traffic, the peak periods of roads, and accident-prone areas. Probe-car data can range throughout the region and be collected 24/7. This technology greatly improves the efficiency of information collection by wireless real-time transmission and center-through processing. Meanwhile, it lowers the cost of maintenance of acquisition equipment by the installation of GPS and other communication network resources [4].
A probe-car system consists of a wireless communication network that includes GPS and wireless communication capabilities and an information processing center. Probe-car data (PCD) systems are composed of three parts: the data acquisition system of the probe-car, the traffic information processing system, and the real-time traffic information distribution system. The probe-car is driven on city roads and uploads the collected real-time raw data to the probe-car data acquisition system. The traffic information processing system is responsible for the raw data preprocessing, coordinate conversion, geographic information system (GIS) electronic map matching, as well as travel time calculation. The information distribution system releases the traffic information produced by the processing system to provide the public with real-time road traffic reference information by General Packet Radio Service (GPRS), the Internet, and other means. The architecture of a probe-car data system is shown in Figure 1.
As an important component of ITS, the urban traffic dynamic route guidance system obtains vehicle locations under real-time road traffic conditions and provides the best route guidance information. It helps direct travelers in order to improve traffic conditions, reduce traffic congestion, and achieve a reasonable distribution of traffic flow on the roads. ITS experts and enterprises have conducted theoretical research and developed applications based on vehicle locations. Countries in Europe have developed various new technologies and made intelligent road transport systems available in various cities. The ADVANCE real-time traffic releasing system for probe-cars was initiated by the State of Illinois with the US Federal Highway Administration as its partner [5]. The purpose of this system is to determine whether drivers need real-time information to avoid congestion, in order to increase capacity. The UK's Trafficmaster offers a series of traffic information services, where the data are mainly provided from fixed sensors and supplemented by PCD [6].
To the authors' knowledge, research on large-scale PCD processes is still in the preliminary stages. The existing research mainly focuses on cost, probe-car size, system architecture, and precision [7,8]. With the development of PCD technology and the popularity of GPS devices, the amount of data will grow dramatically. Due to the specific mobility of probe-car data and the limitation of the size of the cars, real-time probe-car data is unable to cover all of the road network. It is essential to address the missing data issues in the road network analysis and improve the application's efficiency.
We describe the basic principles and key issues of mining probe-car data in Section 2. Section 3 describes a MapReduce approach to accelerate the map-matching of massive probe-car tracking data. Section 4 demonstrates the experimental results of the MapReduce approach. Section 5 concludes the research and introduces future work.

Literature Review
Probe-car technology research can be traced back decades. Considering the complex construction of intelligent transportation systems, the relevant agencies have internationally conducted a great amount of research and application, as shown in Table 1. For example, the UK's probe-car data system was developed to collect and analyze traffic information and was invested in by ITIS Holdings Plc, a typical successful probe-car system [6]. The data sources include both real-time and historical information from the Automobile Association Traffic Control Centre (AATCC). The probe-car data system uses the GPS/wireless data transmission mode. After the acquired data are processed, the system predicts the traveling time for users in real time and continues to update the information. Another example is the American ADVANCE system, which was an experimental project of dynamic road induction in an Illinois suburban area conducted by the Federal Highway Administration (FHA), Illinois Transportation Authority (ITA), Motorola, Transportation Research Institute at Illinois State University, and other agencies in 1991. Its goal was to determine whether traffic guidance information is helpful to avoid traffic congestion and improve driving quality [6]. The Vehicle Information and Communication System (VICS) in Japan is one of the successful applications in the field of intelligent transportation. VICS acquires traffic data via GPS navigation devices and releases accurate traffic guidance information and real-time traffic information to travelers by FM radio and wireless data transmission [9]. The Korea Road Traffic Information Center (KORTIC) system of Korea, developed by the Korean Road Safety Association (KRSA), combines toroidal coil, GPS probe-car, and Closed-Circuit Television (CCTV) surveillance equipment for traffic information collection. Then it extracts traffic information after data fusion, analysis, and processing, and determines the traffic status to reduce the estimated error probability for obtaining road travel time to 10% or less [10]. In 2001, the German Space Center Transportation Institute (GSCTI) integrated probe-cars with 300 floating taxis to collect and analyze their location, speed, and other information in Berlin [11].
Jan Fabian Ehmke predicted time-dependent travel time and assessed the resulting road information using data-mining methods through different levels of aggregation for the large amount of probe-car data [12]. With its acquisition of Waze in 2013, Google added a human element to its traffic calculations. Drivers can use the Waze app to report traffic incidents including accidents, disabled vehicles, slowdowns, and even speed traps [13]. Reference [14] used velocity and time information to obtain the average speed, traffic, travel time, and other traffic information through a mathematical model to achieve real-time road detection. With the urban road network data acquired by probe-cars equipped with GPS devices, Dong [15] analyzed the road network level and obtained the travel conditions and functioning of the road network at different levels. Zhang [16] used pattern recognition, statistical forecasting, time series, and intelligent algorithms through traffic parameters of probe-car data acquisition to detect traffic incidents. Xin [17] analyzed the space and time distribution characteristics of urban road networks based on probe-car data, adopting the coverage and intensity in a certain coverage as indicators. Xin pointed out that the coverage and intensity of probe-car data have similar peak hours on weekdays. The higher the level of the road, the higher the coverage and intensity of probe-car data. Li [18] presented a mathematical model of probe-car coverage in a single section and the whole road network on the basis of the minimum requirements for probe-car samples in a single section and verified it by simulation. This model considered various factors such as computing interval, average traffic flow density, average travel speed, mistake matching scores of probe-cars, and so on. The simulated results showed that the coverage rate calculated by this model can ensure a 93.7% link of the road network through collecting data. Zhang [19] described the composition of probe-car systems and the optimization theory for probe-car sampling. By considering velocity and analyzing the random signal and the spectrum of the Fourier transform, the optimal sampling frequency is determined by the Shannon sampling theory. The results showed that the more optimal the sampling frequency obtained, the higher the data accuracy, which is suitable for practical applications. Weng [20] categorized probe-car data into three stages, historical data applications, historical traffic state data applications, and dynamic traffic state data applications, on the basis of summarizing the research of probe-car traffic information applications. He analyzed the urban transport operating characteristics of probe-car data in Beijing, such as the distribution characteristics of road traffic and the utilization of different levels of road mileage. Feng [21] proposed a probe-car map-matching algorithm based on the search for local paths by analyzing the characteristics of the collection of raw data. Making use of GPS points matched previously, it greatly reduces the search space to achieve better positional accuracy of probe-car data, so as to determine the vehicle track.
As the probe-car tracking data is massive, the above traditional data-processing methods cannot meet the current data-processing needs. This paper proposes a parallel algorithm combined with cloud computing technology to process the data.

Probe-Car Collection Interval
The timeliness and reliability of traffic status identification can be largely impacted by the configuration of essential parameters based on the data collection of GPS probe-cars, including the data-sampling interval and the sampling ratio. The collection interval refers to the time interval of uploading the collected data of the probe-car to the information processing center; in other words, the time period of the probe-car data collection. Generally speaking, the higher the frequency of data collection, the more accurate the real-time road traffic information.
However, if the acquisition time interval is too short, it will not only increase the cost of acquisition, but also result in higher data redundancy, and the road traffic conditions will be very similar. On the contrary, if the time interval is too long, it will miss important data, which would lead to not reflecting the dynamic traffic conditions precisely. Therefore, it is very important to set an appropriate collection interval for probe-car data.
Yamane conducted research on urban probe-cars in Osaka, Japan [22], and found that different data-collection intervals would result in different road map matching results. Researchers also found that the longer the time interval of probe-car data acquisition, the worse the accuracy of the reflection of real-time road traffic conditions, in spite of the relatively low cost of the acquisition. If the probe-car collection interval is short, the reflected effect of real-time road traffic conditions is better. Though the time cost of the acquisition will be relatively higher, the effect of GIS map-matching is better. From different tests, we found that when the probe-car collection interval is 30 s, we can achieve the best balance among the acquisition cost, the reflection accuracy of real-time traffic conditions, and the results of matching. Therefore, the optimal sampling interval of the probe-car is 30 s in this research.

The Key Issues in Probe-Car Data Mining
Probe-car track data consists of sampling points with a series of latitude and longitude information and other vehicle behavior information. The characteristics of probe-car data are as follows [23,24]:
(1) Position information (latitude and longitude).
(2) Position information with noise, and the noise is affected by a variety of factors (GPS noise, clouds, the status of buildings nearby, indoor and outdoor conditions, and so forth).
(3) Loss of spatial information: the sampling interval of probe-car tracks is usually long (tens of seconds or minutes), which will result in the loss of shape information.
(4) Spatial information redundancy caused by intensive sampling and low speed.
(5) The temporal correlation between some series of track points.
(6) Additional property information: incidental event information, driving behavior information, sensor parameters while sampling points.
For the above features of probe-car data tracks, the first five characteristics are the inherent basic properties of time and space, which will be fully considered during the data matching.The last characteristic is the additional information provided by the probe-car, which may vary greatly (not available for all vehicles) according to the vehicle's own situation, but it can be used to mine more behavioral factors.
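The characteristics listed above can be pictured as a small record type. The following is only an illustrative sketch: the names TrackPoint, Track, and extra are hypothetical and are not taken from the PCD format used in this paper. The first three fields carry the spatio-temporal properties, and the optional dictionary holds the additional property information that is not available for all vehicles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrackPoint:
    """One sampled position of a probe-car (hypothetical layout)."""
    lat: float          # latitude in degrees, subject to GPS noise
    lon: float          # longitude in degrees, subject to GPS noise
    t: float            # sampling time in seconds
    extra: Dict[str, object] = field(default_factory=dict)  # optional events, sensor values


@dataclass
class Track:
    """A time-ordered series of track points from one vehicle."""
    vehicle_id: str
    points: List[TrackPoint]

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (min_lat, min_lon, max_lat, max_lon) of the track."""
        lats = [p.lat for p in self.points]
        lons = [p.lon for p in self.points]
        return min(lats), min(lons), max(lats), max(lons)

    def sampling_intervals(self) -> List[float]:
        """Return the time gaps between consecutive points, in seconds."""
        return [b.t - a.t for a, b in zip(self.points, self.points[1:])]


example = Track(
    vehicle_id="truck-001",
    points=[
        TrackPoint(35.10, 139.20, 0.0),
        TrackPoint(35.12, 139.25, 30.0, extra={"event": "hard_brake"}),
        TrackPoint(35.15, 139.31, 60.0),
    ],
)
print(example.bounding_box())
print(example.sampling_intervals())
```

The bounding box is the piece of information that a spatial partitioning step needs in order to decide which map region a track falls into.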
In the spatial data mining of probe-car tracks, there are several key issues for the above data features [25,26]:
(1) The close correlation of spatial data. Since the electronic map of the road network topology has a close correlation, it often needs to load all of the road network data into memory to handle all the tracks. Since the road network data is massive, it demands high performance for the hardware. Moreover, it results in low performance for searching the matching data in the entire road network to process each specific track, which is a fatal flaw for the massive spatial data mining of the probe-car track.
(2) True path restoration for the probe-car track. It is crucial to combine the electronic map to restore the true path of probe-car tracks as accurately as possible. Since the sampling interval is sparse, the distance between two adjacent track points might be far. Due to data noise, the position of each track point may greatly deviate from the position of the real road, so large errors result from the conventional method, which matches the best roads according to each track point location on an electronic map. As shown in Figure 2a, if it is a partial match, a part marked I in the whole track will be matched to road 1. However, the whole track should be matched to road 2, as shown in Figure 2b. Therefore, combining the spatial characteristics of the entire track optimally and globally restores the path, while the local single track point cannot be matched to the best road.
To obtain city-wide real-time traffic information, large-scale data are collected in real time and analyzed to ensure the timely release of traffic information. Using a general process to deal with these data, PCD will encounter great bottlenecks, thus the real-time data cannot be obtained, and there is no way to use a more sophisticated processing algorithm to improve the accuracy of the processing results. MapReduce adopts a distributed parallel computing model and makes it easy to implement parallel computing, load balancing, and other excellent properties [27][28][29][30], thus it is very suitable for big data mining. However, due to the close spatial correlation characteristic of the probe-car track data, the entire road network of the electronic map data needs to be read for each computing node to handle the massive track, which increases the hardware requirements for each computing node and impairs computing performance.
To address the above-mentioned problems of spatial data mining on probe-car tracks, massive probe-car track data and map data need to be loaded. By combining road topology with map matching, strong correlations can be made between track and map data. When matching and analyzing the data, the data are not locally matched, but globally matched, so as to avoid the case of Figure 2. For high data throughput and limited computing performance, we propose a parallel processing algorithm to reduce the probe-car tracks, namely, two-step MapReduce technology in the spatial data model.
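The difference between local and global matching in Figure 2 can be illustrated with a toy example. This is a simplified sketch and not the track-matching algorithm of this paper: the two parallel roads, the track coordinates, and the use of a summed point-to-segment distance as the global criterion are all assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_locally(track, roads):
    """Assign every track point to its own nearest road."""
    return [
        min(roads, key=lambda name: point_segment_distance(p, *roads[name]))
        for p in track
    ]


def match_globally(track, roads):
    """Assign the whole track to the road with the smallest total distance."""
    return min(
        roads,
        key=lambda name: sum(point_segment_distance(p, *roads[name]) for p in track),
    )


# Two hypothetical parallel roads and a noisy track that truly follows road 2.
roads = {
    "road 1": ((0.0, 1.0), (10.0, 1.0)),
    "road 2": ((0.0, 0.0), (10.0, 0.0)),
}
track = [(1.0, 0.2), (3.0, 0.1), (5.0, 0.7), (6.0, 0.6), (8.0, 0.3), (9.0, 0.1)]

print(match_locally(track, roads))   # two noisy points are pulled to road 1
print(match_globally(track, roads))  # the track as a whole stays on road 2
```

In this toy case the two noisy points in the middle play the role of the part marked I in Figure 2a, while the whole-track criterion keeps them on road 2.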

Methodology
Two types of data were processed by different MapReduce algorithms to enable parallel computing of spatial data. A two-step MapReduce was conducted: (1) for space partitioning to analyze the relatively independent data, that is, the full tracks in one spatial division range that can be matched with the real road net, which is global matching; (2) for the cross-regional track processing to match with the road network of the electronic map in the designated cross-region.
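One way to picture the split between the two steps is a routine that assigns each track either to the single grid cell containing it or to a cross-regional set. The sketch below is an assumption-laden illustration: the function names, the (lat, lon) tuple representation, and the regular grid anchored at the origin are hypothetical, not details given in this paper.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, cell_lat, cell_lon):
    """Return the grid cell index containing a point, for cells of the given size in degrees."""
    return (math.floor(lat / cell_lat), math.floor(lon / cell_lon))


def split_tracks(tracks, cell_lat, cell_lon):
    """Separate tracks into per-cell groups (step 1) and cross-regional tracks (step 2)."""
    in_region = defaultdict(list)
    cross_region = []
    for track_id, points in tracks.items():
        cells = {cell_of(lat, lon, cell_lat, cell_lon) for lat, lon in points}
        if len(cells) == 1:
            in_region[next(iter(cells))].append(track_id)
        else:
            cross_region.append(track_id)
    return dict(in_region), cross_region


# Hypothetical tracks given as lists of (lat, lon) pairs.
tracks = {
    "a": [(35.1, 139.2), (35.3, 139.8)],
    "b": [(35.9, 139.9), (36.1, 140.1)],
}
in_region, cross_region = split_tracks(tracks, cell_lat=1.0, cell_lon=2.0)
print(in_region)     # tracks handled independently, one group per cell
print(cross_region)  # tracks deferred to the second step
```

Tracks in the first result can be matched using only the map data of their own cell, which is what removes the need to load the whole road network on every node.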

MapReduce Parallel Distributed Computing Model
Dean and Ghemawat proposed a MapReduce distributed computing model for the analysis of web log files [27]. The Hadoop project implemented this computational model, which used a cluster consisting of thousands of computers to analyze the massive server files. The MapReduce model is realized mainly through two functions: mapping (Map) and reduction (Reduce). The main process [31] is shown in Figure 3. Google's MapReduce programming model serves to process large datasets in a massively parallel manner (subject to a MapReduce implementation). The programming model is based on the following simple concepts: (i) iteration over the input; (ii) computation of key/value pairs from each piece of input; (iii) grouping of all intermediate values by key; (iv) iteration over the resulting groups; and (v) reduction of each group.
The model is stunningly simple and effectively supports parallelism. The programmer may abstract from the issues of distributed and parallel programming, because MapReduce implementation takes care of load balancing, network performance, fault tolerance, and so forth. The seminal MapReduce paper [28,32,33] described one possible implementation model based on large networked clusters of commodity machines with local storage. The programming model may appear restrictive, but it provides a good fit for many problems encountered in the practice of processing large datasets. Additionally, expressiveness limitations may be alleviated by decomposing problems into multiple MapReduce computations or by escaping to other (less restrictive, but more demanding) programming models for subproblems.
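The five concepts (i) to (v) can be written down in a few lines of sequential code. The sketch below is a single-process illustration of the programming model only; it is not Hadoop and provides none of the distribution, load balancing, or fault tolerance discussed above. The function name map_reduce and the road-speed example data are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run a MapReduce computation sequentially in one process."""
    groups = defaultdict(list)
    for record in records:                  # (i) iterate over the input
        for key, value in map_fn(record):   # (ii) compute key/value pairs
            groups[key].append(value)       # (iii) group intermediate values by key
    # (iv) iterate over the groups and (v) reduce each group
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_speed(record):
    """Map: emit (road_id, speed) for one matched sample."""
    road_id, speed = record
    yield road_id, speed


def mean_speed(road_id, speeds):
    """Reduce: average all speeds observed on one road."""
    return sum(speeds) / len(speeds)


samples = [("r1", 40.0), ("r2", 20.0), ("r1", 60.0)]
print(map_reduce(samples, emit_speed, mean_speed))
```

Because each key is reduced independently, steps (iv) and (v) are the part that a real implementation can spread over many machines.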

MapReduce Method for Data Mining in Probe-Car Tracks
The size division is based on the distribution and density of the trajectory. After dividing the total area into the target areas as described above, a nested MapReduce approach can be used to achieve spatial data mining for the probe-car tracks. The implementation is described as follows:
(1) Level-1 Map function design. The Level-1 Map function is mainly responsible for dealing with the track in a small designated area of the probe-car, and the algorithm flow is shown in Figure 4.
The pseudocode is as follows: Enter the range A1 to be processed;
(2) Level-1 Reduce function design. The Level-1 Reduce function is responsible for writing the results processed for the probe-car tracks within the designated area to the master server. The algorithm is as follows:
Step 1: Write the correspondence relation for matching the local probe-car track and the electronic road map to the master server.
Step 2: Write the local records of probe-car tracks that do not match any road in the electronic map to the master server.
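The Level-1 Map and Reduce steps can be sketched as a pair of functions. Since the Level-1 pseudocode is not reproduced in full here, the sketch below is a guess at the overall shape rather than the authors' implementation: level1_map, level1_reduce, toy_load_roads, and toy_match are hypothetical names, the matcher is a deliberately crude stand-in, and a plain dictionary plays the role of the master server.

```python
def level1_map(cell_id, tracks, load_roads, match_track):
    """Match every track of one cell against that cell's road data only."""
    roads = load_roads(cell_id)  # only the designated area is loaded into memory
    for track_id, points in tracks:
        road_id = match_track(points, roads)
        if road_id is None:
            yield "unmatched", (cell_id, track_id)
        else:
            yield "matched", (cell_id, track_id, road_id)


def level1_reduce(emitted, master_store):
    """Step 1 and Step 2: write matched and unmatched records to the master store."""
    for key, value in emitted:
        master_store.setdefault(key, []).append(value)
    return master_store


def toy_load_roads(cell_id):
    """Hypothetical road data: road id -> latitude of an east-west road."""
    return {"r1": 35.10, "r2": 35.50}


def toy_match(points, roads, max_offset=0.05):
    """Crude stand-in matcher: nearest road by mean latitude, or None if too far."""
    mean_lat = sum(lat for lat, _ in points) / len(points)
    best = min(roads, key=lambda r: abs(roads[r] - mean_lat))
    return best if abs(roads[best] - mean_lat) <= max_offset else None


cell_tracks = [
    ("t1", [(35.11, 139.1), (35.09, 139.2)]),
    ("t2", [(35.30, 139.1), (35.31, 139.2)]),
]
store = level1_reduce(level1_map((0, 0), cell_tracks, toy_load_roads, toy_match), {})
print(store)
```

Keeping the unmatched records, as in Step 2, preserves the tracks that may later indicate roads missing from the electronic map.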
(3) Level-2 Map function design. The Level-2 Map function is mainly responsible for processing the probe-car tracks that cross the boundary of the currently designated small area. The algorithm is shown in Figure 5. The pseudocode is as follows:
Enter the range A1 to be processed;
Enter the set D of the cross-regional tracks to be processed;
Read the latitude and longitude range A2 in D;
A3 = A1 + A2;
M = LoadData(A3); // Load the map data of A3 into memory
While (D != NULL) {
The PCD parallel processing system consists of the collection server software, data preprocessor program, server-side program, node calculation program, database storage, and other components. The PCD acquisition server receives real-time transmissions of a large number of PCDs, establishes a database, and saves all the original PCD collected by the system. The collection server has a high-speed connection to the master server and is convenient for data to be transported to the processing center quickly.
The preprocessing program includes two parts. First, the map data are preprocessed, generating map grid information. This part belongs to the offline calculations, which are only done once. Second, each of the real-time PCD records obtained is preprocessed, which removes obviously invalid records caused by equipment failure.
The server program runs on the master server KD50, which is responsible for the PCD tasks of decomposition, scheduling, and consolidating the results. The node program on each computing node allocates computing resources in accordance with the instructions from the server program. In addition, the server program needs to initialize the system, load the configuration file, map file, and historic mining information, initiate node shutdown, and so on. A node program is deployed on each computing unit of KD50 and completes the map-matching data-processing tasks distributed by the master server. KD50 has many computing units. After task scheduling by the master server, multiple node programs run processing tasks on the computing units. Thus, basic PCD parallel processing is achieved.
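The division of labour between the server program and the node programs (decomposition, scheduling, and consolidation) can be mimicked on one machine with a worker pool. This is only a schematic sketch: a thread pool stands in for the KD50 computing units, and run_master and count_tracks are hypothetical names with a placeholder in place of real map matching.

```python
from concurrent.futures import ThreadPoolExecutor


def count_tracks(task):
    """Placeholder node program: a real one would map-match the tracks of the cell."""
    cell_id, track_ids = task
    return cell_id, len(track_ids)


def run_master(tasks, node_program, workers=4):
    """Schedule one task per cell on the worker pool and consolidate the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(node_program, tasks))
    return dict(results)


# Decomposition: one task per grid cell (hypothetical cell ids and track ids).
tasks = [((0, 0), ["a", "b"]), ((0, 1), ["c"]), ((1, 0), [])]
print(run_master(tasks, count_tracks))
```

Because each task carries only the tracks of one cell, the tasks are independent and the master only needs to merge the returned dictionaries.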

Results
Based on the methods described in the paper, the experimental verification is conducted on the probe-car tracks collected during one month in Japan. The experimental data are the long-distance freight data shown in Table 2. Table 3 shows the PCD sample data format. The probe-car tracks are composed of the PCD data in Table 3, and are shown on the map in Figure 6. The different distributions and densities of the trajectories lead to different division sizes. The experiment for the different sizes of the divided region (m × n) is shown in Figure 7. Four machines process the data using the proposed distributed parallel processing mode based on MapReduce. As shown in Figure 7, the least time is used for 2° × 1°20′, where the percentage of the cross-regional trajectories is 23.1%. The most time is used for 12° × 8°, where the percentage of the cross-regional trajectories is 5.3%. The size division for m and n is based on the actual situation, such as the length, distribution, and density of the trajectory. Generally, the region should not be too small, because the benefit of MapReduce is lost when the proportion of cross-regional tracks is significant. Nor should the region be too large, because memory and computing resources are limited.
Then the target area is divided into n° × m° small regions (n and m are positive numbers) according to the following principles, ensuring that most tracks (70-80%) are divided into no more than one small lattice:
(1) n and m should not be too small, or the proportion of the tracks across the regions will be very significant, which cannot reflect the effect of MapReduce. n and m should be chosen so that the cross-regional trajectory ratio is kept around 25%. m × n is limited by memory limitations.
(2) n and m should not be too large. If m × n is larger, consumption memory and computing resources will be greater. The maximum of m × n is limited by the resources a single computing unit can provide.
(3) Within a single computing unit resource, it is not necessarily good to have larger m × n values, but it would be better to have a greater range for m × n and a lower marginal effect of the inter-regional track proportion. To meet the conditions in which the cross-regional track ratio is below a certain range, the smaller m × n, the better.
In general, n and m are preferably larger than 1. The size of the divided region can be adjusted according to the number of working machines.
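The division principles above can be explored with a small script that assigns each track to grid cells and reports the cross-regional ratio for a candidate cell size. This is a minimal sketch, not the authors' implementation: it assumes that the first cell dimension is measured along longitude and the second along latitude, that a track is a list of (longitude, latitude) points, and that a track counts as cross-regional when its sample points fall in more than one cell. The names `cell_of`, `partition_tracks`, and `cross_regional_ratio` and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (longitude, latitude) in degrees (assumed layout)
Track = List[Point]
Cell = Tuple[int, int]        # (column, row) index of a grid cell


def cell_of(point: Point, n_deg: float, m_deg: float) -> Cell:
    """Return the index of the n_deg x m_deg grid cell that contains a point."""
    lon, lat = point
    return (math.floor(lon / n_deg), math.floor(lat / m_deg))


def partition_tracks(
    tracks: List[Track], n_deg: float, m_deg: float
) -> Tuple[Dict[Cell, List[Track]], List[Track]]:
    """Split tracks into per-cell groups and a list of cross-regional tracks."""
    per_cell: Dict[Cell, List[Track]] = {}
    cross: List[Track] = []
    for track in tracks:
        cells = {cell_of(p, n_deg, m_deg) for p in track}
        if len(cells) == 1:
            per_cell.setdefault(next(iter(cells)), []).append(track)
        else:
            cross.append(track)
    return per_cell, cross


def cross_regional_ratio(tracks: List[Track], n_deg: float, m_deg: float) -> float:
    """Fraction of tracks whose points fall in more than one cell."""
    if not tracks:
        return 0.0
    _, cross = partition_tracks(tracks, n_deg, m_deg)
    return len(cross) / len(tracks)


# Made-up sample tracks, for illustration only.
sample_tracks: List[Track] = [
    [(139.1, 35.1), (139.5, 35.3)],   # stays inside one cell
    [(139.9, 35.1), (140.2, 35.2)],   # crosses a longitude boundary
    [(136.5, 34.9), (136.7, 35.0)],   # stays inside one cell
    [(135.2, 34.5), (135.4, 34.8)],   # crosses a latitude boundary
]

# 2 deg x 1 deg 20 min, the best-performing division reported in Figure 7.
ratio = cross_regional_ratio(sample_tracks, 2.0, 1.0 + 20.0 / 60.0)
print(f"cross-regional ratio: {ratio:.2f}")
```

Running the same function over several candidate sizes gives a quick way to check which divisions keep the ratio near the 25% target before committing cluster time.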
The traditional single server and the approach proposed in this paper were each used to match the probe-car tracks to the road data in the electronic map, using the track-matching algorithm based on spatial semantic features. In the MapReduce model, the area was divided into regions of 2° × 1°20′. The test results are shown in Table 4. The traditional single server takes about 10.5 h. With the proposed distributed parallel processing mode based on MapReduce, the processing time falls to 2.5 h on 4 machines and to 2.1 h on 8 machines. The distributed parallel processing based on MapReduce is therefore roughly four to five times faster than the single server.
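The reported times can be turned into speedup and parallel efficiency figures with a few lines of arithmetic. The sketch below uses only the three times quoted from Table 4. The function names are hypothetical, and efficiency is defined here in the usual way as speedup divided by the number of machines, which is an assumption rather than a metric used in the source.

```python
def speedup(single_hours: float, parallel_hours: float) -> float:
    """Ratio of single-server time to parallel time."""
    return single_hours / parallel_hours


def efficiency(single_hours: float, parallel_hours: float, machines: int) -> float:
    """Speedup per machine (1.0 would be ideal linear scaling)."""
    return speedup(single_hours, parallel_hours) / machines


SINGLE_SERVER_HOURS = 10.5            # "about 10.5 h" in the text
PARALLEL_HOURS = {4: 2.5, 8: 2.1}     # machines -> hours, from the text

for machines, hours in PARALLEL_HOURS.items():
    s = speedup(SINGLE_SERVER_HOURS, hours)
    e = efficiency(SINGLE_SERVER_HOURS, hours, machines)
    print(f"{machines} machines: speedup {s:.1f}x, efficiency {e:.2f}")
```

With these inputs the speedup is about 4.2 on 4 machines and about 5.0 on 8 machines, so doubling the machine count yields only a modest further gain. The single-server time is given only approximately, so these ratios are indicative rather than exact.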

Conclusions
This paper discusses the main issues in spatial data mining of probe-car tracks and presents the two-time MapReduce technology in a spatial data model to solve map matching for the massive trajectory data of floating vehicles, which are strongly correlated in space. Two types of data are processed by the MapReduce algorithms to enable parallel spatial data computing. The first MapReduce performs the spatial region partitioning and analyzes relatively independent data, that is, full tracks, as a global matching step. The second MapReduce processes the cross-regional tracks. The experiments confirmed that MapReduce technology can be successfully used in this data-mining task. Using distributed parallel computing for spatial data mining, the computing performance is significantly improved over the traditional single-server approach, and the hardware requirements are reduced. With the proposed MapReduce model, the master server can better balance the tasks of the working hosts, achieving better load balancing and better stability. Based on the two-time MapReduce technology, probe-car track processing is divided into spatial region processing and track processing, which addresses the strong spatial correlation of the map data and achieves parallel processing of the probe-car track reduction.
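The two-pass structure summarized above can be illustrated with an in-memory stand-in for a MapReduce job. This is a sketch under stated assumptions and not the authors' system: `map_reduce` only imitates the map, shuffle, and reduce phases in a single process; tracks are dictionaries with hypothetical `id` and `points` fields; the cell size is the 2° × 1°20′ division from the experiments; and `match_in_region` is a placeholder where the real track-matching algorithm would run. A track's cells are derived from its sample points only, which is a simplification.

```python
import math
from collections import defaultdict

N_DEG, M_DEG = 2.0, 4.0 / 3.0   # 2 deg x 1 deg 20 min cells (assumed lon x lat)


def map_reduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def cells_of(track):
    """Set of grid cells touched by the track's sample points."""
    return frozenset(
        (math.floor(lon / N_DEG), math.floor(lat / M_DEG))
        for lon, lat in track["points"]
    )


def match_in_region(region_key, tracks):
    """Placeholder for map matching; returns (track id, region) pairs."""
    return [(t["id"], region_key) for t in tracks]


def first_mapper(track):
    """Pass 1: key single-cell tracks by cell, defer the rest."""
    cells = cells_of(track)
    yield (next(iter(cells)) if len(cells) == 1 else "cross"), track


def first_reducer(key, tracks):
    return tracks if key == "cross" else match_in_region(key, tracks)


def second_mapper(track):
    """Pass 2: key cross-regional tracks by the set of cells they touch."""
    yield tuple(sorted(cells_of(track))), track


def run_two_pass(tracks):
    first = map_reduce(tracks, first_mapper, first_reducer)
    cross = first.pop("cross", [])
    second = map_reduce(cross, second_mapper, match_in_region)
    matched = [m for result in first.values() for m in result]
    matched += [m for result in second.values() for m in result]
    return matched


# Made-up tracks, for illustration only.
tracks = [
    {"id": "a", "points": [(139.1, 35.1), (139.5, 35.3)]},
    {"id": "b", "points": [(139.9, 35.1), (140.2, 35.2)]},
]
for track_id, region in run_two_pass(tracks):
    print(track_id, region)
```

Keying the second pass by the combined set of cells means each reducer would need the map data for all of those cells at once, which mirrors the merged range used for cross-region tracks.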
Building on the parallel MapReduce algorithms, further research should improve the probe-car track matching and reduction algorithms, so as to make better use of parallel computing and enhance the performance of massive probe-car track registration. Based on the probe-car data and the general spatial data-mining model, we will match floating-vehicle trajectory data against specific electronic maps. According to the matching information, the tracks are divided into matched and unmatched track information. From the unmatched track information, new roads, accident-prone areas, and parking points of interest (POIs) can be discovered. From the matched track information, abandoned roads and shape changes of roads and other map elements can be inferred. Extracting lane paths from vehicle tracks with the parallel MapReduce algorithms is the future research issue that emerges from our approach, as shown in Figures 8 and 9. By combining the matching information with the spatial data-mining model and artificial intelligence, lane paths will be extracted from the vehicle tracks, which will lay the foundation for the acquisition of high-precision maps.

Figure 1. Technical architecture of a probe-car system.
A2 = A1 + 0.1°;          // Extend the range A1 as A2
M = LoadData(A2);        // Load the map data of the range A2 into memory
Enter the track collection D
While (D != NULL) {
    P = Read(D);         // Read one track in M in order
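The range-extension step at the start of this workflow can be sketched as follows. The sketch assumes that a range is an axis-aligned longitude/latitude box, that "A1 + 0.1°" means growing the box by 0.1° on every side, and that loading map data amounts to keeping the road nodes inside the box. The `Range` class, the `load_data` helper, and the sample coordinates are hypothetical; they are not part of the original figure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Range:
    """Axis-aligned box in degrees (assumed representation of A1, A2)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def expand(self, margin_deg: float) -> "Range":
        """Grow the box by margin_deg on every side."""
        return Range(
            self.min_lon - margin_deg,
            self.min_lat - margin_deg,
            self.max_lon + margin_deg,
            self.max_lat + margin_deg,
        )

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def load_data(nodes: Iterable[Tuple[float, float]], area: Range) -> List[Tuple[float, float]]:
    """Hypothetical stand-in for LoadData: keep road nodes inside the area."""
    return [(lon, lat) for lon, lat in nodes if area.contains(lon, lat)]


a1 = Range(138.0, 34.0, 140.0, 36.0)      # made-up region
a2 = a1.expand(0.1)                       # A2 = A1 + 0.1 deg
road_nodes = [(139.0, 35.0), (140.05, 35.0), (141.0, 35.0)]
print(len(load_data(road_nodes, a1)), len(load_data(road_nodes, a2)))
```

The margin presumably lets tracks that run close to a region border be matched against roads lying just outside it, although the figure itself does not state the reason.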

Figure 4. Workflow for the first map algorithm.
Read the latitude and longitude range A2 in D;
A3 = A1 + A2;
M = LoadData(A3);        // Load the map data of A3 into memory
While (D != NULL) {
    P = Read(D);         // Read one of the cross-region tracks in M in order
    Match track P with the electronic map M;
    If (matched) {
        log(P and M match relations);
    }
}
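The pseudocode above can be sketched in runnable form as below. The sketch interprets "A3 = A1 + A2" as the smallest box covering both ranges, which is an assumption; the original does not define the operation. The map loader and the matcher are passed in as callables because their real implementations are not shown in the source, and the stubs used in the example are purely illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Range = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)
Point = Tuple[float, float]
Track = Sequence[Point]


def union_range(a: Range, b: Range) -> Range:
    """Smallest box covering both ranges (assumed meaning of A1 + A2)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def match_cross_region_tracks(
    tracks: Sequence[Track],
    a1: Range,
    a2: Range,
    load_map: Callable[[Range], object],
    match: Callable[[Track, object], Optional[str]],
) -> List[Tuple[int, str]]:
    """Load the merged map once, then log every track that matches a road."""
    a3 = union_range(a1, a2)
    road_map = load_map(a3)
    log: List[Tuple[int, str]] = []
    for index, track in enumerate(tracks):
        road = match(track, road_map)
        if road is not None:
            log.append((index, road))
    return log


# Illustrative stubs; a real system would read map tiles and run map matching.
def stub_load_map(area: Range) -> dict:
    return {"area": area}


def stub_match(track: Track, road_map: object) -> Optional[str]:
    return "road-1" if len(track) > 1 else None


a1 = (138.0, 34.0, 140.0, 35.0)
a2 = (140.0, 34.0, 142.0, 35.0)
cross_tracks = [[(139.9, 34.5), (140.2, 34.6)], [(139.9, 34.5)]]
print(match_cross_region_tracks(cross_tracks, a1, a2, stub_load_map, stub_match))
```

Loading the merged map once per group of cross-region tracks, rather than once per track, keeps the memory cost bounded by the size of A3.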

Figure 7. Different sizes of the divided region (m × n).

Table 1. Research and applications of intelligent transport systems. AATCC, Automobile Association Traffic Control Centre; FHA, Federal Highway Administration; ITA, Illinois Transportation Authority; GPS, Global Positioning System; VICS, Vehicle Information and Communication System.