A Data-Driven Feature Based Learning Application to Detect Freeway Segment Traffic Status Using Mobile Phone Data

Liu, Qiang; Xie, Jianguang; Ding, Fan

doi:10.3390/su13137131


by Qiang Liu ¹,², Jianguang Xie ¹ and Fan Ding ³,*

¹ College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

² Jiangsu Sinoroad Engineering Research Institute Co., Ltd., Nanjing 211008, China

³ School of Transportation, Southeast University, Nanjing 211189, China

* Author to whom correspondence should be addressed.

Sustainability 2021, 13(13), 7131; https://doi.org/10.3390/su13137131

Submission received: 8 May 2021 / Revised: 1 June 2021 / Accepted: 2 June 2021 / Published: 25 June 2021

(This article belongs to the Section Sustainable Transportation)


Abstract

With the main body of the freeway network now built, adequately monitoring its traffic status has become an urgent need for both travelers and transportation operators. Various methods have been proposed to collect traffic information for this purpose. In this article, a data-driven, feature-based learning application is implemented to detect segment traffic status using mobile phone data, building on the practical success of deep learning models in other fields. Traffic status is estimated with a three-level long short-term memory (LSTM) model. Two phone features are extracted from the raw mobile phone data. A large-scale field experiment was conducted using actual data collected in Jiangsu, China, over the “National Day Golden Week” of 2014. To evaluate performance, precision and recall scores are given along with the overall accuracy. The results of the large-scale experiment indicate that the proposed application performed well and can be an emerging solution for traffic state monitoring when only limited roadside sensing devices are installed.

Keywords:

mobile phone data; deep learning model; feature extraction; traffic status detection

1. Introduction

With the main body of the freeway network now built, the ability to fully monitor the traffic status of the network has become an urgent need for both travelers and transportation operators [1,2,3]. A system that could fully monitor the traffic status would also provide the opportunity for further active traffic control and management. Consequently, smart roads, which can communicate with passing vehicles, provide information, and achieve real-time monitoring, have developed rapidly and recently drawn considerable public attention [4]. Such needs and opportunities are pressing for busy freeways equipped with only limited monitoring devices. Given the time and cost of deploying a monitoring system based on conventional solutions (e.g., loop detectors and microwave detectors), new technologies that focus on emerging big data resources and data mining approaches through machine learning have been introduced in recent decades [5,6].

Data generated by wireless communication networks, also known as cellular data, have been used to identify transportation characteristics such as travel demand and traffic details. Thanks to the broad coverage and high penetration rate of communication networks, especially since the internet has gone mobile, mobile phone data have revealed patterns of individual human mobility [7,8]. Studies have also indicated that mobile phone location data are a feasible resource for estimating origin-destination flows and travel demand [9,10,11]. These solutions have demonstrated the potential of cellular data for large-scale problems that do not require extreme positioning accuracy.

From a big data perspective, a data-driven learning application, which can reveal high-dimensional relations and reduce background noise, is proposed to estimate traffic status using the whole set of mobile phone data rather than only handover (HO) data. Two features are extracted from cellular activity data to replace the monitoring parameters generated from HO data, and these features are normalized to partially reduce the background noise. Furthermore, a large-scale experiment was conducted on a busy, 250 km expressway in China. The results were validated by ten bi-directional microwave detector stations. To the best of our knowledge, it was the first experiment with such extensive coverage and so many validation points.

The rest of this paper is organized as follows. In Section 2, a general overview of previous, related studies is provided. Section 3 introduces the composition of the mobile phone data and the procedure of generating two proposed phone features. In Section 4, the learning models applied in this study are presented in detail. The following section, Section 5, presents the experiment and the performance evaluations of this research. Finally, Section 6 concludes that the learning application, based on cellular activity data, performs well in large-scale traffic status estimation.

2. Literature Review

Unlike GPS data, cellular data usually do not contain precise locations, which is a major obstacle to using them to estimate or predict short-term traffic details. To locate a phone in a communication network accurately, researchers introduced the handover (HO) mechanism, also known as handoff, which occurs mostly when a phone has an active call and crosses the boundary between two cells [12]. Using genuine cellular network HO data, Bar-Gera [13] discussed the possibility of implementing a mobile phone-based system for estimating traffic speeds and travel times. Demissie et al. [14] studied the relation between traffic volumes and the number of HOs over the hours of a day. Evaluations confirmed the positive results and the promising future of HO-based traffic estimation methods [13,15].

However, real communication data always include a massive number of unexpected and non-deterministic errors and interferences, which can affect performance enormously. Therefore, some researchers tried to fuse HO data with conventional data from on-road devices, such as loop detectors or microwave detectors, to estimate traffic speed [16] and to analyze the performance of the integrated system [17]. Other researchers have focused on applying methodologies from different perspectives to reduce the impact of system noise. Cheng et al. [18] proposed particle filter-based models to cope with system noise and measurement errors. Demissie et al. [14] utilized a single-layer backpropagation artificial neural network (BP-ANN) to estimate three discrete levels of arbitrarily defined traffic counts, taking into account the noise and sensitivity of HO data. Besides implementing complex statistical and machine learning models, Gundlegård and Karlsson [19] showed that multiple road test runs would help to determine relatively accurate HO locations, which can then be mapped to the road network. Nevertheless, multiple test runs for each HO point would increase the deployment cost of a system for a large-scale freeway network.

Besides the problem of noise, the sample size of HO data is another essential attribute that affects the performance of a phone-based system, much the same as for other data sources [20]. Because HO data are generated only when a phone is on a call and simultaneously crosses the boundary between two cells, they make up only a small portion of the whole mobile phone dataset. To increase the sample size, Caceres et al. [21] proposed models to estimate traffic flow using in-motion phone data, which are derived by combining HO data with the data of phones making two consecutive calls in two cells. However, such an increase might not be significant: many countries forbid the use of mobile phones while driving, phone conversations made while driving typically do not last long enough to pass through two cells, and the HO points adjacent to a large-scale freeway could be sparse. Further literature on applying mobile phone data to estimate traffic measures can be found in references [22,23,24,25].

3. Cellular Activity Data and Extracted Features

The whole mobile phone dataset includes both network signals, such as location updates (LU) and handovers (HO), and billing records, such as phone call started, phone call ended, text messaging, and data service. Previous studies have usually focused on estimating traffic status using HO data [13,14,18,20]. In the cellular data of this study, raw HO records made up only 12.7% of the total dataset. Intuitively, the share of valid and useful HO records is even smaller after filtering and processing. To enlarge the sample size, Caceres et al. [21] proposed combining HO data with the data of phones making two consecutive calls in two cells as the model input. However, records of phones with active calls account for only 10.1%, which still cannot take full advantage of the entire cellular dataset. Regarding the scale issue [14], that is, the fact that the relation between traffic and the number of phones occurs at different scales for different locations, the intuitive idea for large-scale freeway traffic monitoring is to involve as many valid phones as possible in estimating traffic status. Therefore, instead of only involving phones with consecutive calls in two cells, phones that had consecutive records within a relatively short duration in two cells close to the target freeway were also counted. In this way, the potential dataset was enlarged to cover almost the entire raw cellular activity data. Subsequently, Ding et al. [25] established a method to extract two phone features, i.e., phone speed (PS) and phone count (PC).

In this method, the base station, which is the center of each cell in the communication network, is mapped to the freeway based on its projection point and assigned an accumulative mileage computed from the route origin, which gives each base station a unique mileage. Because each raw data record contains only the base station identification, the projection point of the corresponding base station is used to position the phone and is represented as $mi_{record}$. The flowchart of the phone speed feature extraction for a certain phone, $p$, in an interval $t$ of a certain segment is shown in Figure 1.

As shown in Figure 1, $re_{p,t}$ and $re_{p,t-1}$ represent all raw mobile phone data records of a certain phone, $p$, during time intervals $t$ and $t-1$ in the target segment, respectively.

Phone $p$ was defined as a valid phone if the mileage difference $\Delta mi_{p,t}$ between its first and last records (in chronological order) during interval $t$ in $re_{p,t}$, i.e., between $mi_{first(re_{p,t})}$ and $mi_{last(re_{p,t})}$, was not equal to zero. In addition, the movement direction of the phone could be determined from the sign of $\Delta mi_{p,t}$. The phone speed of phone $p$ was computed as the ratio of the absolute value of $\Delta mi_{p,t}$ to the time difference between the first and last records of phone $p$ during interval $t$, $\Delta ts_{p,t}$.

On the other hand, if $\Delta mi_{p,t}$ was equal to zero, the method retrieved the first record of phone $p$ in its previous interval, i.e., $mi_{first(re_{p,t-1})}$, and then used the abovementioned algorithm to calculate the phone speed again.

However, if $\Delta mi_{p,t}$ matched neither of the two cases mentioned above, phone $p$ was defined as invalid and filtered out.
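The per-phone procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the record layout `(timestamp in seconds, mileage in km)`, the function name `phone_speed`, and the sign convention (last mileage minus first mileage) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record layout: (timestamp in seconds, mileage in km).
Record = Tuple[float, float]


def phone_speed(
    re_t: Sequence[Record], re_prev: Sequence[Record]
) -> Optional[Tuple[float, int]]:
    """Return (speed in km/h, direction sign) for one phone, or None if invalid."""
    if not re_t:
        return None
    records = sorted(re_t)  # chronological order within interval t
    first, last = records[0], records[-1]
    delta_mi = last[1] - first[1]  # assumed convention: last minus first
    if delta_mi == 0 and re_prev:
        # Fallback: restart from the first record of the previous interval.
        first = min(re_prev)
        delta_mi = last[1] - first[1]
    delta_ts = last[0] - first[0]
    if delta_mi == 0 or delta_ts <= 0:
        return None  # neither case applies: the phone is filtered out
    speed = abs(delta_mi) / (delta_ts / 3600.0)
    direction = 1 if delta_mi > 0 else -1
    return speed, direction
```

For instance, a phone seen at mileage 10 km and, 120 s later, at mileage 12 km would be assigned a speed of about 60 km/h in the increasing-mileage direction.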

As indicated by Ding et al. [25], the availability of the moving direction is the filter condition for the phones. Phones whose moving direction is available are counted, per direction, to form the second feature, the phone count. Finally, the phone speed of each segment $i$ over interval $t$ in a certain direction, $ps_t^i$, can be generated by calculating the mean value of the speed features of all valid phones in the corresponding direction and interval, and the phone count of each segment $i$ over interval $t$, $pc_t^i$, can be generated by counting the number of valid phones for each direction.
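As a rough sketch of this aggregation step, the snippet below groups valid phones by direction and returns the mean speed and the count for each direction. The input format, a list of `(speed, direction sign)` pairs for one segment and one interval, and the name `segment_features` are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def segment_features(
    valid_phones: Iterable[Tuple[float, int]]
) -> Dict[int, Tuple[float, int]]:
    """Map each direction sign to (ps, pc) for one segment and one interval."""
    speeds_by_direction: Dict[int, List[float]] = {}
    for speed, direction in valid_phones:
        speeds_by_direction.setdefault(direction, []).append(speed)
    # ps is the mean speed of valid phones; pc is how many there are.
    return {d: (mean(s), len(s)) for d, s in speeds_by_direction.items()}
```

With two phones moving in one direction at 60 and 80 km/h and one phone moving the other way at 30 km/h, the sketch yields a phone speed of 70 with a phone count of 2 for the first direction, and 30 with a count of 1 for the second.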

Due to the relaxed filter restriction, the scale issue, and the system noise, a scaling normalization method was also applied to the features to regularize the search space. The model described in the next section was applied not only to reveal the pattern for estimating traffic status, but also to further denoise the data.
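The text does not specify which scaling normalization was used. Purely as an illustration, and as an assumption rather than the authors' choice, a min-max scaling of one feature series to the range [0, 1] could look like this:

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale a feature series to [0, 1]; an assumed, illustrative normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Scaling each feature separately keeps phone speed and phone count, which live on very different numeric ranges, comparable as model inputs.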

4. Learning Models

Based on these two extracted features, learning models were implemented to estimate the traffic status. In recent years, neural networks have shown outstanding performance in practical contests [26]. Given the spatiotemporal nature of traffic status among segments, a particular recurrent neural network (RNN) architecture, the long short-term memory (LSTM), was implemented because it is capable of revealing the optimal time lags automatically. The detailed structure of an LSTM block is shown in Figure 2.

The LSTM was proposed by Hochreiter and Schmidhuber [27], and it was designed to avoid the gradient vanishing issue [28]. An LSTM block consists of a memory cell, $c$, and three gates: the input gate, $in$, the output gate, $o$, and the forget gate, $f$. The basic computation workflow of a block for interval $t$ can be expressed as follows:

$$in_t = \mathrm{sigmoid}\left(W_{in,x}\, x_t + W_{in,m}\, m_{t-1} + W_{in,c}\, c_{t-1} + b_{in}\right) \tag{1}$$

$$f_t = \mathrm{sigmoid}\left(W_{f,x}\, x_t + W_{f,m}\, m_{t-1} + W_{f,c}\, c_{t-1} + b_f\right) \tag{2}$$

$$z_t = \tanh\left(W_{z,x}\, x_t + W_{z,m}\, m_{t-1} + b_z\right) \tag{3}$$

$$c_t = f_t \odot c_{t-1} + in_t \odot z_t \tag{4}$$

$$o_t = \mathrm{sigmoid}\left(W_{o,x}\, x_t + W_{o,m}\, m_{t-1} + W_{o,c}\, c_t + b_o\right) \tag{5}$$

$$m_t = o_t \odot \tanh\left(c_t\right) \tag{6}$$

Here, $\odot$ denotes element-wise multiplication (Hadamard product), $x_t$ denotes the input of the block over interval $t$, $W$ denotes the weight matrices between corresponding vectors (e.g., $W_{in,x}$ represents the matrix of weights from the input, $x$, to the input gate, $in$, of a memory cell), $b$ represents the vector of bias values (e.g., $b_f$ represents the bias at the forget gate, $f$, of layer $k$), and $m_t$ denotes the output of the LSTM memory block over interval $t$. The activation function of all gates is the sigmoid function [29], and the tanh function is applied for other activations.
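To make Equations (1) to (6) concrete, the following is a minimal pure-Python sketch of one forward step of such a block. It is not the authors' code: the dictionary keys (for example `"in,x"`), the helper names, and the plain-list vector representation are assumptions picked for readability, and training is not covered.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Matrix-vector product W v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def hadamard(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v: Vector) -> Vector:
    return [math.tanh(a) for a in v]


def lstm_step(
    x_t: Vector, m_prev: Vector, c_prev: Vector,
    W: Dict[str, Matrix], b: Dict[str, Vector],
) -> Tuple[Vector, Vector]:
    """One forward step; returns (m_t, c_t). Keys of W and b are hypothetical."""
    in_t = sigmoid(add(matvec(W["in,x"], x_t), matvec(W["in,m"], m_prev),
                       matvec(W["in,c"], c_prev), b["in"]))      # Eq. (1)
    f_t = sigmoid(add(matvec(W["f,x"], x_t), matvec(W["f,m"], m_prev),
                      matvec(W["f,c"], c_prev), b["f"]))         # Eq. (2)
    z_t = tanh(add(matvec(W["z,x"], x_t), matvec(W["z,m"], m_prev),
                   b["z"]))                                      # Eq. (3)
    c_t = add(hadamard(f_t, c_prev), hadamard(in_t, z_t))        # Eq. (4)
    o_t = sigmoid(add(matvec(W["o,x"], x_t), matvec(W["o,m"], m_prev),
                      matvec(W["o,c"], c_t), b["o"]))            # Eq. (5)
    m_t = hadamard(o_t, tanh(c_t))                               # Eq. (6)
    return m_t, c_t
```

Note that the output gate in Equation (5) reads the updated cell state $c_t$, whereas the input and forget gates read $c_{t-1}$; the sketch follows the equations in that respect.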

With the trend of the model going “deeper” in the neural network field, studies have confirmed that a deep learning model could give an excellent performance on pattern recognition, classification, and prediction-related topics, though theoretical interpretability is an on-going task. Regarding the high background noise of the raw mobile phone data, and the advantage of figuring out the high dimensional relations among features, a three-level LSTM neural network (LSTM-3) was implemented for this study (Figure 3).

For each direction of segment $i$ over interval $t$, the input of layer $k$ is denoted as $x_t^{k,i}$, while the output is denoted as $y_t^{k,i}$. Specifically, the definitions are as follows:

$$x_t^{k,i} = \begin{cases} \begin{pmatrix} ps_t^i & pc_t^i \end{pmatrix}^{T}, & k = 1 \\ y_t^{k-1,i}, & \text{otherwise} \end{cases} \tag{7}$$

$$y_t^{k,i} = l\left(W_{y,m}^{k}\, m_t^{k,i} + b_y^{k}\right) \tag{8}$$

Here, $ps_t^i$ and $pc_t^i$ are the corresponding features of segment $i$ over interval $t$. $W_{y,m}^{k}$, $m_t^{k,i}$, and $b_y^{k}$ are the weight, the output vector of LSTM blocks, and the bias, respectively. A linear function, $l(\cdot)$, is used to aggregate attributes.
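The layer wiring in Equations (7) and (8) can be sketched independently of the cell internals. In the snippet below each layer is a hypothetical `(cell, readout)` pair: `cell(x, state)` stands in for the LSTM blocks of that layer and returns `(m, new_state)`, and `readout(m)` stands in for the linear map $l(W_{y,m}^{k} m + b_y^{k})$. These names and this interface are assumptions for illustration only.

```python
from typing import Any, Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical layer interface: (cell, readout).
Cell = Callable[[Vector, Any], Tuple[Vector, Any]]
Readout = Callable[[Vector], Vector]


def stack_forward(
    ps: float, pc: float,
    layers: Sequence[Tuple[Cell, Readout]],
    states: Sequence[Any],
) -> Tuple[Vector, List[Any]]:
    """Run one interval through the stacked layers; returns (y of last layer, new states)."""
    x = [ps, pc]                      # Eq. (7), k = 1: the two phone features
    new_states: List[Any] = []
    for (cell, readout), state in zip(layers, states):
        m, new_state = cell(x, state)
        y = readout(m)                # Eq. (8)
        new_states.append(new_state)
        x = y                         # Eq. (7), k > 1: feed y of layer k-1 forward
    return x, new_states
```

Passing three `(cell, readout)` pairs gives the three-level arrangement; the recurrent state of each layer is carried from one interval to the next by the caller.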

The final output was the traffic status, which classified traffic conditions into three levels: free flow (L0), transition mode (L1), and severe congestion (L2). Each level was assigned a unique color (Table 1). In this study, unlike the work of Demissie et al. [14], the levels were uniformly divided based on traffic speed limits rather than on volume. This is because volume alone cannot represent running conditions from an operational point of view: a low traffic volume could correspond to either free flow or severe congestion. The speed range of each level was defined arbitrarily by the local traffic operation center.
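The level definition in Table 1 amounts to a simple threshold rule, sketched below. The function name `traffic_level` is hypothetical, and the speed is assumed to be expressed in the same unit as the thresholds in Table 1 (presumably km/h, which the table does not state).

```python
def traffic_level(speed: float) -> str:
    """Map a speed to a traffic status level using the ranges in Table 1."""
    if speed >= 80:
        return "L0"  # free flow
    if speed > 40:
        return "L1"  # transition mode: strictly between 40 and 80
    return "L2"      # severe congestion: 40 or below
```

The boundary values follow the table: a speed of exactly 80 is free flow, and a speed of exactly 40 is severe congestion.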

Furthermore, a single layer feedforward neural network (FNN-1) and a single layer LSTM neural network (LSTM-1) were also applied to compare the classification performance on the time series data of phone features with the LSTM-3 model.

5. Case Study in Jiangsu, China

The testbed of this study is a busy, eight-lane freeway in Jiangsu, China [30]. It is divided evenly into 250 one-kilometer-long segments. Figure 4 shows the research area. The data collection, for both cellular activity data and microwave data, was conducted from 00:00 on 1 October 2014 to 23:59 on 7 October 2014, covering the whole “National Day Golden Week”. The microwave data were collected from roadside detector stations and aggregated into 5 min intervals by a local transportation operation center. As the detectors of station No. 4 were not fully functional, only data from the other ten stations were involved in this experiment for both directions. To reduce the impact of outliers and noise, the microwave data were first smoothed by the seasonal-trend decomposition (STL) procedure [31], then classified into three levels based on the speed (Table 1).

Based on the processed microwave data for the ten validation segments of the westbound direction over the “National Day Golden Week”, 19,366 intervals were labeled as L0, 203 intervals as L1, and the remaining 591 intervals as L2. For the eastbound direction, the numbers were 19,146 L0, 429 L1, and 585 L2. The three levels of traffic status were extremely unevenly distributed, which fit with expectations. However, this would distort the performance evaluation if only the overall classification accuracy were considered, as in other studies [14], since estimating all intervals as L0 would already achieve high accuracy. Therefore, two terms, precision and recall, were introduced to evaluate the performance of the learners for each traffic state level.
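The effect of this imbalance can be checked with a few lines of arithmetic using the interval counts quoted above. The helper name `majority_baseline_accuracy` is hypothetical; it returns the accuracy of a trivial classifier that labels every interval with the most frequent level.

```python
from typing import Dict

# Interval counts per level, as reported in the text for each direction.
counts = {
    "westbound": {"L0": 19366, "L1": 203, "L2": 591},
    "eastbound": {"L0": 19146, "L1": 429, "L2": 585},
}


def majority_baseline_accuracy(level_counts: Dict[str, int]) -> float:
    """Accuracy obtained by always predicting the most frequent level."""
    return max(level_counts.values()) / sum(level_counts.values())


for direction, level_counts in counts.items():
    print(direction, round(majority_baseline_accuracy(level_counts), 3))
```

Each direction totals 20,160 intervals, which is consistent with 7 days of 288 five-minute intervals at ten stations. Always predicting L0 would already reach roughly 96% accuracy westbound and 95% eastbound, which is why accuracy alone says little here.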

In similar topics, such as pattern recognition and binary classification, precision and recall are widely applied to measure relevance, particularly for imbalanced classification problems. Precision is the ratio of correctly estimated instances among the retrieved instances, while recall is the fraction of correctly estimated instances over the total number of actually observed, relevant instances. For a single class, Equations (9) and (10) use true positive (TP, actual true and estimated as true), false positive (FP, actual false and estimated as true), and false negative (FN, actual true and estimated as false) to define precision and recall.

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{10}$$
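As a worked example of Equations (9) and (10), the sketch below computes per-level precision and recall plus overall accuracy from a confusion matrix whose rows are observed levels and whose columns are predicted levels. The input is the FNN-1 westbound matrix from Table 2; the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_metrics(
    confusion: Sequence[Sequence[int]], labels: Sequence[str]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Rows are observed levels, columns are predicted levels."""
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics: Dict[str, Dict[str, float]] = {}
    for j, label in enumerate(labels):
        tp = confusion[j][j]
        fp = sum(confusion[i][j] for i in range(n)) - tp  # predicted j, observed other
        fn = sum(confusion[j]) - tp                       # observed j, predicted other
        metrics[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,  # Eq. (9)
            "recall": tp / (tp + fn) if tp + fn else 0.0,     # Eq. (10)
        }
    return metrics, correct / total


# FNN-1, westbound direction (Table 2).
fnn1_westbound: List[List[int]] = [
    [19027, 275, 64],
    [79, 102, 22],
    [59, 244, 288],
]
metrics, accuracy = per_class_metrics(fnn1_westbound, ["L0", "L1", "L2"])
```

Rounded to three decimals, the values reproduce the FNN-1 columns of Table 2, for example a precision of 0.164 and a recall of 0.502 for L1, with an overall accuracy of 0.963.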

Table 2 and Table 3 present the confusion matrices and the corresponding precision, recall, and overall accuracy for both directions and all three learners. After training, all learners could detect the three traffic status levels and performed especially well in detecting free flow status and severe congestion. Compared to the other two levels, none of the learners performed well on the transition mode (L1). The potential reasons for this specific phenomenon include:

1. The transition mode is a relatively unstable state and typically only lasts for a short period of time. Therefore, there are not enough samples to train and validate learning models, especially deep ones;
2. Current validation data were collected from microwave detectors, which report the instantaneous speed of vehicles. For some intervals, the change of levels defined by such speed could be short and intense (skipping L1) even though the aggregation and denoising processes were applied. However, the extracted features were generated from the average space-mean speed of each valid phone, so learners might incorrectly classify the traffic state due to this difference in sensitivity;
3. Though the detectors at the ten selected stations were functional during the research period, no individual microwave detector is identical to another, which means results could vary between detectors even when monitoring the same traffic condition. It would be better to involve a second validation data source to further verify the performance and correctness.

In general, the learning models could distinguish traffic states based on features extracted from cellular activity data. Furthermore, based on the overall accuracy, the three-level LSTM model showed the best performance for both directions, although the differences from the other two learners were not very significant. Nevertheless, this improvement was hard to achieve due to the complex high-dimensional feature relations. The deep model indeed learned to reveal the relationship between traffic states and extracted features. Figure 5 compares the levels according to microwave data (after the STL procedure) with the LSTM-3 model outputs for validation links with L1 and L2 conditions. As the study covered an entire week, and each interval was 5 min, pie charts were used to present the results of selected validation links. The left half circle of each chart indicates the traffic status levels based on traffic speed obtained from microwave detectors, and the right half represents the model outputs.

Intuitively, the pie charts in Figure 5 are mostly symmetrical. Therefore, the LSTM-3 outputs essentially match the traffic status as defined by the microwave data. There are some false alarms and some detection latency, which result from the nature of cellular activity data and the reasons mentioned earlier. Meanwhile, Figure 6 presents the final results of LSTM-3 covering the entire 250 km freeway over the research period. The diagram matches the practical understanding of bottlenecks on a freeway and the concept of spatiotemporal congestion propagation. According to the final results, the learning model is a promising method to detect three-level traffic states under extreme volumes when only limited roadside equipment is installed.

6. Summary and Future Works

This research presents a data-driven learning-based method to detect three level traffic states using a big data resource, specifically, cellular activity data. Two features, the phone count (PC) and the phone speed (PS), were extracted from the raw cellular activity data and considered as model inputs for learners. Three types of learners, a single layer feedforward neural network (FNN-1), a single layer LSTM network (LSTM-1), and a three-level LSTM (LSTM-3), were employed to indicate and verify the performance of learning models, especially the deep model, on traffic state estimation. The proposed method took advantage of the broad coverage of the wireless communication network and used features, rather than conventional handover and location update records, to monitor traffic states.

The case study of this research took place on a busy, 250 km expressway in China during the “National Day Golden Week” of 2014. The three traffic status levels were defined by the local traffic operation center. On the expressway, there were ten working microwave detector stations, which were selected as validation segments, and the ground truth levels were computed from the data collected by the microwave detectors. Because traffic states are not uniformly distributed, two evaluation parameters, precision and recall, as well as overall accuracy, were used to measure the estimation results. According to the overall accuracy, the deep model achieved the best outcomes for both directions. Meanwhile, a detailed spatiotemporal traffic state diagram was presented to show the final results, which indicates the promising performance of the proposed method for monitoring traffic details on freeways with limited conventional roadside detectors installed.

Although the results were promising, there is still room for improvement. Additional studies are needed, including examining the impact of the traffic state levels, specifically different numbers of levels and different ranges for each level. Future research could work on methods to detect the transition mode precisely and in detail. With further enhancements, the traffic monitoring solution based on cellular activity data should be a promising implementation that could cover large-scale freeways and be deployed easily, even with limited roadside equipment.

Author Contributions

Conceptualization, F.D.; Formal analysis, Q.L.; Methodology, Q.L., J.X. and F.D.; Resources, F.D.; Writing—original draft, Q.L.; Writing—review & editing, J.X. and F.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Fundamental Research Funds for the Central Universities (Grant No. 2242021R10039).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request due to restrictions e.g., privacy or ethical.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bucknell, C.; Herrera, J.C. A trade-off analysis between penetration rate and sampling frequency of mobile sensors in traffic state estimation. Transp. Res. Part C Emerg. Technol. 2014, 46, 132–150.
2. Herrera, J.C.; Work, D.B.; Herring, R.; Ban, X.; Jacobson, Q.; Bayen, A.M. Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment. Transp. Res. Part C Emerg. Technol. 2010, 18, 568–583.
3. Wang, W.; Jin, J.; Ran, B.; Guo, X. Large-scale freeway network traffic monitoring: A map-matching algorithm based on low-logging frequency GPS probe data. J. Intell. Transp. Syst. Technol. Plan. Oper. 2011, 15, 63–74.
4. Trubia, S.; Severino, A.; Curto, S.; Arena, F.; Pau, G. Smart Roads: An Overview of What Future Mobility Will Look Like. Infrastructures 2020, 5, 107.
5. Bachmann, C.; Roorda, M.J.; Abdulhai, B.; Moshiri, B. Fusing a bluetooth traffic monitoring system with loop detector data for improved freeway traffic speed estimation. J. Intell. Transp. Syst. Technol. Plan. Oper. 2013, 17, 152–164.
6. Tanikella, H.; Smith, B.L. An investigation of the application of stratified sampling in probe-based traffic-monitoring systems. J. Intell. Transp. Syst. Technol. Plan. Oper. 2010, 14, 83–94.
7. Calabrese, F.; Diao, M.; Di Lorenzo, G.; Ferreira, J.; Ratti, C. Understanding individual mobility patterns from urban sensing data: A mobile phone trace example. Transp. Res. Part C Emerg. Technol. 2013, 26, 301–313.
8. González, M.C.; Hidalgo, C.A.; Barabási, A.-L. Understanding individual human mobility patterns. Nature 2008, 453, 779–782.
9. Caceres, N.; Wideberg, J.P.; Benitez, F.G. Deriving origin–destination data from a mobile phone network. IET Intell. Transp. Syst. 2007, 1, 15.
10. Calabrese, F.; Di Lorenzo, G.; Liu, L.; Ratti, C. Estimating Origin-Destination Flows Using Mobile Phone Location Data. IEEE Pervasive Comput. 2011, 10, 36–44.
11. Wang, M.-H.; Schrock, S.D.; Vander Broek, N.; Mulinazzi, T. Estimating Dynamic Origin-Destination Data and Travel Demand Using Cell Phone Network Data. Int. J. Intell. Transp. Syst. Res. 2013, 11, 76–86.
12. Yilin, Z. Mobile phone location determination and its impact on intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 2000, 1, 55–64.
13. Bar-Gera, H. Evaluation of a cellular phone-based system for measurements of traffic speeds and travel times: A case study from Israel. Transp. Res. Part C Emerg. Technol. 2007, 15, 380–391.
14. Demissie, M.G.; de Almeida Correia, G.H.; Bento, C. Intelligent road traffic status detection system through cellular networks handover information: An exploratory study. Transp. Res. Part C Emerg. Technol. 2013, 32, 76–88.
15. Liu, H.X.; Danczyk, A.; Brewer, R.; Starr, R. Evaluation of Cell Phone Traffic Data in Minnesota. Transp. Res. Rec. 2008, 1–7.
16. He, S.; Zhang, J.; Cheng, Y.; Wan, X.; Ran, B. Freeway Multisensor Data Fusion Approach Integrating Data from Cellphone Probes and Fixed Sensors. J. Sens. 2016, 2016, 1–13.
17. Zhang, J.; He, S.; Wang, W.; Zhan, F. Accuracy Analysis of Freeway Traffic Speed Estimation Based on the Integration of Cellular Probe System and Loop Detectors. J. Intell. Transp. Syst. Technol. Plan. Oper. 2015, 19, 411–426.
18. Peng, C.; Zhijun, Q.; Bin, R. Particle filter based traffic state estimation using cell phone network data. In Proceedings of the 2006 IEEE Intelligent Transportation Systems Conference, Toronto, ON, Canada, 17–20 September 2006; pp. 1047–1052.
19. Gundlegard, D.; Karlsson, J.M. Handover location accuracy for travel time estimation in GSM and UMTS. IET Intell. Transp. Syst. 2009, 3, 87.
20. He, S.; Cheng, Y.; Zhong, G.; Ran, B. A data-driven study on the sample size of cellular handoff probe system. Adv. Mech. Eng. 2017, 9, 168781401769844.
21. Caceres, N.; Romero, L.M.; Benitez, F.G.; del Castillo, J.M. Traffic Flow Estimation Models Using Cellular Phone Data. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1430–1441.
22. Zhang, J.; Wang, F.-Y.; Wang, K.; Lin, W.-H.; Xu, X.; Chen, C. Data-Driven Intelligent Transportation Systems: A Survey. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1624–1639.
23. Caceres, N.; Wideberg, J.; Benitez, F.G. Review of traffic data estimations extracted from cellular networks. IET Intell. Transp. Syst. 2008, 2, 179.
24. Qiu, Z.; Cheng, P. State of the art and practice: Cellular probe technology applied in advanced traveler information system. In Proceedings of the Transportation Research Board 86th Annual Meeting, Washington, DC, USA, 21–25 January 2007.
25. Ding, F.; Zhang, Z.; Zhou, Y.; Xiaoxuan, C.; Bin, R. Large-Scale Full-Coverage Traffic Speed Estimation under Extreme Traffic Conditions Using a Big Data and Deep Learning Approach: Case Study in China. Transp. Eng. Part A Syst. 2019, 5, 145.
26. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
27. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
28. Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long Term Dependencies. In A Field Guide to Dynamical Recurrent Networks; Wiley-IEEE Press: Hoboken, NJ, USA, 2001.
29. Han, J.; Moraga, C. The Influence of the Sigmoid Function Parameters on the Speed of Backpropagation Learning; Springer: Berlin, Germany, 1995; pp. 195–201.
30. G42 Shanghai–Chengdu Expressway. Wikipedia [WWW Document], n.d. Available online: https://en.wikipedia.org/wiki/G42_Shanghai–Chengdu_Expressway (accessed on 31 December 2017).
31. Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. J. Off. Stat. 1990, 6, 3–33.

Figure 1. The flowchart of phone speed feature extraction.

Figure 2. Detailed structure of an LSTM block.

Figure 3. The structure of the proposed LSTM-3 network.

Figure 4. General view of the research area and locations of roadside detectors. (Note: background photo is from Google Maps).

Figure 5. Comparisons between traffic status levels according to microwave data and LSTM-3 outputs for selected validation links.

Figure 6. Final LSTM-3 results for all segments over the research period: (a) westbound direction of the testbed freeway; (b) eastbound direction of the testbed freeway.

Table 1. Definition of three traffic status levels.

Level	Description	Color	Speed Range (km/h)
L0	Free flow	Green	≥80
L1	Transition mode	Yellow	(40, 80)
L2	Severe congestion	Red	≤40
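
The thresholds in Table 1 can be read as a simple mapping from a segment speed to a status level. The following is a minimal sketch of that mapping, not the authors' implementation; the function name `classify_speed` is hypothetical, and speeds are assumed to be in km/h.

```python
import math


def classify_speed(speed: float) -> str:
    """Map a segment speed to a Table 1 status level.

    Assumption: the speed is expressed in km/h.
    """
    if math.isnan(speed) or speed < 0:
        raise ValueError("speed must be a non-negative number")
    if speed >= 80:
        return "L0"  # free flow (green)
    if speed > 40:
        return "L1"  # transition mode (yellow): strictly between 40 and 80
    return "L2"  # severe congestion (red): 40 or below


if __name__ == "__main__":
    for value in (95.0, 80.0, 60.0, 40.0, 12.5):
        print(value, classify_speed(value))
```

Both boundary values belong to the outer classes, so a speed of exactly 80 maps to L0 and a speed of exactly 40 maps to L2, matching the open interval (40, 80) given for L1.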

Table 2. Confusion matrix and performance of learners for the westbound direction.

Observed Level	FNN-1 Predicted			LSTM-1 Predicted			LSTM-3 Predicted
Observed Level	L0	L1	L2	L0	L1	L2	L0	L1	L2
L0	19,027	275	64	19,066	224	76	19,147	159	60
L1	79	102	22	84	102	17	94	77	32
L2	59	244	288	50	283	258	104	220	267
Performance
Precision	0.993	0.164	0.770	0.993	0.167	0.735	0.990	0.169	0.744
Recall	0.982	0.502	0.487	0.985	0.502	0.437	0.989	0.379	0.452
Overall Accuracy	0.963			0.964			0.967
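
The performance rows in Table 2 follow directly from the confusion matrix counts: precision divides each diagonal entry by its column (predicted) total, recall divides it by its row (observed) total, and overall accuracy divides the diagonal sum by the total number of samples. The sketch below recomputes the westbound LSTM-3 figures this way; the function and variable names are illustrative and not taken from the article.

```python
LEVELS = ("L0", "L1", "L2")

# Rows are observed levels, columns are predicted levels
# (LSTM-3, westbound direction, counts copied from Table 2).
LSTM3_WEST = [
    [19147, 159, 60],
    [94, 77, 32],
    [104, 220, 267],
]


def precision_recall_accuracy(cm):
    """Return per-class precision, per-class recall, and overall accuracy.

    Assumes every row and every column of the square matrix has a
    non-zero total; otherwise a ZeroDivisionError is raised.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Precision: correct predictions of a class / all predictions of that class.
    precision = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Recall: correct predictions of a class / all observations of that class.
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    return precision, recall, correct / total


precision, recall, accuracy = precision_recall_accuracy(LSTM3_WEST)
for level, p, r in zip(LEVELS, precision, recall):
    print(f"{level}: precision={p:.3f} recall={r:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

Rounded to three decimals, the output reproduces the LSTM-3 column of Table 2 (precision 0.990, 0.169, 0.744; recall 0.989, 0.379, 0.452; overall accuracy 0.967). The high overall accuracy is dominated by the L0 class, which accounts for 19,366 of the 20,160 westbound samples, which is why the per-class scores are reported alongside it.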

Table 3. Confusion matrix and performance of learners for the eastbound direction.

Observed Level	FNN-1 Predicted			LSTM-1 Predicted			LSTM-3 Predicted
Observed Level	L0	L1	L2	L0	L1	L2	L0	L1	L2
L0	18,653	464	29	18,813	294	39	18,860	241	45
L1	83	284	62	132	236	61	106	250	73
L2	19	164	402	24	165	396	38	114	433
Performance
Precision	0.995	0.311	0.815	0.992	0.340	0.798	0.992	0.413	0.786
Recall	0.974	0.662	0.687	0.983	0.550	0.677	0.985	0.583	0.740
Overall Accuracy	0.959			0.965			0.969

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

