Sensors
  • Article
  • Open Access

5 November 2018

Road Scene Simulation Based on Vehicle Sensors: An Intelligent Framework Using Random Walk Detection and Scene Stage Reconstruction

1 School of Software Engineering, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an 710049, Shaanxi, China
2 Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an 710049, Shaanxi, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Sensors Applications in Intelligent Vehicle

Abstract

Road scene model construction is an important aspect of intelligent transportation system research. This paper proposes an intelligent framework that can automatically construct road scene models from image sequences. The road and foreground regions are detected at superpixel level via a new kind of random walk algorithm. The seeds for different regions are initialized by trapezoids that are propagated from adjacent frames using optical flow information. The superpixel level region detection is implemented by the random walk algorithm, which is then refined by a fast two-cycle level set method. After this, scene stages can be specified according to a graph model of traffic elements. These then form the basis of 3D road scene models. Each technical component of the framework was evaluated and the results confirmed the effectiveness of the proposed approach.

1. Introduction

Road scene simulation and modeling based on vehicle sensors is currently an important research topic in the field of intelligent transportation systems [,]. The data acquired from vehicle-mounted sensors mainly include images, laser scans, and GPS measurements. The resulting road scene models can be applied to the simulation of photo-realistic traffic scenes.
A key use of such simulation is for the purposes of road safety, especially with regard to unmanned vehicles. Extensive tests and evaluations need to be performed to ensure the safety and robustness of unmanned vehicles. These mainly involve field tests and offline tests. Field tests are the most traditional way to proceed. Mcity at the University of Michigan and Google’s Castle facility in Atwater, California, have both been constructed for unmanned vehicle evaluation. To be effective, field tests have to be implemented on different types of roads, including urban roads, rural roads and highways. However, there are a number of drawbacks to using field tests, such as limited driving space, the time and effort involved in constructing test sites, and their significant monetary cost. As a result, offline tests have become an increasingly important complement to field-based evaluation.
Early studies using offline testing were designed around computer graphics methods. Typical applications here include PreScan from TNO (the Netherlands) [] and CarMaker from IPG (Germany) []. Recently, road scene modeling based on road image sequences has become popular, for instance in Google Street View [] or Microsoft Street Slide []. These systems provide users with an immersive touring experience while browsing a map. The Waymo team at Google (California, USA) has constructed a virtual scene platform from road image sequences that can be used to simulate billions of miles of virtual driving.
In this paper, we propose a new framework for road scene simulation from road image sequences. The specific application we have in mind for this framework is the offline testing of unmanned vehicles. The framework consists of two successive steps: (1) the detection of road and foreground regions; and (2) the construction of scene models. For the first step, we have developed a region detection method based on a superpixel-level random walk (RW) algorithm, which is further refined by a fast two-cycle (FTC) level set approach. For the second step, foreground and background models are constructed using scene stages. “Floor-wall” structured traffic scenes are then established.
The main contributions of this work can be summarized as follows:
  • The development of a novel region detection method using a superpixel RW algorithm. The superpixel features are computed using a combination of color, texture and location information. Region seeds for the RW algorithm are initialized through region-related trapezoids using an optical flow map. An FTC level set is then employed for region refinement.
  • The development of a new framework for constructing 3D scene models based on scene stages. The scene stages are specified according to the detected road and foreground regions. 3D scene models are then constructed on the basis of graph models of the scene stages.
  • The development of a new system to simulate traffic scenes. Two modes are designed for the simulation of traffic scenes: (1) an interactive touring mode, and (2) a bird’s-eye view mode. This system can form the basis of the offline testing of unmanned vehicles.

3. Random Walk for Region Detection at a Superpixel Level

In order to reconstruct road scene models, the road and foreground regions in each frame have to be detected first of all. To do this, we use a region detection method based on an RW algorithm. The road and the foreground regions of the first frame are specified by means of user annotation. For the rest of the frames, corresponding regions are detected based on the seeds propagated from the previous frames. The detection flow diagram for an example frame t is shown in Figure 1.
Figure 1. Flow diagram of the superpixel-level random walk algorithm at frame t.

3.1. Superpixel Segmentation

For the region detection we use superpixel segmentation, specifically adopting the Ncut method [], with superpixel features being extracted as the basis of the algorithm (see Table 2). Color, texture and location descriptors are combined to denote the superpixel features. The color descriptor mainly uses the HSV and CIELab color spaces. The texture descriptor is composed of a Gabor filter and a local binary pattern (LBP). The location descriptor consists of a convex hull, a clique number, etc. These descriptors are then concatenated into an overall feature vector.
Table 2. Feature descriptors of superpixels (superpixel cliques).
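As an illustration of how such a descriptor might be assembled, the Python sketch below computes mean HSV/CIELab color, Gabor filter responses and an LBP histogram for one superpixel and concatenates them with a normalized centroid. The parameter choices (filter bank size, LBP radius, bin counts) are assumptions for illustration, the convex-hull and clique descriptors of Table 2 are omitted, and this is not the authors' implementation.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def superpixel_descriptor(image_bgr, labels, sp_id, n_orientations=4):
    """Concatenate color, texture and location cues for one superpixel.

    image_bgr : HxWx3 uint8 image; labels : HxW superpixel label map;
    sp_id : id of the superpixel of interest.
    """
    mask = labels == sp_id
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Color descriptor: mean HSV and CIELab values inside the superpixel.
    color = np.concatenate([hsv[mask].mean(axis=0), lab[mask].mean(axis=0)])

    # Texture descriptor: mean response of a small Gabor filter bank ...
    gabor = []
    for theta in np.linspace(0, np.pi, n_orientations, endpoint=False):
        kernel = cv2.getGaborKernel((15, 15), 4.0, theta, 10.0, 0.5)
        gabor.append(cv2.filter2D(gray.astype(np.float32), -1, kernel)[mask].mean())
    # ... plus a uniform LBP histogram over the superpixel.
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp[mask], bins=10, range=(0, 10), density=True)

    # Location descriptor: normalized centroid of the superpixel.
    ys, xs = np.nonzero(mask)
    loc = np.array([xs.mean() / gray.shape[1], ys.mean() / gray.shape[0]])

    return np.concatenate([color, np.array(gabor), lbp_hist, loc])
```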

3.2. Definition of Seed Trapezoid

Seed trapezoids for the road and foreground regions are depicted in Figure 2. If $R_1$ is taken to be the center of the detected road region of frame $t-1$, $N_1$ can be denoted as the upper bound of the region, with the same vertical coordinate as the centroid. The bottom line of the trapezoid ($CD$) is the same as frame $t-1$'s road base. The top of the trapezoid ($AB$) is placed at a height equal to the perpendicular distance $R_1N_1$. The seed trapezoid for the road region can then be specified. The trapezoid for the foreground region is defined in a similar way, as shown by its placement in Figure 2.
Figure 2. Definition of seed trapezoid for road and foreground regions. The green and blue colors denote the respective regions.
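One possible reading of this construction is sketched below in Python: the base $CD$ spans the bottom row of the previous frame's road mask, and the top edge $AB$ is placed above it at a height equal to the centroid-to-upper-bound distance $R_1N_1$. Both the geometric interpretation and the helper name are assumptions made for illustration.

```python
import numpy as np

def road_seed_trapezoid(road_mask):
    """Corners (A, B, C, D) of a seed trapezoid from frame t-1's road mask.

    road_mask : HxW boolean array of the detected road region. CD spans the
    road base (bottom-most row); AB lies above CD at a height equal to the
    distance between the centroid row (R1) and the region's upper bound (N1),
    and spans the road extent near that row.
    """
    ys, xs = np.nonzero(road_mask)
    centroid_y = int(ys.mean())                 # row of the region centroid R1
    top_y, base_y = int(ys.min()), int(ys.max())
    height = centroid_y - top_y                 # perpendicular distance |R1 N1|
    ab_y = max(base_y - height, top_y)          # row of the trapezoid's top edge AB

    base_xs = xs[ys == base_y]
    ab_xs = xs[np.abs(ys - ab_y) <= 1]          # road extent near the AB row
    C, D = (int(base_xs.min()), base_y), (int(base_xs.max()), base_y)
    A, B = (int(ab_xs.min()), ab_y), (int(ab_xs.max()), ab_y)
    return A, B, C, D
```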

3.3. Seed Trapezoid Initialization by Optical Flow

The trapezoid of a certain frame t is initialized from frame $t-1$ using an optical flow map. Optical flow map generation is typically classified into (1) sparse optical flow and (2) dense optical flow. Sparse optical flow is computed from local feature points in images, e.g., by using the Lucas–Kanade algorithm []. Dense optical flow, however, attempts to compute the optical flow information for all of the pixels in an image, for instance, by using the Horn–Schunck approach []. The Bruhn method [] brings together the advantages of both sparse and dense optical flows. The dense optical flow field generated by this method is robust under conditions of environmental noise. A comparison of the results produced by the Lucas–Kanade, Horn–Schunck and Bruhn methods is shown in Figure 3.
Figure 3. Optical flow map computed by different methods. (a) Lucas–Kanade; (b) Horn–Schunck; (c) Bruhn.
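For reference, a dense flow field of the kind shown in Figure 3 can be generated with OpenCV; the snippet below uses the Farnebäck algorithm as a readily available stand-in for the Bruhn combined local-global method (which is not part of OpenCV's core API) and then shifts the seed trapezoid by the median flow.

```python
import cv2
import numpy as np

def dense_flow(prev_bgr, curr_bgr):
    """Dense optical flow between two consecutive frames (HxWx2 array)."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    # Farnebäck dense flow; the parameters (pyramid scale, levels, window size,
    # iterations, poly_n, poly_sigma, flags) are typical defaults, not tuned values.
    return cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

def propagate_trapezoid(corners, flow):
    """Shift trapezoid corners from frame t-1 into frame t by the median flow."""
    corners = np.asarray(corners, dtype=np.float32)          # 4 x (x, y) points
    dx, dy = np.median(flow[..., 0]), np.median(flow[..., 1])
    return corners + np.array([dx, dy], dtype=np.float32)
```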

3.4. Random Walk Detection at a Superpixel Level

The superpixel-level RW detection is shown in Algorithm 1. Its basic steps are as follows:
  • The input image is transformed into a graph $G = (V, E)$, where the graph nodes denote the superpixels. The weights of the edges in the graph are defined as follows:
    $$w_{ij} = \exp\left(-\beta \left\| s_i - s_j \right\|^2 - \left\| h_i - h_j \right\|^2\right),$$
    where $s_i$ and $s_j$ are the features of superpixels $i$ and $j$, while $h_i$ and $h_j$ denote the respective superpixel locations. The superpixel features are represented by a concatenation of the color and texture vectors:
    $$s = \big[\underbrace{R,\ G,\ B}_{\mathrm{color}},\ \underbrace{T_1,\ T_2,\ \ldots,\ T_{N-1}}_{\mathrm{texture}}\big],$$
    where $[R, G, B]$ denotes the color feature of the current superpixel and $[T_1, T_2, \ldots, T_{N-1}]$ denotes the texture feature, which is extracted by a Gabor filter. The image block centered at the current superpixel is used to compute the output vector of the Gabor filter. The superpixel features match those described in Section 3.1.
  • A Laplacian matrix is now defined over the superpixel graph in order to compute the superpixel probability labels:
    $$L_{ij} = \begin{cases} d_i, & \text{if } v_i = v_j, \\ -w_{ij}, & \text{if } v_i \text{ and } v_j \text{ are adjacent nodes}, \\ 0, & \text{otherwise}, \end{cases}$$
    where $L_{ij}$ is the entry of the Laplacian indexed by $v_i$ and $v_j$, and $d_i = \sum_{e_{ij}} w(e_{ij})$ is the degree of $v_i$, summed over all edges $e_{ij}$ incident on it.
  • Next, the vertices of the graph $G$ are partitioned into two groups: a seeded set $V_M$ and an unseeded set $V_U$. These two sets satisfy the following requirements: $V_M \cup V_U = V$ and $V_M \cap V_U = \emptyset$. The Laplacian matrix can now be rewritten in block form as:
    $$L = \begin{bmatrix} L_M & B_d \\ B_d^T & L_U \end{bmatrix},$$
    where $L_M$ and $L_U$ are the sub-matrices of $L$ corresponding to the seeded and unseeded superpixels, and $B_d$ is the off-diagonal block resulting from this decomposition.
  • The Dirichlet integral is:
    $$D[x] = \frac{1}{2} x^T L x = \frac{1}{2} \sum_{e_{ij} \in E} w_{ij} (x_i - x_j)^2,$$
    where $x$ is the labeling function that minimizes Equation (5) and $e_{ij}$ is an edge of the graph connecting the vertices $v_i$ and $v_j$.
  • The decomposition of Equation (5) can be defined as:
    $$D[x_U] = \frac{1}{2} \begin{bmatrix} x_M^T & x_U^T \end{bmatrix} L \begin{bmatrix} x_M \\ x_U \end{bmatrix},$$
    where $x_M$ and $x_U$ are the sets of regional probabilities corresponding to the seeded and unseeded superpixels, respectively. Differentiating $D[x_U]$ with respect to $x_U$ yields
    $$L_U x_U = -B_d^T x_M.$$
    This equation can be solved using the method outlined in [], with the detection being treated as a two-class segmentation problem. A minimal sketch of this superpixel-level solve is given after this list.
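A minimal sketch of this superpixel-level solve, assuming precomputed superpixel features, centroids and an adjacency list, is given below; the function and parameter names are illustrative rather than the authors' code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def superpixel_random_walk(features, positions, edges, seed_labels, beta=30.0):
    """Random walker over a superpixel graph.

    features : NxF array of superpixel descriptors; positions : Nx2 centroids;
    edges : list of (i, j) adjacent superpixel pairs;
    seed_labels : length-N array with 1 (road seed), 0 (background seed) and
                  -1 (unseeded). Returns road probabilities for all superpixels.
    """
    n = len(features)
    i, j = np.array(edges).T
    # Edge weights: feature and location affinity, as defined above.
    w = np.exp(-beta * np.sum((features[i] - features[j]) ** 2, axis=1)
               - np.sum((positions[i] - positions[j]) ** 2, axis=1))
    W = sparse.coo_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(n, n)).tocsr()
    L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W        # graph Laplacian

    seeded = np.flatnonzero(seed_labels >= 0)
    unseeded = np.flatnonzero(seed_labels < 0)
    L_U = L[unseeded][:, unseeded]                 # unseeded block of L
    B_d = L[seeded][:, unseeded]                   # off-diagonal block
    x_M = seed_labels[seeded].astype(float)        # known seed probabilities

    # Solve L_U x_U = -B_d^T x_M for the unseeded superpixels.
    x_U = spsolve(L_U.tocsc(), -(B_d.T @ x_M))

    prob = np.empty(n)
    prob[seeded], prob[unseeded] = x_M, x_U
    return prob
```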

3.5. Region Refinement Using an FTC Level Set

The region detection algorithm described above is effective under most conditions. However, as the computing units of the algorithm are superpixels, there is a risk that, if the superpixel features are similar in the foreground and background regions, it may fail to produce accurate results. To solve this problem, a pixel-based fast two-cycle (FTC) level set method can be applied for region refinement. The main data structure for this method is composed of three parts: an integer array $\phi$ as the level set kernel matrix, an integer array $F_d$ for the speed function, and two linked lists of boundary pixels, $L_{in}$ and $L_{out}$, that denote the foreground contours. The foreground contours are initialized from the regions detected by the superpixel-level RW algorithm. In order to facilitate the curve evolution of the FTC level set, a first cycle is conducted to apply the data fidelity terms, and a second cycle is conducted to apply the smoothness terms. The pixels inside $L_{in}$ are called interior pixels, and the pixels outside $L_{out}$ are called exterior pixels. The kernel matrix can be defined as:
$$\phi(y) = \begin{cases} 3 & \text{if } y \text{ is an exterior pixel}, \\ 1 & \text{if } y \in L_{out}, \\ -1 & \text{if } y \in L_{in}, \\ -3 & \text{if } y \text{ is an interior pixel}. \end{cases}$$
The speed function $F_d$ is defined according to the competition terms between the foreground and background regions []. The curve evolution can then be driven by the signs of the speed function rather than by solving any partial differential equations. The speed function is as follows:
$$F_d = \log \frac{p(f_C(y) \mid \Omega_m)\, p(f_T(y) \mid \Omega_m)}{p(f_C(y) \mid \Omega_b)\, p(f_T(y) \mid \Omega_b)},$$
where $\Omega_m$ and $\Omega_b$ are the feature pools for the foreground and the background, respectively, and $f_C(y)$ and $f_T(y)$ are the color and texture feature vectors extracted at pixel $y$. After the kernel matrix and speed function have been defined, the curve evolution can be implemented across two cycles: the first cycle evolves the boundary pixels, while the second cycle smooths the boundary. The functions $switchIn()$ and $switchOut()$ can be used for the curve evolution []. Region refinement examples based on the FTC level set are shown in Figure 4.
Figure 4. Region refinement based on the FTC level set. (a) the superpixel-level RW results; (b) the original region contours; (c) the refined region contours.
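For illustration, the sketch below shows the core FTC bookkeeping under the definitions above: the integer array $\phi$, the two boundary lists $L_{in}$ and $L_{out}$, and switchIn/switchOut-style moves that flip pixels across the contour according to the sign of the speed function. It is a simplified, single-cycle illustration (the smoothing cycle and the boundary clean-up steps are omitted), not the authors' implementation.

```python
import numpy as np

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connectivity

def init_ftc(mask):
    """Build phi, L_in and L_out from an initial foreground mask (the RW result)."""
    phi = np.where(mask, -3, 3).astype(np.int8)
    h, w = mask.shape
    l_in, l_out = [], []
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[y, x] != mask[ny, nx]:
                    if mask[y, x]:
                        phi[y, x] = -1; l_in.append((y, x))
                    else:
                        phi[y, x] = 1; l_out.append((y, x))
                    break
    return phi, l_in, l_out

def evolve_once(phi, l_in, l_out, speed):
    """One data-driven sweep: switch in where F_d > 0, switch out where F_d < 0."""
    h, w = phi.shape
    for y, x in list(l_out):                      # expand the foreground
        if speed[y, x] > 0:
            phi[y, x] = -1
            l_out.remove((y, x)); l_in.append((y, x))
            for dy, dx in NEIGHBORS:              # exterior neighbours join L_out
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and phi[ny, nx] == 3:
                    phi[ny, nx] = 1; l_out.append((ny, nx))
    for y, x in list(l_in):                       # shrink the foreground
        if speed[y, x] < 0:
            phi[y, x] = 1
            l_in.remove((y, x)); l_out.append((y, x))
            for dy, dx in NEIGHBORS:              # interior neighbours join L_in
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and phi[ny, nx] == -3:
                    phi[ny, nx] = -1; l_in.append((ny, nx))
    return phi, l_in, l_out
```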
The region detection algorithm based on a superpixel-level RW with FTC refinement is summarized in Algorithm 1.
Algorithm 1 Region detection based on a superpixel-level RW with FTC refinement
Require: Input image sequence {I_1, ..., I_M}; superpixel set for each frame {S_1, ..., S_M}.
1: Initialize the regions and trapezoids for the frame t = 0;
2: (1) Region detection based on an RW at the superpixel level:
3: for t = 1 to M do
4:   Compute the optical flow map between frames t and t−1;
5:   Initialize the region seeds of frame t using the trapezoid of frame t−1 on the basis of the optical flow map;
6:   Down-sample frame t to a lower resolution;
7:   Implement the RW algorithm for frame t;
8:   Threshold the probability matrix to obtain the detection results;
9:   Up-sample frame t to the original resolution;
10: end for
11: (2) Region refinement based on the FTC level set:
12: for t = 1 to M do
13:   Initialize L_in and L_out based on the RW detection results;
14:   Define the speed function F_d using Equation (9);
15:   Undertake the fast two-cycle curve evolution;
16:   Obtain the refined regions according to the contours;
17: end for
Ensure: The region detection results for the image frames {R_1, ..., R_M}; the seed trapezoid for each image frame.
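Read as code, Algorithm 1 amounts to the driver loop sketched below. The helper functions (dense_flow, propagate_trapezoid, superpixel_random_walk, init_ftc, evolve_once, seeds_from_trapezoid, road_seed_trapezoid) refer to the illustrative sketches above or are assumed helpers, not part of the published implementation, and the down-/up-sampling steps are omitted for brevity.

```python
def detect_regions(frames, superpixels, init_trapezoid, compute_speed,
                   n_evolutions=20, threshold=0.5):
    """Mirror of Algorithm 1: per-frame RW detection followed by FTC refinement.

    frames : list of BGR images; superpixels : per-frame tuples of
    (label map, features, centroids, adjacency edges); init_trapezoid :
    user-annotated seed trapezoid for frame 0; compute_speed : callable
    returning the F_d map of Equation (9) for a frame and its coarse mask.
    """
    results, trapezoid = [], init_trapezoid
    for t in range(1, len(frames)):
        # (1) Superpixel-level RW detection with seeds propagated by optical flow.
        flow = dense_flow(frames[t - 1], frames[t])
        trapezoid = propagate_trapezoid(trapezoid, flow)
        labels, feats, pos, edges = superpixels[t]
        seeds = seeds_from_trapezoid(labels, trapezoid)       # assumed helper
        prob = superpixel_random_walk(feats, pos, edges, seeds)
        mask = prob[labels] > threshold                       # back to pixel level

        # (2) Pixel-level FTC refinement of the RW result.
        phi, l_in, l_out = init_ftc(mask)
        speed = compute_speed(frames[t], mask)
        for _ in range(n_evolutions):
            phi, l_in, l_out = evolve_once(phi, l_in, l_out, speed)
        results.append(phi < 0)                               # refined region mask
        trapezoid = road_seed_trapezoid(results[-1])          # seed for frame t+1
    return results
```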

4. Road Scene Construction and Simulation

After detection of the road and foreground regions, the corresponding static and dynamic traffic elements can be extracted. The scene models are then constructed based on the scene stages. These form the basis of the road scene simulation process.

4.1. Scene Model Construction

In order to construct the road scene models, a graph model can be used for each image to represent the connection between the foreground and background regions:
G = ( N , E ) ,
where $N$ and $E$ represent the scene nodes and the corresponding relationships. For the $i$th image, $N_i$ and $E_i$ are defined by:
$$N_i = \{RP_i, LW_i, RW_i, BW_i, FG_1, FG_2, \ldots, FG_n\}, \qquad E_i = \{LCPoints_i, RCPoints_i, P_{FG_1}, P_{FG_2}, \ldots, P_{FG_n}\},$$
where $RP_i$, $LW_i$, $RW_i$, and $BW_i$ denote “road plane”, “left wall”, “right wall” and “back wall”, respectively. $LCPoints_i$ and $RCPoints_i$ are the control points of the road boundaries on each side []. $\{FG_1, \ldots, FG_n\}$ denotes the $n$ foreground objects, while $\{P_{FG_1}, \ldots, P_{FG_n}\}$ are the positions of the foreground polygons standing on the road plane.
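As an illustration of this graph model, the sketch below encodes the node and edge sets as plain Python dataclasses. The field names mirror the definitions above, while the concrete types (2D point lists, label strings) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]          # image-plane coordinates (x, y)

@dataclass
class ForegroundObject:
    label: str                       # e.g., "vehicle" or "pedestrian" (assumed)
    polygon: List[Point]             # rectangular billboard polygon FG_k
    road_position: Point             # P_FG_k: foot point on the road plane

@dataclass
class SceneGraph:
    """Graph model G = (N, E) of one frame."""
    road_plane: List[Point]                                            # RP_i
    left_wall: List[Point]                                             # LW_i
    right_wall: List[Point]                                            # RW_i
    back_wall: List[Point]                                             # BW_i
    left_control_points: List[Point] = field(default_factory=list)     # LCPoints_i
    right_control_points: List[Point] = field(default_factory=list)    # RCPoints_i
    foregrounds: List[ForegroundObject] = field(default_factory=list)  # FG_1..FG_n
```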
A set of scene stages $STG$ is defined in advance, which can represent the basic structures of typical road scenes. A related scene stage can be specified for each graph model to denote the scene’s geometric structure. The scene stage is a traffic scene wireframe model that represents the scene layout for 3D traffic scene modeling. Scene stages can be classified into simple and complex types, as shown in Figure 5. A simple scene stage consists of just the background elements of a traffic scene. A complex scene stage consists of both the background and foreground elements.
Figure 5. Scene stages of traffic scenes. (a) through lane; (b) left-turn lane; (c) right-turn lane; (d) intersection; (e) tunnel.
For the simple scene stages, the corresponding background models have a “floor-wall” structure, with the road region being located on the horizontal plane. The rest of the background regions are assumed to be perpendicular to the road plane. For the complex scene stages, the foreground models are constructed from rectangular polygons, which are set perpendicular to the road plane. The foreground polygons have an RGBA data structure, with A denoting the transparency ratio. The parts of each polygon outside the foreground region are set to be transparent.
We determine the control points for the road boundaries according to the traffic scene wireframe model. The control points are distributed on both sides of the road. In each scene stage, the two farthest control points are taken from the far end of the road surface, while the two closest control points are specified by the near end of the road surface. The remaining control points are distributed between the farthest and the nearest control points, as shown in Figure 6a. A 3D corridor road scene model can then be constructed according to the control points (this principle is shown in Figure 6b). In addition to traditional scene models, cartoon scene models can be constructed by applying non-photorealistic rendering to the input images []. The road scene model construction process is summarized in Algorithm 2.
Figure 6. Generation of control points. (a) labeling of control points; (b) scene modeling results.
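A minimal sketch of how a “floor-wall” corridor mesh could be assembled from the two rows of control points is shown below. The vertex layout (road quads between paired left/right points, wall quads extruded upward by an assumed constant height) illustrates the principle of Figure 6b rather than the authors' exact construction.

```python
import numpy as np

def corridor_mesh(left_pts, right_pts, wall_height=5.0):
    """Build road-plane and side-wall quads from paired boundary control points.

    left_pts, right_pts : equal-length lists of (x, y, z) road-boundary control
    points on the ground plane, ordered from near to far (y is taken as the up
    axis). Returns (vertices, quads), where each quad is four vertex indices.
    """
    left = np.asarray(left_pts, dtype=float)
    right = np.asarray(right_pts, dtype=float)
    up = np.array([0.0, wall_height, 0.0])        # raise copies to the wall top

    # Vertex table: ground-level left/right points plus their raised copies.
    verts = np.vstack([left, right, left + up, right + up])
    n = len(left)
    L, R, LT, RT = 0, n, 2 * n, 3 * n             # index offsets per vertex group

    quads = []
    for i in range(n - 1):
        quads.append((L + i, R + i, R + i + 1, L + i + 1))        # road floor
        quads.append((L + i, L + i + 1, LT + i + 1, LT + i))      # left wall
        quads.append((R + i, R + i + 1, RT + i + 1, RT + i))      # right wall
    return verts, quads
```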
Algorithm 2 Road scene model construction.
Require: The region detection results for the image frames {R_1, ..., R_M}.
1: for t = 1 to M do
2:   Specify the foreground regions {F_1^t, F_2^t, ..., F_n^t} and the road region RD^t from R_t;
3:   Construct the scene graph model G = (N, E);
4:   Specify the scene stage STG_t according to the scene graph model G;
5:   Perform the scene construction based on the geometric structure of STG_t;
6: end for
Ensure: Road scene models with the “floor-wall” structure.

4.2. Road Scene Simulation

The 3D traffic scene simulation can be performed by assembling the “floor-wall”-structured scene models. The viewpoint can be freely changed during the simulation process by using the commands forward, back, up and down. With the assistance of GPS data, the trajectory of the viewpoint can be displayed on the map. Depending on how the viewpoint moves, the traffic scene simulation can be classified into two modes: (1) a roaming mode, where new viewpoint images are generated according to the movement of the viewpoint; and (2) a bird’s-eye view mode, where the viewpoint is static and virtual vehicles and obstacles can be added into the scene. The 3D traffic scene model used for rendering is shown in Figure 7a, in which the black mesh represents the basic scene model and the green mesh denotes the foreground vehicle model. Users can change the viewing angle through adjustments of the road scene model, as shown in Figure 7b. The new viewpoint images are then generated, as shown in Figure 7c.
Figure 7. New viewpoint images rendering. (a) the 3D traffic scene model. The black mesh denotes the background scene, while the green mesh denotes the foreground vehicle; (b) the control of view angles; (c) the new viewpoint images.
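For illustration, generating a new viewpoint image from such a mesh is essentially a look-at transform followed by a perspective projection; the snippet below sketches that projection with numpy under assumed camera intrinsics (the focal length and principal point are placeholders).

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Rotation and translation taking world points into camera coordinates."""
    eye = np.asarray(eye, float); target = np.asarray(target, float)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, float))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    R = np.stack([right, true_up, forward])       # rows: camera axes in world frame
    return R, -R @ eye

def project(verts, eye, target, focal=800.0, center=(512.0, 512.0)):
    """Pinhole projection of 3D scene vertices as seen from a new viewpoint."""
    R, t = look_at(eye, target)
    cam = verts @ R.T + t                         # world -> camera coordinates
    z = cam[:, 2]                                 # depth along the viewing axis
    u = focal * cam[:, 0] / z + center[0]
    v = focal * cam[:, 1] / z + center[1]
    return np.stack([u, v], axis=1), z            # pixel coordinates and depths
```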
We have designed five metrics for the offline testing of unmanned vehicles, namely: pedestrian recognition, collision avoidance, traffic signal recognition, pavement identification and fog recognition. Four levels of performance (L0, L1, L2, L3) were also defined for each metric to evaluate vehicle behavior, as shown in Table 3. The complexity of the road scene can generally be divided into three categories (R1, R2, R3) according to road conditions, special areas and special kinds of weather. On the basis of these categories, various scenes can be constructed easily for the offline testing of unmanned vehicles. The three categories of road complexity are as follows (a minimal test-configuration sketch follows the list):
Table 3. Evaluation metrics of unmanned vehicles.
  • R1: Different road conditions including rural roads, urban roads, highways and tunnels. The road parameters relate to the specific road width, the number of obstacles, traffic signs and traffic lights.
  • R2: Special areas including campuses, hospitals, crowded streets, etc. Vehicles in special areas need to react quickly and perform different operations.
  • R3: Special kinds of weather including rain, snow, fog, etc. The special kinds of weather may also be of varying degree. This needs to be accurately identified for the evaluations.
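Purely as an illustration of how such test scenarios might be enumerated, the sketch below pairs the road-complexity categories (R1–R3) with the evaluation metrics and performance levels (L0–L3); the concrete option strings are assumptions rather than values taken from Table 3.

```python
from dataclasses import dataclass
from itertools import product

METRICS = ["pedestrian recognition", "collision avoidance",
           "traffic signal recognition", "pavement identification",
           "fog recognition"]
LEVELS = ["L0", "L1", "L2", "L3"]

# Road-complexity categories; the concrete option lists are illustrative.
COMPLEXITY = {
    "R1": ["rural road", "urban road", "highway", "tunnel"],
    "R2": ["campus", "hospital", "crowded street"],
    "R3": ["rain", "snow", "fog"],
}

@dataclass
class TestScenario:
    category: str        # R1, R2 or R3
    condition: str       # e.g., "highway" or "fog"
    metric: str          # which behavior is being scored
    expected_level: str  # target performance level L0-L3

def enumerate_scenarios(expected_level="L2"):
    """Cross every complexity condition with every metric for offline testing."""
    return [TestScenario(cat, cond, metric, expected_level)
            for cat, conds in COMPLEXITY.items()
            for cond, metric in product(conds, METRICS)]
```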
Finally, the performance of unmanned vehicles may be evaluated across a combination of different degrees of scene complexity and different evaluation metrics.

5. Experiments and Comparisons

Our experiments were undertaken on a computer with an Intel i5 processor (3.33 GHz) and 16 GB of memory. The experimental data were mostly taken from the TSD-max dataset [], which was constructed by the Institute of Artificial Intelligence and Robotics at Xi’an Jiaotong University in China. The dataset is composed of road images captured from urban roads, rural roads, highways, etc. The road region detection experiments additionally use the KITTI dataset [].

5.1. Evaluation of the Region Detection

First of all, we perform the region detection experiments. Three image sequences are selected: White Car (512 × 512 pixels, 200 frames), Gray Truck (512 × 512 pixels, 300 frames), and Red Truck (512 × 512 pixels, 250 frames). The KITTI dataset is used for the task of road region detection (1242 × 375 pixels, 200 frames). Figure 8 shows the superpixel segmentation at different scales, the optical flow map between adjacent frames, and the region detection results before and after refinement. The road detection results based on the superpixel-level RW method with refinement on the KITTI dataset are shown in Figure 9. The algorithms mainly work with a superpixel segmentation size of N = 500. For quantitative evaluation, we use three metrics, precision, recall and F-measure, to compare the detected road and foreground regions with the ground truth. Precision denotes the ratio of correct pixels within the detected road region. Recall denotes the ratio of correct pixels in relation to the benchmark road region. In our experiments, precision and recall are defined as follows:
$$Pre = \frac{|R \cap R_G|}{|R|}, \qquad Rec = \frac{|R \cap R_G|}{|R_G|},$$
where R and R G denote the detected region and ground truth, respectively.
Figure 8. Region detection results based on the RW method (TSD-max dataset). (a) input image sequence; (bd): superpixel segmentation with superpixel numbers of N = 40, N = 200, and N = 500, respectively. (e) optical flow map between adjacent frames; (f) detection of foreground regions; (g) detection of both foreground and road regions; (h) region refinement by FTC level set.
Figure 9. Road region detection results based on an RW method with refinement (KITTI dataset).
The F-Measure can be computed by combining Pre and Rec:
$$F\text{-}Measure = \frac{(1 + \alpha) \cdot Pre \cdot Rec}{\alpha \cdot Pre + Rec}.$$
We set α = 0.5 in our experiments, which weights precision more heavily than recall.
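These metrics reduce to a few lines of numpy over binary masks, as sketched below.

```python
import numpy as np

def region_scores(detected, ground_truth, alpha=0.5):
    """Precision, recall and F-measure between two binary region masks."""
    detected = detected.astype(bool)
    ground_truth = ground_truth.astype(bool)
    overlap = np.logical_and(detected, ground_truth).sum()

    pre = overlap / max(detected.sum(), 1)        # correct pixels in the detection
    rec = overlap / max(ground_truth.sum(), 1)    # correct pixels vs. the benchmark
    f = (1 + alpha) * pre * rec / max(alpha * pre + rec, 1e-12)
    return pre, rec, f
```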
We employ the superpixel SVM [], Markov random field (MRF) [], GrowCut [] and Siogkas’s RW [] as baseline methods for road region detection. Besides these methods, VGG16 [] and ResNet101 [] are implemented for the comparison of road detection results. The average values over all of the image sequences are shown in Table 4 and Table 5. With regard to the methods presented in this paper, both the superpixel-level random walk (SRW) and the superpixel-level RW with level set refinement (SRW+Refine) are implemented. As the comparisons show, the accuracy of our proposed method is comparable to that of ResNet101, which is superior to the other baseline methods.
Table 4. Evaluation of road detection results.
Table 5. Evaluation of foreground detection results.

5.2. Evaluation of the Scene Construction

Next, we performed experiments relating to the scene construction and simulation. Input image sequences with a resolution of 1024 × 1024 pixels were once again selected from the TSD-max dataset, as shown in Figure 10. There were three main types of scene models: (1) simple scene models based on pure background images; (2) complex scene models based on both foreground and background images; and (3) cartoon scene models based on non-photorealistic images [].
Figure 10. Construction of scene models. (a) traditional scene models; (b) cartoon scene models.
The scene models generated by our own approach were compared to those generated by Make3D [] and Photo Pop-up [], as shown in Figure 11. The results demonstrate that distortions can occur in the Make3D and Photo Pop-up models. When compared to these baseline methods, our approach produces the most realistic results. We also find that the proposed scene structure is more suitable for curved road conditions.
Figure 11. Comparison of 3D traffic scene construction methods. (a) input images; (b) proposed method; (c) Make3D; (d) photo pop-up.
To evaluate the scene models, we apply two metrics []: (1) a plane correctness ratio, with a plane being defined as correct if more than 75% of its patches are correctly detected as semantic wall or road regions; and (2) a model correctness ratio, with a model being defined as correct if more than 75% of the patches in the wall and road planes have the correct relationship to their neighbors. The evaluation is performed by an evaluator who is not associated with the project. One thousand road images with a resolution of 1024 × 1024 pixels are chosen overall. The comparative results according to these metrics are shown in Table 6. As can be seen, the proposed scene models outperform the baseline methods in terms of both plane and model correctness.
Table 6. Comparisons of scene models.
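A sketch of how the plane correctness ratio could be computed from patch-level judgments is given below; the 75% threshold follows the description above, while the data layout (a per-patch correctness map plus one mask per reconstructed plane) is an assumption.

```python
import numpy as np

def plane_correctness_ratio(patch_correct, plane_masks, threshold=0.75):
    """Fraction of planes whose patches are predominantly judged correct.

    patch_correct : HxW boolean array (True where a patch was judged to carry
    the correct wall/road semantics); plane_masks : list of HxW boolean masks,
    one per reconstructed plane (road, left/right/back wall). A plane counts
    as correct if more than `threshold` of its patches are correct.
    """
    correct = sum(patch_correct[m].mean() > threshold for m in plane_masks if m.any())
    return correct / max(len(plane_masks), 1)
```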

6. Conclusions and Future Works

In this paper, we have proposed a framework for road scene model construction via superpixel-level RW region detection. The RW detection is able to locate road and foreground regions simultaneously. After segmentation of the superpixels, a region seed for the first frame is specified through user annotation. For each subsequent frame, the region seeds are located using trapezoids propagated from the previous frame. The RW algorithm is implemented through initialization of the region seeds, with the superpixels being utilized as graph nodes. In order to achieve more accurate region detection results, an FTC level set algorithm was also implemented for region refinement. After detection of the road and foreground regions, 3D corridor-style scene models can be constructed, depending on the type of scene stage. 3D traffic scene simulations can then be developed that are based on the scene models.
In future research, we intend to use depth maps to construct more detailed scene models. A multi-view registration of point clouds can then be implemented to reconstruct 3D foreground structures. Cuboid models of the traffic elements can also be constructed on this basis for an even more sophisticated simulation of traffic scenes.

Author Contributions

Y.L. designed the algorithms and wrote the manuscript. Z.C. conducted the experiments and took part in the manuscript revision. Y.L. managed the project. J.Z. performed the discussion of the results. D.Z. analyzed the data. J.Y. took part in the manuscript revision.

Funding

This research was supported by the National Natural Science Foundation of China under Grant No. 61803298, and it was also supported by the Natural Science Foundation of Jiangsu Province under Grant BK20180236.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Y.; Liu, Y.; Su, Y.; Hua, G.; Zheng, N. Three-dimensional traffic scenes simulation from road image sequences. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1121–1134. [Google Scholar] [CrossRef]
  2. Hao, J.; Li, C.; Kim, Z.; Xiong, Z. Spatio-temporal traffic scene modeling for object motion detection. IEEE Trans. Intell. Transp. Syst. 2014, 14, 1662–1668. [Google Scholar] [CrossRef]
  3. Simulation of ADAS and Active Safety. Available online: http://www.tassinternational.com/prescan (accessed on 5 November 2018).
  4. Hasagasioglu, S.; Kilicaslan, K.; Atabay, O. Vehicle dynamics analysis of a heavy-duty commercial vehicle by using multibody simulation methods. Int. J. Adv. Manuf. Technol. 2012, 60, 825–839. [Google Scholar] [CrossRef]
  5. Anguelov, D.; Dulong, C.; Filip, D.; Frueh, C.; Lafon, S.; Lyon, R.; Ogale, A.; Vincent, L.; Weaver, J. Google street view: Capturing the world at street level. Computer 2010, 42, 32–38. [Google Scholar] [CrossRef]
  6. Kopf, J.; Chen, B.; Szeliski, R.; Cohen, M. Street slide: Browsing street level imagery. ACM Trans. Graph. 2010, 29, 102–106. [Google Scholar] [CrossRef]
  7. Schick, A.; Bauml, M.; Stiefelhagen, R. Improving foreground segmentations with probabilistic superpixel Markov random fields. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 27–31. [Google Scholar]
  8. Lu, K.; Li, J.; An, X.; He, H. Vision sensor-based road detection for field robot navigation. Sensors 2015, 15, 29594–29617. [Google Scholar] [CrossRef] [PubMed]
  9. Siogkas, G.K.; Dermatas, E.S. Random-walker monocular road detection in adverse conditions using automated spatiotemporal seed selection. IEEE Trans. Intell. Transp. Syst. 2013, 14, 527–538. [Google Scholar] [CrossRef]
  10. Shen, J.; Du, Y.; Wang, W.; Li, X. Lazy random walks for superpixel segmentation. IEEE Trans. Image Process. 2014, 23, 1451–1462. [Google Scholar] [CrossRef] [PubMed]
  11. Tian, X.; Jung, C. Point-cut: Fixation point-based image segmentation using random walk model. In Proceedings of the 2015 IEEE International Conference on Image Processing, Quebec City, QC, Canada, 27–30 September 2015; pp. 2125–2129. [Google Scholar]
  12. Teichmann, M.; Weber, M.; Zoellner, M.; Cipolla, R.; Urtasun, R. MultiNet: Real-time joint semantic reasoning for autonomous driving. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1013–1020. [Google Scholar]
  13. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  15. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the 2014 European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833. [Google Scholar]
  16. Saxena, A.; Sun, M.; Ng, A.Y. Make3D: Learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 824–840. [Google Scholar] [CrossRef] [PubMed]
  17. Hoiem, D.; Efros, A.; Hebert, M. Recovering surface layout from an image. Int. J. Comput. Vis. 2007, 75, 151–172. [Google Scholar] [CrossRef]
  18. Delage, E.; Lee, H.; Ng, A.Y. A dynamic Bayesian model for autonomous 3D reconstruction from a single indoor image. In Proceedings of the 2006 IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; pp. 2418–2428. [Google Scholar]
  19. Horry, Y.; Anjyo, K.; Arai, K. Tour into the picture: Using a spidery interface to make animation from a single image. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 3–8 August 1997; pp. 225–232. [Google Scholar]
  20. Nedovic, V.; Smeulders, A.W.; Redert, A.; Geusebroek, J.M. Stages as models of scene geometry. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1673–1687. [Google Scholar] [CrossRef] [PubMed]
  21. Lou, Z.; Gevers, T.; Hu, N. Extracting 3D layout from a single image using global image structures. IEEE Trans. Image Process. 2015, 24, 3098–3108. [Google Scholar] [PubMed]
  22. Shi, J.B.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar]
  23. Bouguet, J.Y. Pyramidal Implementation of the Lucas-Kanade Feature Tracker; Intel Corporation, Microprocessor Research Labs: Santa Clara, CA, USA, 1999. [Google Scholar]
  24. Horn, B.K.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
  25. Bruhn, A.; Weickert, J.; Schnörr, C. Lucas/Kanade meets Horn/Schunck: Combining local and global optical flow methods. Int. J. Comput. Vis. 2005, 61, 211–231. [Google Scholar] [CrossRef]
  26. Zhu, S.C.; Yuille, A. Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 884–900. [Google Scholar]
  27. Shi, Y.; Karl, W.C. A real-time algorithm for the approximation of level-set-based curve evolution. IEEE Trans. Image Process. 2008, 17, 645–656. [Google Scholar] [PubMed]
  28. Agrawal, A. Non-photorealistic rendering: Unleashing the artist’s imagination. IEEE Comput. Graph. Appl. 2009, 29, 81–85. [Google Scholar] [CrossRef] [PubMed]
  29. Institute of Artificial Intelligence and Robotics at Xi’an Jiaotong University in China. Available online: http://trafficdata.xjtu.edu.cn/index.do (accessed on 5 November 2018).
