Traffic Sign Detection System for Locating Road Intersections and Roundabouts: The Chilean Case

This paper presents a traffic sign detection method for signs close to road intersections and roundabouts, such as stop and yield (give way) signs. The proposed method relies on statistical templates built using color information for both segmentation and classification. The segmentation method uses the RGB-normalized (ErEgEb) color space for ROIs (Regions of Interest) generation based on a chromaticity filter, where templates at 10 scales are applied to the entire image. Templates consider the mean and standard deviation of normalized color of the traffic signs to build thresholding intervals where the expected color should lie for a given sign. The classification stage employs the information of the statistical templates over YCbCr and ErEgEb color spaces, for which the background has been previously removed by using a probability function that models the probability that the pixel corresponds to a sign given its chromaticity values. This work includes an analysis of the detection rate as a function of the distance between the vehicle and the sign. Such information is useful to validate the robustness of the approach and is often not included in the existing literature. The detection rates, as a function of distance, are compared to those of the well-known Viola–Jones method. The results show that for distances less than 48 m, the proposed method achieves a detection rate of 87.5% and 95.4% for yield and stop signs, respectively. For distances less than 30 m, the detection rate is 100% for both signs. The Viola–Jones approach has detection rates below 20% for distances between 30 and 48 m, and barely improves in the 20–30 m range with detection rates of up to 60%. Thus, the proposed method provides a robust alternative for intersection detection that relies on statistical color-based templates instead of shape information. 
The experiments employed videos of traffic signs taken in several streets of Santiago, Chile, using a research platform implemented at the Robotics and Automation Laboratory of PUC to develop driver assistance systems.


Introduction
Traffic accidents are the primary cause of death for young people between 15 and 29 years old. Between 20 and 50 million people are injured each year, and 1.3 million die in traffic accidents, 91% of them in low- and middle-income countries [1][2][3]. Latin America is a region with high rates of road traffic accidents [2,4], for diverse reasons that include driver education and behavior, law enforcement, and lack of adequate road infrastructure. However, technology can also play an important role through driver assistance systems that contribute to the alertness of the driver and to better driving behavior [5,6].
Most traffic accidents occur in urban areas, especially at road intersections and roundabouts. Country statistics for traffic accidents show that a significant number happened at road intersections, for instance, 22% in the USA [7,8], 58.7% in Japan during 1995 [9], 13.75% in Ecuador during 2015 [10], and 9.22% in Chile during 2014 [11]. It is therefore important to develop systems for road intersection detection [12], a problem that, unlike other aspects such as pedestrian detection, lane tracking and driver drowsiness or distraction [5,6], has not received enough attention. Prior work has focused on pavement segmentation to detect intersections [6] by analyzing the continuity and curvature of the road boundaries. However, occlusions in urban environments make the analysis of edges difficult or impossible. Therefore, the implementation of an advanced driving assistance system (ADAS) [13] requires a module capable of detecting road signs in general, and specifically those found at intersections.
We propose a traffic sign detection approach based on statistical templates built using normalized color information. The novelty of the approach lies in the probabilistic model of the sign (or object) conditioned on the intensity of the normalized color channels instead of using traditional shape descriptors. The results show that this approach is robust to variations in the distance between the car and the traffic sign, as well as to variations in illumination. The results also show that, unlike deep learning techniques, the proposed traffic sign detection approach can be implemented with small datasets of a few hundred images.
This paper is organized as follows. First, the state-of-the-art of traffic sign detection algorithms is discussed in Section 2. Section 3 describes a new system for traffic sign detection and its modules based on color information. The experimental results, where an analysis between the detection rate and the distance is performed to verify the quality of this system, are presented in Section 4. Finally, Section 5 is devoted to the conclusions and discussion of future work.

State-of-the-Art
Traffic sign detection using visible-spectrum cameras may take different approaches. Some works implement feature classifiers: a sliding-window method is used to compute features on different overlapping regions, which are then fed to a previously trained classifier [14,15]. The drawback of this strategy is that many positions and scales have to be tested using classifiers that may need computationally demanding training phases. More recent methods formulate a two-stage strategy, in which candidate or proposal regions are computed first by some "class-agnostic" segmentation process, i.e., extracting groups of pixels that share some characteristic without necessarily identifying whether they truly belong to the same class of object. In a second stage, some classification or decision process is used to complete the detection by deciding whether the classes of objects sought are present or not. The methods proposed in [16][17][18][19][20][21][22][23][24] can be found among recent approaches for traffic sign detection employing region proposal strategies together with classifiers. The most recent approaches to segmentation and classification employed in traffic sign detection are discussed next.

Segmentation for ROI Generation
In the context of traffic sign detection, blob generation and color analysis are the main techniques employed to segment regions of interest. Special efforts have been placed on making the color-based segmentation robust to large variations in illumination and weather conditions. Greenhalgh et al. [16] transform RGB into grayscale images using the red and the blue components and experimentally obtained thresholds to generate ROIs. Salti et al. [17] employ three color spaces derived from RGB: the first highlights road signs with a predominance of blue and red colors, the second highlights signs with intense red, and the third highlights bright blue. Li et al. [18] have built the Gaussian space $(E, E_\lambda, E_{\lambda\lambda})$, where objects dominated by the green-red and blue-yellow colors are highlighted. The preselected regions are in turn transformed to the normalized values $C_\lambda = E_\lambda / E$ and $C_{\lambda\lambda} = E_{\lambda\lambda} / E$, which are fed to a k-means clustering [25] to generate the ROIs. Zaklouta et al. [19] implement two RGB-based chromatic filters for ROI generation, one for signs that have a red color prevalence, and another filter for red-yellow predominance; in both cases, thresholds are defined in terms of mean and variance. Lillo et al. [26] have used the L*a*b* space to detect signs in which the blue, green, yellow and red colors dominate. Based on the k-means clustering algorithm, the authors build a classifier that employs the a* and b* components. Fleyeh et al. [23] use the H and S components of the HSV space to train a classifier and implement the color segmentation that yields ROIs. More recently, Han et al. [24] have used the H component of the HSI space, in which the traffic signs are highlighted, in order to build a grayscale image where a set of ROIs is generated. Keser et al. [27] have used HSV filter intervals to generate a set of ROIs. Finally, Zhu et al. [28] employ three different object proposal strategies (Selective Search, Edge Boxes and BING) and convolutional neural networks for classification, achieving an accuracy of 88% on average.

Recognition
The recognition stage typically employs feature classifiers, and therefore requires a feature descriptor and a classification algorithm. One of the most popular feature descriptors is the histogram of oriented gradients (HOG) [15], which provides information about objects' shape. Recent works in traffic sign detection that employ the HOG descriptor include [16,17,19,29]. Li et al. [18] employ the PHOG descriptor, a variant of the HOG descriptor. Other descriptors are based upon the discrete Fourier transform [26], the Hough transform [23], the SURF method [30], the values of the neighboring pixels in a ROI [31], or predefined contour descriptors for basic shapes (circular, triangular, or rectangular) [27].
Concerning classifiers, most of the recent work in traffic sign detection employs SVM (support vector machine) classifiers [25]; see for example [16][17][18][19][26]. Another popular classification approach relies on artificial neural networks (NN). For example, recent work by Huang et al. [29] combines an NN classifier with ELM (Extreme Learning Machine), and Pérez et al. [32] rely on MLPs (MultiLayer Perceptrons). The simpler k-NN (k-nearest neighbors) algorithm [25] is employed in the traffic sign detection method proposed in [24]. Recently, deep learning techniques have been used for simultaneous detection and recognition of traffic signs. CNNs (Convolutional Neural Networks) are employed in many of the most recent papers (Lau et al. [31], Jung et al. [33], Zeng et al. [34], Zhu et al. [28]), which propose new architectures for automatic sign detection. Other strategies, such as the one employed by Li and Yang [35], rely on a combination of DBM (Deep Boltzmann Machine) and CCA (Canonical Correlation Analysis) for feature extraction and classification. Lau et al. [31] have also experimented with RBNN (Radial Basis Neural Networks) for classification.
Since detection systems must be validated on the traffic signs actually in use, country-specific databases are required for the development of traffic sign detection systems. However, there is a lack of databases with traffic signs in Latin America. Therefore, another goal of this work is to contribute to the development of traffic sign detection systems that can also be validated on traffic signs of the Latin American region.

Proposed Approach for Segmentation and Recognition of Traffic Signs at Road Intersections and Roundabouts
The proposed computational strategy for detecting signs found at road intersections and roundabouts is composed of two parts. The first part generates ROIs in which traffic signs could be found by computing and analyzing color statistics in the normalized RGB space. The second part recognizes the signs within the ROIs by means of a statistical template matching strategy whose templates are also built in the normalized RGB space.
The detection must be done at the furthest possible distance, so that the driver has enough time to react and to stop in time. Examples of typical testing scenarios for the proposed approach are presented in Figure 1, which shows a distant stop sign and a yield (give way) sign.

Chromaticity Filter for the Selection of ROIs
Under adequate illumination conditions, such as daylight or artificial lighting, the color of traffic signs is a feature that can be used to generate ROIs. Comparing the histograms of traffic signs in four color spaces, RGB, HSV, YCbCr and ErEgEb (the normalized RGB space) [38,39], it is possible to observe in Figure 2 that some of the color spaces provide better discrimination capability between traffic signs and the image backgrounds. In Figure 2, the histograms labeled RPOS correspond to histograms of the stop sign, while the curves labeled RNEG correspond to the histograms of backgrounds or scenes that do not contain traffic signs. The color space that shows the smallest overlap between the histograms of positives (signs) and negatives (non-signs) is the ErEgEb space, where in particular the Er channel shows a clear separation between the two classes. The Eg channel histograms have a small intersection, but the channel still serves to discard a significant portion of negatives. Finally, the Eb channel does not provide considerable information, but it will be considered as part of the classification strategy; see Figure 2j-l. A similar analysis was conducted for the yield sign, and the results lead to the same conclusion: the ErEgEb space, and in particular the Er and Eg channels, provides better discrimination capability.
Therefore, the candidate regions of interest can be detected using a chromaticity filter, i.e., a filter that works on the variables that define color hue (dominant wavelength) and color purity or saturation (difference between the intensity of the dominant wavelength with respect to white, grey or black) regardless of luminance (magnitude of the color components vector) or psychological perception of illumination brightness or intensity (as an average of the components of the color vector). In other words, a chromaticity filter only requires two variables that describe dominant wavelengths regardless of the total energy, obtained by mapping the components of the trichromatic color model into a subspace of two normalized values. Assuming a normal distribution of the chromaticity channels Er and Eg, the selection thresholds for extracting ROIs can be defined as

$$\mu_c - \alpha\,\sigma_c \;\le\; \bar{I}_c \;\le\; \mu_c + \alpha\,\sigma_c, \qquad c \in \{E_r, E_g\},$$

where $\bar{I}_c$ is the mean value of channel $c$ over the block being tested, and $\mu_c$ and $\sigma_c$ are the mean and standard deviation of the channel $c$ computed over a set of positives (images with traffic signs) according to:

$$\mu_{i,c} = \frac{1}{N_P}\sum_{p} I_{i,c}[p], \qquad \mu_c = \frac{1}{N_I}\sum_{i=1}^{N_I} \mu_{i,c}, \qquad \sigma_c^2 = \frac{1}{N_I}\sum_{i=1}^{N_I} \left(\mu_{i,c} - \mu_c\right)^2,$$

where $I_{i,c}[p]$ is the value of channel $c$ at the pixel location $p$ within the traffic sign of the $i$-th reference image (positive), $i = 1, 2, \ldots, N_I$, $N_I$ is the number of positive images, $N_P$ is the number of pixels within the reference traffic sign area, $\mu_{i,c}$ is the mean value of channel $c$ for the $i$-th image, $\mu_c$ is the mean value of channel $c$ across the ensemble of $N_I$ images, and $\sigma_c^2$ is the variance of the mean values $\mu_{i,c}$, $i = 1, 2, \ldots, N_I$. The value $\alpha$ is set to minimize the false positives and false negatives that will be passed to the recognition stage. A practical value that ensures the lowest amount of false positives while preventing misdetections was found to be $\alpha = 2$. Note that the computation time of the average values on $N \times N$ sliding blocks can be reduced by using summed-area tables, also known as integral images [14,40].
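The chromaticity filter described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the C++/OpenCV implementation used in the experiments: the function names (`normalized_rgb`, `channel_stats`, `integral_image`, `block_mean`, `chromaticity_candidates`) are hypothetical, the variance is assumed to be normalized by $N_I$, and a block is assumed to pass when its mean lies within $\alpha$ standard deviations of the reference mean in both Er and Eg.

```python
def normalized_rgb(r, g, b):
    """Map an RGB triple to chromaticity (Er, Eg, Eb); black is mapped to the gray point."""
    s = r + g + b
    if s == 0:
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / s, g / s, b / s)


def channel_stats(per_image_pixels):
    """Return (mu_c, sigma_c) from the per-image means of one channel.

    per_image_pixels holds, for each reference image, the channel values
    of the pixels inside the sign area (assumed layout).
    """
    means = [sum(p) / len(p) for p in per_image_pixels]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)  # assumes 1/N_I normalization
    return mu, var ** 0.5


def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def block_mean(ii, x, y, n):
    """Mean of the n x n block whose top-left corner is (x, y), using four lookups."""
    total = ii[y + n][x + n] - ii[y][x + n] - ii[y + n][x] + ii[y][x]
    return total / (n * n)


def chromaticity_candidates(er, eg, stats_r, stats_g, n, alpha=2.0):
    """Return (x, y, n) for every n x n block whose Er and Eg means pass the filter."""
    ii_r, ii_g = integral_image(er), integral_image(eg)
    h, w = len(er), len(er[0])
    candidates = []
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            passes = all(
                abs(block_mean(ii, x, y, n) - mu) <= alpha * sigma
                for ii, (mu, sigma) in ((ii_r, stats_r), (ii_g, stats_g))
            )
            if passes:
                candidates.append((x, y, n))
    return candidates
```

With the integral image, the cost of each block test is independent of the block size, which is what makes testing many window sizes over the entire image affordable.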
Two examples showing the generation of preliminary candidates for ROIs using window sizes of 50 × 50 and 10 × 10 pixels are shown in Figure 3 for values of α = 2, 3.

The last step for the final proposal of ROIs is to eliminate the overpopulation of candidates. To this end, all the candidates that are contained within, or are a sub-window of, another candidate are discarded, so that only the largest block remains. Next, the windows whose mean value is closest to µ_c in each neighborhood are selected, so that there is only one candidate per neighborhood. Figure 4 shows the final ROI proposal obtained using 10 window sizes ranging from 10 × 10 to 50 × 50 in geometric progression, with a fixed scaling factor between consecutive sizes. The number of preliminary candidates satisfying the chromaticity filter threshold was 32,849. This number is reduced to only 9 after the pre-candidate selection and merging step.
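As a rough illustration of the candidate pruning, the sketch below generates window sizes in geometric progression and discards candidates nested inside larger ones. The scaling factor is not stated in the text, so it is derived here from the two end sizes and the number of scales; rounding to integer pixel sizes and the helper names (`window_sizes`, `contains`, `drop_nested`) are assumptions. The subsequent per-neighborhood selection is omitted because the neighborhood definition is not given.

```python
def window_sizes(smallest=10, largest=50, count=10):
    """Window side lengths in geometric progression between the two end sizes."""
    ratio = (largest / smallest) ** (1 / (count - 1))  # fixed factor between scales
    return [round(smallest * ratio ** i) for i in range(count)]


def contains(outer, inner):
    """True if square `inner` lies entirely within a different square `outer`.

    Squares are (x, y, n) tuples: top-left corner and side length.
    """
    ox, oy, on = outer
    ix, iy, inn = inner
    return (
        outer != inner
        and ox <= ix
        and oy <= iy
        and ix + inn <= ox + on
        and iy + inn <= oy + on
    )


def drop_nested(candidates):
    """Keep only the candidates that are not contained in another candidate."""
    return [c for c in candidates if not any(contains(o, c) for o in candidates)]
```

For example, `drop_nested([(0, 0, 50), (5, 5, 10), (60, 60, 10)])` keeps the 50 × 50 block and the isolated 10 × 10 block, and removes the small block nested inside the large one.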

Recognition of Traffic Signs Based on Statistical Templates
The second stage of the proposed traffic sign detection approach is responsible for solving the identification of ROIs as traffic signs of a given type. To this end, a set of images is employed to create two statistical models, one with the mean intensity and the other with its standard deviation for each pixel belonging to the traffic sign. Testing on a sliding block for the percentage of pixels that fall within the expected intensity range for a given location provides a discriminator to detect traffic signs from background and non-traffic sign objects. A flow diagram of the proposed method is shown in Figure 5. The corresponding pseudocode of the algorithm is presented in Algorithm 1. The most effective recognition of traffic signs is achieved applying the algorithm to the Er and Eb channels of the ErEgEb color space. Only two channels conveying the chromaticity information are sufficient because the magnitude normalization satisfies Er + Eg + Eb = 1, making the third channel dependent on the other two (Eg = 1 − Er − Eb). The probabilistic model that defines the discrimination thresholds is discussed next.
In order to develop the probabilistic template matching model to recognize traffic signs, it is first convenient to introduce some notation. Let $O[k]$ denote the event that the $k$-th pixel of a window belongs to the object (sign) of interest, and let $I[k]$ denote the chromaticity values measured at that pixel. The probability that a window corresponds to a particular object (sign) is computed from the per-pixel probabilities $P(O[k] \mid I[k])$, which are assumed not to depend on the values $I[j]$ for all $j \neq k$. This is possible in view of the fact that O and I are random samples [41,42]. Also, this means that the probability of a pixel not belonging to an object only depends on the pixel value and not on its neighborhood. In other words, the background (non-sign) pixels are assumed to be conditionally independent with respect to their neighborhood. This assumption is not entirely true in every area of the background, but it simplifies the probability computation. The likelihood $P(I[k] \mid O[k])$ can be obtained assuming that the pixel values of the object (sign) of interest follow a normal distribution $N(\mu_{E_x}, \sigma_{E_x})$, $x = r, b$, with mean chromaticity $\mu_{E_x}$ and standard deviation $\sigma_{E_x}$ obtained from a set of reference images; see the last row of Figure 2.
The likelihood, i.e., the conditional probability of the measured chromaticity values $I[k] = (E_r, E_b)$ given the object class, then follows from these normal densities evaluated at the measured values. The prior probability $P(O[k])$ of finding the object (sign) in an image can be obtained experimentally from the set of reference images as:

$$P(O[k]) = \frac{TP}{TP + FP},$$

where TP is the true positive rate and FP is the false positive rate for the object (sign) of class (type) O. This ratio is known as the positive predictive value and it describes the probability of traffic signs being correctly detected [43].
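For reference, the three ingredients named so far (likelihood, prior and the probability of the chromaticity values) combine through Bayes' rule. The expressions below are a sketch under stated assumptions rather than a transcription of the original equations: the per-pixel posterior is written directly with Bayes' rule, and the two chromaticity channels are assumed to be independent given the object class, so that the likelihood factors into two univariate normal densities.

```latex
\begin{align}
P\big(O[k] \mid I[k]\big) &= \frac{P\big(I[k] \mid O[k]\big)\, P\big(O[k]\big)}{P\big(I[k]\big)}, \\
P\big(I[k] = (E_r, E_b) \mid O[k]\big) &\approx \prod_{x \in \{r,\, b\}}
  \frac{1}{\sqrt{2\pi}\,\sigma_{E_x}[k]}
  \exp\!\left(-\frac{\big(E_x - \mu_{E_x}[k]\big)^2}{2\,\sigma_{E_x}^2[k]}\right), \\
P\big(O[k]\big) &= \frac{TP}{TP + FP}.
\end{align}
```

The denominator $P(I[k])$ is the probability of occurrence of the chromaticity values, which is treated in the following paragraph.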
Finally, the probability of the occurrence of chromaticity values $P(I[k] = (E_r, E_b))$ can be obtained empirically or analytically. The empirical approach would require constructing the histograms for the Er and Eb channels using a representative set of training data and normalizing the histograms to obtain the ratio between the number of pixels with the given chromaticity levels and the total number of pixels. Analytically, a cumulative distribution function for the values of each channel can be deduced under the assumption that the RGB values follow a uniform distribution within a window block that contains both object and background pixels (see Appendix A for the details of the calculation). The cumulative function for the chromaticity values under the uniform distribution assumption is given by:

$$F(y) = \begin{cases}
0, & y \le 0, \\[4pt]
\dfrac{y}{1-y}, & 0 < y \le \tfrac{1}{3}, \\[8pt]
\dfrac{y}{1-y} - \dfrac{(3y-1)^3}{6\,y^2\,(1-y)}, & \tfrac{1}{3} < y \le \tfrac{1}{2}, \\[8pt]
1 - \dfrac{(1-y)^2}{6\,y^2}, & \tfrac{1}{2} < y < 1, \\[8pt]
1, & y \ge 1.
\end{cases} \tag{6}$$

The probability density function $f(y)$ associated with the cumulative distribution of the chromaticity values $I[k] = (E_r, E_b)$ is easily obtained from (6) by differentiating $F(y)$ with respect to $y$. Figure 6 depicts the cumulative and density functions. The density function $f(y)$ reaches a maximum at $y \sim 0.36$, which is the most likely background value without prior knowledge of the object (sign) class. Thus, it is desirable that the objects of interest have their chromaticity levels far away from this value.
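The closed form implied by the uniform-RGB assumption can be checked numerically. The following sketch evaluates a piecewise expression for $F(y)$ that was re-derived from that assumption (with independent channels), approximates the density by a central difference, and compares the cumulative function against a Monte Carlo estimate. The function names, the sample count and the fixed seed are illustrative choices, not part of the original work.

```python
import random


def background_cdf(y):
    """CDF of X1 / (X1 + X2 + X3) for independent, uniformly distributed X1, X2, X3."""
    if y <= 0:
        return 0.0
    if y >= 1:
        return 1.0
    if y <= 1 / 3:
        return y / (1 - y)
    if y <= 1 / 2:
        return y / (1 - y) - (3 * y - 1) ** 3 / (6 * y * y * (1 - y))
    return 1 - (1 - y) ** 2 / (6 * y * y)


def background_pdf(y, h=1e-5):
    """Density approximated by a central finite difference of the CDF."""
    return (background_cdf(y + h) - background_cdf(y - h)) / (2 * h)


def empirical_cdf(y, samples=200_000, seed=0):
    """Monte Carlo estimate of P(R / (R + G + B) <= y) with uniform R, G, B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        r, g, b = rng.random(), rng.random(), rng.random()
        if r <= y * (r + g + b):  # same event as r / (r + g + b) <= y, without dividing
            hits += 1
    return hits / samples


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 100)]
    mode = max(grid, key=background_pdf)
    print("density peaks near y =", mode)
    print("closed form vs. simulation at y = 0.4:",
          background_cdf(0.4), empirical_cdf(0.4))
```

Evaluating the density on a grid places its maximum close to the value of about 0.36 quoted in the text, which gives some confidence that the piecewise expression matches the distribution intended there.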
The analysis thus far provides enough tools to compute the probability that a window corresponds to the object of interest using (2) and the statistical templates for the mean and standard deviation $\mu_{E_x}[k]$, $\sigma_{E_x}[k]$ over the window block $k = 1, 2, \ldots, n$. However, these templates consider both the pixels of the object of interest and the background; therefore, to improve discrimination, it is convenient to discard background pixels. To this end, Figure 7 presents representative points of the background and of the object of interest. The comparison of the histograms of the pixels at these points, presented in Figure 8, reveals that the luminance channel Y spreads over the entire range of possible values for background pixels. Hence, the standard deviation of the luminance channel is a good indicator to determine whether a pixel is part of the object of interest or part of the background. Figure 9 shows the mean and standard deviation for the luminance channel Y. It is clear that, for creating a mask that discards background regions, the template built using the standard deviation provides higher contrast between background and foreground than the template built using the mean value of the luminance.

To create a mask for discarding background pixels, an adequate threshold for the standard deviation template is $\sigma_Y = 60$, as may be observed in Figure 10, since it retains most of the pixels of the object of interest and discards all of the background. Assuming that the luminance Y is normally distributed, $\sigma_Y = 60$ implies that about 95% of the samples fall within ±120 intensity levels of the mean, i.e., they cover a range of 240 levels, which is almost the full range of an 8-bit image (256 levels). Thus, higher values for the threshold on $\sigma_Y$ are not convenient, while lower values cause part of the object to be labeled as background, as shown in Figure 10a with $\sigma_Y = 55$.
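A minimal sketch of how such a background mask could be built from a stack of aligned luminance images is shown below. It assumes that the reference images have identical size and are already registered, that the standard deviation is normalized by the number of images, and that a pixel is kept as part of the object when its standard deviation is strictly below the threshold; the function names are hypothetical.

```python
def pixelwise_mean_std(images):
    """Per-pixel mean and standard deviation over a stack of equally sized images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    mean = [[sum(img[y][x] for img in images) / n for x in range(w)] for y in range(h)]
    std = [
        [(sum((img[y][x] - mean[y][x]) ** 2 for img in images) / n) ** 0.5
         for x in range(w)]
        for y in range(h)
    ]
    return mean, std


def object_mask(std_y, threshold=60.0):
    """True where the luminance spread is low enough to be treated as the sign."""
    return [[s < threshold for s in row] for row in std_y]
```

Pixels whose luminance varies little across the reference images are retained, while pixels whose luminance spans most of the 8-bit range, as background pixels tend to do, are masked out.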
In summary, the recognition stage uses the luminance channel to remove the background and the chromaticity channels Er and Eb to confirm that a ROI contains a given traffic sign, which ensures robustness to illumination variations.
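Putting the pieces together, the recognition test on a candidate window can be sketched as follows. The tolerance `k`, the acceptance fraction `min_fraction` and the function names are assumptions chosen for illustration; the text only states that the percentage of pixels falling within the expected range is used as the discriminator.

```python
def match_fraction(channels, mask, k=2.0):
    """Fraction of unmasked pixels whose values fall within mean +/- k * std.

    channels is a list of (window, mean_template, std_template) triples,
    one per chromaticity channel; all are 2-D lists with the shape of mask.
    """
    hits = total = 0
    for y, mask_row in enumerate(mask):
        for x, keep in enumerate(mask_row):
            if not keep:
                continue  # background pixel according to the luminance mask
            total += 1
            if all(abs(win[y][x] - mu[y][x]) <= k * sd[y][x] for win, mu, sd in channels):
                hits += 1
    return hits / total if total else 0.0


def is_sign(channels, mask, k=2.0, min_fraction=0.8):
    """Accept the window when enough unmasked pixels agree with the templates."""
    return match_fraction(channels, mask, k) >= min_fraction
```

In use, `channels` would hold the Er and Eb windows of a ROI resized to the template size, together with the corresponding mean and standard deviation templates.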

Perception and Processing Systems
The vehicle shown in Figure 11a was employed as a testing platform for the experiments. The system comprises several cameras (visible spectrum, IR, catadioptric), an IMU and an RTK GPS; see [6] for further details. For the experiments presented here, only one camera from Imaging Source, model DFK31BF03 (see Figure 11b), was used together with the RTK GPS from Navcom Technology, model SF-2050, with decimeter positioning accuracy (see Figure 11c). The camera has a resolution of 1024 × 768 pixels and delivers images at 30 fps, while the GPS sampling rate is 10 Hz. Thus, the GPS data was interpolated to match the sampling instants of the camera. Registering the position is important in order to compute detection rates as a function of distance. The processing of the images was carried out on a PC with a 2.0 GHz Intel Core 2 Duo processor and 3.5 GB of RAM. All the algorithms were implemented in C++ using version 2.2 of the OpenCV library.
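The interpolation method is not specified in the text. As one plausible option, the sketch below linearly interpolates 10 Hz GPS fixes at the 30 fps frame timestamps and computes the distance to a sign of known position. It assumes that positions have already been projected to planar coordinates in meters and that timestamps share a common clock; the function names are hypothetical.

```python
import math
from bisect import bisect_right


def interpolate_positions(gps_t, gps_xy, frame_t):
    """Linearly interpolate GPS positions at the camera frame timestamps.

    gps_t must be sorted in increasing order; frames outside the GPS time
    span are clamped to the first or last fix.
    """
    positions = []
    for t in frame_t:
        j = min(max(bisect_right(gps_t, t), 1), len(gps_t) - 1)
        t0, t1 = gps_t[j - 1], gps_t[j]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)  # clamp to avoid extrapolation
        (x0, y0), (x1, y1) = gps_xy[j - 1], gps_xy[j]
        positions.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return positions


def distance_to_sign(position, sign_xy):
    """Euclidean distance in the plane between the vehicle and the sign."""
    return math.hypot(position[0] - sign_xy[0], position[1] - sign_xy[1])
```

Each frame then carries a distance label, which is what allows the detection rate to be reported per distance class.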

Experiments Employing the Viola-Jones Method and the Proposed Statistical Template Approach
In this section, the proposed traffic sign detection approach based on statistical templates is tested and compared to the well-known Viola-Jones method [45].

Viola-Jones Method:
The detection rates of the Viola-Jones approach for the stop and yield signs are summarized in Table 1 as a function of distance. The detection rate for stop signs is 100% when distances are 20 m or less. However, the detection rate rapidly falls to 0% for distances above 35 m. On the other hand, less than 3.1% of the yield signs were detected for distances below 20 m. For distances above 20 m, it was not possible to detect any yield sign. This is attributed, in part, to the fact that some samples in the dataset contained signs with marks and graffiti on them. These results show that the Viola-Jones approach is highly sensitive to possible modifications in the signs sought. The false alarm rates of the Viola-Jones approach for the stop and yield signs were practically 0%, as shown in Table 2.

Proposed Statistical Template Approach:
The proposed algorithm was executed employing eight different templates sized from 14 × 14 to 50 × 50 pixels in geometric progression. The detection rates presented in Table 3 show that a high rate of success was achieved for distances up to 37 m in the case of the stop sign and 35 m in the case of the yield sign. Detection rates fall to 0% for distances above 52 m in the case of the stop sign and 48 m in the case of the yield sign. In contrast to the Viola-Jones approach, the false alarm rates of the proposed method were between 3.6% and 6.9%, as shown in Table 2. A comparison of the detection rate performance of the proposed method with the Viola-Jones approach is presented in Figure 13. This figure shows the effectiveness of the proposed approach at detecting traffic signs earlier than the Viola-Jones method does. Tables 1 and 3 were constructed with approximately 430 images for each distance class.
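The tabulation of detection rates by distance class can be illustrated with a short sketch. The bin edges used in the example and the function name are hypothetical; the actual distance classes are those of Tables 1 and 3.

```python
def detection_rate_by_distance(samples, edges):
    """Detection rate in percent for each distance bin [edges[i], edges[i+1]).

    samples is an iterable of (distance_m, detected) pairs, one per frame
    that contains a sign; bins without samples yield None.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [frames, detections] per bin
    for distance, detected in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= distance < edges[i + 1]:
                counts[i][0] += 1
                counts[i][1] += int(detected)
                break
    return [100.0 * hits / total if total else None for total, hits in counts]
```

For instance, `detection_rate_by_distance([(12, True), (18, True), (25, True), (28, False), (40, False)], [10, 20, 30, 50])` yields one rate per bin, from the nearest class to the farthest.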
In terms of the computational effort, the Viola-Jones method required 450 ms per frame, while the proposed approach required 950 ms per frame. This processing time could be decreased for real-time operation by using dedicated graphics processing units (GPUs).

Finally, Figure 14 presents an extended example of the proposed system in real driving conditions during the day, for both stop and yield signs.

Conclusions
This paper presented an approach based on statistical templates computed from chromaticity and luminance values for traffic sign detection near road intersections and roundabouts. The approach is evaluated using a dataset of stop and yield (give way) signs from Chile and compared to the well-known Viola-Jones classification method.
The proposed approach is divided into two stages. The first stage is a segmentation stage based on the chromaticity filter applied to the Er and Eg channels of the ErEgEb color space (the normalized RGB color space). The chromaticity filter returns candidate regions that must be tested by the second stage, which is responsible for the recognition of traffic signs. The selection of the Er and Eg channels is based on the analysis of the histograms of the color components in four color spaces (RGB, YCbCr, HSV and ErEgEb), which shows that the ErEgEb space provides greater capacity to discriminate candidate traffic signs. The segmentation stage employs a selection threshold that can be computed automatically from a reference dataset. The recognition stage is based on a statistical template built with information of the Er and Eb chromaticity channels and the Y luminance channel. The luminance channel is employed to create a traffic sign mask from the variance of pixels, which allows background regions within a ROI to be discarded. The Er and Eb channels are used to compute a statistical template that provides the sign selection thresholds. A probabilistic model and probability distribution function were derived to construct a recognition process based on Bayesian inference.
The results obtained show that the proposed approach has higher detection rates than the Viola-Jones method. The experiments considered the evaluation of the detection rate at different distances as the vehicle approaches a traffic sign. On average, the proposed approach exhibits a detection rate of 87.5% for yield signs and 95.4% for stop signs at distances below 48 m.
The two main advantages of the proposed approach are that it does not require a computationally expensive training or calibration stage, and that it is not sensitive to changes in illumination, partial occlusion or marks drawn on the traffic sign. The main limitation of the approach is its computing time, which is about 950 ms per frame; future work will aim to reduce this processing time. Future work will also consider extending the proposed approach to the detection of traffic lights at road junctions and crosswalks, as well as to the detection of other signs not necessarily found at road intersections. Ongoing research is considering the joint analysis of lane geometry and edge continuity together with traffic sign detection to improve the detection of road intersections.

Author Contributions:
The framework was proposed by Miguel Torres, and further development and implementation were realized by Gabriel Villalón. Marco Flores and Miguel Torres studied some of the ideas, analyzed the experiment results and prepared the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Deduction of the Background Probability Distribution
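To derive the cumulative distribution function $F(y)$ of the chromaticity values of any pixel in Equation (6), it is assumed that each channel of the RGB color space follows a uniform distribution. Thus, three independent random variables $X_1$, $X_2$ and $X_3$ representing each channel are defined according to the uniform distribution:

$$f(x_i) = \begin{cases} \dfrac{1}{S}, & 0 \le x_i \le S, \\[6pt] 0, & \text{otherwise}, \end{cases} \qquad i = 1, 2, 3. \tag{A1}$$

The cumulative distribution function of the chromaticity channel $E_{X_1} = X_1/(X_1 + X_2 + X_3)$, given by:

$$F(y) = P\left(E_{X_1} \le y\right) = P\left(\frac{X_1}{X_1 + X_2 + X_3} \le y\right), \tag{A2}$$

is calculated as follows. The procedure for $E_{X_2}$ or $E_{X_3}$ is exactly the same.

In finding a closed-form expression for (A2), the following cases are considered:
• Case $y = 0$: $F(0) = 0$.
• Case $y = 1$: $F(1) = 1$.
• Case $0 < y < 1$: $F(y) = P\left(X_1 \le \frac{y}{1-y}(X_2 + X_3)\right)$.

The last case amounts to solving the following integral:

$$F(y) = \int_0^S \int_0^S \int_0^{\min\left(S,\; \frac{y}{1-y}(x_2 + x_3)\right)} f(x_1)\, f(x_2)\, f(x_3)\; dx_1\, dx_2\, dx_3.$$

The inner integral of $f(x_1)$ must consider that if $\frac{y}{1-y}(x_2 + x_3) \ge S$, then the density function $f(x_1) = \frac{1}{S}$ is integrated up to $S$, while if $\frac{y}{1-y}(x_2 + x_3) < S$, then the integral is computed for $x_1 \in \left[0, \frac{y}{1-y}(x_2 + x_3)\right]$.

For the latter to be fulfilled throughout the non-zero domain of $f(x_2)$ and $f(x_3)$, the following must be satisfied:

$$\frac{y}{1-y}\, 2S \le S \quad \Longleftrightarrow \quad y \le \frac{1}{3}.$$

Similarly, when analyzing the limits of integration for the integrals over $x_2$ and $x_3$, so that the non-null parts are integrated and the result is continuous over the range of integration, the following intervals are obtained for $y$: $(0, \frac{1}{3}]$, $(\frac{1}{3}, \frac{1}{2}]$ and $(\frac{1}{2}, 1)$.

For $0 < y \le \frac{1}{3}$:

$$F(y) = \frac{1}{S^3} \int_0^S \int_0^S \frac{y}{1-y}(x_2 + x_3)\; dx_2\, dx_3 = \frac{y}{1-y}.$$

For $\frac{1}{3} < y \le \frac{1}{2}$, the integral is calculated as follows:

$$F(y) = \frac{y}{1-y} - \frac{1}{S^3} \iint_{x_2 + x_3 > \frac{1-y}{y} S} \left(\frac{y}{1-y}(x_2 + x_3) - S\right) dx_2\, dx_3 = \frac{y}{1-y} - \frac{(3y-1)^3}{6\, y^2\, (1-y)}.$$

Finally, for $\frac{1}{2} < y < 1$, the integral is computed as follows:

$$F(y) = 1 - \frac{1}{S^3} \iint_{x_2 + x_3 < \frac{1-y}{y} S} \left(S - \frac{y}{1-y}(x_2 + x_3)\right) dx_2\, dx_3 = 1 - \frac{(1-y)^2}{6\, y^2}.$$

The above results in the cumulative distribution presented in Equation (6).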

Appendix B. Traffic Sign Detection by Using the Viola-Jones Method
The Viola-Jones object recognition approach [14,40] has been used in multiple computer vision applications [45][46][47]. In this work, it has been used to build a traffic sign detector, trained for stop and yield signs. Figure A1 shows the first and second convolution masks based on Haar-like features for each of the traffic signs considered in this work.