3.2. Existing Databases and Their Limitations
Weather information can be an important factor for many computer vision applications, such as object detection and recognition or scene categorisation. For this, we need image databases including weather information.
There are only a few publicly available databases for meteorological classification, and they contain a limited number of images [24]. The database provided in [2] includes clear weather, light rain and heavy rain for rain classification.
A second rain-only database, Cerema-AWR (Adverse Weather Rain), was developed in [25]. It contains 30,400 images divided into two types of rain, natural and simulated, for different camera settings. Similarly, for fog, the authors of [26] studied different fog levels (clear weather, light fog and heavy fog) using a database of 321 images collected from six cameras installed at three different weather stations. The images were taken from static, fixed platforms.
The work in [18] provided a meteorological classification database with more real scenes, but it includes only sunny and cloudy images. This weather dataset contains 10,000 images divided into two classes. In [27], the authors described the Multi-class Weather Image database (MWI), which contains 20,000 images divided into four weather classes (sunny, rainy, snowy and haze) and collected from web albums and films, such as Flickr, Picasa, MojiWeather, Poco and Fengniao. This database is annotated manually.
In [24], the authors present a 65,000-image database collected from Flickr and divided into six weather classes: sunny, cloudy, rainy, snowy, foggy and stormy. This database is called MWD (Multi-class Weather Dataset). In addition, in [28], the authors created the Image2Weather database with 183,000 images divided into five classes: sun, cloud, snow, rain and fog.
In [29], the Archive of Many Outdoor Scenes (AMOS+C) database was proposed as the first large-scale image dataset associated with meteorological information. It extends the AMOS database by collecting meteorological data via the time stamps and geolocation of the images.
The authors of [30] introduced the Seeing Through Fog database, which contains 12,000 samples acquired under different weather and illumination conditions and 1500 measurements acquired in a fog chamber. Furthermore, the dataset contains accurate human annotations for objects labelled with 2D and 3D bounding boxes and frame tags describing the weather, daytime and street conditions.
Another database, Cerema-AWP [5], contains adverse weather conditions, namely fog and rain, each at two different intensities. These conditions were captured during day and night and were used to evaluate pedestrian detection algorithms under degraded conditions. Fog and rain in this database were produced artificially in the French Cerema PAVIN BP platform, which is the only European platform allowing the production of controlled fog and rain. We refer to [31,32] for more details on this platform.
During our study of state-of-the-art databases, we encountered many difficulties. Many articles do not present the experimental approach used for weather classification. They are poorly documented and lack the explanations that would allow readers to use the databases reliably. As a consequence, it is very difficult to compare our work with previous works in a satisfactory way.
There are other difficulties related to weather databases. Either they are not public, such as [26,27], or, if they are accessible, they lack the information needed to study them (for example, the spatial and temporal meteorological conditions in which the images were taken). In this context, a qualitative database analysis was presented in [33]. The authors focused on assessing the documentation quality of the Cerema-AWP database [5]. However, this analysis does not discuss the qualitative or experimental side of the database linked to meteorological data.
Existing meteorological databases cannot be used rigorously. Indeed, most of them are collected from websites like Google Images or Flickr, and the images are then labelled manually, as in the MWD database [24]. The I2W database images are also collected from the web. Although they are automatically labelled with meteorological information measured by sensors, the database still contains ambiguous images (the publicly available database is not the same as the one presented in [28]: the authors added a sixth class called “Others”, which contains images whose classification is doubtful). The different filtering steps are not described precisely, which leaves ambiguities. Another problem with the I2W database is the imbalance between meteorological classes, with a large difference in the number of images (70,501 images for the sunny class and 357 images for the fog class). This imbalance can lead to difficulties for neural networks during the learning phase.
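To make the scale of this imbalance concrete, the following minimal Python sketch computes the ratio between the two class sizes quoted above, together with inverse-frequency class weights. This is an illustration only: inverse-frequency weighting is a common mitigation for imbalanced classes, but it is not described in [28] and is not claimed to be the approach of this study. The function name `inverse_frequency_weights` is hypothetical, and only the two counts given in the text are used.

```python
def inverse_frequency_weights(counts):
    """Return one weight per class, proportional to 1 / class size.

    With weight_c = N / (K * n_c), every class contributes the same
    total weight N / K, whatever its number of images.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Class sizes quoted in the text for the public I2W database.
# The other I2W classes are omitted because their sizes are not given here.
counts = {"sunny": 70_501, "fog": 357}

imbalance_ratio = counts["sunny"] / counts["fog"]
weights = inverse_frequency_weights(counts)

print(f"sunny/fog ratio: {imbalance_ratio:.1f}")
for name, weight in weights.items():
    print(f"{name}: weight {weight:.3f}")
```

With these two counts alone, a single fog image would have to weigh nearly 200 times as much as a single sunny image for both classes to contribute equally, which illustrates how little fog data a network sees during learning.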
Given these labelling problems, and since weather classification requires reliable data, we have built a rigorous, automatically annotated database in which meteorological classes are assigned from sensors installed on the same site as the acquisition camera. In this way, our data guarantees better objectivity. This database is called Cerema-AWH and is presented in the next section.
3.3. Cerema-AWH Database
The Cerema-AWH database (Cerema Adverse Weather Highway database) contains different weather conditions (Figure 1). These conditions are real conditions (unlike those of the Cerema-AWP database [5], which are artificially generated). This database was created using an AVT Pike F421B road camera installed at the Fageole highway on the E70 motorway, which belongs to “Direction interdépartementale des routes Massif-Central”, to acquire images with a
resolution at a 1.9 Hz frequency. The specificity of this site is its location at an altitude of 1114 m, which allows naturally degraded weather conditions to be observed (very heavy rain, heavy fog and frequent snowfall). This outdoor site is equipped with special-purpose meteorological sensors: an OTT Parsivel optical disdrometer to measure rainfall intensity, a Lufft VS20-UMB visibilimeter to measure the meteorological visibility distance and a Vaisala PWD12 weather sensor to measure both (Figure 2). In this way, images and weather conditions are recorded in the same place, unlike other databases where images are acquired in one place and meteorological data are collected at the closest meteorological station.
Upon completing the image acquisition phase, the data were automatically labelled by associating each image with the corresponding weather data. In this study, the snow case is not addressed. The database used contains over 195,000 images covering five weather conditions, distributed as indicated in Table 3: day normal condition (DNC), day light fog (DF1), day heavy fog (DF2), day light rain (DR1) and day heavy rain (DR2). To avoid introducing bias into the results while training and testing the algorithm, images acquired on even days were used to train the algorithm, and images acquired on odd days were used to test it. Dividing the database in this way ensures that similar images are not used for both the training and testing phases, which might otherwise have introduced severe bias into the results.
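The labelling and splitting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual processing chain: "even days" is interpreted here as an even day of the month, each image is paired with the sensor record closest in time, and the sensor log is assumed to have already been converted to class codes (the rule mapping sensor measurements to DNC, DF1, etc. is not given in this section). The function names, file names and timestamps are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_reading(image_time, reading_times, readings):
    """Return the reading whose timestamp is closest to image_time.

    reading_times must be non-empty, sorted in ascending order and
    aligned index by index with readings.
    """
    i = bisect_left(reading_times, image_time)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(reading_times)]
    best = min(candidates, key=lambda j: abs(reading_times[j] - image_time))
    return readings[best]


def split_by_day_parity(samples):
    """Split (path, timestamp, label) samples by parity of the day of month."""
    train = [s for s in samples if s[1].day % 2 == 0]
    test = [s for s in samples if s[1].day % 2 == 1]
    return train, test


# Hypothetical sensor log: timestamps and class codes already derived from it.
reading_times = [datetime(2020, 1, 2, 10, 0), datetime(2020, 1, 3, 10, 0)]
readings = ["DNC", "DF1"]

# Hypothetical image files with their acquisition times.
images = [
    ("img_0001.png", datetime(2020, 1, 2, 10, 0, 1)),
    ("img_0002.png", datetime(2020, 1, 3, 9, 59, 59)),
]

samples = [
    (path, t, nearest_reading(t, reading_times, readings)) for path, t in images
]
train, test = split_by_day_parity(samples)
```

Because consecutive frames acquired at 1.9 Hz are nearly identical, splitting by whole days rather than by random image sampling keeps such near-duplicates on the same side of the split.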
3.4. Detailed Description of the Databases Used in This Study
In our study, we used a new database called Cerema-AWH. To compare the efficiency of our database with others, we selected two state-of-the-art databases: the Image2Weather database [28] and the MWD database [24]. These databases are detailed below.
Image2Weather (I2W) [28] is a meteorological database containing 183,798 images, of which 119,141 images are classified into five classes: sunny, cloudy, rainy, foggy and snowy. It is a large-scale image collection. I2W was built from a selection of existing images belonging to the EC1M database (European City 1 Million). From there, the authors of [28] implemented an automatic image annotation procedure. From the URL and image identifier available in EC1M, they used Flickr to obtain the image itself and its associated metadata, such as time and location. Based on the position where the image was taken, the corresponding elevation information was acquired through Google Maps. Based on the image longitude and latitude, they used Weather Underground to extract the corresponding weather properties. Weather information on the Weather Underground website comes from over 60,000 weather stations. For about 80% of the images in the database, the distance to the nearest station is less than four kilometres. By using these three tools, the I2W database was built with 28 properties available for each image, including date, time, temperature, weather condition, humidity, visibility and rainfall rate. For the sake of meteorological estimation, the authors focus only on photos captured outdoors in which the sky region occupies more than 10% of the entire image. Overall, the images collected cover most of Europe.
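Two steps of this procedure, matching an image to its nearest weather station and filtering on the sky region, can be illustrated with the short Python sketch below. It is not the implementation of [28]: the haversine formula on a spherical Earth is a standard approximation chosen here for illustration, the function names and station coordinates are hypothetical, and the sky fraction is assumed to be supplied by some upstream sky segmentation that is not described in this section.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (station_id, distance_km) of the closest station.

    stations maps a station id to a (latitude, longitude) pair.
    """
    distances = (
        (sid, haversine_km(lat, lon, s_lat, s_lon))
        for sid, (s_lat, s_lon) in stations.items()
    )
    return min(distances, key=lambda item: item[1])


def keep_image(sky_fraction, min_sky=0.10):
    """Keep an image only if the sky covers more than min_sky of it."""
    return sky_fraction > min_sky


# Hypothetical station coordinates, for illustration only.
stations = {"station_a": (48.85, 2.35), "station_b": (45.76, 4.84)}
station_id, distance_km = nearest_station(48.86, 2.34, stations)
within_4_km = distance_km < 4.0
```

Even at less than four kilometres, the station measurement and the scene in the image may differ, which is one source of the label ambiguity discussed for web-collected databases.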
The Multi-class Weather database, denoted MWD [24], was created from Flickr and Google images using the keywords “outdoor” and “weather”. It contains 65,000 images, of which 60,000 are divided into six weather classes: sunny, cloudy, rainy, snowy, foggy and stormy. The images were annotated manually by asking annotators to keep images containing outdoor scenes with reasonable resolutions. Visually similar images were rejected. The MWD database was divided into two groups. The first group contains images annotated with classes and is used for meteorological classification, in which images with critical meteorological conditions are identified. In this group, each image should belong to only one class. For this, the images that do not contain visible raindrops, snow or fog are classified in either the sunny or the cloudy class, whereas images with rain, snow or thunder are considered to be affected by adverse weather conditions. Images tagged as “ambiguous weather” are rejected. In the end, 10,000 images of each class were retained. The second group contains images annotated with weather attributes and is used for meteorological attribute recognition, which aims to find all the meteorological conditions in one image. Compared with the classification group, images belonging to this group allow the recognition of two or more weather conditions. Images labelled with at least one “ambiguous attribute” are eliminated. In the end, 5000 images were selected for this group.