
Full Reference Objective Quality Assessment for Reconstructed Background Images

School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
*
Author to whom correspondence should be addressed.
J. Imaging 2018, 4(6), 82; https://doi.org/10.3390/jimaging4060082
Received: 16 May 2018 / Revised: 6 June 2018 / Accepted: 6 June 2018 / Published: 19 June 2018
(This article belongs to the Special Issue Detection of Moving Objects)

Abstract

With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images.
Keywords: background reconstruction; image quality assessment; image dataset; subjective evaluation; perceptual quality; objective quality metric

1. Introduction

A clean background image has great significance in multiple applications. It can be used for video surveillance [1], activity recognition [2], object detection and tracking [3,4], street view imaging and location-based services on web-based maps [5,6], and texturing 3D models obtained from multiple photographs or videos [7]. However, acquiring a clean photograph of a scene is seldom possible. There are always some unwanted objects occluding the background of interest. The technique of acquiring a clean background image by removing the occlusions using frames from a video or multiple views of a scene is known as background reconstruction or background initialization. Many algorithms have been proposed for initializing the background images from videos, for example [8,9,10,11,12,13,14]; and also from multiple images such as [15,16,17].
Background initialization or reconstruction is crippled by multiple challenges. The pseudo-stationary background (e.g., waving trees, waves in water, etc.) poses additional challenges in separating the moving foreground objects from the relatively stationary background pixels. The illumination conditions can vary across the images, thus changing the global characteristics of each image. The illumination changes cause local phenomena such as shadows, reflections and shading, which change the local characteristics of the background across the images or frames in a video. Finally, the removal of ‘foreground’ objects from the scene creates holes in the background that need to be filled in with pixels that maintain the continuity of the background texture and structures in the recovered image. Thus, the background reconstruction algorithms can be characterized by two main tasks:
(1)
foreground detection, in which the foreground is separated from the background by classifying pixels as foreground or background;
(2)
background recovery, in which the holes formed due to foreground removal are filled.
The performance of a background extraction algorithm depends on two factors:
(1)
its ability to detect the foreground objects in the scene and completely eliminate them; and
(2)
the perceived quality of the reconstructed background image.
Traditional statistical techniques such as Peak Signal to Noise Ratio (PSNR), Average Gray-level Error (AGE), total number of error pixels (EPs), percentage of EPs (pEP), number of Clustered Error Pixels (CEPs) and percentage of CEPs (pCEPs) [18] quantify the performance of the algorithm in its ability to remove foreground objects from a scene to a certain extent, but they do not give an indication of the perceived quality of the generated background image. On the other hand, the existing Image Quality Assessment (IQA) techniques such as Multi-Scale Structural Similarity (MS-SSIM) [19] and Color image Quality Measure (CQM) [20] used by the authors in [21] to compare different background reconstruction algorithms are not designed to identify any residual foreground objects in the scene. The lack of a quality metric that can reliably assess the performance of background reconstruction algorithms by quantifying both aspects of a reconstructed background image motivated the development of the proposed Reconstructed Background Quality Index (RBQI). The proposed RBQI is a full-reference objective metric that can be used by background reconstruction algorithm developers to assess and optimize the performance of their developed methods and also by users to select the best performing method. Research challenges such as the Scene Background Modeling Challenge (SBMC) 2016 [22] are also in need of a reliable objective scoring measure. RBQI uses the contrast, structure and color information to determine the presence of any residual foreground objects in the reconstructed background image as compared to the reference background image and to detect any unnaturalness introduced by the reconstruction algorithm that affects the perceived quality of the reconstructed background image.
This paper also presents two datasets that are constructed to assess the performance of the proposed as well as popular existing objective quality assessment methods in predicting the perceived visual quality of the reconstructed background images. The datasets consist of reconstructed background images generated using different background reconstruction algorithms in the literature along with the corresponding subjective ratings. Some of the existing datasets such as video surveillance datasets (Wallflower [23], I2R [3]), background subtraction datasets (UCSD [24], CMU [25]) and the object tracking evaluation dataset “Performance Evaluation of Tracking and Surveillance” (PETS) are not suited for benchmarking the background reconstruction algorithms or the objective quality metrics used for evaluating the perceived quality of reconstructed background images, as they do not provide reconstructed background images as ground truth. The more recent dataset “Scene Background Modeling Net” (SBMNet) [26,27] is targeted at comparing the performance of the background initialization algorithms. It provides the reconstructed images as ground truth, but it does not provide any subjective ratings for these images. Hence, the SBMNet dataset [26,27] is not suited for benchmarking the performance of objective background visual quality assessment. Thus, the datasets proposed in this work are the first and currently the only datasets that can be used for benchmarking existing and future metrics developed to assess the quality of reconstructed background images. The constructed datasets and the code for RBQI are available for download as Supplementary Materials.
The rest of the paper is organized as follows. In Section 2, we highlight the limitations of existing popular assessment methods [28]. We introduce the new benchmarking datasets in Section 3 along with the details of the subjective tests. In Section 4, we propose a new index that makes use of a probability summation model to combine structure and color characteristics at multiple scales for quantifying the perceived quality in reconstructed background images. Performance evaluation results for the existing and proposed objective visual quality assessment methods are presented in Section 5 for reconstructed background images. Finally, we conclude the paper in Section 6 and also provide directions for future research.

2. Existing Full Reference Background Quality Assessment Techniques and Their Limitations

Existing background reconstruction quality metrics can be classified into two categories: statistical and image quality assessment (IQA) techniques, depending on the type of features used for measuring the similarity between the reconstructed background image and reference background image.

2.1. Statistical Techniques

Statistical techniques use intensity values at co-located pixels in the reference and reconstructed background images to measure the similarity. Popular statistical techniques [18] that have been traditionally used for judging the performance of background initialization algorithms are briefly explained here:
(i)
Average Gray-level Error (AGE): AGE is calculated as the average of the absolute differences between the gray levels of the co-located pixels in the reference and reconstructed background images.
(ii)
Error Pixels (EP): EP gives the total number of error pixels. A pixel is classified as an error pixel if the absolute difference between the corresponding pixels in the reference and reconstructed background images is greater than an empirically selected threshold $\tau$.
(iii)
Percentage Error Pixels (pEP): Percentage of the error pixels, calculated as EP/N, where N is the total number of pixels in the image.
(iv)
Clustered Error Pixels (CEP): CEP gives the total number of clustered error pixels. A clustered error pixel is defined as an error pixel whose four connected pixels are also classified as error pixels.
(v)
Percentage Clustered Error Pixels (pCEP): Percentage of the clustered error pixels, calculated as CEP/N, where N is the total number of pixels in the image.
Though these techniques were used in the literature to assess the quality of reconstructed background images, their performance was not previously evaluated. As we show in Section 5 and as noted by the authors in [28], the statistical techniques were found to not correlate well with the subjective quality scores in terms of prediction accuracy and prediction consistency.
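The five statistical measures listed above can be sketched in a few lines of NumPy. This is only an illustration of the definitions as stated here, not the reference implementation of [18]: the function name, the default threshold `tau=20` and the treatment of pixels outside the image border as non-error pixels are all assumptions made for this example.

```python
import numpy as np


def statistical_measures(reference, reconstructed, tau=20):
    """Return AGE, EP, pEP, CEP and pCEP for two same-sized gray-level images.

    tau is the error-pixel threshold; the default is only a placeholder.
    """
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    if ref.ndim != 2 or ref.shape != rec.shape:
        raise ValueError("expected two 2-D images of identical shape")

    abs_diff = np.abs(ref - rec)
    n_pixels = abs_diff.size

    # AGE: average absolute gray-level difference over all pixels.
    age = float(abs_diff.mean())

    # EP: pixels whose absolute difference exceeds the threshold.
    error = abs_diff > tau
    ep = int(error.sum())

    # CEP: error pixels whose four connected neighbours (up, down, left,
    # right) are error pixels as well. Pixels outside the image count as
    # non-error pixels (an assumption of this sketch).
    padded = np.pad(error, 1, mode="constant", constant_values=False)
    clustered = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]
        & padded[2:, 1:-1]
        & padded[1:-1, :-2]
        & padded[1:-1, 2:]
    )
    cep = int(clustered.sum())

    return {
        "AGE": age,
        "EP": ep,
        "pEP": ep / n_pixels,
        "CEP": cep,
        "pCEP": cep / n_pixels,
    }
```

Because every measure is built from co-located pixel differences only, a small but highly visible foreground residue and a diffuse, barely visible intensity offset can produce similar values, which is consistent with the weak correlation with subjective scores reported above.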

2.2. Image Quality Assessment Techniques

The existing Full Reference Image Quality Assessment (FR-IQA) techniques use perceptually inspired features for measuring the similarity between two images. Though these techniques have been shown to work reasonably well while assessing images affected by distortions such as blur, compression artifacts and noise, these techniques have not been designed for assessing the quality of reconstructed background images.
In [21,27], popular FR-IQA techniques including Peak Signal to Noise Ratio (PSNR), Multi-Scale Structural Similarity (MS-SSIM) [19] and Color image Quality Measure (CQM) [20] were adopted for objectively comparing the performance of the different background reconstruction algorithms; however, no performance evaluation was carried out to support the choice of these techniques. Other popular IQA techniques include Structural Similarity Index (SSIM) [29], visual signal-to-noise ratio (VSNR) [30], visual information fidelity (VIF) [31], pixel-based VIF (VIFP) [31], universal quality index (UQI) [32], image fidelity criterion (IFC) [33], noise quality measure (NQM) [34], weighted signal-to-noise ratio (WSNR) [35], feature similarity index (FSIM) [36], FSIM with color (FSIMc) [36], spectral residual based similarity (SR-SIM) [37] and saliency-based SSIM (SalSSIM) [38]. A review of existing FR-IQA techniques is presented in [39,40,41]. The suitability of these techniques for evaluating the quality of reconstructed background images remains unexplored.
As the first contribution of this paper, we present two benchmarking datasets that can be used for comparing the performance of different techniques in objectively assessing the perceived quality of the reconstructed background images. These datasets contain reconstructed background images along with their subjective ratings, details of which are discussed in Section 3.1. A preliminary version of the benchmarking dataset discussed in Section 3.1.1 was published in [28]. In this paper, we provide an additional larger dataset as described in Section 3.1.2. We also propose a novel objective Reconstructed Background Quality Index (RBQI) that is shown to outperform existing techniques in assessing the perceived visual quality of reconstructed background images.

3. Subjective Quality Assessment of Reconstructed Background Images

3.1. Datasets

In this section, we present two different datasets constructed as part of this work to serve as benchmarks for comparing existing and future techniques developed for assessing the quality of reconstructed background images. The images and subjective experiments for both datasets are described in the subsequent subsections.
Each dataset contains the original sequence of images or videos that are used as inputs to the different reconstruction algorithms, the background images reconstructed by the different algorithms and the corresponding subjective scores. For details on the algorithms used to reconstruct the background images, the reader is referred to [27,28].

3.1.1. Reconstructed Background Quality (ReBaQ) Dataset

This dataset consists of eight different scenes. Each scene consists of a sequence of eight images where every image is a different view of the scene captured by a stationary camera. Each image sequence is captured such that the background is visible at every pixel in at least one of the views. A reference background image that is free of any foreground objects is also captured for every scene. Figure 1 shows the reference images corresponding to each of the eight different scenes in this dataset. The spatial resolution of the sequence corresponding to each of the scenes is 736 × 416.
Each of the image sequences is used as input to twelve different background reconstruction algorithms [8,9,10,11,12,13,14,15,16,17]. The default settings as suggested by the authors in the respective papers were used for generating the background images. For the block-based algorithms of [11,14] and [17], the block sizes are set to 8, 16 and 32 to take into account the effect of varying block sizes on the perceived quality of the recovered background. As a result, 18 background images are generated for each of the eight scenes. These 144 (18 × 8) reconstructed background images along with the corresponding reference images for the scene are then used for the subjective evaluation. Each of the scenes poses a different challenge for the background reconstruction algorithms. For example, “Street” and “Wall” are outdoor sequences with textured backgrounds while “Hall” is an indoor sequence with textured background. The “WetFloor” sequence challenges the underlying principle of many background reconstruction algorithms with water appearing as a low-contrast foreground object. The “Escalator” sequence has large motion in the background due to the moving escalator, while “Park” has smaller motion in the background due to waving trees. The “Illumination” sequence exhibits changing light sources, directions and intensities while the “Building” sequence has changing reflections in the background. Broadly, the dataset contains two categories based on the scene characteristics: (i) Static, the scenes for which all the pixels in the background are stationary; and (ii) Dynamic, the scenes for which there are non-stationary background pixels (e.g., moving escalator, waving trees, varying reflections). Four out of the eight scenes in the ReBaQ dataset are categorized as Static and the remaining four are categorized as Dynamic scenes. The reference background images corresponding to the static scenes are shown in Figure 1a.
Although there are reflections on the floor in the “WetFloor” sequence, it does not exhibit variations at the time of recording and hence it is categorized as a static background scene. The reference background images corresponding to the dynamic background scenes are shown in Figure 1b.

3.1.2. SBMNet Based Reconstructed Background Quality (S-ReBaQ) Dataset

This dataset is created from the videos in the Scene Background Modeling Net (SBMNet) dataset [26] used for the Scene Background Modeling Challenge (SBMC) 2016 [22]. SBMNet consists of image sequences corresponding to a total of 79 scenes. These image sequences are representative of typical indoor and outdoor visual data captured in surveillance, smart environment, and video dataset scenarios. The spatial resolutions of the sequences corresponding to different scenes vary from 240 × 240 to 800 × 600. The length of the sequences also varies from 6 to 9370 images. The authors of SBMNet categorize these scenes into eight different classes based on the challenges posed [26]: (a) the Basic category represents a mixture of mild challenges typical of the shadows, Dynamic Background, Camera Jitter and Intermittent Object Motion categories; (b) the Background motion category includes scenes with strong (parasitic) background motion; for example, in the “Advertisement Board” sequence, the advertisement board in the scene periodically changes; (c) the Intermittent Motion category includes sequences with scenarios known for causing “ghosting” artifacts in the detected motion; (d) the Jitter category contains indoor and outdoor sequences captured by unstable cameras; (e) the Clutter category includes sequences containing a large number of foreground moving objects occluding a large portion of the background; (f) the Illumination Changes category contains indoor sequences containing strong and mild illumination changes; (g) the Very Long category contains sequences each with more than 3500 images; and (h) the Very Short category contains sequences with a limited number of images (fewer than 20). The authors of SBMNet [26] provide reference background images for only 13 scenes out of the 79 scenes. There is at least one scene corresponding to each category with a reference background image available. We use only these 13 scenes for which the reference background images are provided.
Figure 2 shows the reference background images corresponding to the scenes in this dataset with the categories from SBMNet [26,27] in brackets. Background images that were reconstructed by 14 algorithms submitted to SBMC [12,16,42,43,44,45,46,47,48,49,50,51] corresponding to the selected 13 scenes were used in this work for conducting subjective tests. As a result, a total of 182 ( 13 × 14 ) reconstructed background images along with their corresponding subjective scores form the S-ReBaQ dataset.

3.2. Subjective Evaluation

The subjective ratings are obtained by asking the human subjects to rate the similarity of the reconstructed background images to the reference background images. The subjects had to score the images based on three aspects: (1) overall perceived visual image quality; (2) visibility or presence of foreground objects; and (3) perceived background reconstruction quality. The subjects had to score the image quality on a 5-point scale, with 1 being assigned to the lowest rating of ‘Bad’ and 5 assigned to the highest rating of ‘Excellent’. The second aspect was determining the presence of foreground objects. For our application, we defined the foreground object as any object that is not present in the reference image. The foreground visibility was scored on a 5-point scale marked as: ‘1—All foreground visible’, ‘2—Mostly visible’, ‘3—Partly visible but annoying’, ‘4—Partly visible but not annoying’ and ‘5—None visible’. The background reconstruction quality was also measured using a 5-point scale similar to that of the image quality, but the choices were limited based on how the first two aspects of an image were scored. If either the image quality or foreground object visibility was rated 2 or less, the highest possible score for background reconstruction quality was restricted to the minimum of the two scores. For example, as illustrated in Figure 3, if the image quality was rated as excellent, but the foreground object visibility was rated 1 (all visible), the reconstructed background quality cannot be scored to be very high. Choices for background reconstruction quality rating were not restricted for any other image quality and foreground object visibility scores. The background reconstruction quality scores, referred to as raw scores in the rest of the paper, are used for calculating the Mean Opinion Score (MOS).
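The restriction on the background reconstruction quality rating can be written as a small rule. The sketch below only encodes the rule as described in the paragraph above; the function name and argument names are hypothetical and are not taken from the actual test interface.

```python
def max_reconstruction_score(image_quality, foreground_visibility, scale_max=5):
    """Highest background reconstruction quality score a subject may select.

    Both inputs are ratings on the 5-point scales described in the text
    (1 = worst, 5 = best).
    """
    for score in (image_quality, foreground_visibility):
        if not 1 <= score <= scale_max:
            raise ValueError("ratings must lie between 1 and scale_max")

    # If either aspect is rated 2 or less, cap the choice at the lower rating.
    if image_quality <= 2 or foreground_visibility <= 2:
        return min(image_quality, foreground_visibility)

    # Otherwise the full scale remains available.
    return scale_max
```

For the situation illustrated in Figure 3 (image quality 5, foreground visibility 1), the rule limits the reconstruction quality score to 1.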
We adopted a double-stimulus technique in which the reference and the reconstructed background images were presented side-by-side [52] to each subject as shown in Figure 3. Though the same testing strategy and set up was used for the ReBaQ and S-ReBaQ datasets described in Section 3.1, the tests for each dataset were conducted in separate sessions.
As discussed in [28], the subjective experiments were carried out on a 23-inch Alienware monitor with a resolution of 1920 × 1080. Before the experiment, the monitor was reset to its factory settings. The setup was placed in a laboratory under normal office illumination conditions. Subjects were asked to sit at a viewing distance of 2.5 times the monitor height.
Seventeen subjects participated in the subjective test for the ReBaQ dataset, while sixteen subjects participated in the subjective test for the S-ReBaQ dataset. The subjects were tested for vision and color blindness using the Snellen chart [53] and Ishihara color vision test [54], respectively. A training session was conducted before the actual subjective testing, in which the subjects were shown a few images covering different quality levels and distortions of the reconstructed background images and their responses were noted to confirm their understanding of the tests. The images used during training were not included in the subjective tests.
Since the number of participating subjects was less than 20 for each of the datasets, the raw scores obtained by subjective evaluation were screened using the procedure in ITU-R BT.500-13 [52]. The kurtosis of the scores is determined as the ratio of the fourth-order moment to the square of the second-order moment. If the kurtosis lies between 2 and 4, the distribution of the scores can be assumed to be normal. For normally distributed scores, a subject is rejected if more than 5% of the scores given by that subject lie outside the range of two standard deviations from the mean scores. For scores that are not normally distributed, the range is instead taken to be $\sqrt{20}$ times the standard deviation. In our study, two subjects were found to be outliers and the corresponding scores were rejected for the ReBaQ dataset, while no subject was rejected for the S-ReBaQ dataset. MOS scores were calculated as the average of the raw scores retained after outlier removal. The raw scores and MOS scores with the standard deviations are provided along with the dataset.
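A minimal sketch of this screening step is given below, assuming the raw scores are arranged as a `(subjects × images)` array. It follows the simplified description in the paragraph above, with the $\sqrt{20}$ multiplier that ITU-R BT.500 specifies for non-normal score distributions; the full Recommendation contains additional conditions that are omitted here. The function name and the use of the sample standard deviation are assumptions of this example.

```python
import numpy as np


def screen_subjects(scores, max_outlier_fraction=0.05):
    """Screen subjects and compute MOS.

    scores: array of shape (n_subjects, n_images) holding the raw scores.
    Returns (keep, mos): a boolean mask over subjects and the per-image MOS
    computed from the retained subjects only.
    """
    scores = np.asarray(scores, dtype=np.float64)
    mean = scores.mean(axis=0)
    centered = scores - mean

    # Kurtosis per image: fourth-order moment / (second-order moment)^2.
    m2 = (centered ** 2).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        kurtosis = m4 / m2 ** 2  # NaN when all subjects agree on an image
    is_normal = (kurtosis >= 2) & (kurtosis <= 4)

    # Acceptance range: 2 std for normal scores, sqrt(20) std otherwise.
    std = scores.std(axis=0, ddof=1)
    half_range = np.where(is_normal, 2.0, np.sqrt(20.0)) * std

    # Fraction of each subject's scores that fall outside the range.
    outside = np.abs(centered) > half_range
    outlier_fraction = outside.mean(axis=1)

    keep = outlier_fraction <= max_outlier_fraction
    mos = scores[keep].mean(axis=0)
    return keep, mos
```

The returned mask can also be used to report how many subjects were rejected per dataset.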
Figure 4 shows an input sequence for a scene in the ReBaQ dataset together with reconstructed background images using different algorithms and corresponding MOS scores. Starting from the leftmost image in Figure 4b, the first image shows an example of a reconstructed background with the presence of a significant amount of foreground residue, which results in a very low subjective score in spite of acceptable perceived image quality in the remaining areas of the scene. The second and the third images have less foreground residue than the first and hence are scored higher. The last image has no foreground residue at all and demonstrates good image quality for the most part except for structural deformation in the escalator. This image is scored much higher than the other three images but still does not get a perfect score.

4. Proposed Reconstructed Background Quality Index

In this section, we propose a full-reference quality index that can automatically assess the perceived quality of the reconstructed background images. The proposed Reconstructed Background Quality Index (RBQI) uses a probability summation model to combine visual characteristics at multiple scales and quantify the deterioration in the perceived quality of the reconstructed background image due to the presence of any residual foreground objects or unnaturalness that may be introduced by the background reconstruction algorithm. The motivation for RBQI comes from the fact that the quality of a reconstructed background image depends on two factors namely:
(i)
the visibility of the foreground objects, and
(ii)
the visible artifacts introduced while reconstructing the background image.
A block diagram of the proposed quality index (RBQI) is shown in Figure 5. An $L$-level multi-scale decomposition of the reference and reconstructed background images is obtained through lowpass filtering using an averaging filter [19] and downsampling, where $l = 0$ corresponds to the finest scale and $l = L - 1$ corresponds to the coarsest scale. For each level $l = 0, \ldots, L - 1$, contrast, structure and color differences are computed locally at each pixel to produce a contrast-structure difference map and a color difference map. The difference maps are combined in local regions within each scale and later across scales using a ‘probability summation model’ to predict the perceived quality of the reconstructed background image. More details about the computation of the difference maps and the proposed RBQI based on a probability summation model are provided below.
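The multi-scale decomposition can be illustrated as follows. The text specifies only an averaging filter followed by downsampling, so the 2 × 2 filter size, the downsampling factor of 2 and the default of three levels are assumptions of this sketch, as is the function name.

```python
import numpy as np


def multiscale_pyramid(image, levels=3):
    """Return [level 0 (finest), ..., level L-1 (coarsest)].

    Works for gray-level (H, W) and color (H, W, C) arrays.
    """
    current = np.asarray(image, dtype=np.float64)
    pyramid = [current]
    for _ in range(levels - 1):
        h, w = current.shape[:2]
        # Drop a trailing row/column if needed so the size is even.
        cropped = current[: h - h % 2, : w - w % 2]
        # 2x2 averaging and factor-2 downsampling in a single step.
        current = 0.25 * (
            cropped[0::2, 0::2]
            + cropped[1::2, 0::2]
            + cropped[0::2, 1::2]
            + cropped[1::2, 1::2]
        )
        pyramid.append(current)
    return pyramid
```

Applying the same function to the reference and the reconstructed image yields co-registered levels on which the per-pixel difference maps can be computed.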

4.1. Structure Difference Map ($d_s$)

An image can be decomposed into three different components: luminance, contrast and structure, which are local features computed at every pixel location $(x, y)$ of the image as described in [29]. By comparing these components, similarity between two images can be calculated [19,29]. A reconstructed background image is formed by mosaicing together parts of different input images. For such an image to appear natural, it is important that the structural continuity be maintained. Preservation of the local luminance from the reference background image is of low relevance as long as this structural continuity is maintained. Any sudden variation in the local luminance across the reconstructed background image manifests itself as contrast or structure deviation from the reference image. Thus, we consider only contrast and structure for comparing the reference and reconstructed background images while leaving out the luminance component. These contrast and structure differences between the reference and the reconstructed background images, calculated at each pixel, give us the ‘contrast-structure similarity map’ referred to as ‘structure map’ for short in the rest of the paper.
First, the structure similarity between the reference and the reconstructed background image, referred to as Structure Index ($SI$), is calculated at each pixel location $(x, y)$ using [29]:
$$SI(x, y) = \frac{2\sigma_{r(x,y)\,i(x,y)} + C}{\sigma_{r(x,y)}^{2} + \sigma_{i(x,y)}^{2} + C}, \tag{1}$$
where $r$ is the reference background image, $i$ is the reconstructed background image, and $\sigma_{r(x,y)\,i(x,y)}$ is the cross-correlation between image patches centered at location $(x, y)$ in the reference and reconstructed background images. $\sigma_{r(x,y)}$ and $\sigma_{i(x,y)}$ are the standard deviations computed using pixel values in a patch centered at location $(x, y)$ in the reference and reconstructed background image, respectively. $C$ is a small constant to avoid instability and is calculated as $C = (K \cdot I_{max})^2$, where $K$ is set to 0.03 and $I_{max}$ is the maximum possible value of the pixel intensity (255 in this case) [29]. A higher $SI$ value indicates higher similarity between the pixels in the reference and reconstructed background images.
The background scenes often contain pseudo-stationary objects such as waving trees and escalators, as well as local and global illumination changes. Even though these pseudo-stationary pixels belong to the background, because of the presence of motion, they are likely to be classified as foreground pixels. For this reason, the pseudo-stationary backgrounds pose an additional challenge for the quality assessment algorithms. Because simply comparing co-located pixel neighborhoods in the two considered images is not sufficient in the presence of such dynamic backgrounds, our algorithm uses a search window of size $nhood \times nhood$ centered at the current pixel $(x, y)$ in the reconstructed image, where $nhood$ is an odd value. The $SI$ is calculated between the pixel at location $(x, y)$ in the reference image and the $(nhood)^2$ pixels within the $nhood \times nhood$ search window centered at pixel $(x, y)$ in the reconstructed image. The resulting $SI$ matrix is of size $nhood \times nhood$. The modified Equation (1) to calculate $SI$ for every pixel location in the $nhood \times nhood$ window centered at $(x, y)$ is given as:
$$SI_{(x,y)}(m, n) = \frac{2\sigma_{r(x,y)\,i(m,n)} + C}{\sigma_{r(x,y)}^{2} + \sigma_{i(m,n)}^{2} + C}, \tag{2}$$
where
$$m = x - \frac{nhood - 1}{2} : x + \frac{nhood - 1}{2}, \qquad n = y - \frac{nhood - 1}{2} : y + \frac{nhood - 1}{2}.$$
The maximum value of the $SI$ matrix is taken to be the final $SI$ value for the pixel at location $(x, y)$ as given below:
$$SI(x, y) = \max_{(m,n)} \left( SI_{(x,y)}(m, n) \right). \tag{3}$$
The $SI$ map takes on values in $[-1, 1]$.
In the proposed method, the $SI$ map is computed at $L$ different scales denoted as $SI_l(x, y)$, $l = 0, \ldots, L - 1$. The $SI$ maps generated at three different scales for the background image shown in Figure 4b and reconstructed using the method of [12] are shown in Figure 6. The darker regions in these images indicate larger structure differences between the reference and the reconstructed background images while the lighter regions indicate higher similarities. From Figure 6c, it can also be seen that the computed $SI$ maps show the structure distortions while being robust to the escalator motion in the background.
The structure difference map is calculated using the $SI$ map at each scale $l$ as follows:
$$d_{s,l}(x, y) = \frac{1 - SI_l(x, y)}{2}. \tag{4}$$
$d_{s,l}$ takes on values in $[0, 1]$, where the value of 0 corresponds to no difference while 1 corresponds to the largest difference.
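The search-window computation of the structure index and the resulting structure difference map at one scale can be sketched as below. Several choices here are assumptions made for illustration and are not specified in the text: the uniform (unweighted) `patch × patch` window used for the local statistics, its default size of 7, the default `nhood=3`, the edge padding at image borders, and the function names. The sketch relies on `numpy.lib.stride_tricks.sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _local_mean(img, patch):
    """Mean over a patch x patch window centred at every pixel (edge-padded)."""
    half = patch // 2
    padded = np.pad(img, half, mode="edge")
    return sliding_window_view(padded, (patch, patch)).mean(axis=(-2, -1))


def structure_difference_map(ref, rec, nhood=3, patch=7, K=0.03, i_max=255.0):
    """Return (SI, d_s) for two same-sized gray-level images at one scale."""
    ref = np.asarray(ref, dtype=np.float64)
    rec = np.asarray(rec, dtype=np.float64)
    if nhood % 2 == 0 or patch % 2 == 0:
        raise ValueError("nhood and patch must be odd")
    C = (K * i_max) ** 2
    H, W = ref.shape
    half = (nhood - 1) // 2

    # Local mean and variance of the reference patch centred at (x, y).
    mu_r = _local_mean(ref, patch)
    var_r = _local_mean(ref * ref, patch) - mu_r ** 2

    rec_padded = np.pad(rec, half, mode="edge")
    si = np.full(ref.shape, -np.inf)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # shifted[y, x] == rec[y + dy, x + dx]: the candidate (m, n).
            shifted = rec_padded[half + dy: half + dy + H,
                                 half + dx: half + dx + W]
            mu_i = _local_mean(shifted, patch)
            var_i = _local_mean(shifted * shifted, patch) - mu_i ** 2
            cov = _local_mean(ref * shifted, patch) - mu_r * mu_i
            si_shift = (2.0 * cov + C) / (var_r + var_i + C)  # Equation (2)
            si = np.maximum(si, si_shift)                     # Equation (3)

    return si, (1.0 - si) / 2.0                               # Equation (4)
```

Setting `nhood=1` reduces the sketch to the co-located comparison of Equation (1), which makes it easy to check how much the search window reduces the penalty for small background motion.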

4.2. Color Distance ($d_c$)

The $d_{s,l}$ map is vulnerable to failures while detecting differences in areas of background images with no texture or no structural information. For example, the interior region of a large solid foreground object such as a car does not have much structural information but can differ in color from the background. It should be noted that we use the term “color” here to refer to both luminance and chrominance components. It is important to include the luminance difference while computing the color differences to account for situations where the foreground objects do not vary in color but just in luminance, for example, shadows of foreground objects in the scene. Hence, we incorporate the color information at every scale while calculating the RBQI. The color difference between the filtered reference and reconstructed background images at each scale $l$ is then calculated as the Euclidean distance between the values of co-located pixels in the Lab color space as follows:
$$d_{c,l}(x,y) = \sqrt{\left(L_{r,l}(x,y) - L_{i,l}(x,y)\right)^2 + \left(a_{r,l}(x,y) - a_{i,l}(x,y)\right)^2 + \left(b_{r,l}(x,y) - b_{i,l}(x,y)\right)^2}. \tag{5}$$
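The color distance of Equation (5) can be sketched for a single pair of co-located pixels as follows. The conversion to the Lab color space and the per-scale filtering are assumed to have been done already, and `color_distance` is a hypothetical helper name:

```python
import math


def color_distance(lab_ref, lab_rec):
    """Euclidean distance between two co-located (L, a, b) pixels, Equation (5)."""
    return math.dist(lab_ref, lab_rec)


# A shadow changes only the luminance L, yet the distance still registers it.
background = (60.0, 10.0, -5.0)
shadowed = (40.0, 10.0, -5.0)
print(color_distance(background, shadowed))  # 20.0
```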

4.3. Computation of the Reconstructed Background Quality Index (RBQI) Based on Probability Summation

As indicated previously, the reference and reconstructed background images are decomposed each into a multi-scale pyramid with $L$ levels. Structure difference maps $d_{s,l}$ and color difference maps $d_{c,l}$ are computed at every level $l = 0, \ldots, L-1$ as described in Equations (4) and (5), respectively. These difference maps are pooled together within the scale and later across all scales using a probability summation model [55] to give the final RBQI.
The probability summation model as described in [55] considers an ensemble of independent difference detectors at every pixel location in the image. These detectors predict the probability of perceiving the difference between the reference and the reconstructed background images at the corresponding pixel location based on its neighborhood characteristics in the reference image. Using this model, the probability of the structure difference detector signaling the presence of a structure difference at pixel location $(x,y)$ at level $l$ can be modeled as an exponential of the form:
$$P_{D,s,l}(x,y) = 1 - \exp\left( - \left| \frac{d_{s,l}(x,y)}{\alpha_{s,l}(x,y)} \right|^{\beta_s} \right), \tag{6}$$
where $\beta_s$ is a parameter chosen to increase the correspondence of RBQI with the experimentally determined MOS scores on a training dataset as described in Section 5.2 and $\alpha_{s,l}(x,y)$ is a parameter whose value depends upon the texture characteristics of the neighborhood centered at $(x,y)$ in the reference image. The value of $\alpha_{s,l}(x,y)$ is chosen to take into account that differences in structure are less perceptible in textured areas as compared to non-textured areas and that the perception of these differences depends on the scale $l$.
In order to determine the value of α s , l , every pixel in the reference background image at scale l is classified as textured or non-textured using the technique in [56]. This method first calculates the local variance at each pixel using a 3 × 3 window centered around it. Based on the computed variances, a pixel is classified as edge, texture or uniform. By considering the number of edge, texture and uniform pixels in the 8 × 8 neighborhood of the pixel, it is further classified into one of the six types: uniform, uniform/texture, texture, edge/texture, medium edge and strong edge. For our application, we label the pixels classified as ‘texture’ and ‘edge/texture’ as ’textured’ pixels and we label the rest as ‘non-textured’ pixels.
Let $f_{tex,l}(x,y)$ be a flag that equals 1 when the pixel at $(x,y)$ is textured and 0 otherwise. Thus, values of $\alpha_{s,l}(x,y)$ can be expressed as:
$$\alpha_{s,l}(x,y) = \begin{cases} 1.0, & \text{if } f_{tex,l}(x,y) = 0, \\ a, \text{ where } a \gg 1.0, & \text{if } f_{tex,l}(x,y) = 1. \end{cases} \tag{7}$$
When $f_{tex,l}(x,y) = 1$, the value of $a$ should be large enough such that $P_{D,s,l}(x,y) \approx 0$. In our implementation, we chose the value of $a = 1000.0$. Thus, in our current implementation, $\alpha_{s,l}(x,y)$ takes on the form of a binary function that can be replaced with a computationally efficient model obtained by replacing division by $\alpha_{s,l}(x,y)$ in Equation (6) with multiplication by the weight $w_{s,l}(x,y) = 1 / \alpha_{s,l}(x,y) \approx (1 - f_{tex,l}(x,y))$. In the remainder of the paper, we keep the notation in Equation (6) to accommodate a more generalized adaptation model based on local image characteristics in textured areas.
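A minimal per-pixel sketch of Equations (6) and (7) is given below. It uses $a = 1000.0$ as stated above and $\beta_s = 3.5$ as reported in Section 5; the function names are hypothetical and the texture flag is assumed to come from a separate classifier:

```python
import math

A_TEXTURED = 1000.0  # value of a reported in the text


def alpha_s(is_textured):
    """Equation (7): masking threshold for the structure detector."""
    return A_TEXTURED if is_textured else 1.0


def p_detect_structure(d_s, is_textured, beta_s=3.5):
    """Equation (6): probability that a structure difference is perceived."""
    exponent = abs(d_s / alpha_s(is_textured)) ** beta_s
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


# The same difference is visible on a flat area and masked on a textured one.
print(p_detect_structure(1.0, is_textured=False))        # about 0.632
print(p_detect_structure(1.0, is_textured=True) < 1e-9)  # True
```

With this binary threshold, a textured pixel contributes essentially nothing to the pooled difference, which is the behavior the weight $w_{s,l}$ described above reproduces more cheaply.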
Similarly, the probability of the color difference detector signaling the presence of a color difference at pixel location $(x,y)$ at level $l$ can be modeled as:
$$P_{D,c,l}(x,y) = 1 - \exp\left( - \left| \frac{d_{c,l}(x,y)}{\alpha_{c,l}(x,y)} \right|^{\beta_c} \right), \tag{8}$$
where $\beta_c$ is found in a similar way to $\beta_s$ and $\alpha_{c,l}(x,y)$ corresponds to the Adaptive Just Noticeable Distortion (AJNCD) calculated at every pixel $(x,y)$ in the Lab color space as given in [57]:
$$\alpha_{c,l}(x,y) = JNCD_{Lab} \cdot s_L\left(E(L_l(x,y)), \Delta L_l(x,y)\right) \cdot s_C\left(a_l(x,y), b_l(x,y)\right), \tag{9}$$
where $a_l(x,y)$ and $b_l(x,y)$ correspond, respectively, to the a and b color values of the pixel located at $(x,y)$ in the Lab color space, $JNCD_{Lab}$ is set to 2.3 [58], $E(L_l)$ is the mean background luminance of the pixel at $(x,y)$ and $\Delta L_l$ is the maximum luminance gradient across pixel $(x,y)$. In Equation (9), $s_C$ is the scaling factor for the chroma components and is given by [57]:
$$s_C(a_l(x,y), b_l(x,y)) = 1 + 0.045 \cdot \left( a_l^2(x,y) + b_l^2(x,y) \right)^{1/2}. \tag{10}$$
$s_L$ is the scaling factor that simulates the local luminance texture masking and is given by:
$$s_L(E(L_l), \Delta L_l) = \rho(E(L_l)) \, \Delta L_l + 1.0, \tag{11}$$
where $\rho(E(L_l))$ is the weighting factor as described in [57]. Thus, $\alpha_{c,l}$ varies at every pixel location based on the distance between the chroma values and texture masking properties of its neighborhood.
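Equations (9) to (11) can be sketched per pixel as below. The weighting factor $\rho(E(L_l))$ is defined in [57] and is not reproduced here, so this sketch simply takes its value as an input argument `rho`; that argument and all function names are assumptions made for illustration:

```python
import math

JNCD_LAB = 2.3  # base just-noticeable color difference, as set in the text


def chroma_scale(a, b):
    """Equation (10): threshold grows with the chroma magnitude."""
    return 1.0 + 0.045 * math.hypot(a, b)


def luminance_scale(rho, delta_l):
    """Equation (11): threshold grows with the local luminance gradient."""
    return rho * delta_l + 1.0


def alpha_c(a, b, rho, delta_l):
    """Equation (9): adaptive color threshold for one pixel."""
    return JNCD_LAB * luminance_scale(rho, delta_l) * chroma_scale(a, b)


# A neutral pixel in a flat area keeps the base threshold.
print(alpha_c(a=0.0, b=0.0, rho=0.5, delta_l=0.0))  # 2.3
# Saturated colors and strong gradients raise the threshold (more masking).
print(alpha_c(a=30.0, b=40.0, rho=0.5, delta_l=10.0) > JNCD_LAB)  # True
```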
A pixel $(x,y)$ at the $l$-th level is said to have no distortion if and only if neither the structure difference detector nor the color difference detector at location $(x,y)$ signals the presence of a difference. Thus, the probability of detecting no difference between the reference and reconstructed background images at pixel $(x,y)$ and level $l$ can be written as:
$$P_{ND,l}(x,y) = \left(1 - P_{D,s,l}(x,y)\right) \cdot \left(1 - P_{D,c,l}(x,y)\right). \tag{12}$$
Substituting Equations (6) and (8) for $P_{D,s,l}$ and $P_{D,c,l}$, respectively, in Equation (12), we get:
$$P_{ND,l}(x,y) = \exp\left(-D_{s,l}(x,y)\right) \cdot \exp\left(-D_{c,l}(x,y)\right), \tag{13}$$
where
$$D_{s,l}(x,y) = \left| \frac{d_{s,l}(x,y)}{\alpha_{s,l}(x,y)} \right|^{\beta_s} \tag{14}$$
and
$$D_{c,l}(x,y) = \left| \frac{d_{c,l}(x,y)}{\alpha_{c,l}(x,y)} \right|^{\beta_c}. \tag{15}$$
A less localized probability of difference detection can be computed by adopting the “probability summation” hypothesis [55], which pools the localized detection probabilities over a region R.
The probability summation hypothesis is based on the following two assumptions:
Assumption 1.
A structure difference is detected in the region of interest $R$ if and only if at least one detector in $R$ signals the presence of a difference, i.e., if and only if at least one of the differences $d_{s,l}(x,y)$ is greater than the threshold $\alpha_s$ and, therefore, considered to be visible. Similarly, a color difference is detected in region $R$ if and only if at least one of the differences $d_{c,l}(x,y)$ is above $\alpha_c$.
Assumption 2.
The probabilities of detection are independent; i.e., the probability that a particular detector will signal the presence of a difference is independent of the probability that any other detector will. This simplified approximation model is commonly used in the psychophysics literature [55,59] and was found to work well in practice in terms of correlation with human judgement in quantifying perceived visual distortions [60,61].
Then, the probability of no difference detection over the region R is given by:
$$P_{ND,l}(R) = \prod_{(x,y) \in R} P_{ND,l}(x,y). \tag{16}$$
Substituting Equation (12) in the above equation gives:
$$P_{ND,l}(R) = \exp\left(-D_{s,l}(R)^{\beta_s}\right) \cdot \exp\left(-D_{c,l}(R)^{\beta_c}\right), \tag{17}$$
where
$$D_{s,l}(R) = \left[ \sum_{(x,y) \in R} \left| \frac{d_{s,l}(x,y)}{\alpha_{s,l}(x,y)} \right|^{\beta_s} \right]^{1/\beta_s}, \tag{18}$$
$$D_{c,l}(R) = \left[ \sum_{(x,y) \in R} \left| \frac{d_{c,l}(x,y)}{\alpha_{c,l}(x,y)} \right|^{\beta_c} \right]^{1/\beta_c}. \tag{19}$$
In the human visual system, the highest visual acuity is limited to the size of the foveal region, which covers approximately 2° of visual angle. In our work, we consider the image regions $R$ as foveal regions approximated by 8 × 8 non-overlapping image blocks.
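The equivalence between multiplying per-pixel no-detection probabilities over a region (Equation (16)) and the Minkowski-pooled form (Equations (17) to (19)) can be checked numerically. The sketch below handles a single channel (structure or color) and takes the ratios $d/\alpha$ as already computed; the function names are hypothetical:

```python
import math


def pool_region(ratios, beta):
    """Minkowski pooling of |d/alpha| over one region R, Equations (18)-(19)."""
    return sum(abs(r) ** beta for r in ratios) ** (1.0 / beta)


def p_no_detect_pixelwise(ratios, beta):
    """Product over R of per-pixel no-detection probabilities, Equation (16)."""
    return math.prod(math.exp(-(abs(r) ** beta)) for r in ratios)


beta = 3.5
# One 8 x 8 block: 63 clean pixels plus a single clearly visible difference.
block = [0.0] * 63 + [1.2]
pooled = math.exp(-(pool_region(block, beta) ** beta))
print(math.isclose(pooled, p_no_detect_pixelwise(block, beta)))  # True
```

Because the pooled value is dominated by the largest ratios, one strongly visible pixel is enough to lower the block's no-detection probability, in line with Assumption 1.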
The probability of no distortion detection over the l-th level is obtained by pooling the no detection probabilities over all the regions R in level l and is given by:
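$$P_{ND}(l) = \prod_{R \in l} P_{ND,l}(R), \tag{20}$$
or
$$P_{ND}(l) = \exp\left(-D_s(l)^{\beta_s}\right) \cdot \exp\left(-D_c(l)^{\beta_c}\right), \tag{21}$$
where
$$D_s(l) = \left[ \sum_{R \in l} D_{s,l}(R)^{\beta_s} \right]^{1/\beta_s}, \tag{22}$$
$$D_c(l) = \left[ \sum_{R \in l} D_{c,l}(R)^{\beta_c} \right]^{1/\beta_c}. \tag{23}$$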
Similarly, we adopt a “probability summation” hypothesis to pool the detection probability across scales. It should be noted that the Human Visual System (HVS)-dependent parameters $\alpha_{s,l}$ and $\alpha_{c,l}$ that are included in Equations (14) and (15), respectively, account for the varying sensitivity of the HVS at varying scales. The final probability of detecting no distortion in a reconstructed background image $i$ is obtained when no distortion is detected at any scale and is computed by pooling the no-detection probabilities $P_{ND}(l)$ over all scales $l$, $l = 0, \ldots, L-1$, as follows:
$$P_{ND}(i) = \prod_{l=0}^{L-1} P_{ND}(l) \tag{24}$$
or
$$P_{ND}(i) = \exp\left(-D_s^{\beta_s}\right) \cdot \exp\left(-D_c^{\beta_c}\right), \tag{25}$$
where
$$D_s = \left[ \sum_{l=0}^{L-1} D_s(l)^{\beta_s} \right]^{1/\beta_s}, \tag{26}$$
$$D_c = \left[ \sum_{l=0}^{L-1} D_c(l)^{\beta_c} \right]^{1/\beta_c}, \tag{27}$$
where $D_s(l)$ and $D_c(l)$ are given by Equations (22) and (23), respectively. From Equations (26) and (27), it can be seen that $D_s$ and $D_c$ take the form of a Minkowski metric with exponent $\beta_s$ and $\beta_c$, respectively.
By substituting the values $D_s$, $D_c$, $D_s(l)$, $D_c(l)$, $D_{s,l}(R)$ and $D_{c,l}(R)$ in Equation (25) and simplifying, we get:
$$P_{ND}(i) = \exp(-D), \tag{28}$$
where
$$D = \sum_{l=0}^{L-1} \sum_{R \in l} \sum_{(x,y) \in R} \left[ D_{s,l}(x,y) + D_{c,l}(x,y) \right]. \tag{29}$$
In Equation (29), $D_{s,l}(x,y)$ and $D_{c,l}(x,y)$ are given by Equations (14) and (15), respectively. Thus, the probability of detecting a difference between the reference image and a reconstructed background image $i$ is given as:
$$P_D(i) = 1 - P_{ND}(i) = 1 - \exp(-D). \tag{30}$$
As can be seen from Equation (30), a lower value of $D$ results in a lower probability of difference detection $P_D(i)$ while a higher value results in a higher probability of difference detection. Therefore, $D$ can be used to assess the perceived quality in the reconstructed background image, with a lower value of $D$ corresponding to a higher perceived quality.
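The simplification that leads from Equation (25) to Equations (28) and (29) follows from unrolling the nested Minkowski sums. The steps below are our own expansion of the substitution described in the text, shown for the structure term; the color term is identical with $\beta_c$ in place of $\beta_s$:

```latex
\begin{aligned}
D_s^{\beta_s}
  &= \sum_{l=0}^{L-1} D_s(l)^{\beta_s}
  && \text{by Equation (26)} \\
  &= \sum_{l=0}^{L-1} \sum_{R \in l} D_{s,l}(R)^{\beta_s}
  && \text{by Equation (22)} \\
  &= \sum_{l=0}^{L-1} \sum_{R \in l} \sum_{(x,y) \in R}
     \left| \frac{d_{s,l}(x,y)}{\alpha_{s,l}(x,y)} \right|^{\beta_s}
  && \text{by Equation (18)} \\
  &= \sum_{l=0}^{L-1} \sum_{R \in l} \sum_{(x,y) \in R} D_{s,l}(x,y)
  && \text{by Equation (14)}.
\end{aligned}
```

Each outer exponent $1/\beta_s$ cancels against the power $\beta_s$ applied at the next pooling stage, so $P_{ND}(i) = \exp\left(-(D_s^{\beta_s} + D_c^{\beta_c})\right) = \exp(-D)$.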
The final Reconstructed Background Quality Index (RBQI) for a reconstructed background image is calculated using the logarithm of D as follows:
$$RBQI = \log_{10}(1 + D). \tag{31}$$
As D increases, the value of RBQI increases implying more perceived distortion and thus lower quality of the reconstructed background image. The logarithmic mapping models the saturation effect, i.e., beyond a certain point, the maximum annoyance level is reached and more distortion does not affect the quality.
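Putting Equations (14), (15), (29) and (31) together, the index can be sketched as a plain sum over all pixels at all scales followed by the logarithmic mapping. The sketch assumes the per-pixel differences and thresholds are already available and that the 8 × 8 blocks tile each level completely, so summing over pixels equals summing over blocks; the data layout and the name `rbqi` are illustrative only:

```python
import math


def rbqi(levels, beta_s=3.5, beta_c=3.5):
    """Compute RBQI from per-pixel differences at every scale.

    levels: one list per scale l; each entry is a tuple
            (d_s, alpha_s, d_c, alpha_c) describing a single pixel.
    """
    d_total = 0.0
    for pixels in levels:
        for d_s, alpha_s, d_c, alpha_c in pixels:
            d_total += abs(d_s / alpha_s) ** beta_s  # Equation (14)
            d_total += abs(d_c / alpha_c) ** beta_c  # Equation (15)
    return math.log10(1.0 + d_total)                 # Equation (31)


clean = [[(0.0, 1.0, 0.0, 2.3)] * 4]
ghosted = [[(0.0, 1.0, 0.0, 2.3)] * 3 + [(0.9, 1.0, 12.0, 2.3)]]
print(rbqi(clean))                  # 0.0
print(rbqi(ghosted) > rbqi(clean))  # True
```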

5. Results

In this section, we analyze the performance of RBQI in terms of its ability to predict the subjective ratings for the perceived quality of reconstructed background images. We evaluate the performance of the proposed quality index in terms of its prediction accuracy, prediction monotonicity and prediction consistency and provide comparisons with the existing statistical and IQA techniques. In our implementation, we set $nhood = 17$, $L = 3$ and $\beta_s = \beta_c = 3.5$. The choice of these parameters is described in more detail in Section 5.2. We also evaluate the performance of RBQI for different scales and neighborhood search windows. We conduct a series of hypothesis tests based on the prediction residuals (errors in predictions) after nonlinear regression. These tests help in making statistically meaningful conclusions on the obtained performance results. We also conduct a sensitivity analysis test on the ReBaQ dataset as described in Section 5.3.
We use the two datasets ReBaQ and S-ReBaQ described in Section 3.1 to quantify and compare the performance of RBQI. For evaluating the performance in terms of prediction accuracy, we used the Pearson correlation coefficient (PCC) and root mean squared error (RMSE). The prediction monotonicity is evaluated using the Spearman rank-order correlation coefficient (SROCC). Finally, the Outlier Ratio (OR), calculated as the percentage of predictions outside the range of ±2 times the standard deviation of the MOS scores, is used as a measure of prediction consistency. A 4-parameter regression function [62] is applied to the IQA metrics to provide a nonlinear mapping between the objective scores and the subjective mean opinion scores (MOS):
$$MOS_{p_i} = \frac{\gamma_1 - \gamma_2}{1 + e^{-\frac{M_i - \gamma_3}{|\gamma_4|}}} + \gamma_2, \tag{32}$$
where $M_i$ denotes the predicted quality for the $i$th image, $MOS_{p_i}$ denotes the quality score after fitting, and $\gamma_n$, $n = 1, 2, \ldots, 4$, are the regression model parameters. $MOS_{p_i}$ along with the MOS scores are used to calculate the values given in Table 1, Table 2 and Table 3.
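The mapping of Equation (32) and two of the accuracy measures can be sketched as follows. The regression parameters are assumed to have been fitted beforehand (for example, by nonlinear least squares), and the parameter values in the usage lines are made up purely for illustration; they are not results from the paper:

```python
import math


def logistic_4p(m, g1, g2, g3, g4):
    """Equation (32): map an objective score m to a predicted MOS."""
    return (g1 - g2) / (1.0 + math.exp(-(m - g3) / abs(g4))) + g2


def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root mean squared error between predicted and subjective scores."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical parameters: the curve falls from g2 = 5 to g1 = 1 as m grows,
# matching an index where larger values mean lower quality.
print(logistic_4p(0.5, g1=1.0, g2=5.0, g3=0.5, g4=0.1))  # 3.0
```

At $M_i = \gamma_3$ the curve sits midway between $\gamma_1$ and $\gamma_2$, and $|\gamma_4|$ controls how quickly it moves between the two asymptotes.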

5.1. Performance Comparison

Table 1 and Table 2 show the obtained performance evaluation results of the proposed RBQI technique on the ReBaQ and S-ReBaQ datasets, respectively, as compared to the existing statistical and FR-IQA algorithms. The results show that the proposed quality index performs better in terms of prediction accuracy (PCC, RMSE) and prediction consistency (OR) as compared to any other existing technique. In terms of prediction monotonicity (SROCC), the proposed quality index is close in performance to the best performing measure for all datasets except for the ReBaQ-Dynamic dataset. Though the statistical techniques are shown to not correlate well with the subjective scores in terms of PCC on either of the datasets, pCEPs is found to perform marginally better in terms of SROCC as compared to RBQI. Among the FR-IQA algorithms, the performance of the NQM [34] comes close to the proposed technique for scenes with static background images, i.e., for the ReBaQ-Static dataset, as it considers the effects of contrast sensitivity, luminance variations, contrast interaction between spatial frequencies and contrast masking effect while weighting the SNR between the ground truth and reconstructed image. The more popular MS-SSIM [19] technique is shown to not correlate well with the subjective scores for the ReBaQ dataset. This is because the MS-SSIM calculates the final quality index of the image by just averaging over the entire image. In the problem of background reconstruction, the error might occupy a relatively small area as compared to the image size, thereby under-penalizing the residual foreground. Most of the existing FR-IQA techniques perform poorly for the ReBaQ-Dynamic dataset. This is because the assumption of pixel-to-pixel correspondence is no longer valid in the presence of pseudo-stationary background.
The proposed RBQI technique uses a neighborhood window to handle such backgrounds, thereby improving the performance over NQM [34] by a margin of 10% and by 30% over MS-SSIM [19]. Additionally, as shown in Table 3, the proposed RBQI technique is found to perform significantly better in terms of PCC, RMSE and OR as compared to any of the existing IQA techniques on the larger dataset formed by combining the ReBaQ and S-ReBaQ datasets. Though RBQI is second best in terms of SROCC, it closely follows the best performing measure (pCEPs) for the combined dataset. CQM [20], which is used in the Scene Background Modeling Challenge 2016 (SBMC) [21,22] to compare the performance of the algorithms, performs poorly on the combined dataset and hence is not a good choice for evaluating the quality of reconstructed background images and not suitable for comparing the performance of background reconstruction algorithms.
The P-value is the probability of getting a correlation as large as the observed value by random chance. If the P-value is less than 0.05, then the correlation is significant. The P-values ($P_{PCC}$ and $P_{SROCC}$) reported in Table 1, Table 2 and Table 3 indicate that most of the correlation scores are statistically significant.

5.2. Model Parameter Selection

The proposed quality index accepts four parameters:
(1) $nhood$, dimensions of the window centered around the current pixel for calculating the $d_s$;
(2) $L$, number of multi-scale levels;
(3) $\beta_s$, used in the calculation of $P_{D,s,l}(x,y)$ in Equation (6); and
(4) $\beta_c$, used in the calculation of $P_{D,c,l}(x,y)$ in Equation (8).
In Table 4, we evaluate our algorithm with different values for the parameters. These simulations were run only on the ReBaQ dataset. Table 4a shows the effect of varying $nhood$ values on the performance of RBQI. The performance of RBQI for ReBaQ-Static improved slightly with the increase in the neighborhood search window size, as expected, but the performance of RBQI increased drastically for the ReBaQ-Dynamic dataset from $nhood = 1$ to $nhood = 17$ before starting to drop at $nhood = 33$. Thus, we chose $nhood = 17$ for all our experiments. Table 4b gives performance results for different numbers of scales. As a trade-off between the computational complexity and prediction accuracy, we chose the number of scales to be $L = 3$. The probability summation model parameters $\beta_s$ and $\beta_c$ were found such that they maximized the correlation between RBQI and MOS scores on the ReBaQ dataset. As in [63], we divided the ReBaQ dataset into two subsets by randomly choosing 80% of the total images for training and 20% for testing. The random training-testing procedure was repeated 100 times and the parameters were averaged over the 100 iterations. Values $\beta_s = \beta_c = 3.5$ were found to correlate well with the subjective test scores.
These parameters remained unchanged for the experiments conducted on the S-ReBaQ dataset to obtain the values in Table 2 and Table 3.
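The repeated 80%/20% procedure used to choose $\beta_s$ and $\beta_c$ can be sketched as below. The text does not state how the best value was searched for within each training split, so this sketch assumes a simple grid search; the candidate grid, the `score` callable (expected to return the RBQI-to-MOS correlation on a training subset) and the function name are all hypothetical:

```python
import random
from statistics import mean


def select_beta(items, candidates, score, runs=100, train_frac=0.8, seed=0):
    """Average the best-scoring candidate over repeated random training splits."""
    rng = random.Random(seed)
    n_train = round(train_frac * len(items))
    chosen = []
    for _ in range(runs):
        train = rng.sample(items, n_train)  # the remaining 20% is held out
        chosen.append(max(candidates, key=lambda beta: score(beta, train)))
    return mean(chosen)


# Toy objective that peaks at 3.5 regardless of the split, for illustration.
def toy_score(beta, train):
    return -abs(beta - 3.5)


print(select_beta(list(range(10)), [2.5, 3.0, 3.5, 4.0], toy_score))  # 3.5
```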

5.3. Sensitivity Analysis

In addition to the statistical significance tests, we conduct sensitivity tests on the ReBaQ dataset by generating multiple smaller datasets and comparing the performance of the different techniques on these smaller datasets. These experiments were carried out on the ReBaQ-Static and ReBaQ-Dynamic datasets separately. For this purpose, we create 1000 smaller datasets from the ReBaQ-Static dataset by randomly sampling $n$ images such that $n$ is smaller than the size of the dataset and such that the corresponding MOS scores cover the entire scoring range. For every dataset of size $n$, we calculate the PCC values after applying the nonlinear mapping of Equation (32). The mean and the standard deviation of the PCC scores, denoted by $\mu_{PCC}$ and $\sigma_{PCC}$ respectively, over the 1000 datasets are calculated for $n = 24$ and $n = 50$ and are given in Table 5. Similar sensitivity tests were conducted on the ReBaQ-Dynamic dataset to obtain the values in Table 5. As can be seen from Table 5, the standard deviations of the PCC across 1000 datasets of size $n$ are very small and the relative rank of the different techniques is maintained as in Table 1.
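A simplified sketch of this resampling test is shown below. For brevity it omits two steps described above, the nonlinear mapping of Equation (32) and the check that each subset's MOS scores cover the entire scoring range, so it should be read as an outline of the procedure rather than a reproduction of it; all names and the toy data are assumptions:

```python
import math
import random
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pcc_sensitivity(objective, mos, n, trials=1000, seed=0):
    """Mean and standard deviation of the PCC over random subsets of size n."""
    rng = random.Random(seed)
    pccs = []
    for _ in range(trials):
        subset = rng.sample(range(len(mos)), n)
        pccs.append(pearson([objective[i] for i in subset],
                            [mos[i] for i in subset]))
    return mean(pccs), stdev(pccs)


# Toy data with a perfectly linear (inverse) relation between index and MOS.
objective = [0.05 * k for k in range(60)]
mos = [5.0 - score for score in objective]
mu_pcc, sigma_pcc = pcc_sensitivity(objective, mos, n=24)
print(round(mu_pcc, 6), round(sigma_pcc, 6))
```

A small $\sigma_{PCC}$ indicates that the measured correlation does not hinge on which particular images happen to be in the dataset.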

6. Conclusions

In this paper, we addressed the problem of quality evaluation of reconstructed background images. We first proposed two different datasets for benchmarking the performance of existing and future techniques proposed to evaluate the quality of reconstructed background images. Then, we proposed the first full-reference Reconstructed Background Quality Index (RBQI) to objectively measure the perceived quality of the reconstructed background images.
The RBQI uses the probability summation model to combine visual characteristics at multiple scales and to quantify the deterioration in the perceived quality of the reconstructed background image due to the presence of any foreground objects or unnaturalness that may be introduced by the background reconstruction algorithm. The use of a neighborhood search window while calculating the contrast and structure differences provides further boost in the performance in the presence of pseudo-stationary background while not affecting the performance on scenes with static background. The probability summation model penalizes only the perceived differences across the reference and reconstructed background images while the unperceived differences do not affect the RBQI, thereby giving better correlation with the subjective scores. Experimental results on the benchmarking datasets showed that the proposed measure out-performed existing statistical and IQA techniques in estimating the perceived quality of reconstructed background images.
The proposed RBQI has multiple applications. It can be used by algorithm developers to optimize the performance of their techniques and by users to compare different background reconstruction algorithms and to determine which algorithm is best suited for their task. It can also be deployed in challenges (e.g., SBMC [22]) that promote the development of improved background reconstruction algorithms. As future work, the authors will investigate the development of a no-reference quality index for assessing the perceived quality of reconstructed background images in scenarios where the reference background images are not available. The no-reference metric can also be used as feedback to the algorithm to adaptively optimize its performance.

Supplementary Materials

The ReBaQ and S-ReBaQ datasets and source code for RBQI will be available for download at the authors’ website https://github.com/ashrotre/RBQI or https://ivulab.asu.edu.

Author Contributions

A.S. and L.J.K. contributed to the design and development of the proposed method and to the writing of the manuscript. A.S. contributed additionally to the software implementation and testing of the proposed method.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RBQI: Reconstructed Background Quality Index
PSNR: Peak Signal to Noise Ratio
AGE: Average Gray-level Error
EPs: Number of Error Pixels
pEPs: percentage of Error Pixels
CEPs: number of Clustered Error Pixels
pCEPs: percentage of Clustered Error Pixels
IQA: Image Quality Assessment
FR-IQA: Full Reference Image Quality Assessment
HVS: Human Visual System
MS-SSIM: Multi-scale Structural SIMilarity index
CQM: Color image Quality Measure
PETS: Performance Evaluation of Tracking and Surveillance
SBMNet: Scene Background Modeling Net
SSIM: Structural SIMilarity index
VSNR: Visual Signal to Noise Ratio
VIF: Visual Information Fidelity
VIFP: pixel-based Visual Information Fidelity
UQI: Universal Quality Index
IFC: Image Fidelity Criterion
NQM: Noise Quality Measure
WSNR: Weighted Signal to Noise Ratio
FSIM: Feature SIMilarity index
FSIMc: Feature SIMilarity index with color
SR-SIM: Spectral Residual SIMilarity index
SalSSIM: Saliency-based Structural SIMilarity index
ReBaQ: Reconstructed Background Quality dataset
S-ReBaQ: SBMNet based Reconstructed Background Quality dataset
SBMC: Scene Background Modeling Challenge
MOS: Mean Opinion Score
PCC: Pearson Correlation Coefficient
SROCC: Spearman Rank Order Correlation Coefficient
RMSE: Root Mean Square Error
OR: Outlier Ratio

References

  1. Colque, R.M.; Cámara-Chávez, G. Progressive Background Image Generation of Surveillance Traffic Videos Based on a Temporal Histogram Ruled by a Reward/Penalty Function. In Proceedings of the 2011 24th SIBGRAPI Conference on Graphics, Patterns and Images (Sibgrapi), Maceio, Brazil, 28–31 August 2011; pp. 297–304.
  2. Stauffer, C.; Grimson, W.E.L. Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 747–757.
  3. Li, L.; Huang, W.; Gu, I.Y.H.; Tian, Q. Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 2004, 13, 1459–1472.
  4. Fleuret, F.; Berclaz, J.; Lengagne, R.; Fua, P. Multicamera People Tracking with a Probabilistic Occupancy Map. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 267–282.
  5. Flores, A.; Belongie, S. Removing pedestrians from google street view images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 53–58.
  6. Jones, W.D. Microsoft and Google vie for virtual world domination. IEEE Spectr. 2006, 43, 16–18.
  7. Zheng, E.; Chen, Q.; Yang, X.; Liu, Y. Robust 3D modeling from silhouette cues. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 1265–1268.
  8. Maddalena, L.; Petrosino, A. A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications. IEEE Trans. Image Process. 2008, 17, 1168–1177.
  9. Varadarajan, S.; Karam, L.; Florencio, D. Background subtraction using spatio-temporal continuities. In Proceedings of the 2010 2nd European Workshop on Visual Information Processing, Paris, France, 5–6 July 2010; pp. 144–148.
  10. Farin, D.; de With, P.; Effelsberg, W. Robust background estimation for complex video sequences. In Proceedings of the IEEE International Conference on Image Processing, Barcelona, Spain, 14–17 September 2003; Volume 1, pp. 145–148.
  11. Hsiao, H.H.; Leou, J.J. Background initialization and foreground segmentation for bootstrapping video sequences. EURASIP J. Image Video Process. 2013, 1, 12.
  12. Reddy, V.; Sanderson, C.; Lovell, B. A low-complexity algorithm for static background estimation from cluttered image sequences in surveillance contexts. EURASIP J. Image Video Process. 2010, 1, 1:1–1:14.
  13. Yao, J.; Odobez, J. Multi-Layer Background Subtraction Based on Color and Texture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
  14. Colombari, A.; Fusiello, A. Patch-Based Background Initialization in Heavily Cluttered Video. IEEE Trans. Image Process. 2010, 19, 926–933.
  15. Herley, C. Automatic occlusion removal from minimum number of images. In Proceedings of the IEEE International Conference on Image Processing, Genova, Italy, 14 September 2005; Volume 2, pp. 1046–1049.
  16. Agarwala, A.; Dontcheva, M.; Agrawala, M.; Drucker, S.; Colburn, A.; Curless, B.; Salesin, D.; Cohen, M. Interactive Digital Photomontage. ACM Trans. Gr. 2004, 23, 294–302.
  17. Shrotre, A.; Karam, L. Background recovery from multiple images. In Proceedings of the IEEE Digital Signal Processing and Signal Processing Education Meeting, Napa, CA, USA, 11–14 August 2013; pp. 135–140.
  18. Maddalena, L.; Petrosino, A. Towards Benchmarking Scene Background Initialization. In 2015 ICIAP: New Trends in Image Analysis and Processing—ICIAP 2015 Workshops; Springer: Berlin, Germany, 2015; pp. 469–476.
  19. Wang, Z.; Simoncelli, E.; Bovik, A. Multiscale structural similarity for image quality assessment. In Proceedings of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
  20. Yalman, Y.; Ertürk, İ. A new color image quality measure based on YUV transformation and PSNR for human vision system. Turk. J. Electr. Eng. Comput. Sci. 2013, 21, 603–612.
  21. Bouwmans, T.; Maddalena, L.; Petrosino, A. Scene background initialization: A taxonomy. Pattern Recognit. Lett. 2017, 96, 3–11.
  22. Maddalena, L.; Jodoin, P. Scene Background Modeling Contest (SBMC2016). Available online: http://www.icpr2016.org/site/session/scene-background-modeling-sbmc2016/ (accessed on 15 May 2018).
  23. Toyama, K.; Krumm, J.; Brumitt, B.; Meyers, B. Wallflower: principles and practice of background maintenance. In Proceedings of the IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 1, pp. 255–261.
  24. Mahadevan, V.; Vasconcelos, N. Spatiotemporal Saliency in Dynamic Scenes. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 171–177.
  25. Sheikh, Y.; Shah, M. Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1778–1792.
  26. Jodoin, P.; Maddalena, L.; Petrosino, A. SceneBackgroundModeling.Net (SBMnet). Available online: www.SceneBackgroundModeling.net (accessed on 15 May 2018).
  27. Jodoin, P.M.; Maddalena, L.; Petrosino, A.; Wang, Y. Extensive Benchmark and Survey of Modeling Methods for Scene Background Initialization. IEEE Trans. Image Process. 2017, 26, 5244–5256.
  28. Shrotre, A.; Karam, L. Visual quality assessment of reconstructed background images. In Proceedings of the International Conference on Quality of Multimedia Experience, Lisbon, Portugal, 6–8 June 2016; pp. 1–6.
  29. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  30. Chandler, D.; Hemami, S. VSNR: A Wavelet-Based Visual Signal to Noise Ratio for Natural Images. IEEE Trans. Image Process. 2007, 16, 2284–2298.
  31. Sheikh, H.; Bovik, A. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444.
  32. Wang, Z.; Bovik, A. A universal image quality index. IEEE Sign. Process. Lett. 2002, 9, 81–84.
  33. Sheikh, H.; Bovik, A.; de Veciana, G. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 2005, 14, 2117–2128.
  34. Damera-Venkata, N.; Kite, T.; Geisler, W.; Evans, B.; Bovik, A. Image quality assessment based on a degradation model. IEEE Trans. Image Process. 2000, 9, 636–650.
  35. Mitsa, T.; Varkur, K. Evaluation of contrast sensitivity functions for the formulation of quality measures incorporated in halftoning algorithms. In Proceedings of the IEEE International Conference on Acoustics Speech, and Signal Processing, Minneapolis, MN, USA, 27–30 April 1993; Volume 5, pp. 301–304.
  36. Zhang, L.; Zhang, D.; Mo, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
  37. Zhang, L.; Li, H. SR-SIM: A fast and high performance IQA index based on spectral residual. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 1473–1476.
  38. Akamine, W.; Farias, M. Incorporating visual attention models into video quality metrics. In Proceedings of SPIE; SPIE: Bellingham, WA, USA, 2014; Volume 9016, pp. 1–9.
  39. Lin, W.; Kuo, J.C.C. Perceptual visual quality metrics: A survey. J. Vis. Commun. Image Represent. 2011, 22, 297–312.
  40. Chandler, D.M. Seven Challenges in Image Quality Assessment: Past, Present, and Future Research. ISRN Signal Process. 2013, 1–53.
  41. Seshadrinathan, K.; Pappas, T.N.; Safranek, R.J.; Chen, J.; Wang, Z.; Sheikh, H.R.; Bovik, A.C. Image Quality Assessment. In The Essential Guide to Image Processing; Bovik, A.C., Ed.; Elsevier: New York, NY, USA, 2009; Chapter 21; pp. 553–595.
  42. Laugraud, B.; Piérard, S.; Van Droogenbroeck, M. LaBGen-P: A pixel-level stationary background generation method based on LaBGen. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 107–113.
  43. Maddalena, L.; Petrosino, A. Extracting a background image by a multi-modal scene background model. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 143–148.
  44. Javed, S.; Jung, S.K.; Mahmood, A.; Bouwmans, T. Motion-Aware Graph Regularized RPCA for background modeling of complex scenes. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 120–125.
  45. Liu, W.; Cai, Y.; Zhang, M.; Li, H.; Gu, H. Scene background estimation based on temporal median filter with Gaussian filtering. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 132–136.
  46. Ramirez-Alonso, G.; Ramirez-Quintana, J.A.; Chacon-Murguia, M.I. Temporal weighted learning model for background estimation with an automatic re-initialization stage and adaptive parameters update. Pattern Recognit. Lett. 2017, 96, 34–44.
  47. Minematsu, T.; Shimada, A.; Taniguchi, R.I. Background initialization based on bidirectional analysis and consensus voting. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 126–131.
  48. Piccardi, M. Background subtraction techniques: A review. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104.
  49. Halfaoui, I.; Bouzaraa, F.; Urfalioglu, O. CNN-based initial background estimation. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 101–106.
  50. Chacon-Murguia, M.I.; Ramirez-Quintana, J.A.; Ramirez-Alonso, G. Evaluation of the background modeling method Auto-Adaptive Parallel Neural Network Architecture in the SBMnet dataset. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 137–142.
  51. Ortego, D.; SanMiguel, J.C.; Martínez, J.M. Rejection based multipath reconstruction for background estimation in SBMnet 2016 dataset. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 114–119.
  52. Methodology for the Subjective Assessment of the Quality of Television Pictures; Technical Report ITU-R BT.500-13; International Telecommunications Union: Geneva, Switzerland, 2012.
  53. Snellen, H. Probebuchstaben zur Bestimmung der Sehschärfe; P.W. Van de Weijer: Utrecht, The Netherlands, 1862.
  54. Waggoner, T.L. PseudoIsochromatic Plate (PIP) Color Vision Test 24 Plate Edition. Available online: http://colorvisiontesting.com/ishihara.htm (accessed on 15 May 2018).
  55. Robson, J.; Graham, N. Probability summation and regional variation in contrast sensitivity across the visual field. Vis. Res. 1981, 21, 409–418.
  56. Su, J.; Mersereau, R. Post-processing for artifact reduction in JPEG-compressed images. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, USA, 9–12 May 1995; pp. 2363–2366.
  57. Chou, C.H.; Liu, K.C. Colour image compression based on the measure of just noticeable colour difference. IET Image Process. 2008, 2, 304–322.
  58. Mahy, M.; Eycken, L.; Oosterlinck, A. Evaluation of uniform color spaces developed after the adoption of CIELAB and CIELUV. Color Res. Appl. 1994, 19, 105–121.
  59. Watson, A.; Kreslake, L. Measurement of visual impairment scales for digital video. In Human Vision and Electronic Imaging VI; International Society for Optics and Photonics: Bellingham, WA, USA, 2001; Volume 4299, pp. 79–89.
  60. Watson, A.B. DCT quantization matrices visually optimized for individual images. In Human Vision and Electronic Imaging VI; International Society for Optics and Photonics: Bellingham, WA, USA, 1993; Volume 1913, pp. 202–216.
  61. Hontsch, I.; Karam, L.J. Adaptive image coding with perceptual distortion control. IEEE Trans. Image Process. 2002, 11, 213–222.
  62. VQEG. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment. 2003. Available online: ftp://vqeg.its.bldrdoc.gov/Documents/VQEGApprovedFinalReports/VQEGIIFinalReport.pdf (accessed on 15 May 2018).
  63. Mittal, A.; Moorthy, A.; Bovik, A. No-Reference Image Quality Assessment in the Spatial Domain. IEEE Trans. Image Process. 2012, 21, 4695–4708.
Figure 1. Reference background images for different scenes in the Reconstructed Background Quality (ReBaQ) Dataset. Each reference background image corresponds to a captured scene background without foreground objects. (a) Scenes with static backgrounds from the ReBaQ dataset; (b) Scenes with pseudo-stationary backgrounds from the ReBaQ dataset.
Figure 2. Reference background images for different scenes in the SBMNet based Reconstructed Background Quality (S-ReBaQ) Dataset. Each reference background image corresponds to a captured scene background without foreground objects.
Figure 3. Subjective test Graphical User Interface (GUI).
Figure 4. Example input sequence and recovered background images with corresponding MOS scores from the ReBaQ dataset. (a) Four out of eight input images from the input sequence “Escalator”; (b) Background images reconstructed by different algorithms and corresponding MOS scores.
Figure 5. Block diagram describing the computation of the proposed Reconstructed Background Quality Index (RBQI).
Figure 6. Structure Index (SI) map with nhood = 17 for the background image reconstructed using the method in [12] as shown in Figure 4b. The darker regions indicate larger structure differences between the reference and the reconstructed background image. (a) Scale l = 0; (b) Scale l = 1; (c) Scale l = 2.
Table 1. Comparison of RBQI vs. statistical measures and IQA techniques on the ReBaQ dataset.

a. Comparison on the ReBaQ-Static Dataset.

| Category | Measure | PCC | SROCC | RMSE | OR | P_PCC | P_SROCC |
|---|---|---|---|---|---|---|---|
| Statistical Measures | AGE | 0.7776 | 0.6348 | 0.6050 | 9.72% | 0.000000 | 0.000000 |
| | EPs | 0.3976 | 0.5093 | 0.8829 | 13.89% | 0.000000 | 0.000000 |
| | pEPs | 0.8058 | 0.6170 | 0.5698 | 6.94% | 0.000000 | 0.000000 |
| | CEPs | 0.5719 | 0.6939 | 0.7893 | 11.11% | 0.000000 | 0.000000 |
| | pCEPs | 0.6281 | 0.7843 | 0.9622 | 13.89% | 0.000000 | 0.000000 |
| Image Quality Assessment Metrics | PSNR | 0.8324 | 0.7040 | 0.5331 | 8.33% | 0.000000 | 0.000000 |
| | SSIM [29] | 0.5914 | 0.5168 | 0.7759 | 11.11% | 0.000000 | 0.000177 |
| | MS-SSIM [19] | 0.7230 | 0.7085 | 0.6648 | 8.33% | 0.000000 | 0.000000 |
| | VSNR [30] | 0.5216 | 0.3986 | 0.8209 | 9.72% | 0.000003 | 0.000531 |
| | VIF [31] | 0.3625 | 0.0843 | 0.8968 | 15.28% | 0.001754 | 0.484429 |
| | VIFP [31] | 0.5122 | 0.3684 | 0.8265 | 11.11% | 0.000004 | 0.001470 |
| | UQI [32] | 0.6197 | 0.7581 | 0.9622 | 13.89% | 0.000000 | 0.000000 |
| | IFC [33] | 0.5003 | 0.3771 | 0.8331 | 11.11% | 0.000008 | 0.001105 |
| | NQM [34] | 0.8251 | 0.8602 | 0.5437 | 6.94% | 0.000000 | 0.000000 |
| | WSNR [35] | 0.8013 | 0.7389 | 0.5756 | 5.56% | 0.000000 | 0.000000 |
| | FSIM [36] | 0.7209 | 0.6970 | 0.6668 | 9.72% | 0.000000 | 0.000000 |
| | FSIMc [36] | 0.7274 | 0.7033 | 0.6603 | 9.72% | 0.000000 | 0.000000 |
| | SRSIM [37] | 0.7906 | 0.7862 | 0.5892 | 8.33% | 0.000000 | 0.000000 |
| | SalSSIM [38] | 0.5983 | 0.5217 | 0.7710 | 9.72% | 0.000000 | 0.000003 |
| | CQM [20] | 0.6401 | 0.5755 | 0.7393 | 8.33% | 0.000000 | 0.000000 |
| | RBQI (Proposed) | 0.9006 | 0.8592 | 0.4183 | 4.17% | 0.000000 | 0.000000 |

b. Comparison on the ReBaQ-Dynamic Dataset.

| Category | Measure | PCC | SROCC | RMSE | OR | P_PCC | P_SROCC |
|---|---|---|---|---|---|---|---|
| Statistical Measures | AGE | 0.4999 | 0.2303 | 0.7644 | 9.72% | 0.005000 | 0.051600 |
| | EPs | 0.1208 | 0.2771 | 0.8761 | 13.89% | 0.007600 | 0.018500 |
| | pEPs | 0.4734 | 0.2771 | 0.8825 | 9.72% | 0.007600 | 0.018500 |
| | CEPs | 0.5951 | 0.7549 | 0.7092 | 11.11% | 0.000000 | 0.000000 |
| | pCEPs | 0.6418 | 0.7940 | 0.8826 | 15.28% | 0.000000 | 0.000000 |
| Image Quality Assessment Metrics | PSNR | 0.5133 | 0.4179 | 0.7575 | 8.33% | 0.000004 | 0.000263 |
| | SSIM [29] | 0.0135 | 0.0264 | 0.8826 | 15.28% | 0.910238 | 0.822439 |
| | MS-SSIM [19] | 0.5087 | 0.4466 | 0.7598 | 9.72% | 0.000005 | 0.000085 |
| | VSNR [30] | 0.5090 | 0.1538 | 0.7597 | 9.72% | 0.000005 | 0.198310 |
| | VIF [31] | 0.3103 | 0.3328 | 0.8390 | 13.89% | 0.199921 | 0.236522 |
| | VIFP [31] | 0.4864 | 0.1004 | 0.7711 | 9.72% | 0.000015 | 0.403684 |
| | UQI [32] | 0.6262 | 0.7450 | 0.8826 | 15.28% | 0.000000 | 0.000000 |
| | IFC [33] | 0.4306 | 0.1024 | 0.7966 | 11.11% | 0.000160 | 0.394409 |
| | NQM [34] | 0.6898 | 0.6600 | 0.6390 | 9.72% | 0.000000 | 0.000000 |
| | WSNR [35] | 0.6409 | 0.5760 | 0.6775 | 9.72% | 0.000000 | 0.000000 |
| | FSIM [36] | 0.5131 | 0.3283 | 0.7575 | 9.72% | 0.000004 | 0.004922 |
| | FSIMc [36] | 0.5144 | 0.3310 | 0.7568 | 9.72% | 0.000004 | 0.004559 |
| | SRSIM [37] | 0.5512 | 0.5376 | 0.7364 | 11.11% | 0.000001 | 0.000001 |
| | SalSSIM [38] | 0.4866 | 0.3200 | 0.7710 | 9.72% | 0.000015 | 0.006198 |
| | CQM [20] | 0.7050 | 0.7610 | 0.6259 | 8.33% | 0.000000 | 0.000000 |
| | RBQI (Proposed) | 0.7908 | 0.6773 | 0.5402 | 5.56% | 0.000000 | 0.000000 |
Table 2. Comparison of RBQI vs. statistical measures and IQA techniques on the S-ReBaQ dataset.

| Category | Measure | PCC | SROCC | RMSE | OR | P_PCC | P_SROCC |
|---|---|---|---|---|---|---|---|
| Statistical Measures | AGE | 0.6453 | 0.6238 | 2.2373 | 14.84% | 0.392900 | 0.000000 |
| | EPs | 0.4202 | 0.1426 | 1.2049 | 24.73% | 0.000000 | 0.000000 |
| | pEPs | 0.0505 | 0.4990 | 1.6676 | 26.92% | 0.498331 | 0.000000 |
| | CEPs | 0.6283 | 0.6666 | 0.8491 | 18.68% | 0.000000 | 0.000000 |
| | pCEPs | 0.8346 | 0.8380 | 0.6011 | 6.59% | 0.000000 | 0.000000 |
| Image Quality Assessment Metrics | PSNR | 0.7099 | 0.6834 | 0.7686 | 6.59% | 0.000000 | 0.000000 |
| | SSIM [29] | 0.5975 | 0.5827 | 0.8751 | 12.09% | 0.000000 | 0.000000 |
| | MS-SSIM [19] | 0.8048 | 0.8030 | 0.6478 | 29.12% | 0.000000 | 0.000000 |
| | VSNR [30] | 0.0850 | 0.1717 | 1.0874 | 13.19% | 0.253675 | 0.486686 |
| | VIF [31] | 0.1027 | 0.2064 | 1.0914 | 27.47% | 0.167842 | 0.005305 |
| | VIFP [31] | 0.6081 | 0.6240 | 0.8664 | 26.92% | 0.000000 | 0.000000 |
| | UQI [32] | 0.6316 | 0.5932 | 0.8461 | 14.84% | 0.000000 | 0.000000 |
| | IFC [33] | 0.6235 | 0.6020 | 0.8533 | 16.48% | 0.000000 | 0.000000 |
| | NQM [34] | 0.7950 | 0.7816 | 0.6621 | 14.84% | 0.000000 | 0.000000 |
| | WSNR [35] | 0.7176 | 0.6888 | 0.7601 | 7.14% | 0.000000 | 0.000000 |
| | FSIM [36] | 0.7243 | 0.7157 | 0.7525 | 10.44% | 0.000000 | 0.000000 |
| | FSIMc [36] | 0.7278 | 0.7172 | 0.7484 | 12.09% | 0.000000 | 0.000000 |
| | SRSIM [37] | 0.7853 | 0.7538 | 0.6757 | 12.09% | 0.000000 | 0.000000 |
| | SalSSIM [38] | 0.7356 | 0.7300 | 0.7393 | 7.14% | 0.000000 | 0.000000 |
| | CQM [20] | 0.2634 | 0.3645 | 1.0531 | 8.24% | 0.000327 | 0.000276 |
| | RBQI (Proposed) | 0.8613 | 0.8222 | 0.5545 | 3.30% | 0.000000 | 0.000000 |
Table 3. Comparison of RBQI vs. statistical measures and IQA techniques on the combined ReBaQ and S-ReBaQ dataset.

| Category | Measure | PCC | SROCC | RMSE | OR | P_PCC | P_SROCC |
|---|---|---|---|---|---|---|---|
| Statistical Measures | AGE | 0.6667 | 0.6593 | 0.8462 | 14.42% | 0.000000 | 0.000000 |
| | EPs | 0.5744 | 0.6353 | 0.9294 | 19.02% | 0.000000 | 0.000000 |
| | pEPs | 0.1456 | 0.6939 | 1.1233 | 29.45% | 0.008464 | 0.000000 |
| | CEPs | 0.6202 | 0.6967 | 0.8906 | 18.40% | 0.000000 | 0.000000 |
| | pCEPs | 0.8427 | 0.8421 | 0.6113 | 7.06% | 0.000000 | 0.000000 |
| Image Quality Assessment Metrics | PSNR | 0.7306 | 0.7166 | 0.7753 | 10.74% | 0.000000 | 0.000000 |
| | SSIM [29] | 0.6083 | 0.5743 | 0.9011 | 16.56% | 0.000000 | 0.000000 |
| | MS-SSIM [19] | 0.7874 | 0.7907 | 0.6999 | 8.59% | 0.000000 | 0.000000 |
| | VSNR [30] | 0.1789 | 0.3459 | 1.1171 | 29.75% | 0.001176 | 0.001126 |
| | VIF [31] | 0.3478 | 0.5601 | 1.0645 | 25.77% | 0.000000 | 0.000000 |
| | VIFP [31] | 0.6281 | 0.5911 | 0.8835 | 14.72% | 0.000000 | 0.000000 |
| | UQI [32] | 0.7024 | 0.6778 | 0.8081 | 12.27% | 0.000000 | 0.000000 |
| | IFC [33] | 0.6455 | 0.5976 | 0.8671 | 14.42% | 0.000000 | 0.000000 |
| | NQM [34] | 0.7800 | 0.7781 | 0.7106 | 9.51% | 0.000000 | 0.000000 |
| | WSNR [35] | 0.7669 | 0.7550 | 0.7286 | 10.74% | 0.000000 | 0.000000 |
| | FSIM [36] | 0.7294 | 0.7088 | 0.7767 | 11.35% | 0.000000 | 0.000000 |
| | FSIMc [36] | 0.7337 | 0.7117 | 0.7715 | 11.35% | 0.000000 | 0.000000 |
| | SRSIM [37] | 0.7842 | 0.7875 | 0.7045 | 8.90% | 0.000000 | 0.000000 |
| | SalSSIM [38] | 0.7157 | 0.6960 | 0.7930 | 11.35% | 0.000000 | 0.000000 |
| | CQM [20] | 0.5651 | 0.5429 | 0.9367 | 21.78% | 0.000000 | 0.000000 |
| | RBQI (Proposed) | 0.8770 | 0.8372 | 0.5456 | 4.29% | 0.000000 | 0.000000 |
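
The PCC and SROCC columns in the comparison tables above measure how well a metric's objective scores agree with the subjective scores, and RMSE measures the prediction error. The following is a minimal sketch of those three statistics in plain Python, not the authors' evaluation code. It assumes PCC is the Pearson correlation, SROCC is the Spearman rank-order correlation with average ranks for ties, and that any mapping of objective scores onto the subjective scale has already been applied before RMSE is computed. The function names and sample values are hypothetical, and the outlier ratio (OR) and the p-value columns are not covered.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(predicted, target):
    """Root-mean-square error between predicted and subjective scores."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the datasets.
    objective = [0.91, 0.74, 0.55, 0.32, 0.20]  # raw metric outputs
    mapped = [4.5, 3.9, 3.2, 2.1, 1.6]          # metric outputs mapped to the MOS scale
    mos = [4.6, 4.1, 3.0, 2.2, 1.4]             # mean opinion scores
    print("PCC  :", pearson(mapped, mos))
    print("SROCC:", spearman(objective, mos))
    print("RMSE :", rmse(mapped, mos))
```

SROCC depends only on the ordering of the scores, so it is unaffected by any monotonic mapping, whereas PCC and RMSE depend on the mapping that is chosen. That is one reason a metric can rank well on one column and poorly on another.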
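Table 4. Performance comparison for different values of parameters on the ReBaQ dataset.

a. Simulation results with different neighborhood search window sizes nhood.

| nhood | PCC (ReBaQ-Static) | SROCC (ReBaQ-Static) | RMSE (ReBaQ-Static) | OR (ReBaQ-Static) | PCC (ReBaQ-Dynamic) | SROCC (ReBaQ-Dynamic) | RMSE (ReBaQ-Dynamic) | OR (ReBaQ-Dynamic) |
|---|---|---|---|---|---|---|---|---|
| nhood = 1 | 0.7931 | 0.8314 | 0.5077 | 12.50% | 0.6395 | 0.6539 | 0.5662 | 11.11% |
| nhood = 9 | 0.9015 | 0.8581 | 0.4911 | 6.94% | 0.7834 | 0.6683 | 0.5394 | 6.94% |
| nhood = 17 | 0.9006 | 0.8581 | 0.4837 | 4.17% | 0.7908 | 0.6762 | 0.4374 | 5.56% |
| nhood = 33 | 0.9001 | 0.8581 | 0.4896 | 5.56% | 0.7818 | 0.6683 | 0.4769 | 5.56% |

b. Simulation results with different numbers of scales L.

| L | PCC (ReBaQ-Static) | SROCC (ReBaQ-Static) | RMSE (ReBaQ-Static) | OR (ReBaQ-Static) | PCC (ReBaQ-Dynamic) | SROCC (ReBaQ-Dynamic) | RMSE (ReBaQ-Dynamic) | OR (ReBaQ-Dynamic) |
|---|---|---|---|---|---|---|---|---|
| L = 1 | 0.8190 | 0.8183 | 0.6667 | 8.33% | 0.5561 | 0.5520 | 0.7335 | 12.50% |
| L = 2 | 0.8597 | 0.8310 | 0.5521 | 5.56% | 0.7281 | 0.6482 | 0.6050 | 5.56% |
| L = 3 | 0.9006 | 0.8592 | 0.5077 | 4.17% | 0.7908 | 0.6773 | 0.5662 | 5.56% |
| L = 4 | 0.9006 | 0.8581 | 0.4915 | 4.17% | 0.7954 | 0.6797 | 0.5350 | 5.56% |
| L = 5 | 0.9006 | 0.8581 | 0.4883 | 5.56% | 0.8087 | 0.6881 | 0.5191 | 5.56% |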
Table 5. Sensitivity analysis on the ReBaQ dataset with n = 24 and n = 50.

| Category | Measure | μ_PCC (Static, n = 24) | σ_PCC (Static, n = 24) | μ_PCC (Static, n = 50) | σ_PCC (Static, n = 50) | μ_PCC (Dynamic, n = 24) | σ_PCC (Dynamic, n = 24) | μ_PCC (Dynamic, n = 50) | σ_PCC (Dynamic, n = 50) |
|---|---|---|---|---|---|---|---|---|---|
| Statistical Measures | AGE | 0.8154 | 0.0451 | 0.7898 | 0.0108 | 0.4824 | 0.0504 | 0.5164 | 0.0123 |
| | EPs | 0.6333 | 0.1149 | 0.4834 | 0.0801 | 0.1627 | 0.0960 | 0.1499 | 0.0866 |
| | pEPs | 0.8309 | 0.0452 | 0.8147 | 0.0088 | 0.4437 | 0.0573 | 0.4819 | 0.0061 |
| | CEPs | 0.6851 | 0.0941 | 0.6184 | 0.0923 | 0.7475 | 0.0488 | 0.6223 | 0.1500 |
| | pCEPs | 0.8556 | 0.0500 | 0.8178 | 0.0451 | 0.8327 | 0.0504 | 0.6644 | 0.0185 |
| Image Quality Assessment Metrics | PSNR | 0.8620 | 0.0398 | 0.8410 | 0.0067 | 0.5113 | 0.0503 | 0.5290 | 0.0172 |
| | SSIM [29] | 0.5578 | 0.0862 | 0.5775 | 0.0084 | 0.2372 | 0.2250 | 0.2290 | 0.2376 |
| | MS-SSIM [19] | 0.7729 | 0.0510 | 0.7401 | 0.0123 | 0.5253 | 0.0750 | 0.5232 | 0.0131 |
| | VSNR [30] | 0.5365 | 0.0844 | 0.5225 | 0.0182 | 0.4926 | 0.0287 | 0.5212 | 0.0101 |
| | VIF [31] | 0.0798 | 0.3740 | 0.0571 | 0.3245 | 0.2242 | 0.2474 | 0.1902 | 0.1916 |
| | VIFP [31] | 0.5453 | 0.1259 | 0.5264 | 0.0302 | 0.4515 | 0.0167 | 0.4941 | 0.0057 |
| | UQI [32] | 0.7616 | 0.0831 | 0.6658 | 0.0241 | 0.8105 | 0.0426 | 0.6545 | 0.0210 |
| | IFC [33] | 0.5249 | 0.0906 | 0.5067 | 0.0189 | 0.4346 | 0.0254 | 0.4410 | 0.0049 |
| | NQM [34] | 0.8619 | 0.0300 | 0.8427 | 0.0120 | 0.7564 | 0.0511 | 0.7127 | 0.0196 |
| | WSNR [35] | 0.8520 | 0.0392 | 0.8194 | 0.0149 | 0.7150 | 0.0727 | 0.6617 | 0.0238 |
| | FSIM [36] | 0.7749 | 0.0519 | 0.7421 | 0.0144 | 0.4828 | 0.0328 | 0.5202 | 0.0064 |
| | FSIMc [36] | 0.7810 | 0.0507 | 0.7481 | 0.0143 | 0.4840 | 0.0329 | 0.5213 | 0.0065 |
| | SRSIM [37] | 0.8387 | 0.0344 | 0.8132 | 0.0138 | 0.6240 | 0.0895 | 0.5756 | 0.0348 |
| | SalSSIM [38] | 0.5856 | 0.1313 | 0.5944 | 0.0103 | 0.4698 | 0.0627 | 0.4947 | 0.0059 |
| | CQM [20] | 0.7437 | 0.0793 | 0.6751 | 0.0373 | 0.7863 | 0.0593 | 0.7336 | 0.0267 |
| | RBQI (Proposed) | 0.9320 | 0.0194 | 0.9141 | 0.0084 | 0.8355 | 0.0241 | 0.8154 | 0.0107 |