Smart Black Box 2.0: Efficient High-bandwidth Driving Data Collection based on Video Anomalies

Autonomous vehicles require fleet-wide data collection for continuous algorithm development and validation. The Smart Black Box (SBB) intelligent event data recorder has been proposed as a system for prioritized high-bandwidth data capture. This paper extends the SBB by applying anomaly detection and action detection methods for generalized event-of-interest (EOI) detection. An updated SBB pipeline is proposed for the real-time capture of driving video data. A video dataset is constructed to evaluate the SBB on real-world data for the first time. SBB performance is assessed by comparing the compression of normal and anomalous data and by comparing our prioritized data recording with a FIFO strategy. Results show that SBB data compression can increase the anomalous-to-normal storage ratio by ~50%, while the prioritized recording strategy saves ~25% fewer normal frames and ~50-100% more anomalous frames than a FIFO queue. We compare the real-world dataset SBB results to a baseline SBB given ground-truth anomaly labels and conclude that improved general EOI detection methods will greatly improve SBB performance.


Introduction
Traditional automotive data collection has focused on low-bandwidth vehicle data such as speed and brake status. However, modern autonomous vehicles (AVs) require large-scale collection of high-bandwidth data (e.g., video, point clouds) for algorithm development and for verification and validation (V&V). Deep-learning networks for common AV tasks such as object detection [1][2][3], object tracking [4][5][6], and trajectory prediction [7][8][9][10] need significant quantities of high-bandwidth real-world data for effective training and testing.
Finite on-board storage capacity presents a challenge for high-bandwidth data capture that low-bandwidth data logging systems do not encounter. Recently, event data recorders (EDRs) specialized for such high-bandwidth data capture have been explored. The Smart Black Box (SBB) [11,12] is one such system. The SBB uses pre-defined rules for event-of-interest (EOI) detection and computes data value according to the detected EOI. Data value is then used to determine data compression factors and as the basis for prioritized data recording.
This paper expands the Smart Black Box to record high-priority video data and applies the SBB to a real-world driving dataset. Rather than using pre-defined rules for EOI detection as in [11,12], machine learning-based methods for generalized EOI detection are applied to derive data value. Raw data is grouped into buffers, compressed, and stored in a priority queue in order to discard low-value data as the storage capacity is filled. The SBB is assessed on real-world driving video. We focus on video data due to the ubiquity of cameras as a sensor in automotive applications. This paper offers two primary contributions. Firstly, we apply video anomaly detection (VAD) [13,14] and online action detection (OAD) [14,15] as methods for generalized EOI detection on real-world driving video. To estimate data value from combined VAD and OAD outputs, we introduce a hybrid value method as well as an Independent Bayesian Classifier Combination (IBCC) [16] method and compare their performance. Secondly, we present an updated SBB pipeline incorporating VAD and OAD and designed for the real-time recording of dash camera video data. We find that while the SBB improves the collection and retention of high-value data, improved EOI detection methods are needed to realize the full potential of the SBB.
The paper is structured as follows. First, related work is explored in Section 2. Then, an overview of the original SBB [12] is presented and the changes we made for application to real-world data are discussed in Section 3. Section 4 describes our adjusted data classification system, the updated SBB pipeline, and the new value estimation method. Section 5 presents experimental results on a combined real-world dataset and analyzes the performance of our updated SBB. Section 6 concludes the paper and discusses future work.

Related Work

Event Data Recorders (EDRs)
Automotive EDRs use low-level triggers, such as vehicle impact or engine faults, to log vehicle data leading up to and during anomalous events [17,18]. However, these EDRs focus on low-bandwidth data and do not sufficiently address the storage problems posed by high-bandwidth sensors. In the case that the on-board memory is filled, one of two strategies is used. The more common strategy writes data until memory is full, then stops recording data, meaning that the newest data is dismissed. The second strategy uses a circular buffer equivalent to a first-in-first-out (FIFO) queue. In this model, the newest data overwrites the oldest data. Neither of these strategies considers the value of the data being discarded.
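The two conventional strategies can be contrasted in a short sketch (frame contents and capacity are illustrative; real EDRs measure capacity in bytes, not frame counts):

```python
from collections import deque

def record_stop_when_full(frames, capacity):
    """Strategy 1: write until memory is full, then stop recording.
    The newest data is dismissed."""
    storage = []
    for f in frames:
        if len(storage) < capacity:
            storage.append(f)
    return storage

def record_fifo(frames, capacity):
    """Strategy 2: circular buffer (FIFO queue).
    The newest data overwrites the oldest."""
    storage = deque(maxlen=capacity)
    for f in frames:
        storage.append(f)
    return list(storage)

frames = list(range(10))                 # ten frames, oldest first
print(record_stop_when_full(frames, 4))  # keeps the oldest: [0, 1, 2, 3]
print(record_fifo(frames, 4))            # keeps the newest: [6, 7, 8, 9]
```

Neither strategy inspects the frames themselves, which is exactly the gap value-based prioritization addresses.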
High-bandwidth data recorders address this by using prioritized data recording. In the case of [11,12], valuable data is identified using pre-defined rules for EOI detection. This paper seeks to build on the prioritized recording strategy in [11,12] by applying methods for general EOI detection.

Traffic Video Anomaly Detection and Classification
Several methods exist for identifying EOIs in AVs. Pre-defined rules may be applied based on vehicle odometry [19,20] to identify certain EOIs. Other approaches use physiological signals from the driver [21].
In recent years, deep learning computer vision techniques have been applied to anomaly detection in first-person driving videos [13,14,22,23]. Other works further attempt to classify the type of anomaly occurring in the video, either offline after the video is fully observed [24][25][26] or in real time [15].
These methods present a way for generalized EOI detection based only on dash camera video. As such, we use methods from [13] and [15] in the SBB to assign data value in accordance with our focus on video data.

Real-World Driving Datasets
The increasing popularity of deep-learning methods for AV perception tasks has created a demand for high-quality high-bandwidth datasets. Naturalistic Field Operation Test (NFOT) projects such as [27,28] have been used in the past to gather large amounts of driving data. One such dataset [27] uses 100 cars to log nearly 43,000 hours of video and vehicle performance data over a distance of 2,000,000 miles. The more recent Safety Pilot Model Deployment (SPMD) dataset [28] contains roughly 17,000,000 miles of data collected over almost 64,400 hours, including 17 TB of video. However, these datasets primarily focus on the capture of low-bandwidth data; the video streams of both datasets are compressed and downsampled to low frame rates.
Recently, high-quality computer vision-oriented datasets have been published. These include general driving datasets like Cityscapes [29], KITTI [30] and BDD100K [31], and traffic anomaly datasets like A3D [13], DADA [32], and DoTA [14]. Cityscapes contains 24,999 labelled images at 55 GB, while KITTI includes 7,481 images at 12 GB, in addition to 29 GB of point clouds and GPS and IMU data. BDD100K is one of the largest public driving datasets, having 100,000 HD video clips (1.8 TB) for over 1,100 driving hours in a variety of conditions. A3D, DADA and DoTA focus specifically on traffic anomalies. A3D contains 1,500 on-road accident clips with accident start and end times labelled. DADA releases 1,000 video clips with simulated driver eye-gaze. DoTA is comprised of 4,677 videos with spatial, temporal, and anomaly category annotations.
Datasets like BDD100K and DoTA have significantly extended publicly available data access for deep-learning methods to use. However, anomaly-focused datasets are still relatively small; larger datasets like BDD100K contain very few EOIs with which to test AV algorithms. As a result, evaluation of the SBB required the creation of a combined dataset using BDD100K and DoTA video clips in order to have sufficient quantities of both normal and anomalous driving data. The SBB aims to address this problem by providing a method to collect high-value video data across an entire fleet of vehicles.

Preliminaries
This work builds upon the Smart Black Box (SBB) intelligent event data recorder proposed in [12]. The original SBB design, data value estimation method, and its issues in the real world are reviewed.

Smart Black Box Design
The SBB aims to record high-quality high-value data through value-driven data compression and prioritized data recording. At each time step, one data frame is observed and collected. Based on event detectors, a scalar frame value v_t ∈ [0, 1] is computed for each frame. The data frame is then appended to a buffer, which caches seconds or minutes of data. The process of buffering data frames is managed by a deterministic Mealy machine (DMM) which uses the new data value, data similarity, and the current buffer size to determine when to end the current buffer and start a new one [12]. After the DMM terminates, local buffer optimization (LBO) is used to determine the optimal compression factor d_t ∈ [0, 1], called the LBO decision, for each frame in the buffer. A Gaussian data value filter can be applied over the buffered data to smooth the estimated data value. The buffered data is then compressed according to the LBO decisions and stored in long-term storage. After on-board storage is full, a priority queue discards the lowest-value buffers to make space for higher-value buffers.

SBB Value Estimation
The SBB was previously tested only in a simulation environment, The Open Racing Car Simulator (TORCS) [33]. Experiments done using TORCS in [12] classified each frame as either normal (e_1) or as one of four pre-defined events of interest (EOIs): cut-in, hard braking, conflict, or crash, notated by e_2, e_3, e_4, and e_5 respectively. The value of each event is pre-computed from its event likelihood as

v(e_j) = -log P(e_j),    (1)

where P(e_j) is the likelihood of event e_j. These event values are then normalized over [0, 1] so that max_j v(e_j) = 1. The frame value at time t, v_t, is then set to v_t = v(e(t)), where e(t) is the event detected at time t.
This data value estimation method works well in a simulation environment. However, it has two main drawbacks that affect its usability in the real world. First, the method relies entirely on a set of pre-defined rules for EOI detection. In reality, the space of traffic EOIs is large and diverse, and capturing them purely using pre-defined rules is insufficient for real-world applications. Second, the detection of the four EOIs is not always possible given only dash camera data. In simulation, the EOIs are easily detectable by tracking the cars surrounding the ego vehicle. However, limiting the available sensing to a single front-facing camera makes the identification of these EOIs significantly more challenging. In this paper, we apply an adjusted event classification system in Section 4.1.1 and a new value estimation method in Section 4.3.

Materials and Methods
This section introduces a new event classification system to extend the previous SBB and defines updated data frame and buffer representations in Section 4.1. Then, an updated SBB pipeline for real-world video data is presented in Section 4.2. Finally, methods for data value estimation using video anomaly detection and online action detection are discussed in Section 4.3.

Frame Classification
As mentioned in Section 3.2, the event classification system in [12] does not straightforwardly apply to real-world applications. Instead, real-world datasets such as DoTA [14] classify frames based on anomaly type and causation, e.g., an oncoming collision event. As such, we employ the classification system used in the DoTA dataset [14], which defines the eight traffic anomaly categories described in Table 1. Each of these anomaly categories can be further specified as ego or non-ego events. Including the normal event class, this results in 17 total event classes. Online action detection aims to classify frames according to these event classes.
To realize generalized EOI detection, we also utilize a binary anomalous-or-normal classification. Video anomaly detection is used to solve this binary classification problem.

Table 1. Event Classes in the DoTA dataset [14]. An anomaly label with "*" indicates an event where the ego car is not involved (i.e., non-ego); otherwise the event is ego-involved.

Data Frame Representation
A data frame is defined as all data, both observed and computed, associated with a single video frame. In this paper, we consider only data derived from camera input. This data includes:
Image: The video frame captured by the camera. In this paper, we use RGB images at 1280 × 720 resolution.
Value: The value of the frame v_t ∈ [0, 1]. Value is calculated according to the value function defined in Section 4.3, and is used in the DMM as well as in buffer value computation.
Cost: The normalized storage cost of the frame c_t ∈ [0, 1].
Anomaly score: The anomaly score s_t ∈ [0, 1] of the frame generated using Video Anomaly Detection. More details can be found in Section 4.3.1.
Classification scores: The output score o_{t,ℓ} ∈ [0, 1] for each event class ℓ in Table 1 from Online Action Detection. More details can be found in Section 4.3.2.
Object data: The tracking ID, object type, bounding box, and detector confidence of each object detected in the frame. Object data is used to support buffer tagging; details can be found in Section 4.1.3.
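A minimal sketch of this frame representation as a container type; the field names and the 17-entry score vector (16 anomaly classes plus normal) are our illustrative assumptions, not code from the paper:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataFrame:
    """One data frame: all data, observed and computed, associated with
    a single video frame (field names are hypothetical)."""
    image: bytes                  # 1280x720 RGB frame, encoded
    value: float = 0.0            # v_t in [0, 1], from the value function
    cost: float = 0.0             # normalized storage cost c_t in [0, 1]
    anomaly_score: float = 0.0    # s_t in [0, 1], from VAD
    class_scores: List[float] = field(
        default_factory=lambda: [0.0] * 17)  # one OAD score per event class
    objects: List[dict] = field(
        default_factory=list)     # tracking ID, type, bbox, confidence
```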

Frame Buffer Representation
A buffer is a collection of frames grouped by the DMM described in Section 3.1. The value V_k and cost C_k of the kth buffer are computed as

V_k = (1 + λ)^k Σ_i v_i d_i,    (3)
C_k = Σ_i ĉ_i,    (4)

where v_i is the value of the ith frame in the buffer, d_i is its compression quality, and ĉ_i is its post-compression storage cost. The aging factor (1 + λ)^k with 0 < λ << 1 is used to slightly favor more recent buffers. Additionally, buffer tags are high-level descriptions of data buffers which enable buffer indexing and searching in downstream applications. These tags include:
Anomaly score: The mean, max, and variance of the anomaly scores of the frames in the buffer.
Frame classifications: A list of event classes ℓ for which there is a frame f_t in the buffer with o_{t,ℓ} > ρ_ℓ, where ρ_ℓ is a user-defined threshold score for class ℓ.
Objects : The tracking ID, object type, and bounding boxes and detector confidences over time of each object in the buffer.
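The buffer cost and value computations can be sketched as follows, assuming a cost that sums post-compression frame costs and a value that sums v_i·d_i scaled by an aging factor (1 + λ)^k; this is one plausible reading of the text, not the paper's exact code:

```python
def buffer_cost_value(frames, k, lam=1e-3):
    """Compute cost C_k and value V_k for the k-th buffer.
    frames: list of (v_i, d_i, c_hat_i) tuples -- frame value, compression
    quality, and post-compression storage cost.
    lam: aging factor 0 < lam << 1, slightly favoring more recent buffers."""
    C_k = sum(c_hat for _, _, c_hat in frames)
    V_k = (1 + lam) ** k * sum(v * d for v, d, _ in frames)
    return C_k, V_k
```

For example, a two-frame buffer with (v, d, ĉ) = (1.0, 0.5, 2.0) and (0.5, 1.0, 1.0) at k = 0 yields C_k = 3.0 and V_k = 1.0; the same buffer at a later index k receives a slightly larger V_k.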

Updated SBB Design
The updated SBB is separated into four processes running in parallel: video capture, buffer management, value estimation, and prioritization. Figure 1 describes the updated SBB pipeline.
Video Capture reads video input and publishes each video frame to value estimation and buffer management. This module remains unchanged from the original SBB.
Value Estimation assigns a value v_t ∈ [0, 1] to each frame to be used in buffer management and storage prioritization. The value estimation module first executes object detection, object tracking [4], and optical flow estimation [34]. The outputs are then used in Video Anomaly Detection and Online Action Detection, whose scores are combined to compute the value; details on this calculation can be found in Section 4.3. The value estimation method differs significantly from that of [12], which assumed perfect EOI detection using pre-defined rules and computed data value from the detected EOIs (see Section 3.2). In this paper, we instead use video anomaly detection and action detection methods for generalized EOI detection and calculate data value based on their output scores.
Buffer Management groups frames into buffers using the DMM from [12] after receiving each frame from Video Capture and its corresponding value from Value Estimation. The similarity of a data frame to the current buffer is computed as the percentage of object tracking IDs in the frame which have already appeared in the buffer. With A being the set of tracking IDs in the frame and B being the set of tracking IDs which have appeared in the buffer, we compute the similarity ξ_t = |A ∩ B| / |A|. Once the DMM terminates, LBO solves an optimization problem over the output buffer to determine the compression quality of each frame.
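The similarity metric is a one-liner over tracking-ID sets; the convention for a frame with no detected objects is our assumption, as the paper does not state it:

```python
def frame_similarity(frame_ids, buffer_ids):
    """xi_t = |A intersect B| / |A|: the fraction of the frame's tracking IDs
    that have already appeared in the current buffer. Returns 0.0 for a frame
    with no detected objects (assumed convention)."""
    A, B = set(frame_ids), set(buffer_ids)
    return len(A & B) / len(A) if A else 0.0
```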
According to [12], a decoupled LBO strategy can optimize the compression quality of a single frame independently of all other frames in the buffer. Given a constant ratio η/ζ, the decoupled LBO objective for frame f_t is

max_{d_t ∈ [0,1]}  η v_t d_t − ζ c_t φ(d_t),    (5)

where η, ζ ≥ 0 are weighting parameters and φ(d_t) maps the compression quality to the compression ratio. Note that φ(d_t) increases monotonically over d_t ∈ [0, 1]. In this paper, we use the φ function of JPEG compression on real-world driving data following [11]. Throughout the paper, the values η = 0.9 and ζ = 1.7 are used based on [12]. These parameters were assigned to maximize the value-per-memory (VPM) of the recorded data; further details on η and ζ parameter selection can be found in [12]. DMM and LBO functionality remain the same as in [12]. However, the data similarity metric is adjusted to match our focus on dash camera data. In the previous SBB, data similarity was computed using the odometry of the host and surrounding vehicles, but a single front-facing camera cannot capture sufficient information to use this approach. As such, we compute data similarity using the objects detected in the frame, as described above.
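The decoupled per-frame optimization can be sketched as a simple grid search. Both the objective form (a value term minus a weighted storage term) and the quadratic φ are our assumptions, the latter standing in for the JPEG quality-to-ratio curve of [11]:

```python
def lbo_decision(v_t, c_t, phi, eta=0.9, zeta=1.7, n_grid=101):
    """Decoupled LBO by grid search: choose d_t in [0, 1] maximizing
    eta * v_t * d_t - zeta * c_t * phi(d_t). High-value cheap frames get
    high compression quality; low-value expensive frames get low quality."""
    best_d, best_obj = 0.0, float("-inf")
    for i in range(n_grid):
        d = i / (n_grid - 1)
        obj = eta * v_t * d - zeta * c_t * phi(d)
        if obj > best_obj:
            best_d, best_obj = d, obj
    return best_d

# Hypothetical monotone quality-to-ratio map (placeholder for the JPEG curve).
phi = lambda d: d ** 2
```

With these assumptions, a high-value, low-cost frame (v_t = 1.0, c_t = 0.1) is kept at full quality, while a low-value, high-cost frame (v_t = 0.05, c_t = 1.0) is compressed nearly away.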
Prioritization maintains a buffer priority heap in order to retain high-value buffers and delete low-value buffers as the memory capacity is reached. Buffer value V_k and cost C_k of the kth buffer are computed according to Eq. 3 and Eq. 4 respectively. A binary min-heap is constructed to store buffers based on V_k following [12]. This module is also unchanged from [12].
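A simplified sketch of the prioritization module using Python's heapq (the capacity accounting and eviction loop are illustrative, not the paper's implementation):

```python
import heapq

class BufferPriorityQueue:
    """Binary min-heap keyed on buffer value V_k. When total stored cost
    exceeds capacity, the lowest-value buffers are discarded first."""
    def __init__(self, capacity):
        self.capacity = capacity   # storage capacity in cost units
        self.heap = []             # entries: (V_k, k, C_k)
        self.used = 0.0            # total cost currently stored

    def push(self, V_k, k, C_k):
        heapq.heappush(self.heap, (V_k, k, C_k))
        self.used += C_k
        # Evict lowest-value buffers until we fit within capacity.
        while self.used > self.capacity and self.heap:
            _, _, C_evicted = heapq.heappop(self.heap)
            self.used -= C_evicted
```

Note that a newly pushed buffer may itself be evicted immediately if its value is the lowest in the heap, which matches the intended behavior of discarding the least valuable data.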

Value Estimation Method
This section introduces the data value estimation method used by the DMM module to group buffers and decide the optimal compression factors. Similar to [11,12], we define the value of a data frame as a measure of data anomaly. The data value is determined by: 1) The anomaly score estimated by a video anomaly detection (VAD) module; 2) The anomaly category detected by an online action detection (OAD) module.

Video Anomaly Detection (VAD)
A VAD algorithm takes observed image frames and predicts an anomaly score for each frame as a description of the degree of abnormality of that frame. Existing VAD algorithms can be categorized as frame-level VAD and object-level VAD. A frame-level VAD algorithm reconstructs or predicts image frames (e.g., in RGB or grayscale) and computes the L2 error of reconstruction or prediction as the anomaly score [35][36][37]. An object-level algorithm, on the other hand, predicts object appearance and/or motions and computes the anomaly score based on prediction error [38,39] or consistency [13,14].
In this paper, we run an off-the-shelf VAD algorithm to estimate an anomaly score s_t of a frame f_t and use it to inform our value estimation. To be specific, we trained the TAD algorithm in [13] using the Detection of Traffic Anomaly (DoTA) dataset following [14] and applied it to our data value estimation module.

Online Action Detection (OAD)
While the anomaly score from VAD provides information about the probability that an anomaly occurs in a frame, it does not assess the anomaly category, which is important information for determining data value in long-term driving according to [12]. Categorizing anomalous events is essential to the SBB design since it allows the SBB to prioritize high-value categories when the storage limit is encountered, and it allows the SBB to focus on specific event types per a user's request.
In this paper, we implement an off-the-shelf OAD algorithm to obtain a confidence score vector o_t for a frame f_t, which is then combined with the anomaly score s_t to estimate the data value. To be specific, we trained an OAD algorithm called the temporal recurrent network (TRN) [15] using the DoTA dataset [14]. The TRN outputs a 17-D vector o_t = [P_t(e_0), P_t(e_1), . . . , P_t(e_16)] for each frame, with Σ_{j=0}^{16} P_t(e_j) = 1, representing the confidence that the frame belongs to each class.

Hybrid Value
A hybrid value estimation method is proposed which combines the VAD and OAD scores as

v_t = α s_t + β Σ_{i=1}^{16} w_i o_{t,i},    (6)

where w_i is the information measure of class e_i and α, β ∈ [0, 1] are weighting parameters. Because o_{t,i} estimates the probability that a frame is of class e_i, the weighted sum over o_t is equivalent to the expected information measure of the frame. Note that class e_0, the normal class, is not included in the computation; this is equivalent to setting w_0 = 0. Throughout this paper, we use α = β = 1 for simplicity.
The information measures w_i are calculated using the class likelihoods in the DoTA dataset found in Table 2:

w_i = −log P(e_i),    (7)

where P(e_i) is the likelihood of class e_i. Values are normalized to [0, 1] by dividing by the maximum information measure.
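The hybrid value computation can be sketched directly from these two equations; clipping the result to [0, 1] is our assumption to keep v_t a valid frame value, since the paper does not state how out-of-range sums are handled:

```python
import math

def information_weights(class_probs):
    """w_i = -log P(e_i) for each anomaly class, normalized to [0, 1]
    by the maximum information measure (the normal class is excluded,
    i.e. w_0 = 0)."""
    raw = [-math.log(p) for p in class_probs]
    max_w = max(raw)
    return [w / max_w for w in raw]

def hybrid_value(s_t, o_t, weights, alpha=1.0, beta=1.0):
    """v_t = alpha * s_t + beta * sum_i w_i * o_{t,i}, where o_t holds the
    OAD scores of the anomalous classes only. The weighted sum is the
    expected information measure of the frame. Clipping to [0, 1] is an
    assumption on our part."""
    v = alpha * s_t + beta * sum(w * o for w, o in zip(weights, o_t))
    return min(max(v, 0.0), 1.0)
```

For instance, with class likelihoods e^-1 and e^-2 the normalized weights are [0.5, 1.0], and a frame with s_t = 0.3 and OAD scores [0.2, 0.1] receives v_t = 0.3 + 0.1 + 0.1 = 0.5.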

Independent Bayesian Classifier Combination
We also implement Independent Bayesian Classifier Combination (IBCC) to fuse VAD and OAD scores. Both VAD and OAD provide an estimate of the probability that a frame is anomalous; VAD gives the anomaly score value s_t, while the OAD class probabilities for anomalous classes can be summed to generate a score r_t = Σ_{j=1}^{16} o_{t,j}. By assuming that these two classifiers are conditionally independent, we can apply Independent Bayesian Classifier Combination.
Let a_t be the ground-truth anomaly indicator of frame f_t, such that a_t = 1 (0) indicates an anomalous (normal) frame. We assume a_t is generated from a binomial distribution with class probabilities p = [p_0, p_1], where p_0 and p_1 are the probabilities of normal and anomalous frames respectively. We then binarize the anomaly score s_t and the OAD score r_t using a threshold τ, so that scores greater than τ are mapped to 1 and to 0 otherwise. We assume that s_t and r_t are generated from binomial distributions conditioned on the ground-truth anomaly status a_t, with class probabilities π^(s)_{k,l} = p(s_t = l | a_t = k) and π^(r)_{k,l} = p(r_t = l | a_t = k) respectively, where l, k ∈ {0, 1}. π^(s) and π^(r) are also called the confusion matrices of the random variables s_t and r_t. Thus, we have four hidden variables: the anomaly status a, the anomaly status class probabilities p, and the confusion matrices π^(s) and π^(r) for the VAD and OAD scores respectively. Variational Bayes (VB) inference is applied to solve for these four hidden variables. After convergence, we obtain E[a_t], the posterior probability that f_t is anomalous, which we take to be our IBCC-estimated value. Readers are directed to [16] for further details.
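The core of the fusion step can be illustrated with the confusion matrices and class prior treated as known point estimates; the paper infers these with variational Bayes, so this is only a simplified sketch of the conditional-independence combination:

```python
def fuse_posterior(s_bin, r_bin, p, pi_s, pi_r):
    """Posterior p(a_t = 1 | s_t, r_t) under the IBCC conditional-independence
    assumption, with known confusion matrices (simplified; the full method
    infers p, pi_s, pi_r with variational Bayes).
    p:    [p0, p1] class prior over normal/anomalous.
    pi_s: pi_s[k][l] = p(s_t = l | a_t = k); likewise pi_r."""
    likelihood = [p[k] * pi_s[k][s_bin] * pi_r[k][r_bin] for k in (0, 1)]
    return likelihood[1] / (likelihood[0] + likelihood[1])
```

When both binarized detectors fire on a frame, the posterior anomaly probability is high even under a strongly normal prior; when both are silent, it collapses toward zero.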

Experiments
In this section, we conduct SBB data collection experiments on a specifically designed large-scale real-world video dataset and present the results. We discuss storage requirements of SBB-compressed data to showcase its preservation of valuable data. We then compare our SBB prioritized data recording with a FIFO queue data recording strategy. We examine SBB results on each anomaly class. Finally, we compare the performance of two value estimation methods: the VAD-OAD hybrid value with various parameters and Independent Bayesian Classifier Combination (IBCC) value.

Dataset
The SBB is designed for high-bandwidth data collection in long-term driving where on-board storage is limited. Therefore, SBB performance evaluation requires a large, high-quality video dataset which contains both normal driving data as well as events of interest (EOIs). To the best of our knowledge, there is currently no single dataset that satisfies all these requirements. The BDD100K dataset [31] is one of the largest high-quality driving video datasets and contains 100,000 video clips covering ∼1,100 driving hours. The DoTA dataset [14] is the largest and newest high-quality video dataset for traffic anomalies and contains 4,677 anomalous video clips. We combined the 10,000 validation videos in the BDD100K dataset and randomly interspersed 500 anomalous video clips from the DoTA dataset, resulting in a large testing video with ∼4,000,000 frames at 10 FPS. By combining these two datasets, we obtained a >100-hour high-quality driving video where the vast majority (∼99.5%) of the frames are normal but which still contains a large number of EOIs the SBB might recognize and record. Note that the ST* anomaly class was not included in this combined dataset, as its rarity in the DoTA dataset led to no ST* clips being sampled.
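The construction of the combined evaluation stream can be sketched as follows; the clip lists are placeholders and the fixed random seed is our choice for reproducibility:

```python
import random

def build_combined_stream(normal_clips, anomaly_clips, n_anomaly=500, seed=0):
    """Randomly intersperse sampled anomalous clips among normal clips,
    mirroring the combined BDD100K + DoTA evaluation video construction."""
    rng = random.Random(seed)
    sampled = rng.sample(anomaly_clips, n_anomaly)  # e.g. 500 DoTA clips
    combined = list(normal_clips) + sampled
    rng.shuffle(combined)                           # random interspersal
    return combined
```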

Results
Experimental results for the SBB are presented and discussed below. In general, high storage allocation and high compression-quality decisions are desirable for anomalous data, while the opposite is true for normal data.
SBB Data Compression. SBB data compression statistics with no memory limit are presented in Table 3. It can be seen that the storage cost of normal frames is significantly reduced (703.84 GB to 106.68 GB, an 85% reduction) by the SBB. This leads to a 48% increase in the ratio of anomalous data storage to normal data storage. Both the average (avg.) and median (med.) compression factor decisions of the SBB are higher for the anomalous frames, indicating that the SBB is able to identify and preserve anomalous frames over normal ones. Figure 2 displays normal frames which were highly compressed by the SBB along with preserved anomalous frames; Figure 3 shows two failure cases where anomalous frames were mistakenly compressed. Both of these failures showcase a lack of robustness against cases where anomalous objects are occluded. However, the decision difference between normal and anomalous frames is not as significant as in the simulation experiment in [12], because the anomalous event detection in simulation was 100% accurate while VAD on real-world data is far from perfect. Moreover, the median decision for a normal frame is significantly lower than the mean, indicating that there are outlier normal frames with unusually high value scores. The standard deviation (std.) of anomalous frame decisions is significantly larger than that in the simulation experiment (0.36 vs. ∼0.02), showing how inaccurate VAD and OAD reduce the SBB's efficiency on real-world data. The limitations of VAD and OAD are further shown by evaluating the performance of the SBB given ground-truth labels as VAD and OAD scores. In this case, the anomalous-to-normal storage ratio increases by 1616%, driven by the massive difference in decisions between normal and anomalous frames.
This upper-bound performance of the SBB indicates that as anomaly detection techniques continue to improve, the performance of the SBB will improve as well.
SBB Prioritized Data Recording. Table 4 compares the recorded frames of the prioritized recording system against those of a FIFO queue at memory limits of M = 12.5 GB and 25 GB. These values represent a non-trivial amount of data to upload assuming continuous internet access is not available. In both scenarios, prioritized recording saved fewer normal frames and more anomalous frames than the FIFO strategy. Notably, prioritized recording with 12.5 GB saved more anomalous frames than the FIFO queue with 25 GB. The prioritization strategy of the SBB removes ∼93% of the normal frames while still recording ∼20% of the anomalous frames at M = 12.5 GB. Compared to the FIFO queue, the SBB saves ∼25% fewer normal frames and ∼50-100% more anomalous frames.
Performance Per Anomaly Class. Figure 4 displays the decision histograms for each anomaly class. The performance of the SBB varies heavily depending on the anomaly category. For example, the decision distribution of class OC indicates very good detection of this anomaly. In OC, an ego-vehicle collision with an oncoming vehicle, the anomalous object (the oncoming vehicle) is almost always both near the camera and largely unoccluded. However, ST, VO, LA*, VO*, and OO* show notably poor performance.
ST is an extremely difficult case for OAD due to its visual similarity to AH and LA anomalies, resulting in lower OAD confidence that an anomaly has occurred. VO and VO* involve vehicles hitting obstacles in the roadway. In some scenarios, such as hitting a traffic cone or a fire hydrant, the obstacle may be blocked from view by the anomalous vehicle in a non-ego incident or outside the camera's field of view in an ego-incident. LA* often involves vehicles slowly moving closer together, making the collision relatively subtle. OO*, a non-ego vehicle leaving the roadway, can be challenging to detect simply due to the distance at which the anomaly occurs.
Value Estimation Method Comparison. Table 5 compares decision statistics for hybrid value estimation and IBCC value estimation. We note that the VAD-only method generates the largest decision difference between normal and anomalous frames; we suspect this to be a result of OAD's inability to consistently differentiate between anomalous and normal frames. Readers are directed to [14] for an in-depth discussion of the poor performance of OAD algorithms. The IBCC-based value estimation results in a relatively low difference in decisions. However, IBCC does lead to a lower standard deviation in the decision, indicating that it reduces the outlying anomaly scores and makes the decision making more stable. This is because IBCC makes use of prior distributions shared between frames for each hidden variable to establish a base expectation for the anomaly status. The hybrid method, on the other hand, takes only the current observations into account, meaning each frame is treated completely independently of every other frame. A low decision standard deviation is especially valuable when memory is limited. With a high standard deviation, many normal frames will be assigned high priority, while many anomalous frames will be assigned low priority. Thus, when memory capacity is reached, anomalous data mistakenly given low priority may be discarded.
For applications which value general EOIs, VAD-only value estimation (α = 1, β = 0) has the greatest ability to distinguish normal and anomalous data. However, users interested in specific EOIs may opt to use hybrid value in order to incorporate the EOI classification offered by OAD. In terms of hybrid value parameters, Table 5 shows that lower weights result in higher decision differences. However, in situations where retaining data quality is critical, higher α and β values may be used to achieve higher overall decision quality. Additionally, the higher decision differences as α increases shown in Figure 5 indicate once again that VAD contributes more to the differentiation of normal and anomalous frames than does OAD. Finally, the low-variance decision making of IBCC is useful in memory-limited systems when the retention of more anomalous frames at lower quality is more important than the retention of fewer anomalous frames at higher quality.

Conclusions
This paper has proposed a novel Smart Black Box (SBB) data processing pipeline that uses video anomaly detection and online action detection to efficiently record large-scale high-value video data. We have addressed storage and value estimation problems the SBB will face with real-world data, made adjustments in data classification and value estimation accordingly, and presented results on a large-scale real-world driving video dataset. Value estimation is changed from an entirely information measure-based method using pre-defined EOIs to a combination of video anomaly detection and online action detection capable of detecting more generalized EOIs. Observed decision differences between normal and anomalous data indicate that SBB value estimation can distinguish normal and anomalous frames. In experiments, a 48% increase in the anomalous-to-normal storage ratio was achieved compared to the raw data. Additionally, the prioritized recording of the SBB preserved ∼25% fewer normal frames and ∼50-100% more anomalous frames compared to a FIFO queue. However, we also noted that SBB performance increases significantly given ground-truth anomaly labels, suggesting that improved methods for general EOI detection will further improve the SBB's utility.
The results of this paper motivate future work in several directions. The difference between ground truth and real-world performance of the SBB suggests the need for improved anomaly detection and action detection. Other future work may extend the value estimation method of the SBB by either considering additional kinds of anomalies, such as severe weather conditions, or fusing in additional data sources, such as vehicle acceleration or driver concentration.