A Deep Learning-Based Fragment Detection Approach for the Arena Fragmentation Test

The arena fragmentation test (AFT) is one of the tests used to design an effective warhead. Conventionally, complex and expensive measuring equipment is used to test a warhead and to measure important factors such as the size, velocity, and spatial distribution of the fragments that penetrate steel target plates. In this paper, instead of using specific sensors and equipment, we propose the use of a deep learning-based object detection algorithm to detect fragments in the AFT. To this end, we acquired many high-speed videos and built an AFT image dataset with bounding boxes of warhead fragments. Our method fine-tunes an existing object detection network, the Faster region-based convolutional neural network (Faster R-CNN), on this dataset with modified anchor boxes. We also employ a temporal filtering method, which was demonstrated to be an effective non-fragment filtering scheme in our previous image processing-based fragment detection approach, to capture only the first penetrating fragments among all detected fragments. We show that the performance of the proposed method is comparable to that of a sensor-based system under the same experimental conditions. A quantitative comparison with our previous image processing-based method also demonstrates that the use of deep learning significantly enhances the performance in the AFT task. In particular, the proposed method produced outstanding results in terms of finding the exact fragment positions.


Introduction
The fragmentation of projectiles or warheads leads to the outbreak of a large number of fragments with various masses and geometries [1]. Fragmentation is the procedure by which the wrapping of projectiles or warheads from a bomb, land mine, missile, etc. is burst by the explosion of the detonative filler. Warheads are explosive materials carried by a flight vehicle, like a shell, rocket, missile, or fighter, to destroy and disable targets. The objective of the arena fragmentation test (AFT) is to measure the warhead performance. High explosive (HE) warhead performance has been characterized based on fragmentation features, including data on the fragment mass, number and shape, initial fragment velocities, warhead case and explosive material performances, and spatial fragment distributions. The lethal efficiency of the HE warhead is a function of the fragment velocity, the geometrical shape and mass of fragments, and their spatial distribution [2]. Generally, in both industry and academia, the function has been modeled using the Mott [3] and Held [4] equations. In practice, the determination of a warhead's performance requires complex measuring hardware equipment, and the measuring process itself is expensive [1].
Fragmentation warheads are explosive materials that contain a thousand fragments in them, and they destroy targets with their explosive pressure and fragment effects. As mentioned above, the AFT is performed to develop a warhead and evaluate its performance [1,5]. Through this test, we can obtain important data, such as the size, velocity, number, and spatial distribution of the warhead fragments and the explosive pressure of the warhead. In general, the AFT consists of installing steel plates at a specific distance around the warhead as shown in Figure 1. The steel plate is equipped with a screen sensor for measuring various statistical characteristics (the size, velocity, number, and spatial distribution) of the warhead fragments, and the signal output from each sensor is transmitted to the measurement system. The sensors and measurement systems are very expensive. They have a complicated structure, including a bundle of open-short circuit sensors, long cables, signal conditioning amplifiers, and analog-to-digital converters [5]. Traditional systems, however, have limited instrumentation capabilities due to physical factors. The sensor is disposable, and the installation of new sensors is required for each test, resulting in high human, time, and financial costs. Their physical size makes the number of sensors installed in one steel plate highly limited, causing a low acquisition rate of data. In addition, there are stability and reliability problems as the warhead fragments can hit the signal cable before hitting the sensor, which results in measurement failure.
Recently, machine learning technologies have been extensively used in diagnosis, sensing, monitoring, and measurement applications [6–11]. In addition, image processing and computer vision technologies have been widely employed in such applications [12–17]. Studies on measuring fragment data based on images have also been conducted. Baillargeon [18] and Huang [19] analyzed the performance of warheads and the flight of fragments using X-rays. Liu [20] studied techniques that use two optical devices to measure the characteristics of flying fragments. However, these studies have the disadvantage of being able to analyze only short-term phenomena, such as the initial stage of detonation. To solve this problem, a long-time and multi-fragment measurement algorithm [21] using four stereo images was proposed. However, this algorithm suffers from a slow measurement speed and overlapping problems, which make it unsuitable for warhead tests that require high-speed, multi-fragment measurement. More recently, we introduced an image processing-based fragment detection approach by modeling an appropriate fragment feature [22].
In this paper, to overcome the limitations of the existing AFT measurement system, we propose a warhead fragment data measurement system using an object detection technique, which is one of the most important deep learning-based computer vision technologies [23,24]. To construct the proposed warhead fragment data measurement system, we first captured the warhead fragments that penetrated the steel plate using a high speed camera sensor as shown in Figure 2, and then designed a deep learning-based fragment detector to detect only the first penetrated fragments from the captured images.
The proposed method consists of (1) deep learning-based detection of warhead fragments from the captured warhead fragment images and (2) finding only the first penetrated warhead fragments using a temporal filtering scheme [22]. We will show that the proposed method outperforms the previous image processing-based fragment detection method [22]. More specifically, in our previous method [22], the aim was to detect bright regions against the dark background, and the bright region was carefully modeled to detect first penetrating fragments (FPFs). Although such an approach worked well, in this paper we demonstrate that using convolutional neural network (CNN)-based detection in the first phase outperforms using image processing-based detection in the first phase. Figure 3 shows an example of the difference between an FPF and a flame. In Figure 3, the green and blue bounding boxes indicate the FPFs and flames, respectively. For clarity, the shape and appearance of the fragments (yellow circles) are drawn unchanged in the figure; in practice, the shape and appearance change after the first penetration. Note that the ultimate goal of the AFT is to detect the first penetrating fragments. In other words, the AFT has no interest in detecting subsequent fragments (such as the second and third penetrating fragments) that penetrate the same area. Furthermore, in reality, a repenetration (e.g., a second penetration) of the same area is very unlikely to occur, because a warhead bursts radially and the fragments therefore pass through different locations. Note also that, after the first penetration, a flame occurs in the penetrated region and its surrounding area. In the first phase (i.e., CNN-based detection) of our approach, we find the FPF candidates, which include not only the true FPFs but also the flames. Then, in the second phase (i.e., temporal filtering), the flames are removed. Both phases are important because the shapes and appearances of the FPFs and flames are very similar.
In this paper, for the task of finding FPF candidates (i.e., the first phase), we demonstrate that CNN-based detection outperforms image processing-based detection. Figure 4 shows a real example of an FPF and its corresponding flames. As shown in Figure 4, the shapes and appearances of the FPF and its corresponding flames are very similar. Figure 5 illustrates the difference between the bounding box generated by the Faster R-CNN and the mask obtained by the temporal filtering. For every input frame, the CNN-based detection (i.e., the first phase) is applied to extract the FPF candidates. In Figure 5, the black bounding box indicates the (first penetrating) fragment candidate detected by the Faster R-CNN. Fragment 1 is detected as a candidate in the (t + 1)th frame but is filtered out by the temporal filtering, because it was already determined as a fragment in the tth frame and its region is masked in the (t + 1)th frame to prevent the detection of FPFs in that area. On the other hand, fragment 2 is determined as a fragment in the (t + 1)th frame, as there is no mask related to fragment 2 in that frame. For ease of understanding, the shape and appearance of the fragments (yellow circles) are drawn unchanged in the figure.
This paper is organized as follows. Section 2 describes the deep learning-based fragment detection algorithm. Section 3 introduces the collected warhead fragment image data and describes the composition of the data for learning and evaluation. In addition, the fragment detection performance of the proposed algorithm is analyzed through qualitative detection results and quantitative performance evaluation results. Our conclusions follow in Section 4.

Overview
Object detection is a technique that detects the position of an object in an image in the form of a bounding box and classifies the object category. Recently, with the development of artificial intelligence, high-performance object detection techniques using convolutional neural networks (CNNs) have been proposed. Such techniques are largely divided into two groups: single-stage detection techniques and region-based two-step detection techniques. Representative examples of single-stage techniques are the single shot detector (SSD) [25] and you only look once (YOLO) [26] detectors. The single-stage technique first divides the feature map obtained from the CNN feature extractor into grid cells of a predetermined interval. Then, using the features in each grid cell, the bounding box indicating the position of the object and the type of the object are predicted simultaneously.
The single-stage technique is characterized by a fast inference speed. However, it has the disadvantage that small objects, such as warhead fragments, are difficult to detect. A representative example of a two-step region-based technique is the Faster R-CNN [27] detector. The Faster R-CNN consists of a region proposal network (RPN) and a Fast R-CNN [28] detector. Both networks share the same CNN feature extractor. First, the RPN searches for bounding boxes indicating the position of an object at each pixel position of the extracted feature map and keeps only the bounding boxes that correspond to an object. Next, the Fast R-CNN determines the type of the object using the features included in the detected bounding box.
This region-based technique is effective for reflecting the approximate size of an object in the detector during the learning process. Owing to this characteristic, it shows higher performance in detecting small objects, such as warhead fragments, than the single-stage technique, which uses the feature map in grid units. Therefore, this study detects warhead fragments in the AFT using the Faster R-CNN, which is suitable for small object detection. Figure 6 shows example input warhead fragment images. The green and blue bounding boxes represent the FPFs (ground truth) and the flames that occur at those locations after fragment penetration, respectively. The red bounding box indicates the FPF detected by our proposed two-step approach, which consists of (1) CNN-based detection and (2) temporal filtering. The ground truth bounding boxes, which were made by experts, can overlap one another. The flames can overlap as well, and there are also overlaps between the ground truth FPFs and flames. That is, a green box can overlap a blue box, and a green (blue) box can overlap another green (blue) box. From a physical point of view, a red box (whose ground truth is a green box) that overlaps a mask (which corresponds to a blue box) cannot be detected as an FPF, even when the overlap between the red box and the mask is slight. Note that the appearance and shape of FPFs are inconsistent. Some FPFs are very bright (see a and c in Figure 6b), whereas others are darker (see b in Figure 6b). The appearance and shape of flames are also inconsistent: there are brighter and darker flames. For this reason, the FPFs and flames can have very similar appearances and shapes (see d, e, and f in Figure 6b).
Note also that the overlap between green and red boxes indicates the correct detection (see a , b , and c in Figure 6b). There could be false negatives (see f and g in Figure 6b). There could also be false positives (see h in Figure 6b). The h in Figure 6b indicates the noise. The false negatives indicate real FPFs that are incorrectly determined as non-FPFs by the algorithm. The false positives indicate non-FPFs (i.e., background, noise, or flames) that are incorrectly determined as FPFs by the algorithm. Our future work is to reduce the false negatives and positives.
Figure 6. (a) tth frame (previous frame); (b) (t + 1)th frame (current frame). Panels (a) and (b) show the FPFs detected by our proposed method and the ground truths in the tth and (t + 1)th frames, respectively. In (b), the two blue boxes in the middle have shapes and appearances similar to those of FPFs, but since the corresponding FPFs were detected in the tth frame, they are judged as flames in the (t + 1)th frame by the proposed algorithm. Note that the green and red bounding boxes can be overlaid; thus, a red box can be occluded by a green box and vice versa.
As shown in Figure 6, the warhead fragment image includes not only the first penetrating fragments but also the flames. As the warhead fragment detection is performed for every frame, a flame similar to the warhead fragment may be detected as a warhead fragment. To solve this problem, the proposed image-based fragment detection algorithm detects warhead fragments through Faster R-CNN at every frame and then extracts only the first penetrating warhead fragments through temporal filtering [22]. Figure 7 shows a flow diagram of the proposed approach. As shown in Figure 7, for every input frame, the CNN-based detection (i.e., the first phase) is applied to extract the FPF candidates, and then the temporal filtering (i.e., the second phase) is performed. In the phase of temporal filtering, the mask information is accumulated and updated. The CNN-based detection and temporal filtering modules are shared across an input AFT video. Section 2.2 describes the warhead fragment detection method using Faster R-CNN, and Section 2.3 describes the temporal filtering method to detect only the first penetrating warhead fragments.

Faster R-CNN Based Fragment Detection Algorithm
A Faster R-CNN network can learn to detect specific objects through supervised learning. Such supervised learning requires labeled images in which the position (bounding box) of each object is marked. In addition, due to the nature of deep learning, a large amount of training data is required. However, no public warhead fragment image dataset exists, and such images are difficult to acquire compared to general images. In this study, a warhead fragment image dataset was created using a high-speed camera, and a suitable fine-tuning technique was used to learn effectively from this dataset.
The warhead fragment image dataset is described in detail in Section 3.1; this section describes the Faster R-CNN network suitable for warhead fragment detection. In this paper, the Faster R-CNN object detector was trained to detect the FPF. There are two classes: (1) FPF and (2)  The proposed warhead fragment detector uses a pre-trained Faster R-CNN object detector, which includes an Inception-v2 [29] model trained with the MS-COCO [30] dataset. The warhead fragment image dataset was then used to fine-tune the feature extractor and classifier of the pre-trained Faster R-CNN. As described above, the RPN used in the Faster R-CNN searches, at each pixel location of the extracted feature map, for a bounding box representing the location of an object. The bounding box candidates can therefore be defined in advance in a form suitable for warhead fragments through the predefined anchor box sizes and aspect ratios.
The original paper combines three scales (0.5, 1, and 2) and three aspect ratios (2:1, 1:1, and 1:2) based on an anchor box with a 256² pixel area, as shown in Table 1, giving a total of nine predefined anchor boxes. However, since warhead fragments are small objects with a minimum area of 3 pixels in a 640 × 480 warhead fragment image, anchor boxes based on a 256² pixel area are not suitable for warhead fragment detection. Therefore, in order to accurately detect warhead fragments, this study defines the RPN anchor boxes with five scales (0.125×, 0.25×, 0.5×, 1×, and 2×) based on an anchor box with a 64² pixel area.
By combining these scales with three aspect ratios (4:3, 1:1, and 3:4), a total of 15 anchor boxes were newly defined to be suitable for detecting warhead fragments and were used for learning, as shown in Figure 8. In the original Faster R-CNN, an anchor box is labeled as containing an object if its intersection-over-union (IoU) with a ground truth box is more than 0.7 and as containing no object if the IoU is less than 0.3. In contrast, in this paper, an anchor box is labeled as an FPF if the IoU is more than 0.1, because the size of FPFs is very small. Our contributions in the task of Faster R-CNN based fragment detection are summarized as follows:
• To the best of our knowledge, our work is the first to demonstrate the applicability of deep learning based object detection approaches for analyzing AFT images acquired by a high-speed camera;
• We have verified that a two-step approach (such as Faster R-CNN) is more suitable for detecting warhead fragments (very small objects) than a single-stage approach (such as SSD);
• We have empirically found the hyper-parameters of anchor boxes that are optimal for detecting warhead fragments.
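The anchor configuration and the relaxed IoU threshold described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a scale multiplies the side length of the 64 × 64 base anchor (consistent with the 128², 256², and 512² anchors of the original Faster R-CNN), that each aspect ratio is width:height and preserves the anchor area, and that boxes are given as (x, y, w, h) with an upper-left origin. The function names are hypothetical.

```python
import math

BASE_SIDE = 64                             # base anchor side, i.e., a 64^2 pixel area
SCALES = (0.125, 0.25, 0.5, 1.0, 2.0)      # five scales from the text
ASPECT_RATIOS = ((4, 3), (1, 1), (3, 4))   # assumed to mean width:height
FPF_IOU_THRESHOLD = 0.1                    # relaxed threshold used for small FPFs


def make_anchor_shapes(base_side=BASE_SIDE, scales=SCALES, ratios=ASPECT_RATIOS):
    """Return one (width, height) pair per scale/aspect-ratio combination.

    Assumption: the scale is applied to the side length and the aspect
    ratio keeps the area of the scaled square unchanged.
    """
    shapes = []
    for scale in scales:
        side = base_side * scale
        for ratio_w, ratio_h in ratios:
            k = math.sqrt(ratio_w / ratio_h)
            shapes.append((side * k, side / k))
    return shapes


def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def is_fpf_anchor(anchor_box, gt_boxes, threshold=FPF_IOU_THRESHOLD):
    """Label an anchor as FPF if it overlaps any ground truth box enough."""
    return any(iou(anchor_box, gt) > threshold for gt in gt_boxes)


if __name__ == "__main__":
    for width, height in make_anchor_shapes():
        print(f"{width:7.2f} x {height:7.2f}")
```

Under these assumptions, the smallest anchors cover an area of 8² = 64 pixels, which is much closer to the fragment sizes reported above than the smallest 128² anchor of the original configuration.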

Temporal Filtering
In this subsection, we briefly review the temporal filtering scheme [22]. The deep learning-based warhead fragment detector described in Section 2.2 detected all fragments in each frame of the test image. This includes fragment-like flames in addition to the first penetrating fragments, as mentioned in Section 2.1. To solve this problem, we used an efficient temporal filtering technique that detected only the fragments penetrating the steel plate first.
As a warhead fragment penetrates the steel plate, a flame is created at that location. These flames are characterized by expanding with time and dispersing finely after a certain amount of time. Unfortunately, these flames are similar to warhead fragments, so they are also detected at every frame after the first penetration. Therefore, our proposed method employs temporal filtering based on two properties of the fragments and their following flames: (1) the size diversity of the first penetrating fragments, and (2) the expandability of the warhead fragment size (flame size). Based on these features, the size of each initial penetrating warhead fragment was normalized, and the size change (flame) over time was statistically modeled. The proposed method uses the Levenberg-Marquardt non-linear least squares algorithm to perform the non-linear logarithmic fitting of Equation (1) on the size of the warhead fragments. In Equation (1), $r_t$ is the result of modeling the relative size of the warhead fragment detected in the tth frame when the size of the first penetrating warhead fragment is normalized to 1, as shown in Figure 9, and $a$ is a parameter required for the non-linear log fitting of the collected fragment data. The bounding box in the tth frame after the initial penetration of each warhead fragment is expressed by Equation (2) using the size of the ith initial penetrating warhead fragment (i indicates the index). In this case, $x$ and $y$ are the upper left coordinates of the bounding box, and $w$ and $h$ are its horizontal and vertical lengths, respectively. Equation (2) can be used to estimate the size of a warhead fragment at the tth frame after the initial penetration. The error between the size predicted by the non-linear log fitting and the size of the actual warhead fragments is less than about 20% on average. Using this estimate, the mask for the first penetrating fragment is defined by Equation (3), where ∪ represents the union operator.
Rect(·) is a function that creates a rectangular mask from the input $x$, $y$, $w$, $h$. Equation (3) is used to obtain $\mathrm{mask}_i^t$ in the tth frame. Therefore, the proposed method detects fragments using the deep learning-based warhead fragment detector at every frame and then uses the filter generated by Equation (3) to eliminate the duplicated detection results after the first penetration (see Figure 10).
However, the aforementioned temporal filtering method does not use tracking information for the detected warhead fragments; it uses only the detection results at every frame and the change of the warhead fragment size estimated by the non-linear log fitting. The disadvantage is that it is difficult to create appropriate masks from this information alone. Therefore, the mask obtained earlier is updated through post-processing, as shown in Figure 11. For example, assume that new warhead fragments are detected in the tth frame and that a previously formed $\mathrm{mask}_i^t$ is given. If a detected warhead fragment intersects $\mathrm{mask}_i^t$ in the tth frame, the newly detected fragment is considered an incorrect result, and $\mathrm{mask}_i^t$ is updated by adding the area of that warhead fragment to it. This post-processing method can keep only the first penetrating warhead fragments among the warhead fragments detected in the tth frame without tracking information for each warhead fragment.
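The filtering and mask update logic described in this subsection can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: boxes are (x, y, w, h) tuples with an upper-left origin, each mask is stored as a union of rectangles, and masks are grown about the center of the first detection. The growth curve `example_growth` and its parameter value are hypothetical placeholders for the fitted model $r_t$, and the class and function names are invented for this example.

```python
import math


def rects_intersect(a, b):
    """True if two (x, y, w, h) rectangles overlap, even slightly."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def grow_rect(rect, ratio):
    """Scale a rectangle about its center by the relative size `ratio`."""
    x, y, w, h = rect
    new_w, new_h = w * ratio, h * ratio
    return (x - (new_w - w) / 2, y - (new_h - h) / 2, new_w, new_h)


def example_growth(age, a=0.5):
    """Hypothetical stand-in for the fitted curve r_t (equals 1 at age 0)."""
    return 1.0 + a * math.log1p(age)


class TemporalFilter:
    """Keeps only detections that do not touch the mask of an earlier FPF."""

    def __init__(self, growth=example_growth):
        self.growth = growth
        self.tracks = []  # one entry per accepted FPF

    def mask_rects(self, track, t):
        """Union of rectangles forming the mask of one FPF at frame t."""
        grown = grow_rect(track["first"], self.growth(t - track["t0"]))
        return [grown] + track["extra"]

    def step(self, t, detections):
        """Return the detections of frame t that are accepted as FPFs."""
        accepted, new_tracks = [], []
        for det in detections:
            hit = None
            for track in self.tracks:
                if any(rects_intersect(det, r) for r in self.mask_rects(track, t)):
                    hit = track
                    break
            if hit is None:
                accepted.append(det)
                new_tracks.append({"first": det, "t0": t, "extra": []})
            else:
                # Post-processing: add the rejected detection to the mask.
                hit["extra"].append(det)
        # Masks of FPFs found in this frame apply from the next frame on.
        self.tracks.extend(new_tracks)
        return accepted


if __name__ == "__main__":
    f = TemporalFilter()
    print(f.step(0, [(100, 100, 6, 6)]))                   # fragment 1 accepted
    print(f.step(1, [(99, 99, 8, 8), (300, 200, 5, 5)]))   # only fragment 2 accepted
```

The usage at the bottom mirrors the situation of Figure 5: the flame of fragment 1 is rejected in the next frame because it intersects the existing mask, while fragment 2, which touches no mask, is accepted.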

Experimental Datasets
In order to acquire the warhead fragment images, the AFT environment was constructed as shown in Figure 1. As shown in Figure 2, a high-speed camera was installed behind the steel plate to capture the warhead explosion process. Four different AFT warhead conditions were constructed, yielding a total of 11 warhead explosion videos (Case 1: 1, Case 2: 3, Case 3: 3, and Case 4: 4, as shown in Table 2). The videos contain 5699 warhead explosion images and 3159 ground truth FPF boxes.
The 11 AFT videos were organized into three folds for cross-validation in learning and evaluation. Case 1 contains only one video, with no similar video available, so it was used only as evaluation data. For Cases 2, 3, and 4, one video of each case was used for evaluation in each fold, and the rest of the videos were used for learning. In particular, as Case 4 contains one more video than Cases 2 and 3, video #1 of Case 4 was always used for learning. In this way, the warhead fragment data was used at a ratio of about 7:3 for learning and evaluation. Figure 12 shows the data organization for the three-fold cross-validation. Figure 13 shows example warhead explosion images for the different warhead conditions. In addition, experts of the warhead test directly annotated the actual warhead fragments in the collected warhead test images.
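The fold organization described above can be written down as a small sketch. The video numbering within each case and the order in which videos are held out are assumptions made for illustration (the actual assignment is given in Figure 12), and `build_folds` is a hypothetical helper name.

```python
# Number of videos per case, as reported in the text; the numbering
# of videos inside each case is an assumption for this example.
VIDEOS = {1: [1], 2: [1, 2, 3], 3: [1, 2, 3], 4: [1, 2, 3, 4]}


def build_folds(videos=VIDEOS, n_folds=3):
    """Return a list of {'train': [...], 'eval': [...]} splits of (case, video)."""
    folds = []
    for k in range(n_folds):
        train, evaluate = [], []
        for case, ids in videos.items():
            if len(ids) == 1:
                # A case with a single video is used only for evaluation.
                evaluate.extend((case, v) for v in ids)
                continue
            # Only the last n_folds videos rotate through evaluation, so an
            # extra video (video #1 of Case 4) always stays in training.
            held_out = ids[-n_folds:][k]
            for v in ids:
                (evaluate if v == held_out else train).append((case, v))
        folds.append({"train": train, "eval": evaluate})
    return folds


if __name__ == "__main__":
    for index, fold in enumerate(build_folds(), start=1):
        print(f"fold {index}: train={fold['train']} eval={fold['eval']}")
```

With this layout, every fold trains on seven videos and evaluates on four, and each video of Cases 2 and 3 is evaluated exactly once across the three folds.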

Performance Evaluation
In order to verify the performance of the proposed deep learning-based warhead fragment detection algorithm, we used precision and recall, which are the evaluation indices primarily used to evaluate object detection algorithms. Precision and recall are defined using the numbers of true positives (TP), false positives (FP), and false negatives (FN). In this study, TP indicates that real warhead fragments were detected as warhead fragments, FP indicates that detected warhead fragments were not actual warhead fragments, and FN indicates that actual warhead fragments were not detected as warhead fragments. Using these, the precision and recall are calculated as follows:
\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \]
In addition, an evaluation using the F-measure, which considers both precision and recall, was conducted. The subscript of F represents the parameter β of the F-measure.
The F-measure was obtained as follows:
\[ F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \]
When conventional AFTs detect warhead fragments using physical sensors, the maximum number of fragments detected in a test is limited by the number of sensors installed on the steel plate. That is, even if two or more fragments hit one sensor, the sensor can detect only the fragment that penetrated first, and it cannot determine the exact location of the warhead fragment. In terms of precision and recall, the use of a physical sensor provides high precision but low recall. In addition, the existing system has the disadvantage that the measurement data is contaminated if a fragment hits the cable connecting the sensor to the instrument before the sensor is hit. The proposed warhead fragment detector was evaluated using three-fold cross-validation, as shown in Table 3.
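As an illustration of the evaluation indices used in this section, the following minimal sketch computes precision, recall, and the F-measure with the standard definitions. The function names and the counts in the usage example are hypothetical and are not taken from the experimental results.

```python
def precision(tp, fp):
    """Fraction of detections that are actual fragments."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual fragments that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r, beta=1.0):
    """F-measure with parameter beta; beta < 1 weights precision more."""
    beta_sq = beta * beta
    denominator = beta_sq * p + r
    return (1.0 + beta_sq) * p * r / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to demonstrate the calculation.
    tp, fp, fn = 8, 2, 8
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.3f} recall={r:.3f}")
    print(f"F1={f_measure(p, r, 1.0):.3f} F0.5={f_measure(p, r, 0.5):.3f}")
```

Because β = 0.5 weights precision more heavily than recall, the F0.5-measure rewards a detector that reports few false fragments, which matches the emphasis on precision in the AFT task.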
The performance of the proposed warhead fragment detector for the 11 AFT videos is as follows. The detector showed approximately 93% precision for Cases 1, 2, and 3; however, the recall was approximately 56%. Case 4, on the other hand, demonstrated approximately 77% precision and about 86% recall. In Cases 1, 2, and 3, many types of warhead fragments appeared; therefore, the proposed algorithm could not detect approximately 44% of the warhead fragments. In Case 4, as most of the warhead fragments had a typical appearance, the proposed algorithm could detect most of them. However, as shown in Figure 14d, the light between the steel plates and the holes in the steel plate itself had characteristics similar to those of the warhead fragments, so the precision dropped to approximately 77%. Overall, the detector showed a precision of 88% and a recall of 67%.
In other words, 88% of the warhead fragments detected by the proposed warhead fragment detector were actual warhead fragments, and 67% of the actual warhead fragments were detected. The performance is comparable to that of physical sensors in terms of precision and much higher than that of physical sensors in terms of recall. In particular, the proposed warhead fragment detector has the advantage of finding the exact location of each warhead fragment. In Table 4, we report the performance of the previous image processing-based fragment detection method. As shown in Tables 3 and 4, the proposed deep learning-based method outperformed the previous image processing-based method in terms of the precision, $F_1$-measure, and $F_{0.5}$-measure. However, the recall of our method was slightly lower than that of the image processing-based method. In the task of AFT, the precision, $F_1$-measure, and $F_{0.5}$-measure are more important than the recall [22].
In Table 5, we compare the performance of (1) Faster R-CNN only and (2) Faster R-CNN with temporal filtering in terms of the average precision and recall. From the table, we can see that the Faster R-CNN found the FPFs and that the temporal filtering effectively filtered out the non-FPFs (e.g., the flames). This is evident from the high recall of (1) and the high precision and recall of (2). More specifically, the Faster R-CNN ensured a higher recall. Then, while keeping the recall as high as possible, the temporal filtering raised the precision. Figure 14 shows the detection results for the sample images shown in Figure 13. The green box shows the actual first penetrating warhead fragment, the blue box shows the non-first penetrating warhead fragment, and the red box shows the first penetrating warhead fragment detected by the proposed detector.

Conclusions
In this paper, we proposed an image-based warhead fragment detection system to overcome the limitations of the sensor-based measurement system used in the existing AFT. We redesigned Faster R-CNN, a deep learning-based object detection algorithm, for warhead fragment detection, and employed a temporal filtering method to detect only the first penetrating fragments. Based on the experimental results, we discussed the limitations of the physical sensor and found that the proposed warhead fragment detection algorithm is an effective measurement method across various AFT videos.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.
