Article

Novel Learning Framework with Generative AI X-Ray Images for Deep Neural Network-Based X-Ray Security Inspection of Prohibited Items Detection with You Only Look Once

School of Electronic Engineering, Gyeongsang National University, Jinju-si 52828, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(7), 1351; https://doi.org/10.3390/electronics14071351
Submission received: 14 February 2025 / Revised: 21 March 2025 / Accepted: 25 March 2025 / Published: 28 March 2025
(This article belongs to the Special Issue Generative AI and Its Transformative Potential)

Abstract

With the rapid expansion of future mobility systems and the growing demand for fast and accurate X-ray security inspections, deep neural network (DNN)-based systems have gained significant attention for detecting prohibited items through the construction of high-quality datasets and enhanced detection performance. While Generative AI has been widely explored across various fields, its application in DNN-based X-ray security inspection remains largely underexplored. The accessibility of commercial Generative AI raises safety concerns about the creation of new prohibited items, highlighting the need to integrate synthetic X-ray images into DNN training to improve detection performance, adapt to emerging threats, and investigate their impact on object detection. To address this, we propose a novel machine learning framework that enhances DNN-based X-ray security inspection by integrating real-world X-ray images with Generative AI images produced by a commercial text-to-image model, improving dataset diversity and detection accuracy. Our proposed framework provides an effective solution to mitigate potential security threats posed by Generative AI, significantly improving the reliability of DNN-based X-ray security inspection systems, as verified through comprehensive evaluations.

1. Introduction

Security inspections are essential for ensuring passenger safety in public transportation, including airplanes, ships, and future mobility systems [1,2,3]. Currently, X-ray detection systems rely on manual inspection by security personnel to identify prohibited items. However, this task can cause significant fatigue and stress, potentially affecting the accuracy of prohibited item detection. Additionally, personnel-based systems are incompatible with future autonomous mobility systems. Consequently, accurate and fast deep neural network (DNN)-based automatic systems for detecting prohibited items in X-ray security inspection have received great attention in recent years.
Specifically, object detection techniques within computer vision that can individually identify prohibited items on images have been widely studied for X-ray security inspection [4,5,6,7,8]. However, X-ray images, produced using high-energy radiation, often appear as overlapping objects due to their transmission properties, making detection in security inspections more challenging. These overlap issues can occur between objects and the background or among multiple objects [4]. In addition, it is known that the manual collection and annotation of X-ray images is labor-intensive and costly [9].
To resolve these challenges, the construction of high-quality datasets and improved DNN models has been widely studied to enhance detection performance in X-ray security inspection. Benchmark datasets have been extensively developed for X-ray security inspection in real-world scenarios, such as GDXray [10], SIXray [11], OPIXray [4], HIXray [5], CLCXray [6], PIDray [7], LSIray [12], DvXray [13], and LDXray [14]. These datasets have been used to improve the detection performance of X-ray prohibited items by addressing issues such as a lack of data, positive data imbalances, overlap between objects, and image acquisition from multiple directions. In addition, adding or improving modules in DNN models has been studied to enhance detection performance on these benchmark datasets [4,5,6,8,11,12,13,14,15,16,17]. Furthermore, frameworks for data augmentation of X-ray security inspection images with generative adversarial networks (GANs) have been proposed to improve detection performance [9,18].
On the other hand, with OpenAI’s launch of ChatGPT in 2022, Generative AI (Gen AI) has received great worldwide attention in various fields. It autonomously generates new data such as text, images, and video, thereby automating tasks traditionally performed by humans [19,20]. With the availability of various commercial Generative AIs (Gen AIs) such as ChatGPT, Bard, and DALL·E, an environment has been created where anyone can easily generate text or high-quality images simply by providing a prompt [19,20,21,22,23,24]. In this regard, the utility and impact of Gen AIs, including concerns about their potential for exploitation, have been widely studied in various applications [25,26]. Specifically, images created by Gen AI have become so sophisticated that distinguishing them from real images is increasingly challenging [27]. Accordingly, potential risks associated with Gen AI technologies, such as deepfakes, have become more prominent [26]. However, to the best of our knowledge, the utility and impact of Gen AIs in machine learning-based X-ray security inspection systems have not been studied, despite their tremendous utility and attendant concerns.
In particular, the widespread availability of commercial Gen AIs, which can generate highly realistic images with just a few lines of input, raises concerns about the potential creation of new prohibited items, posing significant safety threats. Since the performance of machine learning-based X-ray security inspection systems relies heavily on the training dataset [28], there is a concern that systems trained solely on traditional benchmark datasets may fail to detect newly created prohibited items generated by commercial Gen AIs. This limitation could lead to serious security risks in public transportation and future mobility systems. Hence, to accurately detect prohibited items, including newly emerging ones, synthetic X-ray images generated by commercial Gen AIs should be incorporated into the training of DNNs for X-ray security inspection systems, and their impact on the object detection performance for prohibited items should be investigated.
In this paper, we propose a novel machine learning framework that integrates real-world X-ray images with Gen AI X-ray images to enhance dataset diversity and improve the detection of prohibited items in DNN-based X-ray security inspection systems. To support our framework, we establish a newly developed Gen AI X-ray image dataset using a commercial text-to-image model, DALL·E 3, providing a valuable resource for training and evaluating DNN models. To improve model robustness and generalization, we leverage copy-paste augmentation, which enhances object overlap, occlusions, and cluttered environments. We systematically evaluate the impact of Gen AI images on detection accuracy by training the model with varying numbers of real-world and synthetic Gen AI X-ray images, assessing its ability to generalize and identify potential limitations. Our evaluations demonstrate that YOLOv8, when trained solely on real-world X-ray images, struggles to detect prohibited items in Gen AI X-ray images, highlighting the necessity of synthetic data for improved generalization. In contrast, the proposed framework, which incorporates copy-paste-augmented Gen AI X-ray images, significantly enhances detection performance without compromising real-world accuracy in terms of Precision, Recall, and mAP50-95.

2. Proposed Approach

2.1. Framework on Learning with Generative AI X-Ray Images

Our proposed framework is illustrated in Figure 1. We devised a novel approach that enables the automatic security inspection system to learn to detect well-known prohibited items from real-world X-ray image datasets, as well as prohibited items from synthetic X-ray images generated by commercial Generative AI (Gen AI). This approach addresses the limitations of traditional X-ray datasets by enhancing dataset diversity. Text-based prompts reflecting real-world environments and use cases are used to generate the synthetic X-ray images.
In addition, to develop a more robust and accurate system while preventing the commercial Gen AI dataset from consisting of simplistic scenarios that lack object overlap, we leverage the copy-paste augmentation technique [29], in which individual objects extracted from Gen AI X-ray images are superimposed onto other X-ray images. This augmentation technique exposes the DNN model to a wider range of object placements, occlusions, and cluttered environments, enhancing its ability to detect concealed or partially visible prohibited items. By integrating copy-paste-augmented Gen AI X-ray images into training, our framework improves the model’s generalization, enhancing detection performance on Gen AI X-ray images while maintaining accuracy in real-world scenarios. The augmented dataset enables the model to learn a broader spectrum of object interactions, occlusions, and spatial complexities, ultimately improving detection accuracy across both domains. The process of generating copy-paste-augmented Gen AI X-ray images is described in the next subsection.
The DNN model is trained using an enriched hybrid dataset that integrates real-world X-ray images and Gen AI X-ray images, enabling the model to generalize across both domains [30]. In this paper, we employed YOLOv8 as the DNN model [31]. YOLOv8 incorporates an efficient architecture that utilizes the self-attention mechanism and deformable convolution, reducing memory requirements while simultaneously providing high detection accuracy and fast processing speed [31,32]. Additionally, YOLOv8 supports built-in data augmentation functions such as scaling, flipping, and cropping, enhancing user convenience and contributing to its widespread adoption across various industries [12,33,34,35]. Although YOLOv10, a more recent version of the YOLO model, has been released with advancements and optimizations [36], we chose YOLOv8 due to its deployment-friendly nature and suitability for industrial applications requiring fast detection.
To evaluate its effectiveness, we conducted a performance tradeoff analysis focusing on the following aspects:
  • The impact of Gen AI images on detection accuracy.
  • The model’s ability to generalize across real-world and synthetic Gen AI X-ray images.
To systematically analyze these tradeoffs, we trained the DNN model with varying numbers of real-world and Gen AI X-ray images in the training dataset. The trained models were then evaluated separately using distinct test datasets. This approach enabled us to determine whether synthetic data enhances model accuracy or introduces limitations, ultimately refining the effectiveness of Gen AI-enhanced datasets.

2.2. Real-World X-Ray Image Dataset for Prohibited Items

In this paper, we utilized “X-ray multi-object detection data” as the real-world X-ray images of prohibited items, provided by AI Hub [37]. In this subsection, we briefly introduce this dataset. AI Hub, supported by the MSIT (Ministry of Science and ICT) and NIA (National Information Society Agency) of South Korea, is a platform that provides the infrastructure needed for the development of AI technologies and services, along with various applicable datasets. The “X-ray multi-object detection data” broadly categorizes the items in X-ray images into three categories: (1) “Prohibited Items”, (2) “Information Storage Devices”, and (3) “General Items”; the dataset consists of a total of 541,260 images across 317 classes. Details on the selected subset that we utilized in the training and evaluation for prohibited items are described in [37].
Figure 2 shows examples of the real-world X-ray images that we utilized. The original dataset of prohibited items was classified into six categories (Gun, Knife, Wrench, Pliers, Scissors, and Hammer) based on the SIXray dataset [11] for practical X-ray security inspection. Various subcategories were merged, and unrelated or mislabeled data were excluded. The final dataset consisted of 51,210 real-world X-ray images, which were divided into Train, Validation, and Test sets, as sketched below.
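For illustration, the split of the curated image list into these subsets (35,850/7680/7680, roughly 70/15/15; see Table 3) can be sketched in Python as follows. The file name and list handling here are hypothetical assumptions for the sketch, not the exact procedure used in this study.

import random

# Hypothetical text file listing the 51,210 curated real-world image paths.
with open("realworld_images.txt") as f:
    files = [line.strip() for line in f]

random.seed(0)            # fixed seed for a reproducible split
random.shuffle(files)

n_train, n_val = 35850, 7680
train = files[:n_train]                      # 35,850 training images
val   = files[n_train:n_train + n_val]       # 7,680 validation images
test  = files[n_train + n_val:]              # remaining 7,680 test images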

2.3. Copy-Paste-Augmented Generative AI X-Ray Image Dataset for Prohibited Items

In order to leverage the Gen AI X-ray image dataset for prohibited items, we produced X-ray images of the prohibited items created by commercial Gen AI. Among the several text-to-image Gen AI models, we adopted DALL·E 3 [38], developed by OpenAI, owing to its accessibility and availability in ChatGPT-4 [20] as well as Microsoft Copilot [39]. To create the images, we input the following text prompt into the uncustomized original DALL·E 3 through Microsoft Copilot (we leveraged Microsoft Copilot for its utility, as it generates several images corresponding to a prompt at once), without any modifications or additional fine-tuning:
“X-ray image of a box containing a Prohibited Item. This image is in the style typically seen by airport security personnel.”
In the above prompt, a Prohibited Item denotes one of Gun, Knife, Wrench, Pliers, Scissors, or Hammer so that the generated images share the same classes as the real-world X-ray images. Then, since the generated images lack information on object location, quantity, and type, we manually labeled each generated image using the “LabelImg” program, which supports labeling in the YOLO and Pascal VOC formats [40], as illustrated below.
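LabelImg stores YOLO-format annotations as one text line per object, so the resulting labels can be consumed directly during training. A minimal sketch of the format follows; the class index mapping (e.g., Gun = 0) and the box values are assumptions for illustration only.

# YOLO format: <class_id> <x_center> <y_center> <width> <height>,
# with all coordinates normalized to [0, 1] relative to the image size.
label_line = "0 0.50 0.50 0.25 0.25"   # hypothetical Gun centered in the image

cls, xc, yc, w, h = label_line.split()
print(int(cls), float(xc), float(yc), float(w), float(h))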
Figure 3 shows examples of Gen AI X-ray images for prohibited items created with DALL·E 3. Most images contain one prohibited item, but some contain several. In addition, because generation was overly creative for some images, we excluded generated images in which objects could not be clearly identified. To match the default input image size of YOLOv8, each Gen AI image was resized from its initial size of 1024 × 1024 to 640 × 640.
Next, we produced augmented images from the created Gen AI X-ray images using a separate copy-paste Python (version 3.12.2) script [41] rather than YOLO’s built-in augmentation features. Because this study focuses on images generated by commercial Gen AI, which often feature simple compositions and lack object overlap, we applied the copy-paste augmentation technique while preserving other image properties such as color and size. This approach enriches individual images with additional object information and introduces object occlusion, thereby creating more realistic training data. A simplified sketch of this augmentation is given below.
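The following is a simplified, mask-based sketch of the copy-paste operation, assuming OpenCV and NumPy. It is illustrative only, does not reproduce the exact script of [41], and all file names are hypothetical.

import cv2
import numpy as np

def copy_paste(src_img, src_mask, dst_img, x, y):
    # Crop the object's bounding box from the source image and its mask.
    ys, xs = np.where(src_mask > 0)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    obj = src_img[y0:y1, x0:x1]
    m = src_mask[y0:y1, x0:x1] > 0

    # Overwrite only the object pixels at the paste location
    # (assumes the object fits fully inside dst_img at (x, y)).
    out = dst_img.copy()
    h, w = obj.shape[:2]
    roi = out[y:y + h, x:x + w]
    roi[m] = obj[m]
    return out

src  = cv2.imread("genai_knife.png")                              # Gen AI source image
mask = cv2.imread("genai_knife_mask.png", cv2.IMREAD_GRAYSCALE)   # its instance mask
dst  = cv2.imread("genai_background.png")                         # Gen AI target image
cv2.imwrite("augmented.png", copy_paste(src, mask, dst, x=120, y=160))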
Figure 4 illustrates examples of the copy-paste-augmented Gen AI X-ray images for prohibited items that we created. For copy-paste augmentation with instance segmentation labels, we used a commonly adopted labeling tool that supports labeling tasks for various image segmentation and augmentation techniques. In this study, augmentation was performed primarily with the copy-paste technique; exploring additional performance improvements by integrating it with other augmentation techniques remains an ongoing topic of research.

3. Results and Discussion

In this section, our goal is to evaluate and analyze the performance of our proposed framework by training YOLOv8 while varying the number of real-world X-ray training images and copy-paste-augmented Gen AI X-ray images.

3.1. Performance Metrics and Setup

To evaluate our proposed framework, we considered the following performance metrics, which are widely adopted in object detection, based on the confusion matrix of Table 1 [42] (a minimal computation sketch follows the list):
  • Precision: Proportion of predicted positive classes that are actually positive, defined as TP/(TP + FP).
  • Recall: Proportion of actual positive instances that the model correctly identifies as positive, defined as TP/(TP + FN).
  • mAP50-95: A metric that averages the mean Average Precision (mAP) calculated at Intersection over Union (IoU) thresholds ranging from 0.5 to 0.95 in increments of 0.05. This metric includes higher IoU criteria than mAP50 and thus requires more accurate bounding boxes.
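A minimal NumPy sketch of these metric computations; the per-threshold mAP values below are hypothetical placeholders, not measured results.

import numpy as np

def precision(tp, fp):
    # Proportion of predicted positives that are truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Proportion of actual positives that the model detects.
    return tp / (tp + fn)

# mAP50-95: mean of mAP over IoU thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = np.arange(0.50, 1.00, 0.05)                     # 10 thresholds
map_per_threshold = np.linspace(0.9, 0.5, iou_thresholds.size)   # hypothetical values
map50_95 = map_per_threshold.mean()

print(precision(80, 10), recall(80, 20), map50_95)   # 0.888..., 0.8, 0.7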
The environment that we adopted for evaluation is described in Table 2. We utilized YOLOv8 as the DNN model thanks to its lightweight, fast, and high-performance characteristics [31]. Pretrained weights from the COCO dataset, which has been widely adopted for benchmarking object detection models [43], were applied to our YOLOv8 model to improve its generalization ability in object detection.
The hyperparameters considered for model training are described as follows. The model was trained for 20 epochs using the AdamW optimizer with an initial learning rate of 0.01, a momentum of 0.937, a weight decay of 0.0005, and a batch size of 16. In addition, the input image size was set to 640 × 640. We considered the default hyperparameter values for YOLOv8 to maintain a consistent experimental environment without the influence of tuning on performance differences. Although it is known that enhancing the YOLOv8 structure and tuning its hyperparameters can improve performance, this study focused on performance analysis based on the inclusion of Gen AI images. Therefore, the default configuration was retained to enable a clearer comparison. Extending the proposed framework to incorporate an improved YOLOv8 by optimizing its structure and hyperparameter configurations remains an ongoing area of research.
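These settings correspond to the following Ultralytics training call. This is a sketch: the nano variant yolov8n.pt is assumed (the model scale is not specified above), and the dataset YAML path describing the hybrid training set is hypothetical.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # COCO-pretrained weights [43]
model.train(
    data="xray_hybrid.yaml",        # hypothetical real-world + Gen AI dataset config
    epochs=20,
    imgsz=640,                      # 640 x 640 input resolution
    batch=16,
    optimizer="AdamW",
    lr0=0.01,                       # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)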
To evaluate our proposed framework through performance tradeoff analysis, we trained six separate YOLOv8 models, each with varying numbers of real-world X-ray images and Gen AI X-ray images. Table 3 shows the total number of images for each set, where non-copy-paste-augmented Gen AI (NC-Gen AI) X-ray images denote Gen AI images without the copy-paste augmentation technique, such as those in Figure 3. For performance comparison with respect to the copy-paste augmentation technique, Models 1–3 were trained using NC-Gen AI X-ray images, while Models 4–6 were trained using copy-paste-augmented Gen AI (C-Gen AI) X-ray images, such as those in Figure 4.
More specifically, Table 4 lists the number of training images for Models 1–3 and Models 4–6, where α ∈ [0, 300] represents the number of Gen AI X-ray images used in the training set. For example, Model 1 was trained with 1000 real-world X-ray images and α NC-Gen AI X-ray images, while Model 4 was trained with 1000 real-world X-ray images and α C-Gen AI X-ray images. During the evaluations, α was increased from 0 to 300 so that each model was trained with an increasing number of Gen AI X-ray images. In this case, the α images were sampled uniformly at random from the full training pool of 300 Gen AI X-ray images. Likewise, for Models 1 and 4 and Models 2 and 5, 1000 and 10,000 images, respectively, were randomly sampled from the real-world training pool of 35,850 images, as sketched below.
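A minimal sketch of this sampling; the list file names and the helper function are hypothetical assumptions for illustration.

import random

# Hypothetical list files for the real-world and C-Gen AI training pools.
real_pool    = open("real_train_list.txt").read().split()      # 35,850 paths
c_genai_pool = open("c_genai_train_list.txt").read().split()   # 300 paths

def build_training_set(real, genai, n_real, alpha, seed=0):
    # Uniformly sample n_real real-world images and alpha Gen AI images.
    rng = random.Random(seed)
    return rng.sample(real, n_real) + rng.sample(genai, alpha)

# e.g., Model 4: 1000 real-world images plus alpha C-Gen AI images.
train_files = build_training_set(real_pool, c_genai_pool, n_real=1000, alpha=120)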
For a performance tradeoff analysis, each trained model was separately evaluated on a real-world X-ray image test set, an NC-Gen AI X-ray image test set, and a C-Gen AI image test set. This approach aimed to assess the model’s existing detection capability while determining its effectiveness in identifying new threat elements in complex environments. Consequently, the impact of the Gen AI dataset on model performance was quantitatively analyzed. Each performance metric was calculated as the average of five experiments conducted with different sampled images for a fixed number of images, i.e., 1000 or 10,000 sampled real-world images and α sampled Gen AI images.
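The per-test-set evaluation and five-run averaging can be sketched with the Ultralytics validation API as follows; the weight paths and dataset YAML names are hypothetical.

from ultralytics import YOLO

test_sets = ["real_test.yaml", "nc_genai_test.yaml", "c_genai_test.yaml"]

runs = []
for seed in range(5):                 # five differently sampled training runs
    model = YOLO(f"runs/model4_seed{seed}/weights/best.pt")
    # metrics.box.map is the mAP50-95 reported by Ultralytics.
    runs.append([model.val(data=y).box.map for y in test_sets])

# Average each test set's mAP50-95 over the five runs.
avg = {name: sum(col) / len(col) for name, col in zip(test_sets, zip(*runs))}
print(avg)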

3.2. Evaluation and Performance Analysis

In this subsection, Figure 5, Figure 6 and Figure 7 illustrate the performance of each model in terms of Precision, Recall, and mAP50-95 with respect to the number of training Gen AI X-ray images, i.e., α, evaluated respectively on (a) a real-world X-ray image test set, (b) an NC-Gen AI X-ray image test set, and (c) a C-Gen AI X-ray image test set. The detailed values associated with each figure are provided in Table 5, Table 6 and Table 7, respectively.
First, Figure 5a–c present the Precision performance, with the corresponding detailed values provided in Table 5. In Figure 5a, Precision remained consistently high for all models, regardless of α. In particular, whether or not copy-paste augmentation was used did not affect Precision on real-world images. As expected, Models 3 and 6 exhibited the highest performance, while Models 1 and 4 showed the lowest. Figure 5b shows that Precision increased considerably with α for all models. The models trained without Gen AI X-ray images (i.e., α = 0) yielded Precision values below 0.4, while the models trained with 120 images achieved Precision values above 0.8. Models 4–6 slightly outperformed Models 1–3 and achieved near-perfect Precision with 300 C-Gen AI images. Figure 5c shows the Precision performance evaluated on the C-Gen AI test set. Similarly, increasing α led to improved performance. Moreover, Models 4–6 considerably outperformed Models 1–3 at all Gen AI image levels and achieved a Precision of about 0.9 with 300 C-Gen AI training images.
Next, Figure 6a–c show the Recall performance, with the corresponding detailed values provided in Table 6. In Figure 6a, Recall remained consistently high for all models regardless of α, and the use of copy-paste augmentation had little to no effect on Recall for real-world images, with Models 3 and 6 exhibiting the highest performance. Figure 6b,c show that Recall increased considerably with α on both the NC-Gen AI and C-Gen AI test sets. In particular, the models trained without Gen AI X-ray images (i.e., α = 0) yielded Recall values below 0.2. Moreover, Models 4–6 outperformed Models 1–3 on both test sets beyond 120 images, with a larger performance gap observed on the C-Gen AI test set. When trained with 300 C-Gen AI images, Models 4–6 achieved near-perfect Recall on the NC-Gen AI test set and Recall values of approximately 0.85 on the C-Gen AI test set. In addition, Model 4 demonstrated the highest performance on the C-Gen AI test set.
Figure 7a–c show the mAP50-95 performance, with the corresponding detailed values provided in Table 7. Figure 7a demonstrates trends similar to the Precision and Recall results with respect to α. Figure 7b,c show that mAP50-95 increased significantly with α on both the NC-Gen AI and C-Gen AI test sets. Models 4–6 outperformed Models 1–3 on both test sets, with a larger performance gap observed on the C-Gen AI test set. Specifically, Models 4 and 6 achieved mAP50-95 values above 0.9 on the NC-Gen AI test set with 300 C-Gen AI training images. On the other hand, Model 4 demonstrated the highest mAP50-95 performance on the C-Gen AI test set, reaching nearly 0.8 with 300 C-Gen AI training images, while Model 6 reached nearly 0.7.

3.3. Discussion

In this subsection, we analyze and discuss the performance of each model individually based on the experimental results. First, Model 1 consistently exhibited the lowest performance on the real-world X-ray image test set, regardless of the increase in Gen AI X-ray images. However, on the NC-Gen AI X-ray image test set, its performance improved as the number of Gen AI images increased, exceeding the average across models. On the C-Gen AI X-ray image test set, Model 1 achieved the highest performance among Models 1–3 as the number of Gen AI images increased, but the performance gap between Model 1 and Models 4–6 widened progressively.
Model 2 maintained stable performance in the real-world X-ray image test set, even as the number of Gen AI X-ray images increased. In the NC-Gen AI X-ray test set, it demonstrated excellent Precision as the number of Gen AI images increased, but the Recall and mAP50-95 values remained relatively low. In the C-Gen AI X-ray image test set, Model 2 exhibited a significant performance gap compared to Models 4–6. Among Models 1–3, it showed strong performance in terms of Precision, but the Recall and mAP50-95 values remained low.
Model 3 achieved the highest performance in the real-world X-ray image test set, regardless of the increase in Gen AI X-ray images. In the NC-Gen AI X-ray test set, the Precision was strong, but the Recall and mAP50-95 were relatively poor. In the C-Gen AI X-ray image test set, Model 3 recorded the lowest performance overall.
Model 4 exhibited a performance trend similar to Model 1 in the real-world X-ray image test set. However, in both the NC-Gen AI and C-Gen AI X-ray image test sets, Model 4 achieved the highest performance as the number of Gen AI images increased.
Model 5 showed performance nearly identical to Model 2 in the real-world X-ray image test set. It performed well in the NC-Gen AI X-ray image test set and exhibited high performance across all metrics in the C-Gen AI X-ray image test set. However, its mAP50-95 score was approximately 0.1 lower than that of Model 4.
Finally, Model 6 demonstrated performance similar to Model 3, achieving the highest scores in the real-world X-ray image test set. It also performed well in the NC-Gen AI X-ray test set. In the C-Gen AI X-ray image test set, it achieved high performance, but its mAP50-95 score was approximately 0.1 lower than that of Model 4, similar to Model 5.
These results indicate that YOLOv8, when trained solely on real-world X-ray images, exhibits limited detection ability on Gen AI images. However, incorporating Gen AI images into training enhances performance on synthetic data while maintaining real-world accuracy, and copy-paste augmentation further improves the overall detection capability.

4. Conclusions

This paper studied a novel machine learning framework for detecting prohibited items in DNN-based X-ray security inspection systems. As Generative AI technology advances, allowing anyone to create desired images using natural language without requiring specialized AI knowledge, security systems trained solely on real-world images may fail to detect newly generated prohibited items. This limitation could pose a security vulnerability and potential risk. To address this, a new Gen AI X-ray image dataset was created using a commercial text-to-image Gen AI model and was used to train and evaluate YOLOv8. Experimental results indicate that when the YOLOv8 model was trained solely on real-world X-ray images, it failed to detect prohibited items in Gen AI X-ray images, suggesting the necessity of supplementary training using Generative AI data. In contrast, models incorporating copy-paste Gen AI X-ray images significantly improved detection performance without compromising real-world detection accuracy. These findings suggest that leveraging commercial Gen AI can enhance the accuracy of DNN-based X-ray security inspection systems. Furthermore, incorporating just a small number of Gen AI images into the training set can provide a powerful solution to mitigate the security risks associated with Gen AI.
To verify whether the generated X-ray images truly reflect real-world X-ray images, our ongoing research focuses on implementing human expert verification to enhance the quality and reliability of the generated Gen AI X-ray image dataset [44,45,46]. This approach will strengthen our proposed framework, making it more robust and accurate. Moreover, our findings demonstrate that training with Generative AI data can enhance performance; however, synthetic data can also introduce bias, produce unrealistic objects, and lead to model collapse, creating a tradeoff [44,45,46]. To address these challenges, future research will focus on developing strategies to maximize performance while minimizing these drawbacks, including constructing additional Generative AI datasets and adjusting the ratio of real to generated data. Furthermore, expert validation of the generated images will be conducted to ensure data quality and reliability and to explore strategies for mitigating potential model bias.

Author Contributions

Conceptualization, D.K. and J.K.; Methodology, D.K. and J.K.; Software, D.K.; Validation, D.K. and J.K.; Formal analysis, D.K. and J.K.; Investigation, D.K. and J.K.; Resources, J.K.; Writing—original draft preparation, D.K.; Writing—review and editing, J.K.; Visualization, D.K. and J.K.; Supervision, J.K.; Project administration, J.K.; Funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the NRF (National Research Foundation of Korea) grant funded by the Korea government (Ministry of Science and ICT) (RS-2023-00214142), and in part by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ICAN (ICT Challenge and Advanced Network of HRD) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2022-00156409).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author. For the real-world X-ray image dataset, this paper used datasets from ‘The Open AI Dataset Project (AI Hub, S. Korea)’, where all data information can be accessed through ‘AI Hub (https://www.aihub.or.kr, accessed on 31 January 2024)’.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Akçay, S.; Kundegorski, M.E.; Devereux, M.; Breckon, T.P. Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 1057–1061. [Google Scholar]
  2. Saavedra, D.; Banerjee, S.; Mery, D. Detection of threat objects in baggage inspection with X-ray images using deep learning. Neural Comput. Appl. 2021, 33, 7803–7819. [Google Scholar]
  3. Lee, J.N.; Cho, H.C. Development of artificial intelligence system for dangerous object recognition in X-ray baggage images. Trans. Korean Inst. Electr. Eng. 2020, 69, 1067–1072. [Google Scholar]
  4. Wei, Y.; Tao, R.; Wu, Z.; Ma, Y.; Zhang, L.; Liu, X. Occluded prohibited items detection: An X-ray security inspection benchmark and de-occlusion attention module. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 138–146. [Google Scholar]
  5. Tao, R.; Wei, Y.; Jiang, X.; Li, H.; Qin, H.; Wang, J.; Ma, Y.; Zhang, L.; Liu, X. Towards real-world X-ray security inspection: A high-quality benchmark and lateral inhibition module for prohibited items detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 10923–10932. [Google Scholar]
  6. Zhao, C.; Zhu, L.; Dou, S.; Deng, W.; Wang, L. Detecting overlapped objects in X-ray security imagery by a label-aware mechanism. IEEE Trans. Inf. Forensics Secur. 2022, 17, 998–1009. [Google Scholar]
  7. Zhang, L.; Jiang, L.; Ji, R.; Fan, H. Pidray: A large-scale X-ray benchmark for real-world prohibited item detection. Int. J. Comput. Vis. 2023, 131, 3170–3192. [Google Scholar]
  8. Kim, E.; Lee, J.; Jo, H.; Na, K.; Moon, E.; Gweon, G.; Yoo, B.; Kyung, Y. SHOMY: Detection of Small Hazardous Objects using the You Only Look Once Algorithm. KSII Trans. Internet Inf. Syst. (TIIS) 2022, 16, 2688–2703. [Google Scholar]
  9. Liu, J.; Lin, T.H. A framework for the synthesis of X-ray security inspection images based on generative adversarial networks. IEEE Access 2023, 11, 63751–63760. [Google Scholar]
  10. Mery, D.; Riffo, V.; Zscherpel, U.; Mondragón, G.; Lillo, I.; Zuccar, I.; Lobel, H.; Carrasco, M. GDXray: The database of X-ray images for nondestructive testing. J. Nondestruct. Eval. 2015, 34, 42. [Google Scholar]
  11. Miao, C.; Xie, L.; Wan, F.; Su, C.; Liu, H.; Jiao, J.; Ye, Q. SIXray: A large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2119–2128. [Google Scholar]
  12. Han, L.; Ma, C.; Liu, Y.; Jia, J.; Sun, J. SC-YOLOv8: A security check model for the inspection of prohibited items in X-ray images. Electronics 2023, 12, 4208. [Google Scholar] [CrossRef]
  13. Ma, B.; Jia, T.; Li, M.; Wu, S.; Wang, H.; Chen, D. Towards dual-view X-ray baggage inspection: A large-scale benchmark and adaptive hierarchical cross refinement for prohibited item discovery. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3866–3878. [Google Scholar]
  14. Tao, R.; Wang, H.; Guo, Y.; Chen, H.; Zhang, L.; Liu, X.; Wei, Y.; Zhao, Y. Dual-view X-ray detection: Can AI detect prohibited items from dual-view X-ray images like humans? arXiv 2024, arXiv:2411.18082. [Google Scholar]
  15. Li, Y.; Zhang, C.; Sun, S.; Yang, G. X-ray detection of prohibited item method based on dual attention mechanism. Electronics 2023, 12, 3934. [Google Scholar] [CrossRef]
  16. Jing, B.; Duan, P.; Chen, L.; Du, Y. EM-YOLO: An X-ray prohibited-item-detection method based on edge and material information fusion. Sensors 2023, 23, 8555. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, H.; Zhao, Z.; Yang, J. Attention-based prohibited item detection in X-ray images during security checking. IET Image Process. 2024, 18, 1119–1131. [Google Scholar] [CrossRef]
  18. Zhu, Y.; Zhang, Y.; Zhang, H.; Yang, J.; Zhao, Z. Data augmentation of X-ray images in baggage inspection based on generative adversarial networks. IEEE Access 2020, 8, 86536–86544. [Google Scholar]
  19. OpenAI. ChatGPT. 2024. Available online: https://chat.openai.com (accessed on 4 April 2024).
  20. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  21. Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. Palm 2 technical report. arXiv 2023, arXiv:2305.10403. [Google Scholar]
  22. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Roziere, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
  23. Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 8821–8831. [Google Scholar]
  24. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
  25. Andreoni, M.; Lunardi, W.T.; Lawton, G.; Thakkar, S. Enhancing autonomous system security and resilience with generative AI: A comprehensive survey. IEEE Access 2024, 12, 109470–109493. [Google Scholar] [CrossRef]
  26. Golda, A.; Mekonen, K.; Pandey, A.; Singh, A.; Hassija, V.; Chamola, V.; Sikdar, B. Privacy and security concerns in generative AI: A comprehensive survey. IEEE Access 2024, 12, 48126–48144. [Google Scholar] [CrossRef]
  27. Bird, J.J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access 2024, 12, 15642–15650. [Google Scholar]
  28. Lee, Y.; Kang, J. Performance Analysis by the Number of Learning Images on Anti-Drone Object Detection System with YOLO. J. Korean Inst. Commun. Inf. Sci. 2024, 49, 356–360. [Google Scholar] [CrossRef]
  29. Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.-Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2918–2928. [Google Scholar]
  30. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  31. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 31 January 2024).
  32. Sohan, M.; SaiRam, T.; RamiReddy, V.C. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545. [Google Scholar]
  33. Wang, A.; Yuan, P.; Wu, H.; Iwahori, Y.; Liu, Y. Improved YOLOv8 for Dangerous Goods Detection in X-ray Security Images. Electronics 2024, 13, 3238. [Google Scholar] [CrossRef]
  34. Fan, J.; Haji Salam, M.S.B.; Rong, X.; Han, Y.; Yang, J.; Zhang, J. Peach Fruit Thinning Image Detection Based on Improved YOLOv8 and Data Enhancement Techniques. IEEE Access 2024, 12, 191199–191218. [Google Scholar] [CrossRef]
  35. Zhang, L.; Wu, X.; Liu, Z.; Yu, P.; Yang, M. ESD-YOLOv8: An Efficient Solar Cell Fault Detection Model Based on YOLOv8. IEEE Access 2024, 12, 138801–138815. [Google Scholar]
  36. Mao, M.; Lee, A.; Hong, M. Efficient Fabric Classification and Object Detection Using YOLOv10. Electronics 2024, 13, 3840. [Google Scholar] [CrossRef]
  37. AI Hub. Available online: https://aihub.or.kr (accessed on 31 January 2024).
  38. OpenAI. DALL·E 3 System Card. 2023. Available online: https://openai.com/research/dall-e-3-system-card (accessed on 4 April 2024).
  39. Microsoft Copilot. Copilot. 2023. Available online: https://copilot.microsoft.com/ (accessed on 4 April 2024).
  40. Tzutalin. LabelImg. 2015. Available online: https://github.com/HumanSignal/labelImg (accessed on 4 April 2024).
  41. Conrad, R. Copy-Paste-Aug. 2020. Available online: https://github.com/conradry/copy-paste-aug (accessed on 5 December 2024).
  42. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  43. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  44. Thielen, N.; Rachinger, B.; Schröder, F.; Preitschaft, A.; Meier, S.; Seidel, R. Comparative Study on Different Methods to Generate Synthetic Data for the Classification of THT Solder Joints. In Proceedings of the 1st International Conference on Production Technologies and Systems for E-Mobility (EPTS), Bamberg, Germany, 5–6 June 2024; pp. 1–6. [Google Scholar]
  45. Singh, K.; Navaratnam, T.; Holmer, J.; Schaub-Meyer, S.; Roth, S. Is synthetic data all we need? benchmarking the robustness of models trained with synthetic images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2505–2515. [Google Scholar]
  46. Paproki, A.; Salvado, O.; Fookes, C. Synthetic data for deep learning in computer vision & medical imaging: A means to reduce data bias. ACM Comput. Surv. 2024, 56, 271. [Google Scholar]
Figure 1. Illustration of our proposed framework.
Figure 2. Examples of real-world X-ray images for prohibited items where the red boxes indicate the prohibited items that have been labeled.
Figure 3. Examples of Gen AI X-ray images for prohibited items that we created, where the red boxes indicate the prohibited items that have been labeled.
Figure 4. Examples of copy-paste-augmented Gen AI images for prohibited items that we created, where the red boxes indicate the prohibited items that have been labeled.
Figure 5. Precision with respect to the number of training Gen AI X-ray images evaluated by (a) real-world X-ray image test set, (b) NC-Gen AI X-ray image test set, (c) C-Gen AI X-ray image test set.
Figure 6. Recall with respect to the number of training Gen AI X-ray images evaluated by (a) real-world X-ray image test set, (b) NC-Gen AI X-ray image test set, (c) C-Gen AI X-ray image test set.
Figure 7. mAP50-95 with respect to the number of training Gen AI X-ray images evaluated by (a) real-world X-ray image test set, (b) NC-Gen AI X-ray image test set, (c) C-Gen AI X-ray image test set.
Table 1. Confusion matrix.

                           Predicted Class
                           Positive    Negative
Actual class   Positive    TP          FN
               Negative    FP          TN
Table 2. Environment for evaluation.

Component     Specification
OS            Windows 11
CPU           Intel Core i9-13900 (Santa Clara, CA, USA)
RAM           64 GB
Graphics      NVIDIA RTX 4080 SUPER (Santa Clara, CA, USA)
VRAM          16 GB
DNN Model     YOLOv8 (Ultralytics version 8.2.42)
Table 3. Total number of X-ray images.

Number of Images                    Train     Valid    Test
Real-World                          35,850    7680     7680
Non-Copy-Paste-Augmented Gen AI     300       60       60
Copy-Paste-Augmented Gen AI         300       60       60
Table 4. The number of training images for Models 1–6.

YOLOv8        Real-World      NC-Gen AI/C-Gen AI
Model 1/4     1000            α/α
Model 2/5     10,000          α/α
Model 3/6     35,850 (All)    α/α
Table 5. Precision with respect to the number of training Gen AI X-ray images, evaluated on the real-world X-ray image test set, the NC-Gen AI X-ray image test set, and the C-Gen AI X-ray image test set.

Number of Training Gen AI X-Ray Images
                 0      30     60     120    180    240    300
Real-world X-ray image test
  Model 1        0.733  0.729  0.736  0.742  0.734  0.737  0.719
  Model 2        0.887  0.884  0.893  0.891  0.892  0.889  0.887
  Model 3        0.946  0.949  0.948  0.950  0.952  0.952  0.953
  Model 4        0.733  0.734  0.739  0.729  0.742  0.733  0.739
  Model 5        0.887  0.885  0.891  0.889  0.888  0.884  0.883
  Model 6        0.946  0.950  0.949  0.952  0.948  0.948  0.952
NC-Gen AI X-ray image test
  Model 1        0.074  0.857  0.922  0.932  0.957  0.939  0.954
  Model 2        0.345  0.702  0.847  0.873  0.918  0.927  0.939
  Model 3        0.004  0.677  0.881  0.918  0.945  0.921  0.903
  Model 4        0.074  0.653  0.856  0.944  0.960  0.965  0.976
  Model 5        0.345  0.544  0.777  0.932  0.950  0.971  0.960
  Model 6        0.004  0.692  0.850  0.873  0.906  0.956  0.962
C-Gen AI X-ray image test
  Model 1        0.230  0.638  0.669  0.718  0.753  0.791  0.792
  Model 2        0.130  0.413  0.621  0.636  0.693  0.765  0.766
  Model 3        0.523  0.561  0.536  0.692  0.711  0.715  0.681
  Model 4        0.230  0.544  0.782  0.920  0.910  0.937  0.914
  Model 5        0.130  0.597  0.697  0.795  0.900  0.909  0.905
  Model 6        0.523  0.586  0.671  0.771  0.828  0.848  0.893
Table 6. Recall with respect to the number of training Gen AI X-ray images, evaluated on the real-world X-ray image test set, the NC-Gen AI X-ray image test set, and the C-Gen AI X-ray image test set.

Number of Training Gen AI X-Ray Images
                 0      30     60     120    180    240    300
Real-world X-ray image test
  Model 1        0.634  0.634  0.636  0.638  0.641  0.640  0.646
  Model 2        0.834  0.837  0.832  0.831  0.830  0.835  0.835
  Model 3        0.922  0.920  0.921  0.918  0.917  0.916  0.915
  Model 4        0.634  0.625  0.631  0.623  0.627  0.627  0.628
  Model 5        0.834  0.837  0.832  0.831  0.831  0.833  0.832
  Model 6        0.922  0.918  0.918  0.917  0.921  0.921  0.910
NC-Gen AI X-ray image test
  Model 1        0.040  0.702  0.792  0.841  0.838  0.900  0.896
  Model 2        0.058  0.571  0.734  0.806  0.839  0.829  0.841
  Model 3        0.154  0.525  0.745  0.791  0.805  0.843  0.876
  Model 4        0.040  0.625  0.814  0.950  0.975  0.994  0.999
  Model 5        0.058  0.548  0.758  0.865  0.929  0.958  0.971
  Model 6        0.154  0.485  0.708  0.897  0.980  0.989  0.999
C-Gen AI X-ray image test
  Model 1        0.038  0.392  0.439  0.544  0.579  0.574  0.580
  Model 2        0.039  0.289  0.354  0.428  0.474  0.448  0.509
  Model 3        0.008  0.276  0.430  0.440  0.468  0.465  0.503
  Model 4        0.038  0.485  0.676  0.769  0.817  0.830  0.870
  Model 5        0.039  0.393  0.505  0.716  0.736  0.804  0.832
  Model 6        0.008  0.330  0.486  0.655  0.748  0.804  0.835
Table 7. mAP50-95 with respect to the number of training Gen AI X-ray images, evaluated on the real-world X-ray image test set, the NC-Gen AI X-ray image test set, and the C-Gen AI X-ray image test set.

Number of Training Gen AI X-Ray Images
                 0      30     60     120    180    240    300
Real-world X-ray image test
  Model 1        0.539  0.539  0.542  0.547  0.548  0.549  0.550
  Model 2        0.769  0.770  0.769  0.769  0.770  0.770  0.770
  Model 3        0.861  0.860  0.861  0.860  0.861  0.861  0.861
  Model 4        0.539  0.537  0.538  0.534  0.540  0.535  0.538
  Model 5        0.769  0.770  0.769  0.767  0.767  0.767  0.767
  Model 6        0.861  0.860  0.860  0.859  0.861  0.860  0.860
NC-Gen AI X-ray image test
  Model 1        0.001  0.511  0.700  0.823  0.859  0.885  0.888
  Model 2        0.002  0.399  0.586  0.663  0.712  0.791  0.810
  Model 3        0.001  0.436  0.716  0.799  0.804  0.826  0.837
  Model 4        0.001  0.449  0.650  0.849  0.918  0.934  0.956
  Model 5        0.002  0.318  0.552  0.709  0.775  0.872  0.898
  Model 6        0.001  0.417  0.673  0.806  0.876  0.898  0.934
C-Gen AI X-ray image test
  Model 1        0.003  0.217  0.293  0.419  0.430  0.449  0.478
  Model 2        0.004  0.134  0.201  0.247  0.288  0.337  0.358
  Model 3        0.001  0.146  0.252  0.321  0.324  0.354  0.339
  Model 4        0.003  0.263  0.439  0.637  0.704  0.759  0.783
  Model 5        0.004  0.178  0.296  0.452  0.544  0.637  0.687
  Model 6        0.001  0.207  0.353  0.501  0.602  0.659  0.699

