LEOPARD: Automated CAD-to-Synthetic Pipeline for 3D-Printed Firearm Detection in Civil Transit Security

Benjumea-Bellott, Constantino; Torregrosa-Domínguez, Ángel; Ramos-González, Víctor; Soria-Morillo, Luis M.; Álvarez-García, Juan A.

doi:10.3390/app16105104

by Constantino Benjumea-Bellott ^{1,†}, Ángel Torregrosa-Domínguez ^{2,†}, Víctor Ramos-González ^{1}, Luis M. Soria-Morillo ^{2} and Juan A. Álvarez-García ^{2,*}

^{1} Department of Languages and Computer Systems, Universidad de Sevilla, 41012 Seville, Spain
^{2} Department of Computer Science and Artificial Intelligence, Universidad de Sevilla, 41012 Seville, Spain
^{*} Author to whom correspondence should be addressed.
^{†} These authors contributed equally to this work.

Appl. Sci. 2026, 16(10), 5104; https://doi.org/10.3390/app16105104

Submission received: 7 April 2026 / Revised: 9 May 2026 / Accepted: 12 May 2026 / Published: 20 May 2026

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The proliferation of 3D-printed firearms poses a growing challenge for civil security, particularly in controlled public environments such as airports, train stations, and other transit hubs. These objects are often manufactured from polymer materials, exhibit high design variability, and are difficult to detect using conventional inspection systems. With over 20,000 weapon designs freely available online, traditional dataset creation methods cannot match the pace of design evolution. To address this challenge, we present LEOPARD, a pipeline designed to support civil security applications by converting CAD (computer-aided design) models of illicit firearm components into large-scale, photorealistic synthetic datasets. The pipeline incorporates procedural geometric variations, material imperfections, and physics-based rendering to realistically model 3D-printed objects as they may appear during security screening. Using this pipeline, we introduce LEOPARD-Zero, a dataset of 75,000 fully annotated synthetic images focused on the detection of illegal 3D-printed firearm components, with potential applications in civil transportation security contexts. Object detection models trained exclusively on our synthetic data achieve high performance on real 3D-printed components, with mAP@50 exceeding 83% and precision reaching up to 91.9%, demonstrating viable performance without requiring extensive real-world data collection. To encourage further research in automated inspection and public safety, we have released LEOPARD-Zero.

Keywords:

synthetic data generation; 3D-printed weapons; object detection; computer vision; security screening

1. Introduction

The emergence of 3D-printed firearms has evolved from a technological novelty to a genuine security concern in the real world. In 2013, the release of “The Liberator”, the first fully 3D-printable handgun, was studied and shown to possess a lethal capacity [1], marking a pivotal moment in the democratization of weapon manufacturing. Since then, designs have become increasingly sophisticated and widespread.

Contrary to appearances, any citizen has full access to the designs, materials, tools, and technical know-how necessary for constructing 3D-printed weapons. Open platforms, such as DEFCAD (https://defcad.com/), currently host more than 20,000 downloadable firearm designs, including the FGC-9 MkII, the most popular, with hundreds of thousands of downloads. A few online tutorials or PDF guides are sufficient to assemble lethal 3D-printed weapons using commercially available 3D printers and standard filament materials, at low cost and within a few hours [2,3]. In addition, tools such as Ghost Gunner (https://ghostgunner.net/), an automated CNC milling machine, enable users to fabricate regulated components, such as lower receivers, further fueling the spread of ‘ghost guns’ (unregistered and untraceable firearms).

This technological accessibility has materialized into concrete security incidents and law-enforcement operations worldwide. Recent high-profile cases and seizures across North America, Europe, Oceania, and Asia [4] show that 3D-printed firearms are no longer a theoretical concern, but an operational and increasingly international security threat (Figure 1). Representative examples include the December 2024 murder of UnitedHealthcare executive Brian Thompson, reportedly linked to a privately manufactured firearm with 3D-printed components [5]; Project CLUSTER in Canada [6]; Spanish operations such as SAGUARO, ODILO, and CARMELO [7]; and coordinated enforcement actions in Australia and New Zealand involving privately manufactured firearms and 3D-printed components [8]. These cases, together with other recent incidents documented worldwide, validate earlier predictions by Walther et al. [9] and demonstrate that 3D-printed weapons have evolved from proof-of-concept prototypes to functional, lethal instruments used in criminal activity.

Increasing their risk, non-metallic components of these weapons not only facilitate concealment and transport but also make them undetectable by conventional security screening systems [3]. Moreover, their modular design allows users to disassemble them into innocuous-looking parts, enabling easy transport across borders or through security checkpoints without raising suspicion.

All this has led to exponential year-on-year growth in 3D-printed firearms, which has become an increasingly pressing challenge for regulatory and security frameworks. This concern was first raised in 2014 by the Geneva Centre for Security Policy [10], and in 2018, the U.S. Department of Justice revised export control laws, legalizing the distribution of 3D-printed gun files for weapons under 0.50 caliber (excluding automatic weapons) (https://archive.is/jpKMh, accessed on 7 May 2026). These weapons present four key challenges to traditional security systems [4]:

Material Evasion: Common materials like PLA and PETG are radiolucent, making them virtually invisible to standard X-ray scanners.
Untraceable Manufacturing: The absence of serial numbers or ballistic fingerprints renders forensic tracing impossible.
Morphological Fluidity: Continuous online updates introduce new modular parts weekly, allowing disassembled weapons to resemble harmless objects.
Detection Asymmetry: The traditional detection pipeline (physical production, scanning, annotation) cannot keep pace with the digital design cycle.

The case for component-level inspection stems directly from the operational reality of how these weapons are transported. Unlike conventional metallic firearms, 3D-printed weapons such as the FGC-9 MkII are specifically engineered for modular disassembly, enabling covert transport through security checkpoints. When broken down into individual components, each part is composed entirely of polymer material, rendering it invisible to standard metal detectors. Furthermore, isolated components bear little visual resemblance to a complete weapon: a lower receiver appears as an unremarkable plastic block, a pistol grip as a simple ergonomic handle. Detection systems trained exclusively on assembled firearms are therefore incapable of flagging these components individually. A fully functional weapon can thus be transported piece by piece, each part passing inspection undetected, and assembled at the destination in minutes. Addressing this threat requires detection at the component level, not at the level of the complete weapon.

While numerous datasets exist for conventional, fully assembled metallic firearms, these are fundamentally unsuited for the emerging threat of 3D-printed weapons. This new challenge is defined by two unique factors:

Rapid design evolution, where new CAD models for parts are shared online daily, rendering any static dataset obsolete almost immediately.
Component-based detection, as firearms are often trafficked or smuggled in a disassembled state.

To our knowledge, no existing public dataset addresses this specific, combined problem. The state of the art lacks a resource for training models to recognize the distinct, often ambiguous, geometric signatures of disassembled 3D-printed parts. This critical gap is not just a failure of data collection, but a failure of methodology; traditional collection of real-world samples is intractably slow and cannot keep pace with the threat. Our work is the first to directly confront this problem by proposing a methodology that does not rely on collecting physical examples.

In this context, we investigate whether it is feasible to train computer vision models capable of detecting parts of 3D-printed firearms in real time, at a pace that can match the rapid evolution of this emerging threat. Our objective is to investigate whether a synthetic dataset could achieve comparable accuracy to a real-world dataset when used to train a firearm detection model. In concrete terms, this work presents three key contributions to the field of 3D-printed firearm detection and analysis:

Proposed Methodology: Enables the rapid creation of synthetic datasets of 3D-printed firearm components for training accurate detectors.
Experimental Validation: Experiments have been conducted on a collection of real 3D-printed parts, surpassing 80% on all global metrics.
Public Dataset Release: We make publicly available a dataset of synthetic and real RGB labeled images of 3D-printed firearm components to support research and development in this field.

It is important to emphasize that this work is strictly defensive in nature. The goal of LEOPARD is to equip security systems with the means to detect illicit weapons, not to facilitate their production or dissemination. The ethical implications of this research are further discussed in the Dual-Use Research of Concern statement at the end of this manuscript.

The rest of this paper is organized as follows. Section 2 presents related work in weapon detection systems and different approaches to synthetic dataset creation. Section 3 describes the methodology for generating the synthetic dataset and the firearm component detector. Section 4 describes the experimental setup and the experiments carried out, and discusses the results obtained. Section 5 discusses the limitations of the proposed approach. Section 6 concludes the paper by summarizing the main contributions and outlining future research directions.

2. Related Works

Synthetic datasets offer several advantages over traditional datasets. They are easy to control and significantly cheaper to create and label. Additionally, they provide a less obvious but crucial benefit: they can be adjusted for each new training run. This flexibility allows researchers to generate the exact type of data needed, transforming datasets from static elements into dynamic tools that support the training process. A final advantage concerns occlusions: even when an object is partially hidden in the scene, the mesh gives the exact coordinates of every vertex at all times, so annotations can be computed automatically instead of labeling each image manually.

2.1. Synthetic Data Approaches

The use of synthetic data for augmenting or generating training datasets has been explored for more than a decade. Early works, refs. [11,12], utilized video game environments (e.g., Half-Life 2) to design and evaluate pedestrian detection systems, demonstrating the potential of synthetic scenarios for training real-world detectors. Ref. [13] introduced the “Virtual KITTI” dataset using Unity (https://unity.com/), showing that models pretrained on synthetic data can perform comparably to those trained on real-world data and that combining both improves performance. Similarly, ref. [14] used data from Grand Theft Auto V to enhance the semantic segmentation results on real datasets like KITTI by [15].

The introduction of domain randomization (DR) [16] marked a shift towards intentionally less photorealistic data to improve generalization. DR techniques force models to focus on essential object features, improving robustness when real images are scarce. Structured DR, introduced by [17], further refined this by preserving contextual elements, especially for the detection of small objects.

We now present four synthetic data paradigms that have informed our work; we draw on their respective strengths while mitigating their limitations to suit our specific problem.

Modular: This approach builds synthetic datasets by assembling assets from a fixed library of labeled components—such as objects, backgrounds, and actions, among others—according to predefined rules. An illustrative example is the synthetic image dataset of a chessboard [18], in which different chess pieces are distributed across a board to generate various game states.
Parametric: Collections composed by adjusting specific visual properties—like lighting, color, texture, or viewpoint—while the shape and structure of the objects remain unaltered. A good example is UnrealCV [19], which demonstrates how changes in rendering settings can produce a wide range of appearances from a single 3D model.
Procedural: This paradigm collects images generated by introducing slight alterations over a fixed set of predefined models. Each sample is built from scratch, often with parameters that control aspects such as shape or structure, which allows for a significant amount of variation within the same type of object. Ref. [20] shows how spontaneous variations, such as leaf aging, play an important role.
Simulation: Modeling object interactions like collisions and fragmentations within controlled environments enables the creation of datasets where individual components of objects are labeled and tracked while maintaining their class relationships. Tools like Kubric, from [21], facilitate the scalable generation of complex simulated scenarios with precise annotations.

We build our synthetic dataset of 3D-printed weapon parts by incorporating elements from all the previously described generation strategies: it is composed of several weapon parts (modular), with variations in visual properties such as color and material (parametric), as well as small surface variations and imperfections (procedural). Finally, all parts are combined in a realistic scenario that allows occlusions (simulation). We detail each process in the next section.

2.2. Synthetic Datasets for Firearm Detection

Firearm detection has been successfully addressed in prior works [22,23,24,25,26], suggesting that the current bottleneck lies not in the model architectures but in the availability and quality of datasets.

Although there are datasets of real images of complete weapons [27,28], as well as useful websites such as the Internet Movie Firearms Database (http://www.imfdb.org/) and the Open Images Dataset (https://storage.googleapis.com/openimages/web/index.html, accessed on 7 May 2026), as far as we know, no work has addressed the detection of 3D-printed weapon parts. Because these weapons are usually transported disassembled, part-level detection is what would enable us to combat their illegal trafficking.

Within the use of synthetic datasets, we can observe different approaches to the problem of conventional weapon detection.

The military simulator VBS3 has been employed [29] to build a synthetic weapon dataset to train an R-CNN model. The following year, ref. [22] continued the same approach using the Unity game engine to augment an existing real-world dataset, creating a virtual environment with non-player characters (NPCs) carrying several kinds of firearms. Ref. [25] directly used frames from the popular shooter video game Watch Dogs 2 to build a synthetic dataset. Ref. [30] proposes detecting and tracking shooters, not just guns, so that systems do not lose track of the threat when the gun is partially hidden. Since real shooter data are scarce, they use synthetic data generated in an Unreal Engine simulation and train with domain randomization and transfer learning.

With respect to X-ray datasets, ref. [31] compares how well CNNs such as Faster R-CNN [32] detect guns, gun parts, and knives when trained on real X-ray images versus synthetic ones. Since real data are hard to collect, they test whether synthetically composited X-ray images can also train models well. The model performs better with real data, but the synthetic data come close.

A specific use case is automated baggage inspection using X-ray images. Ref. [33] is a good example of the use of 3D data to expand an existing dataset: simulated objects, obtained by projecting 3D models, are inserted into real X-ray luggage images. This helps train deep learning models without needing large amounts of labeled real data, showing that synthetic data can improve model performance on real tests.

Finally, some projects [34,35] aim to prevent the printing of 3D firearms or to combat illicit firearms trafficking, but their approach differs from ours and does not produce datasets useful for detecting parts of 3D-printed firearms.

In our case, we seek an effective solution to the problem of creating custom datasets to detect 3D-printed weapon components, with the potential to be implemented with X-ray scanners. For this reason, having studied the proposed solutions to the problem of conventional weapon detection, we have decided to develop a methodology that combines the strengths of the aforementioned studies and fully leverages the availability of the .STL file (a raw, unstructured triangulated surface format used to print 3D objects).

To address the lack of real datasets, we contribute by creating a synthetic dataset and a manually labeled one using RGB images.

To the best of our knowledge, there are no works in which separate parts of weapons are intended to be detected, which is common when transporting 3D-printed weapons.

3. Methodology

We present the LEOPARD pipeline (Figure 2) for generating synthetic datasets based on open-source CAD models of firearms. The workflow includes manual mesh preparation, part-wise segmentation, and the application of controlled geometric, material, and environmental variations. These synthetic samples are rendered under diverse conditions to simulate real-world imperfections and lighting. Finally, we perform some optimizations in bounding-box computation and object rendering to produce visually diverse samples suitable for training robust detection models.

3.1. LEOPARD Pipeline

The proposed pipeline can be described as follows: let $M = \{m_{1}, \dots, m_{k}\}$ be the set of CAD mesh components of a given firearm model. For each component $m_{i}$, the pipeline applies a stochastic transformation $\tau(m_{i}; \theta_{i})$, where $\theta_{i} \sim p(\Theta)$ encodes sampled procedural parameters (geometry perturbations, material properties, pose). A scene $S$ is constructed by compositing a sampled component $m_{i}$ onto a randomly selected background $b \sim B$, under randomized lighting $l \sim L$. The rendered image $x = R(S)$ is produced by the renderer $R$, and the annotation $y$ is derived analytically from the transformed vertex coordinates of $m_{i}$. The dataset is thus defined as $D = \{(x_{j}, y_{j})\}_{j=1}^{N}$, where each sample is independently generated by sampling fresh parameters $\theta_{i}$, background $b$ and lighting $l$.
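The formal description above maps naturally onto a sampling loop. The following is a minimal sketch, not the authors' implementation: the callables `sample_theta`, `transform`, `render`, and `annotate` are hypothetical placeholders for $p(\Theta)$, $\tau$, $R$, and the analytic annotation step, and uniform sampling of components, backgrounds, and lighting is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Sample:
    image: Any       # x_j = R(S)
    annotation: Any  # y_j, derived from the transformed component


def generate_dataset(
    components: Sequence[Any],                     # M = {m_1, ..., m_k}
    backgrounds: Sequence[Any],                    # B
    lightings: Sequence[Any],                      # L
    sample_theta: Callable[[random.Random], dict],  # draws theta ~ p(Theta)
    transform: Callable[[Any, dict], Any],         # tau(m_i; theta_i)
    render: Callable[[Any, Any, Any], Any],        # R(S)
    annotate: Callable[[Any], Any],                # y from transformed geometry
    n_samples: int,
    seed: int = 0,
) -> List[Sample]:
    """Draw n_samples independent (image, annotation) pairs."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        m_i = rng.choice(components)    # assumed uniform choice of component
        theta_i = sample_theta(rng)     # fresh procedural parameters
        b = rng.choice(backgrounds)     # b ~ B
        light = rng.choice(lightings)   # l ~ L
        transformed = transform(m_i, theta_i)
        x = render(transformed, b, light)
        y = annotate(transformed)
        dataset.append(Sample(image=x, annotation=y))
    return dataset
```

Because every iteration draws its own parameters, background, and lighting, the samples are independent given the seed, which matches the definition of $D$.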

The following subsections detail each stage of this pipeline, structured into four main stages: model preparation, procedural variation, material tuning, and scene setup.

3.1.1. Model Preparation and Selection Rationale

Our workflow (Figure 2) begins with the selection of firearm models from open repositories such as DEFCAD. We focus on the FGC-9 MkII for three reasons: (1) it is among the most downloaded designs globally (hundreds of thousands of downloads), (2) its hybrid architecture, combining printed and metal components, represents typical modern 3D-printed firearms, and (3) its modular design with clearly separable components facilitates part-wise annotation and detection training.

We employed Blender 3 (Figure 3b) as our primary 3D processing environment due to its robust Python API, enabling automated batch processing, native support for procedural geometry via Geometry Nodes, and compatibility with both real-time (Eevee) and photorealistic (Cycles) rendering engines. STL files downloaded from repositories underwent topology validation to ensure manifold geometry, a critical requirement for accurate bounding-box computation in subsequent pipeline stages.
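The topology validation step can be illustrated without Blender. The sketch below is an assumption about what such a check might look like, not the authors' code: it tests a necessary condition for a closed manifold triangle mesh, namely that every undirected edge is shared by exactly two faces. The function name and the index-triple input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Triangle = Tuple[int, int, int]  # three vertex indices


def is_closed_edge_manifold(triangles: Iterable[Triangle]) -> bool:
    """Return True if every undirected edge belongs to exactly two triangles.

    This is a necessary (not sufficient) condition for a watertight
    manifold mesh; it does not detect non-manifold vertices.
    """
    edge_counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())


# A tetrahedron is closed; removing one face opens three boundary edges.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(is_closed_edge_manifold(tetrahedron))       # True
print(is_closed_edge_manifold(tetrahedron[:-1]))  # False
```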

While automated mesh simplification algorithms exist [36,37], we opted for semi-automated retopology using Blender’s Remesh modifier with manual quality control. This preserves fine structural details essential for realistic rendering (e.g., layer lines, screw threads) while reducing vertex count by approximately 60–70%, from an average of 150 k to 50 k vertices per component. This reduction significantly accelerates downstream bounding-box calculations without compromising visual fidelity.

Once the mesh has been retopologized, we export the content of the STL archive into OBJ format, incorporating the proper material shaders. At this stage, the firearm is segmented into individual components according to mechanical assembly references. Treating each part as an isolated, connected component within a 3D graph allows for greater procedural control and independent annotation. We use Blender’s Geometry Nodes system in segmentation, which offers a flexible, node-based framework for applying controlled random variations to each component.

3.1.2. Procedural Variation

In the second stage, procedural modifications are applied to simulate imperfections commonly observed in real-world 3D-printed objects. We introduce controlled surface distortions, including scratches, dents, leftover support structures, and visible layer lines (Figure 3b). As we have already discussed, adding these flaws makes the model more resilient when dealing with the messy, imperfect data typical of real-world inputs.

3.1.3. Material Parametrization

To make the synthetic models visually closer to real 3D-printed objects, we focus on material parametrization in the third stage of the workflow. We actively parametrize shaders to reproduce the optical behavior of commonly used printing materials, with a particular emphasis on Polylactic acid (PLA). For each model, we vary the surface reflectance to simulate differences in plastic sheen, introduce slight deformations that resemble warping from uneven cooling, and apply irregularities in color to reflect typical filament inconsistencies. We also add subtle texture noise that imitates layer lines and minor defects left by the extrusion process, intended to reproduce the diversity of appearances seen in real printed firearms.
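As an illustration of this kind of parametrization, the sketch below samples a small set of shader parameters per render. Everything in it is hypothetical: the parameter names, the nominal color, and the numeric ranges are placeholders chosen for the example and are not values reported in this work.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaterialParams:
    base_color: Tuple[float, float, float]  # RGB in [0, 1]
    roughness: float                        # controls plastic sheen
    layer_noise: float                      # strength of layer-line texture noise


def _clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def sample_material(
    rng: random.Random,
    nominal_color: Tuple[float, float, float],
    color_jitter: float = 0.05,          # assumed filament color inconsistency
    roughness_range=(0.3, 0.7),          # assumed range, not from the paper
    noise_range=(0.0, 0.2),              # assumed range, not from the paper
) -> MaterialParams:
    """Draw one perturbed material around a nominal filament color."""
    color = tuple(
        _clamp01(c + rng.uniform(-color_jitter, color_jitter))
        for c in nominal_color
    )
    return MaterialParams(
        base_color=color,
        roughness=rng.uniform(*roughness_range),
        layer_noise=rng.uniform(*noise_range),
    )
```

Keeping the material fully described by a handful of scalars is what lets each render draw a new appearance without touching any texture file.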

3.1.4. Scene Composition

In the final stage, we generate complete scenes by introducing variation across key environmental and rendering settings. We randomize lighting conditions using High Dynamic Range Image (HDRi) maps, adjust camera angles dynamically, and introduce controlled occlusions by placing background elements or distractors in the scene. Depending on the dataset’s purpose, whether for quick prototyping or final model training, we render the scenes either with Blender’s native engine or in Unreal Engine 5 for a photorealistic output (Figure 4). This setup enables fine-grained control of the visual quality of the data, allowing us to match the realism level to the specific requirements of each experiment.

Once the weapon models have been decomposed and all parametric configurations applied, we initiate the dataset generation loop. In each iteration, the system randomly selects a background and places a single weapon component at the scene’s origin. We then apply random translations and rotations to the object, followed by the procedural geometry alterations described earlier. Material parameters are also perturbed to introduce visual variability, and lighting conditions are randomized once again before rendering the final image (Figure 5). During rendering, we compute the object’s bounding box based on its vertex coordinates and store the annotation in the corresponding .TXT file. Each rendered image is then resized to meet the input specifications of the target neural network. By repeating this process for a predefined number of iterations, the desired size of the LEOPARD dataset, we generate a complete synthetic dataset. To further improve model robustness, we also include negative samples consisting of randomly generated background scenes with no weapon components present (Figure 6).
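The annotation step can be sketched as follows. This is a minimal example under two assumptions not stated in the text: the vertices have already been projected to pixel coordinates, and the .TXT file uses the common YOLO label layout (class index followed by normalized center x, center y, width, and height). The function name is hypothetical.

```python
from typing import Iterable, Tuple


def yolo_label(
    class_id: int,
    pixels: Iterable[Tuple[float, float]],
    img_w: int,
    img_h: int,
) -> str:
    """Build one YOLO-style label line from projected vertex positions."""
    points = list(pixels)
    if not points:
        raise ValueError("no vertices given")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Axis-aligned extent of the vertices, clipped to the image frame.
    x_min, x_max = max(0.0, min(xs)), min(float(img_w), max(xs))
    y_min, y_max = max(0.0, min(ys)), min(float(img_h), max(ys))
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("object lies outside the image")
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(yolo_label(2, [(100, 200), (300, 400), (200, 250)], 640, 640))
# 2 0.312500 0.468750 0.312500 0.312500
```

Note that a box computed from all vertices covers the full object extent, including any portion hidden behind an occluder.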

3.1.5. Optimization

To scale up the dataset generation, we applied several optimizations. The goal of these optimizations is to minimize the total dataset generation time $T_{gen}$ for a dataset of $N$ images:

$$T_{gen} = N \cdot (t_{bbox} + t_{render}) \tag{1}$$

where $t_{bbox}$ is the per-image bounding-box computation time and $t_{render}$ the per-image rendering time. We address each term independently.

To reduce $t_{bbox}$, we introduce a binary search on the vertex array to accelerate bounding-box calculation, reducing the overall complexity of spatial computations when iterating across a large number of scenes.

To reduce $t_{render}$, we apply three complementary measures. First, we reduce the hyper-realism of rendered objects: instead of photorealistic textures, simplified visuals are used that retain the key features necessary for detection while being significantly lighter to render. Second, background scenes are preloaded into memory, avoiding I/O bottlenecks and facilitating rapid context switching between scenes during iterative rendering. Third, fully parametrized material shaders replace image-based textures, eliminating texture file lookups and maintaining an efficient GPU rendering pipeline.

All these changes enabled the generation of thousands of synthetic images quickly, without compromising the visual quality required for training.

As a proof of concept of the proposed methodology, and as detailed in the following section, we create the LEOPARD-Zero dataset (publicly available at https://deepknowledge-us.github.io/LEOPARD-dataset/, accessed on 7 May 2026) and train several models that are tested on physical printed parts.

4. Experimentation

To assess the adequacy of the LEOPARD pipeline, we have produced LEOPARD-Zero, a fully synthetic dataset following the methodology presented before. This dataset contains 75,000 synthetic images, comprising 67,500 synthetic weapon images and 7500 images free of arms (other objects and background).

We then evaluate its performance in the task of recognizing 3D-printed firearm components over a set of 218 manually annotated real images. We selected the YOLO11 model for our experiments. As established in our review of related works, the problem of 3D-printed part detection in a security context necessitates a high-speed, single-stage detector. The YOLO architecture is the de facto standard for this task. We specifically chose the recent YOLO11 [38] iteration due to its robust architecture, which is widely benchmarked and provides a powerful, off-the-shelf validator for our data.

Furthermore, to analyze the ‘reality gap’ and the complexity of the features learned from our synthetic data, we employed two distinct model variants: YOLO11s (small) and YOLO11m (medium). This comparison is not intended to be an exhaustive search for the best model, but rather an ablation study on model capacity. It allows us to investigate whether a more powerful model (YOLO11m) can leverage the subtle, procedurally generated details in our dataset, or if a lighter, faster model (YOLO11s) is sufficient for generalizing to real-world images. This analysis is critical for understanding the dataset’s utility for deployment on computationally constrained edge devices.

Specifically, we present two main experimental setups, differing in the composition of their validation datasets. In the first scenario, both the training and validation sets are composed entirely of synthetic images. In the second, the training set remains unaltered while the validation set is extended with 1805 manually annotated real images, which allows us to check the influence of real-object data on the recognition capacity.

4.1. Datasets Specifications

4.1.1. LEOPARD-Zero: Synthetic

The dataset consists of 75,000 synthetic images generated from five key weapon components: upper receiver, lower receiver, pistol grip, barrel retainer, and trigger rot (marked in Figure 7). The dataset encompasses four distinct conditions, determined by using light or dark colors for weapon parts and by the presence or absence of occlusions in the scenes, ensuring a faithful reflection of real-world conditions.

The full process, including occlusion simulation and rendering, was completed in 9 h, 4 min, and 13 s on a workstation with an NVIDIA RTX 3090 GPU and 64 GB of RAM, yielding an average generation time of 0.435 s per image. In contrast, capturing and manually labeling an equivalent real dataset would require approximately 75–100 min to photograph 100 images per part, and at least 2 min to label each image, without taking into account the time to print every part. This amounts to over 2500 h, that is, more than 300 full-time workdays, which highlights the efficiency and scalability of the synthetic approach.
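These figures can be checked with simple arithmetic. The sketch below reproduces them under two stated assumptions: photographing is read as 75 to 100 min per batch of 100 images, and a full-time workday is taken as 8 h (the text does not give a workday length).

```python
N_IMAGES = 75_000


def synthetic_seconds_per_image(hours=9, minutes=4, seconds=13, n_images=N_IMAGES):
    """Average generation time per image from the reported wall-clock time."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / n_images


def manual_hours(n_images=N_IMAGES, photo_min_per_100=(75, 100), label_min_per_image=2):
    """Return (labeling hours, total hours low, total hours high)."""
    batches = n_images / 100
    label_h = n_images * label_min_per_image / 60
    low = batches * photo_min_per_100[0] / 60 + label_h
    high = batches * photo_min_per_100[1] / 60 + label_h
    return label_h, low, high


label_h, low, high = manual_hours()
print(round(synthetic_seconds_per_image(), 3))  # 0.435
print(label_h, low, high)                       # 2500.0 3437.5 3750.0
print(label_h / 8)                              # 312.5 (assumed 8 h workdays)
```

Labeling alone accounts for 2500 h, so the "over 2500 h" and "more than 300 workdays" statements hold even before photography time is added.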

4.1.2. Manually Annotated Data from 3D-Printed Gun Parts

To verify the effectiveness of the LEOPARD methodology in real environments, we printed various versions of the components of the selected weapon at different scales and colors (Figure 5). Once the pieces were printed, we recorded videos using standard consumer cameras. These recordings provide us with 2023 images that are then manually annotated, allowing us to compare the model’s inference results with the corresponding ground truth annotations. This process provides a practical evaluation of the real-world applicability and reliability of the synthetic dataset and training approach.

4.1.3. Training, Validation and Test Data

As mentioned above, we evaluate the suitability of LEOPARD-Zero for the detection of 3D-printed weapon parts. To ensure a fair and meaningful comparison, we keep the test set completely independent from the training and validation data. Specifically, the 218 images in the test set are extracted from a single video of 3D-printed firearm parts, deliberately set aside from the other videos to avoid potential biases arising from the model encountering frames with highly similar visual content.

Regarding the training data, two distinct experimental scenarios are considered. In the first scenario (denoted as LEOPARD-Zero), the entire dataset consists of synthetic images. For each condition (light/dark gun parts, presence/absence of occlusions, and only non-relevant objects), 80% of the images are allocated for training and 20% for validation. The distribution of the situations is detailed in Table 1.

In the second scenario (denoted as LEOPARD-Twelve), the training set remains unchanged. However, the validation set is augmented with 1805 images manually annotated from videos of printed parts, representing a 12% increase relative to the original validation set size.
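The split sizes implied by these two scenarios can be illustrated with a short sketch. The per-condition splitting function is an assumption about how an 80/20 split per condition might be done (the names and the shuffling strategy are hypothetical); the arithmetic at the end uses only numbers stated in the text.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_condition(
    images_by_condition: Dict[str, Sequence[str]],
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Shuffle each condition separately and split it train/validation."""
    rng = random.Random(seed)
    train, val = [], []
    for condition in sorted(images_by_condition):
        items = list(images_by_condition[condition])
        rng.shuffle(items)
        cut = round(train_frac * len(items))
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val


# Sizes implied by the text: 75,000 synthetic images, 80/20 split,
# and 1805 real images added to validation in LEOPARD-Twelve.
synthetic_val = round(0.2 * 75_000)     # 15000
relative_increase = 1805 / synthetic_val
print(synthetic_val, round(100 * relative_increase, 1))  # 15000 12.0
```

The 1805 real images relative to 15,000 synthetic validation images give the 12% figure behind the LEOPARD-Twelve name, and 1805 plus the 218 test images account for all 2023 annotated real images.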

4.2. Experiment Environments Specifications

All training and evaluation experiments were carried out on a high-performance Linux workstation running kernel version 5.15.0-134-generic. The system was equipped with a 16-core processor (32 logical threads) and a single NVIDIA A100 GPU with 40 GB of memory. The Python environment was based on version 3.9.13.

Both scenarios used the same configuration: two YOLO11-based [38] object detection models were trained on either the LEOPARD-Zero or the LEOPARD-Twelve dataset. These models were trained for 50 epochs with an input image size of 640 × 640 and a batch size of 8. Training used the Adam optimizer with a fixed learning rate of 0.001. The first 5 layers of the model were frozen to preserve low-level feature representations from the COCO pre-training, avoiding premature adaptation to the synthetic domain. Multiscale rendering was used, and the data augmentations utilized were HSV shift, rotation, scaling, flipping, perspective distortion, and mosaic.
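The configuration above could be expressed with the Ultralytics training API roughly as follows. This is a sketch under stated assumptions, not the authors' training script: the dataset YAML name is hypothetical, the augmentation magnitudes are illustrative because the text lists only the augmentation types, and `lrf=1.0` is one assumed way to approximate a fixed learning rate. The multiscale setting is left out because the text does not say how it was applied.

```python
def build_train_args(data_yaml):
    """Collect the hyperparameters stated in the text as Ultralytics train() arguments."""
    return dict(
        data=data_yaml,      # hypothetical dataset description file
        epochs=50,
        imgsz=640,
        batch=8,
        optimizer="Adam",
        lr0=0.001,
        lrf=1.0,             # assumption: final LR factor of 1.0 keeps the schedule flat
        freeze=5,            # freeze the first 5 layers of the pretrained model
        # Augmentation types come from the text; the magnitudes below are illustrative.
        hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,   # HSV shift
        degrees=10.0,                        # rotation
        scale=0.5,                           # scaling
        fliplr=0.5,                          # horizontal flipping
        perspective=0.0005,                  # perspective distortion
        mosaic=1.0,                          # mosaic
    )


def train(weights="yolo11s.pt", data_yaml="leopard_zero.yaml"):
    """Fine-tune COCO-pretrained weights; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))


if __name__ == "__main__":
    for key, value in build_train_args("leopard_zero.yaml").items():
        print(f"{key}: {value}")
```

Calling `train("yolo11m.pt", ...)` with the same arguments would give the medium variant; the warm-up phase that Ultralytics applies by default still changes the learning rate during the first epochs.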

To estimate the variability of the reported metrics, the 218-image test set was partitioned into three non-overlapping subsets of approximately 73 images each, using a random split with a fixed seed (seed = 42) for reproducibility. The trained model was evaluated independently on each subset, and reported values correspond to the mean across the three evaluations, with 95% confidence intervals computed from the resulting distribution. Retraining on different data partitions was not performed; the training set remained fixed across all experimental scenarios.
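The evaluation protocol described above can be sketched as follows. The text does not state how the 95% interval was derived from the three values, so this sketch assumes a Student-t interval with two degrees of freedom; the function names are illustrative and the shuffling will not reproduce the authors' exact subsets.

```python
import random
import statistics

# Two-sided 95% Student-t critical value for 2 degrees of freedom (three samples).
T_CRIT_DF2 = 4.303


def split_three(n_images, seed=42):
    """Shuffle image indices with a fixed seed and cut them into three disjoint subsets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_images, 3)
    subsets, start = [], 0
    for i in range(3):
        size = base + (1 if i < extra else 0)
        subsets.append(indices[start:start + size])
        start += size
    return subsets


def mean_ci95(values):
    """Return (mean, half-width) of an assumed t-based 95% interval over three values."""
    if len(values) != 3:
        raise ValueError("the critical value used here is only valid for three values")
    mean = statistics.mean(values)
    half_width = T_CRIT_DF2 * statistics.stdev(values) / len(values) ** 0.5
    return mean, half_width


if __name__ == "__main__":
    subsets = split_three(218)
    print([len(s) for s in subsets])
```

Each subset would be passed to the detector's evaluation routine, and the three resulting metric values fed to `mean_ci95`. With only three subsets the interval is wide by construction, which is worth keeping in mind when reading the reported margins.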

4.3. Results and Discussion

This section presents and discusses the results obtained by the four model variants, YOLO11s and YOLO11m trained under the LEOPARD-Zero and LEOPARD-Twelve scenarios, evaluated on the independent real test set. We first analyze the training dynamics through loss and mAP@50 curves, then examine per-class behavior via confusion matrices, and finally compare aggregate performance metrics across all models.

Figure 8 shows the train and validation loss curves and the validation mAP@50 for all four model variants over 50 epochs. In the LEOPARD-Zero scenario, train and validation loss curves remain closely aligned throughout training, confirming that both sets share the same synthetic domain. In contrast, LEOPARD-Twelve models exhibit a systematic train-validation gap that persists across all epochs: validation loss continues to decrease but remains consistently above the training loss, reflecting the domain shift between the synthetic training set and the real validation images. This gap is particularly pronounced in YOLO11m_Twelve, which shows the slowest convergence in classification loss. The mAP@50 curves reflect this divergence directly: LEOPARD-Zero models reach approximately 97% mAP@50 on synthetic validation, while LEOPARD-Twelve models plateau around 74% on real images, a gap of approximately 23 percentage points that quantifies the synthetic-to-real domain shift.

Figure 9 shows the normalized confusion matrices evaluated on the video test set for YOLO11s and YOLO11m on the LEOPARD-Zero dataset. YOLO11m achieves solid recall on classes like barrel_retainer (88%) and trigger_rot (100%), but shows poor recall for lower_receiver (47%) and pistol_grip (30%), with 53% and 70% of their instances misclassified as others, respectively. This indicates difficulties distinguishing some parts from the background or visually similar components.

In contrast, YOLO11s substantially reduces these errors. Recall for lower_receiver rises from 47% to 58%, and for pistol_grip from 30% to 65%. Similarly, upper_receiver recall improves from 75% to 90%, further supporting the superior per-class discrimination of the smaller model. Although YOLO11s introduces some new cross-class confusions (e.g., lower_receiver partly misclassified as pistol_grip or upper_receiver), overall, it demonstrates superior discrimination among classes and fewer collapses into the background category.

Despite YOLO11m’s greater capacity, these results suggest YOLO11s generalizes better for this fine-grained task. As evidenced by the training–validation loss curves, the performance gap is not attributable to classical overfitting since validation loss decreases consistently in both models throughout training. A more plausible explanation is that the subtle visual differences between gun parts do not require the representational depth of a bigger architecture, and that YOLO11s’s capacity is better aligned with the complexity of this specific detection task. Thus, for LEOPARD-Zero, YOLO11s offers more reliable performance than YOLO11m, underlining that larger architectures are not always advantageous for fine-grained detection tasks.
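The per-class recall figures discussed above are the diagonal entries of a confusion matrix normalized over the ground-truth class. A minimal sketch of that computation follows; the counts are hypothetical and were chosen only so that they reproduce the 47% and 30% recall quoted for YOLO11m. The row and column convention (rows are ground truth) is an assumption of this example and may differ from the plotted figures.

```python
def per_class_recall(matrix, labels):
    """matrix[t][p] counts ground-truth class t predicted as class p."""
    recalls = {}
    for t, name in enumerate(labels):
        total = sum(matrix[t])
        recalls[name] = matrix[t][t] / total if total else float("nan")
    return recalls


# Hypothetical counts (100 instances per class), not taken from the paper's data.
labels = ["lower_receiver", "pistol_grip", "others"]
counts = [
    [47, 0, 53],   # lower_receiver: 47 detected, 53 missed into "others"
    [0, 30, 70],   # pistol_grip: 30 detected, 70 missed into "others"
    [0, 0, 100],   # background regions correctly left undetected
]

recall = per_class_recall(counts, labels)
for name, value in recall.items():
    print(f"{name}: {value:.0%}")
```

Reading the matrix this way makes the failure mode explicit: most of the lost recall for these two classes comes from missed detections rather than from confusion with another firearm component.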

We next examine how including a very small amount of real data affects model performance. As described in Section 4, training was performed exclusively on synthetic data to evaluate its capacity to support gun parts detection without reliance on real samples. Real images were reserved for validation to assess the domain gap and generalization to real-world scenarios.

Figure 10 presents the normalized confusion matrices evaluated on the real test set for YOLO11s and YOLO11m trained on the LEOPARD-Twelve dataset. YOLO11s shows strong performance, with trigger_rot reaching 100% recall, upper_receiver 92%, and pistol_grip 70%, though some confusion persists with others, particularly for lower_receiver (26%) and pistol_grip (24%).

YOLO11m, however, struggles significantly more: lower_receiver drops to just 23% recall, with 67% of its instances misclassified as others. pistol_grip reaches only 46% recall, with 51% falling into others. Overall, YOLO11m displays higher misclassification rates and a greater tendency to collapse predictions into the background category, confirming that the inclusion of real validation data does not translate into consistent performance gains for the larger model.

In general terms, the results show that YOLO11s trained on LEOPARD-Twelve provides well-balanced and robust performance across all metrics (Table 2). It achieves the highest mAP@50 (86.78%) and mAP@75 (84.78%), indicating strong localization and classification accuracy even under stricter IoU thresholds. Its precision (87.24%) remains competitive, while its recall (80.41%) is notably the highest among all models, suggesting better detection coverage and fewer missed objects.

By contrast, YOLO11m models, despite achieving significantly higher precision in the LEOPARD-Zero scenario (91.97%), generally suffer from lower recall and lower mAP scores, indicating a trade-off where increased model complexity does not consistently translate into superior overall performance.

These findings suggest that, while LEOPARD-Twelve yields modest but consistent improvements in mAP@50 and recall for YOLO11s (+3.66 and +5.12 percentage points, respectively), the benefit does not extend to YOLO11m, which shows no consistent improvement in aggregate metrics and, as evidenced by the confusion matrices, higher misclassification rates into the background category.

Representative successful detections using a standard camera are shown in Figure 11. However, certain misclassifications still persist between components with similar morphological features. In particular, lower_receiver and pistol_grip are prone to mutual confusion, likely due to their shared elongated base and pronounced curves, features that are absent in trigger_rot, which consistently achieves near-perfect recall across all models and scenarios. Additionally, false positives occasionally occur, such as detecting a laptop charger as an upper_receiver, though these appear in only a few frames and always below 55% confidence, as shown in Figure 12.

The consistent detectability of trigger_rot makes it particularly relevant for anomaly detection scenarios, such as flagging suspicious items in shipping or security contexts. More broadly, these results demonstrate that purely synthetic training data can support meaningful real-world detection performance, though challenging acquisition conditions such as motion blur, low lighting, or low frame rates remain limiting factors for deployment.

4.4. Synthetic Dataset Quality Analysis

To complement the downstream detection results, we conduct an explicit quality analysis of the LEOPARD-Zero dataset by comparing its visual and distributional properties against the real image collection used for evaluation.

4.4.1. Normalization Methodology

A direct comparison between synthetic and real images is complicated by a structural asymmetry: synthetic images contain a single weapon component per frame, whereas real images captured during video recording may contain multiple components simultaneously. To resolve this and make both domains comparable on equal terms, we apply a crop-based normalization procedure. For each annotated image in the real splits (LEOPARD-Twelve validation set and the independent video test set), individual component instances are extracted as bounding-box crops using the corresponding YOLO ground-truth labels. Each crop is resized to 224 × 224 pixels to standardize spatial resolution across both domains prior to metric computation. This yields a total of 8406 real component crops (Table 3) distributed across five classes, against which 1000 synthetic crops per class are compared.
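The crop-based normalization can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes the standard YOLO label layout (class index followed by normalized center x, center y, width, height), uses Pillow for image handling, and the function names are hypothetical.

```python
def yolo_to_pixel_box(label_line, img_w, img_h):
    """Convert one YOLO label line to (class_id, (left, top, right, bottom)) in pixels."""
    cls, cx, cy, w, h = label_line.split()[:5]
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Clamp to the image so boxes touching the border stay valid.
    left = max(0, round(cx - w / 2))
    top = max(0, round(cy - h / 2))
    right = min(img_w, round(cx + w / 2))
    bottom = min(img_h, round(cy + h / 2))
    return int(cls), (left, top, right, bottom)


def extract_crops(image_path, label_path, size=(224, 224)):
    """Return a list of (class_id, crop) pairs, one per annotated instance."""
    from PIL import Image  # imported lazily; requires the Pillow package

    crops = []
    with Image.open(image_path) as img:
        rgb = img.convert("RGB")
        with open(label_path) as labels:
            for line in labels:
                if not line.strip():
                    continue
                cls, box = yolo_to_pixel_box(line, rgb.width, rgb.height)
                if box[2] > box[0] and box[3] > box[1]:  # skip degenerate boxes
                    crops.append((cls, rgb.crop(box).resize(size)))
    return crops


if __name__ == "__main__":
    print(yolo_to_pixel_box("2 0.5 0.5 0.25 0.5", 640, 480))
```

Resizing every crop to a fixed square ignores the original aspect ratio, which is the behaviour the text describes; a letterboxed resize would be an alternative design if shape distortion turned out to affect the metrics.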

4.4.2. No-Reference Perceptual Quality: BRISQUE

To assess intrinsic perceptual quality without requiring paired samples, we compute the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [39] for each crop independently. BRISQUE estimates image quality by measuring deviations from the statistical regularities of natural scene statistics; lower scores indicate fewer perceptual distortions relative to natural image statistics.
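For reference, the natural scene statistics that BRISQUE relies on are computed from mean-subtracted contrast-normalized (MSCN) coefficients. The expression below follows the standard formulation in [39] rather than anything specific to this work; the symbols are introduced here for illustration, with $I$ the grayscale image, $w_{k,l}$ a local Gaussian weighting window, and $C$ a small stabilizing constant.

```latex
\hat{I}(i,j) = \frac{I(i,j) - \mu(i,j)}{\sigma(i,j) + C},
\qquad
\mu(i,j) = \sum_{k,l} w_{k,l}\, I(i+k,\, j+l),
\qquad
\sigma(i,j) = \sqrt{\sum_{k,l} w_{k,l}\,\bigl(I(i+k,\, j+l) - \mu(i,j)\bigr)^{2}}
```

Distortions such as blur or sensor noise change the empirical distribution of $\hat{I}$, and the BRISQUE score is a regression on features fitted to that distribution.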

As shown in Table 4, synthetic images consistently achieve lower BRISQUE scores than their real counterparts across all five classes, reflecting their controlled rendering environment, which is free from camera noise, motion blur, and illumination artifacts. The gap ranges from 22.94 to 30.27 points for four of the five classes. A notable exception is trigger_rot, where the gap narrows to 9.78 points, indicating that its synthetic representation is perceptually closer to its real counterpart than the other components. This may reflect particular rendering characteristics of this component or specific visual properties of the real captures. Notably, trigger_rot also achieves the highest recall across all models and experimental conditions (Section 4.3), though the relationship between BRISQUE proximity and detection performance involves additional factors beyond perceptual quality alone.

4.4.3. Feature-Space Distribution: KID

To measure the distributional distance between the synthetic and real domains at the feature level, we compute the Kernel Inception Distance (KID) [40] per class. Unlike the Fréchet Inception Distance (FID), KID is an unbiased estimator that yields reliable measurements even with the limited number of real samples available in our setting. KID is calculated between the real crops and a balanced sample of 200 synthetic crops per class.
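KID is the unbiased estimate of the squared maximum mean discrepancy between two feature sets under a cubic polynomial kernel. The sketch below implements that estimator in plain Python on small feature vectors. The Inception feature extraction step is omitted, the random vectors stand in for real features, and the function names are illustrative rather than taken from the authors' code.

```python
import random


def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel k(x, y) = (x.y / d + coef0) ** degree used by KID."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + coef0) ** degree


def kid_unbiased(feats_x, feats_y):
    """Unbiased squared-MMD estimate between two lists of feature vectors."""
    m, n = len(feats_x), len(feats_y)
    # Within-set terms exclude the diagonal (i == j), which makes the estimate unbiased.
    k_xx = sum(poly_kernel(feats_x[i], feats_x[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(poly_kernel(feats_y[i], feats_y[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(poly_kernel(x, y) for x in feats_x for y in feats_y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def gaussian_features(rng, count, dim, mean):
    """Stand-in for network features: independent Gaussian vectors (illustrative only)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = gaussian_features(rng, 50, 8, 0.0)
    same_dist = gaussian_features(rng, 50, 8, 0.0)
    shifted = gaussian_features(rng, 50, 8, 1.0)
    print("same distribution:", kid_unbiased(reference, same_dist))
    print("shifted distribution:", kid_unbiased(reference, shifted))
```

Because the estimator is unbiased, it can return small negative values when the two sets come from the same distribution; this is expected behaviour and not a bug.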

As shown in Table 5, KID values range from 0.146 to 0.223, confirming a moderate but consistent distributional gap between domains. The classes pistol_grip, lower_receiver, and barrel_retainer present the smallest distances, while trigger_rot, despite its high perceptual fidelity (lowest BRISQUE gap), shows the largest KID. This apparent divergence is explained by the different scales of these metrics: BRISQUE operates on local pixel statistics, whereas KID measures distances in the high-level feature space of an Inception network. A component may be perceptually similar at the local texture level while still differing substantially in higher-level appearance features as captured by deep networks.

4.4.4. Downstream Performance as Quality Proxy

The metrics above characterize the dataset at the image and feature level. As a complementary measure, downstream detection performance on real images provides a direct indicator of whether the synthetic data contains sufficient task-relevant information for model training. Under this criterion, and acknowledging the limited size of the test set (218 images from a single recording session, as discussed in Section 5), YOLO11s trained on LEOPARD-Zero achieves 83.12% mAP@50 on real 3D-printed components, with precision reaching 86.13% and recall 75.29%. These results suggest that, despite the distributional gap quantified by KID, the synthetic data supports viable detection performance without any real images during training.

5. Limitations

While LEOPARD demonstrates promising results, several limitations warrant acknowledgment.

Domain Gap Persistence: The training curves reveal a pronounced gap between the two validation scenarios: LEOPARD-Zero models, validated on synthetic images, reach approximately 97% mAP@50, while LEOPARD-Twelve models, validated on real images, plateau around 74% mAP@50 (Figure 8). This 23-point gap directly reflects the synthetic-to-real domain shift during training. On the independent real test set, the best model trained exclusively on synthetic data (YOLO11s trained on LEOPARD-Zero) achieves 83.12% mAP@50, suggesting that while the domain gap is significant during validation, the models retain meaningful generalization to unseen real components. Real-world deployment faces additional challenges not captured in controlled testing: extreme lighting conditions, motion blur exceeding synthetic simulation, non-standard material compositions (e.g., carbon fiber-reinforced filaments), and adversarial modifications designed to evade detection.

Dataset Scope: Our evaluation focuses exclusively on FGC-9 MkII components. Generalization to the 20,000+ available designs remains unvalidated. The 218-image test set from a single video session limits the statistical robustness of our conclusions.

Computational Accessibility: While 9 h for 75K images represents substantial improvement over manual annotation, it remains computationally intensive (requiring high-end GPU infrastructure), potentially limiting deployment in resource-constrained security contexts.

Adversarial Vulnerability: We have not evaluated robustness against adversarial attacks, including physical camouflage patterns, intentional shape modifications, or presentation attacks designed to exploit synthetic training biases.

Modality Scope: The experimental validation presented in this work is limited to RGB camera imagery captured in controlled desktop environments. No empirical evaluation has been conducted in operational security screening modalities such as X-ray or millimeter wave imaging, or in cluttered baggage backgrounds representative of real checkpoint conditions. While Appendix A demonstrates the pipeline’s extensibility to X-ray-like synthesis, quantitative validation in these modalities remains the primary direction for future work.

6. Conclusions and Future Works

The central objective of this work was to investigate whether a fully synthetic dataset, generated directly from CAD models without any real-world images, could support the training of object detectors capable of recognizing disassembled 3D-printed firearm components. The results presented in Section 4 provide a preliminary affirmative answer to this question. Models trained exclusively on LEOPARD-Zero achieve mAP@50 exceeding 83% on real 3D-printed components, with YOLO11s trained on LEOPARD-Twelve reaching 86.78% mAP@50 and 80.41% recall when a small proportion of real images is incorporated into validation. These results demonstrate that synthetic data alone is sufficient for viable detection performance, without requiring any real-world data collection at training time.

It must be acknowledged, however, that this evaluation is conducted on a 218-image test set extracted from a single recording session, and that no public benchmark of real 3D-printed firearm components currently exists against which to compare. The reported metrics should therefore be interpreted as a proof-of-concept validation of the proposed methodology rather than as a definitive performance benchmark. As discussed in Section 5, the residual 23-point domain gap between synthetic validation performance (~97% mAP@50) and real-image validation (~74% mAP@50) further highlights the challenges that remain before operational deployment.

The three contributions stated in Section 1 are fulfilled as follows. First, the LEOPARD pipeline provides a replicable methodology for the rapid creation of synthetic detection datasets directly from STL files, reducing dataset creation from over 2500 h to under 9 h. Second, experimental validation on real 3D-printed components confirms detection performance above 80% across all global metrics, with consistent results across the repeated evaluation over three non-overlapping test subsets (mAP@50: 86.78% ± 0.49% for the best model). Third, the dataset LEOPARD-Zero is publicly released to support further research in this area.

The significance of this work extends beyond the immediate utility of the LEOPARD-Zero dataset. We have demonstrated a proactive framework for a security problem that is, by its nature, an “arms race”. Existing static datasets are perpetually reactive and can only detect threats from the previous day. By creating a pipeline that transforms digital designs into training data, we provide a mechanism for continuous and adaptive threat detection. This methodology allows security and machine learning models to be retrained and redeployed in pace with the rapid evolution of 3D-printed firearm designs, fundamentally shifting the paradigm from static detection to a dynamic, ongoing response.

The LEOPARD pipeline (Figure 2) offers a streamlined process for turning raw weapon CAD parts into high-quality training data. Starting from basic CAD files, often shared online in STL format, the pipeline automates the transformation into annotated datasets ready for object detection models, achieving:

A drastic acceleration in the creation of datasets, transforming raw CAD files into training-ready data in just a few hours, compared to the weeks this would normally require.
A realistic reproduction in material appearance and the usual 3D printing imperfections, which are essential for detecting weapons in real conditions.
Specific datasets for training highly accurate object detection models to distinguish 3D-printed weapon parts.

Automating this pipeline could make it possible to train detection models in near real-time for any newly discovered or reported weapon models circulating online. For instance, a crawler could be developed to continuously scan and download .STL files from platforms such as DEFCAD or similar repositories. These files could then be automatically fed into the LEOPARD system to generate updated datasets on demand, ensuring that the detection systems remain current and responsive to emerging threats. This kind of automation would represent a significant step forward in proactive digital surveillance and public safety, particularly in the context of rapidly evolving 3D-printed weapon technologies.

It is important to note that the current experimental validation is limited to RGB imagery captured with consumer cameras. Nevertheless, these models perform well when detecting rendered images of parts in a virtual environment, which suggests that they could be particularly valuable for monitoring forums and social media platforms where trafficking of 3D-printed weapons is suspected, potentially through a web crawler system. We intend to continue exploring this line of research and to focus on developing datasets that would be difficult, or even impossible, to collect in real-life scenarios. Extending validation to operational security modalities is identified as the primary direction for future work, particularly X-ray scanning, for which we already have proof-of-concept results described in Appendix A. Our interest in this specific case stems from the fact that plastic components are invisible to standard metal detectors, and parts of a disassembled weapon can easily be mistaken for innocuous items. This represents a potentially unprecedented application for synthetic data in aviation security, as it addresses a latent threat and could significantly assist human security systems by providing the reassurance of a database that updates almost in real time as new models circulate on the Internet.

Author Contributions

Conceptualization, J.A.Á.-G. and L.M.S.-M.; methodology, Á.T.-D. and C.B.-B.; software, Á.T.-D. and C.B.-B.; validation, Á.T.-D., V.R.-G. and C.B.-B.; formal analysis, C.B.-B.; investigation, C.B.-B.; resources, J.A.Á.-G. and L.M.S.-M.; data curation, Á.T.-D. and C.B.-B.; writing—original draft preparation, C.B.-B.; writing—review and editing, Á.T.-D., J.A.Á.-G., C.B.-B., L.M.S.-M. and V.R.-G.; visualization, C.B.-B.; supervision, J.A.Á.-G. and L.M.S.-M.; project administration, J.A.Á.-G.; funding acquisition, J.A.Á.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially funded by the project HORUS (PID2021-126359OB-I00), funded by MCIN/AEI/10.13039/501100011033.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data and code needed to reproduce the experiments in this work can be obtained through the GitHub repository at https://deepknowledge-us.github.io/LEOPARD-dataset/ (accessed on 20 February 2026).

DURC Statement

Current research is limited to the fields of Computer Vision and Civil Security, which is beneficial for enhancing automated threat detection systems in public transportation hubs and does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving 3D-printed firearm components and confirm that all necessary precautions have been taken to prevent potential misuse. Specifically, this work focuses strictly on detection methodologies using pre-existing, publicly available designs and does not provide manufacturing instructions or functional schematics. As an ethical responsibility, the authors strictly adhere to relevant national and international laws about DURC (Dual-Use Research of Concern). The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes for public safety.

Conflicts of Interest

The authors declare no conflicts of interest and no financial interest in this work.

Appendix A. Generation of X-Ray-like Synthetic Images

The procedure reuses the existing LEOPARD workflow with minimal modifications tailored to this scenario. Initially, a uniform white background is established and assigned a high emission value, typically by setting the emission parameter, α, to a value greater than one. This background functions as a constant emitter against which the objects are rendered. The components are then positioned in front of this background and assigned an X-ray-like material that simulates X-ray projection behavior rather than conventional surface shading. Rays are cast perpendicular to the background plane, and the color contribution of each surface point is determined by the dot product between the incident ray direction and the surface normal: surfaces perpendicular to the rays contribute no color and remain invisible, while increasingly inclined surfaces exhibit proportionally higher intensity, naturally highlighting boundaries, edges, and areas of curvature while suppressing flat regions.
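The shading rule described above can be illustrated with a small function. The text does not give the exact mapping from the dot product to intensity, so this sketch assumes the simplest one consistent with the description, namely intensity equal to one minus the absolute cosine between the ray and the surface normal. The function name and the linear form are assumptions, not the material used in the pipeline.

```python
import math


def xray_like_intensity(ray_dir, normal):
    """Assumed shading rule: 0 for surfaces facing the ray, 1 at grazing incidence."""
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    d, n = unit(ray_dir), unit(normal)
    # |d . n| is 1 when the surface is perpendicular to the ray (normal parallel to it).
    cos_incidence = abs(sum(a * b for a, b in zip(d, n)))
    return 1.0 - cos_incidence


if __name__ == "__main__":
    ray = (0.0, 0.0, -1.0)  # rays travel toward the background plane
    for label, normal in [("facing the ray", (0.0, 0.0, 1.0)),
                          ("inclined 45 degrees", (1.0, 0.0, 1.0)),
                          ("grazing", (1.0, 0.0, 0.0))]:
        print(f"{label}: {xray_like_intensity(ray, normal):.3f}")
```

Under this rule flat faces seen head-on vanish while silhouettes and curved regions light up, which matches the edge-emphasizing appearance described for the X-ray-like renders.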

As a result of this modified pipeline, an additional dataset of 75,000 synthetic X-ray-like images of 3D-printed firearm components was generated. Representative examples are shown in Figure A1.

Figure A1. Sample images from the synthetic X-ray-like dataset generated to explore the extensibility of the LEOPARD pipeline toward non-RGB screening modalities.

References

1. Lee, B. Where Gutenberg meets guns: The liberator, 3D-printed weapons, and the First Amendment. NCL Rev. 2013, 92, 1393.
2. Veilleux-Lepage, Y. CTRL, HATE, PRINT: Terrorists and the Appeal of 3D-Printed Weapons. ICCT Perspective, 13 July 2021.
3. Wilhelm, T. Ghost Guns: Untraceable, Deadly—and on Windsor’s Streets. Windsor Star, 15 March 2024.
4. Schaufelbühl, S.; Florquin, N.; Werner, D.; Delémont, O. The Emergence of 3D-Printed Firearms: An Analysis of Media and Law Enforcement Reports. Forensic Sci. Int. Synerg. 2024, 8, 5.
5. Dass, R.A.S. The Rise of 3D-Printed Firearms. RSIS Commentaries, 19 December 2024.
6. Toronto Police Service. Man Arrested in Firearm Manufacturing and Trafficking Investigation: Project CLUSTER, 2026. Available online: https://www.tps.ca/media-centre/news-releases/65793/ (accessed on 7 May 2026).
7. Centro Superior de Estudios de la Defensa Nacional. Armas en 3D: Imprimiendo el Futuro del Tráfico Ilícito de Armas, 2025. Available online: https://www.defensa.gob.es/ceseden/-/armas_en_3d_imprimiendo_el_futuro_del_trafico_ilicito_de_armas (accessed on 7 May 2026).
8. Australian Border Force. More Than 1000 Illicit Firearms and Parts, 3D Firearms and Parts Seized in Transnational Week of Action, 2025. Available online: https://www.abf.gov.au/newsroom-subsite/Pages/More-than-1000-illicit-firearms-and-parts-3D-firearms-and-parts-seized-in-transnational-week-of-action.aspx (accessed on 7 May 2026).
9. Walther, G. Printing Insecurity? The Security Implications of 3D-Printing of Weapons. Sci. Eng. Ethics 2015, 21, 1435–1445.
10. Lindstrom, G. Why should we care about 3D-printing and what are potential security implications. GCSP Policy Pap. 2014, 6, 1–4.
11. Taylor, G.R.; Chosak, A.J.; Brewer, P.C. OVVV: Using virtual worlds to design and evaluate surveillance systems. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2007; pp. 1–8.
12. Marin, J.; Vázquez, D.; Gerónimo, D.; López, A.M. Learning appearance in virtual scenarios for pedestrian detection. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2010; pp. 137–144.
13. Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2016; pp. 4340–4349.
14. Richter, S.R.; Vineet, V.; Roth, S.; Koltun, V. Playing for data: Ground truth from computer games. arXiv 2016, arXiv:1608.02192.
15. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2012; pp. 3354–3361.
16. Tremblay, J.; Prakash, A.; Acuna, D.; Brophy, M.; Jampani, V.; Anil, C.; To, T.; Cameracci, E.; Boochoon, S.; Birchfield, S. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. arXiv 2018, arXiv:1804.06516.
17. Prakash, A.; Boochoon, S.; Brophy, M.; Acuna, D.; Cameracci, E.; State, G.; Shapira, O.; Birchfield, S. Structured domain randomization: Bridging the reality gap by context-aware synthetic data. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2019; pp. 7249–7255.
18. Neto, A.d.S.D.; Campello, R.M. Chess position identification using pieces classification based on synthetic images generation and deep neural network fine-tuning. In Proceedings of the 2019 21st Symposium on Virtual and Augmented Reality (SVR); IEEE: New York, NY, USA, 2019; pp. 152–160.
19. Qiu, W.; Yuille, A. UnrealCV: Connecting computer vision to Unreal engine. In Proceedings of the Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Proceedings, Part III 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 909–916.
20. Barth, R.; IJsselmuiden, J.; Hemming, J.; Van Henten, E.J. Data synthesis methods for semantic segmentation in agriculture: A Capsicum annuum dataset. Comput. Electron. Agric. 2018, 144, 284–296.
21. Greff, K.; Belletti, F.; Beyer, L.; Doersch, C.; Du, Y.; Duckworth, D.; Fleet, D.J.; Gnanapragasam, D.; Golemo, F.; Herrmann, C.; et al. Kubric: A scalable dataset generator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2022; pp. 3749–3761.
22. Salazar-González, J.L.; Zaccaro, C.; Álvarez García, J.A.; Soria-Morillo, L.M.; Caparrini, F.S. Real-time gun detection in CCTV: An open problem. Neural Netw. 2020, 132, 297–308.
23. González, J.L.S.; Álvarez-García, J.A.; Rendón-Segador, F.J.; Carrara, F. Conditioned cooperative training for semi-supervised weapon detection. Neural Netw. 2023, 167, 489–501.
24. Torregrosa-Domínguez, A.; Álvarez García, J.A.; Salazar-González, J.L.; Soria-Morillo, L.M. Effective Strategies for Enhancing Real-Time Weapons Detection in Industry. Appl. Sci. 2024, 14, 8198.
25. Ruiz-Santaquiteria, J.; Velasco-Mata, A.; Vallez, N.; Bueno, G.; Alvarez-Garcia, J.A.; Deniz, O. Handgun detection using combined human pose and weapon appearance. IEEE Access 2021, 9, 123815–123826.
26. Olmos, R.; Tabik, S.; Herrera, F. Automatic handgun detection alarm in videos using deep learning. Neurocomputing 2018, 275, 66–72.
27. Bhatt, A.; Ganatra, A. Explosive weapons and arms detection with singular classification (WARDIC) on novel weapon dataset using deep learning: Enhanced OODA loop. Eng. Sci. 2022, 20, 252–266.
28. Haq, N.U.; Fraz, M.M.; Hashmi, T.; Shahzad, M. Orientation aware weapons detection in visual data: A benchmark dataset. Computing 2022, 104, 2581–2604.
29. Ohman, W. Data Augmentation Using Military Simulators in Deep Learning Object Detection Applications. Master’s Thesis, KTH, School of Electrical Engineering and Computer Science (EECS), Stockholm, Sweden, 2019.
30. Waite, J.R.; Feng, J.; Tavassoli, R.; Harris, L.; Tan, S.Y.; Chakraborty, S.; Sarkar, S. Active shooter detection and robust tracking utilizing supplemental synthetic data. arXiv 2023, arXiv:2309.03381.
31. Bhowmik, N.; Wang, Q.; Gaus, Y.F.A.; Szarek, M.; Breckon, T.P. The good, the bad and the ugly: Evaluating convolutional neural networks for prohibited item detection using real and synthetically composited X-ray imagery. arXiv 2019, arXiv:1909.11508.
32. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 91–99.
33. Kaminetzky, A.; Mery, D. In-depth analysis of automated baggage inspection using simulated X-ray images of 3D models. Neural Comput. Appl. 2024, 36, 18761–18780.
34. Kang, M.; Sun, H. EthicalFab: Toward ethical fabrication process through privacy-preserving illegal product detection. Manuf. Lett. 2025, 44, 1425–1431.
35. Cani, J.; Mademlis, I.; Mancuso, M.; Paternoster, C.; Adamakis, E.; Margetis, G.; Chambon, S.; Crouzil, A.; Lechelek, L.; Dede, G.; et al. CEASEFIRE: An AI-Powered System for Combating Illicit Firearms Trafficking. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData); IEEE: New York, NY, USA, 2024; pp. 2697–2705.
36. Hasselgren, J.; Munkberg, J.; Lehtinen, J.; Aittala, M.; Laine, S. Appearance-Driven Automatic 3D Model Simplification. Proc. EGSR (DL) 2021, 29, 85–97.
37. Liu, Z.; Zhang, C.; Cai, H.; Qv, W.; Zhang, S. A model simplification algorithm for 3D reconstruction. Remote Sens. 2022, 14, 4216.
38. Jocher, G.; Qiu, J. Ultralytics YOLO11, 2024. Available online: https://docs.ultralytics.com/models/yolo11 (accessed on 7 May 2026).
39. Mittal, A.; Moorthy, A.K.; Bovik, A.C. Blind/Referenceless Image Spatial Quality Evaluator. In Proceedings of the 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR); IEEE: New York, NY, USA, 2011; pp. 723–727.
40. Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying mmd gans. arXiv 2018, arXiv:1801.01401.

Figure 1. Cases involving 3D-printed firearms or firearm parts per 100,000 population (2017–2024). Numbers inside countries indicate the absolute number of reported cases. F3DP, fully 3D-printed firearms; Hybrid, hybrid firearms; PKC, parts-kit completions/conversions; Parts, 3D-printed firearm parts. Reproduced from The emergence of 3D-printed firearms: An analysis of media and law enforcement reports [4], licensed under CC BY 4.0.

Figure 2. LEOPARD workflow from Web to Experimentation.

Figure 3. Journey of a barrel retainer: STL file downloaded from DEFCAD (a), scene composition (b), 3D print produced from the STL file (c), and detection of the printed part by YOLO11 trained on synthetic images (d).

Figure 4. Photorealistic rendering of selected weapon components using Unreal Engine 5.

Figure 5. Sample synthetic images showing color, background, rotation, and position variations.

Figure 6. Synthetic negative samples used to reduce false positives during training.

Figure 7. Disassembled FGC-9 MkII with annotated components defining the five detection classes used in LEOPARD-Zero: upper receiver, lower receiver, pistol grip, barrel retainer, and trigger rot.

Figure 8. Training and validation loss curves, and validation mAP@50, over 50 training epochs for all four model variants. (Top): bounding box loss (training: dashed, validation: solid). (Middle): classification loss (training: dashed, validation: solid). (Bottom): validation mAP@50.

Figure 9. Confusion matrices (normalized by row) for YOLO11s and YOLO11m trained on the LEOPARD-Zero dataset, evaluated on the video test set.

Figure 10. Confusion matrices (normalized by row) for YOLO11s and YOLO11m trained on the LEOPARD-Twelve dataset, evaluated on the video test set.

Figure 11. Sample true positives on the test set from models trained on LEOPARD-Zero (blue) and LEOPARD-Twelve (green).

Figure 12. One of the few false positives, in which a laptop charger is mistaken for an upper receiver.

Table 1. Distribution of synthetic images by weapon color, dark (DW) or light (LW), and by whether the scene contains occlusions (OCL) or not (NOCL), expressed in thousands (k).

Category	Training	Validation	Total
LW_NOCL	7.00 k	1.75 k	8.75 k
DW_NOCL	1.20 k	0.30 k	1.50 k
LW_OCL	20.00 k	5.00 k	25.00 k
DW_OCL	25.80 k	6.45 k	32.25 k
BG_OBJS	6.00 k	1.50 k	7.50 k
Total	60.00 k	15.00 k	75.00 k
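
The split implied by Table 1 can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the published counts, not part of the LEOPARD pipeline; the dictionary and the `train_fraction` helper are names introduced here for illustration.

```python
# Table 1 counts in thousands of images: (training, validation, total).
table1 = {
    "LW_NOCL": (7.00, 1.75, 8.75),
    "DW_NOCL": (1.20, 0.30, 1.50),
    "LW_OCL": (20.00, 5.00, 25.00),
    "DW_OCL": (25.80, 6.45, 32.25),
    "BG_OBJS": (6.00, 1.50, 7.50),
}


def train_fraction(train, val):
    """Share of a category's images assigned to training."""
    return train / (train + val)


# Per-category training share, rounded to absorb floating-point noise.
fractions = {
    name: round(train_fraction(train, val), 4)
    for name, (train, val, _total) in table1.items()
}

# Column sums, to compare against the "Total" row of the table.
totals = tuple(
    round(sum(row[i] for row in table1.values()), 2) for i in range(3)
)

print(fractions)
print(totals)
```

Every category comes out at the same training share, which suggests the 80/20 split was applied per category rather than only to the dataset as a whole.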

Table 2. Performance metrics for YOLO11 models on the video test set (±values represent 95% confidence intervals from repeated evaluation over three non-overlapping test subsets; see Section 4.2). Bold values highlight the best performance for each metric.

Model	Dataset	mAP@50	mAP@75	Precision	Recall	F1-Score	mAP@0.5:0.95
YOLO11s	LEOPARD-Zero	83.12% (±2.48)	81.13% (±2.13)	86.13% (±4.58)	75.29% (±5.18)	80.33% (±2.88)	72.26% (±2.37)
YOLO11m	LEOPARD-Zero	79.87% (±7.27)	77.92% (±6.02)	91.97% (±7.15)	67.41% (±6.20)	77.79% (±6.26)	66.93% (±5.92)
YOLO11s	LEOPARD-Twelve	86.78% (±0.49)	84.78% (±2.65)	87.24% (±5.63)	80.41% (±2.53)	83.67% (±2.10)	76.21% (±2.33)
YOLO11m	LEOPARD-Twelve	80.41% (±4.72)	78.73% (±4.57)	89.17% (±2.35)	68.50% (±6.63)	77.47% (±4.90)	66.79% (±4.06)
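
As a reading aid for Table 2, the F1-score column can be related to the precision and recall columns through the usual harmonic mean. The sketch below recomputes F1 from the tabulated precision and recall; the `f1_score` helper and the `rows` dictionary are illustrative names, and the assumption that the reported F1 may have been averaged per test subset (which would explain small deviations) is ours, not stated in the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 2.
rows = {
    ("YOLO11s", "LEOPARD-Zero"): (86.13, 75.29, 80.33),
    ("YOLO11m", "LEOPARD-Zero"): (91.97, 67.41, 77.79),
    ("YOLO11s", "LEOPARD-Twelve"): (87.24, 80.41, 83.67),
    ("YOLO11m", "LEOPARD-Twelve"): (89.17, 68.50, 77.47),
}

# Difference between the recomputed and the reported F1, per model/dataset.
deviations = {
    key: f1_score(p, r) - reported for key, (p, r, reported) in rows.items()
}

for (model, dataset), delta in deviations.items():
    print(f"{model} on {dataset}: recomputed - reported = {delta:+.3f}")
```

The recomputed values stay within a few hundredths of a percentage point of the reported ones, so the columns are mutually consistent.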

Table 3. Number of component crops available per class after bounding-box extraction from annotated real images.

Class	Real Crops	Synthetic Crops
barrel_retainer	1588	1000
lower_receiver	1686	1000
pistol_grip	1920	1000
trigger_rot	1572	1000
upper_receiver	1640	1000
Total	8406	5000

Table 4. BRISQUE scores (mean ± std) for real and synthetic crops per component class. Lower values indicate higher perceptual quality.

Class	Real	Synthetic	Gap (Real − Synthetic)
barrel_retainer	$46.13 \pm 16.15$	$22.56 \pm 14.84$	$23.57$
lower_receiver	$44.37 \pm 17.30$	$21.43 \pm 14.22$	$22.94$
pistol_grip	$51.63 \pm 15.12$	$21.36 \pm 16.34$	$30.27$
trigger_rot	$57.88 \pm 16.87$	$48.10 \pm 11.91$	$9.78$
upper_receiver	$46.96 \pm 19.04$	$21.87 \pm 14.17$	$25.09$

Table 5. Kernel Inception Distance (KID, mean ± std) between real and synthetic crops per component class. Lower values indicate more similar feature-space distributions.

Class	KID
pistol_grip	$0.146 \pm 0.004$
lower_receiver	$0.150 \pm 0.004$
barrel_retainer	$0.159 \pm 0.005$
upper_receiver	$0.179 \pm 0.005$
trigger_rot	$0.223 \pm 0.007$
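
For readers unfamiliar with KID, the following sketch shows how such a score is typically computed: an unbiased squared-MMD estimate with the cubic polynomial kernel of Bińkowski et al., averaged over random subsets to give a mean and standard deviation. It is a minimal illustration, not the authors' evaluation code. The Inception feature extraction is omitted and random vectors stand in for the embeddings; the function names, the subset count, and the subset size are assumptions chosen for the example.

```python
import numpy as np


def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 used by KID."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3


def mmd2_unbiased(x, y):
    """Unbiased squared-MMD estimate between feature sets x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x)
    k_yy = polynomial_kernel(y, y)
    k_xy = polynomial_kernel(x, y)
    # Drop diagonal terms so each within-set sum runs over i != j.
    sum_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    sum_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return sum_xx + sum_yy - 2.0 * k_xy.mean()


def kid(real, synth, n_subsets=20, subset_size=200, seed=0):
    """Mean and std of the MMD^2 estimate over random equal-size subsets."""
    rng = np.random.default_rng(seed)
    size = min(subset_size, len(real), len(synth))
    scores = []
    for _ in range(n_subsets):
        r = real[rng.choice(len(real), size, replace=False)]
        s = synth[rng.choice(len(synth), size, replace=False)]
        scores.append(mmd2_unbiased(r, s))
    return float(np.mean(scores)), float(np.std(scores))


# Placeholder features: random vectors stand in for Inception embeddings.
rng = np.random.default_rng(42)
feats_a = rng.normal(0.0, 1.0, size=(500, 64))
feats_b = rng.normal(0.0, 1.0, size=(500, 64))  # same distribution as feats_a
feats_c = rng.normal(0.5, 1.0, size=(500, 64))  # shifted distribution

kid_same, _ = kid(feats_a, feats_b)
kid_shifted, _ = kid(feats_a, feats_c)
print(f"same distribution: {kid_same:.4f}, shifted: {kid_shifted:.4f}")
```

Because the estimator is unbiased, two samples from the same distribution give a score close to zero (it can even be slightly negative), while a distribution shift pushes it upward. This is the sense in which the lower KID values in Table 5 indicate closer real and synthetic feature distributions.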

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
