1. Introduction
In modern manufacturing, weld grinding remains largely manual, resulting in inconsistency and inefficiency. The lack of comprehensive industrial datasets constrains the development of automated visual inspection systems. Hence, constructing an open, realistic weld dataset is essential for advancing intelligent welding.
The industrial environment of welding and grinding is harsh, and manual grinding has a severe adverse effect on workers' health [1,2]. Nevertheless, welding and grinding are fundamental manufacturing processes, widely used across modern industry in applications such as spacecraft, large rockets, new-energy vehicle components, subways, and complex bridges. In principle, intelligent control technology can enable industrial robots to replace humans in welding and weld grinding [3,4,5]. However, many research results cannot be applied in actual production scenes because the systems lack the ability to perceive and reason about the current working environment.
Vision-guided industrial robots collect images and videos of the current scene using cameras, parse the current working scene information through object detection, image segmentation, and semantic understanding, and transmit it to the control center to guide industrial robot operation. Therefore, the intelligent development of industrial manufacturing is inseparable from computer vision technology.
In welding and grinding scenes, visual understanding of welds [6] is a prerequisite for further judgment and manipulation of welded parts. However, research on weld image learning, understanding, and analysis currently faces major obstacles, most of which stem from the substantial lag of weld image datasets behind the state of computer vision technology.
The advancement of computer-aided inspection in industrial manufacturing relies heavily on the availability of robust, comprehensive datasets. In weld quality control, X-ray imaging is a critical non-destructive testing method, and several public datasets have been established to facilitate the development of automated defect-detection algorithms. The WDXI dataset [7] offers 13,766 16-bit TIF images covering seven defect types, but severe class imbalance hampers unbiased training. The GDXray-Welds dataset [8] contains only 68 raw images; extensive manual cropping and annotation are required to generate usable patches, introducing inconsistency despite its successful use in deep-learning studies. The RIAWELC dataset [9] provides 24,407 preprocessed 224 × 224 pixel images across four classes, achieving >93% classification accuracy, but its narrow defect taxonomy restricts its applicability to broader industrial scenarios. Collectively, these X-ray datasets focus on internal defects, exhibit class imbalance, demand heavy preprocessing, and lack comprehensive defect categories, limiting their utility for surface-based tasks such as weld-bead detection or reinforcement analysis.
In the field of weld seam detection and recognition, the construction of high-quality weld-surface datasets is a prerequisite for advancing computer vision algorithms. Recent studies have proposed several relevant datasets. The Welding Seams Dataset on Kaggle [10] contains 1394 color images with a uniform resolution of 448 × 448 pixels. It provides binary masks for weld-bead segmentation, but the masks are sparse and represent only straight-bead welds, preventing evaluation of multi-type classification; annotation granularity is limited to coarse pixel masks without instance-level delineation. Walther et al. [11] introduced a multimodal dataset that couples probe-displacement data with 32 × 32 pixel thermal images for 100 samples. The thermal modality captures the temperature field during laser-beam welding, yet the dataset annotates only the overall weld quality as Sound or Unsound; no geometric or type labels are supplied, and the low spatial resolution restricts fine-grained feature learning. The MLMP Dataset [12] consists of 1400 simulated cross-section profiles rendered at 480 × 450 pixels. Each sample includes precise geometric parameters derived from Flow-3D simulations, providing annotation accuracy that exceeds typical image-based labeling. However, because the data are synthetically generated, they cannot fully reproduce the texture, noise, and illumination variations encountered in real industrial environments, and the dataset does not contain explicit weld-type categories. Zhao et al. [13] aggregated 1016 raw images from open-source repositories, laboratory captures, and web crawling, then expanded the collection to 3048 images through data augmentation. The dataset supplies pixel-level semantic segmentation masks, enabling instance-aware training. Nevertheless, it lacks weld-type annotations, and the original image resolution is undisclosed, which hampers tasks requiring high-precision spatial detail.
In general, these datasets suffer from several common shortcomings. They contain relatively few images, usually only a few hundred to a few thousand, which limits the data available for training robust deep learning models. Their image resolutions are low, ranging from 32 × 32 to about 480 × 450 pixels, making it difficult to capture fine weld textures and subtle defects. Their coverage of weld types is narrow, often representing only a single geometry or a binary quality label, and therefore does not reflect the diversity of real-world welding processes. Their annotations are coarse, typically limited to sparse binary masks or simple quality tags, without instance-level segmentation. These limitations have hindered the creation of a universal dataset with multiple categories and high-precision weld surface information, and have constrained progress in computer-vision-driven weld inspection and intelligent welding applications.
To address this deficiency, we establish in this paper a weld surface image dataset, INWELD. The raw images were collected, sifted, processed, and annotated. The training, validation, and test sets were balanced with respect to weld geometry and welding method. Ablation experiments on object detection and instance segmentation confirmed the effectiveness of the dataset's balanced distribution. In addition, the performance of the multi-category annotated weld image dataset on object detection and instance segmentation was evaluated experimentally. The main contributions of this paper are as follows:
- (1) A weld surface image dataset was established for the first time. The images were captured in working factories and given detailed, dense manual annotation, making the dataset suitable for object detection and instance segmentation.
- (2) Based on actual industrial production, a dataset partitioning method was proposed that achieves a balanced distribution of weld geometries and welding methods and yields higher accuracy in object detection and instance segmentation than the usual random partitioning.
- (3) Multi-category annotations were added to the dataset, achieving satisfactory prediction results in both object detection and instance segmentation; the geometry and welding method of a tested weld can thus be obtained without additional computation.
The rest of this paper is organized as follows. Section 2 introduces data acquisition and processing, Section 3 performs balanced partitioning of the dataset, Section 4 presents experimental studies, Section 5 discusses the results, and Section 6 concludes the paper.
2. Data Acquisition and Processing
2.1. Data Collection
We visited six welding factories, including Shanxi Taiyuan Jinrong Robot Co., Ltd., and its partners, to capture on-site weld images.
Figure 1 shows the shooting environment for the weld photographs. All images in the dataset were captured with a Xiaomi MIX 2 smartphone (Xiaomi, Beijing, China; 12 MP rear camera, f/2.0 aperture, 27 mm equivalent focal length). The phone's automatic exposure algorithm was used, so ISO and shutter speed vary across images. Photographs were taken under ambient natural light; external light sources were added only when natural light was insufficient. A variety of weld geometries were collected, including flat straight welds, flat curved welds, spatially spliced straight welds, spatial curved welds, and spot welds. The welding methods mainly include manual arc welding, carbon dioxide (CO2) shielded welding, and submerged arc welding.
We acquired a total of 17,811 original weld images. A single image is not limited to one weld type: some images contain several kinds of welds, or several welds of the same type. In addition, most welds are long strips (spot welds being the exception), which makes it difficult for a single photograph to capture a complete weld. In such cases, each image captures a portion of the weld, and the remaining portions appear in other photographs. There is no need to stitch a whole long weld together, because the weld acquisition system obtains and processes each weld image segment separately.
2.2. Image Sifting and Processing
The original photographs were screened for quality. The screening criteria are as follows: (1) If the weld is unclear or the image is blurry, it is judged as unqualified. (2) If multiple photos of a weld were taken at the same angle, only the clearest is kept. (3) If a long weld was photographed in sections and the differences between sections are too small, redundant sections are deleted; the retention standard is that new weld content is visible to the naked eye. (4) When an image contains multiple welds, it is retained if at least one weld is clear; unclear welds are marked as unqualified. (5) If the same weld or weld combination was photographed from angles that differ too little, redundant shots are deleted; the retention standard is a visible change in viewing angle and weld shape. (6) When slag or powder partially obstructs a weld, the image is retained only if the weld edge line is intact and identifiable by the naked eye.
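The screening above was performed entirely by eye. As a purely illustrative sketch, a variance-of-Laplacian score (a common blur heuristic, not part of this paper's procedure) could pre-flag candidates for criterion (1) before manual review; the threshold and directory below are hypothetical assumptions.

```python
import glob

import cv2  # OpenCV


def blur_score(path: str) -> float:
    """Variance of the Laplacian; lower values indicate a blurrier image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())


# Hypothetical threshold; the actual screening in this work was done by eye.
BLUR_THRESHOLD = 100.0
flagged = [p for p in glob.glob("raw_welds/*.jpg") if blur_score(p) < BLUR_THRESHOLD]
print(f"{len(flagged)} images flagged for manual review")
```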
After screening, a total of 9638 qualified images were obtained. Through rotation and cropping, the weld surface images were unified to 1440 × 1080 pixels to ensure the overall weld width was at least 100 pixels. Then the unified-size images were screened again, and a total of 8536 weld surface photos were obtained, including 900 holey weld images.
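A minimal sketch of the rotate-and-crop normalization described above, assuming OpenCV; the rotation angle and crop origin are per-image choices and are hypothetical placeholders here, while the output size matches the dataset's unified 1440 × 1080 resolution.

```python
import cv2  # OpenCV


def normalize_weld_image(path: str, angle_deg: float, crop_x: int, crop_y: int,
                         out_w: int = 1440, out_h: int = 1080):
    """Rotate an image about its center, then crop a fixed 1440 x 1080 window."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(img, rot, (w, h))
    return rotated[crop_y:crop_y + out_h, crop_x:crop_x + out_w]
```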
The final weld surface images mainly cover four geometric forms (spot welds, straight welds, curved welds, and holey welds) and three welding methods (CO2 shielded welding, submerged arc welding, and manual arc welding). The distribution statistics of the welds are shown in Table 1. There are 17,133 weld instances in total. By geometry, there are 1054 spot welds, 9798 straight welds, 2956 curved welds, and 3325 holey welds. By welding method, there are 7918 CO2 shielded welds, 8751 manual arc welds, and 207 submerged arc welds. All spot welds and holey welds were produced by manual arc welding.
To preserve the original industrial scenes in which the welds are located, we performed no image processing to adjust brightness or contrast; the only intervention was supplementing external light sources during capture when natural light was insufficient. Our aim is to use raw images from natural industrial scenes while maintaining weld clarity.
2.3. Data Annotation
To ensure professional and accurate annotation, 12 mechanical engineering graduate students who are familiar with welding and have hands-on welding experience manually annotated the dataset. After annotation was completed, the labels underwent multiple rounds of verification and revision to ensure their accuracy and precision. The whole process took one year.
We manually annotated the weld images in two categories, welds with holes and welds without holes, using polygon annotations. For welds without holes we used LabelMe [14], and for welds with holes we used Supervisely [15] for online manual annotation. Samples of densely annotated weld images are shown in Figure 2, where the left image is annotated in LabelMe with dense red points and the right image is annotated in Supervisely with yellow areas.
A random 2% subset of the weld images (170 images) was independently labeled by all 12 annotators, and pairwise Intersection over Union (IoU) was computed for each image. The average IoU across the subset was 0.88, exceeding the commonly used consistency threshold of 0.80 [16,17,18].
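A sketch of this consistency check, assuming each annotator's polygons have already been rasterized to a boolean mask of the image size (an assumed preprocessing step; the paper's exact script is not specified):

```python
from itertools import combinations

import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks of identical shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(a, b).sum() / union)


def mean_pairwise_iou(masks: list) -> float:
    """Average IoU over all annotator pairs for one image (12 annotators -> 66 pairs)."""
    return float(np.mean([mask_iou(a, b) for a, b in combinations(masks, 2)]))
```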
To enhance the versatility of the weld surface image dataset, we converted the annotation files of the non-hole welds into the COCO dataset format [19] after completing the annotation. However, because COCO polygon annotations cannot represent objects with holes, we retained the original Supervisely data format for welds with holes.
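A hedged sketch of such a conversion for the non-hole welds, using the standard LabelMe JSON fields and the COCO annotation schema; the single-category id and the shoelace area computation are conventional choices, not necessarily the authors' exact tooling.

```python
import json


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [[x, y], ...]."""
    n = len(points)
    return abs(sum(points[i][0] * points[(i + 1) % n][1]
                   - points[(i + 1) % n][0] * points[i][1]
                   for i in range(n))) / 2.0


def labelme_to_coco(labelme_files, category_name="weld"):
    """Merge LabelMe JSON files into one COCO-style dictionary."""
    coco = {"images": [], "annotations": [],
            "categories": [{"id": 1, "name": category_name}]}
    ann_id = 1
    for img_id, path in enumerate(labelme_files, start=1):
        with open(path) as f:
            data = json.load(f)
        coco["images"].append({"id": img_id, "file_name": data["imagePath"],
                               "width": data["imageWidth"],
                               "height": data["imageHeight"]})
        for shape in data["shapes"]:  # each shape is one annotated polygon
            xs = [p[0] for p in shape["points"]]
            ys = [p[1] for p in shape["points"]]
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id, "category_id": 1,
                "segmentation": [[c for pt in shape["points"] for c in pt]],
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": polygon_area(shape["points"]),
                "iscrowd": 0})
            ann_id += 1
    return coco
```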
3. Dataset Representative Split
Usually, a dataset is randomly split into a training set, a test set, and a validation set [19]. However, this method is highly random and cannot guarantee a balanced distribution of object states across the three subsets, especially for highly variable weld images. Therefore, to divide the dataset more reasonably and ensure that every object detection and instance segmentation model sees the variability across welding methods and weld morphologies, we distribute the 7636 non-hole weld images evenly across the training, test, and validation sets [20].
We enforce a balanced distribution of welding methods and weld geometries through basic allocation criteria: every welding method (CO2 shielded welding, submerged arc welding, and manual arc welding) and every weld geometry (spot welds, straight welds, and curved welds) appears in the training, test, and validation sets in the same proportions, while the three sets as a whole are split in a 7:2:1 ratio. This yields 5350 training images, 1523 test images, and 763 validation images, all with dense annotations in COCO format.
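One way to realize this balanced split is a per-stratum shuffle-and-slice, sketched below under the simplifying assumption that each image carries a single (welding method, geometry) key; images containing several weld types would need an additional grouping rule that this sketch omits.

```python
import random
from collections import defaultdict


def balanced_split(items, key_fn, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split items into train/test/val so every stratum keeps the same ratios."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key_fn(item)].append(item)  # e.g. key ("CO2_shielded", "straight")
    train, test, val = [], [], []
    for members in strata.values():
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_test = round(len(members) * ratios[1])
        train += members[:n_train]
        test += members[n_train:n_train + n_test]
        val += members[n_train + n_test:]
    return train, test, val
```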
5. Discussion
We conducted ablation experiments to compare the equal-distribution splitting strategy with a conventional random split of training, validation, and test sets. The balanced distribution approach significantly improves Average Precision for both object detection and instance segmentation, demonstrating that a more balanced dataset yields more reliable performance across algorithms. Subsequently, we evaluated the dataset under two annotation schemes: a single-category scheme, where all weld regions are labeled simply as “weld,” and a multi-category scheme that annotates eight classes based on welding type and weld geometry. Across five representative detectors and segmenters, both schemes achieve high detection accuracies, confirming the quality and completeness of the annotations. Error analysis reveals that most misclassifications occur between geometrically similar welds and between welds that share the same shape but differ in welding methods. The rare “curve-others” class, with very few training samples, exhibits the lowest AP, indicating that limited data for minority categories directly degrade performance. Although some welding methods are infrequently used in real production, resulting in fewer images and slightly lower performance for those classes, the probability of encountering such welds on the shop floor is minimal, so the overall impact on practical applications is negligible.
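For reference, AP metrics of the kind reported in these experiments are conventionally computed with pycocotools; a generic evaluation sketch follows, in which the file names are placeholders rather than artifacts of this paper.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: substitute the INWELD ground truth and a model's results.
coco_gt = COCO("inweld_test.json")
coco_dt = coco_gt.loadRes("model_detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")  # use "bbox" for detection
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, and size-stratified APs
```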
From an industrial perspective, the dataset can be leveraged as a weld-region detection and segmentation system that supports multiple downstream tasks. For facilities that only need to locate welds without distinguishing their type, the single-category annotation enables a lightweight detector to run on edge hardware for real-time weld localization, facilitating robotic tracking, automated welding-parameter logging, or downstream quality-control pipelines. When a production line uses multiple welding methods, multi-category annotations enable a single model to simultaneously classify both the geometric shape and the welding method, eliminating the need for separate detectors for each technique and simplifying system integration. Moreover, because the dataset provides explicit geometry and welding method labels, it can drive higher-level manufacturing operations such as automated weld polishing, in which the system can infer the appropriate polishing tool path and parameters for each detected weld region, ensuring consistent surface finish while avoiding over-polishing of delicate areas. This unified detection–segmentation framework thus serves as a versatile foundation for real-time inspection, flexible multi-process monitoring, and advanced robotic manipulation in modern welding factories.
In summary, the proposed weld image dataset, INWELD, available in both single-category and multi-category formats, offers a scientifically balanced, richly annotated resource for training robust object detection and instance segmentation models. The comprehensive evaluation across multiple algorithms confirms its suitability for a wide range of industrial scenarios, from basic weld localization to sophisticated process-aware robotic operations such as automated polishing. By addressing class imbalance through balanced distribution splitting and providing detailed geometry and welding method annotations, the dataset not only advances research in weld-region perception but also facilitates practical deployment in contemporary manufacturing environments.
6. Conclusions
To achieve weld object detection and instance segmentation in industrial scenarios, we built the first weld-surface image dataset, INWELD, collecting 17,811 raw weld photographs and, after screening, retaining 8536 high-quality annotated images. The dataset is split into training, test, and validation sets in a 7:2:1 ratio. In comparative experiments between random and balanced data splits, CenterNet and YOLOv7 achieved AP50 of 89.8% and 81.3% under the balanced split, while Mask R-CNN and Deep Snake achieved instance segmentation AP50 of 88.7% and 84.9%, respectively. Notably, Deep Snake gained 3.3% in AP50 over the random split, demonstrating that a balanced dataset markedly improves model robustness. With multi-category annotations, the AP50 of CenterNet and YOLOv7 exceeded 80%, while Mask R-CNN, YOLACT, and Deep Snake achieved AP50 of 61.9%, 69.5%, and 72.0%, respectively. The multi-category scheme also enables joint prediction of weld geometry and welding method, supporting real-time weld localization, cross-process monitoring, intelligent post-processing, and predictive maintenance in manufacturing environments. However, the current collection is limited by relatively uniform indoor lighting and by samples drawn from factories in a single geographic region, which may affect generalization to more varied illumination conditions and material finishes. In the future, we will extend the dataset to include 3D weld surface acquisition and enrich it with diverse lighting and surface-contamination scenarios to improve robustness and enable domain adaptation techniques.