1. Introduction
The wheel hub is a crucial load-bearing and transmission component of an automobile, and its quality directly affects the safety and service life of the entire vehicle. In recent years, as requirements for automotive lightweighting, energy conservation, and emission reduction have continued to rise, aluminum alloy, with its low density, excellent thermal and electrical conductivity, and good plasticity, has become the mainstream raw material for industrial wheel hub production [1]. Aluminum alloy wheels are manufactured by low-pressure casting, gravity casting, forging, spinning, and other processes. Compared with the other methods, low-pressure casting offers higher production efficiency, better dimensional consistency, and a lower defect rate in finished products, and it has become one of the main processes for manufacturing aluminum alloy wheels [2]. However, owing to the complexity and inherent characteristics of the casting process, aluminum alloy wheels are highly prone to defects during die casting [3]. These defects not only degrade the mechanical properties of the wheels but may also create safety hazards in service [4]. Detecting substandard products early can prevent safety accidents and reduce manufacturers' later maintenance costs.
Non-destructive testing (NDT) refers to inspection methods that identify surface and internal defects of materials or components without damaging the inspected object or affecting its subsequent performance. Commonly used NDT methods currently include ultrasonic testing [5], infrared thermal imaging [6], radiographic testing [7], and visual inspection. Compared with detection methods based on other radiation sources, X-ray inspection can reveal the density distribution and internal structural defects of metals at high resolution. It also offers strong traceability and wide applicability, and it has been widely applied in industrial inspection [8,9,10]. However, most wheel manufacturers still rely on manual visual inspection: operators interpret X-ray images based on their own experience to determine whether a wheel contains internal defects. This process has many drawbacks, including high labor costs, subjective bias, and the inability to sustain efficient work over long periods.
In recent years, with the development of deep learning, AI-based aluminum alloy defect detection methods, which offer high precision and efficiency, have gradually been replacing manual inspection [11,12]. Existing defect detection methods mainly employ deep convolutional neural networks (CNNs) or Transformer architectures to localize and classify defects. For instance, the authors of [13] introduced an over-parameterization-based reparameterized convolution module into YOLOv5, improving both the accuracy and the speed of defect detection. Du et al. [14] proposed Soft-IOU, an evaluation criterion that accounts for the blurred boundaries of defects and for complex scenes in which a single ground-truth box corresponds to multiple predictions, and vice versa. Mery [15] trained YOLOv5 with simulated elliptical defects to mitigate the overfitting that arises when conventional models are repeatedly trained on the same defect views. Wu et al. [16] applied adaptive noise reduction to X-ray images to improve image quality, thereby improving the model's ability to detect wheel hub defects. Although CNN-based models have achieved remarkable results, their limited receptive fields make it difficult to fully capture the edge contours of blurred defects. The Transformer, with its self-attention mechanism, is well suited to visual tasks that require long-range dependency modeling and is therefore increasingly popular in industrial defect detection. Li et al. [17] applied Prconv to DETR-based aluminum alloy casting defect detection, reducing computational complexity while enhancing spatial feature extraction. Ye et al. [18] used a deformable attention mechanism to focus on defect regions, addressing the large scale variation of surface defects and the complex background textures of aluminum alloys, which significantly improved the capture of key information. However, these methods do not fully account for the characteristics of X-ray images of aluminum alloy wheel defects: the contrast between defects and background is low, and the defects have random shapes and are mostly discretely distributed. As a result, such models localize defects weakly and cannot identify defect edges accurately and efficiently. Therefore, given manufacturers' real-time requirements, it is particularly important to develop a defect detection model for aluminum alloy wheel hubs that can cope with such complex conditions.
Recently, DEIM, a new-generation end-to-end detection framework, has demonstrated outstanding performance in aluminum alloy defect detection [19]. The framework adopts a dense one-to-one (Dense O2O) matching strategy and a Matchability-Aware Loss (MAL). Together, these achieve a supervision density similar to that of one-to-many (O2M) assignment and, by imposing greater penalties on low-IoU matches while maintaining high-quality matches, significantly accelerate the convergence of DETR [20], setting a new benchmark in real-time object detection. Owing to the global modeling capability of its Transformer architecture, DEIM can capture the edge information of discrete defects in aluminum alloy wheel images more effectively and is more robust to complex backgrounds. Despite these advantages in detection accuracy and convergence speed, however, DEIM still has several shortcomings in aluminum alloy wheel defect detection. DEIM uses the standard HGNetV2 as its backbone, which responds weakly to low-contrast defects and extracts edges poorly, so the model is easily disturbed by the background and its detection capability declines. In addition, the Efficient Hybrid Encoder in DEIM uses Attention-based Intra-scale Feature Interaction (AIFI) and CNN-based Cross-scale Feature Fusion (CCFF) to process multi-scale features efficiently. However, CCFF relies on layer-by-layer convolution for cross-scale fusion, a design that cannot effectively aggregate edge details for small targets or targets lacking local detail, which makes defect edge contours difficult to identify. Moreover, CCFF lacks efficient cross-module feature interaction, so when defects in aluminum alloy wheels vary widely in scale, the model has difficulty distinguishing them from background noise and cannot complete the detection task efficiently.
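To make the MAL mechanism concrete, the following Python sketch evaluates the per-query loss as we read its definition in the original DEIM paper: a matched query is trained toward the soft target q^γ, where q is the IoU with its matched ground truth, and an unmatched query receives a negative term weighted by p^γ. The function name, the scalar (non-batched) form, and γ = 1.5 are illustrative assumptions; the exact formulation should be checked against the DEIM paper before reuse.

```python
import math


def mal_loss(p: float, q: float, y: int, gamma: float = 1.5) -> float:
    """Per-query Matchability-Aware Loss (our reading of the DEIM formulation).

    p:     predicted class probability for the query's target class
    q:     IoU between the query's box and its matched ground-truth box
    y:     1 if the query is matched to a ground truth, 0 otherwise
    gamma: focusing exponent; 1.5 is an illustrative assumption
    """
    eps = 1e-12  # keeps log() finite at p = 0 or p = 1
    p = min(max(p, eps), 1.0 - eps)
    if y == 1:
        target = q ** gamma  # IoU-aware soft target for matched queries
        return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    # Unmatched queries: negative BCE term weighted by p**gamma
    return -(p ** gamma) * math.log(1.0 - p)


# Same confident score (p = 0.8) on a high-IoU and a low-IoU match
high_iou = mal_loss(0.8, q=0.9, y=1)
low_iou = mal_loss(0.8, q=0.3, y=1)
print(f"q=0.9: {high_iou:.3f}  q=0.3: {low_iou:.3f}")
```

With p = 0.8, the low-IoU match costs about 1.38 versus about 0.43 for the high-IoU match: a confident score on a poorly localized box is penalized more heavily, which is the behavior described above.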
Inspired by DEIM, this study proposes a novel model for detecting low-contrast discrete defects in aluminum alloy wheel hubs. Its effectiveness is verified through ablation and comparative experiments, providing a valuable solution for X-ray-based defect detection in aluminum alloy wheel hubs. The contributions of this study can be summarized as follows:
To address the low contrast between defects and background and the blurred defect edges in the original aluminum alloy wheel hub images, preprocessing methods such as exposure fusion and high-frequency enhancement are adopted to improve overall image contrast, and data augmentation is used to expand the dataset and enhance the model's generalization ability. A dataset of 2318 original high-quality industrial X-ray images of aluminum alloy wheel hub defects has been constructed, which can serve as a benchmark for further research and validation.
Because standard convolutions in the backbone struggle to extract the edge information of such defects efficiently, the PConv module, which has an efficient receptive field, is introduced to strengthen the backbone's low-level feature extraction, thereby improving the model's ability to discriminate discrete defects at an extremely low parameter cost.
A novel two-stage feature fusion-diffusion pyramid structure named MFDPN is designed to improve the network's overall localization of discrete defects while maintaining detection efficiency. The Structure-Aware Visual State Space (SAVSS) module is introduced to achieve feature interaction and fusion over a richer receptive field. Within MFDPN, Mamba Focus Fusion (MFF) fuses semantic information from different feature layers to achieve deep feature integration, and Diffusion Assist Fusion (DAF) propagates the context-rich features produced by MFF to branches at different scales through a cross-scale feature diffusion mechanism, significantly alleviating the information loss caused by scale differences in traditional feature pyramid networks.
To reduce background interference during backbone feature extraction, a Channel and Spatial Attention Module (CASAB) is introduced between the backbone and the encoder to enhance the model's robustness to complex backgrounds.
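SAVSS and MFF build on Mamba-style visual state space models, whose core operation is a selective scan: a feature map is flattened into a sequence and passed through a linear recurrence whose step size and projections depend on the input. The sketch below shows that recurrence for a single channel with a scalar state. It is a generic illustration, not the SAVSS or MFF module: the row-major scan order, the decay parameter a = -1, and the simple functions standing in for the learned projections of Δ, B, and C are all assumptions chosen for readability.

```python
import math
from typing import List


def selective_scan(xs: List[float], deltas: List[float], bs: List[float],
                   cs: List[float], a: float = -1.0) -> List[float]:
    """Single-channel selective scan with a scalar hidden state.

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b_t * x_t
    y_t = c_t * h_t
    In Mamba, delta_t, b_t and c_t are computed from the input; here they are
    passed in so the recurrence itself is easy to inspect.
    """
    if a >= 0:
        raise ValueError("a must be negative for a stable (decaying) state")
    h, ys = 0.0, []
    for x, d, b, c in zip(xs, deltas, bs, cs):
        # exp(d * a): zero-order-hold discretization of a; d * b: simplified step for b
        h = math.exp(d * a) * h + d * b * x
        ys.append(c * h)
    return ys


def flatten_rows(fmap: List[List[float]]) -> List[float]:
    """Row-major 2D-to-1D ordering (one of many possible scan paths)."""
    return [v for row in fmap for v in row]


def softplus(v: float) -> float:
    return math.log1p(math.exp(v))


# Toy 3x3 feature map with a bright "defect" in the centre-right
fmap = [[0.0, 0.1, 0.0],
        [0.0, 0.9, 0.8],
        [0.0, 0.1, 0.0]]
xs = flatten_rows(fmap)
# Hypothetical stand-ins for learned projections (input-dependent selectivity)
deltas = [softplus(2.0 * x) for x in xs]
bs = [1.0 + x for x in xs]
cs = [1.0] * len(xs)
print([round(y, 3) for y in selective_scan(xs, deltas, bs, cs)])
```

In this toy run the state jumps at the bright pixels and then decays over the following background positions, which is how a scan lets distant positions influence each other along the chosen path; practical visual state space blocks typically scan in several directions over many channels.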
The remainder of this paper is organized as follows:
Section 2 details the data acquisition and processing procedures,
Section 3 elaborates on the aluminum alloy wheel hub defect detection method proposed in this paper,
Section 4 verifies the effectiveness of this method through experiments, and
Section 5 summarizes the entire paper.
2. Data Acquisition and Processing
Figure 1 shows the X-ray image acquisition setup, whose core components are a digital flat panel detector, a conveying track, an X-ray source, the wheel hub under inspection, and a computer image processing system. During operation, the conveying track first carries the wheel hub into the X-ray inspection area, and the X-ray source emits rays that penetrate it. The digital flat panel detector then receives the transmitted rays and sends the signal to the computer image processing system, which converts it into X-ray images. Staff conduct quality inspections by examining these images. Examples of acquired images are shown in Figure 2.
We collected 2318 X-ray images of defective aluminum alloy wheels, each measuring 2048 × 2048 pixels. The dataset covers the spoke, wheel core, and rim areas of the wheel hub, which account for 37%, 30%, and 33% of the dataset, respectively. As shown in Figure 2, the contrast of the raw images is relatively low, and some defects are difficult to label. Therefore, this study first enhances the original TIF images. The process, shown in Figure 3, consists of exposure fusion, high-frequency enhancement, and unsharp masking (USM) sharpening, followed by fusing the processed image with the original image at a fixed ratio (0.5:1 here) and finally converting the result to JPG format. After processing, the contrast between defects and background is enhanced, defect edges become clearer, and details are more prominent.
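The enhancement chain in Figure 3 can be sketched in pure Python on a small grayscale array with intensities in [0, 1]. Every operator here is a simplified stand-in chosen for illustration, not the exact implementation used in this study: exposure fusion is approximated by a single-scale, well-exposedness-weighted average of gamma-adjusted copies of one image (pyramid blending is omitted), high-frequency enhancement adds a 4-neighbour Laplacian response, USM uses a 3 × 3 box blur, and the 0.5:1 fusion is read as weights 0.5 (processed) and 1 (original), normalized to keep the intensity range. All parameter values are assumptions, and JPG export is omitted.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, intensities in [0, 1]


def clip01(v: float) -> float:
    return min(1.0, max(0.0, v))


def pixel(img: Image, i: int, j: int) -> float:
    """Read a pixel with edge replication at the borders."""
    h, w = len(img), len(img[0])
    return img[min(h - 1, max(0, i))][min(w - 1, max(0, j))]


def box_blur(img: Image) -> Image:
    """3x3 mean filter."""
    return [[sum(pixel(img, i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0
             for j in range(len(img[0]))] for i in range(len(img))]


def exposure_fusion(img: Image, gammas: Tuple[float, ...] = (0.5, 1.0, 2.0),
                    sigma: float = 0.2) -> Image:
    """Fuse gamma-adjusted 'exposures' of one image, weighting each pixel by
    well-exposedness exp(-(e - 0.5)^2 / (2 sigma^2)). Single-scale simplification."""
    out = []
    for row in img:
        fused_row = []
        for v in row:
            exps = [v ** g for g in gammas]
            ws = [math.exp(-((e - 0.5) ** 2) / (2 * sigma ** 2)) for e in exps]
            fused_row.append(sum(w * e for w, e in zip(ws, exps)) / sum(ws))
        out.append(fused_row)
    return out


def high_frequency_boost(img: Image, k: float = 0.5) -> Image:
    """Add k times a 4-neighbour Laplacian high-pass response."""
    return [[clip01(img[i][j] + k * (4 * img[i][j] - pixel(img, i - 1, j) - pixel(img, i + 1, j)
                                      - pixel(img, i, j - 1) - pixel(img, i, j + 1)))
             for j in range(len(img[0]))] for i in range(len(img))]


def unsharp_mask(img: Image, amount: float = 1.0, threshold: float = 0.01) -> Image:
    """USM: add amount * (img - blurred) wherever that detail exceeds threshold."""
    blurred = box_blur(img)
    return [[clip01(v + amount * (v - b)) if abs(v - b) > threshold else v
             for v, b in zip(row, brow)] for row, brow in zip(img, blurred)]


def blend(processed: Image, original: Image, w_proc: float = 0.5, w_orig: float = 1.0) -> Image:
    """Fuse processed and original images at w_proc:w_orig, normalised to stay in [0, 1]."""
    total = w_proc + w_orig
    return [[(w_proc * p + w_orig * o) / total for p, o in zip(prow, orow)]
            for prow, orow in zip(processed, original)]


def enhance(img: Image) -> Image:
    """Exposure fusion -> high-frequency boost -> USM -> 0.5:1 fusion with the original."""
    x = exposure_fusion(img)
    x = high_frequency_boost(x)
    x = unsharp_mask(x)
    return blend(x, img)


raw = [[0.40, 0.40, 0.40, 0.40],
       [0.40, 0.45, 0.45, 0.40],
       [0.40, 0.45, 0.45, 0.40],
       [0.40, 0.40, 0.40, 0.40]]
for row in enhance(raw):
    print([round(v, 3) for v in row])
```

Under this reading of the 0.5:1 ratio, the final image keeps two thirds of the original intensities, which limits how far enhancement artifacts can drift from the acquired data; on this toy input the centre-to-corner difference grows only slightly.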
Figure 4 presents the defect statistics for each area of the wheel hub in the dataset before and after data augmentation. Figure 4a shows that this data augmentation strategy significantly increases the number of defect samples, which supports both the generalization ability and the detection performance of the subsequent model.
Figure 4b shows the distribution of defect sizes in pixels before and after data augmentation. After data augmentation, defect sizes span a wider range, because the original images contain many small defects that are difficult to see with the naked eye and large defects with blurred boundaries. In addition, the average defect sizes in the wheel core and rim areas are 2184 and 2765 pixels, respectively, whereas the average defect size in the spoke area is much larger, at 30,172 pixels. Annotation was completed by a professional team from the manufacturer using the Labelme annotation software (version 3.16.2) and took 15 working days. Labelme stores the class and coordinate information of each defect in an image in a corresponding JSON file. Because the original image data are limited, random flipping, translation, and other operations were used to expand the dataset to 6954 images, which were divided into training, validation, and test sets at a ratio of 7:2:1.
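The annotation and dataset-expansion steps can be sketched in Python with only the standard library. The sketch reads the `shapes`, `label`, and `points` fields of a Labelme JSON file, converts each shape into an axis-aligned box, shows how box coordinates follow a horizontal flip and a translation, and performs a reproducible 7:2:1 split. The helper names, the placeholder class name, the random seed, and the integer rounding in the split are illustrative assumptions rather than details from this study.

```python
import json
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Labeled = List[Tuple[str, Box]]


def labelme_to_boxes(labelme_json: str) -> Labeled:
    """Axis-aligned box for every shape (rectangle or polygon) in a Labelme file."""
    data = json.loads(labelme_json)
    boxes = []
    for shape in data["shapes"]:
        xs = [pt[0] for pt in shape["points"]]
        ys = [pt[1] for pt in shape["points"]]
        boxes.append((shape["label"], (min(xs), min(ys), max(xs), max(ys))))
    return boxes


def hflip(boxes: Labeled, width: float) -> Labeled:
    """Box coordinates after flipping the image left-right: x -> width - x."""
    return [(lbl, (width - x2, y1, width - x1, y2)) for lbl, (x1, y1, x2, y2) in boxes]


def translate(boxes: Labeled, dx: float, dy: float, width: float, height: float) -> Labeled:
    """Shift boxes with the image, clip to the frame, and drop boxes pushed outside."""
    out = []
    for lbl, (x1, y1, x2, y2) in boxes:
        nx1, ny1 = max(0, x1 + dx), max(0, y1 + dy)
        nx2, ny2 = min(width, x2 + dx), min(height, y2 + dy)
        if nx2 > nx1 and ny2 > ny1:
            out.append((lbl, (nx1, ny1, nx2, ny2)))
    return out


def split_7_2_1(ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle ids reproducibly and split them into train/val/test at 7:2:1."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train, n_val = len(ids) * 7 // 10, len(ids) * 2 // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Hypothetical Labelme content; "defect" is a placeholder class name
sample = json.dumps({"shapes": [{"label": "defect", "shape_type": "rectangle",
                                 "points": [[100, 200], [140, 260]]}]})
boxes = labelme_to_boxes(sample)
print(hflip(boxes, width=2048))          # [('defect', (1908, 200, 1948, 260))]
train, val, test = split_7_2_1([f"img_{i:05d}" for i in range(6954)])
print(len(train), len(val), len(test))   # 4867 1390 697
```

One design choice worth noting: if augmented copies are split independently of their source images, a flipped or shifted copy of a training image can end up in the test set. Splitting source-image IDs first and then adding each image's augmented copies to the same subset avoids this; the text does not state which order was used here.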
5. Conclusions
This study addresses the poor performance of defect detection in aluminum alloy wheel hubs caused by the low contrast between defects and background and by the random shapes and discrete distribution of defects in X-ray images. A novel defect detection model for aluminum alloy wheels is proposed, with the following contributions: (1) Image preprocessing methods such as exposure fusion and high-frequency enhancement are employed to improve the contrast between defects and background in the aluminum alloy wheel dataset. (2) The PConv module is used in the backbone network to significantly enhance the model's ability to extract discrete defect edges at an extremely low parameter cost. (3) To efficiently integrate multi-scale features across modules and improve the model's feature integration for discrete defects, an innovative Mamba-based MFDPN structure is proposed. This structure promotes extensive interaction and diffusion of multi-scale information across different feature layers in the encoder, effectively mitigating the information loss caused by scale differences in traditional pyramid networks and significantly improving the feature aggregation capability of the network's fusion layers. (4) The CASAB module is introduced to improve the model's resistance to interference from complex backgrounds and enhance detection accuracy. Finally, we systematically integrate these structures into a novel object detection model for aluminum alloy wheel hub defects. Extensive experiments show that the proposed model outperforms current state-of-the-art real-time object detectors on several key metrics. Specifically, compared with the baseline model, mAP50 is improved by 7.2%, the detection accuracy for small, medium, and large objects is improved by 7.1%, 7.2%, and 7.1%, respectively, and recall is improved by 5%, while the model runs at 39 FPS. This meets the detection requirements of actual aluminum alloy wheel factories.
In engineering deployments, the proposed model can be integrated into the terminals of industrial defect detection equipment. By outputting high-quality defect detection data, it can effectively improve product quality, extend the service life of wheel hub structures, and provide strong support for intelligent manufacturing and critical infrastructure monitoring. Although the model demonstrates significant advantages in aluminum alloy wheel hub defect detection, it still has certain limitations. First, the framework processes only two-dimensional X-ray images and does not yet learn features from three-dimensional defect data, making it difficult to accurately capture the depth distribution and spatial morphology of defects inside the wheel hub. Second, the current implementation has not been extended to multimodal processing: it lacks cross-modal analysis with technologies such as infrared thermal imaging and ultrasonic testing and therefore cannot combine the strengths of different inspection technologies to obtain complementary, cross-verified defect information. In future work, a three-dimensional convolutional network will be introduced to give the model three-dimensional feature learning capability, so that the spatial position, depth, and volume of defects can be accurately recovered. We will also focus on providing more comprehensive data support for the strength assessment of wheel hub structures, integrating X-ray, infrared thermal imaging, and ultrasonic testing data, designing a cross-modal feature fusion module to improve the detection accuracy of complex defects, and further expanding the application value of the model in intelligent manufacturing scenarios.