Article

Defect R-CNN: A Novel High-Precision Method for CT Image Defect Detection

1 Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing 100084, China
2 Beijing Key Laboratory of Nuclear Detection Technology, Beijing 100084, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(9), 4825; https://doi.org/10.3390/app15094825
Submission received: 31 March 2025 / Revised: 24 April 2025 / Accepted: 24 April 2025 / Published: 26 April 2025
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)

Abstract

Defect detection in industrial computed tomography (CT) images remains challenging due to small defect sizes, low contrast, and noise interference. To address these issues, we propose Defect R-CNN, a novel detection framework designed to capture the structural characteristics of defects in CT images. The model incorporates an edge-prior convolutional block (EPCB) that guides the network to focus on extracting edge information, particularly along defect boundaries, improving both localization and classification. Additionally, we introduce a custom backbone, edge-prior net (EP-Net), to capture features across multiple spatial scales, enhancing the recognition of subtle and complex defect patterns. During inference, the multi-branch structure is consolidated into a single-branch equivalent to accelerate detection without compromising accuracy. Experiments conducted on a CT dataset of nuclear graphite components from a high-temperature gas-cooled reactor (HTGR) demonstrate that Defect R-CNN achieves an average precision (AP) exceeding 0.9 for all defect types. Moreover, the model attains mean average precision (mAP) scores of 0.983 for bounding boxes (mAP-bbox) and 0.956 for segmentation masks (mAP-segm), surpassing established methods such as Faster R-CNN, Mask R-CNN, Efficient Net, RT-DETR, and YOLOv11. The inference speed reaches 76.2 frames per second (FPS), striking a strong balance between accuracy and efficiency. This study demonstrates that Defect R-CNN offers a robust and reliable approach for industrial scenarios that require high-precision, real-time defect detection.

1. Introduction

Since the 1960s, the High-Temperature Gas-Cooled Reactor (HTGR) has undergone multiple development stages. As a representative of Generation IV nuclear reactor technology, it offers high thermal efficiency, enhanced safety, and versatile applications [1]. The safe and reliable operation of HTGRs is pivotal for advancing sustainable nuclear energy deployment and aligning with global environmental protection objectives. Central to the HTGR architecture are nuclear graphite components, which serve dual roles as core structural materials and neutron reflectors [2]. Characterized by low thermal neutron absorption cross-sections and exceptional high-temperature stability, these components are critical for neutron flux moderation, thermal insulation, and maintaining geometric integrity during prolonged operation [3]. Since these components cannot be replaced after reactor startup, their longevity plays a crucial role in determining the overall operational lifespan of the reactor system [4]. However, defects introduced during manufacturing, transportation, or in-service irradiation can progressively degrade their mechanical properties, potentially leading to irreversible structural damage and compromising the safety margin of the reactor system [5]. Consequently, developing precise and reliable defect detection methods for nuclear graphite components is indispensable to ensuring the structural integrity and long-term operational safety of HTGRs.
The Institute of Nuclear and New Energy Technology at Tsinghua University has developed a multi-row helical computed tomography (CT) system for full-volume inspection of nuclear graphite components in the HTGR [6]. A 600,000 kW high-temperature reactor comprises over 20,000 components, each ranging in size from 600 mm to 2000 mm. Volume defects are randomly distributed within the components, most measuring between 1 and 2 mm [7]. Given the large volume of components and the subtle nature of the defects, manual inspection is impractical, so an automated detection algorithm with high accuracy and computational efficiency is crucial for CT inspection tasks. In response to this need, researchers have developed a variety of automatic detection algorithms, which can be broadly categorized into machine vision (MV) and deep learning (DL) approaches [8].
Traditional MV-based detection begins with image acquisition of defective objects, followed by handcrafted image processing algorithms tailored to specific defect characteristics. The extracted features, typically related to texture or shape, are then manually analyzed to classify defect types. Lu et al. [9] proposed an MV method for bearing surface defect detection, introducing an Lc-MNN image segmentation algorithm and an SCV feature selection algorithm; on 600 test samples, it achieved a 99.5% identification rate. To automate surface defect inspection of ultrasound probes on a production line, Profili et al. [10] developed an integrated machine vision system, achieving a detection accuracy of 98.63% and a classification accuracy of 81.90%. Tao et al. [11] achieved accuracies of 100% (training set) and 96.15% (validation set) in detecting adhesive defects in solid engine linings by accounting for device deflection and camera-lens compatibility. Yang [12] developed a POL method targeting fiber optic connector end-face defects, achieving at least 97.14% accuracy (an improvement of approximately 20%) and 6 to 7 times higher efficiency than traditional manual inspection. Nevertheless, these MV methods rely on manually engineered features, extensive domain expertise, and customized algorithms, resulting in high labor and time investments.
With the rapid development of high-performance computing and hardware, deep learning—by extracting features that are difficult to define in traditional algorithms—offers a more efficient solution, positioning it as the dominant method in defect detection [13]. Convolutional neural networks (CNNs) and their variants have become a key technology in automated defect detection due to robust feature extraction capabilities [14]. Based on algorithmic processes, they are further divided into one-stage and two-stage detection algorithms. One-stage algorithms, such as single-shot multibox detector (SSD) [15] and You Only Look Once (YOLO) [16], predict target categories and positions through linear regression in a single feed-forward network. While offering fast detection speeds, their accuracy may be lower. In contrast, two-stage algorithms, including Regions with CNN features (R-CNN) [17], Faster region-based convolutional neural network (Faster R-CNN) [18], and Mask Region Convolution Neural Network (Mask R-CNN) [19], first generate region proposals and then extract features from each. These algorithms are better suited for high-precision detection tasks but tend to have slower inference speeds compared to one-stage models.
Researchers have enhanced one-stage and two-stage algorithms by leveraging their respective strengths, resulting in various target detection methods tailored to specific applications. In the field of defect recognition using CT and radiographic imaging, Xu et al. [20] introduced an enhanced Faster R-CNN framework for pulmonary nodule detection. Evaluated on the LUNA16 dataset, it outperformed YOLOv3 and Cascade R-CNN, with precision increasing from 76.4% to 90.7% and recall from 40.1% to 56.8%. Similarly, Gamdha et al. [21] developed a system employing ray tracing to generate synthetic radiographic data for training a Mask R-CNN model. Their method achieved over 87% accuracy on a test set of 416 X-ray images of solid propellants. Huang et al. [22] designed an improved Mask R-CNN for multi-class concrete defect detection in bridge structures. The model outperformed the original Mask R-CNN and other deep learning methods by achieving 94.7% accuracy, 95.3% recall, and 90.6% mAP. Farag et al. [23] developed a COVID-19 detection model based on Efficient Net and an attention mechanism for the automated analysis of lung CT images, achieving superior F1 scores (0.9367) compared to the previous year’s approaches. Meanwhile, Su et al. [24] applied YOLOv8 to detect patellar instability on knee MRI, achieving 83% accuracy with significantly faster inference than a junior radiologist. Despite these advancements, existing methodologies face persistent challenges in reliably detecting small-sized defects under complex, high-noise CT imaging environments, underscoring the need for enhanced detection algorithms.
DL methods for small target detection are widely applied across various fields. For instance, Liu et al. [25] developed a multi-scale region proposal network (MRPN) based on an enhanced Faster R-CNN model to improve detection accuracy in spark plug defect identification, achieving 89% accuracy and 97% recall on a custom X-ray image dataset. Revathy et al. [26] integrated an improved Mask R-CNN model with the Sobel edge detection algorithm to extract region-of-interest features for textile surface defect detection, achieving an accuracy of 97.8% and outperforming other models by up to 6.45%. To address low detection accuracy in traditional forging inspection methods, Yu et al. [27] designed an Efficient Net-based model enhanced with a feature pyramid network and attention mechanisms, achieving a 95.69% mAP and an F1 score of 0.94 on fluorescent magnetic particle inspection images. Liu et al. [28] addressed weld defect detection with an improved YOLO model that integrates a reinforced multi-scale feature module and an efficient feature extraction module, resulting in an mAP of 92.9% and an inference speed of 61.5 FPS, making it suitable for real-time industrial deployment. Liu et al. [29] developed a track defect recognition method based on an enhanced Real-Time Detection Transformer (RT-DETR) model, which improves small target detection by reducing background noise interference; the refined model reaches an mAP of 98.5%, a 1.8% improvement over the baseline RT-DETR.
Although the aforementioned algorithms perform well in various detection tasks, their performance declines when applied to specialized applications. Existing target detection algorithms for CT images often struggle to accurately detect small targets [30]. Conversely, methods designed for small target detection are ineffective in handling the complex backgrounds and noise interference typical of CT images, often resulting in false positives or missed detections [31]. Compared to one-stage models, two-stage models are more effective in recognizing small defects in complex backgrounds, improving accuracy in high-noise and low-contrast images [32]. Addressing the industrial demand for graphite components’ defect detection, this study introduces Defect R-CNN, an improved Mask R-CNN model designed for CT-based detection of diverse defect types. By integrating advanced edge feature extraction and multi-scale fusion, the model enhances the accuracy and robustness of small defect recognition in noisy environments. To objectively evaluate detection performance in such complex scenarios, this study adopts commonly used metrics, including mean average precision (mAP), precision-recall curves (PR curves), and frames per second (FPS). The mAP is calculated separately for bounding box detection and segmentation masks across different defect types, with detailed descriptions provided in the subsequent sections.
The structure of this article is organized as follows: First, the baseline Mask R-CNN model and the rationale for its enhancement are briefly introduced. Next, the network architecture of the improved Defect R-CNN and its key modifications are described in detail. A comparison is then made between Defect R-CNN and five other classical defect detection networks, followed by an analysis of the results. Ablation experiments are conducted to further validate the effectiveness of the proposed improvements compared to the baseline Mask R-CNN model. The article concludes with a discussion and summary.

2. Methods

2.1. Mask R-CNN

Mask R-CNN is a deep learning model extended from Faster R-CNN. It performs object detection and generates accurate segmentation masks for each detected object. Its architecture comprises four components: the feature extraction network, the region proposal network, the classification and regression module, and the segmentation mask module, as illustrated in Figure 1.
Mask R-CNN typically employs ResNet and the Feature Pyramid Network (FPN) as the backbone to extract multi-scale feature maps. This design preserves rich semantic information and enables the model to detect targets at various scales, facilitating the extraction of detailed features. The Region Proposal Network (RPN), as the first step in the two-stage detection process, generates a set of candidate regions from the feature map. The RPN scans the feature map through a sliding window, generating anchors and predicting whether each anchor contains a target. By refining these anchors for classification and regression, the RPN provides high-quality candidate regions for subsequent classification and segmentation stages. To further improve detection accuracy, Mask R-CNN replaces the regions of interest (ROI) Pooling used in Faster R-CNN with ROI Align, which eliminates coordinate errors from floating-point rounding. ROI Align retains finer-grained features through bilinear interpolation, enhancing both detection and segmentation accuracy. After generating candidate regions, Mask R-CNN performs multi-task learning for each region, including category classification, bounding box regression, and instance segmentation. The classification module predicts the target category, while the bounding box regression module refines the target’s position. The segmentation module then generates a binary mask for each target, using a convolutional network to predict ROI pixel by pixel, yielding pixel-level segmentation results.
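To make the ROI Align step concrete, the following minimal PyTorch sketch pools a fixed-size feature from one candidate box using torchvision's roi_align. The feature-map size, the 1/4 spatial scale, and the 7 × 7 output resolution are typical Mask R-CNN settings assumed here for illustration, not values taken from this paper.

```python
import torch
from torchvision.ops import roi_align

# One feature map from the backbone/FPN (batch, channels, H, W); values are dummies.
feat = torch.randn(1, 256, 384, 384)
# One candidate region: (batch_index, x1, y1, x2, y2) in input-image coordinates.
boxes = torch.tensor([[0.0, 100.0, 120.0, 180.0, 210.0]])
# Bilinear sampling at sub-pixel locations avoids the coordinate rounding of ROI Pooling.
pooled = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=0.25,
                   sampling_ratio=2, aligned=True)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```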

2.2. Analysis of Model Complexity

Next, the number of parameters and the computational load are calculated for each functional module of Mask R-CNN. The number of parameters refers to the total count of all trainable parameters in the model, including those in convolutional layers and fully connected layers. The computational load, measured in floating-point operations (FLOPs), quantifies the number of arithmetic operations required during the model’s forward pass.

2.2.1. Model Parameters

For a convolutional layer, the number of parameters $P_{conv}$ is given by
$$P_{conv} = (C_{in} K_w K_h + B) \cdot C_{out}$$
where $C_{in}$ and $C_{out}$ are the numbers of input and output channels, $K_h$ and $K_w$ are the height and width of the kernel, and $B$ indicates whether a bias is present (1 if true, 0 if false).
For a fully connected layer, the number of parameters $P_{fc}$ is
$$P_{fc} = (N_{in} + B) \cdot N_{out}$$
where $N_{in}$ and $N_{out}$ are the numbers of input and output units.
The total number of parameters $P_{total}$ in the model is the sum of the parameters from all layers:
$$P_{total} = \sum_{i=1}^{N_{conv}} P_{conv,i} + \sum_{j=1}^{N_{fc}} P_{fc,j}$$
where $N_{conv}$ and $N_{fc}$ are the numbers of convolutional and fully connected layers.
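These two formulas can be checked directly; the short Python sketch below (helper names are ours) implements them verbatim.

```python
# Minimal implementation of the parameter-count formulas above.
def conv_params(c_in: int, c_out: int, k_w: int, k_h: int, bias: bool = True) -> int:
    """P_conv = (C_in * K_w * K_h + B) * C_out"""
    return (c_in * k_w * k_h + int(bias)) * c_out

def fc_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """P_fc = (N_in + B) * N_out"""
    return (n_in + int(bias)) * n_out

# Example: a bias-free 3x3 convolution with 64 input and 128 output channels.
assert conv_params(64, 128, 3, 3, bias=False) == 73_728
```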

2.2.2. Computational Load

For a convolutional layer, $FLOPs_{conv}$ is given by
$$FLOPs_{conv} = 2 F_w F_h (C_{in} K_w K_h + 1) C_{out}$$
where $F_w$ and $F_h$ are the width and height of the input feature map; the remaining symbols are as defined above.
For fully connected layers, $FLOPs_{fc}$ is
$$FLOPs_{fc} = (2I - 1) \cdot O$$
where $I$ and $O$ are the input and output dimensionalities.
The total $FLOPs_{total}$ of the model is the sum of the FLOPs across all layers:
$$FLOPs_{total} = \sum_{i=1}^{N_{conv}} FLOPs_{conv,i} + \sum_{j=1}^{N_{fc}} FLOPs_{fc,j}$$
where $N_{conv}$ and $N_{fc}$ are the numbers of convolutional and fully connected layers.
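For reference, a corresponding sketch of the FLOP formulas (again with our own helper names) is given below.

```python
# Minimal implementation of the FLOP formulas above.
def conv_flops(f_w: int, f_h: int, c_in: int, c_out: int, k_w: int, k_h: int) -> int:
    """FLOPs_conv = 2 * F_w * F_h * (C_in * K_w * K_h + 1) * C_out"""
    return 2 * f_w * f_h * (c_in * k_w * k_h + 1) * c_out

def fc_flops(i: int, o: int) -> int:
    """FLOPs_fc = (2I - 1) * O"""
    return (2 * i - 1) * o

# Example: one 3x3, 64-to-128-channel convolution on a 384 x 384 feature map.
print(conv_flops(384, 384, 64, 128, 3, 3) / 1e9)  # ~21.8 GFLOPs
```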
The modules of Mask R-CNN include the backbone network (e.g., ResNet50), FPN, RPN, and the output branch (ROI Head). Calculations were performed using an input image resolution of 1536 × 1536. The results are summarized in Table 1.
As shown in Table 1, the ResNet50 backbone network accounts for 50.38% of the total parameters and 38.51% of the computational load (FLOPs), making it the largest contributor in both areas. This is due to the backbone’s role in transforming the raw input image into a multi-scale feature map, capturing both low-level details and high-level semantic information. The quality of these feature maps directly affects the accuracy of region proposals from the RPN and the classification and localization tasks at the ROI head. Poor feature extraction degrades performance in subsequent stages. Therefore, optimizing the backbone to balance accuracy and computational cost is key for high-performance detection. While Mask R-CNN uses ResNet and FPN for natural images, which require diverse feature extraction, CT image defect detection focuses on edge detection. To address this, this paper proposes modifications to the Mask R-CNN backbone for improved CT image defect detection.

2.3. Improvement on Mask R-CNN

CT images, as grayscale images, contain only a single intensity value per pixel, which reflects the density variations of different tissues within the object. In contrast, RGB images consist of three color channels: red, green, and blue. Therefore, when designing detection algorithms, the unique characteristics of CT images must be taken into account. Targeted image processing and deep learning techniques are essential to achieve accurate recognition. This paper proposes an improved model, Defect R-CNN, based on the classical Mask R-CNN algorithm, to address the industrial need for CT image defect detection with a focus on both accuracy and speed. The structure of Defect R-CNN, shown in Figure 2, consists of three main components: the backbone network, the RPN, and three output branches. The backbone network replaces the original ResNet50 with Edge-Prior Net (EP-Net), which integrates convolutional components (EPCB) to extract multi-scale feature maps from the input image. These feature maps are then passed to the subsequent modules for detection and segmentation. The RPN, based on Faster R-CNN, is responsible for generating candidate target regions on the feature map. After ROI Align refinement, these candidate regions are more accurately localized. Finally, the fully convolutional network outputs the target category, bounding box location, and segmentation mask for each candidate region, while simultaneously performing defect detection, classification, and instance segmentation.

2.3.1. Edge-Prior Convolutional Block

Given the edge-texture characteristics of CT images, the network prioritizes extracting these critical details. To achieve this, Defect R-CNN introduces the EPCB, structured as follows.
As shown in Figure 3, the EPCB consists of four branches, each combining a convolutional kernel with a batch normalization (BN) layer. These include a standard 3 × 3 kernel, a horizontal Sobel-Dx kernel, a vertical Sobel-Dy kernel, and a Laplacian kernel. The Sobel operators compute local gradient changes in the horizontal and vertical directions, effectively detecting edges, while the Laplacian operator highlights regions with significant gray value variations, enhancing edge information. The integration of the standard convolutional kernel with these specialized operators improves edge detection performance. The BN layer adjusts the weights of each branch during training, accelerating convergence and mitigating issues such as vanishing or exploding gradients. The input image passes through all four branches, each performing distinct convolutions, and the resulting feature maps are normalized and summed to produce the final output.
To improve training speed, a skip connection is introduced in the EPCB convolutional component in EP-Net with a step size of 1, along with a branch containing only the BN layer. This design preserves the original input information, ensuring that shallow features are passed to deeper layers, which enhances gradient backpropagation and improves both model expressiveness and training performance. Since the convolution operations and skip connection are linear, once the model is trained and converged, the parameters of these branches become fixed. The linear nature of the kernels allows the parameters to be merged into a single 3 × 3 convolution kernel, maintaining the same feature extraction capabilities as the original multi-branch structure while improving computational efficiency. In contrast, ResNet50’s BottleNeck, shown in Figure 4, includes a ReLU activation layer in the left branch, introducing nonlinearity. Consequently, the two branches cannot be merged parametrically during inference.
The output of the multi-branch structure in the EPCB module can be mathematically expressed as
$$M_{out} = \sum_{i=1}^{4} \mathrm{BN}\left(M_{in} * W^{(i)};\ \mu^{(i)}, \sigma^{(i)}, \gamma^{(i)}, \beta^{(i)}\right)$$
where $M_{in}$ and $M_{out}$ denote the input and output feature maps of the EPCB module, and $W^{(i)}$ represents the convolution kernel of the $i$-th branch. The operation $*$ denotes convolution, and the parameters $\mu^{(i)}$, $\sigma^{(i)}$, $\gamma^{(i)}$, and $\beta^{(i)}$ are the BN statistics and affine parameters of the $i$-th branch: the mean, standard deviation, scaling factor, and shift factor, respectively. The values $\gamma^{(i)}$ and $\beta^{(i)}$ are learnable parameters updated during training, enabling dynamic scaling and shifting of the feature maps.
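To make this structure concrete, the following PyTorch sketch assembles a four-branch block of this form. The module name, the choice to freeze the Sobel/Laplacian weights at their operator values, and the handling of the BN-only shortcut are our assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

SOBEL_DX = torch.tensor([[-1., 0., 1.],
                         [-2., 0., 2.],
                         [-1., 0., 1.]])
SOBEL_DY = SOBEL_DX.t()
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]])

def prior_branch(in_ch, out_ch, stride, kernel):
    """3x3 conv branch whose weight is tiled from a fixed edge operator, plus BN."""
    conv = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
    with torch.no_grad():
        conv.weight.copy_(kernel.expand(out_ch, in_ch, 3, 3))
    conv.weight.requires_grad_(False)  # assumption: operator weights stay fixed
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch))

class EPCB(nn.Module):
    """Hypothetical edge-prior convolutional block: four parallel conv+BN branches."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        standard = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.branches = nn.ModuleList([
            nn.Sequential(standard, nn.BatchNorm2d(out_ch)),  # learnable 3x3
            prior_branch(in_ch, out_ch, stride, SOBEL_DX),    # horizontal gradients
            prior_branch(in_ch, out_ch, stride, SOBEL_DY),    # vertical gradients
            prior_branch(in_ch, out_ch, stride, LAPLACIAN),   # second-order edges
        ])
        # BN-only shortcut, usable when the input shape is preserved (stride 1).
        self.identity = (nn.BatchNorm2d(out_ch)
                         if stride == 1 and in_ch == out_ch else None)

    def forward(self, x):
        out = sum(branch(x) for branch in self.branches)
        if self.identity is not None:
            out = out + self.identity(x)
        return out
```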
When fusing the multi-branch outputs, the convolution operation and the BN transformation in each branch can be merged, since both are linear operations. The BN transformation can be written as
$$\mathrm{BN}(M;\ \mu, \sigma, \gamma, \beta) = \frac{\gamma}{\sigma}\,(M - \mu) + \beta$$
This allows each branch to be equivalently represented by a single convolution operation with recalibrated weights and bias terms, which can be pre-computed before inference. The equivalent convolution kernel and bias can be defined as
$$W' = \frac{\gamma}{\sigma}\, W$$
$$b' = \beta - \frac{\gamma}{\sigma}\, \mu$$
Finally, these equivalent parameters from each branch are summed to produce the final fused output.
In this way, the EPCB retains its multi-branch structure during training to enhance edge-aware feature extraction. However, at inference time, it can be converted into a single-branch form, preserving the performance gains while improving inference speed and reducing memory consumption. This facilitates more efficient deployment while maintaining detection accuracy.
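A minimal sketch of this branch fusion, following the two equations above, is shown below; the helper names and the assumption that every branch is a 3 × 3 convolution followed by BN are ours.

```python
import torch

@torch.no_grad()
def fuse_branch(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d):
    """Fold BN into the preceding conv: W' = (gamma/sigma) W, b' = beta - (gamma/sigma) mu."""
    sigma = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / sigma                      # gamma / sigma, per output channel
    w = conv.weight * scale.reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * scale
    return w, b

@torch.no_grad()
def fuse_epcb(branches):
    """Sum per-branch fused kernels and biases into one equivalent 3x3 conv."""
    ws, bs = zip(*(fuse_branch(conv, bn) for conv, bn in branches))
    return torch.stack(ws).sum(dim=0), torch.stack(bs).sum(dim=0)

# The fused (weight, bias) pair can then be loaded into a single
# nn.Conv2d(..., bias=True) for inference, replacing the multi-branch block.
```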

2.3.2. Edge-Prior Net

EP-Net is an innovative backbone network designed to incorporate EPCBs, which explicitly enhance edge feature extraction and significantly improve the network’s performance in detecting and recognizing target details.
As shown in Figure 5, the proposed EP-Net backbone is divided into five stages, similar in structure to ResNet50. The first stage performs downsampling using an EPCB block with a stride of 2, which simultaneously conducts initial feature extraction. Each of the following four stages also begins with a stride-2 EPCB block for spatial downsampling, followed by multiple EPCB blocks with stride 1 for deep feature extraction and fusion. Specifically, the number of EPCB blocks in stages 2 through 5 are 2, 3, 14, and 2, respectively. The number of output channels for each stage progressively increases as follows: 64 (Stage 1), 96 (Stage 2), 192 (Stage 3), 384 (Stage 4), and 1408 (Stage 5). This gradual increase in channel depth enhances the network’s capacity to represent complex features.
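The stage layout just described can be summarized in a few lines of configuration. The sketch below reuses the hypothetical EPCB class from Section 2.3.1 and assumes that the per-stage block counts include each stage's leading stride-2 block and that the input is a single-channel CT slice; both are our reading of the text, not confirmed details.

```python
import torch.nn as nn

# Assumed EP-Net layout: output channels and EPCB counts per stage (stage 1
# is a single stride-2 EPCB; counts for stages 2-5 follow the text above).
EPNET_CHANNELS = [64, 96, 192, 384, 1408]
EPNET_BLOCKS = [1, 2, 3, 14, 2]

def make_epnet_stages(in_ch: int = 1) -> nn.ModuleList:
    stages = []
    for out_ch, n_blocks in zip(EPNET_CHANNELS, EPNET_BLOCKS):
        blocks = [EPCB(in_ch, out_ch, stride=2)]  # spatial downsampling first
        blocks += [EPCB(out_ch, out_ch, stride=1) for _ in range(n_blocks - 1)]
        stages.append(nn.Sequential(*blocks))
        in_ch = out_ch
    return nn.ModuleList(stages)
```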
A key distinction between EP-Net and ResNet50 lies in the design of their core convolutional units. EP-Net employs the EPCB throughout all stages, which integrates edge-enhancing operators such as Sobel and Laplacian filters in a multi-branch structure. In contrast, ResNet50 utilizes the BottleNeck module, which includes ReLU activations and residual shortcuts. Each BottleNeck in ResNet50 contains three convolutional layers, and the total network depth, including the classifier, reaches 50 layers. EP-Net, however, replaces all BottleNeck modules with EPCBs, thereby focusing on explicit edge information and achieving better structural adaptability for defect detection in CT images.
In summary, the EP-Net backbone network enhances the extraction of defective edge features through the multi-branch EPCB convolution module during training. It also employs skip connections to accelerate model convergence and ensure efficient gradient propagation. Once training is complete, each EPCB—originally composed of four parallel branches—can be merged into a single equivalent convolutional operator. This transformation is mathematically equivalent to the original structure, preserving the same feature extraction capability while significantly improving computational efficiency. Specifically, the merged EPCB achieves approximately a fourfold increase in inference speed compared to its multi-branch counterpart and also reduces memory consumption. This design boosts CT image defect recognition accuracy and balances detection speed, thereby better fulfilling the requirements of industrial CT image defect detection.

3. Experimental Data and Parameters

3.1. Experimental Data

3.1.1. Actual Components Dataset

A key component of the dataset is CT-reconstructed images containing naturally occurring defects, which provide realistic samples for training the model. However, because these defects are embedded within the interior of the components, their exact dimensions are unknown. If defect dimensions are annotated solely based on reconstructed images, the partial volume effect causes blurred boundaries, making it difficult to accurately delineate defect contours and sizes. To obtain more precise annotations, several components were subjected to grinding. Each component was ground in 0.5 mm increments, and the size of defects appearing in each layer was measured. Examples of ground defects and their corresponding measurements are shown in Figure 6. This paper utilizes the open-source software labelme [34] for contour labeling of the defect dataset. Defects in each CT image are labeled by aligning their contours with those in the corresponding grinding image, thereby minimizing inaccuracies caused by blurred reconstruction edges and ensuring labeling accuracy.
In this study, three common types of defects are considered: hole, loose, and side defects. The first two are naturally formed defects, while the latter is artificially introduced on the component surface. Hole defects are typically small and characterized by a clear grayscale contrast at their center. Loose defects manifest as larger regions composed of multiple small, clustered voids with relatively low central grayscale values. Side defects are distributed along the surface or edges of the component and often partially merge with the contour of the surrounding material in the reconstructed images. Categorizing defects based on these visual characteristics enables the model to learn discriminative features associated with each defect type.
Figure 7 illustrates two types of images from the training dataset composed of real components. Figure 7a shows reconstructed CT images of real graphite components, along with magnified views of their defect areas. Naturally occurring defects—namely hole and loose defects—can be observed within the interior of the components. In contrast, side defects are manually introduced on the component surface (typically appearing near the image edges). These natural defects exhibit more complex morphology and structural variation, as well as more diverse grayscale patterns, compared to the artificial ones. To facilitate more accurate defect size annotation, the original CT data were reconstructed at a high resolution of 1536 × 1536 pixels, with a pixel spacing of 0.5 mm. The resulting dataset includes 500 images for training and 100 images for testing. However, due to the limited availability of naturally defective samples, the dataset presents challenges such as small sample size and class imbalance. To address these issues, synthetic data were generated using the Decompound-Synthesize Method (DSM) [35], which performs phantom conversion, background simulation, and defect synthesis to produce CT-like images. DSM is capable of generating realistic artifacts, background noise, and diverse defect geometries, resulting in synthetic images that closely resemble real CT scans. A total of 1500 synthetic images were generated to augment the training set. Figure 7b displays representative synthetic images generated using DSM.

3.1.2. Test Components Dataset

Alongside the training dataset, the test dataset plays a crucial role in evaluating the model’s generalization to unseen data. Fifty reconstructed images of real components with natural defects were used as a primary test set. All annotations in this set were based on the corresponding ground images obtained through physical grinding, ensuring high accuracy. This dataset is thus suitable for quantitatively evaluating model performance. However, due to its limited size, test metrics on this set may be affected by sampling bias. To further assess model performance, additional test components were prepared.
As illustrated in Figure 8, four rectangular graphite components with a 2:1 aspect ratio were fabricated, each containing hole defects of four distinct sizes: 4 mm, 2 mm, 1 mm, and 3 mm. These defects were precisely machined, allowing accurate dimension measurements based on their reconstructed CT images. The positioning of these components can be adjusted to create greater variation in the test dataset. The reconstructed resolution of these images is 1360 × 760 pixels, with a pixel size of 0.5 mm. These images are used only for testing purposes, ensuring that training and testing samples come from different component sets, which allows for a more robust evaluation of the model’s generalization ability.

3.1.3. Complete Dataset

Given the relatively small dataset size, a simple hold-out evaluation is adopted, without a separate validation set. The dataset is split into training and testing sets at a ratio of approximately 7:3. A summary of the complete dataset is presented in Table 2:
By maintaining a strict separation between the training and test datasets, this approach ensures a more reliable assessment of the model’s ability to learn underlying patterns in the data, rather than simply memorizing label-to-data mappings.

3.2. Evaluation Metrics

To evaluate model performance, several metrics are employed to assess defect detection accuracy, including recall, precision, intersection over union (IoU), mean average precision (mAP), and frames per second (FPS). Recall measures the proportion of correctly detected defects out of all actual defects, while precision calculates the proportion of correct detections among all predictions. IoU assesses the overlap between predicted and ground-truth segmentations, with higher values indicating better accuracy. Average precision (AP), derived from the precision–recall (PR) curve, reflects the average precision for a single class, while mAP, the mean of AP across all classes, provides an overall measure of model performance. FPS measures the number of images the model processes per second, serving as a key indicator of inference speed.
The specific formulas for these metrics are as follows:
$$Recall = \frac{TP}{TP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
$$AP = \int_{0}^{1} precision(recall)\, d(recall)$$
$$mAP = \frac{1}{k} \sum_{i=1}^{k} AP_i$$
$$IoU = \frac{TP}{TP + FP + FN}$$
$$FPS = \frac{1}{\text{Average Time per Frame}}$$
where $TP$ (true positive) denotes cases in which a defect is predicted and a defect actually exists; $TN$ (true negative) denotes cases in which no defect is predicted and none exists; $FP$ (false positive) denotes cases in which a defect is predicted but none exists; and $FN$ (false negative) denotes cases in which no defect is predicted but a defect exists. $k$ represents the total number of object classes.
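As a small worked example of these definitions, the snippet below computes the counting metrics from invented confusion counts and approximates AP by numerical integration of a PR curve; all numbers are purely illustrative.

```python
import numpy as np

# Toy confusion counts (invented for illustration).
tp, fp, fn = 90, 5, 10
recall = tp / (tp + fn)          # 0.90
precision = tp / (tp + fp)       # ~0.947
iou = tp / (tp + fp + fn)        # ~0.857, IoU from pixel-level counts

def average_precision(recalls, precisions):
    """AP as the area under the PR curve (numerical integration)."""
    r, p = np.asarray(recalls), np.asarray(precisions)
    order = np.argsort(r)
    return float(np.trapz(p[order], r[order]))

ap_per_class = [0.98, 0.98, 0.99]              # e.g., hole, loose, side (illustrative)
map_score = sum(ap_per_class) / len(ap_per_class)
```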
According to the specific requirements of this study, the mAP is further categorized into mAP-bbox and mAP-segm, corresponding to bounding box detection and segmentation mask evaluation, respectively. Given the small size of the defects under investigation, even minor pixel-level deviations can lead to significant fluctuations in the IoU value. Therefore, an IoU threshold of 0.50 is adopted to determine true positives when computing AP. In this work, performance is evaluated per defect category, including hole, loose, and side defects. For example, AP-bbox-50-hole and AP-segm-50-hole represent the average precision for bounding box detection and segmentation mask, respectively, of hole-type defects under an IoU threshold of 0.5. The same naming convention applies to the other defect categories. In addition, a larger area under the PR curve typically indicates better overall detection performance. A smoother PR curve suggests greater consistency in distinguishing between positive and negative samples across different thresholds. Such smoothness reflects a model’s robustness in balancing precision and recall, and its ability to generalize well across diverse defect types.

3.3. Models Training

To evaluate the performance of defect detection in nuclear graphite components, five mainstream models—Faster R-CNN [18], Mask R-CNN [19], Efficient Net [36], RT-DETR [37], and YOLOv11 [38]—are compared with the proposed Defect R-CNN. Functionally, Faster R-CNN and RT-DETR are employed for target detection, while Mask R-CNN, Efficient Net, YOLOv11, and Defect R-CNN perform both detection and segmentation. Structurally, Efficient Net, RT-DETR, and YOLOv11 are one-stage models that directly predict the target category and location, offering faster inference speeds, and making them suitable for real-time detection. In contrast, Faster R-CNN, Mask R-CNN, and Defect R-CNN are two-stage models that generate candidate regions for classification, yielding higher detection accuracy but with slower inference.
The experiments are conducted using the open-source object detection framework MMDetection [39]. The backbone network is initialized with weights pre-trained on the MS COCO dataset [40] to accelerate convergence and improve generalization. Except for the backbone network, all hyperparameters follow the default settings of the original Mask R-CNN. The model is trained using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is set to 0.01 and is reduced by a factor of 10 at the 40th and 70th epochs following a step decay schedule. The total number of training epochs is 100, and the batch size is set to 50 per iteration. The model is trained on a workstation equipped with an Intel i7-12700KF CPU and an NVIDIA GeForce RTX 3090 GPU.
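For clarity, the optimizer and step-decay schedule described above can be expressed in plain PyTorch as follows; the stand-in model and the commented training step are placeholders, since the actual experiments use MMDetection's configuration system.

```python
import torch

model = torch.nn.Conv2d(1, 8, 3)  # stand-in for the Defect R-CNN network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# Step decay: the learning rate is divided by 10 at epochs 40 and 70.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 70], gamma=0.1)
for epoch in range(100):
    # ... one training epoch over the 50-image batches would run here ...
    optimizer.step()   # placeholder update
    scheduler.step()
```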
The Faster R-CNN, Mask R-CNN, Efficient Net, RT-DETR, YOLOv11, and Defect R-CNN models are trained on the same training dataset. The final loss curve is shown in Figure 9.
From the curves, it is evident that although all models exhibit a significant decrease in loss values during the early training stages, their convergence behaviors differ. Efficient Net, RT-DETR, and YOLOv11 converge quickly with minimal fluctuation, reflecting efficient and stable early learning. In contrast, Faster R-CNN, Mask R-CNN, and Defect R-CNN show larger fluctuations in loss values during the initial phase. These fluctuations arise as these models, initialized with pre-trained weights, adapt to the new data and adjust their parameters. Despite initial instability, Defect R-CNN demonstrates competitive convergence, eventually effectively minimizing loss and attaining strong final convergence.

4. Experimental Evaluation and Analysis

4.1. Test Results of Actual Components Dataset

After training, the optimal parameter configuration of the model is saved and tested on the actual components test dataset. Figure 10 shows the PR curves of the six evaluated models across three defect classes: hole, loose, and side, as well as their overall performance.
Compared to all other models, Defect R-CNN consistently exhibits the best performance, with smooth PR curves forming a high-arching shape that indicates superior detection capability across all defect categories. In particular, the PR curve for hole defects, which are the smallest and most challenging to detect, remains relatively stable for Defect R-CNN even at higher recall values—demonstrating the model’s effectiveness in capturing fine-grained features. By contrast, both Faster R-CNN and Mask R-CNN show sharp precision drops when recall exceeds 0.2, especially for hole defects, suggesting that they struggle to generalize well on smaller or low-contrast targets. The lack of specialized edge-aware features and deeper multi-scale representation likely contributes to their limited performance. Efficient Net performs well for side defects, maintaining relatively high precision across most recall levels. However, its performance on hole and loose defects degrades significantly as recall increases, with noticeable fluctuations in the PR curve. This may stem from the model’s reliance on global image features, which are less effective in capturing localized, small-scale anomalies. YOLOv11 also shows competent performance on the side and loose defects, but the PR curve for hole defects reveals severe instability—precision falls rapidly as recall increases beyond 0.2. This suggests that YOLO-based detectors, while fast, may be prone to over-detection and false positives when applied to small or ambiguous regions. RT-DETR, as a transformer-based detector, performs better than traditional CNN backbones in handling loose and side defects. However, the curve for hole defects is jagged and lacks smooth progression, indicating unstable confidence in predictions across thresholds. This may be due to RT-DETR’s default settings for anchor-free decoding, which could be suboptimal for tiny object detection in CT images.
Overall, the smoothness and area under the curve of Defect R-CNN across all classes validate the effectiveness of the architectural improvements introduced in this work, which enhance fine defect localization and mitigate false positives in high-noise CT environments.
To further quantify each model’s performance, we calculated the mAP, category-specific AP, and FPS for both detection and segmentation tasks. These metrics provide a comprehensive evaluation of model performance across varying thresholds of the PR curve. Table 3 summarizes the performance of the proposed Defect R-CNN and five benchmark models across these metrics, as defined in Section 3.2.
Defect R-CNN consistently outperforms all baseline models across both detection and segmentation tasks. It achieves the highest mAP-bbox of 0.983, surpassing the next best model, RT-DETR (0.776), by 26.7%, and significantly exceeding traditional models like Mask R-CNN (0.415) and Faster R-CNN (0.373). On category-specific metrics, Defect R-CNN demonstrates superior precision for all three defect types, scoring above 0.98 in AP-bbox-50 across hole, loose, and side defects. This performance advantage is especially pronounced for hole defects, which are the smallest and most challenging to detect due to their blurry edges and weak grayscale contrast. Most models, including YOLOv11 and Efficient Net, show significant drops in AP for this class, with Efficient Net achieving only 0.442. In contrast, Defect R-CNN reaches 0.98, a 121.7% improvement, thanks to its edge-aware EPCB module, which enhances fine-grained feature extraction. For loose defects, detection is relatively easier due to their larger size and distinctive texture, so most models perform well, but Defect R-CNN still leads with 0.98. Side defects, characterized by clean edges along component boundaries, are generally easier to detect, which explains the uniformly high AP-bbox-50-side values (≥0.949) across most models, whereas earlier two-stage detectors like Faster R-CNN (0.426) and Mask R-CNN (0.526) struggle to match this performance. In terms of segmentation performance, Defect R-CNN achieves the highest mAP-segm of 0.956, outperforming Mask R-CNN (0.297) by 221.9%, and also exceeding YOLOv11 (0.381) and Efficient Net (0.411) by large margins. Again, for category-specific AP-segm-50, Defect R-CNN achieves top scores of 0.9 (hole), 0.98 (loose), and 0.988 (side)—showcasing its comprehensive segmentation capability across defect types.
Regarding inference speed, Defect R-CNN achieves 76.2 FPS, outperforming traditional two-stage frameworks such as Mask R-CNN (53.5 FPS) and Faster R-CNN (68.4 FPS). This efficiency gain results from the merging of multi-branch EPCB modules into a single-branch structure during inference, which simplifies computation without sacrificing accuracy. Although one-stage models like YOLOv11 (91.1 FPS), Efficient Net (100.6 FPS), and RT-DETR (111.8 FPS) operate faster, they often trade speed for precision—especially in detecting small or subtle defects like holes. Defect R-CNN offers a well-balanced solution, delivering high precision while maintaining real-time inference speed suitable for industrial applications.
Figure 11 displays a reconstructed CT image from the test set, containing three annotated defect types—side, loose, and hole. Figure 12 provides zoomed-in detection results from six different models on the same image, enabling direct visual comparison. Confidence scores are shown alongside each predicted bounding box to illustrate detection certainty. In Figure 12a,b, both Faster R-CNN and Mask R-CNN detect only a portion of the side and loose defects, with confidence scores generally below 60 (side: ~30–56, loose: ~45–55). Notably, both models fail to detect the hole defect entirely, reflecting their limited capability in capturing small defects. This is consistent with their poor AP-bbox-50-hole values (0.048 and 0.05, respectively) and low mAP-bbox scores reported in Table 3. In Figure 12c, Efficient Net identifies all side defects, but with varying confidence scores, and detects the loose and hole defects with low confidence (30.6 and 34.1, respectively). This suggests a lack of robustness to grayscale fluctuations and partial edge blending, especially for low-contrast defects. RT-DETR, shown in Figure 12d, demonstrates excellent performance in detecting side and loose defects, achieving high confidence values (side: ≥94, loose: 84.8). It attains the highest AP-bbox-50-side (0.993) among all models, affirming its strength in structured region recognition. However, it detects the hole defect with relatively low confidence (30.5), consistent with its lower AP-bbox-50-hole (0.383), revealing difficulty in handling fine-grained features. In Figure 12e, YOLOv11 accurately detects the side and loose defects with high confidence (side: 97.4–99.9, loose: 93.8), showing strong performance on larger, well-defined targets. However, it fails to detect the hole defect, confirming its poor performance on small anomalies, as reflected in its low AP-bbox-50-hole score of 0.187 in Table 3. Despite its speed and effectiveness in detecting large targets, YOLOv11 exhibits limited reliability in capturing tiny features. Defect R-CNN, as shown in Figure 12f, consistently delivers the best visual performance, detecting all three defect types with near-perfect confidence scores (≥99.9). This outcome corresponds with its quantitative results in Table 3, where the model achieves high scores across all categories. These results reflect the effectiveness of the proposed architectural design, which improves the model’s ability to capture fine structural details and edge information in CT images.
Overall, the detection outcomes in Figure 12 are in strong agreement with the quantitative performance reported in Table 3. The consistent dominance of Defect R-CNN across both visual and numerical evaluations highlights its robustness, generalization capability, and suitability for industrial CT defect detection scenarios that require high precision and stability.

4.2. Test Results of Test Components Dataset

Because the actual component test set is small and contains few defect instances, the resulting evaluation metrics may deviate from typical values. To provide a more comprehensive assessment of the models' performance, we further evaluated their generalization ability using the test component dataset. Since the test components in this evaluation set contain only hole defects, performance metrics are computed exclusively for this defect category. The calculated AP-bbox-50-hole values for Faster R-CNN, Mask R-CNN, Efficient Net, RT-DETR, YOLOv11, and Defect R-CNN are 0.025, 0.03, 0.388, 0.343, 0.083, and 0.99, respectively. These results demonstrate that, when applied to completely unseen data, the Defect R-CNN model continues to exhibit strong detection performance, while the accuracy of the other models declines noticeably. As illustrated in Figure 13, this disparity becomes evident in the visualized detection results.
Figure 13a presents CT-reconstructed images of the test components, composed of four rectangular aperture plates with hole sizes of 4 mm, 2 mm, 1 mm, and 3 mm. As observed in the image, many of the 1 mm holes are barely visible due to their extremely small volume under the current CT system resolution and are only faintly discernible at a few locations.
In the evaluation using this test set, Faster R-CNN, Mask R-CNN, and YOLOv11 perform poorly, with AP-bbox-50-hole values of 0.025, 0.03, and 0.083, respectively. As seen in Figure 13b,c,f, all three models fail to detect the hole defects entirely and instead misclassify them as loose or side defects—constituting false positives. This reflects significant issues in sensitivity when faced with subtle defects. Figure 13d,e shows the test results of Efficient Net and RT-DETR, which show relatively better performance. However, both models detect only a limited number of true hole defects, and frequently mislabel defects as side or loose, especially near the image edges and center. This suggests insufficient feature learning, particularly in handling complex grayscale artifacts and distinguishing subtle variations in defect morphology. In contrast, Defect R-CNN outperforms the other models in detection accuracy and confidence. As illustrated in Figure 13g, it successfully detects nearly all hole defects with high confidence scores, including those of smaller size and in challenging regions. Except for one missed detection in a 2 mm aperture, the model achieves near-complete coverage, highlighting its robustness in multi-scale defect detection and generalization capability on unseen data. However, it should be noted that ring artifacts present in the reconstructed CT images—bearing texture similarities to actual defects—can still lead to occasional false positives, even for Defect R-CNN. This emphasizes the need for further refinement through combined hardware–software optimization to suppress artifact interference and improve overall defect recognition reliability.
In summary, Defect R-CNN outperforms other models in both detection accuracy and confidence. It excels in managing artifacts and complex backgrounds, making it more suitable for industrial inspection tasks requiring high accuracy and robustness. While RT-DETR and Efficient Net offer faster detection speeds, they face challenges in detecting small-scale and low-contrast defects in more intricate scenes.

5. Ablation Study

To validate the effectiveness of the improvements to Mask R-CNN, a series of ablation experiments were conducted using the same dataset. The experiments aimed to isolate and assess the impact of each major architectural component. Specifically, we began with the original Mask R-CNN using ResNet50 as the backbone. Next, we replaced the backbone’s Bottleneck component with the EPCB to evaluate its impact on edge feature extraction. We then substituted the backbone with EP-Net to assess overall performance gains. Finally, we merged the multi-branch structure of EP-Net into a single-branch structure to test inference speedup. The results are summarized in Table 4.
Table 4 presents the results of ablation experiments evaluating the effectiveness of the improvements made to the Mask R-CNN model. The results show the impact of EPCB, EP-Net, and the single-branch structure on detection, segmentation, and FPS metrics. Replacing the convolutional module in the ResNet50 backbone with EPCB enhances edge feature extraction, increasing mAP-bbox from 0.592 to 0.774 and mAP-segm from 0.313 to 0.787. Subsequently, the addition of the EP-Net backbone further improves performance, particularly in hole defect detection, with AP-bbox-50-hole rising from 0.397 to 0.98 and AP-segm-50-hole from 0.397 to 0.913. These results demonstrate EP-Net’s ability to extract more discriminative features. Finally, to improve inference speed, the multi-branch structure of EP-Net was consolidated into an equivalent single-branch design. This optimization led to a significant increase in inference speed, with FPS rising from 50.96 to 78.85 while maintaining high detection accuracy (mAP-bbox = 0.986) and segmentation performance (mAP-segm = 0.979). These findings demonstrate that the proposed architecture not only enhances performance but also maintains efficiency, making it suitable for real-time industrial CT defect detection.
Figure 14 compares the test results of the final improved Defect R-CNN and the baseline Mask R-CNN model on three defect types. Due to the small size of the defects, zoomed-in views are provided to better visualize both the detection accuracy and confidence scores. While both models can locate defects, Mask R-CNN shows significantly lower confidence levels than Defect R-CNN. For hole defects, Mask R-CNN outputs confidence scores of only 34.6% and 55.9%, whereas Defect R-CNN achieves 100% and 96.7%. Similarly, for loose and side defects, Mask R-CNN yields confidence scores of 32.4% and 36.8%, much lower than the 100% and 99% confidence of Defect R-CNN. These results not only reinforce the superior feature extraction and robustness of the proposed Defect R-CNN but also visually validate the improvements observed in the quantitative metrics. The significant boost in detection confidence highlights the effectiveness of incorporating EPCB and the EP-Net backbone in enhancing both accuracy and model reliability.

6. Discussion and Conclusions

In this study, we proposed Defect R-CNN, a deep learning model specifically tailored for defect detection in CT images. The model incorporates shape-prior knowledge and introduces the EPCB, which adaptively enhances edge feature extraction during training. This significantly improves the model’s sensitivity to small and low-contrast defect regions. Based on EPCB, we further designed the EP-Net backbone, which employs a multi-branch structure for robust feature learning during training and is structurally compressed into a single-branch configuration during inference. This design achieves an effective balance between detection accuracy and computational efficiency.
We trained Defect R-CNN alongside five state-of-the-art models—Faster R-CNN, Mask R-CNN, Efficient Net, RT-DETR, and YOLOv11—under identical conditions. Testing on the actual component dataset showed that Defect R-CNN outperforms these models in feature extraction and hole defect localization, achieving a mAP-bbox of 0.983 and a mAP-segm of 0.956, significantly improving upon the baselines and reflecting enhanced spatial precision. Notably, the model also demonstrated efficient inference, achieving 76.2 frames per second, thereby offering both accuracy and speed. In addition, evaluation of the test component dataset—comprising multiple instances of small-sized hole defects—demonstrated the model’s robustness. Defect R-CNN achieved an AP-bbox-50-hole of 0.99, detecting nearly all defects with high confidence, even in regions where feature contrast was low or artifacts were present. To further evaluate the effectiveness of each architectural enhancement, ablation experiments were conducted on the same dataset. Replacing the ResNet50 bottleneck with EPCB increased mAP-bbox from 0.592 to 0.774, while the integration of the EP-Net backbone further improved AP-bbox-50-hole to 0.98. Merging the multi-branch structure into a single-branch design boosted inference speed from 50.96 to 78.85 FPS, with mAP-bbox and mAP-segm still reaching 0.986 and 0.979, respectively. These results confirm that each improvement contributes to both accuracy and efficiency, reinforcing the model’s suitability for industrial CT defect detection. Although the model is evaluated on CT images of HTGR graphite components, its design and performance characteristics suggest strong potential for broader application in other industrial defect detection applications.
Looking ahead, further improvements can be achieved by enhancing CT image quality. This could include the use of higher-resolution detectors during image acquisition or the application of image enhancement techniques such as edge sharpening to increase contrast between defect boundaries and surrounding material. Such efforts would facilitate more precise feature extraction and contribute to even greater detection accuracy, especially in complex industrial CT scenarios.

Author Contributions

Conceptualization, Z.J., J.F., T.Z., R.L. and Y.S.; Methodology, Z.J., J.F., T.Z., R.L. and Y.S.; Software, T.Z., R.L. and Y.S.; Validation, Z.J., J.F., T.Z. and R.L.; Formal analysis, Z.J., J.F., T.Z. and R.L.; Investigation, J.F. and T.Z.; Resources, P.C., J.M. and Y.S.; Data curation, Z.J., J.F., R.L., P.C. and J.M.; Writing—original draft, Z.J.; Writing—review & editing, Z.J.; Visualization, Z.J.; Supervision, P.C., J.M. and Y.S.; Project administration, P.C., J.M. and Y.S.; Funding acquisition, P.C. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

The National Key Research and Development Program of China, Grant/Award Number: 2023YFF0906300.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some or all the data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. All authors have approved the submitted manuscript.

Abbreviations

The following abbreviations are used in this manuscript:
HTGR	High-Temperature Gas-cooled Reactor
AP	Average Precision
mAP	mean Average Precision
mAP-bbox	mean Average Precision of bounding box
mAP-segm	mean Average Precision of segmentation mask
FPS	Frames Per Second
ET	Eddy Current Testing
CT	Computed Tomography
UT	Ultrasonic Testing
RT	Radiographic Testing
NDT	Non-destructive Testing
MV	Machine Vision
DL	Deep Learning
CNN	Convolutional Neural Network
SSD	Single Shot multibox Detector
YOLO	You Only Look Once
R-CNN	Regions with CNN features
Faster R-CNN	Faster Region-based Convolutional Neural Network
Mask R-CNN	Mask Region-based Convolutional Neural Network
RT-DETR	Real-Time Detection Transformer
DSM	Decompound-Synthesize Method
PSNR	Peak Signal-to-Noise Ratio
SSIM	Structural Similarity
BN	Batch Normalization
FLOPs	Floating-Point Operations
GFLOPs	Giga Floating-Point Operations
RPN	Region Proposal Network
MRPN	Multi-scale Region Proposal Network
ROI	Region of Interest
GAN	Generative Adversarial Network
IoU	Intersection over Union
FPN	Feature Pyramid Network
EPCB	Edge-Prior Convolutional Block
EP-Net	Edge-Prior Net
PR curve	Precision–Recall curve
TP	True Positive
TN	True Negative
FP	False Positive
FN	False Negative
SGD	Stochastic Gradient Descent

References

1. Sun, J.; Li, Z.; Li, C. Progress of Establishing the China-Indonesia Joint Laboratory on HTGR. Nucl. Eng. Des. 2022, 397, 111959.
2. Zeng, T.; Fu, J.; Cong, P.; Liu, X.; Xu, G.; Sun, Y. Research on Ring Artifact Reduction Method for CT Images of Nuclear Graphite Components. J. X-Ray Sci. Technol. 2025, 33, 317–324.
3. Johns, S.; Yoder, T.; Chinnathambi, K.; Ubic, R.; Windes, W.E. Microstructural Changes in Nuclear Graphite Induced by Thermal Annealing. Mater. Charact. 2022, 194, 112423.
4. Zhang, X.; Deng, Y.; Yan, P.; Zhang, C.; Pan, B. 3D Analysis of Crack Propagation in Nuclear Graphite Using DVC Coupled with Finite Element Analysis. Eng. Fract. Mech. 2024, 309, 110415.
5. Thomas, M.; Oh, H.; Schoell, R.; House, S.; Crespillo, M.; Hattar, K.; Windes, W.; Haque, A. Dynamic Deformation in Nuclear Graphite and Underlying Mechanisms. Materials 2024, 17, 4530.
6. Lu, J.; Liu, X.; Miao, J.; Wu, Z. Study on Irradiation Light Output Stability of PB-HTGR Pebble Flow CT Detector. At. Energy Sci. Technol. 2018, 52, 1685–1690.
7. Xiong, D.; Tsang, D.; Song, J. An Insight into Annealing Mechanism of Graphitized Structures after Irradiation. Radiat. Phys. Chem. 2024, 225, 112137.
8. Wang, W.; Wang, P.; Zhang, H.; Chen, X.; Wang, G.; Lu, Y.; Chen, M.; Liu, H.; Li, J. A Real-Time Defect Detection Strategy for Additive Manufacturing Processes Based on Deep Learning and Machine Vision Technologies. Micromachines 2023, 15, 28.
9. Lu, M.; Chen, C.-L. Detection and Classification of Bearing Surface Defects Based on Machine Vision. Appl. Sci. 2021, 11, 1825.
10. Profili, A.; Magherini, R.; Servi, M.; Spezia, F.; Gemmiti, D.; Volpe, Y. Machine Vision System for Automatic Defect Detection of Ultrasound Probes. Int. J. Adv. Manuf. Technol. 2024, 135, 3421–3435.
11. Tao, X.; Gao, H.; Wu, Q.; He, C.; Zhang, L.; Zhao, Y. Detection of Defects in Adhesive Coating Based on Machine Vision. IEEE Sens. J. 2024, 24, 5172–5185.
12. Yang, L. Fiber Optic Connector End-Face Defect Detection Based on Machine Vision. Opt. Fiber Technol. 2025, 91, 104158.
13. Wang, X.; D'Avella, S.; Liang, Z.; Zhang, B.; Wu, J.; Zscherpel, U.; Tripicchio, P.; Yu, X. On the Effect of the Attention Mechanism for Automatic Welding Defects Detection Based on Deep Learning. Expert Syst. Appl. 2025, 268, 126386.
14. Chen, Y.; Ding, Y.; Zhao, F.; Zhang, E.; Wu, Z.; Shao, L. Surface Defect Detection Methods for Industrial Products: A Review. Appl. Sci. 2021, 11, 7657.
15. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
16. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
19. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397.
20. Xu, J.; Ren, H.; Cai, S.; Zhang, X. An Improved Faster R-CNN Algorithm for Assisted Detection of Lung Nodules. Comput. Biol. Med. 2023, 153, 106470.
21. Gamdha, D.; Unnikrishnakurup, S.; Rose, K.J.J.; Surekha, M.; Purushothaman, P.; Ghose, B.; Balasubramaniam, K. Automated Defect Recognition on X-Ray Radiographs of Solid Propellant Using Deep Learning Based on Convolutional Neural Networks. J. Nondestruct. Eval. 2021, 40, 18.
22. Huang, C.; Zhou, Y.; Xie, X. Intelligent Diagnosis of Concrete Defects Based on Improved Mask R-CNN. Appl. Sci. 2024, 14, 4148.
23. Farag, R.; Upadhyay, P.; Gao, Y.; Demby, J.; Montoya, K.G.; Tousi, S.M.A.; Omotara, G.; DeSouza, G. COVID-19 Detection from Pulmonary CT Scans Using a Novel EfficientNet with Attention Mechanism. arXiv 2024, arXiv:2403.11505.
24. Su, Q.; Qin, Z.; Mu, J.; Wu, H. YOLO Lung CT Disease Rapid Detection Classification with Fused Attention Mechanism; ACM: Xiamen, China, 2023; pp. 1381–1387.
25. Liu, Y.; Liu, Y.; Zhang, P.; Zhang, Q.; Wang, L.; Yan, R.; Li, W.; Gui, Z. Spark Plug Defects Detection Based on Improved Faster-RCNN Algorithm. J. X-Ray Sci. Technol. 2022, 30, 709–724.
26. Revathy, G.; Kalaivani, R. Fabric Defect Detection and Classification via Deep Learning-Based Improved Mask RCNN. Signal Image Video Process. 2024, 18, 2183–2193.
27. Yu, T.; Chen, W.; Junfeng, G.; Poxi, H. Intelligent Detection Method of Forgings Defects Detection Based on Improved EfficientNet and Memetic Algorithm. IEEE Access 2022, 10, 79553–79563.
28. Liu, M.; Chen, Y.; Xie, J.; He, L.; Zhang, Y. LF-YOLO: A Lighter and Faster YOLO for Weld Defect Detection of X-Ray Image. IEEE Sens. J. 2023, 23, 7430–7439.
29. Liu, Y.; Cao, Y.; Sun, Y. Research on Rail Defect Recognition Method Based on Improved RT-DETR Model. In Proceedings of the 2024 5th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 12–14 April 2024; pp. 1464–1468.
30. Zhao, L.; Wang, T.; Chen, Y.; Zhang, X.; Tang, H.; Lin, F.; Li, C.; Li, Q.; Tan, T.; Kang, D.; et al. A Novel Framework for Segmentation of Small Targets in Medical Images. Sci. Rep. 2025, 15, 9924.
31. Yada, N.; Kuroda, H.; Kawamura, T.; Fukuda, M.; Miyahara, Y.; Yoshizako, T.; Kaji, Y. Improvement of Image Quality for Small Lesion Sizes in 18F-FDG Prone Breast Silicon Photomultiplier-Based PET/CT Imaging. Asia Ocean. J. Nucl. Med. Biol. 2025, 13, 77.
32. Zhang, F.; Wang, Q.; Fan, E.; Lu, N.; Chen, D.; Jiang, H.; Yu, Y. Enhancing Non-Small Cell Lung Cancer Tumor Segmentation with a Novel Two-Step Deep Learning Approach. J. Radiat. Res. Appl. Sci. 2024, 17, 100775.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online Image Annotation and Applications. Proc. IEEE 2010, 98, 1467–1484.
35. Fu, J.; Liu, R.; Zeng, T.; Cong, P.; Liu, X.; Sun, Y. A Study on CT Detection Image Generation Based on Decompound Synthesize Method. J. X-Ray Sci. Technol. 2025, 33, 72–85.
36. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 6105–6114.
37. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
38. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
39. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
40. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
Figure 1. Mask R-CNN architecture.
Figure 2. Defect R-CNN framework.
Figure 3. Structure of EPCB.
Figure 4. BottleNeck block in ResNet [33].
Figure 5. EP-Net architecture.
Figure 6. Grinding image of actual graphite components.
Figure 7. Defects in reconstructed images of actual graphite components.
Figure 8. Test graphite components.
Figure 9. Loss curves of six models.
Figure 10. PR curves of six models.
Figure 11. Annotations of three defect types (side, loose, hole) in a reconstructed CT image.
Figure 12. Comparative detection results of six models in a CT image.
Figure 13. Detection results of different models on the test component.
Figure 14. Comparison of detection results between Mask R-CNN and Defect R-CNN.
Table 1. Calculation of parameters and FLOPs for the Mask R-CNN model.

| Module | Params (M) | Parameters Proportion | FLOPs (GFLOPs) | Computation Proportion |
|---|---|---|---|---|
| ResNet50 | 16.88 | 50.38% | 265.34 | 38.51% |
| FPN | 1.51 | 4.50% | 195.76 | 28.41% |
| RPN | 0.60 | 1.79% | 139.66 | 20.27% |
| ROI Head | 14.52 | 43.33% | 88.21 | 12.81% |
| Total | 33.51 | 100.00% | 688.97 | 100.00% |
Table 2. Parameters of the dataset.

| Parameters | Training: Actual Component | Training: Synthetic Data Made by DSM | Training: Total | Test: Actual Component | Test: Test Component | Test: Total |
|---|---|---|---|---|---|---|
| Number of images | 500 | 1500 | 2000 | 100 | 750 | 850 |
| Image resolution | 1536 × 1536 | 1536 × 1536 | 1536 × 1536 | 1536 × 1536 | 1360 × 760 | – |
| Number of hole defects | 205 | 6729 | 6934 | 104 | 36383 | 36487 |
| Number of loose defects | 113 | 2201 | 2314 | 48 | – | 48 |
| Number of side defects | 615 | 992 | 1607 | 184 | – | 184 |

Note: "–" indicates no value reported for that cell.
Table 3. Detection results of different models.

| Metric | Faster R-CNN | Mask R-CNN | Efficient Net | RT-DETR | YOLOv11 | Defect R-CNN |
|---|---|---|---|---|---|---|
| mAP-bbox | 0.373 | 0.415 | 0.664 | 0.776 | 0.655 | 0.983 |
| AP-bbox-50-hole | 0.048 | 0.05 | 0.442 | 0.383 | 0.187 | 0.98 |
| AP-bbox-50-loose | 0.645 | 0.67 | 0.574 | 0.953 | 0.829 | 0.98 |
| AP-bbox-50-side | 0.426 | 0.526 | 0.977 | 0.993 | 0.949 | 0.988 |
| mAP-segm | – | 0.297 | 0.411 | – | 0.381 | 0.956 |
| AP-segm-50-hole | – | 0.05 | 0.0544 | – | 0.0456 | 0.9 |
| AP-segm-50-loose | – | 0.604 | 0.675 | – | 0.663 | 0.98 |
| AP-segm-50-side | – | 0.237 | 0.503 | – | 0.433 | 0.988 |
| FPS | 68.4 | 53.5 | 100.6 | 111.8 | 91.1 | 76.2 |

Note: "–" indicates no segmentation value reported for that model.
Table 4. Results of ablation experiments.

| Metric | Mask R-CNN | +EPCB | +EP-Net | Defect R-CNN |
|---|---|---|---|---|
| mAP-bbox | 0.592 | 0.774 | 0.985 | 0.986 |
| AP-bbox-50-hole | 0.05 | 0.397 | 0.98 | 0.98 |
| AP-bbox-50-loose | 0.651 | 0.982 | 0.989 | 0.989 |
| AP-bbox-50-side | 0.538 | 0.987 | 0.988 | 0.989 |
| mAP-segm | 0.313 | 0.787 | 0.961 | 0.979 |
| AP-segm-50-hole | 0.05 | 0.397 | 0.913 | 0.969 |
| AP-segm-50-loose | 0.642 | 0.978 | 0.98 | 0.98 |
| AP-segm-50-side | 0.249 | 0.987 | 0.989 | 0.989 |
| FPS | 51.91 | 50.97 | 50.96 | 78.85 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
