Article

Enhancing the Performance of Computer Vision Systems in Industry: A Comparative Evaluation Between Data-Centric and Model-Centric Artificial Intelligence

1 BMW Group, Petuelring 130, 80809 Munich, Germany
2 Institute for Data and Process Science, Landshut University of Applied Sciences, Am Lurzenhof 1, 84036 Landshut, Germany
* Author to whom correspondence should be addressed.
Electronics 2025, 14(22), 4366; https://doi.org/10.3390/electronics14224366
Submission received: 26 September 2025 / Revised: 3 November 2025 / Accepted: 4 November 2025 / Published: 7 November 2025
(This article belongs to the Special Issue Emerging Applications of Data Analytics in Intelligent Systems)

Abstract

This research contrasts model-centric (MCAI) and data-centric (DCAI) strategies in artificial intelligence, focusing specifically on optical quality control. It addresses the necessity for a thorough empirical study to evaluate both approaches under identical conditions. By examining casting and leather datasets, the study highlights that the quality and diversity of data play a more vital role in the success of models than merely fine-tuning hyperparameters. While MCAI delivers dependable results with superior datasets, DCAI methods—such as label correction, data augmentation, and generating synthetic data through diffusion models—significantly enhance recognition performance. For the casting dataset, accuracy increased from 83% to 93%, and for the leather dataset, from 53% to 62%. These results indicate that robust AI systems are built on high-quality, balanced data.

1. Introduction

Quality problems affect not only customer satisfaction but also a company’s reputation. Therefore, the manufacturing industry attaches great importance to the production of flawless products [1]. The concept of zero-defect manufacturing (ZDM) has become a cornerstone of high-quality manufacturing processes. One particularly effective method of defect detection is computer vision (CV). In CV, a post-production image of the component is analyzed with algorithms to detect deviations [2,3]. Traditional defect evaluation requires highly qualified employees, who can experience fatigue or loss of concentration because of monotonous tasks [4]. In addition, the short cycle times in production environments make quality control even more difficult. Machine learning (ML), especially deep learning approaches, offers promising solutions to these challenges [5,6]. These methods can learn patterns from the data and subsequently detect anomalies. The implementation of AI-supported quality control in industrial practice often fails not because of the model architecture, but because of insufficient, unbalanced, or poorly annotated training data. In industry, there is a lack of reliable data sources, especially for rare defects, which leads to immature and less robust systems. For precise modeling and to avoid overfitting, however, large, high-quality data sets with diverse defect patterns are necessary [7,8].
The data-related challenges are manifold: the creation of suitable data sets is time-consuming, and data are often insufficient or of inferior quality [9,10]. Other problems include data bias [11], inaccurate or missing labels [12,13], class overlap [14], and data drift [15]. These factors not only make it difficult to train powerful models but also force models to continuously adapt to new data distributions, with potentially negative effects on accuracy. The recent development of generative models can overcome existing challenges in data generation. In particular, these models enable the generation of new data sets that are characterized by high quality, correct label assignments, controlled variance and noise levels, and the targeted integration of rare error cases. By combining modern generative methods with traditional approaches, a targeted improvement of existing data sets can be achieved. This includes expanding the variety as well as increasing data quality and representativeness. Data-centric artificial intelligence (DCAI) can be realized through the targeted combination of these methods. The data-centric approach has gained attention because it improves system performance by iteratively improving the underlying data and is able to address the data-related problems mentioned above. In contrast to DCAI stands the model-centric (MCAI) approach. MCAI improves the system’s performance by tuning hyperparameters and improving the architecture. Therefore, MCAI cannot solve data-related problems sustainably. Although DCAI has shown promising results in domains such as natural language processing and time series analysis [16], the approach is not yet fully researched and established across the different domains of AI. Ignoring the fundamental importance of data has led to uncertainty, bias, and vulnerability in practice [17]. Although data-centric AI is gaining attention, there is still a lack of quantitative and systematic comparison of both paradigms, MCAI and DCAI, under controlled conditions.
Current research increasingly emphasizes the importance of data-centric development. In [17], the authors discuss standardized, open benchmarking of data sets alongside models and examine generalization. Ref. [13] highlights the significance of data quality and collection as fundamental elements in AI development. It discusses the challenges of maintaining robustness with imperfect data while ensuring fairness, and reviews various techniques for validation, cleaning, integration, and equitable modeling in deep learning. The authors of [18] highlight the importance of data-centric AI and describe three primary goals: developing training data, developing inference data, and continuous data maintenance. They describe the associated methodologies, automation processes, and collaborative efforts, highlighting the key challenges and standards throughout the complete data lifecycle. They offer a global viewpoint with recommendations for research and practical approaches to systematically building data-friendly systems.
While these studies focus primarily on conceptual frameworks, benchmark methods, and data quality metrics, comprehensive empirical studies that directly compare data- and model-centric approaches under comparable experimental conditions are still lacking. Previous studies on enhancing system performance have mostly concentrated on architecture optimization or isolated data augmentation without considering data-centric strategies as a complete development system.
This study addresses this gap by both conceptually and empirically contrasting MCAI and DCAI as distinct development frameworks. It especially focuses on incorporating generative diffusion models as a means to enhance targeted data enrichment and improve quality. This creates a structured framework to make data improvement measurable, reproducible, and methodically integrated into the training process. Based on this, we consider the following three research questions (RQ):
  • How does model performance change when data quality and quantity are varied, compared to purely model-centric optimization?
  • What role do synthetically generated data play in improving the detection of rare defect classes in industrial image data sets?
  • How can data-centric AI be applied in optical quality control?
For our experiments, we chose a supervised approach, as unsupervised methods yielded poorer results in various studies. The literature reports insufficient results with autoencoders [19], and similarly, the desired results were not achieved in a study using other unsupervised methods [20]. For these reasons, we decided to use a convolutional neural network (CNN) for our experiments. We performed the analysis on two different data sets. These data sets are publicly available, and researchers have already used them for various experiments. At the beginning of the experiments, we define a validation data set that consists only of real data to compare the performance of the two approaches. The validation data set stays the same for all experiments and is not used for training to achieve comparable results.
Our paper is structured as follows: After the introduction, we explain the foundations and present the state of the art, which we divide into DCAI and MCAI. We also explain the foundations of our diffusion model for synthetic data generation and the basics of a CNN. In the third chapter, we describe our methods, followed by the experiments in chapter four. In the last chapter, we summarize our work and outline future research directions.

2. Foundations and Related Work

The following section provides an overview of the foundations in relation to synthetic image generation using generative AI and CNNs for defect detection. We additionally provide an overview of related works by summarizing the three main studies related to DCAI and MCAI. We then conclude the chapter with a summary.

2.1. Synthetic Data Generation

Synthetic images are computer-generated image data that simulate the visual characteristics and structural variations in real scenes or objects and therefore can serve as a substitute or supplement to real images [21]. Generating such data for industrial applications is a laborious task. Various methods can be used to achieve the goal of data augmentation, including virtual methods [22,23], simulation-based approaches [24] and the use of generative artificial intelligence [25,26]. When using generative AI, several models can be considered, such as autoencoders, generative adversarial networks (GAN) or diffusion models [27]. In our experiments, we use a denoising diffusion probabilistic model (DDPM) that is based on [28]. A diffusion model comprises a diffusion and a denoising part. The purpose of the training process is to minimize the loss. In the following, we briefly describe the different processes that belong to the DDPM.
Forward process: In the forward process of a diffusion model, an image $x_0$ becomes gradually noisy over several time steps $t = 1, 2, \ldots, T$. This process is described by a rule-based stochastic method in which the variance of the noise is controlled by a time-dependent variance schedule $\beta_t$. Within this schedule, the parameter $\beta_t$ is restricted to $0 < \beta_t < 1$ and is structured to increase as $t$ progresses. Since each step $t$ depends only on the previous step $t-1$, it is a Markov process. The forward process is described by the following Equations (1) and (2):
$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$  (1)
$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$  (2)
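To make the forward process concrete, the following minimal sketch implements Equation (2) step by step in Python/PyTorch. The linear schedule for $\beta_t$ and all numeric values are illustrative assumptions; the paper only requires $0 < \beta_t < 1$ with $\beta_t$ increasing in $t$.

```python
import torch

# Forward (noising) process of Eqs. (1)-(2); the linear beta schedule is
# an illustrative assumption.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # variance schedule beta_t

def forward_step(x_prev: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = torch.randn_like(x_prev)
    return torch.sqrt(1.0 - betas[t]) * x_prev + torch.sqrt(betas[t]) * noise

x = torch.rand(1, 1, 128, 128)  # x_0: one grayscale 128 x 128 image
for t in range(T):              # Markov chain: each x_t depends only on x_{t-1}
    x = forward_step(x, t)      # after T steps, x is close to pure Gaussian noise
```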
Denoising Process: Denoising attempts to recreate the initial state of the noisy image. Since the true reverse distribution is not computable, a neural network (NN) is used to approximate it (3). The NN is parametrized by $\theta$. To calculate $x_{t-1}$, the NN takes $x_t$ from the previous step as input and parametrizes the normal distribution (4).
$p_\theta(x_{t-1} \mid x_t) \approx q(x_{t-1} \mid x_t)$  (3)
$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$  (4)
Training: The noise component that is added to $x_{t-1}$ (or, equivalently, to $x_0$) is called $\epsilon$. The model therefore learns $\epsilon_\theta(x_t, t)$ in such a way that it can predict $\epsilon$. The training aims to minimize the mean squared error (MSE), which is used as the loss function (5).
$L(\theta) = \mathbb{E}_{x_0, \epsilon, t}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]$  (5)
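A sketch of this training objective is shown below. Instead of iterating the Markov chain, $x_t$ is drawn directly from the standard closed form $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$, which follows from composing the steps of Equations (1) and (2). Here, `model` stands for an arbitrary noise-prediction network $\epsilon_\theta$, and the schedule values are the same illustrative assumptions as in the previous sketch.

```python
import torch
import torch.nn.functional as F

# Training objective of Eq. (5) with the closed-form forward sample.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # illustrative schedule, as above
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # alpha_bar_t = prod_s (1 - beta_s)

def ddpm_loss(model, x0: torch.Tensor) -> torch.Tensor:
    t = torch.randint(0, T, (x0.shape[0],))      # one random time step per image
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)                   # the noise the model must predict
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps
    return F.mse_loss(model(x_t, t), eps)        # MSE between eps and eps_theta
```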

2.2. Deep Learning for Optical Quality Inspection

Computer vision allows machines to interpret and understand visual information [29]. The combination of deep learning and computer vision overcomes many limitations associated with traditional methods. Deep learning methods can be divided into supervised learning, which requires labeled data, and unsupervised learning, which does not require labeled data. In addition, there is a combination of labeled and unlabeled data, which is called semi-supervised learning. In the following, we will explain the CNN used, but we refer to the literature that summarizes various deep learning methods and use cases related to this topic [29,30].
Convolutional Neural Networks: A convolutional neural network (CNN) is a type of neural network that excels at handling image data. The process begins with the input layer, where image data is received; color images consist of three channels, whereas grayscale images contain only one channel. Pixel values are often normalized before processing to stabilize the calculation. In the convolutional layer, small filters called kernels glide over the image and extract features such as edges and textures, creating new, reduced feature maps [31]. To introduce non-linearities, the ReLU activation function is used, which eliminates negative values. This is followed by the pooling layer, which reduces the spatial dimension of the feature maps and thus the complexity [32]. Max pooling selects the maximum value within an area. In the end, fully connected layers combine all the extracted features to enable the final classification or regression. The output layer provides the probability of each class [33]. CNNs learn important features through the training process, in which the filter weights are optimized, making them particularly powerful for image recognition and other image-based AI applications.

2.3. DCAI in Optical Quality Assurance

Several studies have begun to explore DCAI in visual inspection. These works focus on data set augmentation, edge filtering, and synthetic data generation as strategies to improve model robustness. In the following, three relevant research papers are presented:
The authors of [34] developed an image processing-based system to detect structural defects in crankshafts. For validation, they used the Intersection over Union (IoU) metric, which is the ratio of the overlap between two regions (e.g., bounding boxes or segmentation masks) to their union. Using semantic segmentation and a CNN with MobileNet architecture, an IoU value of 64.7% was achieved. With a DexiNed edge detection filter, it was possible to improve the performance by 8.4%. By improving the training data set through traditional data augmentation, the IoU value finally reached 86.3%. This showed that targeted improvements and the expansion of the data set improve AI-based quality inspection systems.
The authors of [35] developed a generative unsupervised anomaly detection approach for ZDM in additive manufacturing. The anomaly detection is intended to support the quality inspection of printed sand cores in a foundry. Using synthetic data generated by domain randomization, a cost-effective and accurately labeled data set was able to improve the performance of ML algorithms. The authors evaluated their approach using various ML methods.
The authors of [36] describe experiments to improve the classification of defect images in semiconductor production using a data-centric approach. First, data set V1 was used without modifications to establish a baseline. Subsequently, data augmentation with geometric transformations was applied. The third experiment combined resampling for data balancing and augmentation to expand the data set. The fourth experiment used class weighting for algorithmic balance. The fifth and sixth experiments used data sets V2 and V3 with resampling and augmentation, similar to the previous ones. The final experiment achieved a validation accuracy of 92.7%. All experiments used a unified CNN architecture based on EfficientNet-B1. Using this data-centric approach (DCAI), they successfully improved the classification results.

2.4. MCAI in Optical Quality Assurance

In contrast to the data-centric approach, most existing optical quality assurance systems focus on model design, hyperparameter tuning, and transfer learning, the core principles of MCAI. The following presents three papers that have applied this approach.
The authors of [37] investigate the classification of visual defects using neural networks, examining infrared images of thermally conductive components. Their research focuses on the design and training of the models as well as transfer learning. ImageNet was used for pre-training, and the training process was divided into two phases; in the second phase, the hyperparameters were adjusted. With their work, the researchers were able to show that their approach works well, but small defects remain challenging. To improve the system, they propose collecting more data for the affected defect classes.
In their work, Ghansiyal et al. describe an unsupervised approach which, as in our paper, uses the casting data set [38]. With their approach, the authors try to avoid data-related problems, such as insufficient defect data and the resulting unbalanced data sets. To do so, they use generative adversarial networks and an autoencoder to detect anomalies. The authors compared their approaches with AlexNet, which performed better in two of the three experiments.
Hridoy et al. examined transfer learning and the fine-tuning of pre-trained models in depth [39]. Well-known models such as Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 105 v2 were examined. The products analyzed by visual inspection were hex nuts; in addition, the casting data set was used. The hex nut training data set was well balanced and contained 2000 good and 2000 bad examples. The authors only trained the last 14 layers. Using their approach, they achieved 100% accuracy on their hex nut data and 99.72% accuracy on the casting data set.

3. Methodology

The following chapter summarizes the data sets and the methods used to achieve MCAI and DCAI. A separate sub-chapter explains the approach to creating synthetic data.

3.1. Model-Centric Artificial Intelligence

As mentioned above, the main focus of this approach lies in optimizing hyperparameters [40]. The data are pre-processed and cleaned in an initial step, which we consider already completed for the data sets under investigation. This preprocessing includes the removal of conspicuous data points, such as outliers or noisy data, to avoid any negative impact on model performance, as well as data labeling and traditional data augmentation techniques. Subsequently, the prepared data set is used to train the model. Figure 1A illustrates MCAI. After the initial performance evaluation, the hyperparameters are adjusted for optimal results [11]. Key hyperparameters include the batch size, number of convolution filters, dense units, dropout rate, and the choice of loss function.
The cross-entropy loss function (6) is as follows [41]:
$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \right]$  (6)
The Hinge loss function (7) is as follows [41]:
$L = \frac{1}{N} \sum_{i=1}^{N} \max\left(0,\ 1 - y_i \cdot \hat{y}_i\right)$  (7)
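The following toy computation illustrates both loss functions on four hypothetical predictions. All values are made up for illustration; note that the hinge loss expects labels in $\{-1, +1\}$ and raw scores rather than probabilities.

```python
import numpy as np

# Numeric illustration of Eqs. (6) and (7) on four toy predictions.
y_true = np.array([1, 0, 1, 1])          # ground-truth labels
p_pred = np.array([0.9, 0.2, 0.6, 0.4])  # predicted probabilities p_i

# Cross-entropy, Eq. (6): penalizes confident wrong probabilities.
cross_entropy = -np.mean(y_true * np.log(p_pred)
                         + (1 - y_true) * np.log(1 - p_pred))

# Hinge loss, Eq. (7): map labels to {-1, +1} and derive toy raw scores.
y_pm = 2 * y_true - 1
scores = 2 * p_pred - 1
hinge = np.mean(np.maximum(0.0, 1.0 - y_pm * scores))

print(f"cross-entropy: {cross_entropy:.3f}, hinge: {hinge:.3f}")
# prints: cross-entropy: 0.439, hinge: 0.650
```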
This iterative process of training and evaluation is repeated until the model is eventually implemented. In our experiments, we employ a grid search strategy to explore various combinations of hyperparameters. We evaluated each combination over 10 epochs and selected the configuration with the highest accuracy for the final training. Accuracy (8) is defined as follows [42]:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$  (8)
  • TP: True Positives;
  • TN: True Negatives;
  • FP: False Positives;
  • FN: False Negatives.
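To make the grid search concrete, the sketch below iterates over the hyperparameter names from Table 2. The value ranges, the model-building helper, and the placeholder data are illustrative assumptions, not the paper's exact configuration; each combination is trained for 10 epochs and the configuration with the highest validation accuracy is kept.

```python
import itertools
import numpy as np
import tensorflow as tf

# Illustrative grid over the Table 2 hyperparameter names (values assumed).
grid = {
    "conv_filters":   [16, 32],
    "conv_filters_2": [32, 64],
    "dense_units":    [64, 128],
    "dropout_rate":   [0.2, 0.5],
    "learning_rate":  [1e-3, 1e-4],
    "batch_size":     [16, 32],
    "loss_function":  ["binary_crossentropy", "hinge"],
}

def build_model(conv_filters, conv_filters_2, dense_units,
                dropout_rate, learning_rate, loss_function):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 1)),
        tf.keras.layers.Conv2D(conv_filters, 3, strides=2, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(conv_filters_2, 3, strides=2, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(dense_units, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # Eq. (9) output
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss=loss_function, metrics=["accuracy"])
    return model

# Placeholder data so the sketch runs stand-alone; substitute the real images.
x_train = np.random.rand(64, 128, 128, 1); y_train = np.random.randint(0, 2, 64)
x_val = np.random.rand(16, 128, 128, 1); y_val = np.random.randint(0, 2, 16)

best_acc, best_cfg = 0.0, None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    model = build_model(**{k: v for k, v in cfg.items() if k != "batch_size"})
    hist = model.fit(x_train, y_train, epochs=10, batch_size=cfg["batch_size"],
                     validation_data=(x_val, y_val), verbose=0)
    acc = max(hist.history["val_accuracy"])
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg
print(best_acc, best_cfg)
```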
Figure 1. Visualization of the model-centric (A) and data-centric approach (B) [43].

3.2. Data-Centric Artificial Intelligence

DCAI, as illustrated in Figure 1B, highlights the transition from focusing on hyperparameter optimization to prioritizing high-quality data and its preparation. Data are first reviewed for correct labels and outliers and corrected if necessary. This requires collaboration between domain experts and ML or data experts to combine domain knowledge with technical expertise, which can improve the robustness of the model. Data augmentation and data enrichment must also be considered, using both conventional approaches and newly created synthetic data. The goal is to represent features adequately and balance the data set to ensure a high generalizability of the model. Analyzing a confusion matrix is used primarily to identify weaknesses in the model, such as a high number of false positives or false negatives. These insights enable targeted data corrections, e.g., by correcting labels, adding representative examples, or adjusting the data augmentation strategy. This consistently enhances data quality, thereby boosting model performance.
To enhance the data sets in our experiments, we improve the labels, augment the data, and add synthetic data generated by a diffusion model. It is important to note that these steps are applied on top of data sets that are already of high quality.
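As a sketch of this confusion-matrix-driven review loop, the snippet below flags the validation images that the current model misclassifies so that domain experts can check their labels or add similar examples. Here, `model`, `x_val`, and `y_val` are assumed to come from a prior training iteration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Confusion-matrix-driven data review: find candidates for label correction.
y_prob = model.predict(x_val).ravel()       # sigmoid outputs in [0, 1]
y_pred = (y_prob >= 0.5).astype(int)        # threshold into 'ok' (0) / 'nok' (1)

print(confusion_matrix(y_val, y_pred))      # rows: true class, columns: predicted

suspect_idx = np.where(y_pred != y_val)[0]  # misclassified samples
print(f"{len(suspect_idx)} images flagged for expert label review")
```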

3.3. Data Sets

This study employs two separate data sets that represent different stages of the production process. The casting data set signifies a well-established process with ample defect data suitable for model training, whereas the leather surface data set represents an initial-stage process with a scarcity of defect samples, complicating the model development. Figure 2 and Figure 3 illustrate representative samples from each data set. These images are included to visually demonstrate the differences between defect distributions and data availability across both domains, as well as to show the quality of synthetic defect images generated for data augmentation. The categories comprise good and defective samples (Figure 2a,b and Figure 3a,b) and a corresponding synthetic example per data set (Figure 2c and Figure 3c).
In order to address the significant data imbalance, synthetic images were created to enhance the representation of the defect class. In the leather data set, images from both categories were created, while in the casting data set, which already had an adequate number of good images, only images of defects were generated.
The initial data set comprises images of impellers and can be accessed on Kaggle [44]. The components were produced using a casting process. In its basic form, the data set contains 1300 images, of which 781 images show defects and 519 images represent a good product. Additionally, the data set has been enlarged to 7348 images using data augmentation; however, we did not consider the enlarged data set. The aim is to train a model that distinguishes between good and bad products. The defective products show numerous casting imperfections that are not specifically classified but are collectively labeled as defective.
The MVTec Leather Data Set is designed for assessing anomaly detection techniques, specifically in the context of industrial inspections [45,46]. It includes more than 5000 images divided into fifteen different object and texture categories, such as screws, transistors, or cables. Each category contains a set of defect-free training images, as well as a test set of images with different defects and defect-free images. However, in this paper, we refer only to the training images for leather surfaces. This data set contains 247 images, of which 51 are defect images divided into different categories, which we summarize into the class ‘defective.’
In training both DDPMs, we utilized all accessible images, incorporating data augmentation. The precise count of images is detailed in Table 1. During image generation, the models produced images belonging to the ‘ok’ and ‘nok’ categories.

3.4. Model Design and Configuration

The various models used in the experiment are explained below. The structures of DDPM and CNN are discussed. The corresponding training parameters are also explained here.

3.4.1. Convolutional Neural Network

At the beginning of the experiments, the images were resized to 128 × 128 pixels. For the casting data set, we use one input channel; for the leather surfaces data set, we use three, as the images are colored. The pixel values were then normalized by dividing them by 255. We used a CNN to perform the experiment. The CNN comprises two convolutional layers with 3 × 3 filters and a stride of two without zero padding. As an activation function, we used ReLU. We also use max pooling to reduce the size of the respective layers [31,47]. The CNN categorizes data into good or bad, employing the sigmoid function (9) as noted in [48].
$S(x) = \frac{1}{1 + e^{-x}}$  (9)
The other parameters depend on the hyperparameter tuning. Table 2 shows the different parameter configurations. A confusion matrix, created for each test, visualizes and compares classification accuracy and performance. The confusion matrix serves as the basis for the later discussion and for the evaluation metrics.
In our experiments, we use early stopping based on the accuracy achieved. A maximum of 100 epochs was set, but none of the seven experiments reached this limit. The system stopped training if no accuracy improvement was observed over five consecutive epochs. The data set was divided with 10% allocated to test data, and the remaining 90% was further divided into 10% for validation and 90% for training.
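The split and stopping rule described above could be implemented as in the following sketch. Here, `images`, `labels`, and `build_model` are assumed from the grid-search sketch in Section 3.1, and the chosen hyperparameter values are placeholders rather than the tuned configuration.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 10% test data; the remaining 90% is split again into 10% validation
# and 90% training, as described above.
x_rest, x_test, y_rest, y_test = train_test_split(images, labels, test_size=0.10)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=0.10)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",      # stop when accuracy stops improving
    patience=5,                  # five consecutive epochs without improvement
    restore_best_weights=True)

model = build_model(conv_filters=32, conv_filters_2=64, dense_units=128,
                    dropout_rate=0.2, learning_rate=1e-3,
                    loss_function="binary_crossentropy")
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=[early_stop])  # 100-epoch ceiling, never reached
```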

3.4.2. Diffusion Model

A denoising diffusion probabilistic model (DDPM) was implemented to generate synthetic defect images. The diffusion backbone consisted of a four-stage U-Net architecture with a base channel dimension of 16 and channel multipliers of (1, 2, 4, 8), which enabled hierarchical feature extraction across different spatial resolutions. Flash attention was used to improve training efficiency and memory usage.
The Gaussian diffusion model was set up with 10,000 time steps for training and 100 time steps for the sampling phase, with an image resolution of 128 × 128 pixels. To optimize the model, it was trained to predict the noise component (ε prediction target) by minimizing the mean squared error.
Training progress was evaluated every 1000 steps: sample images demonstrating the model’s current capabilities were produced and reviewed by the AI engineer. For the casting data set, the DDPM was trained for 70,000 epochs, while for the leather data set, the model was trained for 30,000 epochs. Due to the lower complexity of the images in the leather data set compared to those in the casting data set, a shorter training duration was adequate. However, it is recommended to structure the model training based on the intermediate samples produced to effectively track and assess progress. The DDPM was trained solely on both classes of the training and validation data mentioned earlier.
Following the training, we generated synthetic images, but not all were satisfactory, making it essential for an expert to select them.
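The configuration described above (base dimension 16, channel multipliers (1, 2, 4, 8), flash attention, 10,000 training time steps, 100 sampling steps, evaluation every 1000 steps) matches the interface of the open-source denoising-diffusion-pytorch package. The paper does not name its implementation, so the following is only a hedged reconstruction under that assumption; in particular, the batch size is a guess, and the reported training budget is mapped to `train_num_steps`.

```python
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

# Hypothetical reconstruction; the paper does not name its implementation.
# channels=1 targets the grayscale casting images.
model = Unet(dim=16, dim_mults=(1, 2, 4, 8), channels=1, flash_attn=True)

diffusion = GaussianDiffusion(
    model,
    image_size=128,          # 128 x 128 pixel resolution
    timesteps=10000,         # diffusion time steps used for training
    sampling_timesteps=100,  # accelerated sampling with 100 steps
    objective="pred_noise",  # epsilon prediction target, minimizing the MSE
)

trainer = Trainer(
    diffusion,
    "path/to/casting/images",    # hypothetical folder of training images
    train_batch_size=8,          # assumption; not reported in the paper
    train_num_steps=70000,       # reported training budget for casting
    save_and_sample_every=1000,  # intermediate samples for expert review
)
trainer.train()

synthetic = diffusion.sample(batch_size=16)  # candidate defect images
```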

4. Evaluation

In this chapter, we describe the basic results of the experiments performed. The metrics used to evaluate the trained models, the approaches to MCAI and DCAI, and the generation of synthetic data are explained.

4.1. Metrics

To evaluate the performance of the different approaches and the associated techniques, we calculate precision, recall, and the F1 score, which are explained in [42].
The precision measures the proportion of instances correctly classified as positive among all instances classified as positive. It evaluates how accurate the model is in identifying positive classes.
$\mathrm{Precision} = \frac{TP}{TP + FP}$
The recall measures the proportion of instances correctly classified as positive among all actually positive instances. It evaluates how well the model detects all positive instances.
$\mathrm{Recall} = \frac{TP}{TP + FN}$
The F1 score is the harmonic mean of precision and recall. It combines both metrics to provide a balanced assessment of model performance, especially on imbalanced data sets.
$F_1\ \mathrm{Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
For each data set, we created a nearly balanced validation data set consisting of 10% of the original data, which was used for all tests. For our casting data set, we used 51 okay (‘ok’) and 72 not okay (‘nok’) images. Our leather test data set comprised 32 ‘ok’ and 28 ‘nok’ images. We divided the remaining data into 10% for validation and 90% for training.
Additionally, we plot the associated confusion matrix of each test based on the results. The confusion matrix gives an excellent overview of correctly versus incorrectly predicted classes and forms the basis for the calculations. Furthermore, we have illustrated the receiver operating characteristic (ROC) curve for each experiment, along with the corresponding area under the curve (AUC) value. The plot shows the true positive rate (TPR) vs. the false positive rate (FPR) at different thresholds. The ROC curve illustrates the balance between a classifier’s sensitivity and specificity, while the AUC summarizes it in a single value. The AUC value indicates the ability of the model to distinguish between the classes [49].
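These metrics and the ROC/AUC plot can be computed as in the following sketch, assuming `y_true` holds the labels of the fixed validation set and `y_prob` the model's sigmoid outputs from an earlier run:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_curve, roc_auc_score)

# Threshold the sigmoid outputs and report the metrics defined above.
y_pred = (y_prob >= 0.5).astype(int)
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))

# ROC curve: true positive rate vs. false positive rate over all thresholds.
fpr, tpr, _ = roc_curve(y_true, y_prob)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, y_prob):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```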

4.2. Model-Centric Approach

This section provides a concise summary of the experiments related to MCAI. Referencing Table 2, in our experiments, the initial hyperparameter setup achieved an accuracy of 83% during tuning for the casting data set. The model was trained on a data set that included 468 ‘ok’ images and 709 ‘nok’ images. For the leather data set, the initial hyperparameter configuration also reached the highest accuracy, at 53%. The training utilized 196 ‘ok’ images and 51 ‘nok’ images, indicating a notably imbalanced data set.

4.3. Data-Centric Approach

In this subsection, we briefly describe the DCAI experiments. Compared to the MCAI experiment, we conducted multiple experiments with different data sets. We trained each model with the same parameters as in the MCAI approach with the improved data sets. We have summarized the various parameters and results per data set in Table 1.

Casting Data Set:

Data labeling: In the initial iteration of data refinement, we observed inaccuracies in some labels, specifically affecting the ‘nok’ class. We eliminated 39 images, retrained the model, and achieved an accuracy of 86%.
Data augmentation: In the subsequent experiment, our goal was to balance the data set using data augmentation techniques. To minimize the risk of overfitting, we introduced only a limited amount of augmented data. The augmentation consisted of flipping and resizing the original images (see the sketch below). As a result, a training set was generated with 849 ‘ok’ images and 828 ‘nok’ images. The highest accuracy achieved in this experiment was 89%.
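A minimal sketch of this augmentation step is shown below. The 90% central crop before resizing back to the input resolution is an assumption used to vary the apparent scale, and `nok_images` is a hypothetical tensor of defect images.

```python
import tensorflow as tf

# Flip-and-resize augmentation as described above (crop fraction assumed).
def augment(image: tf.Tensor) -> tf.Tensor:
    image = tf.image.flip_left_right(image)             # mirror horizontally
    image = tf.image.central_crop(image, central_fraction=0.9)
    return tf.image.resize(image, (128, 128))           # back to input size

# nok_images: hypothetical tensor of defect images, shape (N, 128, 128, 1).
augmented_nok = tf.map_fn(augment, nok_images)
```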
Data enrichment: The final approach involved the addition of new data, a process we called ‘data enrichment’. A diffusion model, as described in the previous section, generated these new data. We generated 100 artificial defect images and chose the top 59 ‘nok’ images. The enriched data set contained 849 ‘ok’ images and 887 ‘nok’ images. Since the classification for the ‘ok’ class already works quite well, we have only added 59 ‘nok’ images. This approach resulted in an accuracy of 93%.

Leather Data Set:

Data labeling: The labels of the leather data set were already accurate, making any additional relabeling or label removal unnecessary. Consequently, data labeling is not included in Table 3.
Data augmentation: Since the leather data set was already error-free in terms of its labels, we only used data augmentation and enrichment techniques. For the first experiment, we used the previously described data augmentation methods, including mirroring and resizing the original images. The result was a training set with 301 ‘ok’ images and 192 ‘nok’ images. The model achieved an accuracy of 27%.
Data enrichment: In the final experiment, we created 250 synthetic images to enhance the data set. We chose all 142 high-quality defect images and balanced them with 43 images in the OK category. This synthetic data was created using a diffusion model as described in the previous section. The augmented data set for this experiment included 344 ‘ok’ images and 334 ‘nok’ images. The accuracy achieved was 62%.

5. Discussion

In the following section, we discuss the results of each experiment. To this end, we summarize the DCAI experiments in Table 1, while Table 3 shows the results for the metrics precision, recall, and F1 score. As a basis for the discussion, we plot the seven confusion matrices in Figure 4. After discussing the experiments, we return to the research questions and answer them.

5.1. Casting Data Set

Experiment a: The MCAI approach for the casting data set yields a precision of 0.71 for the category ‘ok’ and 0.98 for the category ‘nok’. The recall is nearly inverse, with 0.98 for ‘ok’ and 0.72 for ‘nok’. The F1 score for both categories is 0.83, showing a balanced performance. However, as we can see from the confusion matrix shown in Figure 4a, the model has trouble predicting defects, since it classifies 20 samples as ‘ok’ that are actually defective.
Experiment b: Improving the labels increases the precision and F1 score for ‘ok’ to 0.75 and 0.86, respectively, and for ‘nok’ to 1.00 and 0.87, respectively. This shows that label accuracy significantly affects the performance of the model, especially in defect detection. We observed in the confusion matrix (Figure 4c) that the classification of defects identified as ‘ok’ improved by three samples, and all ‘ok’ images are now correctly classified.
Experiment c: Data augmentation further increases the precision for ‘ok’ to 0.80 and for ‘nok’ to 1.00. The F1 scores improved to 0.89 and 0.90, respectively. This shows that increasing the amount and diversity of the data through augmentation significantly improves the performance of the model. The ‘nok’ classification improved again, with six fewer components incorrectly classified.
Experiment d: Data enrichment leads to the best results, with a precision of 0.88 for ‘ok’ and 0.98 for ‘nok’ and an F1 score of 0.93 and 0.94, respectively. This underlines the importance of high-quality and diverse data for model performance. The confusion matrix (Figure 4e) shows that we reduced the number of incorrectly predicted defects from 20 to 7.

5.2. Leather Data Set

Experiment e: The MCAI approach for the leather data set shows a strong difference between the categories. Although the category ‘ok’ has a precision of 0.53, the values for the category ‘nok’ are 0.00, showing a complete misclassification. These values illustrate that the model cannot classify any ‘nok’ components. The confusion matrix (Figure 4b) reveals that the model cannot distinguish between the ‘ok’ and ‘nok’ components. The model achieves 53% accuracy; however, this is not a meaningful value, as the data set is very unbalanced.
Experiment f: Data augmentation improves the precision of ‘nok’ to 0.31 and the F1 score to 0.37, while the values for ‘ok’ drop to 0.17 and 0.12, respectively. The model now detects at least some defects, so the augmentation can be helpful; however, the model is still far from optimal. The confusion matrix (Figure 4f) shows that the classification of defects has improved, but overall performance remains poor, which results in a very low accuracy of 27%.
Experiment g: Data enrichment yields significantly better results. The precision improves to 0.64 for ‘ok’ and 0.59 for ‘nok’, and the F1 scores are 0.65 and 0.58 for the two categories. This shows that data enrichment can significantly improve the performance of the model even with more demanding data sets. The improvement is shown in the confusion matrix (Figure 4g).
Our experiments indicate that DCAI significantly enhances the performance of ML models. Notably, when paired with data augmentation and synthetic data, these methods yielded remarkable outcomes. This positively influenced the models’ classification capabilities. The latest confusion matrix highlights the beneficial impact of high-quality data sets on performance. The model’s classification score has improved markedly since the initial phase. The positive effect of DCAI compared to MCAI is particularly apparent in Figure 5 and Figure 6. An AUC value of 1 signifies a perfect model, whereas a value of 0.5 (represented by the dashed line in the figures) suggests random guessing. While the AUC value for the casting models is already nearly 1, the AUC value for the leather tests shows a notable improvement.

5.3. Research Questions

On the basis of our previously defined RQs, we discuss the results of the experiments in the following section.
How does model performance change when data quality and quantity are varied, compared to purely model-centric optimization? When examining this question, several key observations emerge. Firstly, while an increase in the amount of data can provide more information for the model, it does not inherently lead to improved performance. The presence of incorrectly labeled data can hinder the effectiveness of any model or hyperparameter tuning, emphasizing the importance of data quality over sheer quantity. In our experiments, we found that improving label quality, particularly within the casting data set, led to significant performance improvements across all metrics, even with a reduced volume of data. This underscores the critical role of accurately labeled data in driving model performance. The ratio of poor-quality data to high-quality data is pivotal; reducing the proportion of poor-quality data improves system performance more effectively than simply increasing data volume. Thus, while model-centric optimizations, such as tuning hyperparameters, are important, they cannot compensate for low data quality. This analysis clearly demonstrates that focusing on data-centric improvements, ensuring data are high-quality and well labeled, can lead to superior performance outcomes compared to relying solely on a model-centric strategy.
What role do synthetically generated data play in improving the detection of rare defect classes in industrial image data sets? Our experiments have shown that synthetic data have a positive effect on the model. The generated data help balance the data sets, allowing the model to better distinguish between different categories. This was particularly evident in our leather data set. Furthermore, synthetic data can reduce biases in AI systems. Unlike traditional data augmentation methods, which typically only alter the shape, size, or position of the data, our DDPM generated entirely new data. However, it is important to note that generating synthetic data with a DDPM requires a certain amount of initial data. Augmentation techniques can serve as a foundation for enabling the training of the DDPM. In general, synthetic data enrichment had a positive impact on both experiments, improving recall, precision, accuracy, and the F1 score.
How can data-centric AI be applied in optical quality control? Through our experiments, we have proven that data-centric AI is a valuable tool for increasing the performance of an AI system. Industry should consider the use of data-centric AI in all systems. The different methods used here differ depending on the use case in the real world. Only under rare circumstances is a good and balanced data set already available. The methods used also differ depending on the maturity of the production. In the early stages, methods for increasing the number of data are essential. Data quality and labeling are especially critical for established processes. The time of system implementation and current performance determine the extent of data assessment.

6. Conclusions and Future Research Work

The final chapter summarizes our research experiments and findings. Based on this perspective, we identify future research directions for DCAI in industry.

6.1. Conclusions

In this paper, we conducted a comparative analysis of data-centric AI (DCAI) and model-centric AI (MCAI) using several data sets. First, we outlined the basic principles of both approaches and the creation of synthetic data, and presented the data sets used in our study. We then explained the experimental setup and the evaluation metrics used, including precision, recall, and F1 score. Our experiments showed that MCAI is particularly effective for data sets where sufficient data of all classes are available and of good quality. In contrast, DCAI serves as a complementary approach depending on the use case. Our results show that DCAI can be used as a powerful improvement strategy, especially in scenarios where data quality and quantity are critical factors. The leather data set provided a notable example; the application of DCAI significantly improved performance compared to the model-centric approach. By extending the data set and incorporating synthetic defect data, we not only achieved a balanced class distribution but also improved the model’s ability to effectively distinguish between good and bad instances. DCAI also improved results on the casting data set, but since its data quality was already high, the gains were smaller than for the leather data set. This underlines the potential of DCAI as a valuable tool for optimizing ML models, especially in areas where data refinement and enhancement play a central role. The previous chapter detailed our findings related to the research questions established initially. We summarize our findings in Table 4.
The research relies on just two data sets that have well-defined class structures. This limits the extent to which the findings can be applied to more complex or multimodal contexts and diminishes their overall significance. Moreover, the use of synthetic data created through diffusion models poses the risk of leaking information. This situation may result in an overestimated perception of performance improvements and may question the originality of the synthetic data produced. Additionally, data-centric methods require a substantial amount of manual work, such as expert labeling, and incur significant computational expenses for both the creation and verification of synthetic data. Scalable, semi-automated methods would therefore be desirable. Finally, it remains unclear to what extent the observed effects are transferable to other model architectures such as vision transformers or multimodal encoders, as the study deliberately uses a CNN to ensure the most direct comparison possible.

6.2. Future Research Work

Future work should focus on improving the methods that can be used for DCAI. We divide these into several directions and explain them in the following. Furthermore, many models are only available to the user as black-box models, and their training data are unknown. A method that tests for data-related errors in a black-box model would improve the retraining approach.
Synthetic data generation: Synthesizing data has traditionally been difficult. Although diffusion models performed well for our applications, their training process is time-consuming, and generative AI struggles especially with insufficient training data. To perform well, deep learning models for industrial quality inspection require many images of parts with similar anomalies. Future research should focus on developing tools that ease this data generation.
Data labeling: Improved labeling methods raise annotation quality. Various approaches, such as active learning, have already been used to automate data labeling [50,51]. However, in many areas, domain-specific knowledge from experts is required. Future research could look at how to combine data labeling with active learning to automate the process as much as possible. The problem of overlapping and misclassified classes could be mitigated by precisely defining each specific defect case.
Model testing: The model is currently being tested with the metrics mentioned above. However, the gap between the real world and product development is often large. Further research would be on how to test models with external data.
Black box testing: AI often represents a black-box behavior. Data tests build trust and understanding of the technology. It is also conceivable to create synthetic data and validate the model on these data.
Looking at future research work, several individual tasks arise. The big goal would be to have everything in one pipeline.

Author Contributions

M.N., conceptualization, methodology, software, validation, data curation, writing—original draft, visualization; A.Z., supervision, writing—review and editing; H.T., supervision, writing—review and editing; B.F., writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are available in the sources cited in the text.

Acknowledgments

During the preparation of this work, the authors used ProWritingAid to eliminate possible grammatical or spelling errors. ChatGPT 4 and DeepSeek V3 were used to create the LaTeX syntax for tables, images, and mathematical formulas. After using these tools/services, the authors reviewed and edited the content as needed and assume full responsibility for the content of the publication.

Conflicts of Interest

Authors Michael Nieberl, Alexander Zeiser and Bastian Friedrich were employed by the company BMW Group. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Govaerts, R.; De Bock, S.; Stas, L.; El Makrini, I.; Habay, J.; Van Cutsem, J.; Roelands, B.; Vanderborght, B.; Meeusen, R.; De Pauw, K. Work performance in industry: The impact of mental fatigue and a passive back exoskeleton on work efficiency. Appl. Ergon. 2023, 110, 104026. [Google Scholar] [CrossRef]
  2. Nascimento, R.; Martins, I.; Dutra, T.A.; Moreira, L. Computer Vision Based Quality Control for Additive Manufacturing Parts. Int. J. Adv. Manuf. Technol. 2023, 124, 3241–3256. [Google Scholar] [CrossRef]
  3. Villalba-Diez, J.; Schmidt, D.; Gevers, R.; Ordieres-Meré, J.; Buchwitz, M.; Wellbrock, W. Deep Learning for Industrial Computer Vision Quality Control in the Printing Industry 4.0. Sensors 2019, 19, 3987. [Google Scholar] [CrossRef] [PubMed]
  4. Neumann, W.P.; Kolus, A.; Wells, R.W. Human Factors in Production System Design and Quality Performance – A Systematic Review. IFAC-PapersOnLine 2016, 49, 1721–1724. [Google Scholar] [CrossRef]
  5. Hachem, C.E.; Perrot, G.; Painvin, L.; Couturier, R. Automation of Quality Control in the Automotive Industry Using Deep Learning Algorithms. In Proceedings of the 2021 International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 8–10 January 2021; pp. 123–127. [Google Scholar] [CrossRef]
  6. Msakni, M.K.; Risan, A.; Schütz, P. Using machine learning prediction models for quality control: A case study from the automotive industry. Comput. Manag. Sci. 2023, 20, 14. [Google Scholar] [CrossRef] [PubMed]
  7. Najafabadi, M.M.; Villanustre, F.; Khoshgoftaar, T.M.; Seliya, N.; Wald, R.; Muharemagic, E. Deep learning applications and challenges in big data analytics. J. Big Data 2015, 2, 1. [Google Scholar] [CrossRef]
  8. Munappy, A.R.; Bosch, J.; Olsson, H.H.; Arpteg, A.; Brinne, B. Data management for production quality deep learning models: Challenges and solutions. J. Syst. Softw. 2022, 191, 111359. [Google Scholar] [CrossRef]
  9. Hamid, O.H. From Model-Centric to Data-Centric AI: A Paradigm Shift or Rather a Complementary Approach? In Proceedings of the 2022 8th International Conference on Information Technology Trends (ITT), Dubai, United Arab Emirates, 25–26 May 2022; pp. 196–199. [Google Scholar] [CrossRef]
  10. Priestley, M.; O’donnell, F.; Simperl, E. A Survey of Data Quality Requirements That Matter in ML Development Pipelines. J. Data Inf. Qual. 2023, 15, 11. [Google Scholar] [CrossRef]
  11. Jarrahi, M.H.; Memariani, A.; Guha, S. The Principles of Data-Centric AI. Commun. ACM 2023, 66, 84–92. [Google Scholar] [CrossRef]
  12. Zha, D.; Bhat, Z.P.; Lai, K.H.; Yang, F.; Jiang, Z.; Zhong, S.; Hu, X. Data-centric Artificial Intelligence: A Survey. arXiv 2023. [Google Scholar] [CrossRef]
  13. Whang, S.E.; Roh, Y.; Song, H.; Lee, J.G. Data collection and quality challenges in deep learning: A data-centric AI perspective. VLDB J. 2023, 32, 791–813. [Google Scholar] [CrossRef]
  14. Patel, H.; Guttula, S.; Gupta, N.; Hans, S.; Mittal, R.S.; N, L. A Data-centric AI Framework for Automating Exploratory Data Analysis and Data Quality Tasks. J. Data Inf. Qual. 2023, 15, 44. [Google Scholar] [CrossRef]
  15. Bauer, J.C.; Trattnig, S.; Vieltorf, F.; Daub, R. Handling data drift in deep learning-based quality monitoring: Evaluating calibration methods using the example of friction stir welding. J. Intell. Manuf. 2025. [Google Scholar] [CrossRef]
  16. Kumar, S.; Datta, S.; Singh, V.; Singh, S.K.; Sharma, R. Opportunities and Challenges in Data-Centric AI. IEEE Access 2024, 12, 33173–33189. [Google Scholar] [CrossRef]
  17. Mazumder, M.; Banbury, C.; Yao, X.; Karlaš, B.; Gaviria Rojas, W.; Diamos, S.; Diamos, G.; He, L.; Parrish, A.; Kirk, H.R.; et al. DataPerf: Benchmarks for Data-Centric AI Development. Adv. Neural Inf. Process. Syst. 2023, 36, 5320–5347. [Google Scholar]
  18. Zha, D.; Lai, K.H.; Yang, F.; Zou, N.; Gao, H.; Hu, X. Data-centric AI: Techniques and Future Perspectives. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 5839–5840. [Google Scholar] [CrossRef]
  19. Chamberland, O.; Reckzin, M.; Hashim, H.A. An Autoencoder with Convolutional Neural Network for Surface Defect Detection on Cast Components. J. Fail. Anal. Prev. 2023, 23, 1633–1644. [Google Scholar] [CrossRef]
  20. Zipfel, J.; Verworner, F.; Fischer, M.; Wieland, U.; Kraus, M.; Zschech, P. Anomaly detection for industrial quality assurance: A comparative evaluation of unsupervised deep learning models. Comput. Ind. Eng. 2023, 177, 109045. [Google Scholar] [CrossRef]
  21. Gaspar, F.; Carreira, D.; Rodrigues, N.; Miragaia, R.; Ribeiro, J.; Costa, P.; Pereira, A. Synthetic image generation for effective deep learning model training for ceramic industry applications. Eng. Appl. Artif. Intell. 2025, 143, 110019. [Google Scholar] [CrossRef]
  22. Rajendran, M.; Tan, C.T.; Atmosukarto, I.; Ng, A.B.; See, S. Review on synergizing the Metaverse and AI-driven synthetic data: Enhancing virtual realms and activity recognition in computer vision. Vis. Intell. 2024, 2, 27. [Google Scholar] [CrossRef]
  23. Man, K.; Chahl, J. A Review of Synthetic Image Data and Its Use in Computer Vision. J. Imaging 2022, 8, 310. [Google Scholar] [CrossRef]
  24. Dahmen, T.; Trampert, P.; Boughorbel, F.; Sprenger, J.; Klusch, M.; Fischer, K.; Kübel, C.; Slusallek, P. Digital reality: A model-based approach to supervised learning from synthetic data. AI Perspect. 2019, 1, 2. [Google Scholar] [CrossRef]
  25. Figueira, A.; Vaz, B. Survey on Synthetic Data Generation, Evaluation Methods and GANs. Mathematics 2022, 10, 2733. [Google Scholar] [CrossRef]
  26. Kim, K.; Myung, H. Autoencoder-Combined Generative Adversarial Networks for Synthetic Image Data Generation and Detection of Jellyfish Swarm. IEEE Access 2018, 6, 54207–54214. [Google Scholar] [CrossRef]
  27. Zhou, H.A.; Wolfschläger, D.; Florides, C.; Werheid, J.; Behnen, H.; Woltersmann, J.H.; Pinto, T.C.; Kemmerling, M.; Abdelrazeq, A.; Schmitt, R.H. Generative AI in industrial machine vision: A review. J. Intell. Manuf. 2025. [Google Scholar] [CrossRef]
  28. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020. [Google Scholar] [CrossRef]
  29. Raisul Islam, M.; Zakir Hossain Zamil, M.; Eshmam Rayed, M.; Mohsin Kabir, M.; Mridha, M.F.; Nishimura, S.; Shin, J. Deep Learning and Computer Vision Techniques for Enhanced Quality Control in Manufacturing Processes. IEEE Access 2024, 12, 121449–121479. [Google Scholar] [CrossRef]
  30. Tercan, H.; Meisen, T. Machine learning and deep learning based predictive quality in manufacturing: A systematic review. J. Intell. Manuf. 2022, 33, 1879–1905. [Google Scholar] [CrossRef]
  31. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  32. Guo, Y.; Liu, Y.; Oerlemans, A.; Lao, S.; Wu, S.; Lew, M.S. Deep learning for visual understanding: A review. Neurocomputing 2016, 187, 27–48. [Google Scholar] [CrossRef]
  33. Xia, Z. An Overview of Deep Learning. In Deep Learning in Object Detection and Recognition; Jiang, X., Hadid, A., Pang, Y., Granger, E., Feng, X., Eds.; Springer: Singapore, 2019; pp. 1–18.
  34. Moosavian, A.; Bagheri, E.; Yazdanijoo, A.; Barshooi, A.H. An Improved U-Net Image Segmentation Network for Crankshaft Surface Defect Detection. In Proceedings of the 2024 13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP), Tehran, Iran, 6–7 March 2024; pp. 1–6.
  35. Zeiser, A.; Özcan, B.; Van Stein, B.; Bäck, T. Evaluation of deep unsupervised anomaly detection methods with a data-centric approach for on-line inspection. Comput. Ind. 2023, 146, 103852.
  36. Kofler, C.; Dohr, C.A.; Dohr, J.; Zernig, A. Data-Centric Model Development to Improve the CNN Classification of Defect Density SEM Images. In Proceedings of the IECON 2022—48th Annual Conference of the IEEE Industrial Electronics Society, Brussels, Belgium, 17–20 October 2022; pp. 1–6.
  37. Weiher, K.; Rieck, S.; Pankrath, H.; Beuss, F.; Geist, M.; Sender, J.; Fluegge, W. Automated visual inspection of manufactured parts using deep convolutional neural networks and transfer learning. Procedia CIRP 2023, 120, 858–863.
  38. Ghansiyal, S.; Yi, L.; Simon, P.M.; Klar, M.; Müller, M.M.; Glatt, M.; Aurich, J.C. Anomaly detection towards zero defect manufacturing using generative adversarial networks. Procedia CIRP 2023, 120, 1457–1462.
  39. Hridoy, M.W.; Rahman, M.M.; Sakib, S. A Framework for Industrial Inspection System using Deep Learning. Ann. Data Sci. 2024, 11, 445–478.
  40. Malerba, D.; Pasquadibisceglie, V. Data-Centric AI. J. Intell. Inf. Syst. 2024, 62, 1493–1502.
  41. Wang, Q.; Ma, Y.; Zhao, K.; Tian, Y. A Comprehensive Survey of Loss Functions in Machine Learning. Ann. Data Sci. 2022, 9, 187–212.
  42. Dalianis, H. Evaluation Metrics and Evaluation. In Clinical Text Mining; Springer International Publishing: Cham, Switzerland, 2018; pp. 45–53.
  43. Nieberl, M.; Zeiser, A.; Timinger, H. A Review of Data-Centric Artificial Intelligence (DCAI) and its Impact on manufacturing Industry: Challenges, Limitations, and Future Directions. In Proceedings of the 2024 IEEE Conference on Artificial Intelligence (CAI), Singapore, 25–27 June 2024; pp. 44–51.
  44. Dabhi, R. Casting Product Image Data for Quality Inspection. 2020. Available online: https://www.kaggle.com/datasets/ravirajsinh45/real-life-industrial-dataset-of-casting-product (accessed on 3 November 2025).
  45. Bergmann, P.; Fauser, M.; Sattlegger, D.; Steger, C. MVTec AD—A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9584–9592.
  46. Bergmann, P.; Batzner, K.; Fauser, M.; Sattlegger, D.; Steger, C. The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. Int. J. Comput. Vis. 2021, 129, 1038–1059.
  47. Sharma, S.; Guleria, K. A systematic literature review on deep learning approaches for pneumonia detection using chest X-ray images. Multimed. Tools Appl. 2023, 83, 24101–24151.
  48. Alzhrani, K.M. From Sigmoid to SoftProb: A novel output activation function for multi-label learning. Alex. Eng. J. 2025, 129, 472–482.
  49. Vogel-Heuser, B.; Neumann, E.M.; Fischer, J. MICOSE4aPS: Industrially Applicable Maturity Metric to Improve Systematic Reuse of Control Software. ACM Trans. Softw. Eng. Methodol. 2021, 31, 5.
  50. Bosser, J.D.; Sorstadius, E.; Chehreghani, M.H. Model-Centric and Data-Centric Aspects of Active Learning for Deep Neural Networks. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 5053–5062.
  51. Huang, Y.; Zhang, H.; Li, Y.; Lau, C.T.; You, Y. Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI. arXiv 2022.
Figure 2. Examples of the casting data set: (a) OK sample, (b) NOK sample, and (c) synthetic sample.
Figure 3. Examples of the leather data set: (a) OK sample, (b) NOK sample, and (c) synthetic sample.
Figure 4. Comparison of the confusion matrices for all variants on the test data set (values in percent): (a) casting, normal data set; (b) leather, normal data set; (c) casting, label improvement; (d) casting, data augmentation; (e) casting, data enrichment; (f) leather, data augmentation; (g) leather, data enrichment.
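The percentage values in Figure 4 correspond to row-normalized confusion matrices. A minimal sketch of how such a matrix can be computed, assuming scikit-learn; the `y_true` and `y_pred` arrays are illustrative placeholders, not the study's data:

```python
# Minimal sketch (illustrative data): row-normalized confusion matrix in
# percent, as displayed in Figure 4. Class 0 = OK, class 1 = NOK.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])  # hypothetical ground truth
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0])  # hypothetical predictions

# normalize="true" divides each row by its class support; x100 gives percent.
cm_percent = 100 * confusion_matrix(y_true, y_pred, normalize="true")
print(np.round(cm_percent, 1))
```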
Figure 5. ROC curves for the casting and leather data sets under different configurations: (a–d) casting—normal data set, label improvement, data augmentation, data enrichment; (e,f) leather—normal data set and data augmentation.
Figure 6. ROC curve for the leather data set with data enrichment.
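The ROC curves in Figures 5 and 6 are derived from the classifier's continuous output scores rather than hard class labels. A minimal sketch, assuming scikit-learn and matplotlib with illustrative score arrays:

```python
# Minimal sketch (illustrative data): ROC curve and AUC from continuous
# classifier scores, as in Figures 5 and 6.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                     # hypothetical labels
y_score = np.array([0.2, 0.4, 0.85, 0.9, 0.1, 0.7, 0.6, 0.35])  # sigmoid outputs

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")  # diagonal baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```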
Table 1. Comparison of MCAI and DCAI experiments.

| Experiment | Data Set | OK Count | NOK Count | Data Source/Method | Accuracy |
|---|---|---|---|---|---|
| Baseline (MCAI) | Casting data set | 468 | 709 | Original data (no improvement) | 83% |
| Label Improvement (DCAI) | Casting data set | 468 | 670 | Manual relabeling by experts | 86% |
| Data Augmentation (DCAI) | Casting data set | 849 | 828 | Mirroring and resizing original data | 89% |
| Data Enrichment (DCAI) | Casting data set | 849 | 887 | Generated by diffusion model | 93% |
| Baseline (MCAI) | Leather data set | 196 | 51 | Original data (no improvement) | 53% |
| Data Augmentation (DCAI) | Leather data set | 301 | 192 | Mirroring and resizing original data | 27% |
| Data Enrichment (DCAI) | Leather data set | 344 | 334 | Generated by diffusion model | 62% |
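The "mirroring and resizing" augmentation named in Table 1 (rows 3 and 6) can be expressed compactly. A minimal sketch assuming TensorFlow, with an illustrative target size, since the exact parameters are not fixed here:

```python
# Minimal sketch: the "mirroring and resizing" augmentation named in Table 1.
# TensorFlow is an assumption; the 300x300 target size is illustrative.
import tensorflow as tf

def augment(image: tf.Tensor) -> tf.Tensor:
    """Mirror an image horizontally and resize it to a fixed resolution."""
    image = tf.image.flip_left_right(image)     # mirroring
    image = tf.image.resize(image, (300, 300))  # resizing
    return image

# Hypothetical usage with a tf.data pipeline: append augmented copies to the
# originals, roughly doubling the OK/NOK counts as in Table 1, rows 3 and 6.
# dataset = dataset.concatenate(dataset.map(lambda x, y: (augment(x), y)))
```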
Table 2. Hyperparameter configurations.

| No. | Conv_Filters | Conv_Filters_2 | Dense_Units | Dropout_Rate | Learning_Rate | Batch_Size | Loss_Function |
|---|---|---|---|---|---|---|---|
| 1 | 64 | 256 | 256 | 0.30 | 0.001 | 16 | binary_crossentropy |
| 2 | 64 | 256 | 256 | 0.20 | 0.001 | 48 | binary_crossentropy |
| 3 | 32 | 256 | 64 | 0.40 | 0.010 | 64 | binary_crossentropy |
| 4 | 32 | 192 | 192 | 0.30 | 0.010 | 48 | binary_crossentropy |
| 5 | 32 | 64 | 256 | 0.30 | 0.0001 | 32 | hinge |
| 6 | 96 | 192 | 192 | 0.40 | 0.010 | 64 | hinge |
| 7 | 64 | 256 | 256 | 0.40 | 0.0001 | 16 | hinge |
| 8 | 64 | 128 | 192 | 0.40 | 0.001 | 64 | hinge |
| 9 | 96 | 256 | 192 | 0.20 | 0.0001 | 16 | hinge |
| 10 | 64 | 128 | 128 | 0.40 | 0.010 | 16 | hinge |
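Each row of Table 2 describes one hyperparameter trial of a two-convolution-block classifier. The following sketch shows how configuration no. 1 might be instantiated in Keras; the layer ordering and the 224 × 224 × 3 input shape are assumptions, not taken from the paper:

```python
# Minimal sketch: instantiating configuration no. 1 from Table 2 in Keras.
# Layer ordering and the 224x224x3 input shape are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3, activation="relu"),   # Conv_Filters
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, activation="relu"),  # Conv_Filters_2
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),      # Dense_Units
    layers.Dropout(0.30),                      # Dropout_Rate
    layers.Dense(1, activation="sigmoid"),     # binary OK/NOK output
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Learning_Rate
    loss="binary_crossentropy",                               # Loss_Function
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=16, epochs=...)      # Batch_Size
```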
Table 3. Classification reports for different experiments including weighted metrics.

| ID | Experiment | Technique | Category | Precision (P) | Recall (R) | F1 Score | Weighted P/R |
|---|---|---|---|---|---|---|---|
| a | Casting data set MCAI | MCAI | OK | 0.714 | 0.981 | 0.827 | 0.870/0.829 |
|  |  |  | Defect | 0.980 | 0.722 | 0.832 |  |
| b | Casting Label Improvement | Label Improvement | OK | 0.750 | 1.000 | 0.857 | 0.896/0.862 |
|  |  |  | Defect | 1.000 | 0.764 | 0.866 |  |
| c | Casting Augmentation | Data Augmentation | OK | 0.797 | 1.000 | 0.887 | 0.916/0.894 |
|  |  |  | Defect | 1.000 | 0.819 | 0.901 |  |
| d | Casting Enrichment | Data Enrichment | OK | 0.877 | 0.980 | 0.926 | 0.940/0.935 |
|  |  |  | Defect | 0.985 | 0.903 | 0.942 |  |
| e | Leather data set MCAI | MCAI | OK | 0.533 | 1.000 | 0.696 | 0.284/0.533 |
|  |  |  | Defect | 0.000 | 0.000 | 0.000 |  |
| f | Leather Augmentation | Data Augmentation | OK | 0.167 | 0.094 | 0.120 | 0.203/0.267 |
|  |  |  | Defect | 0.310 | 0.464 | 0.371 |  |
| g | Leather Enrichment | Data Enrichment | OK | 0.636 | 0.656 | 0.646 | 0.616/0.617 |
|  |  |  | Defect | 0.593 | 0.571 | 0.582 |  |
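The F1 scores in Table 3 follow from the listed precision and recall as the harmonic mean F1 = 2PR/(P + R); occasional last-digit deviations stem from the rounding of the reported P and R values. A quick sanity check:

```python
# Sanity check: F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    if precision + recall == 0:  # degenerate case, e.g., row (e), Defect
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.714, 0.981), 3))  # 0.826 -- Table 3 row (a) lists 0.827
                                   # (rounding of the underlying scores)
print(round(f1(0.593, 0.571), 3))  # 0.582 -- matches Table 3 row (g)
```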
Table 4. Key findings on the three research questions.

| Nr. | Research Question | Key Findings |
|---|---|---|
| 1 | How does model performance change when data quality and quantity are varied, compared with purely model-centric optimization? | Data quality is more important than quantity; neither a large amount of data nor hyperparameter optimization can compensate for a bad data set. Targeted enhancement of the data foundation using different methods further improved performance, a result confirmed for both data sets. Data-related issues in the training process can be addressed through iterative data improvement. |
| 2 | What role does synthetically generated data play in improving the detection of rare defect classes in industrial image data sets? | Synthetic data generated by the DDPM had a positive impact on model performance by balancing the data sets, reducing bias, and improving recall, precision, accuracy, and F1 score, although creating this synthetic data requires a certain amount of real data. |
| 3 | How can data-centric AI be applied in optical quality control? | Our experiments show that data-centric AI methods such as data augmentation, improved data quality, and precise labeling are valuable tools for increasing AI system performance, although their use varies with the use case, production maturity, and current system performance. |
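In connection with research question 2, sampling synthetic defect images from a trained denoising diffusion probabilistic model can be done with off-the-shelf tooling. A minimal sketch assuming the Hugging Face diffusers library and a hypothetical local checkpoint path; the study's own training setup is not reproduced here:

```python
# Minimal sketch: sampling synthetic images from a trained DDPM with the
# Hugging Face diffusers library. The checkpoint path is hypothetical.
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained("path/to/trained-ddpm")
result = pipeline(batch_size=4, num_inference_steps=1000)  # returns PIL images

for i, image in enumerate(result.images):
    image.save(f"synthetic_nok_{i}.png")
```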
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
