1. Introduction
Industrial mass production processes require the development of protocols to control various quality specifications. Some of these specifications can be visually recognized [1]. In processes with high production volumes or speeds, executing these protocols through human visual inspection is unreliable and often impossible. In these cases, artificial vision systems must be implemented that, along with adequate processing, allow for automatic decision-making on whether a product meets the desired specifications. Automation of these systems is critical to reduce human error, improve consistency, and ensure product quality at scale.
The implementation of real-time fault detection and diagnosis (RT-FDD) is of paramount importance in Industry 4.0 environments, where immediate feedback is critical to sustaining efficient production and avoiding the propagation of failures. Industrial processes are generally classified into two categories: continuous and discrete. Continuous processes, such as production and packaging in the pharmaceutical industry, operate uninterruptedly and maintain a steady output. This mode of operation results in relatively stable and predictable conditions, where deviations typically emerge gradually and are often linked to wear or efficiency losses [2].
In the design of new fault detection methods, it is a priority to ensure effectiveness so that they are accurate in their analysis, as errors can lead to economic losses and even safety failures. Another important factor to consider is the processing speed, as the system’s response to detecting faults must be consistent with production flow speeds. This is particularly important in real-time systems, where delays could cause disruptions in the production line, leading to inefficiencies and increased costs.
Although Support Vector Machines yield a unique, optimal solution and can model any type of training set, they incur high memory and training-time costs [3]. This leads to the need to reduce the dimensionality of the input data as much as possible. PCA is widely used in fault detection methods [
4,
5]. This method allows for patterns in the data population to be revealed with lower dimensionality, generating more efficient processing [
6,
7,
8,
9].
1.1. Related Works
Fault detection systems have been widely studied and applied across a broad range of industries. Ref. [10] classifies fault diagnosis methods for industrial processes, distinguishing those based on pattern recognition from those using AI. In these methods, a reference pattern is available; the data is compared with this pattern, and by means of some measurable magnitude it is decided whether the sample presents anomalies or meets the specifications. However, the complexity and variability of industrial environments often require more sophisticated approaches. Recent advancements have leaned towards AI-based methods for higher accuracy and adaptability.
The adoption of Principal Component Analysis (PCA) in industrial process monitoring can be traced back to the late 1980s. Ref. [
11] demonstrated the applicability of PCA for analyzing process data, using a case study based on a ceramic smelter employed in nuclear waste reprocessing. Their findings highlighted the potential of PCA to improve process insight by examining the contribution of individual variables to the principal components derived from historical operational data [
12].
Several studies have leveraged PCA in innovative ways to enhance fault detection across different contexts. Ref. [
13] combined PCA and LDA using a probabilistic fusion model to achieve robust fault diagnosis in induction motors under noisy conditions. Ref. [
14] proposed PCANet, which integrates PCA with block-wise image histograms in a lightweight neural-style architecture, underscoring the compatibility of PCA and histogram-based representations in image analysis. Ref. [
4] explores the use of PCA in cement rotary kilns for fault detection and diagnosis, demonstrating its ability to reduce the dimensionality of the large sensor dataset while maintaining detection accuracy. In another case, ref. [
5] applies PCA to nuclear power plants, combining it with unsupervised machine learning techniques to monitor system health and detect anomalies. Ref. [
15] proposed a robust fault detection framework that integrates multiscale Principal Component Analysis (MSPCA) with a Kantorovich distance (KD)-based approach. By applying wavelet decomposition, the method extracts multi-resolution features from process signals, enhancing sensitivity to faults under noisy conditions. The KD metric is then used to assess deviations in the projected data space, with non-parametric thresholding enabling flexible and effective anomaly detection. This approach demonstrated improved performance over traditional PCA and MSPCA in scenarios involving drift, bias, and intermittent faults. Ref. [
16] introduced a dynamic fault detection approach based on Dynamic Kernel Principal Component Analysis (DKPCA) combined with a Weighted Structural Difference (WSD) metric. The method captures nonlinear and dynamic behavior in industrial processes by projecting time-series data into a high-dimensional feature space using kernel functions, followed by the extraction of dynamic correlations through DKPCA. The WSD metric, computed over sliding windows, quantifies structural variations in the evolving data distribution by considering both mean and variance changes, thus enhancing sensitivity to process shifts while maintaining robustness against non-Gaussian noise.
Other applications of PCA can be found. For example, ref. [
17] addresses the challenge of non-linear fault detection in chemical processes employing Kernel PCA (KPCA) in combination with the Generalized Likelihood Ratio Test (GLRT). This method capitalizes on the ability of KPCA to project the data into a higher-dimensional space, making it easier to separate faulty and nonfaulty instances. The residual is then computed in the original space, enhancing the detection of anomalies.
However, KPCA is often suboptimal for uncertain or highly variable systems, as its processing requirements grow significantly with larger datasets. This limitation is addressed in [
18], where a nonlinear Fault Detection Method based on Interval Reduced Kernel PCA (IRKPCA) is developed to monitor processes with uncertainty. The technique uses interval-valued Euclidean distances to retain only the most relevant measurements, reducing computational cost while maintaining high detection accuracy.
A similar approach is presented by [
19], which proposes Reduced Kernel PCA (RKPCA) to monitor industrial processes. By decreasing the number of observations in the data matrix based on a dissimilarity metric, the system can reduce redundancy and improve processing times. This method is particularly useful in systems where large volumes of data are generated, such as in the petrochemical industry.
Recent contributions have also explored sparsity-constrained formulations to enhance variable selection and fault isolation. For instance, some works have incorporated sparsity-inducing norm optimization into PCA and CCA frameworks to achieve joint sparsity and reduce variable redundancy, resulting in improved detection speed and accuracy in benchmark industrial processes such as the Tennessee Eastman and cylinder–piston systems, as seen in [
20,
21]. These approaches highlight the growing relevance of sparse optimization in process monitoring and its potential to complement traditional PCA-based methods.
For image-based fault detection, particularly in pharmaceutical production, the challenge often lies in processing high-resolution images of products such as pill blisters efficiently. Traditional methods have struggled to keep up with the demand for real-time analysis without sacrificing accuracy. A deep learning approach using CNNs has been explored in [
22] for the Tennessee Eastman process. The method successfully isolates various faults using sensor data and achieves a fault isolation performance of more than 98%.
Moreover, Variable Moving Window Kernel PCA (VMWKPCA) has been employed to diagnose faults in dynamic processes such as the Continuous Stirred Tank Reactor (CSTR) process [
23]. In this case, a structured partial VMWKPCA is utilized to detect and diagnose faults. The proposed method demonstrates superior efficacy, particularly in complex, non-linear systems where traditional methods struggle to provide timely and accurate diagnostics.
1.2. Main Contribution
Despite the aforementioned advancements, the challenge of reducing processing time remains, particularly when dealing with large datasets and high-resolution images. In this study, we propose a novel method that combines image histograms with PCA to improve both accuracy and processing speed in fault detection systems. Histograms are utilized for feature extraction from images, which helps to reduce the dimensionality of the data before applying PCA. This method is tested on pill blister images to identify missing pills. Owing to the effectiveness of the feature extraction process, the classification task can be accomplished using a single neuron, highlighting the discriminative power of the proposed representation.
By leveraging the strengths of both PCA and image histogram techniques, this method reduces computational complexity while maintaining high detection accuracy, which is critical in real-time production environments. Our method shows promising results in detecting faulty blisters, paving the way for further applications in pharmaceutical quality control.
This paper is organized as follows: in
Section 2, preliminary concepts are presented to introduce the topic of feature classification.
Section 3 elaborates on the new proposed method with an application in the classification and detection of faults in a pill blister.
Section 4 presents the training process of the SVM and the results obtained in fault classification, together with a comparison against other state-of-the-art classifiers. Section 5 discusses the results and limitations of the method. Finally, in Section 6, the conclusions are stated.
2. Preliminaries
This section presents an analysis of cluster classification methods and anomaly detection in objects. It is necessary to first review the fundamental concepts on which the present research is based, particularly those related to Support Vector Machines and Principal Component Analysis.
2.1. Support Vector Machines
Based on the development seen in [24], let us consider a training set $\{(\mathbf{x}_i, d_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^m$ is the vector corresponding to the i-th input and $d_i$ is its desired value. Assuming that the clusters determined by the input vectors are linearly separable, then the equation that defines the hyperplane of separation between them would be defined by:
$$\mathbf{w}^{T}\mathbf{x} + b = 0, \qquad (1)$$
where $\mathbf{x}$ is the input vector, $\mathbf{w}$ is the weight vector, and $b$ is a bias coefficient.
Let us consider that for one cluster, the desired value is $d_i = +1$, and for the other, it is $d_i = -1$. Then, we would have
$$\mathbf{w}^{T}\mathbf{x}_i + b \geq 0 \quad \text{for } d_i = +1, \qquad (2)$$
$$\mathbf{w}^{T}\mathbf{x}_i + b < 0 \quad \text{for } d_i = -1. \qquad (3)$$
For a given $\mathbf{w}$ and $b$, the distance between the hyperplane defined in (1) and the nearest vector $\mathbf{x}$ (considering the Euclidean distance) is known as the margin and is symbolized by $\rho$.
The goal of Support Vector Machines is to maximize the value of $\rho$ to optimize classification, minimizing decision errors. The key to this type of machine learning is to find the values of the parameters $\mathbf{w}_o$ and $b_o$ that define the optimal hyperplane for the previously defined training set. This pair of parameters must satisfy that
$$\mathbf{w}_o^{T}\mathbf{x}_i + b_o \geq +1 \quad \text{for } d_i = +1, \qquad (4)$$
$$\mathbf{w}_o^{T}\mathbf{x}_i + b_o \leq -1 \quad \text{for } d_i = -1. \qquad (5)$$
The input vectors that, in correspondence with their desired output, satisfy the equality in (4) or (5) are known as support vectors. Being the vectors closest to the separation hyperplane, they are the most sensitive to classification. This is why they play a fundamental role in calculating the hyperplane.
If (4) and (5) are combined into a single condition, we obtain
$$d_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \geq 1, \quad i = 1, \ldots, N. \qquad (6)$$
The training is solved by finding the values of $\mathbf{w}$ and $b$ that minimize the norm function $\Phi(\mathbf{w}) = \tfrac{1}{2}\,\mathbf{w}^{T}\mathbf{w}$ under the condition defined in (6). This training could be performed, for example, using the method of Lagrange multipliers.
2.2. Principal Component Analysis
Suppose we have m-dimensional input vectors that will be used to train and execute a classifier. The classification is based on these m characteristics that define each sample. It is known that increasing the number of inputs to a neural network leads to an increase in its complexity, requiring greater memory and processing costs. The question that arises is whether there are some of these characteristics (or linear combinations of them) that explain most of the information provided by the input sample, without the need to use all of them.
Given the matrix $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{m \times N}$, where $\mathbf{x}_i$ are the input vectors, and assuming that each row of $X$ has zero mean, the variance–covariance matrix is formed as
$$C = \frac{1}{N-1}\, X X^{T}. \qquad (7)$$
We seek to find a vector $\mathbf{v}$ on which to project the inputs such that the variability given by $\mathbf{v}^{T}X$ is maximized. This variability is calculated as
$$\operatorname{var}\left(\mathbf{v}^{T}X\right) = \mathbf{v}^{T} C\, \mathbf{v}. \qquad (8)$$
If we maximize (8) under the condition that $\mathbf{v}$ has norm equal to 1, we find that $\mathbf{v}$ must be an eigenvector of $C$ associated with the largest eigenvalue. The vector $\mathbf{v}_1$ is known as the first principal component. If we want to find the other principal components, we will find them as the eigenvectors associated with the next largest eigenvalues ordered in decreasing order. In this way, it is possible to present the samples in a p-dimensional space, with $p < m$, while maintaining the highest percentage of variability. This percentage that is maintained when projecting onto the first p principal components is given by
$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%, \qquad (9)$$
where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ are the eigenvalues of $C$.
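As a minimal illustration of this computation (a sketch only, not the implementation used in the experiments), the principal components and the retained-variance percentage in (9) can be obtained directly from the eigendecomposition of the covariance matrix; the sketch below assumes, as above, that the data matrix has features in rows and samples in columns.

import numpy as np

def pca_components(X, p):
    """Return the first p principal components of X (features in rows,
    samples in columns) and the percentage of variance they retain."""
    Xc = X - X.mean(axis=1, keepdims=True)                # make each row zero-mean
    C = (Xc @ Xc.T) / (Xc.shape[1] - 1)                   # covariance matrix, Equation (7)
    eigvals, eigvecs = np.linalg.eigh(C)                  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                     # reorder in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    retained = 100.0 * eigvals[:p].sum() / eigvals.sum()  # retained variance, Equation (9)
    return eigvecs[:, :p], retained

Projecting the centered samples onto the returned components (e.g., scores = V.T @ Xc) yields the reduced p-dimensional representation used for classification.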
3. Proposed Classification Method Development
In summary, the method is based on taking an RGB image of the product to be classified, generating the concatenated histogram of the 3 channels, conducting PCA with
n principal components, and performing classification with SVM. In
Figure 1, a synthesis of the developed method is presented. To analyze and evaluate the performance of the method, it was applied to the detection of faults in pill blisters, particularly cases in which one or more pills are missing. This type of solution is in high demand in the pharmaceutical industry due to the high production speed of blisters and the strict quality levels required by the marketing standards of these products.
To implement it, it was first necessary to obtain images for classifier training. A blister of 10 circular pills was used. The blister is blue, while the pills are a light pink color. Images were taken with a cell phone at a zenithal angle, at a distance of 20 cm from the blister on a black background, with a resolution of 300 × 300 pixels. A total of 78 images were captured, in which the horizontal orientation of the blister was varied (
Figure 2).
Given the limited number of available images, in which the highest possible variability had already been introduced through orientation changes, a data augmentation process was carried out. This decision was made to improve the quality and reliability of the training and validation of the proposed method.
The data augmentation process consisted of applying five different types of transformations to generate images considered as new information for the classifier's development. The transformations included vertical and horizontal flips, the addition of masks with intensity gradients to simulate lighting changes, and the generation of Gaussian and salt-and-pepper noise. For the illumination gradients, a linear directional mask was generated by projecting normalized image coordinates onto a random direction and applying a multiplicative intensity variation with a randomly sampled strength. Gaussian noise was added with zero mean and a fixed variance, while salt-and-pepper noise was injected at a fixed pixel density. As a result, the total number of available images increased sixfold, allowing the use of 468 images in total.
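A minimal sketch of such transformations is given below; it is illustrative only, and the parameter values (gradient strength, noise variance, and salt-and-pepper density) are placeholders rather than the exact values used in the experiments.

import numpy as np

rng = np.random.default_rng()

def flip(img, horizontal=True):
    # Horizontal or vertical mirror of an H x W x 3 image
    return img[:, ::-1] if horizontal else img[::-1, :]

def illumination_gradient(img, strength=0.3):              # strength: placeholder value
    # Multiplicative linear gradient along a random direction to simulate lighting changes
    h, w = img.shape[:2]
    theta = rng.uniform(0.0, 2.0 * np.pi)
    yy, xx = np.mgrid[0:h, 0:w]
    proj = (xx / w) * np.cos(theta) + (yy / h) * np.sin(theta)
    proj = (proj - proj.min()) / (proj.max() - proj.min() + 1e-9)
    mask = 1.0 + strength * (2.0 * proj - 1.0)
    return np.clip(img * mask[..., None], 0, 255).astype(img.dtype)

def gaussian_noise(img, var=0.01):                          # var: placeholder value
    noise = rng.normal(0.0, np.sqrt(var), img.shape) * 255.0
    return np.clip(img.astype(float) + noise, 0, 255).astype(img.dtype)

def salt_and_pepper(img, density=0.02):                     # density: placeholder value
    out = img.copy()
    u = rng.random(img.shape[:2])
    out[u < density / 2] = 0                                 # pepper pixels
    out[u > 1.0 - density / 2] = 255                         # salt pixels
    return out

Applying each of the five transformations once per original image, together with the original, accounts for the sixfold increase mentioned above.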
A fundamental characteristic that was recognized is that, if the number of pills, the lighting, and the relative position of the camera to the blister are kept constant, both in distance and angle, and only the horizontal orientation of the blister is varied, even considering the presence of different types of noise, the distribution of pixel intensities does not show significant variations, as was observed in the histograms obtained from the images. Additionally, it was noted that the most relevant information, in terms of variability, was found in the upper part of the histogram, since the lower part corresponds to the pixels of the background. For this reason, it was decided that only the top 157 data points of the histogram of each color channel would be used for classification, and its performance was compared against a full-histogram version to justify this decision.
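The feature-extraction step described above can be sketched as follows; the image is assumed to be an 8-bit RGB array, and only the top 157 bins of each 256-bin channel histogram are retained, giving the 471-dimensional vector used in the next step.

import numpy as np

TOP_BINS = 157  # upper portion of each 256-bin channel histogram

def partial_rgb_histogram(img):
    # img: H x W x 3 array with 8-bit intensities (R, G, B)
    features = []
    for c in range(3):
        hist, _ = np.histogram(img[..., c], bins=256, range=(0, 256))
        features.append(hist[-TOP_BINS:])                   # keep the highest intensities only
    return np.concatenate(features).astype(float)            # 3 x 157 = 471 values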
By applying PCA to the histograms, it was detected that, by projecting the 471-dimensional histogram information onto only 2 dimensions, corresponding to the two most representative principal components, 88.76% of the variability could be preserved. By doing this, it was possible to represent each image using only one point in the plane. The method was also tested using 20 principal components, which preserved 98.85% of the variance, showing that the number of components used is a tunable parameter of the method.
With the data now reduced to an n-dimensional representation, the decision threshold between clusters could be implemented through an SVM, using a kernel feature space where necessary to ensure optimal separation.
4. SVM Training and Results
In our implementation, a linear kernel was employed for the SVM, which provided a suitable balance between model complexity and generalization for this dataset. To increase robustness against noisy or atypical measurements, an outlier fraction of 5% was specified, allowing the optimization process to tolerate a small proportion of mislabeled or irregular samples without compromising the separating hyperplane. The optimization was carried out using the Iterative Single Data Algorithm (ISDA), which in this case required 12,068 iterations to reach convergence. The final model relied on 159 support vectors, indicating that a meaningful subset of the training samples contributed directly to defining the optimal separating hyperplane.
A total of 468 images were used for training and validation. The dataset included 198 images of normal blisters, 138 images containing one missing pill, and 132 images with two missing pills. Since the goal of the experiment is to determine whether a blister is defective or not, a binary classification scheme was adopted. Thus, images with one or two missing pills were grouped into a single faulty class, while full blisters were assigned to the normal class. To strengthen the validation, 5-fold cross-validation was implemented: in each fold, 80% of the images were randomly selected for training and 20% for validation, and the results from all folds were then combined to measure accuracy and timings.
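A minimal sketch of this training and validation procedure is shown below, using scikit-learn as an assumed environment; since the ISDA solver and the 5% outlier fraction of the original setup have no direct equivalent there, a standard soft-margin linear SVM is used as an approximation, and the arrays features (the 468 partial-histogram vectors) and labels (0 for normal, 1 for faulty) are assumed to come from the steps described above.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def cross_validate(features, labels, n_components=2, n_folds=5):
    # 5-fold cross-validation: 80% of the images for training, 20% for validation per fold
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, val_idx in skf.split(features, labels):
        model = make_pipeline(
            PCA(n_components=n_components),                  # histogram -> principal components
            SVC(kernel="linear", C=1.0),                     # soft margin approximates outlier tolerance
        )
        model.fit(features[train_idx], labels[train_idx])
        accuracies.append(model.score(features[val_idx], labels[val_idx]))
    return float(np.mean(accuracies))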
To perform a comparative analysis of the performance of this method, other state-of-the-art classifiers were trained, and performance metrics were obtained for each (
Table 1).
5. Discussion
Although the individual techniques employed in this method, such as histogram analysis, PCA, and linear classification, are well established in the literature, their simple yet effective integration into fault detection in blister images represents a novel contribution. This deliberate simplicity is a key strength of the approach: rather than relying on advanced or computationally intensive architectures, the method combines fundamental techniques in a coherent pipeline that achieves high discriminative performance. The results suggest that, under controlled acquisition conditions, the use of more sophisticated feature extractors or deep models may not necessarily yield a real improvement in classification accuracy, while significantly increasing model complexity and, in some cases, computational cost. The method capitalizes on the simplicity and discriminative capacity of histogram-based features, which, when combined with PCA, result in a highly compact data representation that preserves most of the relevant variance. This allows the classification task to be performed with minimal computational complexity, in this case, using a single neuron modeled as a linear Support Vector Machine.
The effectiveness of the proposed approach is evident in its ability to detect missing pills with good classification accuracy on the validation set, without the need for complex image preprocessing or deep learning architectures. The preservation of over 88% of data variance after PCA underscores the relevance of the selected features. This suggests that even simple global descriptors can be highly informative when the acquisition conditions are adequately controlled.
Table 1 presents a comparative analysis of classifier performance across several state-of-the-art methods. The proposed histogram-based approach shows that using partial histograms consistently outperforms full histograms. For instance, with 2 principal components, the partial histogram achieves 84.17% accuracy compared to 64.32% for the full histogram. With 50 principal components, the partial histogram reaches 97.22%, slightly higher than 96.37% for the full histogram.
This improvement can be attributed to the fact that the main discriminative information comes from the blister regions themselves, while the background remains largely unchanged. Incorporating the full histogram introduces background pixels that do not contribute meaningful variability, slightly reducing classification effectiveness.
Traditional descriptors, such as HOG + PCA + SVM (82.22%) and LBP + PCA + SVM (78.61%), show lower accuracy compared to the proposed histogram-based method with 2 principal components, which achieves 84.17%. Gabor + PCA + SVM, despite using 20 principal components, only improves accuracy slightly to 88.89%.
In terms of computational efficiency, the proposed method also compares favorably. Its training time (120.6 ms) is substantially lower than that of HOG + PCA + SVM (1520.3 ms) and PCA + RF (347.1 ms), while maintaining classification times comparable to the fastest models. The kNN classifier yielded the lowest accuracy (62.50%), highlighting its limited generalization capability in this problem setup.
These results confirm that the proposed method achieves a superior trade-off between accuracy, model complexity, and computational cost. Unlike more intricate feature extraction techniques that require large amounts of data or fine-tuning of multiple hyperparameters, the proposed approach relies on a minimal and interpretable pipeline. Such characteristics make it particularly suitable for industrial contexts where interpretability, low latency, and ease of deployment are essential.
Nevertheless, several aspects warrant further exploration. The current dataset, while sufficient for initial validation, is relatively limited in terms of variation in pill types, blister materials, lighting conditions, and background textures. Although additional variability was introduced through data augmentation techniques—providing a stronger basis for evaluating the robustness and effectiveness of the proposed method—it remains advisable to perform further experiments under conditions more closely aligned with industrial environments. Such tests, conducted using real production infrastructure and acquisition setups, would allow a more comprehensive assessment of the method’s stability and generalization capacity in realistic operational contexts.
It should also be noted that the use of Convolutional Neural Networks (CNNs) was not explored in this work for two main reasons. First, the strength of such architectures is generally directed toward more complex classification problems, where large-scale data and intricate spatial relationships are involved, conditions that do not necessarily apply to the problem studied here. Second, the structural and computational complexity of CNNs greatly exceeds that of the proposed method and of the other state-of-the-art techniques used for comparison, which would have made the evaluation uneven and less representative of a fair performance assessment.
Finally, the method’s potential for transfer learning should also be investigated. For example, it could be adapted to other domains involving repetitive visual structures, such as defect detection in solar cells, food packaging, or electronic component assembly.
6. Conclusions
The proposed method demonstrates a robust and efficient approach for fault detection in blister images, combining histogram analysis, PCA, and a simple linear classifier. The results confirm that the image histograms exhibit clear and consistent alterations in the presence of missing pills, which provide sufficient information for accurate classification. By applying PCA, the dimensionality of the data was significantly reduced while preserving the most relevant variance, allowing the classification task to be performed using an extremely simple model.
Compared to other state-of-the-art techniques, the proposed approach achieves a superior balance between classification accuracy, computational efficiency, and model simplicity. Its simple integration of well-established techniques proves sufficient for the studied problem, and the introduction of additional variability through data augmentation further validates its effectiveness.
The use of more complex architectures, such as Convolutional Neural Networks, was not explored because their strength is more relevant to highly complex classification tasks, and their structural complexity greatly exceeds that of the proposed method and the other benchmarked approaches. Therefore, the proposed method represents a practical and well-justified solution under controlled acquisition conditions.
Overall, the method provides a reliable and straightforward solution suitable for real-time, in-line industrial applications. Its simplicity, combined with high classification performance, makes it particularly appealing for scenarios with limited computational resources or strict deployment constraints, while laying the groundwork for future evaluations under more diverse and realistic industrial conditions.