Article

DHS-CNN: A Defect-Adaptive Hierarchical Structure CNN Model for Detecting Anomalies in Contact Lenses

Department of Computer Science, Chungbuk National University, Chungdae-ro 1, Seowon-Gu, Cheongju 28644, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(5), 2697; https://doi.org/10.3390/app15052697
Submission received: 30 December 2024 / Revised: 11 February 2025 / Accepted: 21 February 2025 / Published: 3 March 2025
(This article belongs to the Special Issue Advanced Image Analysis and Processing Technologies and Applications)

Abstract
Vision-based inspection systems are essential for quality control in manufacturing industries, and advances in artificial intelligence (AI) have significantly enhanced their accuracy. However, the high-precision requirements of products such as contact lenses demand even more robust inspection methods. This paper introduces a novel defect-adaptive hierarchical structure convolution neural network (DHS-CNN) model based on InceptionV4. The proposed model architecture reflects the manufacturing process and defect types, and we developed a custom loss function to suit this multi-output hierarchical design. Experimental results on a dataset of 2800 contact lens images revealed that the proposed model improved accuracy by 2.08% over the baseline model. These findings suggest that the defect-adaptive hierarchical structure and customized loss function offer substantial improvements in the vision-based inspection of contact lenses and may enhance AI-driven quality control processes in other manufacturing sectors.

1. Introduction

1.1. Research Background

The rapid development of artificial intelligence (AI), particularly deep learning, has driven innovations in many modern industrial sectors. One area where these technological advances are particularly evident is manufacturing. Quality control in manufacturing processes is crucial to ensure the safety and reliability of products [1,2,3]. Accordingly, there is an increasing emphasis on the need for accurate and efficient inspection methods for quality control.
Vision-inspection systems play a vital role in this process. Vision inspection, which examines products for surface defects, dimensions, and color, combines cameras and image-processing software to assess product quality. In particular, vision-inspection systems based on deep learning have demonstrated exceptional performance in image recognition and analysis, and have thus become essential tools in many manufacturing processes [4,5,6,7,8,9,10].
The application of such innovative technologies is crucial for manufacturing contact lenses. As contact lenses come into direct contact with the eyes, their quality and safety have a direct impact on consumer vision and health. While there are no fixed regulations, manufacturers generally aim to keep defect rates below 1% as a benchmark. Failure to meet these standards can lead to recalls and legal issues, resulting in financial losses and damaging the brand’s reputation. The manufacturing process of contact lenses involves various stages and utilizes hydrophilic polymer materials. These stages include injection molding, tinting, reagent filling, and assembly as part of the forming process, followed by separation, dry inspection, hydration and hydration inspection, sealing, sterilization, and packaging. Any slight abnormalities or inconsistencies that occur at any stage are detected and classified during the final inspection process, known as hydration inspection [11,12,13,14,15,16,17]. Therefore, detecting abnormalities during the final inspection process is critical. Traditional manual inspection methods, which rely on the human eye, are time-consuming and prone to error, necessitating improvements to enhance this process [18,19].
To address these issues, an automatic optical inspection (AOI) system was introduced, comprising lighting, cameras, and image-processing algorithms for defect detection [20,21]. These studies showed that the method produces more accurate inspection results than traditional visual inspection methods. However, the accuracy of conventional image-processing-based algorithms is limited because of the diversity in contact lens patterns and possible abnormalities. Consequently, vision-inspection methods that utilize deep learning have been introduced and have achieved impressive results in image recognition and processing [22].
Nevertheless, existing deep learning-based methods require further improvement in terms of accuracy and reliability. In particular, for products such as contact lenses—which demand high precision—it is challenging to detect even the slightest abnormalities or deformations. Therefore, developing new AI-based approaches to overcome the limitations of existing models and provide more precise inspections is crucial [22].

1.2. Types of Abnormalities in Contact Lenses

Figure 1 illustrates the various patterns and types of abnormalities that can appear in contact lenses during the final inspection stage of the manufacturing process. These patterns and defect types are as follows:
  • (a), (b): These two images represent the normal patterns of contact lenses, showing different types of normal contact lenses that meet the standard product quality.
  • (c) no_lens: This image shows a situation where the lens has not been properly captured, indicating a scenario in which the lens was either missed during the pick-up process or incorrectly captured during imaging.
  • (d) etc_abnormal: This image depicts a lens in an unidentifiable shape owing to improper positioning or camera malfunction. Such instances occur when irregular images are produced for various reasons.
  • (e) broken: This image shows a partially detached lens, indicating breakage or damage. This reflects a serious defect in the product.
  • (f) burr: This image shows a lens with edges that were not properly trimmed but remain intact, indicating a defect in the cutting process that affects the lens finish.
  • (g) Bubble defect (b_bubble): This image shows a lens with bubbles formed inside during the molding process, a quality issue that may arise during manufacturing.
  • (h) Edge defect (b_edge): This image displays a lens with a crack, which can occur due to physical damage or material defects, directly affecting product safety.
During the hydration inspection phase, images are acquired through a camera system. Figure 2 shows that images are captured in pairs under continuously varying lighting conditions, with each pair consisting of two single-channel images, one under black lighting and one under white lighting. The use of black and white lighting in the image acquisition process is a crucial approach to enhance the accuracy of defect detection. This method capitalizes on the specialized ability of each channel to detect different types of defects. For instance, defects visible only under white lighting conditions may not appear under black lighting, and vice versa. This approach ensures the comprehensive detection of potential irregularities. Additionally, regarding the allocation of image channels, images under black lighting conditions are assigned to the green channel, while images under white lighting conditions are assigned to the blue channel. The red channel is not used and thus remains at a value of zero. The final size of the combined image is maintained consistently at 1500 × 1500 pixels, both before and after merging.
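As a concrete illustration, this channel-allocation step can be sketched as follows. The function name and the assumption that the two captures arrive as single-channel uint8 arrays are ours; only the channel mapping itself comes from the description above.

```python
import numpy as np

def merge_lighting_pair(black_img: np.ndarray, white_img: np.ndarray) -> np.ndarray:
    """Combine a black-lighting and a white-lighting capture into one 3-channel image.

    Channel mapping per the paper: black lighting -> green, white lighting -> blue,
    red unused (kept at zero). Inputs are assumed to be 1500 x 1500 uint8 arrays.
    """
    assert black_img.shape == white_img.shape == (1500, 1500)
    merged = np.zeros((1500, 1500, 3), dtype=np.uint8)
    merged[..., 1] = black_img  # green channel: black-lighting capture
    merged[..., 2] = white_img  # blue channel: white-lighting capture
    return merged               # red channel (index 0) remains zero
```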

1.3. Purpose and Contributions of the Research

In this study, we propose a novel defect-adaptive hierarchical structure convolutional neural network (DHS-CNN) model that achieves improved accuracy by modifying the existing InceptionV4 model [23]. The proposed DHS-CNN model, featuring a defect-adaptive hierarchy, can effectively detect and classify defects of various shapes and sizes. This capability is critical for inspecting precision products such as contact lenses and addressing different types of abnormalities. Furthermore, this research is expected to enhance the application of AI in the manufacturing sector through the implementation of deep learning technologies, thereby facilitating the broader integration of AI across various industries.

2. Related Works

2.1. Related Research on Contact Lenses

This section reviews prior studies on vision inspection in the field of contact lenses.
Kimura et al. [24] focused on biometric authentication technologies to enhance the security of iris recognition systems, particularly detection techniques that verify the authenticity of iris patterns. Their study aimed to detect spoofing attacks on iris recognition systems and determine whether contact lenses were worn based on iris images. Romero-Garcés et al. [25] concentrated on detecting cosmetic contact lenses in long-distance iris recognition systems. Their paper proposes methodologies to improve the accuracy of iris recognition technology, specifically by introducing a lightweight cosmetic contact lens detection system capable of recognizing irises from a distance.
The studies by [24,25] were conducted in a context different from this research. While these studies focused on enhancing the security of iris recognition systems or determining whether contact lenses are being worn, this study emphasizes the precise detection of physical defects in contact lenses. Because of these differences, their perspectives and objectives do not align with the scope and purpose of this study.
Kim et al. [26] employed CNN-based deep learning models such as ResNet101, GoogLeNetV2 [27], InceptionV4, DenseNet121 [28], and MobileNet [29]. They generated training and validation data through preprocessing and augmentation of colored contact lens images and compared the defect detection rates for each RGB and HSV channel. The accuracies of the aforementioned models were 89.74%, 84.46%, 95.43%, 82.80%, and 89.74%, respectively, with InceptionV4 achieving the highest defect detection accuracy of 95.43% in the RGB channel. Furthermore, in most models, the RGB channel demonstrated higher accuracy than the HSV channel. However, the defect detection objectives explored in Kim et al. [26] differ from those of this study because of differences in the target processes. Kim et al. [26] focused on specific defect types that occur during the coloring process in the manufacturing of colored contact lenses, whereas this study emphasizes the detection of physical defects that may arise during the general manufacturing process of standard contact lenses.
Kim et al. [22] focused on the development of an AI-based automatic optical inspection (AOI) system for detecting and classifying defects that may occur during the manufacturing of contact lenses. Specifically, they explored methodologies for effectively detecting defects using various CNN models, considering the complexity of the colored contact lens manufacturing process. To achieve this, they generated training and validation data through the preprocessing and augmentation of colored contact lens images.
However, the defect detection accuracy of the AI models used in Kim et al. [22] was 59.5%. Although this demonstrates a certain level of effectiveness in the implementation of AI-based AOI systems, there is room for improvement. This underscores the need for further research and model optimization to achieve the high levels of accuracy and efficiency required for manufacturing processes.
This study aims to precisely detect and classify the physical defects that occur during the contact lens manufacturing process. Specifically, we focus on developing an advanced convolutional neural network (CNN) model based on a defect-adaptive hierarchical structure (DHS). This model processes data through pathways specialized for different defect types, thereby achieving higher accuracy and efficiency.

2.2. Deep Learning-Based Image-Processing Techniques

Roy et al. [30] proposed Tree-CNN, which is a hierarchical network with a tree structure designed for dynamic data environments. This approach utilizes a tree-growth mechanism that integrates new data classes into a pretrained model without degrading the performance of the previously trained classes. The hierarchical model progressively organizes the available data into feature-based superclasses, significantly reducing the training effort while maintaining accuracy on datasets such as CIFAR-10 and CIFAR-100.
However, when Tree-CNN was tested on a contact lens dataset, the expected performance was not achieved. Consequently, one of the key directions of this study is to directly implement and improve the feature-based tree structures of existing models. This implementation aims to better reflect the characteristics of the dataset and enhance accuracy.
The InceptionV4 model proposed by Szegedy et al. [23] is an advanced version of the earlier GoogLeNet models, characterized by its ability to increase both depth and width while maintaining computational efficiency, compared to other models such as EfficientNet [31] and ResNet [32]. This model employs the “Inception module,” which performs parallel convolution operations of varying sizes and integrates them, effectively capturing diverse features of an image.
Its structure is illustrated in Figure 3. It was designed to extract only the necessary information and achieve superior performance by effectively capturing various image characteristics. This model demonstrates high performance, particularly for complex images, because it applies multiple filters and operations simultaneously, enabling a more precise analysis of different parts and features of the image. In addition, it appropriately extends the depth and width of the network, making it highly effective for complex image recognition tasks. The InceptionV4 model has demonstrated excellent results in the field of image classification. This suggests that it can be effectively applied to highly precise tasks such as classifying defect types in complex contact lens features. The high accuracy and efficiency of InceptionV4 align closely with the objectives of this study and can significantly contribute to quality control in the contact lens manufacturing process.
The core idea of EfficientNet, as proposed by Tan et al. [31], is to achieve optimal performance through balanced scaling of the network depth, width, and image resolution. This model employs a compound-scaling method to balance complexity and accuracy.
Figure 4 illustrates that, unlike many existing models focusing on expanding a single dimension (depth, width, or resolution), EfficientNet considers all three dimensions simultaneously. This approach significantly enhances model performance while minimizing computational costs and the number of parameters. The study demonstrated outstanding results across various image-related tasks, achieving high performance not only in image classification but also in areas such as object detection.
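For reference, compound scaling ties network depth, width, and input resolution to a single compound coefficient. The sketch below uses the base coefficients reported for EfficientNet-B0 in [31] (α = 1.2, β = 1.1, γ = 1.15); the helper function itself is illustrative.

```python
# Compound scaling (Tan et al. [31]): depth, width, and resolution are scaled
# jointly by one compound coefficient phi, using fixed base coefficients.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported for EfficientNet-B0

def compound_scale(phi: float, base_depth: float = 1.0,
                   base_width: float = 1.0, base_resolution: int = 224):
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution

# phi = 1 scales all three dimensions one step up from the B0 baseline.
print(compound_scale(1.0))  # approximately (1.2, 1.1, 258)
```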
The Vision Transformer (ViT), proposed by Dosovitskiy et al. [33], adapts transformers, originally used in natural language processing, to the field of computer vision. Unlike traditional CNNs, ViTs process images as sequences of flattened patches and use a self-attention mechanism to capture relationships between these patches.
Figure 5 shows the overall structure of ViTs. This architecture resembles the Transformers used in natural language processing; image patches are simply flattened and fed to the model, and after training, the feature obtained at the first token position is used for classification. This approach allows ViTs to consider the entire image globally, improving their ability to recognize patterns and contextual relationships without the locality bias inherent to CNNs.

Furthermore, ViTs are inherently more adaptable to various image resolutions and data scales, making them particularly effective when trained on large datasets. They can learn more generalized features that are not restricted by the local receptive fields of CNNs, which contributes to their superior performance on tasks requiring an understanding of complex scenes or detailed contextual information. Another key advantage of ViTs is their scalability: the architecture scales efficiently in model size without a significant loss in performance, which is often a challenge for CNNs. This scalability is beneficial for deploying models in environments where computational resources and requirements vary. Additionally, ViTs can be pretrained on vast amounts of data and then fine-tuned for specific tasks with relatively small datasets; this transfer-learning capability is highly efficient and reduces the need for extensive training data, which can be resource-intensive to gather and annotate. Overall, Vision Transformers mark a significant step forward in computer vision by addressing some of the fundamental limitations of CNNs and enhancing the ability to capture, model, and understand complex visual data [34].

The ViT is a state-of-the-art method and is considered an effective approach for detecting defects in contact lens images, where defects are likely to appear in small areas against a similar background. In particular, the use of cropped images is logically valid from the perspective of defect detection. However, ViTs differ fundamentally from CNNs in structure and require a large amount of training data, which poses a significant limitation. Therefore, to effectively utilize ViTs, model modification and data preprocessing emerge as additional challenges.
Currently, deep learning models such as InceptionV4 and EfficientNet, which are widely used for detecting defect patterns in contact lenses, demonstrate excellent performance in general image classification tasks. However, they exhibit relatively low accuracy in distinguishing various defect patterns within contact lenses of similar shapes. This limitation stems primarily from the insufficient feature extraction capabilities needed to discern subtle differences between defect patterns and normal images.
While InceptionV4 and EfficientNet excel in classifying categories with large and clear differences, they are less effective when identifying objects such as contact lenses, which have similar backgrounds and relatively minor differences. Consequently, accurately identifying abnormal types in the contact lens manufacturing process poses a significant challenge for these models. In practical applications, this can lead to missed defect patterns or misclassifications.
To address these issues, a new approach capable of more precisely learning abnormal features is required. This underscores the need for custom model designs that consider the unique characteristics of contact lenses.

3. Defect-Adapted Hierarchical Deep Learning Model

3.1. Hierarchical Design Based on Characteristics of Anomaly Types

The no_lens type refers to a situation in which the lens is not captured by the camera. The etc_abnormal type is defined as a category encompassing abnormal conditions caused by various factors, primarily external influences such as camera capture failures or lens positioning errors. As shown in Figure 6, for both abnormal types, the feature size can occupy the entire image. In particular, for no_lens, the entire image may be considered defective when the lens is missing or not properly captured. This highlights the necessity for defect detection models to accurately identify and classify large features.
Figure 7 shows the broken and b_edge defects. The broken defects signify severe damage to the edge of the lens, which is prominently visible in a linear form. Such damage critically affects the structural integrity of the lens and can cause significant problems during use. The b_edge defect represents linear damage along the edge of the lens. These defects vary in size and shape, indicating different levels of severity. If b_edge defects progress, they may develop into a broken state, suggesting that these defects should be considered part of a continuous spectrum.
Figure 8 illustrates the burr and b_bubble defect types that arise from inherent flaws during the manufacturing process. Burr defects occur when improper cutting results in curved features on the surface or edge of the lens. The b_bubble defect appears when insufficient material is introduced during the molding process, leading to curved bubbles or irregular patterns within or on the surface of the lens. Both defect types share a common characteristic of curved forms.
Additionally, as observed in Figure 7b,c and Figure 8b,c, the defect sizes for b_edge and b_bubble types vary significantly. The b_edge defects range from minor damage to large cracks along the lens edge. Similarly, the b_bubble defects range from small to large bubbles, which can affect the lens’s optical properties.
In this study, as shown in Figure 9, we observed cases in which multiple defect types appeared simultaneously within a single lens. This phenomenon reflects the complexity of the manufacturing process and the concurrent occurrence of various defects. To effectively handle these overlapping defect types, adopting a multi-output structure is essential. A multi-output structure should be designed to simultaneously identify and classify multiple defect types for a single lens. By providing separate prediction results for each defect type, this structure enables accurate identification of multiple defects within a single lens. It also allows more precise analysis of each defect and plays a critical role in environments where various defect types may occur simultaneously during manufacturing.
In the multi-output structure, if a lens does not correspond to any of the six defect types (no_lens, etc_abnormal, burr, broken, b_edge, b_bubble)—meaning all defect classifications are determined to be “False”—the lens is classified as “Good”, indicating the absence of defects. This approach enables the model to identify both defect types and defect-free conditions.
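A minimal sketch of this decision rule, assuming the model emits six logits ordered as (no_lens, etc_abnormal, burr, broken, b_edge, b_bubble) and using an illustrative threshold of 0.5, which the paper does not specify:

```python
import torch

DEFECT_CLASSES = ["no_lens", "etc_abnormal", "burr", "broken", "b_edge", "b_bubble"]

def decode_predictions(logits: torch.Tensor, threshold: float = 0.5) -> list[str]:
    """Map the six per-defect logits of one lens to a label set.

    If no defect probability exceeds the threshold, the lens is 'Good'.
    """
    probs = torch.sigmoid(logits)
    detected = [name for name, p in zip(DEFECT_CLASSES, probs.tolist()) if p > threshold]
    return detected if detected else ["Good"]

# Example: only the b_edge logit is clearly positive.
print(decode_predictions(torch.tensor([-4.0, -3.0, -2.5, -1.2, 2.1, -3.3])))  # ['b_edge']
```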
In this study, the abnormality size was used as a critical criterion during the initial classification stages of the model. Specifically, a strategy was adopted to prioritize the classification of larger abnormal values in the early stages of the model because larger abnormalities are more likely to significantly impact product quality. Accordingly, the shape of the lens was determined in the first stage of the model. If no lens shape was detected, this indicated a major abnormality and was classified as no_lens or etc_abnormal. As shown in Figure 10, large abnormalities such as no_lens and etc_abnormal are configured to be quickly identified and processed in the initial stages.
In the subsequent stage, the model was used to examine whether the lens maintained its normal circular edge. If the circular edge was damaged, it could be classified as burr or broken edge, which are key indicators of edge damage in the lens.
In addition, a hierarchical structure was conceptually implemented to reflect the progression of certain abnormal types over time, such as the transition from b_edge to broken. This accounts for the possibility of a specific abnormality evolving into another abnormality. For instance, a b_edge defect can advance to a broken form; this potential progression was incorporated into the model design. Finally, the presence of donut-shaped defects, such as b_bubble, was verified.
This hierarchical structure reflects the pathways and possibilities of abnormal progression, enabling more precise detection and classification of abnormalities. Considering the characteristics of the abnormalities in both the initial and advanced stages, this approach provides a comprehensive and accurate analysis.

3.2. Proposed Defect-Adapted Hierarchical Deep Learning Model

In the initial stage of the modified model shown in Figure 11b, the Stem and Inception A layers were used to determine the presence or absence of a lens shape. The Stem layer, positioned at the beginning of the model, focuses on extracting high-level abstract features from the input image. This layer consists of multiple convolution, pooling, and normalization layers. The convolution layers use filters (kernels) to identify basic shapes and patterns in the image, whereas the pooling layers reduce the image size, retain essential information, and improve computational efficiency. The normalization layers help stabilize and accelerate the learning process, thereby ensuring robust feature extraction.
Normalization layers help prevent the data from being overly skewed, thereby improving the stability and performance of the model. The Inception A layer, which is a variant of the inception module, is designed to capture image features at multiple scales by arranging convolution layers with kernels of different sizes and pooling layers in parallel. This allows the model to simultaneously recognize patterns of various sizes within the same image region, thereby enabling the extraction of richer and more diverse features. By leveraging these diverse kernels, Inception A excels in identifying both large and small objects or patterns within an image. These layers play a crucial role in identifying the basic shape of the lens and in detecting significant defects. Based on this role, the no_lens and etc_abnormal classes were designed to be determined at this stage of the model.
Next, the Reduction A and Inception B layers were employed to determine whether the circular edge was maintained. The Reduction A layer reduces the dimensionality of the feature maps in the neural network while preserving critical features. This layer enhances the computational efficiency of the network and prevents overfitting. Reduction A executes various convolution and pooling operations in parallel, effectively reducing the dimensions of the input data while retaining essential information.
The Inception B layer features a more complex and deeper convolutional structure designed to process finer features such as circular edges. This layer uses kernels of different sizes in parallel to capture features at multiple scales simultaneously, allowing for a more detailed and comprehensive analysis of intricate patterns. The Inception B layer covers larger regions of an image without missing intricate patterns and is particularly adept at detecting and analyzing complex shapes and changes, making it highly valuable for detailed image analysis. These layers focus on precisely analyzing the shape and damage to the lens edges. Consequently, this stage was specifically designed to classify the burr and broken classes.
In the final stage, the Reduction B and Inception C layers were employed to identify linear and donut-shaped defects. The Inception C layer, located in the deeper part of the network, is equipped with a complex structure capable of simultaneously processing image features at multiple scales. This layer uses both small and large kernels in parallel to capture fine details in the image.
Inception C enables precise image analysis and enhances classification and recognition accuracy. These layers are designed to handle complex and fine defect types with precision. Specifically, the final layer continues the feature extraction process for previously classified classes, reflecting the characteristics of the defects in its design.
To evaluate how the proposed adaptive hierarchical structure improves accuracy compared to the existing InceptionV4 and EfficientNet models in relation to the goals of this study, we adopted a method that retains the core layers of the InceptionV4 model while incorporating the DHS.
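The resulting data flow can be summarized as a multi-output PyTorch module. In the sketch below, the InceptionV4 blocks (Stem, Inception A–C, Reduction A–B) are assumed to be available as modules, and the single-linear-layer heads are an illustrative simplification, not the authors’ exact configuration.

```python
import torch
import torch.nn as nn

class DHSCNN(nn.Module):
    """Defect-adaptive hierarchical structure: each backbone stage feeds a
    stage-specific head, so large defects are classified early and finer
    defects later. Block modules are assumed given; heads are illustrative."""

    def __init__(self, stem, inception_a, reduction_a, inception_b,
                 reduction_b, inception_c, ch_a: int, ch_b: int, ch_c: int):
        super().__init__()
        self.stage1 = nn.Sequential(stem, inception_a)         # lens shape present?
        self.stage2 = nn.Sequential(reduction_a, inception_b)  # circular edge intact?
        self.stage3 = nn.Sequential(reduction_b, inception_c)  # fine linear/donut defects
        pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head1 = nn.Sequential(pool, nn.Linear(ch_a, 2))   # no_lens, etc_abnormal
        self.head2 = nn.Sequential(pool, nn.Linear(ch_b, 2))   # burr, broken
        self.head3 = nn.Sequential(pool, nn.Linear(ch_c, 2))   # b_edge, b_bubble

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Six sigmoid-ready logits, ordered to match the defect classes above.
        return torch.cat([self.head1(f1), self.head2(f2), self.head3(f3)], dim=1)
```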

3.3. Adjustment of Balanced Weights for Loss Function

In multi-output classification, particularly in hierarchical structures with multiple classification layers, it is necessary to carefully calculate the loss function:
$$ L(y, \hat{y}) = -\frac{1}{C} \sum_{i=1}^{C} \left[\, y_i \cdot \log \sigma(\hat{y}_i) + (1 - y_i) \cdot \log\big(1 - \sigma(\hat{y}_i)\big) \right] \tag{1} $$
In Equation (1), $C$ is the number of classes, and $y_i$ is a binary indicator (0 or 1) specifying whether class label $i$ is the correct classification for the observation. $\hat{y}_i$ is the predicted raw output score for class $i$, and $\sigma$ denotes the sigmoid function, which transforms output scores into probabilities between 0 and 1. This loss function represents a standard multi-output loss. However, an unweighted loss function that applies equally to all outputs may not suffice when learning opportunities are unevenly distributed across classes. This problem is especially significant in hierarchical models; if the loss function treats all classes equally, it will focus on classes that appear more frequently and may fail to learn finer-grained distinctions for less frequent classes [35].
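For reference, Equation (1) matches the mean sigmoid binary cross-entropy over raw scores. A minimal PyTorch equivalent, with illustrative target and score values:

```python
import torch
import torch.nn.functional as F

# Equation (1): mean binary cross-entropy over C classes, with the sigmoid
# applied to the raw scores internally. Targets and logits here are illustrative.
y = torch.tensor([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])           # only b_edge is present
y_hat = torch.tensor([-3.2, -2.8, -1.5, -1.0, 2.4, -2.2])  # raw model scores
loss = F.binary_cross_entropy_with_logits(y_hat, y)        # unweighted: all classes equal
```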
In the DHS-CNN, each abnormal class has different learning opportunities, necessitating adjustments in the loss function calculation to ensure accurate learning and prediction at each stage. In other words, assigning appropriate weights to each stage is essential so that the model correctly identifies and predicts critical abnormal types. For multi-output classification problems, using a loss function that assigns different weights to each class is crucial. This approach enables the model to recognize the varying importance of the abnormal types and achieve more accurate classification. For instance, higher weights can be assigned to abnormal types such as burr, broken, b_edge, and b_bubble, whereas relatively lower weights can be assigned to types such as no_lens and etc_abnormal. This strategy accounts for the possibility that certain abnormal types (e.g., b_edge and b_bubble) may be more critical or occur more frequently.
Additionally, in the hierarchical structure, a lower weight is assigned to the b_bubble class because it affects more layers than b_edge. This implies that, within the current structure, there are relatively more opportunities to learn and identify b_bubble defects than b_edge defects. Owing to this structural characteristic, the strategy was to assign a lower weight to b_bubble, encouraging the model to focus more on other abnormal types, particularly those with fewer learning opportunities.
$$ L_i = -\left[\, y_i \cdot \log \sigma(\hat{y}_i) + (1 - y_i) \cdot \log\big(1 - \sigma(\hat{y}_i)\big) \right] \tag{2} $$

$$ L = \sum_{i=1}^{N} w_i \cdot L_i \tag{3} $$
where $L$ is the total loss function and $N$ is the total number of abnormal types; $L_i$ represents the loss for abnormal type $i$, as defined in Equation (2), and $w_i$ denotes the weight assigned to that type. In this study, the weights for no_lens, etc_abnormal, burr, broken, b_edge, and b_bubble were determined based on the previously described loss function strategy, resulting in values of 1, 1, 2, 2, 3, and 1, respectively.
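A minimal PyTorch sketch of Equations (2) and (3) with these weights; averaging the weighted sum over the batch is our assumption, as the equations are written per observation:

```python
import torch
import torch.nn.functional as F

# Per-class weights w_i in the order: no_lens, etc_abnormal, burr, broken, b_edge, b_bubble.
WEIGHTS = torch.tensor([1.0, 1.0, 2.0, 2.0, 3.0, 1.0])

def weighted_multi_output_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Equations (2) and (3): per-class BCE terms L_i, combined as sum_i w_i * L_i.

    y_hat: (batch, 6) raw logits; y: (batch, 6) binary targets.
    """
    per_class = F.binary_cross_entropy_with_logits(y_hat, y, reduction="none")  # L_i terms
    return (per_class * WEIGHTS).sum(dim=1).mean()  # weighted sum, then batch mean (assumed)
```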

4. Experiment Results

4.1. Dataset

As shown in Table 1, the dataset used in this study comprises 2800 images. The original image size was 1500 × 1500 pixels, and the input image size was adjusted to 640 × 640 pixels for model training. The resolution of 640 × 640 pixels was selected considering the characteristics of the dataset, computational efficiency, and practical usability. Our test results indicate that, despite downscaling the images from 1500 × 1500 to 640 × 640 pixels, there was no significant difference in performance between the two resolutions. Increasing the resolution could improve image detail, but it would also increase the computational cost, processing time, and memory usage; specifically, the computational cost increased more than fourfold. Therefore, we determined that choosing a balanced resolution is crucial, and through experimentation, 640 × 640 pixels was confirmed as the optimal resolution. Moreover, the dataset was split into training, validation, and test sets in a 60%, 20%, and 20% ratio, respectively. By allocating relatively larger validation and test sets, we can more accurately assess how well the model generalizes to new data. This is especially pertinent in manufacturing, where there is a high risk of overfitting; larger validation and test sets help mitigate this risk and enhance the reliability of the model. Additionally, to ensure consistency and reproducibility of the experiments, random seeds were fixed during the data-splitting process.
Table 2 lists the number of labels for each class. The b_edge, burr, broken, b_bubble, etc_abnormal, and no_lens classes contain 483, 411, 431, 427, 423, and 400 labeled images, respectively. In this dataset, selective adjustments are made to ensure a balanced number of labels for each class, helping prevent bias during model training and ensuring fair and accurate classification of all abnormal types. Additionally, the dataset includes images that do not belong to any of the mentioned classes, which are labeled as “good” products, indicating they do not exhibit any defects.
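A sketch of a reproducible 60/20/20 split follows. The paper states only that random seeds were fixed; the specific seed value and the use of torch.utils.data.random_split are assumptions.

```python
import torch
from torch.utils.data import Dataset, random_split

def split_dataset(dataset: Dataset, seed: int = 42):
    """Reproducible 60/20/20 train/validation/test split with a fixed seed.

    For the 2800-image dataset this yields 1680/560/560 images.
    The seed value (42) is illustrative, not the one used in the paper.
    """
    n = len(dataset)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    n_test = n - n_train - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```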

4.2. EfficientNet Optimal Parameter Search

EfficientNet’s compound scaling optimizes model performance through the balanced scaling of three main dimensions: network depth, width, and image resolution. Rather than simply enlarging the network, this approach aims to maximize both efficiency and accuracy by carefully considering all three dimensions simultaneously. The expansion of each dimension affects model performance differently, contributing to higher accuracy and efficiency overall. Accordingly, the model was trained and validated using the ImageNet dataset, where a grid-search method was used to evaluate various parameter combinations. This method systematically explores possible values for each parameter, thus identifying the optimal combination of network depth, width, and image resolution. It plays a crucial role in effectively exploring the parameter spaces of complex models.
In this study, we applied the EfficientNet approach to a contact lens dataset. Because this dataset differs from ImageNet in its characteristics, the grid-search method is essential for determining optimal parameters. Through this process, we experimented with various combinations of key parameters—such as depth, width, and resolution—to identify the most suitable configuration for our dataset. This step is crucial for customizing EfficientNet to fit the dataset’s unique features.
We specifically conducted experiments to discover an optimized network structure for our contact lens dataset based on the EfficientNetB0 model. By default, EfficientNet uses a width of 1.0, a depth of 1.0, and a resolution of 224 × 224 pixels. During our exploration, we focused on adjusting three key parameters: resolution, width, and depth. We set the resolution to 640 pixels, considering the dataset’s characteristics, computational efficiency, and practical usage. This resolution choice was made to balance model performance and efficiency. Width is the core focus of this study: by carefully adjusting width, we aimed to maximize the model’s ability to detect various abnormal contact lens types as shown in Table 3. Increasing the width expands the variety of features captured by the model, making it critical to identify the most appropriate width value for this dataset.
After determining the optimal width, the depth was gradually adjusted. Increasing the depth allowed the model to learn more complex features; however, given the characteristics of this dataset with a smaller number of classes, it was crucial to minimize the depth to avoid overfitting. Therefore, as shown in Table 4, the exploration process involved gradually increasing the depth while aligning it with the optimal width and assessing the accuracy and efficiency of the model at each step.
Figure 12 illustrates the exploration of width and depth. Through this exploration, the study identified the optimal structure for the EfficientNet model tailored to the contact lens dataset, discovering optimal parameters: a width of 2.2, a depth of 1.0, and a resolution of 640 × 640 pixels.
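The two-stage search can be summarized as the loop below; build_model, train, and evaluate are hypothetical callables standing in for the actual training pipeline, and the candidate values are those of Tables 3 and 4.

```python
# Two-stage grid search over EfficientNet scaling parameters: sweep width at
# fixed depth and resolution (Table 3), then sweep depth at the chosen width
# (Table 4). build_model, train, and evaluate are hypothetical stand-ins.
WIDTHS = [1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.8, 3.3, 4.3]
DEPTHS = [1.0, 1.5, 1.8, 2.2]
RESOLUTION = 640

def run_grid_search(build_model, train, evaluate, train_data, val_data):
    width_results = {}
    for w in WIDTHS:
        model = build_model(width=w, depth=1.0, resolution=RESOLUTION)
        width_results[w] = evaluate(train(model, train_data), val_data)

    depth_results = {}
    for d in DEPTHS:  # width fixed at 2.2, the value the study settles on
        model = build_model(width=2.2, depth=d, resolution=RESOLUTION)
        depth_results[d] = evaluate(train(model, train_data), val_data)

    # The study weighs validation accuracy against computational cost and
    # selects width 2.2, depth 1.0 at 640 x 640.
    return width_results, depth_results
```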

4.3. Experiment Setup

4.3.1. Evaluation Methodology

The evaluation methodology of this study focused on deriving and comparing results for three cases: the basic structure, DHS-CNN, and changes in the loss function for the InceptionV4 and EfficientNet models. The following evaluations were conducted:
  • Basic structure evaluation: First, the performance of the basic structures of the InceptionV4, EfficientNet, and ViT models was evaluated. This step was important for establishing the baseline performance of the models.
  • Application of DHS-CNN: Next, the performance of both models with DHS-CNN applied was evaluated. This approach was crucial for understanding the impact of DHS-CNN on model performance.
  • Changes in the loss function: Finally, the effect of modifying the loss function on the performance of both models was evaluated. This analysis helped determine how applying different weights to specific abnormal types influenced the results.
This evaluation methodology provided a comprehensive understanding of how structural changes and adjustments to the loss function affected overall model performance.

4.3.2. Experiment Environment and Setting

Table 5 describes the hardware environment and the software versions used.
Table 6 lists the hyperparameters used in the experiments. A batch size of 16 was used, and training was conducted for 200 epochs. The loss function was a binary cross-entropy loss for each output, and the Adam optimizer was chosen with an initial learning rate of 0.0005. A scheduler was also used to adjust the learning rate dynamically. When performance improvements ceased, the learning rate was reduced by a factor of 0.1, and a ‘patience’ parameter was applied to decrease the learning rate after 10 epochs without improvement.
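Under these settings, the optimizer and scheduler correspond to the following PyTorch configuration; monitoring validation accuracy (mode="max") is our reading of “when performance improvements ceased”, and the placeholder model and metric are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 6)  # placeholder standing in for any model in Table 6
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
# When the monitored metric fails to improve for 10 epochs, multiply the
# learning rate by 0.1, matching the scheduler behaviour described above.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=10)

for epoch in range(200):
    # ... train one epoch with batch size 16, then evaluate ...
    val_accuracy = 0.95  # placeholder validation metric
    scheduler.step(val_accuracy)
```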

4.4. Experimental Results

As shown in Table 7, the original InceptionV4, EfficientNet, and ViT models achieved accuracies of 95.31%, 96.59%, and 91.17%, respectively, on the validation data. The InceptionV4 model with the DHS structure achieved an accuracy of 96.70%, demonstrating that the DHS improved validation performance. Furthermore, the InceptionV4 model with a defect-adaptive hierarchical structure and a customized loss function achieved the highest accuracy, 97.39%. This suggests that the customized loss function, which provides appropriate weights for the DHS, contributed to further improvements in the model’s performance.
Considering the parameter counts and training times, the original InceptionV4 model had 82,304,076 parameters and required 45 s of training time per epoch. The original EfficientNet model contained 169,226,088 parameters and required 342 s per epoch. The InceptionV4 model with DHS had 160,830,540 parameters and required 73 s per epoch. The model that included both DHS and the customized loss function retained the same number of parameters and the same training time. A particularly noteworthy point in Table 7 is that the training time for EfficientNet, at 342 s per epoch, is considerably longer than that of the other models. This implies that using larger datasets could significantly extend the overall training time, affecting practical system implementation. By contrast, the DHS-CNN model and its balanced-weights variant maintain high accuracy while keeping training times relatively low. Specifically, the DHS-CNN with balanced weights achieves a high accuracy of 97.39% with a training time of 73 s per epoch, demonstrating an excellent balance between efficiency and performance. Therefore, when applying a model to large datasets, it is crucial to consider the balance between training time and accuracy; the DHS-CNN with balanced weights, which offers high accuracy and relatively low training time, is a good candidate.
As shown in Table 8, when comparing the performance metrics, both the InceptionV4 model with the defect-adaptive hierarchical structure and the variant with the customized loss function outperform the original InceptionV4 and EfficientNet models. In terms of precision, the InceptionV4 model with the defect-adaptive hierarchical structure achieved the highest value (0.88614), and the model with the customized loss function also recorded a very high value (0.88038). This indicates that the positive predictions made by both models were highly accurate. The InceptionV4 model with the customized loss function applied to the defect-adaptive hierarchical structure achieved the highest recall (0.71318), indicating that it correctly identified a substantial proportion of actual positive cases. This model also achieved the highest F1-score (0.78801), the harmonic mean of precision and recall, suggesting that it balances precision and recall effectively. Finally, it achieved the highest accuracy of 0.73214, reflecting the best overall performance in terms of correct predictions across the dataset.
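As a quick consistency check on Table 8, the F1-score can be recomputed as the harmonic mean of the reported precision and recall:

```python
# DHS-InceptionV4 with custom loss (Table 8): F1 = 2PR / (P + R).
p, r = 0.88038, 0.71318
f1 = 2 * p * r / (p + r)
print(round(f1, 5))  # 0.78801, matching the reported F1-score
```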
As shown in Figure 13, almost all models performed well on the no_lens class. However, for the etc_abnormal class, where defects often occupy the entire image, EfficientNet performed best. In the etc_abnormal class, features occupy the entire image, which differs from typical manufacturing scenarios: defects usually appear as small anomalies against a consistent background, whereas here the background itself varies. This distinction likely explains why EfficientNet demonstrates the highest accuracy in this class, as the model excels at recognizing features across the entire image. Nevertheless, the DHS structures dominated on the remaining defect types, particularly the b_bubble and broken classes, where the DHS structure with the custom loss outperformed the other models.

5. Conclusions

The InceptionV4 model with DHS exhibited an improvement of approximately 1.39% in validation accuracy compared with the original InceptionV4 model. Furthermore, when the customized loss function was applied, the validation accuracy improved by 2.08% over that of the original model. These findings indicate that both the DHS and the customized loss function play crucial roles in boosting the model’s performance, particularly for accurately identifying and classifying complex defect patterns.
By applying DHS and a customized loss function, the model learns more complex and finer patterns, reducing overfitting and improving generalization on the validation data. Therefore, the slightly lower accuracy observed during the training process indicates more reliable predictions, translating to enhanced performance on the validation data.
This defect-adapted approach is applicable not only to contact lenses but also to various other manufacturing processes, such as defect detection on printed circuit boards, capsule defect detection in pharmaceutical manufacturing, and packaging defect detection in food and beverages. These manufacturing sectors, which employ stringent management standards, essentially require high levels of accuracy. For this reason, a simple image-processing technique is not sufficient; instead, a defect-adapted approach that can accurately reflect the characteristics of the defects proves to be effective.
In future work, we plan to conduct an analysis of the YOLOv8 [36,37] and YOLOv11 [38] libraries. These models are based on the EfficientNet model but incorporate unique YOLO techniques that enhance performance. Currently, we cannot directly use these original models because our data are in a multi-output format, which is not directly compatible with them. By analyzing these libraries, we aim to explore the potential for integrating defect-adaptive structures within the YOLO framework. This addition will enable us to extend our adaptive hierarchical structure to more sophisticated classification models, thereby improving our ability to classify defects in complex manufacturing environments. Additionally, we plan to investigate a dynamic defect-adaptive hierarchical structure—specifically, an automated system. This aligns with core concepts of Tree-CNN, which can recognize new patterns or changes in the dataset and dynamically adjust model architecture. This approach enhances flexibility and adaptability, enabling the model to respond effectively to evolving conditions. Additionally, future studies will incorporate a broader range of abnormal types, thereby expanding the current model’s ability to detect and classify a wider variety of defect patterns, which is crucial for complex manufacturing environments.

Author Contributions

Conceptualization, S.-H.K.; Methodology, S.-H.K.; software, S.-H.K.; validation and formal analysis, S.-J.J.; statistical analysis, S.-H.K. and S.-J.J.; resources, S.-H.K. and S.-J.J.; data curation, S.-H.K.; writing—review and editing, S.-H.K.; software upgrade and data generation, S.-H.K.; project administration, K.-H.Y.; funding acquisition, K.-H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2025-RS-2020-II201462).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Access to the data is restricted due to proprietary constraints enforced by the data-holding enterprise. Therefore, it is not available for use upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Çınar, Z.M.; Abdussalam Nuhu, A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in Industry 4.0. Sustainability 2020, 12, 8211. [Google Scholar] [CrossRef]
  2. Ha, H.; Jeong, J. CNN-based defect inspection for injection molding using edge computing and industrial IoT systems. Appl. Sci. 2021, 11, 6378. [Google Scholar] [CrossRef]
  3. Kitsios, F.; Kamariotou, M. Artificial intelligence and business strategy towards digital transformation: A research agenda. Sustainability 2021, 13, 2025. [Google Scholar] [CrossRef]
  4. Sankowski, D.; Nowakowski, J. (Eds.) Computer Vision in Robotics and Industrial Applications; World Scientific: Singapore, 2014. [Google Scholar]
  5. Ercan, M.F.; Wang, R.B. Computer Vision-Based Inspection System for Worker Training in Build and Construction Industry. Computers 2022, 11, 100. [Google Scholar] [CrossRef]
  6. Kuric, I.; Klarák, J.; Bulej, V.; Sága, M.; Kandera, M.; Hajdučík, A.; Tucki, K. Approach to automated visual inspection of objects based on artificial intelligence. Appl. Sci. 2022, 12, 864. [Google Scholar] [CrossRef]
  7. Ercan, M.F. A Video Demonstration of the Computer Vision Based Assessment System. 2022. Available online: https://youtu.be/rGezHIx01uU (accessed on 6 February 2025).
  8. Chien, J.-C.; Wu, M.-T.; Lee, J.-D. Inspection and classification of semiconductor wafer surface defects using CNN deep learning networks. Appl. Sci. 2020, 10, 5340. [Google Scholar] [CrossRef]
  9. Imoto, K.; Nakai, T.; Ike, T.; Haruki, K.; Sato, Y. A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Trans. Semicond. Manuf. 2019, 32, 455–459. [Google Scholar] [CrossRef]
  10. Masci, J.; Meier, U.; Ciresan, D.; Schmidhuber, J.; Fricout, G. Steel defect classification with max-pooling convolutional neural networks. In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, QLD, Australia, 10–15 June 2012; pp. 1–6. [Google Scholar]
  11. Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the art in defect detection based on machine vision. Int. J. Precis. Eng. Manuf. Green Technol. 2022, 9, 661–691. [Google Scholar] [CrossRef]
  12. Herrera, J.A.; Vilaseca, M.; Düll, J.; Arjona, M.; Torrecilla, E.; Pujol, J. Iris color and texture: A comparative analysis of real irises, ocular prostheses, and colored contact lenses. Color Res. Appl. 2011, 36, 373–382. [Google Scholar] [CrossRef]
  13. Hsu, M.Y.; Hong, P.Y.; Liou, J.C.; Wang, Y.P.; Chen, C. Assessment of ocular surface response to tinted soft contact lenses with different characteristics and pigment location. Int. J. Optomechatronics 2020, 14, 119–130. [Google Scholar] [CrossRef]
  14. Raghavendra, R.; Raja, K.B.; Busch, C. Contlensnet: Robust iris contact lens detection using deep convolutional neural networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 1160–1167. [Google Scholar]
  15. Choudhary, M.; Tiwari, V.; Venkanna, U. An approach for iris contact lens detection and classification using ensemble of customized DenseNet and SVM. Future Gener. Comput. Syst. 2019, 101, 1259–1270. [Google Scholar] [CrossRef]
  16. Kim, G.N.; Kim, S.H.; Joo, I.; Yoo, K.H. Detection of Contact Lens Defects using a Modified GoogLeNet. Korea Computer Congress 2022, 6, 894–896. [Google Scholar]
  17. Parzianello, L.; Czajka, A. Saliency-guided textured contact lens-aware iris recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 330–337. [Google Scholar]
  18. Kim, G.; Kim, S.; Joo, I.; Yoo, K.H. Measurement of Center Point Deviation for Detecting Contact Lens Defects. BIGDAS 2022, 10, 125–130. [Google Scholar]
  19. Kim, G.N.; Kim, S.H.; Joo, I.; Kim, G.B.; Yoo, K.H. Center Deviation Measurement of Color Contact Lenses Based on a Deep Learning Model and Hough Circle Transform. Sensors 2023, 23, 6533. [Google Scholar] [CrossRef]
  20. Chang, C.-L.; Wu, W.-H.; Hwang, C.-C. Automatic optical inspection method for soft contact lenses. Int. Conf. Opt. Photonic Eng. 2015, 9524, 952402. [Google Scholar]
  21. Elliott, C.J. Automatic optical measurement of contact lenses. Autom. Opt. Insp. 1986, 654, 125–129. [Google Scholar]
  22. Kim, T.Y.; Park, D.; Moon, H.; Hwang, S.S. A Deep Learning Technique for Optical Inspection of Color Contact Lenses. Appl. Sci. 2023, 13, 5966. [Google Scholar] [CrossRef]
  23. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. Proc. AAAI Conf. Artif. Intell. 2017, 31. [Google Scholar] [CrossRef]
  24. Kimura, G.Y.; Lucio, D.R.; Britto, A.S., Jr.; Menotti, D. CNN hyperparameter tuning applied to iris liveness detection. arXiv 2020, arXiv:2003.00833. [Google Scholar]
  25. Romero-Garces, A.; Ruiz-Beltrán, C.; Marfil, R.; Bandera, A. Lightweight Cosmetic Contact Lens Detection System for Iris Recognition at a Distance. In International Conference on Soft Computing Models in Industrial and Environmental Applications; Springer Nature: Cham, Switzerland, 2023; pp. 246–255. [Google Scholar]
  26. Kim, G.N.; Kim, S.H.; Joo, I.; Yoo, K.H. Detection of Color Contact Lens Defects using Various CNN Models. J. Korea Contents Assoc. 2022, 22, 160–170. [Google Scholar] [CrossRef]
27. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
  28. Huang, G.; Zhuang, L.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  29. Howard, A.G. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  30. Roy, D.; Panda, P.; Roy, K. Tree-CNN: A hierarchical deep convolutional neural network for incremental learning. Neural Netw. 2020, 121, 148–160. [Google Scholar] [CrossRef] [PubMed]
  31. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. Int. Conf. Mach. Learn. 2019, 97, 6105–6114. [Google Scholar]
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  33. Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  34. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. 2022, 54, 1–41. [Google Scholar] [CrossRef]
  35. Zhang, M.-L.; Zhou, Z.-H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 2013, 26, 1819–1837. [Google Scholar] [CrossRef]
  36. Ultralytics. YOLOv8: Implementation and Documentation. Available online: https://docs.ultralytics.com/ko/models/yolov8/ (accessed on 6 February 2025).
  37. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  38. Ultralytics. YOLOv11: Implementation and Documentation. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 6 February 2025).
Figure 1. Various patterns of contact lenses in the final inspection stage: (a,b) normal lens pattern; (c) no_lens; (d) etc_abnormal; (e) broken; (f) burr; (g) bubble defect; (h) edge defect.
Figure 2. Image acquisition and channel allocation under varying lighting conditions.
Figure 3. InceptionV4 model architecture [23].
Figure 4. Compound scaling in EfficientNet [31]. Each color represents a Conv block; while these blocks share the same structure, their parameters differ proportionally.
Figure 5. An overview of Vision Transformer (on the left) and the details of Transformer encoder (on the right) [33].
Figure 6. Characteristics of abnormal type—(a) no_lens, (b) etc_abnormal.
Figure 7. Characteristics of abnormal type—(a) broken, (b) b_edge with small size, (c) b_edge with large size.
Figure 8. Characteristics of abnormal type—(a) burr, (b) b_bubble with small size, (c) b_bubble with large size.
Figure 9. Multiple abnormal types within one lens—(a) b_bubble + b_edge + burr; (b) broken + b_edge; (c) broken + b_edge + b_bubble.
Figure 10. Defect type and size considerations in the design of the defect-adapted hierarchical structure.
Figure 11. (a) Original structure of InceptionV4; (b) the proposed DHS-CNN model structure based on InceptionV4 (pink, red, and green blocks represent Stem, Inception, and Reduction blocks, respectively).
Figure 12. Grid search to find initial appropriate parameters: (a) width, (b) depth.
Figure 13. Per-class accuracy of the comparison models and the DHS models.
Table 1. Experiment dataset.

Total Number of Images   Original Image Size    Input Image Size     Data Split (Train:Validation:Test)
2800                     1500 × 1500 (pixels)   640 × 640 (pixels)   6:2:2
Table 2. Number of labeled images for each class.

Class          Number of Labeled Images
b_edge         483
burr           411
broken         431
b_bubble       427
etc_abnormal   423
no_lens        400
Table 3. Grid search to find initial appropriate parameters: width.

Width   Depth   Resolution   Train Accuracy   Train Loss   Validation Accuracy   Validation Loss
1.4     1       640          0.998375         0.015735     0.952546              0.154295
1.6     1       640          0.996971         0.003325     0.958912              0.165796
1.8     1       640          0.99867          0.013629     0.958912              0.143445
2.0     1       640          0.999705         0.002596     0.959491              0.154562
2.2     1       640          0.999188         0.002179     0.965856              0.124817
2.4     1       640          0.999778         0.004105     0.953704              0.150497
2.8     1       640          0.999777         0.005866     0.967857              0.142077
3.3     1       640          0.999851         0.001127     0.965476              0.143372
4.3     1       640          0.999852         0.009339     0.960697              0.140015
Table 4. Grid search to find initial appropriate parameters: depth.

Width   Depth   Resolution   Train Accuracy   Train Loss   Validation Accuracy   Validation Loss
2.2     1       640          0.999188         0.002179     0.965856              0.124817
2.2     1.5     640          0.998289         0.047659     0.956548              0.141712
2.2     1.8     640          0.976717         0.070334     0.955969              0.118954
2.2     2.2     640          0.989583         0.011818     0.958929              0.133038
Table 5. Experiment environment.

Hardware   Intel Xeon(R) Silver 4216
           240 GB RAM
           2 × GeForce RTX 3090 (24 GB)
Software   Ubuntu 20.04
           Python 3.10.4
           CUDA 11.3
           PyTorch 1.11
Table 6. Experiment scenario.

Model                           Batch Size   Epochs   Loss Function                             Optimizer   Learning Rate
InceptionV4                     16           200      Binary cross-entropy loss (per output)    Adam        0.0005
EfficientNet                    16           200      Binary cross-entropy loss (per output)    Adam        0.0005
ViT                             16           200      Binary cross-entropy loss (per output)    Adam        0.0005
DHS-CNN                         16           200      Binary cross-entropy loss (per output)    Adam        0.0005
DHS-CNN with balanced weights   16           200      Binary cross-entropy loss (per output)    Adam        0.0005
Table 7. Experiment results in training.

Model                           Accuracy (Validation)   Parameters    Training Time (s/Epoch)
InceptionV4                     95.31%                  82,304,076    45
EfficientNet                    96.59%                  169,226,088   342
ViT                             91.17%                  177,495,564   57
DHS-CNN                         96.70%                  160,830,540   73
DHS-CNN with balanced weights   97.39%                  160,830,540   73
Table 8. Overall experiment results.

Model                              Precision   Recall    F1-Score   Accuracy
InceptionV4 original               0.80000     0.63566   0.70842    0.65357
EfficientNet original              0.83193     0.66744   0.74066    0.71929
ViT original                       0.80702     0.48421   0.60526    0.53929
DHS-InceptionV4                    0.88614     0.69380   0.77826    0.70357
DHS-InceptionV4 with custom loss   0.88038     0.71318   0.78801    0.73214