Article

ENN: Hierarchical Image Classification Ensemble Neural Network for Large-Scale Automated Detection of Potential Design Infringement

1 NetcoreTech Co., Ltd., 1308, Seoulsup IT Valley, 77 Seongsuil-ro, Seongdong-gu, Seoul 04790, Republic of Korea
2 Department of Computer Engineering, Hongik University, 94 Wausan-ro, Mapo-gu, Seoul 04068, Republic of Korea
3 Neouly Incorporated, 94 Wausan-ro, Mapo-gu, Seoul 04068, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2023, 13(22), 12166; https://doi.org/10.3390/app132212166
Submission received: 16 September 2023 / Revised: 2 November 2023 / Accepted: 3 November 2023 / Published: 9 November 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

This paper presents a two-stage hierarchical neural network using image classification and object detection algorithms as key building blocks for a system that automatically detects potential design right infringement. This neural network is trained to return the Top-N original design right records that most resemble the input image of a counterfeit. This work proposes an ensemble neural network (ENN), an artificial neural network model that aims to deal with a large amount of counterfeit data and design right records that are frequently added and deleted. First, we performed image classification and object detection learning per design right using acclaimed existing models with high accuracy. The distributed models form the backbone of the ENN and yield intermediate results that are aggregated at a master neural network. This master neural network is a deep residual network paired with a fully connected network. This ensemble layer is trained to determine the sub-models that return the best result for a given input image of a product. In the final stage, the ENN model multiplies the inferred similarity coefficients by the weighted input vectors produced by the individual sub-models to assess the similarity between the test input image and the existing product design rights and to detect any sign of violation. Given 84 design rights and the sample product images taken meticulously under various conditions, our ENN model achieved average Top-1 and Top-3 accuracies of 98.409% and 99.460%, respectively. Upon introducing new design rights data, a partial update of the inference model was performed an order of magnitude faster than with a single model. The ENN maintained a high level of accuracy as it was scaled out to handle more design rights. Therefore, the ENN model is expected to offer practical help to inspectors in the field, such as customs officers at the border who deal with a swarm of products.

1. Introduction

Industrial design involves creative activities to reasonably and organically construct various product elements. Such designs can be protected by law by applying for design right registration. This paper focuses on the problem of registered design rights being violated by delicately imitated products. Non-experts may not easily distinguish the original design of a genuine product from a fake one. The number of illegal counterfeit products continues to increase, causing unfair damage to product design right owners. According to an OECD report [1], the annual damages due to piracy amounted to 500 billion US dollars. To prevent such damage, professionally trained human inspectors at customs manually inspect goods coming from overseas for illegal forgeries. If a product is suspected to be an unlawful copy during the screening process, it is seized for further investigation of its authenticity. However, even for inspectors with years of experience, it is overwhelming to compare the large volume of incoming products against a list of thousands of design rights. Therefore, an automated illegal counterfeit probe system is in great need.
Artificial Intelligence (AI) has recently advanced at an unprecedented pace with the introduction of foundation models that have proven effective in various problem domains [2,3,4,5]. Some AI technologies have been employed for small-scale counterfeit examining systems [6,7,8]. However, the proposed neural networks still have several limitations when building an automated system for examining counterfeit copies at a large scale. Design rights protect aesthetic elements that change constantly and rapidly to stay current with trends. Therefore, new design rights are frequently registered. A more responsive machine learning system is imperative to cope with the outpouring of products against thousands of continually changing product design rights.
Upon introducing new design rights and the associated training image data, most methodologies perform transfer learning of the existing model. The relearning cost increases proportionally to the number of design rights, thus hampering the realization of a scalable counterfeit examining system. This paper draws inspiration from the mechanism of the neocortex in the human brain as explained in [9,10]. In [10], it was confirmed that the brain’s neocortex operates under a single common mechanism and comprises six layers and columns of neurons with a vertical structure penetrating the layers. Additionally, in [9], it was revealed that neuron columns collectively solve problems through a consensus process to learn the world model holistically. Based on this mechanism of actual brain operation, we devised a distributed sub-neural network model analogous to the vertically configured neuron column. Then, we created a master neural network that aggregates the intermediate results produced by the distributed sub-models and selects the one that returned the best classification result for a given test image. This ensemble layer’s work is analogous to the voting mechanism in the neocortex. We refer to such a stepwise hierarchical neural network structure as an ENN (ensemble neural network).
First, the product images of design rights were segmented into non-intersecting groups. For each group, we employed a sub-neural network model proven effective for image classification and object detection. The master model at the succeeding stage collects the individually trained sub-models’ outputs and takes them as input to learn their weights. After completing the stepwise learning process, the ENN model multiplies the inferred weight values by the weighted input vectors produced by the individual sub-models to assess the similarity between the test input image and the existing product design rights and to find any sign of violation. Figure 1 illustrates the overall structure of an automated system for examining counterfeit products based on image-examining technology using AI. This system transmits the input image taken via the client API to the ENN model. The ENN model returns the Top-N similar product design rights. This system also yields the unique article numbers of similar design rights. A given product that exceeds the similarity threshold is suspected of violating an existing design right, and the image capture of the alleged product is sent to the design right holder via email. This paper discusses the ENN model, which sits at the core of the counterfeit screening system.
The critical characteristic of the ENN is the hierarchical neural network structure that combines the results of segmented learning by the sub-models. When a sub-model for the newly introduced design rights is added to the ENN, there is no need to pull up the training data of other design rights all over again to complete the classification learning procedure. Therefore, the learning cost of the ENN is significantly lower than that of a single model. The master model still conducts the transfer learning upon completing the newly introduced sub-model. However, the master model does not directly output similarities for the entire classes. Instead, the master model learns to output the relevancy of the sub-models for a given sample product image.
Given 84 design rights and the sample product images taken meticulously under various conditions, our ENN model achieved average Top-1 and Top-3 accuracies of 98.409% and 99.460%, respectively.
This paper is structured as follows. Section 2 first reviews the related research works. Section 3 introduces the ENN, the core hierarchical neural network model for the automated design right violation detection system. Section 4 discusses the experimental results. Section 5 presents future research directions. Finally, we reach our conclusions in Section 6.

2. Related Works

Illegal replicas are spreading rapidly due to technological advances in logistics. Accordingly, various methods have been proposed for detecting such counterfeiting [6,7,8,11]. Among them, a way to automatically calculate the probability that a product is a counterfeit based on online customer reviews in the market has been proposed [11]. This study used Natural Language Processing (NLP) and subject analysis methods to process customer reviews. This work also defined counterfeit scores. However, these approaches depend on the NLP analysis of buyers’ reviews that become available only after a specific purchase. Therefore, such a method cannot be exercised at the forefront of product screening before distribution. Diversified sales routes evading customer reviews can result in more victims. Thus, a preventive measure should be applied well before the counterfeits enter the market.
There is a limit to the supply of professional human resources to respond to the increasing number of counterfeits. Moreover, it is difficult for customs to arbitrarily unpack and disassemble items for further inspection. This issue calls for a non-destructive inspection method. The most basic non-destructive testing method is to analyze visible characteristics. It is possible to analyze specific patterns through an AI-based computer vision algorithm. In particular, impressive performance is shown by the image classification and object detection algorithms presented in [12,13,14,15].
AI-based computer vision technologies have emerged in the studies of the detection of counterfeit bills [6,7] and logos [8]. These studies are limited to recognizing the similarity with a single genuine item. Such special-purpose inspection methods are inappropriate for our case, where a given object has to be compared against multiple categories, i.e., design rights. Counterfeit screening becomes more challenging as new product design rights are constantly added and updated. Despite transfer learning [16], the learning cost increases exponentially with the number of classes if a single model is used for classification.
We devised distributed backbone neural network models ensembled to form a master model and efficiently deal with frequently updated design rights at a large scale. The ensemble model has been studied to improve the accuracy of a single model [17,18,19] in the image classification domain. As well as more classical approaches such as voting [20], bagging [21], and boosting [22], stacking has recently shown some effectiveness for image classification [23]. Some stacking methods used a sequence of different models [22,24,25]. Another stacking method applied the same data to different models at once and aggregated the results [26]. In the method devised by Rosen [27], individual networks are trained by backpropagation to have their errors linearly decorrelated with the other networks. Rosen’s approach linearly combines the individual networks to produce an output. We use a deep neural network with residual blocks to combine the backbone neural networks and find the correlation between the intermediate weighted input and output class. Zhang et al. [28] used the Kalman gain localization method to reduce the errors and rank deficiency when sampling the prior ensembles with a limited size. This method is used to update the uncertain mapping between measurements of different properties. Our backbone models use convolutional layers to capture the features from product images.
More recently, transformer-based backbone structures for object detection have emerged [14,15].
However, these ensemble approaches incur a cost that increases exponentially with the number of classes. All backbone models have to be retrained even to reflect incremental training data updates. Contrary to these previous approaches, we aim to support incremental updates to achieve high scalability while maintaining high classification accuracy. Our two-stage ensemble architecture is similar to a hybrid neuro-fuzzy system that infers the results of the neural network output with a fuzzy inference system (FIS) [29]. We can experiment with the effect of the FIS at the second stage of the ENN, which is a subject for future work.
The discriminator of a GAN [30] can be considered for distinguishing between an authentic product and a forged one. We focus on the case where a non-authorized producer deliberately replicates a product. We notify the authorized producer upon detection of a product suspected of forgery. The licensed producer determines whether to take legal action given the detection result. Our work is not about automatically concluding fraud, as the similarity between genuine and fake goods is often disputable. Therefore, the ENN focuses on classifying a product under inspection to a known design right. It is an interesting future research problem to analyze legal precedents to understand what counterfeit features affected rulings of design right infringement.

3. Methodology

Previous studies had limitations in responding to the continuous addition of design rights. We address such limitations by designing a neural network similar to the human brain structure studied by Jeff Hawkins and Mountcastle [9,10]. In particular, as mentioned in [10], we tried to construct a sub-neural network that acts similarly to a cortical column, a vertical structure following a common mechanism. We propose a distributed backbone neural network structure functioning as a neural pillar that learns independently per design right partition. Such an approach differs from the existing learning methodology that applies a single neural network to the entire dataset.
Given the inference result found by vertical neuron pillars, we propose a two-level structure of the ENN in which the parent or master neural network derives the final consensus for learning the world model. Such partitioned learning and stepwise conclusion at the master layer mimics the human neocortex neuronal columns that vote to retain the world model as mentioned in [9]. Learning sub-networks upon introducing new data classes is much faster than retraining the entire network. With our ENN, relearning is performed only at the master layer that only takes the input from the newly trained sub-networks. Therefore, the ENN can scale to many output classes. In the following, we look more deeply into the architecture of the ENN model.
The source code used in this paper is available on https://github.com/neouly-inc/ENN_ensemble (accessed on 15 September 2023).

3.1. Model Architecture

Figure 2 compares our ENN model against conventional single models. First, Figure 2a describes the existing approach of injecting preprocessed input data through a network of a single structure. The neural network performs an examination operation for a given product image through a sufficiently learned single network and finally outputs the similarity for all classes. Figure 2b has a similar structure to Figure 2a, except that it uses distributed backbone models and an ensemble layer that makes a final selection by combining the intermediate results from the backbone models. Looking closer, the input data are partitioned into M groups, and each group passes through one of the M sub-backbone neural network models. Subsequently, the outputs of the sub-models are concatenated and passed through the ensemble layer, which follows a deep neural network (DNN) structure. We refer to this ensemble layer as a master or parent layer. In the last step, the similarity for all final design rights is returned by an overlay function that takes the product of the ensemble model result and the weighted tensor. The ENN model proceeds through the following five steps to learn the design rights of a given product image (a sketch of this forward pass follows the list).
  • First, the ENN receives an image and performs augmentation, including input size adjustment and normalization.
  • The ENN outputs a distributed weighted output through a distributed backbone neural network.
  • The ENN concatenates the distributed weighted outputs of the individual sub-neural networks and converts them into Rosen’s tensor to be passed to the master layer.
  • With Rosen’s tensor as input, the ENN computes the similarity coefficient for each backbone model.
  • The ENN multiplies the weighted input tensor and the similarity coefficient tensor to output the final closeness of an input image to every design right.
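The five steps above can be summarized in the following minimal PyTorch-style sketch. This is an illustrative sketch only: the module names (backbones, ensemble_dnn) and the tensor layout are our assumptions, not identifiers from the released source code.

    import torch

    def enn_forward(image: torch.Tensor, backbones, ensemble_dnn) -> torch.Tensor:
        # Steps 1-2: the preprocessed/augmented image passes through every
        # distributed backbone, each returning a weighted output for its classes.
        partial = [b(image) for b in backbones]        # M tensors, one per sub-model
        # Step 3: concatenate the sub-model outputs into one weighted input tensor.
        weighted_input = torch.cat(partial, dim=-1)    # shape: (batch, N)
        # Step 4: the master layer infers one similarity coefficient per backbone.
        coeff = ensemble_dnn(weighted_input)           # shape: (batch, M)
        # Step 5: the overlay function scales each backbone's block of outputs by
        # its coefficient, yielding the final similarity for every design right.
        scaled = [p * coeff[:, i:i + 1] for i, p in enumerate(partial)]
        return torch.cat(scaled, dim=-1)               # shape: (batch, N)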
Figure 3 represents the relearning operation that must run when new data are introduced. Previous studies, such as the one shown in Figure 3a, require transfer learning of the entire model when a set of design rights is added. For the proposed ENN model, as shown in Figure 3b, only the backbone model designated for the new dataset goes through the training phase. The DNN model at the master layer picks up the result from the new backbone model and goes through partial transfer learning. Training time can be significantly reduced since the previous backbone models remain unchanged.
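The partial relearning of Figure 3b can be sketched as follows, assuming PyTorch modules. Here, train and build_master are hypothetical helpers standing in for the usual training loop and master-layer constructor, and the num_classes attribute on each backbone is likewise our assumption.

    import torch.nn as nn

    def incremental_update(enn, new_backbone: nn.Module, new_loader, master_loader):
        # Freeze the pretrained backbones; they are not retrained.
        for backbone in enn.backbones:
            for param in backbone.parameters():
                param.requires_grad = False
        # Train only the backbone designated for the newly added design rights.
        train(new_backbone, new_loader)            # hypothetical training loop
        enn.backbones.append(new_backbone)
        # Rebuild the master layer: its input grows by the new backbone's classes,
        # its output grows to one coefficient per backbone; then retrain it.
        total_classes = sum(b.num_classes for b in enn.backbones)
        enn.ensemble_dnn = build_master(in_dim=total_classes,
                                        out_dim=len(enn.backbones))
        train(enn.ensemble_dnn, master_loader)     # partial transfer learning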

3.1.1. The Distributed Backbone Model

As mentioned in [10], the core of the ENN model is a distributed backbone model that acts similarly to a cortical column of neurons that follow a near-identical structure. The main idea of a distributed backbone model is to learn per partitioned dataset. In this study, the initial dataset was divided to have the same number of classes (design rights) as much as possible. There is no overlap in design rights between different distributed backbone models. Distributed neural networks can be trained independently and quickly in parallel.
Suppose there is a set of 50,000 images and 100 design rights. Each design right is associated with 500 images. Suppose we segment the 100 design rights into five masterclasses, each having 20 design rights. Then, the first model can learn with the 10,000 images corresponding to the 1st–20th classes, and the second model can learn with the 10,000 images corresponding to the 21st–40th classes. Likewise, the rest of the models perform their distributed training individually, dividing the data into 10,000 images each.
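As an illustration of this split, class indices can be partitioned into non-overlapping, equally sized groups; a sketch under the 100-class, five-group assumption above:

    def partition_classes(num_classes: int = 100, num_groups: int = 5):
        # Split design right indices into non-overlapping, equally sized groups.
        per_group = num_classes // num_groups
        return [list(range(g * per_group, (g + 1) * per_group))
                for g in range(num_groups)]

    groups = partition_classes()
    # groups[0] == [0, ..., 19]  -> 1st-20th classes for the first backbone
    # groups[1] == [20, ..., 39] -> 21st-40th classes for the second backbone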
The main reason for learning per divided dataset is to overcome the inefficiency of existing methodologies that typically train one huge neural network. The single network model incurs a significant cost of transfer learning on the entire dataset. Existing methods train one huge neural network to learn all classes, and additional training requires transfer learning over the whole dataset even when new data are added incrementally. However, our ENN model can cover the entire set of classes much more quickly by training only the distributed backbone model affected by the change to the dataset. We can even benefit from parallelism by simultaneously training the required distributed backbone models on separate devices.
Assume a hyperscale neural network that needs to be trained to classify an input into more than 50,000 design rights. If there are 500 images per design right, then the single neural network model is trained with about 25 million images per epoch, even when only one class is newly added and the rest of the data were trained in advance. Moreover, the depth of the neural network may also have to be significantly rescaled and recalibrated to avoid any possible underfitting problem when the number of classes becomes very high. Therefore, learning with a single neural network is inappropriate for our problem of dealing with many product types.
In the case of the ENN, each distributed backbone model receives a learnable workload for more feasible model fitting. Suppose one backbone model can comfortably learn up to 1000 classes of data. For 50,000 classes, we can have 500 backbone models trained independently on different devices. If 100 extra classes are introduced, we can designate one backbone model to learn from the newly added data and let it pass the weighted input to the ensemble DNN model. The other backbone models pretrained on the previous 50,000 classes do not have to be retrained. The retraining at the ensemble DNN layer (the master layer) is performed quicker than with the single neural network model, as it only needs to account for the weighted input of the newly trained distributed backbone models. Therefore, the ENN model only incurs learning costs proportional to the amount of new data.

3.1.2. The Ensemble DNN Model

We provide the microscopic view of the ensemble DNN model in Figure 4. The ensemble neural network model derives a final consensus on the results of the distributed backbone model, just as neurons reach agreement through voting to learn the world model in the neocortex composed of neuron columns, as explained in [9]. The ensemble DNN model takes the initial input with the size of N and returns an output with a size equal to the number of distributed models (M). M is smaller than N as the distributed models are learned on partitioned datasets.
This model first takes a weighted input tensor and injects it into an FC (fully connected) layer. Then, the data are fed forward through a sequence of six FC residual blocks that follow the ResNet architecture [12], which is an evolved version of a convolutional network [31]. Each residual block comprises a batch normalization layer, an FC layer, a dropout layer, and an activation (ReLU) layer. The output of the preceding block is added at the activation layer of the next block. After the last residual block, the Sigmoid function, as defined in Equation (1), is applied. The dimension of the Sigmoid function output is identical to that of the weighted input tensor. The result of the Sigmoid function is multiplied by the weighted input tensor through the overlay function. At this point, the shape of the output matches the total number of design rights (N), so that the final similarities of all design rights are obtained.
$$\mathrm{Sigmoid}(x_i) = \frac{1}{1 + e^{-x_i}}, \quad i \in \{1, 2, 3, \ldots, k\} \tag{1}$$
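A minimal PyTorch sketch of one such FC residual block follows. The class name is ours, and the default sizes correspond to the best hyperparameters reported in Section 4.5; the exact layer ordering in the released code may differ.

    import torch
    import torch.nn as nn

    class FCResidualBlock(nn.Module):
        # BatchNorm -> FC -> Dropout, with the block input added back before ReLU.
        def __init__(self, dim: int = 1024, dropout: float = 0.4):
            super().__init__()
            self.norm = nn.BatchNorm1d(dim)
            self.fc = nn.Linear(dim, dim)
            self.drop = nn.Dropout(dropout)
            self.act = nn.ReLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            out = self.drop(self.fc(self.norm(x)))
            return self.act(out + x)   # residual connection into the activation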
Figure 5 shows that the overlay function is given P weighted input tensors. The overlay function multiplies each input tensor by the similarity coefficient learned by the ensemble DNN model. The similarity coefficient is computed using the Sigmoid or the Softmax function (Equation (2)), depending on the model used for the backbone layer. The design right similarities are sorted in descending order in a P × M matrix. Given this similarity table, we can instantly identify the Top-N design rights the input image is suspected to be related to.
$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}, \quad i \in \{1, 2, 3, \ldots, k\} \tag{2}$$
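For illustration, once the overlay function has produced a per-design-right similarity tensor, the Top-N candidates follow from a single descending sort (a sketch; the tensor layout is our assumption):

    import torch

    def top_n_design_rights(similarities: torch.Tensor, n: int = 3):
        # similarities: (batch, N) overlay output, one score per design right.
        scores, indices = torch.sort(similarities, dim=-1, descending=True)
        return indices[:, :n], scores[:, :n]   # Top-N design right IDs and scores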
The ensemble DNN model takes as input a weighted input tensor containing the results of the preceding distributed models and infers which model is the most relevant to the input image according to the similarity coefficient. The order of the preceding distributed models must remain unchanged during training and inference to determine the most pertinent distributed model. Since the size of the output is very small compared to the size of the input, the ensemble model is designed to follow a fairly simple structure. When a class is added or changed, the output layer of the ensemble DNN model must be adjusted accordingly, and the retraining process has to be carried out.
Illegal counterfeits can violate multiple design rights. Thus, we should be able to detect several relevant design rights at the same time. How such a requirement is met depends on whether we use an image classification or an object detection model as the backbone model.
If the backbone model implements image classification, the similarity of each image classification backbone model is returned for every design right. For example, with 100 design rights split into five supersets, the model learned from one superset yields the similarity of 20 classes. We chose the Sigmoid function (Equation (1)) over the Softmax function (Equation (2)) to enable multiple class detections among the k classes from a single input image.
On the other hand, the object detection model already identifies multiple class objects in bounding boxes simultaneously within one image. The object detection model uses the Softmax function to predict an individual object’s class (design right) in bounding boxes. In addition, the max value of each class is added to the calculation process as shown in Equation (3) with k as the number of design rights. This process picks a design right with the highest similarity value for the detected object captured in a bounding box.
$$X_i = \max\left(x_{(i,\,\mathrm{bbox})}\right), \quad i \in \{1, 2, 3, \ldots, k\} \tag{3}$$
Backbone models can be substituted flexibly to seek performance gains.
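Equation (3) reduces to a per-class maximum over all detected bounding boxes. A minimal sketch, assuming the detector returns a (boxes × k) matrix of per-box Softmax scores:

    import torch

    def per_class_max(box_scores: torch.Tensor) -> torch.Tensor:
        # box_scores: (num_boxes, k) Softmax scores, one row per bounding box.
        # Returns X_i = max over boxes of x_(i, bbox): one score per design right.
        return box_scores.max(dim=0).values   # shape: (k,)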
Note that the individual distributed backbone models use a deeper architecture than the ensemble DNN model. We could make the ensemble DNN model lighter as it only needs to learn to output the similarity coefficients of the backbone models instead of learning to return the similarity of every design right. With the output of the ensemble DNN model, we can identify a sub-model that is relatively more likely to return the relevant class for a given input image. Most similar design rights can be computed instantly by running the overlay function. We maintain the efficient training and inference process by keeping the ensemble layer simple while minimizing the accuracy compromise.

4. Experiments

This section assesses the performance of the ENN model.

4.1. Experiment Setup and Implementation

Training our model fully utilized the following resources: a Dell EMC DSS 8440 server with a 40-core CPU (80 threads), six Tesla V100 GPUs, each with 32 GB of dedicated memory, and 256 GB of RAM. The DSS 8440 runs Ubuntu 18.04.6 LTS, and the machine learning jobs were executed in Docker containers. We implemented the following machine learning algorithms as the distributed backbone models.
  • UP-DETR [15] with CUDA (v10.2), Python (v3.7.7), PyTorch (v1.6.0), and Torchvision (v0.7.0)
  • ResNet [12] with CUDA (v10.2), Python (v3.7.7), PyTorch (v1.6.0), and Torchvision (v0.7.0)
  • WideResNet [32] with CUDA (v10.2), Python (v3.7.7), PyTorch (v1.6.0), and Torchvision (v0.7.0)
  • Yolo [13] with CUDA (v10.2), Python (v3.7.7), PyTorch (v1.6.0), and Torchvision (v0.7.0)
  • EfficientNet [33] with CUDA (v10.2), Python (v3.7.7), PyTorch (v1.10.0), and Torchvision (v0.11.0)

4.2. Data Collection and Augmentation

We collected 115,916 images for the 84 design rights listed in Table 1 and Table 2. More detailed information on the design rights is available on KIPRIS (Korea Intellectual Property Information Search, http://www.kipris.or.kr) (accessed on 15 September 2023). Approximately 1380 images were evenly collected for each of the 84 design rights. For each design right, the models used are also listed. The notation of a model is a tuple followed by an ID indicating a group of design rights. The first and second elements of the tuple indicate the number of backbone models used and the number of design rights each backbone model learns. For instance, the first design right in Table 1 is a wireless earphone with the unique registration number 3008346600000. One of the models applied to the training images of this design right was (1,11)A, meaning that one backbone model was used for learning the images of 11 design rights. The letter 'A' indicates the ID of the group to which this design right belongs. We split the image dataset into training, validation, and test sets in an 8:1:1 ratio.
The National IT Industry Promotion Agency of Korea acquired the sample products of these design rights. As shown in Figure 6, we used a turntable machine to photograph each sample product every three degrees. The camera height was set to high, medium, and low. We set the lighting to bright, standard, and dim. Through this photographing process, we collected 1080 images per sample product. Additionally, we took 300 pictures of each product under realistic conditions, such as showing the wrapping with label attachments. Human experts in design right examination annotated the ground truth images with bounding boxes.
To obtain more real-world cases, we applied various data augmentation techniques [34,35] such as horizontal reversal, vertical reversal, brightness adjustment, contrast adjustment, saturation adjustment, image size adjustment, normalization, and partial image hiding [36].
Horizontal and vertical inversion were applied with a 50% probability. The brightness, contrast, and saturation were randomly selected from the ranges of 0.2–2.0, 0.8–1.2, and 0.5–1.5, respectively. The image length was chosen from 480 to 800 pixels in steps of 32 pixels when UP-DETR was used as a distributed neural network. We achieved the best balance between accuracy, training, and test speed when the image length was set to 512 pixels. After applying the commonly used image normalization, a part of the image was randomly hidden with a 30% probability [36].
Through these various image augmentations, we increased the model’s accuracy even with the initial small set of images.
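The settings above roughly correspond to the following torchvision pipeline. This is a sketch under our assumptions (e.g., the 30% random erasing probability and the ImageNet normalization statistics); the released code may differ.

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomVerticalFlip(p=0.5),
        transforms.ColorJitter(brightness=(0.2, 2.0),
                               contrast=(0.8, 1.2),
                               saturation=(0.5, 1.5)),
        transforms.Resize((512, 512)),        # length that balanced accuracy and speed
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # common ImageNet stats
                             std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(p=0.3),      # partial image hiding [36]; p assumed
    ])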

4.3. Comparison of Training and Inference Speed

Table 3 shows the average training time per epoch using the UP-DETR model as a backbone model [15]. Training was conducted for up to 200 epochs, and the best-fit model was chosen using the validation loss to prevent overfitting. The batch size was set to eight, considering the VRAM limit of our GPUs. We used the Distributed Data Parallel (DDP) framework to split the training workload among six GPUs.
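A minimal sketch of wrapping one backbone for multi-GPU training with DDP follows; it is illustrative only, and the rendezvous environment variables and launch scripts are omitted.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def setup_ddp(model: torch.nn.Module, rank: int, world_size: int = 6) -> DDP:
        # One process per GPU; six V100s as described above, batch size 8 each.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        return DDP(model.cuda(rank), device_ids=[rank])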
When training a model with 84 design rights (classes), the existing method of learning all classes at once requires learning with all data through transfer learning. It took approximately 29.5 min per epoch for a single network model to train object classification with 84 classes. Using the same machine learning hardware, we project the training time to take over 2400 days for 50,000 classes. Even with the horizontal scaling of the computing resources, the single network model has to be trained on the entire dataset. Therefore, the computing resources are poorly utilized with the single network model.
On the other hand, when a backbone model of an ENN was trained independently for 10 to 11 classes, it took an order of magnitude less time per epoch than the single network model. The ensemble DNN model is so lightweight that its training time portion was negligible. Using the same computing resources, the ENN always takes a shorter constant time for the incrementally added unit-sized training dataset than the single network model. This performance measurement proves that the ENN can be more scalable than the single network model.
We profiled the inference time as shown in Table 4. For a (3, 256, 256) image, it took approximately 35 ms at each backbone model and 0.021 ms at the ensemble layers. The largest model with eight backbone models had 337,000,000 parameters in total. With sequential inference over the backbone models, the ENN took approximately 300 ms to classify a given image.

4.4. Comparison of Distributed Backbone Models

Figure 7 shows the ENN model’s Top-1 and Top-3 accuracy measurements with varying numbers of split backbone models and total design rights. The model (1,84) is the single network version learning all 84 design rights. As mentioned above, we used five backbone models: UP-DETR [15], EfficientNet [33], ResNet [12], Yolo [13], and WideResNet [32]. Specifically, we used the ResNet-101 model, wide_resnet101_2 model, and efficientnet_b7 model provided by Torchvision.
UP-DETR returned the highest Top-1 accuracy of 98% and above across model configurations. UP-DETR, built on the attention mechanism [2], is an improved version of DETR [14]; transformer architectures of this kind have also performed impressively in the computer vision field, as with the Swin Transformer [37]. Using UP-DETR, the drop in Top-1 accuracy with the increase in design rights was negligible. UP-DETR also showed the highest Top-3 accuracy across all model configurations and maintained a high Top-3 accuracy despite the increase in design rights to identify.
Table 5 shows the individual backbone models’ average Top-1 and Top-3 accuracies. Each model was trained on a dataset with 10 to 11 classes. UP-DETR outperformed other backbone models with a Top-1 and Top-3 accuracy of at least 99%. UP-DETR performed flawlessly in terms of Top-3 accuracy.

4.5. Hyperparameter Tuning

In this subsection, we perform hyperparameter tuning for the ensemble model. In order, the five hyperparameters are the layer size (number of perceptrons), the dropout rate, the learning rate, the optimization function, and the FC residual block depth; a sketch of this search grid follows.
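For reference, the tuning can be organized as a simple grid over the five hyperparameters. The dropout range and the optimizer list come from this section, and the best values are those reported below; the remaining candidate values are illustrative assumptions.

    # Grid over the five tuned hyperparameters; values marked 'assumed' are
    # illustrative candidates, not the exact grids used in the paper.
    search_space = {
        "layer_size": [256, 512, 1024, 2048],              # assumed; best: 1024
        "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],              # range stated; best: 0.4
        "learning_rate": [0.0005, 0.001, 0.005, 0.01],     # assumed; best: 0.005
        "optimizer": ["RMSprop", "SGD", "Adam", "AdamW"],  # tried; best: AdamW
        "residual_depth": [2, 4, 6, 8],                    # assumed; best: 6
    }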
Table 6 shows the entire ENN model’s prediction accuracy with varying layer sizes of the ensemble DNN model. The dropout and learning rates were fixed at 0.4 and 0.005, respectively. AdamW was used as the optimization function, and the FC residual block depth was set to six. With large layers, learning is slower, but more information can be captured; with small layers, learning is faster, but less information is captured. The ENN model performed best with the layer size set to 1024, with Top-1 and Top-3 accuracies of 98.374% and 99.410%, respectively.
Table 7 shows the prediction accuracy of the ENN model according to the dropout rate of the FC residual block. Dropout prevents overfitting in the learning process, and a high dropout rate causes more forgetting in the propagation process. For this experiment, we chose AdamW for optimization. The layer size was set to 1024. The learning rate and the depth of the FC residual block were fixed at 0.005 and 6, respectively. The dropout rate varied between 0.1 and 0.5. We found that the ENN model performed best with the dropout rate set to 0.4, although the differences among the other dropout settings were not significant.
Table 8 shows the Top-1 and Top-3 accuracy of the ENN model according to the learning rate of the ensemble model. For this experiment, we used AdamW for optimization. The layer size, the dropout rate, and the FC residual block depth were set to 1024, 0.4, and 6, respectively. The learning rate limits how much the model learns at a time. With a high learning rate, significant weight changes lead to quick learning; however, the learning result can be sub-optimal. With a low learning rate, more weight values are examined, which can lead to more optimal results, but the whole learning process becomes slower. Compared to the layer size and dropout rate, the ENN model was sensitive to the learning rate in terms of prediction accuracy. The best Top-1 and Top-3 accuracy was obtained with a learning rate of 0.005. Recently, Konar et al. [38] suggested a method to adjust the learning rate in stages according to epochs to expedite learning without falling into a local minimum.
Table 9 shows the prediction accuracy of the ENN model according to the optimization function. As mentioned above, we fixed the layer size, the dropout rate, and the learning rate at the values that led the ENN model to perform the best. The depth of the FC residual block was fixed at six.
The optimization function directs the learning process to find the global minimum of loss as quickly as possible without falling into the local minimum. We experimented with RMSprop [39], SGD [40], Adam [41], and AdamW [42]. Specifically, we set the momentum of both SGD and RMSprop to 0.9. As a result, the ENN model performed the best with AdamW. RMSprop showed a sharp drop in Top-1 accuracy as the number of classes and the backbone models increased.
Table 10 shows the prediction accuracy of the ENN model with varying depth configuration for the FC residual block. We fixed all other hyperparameters at the best values we observed. The ENN model yielded the best accuracy with a depth of six.
Overall, the best ENN model we obtained through hyperparameter optimization achieved the Top-1 and Top-3 accuracies of 98.409% and 99.460%, respectively.

4.6. Comparison with a Single Network Model

We compared the ENN model with a single neural network model, as shown in Table 11. The single model (1,84) led to the best accuracy. However, as mentioned earlier, the single model takes a much longer training time than the ENN models. The error is propagated to all classes (design rights) during training for the single model. On the other hand, pretrained backbone models are frozen, and only the backbone models accepting incrementally added datasets are involved in ENN training. This modeling approach was a design choice to enhance scalability. Among the backbone models, UP-DETR gave the ENN model the smallest accuracy margin with respect to the single neural network. Specifically, using eight UP-DETR backbone models, the ENN showed Top-1 and Top-3 accuracies that were 1.51 and 0.71 percentage points lower, respectively. For cost-effectiveness and the need to address frequent design right updates, the ENN model’s quicker, incremental modeling approach seems more practical while not compromising accuracy significantly.
Table 12 shows the precision, recall, and F1 score of one of the largest ENN models ((8,84) ABCDEFGH) using various backbone models. UP-DETR was the best performer, while Yolo showed the lowest accuracy. Figure 8 shows the confusion matrix of the (8,84) ABCDEFGH model using UP-DETR. The Top-1 accuracy of this model was 98.275%.
The accuracy saturation of the single neural network model is inevitable if there are thousands of products to classify. The tipping point of the single neural network model, given a much larger set of design rights, is a subject for subsequent studies. However, note that it is highly costly and time-consuming to obtain the photos of proprietary products following the design rights. Moreover, cooperation from the design right owner is needed to take the images of their products in various settings. Addressing the issues with data acquisition is another research topic to be studied in the future.

5. Discussion

To build a production-level design right classification system, we need to amass a much larger set of images of real products, let alone actual illegal replicas. However, obtaining pictures of these real products under various settings is costly and time-consuming. An efficient data acquisition method for building the production-level design right classification system is a subject for future work. We also have to understand court rulings to conclude the forgery of a product automatically. Furthermore, we can employ a fuzzy inference system [29] at the ensemble stage to see if the output of the backbone models can be inferred with higher accuracy.
We randomly distributed design rights among backbone models. Where backbone models overlap on similar design rights, inference at the ensemble layer can be negatively affected. We can consider clustering similar design rights to remove such overlaps.
A large sub-model on the backbone layer can consume a significant amount of memory and slow the inference process. To overcome such an issue, we can employ the Forward-Forward algorithm [43] that showed a reduced memory footprint especially for large networks.

6. Conclusions

We presented a scalable two-stage hierarchical ensemble neural network (ENN) model tuned for detecting the design rights a product is potentially infringing. We assumed a counterfeit is merely an identical copy of an existing product with a registered design right. We identify the violated design rights by classifying the image of a product shot under different settings such as lighting, packaging condition, focal length, and angle. This study focuses on the fact that thousands of design rights are registered and many products pour into the market at the border. Classifying a product into thousands of design rights with a single neural network is impractical due to heavy training costs and inefficient computing resource utilization. The ENN model is designed to address this scalability issue by having distributed backbone models trained on unit-sized datasets independently and in parallel. The results of the backbone models are concatenated and passed through the ensemble DNN model, which consists of fully connected residual blocks, to output the model that returned the most similar class for a given product image. This novel structure allows the ENN model to be trained on incrementally added unit-sized datasets in constant time. Therefore, the ENN model can be scaled to classify many design rights. The ENN model was designed to enhance scalability. At the same time, the fine-tuned ENN model using UP-DETR as a backbone model showed Top-1 and Top-3 accuracies of 98.27% and 99.25%, respectively. Thus, we showed that the ENN model can be on a par with the single neural network model while having at least an order of magnitude lower training cost when given an incremental dataset to learn. The ENN model is the most appropriate neural network structure to adopt in the field, where thousands of products must be examined. Customizing the ENN model is easy, as we can plug any neural network model into the backbone layer for further improvement in accuracy.

Author Contributions

Conceptualization, Y.Y.; methodology, C.J.L., S.H.J. and Y.Y.; software, C.J.L. and S.H.J.; validation, C.J.L., S.H.J. and Y.Y.; formal analysis, C.J.L., S.H.J. and Y.Y.; investigation, C.J.L., S.H.J. and Y.Y.; resources, Y.Y.; data curation, C.J.L., S.H.J. and Y.Y.; writing—original draft preparation, C.J.L., S.H.J. and Y.Y.; writing—review and editing, Y.Y.; visualization, C.J.L.; supervision, Y.Y.; project administration, Y.Y.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Trade, Industry & Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT), under Grants P0014268 Smart HVAC demonstration support. This research was also supported by the MSIT (Ministry of Science and ICT), Korea under the ITRC (Information Technology Research Center) support program (RS-2023-00259099) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00240211).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The design rights information we used for this study is available on KIPRIS (Korea Intellectual Property Information Search, http://www.kipris.or.kr) (accessed on 15 September 2023). We made our code fully open-source, and it is available on https://github.com/neouly-inc/ENN_ensemble (accessed on 15 September 2023). We do not have permission to share the images of the products that are under the design rights.

Conflicts of Interest

Authors Chan Jae Lee and Seong Ho Jeong are employed by NetcoreTech Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ENN  Ensemble neural network
DNN  Deep neural network

References

  1. Organisation for Economic Co-operation and Development; Kazimierczak, M. Trade in Counterfeit and Pirated Goods: Mapping the Economic Impact; OECD Publishing: Paris, France, 2016. [Google Scholar]
  2. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  3. Kang, M.; Lee, W.; Hwang, K.; Yoon, Y. Vision transformer for detecting critical situations and extracting functional scenario for automated vehicle safety assessment. Sustainability 2022, 14, 9680. [Google Scholar] [CrossRef]
  4. Hwang, H.; Oh, J.; Lee, K.H.; Cha, J.H.; Choi, E.; Yoon, Y.; Hwang, J.H. Synergistic approach to quantifying information on a crack-based network in loess/water material composites using deep learning and network science. Comput. Mater. Sci. 2019, 166, 240–250. [Google Scholar] [CrossRef]
  5. Hwang, H.; Choi, S.M.; Oh, J.; Bae, S.M.; Lee, J.H.; Ahn, J.P.; Lee, J.O.; An, K.S.; Yoon, Y.; Hwang, J.H. Integrated application of semantic segmentation-assisted deep learning to quantitative multi-phased microstructural analysis in composite materials: Case study of cathode composite materials of solid oxide fuel cells. J. Power Sources 2020, 471, 228458. [Google Scholar] [CrossRef]
  6. Kumar, S.N.; Singal, G.; Sirikonda, S.; Nethravathi, R. A novel approach for detection of counterfeit Indian currency notes using deep convolutional neural network. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Coimbatore, India, 22–23 January 2020; IOP Publishing: Bristol, UK, 2020; Volume 981, p. 022018. [Google Scholar]
  7. Lee, S.H.; Lee, H.Y. Counterfeit bill detection algorithm using deep learning. Int. J. Appl. Eng. Res 2018, 13, 304–310. [Google Scholar]
  8. Daoud, E.; Vu, D.; Nguyen, H.; Gaedke, M. Enhancing fake product detection using deep learning object detection models. IADIS Int. J. Comput. Sci. Inf. Syst. 2020, 15, 13–24. [Google Scholar] [CrossRef]
  9. Hawkins, J. A Thousand Brains: A New Theory of Intelligence; Hachette: London, UK, 2021. [Google Scholar]
  10. Mountcastle, V.B. Modality and topographic properties of single neurons of cat’s somatic sensory cortex. J. Neurophysiol. 1957, 20, 408–434. [Google Scholar] [CrossRef]
  11. Wimmer, H.; Yoon, V.Y. Counterfeit product detection: Bridging the gap between design science and behavioral science in information systems research. Decis. Support Syst. 2017, 104, 1–12. [Google Scholar] [CrossRef]
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  13. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  14. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
  15. Dai, Z.; Cai, B.; Lin, Y.; Chen, J. Up-detr: Unsupervised pre-training for object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610. [Google Scholar]
  16. Torrey, L.; Shavlik, J. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques; IGI Global: Hershey, PA, USA, 2010; pp. 242–264. [Google Scholar]
  17. Yousefnezhad, M.; Hamidzadeh, J.; Aliannejadi, M. Ensemble classification for intrusion detection via feature extraction based on deep Learning. Soft Comput. 2021, 25, 12667–12683. [Google Scholar] [CrossRef]
  18. Ahn, H.; Son, S.; Kim, H.; Lee, S.; Chung, Y.; Park, D. EnsemblePigDet: Ensemble Deep Learning for Accurate Pig Detection. Appl. Sci. 2021, 11, 5577. [Google Scholar] [CrossRef]
  19. Usman, S.M.; Khalid, S.; Bashir, S. A deep learning based ensemble learning method for epileptic seizure prediction. Comput. Biol. Med. 2021, 136, 104710. [Google Scholar]
  20. Parhami, B. Voting algorithms. IEEE Trans. Reliab. 1994, 43, 617–629. [Google Scholar] [CrossRef]
  21. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  22. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the ICML, Bari, Italy, 3–6 July 1996; Citeseer: Raleigh, NC, USA, 1996; Volume 96, pp. 148–156. [Google Scholar]
  23. Divina, F.; Gilson, A.; Goméz-Vela, F.; García Torres, M.; Torres, J.F. Stacking ensemble learning for short-term electricity consumption forecasting. Energies 2018, 11, 949. [Google Scholar] [CrossRef]
  24. Sikora, R. A modified stacking ensemble machine learning algorithm using genetic algorithms. In Handbook of Research on Organizational Transformations Through Big Data Analytics; IGi Global: Hershey, PA, USA, 2015; pp. 43–53. [Google Scholar]
  25. Qi, Q.; Wang, Z.; Xu, Y.; Fang, Y.; Wang, C. Enhancing Phishing Email Detection through Ensemble Learning and Undersampling. Appl. Sci. 2023, 13, 8756. [Google Scholar] [CrossRef]
  26. Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.W.; Han, Z.; Pham, B.T. Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan. Landslides 2020, 17, 641–658. [Google Scholar] [CrossRef]
  27. Rosen, B.E. Ensemble learning using decorrelated neural networks. Connect. Sci. 1996, 8, 373–384. [Google Scholar] [CrossRef]
  28. Zhang, Y.; Hittawe, M.M.; Katterbauer, K.; Marsala, A.F.; Knio, O.M.; Hoteit, I. Joint seismic and electromagnetic inversion for reservoir mapping using a deep learning aided feature-oriented approach. In SEG Technical Program Expanded Abstracts 2020; Society of Exploration Geophysicists: Houston, TX, USA, 2020; pp. 2186–2190. [Google Scholar]
  29. Alizadeh, S.M.S.; Bagherzadeh, A.; Bahmani, S.; Nikzad, A.; Aminzadehsarikhanbeglou, E.; Yu, S.T. Retrograde gas condensate reservoirs: Reliable estimation of dew point pressure by the hybrid neuro-fuzzy connectionist paradigm. J. Energy Resour. Technol. 2022, 144, 063007. [Google Scholar] [CrossRef]
  30. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  31. Fukushima, K.; Miyake, S.; Ito, T. Neocognitron: A neural network model for a mechanism of visual pattern recognition. IEEE Trans. Syst. Man Cybern. 1983, SMC-13, 826–834. [Google Scholar] [CrossRef]
  32. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
  33. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 6105–6114. [Google Scholar]
  34. Van Dyk, D.A.; Meng, X.L. The art of data augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50. [Google Scholar] [CrossRef]
  35. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
  36. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar]
  37. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  38. Konar, J.; Khandelwal, P.; Tripathi, R. Comparison of various learning rate scheduling techniques on convolutional neural network. In Proceedings of the 2020 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 22–23 February 2020; IEEE: New York, NY, USA, 2020; pp. 1–5. [Google Scholar]
  39. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop, coursera: Neural networks for machine learning. Univ. Toronto Tech. Rep. 2012, 6, 307. [Google Scholar]
  40. Cherry, J.M.; Adler, C.; Ball, C.; Chervitz, S.A.; Dwight, S.S.; Hester, E.T.; Jia, Y.; Juvik, G.; Roe, T.; Schroeder, M.; et al. SGD: Saccharomyces genome database. Nucleic Acids Res. 1998, 26, 73–79. [Google Scholar] [CrossRef]
  41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  42. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  43. Hinton, G. The forward-forward algorithm: Some preliminary investigations. arXiv 2022, arXiv:2212.13345. [Google Scholar]
Figure 1. The overall structure of an automated system for examining counterfeit products.
Figure 2. Comparison of the structure of the previous model and ENN model. N: Total number of design rights; M: Number of sub-models; C: Number of design rights on each model; K: Number of additional design rights.
Figure 3. Comparison of the structure of the previous model and ENN model when additional training is needed.
Figure 4. Structure of the DNN ensemble model.
Figure 5. An example of overlay function operation.
Figure 6. A photo of a sample product being photographed on a turntable for image collection (blurred due to copyright issues).
Figure 7. Top-1 and Top-3 accuracy of ENN model with varying number of distributed backbone models and total number of classes. Yolo, not shown here, performed the worst. Accuracy of Yolo is shown in Table 5.
Figure 8. Confusion matrix of (8,84) ABCDEFGH using UP-DETR.
Table 1. Design rights used for the experiment.
Registration Number | Product Type | International Classification | Models Applied
3008346600000 | Wireless Earphones | 14-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009240880000 | Earphones | 14-01 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3011022290000 | Earphones | 14-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3010963450000 | Smartwatch | 10-02 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009682050000 | Auxiliary Battery for Charging Electronic Devices | 13-02 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009953020000 | Charger for Electronic Devices | 13-02 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009911250000 | Nail Clippers | 28-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3005785260000 | Nail Polishing File | 28-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009277950000 | Hairdressing Scissors | 28-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3008580740000 | Toner Cartridge | 14-02 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3010820300000 | Hair Styler | 28-03 | (1,11)A, (1,21)I, (1,42)M, (1,84)O
3009462960000 | Nail Cleaning Tool Case | 03-01 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3010448610000 | Skin Care Machine | 24-01 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3009901080000 | Eyeliner Container | 28-02 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3009727970000 | Hair Dryer | 28-03 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3009201910000 | Lipstick | 28-02 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3008635170000 | Hair Dryer | 28-03 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3006924410000 | Front Bumper Cover for Car | 12-16 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3005781700000 | Cartridge for Printer Developer | 14-02 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3009950260000 | Nail Clippers | 28-03 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3005904250000 | Packaging Container | 09-01 | (1,10)B, (1,21)I, (1,42)M, (1,84)O
3007711150000 | Humidifier | 23-04 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3008140280000 | Spray Container for Cosmetic Packaging | 09-01 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3005222300000 | Cosmetic Containers | 09-01 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3006924390000 | Car Radiator Grill | 12-16 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3010336170000 | Fan | 23-04 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3006037400000 | Hair Dryer | 28-03 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3009746650000 | Spray Container for Packaging | 09-01 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3010424520002 | Portable Vacuum Cleaner | 15-05 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3009508860000 | Skin Care Machine | 28-03 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3005872160000 | Nail Clippers | 28-03 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3010277880000 | Portable Air Purifier | 23-04 | (1,11)C, (1,21)J, (1,42)M, (1,84)O
3006394680000 | Front Fog Lamp for Car | 26-06 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3010353420000 | Stylus Pen | 14-99 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3008337320000 | Car Head Lamp | 26-06 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3008337300000 | Automotive Rear Combination Lamp | 26-06 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3008486220000 | Front Bumper Cover for Car | 12-16 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3008486270000 | Car Radiator Grill | 12-16 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3008433850000 | Car Wheel | 12-16 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3009369070000 | Cell Phone Protection Case | 03-01 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3009505900000 | Infant Head Protector | 02-99 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3006471740000 | Heat Therapy Device | 24-01 | (1,10)D, (1,21)J, (1,42)M, (1,84)O
3020200055040 | Wireless Earphones | 14-03 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3008488090000 | Infant Head Protector | 02-99 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3007512050000 | Animal Toys | 21-01 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3007827830000 | Vacuum Cleaner | 15-05 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3010328940000 | Hairdressing Scissors | 28-03 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3006880340000 | Car Head Lamp | 26-06 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3006314510000 | Developer for Printer | 14-02 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3005792510000 | Hair Dryer | 28-03 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
Table 2. Design rights used for the experiment (continued from Table 1).
Registration Number | Product Type | International Classification | Models Applied
3009137110000 | Robotic Vacuum | 15-05 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3005633730000 | Nail Clippers | 28-03 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3006880350000 | Automotive Rear Combination Lamp | 26-06 | (1,11)E, (1,21)K, (1,42)N, (1,84)O
3007892610000 | Hair Dryer | 28-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3004925580000 | Hair Dryer | 28-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3009277940000 | Hairdressing Scissors | 28-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3009664240000 | Infant Head Protector | 02-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3010776320000 | Cheering Equipment | 21-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3007488730000 | Nail Clippers | 28-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3006812870000 | Doll | 21-01 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3005777720000 | Electric Hair Straightener | 28-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3008380770000 | General Beauty Scissors | 08-03 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3006813180000 | Hair Brush | 04-02 | (1,10)F, (1,21)K, (1,42)N, (1,84)O
3007298000000 | Electric Hair Straightener | 28-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3009442540000 | Nail Clippers with Magnifying Glass Attached | 28-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3010468310000 | Head Guard | 02-99 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3007845090000 | Stationery Scissors | 08-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3006955750000 | Doll | 21-01 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3008976800000 | Cheering Tool | 21-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3009317560000 | Doll | 21-01 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3011212930000 | Cheering Tool | 21-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3008380780000 | Beauty Thinning Scissors | 08-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3009052330000 | Hair Dryer | 28-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3011182010000 | Infant Head Protection | 02-03 | (1,11)G, (1,21)L, (1,42)N, (1,84)O
3005633760000 | Nail Clippers | 28-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3010696720000 | Cheering Equipment | 21-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3007449670000 | Hair Brush | 04-02 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3010123750000 | Nail Clippers | 28-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3011236760000 | Cheering Light Stick | 21-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3009505920000 | Infant Head Protector | 02-99 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3005480740000 | Hand Puppet | 21-01 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3011211790000 | Cheering Tool | 21-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3008039980000 | Hair Styler | 28-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
3007797260000 | Cheering Glow Stick | 21-03 | (1,10)H, (1,21)L, (1,42)N, (1,84)O
Table 3. Average UP-DETR training time per epoch with varying number of design rights.
Number of Design Rights | 10 | 11 | 14 | 21 | 42 | 84
Average Train Time per Epoch (min) | 3.0 | 3.5 | 4.5 | 6.75 | 15.25 | 29.5
Table 4. Inference time measurement.
Inference Stages | Backbone | Ensemble
Average time (ms) | 350 | 0.021
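For context on how per-stage latencies like those in Table 4 can be measured, here is a minimal sketch; the `backbone` and `ensemble` callables and the warm-up count are hypothetical, and this is not the authors' measurement code.

```python
import time

def average_latency_ms(stage_fn, inputs, warmup: int = 5) -> float:
    """Mean wall-clock latency of stage_fn over inputs, in milliseconds."""
    for x in inputs[:warmup]:   # discard warm-up runs (cache/JIT effects)
        stage_fn(x)
    start = time.perf_counter()
    for x in inputs:
        stage_fn(x)
    return (time.perf_counter() - start) / len(inputs) * 1000.0

# Hypothetical usage, assuming separately callable stages:
# print(average_latency_ms(backbone, test_images))
# print(average_latency_ms(ensemble, backbone_outputs))
```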
Table 5. Average Top-1 and Top-3 accuracy (%) of individual backbone models. Each model is identified by a tuple followed by a letter ID: the first and second elements of the tuple give the number of backbone models and the number of design rights learned, respectively, and the letter ID indicates a group of design rights.
Model | UP-DETR Top-1/Top-3 | Yolo Top-1/Top-3 | EfficientNet Top-1/Top-3 | ResNet Top-1/Top-3 | WideResNet Top-1/Top-3
(1,11) A | 99.868/100.000 | 96.443/97.958 | 99.868/100.000 | 99.671/100.000 | 98.090/99.868
(1,10) B | 100.000/100.000 | 95.072/97.681 | 99.710/100.000 | 99.783/100.000 | 99.348/100.000
(1,11) C | 99.934/100.000 | 93.478/97.826 | 99.802/100.000 | 99.407/100.000 | 99.275/100.000
(1,10) D | 100.000/100.000 | 90.290/96.232 | 99.783/100.000 | 98.333/99.855 | 99.565/99.855
(1,11) E | 100.000/100.000 | 94.137/97.167 | 99.473/100.000 | 99.605/100.000 | 98.353/99.934
(1,10) F | 99.928/100.000 | 95.000/98.333 | 99.855/100.000 | 99.203/100.000 | 97.754/100.000
(1,11) G | 99.868/100.000 | 96.509/99.012 | 100.000/100.000 | 99.868/100.000 | 99.539/100.000
(1,10) H | 100.000/100.000 | 96.957/98.043 | 99.783/100.000 | 98.551/100.000 | 97.826/100.000
(1,21) I | 99.896/100.000 | 96.653/97.964 | 99.862/100.000 | 99.517/99.965 | 99.517/99.931
(1,21) J | 100.000/100.000 | 97.861/98.896 | 99.896/100.000 | 99.551/100.000 | 99.655/99.965
(1,21) K | 99.965/100.000 | 96.308/98.689 | 99.931/100.000 | 99.068/100.000 | 99.482/99.965
(1,21) L | 100.000/100.000 | 98.344/99.413 | 99.655/100.000 | 99.310/100.000 | 98.965/100.000
(1,42) M | 99.879/100.000 | 97.981/98.689 | 99.948/100.000 | 99.586/100.000 | 99.620/100.000
(1,42) N | 99.845/100.000 | 96.411/99.051 | 99.931/100.000 | 99.396/99.983 | 99.396/99.983
(1,84) O (Single) | 99.784/99.957 | 90.709/96.299 | 99.905/100.000 | 99.569/99.983 | 99.681/99.983
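As a reference for how the Top-1 and Top-3 figures in Tables 5–11 are typically computed, here is a minimal sketch; the per-class score matrix and label array are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def top_n_accuracy(scores: np.ndarray, labels: np.ndarray, n: int) -> float:
    """Fraction of samples whose true class index appears among the
    n highest-scoring classes; scores has shape (samples, classes)."""
    top_n = np.argsort(scores, axis=1)[:, -n:]      # indices of the n best scores
    hits = (top_n == labels[:, None]).any(axis=1)   # true label among them?
    return float(hits.mean())

# Illustrative check with 3 samples over 4 classes:
scores = np.array([[0.10, 0.70, 0.15, 0.05],
                   [0.30, 0.20, 0.40, 0.10],
                   [0.25, 0.20, 0.30, 0.25]])
labels = np.array([1, 0, 3])
print(top_n_accuracy(scores, labels, 1))  # Top-1 -> 0.333...
print(top_n_accuracy(scores, labels, 3))  # Top-3 -> 1.0
```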
Table 6. Top-1 and Top-3 accuracy (%) of the ENN model with varying layer size. The letter IDs identify the group of design rights. For example, (2,21)AB means that 42 design rights are divided into groups ‘A’ and ‘B’.
Model | 2048 Top-1/Top-3 | 1024 Top-1/Top-3 | 512 Top-1/Top-3 | 256 Top-1/Top-3
(2,21) AB | 98.861/99.827 | 98.930/99.966 | 98.689/99.827 | 98.930/99.827
(3,32) ABC | 98.483/99.502 | 98.256/99.592 | 98.120/99.457 | 98.211/99.389
(4,42) ABCD | 98.585/99.500 | 98.344/99.362 | 98.568/99.431 | 98.413/99.465
(5,53) ABCDE | 97.963/99.330 | 98.154/99.289 | 98.031/99.180 | 97.744/99.180
(6,63) ABCDEF | 98.068/99.160 | 98.321/99.149 | 98.033/99.103 | 98.137/99.275
(7,74) ABCDEFG | 98.051/99.158 | 98.296/99.315 | 98.267/99.119 | 98.228/99.128
(8,84) ABCDEFGH | 98.361/99.163 | 98.318/99.198 | 98.137/99.129 | 98.102/99.180
Average | 98.339/99.377 | 98.374/99.410 | 98.264/99.321 | 98.252/99.349
Table 7. Accuracy (%) of the ENN model according to the dropout rate.
Model | 0.1 Top-1/Top-3 | 0.2 Top-1/Top-3 | 0.3 Top-1/Top-3 | 0.4 Top-1/Top-3 | 0.5 Top-1/Top-3
(2,21) AB | 98.965/99.896 | 98.930/99.966 | 98.723/99.827 | 98.999/99.896 | 98.965/99.793
(3,32) ABC | 98.188/99.547 | 98.256/99.592 | 98.279/99.547 | 98.392/99.660 | 98.256/99.479
(4,42) ABCD | 98.551/99.517 | 98.344/99.362 | 98.620/99.517 | 98.447/99.500 | 98.326/99.413
(5,53) ABCDE | 97.935/99.221 | 98.154/99.289 | 97.935/99.262 | 98.113/99.439 | 98.195/99.330
(6,63) ABCDEF | 98.068/99.034 | 98.321/99.149 | 97.930/99.045 | 98.263/99.218 | 98.091/99.114
(7,74) ABCDEFG | 98.237/99.285 | 98.296/99.315 | 98.208/99.138 | 98.374/99.256 | 98.159/99.138
(8,84) ABCDEFGH | 98.206/99.111 | 98.318/99.198 | 98.456/99.172 | 98.275/99.249 | 98.068/99.189
Average | 98.307/99.373 | 98.374/99.410 | 98.307/99.358 | 98.409/99.460 | 98.294/99.351
Table 8. Accuracy (%) of the ENN model according to the learning rate.
Model | 0.05 Top-1/Top-3 | 0.01 Top-1/Top-3 | 0.005 Top-1/Top-3 | 0.001 Top-1/Top-3
(2,21) AB | 98.689/99.827 | 98.689/99.793 | 98.999/99.896 | 98.758/99.827
(3,32) ABC | 97.917/99.774 | 97.962/99.592 | 98.392/99.660 | 98.053/99.457
(4,42) ABCD | 97.912/99.465 | 98.378/99.569 | 98.447/99.500 | 98.223/99.362
(5,53) ABCDE | 96.874/98.742 | 97.635/99.166 | 98.113/99.439 | 97.949/99.262
(6,63) ABCDEF | 97.239/99.275 | 98.022/99.137 | 98.263/99.218 | 98.079/99.160
(7,74) ABCDEFG | 97.131/99.158 | 98.149/99.266 | 98.374/99.256 | 98.306/99.275
(8,84) ABCDEFGH | 97.041/99.120 | 98.223/99.224 | 98.275/99.249 | 98.352/99.180
Average | 97.543/99.337 | 98.151/99.392 | 98.409/99.460 | 98.246/99.360
Table 9. Accuracy (%) of the ENN model according to the optimizer.
Model | AdamW Top-1/Top-3 | Adam Top-1/Top-3 | SGD Top-1/Top-3 | RMSprop Top-1/Top-3
(2,21) AB | 98.999/99.896 | 98.792/99.896 | 98.413/99.758 | 95.169/99.965
(3,32) ABC | 98.392/99.660 | 98.324/99.615 | 97.962/99.706 | 92.278/99.841
(4,42) ABCD | 98.447/99.500 | 98.671/99.465 | 98.447/99.655 | 85.059/99.638
(5,53) ABCDE | 98.113/99.439 | 98.141/99.330 | 97.676/99.398 | 80.476/99.316
(6,63) ABCDEF | 98.263/99.218 | 98.091/99.137 | 97.930/99.275 | 76.398/98.884
(7,74) ABCDEFG | 98.374/99.256 | 98.365/99.138 | 98.188/99.226 | 72.111/98.570
(8,84) ABCDEFGH | 98.275/99.249 | 98.378/99.146 | 98.240/99.180 | 69.117/97.559
Average | 98.409/99.460 | 98.395/99.390 | 98.122/99.457 | 81.515/99.111
Table 10. Accuracy (%) of the ENN model according to the depth of the FC residual block.
Model | 4 Top-1/Top-3 | 5 Top-1/Top-3 | 6 Top-1/Top-3 | 7 Top-1/Top-3 | 8 Top-1/Top-3
(2,21) AB | 98.930/99.896 | 99.034/99.862 | 98.999/99.896 | 98.896/99.655 | 98.896/99.827
(3,32) ABC | 98.370/99.502 | 98.370/99.592 | 98.392/99.660 | 98.211/99.298 | 98.211/99.434
(4,42) ABCD | 98.447/99.551 | 98.568/99.500 | 98.447/99.500 | 98.464/99.603 | 98.413/99.482
(5,53) ABCDE | 98.059/99.385 | 98.045/99.371 | 98.113/99.439 | 97.799/99.330 | 98.195/99.412
(6,63) ABCDEF | 98.447/99.195 | 98.114/99.126 | 98.263/99.218 | 98.022/99.195 | 98.298/99.264
(7,74) ABCDEFG | 97.993/99.128 | 98.159/99.266 | 98.374/99.256 | 98.208/99.187 | 98.296/99.226
(8,84) ABCDEFGH | 98.240/99.258 | 98.223/99.198 | 98.275/99.249 | 98.275/99.224 | 98.413/99.224
Average | 98.355/99.417 | 98.359/99.416 | 98.409/99.460 | 98.268/99.356 | 98.389/99.410
Table 11. Comparison of accuracy (%) according to the number of model separations.
Model | UP-DETR Top-1/Top-3 | Yolo Top-1/Top-3 | EfficientNet Top-1/Top-3 | ResNet Top-1/Top-3 | WideResNet Top-1/Top-3
(2,21) AB | 98.930/99.896 | 90.683/96.756 | 96.687/99.965 | 96.515/99.965 | 94.237/99.482
(3,32) ABC | 98.370/99.502 | 87.228/95.267 | 96.445/99.728 | 95.245/99.343 | 93.524/98.777
(4,42) ABCD | 98.447/99.551 | 84.972/93.099 | 96.377/99.551 | 94.910/98.982 | 93.841/98.602
(5,53) ABCDE | 98.059/99.385 | 83.443/92.508 | 94.750/99.152 | 94.217/98.728 | 92.931/98.496
(6,63) ABCDEF | 98.447/99.195 | 83.506/91.534 | 95.480/98.930 | 94.134/98.401 | 93.214/98.217
(7,74) ABCDEFG | 97.993/99.128 | 83.500/91.324 | 95.554/98.796 | 94.036/98.110 | 93.400/97.983
(8,84) ABCDEFGH | 98.275/99.249 | 83.791/91.442 | 95.980/98.896 | 94.229/98.137 | 93.599/97.800
(4,84) IJKL | 98.878/99.672 | 88.708/96.368 | 95.385/99.603 | 95.057/99.094 | 93.703/98.913
(2,84) MN | 98.404/99.189 | 93.444/98.490 | 96.946/99.965 | 96.912/99.784 | 96.006/99.862
(1,84) O | 99.784/99.957 | 90.709/96.299 | 99.905/100.000 | 99.569/99.983 | 99.681/99.983
Table 12. Precision, recall, and F1 score of one of the largest models, (8,84) ABCDEFGH.
Model | Precision | Recall | F1 Score
UP-DETR | 98.309 | 98.275 | 98.271
Yolo | 84.267 | 83.791 | 83.718
EfficientNet | 96.019 | 95.980 | 95.961
ResNet | 94.284 | 94.229 | 94.212
WideResNet | 93.690 | 93.599 | 93.593
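The values in Table 12 are consistent with class-wise (macro) averaging over a confusion matrix such as the one in Figure 8. A minimal sketch of that computation follows; the macro-averaging scheme is our reading of the table, not a detail confirmed by the authors' code.

```python
import numpy as np

def macro_prf1(cm: np.ndarray) -> tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 from a square confusion
    matrix cm, where cm[i, j] counts samples of true class i predicted
    as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums = true counts
    f1 = np.where(precision + recall > 0,
                  2 * precision * recall / (precision + recall), 0.0)
    return precision.mean(), recall.mean(), f1.mean()

# Illustrative 2-class example:
cm = np.array([[50, 2],
               [3, 45]])
p, r, f1 = macro_prf1(cm)
print(f"precision={p:.3%} recall={r:.3%} f1={f1:.3%}")
```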
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
