Electronics · Article · Open Access · 9 October 2022

JN-Logo: A Logo Database for Aesthetic Visual Analysis

1 School of Design, Jiangnan University, Wuxi 214122, China
2 School of Software Engineering, Shandong University, Jinan 250101, China
* Author to whom correspondence should be addressed.
Current address: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.

Abstract

Data are an important part of machine learning. In recent years, it has become increasingly common for researchers to study artificial intelligence-aided design, and rich design materials are needed to provide data support for related work. Existing aesthetic visual analysis databases contain mainly photographs and works of art; there is no true logo database, and there are few public, high-quality databases of design materials. Facing these challenges, this paper introduces a larger-scale logo database named JN-Logo. JN-Logo provides 14,917 logo images collected from three well-known websites around the world and annotated through the votes of 150 graduate students. JN-Logo provides three types of annotation: aesthetic, style and semantic. Its scoring system includes 6 aesthetic score levels, 6 style labels and 11 semantic descriptions. Aesthetic annotations range from 0 to 5 points to evaluate the visual aesthetics of a logo image: the worst is 0 points; the best is 5 points. We demonstrate five advantages of the JN-Logo database: logo images as data objects, rich human annotations, quality scores for image aesthetics, style attribute labels and semantic descriptions of style. We establish a baseline for JN-Logo to measure how effectively algorithmic models predict people's choices of logo images. We compare existing traditional handcrafted and deep-learned features on both the aesthetic scoring task and the style-labeling task, showing the advantages of deep-learned features. In the logo attribute classification task, the EfficientNet_B1 model achieved the best result, reaching an accuracy of 0.524. Finally, we describe two applications of JN-Logo: generating logo design styles and similarity retrieval of logo content. The database of this article will eventually be made public.

1. Introduction

In recent years, intelligent design has become a new dimension of design practice and academic research. Intelligent design systems that assist design, such as intelligent poster design, photo classification and intelligent logo design, have gradually become popular, bringing more possibilities to the design industry. However, there are few public databases that include design materials such as posters, web pages and logos, so related work relies on photo databases and personal databases, which is inconvenient for researchers. In 2020, Wu et al. [1] in China explored next-generation artificial intelligence technology, including deep learning, and proposed a next-generation artificial intelligence plan. Their work exemplifies the need of deep learning for data and accurate expert annotation, confirming that a database with high-quality annotation knowledge has high research value. On 8 July 2021, at the World Artificial Intelligence Conference (WAIC 2021), the digital creative intelligent design engine proposed by the Alibaba–Zhejiang University Frontier Technology Joint Research Center summarized the model construction of a media carrier that organically integrates design theory and computing. The research content involves interdisciplinary topics of artificial intelligence and design, such as design semantic annotation, image generation, graphic style learning and aesthetic computing. It can be seen that computational methods for intelligent design urgently need a large-scale public database of design materials.
As researchers committed to solving this problem, we introduce a new visual analysis database of logo image aesthetics named JN-Logo that combines and improves aesthetic analysis.
The data and labels for the aesthetic score are at this link: https://drive.google.com/file/d/13EXuptt6TpKqHWk6tDr2AWFLVSW-dHHC/view?usp=sharing (accessed on 24 September 2022).
The data and labels for style attribute classification are at this link: https://drive.google.com/file/d/1NQc-uCV42j71sumwh0jWeFpZyvMdHSW1/view?usp=sharing (accessed on 24 September 2022).
The following are the contributions and innovations of our work:
  • We introduce a larger-scale database of logos as design materials. We evaluate the aesthetics of the data by manual annotation. The evaluation system includes six aesthetic score levels, six style attribute labels and semantic descriptions. We also show five advantages of JN-Logo.
  • We set up two tasks for JN-Logo: a style attribute classification task and an aesthetic quality scoring task. JN-Logo is tested using methods based on traditional features as well as deep features. Finally, the best performance is selected as the baseline.
  • We demonstrate JN-Logo for logo retrieval and specifying color transfer. It is proven that high-quality data with aesthetic quality evaluation is more beneficial to intelligent system design.
The rest of the paper is organized as follows. In Section 2, we present the aesthetic quality evaluation, aesthetic visual analysis databases and logo databases related to JN-Logo. In Section 3, we describe the aesthetic evaluation of JN-Logo and compare the pros and cons of related datasets. In Section 4, we present aesthetic visual analysis tasks and experiments. In Section 5, we describe two applications of JN-Logo: logo style generation and logo content similarity retrieval. In Section 6, we discuss how JN-Logo will be expanded for more research in the future.

4. Creation of Baseline

This section creates a baseline for JN-Logo as a performance criterion. We conduct experiments on JN-Logo with methods based on both traditional handcrafted features and deep features, analyze the metrics of each model, verify the superior performance of the model with higher accuracy, and use it as the baseline. The style classification task and the image quality scoring task are set up and trained simultaneously.
In the style classification task, the purpose is to train an attribute classifier that can separate logo images of different attributes. For an image $I_i$, the attribute label can be obtained through the style attribute classifier:
$\mathrm{attr}_i = \mathrm{classifier}_1(I_i).$
In the image aesthetic scoring task, it is also necessary to train a scoring classifier to distinguish different scoring levels. For an image $I_i$, the score of the image is obtained through the scoring classifier:
$\mathrm{Score}_i = \mathrm{classifier}_2(I_i).$

4.1. Method

The handcrafted and deep-feature methods used for the two tasks are described below.
(1) Methods based on handcrafted features:
The method based on handcrafted features is divided into two steps. First, the features of the image need to be extracted, and then, the features are input into the attribute classifier or the scoring classifier.
We used the following methods. The style attribute classifier corresponds to the style attribute classification task, and the scoring classifier corresponds to the quality scoring task. In the first step, the Histogram of Oriented Gradients (HOG) [34] is used to extract image features. In the second step, the features are fed into the style attribute classifier and the scoring classifier using any of the following approaches: (1) Support Vector Machine (SVM), (2) XGBoost [35] and (3) Random Forest [36].
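The following is a minimal sketch of this two-step pipeline in Python, assuming the logo images and their labels are available as `train_set`/`test_set` lists of (image, label) pairs; the feature dimensions and classifier hyperparameters below are illustrative, not the ones used in the paper:
```python
# Minimal handcrafted-feature pipeline: HOG features + a classical classifier.
# `train_set` / `test_set` are assumed to be lists of (image, label) pairs.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def extract_hog(image, size=(224, 224)):
    """Step 1: extract a HOG descriptor from a resized grayscale logo image."""
    gray = rgb2gray(image) if image.ndim == 3 else image
    gray = resize(gray, size, anti_aliasing=True)
    return hog(gray, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

# Step 2: feed the descriptors into a classifier (SVM shown; Random Forest
# or XGBoost can be substituted with the same fit/predict interface).
X_train = np.stack([extract_hog(img) for img, _ in train_set])
y_train = np.array([label for _, label in train_set])
clf = SVC(kernel="rbf").fit(X_train, y_train)

X_test = np.stack([extract_hog(img) for img, _ in test_set])
predictions = clf.predict(X_test)
```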
Although the above traditional handcrafted features have the advantages of high efficiency and good interpretability, they cannot extract the implicit high-level semantic information of the picture, so they cannot achieve high accuracy. Therefore, we also consider using deep feature-based methods to accomplish these two tasks.
(2) Methods based on deep features: Deep feature-based methods are trained end to end, with feature extraction and the downstream task performed by a single model; an image can be directly input into the deep model to obtain its attribute label or score label.
The following models are used: Convolutional Neural Network (CNN) [37,38,39], Residual Network [40], EfficientNet [41,42], Visual Transformer Model [42] and MLP-Mixer [43].
Convolutional Neural Networks (CNNs) are a class of feedforward neural networks that contain convolutional computations and have deep structures. Their artificial neurons respond to a part of the surrounding units within their receptive field, and they have the characteristics of local perception, weight sharing and downsampling, which reduce the number of parameters and expand the network's receptive field. As a result, the network does not need complex preprocessing of images and the original image can be directly input, so this method has been widely used in various tasks in the field of computer vision.
Residual Network (ResNet) [40] is mainly composed of residual blocks. By adding residual connections to the residual blocks, the problem of network degradation is solved, and the problem of gradient disappearance caused by increasing depth is alleviated so that the network can be deeper.
EfficientNet [41,42] is a model obtained by compound model scaling combined with neural architecture search. It automatically scales the model in the three dimensions of depth, width and input resolution.
Visual Transformer Model [42] is a deep learning model based on the self-attention mechanism. Visual Transformers are a class of models that split images into patches and feed them into a transformer, abandoning the traditional CNN architecture and having less inductive bias.
MLP-Mixer [43] (Mixer) is based entirely on Multilayer Perceptrons (MLPs), without using convolutions or self-attention; MLPs are repeatedly applied either across spatial locations or across feature channels. Mixer relies only on basic matrix multiplication routines, data layout transformations and scalar nonlinearities.
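To make the alternating "spatial locations or feature channels" mixing concrete, the following is a compact sketch of one Mixer layer in PyTorch; the hidden dimensions are illustrative and are not taken from the paper or from [43]:
```python
# Sketch of a single MLP-Mixer layer: token mixing followed by channel mixing.
import torch.nn as nn

class MixerLayer(nn.Module):
    def __init__(self, num_tokens, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(      # mixes information across spatial locations
            nn.Linear(num_tokens, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(    # mixes information across feature channels
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                    # x: (batch, num_tokens, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x
```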
The method in this section inputs the training data, i.e., the large-scale set of logo images I, into the deep model DM to obtain the deep feature F, where F is a one-dimensional vector used as the semantic representation of a logo image. After obtaining the image features, we complete the attribute classification and aesthetic scoring tasks through the style attribute classification layer or the scoring classification layer, respectively. The formula is as follows:
$F = \mathrm{DM}(I_i), \quad I_i \in I.$
Specifically, the feature vector is mapped to the category labels through a fully connected layer (FC), and the class probability values (logits) are obtained through the softmax function, which is expressed as follows:
$\mathrm{logits} = \mathrm{SoftMax}(\mathrm{FC}(F)),$
finally, the cross-entropy loss is calculated from the output logits and minimized, and the network parameters are updated through iterative training with the back-propagation algorithm. Figure 7 shows the general framework of the deep model approach. The model completes two tasks based on different labels: style attribute classification and image quality scoring. In simple terms, we input the training image into the deep model to obtain deep features, complete the classification task through the classification layer, and iteratively update the model parameters through the back-propagation algorithm.
Figure 7. Diagram of the deep model-based approach. This model completes two tasks of style attribute classification and aesthetic scoring based on different labels. First, input the training image into the deep model to obtain deep features, complete the classification task through the classification layer, and iteratively update the model parameters through the back-propagation algorithm.
The deep-feature-based method is an end-to-end model that does not require a human to manually design algorithms to extract image features. The hidden high-level semantic features of images can be directly extracted through a large number of learnable parameters. Hence, manual intervention is reduced, and better performance is achieved.
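As an illustration of this pipeline, the sketch below builds a classifier of the kind shown in Figure 7, assuming a torchvision ResNet-50 backbone; the authors' exact backbone, head sizes and training details are not specified in this section and may differ:
```python
# Deep-feature pipeline sketch: backbone -> deep feature F -> FC head -> logits,
# trained with cross-entropy and back-propagation (Figure 7).
import torch
import torch.nn as nn
from torchvision import models

NUM_STYLES = 6   # six style-attribute classes (use 6 score levels for the scoring task)

class LogoClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.resnet50(weights=None)   # ImageNet-pretrained weights could be loaded here
        backbone.fc = nn.Identity()                # keep the 2048-d deep feature F
        self.backbone = backbone
        self.head = nn.Linear(2048, num_classes)   # FC layer mapping F to logits

    def forward(self, x):
        feat = self.backbone(x)                    # F = DM(I_i)
        return self.head(feat)                     # logits (softmax is folded into the loss)

model = LogoClassifier(NUM_STYLES)
criterion = nn.CrossEntropyLoss()                  # softmax + cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)

# One training step on a dummy batch of 224x224 logo images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_STYLES, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()                                    # back-propagation updates the parameters
optimizer.step()
```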

4.2. Results

4.2.1. Dataset

For the dataset, all 14,917 images in the JN-Logo database are used for visual analysis. The images are annotated by image quality (0–5 points), style attribute (1–6) and 10 style descriptions (excluding “other”). The image quality score ranges from 0 to 5, and the number of images per aesthetic score is shown in Table 4:
Table 4. Number of aesthetic annotations of logo images.
The six categories of style attributes are called (1) Rational and Scientific, (2) Hot and Warm, (3) Sweet and Fresh, (4) Dynamic and Vivacious, (5) Pure and Simple and (6) other styles. The number of style attributes is shown in Table 5:
Table 5. Number of style annotations of images.
The JN-Logo database is divided into a training set and a test set: 90% of the images are randomly assigned to the training set, and the remaining 10% to the test set. The style attribute classification task and the image quality scoring task are set up for simultaneous training. For the image quality scoring task, the score of an image $I_i$ is obtained through the scoring classifier (Equation (5)):
$\mathrm{Score}_i = \mathrm{classifier}_2(I_i).$
For the style attribute classification task, the purpose is to train an attribute classifier that can separate logo images of different attributes. For an image $I_i$, its attribute label can be obtained through this attribute classifier, which can be expressed as:
$\mathrm{attr}_i = \mathrm{classifier}_1(I_i).$
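A minimal sketch of the 90/10 random split described above, assuming a hypothetical helper `load_all_logo_paths()` that returns the 14,917 image paths; the authors' actual file layout and random seed are not specified:
```python
# Hypothetical 90/10 train/test split of the JN-Logo image list.
import random

random.seed(0)                           # seed chosen for reproducibility; not from the paper
image_paths = load_all_logo_paths()      # hypothetical helper returning the 14,917 paths
random.shuffle(image_paths)

split = int(0.9 * len(image_paths))
train_paths, test_paths = image_paths[:split], image_paths[split:]
```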

4.2.2. Experimental Setup

We use the JN-Logo dataset (14,917 images) for training with the following hyperparameter settings: the initial learning rate is set to $3 \times 10^{-4}$; training runs for 20 epochs in total, and at the 10th and 15th epochs the learning rate is decayed to 1/10 of its previous value; SGD is used as the optimizer; and the image resolution is set to 224 × 224.
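A hedged sketch of this training configuration in PyTorch (SGD, initial learning rate 3e-4, 20 epochs, learning rate decayed by 10x at epochs 10 and 15, 224 × 224 inputs); `model` and `train_loader` are assumed to exist, and any unstated details are illustrative:
```python
# Training-loop sketch matching the stated hyperparameters.
import torch
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)
scheduler = MultiStepLR(optimizer, milestones=[10, 15], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(20):
    for images, labels in train_loader:      # images resized to 224x224
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                          # decays the lr after epochs 10 and 15
```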

4.2.3. Evaluation Metrics

For the tasks of logo attribute classification and aesthetic scoring, accuracy is used as the evaluation index:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$
where P and N denote the positive and negative classes, $TP$ (True Positive) is the number of positive samples predicted as positive, $TN$ (True Negative) is the number of negative samples predicted as negative, $FP$ (False Positive) is the number of negative samples predicted as positive, and $FN$ (False Negative) is the number of positive samples predicted as negative.
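A small illustration of this metric, assuming `pred` and `labels` are integer class arrays; in the multi-class setting, accuracy reduces to the fraction of exact matches:
```python
# Accuracy = (TP + TN) / (TP + TN + FP + FN) = fraction of correct predictions.
import numpy as np

def accuracy(pred, labels):
    pred, labels = np.asarray(pred), np.asarray(labels)
    return (pred == labels).mean()

print(accuracy([1, 2, 0, 4], [1, 2, 3, 4]))  # 0.75
```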

4.2.4. Results and Analysis

The specific results are shown in Table 6. Accuracy 1 denotes the accuracy of the handcrafted-feature and deep-feature methods on the style attribute classification task, and Accuracy 2 denotes their accuracy on the aesthetic scoring task.
Table 6. Accuracy 1 and Accuracy 2 of the manual feature method and the deep feature method.
In the logo attribute classification task, EfficientNet_B1 [45] achieved the best result, reaching an accuracy of 0.524. In the quality scoring task, neither handcrafted-feature-based nor deep-feature-based methods achieved good results; the best performance, by SeResNet50 [40,47], reached an accuracy of 0.293. It can therefore be seen that the participants differed greatly in their scoring, and their differing judgments of logo image quality resulted in a low accuracy rate. For this situation, a more appropriate model needs to be designed around the characteristics of the dataset itself. However, manual scoring is highly subjective, and each person has different scoring standards, so achieving a high prediction accuracy remains a challenging task.
In short, because aesthetic preferences differ from person to person, the quality scoring task did not achieve good results; this is a challenge still to be solved.
We also compared the performance parameters of precision, recall and F1 measure. The performance parameters of the style attribute classification are shown in Table 7:
Table 7. Comparison of performance parameters of precision, recall and F-1 measure of style classification.
The performance parameters of the aesthetic score are shown in Table 8:
Table 8. Comparison of performance parameters of precision, recall and F-1 measure of aesthetic classification.
The first three rows are traditional handcrafted-feature methods, and the rest are deep-learning methods; the tables show that deep-feature-based methods have a clear advantage. The results of aesthetic scoring are less than ideal, indicating that aesthetic scoring remains a challenging task.
We also report the real-time inference speed in FPS (frames per second), which reflects the complexity of each model; complexity is inversely proportional to speed (Table 9).
Table 9. FPS of models.
As shown in Table 9, the FPS of models such as ViT-B can meet real-time requirements.
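As a rough sketch of how per-model FPS could be measured for Table 9; the authors' exact measurement protocol and hardware are not stated, so the batch size and iteration count below are assumptions:
```python
# Measure approximate inference throughput (images per second) of a model.
import time
import torch

@torch.no_grad()
def measure_fps(model, n_iters=100, batch=1):
    model.eval()
    x = torch.randn(batch, 3, 224, 224)      # dummy 224x224 input batch
    start = time.time()
    for _ in range(n_iters):
        model(x)
    return n_iters * batch / (time.time() - start)
```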

6. Conclusions

We created a logo aesthetic image database that combines an aesthetic analysis database (photographic aesthetic images) and a logo image retrieval database (logo retrieval images), and we also established a baseline using deep learning models. The database provides three types of annotation: aesthetic quality score, style attribute classification and semantic description. The data contain six aesthetic score levels, six style labels and semantic descriptions. We compare JN-Logo with other related databases and show the advantages of JN-Logo in five aspects. Experiments on JN-Logo using traditional handcrafted features and deep-feature methods establish a baseline for JN-Logo as a performance standard, measuring how effectively algorithmic models predict people's choices on the labeled data. We introduce JN-Logo for similarity retrieval of image content and image generation with style semantics, and demonstrate that a high-quality database with specified content is more conducive to intelligent design.
In the style attribute classification task, the EfficientNet_B1 model achieved the best results, but only reached an accuracy of 0.524. In the aesthetic scoring task in particular, the best result came from the SeResNet50 model, which achieved an accuracy of only 0.293, indicating that the task remains challenging. Because manual annotation is hard to control, we will continue to optimize our database and improve the model in the future.
For example, we will invite art experts to score the images, improving the quality and accuracy of the scoring and establishing a better standard. We will design a multi-label system and perform multiple rounds of labeling. In the classification of style attributes, not only color style but also graphic style should be classified. Regarding the evaluation method, there is currently no objective evaluation index for the image quality of design materials such as logos; the database in this paper could be used to propose a new evaluation index algorithm. In the design field, we can build a font-based logo image dataset, such as one that includes font logos by Chinese artists.

Author Contributions

All authors contributed to this work. Conceptualization, N.T.; methodology, N.T.; software, N.T. and Z.S.; validation, N.T. and Z.S.; formal analysis, N.T. and Z.S.; investigation, Y.L.; resources, Y.L.; data curation, N.T.; writing—original draft, N.T.; writing—review and editing, Z.S.; visualization, N.T.; supervision, Y.L.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author, subject to restrictions (e.g., privacy). The data are not publicly available because the research results have potential commercial value.

Acknowledgments

The authors thank the students from the School of Artificial Intelligence and School of Design at Jiangnan University for their help with scoring.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wu, F.; Lu, C.; Zhu, M.; Chen, H.; Pan, Y. Towards a new generation of artificial intelligence in China. Nat. Mach. Intell. 2020, 2, 312–316. [Google Scholar] [CrossRef]
  2. Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012. [Google Scholar]
  3. Yan, K.; Tang, X.; Feng, J. The Design of High-Level Features for Photo Quality Assessment. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 419–426. [Google Scholar]
  4. Wei, L.; Wang, X.; Tang, X. Content-Based Photo Quality Assessment. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, 6–13 November 2011. [Google Scholar]
  5. Tang, X.; Luo, W.; Wang, X. Content-Based Photo Quality Assessment. IEEE Trans. Multimed. 2013, 15, 1930–1943. [Google Scholar] [CrossRef]
  6. Datta, R.; Joshi, D.; Li, J.; Wang, J.Z. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proceedings of the Computer Vision-ECCV 2006, 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006. [Google Scholar]
  7. Joshi, D.; Datta, R.; Fedorovskaya, E.; Luong, Q.T.; Wang, J.Z.; Jia, L.; Luo, J. Aesthetics and Emotions in Images. IEEE Signal Process. Mag. 2011, 28, 94–115. [Google Scholar] [CrossRef]
  8. Ghadiyaram, D.; Pan, J.; Bovik, A.C. A Subjective and Objective Study of Stalling Events in Mobile Streaming Videos. IEEE Trans. Image Proc. 2017, 29, 183–197. [Google Scholar]
  9. Ponomarenko, N.; Carli, M.; Lukin, V.; Egiazarian, K.; Battisti, F. Metrics Performance Comparison For Color Image Database. In Proceedings of the 2009 International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA, 14–16 January 2009. [Google Scholar]
  10. Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Image database TID2013: Peculiarities, results and perspectives-ScienceDirect. Signal Process. Image Commun. 2015, 30, 57–77. [Google Scholar] [CrossRef]
  11. Zepernick, H. Wireless Imaging Quality (WIQ) Database; Blekinge Tekniska Hgskola: Karlskrona, Sweden, 2010. [Google Scholar]
  12. Yue, G.; Meng, K.; Li, H. Graph Based Visualization of Large Scale Microblog Data. In Proceedings of the Advances in Multimedia Information Processing – PCM 2015, Part II, Gwangju, Korea, 16–18 September 2015. [Google Scholar]
  13. Revaud, J.; Douze, M.; Schmid, C. Correlation-Based Burstiness for Logo Retrieval. In Proceedings of the ACM Multimedia Conference, Nara, Japan, 29 October–2 November 2012. [Google Scholar]
  14. Romberg, S.; Pueyo, L.G.; Lienhart, R.; Zwol, R.V. Scalable logo recognition in real-world images. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, Trento, Italy, 18–20 April 2011. [Google Scholar]
  15. Kalantidis, Y.; Pueyo, L.G.; Trevisiol, M.; Zwol, R.V.; Avrithis, Y. Scalable triangulation-based logo recognition. In Proceedings of the 1st International Conference on Multimedia Retrieval, ICMR 2011, Trento, Italy, 18–20 April 2011. [Google Scholar]
  16. Romberg, S.; Lienhart, R. Bundle min-hashing for logo recognition. In Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, Dallas, TX, USA, 16–20 April 2013. [Google Scholar]
  17. Yan, W.Q.; Wang, J.; Kankanhalli, M.S. Automatic video logo detection and removal. Multimed. Syst. 2005, 10, 379–391. [Google Scholar] [CrossRef]
  18. Bao, Y.; Li, H.; Fan, X.; Liu, R.; Jia, Q. Region-based CNN for Logo Detection. In Proceedings of the International Conference on Internet Multimedia Computing and Service, Xi’an, China, 19–21 August 2016. [Google Scholar]
  19. Eggert, C.; Zecha, D.; Brehm, S.; Lienhart, R. Improving Small Object Proposals for Company Logo Detection. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, 6–9 June 2017; pp. 167–174. [Google Scholar]
  20. Neumann, J.; Samet, H.; Soffer, A. Integration of local and global shape analysis for logo classification. Pattern Recognit. Lett. 2001, 23, 1449–1457. [Google Scholar] [CrossRef]
  21. Wang, J.; Min, W.; Hou, S.; Ma, S.; Zheng, Y.; Jiang, S. LogoDet-3K: A Large-Scale Image Dataset for Logo Detection. In Proceedings of the Computer Vision and Pattern Recognition, 12 August 2020; Available online: https://arxiv.org/pdf/2008.05359.pdf (accessed on 25 March 2021).
  22. Wang, J.; Min, W.; Hou, S.; Ma, S.; Jiang, S. Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6194–6201. [Google Scholar] [CrossRef]
  23. Tüzk, A.; Herrmann, C.; Manger, D.; Beyerer, J. Open Set Logo Detection and Retrieval. In Proceedings of the International Conference on Computer Vision Theory and Applications, Madrid, Portugal, 1 January 2018. [Google Scholar]
  24. Hoi, S.; Wu, X.; Liu, H.; Wu, Y.; Wang, H.; Xue, H.; Wu, Q. LOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with Deep Region-based Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 46, 2403–2412. [Google Scholar]
  25. Hang, S.; Gong, S.; Zhu, X. WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 270–279. [Google Scholar]
  26. Fehervari, I.; Appalaraju, S. Scalable Logo Recognition Using Proxies. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019. [Google Scholar]
  27. Yang, Y.; Xu, L.; Li, L.; Qie, N.; Li, Y.; Zhang, P.; Guo, Y. Personalized Image Aesthetics Assessment with Rich Attributes. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 27 September 2022; pp. 19829–19837. [Google Scholar]
  28. Ren, J.; Shen, X.; Lin, Z.; Mech, R.; Foran, D.J. Personalized Image Aesthetics. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 638–647. [Google Scholar]
  29. Shu, K.; Shen, X.; Zhe, L.; Mech, R.; Fowlkes, C. Photo Aesthetics Ranking Network with Attributes and Content Adaptation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016. [Google Scholar]
  30. Dong, X.; Zhan, X.; Wu, Y.; Wei, Y.; Kampffmeyer, M.C.; Wei, X.; Lu, M.; Wang, Y.; Liang, X. M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 27 September 2022; pp. 21220–21230. [Google Scholar]
  31. Grauman, K.; Westbury, A.; Byrne, E.; Chavis, Z.; Furnari, A.; Girdhar, R.; Hamburger, J.; Jiang, H.; Liu, M.; Liu, X.; et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 27 September 2022; pp. 18973–18990. [Google Scholar]
  32. Toker, A.; Kondmann, L.; Weber, M.; Eisenberger, M.; Camero, A.; Hu, J.; Hoderlein, A.P.; Enaras, A.; Davis, T.; Cremers, D. DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 27 September 2022; pp. 21126–21135. [Google Scholar]
  33. Xu, J.; Rao, Y.; Yu, X.; Chen, G.; Zhou, J.; Lu, J. FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 27 September 2022; pp. 2939–2948. [Google Scholar]
  34. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  35. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  36. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  37. Szegedy, C.; Wei, L.; Jia, Y.; Sermanet, P.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  38. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the Computer Science-Computer Vision and Pattern Recognition, 10 April 2015; Available online: https://arxiv.org/pdf/1409.1556.pdf (accessed on 25 March 2021).
  39. Ren, S.; He, K.; Girshick, R.; Zhang, X.; Sun, J. Object Detection Networks on Convolutional Feature Maps. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1476–1481. [Google Scholar] [CrossRef] [PubMed]
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 1 June 2016; pp. 770–778. [Google Scholar]
  41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11 October 2021; pp. 10012–10022. [Google Scholar]
  42. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the Computer Vision and Pattern Recognition, Seattle, WA, USA, 22 October 2020. [Google Scholar]
  43. Tolstikhin, I.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Keysers, D.; Uszkoreit, J.; Lucic, M.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272. [Google Scholar]
  44. Huang, G.; Liu, Z.; Laurens, V.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  45. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Int. Conf. Mach. Learn. 2019, 97, 6105–6114. [Google Scholar]
  46. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Zhang, Z.; Lin, H.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R. ResNeSt: Split-Attention Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 2736–2746. [Google Scholar]
  47. Hu, J.; Shen, L.; Sun, G.; Albanie, S. Squeeze-and-Excitation Networks. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  48. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  49. Jia, D.; Wei, D.; Socher, R.; Li, L.J.; Kai, L.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  50. Swain, M.J.; Ballard, D.H. Color indexing. Int. J. Comput. Vis. 1991, 7, 11–32. [Google Scholar] [CrossRef]
  51. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. Available online: https://arxiv.org/pdf/1405.0312.pdf (accessed on 25 March 2021).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
