# CGUN-2A: Deep Graph Convolutional Network via Contrastive Learning for Large-Scale Zero-Shot Image Classification


## Abstract


## 1. Introduction

- We propose a novel graph encoder, GUN-2A, which can perform zero-shot image classification on its own or serve as an effective graph encoder within a graph contrastive learning module.
- We introduce a Structural Symmetric Knowledge Graph for zero-shot image classification. This additional knowledge graph enhances the representational power of the nodes in the embedding space.
- We propose a graph contrastive learning framework, CGUN-2A, and evaluate it on the most challenging zero-shot image classification dataset, ImageNet-21K; the results show that our method significantly outperforms the baseline methods.

## 2. Related Works

### 2.1. ZSL in Ecological Monitoring

### 2.2. Graph Representation Learning for ZSL and Over-Smoothing Problem

### 2.3. Contrastive Learning for ZSL

## 3. Materials and Methods

### 3.1. Problem Definition

### 3.2. Preliminary Works

#### 3.2.1. Graph Convolutional Network

#### 3.2.2. Graph U-Nets

#### 3.2.3. Graph Contrastive Learning

### 3.3. GUN-2A Architecture

#### 3.3.1. Overview of GUN-2A

#### 3.3.2. Attention-Based gPool

### 3.4. CGUN-2A Architecture

#### 3.4.1. Overview of CGUN-2A

#### 3.4.2. Structural Symmetric Knowledge Graph

#### 3.4.3. CGUN-2A for ZSL

## 4. Discussion of Results

### 4.1. Experimental Settings

### 4.2. Implementation Details

### 4.3. Performance Comparison

### 4.4. Analysis of Smoothness

### 4.5. Analysis of Ablation

### 4.6. Analysis of the Number of Layers

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Salakhutdinov, R.; Torralba, A.; Tenenbaum, J. Learning to Share Visual Appearance for Multiclass Object Detection. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1481–1488.
- Wang, Y.-X.; Ramanan, D.; Hebert, M. Learning to Model the Tail. Adv. Neural Inf. Process. Syst. **2017**, 30, 7032–7042.
- Stork, L.; Weber, A.; van den Herik, J.; Plaat, A.; Verbeek, F.; Wolstencroft, K. Large-Scale Zero-Shot Learning in the Wild: Classifying Zoological Illustrations. Ecol. Inform. **2021**, 62, 101222.
- Li, Q.; Rigall, E.; Sun, X.; Lam, K.M.; Dong, J. Dual Autoencoder Based Zero Shot Learning in Special Domain. Pattern Anal. Appl. **2022**, 1–12.
- Rasheed, J. Analyzing the Effect of Filtering and Feature-Extraction Techniques in a Machine Learning Model for Identification of Infectious Disease Using Radiography Imaging. Symmetry **2022**, 14, 1398.
- Rasheed, J.; Waziry, S.; Alsubai, S.; Abu-Mahfouz, A.M. An Intelligent Gender Classification System in the Era of Pandemic Chaos with Veiled Faces. Processes **2022**, 10, 1427.
- Rasheed, J.; Shubair, R.M. Screening Lung Diseases Using Cascaded Feature Generation and Selection Strategies. Healthcare **2022**, 10, 1313.
- Li, Q.; Han, Z.; Wu, X.-M. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3438–3445.
- Zhao, L.; Akoglu, L. PairNorm: Tackling Oversmoothing in GNNs. arXiv **2019**, arXiv:1909.12223.
- Li, G.; Muller, M.; Thabet, A.; Ghanem, B. DeepGCNs: Can GCNs Go as Deep as CNNs? In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 9267–9276.
- Gao, H.; Ji, S. Graph U-Nets. In Proceedings of the International Conference on Machine Learning, PMLR, 2019; pp. 2083–2092.
- Wang, J.; Jiang, B. Zero-Shot Learning via Contrastive Learning on Dual Knowledge Graphs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 885–892.
- Villon, S.; Iovan, C.; Mangeas, M.; Vigliola, L. Confronting Deep-Learning and Biodiversity Challenges for Automatic Video-Monitoring of Marine Ecosystems. Sensors **2022**, 22, 497.
- Sun, X.; Xv, H.; Dong, J.; Zhou, H.; Chen, C.; Li, Q. Few-Shot Learning for Domain-Specific Fine-Grained Image Classification. IEEE Trans. Ind. Electron. **2020**, 68, 3588–3598.
- Pradhan, B.; Al-Najjar, H.A.H.; Sameen, M.I.; Tsang, I.; Alamri, A.M. Unseen Land Cover Classification from High-Resolution Orthophotos Using Integration of Zero-Shot Learning and Convolutional Neural Networks. Remote Sens. **2020**, 12, 1676.
- Lampert, C.H.; Nickisch, H.; Harmeling, S. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 951–958.
- Misra, I.; Gupta, A.; Hebert, M. From Red Wine to Red Tomato: Composition with Context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1792–1801.
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.; Mikolov, T. DeViSE: A Deep Visual-Semantic Embedding Model. Adv. Neural Inf. Process. Syst. **2013**, 26, 2121–2129.
- Socher, R.; Ganjoo, M.; Manning, C.D.; Ng, A. Zero-Shot Learning through Cross-Modal Transfer. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2–8 December 2012; pp. 935–943.
- Norouzi, M.; Mikolov, T.; Bengio, S.; Singer, Y.; Shlens, J.; Frome, A.; Corrado, G.S.; Dean, J. Zero-Shot Learning by Convex Combination of Semantic Embeddings. arXiv **2013**, arXiv:1312.5650.
- Elhoseiny, M.; Saleh, B.; Elgammal, A. Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2584–2591.
- Changpinyo, S.; Chao, W.-L.; Sha, F. Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3476–3485.
- Palatucci, M.; Pomerleau, D.; Hinton, G.E.; Mitchell, T.M. Zero-Shot Learning with Semantic Output Codes. In Proceedings of the 22nd International Conference on Neural Information Processing Systems, Vancouver, Canada, 8–13 December 2008; pp. 1410–1418.
- Rohrbach, M.; Stark, M.; Schiele, B. Evaluating Knowledge Transfer and Zero-Shot Learning in a Large-Scale Setting. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1641–1648.
- Deng, J.; Ding, N.; Jia, Y.; Frome, A.; Murphy, K.; Bengio, S.; Li, Y.; Neven, H.; Adam, H. Large-Scale Object Classification Using Label Relation Graphs. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 48–64.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv **2016**, arXiv:1609.02907.
- Wang, X.; Ye, Y.; Gupta, A. Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6857–6866.
- Kampffmeyer, M.; Chen, Y.; Liang, X.; Wang, H.; Zhang, Y.; Xing, E.P. Rethinking Knowledge Graph Propagation for Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11487–11496.
- Rong, Y.; Huang, W.; Xu, T.; Huang, J. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. arXiv **2019**, arXiv:1907.10903.
- Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.; Jegelka, S. Representation Learning on Graphs with Jumping Knowledge Networks. In Proceedings of the International Conference on Machine Learning, Macau, China, 26–28 February 2018; pp. 5453–5462.
- Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; ver Steeg, G.; Galstyan, A. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 21–29.
- Klicpera, J.; Weißenberger, S.; Günnemann, S. Diffusion Improves Graph Learning. arXiv **2019**, arXiv:1911.05485.
- Klicpera, J.; Bojchevski, A.; Günnemann, S. Predict Then Propagate: Graph Neural Networks Meet Personalized PageRank. arXiv **2018**, arXiv:1810.05997.
- Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; Li, Y. Simple and Deep Graph Convolutional Networks. In Proceedings of the International Conference on Machine Learning, Shenzhen, China, 13–18 July 2020; pp. 1725–1735.
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph Contrastive Learning with Adaptive Augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2069–2080.
- Velickovic, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep Graph Infomax. ICLR (Poster) **2019**, 2, 4.
- You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph Contrastive Learning with Augmentations. Adv. Neural Inf. Process. Syst. **2020**, 33, 5812–5823.
- Hassani, K.; Khasahmadi, A.H. Contrastive Multi-View Representation Learning on Graphs. In Proceedings of the International Conference on Machine Learning, Shenzhen, China, 15–17 February 2020; pp. 4116–4126.
- Zou, D.; Wei, W.; Mao, X.-L.; Wang, Z.; Qiu, M.; Zhu, F.; Cao, X. Multi-Level Cross-View Contrastive Learning for Knowledge-Aware Recommender System. arXiv **2022**, arXiv:2204.08807.
- Jiang, H.; Wang, R.; Shan, S.; Chen, X. Transferable Contrastive Network for Generalized Zero-Shot Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9765–9774.
- Anwaar, M.U.; Khan, R.A.; Pan, Z.; Kleinsteuber, M. A Contrastive Learning Approach for Compositional Zero-Shot Learning. In Proceedings of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, 18–22 October 2021; pp. 34–42.
- Li, X.; Yang, X.; Wei, K.; Deng, C.; Yang, M. Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 9326–9335.
- Guan, J.; Meng, M.; Liang, T.; Liu, J.; Wu, J. Dual-Level Contrastive Learning Network for Generalized Zero-Shot Learning. Vis. Comput. **2022**, 38, 3087–3095.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv **2015**, arXiv:1511.07122.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph Attention Networks. arXiv **2017**, arXiv:1710.10903.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the International Conference on Machine Learning, online, 18–24 July 2021; pp. 8748–8763.
- Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM **1995**, 38, 39–41.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the International Conference on Machine Learning, online, 13–18 July 2020; pp. 1597–1607.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738.
- Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised Feature Learning via Non-Parametric Instance Discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3733–3742.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R. ResNeSt: Split-Attention Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 2736–2746.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 26–28 October 2014; pp. 1532–1543.
- Nayak, N.V.; Bach, S.H. Zero-Shot Learning with Common Sense Knowledge Graphs. arXiv **2020**, arXiv:2006.10713.

**Figure 1.** Overview of Graph U-Net with two attention-based graph pooling layers. In this example, the GCN layers aggregate adjacent node features and convert them into a high-dimensional representation. The Att-gPools choose nodes with high attention scores through top-k selection and send them to the next GCN for further aggregation. The gUnpools reconstruct the original graph structure by using position information and empty feature vectors of unselected nodes.
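The gUnpool step described above amounts to scattering the pooled node features back to their recorded positions and filling the dropped nodes with empty (zero) vectors. A minimal NumPy sketch; the function name and signature are illustrative, not the authors' implementation:

```python
import numpy as np

def g_unpool(h_small, idx, num_nodes):
    """Distribute pooled node features back to their original positions.

    h_small:   (k, d) features of the k nodes kept by the pooling layer
    idx:       (k,)   original indices of those nodes (the position information)
    num_nodes: total node count of the pre-pooling graph

    Unselected nodes receive empty (zero) feature vectors, as in Figure 1.
    """
    h_full = np.zeros((num_nodes, h_small.shape[1]))
    h_full[idx] = h_small
    return h_full
```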

**Figure 2.** Overview of the proposed attention-based graph pooling layer with k = 2. We take a graph $G({A}^{l},{H}^{l})$ with adjacency matrix ${A}^{l}\in {\mathbb{R}}^{4\times 4}$ and feature matrix ${H}^{l}\in {\mathbb{R}}^{4\times 3}$ as input and form a smaller subgraph $G({A}^{l+1},{H}^{l+1})$ with ${A}^{l+1}\in {\mathbb{R}}^{2\times 2}$ and ${H}^{l+1}\in {\mathbb{R}}^{2\times 3}$. In the linear transformation stage, we use a learnable weight matrix ${W}^{l}$ to increase the feature dimension and obtain hidden features ${H}^{l}{}^{\prime}\in {\mathbb{R}}^{4\times 5}$. A shared attention mechanism is then applied to compute attention coefficients; note that single-head attention is used here. $\Lambda \in {\mathbb{R}}^{4\times 4\times 1}$ denotes the column blocks of the coefficient matrix. The scalar ${\alpha}_{ij}$ is the projection of the vector ${\overrightarrow{{h}^{\prime}}}_{i}\parallel {\overrightarrow{{h}^{\prime}}}_{j}$, concatenated from nodes $i,j$, onto the trainable weight vector $\overrightarrow{a}$. A $1\times 1$ convolution fuses the attention coefficients at the node level to obtain the attention score $y$. The two nodes of ${A}^{l}$ and ${H}^{l}$ with the highest scores are kept in the top-k node selection stage. At the gate stage, we perform element-wise multiplication between ${\tilde{H}}^{l}$ and the selected nodes' score vector $\tilde{y}$, resulting in ${H}^{l+1}$.
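The pipeline in Figure 2 (linear transformation, shared attention, node-level fusion, top-k selection, gating) can be sketched as follows. All names are illustrative; the 1×1-conv fusion is approximated here by a mean over each node's coefficients, and a sigmoid gate is assumed, so this is a sketch of the mechanism rather than the authors' exact layer:

```python
import numpy as np

def att_gpool(A, H, W, a, k):
    """Sketch of the attention-based gPool layer from Figure 2.

    A: (n, n) adjacency; H: (n, d) features; W: (d, d') learnable weights;
    a: (2*d',) shared attention vector; k: number of nodes to keep.
    """
    n = A.shape[0]
    Hp = H @ W                                    # linear transformation: H' = H W
    # attention coefficient: projection of the concatenated pair features onto a
    coef = np.array([[a @ np.concatenate([Hp[i], Hp[j]]) for j in range(n)]
                     for i in range(n)])
    y = coef.mean(axis=1)                         # node-level fusion of coefficients
    idx = np.argsort(y)[-k:][::-1]                # top-k node selection
    gate = 1.0 / (1.0 + np.exp(-y[idx]))          # squash scores into (0, 1)
    H_next = H[idx] * gate[:, None]               # gate: element-wise multiplication
    A_next = A[np.ix_(idx, idx)]                  # induced subgraph adjacency
    return A_next, H_next, idx
```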

**Figure 3.** The overall framework of the proposed zero-shot classification model CGUN-2A. The model consists of (**a**) the graph encoder, (**b**) the contrastive learning module and (**c**) the classifier learning module.

**Figure 4.** An illustration of generating the representation of class ‘n00015338’ in ImageNet-21K, a WordNet ID (wnid) concept consisting of six synonyms.
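Figure 4 suggests that a class node's feature is built from the word vectors of its wnid's synonyms. A common construction, assumed here (the paper's exact aggregation may differ), is to average the per-synonym embeddings, averaging multi-word synonyms over their tokens first:

```python
import numpy as np

def class_representation(synonyms, embed):
    """Average per-synonym word vectors into one class node feature.

    synonyms: list of synonym strings for a wnid (e.g. 'n00015338')
    embed:    callable mapping a word to its embedding vector (e.g. GloVe)

    Averaging is an assumed aggregation, not necessarily the paper's recipe.
    """
    vecs = []
    for s in synonyms:
        tokens = s.replace("_", " ").split()
        vecs.append(np.mean([embed(t) for t in tokens], axis=0))
    return np.mean(vecs, axis=0)
```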

**Figure 5.** A paradigm comparison of the contrastive learning modules in (**a**) DKG and (**b**) CGUN-2A, which differ in inputs, graph encoders and loss functions.

**Table 1.** Hit@k performance for different methods on three datasets. Only testing on the unseen classes.

| Test Set | Num of Classes | Model | GCN Layers | Hit@1 (%) | Hit@2 (%) | Hit@5 (%) | Hit@10 (%) | Hit@20 (%) |
|---|---|---|---|---|---|---|---|---|
| 2-hops | 1549 | EXEM | - | 12.5 | 19.5 | 32.3 | 43.7 | 55.2 |
| | | GCNZ | 4 | 19.8 | 33.3 | 53.2 | 65.4 | 74.6 |
| | | SGCN | 1 | 26.2 | 40.4 | 60.2 | 71.9 | 81.0 |
| | | DGP | 1 | 26.6 | 40.7 | 60.3 | 72.3 | 81.3 |
| | | SGCN (Tr) ^{1} | 1 | 28.2 | 43.3 | 62.7 | 74.1 | 82.3 |
| | | DGP (Tr) ^{1} | 1 | 27.0 | 41.4 | 61.3 | 73.0 | 81.5 |
| | | DKG | 1 | 28.4 | 43.0 | 62.6 | 74.5 | 82.9 |
| | | ZSL-KG (Tr) ^{1} | 1 | 26.6 | 40.7 | 60.3 | 72.3 | 81.3 |
| | | GUN-2A (ours) | 5 | 26.4 | 40.5 | 60.3 | 71.8 | 80.6 |
| | | CGUN-2A (ours) | 5 | 29.2 | 43.6 | 63.0 | 74.6 | 82.9 |
| 3-hops | 7860 | EXEM | - | 3.6 | 5.9 | 10.7 | 16.1 | 23.1 |
| | | GCNZ | 4 | 4.1 | 7.5 | 14.2 | 20.2 | 27.7 |
| | | SGCN | 1 | 6.0 | 10.4 | 18.9 | 27.2 | 36.9 |
| | | DGP | 1 | 6.3 | 10.7 | 19.3 | 27.7 | 37.7 |
| | | DKG | 1 | 7.0 | 11.7 | 20.7 | 29.2 | 39.0 |
| | | ZSL-KG (Tr) ^{1} | 1 | 6.3 | 11.1 | 20.1 | 28.8 | 38.8 |
| | | GUN-2A (ours) | 5 | 6.4 | 10.8 | 19.5 | 27.9 | 38.0 |
| | | CGUN-2A (ours) | 5 | 7.8 | 13.0 | 22.6 | 31.5 | 41.8 |
| All | 20842 | EXEM | - | 1.8 | 2.9 | 5.3 | 8.2 | 12.2 |
| | | GCNZ | 4 | 1.8 | 3.3 | 6.3 | 9.1 | 12.7 |
| | | SGCN | 1 | 2.8 | 4.9 | 9.1 | 13.5 | 19.3 |
| | | DGP | 1 | 3.0 | 5.0 | 9.3 | 13.9 | 19.8 |
| | | DKG | 1 | 3.3 | 5.6 | 10.1 | 14.7 | 20.5 |
| | | ZSL-KG (Tr) ^{1} | 1 | 3.0 | 5.3 | 9.9 | 14.8 | 21.0 |
| | | GUN-2A (ours) | 5 | 3.0 | 5.3 | 9.7 | 14.4 | 20.3 |
| | | CGUN-2A (ours) | 5 | 4.0 | 6.7 | 12.2 | 17.8 | 24.8 |

^{1} Tr means Transformer-encoded.
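The Hit@k metric reported above counts a test image as correct when its ground-truth class appears among the k highest-scoring classes. A minimal sketch; function and variable names are illustrative, not from the paper:

```python
import numpy as np

def hit_at_k(scores, labels, k):
    """Fraction of samples whose true class is in the top-k scored classes.

    scores: (n, c) classifier scores over c classes
    labels: (n,)   ground-truth class indices
    """
    topk = np.argsort(scores, axis=1)[:, -k:]    # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()                   # reported as a percentage
```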

**Table 2.** Hit@k performance for different methods on three datasets. Testing on both the seen and unseen classes.

| Test Set | Num of Test Classes | Model | Hit@1 (%) | Hit@2 (%) | Hit@5 (%) | Hit@10 (%) | Hit@20 (%) |
|---|---|---|---|---|---|---|---|
| 2-hops + 1K | 1549 | GCNZ | 9.7 | 20.4 | 42.6 | 57.0 | 68.2 |
| | | SGCN | 11.9 | 27.0 | 50.8 | 65.1 | 75.9 |
| | | DGP | 10.3 | 26.4 | 50.3 | 65.2 | 76.0 |
| | | DKG | 7.0 | 26.8 | 52.5 | 67.5 | 77.9 |
| | | ZSL-KG (Tr) ^{1} | 11.1 | 26.2 | 50.0 | 64.3 | 75.3 |
| | | GUN-2A (ours) | 11.2 | 26.8 | 50.4 | 65.2 | 75.4 |
| | | CGUN-2A (ours) | 13.5 | 28.9 | 52.9 | 65.9 | 76.3 |
| 3-hops + 1K | 7860 | GCNZ | 2.2 | 5.1 | 11.9 | 18.0 | 25.6 |
| | | SGCN | 3.2 | 7.1 | 16.1 | 24.6 | 34.6 |
| | | DGP | 2.9 | 7.1 | 16.1 | 24.9 | 35.1 |
| | | DKG | 2.0 | 7.1 | 17.3 | 26.2 | 36.5 |
| | | ZSL-KG (Tr) ^{1} | 3.4 | 7.5 | 16.9 | 26.1 | 36.5 |
| | | GUN-2A (ours) | 2.9 | 6.9 | 16.0 | 24.7 | 34.3 |
| | | CGUN-2A (ours) | 4.6 | 9.4 | 19.6 | 29.0 | 39.6 |
| All + 1K | 20842 | GCNZ | 1.0 | 2.3 | 5.3 | 8.1 | 11.7 |
| | | SGCN | 1.5 | 3.4 | 7.8 | 12.3 | 18.2 |
| | | DGP | 1.4 | 3.4 | 7.9 | 12.6 | 18.7 |
| | | DKG | 1.0 | 3.4 | 8.5 | 13.2 | 19.3 |
| | | ZSL-KG (Tr) ^{1} | 1.7 | 3.8 | 8.5 | 13.5 | 19.9 |
| | | GUN-2A (ours) | 1.4 | 3.4 | 8.6 | 12.7 | 18.3 |
| | | CGUN-2A (ours) | 2.5 | 5.1 | 10.8 | 16.5 | 23.7 |

^{1} Tr means Transformer-encoded.

**Table 3.** The MAD values of some GCN-based methods on the ‘2-hops’, ‘3-hops’ and ‘all’ datasets with both CZSL and GZSL settings.

| Model | CZSL: 2-Hops | CZSL: 3-Hops | CZSL: All | GZSL: 2-Hops | GZSL: 3-Hops | GZSL: All |
|---|---|---|---|---|---|---|
| SGCN | 0.431 | 0.389 | 0.411 | 0.258 | 0.344 | 0.386 |
| DGP | 0.375 | 0.367 | 0.387 | 0.228 | 0.327 | 0.378 |
| GUN-2A | 0.429 | 0.391 | 0.418 | 0.265 | 0.345 | 0.377 |
| CGUN-2A | 0.453 | 0.419 | 0.433 | 0.275 | 0.372 | 0.413 |
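The MAD (Mean Average Distance) smoothness measure reported in Table 3, introduced by Chen et al. (AAAI 2020, cited above), is the average cosine distance between node representations over a target set of node pairs; higher values mean more distinguishable representations, i.e. less over-smoothing. A simplified NumPy sketch, which collapses the per-node averaging into one mean over the masked pairs (an assumption of this sketch):

```python
import numpy as np

def mad(H, mask):
    """Mean Average Distance over a node-pair mask.

    H:    (n, n_features) node representations
    mask: (n, n) 0/1 matrix selecting the pairs to average over (e.g. edges)
    """
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    D = 1.0 - Hn @ Hn.T                 # pairwise cosine distances
    return float((D * mask).sum() / mask.sum())
```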

**Table 4.** Results of the ablation study on the 2-hops dataset. ${G}_{gv}$ and ${G}_{tr}$ represent graphs encoded by the Glove and Transformer encoders, respectively.

| Model | ${\mathit{G}}_{\mathit{g}\mathit{v}}$ | ${\mathit{G}}_{\mathit{t}\mathit{r}}$ | Att-gPool | Hit@1 (%) | Hit@2 (%) | Hit@5 (%) | Hit@10 (%) | Hit@20 (%) |
|---|---|---|---|---|---|---|---|---|
| GUN | ✓ | ✗ | ✗ | 17.4 | 28.0 | 45.0 | 58.9 | 71.2 |
| GUN-2A | ✓ | ✗ | ✓ | 26.4 | 40.5 | 60.3 | 71.8 | 80.6 |
| GUN-2A | ✗ | ✓ | ✓ | 28.4 | 42.5 | 61.5 | 72.9 | 81.3 |
| CGUN-2A | ✓ | ✓ | ✓ | 29.2 | 43.6 | 63.0 | 74.6 | 82.9 |

| Model | GCN Layers | Hit@1 (%) | Hit@2 (%) | Hit@5 (%) | Hit@10 (%) | Hit@20 (%) |
|---|---|---|---|---|---|---|
| SGCN * | 1 | 24.8 | 38.3 | 57.5 | 69.9 | 79.6 |
| SGCN * | 2 | 24.2 | 37.7 | 57.4 | 69.2 | 78.1 |
| SGCN * | 3 | 23.9 | 37.5 | 57.1 | 68.4 | 77.2 |
| GCNZ | 4 | 19.8 | 33.3 | 53.2 | 65.4 | 74.6 |
| GUN-None | 2 | 24.2 | 37.6 | 57.1 | 69.6 | 79.4 |
| GUN-1A | 3 | 25.4 | 39.0 | 58.5 | 70.3 | 79.5 |
| GUN-2A | 5 | 26.4 | 40.5 | 60.3 | 71.8 | 80.6 |
| GUN-3A | 7 | 25.5 | 39.3 | 58.6 | 71.1 | 79.4 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, L.; Liu, L.; Du, X.; Wang, X.; Zhang, Z.; Zhang, J.; Zhang, P.; Liu, J.
CGUN-2A: Deep Graph Convolutional Network via Contrastive Learning for Large-Scale Zero-Shot Image Classification. *Sensors* **2022**, *22*, 9980.
https://doi.org/10.3390/s22249980
