# Investigating Transfer Learning in Graph Neural Networks


## Abstract


## 1. Introduction and Related Work

#### 1.1. Background

#### 1.1.1. Graph Neural Networks

| Model | Update Rule |
|---|---|
| GCN | $h_v^{(k)} \leftarrow \sigma\left[W^{(k)} \cdot \left(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{\tilde{d}_u \tilde{d}_v}}\, h_u^{(k-1)}\right)\right]$ |
| GraphSAGE | $h_v^{(k)} \leftarrow \sigma\left[W^{(k)} \cdot \left(\frac{1}{\tilde{d}_v^{\mathrm{in}}} \sum_{u \in \mathcal{N}(v) \cup \{v\}} h_u^{(k-1)}\right)\right]$ |
| GIN | $h_v^{(k)} \leftarrow \sigma\left[W^{(k)} \cdot \left((1+\epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right)\right]$ |
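To make the three update rules concrete, the following minimal sketch applies one layer of each rule to a toy undirected graph stored as adjacency sets. It is a plain-Python illustration, not the `PyTorch-Geometric` implementations used in the experiments. The function names are hypothetical, $\sigma$ is assumed to be ReLU, $\tilde{d}$ is taken as the degree plus one (for the self-loop added by $\mathcal{N}(v) \cup \{v\}$), and the GIN transformation is the single linear map written in the table rather than the multi-layer perceptron used by Xu et al. [17].

```python
import math

# Toy undirected graph: node -> set of neighbours (self-loops are not stored).
Graph = dict[int, set[int]]
Features = dict[int, list[float]]


def relu(x: list[float]) -> list[float]:
    return [max(0.0, xi) for xi in x]


def linear(W: list[list[float]], x: list[float]) -> list[float]:
    # W has one row per output dimension.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def weighted_sum(terms: list[tuple[float, list[float]]]) -> list[float]:
    # Sum of coefficient * vector over all (coefficient, vector) pairs.
    out = [0.0] * len(terms[0][1])
    for coef, vec in terms:
        for i, value in enumerate(vec):
            out[i] += coef * value
    return out


def gcn_layer(adj: Graph, h: Features, W) -> Features:
    d = {v: len(adj[v]) + 1 for v in adj}  # d~ includes the self-loop
    out = {}
    for v in adj:
        terms = [(1.0 / math.sqrt(d[u] * d[v]), h[u]) for u in adj[v] | {v}]
        out[v] = relu(linear(W, weighted_sum(terms)))
    return out


def sage_layer(adj: Graph, h: Features, W) -> Features:
    # Mean over N(v) ∪ {v}; in an undirected graph the in-degree is the degree.
    out = {}
    for v in adj:
        terms = [(1.0 / (len(adj[v]) + 1), h[u]) for u in adj[v] | {v}]
        out[v] = relu(linear(W, weighted_sum(terms)))
    return out


def gin_layer(adj: Graph, h: Features, W, eps: float = 0.0) -> Features:
    # (1 + eps) * own features plus the unweighted sum over neighbours.
    out = {}
    for v in adj:
        terms = [(1.0 + eps, h[v])] + [(1.0, h[u]) for u in adj[v]]
        out[v] = relu(linear(W, weighted_sum(terms)))
    return out


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1}}  # 0 - 1 - 2
    x = {0: [1.0], 1: [2.0], 2: [3.0]}
    identity = [[1.0]]
    print(gcn_layer(path, x, identity))
    print(sage_layer(path, x, identity))
    print(gin_layer(path, x, identity))
```

The contrast is visible on the path graph: GCN weights each neighbour by the symmetric degree term, GraphSAGE averages, and GIN sums, which is why GIN can distinguish neighbourhoods of different sizes that a mean would blur.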

#### 1.1.2. Transfer Learning

#### 1.1.3. Related Work

## 2. Materials and Methods

`PyTorch-Geometric` [32] library for efficient GPU-optimised implementations of the three selected GNNs [33]. We track our experiments using `Comet.ml` and make them publicly available for transparency.

#### 2.1. Node Classification Experimental Design

#### 2.1.1. Synthetic Data

#### 2.1.2. Real-World Data

`Arxiv` and `MAG`. Both are directed citation networks in which each node has an attribute vector containing a 128-dimensional word embedding of the paper. Each node is also associated with the paper's year of publication.

`Arxiv` contains 169,343 Computer Science papers, and the task is to predict which of the 40 subject areas a paper belongs to.

`MAG` is taken from a subset of the Microsoft Academic Graph [42] and contains four types of node entities: papers, authors, institutions and fields of study. For consistency, we only make use of the papers, which comprise 736,389 nodes; this makes `MAG` a much larger and more complex network than `Arxiv`. The task here is to predict which of 349 venues (conferences or journals) each paper belongs to. Open Graph Benchmark also provides model evaluators, which use the standard accuracy score.

`MAG` transfers to itself, we split it into a source and a target graph. Papers from 2010–2014 are placed in the source split, and those from 2015–2019 belong to the target split. Any edges between nodes in separate splits are removed. Table 2 lists the statistics of the above datasets.
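The year-based split can be sketched in plain Python as below. This is an illustrative, hypothetical helper rather than the authors' code: it assumes the graph is given as a per-node list of publication years plus an edge list, keeps the nodes whose year falls in each range, drops every edge whose endpoints land in different splits, and relabels node indices within each split.

```python
def temporal_split(years, edges, source_range=(2010, 2014), target_range=(2015, 2019)):
    """Split a graph by publication year and drop edges that cross splits.

    years: publication year of each node (list index = node id).
    edges: (u, v) pairs of node ids; edge direction is preserved.
    Returns {"source": (old_ids, edges), "target": (old_ids, edges)}, where each
    split's edges are relabelled to 0..len(old_ids) - 1.
    """
    def induced(year_range):
        lo, hi = year_range
        old_ids = [v for v, y in enumerate(years) if lo <= y <= hi]
        new_id = {old: new for new, old in enumerate(old_ids)}
        # An edge survives only if both endpoints fall inside this split.
        kept = [(new_id[u], new_id[v]) for u, v in edges if u in new_id and v in new_id]
        return old_ids, kept

    return {"source": induced(source_range), "target": induced(target_range)}
```

With tensors, `torch_geometric.utils.subgraph(subset, edge_index, relabel_nodes=True)` performs the same induced-subgraph step once the node subset for each year range is known.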

#### 2.1.3. Experimental Methodology

`Arxiv` and the `MAG` source split to the target `MAG` split. Lastly, to investigate how important attributes are for our node classification tasks, we damaged the node attributes of both `Arxiv` and the `MAG` source split. To damage the attributes, we replaced them with Gaussian random noise with a mean of 0 and a standard deviation of 1.
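A minimal sketch of the attribute-damaging step, assuming node attributes are held as a list of per-node feature rows: every value is replaced by an independent draw from $\mathcal{N}(0, 1)$, so the graph structure is untouched while any information carried by the attributes is destroyed. The function name and fixed seed are illustrative assumptions; with PyTorch tensors, the equivalent replacement is `torch.randn_like(x)`.

```python
import random


def damage_attributes(x, seed=0):
    """Return a copy of feature matrix x with every entry drawn from N(0, 1).

    The shape (nodes x features) is preserved; the original values are discarded.
    """
    rng = random.Random(seed)  # fixed seed so the damaged features are reproducible
    return [[rng.gauss(0.0, 1.0) for _ in row] for row in x]
```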

`Arxiv` and `MAG`, which allows us to compare the performance of the base models as a sanity check. The network comprises three GNN layers, with an input dimensionality of 128, a hidden dimensionality of 256 and an output dimensionality of 349 (for `MAG`). We trained the networks on the target task for 2000 epochs using the Adam optimiser [43]. The best-performing learning rate for each GNN was selected and fixed across our six experiment sets; for all three GNNs (GCN, GraphSAGE and GIN), this was a learning rate of $0.001$.
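The layer sizes above, and the difference between the "[Old layer]" experiments and the other pretrainings, can be sketched as follows. This is a hypothetical plain-Python illustration, not the authors' code: weights are nested lists with one row per output unit, the hidden layers are always copied from the pretrained model, and the output layer is copied only when it is explicitly kept and its shape matches the target task. Otherwise it is assumed to be freshly initialised (the uniform range of ±1/√fan_in is chosen purely for illustration) before training on the target task.

```python
import math
import random


def layer_shapes(in_dim=128, hidden_dim=256, out_dim=349, num_layers=3):
    """(fan_in, fan_out) for each GNN layer, e.g. 128 -> 256 -> 256 -> 349."""
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
    return list(zip(dims[:-1], dims[1:]))


def init_layer(shape, rng):
    fan_in, fan_out = shape
    bound = 1.0 / math.sqrt(fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def shape_of(W):
    return (len(W[0]), len(W))  # (fan_in, fan_out)


def transfer_weights(source, target_shapes, keep_output_layer, seed=0):
    """Build target-task weights from pretrained source weights."""
    rng = random.Random(seed)
    if [shape_of(W) for W in source[:-1]] != target_shapes[:-1]:
        raise ValueError("hidden layers must have the same shapes in both tasks")
    weights = [[row[:] for row in W] for W in source[:-1]]  # reuse hidden layers
    out_shape = target_shapes[-1]
    if keep_output_layer and shape_of(source[-1]) == out_shape:
        weights.append([row[:] for row in source[-1]])  # "[Old layer]" case
    else:
        weights.append(init_layer(out_shape, rng))  # new output layer
    return weights
```

Under this sketch, pretraining on `Arxiv` (40 classes) can never keep its output layer for the 349-class `MAG` target, whereas the `MAG` source split shares the target's label space, which is what makes the "[Old layer]" variant possible.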

#### 2.2. Graph Classification Experimental Design

#### 2.2.1. Synthetic Data

- Create a random n-class classification problem with a sample $\mathbf{X}$ and labels $\mathbf{y}$;
- For each label ${y}_{i}$ in $\mathbf{y}$, generate several graphs and set their node attributes to the relevant example from $\mathbf{X}$. Label these graphs with ${y}_{i}$;
- Optionally, swap the labels assigned to some of the graphs to weaken community structure;
- Optionally, replace node attributes with noise to weaken attribute community structure.

The generator is controlled by the following parameters:

- `num_classes`: the number of classes or labels in the dataset;
- `n_per_class`: the number of graphs to generate per class;
- `n_features`: the length of the node feature vector;
- `percent_swap`: the percentage of graphs to swap;
- `percent_damage`: the percentage of graphs whose node attributes are to be damaged.

The `num_classes`, `n_per_class` and `n_features` parameters allow dataset-level properties to be varied, while `percent_swap` and `percent_damage` influence the structural and attribute community structure, respectively. We fixed the size of the graphs at 30 nodes. More details regarding this generation process are provided in Appendix A.

#### 2.2.2. Real-World Data

`BBBP` and `HIV`.

`BBBP` (Blood–Brain Barrier Penetration) is a physiological dataset where the task is to predict whether a given compound penetrates the blood–brain barrier [48].

`HIV` is a biophysics dataset where the task is to predict whether a compound has anti-HIV activity.

`BBBP` is a much smaller dataset than `HIV`, so we split `HIV` into a source and a target split, similar to the real-world node classification experiments: half of `HIV` is randomly sampled for each split, and this sample is kept fixed. A summary of the datasets is given in Table 5.
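The fixed random halving can be sketched as below; the helper name and seed are assumptions made for illustration. With the 41,127 molecules implied by the split sizes in Table 5 (20,563 + 20,564), the integer division places the extra graph in the second (target) split, matching the sizes reported there.

```python
import random


def fixed_random_halves(num_graphs, seed=0):
    """Randomly split graph indices into two halves, reproducibly.

    Fixing the seed keeps the same source/target split across all experiments.
    """
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)
    half = num_graphs // 2
    return sorted(indices[:half]), sorted(indices[half:])
```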

#### 2.2.3. Experimental Methodology

`HIV` target split, is fixed, and we pretrained our GNNs on `BBBP` and the `HIV` source split and then evaluated them on the target task. We also evaluated the transfer performance when the models are pretrained on the source datasets with damaged node attributes, i.e., with the node attributes replaced by Gaussian random noise with a mean of 0 and a standard deviation of 1. The experiments are described in Table 6.

## 3. Results

#### 3.1. Node Classification Results

#### 3.1.1. Real-World Data

`Arxiv` [Damaged]), and GIN. Since the pretraining datasets are all citation networks, positive transfer is a reasonable outcome.

`Arxiv` dataset. Interestingly, for GCN and GIN, `Arxiv` with damaged attributes performs somewhat better in absolute terms than unmodified `Arxiv`, indicating that graph characteristics beyond attribute values are being used for transfer. Although some of the GraphSAGE Transfer Ratios are positive, they are not statistically better than the control. Turning to Table 8, we note that GIN statistically outperforms the other GNNs on this metric for every source task, indicating that GIN benefits the most from sharing knowledge from the source domains.

`MAG` (Source split) [Old layer] tasks in Table 7 have a greater Jumpstart than the rest, since the output layer does not need to be retrained. We note that both GCN and GraphSAGE show a significant Jumpstart with the completely new task `Arxiv`. In fact, GraphSAGE exploits graph characteristics beyond attribute values, as evidenced by the `Arxiv` [Damaged] results. In Table 8, we note that GCN is either on par with or significantly better than the other GNNs across all datasets on this metric; however, the result is not compelling.

`Arxiv` task. GraphSAGE is inconsistent and does not always show transfer at the end of training. Despite the large Jumpstart with `MAG` (Source split) [Old layer], only GIN retains a large absolute improvement in Asymptotic Performance. Both GCN and GIN once again exploit structural characteristics beyond the attribute values, as evidenced by the `Arxiv` [Damaged] results. In Table 8, we see a mixture of best-performing GNNs, with both GCN and GraphSAGE performing well on the completely new `Arxiv` task.

Takeaway 1: We have statistical evidence that transfer to a new task for node classification does occur across all metrics and GNNs. We demonstrate that GCN and GIN exploit structural rather than attribute information to achieve a positive Transfer Ratio and Asymptotic Performance, as does GraphSAGE for Jumpstart.

#### 3.1.2. Synthetic Data

Takeaway 2: In general, GIN predominantly exploits Strong Modularity for transfer, while GraphSAGE can exploit both Modularity and Within Inertia for Jumpstart; this supports our real-world data findings.

#### 3.2. Graph Classification Results

#### 3.2.1. Real-World Data

`HIV` (Source split), and negative transfer with the remaining pretrainings. GIN’s training curves also show a decay over training when pretrained on `BBBP`. The training curves indicate that GraphSAGE and GIN suffer from worse negative transfer than GCN.

`HIV` (Source split) and `BBBP` when compared to the control. This result indicates that they are able to transfer to a completely new task. GIN shows transfer from the similar task `HIV` (Source split) but not from the new task `BBBP` on any of the metrics. The biggest Jumpstart is seen with `HIV` (Source split), which is expected since it is a self-transfer task. None of the GNNs shows significant transfer from any of the damaged tasks, indicating that node attributes are likely exploited over graph structural characteristics.

`BBBP` task. For transfer from the more similar `HIV` (Source split), the results are mixed, and further experimentation is required to determine whether GCN outperforms the other GNNs.

Takeaway 3: We have significant statistical evidence that transfer occurs in graph classification for both GCN and GraphSAGE across all metrics considered. We can also reject the hypothesis that the transfer results from graph structure beyond the node attributes alone.

#### 3.2.2. Synthetic Data

Takeaway 4: There is significant evidence that GraphSAGE and GIN exploit Strong Attribute Within Inertia in order to achieve transfer. These results support our real-world findings with respect to GraphSAGE.

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Data Generation for Graph Classification

#### Appendix A.1. Step 1: Create an Attribute-Level Task

`n_features` are generated, belonging to `num_classes` classes. The total number of these vectors is `num_classes` × `n_per_class` × 30, so that each node of each graph in each class has an attribute vector that can be assigned to it. We followed Morris et al. [50] in generating this attribute-level task using the `scikit-learn` library (see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html, accessed on 4 August 2021). This tool generates the classification task using a modified version of the algorithm by Guyon [51]. After this step, the attributes have a high level of community structure.
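Step 1 can be approximated without scikit-learn by drawing one Gaussian cluster per class, as in the sketch below. This is a simplified stand-in, not Guyon's algorithm as implemented in `sklearn.datasets.make_classification` (which, among other things, places clusters about the vertices of a hypercube and can add redundant features). The helper name, `class_sep` and the unit-variance noise are illustrative assumptions. The sketch produces `num_classes` × `n_per_class` × 30 vectors of length `n_features`, grouped by class.

```python
import random


def make_attribute_task(num_classes, n_per_class, n_features, n_nodes=30,
                        class_sep=2.0, seed=0):
    """Return {label: list of attribute vectors}, one vector per node to be created.

    Each class gets a random centroid and its vectors are the centroid plus N(0, 1)
    noise, so vectors of the same class tend to be closer to one another than to
    vectors of other classes (strong attribute community structure).
    """
    rng = random.Random(seed)
    vectors_by_class = {}
    for label in range(num_classes):
        centroid = [rng.uniform(-class_sep, class_sep) for _ in range(n_features)]
        vectors_by_class[label] = [
            [mu + rng.gauss(0.0, 1.0) for mu in centroid]
            for _ in range(n_per_class * n_nodes)
        ]
    return vectors_by_class
```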

#### Appendix A.2. Step 2: Generate Graphs and Assign Attributes to the Graphs

`n_per_class` graphs were generated for each of the `num_classes` classes, with a different $m$ value for each class, the corresponding attribute vectors from the previous step were assigned to graphs with the same label. At the end of this step, we had a labelled dataset with strong community structure for both nodes and attributes.
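Step 2 can be sketched with a small preferential-attachment generator in the style of Barabási and Albert [52]: each new node attaches to $m$ distinct existing nodes chosen with probability proportional to their degree, so a larger $m$ yields denser graphs with higher average degree. The mapping $m$ = label + 1, the helper names and the dictionary representation of a graph are hypothetical choices made for illustration.

```python
import random


def barabasi_albert_edges(n, m, rng):
    """Undirected edges of a preferential-attachment graph with n nodes."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    edges = []
    targets = list(range(m))  # the first new node links to nodes 0..m-1
    repeated = []  # each node appears once per incident edge
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling without repeats
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


def generate_graphs(vectors_by_class, n_per_class, n_nodes=30, seed=0):
    """One labelled graph per (class, index); class c uses m = c + 1 (hypothetical)."""
    rng = random.Random(seed)
    graphs = []
    for label, vectors in sorted(vectors_by_class.items()):
        for i in range(n_per_class):
            graphs.append({
                "edges": barabasi_albert_edges(n_nodes, label + 1, rng),
                # Each node takes its own attribute vector from the class pool.
                "x": vectors[i * n_nodes:(i + 1) * n_nodes],
                "y": label,
            })
    return graphs
```

Each graph with parameter $m$ has exactly $(30 - m) \cdot m$ edges, so the classes differ in average degree, which is the structural signal that the swapping step later weakens.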

#### Appendix A.3. Step 3: Swap Graphs

`percent_swap` parameter. This parameter may take values in the range $[0,1]$, and its default value is 0 (no graphs are swapped). A random sample of pairs of graphs to swap is selected, corresponding to the specified percentage of the dataset. The more graphs are swapped, the less distinct the average node degrees of the classes become, resulting in weaker structural community structure. This is demonstrated in Figure A2.
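A hypothetical sketch of Step 3 follows. How a swap is applied is an assumption here: since `percent_swap` is tied to the structural within inertia only, the two graphs of each sampled pair are taken to exchange their edge sets while keeping their labels and node attributes, so each structure ends up carrying the other graph's label. The number of pairs, int(`percent_swap` × number of graphs) // 2, is likewise an illustrative choice.

```python
import random


def swap_structures(graphs, percent_swap, seed=0):
    """Return a copy of graphs in which sampled pairs exchange their edge sets.

    graphs: list of dicts with "edges", "x" (node attributes) and "y" (label).
    Labels and attributes stay in place, so only the structure-label link weakens.
    """
    if not 0.0 <= percent_swap <= 1.0:
        raise ValueError("percent_swap must lie in [0, 1]")
    rng = random.Random(seed)
    swapped = [dict(g) for g in graphs]  # shallow copies; inputs are not mutated
    n_pairs = int(percent_swap * len(graphs)) // 2
    chosen = rng.sample(range(len(graphs)), 2 * n_pairs)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        swapped[i]["edges"], swapped[j]["edges"] = swapped[j]["edges"], swapped[i]["edges"]
    return swapped
```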

**Figure A2.** The effect of varying the `percent_swap` parameter on $\mathrm{w}.\mathrm{i}{.}_{\mathrm{struct}}$. The shaded region is the 1$\sigma $ interval over 10 runs.

#### Appendix A.4. Step 4: Damage Attributes

`percent_damage` parameter. This parameter also has a range of $[0,1]$, with a default value of 0, and defines the percentage of graphs whose node attributes will be damaged (replaced with random values). The higher the percentage, the less distinct the attributes of different classes become, resulting in weaker attribute community structure. This is demonstrated in Figure A3.
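Step 4 can be sketched as below. The source describes the replacement only as random values; drawing them from $\mathcal{N}(0, 1)$, as in the real-world damaged experiments, and rounding `percent_damage` × number of graphs to the nearest integer are assumptions, and the helper name is hypothetical.

```python
import random


def damage_graph_attributes(graphs, percent_damage, seed=0):
    """Replace the node attributes of a random fraction of graphs with N(0, 1) noise.

    Returns the new list of graphs and the set of damaged indices.
    """
    if not 0.0 <= percent_damage <= 1.0:
        raise ValueError("percent_damage must lie in [0, 1]")
    rng = random.Random(seed)
    n_damage = round(percent_damage * len(graphs))
    damaged = set(rng.sample(range(len(graphs)), n_damage))
    out = []
    for i, g in enumerate(graphs):
        g = dict(g)  # structure and label are kept as-is
        if i in damaged:
            g["x"] = [[rng.gauss(0.0, 1.0) for _ in row] for row in g["x"]]
        out.append(g)
    return out, damaged
```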

**Figure A3.** The effect of varying the `percent_damage` parameter on $\mathrm{w}.\mathrm{i}{.}_{\mathrm{attr}}$. The shaded region is the 1$\sigma $ interval over 10 runs.

## References

1. Pouyanfar, S.; Sadiq, S.; Yan, Y.; Tian, H.; Tao, Y.; Reyes, M.P.; Shyu, M.L.; Chen, S.C.; Iyengar, S.S. A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. (CSUR) **2018**, 51, 1–36.
2. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag. **2017**, 34, 18–42.
3. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
4. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. **1995**, 3361, 1995.
5. Roy, A.M.; Bose, R.; Bhaduri, J. A fast accurate fine-grain object detection model based on YOLOv4 deep neural network. Neural Comput. Appl. **2022**, 34, 3895–3921.
6. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 32, 4–24.
7. Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv **2018**, arXiv:1806.01261.
8. Hendrycks, D.; Lee, K.; Mazeika, M. Using pre-training can improve model robustness and uncertainty. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019.
9. Zhang, K.; Robinson, N.; Lee, S.W.; Guan, C. Adaptive transfer learning for EEG motor imagery classification with deep Convolutional Neural Network. Neural Netw. **2021**, 136, 1–10.
10. Kooverjee, N.; James, S.; Van Zyl, T. Inter-and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition. In Proceedings of the 2020 International SAUPEC/RobMech/PRASA Conference, Cape Town, South Africa, 29–31 January 2020; pp. 1–6.
11. Van Zyl, T.L.; Woolway, M.; Engelbrecht, B. Unique animal identification using deep transfer learning for data fusion in siamese networks. In Proceedings of the 2020 IEEE 23rd International Conference on Information Fusion (FUSION), Rustenburg, South Africa, 6–9 July 2020; pp. 1–6.
12. Karim, Z.; van Zyl, T.L. Deep/Transfer Learning with Feature Space Ensemble Networks (FeatSpaceEnsNets) and Average Ensemble Networks (AvgEnsNets) for Change Detection Using DInSAR Sentinel-1 and Optical Sentinel-2 Satellite Data Fusion. Remote Sens. **2021**, 13, 4394.
13. Variawa, M.Z.; Van Zyl, T.L.; Woolway, M. Transfer Learning and Deep Metric Learning for Automated Galaxy Morphology Representation. IEEE Access **2022**, 10, 19539–19550.
14. Barabási, A.L. Network Science; Cambridge University Press: Cambridge, UK, 2016.
15. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
16. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
17. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
18. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
19. Taylor, M.E.; Stone, P. Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. **2009**, 10, 1633–1685.
20. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. **2009**, 22, 1345–1359.
21. Bengio, Y. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, WA, USA, 2 July 2011.
22. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
23. Kornblith, S.; Shlens, J.; Le, Q.V. Do better ImageNet models transfer better? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
24. Huh, M.; Agrawal, P.; Efros, A.A. What makes ImageNet good for transfer learning? arXiv **2016**, arXiv:1608.08614.
25. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
26. Hamilton, W.L. Graph representation learning. In Synthesis Lectures on Artificial Intelligence and Machine Learning; Morgan & Claypool Publishers: San Rafael, CA, USA, 2020.
27. Velickovic, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep Graph Infomax. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
28. Lee, J.; Kim, H.; Lee, J.; Yoon, S. Transfer learning for deep learning on graph-structured data. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
29. Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.; Leskovec, J. Strategies for Pre-training Graph Neural Networks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
30. Dai, Q.; Shen, X.; Wu, X.M.; Wang, D. Network Transfer Learning via Adversarial Domain Adaptation with Graph Convolution. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
31. Errica, F.; Podda, M.; Bacciu, D.; Micheli, A. A fair comparison of graph neural networks for graph classification. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2020.
32. Fey, M.; Lenssen, J.E. Fast Graph Representation Learning with PyTorch Geometric. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
33. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
34. Dwivedi, V.P.; Joshi, C.K.; Laurent, T.; Bengio, Y.; Bresson, X. Benchmarking graph neural networks. arXiv **2020**, arXiv:2003.00982.
35. Hu, W.; Fey, M.; Zitnik, M.; Dong, Y.; Ren, H.; Liu, B.; Catasta, M.; Leskovec, J. Open Graph Benchmark: Datasets for Machine Learning on Graphs. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020.
36. Bhagat, S.; Cormode, G.; Muthukrishnan, S. Node classification in social networks. In Social Network Data Analytics; Springer: Berlin/Heidelberg, Germany, 2011.
37. Ehrlinger, L.; Wöß, W. Towards a Definition of Knowledge Graphs. SEMANTiCS **2016**, 48, 2.
38. Clauset, A.; Newman, M.E.; Moore, C. Finding community structure in very large networks. Phys. Rev. E **2004**, 70, 066111.
39. Largeron, C.; Mougel, P.N.; Benyahia, O.; Zaïane, O.R. DANCer: Dynamic attributed networks with community structure generation. Knowl. Inf. Syst. **2017**, 53, 109–151.
40. McPherson, M.; Smith-Lovin, L.; Cook, J.M. Birds of a feather: Homophily in social networks. Annu. Rev. Sociol. **2001**, 27, 415–444.
41. Leskovec, J.; Backstrom, L.; Kumar, R.; Tomkins, A. Microscopic evolution of social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008.
42. Wang, K.; Shen, Z.; Huang, C.; Wu, C.H.; Dong, Y.; Kanakia, A. Microsoft academic graph: When experts are not enough. Quant. Sci. Stud. **2020**, 1, 396–413.
43. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
44. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. **1951**, 22, 79–86.
45. Massey, F.J., Jr. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Stat. Assoc. **1951**, 46, 68–78.
46. Koutra, D.; Parikh, A.; Ramdas, A.; Xiang, J. Algorithms for Graph Similarity and Subgraph Matching. 2011. Available online: https://www.cs.cmu.edu/~jingx/docs/DBreport.pdf (accessed on 5 May 2021).
47. Abu-Aisheh, Z.; Raveaux, R.; Ramel, J.Y.; Martineau, P. An exact graph edit distance algorithm for solving pattern recognition problems. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Lisbon, Portugal, 10–12 January 2015.
48. Martins, I.F.; Teixeira, A.L.; Pinheiro, L.; Falcao, A.O. A Bayesian approach to in silico blood-brain barrier penetration modeling. J. Chem. Inf. Model. **2012**, 52, 1686–1697.
49. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
50. Morris, C.; Kriege, N.M.; Kersting, K.; Mutzel, P. Faster kernels for graphs with continuous attributes via hashing. In Proceedings of the International Conference on Data Mining, Barcelona, Spain, 12–15 December 2016.
51. Guyon, I. Design of experiments of the NIPS 2003 variable selection benchmark. In Proceedings of the NIPS Workshop on Feature Extraction and Feature Selection, Whistler, BC, Canada, 11–13 December 2003.
52. Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science **1999**, 286, 509–512.

**Figure 1.** An illustration of the Jumpstart, Asymptotic Performance and Transfer Ratio metrics. The Transfer Ratio is computed using the area under the curve (AUC).
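The three metrics can be computed from a pair of learning curves as in the sketch below. The exact formulas are not restated in this section, so the sketch assumes the common definitions from Taylor and Stone [19]: Jumpstart is the difference in performance before target training begins, Asymptotic Performance is the difference at the end of training, and the Transfer Ratio is the relative gain in area under the curve, estimated here with the trapezoidal rule over evenly spaced evaluations. The function names are hypothetical.

```python
def trapezoid_auc(curve):
    """Area under a curve sampled at unit-spaced evaluation points."""
    return sum((a + b) / 2.0 for a, b in zip(curve[:-1], curve[1:]))


def transfer_metrics(base_curve, transfer_curve):
    """Compare a model trained from scratch with one initialised from a source task.

    Both curves hold the target-task score (e.g. accuracy) at the same evaluation points.
    """
    if len(base_curve) != len(transfer_curve) or len(base_curve) < 2:
        raise ValueError("curves need the same length and at least two points")
    base_auc = trapezoid_auc(base_curve)
    return {
        "jumpstart": transfer_curve[0] - base_curve[0],
        "asymptotic_performance": transfer_curve[-1] - base_curve[-1],
        "transfer_ratio": (trapezoid_auc(transfer_curve) - base_auc) / base_auc,
    }
```

For example, a transferred curve that starts higher but converges to the same final score has a positive Jumpstart and Transfer Ratio but zero Asymptotic Performance.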

**Table 1.** The four configurations of the synthetic node classification datasets. The average modularity and Within Inertia ratios are computed on the generated datasets.

| Configuration | Modularity | Avg. Modularity | Within Inertia | Avg. Within Inertia |
|---|---|---|---|---|
| Configuration 1 (M${}_{\uparrow}$I${}_{\uparrow}$) | Strong | 0.64 | Strong | 0.37 |
| Configuration 2 (M${}_{\uparrow}$I${}_{\downarrow}$) | Strong | 0.64 | Weak | 0.47 |
| Configuration 3 (M${}_{\downarrow}$I${}_{\uparrow}$) | Weak | 0.32 | Strong | 0.39 |
| Configuration 4 (M${}_{\downarrow}$I${}_{\downarrow}$) | Weak | 0.28 | Weak | 0.99 |
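The two community-structure measures in Table 1 can be sketched as follows. The definitions are assumptions, since they are not restated in this section: Modularity is taken as the standard modularity of the class partition used by Clauset et al. [38] (higher means denser within-class connectivity), and the Within Inertia ratio as the within-class inertia of the node attributes divided by their total inertia (lower means more compact classes, which agrees with the Strong/Weak labels in Table 1). The function names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of an undirected simple graph under a node -> community map.

    Q = sum_c [ L_c / m - (D_c / 2m)^2 ], with L_c the edges inside community c,
    D_c the total degree of its nodes and m the number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def within_inertia_ratio(points, labels):
    """Within-class inertia divided by total inertia of the attribute vectors."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def inertia(vectors, centre):
        return sum(sum((a - b) ** 2 for a, b in zip(p, centre)) for p in vectors)

    total = inertia(points, mean(points))
    by_label = defaultdict(list)
    for p, y in zip(points, labels):
        by_label[y].append(p)
    within = sum(inertia(group, mean(group)) for group in by_label.values())
    return within / total
```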

**Table 2.** Statistics of the real-world node classification datasets.

| Dataset | Nodes | Edges | Features | Modularity | Within Inertia | Classes | Metric |
|---|---|---|---|---|---|---|---|
| Arxiv | 169,343 | 1,166,243 | 128 | 0.495 | 0.890 | 40 | Accuracy |
| MAG (Source) | 402,598 | 1,615,644 | 128 | 0.299 | 0.813 | 349 | Accuracy |
| MAG (Target) | 333,791 | 1,390,589 | 128 | 0.286 | 0.806 | 349 | Accuracy |

**Table 3.** Experiments conducted for node classification using both real-world and synthetic datasets.

| Data | # | Source Task | Target Task |
|---|---|---|---|
| Real-world | 1 | Base [Random seed] | MAG (Target Split) |
| | 2 | Arxiv | MAG (Target Split) |
| | 3 | Arxiv [Damaged features] | MAG (Target Split) |
| | 4 | MAG (Source split) [Old layer] | MAG (Target Split) |
| | 5 | MAG (Source split) | MAG (Target Split) |
| | 6 | MAG (Source split) [Damaged features] | MAG (Target Split) |
| Synthetic | 1 | Base [Random seed] | Configuration 4 (M${}_{\downarrow}$I${}_{\downarrow}$) |
| | 2 | Configuration 1 (M${}_{\uparrow}$I${}_{\uparrow}$) | Configuration 4 (M${}_{\downarrow}$I${}_{\downarrow}$) |
| | 3 | Configuration 2 (M${}_{\uparrow}$I${}_{\downarrow}$) | Configuration 4 (M${}_{\downarrow}$I${}_{\downarrow}$) |
| | 4 | Configuration 3 (M${}_{\downarrow}$I${}_{\uparrow}$) | Configuration 4 (M${}_{\downarrow}$I${}_{\downarrow}$) |

**Table 4.** The four configurations of the synthetic graph classification datasets.

| Configuration | $\mathbf{w}.\mathbf{i}{.}_{\mathbf{struct}}$ | $\mathbf{w}.\mathbf{i}{.}_{\mathbf{attr}}$ | percent_swap | percent_damage |
|---|---|---|---|---|
| Configuration 5 (${I}_{\downarrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$) | Weak | Weak | 0.95 | 0.95 |
| Configuration 6 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$) | Strong | Weak | 0.92 | 0.95 |
| Configuration 7 (${I}_{\downarrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) | Weak | Strong | 0.95 | 0.92 |
| Configuration 8 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) | Strong | Strong | 0.92 | 0.92 |

**Table 5.** Summary of the real-world graph classification datasets.

| Dataset | No. Graphs | Average Nodes | Features | ${\mathit{I}}^{\mathbf{S}}$ | ${\mathit{I}}^{\mathbf{A}}$ | Classes | Metric |
|---|---|---|---|---|---|---|---|
| BBBP | 2039 | 24.06 | 9 | 0.99 | 0.98 | 2 | ROC-AUC |
| HIV (Source split) | 20,563 | 25.49 | 9 | 0.99 | 0.99 | 2 | ROC-AUC |
| HIV (Target split) | 20,564 | 25.53 | 9 | 0.99 | 0.99 | 2 | ROC-AUC |

**Table 6.** Experiments conducted for graph classification using both real-world and synthetic datasets.

| Data | # | Source Task | Target Task |
|---|---|---|---|
| Real-world | 1 | Base [Random seed] | HIV (Target Split) |
| | 2 | BBBP | HIV (Target Split) |
| | 3 | BBBP [Damaged features] | HIV (Target Split) |
| | 4 | HIV (Source split) | HIV (Target Split) |
| | 5 | HIV (Source split) [Damaged features] | HIV (Target Split) |
| Synthetic | 1 | Base [Random seed] | Configuration 8 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) |
| | 2 | Configuration 5 (${I}_{\downarrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$) | Configuration 8 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) |
| | 3 | Configuration 6 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$) | Configuration 8 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) |
| | 4 | Configuration 7 (${I}_{\downarrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) | Configuration 8 (${I}_{\uparrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$) |

**Table 7.** Transfer metrics for real-world node classification experiments (10 runs). Bold results are positive and statistically greater than the control at $p=0.1$. We evaluate significance for each model/metric combination.

| Model | Source Task → MAG-Target | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| Control | MAG-Source [Damaged] | 0.011 ± 0.006 | 0.000 ± 0.001 | −0.001 ± 0.002 |
| GCN | MAG-Source [Old layer] | 0.042 ± 0.007 | 0.228 ± 0.023 | 0.003 ± 0.003 |
| | MAG-Source | 0.032 ± 0.011 | 0.001 ± 0.002 | 0.002 ± 0.003 |
| | Arxiv | 0.021 ± 0.007 | 0.001 ± 0.001 | 0.008 ± 0.002 |
| | Arxiv [Damaged] | 0.028 ± 0.006 | 0.000 ± 0.001 | 0.009 ± 0.002 |
| Control | MAG-Source [Damaged] | 0.023 ± 0.037 | 0.000 ± 0.001 | 0.000 ± 0.009 |
| G’SAGE | MAG-Source [Old layer] | 0.044 ± 0.041 | 0.239 ± 0.018 | 0.001 ± 0.010 |
| | MAG-Source | 0.046 ± 0.041 | 0.000 ± 0.001 | 0.003 ± 0.011 |
| | Arxiv | 0.028 ± 0.040 | 0.001 ± 0.001 | 0.007 ± 0.010 |
| | Arxiv [Damaged] | −0.050 ± 0.033 | 0.001 ± 0.001 | −0.021 ± 0.009 |
| Control | MAG-Source [Damaged] | 0.030 ± 0.011 | −0.002 ± 0.004 | 0.000 ± 0.002 |
| GIN | MAG-Source [Old layer] | 0.183 ± 0.008 | 0.226 ± 0.004 | 0.024 ± 0.002 |
| | MAG-Source | 0.089 ± 0.007 | −0.001 ± 0.004 | 0.009 ± 0.001 |
| | Arxiv | 0.048 ± 0.009 | −0.001 ± 0.004 | 0.005 ± 0.001 |
| | Arxiv [Damaged] | 0.061 ± 0.016 | −0.001 ± 0.004 | 0.006 ± 0.003 |

**Table 8.** Transfer metrics for real-world node classification experiments (10 runs). Bold results are not statistically greater than the best at $p=0.1$. We evaluate significance for each source-task/metric combination.

| Source Task → MAG-Target (Spectral Dist.) | Model | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| MAG-Source [Old layer] | GCN | 0.042 ± 0.007 | 0.228 ± 0.023 | 0.003 ± 0.003 |
| | G’SAGE | 0.044 ± 0.041 | 0.239 ± 0.018 | 0.001 ± 0.010 |
| | GIN | 0.183 ± 0.008 | 0.226 ± 0.004 | 0.024 ± 0.002 |
| MAG-Source | GCN | 0.032 ± 0.011 | 0.001 ± 0.002 | 0.002 ± 0.003 |
| | G’SAGE | 0.046 ± 0.041 | 0.000 ± 0.001 | 0.003 ± 0.011 |
| | GIN | 0.089 ± 0.007 | −0.001 ± 0.004 | 0.009 ± 0.001 |
| Arxiv | GCN | 0.021 ± 0.007 | 0.001 ± 0.001 | 0.008 ± 0.002 |
| | G’SAGE | 0.028 ± 0.040 | 0.001 ± 0.001 | 0.007 ± 0.010 |
| | GIN | 0.048 ± 0.009 | −0.001 ± 0.004 | 0.005 ± 0.001 |

**Table 9.** Transfer metrics for synthetic node classification (10 runs). Bold results are not statistically greater than the best at $p=0.1$. We evaluate significance for each model/metric combination.

| Model | Source Task | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| GCN | C.1 - M${}_{\uparrow}$I${}_{\uparrow}$ | −0.203 ± 0.031 | 0.031 ± 0.044 | −0.103 ± 0.026 |
| | C.2 - M${}_{\uparrow}$I${}_{\downarrow}$ | −0.108 ± 0.065 | 0.012 ± 0.039 | −0.036 ± 0.038 |
| | C.3 - M${}_{\downarrow}$I${}_{\uparrow}$ | −0.176 ± 0.026 | 0.001 ± 0.068 | −0.092 ± 0.020 |
| G’SAGE | C.1 - M${}_{\uparrow}$I${}_{\uparrow}$ | 0.083 ± 0.086 | 0.026 ± 0.024 | 0.041 ± 0.042 |
| | C.2 - M${}_{\uparrow}$I${}_{\downarrow}$ | 0.100 ± 0.102 | 0.004 ± 0.031 | 0.073 ± 0.051 |
| | C.3 - M${}_{\downarrow}$I${}_{\uparrow}$ | 0.169 ± 0.139 | −0.024 ± 0.042 | 0.082 ± 0.084 |
| GIN | C.1 - M${}_{\uparrow}$I${}_{\uparrow}$ | 0.161 ± 0.099 | 0.010 ± 0.059 | 0.050 ± 0.042 |
| | C.2 - M${}_{\uparrow}$I${}_{\downarrow}$ | 0.211 ± 0.076 | 0.001 ± 0.046 | 0.061 ± 0.029 |
| | C.3 - M${}_{\downarrow}$I${}_{\uparrow}$ | 0.112 ± 0.070 | −0.006 ± 0.060 | 0.031 ± 0.023 |

**Table 10.** Transfer metrics for real-world graph classification experiments (10 runs). Bold results are statistically greater than the control at $p=0.1$. We evaluate significance for each model/metric combination.

| Model | Source Task → HIV-Target | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| Control | HIV-Source [Damaged] | −0.002 ± 0.015 | 0.002 ± 0.017 | 0.014 ± 0.012 |
| GCN | HIV-Source | 0.065 ± 0.010 | 0.148 ± 0.009 | 0.031 ± 0.017 |
| | BBBP | 0.036 ± 0.012 | 0.047 ± 0.016 | 0.026 ± 0.011 |
| | BBBP [Damaged] | −0.007 ± 0.013 | −0.008 ± 0.021 | 0.000 ± 0.011 |
| Control | HIV-Source [Damaged] | −0.069 ± 0.023 | −0.029 ± 0.049 | −0.038 ± 0.021 |
| G’SAGE | HIV-Source | 0.030 ± 0.006 | 0.160 ± 0.016 | 0.011 ± 0.014 |
| | BBBP | 0.048 ± 0.008 | 0.072 ± 0.011 | 0.035 ± 0.009 |
| | BBBP [Damaged] | −0.064 ± 0.058 | −0.067 ± 0.052 | −0.042 ± 0.064 |
| Control | HIV-Source [Damaged] | −0.197 ± 0.037 | 0.039 ± 0.048 | −0.130 ± 0.051 |
| GIN | HIV-Source | 0.033 ± 0.016 | 0.186 ± 0.045 | 0.029 ± 0.040 |
| | BBBP | −0.059 ± 0.033 | 0.026 ± 0.046 | −0.081 ± 0.075 |
| | BBBP [Damaged] | −0.157 ± 0.038 | −0.013 ± 0.017 | −0.136 ± 0.049 |

**Table 11.** Transfer metrics for real-world graph classification experiments (10 runs). Bold results are not statistically greater than the best at $p=0.1$. We evaluate significance for each source-task/metric combination.

| Source Task → HIV-Target | Model | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| HIV-Source | GCN | 0.065 ± 0.010 | 0.148 ± 0.009 | 0.031 ± 0.017 |
| | G’SAGE | 0.030 ± 0.006 | 0.160 ± 0.016 | 0.011 ± 0.014 |
| | GIN | 0.033 ± 0.016 | 0.186 ± 0.045 | 0.029 ± 0.040 |
| BBBP | GCN | 0.036 ± 0.012 | 0.047 ± 0.016 | 0.026 ± 0.011 |
| | G’SAGE | 0.048 ± 0.008 | 0.072 ± 0.011 | 0.035 ± 0.009 |
| | GIN | −0.059 ± 0.033 | 0.026 ± 0.046 | −0.081 ± 0.075 |

**Table 12.** Transfer metrics for synthetic graph classification (10 runs). Bold results are not statistically greater than the best at $p=0.1$. We evaluate significance for each model/metric combination.

| Model | Source Task | Transfer Ratio | Jumpstart | Asymptotic Performance |
|---|---|---|---|---|
| GCN | C.5 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | −0.026 ± 0.012 | −0.046 ± 0.024 | −0.019 ± 0.011 |
| | C.6 - ${I}_{\uparrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | −0.029 ± 0.009 | −0.047 ± 0.022 | −0.019 ± 0.009 |
| | C.7 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$ | −0.020 ± 0.005 | −0.046 ± 0.028 | −0.012 ± 0.012 |
| G’SAGE | C.5 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | 0.007 ± 0.006 | −0.009 ± 0.030 | −0.003 ± 0.011 |
| | C.6 - ${I}_{\uparrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | 0.008 ± 0.010 | −0.006 ± 0.024 | −0.005 ± 0.014 |
| | C.7 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$ | 0.015 ± 0.011 | −0.016 ± 0.038 | 0.001 ± 0.011 |
| GIN | C.5 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | 0.012 ± 0.021 | −0.010 ± 0.020 | 0.025 ± 0.016 |
| | C.6 - ${I}_{\uparrow}^{\mathrm{S}}$${I}_{\downarrow}^{\mathrm{A}}$ | 0.001 ± 0.016 | −0.023 ± 0.017 | 0.004 ± 0.016 |
| | C.7 - ${I}_{\downarrow}^{\mathrm{S}}$${I}_{\uparrow}^{\mathrm{A}}$ | 0.027 ± 0.017 | −0.021 ± 0.026 | 0.027 ± 0.013 |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kooverjee, N.; James, S.; van Zyl, T.
Investigating Transfer Learning in Graph Neural Networks. *Electronics* **2022**, *11*, 1202.
https://doi.org/10.3390/electronics11081202
