A Survey of Computationally Efficient Graph Neural Networks for Reconfigurable Systems

Kose, Habib Taha; Nunez-Yanez, Jose; Piechocki, Robert; Pope, James

doi:10.3390/info15070377

by Habib Taha Kose ¹, Jose Nunez-Yanez ²,*, Robert Piechocki ¹ and James Pope ¹,³

¹ School of Electrical, Electronic and Mechanical Engineering, University of Bristol, Bristol BS8 1UB, UK
² Department of Electrical Engineering, University of Linköping, SE-581 83 Linköping, Sweden
³ School of Engineering Mathematics and Technology, University of Bristol, Bristol BS8 1TW, UK
* Author to whom correspondence should be addressed.

Information 2024, 15(7), 377; https://doi.org/10.3390/info15070377

Submission received: 22 May 2024 / Revised: 18 June 2024 / Accepted: 21 June 2024 / Published: 28 June 2024

(This article belongs to the Special Issue Artificial Intelligence on the Edge)

Abstract

Graph neural networks (GNNs) are powerful models capable of managing intricate connections in non-Euclidean data, such as social networks, physical systems, chemical structures, and communication networks. Despite their effectiveness, the large-scale and complex nature of graph data demands substantial computational resources and high performance during both training and inference stages, presenting significant challenges, particularly in the context of embedded systems. Recent studies on GNNs have investigated both software and hardware solutions to enhance computational efficiency. Earlier studies on deep neural networks (DNNs) have indicated that methods like reconfigurable hardware and quantization are beneficial in addressing these issues. Unlike DNN research, studies on efficient computational methods for GNNs are less developed and require more exploration. This survey reviews the latest developments in quantization and FPGA-based acceleration for GNNs, showcasing the capabilities of reconfigurable systems (often FPGAs) to offer customized solutions in environments marked by significant sparsity and the necessity for dynamic load management. It also emphasizes the role of quantization in reducing both computational and memory demands through the use of fixed-point arithmetic and streamlined vector formats. This paper concentrates on low-power, resource-limited devices over general hardware accelerators and reviews research applicable to embedded systems. Additionally, it provides a detailed discussion of potential research gaps, foundational knowledge, obstacles, and prospective future directions.

Keywords:

graph neural networks; FPGA; acceleration; embedded device; quantization

1. Introduction

Recent scientific research has witnessed remarkable strides in deep learning methodologies and neural network architectures. These advances have garnered considerable attention from researchers due to their extensive applicability in the academic and commercial domains [1]. Deep learning techniques have demonstrated efficacy across diverse domains such as computer vision, natural language processing, medical imaging, and communication systems. The key to this success lies in the adept use of large datasets with significant computing resources by specialized network models. Therefore, both software- and hardware-based applications have been the subject of extensive research aimed at enhancing the prospects of deep learning and facilitating its widespread adoption.

Deep learning is a process of learning complex input data using simple parts [1]. In common machine learning and deep learning applications, these input data are typically represented by Euclidean structures. In contrast, graph data with complex relationships, such as physical phenomena, chemical bonds, protein structures, and diseases, are represented by non-Euclidean data and require specialized models [2]. Current machine learning methods for non-Euclidean data have limited performance due to their high computational cost and implementation inflexibility. Although deep learning models have proven successful on vector-based inputs, graph neural networks (GNNs) have attracted the attention of researchers due to their ability to learn complicated relationships between nodes and edges in a graph [1]. Research has shown that GNNs outperform classical deep learning models when dealing with non-Euclidean data [2].

GNNs are high-accuracy neural network models that can be trained on graph data and used for inference. Graph datasets often have complex relationships between their nodes, and GNNs can use these relationships to successfully learn local and global information. The unique structure of GNNs demonstrates impressive performance in social networks [3], friend recommendations, molecular bonding [4], e-commerce [5], and product recommendation systems [6]. Successful results have been achieved in various fields, including particle physics [7,8], natural language processing [9], traffic applications [10], anomaly detection [11], 3D manufacturing processes [12], and many academic studies [13,14,15]. In addition, GNN technology has attracted the attention of technology companies and major corporations such as Google [16], Amazon [17], Facebook [18], and Alibaba [6], which have started to include it in their strategic plans. Current research shows that GNNs will have wide-ranging applications in various areas of life, and it is foreseen that this technology will play an important role in various applications such as information processing analysis, science, industry, and daily life.

Compared to the Euclidean data used in classical neural network models, graph data require more computationally intensive resources [1]. The massive size of the data and the complexity of the connections increase the effectiveness of the GNN on graph datasets [19,20]. In addition, the irregular nature and instability of graph data pose several computational challenges [21]. Current GNN implementations for learning these complex datasets use frameworks such as the Deep Graph Library (DGL) [17], PyTorch Geometric (PyG) [22], and TensorFlow GNN [23]. These frameworks provide software support for graph neural networks on CPUs and GPUs, which are often used for convolutional neural networks (CNNs) and other well-known methods [24,25]. However, for embedded device applications such as edge devices, IoT, and mobile applications, generic frameworks are insufficient in terms of integration and efficiency. To overcome this problem, there are several works in the literature to build lightweight and fast GNN models. While these studies cover both traditional and embedded hardware, GPU and CPU implementations form a large part of the overall topic. Although the use of low-bit precision and Field-Programmable Gate Array-based (FPGA-based) accelerators to derive low-dimensional and real-time models is a well-known research topic in neural networks (NNs) [26] and convolutional neural networks [27], there has been insufficient research on GNNs [21,25,28,29]. Therefore, the study of quantized GNN models for accelerators is a promising potential research topic.

Developments in GNNs have increased the range of applications of the models, and specialized GNN implementations require specialized hardware techniques [21]. In previous neural network studies, hardware accelerators based on FPGAs have often been the preferred solution when classical neural networks face challenging situations. In the same spirit, these specialized accelerators are seen as crucial in improving the performance of GNN models and making them more suitable for use on embedded devices [30]. However, there has not yet been enough research conducted on this topic in GNNs [31]. Due to their high computational power, FPGAs are known as ideal candidates for the efficient processing of complex algorithms and fast application execution. Furthermore, the flexibility of FPGA hardware components allows customization to meet the unique requirements of users, enabling neural network models to be optimized to suit specific demands, ultimately leading to increased performance. In addition, specially designed and optimized hardware reduces power consumption by significantly reducing computational costs [32]. Low power consumption can increase the energy efficiency of embedded devices, making it advantageous for mobile applications. FPGAs offer flexibility, which opens up innovative possibilities for researchers and developers [33]. The research community aims to utilize FPGA-based accelerators to address issues such as load imbalance, memory requirements, and computing power [31]. Improved performance in GNN-based applications can be achieved through the effective optimization of FPGAs. However, for larger models, additional approaches are required to achieve performance improvements due to limited storage and computing power.

Neural network quantization constitutes an essential technique for the scaling of network models, as well as the efficient computation on hardware such as FPGAs. It serves as a means to enhance computational efficiency and alleviate memory demands by representing the model parameters with reduced bit precision [34]. The core objective underlying quantization is the minimization of bit usage, achieved through the mapping of real numbers to lower-precision equivalents while upholding accuracy standards [35]. This process holds significant promise for facilitating high-performance applications in the future by enabling a streamlined representation of the parameters crucial for matrix multiplication operations. A notable advantage of quantization is its capacity to accommodate lightweight neural networks by employing fewer bits for both activation and model weights. Integer-based representations, which are central to quantization, offer enhanced processing efficiency on FPGAs compared with floating-point numbers [21]. Although floating-point calculations often entail intricate operations, integer computations can be executed more straightforwardly and with greater efficiency [36]. This attribute is particularly advantageous for the development of efficient systems tailored for embedded devices, especially when coupled with specialized hardware, such as FPGAs. The significance of quantization techniques transcends mere computational efficiency; they play a pivotal role in optimizing the performance of FPGA-based hardware accelerators in GNN systems.

The literature examines graph neural networks (GNNs) in depth, covering general knowledge, network structures, potential gaps, and perspectives through various surveys [1,2,13,16,19,20,37,38,39,40,41,42,43,44,45,46,47,48]. In addition, the existing literature includes surveys and reviews focusing on specific applications [49,50,51,52]. For instance, Lamb et al. [53] conducted a study on the use of GNNs as a neural-symbolic computing tool, whereas Malekzadeh et al. [54] analyzed their use in text classification. Ahmad et al. [55] examined the use of graph convolutional neural networks (GCNs) in human action recognition and proposed a taxonomy of studies in this area. Other studies have provided detailed insights into the use of GNNs in various applications such as the Internet of Things (IoT) [56], network science [57], and language processing [58]. There is also general information on accelerators and efficient GNNs [30,59,60]. Liu et al. [61] approach current and future GNN work from an algorithmic perspective, while Abadal et al. [62] provide a comprehensive overview of the acceleration algorithms and GNN fundamentals.

This paper reviews studies on various architectures and designs of GNNs, including those that are both partially and fully quantized, as well as hardware-based accelerators. The GNNs examined primarily focus on computational efficiency. For hardware accelerators, the review starts with FPGA implementations suitable for edge devices and extends to larger-scale applications. This survey includes not only studies focused on specific GNN models but also those that utilize diverse model characteristics, offering a broad perspective. Our aim is to provide readers with a comprehensive understanding of the different aspects of GNNs and the latest advancements in the field. As shown in Figure 1, we review hardware-based accelerators and quantization approaches for computationally efficient GNNs, with a particular focus on energy-constrained embedded device applications.

This survey explores quantization methods and FPGA-based hardware accelerators to reduce the computational cost and model complexity of GNNs. It focuses on quantization approaches that can be applied directly to embedded devices or may be compatible with such hardware in the future, on FPGA-based hardware accelerators, and on efficient system designs that use these two complementary methods together. We also provide an overview of the work on FPGA-based quantized GNNs and a review of other work on medium- to large-scale FPGA accelerators and quantization methods. The reviewed studies were found by searching digital libraries such as MDPI, IEEE Xplore, and ACM for the keywords GNN/GCN, accelerated/reconfigurable hardware (FPGA), embedded systems, and quantization techniques. The motivation behind this survey is twofold: first, to provide a comprehensive study of GNN quantization, which is a promising complementary method for FPGA accelerators; second, to update the existing literature, which has become outdated due to the recent increase in research on FPGA-based GNN accelerators. This research makes the following contributions to the academic community:

GNN Basics and Theories: This paper presents the basic concepts of GNNs and their layers. Furthermore, it proposes a taxonomy of GNNs according to their variations.
Quantization Methods and Lightweight Models: This survey includes reviews of quantization methods aimed at building lightweight GNN models for embedded systems or GPU- and CPU-based applications.
FPGA-Based Hardware Accelerators: Our research describes in detail the work currently conducted on hardware-based accelerators (typically FPGAs) that can be used in current or future embedded device applications.
Discussion and Future Research Directions: This study discusses the outcomes of the reviewed work, outlines directions for future research based on the findings, and provides insights into possible research gaps.

The following sections provide a background with general information about GNN models, GNN quantization including quantization methods and studies, and FPGA-based hardware accelerator approaches. This paper finishes with future directions and a conclusion.

2. Background

Designed to uncover patterns within graph data by scrutinizing the relationships between nodes and edges, GNN models offer an effective means of extracting insights from intricate network structures. Various GNN models with different characteristics are documented in the literature [63,64,65,66,67,68,69,70,71,72]. Figure 2 illustrates an updated taxonomy that categorizes GNN variants based on their intended applications, as delineated in the existing literature.

In Figure 2, the spectral-based GNN works with the frequency components of the graph by operating on the eigenvalues of the Laplacian matrix, whereas the spatial-based GNN uses information about neighboring nodes. Recurrent GNNs process graph sequences and use gate mechanisms. Graph autoencoders encode data into a low-dimensional latent space and then reconstruct the data from it. Graph generative networks generate graph structures using recurrent neural networks, and hierarchical GNNs summarize sets of nodes to form hierarchical structures. The federated GNN combines information from distributed data sources through federated learning. This classification summarizes the wide range of GNN architectures and their different applications for graph data.

Within the field of GNNs, one of the fundamental frameworks is message-passing graph neural networks (MPNNs). MPNNs serve as a general paradigm for GNNs, facilitating the exchange of information between nodes to learn node and edge features effectively. Specific GNN models, such as graph convolutional networks (GCNs) and graph attention networks (GATs), implement this message-passing framework using unique techniques. This flexibility and diversity within the MPNN framework allow for tailored approaches to various graph-based learning tasks, enhancing the adaptability and performance of GNN models.

Variations of GNNs, including graph convolutional networks (GCNs), graph isomorphism networks (GINs), graph attention networks (GATs), and GraphSAGE, are widely employed in this domain, serving as fundamental building blocks for graph-based applications. This section provides an overview of GNNs, encompassing their foundational principles, mathematical formulations, and commonly used layers. A list of these models can be found in Table 1. Table 2 serves as a reference for the acronyms and notations used in this and subsequent sections. This table summarizes and explains the symbols and terminology used throughout our work.

Table 1. GNN models and key features.

GNN Model	Description
GCN (graph convolutional network) [73]	For each node, a new feature vector is created by collecting information from neighboring nodes.
GIN (graph isomorphism network) [74]	For each node, an invariant feature vector is created with respect to its neighbors.
GAT (graph attention network) [75]	For each node, a new feature vector is created that considers the importance of neighboring nodes.
GraphSAGE [13]	A feature vector is generated for each node based on the different neighborhood levels.

2.1. Graph Neural Networks

A graph is defined by its vertices and edges, expressed as G = (V, E), where V represents the vertices and E represents the edges. Graph neural network (GNN) models that utilize these edges and vertices to learn the network structure typically involve two main phases, as shown in Figure 3: aggregation and combination. The aggregation phase, as shown in Equation (1), computes a new feature vector for each node by aggregating the features of its neighbouring nodes. Several techniques are used to compute these feature vectors, including weighted averages, maximum values, or summations. The aggregation operates on the set N(v) of neighbours of node v and is the step that extracts contextual information for each node.

a_{v}^{(l)} = Aggregation (h_{u}^{(l - 1)} : u \in N (v))

(1)

In Equations (1) and (2), h_{v}^{(l)} symbolizes the feature vector of node v at layer l, a_{v}^{(l)} is the aggregated neighbourhood vector of that node, and N(v) denotes the set of neighbouring nodes. Subsequently, the combination phase, as represented by Equation (2), combines these derived feature vectors to construct a high-level feature matrix. Here, the new feature vector of each node is fused with the original feature matrix, resulting in a comprehensive representation of high-level features that can be leveraged for classification or regression applications.

h_{v}^{(l)} = Combination (a_{v}^{(l)})

(2)

The sequencing of the aggregation and combination phases has been the focus of research on system efficiency [21,76]. Experimental results show that changing the order of these phases affects system efficiency. In their study, Tian et al. [25] show that adopting the CoAg ordering, in which combination is performed before aggregation, improves the efficiency of the system. This finding highlights the importance of strategic phase sequencing in optimizing the computational efficiency of GNNs.
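Why the ordering matters can be sketched with the linear part of a GCN-style layer, where aggregation is a multiplication by the adjacency matrix A and combination is a multiplication by a weight matrix W. Because matrix multiplication is associative, (A H) W and A (H W) give the same result but cost different amounts of work. The sketch below is illustrative only: the toy graph, the feature values, and the dense multiply-accumulate count are assumptions made for this example (real accelerators exploit the sparsity of A), and it is not the method of any specific cited work.

```python
def matmul(X, Y):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matmul_cost(n, k, m):
    """Multiply-accumulate count of a dense (n x k) by (k x m) product."""
    return n * k * m


# Hypothetical toy graph with 4 nodes; A is its adjacency matrix.
A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
# H: 4 nodes x 3 input features; W: 3 input features x 1 output feature.
H = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
W = [[1], [0], [-1]]

agco = matmul(matmul(A, H), W)  # aggregation first, then combination
coag = matmul(A, matmul(H, W))  # combination first, then aggregation

n, f_in, f_out = len(H), len(W), len(W[0])
cost_agco = matmul_cost(n, n, f_in) + matmul_cost(n, f_in, f_out)
cost_coag = matmul_cost(n, f_in, f_out) + matmul_cost(n, n, f_out)

print(agco == coag, cost_agco, cost_coag)
```

In this toy case the combination shrinks the feature dimension from 3 to 1, so aggregating afterwards touches fewer values. When a layer expands the feature dimension instead, the opposite ordering can be cheaper.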

A graph convolutional network (GCN) model, illustrated in Figure 3, processes graph-structured data through several methodical stages. The input to the GCN is a graph, represented by nodes labeled from A to H (input feature matrix), with edges illustrating the relationships between these nodes (adjacency matrix). The architecture of a GCN consists of multiple layers, each incorporating an activation function. As the data progress from the initial to the final layer, two primary operations are performed at each layer: aggregation and combination. During the aggregation phase, features from neighboring nodes are collected and aggregated. For instance, node B gathers features from adjacent nodes A, C, D, and E. In the combination phase, these aggregated features are integrated with the features of node B itself, producing a new feature vector. This process is repeated across all layers of the GCN, allowing the model to learn increasingly complex representations of the graph data. The final output layer produces results tailored to specific tasks, such as node classification or link prediction. This streamlined process from input graph representation through iterative aggregation and combination to the final output enables GCNs to effectively capture and utilize the intricate patterns and relationships inherent in graph-structured data.

2.2. Graph Convolutional Networks

GCNs [73] are a specialized type of GNN. These models apply the convolution operations of classical neural networks to graph data. GCNs can be expressed locally for a single edge and vertex or globally for all edges and vertices. Equation (3) shows the global representation of GCNs.

H^{(l + 1)} = σ (A H^{(l)} W^{(l)})

(3)

In Equation (3), H^{(l)} represents the input feature matrix for layer l, H^{(l + 1)} is the output of that layer, A denotes the adjacency matrix, W^{(l)} is the learnable weight matrix, and σ denotes the nonlinear activation function. The term A H^{(l)} corresponds to the aggregation phase, while H^{(l)} W^{(l)} refers to the combination phase. Note that in this equation, only the features of neighbouring nodes are combined, while the features of the node itself are not considered. Furthermore, multiplication by the adjacency matrix changes the scale of the feature vectors. As a result, higher-degree nodes exert more influence on the aggregation process because they have more neighbours, while the influence of low-degree nodes is reduced. To address these issues, Equation (4) is proposed:

H^{(l + 1)} = σ ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H^{(l)} W^{(l)})

(4)

Here, \tilde{A} = A + I is the adjacency matrix with self-loops, obtained by adding the identity matrix I to the adjacency matrix A, and \tilde{D} denotes the diagonal node degree matrix of \tilde{A}. Multiplying by {\tilde{D}}^{- \frac{1}{2}} on both sides normalizes the aggregation by node degree.
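As a minimal sketch of Equation (4), the function below builds the self-loop adjacency matrix, applies the symmetric degree normalization, and then performs aggregation and combination. It is written in plain Python so that it has no dependencies. The three-node path graph, the feature values, and the choice of ReLU for σ are assumptions made for illustration.

```python
import math


def matmul(X, Y):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W)."""
    n = len(A)
    # A_tilde = A + I adds a self-loop so each node keeps its own features.
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Diagonal of D_tilde^{-1/2}, where D_tilde holds the row sums of A_tilde.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_tilde]
    A_hat = [[d_inv_sqrt[i] * A_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
             for i in range(n)]
    Z = matmul(matmul(A_hat, H), W)  # aggregation, then combination
    return [[max(0.0, z) for z in row] for row in Z]  # ReLU stands in for sigma


# Hypothetical path graph 0 - 1 - 2 with one feature per node.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H = [[1.0], [2.0], [3.0]]
W = [[1.0]]

out = gcn_layer(A, H, W)
print(out)
```

Node 0 has degree 2 after the self-loop is added and node 1 has degree 3, so the output for node 0 is 1/2 of its own feature plus 1/sqrt(6) of the feature of node 1.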

2.3. Graph Isomorphism Networks

In 2019, Xu et al. [74] introduced graph isomorphism networks (GINs) as a solution to the analytical challenges posed by graph convolutional networks (GCNs) and GraphSAGE, particularly in the context of certain straightforward graph structures. GIN is a recent addition to the graph neural network (GNN) domain and takes a distinctive approach compared to its counterparts.

Equation (5) shows the mathematical formula for GIN, where h_{v}^{(k)} is the feature vector of node v at layer k, MLP is a multi-layer perceptron, and ϵ^{(k)} represents the learnable parameter of layer k.

h_{v}^{(k)} = MLP^{(k)} ((1 + ϵ^{(k)}) \cdot h_{v}^{(k - 1)} + \sum_{u \in N (v)} h_{u}^{(k - 1)})

(5)

In contrast to conventional models, GIN is designed to map non-isomorphic graphs to different representations. Because its sum aggregation followed by an MLP can act as an injective function on the multiset of neighbour features, GIN can be as discriminative as the Weisfeiler–Lehman isomorphism test under certain conditions. This property gives GIN a stronger theoretical footing and enhances its ability to distinguish graphs whose topologies differ, which makes it a notable model in the landscape of graph neural networks.
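The GIN update can be sketched as follows, reading Equation (5) as the scaled feature of the node itself plus the sum of its neighbours' features, passed through an MLP. The example also shows why the sum matters: two neighbourhoods that contain the same feature vector a different number of times look identical to a mean aggregator but not to GIN. The function `tiny_mlp` and its hand-picked weights are hypothetical and exist only to make the example concrete.

```python
def tiny_mlp(x):
    """Hypothetical two-layer perceptron with fixed weights: ReLU(x W1) W2."""
    W1 = [[1.0, -1.0], [0.5, 2.0]]  # 2 inputs -> 2 hidden units
    W2 = [[1.0], [1.0]]             # 2 hidden units -> 1 output
    hidden = [max(0.0, sum(x[i] * W1[i][j] for i in range(2))) for j in range(2)]
    return [sum(hidden[j] * W2[j][0] for j in range(2))]


def gin_update(h_v, neighbor_feats, eps, mlp):
    """h_v_new = MLP((1 + eps) * h_v + sum of neighbour features)."""
    dim = len(h_v)
    summed = [sum(h_u[d] for h_u in neighbor_feats) for d in range(dim)]
    return mlp([(1.0 + eps) * h_v[d] + summed[d] for d in range(dim)])


def mean_aggregate(neighbor_feats):
    """Mean aggregator, shown only for comparison with the sum used by GIN."""
    dim = len(neighbor_feats[0])
    return [sum(h_u[d] for h_u in neighbor_feats) / len(neighbor_feats)
            for d in range(dim)]


h_v = [1.0, 0.0]
one_neighbor = [[1.0, 1.0]]
two_neighbors = [[1.0, 1.0], [1.0, 1.0]]

out_a = gin_update(h_v, one_neighbor, eps=0.0, mlp=tiny_mlp)
out_b = gin_update(h_v, two_neighbors, eps=0.0, mlp=tiny_mlp)
mean_a = mean_aggregate(one_neighbor)
mean_b = mean_aggregate(two_neighbors)

print(out_a, out_b, mean_a == mean_b)
```

The two GIN outputs differ because the sum preserves how many neighbours contributed, while the mean of both neighbourhoods is the same vector.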

2.4. Graph Attention Networks

Graph attention networks (GATs) [75], proposed by Veličković et al. in 2018, are one of the leading models in the GNN family. Unlike traditional GNN models, GAT learns the importance of neighbouring nodes using an attention mechanism and directs information propagation according to this importance. In this way, GAT can improve the learning performance of the model by giving more weight to information from important neighbouring nodes. Similar to GCNs, this model computes a representation vector for each node; in addition, it computes an attention weight for each edge. In GAT, the hidden representation vectors of a node and of its neighbouring nodes are used to compute this attention.

α_{i j} = \frac{exp (LeakyReLU ({\vec{a}}^{T} [W \vec{h_{i}} | | W \vec{h_{j}}]))}{\sum_{k \in N_{i}} exp (LeakyReLU ({\vec{a}}^{T} [W \vec{h_{i}} | | W \vec{h_{k}}]))}

(6)

For each node, GAT calculates the attention coefficients for its neighbouring nodes (Equation (6)). These coefficients indicate how important a neighbouring node is. To calculate them, the node's hidden representation vector and the hidden representation vector of the neighbouring node are used. In Equation (6), α_{i j} denotes the attention coefficients, \vec{a} is the learnable attention vector, W is the linear transformation weight matrix, T denotes the transpose, and | | denotes concatenation. Equation (7) shows the mathematical formula for GAT. Here, K is the number of heads, and α_{i j}^{k} and W^{k} are the attention coefficients and the linear transformation weight matrix of head k:

{\vec{h}}_{i}^{'} = σ (\frac{1}{K} \sum_{k = 1}^{K} \sum_{j \in N_{i}} α_{i j}^{k} W^{k} \vec{h_{j}})

(7)
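The attention computation of Equation (6) and the weighted sum inside Equation (7) can be sketched for a single node and a single head (K = 1). The vectors below are assumed to be already transformed by W, the attention vector `a` is hand-picked rather than learned, and the LeakyReLU slope of 0.2 is an assumption of this sketch. The outer activation σ is omitted for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope value is an assumption of this sketch."""
    return x if x > 0 else slope * x


def attention_coefficients(Wh_i, Wh_neighbors, a):
    """Softmax over LeakyReLU(a^T [Wh_i || Wh_j]) for each neighbour j."""
    scores = []
    for Wh_j in Wh_neighbors:
        concat = Wh_i + Wh_j  # list concatenation plays the role of ||
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    top = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(Wh_neighbors, alpha):
    """Single-head weighted sum over neighbours: sum_j alpha_ij * Wh_j."""
    dim = len(Wh_neighbors[0])
    return [sum(al * Wh_j[d] for al, Wh_j in zip(alpha, Wh_neighbors))
            for d in range(dim)]


# Hypothetical, already-transformed features W h for node i and three neighbours.
Wh_i = [1.0, 0.0]
Wh_neighbors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
a = [0.5, 0.5, 1.0, -1.0]  # hand-picked attention vector of length 2 * dim

alpha = attention_coefficients(Wh_i, Wh_neighbors, a)
h_i_new = attend(Wh_neighbors, alpha)
print(alpha, h_i_new)
```

The coefficients sum to one over the neighbourhood, so the output is a convex combination of the neighbours' transformed features.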

2.5. GraphSAGE

GraphSAGE [13] offers a general inductive learning framework that is especially useful for large and complex graphs. The GraphSAGE model, proposed by William L. Hamilton et al., does not learn a separate embedding for each training node; instead, it learns functions that generate a node's representation by sampling and aggregating features from its local neighbourhood. This enables the model to generalize effectively to previously unseen nodes. GraphSAGE is a scalable framework that can accommodate other models through its choice of aggregation and combination functions.

GraphSAGE creates a representation for each node based on its own features and those of its neighbours. An aggregation function combines the neighbour features of each node. The authors analyzed mean, LSTM, and pooling aggregators and found no significant difference between max- and mean-pooling; max-pooling was therefore used in their study. Equation (8) shows the max-pooling aggregation function, where max is the element-wise maximum operator and σ is the activation function.

{AGGREGATE}_{k}^{p o o l} = \max ({σ (W_{p o o l} h_{u_{i}}^{k} + b), \forall u_{i} \in N (v)})

(8)

In the combination stage, the information obtained from the aggregation functions is combined with a learnable model parameter. This combination allows for the weighting of the node’s own features and neighbour information.
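The max-pooling aggregator of Equation (8) can be sketched in a few lines: each neighbour feature vector is passed through a shared linear layer and an activation, and the element-wise maximum is taken over the neighbourhood. The weights, bias, and neighbour features below are made-up values, and ReLU is assumed for σ.

```python
def relu(x):
    """ReLU is assumed here as the activation sigma."""
    return max(0.0, x)


def max_pool_aggregate(neighbor_feats, W_pool, b):
    """Element-wise max over sigma(W_pool h_u + b) for all neighbours u."""
    transformed = []
    for h_u in neighbor_feats:
        # W_pool is stored as rows of shape (out_dim x in_dim).
        out = [relu(sum(w * x for w, x in zip(W_row, h_u)) + b_r)
               for W_row, b_r in zip(W_pool, b)]
        transformed.append(out)
    # zip(*transformed) groups each output dimension across all neighbours.
    return [max(col) for col in zip(*transformed)]


# Hypothetical neighbourhood of three nodes with two features each.
neighbor_feats = [[1.0, 2.0], [3.0, -1.0], [0.0, 0.0]]
W_pool = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
b = [0.0, 0.0, -1.0]

aggregated = max_pool_aggregate(neighbor_feats, W_pool, b)
print(aggregated)
```

Because the maximum is taken per output dimension, the result does not depend on the order in which the neighbours are visited.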

2.6. Future Research Directions

While graph neural networks (GNNs) have shown significant promise in various applications, there remain several open research areas. Future work could explore the development of more efficient aggregation and combination methods to enhance computational efficiency. Additionally, investigating novel architectures that can handle dynamic graphs and evolving structures is crucial. Improving the interpretability of GNN models is another key direction, which can aid in understanding the decision-making process of these networks.

3. Graph Neural Network Quantization

3.1. Quantization

GNNs, particularly in extensive graph applications, may encounter constraints due to their intricate model complexity and size. Quantization methods present a promising remedy to tackle these issues by reducing the model's size and computational demands. Quantization entails representing model parameters with fewer bits, a pivotal step for implementing GNNs on hardware platforms such as FPGAs. However, it is crucial to recognize and handle the challenges linked with these techniques through thorough parameter adjustment procedures to optimize their advantages. This section provides an overview of the fundamental concepts of scalar and vector quantization and reviews quantization strategies developed for GNNs.

3.1.1. Scalar Quantization

Scalar quantization is a method designed to decrease the number of bits required to represent values within data. In this technique, each feature or parameter is depicted using a limited number of bits that fall within a particular range. Scalar quantization encompasses diverse methods such as uniform, non-uniform, signed, unsigned, symmetric, asymmetric, dynamic, and static quantization. These varied strategies offer adaptability in customizing the quantization process to specific needs and improving performance in various applications. Nevertheless, each approach comes with its own set of trade-offs.

The network’s real-valued input parameters are transformed into a discrete space by mapping them to quantization levels, which are evenly or unevenly divided using scalar quantization. Figure 4 illustrates the quantization levels generated by uniform and non-uniform quantization. In non-uniform quantization, the distribution of the data determines the quantization levels. Although this method yields improved quantization points, it is more complicated to execute compared to uniform quantization.

For scalar quantization, decisions must be made about the type of quantization to use, such as signed or unsigned, and symmetric or asymmetric. These decisions are typically influenced by the data distributions within the datasets and can vary accordingly. Signed quantization creates both negative and positive quantization points in the quantization domain. Unsigned quantization uses only the positive side of the number line. Symmetric and asymmetric quantization differ in how they treat the positive and negative sides of the quantization domain: symmetric quantization centres the range on zero, whereas asymmetric quantization shifts the range with an offset (zero-point) so that it follows the actual minimum and maximum of the data. Asymmetric quantization is more challenging to implement because of the additional processing this offset involves. Figure 5 illustrates the methods of symmetric–signed and asymmetric–unsigned quantization.
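The difference between symmetric–signed and asymmetric–unsigned uniform quantization can be made concrete with a small sketch. The function names, the 8-bit width, and the sample values are assumptions chosen for illustration. The sample is non-negative, as activations after a ReLU would be, which is the case where an asymmetric range pays off.

```python
def quantize_symmetric(values, bits=8):
    """Signed, symmetric: real zero maps to integer 0, range [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, bits=8):
    """Unsigned, asymmetric: [min, max] maps onto [0, 2^bits - 1]."""
    qmax = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)  # integer that represents real 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(qi - zero_point) * scale for qi in q]


values = [0.0, 0.5, 1.0, 2.0]  # hypothetical non-negative activations

q_sym, s_sym = quantize_symmetric(values)
q_asym, s_asym, zp = quantize_asymmetric(values)

err_sym = max(abs(v - r) for v, r in zip(values, dequantize(q_sym, s_sym)))
err_asym = max(abs(v - r) for v, r in zip(values, dequantize(q_asym, s_asym, zp)))
print(s_sym, s_asym, err_sym, err_asym)
```

For this data the symmetric scheme never uses its negative codes, so its step size is roughly twice that of the asymmetric scheme. The price of the asymmetric scheme is the extra zero-point that has to be carried through every arithmetic operation.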

Similar to asymmetric quantization, dynamic quantization offers another option that increases both implementation and computational complexities. Unlike static quantization, dynamic quantization recalibrates the quantization ranges for input parameters each time, enabling each input datum to be quantized within the range that best represents it. Consequently, this approach improves the accuracy of the model. However, due to the dense nature of input data in graph neural networks, dynamic quantization is anticipated to incur significant computational overhead. Nevertheless, despite its computational demands, it may be preferred in applications where the highest accuracy is crucial.
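The trade-off between static and dynamic quantization can be illustrated with a sketch in which the static scale is fixed once from a calibration sample, while the dynamic scale is recomputed from each new input. The 4-bit width and all numeric values are assumptions, chosen so that the effect is easy to see.

```python
def scale_for(values, bits):
    """Symmetric scale that maps the largest magnitude onto the top code."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax


def fake_quantize(values, scale, bits):
    """Quantize to integers and map straight back to real values."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_error(values, approx):
    """Largest absolute reconstruction error."""
    return max(abs(v - a) for v, a in zip(values, approx))


BITS = 4

# Static: the range is calibrated once, offline, on representative data.
calibration = [-4.0, 1.5, 3.0, 0.2]
static_scale = scale_for(calibration, BITS)

# A new input whose values are much smaller than the calibration data.
new_input = [0.05, -0.3, 0.4, 0.12]
static_err = max_error(new_input, fake_quantize(new_input, static_scale, BITS))

# Dynamic: the range is recomputed from the input itself at run time.
dynamic_scale = scale_for(new_input, BITS)
dynamic_err = max_error(new_input, fake_quantize(new_input, dynamic_scale, BITS))

print(static_err, dynamic_err)
```

The dynamic variant fits the input range more tightly and gives a smaller error here, but it needs an extra pass over the data to find the maximum before anything can be quantized, which is the run-time overhead described above.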

3.1.2. Vector Quantization

The core elements of vector quantization are codebooks and codewords. A codebook is the set of all available codewords, each of which is a vector. Vector quantization typically involves identifying the closest codeword based on a distance metric and associating the input vector with that codeword. Mathematically, given an input vector x and a codebook C, vector quantization is expressed as in Equation (9). Here, the function Q(x) ensures that the vector x is matched to its nearest codeword in the codebook C. Euclidean distance is commonly employed as the distance metric.

Figure 6 illustrates a vector quantization map, where Voronoi diagrams, depicted by dashed lines, play a crucial role in the quantization process. Each input vector is quantized relative to its Voronoi cell, with each x-point within these areas representing the vectors therein. The circles in the figure signify the codewords that most accurately represent the vectors in each region.

Q (x) = arg min_{w \in C} {∥ x - w ∥}^{2}

(9)

Vector quantization involves grouping values together and mapping them to a set of prototype vectors, typically in a high-dimensional space. Because of this, vector quantization is more effective than scalar quantization in preserving the structure and relationships of data, making it preferable for processing more complex data structures. In FPGA-based applications, both scalar and vector quantization techniques can be utilized. Scalar quantization offers a simpler and less resource-intensive implementation on FPGAs, whereas vector quantization can handle more intricate data structures and relationships but generally requires more resources and is more complex to implement. When designing FPGA-based applications, it is important to choose the optimal solution by considering the advantages and limitations of both quantization methods.
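Equation (9) amounts to a nearest-neighbour search over the codebook. The sketch below uses a made-up two-dimensional codebook with four codewords; in practice the codebook would be learned from data, for example by clustering, which is outside the scope of this example.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def quantize_vector(x, codebook):
    """Return (index, codeword) of the codeword nearest to x."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(x, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook with four 2-D codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0], [3.0, 0.0]]
vectors = [[0.9, 1.2], [2.6, -0.1], [-0.2, 0.1], [-0.8, 1.7]]

indices = [quantize_vector(x, codebook)[0] for x in vectors]
reconstructed = [codebook[i] for i in indices]
print(indices)
```

With four codewords, each input vector is stored as a 2-bit index instead of two real numbers. The codebook itself still has to be stored and searched, which is part of the additional resource cost mentioned above.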

3.2. Quantization Approaches for GNNs

Quantization is a powerful approach for improving the efficiency of GNN models across both the training and inference phases. By reducing the memory footprint and the usage of computational resources, quantization methods offer a viable way to improve the efficiency of GNNs. Quantization techniques are commonly used across various hardware platforms, including large-scale systems such as GPUs and CPUs, as well as small-scale devices such as IoT and edge devices. This preference stems from their ability to achieve high compression rates and deliver computational speed advantages, making them indispensable tools for streamlined GNN deployment in real-world scenarios. Knowing which parameters are quantized in the reviewed studies is important for understanding this field. In addition to activations and weights, some concepts that are frequently referred to later in the paper are explained below:

Activation functions determine the output of each layer and provide a nonlinear transformation. They are necessary for the model to learn nonlinear relationships and are therefore an essential component. Activations can be quantized to reduce computational costs.
Feature matrix is a matrix that holds all the node features of a graph. Each row represents a node and each column represents a feature. The feature matrix is typically the largest input to the GNN, and quantization significantly reduces the size and processing cost of these data.
Weights are learnable parameters that refer to the connections between nodes or layers and are optimized to improve the predictions of the model. Quantizing the weights to a lower number of bits reduces the memory requirements and computational costs of the model and facilitates the fitting of pre-trained models.
Attention coefficients determine the degree of importance between nodes. These coefficients are calculated using the attention mechanism and allow the model to focus on specific nodes or edges. Quantizing these coefficients in attention-based GNN models is important for obtaining fully quantized, efficient networks.
Convolution matrix is a matrix used to combine the features of neighboring nodes and create new feature vectors. This matrix usually contains learnable weights and carries information specific to the graph structure. Quantization can reduce the memory and processing power required to store and compute this matrix.
Weighted adjacency matrix represents the relationships between the nodes of a graph and the strength of these relationships. The weighted adjacency matrix is one of the basic building blocks of GNNs for information propagation. Quantization in models containing this matrix can reduce the computational cost.

This section provides a review of the quantization studies for GNNs in the literature. Details of some studies are listed in Table 3. The reviewed works use different datasets and GNN layers. To ensure a fair comparison, accuracy graphs based on the results of studies with common datasets and layers are presented in Figure 7 and Figure 8.

Quantization is an effective approach to overcome the challenges posed by graph data, not only for embedded devices but also for GPU-based applications. Tango [77] aims to accelerate GNN training on GPU systems by targeting core primitives such as GEMM, SpMM, and SDDMM. The authors propose a set of rules and stochastic rounding for GPUs to speed up training without compromising accuracy. In addition, on-the-fly quantization and dequantization techniques are used to reduce the GNN training time. Tango can be integrated with DGL for performance enhancement without requiring any modifications. Novkin et al. [78] propose a QAT approach that simulates approximate multiplication operations using CUDA kernels. The method aims to maintain the overall accuracy of the system by quantizing the model’s weights and activations to different bit widths, incorporating quantization and approximation-aware training. The authors share their experimental results with layers such as GAT, GIN, SAGE, and GCN on different datasets using the PyTorch Geometric framework. Another GPU-based study, AdaQP [79], takes advantage of integer quantization to reduce communication traffic between devices during distributed training. The authors propose two approaches to address this issue in their new system. The first approach involves reducing communication traffic by lowering the precision of transmitted messages through stochastic integer quantization. The second approach is to optimize resource usage by running the computation of central nodes in parallel with the message communication of nodes on the partition boundary.
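Stochastic rounding, which both Tango and AdaQP rely on, can be sketched as follows. This is a generic NumPy illustration, not the GPU kernels of either work; the per-tensor min/max range and the function names are assumptions.

```python
import numpy as np

def stochastic_quantize(x, num_bits=8, rng=None):
    """Quantize x to num_bits integers, rounding up with probability equal to the fractional part."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    scaled = (x - lo) / scale
    floor = np.floor(scaled)
    # Rounding up with probability (scaled - floor) makes the result unbiased in expectation.
    q = floor + (rng.random(x.shape) < (scaled - floor))
    return np.clip(q, 0, levels).astype(np.int32), scale, lo

def dequantize(q, scale, lo):
    """Map integer codes back to approximate real values."""
    return q * scale + lo

# Many copies of 0.3 quantized to 2 bits over the range [0, 1]:
# deterministic rounding would map every copy to 1/3, stochastic rounding averages to ~0.3.
x = np.array([0.0, 1.0] + [0.3] * 20000)
q, scale, lo = stochastic_quantize(x, num_bits=2, rng=np.random.default_rng(0))
mean_recovered = float(dequantize(q, scale, lo)[2:].mean())
```

Because the rounding error averages out over many values, low-precision messages or gradients stay unbiased, which is the property that makes this rounding mode attractive for training.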

GCINT [80] proposes an alternative QAT method for INT tensor cores of GPUs. This method quantizes all model parameters, including weights, activations, gradients, errors, and loss, to 8-bit integers (INT8). The dynamic structure of this method establishes an architecture that is independent of datasets and weight distributions by adaptively adjusting the quantization range. Another study that aims to accelerate GNNs on GPU tensor cores using CUDA kernels is QGTC [81]. QGTC provides a flexible structure that supports various bit widths and computation optimizations. The authors demonstrate the integration of their approach with PyTorch, showcasing fast results with popular frameworks by using quantized adjacency matrices, weight matrices, and node embedding matrices.

Scalar and vector quantization can be used separately, although there are instances in the literature where they are used together to achieve high compression rates and computational efficiency using integer arithmetic. BiFeat [82] proposes feature quantization in GNNs through two variants, BiFeat-SQ and BiFeat-VQ. The authors claim that this approach resolves GPU memory and bottleneck issues with an acceptable loss of accuracy, providing up to a threefold acceleration.

Quantization is a robust method for making GNNs applicable to energy-constrained devices. Eliasof et al. [83] note in their study that the complex and large structure of GNNs is not suitable for resource-constrained devices. The authors demonstrate in their research that the adverse effects of using very low bit counts on accuracy can be mitigated through Haar Wavelet transformation. As a result of this QAT approach, it is shown that the use of 4- and 8-bit values provides gains with minimal loss in both memory and computation, offering a potential solution to the bottleneck issues in devices used in real-world applications. Another study in this area is the segmented quantization method proposed by Dai et al. [84] with the aim of high accuracy and low cost of computation. This research reduces the error caused by linear quantization through the creation of segments. Furthermore, the study introduces a hardware design that utilizes the benefits of quantization for GNN accelerators.

Dissatisfied with current mixed-precision use in CPUs and GPUs, Zhu et al. [85] propose a scalable hardware accelerator for learning quantization parameters using their proposed addition-aware mixed-precision quantization method. The authors recommend using different bit widths for each node, local gradient, and the nearest neighbor strategy to address the significant accuracy losses resulting from the utilization of quantization methods that overlook the overall structure of GNNs. Another study, proposed by Wang et al. [86], aims at an efficient GNN structure for resource-constrained embedded systems. This paper presents a GNN architecture that considers quantization effects throughout all its stages, including QLR-BT (quantization learn range–skewness-aware bitwise truncation) and SMP (smoothness-aware message propagation). QLR aims to reduce the model size, while SMP aims to maintain accuracy by preventing excessive smoothing.

EXACT [87] is proposed by Liu et al. as a method to obtain lighter GNN models through random projection and quantization. The authors provide a GPU implementation and use random projection to represent activations in a low-dimensional space while compressing activations with integer quantization. They note that random projection and quantization interact, and that applying quantization after random projection has only a limited additional effect. However, they emphasize that the combination yields better results in terms of time than using either technique alone. In this method, only activations are quantized, while gradients are computed using their dequantized versions, and all multiplications are performed with full precision due to the GPU hardware used. The study results demonstrate an improvement in memory space and time overhead with acceptable accuracy losses. The study proposes using the EXACT framework as an extension of PyTorch and PyTorch Geometric in combination with various GNN models.

In addition, Eliassen et al. [88] present an improved EXACT version. This application reduces memory consumption and training time through block-based quantization, resulting in a memory gain of over 15% compared to the original EXACT method. The authors achieve both memory savings and model acceleration by quantizing node embeddings in larger pieces rather than individually. While the original EXACT method limits quantization boundaries to integer values, assuming a uniform distribution of activations, Eliassen et al. show that activation maps should be expressed with a normal distribution. This approach improves the quantization process by providing variance estimates for the distribution of intermediate activations.

Table 3. Quantization studies on graph neural networks.

Publication	Year	Targeted Problem	Quantized Parameters	Datasets
Degree-Quant [36]	2020	Inefficient training/inference time	Weights, Activations	Cora, Citeseer, ZINC, MNIST, CIFAR10, Reddit-Binary
EXACT [87]	2021	High memory need for training	Activations	Reddit, Flickr, Yelp, ogbn-arxiv, ogbn-products
VQ-GNN [89]	2021	Challenges of sampling-based techniques	Convolution matrix, Node feature matrix	ogbn-arxiv, Reddit, PPI, ogbl-collab
SGQuant [90]	2020	High memory footprint for energy-limited devices	Feature matrix	Cora, Citeseer, Pubmed, Amazon-computers, Reddit
LPGNAS [91]	2020	Insufficient optimization research in the field	Weights, Activations	Cora, Citeseer, Pubmed, Amazon-computers and photos, Flickr, CoraFull, Yelp
Bahri et al. [92]	2021	Challenges in real-world applications due to model size and energy requirements	Weights, Feature matrix	ogbn-products, ogbn-proteins
Wang et al. [93]	2021	Model inefficiency and scaling problems with inefficient real-valued parameters	Weights, Attention coefficients, Node embeddings	Cora, Citeseer, Pubmed, Facebook, wiki-vote, Brazil, USA
EPQuant [94]	2022	Limited availability on edge devices due to high requirements	Input embeddings, Weights, Learnable parameters	Cora, Citeseer, Pubmed, Reddit, Amazon2M
Bi-GCN [95]	2021	High memory requirements of GNNs	Weights, Node features	Cora, Pubmed, Flickr, Reddit
Dorefa-Graph [96]	2024	Cost-effective computing on embedded systems	Feature matrix, Weights, Adjacency matrix, Activations	Cora, Pubmed, Citeseer

VQ-GNN [89] proposes a vector quantization approach to prevent neighbor explosion when scaling large GNNs. Instead of using sampling methods, Ding et al. introduce reference vectors that learn and update the messages passed in each mini-batch. The proposed message-passing and backpropagation algorithm of VQ-GNN provides a robust framework against neighborhood explosion. The paper clearly outlines the challenges faced by neighborhood-sampling, layer-sampling, and subgraph-sampling approaches. VQ-GNN offers vector quantization of convolution and node feature matrices to reduce the size of GNNs. The paper distinguishes between intra-mini-batch messages and out-of-mini-batch messages as the two ways in which the quantized messages are expressed. The authors state that their proposed method enables training and inference operations in large GNNs to be performed similarly to normal neural networks. The study focuses on performance analysis and memory size assessments. Based on the results obtained, the proposed model performed as well as or better than the other models. The authors also acknowledge that VQ-GNN may require additional memory in certain scenarios and emphasize that it should be supported by complementary techniques.

SGQuant [90] proposes a scalar quantization method to address the efficiency issues caused by the high memory utilization of graph neural network (GNN) systems, particularly in IoT and edge devices. To solve this problem, Feng et al. implement a multi-granular quantization scheme that includes layer-wise, component-wise, and topology-aware functions to express node embeddings with a low number of bits. In this process, the proposed algorithm is used to automatically select the appropriate number of bits. The paper highlights that aggressive quantization practices can result in high accuracy loss, while more lenient approaches offer no gain. The input features are quantized, rather than other network parameters, as they comprise the majority (99%) of the network. SGQuant utilizes a straight-through estimator for gradients and a specialized layer from the PyTorch Geometric library for quantized inference and backpropagation. The automatic bit selection algorithm aims to achieve the maximum compression ratio with minimal loss of accuracy by selecting the optimal bit value. This algorithm comprises two essential components: the machine learning cost model and the discovery scheme. The study utilized two distinct groups of datasets. The first group represented relatively small graph datasets, while the second group represented larger datasets. These datasets were passed through three different GNN layers: GCN, AGNN, and GAT.
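The straight-through estimator mentioned above can be written out explicitly. The sketch below uses plain NumPy with a hand-written backward pass; it is a generic illustration and does not reproduce SGQuant's PyTorch Geometric layer or its bit-selection algorithm. The clipping range and function names are assumptions.

```python
import numpy as np

def fake_quant_forward(x, num_bits, x_min, x_max):
    """Forward pass: clip to [x_min, x_max], snap to a uniform grid, return real values."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    x_clipped = np.clip(x, x_min, x_max)
    return np.round((x_clipped - x_min) / scale) * scale + x_min

def fake_quant_backward_ste(x, grad_out, x_min, x_max):
    """Backward pass: treat round() as the identity and block gradients where x was clipped."""
    inside = (x >= x_min) & (x <= x_max)
    return grad_out * inside

x = np.array([-2.0, 0.1, 0.6, 3.0])
y = fake_quant_forward(x, num_bits=2, x_min=0.0, x_max=1.0)
grad_x = fake_quant_backward_ste(x, np.ones_like(x), x_min=0.0, x_max=1.0)
```

The rounding step has zero gradient almost everywhere, so without this substitution no learning signal would reach the quantized node embeddings during backpropagation.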

Degree-Quant [36] proposes an integer quantization method that supports acceleration in graph neural networks (GNNs) and can be used efficiently on low-energy consumption hardware. In this work, Tailor et al. aim to efficiently utilize GNN applications on energy-limited devices such as smartphones, regardless of the architecture. The approach uses uniform quantization levels of 8- and 4-bits over message-passing neural networks (MPNNs) to achieve acceptable accuracy and speedup while taking advantage of the ease with which integer quantization can be applied to models. Three different GNN models are used in the study (GCN, GAT, GIN). The authors focus on inaccurate weight updates and unrepresentative quantization weights to deal with error sources. In the approach, protection masks are stochastically generated at each layer during training, and mask-protected nodes are represented with full precision at each stage. During the inference phase, all operations are performed with low bit values. A preprocessing phase is required to generate and compute the masks. For testing, different datasets are used for three different tasks (node classification, graph classification, and graph regression). According to the results obtained, the DQ-8-bit provides equal or more accuracy compared to fully fine-tuned base models, while the DQ-4-bit provides 8x compression. These results demonstrate the effective applicability of the Degree-Quant method in GNNs at low bit levels and its potential to improve model performance.
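The stochastic protection mask can be sketched as follows. The source only states that masks are generated stochastically and that protected nodes stay at full precision. The linear mapping from in-degree to protection probability, the bounds `p_min` and `p_max`, and the per-tensor quantizer below are all assumptions made for illustration, not the exact Degree-Quant procedure.

```python
import numpy as np

def protection_probabilities(in_degree, p_min=0.0, p_max=0.2):
    """Assumed rule: interpolate each node's protection probability linearly from its in-degree."""
    d = np.asarray(in_degree, dtype=np.float64)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    return p_min + (p_max - p_min) * norm

def fake_quantize(h, num_bits):
    """Per-tensor uniform quantization followed by dequantization."""
    levels = 2 ** num_bits - 1
    lo, hi = float(h.min()), float(h.max())
    scale = max((hi - lo) / levels, 1e-12)
    return np.round((h - lo) / scale) * scale + lo

def protected_forward(h, in_degree, rng, num_bits=4, p_min=0.0, p_max=0.2):
    """Training-time forward pass: protected rows keep full precision, the rest are quantized."""
    p = protection_probabilities(in_degree, p_min, p_max)
    mask = rng.random(p.shape) < p
    out = np.where(mask[:, None], h, fake_quantize(h, num_bits))
    return out, mask

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
probs = protection_probabilities([0, 5, 10])
out_all, mask_all = protected_forward(h, [0, 5, 10], rng, p_min=1.0, p_max=1.0)
out_none, mask_none = protected_forward(h, [0, 5, 10], rng, p_min=0.0, p_max=0.0)
```

At inference time no mask is applied and every node is quantized, which matches the description above that all operations then run at low bit widths.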

Several other works propose quantization architectures using node degrees in addition to the Degree-Quant study. Chen et al. [97] emphasize the need for efficient systems and the scarcity of computations when using GNN models on energy-constrained devices. They present a topology-based quantization strategy using Personalized PageRank (TQPP) to address this issue. TQPP analyzes the structure of the graph, determines node importance, and creates masks to preserve sensitive nodes based on their importance. This quantization approach accelerates computations and saves memory by separating nodes according to their sensitivities while maintaining accuracy. The study examines the Degree-Quant baseline and shows higher accuracy values at the same bit levels compared to the baseline. Guo et al. [98] propose a degree-based study to address the challenge of applying GNNs to resource-constrained devices. The study suggests protecting sensitive nodes with a sensitivity determiner mask while applying dynamic mixed-precision quantization to other nodes to reduce the model size without compromising accuracy. The proposed method is implemented using GCN, GAT, and GAT layers on four different datasets.

Bahri et al. [92] propose a binary quantized GNN that can operate on energy-constrained devices. The authors highlight that although the use of non-Euclidean data makes GNNs challenging in several aspects compared to CNN models, small network sizes can be achieved through a controlled training process and model design. The paper demonstrates the application of various approaches, including Hamming space, knowledge distillation, and XNOR-Net for GNNs, and presents the results obtained on an ARM device. The quantization method follows a multi-stage structure, beginning with the use of a trained base model where real numbers are used for the parameters. The quantization process is then replaced by the tanh function. During the second stage, the trained model serves as a teacher for the training of the second model. Binary activations and real number weights are used in this stage. The quantization process is structured based on the sign operator. However, this can result in continuous zero gradients, so the system employs a straight-through estimator. In the following stage, the new model becomes the teacher in the same way, and both weights and activations are binary during the next training. The paper introduces the dynamic graph CNN model and compares it to other approaches, such as direct binarization. The authors demonstrate the speedup effects of the work using a Raspberry Pi 4B board.
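The reason binarized weights and activations are cheap to compute can be shown with a small sketch of the XNOR-popcount identity used by XNOR-Net-style models. This is a generic NumPy illustration under the assumption that zero maps to +1; it is not the authors' ARM implementation, which would pack bits into machine words.

```python
import numpy as np

def binarize(x):
    """Sign binarization to {-1, +1}; zero is mapped to +1 by assumption."""
    return np.where(x >= 0, 1, -1).astype(np.int32)

def to_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0."""
    return (signs > 0).astype(np.uint8)

def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two +/-1 vectors computed from their bit encodings."""
    n = a_bits.size
    matches = int(np.count_nonzero(~(a_bits ^ b_bits) & 1))  # XNOR, then popcount
    return 2 * matches - n

a = np.array([0.5, -1.2, 0.3, -0.7, 2.0])
b = np.array([-0.1, -0.4, 0.9, 0.2, 1.5])
sa, sb = binarize(a), binarize(b)
dot_reference = int(np.dot(sa, sb))
dot_xnor = xnor_popcount_dot(to_bits(sa), to_bits(sb))
```

Both values agree, so a multiply-accumulate over +/-1 values reduces to a bitwise XNOR and a bit count.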

Another binary quantization approach is Bi-GCN [95]. The method proposed by Wang et al. attempts to overcome the problems of loading the entire graph into memory by quantizing weights, network parameters, and input features. Additionally, it aims to accelerate multiplication operations within the network using binary operations. The authors reduce quantization errors by adding a scalar value to weights and features after the binarization process. Furthermore, an efficient training phase is provided with the created backpropagation method. Wang et al. propose BGN [93], a graph attention mechanism-based approach where weights and activations are quantized as binary. In their study, an attention mask is applied to preserve the structural information of the model, and the resulting attention coefficients are expressed in binary. The authors explore two different estimators, the straight-through estimator (STE) and REINFORCE, to overcome the problem of untrainable parameters caused by zero gradients, preferring the STE due to its advantages.

EPQuant [94] is a method proposed by Huang et al. that combines product quantization and scalar quantization approaches to obtain viable GNN models for energy-constrained devices. The authors investigate the vector quantization approach and exploit the efficient structure of integer arithmetic with scalar quantization while achieving high compression with advanced product quantization. Vector quantization offers higher compression rates than scalar quantization by quantizing multiple vectors together. However, this clustering process can lead to longer processing times and increased memory requirements. To address these issues, the authors suggest using index-based and hash-based batching. The study introduces EPQ and SQ blocks that not only quantize the input data but also perform the quantization of weights and other learnable parameters within these blocks. The PyG architecture is utilized in the proposed method to replace layers with quantized versions. The study’s results test various cases where vector and scalar quantization are performed on different layers and datasets.
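Plain product quantization, the building block behind this family of methods, can be sketched as follows. The sketch does not reproduce EPQuant's enhanced variant or its batching schemes; the hand-written codebooks and function names are assumptions, and real codebooks would be learned, for example with k-means.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Split x into len(codebooks) equal sub-vectors and pick the nearest codeword in each sub-space."""
    subvectors = np.split(x, len(codebooks))
    return [int(np.argmin(np.sum((cb - s) ** 2, axis=1)))
            for cb, s in zip(codebooks, subvectors)]

def pq_decode(codes, codebooks):
    """Reconstruct an approximation of x by concatenating the selected codewords."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

# Two hypothetical sub-space codebooks, each with two 2-D codewords.
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]),
             np.array([[0.0, 1.0], [1.0, 0.0]])]
x = np.array([0.9, 1.1, 0.8, 0.1])
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)
```

Each 4-D vector is stored as two small indices, and the number of representable vectors grows multiplicatively with the number of sub-spaces while the codebooks themselves stay small.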

Kose et al. propose Dorefa-Graph [96], a fully quantized network model, to enhance GCN performance on embedded devices. The authors apply the Dorefa-Net algorithm designed for CNN models to build a lightweight GCN model, and present a modified version of Dorefa-Graph to adapt to the data structure. The paper focuses on creating a fully quantized model using scalar quantization on model parameters such as model weights, weighted adjacency matrix, input features, and activations. The study employs scalar quantization methods due to their ease of computation in FPGA applications and simple implementation in embedded devices. Dorefa-Graph offers a GCN model suitable for embedded devices. However, the results are demonstrated through simulation experiments on GPUs. The impact of quantization error and dequantization effects on accuracy during inference is analyzed by the authors in several cases. The study examined the accuracy values by testing various bit values for two GNN layers and three quantization approaches. The authors demonstrate that their proposed method outperforms the original Dorefa-Net algorithm on GNNs, particularly at low bit levels. They also show that quantization can be carried out at higher bit levels with acceptable accuracy losses.
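For reference, the original DoReFa-Net quantizers that this work starts from can be sketched as below. This shows only the forward pass of the CNN-oriented formulation and not the modified scheme of Dorefa-Graph, whose details are not given here. The function names are illustrative, and the weight quantizer assumes at least one non-zero weight and a bit width greater than one.

```python
import numpy as np

def quantize_k(r, k):
    """Quantize values in [0, 1] to k bits on a uniform grid."""
    n = 2 ** k - 1
    return np.round(r * n) / n

def dorefa_weights(w, k):
    """DoReFa-Net weight quantizer (k > 1): squash with tanh, map to [0, 1], quantize, map to [-1, 1]."""
    t = np.tanh(w)
    r = t / (2 * np.max(np.abs(t))) + 0.5
    return 2 * quantize_k(r, k) - 1

def dorefa_activations(a, k):
    """DoReFa-Net activation quantizer: clip to [0, 1], then quantize to k bits."""
    return quantize_k(np.clip(a, 0.0, 1.0), k)

rng = np.random.default_rng(0)
w_q = dorefa_weights(rng.normal(size=100), k=2)
a_q = dorefa_activations(np.array([-0.5, 0.4, 2.0]), k=1)
```

During training, the rounding inside `quantize_k` is paired with a straight-through estimator so that gradients can pass through it.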

Zhao et al. propose LPGNAS [91], an approach to systematically quantize GNNs using Network Architecture Search (NAS). The authors create a NAS structure that includes different quantization approaches for different blocks at the micro-architecture level. This structure has single-path, one-shot, and gradient-based features. The quantization function is applied by LPGNAS after labeling the possible parameters to be quantized. The LPGNAS algorithm applies the quantization function to both learnable parameters and activations, as well as input data due to its ease of multiplication and size. The study uses a large set of datasets, with one group consisting of relatively small networks and the other group consisting of larger datasets. The study shows that the LPGNAS method selects binary and ternary levels for weight quantization, while it prefers higher bit numbers, such as 8-bit, for activations. Previous studies have typically quantized networks with 4-bit weights and 8-bit activations.

Figure 7 and Figure 8 compare the results of quantization studies using identical GNN layers and datasets, as described in the original papers. It is important to note that the number of quantized bits varies across these studies. Some studies dynamically adjust the number of bits during the process, while others use a fixed number throughout. The specific bit counts used in each method are provided in the figure captions. Quantization studies often differ depending on which parameters are quantized. While most prioritize quantizing activations and weights, some employ specialized strategies that quantize additional parameters. Furthermore, studies vary in their approach to quantization, including post-training quantization and quantization-aware training methods. The results presented in the figures aim to offer a general comparison for readers. For a thorough comparison, the original results of each study should be consulted. Upon analysis, it becomes clear that the number of bits, the quantization method, and the stage of quantization (post-training vs. quantization-aware) each have distinct impacts on outcomes. However, studies generally produce comparable results on the same datasets, which highlights how strongly the results of quantization studies depend on the dataset.

Figure 7. Accuracy results of quantization methods using GCN layer for three datasets. Result parameters from the studies: SGQuant (average reduced precision), Bi-GCN (1-bit int), Degree-Quant (8-bit int weights and activations), EPQuant (ap_wf—quantized features and full precision parameters), Wang et al. [93] (1-bit int), and Dorefa-Graph (8-bit int features, weight and adjacency matrices).

Figure 8. Accuracy results of quantization methods using GAT layer for three datasets (left). Accuracy results of quantization methods using GraphSage layer for the ogbn-products dataset (right). Result parameters from the studies: SGQuant (average reduced precision), Degree-Quant (8-bit int weights and activations), EPQuant (ap_wf—quantized features and full precision parameters), Wang et al. [93] (1-bit int), EXACT (2-bit int), and Bahri et al. [92] (1-bit int).

3.3. Future Research Directions

In the domain of quantized graph neural networks (GNNs), numerous research opportunities present themselves. One notable direction is enhancing quantization methods to better integrate with diverse GNN architectures, striving to maintain a balance between computational efficiency and accuracy. Researchers might develop adaptive quantization techniques that adjust according to the unique characteristics of input graphs, thus improving performance across various applications. Additionally, exploring the combination of scalar and vector quantization methods could further enhance the effectiveness of quantization operations. Investigating the resilience and generalization of quantized GNNs across different scenarios will also be vital to understanding their practical applications. Moreover, the co-design of software and hardware to create specialized accelerators, especially for use in resource-limited settings such as edge devices, represents another critical research area. These integrated approaches could lead to substantial improvements in both performance and scalability, fostering a wider adoption of quantized GNNs in practical, real-world scenarios.

4. Graph Neural Network Acceleration

The intricate and computationally demanding nature of GNNs, especially when operating on large graphs, often leads to time-consuming processes and efficiency constraints. This is further compounded by performance limitations encountered on conventional CPUs and GPUs. In this regard, hardware accelerators are emerging as pivotal components to facilitate the swift and efficient processing of GNNs.

The major challenges in accelerating GNNs revolve around their complex structure and significant data sizes. Both training and inference tasks require significant computational power and memory resources. Conventional hardware may prove inadequate for such computations, thereby prolonging the overall processes. Conversely, hardware accelerators, leveraging their parallel processing capabilities and tailored computing units, expedite the training and inference processes of GNNs. They offer the requisite low latency crucial for real-time applications.

4.1. Hardware-Based Accelerator Approaches

In the existing literature, various accelerator designs have been proposed for diverse hardware platforms such as GPU-CPU [99,100,101], ASIC [29,102,103,104,105,106,107,108], and FPGA [21,31,34,109], among others, and they contribute valuable insights to this domain. This section provides a brief overview of other hardware-based accelerators before an in-depth review of FPGA-based accelerators. Additionally, Figure 9 presents a comparison of different hardware accelerators using HyGCN as a baseline.

HyGCN [102] is introduced as a solution to tackle the challenges posed by the hybrid execution pattern of GCNs, which combines irregular aggregation stages with regular combination stages. To effectively handle the dynamic and irregular nature of the aggregation process, as well as exploit the static and regular properties of the combination process in GCNs, the proposed accelerator adopts a hybrid architecture. In order to ensure parallelism in both the aggregation and combination phases, the authors devise a programming model. Additionally, they propose a hardware design that incorporates dedicated aggregation and combination engines. HyGCN achieves significant speedup and energy reduction compared with both CPU and GPU platforms, highlighting its effectiveness in enhancing the execution of GCNs.
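The two phases can be made concrete with a short NumPy sketch. It only illustrates why the phases behave differently and says nothing about HyGCN's engines; the edge-list representation, sum aggregation, ReLU, and function names are assumptions.

```python
import numpy as np

def aggregate(edge_src, edge_dst, x, num_nodes):
    """Irregular phase: sum neighbor features along edges (data-dependent gather/scatter)."""
    out = np.zeros((num_nodes, x.shape[1]), dtype=x.dtype)
    np.add.at(out, edge_dst, x[edge_src])  # unbuffered scatter-add handles repeated destinations
    return out

def combine(h, weight):
    """Regular phase: dense matrix multiplication followed by ReLU."""
    return np.maximum(h @ weight, 0.0)

# Tiny graph with edges 0->1, 2->1, and 1->0.
edge_src = np.array([0, 2, 1])
edge_dst = np.array([1, 1, 0])
x = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
h = aggregate(edge_src, edge_dst, x, num_nodes=3)
y = combine(h, np.eye(2))
```

The memory accesses in `aggregate` depend on the graph topology and are hard to predict, whereas `combine` is a dense product with a fixed access pattern, which is why separate, specialized engines are attractive.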

EnGN [29] is proposed for the efficient processing of large-scale GNNs in real-world applications as a dimension-aware accelerator architecture that optimizes GNN computations based on input and output feature sizes and eliminates the need for external memory access. The authors present a dimension-aware stage reordering (DASR) strategy for partitioning large graphs into intervals and shards. The work achieves on-chip memory efficiency using graph tiling and scheduling techniques and minimizes data dependencies between tiles. The accelerator uses a ring-edge-reduction (RER) dataflow to address irregular memory access patterns in GNN propagation, improving computational efficiency.
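The general idea of ordering stages by feature dimension can be sketched with a simple cost model. This is not EnGN's DASR algorithm; it only uses the fact that, before the nonlinearity, (A X) W and A (X W) give the same result while the aggregation cost scales with the feature width at the point where aggregation happens. The cost model and function names are assumptions.

```python
import numpy as np

def aggregation_macs(num_edges, f_in, f_out, aggregate_first):
    """Multiply-accumulates spent in aggregation: one per edge per feature."""
    return num_edges * (f_in if aggregate_first else f_out)

def choose_aggregate_first(f_in, f_out):
    """Aggregate on whichever side of the weight matrix has the narrower features."""
    return f_in <= f_out

def gcn_linear(adj, x, w):
    """Linear part of a GCN layer, evaluated in the cheaper order."""
    if choose_aggregate_first(*w.shape):
        return (adj @ x) @ w
    return adj @ (x @ w)

rng = np.random.default_rng(0)
adj = (rng.random((6, 6)) < 0.3).astype(np.float64)
x = rng.normal(size=(6, 8))   # 8 input features
w = rng.normal(size=(8, 2))   # 2 output features
out = gcn_linear(adj, x, w)
num_edges = int(adj.sum())
```

With 8 input features and 2 output features, applying the weights first shrinks the vectors before they are moved along edges, which reduces the aggregation work by a factor of four in this example.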

In Figure 9, ASIC-based accelerators such as HyGCN, EnGN, GRIP, and GCNAX show superior performance in terms of both acceleration and energy efficiency, while the positive effects of quantization techniques are observed especially in methods such as QEGCN. Although FPGA-based solutions, such as AWB-GCN and FP-GNN, may lag behind in performance in certain scenarios, their hardware flexibility and adaptability make them preferred choices. These results highlight the significant performance improvements achievable through quantization techniques and the advantages of using FPGAs for their flexibility and compatibility in diverse applications.

GRIP [108] is proposed with the aim of improving the efficiency of GNNs by addressing the problems of delay and energy consumption. Kiningham et al. show improvements in latency reduction and energy efficiency for GNN inference tasks by exploiting special architectural features. The study evaluates the performance of GRIP on various GNN models and datasets, analyzes the impact of architecture and model parameter tuning, and assesses the effectiveness of GNN optimizations integrated into GRIP.

Rubik [105] addresses the challenge of learning from graphs by optimizing software and hardware accelerations in GCN learning to improve energy efficiency and performance. In this work, the authors present a hierarchical computing paradigm that separates graph-level and node-level computation for specific optimizations. The paper includes a lightweight graph reordering technique to improve graph-level data reuse and a custom GCN accelerator architecture with a hierarchical spatial design for efficient data locality utilization. Furthermore, the authors propose a hierarchical mapping methodology that optimizes both graph-level and node-level computations to improve data reuse and task-level parallelism.

GCNAX [106] is proposed as an accelerator designed to address irregularity in the aggregation phase and exploit regularity in the combination phase of GCNs. The authors present a systematic design space exploration to optimize performance and minimize off-chip data accesses, and use a cycle-accurate simulator for evaluation and ASIC synthesis for area and power estimation. The baseline hardware used for comparison includes two GCN accelerators and one SpMM accelerator. The benchmark is used to evaluate the efficiency of flexible data streaming.

H-GCN [110] proposes a hybrid accelerator design that utilizes Xilinx Versal ACAPs to enhance GNN inference performance. The paper explores graph heterogeneity and utilizes a hybrid PL and AIE architecture to fully exploit the computational capabilities of ACAPs for GNN computations. H-GCN involves graph partitioning, the utilization of PL and AIE components, exploring sparsity support in the AIE, and creating an effective approach for mapping sparse matrix–matrix products to the systolic tensor array. Using the Xilinx Versal VCK5000 platform, the paper demonstrates the advantages of H-GCN over CPU and GPU platforms on various graph datasets by comparing speedups, energy efficiency, and inference latency with existing GCN accelerators.

4.2. FPGA-Based Accelerator Approaches

FPGA-based accelerators are advantageous over alternative hardware accelerators due to their inherent flexibility in handling variable states. They are equipped with configurable logic and memory blocks, making them adaptable to specific computing requirements. Additionally, they offer parallel processing capabilities and low power consumption, distinguishing them from fixed-architecture accelerators like GPUs. The flexibility of FPGAs is particularly valuable in scenarios that involve variable graph structures and dynamic data flows. Furthermore, FPGAs often outperform other accelerators in terms of energy efficiency and performance, offering low power consumption at high data processing speeds. However, programming and optimizing FPGAs can be complex and time-consuming, which may limit their widespread adoption. In addition, because FPGAs are less widely used than GPUs, relevant information about them can be harder to find.

The literature presents software and hardware architectures that address different scales of FPGA applications, ranging from small to large deployments. Some studies focus exclusively on FPGA research, while others provide insight into heterogeneous system architectures that encompass larger-scale applications.

AWB-GCN, proposed by Geng et al. [21], tackles the workload imbalance encountered in real-world graph inference scenarios. By dynamically redistributing tasks among processing elements (PEs), AWB-GCN aims to optimize hardware resource utilization and enhance performance. The authors employ hardware-based automatic tuning techniques to adaptively rebalance workloads at runtime, thus improving the efficiency of GCN inference across diverse graph structures. The evaluation, carried out on the Intel D5005 platform across different datasets, compares PE utilization, performance, energy efficiency, and hardware resource consumption. Overall, the findings highlight the significance of dynamic workload distribution for efficient real-world graph inference, particularly in the context of GCN inference optimization.
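The effect of workload imbalance on PE utilization can be illustrated with a small software sketch. It is not AWB-GCN's hardware mechanism, which rebalances at runtime; it is an offline analogy that compares a static round-robin row assignment with a greedy assignment. The row lengths, the PE count, and the function names are hypothetical.

```python
import heapq

def static_assignment(row_nnz, num_pes):
    """Round-robin rows over PEs and return the per-PE work (non-zeros)."""
    load = [0] * num_pes
    for row, nnz in enumerate(row_nnz):
        load[row % num_pes] += nnz
    return load

def balanced_assignment(row_nnz, num_pes):
    """Greedy rebalancing: give the heaviest remaining row to the least-loaded PE."""
    heap = [(0, pe) for pe in range(num_pes)]  # (current load, PE id)
    heapq.heapify(heap)
    for nnz in sorted(row_nnz, reverse=True):
        load, pe = heapq.heappop(heap)
        heapq.heappush(heap, (load + nnz, pe))
    loads = [0] * num_pes
    for load, pe in heap:
        loads[pe] = load
    return loads

def utilization(load):
    """Average load divided by the maximum load (1.0 means perfectly balanced)."""
    return sum(load) / (len(load) * max(load))

# Hypothetical power-law-like row lengths: a few dense rows, many sparse ones.
row_nnz = [64, 2, 1, 1, 48, 1, 2, 1, 32, 1, 1, 2, 16, 1, 1, 1]
static = static_assignment(row_nnz, num_pes=4)
balanced = balanced_assignment(row_nnz, num_pes=4)
print("static  :", static, round(utilization(static), 3))
print("balanced:", balanced, round(utilization(balanced), 3))
```

Because the slowest PE determines the completion time of a layer, the utilization figure printed here is a rough proxy for how much of the array sits idle under each assignment.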

ACE-GCN [111], proposed by Romero et al., targets efficient graph convolutional embedding by exploiting the sparsity and power-law distribution of real-world graph datasets. The authors utilize first-order subgraph similarity, feature exchangeability, and structure redundancy to improve graph convolutional embedding. This addresses performance and resource scalability issues in graph neural network accelerators. The central idea is to trade computational complexity for storage capacity while maintaining high accuracy, resulting in speedup gains compared to baseline methods. ACE-GCN optimizes on-chip memory utilization and computational efficiency by customizing configurations for different datasets and using an auxiliary similarity estimation circuit. The results demonstrate that the proposed method provides speedup on datasets compared to baseline models, and in some cases outperforms other FPGA accelerators.

I-GCN [112] aims to improve data locality and reduce redundant computation in GCNs, particularly for efficient inference on large-scale graphs. According to Geng et al., current hardware accelerators for GCNs face challenges in dealing with the irregular non-zero distribution, high sparsity, and significant size of real-world graphs, leading to inefficiencies in data processing and computation. This paper presents the islanding algorithm, which identifies clusters of nodes with strong internal connections. This enables graph data to be processed at the level of centers and islands rather than individual nodes, resulting in improved data reuse, reduced redundant computation, and less off-chip data movement. Figure 10 shows the comparison results for these three methods. These methods use the same hardware and the same datasets as the experimental configuration.
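The notion of processing a graph as centers and islands can be made concrete with a deliberately simplified sketch. It is an assumption-laden illustration and not the algorithm from the I-GCN paper: high-degree nodes are treated as centers, and the connected components that remain once those nodes are removed are treated as islands. The degree threshold and the toy graph are hypothetical.

```python
from collections import deque

def find_islands(adj, hub_degree):
    """Split nodes into high-degree hubs and islands (components left after removing hubs)."""
    hubs = {v for v, nbrs in adj.items() if len(nbrs) >= hub_degree}
    seen, islands = set(hubs), []
    for start in adj:
        if start in seen:
            continue
        island, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search that never crosses a hub
            v = queue.popleft()
            island.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        islands.append(sorted(island))
    return sorted(hubs), islands

# Hypothetical toy graph: node 0 bridges two tightly connected groups.
adj = {
    0: [1, 2, 3, 4, 5, 6],
    1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [0, 5, 6], 5: [0, 4, 6], 6: [0, 4, 5],
}
hubs, islands = find_islands(adj, hub_degree=5)
print(hubs, islands)
```

Nodes inside one island share most of their neighbors, so their features can be fetched once and reused, which is the source of the data locality benefit described above.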

Lin et al. [113] aim to efficiently accelerate GCN inference on FPGA platforms for cloud-based applications, such as e-commerce and recommender systems. They address the challenges of existing solutions, such as high inference latency and energy inefficiency. The paper proposes a partition-centric mapping strategy and an HLS-based core design to reduce memory access overhead, leverage data reuse, and achieve significant data parallelism. The authors implement their kernel designs in Vitis-HLS and OpenCL on the Xilinx Alveo U200 platform. They evaluate the performance of the designs on large-scale datasets such as Reddit, Yelp, and Amazon-2M, using a two-layer Vanilla-GCN model with detailed specifications of the GCN layer sizes and operations.

Zhang et al. [114] propose an accelerator approach to address the problem of limited on-chip memory when dealing with massive node-attributed graphs with static topologies and to improve GCN inference efficiency on FPGA. Their algorithm–architecture co-optimization approach uses data partitioning, decomposition, and node reordering, which distinguishes it from traditional methods. The article describes a hardware architecture pipeline that supports aggregation and transformation kernels, along with a flexible bus and scheduling strategy that is tailored for various GCN models. Additionally, it includes a mathematical analysis of data communication costs to optimize memory traffic. The article also proposes a two-stage preprocessing algorithm to increase data reuse and reduce external memory access.

SPA-GCN [115] integrates deep pipelining and customized levels of parallelization to efficiently process small graphs. SPA-GCN uses a very deep pipeline with nested parallelization, customized compute units, and efficient utilization of FPGA resources. The paper optimizes data flow through FIFOs and dynamic process scheduling. The study compares SPA-GCN’s performance on Xilinx FPGAs with that of Intel Xeon CPUs and NVIDIA GPUs. The results highlight the efficiency of the proposed architecture in accelerating GCN computations on small graphs.

BlockGNN addresses the problem of increasing computational complexity in GNNs by proposing a software–hardware co-design strategy using block-circulant weight matrices for efficient GNN acceleration, reducing computational complexity from $O(n^{2})$ to $O(n \log n)$ with minimal loss of accuracy. The authors suggest compressing GNN models by using block-circulant weight matrices at the algorithm level and implementing a pipelined CirCore architecture at the hardware level. They also propose using the Fast Fourier Transform (FFT) during inference to enhance efficiency. Additionally, they introduce a performance and resource model to automatically optimize hardware parameters for various GNN tasks and improve overall acceleration.
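The complexity reduction rests on a standard property of circulant matrices: multiplying a circulant matrix by a vector is a circular convolution, which can be evaluated with FFTs. The following minimal sketch shows this for a single circulant block and for a weight matrix built from such blocks. It is a plain-Python illustration and not the CirCore implementation; the function names and the small radix-2 FFT (block size must be a power of two) are assumptions made for the example.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix with first column c by x in O(n log n)."""
    n = len(c)
    spectrum = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(spectrum, inverse=True)]

def circulant_matvec_dense(c, x):
    """Reference O(n^2) product using C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] is the first column of the (p, q) circulant block of size b."""
    y = []
    for row in blocks:
        acc = [0.0] * b
        for q, c in enumerate(row):
            part = circulant_matvec_fft(c, x[q * b:(q + 1) * b])
            acc = [a + v for a, v in zip(acc, part)]
        y.extend(acc)
    return y

c = [1.0, 2.0, 3.0, 4.0]
x = [0.5, -1.0, 2.0, 0.0]
fast = circulant_matvec_fft(c, x)
slow = circulant_matvec_dense(c, x)
print(fast, slow)
```

Each block is stored as a single vector of length b rather than b × b entries, so the same structure that lowers the operation count also compresses the model.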

Gui et al. [116] aim to enhance the efficiency of sampling algorithms in GNNs by utilizing hardware acceleration to reduce sampling time while maintaining high test accuracy on large datasets. The paper presents the CONCAT Sampler, an algorithm that merges sample graphs to simplify hardware acceleration and ensure accuracy in GNN models. The authors suggest deploying the CONCAT Sampler on FPGA hardware, using parallel sampling modules to independently sample from partitioned datasets and combining the results for faster sampling.

Li et al. [117] present an FPGA-based hardware architecture designed to address the challenges of implementing large-scale distributed graph neural networks (LSD-GNNs) in hyperscale environments, with a focus on memory access acceleration and scalability. This work highlights the obstacles faced by LSD-GNNs in hyperscale environments and discusses custom heterogeneous hardware solutions to improve memory access and sampling performance. This work involves the implementation of a domain-specific hardware architecture with an access engine (AxE) to optimize memory access and a RISC-V control interface to implement a memory-on-fabric (MoF) system for near-data processing, with the goals of scalability and programmability.

Graph-OPU [118] addresses the need for efficient acceleration methods for GNNs due to their widespread applications and provides a solution to the challenge of time-consuming FPGA reconfiguration when switching between GNN models. The authors present this work as a novel FPGA-based overlay processor adapted to mainstream GNNs, with software programmability that enables fast model switching. Graph-OPU includes instruction set customization, design of a fully pipelined microarchitecture, implementation of unified matrix multiplication, and testing with various datasets on the Xilinx Alveo U50.

4.3. FPGA-Based Heterogeneous Approaches

Heterogeneous computing architectures combine different types of processors to create powerful and flexible systems. CPU-FPGA heterogeneous approaches, in particular, offer solutions for applications requiring high levels of parallelism and customizability. By combining the general-purpose processing capability of the CPU with the specialized hardware advantages of the FPGA, these approaches enable faster and more energy-efficient execution of complex computational tasks [28]. Although these structures are not applicable to embedded systems, they are preferred for large-scale applications. This section reviews some proposed heterogeneous FPGA approaches for GNNs.

Zhang et al. [28] propose a method for training GNNs on large-scale graphs using a CPU-FPGA heterogeneous platform. The method uses neighbor sampling (NS) to address scalability and overfitting challenges. The authors discuss the computational challenges associated with neighbor sampling and feature aggregation, which dominate the overall execution time of NS-based GNN training and thereby impede scalability and efficiency. The work involves implementing a parallel neighbor sampling algorithm on the main processor and an FPGA accelerator optimized for GNN operations. It also leverages optimizations such as neighbor sharing and task pipelining to improve memory performance and computational efficiency, ultimately increasing training throughput and accelerating NS-based GNN training.
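Neighbor sampling itself is simple to state: for each target node in a mini-batch, only a bounded number of neighbors is kept per layer, which caps the size of the computation graph. The sketch below is a generic, single-threaded illustration of that idea and not the parallel sampler of Zhang et al.; the fan-out values, the toy graph, and the function names are hypothetical.

```python
import random

def sample_neighbors(adj, targets, fanout, rng):
    """Sample at most `fanout` neighbors per target node (one GNN layer)."""
    sampled = {}
    for v in targets:
        nbrs = adj[v]
        sampled[v] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return sampled

def sample_minibatch(adj, batch, fanouts, seed=0):
    """Build per-layer sampled neighborhoods, expanding outward from the batch nodes."""
    rng = random.Random(seed)
    layers, frontier = [], list(batch)
    for fanout in fanouts:
        sampled = sample_neighbors(adj, frontier, fanout, rng)
        layers.append(sampled)
        # The next layer needs features for every node touched so far.
        frontier = sorted(set(frontier) | {u for nbrs in sampled.values() for u in nbrs})
    return layers

# Hypothetical toy graph stored as adjacency lists.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 3]}
layers = sample_minibatch(adj, batch=[0], fanouts=[2, 2])
for depth, sampled in enumerate(layers):
    print("layer", depth, sampled)
```

With a fan-out of f per layer and L layers, a target node touches at most on the order of f^L nodes regardless of its true degree, which is what makes mini-batch training on large graphs tractable.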

GraphACT [31] proposes a heterogeneous approach to address memory access and load balancing challenges in accelerating the training of GCNs on CPU-FPGA platforms due to significant data communication issues. GraphACT optimizes the FPGA design through a graph-theoretic preprocessing step to balance the load across on-chip compute modules and various FPGA devices. Additionally, it features a systolic array-based design that improves weight update efficiency. The authors evaluated this work using a 40-core Xeon server with a Xilinx Alveo U200 board and demonstrated the strengths of the proposed method with parameters such as convergence time and test set accuracy on datasets such as PPI, Reddit, and Yelp.

Zhang et al. [119] aim to improve the efficiency and speed of GNNs in real-time applications. They address the challenge of achieving low-latency mini-batch inference on CPU-FPGA platforms. The hardware accelerator design presented in this paper uses the Adaptive Computing Kernel (ACK) architecture to run different GNN computing cores with low latency. It provides a unified solution for different GNN models on FPGA platforms without the need for runtime reconfiguration. The methodology involves identifying GNN compute kernels, designing the flexible ACK architecture, and using a design space exploration algorithm to create a single hardware design point for different GNN models and optimize it for low-latency inference without reconfiguration.

HP-GNN [120] aims to address inefficiencies in full-graph GNN training on large graphs. It seeks to reduce the high memory footprint and increase the frequency of model updates per epoch on CPU-FPGA heterogeneous platforms. In contrast to previous work that focused on specific GNN models or algorithms, HP-GNN provides a general framework for GNN training on CPU-FPGA platforms. Hardware templates are optimized for efficient accelerator generation. Lin et al. suggest a design space exploration engine to improve throughput and generate accelerators automatically. They also provide software APIs that offer a high-level abstraction for sampling-based mini-batch GNN training, so that optimized accelerators can be developed without requiring hardware expertise.

4.4. Frameworks for FPGA-Based Accelerators

In the field of high-performance computing, FPGA accelerator frameworks facilitate the integration between hardware and software, enabling algorithms to run efficiently on hardware. Frameworks allow researchers to quickly and efficiently prototype customized hardware designs. They can be used to develop power-efficient systems that can perform complex data processing tasks in real-time. Customized FPGA frameworks provide performance advantages to designers, particularly in areas such as big data analysis, artificial intelligence applications, and deep learning models. This section reviews frameworks for FPGA-based accelerators designed for GNNs.

BoostGCN [109] aims to optimize the inference of GCNs on FPGA platforms by proposing a hardware-aware partition-centric feature aggregation (PCFA) scheme to overcome inefficiencies and adaptability challenges in memory accesses encountered by GCNs. The authors state that the issues arise from different graph sizes, sparsity levels, and complexities of GCN models. Zhang et al. propose a framework that can adapt to these variations to enhance performance and efficiency. The PCFA scheme in BoostGCN performs three-dimensional graph partitioning considering on-chip data reuse and the external memory architecture. Additionally, a central load balancing scheme is employed to effectively address workload imbalance. This study achieves substantial data parallelism through optimized RTL templates and a task scheduling strategy aimed at minimizing pipeline stalls.

FlowGNN [121] is proposed to support generic GNN models for real-time inference applications. By introducing explicit message-passing and multi-level parallelism, the authors provide a comprehensive solution for GNN acceleration without sacrificing adaptability. FlowGNN compiles each GNN model into an FPGA kernel, allowing new architectures to be supported easily and custom accelerators to be created from modular components. Using the Xilinx Alveo U50 FPGA, FlowGNN outperforms existing baselines such as CPU, GPU, and I-GCN in speed and energy efficiency on a variety of datasets and GNN models.

DeepBurning-GL addresses the growing need for efficient and specialized accelerators for GNNs due to their complexity and computational demands in various applications. Liang et al. [122] present an automated framework for designing custom GNN accelerators that can meet performance requirements while satisfying resource constraints and user-specific design goals. By focusing on automating the customization of GNN accelerators, this work stands out for streamlining the design process by providing end-to-end solutions tailored to specific applications without manual intervention. DeepBurning-GL includes a systematic approach that starts with a performance analysis of GNN models to identify bottlenecks, selects templates based on model requirements, combines templates into a unified accelerator design, and fine-tunes design parameters using a simulated annealing algorithm combined with model-based design space pruning for optimization.
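The parameter tuning step can be pictured as a constrained search over a discrete design space. The sketch below applies simulated annealing to two hypothetical parameters (number of PEs and SIMD width) under a DSP budget, and treats infeasible points as pruned. The latency and resource models, the budget, and all names are assumptions made for illustration and do not reproduce the models used in DeepBurning-GL.

```python
import math
import random

PE_CHOICES = [1, 2, 4, 8, 16, 32]
SIMD_CHOICES = [1, 2, 4, 8, 16]
DSP_BUDGET = 256  # hypothetical resource limit

def dsp_usage(cfg):
    pe, simd = cfg
    return pe * simd * 2  # assumed: two DSPs per multiply-accumulate lane

def latency(cfg, work=1_000_000):
    pe, simd = cfg
    # Assumed analytical model: compute time plus a per-PE synchronization cost.
    return work / (pe * simd) + 500 * pe

def cost(cfg):
    # Infeasible designs are pruned from the search by an infinite cost.
    return latency(cfg) if dsp_usage(cfg) <= DSP_BUDGET else math.inf

def neighbor(cfg, rng):
    """Move one parameter to an adjacent choice."""
    pe_i, simd_i = PE_CHOICES.index(cfg[0]), SIMD_CHOICES.index(cfg[1])
    if rng.random() < 0.5:
        pe_i = min(max(pe_i + rng.choice([-1, 1]), 0), len(PE_CHOICES) - 1)
    else:
        simd_i = min(max(simd_i + rng.choice([-1, 1]), 0), len(SIMD_CHOICES) - 1)
    return (PE_CHOICES[pe_i], SIMD_CHOICES[simd_i])

def anneal(start, steps=2000, temp=1e5, cooling=0.995, seed=0):
    """Accept improvements always and regressions with a temperature-dependent probability."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        cand = neighbor(current, rng)
        delta = cost(cand) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

best = anneal(start=(1, 1))
print(best, latency(best), dsp_usage(best))
```

An analytical cost model is what makes such a search practical, since each candidate is scored in microseconds rather than requiring a synthesis run.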

DGNN-Booster is proposed by Chen et al. [123] as an FPGA accelerator framework designed for real-time dynamic graph neural network (DGNN) inference with HLS, offering high-speed performance and low energy consumption. DGNNs pose challenges in hardware deployment due to low parallelism and a lack of general accelerator frameworks for dynamic graphs. DGNN-Booster employs two FPGA accelerator designs, V1 and V2, each optimized for different levels of parallelism and computational intensity. V1 focuses on parallelizing the GNN and RNN in adjacent time steps, while V2 emphasizes parallelization in a single time step. The designs incorporate graph renumbering, format conversion, multi-level parallelism, and task scheduling to improve hardware efficiency and performance.

GNNBuilder [124] is a framework that enables the automatic generation of GNN accelerators for different models, providing design flexibility and optimization strategies. The authors propose using serialized trained direct fit models for efficient design space exploration, facilitating fast performance evaluation compared to traditional HLS synthesis. The framework demonstrates predictive capabilities for runtime and BRAM models based on a database of 400 synthesized designs, using a random forest regressor with 10 predictors and fivefold cross-validation. The evaluation is performed on FPGA-based parallel implementations and shows speedups over PyG CPU, PyG GPU, and C++ CPU runtimes for various GNN models.

FGNAS [125] is proposed as a hardware/software co-exploration framework that uses FPGA platforms to optimize hardware accelerators for GNNs, aiming at efficient GNN deployment. FGNAS is differentiated by its holistic approach in integrating hardware and software considerations to improve accuracy and speedup in GNN architectures on FPGAs. The methodology uses reinforcement learning to sequentially sample hardware and software parameters, analyze FPGA model performance, and update the controller using policy gradients for optimized design. By partitioning the search space into architectural and hardware parameters such as embedding size, attention type, aggregation type, and group sizes, FGNAS efficiently explores the design space to determine the optimal GNN architecture and FPGA design.

4.5. FPGA-Based Accelerator Approaches with Quantization

By reducing the bit width of data representations and using hardware resources more efficiently, quantization allows FPGA accelerators to reduce energy consumption and increase processing speed. Quantization is especially important for embedded systems that have limited computational resources due to energy constraints. FPGA accelerators designed using quantization increase the usability of GNNs in real-time applications and scenarios requiring energy efficiency. This section explores in detail the integration of quantization techniques with FPGA-based accelerators. Table 4 shows the hardware details, quantization parameters and baselines of these studies.

FPGAN [126] proposes an FPGA-based accelerator to improve the performance and energy efficiency of graph attention networks (GATs) while maintaining accuracy. The authors integrate model optimization and software–hardware co-design to create a dedicated accelerator for GATs. FPGAN involves process fusion, quantization, data reconfiguration, model tuning, and architecture optimization. Process fusion simplifies the self-attention mechanism, while data reconfiguration improves data storage and access efficiency. Model tuning optimizes activation functions, and architecture design focuses on software and hardware co-design for efficient inference operations. The study demonstrates improvements in performance and energy efficiency without sacrificing accuracy, highlighting the potential of software–hardware co-optimized FPGA accelerators for GAT inference.

SkeletonGCN [127] addresses the growing demand for efficient training of GCNs on FPGA platforms due to their computational and memory requirements. The authors present the work as a solution that optimizes data representation, simplifies operations, and uses a unified hardware architecture to achieve significant speedup without trade-offs in accuracy. The methodology involves quantizing data to SINT16 to reduce computation and storage requirements, simplifying nonlinear operations, using a compression algorithm for sparse matrices, and designing a unified hardware architecture to improve DSP efficiency for various matrix operations in GCN training.

QEGCN [128] is an FPGA-based accelerator that uses edge-level parallelism to improve GCN performance. The authors aim to optimize the execution of quantized GCNs and distinguish their design from existing graph-level parallelism approaches. The paper emphasizes edge-level parallelism, stating that it enables more efficient processing of graph data compared to traditional methods. QEGCN investigates the impact of data quantization on GCN accuracy, evaluates energy efficiency at different quantization levels, and analyzes performance on various benchmark platforms.

The FTW-GAT [129] accelerator tackles the difficulties presented by the intricate data dependencies and irregular structures of graph attention networks (GATs) by quantizing GAT weights to ternary values. This simplifies processing elements, eliminates the need for digital signal processors (DSPs), and reduces power consumption. The paper presents a methodology that combines ternary-weight quantization with additional techniques such as process fusion, multi-level pipelining, and graph partitioning to increase parallelism in GAT inference and thereby improve efficiency and performance. The authors indicate that the quantized model achieves accuracy similar to full-precision models and outperforms baselines in terms of latency and energy efficiency.
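Why ternary weights remove the need for multipliers can be seen in a few lines. The sketch below maps weights to {-1, 0, +1} with a single scale factor and then evaluates a dot product using only additions and subtractions. The threshold heuristic (0.7 times the mean absolute weight) follows the generic ternary weight network recipe and is an assumption here; the quantization scheme actually used in FTW-GAT may differ.

```python
def ternarize(weights, delta_factor=0.7):
    """Map weights to {-1, 0, +1} plus one scale factor (threshold heuristic is an assumption)."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale

def ternary_dot(codes, scale, features):
    """Dot product with ternary weights: only additions and subtractions, then one scaling."""
    acc = 0
    for c, x in zip(codes, features):
        if c == 1:
            acc += x
        elif c == -1:
            acc -= x
    return scale * acc

# Hypothetical weight vector and integer feature vector.
weights = [0.9, -0.05, -0.8, 0.1, 0.02, -1.1]
features = [1, 2, 3, 4, 5, 6]
codes, scale = ternarize(weights)
print(codes, scale, ternary_dot(codes, scale, features))
```

In hardware, the inner loop reduces to an adder with a sign select, which is why such designs can be built from LUTs instead of DSP slices.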

Table 4. Quantization studies on FPGA-based accelerators.

Publication	Hardware	Resource Consumption	Quantized Parameters	Baselines
FP-GNN [25]	VCU128 Freq: 225 MHz	LUT: 717,578 FF: 517,428 BRAM: 1792 DSP: 8192	Features, weights 32-bit fixed point	PyG-CPU-GPU, HyGCN, GCNAX, AWB-GCN, I-GCN
LL-GNN [33]	Alveo U250 Freq: 200 MHz	LUT: 815,000 FF: 139,000 BRAM: 37 DSP: 8986	Model parameters 12-bit fixed point	PyG-CPU-GPU
FPGAN [126]	Arria10 GX1150 Freq: 216 MHz	LUT: 250,570 FF: 338,490 BRAM: NI DSP: 148	Features, weights fixed point	PyG-CPU-GPU
SkeletonGCN [127]	Alveo U200 Freq: 250 MHz	LUT: 1,021,386 FF: NI BRAM: 1338 DSP: 960	Feature, adjacency matrices, trainable parameters 16-bit signed integer	PyG-CPU-GPU, GraphACT
QEGCN [128]	VCU128 Freq: 225 MHz	LUT: 21,935 FF: 9201 BRAM: 22 DSP: 0	Features, weights 8-bit fixed point	PyG-CPU-GPU, DGL-CPU-GPU, HyGCN, EnGN, AWB-GCN, ACE-GCN
FTW-GAT [129]	VCU128 Freq: 225 MHz	LUT: 436,657 FF: 470,222 BRAM: 1502 DSP: 1216	8-bit int features, 3-bit int weights	PyG-CPU-GPU, FP-GNN
Wang et al. [130]	Alveo U200 Freq: 250 MHz	LUT: 101,000 FF: 11,700 BRAM: 1430 DSP: 392	1-bit integer features, weights; 32-bit integer adjacency matrix	PyG-CPU-GPU, ASAP [114]
Ran et al. [131]	Alveo U200 Freq: 250 MHz	LUT: 427,438 FF: NI BRAM: 1702 DSP: 33.7	Features, weights	PyG-CPU-GPU, HyGCN, ASAP [114], AWB-GCN, LW-GCN
Yuan et al. [132]	VCU128 Freq: 300 MHz	LUT: 3244 FF: 345 BRAM: 102.5 DSP: 64	Features, weights 32-bit fixed point	PyG-CPU-GPU
LW-GCN [133]	Kintex-7 K325T Freq: 200 MHz	LUT: 161,529 FF: 94,369 BRAM: 291.5 DSP: 512	Features, weights 16-bit signed fixed point	PyG-CPU-GPU, AWB-GCN

Note: NI indicates that the work did not provide information about this data.

LL-GNN [33] aims to minimize latency in processing GNNs on FPGA for real-time applications in high-energy physics, especially in collider triggering systems where ultra-low latency is crucial for timely event selection. Que et al. propose a design combining quantization and FPGAs that offers low latency when processing small graphs and can be used in scenarios requiring sub-microsecond latency and high throughput, such as particle identification in fundamental physics experiments. LL-GNN involves a co-design approach that optimizes both the algorithm and hardware for GNNs on FPGAs, including defining delay thresholds, rebalancing multi-layer perceptron (MLP) sizes, exploring parallelism parameters, and implementing sublayer fusion to improve performance and reduce latency. The authors report that the resulting design reaches sub-microsecond latency, showing that GNNs can be deployed on FPGAs within such latency budgets.

HuGraph [34] aims to address the increasing demand for efficient and scalable GCN training on large and irregular graphs using heterogeneous FPGA clusters. The paper employs a load-balanced mapping strategy and a scheduling method to optimize GCN training. HuGraph focuses on adapting various large graphs to FPGAs while achieving significant speedups with minimal loss of accuracy compared to traditional platforms. The authors state that they use 8-bit integer quantized adjacency coefficients, features, weights, and error values for efficient computation. HuGraph allocates hardware resources, configures samplers, and simulates training performance for each FPGA and dataset configuration to achieve workload balance across heterogeneous FPGAs.

Figure 11 compares the performance of four FPGA-based accelerator methods in various configurations using the same dataset. In this comparison, AWB-GCN, which does not incorporate quantization operations, serves as the common baseline for both latency and energy efficiency graphs. The results are derived from the comparative findings of the original studies. The latency results indicate that the methods employing quantization exhibit higher latencies compared to the baseline without quantization. This increase in latency can be attributed to the additional processing required for quantization. However, when evaluating energy efficiency, the positive effects of quantization are evident in certain cases, demonstrating the potential benefits of quantization in reducing energy consumption. Furthermore, Figure 12 compares the resource utilization of FPGA studies.

Ran et al. [131] developed a software–hardware co-design to achieve low-latency GCN inference on FPGA platforms. The paper presents an integrated approach that combines algorithm-level attention mechanism-based graph parsing with a two-phase hardware architecture and achieves speedups compared to existing accelerators. This work includes the use of attention mechanisms for graph parsing, designing a pipelined two-stage accelerator for efficient aggregation and combination phases, exploiting edge-level and feature-level parallelism, and implementing a graph partitioning strategy to improve data reuse efficiency. The authors emphasize the use of fixed-point representation in the study, as floating-point representation is more resource-intensive.

FP-GNN [25] is proposed as a unified processing module that can perform both the aggregation and combination phases simultaneously. This allows for flexible execution orders and supports various GNN models. The authors provide a customizable hardware architecture with components such as a Workflow Controller and Processing Modules, enabling high-performance computing and efficient resource utilization. The adaptive GNN accelerator (AGA) framework and adaptive graph partitioning (AGP) strategy in FP-GNN optimize parallelism and memory management, enabling improved performance efficiency in GNN inference tasks. In this work, weights and input features are quantized to 32-bit fixed point to take advantage of integer arithmetic while avoiding accuracy loss.
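Fixed-point quantization of the kind referred to above stores each value as an integer with an implied binary point, so that multiplications and additions run on integer hardware. The sketch below is a generic illustration; the split into 16 integer and 16 fractional bits is an assumption, since the text only states the total width, and the helper names are hypothetical.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point integer with saturation."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers and rescale back to the same format."""
    return (a * b) >> frac_bits

def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)

FRAC_BITS, TOTAL_BITS = 16, 32  # assumed Q16.16 layout

feature = to_fixed(0.75, FRAC_BITS, TOTAL_BITS)
weight = to_fixed(-1.5, FRAC_BITS, TOTAL_BITS)
product = fixed_mul(feature, weight, FRAC_BITS)
print(to_float(product, FRAC_BITS))

# A narrower format trades range and resolution for area: 8 bits with 4 fractional bits.
coarse = to_fixed(0.3, 4, 8)
print(coarse, to_float(coarse, 4))
```

With 16 fractional bits the representable step is 2^-16, which is why a 32-bit fixed-point format can usually stand in for single-precision floating point in inference without a measurable accuracy change.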

Yuan et al. [132] use a 32-bit fixed-point representation for efficient computation without loss of accuracy, as in FP-GNN. The paper aims to enhance the performance and energy efficiency of GNNs by developing a dedicated accelerator for the Gather phase on FPGA platforms. This addresses the inefficiencies of CPUs and GPUs in handling the dynamic and irregular data access patterns of GNNs. The proposed architecture optimizes the execution order of the Apply and Gather phases, reducing the number of operations and improving performance, while leveraging FPGA technology to design an efficient accelerator for the Gather phase.
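The benefit of choosing the execution order can be checked with simple arithmetic. For a GCN layer A·X·W, aggregating first costs roughly nnz(A)·F_in multiply-accumulates before the dense transform, whereas transforming first shrinks the feature width to F_out before aggregation. The sketch below counts operations for both orders; it assumes a dense feature matrix, and the layer sizes are hypothetical, chosen to resemble a small citation graph.

```python
def macs_aggregate_first(n, nnz, f_in, f_out):
    """(A @ X) @ W: sparse aggregation on F_in-wide features, then the dense transform."""
    return nnz * f_in + n * f_in * f_out

def macs_combine_first(n, nnz, f_in, f_out):
    """A @ (X @ W): dense transform first, then aggregation on F_out-wide features."""
    return n * f_in * f_out + nnz * f_out

# Hypothetical layer: 2708 nodes, 13264 non-zeros in A, 1433 input and 16 output features.
n, nnz, f_in, f_out = 2708, 13264, 1433, 16
agg_first = macs_aggregate_first(n, nnz, f_in, f_out)
comb_first = macs_combine_first(n, nnz, f_in, f_out)
print(agg_first, comb_first, round(agg_first / comb_first, 2))
```

Both orders produce the same result because matrix multiplication is associative; the saving appears whenever the layer reduces the feature width, and the advantage reverses for layers that expand it.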

Wang et al. [130] introduce a customizable FPGA-based accelerator for BiGCNs. The hardware optimizations include fine-grained pipelining, sparse matrix multiplication, and partial unrolling, which significantly improve performance. The work also overlaps data transfer with computation and stores the adjacency matrix in COO format to improve hardware efficiency. The focus is on deep customization to support various GCNs. The results indicate that the proposed FPGA-based accelerator outperforms a previous FPGA-based GCN accelerator, achieving four times higher throughput while consuming fewer DSP resources.
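The coordinate (COO) format stores one (row, column, value) triple per non-zero, which maps naturally onto a streaming sparse matrix multiplication. The following minimal sketch multiplies a COO adjacency matrix by a dense feature matrix; it is a software reference for the data layout only, not the pipelined hardware kernel of Wang et al., and the toy matrices are hypothetical.

```python
def coo_spmm(rows, cols, vals, dense, num_rows):
    """Compute A @ dense where A is given in COO form (rows[i], cols[i], vals[i])."""
    width = len(dense[0])
    out = [[0.0] * width for _ in range(num_rows)]
    for r, c, v in zip(rows, cols, vals):
        src = dense[c]   # feature row of the source node
        dst = out[r]     # accumulator row of the destination node
        for k in range(width):
            dst[k] += v * src[k]
    return out

# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]] stored as three triples.
rows, cols, vals = [0, 0, 2], [0, 2, 1], [1.0, 2.0, 3.0]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = coo_spmm(rows, cols, vals, features, num_rows=3)
print(result)
```

Only the stored non-zeros are visited, so the work scales with the number of edges rather than with the square of the number of nodes. The inner loop over the feature width is the part a hardware design would typically unroll.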

4.6. FPGA-Based Accelerators for Embedded Applications

Efficient system designs for embedded devices are critical for applications that require efficient data processing and fast response times due to their limited computational resources. In this scope, FPGA-based accelerators have the potential to improve the performance of embedded systems, optimizing energy consumption and handling more complex computational tasks. In particular, the implementation of advanced machine learning models such as GNNs in embedded systems is made possible by the flexibility and reconfigurability of FPGA accelerators. This section provides a detailed review of the work on FPGA-based accelerators designed for embedded systems. Furthermore, Table 5 lists information about the targeted platform, quantization information and datasets for these studies.

Table 5. FPGA-based GNN accelerator works for embedded devices.

Study	Target Device	Datasets	Fixed-Point Representation
gFADES [76]	Zynq Ultrascale+ XCZU28DR	Cora, Citeseer, Pubmed	-
LW-GCN [133]	Xilinx Kintex-7	Cora, Citeseer, Pubmed	✓
Zhou et al. [134]	Xilinx ZCU104, Alveo U200	Wikipedia, Reddit, GDELT	-
Hansson et al. [135]	Xilinx Zynq UltraScale+	Cora, Citeseer, Pubmed	✓

LW-GCN is proposed by Tao et al. [133] to improve the efficiency of graph convolutional networks (GCNs) on edge devices with limited resources by addressing irregular computation and memory access challenges. It achieves improvements in storage utilization and computation time through a software–hardware co-design approach and a PCOO compression format that efficiently preprocesses sparse data. The authors present a versatile approach that includes post-training quantization with 16-bit signed fixed-point representation for features and weights, and 4-bit signed fixed-point quantization for non-zero elements in sparse matrices. Additionally, they use an outer-product tiling technique to balance the workload and reduce the data volume during computation. LW-GCN achieves significant reductions in latency and improvements in power efficiency, demonstrating its effectiveness in improving GCN inference performance on edge devices.

Zhou et al. [134] aim to improve the accuracy and speed of inference over dynamic graphs by addressing the challenge of efficiently processing temporal information in graph neural networks. They present a model–architecture co-design methodology for temporal graph neural networks that optimizes both the model and the hardware architecture, exploiting batch processing, pipelining, and prefetching to enable efficient processing of evolving temporal information on FPGA platforms. The paper also proposes a hardware mechanism that enables chronological vertex updates without compromising computational parallelism. The methodology reduces computational complexity and memory access while keeping the accuracy loss below 0.33%. The study compares the performance of embedded-scale and large-scale hardware platforms using parameters such as latency, throughput, and batch size.

gFADES [76] addresses the complex data access and processing requirements of GNNs, which combine dense and sparse data representations and therefore make hardware acceleration crucial for efficient computation. The paper focuses on hardware acceleration for GNNs, with particular emphasis on performance optimization for GCNs. It stands out for a dataflow architecture of data streams that optimizes dense and sparse tensor processing, resulting in high-performance execution of GNNs. The methodology involves developing the gFADES accelerator using a dataflow of dataflow (DoD) approach, optimizing the high-level synthesis (HLS) description for extreme sparsity in GNNs, scaling performance with multiple hardware threads and compute units, and integrating the accelerator with PyTorch for edge computing devices. A resource-constrained Zynq device with the PYNQ overlay was used as the base hardware. The results show performance improvements of up to 140× with multithreaded hardware configurations compared to optimized software implementations in PyTorch, demonstrating the effectiveness of the gFADES accelerator in improving GNN processing on resource-constrained devices.
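
To illustrate why a GCN layer mixes sparse and dense processing, the sketch below computes one layer as a dense feature-weight product followed by a sparse aggregation over the adjacency matrix. It is a plain-Python illustration of the computation pattern only, not the gFADES dataflow design. The CSR layout, the function names, and the choice to multiply features by weights first are assumptions made here.

```python
def dense_matmul(a, b):
    """Dense x dense product, used for the feature-weight combination step."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def csr_spmm(indptr, indices, data, dense):
    """Sparse (CSR) x dense product, used for neighbourhood aggregation."""
    cols = len(dense[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * cols
        # Only the stored non-zeros of this adjacency row are visited.
        for p in range(indptr[row], indptr[row + 1]):
            col, val = indices[p], data[p]
            for j in range(cols):
                acc[j] += val * dense[col][j]
        out.append(acc)
    return out


def gcn_layer(indptr, indices, data, features, weights):
    """One GCN layer: ReLU(A @ (X @ W)), with A stored in CSR form."""
    combined = dense_matmul(features, weights)
    aggregated = csr_spmm(indptr, indices, data, combined)
    return [[max(0.0, v) for v in row] for row in aggregated]


# Two-node graph with adjacency [[1, 1], [0, 1]] (self-loops included).
layer_out = gcn_layer([0, 2, 3], [0, 1, 1], [1.0, 1.0, 1.0],
                      [[1.0, 2.0], [3.0, -4.0]], [[1.0], [1.0]])
```

The irregular inner loop over `indptr` is the part whose cost depends on graph sparsity, which is what makes the workload hard to balance on fixed hardware.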

Hansson et al. [135] extend the gFADES architecture and focus on optimizing hardware designs for graph convolutional networks (GCNs) by exploring deep quantization strategies to improve performance and energy efficiency. The authors state that the challenge is to meet the computational requirements of GCNs on resource-constrained embedded hardware while maintaining accuracy. The paper provides runtime hardware-aware training for GCNs by embedding a mixed-precision design in the forward pass of the PyTorch backpropagation training loop, enabling significant speedups without trade-offs in accuracy. The methodology integrates a hardware accelerator into the PyTorch training loop to explore hardware configurations with different bit precision levels of up to four bits and to evaluate their impact on classification accuracy and performance. The work optimizes the quantization of features, adjacencies, weights, and intermediate data to increase hardware efficiency and throughput while minimizing logic requirements. The results are reported in terms of classification accuracy, execution time, and speedup for the different hardware configurations. The authors state that an optimized hardware design with deep quantization can provide significant speedup while maintaining classification accuracy.
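
The idea of running a low-precision forward pass inside an otherwise full-precision training loop can be sketched in a few lines. The example below is a simplified stand-in, not the method of [135]. It uses plain Python instead of PyTorch, a single scalar weight, no hardware accelerator, and the widely used straight-through estimator for the gradient, which is an assumption made here. All names (`fake_quantize`, `train_quantized_weight`) are hypothetical.

```python
def fake_quantize(value, bits, max_abs=1.0):
    """Snap a value to a symmetric signed grid with 2**(bits-1) - 1 positive levels."""
    levels = (1 << (bits - 1)) - 1
    clipped = max(-max_abs, min(max_abs, value))
    return round(clipped / max_abs * levels) / levels * max_abs


def train_quantized_weight(samples, bits, lr=0.1, epochs=200):
    """Fit y = w * x while the forward pass only ever sees the quantized weight."""
    weight = 0.0  # full-precision copy that receives the gradient updates
    for _ in range(epochs):
        for x, target in samples:
            w_q = fake_quantize(weight, bits)  # low-precision forward pass
            error = w_q * x - target
            # Straight-through estimator: treat d(w_q)/d(weight) as 1.
            grad = 2.0 * error * x
            weight -= lr * grad
    return weight, fake_quantize(weight, bits)


# Data generated by y = 0.5 * x; a 4-bit grid cannot represent 0.5 exactly.
final_weight, final_quantized = train_quantized_weight([(1.0, 0.5), (2.0, 1.0)], bits=4)
```

Because the 4-bit grid has no code for 0.5, the trained weight settles on a neighbouring grid point. Evaluating accuracy with the quantized forward pass during training exposes such effects before deployment.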

4.7. Future Research Directions

FPGA-based accelerators for graph neural networks (GNNs) hold significant promise for enhancing performance and energy efficiency, especially in edge computing environments. Future research could optimize FPGA architectures tailored to different GNN models, aiming to improve processing speed and reduce power consumption. Additionally, dynamic reconfiguration techniques that allow FPGAs to adapt in real time to varying computational loads and graph sizes could prove highly beneficial. Furthermore, investigating the use of high-level synthesis tools to simplify the design and implementation of FPGA-based GNN accelerators can make this technology more accessible to researchers and developers. These advances could significantly improve the deployment of GNNs in real-world applications, providing efficient and scalable solutions for complex graph-based problems.

5. Discussion and Future Research Directions

Following our examination of quantization techniques and reconfigurable (FPGA) hardware accelerators that enhance the computational efficiency in GNNs from the standpoint of embedded devices, we identify the variances and parallels among current methods based on their techniques. This section addresses the existing gaps, similarities, obstacles, and potential areas in the literature that could be investigated further for future studies.

5.1. Summary of Current Research

The summarized findings of our research on quantization methods and FPGA accelerators for GNNs are as follows:

Achieving a delicate balance between energy efficiency, training-output speed, and accuracy in unified approaches requires careful customization during the design phase according to the specific requirements of particular applications, highlighting the important role of future efforts in achieving this balance.
Quantization methods employed during both the training and inference phases offer effective solutions to challenges such as computational complexity and memory demands in GNN models.
Scalar quantization methods are prevalent in embedded systems due to their ease of implementation and the computational efficiency of integer arithmetic.
Vector quantization provides higher compression ratios compared to scalar quantization by grouping multiple vectors together.
Mixed precision approaches show the potential to maintain accuracy while reducing model size. However, different bit representations can introduce computational complexity from a hardware standpoint.
Research shows that accuracy comparable to that of 16-bit and 8-bit quantization can be reached with lower bit widths such as 4-bit and 2-bit.
The current body of FPGA studies related to graph neural network (GNN) models is still insufficient to comprehensively address the complexities of embedded system applications.
The adaptive nature of FPGA accelerators exhibits notable efficacy in accommodating diverse application requirements, demonstrating their potential for widespread adoption in various domains.
While a significant portion of research efforts are focused on GNN inference, there is a critical need to accelerate the training phase.
While the use of common datasets and network models provides an initial benchmark for researchers, the limited extension of studies to diverse application domains and the absence of established baselines pose significant challenges that remain to be resolved.
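
The compression argument for scalar versus vector quantization in the summary above can be made concrete with a short storage calculation. The sketch below is illustrative only. The matrix size, bit widths, and codebook size are hypothetical values chosen here, and the calculation says nothing about the accuracy either scheme would reach.

```python
import math


def scalar_quantization_ratio(float_bits, quant_bits):
    """Each element shrinks from float_bits to quant_bits."""
    return float_bits / quant_bits


def vector_quantization_ratio(num_vectors, dim, codebook_size, float_bits=32):
    """Each vector is replaced by a codebook index; the codebook itself is stored once."""
    original_bits = num_vectors * dim * float_bits
    index_bits = math.ceil(math.log2(codebook_size))
    compressed_bits = num_vectors * index_bits + codebook_size * dim * float_bits
    return original_bits / compressed_bits


# Hypothetical feature matrix: 10,000 nodes with 64 float32 features each.
scalar_ratio = scalar_quantization_ratio(float_bits=32, quant_bits=8)
vector_ratio = vector_quantization_ratio(num_vectors=10_000, dim=64, codebook_size=256)
print(f"scalar 8-bit: {scalar_ratio:.1f}x, vector (256 codewords): {vector_ratio:.1f}x")
```

Under these assumed sizes the vector scheme compresses far more than 8-bit scalar quantization, because one index stands in for an entire feature vector. The gap narrows as the codebook grows relative to the number of vectors.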

5.2. Future Research Directions

Despite the growing interest in machine learning models, there is a noticeable gap in the literature regarding the investigation of FPGA-based GNN accelerators and efficient quantization methods designed specifically for embedded systems. This lack of research emphasizes the importance of investigating computationally efficient applications in this area. In addressing this gap, several key themes emerge:

Combining vector and scalar quantization can offer the advantages of both integer arithmetic computational power and the high compression ratio of vector quantization, which is crucial for developing highly efficient low-dimensional models for hardware applications.
For embedded system applications and accelerator studies, integer arithmetic provides high computational efficiency. Consequently, the development of fully quantized GNN models specifically designed for embedded system applications is crucial for efficient scalable future work.
High accuracy levels can be achieved even at low bit levels with new quantization methods. In this context, the adoption of aggressive methods involving low-bit representations to integrate large GNN models into embedded device applications is expected to attract the attention of more researchers.
FPGA implementations targeting embedded systems are far fewer than quantization studies, highlighting an important research gap in the FPGA field.
There is a growing need to accelerate the training phase, especially for dynamic graph structures, and this remains an open research gap.
Although this work is focused on quantization and FPGA-based accelerators, additional techniques such as sampling, reordering, simplification, and knowledge distillation are currently being used with promising results. It is anticipated that interest in these additional methods, alongside quantization, will grow in hardware-based applications.

6. Conclusions

In this paper, we provide an overview of FPGA-based accelerator approaches for computationally efficient graph neural networks (GNNs) with a focus on embedded hardware. In addition, we explore quantization as a complementary method. To the best of our knowledge, this is the first survey in the literature of quantization methods developed for GNNs, and we assess common fundamentals by comparing these methods using results from published papers. Furthermore, with this survey, we update the work on FPGA-based accelerators for GNNs and provide a taxonomy of existing GNN models.

Our results indicate a growing need for lightweight and computationally efficient GNN models. Quantization studies enable the development of models that run on embedded hardware due to their high compression ratios. Additionally, hardware-based accelerators for GNNs are gaining popularity, and we demonstrate promising results for future work, emphasizing that FPGA accelerators combined with quantization are under-explored when embedded device applications are the focus of research.

Given these findings, we suggest that future work should further investigate the integration of quantization and FPGA-based accelerators to improve the computational efficiency of GNNs. Specifically, studies evaluating the effectiveness of these techniques in real-world scenarios and on large datasets can expand the practical applications of GNNs and promote their use in industrial applications. Moreover, the combination of quantization and accelerator technologies could unlock new research opportunities to optimize energy efficiency and performance in embedded systems.

Author Contributions

Investigation, H.T.K.; writing—original draft preparation, H.T.K.; writing—review and editing, H.T.K., J.N.-Y., R.P. and J.P.; visualization, H.T.K.; supervision, J.N.-Y., R.P. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

Kose’s PhD program at the University of Bristol is funded by the Republic of Turkiye Ministry of National Education. Nunez-Yanez’s position at the Wallenberg AI autonomous systems and software (WASP) program is funded by the Knut and Alice Wallenberg Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the materials used in the study are mentioned within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef]
Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
Gama, F.; Isufi, E.; Leus, G.; Ribeiro, A. Graphs, convolutions, and neural networks: From graph filters to graph neural networks. IEEE Signal Process. Mag. 2020, 37, 128–138. [Google Scholar] [CrossRef]
Coutino, M.; Isufi, E.; Leus, G. Advances in distributed graph filtering. IEEE Trans. Signal Process. 2019, 67, 2320–2333. [Google Scholar] [CrossRef]
Saad, L.B.; Beferull-Lozano, B. Quantization in graph convolutional neural networks. In Proceedings of the 29th IEEE European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 1855–1859. [Google Scholar]
Zhu, R.; Zhao, K.; Yang, H.; Lin, W.; Zhou, C.; Ai, B.; Li, Y.; Zhou, J. Aligraph: A comprehensive graph neural network platform. arXiv 2019, arXiv:1902.08730. [Google Scholar] [CrossRef]
Ju, X.; Farrell, S.; Calafiura, P.; Murnane, D.; Gray, L.; Klijnsma, T.; Pedro, K.; Cerati, G.; Kowalkowski, J.; Perdue, G.; et al. Graph neural networks for particle reconstruction in high energy physics detectors. arXiv 2020, arXiv:2003.11603. [Google Scholar]
Ju, X.; Murnane, D.; Calafiura, P.; Choma, N.; Conlon, S.; Farrell, S.; Xu, Y.; Spiropulu, M.; Vlimant, J.R.; Aurisano, A.; et al. Performance of a geometric deep learning pipeline for HL-LHC particle tracking. Eur. Phys. J. C 2021, 81, 1–14. [Google Scholar] [CrossRef]
Wu, L.; Chen, Y.; Shen, K.; Guo, X.; Gao, H.; Li, S.; Pei, J.; Long, B. Graph neural networks for natural language processing: A survey. Found. Trends® Mach. Learn. 2023, 16, 119–328. [Google Scholar] [CrossRef]
Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey. Expert Syst. Appl. 2022, 207, 117921. [Google Scholar] [CrossRef]
Pope, J.; Liang, J.; Kumar, V.; Raimondo, F.; Sun, X.; McConville, R.; Pasquier, T.; Piechocki, R.; Oikonomou, G.; Luo, B.; et al. Resource-Interaction Graph: Efficient Graph Representation for Anomaly Detection. arXiv 2022, arXiv:2212.08525. [Google Scholar]
Betkier, I.; Oszczypała, M.; Pobożniak, J.; Sobieski, S.; Betkier, P. PocketFinderGNN: A manufacturing feature recognition software based on Graph Neural Networks (GNNs) using PyTorch Geometric and NetworkX. SoftwareX 2023, 23, 101466. [Google Scholar] [CrossRef]
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Huang, W.; Zhang, T.; Rong, Y.; Huang, J. Adaptive sampling towards fast graph representation learning. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 974–983. [Google Scholar]
Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv 2018, arXiv:1806.01261. [Google Scholar]
Wang, M.Y. Deep graph library: Towards efficient and scalable deep learning on graphs. In Proceedings of the ICLR Workshop on Representation Learning on Graphs and Manifolds, New Orleans, LA, USA, 6 May 2019. [Google Scholar]
Lerer, A.; Wu, L.; Shen, J.; Lacroix, T.; Wehrstedt, L.; Bose, A.; Peysakhovich, A. Pytorch-biggraph: A large scale graph embedding system. Proc. Mach. Learn. Syst. 2019, 1, 120–131. [Google Scholar]
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef]
Zhang, Z.; Cui, P.; Zhu, W. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 249–270. [Google Scholar] [CrossRef]
Geng, T.; Li, A.; Shi, R.; Wu, C.; Wang, T.; Li, Y.; Haghi, P.; Tumeo, A.; Che, S.; Reinhardt, S.; et al. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020; pp. 922–936. [Google Scholar]
Fey, M.; Lenssen, J.E. Fast graph representation learning with PyTorch Geometric. arXiv 2019, arXiv:1903.02428. [Google Scholar]
Ferludin, O.; Eigenwillig, A.; Blais, M.; Zelle, D.; Pfeifer, J.; Sanchez-Gonzalez, A.; Li, S.; Abu-El-Haija, S.; Battaglia, P.; Bulut, N.; et al. TF-GNN: Graph neural networks in TensorFlow. arXiv 2022, arXiv:2207.03522. [Google Scholar]
Yazdanbakhsh, A.; Park, J.; Sharma, H.; Lotfi-Kamran, P.; Esmaeilzadeh, H. Neural acceleration for GPU throughput processors. In Proceedings of the 48th International Symposium on Microarchitecture, Waikiki, HI, USA, 5–9 December 2015; pp. 482–493. [Google Scholar]
Tian, T.; Zhao, L.; Wang, X.; Wu, Q.; Yuan, W.; Jin, X. FP-GNN: Adaptive FPGA accelerator for graph neural networks. Future Gener. Comput. Syst. 2022, 136, 294–310. [Google Scholar] [CrossRef]
Nunez-Yanez, J.; Hosseinabady, M. Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks. Array 2021, 12, 100101. [Google Scholar] [CrossRef]
Sit, M.; Kazami, R.; Amano, H. FPGA-based accelerator for losslessly quantized convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Field Programmable Technology (ICFPT), Melbourne, VIC, Australia, 11–13 December 2017; pp. 295–298. [Google Scholar]
Zhang, B.; Kuppannagari, S.R.; Kannan, R.; Prasanna, V. Efficient neighbor-sampling-based gnn training on cpu-fpga heterogeneous platform. In Proceedings of the 2021 IEEE High Performance Extreme Computing Conference (HPEC), Virtual, 21–23 September 2021; pp. 1–7. [Google Scholar]
Liang, S.; Wang, Y.; Liu, C.; He, L.; Huawei, L.; Xu, D.; Li, X. Engn: A high-throughput and energy-efficient accelerator for large graph neural networks. IEEE Trans. Comput. 2020, 70, 1511–1525. [Google Scholar] [CrossRef]
Zhang, S.; Sohrabizadeh, A.; Wan, C.; Huang, Z.; Hu, Z.; Wang, Y.; Cong, J.; Sun, Y. A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware. arXiv 2023, arXiv:2306.14052. [Google Scholar]
Zeng, H.; Prasanna, V. GraphACT: Accelerating GCN training on CPU-FPGA heterogeneous platforms. In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, USA, 23–25 February 2020; pp. 255–265. [Google Scholar]
Kiningham, K.; Levis, P.; Ré, C. GReTA: Hardware optimized graph processing for GNNs. In Proceedings of the Workshop on Resource-Constrained Machine Learning (ReCoML 2020), Austin, TX, USA, 2–4 March 2020. [Google Scholar]
Que, Z.; Loo, M.; Fan, H.; Blott, M.; Pierini, M.; Tapper, A.D.; Luk, W. LL-GNN: Low latency graph neural networks on FPGAs for particle detectors. arXiv 2022, arXiv:2209.14065. [Google Scholar]
Zhao, L.; Wu, Q.; Wang, X.; Tian, T.; Wu, W.; Jin, X. HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization. In Proceedings of the 2022 IEEE High Performance Extreme Computing Conference (HPEC), Virtual Conference, 19–23 September 2022; pp. 1–7. [Google Scholar]
Gholami, A.; Kim, S.; Dong, Z.; Yao, Z.; Mahoney, M.W.; Keutzer, K. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022; pp. 291–326. [Google Scholar]
Tailor, S.A.; Fernandez-Marques, J.; Lane, N.D. Degree-quant: Quantization-aware training for graph neural networks. arXiv 2020, arXiv:2008.05000. [Google Scholar]
Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 2018, 151, 78–94. [Google Scholar] [CrossRef]
Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 2019, 6, 1–23. [Google Scholar] [CrossRef]
Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: Algorithms, applications and open challenges. In Proceedings of the Computational Data and Social Networks: 7th International Conference, CSoNet 2018, Shanghai, China, 18–20 December 2018; Proceedings 7. Springer: Berlin, Germany, 2018; pp. 79–91. [Google Scholar]
Quan, P.; Shi, Y.; Lei, M.; Leng, J.; Zhang, T.; Niu, L. A brief review of receptive fields in graph convolutional networks. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence-Companion Volume, Thessaloniki, Greece, 14–17 October 2019; pp. 106–110. [Google Scholar]
Asif, N.A.; Sarker, Y.; Chakrabortty, R.K.; Ryan, M.J.; Ahamed, M.H.; Saha, D.K.; Badal, F.R.; Das, S.K.; Ali, M.F.; Moyeen, S.I.; et al. Graph neural network: A comprehensive review on non-euclidean space. IEEE Access 2021, 9, 60588–60606. [Google Scholar] [CrossRef]
Chami, I.; Abu-El-Haija, S.; Perozzi, B.; Ré, C.; Murphy, K. Machine learning on graphs: A model and comprehensive taxonomy. J. Mach. Learn. Res. 2022, 23, 3840–3903. [Google Scholar]
Veličković, P. Everything is connected: Graph neural networks. Curr. Opin. Struct. Biol. 2023, 79, 102538. [Google Scholar] [CrossRef]
Bhatti, U.A.; Tang, H.; Wu, G.; Marjan, S.; Hussain, A. Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. Int. J. Intell. Syst. 2023, 2023, 1–28. [Google Scholar] [CrossRef]
Xu, X.; Zhao, X.; Wei, M.; Li, Z. A comprehensive review of graph convolutional networks: Approaches and applications. Electron. Res. Arch. 2023, 31, 4185–4215. [Google Scholar] [CrossRef]
Shabani, N.; Wu, J.; Beheshti, A.; Sheng, Q.Z.; Foo, J.; Haghighi, V.; Hanif, A.; Shahabikargar, M. A comprehensive survey on graph summarization with graph neural networks. IEEE Trans. Artif. Intell. 2024. [Google Scholar] [CrossRef]
Ju, W.; Fang, Z.; Gu, Y.; Liu, Z.; Long, Q.; Qiao, Z.; Qin, Y.; Shen, J.; Sun, F.; Xiao, Z.; et al. A comprehensive survey on deep graph representation learning. Neural Netw. 2024, 173, 106207. [Google Scholar] [CrossRef]
Liu, R.; Xing, P.; Deng, Z.; Li, A.; Guan, C.; Yu, H. Federated Graph Neural Networks: Overview, Techniques, and Challenges. IEEE Trans. Neural Netw. Learn. Syst. 2024. [Google Scholar] [CrossRef]
Lopera, D.S.; Servadei, L.; Kiprit, G.N.; Hazra, S.; Wille, R.; Ecker, W. A survey of graph neural networks for electronic design automation. In Proceedings of the 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), Raleigh, NC, USA, 30 August–3 September 2021; pp. 1–6. [Google Scholar]
Liu, X.; Yan, M.; Deng, L.; Li, G.; Ye, X.; Fan, D. Sampling methods for efficient training of graph convolutional networks: A survey. IEEE/CAA J. Autom. Sin. 2021, 9, 205–234. [Google Scholar] [CrossRef]
Varlamis, I.; Michail, D.; Glykou, F.; Tsantilas, P. A survey on the use of graph convolutional networks for combating fake news. Future Internet 2022, 14, 70. [Google Scholar] [CrossRef]
Li, H.; Zhao, Y.; Mao, Z.; Qin, Y.; Xiao, Z.; Feng, J.; Gu, Y.; Ju, W.; Luo, X.; Zhang, M. A survey on graph neural networks in intelligent transportation systems. arXiv 2024, arXiv:2401.00713. [Google Scholar]
Lamb, L.C.; Garcez, A.; Gori, M.; Prates, M.; Avelar, P.; Vardi, M. Graph neural networks meet neural-symbolic computing: A survey and perspective. arXiv 2020, arXiv:2003.00330. [Google Scholar]
Malekzadeh, M.; Hajibabaee, P.; Heidari, M.; Zad, S.; Uzuner, O.; Jones, J.H. Review of graph neural network in text classification. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021. [Google Scholar]
Ahmad, T.; Jin, L.; Zhang, X.; Lai, S.; Tang, G.; Lin, L. Graph convolutional neural network for human action recognition: A comprehensive survey. IEEE Trans. Artif. Intell. 2021, 2, 128–145. [Google Scholar] [CrossRef]
Dong, G.; Tang, M.; Wang, Z.; Gao, J.; Guo, S.; Cai, L.; Gutierrez, R.; Campbell, B.; Barnes, L.E.; Boukhechba, M. Graph neural networks in IoT: A survey. ACM Trans. Sens. Netw. 2023, 19, 1–50. [Google Scholar] [CrossRef]
Jia, M.; Gabrys, B.; Musial, K. A Network Science perspective of Graph Convolutional Networks: A survey. IEEE Access 2023. [Google Scholar] [CrossRef]
Ren, H.; Lu, W.; Xiao, Y.; Chang, X.; Wang, X.; Dong, Z.; Fang, D. Graph convolutional networks in language and vision: A survey. Knowl.-Based Syst. 2022, 251, 109250. [Google Scholar] [CrossRef]
Garg, R.; Qin, E.; Martínez, F.M.; Guirado, R.; Jain, A.; Abadal, S.; Abellán, J.L.; Acacio, M.E.; Alarcón, E.; Rajamanickam, S.; et al. A Taxonomy for Classification and Comparison of Dataflows for Gnn Accelerators; Technical Report; Sandia National Lab. (SNL-NM): Albuquerque, NM, USA, 2021. [Google Scholar]
Li, S.; Tao, Y.; Tang, E.; Xie, T.; Chen, R. A survey of field programmable gate array (FPGA)-based graph convolutional neural network accelerators: Challenges and opportunities. PeerJ Comput. Sci. 2022, 8, e1166. [Google Scholar] [CrossRef]
Liu, X.; Yan, M.; Deng, L.; Li, G.; Ye, X.; Fan, D.; Pan, S.; Xie, Y. Survey on graph neural network acceleration: An algorithmic perspective. arXiv 2022, arXiv:2202.04822. [Google Scholar]
Abadal, S.; Jain, A.; Guirado, R.; López-Alonso, J.; Alarcón, E. Computing graph neural networks: A survey from algorithms to accelerators. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
Liao, R.; Zhao, Z.; Urtasun, R.; Zemel, R.S. Lanczosnet: Multi-scale deep graph convolutional networks. arXiv 2019, arXiv:1901.01484. [Google Scholar]
Dwivedi, V.P.; Bresson, X. A generalization of transformer networks to graphs. arXiv 2020, arXiv:2012.09699. [Google Scholar]
Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; Bronstein, M.M. Geometric deep learning on graphs and manifolds using mixture model cnns. arXiv 2016, arXiv:1611.08402. [Google Scholar]
Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated graph sequence neural networks. arXiv 2015, arXiv:1511.05493. [Google Scholar]
Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308. [Google Scholar]
Pan, S.; Hu, R.; Long, G.; Jiang, J.; Yao, L.; Zhang, C. Adversarially regularized graph autoencoder for graph embedding. arXiv 2018, arXiv:1802.04407. [Google Scholar]
You, J.; Ying, R.; Ren, X.; Hamilton, W.; Leskovec, J. Graphrnn: Generating realistic graphs with deep auto-regressive models. In Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5708–5717. [Google Scholar]
Ying, Z.; You, J.; Morris, C.; Ren, X.; Hamilton, W.; Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
Ma, Y.; Wang, S.; Aggarwal, C.C.; Tang, J. Graph convolutional networks with eigenpooling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 723–731. [Google Scholar]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826. [Google Scholar]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
Nunez-Yanez, J. Accelerating Graph Neural Networks in Pytorch with HLS and Deep Dataflows. In Proceedings of the International Symposium on Applied Reconfigurable Computing; Springer: Berlin, Germany, 2023; pp. 131–145. [Google Scholar]
Chen, R.; Zhang, H.; Li, S.; Tang, E.; Yu, J.; Wang, K. Graph-OPU: A Highly Integrated FPGA-Based Overlay Processor for Graph Neural Networks. In Proceedings of the 2023 33rd IEEE International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden, 4–8 September 2023; pp. 228–234. [Google Scholar]
Novkin, R.; Amrouch, H.; Klemme, F. Approximation-aware and quantization-aware training for graph neural networks. IEEE Trans. Comput. 2024, 73, 599–612. [Google Scholar] [CrossRef]
Wan, B.; Zhao, J.; Wu, C. Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training. In Proceedings of the Machine Learning and Systems, Miami Beach, FL, USA, 4–8 June 2023; Volume 5. [Google Scholar]
Wu, Q.; Zhao, L.; Liang, H.; Wang, X.; Tao, L.; Tian, T.; Wang, T.; He, Z.; Wu, W.; Jin, X. GCINT: Dynamic Quantization Algorithm for Training Graph Convolution Neural Networks Using Only Integers. 2023. Available online: https://openreview.net/forum?id=cIFtriyX6on (accessed on 20 June 2024).
Wang, Y.; Feng, B.; Ding, Y. QGTC: Accelerating quantized graph neural networks via GPU tensor core. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual, 2–6 April 2022; pp. 107–119. [Google Scholar]
Ma, Y.; Gong, P.; Yi, J.; Yao, Z.; Li, C.; He, Y.; Yan, F. Bifeat: Supercharge gnn training via graph feature quantization. arXiv 2022, arXiv:2207.14696. [Google Scholar]
Eliasof, M.; Bodner, B.J.; Treister, E. Haar wavelet feature compression for quantized graph convolutional networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 4542–4553. [Google Scholar] [CrossRef]
Dai, Y.; Tang, X.; Zhang, Y. An efficient segmented quantization for graph neural networks. CCF Trans. High Perform. Comput. 2022, 4, 461–473. [Google Scholar] [CrossRef]
Zhu, Z.; Li, F.; Mo, Z.; Hu, Q.; Li, G.; Liu, Z.; Liang, X.; Cheng, J. A²Q: Aggregation-Aware Quantization for Graph Neural Networks. arXiv 2023, arXiv:2302.00193. [Google Scholar]
Wang, S.; Eravci, B.; Guliyev, R.; Ferhatosmanoglu, H. Low-bit quantization for deep graph neural networks with smoothness-aware message propagation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 2626–2636. [Google Scholar]
Liu, Z.; Zhou, K.; Yang, F.; Li, L.; Chen, R.; Hu, X. EXACT: Scalable graph neural networks training via extreme activation compression. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021. [Google Scholar]
Eliassen, S.; Selvan, R. Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization. arXiv 2023, arXiv:2309.11856. [Google Scholar]
Ding, M.; Kong, K.; Li, J.; Zhu, C.; Dickerson, J.; Huang, F.; Goldstein, T. VQ-GNN: A universal framework to scale up graph neural networks using vector quantization. Adv. Neural Inf. Process. Syst. 2021, 34, 6733–6746. [Google Scholar]
Feng, B.; Wang, Y.; Li, X.; Yang, S.; Peng, X.; Ding, Y. Sgquant: Squeezing the last bit on graph neural networks with specialized quantization. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 1044–1052.
Zhao, Y.; Wang, D.; Bates, D.; Mullins, R.; Jamnik, M.; Lio, P. Learned low precision graph neural networks. arXiv 2020, arXiv:2009.09232.
Bahri, M.; Bahl, G.; Zafeiriou, S. Binary graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9492–9501.
Wang, H.; Lian, D.; Zhang, Y.; Qin, L.; He, X.; Lin, Y.; Lin, X. Binarized graph neural network. World Wide Web 2021, 24, 825–848.
Huang, L.; Zhang, Z.; Du, Z.; Li, S.; Zheng, H.; Xie, Y.; Tan, N. EPQuant: A Graph Neural Network compression approach based on product quantization. Neurocomputing 2022, 503, 49–61.
Wang, J.; Wang, Y.; Yang, Z.; Yang, L.; Guo, Y. Bi-gcn: Binary graph convolutional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1561–1570.
Kose, H.T.; Nunez-Yanez, J.; Piechocki, R.; Pope, J. Fully Quantized Graph Convolutional Networks for Embedded Applications. In Proceedings of the 6th Workshop on Accelerated Machine Learning, Munich, Germany, 17 January 2024.
Chen, Y.; Guo, Y.; Zeng, Z.; Zou, X.; Li, Y.; Chen, C. Topology-Aware Quantization Strategy via Personalized PageRank for Graph Neural Networks. In Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), Haikou, China, 15–18 December 2022; pp. 961–968.
Guo, Y.; Chen, Y.; Zou, X.; Yang, X.; Gu, Y. Algorithms and architecture support of degree-based quantization for graph neural networks. J. Syst. Archit. 2022, 129, 102578.
Xie, X.; Peng, H.; Hasan, A.; Huang, S.; Zhao, J.; Fang, H.; Zhang, W.; Geng, T.; Khan, O.; Ding, C. Accel-gcn: High-performance gpu accelerator design for graph convolution networks. In Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9.
Ma, L.; Yang, Z.; Miao, Y.; Xue, J.; Wu, M.; Zhou, L.; Dai, Y. NeuGraph: Parallel deep neural network computation on large graphs. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19), Renton, WA, USA, 10–12 July 2019; pp. 443–458.
Peng, H.; Xie, X.; Shivdikar, K.; Hasan, M.; Zhao, J.; Huang, S.; Khan, O.; Kaeli, D.; Ding, C. Maxk-gnn: Towards theoretical speed limits for accelerating graph neural networks training. arXiv 2023, arXiv:2312.08656.
Yan, M.; Deng, L.; Hu, X.; Liang, L.; Feng, Y.; Ye, X.; Zhang, Z.; Fan, D.; Xie, Y. Hygcn: A gcn accelerator with hybrid architecture. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA, 22–26 February 2020; pp. 15–29.
Yin, L.; Wang, J.; Zheng, H. Exploring architecture, dataflow, and sparsity for gcn accelerators: A holistic framework. In Proceedings of the Great Lakes Symposium on VLSI 2023, Knoxville, TN, USA, 5–7 June 2023; pp. 489–495.
Auten, A.; Tomei, M.; Kumar, R. Hardware acceleration of graph neural networks. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), Virtual Event, 20–24 July 2020; pp. 1–6.
Chen, X.; Wang, Y.; Xie, X.; Hu, X.; Basak, A.; Liang, L.; Yan, M.; Deng, L.; Ding, Y.; Du, Z.; et al. Rubik: A hierarchical architecture for efficient graph neural network training. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2021, 41, 936–949.
Li, J.; Louri, A.; Karanth, A.; Bunescu, R. GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks. In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 27 February–3 March 2021; pp. 775–788.
Li, J.; Zheng, H.; Wang, K.; Louri, A. SGCNAX: A scalable graph convolutional neural network accelerator with workload balancing. IEEE Trans. Parallel Distrib. Syst. 2021, 33, 2834–2845.
Kiningham, K.; Levis, P.; Ré, C. GRIP: A graph neural network accelerator architecture. IEEE Trans. Comput. 2022, 72, 914–925.
Zhang, B.; Kannan, R.; Prasanna, V. BoostGCN: A framework for optimizing GCN inference on FPGA. In Proceedings of the 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Orlando, FL, USA, 9–12 May 2021; pp. 29–39.
Zhang, C.; Geng, T.; Guo, A.; Tian, J.; Herbordt, M.; Li, A.; Tao, D. H-gcn: A graph convolutional network accelerator on versal acap architecture. In Proceedings of the 2022 32nd IEEE International Conference on Field-Programmable Logic and Applications (FPL), Belfast, UK, 29 August–2 September 2022; pp. 200–208.
Romero Hung, J.; Li, C.; Wang, P.; Shao, C.; Guo, J.; Wang, J.; Shi, G. ACE-GCN: A Fast data-driven FPGA accelerator for GCN embedding. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2021, 14, 1–23.
Geng, T.; Wu, C.; Zhang, Y.; Tan, C.; Xie, C.; You, H.; Herbordt, M.; Lin, Y.; Li, A. I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization. In Proceedings of the MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Online Conference, 18–22 October 2021; pp. 1051–1063.
Lin, Y.C.; Zhang, B.; Prasanna, V. Gcn inference acceleration using high-level synthesis. In Proceedings of the 2021 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 20–24 September 2021; pp. 1–6.
Zhang, B.; Zeng, H.; Prasanna, V. Hardware acceleration of large scale gcn inference. In Proceedings of the 2020 IEEE 31st International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Manchester, UK, 6–8 July 2020; pp. 61–68.
Sohrabizadeh, A.; Chi, Y.; Cong, J. SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation. arXiv 2021, arXiv:2111.05936.
Gui, Y.; Wei, B.; Yuan, W.; Jin, X. Hardware Acceleration of Sampling Algorithms in Sample and Aggregate Graph Neural Networks. arXiv 2022, arXiv:2209.02916.
Li, S.; Niu, D.; Wang, Y.; Han, W.; Zhang, Z.; Guan, T.; Guan, Y.; Liu, H.; Huang, L.; Du, Z.; et al. Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network. In Proceedings of the 49th Annual International Symposium on Computer Architecture, New York, NY, USA, 18–22 June 2022; pp. 946–961.
Chen, S.; Zheng, D.; Ding, C.; Huan, C.; Ji, Y.; Liu, H. TANGO: Re-Thinking quantization for graph neural network training on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 11–17 November 2023; pp. 1–14.
Zhang, B.; Zeng, H.; Prasanna, V. Low-latency mini-batch gnn inference on cpu-fpga heterogeneous platform. In Proceedings of the 2022 IEEE 29th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Bengaluru, India, 18–21 December 2022; pp. 11–21.
Lin, Y.C.; Zhang, B.; Prasanna, V. Hp-gnn: Generating high throughput gnn training implementation on cpu-fpga heterogeneous platform. In Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, 27 February–1 March 2022; pp. 123–133.
Sarkar, R.; Abi-Karam, S.; He, Y.; Sathidevi, L.; Hao, C. FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference. In Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Montreal, QC, Canada, 25 February–1 March 2023; pp. 1099–1112.
Liang, S.; Liu, C.; Wang, Y.; Li, H.; Li, X. Deepburning-gl: An automated framework for generating graph neural network accelerators. In Proceedings of the 39th International Conference on Computer-Aided Design, Virtual, 2–5 November 2020; pp. 1–9.
Chen, H.; Hao, C. Dgnn-booster: A generic fpga accelerator framework for dynamic graph neural network inference. In Proceedings of the 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Marina Del Rey, CA, USA, 8–11 May 2023; pp. 195–201.
Abi-Karam, S.; Hao, C. Gnnbuilder: An automated framework for generic graph neural network accelerator generation, simulation, and optimization. In Proceedings of the 2023 33rd IEEE International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden, 4–8 September 2023; pp. 212–218.
Lu, Q.; Jiang, W.; Jiang, M.; Hu, J.; Shi, Y. Hardware/Software Co-Exploration for Graph Neural Architectures on FPGAs. In Proceedings of the 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Nicosia, Cyprus, 4–6 July 2022; pp. 358–362.
Yan, W.; Tong, W.; Zhi, X. FPGAN: An FPGA accelerator for graph attention networks with software and hardware co-optimization. IEEE Access 2020, 8, 171608–171620.
Wu, C.; Tao, Z.; Wang, K.; He, L. Skeletongcn: A simple yet effective accelerator for gcn training. In Proceedings of the 2022 IEEE 32nd International Conference on Field-Programmable Logic and Applications (FPL), Belfast, UK, 29 August–2 September 2022; pp. 445–451.
Yuan, W.; Tian, T.; Wu, Q.; Jin, X. QEGCN: An FPGA-based accelerator for quantized GCNs with edge-level parallelism. J. Syst. Archit. 2022, 129, 102596.
He, Z.; Tian, T.; Wu, Q.; Jin, X. FTW-GAT: An FPGA-based accelerator for graph attention networks with ternary weights. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 4211–4215.
Wang, Z.; Que, Z.; Luk, W.; Fan, H. Customizable FPGA-based Accelerator for Binarized Graph Neural Networks. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 27 May–1 June 2022; pp. 1968–1972.
Ran, S.; Zhao, B.; Dai, X.; Cheng, C.; Zhang, Y. Software-hardware co-design for accelerating large-scale graph convolutional network inference on FPGA. Neurocomputing 2023, 532, 129–140.
Yuan, W.; Tian, T.; Liang, H.; Jin, X. A gather accelerator for GNNs on FPGA platform. In Proceedings of the 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), Beijing, China, 14–16 December 2021; pp. 74–81.
Tao, Z.; Wu, C.; Liang, Y.; Wang, K.; He, L. LW-GCN: A lightweight FPGA-based graph convolutional network accelerator. ACM Trans. Reconfigurable Technol. Syst. 2022, 16, 1–19.
Zhou, H.; Zhang, B.; Kannan, R.; Prasanna, V.; Busart, C. Model-architecture co-design for high performance temporal gnn inference on fpga. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 30 May–3 June 2022; pp. 1108–1117.
Hansson, O.; Grailoo, M.; Gustafsson, O.; Nunez-Yanez, J. Deep Quantization of Graph Neural Networks with Run-Time Hardware-Aware Training. In Proceedings of the International Symposium on Applied Reconfigurable Computing; Springer: Berlin/Heidelberg, Germany, 2024; pp. 33–47.

Figure 1. Diagram showing the key areas forming the basis of the survey.

Figure 2. Proposed taxonomy based on variants of GNNs.

Figure 3. An overview representation of the GNN model including input matrices, layers, aggregation, and combination stages.

Figure 4. Uniform quantization (left) with equally spaced quantization levels and non-uniform quantization (right) with quantization levels adapted to the data distribution. Here, x denotes the input data and y the quantized values.

Figure 5. Illustration of symmetric and signed quantization (left) and asymmetric and unsigned quantization (right).

Figure 6. A vector (x) represents a data element in a multidimensional space, while a codeword (yellow circle) is the best representative of the vectors assigned to it. Codewords mark the density centers of the data space and are used in the nearest-neighbor search over the vectors. Each Voronoi region groups the vectors that are closest to one codeword, so the Voronoi diagram partitions the data space and determines the codeword assigned to each vector.

Figure 9. Illustration of speedup and energy efficiency comparison for hardware-based accelerators (GPU, ASIC, FPGA). The results of the HyGCN study are used as a baseline ($1\times$). The y-axis is presented as a factor, indicating the degree of improvement achieved by the studies in comparison to HyGCN. In the chart, V100 represents the GPU baseline, AWB represents AWB-GCN, and FP represents FP-GNN. The results are used directly as presented in the papers.

Figure 10. Comparison of latency (ms) and energy efficiency (graph/kJ) of 3 different methods using the same hardware (Intel Stratix 10 SX, 330 MHz) on the same datasets.

Figure 11. Latency (ms) and energy efficiency (graph/kJ) comparison of 4 different methods on the same datasets. AWB-GCN does not use quantization, whereas the other 3 approaches do.

Figure 12. Resource utilization of FPGA-based hardware accelerators combined with quantization. Here, FP denotes FP-GNN, FTW denotes FTW-GAT, LL denotes LL-GNN, and LW denotes LW-GCN; these are shown alongside Wang et al. [130] and Ran et al. [131].

Table 2. Acronyms, notations and descriptions.

Description	Notation	Description	Notation
Original graph	$G$	Feature matrix of the l-th layer	$H^{l}$
The set of graph vertices (nodes)	$V$	Trainable weight matrix for the l-th layer	$W^{l}$
The set of graph edges	$E$	Diagonal degree matrix of A	$\tilde{D}$
The feature of node v in layer l	$h_{v}^{l}$	Degree of node i	$d_{i}$
Number of nodes in graph G	$N$	Degree of node j	$d_{j}$
Layer index	$l$	Sigmoid function	$\sigma$
Number of edges in graph G	$M$	Multi-layer perceptron	$\mathrm{MLP}$
Gather feature vector of node v in layer l	$a_{v}^{l}$	Learnable parameter	$\epsilon$
Adjacency matrix	$A$	Attention coefficients	$a_{ij}$
Normalized adjacency matrix	$\tilde{A}$	Transpose	$T$
Concatenation	$\|$	Linear transformation weight matrix	$W^{k}$
Maximum operation	$\max$	Max-pooling operation	$W_{\mathrm{pool}}$
Graph neural network	GNN	Graph convolutional network	GCN
Field-Programmable Gate Array	FPGA	Post-training quantization	PTQ
Quantization-aware training	QAT	Floating-point	FP
Integer number	INT	Lookup table	LUT
Flip-flops	FF	Digital signal processing element	DSP
Block RAM	BRAM	High-level synthesis	HLS

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kose, H.T.; Nunez-Yanez, J.; Piechocki, R.; Pope, J. A Survey of Computationally Efficient Graph Neural Networks for Reconfigurable Systems. Information 2024, 15, 377. https://doi.org/10.3390/info15070377
