Article

A Novel Approach Based on Hypergraph Convolutional Neural Networks for Cartilage Shape Description and Longitudinal Prediction of Knee Osteoarthritis Progression

by John B. Theocharis, Christos G. Chadoulos *,† and Andreas L. Symeonidis
Department of Electrical & Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mach. Learn. Knowl. Extr. 2025, 7(2), 40; https://doi.org/10.3390/make7020040
Submission received: 16 February 2025 / Revised: 8 April 2025 / Accepted: 16 April 2025 / Published: 26 April 2025
(This article belongs to the Section Network)

Abstract

Knee osteoarthritis (KOA) is a highly prevalent musculoskeletal joint disorder affecting a significant portion of the population worldwide. Accurate predictions of KOA progression can assist clinicians in designing preventive strategies for patients. In this paper, we present an integrated approach based on hypergraph convolutional networks (HGCNs) for longitudinal predictions of KOA grades and progressions from MRI images. We propose two novel models, namely, the C_Shape.Net and the predictor network. The C_Shape.Net operates on a hypergraph of volumetric nodes, especially designed to represent the surface and volumetric features of the cartilage. It encompasses deep HGCN convolutions, graph pooling, and readout operations in a hierarchy of layers, providing, at the output, expressive 3D shape descriptors of the cartilage volume. The predictor is a spatio-temporal HGCN network (ST_HGCN), following the sequence-to-sequence learning scheme. Concretely, it transforms sequences of knee representations at the historical stage into sequences of KOA predictions at the prediction stage. The predictor includes spatial HGCN convolutions, attention-based temporal fusion of feature embeddings at multiple layers, and a transformer module that generates longitudinal predictions at follow-up times. We present comprehensive experiments on the Osteoarthritis Initiative (OAI) cohort to evaluate the performance of our methodology for various tasks, including node classification, longitudinal KL grading, and progression. The basic finding of the experiments is that the larger the depth of the historical stage, the higher the accuracy of the obtained predictions in all tasks. For the maximum historical depth of four years, our method yielded an average balanced accuracy (BA) of 85.94% in KOA grading, and accuracies of 91.89% (+1), 88.11% (+2), 84.35% (+3), and 79.41% (+4) for the four consecutive follow-up visits. Under the same setting, we also achieved an average Area Under Curve (AUC) value of 0.94 for the prediction of progression incidence, and follow-up AUC values of 0.81 (+1), 0.77 (+2), 0.73 (+3), and 0.68 (+4), respectively.

1. Introduction

Knee Osteoarthritis (KOA) is one of the most prominent joint diseases affecting people around the world, with the majority of patients experiencing its onset at age 60 or older. The causes thought to contribute most to the onset and progression of the disease are (1) misalignments in bone structure and anatomy, either inherited (congenital) or developed throughout one’s life, (2) excess body mass that chronically places considerable mechanical stress on the knee joint, (3) a general state of inactivity which, in time, weakens and diminishes the supporting musculature, and (4) advanced age [1]. The ensuing cartilage loss induced by the condition exposes the femoral and tibial bones to high friction forces during any type of movement, leading to a gradual denudation of the cartilage volume and, in the more extreme scenarios, to complete exposure of the respective bone surfaces. In time, these consequences result in a substantial loss of quality of life in the more severe cases, limiting patients’ mobility in everyday tasks and leaving them in a state of diminished general well-being [2].

1.1. KOA Detection and Progression

Imaging modalities such as X-rays and MRIs are invaluable tools in the hands of experts for diagnosing KOA and assessing its future trajectory. Data repositories hosting large collections of imaging data offer the scientific community unique opportunities to develop methods that can adequately capture the intricacies of the incidence of the disease. Over the years, a considerable body of research has made use of such resources to automatically detect the presence of KOA in patients. Recent advancements in both hardware and software technologies offer new possibilities for the development of large-scale models that can harness the full wealth of the available data.
One of the most prominent scoring systems used for the evaluation of the severity of KOA is the Kellgren–Lawrence (KL) grading scale [3]. This scale is used by experts to evaluate the severity of the condition based on findings from imaging (predominantly X-ray) data. The scale extends from 0 up to 4, with each increasing grade signifying a progressively more severe condition:
  • KL0: Definite absence of KOA;
  • KL1: Doubtful Joint-Space-Narrowing (JSN);
  • KL2: Definite presence of osteophytes and indications of JSN;
  • KL3: Presence of multiple moderate osteophytes, and possible detection of bone deformities;
  • KL4: Presence of large osteophytes, severe narrowing of joint space, and definite bone deformities.
Apart from imaging data, the additional incorporation of generic demographic information may yield a net positive effect on the predictive capabilities of the aforementioned models, since all of these variables constitute known risk factors that can influence the future trajectory of KOA. Variables such as age, Body-Mass-Index (BMI), and gender are primary examples of such features and may be harnessed to steer the diagnostic models towards more robust performance. A comprehensive literature review of the research mentioned above can be found in [4].
In addition to the efforts towards the development of automatic methods that can identify the presence of KOA, an equally pressing task is the exploitation of the available longitudinal data repositories for the advancement of existing methodologies, towards the detection of progression trajectories in the temporal dimension. Deep learning models, making use of high-dimensional representations of the raw data, can potentially identify progression patterns that are impossible for a human expert to detect with consistency. Enabling clinicians to confidently predict such a pattern in a patient will enhance their ability to intervene in the early stages of KOA development. This, in turn, may allow for timely treatment of the condition, ultimately improving the future outlook for the afflicted person by dampening the more severe symptoms associated with it, leading to better outcomes overall. Several studies reviewing recent developments towards this direction can be found in [5,6].

1.2. Proposed Methodology

In this paper, we propose an integrated approach to acquire longitudinal grading of KOA severity. Figure 1 shows the pipeline of our method. It comprises a preliminary segmentation stage and two main networks, namely, the cartilage 3D shape network (C_Shape.Net) and the predictor network (ST_HGCN). Both networks exploit the higher order hyperedges and the multi-view fusion of HGCN convolutions to achieve their tasks.
Most existing methods directly use radiology scans (usually X-ray) as an input source to produce KOA predictions. As a result, the capture of the cartilage characteristics and the assessment of KOA grades are blended, which increases the difficulty of the combined task at hand. In our approach, we decouple the two procedures. Concretely, we deploy the recently proposed DMA-GCN [7] method to produce the 3D cartilage segmentation maps from MRI scans at the preliminary stage (Figure 1a). The model integrates local learning convolutional units that aggregate spatial contextual information at multiple scales and global learning convolutional units exploring the global relationships among nodes in the image. These units are suitably interconnected within a dense convolutional architecture that allows the generation of comprehensive feature representations and, hence, accurate classification results.
The graph construction task (Figure 1b) aims to transform the 3D cartilage volume (CV) into a graph of volumetric nodes (VNs). The process is especially designed to describe the specific structural properties of the knee CV. VNs are formed as 3D structures between two corresponding faces (upper/bottom), belonging to triangular meshes established on the different surfaces of the CV. The nodes are equipped with a set of local features, including spatial and structural geometric features of the pertaining faces, as well as volumetric measures, such as thickness and volume values. To further enhance the representation capabilities of nodes, we also exploit local spatial information by incorporating the features of neighboring nodes.
The C_Shape.Net (Figure 1c) is a deep hypergraph convolutional network with a hierarchical structure. This network operates at the patient level, i.e., it acts on a hypergraph associated with each knee’s CV individually. Given the VN representations, it aims to generate a global 3D shape descriptor of a knee, describing the structural properties of the entire 3D cartilage volume. At each layer, C_Shape.Net encompasses three main operations, specifically, the densely connected HGCN block (DHGCN), the graph coarsening, and the readout layers. The DHGCN block conducts multi-view HGCN convolutions, with the different views corresponding to the surface and the volumetric features, respectively. In addition, DHGCN incorporates local and global multi-view modules. The former implements aggregation of neighboring VNs in a local 3-hop neighborhood, while the latter explores global pairwise affinities between distant nodes in the CV. The graph coarsening layer reduces the size of the incoming graph by retaining a portion of representative VNs, while the readout unit extracts global features from the graph at the different layers in the hierarchy. To acquire a better shape representation of the entire knee, the final shape descriptor is formed by combining the shape descriptors of the medial and lateral compartments of the knee.
As the final stage of our methodology, the predictor model is an HGCN network with spatio-temporal processing properties (ST_HGCN) (Figure 1d). Contrary to the C_Shape.Net, the ST_HGCN network operates on temporally interconnected hypergraphs of knees (nodes) in the OAI repository. These hypergraphs are considered at different time steps, including the baseline visit and the following eight yearly follow-up visits. ST_HGCN aims at generating multi-step-ahead predictions of KOA grades at the prediction stage, based on knee data at the historical stage. The model undertakes three main tasks: (a) the spatial convolutions, (b) the attention-based temporal fusion, and (c) the sequence transformation. The first task performs multi-view HGCN convolutions, consecutively on every graph of knees, at the different time slices of the historical stage. This operation explores the pairwise relations among the knees in order to acquire a more comprehensive feature representation. The knee relationships are encoded by considering four different hyperedges (views): the global shape descriptors created by the C_Shape.Net and the three demographic features (age, BMI, gender). The second task aggregates the feature embeddings across the different branches at the historical stage. Its goal is to discern important temporal changes within the knee’s shape sequence, which may be valuable for assessing the KOA incidence and progression. Finally, for each node, the transformer network transforms the sequence of fused embeddings from the historical stage to a sequence of KOA grades at the prediction stage.
Our methodology is comprehensively evaluated on the OAI [8] dataset. In the experimental setup, we examine the accuracy of longitudinal predictions for different sizes of the historical data. In addition, we investigate the demographic-based performance, i.e., the accuracies attained on different patients’ groups in terms of age, BMI, and gender. Finally, we provide experimental results to assess the efficacy of the proposed approach on the issue of KOA progression.
In summary, the main contributions of this paper are described as follows:
(I)
A process to construct a graph of volumetric nodes to represent a 3D cartilage volume: The nodes are described by a rich set of features, including local spatial and geometric features of the surfaces, along with descriptive volumetric measures.
(II)
A novel deep hypergraph convolutional neural network (C_Shape.Net): The shape network operates on volumetric graphs, aiming to generate a comprehensive 3D shape descriptor of a knee’s cartilage volume.
(III)
A novel HGCN-based predictor network (ST_HGCN): The model exhibits spatio-temporal processing capabilities and adopts the sequence-to-sequence learning approach. The predictor transforms the shape data sequences at the historical stage to sequences of KOA grades at the prediction stage.
(IV)
Multi-view convolutions and adaptive hyperedge learning: We apply multi-view HGCN convolutions to fuse and weight different data sources. In the C_Shape.Net, the different views correspond to the face and volumetric features of nodes, whereas in ST_HGCN, the views are associated with the shape features and the demographic risk factors of patients. Further, in all HGCN convolutions, we perform adaptive hyperedge learning at each layer.
(V)
Semi-supervised learning (SSL): The ST_HGCN network is trained under the SSL approach, which allows leveraging both the training and testing shape data during learning.
(VI)
Experimental evaluation: The combined pipeline of the proposed methodology is extensively tested on the OAI cohort. We have devised various experimental scenarios, to assess the accuracy of the multi-step ahead predictions of KOA at future visits, the demographic-based accuracies, the early-stage predictions of KOA, as well as the KOA progression at the prediction stage.

2. Related Works

In this section, we briefly review related works on 3D shape descriptors and KOA risk models.

2.1. 3D Shape Descriptors

With the advancements in the field of 3D computer vision, 3D shape analysis based on deep learning has been gaining increased attention in recent years, for object recognition, segmentation, and retrieval tasks. A thorough review of 3D shape representation methods can be found in [9]. Broadly, the plethora of existing approaches can be distinguished into five categories, namely, the view-based, the point-cloud-based, the voxel-based, the mesh-based, and the graph convolutional network (GCN)-based methods.
Exploiting the advanced processing capabilities of CNNs on 2D images, the view-based methods combine a collection of different 2D views of a 3D object. Some representative models in this category are the multi-view CNN (MVCNN) [10], the group-view CNN (GVCNN) [11], the view N-gram Network (VNN) [12], and the multi-view harmonized bilinear network (MHBN) [13]. MVCNN uses CNNs to extract multi-view features, which are then aggregated via max-pooling to create compact 3D shape descriptors. GVCNN performs advanced feature aggregation by considering group multi-view features and then devises a feature pooling scheme on group views. MHBN aggregates local convolutional multi-view features through harmonized bilinear pooling to obtain more discriminative 3D shape representations.
The methods of the second category operate on point clouds, which comprise a group of irregular and unordered points distributed in 3D space. These methods strive to tackle the inherent difficulty of applying convolutional operations directly to a point cloud. PointNet [14] learns to encode a point cloud and generates its global descriptor for segmentation and classification tasks. PointNet++ [15] confronts the inability of PointNet to identify neighborhood structures by aggregating local features of a point cloud. PointWeb [16] densely interconnects all points in a local neighborhood of a point cloud, aiming to define point features that represent its local region characteristics. The proposed adaptive feature adjustment (AFA) module is then used to identify interactions between points.
Voxel-based methods deploy CNNs to extract 3D shape features from dense 3D volumetric grids of voxels (3D cubes). Popular networks of this category are the ShapeNet [17], the VoxNet [18], and the MO-VCNN [19]. ShapeNet generates 3D shape descriptors directly from volumetric grids using a convolutional Deep Belief Network (DBN), while VoxNet extracts features using a 3D CNN. MO-VCNN considers volumetric data of different orientations. The authors use 3D CNNs to extract orientation specific features. These features are then aggregated via pooling and passed through another 3D CNN to make predictions.
3D shape analysis based on 3D meshes has also attracted great interest in the literature. Recent methods in this category are MeshNet [20] and MeshCNN [21]. In MeshNet, the faces are first passed through NN-based spatial and structural descriptors to generate their respective features. Next, the individual face features, along with those of its neighboring faces, are aggregated via mesh convolutional blocks to obtain the shape descriptor after pooling. MeshCNN uses a suitably designed CNN to generate shape descriptors from 3D triangular meshes. The model utilizes specialized convolutional and pooling layers that operate on the mesh edges, leveraging their intrinsic geodesic connections.
The GCN-based methods leverage the ability of GCNs to conduct node convolutions on graphs. Three interesting contributions in this field are the View-GCN [22], the MHGCN [9], and the MDC-GCN [23]. In View-GCN, a view-graph is first constructed with multiple views from different orientations, considered as graph nodes. Then, hierarchical GCN convolutional networks are designed to learn discriminating shape descriptors, considering relations among the views. MHGCN suggests a multi-scale representation scheme on HGCN for 3D shape retrieval and recognition. The method integrates multiple 3D shape descriptors (views) obtained from different feature extractors, each view described by a separate hypergraph. High-order relationships among the views are acquired by concatenating the view-specific incident matrices. In addition, multiple feature representations across the convolution layers are fused for more robust results. The MDC-GCN model introduces densely connected GCNs on 3D meshes to extract shape descriptors for object classification and segmentation. Initially, the 3D mesh is converted into a graph structure, with each face corresponding to a node in the graph. To enhance the expressive power of nodes, the authors cast a feature set incorporating local geometric characteristics of a face and its 1-ring face neighbors. For a better fusion of local and non-local features, convolutions in the MDC-GCN are conducted using densely connected GCN blocks.
In this work, we propose the C_Shape.Net to extract 3D shape descriptors, especially cast according to the requirements of the 3D knee cartilage volumes. Specifically, we seek to identify abnormalities of the bone surfaces as indicators of osteophytes and primarily represent the volumetric characteristics of the joint space of the cartilage. Both these issues are essential for accurate assessments in KOA grading and progression. To this end, we first convert the cartilage shape into a graph of VNs. Contrary to the 3D cube grid used in voxel-based approaches, the VNs here are created as 3D structures of prism type, formed between corresponding faces. Hence, regarding shape analysis, our C_Shape.Net incorporates the principles from GCN-based, voxel-based, and mesh-based methods, simultaneously.

2.2. KOA Prediction Models

With regards to KOA severity estimation, most works in the established literature approach the KL assignment problem as a standard classification task, whereby the proposed models are trained to derive an association between image representations and KOA quantification. Conventional machine learning methods rely on an initial feature extraction step, while deep learning ones streamline the process of feature engineering and KL prediction into a single pipeline. The authors in [24] utilize a variant of the Siamese CNN architecture that scores KOA severity in the KL scale. The same task is tackled in [25] via the joint use of a Residual Neural Network (ResNet) and a Convolutional Attention Block Module (CBAM) [26]. In [27], the problem of KL grading is treated as an ordinal regression task, with a similar path being adopted in [28,29], whereby the authors in both works fine-tune several popular CNN models to obtain the KOA scores. Finally, in [30], a multi-task transfer learning approach is proposed for the joint prediction of KL grade and likelihood of Total Knee Replacement (TKR).
Existing methods in KOA progression prediction usually consider the patient’s data at the baseline and predict its future progression up to a specified follow-up time. For instance, the work in [31] predicts early KOA incidence within 24 months, while the authors in [32,33] predict KOA progression within a middle-term period of 48 months ahead. In [34], following an initial quantification of KOA severity based on the Cartilage Damage Index (CDI), the authors evaluate four standard machine learning algorithms on the prediction of KOA progression, as measured by KL grade and Joint Space Narrowing at the medial and lateral compartments (JSM, JSL), respectively. In [35], a mixed-effects mixture model was employed to initially form two clusters of KOA progression vs. non-progression, and several regression models were subsequently trained to estimate the probabilities of subjects belonging to each of the above clusters. The risk assessment models in [36] provide longitudinal predictions of progression, spanning all future time steps of the prediction stage, while the same task is tackled in [37], via the use of a Transformer module that performs multi-modal data fusion. A common feature of these methods is the lack of historic data of patients used in the learning process. While all of the aforementioned works constitute interesting approaches in the KOA progression modeling task, they nevertheless share the same limitation of not incorporating the temporal dimension of the available repositories. Instead, these models are trained on images and clinical variables from the baseline visit alone, and treat the progression estimation as a series of independent classification tasks, disregarding the inherent sequential aspect of the data.
A primary goal of this work is to investigate the effect of previous historical data on future KOA predictions. Therefore, the data are distinguished here into a historical stage of varying depth and a prediction stage. Accordingly, each knee is associated with a sequence of shape/clinical data and a sequence of grades/progressions at future follow-up times. Further, an important innovation of our approach is that the pool of patients is modeled as a hypergraph of nodes with spatio-temporal characteristics, i.e., temporally interconnected hyperedges evolving over time. Our predictor performs spatial HGCN convolutions to exploit the relationships between the patients, temporal fusions to capture the evolution trends, and sequence-to-sequence transformation to obtain longitudinal predictions ahead. A similar framework can be found in the field of traffic flow prediction in road networks [38,39,40]. The efficacy of spatio-temporal techniques on graph representations has been thoroughly demonstrated by extensive research over the last several years.

3. Hypergraph Convolutional Networks

3.1. Hypergraph Definition

A hypergraph is defined as $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{H}, \mathbf{X}\}$, where $\mathcal{V} = \{v_i\}_{i=1}^{N}$ is the node set of N nodes ($|\mathcal{V}| = N$), $\mathcal{E} = \{e_j\}_{j=1}^{M}$ denotes the hyperedge set comprising M hyperedges ($|\mathcal{E}| = M$), and $\mathbf{H} \in \mathbb{R}^{N \times M}$ is the incidence matrix. $\mathbf{X} \in \mathbb{R}^{N \times F}$ is the data matrix containing the feature descriptors of the nodes, $\mathbf{x}_i \in \mathbb{R}^{F}$, $i = 1, \ldots, N$, where F denotes the feature dimensionality. The hypergraph structure is represented by the incidence matrix $\mathbf{H}$, with its rows and columns corresponding to the nodes and hyperedges, respectively. The entries of $\mathbf{H}$ are defined as follows:
$$H(v_i, e_j) = H_{ij} = \begin{cases} 1, & \text{if } v_i \in e_j \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
Each hyperedge $e_j$ is assigned a learnable weight $w_j$, stored in a diagonal weight matrix $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_j, \ldots, w_M) \in \mathbb{R}^{M \times M}$. The degree of node $v_i \in \mathcal{V}$ is given by $D_n(i) = \sum_{j=1}^{M} w_j h_{ij}$. Further, the degree of each hyperedge $e_j \in \mathcal{E}$ is defined as $D_e(j) = \sum_{i=1}^{N} h_{ij}$. The diagonal matrices $\mathbf{D}_n = \mathrm{diag}(D_n(1), \ldots, D_n(N)) \in \mathbb{R}^{N \times N}$ and $\mathbf{D}_e = \mathrm{diag}(D_e(1), \ldots, D_e(M)) \in \mathbb{R}^{M \times M}$ store all the node and hyperedge degrees, respectively. It should be stressed that, unlike traditional graphs where each edge links exactly two vertices, the hyperedges in a hypergraph are degree-free, which implies that a given hyperedge can involve more than two vertices, thus providing a more flexible description of high-order relationships among nodes.
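To make these definitions concrete, the short NumPy sketch below builds a toy incidence matrix together with the corresponding weight and degree matrices; the node and hyperedge counts and the weight values are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy hypergraph: N = 4 nodes, M = 3 hyperedges (arbitrary example values).
# H[i, j] = 1 if node v_i belongs to hyperedge e_j, 0 otherwise.
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)

w = np.array([0.5, 1.0, 2.0])   # hyperedge weights w_j (learnable in the model)
W = np.diag(w)                  # W = diag(w_1, ..., w_M)

D_n = np.diag(H @ w)            # node degrees D_n(i) = sum_j w_j * h_ij
D_e = np.diag(H.sum(axis=0))    # hyperedge degrees D_e(j) = sum_i h_ij

print(D_n.diagonal())           # [2.5, 1.5, 3.0, 3.0]
print(D_e.diagonal())           # [2.0, 3.0, 3.0]
```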

3.2. Hypergraph Spectral Convolution

Spectral hypergraph convolution relies on the principles of spectral graph theory [41]. In this context, graph convolution is realized using the graph Laplacian matrix, whereby the eigenvectors and eigenvalues are regarded as the Fourier basis and frequencies, respectively. To account for the high computational cost involved in hypergraph convolutions, the authors in [42] suggested the use of K-order Chebyshev polynomials:
$$g \star x = \sum_{k=0}^{K} \theta_k T_k\big(\tilde{\mathbf{L}}\big)\, x \qquad (2)$$
where $\tilde{\mathbf{L}} = (2/\lambda_{max})\mathbf{L} - \mathbf{I}$ is a rescaled form of the Laplacian matrix $\mathbf{L} \in \mathbb{R}^{N \times N}$, defined as
$$\mathbf{L} = \mathbf{I} - \mathbf{D}_n^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^{T} \mathbf{D}_n^{-1/2} \qquad (3)$$
The Chebyshev polynomials are recursively computed via $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$. The expression in Equation (2) is K-localized, which suggests that the convolution result at a specific node is derived by aggregating the feature representations of nodes belonging to K-hop neighborhoods around the central node. Setting $K = 1$ and $\lambda_{max} \approx 2$ [42], we are led to a hypergraph convolution defined as follows:
$$\mathbf{X}^{(l+1)} = \sigma\left(\mathbf{D}_n^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^{T} \mathbf{D}_n^{-1/2} \mathbf{X}^{(l)} \mathbf{\Theta}^{(l)}\right) \qquad (4)$$
where $\mathbf{X}^{(l+1)}$ are the embeddings at the output of the l-th layer, $\sigma(\cdot)$ denotes the ReLU activation function, $\mathbf{H}$ represents the structure of the hypergraph, $\mathbf{W}$ contains the hyperedge weights, $\mathbf{X}^{(l)} \in \mathbb{R}^{N \times F^{(l)}}$ stands for the data matrix with N nodes and $F^{(l)}$ channels, and $\mathbf{\Theta}^{(l)} \in \mathbb{R}^{F^{(l)} \times F^{(l+1)}}$ is the filter matrix of trainable parameters. By stacking multiple convolutional layers, we can build HGCN models with enhanced learning capabilities.
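For readers who prefer code, the following is a minimal PyTorch sketch of the single-layer convolution in Equation (4); the class name, the tensor sizes, and the direct use of dense H and W are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """One spectral hypergraph convolution: X' = sigma(Dn^-1/2 H W De^-1 H^T Dn^-1/2 X Theta)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)   # filter matrix Theta

    def forward(self, X, H, w):
        # X: (N, F) node features, H: (N, M) incidence matrix, w: (M,) hyperedge weights
        Dn = (H * w).sum(dim=1)                                # node degrees
        De = H.sum(dim=0)                                      # hyperedge degrees
        Dn_inv_sqrt = torch.diag(Dn.clamp(min=1e-6).pow(-0.5))
        De_inv = torch.diag(De.clamp(min=1e-6).pow(-1.0))
        A = Dn_inv_sqrt @ H @ torch.diag(w) @ De_inv @ H.t() @ Dn_inv_sqrt
        return torch.relu(A @ self.theta(X))                   # sigma = ReLU

# Example: N = 4 nodes, M = 3 hyperedges, 8 input channels, 16 output channels
X = torch.randn(4, 8)
H = torch.tensor([[1., 0., 1.], [1., 1., 0.], [0., 1., 1.], [0., 1., 1.]])
w = torch.ones(3)
out = HypergraphConv(8, 16)(X, H, w)   # shape (4, 16)
```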

3.3. Adaptive Hypergraph Learning

A crucial issue in HGCN learning that must be properly addressed is how the nodes are organized into hyperedges to construct $\mathbf{H}$. In some previous works [43], the hyperedges are attached to each node individually, i.e., $|\mathcal{E}| = M = N$, which leads to an incidence matrix $\mathbf{H} \in \mathbb{R}^{N \times N}$. Specifically, the hyperedge $e_j \in \mathcal{E}$ corresponding to the node $x_j$ is formed by considering the k-nearest neighbors of $x_j$, i.e., $e_j = \mathcal{N}(x_j)$. In this context, we apply an adaptive hypergraph learning (AHL) mechanism based on GAT-style attention [44] to automatically learn the hyperedge structure at each layer, using the current layer inputs.
Given the embedding matrix $\mathbf{\Theta}^{(l)}$ in Equation (4), we perform a shared attention mechanism on all node pairs:
$$g_{ij} = \boldsymbol{\alpha}^{T}\left[\mathbf{x}_i^{(l)} \mathbf{\Theta}^{(l)} \,\Vert\, \mathbf{x}_j^{(l)} \mathbf{\Theta}^{(l)}\right] \qquad (5)$$
where $\boldsymbol{\alpha}$ is a learnable weight vector and $g_{ij}$ indicates the pairwise similarity between nodes i and j. The normalized attention coefficients are obtained using a single-layer feedforward neural network, parameterized by LeakyReLU nonlinearities, as follows:
$$H(i,j) = \frac{\exp\left(\mathrm{LeakyReLU}(g_{ij})\right)}{\sum_{k \in \mathcal{N}(x_j)} \exp\left(\mathrm{LeakyReLU}(g_{kj})\right)} \qquad (6)$$
The attention coefficient $H(i,j)$ provides the membership degree of node i to hyperedge $e_j$, $j = 1, \ldots, N$. For efficiency reasons, the computation of the attention coefficients above is confined to the nodes in $\mathcal{N}(x_j)$. To stabilize the learning procedure, we consider multiple attention heads. The hyperedge convolution is now obtained by the following:
$$\mathbf{X}^{(l+1)} = \big\Vert_{k=1}^{K} \mathbf{X}_k^{(l+1)} = \big\Vert_{k=1}^{K} f\left(\mathbf{X}^{(l)}, \mathbf{H}^{(k)}; \mathbf{W}, \mathbf{\Theta}_k^{(l)}\right) \qquad (7)$$
where K is the number of attention heads, $\mathbf{H}^{(k)}$ and $\mathbf{\Theta}_k^{(l)}$ are the incidence matrix and the embedding matrix produced by the k-th attention head, $\Vert$ denotes the concatenation operator, and the function $f(\cdot)$ implements the convolution operator in Equation (4).
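The sketch below illustrates how the GAT-style scores of Equations (5) and (6) can be turned into soft incidence matrices with multiple heads; the module name, the dense pairwise computation, and the k-NN mask argument are our own simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveHyperedges(nn.Module):
    """Learn soft incidence matrices (one hyperedge per node) from attention scores."""
    def __init__(self, in_dim, emb_dim, num_heads=4):
        super().__init__()
        self.theta = nn.ModuleList(nn.Linear(in_dim, emb_dim, bias=False)
                                   for _ in range(num_heads))
        self.alpha = nn.ParameterList(nn.Parameter(torch.randn(2 * emb_dim))
                                      for _ in range(num_heads))

    def forward(self, X, knn_mask):
        # X: (N, F) node features; knn_mask: (N, N) bool, True where node i is a k-NN of node j
        heads = []
        for theta, alpha in zip(self.theta, self.alpha):
            Z = theta(X)                                        # (N, emb_dim)
            pair = torch.cat([Z.unsqueeze(1).expand(-1, Z.size(0), -1),
                              Z.unsqueeze(0).expand(Z.size(0), -1, -1)], dim=-1)
            g = F.leaky_relu(pair @ alpha)                      # (N, N) pairwise scores g_ij
            g = g.masked_fill(~knn_mask, float('-inf'))         # restrict to neighbors N(x_j)
            heads.append(F.softmax(g, dim=0))                   # column-wise softmax -> H(i, j)
        return heads                                            # one incidence matrix per head
```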

4. Materials

Knee data in this study are selected from the OAI cohort, which contains longitudinal clinical and radiographic data for 4096 subjects between 45 and 79 years old. The data spread over a 9-year follow-up period, starting from the baseline visit and extending to the next eight yearly follow-up visits. Each subject is associated with a record comprising, among others, the following data: (1) 3D MRI images according to the DESS acquisition protocol, formed as a collection of consecutive 2D slices. (2) The respective segmentation masks [45], including labels for the following knee-joint structures (classes): Background tissue, Femoral Bone (FB), Femoral Cartilage (FC), Tibial Bone (TB), and Tibial Cartilage (TC). (3) Demographic features of {age, BMI, gender}, and a history of knee injury, according to an OAI questionnaire at baseline. (4) The Kellgren–Lawrence (KL) grades, quantifying the KOA severity into five KL classes {0, 1, 2, 3, 4}, where the classes are interpreted as {0, 1} = No_OA, {2} = Doubtful_OA, {3} = Moderate_OA, and {4} = Severe_OA. (5) Anatomic axis alignment (tibiofemoral angle) and minimum medial joint space width measurements.
In our experiments, we have collected 3114 subjects (2 × 3114 = 6228 knees) with complete record data along the follow-up times. As input sources, we deploy the MRI scans and the demographic features.
  • With regard to the MRI scans, in order to obtain the full segmentation maps for all the knees in the historical stage (which subsequently serve to generate an initial set of volumetric and geometric features to be processed by the C_Shape.Net), we deploy our previously proposed DMA-GCN [7] cartilage segmentation model. The segmentation maps generated by this process serve as the input to the overall ST_HGCN KOA prediction network presented in this study.
  • Complementing the above imaging data, our approach makes use of additional demographic features for the determination of the various types of hyperedges constituting the hypergraphs utilized by the ST_HGCN predictor network. Concretely, we use the available information about the age, gender, and Body-Mass-Index (BMI) of the subjects whose knees are processed in order to define clusters that implicitly define the corresponding hyperedges in ST_HGCN.
Table 1 presents a KL-grade distribution over time. As can be seen, knees are adequately dispersed across the KL classes at baseline. Further, a sufficient portion of knees exhibits a trend of KOA progression towards more severe stages. Figure 2a depicts a typical DESS scan in the sagittal, coronal, and axial planes showing the different bone and cartilage constituents.

5. Graph Construction of Volumetric Nodes

Given the overall 3D cartilage volume $CV = FC \cup TC$ produced by the DMA-GCN, we develop in this section a four-stage process for constructing a suitable volumetric graph, with the aim of acquiring an effective 3D shape representation of the CV.

5.1. Cartilage Volume Surfaces

Figure 2b shows an illustrative 2D scheme corresponding to a specific slice along the sagittal direction, which depicts the different cartilage bodies and surfaces involved. Generally, the CV is delimited by the two main bone surfaces, namely, the FB surface $S^{(FB)}$ and the TB surface $S^{(TB)}$, respectively. As a first step, for each slice, we identify the characteristic inflection points A and B, where FC intersects with the TC part. Next, we draw the lines AC and BD perpendicular to $S^{(FB)}$, i.e., $AC \perp S^{(FB)}$ and $BD \perp S^{(FB)}$. Similarly, we draw the lines AE and BF perpendicular to $S^{(TB)}$, i.e., $AE \perp S^{(TB)}$ and $BF \perp S^{(TB)}$, respectively. The location of these points divides $S^{(FB)}$ into three parts, $\{S_0^{(FB)}, S_1^{(FB)}, S_2^{(FB)}\}$, where $S_0^{(FB)}$ denotes the central FB surface, while the latter two correspond to the lateral FB surfaces. Similarly, $S^{(TB)}$ is split into three parts, $\{S_0^{(TB)}, S_1^{(TB)}, S_2^{(TB)}\}$. Further, we can distinguish the outer FC surfaces $\{S_1^{(FC)}, S_2^{(FC)}\}$ along the frontal and lateral views of the knee, and the respective outer surfaces of TC, denoted as $\{S_1^{(TC)}, S_2^{(TC)}\}$.
In view of the above surface definitions, the entire CV area can be decomposed into five regions, circumscribed by their respective surfaces:
$$\begin{aligned} R_0 &= JS = \left[S_0^{(FB)}, S_0^{(TB)}\right] \\ R_1^{(FC)} &= \left[S_1^{(FB)}, S_1^{(FC)}\right], \quad R_2^{(FC)} = \left[S_2^{(FB)}, S_2^{(FC)}\right] \\ R_1^{(TC)} &= \left[S_1^{(TB)}, S_1^{(TC)}\right], \quad R_2^{(TC)} = \left[S_2^{(TB)}, S_2^{(TC)}\right] \end{aligned} \qquad (8)$$
It should be stressed that, although the illustrations are given on a slice basis, the above regions correspond to 3D cartilage volumes when considering the succession of all slices in the knee MRI. The region $R_0$ corresponds to the joint space of the knee cartilage and constitutes the crucial volume of interest. The joint space width, along with the abnormalities of its embracing bone surfaces (osteophytes), plays a pivotal role in assessing the severity grade of KOA. The remaining regions are included in the analysis with the goal of detecting the possible appearance of denudation, which also constitutes a significant cause of pain and seriously affects KOA grading.

5.2. Cartilage Surface Meshes

The goal in this section is to construct a triangular 3D mesh for each of the surfaces involved in the cartilage regions. The surface mesh takes the form $\mathcal{M} = \{V^{(M)}, E^{(M)}, F^{(M)}, \mathbf{XF}^{(M)}\}$, where $V^{(M)} = \{v_i\}$ denotes the set of vertices, $F^{(M)} = \{f_k\}$ represents the set of triangular faces, and $\mathbf{XF}^{(M)}$ is the vector of face features. Further, $E^{(M)} = \{E_j\}$ denotes the set of edges on the mesh, each edge representing the connectivity of a face with regard to its 1-hop neighboring faces (Figure 3a). The mesh construction proceeds along the following steps:
(I)
Initially, we perform sufficiently dense sampling along each slice of the FB surface to create the vertex sets $\{V_0^{(FB)}, V_1^{(FB)}, V_2^{(FB)}\}$, where $V_i^{(FB)} \subset S_i^{(FB)}$, $i = 0, 1, 2$. Next, we apply the Delaunay triangulation method [46] to construct the 3D meshes of the FB surfaces $\{\mathcal{M}_0^{(FB)}, \mathcal{M}_1^{(FB)}, \mathcal{M}_2^{(FB)}\}$.
(II)
In this step, we establish a one-to-one correspondence (CR) between the vertices on the paired surfaces of the cartilage regions defined in Equation (8). Specifically, we apply a CR mapping between the vertices of a reference surface (first argument) and those on the target surface (second argument). The vertex mapping is implemented on a per-slice basis. The definition of the corresponding vertices and the computation of their respective thickness values are described in Appendix A (Figure 3c). For the main region $R_0$, we consider $V_0^{(TB)} = CR(V_0^{(FB)})$, where $V_0^{(FB)} \subset S_0^{(FB)}$ and $V_0^{(TB)} \subset S_0^{(TB)}$, indicating the one-to-one correspondence between the vertices on $S_0^{(FB)}$ (reference) and those on $S_0^{(TB)}$ (target). Proceeding in a similar manner, we create the CR vertex sets $V_1^{(FC)} = CR(V_1^{(FB)})$ and $V_2^{(FC)} = CR(V_2^{(FB)})$, lying on the outer FC surfaces.
(III)
In addition to the previously obtained $V_0^{(TB)}$, we generate the vertex sets $V_1^{(TB)}, V_2^{(TB)}$ via dense sampling to cover the respective TB surfaces, and after triangulation we obtain the meshes $\mathcal{M}_1^{(TB)}, \mathcal{M}_2^{(TB)}$. The CR mapping is then completed by creating the sets $V_1^{(TC)} = CR(V_1^{(TB)})$ and $V_2^{(TC)} = CR(V_2^{(TB)})$ on the outer TC surfaces.
(IV)
The one-to-one correspondence between the vertices establishes a similar correspondence between the faces of a reference surface and those of its respective CR surface. For example, we write $F_0^{(TB)} = CR(F_0^{(FB)})$ to indicate that there is a one-to-one mapping in $R_0$ between the faces on FB and TB, respectively. It can further be observed that CR ensures that the spatial topology between the reference and target meshes is preserved, which suggests that all CR faces retain a similar neighborhood structure. In view of the above, we can create the 3D meshes of the remaining surfaces of the CV:
$$\begin{aligned} \mathcal{M}_0^{(TB)} &= CR\left(\mathcal{M}_0^{(FB)}\right) \\ \mathcal{M}_1^{(FC)} &= CR\left(\mathcal{M}_1^{(FB)}\right), \quad \mathcal{M}_2^{(FC)} = CR\left(\mathcal{M}_2^{(FB)}\right) \\ \mathcal{M}_1^{(TC)} &= CR\left(\mathcal{M}_1^{(TB)}\right), \quad \mathcal{M}_2^{(TC)} = CR\left(\mathcal{M}_2^{(TB)}\right) \end{aligned} \qquad (9)$$

5.3. Graphs of Volumetric Nodes

In this section, we elaborate on the critical part of the C_Shape.Net, namely, the description of volumetric nodes (VN) along with the respective volumetric graphs. Both are especially tailored to represent the structural characteristics of the knee cartilage volume. The generic VN model proposed herein is shown in Figure 3b, and formally, it can be described as a 3D structure of the form:
$$VN = struct\left(f^{(u)}, f^{(b)}\right) \qquad (10)$$
As can be seen, the VN shape comprises an upper triangular face $f^{(u)}$ that lies on a reference mesh and a corresponding face $f^{(b)} = CR(f^{(u)})$ at the bottom, belonging to the target mesh. In particular, for the joint space region $R_0$, the faces belong to the FB and TB meshes, i.e., $f^{(u)} \in \mathcal{M}_0^{(FB)}$ and $f^{(b)} \in \mathcal{M}_0^{(TB)}$. There are also three lateral faces of quadrilateral shape. The three lateral edges connecting the corresponding vertices of $f^{(u)}$ and $f^{(b)}$ represent the cartilage thicknesses between these points. Finally, we pay due attention to the volume of the VN, which is a critical measure in KOA assessment analysis. This quantity is readily computed by suitably decomposing the VN shape into three triangular pyramids.
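As a concrete illustration of this decomposition, the sketch below computes the volume of a single VN from the vertices of its upper and bottom faces, together with the three lateral thickness values; the vertex ordering convention and function names are our own assumptions.

```python
import numpy as np

def tetra_volume(a, b, c, d):
    """Volume of a tetrahedron with vertices a, b, c, d (each a 3D point)."""
    return abs(np.dot(np.cross(b - a, c - a), d - a)) / 6.0

def vn_volume(f_u, f_b):
    """
    Volume of a volumetric node (prism-like solid) obtained by splitting it into
    three tetrahedra. f_u, f_b: (3, 3) arrays holding the vertices of the upper
    and bottom faces, assumed to be listed in corresponding (CR) order.
    """
    u1, u2, u3 = f_u
    b1, b2, b3 = f_b
    return (tetra_volume(u1, u2, u3, b1) +
            tetra_volume(u2, u3, b1, b2) +
            tetra_volume(u3, b1, b2, b3))

def vn_thickness(f_u, f_b):
    """Lengths of the three lateral edges, i.e., the local cartilage thicknesses."""
    return np.linalg.norm(f_u - f_b, axis=1)
```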
Given the meshes formed at the different surfaces, we can create VNs across all regions, thus covering the entire CV. The VNs of a specific region form a hypergraph structure of the form $G = \{V^{(G)}, E^{(G)}, \mathbf{H}^{(G)}, \mathbf{X}^{(G)}\}$, where $V^{(G)}$ denotes the set of VNs, $E^{(G)}$ represents the spatial connections among the different VNs, $\mathbf{H}^{(G)}$ denotes the incidence matrix of hyperedges, and $\mathbf{X}^{(G)}$ represents the node features. Accordingly, we generate the graph $G_0$ attached to the joint space $R_0$, as well as the regional subgraphs $G_1^{(FC)}, G_2^{(FC)}$ and $G_1^{(TC)}, G_2^{(TC)}$, assigned to the respective FC and TC parts. The overall graph $\mathcal{G}$ is then formed as a collection of the above regional subgraphs:
$$\mathcal{G} = \left\{G_0,\, G_1^{(FC)},\, G_2^{(FC)},\, G_1^{(TC)},\, G_2^{(TC)}\right\} \qquad (11)$$

5.4. Features of Volumetric Nodes

The formation of VN features involves two issues, namely, the determination of the face features and the collection of the additional volumetric features characterizing the VN. In regard to the face features, we follow the constructive approach and the feature types suggested in [23]. In this approach, the feature vector of a face includes the geometric characteristics of the face itself, along with the corresponding ones of its 1-hop face neighbors. As detailed in Figure 3a, a face $f_1$ on a mesh is attached to three neighboring faces $\{f_2, f_3, f_4\}$ and a vertex set $\{v_1, v_2, v_3; v_4, v_5, v_6\}$. Then, the feature vector $\mathbf{XF} \in \mathbb{R}^{57}$ is obtained by considering several spatial and structural characteristics, as follows:
$$\mathbf{XF} = \left[\mathbf{P}_v, \mathbf{N}_v, \mathbf{GC}_v, \mathbf{N}_f, \boldsymbol{\theta}\right] \qquad (12)$$
where
$$\mathbf{P}_v = \left\{p_1, p_2, p_3, p_4, p_5, p_6\right\} \qquad (13)$$
$$\mathbf{N}_v = \left\{n_1, n_2, n_3, n_4, n_5, n_6\right\} \qquad (14)$$
$$\mathbf{GC}_v = \left\{GC_1, GC_2, GC_3, GC_4, GC_5, GC_6\right\} \qquad (15)$$
$$\mathbf{N}_f = \left\{n_{f_1}, n_{f_2}, n_{f_3}, n_{f_4}\right\} \qquad (16)$$
$$\boldsymbol{\theta} = \left\{\theta_{1,1}, \theta_{1,2}, \theta_{1,3}, \theta_{1,4}\right\} \qquad (17)$$
$\mathbf{P}_v$ includes the vertex spatial positions, while $\mathbf{N}_v$ contains the vertex normals. $\mathbf{GC}_v$ is the Gaussian curvature, a significant geometrical indicator in surface description, and $\mathbf{N}_f$ includes the face normals. Finally, $\boldsymbol{\theta}$ includes the angles between the central face and its 1-hop face neighbors. Detailed derivations of these features can be found in [23]. The consideration of the neighboring face information in Equation (16) provides an effective description of a broader local area around a central face.
Most importantly, as can be seen from Figure 3c, the one-to-one CR mapping ensures that volumetric nodes retain a neighborhood structure similar to that of the upper and bottom faces pertaining to the construction of a VN. The 1-hop neighbors of a central $VN_1 = struct(f_1^{(u)}, f_1^{(b)})$ are $\{VN_2, VN_3, VN_4\}$, where $VN_j = struct(f_j^{(u)}, f_j^{(b)})$, $f_j^{(b)} = CR(f_j^{(u)})$, $j = 2, 3, 4$. Hence, similarly to the faces, in the design of the VN feature vector we incorporate the information of the neighboring VNs to acquire a better perception of the local volume. Concretely, the VN descriptor embraces face and volumetric features, as defined in the following:
$$\mathbf{X} = \left[\mathbf{XF}^{(u)}, \mathbf{XF}^{(b)}, \mathbf{X}^{(vol)}\right] \qquad (18)$$
$$\mathbf{X}^{(vol)} = \left[\mathbf{Th}_v, \mathbf{Vol}_v\right] \qquad (19)$$
$$\mathbf{Th}_v = \left\{Th_1, Th_2, Th_3; Th_4, Th_5, Th_6\right\} \qquad (20)$$
$$\mathbf{Vol}_v = \left\{Vol_1; Vol_2, Vol_3, Vol_4\right\} \qquad (21)$$
where $\mathbf{X}^{(vol)}$ contains the volumetric features, $\mathbf{Th}_v$ subsumes the thickness values among corresponding vertices of the central VN and its 1-hop neighbors, while $\mathbf{Vol}_v$ includes the respective cartilage volumes.
The collection of features in a VN should be regarded within the multi-view framework used in this paper. Accordingly, each constituent part in Equation (18) represents a different view (source), which encodes its own aspect of the node data. The first two views correspond to the local features $\mathbf{XF}^{(u)}$ and $\mathbf{XF}^{(b)}$. They are devoted to capturing the bone/cartilage surface variabilities, with the goal of detecting the possible appearance of osteophytes and denudation. The third view corresponds to the features $\mathbf{X}^{(vol)}$, which convey valuable volumetric information of local thicknesses and volumes, especially regarding the joint-space area. These features assist in defining the joint space width, and hence the joint space narrowing over time, which are essential factors for the assessment of KOA grading and progression. Overall, the integration of the above views provides a more complete representation of the local features across the entire CV.
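Putting Equations (18)–(21) together, a single VN descriptor reduces to a concatenation of the two face-feature views and the volumetric view; the small sketch below illustrates this assembly, with array sizes following the dimensions stated above and the helper name being ours.

```python
import numpy as np

def vn_descriptor(xf_upper, xf_bottom, thicknesses, volumes):
    """
    Assemble the three-view descriptor of a volumetric node:
    view 1: face features of the upper face (R^57),
    view 2: face features of the bottom face (R^57),
    view 3: volumetric features = 6 thickness values + 4 volume values.
    """
    x_vol = np.concatenate([thicknesses, volumes])        # X^(vol) = [Th_v, Vol_v]
    return np.concatenate([xf_upper, xf_bottom, x_vol])   # X = [XF^(u), XF^(b), X^(vol)]

# Example with random placeholders for a single node:
x = vn_descriptor(np.random.rand(57), np.random.rand(57),
                  np.random.rand(6), np.random.rand(4))   # x.shape == (124,)
```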

6. Cartilage 3D Shape Network

The proposed C_Shape.Net adopts a multilayered hierarchical graph architecture, as shown in Figure 4. The network receives as inputs the local features derived in Equation (18), and produces a global 3D shape descriptor, representing the structural characteristics of CV. At each layer, the model involves three main units: the DHGCN module that combines local and global HGCN convolutions on the VN graphs, the graph coarsening unit aiming to reduce the graph size via a self-attention scoring mechanism on VNs, and the readout units which extract global shape features from the graph across the different layers. These features are then submitted to the FC network to generate the final shape representation.

6.1. DHGCN Blending Block

The DHGCN is a densely connected structure conducting HGCN convolutions, which integrates both local and global learning on hypergraphs (Figure 5). It comprises the local multi-view convolutional (LMV) module and the global multi-view convolutional (GMV) module, respectively, that are intertwined into a four-layered model.
The LMV undertakes the local learning task, acting upon the hypergraph $G_s(\mathcal{V}, \mathcal{E}_s, \mathbf{H}_s, \mathbf{U}_s)$, where $\mathcal{V}$ denotes the common set of N nodes ($|\mathcal{V}| = N$). The local hyperedge set $\mathcal{E}_s$, in conjunction with its respective incidence matrix $\mathbf{H}_s$, reflects the hyperedge structure between the nodes at the local level. The hyperedge associated with a central VN $x_j$, $j = 1, \ldots, N$, is formed by establishing attention weights $H_s(i,j)$ with respect to its neighboring nodes in a local 3-hop neighborhood around $x_j$. These weights are adaptively computed at each layer using the AHL mechanism discussed in Section 3.3. For each view, we construct the incidence matrices $\mathbf{H}_s^{(u)} \in \mathbb{R}^{N \times N}$, $\mathbf{H}_s^{(b)} \in \mathbb{R}^{N \times N}$, and $\mathbf{H}_s^{(vol)} \in \mathbb{R}^{N \times N}$, attached to the respective view data, independently. Then, following the multi-view setting, we concatenate these matrices to construct an aggregated matrix $\mathbf{H}_s$, suitable for multi-modal data [43,44]:
$$\mathbf{H}_s = \left[\mathbf{H}_s^{(u)}, \mathbf{H}_s^{(b)}, \mathbf{H}_s^{(vol)}\right] \in \mathbb{R}^{N \times 3N} \qquad (22)$$
where $\mathbf{H}_s^{(u)}$ and $\mathbf{H}_s^{(b)}$ contain the upper/bottom mesh hyperedges, while $\mathbf{H}_s^{(vol)}$ includes the hyperedges of the volumetric features. Given $\mathbf{H}_s$, the LMV unit generates local HGCN convolutions, exploring the spatially local relationships between VNs. The local feature representations $\mathbf{X}_s^{(l)} \in \mathbb{R}^{N \times F^{(l+1)}}$ at layer l are derived as follows:
$$\mathbf{X}_s^{(l)} = \sigma\left(\mathbf{D}_{n,s}^{-1/2} \mathbf{H}_s \mathbf{W}_s \mathbf{D}_{e,s}^{-1} \mathbf{H}_s^{T} \mathbf{D}_{n,s}^{-1/2} \mathbf{U}_s^{(l)} \mathbf{\Theta}_s^{(l)}\right) \qquad (23)$$
where $\mathbf{D}_{n,s} \in \mathbb{R}^{N \times N}$ and $\mathbf{D}_{e,s} \in \mathbb{R}^{M \times M}$ ($M = 3N$) denote the node degree and hyperedge degree matrices, as computed from $\mathbf{H}_s$, and $\mathbf{W}_s = \mathrm{diag}\left(\mathbf{W}_s^{(u)}, \mathbf{W}_s^{(b)}, \mathbf{W}_s^{(vol)}\right) \in \mathbb{R}^{M \times M}$ is a diagonal matrix of learnable hyperedge weights. Further, $\mathbf{U}_s^{(l)} \in \mathbb{R}^{N \times F^{(l)}}$ and $\mathbf{\Theta}_s^{(l)} \in \mathbb{R}^{F^{(l)} \times F^{(l+1)}}$ denote the input of the LMV unit and the set of learnable filter parameters at layer l.
The GMV units, on the other hand, conduct global feature convolutions, operating on the hypergraph $G_g(\mathcal{V}, \mathcal{E}_g, \mathbf{H}_g, \mathbf{U}_g)$. The global hyperedge set $\mathcal{E}_g$, along with its respective incidence matrix $\mathbf{H}_g$, now reflects the hyperedge connectivity among the nodes at the global level. In that case, the incidence matrices of the different views, $\mathbf{H}_g^{(u)} \in \mathbb{R}^{N \times N}$, $\mathbf{H}_g^{(b)} \in \mathbb{R}^{N \times N}$, and $\mathbf{H}_g^{(vol)} \in \mathbb{R}^{N \times N}$, are determined by computing the attention weights between VNs located at spatially distant locations in the CV, using the AHL mechanism. Then, after concatenation, we obtain the global multi-modal matrix
$$\mathbf{H}_g = \left[\mathbf{H}_g^{(u)}, \mathbf{H}_g^{(b)}, \mathbf{H}_g^{(vol)}\right] \in \mathbb{R}^{N \times M} \qquad (24)$$
The global HGCN feature representations acquired by GMV at layer l are obtained by
$$\mathbf{X}_g^{(l)} = \sigma\left(\mathbf{D}_{n,g}^{-1/2} \mathbf{H}_g \mathbf{W}_g \mathbf{D}_{e,g}^{-1} \mathbf{H}_g^{T} \mathbf{D}_{n,g}^{-1/2} \mathbf{U}_g^{(l)} \mathbf{\Theta}_g^{(l)}\right) \qquad (25)$$
where $\mathbf{D}_{n,g} \in \mathbb{R}^{N \times N}$ and $\mathbf{D}_{e,g} \in \mathbb{R}^{M \times M}$ are the node and hyperedge degree matrices attached to $\mathbf{H}_g$, and $\mathbf{W}_g = \mathrm{diag}\left(\mathbf{W}_g^{(u)}, \mathbf{W}_g^{(b)}, \mathbf{W}_g^{(vol)}\right) \in \mathbb{R}^{M \times M}$ is a diagonal matrix of learnable hyperedge weights. Finally, $\mathbf{U}_g^{(l)} \in \mathbb{R}^{N \times F^{(l)}}$ and $\mathbf{\Theta}_g^{(l)} \in \mathbb{R}^{F^{(l)} \times F^{(l+1)}}$ are the GMV input and learnable embedding matrix, respectively.
Given the connectivity between the LMV and GMV units in Figure 5, the signal flow across the DHGCN proceeds as follows:
$$\begin{aligned} \mathbf{U}_s^{(0)} &= \mathbf{X}^{(0)}, & \mathbf{U}_g^{(0)} &= \mathbf{X}_s^{(1)} \\ \mathbf{U}_s^{(1)} &= \mathbf{X}_g^{(1)} \,\Vert\, \mathbf{X}_s^{(1)}, & \mathbf{U}_g^{(1)} &= \mathbf{X}_s^{(2)} \,\Vert\, \mathbf{X}_g^{(1)} \end{aligned} \qquad (26)$$
where $\mathbf{X}^{(0)}$ denotes the initial node data.
To conclude, we provide some comments on the convolution process. First, we should note that the formation of hyperedges in $\mathbf{H}_s$ and $\mathbf{H}_g$ is performed in a regional manner. Specifically, the neighboring nodes of a central VN are confined to belong to the same regional subgraph as the central VN. By doing so, we decouple the convolutions conducted across the different subgraphs. As a result, the acquired feature representation of the VNs in each subgraph is made to describe the structural characteristics of the respective part of the CV. Secondly, the DHGCN block encompasses both local-global and multi-view learning simultaneously. The former property allows the aggregation of neighboring information at different spatial scales around VNs, while the latter combines multiple HGCN convolutions associated with the different shape features of the views. Most importantly, the learnable weight matrices $\mathbf{W}_{s,g}^{(u)}, \mathbf{W}_{s,g}^{(b)}, \mathbf{W}_{s,g}^{(vol)}$ place proper attention on the views $\mathbf{XF}^{(u)}, \mathbf{XF}^{(b)}, \mathbf{X}^{(vol)}$, according to their importance in the convolutions. Lastly, the succession of the four layers in the DHGCN implicitly increases the neighborhood depth around central nodes. This suggests that VNs at later layers aggregate neighboring information from broader surface/volume areas, which has a beneficial effect on the obtained feature representations.
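A compact sketch of the multi-view convolution idea of Equations (22) and (23) is shown below: per-view incidence matrices are concatenated and each view receives its own block of learnable hyperedge weights. The dense matrix computation and the positivity parameterization of the weights are simplifying assumptions of ours, and the local/global interleaving of Figure 5 is omitted.

```python
import torch
import torch.nn as nn

class MultiViewHGConv(nn.Module):
    """Multi-view hypergraph convolution with concatenated per-view incidence matrices."""
    def __init__(self, in_dim, out_dim, num_nodes, num_views=3):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)
        # one weight per hyperedge and per view -> diagonal of W_s (kept positive via exp)
        self.log_w = nn.Parameter(torch.zeros(num_views * num_nodes))

    def forward(self, U, H_views):
        # U: (N, F) inputs; H_views: list of (N, N) view-specific incidence matrices
        H = torch.cat(H_views, dim=1)                     # (N, 3N) concatenated incidence
        w = self.log_w.exp()                              # hyperedge weights
        Dn = (H * w).sum(dim=1).clamp(min=1e-6)           # node degrees
        De = H.sum(dim=0).clamp(min=1e-6)                 # hyperedge degrees
        A = (Dn.pow(-0.5).unsqueeze(1) * H) * w / De      # Dn^-1/2 H W De^-1
        A = A @ H.t() * Dn.pow(-0.5).unsqueeze(0)         # ... H^T Dn^-1/2
        return torch.relu(A @ self.theta(U))
```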

6.2. Graph Coarsening and Readout

The coarsening operation performs graph pooling, aiming to reduce the size of the original graph by retaining a portion of representative nodes. To achieve the above task, we harness the principles of the SagPool [47] approach, which is adapted to the HGCN convolutions and the cartilage shape needs. SagPool is a self-attentional graph pooling method, leveraging both the node features and the graph topology. Particularly, it uses the self-attention scores attained via graph convolution to distinguish between the nodes to be preserved and those to be dropped out.
In this work, the self-attention scores of nodes are computed using HGCN convolutions, as follows:
$$\mathbf{ZS}^{(l)} = \sigma\left(\mathbf{D}_{n,g}^{-1/2} \mathbf{H}_g \mathbf{W}_g \mathbf{D}_{e,g}^{-1} \mathbf{H}_g^{T} \mathbf{D}_{n,g}^{-1/2} \mathbf{Z}^{(l)} \mathbf{\Theta}_{pool}^{(l)}\right) \qquad (27)$$
where $\mathbf{ZS}^{(l)} \in \mathbb{R}^{N \times 1}$ contains the self-attention scores of the VNs, $\mathbf{Z}^{(l)}$ is the output of the DHGCN at layer l of C_Shape.Net, and $\mathbf{\Theta}_{pool}^{(l)} \in \mathbb{R}^{F^{(l)} \times 1}$ is the vector of learnable parameters used for the embedding of the node features.
The scores in $\mathbf{ZS}^{(l)}$ are sorted in descending order, providing a global ranking list for all VNs in the entire graph. Next, based on $\mathbf{ZS}^{(l)}$, we create distinct ranking lists $\mathbf{ZS}_r^{(l)}$, $r = 0, \ldots, 4$, for each regional graph individually. Then, for each regional graph, we select the top-ranked VNs:
$$idx_r = \mathrm{top\text{-}rank}\left(\mathbf{ZS}_r^{(l)},\, k N_r\right) \qquad (28)$$
where $k \in (0, 1]$ is the pooling ratio, determining the portion of VNs of the input graph to be preserved, and $N_r$ is the number of VNs contained in the r-th regional graph. The remaining VNs are masked, and the respective adjacency matrices are reshaped accordingly. Given that the great majority of VNs lie on the joint space graph $G_0$, the regional node selection described above is dictated by the fact that we should not risk discarding VNs in the less populated lateral subgraphs, which possibly convey valuable information for the description of their respective volumes.
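A minimal sketch of this region-aware selection is given below; it assumes the self-attention scores of Equation (27) have already been computed, and the function name, tensor layout, and the choice to keep at least one node per region are our own.

```python
import torch

def regional_topk(scores, region_ids, pooling_ratio):
    """
    Keep the top-k scoring volumetric nodes *within each regional subgraph*,
    so that sparsely populated lateral regions are not wiped out by a global ranking.
    scores:     (N,) self-attention scores ZS
    region_ids: (N,) integer region index r = 0..4 of each node
    Returns the global indices of the retained nodes.
    """
    keep = []
    for r in region_ids.unique():
        idx = (region_ids == r).nonzero(as_tuple=True)[0]
        k = max(1, int(pooling_ratio * idx.numel()))      # keep at least one node per region
        top = scores[idx].topk(k).indices
        keep.append(idx[top])
    return torch.cat(keep)

# Example: 10 nodes split over regions 0 and 1, keep 50% per region
scores = torch.rand(10)
regions = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
kept = regional_topk(scores, regions, 0.5)   # 3 nodes from region 0, 2 from region 1
```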
The readout unit aggregates all the node shape features in the graph, which are updated via graph convolutions by the DHGCN, at the node level. Thus, it provides a global (pooled) shape feature representation of the hypergraph. In our case, we apply the max, min, and average pooling operators to create the global shape feature, at layer l:
$$\mathbf{Z}_{glo}^{(l)} = \left[\max_i \mathbf{Z}_{(i)}^{(l)},\; \min_i \mathbf{Z}_{(i)}^{(l)},\; \mathrm{avg}_i \mathbf{Z}_{(i)}^{(l)}\right], \quad l = 1, \ldots, M \qquad (29)$$
The min and the average pooling operators are introduced with the aim of extracting useful information regarding the important indicators of joint-space width and cartilage volume. The overall global shape feature is obtained by concatenating the pooled features at all layers in the hierarchy of the C_Shape.Net:
$$\mathbf{Z}_{glo} = \big\Vert_{l=1}^{M} \mathbf{Z}_{glo}^{(l)} \qquad (30)$$
$\mathbf{Z}_{glo}$ is then submitted to a fully connected (FC) network, which generates the final 3D shape descriptor of the corresponding branch:
$$\mathbf{S}_{med/lat} = FC\left(\mathbf{Z}_{glo}\right) \qquad (31)$$
From a structural point of view, the knee cartilage comprises two distinct compartments, namely, the medial and lateral cartilage volumes. These volumes exhibit their own shape characteristics and convey complementary evidence towards KOA grading of the entire knee of a subject. To account for this, we integrate the two volumes by considering a two-branch architecture, whereby each branch implements the exact pipeline of Figure 4. Concretely, the slices of the knee MRI are suitably partitioned into two groups along the medial-lateral direction, which decomposes the overall cartilage into the medial and lateral bodies. These bodies are then processed as described above, including the formation of the cartilage meshes and graphs, as well as the extraction of the face and volumetric features of Section 5.4. After passing through the two-branch C_Shape.Net, we obtain the 3D shape descriptors $\mathbf{S}_{med}$ and $\mathbf{S}_{lat}$, corresponding to the medial and lateral shapes, respectively. The total 3D shape descriptor of the entire knee is then attained by
$$\mathbf{S} = \lambda\, \mathbf{S}_{med} + (1 - \lambda)\, \mathbf{S}_{lat} \qquad (32)$$
where the parameter $\lambda$ is utilized to learn the balance between the two volumes according to their importance.
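The weighted fusion of Equation (32) can be realized with a single learnable parameter; the sketch below uses a sigmoid parameterization to keep λ within (0, 1), which is our own assumption since the paper does not specify how λ is constrained.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Fuse the medial and lateral shape descriptors with a learnable balance lambda."""
    def __init__(self):
        super().__init__()
        self.logit_lambda = nn.Parameter(torch.zeros(1))   # lambda = sigmoid(0) = 0.5 at start

    def forward(self, s_med, s_lat):
        lam = torch.sigmoid(self.logit_lambda)              # keeps lambda in (0, 1)
        return lam * s_med + (1.0 - lam) * s_lat

# s_med, s_lat: outputs of the two C_Shape.Net branches, e.g. 128-dimensional descriptors
fusion = TwoBranchFusion()
s = fusion(torch.randn(128), torch.randn(128))
```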

7. KOA Prediction Network

In this section, we elaborate on the proposed ST_HGCN predictor network used for longitudinal KOA predictions. The model comprises three main parts. The first part performs spatial HGCN convolutions for the different time steps of the historical data, while the second one implements an attention-based temporal fusion of the convolutional feature representations. Finally, a transformer network is designed to transform the knee’s feature sequences to future sequences of KOA predictions.

7.1. Notations and Problem Definition

The knee record used in the following analysis mainly includes the 3D shape feature obtained by the C_Shape.Net and the demographic features {Age, BMI, Gender}. Temporally, the data are distinguished into two stages, namely, the historical stage and the prediction stage. The historical stage extends to a depth of P years, containing the knee data at time steps $\{t_0, t_1, \ldots, t_P\}$, where $t_0$ corresponds to the baseline visit. Further, the prediction stage refers to the future period of the next T years, $\{t_{P+1}, t_{P+2}, \ldots, t_{P+T}\}$.
Figure 6 shows the evolution of the different hypergraphs over time, which indicates our spatio-temporal approach to representing the patients’ knee data. Along the vertical direction, at each time slice $t = t_0, t_1, \ldots$, the collection of knees is spatially arranged to form a graph $G_t = \{\mathcal{V}_t, \mathcal{E}_t, \mathbf{H}_t, \mathbf{XS}_t\}$. $\mathcal{V}_t$ is the set of N nodes ($|\mathcal{V}_t| = N$), whereby each node now corresponds to a specific knee, with its data $\mathbf{XS}_t$ considered at time t. $\mathcal{E}_t$ and $\mathbf{H}_t$ represent the set of different hyperedges between the nodes and the incidence matrix at time t. Moreover, along the horizontal (temporal) direction, each node is associated with a respective sequence of shape data evolving over time.
At the historical stage, the node data matrix $\mathbf{XS}_t$ at time steps $t = 0, 1, \ldots, P$ is defined as
$$\mathbf{XS}_t = \left[\mathbf{S}_t^{(1)}, \ldots, \mathbf{S}_t^{(i)}, \ldots, \mathbf{S}_t^{(N)}\right] \in \mathbb{R}^{N \times F} \qquad (33)$$
where $\mathbf{S}_t^{(i)} \in \mathbb{R}^{F}$ denotes the F-dimensional shape feature vector of the i-th node at time t. We also consider the matrix
$$\mathbf{XS}_P = \left[\mathbf{XS}_0, \ldots, \mathbf{XS}_t, \ldots, \mathbf{XS}_P\right] \in \mathbb{R}^{N \times P \times F} \qquad (34)$$
which subsumes the node data at all time slices of the historical stage. For the time steps of the prediction stage, $t = P+1, \ldots, P+T$, we define the matrix
$$\mathbf{Y}_t = \left[\mathbf{y}_t^{(1)}, \ldots, \mathbf{y}_t^{(i)}, \ldots, \mathbf{y}_t^{(N)}\right] \in \mathbb{R}^{N \times C} \qquad (35)$$
where $\mathbf{y}_t^{(i)} \in \mathbb{R}^{C}$ is the label vector of KOA grades over the C classes. Similarly, the matrix
$$\mathbf{Y}_T = \left[\mathbf{Y}_{P+1}, \ldots, \mathbf{Y}_t, \ldots, \mathbf{Y}_{P+T}\right] \in \mathbb{R}^{N \times T \times C} \qquad (36)$$
subsumes the label vectors across all nodes and time slices of the prediction stage.
We now consider the node sequence
$$\mathbf{XS}_P^{(i)} = \left[\mathbf{S}_0^{(i)}, \ldots, \mathbf{S}_t^{(i)}, \ldots, \mathbf{S}_P^{(i)}\right] \in \mathbb{R}^{P \times F}, \quad i = 1, \ldots, N \qquad (37)$$
which includes the shapes of a node across all time steps of the historical stage. This sequence represents the shape evolution of a specific knee over time. Similarly, we introduce the label sequence
$$\mathbf{Y}_T^{(i)} = \left[\mathbf{y}_{P+1}^{(i)}, \ldots, \mathbf{y}_t^{(i)}, \ldots, \mathbf{y}_{P+T}^{(i)}\right] \in \mathbb{R}^{T \times C}, \quad i = 1, \ldots, N \qquad (38)$$
which contains the target label vectors $\mathbf{y}_t^{(i)} \in \mathbb{R}^{C \times 1}$ of a node at the prediction stage. We let $y_t^{(i)}(k) = 1$ if patient i belongs to the k-th KL class, and $y_t^{(i)}(k) = 0$ otherwise.
In view of the above definitions, the forecasting problem is posed as follows: using exclusively the historical data of the nodes $\mathbf{XS}_P$, determine the multi-step-ahead KOA predictions $\hat{\mathbf{Y}}_T = \left[\hat{\mathbf{Y}}_{P+1}, \ldots, \hat{\mathbf{Y}}_t, \ldots, \hat{\mathbf{Y}}_{P+T}\right] \in \mathbb{R}^{N \times T \times C}$. For each node, the above task implements a mapping between the sequences $\mathbf{XS}_P^{(i)} \rightarrow \mathbf{Y}_T^{(i)}$, $i = 1, \ldots, N$, which indicates the sequence-to-sequence learning perspective of our approach.

7.2. Spatial Convolutions

The node shape data are processed in this part by establishing $P$ independent convolutional branches of $L$ layers (Figure 6), where each branch performs multi-view HGCN convolutions on the graphs $G_t$, $t = 1, \ldots, P$, of the historical stage. The aim of this task is to acquire more descriptive shape representations, taking into consideration the various relationships among the nodes of the knee dataset. For the formation of the hyperedges, we consider four different views, corresponding to the shape feature and the three demographic features, respectively:

$$H_t = \left[ H_t^{(sp)}, H_t^{(age)}, H_t^{(BMI)}, H_t^{(gender)} \right] \in \mathbb{R}^{N \times M}$$
$H_t^{(sp)} \in \mathbb{R}^{N \times N}$ includes the spectral affinities between the nodes, i.e., the similarities between their shape features. They are adaptively constructed at each layer using the AHL mechanism in Section 3.3. The latter three parts in Equation (39) encode pairwise node relations in terms of age group, BMI, and gender, and are determined differently. For instance, the age domain is divided into four age groups (hyperedges): $<50$, $[50, 59)$, $[60, 69)$, $\geq 70$. Each hyperedge is described by a membership function:

$$\mu_j(x) = \exp\left( - \left( x - p_j \right)^2 / \sigma^2 \right), \quad j = 1, \ldots, 4$$

where $p_j$ is a central value of the group and $\sigma$ is the standard deviation over all nodes in $V_t$. The degree of membership of the $i$-th node in the $j$-th hyperedge is then defined as $H_t^{(age)}(i, j) = \mu_j(x_i) \in [0, 1]$. Similarly, we create three hyperedges for $H_t^{(BMI)}$ (Normal, Pre-obesity, Obesity) and two hyperedges for $H_t^{(gender)}$ (Male, Female), respectively, leading to a total of $M = N + 9$ hyperedges. Contrary to the spectral $H_t^{(sp)}$, the three demographic incidence matrices remain fixed across the layers and time steps.
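For illustration, the following sketch shows how the age incidence matrix $H_t^{(age)}$ could be assembled from Equation (40); the group centres $p_j$ and the node ages are assumed values introduced only for this example.

```python
import numpy as np

def membership_incidence(x, centers, sigma):
    """Gaussian membership of each node value x_i in each group hyperedge j,
    H(i, j) = exp(-(x_i - p_j)^2 / sigma^2), as in Eq. (40)."""
    x = np.asarray(x, dtype=float)[:, None]          # (N, 1)
    p = np.asarray(centers, dtype=float)[None, :]    # (1, J)
    return np.exp(-((x - p) ** 2) / sigma ** 2)      # (N, J), values in [0, 1]

# Illustrative ages; centres chosen roughly in the middle of the four age groups (assumed p_j).
ages = np.array([45, 52, 61, 74, 58, 66])
age_centers = [45, 55, 65, 75]
H_age = membership_incidence(ages, age_centers, sigma=ages.std())

# The BMI (3 hyperedges) and gender (2 hyperedges) incidence matrices are built analogously.
print(H_age.round(2))
```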
At each time step of the historical stage, $t = t_0, \ldots, t_P$, the feature embeddings $V_t^{(l+1)} \in \mathbb{R}^{N \times F^{(l+1)}}$ acquired by the spatial HGCN at layer $l = 0, 1, \ldots, L$ are given by

$$V_t^{(l+1)} = \sigma\left( D_{n,t}^{-1/2} H_t W_t D_{e,t}^{-1} H_t^{T} D_{n,t}^{-1/2} U_t^{(l)} \Theta_{pr,t}^{(l)} \right)$$

$D_{n,t}$, $D_{e,t}$, and $W_t$ are the respective degree matrices and the learnable hyperedge weights. $\Theta_{pr,t}^{(l)}$ denotes the embedding matrix of tuneable parameters, while $U_t^{(l)}$ is the input signal at layer $l$. For $l = 0$, we have $U_t^{(0)} = XS_t$, where $XS_t$ contains the original shape data, and $U_t^{(l)} = V_t^{(l)}$ for later layers of branch $t$. The demographic data $XD_t$ are used implicitly at each layer to generate their respective hyperedges.
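A minimal PyTorch sketch of the spectral hypergraph convolution of Equation (41) is given below; the class name, the ReLU choice for $\sigma(\cdot)$, and the diagonal parameterization of the hyperedge weights $W_t$ are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

class HGCNConv(nn.Module):
    """One spectral hypergraph convolution:
    V = sigma(Dn^{-1/2} H W De^{-1} H^T Dn^{-1/2} U Theta)."""
    def __init__(self, in_dim, out_dim, n_hyperedges):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)       # Theta
        self.w = nn.Parameter(torch.ones(n_hyperedges))           # learnable hyperedge weights W

    def forward(self, U, H):
        w = torch.relu(self.w)                                    # keep weights non-negative
        Dn = torch.diag((H @ w).clamp(min=1e-6).pow(-0.5))        # node degrees, Dn^{-1/2}
        De = torch.diag(H.sum(dim=0).clamp(min=1e-6).pow(-1.0))   # hyperedge degrees, De^{-1}
        A = Dn @ H @ torch.diag(w) @ De @ H.t() @ Dn              # (N, N) propagation operator
        return torch.relu(A @ self.theta(U))                      # sigma assumed to be ReLU

# toy usage: N nodes, M hyperedges
N, M, F_in, F_out = 6, 10, 16, 8
H = torch.rand(N, M)               # incidence matrix with memberships in [0, 1]
U = torch.randn(N, F_in)
out = HGCNConv(F_in, F_out, M)(U, H)
print(out.shape)                   # torch.Size([6, 8])
```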

7.3. Attention-Based Temporal Fusion

At each layer of the spatial convolutions, we apply an attention mechanism to aggregate the feature representations $V_0^{(l)}, \ldots, V_t^{(l)}, \ldots, V_P^{(l)}$, $l = 1, \ldots, L$, along the temporal direction of the historical stage. As a first step, we compute the following self-attention scores:

$$g_t^{(l)} = \sigma\left( \tilde{V}_t^{(l)} W_F^{(l)} + b_F^{(l)} \right)$$
where $\tilde{V}_t^{(l)} \in \mathbb{R}^{N \cdot F}$ is the flattened one-dimensional vector of $V_t^{(l)}$, while $W_F^{(l)}$ and $b_F^{(l)}$ are the weight embedding matrix and bias of learnable parameters, shared across all nodes and time steps. The attention scores are then normalized via the softmax function as

$$\beta_t^{(l)} = \frac{\exp\left( g_t^{(l)} \right)}{\sum_{q=1}^{P} \exp\left( g_q^{(l)} \right)}$$
Finally, the multi-temporal aggregation is formed by

$$V_F^{(l)} = \sum_{t=1}^{P} \beta_t^{(l)} V_t^{(l)}$$
The obtained fusion $V_F^{(l)}$ pays due attention to specific time steps of the node shape sequences of the historical data. In that respect, it can capture significant temporal changes in the knee shapes (cartilage volumes), which provides valuable evidence for assessing KOA incidence and progression.
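The sketch below illustrates Equations (42)–(44) in PyTorch; scoring each time step with a single shared linear unit is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Attention-based fusion of the per-time-step embeddings V_t^(l), Eqs. (42)-(44)."""
    def __init__(self, n_nodes, feat_dim):
        super().__init__()
        self.score = nn.Linear(n_nodes * feat_dim, 1)   # shared W_F, b_F (scalar score per step)

    def forward(self, V_seq):                  # V_seq: (P, N, F) stack of V_t^(l)
        P = V_seq.shape[0]
        flat = V_seq.reshape(P, -1)            # flattened vectors V~_t^(l)
        g = torch.sigmoid(self.score(flat))    # (P, 1) self-attention scores g_t^(l), Eq. (42)
        beta = torch.softmax(g, dim=0)         # normalisation over the historic steps, Eq. (43)
        return (beta.unsqueeze(-1) * V_seq).sum(dim=0)   # fused V_F^(l): (N, F), Eq. (44)

V_seq = torch.randn(4, 6, 8)                   # P=4 steps, N=6 knees, F=8 features
fused = TemporalAttentionFusion(n_nodes=6, feat_dim=8)(V_seq)
print(fused.shape)                             # torch.Size([6, 8])
```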

7.4. Sequence Transformation

In this part of the predictor, we develop an FC-based transformer network to transform the historical sequences of shape feature embeddings into future sequences of KOA grade estimates. Formally, the transformer performs the following sequence mapping:

$$V_F = \left[ V_F^{(1)}, \ldots, V_F^{(l)}, \ldots, V_F^{(L)} \right] \xrightarrow{\text{TRANSF}} \hat{Y}_T = \left[ \hat{Y}_{P+1}, \ldots, \hat{Y}_t, \ldots, \hat{Y}_{P+T} \right]$$
where $V_F \in \mathbb{R}^{N \times L \times F}$ is the input shape feature tensor, formed as the concatenation of the fusion results $V_F^{(l)}$ at all layers of the spatial convolutions of the historical stage. The output $\hat{Y}_T \in \mathbb{R}^{N \times T \times C}$ comprises the KOA predictions of the nodes, $\hat{Y}_t \in \mathbb{R}^{N \times C}$, at all times of the prediction stage. As a first step, $V_F$ is reshaped into a two-dimensional matrix $\tilde{V}_F$. Next, we use $T$ two-layered fully connected networks to produce the multi-step-ahead predictions [48]:

$$\hat{Y}_t = \sigma\left( \tilde{V}_F W_1^{(tr)} + b_1^{(tr)} \right) W_2^{(tr)} + b_2^{(tr)}$$
where $W_1^{(tr)} \in \mathbb{R}^{(L \cdot F) \times \tilde{C}}$ and $b_1^{(tr)} \in \mathbb{R}^{\tilde{C}}$ denote the embedding matrix and bias of the first layer, used to embed the inputs into an intermediate $\tilde{C}$-dimensional space. $W_2^{(tr)} \in \mathbb{R}^{\tilde{C} \times C}$ and $b_2^{(tr)} \in \mathbb{R}^{C}$ are the respective weights and bias of the second layer, and $\sigma(\cdot)$ denotes the ReLU function. The above learnable parameters are shared across all nodes (patients) of the graph. The outcomes of Equation (46) are passed through a softmax layer to obtain the final KOA grades, which are submitted to the prediction loss.
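A minimal sketch of the FC-based sequence transformation of Equations (45) and (46); the module and parameter names, as well as the intermediate width $\tilde{C}$, are illustrative.

```python
import torch
import torch.nn as nn

class SequenceTransformHead(nn.Module):
    """T independent two-layer FC heads mapping the fused features V_F
    (N, L, F) to multi-step KOA predictions (N, T, C), as in Eq. (46)."""
    def __init__(self, n_layers_L, feat_dim_F, hidden_C_tilde, n_classes_C, horizon_T):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(n_layers_L * feat_dim_F, hidden_C_tilde),   # W_1, b_1 + ReLU
                nn.ReLU(),
                nn.Linear(hidden_C_tilde, n_classes_C),               # W_2, b_2
            )
            for _ in range(horizon_T)
        ])

    def forward(self, V_F):                       # (N, L, F)
        flat = V_F.flatten(start_dim=1)           # reshape to the 2D matrix V~_F: (N, L*F)
        logits = [head(flat) for head in self.heads]
        return torch.stack(logits, dim=1)         # (N, T, C); softmax is applied in the loss

head = SequenceTransformHead(n_layers_L=3, feat_dim_F=8, hidden_C_tilde=128,
                             n_classes_C=5, horizon_T=4)
Y_hat = head(torch.randn(6, 3, 8))                # (6, 4, 5)
print(Y_hat.shape)
```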
The predictor is trained via semi-supervised learning (SSL). Specifically, the shape data $XS_P$ are composed of a labeled part $XS_{P,l} \in \mathbb{R}^{N_l \times P \times F}$ and an unlabeled part $XS_{P,u} \in \mathbb{R}^{N_u \times P \times F}$, with $N = N_l + N_u$. The former part contains the nodes whose historical shape sequences have labeled sequences of KOA grades in the prediction stage. The nodes in the latter part form the testing dataset, and hence their output sequences are unknown. Following the SSL approach, we leverage the shape contents of both the training and the testing data, which can lead to enhanced forecasting results. Finally, we employ the cross-entropy measure over the labeled samples to penalize the errors between the ground-truth labels and the predictor's estimates:

$$\mathcal{L}_{pr} = - \sum_{t=P+1}^{P+T} \sum_{c=1}^{C} \sum_{p=1}^{N_l} Y_T(t, c, p) \ln \hat{Y}_T(t, c, p)$$
where $Y_T(t, c, p)$ indicates a labeled node at time $t$ and KL class $c$. Using Equation (47), we perform an end-to-end training of the architecture in Figure 6, including the different parts of the predictor and the C_Shape.Net.
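The labeled-only cross-entropy of Equation (47) could be implemented roughly as follows; note that `F.cross_entropy` applies the softmax internally, and the mask construction is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def prediction_loss(Y_hat, Y, labeled_mask):
    """Cross-entropy of Eq. (47) computed over the labeled nodes only.
    Y_hat: (N, T, C) raw scores, Y: (N, T) integer KL labels,
    labeled_mask: (N,) bool -- True for nodes with known future labels."""
    logits = Y_hat[labeled_mask]                  # (N_l, T, C)
    target = Y[labeled_mask]                      # (N_l, T)
    return F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                           target.reshape(-1))

# toy usage with made-up data
N, T, C = 6, 4, 5
Y_hat = torch.randn(N, T, C)
Y = torch.randint(0, C, (N, T))
mask = torch.tensor([True, True, True, False, False, False])   # unlabeled test nodes excluded
print(prediction_loss(Y_hat, Y, mask))
```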

8. Experimental Results

Our proposed approach is comprehensively evaluated in this section on the widely used OAI dataset. In the experiments, we validate the produced longitudinal KOA predictions for varying depths of the historical stage, as defined by $P$. Concretely, given the maximum length of 8 follow-up years for the patient sequences, we consider four different historical data cases $\{t_0, \ldots, t_P\}$, with $P \in \{0, 1, 2, 3\}$. The case $P = 0$ corresponds to the graph data $G_0$, i.e., it contains the population of $N = 6228$ knees at the baseline visit. The case $P = 1$ subsumes the graphs $\{G_0, G_1\}$ (Figure 6), i.e., it encompasses the $2 \cdot N$ knees at the baseline visit and the first follow-up step, and so on. Finally, the case $P = 3$ defines the maximum depth of the historical data, incorporating the $4 \cdot N$ knees at the first four consecutive visits. Given a specific historical case $\{t_0, \ldots, t_P\}$, its respective prediction stage extends over the next four time steps $\{t_{P+1}, \ldots, t_{P+4}\}$, i.e., $T = 4$, where longitudinal KOA predictions are produced. MRI and shape data are used solely in the historical stage. The above experimental design is primarily dictated by space limitations. Nevertheless, our approach is general, allowing alternative prediction settings according to the user's preference and the database contents.
The quality of the acquired predictions is assessed using a variety of accuracy measures. The dataset is partitioned into training, validation, and testing subsets in proportions of 60%, 20%, and 20%, respectively. The reported accuracy results are obtained by averaging over five different data partitions. The various hyperparameters of our networks are selected through 5-fold cross-validation.

8.1. Implementation Details

All models presented in this study were developed using the PyTorch 2.7 deep learning framework (https://pytorch.org/ (accessed on 1 September 2024)). The initial feature engineering step, involving the creation of 3D surface meshes for the extraction of volumetric and surface features that are then processed by the C_Shape.Net, is carried out with the trimesh (https://trimesh.org/ (accessed on 1 September 2024)) and CGAL (https://www.cgal.org/ (accessed on 1 September 2024)) libraries. Table 2 lists the architectural hyperparameters and their respective values. To streamline the hyperparameter optimization process, we made use of the Optuna (https://optuna.org/ (accessed on 1 September 2024)) open-source optimization framework. After an initial experimentation stage, the layer dimensionalities for the LMV and GMV units comprising the C_Shape.Net, and those of the 2-layer FC-Transformer network, were set to 256, 64, and [256, 128], respectively. The models were trained for a maximum of 500 epochs, with training halted by an early-stopping criterion if no reduction in the validation loss was recorded for more than 20 consecutive epochs. The initial learning rate was set to 0.1, with a multi-step scheduler that decays the learning rate by a factor of 10 at 3 evenly spaced milestones. The code for all models, the feature extraction pipeline, and the training process used in this study can be found at https://gitlab.com/mri_cartilage_segmentation/longitudinal_KOA_prediction (accessed on 15 April 2025).
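For orientation, a minimal sketch of the stated training schedule (initial learning rate 0.1, decay by a factor of 10 at three evenly spaced milestones, early stopping with a patience of 20 epochs over the 500-epoch budget); the optimizer choice, the milestone placement, and the placeholder model are assumptions of this sketch.

```python
import torch

model = torch.nn.Linear(10, 5)     # placeholder standing in for the full ST_HGCN architecture
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)   # optimizer choice assumed
# three evenly spaced milestones over the 500-epoch budget, decay factor 10
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[125, 250, 375], gamma=0.1)

best_val, patience, wait = float("inf"), 20, 0
for epoch in range(500):
    # ... training step and validation pass would go here ...
    val_loss = 0.0                 # placeholder validation loss
    scheduler.step()
    if val_loss < best_val:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait > patience:        # no improvement for more than 20 consecutive epochs
            break
```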
In the following, we present a series of experiments to examine different aspects of our method.

8.2. Node KOA Classifications

In this experiment, we elaborate on the node classification task, i.e., generate KL-grade predictions for each node (knee) of the historical stage ( P = 3 ) , individually. The accuracy of these predictions indicates primarily the efficacy of the shape descriptors given by C_Shape.Net, and secondarily that of the spatial convolutions.
All knees of the historical data matrix $XS_P \in \mathbb{R}^{N \times P \times F}$ are considered herein as independent nodes in an overall graph, regardless of the time slice at which they are located. Similarly to Equation (36), let $Y_P = \left[ Y_0, \ldots, Y_t, \ldots, Y_P \right] \in \mathbb{R}^{N \times P \times C}$ denote the matrix of KOA labels of the nodes in the historical stage. In that case, the network conducts spatial HGCN convolutions on all nodes of $XS_P$. The resulting feature representations are passed through a two-layered network (FC2) to obtain KOA predictions, which are subsequently submitted to the training loss $\mathcal{L}_{tr}$. The goal is then to establish a classification mapping $XS_P \rightarrow Y_P$ via semi-supervised learning. The knee temporal sequences are dissolved and no future predictions are sought; the attention-based temporal fusion mechanism and the transformer unit are therefore disregarded here.
Table 3 shows the node classification accuracies for various depths of the historical data. The results are evaluated in terms of the Balanced overall Accuracy (BA) on the five KL classes, the F1-score, and Cohen’s Kappa coefficient (K). The results indicate a comparable performance for the medial and lateral knee compartments, with a slight edge observed for the latter part. Regarding the Kappa coefficient, it can be seen that the combined treatment yields a meaningful boost. Noticeably, the suggested predictor provides a sufficiently high performance, with the accuracies varying between 84.07 and 95.98%, 85.94 and 97.47%, and 86.29 and 97.92%, in regard to the BA, F1-score, and Kappa measures, respectively. The lower results are obtained when learning is performed on the baseline data ( P = 0 ) . Moreover, it can be noticed that the accuracies are proportionally increased for higher depths of the historical data, with the best scores achieved for P = 3 . This trend is attributed to the fact that for higher P-values, the datasets are progressively supplemented with more severe KOA cases, i.e., knees exhibiting greater KL grades (Table 3).
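The reported metrics could be computed, for example, with scikit-learn as in the following sketch; the label vectors are made-up placeholders.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score, cohen_kappa_score

y_true = np.array([0, 1, 2, 2, 3, 4, 1, 0])        # illustrative KL grades
y_pred = np.array([0, 1, 2, 1, 3, 4, 1, 0])

ba = balanced_accuracy_score(y_true, y_pred)       # BA: mean recall over the five KL classes
f1 = f1_score(y_true, y_pred, average="weighted")  # F1-weighted measure
k = cohen_kappa_score(y_true, y_pred)              # Cohen's Kappa coefficient
print(f"BA={ba:.3f}  F1={f1:.3f}  K={k:.3f}")
```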

8.3. Longitudinal KL Predictions

Table 4 shows the longitudinal KOA grading accuracies for varying depths of the historical data. The results are evaluated in terms of BA, F1-score, and K at the different yearly steps ahead, along with the overall accuracy OA on the entire prediction stage. In this experiment, we deploy the complete setup of our ST_HGCN predictor, including the spatial convolutions, the temporal fusion, and the (historic) sequence-to-(KL) sequence transformer.
The first observation is that the historical depth plays a crucial role in the KL grading performance. Concretely, the longer the patients' sequences, the higher the accuracies attained at follow-up times. The worst performance is obtained for $P = 0$ (first row), exhibiting BA in the range 54.19–67.49% and $OA = 60.02\%$. In that case, learning relies solely on the baseline data, with the temporal fusion mechanism essentially deactivated. Increasing the historic sequence length elevates the accuracy values considerably. The best results are acquired for $P = 3$ (maximum depth), which provides a high BA between 79.41 and 91.89% and $OA = 85.94\%$. The enhanced performance is primarily due to the temporal fusion, which is now able to detect significant shape trends across the knees' historic sequences. To highlight the influence of the historic sequence length, consider the KOA grading of a patient at the fourth follow-up visit after the baseline (+48 months), as derived from the four different values of $P$. As can be seen from Table 4, we obtain accuracies of 54.19% $(t_{P+4}, P = 0)$, 68.07% $(t_{P+3}, P = 1)$, 79.28% $(t_{P+2}, P = 2)$, and 91.89% $(t_{P+1}, P = 3)$, respectively, which shows that greater historic depths lead to considerably better predictions.
The second observation is that, reasonably, the KOA grading accuracies at later follow-up times are reduced compared to the one-step-ahead predictions. This trend applies across all historic depths, where we notice a reduction of about 13% for the accuracies at $t_{P+4}$ with respect to those at $t_{P+1}$. Specifically, for $P = 3$, the one-step-ahead prediction achieves an accuracy of 91.89% at $t_{P+1}$ (fourth follow-up visit), which gradually diminishes to 79.41% at $t_{P+4}$ (seventh follow-up visit). The drop in accuracies can be justified by the fact that, apart from the historical data, no shape, radiological, or other information is utilized during the prediction stage. Similar findings can also be drawn in terms of the other quality metrics.
In order to statistically verify the above observations, we perform Nemenyi's post hoc test [50] after rejecting the null hypothesis of statistical equivalence between the examined cases via the Friedman statistical test [49]. The results presented in Table 5 confirm that expanding the depth of the historical record considered by ST_HGCN yields statistically significant improvements in performance.
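The statistical procedure could be reproduced along the following lines, assuming a matrix of per-partition accuracies; the use of `scipy` and `scikit-posthocs`, as well as the numbers themselves, are illustrative assumptions rather than the study's actual values.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# rows: repeated measurements (e.g., data partitions); columns: historic depths P = 0..3
acc = np.array([
    [0.60, 0.69, 0.78, 0.86],
    [0.59, 0.70, 0.77, 0.85],
    [0.61, 0.68, 0.78, 0.86],
    [0.60, 0.70, 0.77, 0.86],
    [0.58, 0.69, 0.78, 0.85],
])

stat, p = friedmanchisquare(*acc.T)            # Friedman test over the four settings
print(f"Friedman p-value: {p:.4g}")

# pairwise Nemenyi post hoc test on the same blocked design
p_matrix = sp.posthoc_nemenyi_friedman(acc)
print(p_matrix)
```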

8.4. Demographic-Based KL Accuracies

Table 6 hosts the KOA accuracies at future times for different demographic groups, categorized by gender, age, and BMI. Regarding the gender groups, the accuracies for the female group are consistently superior to those of the male group across all time steps ahead. For the age groups, the three younger groups are more accurately assigned to the KL classes, in contrast to the elderly group of patients, which shows a relatively degraded performance; there is a general trend of decreasing performance as age advances from younger to older patients. Finally, in terms of BMI groups, there is a slight tendency of diminishing accuracies as BMI moves from the normal to the obese group. Normal and pre-obese patients are more accurately classified, while obese patients exhibit reduced scores compared to the average performance in the population.

8.5. Accuracies According to the Stage of KOA

In this experiment, patients are classified with respect to the degree of severity of their OA. Following the setting in [51], we consider an alternative population partition into three classes $C = \{C_1, C_2, C_3\}$, formed by grouping the KL grades as follows: $C_1 = NO\_OA = \{0, 1\}$, $C_2 = EARLY\_OA = \{2\}$, and $C_3 = SEVERE\_OA = \{3, 4\}$. Table 7 presents the longitudinal KOA predictions for the different classes in $C$, considering a historic depth of $P = 3$. For each follow-up time independently, we create three binary classification problems: $NO\_OA$ vs. rest, $EARLY\_OA$ vs. rest, and $SEVERE\_OA$ vs. rest. Based on these formulations, the accuracy is evaluated in terms of the class-specific measures BA, F1-score, and K. We also include the average accuracy over all time steps of the prediction stage.
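A small sketch of the KL-to-stage grouping and the class-vs-rest evaluation described above; the KL vectors are made-up placeholders.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def kl_to_stage(kl):
    """Map KL grades to stages: C1 = NO_OA {0, 1}, C2 = EARLY_OA {2}, C3 = SEVERE_OA {3, 4}."""
    kl = np.asarray(kl)
    return np.where(kl <= 1, 0, np.where(kl == 2, 1, 2))

kl_true = np.array([0, 1, 2, 3, 4, 2, 1, 3])       # illustrative ground-truth KL grades
kl_pred = np.array([0, 1, 2, 2, 4, 2, 2, 3])       # illustrative predicted KL grades

stage_true, stage_pred = kl_to_stage(kl_true), kl_to_stage(kl_pred)
for c, name in enumerate(["NO_OA", "EARLY_OA", "SEVERE_OA"]):
    ba = balanced_accuracy_score(stage_true == c, stage_pred == c)   # class vs. rest
    print(f"{name} vs. rest: BA = {ba:.3f}")
```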
The results show that the suggested predictor identifies the classes $C_1$ (mild KOA grades) and $C_2$ (moderate KOA grades) most accurately, achieving average BAs of 89.72% and 85.95%, respectively. The class $C_3$ (severe KOA grades) exhibits a lower average prediction accuracy of 82.22%. This average picture is also reflected in the sequences of follow-up predictions. Specifically, the longitudinal predictions for the classes $C_1$ and $C_2$ attain enhanced accuracies, between 94.87 and 84.37% and between 92.13 and 79.39%, respectively, whereas the prediction scores for $C_3$ take values in the range 88.65–74.68%. Contrasting the results in the first three rows with those of the fourth row, it can be concluded that patients with mild or moderate KOA severity ($C_1$, $C_2$) are identified more precisely, while those with severe KOA grades ($C_3$) are classified at a relatively lower level of accuracy. We also notice the previously observed trend in the temporal accuracies: for every class, the best performance is obtained for the one-step predictions at $t_{P+1}$, while the predictions at later times gradually diminish.

8.6. Longitudinal Predictions of KOA Progression

This experiment investigates the capability of our predictor network to identify the KOA progression of patients across the follow-up period. That is, we seek to accurately detect the changes in KOA grades occurring over time, as the patient's disease advances from lower to greater grades.
In our setting, the progression $Progr(t, t_{ref})$ is measured as the KOA grade change observed at a future time step $t \in \{t_{P+1}, t_{P+2}, t_{P+3}, t_{P+4}\}$ with respect to a previous reference time point $t_{ref} = t_P$. This choice means that all future KOA changes are evaluated in reference to the last time step of the historical stage, for which KOA labels are available. To quantify the results, we define three progression variables derived from the temporal transitions among the classes $C_1 \rightarrow C_2 \rightarrow C_3$. The variable $No\_Progr(t, t_{ref})$ indicates the absence of any KOA change for a patient in the interval $(t_{ref}, t)$. The variable $Progr{+}1(t, t_{ref})$ signifies KOA grade changes causing the one-class transitions $C_1 (NO\_OA) \rightarrow C_2 (EARLY\_OA)$ or $C_2 (EARLY\_OA) \rightarrow C_3 (SEVERE\_OA)$. Finally, the variable $Progr{+}2(t, t_{ref})$ indicates the occurrence of significant KL-grade changes, leading to the two-class shift $C_1 (NO\_OA) \rightarrow C_3 (SEVERE\_OA)$.
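The progression variables could be derived from the stage sequences as in the following sketch, which uses the stage encoding 0: $C_1$, 1: $C_2$, 2: $C_3$; the input sequences are illustrative.

```python
import numpy as np

def progression_label(stage_ref, stage_future):
    """Class shift relative to the reference time t_ref = t_P:
    0 = No_Progr, 1 = Progr+1 (one-class shift), 2 = Progr+2 (C1 -> C3)."""
    return int(np.clip(stage_future - stage_ref, 0, 2))

# illustrative stage values (0: C1 / NO_OA, 1: C2 / EARLY_OA, 2: C3 / SEVERE_OA)
stage_at_tP = np.array([0, 1, 0, 2])     # last historic step t_P
stage_at_t = np.array([0, 2, 2, 2])      # some future step t > t_P

labels = [progression_label(r, f) for r, f in zip(stage_at_tP, stage_at_t)]
print(labels)                            # [0, 1, 2, 0]
```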
Table 8 hosts the longitudinal progression results at the different follow-up times, considering a historical length of $P = 3$. In that case, the KL classifications provided by the predictor are re-organized according to the three progression classes ($No\_Progr$, $Progr{+}1$, $Progr{+}2$). Similarly to the previous section, the progression accuracies are evaluated in terms of BA, F1, and the Kappa coefficient K. As expected, the $No\_Progr$ class of subjects who retain the same grading throughout the prediction stage achieves high BA values (94.49%, 93.16%, 88.47%, 84.72%); notice that these accuracies are in close accordance with the ones in Table 7 (first row). The $Progr{+}1$ class is more important in two respects. On one hand, it encompasses the critical task of early detection of KOA ($C_1 \rightarrow C_2$) for patients with minimal or no symptoms, which can assist clinicians in scheduling effective treatment strategies. On the other hand, it covers the early-to-severe KOA progression ($C_2 \rightarrow C_3$), which is also a critical task for apparent reasons. The $Progr{+}1$ class is likewise precisely identified, with accuracy scores of 89.75%, 84.25%, 79.39%, and 74.37% at the consecutive follow-up steps. Noticeably, the $Progr{+}2$ class also attains high accuracies, which indicates that the predictor can identify the cases of strong grade changes from normal to severe KOA ($C_1 \rightarrow C_3$). In Figure 7, we report the Area Under the ROC Curve (AUC) values for the various time steps ahead, along with the average AUC over the prediction horizon. In average terms, all progressions, and especially the early KOA appearance, are adequately captured with AUC higher than 0.8.
Figure 8 shows a more detailed picture of the results for varying historical lengths. In that case, we examine the KL-grade progression incidence task ($\Delta KL \geq 1$), namely, we predict the KL changes greater than or equal to 1 over the prediction stage, compared to the KL grades at the reference time (last time step) of the respective historic periods. For each $P$, we present the AUC values for the future time steps, along with the average AUC over all times of the prediction period. These results are computed after casting the appropriate binary classification problems (KL progression vs. no progression). In terms of average AUCs, we achieved progression accuracies in the range (0.81, 0.77, 0.73, 0.68) for increasing values of $P$. This also indicates that the performance is considerably enhanced when considering longer historic periods. The AUC scores receive their highest values at $t_{P+1}$ (fast progression) and gradually lower ones at further time steps (middle- and longer-term progression). Viewing the above issue from a different perspective, suppose we want to predict the progression incidence of a patient 48 months (four time steps) after their first visit (baseline). Figure 8 shows that this can be obtained with an AUC of 0.68 ($P = 0$), 0.76 ($P = 1$), 0.85 ($P = 2$), and 0.94 ($P = 3$), respectively.
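The incidence AUCs could be obtained by casting each follow-up step as a binary problem (KL change $\geq 1$ vs. no change) and scoring it with `roc_auc_score`, as sketched below with made-up values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# illustrative data: ground-truth KL change relative to t_ref and a predicted
# probability of progression for each knee at a given follow-up step
kl_change = np.array([0, 1, 0, 2, 0, 1, 0, 0])
prob_progression = np.array([0.1, 0.8, 0.3, 0.9, 0.2, 0.6, 0.4, 0.1])

y_true = (kl_change >= 1).astype(int)      # progression vs. no progression
auc = roc_auc_score(y_true, prob_progression)
print(f"AUC = {auc:.2f}")
```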

8.7. Ablation Studies

In this section, we conduct experiments to demonstrate the impact of several components of the predictor network, including the temporal fusion component, the C_Shape.Net, and the incorporation of the demographic risk factors in the convolutions.

8.7.1. Temporal Fusion Effect

Table 9 shows the KOA classifications along the follow-up times for varying sizes of the historical window. For each window, we contrast the efficacy of the proposed attention-based temporal fusion (ATTN) against temporal fusion via mean aggregation (MEAN). The results demonstrate that the trainable temporal fusion consistently outperforms the simple mean aggregator by 2–6% across all historic depths and future time steps. Noticeably, the accuracy margin increases for larger historic lengths and later future times. Given a sufficiently long historic sequence, this finding suggests that attention-based aggregation over the patients' data sequences is the more effective approach. In particular, it allows the temporal fusion to discern shape changes, i.e., indicators of joint space narrowing, thus leading to better KOA assessments.

8.7.2. Demographic Risk Factors Effect

The longitudinal KOA grade predictions in Table 10 aim to justify the inclusion of demographic features. In this experiment, we examine two different cases of features used in the spatial HGCN convolutions. In the former case, we deploy solely the global shape descriptors obtained by the C_Shape.Net as the main feature representation, while in the latter one, we combine the shape features with demographic risk factors of gender, age, and BMI. Considering different views, these factors are incorporated in the convolutions via the respective hyperedge neighborhoods (Section 7.2), which allows the consideration of both shape and demographic relationships between the patients, simultaneously. The results show that the combination of shape and demographic factors provides a superior performance compared to the one attained by shape features alone. Higher values of accuracy can be noticed for the various historical lengths and future time steps, and across all quality measures. Moreover, the accuracy improvements tend to grow for more distant follow-up times. The above outcomes underscore the significance of demographic risk factors in acquiring more precise assessments for KOA grading and progression, an issue also corroborated by previous studies [32].

8.7.3. Shape Network Contribution

This experiment aims to demonstrate the efficiency of the automatically acquired shape descriptors in representing the cartilage volume. For this purpose, we contrast the prediction performance using two different inputs to the predictor network. Concretely, in addition to the shape features provided by the C_Shape.Net, we also consider an alternative shape input obtained by a CDI-based approach [34]. Briefly, the method proceeds as follows. From the collection of MRI slices in the medial compartment, we select 20 tibiofemoral slices across the medial-to-lateral axis. For each slice, we apply sparse sampling along the anterior-posterior direction to collect 4, 8, and 12 locations on the $S_0^{(FB)}$, $S_1^{(FB)}$, and $S_2^{(FB)}$ femoral bone surfaces, and a similar number of locations on the tibial bone surfaces $S_0^{(TB)}$, $S_1^{(TB)}$, and $S_2^{(TB)}$, respectively. Then, for each of the femoral/tibial cartilage locations, we compute the respective femoral/tibial cartilage volume as the product of the thickness, the cartilage length (anterior-posterior), and the voxel size. In this way, we obtain 800 cartilage volumes overall for the medial compartment. A similar process yields another 800 cartilage volumes from the lateral compartment, giving a total of 1600 volumes for the entire knee cartilage. The concatenation of these values forms the new feature vector. PCA then reduces the feature dimensionality by retaining the first 60 components; experiments show that considering additional components does not improve performance. The large number of selected locations is intended to ensure that all regions of the articular surfaces where denudation frequently occurs are sampled.
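A sketch of the PCA reduction step used for this baseline, on randomly generated stand-in data; the array sizes follow the description above (1600 stacked volumes per knee, 60 retained components), while the cohort size is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
volumes = rng.random((500, 1600))     # illustrative: 500 knees x 1600 sampled cartilage volumes

pca = PCA(n_components=60)            # retain the first 60 principal components
features = pca.fit_transform(volumes) # (500, 60) reduced CDI-style feature vectors
print(features.shape, pca.explained_variance_ratio_.sum())
```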
Table 11 presents the longitudinal KOA predictions for the two inputs. As can be seen, the shape descriptors achieve considerably higher accuracies in terms of historical depths, future times, and evaluation metrics, compared to the more simplistic stacking of sparsely located volumes. The improvements lie in the range of 3–8% and tend to increase for later follow-up times. The better performance can be explained as follows: (a) the graph-based representation of VNs offers a more detailed description of the structural properties of the entire cartilage volume. (b) The VNs are equipped with a set of local features, including geometric characteristics of the surfaces, cartilage thicknesses, and volumes. (c) As a result of deep HGCN convolutions and hierarchical aggregations conducted by the C_Shape.Net, the shape descriptors comprehensively encapsulate the VN information, acquiring a global representation of cartilage.

8.8. Visual Demonstration

In this brief section, we showcase an example of KL-grade progression for a specific knee, across all the time points of the historical and future stages. Figure 9 illustrates the cartilage degradation that is indicative of KOA incidence and progression.
Table 12 illustrates the performance of our proposed approach with regard to the size of the historical record considered, for the progression of the specific knee depicted in Figure 9. As can be seen, expanding the historic depth has a definite positive effect on the model's predictions across all future time points:
  • Starting from the trivial case of using only the baseline data to perform future estimations, we observe that ST_HGCN struggles to produce accurate predictions, with its performance progressively deteriorating at more distant steps. While the immediate step-ahead prediction is correct, all further multi-step-ahead predictions are erroneous, with the furthest one also exhibiting the largest discrepancy.
  • The second and third rows of Table 12 showcase the model's performance for historic sizes of P = 2 and P = 3, respectively. Progressing from a single past observation to two, ST_HGCN is able to slightly boost its performance, and supplying additional data points to the historical sequence further enhances the model's predictive capabilities. In particular, ST_HGCN (1) performs more accurate predictions with increasing P, and (2) the magnitude of the errors is decreased (2/4 in the P = 0 case vs. 3/4 in the P = 2, 3 cases).
  • Finally, for the last step, we can confirm that expanding the historical depth to the maximum size (4) enables ST_HGCN to accurately predict all future KL grades.

8.9. Comparative Results

In this section, we present comparative results in Table 13, considering a collection of representative existing works in the field. The different methods are categorized into three groups, namely, KL grading, longitudinal KL grading, and KOA progression. The methods in the KL grading group conduct static node (patient) classifications into the five KL classes. The second group contains a recent method which, to the best of our knowledge, matches the multi-step-ahead KL prediction setting of our scheme. Finally, the third group contains methods which provide predictions of KOA progression; usually, they consider the baseline visit as the reference time and generate predictions at a single follow-up time. For each category, we report the best results obtained using the maximum depth of historic sequences (P = 3).
In KL grading, our predictor achieved high accuracy results, comparable to the best ones reported in [29]. For the second category, we obtained considerably better longitudinal KL predictions with the maximum historic depth compared to the A-EEN approach in [36]; superior results are also produced for smaller historic horizons (Table 13). Finally, regarding the longitudinal progression incidence of the third category, we also obtained good results that compare favorably to existing methods in the group. It should be noted that straightforward comparisons in this category are not feasible, because the various methods define KOA progression differently. For instance, a direct contrast with our setting is valid for the work in [37], where KOA progression is defined as the change in KL grade over a follow-up period, whereas other approaches define progression in terms of joint space narrowing [35,55]. In conclusion, the above outcomes signify that the usage of previous historical data is indispensable for acquiring more precise KL and progression predictions.

9. Conclusions

An integrated approach is presented in this paper for longitudinal KL predictions and progression. To accomplish this goal, we propose two novel and distinct networks, namely, the C_Shape.Net and the predictor network, operating in cascade. C_Shape.Net acts upon a graph of volumetric nodes, especially designed to represent the surface and volumetric characteristics of the knee articular cartilage. It conducts HGCN convolutions, graph pooling, and readout operations across a hierarchy of layers, providing comprehensive 3D shape descriptors of the cartilage volume. Contrary to most previous works in KOA grading, the predictor network exhibits spatio-temporal capabilities, following the sequence-to-sequence learning approach. Concretely, the model encompasses spatial HGCN convolutions, attention-based temporal fusion of feature representations at multiple layers, and a transformer module that generates the longitudinal predictions at follow-up visits.
Complying with the sequence-to-sequence scenario, the OAI data are divided in the experiments into a historical stage of varying depth and a prediction stage of future times ahead. The predictor is then invoked to process the patients' historical sequences and transform them into prediction sequences of KOA grades and progression. We have conducted a thorough investigation to evaluate the performance of our method in different tasks, including node classification, longitudinal KOA predictions, KOA assessments in terms of demographic features, and longitudinal predictions of KOA progression. In these tasks, our method acquired adequate accuracies, comparing favorably to existing approaches.
The fundamental finding of this work is that the longer the patient's historic sequences, the more accurate the obtained prediction sequences, in both KL grading and progression. Accordingly, provided that deeper data sequences are collected in the future, we can explore the opportunity of developing personalized prediction schemes via transfer learning, whereby effective longitudinal KOA assessments can be drawn recursively by exploiting the temporal trends and knowledge embedded in previous historic visits. Along this direction, in our future work, we intend to further enhance the spatio-temporal capabilities of our scheme by resorting to more effective temporal processing approaches, such as the Gated Recurrent Unit (GRU) under the encoder-decoder paradigm. Additionally, drawing inspiration from recent work in the field of knee segmentation [56], another possible route of experimentation involves the joint treatment of knee cartilage shape description and KOA progression, formulated as a multi-task learning problem.

Author Contributions

Conceptualization, J.B.T.; data curation, C.G.C.; formal analysis, J.B.T.; methodology, J.B.T. and C.G.C.; project administration, J.B.T.; software, C.G.C.; supervision, J.B.T.; validation, J.B.T., C.G.C., and A.L.S.; visualization, J.B.T. and C.G.C.; writing—original draft, J.B.T.; writing—review and editing, J.B.T. and C.G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data used in this study are available in the Osteoarthritis Initiative repository at https://nda.nih.gov/oai (accessed on 1 March 2018).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Let us consider two surfaces $S_1$ and $S_2$, and a mesh vertex $P \in S_1$. The corresponding point $Q = CR(P) \in S_2$ is determined as shown in Figure 3b. First, we draw the lines $l_1 = PR \perp S_2$ and $l_2 = PT \perp S_1$, perpendicular to the surfaces $S_2$ and $S_1$, respectively. Then, the intersection of the line $l = (l_1 + l_2)/2$ with the surface $S_2$ defines the location of the point $Q \in S_2$. Further, the length of the segment $PQ$ defines the cartilage thickness between these two corresponding ($CR$) vertices, $Th = \lVert PQ \rVert$.
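A rough numerical sketch of this construction using trimesh is given below; approximating $l_1$ by the direction towards the closest point on $S_2$ and $l_2$ by the negated vertex normal of $S_1$ at $P$ is an assumption of this sketch, as is the toy planar mesh used for $S_2$.

```python
import numpy as np
import trimesh

def corresponding_thickness(P, n1, mesh_S2):
    """Thickness at vertex P of S1: cast a ray along the averaged direction
    l = (l1 + l2) / 2 and measure |PQ| to the hit point Q on S2."""
    closest, _, _ = trimesh.proximity.closest_point(mesh_S2, P[None, :])
    d1 = closest[0] - P                      # approximation of l1 (towards S2)
    d1 /= np.linalg.norm(d1)
    d2 = -n1 / np.linalg.norm(n1)            # approximation of l2 (negated normal of S1 at P)
    direction = (d1 + d2) / 2.0
    direction /= np.linalg.norm(direction)
    locations, _, _ = mesh_S2.ray.intersects_location(P[None, :], direction[None, :])
    if len(locations) == 0:
        return None
    Q = locations[np.argmin(np.linalg.norm(locations - P, axis=1))]
    return np.linalg.norm(Q - P)             # Th = |PQ|

# toy usage: thickness between a point at z = 0 and a planar "S2" at z = 1 (expected 1.0)
S2 = trimesh.Trimesh(vertices=[[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]],
                     faces=[[0, 1, 2], [1, 3, 2]])
print(corresponding_thickness(np.array([0.3, 0.3, 0.0]), np.array([0.0, 0.0, -1.0]), S2))
```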

References

  1. Wang, Z.; Xiao, Z.; Sun, C.; Xu, G.; He, J. Global, regional and national burden of osteoarthritis in 1990–2021: A systematic analysis of the global burden of disease study 2021. BMC Musculoskelet. Disord. 2024, 25, 1021. [Google Scholar] [CrossRef] [PubMed]
  2. Bastick, A.N.; Runhaar, J.; Belo, J.N.; Bierma-Zeinstra, S.M. Prognostic factors for progression of clinical osteoarthritis of the knee: A systematic review of observational studies. Arthritis Res. Ther. 2015, 17, 152. [Google Scholar] [CrossRef] [PubMed]
  3. Kellgren, J.H.; Lawrence, J.S. Radiological Assessment of Osteo-Arthrosis. Ann. Rheum. Dis. 1957, 16, 494–502. [Google Scholar] [CrossRef]
  4. Zhao, H.; Ou, L.; Zhang, Z.; Zhang, L.; Liu, K.; Kuang, J. The value of deep learning-based X-ray techniques in detecting and classifying K-L grades of knee osteoarthritis: A systematic review and meta-analysis. Eur. Radiol. 2024, 35, 327–340. [Google Scholar] [CrossRef] [PubMed]
  5. Joo, P.; Borjali, A.; Chen, A.; Muratoglu, O.; Varadarajan, K. Defining and predicting radiographic knee osteoarthritis progression: A systematic review of findings from the osteoarthritis initiative. Knee Surgery Sport. Traumatol. Arthrosc. 2022, 30, 4015–4028. [Google Scholar] [CrossRef]
  6. Lee, L.S.; Chan, P.K.; Wen, C.; Fung, W.C.; Cheung, A.; Chan, V.W.K.; Cheung, M.H.; Fu, H.; Yan, C.H.; Chiu, K.Y. Artificial intelligence in diagnosis of knee osteoarthritis and prediction of arthroplasty outcomes: A review. Arthroplasty 2022, 4, 16. [Google Scholar] [CrossRef]
  7. Chadoulos, C.; Tsaopoulos, D.; Symeonidis, A.; Moustakidis, S.; Theocharis, J. Dense Multi-Scale Graph Convolutional Network for Knee Joint Cartilage Segmentation. Bioengineering 2024, 11, 278. [Google Scholar] [CrossRef]
  8. Peterfy, C.; Schneider, E.; Nevitt, M. The osteoarthritis initiative: Report on the design rationale for the magnetic resonance imaging protocol for the knee. Osteoarthr. Cartil. 2008, 16, 1433–1441. [Google Scholar] [CrossRef]
  9. Bai, J.; Gong, B.; Zhao, Y.; Lei, F.; Yan, C.; Gao, Y. Multi-Scale Representation Learning on Hypergraph for 3D Shape Retrieval and Recognition. IEEE Trans. Image Process. 2021, 30, 5327–5338. [Google Scholar] [CrossRef]
  10. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view Convolutional Neural Networks for 3D Shape Recognition. arXiv 2015, arXiv:1505.00880. [Google Scholar] [CrossRef]
  11. Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; Gao, Y. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 264–272. [Google Scholar] [CrossRef]
  12. He, X.; Huang, T.; Bai, S.; Bai, X. View N-Gram Network for 3D Object Retrieval. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7514–7523. [Google Scholar] [CrossRef]
  13. Yu, T.; Meng, J.; Yuan, J. Multi-view Harmonized Bilinear Network for 3D Object Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 186–194. [Google Scholar] [CrossRef]
  14. Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar] [CrossRef]
  15. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4–9 December 2017; NIPS’17. pp. 5105–5114. [Google Scholar]
  16. Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5560–5568. [Google Scholar] [CrossRef]
  17. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar] [CrossRef]
  18. Maturana, D.; Scherer, S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October 2015; pp. 922–928. [Google Scholar] [CrossRef]
  19. Qi, C.R.; Su, H.; NieBner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and Multi-view CNNs for Object Classification on 3D Data. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5648–5656. [Google Scholar] [CrossRef]
  20. Feng, Y.; Feng, Y.; You, H.; Zhao, X.; Gao, Y. MeshNet: Mesh Neural Network for 3D Shape Representation. AAAI 2019, 33, 8279–8286. [Google Scholar] [CrossRef]
  21. Hanocka, R.; Hertz, A.; Fish, N.; Giryes, R.; Fleishman, S.; Cohen-Or, D. MeshCNN: A Network with an Edge. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
  22. Wei, X.; Yu, R.; Sun, J. Learning View-Based Graph Convolutional Network for Multi-View 3D Shape Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7525–7541. [Google Scholar] [CrossRef]
  23. Tang, W.; Qiu, G. Dense graph convolutional neural networks on 3D meshes for 3D object segmentation and classification. Image Vis. Comput. 2021, 114, 104265. [Google Scholar] [CrossRef]
  24. Tiulpin, A.; Thevenot, J.; Rahtu, E.; Lehenkari, P.; Saarakkala, S. Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach. Sci. Rep. 2018, 8, 1727. [Google Scholar] [CrossRef]
  25. Zhang, B.; Tan, J.; Cho, K.; Chang, G.; Deniz, C.M. Attention-based CNN for KL Grade Classification: Data from the Osteoarthritis Initiative. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 731–735. [Google Scholar] [CrossRef]
  26. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
  27. Yong, C.W.; Teo, K.; Murphy, B.P.; Hum, Y.C.; Tee, Y.K.; Xia, K.; Lai, K.W. Knee osteoarthritis severity classification with ordinal regression module. Multimed. Tools Appl. 2022, 81, 41497–41509. [Google Scholar] [CrossRef]
  28. Chen, P.; Gao, L.; Shi, X.; Allen, K.; Yang, L. Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss. Comput. Med. Imaging Graph. 2019, 75, 84–92. [Google Scholar] [CrossRef]
  29. V, V.K.; Kalpana, V.; Kumar, G.H. Evaluating the efficacy of deep learning models for knee osteoarthritis prediction based on Kellgren-Lawrence grading system. E-Prime Adv. Electr. Eng. Electron. Energy 2023, 5, 100266. [Google Scholar] [CrossRef]
  30. Leung, K.; Zhang, B.; Tan, J.; Shen, Y.; Geras, K.J.; Babb, J.S.; Cho, K.; Chang, G.; Deniz, C.M. Prediction of Total Knee Replacement and Diagnosis of Osteoarthritis by Using Deep Learning on Knee Radiographs: Data from the Osteoarthritis Initiative. Radiology 2020, 296, 584–593. [Google Scholar] [CrossRef]
  31. Alexopoulos, A.; Hirvasniemi, J.; Klein, S.; Donkervoort, C.; Oei, E.; Tümer, N. Early detection of knee osteoarthritis using deep learning on knee MRI. Osteoarthr. Imaging 2023, 3, 100112. [Google Scholar] [CrossRef]
  32. Guan, B.; Liu, F.; Haj-Mirzaian, A.; Demehri, S.; Samsonov, A.; Neogi, T.; Guermazi, A.; Kijowski, R. Deep learning risk assessment models for predicting progression of radiographic medial joint space loss over a 48-MONTH follow-up period. Osteoarthr. Cartil. 2020, 28, 428–437. [Google Scholar] [CrossRef]
  33. Almhdie-Imjabbar, A.; Nguyen, K.L.; Toumi, H.; Jennane, R.; Lespessailles, E. Prediction of knee osteoarthritis progression using radiological descriptors obtained from bone texture analysis and Siamese neural networks: Data from OAI and MOST cohorts. Arthritis Res. Ther. 2022, 24, 66. [Google Scholar] [CrossRef]
  34. Du, Y.; Almajalid, R.; Shan, J.; Zhang, M. A Novel Method to Predict Knee Osteoarthritis Progression on MRI Using Machine Learning Methods. IEEE Trans. Nanobiosci. 2018, 17, 228–236. [Google Scholar] [CrossRef]
  35. Halilaj, E.; Le, Y.; Hicks, J.; Hastie, T.; Delp, S. Modeling and predicting osteoarthritis progression: Data from the osteoarthritis initiative. Osteoarthr. Cartil. 2018, 26, 1643–1650. [Google Scholar] [CrossRef]
  36. Hu, K.; Wu, W.; Li, W.; Simic, M.; Zomaya, A.; Wang, Z. Adversarial Evolving Neural Network for Longitudinal Knee Osteoarthritis Prediction. IEEE Trans. Med. Imaging 2022, 41, 3207–3217. [Google Scholar] [CrossRef] [PubMed]
  37. Panfilov, E.; Saarakkala, S.; Nieminen, M.T.; Tiulpin, A. End-To-End Prediction of Knee Osteoarthritis Progression with Multi-Modal Transformers. arXiv 2023, arXiv:2307.00873. [Google Scholar] [CrossRef]
  38. Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640. [Google Scholar] [CrossRef]
  39. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. AAAI 2019, 33, 922–929. [Google Scholar] [CrossRef]
  40. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transport. Syst. 2020, 21, 3848–3858. [Google Scholar] [CrossRef]
  41. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. arXiv 2016, arXiv:1606.09375. [Google Scholar]
  42. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907. [Google Scholar] [CrossRef]
  43. Ma, Z.; Jiang, Z.; Zhang, H. Hyperspectral Image Classification Using Feature Fusion Hypergraph Convolution Neural Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5517314. [Google Scholar] [CrossRef]
  44. Bai, S.; Zhang, F.; Torr, P.H. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637. [Google Scholar] [CrossRef]
  45. Ambellan, F.; Tack, A.; Ehlke, M.; Zachow, S. Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the Osteoarthritis Initiative. Med. Image Anal. 2019, 52, 109–118. [Google Scholar] [CrossRef] [PubMed]
  46. Elshakhs, Y.; Deliparaschos, K.; Charalambous, T.; Oliva, G.; Zolotas, A. A Comprehensive Survey on Delaunay Triangulation: Applications, Algorithms, and Implementations Over CPUs, GPUs, and FPGAs. IEEE Access 2024, 12, 12562–12585. [Google Scholar] [CrossRef]
  47. Lee, J.; Lee, I.; Kang, J. Self-Attention Graph Pooling. arXiv 2019, arXiv:1904.08082. [Google Scholar] [CrossRef]
  48. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting. AAAI 2020, 34, 914–921. [Google Scholar] [CrossRef]
  49. Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 1937, 32, 675–701. [Google Scholar] [CrossRef]
  50. Nemenyi, P. Distribution-Free Multiple Comparisons. Ph.D. Thesis, Princeton University, Princeton, NJ, USA, 1963. [Google Scholar]
  51. Tiulpin, A.; Saarakkala, S. Automatic Grading of Individual Knee Osteoarthritis Features in Plain Radiographs Using Deep Convolutional Neural Networks. Diagnostics 2020, 10, 932. [Google Scholar] [CrossRef]
  52. Górriz, M.; Antony, J.; McGuinness, K.; Giró-i-Nieto, X.; O’Connor, N.E. Assessing Knee OA Severity with CNN attention-based end-to-end architectures. Proc. Mach. Learn. Res. 2019, 102, 197–204. [Google Scholar]
  53. Pedoia, V.; Lee, J.; Norman, B.; Link, T.; Majumdar, S. Diagnosing osteoarthritis from T2 maps using deep learning: An analysis of the entire Osteoarthritis Initiative baseline cohort. Osteoarthr. Cartil. 2019, 27, 1002–1010. [Google Scholar] [CrossRef] [PubMed]
  54. Alexopoulos, A.; Hirvasniemi, J.; Tumer, N. Early detection of knee osteoarthritis using deep learning on knee magnetic resonance images. arXiv 2022, arXiv:2209.01192. [Google Scholar]
  55. Schiratti, J.B.; Dubois, R.; Herent, P.; Cahané, D.; Dachary, J.; Clozel, T.; Wainrib, G.; Keime-Guibert, F.; Lalande, A.; Pueyo, M.; et al. A deep learning method for predicting knee osteoarthritis radiographic progression from MRI. Arthritis Res. Ther. 2021, 23, 262. [Google Scholar] [CrossRef] [PubMed]
  56. Li, X.; Lv, S.; Li, M.; Zhang, J.; Jiang, Y.; Qin, Y.; Luo, H.; Yin, S. SDMT: Spatial Dependence Multi-Task Transformer Network for 3D Knee MRI Segmentation and Landmark Localization. IEEE Trans. Med Imaging 2023, 42, 2274–2285. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed methodology. (a) Preliminary stage of 3D cartilage segmentation. (b) Graph construction of volumetric nodes. (c) C_Shape.Net providing 3D shape descriptors of the cartilage volume. (d) The ST_HGCN predictor network providing longitudinal predictions of KOA grades and progression.
Figure 2. (a) Slices of an MRI segmentation map, showcasing the femoral (gray) and tibial (white) cartilages in the sagittal plane. (b) A 2D illustrative drawing of a particular slice showing the cartilage regions and surfaces involved. (c) Explanatory scheme describing the formation of corresponding vertices between two surfaces.
Figure 3. Cartilage mesh face structure. (a) A 1-ring neighborhood of mesh face. (b) A 3D prism formed by an upper and bottom triangular cell (face). (c) Neighborhood structure around a triangular cell.
Figure 4. Proposed C_Shape.Net Hierarchical Architecture. It includes the DHGCN convolutional blocks, the graph coarsening, and readout layers. Global features across the different layers are passed through FC, to provide the global descriptor of a cartilage compartment.
Figure 5. DHGCN Convolutional Block. It comprises the local multi-view convolutional (LMV) and the global multi-view convolutional (GMV) modules, intertwined into a four-layered model.
Figure 6. Proposed ST_HGCN Predictor Network for longitudinal KOA predictions. It incorporates the spatial HGCN convolutions, the attention-based temporal fusion, and the transformer module. Data are distinguished into a historical stage of knee shape sequences, and a prediction stage of KOA prediction sequences. We also show the spatio-temporal interconnection of hypergraphs along the historic/prediction follow-up times.
Figure 7. Performance (AUC) w.r.t. prediction of KOA severity progression (maximum historical depth $P = 3$), under different transition scenarios. $\tilde{t}_H$ denotes the averaged performance along future time steps.
Figure 8. Performance (AUC) w.r.t. prediction of OA progression incidence (KL-grade change $\geq 1$) at multiple time points ahead, for varying historic depths $P$.
Figure 9. KOA progression for a specific knee in the OAI dataset. The first four images in each row correspond to the historic stage, while the latter four refer to the prediction stage. For the first two rows, dark orange and dark purple represent the tibial and femoral bones, respectively, while light yellow and light pink represent the corresponding cartilage structures. In the last row, the red part constitutes the femoral bone while the light yellow represents the femoral cartilage. Ground-truth KL-grade progression: 0→1→1→2→2→3→3→4. Predicted KL-grade progression: 0→0→1→2→2→2→3→4 (green indicates correct prediction, red indicates erroneous prediction).
Table 1. KL-grade distribution at the multiple historical time points considered in this study.

| KL grade | Baseline | 12-Month | 24-Month | 36-Month |
|---|---|---|---|---|
| KL0 | 2500 | 2401 | 2369 | 2336 |
| KL1 | 1206 | 1153 | 1131 | 1082 |
| KL2 | 1610 | 1645 | 1640 | 1636 |
| KL3 | 774 | 841 | 863 | 910 |
| KL4 | 138 | 188 | 225 | 264 |
Table 2. Summary of the proposed model's hyperparameter search ranges; the optimal values were determined via grid search with 5-fold cross-validation.

| Hyperparameter | Range |
|---|---|
| #Attention-Heads | [2, 4, 6, 8] |
| Pooling Ratio | [0.20, 0.40, 0.60, 0.80] |
| Convex Comb. Parameter (λ) | [0.20, 0.40, 0.60, 0.80] |
| Batch Size | [64, 128, 256] |
| Dropout Prob. | [0.25, 0.50, 0.75] |
Table 3. Performance of node KOA classifications under varying size (P) of historical windows. Results are reported for the Medial/Lateral compartments and their combination; standard deviations in parentheses. Metrics: Balanced Accuracy (BA), F1-weighted measure (F1), Cohen's Kappa (K).

| P | Medial BA | Medial F1 | Medial K | Lateral BA | Lateral F1 | Lateral K | Med.+Lat. BA | Med.+Lat. F1 | Med.+Lat. K |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 84.21% (±0.054) | 86.05% (±0.081) | 85.61% (±0.049) | 83.92% (±0.087) | 85.88% (±0.101) | 87.03% (±0.024) | 84.07% (±0.092) | 85.94% (±0.037) | 86.29% (±0.062) |
| 1 | 88.46% (±0.032) | 89.13% (±0.114) | 88.24% (±0.078) | 87.69% (±0.094) | 88.71% (±0.045) | 90.54% (±0.079) | 88.12% (±0.117) | 89.02% (±0.065) | 90.22% (±0.089) |
| 2 | 92.18% (±0.028) | 93.01% (±0.076) | 94.17% (±0.088) | 93.09% (±0.039) | 95.08% (±0.105) | 95.12% (±0.066) | 92.51% (±0.082) | 93.98% (±0.057) | 94.79% (±0.083) |
| 3 | 96.82% (±0.067) | 97.19% (±0.092) | 97.83% (±0.11) | 95.47% (±0.039) | 97.85% (±0.075) | 98.61% (±0.041) | 95.98% (±0.056) | 97.49% (±0.083) | 97.92% (±0.077) |
Table 4. Performance of KL longitudinal predictions under historical windows of varying size (P). Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K), Overall Accuracy (OA) averaged across future predictions (t̃_P)].
Horizon:    t_{P+1}                       t_{P+2}                       t_{P+3}                       t_{P+4}                       t̃_P
P           BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K         OA
0 67.49 % 65.19 % 65.02 % 60.24 % 62.19 % 61.75 % 58.02 % 60.13 % 59.88 % 54.19 % 56.01 % 55.41 % 60.02 %
( ± 0.089 ) ( ± 0.075 ) ( ± 0.093 ) ( ± 0.105 ) ( ± 0.088 ) ( ± 0.164 ) ( ± 0.079 ) ( ± 0.112 ) ( ± 0.069 ) ( ± 0.122 ) ( ± 0.096 ) ( ± 0.080 ) ( ± 0.101 )
1 75.84 % 77.01 % 76.73 % 72.16 % 74.28 % 73.64 % 68.07 % 69.54 % 67.91 % 62.08 % 64.10 % 63.59 % 69.53 %
( ± 0.068 ) ( ± 0.082 ) ( ± 0.075 ) ( ± 0.094 ) ( ± 0.088 ) ( ± 0.059 ) ( ± 0.076 ) ( ± 0.081 ) ( ± 0.069 ) ( ± 0.081 ) ( ± 0.104 ) ( ± 0.095 ) ( ± 0.111 )
2 84.95 % 86.61 % 85.39 % 79.28 % 80.37 % 77.87 % 75.17 % 77.23 % 76.49 % 71.58 % 73.14 % 72.01 % 77.74 %
( ± 0.048 ) ( ± 0.066 ) ( ± 0.081 ) ( ± 0.057 ) ( ± 0.062 ) ( ± 0.074 ) ( ± 0.088 ) ( ± 0.053 ) ( ± 0.071 ) ( ± 0.099 ) ( ± 0.071 ) ( ± 0.057 ) ( ± 0.094 )
3 91.89 % 93.13 % 92.81 % 88.11 % 89.69 % 88.81 % 84.35 % 87.06 % 85.92 % 79.41 % 82.02 % 80.19 % 85.94 %
( ± 0.052 ) ( ± 0.067 ) ( ± 0.058 ) ( ± 0.078 ) ( ± 0.095 ) ( ± 0.072 ) ( ± 0.088 ) ( ± 0.064 ) ( ± 0.083 ) ( ± 0.093 ) ( ± 0.069 ) ( ± 0.101 ) ( ± 0.091 )
Table 5. Pairwise comparisons using the Nemenyi statistical test. We evaluate the performance of ST_HGCN under various historical depths for multiple step-ahead predictions. Reported are the corresponding p-values.
           t_{P+1}                                       t_{P+2}                                       t_{P+3}                                       t_{P+4}
           P = 0         P = 1         P = 2             P = 0         P = 1         P = 2             P = 0         P = 1         P = 2             P = 0         P = 1         P = 2
P = 1      1.91×10^-4    –             –                 4.97×10^-5    –             –                 7.19×10^-4    –             –                 2.07×10^-3    –             –
P = 2      5.44×10^-6    0.22×10^-6    –                 3.18×10^-5    7.54×10^-4    –                 1.33×10^-5    0.87×10^-3    –                 5.08×10^-4    9.17×10^-3    –
P = 3      3.62×10^-12   7.41×10^-8    9.37×10^-6        8.49×10^-10   1.65×10^-7    6.44×10^-6        3.27×10^-9    1.98×10^-7    8.27×10^-4        4.37×10^-6    0.59×10^-5    1.28×10^-5
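The pairwise comparisons in Table 5 follow the usual Friedman-plus-Nemenyi post-hoc protocol. Below is a minimal sketch of how such p-values could be obtained, assuming per-fold accuracy scores for each historical depth are available; the scikit-posthocs package and the randomly generated score matrix are assumptions for illustration only.

```python
# Minimal sketch of a Friedman test followed by Nemenyi post-hoc pairwise
# comparisons, as reported in Table 5. The score matrix is a random placeholder
# standing in for per-fold balanced accuracies of the four historical depths.
import numpy as np
import scikit_posthocs as sp
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
# rows: evaluation blocks (e.g., CV folds); columns: historical depths P = 0..3
scores = rng.normal(loc=[0.60, 0.70, 0.78, 0.86], scale=0.02, size=(30, 4))

stat, p_friedman = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.2f}, p = {p_friedman:.3g}")

# Nemenyi test returns a symmetric matrix of pairwise p-values between depths
p_matrix = sp.posthoc_nemenyi_friedman(scores)
print(p_matrix.round(6))
```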
Table 6. Balanced accuracy (BA) of KL longitudinal predictions for a historical depth of P = 3 w.r.t. demographic groups of {Gender, Age, and BMI}.
Group:      Gender                  Age                                                  BMI
            Male       Female       <50        [50,59)      [60,69)      ≥70             Normal     Pre-Obesity     Obesity
t P + 1 90.21 % 92.37 % 93.04 % 92.12 % 89.65 % 88.34 % 91.95 % 93.02 % 87.57 %
( ± 0.116 ) ( ± 0.099 ) ( ± 0.078 ) ( ± 0.091 ) ( ± 0.107 ) ( ± 0.077 ) ( ± 0.077 ) ( ± 0.079 ) ( ± 0.072 )
t P + 2 86.98 % 90.19 % 90.24 % 90.01 % 88.53 % 86.67 % 91.05 % 89.92 % 86.52 %
( ± 0.084 ) ( ± 0.092 ) ( ± 0.049 ) ( ± 0.061 ) ( ± 0.066 ) ( ± 0.087 ) ( ± 0.109 ) ( ± 0.081 ) ( ± 0.064 )
t P + 3 83.47 % 86.02 % 84.49 % 86.72 % 82.48 % 80.91 % 85.82 % 83.91 % 83.11 %
( ± 0.097 ) ( ± 0.086 ) ( ± 0.053 ) ( ± 0.071 ) ( ± 0.047 ) ( ± 0.073 ) ( ± 0.101 ) ( ± 0.083 ) ( ± 0.055 )
t P + 4 77.97 % 80.18 % 81.58 % 81.73 % 79.18 % 78.14 % 82.37 % 79.85 % 76.63 %
( ± 0.072 ) ( ± 0.098 ) ( ± 0.119 ) ( ± 0.091 ) ( ± 0.066 ) ( ± 0.081 ) ( ± 0.059 ) ( ± 0.082 ) ( ± 0.090 )
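The demographic strata of Table 6 can be reproduced with simple binning rules. The sketch below follows the age groups shown in the table header, while the BMI cut-offs of 25 and 30 kg/m² are assumed WHO-style thresholds rather than values stated in the text.

```python
# Minimal sketch of the demographic stratification used in Table 6.
# Age groups follow the table header; the BMI cut-offs (25, 30 kg/m^2)
# are assumed WHO-style thresholds, not values taken from the paper.
def age_group(age: float) -> str:
    if age < 50:
        return "<50"
    if age < 60:
        return "[50,59)"
    if age < 70:
        return "[60,69)"
    return "≥70"

def bmi_group(bmi: float) -> str:
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Pre-Obesity"
    return "Obesity"

# Example: assign one subject to its strata
subject = {"gender": "Female", "age": 63, "bmi": 27.4}
strata = (subject["gender"], age_group(subject["age"]), bmi_group(subject["bmi"]))
print(strata)   # ('Female', '[60,69)', 'Pre-Obesity')
```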
Table 7. Performance of KOA longitudinal predictions for historical depth P = 3. Subjects are partitioned according to the stage of KL grades at baseline: NON_OA: {0, 1}, EARLY_OA: {2}, SEVERE_OA: {3, 4} (denoted C1, C2, and C3, respectively, in the table). Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K), Overall Accuracy (OA) averaged across future predictions (t̃_P)].
Horizon:    t_{P+1}                       t_{P+2}                       t_{P+3}                       t_{P+4}                       t̃_P
Group       BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K         OA
C 1 94.87 % 95.74 % 95.02 % 91.06 % 92.73 % 91.33 % 88.61 % 89.09 % 88.96 % 84.37 % 86.22 % 84.81 % 89.73 %
( ± 0.041 ) ( ± 0.067 ) ( ± 0.077 ) ( ± 0.089 ) ( ± 0.103 ) ( ± 0.097 ) ( ± 0.082 ) ( ± 0.099 ) ( ± 0.105 ) ( ± 0.055 ) ( ± 0.079 ) ( ± 0.101 ) ( ± 0.083 )
C 2 92.13 % 94.05 % 93.49 % 88.14 % 89.43 % 89.07 % 84.19 % 85.67 % 84.55 % 79.39 % 81.27 % 80.02 % 85.94 %
( ± 0.106 ) ( ± 0.099 ) ( ± 0.113 ) ( ± 0.124 ) ( ± 0.085 ) ( ± 0.092 ) ( ± 0.103 ) ( ± 0.077 ) ( ± 0.081 ) ( ± 0.108 ) ( ± 0.090 ) ( ± 0.102 ) ( ± 0.092 )
C 3 88.65 % 90.16 % 89.21 % 85.17 % 86.67 % 85.93 % 80.43 % 81.12 % 80.78 % 74.68 % 76.33 % 75.47 % 82.23 %
( ± 0.077 ) ( ± 0.092 ) ( ± 0.064 ) ( ± 0.081 ) ( ± 0.102 ) ( ± 0.079 ) ( ± 0.076 ) ( ± 0.091 ) ( ± 0.107 ) ( ± 0.099 ) ( ± 0.111 ) ( ± 0.088 ) ( ± 0.097 )
Table 8. Prediction accuracies of KOA progression for historical depth P = 3. KOA progression of patients over time is categorized according to the following progress classes: Prog+1 indicates the progressions NON_OA → EARLY_OA and EARLY_OA → SEVERE_OA, while Prog+2 indicates the progression NON_OA → SEVERE_OA. No-Prog refers to subjects that remain static in their respective OA grading. Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K)].
Horizon:    Ref → t_{P+1}                 Ref → t_{P+2}                 Ref → t_{P+3}                 Ref → t_{P+4}
Class       BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K
No-Prog 94.49 % 95.02 % 95.15 % 93.16 % 93.52 % 93.74 % 88.47 % 89.23 % 89.02 % 84.72 % 85.01 % 85.61 %
( ± 0.059 ) ( ± 0.082 ) ( ± 0.077 ) ( ± 0.064 ) ( ± 0.092 ) ( ± 0.061 ) ( ± 0.049 ) ( ± 0.056 ) ( ± 0.074 ) ( ± 0.092 ) ( ± 0.071 ) ( ± 0.062 )
Prog+1 89.75 % 90.14 % 89.01 % 84.25 % 84.72 % 83.88 % 79.39 % 80.27 % 77.91 % 74.37 % 76.19 % 75.59 %
( ± 0.087 ) ( ± 0.092 ) ( ± 0.076 ) ( ± 0.107 ) ( ± 0.114 ) ( ± 0.091 ) ( ± 0.123 ) ( ± 0.099 ) ( ± 0.104 ) ( ± 0.086 ) ( ± 0.091 ) ( ± 0.098 )
Prog+2 91.41 % 92.31 % 92.03 % 86.92 % 87.49 % 85.90 % 83.05 % 83.56 % 82.37 % 79.14 % 79.88 % 78.24 %
( ± 0.066 ) ( ± 0.082 ) ( ± 0.059 ) ( ± 0.087 ) ( ± 0.092 ) ( ± 0.105 ) ( ± 0.092 ) ( ± 0.088 ) ( ± 0.103 ) ( ± 0.116 ) ( ± 0.108 ) ( ± 0.099 )
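The progression classes of Table 8 follow directly from the coarse OA stages defined in Table 7. A minimal sketch of this labelling rule is given below; the function names are illustrative.

```python
# Minimal sketch of the progression labelling used in Table 8.
# KL grades are first mapped to coarse OA stages (Table 7 grouping), and the
# progression class is the stage difference between the reference visit and a
# later follow-up visit.
def oa_stage(kl_grade: int) -> int:
    """Map a KL grade (0-4) to a coarse stage: 0 = NON_OA, 1 = EARLY_OA, 2 = SEVERE_OA."""
    if kl_grade <= 1:
        return 0
    if kl_grade == 2:
        return 1
    return 2

def progression_class(kl_ref: int, kl_followup: int) -> str:
    """Return 'No-Prog', 'Prog+1' (one stage worse), or 'Prog+2' (two stages worse)."""
    delta = oa_stage(kl_followup) - oa_stage(kl_ref)
    if delta <= 0:
        return "No-Prog"
    return f"Prog+{delta}"

# Example: a knee at KL = 1 at the reference visit and KL = 3 four years later
print(progression_class(1, 3))   # 'Prog+2' (NON_OA -> SEVERE_OA)
```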
Table 9. Summary of KOA classification results of the proposed method under varying-size historical windows. We examine the effect of temporal fusion via an adaptive attention mechanism vs. temporal fusion via mean aggregation. ATTN indicates temporal fusion via a trainable attention mechanism, while MEAN indicates temporal fusion via simple averaging. Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K)].
Horizon:        t_{P+1}                       t_{P+2}                       t_{P+3}                       t_{P+4}
P      Fusion   BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K
P = 1  ATTN  75.84 % 77.01 % 76.73 % 72.16 % 74.28 % 73.64 % 68.07 % 69.54 % 67.91 % 62.08 % 64.10 % 63.59 %
( ± 0.068 ) ( ± 0.082 ) ( ± 0.075 ) ( ± 0.094 ) ( ± 0.088 ) ( ± 0.059 ) ( ± 0.076 ) ( ± 0.081 ) ( ± 0.069 ) ( ± 0.081 ) ( ± 0.104 ) ( ± 0.095 )
MEAN 73.49 % 75.31 % 74.98 % 70.24 % 71.87 % 71.12 % 64.31 % 65.68 % 64.93 % 58.53 % 59.67 % 59.02 %
( ± 0.092 ) ( ± 0.104 ) ( ± 0.112 ) ( ± 0.141 ) ( ± 0.098 ) ( ± 0.132 ) ( ± 0.107 ) ( ± 0.113 ) ( ± 0.127 ) ( ± 0.106 ) ( ± 0.089 ) ( ± 0.117 )
P = 2  ATTN  84.95 % 86.61 % 85.39 % 79.28 % 80.37 % 77.87 % 75.17 % 77.23 % 76.49 % 71.58 % 73.14 % 72.01 %
( ± 0.048 ) ( ± 0.066 ) ( ± 0.081 ) ( ± 0.057 ) ( ± 0.062 ) ( ± 0.074 ) ( ± 0.088 ) ( ± 0.053 ) ( ± 0.071 ) ( ± 0.099 ) ( ± 0.071 ) ( ± 0.057 )
MEAN 80.26 % 81.17 % 80.79 % 75.07 % 76.91 % 75.67 % 71.31 % 73.08 % 72.56 % 65.22 % 67.08 % 66.14 %
( ± 0.095 ) ( ± 0.108 ) ( ± 0.102 ) ( ± 0.088 ) ( ± 0.090 ) ( ± 0.112 ) ( ± 0.131 ) ( ± 0.108 ) ( ± 0.143 ) ( ± 0.109 ) ( ± 0.135 ) ( ± 0.116 )
P = 3  ATTN  91.89 % 93.13 % 92.81 % 88.11 % 89.69 % 88.81 % 84.35 % 87.06 % 85.92 % 79.41 % 82.02 % 80.19 %
( ± 0.052 ) ( ± 0.067 ) ( ± 0.058 ) ( ± 0.078 ) ( ± 0.095 ) ( ± 0.072 ) ( ± 0.088 ) ( ± 0.064 ) ( ± 0.083 ) ( ± 0.093 ) ( ± 0.069 ) ( ± 0.101 )
MEAN 86.91 % 88.63 % 87.38 % 82.92 % 83.16 % 81.33 % 79.58 % 82.42 % 81.36 % 75.14 % 75.67 % 74.94 %
( ± 0.088 ) ( ± 0.092 ) ( ± 0.084 ) ( ± 0.091 ) ( ± 0.079 ) ( ± 0.101 ) ( ± 0.099 ) ( ± 0.073 ) ( ± 0.082 ) ( ± 0.095 ) ( ± 0.101 ) ( ± 0.097 )
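The ATTN and MEAN variants compared in Table 9 differ only in how per-visit embeddings are fused over the historical window. The following PyTorch sketch contrasts a trainable attention-weighted combination with plain averaging; the tensor shapes and module names are illustrative assumptions, not the paper’s implementation.

```python
# Minimal PyTorch sketch of the two temporal-fusion schemes compared in Table 9:
# a trainable attention-weighted combination of per-visit embeddings (ATTN)
# versus plain averaging over the historical window (MEAN). Shapes and module
# names are illustrative, not taken from the paper's implementation.
import torch
import torch.nn as nn

class AttentionTemporalFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scalar relevance score per visit

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_visits, dim) embeddings at the historical visits
        alpha = torch.softmax(self.score(h), dim=1)   # (batch, n_visits, 1)
        return (alpha * h).sum(dim=1)                 # (batch, dim)

def mean_temporal_fusion(h: torch.Tensor) -> torch.Tensor:
    # Simple averaging baseline (MEAN rows in Table 9)
    return h.mean(dim=1)

# Example usage with random embeddings for a window of P + 1 = 4 visits
h = torch.randn(8, 4, 128)
fused_attn = AttentionTemporalFusion(128)(h)   # (8, 128)
fused_mean = mean_temporal_fusion(h)           # (8, 128)
```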
Table 10. Summary of KOA classification results of the proposed method under historical windows of varying size (P). We examine the effect of using solely the features learned by ShapeNet vs. incorporating additional demographic data of {Age, BMI, Gender}. Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K)].
Horizon:                 t_{P+1}                       t_{P+2}                       t_{P+3}                       t_{P+4}
P      Features          BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K
P = 0  ShapeNet + Dem  67.49 % 65.19 % 65.02 % 60.24 % 62.19 % 61.75 % 58.02 % 60.13 % 59.88 % 54.19 % 56.01 % 55.41 %
( ± 0.089 ) ( ± 0.075 ) ( ± 0.093 ) ( ± 0.105 ) ( ± 0.088 ) ( ± 0.164 ) ( ± 0.079 ) ( ± 0.112 ) ( ± 0.069 ) ( ± 0.122 ) ( ± 0.096 ) ( ± 0.080 )
ShapeNet 65.38 % 63.72 % 62.83 % 57.98 % 59.11 % 58.64 % 55.47 % 56.06 % 55.24 % 50.63 % 51.12 % 50.96 %
( ± 0.112 ) ( ± 0.108 ) ( ± 0.137 ) ( ± 0.165 ) ( ± 0.142 ) ( ± 0.119 ) ( ± 0.137 ) ( ± 0.121 ) ( ± 0.106 ) ( ± 0.113 ) ( ± 0.136 ) ( ± 0.120 )
P = 1  ShapeNet + Dem  75.84 % 77.01 % 76.73 % 72.16 % 74.28 % 73.64 % 68.07 % 69.54 % 67.91 % 62.08 % 64.10 % 63.59 %
( ± 0.068 ) ( ± 0.082 ) ( ± 0.075 ) ( ± 0.094 ) ( ± 0.088 ) ( ± 0.059 ) ( ± 0.076 ) ( ± 0.081 ) ( ± 0.069 ) ( ± 0.081 ) ( ± 0.104 ) ( ± 0.095 )
ShapeNet 73.71 % 74.98 % 74.19 % 69.91 % 71.57 % 70.45 % 65.19 % 66.02 % 65.76 % 58.36 % 59.17 % 59.03 %
( ± 0.095 ) ( ± 0.108 ) ( ± 0.111 ) ( ± 0.089 ) ( ± 0.113 ) ( ± 0.099 ) ( ± 0.127 ) ( ± 0.099 ) ( ± 0.101 ) ( ± 0.118 ) ( ± 0.131 ) ( ± 0.109 )
P = 2  ShapeNet + Dem  84.95 % 86.61 % 85.39 % 79.28 % 80.37 % 77.87 % 75.17 % 77.23 % 76.49 % 71.58 % 73.14 % 72.01 %
( ± 0.048 ) ( ± 0.066 ) ( ± 0.081 ) ( ± 0.057 ) ( ± 0.062 ) ( ± 0.074 ) ( ± 0.088 ) ( ± 0.053 ) ( ± 0.071 ) ( ± 0.099 ) ( ± 0.071 ) ( ± 0.057 )
ShapeNet 82.97 % 84.39 % 83.82 % 75.88 % 76.46 % 76.11 % 70.32 % 73.06 % 71.19 % 67.02 % 68.13 % 67.74 %
( ± 0.089 ) ( ± 0.101 ) ( ± 0.092 ) ( ± 0.078 ) ( ± 0.097 ) ( ± 0.112 ) ( ± 0.088 ) ( ± 0.090 ) ( ± 0.102 ) ( ± 0.076 ) ( ± 0.089 ) ( ± 0.081 )
P = 3  ShapeNet + Dem  91.89 % 93.13 % 92.81 % 88.11 % 89.69 % 88.81 % 84.35 % 87.06 % 85.92 % 79.41 % 82.02 % 80.19 %
( ± 0.052 ) ( ± 0.067 ) ( ± 0.058 ) ( ± 0.078 ) ( ± 0.095 ) ( ± 0.072 ) ( ± 0.088 ) ( ± 0.064 ) ( ± 0.083 ) ( ± 0.093 ) ( ± 0.069 ) ( ± 0.101 )
ShapeNet 88.34 % 90.17 % 89.61 % 85.87 % 87.14 % 86.42 % 80.38 % 82.19 % 81.33 % 74.85 % 76.17 % 75.49 %
( ± 0.076 ) ( ± 0.092 ) ( ± 0.079 ) ( ± 0.102 ) ( ± 0.133 ) ( ± 0.089 ) ( ± 0.114 ) ( ± 0.105 ) ( ± 0.096 ) ( ± 0.118 ) ( ± 0.131 ) ( ± 0.119 )
Table 11. Summary of KOA classification results under historical windows of varying size (P). We examine the effect of using the automatically learned volumetric features extracted by C_Shape.Net vs. computing the Cartilage Damage Index (CDI) at specified points in the MRI scans. Both cases incorporate additional demographic features of {Age, BMI, Gender}. Metrics: [Balanced Accuracy (BA), F1-weighted measure (F1), Cohen’s Kappa (K)].
Horizon:                 t_{P+1}                       t_{P+2}                       t_{P+3}                       t_{P+4}
P      Features          BA        F1        K         BA        F1        K         BA        F1        K         BA        F1        K
P = 0  ShapeNet + Dem  67.49 % 65.19 % 65.02 % 60.24 % 62.19 % 61.75 % 58.02 % 60.13 % 59.88 % 54.19 % 56.01 % 55.41 %
( ± 0.089 ) ( ± 0.075 ) ( ± 0.093 ) ( ± 0.105 ) ( ± 0.088 ) ( ± 0.164 ) ( ± 0.079 ) ( ± 0.112 ) ( ± 0.069 ) ( ± 0.122 ) ( ± 0.096 ) ( ± 0.080 )
CDI + Dem 64.28 % 62.27 % 62.85 % 55.73 % 56.08 % 55.47 % 53.19 % 53.94 % 52.71 % 45.33 % 46.12 % 46.09 %
( ± 0.199 ) ( ± 0.223 ) ( ± 0.176 ) ( ± 0.208 ) ( ± 0.264 ) ( ± 0.188 ) ( ± 0.217 ) ( ± 0.232 ) ( ± 0.186 ) ( ± 0.235 ) ( ± 0.250 ) ( ± 0.193 )
P = 1  ShapeNet + Dem  75.84 % 77.01 % 76.73 % 72.16 % 74.28 % 73.64 % 68.07 % 69.54 % 67.91 % 62.08 % 64.10 % 63.59 %
( ± 0.068 ) ( ± 0.082 ) ( ± 0.075 ) ( ± 0.094 ) ( ± 0.088 ) ( ± 0.059 ) ( ± 0.076 ) ( ± 0.081 ) ( ± 0.069 ) ( ± 0.081 ) ( ± 0.104 ) ( ± 0.095 )
CDI + Dem 73.47 % 74.02 % 73.38 % 67.92 % 69.14 % 68.53 % 60.64 % 61.79 % 61.27 % 54.18 % 55.07 % 54.81 %
( ± 0.151 ) ( ± 0.186 ) ( ± 0.154 ) ( ± 0.192 ) ( ± 0.218 ) ( ± 0.165 ) ( ± 0.202 ) ( ± 0.229 ) ( ± 0.171 ) ( ± 0.194 ) ( ± 0.231 ) ( ± 0.187 )
P = 2  ShapeNet + Dem  84.95 % 86.61 % 85.39 % 79.28 % 80.37 % 77.87 % 75.17 % 77.23 % 76.49 % 71.58 % 73.14 % 72.01 %
( ± 0.048 ) ( ± 0.066 ) ( ± 0.081 ) ( ± 0.057 ) ( ± 0.062 ) ( ± 0.074 ) ( ± 0.088 ) ( ± 0.053 ) ( ± 0.071 ) ( ± 0.099 ) ( ± 0.071 ) ( ± 0.057 )
CDI + Dem 81.37 % 82.15 % 82.03 % 74.21 % 75.22 % 73.96 % 70.18 % 71.42 % 71.29 % 64.55 % 66.37 % 64.93 %
( ± 0.139 ) ( ± 0.166 ) ( ± 0.153 ) ( ± 0.178 ) ( ± 0.199 ) ( ± 0.133 ) ( ± 0.184 ) ( ± 0.205 ) ( ± 0.149 ) ( ± 0.180 ) ( ± 0.193 ) ( ± 0.142 )
P = 3  ShapeNet + Dem  91.89 % 93.13 % 92.81 % 88.11 % 89.69 % 88.81 % 84.35 % 87.06 % 85.92 % 79.41 % 82.02 % 80.19 %
( ± 0.052 ) ( ± 0.067 ) ( ± 0.058 ) ( ± 0.078 ) ( ± 0.095 ) ( ± 0.072 ) ( ± 0.088 ) ( ± 0.064 ) ( ± 0.083 ) ( ± 0.093 ) ( ± 0.069 ) ( ± 0.101 )
CDI + Dem 88.36 % 89.34 % 88.92 % 84.41 % 85.03 % 84.67 % 80.29 % 82.53 % 81.16 % 73.86 % 74.45 % 74.07 %
( ± 0.098 ) ( ± 0.104 ) ( ± 0.117 ) ( ± 0.091 ) ( ± 0.133 ) ( ± 0.111 ) ( ± 0.128 ) ( ± 0.106 ) ( ± 0.133 ) ( ± 0.147 ) ( ± 0.116 ) ( ± 0.154 )
Table 12. Future KL-grade predictions under varying historical depths against the ground-truth values (Predicted/Ground Truth).
           +0     +1     +2     +3     +4      +5      +6      +7
P = 0      –      –      –      –      2/2     2/3     2/3     2/4
P = 1      –      –      –      –      2/2     2/3     3/3     3/4
P = 2      –      –      –      –      2/2     3/3     3/3     3/4
P = 3      –      –      –      –      2/2     3/3     3/3     4/4
Table 13. Comparative performance of KOA prediction methods with respect to (1) static KL-grade classification, (2) longitudinal KL-grade prediction, and (3) KL-grade progression.
Method | Prediction Type | Data | Source | Add. Inputs | Model | Results
Kishore et al. [29] | KL Grading | OAI | X-ray | – | 12 DL models | ACC = 98.36%
Yong et al. [27] | KL Grading | OAI | X-ray | – | {VGG, ResNet, DenseNet} | ACC_macro = 88.09%
Gorriz et al. [52] | KL Grading | OAI, MOST | X-ray | – | VGG + Attn. Module | ACC = 64.3%
Chen et al. [28] | KL Grading | OAI (baseline) | X-ray | – | {ResNet, VGG, DenseNet} + novel ordinal loss | ACC = 69.7%
Zhang et al. [25] | KL Grading | OAI (baseline) | X-ray | – | ResNet + CBAM | ACC = 74.81%
Tiulpin et al. [24] | KL Grading | OAI (train), MOST (test) | X-ray | – | Deep Siamese CNN | K = 82%
Ashamari et al. [33] | KL Grading | Kaggle | X-ray | – | {Seq. CNN, VGG, ResNet-50} | ACC = 92.17%
Proposed | KL Grading | OAI | MRI | – | C_Shape.Net + SptHGCN | BA = 95.98%, K = 97.92%
Hu et al. [36] | Long. KL Prediction | OAI | X-ray | – | Adversarial NN | Sens = 0.639 (+1), 0.632 (+2), 0.618 (+3), 0.602 (+4)
Proposed | Long. KL Prediction | OAI | MRI | – | C_Shape.Net + SptHGCN + Transf. | BA = 95.98%, K = 97.92%
Panfilov et al. [37] | Long. Progr. Prediction | OAI | X-ray, MRI (multi-modal) | +Demographics | CNN + Transf. | AUC = 0.76 (+1), 0.72 (+2), 0.70 (+3), 0.74 (+4)
Halilaj et al. [35] | Long. Progr. Prediction (Progr. vs. No-Progr.) | OAI | X-ray | Pain scores (WOMAC) | LASSO + Clustering | AUC = 0.86
Guan et al. [32] | JSL Progr. (baseline → +48) | OAI | MRI | +{Demographics, Injury hist., Tibiofemoral angle, etc.} | Deep Learning | AUC = 0.85
Du et al. [34] | KL Progr. Prediction (baseline → +2) | OAI | MRI | CDI at 36 locations | Comparison of ML methods | AUC = 0.76, F = 0.714
Ahmad et al. [33] | Long. Progr. Prediction | OAI | X-ray, MRI (multi-modal) | +Demographics | CNN + Transf. | AUC = 0.76 (+1), 0.72 (+2), 0.70 (+3), 0.74 (+4)
Tiulpin et al. [51] | No-Progr. vs. Progr. (0 → +5) | OAI (train), MOST (test) | X-ray | +{Demographics, WOMAC, Injury hist., KL} | DenseNet | AUC = 0.834, Sens. = 0.78, Spec. = 0.77
Pedoia et al. [53] | No-OA (≤1) vs. OA (>1) | OAI | MRI | {Demographics, WOMAC} | DenseNet | AUC = 0.83, Sens. = 0.77, Spec. = 0.78
Alexopoulos et al. [54] | NO_OA vs. EARLY_OA | OAI | MRI | +Demographics | {ResNet, DenseNet, CVAE} | AUC = 0.65
Schiratti et al. [55] | JSN Progr. (+1) ahead | OAI | MRI | – | EfficientNet-B0, DenseNet, CVAE | AUC = 0.67
Proposed | KL Progression Incidence | OAI | MRI | +Demographics | C_Shape.Net + ST_HGCN + Transf. | AUC = 0.94 (+1), 0.89 (+2), 0.85 (+3), 0.80 (+4); OA = 85.94%
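The progression-incidence results in the last row of Table 13 are reported as per-follow-up AUC values. A minimal sketch of this evaluation with scikit-learn is shown below; the labels and scores are random placeholders, not study data.

```python
# Minimal sketch of the per-follow-up AUC computation for progression incidence
# (last row of Table 13). Labels and scores below are random placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
follow_ups = ["+1", "+2", "+3", "+4"]

for visit in follow_ups:
    y_true = rng.integers(0, 2, size=200)                                   # 1 = progressor at this visit
    y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)  # predicted progression probability
    auc = roc_auc_score(y_true, y_score)
    print(f"AUC ({visit}) = {auc:.2f}")
```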