This section evaluates our method through experiments comparing it to existing models, analyzing the neighbor penalty’s impact on misclassification, and assessing the embedding network’s effect on performance.
4.1. Data and Setting
We evaluated our model using several publicly available datasets commonly used in ordinal regression research: the Breast Ultrasound Images (BUSI) dataset [29], the Oil Palm Fruit Quality (OPALM) dataset [30], the Morphological Face Database II (MORPH-2) [31], the Asian Face Database (AFAD) [3], and the Cross-Age Celebrity Dataset (CACD) [32]. The BUSI dataset (780 images) captures three stages of breast health and is useful for evaluating medical imaging tasks. OPALM (4,728 images) categorizes oil palm fruits into five ripeness levels and introduces real-world variability such as lighting conditions. For facial analysis, MORPH-2, AFAD, and CACD provide large-scale datasets with comprehensive age annotations, serving as benchmarks for fine-grained ordinal distinctions.
To ensure consistency, all images were resized to a common resolution. During training, data augmentation included random cropping and horizontal flipping; evaluation used center cropping. These preprocessing steps enhanced generalization across datasets.
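For concreteness, the preprocessing pipeline above can be expressed with standard torchvision transforms. The resize and crop dimensions in this sketch are illustrative placeholders rather than the exact values used in our experiments.

```python
# Illustrative preprocessing pipeline (sketch). The resize/crop sizes below
# are assumed placeholders, not the values used in the experiments.
from torchvision import transforms

RESIZE = 128   # assumed resize resolution
CROP = 120     # assumed crop size

train_transform = transforms.Compose([
    transforms.Resize((RESIZE, RESIZE)),   # resize all images to a common size
    transforms.RandomCrop((CROP, CROP)),   # random cropping for augmentation
    transforms.RandomHorizontalFlip(),     # horizontal flipping for augmentation
    transforms.ToTensor(),
])

eval_transform = transforms.Compose([
    transforms.Resize((RESIZE, RESIZE)),
    transforms.CenterCrop((CROP, CROP)),   # center cropping at evaluation time
    transforms.ToTensor(),
])
```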
All models used ResNet-34 as the backbone to ensure a fair comparison, as detailed in Table 1. Our proposed method, GNOOR (Global and Neighborhood Order-Ordinal Regression), was built on this architecture and extends CORAL-CNN by incorporating a neighbor-aware penalty term and a pretrained embedding network to improve both global and local ordinal consistency. We selected three baseline models to represent key directions in ordinal regression research: (1) CE-CNN, which applies standard cross-entropy loss and serves as a baseline classifier that ignores ordinal constraints; (2) OR-CNN, which implements ordinal classification as a series of binary comparisons, following the formulation in [3]; and (3) CORAL-CNN, which applies rank-consistent ordinal regression with a theoretically grounded output layer [7]. These baselines reflect the progression from naive classification to structured ordinal regression, offering a robust framework for comparative analysis.
The embedding network used a ResNet-34 architecture and was trained independently to learn a 512-dimensional feature representation. Once trained, the embedding network was frozen and served as a fixed feature extractor. Its training ran for 100 epochs using the Adam optimizer with a fixed learning rate and weight decay. To ensure stable optimization, a batch size of 256 was used for the facial datasets, while the smaller BUSI and OPALM datasets used batch sizes of 16 and 32, respectively.
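A minimal PyTorch sketch of this setup is given below: a ResNet-34 whose classification head is replaced by an identity mapping yields the 512-dimensional feature, and the pretrained weights are then frozen. Variable names are illustrative, and the triplet-based pretraining loop is omitted.

```python
import torch.nn as nn
from torchvision import models

# Sketch of the frozen embedding extractor (names are illustrative).
# ResNet-34's penultimate layer already yields a 512-dimensional feature,
# so the classification head is replaced with an identity mapping.
embedding_net = models.resnet34(weights=None)
embedding_net.fc = nn.Identity()           # output: 512-d feature vector

# ... pretrain embedding_net independently (e.g., with a triplet loss) ...

# Freeze the pretrained network so it acts as a fixed feature extractor.
for p in embedding_net.parameters():
    p.requires_grad = False
embedding_net.eval()
```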
The main network, which also followed the ResNet-34 architecture, was designed for ordinal regression and produced a separate 256-dimensional feature vector. The final representation was formed by concatenating the embedding-network and main-network features, and this combined vector was passed through a fully connected layer to generate scalar logits. A trainable bias term was added before applying the sigmoid activation function to estimate ordinal probabilities. The main network was trained for 200 epochs using the Adam optimizer, with batch size settings consistent with the embedding network to maintain uniform training dynamics. Model performance was evaluated using mean absolute error (MAE) and root mean squared error (RMSE) on independent test sets, with results reported as the best across three random seed initializations. All experiments were conducted on a Windows 11 Pro system with an Intel Core i7-12700 CPU, 96 GB RAM, an NVIDIA RTX 4090 GPU, Python 3.9.19, CUDA 12.1, and PyTorch 2.4.0.
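The following sketch illustrates one way to wire the concatenated representation and the ordinal output layer, using a CORAL-style head with a shared weight vector and per-threshold bias terms; the class name, default dimensions, and the exact head structure are our illustrative reading of the description above rather than the verbatim implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class GNOORHead(nn.Module):
    """Illustrative sketch of the main network with a CORAL-style output layer.

    The frozen embedding network supplies a global ordinal feature, the main
    ResNet-34 branch supplies a task-specific feature, and their concatenation
    feeds a single shared linear unit plus independent per-threshold biases.
    """

    def __init__(self, embedding_net, num_classes, main_dim=256, emb_dim=512):
        super().__init__()
        self.embedding_net = embedding_net            # pretrained and frozen
        backbone = models.resnet34(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, main_dim)
        self.main_net = backbone
        # Shared weight vector producing one scalar logit per sample ...
        self.fc = nn.Linear(main_dim + emb_dim, 1, bias=False)
        # ... plus K-1 trainable bias terms, one per ordinal threshold.
        self.biases = nn.Parameter(torch.zeros(num_classes - 1))

    def forward(self, x):
        with torch.no_grad():                         # embedding branch is frozen
            z_emb = self.embedding_net(x)
        z_main = self.main_net(x)
        z = torch.cat([z_main, z_emb], dim=1)         # concatenated representation
        logits = self.fc(z) + self.biases             # broadcast over thresholds
        return torch.sigmoid(logits)                  # P(y > k) for each threshold k
```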
4.2. Results and Discussion
Ordinal regression bridges classification and regression, so it can be evaluated with both classification and regression metrics. Classification metrics such as accuracy, precision, recall, and F1-score assess the model's ability to capture ordinal relationships and minimize misclassification: high precision indicates alignment with the true ordinal labels, high recall reflects effective identification within the ordered framework, and the F1-score balances the two. Regression metrics such as MAE and RMSE quantify the magnitude of prediction errors, so smaller values indicate closer alignment with the true labels and errors between neighboring classes are penalized less severely than distant ones.
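As a concrete example, when the network outputs threshold probabilities P(y > k), the predicted rank can be decoded by counting the thresholds whose probability exceeds 0.5, after which MAE and RMSE follow directly; the helper below is a sketch under that assumption, not our exact evaluation code.

```python
import torch

def predict_rank(probas: torch.Tensor) -> torch.Tensor:
    """Decode cumulative threshold probabilities P(y > k) into rank labels."""
    return (probas > 0.5).sum(dim=1)

def mae_rmse(probas: torch.Tensor, labels: torch.Tensor):
    """Mean absolute error and root mean squared error of rank predictions."""
    preds = predict_rank(probas).float()
    labels = labels.float()
    mae = (preds - labels).abs().mean()
    rmse = torch.sqrt(((preds - labels) ** 2).mean())
    return mae.item(), rmse.item()
```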
As shown in
Table 2, our method (GNOOR) outperformed the existing models (CE-CNN, OR-CNN, and CORAL-CNN) on the BUSI dataset across both classification and regression metrics. This highlights its strength in leveraging ordinal structure for medical imaging, where errors such as misclassifying malignant tumors can have critical consequences. On the OPALM dataset, CE-CNN slightly outperformed our method, likely due to the dataset's circular labeling pattern: fruit maturity increases from Level 0 to Level 2 and then declines toward Level 4, which challenges models that assume a strictly linear order. Nonetheless, our method preserved the ordinal structure more effectively, as shown in the confusion matrices (
Figure 4 and
Figure 5), where errors clustered around neighboring classes, indicating better ordinal consistency.
Preserving adjacent ordinal relationships is critical in real-world applications where small-scale misclassifications can lead to significant consequences. In medical imaging, for example, misclassifying a tumor’s malignancy level as Grade 2 instead of Grade 0 is a severe mistake, but frequent misclassifications between adjacent Grades (e.g., 1 vs. 2, 2 vs. 3) can still disrupt clinical decision-making, making diagnoses inconsistent and unreliable. Similarly, in agriculture, repeatedly predicting slightly unripe fruit as ripe can lead to systematic harvesting errors, reducing product quality and market value. While many existing ordinal loss functions emphasize penalizing distant errors, our approach complements them by enforcing local ranking consistency, ensuring that predictions do not fluctuate unpredictably between adjacent ranks. By explicitly incorporating neighbor-aware penalties, our method stabilizes ordinal decision boundaries, reducing the risk of adjacent inconsistencies while preserving global order.
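Since the exact neighbor penalty is defined earlier in the paper, the snippet below is only a schematic illustration of the idea: a CORAL-style binary cross-entropy in which the threshold tasks closest to the true rank receive extra weight, so that confusions with adjacent ranks are penalized more heavily. The function names, the weighting scheme, and the hyperparameter lam are assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def coral_targets(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Extended binary labels: target[i, k] = 1 if labels[i] > k, else 0."""
    thresholds = torch.arange(num_classes - 1, device=labels.device)
    return (labels.unsqueeze(1) > thresholds).float()

def neighbor_weighted_loss(probas, labels, num_classes, lam=1.0):
    """Schematic neighbor-aware ordinal loss (illustrative, not the paper's exact form).

    Binary cross-entropy over the K-1 threshold tasks, with the thresholds
    within one step of the true rank given extra weight lam, so that
    confusions with immediately neighboring categories are penalized more.
    """
    targets = coral_targets(labels, num_classes)          # shape (B, K-1)
    bce = F.binary_cross_entropy(probas, targets, reduction="none")

    thresholds = torch.arange(num_classes - 1, device=labels.device)
    is_neighbor = (thresholds.unsqueeze(0) - labels.unsqueeze(1)).abs() <= 1
    weights = 1.0 + lam * is_neighbor.float()             # upweight adjacent thresholds

    return (weights * bce).sum(dim=1).mean()
```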
Figure 4 and
Figure 5 show confusion matrices for the BUSI and OPALM datasets, illustrating misclassification distributions across ordinal categories. These visualizations highlight the effectiveness of our method, which integrates a neighbor penalty term into the loss function to address challenges in ordinal regression. In
Figure 4 (BUSI dataset), two error types are emphasized: misclassifications involving the first (nearest) neighbor, marked in green, and those involving the second (distant) neighbor, in red. For example, predicting a label of 2 instead of 0 is more severe (red) than predicting it as 1 (green). Traditional models like CE-CNN ignore ordinal structure, penalizing all errors equally, regardless of severity. OR-CNN improves this using multiple binary classifiers but lacks consistency enforcement. CORAL-CNN adds consistency loss to reduce distant errors but often overlooks neighboring misclassifications. Our method explicitly penalizes neighboring errors, significantly reducing them while maintaining low distant error rates. This balance is especially critical in medical contexts like BUSI, where errors such as misclassifying a malignant tumor as normal can have serious consequences.
Figure 5 (OPALM dataset) offers a finer view of errors, distinguishing first (green), second (yellow), third (orange), and fourth (red) neighboring categories. The dataset’s circular ordinal structure—from immature to ripe to decayed—poses unique challenges, increasing the chance of neighboring errors. Traditional models prioritize reducing distant errors but often struggle with adjacent ones. By incorporating a neighbor penalty, our method effectively reduces errors across both near and distant categories. Our approach aligns predictions with the data’s ordinal nature, producing more reliable outcomes even in non-linear structures. This robustness illustrates the method’s broader applicability to complex ordinal regression tasks. It is important to note that reducing misclassification errors for both adjacent and distant labels simultaneously is a challenging task.
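The neighbor-distance view used in these figures can be reproduced by grouping off-diagonal confusion-matrix entries by ordinal distance; the helper below is our own illustrative utility, not the plotting code behind the figures.

```python
import numpy as np

def error_breakdown_by_distance(conf_matrix: np.ndarray) -> dict:
    """Group misclassifications by ordinal distance |predicted - true|.

    conf_matrix[i, j] counts samples with true label i predicted as j, so
    distance 1 corresponds to first-neighbor errors, distance 2 to
    second-neighbor errors, and so on.
    """
    num_classes = conf_matrix.shape[0]
    counts = {}
    for i in range(num_classes):
        for j in range(num_classes):
            d = abs(i - j)
            if d > 0:
                counts[d] = counts.get(d, 0) + int(conf_matrix[i, j])
    return counts
```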
Table 3 compares age prediction errors, measured by mean absolute error (MAE) and root mean squared error (RMSE), across our proposed method and three widely used ordinal regression models adapted from [
7]. To ensure a fair and controlled comparison, all models were trained using the same backbone architecture (ResNet-34), identical training hyperparameters (e.g., learning rate, optimizer, batch size, number of epochs), and consistent data preprocessing procedures. We also initialized all models with the same random seeds (0, 1, and 2) across three runs. This design choice eliminates confounding factors, allowing us to isolate the impact of the ordinal modeling strategy and loss function used in each method.
Our method consistently achieved lower MAE and RMSE values compared to CE-CNN, OR-CNN, and CORAL-CNN, demonstrating its superior performance. CE-CNN, which treats ordinal regression as a standard classification task, disregards the ordinal structure of labels, leading to less accurate, rank-inconsistent predictions. OR-CNN incorporates ordinal information through binary decomposition but lacks explicit mechanisms to enforce rank consistency, often resulting in violations of ordinal order. CORAL-CNN uses a cumulative link formulation to enforce global consistency but may overemphasize this constraint, thereby increasing errors among adjacent categories. In contrast, our proposed model integrates both global ordinal structure and local neighborhood consistency, yielding more reliable and semantically coherent predictions.
Our method addresses these limitations by integrating a neighbor penalty term and metric learning. The neighbor penalty explicitly reduces adjacent-category errors, preserving ordinal relationships and ensuring rank consistency. Metric learning shapes the embedding network so that samples are aligned in a latent space reflecting their ordinal relationships, minimizing confusion with distant categories while emphasizing local similarity. This synergy between local feature refinement and global consistency enables the model to capture nuanced patterns in the data while maintaining robust generalization. Consequently, our approach outperformed the baseline models in both accuracy and error reduction, underscoring its effectiveness for ordinal regression tasks.
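As an illustration of the metric-learning component, a standard triplet objective can be used to pretrain the embedding network, pulling samples of the same rank together and pushing samples of other ranks apart. The margin value and the sampling of anchor/positive/negative batches in this sketch are illustrative assumptions rather than the exact training configuration.

```python
import torch.nn as nn

# Generic triplet objective for the embedding network (sketch; the margin value
# and the triplet sampling strategy are illustrative assumptions).
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def embedding_step(embedding_net, anchor, positive, negative, optimizer):
    """One pretraining step: pull same-rank samples together, push other ranks apart."""
    optimizer.zero_grad()
    loss = triplet_loss(
        embedding_net(anchor),     # anchor image batch
        embedding_net(positive),   # samples sharing the anchor's ordinal label
        embedding_net(negative),   # samples from a different (ideally distant) rank
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```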
To further validate the superiority of our method, we conducted paired Student’s t-tests comparing the MAE performance of GNOOR against CE-CNN, OR-CNN, and CORAL-CNN across three widely used benchmark datasets: MORPH-2, AFAD, and CACD. The detailed results are presented in
Table 4. For both CE-CNN and OR-CNN, the differences were highly significant across all datasets, with p-values well below the 0.05 threshold, confirming that GNOOR consistently outperforms these baselines. When compared to CORAL-CNN, GNOOR achieved statistically significant improvements on MORPH-2 (
t = 2.77,
p = 0.0121) and AFAD (
t = 2.62,
p = 0.0170). On the CACD dataset, although the improvement was not statistically significant (
t = 1.40,
p = 0.1775), GNOOR still demonstrated a lower average MAE. These statistical findings provide strong evidence of the robustness and effectiveness of GNOOR in improving ordinal regression performance across diverse datasets.
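For reproducibility, a paired Student's t-test of this kind can be computed with scipy.stats.ttest_rel on matched absolute errors; the helper below is a generic sketch of the procedure rather than our exact evaluation script.

```python
import numpy as np
from scipy.stats import ttest_rel

def paired_mae_test(abs_err_gnoor: np.ndarray, abs_err_baseline: np.ndarray):
    """Paired Student's t-test on matched absolute errors (sketch).

    Both arrays must contain errors for the same test samples (or the same
    runs), so the comparison is paired rather than independent.
    """
    t_stat, p_value = ttest_rel(abs_err_gnoor, abs_err_baseline)
    return t_stat, p_value
```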
To evaluate the contribution of each component in our hybrid model, we conducted an ablation study in which the neighbor penalty loss and the embedding network were removed in turn. As shown in
Figure 6, omitting the neighbor penalty loss resulted in a larger performance decline, with higher MAE and RMSE values. This emphasizes the importance of the neighbor penalty in reducing adjacent-category misclassification errors and preserving ordinal consistency. The embedding network, which captures global ordinal relationships by embedding samples into a structured latent space, complements these local feature refinements. While it improves performance, it can risk overfitting, especially on datasets with limited samples. Our method improves on the CORAL approach by addressing neighbor misclassification errors through the penalty loss, while the embedding network ensures robust learning of global ordinal features. This synergy enables our model to achieve superior ordinal regression performance.
The computational efficiency and parameter complexity of the models are summarized in
Table 5. As expected, GNOOR exhibits the highest average training time per epoch, primarily due to its more complex architecture and the inclusion of additional ordinal-embedding components. In comparison, CE-CNN, which employs a conventional CNN with a single classifier trained using categorical cross-entropy loss, achieves the shortest training time, owing to its architectural simplicity and the absence of mechanisms that explicitly preserve ordinal relationships. OR-CNN increases the computational cost by introducing multiple binary classifiers to model ordinal information, while CORAL-CNN further adds consistency-enforcing constraints to the loss function, resulting in a slightly higher computational burden than OR-CNN. The baseline models have comparable parameter counts, ranging between 21.4 million and 21.6 million, indicating similar model complexity despite minor architectural variations.
Although GNOOR is more computationally demanding, its total parameter count remains within a comparable range at approximately 22.1 million. This suggests that the longer training time arises from the structure of its computation graph, particularly the use of the embedding network, rather than from the number of trainable parameters. During the main training phase, the embedding network is frozen and thus does not contribute to gradient computations, making GNOOR's training time competitive when the embedding phase is excluded. However, the embedding network requires separate pretraining, as indicated in the final column of
Table 5. This pretraining phase incurs a substantial cost due to the triplet loss and the repeated evaluations of the triplet network. Despite the additional effort, the embedding module comprises 21.5 million parameters that are reusable and do not require retraining for downstream tasks. This design trades some training efficiency for improved ordinal modeling performance, which is advantageous in scenarios where classification accuracy and ordinal consistency matter more than training speed.
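Parameter counts such as those reported in Table 5 can be obtained by summing tensor sizes, optionally separating frozen embedding parameters from trainable ones; the helper below is a minimal sketch.

```python
def count_parameters(model) -> dict:
    """Report total, trainable, and frozen parameter counts for a PyTorch module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {
        "total": total,
        "trainable": trainable,
        "frozen": total - trainable,   # e.g., the pretrained embedding branch
    }
```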