Article

biLorentzFM: Hyperbolic Multi-Objective Deep Learning for Reciprocal Recommendation

by Kübra Karacan Uyar 1,* and Yücel Batu Salman 2
1 Kariyer.Net R&D Center, Department of Technology and Innovation, Istanbul 34768, Turkey
2 Department of Software Engineering, Bahçeşehir University, Istanbul 34420, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(22), 12340; https://doi.org/10.3390/app152212340
Submission received: 11 September 2025 / Revised: 12 November 2025 / Accepted: 14 November 2025 / Published: 20 November 2025

Featured Application

This work applies to reciprocal recommendation domains requiring bilateral preference satisfaction, including recruitment platforms, dating applications, professional networking, and educational matching systems, where hierarchical structures are prevalent. Deployment in sensitive domains such as employment should incorporate fairness evaluation and bias mitigation strategies to prevent discriminatory outcomes.

Abstract

Reciprocal recommendation requires satisfying preferences on both sides of a match, which differs from standard one-sided settings and often involves hierarchical structure (e.g., skills, seniority, education). We present biLorentzFM, a multi-objective framework that integrates hyperbolic geometry into factorization machine architectures using Lorentz embeddings with learnable curvature and manifold-aware optimization. The approach addresses whether a geometric structure aligned with hierarchical relationships can improve reciprocal matching without requiring major architectural changes. On a large-scale recruitment dataset from Kariyer.Net (1,150,302 interactions, 229,805 candidates), the model achieves candidate and company AUCs of 0.9964 and 0.9913, respectively, representing 6.6% and 6.0% improvements over the strongest Euclidean baseline while maintaining practical inference latency (2.1 ms per batch). Cross-validation analysis confirms robustness (5-fold: 0.9813 ± 0.0002; 3-seed: 0.9964 ± 0.0012) with very large effect sizes (Cohen’s d = 2.89–3.08). Although the per-epoch training time increases by 23.5% due to manifold operations, faster convergence (12 vs. 18 epochs) reduces the total training time by 17.8%. Cross-domain evaluation on Speed Dating data demonstrates generalization beyond explicit hierarchies with a 2.8% AUC improvement despite lacking structured taxonomies. Learned curvature parameters differ by entity type, providing interpretable indicators of hierarchical structure strength. Ablation studies isolate contributions from geometric structure (6.6%), learnable curvature (4.7%), multi-objective learning (2.1%), and explicit feature interactions (0.6%). A systematic comparison reveals that Lorentz embeddings outperform Poincaré ball implementations by 4.4% AUC under identical conditions, which is attributed to numerical stability advantages.
The results indicate that pairing standard recommendation architectures with geometry reflecting hierarchical relationships can provide consistent improvements for reciprocal matching, while limitations including cold-start performance, computational overhead at an extreme scale, and static hierarchy assumptions suggest directions for future work on adaptive curvature, fairness constraints, and dynamic taxonomies.

1. Introduction

The exponential growth of online platforms has fundamentally transformed how individuals and organizations connect, creating unprecedented opportunities for automated matching systems. Recommender systems, which emerged to address information overload in large-scale digital environments, have become indispensable tools across diverse domains ranging from e-commerce and entertainment to social networking and professional services [1]. However, traditional recommendation paradigms primarily focus on unilateral user preferences, optimizing for the satisfaction of a single party—typically the content consumer.
A distinct class of recommendation problems, known as reciprocal recommendation, requires satisfying the preferences of multiple parties simultaneously. Unlike conventional systems where movies can be liked by unlimited users or products can be purchased by countless customers, reciprocal recommendation operates under mutual constraints where success depends on bilateral agreement [2]. This paradigm is exemplified in critical real-world applications such as online dating platforms, where both parties must express mutual interest, and job-matching systems, where successful placement requires alignment between candidate aspirations and employer requirements. The global online recruitment market continues its rapid expansion, with projections indicating growth from USD 29.09 billion in 2022 to USD 58.52 billion by 2030 [3], reflecting the increasing digitalization of hiring processes and the critical need for intelligent matching systems.
Job markets exhibit inherent hierarchical relationships that present fundamental challenges for current recommendation systems. Career progression follows directional paths where senior engineers possess qualifications enabling them to fill mid-level roles (acceptable overqualification), whereas junior engineers applying to senior positions typically lack required expertise (problematic underqualification). These asymmetric relationships pose difficulties for standard Euclidean embeddings that compute symmetric distances (d_E(A, B) = d_E(B, A)). Furthermore, skills form tree-like taxonomies with substantial branching, and theoretical analysis demonstrates that embedding n-node trees in Euclidean space without distortion requires O(n log n) dimensions [4], whereas hyperbolic space achieves comparable representation quality in O(log n) dimensions through exponential volume growth. Recent advances in deep learning architectures including DeepFM [5] and biDeepFM [6] have improved feature interaction modeling but operate exclusively in Euclidean space, potentially inheriting these geometric constraints.
Hyperbolic space provides a geometric framework suited for hierarchical data representation through negative curvature and exponential volume growth. The Lorentz model represents hyperbolic space as a hyperboloid in Minkowski space, offering computational advantages through efficient distance computations and numerical stability [7]. Recent work has begun exploring hyperbolic embeddings for collaborative filtering [8,9], demonstrating superior performance on datasets with hierarchical structures. However, existing hyperbolic recommendation methods share critical limitations: they optimize single objectives focused solely on user satisfaction, lacking the bilateral preference modeling essential for reciprocal matching, and most rely on graph structures that may be sparse or unavailable in cold-start scenarios. This work introduces biLorentzFM, a multi-objective deep learning framework employing Lorentz embeddings for reciprocal recommendation. Through controlled empirical evaluation on two datasets—Kariyer.Net job matching (1.15 M interactions, 230 K users, explicit five-level hierarchies) and Speed Dating (8.4 K interactions, 552 participants, latent hierarchies)—we demonstrate that hyperbolic geometry provides substantial performance improvements (6.6% AUC on job matching, 2.8% on dating) over state-of-the-art Euclidean baselines. Systematic ablation studies isolate contributions from the geometric structure (6.6%), learnable curvature (4.7%), and multi-objective learning (2.1%), while a three-way comparison reveals that Lorentz embeddings outperform Poincaré ball implementations by 4.4% due to numerical stability advantages. Statistical validation through five-fold cross-validation confirms robustness (std < 0.0004) with very large effect sizes (Cohen’s d = 2.89–3.08), and cross-domain evaluation establishes generalization beyond explicit hierarchies.
The approach maintains practical efficiency despite a 23.5% per-epoch overhead: faster convergence reduces total training time by 17.8%, and inference latency (2.1 ms per batch) remains suitable for real-time deployment.

Organization

The remainder of this paper proceeds as follows. Section 2 reviews related work on reciprocal recommendation, hyperbolic embeddings, and deep factorization models, identifying the research gaps motivating this work. Section 3 presents the biLorentzFM methodology including Lorentz model foundations, architecture design, multi-objective optimization procedures, and computational complexity analysis. Section 4 describes the experimental setup including datasets, data leakage prevention protocols, baselines, and evaluation metrics. Section 5 presents the comprehensive results including performance comparisons, a three-way geometric comparison (Euclidean vs. Lorentz vs. Poincaré), statistical significance analysis, ablation studies, learned curvature interpretability, and cross-domain validation. The paper then discusses theoretical implications, examines result plausibility, addresses practical deployment considerations, and acknowledges limitations, concluding with a summary and future research directions.

2. Related Work

This work sits at the intersection of reciprocal recommendation systems, hyperbolic embeddings, and deep factorization models. We systematically review each area, explicitly connecting prior work to our contributions and identifying gaps that motivate biLorentzFM.

2.1. Reciprocal Recommendation Systems

Reciprocal recommendation represents a distinct paradigm where success requires bilateral agreement rather than unilateral preference satisfaction.
Addressing the challenge of mutual preference has motivated various approaches. Pizzato et al. [10] introduced RECON, the first reciprocal recommender for online dating, demonstrating that modeling mutual interest improves match quality compared with one-sided recommendations. Li and Li [2] formalized the reciprocal recommendation problem through the MEET framework, establishing mathematical foundations for systems requiring mutual agreement. However, these early approaches rely on linear models that cannot capture complex feature interactions, and they are domain-specific to romantic matching where subjective compatibility dominates. They cannot address the hierarchical skill taxonomies, educational prerequisites, and experience progressions present in job markets, motivating our hyperbolic geometric approach.
The extension of reciprocal recommendation to professional domains introduced additional complexity beyond romantic matching contexts. Early work by Malinowski et al. [11] applied expectation-maximization algorithms to jointly model recruiter and candidate preferences, but required explicit preference labels often unavailable in real-world platforms. Building upon these foundations, Yıldırım et al. [6] introduced biDeepFM, combining factorization machines with deep neural networks for multi-objective optimization and achieving state-of-the-art performance by simultaneously modeling both parties’ preferences. While biDeepFM successfully addresses the reciprocal nature of hiring, it operates exclusively in Euclidean space. This geometric constraint limits its ability to capture the inherent hierarchical relationships in job markets, where senior positions require junior experience, specialized skills build upon foundational knowledge, and organizational structures reflect clear reporting hierarchies. Our work extends biDeepFM’s multi-objective framework to hyperbolic space through Lorentz embeddings, preserving its bilateral modeling capacity while gaining the ability to represent hierarchies efficiently.
Recent developments have addressed complementary challenges in reciprocal recommendation systems through diverse methodological innovations. Kumar et al. [12] developed zero-shot approaches for handling new job titles, achieving 81.11% top-500 accuracy, while Abdul-Rahman and Hailes [13] investigated bias-mitigation techniques, proposing vector space correction methods to address fairness concerns in hiring. Zhou et al. [14] integrated social networking information to improve graduate job recommendations, and Madanchian [15] provided a comprehensive analysis of AI tools across the entire HR lifecycle. Despite these advances, all existing reciprocal recommendation approaches operate exclusively in Euclidean embedding spaces, treating all relationships as symmetric and failing to exploit hierarchical structure. Our systematic evaluation demonstrates that hyperbolic geometry provides 6.6% AUC improvements over these Euclidean methods specifically because job–candidate relationships exhibit asymmetric hierarchies that align with hyperbolic space’s geometric properties.

2.2. Hyperbolic Embeddings for Hierarchical Data

Hyperbolic geometry provides a natural framework for representing hierarchical structures through negative curvature and exponential volume growth.
Building on theoretical foundations, Sarkar [4] established that n-node trees can be embedded in hyperbolic space with logarithmic dimensions and low distortion, compared to the polynomial dimensions required in Euclidean space. This fundamental result demonstrates hyperbolic geometry’s inherent advantage for hierarchical data. The theoretical insight directly motivates our approach: job markets exhibit tree-like hierarchies (five-level taxonomy: Industry → Sector → Job Family → Role → Seniority) that Euclidean space represents inefficiently. We build upon Sarkar’s theoretical foundation by demonstrating practical benefits in reciprocal recommendation.
Translating these theoretical insights into practical methods, Nickel and Kiela [16] pioneered hyperbolic embeddings through Poincaré ball representations, demonstrating that word hierarchies can be captured in remarkably low dimensions (2D hyperbolic matching 200D Euclidean). Subsequently, Nickel and Kiela [7] introduced the Lorentz model, which represents hyperbolic space as a hyperboloid in Minkowski space, offering computational advantages through efficient distance computations and numerical stability. We adopt the Lorentz model as our geometric foundation but extend it significantly: while Nickel and Kiela focused on single-objective hierarchical embedding tasks, we develop specialized techniques for multi-objective reciprocal recommendation, including manifold-aware gradient projection that maintains geometric constraints during the simultaneous optimization of candidate and company objectives.
Recent applications have begun exploring hyperbolic embeddings for collaborative filtering. Vinh et al. [8] introduced HyperML, the first application of hyperbolic embeddings to recommendation, demonstrating superior performance on datasets with hierarchical structures such as book recommendations organized by topic taxonomies. Sun et al. [9] developed Hyperbolic Graph Collaborative Filtering (HGCF), combining graph neural networks with Poincaré embeddings to jointly learn user and item representations. Zhang et al. [17] proposed LGCNs (Lorentz Graph Convolutional Networks), leveraging the numerical stability of the Lorentz model for graph-based recommendation. Yang et al. [18] introduced hyperbolic contrastive learning to address popularity bias. However, all existing hyperbolic recommendation methods share two critical limitations that prevent their application to reciprocal scenarios. First, they optimize single objectives focused solely on user satisfaction, lacking the bilateral preference modeling essential for reciprocal matching where both parties must agree. Second, most rely on graph structure [9,19], which may be sparse or unavailable in cold-start scenarios common in recruitment platforms. Our work addresses both limitations: we introduce multi-objective optimization on the Lorentz manifold with specialized Riemannian gradient projection techniques, and we operate on factorization machines that learn from rich categorical features rather than requiring predefined graph structures.
The choice of hyperbolic model significantly impacts computational efficiency and optimization stability. The literature offers three primary models with distinct computational properties. Nickel and Kiela [16] introduced the Poincaré ball model, which represents hyperbolic space within the unit ball { x : ‖x‖ < 1 }. While geometrically intuitive, the Poincaré model requires careful handling of numerical instabilities near the boundary, where distances and gradients become ill-conditioned. Practitioners typically introduce epsilon-based clamping (‖x‖ < 1 − ϵ) to prevent numerical overflow [7]. To address these limitations, Nickel and Kiela [7] subsequently proposed the Lorentz model (also called the hyperboloid model), which embeds hyperbolic space as a sheet of a two-sheeted hyperboloid in (d + 1)-dimensional Minkowski space. The Lorentz model offers three computational advantages over Poincaré: (1) the manifold is unbounded, eliminating boundary singularities; (2) distance computations reduce to hyperbolic inner products without requiring arctanh operations; and (3) the tangent space structure enables straightforward Riemannian optimization through orthogonal projection. Zhang et al. [17] empirically demonstrated these advantages in recommendation settings, showing that Lorentz Graph Convolutional Networks converge 2.3× faster than Poincaré-based variants while achieving superior ranking performance. The Klein model provides another alternative, offering a bounded representation with straight geodesics, but distance computations become more complex than in the Lorentz model. We adopt the Lorentz model for biLorentzFM based on its established advantages for optimization stability and computational efficiency in deep learning applications. Our ablation study (Section 4.3) confirms these benefits, demonstrating that Lorentz embeddings outperform Poincaré by 4.4% on AUC.
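The boundary-clamping contrast above can be made concrete with a minimal NumPy sketch (the function names `poincare_dist` and `lorentz_dist` are ours, not from the cited implementations): the Poincaré distance needs an epsilon guard near the unit sphere, while the hyperboloid distance does not.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-5):
    """Poincare-ball distance. The epsilon clamp keeps points strictly
    inside the unit ball (||x|| < 1 - eps), the boundary workaround
    described above; without it the denominator can reach zero."""
    def clamp(x):
        n = np.linalg.norm(x)
        return x * (1 - eps) / n if n >= 1 - eps else x
    u, v = clamp(u), clamp(v)
    sq = np.sum((u - v) ** 2)
    denom = (1 - u @ u) * (1 - v @ v)
    return np.arccosh(1 + 2 * sq / denom)

def lorentz_dist(x, y):
    """Hyperboloid distance (unit curvature): d = arccosh(-<x, y>_L).
    The manifold is unbounded, so no boundary clamping is needed."""
    inner = -x[0] * y[0] + x[1:] @ y[1:]
    return np.arccosh(max(-inner, 1.0))  # guard tiny float undershoot
```

For identical points both functions return 0; for a point pushed arbitrarily close to the Poincaré boundary, only the epsilon clamp keeps the result finite.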
Finally, broader advances in geometric deep learning have demonstrated hyperbolic geometry’s versatility across diverse architectures. Chami et al. [20] introduced hyperbolic graph convolutional networks, achieving superior link prediction performance on hierarchical graphs. Ganea et al. [21] developed hyperbolic neural networks with feed-forward and recurrent operations adapted to curved spaces. These works establish that neural architectures can be successfully adapted to hyperbolic manifolds. We extend these ideas to factorization machines, a fundamentally different architecture that excels at modeling feature interactions in sparse data—precisely the scenario encountered in job recommendations where most candidate–job pairs have no observed interactions.

2.3. Deep Factorization Models

Factorization-based methods have formed the backbone of modern recommendation systems since the Netflix Prize competition.
The architectural foundations for our approach draw from advances in factorization-based recommendation. Rendle [22] introduced factorization machines (FMs), which model pairwise feature interactions through factorized parameters, enabling effective learning from sparse data. The key innovation of FM is representing second-order feature interactions through low-rank factorization, reducing parameters from O(n²) to O(kn), where k is the factorization dimension. Building on this foundation, Guo et al. [5] proposed DeepFM, which combines factorization machines with deep neural networks to jointly learn low-order (FM) and high-order (DNN) feature interactions while sharing embeddings between components. DeepFM has become foundational for feature-rich recommendation scenarios, but operates exclusively in Euclidean space. Our work preserves DeepFM’s wide-and-deep architecture—proven effective for capturing both memorization and generalization—while replacing Euclidean embeddings and dot products with Lorentz embeddings and hyperbolic inner products, gaining the ability to model hierarchical feature relationships.
Subsequent work has explored various mechanisms for learning feature interactions beyond standard factorization. He and Chua [23] proposed neural factorization machines, replacing FM’s inner product with neural networks to learn arbitrary interaction functions. Xiao et al. [24] introduced attentional factorization machines, using attention mechanisms to automatically learn interaction importance. Wang et al. [25] developed Deep & Cross Networks with specialized cross-feature layers, while Song et al. [26] proposed AutoInt, leveraging self-attention for automatic high-order interaction learning. These methods demonstrate various approaches to capturing feature relationships but share a fundamental limitation: all operate in Euclidean space using symmetric similarity measures (dot products, cosine similarity) that treat all feature relationships bidirectionally. For job recommendations, this symmetry is problematic—a senior developer’s skills satisfy junior position requirements, but junior skills do not satisfy senior requirements. Hyperbolic inner products naturally capture such asymmetric relationships through their mixed signature (⟨x, y⟩_L = −x_0 y_0 + ∑_i x_i y_i), enabling the directional modeling of hierarchical dependencies.
The challenge of optimizing multiple competing objectives in reciprocal recommendation requires sophisticated multi-objective learning techniques. Ma et al. [27] developed approaches for multi-task learning using mixture-of-experts models that learn task relationships, demonstrating that shared representations benefit multiple related objectives. However, existing multi-objective recommendation methods do not consider geometric structure—they assume Euclidean embeddings regardless of data characteristics. We demonstrate that geometric structure significantly impacts multi-objective learning: our experiments show that hyperbolic geometry provides substantial benefits for both candidate and company objectives simultaneously (6.6%/6.0% AUC improvements), suggesting that geometric alignment with data structure naturally balances conflicting objectives through richer representational capacity.

2.4. Research Gaps and Our Contributions

While this body of work has established strong foundations for reciprocal recommendation, three critical gaps remain that limit practical effectiveness and motivate our approach.
First, all existing reciprocal recommendation methods—from early linear models [2] to state-of-the-art deep learning approaches [6]—operate exclusively in Euclidean embedding spaces. This geometric constraint becomes particularly problematic for domains like job matching where relationships exhibit inherent hierarchical structures: career progressions follow directional paths, skills form taxonomies with substantial branching, and educational requirements create prerequisite chains. Euclidean distances treat all relationships symmetrically (d(A, B) = d(B, A)) and require exponentially increasing dimensions to approximate hierarchical structures with low distortion [4]. biLorentzFM addresses this fundamental limitation by operating in Lorentz space, where negative curvature and exponential volume growth naturally accommodate tree-like hierarchies. Our systematic evaluation demonstrates 6.6% AUC improvements attributable specifically to hyperbolic geometry, validating that geometric alignment with data structure provides practical benefits.
Second, while hyperbolic embeddings have demonstrated success in various collaborative filtering scenarios [8,9,18], their application to multi-objective reciprocal recommendation remains unexplored. Existing hyperbolic methods optimize single ranking objectives where only user satisfaction matters, lacking the bilateral preference modeling essential for reciprocal scenarios. Extending hyperbolic methods to multi-objective settings introduces significant technical challenges: gradient updates must maintain manifold constraints while simultaneously optimizing conflicting objectives. We address these challenges through specialized Riemannian optimization techniques, introducing manifold-aware gradient projection that ensures embeddings remain on the Lorentz hyperboloid throughout multi-objective training. Our ablation studies demonstrate that both geometric structure and specialized optimization contribute to performance, with learnable curvature providing an additional 0.8% improvement beyond basic hyperbolic embeddings.
Third, no prior work systematically compares different hyperbolic models (Poincaré, Lorentz, Klein) for recommendation tasks under controlled conditions. Existing studies typically evaluate a single hyperbolic model against Euclidean baselines, leaving unclear whether benefits stem from hyperbolic geometry generally or specific model choices. Furthermore, graph-based hyperbolic methods [9,19] cannot be directly compared to factorization-based approaches due to fundamentally different architectures and data requirements. We fill this gap through controlled ablation studies that isolate (1) geometric contributions (Euclidean vs. hyperbolic: +5.8% AUC), (2) model selection (Lorentz vs. Poincaré: +4.4% AUC for Lorentz due to numerical stability), and (3) architectural specialization (learnable curvature: +0.8% AUC). This decomposition provides empirical evidence about when and why hyperbolic methods benefit recommendation, guiding future research on geometric deep learning for practical applications.
biLorentzFM addresses all three gaps simultaneously by combining multi-objective optimization with Lorentz embeddings in a factorization machine architecture, supported by systematic evaluation that isolates geometric contributions from architectural choices. Our work demonstrates that theoretical insights from differential geometry can translate into substantial practical benefits (up to 11.7% AUC improvements, 68% LogLoss reductions) for real-world reciprocal recommendation systems.

3. Methodology

In this section, we present the biLorentzFM framework, which leverages Lorentz embeddings for multi-objective reciprocal recommendation. We begin by formalizing the reciprocal recommendation problem; then, we introduce the mathematical foundations of Lorentz embeddings, describe our architecture, detail the optimization procedure, and specify the experimental setup including datasets, data leakage prevention, and training procedures.

3.1. Problem Formulation

Let U = {u_1, u_2, …, u_|U|} denote the set of users (candidates) and I = {i_1, i_2, …, i_|I|} denote the set of items (jobs). In reciprocal recommendation, we aim to model the preferences of both sides simultaneously. Each interaction instance is represented as (x^(n), y_candidate^(n), y_company^(n)), where x^(n) ∈ R^d is a feature vector encoding the candidate–job pair characteristics, and y_candidate^(n), y_company^(n) ∈ {0, 1} are binary labels indicating the preferences of the candidate and company, respectively.
The candidate preference y_candidate = 1 indicates that the candidate applied for the job, while y_company = 1 signifies that the company expressed interest in the candidate (e.g., by viewing their contact information). Our goal is to learn a function f : R^d → [0, 1]^2 that jointly predicts both preferences:
(ŷ_candidate, ŷ_company) = f(x)
Unlike traditional single-objective recommendation systems, reciprocal recommendation requires satisfying both parties for successful matching, making multi-objective optimization essential.
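As a toy illustration of this two-headed interface (the weights below are illustrative, not learned, and `f` here is a stand-in linear model rather than the biLorentzFM predictor):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (not learned) weights: one row per party's output head.
W = np.array([[0.4, -0.1,  0.3, 0.2],   # candidate head
              [0.1,  0.5, -0.2, 0.3]])  # company head

def f(x):
    """Joint predictor f: R^d -> [0, 1]^2, one probability per party."""
    return sigmoid(W @ x)

x = np.array([0.7, 1.0, 0.0, 3.0])      # one candidate-job feature vector
y_hat_candidate, y_hat_company = f(x)
```

A match is only recommended when both predicted probabilities are high, which is what makes the optimization inherently multi-objective.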

3.2. Lorentz Embedding Layer

Traditional recommendation systems typically operate in Euclidean space R^d. However, job–candidate relationships often exhibit hierarchical structures that are better captured by hyperbolic geometry. Among the hyperbolic models (Poincaré ball, Klein disk, Lorentz/hyperboloid), we adopt the Lorentz model based on three practical considerations: (1) it avoids numerical instabilities near manifold boundaries that affect the Poincaré ball model, (2) distance computations require only inner products without arctanh operations, and (3) gradient flow remains stable across the entire manifold. Our systematic comparison in Section 4 demonstrates that Lorentz embeddings achieve 4.4% higher AUC values than Poincaré under identical architectural conditions.

3.2.1. Mathematical Foundation

The Lorentz model represents hyperbolic space as a hyperboloid in Minkowski space. Intuitively, this hyperboloid curves inward, naturally organizing embeddings by hierarchy—entities at different organizational levels appear at different “depths” on the surface. The d-dimensional Lorentz manifold H^d is formally defined as
H^d = { x ∈ R^(d+1) : ⟨x, x⟩_L = −β, x_0 > 0 }
where ⟨·, ·⟩_L is the Lorentz inner product, which differs from the standard Euclidean dot product by treating the first coordinate (time component) with a negative sign:
⟨x, y⟩_L = −x_0 y_0 + ∑_{i=1}^{d} x_i y_i
This negative sign on the time component creates the hyperbolic geometry that enables hierarchical representation. The learnable parameter β > 0 controls the curvature of the hyperbolic space—larger β values create stronger hierarchical capacity.
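The Lorentz inner product is a one-line computation; a NumPy sketch follows (the helper name `lorentz_inner` is ours):

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz inner product: the Euclidean dot product with the sign of
    the time component (index 0) flipped."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

# A point on the beta = 1 hyperboloid: x_0 = sqrt(1 + ||spatial||^2)
spatial = np.array([0.3, -0.4])
x = np.concatenate([[np.sqrt(1.0 + spatial @ spatial)], spatial])
```

For such a point, `lorentz_inner(x, x)` evaluates to −1, matching the defining constraint ⟨x, x⟩_L = −β with β = 1.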

3.2.2. Embedding Mapping

To embed entities in Lorentz space, we define a mapping from categorical indices to hyperbolic coordinates. The approach starts with standard Euclidean embeddings (which are straightforward to initialize and optimize); then, it augments them with a time coordinate that automatically places them on the hyperboloid. For a categorical feature with value v, we first obtain a Euclidean embedding e_v ∈ R^d through a standard embedding layer. We then map this to Lorentz space via
h_v = ( √(β + ‖e_v‖²), e_v ) ∈ H^d
This ensures that h_v lies on the Lorentz manifold, satisfying the constraint ⟨h_v, h_v⟩_L = −β. The time component h_{v,0} = √(β + ‖e_v‖²) encodes the hierarchical depth—embeddings with larger norms naturally appear “deeper” in the hierarchy. Meanwhile, the spatial components e_v preserve semantic relationships learned from the data.
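The lift can be sketched in a few lines of NumPy (the helper name `lift_to_lorentz` is ours); the final check mirrors the manifold constraint stated above:

```python
import numpy as np

def lift_to_lorentz(e, beta=1.0):
    """Lift a Euclidean embedding e in R^d onto the Lorentz manifold H^d
    by prepending the time coordinate sqrt(beta + ||e||^2)."""
    return np.concatenate([[np.sqrt(beta + e @ e)], e])

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

e = np.array([1.2, -0.7, 0.5])
h = lift_to_lorentz(e, beta=1.65)          # beta as reported for Kariyer.Net
# The lifted point satisfies the constraint <h, h>_L = -beta:
assert abs(lorentz_inner(h, h) + 1.65) < 1e-9
```

Because the time coordinate is a deterministic function of the spatial part, only the Euclidean embedding needs to be stored and trained.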

3.2.3. Learnable Curvature Parameter

The curvature parameter β must remain strictly positive to maintain valid hyperbolic geometry. Rather than using constrained optimization, we employ log-space parametrization [28]: we learn an unconstrained parameter log β ∈ R and compute β = exp(log β). This automatically ensures β > 0 while allowing standard gradient descent. The gradient follows from the chain rule: ∂L/∂(log β) = (∂L/∂β) · β. We initialize log β = 0.0 (corresponding to β = 1.0) and clip the final value to [0.01, 5.0] to prevent numerical issues. This approach eliminates constraint handling overhead while maintaining stable convergence. In our experiments (Section 4.3.4), β converges to approximately 1.65 for the Kariyer.Net dataset, indicating moderately strong hierarchical structure. We use a single shared β across all embeddings rather than feature-specific values, as ablation studies showed no significant benefit from added complexity.
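A minimal sketch of the log-space parametrization and its chain-rule gradient (the numeric gradient value is illustrative, not from the paper's training runs):

```python
import numpy as np

# Unconstrained parameter; beta = exp(log_beta) is positive by construction.
log_beta = 0.0                  # initialization, giving beta = 1.0
beta = np.exp(log_beta)

# Chain rule: dL/d(log beta) = dL/d(beta) * beta
grad_beta = 0.25                # illustrative upstream gradient dL/d(beta)
grad_log_beta = grad_beta * beta

# After training, clip the converged value to the stated safe range.
beta_final = float(np.clip(beta, 0.01, 5.0))
```

Standard (unconstrained) gradient descent on `log_beta` can then never produce an invalid non-positive curvature.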

3.3. biLorentzFM Architecture

Our biLorentzFM architecture extends the DeepFM framework to hyperbolic space, combining factorization machines with deep neural networks while operating on Lorentz embeddings. Table 1 summarizes the architectural specifications with hyperparameters selected through validation set performance.

3.3.1. Input Feature Processing

Given an interaction between user u and item i, we extract categorical features c = [c_1, c_2, …, c_m] and numerical features n = [n_1, n_2, …, n_p]. Categorical features are mapped to Lorentz embeddings:
h_{c_j} = LorentzEmbedding(c_j) ∈ H^d
For numerical features, we first project them to Euclidean space via a linear transformation; then, we map to Lorentz space:
$$e_{\mathrm{num}} = W_{\mathrm{num}} n + b_{\mathrm{num}}$$
$$h_{\mathrm{num}} = \left( \sqrt{\beta + \lVert e_{\mathrm{num}} \rVert_2^2},\; e_{\mathrm{num}} \right)$$

3.3.2. Hyperbolic Factorization Machine Component

The FM component captures pairwise feature interactions using Lorentz inner products. In traditional Euclidean space, the dot product measures feature similarity; in Lorentz space, the inner product additionally captures hierarchical relationships—features at similar hierarchy levels have stronger interactions. For categorical features, the interaction strength between features i and j is computed as
$$\mathrm{FM}_{\mathrm{interaction}}(i, j) = \langle h_{c_i}, h_{c_j} \rangle_L = -h_{c_i,0}\, h_{c_j,0} + h_{c_i,1:d}^{T} h_{c_j,1:d}$$
The complete FM output aggregates all pairwise interactions:
$$y_{\mathrm{FM}} = w_0 + \sum_{j=1}^{m} w_j c_j + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle h_{c_i}, h_{c_j} \rangle_L$$
where w 0 is the global bias and w j represents linear coefficients, which model first-order feature effects.
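A toy sketch of the hyperbolic FM score, under the assumed sign convention ⟨u, v⟩_L = −u₀v₀ + u_{1:d}·v_{1:d} (helper names are ours, not the paper's code):

```python
import numpy as np

def lorentz_inner(u, v):
    """Lorentz inner product: <u, v>_L = -u0*v0 + u_{1:d} . v_{1:d}."""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def fm_output(w0, w, c, H):
    """Hyperbolic FM score: global bias, first-order linear terms, and the
    sum of Lorentz inner products over all pairs of feature embeddings."""
    pairwise = sum(lorentz_inner(H[i], H[j])
                   for i in range(len(H)) for j in range(i + 1, len(H)))
    return w0 + float(np.dot(w, c)) + pairwise

# Two features embedded on the hyperboloid with beta = 1 (spatial dim d = 1)
h1 = np.array([np.sqrt(2.0), 1.0])   # time coordinate sqrt(1 + 1^2)
h2 = np.array([np.sqrt(5.0), 2.0])   # time coordinate sqrt(1 + 2^2)
y_fm = fm_output(w0=0.1, w=np.array([1.0, 2.0]), c=np.array([1.0, 0.0]),
                 H=[h1, h2])
```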

3.3.3. Deep Neural Network Component

The DNN component learns high-order feature interactions by processing the concatenated Lorentz embeddings. We flatten all embeddings and feed them through a multi-layer perceptron:
$$z^{(0)} = \mathrm{Concat}([h_{c_1}, h_{c_2}, \ldots, h_{c_m}, h_{\mathrm{num}}])$$
$$z^{(l)} = \sigma(W^{(l)} z^{(l-1)} + b^{(l)}), \quad l = 1, 2, 3$$
$$y_{\mathrm{DNN}} = w_{\mathrm{out}}^{T} z^{(3)}$$
where σ is the ReLU activation function. Dropout with a rate of 0.2 is applied after each hidden layer to prevent overfitting.

3.3.4. Multi-Objective Output Layer

To handle the dual objectives of reciprocal recommendation, we employ separate output heads for candidate and company predictions:
$$\mathrm{score}_{\mathrm{base}} = \alpha \cdot y_{\mathrm{FM}} + (1 - \alpha) \cdot y_{\mathrm{DNN}}$$
$$\hat{y}_{\mathrm{candidate}} = \sigma(W_{\mathrm{cand}} \, \mathrm{score}_{\mathrm{base}} + b_{\mathrm{cand}})$$
$$\hat{y}_{\mathrm{company}} = \sigma(W_{\mathrm{comp}} \, \mathrm{score}_{\mathrm{base}} + b_{\mathrm{comp}})$$
where α = 0.5 balances the FM and DNN components, and σ is the sigmoid function.
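The output layer reduces to a scalar blend followed by two sigmoid heads. A minimal sketch with scalar weights (illustrative only; the actual heads are learned linear layers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_heads(y_fm, y_dnn, W_cand, b_cand, W_comp, b_comp, alpha=0.5):
    """Blend the FM and DNN scores with weight alpha, then apply a
    separate sigmoid head for each side of the reciprocal match."""
    base = alpha * y_fm + (1.0 - alpha) * y_dnn
    return sigmoid(W_cand * base + b_cand), sigmoid(W_comp * base + b_comp)

# Example with made-up head parameters
p_cand, p_comp = output_heads(0.8, 0.4, W_cand=1.0, b_cand=0.0,
                              W_comp=2.0, b_comp=-1.0)
```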

3.4. Multi-Objective Optimization

We optimize both objectives simultaneously using a weighted multi-objective loss function. Each objective uses binary cross-entropy:
$$\mathcal{L}_{\mathrm{candidate}} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_{\mathrm{candidate}}^{(n)} \log \hat{y}_{\mathrm{candidate}}^{(n)} + \left(1 - y_{\mathrm{candidate}}^{(n)}\right) \log \left(1 - \hat{y}_{\mathrm{candidate}}^{(n)}\right) \right]$$
$$\mathcal{L}_{\mathrm{company}} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_{\mathrm{company}}^{(n)} \log \hat{y}_{\mathrm{company}}^{(n)} + \left(1 - y_{\mathrm{company}}^{(n)}\right) \log \left(1 - \hat{y}_{\mathrm{company}}^{(n)}\right) \right]$$
The total loss combines both objectives:
$$\mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{cand}} \mathcal{L}_{\mathrm{candidate}} + \lambda_{\mathrm{comp}} \mathcal{L}_{\mathrm{company}} + \lambda_{\mathrm{reg}} \lVert \theta \rVert_2^2$$
The task weights λ cand = λ comp = 0.5 were selected through a grid search over { 0.3 ,   0.4 ,   0.5 ,   0.6 ,   0.7 } , evaluating balanced performance across both objectives on the validation set. Equal weighting (0.5, 0.5) yielded the best trade-off, improving the candidate AUC value by 2.1% and company AUC value by 1.8% over unbalanced configurations. The regularization coefficient λ reg = 10 6 prevents overfitting.
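A compact sketch of the weighted multi-objective loss (the `eps` clipping is our addition for numerical safety and is not described in the paper):

```python
import numpy as np

def bce(y, p, eps=1e-7):
    """Binary cross-entropy; clip predictions to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def total_loss(y_cand, p_cand, y_comp, p_comp, params,
               lam_cand=0.5, lam_comp=0.5, lam_reg=1e-6):
    """Weighted sum of the two BCE objectives plus L2 regularization
    over all parameter arrays in `params`."""
    reg = sum(np.sum(w ** 2) for w in params)
    return (lam_cand * bce(y_cand, p_cand)
            + lam_comp * bce(y_comp, p_comp)
            + lam_reg * reg)
```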

3.4.1. Hyperbolic Optimization Considerations

Optimizing on the Lorentz manifold requires maintaining manifold constraints throughout training. Standard gradient descent operates in Euclidean space, but our embeddings must remain on the curved hyperboloid surface. We address this through Riemannian optimization [29,30]: gradients are first projected onto the tangent space (the local flat approximation of the manifold at each point); then, they are used for updates. For a point x H d , the tangent space T x H d is defined as
$$T_x \mathbb{H}^d = \left\{ v \in \mathbb{R}^{d+1} : \langle v, x \rangle_L = 0 \right\}$$
Geometrically, the tangent space consists of all vectors orthogonal to x under the Lorentz inner product. The projection of a Euclidean gradient g onto the tangent space is
$$\mathrm{proj}_{T_x \mathbb{H}^d}(g) = g + \frac{\langle g, x \rangle_L}{\beta}\, x$$
This projection removes the component of g perpendicular to the manifold, ensuring the gradient respects the geometric constraint. After each gradient update, embeddings are re-normalized to ensure they remain on the manifold. We implement Riemannian optimization using the Geoopt library [31], which provides efficient PyTorch 2.8.0 implementations of manifold-aware optimizers.
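The tangent-space projection can be verified numerically. The sketch below (ours, assuming the convention ⟨x, x⟩_L = −β on the manifold) checks that the projected gradient is Lorentz-orthogonal to the point:

```python
import numpy as np

def lorentz_inner(u, v):
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def project_to_tangent(g, x, beta):
    """Project a Euclidean gradient g onto T_x H^d:
    proj(g) = g + (<g, x>_L / beta) * x.
    This yields <proj(g), x>_L = 0 whenever <x, x>_L = -beta."""
    return g + (lorentz_inner(g, x) / beta) * x

beta = 1.0
e = np.array([0.3, -0.5])
x = np.concatenate(([np.sqrt(beta + e @ e)], e))  # point on the hyperboloid
g = np.array([0.7, 0.1, -0.2])                    # raw Euclidean gradient
gt = project_to_tangent(g, x, beta)
print(lorentz_inner(gt, x))                       # ≈ 0
```

In the actual model, the Geoopt library performs this projection (and the subsequent renormalization) inside its Riemannian optimizers.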

3.4.2. Computational Complexity

The computational complexity of biLorentzFM is comparable to a standard DeepFM. The Lorentz inner product requires $O(d)$ operations, the same cost as a Euclidean dot product. The main overhead comes from the square-root computation in the embedding mapping, which is $O(1)$ per embedding. The overall time complexity remains $O(m^2 d + L \cdot d_{\mathrm{hidden}})$ for m categorical features and L DNN layers.
In terms of practical efficiency, while the per-epoch training time increases by 23.5% due to manifold operations (117.6 vs. 95.2 min), biLorentzFM converges in fewer epochs (12 vs. 18), reducing the total training time by 17.8% (23.5 vs. 28.6 h). The inference latency is 2.1 ms per 256-sample batch on NVIDIA V100 GPUs compared to 1.8 ms for biDeepFM (+16.7%), which remains suitable for real-time production deployment. Memory consumption increases modestly from 2.1 GB to 2.3 GB (+9.5%) due to additional time-coordinate storage for hyperbolic embeddings.

3.5. Datasets and Experimental Setup

We evaluate biLorentzFM on two reciprocal recommendation datasets with complementary characteristics. Table 2 summarizes their key properties, demonstrating diversity in scale, domain, and hierarchical structure.

3.5.1. Kariyer.Net: Job Matching Dataset

The Kariyer.Net dataset, collected from Turkey’s largest job platform, contains 1,150,302 candidate–job interactions spanning 229,805 unique candidates and 16,134 job postings over a six-month period in 2023. Each interaction includes reciprocal signals: candidate applications (explicit interest from candidates) and company views of candidate profiles (interest from employers).
The dataset provides rich contextual information organized into three categories. Candidate features include demographics (age group, location, employment status), education (degree level, field of study, university tier), and experience (years, seniority level, skills vector). Job features capture position details (title, required education and experience), company information (industry, size, prestige), and job specifics (employment type, work arrangement, location, salary range). Hierarchical categorical features follow a five-level job taxonomy: Industry (12 categories) → Sector (45) → Job Family (120) → Role (380) → Seniority (5), enabling hyperbolic embeddings to capture organizational structure naturally.
Following established practices in collaborative filtering [6], we generate two negative samples for each positive interaction, creating a 2:1 negative-to-positive ratio. The choice of 2 negative samples balances training efficiency with class distribution: fewer negatives provide insufficient signal for learning decision boundaries, while more negatives increase computational cost without proportional benefit. Negative samples consist of jobs that candidates viewed but did not apply to, representing realistic alternatives that were available but not sufficiently attractive. This sampling approach differs fundamentally from random negative sampling in three ways. First, all negative samples represent jobs the candidate actively considered, ensuring they reflect genuine preference decisions rather than items never encountered (exposure guarantee). Second, viewed-but-rejected jobs provide stronger training signal than randomly sampled jobs, as they represent hard negatives the model must distinguish from positive examples (difficulty calibration). Third, the negative sample distribution matches actual job browsing patterns observed on the platform, improving model generalization to production scenarios (realistic distribution). This produces a final dataset with 383,434 positive samples (applications) and 766,868 negative samples (views without application), yielding a 33.3% positive rate. This class distribution allows standard binary cross-entropy loss functions to work effectively without specialized reweighting or class balancing techniques.
To prevent information leakage and ensure valid evaluation, we employ two critical safeguards. First, negative sampling is performed before data splitting. If negative sampling occurred after splitting, the model could inadvertently learn from temporal patterns in the test set during training through the negative sample generation process. By sampling negatives from the complete dataset first, then splitting, we ensure the training process has no access to test set information. Second, we use strict temporal splitting based on interaction timestamps. The dataset is sorted chronologically and divided into three non-overlapping windows: training (70%, January–April), validation (10%, April–May), and test (20%, May–June). This ensures all training interactions occur strictly before validation and test interactions, reflecting real-world scenarios where models predict future user behavior. Table 3 shows the distribution across splits with consistent positive rates (33.3%), indicating stable user behavior throughout the observation period.
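The chronological 70/10/20 split can be sketched directly (the field name `timestamp` is illustrative; the actual dataset schema is not specified at this level):

```python
def temporal_split(interactions, train_frac=0.7, val_frac=0.1):
    """Sort interactions chronologically, then cut into contiguous
    train/validation/test windows so that every training interaction
    strictly precedes every validation and test interaction."""
    ordered = sorted(interactions, key=lambda r: r["timestamp"])
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])

# Toy data: unordered timestamps get sorted before cutting
rows = [{"timestamp": t} for t in [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]]
train, val, test = temporal_split(rows)
```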

3.5.2. Speed Dating: Cross-Domain Validation

The Speed Dating dataset [32], collected from 21 speed dating events at Columbia Business School (2002–2004), contains 8378 reciprocal decisions from 552 participants. In each event, participants have brief conversations with potential romantic partners, after which both parties independently indicate interest. A successful match requires mutual agreement, making this a canonical reciprocal recommendation scenario.
Each interaction is characterized by demographic information (age, race, field of study), self-assessed attributes (attractiveness, sincerity, intelligence ratings on 1–10 scales), partner preferences (importance ratings for various traits), and activity interests (sports, arts, etc. as binary indicators). Unlike Kariyer.Net’s explicit job taxonomy, Speed Dating features lack obvious hierarchical structure, testing whether biLorentzFM can discover latent hierarchies in personality and preference patterns.
Due to the smaller dataset size (8.4 K vs. 1.15 M interactions), we employ 5-fold cross-validation rather than a single train–test split. Folds are created by partitioning participants (not interactions) into five groups, ensuring all interactions involving a participant appear in the same fold. This prevents data leakage where models could learn participant-specific patterns during training. For each fold, we train on four folds (6700 interactions) and test on the remaining fold (1670 interactions), reporting average performance and standard deviations across folds. The Speed Dating experiments serve two purposes: (1) validating that biLorentzFM generalizes beyond job recommendation to different reciprocal domains, and (2) testing the model’s ability to learn from limited data, where hyperbolic geometry’s inductive bias may prove particularly valuable.
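A simplified sketch of participant-level fold assignment (keyed on a single hypothetical `pid` field; truly dyadic data would need both participants of an interaction routed consistently, which this sketch does not handle):

```python
def participant_folds(interactions, n_folds=5):
    """Partition participants (not interactions) into folds, then route
    each interaction to its participant's fold, so that no participant's
    interactions are split between training and test."""
    participants = sorted({r["pid"] for r in interactions})
    fold_of = {p: i % n_folds for i, p in enumerate(participants)}
    folds = [[] for _ in range(n_folds)]
    for r in interactions:
        folds[fold_of[r["pid"]]].append(r)
    return folds
```

Training on four folds and testing on the fifth then guarantees that no participant-specific pattern leaks from train to test.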

3.5.3. Baseline Methods

We compare biLorentzFM against state-of-the-art neural recommendation architectures from Yıldırım et al. [6]:
  • PNN [33]: product-based neural networks with explicit pairwise feature interactions through inner/outer product layers.
  • DeepFM [5]: combines factorization machines (low-order interactions) with deep neural networks (high-order interactions) using shared embeddings.
  • DCN [25]: Deep & Cross Network that automatically learns explicit feature crossings through a cross-network parallel to deep layers.
  • AutoInt [26]: uses multi-head self-attention mechanisms to model feature interactions of different orders.
  • NFM [23]: neural factorization machine with bi-interaction pooling followed by deep networks for high-order interaction learning.
  • AFM [24]: attentional factorization machine that applies attention to weight the importance of different feature interactions.
  • FGCNN [34]: feature generation by CNN, automatically creating new features through convolutional operations on raw feature embeddings.
  • biDeepFM [6]: multi-objective extension of DeepFM for reciprocal recommendation, serving as our strongest Euclidean baseline.
All baselines employ Euclidean embeddings. We report their published results from Yıldırım et al. [6], who evaluated these methods on the same Kariyer.Net dataset under identical experimental conditions (70/10/20 train/val/test split, exposure-controlled negative sampling, embedding dimension d = 8). We verified consistency by re-implementing DeepFM and biDeepFM, confirming that our reproduced results match published values within ±0.1% AUC.

3.5.4. Evaluation Metrics

Performance is evaluated using two complementary metrics computed separately for each side of the reciprocal recommendation (candidate/company for Kariyer.Net, participant A/B for Speed Dating).
Area Under the ROC Curve (AUC) measures ranking quality—the model’s ability to rank positive interactions higher than negative ones—providing threshold-independent assessment. AUC ranges from 0 to 1, where 0.5 indicates random performance and 1.0 indicates perfect ranking. AUC is particularly appropriate for recommendation systems where the goal is to present users with ranked lists rather than binary classifications. LogLoss (binary cross-entropy) measures probability calibration, quantifying how well predicted probabilities match actual outcomes. LogLoss heavily penalizes confident but incorrect predictions, making it suitable for assessing whether the model provides reliable probability estimates rather than just correct rankings. Lower LogLoss values indicate better calibration. Together, these metrics capture both discrimination quality (AUC) and calibration (LogLoss). A model may achieve high AUC through correct ranking while having poor LogLoss due to miscalibrated probabilities, or vice versa. Reporting both metrics provides a complete picture of model performance.
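Both metrics are simple to state directly. The sketch below (our helper names) implements AUC as the pairwise rank statistic and LogLoss as clipped binary cross-entropy:

```python
import numpy as np

def auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative (ties count 0.5) -- the ROC AUC as a rank statistic."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

def logloss(y_true, p, eps=1e-7):
    """Binary cross-entropy with clipping for numerical safety."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1, 1, 0, 0])
s = np.array([0.9, 0.4, 0.6, 0.2])
print(auc(y, s))   # 0.75: 3 of the 4 positive-negative pairs ranked correctly
```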
We assess statistical significance through paired t-tests comparing biLorentzFM predictions with baseline predictions. For Kariyer.Net, we use test set predictions (230,061 samples). For Speed Dating, we use the five-fold results to compute paired differences across folds. We additionally report Cohen’s d effect sizes to quantify practical significance beyond statistical significance.

3.6. Training Procedure

The complete training algorithm for biLorentzFM integrates standard mini-batch gradient descent with specialized handling for hyperbolic embeddings. Algorithm 1 presents the detailed procedure for reproducibility.
Algorithm 1 biLorentzFM Training Procedure
Input: training data D_train, validation data D_val
Hyperparameters: η = 0.001, B = 256, T = 20, λ_cand = λ_comp = 0.5, λ_reg = 10⁻⁶, p = 5
1.  Initialize E (Xavier), log β ← 0.0, W (Kaiming)
2.  best_auc ← 0, patience ← 0
3.  for epoch = 1 to T do
4.      Shuffle D_train
5.      for each mini-batch B in D_train do
6.          // Forward pass
7.          for (x, y_cand, y_comp) in B do
8.              Extract c (categorical), n (numerical)
9.              β ← exp(log β).clamp(0.01, 5.0)
10.             h_j ← [√(β + ‖E(c_j)‖²), E(c_j)] for all j
11.             y_FM ← w₀ + Σ_j w_j c_j + Σ_{i<j} ⟨h_i, h_j⟩_L
12.             y_DNN ← MLP(Concat(h₁, …, h_m, h_num))
13.             ŷ_cand, ŷ_comp ← OutputHeads(α y_FM + (1 − α) y_DNN)
14.         end for
15.         // Loss computation
16.         L ← λ_cand BCE(ŷ_cand, y_cand) + λ_comp BCE(ŷ_comp, y_comp) + λ_reg ‖θ‖²
17.         Compute ∇_θ L via backpropagation
18.         // Riemannian updates
19.         for each Lorentz embedding h with Euclidean gradient g do
20.             g ← g + (⟨g, h⟩_L / β) h  // project onto tangent space
21.         end for
22.         ∇_{log β} L ← (∂L/∂β) · β
23.         θ ← Adam(θ, ∇_θ L, η)  // update all parameters
24.         for each Lorentz embedding h do
25.             h ← √(β / −⟨h, h⟩_L) h  // renormalize onto the hyperboloid
26.         end for
27.     end for // end batch loop
28.     // Validation
29.     auc_cand ← Evaluate(D_val)
30.     if auc_cand > best_auc then
31.         best_auc ← auc_cand; save checkpoint; patience ← 0
32.     else
33.         patience ← patience + 1
34.         if patience ≥ p then break // early stop
35.     end if
36. end for // end epoch loop
Output: trained model θ
The algorithm emphasizes three key aspects. First, log-space curvature parameters (line 9) require only exponentiation and clipping in the forward pass—no complex constraint handling during optimization. Second, Riemannian gradient projection (line 20) maintains manifold constraints for embeddings, ensuring all points remain on the hyperboloid throughout training. Third, the multi-objective structure (line 16) treats candidate and company preferences symmetrically, optimizing both sides of the reciprocal recommendation problem simultaneously. Early stopping (lines 31–34) prevents overfitting based on validation performance.

4. Results

In this section, we present comprehensive experimental results demonstrating the effectiveness of biLorentzFM. Our evaluation covers performance metrics, statistical significance analysis with cross-validation, ablation studies including a critical three-way hyperbolic geometry comparison, learned curvature interpretability, and cross-domain validation on Speed Dating.

4.1. Overall Performance

Table 4 presents a performance comparison on the Kariyer.Net test set. For baseline methods (PNN, DeepFM, DCN, AFM, NFM, AutoInt, FGCNN, biDeepFM), we report published results from Yıldırım et al. [6], who evaluated these methods on the same Kariyer.Net dataset under identical experimental conditions. This ensures fair comparison and allows us to focus computational resources on an extensive evaluation of our proposed biLorentzFM, including cross-validation (Section 4.2), ablation studies (Section 4.3.3), and cross-domain validation (Section 4.7).
We evaluate two configurations of our proposed method: biLorentzFM_Base uses only basic collaborative filtering features (user/item IDs and demographics), while biLorentzFM_Full incorporates all features including hierarchical categorical variables (5-level job taxonomy, education hierarchy, location taxonomy). This comparison isolates the contribution of explicit hierarchical features.
Note on AUC values. The high AUC values (>0.93) reflect design choices validated in prior work [6]: (1) exposure-controlled negative sampling (viewed-but-not-applied jobs) creates more informative hard negatives than random sampling, sharpening decision boundaries, and (2) rich hierarchical features (five-level job taxonomy, education levels, location hierarchy) combined with geometric structure naturally aligned with these hierarchies improve separability. These baseline AUC values (e.g., biDeepFM_Full: 0.9351, PNN: 0.9397) match those reported in the original biDeepFM study on the same Kariyer.Net dataset [6], confirming measurement consistency. To validate robustness beyond single-seed evaluation, we conduct extensive cross-validation in Section 4.2. Company labels exhibit different base rates than candidate labels, naturally yielding different LogLoss scales; we compare models within each objective rather than across objectives.
biLorentzFM_Full achieves the highest performance across all metrics (candidate AUC: 0.9964, company AUC: 0.9913), representing 6.6% and 6.0% improvements over the strongest baseline biDeepFM_Full (0.9351 and 0.9348). LogLoss values (0.1120 and 0.0181) indicate well-calibrated probability estimates. Notably, biLorentzFM_Base surpasses feature-rich Euclidean baselines even with only basic collaborative filtering features, demonstrating that hyperbolic geometry captures hierarchical structure from interaction patterns alone.

On Single-Objective Versus Multi-Objective Performance

Table 4 shows that a single-objective PNN achieves a slightly higher candidate-side AUC value (0.9397) than multi-objective biDeepFM (0.9351); a 0.5% difference is expected when optimizing exclusively for one objective versus balancing two. However, this marginal candidate advantage comes at substantial cost to the company-side performance: PNN achieves only a 0.9098 company AUC value compared to biDeepFM’s 0.9348 (+2.7%). For reciprocal recommendation where both parties must agree, balanced optimization is essential. biLorentzFM resolves this optimization trade-off through hyperbolic geometry, achieving superior performance on both objectives simultaneously (0.9964 candidate, 0.9913 company), demonstrating that a geometric structure aligned with data hierarchies enables effective multi-objective learning where Euclidean approaches face inherent tension.

4.2. Statistical Significance and Robustness

To address concerns about result validity and generalization, we implement multiple validation protocols and conduct rigorous statistical analysis.

4.2.1. Data Leakage Prevention

We employ temporal splitting with strict chronological ordering (train: January–April, validation: April–May, test: May–June), ensuring all training interactions occur before test interactions. Negative sampling is performed before splitting to prevent test set leakage. This temporal validation protocol ensures the model cannot exploit future information during training.

4.2.2. Cross-Validation Analysis

We conduct three levels of validation to ensure robustness: (1) five-fold cross-validation with a single seed to assess stability across data partitions, (2) three-seed replication on the fixed test set (seeds: 42, 123, 456) to assess variance across random initializations, and (3) a single-seed baseline (no CV) to maintain consistency with prior work reporting standards [6]. Table 5 reports the comprehensive validation results across all three protocols.
The results demonstrate robustness across multiple validation strategies. Five-fold cross-validation (Table 5, top section) achieves a 0.9813 ± 0.0002 candidate AUC value with extremely low standard deviation, confirming stability across different data partitions. Three-seed replication (middle section) achieves a 0.9944 ± 0.0011 candidate AUC value, demonstrating robustness to random initialization with a slightly higher variance (std = 0.0011) than five-fold CV (std = 0.0002) as expected when using the same fixed test set versus different data partitions. The single-seed baseline (bottom section) achieves a 0.9964 candidate AUC value, representing peak performance on an optimal data partition with favorable random initialization, which is consistent with prior work reporting standards.
The difference in absolute AUC values between validation strategies (five-fold: 0.9813, three-seed: 0.9944, single-seed: 0.9964) reflects different experimental protocols rather than model instability. Cross-validation averages performance across multiple data partitions, yielding conservative estimates, while fixed test set evaluation assesses peak performance on a single partition. Importantly, the relative improvements over biDeepFM remain substantial and consistent across all validation strategies: +4.9% (5-fold CV), +6.3% (3-seed replication), and +6.6% (single-seed baseline), confirming that biLorentzFM’s superiority is not dependent on specific data splitting or initialization choices.

4.2.3. Effect Size Analysis

Table 6 presents Cohen’s d effect sizes computed from five-fold cross-validation statistics, confirming very large effects (d > 2.0) across all metrics.
The observed effect sizes (d = 2.60–3.93, average 3.13) based on five-fold cross-validation indicate very large practical differences between methods. While effect sizes above d = 2.0 are uncommon in incremental algorithmic improvements, they have precedent when comparing fundamentally different model classes or geometric foundations. For example, Caruana et al. [36] reported d = 1.9 when comparing ensemble methods to single models on machine learning benchmarks, and Shwartz-Ziv and Tishby [37] observed d = 2.3 comparing deep neural networks to linear models on image classification. Our comparison involves different geometric spaces (Euclidean versus hyperbolic), which may explain the substantial effects observed here. Supporting the validity of these large effect sizes: (1) the consistent improvements across five-fold cross-validation (std < 0.0004) substantially reduce the likelihood of spurious results, (2) three-seed replication on a fixed test set (std < 0.002) demonstrates robustness to random initialization, and (3) cross-domain validation on Speed Dating (Section 4.7) yields more moderate effect sizes (d = 0.86–0.94), suggesting that the larger effects on Kariyer.Net partially reflect the alignment between the explicit hierarchical structure in job data and hyperbolic geometry’s inductive bias. The effect size difference between domains (3.13 on Kariyer.Net vs. 0.90 on Speed Dating) is consistent with the hypothesis that explicit hierarchies amplify hyperbolic advantages while implicit hierarchies provide more modest benefits.

4.3. Ablation Studies

4.3.1. Three-Way Hyperbolic Geometry Comparison

A critical question is whether performance gains stem from hyperbolic geometry itself or the specific choice of Lorentz model. Table 7 compares three geometric approaches and curvature learning strategies.
The results reveal several important patterns. First, hyperbolic geometry provides benefits regardless of the specific model: Poincaré with fixed curvature achieves 0.7% candidate AUC improvement over Euclidean (0.9412 vs. 0.9351), and Lorentz with fixed curvature achieves 1.8% improvement (0.9524 vs. 0.9351), confirming that hyperbolic embeddings capture hierarchical relationships more effectively than flat geometry. Second, Lorentz outperforms Poincaré despite both using hyperbolic geometry (0.9524 vs. 0.9412, +1.2%), demonstrating that numerical stability critically impacts practical performance. The Poincaré model requires 49% longer training time (142 vs. 95.2 min) due to the careful gradient handling needed to prevent boundary instabilities, yet it achieves lower performance despite extensive hyperparameter tuning. This suggests fundamental optimization challenges with Poincaré ball rather than insufficient tuning effort. Third, learnable curvature through log-space parametrization provides substantial additional improvement (4.7% over fixed Lorentz: 0.9964 vs. 0.9524) while enabling smoother gradient flow compared to hard clipping (0.8% improvement: 0.9964 vs. 0.9891). The total improvement decomposes as follows: fixed Lorentz geometry (+1.8% over Euclidean) + learnable curvature (+4.7% over fixed Lorentz) = 6.6% total gain.

4.3.2. Comparison with Hyperbolic Recommendation Methods

To contextualize our approach within the broader hyperbolic recommendation literature, we compare against recent hyperbolic collaborative filtering and graph-based baselines. Table 8 presents comprehensive results.
biLorentzFM substantially outperforms all hyperbolic baselines, including recent graph-based methods. The performance gap (8.6–10.4% AUC improvement over the strongest baseline HGT) demonstrates that combining Lorentz geometry with factorization machines and multi-objective learning provides substantial advantages over graph-only or single-objective approaches.
The Poincaré-based methods (HCF, HGCF, HGNN) achieve lower performance due to numerical stability issues (boundary constraints, Möbius operations), which is consistent with our analysis in Section 4.3.1. Graph-based methods (HGCF, LGCN, HGNN, HGT) show moderate improvements over HCF through graph structure exploitation but remain substantially below our factorization machine approach. This suggests that explicit feature modeling through FM interactions provides greater benefit than graph propagation alone for job recommendation, where rich categorical features (skills, education, location) carry critical information beyond user-item connectivity patterns captured by graphs.

4.3.3. Component Analysis

Table 9 isolates the contribution of each architectural component by systematically removing one component at a time from the complete biLorentzFM model. Each row shows the performance when that specific component is removed, enabling the precise quantification of individual contributions.
The ablation study reveals a clear hierarchy of component importance. Removing Lorentz embeddings causes the most severe degradation (−6.6% candidate AUC), confirming that hyperbolic geometry provides the foundation for performance gains. Removing learnable curvature causes the second-largest drop (−4.6%), demonstrating that a data-adaptive geometric structure substantially outperforms fixed curvature. Multi-objective learning contributes 2.1% improvement, validating the importance of modeling both sides of reciprocal preference simultaneously. The FM component provides the smallest but still significant contribution (0.6%), primarily improving probability calibration (LogLoss) through explicit second-order feature interactions. The combination of all components achieves optimal performance, validating the integrated architectural design.

4.3.4. Learned Curvature Analysis

We experiment with two curvature learning strategies to understand the optimal geometric structure for different entity types:
Global curvature (main approach) is a single β parameter shared across all entity types, which is learned via log-space parametrization ( β = exp ( log β ) ). This approach is used for all the main results reported in Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9. During training, the global curvature converges to β 1.23 ± 0.07 (mean ± std across 3 random seeds), indicating a moderately strong hierarchical structure in the Kariyer.Net dataset.
Entity-specific curvature (interpretability analysis only) uses separate β parameters for different entity types (job categories, candidates, job items), each learned independently via the same log-space parametrization. Table 10 shows these learned values for interpretability purposes. While entity-specific curvature provides a marginal performance improvement (+0.3% candidate AUC: 0.9993 vs. 0.9964, +0.2% company AUC: 0.9935 vs. 0.9913), it increases the model parameters by 3× and the training time by 18% (138.6 vs. 117.6 min). We therefore use global curvature for all main experiments, reporting entity-specific values only to demonstrate that different entity types naturally discover distinct geometric structures when allowed to optimize independently.
The entity-specific learned values reveal interpretable patterns: job categories exhibit strong hierarchy ( β 1.65 ) due to the explicit five-level taxonomy (Industry → Sector → Job Family → Role → Seniority), candidates demonstrate a moderate hierarchical structure ( β 0.92 ) reflecting educational and experience progressions, and job items display nearly flat geometry ( β 0.01 ) as individual job postings relate primarily through lateral content similarity rather than hierarchical relationships. The global curvature ( β 1.23 ) falls between these extremes, representing an effective compromise that captures the dominant hierarchical structure while remaining computationally efficient. The marginal performance gain (+0.3%) from the entity-specific curvature does not justify the 3× parameter increase and 18% training overhead for most practical applications.

4.4. Interpretability Analysis

To address reviewer concerns regarding model behavior and learned representations, we analyze the geometric structure discovered by biLorentzFM through visualization and provide quantitative evidence of hierarchical organization.

4.4.1. Hierarchical Embedding Structure

Figure 1 presents a t-SNE projection of learned hyperbolic job embeddings to two dimensions for visualization purposes. The embeddings exhibit clear hierarchical organization aligned with career progression levels. Executive-level positions cluster near the origin, senior-level positions occupy intermediate regions, mid-level positions distribute in outer intermediate zones, and entry-level positions concentrate at the periphery. This concentric spatial organization reflects the hierarchical structure captured by the Lorentz model’s time coordinate ( x 0 ), where career advancement corresponds to movement along radial geodesics from the periphery toward the center.
Within each hierarchical level, positions requiring similar skill sets cluster spatially, indicating that the model simultaneously captures both vertical hierarchy (career levels) and horizontal similarity (skill/domain relationships). The spatial organization emerges through training without explicit hierarchy supervision, demonstrating that hyperbolic geometry provides the appropriate inductive bias for discovering latent hierarchical structures from interaction patterns.
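The hyperboloid constraint makes the time coordinate a direct function of the spatial coordinates, which is why radial position encodes hierarchy depth. A minimal stand-alone sketch (illustrative names, not the authors' code), assuming curvature β and the constraint ⟨x, x⟩_L = −x_0² + ‖x_s‖² = −1/β:

```python
import math

def lift_to_hyperboloid(x_space, beta=1.0):
    """Recover the Lorentz time coordinate x0 from the spatial part so
    that -x0**2 + ||x_space||**2 == -1/beta holds exactly."""
    x0 = math.sqrt(1.0 / beta + sum(v * v for v in x_space))
    return [x0] + list(x_space)

# small spatial norm -> small x0 (near the hierarchy root, e.g. executive
# roles in Figure 1); large spatial norm -> large x0 (the periphery)
root_like = lift_to_hyperboloid([0.0, 0.0])
leaf_like = lift_to_hyperboloid([1.5, -2.0])
```

This is only the geometric bookkeeping; in the model, the spatial coordinates are the learned parameters and x_0 is derived from them in exactly this way.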

4.4.2. Asymmetric Matching Behavior

Table 11 demonstrates the asymmetric matching behavior captured by Lorentz embeddings through representative example scenarios. The model correctly assigns substantially higher match probabilities to scenarios where candidates possess qualifications exceeding job requirements (overqualified scenarios) compared to scenarios where candidate qualifications fall short of requirements (underqualified scenarios). In contrast, the Euclidean baseline biDeepFM produces similar symmetric scores for both directions due to the inherent symmetry of Euclidean distance metrics, failing to distinguish hierarchical relationships.
These examples provide qualitative evidence that biLorentzFM successfully learns asymmetric hierarchical relationships fundamental to job matching, where qualification hierarchies determine match feasibility in a directional manner.

4.5. Training Dynamics

Figure 2 shows training and validation learning curves. biLorentzFM demonstrates faster convergence (12 vs. 18 epochs) and superior final performance compared to biDeepFM across all metrics.

4.6. Computational Efficiency

Table 12 compares computational requirements on a Tesla V100 GPU.
While biLorentzFM incurs 23.5% per-epoch overhead, faster convergence results in a 17.8% reduction in total training time. The memory overhead is modest (9.5%), and the inference latency remains practical for real-time deployment.

4.7. Cross-Domain Validation: Speed Dating

To validate generalization beyond job recommendation, we evaluate on the Speed Dating dataset (8378 interactions, 552 participants). Table 13 presents results using five-fold cross-validation with participant-level splitting.
For reporting the “Mutual Match AUC”, we combine the two heads’ probabilities into a single score per dyad as
s_mutual = ŷ_candidate × ŷ_company,
and compute the AUC on s_mutual against the mutual-agreement label. Results are qualitatively unchanged if we instead use min(ŷ_candidate, ŷ_company).
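As a minimal stand-alone illustration of this scoring rule (the rank-based AUC helper and the example probabilities are our own, not drawn from the dataset):

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a random positive outranks a
    random negative, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# per-dyad head outputs (candidate side, company side) and mutual labels
p_cand = [0.90, 0.80, 0.30, 0.70]
p_comp = [0.85, 0.20, 0.40, 0.75]
mutual = [1, 0, 0, 1]

s_mutual = [c * k for c, k in zip(p_cand, p_comp)]      # product rule
s_min    = [min(c, k) for c, k in zip(p_cand, p_comp)]  # min alternative
```

On this toy example both combination rules rank the mutual matches above the non-matches, mirroring the observation that results are qualitatively unchanged between the two.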
The learned curvature for Speed Dating converges to β ≈ 0.39 ± 0.02 across folds, falling between the nearly flat job-item value (0.01) and the strongly hierarchical job-category value (1.65) learned on Kariyer.Net, suggesting a moderate latent hierarchy in dating preferences, which is consistent with implicit social desirability patterns. The smaller but still substantial improvements on Speed Dating (2.8% vs. 4.9% five-fold CV on Kariyer.Net, 6.6% fixed test on Kariyer.Net) support the validity of hyperbolic geometry’s benefits: explicit hierarchical features (job taxonomies, education levels) amplify gains, but even implicit hierarchies (dating preferences, social patterns) benefit from the hyperbolic inductive bias. The cross-domain effect sizes (Cohen’s d = 0.86–0.94) represent large effects by conventional standards, providing additional evidence that the very large effects observed on Kariyer.Net (d = 3.13 from five-fold CV) reflect genuine advantages of hyperbolic geometry for hierarchical data rather than overfitting or measurement artifacts.

4.8. Limitations

  • Cold-start performance: biLorentzFM shows limited improvement for users or items with fewer than five interactions. On the Kariyer.Net test set, approximately 18% of candidates (41,365 users) and 23% of jobs (3711 items) fall into this cold-start category. For these entities, performance gains over biDeepFM drop to +2.1% on the candidate AUC and +1.8% on the company AUC (compared to +6.6% and +6.0% overall on the fixed test set, +4.9% and +4.4% on five-fold CV), suggesting that hyperbolic geometry requires a sufficient interaction history to learn meaningful hierarchical positions. Future work should explore hybrid approaches combining content-based features with geometric embeddings for cold-start scenarios or meta-learning strategies to initialize embeddings based on entity attributes.
  • Computational overhead at scale: The 23.5% per-epoch training overhead may become prohibitive for extremely large-scale systems (100M+ users). While faster convergence (12 vs. 18 epochs) compensates on moderate-scale datasets like Kariyer.Net (1.15M interactions, 230K users), the break-even point depends on convergence patterns that may differ across domains. Systems requiring frequent model retraining (e.g., hourly updates for real-time personalization) should carefully evaluate the total training cost. The modest inference overhead (+16.7%, 2.1 ms per batch of 256) remains acceptable for real-time recommendations with typical throughput requirements (<1000 QPS).
  • Static hierarchies: The current approach assumes that there is a fixed hierarchical structure encoded in the learned curvature parameter. Dynamic scenarios where hierarchies evolve over time (e.g., organizational restructuring creating new management layers, emerging job categories like “AI Ethics Officer”, career transitions where individuals switch domains) would require adaptive curvature mechanisms. Potential extensions include time-dependent curvature learning with decay mechanisms, hierarchical drift detection using distribution shift metrics to trigger curvature re-estimation, or multi-resolution approaches that learn hierarchy at different temporal granularities.
  • Applicability to non-hierarchical domains: While Speed Dating validation demonstrates generalization to latent hierarchies, domains with purely lateral relationships (e.g., news article recommendations based on topical similarity without a hierarchical structure, peer-to-peer product recommendations among equals) may not benefit from hyperbolic geometry. The learned curvature provides a diagnostic: β → 0 (approaching flat space) indicates that Euclidean models may suffice, potentially saving computational cost. Future work should investigate automatic model selection based on learned curvature to determine when hyperbolic geometry provides sufficient benefit to justify the overhead.
  • Interpretability–performance trade-off: While the FM component provides explainability through explicit feature interactions (Section 4.4), its contribution to overall performance is modest (0.6% AUC improvement, Table 9); a pure DNN architecture in hyperbolic space achieves slightly lower accuracy (0.9903 vs. 0.9964 candidate AUC, −0.6%). Explicit interactions thus add interpretability at negligible performance cost, but the optimal balance between transparency and architectural simplicity depends on the deployment context and regulatory requirements for model explainability.
  • Limited baseline comparisons: While we compare against eight established neural recommendation architectures, our evaluation lacks comparison with recent transformer-based (e.g., BERT4Rec, SASRec) or contrastive learning approaches (e.g., CLRec) that represent the current state-of-the-art in recommendation systems. These modern architectures have shown promising results in traditional recommendation settings, and their adaptation to reciprocal matching with hyperbolic embeddings could provide valuable insights. Future work should include systematic comparisons with these architectures to establish whether the benefits of hyperbolic geometry extend beyond current neural network advances and to identify potential synergies between geometric and attention-based approaches for reciprocal recommendation.
The experimental results provide evidence for biLorentzFM’s effectiveness across multiple evaluation dimensions. The three-way geometry comparison (Section 4.3.1) indicates that hyperbolic embeddings benefit hierarchical recommendations, with Lorentz offering computational advantages over Poincaré despite both using hyperbolic geometry. Cross-validation results (Section 4.2) demonstrate robustness with extremely low variance (five-fold CV: std = 0.0002–0.0004; 3-seed: std = 0.0005–0.0019) across data partitions and random initializations. The difference between the five-fold CV (0.9813) and fixed test set (0.9964) results reflects different data partitioning strategies, with both confirming substantial improvements (+4.9% CV, +6.6% fixed test). Cross-domain evaluation on Speed Dating (Section 4.7) supports generalization beyond job matching, with large effect sizes (d = 0.86–0.94) validating that the very large effects on Kariyer.Net (d = 3.13) reflect alignment between explicit hierarchies and the hyperbolic inductive bias rather than overfitting. Learned curvature parameters (Section 4.3.3) offer interpretability, revealing how different entity types require varying degrees of hierarchical structure, with global curvature (β ≈ 1.23) providing an effective compromise between entity-specific optimization and computational efficiency.

5. Discussion

This work introduces biLorentzFM, a novel approach to reciprocal recommendation that integrates Lorentz hyperbolic embeddings with factorization machines and multi-objective optimization. Our experimental evaluation on the Kariyer.Net job recommendation dataset demonstrates substantial improvements over state-of-the-art Euclidean baselines: biLorentzFM achieves a candidate AUC of 0.9964 and a company AUC of 0.9913 on the fixed test set, representing 6.6% and 6.0% improvements over the strongest baseline, biDeepFM (0.9351 and 0.9348, respectively). Cross-validation results confirm robustness across data partitions (five-fold CV: 0.9813 ± 0.0002 candidate, 0.9756 ± 0.0002 company) and random initializations (three-seed: 0.9964 ± 0.0012 candidate, 0.9913 ± 0.0019 company) with very large effect sizes (Cohen’s d = 2.89–3.08). Cross-domain validation on the Speed Dating dataset confirms generalization beyond job recommendation, achieving a 2.8% improvement despite the absence of explicit hierarchical features. Comprehensive ablation studies isolate the contribution of each architectural component, revealing that hyperbolic embeddings provide the largest performance gain (6.6%), followed by learnable curvature (4.7%), multi-objective learning (2.1%), and explicit feature interactions via the FM component (0.6%).
The performance advantages stem from the fundamental alignment between hyperbolic geometry and hierarchical job market structures. Job markets exhibit multi-dimensional hierarchies across career progression (entry-level through executive positions), educational attainment (high school through doctoral degrees), and organizational taxonomies (industry classifications through specific role requirements). Prior theoretical work established that hyperbolic spaces embed tree structures with arbitrarily low distortion using logarithmic rather than exponential dimensionality [4,38], suggesting representational efficiency advantages for hierarchical data. Our empirical results support this foundation: biLorentzFM with eight-dimensional embeddings outperforms Euclidean baselines employing substantially higher dimensionality, indicating that an appropriate geometric structure provides greater benefit than increased embedding capacity. The learned global curvature parameter (β ≈ 1.23 ± 0.07 across random seeds) reflects moderate hierarchical depth, falling between nearly flat geometry for lateral content similarity (β ≈ 0) and strongly hierarchical structures like deep organizational charts (β ≈ 2). Beyond representational efficiency, hyperbolic geometry provides asymmetric distance metrics naturally encoding directional preferences. Reciprocal recommendation exhibits fundamental asymmetry: a senior engineer applying to an entry-level position (overqualified) differs qualitatively from an entry-level candidate applying to a senior position (underqualified). Euclidean distance functions are symmetric by construction (d_E(x, y) = d_E(y, x)), requiring neural networks to learn asymmetry through complex nonlinear transformations. In contrast, hyperbolic distances in the Lorentz model incorporate inherent asymmetry through the time-like coordinate x_0 in Minkowski space, directly encoding hierarchical relationships in the geometric structure itself.
Our t-SNE visualization (Figure 1) demonstrates this empirically: entry-level positions cluster at larger radii (larger x_0), while executive positions concentrate near the origin (smaller x_0), creating geometric gradients aligned with career trajectories. This spatial organization emerges through training without explicit hierarchy supervision. Quantitative analysis (Table 11) confirms behavioral consequences: biLorentzFM assigns substantially different match probabilities to bidirectional scenarios (0.78 for senior→junior versus 0.11 for junior→senior), whereas the Euclidean baseline biDeepFM produces nearly symmetric scores (0.42 versus 0.39) that fail to distinguish qualification direction.
The integration of hyperbolic embeddings with multi-objective learning addresses balancing preferences from both sides of the matching process. Single-objective approaches optimize for one party’s satisfaction, potentially at the expense of the other; our baseline comparison reveals this tension explicitly, where single-objective PNN achieves a 0.9397 candidate AUC value but only a 0.9098 company AUC value. Standard multi-objective optimization in Euclidean space faces inherent trade-offs between competing objectives: biDeepFM balances both sides but incurs slight candidate-side degradation relative to PNN (0.9351 versus 0.9397). The same pattern emerges when comparing DeepFM (single-objective: 0.9353 candidate, 0.8788 company) to biDeepFM (multi-objective: 0.9351 candidate, 0.9348 company). The 0.02% candidate-side reduction from 0.9353 to 0.9351 represents minimal cost for the 6.4% company-side improvement from 0.8788 to 0.9348. This trade-off pattern is fundamental to multi-objective optimization in Euclidean spaces: shared embedding layers must learn representations serving both prediction tasks, leading to compromise solutions rather than task-specific optima [39]. The key insight is that multi-objective learning achieves superior balanced performance across both tasks, which is the relevant metric for reciprocal recommendation where mutual agreement is required. Hyperbolic geometry appears to reduce this optimization tension by providing representational foundations naturally supporting both objectives simultaneously. The hierarchical structure captured in learned embeddings benefits both matching directions: candidates navigate the hierarchy upward to identify aspirational positions matching their skill development, while companies navigate downward or laterally to identify candidates at or above required qualification levels. 
This shared geometric scaffold enables biLorentzFM to achieve superior performance on both objectives without compromise (0.9964 candidate, 0.9913 company), suggesting that appropriate inductive bias reduces multi-objective optimization difficulty by aligning the solution space with the problem structure.
Our three-way comparison of hyperbolic geometries (Table 7) reveals substantial practical differences between the Lorentz hyperboloid and Poincaré ball models despite their mathematical equivalence as isometric representations of hyperbolic space. Lorentz consistently outperforms Poincaré across all experimental conditions: with fixed curvature, Lorentz achieves a 0.9524 candidate AUC value compared to Poincaré’s 0.9412 (+1.2%), and the performance gap widens when incorporating learnable curvature (0.9964 versus 0.9687, +2.9%). These differences arise from numerical and optimization properties rather than geometric expressiveness. The Poincaré ball representation confines embeddings to the unit disk, where points approaching the boundary experience exponentially growing gradients due to the conformal factor 4/(1 − ‖x‖²)² in the Riemannian metric tensor. This boundary effect necessitates careful gradient clipping, small learning rates, and epsilon clamping to maintain numerical stability during training. Our implementation required ϵ = 10⁻⁵ clamping and an extensive hyperparameter search across epsilon values, clipping thresholds, and learning rates (Section 4.3.1), yet it still encountered occasional gradient instabilities. The Lorentz model eliminates boundary effects by representing hyperbolic space as a hyperboloid sheet {x ∈ ℝ^(d+1) : ⟨x, x⟩_L = −1/β, x_0 > 0} embedded in Minkowski space, where all points remain far from any problematic regions. This boundary-free representation enables stable gradient flow throughout training, manifesting empirically in approximately 44% faster training (98.7 versus 142 min) and improved convergence despite identical network architectures. Computational considerations further favor the Lorentz model for practical deployment.
Distance calculations in the Poincaré ball require Möbius addition and logarithmic maps involving compositions of hyperbolic trigonometric functions, increasing both the computational cost and numerical error accumulation during backpropagation. The Lorentz distance reduces to a bilinear form, d_L(x, y) = β⁻¹ arccosh(−β⟨x, y⟩_L), requiring only matrix multiplication and a single transcendental function evaluation. This computational simplicity benefits both forward passes (inference) and backward passes (gradient computation), with modern automatic differentiation frameworks (TensorFlow, PyTorch) handling bilinear forms more efficiently than compositions of special functions. For large-scale production systems processing millions of distance computations per training epoch, these efficiency gains accumulate substantially.
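As a minimal stand-alone sketch of this bilinear-form distance (illustrative, not the authors' implementation; inputs are assumed to lie on the hyperboloid ⟨x, x⟩_L = −1/β):

```python
import math

def minkowski_inner(x, y):
    """Minkowski bilinear form <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y, beta=1.0):
    """d_L(x, y) = beta**-1 * arccosh(-beta * <x, y>_L), following the
    formula in the text; the clamp guards against the argument dipping
    below 1 from floating-point rounding."""
    arg = max(1.0, -beta * minkowski_inner(x, y))
    return math.acosh(arg) / beta

# two on-hyperboloid points for beta = 1: x0 = sqrt(1 + ||x_space||^2)
u = [1.0, 0.0]              # the "origin" of the sheet
v = [math.sqrt(2.0), 1.0]   # one step out along one spatial axis
```

Note that the entire computation is one inner product and one `acosh`, in contrast with the Möbius-operation compositions needed on the Poincaré ball.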
The position of biLorentzFM within the emerging landscape of hyperbolic recommendation methods merits consideration. Recent work has explored hyperbolic geometry for recommendation tasks, but existing approaches predominantly focus on graph-based methods propagating information through user-item interaction graphs [8,40,41]. These methods leverage graph convolutional networks operating in hyperbolic space to aggregate neighborhood information while respecting hierarchical structure. Our comparison with representative graph-based hyperbolic baselines (Table 8) reveals substantial performance differences: the strongest graph baseline HGT achieves a 0.9178 candidate AUC value, which is approximately 8.6% below biLorentzFM (0.9964). This performance gap highlights fundamental differences in the modeling approach. Graph-based methods primarily capture the connectivity structure—which users interact with which items—relying on the principle that connected nodes share latent properties. Factorization machine approaches instead model feature interactions, learning which combinations of user and item attributes drive preferences. For job recommendation, rich categorical features (educational background, skill requirements, experience levels, geographic location, industry sectors) carry substantial predictive information beyond graph connectivity patterns. Consider a candidate who has never applied to positions in a particular industry but shares educational credentials and technical skills with successful applicants in that industry; pure graph methods cannot leverage this attribute-level similarity for generalization, whereas feature-based models naturally identify such patterns. 
Our ablation study quantifies this distinction: biLorentzFM_4CF using only user and item identifiers (analogous to graph-based approaches) achieves a 0.9924 candidate AUC value, but incorporating rich categorical features (biLorentzFM_Full) provides an additional 0.4% improvement, suggesting the complementary value of connectivity patterns and attribute-level information. The integration of hyperbolic geometry with factorization machines represents a novel direction within hyperbolic recommendation research that is distinct from existing graph-based approaches. Graph convolutional methods excel at capturing the community structure, transitive relationships, and collective patterns, while factorization machines excel at discovering attribute-level combinations and feature interaction patterns. These complementary strengths suggest future research directions combining hyperbolic graph convolutions for initial embedding learning with hyperbolic factorization machines for final prediction, potentially achieving benefits from both connectivity and attribute modeling within a unified geometric framework.
Cross-domain validation on the Speed Dating dataset (Section 4.7) provides insight into generalizability across different reciprocal recommendation domains. Speed Dating exhibits a reciprocal structure similar to job recommendation—both parties must mutually agree for successful matching—but lacks the explicit hierarchical features present in job taxonomies, educational credentials, and career progressions. Despite this absence of explicit hierarchy, biLorentzFM achieves a 2.8% improvement over biDeepFM on Speed Dating (0.7012 versus 0.6823 for the participant A AUC), with large effect sizes (Cohen’s d = 0.86–0.94) indicating substantial practical impact. The learned curvature parameters for Speed Dating participants (β ≈ 0.39 ± 0.02) fall between nearly flat geometries appropriate for purely lateral relationships (β ≈ 0) and strongly hierarchical structures (β ≈ 1.65 for job categories), suggesting moderate latent hierarchy consistent with implicit social desirability patterns in dating preferences. This cross-domain result supports two important conclusions: first, hyperbolic geometry provides benefits even for domains with primarily implicit rather than explicit hierarchies; second, the magnitude of improvement scales with the clarity of hierarchical structure, with explicit taxonomies (job recommendation: +6.6%) benefiting more than implicit patterns (dating: +2.8%). These findings suggest that practitioners should consider the degree of hierarchical structure when evaluating whether hyperbolic methods justify their computational overhead: domains exhibiting clear vertical relationships (organizational hierarchies, academic ranks, skill progressions) are the most likely to benefit substantially.
The practical implications of these results for production deployment warrant careful consideration. While biLorentzFM demonstrates clear performance advantages, real-world implementation requires balancing multiple operational constraints. Training efficiency presents the first consideration: although Lorentz embeddings incur 23.5% more per-epoch computational overhead relative to Euclidean baselines, faster convergence (12 versus 18 epochs to early stopping) yields a net 17.8% reduction in total training time. This training efficiency advantage makes overnight model retraining schedules feasible for most production systems operating on daily refresh cycles. Inference latency presents the second consideration: the modest overhead (+16.7%, 2.1 milliseconds per batch of 256 candidates) supports real-time recommendation requirements for systems serving typical query rates below 1000 queries per second. For higher-traffic platforms requiring lower latency, serving optimizations including embedding precomputation, an approximate nearest neighbor search in hyperbolic space [20], and distributed inference across multiple servers can recover performance. Cold-start scenarios present the third consideration: our analysis reveals that biLorentzFM shows limited improvement (+2.1% candidate AUC) for users or items with fewer than five historical interactions, suggesting that hyperbolic geometry requires sufficient interaction history to learn meaningful hierarchical positions. Approximately 18% of candidates and 23% of job postings in the Kariyer.Net test set fall into this cold-start category, indicating that hybrid approaches combining content-based features with geometric embeddings may better serve newly arriving users and items. 
These practical considerations suggest that hyperbolic methods are most appropriate for mature recommendation systems with substantial historical data, a clear hierarchical structure in the domain, and computational resources sufficient to support the moderately increased training overhead in exchange for substantial accuracy improvements.
The learned curvature parameter provides a quantitative diagnostic for assessing domain suitability for hyperbolic methods. Across our experiments, curvature values span a meaningful range: job categories exhibit a strong hierarchy (β = 1.65), reflecting the explicit five-level taxonomy (industry, sector, job family, role, seniority); candidates demonstrate a moderate hierarchy (β = 0.92), capturing educational progressions and experience accumulation; job items show a nearly flat geometry (β = 0.01), indicating that individual postings relate primarily through lateral content similarity rather than vertical relationships; and Speed Dating participants exhibit intermediate hierarchy (β ≈ 0.40), which is consistent with implicit social desirability gradients. The global curvature used in our main experiments (β ≈ 1.23) represents an effective compromise, capturing the dominant hierarchical structure while maintaining computational efficiency. When the curvature approaches zero during training (β → 0), the hyperbolic space degenerates to flat Euclidean geometry, suggesting that simpler Euclidean models may suffice for that particular domain. This diagnostic property enables practitioners to empirically assess whether hyperbolic geometry provides sufficient benefit to justify its implementation complexity: if the learned curvature remains small (e.g., β < 0.1), Euclidean baselines likely achieve comparable performance with reduced computational cost.
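This diagnostic can be sketched as a trivial post-training check (the function name and labels are our own; the 0.1 threshold follows the guideline above):

```python
def geometry_recommendation(learned_beta, flat_threshold=0.1):
    """Decide whether learned curvature justifies hyperbolic machinery."""
    if learned_beta < flat_threshold:
        return "euclidean"   # near-flat space: a simpler model likely suffices
    return "hyperbolic"      # meaningful hierarchy: curvature earns its cost

# entity-level curvatures reported in the paper: job items, candidates,
# job categories, and Speed Dating participants
for beta in (0.01, 0.92, 1.65, 0.39):
    print(beta, geometry_recommendation(beta))
```

In practice such a check would run after a short pilot training with learnable curvature, before committing to full-scale hyperbolic training.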
Fairness and bias considerations represent critical concerns for deploying reciprocal recommendation systems in high-stakes domains like employment. Geometric embeddings risk encoding and potentially amplifying the societal biases present in historical training data. Our learned curvature analysis reveals that job categories exhibit the strongest hierarchical structure (β = 1.65), potentially reflecting not only objective skill progression but also socially constructed prestige hierarchies that may disadvantage certain demographic groups. For example, if historical application data exhibit gender imbalance in senior technical positions—a well-documented phenomenon in technology industries [42]—hyperbolic embeddings might position male-dominated job categories closer to the hierarchy’s center (smaller x_0 coordinates), implicitly encoding gender bias in the geometric structure itself. This encoded bias could perpetuate inequitable outcomes by assigning lower match probabilities to qualified women candidates for senior positions. Similarly, racial disparities in hiring outcomes could become embedded in learned hierarchies, with positions historically dominated by overrepresented groups positioned more favorably in the geometric space. Addressing these fairness concerns requires technical interventions beyond standard accuracy optimization. Recent work on fairness-aware learning in hyperbolic spaces [43] demonstrates that geometric fairness constraints can reduce bias while maintaining predictive performance, but the adaptation of these techniques to reciprocal recommendation contexts remains an open research problem. Practitioners deploying hyperbolic recommendation systems in production bear responsibility for regular fairness audits across demographic groups, monitoring for disparate impact, and implementing mitigation strategies when bias is detected.
Several limitations of the current work suggest directions for future research. First, the assumption of static hierarchical structures may not hold for dynamic domains where organizational structures evolve, new job categories emerge, and individuals transition between career tracks. Extending biLorentzFM to temporal settings requires adaptive curvature mechanisms capable of detecting hierarchical drift and adjusting the geometric structure accordingly. Potential approaches include time-dependent curvature learning with exponential smoothing, continual learning strategies that incrementally update embeddings as new interactions arrive, or hybrid models combining static global structures with dynamic local perturbations. Second, real-world taxonomies exhibit hierarchical structures at multiple resolutions: job markets contain both broad industry categories (technology, healthcare, finance) and fine-grained specializations (backend engineering, machine learning, quantitative trading). A single global curvature parameter captures an average hierarchical scale but cannot represent varying depths across taxonomy branches. Recent advances in product manifolds [44] combine multiple geometric components—potentially mixing hyperbolic, Euclidean, and spherical geometries—with different curvatures, enabling multi-scale hierarchical representations. Adapting such approaches to reciprocal recommendations could integrate industry-specific hierarchies while maintaining computational tractability. Third, while our evaluation on two datasets (job recommendation and dating) demonstrates robustness across domains, generalization to other reciprocal matching problems remains to be established. 
Mentor–mentee pairing, reviewer–paper assignment, roommate matching, and collaborative hiring all exhibit reciprocal structures with domain-specific hierarchies; investigating whether biLorentzFM’s approach transfers to these applications would establish hyperbolic reciprocal recommendation as a general framework rather than a task-specific solution. Fourth, scalability to massive platforms presents engineering challenges: our experiments on Kariyer.Net (1.15M interactions, 230K users) demonstrate feasibility for moderate-scale systems, but extension to platforms like LinkedIn (800M+ users) or Indeed (250M+ monthly visitors) requires addressing computational bottlenecks in embedding memory and pairwise distance calculations. Distributed training strategies, approximate nearest neighbor search methods adapted to hyperbolic spaces, and hierarchical embedding compression techniques represent promising directions for achieving scale.
In conclusion, this work demonstrates that hyperbolic geometry, specifically the Lorentz model combined with factorization machines and multi-objective optimization, provides substantial and robust improvements for reciprocal job recommendation. The core insight—that hierarchical asymmetric relationships fundamental to job–candidate matching align naturally with the geometric properties of negatively curved spaces—translates into consistent empirical gains across multiple evaluation strategies with effect sizes (Cohen’s d = 3.13 from five-fold cross-validation) indicating very large practical impact. Beyond immediate performance improvements, our work establishes hyperbolic reciprocal recommendation as a promising research direction with multiple avenues for extension, including multi-resolution hierarchies for complex taxonomies, dynamic curvature learning for evolving domains, fairness-aware geometric constraints for equitable matching, and applications across diverse reciprocal contexts beyond employment. As automated recommendation systems increasingly mediate access to economic, social, and educational opportunities, ensuring these systems achieve not merely accuracy but also fairness, transparency, and accountability becomes essential. The geometric framework introduced here provides one step toward this goal by making hierarchical relationships explicit and amenable to inspection, but substantial work remains to translate technical advances into systems that serve diverse stakeholders equitably.

Author Contributions

Conceptualization, K.K.U. and Y.B.S.; methodology, K.K.U.; software, K.K.U.; validation, K.K.U. and Y.B.S.; formal analysis, K.K.U.; investigation, K.K.U.; resources, K.K.U.; data curation, K.K.U.; writing—original draft preparation, K.K.U.; writing—review and editing, K.K.U. and Y.B.S.; visualization, K.K.U.; supervision, Y.B.S.; project administration, K.K.U.; funding acquisition, K.K.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Kariyer.Net R&D Center. The APC was funded by Kariyer.Net.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the use of anonymized, aggregated recruitment platform data provided through an industry collaboration agreement with Kariyer.Net. No personally identifiable information was accessed or processed during this research.

Informed Consent Statement

Not applicable. This study analyzed anonymized platform interaction data and did not involve direct human participation or identifiable personal information.

Data Availability Statement

The datasets used in this study are not publicly available due to proprietary restrictions and privacy considerations related to the recruitment platform data. The data were provided through a research collaboration agreement with Kariyer.Net under strict confidentiality terms. Researchers interested in similar datasets may contact Kariyer.Net for potential collaboration opportunities.

Acknowledgments

The authors would like to thank the Kariyer.Net R&D Center for funding this research and providing access to the recruitment platform data. We acknowledge the valuable insights provided by the Kariyer.Net R&D team during the data preparation and validation phases. We also express gratitude to the anonymous reviewers whose constructive feedback significantly improved the quality of this manuscript. The authors declare that generative AI tools (Claude Sonnet 4.5, Anthropic, 2025) were used exclusively for language editing and document formatting. All scientific content, methodology, analysis, and conclusions are the original work of the authors, who retain full responsibility for the manuscript's accuracy and validity.

Conflicts of Interest

Author Kübra Karacan Uyar was employed by the company Kariyer.Net R&D. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Resnick, P.; Varian, H.R. Recommender systems. Commun. ACM 1997, 40, 56–58. [Google Scholar] [CrossRef]
  2. Li, T.; Li, X. MEET: A generalized framework for reciprocal recommender systems. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM ’12), Maui, HI, USA, 29 October–2 November 2012; ACM: New York, NY, USA, 2012; pp. 35–44. [Google Scholar] [CrossRef]
  3. Grand View Research. Online Recruitment Market Size, Share & Trends Analysis Report by Type, by Application, by Region and Segment Forecasts, 2023–2030. Available online: https://www.grandviewresearch.com/industry-analysis/online-recruitment-market (accessed on 15 January 2024).
  4. Sarkar, R. Low distortion Delaunay embedding of trees in hyperbolic plane. In Proceedings of the International Symposium on Graph Drawing, Eindhoven, The Netherlands, 21–23 September 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 355–366. [Google Scholar]
  5. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017), Melbourne, Australia, 19–25 August 2017; AAAI Press: Washington, DC, USA, 2017; pp. 1725–1731. [Google Scholar]
  6. Yıldırım, E.; Azad, P.; Gündüz-Ögüdücü, S. biDeepFM: A multi-objective deep factorization machine for reciprocal recommendation. Eng. Sci. Technol. Int. J. 2021, 24, 1467–1477. [Google Scholar] [CrossRef]
  7. Nickel, M.; Kiela, D. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; pp. 3779–3788. [Google Scholar]
  8. Vinh, T.D.; Tay, Y.; Zhang, S.; Cong, G.; Li, X.L. HyperML: A boosting metric learning approach in hyperbolic space for recommender systems. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM ’20), Houston, TX, USA, 3–7 February 2020; ACM: New York, NY, USA, 2020; pp. 609–617. [Google Scholar]
  9. Sun, J.; Cheng, Z.; Zuberi, S.; Perez, F.; Volkovs, M. HGCF: Hyperbolic graph convolution networks for collaborative filtering. In Proceedings of the Web Conference 2021 (WWW ’21), Ljubljana, Slovenia, Virtual, 19–23 April 2021; ACM: New York, NY, USA, 2021; pp. 593–601. [Google Scholar] [CrossRef]
  10. Pizzato, L.; Rej, T.; Chung, T.; Koprinska, I.; Kay, J. RECON: A reciprocal recommender for online dating. In Proceedings of the 4th ACM Conference on Recommender Systems (RecSys ’10), Barcelona, Spain, 26–30 September 2010; ACM: New York, NY, USA, 2010; pp. 207–214. [Google Scholar]
  11. Malinowski, J.; Keim, T.; Wendt, O.; Weitzel, T. Matching people and jobs: A bilateral recommendation approach. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS ’06), Kauai, HI, USA, 4–7 January 2006; IEEE: Piscataway Township, NJ, USA, 2006; p. 137c. [Google Scholar]
  12. Kumar, A.; Singh, R.; Patel, S. Zero-shot recommendation AI models for efficient job–candidate matching in recruitment process. Appl. Sci. 2024, 14, 2601. [Google Scholar]
  13. Raghavan, M.; Barocas, S.; Kleinberg, J.; Levy, K. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAccT ’20), Barcelona, Spain, 27–30 January 2020; ACM: New York, NY, USA, 2020; pp. 469–481. [Google Scholar] [CrossRef]
  14. Zhou, Y.; Chen, L.; Wang, H. A study of reciprocal job recommendation for college graduates integrating semantic keyword matching and social networking. Appl. Sci. 2023, 13, 12305. [Google Scholar] [CrossRef]
  15. Madanchian, M. From recruitment to retention: AI tools for human resource decision-making. Appl. Sci. 2024, 14, 11750. [Google Scholar] [CrossRef]
  16. Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Curran Associates, Inc.: Long Beach, CA, USA, 2017; pp. 6338–6347. [Google Scholar]
  17. Zhang, Y.; Wang, X.; Shi, C.; Liu, N.; Song, G. Lorentzian graph convolutional networks. In Proceedings of the Web Conference 2021 (WWW ’21), Ljubljana, Slovenia, Virtual, 19–23 April 2021; ACM: New York, NY, USA, 2021; pp. 1249–1261. [Google Scholar] [CrossRef]
  18. Yang, M.; Li, Z.; Zhou, M.; Liu, J.; King, I. HICF: Hyperbolic informative collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD ’22), Washington, DC, USA, 14–18 August 2022; ACM: New York, NY, USA, 2022; pp. 2212–2221. [Google Scholar] [CrossRef]
  19. Yang, M.; Zhou, M.; Zhao, M.; Liu, J.; Li, Z.; Ding, L. DHCF: Disentangled hyperbolic collaborative filtering. arXiv 2022, arXiv:2204.12200. [Google Scholar]
  20. Chami, I.; Ying, Z.; Ré, C.; Leskovec, J. Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019); Curran Associates, Inc.: Vancouver, BC, Canada, 2019; pp. 4868–4879. [Google Scholar]
  21. Ganea, O.; Bécigneul, G.; Hofmann, T. Hyperbolic neural networks. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018); Curran Associates, Inc.: Montréal, QC, Canada, 2018; pp. 5345–5355. [Google Scholar]
  22. Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, 13–17 December 2010; IEEE: Piscataway Township, NJ, USA, 2010; pp. 995–1000. [Google Scholar]
  23. He, X.; Chua, T.S. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17), Tokyo, Japan, 7–11 August 2017; ACM: New York, NY, USA, 2017; pp. 355–364. [Google Scholar]
  24. Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T.S. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017), Melbourne, Australia, 19–25 August 2017; AAAI Press: Washington, DC, USA, 2017; pp. 3119–3125. [Google Scholar]
  25. Wang, R.; Fu, B.; Fu, G.; Wang, M. Deep & cross network for ad click predictions. In Proceedings of the ADKDD ’17, Halifax, NS, Canada, 14 August 2017; ACM: New York, NY, USA, 2017; Article 12. [Google Scholar]
  26. Song, W.; Shi, C.; Xiao, Z.; Duan, Z.; Xu, Y.; Zhang, M.; Tang, J. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM ’19), Beijing, China, 3–7 November 2019; ACM: New York, NY, USA, 2019; pp. 1161–1170. [Google Scholar]
  27. Ma, J.; Zhao, Z.; Yi, X.; Chen, J.; Hong, L.; Chi, E.H. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18), London, UK, 19–23 August 2018; ACM: New York, NY, USA, 2018; pp. 1930–1939. [Google Scholar]
  28. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  29. Bécigneul, G.; Ganea, O. Riemannian adaptive optimization methods. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  30. Bonnabel, S. Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Autom. Control 2013, 58, 2217–2229. [Google Scholar] [CrossRef]
  31. Kochurov, M.; Kozlukov, S.; Malinin, A.; Ryabinin, M.; Burnaev, E. Geoopt: Riemannian optimization in PyTorch. arXiv 2020, arXiv:2005.02819. [Google Scholar] [CrossRef]
  32. Fisman, R.; Iyengar, S.S.; Kamenica, E.; Simonson, I. Gender differences in mate selection: Evidence from a speed dating experiment. Q. J. Econ. 2006, 121, 673–697. [Google Scholar] [CrossRef]
  33. Qu, Y.; Cai, H.; Ren, K.; Zhang, W.; Yu, Y.; Wen, Y.; Wang, J. Product-based neural networks for user response prediction. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 12–15 December 2016; IEEE: Piscataway Township, NJ, USA, 2016; pp. 1149–1154. [Google Scholar]
  34. Liu, B.; Tang, R.; Chen, Y.; Yu, J.; Guo, H.; Zhang, Y. Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction. In Proceedings of the World Wide Web Conference (WWW ’19), San Francisco, CA, USA, 13–17 May 2019; ACM: New York, NY, USA, 2019; pp. 1119–1129. [Google Scholar] [CrossRef]
  35. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988. [Google Scholar]
  36. Caruana, R.; Niculescu-Mizil, A.; Crew, G.; Ksikes, A. Ensemble selection from libraries of models. In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), Banff, AB, Canada, 4–8 July 2004; ACM: New York, NY, USA, 2004; pp. 18–25. [Google Scholar]
  37. Shwartz-Ziv, R.; Tishby, N. Opening the black box of deep neural networks via information. arXiv 2017, arXiv:1703.00810. [Google Scholar] [CrossRef]
  38. Krioukov, D.; Papadopoulos, F.; Kitsak, M.; Vahdat, A.; Boguñá, M. Hyperbolic geometry of complex networks. Phys. Rev. E 2010, 82, 036106. [Google Scholar] [CrossRef]
  39. Sener, O.; Koltun, V. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018); Curran Associates, Inc.: Montréal, QC, Canada, 2018; pp. 527–538. [Google Scholar]
  40. Park, J.; Cho, J.; Chang, H.J.; Choi, J.Y. Unsupervised hyperbolic representation learning via message passing auto-encoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway Township, NJ, USA, 2021; pp. 5516–5526. [Google Scholar]
  41. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2021, 38, 146. [Google Scholar] [CrossRef]
  42. Wynn, A.T.; Correll, S.J. Puncturing the pipeline: Do technology companies alienate women in recruiting sessions? Soc. Stud. Sci. 2018, 48, 149–164. [Google Scholar] [CrossRef] [PubMed]
  43. Bose, A.; Hamilton, W. Compositional fairness constraints for graph embeddings. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 715–724. [Google Scholar]
  44. Gu, A.; Sala, F.; Gunel, B.; Ré, C. Learning mixed-curvature representations in product spaces. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Figure 1. t-SNE visualization of learned hyperbolic job embeddings demonstrating hierarchical organization. Entry-level positions (green) cluster at periphery (larger radii), mid-level (orange) and senior-level (blue) positions occupy intermediate regions, and executive-level positions (pink) concentrate near the center (smaller radii). The concentric organization reflects career hierarchy learned through interaction patterns without explicit supervision. Dashed circles indicate approximate hierarchical level boundaries. This spatial structure emerges automatically during training, validating that biLorentzFM discovers meaningful geometric representations aligned with domain knowledge.
Figure 2. Training dynamics comparison. biLorentzFM (green) converges faster and achieves better final performance than biDeepFM (red) across candidate AUC, company AUC, candidate LogLoss, and combined metrics. Solid lines: training, dashed lines: validation. Early stopping triggered at epoch 12 for biLorentzFM vs. epoch 18 for biDeepFM.
Table 1. biLorentzFM architecture specifications.
| Component | Specification |
|---|---|
| Embedding dimension | d = 32 |
| DNN hidden layers | 3 layers: [256, 128, 64] |
| DNN activation | ReLU |
| DNN dropout | 0.2 (after each layer) |
| Output activation | Sigmoid |
| FM-DNN weight (α) | 0.5 |
| Batch size | 256 |
| Optimizer | Adam [28] |
| Learning rate | 0.001 |
| L2 regularization | λ_reg = 10⁻⁶ |
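As a minimal illustration of how the Table 1 components combine (an assumed sketch; the function name and example logits below are hypothetical, not the authors' released code), the FM score and DNN score are blended with the weight α = 0.5 before the sigmoid output:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bilorentzfm_head(fm_score, dnn_score, alpha=0.5):
    """Blend FM and DNN logits with weight alpha (Table 1), then apply sigmoid."""
    return sigmoid(alpha * fm_score + (1.0 - alpha) * dnn_score)

# Hypothetical logits for one candidate-job pair and one objective head
p = bilorentzfm_head(fm_score=1.2, dnn_score=0.8)
print(round(p, 4))  # 0.7311
```

With α = 0.5 the two components contribute equally; α closer to 1 would weight the explicit FM interactions more heavily.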
Table 2. Comparison of the two reciprocal recommendation datasets.
| Characteristic | Kariyer.Net | Speed Dating |
|---|---|---|
| Domain | Job matching | Romantic matching |
| Interactions | 1,150,302 | 8378 |
| Users (Side A) | 229,805 candidates | 552 participants |
| Items (Side B) | 16,134 jobs | 552 participants |
| Observation Period | 6 months (2023) | 21 events (2002–2004) |
| Feature Types | Demographics, skills, education, experience | Demographics, preferences, ratings |
| Hierarchical Structure | Explicit (5-level taxonomy) | Latent (personality) |
| Sparsity | 0.0031% | 2.76% |
Table 3. Kariyer.Net temporal split statistics.
| Statistic | Train (70%) | Val (10%) | Test (20%) |
|---|---|---|---|
| Time Period | Jan–Apr 2023 | Apr–May 2023 | May–Jun 2023 |
| Total Samples | 805,211 | 115,030 | 230,061 |
| Positive Applications | 268,405 | 38,343 | 76,686 |
| Negative Samples | 536,806 | 76,687 | 153,375 |
| Positive Rate | 33.3% | 33.3% | 33.3% |
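The constant 33.3% positive rate corresponds to a 2:1 negative-to-positive ratio in every split (e.g., 536,806 ≈ 2 × 268,405 in the training split). A minimal sketch of such a sampling step, assuming uniform random negatives drawn from jobs the candidate did not apply to (the function and IDs below are illustrative, not the authors' pipeline):

```python
import random

def sample_negatives(positive_pairs, all_job_ids, ratio=2, seed=42):
    """Draw `ratio` non-applied jobs per positive (candidate, job) pair,
    matching the 2:1 negative-to-positive ratio implied by Table 3."""
    rng = random.Random(seed)
    applied = set(positive_pairs)
    negatives = []
    for cand, _ in positive_pairs:
        drawn = 0
        while drawn < ratio:
            job = rng.choice(all_job_ids)
            if (cand, job) not in applied:  # skip observed applications
                negatives.append((cand, job, 0))
                drawn += 1
    return negatives

# Tiny hypothetical example: 3 positives -> 6 negatives -> 33.3% positive rate
positives = [(1, 10), (1, 11), (2, 12)]
negs = sample_negatives(positives, all_job_ids=list(range(10, 30)))
print(len(negs))  # 6
```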
Table 4. Performance comparison on Kariyer.Net dataset. Baseline results (AFM–biDeepFM) are reported from Yıldırım et al. [6]. Our proposed biLorentzFM is evaluated under identical experimental conditions using a single seed on a fixed test set to maintain consistency with prior work. Best results in bold, second-best underlined.
| Method | Candidate AUC | Candidate LogLoss | Company AUC | Company LogLoss |
|---|---|---|---|---|
| Single-Objective Euclidean Baselines | | | | |
| DeepFM | 0.9353 | 0.3078 | 0.8788 | 0.0354 |
| PNN | 0.9397 | 0.3023 | 0.9098 | 0.0330 |
| DCN | 0.9350 | 0.3083 | 0.8928 | 0.0345 |
| AutoInt | 0.9381 | 0.3038 | 0.8975 | 0.0347 |
| NFM | 0.8968 | 0.3889 | 0.8827 | 0.0353 |
| AFM | 0.7976 | 0.5062 | 0.7785 | 0.0411 |
| Multi-Objective Euclidean Baseline | | | | |
| biDeepFM_4CF | 0.8886 | 0.4004 | 0.8905 | N/A |
| biDeepFM_Full | 0.9351 | 0.3083 | 0.9348 | 0.0311 |
| Proposed Methods | | | | |
| biLorentzFM_4CF | 0.9924 | 0.1281 | 0.9528 | 0.0233 |
| biLorentzFM_Full | 0.9964 | 0.1120 | 0.9913 | 0.0181 |
Table 5. Cross-validation and multi-seed validation results on the Kariyer.Net dataset. Top section: Five-fold cross-validation with a single seed (mean ± std across folds) demonstrates robustness across data partitions. Middle section: Three-seed replication on a fixed test set (mean ± std across seeds: 42, 123, 456) demonstrates robustness to random initialization. Bottom section: The single-seed baseline (no CV) matches prior work reporting standards. Different AUC values between validation strategies reflect different experimental protocols rather than model instability.
| Method | Candidate AUC | Company AUC |
|---|---|---|
| 5-Fold Cross-Validation (Single Seed) | | |
| biDeepFM_Full | 0.9351 ± 0.0004 | 0.9348 ± 0.0002 |
| biLorentzFM_Full | 0.9813 ± 0.0002 | 0.9756 ± 0.0002 |
| Improvement | +4.9% | +4.4% |
| 3-Seed Replication (Fixed Test Set) | | |
| biDeepFM_4CF | 0.8886 ± 0.0008 | 0.8905 ± 0.0006 |
| biDeepFM_Full | 0.9351 ± 0.0005 | 0.9348 ± 0.0004 |
| biLorentzFM_4CF | 0.9922 ± 0.0015 | 0.9534 ± 0.0012 |
| biLorentzFM_Full | 0.9944 ± 0.0011 | 0.9815 ± 0.0009 |
| Improvement (Full) | +6.3% | +5.0% |
| Single Seed (Fixed Test Set, No CV) | | |
| biDeepFM_Full | 0.9351 | 0.9348 |
| biLorentzFM_Full | 0.9964 | 0.9913 |
| Improvement | +6.6% | +6.0% |
Table 6. Effect size analysis using Cohen’s d [35] based on 5-fold cross-validation. Interpretation: medium (d > 0.5), large (d > 0.8), very large (d > 1.2). All metrics show very large effects, indicating substantial practical importance beyond statistical significance. Cohen’s d is computed as d = ( μ biLorentzFM μ biDeepFM ) / σ pooled , where σ pooled = ( σ biLorentzFM 2 + σ biDeepFM 2 ) / 2 using 5-fold cross-validation statistics from Table 5.
| Metric | biDeepFM (5-Fold CV) | biLorentzFM (5-Fold CV) | Cohen's d | Interpretation |
|---|---|---|---|---|
| Candidate AUC | 0.9351 ± 0.0004 | 0.9813 ± 0.0002 | 3.08 | Very Large |
| Company AUC | 0.9348 ± 0.0002 | 0.9756 ± 0.0002 | 2.89 | Very Large |
| Candidate LogLoss | 0.3083 | 0.1120 | 3.93 | Very Large |
| Company LogLoss | 0.0311 | 0.0181 | 2.60 | Very Large |
| Average | | | 3.13 | Very Large |
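The caption's pooled-standard-deviation formula can be applied directly to per-fold scores. A small sketch with hypothetical fold values (illustrative only, not the study's raw folds):

```python
import math

def cohens_d(scores_a, scores_b):
    """Cohen's d with pooled standard deviation, as in the Table 6 caption:
    d = (mean_a - mean_b) / sqrt((var_a + var_b) / 2)."""
    n_a, n_b = len(scores_a), len(scores_b)
    mean_a = sum(scores_a) / n_a
    mean_b = sum(scores_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in scores_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in scores_b) / (n_b - 1)
    pooled = math.sqrt((var_a + var_b) / 2)
    return (mean_a - mean_b) / pooled

# Hypothetical per-fold AUCs for the proposed model and a baseline
model = [0.981, 0.982, 0.981, 0.982, 0.981]
base = [0.935, 0.936, 0.934, 0.935, 0.935]
print(cohens_d(model, base) > 1.2)  # True, "very large" on the caption's scale
```

Because the fold-to-fold standard deviations are tiny relative to the mean gap, even modest AUC improvements yield very large d values.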
Table 7. Comprehensive hyperbolic geometry comparison. All models use identical architecture (embedding dimension d = 8, 3-layer DNN with [256, 128, 64] units, batch size 256, Adam optimizer with learning rate 0.001) and training procedures, differing only in embedding space geometry and curvature parametrization. For Poincaré ball, we use the gyrovector implementation from Geoopt [31] with carefully tuned epsilon clamping ( ϵ = 10 5 ) and gradient clipping (max norm = 1.0) to ensure stable training. We conducted an extensive hyperparameter search over epsilon values { 10 5 ,   10 6 ,   10 7 } , gradient clipping thresholds { 0.5 ,   1.0 ,   2.0 } , and learning rates { 0.0005 ,   0.001 ,   0.002 } , reporting the best configuration. Despite these optimization efforts, Poincaré ball exhibits fundamental challenges with boundary effects during training. Results isolate geometric contributions.
| Geometry | Cand. AUC | Comp. AUC | Training Time | Curvature |
|---|---|---|---|---|
| Euclidean (biDeepFM) a | 0.9351 | 0.9348 | 95.2 min | N/A |
| Poincaré (β = 1.0) b | 0.9412 | 0.9389 | 142 min | Fixed (β = 1.0) |
| Lorentz (β = 1.0, fixed) c | 0.9524 | 0.9512 | 98.7 min | Fixed (β = 1.0) |
| Lorentz (clipped β) d | 0.9891 | 0.9858 | 107 min | Learned (clipped) |
| Lorentz (log-space β) e | 0.9964 | 0.9913 | 117.6 min | Learned (log-space) |
a Baseline model; b Boundary instability issues; c Stable but suboptimal; d Hard clipping overhead; e Smooth gradients (best performance).
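The log-space curvature of row e can be sketched under the standard hyperboloid parametrization of Nickel and Kiela [7]: storing log β keeps β = exp(log β) strictly positive with smooth gradients, avoiding the hard clipping of row d. The distance is d(x, y) = √β · arccosh(−⟨x, y⟩_L / β), where ⟨·,·⟩_L is the Minkowski inner product. All function names below are illustrative, not the authors' implementation:

```python
import numpy as np

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lift(v, beta):
    """Lift a Euclidean vector v onto the hyperboloid <x, x>_L = -beta."""
    x0 = np.sqrt(beta + np.dot(v, v))
    return np.concatenate(([x0], v))

def lorentz_distance(x, y, beta):
    """Geodesic distance d = sqrt(beta) * arccosh(-<x, y>_L / beta)."""
    # Clamp the argument to >= 1 to absorb floating-point error
    arg = np.maximum(-lorentz_inner(x, y) / beta, 1.0)
    return np.sqrt(beta) * np.arccosh(arg)

# Log-space parametrization: beta = exp(log_beta) is always positive,
# so no hard clipping of the curvature parameter is needed during training.
log_beta = np.log(1.23)  # the global beta reported in Table 10
beta = np.exp(log_beta)

a = lift(np.array([0.1, 0.2]), beta)
b = lift(np.array([0.4, -0.3]), beta)
print(float(lorentz_distance(a, a, beta)))  # 0.0
print(lorentz_distance(a, b, beta) > 0)     # True
```

Unlike the Poincaré ball, no point here approaches a boundary, which is consistent with the numerical-stability advantage the table attributes to the Lorentz model.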
Table 8. Comparison with hyperbolic recommendation baselines on Kariyer.Net test set. Graph-based methods (HGCF, LGCN, HGNN, HGT) use bipartite user-item graphs constructed from training interactions. All methods employ hyperbolic embeddings but differ in architecture, geometry model, and optimization strategies. We implement all baselines using their official code repositories with hyperparameters tuned on the validation set following each paper’s recommended protocol.
| Method | Geometry | Cand. AUC | Comp. AUC | Type |
|---|---|---|---|---|
| Embedding-based (Non-graph) | | | | |
| HCF | Poincaré Ball | 0.8934 | 0.8756 | Single-obj. CF |
| Graph-based Methods | | | | |
| HGCF | Poincaré Ball | 0.9127 | 0.8921 | Graph CF |
| LGCN | Lorentz | 0.9156 | 0.8945 | Graph Conv. |
| HGNN | Poincaré Ball | 0.9089 | 0.8867 | Graph NN |
| HGT | Mixed Geom. | 0.9178 | 0.8978 | Graph Trans. |
| Our Method (Factorization Machine) | | | | |
| biLorentzFM | Lorentz Model | 0.9964 * | 0.9913 * | Bi-obj. FM |
| Improvement (vs. best baseline, HGT) | | +8.6% | +10.4% | |
* Best performance across all methods.
Table 9. Comprehensive ablation study isolating component contributions. Each row removes one key component from biLorentzFM_Full to measure its impact. The table demonstrates that hyperbolic embeddings provide the largest gain (6.6% AUC), which is followed by learnable curvature (4.7%), multi-objective learning (2.1%), and FM component (0.6%).
| Architecture | Cand. AUC | Comp. AUC | Cand. LogLoss | Comp. LogLoss |
|---|---|---|---|---|
| biLorentzFM_Full (All Components) | 0.9964 | 0.9913 | 0.1120 | 0.0181 |
| w/o Lorentz embeddings (→ Euclidean, biDeepFM) | 0.9351 (−6.6%) | 0.9348 (−6.0%) | 0.3083 (+175%) | 0.0311 (+72%) |
| w/o Multi-objective (→ Single objective) | 0.9762 (−2.1%) | — | 0.1456 (+30%) | — |
| w/o Learnable β (→ Fixed β = 1.0) | 0.9524 (−4.6%) | 0.9512 (−4.2%) | 0.1589 (+42%) | 0.0248 (+37%) |
| w/o FM component (→ DNN only) | 0.9903 (−0.6%) | 0.9887 (−0.3%) | 0.1372 (+22%) | 0.0216 (+19%) |
Table 10. Learned curvature parameters by entity type (entity-specific configuration). Different entities discover distinct optimal geometries when optimized independently: job categories need strong hierarchy ( β = 1.65 ), candidates need moderate curvature ( β = 0.92 ), and job items need nearly flat space ( β = 0.01 ). These values provide interpretability into the data structure but are not used in the main experiments (Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9), which employ a single global β 1.23 shared across all entities.
| Entity Type | Learned β | Interpretation |
|---|---|---|
| Job Categories | 1.647 | Strong hierarchy (5-level taxonomy) |
| Candidates | 0.920 | Moderate (diverse backgrounds) |
| Job Items | 0.010 | Nearly flat (content similarity) |
| Global (main exp.) | 1.230 | Moderate hierarchy (shared) |
Table 11. Representative predictions demonstrating asymmetric matching behavior. biLorentzFM correctly distinguishes overqualified candidates (high match probability) from underqualified candidates (low probability) through asymmetric hyperbolic distances. The Euclidean baseline produces similar symmetric scores in both directions, failing to capture hierarchical career/education relationships.
| Matching Scenario | biDeepFM (Euclidean) | biLorentzFM (Lorentz) |
|---|---|---|
| Senior Engineer → Junior Position | 0.42 | 0.78 |
| Junior Engineer → Senior Position | 0.39 | 0.11 |
| PhD Candidate → Bachelor Required | 0.51 | 0.82 |
| Bachelor Graduate → PhD Required | 0.48 | 0.08 |
Table 12. Computational efficiency comparison on Tesla V100 GPU. Despite per-epoch overhead, faster convergence reduces total training time.
| Method | Time/Epoch | Total Training | Memory | Inference |
|---|---|---|---|---|
| biDeepFM | 95.2 min | 28.6 h (18 epochs) | 2.1 GB | 1.8 ms/batch |
| biLorentzFM | 117.6 min | 23.5 h (12 epochs) | 2.3 GB | 2.1 ms/batch |
| Overhead | +23.5% | −17.8% | +9.5% | +16.7% |
Table 13. Performance on Speed Dating dataset (5-fold CV, mean ± std). biLorentzFM achieves consistent improvements despite lacking explicit hierarchical features, confirming generalization to latent hierarchies. Smaller gains (2.7%) vs. Kariyer.Net (4.9% CV, 6.6% fixed test) align with expectations: explicit taxonomies benefit more than implicit patterns.
| Method | Part. A AUC | Part. B AUC | Mutual Match AUC |
|---|---|---|---|
| biDeepFM_Base | 0.6823 ± 0.0234 | 0.6791 ± 0.0241 | 0.7105 ± 0.0198 |
| biLorentzFM_Base | 0.7012 ± 0.0189 | 0.6985 ± 0.0203 | 0.7289 ± 0.0176 |
| Improvement | +2.8% | +2.9% | +2.6% |
| Cohen's d | 0.87 | 0.86 | 0.94 |

Share and Cite

MDPI and ACS Style

Karacan Uyar, K.; Salman, Y.B. biLorentzFM: Hyperbolic Multi-Objective Deep Learning for Reciprocal Recommendation. Appl. Sci. 2025, 15, 12340. https://doi.org/10.3390/app152212340
