1. Introduction
The exponential growth of online platforms has fundamentally transformed how individuals and organizations connect, creating unprecedented opportunities for automated matching systems. Recommender systems, which emerged to address information overload in large-scale digital environments, have become indispensable tools across diverse domains ranging from e-commerce and entertainment to social networking and professional services [1]. However, traditional recommendation paradigms primarily focus on unilateral user preferences, optimizing for the satisfaction of a single party—typically the content consumer.
A distinct class of recommendation problems, known as reciprocal recommendation, requires satisfying the preferences of multiple parties simultaneously. Unlike conventional systems where movies can be liked by unlimited users or products can be purchased by countless customers, reciprocal recommendation operates under mutual constraints where success depends on bilateral agreement [2]. This paradigm is exemplified in critical real-world applications such as online dating platforms, where both parties must express mutual interest, and job-matching systems, where successful placement requires alignment between candidate aspirations and employer requirements. The global online recruitment market continues its rapid expansion, with projections indicating growth from USD 29.09 billion in 2022 to USD 58.52 billion by 2030 [3], reflecting the increasing digitalization of hiring processes and the critical need for intelligent matching systems.
Job markets exhibit inherent hierarchical relationships that present fundamental challenges for current recommendation systems. Career progression follows directional paths where senior engineers possess qualifications enabling them to fill mid-level roles (acceptable overqualification), whereas junior engineers applying to senior positions typically lack required expertise (problematic underqualification). These asymmetric relationships pose difficulties for standard Euclidean embeddings that compute symmetric distances ($d_E(A, B) = d_E(B, A)$). Furthermore, skills form tree-like taxonomies with substantial branching, and theoretical analysis demonstrates that embedding $n$-node trees in Euclidean space without distortion requires $\Omega(n)$ dimensions [4], whereas hyperbolic space achieves comparable representation quality in $\mathcal{O}(\log n)$ dimensions through exponential volume growth. Recent advances in deep learning architectures including DeepFM [5] and biDeepFM [6] have improved feature interaction modeling but operate exclusively in Euclidean space, potentially inheriting these geometric constraints.
Hyperbolic space provides a geometric framework suited for hierarchical data representation through negative curvature and exponential volume growth. The Lorentz model represents hyperbolic space as a hyperboloid in Minkowski space, offering computational advantages through efficient distance computations and numerical stability [7]. Recent work has begun exploring hyperbolic embeddings for collaborative filtering [8,9], demonstrating superior performance on datasets with hierarchical structures. However, existing hyperbolic recommendation methods share critical limitations: they optimize single objectives focused solely on user satisfaction, lacking the bilateral preference modeling essential for reciprocal matching, and most rely on graph structures that may be sparse or unavailable in cold-start scenarios. This work introduces biLorentzFM, a multi-objective deep learning framework employing Lorentz embeddings for reciprocal recommendation. Through controlled empirical evaluation on two datasets—Kariyer.Net job matching (1.15 M interactions, 230 K users, explicit five-level hierarchies) and Speed Dating (8.4 K interactions, 552 participants, latent hierarchies)—we demonstrate that hyperbolic geometry provides substantial performance improvements (6.6% AUC on job matching, 2.8% on dating) over state-of-the-art Euclidean baselines. Systematic ablation studies isolate contributions from the geometric structure (6.6%), learnable curvature (4.7%), and multi-objective learning (2.1%), while a three-way comparison reveals that Lorentz embeddings outperform Poincaré ball implementations by 4.4% due to numerical stability advantages. Statistical validation through five-fold cross-validation confirms robustness (std < 0.0004) with very large effect sizes (Cohen's d = 2.89–3.08), and cross-domain evaluation establishes generalization beyond explicit hierarchies. The approach maintains practical efficiency despite a 23.5% per-epoch overhead: faster convergence reduces total training time by 17.8%, and inference latency remains suitable for real-time deployment (2.1 ms per batch).
Organization
The remainder of this paper proceeds as follows.
Section 2 reviews related work on reciprocal recommendation, hyperbolic embeddings, and deep factorization models, identifying the research gaps motivating this work.
Section 3 presents the biLorentzFM methodology including Lorentz model foundations, architecture design, multi-objective optimization procedures, and computational complexity analysis.
Section 4 describes the experimental setup including datasets, data leakage prevention protocols, baselines, and evaluation metrics.
Section 5 presents the comprehensive results including performance comparisons, a three-way geometric comparison (Euclidean vs. Lorentz vs. Poincaré), statistical significance analysis, ablation studies, learned curvature interpretability, and cross-domain validation. This section also discusses theoretical implications, examines the plausibility of the results, addresses practical deployment considerations, and acknowledges limitations, concluding with a summary and future research directions.
3. Methodology
In this section, we present the biLorentzFM framework, which leverages Lorentz embeddings for multi-objective reciprocal recommendation. We begin by formalizing the reciprocal recommendation problem; then, we introduce the mathematical foundations of Lorentz embeddings, describe our architecture, detail the optimization procedure, and specify the experimental setup including datasets, data leakage prevention, and training procedures.
3.1. Problem Formulation
Let $\mathcal{U}$ denote the set of users (candidates) and $\mathcal{V}$ denote the set of items (jobs). In reciprocal recommendation, we aim to model the preferences of both sides simultaneously. Each interaction instance is represented as $(\mathbf{x}, y_{\mathrm{cand}}, y_{\mathrm{comp}})$, where $\mathbf{x}$ is a feature vector encoding the candidate–job pair characteristics, and $y_{\mathrm{cand}}, y_{\mathrm{comp}} \in \{0, 1\}$ are binary labels indicating the preferences of the candidate and company, respectively.
The candidate preference $y_{\mathrm{cand}} = 1$ indicates that the candidate applied for the job, while $y_{\mathrm{comp}} = 1$ signifies that the company expressed interest in the candidate (e.g., by viewing their contact information). Our goal is to learn a function $f: \mathcal{X} \rightarrow [0, 1]^2$ that jointly predicts both preferences:
$$f(\mathbf{x}) = (\hat{y}_{\mathrm{cand}}, \hat{y}_{\mathrm{comp}})$$
Unlike traditional single-objective recommendation systems, reciprocal recommendation requires satisfying both parties for successful matching, making multi-objective optimization essential.
3.2. Lorentz Embedding Layer
Traditional recommendation systems typically operate in Euclidean space $\mathbb{R}^d$. However, job–candidate relationships often exhibit hierarchical structures that are better captured by hyperbolic geometry. Among the hyperbolic models (Poincaré ball, Klein disk, Lorentz/hyperboloid), we adopt the Lorentz model based on three practical considerations: (1) it avoids numerical instabilities near manifold boundaries that affect the Poincaré ball model, (2) distance computations require only inner products without arctanh operations, and (3) gradient flow remains stable across the entire manifold. Our systematic comparison in Section 4 demonstrates that Lorentz embeddings achieve 4.4% higher AUC values than Poincaré under identical architectural conditions.
3.2.1. Mathematical Foundation
The Lorentz model represents hyperbolic space as a hyperboloid in Minkowski space. Intuitively, this hyperboloid curves inward, naturally organizing embeddings by hierarchy—entities at different organizational levels appear at different "depths" on the surface. The $d$-dimensional Lorentz manifold $\mathbb{L}^d_\beta$ is formally defined as
$$\mathbb{L}^d_\beta = \left\{ x \in \mathbb{R}^{d+1} : \langle x, x \rangle_{\mathcal{L}} = -1/\beta,\; x_0 > 0 \right\}$$
where $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ is the Lorentz inner product, which differs from the standard Euclidean dot product by treating the first coordinate (time component) with a negative sign:
$$\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i$$
This negative sign on the time component creates the hyperbolic geometry that enables hierarchical representation. The parameter $\beta > 0$ is a learnable curvature parameter that controls the "curvature" of the hyperbolic space—larger values create stronger hierarchical capacity.
3.2.2. Embedding Mapping
To embed entities in Lorentz space, we define a mapping from categorical indices to hyperbolic coordinates. The approach starts with standard Euclidean embeddings (which are straightforward to initialize and optimize); then, it augments them with a time coordinate that automatically places them on the hyperboloid. For a categorical feature with value $v$, we first obtain a Euclidean embedding $\mathbf{e}_v \in \mathbb{R}^d$ through a standard embedding layer. We then map this to Lorentz space via
$$\phi_{\mathcal{L}}(\mathbf{e}_v) = \left( \sqrt{\|\mathbf{e}_v\|^2 + 1/\beta},\; \mathbf{e}_v \right)$$
This ensures that $\phi_{\mathcal{L}}(\mathbf{e}_v)$ lies on the Lorentz manifold, satisfying the constraint $\langle \phi_{\mathcal{L}}(\mathbf{e}_v), \phi_{\mathcal{L}}(\mathbf{e}_v) \rangle_{\mathcal{L}} = -1/\beta$. The time component encodes the hierarchical depth—embeddings with larger norms naturally appear "deeper" in the hierarchy. Meanwhile, the spatial components preserve semantic relationships learned from the data.
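For concreteness, this lift can be implemented directly on top of a standard embedding layer. The following is a minimal PyTorch sketch under our reading of the mapping above (class and variable names are ours, not the paper's code); only the spatial coordinates are stored, and the time coordinate is computed on the fly so the constraint $\langle x, x \rangle_{\mathcal{L}} = -1/\beta$ holds by construction.

```python
import torch
import torch.nn as nn

class LorentzEmbedding(nn.Module):
    """Euclidean embedding table lifted onto the Lorentz hyperboloid.

    Sketch of Section 3.2.2: spatial coordinates come from a standard
    embedding layer; the time coordinate x0 = sqrt(||e||^2 + 1/beta)
    places each point on the manifold <x, x>_L = -1/beta.
    """

    def __init__(self, num_embeddings: int, dim: int):
        super().__init__()
        self.table = nn.Embedding(num_embeddings, dim)  # Euclidean part
        nn.init.xavier_uniform_(self.table.weight)

    def forward(self, idx: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        e = self.table(idx)                                         # (..., d) spatial
        x0 = torch.sqrt((e * e).sum(-1, keepdim=True) + 1.0 / beta)  # time coord
        return torch.cat([x0, e], dim=-1)                           # (..., d+1) on hyperboloid
```

A quick sanity check of the output: `-x[..., 0]**2 + (x[..., 1:]**2).sum(-1)` should equal $-1/\beta$ up to floating-point error.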
3.2.3. Learnable Curvature Parameter
The curvature parameter $\beta$ must remain strictly positive to maintain valid hyperbolic geometry. Rather than using constrained optimization, we employ log-space parametrization [28]: we learn an unconstrained parameter $\tilde{\beta} \in \mathbb{R}$ and compute $\beta = \exp(\tilde{\beta})$. This automatically ensures $\beta > 0$ while allowing standard gradient descent. The gradient follows from the chain rule: $\partial \mathcal{L} / \partial \tilde{\beta} = \beta \cdot \partial \mathcal{L} / \partial \beta$. We initialize $\tilde{\beta} = 0$ (corresponding to $\beta = 1$) and clip the final value to a bounded positive interval to prevent numerical issues. This approach eliminates constraint handling overhead while maintaining stable convergence. In our experiments (Section 4.3.4), $\beta$ converges to approximately 1.65 for the Kariyer.Net dataset, indicating moderately strong hierarchical structure. We use a single shared $\beta$ across all embeddings rather than feature-specific values, as ablation studies showed no significant benefit from added complexity.
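A minimal PyTorch sketch of the log-space parametrization follows; the clamp bounds are illustrative placeholders, not the paper's exact values.

```python
import torch
import torch.nn as nn

class LogCurvature(nn.Module):
    """Learnable curvature beta = exp(beta_tilde), as in Section 3.2.3.

    The unconstrained parameter beta_tilde is optimized with plain
    gradient descent; exponentiation guarantees beta > 0, and clamping
    (bounds here are illustrative) guards against numerical extremes.
    """

    def __init__(self):
        super().__init__()
        self.beta_tilde = nn.Parameter(torch.zeros(1))  # beta_tilde = 0 -> beta = 1

    def forward(self) -> torch.Tensor:
        return torch.exp(self.beta_tilde).clamp(min=1e-4, max=1e2)
```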
3.3. biLorentzFM Architecture
Our biLorentzFM architecture extends the DeepFM framework to hyperbolic space, combining factorization machines with deep neural networks while operating on Lorentz embeddings.
Table 1 summarizes the architectural specifications with hyperparameters selected through validation set performance.
3.3.1. Input Feature Processing
Given an interaction between user $u$ and item $i$, we extract categorical features $\mathbf{c} = (c_1, \dots, c_m)$ and numerical features $\mathbf{n} \in \mathbb{R}^p$. Categorical features are mapped to Lorentz embeddings:
$$\mathbf{h}_j = \phi_{\mathcal{L}}(\mathbf{e}_{c_j}), \quad j = 1, \dots, m$$
For numerical features, we first project them to Euclidean space via a linear transformation; then, we map to Lorentz space:
$$\mathbf{h}_n = \phi_{\mathcal{L}}\!\left(W_n \mathbf{n} + \mathbf{b}_n\right)$$
3.3.2. Hyperbolic Factorization Machine Component
The FM component captures pairwise feature interactions using Lorentz inner products. In traditional Euclidean space, the dot product measures feature similarity; in Lorentz space, the inner product additionally captures hierarchical relationships—features at similar hierarchy levels have stronger interactions. For categorical features, the interaction strength between features $i$ and $j$ is computed as
$$w_{ij} = \langle \mathbf{h}_i, \mathbf{h}_j \rangle_{\mathcal{L}}$$
The complete FM output aggregates all pairwise interactions:
$$y_{\mathrm{FM}} = w_0 + \sum_{j=1}^{m} w_j x_j + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle \mathbf{h}_i, \mathbf{h}_j \rangle_{\mathcal{L}}$$
where $w_0$ is the global bias and $w_j$ represents linear coefficients, which model first-order feature effects.
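The pairwise term admits the same $\mathcal{O}(m^2 d)$ Gram-matrix implementation as a Euclidean FM; the only change is the Minkowski sign on the time coordinate. A hedged PyTorch sketch (helper names are ours):

```python
import torch

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentz inner product <x, y>_L = -x0*y0 + sum_i xi*yi (last dim)."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def fm_pairwise(h: torch.Tensor) -> torch.Tensor:
    """Sum of Lorentz inner products over all feature pairs i < j.

    h: (batch, m, d+1) Lorentz embeddings of the m categorical fields.
    Applies the Minkowski signature (-, +, ..., +) via a sign vector so the
    whole Gram matrix comes from one einsum, mirroring the Euclidean FM.
    """
    sign = torch.ones(h.size(-1), device=h.device)
    sign[0] = -1.0                                     # time coordinate gets minus
    gram = torch.einsum("bmd,bnd->bmn", h * sign, h)   # <h_i, h_j>_L for all pairs
    diag = torch.diagonal(gram, dim1=1, dim2=2).sum(-1)
    return 0.5 * (gram.sum(dim=(1, 2)) - diag)         # keep each pair i < j once
```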
3.3.3. Deep Neural Network Component
The DNN component learns high-order feature interactions by processing the concatenated Lorentz embeddings. We flatten all embeddings and feed them through a multi-layer perceptron:
$$\mathbf{z}^{(0)} = [\mathbf{h}_1; \mathbf{h}_2; \dots; \mathbf{h}_m; \mathbf{h}_n], \qquad \mathbf{z}^{(l)} = g\!\left(W^{(l)} \mathbf{z}^{(l-1)} + \mathbf{b}^{(l)}\right), \quad l = 1, \dots, L$$
where $g$ is the ReLU activation function. Dropout with a rate of 0.2 is applied after each hidden layer to prevent overfitting.
3.3.4. Multi-Objective Output Layer
To handle the dual objectives of reciprocal recommendation, we employ separate output heads for candidate and company predictions:
$$\hat{y}_{\mathrm{cand}} = \sigma\!\left(\alpha\, y_{\mathrm{FM}} + (1 - \alpha)\, \mathbf{w}_{\mathrm{cand}}^{\top} \mathbf{z}^{(L)}\right), \qquad \hat{y}_{\mathrm{comp}} = \sigma\!\left(\alpha\, y_{\mathrm{FM}} + (1 - \alpha)\, \mathbf{w}_{\mathrm{comp}}^{\top} \mathbf{z}^{(L)}\right)$$
where $\alpha$ balances the FM and DNN components, and $\sigma$ is the sigmoid function.
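A compact sketch of the dual-head output layer; the blending through a single learnable $\alpha$ follows our reading of the equations above, and the names are illustrative:

```python
import torch
import torch.nn as nn

class ReciprocalHeads(nn.Module):
    """Dual sigmoid heads blending FM and DNN outputs (Section 3.3.4)."""

    def __init__(self, dnn_dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # FM/DNN balance
        self.cand_head = nn.Linear(dnn_dim, 1)
        self.comp_head = nn.Linear(dnn_dim, 1)

    def forward(self, y_fm: torch.Tensor, z: torch.Tensor):
        mix_cand = self.alpha * y_fm + (1 - self.alpha) * self.cand_head(z).squeeze(-1)
        mix_comp = self.alpha * y_fm + (1 - self.alpha) * self.comp_head(z).squeeze(-1)
        return torch.sigmoid(mix_cand), torch.sigmoid(mix_comp)
```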
3.4. Multi-Objective Optimization
We optimize both objectives simultaneously using a weighted multi-objective loss function. Each objective uses binary cross-entropy:
$$\mathcal{L}_{\mathrm{cand}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{\mathrm{cand}}^{(i)} \log \hat{y}_{\mathrm{cand}}^{(i)} + \left(1 - y_{\mathrm{cand}}^{(i)}\right) \log\!\left(1 - \hat{y}_{\mathrm{cand}}^{(i)}\right) \right]$$
with $\mathcal{L}_{\mathrm{comp}}$ defined analogously. The total loss combines both objectives:
$$\mathcal{L} = \lambda_{\mathrm{cand}} \mathcal{L}_{\mathrm{cand}} + \lambda_{\mathrm{comp}} \mathcal{L}_{\mathrm{comp}} + \lambda_{\mathrm{reg}} \|\theta\|^2$$
The task weights $(\lambda_{\mathrm{cand}}, \lambda_{\mathrm{comp}})$ were selected through a grid search over weight combinations, evaluating balanced performance across both objectives on the validation set. Equal weighting (0.5, 0.5) yielded the best trade-off, improving the candidate AUC value by 2.1% and company AUC value by 1.8% over unbalanced configurations. The regularization coefficient $\lambda_{\mathrm{reg}}$ prevents overfitting.
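In code, the combined objective is only a few lines. A minimal PyTorch sketch, with the validated equal task weights and a placeholder regularization coefficient (not the paper's tuned value):

```python
import torch
import torch.nn.functional as F

def bilorentzfm_loss(p_cand, y_cand, p_comp, y_comp, params,
                     w_cand=0.5, w_comp=0.5, l2=1e-5):
    """Weighted multi-objective loss of Section 3.4.

    Binary cross-entropy per side, combined with task weights and an
    L2 penalty on the parameters.
    """
    loss_cand = F.binary_cross_entropy(p_cand, y_cand)
    loss_comp = F.binary_cross_entropy(p_comp, y_comp)
    reg = sum((p * p).sum() for p in params)  # ||theta||^2
    return w_cand * loss_cand + w_comp * loss_comp + l2 * reg
```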
3.4.1. Hyperbolic Optimization Considerations
Optimizing on the Lorentz manifold requires maintaining manifold constraints throughout training. Standard gradient descent operates in Euclidean space, but our embeddings must remain on the curved hyperboloid surface. We address this through Riemannian optimization [29,30]: gradients are first projected onto the tangent space (the local flat approximation of the manifold at each point); then, they are used for updates. For a point $x \in \mathbb{L}^d_\beta$, the tangent space $T_x \mathbb{L}^d_\beta$ is defined as
$$T_x \mathbb{L}^d_\beta = \left\{ v \in \mathbb{R}^{d+1} : \langle x, v \rangle_{\mathcal{L}} = 0 \right\}$$
Geometrically, the tangent space consists of all vectors orthogonal to $x$ under the Lorentz inner product. The projection of a Euclidean gradient $g$ onto the tangent space is
$$\mathrm{proj}_x(g) = g + \beta\, \langle x, g \rangle_{\mathcal{L}}\, x$$
This projection removes the component of $g$ perpendicular to the manifold, ensuring the gradient respects the geometric constraint. After each gradient update, embeddings are re-normalized to ensure they remain on the manifold. We implement Riemannian optimization using the Geoopt library [31], which provides efficient PyTorch 2.8.0 implementations of manifold-aware optimizers.
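The projection and re-normalization steps can be written directly from the formulas above. A minimal PyTorch sketch (function names are ours; Geoopt performs the equivalent operations internally):

```python
import torch

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L with signature (-, +, ..., +)."""
    return -x[..., :1] * y[..., :1] + (x[..., 1:] * y[..., 1:]).sum(-1, keepdim=True)

def project_to_tangent(x, g, beta):
    """Tangent-space projection at x on the hyperboloid <x, x>_L = -1/beta.

    Adding beta * <x, g>_L * x cancels the normal component, since
    <x, proj(g)>_L = <x, g>_L + beta * <x, g>_L * (-1/beta) = 0.
    """
    return g + beta * lorentz_inner(x, g) * x

def renormalize(x, beta):
    """Snap a point back onto the hyperboloid after an update by
    recomputing the time coordinate from the spatial part."""
    spatial = x[..., 1:]
    x0 = torch.sqrt((spatial * spatial).sum(-1, keepdim=True) + 1.0 / beta)
    return torch.cat([x0, spatial], dim=-1)
```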
3.4.2. Computational Complexity
The computational complexity of biLorentzFM is comparable to a standard DeepFM. The Lorentz inner product requires $\mathcal{O}(d)$ operations, identical in cost to Euclidean dot products. The main overhead comes from the square root computation in the embedding mapping, which is $\mathcal{O}(1)$ per embedding. The overall time complexity remains $\mathcal{O}\!\left(m^2 d + \sum_{l=1}^{L} d_{l-1} d_l\right)$ for $m$ categorical features and $L$ DNN layers of widths $d_l$.
In terms of practical efficiency, while the per-epoch training time increases by 23.5% due to manifold operations (117.6 vs. 95.2 min), biLorentzFM converges in fewer epochs (12 vs. 18), reducing the total training time by 17.8% (23.5 vs. 28.6 h). The inference latency is 2.1 ms per 256-sample batch on NVIDIA V100 GPUs compared to 1.8 ms for biDeepFM (+16.7%), which remains suitable for real-time production deployment. Memory consumption increases modestly from 2.1 GB to 2.3 GB (+9.5%) due to additional time-coordinate storage for hyperbolic embeddings.
3.5. Datasets and Experimental Setup
We evaluate biLorentzFM on two reciprocal recommendation datasets with complementary characteristics.
Table 2 summarizes their key properties, demonstrating diversity in scale, domain, and hierarchical structure.
3.5.1. Kariyer.Net: Job Matching Dataset
The Kariyer.Net dataset, collected from Turkey’s largest job platform, contains 1,150,302 candidate–job interactions spanning 229,805 unique candidates and 16,134 job postings over a six-month period in 2023. Each interaction includes reciprocal signals: candidate applications (explicit interest from candidates) and company views of candidate profiles (interest from employers).
The dataset provides rich contextual information organized into three categories. Candidate features include demographics (age group, location, employment status), education (degree level, field of study, university tier), and experience (years, seniority level, skills vector). Job features capture position details (title, required education and experience), company information (industry, size, prestige), and job specifics (employment type, work arrangement, location, salary range). Hierarchical categorical features follow a five-level job taxonomy: Industry (12 categories) → Sector (45) → Job Family (120) → Role (380) → Seniority (5), enabling hyperbolic embeddings to capture organizational structure naturally.
Following established practices in collaborative filtering [6], we generate two negative samples for each positive interaction, creating a 2:1 negative-to-positive ratio. The choice of 2 negative samples balances training efficiency with class distribution: fewer negatives provide insufficient signal for learning decision boundaries, while more negatives increase computational cost without proportional benefit. Negative samples consist of jobs that candidates viewed but did not apply to, representing realistic alternatives that were available but not sufficiently attractive. This sampling approach differs fundamentally from random negative sampling in three ways. First, all negative samples represent jobs the candidate actively considered, ensuring they reflect genuine preference decisions rather than items never encountered (exposure guarantee). Second, viewed-but-rejected jobs provide stronger training signal than randomly sampled jobs, as they represent hard negatives the model must distinguish from positive examples (difficulty calibration). Third, the negative sample distribution matches actual job browsing patterns observed on the platform, improving model generalization to production scenarios (realistic distribution). This produces a final dataset with 383,434 positive samples (applications) and 766,868 negative samples (views without application), yielding a 33.3% positive rate. This class distribution allows standard binary cross-entropy loss functions to work effectively without specialized reweighting or class balancing techniques.
To prevent information leakage and ensure valid evaluation, we employ two critical safeguards. First, negative sampling is performed before data splitting. If negative sampling occurred after splitting, the model could inadvertently learn from temporal patterns in the test set during training through the negative sample generation process. By sampling negatives from the complete dataset first, then splitting, we ensure the training process has no access to test set information. Second, we use strict temporal splitting based on interaction timestamps. The dataset is sorted chronologically and divided into three non-overlapping windows: training (70%, January–April), validation (10%, April–May), and test (20%, May–June). This ensures all training interactions occur strictly before validation and test interactions, reflecting real-world scenarios where models predict future user behavior.
Table 3 shows the distribution across splits with consistent positive rates (33.3%), indicating stable user behavior throughout the observation period.
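The two safeguards above reduce to a few lines of code. A minimal pandas sketch, assuming negatives have already been generated over the full interaction log and using an illustrative `timestamp` column name (not the dataset's actual schema):

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, train: float = 0.7, val: float = 0.1):
    """Chronological 70/10/20 split: every training row precedes every
    validation row, which precedes every test row."""
    df = df.sort_values("timestamp").reset_index(drop=True)
    n = len(df)
    i_train, i_val = int(n * train), int(n * (train + val))
    return df.iloc[:i_train], df.iloc[i_train:i_val], df.iloc[i_val:]

# Order matters: sample negatives on the FULL log first, then split by time.
# full_log = add_negative_samples(interactions)   # hypothetical helper
# train_df, val_df, test_df = temporal_split(full_log)
```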
3.5.2. Speed Dating: Cross-Domain Validation
The Speed Dating dataset [32], collected from 21 speed dating events at Columbia Business School (2002–2004), contains 8378 reciprocal decisions from 552 participants. In each event, participants have brief conversations with potential romantic partners, after which both parties independently indicate interest. A successful match requires mutual agreement, making this a canonical reciprocal recommendation scenario.
Each interaction is characterized by demographic information (age, race, field of study), self-assessed attributes (attractiveness, sincerity, intelligence ratings on 1–10 scales), partner preferences (importance ratings for various traits), and activity interests (sports, arts, etc. as binary indicators). Unlike Kariyer.Net’s explicit job taxonomy, Speed Dating features lack obvious hierarchical structure, testing whether biLorentzFM can discover latent hierarchies in personality and preference patterns.
Due to the smaller dataset size (8.4 K vs. 1.15 M interactions), we employ 5-fold cross-validation rather than a single train–test split. Folds are created by partitioning participants (not interactions) into five groups, ensuring all interactions involving a participant appear in the same fold. This prevents data leakage where models could learn participant-specific patterns during training. For each fold, we train on four folds (6700 interactions) and test on the remaining fold (1670 interactions), reporting average performance and standard deviations across folds. The Speed Dating experiments serve two purposes: (1) validating that biLorentzFM generalizes beyond job recommendation to different reciprocal domains, and (2) testing the model’s ability to learn from limited data, where hyperbolic geometry’s inductive bias may prove particularly valuable.
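Participant-level folds can be produced with scikit-learn's GroupKFold. The sketch below uses a single participant-id column as the grouping key and is illustrative only; fully isolating both members of each interaction would require mapping the pair's ids to one group key.

```python
from sklearn.model_selection import GroupKFold

# X, y, participant_ids are illustrative arrays: features, labels, and the
# participant id for each interaction row. Every id lands in exactly one
# held-out fold, so no participant-specific pattern leaks across splits.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=participant_ids)):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    # train on four folds, evaluate on the held-out fold ...
```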
3.5.3. Baseline Methods
We compare biLorentzFM against state-of-the-art neural recommendation architectures from Yıldırım et al. [6]:
PNN [33]: product-based neural networks with explicit pairwise feature interactions through inner/outer product layers.
DeepFM [5]: combines factorization machines (low-order interactions) with deep neural networks (high-order interactions) using shared embeddings.
DCN [25]: Deep & Cross Network that automatically learns explicit feature crossings through a cross-network parallel to deep layers.
AutoInt [26]: uses multi-head self-attention mechanisms to model feature interactions of different orders.
NFM [23]: neural factorization machine with bi-interaction pooling followed by deep networks for high-order interaction learning.
AFM [24]: attentional factorization machine that applies attention to weight the importance of different feature interactions.
FGCNN [34]: feature generation by CNN, automatically creating new features through convolutional operations on raw feature embeddings.
biDeepFM [6]: multi-objective extension of DeepFM for reciprocal recommendation, serving as our strongest Euclidean baseline.
All baselines employ Euclidean embeddings. We report their published results from Yıldırım et al. [6], who evaluated these methods on the same Kariyer.Net dataset under identical experimental conditions (70/10/20 train/val/test split, exposure-controlled negative sampling, embedding dimension d = 8). We verified consistency by re-implementing DeepFM and biDeepFM, confirming that our reproduced results match published values within ±0.1% AUC.
3.5.4. Evaluation Metrics
Performance is evaluated using two complementary metrics computed separately for each side of the reciprocal recommendation (candidate/company for Kariyer.Net, participant A/B for Speed Dating).
Area Under the ROC Curve (AUC) measures ranking quality—the model’s ability to rank positive interactions higher than negative ones—providing threshold-independent assessment. AUC ranges from 0 to 1, where 0.5 indicates random performance and 1.0 indicates perfect ranking. AUC is particularly appropriate for recommendation systems where the goal is to present users with ranked lists rather than binary classifications. LogLoss (binary cross-entropy) measures probability calibration, quantifying how well predicted probabilities match actual outcomes. LogLoss heavily penalizes confident but incorrect predictions, making it suitable for assessing whether the model provides reliable probability estimates rather than just correct rankings. Lower LogLoss values indicate better calibration. Together, these metrics capture both discrimination quality (AUC) and calibration (LogLoss). A model may achieve high AUC through correct ranking while having poor LogLoss due to miscalibrated probabilities, or vice versa. Reporting both metrics provides a complete picture of model performance.
We assess statistical significance through paired t-tests comparing biLorentzFM predictions with baseline predictions. For Kariyer.Net, we use test set predictions (230,061 samples). For Speed Dating, we use the five-fold results to compute paired differences across folds. We additionally report Cohen’s d effect sizes to quantify practical significance beyond statistical significance.
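These metrics and tests map directly onto standard scikit-learn/SciPy routines. A minimal sketch:

```python
import numpy as np
from scipy import stats
from sklearn.metrics import log_loss, roc_auc_score

def evaluate_side(y_true, y_prob):
    """Discrimination (AUC) and calibration (LogLoss) for one side."""
    return roc_auc_score(y_true, y_prob), log_loss(y_true, y_prob)

def paired_significance(scores_a, scores_b):
    """Paired t-test and Cohen's d over matched metric values.

    scores_a / scores_b: arrays of per-fold (or per-partition) AUCs
    for two models evaluated on the same splits.
    """
    t, p = stats.ttest_rel(scores_a, scores_b)
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired samples
    return t, p, d
```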
3.6. Training Procedure
The complete training algorithm for biLorentzFM integrates standard mini-batch gradient descent with specialized handling for hyperbolic embeddings. Algorithm 1 presents the detailed procedure for reproducibility.
Algorithm 1 biLorentzFM Training Procedure
Input: Training data $\mathcal{D}_{\mathrm{train}}$, validation data $\mathcal{D}_{\mathrm{val}}$
Hyperparameters: learning rate $\eta$, batch size $B$, max epochs $T$, patience $P$, task weights $\lambda_{\mathrm{cand}}, \lambda_{\mathrm{comp}}$, regularization $\lambda_{\mathrm{reg}}$
1. Initialize $E$ (Xavier), $\tilde{\beta} \leftarrow 0$, $W$ (Kaiming)
2. best_auc ← 0, patience ← 0
3. for $t = 1$ to $T$ do
4.   Shuffle $\mathcal{D}_{\mathrm{train}}$
5.   for each mini-batch $\mathcal{B}$ in $\mathcal{D}_{\mathrm{train}}$ do
6.     // Forward Pass
7.     for $(\mathbf{x}, y_{\mathrm{cand}}, y_{\mathrm{comp}})$ in $\mathcal{B}$ do
8.       Extract $\mathbf{c}$ (categorical), $\mathbf{n}$ (numerical)
9.       $\beta \leftarrow \mathrm{clip}(\exp(\tilde{\beta}))$
10.      $\mathbf{h}_j \leftarrow \phi_{\mathcal{L}}(\mathbf{e}_{c_j})$ for all $j$
11.      $y_{\mathrm{FM}} \leftarrow w_0 + \sum_j w_j x_j + \sum_{i<j} \langle \mathbf{h}_i, \mathbf{h}_j \rangle_{\mathcal{L}}$
12.      $\mathbf{z}^{(L)} \leftarrow \mathrm{MLP}([\mathbf{h}_1; \dots; \mathbf{h}_m; \mathbf{h}_n])$
13.      $(\hat{y}_{\mathrm{cand}}, \hat{y}_{\mathrm{comp}}) \leftarrow$ output heads on $(y_{\mathrm{FM}}, \mathbf{z}^{(L)})$
14.    end for
15.    // Loss Computation
16.    $\mathcal{L} \leftarrow \lambda_{\mathrm{cand}} \mathcal{L}_{\mathrm{cand}} + \lambda_{\mathrm{comp}} \mathcal{L}_{\mathrm{comp}} + \lambda_{\mathrm{reg}} \|\theta\|^2$
17.    Compute $\nabla_{\theta} \mathcal{L}$ via backpropagation
18.    // Riemannian Updates
19.    for each Lorentz embedding $\mathbf{h}$ do
20.      $g \leftarrow g + \beta \langle \mathbf{h}, g \rangle_{\mathcal{L}}\, \mathbf{h}$ // Project to tangent
21.    end for
22.    $\theta \leftarrow \theta - \eta\, g$
23.    // Update all parameters
24.    for each Lorentz embedding $\mathbf{h}$ do
25.      $h_0 \leftarrow \sqrt{\|\mathbf{h}_{1:d}\|^2 + 1/\beta}$ // Renormalize
26.    end for
27.  end for // End batch loop
28.  // Validation
29.  auc_cand ← Evaluate($\mathcal{D}_{\mathrm{val}}$)
30.  if auc_cand > best_auc then
31.    best_auc ← auc_cand; Save checkpoint; patience ← 0
32.  else
33.    patience ← patience + 1
34.    if patience ≥ $P$ then break // Early stop
35.  end if
36. end for // End epoch loop
Output: Trained model $\theta$
The algorithm emphasizes three key aspects. First, log-space curvature parameters (line 9) require only exponentiation and clipping in the forward pass—no complex constraint handling during optimization. Second, Riemannian gradient projection (line 20) maintains manifold constraints for embeddings, ensuring all points remain on the hyperboloid throughout training. Third, the multi-objective structure (line 16) treats candidate and company preferences symmetrically, optimizing both sides of the reciprocal recommendation problem simultaneously. Early stopping (lines 31–34) prevents overfitting based on validation performance.
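In practice, most of Algorithm 1's manifold bookkeeping can be delegated to Geoopt. A skeletal sketch of the setup follows; shapes, names, and the stand-in loss are illustrative, and the exact API conventions (e.g., the meaning of `k`) should be checked against the Geoopt documentation for your version.

```python
import torch
import geoopt

# Geoopt's Lorentz manifold and RiemannianAdam handle tangent projection
# and retraction (lines 18-26 of Algorithm 1) internally.
manifold = geoopt.Lorentz(k=torch.tensor(1.0))  # curvature-scale parameter
embeddings = geoopt.ManifoldParameter(
    manifold.origin(10000, 9),  # 10k entities, 8 spatial dims + time coordinate
    manifold=manifold,
)
optimizer = geoopt.optim.RiemannianAdam([embeddings], lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    # Stand-in objective; in biLorentzFM this would be the weighted
    # multi-objective loss from Section 3.4.
    loss = embeddings[:, 1:].pow(2).sum()
    loss.backward()
    optimizer.step()  # Riemannian step keeps embeddings on the hyperboloid
```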
5. Discussion
This work introduces biLorentzFM, a novel approach to reciprocal recommendation that integrates Lorentz hyperbolic embeddings with factorization machines and multi-objective optimization. Our experimental evaluation on the Kariyer.Net job recommendation dataset demonstrates substantial improvements over state-of-the-art Euclidean baselines: biLorentzFM achieves a 0.9964 candidate AUC value and 0.9913 company AUC value on the fixed test set, representing 6.6% and 6.0% improvements over the strongest baseline biDeepFM (0.9351 and 0.9348, respectively). Cross-validation results confirm robustness across data partitions (five-fold CV: 0.9813 ± 0.0002 candidate, 0.9756 ± 0.0002 company) and random initializations (three-seed: 0.9964 ± 0.0012 candidate, 0.9913 ± 0.0019 company) with very large effect sizes (Cohen's d = 2.89–3.08). Cross-domain validation on the Speed Dating dataset confirms generalization beyond job recommendation, achieving 2.8% improvement despite the absence of explicit hierarchical features. Comprehensive ablation studies isolate the contribution of each architectural component, revealing that hyperbolic embeddings provide the largest performance gain (6.6%) followed by learnable curvature (4.7%), multi-objective learning (2.1%), and explicit feature interactions via the FM component (0.6%).
The performance advantages stem from the fundamental alignment between hyperbolic geometry and hierarchical job market structures. Job markets exhibit multi-dimensional hierarchies across career progression (entry-level through executive positions), educational attainment (high school through doctoral degrees), and organizational taxonomies (industry classifications through specific role requirements). Prior theoretical work established that hyperbolic spaces embed tree structures with arbitrarily low distortion using logarithmic rather than exponential dimensionality [4,38], suggesting representational efficiency advantages for hierarchical data. Our empirical results support this foundation: biLorentzFM with eight-dimensional embeddings outperforms Euclidean baselines employing substantially higher dimensionality, indicating that an appropriate geometric structure provides greater benefit than increased embedding capacity. The learned global curvature parameter ($\beta \approx 1.65$ across random seeds) reflects moderate hierarchical depth, falling between the nearly flat geometry appropriate for lateral content similarity ($\beta \to 0$) and strongly hierarchical structures like deep organizational charts ($\beta \gg 1$). Beyond representational efficiency, hyperbolic geometry provides asymmetric distance metrics naturally encoding directional preferences. Reciprocal recommendation exhibits fundamental asymmetry: a senior engineer applying to an entry-level position (overqualified) differs qualitatively from an entry-level candidate applying to a senior position (underqualified). Euclidean distance functions are symmetric by construction ($d_E(x, y) = d_E(y, x)$), requiring neural networks to learn asymmetry through complex nonlinear transformations. In contrast, hyperbolic distances in the Lorentz model incorporate inherent asymmetry through the time-like coordinate $x_0$ in Minkowski space, directly encoding hierarchical relationships in the geometric structure itself. Our t-SNE visualization (Figure 1) demonstrates this empirically: entry-level positions cluster at larger radii (larger $x_0$) while executive positions concentrate near the origin (smaller $x_0$), creating geometric gradients aligned with career trajectories. This spatial organization emerges through training without explicit hierarchy supervision. Quantitative analysis (Table 11) confirms behavioral consequences: biLorentzFM assigns substantially different match probabilities to bidirectional scenarios (0.78 for senior→junior versus 0.11 for junior→senior), whereas the Euclidean baseline biDeepFM produces nearly symmetric scores (0.42 versus 0.39) that fail to distinguish qualification direction.
The integration of hyperbolic embeddings with multi-objective learning addresses the challenge of balancing preferences from both sides of the matching process. Single-objective approaches optimize for one party's satisfaction, potentially at the expense of the other; our baseline comparison reveals this tension explicitly, where single-objective PNN achieves a 0.9397 candidate AUC value but only a 0.9098 company AUC value. Standard multi-objective optimization in Euclidean space faces inherent trade-offs between competing objectives: biDeepFM balances both sides but incurs slight candidate-side degradation relative to PNN (0.9351 versus 0.9397). The same pattern emerges when comparing DeepFM (single-objective: 0.9353 candidate, 0.8788 company) to biDeepFM (multi-objective: 0.9351 candidate, 0.9348 company). The 0.02% candidate-side reduction from 0.9353 to 0.9351 represents minimal cost for the 6.4% company-side improvement from 0.8788 to 0.9348. This trade-off pattern is fundamental to multi-objective optimization in Euclidean spaces: shared embedding layers must learn representations serving both prediction tasks, leading to compromise solutions rather than task-specific optima [39]. The key insight is that multi-objective learning achieves superior balanced performance across both tasks, which is the relevant metric for reciprocal recommendation where mutual agreement is required. Hyperbolic geometry appears to reduce this optimization tension by providing representational foundations naturally supporting both objectives simultaneously. The hierarchical structure captured in learned embeddings benefits both matching directions: candidates navigate the hierarchy upward to identify aspirational positions matching their skill development, while companies navigate downward or laterally to identify candidates at or above required qualification levels. This shared geometric scaffold enables biLorentzFM to achieve superior performance on both objectives without compromise (0.9964 candidate, 0.9913 company), suggesting that appropriate inductive bias reduces multi-objective optimization difficulty by aligning the solution space with the problem structure.
Our three-way comparison of hyperbolic geometries (Table 7) reveals substantial practical differences between the Lorentz hyperboloid and Poincaré ball models despite their mathematical equivalence as isometric representations of hyperbolic space. Lorentz consistently outperforms Poincaré across all experimental conditions: with fixed curvature, Lorentz achieves a 0.9524 candidate AUC value compared to Poincaré's 0.9412 (+1.2%), and the performance gap widens when incorporating learnable curvature (0.9964 versus 0.9687, +2.9%). These differences arise from numerical and optimization properties rather than geometric expressiveness. The Poincaré ball representation confines embeddings to the unit disk, where points approaching the boundary experience exponentially growing gradients due to the conformal factor $\lambda_x = 2 / (1 - \|x\|^2)$ in the Riemannian metric tensor. This boundary effect necessitates careful gradient clipping, small learning rates, and epsilon clamping to maintain numerical stability during training. Our implementation required $\epsilon$-clamping of embedding norms and an extensive hyperparameter search across epsilon values, clipping thresholds, and learning rates (Section 4.3.1), yet it still encountered occasional gradient instabilities. The Lorentz model eliminates boundary effects by representing hyperbolic space as a hyperboloid sheet $\mathbb{L}^d_\beta$ embedded in Minkowski space, where all points maintain equal distance from any problematic regions. This boundary-free representation enables stable gradient flow throughout training, manifesting empirically in faster training (98.7 versus 142 min) and improved convergence despite identical network architectures. Computational considerations further favor the Lorentz model for practical deployment. Distance calculations in the Poincaré ball require Möbius addition and logarithmic maps involving compositions of hyperbolic trigonometric functions, increasing both the computational cost and numerical error accumulation during backpropagation. The Lorentz distance reduces to a bilinear form, $d_{\mathcal{L}}(x, y) = \frac{1}{\sqrt{\beta}} \operatorname{arccosh}\!\left(-\beta \langle x, y \rangle_{\mathcal{L}}\right)$, requiring only matrix multiplication and a single transcendental function evaluation. This computational simplicity benefits both forward passes (inference) and backward passes (gradient computation), with modern automatic differentiation frameworks (TensorFlow, PyTorch) handling bilinear forms more efficiently than compositions of special functions. For large-scale production systems processing millions of distance computations per training epoch, these efficiency gains accumulate substantially.
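To make the computational contrast concrete, the Lorentz distance is essentially a one-liner plus a domain guard. A hedged PyTorch sketch, consistent with the convention $\langle x, x \rangle_{\mathcal{L}} = -1/\beta$ used above:

```python
import torch

def lorentz_distance(x: torch.Tensor, y: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """d_L(x, y) = (1/sqrt(beta)) * arccosh(-beta * <x, y>_L).

    Assumes inputs satisfy <x, x>_L = -1/beta. One bilinear form plus a
    single arccosh, versus the Mobius-addition pipeline of the Poincare ball.
    """
    inner = -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)
    arg = (-beta * inner).clamp(min=1.0 + 1e-7)  # guard the arccosh domain
    return torch.acosh(arg) / beta ** 0.5
```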
The position of biLorentzFM within the emerging landscape of hyperbolic recommendation methods merits consideration. Recent work has explored hyperbolic geometry for recommendation tasks, but existing approaches predominantly focus on graph-based methods propagating information through user-item interaction graphs [8,40,41]. These methods leverage graph convolutional networks operating in hyperbolic space to aggregate neighborhood information while respecting hierarchical structure. Our comparison with representative graph-based hyperbolic baselines (Table 8) reveals substantial performance differences: the strongest graph baseline HGT achieves a 0.9178 candidate AUC value, which is approximately 8.6% below biLorentzFM (0.9964). This performance gap highlights fundamental differences in the modeling approach. Graph-based methods primarily capture the connectivity structure—which users interact with which items—relying on the principle that connected nodes share latent properties. Factorization machine approaches instead model feature interactions, learning which combinations of user and item attributes drive preferences. For job recommendation, rich categorical features (educational background, skill requirements, experience levels, geographic location, industry sectors) carry substantial predictive information beyond graph connectivity patterns. Consider a candidate who has never applied to positions in a particular industry but shares educational credentials and technical skills with successful applicants in that industry; pure graph methods cannot leverage this attribute-level similarity for generalization, whereas feature-based models naturally identify such patterns. Our ablation study quantifies this distinction: biLorentzFM_4CF using only user and item identifiers (analogous to graph-based approaches) achieves a 0.9924 candidate AUC value, but incorporating rich categorical features (biLorentzFM_Full) provides an additional 0.4% improvement, suggesting the complementary value of connectivity patterns and attribute-level information. The integration of hyperbolic geometry with factorization machines represents a novel direction within hyperbolic recommendation research that is distinct from existing graph-based approaches. Graph convolutional methods excel at capturing the community structure, transitive relationships, and collective patterns, while factorization machines excel at discovering attribute-level combinations and feature interaction patterns. These complementary strengths suggest future research directions combining hyperbolic graph convolutions for initial embedding learning with hyperbolic factorization machines for final prediction, potentially achieving benefits from both connectivity and attribute modeling within a unified geometric framework.
Cross-domain validation on the Speed Dating dataset (Section 4.7) provides insight into generalizability across different reciprocal recommendation domains. Speed Dating exhibits a reciprocal structure similar to job recommendation—both parties must mutually agree for successful matching—but lacks the explicit hierarchical features present in job taxonomies, educational credentials, and career progressions. Despite this absence of explicit hierarchy, biLorentzFM achieves 2.8% improvement over biDeepFM on Speed Dating (0.7012 versus 0.6823 for the participant A AUC) with moderate effect sizes (Cohen's d = 0.86–0.94) indicating substantial practical impact. The learned curvature parameters for Speed Dating participants fall between the nearly flat geometries appropriate for purely lateral relationships ($\beta \to 0$) and the strongly hierarchical structure learned for job categories (large $\beta$), suggesting moderate latent hierarchy consistent with implicit social desirability patterns in dating preferences. This cross-domain result supports two important conclusions: first, hyperbolic geometry provides benefits even for domains with primarily implicit rather than explicit hierarchies, and second, the magnitude of improvement scales with the clarity of the hierarchical structure, with explicit taxonomies (job recommendation: +6.6%) benefiting more than implicit patterns (dating: +2.8%). These findings suggest that practitioners should weigh the degree of hierarchical structure when evaluating whether hyperbolic methods justify their computational overhead; domains exhibiting clear vertical relationships (organizational hierarchies, academic ranks, skill progressions) are the most likely to benefit substantially.
The practical implications of these results for production deployment warrant careful consideration. While biLorentzFM demonstrates clear performance advantages, real-world implementation requires balancing multiple operational constraints. Training efficiency presents the first consideration: although Lorentz embeddings incur 23.5% more per-epoch computational overhead relative to Euclidean baselines, faster convergence (12 versus 18 epochs to early stopping) yields a net 17.8% reduction in total training time. This training efficiency advantage makes overnight model retraining schedules feasible for most production systems operating on daily refresh cycles. Inference latency presents the second consideration: the modest overhead (+16.7%, 2.1 milliseconds per batch of 256 candidates) supports real-time recommendation requirements for systems serving typical query rates below 1000 queries per second. For higher-traffic platforms requiring lower latency, serving optimizations including embedding precomputation, an approximate nearest neighbor search in hyperbolic space [20], and distributed inference across multiple servers can recover performance. Cold-start scenarios present the third consideration: our analysis reveals that biLorentzFM shows limited improvement (+2.1% candidate AUC) for users or items with fewer than five historical interactions, suggesting that hyperbolic geometry requires sufficient interaction history to learn meaningful hierarchical positions. Approximately 18% of candidates and 23% of job postings in the Kariyer.Net test set fall into this cold-start category, indicating that hybrid approaches combining content-based features with geometric embeddings may better serve newly arriving users and items. These practical considerations suggest that hyperbolic methods are most appropriate for mature recommendation systems with substantial historical data, a clear hierarchical structure in the domain, and computational resources sufficient to support the moderately increased training overhead in exchange for substantial accuracy improvements.
The learned curvature parameter provides a quantitative diagnostic for assessing the domain suitability for hyperbolic methods. Across our experiments, curvature values span a meaningful range: job categories exhibit a strong hierarchy (the largest learned $\beta$), reflecting the explicit five-level taxonomy (industry, sector, job family, role, seniority); candidates demonstrate a moderate hierarchy, capturing educational progressions and experience accumulation; job items show a nearly flat geometry ($\beta$ close to zero), indicating that individual postings relate primarily through lateral content similarity rather than vertical relationships; and Speed Dating participants exhibit intermediate hierarchy, which is consistent with implicit social desirability gradients. The global curvature used in our main experiments ($\beta \approx 1.65$) represents an effective compromise capturing the dominant hierarchical structure while maintaining computational efficiency. When the curvature approaches zero during training ($\beta \to 0$), the hyperbolic space degenerates to flat Euclidean geometry, suggesting that simpler Euclidean models may suffice for that particular domain. This diagnostic property enables practitioners to empirically assess whether hyperbolic geometry provides sufficient benefit to justify its implementation complexity: if the learned curvature remains close to zero, Euclidean baselines likely achieve comparable performances with reduced computational cost.
Fairness and bias considerations represent critical concerns for deploying reciprocal recommendation systems in high-stakes domains like employment. Geometric embeddings risk encoding and potentially amplifying the societal biases present in historical training data. Our learned curvature analysis reveals that job categories exhibit the strongest hierarchical structure (the largest learned $\beta$), potentially reflecting not only objective skill progression but also socially constructed prestige hierarchies that may disadvantage certain demographic groups. For example, if historical application data exhibit gender imbalance in senior technical positions—a well-documented phenomenon in technology industries [42]—hyperbolic embeddings might position male-dominated job categories closer to the hierarchy's center (smaller $x_0$ coordinates), implicitly encoding gender bias in the geometric structure itself. This encoded bias could perpetuate inequitable outcomes by assigning lower match probabilities to qualified women candidates for senior positions. Similarly, racial disparities in hiring outcomes could become embedded in learned hierarchies, with positions historically dominated by overrepresented groups positioned more favorably in the geometric space. Addressing these fairness concerns requires technical interventions beyond standard accuracy optimization. Recent work on fairness-aware learning in hyperbolic spaces [43] demonstrates that geometric fairness constraints can reduce bias while maintaining predictive performance, but the adaptation of these techniques to reciprocal recommendation contexts remains an open research problem. Practitioners deploying hyperbolic recommendation systems in production bear responsibility for regular fairness audits across demographic groups, monitoring for disparate impact, and implementing mitigation strategies when bias is detected.
Several limitations of the current work suggest directions for future research. First, the assumption of static hierarchical structures may not hold for dynamic domains where organizational structures evolve, new job categories emerge, and individuals transition between career tracks. Extending biLorentzFM to temporal settings requires adaptive curvature mechanisms capable of detecting hierarchical drift and adjusting the geometric structure accordingly. Potential approaches include time-dependent curvature learning with exponential smoothing, continual learning strategies that incrementally update embeddings as new interactions arrive, or hybrid models combining static global structures with dynamic local perturbations. Second, real-world taxonomies exhibit hierarchical structures at multiple resolutions: job markets contain both broad industry categories (technology, healthcare, finance) and fine-grained specializations (backend engineering, machine learning, quantitative trading). A single global curvature parameter captures an average hierarchical scale but cannot represent varying depths across taxonomy branches. Recent advances in product manifolds [44] combine multiple geometric components—potentially mixing hyperbolic, Euclidean, and spherical geometries—with different curvatures, enabling multi-scale hierarchical representations. Adapting such approaches to reciprocal recommendations could integrate industry-specific hierarchies while maintaining computational tractability. Third, while our evaluation on two datasets (job recommendation and dating) demonstrates robustness across domains, generalization to other reciprocal matching problems remains to be established. Mentor–mentee pairing, reviewer–paper assignment, roommate matching, and collaborative hiring all exhibit reciprocal structures with domain-specific hierarchies; investigating whether biLorentzFM's approach transfers to these applications would establish hyperbolic reciprocal recommendation as a general framework rather than a task-specific solution. Fourth, scalability to massive platforms presents engineering challenges: our experiments on Kariyer.Net (1.15M interactions, 230K users) demonstrate feasibility for moderate-scale systems, but extension to platforms like LinkedIn (800M+ users) or Indeed (250M+ monthly visitors) requires addressing computational bottlenecks in embedding memory and pairwise distance calculations. Distributed training strategies, approximate nearest neighbor search methods adapted to hyperbolic spaces, and hierarchical embedding compression techniques represent promising directions for achieving scale.
In conclusion, this work demonstrates that hyperbolic geometry, specifically the Lorentz model combined with factorization machines and multi-objective optimization, provides substantial and robust improvements for reciprocal job recommendation. The core insight—that hierarchical asymmetric relationships fundamental to job–candidate matching align naturally with the geometric properties of negatively curved spaces—translates into consistent empirical gains across multiple evaluation strategies with effect sizes (Cohen’s d = 3.13 from five-fold cross-validation) indicating very large practical impact. Beyond immediate performance improvements, our work establishes hyperbolic reciprocal recommendation as a promising research direction with multiple avenues for extension, including multi-resolution hierarchies for complex taxonomies, dynamic curvature learning for evolving domains, fairness-aware geometric constraints for equitable matching, and applications across diverse reciprocal contexts beyond employment. As automated recommendation systems increasingly mediate access to economic, social, and educational opportunities, ensuring these systems achieve not merely accuracy but also fairness, transparency, and accountability becomes essential. The geometric framework introduced here provides one step toward this goal by making hierarchical relationships explicit and amenable to inspection, but substantial work remains to translate technical advances into systems that serve diverse stakeholders equitably.