Abstract
Precise 3D shape correspondence is a fundamental prerequisite for critical applications ranging from medical anatomical modeling to visual recognition. However, non-isometric 3D shape matching remains a challenging task due to the limited sensitivity of traditional Laplace–Beltrami (LB) bases to local geometric deformations such as stretching and bending. To address these limitations, this paper proposes a Sinkhorn-Regularized Elastic Functional Map framework (SRE-FMaps) that integrates entropy-regularized optimal transport with an elastic thin-shell energy basis. First, a sparse Sinkhorn transport plan is adopted to initialize a bijective correspondence with linear computational complexity. Then, a non-orthogonal elastic basis, derived from the Hessian of thin-shell deformation energy, is introduced to enhance high-frequency feature perception. Finally, correspondence stability is quantified through a cosine-based elastic distance metric, enabling retrieval and classification. Experiments on the SHREC2015, McGill, and Face datasets demonstrate that SRE-FMaps reduces the correspondence error by a maximum of 32% and achieves an average of 92.3% classification accuracy (with a peak of 94.74% on the Face dataset). Moreover, the framework exhibits superior robustness, yielding a recall of up to 91.67% and an F1-score of 0.94, effectively handling bending, stretching, and folding deformations compared with conventional LB-based functional map pipelines. The proposed framework provides a scalable solution for non-isometric shape correspondence in medical modeling, 3D reconstruction, and visual recognition.
1. Introduction
Three-dimensional shape matching is a core problem in computer vision and geometric processing, whose essence is to establish point-wise correspondences between models while preserving geometric or semantic consistency. This task plays a pivotal role across multiple critical application scenarios. In medical imaging, 3D organ modeling enables precise anatomical simulation, supporting drug response assessment and disease mechanism analysis [1]; in face modeling, non-rigid 3D correspondence is fundamental for handling expression variations, directly influencing face recognition robustness [2]; and in industrial inspection, deformation-aware surface recovery ensures accurate structural defect detection in mechanical components, thus safeguarding product reliability [3]. Across these scenarios, non-isometric deformation—characterized by local nonlinear geometric variations such as bending, twisting, and stretching—poses a major challenge, making robust non-isometric matching a prerequisite for real-world deployment.
However, existing 3D matching techniques exhibit critical limitations under such deformation. Traditional Laplace–Beltrami (LB) operator-based methods [4,5] perform reliably in near-isometric conditions due to the isometry-invariant nature of spectral eigenfunctions, which effectively encode global topology. Yet, their low-frequency eigenbases inherently suppress high-frequency signals, rendering them incapable of capturing sharp curvature transitions or localized surface wrinkles. As a result, LB-based pipelines lose discriminative detail under elastic or articulated deformation. To overcome this limitation, our framework replaces LB eigenfunctions with elastic spectral bases [6] that explicitly encode local geometric distortion, thereby recovering deformation-sensitive correspondence that LB operators inherently overlook.
Meanwhile, deep learning-based approaches [7,8] have shown promising matching capabilities but rely heavily on large-scale annotated datasets and supervised training. Their high data dependence and limited generalization to rare or unseen deformation patterns restrict their adoption in industrial and medical environments where labeled data is scarce. Moreover, the high computational burden makes them impractical for large-scale or real-time applications. In contrast, our framework leverages an entropy-regularized Sinkhorn transport module with sparse kernel acceleration to efficiently estimate bijective correspondences without requiring any labeled data, achieving scalability while maintaining robustness.
To address the aforementioned limitations, we propose a Sinkhorn-Regularized Elastic Functional Map framework (SRE-FMaps) for non-isometric 3D shape matching. Unlike the standard elastic basis method [6], which relies on unstable Iterative Closest Point (ICP) [9] refinement, and traditional Sinkhorn approaches [10], which are limited by the Laplace–Beltrami operator, our framework uniquely synergizes these components. SRE-FMaps introduces a non-orthogonal elastic spectral basis derived from the Hessian of thin-shell deformation energy [6], enabling sensitivity to extrinsic deformations and fine-scale curvature variations. In addition, an entropy-regularized Sinkhorn transport module [10] with sparse kernel acceleration is incorporated to compute a bijective and computationally efficient initial correspondence. Intuitively, while the elastic basis acts as a “deformation-aware lens” to capture high-frequency details, the Sinkhorn regularization functions as a “soft assignment filter”, ensuring global consistency without getting trapped in local minima. Furthermore, a multi-scale cosine-based elastic distance metric is proposed to jointly encode global topology and local geometric detail, enabling both matching and classification within a unified framework.
The main contributions of this work are summarized as follows:
- Elastic deformation-aware spectral representation. We propose SRE-FMaps, a Sinkhorn-Regularized Elastic Functional Map framework that replaces conventional LB eigenbases with thin-shell-derived elastic basis functions, greatly enhancing sensitivity to high-frequency non-isometric deformation.
- Scalable entropy-regularized optimal transport correspondence. We integrate entropy-regularized Sinkhorn transport with sparse kernel acceleration to establish bijective and computationally efficient initialization, achieving linear complexity while maintaining high correspondence accuracy.
- Unified matching and classification via elastic cosine distance. We design a multi-scale cosine-based elastic distance metric that fuses global spectral information with local geometric features, enabling reliable evaluation in both 3D correspondence and classification tasks.
The remainder of this paper is organized as follows: Section 2 reviews related work on functional maps and deep shape matching strategies. Section 3 details the proposed SRE-FMaps framework, including the sparse Sinkhorn initialization and elastic basis construction. Section 4 presents the experimental results and performance comparisons on standard datasets. Finally, Section 5 concludes this paper and discusses future directions.
2. Related Work
Since 2012, the theory of functional maps [4] and its combination with geometric learning have gradually become the core research framework in the field of 3D non-rigid shape matching. Early work focused on mining the intrinsic geometric properties of shapes (such as Laplace–Beltrami spectral features [5]) and optimizing map constraints (such as orientation preservation and feature alignment), laying the theoretical foundation for low-dimensional matching based on spectral space. In addition, innovative methods based on deep learning continue to emerge, combined with data-driven strategies, to improve the accuracy of shape matching and promote the continued development of this field. This section will introduce traditional functional maps and deep functional maps.
2.1. Traditional Functional Maps
Traditional functional maps, first introduced by Ovsjanikov et al. [4], reformulate non-rigid 3D shape matching as a linear mapping problem by enforcing geometric consistency constraints. Correspondence refinement is typically achieved through least-squares optimization combined with Iterative Closest Point (ICP) post-processing [9]. However, such pipelines often suffer from symmetric inversion artifacts in point-to-point recovery.
Subsequent improvements aimed to address this instability. Corman et al. [11] resolved directional ambiguity via supervised descriptor learning and vector field regularization, while Nogneng et al. [12] strengthened local consistency by enforcing operator commutativity constraints. Huang et al. [13] proposed AdjointFmaps, introducing bidirectional coupling through adjoint operators. Poulenard et al. [14] further enhanced robustness by integrating topological invariants as explicit constraints. To improve resolution, Melzi et al. [15] introduced ZoomOut, which progressively upsamples coarse maps via spectral refinement, and Hu et al. [16] incorporated multi-scale spectral manifold wavelets (SMWs) [17] for richer feature encoding.
Despite their success, traditional functional maps fundamentally rely on Laplace–Beltrami (LB) eigenfunctions [5], which primarily capture intrinsic low-frequency structure and are insensitive to extrinsic deformations such as stretching and compression. To overcome this limitation, Magnet et al. [18] proposed ScalableFmaps for dense matching without mesh simplification, while Hartwig et al. [6] introduced an elastic spectral basis derived from thin-shell deformation energy. By encoding high-frequency curvature responses directly into the basis, the latter significantly improves alignment in regions with large geometric distortion.
2.2. Deep Functional Maps
Existing methods built on the functional map framework predict functional maps by learning feature representations robust to discretization variations, thereby advancing the development of supervised and unsupervised learning paradigms in 3D shape analysis. The deep functional map framework (FMNet), proposed by Litany et al. [7], models shape matching as a supervised correspondence problem—one that learns functional relationships between shapes via neural networks to achieve more flexible global matching than traditional point-to-point map approaches. Inspired by FMNet, subsequent studies have widely embraced functional maps (FMs) as core matching modules. However, conventional approaches rely heavily on approximate isometric assumptions and necessitate elaborate post-processing pipelines—this fundamentally restricts their applicability to scenarios involving non-isometric deformations.
Subsequent studies inspired by FMNet mostly use FMs as the matching layer, but traditional methods rely on approximate isometry assumptions and have complex post-processing, making them difficult to adapt to non-isometric deformation scenarios. The deep geometric matching framework that has emerged in recent years has broken through this limitation through multidimensional innovations. The Deep Shells framework by Eisenberger et al. [19] incorporates Smooth Shells into an end-to-end architecture, processing both intrinsic and extrinsic geometric information via entropy-regularized optimal transport and spectral CNN features [20]. Building on DiffusionNet [8], Donati et al. [21] developed DUO-FMNet using complex-valued functional maps [22] to achieve directional awareness, enabling unsupervised learning of orientation-sensitive features. Hu et al. [23] proposed RFMNet, synergizing diffusion-based features with optimal transport theory to reduce dependency on handcrafted descriptors while enhancing geometric fusion. Magnet et al. [18] further advanced efficiency with DiffZO, enabling large-scale correspondence through differentiable Zoomout optimization and topology-aware constraints.
The most recent advances have focused on addressing computational scalability and unsupervised robustness. For instance, Magnet et al. [24] proposed a memory-scalable learning pipeline to handle dense meshes, while others have integrated synchronous diffusion models to ensure smooth correspondence without supervision [25]. Furthermore, deep frequency-aware networks have been introduced to enhance spectral stability [26]. Despite these significant strides, ensuring bijective correspondence efficiently without heavy data dependence remains a challenge, which our SRE-FMaps aims to address.
In summary, the review of the existing literature reveals a critical dichotomy in non-isometric shape matching. While traditional spectral methods are mathematically elegant and efficient, their reliance on the isometry-invariant Laplace–Beltrami operator fundamentally limits their ability to capture high-frequency deformations like stretching and bending. Conversely, although deep learning approaches offer improved flexibility, they suffer from heavy dependence on large-scale annotated data and lack generalization to unseen deformation types. These observations suggest a clear need for a framework that incorporates the deformation sensitivity of physical models (such as elastic thin-shell energy) with the computational efficiency of modern optimization techniques, without the constraints of supervised training. This motivates the design of our SRE-FMaps, which bridges this gap by regularizing elastic functional maps with the Sinkhorn algorithm.
3. Methods
3.1. Overview of the SRE-FMaps Framework
Figure 1 illustrates the framework of the non-isometric 3D shape matching algorithm based on elastic functional maps. The specific procedural steps are as follows: (A). Input 3D Models: The input 3D models undergo non-isometric transformations, where their geometries are deformed in a manner that does not preserve the original distance and angle relationships. This deformation causes stretching, compression, bending, or twisting in specific regions, thereby altering local or global geometric properties. (B). Initial Map Calculation: Sparse kernel allocation is performed on the input models. An entropy-regularized transport plan is generated through Sinkhorn iterations, from which the final correspondence is extracted to complete the initial map computation. (C). Elastic Basis Calculation: The SRE-FMaps framework is constructed to generate a non-orthogonal basis functional map. An alternating optimization framework is employed to refine the mapping, ultimately outputting the correspondence between the two models. (D). Distance Calculation and Classification: Based on the established correspondence, the distance between models is computed, followed by classification to evaluate and output the classification accuracy.
Figure 1.
The framework of SRE-FMaps for non-isometric 3D shape matching in imaging applications. (A) Input 3D Models; (B) Initial Map Calculation; (C) Elastic Basis Calculation; (D) Distance Calculation and Classification.
To strictly evaluate the proposed method within a comprehensive experimental context, we adopt a systematic methodological pipeline, as illustrated in Figure 2. This pipeline encompasses four sequential stages: Stage 1 (Pre-processing) performs data standardization and spectral basis construction to establish the geometric representation space; Stage 2 (Model Building) implements the core SRE-FMaps algorithm, integrating Sinkhorn initialization and elastic functional map optimization; Stage 3 (Post-processing) converts the computed functional maps into point-to-point correspondences; and Stage 4 (Evaluation) quantifies performance using matching metrics (e.g., geodesic error, bijectivity error, etc.) and classification metrics (e.g., cosine distance [27], etc.).
Figure 2.
The methodological pipeline of the proposed study. It consists of four main stages: Pre-processing (basis construction), Model Building (SRE-FMaps core algorithm), Post-processing (correspondence recovery), and Evaluation (performance assessment).
3.2. Initial Map Calculation
The initial point-by-point correspondence is computed through the following procedure.
First, we establish the link between the functional representation and spatial mapping. By leveraging the property that the functional map aligns the spectral embeddings of two shapes, the point-to-point map can be recovered via a nearest neighbor search in the spectral domain. Detailed mathematical derivations regarding the adjoint operators and the consistency between spectral and spatial representations are provided in Appendix A to maintain focus on the core framework.
Next, the idea of extracting point-by-point map by traversing all points on the shape is realized:
where returns the point in p closest to query q. This step effectively converts the spectral alignment into spatial probability distributions.
Then, the spectral embedding alignment problem is treated as a linear assignment problem to be solved, where the point-to-point map is a doubly stochastic matrix and its search space is . The problem can be formulated as follows:
where , and are predefined initial masses for each vertex of x and y, respectively, and is the pairwise Euclidean distance matrix between aligned embeddings: .
Finally, the Sinkhorn algorithm is adopted to solve the linear assignment problem and construct a sparse kernel assignment matrix. This matrix only retains the distance values of a few nearest neighbors for each point (denoted as for the nearest neighbor set of the i-th point on X), which is expressed as follows:
The sparse kernel generated by Sinkhorn iteration improves the accuracy and bijectivity of the initial correspondence without significantly increasing the runtime, and the final initial point-wise correspondence is derived from this iteration.
3.3. SRE-FMaps
Based on the obtained initial map, solving for the elastic basis requires constructing the core operators of the functional map method adapted to non-orthogonal basis functions. Unlike standard spectral bases, the elastic basis functions are not orthonormal with respect to the standard inner product. This necessitates a careful definition of adjoint operators to ensure geometric consistency during map refinement.
3.3.1. Functional Maps Under Orthogonal and Non-Orthogonal Bases
First, we clarify key symbols for consistency: Let denote the space of real-valued functions on shape S; arrange a general basis of its k-dimensional subspace into a matrix . For non-orthogonal bases, the mass matrix in the spectral domain, defined as , is not an identity matrix.
A fundamental step for restricted bases is computing the best approximation of a function in , which is defined by the orthogonal projection operator:
where denotes the pseudoinverse of and is the mass-weighted norm.
In the standard functional map framework, the adjoint of the map matrix coincides with its transpose. However, due to the non-orthogonality of the elastic basis, the adjoint operator must be corrected by the mass matrices. The relationship between the functional map and the point-to-point map is given by
This equation ensures that the geometric structure of the map is preserved in the spectral domain. We provide the detailed mathematical derivation of this relationship and the definition of the adjoint operators in Appendix A.
3.3.2. Point-to-Point Map Calculation
To recover a precise point-to-point correspondence from the functional map, we enforce consistency between the spectral and spatial representations. The term exhibits regularization properties, capturing shape differences and enhancing invertibility. Incorporating this into the objective function yields
where the norm is defined on the orthogonal complement space of , imposing more comprehensive constraints on the continuity and consistency of .
Minimizing J effectively aligns the spectral embeddings of the two shapes. The optimization with respect to the point-to-point map is row-separable. For all , the minimizer satisfies if and only if
This step can be interpreted as a nearest neighbor search in the aligned spectral embedding space (further details on the derivation are provided in Appendix B). We employ an alternating optimization framework (similar to ZoomOut [15]) to iteratively refine and by increasing the spectral resolution k.
3.4. Shape Distance Measurement
In machine learning, shape features are typically represented as high-dimensional vectors. To quantify the similarity between two shape feature vectors, cosine similarity [27] is employed—this metric focuses on the angular relationship between vectors, avoiding distortion from scalar differences. For two shapes quantized as n-dimensional feature vectors A and B, their cosine similarity is defined as follows:
where denotes the dot product of A and B, and are the Euclidean (L2) norms of A and B, respectively.
Derived from cosine similarity, cosine distance quantifies the angular divergence between two vectors, making it particularly suitable for high-dimensional shape feature spaces. Its definition is
The cosine distance ranges from 0 to 2: a value of 0 indicates identical vector directions (high shape similarity), while 2 indicates opposite directions (low similarity). Critically, this metric depends only on vector direction (not magnitude)—even if two shapes have structural or scalar differences, their cosine distance remains small if their feature vectors are directionally aligned. This property makes it ideal for normalized shape data, as it mitigates interference from non-isometric deformations (e.g., stretching and compression) and accurately captures the intrinsic similarity between geometric forms.
4. Experiments
This section details the experimental setup and results. We first introduce the datasets employed in our evaluation, then validate the effectiveness of the proposed elastic functional maps, and finally present the results for shape distance calculation and classification.
4.1. Experimental Datasets
To comprehensively evaluate the proposed framework, we selected three datasets with diverse characteristics, ranging from articulated objects to fine-grained anatomical surfaces. The SHREC 2015 dataset [28] and the McGill dataset [29] serve as public benchmarks, while a private Face dataset is introduced for high-fidelity deformation analysis.
SHREC 2015 dataset: The SHREC 2015 dataset comprises 1200 watertight 3D triangle meshes across 50 categories. It is a balanced dataset, where each category contains exactly 24 objects exhibiting distinct poses and topological variations. Designed primarily for non-rigid 3D shape retrieval, this dataset evaluates algorithmic robustness against shape deformation and topological changes.
McGill dataset: The McGill dataset consists of 255 objects with articulated parts, spanning 10 categories (e.g., ants, crabs, hands, and humans). This dataset is characterized by significant non-isometric articulated deformations, such as limb rotations and bending, which effectively tests the robustness of the algorithm against part-based variations.
Face dataset: This dataset contains 100 high-resolution 3D face mesh models, preserving detailed anatomical features. Collected specifically for this study, it is characterized by fine-grained geometric details and captures subtle non-isometric deformations caused by varying facial expressions. This high-fidelity data is particularly suitable for analyzing subtle surface variations.
Data Pre-processing: To ensure consistent input for spectral analysis, all 3D meshes underwent a standardized pre-processing pipeline. Initially, center-of-mass alignment was performed to normalize spatial positioning. Subsequently, to address scale ambiguity between models, we normalized the surface area of each mesh to unity. Finally, to establish the functional representation space, we constructed spectral basis functions on the normalized meshes, providing the essential geometric encoding for the subsequent SRE-FMaps framework.
4.2. Experimental Design
Parameter evaluation was conducted on the McGill dataset. We selected a representative pair of non-isometrically deformed human shapes and performed an ablation study to validate the effectiveness of the core components within our framework. Specifically, we compared multiple initialization strategies (nearest neighbor [NN] vs. Sinkhorn) and spectral bases (Laplace–Beltrami [LB] vs. elastic basis [EB]) by varying the number of basis functions. Performance was assessed in terms of both accuracy and efficiency using six widely adopted metrics: average geodesic error, bijectivity error, coverage, Chamfer distance, smoothness, and runtime.
For notational consistency, let and denote the geodesic distance on the source shape S and target shape T, respectively; the estimated correspondence; A and B two point sets of sizes and ; and the indicator function. To quantitatively assess correspondence quality, we report six commonly used metrics.
Geodesic Error (GtErr, mm). This metric evaluates the intrinsic distortion induced by the mapping. Given a set of point pairs on the source shape and their mapped counterparts on the target via correspondence , it is defined as
Lower values indicate better preservation of intrinsic surface structure.
Bijectivity Error (%). To measure the consistency between forward and backward mappings, we compute
A perfectly bijective correspondence yields zero error.
Coverage (%). We further report the proportion of source points that are correctly mapped to valid targets:
Higher coverage reflects robustness against degeneration or collapse.
Chamfer Distance (mm2). To evaluate geometric proximity in Euclidean space, we compute the symmetric Chamfer distance
This complements GtErr by assessing visual and spatial alignment.
Smoothness (Dirichlet Energy, a.u.). The local regularity of the mapping function f is measured by
Lower values indicate smoother and more stable correspondences.
Runtime (s). We report the empirical wall-clock time T and additionally list the theoretical complexity
This presentation provides insight into both the practical efficiency and scalability of the proposed method.
As shown in Figure 3, the six evaluation metrics of the nearest neighbor (NN) and Sinkhorn initialization strategies were compared under varying numbers of basis functions. For geodesic error, NN exhibits a decreasing–increasing trend, whereas Sinkhorn consistently maintains lower distortion and achieves its minimum at 90 bases. For bijectivity error, NN fluctuates strongly, while Sinkhorn remains stable and steadily improves as the basis size increases. Coverage further highlights the advantage of Sinkhorn: NN yields almost no valid matches (close to 0%), whereas Sinkhorn improves monotonically and stabilizes at approximately 70% when 90 bases are used. The Chamfer distance of both methods decreases rapidly at first and then gradually levels off, with Sinkhorn achieving the lowest value at 90 bases. Regarding smoothness, the Dirichlet energy of both mappings decreases with increasing basis size, yet Sinkhorn consistently yields smoother functional assignments. Finally, the runtime of both methods increases with the number of bases, and Sinkhorn incurs a higher computational cost than NN.
Figure 3.
Comparison of NN and Sinkhorn under different basis numbers.
Synthesizing these metric evaluations, we further analyze the optimal parameter selection and the algorithm’s scalability to balance accuracy and efficiency.
First, regarding the parameter selection, the number of basis functions (k) represents a critical trade-off between spectral resolution and computational noise. As analyzed in Figure 3, setting results in the loss of high-frequency geometric details, while introduces spectral instability and significantly increases the complexity of Sinkhorn iterations. Consequently, based on the convergence of error metrics, was empirically selected as the optimal balance.
Second, addressing the computational cost noted above, we analyze the scalability of the framework. The complexity of SRE-FMaps is primarily dominated by the Sinkhorn initialization (). While this is efficient for standard datasets (e.g., ∼5–10 k vertices), the runtime scales quadratically with mesh resolution. To mitigate this for very large meshes (>50 k vertices), we recommend utilizing the proposed sparse kernel approximation, which reduces the effective complexity to near-linear (), ensuring practical applicability in industrial scenarios.
Matching Performance Evaluation
After obtaining the initial correspondence, we further validate the effectiveness of elastic functional maps by comparing the Laplace–Beltrami (LB) operator with the elastic basis (EB) on the Face dataset. As shown in Figure 4, LB aligns prominent regions such as the nose and mouth but exhibits noticeable mismatches in flatter or highly deformable areas (e.g., cheeks and chin), resulting in fragmented or distorted color transfer. In contrast, EB produces globally consistent correspondences with smooth region transitions, indicating higher robustness and stability under non-isometric variations.
Figure 4.
Comparison of the visualization results of the LB operator and elastic basis. Red boxes indicate the regions with inconsistent correspondences.
A quantitative comparison on the McGill human pair further supports this observation. Figure 5 reports the cumulative correspondence ratio under varying point-to-point error thresholds for Initial, LB, and EB mappings. EB rises the fastest in the low-error range and consistently remains the highest across all thresholds, demonstrating the smallest alignment error and the largest proportion of reliable matches. LB performs moderately, while the Initial map retains the fewest valid correspondences. Overall, EB exhibits superior accuracy and error tolerance, confirming its advantage in non-isometric shape matching.
Figure 5.
Visual correspondence comparison between the Laplace–Beltrami (LB) operator and the elastic basis (EB) on the Face dataset.
4.3. Comparative Experiment
To validate the effectiveness of our approach, we compare it against representative functional map-based methods on three datasets (Face, SHREC2015, and McGill), with qualitative correspondence visualizations shown in Figure 6. Across all datasets, our method achieves markedly better structural continuity and alignment accuracy.
Figure 6.
Comparison between ZoomOut, ICP, Scalable-ZoomOut, EB, and our method on three representative datasets (Face, SHREC2015, and McGill).
On the Face dataset, classical methods (ZoomOut, ICP, and Scalable-ZoomOut) exhibit color discontinuities in fine-scale regions such as the nasal bridge and mouth corners, while the elastic basis (EB) variant smooths global transitions but blurs local features. In contrast, our method preserves both natural gradients and precise feature boundaries. A similar pattern is observed on SHREC2015 (centaur shapes) and McGill (gorilla shapes), where competing methods introduce deformation artifacts at limb joints or extremities, whereas our approach maintains coherent and geometrically consistent correspondences.
It is important to note that our comparisons are restricted to methods within the functional map (FM) framework, as the proposed elastic functional maps specifically target limitations of classical FM pipelines—such as reliance on isometric assumptions and post-processing refinements. Direct comparison with neural models (e.g., FMNet [7] and Deep Shells [19]) would introduce additional learning-related variables and obscure the contribution of our geometric regularization strategy.
Overall, the visual evidence confirms that our method produces more accurate and continuous correspondences than existing FM-based approaches, demonstrating greater robustness under non-isometric deformation.
4.4. Shape Distance Calculation and Classification Results
4.4.1. Shape Distance Calculation
For the Face dataset, six shapes were used and divided into two classes (three male and three female). For the SHREC2015 dataset, six representative categories (centaur, ant, gorilla, pliers, dog, and human) were selected, with one shape per class. Table 1 summarizes the cosine distance results of four correspondence strategies (NN + LB, NN + EB, Sinkhorn + LB, and Sinkhorn + EB). On the Face dataset, a smaller intra-class cosine distance indicates stronger similarity among samples of the same gender, while a larger inter-class distance reflects better discriminative ability. Among all compared methods, the Sinkhorn+EB configuration achieves the lowest intra-class distances while maintaining clear inter-class separation, suggesting improved feature compactness and better class separability.
Table 1.
Cross-dataset method performance comparison. For the Face dataset, C1 = Male and C2 = Female. For the SHREC2015 dataset, C1 = Centaur, C2 = Ant, C3 = Gorilla, C4 = Pliers, C5 = Dog, and C6 = Human. Repeated combinations (e.g., C1-C1) indicate different subject pairs from the same class.
4.4.2. Classification Results
Figure 7 illustrates the cosine distance heatmaps for both datasets. On the Face dataset, the Sinkhorn + EB configuration exhibits clear block-diagonal structures, characterized by compact intra-class regions and sharply defined inter-class boundaries. A similar trend is observed on SHREC2015, where six categories are cleanly separated without noticeable cross-category leakage. In contrast, the remaining methods suffer from either intra-class dispersion or inter-class entanglement. These results suggest that Sinkhorn + EB produces more stable and discriminative correspondence patterns in both binary and multi-class settings.
Figure 7.
Cosine distance heatmaps generated by different correspondence strategies. In the heatmaps, warmer colors (e.g., red, yellow) represent smaller cosine distances (higher similarity), while cooler colors (e.g., blue, cyan) represent larger cosine distances (lower similarity). Sinkhorn + EB exhibits clear block-diagonal structures with high intra-class compactness and inter-class separability on both Face and SHREC2015 datasets.
To further validate classification robustness under different task complexities, two representative datasets were employed. The Face dataset corresponds to a binary scenario, where 99 models were split into 80% training and 20% testing sets. The SHREC2015 dataset represents a more challenging six-class setting, with a 70%/30% train–test split over 35 models.
Performance was evaluated using standard global metrics (accuracy, precision, recall, and F1-score), complemented by specificity, which is particularly informative for negative-sample detection in binary classification (see Table 2). The mathematical definitions of these standard performance metrics are adopted from [30]. Across all correspondence strategies (NN + LB, NN + EB, Sinkhorn + LB, and Sinkhorn + EB), the proposed Sinkhorn+EB consistently achieves superior results on both datasets (see Table 3 and Table 4). On the Face dataset, it achieves 94.74% accuracy with perfect specificity and precision (100%), while retaining an 88.89% recall rate (F1 = 0.94), indicating that it can detect positive samples effectively without introducing false alarms. On SHREC2015, despite higher inter-class diversity, Sinkhorn+EB still maintains balanced precision and recall at 91.67% (F1 = 0.89), outperforming all baselines.
Table 2.
Commonly used classification evaluation indicators.
Table 3.
Comparison of classification performance on the Face dataset.
Table 4.
Comparison of classification performance on the SHREC2015 dataset.
Taken together, these results demonstrate the complementary benefits of Sinkhorn optimization and elastic basis construction. While Sinkhorn improves correspondence stability over nearest neighbor initialization, the elastic basis contributes additional discriminative power by capturing extrinsic deformation cues. Their combination yields both intra-class compactness and inter-class separability, whereas NN+LB notably suffers from severe recall degradation (only 33.33% on Face), revealing its limitations in handling complex deformations.
4.5. Discussion and Limitations
This study presented SRE-FMaps, a framework designed to address the limitations of Laplace–Beltrami bases in handling non-isometric deformations. The primary outcome of our study, as evidenced by the experiments on the Face, McGill, and SHREC2015 datasets, is that replacing the standard spectral basis with a Hessian-derived elastic basis significantly enhances the preservation of high-frequency features under large deformations. Furthermore, the integration of entropy-regularized Sinkhorn transport provides a globally consistent initialization that avoids the local minima often encountered by nearest neighbor approaches.
To validate the individual contributions of these components, we conducted an ablation study by comparing four configurations: NN + LB, NN + EB, Sinkhorn + LB, and Sinkhorn + EB (Table 1, Table 3, and Table 4). The quantitative results demonstrate that while the elastic basis (EB) alone improves local detail preservation, its combination with Sinkhorn initialization yields the highest accuracy and robustness. This confirms that a bijective and globally consistent initialization is a prerequisite for fully leveraging the deformation-aware properties of the elastic basis. Specifically, the elastic basis (EB) encodes the local minimization of the elastic shell energy, making the basis functions inherently robust to non-isometric changes like local bending and folding. This mechanism allows SRE-FMaps to naturally preserve high-curvature details during functional map matching, a capability often degraded by standard spectral methods.
Although the proposed SRE-FMaps framework demonstrates strong robustness, several limitations remain. First, when encountering extreme distortions (e.g., stretching or compression exceeding 50%), the high-frequency components of the elastic basis become less stable, leading to a noticeable degradation in correspondence precision due to weakened locality of the elastic shell eigenfunctions.
Second, a limitation lies in the construction of the sparse kernel matrix, where the neighborhood size must be manually specified. While a fixed value performs well within a dataset, the optimal setting varies across shape categories and deformation types, indicating the lack of an adaptive selection mechanism. Furthermore, the iterative refinement still requires multiple steps, which may become computationally expensive on very large point sets.
Future work will address these limitations through two dimensions. To overcome the manual parameter selection, we plan to implement an adaptive neighborhood strategy based on local curvature entropy, allowing to vary dynamically according to geometric complexity. Additionally, to handle extreme topological variations and improve computational efficiency, we will explore a coarse-to-fine multi-scale optimization scheme that progressively refines the elastic basis, ensuring stability even under severe non-isometric distortions, and investigate multi-resolution solvers for large-scale dense point clouds.
5. Conclusions
This paper presents an elastic functional map-based algorithm for non-isometric 3D shape matching. To address the instability of traditional Laplace–Beltrami eigenbases and the misalignment caused by large deformations, correspondence is initialized using the Sinkhorn algorithm, yielding bijective and globally consistent mappings via a sparse entropy-regularized kernel. A non-orthogonal elastic basis derived from the Hessian of elastic shell energy further enhances local sensitivity and high-frequency responsiveness. Extensive experiments on the Face, McGill, and SHREC2015 datasets confirm that the proposed framework outperforms existing approaches in terms of matching accuracy, robustness, and classification capability, particularly under high-curvature bending and localized folding. These results demonstrate that SRE-FMaps offers a practical and efficient solution for real-world non-isometric shape correspondence. However, limitations remain regarding the stability of basis functions under extreme distortions (e.g., excessive stretching) and the reliance on manual parameter tuning for kernel construction, which may incur high computational costs on large-scale dense point clouds. To overcome these limitations, future work will focus on exploring curvature-aware or entropy-driven strategies to automatically adjust the neighborhood size , as well as investigating multi-resolution solvers and data-driven refinement of elastic bases to further enhance scalability and robustness against extreme topological variations.
Author Contributions
Conceptualization, D.Z. (Dan Zhang) and N.W.; methodology, D.Z. (Dan Zhang); software, N.W. and D.Z. (Dong Zhao); validation, N.W.; formal analysis, N.W. and D.Z. (Dong Zhao); investigation, N.W. and D.Z. (Dong Zhao); resources, D.Z. (Dan Zhang); writing—original draft preparation, Y.Z.; writing—review and editing, D.Z. (Dan Zhang) and Y.Z.; visualization, Y.Z.; supervision, D.Z. (Dan Zhang); project administration, D.Z. (Dan Zhang) and D.Z. (Dong Zhao); funding acquisition, D.Z. (Dan Zhang). All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by the National Nature Science Foundation of China (No. 6246070542, 62102213, 62262056), Natural Science Youth Foundation of Qinghai Province (No.2023-ZJ-947Q).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original data presented in the study are openly available in SHREC 2015 dataset at https://www.icst.pku.edu.cn/zlian/representa/3d15/dataset/index.htm (accessed on 15 October 2025) and in the McGill 3D Shape Benchmark at http://www.cim.mcgill.ca/~shape/benchMark/ (accessed on 15 October 2025), but currently unavailable due to webpage parsing failure. Alternative access can be requested from the McGill University School of Computer Science or via academic data repositories such as IEEE DataPort).
Acknowledgments
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which significantly improved the quality of this manuscript.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Derivation of Adjoint Operators
In this appendix, we provide the detailed mathematical definitions for the inner products and derive the adjoint operators referenced in Section 3.2 and Section 3.3. We first define the general continuous case and then derive the specific matrix forms required for non-orthogonal elastic bases.
Appendix A.1. Continuous Definitions and Inner Products
Let and denote the spaces of real-valued functions on source shape X and target shape Y, respectively. The standard inner product of two functions is defined as follows:
In the context of functional maps, a linear operator transfers functions from X to Y. The adjoint operator is implicitly defined by the following adjoint property, which must hold for all and :
This property forms the theoretical basis for converting functional correspondence back to spatial point-to-point correspondence.
Appendix A.2. Discrete Discretization and Mass Matrices
When shapes are discretized into meshes with and vertices, functions are represented as vectors. Let and be the diagonal lumped mass matrices (containing area weights) for shapes and . The discrete inner product becomes
For the specific case of SRE-FMaps, we employ a truncated elastic basis. Let be the matrix containing the first k elastic basis functions. The spectral mass matrix in this reduced basis is defined as follows:
Note: Unlike the Laplace–Beltrami eigenbasis, the elastic basis is generally non-orthogonal, meaning . This requires explicit correction terms in the adjoint derivations below.
Appendix A.3. Adjoint of the Functional Map Matrix
Let be the functional map matrix (the discrete representation of ) transferring coefficients from basis to . Its adjoint matrix (representing ) is derived from the discrete version of Equation (A2):
where are coefficient vectors. This can be expanded using the inner product definition:
Since this holds for arbitrary vectors, we obtain the closed-form expression for the adjoint functional map:
Appendix A.4. Adjoint of the Point-to-Point Map
Similarly, let be the binary matrix representing the point-to-point correspondence. Its adjoint satisfies . Following the same derivation logic as above yields
Appendix B. Derivation of Spectral–Spatial Consistency
In this appendix, we utilize the adjoint operators derived in Appendix A to establish the relationship between the functional map and the point map in the elastic basis. This leads to the consistency equation used in Equation (5) of the main text.
The functional map is computed by projecting the mapped basis functions. Using the generalized pseudoinverse , we have:
Expanding the transpose (noting that due to mass symmetry),
The term cancels to identity:
By inserting appropriately, we recover the adjoint definitions:
Thus, we arrive at the consistency equation used in the main text:
This confirms that our framework preserves the geometric structure of the map even when using non-orthogonal elastic bases.
References
- Weinhart, M.; Hocke, A.; Hippenstiel, S.; Kurreck, J.; Hedtrich, S. 3. organ models—Revolution in pharmacological research? Pharmacol. Res. 2019, 139, 446–451. [Google Scholar] [CrossRef] [PubMed]
- Al-Osaimi, F.; Bennamoun, M.; Mian, A. An expression deformation approach to non-rigid 3D face recognition. Int. J. Comput. Vis. 2009, 81, 302–316. [Google Scholar] [CrossRef]
- Salzmann, M.; Pilet, J.; Ilic, S.; Fua, P. Surface deformation models for nonrigid 3D shape recovery. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1481–1487. [Google Scholar] [CrossRef] [PubMed]
- Ovsjanikov, M.; Ben-Chen, M.; Solomon, J.; Butscher, A.; Guibas, L. Functional maps: A flexible representation of maps between shapes. ACM Trans. Graph. (TOG) 2012, 31, 30. [Google Scholar] [CrossRef]
- Rustamov, R.M. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Symposium on Geometry Processing; Eurographics Association: Goslar, Germany, 2007; Volume 257, pp. 225–233. [Google Scholar]
- Hartwig, F.; Sassen, J.; Azencot, O.; Rumpf, M.; Ben-Chen, M. An elastic basis for spectral shape correspondence. In Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–11. [Google Scholar]
- Litany, O.; Remez, T.; Rodola, E.; Bronstein, A.; Bronstein, M. Deep functional maps: Structured prediction for dense shape correspondence. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5659–5667. [Google Scholar]
- Sharp, N.; Attaiki, S.; Crane, K.; Ovsjanikov, M. Diffusionnet: Discretization agnostic learning on surfaces. ACM Trans. Graph. (TOG) 2022, 41, 27. [Google Scholar] [CrossRef]
- Chetverikov, D.; Svirko, D.; Stepanov, D.; Krsek, P. The trimmed iterative closest point algorithm. In Proceedings of the 2002 International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; IEEE: New York, NY, USA, 2002; Volume 3, pp. 545–548. [Google Scholar]
- Pai, G.; Ren, J.; Melzi, S.; Wonka, P.; Ovsjanikov, M. Fast sinkhorn filters: Using matrix scaling for non-rigid shape correspondence with functional maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 384–393. [Google Scholar]
- Corman, E.; Ovsjanikov, M.; Chambolle, A. Continuous matching via vector field flow. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2015; Volume 34, pp. 129–139. [Google Scholar]
- Nogneng, D.; Ovsjanikov, M. Informative descriptor preservation via commutativity for shape matching. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2017; Volume 36, pp. 259–267. [Google Scholar]
- Huang, R.; Ovsjanikov, M. Adjoint map representation for shape analysis and matching. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2017; Volume 36, pp. 151–163. [Google Scholar]
- Poulenard, A.; Skraba, P.; Ovsjanikov, M. Topological function optimization for continuous shape matching. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2018; Volume 37, pp. 13–25. [Google Scholar]
- Melzi, S.; Ren, J.; Rodola, E.; Sharma, A.; Wonka, P.; Ovsjanikov, M. Zoomout: Spectral upsampling for efficient shape correspondence. arXiv 2019, arXiv:1904.07865. [Google Scholar] [CrossRef]
- Hu, L.; Li, Q.; Liu, S.; Liu, X. Efficient deformable shape correspondence via multiscale spectral manifold wavelets preservation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14536–14545. [Google Scholar]
- Hammond, D.K.; Vandergheynst, P.; Gribonval, R. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. 2011, 30, 129–150. [Google Scholar] [CrossRef]
- Magnet, R.; Ovsjanikov, M. Scalable and efficient functional map computations on dense meshes. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2023; Volume 42, pp. 89–101. [Google Scholar]
- Eisenberger, M.; Toker, A.; Leal-Taixé, L.; Cremers, D. Deep shells: Unsupervised shape correspondence with optimal transport. Adv. Neural Inf. Process. Syst. 2020, 33, 10491–10502. [Google Scholar]
- Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203. [Google Scholar]
- Donati, N.; Corman, E.; Ovsjanikov, M. Deep orientation-aware functional maps: Tackling symmetry issues in shape matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 742–751. [Google Scholar]
- Donati, N.; Corman, E.; Melzi, S.; Ovsjanikov, M. Complex functional maps: A conformal link between tangent bundles. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2022; Volume 41, pp. 317–334. [Google Scholar]
- Hu, L.; Li, Q.; Liu, S.; Yan, D.M.; Xu, H.; Liu, X. RFMNet: Robust deep functional maps for unsupervised non-rigid shape correspondence. Graph. Model. 2023, 129, 101189. [Google Scholar] [CrossRef]
- Magnet, R.; Ovsjanikov, M. Memory-scalable and simplified functional map learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4041–4050. [Google Scholar]
- Cao, D.; Lähner, Z.; Bernard, F. Synchronous diffusion for unsupervised smooth non-rigid 3D shape matching. In European Conference on Computer Vision; Springer Nature Switzerland: Cham, Switzerland, 2024; pp. 262–281. [Google Scholar]
- Luo, F.; Li, Q.; Hu, L.; Wang, H.; Xu, H.; Liu, X.; Liu, S.; Chen, H. Deep Frequency Awareness Functional Maps for Robust Shape Matching. In IEEE Transactions on Visualization and Computer Graphics; IEEE: New York, NY, USA, 2025; Volume 31, pp. 7781–7794. [Google Scholar]
- Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM 1975, 18, 613–620. [Google Scholar] [CrossRef]
- Pickup, D.; Sun, X.; Rosin, P.L.; Martin, R.R.; Cheng, Z.; Nie, S.; Jin, L. SHREC’15 track: Canonical forms for non-rigid 3D shape retrieval. In Proceedings of the 8th Eurographics Workshop on 3D Object Retrieval, Zurich, Switzerland, 2–3 May 2015. [Google Scholar]
- Siddiqi, K.; Zhang, J.; Macrini, D.; Shokoufandeh, A.; Bouix, S.; Dickinson, S. Retrieving articulated 3-D models using medial surfaces. Mach. Vis. Appl. 2008, 19, 261–275. [Google Scholar] [CrossRef]
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).