Article

SFGS-SLAM: Lightweight Image Matching Combined with Gaussian Splatting for a Tracking and Mapping System

School of Electronic Engineering, Beijing University of Posts and Telecommunications, No. 10, Xitucheng Road, North Taipingzhuang, Haidian District, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 10876; https://doi.org/10.3390/app152010876
Submission received: 9 September 2025 / Revised: 5 October 2025 / Accepted: 6 October 2025 / Published: 10 October 2025

Abstract

The integration of SLAM with Gaussian splatting presents a significant challenge: reconciling real-time performance with high-quality rendering. This paper introduces a novel SLAM system, SFGS-SLAM (SuperFeats Gaussian Splatting SLAM), restructured from tracking to mapping to address this issue. A new keypoint detection network, SuperFeats, is designed with fewer parameters than existing networks such as SuperPoint, resulting in faster processing. This keypoint detection network is augmented with a global factor graph that incorporates GICP (Generalized Iterative Closest Point) odometry, reprojection-error factors, and loop-closure constraints to minimize drift, and it is integrated with Gaussian splatting for mapping. By leveraging the reprojection error, the proposed system further reduces odometry error and improves rendering quality without compromising speed. It is worth noting that SFGS-SLAM is primarily designed for static indoor environments and does not explicitly model or suppress dynamic disturbances. Extensive experiments on indoor and synthetic datasets show that SFGS-SLAM achieves accuracy comparable to state-of-the-art SLAM systems while running in real time. SuperFeats reduces matching latency by over 50%, and the joint optimization significantly improves global consistency. Our results demonstrate the practicality of combining lightweight feature matching with dense Gaussian mapping and highlight the trade-offs between speed and accuracy.

1. Introduction

High-quality mapping and rendering based on real-world data has long been a significant area of research [1]. The progression from traditional raster maps [2] to NeRF-based methods [3], and more recently to the advent of 3D Gaussian splatting (3DGS) [4], has led to substantial advancements in visually based mapping quality. Notably, with the increasing computational power and the advantages of Gaussian splatting in terms of mapping quality and speed [5], numerous groundbreaking studies have emerged in this field [6]. Given SLAM’s intrinsic requirement for mapping, integrating Gaussian splatting methods with SLAM offers considerable potential [7,8]. However, existing approaches face challenges in achieving satisfactory real-time performance when combining SLAM with 3DGS. This limitation presents a significant obstacle to the real-time applicability of SLAM systems.
In general, recent systems demonstrate that combining explicit geometric features with learned photometry can greatly improve mapping quality [9], but they often depend on heavy deep networks or large amounts of computation. In contrast, our goal is to build a lightweight SLAM front-end that still supports high-quality Gaussian mapping. Building upon this foundation, we further explore the integration of keypoint detection and descriptor matching within vision-based Gaussian splatting frameworks, which has yet to be fully investigated. For instance, PhotoSLAM [10] offers few innovations in its tracking front-end, while its mapping and training pipeline remains complex and computationally heavy. In recent years, deep learning-based approaches to keypoint matching have been widely explored [11,12,13]. However, these methods face significant speed limitations in practical testing and often perform suboptimally in resource-constrained environments or in systems demanding rapid feature extraction. Furthermore, the need for real-time performance in keypoint detection networks often outweighs the demand for absolute matching accuracy. This principle is evident in earlier non-learning-based feature matching systems, such as [14,15], where accuracy limitations were effectively mitigated through specialized techniques. This adaptability has allowed the ORB-SLAM [16] system to remain a leading approach in localization-oriented SLAM, even after many years of advances in the field. Despite the aforementioned challenges, we identify promising approaches to address them. The lightweight keypoint detection network [13] and the ICP-based 3DGS system [9] offer new directions for advancing SLAM systems. Both approaches significantly enhance speed while maintaining high accuracy. Such a lightweight network is particularly well-suited for keypoint-based tracking systems, as it achieves acceptable processing speed and superior matching accuracy compared to traditional methods such as ORB [15].
We introduce SFGS-SLAM, a complete tracking-and-mapping pipeline that balances speed and quality. On the front-end, we design SuperFeats, a compact deep network that detects and describes image keypoints with far fewer parameters than SuperPoint [11] or ALIKE [12], inspired by XFeat [13] and YOLO bottlenecks [17]. SuperFeats outputs a keypoint heatmap and 64-D descriptors at 1/8 resolution, along with a dustbin channel to handle non-keypoint pixels, and runs roughly 3× faster than comparable networks while maintaining matching accuracy. In tracking, we combine feature matches with depth and apply Generalized ICP (GICP) [18] between frames for initial pose estimates. In the back-end, each matched 3D landmark and camera pose becomes a variable, and each observation becomes a factor. Reprojection errors enforce consistency between predicted and observed image points. The factor graph consists of reprojection-error factors, GICP factors, and loop-closure factors, and we solve the combined graph using GTSAM [19]. Importantly, we integrate 3D Gaussian splatting [4] as the scene representation. Each keyframe contributes Gaussians to the model, which provides photorealistic renderings from any viewpoint. The camera poses optimized via feature reprojection keep the Gaussian map consistent and drift-free. The primary contributions of this work are as follows:
  • To further enhance speed, we incorporated elements from YOLO-based detection methods [17] to design a lightweight keypoint detection and matching network named SuperFeats.
  • We designed specific loss functions for both the keypoint detection and descriptor networks in SuperFeats. By combining semi-supervised and self-supervised training, the network achieved promising results.
  • Leveraging this network, we integrated it with the GICP algorithm [9] and factor graph to develop a new SLAM system, SFGS-SLAM, which optimizes rendering quality efficiently.
The rest of this paper is organized as follows. Section 2 reviews related work on keypoint detection and Gaussian splatting-based SLAM. Section 3 presents a theoretical analysis of SFGS-SLAM, including the SuperFeats network, feature point mapping, and the factor graph. The simulation and real-world experiment results are provided in Section 4, followed by the discussion in Section 5. We conclude this work in Section 6.

2. Related Work

In a SLAM system, the front-end is crucial, and robust feature matching allows for a simplified back-end. Inspired by traditional keypoint methods, the seminal SuperPoint [11] was the first to apply deep learning to visual feature matching. Its robustness is exceptional, and more lightweight approaches have since been proposed to improve speed. For instance, the ALIKE method [12] offers a comprehensive improvement in both matching accuracy and speed over earlier methods. The XFeat network [13] incorporates early downsampling and shallow convolutions, followed by deeper convolutional layers in subsequent encoders for fast and robust feature extraction; keypoint detection is separated into its own branch and processed rapidly on an 8 × 8 tensor-block-transformed image using 1 × 1 convolutions. While speed has improved dramatically, a gap remains compared to the traditional ORB method [15]. Our approach aims to bring the speed closer to that of ORB while maintaining high-quality feature matching.
The primary challenge in combining 3DGS with SLAM lies in ensuring real-time performance. SplaTAM [7] was the first open-source work to integrate 3DGS with SLAM. Its key contributions include fast rendering and optimization, an improved ability to determine whether a given area has been previously reconstructed, and the extension of the map by adding additional Gaussian representations. MonoGS [8] was the first to apply 3D Gaussian splatting to incremental 3D reconstruction with monocular or RGB-D cameras; it runs at about 3 frames per second and uses Gaussian splatting as its sole 3D representation. PhotoSLAM [10] combines the traditional ORB-SLAM3 method [16] with 3DGS, enabling the system to use explicit geometric features for localization while implicitly capturing the texture of the scene. This method can run on embedded platforms, showcasing its potential for real-world robotics applications. The integration of the ICP algorithm with 3DGS [9] enabled near-real-time tracking and mapping for the first time. In parallel, recent studies have explored how saliency prediction [20] and hierarchical reinforcement learning [21] can enhance the robustness and semantic awareness of SLAM systems, suggesting potential future directions for tightly coupling 3DGS with attention-guided or memory-augmented navigation frameworks. Despite these speed improvements, the rendering quality of 3DGS-based SLAM methods still lags behind that of non-SLAM approaches.

3. SFGS-SLAM Framework

In SFGS-SLAM, accuracy meets efficiency. Figure 1 compares our approach with existing methods: squares represent keypoint detection methods, dots denote 3DGS-based SLAM algorithms, and the red stars mark our proposed method. Our work comprises two main components: the keypoint detection network SuperFeats and the SLAM system SFGS-SLAM. The comparison shows that our method achieves an effective balance between performance and speed. Specifically, pose accuracy (10°) refers to the proportion of poses whose maximum angular error is below 10 degrees. The detection network results are derived from the MegaDepth-1500 dataset, while the SLAM system's performance is evaluated on the Replica dataset [22], as shown in Figure 1.
Our SFGS-SLAM pipeline consists of three main parts: (1) a lightweight keypoint detection and description network (SuperFeats) for the SLAM front-end, (2) frame alignment using GICP for odometry and Gaussian point initialization, and (3) global optimization of camera poses and map points via a factor graph, with Gaussian splatting rendering for mapping. Figure 2 outlines the system flow, and we describe each component in detail below.

3.1. Network Structure and Loss Functions

In a tracking system, the length of the descriptor does not directly determine the final tracking accuracy. For example, the 128-dimensional SIFT descriptor and the 32-dimensional ORB descriptor both exhibit lower matching accuracy than SuperPoint [11], yet systems such as VINS-Mono [23] and ORB-SLAM3 [16] have achieved outstanding performance with these descriptors. On the other hand, running speed has always been a challenge for on-device feature detection networks. To address this, we designed a smaller network, SuperFeats, aiming for faster speeds without compromising matching accuracy. For the detection component, we drew inspiration from XFeat [13] and developed a simple detection head. It uses multiple 1 × 1 convolutions and multi-layer convolutional modules on top of deep convolutional features to preserve resolution and accuracy in the detection results. The grayscale image, with dimensions H × W × 1, is decomposed into a tensor of size H/8 × W/8 × 64, where each channel corresponds to one pixel of an 8 × 8 cell. A dustbin channel is appended to the last dimension to account for the absence of keypoints, giving a final output of size H/8 × W/8 × 65.
In the description subnetwork, the SuperFeats network is inspired by YOLOPoint [24], with the Faster Implementation of CSP bottleneck with 2 convolutions (C2F) module from YOLOv8 serving as the core component. This module is effectively combined with multiple convolutional layers and bottleneck structures. The C2F module has been shown to significantly reduce computational complexity while preserving the ability to express spatial information [17]. The introduction of this module allows our network to achieve efficient data compression and information transfer during keypoint extraction, ensuring both low computational cost and high inference speed. For the SuperFeats model, the descriptor length is set to 64. Shortening the descriptor length reduces the computational cost of matching but may come at the expense of precision. The network starts with an input grayscale image, which is processed through a normalization layer. Subsequently, SuperFeats progressively extracts both low-level features and high-level semantic information through four convolutional layers. Each convolutional layer is followed by a C2F module, and the block fusion modules within the network merge features from different levels to enhance feature expressiveness. The final two layers of the network are fused. The output of the network consists of two primary components: a heatmap and a descriptor. The network structure is illustrated in Figure 3.
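To make this layout concrete, the following PyTorch sketch shows one way such a backbone could be organized: four strided convolution stages each followed by a simplified C2F-style block, fusion of the last two levels, a 65-channel keypoint head (64 cells of an 8 × 8 block plus a dustbin), and a 64-D descriptor head at 1/8 resolution. The channel widths, the C2F internals, and the fusion details are illustrative assumptions rather than the exact published architecture.

```python
# A minimal SuperFeats-style sketch (not the authors' exact architecture):
# four strided conv stages with C2F-like blocks, the last two levels fused,
# a 65-channel keypoint head (8x8 cells + dustbin) and a 64-D descriptor head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c))
    def forward(self, x):
        return F.relu(x + self.block(x))          # residual connection

class C2F(nn.Module):
    """Simplified CSP-style block: split channels, refine one half, re-fuse."""
    def __init__(self, c, n=1):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 1)
        self.m = nn.Sequential(*[Bottleneck(c // 2) for _ in range(n)])
        self.cv2 = nn.Conv2d(c + c // 2, c, 1)
    def forward(self, x):
        a, b = self.cv1(x).chunk(2, dim=1)
        return self.cv2(torch.cat([a, b, self.m(b)], dim=1))

class SuperFeatsSketch(nn.Module):
    def __init__(self, desc_dim=64, widths=(16, 32, 64, 128)):
        super().__init__()
        layers, c_in = [], 1
        for c_out in widths:                      # each stage halves the resolution
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True), C2F(c_out)]
            c_in = c_out
        self.encoder = nn.ModuleList(layers)
        self.fuse = nn.Conv2d(widths[-1], widths[-2], 1)    # fuse the last two levels
        self.kpt_head = nn.Conv2d(widths[-2], 65, 1)        # 64 cell bins + dustbin
        self.desc_head = nn.Conv2d(widths[-2], desc_dim, 1)
    def forward(self, gray):                      # gray: (B, 1, H, W), normalized
        feats, x = [], gray
        for layer in self.encoder:
            x = layer(x)
            if isinstance(layer, C2F):
                feats.append(x)
        x8, x16 = feats[-2], feats[-1]            # 1/8 and 1/16 resolution features
        fused = x8 + self.fuse(F.interpolate(x16, size=x8.shape[-2:], mode='bilinear'))
        heatmap = self.kpt_head(fused)            # (B, 65, H/8, W/8) keypoint logits
        desc = F.normalize(self.desc_head(fused), dim=1)    # (B, 64, H/8, W/8)
        return heatmap, desc

# usage: heatmap, desc = SuperFeatsSketch()(torch.randn(1, 1, 480, 640))
```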
Because XFeat [13] produces dense keypoints, we use this network to generate pre-trained matching heatmaps on synthetic datasets. Corresponding ground-truth point labels are then generated as supervision for the various datasets. We apply known homography transformations to each image and introduce masks of different shapes to further increase the differences between image pairs, producing augmented images. We designed distinct loss functions for the detection and description tasks.
For the detection component, we use binary cross-entropy (BCE) loss and contrastive loss to refine the identification of feature points at the pixel level. The loss $L_{BCE}$ is the mean of the binary cross-entropy losses over all pixels of the heatmaps of the original and augmented images of size H × W and the corresponding ground-truth labels:

$$L_{BCE} = -\frac{1}{HW} \sum_{i,j} \left[ q_{ij} \log p_{ij} + (1 - q_{ij}) \log (1 - p_{ij}) \right]$$

where $p_{ij}$ is the predicted probability at position $(i, j)$ of the network output, and $q_{ij} \in \{0, 1\}$ is the target label.
Additionally, a pixel-level contrastive loss is employed to enhance the quality of local keypoint matching. The loss $L_{con}$ minimizes the distance between keypoints in the augmented images and the corresponding original image, which is particularly crucial in tracking tasks: this pixel-level offset is a key contributor to cumulative error in the front-end odometry. The loss function is designed to reduce this discrepancy, ensuring more accurate keypoint matching and improved odometry performance:

$$L_{con} = \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot d_i^2 + (1 - y_i) \cdot \max(0, m_{con} - d_i)^2 \right]$$

where $N$ is the total number of sample pairs and $d_i$ is the Euclidean distance between a matched augmented keypoint and the corresponding original keypoint. Here, $y_i \in \{0, 1\}$ denotes whether the keypoint pair is valid, and $m_{con}$ limits the loss contribution from negative samples. This formulation focuses the loss on improving the accuracy of true keypoint matches while reducing the influence of irrelevant or negative samples.
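As a concrete reference, the two detection-side losses can be written compactly as below. This is a sketch under our reading of Equations (1) and (2); the tensor shapes and the default margin value are assumptions.

```python
# Sketch of the detection losses in Eqs. (1) and (2); shapes and defaults are assumptions.
import torch
import torch.nn.functional as F

def detection_bce_loss(pred_prob, target):
    """Mean binary cross-entropy over all H*W heatmap pixels.
    pred_prob: (B, H, W) probabilities in (0, 1); target: (B, H, W) float labels in {0, 1}."""
    return F.binary_cross_entropy(pred_prob, target)

def pixel_contrastive_loss(dist, valid, m_con=1.0):
    """Pixel-level contrastive loss: dist holds Euclidean distances between matched
    keypoints in the augmented and original images, valid in {0, 1} marks true pairs."""
    positive = valid * dist.pow(2)
    negative = (1.0 - valid) * torch.clamp(m_con - dist, min=0.0).pow(2)
    return (positive + negative).mean()
```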
The descriptor loss also utilizes contrastive loss. The loss function computes the similarity between the original and augmented image descriptors. This loss is divided into two components: matching loss and non-matching loss. The matching loss measures the similarity between corresponding feature points, while the non-matching loss ensures that the descriptors of non-corresponding points are dissimilar. The offset distance between the coordinates of each keypoint and the true value labels is calculated, and a mask is generated to focus the loss calculation on valid feature points, effectively reducing the influence of irrelevant or incorrectly matched points.
$$mask_{i,j} = \begin{cases} 1, & d_{i,k} \le \sigma \ \text{and} \ d_{j,k} \le \sigma \\ 0, & \text{otherwise} \end{cases}$$

where $i$ and $j$ index the corresponding descriptors in the two training images, $k$ indexes the corresponding label, $d_{i,k}$ and $d_{j,k}$ are the offset distances to that label, and $\sigma$ is the offset threshold.
$$s_{i,j} = D_i^{T} \cdot D_j$$

where $D_i$ and $D_j$ are the descriptors at the corresponding positions in the original and augmented images, respectively.
$$L_{desc} = mask \cdot \left[ \lambda_d \cdot \frac{1}{N} \sum_{i,j}^{N} \max(0, m_{pos} - s_{i,j}) + \frac{1}{M} \sum_{i,j}^{M} \max(0, s_{i,j} - m_{neg}) \right]$$

where $N$ is the number of positive sample pairs, $M$ the number of negative sample pairs, and $\lambda_d$ balances the two terms. The hyperparameters $m_{pos}$ and $m_{neg}$ define the margins for positive and negative samples; their values are provided in the experimental section. The objective of the optimization is to maximize the similarity between matching descriptors and to minimize the similarity between mismatched descriptors. This balance ensures robust feature matching, improving both accuracy and reliability in feature-based tasks.
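A possible implementation of this descriptor loss is sketched below, using cosine similarities between sampled descriptors and the offset-based mask of Equation (3). The margin and weighting values follow Section 4.1, while the offset threshold σ and the assumption that descriptors are L2-normalized before the dot product are ours.

```python
# Sketch of the descriptor loss in Eqs. (3)-(5); sigma and descriptor normalization
# are assumptions, margin/weight values are taken from the experimental section.
import torch

def descriptor_loss(desc_a, desc_b, off_a, off_b, is_pos,
                    sigma=2.0, m_pos=7.5, m_neg=0.2, lambda_d=1.3):
    """desc_a, desc_b: (P, 64) descriptors sampled at paired positions in the original
    and augmented image; off_a, off_b: (P,) pixel offsets to the ground-truth labels;
    is_pos: (P,) bool, True where the pair should correspond."""
    sim = (desc_a * desc_b).sum(dim=1)                 # s_{i,j} = D_i^T . D_j
    mask = ((off_a <= sigma) & (off_b <= sigma)).float()
    pos = is_pos.float()
    n_pos = (mask * pos).sum().clamp(min=1.0)
    n_neg = (mask * (1.0 - pos)).sum().clamp(min=1.0)
    loss_pos = (mask * pos * torch.clamp(m_pos - sim, min=0.0)).sum() / n_pos
    loss_neg = (mask * (1.0 - pos) * torch.clamp(sim - m_neg, min=0.0)).sum() / n_neg
    return lambda_d * loss_pos + loss_neg
```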

3.2. Feature Point Mapping

In the mapping component, we adopt the framework of RGBD GS-ICP SLAM [9] to evaluate the effectiveness of our proposed algorithm. Our modifications are concentrated in the tracking module and do not involve the keyframe selection process, so the detailed principles of the original work are not reiterated here. By lowering the detection threshold, we identify the most prominent points in each region, enabling the generation of dense point clouds that closely resemble the original ones. Compared with the original method, this approach allows for further refinement by selecting more characteristic points: it reduces the inclusion of excessive point clouds from walls or featureless planes while extracting additional distinctive points. This selective point-cloud generation strategy, combined with a mapping densification scheme, effectively enhances mapping speed without compromising quality.
Consider a scenario where feature points are detected in two image frames, $I_1$ and $I_2$. After applying non-maximum suppression (NMS), the matching relationships between feature points in the two frames are established. To refine these matches, the slopes between all corresponding point pairs are calculated, as the slopes encapsulate the spatial direction information between the points. The matching process prioritizes point pairs with the most consistent slopes, identifying them as correct matches. If the number of matching points falls below 20, suboptimal matches are incorporated to ensure a sufficient number of points. This strategy prevents tracking failures caused by an insufficient number of matching points, thereby maintaining the robustness of the system.
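A minimal sketch of this slope-consistency filter is given below; the tolerance value and the use of the median as the dominant slope are our assumptions.

```python
# Sketch of the slope-consistency match filter with the <20-match fallback described above.
import numpy as np

def filter_matches_by_slope(pts1, pts2, scores, slope_tol=0.05, min_matches=20):
    """pts1, pts2: (N, 2) matched pixel coordinates in frames I1 and I2;
    scores: (N,) matching confidence used to rank the fallback candidates."""
    d = pts2 - pts1
    slopes = np.arctan2(d[:, 1], d[:, 0])             # direction of each match pair
    dominant = np.median(slopes)                      # most consistent direction
    keep = np.where(np.abs(slopes - dominant) < slope_tol)[0]
    if keep.size < min_matches:                       # fall back to the best-scoring
        kept = set(keep.tolist())                     # suboptimal matches
        extra = [i for i in np.argsort(-scores) if i not in kept]
        keep = np.concatenate([keep, np.array(extra[:min_matches - keep.size], dtype=int)])
    return pts1[keep], pts2[keep]
```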
The GICP algorithm determines the rigid transformation between two frames, $I_a$ and $I_b$, represented by the rotation matrix $R$ and translation vector $t$ [18]. For each point $p_n$ in image $I_a$, the expected position $p_n'$ of the corresponding point in image $I_b$ after the transformation is:

$$p_n' = R p_n + t$$
Using SuperFeats, matched point pairs $(p_n, q_n)$ are established between the two frames. The reprojection error measures the difference between the transformed point $p_n'$ and its corresponding match $q_n$ in $I_b$:

$$e_n = p_n' - q_n$$
where $e_n$ is the reprojection error for the $n$-th pair of matching points. A least-squares optimization, implemented with the SciPy optimization library, is applied to refine the transformation. The total residual over all matching points is expressed as:

$$E = \sum_{n=1}^{N} \| e_n \|^2 = \sum_{n=1}^{N} \| p_n' - q_n \|^2$$

where $N$ is the total number of correct matching points. The optimization minimizes the total residual $E$, so that the transformation $(R, t)$ best aligns the feature points between the two frames:

$$R^*, t^* = \arg\min_{R, t} \sum_{n=1}^{N} \| R p_n + t - q_n \|^2$$
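This refinement can be carried out with SciPy's nonlinear least-squares solver, as the sketch below illustrates; the rotation-vector parameterization and the use of the GICP result as the initial guess are our assumptions about the implementation.

```python
# Sketch of the least-squares refinement of (R, t) in Eq. (10) using SciPy;
# the GICP estimate (R0, t0) serves as the initial guess.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose(p, q, R0, t0):
    """p, q: (N, 3) matched 3D points in frames I_a and I_b; R0 (3x3), t0 (3,)."""
    x0 = np.concatenate([Rotation.from_matrix(R0).as_rotvec(), t0])

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        return ((R @ p.T).T + x[3:] - q).ravel()      # e_n = R p_n + t - q_n, stacked

    sol = least_squares(residuals, x0)
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```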

3.3. Factor Graph

In SFGS-SLAM, a unified factor graph back-end jointly optimizes all front-end measurements; Figure 4 illustrates its structure. A factor graph represents the joint probability of all state variables $X$ as the product of local factor potentials $\phi_i$, each encoding a probabilistic constraint from a measurement:

$$\phi(X) = \prod_i \phi_i(X_i)$$

The maximum-a-posteriori (MAP) estimate maximizes this product, which is equivalent to minimizing the negative log-likelihood,

$$\arg\max_X \phi(X) \;\Leftrightarrow\; \arg\min_X \sum_i \| e_i(X_i) \|^2_{\Sigma_i}$$

where each $e_i$ is a measurement error function and $\Sigma_i$ its covariance. In practice, we build and solve this nonlinear factor graph with GTSAM [19] by adding factors for each measurement type. The unified optimization adjusts all poses (and landmarks) to best satisfy the combined constraints.
GICP Factors: Each pair of consecutive RGB-D frames $(k-1, k)$ yields a relative pose measurement $\hat{T}_{k,k-1}$ from GICP. Under the assumption that the measurement noise is zero-mean Gaussian along the surface normal direction, we add a binary factor between $T_{k-1}$ and $T_k$ with error

$$e_{odom} = \log \left( \hat{T}_{k,k-1}^{-1} \, T_{k,k-1} \right)$$
Reprojection Factors: For each SuperFeats keypoint match between a 3D landmark $L_j$ and an image observed from pose $X_i$, we add a reprojection error factor. If $u_{ij}$ is the observed pixel location and $\pi(\cdot)$ the camera projection, the error is $e_{proj} = u_{ij} - \pi(X_i, L_j)$; assuming isotropic Gaussian noise in image space, the cost is $\| e_{proj} \|^2$ weighted by an appropriate covariance $\Sigma_{proj}$. These factors tie 3D landmarks and camera poses together, similar to standard bundle-adjustment terms, and constrain the camera trajectory through feature tracks.
Loop Closure Factors: When a loop is detected between the current frame $i$ and a previously visited frame $j$, we compute a relative pose measurement $\hat{T}_{ij}$ and introduce a binary factor connecting the two pose states $X_i$ and $X_j$. The loop detection and verification process is implemented using the built-in GTSAM loop-closure module, which automatically identifies revisited poses based on pose-graph consistency and residual thresholds. This ensures that only geometrically valid loop constraints are added, effectively suppressing false positives without requiring additional external loop-detection algorithms.
We perform nonlinear optimization over the full pose graph. By jointly optimizing GICP odometry residuals, reprojection residuals, and loop-closure constraints in the factor graph, the system enforces global consistency.
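The sketch below shows how such a graph could be assembled with GTSAM's Python API: GICP odometry and loop closures as BetweenFactorPose3 terms and SuperFeats observations as projection factors. The noise sigmas, the calibration values, and the front-end containers (gicp_relative_poses, keypoint_observations, landmark_initial_points, loop_constraints) are illustrative placeholders, not the system's actual configuration.

```python
# Illustrative GTSAM sketch of the factor graph in Figure 4; noise values, calibration,
# and the front-end containers are placeholder assumptions.
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X, L

graph, values = gtsam.NonlinearFactorGraph(), gtsam.Values()
K = gtsam.Cal3_S2(525.0, 525.0, 0.0, 319.5, 239.5)            # fx, fy, skew, cx, cy
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.02] * 3))
proj_noise = gtsam.noiseModel.Isotropic.Sigma(2, 1.0)          # ~1 px in image space
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-4))

# Anchor the first pose, then add one GICP factor per consecutive frame pair.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
values.insert(X(0), gtsam.Pose3())
for k, T_rel in enumerate(gicp_relative_poses, start=1):       # T_rel: gtsam.Pose3
    graph.add(gtsam.BetweenFactorPose3(X(k - 1), X(k), T_rel, odom_noise))
    values.insert(X(k), values.atPose3(X(k - 1)).compose(T_rel))

# Reprojection factors tie each observed SuperFeats landmark to the observing pose.
for j, p0 in landmark_initial_points.items():                  # rough 3D landmark guesses
    values.insert(L(j), gtsam.Point3(p0[0], p0[1], p0[2]))
for k, j, u in keypoint_observations:                          # pose id, landmark id, pixel
    graph.add(gtsam.GenericProjectionFactorCal3_S2(
        gtsam.Point2(u[0], u[1]), proj_noise, X(k), L(j), K))

# Loop closures are additional Between factors linking non-consecutive poses.
for i, j, T_loop in loop_constraints:
    graph.add(gtsam.BetweenFactorPose3(X(i), X(j), T_loop, odom_noise))

result = gtsam.LevenbergMarquardtOptimizer(graph, values).optimize()
```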

4. Results

Section 4 presents the experimental evaluation of SFGS-SLAM. First, Section 4.1 describes the experimental setup, including the datasets, training parameters, and evaluation metrics. Next, Section 4.2 reports the keypoint matching results, Section 4.3 discusses the tracking performance, and Section 4.4 evaluates the mapping and rendering quality. Together, these results demonstrate the effectiveness of our approach under various conditions.

4.1. Setup

We validate the proposed algorithm's performance from three perspectives: matching ability, tracking ability, and rendering quality. Below, we outline the datasets, parameters, and evaluation metrics used for each experiment. In the training phase, our network was initially trained on the MegaDepth dataset [26] using the XFeat [13] training framework, and the pre-trained model was subsequently fine-tuned on the COCO 2017 dataset [25]. While using the MegaDepth dataset is not strictly necessary, we observed that its inclusion improves experimental results and reduces training time. The Adam optimizer was employed for all training stages, with an initial learning rate of $10^{-3}$ for training on MegaDepth and $10^{-4}$ for pre-training and fine-tuning. The fine-tuning process was performed for 30 epochs on the COCO dataset. Hyperparameters were set to $\lambda_d = 1.3$, $m_{pos} = 7.5$, $m_{con} = 1$, and $m_{neg} = 0.2$. In addition to random homography transformations and synthetic occlusion masks, we apply photometric augmentations, small random rotations, and noise. These augmentations further enrich the training data beyond the homography-plus-mask strategy, improving robustness.
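For reference, a minimal sketch of this augmentation pipeline for a single grayscale image is shown below; the perturbation ranges, the mask size, and the noise level are illustrative assumptions rather than the exact training values.

```python
# Illustrative augmentation sketch: random homography, a synthetic occlusion mask,
# photometric jitter, and additive noise; all ranges are assumptions.
import cv2
import numpy as np

def augment(gray, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    h, w = gray.shape
    # random homography: perturb the four corners by up to ~10% of the image size
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + rng.uniform(-0.1, 0.1, size=(4, 2)) * [w, h]).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(gray, H, (w, h))
    # synthetic occlusion: zero out a random rectangle
    x0, y0 = int(rng.integers(0, w // 2)), int(rng.integers(0, h // 2))
    warped[y0:y0 + h // 6, x0:x0 + w // 6] = 0
    # photometric jitter (gain) plus additive Gaussian noise
    warped = np.clip(warped.astype(np.float32) * rng.uniform(0.8, 1.2)
                     + rng.normal(0.0, 5.0, warped.shape), 0, 255).astype(np.uint8)
    return warped, H   # H maps original pixel coordinates into the augmented image
```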

4.2. Matching

We use HPatches [27] to verify the matching ability of keypoints, as it accounts for a wide range of lighting and viewpoint variations. The HPatches dataset contains 116 scenes, each with 6 images, including 57 scenes with significant lighting changes and 59 scenes with substantial viewpoint changes. This dataset is particularly suitable for assessing the detector’s repetition and the descriptor’s matching rates. Our evaluation primarily focuses on three key metrics: speed, matching accuracy, and dimensions of descriptors. The experiment results are illustrated in Figure 5 and Table 1.
The keypoint matching results on the HPatches dataset, as shown in Table 1, demonstrate that SuperFeats achieves a favorable trade-off between accuracy and efficiency. Compared to SuperPoint, SuperFeats improves the Acc (10°) metric by approximately 5% while delivering a 6.3-fold increase in inference speed. With a significantly smaller parameter count, it is more suitable for embedded or mobile platforms. Compared to XFeat, SuperFeats achieves comparable matching accuracy with a 25% higher inference speed, confirming that its network design achieves an effective balance between compactness and expressive capacity.
Further analysis of the qualitative matching results in Figure 5 indicates that SuperFeats remains robust under various disturbances such as occlusions and scale changes. In particular, its performance under varying illumination conditions is notable. The dense heatmap outputs show accurate localization, and the matched keypoints are distributed evenly across the image, suggesting that the proposed loss functions effectively enhance the discriminative power of local feature representations.
Although SuperFeats does not top every accuracy metric, we selected this network for its superior speed and lightweight design, which are crucial for real-time SLAM. In our complete SFGS-SLAM system, the slight trade-off in descriptor accuracy is compensated by the global optimization, allowing the system to achieve final localization and mapping performance comparable to more computationally intensive methods.

4.3. Tracking

We used the TUM and Replica datasets to validate tracking capability. The TUM dataset includes real-world indoor scenes with significant noise, blur, and instances of depth information loss. TUM comprises two components, the RGB-D dataset [28] and the Visual-Inertial dataset [29]; we chose the Visual-Inertial dataset to demonstrate tracking performance. Table 2 shows the tracking results on the Replica dataset, Table 3 shows the tracking results on the TUM dataset, and Figure 6 shows a qualitative comparison of tracking on the TUM dataset. We assessed camera tracking accuracy using the absolute trajectory error (ATE) and report speed in frames per second (FPS).
According to the quantitative results in Table 2 and Table 3, SFGS-SLAM achieves competitive localization accuracy while maintaining real-time performance. On the Replica dataset, SFGS-SLAM achieves an absolute trajectory error (ATE) of 2.33 cm, outperforming both GICP-SLAM and SplaTAM. Although its accuracy is slightly lower than that of Photo-SLAM, SFGS-SLAM employs a much lighter-weight network, which makes its performance particularly compelling.
On the TUM dataset, SFGS-SLAM achieves a PSNR of 24.85 dB, outperforming both Photo-SLAM and GICP-SLAM. Furthermore, the SSIM and LPIPS metrics are also improved significantly. These results indicate that, despite the reduced number and dimensionality of keypoints produced by SuperFeats, the integration with a global factor graph enables SFGS-SLAM to maintain robust and accurate pose estimation. The system exhibits resilience to loop closure errors and is capable of stable tracking in challenging indoor environments.
In terms of runtime, SFGS-SLAM achieves a real-time frame rate of 31.26 FPS on the TUM dataset. Although slightly lower than GICP-SLAM, it substantially exceeds the runtime performance of Photo-SLAM and SplaTAM, underscoring its suitability for deployment on resource-constrained platforms, such as edge devices and lightweight robotic systems.

4.4. Rendering

We used the Replica [22] and TUM [28] datasets to evaluate the rendering comprehensively. The Replica dataset includes composite scenes with high-quality rendered RGB and depth images, offering detailed geometry and lighting information. We evaluated the quality of rendering with image quality metrics, including the peak-signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS). Table 4 shows the rendering result of the Replica dataset, Table 3 shows the rendering result of the TUM dataset, and Figure 7 shows a qualitative comparison of the rendering result using the TUM dataset’s Desk sequence.
Table 3 and Table 4 present the reconstruction quality of SFGS-SLAM across multiple scenes from the TUM and Replica datasets. Across all evaluated scenes, SFGS-SLAM consistently outperforms GICP-SLAM, SplaTAM, and MonoGS in terms of PSNR, SSIM, and LPIPS metrics. In some scenes, its rendering quality approaches that of Photo-SLAM. For example, in the Office0 scene, SFGS-SLAM achieves a PSNR of 40.35 dB and an SSIM of 0.986, reaching a level comparable to leading rendering-based SLAM methods.
The rendering comparison in Figure 7 also provides qualitative evidence: the results generated by SFGS-SLAM exhibit clearer geometric structures near high-frequency regions, such as window frames and object edges. In contrast to the blurred contours and structural drift observed in GICP-SLAM, SFGS-SLAM provides a more accurate and visually coherent reconstruction, confirming its effectiveness in preserving fine-grained details.

4.5. Real-World Experiment

We further conducted real-world experiments in the Asian Winter Games Ice Hockey Arena in Harbin. This venue features extensive flat surfaces, localized highly reflective materials, numerous occluding structures, and significant spatial depth variation, making it a suitable environment for indoor 3D reconstruction evaluation. As illustrated in Figure 8, data acquisition was performed using the Amu P250 UAV. The platform is built on a 250 mm frame and equipped with an NVIDIA Jetson TX2 edge computing unit, a Holybro Pixhawk 6C mini flight controller, and an Intel RealSense D435i stereo camera as the primary visual sensor. During the experimental flights, the UAV operated at an altitude of 1.5 to 3 m, covering the central area of the arena and part of the spectator stands. The UAV followed a pre-defined flight path to perform multi-view image capture. The captured images were then transmitted to a server with the same configuration as described in Section 4.1 for offline reconstruction and mapping. The camera resolution was set to 1280 × 720 pixels.
Due to the absence of RTK data during the flight and the lack of a motion capture system within the arena, ground-truth trajectory data could not be obtained. Therefore, this section focuses solely on the evaluation of reconstruction quality rather than localization accuracy.
We compared our method with 3D Gaussian Splatting, Photo-SLAM, and GICP-SLAM. As summarized in Table 5, a total of eight complete datasets were recorded, and the table reports the average performance. Beyond rendering quality, SFGS-SLAM also exhibits clear advantages in computational efficiency and resource usage. As shown in the last two columns of Table 5, SFGS-SLAM achieves the lowest map storage and GPU memory consumption among all compared methods, making it better suited for embedded and edge computing applications. The proposed method consistently delivers superior image quality across various viewpoints. It effectively reduces shadow artifacts in rendering and improves reconstruction consistency through SuperFeats' enhanced detection of stable structures. Moreover, the joint optimization of GICP and reprojection factors significantly suppresses accumulated drift. Qualitative comparisons are shown in Figure 9 and Figure 10, where we present rendering results from multiple viewpoints alongside the ground-truth RGB images and the outputs of Gaussian Splatting.

4.6. Ablation

To better understand the contribution of each component of the proposed SFGS-SLAM system, we conducted ablation experiments on the TUM RGB-D dataset [28]. Specifically, we evaluated the impact of removing two key modules: the keypoint detection network SuperFeats and the factor graph optimization. The Without SuperFeats setting replaces our keypoint detection network with ORB [15]. The Without Factor Graph variant disables the global optimization framework and relies solely on frame-to-frame GICP-based odometry.
Quantitative results are presented in Table 6. As expected, both ablations lead to reduced performance in terms of rendering quality (PSNR, SSIM, LPIPS) and trajectory accuracy (ATE). The removal of SuperFeats leads to noisier reconstructions and moderately degraded tracking, while removing the Factor Graph causes severe pose drift, confirming the importance of global optimization in maintaining long-term consistency. Our complete system outperforms both baselines and demonstrates the synergy between semantic-aware feature detection and factor graph-based global optimization.

5. Discussion

SFGS-SLAM demonstrates a well-balanced performance across multiple evaluation dimensions. Its keypoint detection network sacrifices a small amount of absolute accuracy in exchange for substantial improvements in inference speed. When coupled with GICP registration and factor graph-based global optimization, the system maintains robustness and low cumulative drift over long trajectories. As shown in Table 1, SuperFeats achieves a 5% gain in Acc (10°) over SuperPoint, while being 6.3 times faster in inference, which directly contributes to the real-time capabilities of the entire SLAM pipeline.
Importantly, despite not utilizing complex graph neural networks or volumetric radiance fields like NeRF, SFGS-SLAM achieves high-fidelity reconstruction using a lightweight Gaussian splatting framework. This is particularly evident in its superior handling of edge contours and low-texture regions, where it demonstrates improved structural consistency. These results prove that the keypoints extracted by SuperFeats are not only well-distributed but also geometrically stable, providing strong constraints for reprojection error minimization and subsequent Gaussian map optimization.
Compared to rendering-heavy systems such as SplaTAM and Photo-SLAM, SFGS-SLAM significantly reduces dependence on high-end graphics hardware, making it highly suitable for deployment on mobile robots, drones, and other computation-limited platforms. Furthermore, compared to traditional methods like ORB-SLAM3, the integration of the Gaussian splatting module in SFGS-SLAM enhances scene-level rendering quality and opens up potential for downstream applications such as scene understanding, semantic mapping, and augmented reality.

6. Conclusions

In conclusion, we have introduced SFGS-SLAM, a lightweight SLAM system that integrates a novel keypoint network with a Gaussian splatting mapper, to balance real-time performance and high-fidelity mapping. This work’s primary contributions include the design of the SuperFeats feature network, which drastically reduces computation while maintaining robust matching accuracy, and the incorporation of a factor graph with GICP and loop-closure to ensure global consistency. Our experiments demonstrate that SFGS-SLAM achieves competitive accuracy and rendering quality compared to state-of-the-art methods. Moreover, the Gaussian splatting map representation yields photorealistic reconstructions, highlighting the benefit of our approach for tasks such as augmented reality and semantic mapping.
Despite these strengths, we acknowledge several limitations of the current system. SFGS-SLAM does not explicitly model dynamic objects, which may lead to degraded performance in scenes with moving elements. The 64-dimensional SuperFeats descriptors, while efficient, may lose discriminative power in highly repetitive textures like other deep learning-based methods. Additionally, our real-world evaluation lacked ground-truth positioning data, making it difficult to quantify absolute accuracy in those scenarios. In future work, we plan to extend SFGS-SLAM to handle dynamic environments by incorporating motion segmentation and semantic filtering, thereby enabling robust performance under real-world disturbances, similar to our previous work [31]. We also aim to optimize the map representation to reduce redundancy and memory usage. In addition, we plan to conduct onboard and real-time deployment tests on robotic and UAV platforms to further verify the practicality and robustness of SFGS-SLAM in real operational environments. Furthermore, we will explore extending our evaluations to dynamic environment datasets and practical application scenarios and include more lightweight SLAM baselines in our comparisons to further validate SFGS-SLAM’s performance.

Author Contributions

Conceptualization, R.W. and Z.D.; methodology, R.W.; software, R.W.; validation, R.W.; formal analysis, R.W.; investigation, R.W.; resources, R.W.; data curation, R.W.; writing—original draft preparation, R.W.; writing—review and editing, R.W.; visualization, R.W.; supervision, Z.D.; project administration, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study did not report any data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, G.; Wang, W. A survey on 3D gaussian splatting. arXiv 2024, arXiv:2401.03890. [Google Scholar] [CrossRef]
  2. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446. [Google Scholar] [CrossRef]
  3. Lin, J. Dynamic nerf: A review. arXiv 2024, arXiv:2405.08609. [Google Scholar] [CrossRef]
  4. Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 1–14. [Google Scholar] [CrossRef]
  5. Fei, B.; Xu, J.; Zhang, R.; Zhou, Q.; Yang, W.; He, Y. 3D gaussian splatting as new era: A survey. IEEE Trans. Vis. Comput. Graph. 2024, 31, 4429–4449. [Google Scholar] [CrossRef] [PubMed]
  6. Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2d gaussian splatting for geometrically accurate radiance fields. In Proceedings of the ACM SIGGRAPH 2024 Conference Papers, Denver, CO, USA, 27 July–1 August 2024; pp. 1–11. [Google Scholar]
  7. Keetha, N.; Karhade, J.; Jatavallabhula, K.M.; Yang, G.; Scherer, S.; Ramanan, D.; Luiten, J. Splatam: Splat track & map 3D gaussians for dense rgb-d slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21357–21366. [Google Scholar]
  8. Matsuki, H.; Murai, R.; Kelly, P.H.; Davison, A.J. Gaussian splatting slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 18039–18048. [Google Scholar]
  9. Ha, S.; Yeon, J.; Yu, H. Rgbd gs-icp slam. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 180–197. [Google Scholar]
  10. Huang, H.; Li, L.; Cheng, H.; Yeung, S.-K. Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular stereo and rgb-d cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21584–21593. [Google Scholar]
  11. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 224–236. [Google Scholar]
  12. Zhao, X.; Wu, X.; Miao, J.; Chen, W.; Chen, P.C.; Li, Z. Alike: Accurate and lightweight keypoint detection and descriptor extraction. IEEE Trans. Multimed. 2022, 25, 3101–3112. [Google Scholar] [CrossRef]
  13. Potje, G.; Cadar, F.; Araujo, A.; Martins, R.; Nascimento, E.R. Xfeat: Accelerated features for lightweight image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2682–2691. [Google Scholar]
  14. Vedaldi, A. An implementation of SIFT detector and descriptor. Univ. Calif. Los Angeles 2006, 7. [Google Scholar]
  15. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  16. Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
  17. Varghese, R.; Sambath, M. Yolov8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  18. Koide, K.; Yokozuka, M.; Oishi, S.; Banno, A. Voxelized GICP for fast and accurate 3D point cloud registration. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11054–11059. [Google Scholar]
  19. Dellaert, F. Factor graphs and GTSAM: A hands-on introduction. Ga. Inst. Technol. Tech. Rep. 2012, 2. [Google Scholar]
  20. Jin, S.; Dai, X.; Meng, Q. “Focusing on the right regions”—Guided saliency prediction for visual SLAM. Expert Syst. Appl. 2023, 213, 119068. [Google Scholar] [CrossRef]
  21. Jin, S.; Wang, X.; Meng, Q. Spatial memory-augmented visual navigation based on hierarchical deep reinforcement learning in unknown environments. Knowl. Based Syst. 2024, 285, 111358. [Google Scholar] [CrossRef]
  22. Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Engel, J.J.; Mur-Artal, R.; Ren, C.; Verma, S. The replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797. [Google Scholar] [CrossRef]
  23. Qin, T.; Li, P.; Shen, S. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
  24. Backhaus, A.; Luettel, T.; Wuensche, H.-J. YOLOPoint: Joint Keypoint and Object Detection. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, Kumamoto, Japan, 21–23 August 2023; pp. 112–123. [Google Scholar]
  25. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  26. Li, Z.; Snavely, N. Megadepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2041–2050. [Google Scholar]
  27. Balntas, V.; Lenc, K.; Vedaldi, A.; Mikolajczyk, K. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5173–5182. [Google Scholar]
  28. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar]
  29. Schubert, D.; Goll, T.; Demmel, N.; Usenko, V.; Stückler, J.; Cremers, D. The TUM VI benchmark for evaluating visual-inertial odometry. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1680–1687. [Google Scholar]
  30. Zhao, Z.; Wu, C.; Kong, X.; Li, Q.; Guo, Z.; Lv, Z.; Du, X. Light-SLAM: A robust deep-learning visual SLAM system based on LightGlue under challenging lighting conditions. IEEE Trans. Intell. Transp. Syst. 2025, 26, 9918–9931. [Google Scholar] [CrossRef]
  31. Deng, Z.; Wang, R. SGF-SLAM: Semantic Gaussian Filtering SLAM for Urban Road Environments. Sensors 2025, 25, 3602. [Google Scholar] [CrossRef] [PubMed]
Figure 1. In SFGS-SLAM, accuracy meets efficiency. The SLAM method corresponds to the left PSNR (Peak-signal-to-noise Ratio) axis, reflecting the photorealistic quality of reconstruction, while the detection network corresponds to the right pose accuracy axis, capturing the robustness of feature-based localization.
Figure 2. Overview of our pipeline. By integrating with the tracking results from GICP, the reprojection error is optimized to further enhance overall tracking accuracy. This optimization ultimately leads to improved mapping quality.
Figure 3. The structure diagram of the descriptor part of the SuperFeats network. Notably, C2F denotes the Faster Implementation of CSP Bottleneck with 2 convolutions, derived from YOLOv8 [17].
Figure 4. The structure diagram of the factor graph.
Figure 5. Qualitative results on HPatches, with the left (a) showing the SuperPoint matching result, the middle (b) the XFeat matching result, and the right (c) the SuperFeats matching result. Our SuperFeats network achieves notable matching performance among the compared methods.
Figure 6. Qualitative comparison of the tracking capabilities of the detection network, with XFeat [13] on the left (a), SuperPoint [11] in the middle (b), and SuperFeats on the right (c). We use the dataset-corridor1_512_16 sequence of TUM [29].
Figure 7. Qualitative comparison of rendering results. The left side (a) is the Ground Truth with Replica dataset’s Desk sequence [22], the middle (b) is the rendering result using GICP-SLAM [9], and the right side (c) is the rendering result of our method. It can be seen that our method has a better rendering effect near the window.
Figure 8. Experimental setup illustration. The left side shows the drone used in the experiments, while the right side depicts the actual testing environment.
Figure 9. Rendering results of the scoreboard side of the arena are also evaluated. From top to bottom are the ground truth, Gaussian Splatting [4], and our proposed method.
Figure 10. Qualitative rendering comparison at the main stand of the Asian Winter Games Ice Hockey Arena. From top to bottom are the ground truth, Gaussian Splatting [4], and our proposed method.
Table 1. Quantitative feature matching results on the HPatches dataset [27].

| Methods | Acc (5°) | Acc (10°) 1 | Dim | FPS |
|---|---|---|---|---|
| ORB [15] | 13.8 | 31.9 | 256-b | 46.7 |
| ALIKE [12] | 49.3 | 77.7 | 64-f | 6.9 |
| SuperPoint [11] | 45.0 | 67.4 | 256-f | 5.1 |
| XFeat [13] | 41.9 | 74.9 | 64-f | 25.8 |
| SuperFeats | 40.1 | 70.6 | 64-f | 32.2 |

1 The proportion of poses where the maximum angular error is below 10 degrees.
Table 2. Quantitative tracking results on the Replica dataset [22].

| Methods | ATE (cm) | FPS |
|---|---|---|
| ORB-SLAM3 [16] | 1.27 | 156.46 |
| Light-SLAM [30] | 2.34 | 167.13 |
| SplaTAM [7] | 3.23 | 0.43 |
| Photo-SLAM [10] | 1.27 | 41.66 |
| MonoGS SLAM [8] | 3.69 | 3.21 |
| GICP-SLAM [9] | 2.40 | 45.59 |
| SFGS-SLAM | 2.33 | 33.17 |
Table 3. Quantitative rendering and tracking results on the TUM dataset [28].

| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | FPS |
|---|---|---|---|---|
| SplaTAM [7] | 23.46 | 0.906 | 0.156 | 0.32 |
| Photo-SLAM [10] | 21.40 | 0.738 | 0.447 | 36.75 |
| GICP-SLAM [9] | 19.62 | 0.750 | 0.240 | 42.41 |
| SFGS-SLAM | 24.85 | 0.909 | 0.187 | 31.26 |
Table 4. Quantitative rendering results on the Replica dataset [22].

| Method | Metric | Room0 | Room1 | Room2 | Office0 | Office1 | Office2 | Office3 | Office4 | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| SplaTAM [7] | PSNR [dB] ↑ | 32.60 | 33.55 | 34.83 | 38.09 | 39.02 | 31.95 | 29.53 | 31.55 | 33.88 |
| | SSIM ↑ | 0.975 | 0.969 | 0.982 | 0.983 | 0.981 | 0.966 | 0.949 | 0.951 | 0.970 |
| | LPIPS ↓ | 0.070 | 0.097 | 0.074 | 0.088 | 0.093 | 0.098 | 0.119 | 0.150 | 0.099 |
| Photo-SLAM [10] | PSNR [dB] ↑ | 32.09 | 33.03 | 34.30 | 37.56 | 38.42 | 31.47 | 29.10 | 31.05 | 33.37 |
| | SSIM ↑ | 0.920 | 0.915 | 0.921 | 0.925 | 0.929 | 0.912 | 0.899 | 0.896 | 0.915 |
| | LPIPS ↓ | 0.053 | 0.074 | 0.056 | 0.067 | 0.071 | 0.075 | 0.091 | 0.114 | 0.075 |
| MonoGS [8] | PSNR [dB] ↑ | 32.83 | 36.43 | 37.49 | 39.95 | 42.09 | 36.24 | 36.70 | 36.07 | 37.22 |
| | SSIM ↑ | 0.954 | 0.959 | 0.965 | 0.971 | 0.977 | 0.964 | 0.963 | 0.957 | 0.964 |
| | LPIPS ↓ | 0.068 | 0.076 | 0.075 | 0.072 | 0.055 | 0.078 | 0.065 | 0.099 | 0.073 |
| GICP-SLAM [9] | PSNR [dB] ↑ | 32.20 | 35.36 | 34.42 | 40.31 | 40.75 | 33.85 | 34.08 | 34.47 | 35.93 |
| | SSIM ↑ | 0.940 | 0.960 | 0.957 | 0.978 | 0.977 | 0.962 | 0.953 | 0.963 | 0.961 |
| | LPIPS ↓ | 0.081 | 0.067 | 0.083 | 0.045 | 0.051 | 0.069 | 0.067 | 0.065 | 0.066 |
| SFGS-SLAM | PSNR [dB] ↑ | 35.39 | 36.91 | 35.58 | 40.35 | 40.79 | 39.02 | 35.25 | 37.72 | 37.63 |
| | SSIM ↑ | 0.957 | 0.966 | 0.969 | 0.986 | 0.984 | 0.972 | 0.965 | 0.973 | 0.972 |
| | LPIPS ↓ | 0.079 | 0.066 | 0.082 | 0.044 | 0.050 | 0.067 | 0.066 | 0.063 | 0.064 |
Table 5. Quantitative rendering results on the ice hockey arena dataset.

| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | Map Storage (MB) | GPU Memory (MB) |
|---|---|---|---|---|---|
| 3D Gaussian Splatting [4] | 23.35 | 0.787 | 0.298 | 3432 | 6945 |
| Photo-SLAM [10] | 20.57 | 0.706 | 0.345 | 3247 | 4126 |
| GICP-SLAM [9] | 19.36 | 0.714 | 0.301 | 2981 | 3941 |
| SFGS-SLAM | 24.66 | 0.805 | 0.239 | 2877 | 3804 |
Table 6. Quantitative rendering and tracking results of the ablation study on the TUM dataset [28].

| Methods | PSNR [dB] ↑ | SSIM ↑ | LPIPS ↓ | ATE (cm) |
|---|---|---|---|---|
| Without SuperFeats | 23.08 | 0.881 | 0.224 | 3.74 |
| Without Factor Graph | 21.56 | 0.798 | 0.346 | 5.23 |
| SFGS-SLAM | 24.85 | 0.909 | 0.187 | 2.33 |
