1. Introduction
Archaeological artifacts serve as tangible connections to past societies, offering valuable insights into the lives, technologies, and cultural practices of our predecessors. These objects (ranging from stone tools and ceramic vessels to ornate jewelry and inscribed tablets) act as portals to bygone eras, enabling archaeologists and historians to reconstruct patterns of daily life, belief systems, trade networks, and social organization over millennia [1,2,3,4,5,6,7]. Preserving and studying such artifacts is critical not only for advancing scholarly understanding but also for safeguarding cultural heritage for future generations.
Modern conservation efforts employ a wide array of strategies to protect these irreplaceable objects, including environmental monitoring, careful handling, preventive conservation, and systematic documentation [8,9,10]. Among these, digital documentation through high-resolution three-dimensional (3D) modeling has emerged as an essential tool for both research and preservation [1,11,12,13,14,15,16]. Accurate 3D models allow for detailed, non-invasive study of fragile artifacts and enable wider public access through virtual exhibitions and replicas. For small artifacts in particular (those typically under 15 cm in size), achieving high-fidelity digital representations poses notable challenges due to their complex geometries, reflective surfaces, and fine textural details.
In the last two decades, geomatics technologies such as terrestrial laser scanning, structured-light scanning, and structure-from-motion (SfM) photogrammetry have revolutionized 3D feature/object documentation practices in archaeology [17,18,19,20,21]. SfM photogrammetry, in particular, has gained prominence due to its accessibility and ability to reconstruct dense point clouds and textured meshes from overlapping photographic datasets [22]. Nevertheless, when applied to small archaeological artifacts, traditional SfM techniques often encounter limitations: they require controlled lighting and extensive photographic coverage, and they struggle with shiny or translucent materials [23,24,25]. These constraints motivate the exploration of alternative or complementary approaches capable of addressing the specific difficulties posed by small, complex cultural heritage objects.
Recent advances in artificial intelligence have introduced new paradigms for 3D reconstruction, notably Neural Radiance Fields (NeRF) [26]. Unlike conventional methods that explicitly reconstruct geometry, NeRF models learn a volumetric representation of a scene from a set of input images and synthesize photorealistic views from novel perspectives. By leveraging deep neural networks, NeRF can capture intricate lighting effects, subtle textures, and material properties that traditional methods may overlook [27]. Early applications in cultural heritage contexts suggest that NeRF could become a transformative technology for the non-invasive documentation of artifacts, particularly in scenarios where optical challenges hinder feature-based reconstruction techniques [28].
Recent works have explored NeRF approaches in architectural documentation and large-scale outdoor environments [28,29,30,31]. However, few studies have examined the behavior of neural rendering methods in the context of small archaeological artifacts, especially those with metallic or reflective surfaces. Most existing benchmarks focus on synthetic datasets, ideal lighting, or non-reflective materials. This study addresses that gap by applying NeRF and Gaussian Splatting (GS) to a challenging but realistic use case: bronze fibulae from the Late Iron Age with high reflectivity and complex geometry. This constitutes a novel application of neural rendering methods to micro-scale, reflective cultural heritage objects, a scenario rarely addressed in the current literature. This approach complements recent evaluations such as [27,32] but shifts the focus toward micro-scale heritage documentation, which remains underrepresented in the literature.
Moreover, rapid developments in NeRF-related methods have significantly improved their efficiency and quality. One of the most promising recent innovations is Gaussian Splatting, introduced by Kerbl et al. (2023) [33], which represents scenes using 3D Gaussians instead of dense voxel grids or implicit functions. This approach offers real-time rendering capabilities while maintaining high visual fidelity, potentially overcoming some of the computational bottlenecks associated with early NeRF models.
Despite these advances, critical questions remain regarding the comparative performance of NeRF techniques and traditional photogrammetry for small artifact reconstruction: How do their outputs differ in terms of geometric accuracy, surface noise, and completeness? Can NeRF or Gaussian Splatting match or surpass the established standards of SfM photogrammetry for heritage applications?
This research aims to systematically compare 3D reconstructions of small archaeological artifacts generated using SfM photogrammetry, NeRF, and GS. To that end, this study examines three horse-shaped fibulae recovered from archaeological sites associated with the Vettones, one of the most famous pre-Roman peoples of western Spain [34]. Currently housed in the Museum of Ávila, these artifacts were selected as case studies due to their refined morphological details and historical significance [35,36,37,38,39,40]. Quantitative evaluation metrics (including root mean square error (RMSE), Hausdorff distance, Chamfer distance, and density analyses) were used to assess the fidelity of the reconstructions against ground-truth measurements obtained with a metrological articulated arm.
One of the main contributions of this study lies in the novel application of NeRF-based methods to the 3D documentation of small archaeological objects with reflective metallic surfaces, a particularly challenging domain for conventional reconstruction techniques. Unlike most previous works focused on large architectural or natural scenes, our research demonstrates the potential of neural rendering approaches for capturing complex microgeometries and materials such as bronze. This application opens up new pathways for non-invasive, low-cost documentation of delicate and hard-to-scan cultural heritage items, especially in museum environments where traditional scanning technologies may be unfeasible. To address this gap, the present study aims to answer the following research questions: (i) How do NeRF and GS perform in reconstructing small-scale, reflective archaeological artifacts compared to traditional SfM photogrammetry? (ii) What are the strengths and limitations of these neural approaches in terms of geometric accuracy and detail preservation? (iii) Can these methods offer a viable alternative for cultural heritage documentation in constrained environments such as museums?
The studied objects exhibit specular reflections and complex geometry; feature-based photogrammetry loses feature correspondences, and the projected patterns of structured-light scanners become saturated. NeRF has been shown to tolerate non-ideal illumination and capture subtle radiometric properties. Its integration into Nerfstudio allows models to be trained from smartphone video, aligning with the “low-cost/low-contact” objective of the project.
The structure of this article is organized as follows:
Section 2 describes the materials and methods, including the artifacts, equipment, and data acquisition protocols.
Section 3 presents the results of the comparative analysis.
Section 4 discusses the findings in relation to prior work and identifies avenues for future research. Finally,
Section 5 summarizes the conclusions and practical implications of this study for 3D documentation of small archaeological objects.
2. Materials and Methods
This section describes the experimental design, acquisition methods, and reconstruction pipelines employed to generate 3D models of small archaeological artifacts. The workflow included (i) the selection and documentation of representative objects, (ii) the acquisition of ground-truth data using a metrological articulated arm, (iii) photogrammetric image acquisition and 3D reconstruction, (iv) Neural Radiance Fields (NeRF) training and point cloud extraction, and (v) quantitative evaluation of the reconstructed models (Figure 1).
The selection of NeRF and GS methods was driven by the specific challenges posed by the artifacts under study: their small size, metallic surfaces, and intricate geometry. These conditions make conventional methods like laser scanning or even structured-light scanning less effective due to issues like reflection, occlusion, or loss of surface detail. Neural rendering techniques, by contrast, are more tolerant to such complexities, providing robust reconstructions using standard RGB imaging and offering significant advantages in environments where physical contact with artifacts is restricted or lighting cannot be fully controlled.
For the NeRF and GS reconstructions, video sequences were captured using an iPhone 13 under diffused daylight conditions, while the photogrammetric images were acquired using a Canon EOS R camera equipped with a macro lens and artificial lighting. The use of different acquisition devices (DSLR for SfM, mobile phone for NeRF/GS) was intentional, aiming to test the robustness of neural rendering methods under realistic, low-cost documentation scenarios. This setup reflects common constraints in museum or fieldwork environments, where access to professional imaging equipment may be limited.
2.1. Archaeological Artifacts
Three horse-shaped fibulae from the Late Iron Age (3rd–1st centuries BC), associated with the Vettones, were selected from the collection of the Museum of Ávila, Spain (Figure 2). These bronze artifacts, approximately 5 cm in length, are characterized by thin, complex geometries and metallic reflective surfaces, posing significant challenges for 3D digitization. Their intricate morphology and cultural significance make them ideal candidates for assessing reconstruction accuracy at small scales.
With a design inspired by the form of a horse, these bronze archaeological pieces are usually associated with the equestrian elite who assumed leadership roles in the pre-Roman societies of Celtic Iberia [35,40]. Fibula 1 (museum number: 06/56/MS/343; size: 4.1 × 3.6 cm) was found in the walls of the archaeological site of La Mesa de Miranda (Chamartín, Ávila) [39,40,41]. Fibula 2 (museum number: 1989/41/3461; size: 4.1 × 3.6 cm) was discovered during Juan Cabré’s excavations at the archaeological site of Las Cogotas (Cardeñosa, Ávila), carried out between 1927 and 1932 [35,39,42,43,44,45]. Fibula 3 (museum number: 04/112/2571; size: 4.5 × 3.7 cm) was found in the walls of the Las Cogotas oppidum [39,44,45,46]. In all cases, their excellent state of preservation offers a glimpse of the dexterity and skill of the craftsmen who created them.
2.2. Ground-Truth Acquisition: Metrological Arm Measurements
Precise 3D coordinates were acquired for each fibula using a Hexagon ROMER Absolute Arm 7325SI metrological system (Hexagon Manufacturing Intelligence, USA) (Figure 3). This instrument is used at the industrial level for metrological control of parts and components, ensuring that they meet the required dimensional specifications.
During data acquisition, each fibula was immobilized to avoid any distortion in the measurements, and a total of 22–24 control points were recorded per brooch, distributed strategically over flat areas, morphological details, and high-contrast color regions to ensure comprehensive coverage (Figure 4). Measurements were performed in a controlled indoor environment to minimize external influences. These datasets served as ground truth for subsequent accuracy assessments.
2.3. Photogrammetric 3D Reconstruction
Photographic datasets were acquired using a Canon EOS 700D DSLR camera equipped with a 60 mm macro lens (Table 1). Each piece was placed on a white matte turntable inside a lightbox with diffuse illumination to minimize reflections. The camera was fixed on a tripod while the object was rotated, following a convergent photogrammetry scheme [23].
Acquisition parameters included the following:
- Manual focus and exposure settings (f/8 aperture, low ISO);
- 150–200 images per artifact at 70–80% overlap;
- A focus distance of approximately 100–120 mm.
To obtain high-quality images, an efficient system using a camera mounted on a fixed tripod was implemented. In addition, a lighting box with a neutral, matte (white) background was used. To improve the handling of the piece, a white turntable was included inside the illumination box, which allowed for better control of the object. In this way, good contrast and soft, homogeneous illumination around the object of study were achieved. It should be noted that all three objects are metallic, so correct illumination is vital for the data acquisition process (avoiding reflections is paramount).
After adjusting the composition, the object was slowly rotated on the turntable, enabling images to be taken from multiple angles without altering the camera position. Although this method requires additional preprocessing time to remove the white background and avoid interference in the 3D reconstruction, it is highly effective (Figure 5).
In order to scale the 3D model, the data obtained with the metrological arm were used. A total of 150 to 200 photographs were taken of each element. The number of photographs varied according to the geometry of the archaeological piece.
We performed 3D reconstructions using Agisoft Metashape (version 2.2.1; https://www.agisoft.com/), a commercial photogrammetric software package, applying standard SfM and Multi-View Stereo (MVS) pipelines. Masks were applied to each image to remove the background prior to processing. The models were scaled using the ground-truth points. The resulting models achieved an average Ground Sample Distance (GSD) of 0.0079 mm, an average scale error of 0.0165 mm, and a mean photogrammetric error of 0.0056 mm.
2.4. NeRF-Based 3D Reconstruction
To complement the photogrammetric reconstructions, NeRF models were employed using the Nerfstudio framework [32], a modular, open-source platform that facilitates the implementation and training of NeRF volumetric models from image datasets. This approach enables high-fidelity reconstructions of complex geometries and surface textures from dense visual information.
Data preparation involved extracting frames from high-resolution video footage captured around each artifact. The videos were recorded using a 50-megapixel HDR-enabled smartphone, ensuring detailed coverage from multiple viewpoints while maintaining consistent exposure and dynamic range. This method allowed for efficient acquisition of training data with sufficient angular diversity.
Camera poses were estimated using COLMAP [22], a well-established SfM tool, which provided the necessary orientation data to initialize the NeRF training process. The sparse point clouds and calibrated camera parameters obtained from COLMAP served as the geometric backbone for radiance field learning.
NeRF models were trained using the Nerfacto [32] configuration within Nerfstudio. This setup utilizes multi-resolution hash encoding and adaptive ray sampling to accelerate training and convergence. A total of 200,000 training iterations were performed using default hyperparameters, optimizing the volumetric representation of the scene for accurate photo-consistent rendering. The resulting radiance fields effectively captured the spatial and radiometric properties of the objects, including fine surface details and material variation (Figure 6).
From the trained models, depth maps were extracted and subsequently converted into dense 3D point clouds. These point clouds provided geometric representations suitable for comparative analysis and downstream processing.
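For reference, a typical Nerfstudio session covering this workflow (frame extraction with COLMAP pose estimation, Nerfacto training, and point cloud export) looks roughly as follows. This is a sketch, not the authors' exact invocation: the file paths and the timestamped output directory are placeholders, and flags may vary between Nerfstudio versions.

```shell
# Extract frames from the smartphone video and estimate poses with COLMAP
ns-process-data video --data fibula1.mp4 --output-dir data/fibula1

# Train the Nerfacto model (200,000 iterations, as reported in the text)
ns-train nerfacto --data data/fibula1 --max-num-iterations 200000

# Export a dense point cloud from the trained radiance field
ns-export pointcloud \
  --load-config outputs/fibula1/nerfacto/<timestamp>/config.yml \
  --output-dir exports/fibula1
```
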
2.5. Gaussian Splatting-Based 3D Reconstruction
In parallel, reconstructions using GS [33] were also performed through Nerfstudio’s GS module. GS represents the scene as a collection of 3D anisotropic Gaussian primitives, enabling faster training and real-time rendering capabilities. This method is particularly beneficial in scenarios where interactive visualization and responsiveness are required.
Gaussian Splatting (GS) represents the scene as a set of anisotropic Gaussian primitives in $\mathbb{R}^3$, each defined by a center $\boldsymbol{\mu}_i$, a covariance matrix $\boldsymbol{\Sigma}_i$, an opacity $\alpha_i$, and a color $\mathbf{c}_i$. Following Kerbl et al. [33] and the Nerfstudio implementation, we trained these parameters by minimizing the photometric error
$$\mathcal{L} = \sum_{r} \left\| \hat{C}(r) - C(r) \right\|^2$$
where $\hat{C}(r)$ is the alpha-composited integration of the Gaussians along the ray $r$ and $C(r)$ is the observed pixel color.
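As a minimal illustration of the photometric error described above, the following sketch composites per-sample opacities and colors front-to-back along a single ray and accumulates the squared color residual. It is a didactic 1D simplification (scalar colors, precomputed per-sample opacities), not the Nerfstudio renderer.

```python
def composite(alphas, colors):
    """Front-to-back alpha compositing: C = sum_i T_i * a_i * c_i,
    where the transmittance T_i = prod_{j<i} (1 - a_j)."""
    c_out, transmittance = 0.0, 1.0
    for a, c in zip(alphas, colors):
        c_out += transmittance * a * c
        transmittance *= (1.0 - a)
    return c_out

def photometric_loss(rays, observed):
    """Sum over rays of the squared difference between the rendered
    color and the observed pixel color."""
    return sum((composite(a, c) - o) ** 2
               for (a, c), o in zip(rays, observed))

# Two samples on one ray: a half-transparent white sample in front of an
# opaque black one -> rendered color 0.5 * 1.0 + 0.5 * 1.0 * 0.0 = 0.5
rendered = composite([0.5, 1.0], [1.0, 0.0])   # 0.5
```

In the real optimizer, the gradients of this loss flow back into the Gaussian centers, covariances, opacities, and colors.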
2.6. Experimental Configurations
Alignment of the NeRF and GS models to the ground-truth coordinate system was initially conducted manually and later refined using the Iterative Closest Point (ICP) algorithm. This ensured precise spatial correspondence between the reconstructed models and reference datasets, facilitating quantitative accuracy assessments and integration with other datasets.
The NeRF- and GS-based 3D reconstructions, generated using the Nerfstudio framework, were systematically compared with scaled photogrammetric models obtained using Agisoft Metashape. The photogrammetric models were accurately scaled using ground-truth reference points acquired with a robotic arm-based measurement system, which provided high-precision spatial data for alignment. This comparison was carried out to evaluate the geometric accuracy, spatial consistency, and level of surface detail captured by each method. Alignment of all reconstructions within a unified coordinate system enabled the calculation of key quantitative metrics, including Ground Sample Distance (GSD), average scale error, and photogrammetric residuals.
All experiments were conducted using an RTX 3070 GPU (8 GB VRAM) in a Python 3.13.3 environment with Nerfstudio 1.0.0. For the NeRF reconstructions, the Nerfacto pipeline was used with default parameters, with key settings including 4096 rays per batch and 32,768 rays per chunk. The average training time per object was approximately 32 min.
For GS, the splatfacto method was used with key settings including an initial radius randomly sampled within the range 0.04–0.08 mm; training was conducted using 15,000 Adam iterations with a learning rate of 1 × 10⁻³. The renderer used the antialiased rasterization mode, and refinement steps were performed every 100 iterations. The average training time per object was 21 min.
In all cases, the reconstructed volumetric representations were converted into point clouds for evaluation. After convergence, volumetric density was sampled on a voxel grid, with approximately 33 million voxels for NeRF and 20 million voxels for GS. The resulting 3D point clouds were exported in .PLY format and used for quantitative geometric comparison.
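The conversion from a volumetric representation to a point cloud can be sketched as sampling the learned density at voxel centers and keeping those above a threshold. The grid size, voxel size, and toy density function below are illustrative only; the actual pipeline sampled tens of millions of voxels.

```python
import itertools

def density_to_points(density, shape, threshold, voxel_size=1.0):
    """Evaluate density(x, y, z) at the voxel centers of a grid of the
    given shape and return the centers whose density exceeds threshold."""
    points = []
    for i, j, k in itertools.product(*(range(n) for n in shape)):
        x, y, z = ((i + 0.5) * voxel_size,
                   (j + 0.5) * voxel_size,
                   (k + 0.5) * voxel_size)
        if density(x, y, z) > threshold:
            points.append((x, y, z))
    return points

# Toy field: unit ball of density 1 around (1.5, 1.5, 1.5), zero elsewhere
ball = lambda x, y, z: 1.0 if (x - 1.5)**2 + (y - 1.5)**2 + (z - 1.5)**2 < 1.0 else 0.0
pts = density_to_points(ball, (3, 3, 3), threshold=0.5)
```

The retained centers would then be written out (e.g., as an ASCII .PLY file) for the geometric comparison.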
2.7. Evaluation Metrics
Quantitative evaluation between the reconstructed point clouds and ground-truth data was performed according to standard geometric metrics [47]:
2.7.1. Initial Alignment (ICP)
Prior to the calculation of the comparison metrics, an alignment was performed using the Iterative Closest Point (ICP) [48] algorithm to minimize the position discrepancies between the reconstructed cloud and the reference cloud. The associated fitness score measures the proportion of points that, after alignment, fall within a defined distance threshold (inliers); a high fitness therefore indicates good overlap between the two point clouds, which is essential before evaluating point errors. The RMSE, computed over the matched points after alignment, provides an overall measure of alignment deviation and is sensitive to outliers, which helps to detect mismatches.
The alignment error is defined as
$$E(\mathbf{R}, \mathbf{t}) = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{R}\,\mathbf{q}_i + \mathbf{t} - \mathbf{p}_i \right\|^2$$
where $E(\mathbf{R}, \mathbf{t})$ represents the mean squared distance between corresponding points of the two point clouds after applying a rigid transformation with rotation $\mathbf{R}$ and translation $\mathbf{t}$. In this context, $N$ denotes the total number of point correspondences between the source (reconstructed) point cloud and the target (reference) point cloud, $\mathbf{p}_i$ refers to the coordinates of the $i$-th point in the reference cloud, and $\mathbf{q}_i$ corresponds to the coordinates of the $i$-th point in the source cloud. The term $\left\| \mathbf{R}\,\mathbf{q}_i + \mathbf{t} - \mathbf{p}_i \right\|^2$ is the squared Euclidean distance between the transformed source point and its corresponding reference point. This function is minimized during the ICP optimization to determine the rigid transformation that best aligns the two clouds.
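For illustration, the ICP alignment error can be evaluated directly from matched point pairs. The toy clouds and the transformation below are hypothetical; a full ICP implementation would additionally re-estimate correspondences and the rigid transform at each iteration.

```python
def alignment_error(R, t, source, target):
    """Mean squared distance (1/N) * sum_i || R q_i + t - p_i ||^2 over
    matched pairs (q_i in the source cloud, p_i in the target cloud)."""
    total = 0.0
    for q, p in zip(source, target):
        # Apply the rigid transformation R q + t to the source point
        rq = tuple(sum(R[r][c] * q[c] for c in range(3)) for r in range(3))
        total += sum((rq[a] + t[a] - p[a]) ** 2 for a in range(3))
    return total / len(source)

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # reconstructed (source) points
tgt = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # reference (target) points

# A pure +1 unit translation along z aligns the clouds exactly
residual = alignment_error(IDENTITY, (0.0, 0.0, 1.0), src, tgt)   # 0.0
```
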
2.7.2. Geometric Discrepancy Metrics
To quantify the differences between the reconstructed cloud and the reference cloud, metrics based on point-to-point distances were calculated.
RMS Error: Measures the average deviation while giving higher weight to large errors, which helps to identify possible outliers in the reconstruction:
$$\mathrm{RMS} = \sqrt{\frac{1}{|P_s|} \sum_{p \in P_s} d(p, P_r)^2}$$
where $d(p, P_r)$ is the distance from the point $p$ in the source cloud $P_s$ to its nearest point in the reference cloud $P_r$.
Hausdorff distance: Evaluates the worst-case discrepancy between the two clouds, identifying the maximum distance between a point in the reference and its closest point in the reconstruction. This value is critical in applications where it is necessary to ensure that there are no significant error regions:
$$d_H(P_r, P_s) = \max\left\{ \max_{p \in P_r} \min_{q \in P_s} \|p - q\|,\; \max_{q \in P_s} \min_{p \in P_r} \|q - p\| \right\}$$
where the first term evaluates the worst distance from the reference to the reconstruction and the second from the reconstruction to the reference.
Chamfer distance: Calculates the sum of the nearest-point distances in both directions (from reference to reconstruction and vice versa), providing a symmetric metric that allows for comparison of the completeness of the reconstruction:
$$d_{CD}(P_s, P_r) = \frac{1}{|P_s|} \sum_{p \in P_s} \min_{q \in P_r} \|p - q\| + \frac{1}{|P_r|} \sum_{q \in P_r} \min_{p \in P_s} \|q - p\|$$
The Chamfer distance thus quantifies the geometric difference between the source point cloud and the reference cloud by averaging the nearest-neighbor distances in both directions.
Mean Absolute Distance: Represents the average of all measured discrepancies between the two clouds, providing a value less sensitive to outliers than the RMS:
$$\mathrm{MAD} = \frac{1}{|P_s|} \sum_{p \in P_s} d(p, P_r)$$
Unlike the RMS error, this metric is less affected by outliers, offering a more robust estimate of the typical point-to-point deviation.
Symmetric distance: Calculated as the average of the mean distances between the two clouds in both directions, ensuring that the comparison does not depend on the point density of either representation:
$$d_{sym}(P_s, P_r) = \frac{1}{2}\left( \frac{1}{|P_s|} \sum_{p \in P_s} d(p, P_r) + \frac{1}{|P_r|} \sum_{q \in P_r} d(q, P_s) \right)$$
This averages the mean distances from each cloud to the other, making the comparison unbiased with respect to sampling differences.
2.7.3. Density Characterization and Spatial Distribution
In addition to geometric accuracy, the structure of the reconstructed point cloud was evaluated in terms of its density and spatial volume:
Density of points: Estimated as the number of points per unit volume, considering the approximate volume defined by the extent of the cloud along the three axes. This metric indicates whether the reconstruction contains a detailed representation or has deficiencies in capturing fine details:
$$\rho = \frac{N}{(x_{max} - x_{min})(y_{max} - y_{min})(z_{max} - z_{min})}$$
where $N$ is the total number of points and $(x_{min}, x_{max})$, $(y_{min}, y_{max})$, and $(z_{min}, z_{max})$ are the minimum and maximum coordinate values along each axis of the cloud.
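This bounding-box density estimate can be sketched as follows; the eight-point cube is a hypothetical example chosen so the result is easy to verify by hand.

```python
def point_density(points):
    """Points per unit volume of the axis-aligned bounding box."""
    xs, ys, zs = zip(*points)
    volume = ((max(xs) - min(xs)) *
              (max(ys) - min(ys)) *
              (max(zs) - min(zs)))
    return len(points) / volume

# Eight corners of a 2 x 2 x 2 cube -> 8 points / 8 cubic units = 1.0
cube = [(x, y, z) for x in (0.0, 2.0) for y in (0.0, 2.0) for z in (0.0, 2.0)]
rho = point_density(cube)   # 1.0
```

Note that the axis-aligned bounding box overestimates the volume of irregular objects, so this density should be read as an approximate, comparative figure.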
These metrics provide a comprehensive assessment of reconstruction fidelity in terms of both local and global deviations.
3. Results
The central purpose of this study is to systematically compare the performance of neural rendering 3D reconstruction methods, specifically NeRF and GS, against conventional SfM photogrammetry, using small archaeological artifacts as test subjects. To this end, we evaluate and interpret the accuracy of each method relative to a high-precision ground truth generated with a metrological articulated arm.
3.1. Reconstruction Performance Metrics
Table 2 presents the comparative results across three fibula samples using several quantitative metrics, including ICP Fitness, ICP RMSE, RMS Error, Hausdorff distance, Chamfer distance, Mean Absolute Error, and Symmetric Error.
NeRF consistently achieved lower RMS, Chamfer, and Symmetric Errors compared to GS across all samples, indicating better overall geometric fidelity. Hausdorff distances, which captured the worst-case deviations, were also smaller for NeRF in two of the three brooches. However, both methods demonstrated increased errors when reconstructing Fibula 3, likely due to its more complex geometry and reflective surface properties.
Although GS produced denser point clouds in certain cases, this did not translate into better reconstruction accuracy, suggesting that point density alone is insufficient to guarantee model fidelity.
3.2. Point Density Comparison
Table 3 shows the density of the reconstructed point clouds relative to the ground-truth models.
Both NeRF and GS produced point clouds significantly less dense than the ground-truth data, which was expected given the differences in acquisition technologies. Although GS sometimes generated a higher number of points than NeRF, this did not correlate with superior metric performance. In particular, Fibula 3 exhibited very low densities in both NeRF and GS reconstructions, suggesting that complex or reflective artifacts pose challenges to image-based methods.
3.3. Visual Comparison of Point Clouds
The spatial distribution of errors between the photogrammetric reference models, NeRF reconstructions, and GS models was assessed using the Cloud-to-Cloud Distance tool in CloudCompare.
Figure 7 and Figure 8 present deviation maps for the three fibulae, illustrating geometric discrepancies between each reconstruction method and the ground-truth photogrammetric model.
Figure 7 illustrates the deviations between the NeRF models and the photogrammetric reference. The predominance of blue and green tones across the fibulae indicates that NeRF reconstructions maintain a high degree of geometric fidelity, especially in flat and less detailed areas. Minor deviations (green to light yellow) appear mainly along edges and small ornamental features, but they remain spatially limited.
In contrast, Figure 8 presents the deviation maps for the GS-based models, where a larger proportion of yellow and red areas can be observed, particularly in regions containing fine geometrical details such as contours, cavities, and narrow features. This suggests that GS has more difficulty in accurately reproducing intricate elements and tends to produce higher deviations overall.
While both techniques can approximate the overall structure of the archaeological pieces, the NeRF reconstructions outperform GS in terms of spatial accuracy. The results confirm that NeRF is more effective in preserving geometric integrity, making it a more suitable approach for documenting small and complex archaeological artifacts that require precise 3D representation.
3.4. Quantitative Evaluation of Geometric Accuracy Using Distance Distributions
To complement the visual deviation maps, quantitative analysis was performed to assess the geometric fidelity of the NeRF and GS reconstructions with respect to the photogrammetric reference models. Specifically, the Euclidean distances between corresponding points in the reconstructed and reference clouds were computed and analyzed using histograms that reflect the frequency distribution of point-to-point distances.
This method provides a more detailed understanding of how each technique behaves across the full surface of the object, highlighting not only average performance but also the presence of outliers or areas with significant error. The analysis was carried out separately for each of the three fibulae under study, and the resulting distributions are presented in the following figure (Figure 9).
Figure 9 represents the point-to-point error density (top) and cumulative distribution curves (bottom) for the three fibulae. In models 1 and 2, the NeRF distribution is concentrated in the 0.1–0.4 mm range, while GS exhibits longer tails, indicating greater local deviations. Eighty percent of the NeRF points fall within 0.3 mm of the metrological control, compared to 55–65% in GS. In Fibula 3, both methods exhibit larger errors due to the strong specular reflectance; however, NeRF maintains a median 30% lower than GS.
The F-score at 0.5 mm (tabulated in the same figure) quantifies this observation: 0.73 and 0.83 for NeRF versus 0.55 and 0.50 for GS in the first two cases. In the third, both fall below 0.40. These results confirm NeRF’s geometric advantage, already indicated by the RMS and Chamfer metrics.
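The F-score at a threshold τ combines the precision (fraction of reconstructed points within τ of the reference) and recall (fraction of reference points within τ of the reconstruction) of nearest-neighbor distances. A minimal sketch, using hypothetical toy clouds and brute-force nearest-neighbor search:

```python
import math

def f_score(reconstructed, reference, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    frac_near = lambda A, B: sum(
        min(math.dist(a, b) for b in B) <= tau for a in A) / len(A)
    p = frac_near(reconstructed, reference)   # precision
    r = frac_near(reference, reconstructed)   # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # hypothetical reference points
rec = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]   # one accurate point, one outlier
score = f_score(rec, ref, tau=0.5)          # precision 0.5, recall 0.5 -> 0.5
```
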
Across all three brooches, the NeRF reconstructions clearly achieve superior geometric fidelity, particularly in areas requiring detailed reproduction. These results reinforce the conclusion that NeRF is better suited than GS for high-accuracy 3D modeling of small archaeological objects, especially when precise spatial measurements are essential for documentation, conservation, or scientific analysis.
3.5. Summary of Reconstruction Performance
The results of this comparative study provide a comprehensive overview of the capabilities and limitations of NeRF and GS for the 3D reconstruction of small archaeological artifacts, using high-precision photogrammetric models as ground truth.
Quantitatively, NeRF consistently outperforms GS across all evaluated metrics, including Root Mean Square Error (RMS), Chamfer distance, and Symmetric Error, as reported in Table 2. These lower error values indicate a higher level of geometric accuracy and surface fidelity. Although GS occasionally produces denser point clouds (Table 3), this does not translate into improved performance, suggesting that point count alone is not a reliable indicator of model quality.
Visually, the deviation maps (Figure 7 and Figure 8) further emphasize NeRF’s superiority in spatial accuracy. While both NeRF and GS can reproduce the overall shape of the objects, NeRF more accurately captures fine details and preserves the structural integrity of the fibulae. GS, on the other hand, exhibits noticeable errors in complex or narrow regions, particularly evident in Fibula 3, where it struggles with reflective surfaces and intricate geometry.
The histogram-based distance analysis (Figure 9) confirms these findings, showing that NeRF consistently maintains a higher concentration of points within minimal deviation ranges (<0.3 mm), while GS distributions are broader and skewed toward higher errors. This pattern is consistent across all three fibulae and highlights NeRF’s robustness and precision.
In conclusion, while both NeRF and GS represent promising approaches for image-based 3D reconstruction, NeRF demonstrates greater reliability and accuracy for small, complex cultural heritage objects. Its ability to generate metrically accurate models makes it more suitable for applications in digital preservation, scientific analysis, and heritage documentation, where geometric precision is critical. GS may still offer advantages in terms of rendering speed and visual smoothness, but its geometric limitations must be carefully considered when fidelity is a priority.
4. Discussion
4.1. Comparative Performance of NeRF, GS, and SfM
The findings of this study contribute significant insights into the practical capabilities, limitations, and future potential of neural rendering-based 3D reconstruction methods, specifically NeRF and GS, when applied to the documentation of small-scale archaeological artifacts. By benchmarking these approaches against high-precision photogrammetric models derived from SfM, we have been able to draw robust conclusions regarding their geometric accuracy, reliability, and applicability in cultural heritage workflows.
Across all evaluated metrics (RMS Error, Chamfer distance, Mean Absolute Error, and Symmetric Error; see Table 2 and Figure 10), NeRF consistently outperformed GS. On average, NeRF reduced RMS and Chamfer errors by 35–40% relative to GS and by >40% relative to SfM. While GS occasionally produced denser point clouds, these did not translate into higher accuracy, reinforcing the notion that point density alone is not a reliable proxy for model quality. Indeed, GS exhibited the highest Hausdorff distances (up to 20% larger than NeRF), signaling a greater frequency of localized outliers and boundary artifacts. These patterns align with previous benchmarks [33], which report that GS favors visual smoothness and real-time rendering speed at the expense of sub-millimeter precision.
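These cloud-to-cloud metrics can be reproduced with standard nearest-neighbour queries. The sketch below (Python with SciPy; the sum-of-means Chamfer convention is assumed here, one of several in use) illustrates how Chamfer and symmetric Hausdorff distances are obtained from two point clouds:

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_metrics(pred, ref):
    """Chamfer and symmetric Hausdorff distances between two point clouds.

    pred, ref: (N, 3) and (M, 3) arrays of XYZ coordinates in the same
    units (e.g., millimetres after scaling the model).
    """
    d_pr = cKDTree(ref).query(pred)[0]    # each pred point -> nearest ref point
    d_rp = cKDTree(pred).query(ref)[0]    # each ref point -> nearest pred point
    chamfer = d_pr.mean() + d_rp.mean()   # sum-of-means Chamfer convention
    hausdorff = max(d_pr.max(), d_rp.max())  # worst-case symmetric deviation
    return chamfer, hausdorff

# sanity check: identical clouds give zero for both metrics
pts = np.random.rand(100, 3)
c, h = cloud_metrics(pts, pts)
```

Because the Hausdorff distance reports the single worst deviation, it is the metric most sensitive to the localized outliers and boundary artifacts noted above for GS.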
Tests demonstrate that NeRF offers superior geometric fidelity, achieving lower RMS, Chamfer, and symmetric errors across all three artifacts analyzed. This is because NeRF’s continuous volumetric rendering better captures microcurvatures and fine edges, whereas GS tends to smooth out details to optimize real-time rendering. NeRF also enables the extraction of dense, reliable depth maps that feed a coherent point cloud, well suited for comparison against metrological control, and its hierarchical ray sampling reduces artifacts in poorly observed areas. Finally, while GS renders more efficiently, NeRF keeps the VRAM footprint within manageable limits (<8 GB in our tests).
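The depth maps mentioned above rest on the standard discrete volume-rendering quadrature, in which each sample along a ray receives a compositing weight from its density and the accumulated transmittance. A minimal NumPy sketch (the densities and sample positions are illustrative, not drawn from our experiments):

```python
import numpy as np

def render_weights(sigma, deltas):
    """Compositing weights for discrete volume rendering along one ray:
    w_i = T_i * (1 - exp(-sigma_i * delta_i)), with transmittance
    T_i = prod_{j<i} exp(-sigma_j * delta_j).
    """
    alpha = 1.0 - np.exp(-sigma * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # light surviving to sample i
    return trans * alpha

# illustrative ray: empty space, then a density spike at the surface
sigma = np.array([0.0, 5.0, 50.0])      # volume densities per sample
deltas = np.full(3, 0.01)               # spacing between samples
depths = np.array([1.00, 1.01, 1.02])   # sample depths along the ray
w = render_weights(sigma, deltas)
ray_depth = (w * depths).sum()          # expected depth: one pixel of a depth map
```

The weight-averaged depth per ray is what yields the dense depth maps that can be fused into a point cloud for comparison against the reference model.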
4.2. Performance Under Complex Geometries and Reflective Surfaces
One of the most revealing outcomes emerged from the analysis of Fibula 3, an artifact characterized by a more complex morphology and reflective metallic surfaces. Both NeRF and GS showed markedly reduced performance on this object, with increased reconstruction errors and significantly lower point densities (as reflected in Table 3). This decline can be attributed to specular reflections, occlusions, and the overall difficulty of modeling highly detailed or shiny surfaces using passive image-based methods. Reflective artifacts disrupt the feature-correspondence mechanisms fundamental to SfM and interfere with radiance field estimation in neural methods, resulting in noise, over-smoothing, or missing geometry.
The deviation maps (Figure 7 and Figure 8) and histogram analyses (Figure 9) confirmed that the major discrepancies in NeRF and GS reconstructions are concentrated in small-scale features, such as edges, decorative engravings, and narrow voids. Although both methods effectively capture the overall shape and volumetric proportions of the fibulae, GS displayed higher spatial deviations across all samples, particularly in high-curvature areas. NeRF, by contrast, maintained a tighter error distribution, with most deviations below 0.3 mm and localized mainly in ornamented zones or occluded surfaces. These findings align with previous research in the cultural heritage domain, where NeRF has demonstrated strong performance in texture reproduction and general geometry, but some difficulty in achieving the sub-millimeter fidelity required for high-precision applications.
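The deviation histograms behind such statements can be sketched as a nearest-neighbour deviation profile. The function below (an illustrative Python sketch, not the exact evaluation pipeline used in this study) bins cloud-to-cloud deviations and reports the share of points under a threshold such as 0.3 mm:

```python
import numpy as np
from scipy.spatial import cKDTree

def deviation_profile(model, reference, bins):
    """Absolute cloud-to-cloud deviations of `model` against `reference`,
    plus the fraction of points falling in each histogram bin."""
    d = cKDTree(reference).query(model)[0]  # nearest-neighbour distance per point
    hist, _ = np.histogram(d, bins=bins)
    return d, hist / len(d)

# toy check: a cloud shifted by 0.1 micron stays far inside a 0.3 mm band
ref = np.random.rand(500, 3)  # coordinates in mm
dev, frac = deviation_profile(ref + 1e-4, ref, bins=[0.0, 0.3, 0.6, 1.0])
within = float((dev < 0.3).mean())  # share of points deviating < 0.3 mm
```

Plotting `frac` over the bin edges reproduces the kind of deviation histogram shown in Figure 9, and `within` corresponds to the "most deviations below 0.3 mm" statistic.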
4.3. Applicability in Cultural Heritage Documentation
Despite these limitations, NeRF reconstructions offer significant advantages in terms of workflow efficiency, visual realism, and user accessibility. The relatively short acquisition and training times (especially when compared to traditional photogrammetric modeling and dense SfM reconstructions) make NeRF a promising option for preliminary documentation, interactive exhibitions, digital storytelling, and remote research collaboration. Its ability to produce compelling visual outputs with reduced human intervention also makes it highly suitable for public dissemination and educational use.
However, for tasks requiring strict geometric accuracy, such as digital conservation archives, physical replication (e.g., 3D printing), morphometric analyses, or architectural integration, photogrammetry remains the gold standard. SfM still provides superior control over point-cloud resolution, alignment, and error traceability, which is essential in projects demanding traceable metrological validity.
Beyond the quantitative performance evaluation, the present study makes an original contribution by exploring how NeRF systems perform in a particularly underexplored domain: the documentation of small-scale, high-reflectivity heritage artifacts. This niche application area introduces unique challenges that are rarely addressed in NeRF-related literature, including issues related to scale, lighting control, and reflective surface properties. By rigorously testing these technologies under such constraints, we demonstrate that neural rendering can extend its utility beyond synthetic benchmarks and architectural datasets, providing heritage scientists with alternative tools for documenting fragile or visually complex items. These challenges underscore the importance of developing material-aware or photometrically robust NeRF variants to improve accuracy in cultural heritage scenarios involving metal, glass, or polished stone.
4.4. Methodological Scope and Future Research Directions
It is worth noting that this study did not seek to provide a comprehensive benchmark of all existing 3D reconstruction techniques. Instead, it focused on evaluating NeRF approaches against the most widely adopted photogrammetric method (SfM) and a real-time neural variant (GS). Although techniques such as laser scanning, structured-light scanning, and micro-CT imaging are also used for small-object digitization in heritage contexts, they were excluded from this comparison due to practical constraints and to maintain a clear focus on the performance of neural rendering models. Future research may benefit from including these additional methods, especially those designed for metrological accuracy in micro-scale applications, to further validate and contextualize the comparative performance of AI-based reconstructions.
Looking ahead, the integration of NeRF with hybrid or multimodal approaches presents a promising avenue for enhancing accuracy without sacrificing efficiency. For instance, combining NeRF with SfM-derived sparse geometry may provide stronger geometric priors during training. Adaptive-resolution NeRFs, capable of focusing on high-detail areas, could address the current limitations in fine-feature reproduction. Material-aware rendering models, incorporating reflectance properties or multispectral data, may mitigate the challenges posed by reflective surfaces. Improvements in camera pose estimation and radiance field optimization under complex lighting could further enhance the reliability of neural reconstructions in cultural heritage settings. Moreover, the incorporation of photometric consistency constraints, BRDF-aware shading models, and integrated uncertainty quantification could position future NeRF systems as more robust tools for scientific documentation.
It is important to note that the input data used for the neural methods (NeRF and GS) were derived from mobile video capture, whereas the SfM model was built from a controlled DSLR-based image set. This divergence reflects a deliberate choice to simulate realistic, low-cost documentation conditions using consumer devices. While this introduces variability in the comparison, it also highlights one of the core advantages of neural rendering: its capacity to generate plausible reconstructions from less controlled, lower-quality data sources. Future work will explore direct comparisons using matched datasets from identical cameras to further isolate the effects of algorithmic differences.
4.5. Trade-Offs Between Accuracy, Processing Time, and Usability
While SfM photogrammetry remains the most accurate method for sub-millimeter documentation, it requires a carefully controlled acquisition environment, a large number of well-planned images, and time-intensive processing (including masking, alignment, and manual scaling). In contrast, NeRF and GS reconstructions (particularly when implemented via frameworks like Nerfstudio) require significantly less human intervention and can be deployed using video frames or simple acquisition protocols.
In terms of computational cost, GS required 21 ± 3 min of training and rendered at 28 FPS at 1080p on an 8 GB RTX 3070, while NeRF (Nerfacto) required 32 ± 5 min and yielded 12 FPS. GS thus improves interactive performance, but our results confirm that this gain comes at the cost of an 18–35% increase in RMS error, so the choice between the two approaches must weigh display speed against the geometric fidelity required by the final application. More generally, the reduced acquisition and processing effort of the neural methods comes at the cost of geometric accuracy, as they still struggle with fine-feature preservation and reflective surfaces. In field applications such as emergency documentation, museum digitization, or educational VR content creation, NeRF provides an accessible, fast, and visually rich alternative; for applications requiring reproducible metrological accuracy (e.g., conservation or 3D printing), SfM remains preferable despite the higher processing burden.
These trade-offs underscore the importance of aligning method selection with project goals and constraints, particularly in time-sensitive or resource-limited archaeological and heritage contexts.
4.6. Final Remarks
In summary, this study confirms the potential of NeRF-based approaches to serve as efficient and visually compelling tools in the digitization of archaeological heritage. While not yet a replacement for high-precision photogrammetry in rigorous conservation or analytical contexts, NeRF’s strengths in volumetric fidelity and fast deployment make it an attractive complement to existing 3D documentation pipelines [33]. Continued research and cross-disciplinary development will be essential to unlocking its full potential for use in cultural heritage science.
5. Conclusions
This study has systematically evaluated the performance of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) in comparison to conventional Structure-from-Motion (SfM) photogrammetry for the 3D reconstruction of small archaeological artifacts. The results clearly demonstrate that NeRF consistently outperforms GS across multiple quantitative metrics, including RMS Error, Chamfer distance, and Symmetric Error, indicating a higher degree of geometric fidelity. However, both neural methods still fall short of the geometric precision achieved by SfM photogrammetry, which remains the most accurate approach for capturing sub-millimeter details essential in cultural heritage conservation. Despite the longer acquisition and processing times associated with photogrammetry, its reliability and metrological robustness continue to make it the gold standard for rigorous documentation.
NeRF, in contrast, offers significant advantages in terms of efficiency and visual quality. It effectively reconstructs the global shape and surface curvature of artifacts while producing visually compelling models suitable for applications in virtual exhibitions, educational tools, and public outreach initiatives. Nevertheless, both NeRF and GS exhibit notable limitations when applied to objects with reflective or highly complex surfaces, such as metallic fibulae, where specular reflections and occlusions negatively impact reconstruction accuracy. These findings highlight the need for continued development of neural rendering techniques, particularly in enhancing their ability to handle challenging materials and detailed features.
While not yet a replacement for photogrammetry in precision-critical contexts, NeRF and GS present promising alternatives for rapid and accessible 3D documentation workflows. Future research should explore hybrid approaches that combine the photometric realism of NeRF with the geometric precision of SfM, potentially through the integration of sparse geometric priors or adaptive-resolution neural models. Additionally, further investigation is warranted to examine how variables such as lighting, camera calibration, and dataset diversity influence reconstruction quality in neural pipelines. Developments in material-aware radiance fields and uncertainty quantification may also help bridge the current accuracy gap.
In summary, although SfM photogrammetry remains the most accurate method for 3D reconstruction in archaeology, NeRF-based techniques are emerging as valuable complementary tools that offer a balance between visual realism, processing speed, and usability, particularly in scenarios where extreme precision is not the primary requirement. Their ongoing evolution holds great potential for enriching the digital preservation and dissemination of cultural heritage.
The originality of this work also stems from its focus on a highly specific yet largely overlooked use case: the neural-based reconstruction of small, metallic archaeological artifacts. This constitutes a meaningful advancement in the field of digital heritage, as it illustrates the feasibility of applying NeRF and GS techniques in scenarios traditionally dominated by photogrammetry or laser scanning. Our findings encourage further exploration into how AI-based methods can be adapted or optimized for heritage contexts that demand both accuracy and accessibility.