1. Introduction
Three-dimensional (3D) surface imaging has become a valuable tool in plastic surgery, particularly in aesthetic procedures such as breast augmentation and facial contouring. It surpasses traditional two-dimensional (2D) photography by providing comprehensive spatial data. This capability significantly enhances preoperative analysis and surgical planning accuracy [
1]. While its use in oncoplastic and reconstructive breast surgery—especially in cancer contexts—remains limited due to current constraints in accuracy and clinical validation, 3D imaging holds potential for future integration. In standard clinical practice, surgeons primarily rely on imaging modalities such as mammography and MRI for tumor localization and surgical planning [
2]. However, these images are usually acquired with patients in standing (mammography) or prone (MRI) positions, which do not reflect the supine posture typically used during surgery. Additionally, breast compression during mammography, along with differences in patient positioning between the two modalities, can complicate the accurate translation of imaging data into surgical scenarios. These discrepancies can compromise tumor removal and negatively affect reconstructive outcomes, highlighting the need for improved, position-consistent preoperative planning.
Various techniques such as structured-light scanners, laser scanners, and stereophotogrammetry have emerged as viable solutions for capturing the complex geometry of the breast. Structured-light and stereophotogrammetry systems have demonstrated sub-millimeter (mm) precision and reliability (with typical errors well under 1 mm in controlled settings) [
3,
4,
5], but their clinical adoption remains limited. The trade-offs lie in cost, complexity, and capture speed. For instance, multi-camera photogrammetry enables rapid acquisition but at high cost and complexity, whereas structured-light and laser scanners, though portable, require a steadier subject due to longer scan durations and have limitations in mesh quality. In response, recent developments in mobile scanning devices and smartphone-based applications offer a promising balance of accuracy, reproducibility, and accessibility [
6].
One notable device is the Structure Sensor, an infrared depth camera that attaches to an iPad. Oranges et al. [
7] evaluated it on an iPad Pro by scanning a rigid female torso phantom alongside two commercial systems (Canfield Vectra M5 and Artec Eva). Analysis of breast measurements (distances between anatomical landmarks and surface areas) showed no significant differences, indicating that, at least for a controlled model, a low-cost mobile infrared scanner can match the accuracy of industry-standard 3D imaging systems in capturing breast morphology. Koban et al. [
8] compared an affordable handheld scanner (Intel RealSense) against a multi-camera photogrammetric system in 42 patients. Breast measurements showed good-to-excellent correlation between the tested devices, with surface deviations of around 1.6–1.8 mm root-mean-square (RMS) error and a reproducibility of 0.64 mm RMS.
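For reference, the RMS surface deviation reported in such comparisons is conventionally computed over N corresponding point pairs as
\[
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}},
\]
where $d_i$ is the distance between the $i$-th point on the test mesh and its closest point on the reference surface; the exact sampling and correspondence scheme used in [8] is not restated here.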
Beyond these, modern smartphones further extend mobile scanning capabilities through two main approaches: photogrammetry apps and LiDAR-based scanning. Photogrammetry apps reconstruct 3D meshes by stitching together multiple images, while newer devices equipped with LiDAR sensors allow direct depth capture. One recent study [
9] compared smartphone-derived 3D breast volumes (obtained with the 3D Scanner App) with MRI-based volumes, MRI being the gold standard for volume measurement, in 22 women. While small systematic offsets were observed, the results demonstrated that smartphone scanning can yield reasonably accurate volume estimates at low cost. Kyriazidis et al. [
10] evaluated a smartphone LiDAR workflow in 25 patients, focusing on clinically relevant (linear) breast measurements. Relative errors were low (approximately 1.4% for notch-to-nipple and 3.5% for nipple-to-midline distances), with excellent inter-rater reliability (ICC = 0.92). Notably, the learning curve was short, with clinicians achieving proficiency after only a few scans. All these findings suggest that portable, affordable solutions could achieve clinically acceptable accuracy in breast imaging, provided their limitations are well understood and characterized.
While smartphone-based 3D scanning apps provide convenient solutions, many use proprietary reconstruction algorithms that often rely on deep learning (DL) methods, the specifics of which remain undisclosed. In contrast, recent academic research has developed open-source DL frameworks capable of reconstructing detailed meshes from sparse inputs. For example, Occupancy Networks [
11] learn continuous shape representations from single images or sparse point clouds and have been validated on large synthetic datasets such as ShapeNet (over 50,000 synthetic 3D models) [
12]. Pixel-Aligned Implicit Functions (PIFu) [
13] reconstruct detailed clothed human meshes from one or more RGB images, showing robustness to occlusions by aligning pixel-level features with learned shape priors. Alternative approaches such as convolutional mesh regression [
14] and graph-based approaches like Pixel2Mesh [
15] predict mesh vertex positions or deform template meshes from single images, achieving high accuracy on real-world human datasets. Specifically in breast reconstruction, recent open-source DL developments remain largely experimental or in early clinical stages. For instance, Weiherer et al. [
16] introduced an implicit neural breast shape model (iRBSM), trained on 168 photogrammetry-based 3D scans, which reconstructs detailed breast meshes from single 2D photos. Their workflow uses a state-of-the-art monocular depth estimator (Depth Anything V2) to generate depth maps, reprojected into 3D using known camera intrinsics, followed by model fitting to the resulting point cloud. Duarte et al. [
17] proposed a CNN-based system for real-time reconstruction, validated on synthetic data with a mean surface error of approximately 3.9 mm. Commercial tools like Crisalix also use DL algorithms (albeit proprietary) to generate 3D breast models from photographs for preoperative planning. However, all these methods face key limitations, including limited datasets, difficulty achieving clinical-grade accuracy, and the need for real-world validation.
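To illustrate the depth-map reprojection step described for the iRBSM workflow, the sketch below back-projects a depth image into a camera-space point cloud under a standard pinhole model. It is a generic illustration rather than the authors' implementation; the intrinsics (fx, fy, cx, cy) and the depth array are assumed inputs.

```python
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map (metres, shape H x W) into camera-space 3D
    points with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column/row indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # discard invalid (zero-depth) pixels
```

The resulting point cloud would then be used as the fitting target for a statistical or implicit breast shape model, as in the cited workflow.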
Given this context, the clinical utility and practical capabilities of currently available commercial 3D scanning applications still require rigorous evaluation. The primary objective of our study was to evaluate and quantify the performance of a selected set of current applications—3D Scanner App, Heges, Polycam, SureScan, Kiri, and Structure Sensor—in reconstructing 3D breast meshes, using manual measurements as the reference for validation. A mannequin phantom fitted with silicone breasts was employed to maintain controlled conditions. Specifically, we focused on the following:
Overall reconstruction accuracy across applications: Assessed by comparing manually measured anatomical distances on the phantom with corresponding digital measurements extracted from the 3D meshes, aggregated across all scans (a brief computational sketch follows this list).
Intra-trial and operator-related variability: Evaluated by analyzing repeated scans per app and comparing error consistency across trials and between two independent operators.
Anatomical region-specific error patterns: Analyzed by examining reconstruction accuracy across different anatomical distances to identify which regions are more prone to distortion or variability.
Visual and practical challenges: Documented through qualitative inspection of reconstructed meshes.
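A brief computational sketch of this accuracy assessment is given below; the manual and mesh-derived distances are hypothetical values for illustration, not data from this study.

```python
import numpy as np

# Hypothetical example: manual tape/calliper references (mm) and the
# corresponding distances measured on a reconstructed mesh for one scan.
manual_mm = np.array([210.0, 195.0, 78.0, 64.0])   # reference distances
digital_mm = np.array([212.4, 191.1, 79.0, 66.8])  # mesh-derived distances

abs_err = np.abs(digital_mm - manual_mm)
print("Mean absolute error (mm):", abs_err.mean())
print("Relative error (%):", 100 * abs_err / manual_mm)
```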
By systematically addressing these objectives, this study provides insights into the reliability, accuracy and practical limitations of current 3D reconstruction applications for breast mesh generation using a phantom model.
4. Discussion
The primary aim of this study was to systematically assess the accuracy and consistency of various commercially available mobile 3D scanning apps in generating anatomically accurate meshes of a female torso phantom with silicone breasts. While previous work has benchmarked individual apps or compared mobile scanners with high-end systems, a comprehensive evaluation of accuracy, repeatability, and region-specific performance across multiple consumer solutions has so far been lacking. This paper addresses that gap by providing the first systematic assessment of six commercially available mobile 3D scanning applications—SureScan, Heges, 3D Scanner App, Polycam, Kiri Engine, and Structure Sensor—using a controlled phantom model. Our findings provide a practical, evidence-based resource to help researchers and clinicians make informed decisions when selecting 3D scanning workflows that balance accuracy, cost, and real-world usability.
Among the tested applications, SureScan emerged as the best performer in terms of mean reconstruction accuracy, achieving a mean absolute error of approximately 2.9 mm. This is comparable to previously reported accuracies for high-end structured-light and stereophotogrammetry systems [
3,
4,
7]. Structure Sensor and Heges also showed similar performance, with mean errors close to SureScan, supporting prior observations that LiDAR and infrared-based handheld scanners can be accurate [
1]. However, the Kruskal–Wallis test and follow-up analyses revealed no statistically significant differences in accuracy between SureScan, Heges, Structure Sensor, Kiri, and 3D Scanner App. From a clinical perspective, this lack of significant difference may be seen as a strength: it suggests that several mobile apps can deliver comparable reconstruction accuracy, allowing users to choose based on workflow needs, cost, and device compatibility. Still, the absence of a statistically significant difference does not guarantee equivalent performance in practice. Polycam stood out with a much higher mean error of around 21.4 mm and large variability across scans. Interestingly, its reconstructions often appeared visually acceptable. The issue likely lies in its reconstruction method, which produces reasonable shapes but struggles with scaling. Without consistent scaling, measurements become unreliable. This limitation highlights an important point: visual quality alone does not guarantee quantitative accuracy.
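The omnibus comparison referred to here can be reproduced with standard statistical tooling. The snippet below is a minimal sketch, assuming per-app arrays of absolute errors; the values shown are illustrative placeholders, not the measurements reported in this study.

```python
import numpy as np
from scipy import stats

# Hypothetical per-app absolute errors (mm), pooled over all measured
# distances and repeated scans; values are illustrative only.
errors = {
    "SureScan":         np.array([2.1, 3.4, 2.8, 3.2, 2.9]),
    "Structure Sensor": np.array([2.6, 3.1, 3.3, 2.8, 3.0]),
    "Heges":            np.array([3.0, 4.1, 3.5, 3.9, 3.4]),
    "3D Scanner App":   np.array([4.0, 4.9, 4.2, 4.6, 4.3]),
    "Kiri":             np.array([4.6, 5.5, 4.9, 5.2, 4.8]),
    "Polycam":          np.array([18.9, 24.2, 20.5, 22.7, 21.1]),
}

# Non-parametric omnibus comparison across apps (Kruskal-Wallis H-test).
h_stat, p_value = stats.kruskal(*errors.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# Follow-up pairwise comparisons would typically use Dunn's test or
# Mann-Whitney U with a multiple-comparison correction.
```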
Although this study did not include direct comparisons with clinical-grade systems such as the Canfield Vectra M5 or Artec Eva, the observed mean errors (e.g., 2.9 mm for SureScan) fall within clinically acceptable thresholds for surgical planning and breast symmetry assessment tasks (typically 3–5 mm) [
3,
7]. This suggests that mobile, camera-based tools may already offer sufficient accuracy for selected clinical applications, particularly where usability, cost, and repeatability are key considerations. This raises a broader question for the field: should future innovation focus more on hardware, on reconstruction methods, or ideally on both? Five of the six evaluated applications rely entirely on the iPad’s built-in RGB camera and depth estimation, with only Structure Sensor using an external depth sensor. This common imaging source enables a clearer comparison of reconstruction performance. Despite using similar input, the apps produced a wide range of mean errors (from 2.9 to 21.4 mm), indicating that differences in performance are largely driven by the reconstruction algorithms and post-processing techniques. Visual assessments reinforce this finding. As shown in
Figure 6, all apps generated plausible-looking reconstructions, but each exhibited specific limitations. Structure Sensor produced smooth and realistic meshes with low error but tended to over-smooth surfaces, potentially hiding detail. However, when combining accuracy, consistency, and visual inspection, Structure Sensor appears to offer a highly reliable and well-rounded performance. SureScan showed the highest accuracy but often lacked sharp visibility of fiducial markers and surfaces. Heges appeared sensitive to lighting, leading to surface noise in some scans. Both SureScan and 3D Scanner App occasionally duplicated markers. Kiri and Polycam showed volume distortions, with Polycam particularly affected by scaling issues despite coherent overall shapes. These findings suggest that performance depends not only on sensor quality or visual appeal but heavily on the reconstruction pipeline. Given that several apps using the same hardware delivered different quantitative and qualitative results, future improvements may be more effectively achieved through algorithmic enhancements (such as DL-based surface completion, mesh denoising, or improved marker detection) than through hardware upgrades alone. That said, a combination of software and hardware advancements may be necessary to fully overcome current limitations. If the strengths of individual approaches were integrated (e.g., Structure Sensor’s surface realism, SureScan’s accuracy, and more reliable marker handling), significantly improved results could be achieved at low cost.
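To make the notion of algorithmic post-processing more concrete, the sketch below implements one of the simplest mesh-denoising operations, uniform Laplacian smoothing, in plain NumPy. It is not the pipeline of any of the tested apps; the vertex and face arrays are assumed inputs, for example loaded from an exported OBJ or PLY file.

```python
import numpy as np

def laplacian_smooth(vertices: np.ndarray, faces: np.ndarray,
                     iterations: int = 5, lam: float = 0.5) -> np.ndarray:
    """Uniform Laplacian smoothing: move each vertex toward the mean of its
    one-ring neighbours by a factor `lam`, repeated `iterations` times."""
    n = len(vertices)
    # Build a vertex adjacency list from the triangle faces.
    neighbours = [set() for _ in range(n)]
    for a, b, c in faces:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))

    v = vertices.astype(float).copy()
    for _ in range(iterations):
        means = np.array([v[list(nb)].mean(axis=0) if nb else v[i]
                          for i, nb in enumerate(neighbours)])
        v += lam * (means - v)
    return v

# Hypothetical usage: `verts` is an (N, 3) array and `tris` an (M, 3) integer
# array read from a mesh exported by one of the scanning apps.
# smoothed = laplacian_smooth(verts, tris)
```

Stronger smoothing reduces surface noise but, as noted for Structure Sensor, risks suppressing fine detail, so the strength and number of iterations would need to be tuned against measurement accuracy.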
Other analyses revealed specific anatomical patterns influencing reconstruction accuracy. Short-range distances, particularly those concentrated around periareolar regions, consistently exhibited high accuracy and low variability across most applications. This pattern is likely attributable to the relatively simpler surface geometry and the localized area of reconstruction, aligning with previous studies suggesting higher reliability of local morphological assessments using mobile scanning technologies [
9,
10]. In contrast, longer cross-quadrant measurements, encompassing medial–lateral and superior–inferior transitions, presented significantly greater reconstruction challenges across all platforms, notably in apps with lower algorithmic robustness, such as Polycam. This underscores the critical need for enhanced reconstruction algorithms capable of accurately capturing complex breast geometries.
We also looked at how consistent each app was across repeated trials. SureScan and Structure Sensor again performed best, showing low variability between scans. Polycam, in contrast, was much less consistent, with large differences between repeated measurements. These results suggest that while some scanning apps deliver consistent results, others vary considerably across repeated measurements, which may undermine their reliability in clinical applications.
Operator-related variability was minimal. The Kruskal–Wallis test showed no significant differences between operators, suggesting that inter-operator variability had minimal impact on measurement performance across trials and applications. This is encouraging from a clinical perspective, as it means scanning can be performed reliably by different users, provided a standard protocol is followed.
The heatmap in
Figure 5 gives a detailed breakdown of errors by anatomical distances. While it may appear dense at first, it allows users to identify which application yields more reliable distance measurements for specific clinical requirements. Periareolar distances were among the most accurate and consistent across all apps, likely because they involve small, local areas with relatively simple surface geometry. Cross-quadrant distances, especially those covering medial–lateral or superior–inferior directions, were more error-prone. These results indicate that while some applications maintain consistently low variability across repeated trials, others are more affected by spatial error fluctuations depending on scan region and landmarks. While we report absolute errors directly, future work could explore fuzzy logic-based uncertainty visualization techniques to help communicate confidence levels in distance estimations, particularly in regions prone to spatial error fluctuations [
20].
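For readers who wish to reproduce this style of per-app, per-distance error visualization with their own data, a minimal Matplotlib sketch is given below; the distance labels and error values in it are illustrative placeholders rather than the results of this study.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical mean absolute errors (mm) per app (rows) and anatomical
# distance (columns); real values would come from the measurement analysis.
apps = ["SureScan", "Structure Sensor", "Heges",
        "3D Scanner App", "Kiri", "Polycam"]
distances = ["Periareolar L", "Periareolar R", "Notch-nipple",
             "Nipple-midline", "Cross-quadrant"]
errors_mm = np.array([
    [1.8, 1.9, 2.7, 3.1, 4.3],
    [1.9, 2.0, 2.9, 3.2, 4.6],
    [2.2, 2.4, 3.4, 3.8, 5.5],
    [2.8, 2.9, 4.1, 4.7, 6.9],
    [3.1, 3.3, 4.8, 5.4, 8.0],
    [9.5, 10.2, 19.8, 22.6, 34.1],
])

fig, ax = plt.subplots(figsize=(7, 4))
im = ax.imshow(errors_mm, cmap="viridis", aspect="auto")
ax.set_xticks(range(len(distances)))
ax.set_xticklabels(distances, rotation=30, ha="right")
ax.set_yticks(range(len(apps)))
ax.set_yticklabels(apps)
fig.colorbar(im, ax=ax, label="Mean absolute error (mm)")
fig.tight_layout()
plt.show()
```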
We also observed several practical challenges during scanning. Mesh quality often depended on where the scan was started, with the first scanned side appearing less distorted. Some scans showed incomplete geometry at the bottom of the breasts, likely due to difficult viewing angles and complex curvature. Full 360° scans occasionally led to alignment issues, possibly caused by cumulative drift or loop closure errors. Even when overall errors were low, we sometimes found local surface deformations or irregularities. Lighting conditions and marker visibility also influenced surface accuracy. Moreover, the colors shown in-app did not always match the exported mesh when viewed in other software, likely due to in-app color processing. This introduces a source of variability in visual evaluation, emphasizing the need to standardize the rendering environment or interpret visual assessments with caution.
Our results suggest that mobile 3D scanning can already achieve clinically acceptable accuracy. Notably, a dedicated depth sensor such as the Structure Sensor is not strictly necessary to obtain high accuracy—apps like SureScan and Heges performed comparably well using only the iPad’s built-in sensors. However, the choice of application plays a critical role, as not all apps employ equally robust reconstruction algorithms. Even among apps with statistically similar performance, qualitative differences in repeatability and visual quality still matter.
The overall similarity in reconstruction accuracy (despite differences in app design and unknown internal implementations) prompted us to investigate whether these apps might rely on a shared underlying framework, specifically Apple’s ARKit. Since none of the applications publicly disclose their technical architecture, we analyzed their .ipa files (Apple’s package format for iOS apps) to identify potential API dependencies. These findings should be interpreted with caution, as they were not confirmed by the developers and the binaries were stripped of symbolic information, meaning internal method names and class structures were obfuscated. Nevertheless, they provided insights into potential dependencies. We found that Heges, Polycam, KIRI Engine, and 3D Scanner App all link to ARKit, while SureScan does not. For Heges, the internal logging evidence strongly suggests use of ARKit’s LiDAR features such as ARSession, ARPointCloud, and ARMeshAnchor for real-time reconstruction. For Polycam and KIRI, internal strings in their logs, such as 'sceneImage' and 'meshAnchors', suggest possible conditional or runtime ARKit usage. The Structure Sensor app primarily relies on its proprietary depth hardware, with ARKit listed as an optional dependency not evidently used in our recordings.
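A minimal sketch of the kind of binary inspection described here is shown below, using only the Python standard library; the archive name is hypothetical, and the presence of these strings hints at, but does not prove, actual ARKit usage at runtime.

```python
import re
import zipfile

# Hypothetical archive name; substitute the actual .ipa under analysis.
IPA_PATH = "ExampleScanner.ipa"

# ARKit-related strings whose presence in the main executable hints at
# (but does not prove) use of ARKit reconstruction APIs.
ARKIT_MARKERS = [b"ARKit", b"ARSession", b"ARPointCloud", b"ARMeshAnchor"]

with zipfile.ZipFile(IPA_PATH) as ipa:
    names = ipa.namelist()
    # The main Mach-O executable normally sits at Payload/<App>.app/<App>.
    app_names = {m.group(1) for n in names
                 if (m := re.match(r"Payload/([^/]+)\.app/", n))}
    for app in sorted(app_names):
        exe_path = f"Payload/{app}.app/{app}"
        if exe_path not in names:
            continue
        binary = ipa.read(exe_path)
        hits = [s.decode() for s in ARKIT_MARKERS if s in binary]
        print(f"{app}: ARKit-related strings -> {hits or 'none'}")
```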
Notably, SureScan (despite not using ARKit) achieved the lowest mean error, followed closely by ARKit-linked apps such as Heges and 3D Scanner App. KIRI performed slightly worse, and Polycam, though also linked to ARKit, showed the highest error and variability. These findings suggest that while ARKit may provide a shared baseline for motion tracking and depth acquisition, its mere inclusion does not guarantee high quantitative or qualitative performance. Instead, performance appears to depend critically on how these tools are implemented and integrated into each app’s broader reconstruction pipeline. This reinforces the central role of software strategy and algorithmic design—beyond hardware and API choice—in determining overall reconstruction quality.
While our study focuses on static surface mesh reconstruction using phantom models, future clinical applications will require handling patient-specific challenges, such as soft tissue deformation, anatomical variability, and motion during scanning. DL-based reconstruction frameworks offer promising solutions, as they can learn statistical shape priors from large datasets and adapt to local anatomical variability, enabling more complete and anatomically coherent reconstructions even with missing or noisy input data [
21,
22]. In particular, combining spatial and temporal learning (which process both geometric structure and temporal evolution) has demonstrated strong performance in reconstructing dynamic anatomical textures. For example, STINR-MR [
23] models 3D motion in cine-MRI, combining spatial and temporal implicit neural representations, while TransforMesh [
24] uses mesh transformers to analyze long-term anatomical changes. Additionally, MedNeRF [
25], a medically adapted Neural Radiance Field, enhances soft tissue reconstruction from limited or imperfect visual data.
Beyond improving reconstruction itself, biomechanical simulation methods such as finite element modeling (FEM) could serve as a complementary tool to validate reconstructed meshes. Multiscale FEM has already been used to simulate breast tissue deformation during surgery and wound healing, demonstrating realistic behavior when subjected to physiological loads [
26]. Incorporating FEM-based validation in the pipeline would provide an additional layer of structural credibility—especially important for clinical applications like surgical planning, where mesh behavior under load is as critical as its shape.
Additionally, AI-based closed-loop scanning guidance systems could help reduce operator dependency by providing real-time feedback during acquisition (highlighting incomplete regions, poor surface coverage, or suboptimal scan angles). Similar AI techniques have already shown benefits in clinical imaging: for instance, AI-assisted ultrasound systems guide operators to capture optimal views and reduce variability between users [
27]. Finally, fuzzy logic systems could potentially support uncertainty visualization by providing interpretable estimations of local reconstruction confidence, though their application to mesh error quantification remains exploratory. Although our pipeline focuses on surface mesh reconstruction, future multimodal frameworks could integrate non-optical sensing techniques such as impedance tomography or electromagnetic imaging, which have demonstrated promise in detecting internal breast tissue variations and deeper structural discontinuities [
28,
29].
Moreover, the field would benefit from open benchmarking initiatives that include shared datasets and standardized reporting of error distributions to enhance reproducibility and transparent comparison. Importantly, the ability of simple, smartphone-based tools to produce accurate 3D scans opens the door to more accessible and affordable workflows, especially in settings without access to high-end commercial systems. This also creates new opportunities for building datasets to train DL models for applications such as simulating postoperative breast morphology after lumpectomy or mastectomy. While these applications remain exploratory, they highlight the need for ethically sourced, anatomically diverse 3D data and robust validation before clinical use. Mobile scanning could eventually support not just documentation, but also personalized surgical planning and predictive modeling.
6. Conclusions
This study provides the first systematic evaluation of six commercially available mobile 3D scanning applications (SureScan, Heges, 3D Scanner App, Polycam, Kiri Engine, and Structure Sensor) using a controlled silicone phantom model to assess their accuracy, repeatability, and practical limitations for breast reconstruction.
Key findings: (a) SureScan demonstrated the lowest mean absolute error (2.9 mm), followed by Structure Sensor (3.0 mm), Heges (3.6 mm), 3D Scanner App (4.4 mm), Kiri (5.0 mm), and Polycam (21.4 mm). Despite these differences, statistical analysis revealed no significant differences in accuracy among the top five apps, indicating that high performance is achievable using either built-in mobile cameras or external sensors—emphasizing the importance of software and reconstruction algorithms over hardware alone. (b) Repeatability was generally high, with low variability across repeated scans and minimal impact of operator differences when standard scanning protocols were followed. (c) Visual and practical differences still matter. Even among statistically similar apps, variations in mesh smoothness and marker and surface clarity were observed, which may affect clinical usability. (d) Future work should validate these findings in clinical settings, including real-patient testing, volume and overlay accuracy assessments, and the application of AI-based methods to improve reconstruction robustness.
In summary, mobile 3D scanning apps—especially SureScan and Structure Sensor—can already deliver clinically acceptable accuracy. However, variability in reconstruction quality highlights the importance of careful app selection and further clinical validation.