1. Introduction
3D urban models, as a form of earth observation data, are typically composed of triangular meshes and texture maps, providing detailed representations of urban environments and serving as essential data assets in digital economy applications, smart city management, and spatial information services [1,2,3,4,5]. However, during real-world data acquisition and modeling, texture maps inevitably capture sensitive information, including personal identities, confidential facilities, and critical infrastructure details [6]. Unauthorized disclosure of such information may lead to severe privacy violations and even pose threats to public and national security [7,8,9,10]. Consequently, increasingly strict regulations on data sharing have significantly limited the utilization of large-scale 3D urban datasets, highlighting the urgent need for effective texture privacy-preserving techniques.
Texture privacy preservation aims to eliminate the identifiability of sensitive content through detection and modification while preserving the usability of the original data [11,12,13,14]. Early approaches relied heavily on manual editing, which is inefficient and impractical for large-scale datasets. With the advancement of computer vision, automated methods based on object detection and image inpainting have been widely adopted [15,16,17]. However, directly applying these techniques to 3D urban models remains challenging due to their unique structural characteristics.
3D urban models generated through oblique photogrammetry consist of massive irregular triangular meshes and are typically organized in hierarchical Level of Detail (LOD) structures, where different levels correspond to varying spatial resolutions and texture granularities. During reconstruction, texture images are partitioned into numerous independent texture blocks, resulting in highly fragmented texture representations [18,19]. As illustrated in Figure 1, while simple geometric surfaces can be mapped to continuous texture domains, textures in 3D urban models exhibit discontinuous spatial distributions. Moreover, as shown in Figure 2, texture characteristics vary significantly across LOD levels: higher levels provide finer details but exhibit more severe fragmentation, whereas lower levels preserve more complete contextual information but suffer from insufficient resolution. Since texture restoration relies heavily on contextual information, these characteristics introduce significant challenges for achieving visually coherent privacy-preserving results.
Recent studies on privacy preservation in 3D urban models can be broadly categorized into three groups: texture camouflage-based methods, multi-view image-based methods, and texture map-based methods.
Texture camouflage-based methods aim to reduce the visual saliency of sensitive targets by blending them into surrounding backgrounds under specific viewpoints or rendering conditions [20,21,22,23,24]. For example, Guo et al. proposed GANmouflage, which learns texture distributions in continuous 3D space using neural implicit functions to achieve viewpoint-dependent visual camouflage [22]. However, these methods primarily operate at the rendering level and do not irreversibly remove or replace sensitive textures at the data level, leaving potential risks of sensitive information recovery.
Multi-view image-based methods perform sensitive content detection and modification on images captured during the data acquisition stage, such as UAV or oblique photogrammetry, prior to 3D reconstruction [25,26,27,28,29]. For instance, Li et al. integrated semantic segmentation networks with generative adversarial networks to automatically detect and repair privacy-sensitive regions in multi-view images [26]. Although these approaches effectively prevent sensitive information from appearing in reconstructed models, they require access to the original imagery and therefore cannot be directly applied to existing 3D urban models, limiting their practical applicability.
Texture map-based methods operate directly on 2D texture maps exported from 3D models and offer strong applicability and data-level security [30,31,32,33]. For instance, Xu et al. combined YOLOv5s with PatchMatch to perform sensitive target detection and texture replacement, demonstrating the feasibility of this approach [32]. However, these methods process fragmented texture patches independently and ignore spatial relationships in the 3D scene, often resulting in noticeable visual discontinuities and inconsistent restoration across LODs.
In summary, texture camouflage-based methods provide limited data-level security, while multi-view image-based methods suffer from restricted applicability. Although texture map-based methods are widely applicable, they struggle to maintain spatial continuity and inter-level consistency due to fragmented texture representations. Therefore, constructing continuous texture representations from fragmented textures and ensuring coherent texture restoration across multiple LODs remains a critical challenge.
To address these issues, this paper proposes a scene-context-aware texture privacy-preserving method for 3D urban models. The proposed method establishes a mapping between 3D scene geometry and continuous texture representations, enabling scene-context-guided and spatially coherent texture reconstruction. Specifically, a fragmentation-aware detection strategy is introduced to select appropriate texture levels, a scene-aware reconstruction method is designed to achieve context-consistent restoration, and a multi-level mapping mechanism is employed to propagate reconstructed textures across different LODs. Experimental results demonstrate that the proposed method effectively removes sensitive content while significantly improving visual continuity compared with existing approaches.
The remainder of this paper is organized as follows. Section 2 presents the proposed method. Section 3 reports experimental results and analysis, followed by discussion and conclusions in Section 4 and Section 5.
2. Materials and Methods
2.1. Overview of the Proposed Method
3D urban models are typically organized as hierarchical structures composed of multiple nodes, where each node stores both geometric information and associated texture maps. Texture privacy preservation for such models involves two fundamental tasks: sensitive content localization and sensitive content removal. Existing methods generally operate on 2D texture maps within individual nodes, performing detection and inpainting independently. However, this paradigm neglects the spatial relationships of textures in 3D scenes, often resulting in degraded visual continuity and inconsistent outcomes across different LODs.
To overcome these limitations, the proposed method departs from conventional 2D texture-based processing and introduces a scene-aware framework guided by 3D geometric constraints. Specifically, a fragmentation-aware strategy is first employed to identify suitable detection levels, enabling sensitive object detection to be performed under a balanced condition between texture resolution and structural integrity. Based on the detected regions, planar fitting is conducted on the corresponding triangular surfaces to estimate local geometric structures. An orthographic projection coordinate system is then constructed to establish an accurate mapping between 2D texture regions and their corresponding 3D scene surfaces, allowing texture reconstruction to be guided by continuous and spatially coherent contextual information.
The overall framework of the proposed method is illustrated in Figure 3 and consists of three key components:
- (1) Fragmentation-aware sensitive object detection module, which adaptively selects appropriate texture levels to ensure reliable detection under fragmented texture representations, thereby providing accurate inputs for subsequent processing;
- (2) Scene-aware texture reconstruction module, which leverages 3D geometric constraints to establish continuous contextual representations and enables high-quality texture restoration;
- (3) Multi-level stable mapping module, which propagates the reconstructed textures back to the original 3D urban model, ensuring global consistency across different LODs.
2.2. Fragmentation-Aware Detection Strategy
Sensitive object detection is a prerequisite for texture privacy preservation, aiming to accurately localize privacy-related targets within complex multi-level texture data. In this study, sensitive objects refer to scene elements containing privacy-related information, such as license plates and textual signs. A pre-trained object detection model is adopted as the front-end component to provide initial localization results. The detector is further adapted to the target domain through lightweight fine-tuning on a combination of publicly available datasets and domain-specific samples collected from photogrammetric 3D urban models. These samples are manually annotated to include privacy-related categories relevant to urban scenes.
Existing methods typically perform detection independently on texture maps at each node or level. Such exhaustive processing not only introduces substantial computational overhead, but also leads to unreliable detection results, including missed detections at low-resolution levels and false detections at highly fragmented levels. To address these issues, a fragmentation-aware detection strategy is proposed to adaptively select a subset of appropriate texture levels for detection. By restricting detection to selected levels rather than all nodes, the proposed strategy effectively reduces computational cost while ensuring a balance between texture integrity and spatial resolution.
To quantify texture fragmentation, a fragmentation indicator is defined based on the proportion of non-informative pixels in texture images. During texture unwrapping, unmapped regions are typically filled with uniform black pixels, which can be regarded as non-informative areas. Let the total number of LOD levels in a 3D urban model be denoted as $L$. The model hierarchy is traversed from the root node. For each node $i$ at level $l$, its corresponding texture image is denoted as $T_{l,i}$. The fragmentation degree at level $l$ is defined as:

$$FD_l = \frac{1}{N_l} \sum_{i=1}^{N_l} \frac{P_{\mathrm{empty}}(T_{l,i})}{P_{\mathrm{total}}(T_{l,i})}$$

where $N_l$ is the number of nodes at level $l$, $P_{\mathrm{empty}}(T_{l,i})$ represents the number of non-informative pixels, and $P_{\mathrm{total}}(T_{l,i})$ denotes the total number of pixels in $T_{l,i}$. A higher value of $FD_l$ indicates a larger proportion of non-informative regions, higher texture fragmentation, and consequently a higher risk of missed detections.
Based on the fragmentation measure, detection levels are adaptively determined. Starting from lower levels, levels with $FD_l \le \gamma$ are selected to form the detection level set $S_d$, where $\gamma$ is a predefined fragmentation threshold. This selection process ensures that detection is performed only on texture images with sufficient contextual continuity, while avoiding highly fragmented or low-resolution levels. As a result, redundant detection across all nodes is avoided, further improving computational efficiency.
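For concreteness, the following sketch illustrates one possible implementation of the fragmentation measure and the level-selection rule; the data layout, the black-pixel test, and the function names are illustrative assumptions rather than the exact implementation used in this study.

```python
import numpy as np

def fragmentation_degree(textures):
    """Mean fraction of non-informative (uniformly black) pixels over all
    texture images of one LOD level."""
    ratios = []
    for img in textures:  # img: (H, W, 3) uint8 array
        empty = np.all(img == 0, axis=-1).sum()  # unmapped regions are black-filled
        ratios.append(empty / (img.shape[0] * img.shape[1]))
    return float(np.mean(ratios))

def select_detection_levels(levels, gamma=0.2):
    """Select the detection level set S_d = {l : FD_l <= gamma}.

    `levels` maps a level index to the list of texture images of its nodes
    (hypothetical data layout)."""
    return [l for l, texs in sorted(levels.items())
            if fragmentation_degree(texs) <= gamma]
```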
Sensitive object detection is then conducted on texture images corresponding to levels in $S_d$ using a pre-trained YOLOv11 model [34]. The model outputs 2D bounding boxes, confidence scores, and object categories. Valid detections are filtered based on a confidence threshold and collected into the detection set $D$:

$$D = \left\{ \left( x_1, y_1, x_2, y_2, s, c \right) \;\middle|\; s \ge \tau_s \right\}$$

where $(x_1, y_1, x_2, y_2)$ denotes the pixel coordinates of sensitive targets, $s$ is the confidence score, $c$ is the object category, and $\tau_s$ is the confidence threshold.
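As an illustration, detection with the Ultralytics implementation of YOLO11 might look like the following; the weight file, the confidence threshold, and the result handling are assumptions, not the authors' exact configuration.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # placeholder weights; a fine-tuned model would be loaded here

def detect_sensitive(texture_img, conf_thresh=0.5):
    """Detect sensitive targets on one texture image and keep confident boxes."""
    result = model(texture_img, verbose=False)[0]
    keep = result.boxes.conf >= conf_thresh
    return (result.boxes.xyxy[keep],   # (x1, y1, x2, y2) pixel coordinates
            result.boxes.conf[keep],   # confidence scores
            result.boxes.cls[keep])    # object categories
```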
To establish the connection between 2D detection results and the 3D model, the detected bounding boxes are first mapped to UV space:

$$u = \frac{x}{W}, \qquad v = \frac{y}{H}$$

where $W$ and $H$ denote the width and height of the texture image, respectively, and $(u, v)$ represents the UV coordinates of sensitive targets.
Subsequently, triangular faces whose UV regions overlap with the detected regions are identified to obtain candidate sensitive triangles:

$$F_s = \left\{ f_j \;\middle|\; \mathrm{IoU}\!\left( U(f_j), B_{uv} \right) \ge \tau_o \right\}$$

where $f_j$ denotes a triangular face in 3D space, $U(f_j)$ is its corresponding UV region, $B_{uv}$ is the detected region in UV space, $\mathrm{IoU}(\cdot)$ denotes the intersection-over-union metric, and $\tau_o$ is the overlap threshold.
Finally, a 3D bounding box is computed for the triangle set, and the detection results are updated to form $D'$, which includes the pixel coordinates, UV coordinates, and 3D spatial coordinates of sensitive targets together with their associated triangle sets.
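A minimal sketch of this 2D-to-3D association, assuming axis-aligned rectangles in UV space for both the detection box and each triangle's UV footprint; measuring overlap between these rectangles is an illustrative simplification of the IoU criterion above.

```python
import numpy as np

def bbox_to_uv(box, W, H):
    """Normalize a pixel-space box (x1, y1, x2, y2) to UV space."""
    x1, y1, x2, y2 = box
    return (x1 / W, y1 / H, x2 / W, y2 / H)

def rect_iou(a, b):
    """IoU of two axis-aligned rectangles given as (u1, v1, u2, v2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def candidate_sensitive_triangles(triangles_uv, det_uv, tau_o=0.1):
    """Select triangles whose UV bounding rectangle overlaps the detection."""
    selected = []
    for j, tri in enumerate(triangles_uv):  # tri: (3, 2) array of UV vertices
        rect = (tri[:, 0].min(), tri[:, 1].min(),
                tri[:, 0].max(), tri[:, 1].max())
        if rect_iou(rect, det_uv) >= tau_o:
            selected.append(j)
    return selected
```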
This process produces a unified set of sensitive regions with consistent spatial localization, providing reliable inputs for subsequent scene-aware texture reconstruction.
2.3. Scene-Aware Texture Reconstruction
Scene-aware texture reconstruction is the core component of the proposed method, designed to overcome the limitations of fragmented texture maps by introducing continuous scene-level contextual information. Unlike conventional methods that perform independent inpainting on each texture map, the proposed approach reconstructs each sensitive target only once in a unified scene representation. This strategy avoids redundant processing across multiple texture fragments and significantly reduces computational cost while improving reconstruction consistency.
To generate high-quality scene images for texture restoration, two key requirements must be satisfied: (1) the sensitive region should be fully observable to avoid occlusion, and (2) sufficient spatial resolution should be preserved to prevent the loss of fine-grained texture details during projection.
To meet these requirements, an orthographic projection coordinate system is constructed based on the geometric characteristics of the sensitive region, as illustrated in Figure 4. Specifically, while the detection stage is performed on selected levels (Section 2.2), the scene rendering stage preferentially accesses texture data from the highest available level of detail. This design decouples detection efficiency from reconstruction quality, enabling the method to achieve both reduced computational cost and high-resolution texture reconstruction.
Starting from the detection results in Section 2.2, the 2D sensitive object mask is first mapped into 3D space to obtain the corresponding spatial region $R_s$. Based on the extracted sensitive triangle set $F_s$, a local planar approximation is constructed to describe the geometric structure of the region. Specifically, a plane $\Pi$ is fitted to the triangle vertices using a least-squares method, and the unit normal vector $\mathbf{n}$ is derived to represent the local surface orientation.
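The plane fit can be realized with a standard SVD-based least-squares estimate, as sketched below; the function interface is an assumption.

```python
import numpy as np

def fit_plane(vertices):
    """Least-squares plane fit to the stacked vertices of the sensitive
    triangle set. Returns the centroid and the unit normal vector, taken as
    the direction of least variance (smallest singular value)."""
    pts = np.asarray(vertices, dtype=float)   # (N, 3)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)  # rows of vt: principal directions
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```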
Based on this geometric representation, a virtual orthographic camera is defined. The center of the 3D bounding box $B_{3D}$ of the sensitive region is selected as the reference point $P_c$, ensuring that the camera is oriented toward the target. To obtain a fronto-parallel observation and minimize geometric distortion, the camera is positioned along the normal direction of the fitted plane $\Pi$:

$$C = P_c + d \cdot \mathbf{n}$$

where $C$ denotes the camera position and $d$ is an adaptive offset distance along the normal direction. In this study, $d$ is determined proportionally to the spatial extent of the bounding box, defined as:

$$d = \kappa \cdot \ell(B_{3D})$$

where $\ell(B_{3D})$ denotes the spatial extent (diagonal length) of $B_{3D}$ and $\kappa$ is an empirical scaling factor which is set to 1.5. This design ensures that the sensitive region is fully visible within the camera view while avoiding excessive scale distortion.
Since orthographic projection eliminates perspective effects, the viewing volume is defined as a rectangular window centered at the reference point $P_c$. Its size is determined based on the spatial extent of the sensitive region with an additional margin:

$$W_v = \lambda \cdot w_s, \qquad H_v = \lambda \cdot l_s$$

where $\lambda$ is a scale factor which is set to 1.5, and $w_s$ and $l_s$ denote the width and length of the sensitive target within $B_{3D}$, respectively. This guarantees that the projected image fully covers both the target and its surrounding context.
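The camera construction can be summarized as below, with κ = λ = 1.5 as stated above; reading the target's width and length from the box extents in a plane-aligned frame, and the returned parameter layout, are illustrative assumptions.

```python
import numpy as np

def orthographic_camera(bbox_min, bbox_max, normal, kappa=1.5, lam=1.5):
    """Place a virtual orthographic camera fronto-parallel to the fitted plane.

    bbox_min, bbox_max: opposite corners of the sensitive region's 3D box,
    assumed expressed in a frame aligned with the fitted plane."""
    p_ref = 0.5 * (bbox_min + bbox_max)            # reference point: box center
    extent = np.linalg.norm(bbox_max - bbox_min)   # spatial extent (diagonal)
    cam_pos = p_ref + kappa * extent * normal      # C = P_c + d * n, d = kappa * extent
    w_s, l_s = (bbox_max - bbox_min)[:2]           # in-plane width and length
    window = lam * np.array([w_s, l_s])            # margin-enlarged ortho window
    return cam_pos, -normal, window                # position, view direction, window size
```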
Under the constructed orthographic coordinate system, scene rendering is performed to obtain a continuous image $I_s$. During this process, texture data from the highest available level of detail are preferentially utilized to ensure that the rendered image preserves maximal spatial resolution and fine-grained texture details.
Compared with directly processing fragmented 2D texture maps, the rendered scene image $I_s$ provides a compact, continuous, and semantically coherent representation of the sensitive region and its surroundings. This significantly reduces redundant and non-informative content while providing stronger contextual constraints for subsequent reconstruction. The rendered scene image is then fed into BrushNet [35], a pre-trained inpainting network released in 2024, to perform texture restoration, yielding the final inpainted result $\hat{I}_s$.
Overall, this design transforms fragmented texture representations into a unified scene-based representation, enabling efficient and high-quality texture reconstruction while maintaining both spatial continuity and computational efficiency.
2.4. Multi-Level Stable Mapping
Due to inherent differences in texture resolution and fragmentation across multiple levels of a 3D urban model, independently performing texture replacement at each level often leads to noticeable inter-level inconsistencies.
To address this issue, the entire model is formulated as a hierarchical texture pyramid, and a multi-level stable mapping strategy is proposed to ensure globally consistent texture reconstruction across all levels. Unlike conventional approaches that perform independent inpainting at each level, the proposed method conducts texture reconstruction only once at the selected detection levels (Section 2.2) using scene-aware context (Section 2.3), and subsequently propagates the reconstructed results to other levels through a geometry-guided seamless re-projection mechanism. This design effectively avoids redundant inpainting operations and improves computational efficiency.
A key characteristic of the proposed strategy is that texture updating is performed at the pixel level rather than the triangle level, enabling fine-grained and seamless integration of reconstructed textures.
Specifically, for each node, all associated triangular faces are first traversed to identify sensitive triangles based on their spatial relationship with the detected sensitive regions:

$$\mathrm{IoV}\!\left( V(f_j), B_{3D} \right) \ge \tau_v$$

where $V(f_j)$ represents the spatial coordinates of a triangle, $B_{3D}$ denotes the sensitive target bounding box, $\mathrm{IoV}(\cdot)$ is the volume intersection-over-volume metric, and $\tau_v$ is the overlap threshold. If the condition is satisfied, the triangle is considered to contain sensitive texture information and is selected for texture replacement.
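One plausible reading of this criterion, sketched below, represents each triangle by its axis-aligned 3D bounding box and measures the intersection volume relative to that box's own volume; this interpretation and the threshold value are assumptions.

```python
import numpy as np

def overlap_ratio(tri, bbox_min, bbox_max):
    """Intersection volume between a triangle's AABB and the sensitive 3D box,
    normalized by the triangle box's own volume."""
    t_min, t_max = tri.min(axis=0), tri.max(axis=0)  # tri: (3, 3) vertex array
    inter = np.maximum(0.0, np.minimum(t_max, bbox_max)
                            - np.maximum(t_min, bbox_min))
    vol_tri = np.prod(np.maximum(t_max - t_min, 1e-9))  # guard near-flat boxes
    return float(np.prod(inter) / vol_tri)

def select_sensitive_triangles(triangles, bbox_min, bbox_max, tau_v=0.5):
    return [j for j, tri in enumerate(triangles)
            if overlap_ratio(np.asarray(tri, float), bbox_min, bbox_max) >= tau_v]
```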
For the selected triangles, their UV coordinates are mapped to pixel coordinates in the scene image using the orthographic camera parameters established in Section 2.3. Based on the corresponding scene mask, pixels that fall within sensitive regions are further identified. Only those pixels satisfying the mask constraint are updated, ensuring that texture replacement is restricted to precise sensitive areas rather than entire triangles. The texture update is then performed as:

$$T(u, v) \leftarrow \hat{I}_s\!\left( \pi(u, v) \right)$$

where $T$ denotes the texture map of the current node, $T(u, v)$ represents the pixel value at UV coordinates $(u, v)$, and $\hat{I}_s(\pi(u, v))$ is the corresponding pixel value in the reconstructed scene image, with $\pi(\cdot)$ the UV-to-scene-pixel projection.
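The pixel-level update loop might be implemented as follows; `project_to_scene`, which maps a texel to its location in the orthographic scene image using the camera parameters of Section 2.3, is an assumed helper.

```python
def update_node_texture(texture, sensitive_texels, scene_img, scene_mask,
                        project_to_scene):
    """Pixel-level replacement for one node's texture map (in place).

    sensitive_texels: (row, col) texel coordinates covered by the selected
    sensitive triangles; scene_mask marks sensitive pixels in the inpainted
    scene image scene_img."""
    for r, c in sensitive_texels:
        sr, sc = project_to_scene(r, c)
        if scene_mask[sr, sc]:                 # mask constraint: update only true
            texture[r, c] = scene_img[sr, sc]  # sensitive pixels, not whole triangles
```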
Through the joint constraints of triangle selection and pixel-wise mask filtering, the reconstructed textures are seamlessly re-projected onto the original 3D surfaces. Compared with conventional triangle-level replacement strategies, this approach effectively eliminates boundary artifacts and avoids visual discontinuities at region transitions.
Finally, the reconstructed textures are consistently propagated across all relevant levels in the hierarchical structure, ensuring that identical sensitive regions are uniformly updated throughout the model. This multi-level mapping strategy not only maintains global texture coherence but also reduces redundant computations, thereby improving both visual consistency and computational efficiency.
4. Discussion
4.1. Impact of the Fragmentation Degree Threshold on Accuracy and Efficiency
The fragmentation degree threshold directly determines the adaptive partitioning of detection layers. This subsection analyzes the impact of different threshold settings on the overall performance of the proposed method, with the objective of identifying a reasonable parameter range for practical applications. For clarity, the texture fragmentation degree is denoted as FD and its threshold as γ.
To conduct a statistical analysis, texture images corresponding to a total of 13,083 nodes from the experimental dataset were used as inputs for fragmentation degree computation. Outliers were removed according to Equation (16), after which the distribution of valid FD values was statistically analyzed and visualized using box plots. The corresponding results are presented in Table 11 and Figure 13.

$$FD \in \left[ Q_1 - 1.5 \cdot IQR,\; Q_3 + 1.5 \cdot IQR \right]$$

where $Q_1$ and $Q_3$ represent the first and third quartiles of the FD distribution, respectively, while $IQR = Q_3 - Q_1$ denotes the interquartile range.
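A minimal sketch of this filtering step, assuming the per-node FD values are collected in a NumPy array:

```python
import numpy as np

def remove_fd_outliers(fd_values):
    """Keep FD values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Equation (16))."""
    q1, q3 = np.percentile(fd_values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return fd_values[(fd_values >= lo) & (fd_values <= hi)]
```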
As shown in Figure 13, across different hierarchical levels, the minimum fragmentation degree consistently remains at a relatively low level, whereas both the mean and maximum FD values exhibit a clear increasing trend with increasing hierarchy depth. These statistics indicate that when the mean FD is adopted as the criterion for measuring texture fragmentation at each level, an increasing γ causes progressively more high-level texture nodes to be classified as detection layers and thus involved in sensitive object detection. For example, when γ is set to 0.1, only levels up to level 20 are included in the detection process; when γ reaches 0.4, all hierarchical levels in the experimental dataset are incorporated into the detection scope. From a theoretical perspective, this demonstrates that γ not only directly affects the completeness and accuracy of sensitive object detection but also has a significant impact on the computational cost of the subsequent privacy-preserving process.
To quantitatively validate these effects, sensitive object detection accuracy and efficiency were evaluated under different γ settings on the experimental dataset. The corresponding results are reported in Table 12, Table 13 and Table 14 and illustrated in Figure 14.
As shown in Figure 14, when γ increases from 0.1 to 0.2, a substantial improvement in detection performance can be observed. Specifically, the recall increases from approximately 0.5 to 0.9, while the IoU improves from about 0.6 to 0.9. However, when γ is further increased beyond 0.2, detection accuracy improves only marginally, exhibiting a clear diminishing-returns effect.
In contrast, efficiency shows a negative correlation with γ. As the threshold increases, the number of hierarchical levels involved in detection grows accordingly, requiring the object detection network to be invoked repeatedly. This results in a continuous increase in total processing time.
In summary, although increasing γ can enhance sensitive object detection accuracy to a certain extent, privacy-preserving efficiency remains a critical factor for practical applicability. Based on the experimental results, γ = 0.2 was selected under the experimental conditions, as it provides high detection accuracy while maintaining a relatively desirable level of efficiency. It should be noted that this threshold is still an empirical choice derived from a limited dataset. Future work will focus on developing more systematic, data-driven adaptive parameter selection mechanisms to further improve the robustness of γ determination.
4.2. Robustness Under Different Texture Organization Conditions
Although Section 4.1 has validated the effectiveness of the fragmentation degree threshold γ on typical OSGB datasets, in practical applications the texture organization of 3D urban models, such as texture atlas partitioning and fragmentation levels, may vary significantly due to differences in photogrammetric reconstruction workflows and parameter settings. These variations may affect the statistical distribution of texture fragmentation, thereby influencing the applicability of the threshold γ. Therefore, it is necessary to evaluate the adaptability and robustness of the proposed method under different texture organization conditions.
To this end, while keeping the geometric structure and semantic information unchanged, we construct four types of datasets with different texture organization characteristics by adjusting texture-related parameters, as summarized in Table 15.
Based on these datasets, a unified FD threshold of γ = 0.2 is applied for sensitive texture target detection. Recall is adopted as the evaluation metric, and the experimental results are presented in Table 16.
As shown in Table 16, under different texture organization conditions, the use of γ = 0.2 for sensitive target detection achieves stable performance, maintaining consistently high recall values. Specifically, when the texture structure becomes more regular (Datatype = 2), the overall FD distribution shifts downward. Under a fixed threshold, more layers are included in the detection scope, ensuring sufficient coverage of sensitive targets and thus maintaining detection performance. In contrast, when the texture fragmentation level increases significantly (Datatype = 3), the FD distribution shifts upward, resulting in fewer layers being selected for detection, which may slightly reduce coverage. Therefore, in such scenarios that deviate from typical texture organization patterns, it is necessary to appropriately increase the threshold γ according to application requirements to maintain detection effectiveness. When only the texture storage format is changed (Datatype = 4), the FD distribution is unaffected and the detection performance remains unchanged, indicating that the proposed method is insensitive to texture storage formats.
In summary, γ = 0.2 can be regarded as a generally effective empirical parameter with good generalization capability for texture privacy-preserving tasks in different types of 3D urban models. Meanwhile, in practical applications, the parameter can be further adjusted according to specific platform characteristics and texture organization patterns, especially in highly fragmented scenarios, to enhance the adaptability and robustness of the method.
4.3. Challenges and Future Research Directions
Although the proposed privacy-preserving texture processing method for 3D urban models has been validated through both theoretical analysis and experimental evaluation in terms of security and usability, there remains room for further improvement.
From a methodological perspective, the current privacy-preserving strategy still relies on sensitive object detection paradigms operating in 2D texture image space. Such approaches implicitly assume that sensitive objects appear as continuous, complete regions with stable visual characteristics in image space. However, as analyzed in Section 2.1, textures in 3D urban models are often fragmented into multiple spatially discontinuous patches and stored in a distributed manner. This structural fragmentation makes it difficult for conventional convolutional neural networks to perceive sensitive objects holistically, inevitably leading to missed detections or incomplete detection results in practical applications.
Furthermore, certain types of sensitive objects in 3D scenes exhibit strong correlations with geometric structures. These objects often lack stable and distinguishable visual patterns in two-dimensional texture space, and their discriminative characteristics rely more heavily on spatial configuration, geometric relationships, or high-level semantic information. Such requirements exceed the representational capability of traditional object detection networks. This limitation constitutes a fundamental challenge for current methods, particularly in accurately identifying three-dimensional sensitive targets within texture space and supporting geometry–texture collaborative privacy-preserving processing.
Based on the above analysis, a key direction for future research lies in introducing more global scene-aware mechanisms into sensitive object detection and privacy-preserving workflows, thereby reducing reliance on purely two-dimensional texture representations. In addition, from a three-dimensional perspective, exploring collaborative privacy-preserving methods that integrate semantic information, geometric structure, and texture features is expected to further advance privacy protection technologies for 3D geospatial data.
5. Conclusions
Insufficient visual continuity in repaired textures, caused by fragmented texture storage in 3D urban models, remains a critical challenge in privacy-preserving processing. By considering the continuity of texture representation during 3D scene rendering, this study proposes a scene-context-aware texture privacy-preserving method. Specifically, a mapping between the 3D scene structure and continuous 2D texture representations is established, allowing fragmented texture patches to be processed under global scene constraints. This design effectively alleviates the visual discontinuity problem commonly observed in existing texture map-based privacy-preserving methods. In addition, an adaptive detection-layer partitioning strategy based on the degree of texture fragmentation is introduced to improve the completeness of sensitive object detection. By dynamically adjusting detection regions according to texture distribution characteristics, the proposed mechanism enhances the reliability of sensitive content identification.
Experimental results demonstrate that the proposed method significantly improves both sensitive object detection accuracy and the visual continuity of repaired textures compared with existing approaches. In particular, the proposed method improves post-preservation texture continuity by approximately 56.9–79.5% compared with representative texture map-based privacy-preserving methods. These results indicate that incorporating scene-aware texture representation can effectively bridge the gap between strong data-level privacy protection and high visual fidelity in 3D urban models.
Overall, the proposed framework provides a new perspective for privacy-preserving processing of large-scale 3D urban models and shows strong potential for practical deployment in smart city data sharing, digital twin systems, and geospatial data governance. Future work will explore the integration of geometric semantics and neural rendering techniques to further improve privacy-preserving performance in complex 3D environments.