Article

Depth Upsampling with Local and Nonlocal Models Using Adaptive Bandwidth

by Niloufar Salehi Dastjerdi and M. Omair Ahmad *
Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
* Author to whom correspondence should be addressed.
Electronics 2025, 14(8), 1671; https://doi.org/10.3390/electronics14081671
Submission received: 4 March 2025 / Revised: 5 April 2025 / Accepted: 16 April 2025 / Published: 20 April 2025
(This article belongs to the Special Issue Image and Video Processing for Emerging Multimedia Technology)

Abstract

The rapid advancement of 3D imaging technology and depth cameras has made depth data more accessible for applications such as virtual reality and autonomous driving. However, depth maps typically suffer from lower resolution and quality compared to color images due to sensor limitations. This paper introduces an improved approach to guided depth map super-resolution (GDSR) that effectively addresses key challenges, including the suppression of texture copying artifacts and the preservation of depth discontinuities. The proposed method integrates both local and nonlocal models within a structured framework, incorporating an adaptive bandwidth mechanism that dynamically adjusts guidance weights. Instead of relying on fixed parameters, this mechanism utilizes a distance map to evaluate patch similarity, leading to enhanced depth recovery. The local model ensures spatial smoothness by leveraging neighboring depth information, preserving fine details within small regions. On the other hand, the nonlocal model identifies similarities across distant areas, improving the handling of repetitive patterns and maintaining depth discontinuities. By combining these models, the proposed approach achieves more accurate depth upsampling with high-quality depth reconstruction. Experimental results, conducted on several datasets and evaluated using various objective metrics, demonstrate the effectiveness of the proposed method through both quantitative and qualitative assessments. The approach consistently delivers improved performance over existing techniques, particularly in preserving structural details and visual clarity. An ablation study further confirms the individual contributions of key components within the framework. These results collectively support the conclusion that the method is not only robust and accurate but also adaptable to a range of real-world scenarios, offering a practical advancement over current state-of-the-art solutions.

1. Introduction

The rapid development of 3D imaging technologies and the emergence of depth cameras such as Time-of-Flight (ToF) cameras and the Microsoft Kinect have made the depth information of scenes easy to acquire. Depth information plays a fundamental role in many computer vision and image processing applications, such as virtual reality, object detection, autonomous driving [1,2], and 3D reconstruction [3,4]. However, due to the limitations of current depth sensing techniques, captured depth maps usually exhibit various quality degradations in comparison to the color images acquired by high-resolution color sensors. Depth maps captured by Kinect cameras typically suffer from missing structural information, whereas ToF depth maps often suffer from noise and have much lower resolution than RGB color images. Consequently, low-quality captured depth maps can greatly degrade the performance of algorithms in image and video processing applications. It is therefore essential to develop effective and efficient depth restoration algorithms that obtain a high-resolution (HR) depth map from a low-resolution (LR) input. Techniques for enhancing depth map resolution are categorized as guided when they use a corresponding high-resolution RGB image to assist in the process. Many approaches leverage this high-resolution color image to guide the resolution enhancement of a low-resolution depth map; these methods are known as color-guided depth recovery or guided depth map super-resolution (GDSR) [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Compared to general approaches for obtaining high-resolution depth maps, GDSR methods have two main characteristics. First, edges play an important role in depth maps: depth maps generally contain smooth areas separated by sharp boundaries, a structure known as piece-wise smooth. Second, due to the ill-posed nature of the depth restoration problem, additional information is required to achieve satisfactory performance. Consequently, in GDSR approaches, high-resolution color images can be used as additional information to guide the depth map reconstruction process.

Some approaches [5,15,18] rely solely on local depth information for interpolating the depth map. In the edge-guided method (EG) [15], a single depth image is upscaled using a high-resolution edge map obtained from Markov random field optimization, but the method is complex and produces jagged artifacts. In contrast, other GDSR approaches [6,7,12] seek an optimal solution that minimizes a predefined objective function. Dong et al. [6] designed a joint local and nonlocal regularization term in the gradient and spatial domains for depth map recovery. Zhang et al. [7] proposed a method for jointly filling hole pixels and upsampling depth maps by exploiting local linear fitting in the spatial domain. Although this method performs well against some state-of-the-art restoration methods, it suffers from high computational complexity. Moreover, it is not robust against noise in the depth maps, as it lacks a pre-denoising step. The AR model [12] has been demonstrated to perform well in depth imaging in some cases; however, it can only model the relationship of pixels in a relatively small neighborhood. Despite the general characteristics and advantages of GDSR methods, a few problems in these methods still need to be addressed.
Two of the most challenging issues are inconsistencies between the edges in guidance color images and the depth discontinuities in depth maps, which result in texture copy artifacts and the blurring of depth discontinuities in the restored depth map [6,7,8,9,10,11,12,13]. The guidance color image is typically used to provide cues for depth upsampling, based on the assumption that depth discontinuities align with color edges. However, this alignment is not always consistent. In cases where a depth discontinuity coincides with a weak color edge, the depth discontinuity may become blurred. Conversely, when a smooth depth region aligns with a highly textured color region, texture copy artifacts can occur. Several works [9,10,11,12] have been undertaken to tackle these issues. Some of them [16,17,18] have proposed complex guidance weights based on guidance color images and have used bicubic interpolation for the input depth map. However, a complex guidance weight does not consistently improve upsampling quality and often increases computational complexity. Furthermore, the bicubic interpolation [17] of the input depth map can be unreliable when the upsampling factor is large and the input depth map suffers from heavy noise. Liu et al. [8] proposed a robust method to alleviate texture copy artifacts and boundary blurring for depth super-resolution; however, most of these methods could not completely address the boundary distortion problem in depth maps. Zuo et al. [9] developed a model to quantitatively measure the inconsistency of associated boundaries in RGB images and depth maps by detecting and matching edges, and then designed a new regularization term in a Markov random field (MRF) to suppress texture copy artifacts in depth maps. Yang et al. [10] addressed the boundary misalignment problem of distorted depth maps, but their work only handled slightly distorted depth maps; it did not address the texture copy artifacts problem and was not suitable for severely distorted depth maps. Wang et al. [11] introduced a method that begins by identifying distorted boundaries in depth maps, followed by the application of a weighted median filter for depth recovery. Their approach employs a local method to upsample depth maps by imposing constraints on the relationship between the target pixel and its neighboring pixels, with Gaussian weighting used as a simple feature to capture depth structures. While this method is computationally efficient, it often results in undesirable outcomes such as edge blurring and texture-copying artifacts. In addition, there are few studies on the mathematical analysis of these issues [19]. Wei Liu et al. [19] proposed smoothness measurements to suppress texture copy artifacts and preserve depth discontinuities; however, they used preset parameters to distinguish homogeneous and edge regions. Ham et al. [14] introduced the Static/Dynamic (SD) filter, which combines static and dynamic guidance for robust image filtering through nonconvex optimization. Despite its versatility across applications, the method requires careful parameter tuning and incurs computational overhead.
It can be seen from the above that existing works in guided depth map super-resolution (GDSR) have made significant progress, but challenges such as texture copy artifacts, blurred depth boundaries, high computational complexity, sensitivity to noise, and missing depth pixels still remain prevalent. Many methods struggle with maintaining depth discontinuities when relying on color image guidance, especially when color edges do not align with depth edges. Approaches using complex guidance weights often increase computational overhead without consistent improvements, while interpolation-based methods falter under high noise and large upsampling factors.
Motivated by these limitations, in this paper we propose a new scheme that handles the common issues, such as missing depth pixels, texture copy artifacts, and blurred depth discontinuities, to obtain a high-resolution depth map. To suppress texture copy artifacts and preserve depth discontinuities, we introduce an adaptive bandwidth mechanism that replaces the fixed parameters used in existing works with a dynamic adjustment of guidance weights. This mechanism constructs and uses a distance map for measuring patch similarity. The adaptive bandwidth is then integrated into both local and nonlocal models to enhance upsampling performance. The local model emphasizes spatial smoothness by using neighboring depth data to retain fine details without excessive smoothing, maintaining depth consistency within small regions. However, relying solely on local processing can fail to capture distant repetitive patterns, which are common in natural images. To address this problem, a nonlocal model that leverages pixel similarities across distant regions is used to capture these nonlocal patterns and to preserve depth discontinuities. The use of both local and nonlocal models in the proposed scheme provides a more comprehensive depth upsampling.
This paper is organized as follows: Section 2 introduces a practical framework for depth upsampling and presents the proposed scheme in detail, outlining its key components and innovations. In Section 3, experimental results obtained by applying the proposed scheme for depth upsampling on the images of benchmark datasets are provided and compared with those of other state-of-the-art techniques. Finally, Section 4 concludes the paper by summarizing the work and highlighting the salient features of the proposed scheme.

2. Proposed Method

Figure 1 shows the proposed framework, which integrates local and nonlocal models while leveraging distance map computation and adaptive bandwidth estimation to refine depth upsampling dynamically. The process begins with a low-resolution depth map, $D_L$, and a corresponding high-resolution color image, $C_H$, as inputs to the framework. The distance map and adaptive bandwidth module computes pixel-wise similarity and dynamically adjusts sensitivity based on local depth variance, nonlocal patterns, and color gradients. The output of this module guides both the local and nonlocal models in the next stage. The local model prioritizes preserving depth discontinuities and ensures structural consistency in small neighborhoods, while the nonlocal model captures repetitive structures and patterns across distant regions. Finally, the outputs of both models are merged in a fusion stage, balancing local consistency and nonlocal similarity to generate a high-resolution depth map, $D_H$, with enhanced accuracy and structural integrity.

2.1. Distance Map and Adaptive Bandwidth

Depth map super-resolution methods often capture the intrinsic similarities within local neighborhoods to guide the upsampling process. To achieve this, we define a distance map to quantify patch-level similarities, forming the foundation for adaptive bandwidth estimation. In this approach, the guidance weights used for depth recovery are adjusted dynamically based on the local texture and depth discontinuities to suppress texture copy artifacts while preserving sharp edges.
First, the depth map is preprocessed to handle missing pixels using connected component analysis based on depth information. Missing pixels are assigned values derived from neighboring pixel data, which may be further refined in subsequent steps of the proposed method. The process continues with the selection of a reference patch $P_r$ of dimensions $n \times n$ from the preprocessed depth map. Inspired by [21], the goal is to identify the $m$ most similar patches within the local neighborhood (assume there are a total of $N$ patches) using the Euclidean distance. To facilitate computational efficiency, the selected patches are vectorized and arranged into a structured matrix:
$$Y_r = \left[ y_{r,1}^p, \, y_{r,2}^p, \, \ldots, \, y_{r,m}^p \right] \in \mathbb{R}^{n \times m}, \quad r = 1, \ldots, N; \; p = 1, \ldots, n \qquad (1)$$
where $y_{r,m}$ represents the vectorized form of the $m$-th patch. This matrix, $Y_r$, consolidates the patch information into a structured format, streamlining further computations. To capture pairwise similarities, the Euclidean distance between rows of the patch matrix $Y_r$ is calculated as
$$d_r(i,j) = \left\| y_r^i - y_r^j \right\|_2 \qquad (2)$$
Here, $y_r^i$ and $y_r^j$ are the vectorized patches corresponding to rows $i$ and $j$, respectively. This step generates a distance matrix reflecting the relationships among the rows of $Y_r$.
Next, a row similarity matrix $S \in \mathbb{R}^{m \times m}$ is constructed to encode these relationships. Each element of $S$ is computed as
$$S(i,j) = \exp\left( -\frac{d_r(i,j)}{\sigma^2} \right) \qquad (3)$$
where $\sigma^2$ is a scaling parameter that controls sensitivity to distance variations. Higher similarity emphasizes closely related patches for aggregation. From the similarity matrix, the $k$ most similar rows are selected based on their pairwise distances, using the row similarity score $rss$:
$$rss(j) = \sum_{i=1}^{n} S(i,j) \qquad (4)$$
Using the row similarity score, we refine the patch matrix into
$$Y_r^k = \left[ y_{r,1}^c, \, y_{r,2}^c, \, \ldots, \, y_{r,m}^c \right] \in \mathbb{R}^{k \times m}, \quad r = 1, \ldots, N; \; c \in \{1, \ldots, k\} \subseteq \{1, \ldots, n\} \qquad (5)$$
where $k$ is the number of rows selected. This refinement ensures that only the most relevant information is retained, reducing dimensionality and improving subsequent computations. The refined matrix $Y_r^k$ serves as the basis for constructing the distance map. For a reference patch $P_r$ at pixel $(x,y)$, a pixel-wise distance map $D_{row}(x,y)$ is computed by aggregating the minimum distances across the selected rows as follows:
$$D_{row}(x,y) = \min_{c \in \{1, \ldots, k\}} \left\| y_r^c - P_r \right\|_2 \qquad (6)$$
We compute the Euclidean distance between the reference vector $P_r$ and the $k$ selected similar rows. The minimum of these distances is stored in $D_{row}(x,y)$, ensuring that the most relevant pixel similarity is used. This aggregation captures localized similarities while mitigating the influence of outliers, which would otherwise affect the performance of depth recovery. This methodology not only enhances the quality of the distance map but also enables precise adaptive bandwidth estimation. The dynamic adjustment of guidance weights balances the suppression of texture copying and the preservation of depth discontinuities, making it an essential step in advanced depth upsampling frameworks.
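To make Equations (2)-(6) concrete, the following Python/NumPy sketch computes the row similarity matrix, the row similarity scores, and the minimum-distance entry for one reference patch. It is a minimal sketch under stated assumptions: all names are illustrative, the selected rows and the reference vector are assumed to have compatible lengths, and this is not the authors' MATLAB implementation.

```python
import numpy as np

def distance_map_entry(Y, p_ref, sigma=0.5, k=20):
    """Sketch of Eqs. (2)-(6) for one reference patch.

    Y     : patch matrix Y_r whose columns are vectorized neighborhood patches
    p_ref : vectorized reference patch P_r (assumed compatible with rows of Y)
    """
    # Pairwise Euclidean distances between the rows of Y_r (Eq. (2))
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)

    # Row similarity matrix (Eq. (3)); similarity decays with distance
    S = np.exp(-d / sigma**2)

    # Row similarity scores (Eq. (4)) and the k most similar rows (Eq. (5))
    rss = S.sum(axis=0)
    top_rows = np.argsort(rss)[-k:]
    Y_k = Y[top_rows, :]                      # refined matrix Y_r^k

    # Pixel-wise distance map entry (Eq. (6)): the minimum distance
    # between the reference vector and the selected rows
    return np.min(np.linalg.norm(Y_k - p_ref[None, :], axis=1))
```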
The adaptive bandwidth $b(x,y)$ plays a crucial role in refining guidance weights by dynamically adapting to local variations and inter-pixel relationships. This process begins by capturing intensity variations through the local variance $\sigma^2(x,y)$, which highlights areas of significant detail, such as depth discontinuities and transitions. These regions are essential for maintaining structural accuracy in depth recovery.
The refined patch similarity measure $D_{row}(x,y)$, derived from the distance map, further encodes structural patterns by summarizing pixel-level relationships within the local neighborhood. This measure complements the local variance by incorporating the aggregated structural similarities from the refined patch matrix. Additionally, texture information is incorporated through the computation of color gradients $\nabla c(x,y)$, derived from the grayscale version of the RGB image $C_H(x,y)$. Grayscale conversion simplifies computation while retaining critical intensity transitions. These gradients are computed using Sobel operators, with the horizontal and vertical gradients given by
$$\nabla c_x(x,y) = \mathrm{Sobel}_{Horizontal}\left(I_{Gray}\right), \quad \nabla c_y(x,y) = \mathrm{Sobel}_{Vertical}\left(I_{Gray}\right) \qquad (7)$$
The combined influence of these features is aggregated into the bandwidth computation as
$$b(x,y) = \alpha \, \sigma^2(x,y) + \beta \, D_{row}(x,y) + \delta \, \nabla c(x,y) \qquad (8)$$
where $\alpha$, $\beta$, and $\delta$ are weighting parameters that regulate the contributions of the local variance, patch similarity, and color gradients, respectively. This formulation allows the bandwidth to adapt dynamically, ensuring a balance between preserving depth discontinuities and suppressing texture-copying artifacts.
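A minimal sketch of Equations (7) and (8) is given below, assuming the local variance is taken over a small sliding window and the gradient magnitude is used for $\nabla c(x,y)$; the window size and helper names are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import generic_filter, sobel

def adaptive_bandwidth(depth, d_row, gray, alpha=0.4, beta=0.35, delta=0.25):
    """Sketch of the adaptive bandwidth b(x, y) of Eq. (8)."""
    # Local variance of the depth map over a 3x3 window (window size assumed)
    local_var = generic_filter(depth.astype(float), np.var, size=3)

    # Sobel gradients of the grayscale guidance image (Eq. (7)),
    # combined here into a single gradient magnitude
    gx = sobel(gray.astype(float), axis=1)
    gy = sobel(gray.astype(float), axis=0)
    grad_c = np.hypot(gx, gy)

    # Weighted combination of the three cues (Eq. (8))
    return alpha * local_var + beta * d_row + delta * grad_c
```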
To further enhance the depth upsampling process, adaptive thresholding is introduced to segment the depth map into regions of depth discontinuities and homogeneous areas. A dynamic threshold $T(x,y)$ is calculated by integrating the adaptive bandwidth with the depth and color gradients:
$$T(x,y) = \lambda_1 \, b(x,y) + \lambda_2 \, \nabla d(x,y) + \lambda_3 \, \nabla c(x,y) \qquad (9)$$
where $\nabla d(x,y)$ denotes the depth gradients computed using Sobel operators. Pixels are classified as depth discontinuities if
$$\nabla d(x,y) > T(x,y) \qquad (10)$$
These depth discontinuities often correspond to object boundaries and are critical for preserving structural details during depth recovery. Homogeneous regions, in contrast, are characterized by uniformly low-intensity variations and are identified based on the local variance as follows:
$$\sigma^2(x,y) < \tau \qquad (11)$$
where $\tau$ is a predefined threshold. This classification enables the selective processing of depth discontinuities and homogeneous regions, ensuring that noise is suppressed in smooth areas while significant structural transitions in the depth map are preserved.
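The classification of Equations (9)-(11) can be sketched as follows; the $\lambda$ weights passed in are placeholders (the paper does not report their values), and the gradient magnitudes are computed as in the previous sketch.

```python
import numpy as np
from scipy.ndimage import generic_filter, sobel

def classify_regions(depth, b, grad_c, lambdas=(0.4, 0.3, 0.3), tau=0.05):
    """Sketch of the adaptive thresholding of Eqs. (9)-(11)."""
    # Depth-gradient magnitude from Sobel operators
    gx = sobel(depth.astype(float), axis=1)
    gy = sobel(depth.astype(float), axis=0)
    grad_d = np.hypot(gx, gy)

    # Dynamic threshold (Eq. (9)); the lambda values are illustrative
    l1, l2, l3 = lambdas
    T = l1 * b + l2 * grad_d + l3 * grad_c

    # Depth discontinuities (Eq. (10)) and homogeneous regions (Eq. (11))
    discontinuities = grad_d > T
    local_var = generic_filter(depth.astype(float), np.var, size=3)
    homogeneous = local_var < tau
    return discontinuities, homogeneous
```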

2.2. Local Model

The local model enhances depth upsampling by preserving spatial smoothness while avoiding excessive blurring. It achieves this by utilizing neighboring depth information to retain fine details and maintain consistency within small regions. We formulate this approach as an energy minimization problem, where the objective is to find an optimal high-resolution depth map by balancing data fidelity and regularization. The data fidelity term ensures that the estimated depth values remain close to the original low-resolution observation, preventing significant deviations, while the regularization term enforces smoothness in the depth map, adapting to local structures to avoid over-smoothing depth discontinuities. The regularization dynamically adjusts based on depth variance, structural patterns, and texture information. This adaptive mechanism is used to enable the method to respond to varying depth characteristics within the scene. The objective function is presented as
$$E(d) = \sum_{(x,y)} \left( D_H(x,y) - D_L(x,y) \right)^2 + \rho \sum_{(x',y') \in N(x,y)} \frac{\left( D_H(x,y) - D_H(x',y') \right)^2}{b^2(x,y)} \qquad (12)$$
In this formulation, the first term is the data fidelity term: it ensures that the estimated high-resolution depth value $D_H(x,y)$ remains consistent with the observed low-resolution depth value $D_L(x,y)$. The second term is the regularization (smoothness) term: it penalizes large differences between the depth value at the reference pixel and the depth values at neighboring pixels $(x',y')$ within the local neighborhood $N(x,y)$. The regularization parameter $\rho$ serves as a global scalar that determines the overall influence of the regularization term relative to the data fidelity term. A well-balanced choice of $\rho$ is essential; setting it too high may lead to over-smoothing of depth discontinuities and loss of fine details, while a value that is too low may allow noise or inconsistencies to persist in the reconstructed depth map.

However, while $\rho$ is applied globally across the entire image, its actual effect is modulated locally through the integration of the adaptive bandwidth $b(x,y)$ into the regularization term. The adaptive bandwidth is designed to respond to the local characteristics of the scene; it is computed from a combination of local depth variance, patch similarity, and image gradients, which enables the method to adjust the strength of smoothness enforcement at each pixel location. Specifically, in regions where the scene contains sharp depth transitions or structural complexity, the bandwidth value increases, which reduces the regularization strength locally, thereby preserving edges and important depth details. In contrast, in more homogeneous areas, where the depth is relatively constant, the bandwidth decreases, leading to stronger smoothing and effective noise suppression. By embedding $b(x,y)$ directly into the regularization term, the penalty for depth differences between neighboring pixels adapts to the local image content.

To solve the resulting optimization problem, we employ the Iteratively Reweighted Least Squares (IRLS) method, which efficiently handles noise and outliers by iteratively updating the weights in the regularization process. It dynamically adjusts the influence of each pixel during optimization, in line with the adaptive bandwidth, preserving depth structures while suppressing noise: regions with high depth variance (e.g., discontinuities) are penalized less, preserving critical structures, while homogeneous regions receive stronger regularization, suppressing noise. This synergy between global and local control ensures that the method maintains structural detail where needed while promoting smoothness in flat regions, leading to more accurate and visually coherent depth reconstruction.
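For a quadratic energy of the form of Equation (12), each reweighted pass reduces to a per-pixel weighted least-squares update. The sketch below uses a 4-connected neighborhood and a Jacobi-style iteration; the neighborhood choice, border handling, and solver details are assumptions for illustration, not the paper's exact solver.

```python
import numpy as np

def local_model_irls(d_low, b, rho=0.8, tol=1e-6, max_iter=500):
    """Sketch of minimizing the local energy of Eq. (12).

    d_low : low-resolution observation mapped onto the high-resolution grid
    b     : adaptive bandwidth map b(x, y) of Eq. (8)
    """
    d = d_low.astype(float).copy()
    # Per-pixel regularization weight rho / b^2(x, y): large bandwidth
    # (edges) -> weak smoothing; small bandwidth (flat areas) -> strong
    w = rho / np.maximum(b, 1e-8) ** 2
    for _ in range(max_iter):
        # Mean of the four axis-aligned neighbors (periodic borders for brevity)
        nb = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
              np.roll(d, 1, 1) + np.roll(d, -1, 1)) / 4.0
        # Per-pixel minimizer of the quadratic energy: the fidelity term
        # pulls toward d_low, the smoothness term pulls toward nb
        d_new = (d_low + 4.0 * w * nb) / (1.0 + 4.0 * w)
        if np.max(np.abs(d_new - d)) < tol:
            break
        d = d_new
    return d
```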

2.3. Nonlocal Model

In this section, we propose a nonlocal model that leverages the strong nonlocal similarity inherent in depth maps to enhance the accuracy and efficiency of depth upsampling. By integrating low-rank approximation, sparse representation, and the adaptive bandwidth, the model captures repetitive patterns and structural alignments across distant regions while preserving critical depth discontinuities. Low-rank approximation exploits the inherent redundancy in depth patches, sparse representation retains only the most significant contributors to each patch, and the adaptive bandwidth dynamically adjusts sensitivity to local and nonlocal features. This combination addresses the limitations of traditional approaches, providing robust handling of depth variations, reduced over-smoothing, and improved preservation of high-frequency details in high-resolution depth maps.
In the first step, the depth map is divided into overlapping patches of size $n \times n$, and each patch is vectorized into a column vector. These vectors form a patch matrix $R$:
$$R = \left[ P_1, P_2, \ldots, P_m \right] \in \mathbb{R}^{n \times m} \qquad (13)$$
where $n$ is the number of elements in each vectorized patch and $m$ is the number of patches. Depth maps exhibit significant nonlocal similarity, making the patch matrix $R$ inherently low-rank. To exploit this property, we perform a singular value decomposition (SVD):
$$R \approx U \Sigma V^T \qquad (14)$$
where $U \in \mathbb{R}^{n \times r}$ contains the left singular vectors (a basis for the patch space), $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix of singular values, $V \in \mathbb{R}^{m \times r}$ contains the right singular vectors, and $r$ is the rank of the approximation, with $r \ll \min(n, m)$. Retaining only the top $r$ singular values suppresses noise and outliers, reconstructing the patch matrix as
$$R_r \approx U \Sigma_r V^T \qquad (15)$$
Then, each reference patch $P_r$ is expressed as a sparse linear combination of the reconstructed patches in $R_r$ as follows:
$$P_r \approx R_r \, \varphi \qquad (16)$$
where $\varphi$ is the sparse coefficient vector obtained by solving
$$\min_{\varphi} \left\| P_r - R_r \, \varphi \right\|_2^2 + \omega \left\| \varphi \right\|_1 \qquad (17)$$
This step ensures that only the most relevant patches contribute to the reconstruction. The optimization is solved using an iterative shrinkage-thresholding algorithm (ISTA) [22]. By incorporating the adaptive bandwidth into the sparse representation step, the optimization problem becomes
$$\min_{\varphi} \left\| P_r - R_r \, \varphi \right\|_2^2 + \omega \left\| \varphi \cdot b(x,y) \right\|_1 \qquad (18)$$
Using the adaptive sparse coefficients $\varphi$, the depth values are aggregated as follows:
$$D(x,y) = \sum_{(x',y')} \varphi(x',y') \, D(x',y') \qquad (19)$$
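The following sketch combines the truncated SVD of Equations (14) and (15) with an ISTA loop [22] for the bandwidth-weighted sparse coding problem of Equation (18). It is a minimal illustration under stated assumptions: the step size is derived from the spectral norm of $R_r$, the bandwidth enters as a scalar scaling of the soft threshold, and all names are illustrative.

```python
import numpy as np

def nonlocal_sparse_code(R, p_ref, rank=10, omega=0.1, b_scale=1.0, n_iter=100):
    """Sketch of the nonlocal model: low-rank approximation (Eqs. (14)-(15))
    plus ISTA for the weighted sparse coding problem (Eq. (18))."""
    # Truncated SVD keeps only the top `rank` singular values
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_r = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

    # ISTA: a gradient step on the quadratic term, then soft thresholding;
    # the threshold is scaled by the adaptive bandwidth b(x, y)
    step = 1.0 / (np.linalg.norm(R_r, 2) ** 2 + 1e-12)  # 1 / Lipschitz bound
    thresh = step * omega * b_scale
    phi = np.zeros(R_r.shape[1])
    for _ in range(n_iter):
        grad = R_r.T @ (R_r @ phi - p_ref)
        z = phi - step * grad
        phi = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return phi, R_r
```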

2.4. Fusion of Local and Nonlocal Outputs for Depth Upsampling

The depth upsampling problem is formulated as the optimization of a combined local and nonlocal model as follows:
$$E_{combined}(d) = w_l \, E_{local}(d) + w_{nl} \, E_{nonlocal}(d) \qquad (20)$$
The local model term $E_{local}(d)$ is obtained from Equation (12) and enforces consistency within small neighborhoods. The nonlocal model term $E_{nonlocal}(d)$ is computed according to Equation (18) and captures long-range dependencies. $w_l$ and $w_{nl}$ are the local and nonlocal model coefficients, which allow the model to be fine-tuned for specific applications. $w_l$ provides sufficient emphasis on local smoothness, ensuring that small-scale details and depth discontinuities are preserved without excessive noise. $w_{nl}$ allocates weight to nonlocal similarities, leveraging repetitive patterns and global structures while maintaining computational efficiency; $w_l > w_{nl}$ typically results in better preservation of fine details, whereas $w_l < w_{nl}$ enhances global consistency.
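In the simplest reading of Equation (20), when each sub-energy has already been minimized separately, the fusion reduces to a convex combination of the two reconstructions. The sketch below makes that simplifying assumption explicit; in the paper the weights act on the energy terms themselves.

```python
import numpy as np

def fuse_depth(d_local, d_nonlocal, w_l=0.6, w_nl=0.4):
    """Sketch of the fusion stage of Eq. (20), assuming the local and
    nonlocal reconstructions are combined directly with w_l + w_nl = 1."""
    assert abs(w_l + w_nl - 1.0) < 1e-9
    return w_l * np.asarray(d_local) + w_nl * np.asarray(d_nonlocal)
```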

3. Experimental Results

This section presents a comprehensive evaluation of the proposed method through various experiments and analyses, validating its effectiveness, accuracy, and robustness using both quantitative and qualitative assessments. The effectiveness of the proposed depth map recovery method is verified on three widely used RGB-depth datasets: Middlebury 2005 [23,24,25], Middlebury 2014 [23,26], and NYU [27]. The experimental results are organized into four subsections, each highlighting a different aspect of the evaluation process. In Section 3.1, we explain how the key parameters in the proposed method are set and used during the experiments. In Section 3.2, we analyze the proposed adaptive bandwidth by first comparing pixel-wise and patch-wise strategies for patch similarity, followed by an ablation study of its three components. In Section 3.3, the quantitative performance of the proposed scheme, in terms of two metrics, is evaluated on the two Middlebury datasets. Finally, in Section 3.4, visual results, in the form of the high-resolution depth maps produced by the proposed scheme, are presented.

3.1. Parameter Settings

In the experiments carried out in this section, MATLAB R2019a is used for the implementation and evaluation of the proposed scheme. An 8 × 8 patch size is selected, with a total of N = 100 patches sampled from the local neighborhood for each reference patch to capture sufficient contextual information. The scaling parameter σ for constructing the row similarity matrix is set to 0.5, balancing sensitivity to distance variations and noise suppression. From the similarity matrix, the top 20 most similar rows (k = 20) are selected to retain relevant local structures while reducing noise.
In the adaptive bandwidth computation defined in Equation (8), the weighting parameters for the local variance (α), patch similarity (β), and color gradients (δ) are set to 0.4, 0.35, and 0.25, respectively, to achieve a balance between texture suppression and edge preservation. To evaluate how sensitive the overall performance is to these weights, we conducted a sensitivity analysis by varying their relative contributions while keeping their sum constant. This analysis was performed on the Art, Moebius, Books, Laundry, and Reindeer images from the Middlebury 2005 dataset [23,24,25], the images also used in the ablation study in Section 3.2 for consistency. We ensured that the sum α + β + δ = 1 remained constant and evaluated the overall impact on the final depth upsampling quality using the mean absolute error (MAE) and root mean square error (RMSE). The obtained MAE and RMSE values reflect the final high-resolution depth map produced by the proposed method, not merely the quality of the adaptive bandwidth itself; the adaptive bandwidth influences depth recovery indirectly through its role in modulating the guidance weights, so its effectiveness is best assessed via the accuracy of the final depth estimation. Among the parameter combinations tested, the setting α = 0.4, β = 0.35, and δ = 0.25 yielded the lowest average MAE (0.63) and RMSE (0.93), outperforming the other combinations. This setting was not only optimal for the initially selected images but was also further tested with other parameter variations across different images and datasets, where it continued to provide the most stable and accurate results. This sensitivity analysis supports our selection of the weighting parameters used in the main experiments and reinforces the generalizability of the proposed method. A sketch of such a sweep is given below.
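In this sketch, `evaluate_pipeline` is a hypothetical placeholder for running the full method on the five test images and returning the average MAE and RMSE; the grid step and the dummy return values are assumptions so the snippet runs as-is.

```python
import itertools
import numpy as np

def evaluate_pipeline(alpha, beta, delta):
    """Placeholder: run the full upsampling pipeline with the given
    bandwidth weights and return (average MAE, average RMSE).  Dummy
    values are returned here so the sketch is runnable."""
    return 1.0 - 0.3 * beta, 1.5 - 0.4 * beta   # illustrative only

grid = np.round(np.arange(0.05, 1.0, 0.05), 2)
best = None
for alpha, beta in itertools.product(grid, grid):
    delta = round(1.0 - alpha - beta, 2)        # keep alpha + beta + delta = 1
    if delta <= 0:
        continue
    mae, rmse = evaluate_pipeline(alpha, beta, delta)
    if best is None or (mae, rmse) < best[:2]:
        best = (mae, rmse, alpha, beta, delta)
print("best (MAE, RMSE, alpha, beta, delta):", best)
```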
Color and depth gradients are computed using a 3 × 3 Sobel operator to capture intensity transitions. The threshold for depth discontinuity detection is dynamically computed with an initial value of $T_d$ = 0.15 to identify object boundaries effectively, while the local variance threshold $\tau$ = 0.05 is used to classify homogeneous regions for noise suppression. For the local model, the regularization parameter $\rho$ = 0.8 is set to balance the trade-off between data fidelity and spatial smoothness, preventing over-smoothing in regions with high depth variance while effectively suppressing noise in homogeneous areas. For the IRLS solver, a tolerance of $10^{-6}$ is used as the convergence criterion to ensure precise depth estimation. These parameter settings, optimized through empirical testing and cross-validation across multiple datasets, contributed to achieving high-quality depth super-resolution results.
To evaluate the influence of the weighting parameters $w_l$ and $w_{nl}$, which govern the fusion of the local and nonlocal models in Equation (20), we conducted empirical evaluations on the Middlebury 2005 [23,24,25] and Middlebury 2014 [23,26] datasets. These parameters allow the method to adaptively emphasize either local structural details or broader contextual patterns. We found that increasing $w_l$ favors the preservation of depth discontinuities and fine textures, while larger values of $w_{nl}$ enhance performance in smooth or repetitive regions by leveraging nonlocal correlations. Several weight combinations satisfying $w_l + w_{nl} = 1$ were tested, ranging from local-only ($w_l$ = 1.0, $w_{nl}$ = 0.0) to nonlocal-only ($w_l$ = 0.0, $w_{nl}$ = 1.0). For each pair of weights, we evaluated the performance and selected the pair that minimized the average MAE and RMSE. The best performance was achieved at $w_l$ = 0.6 and $w_{nl}$ = 0.4, striking an effective balance between edge preservation and global consistency. This configuration yielded the lowest average MAE and RMSE and was adopted in our main experiments, confirming its suitability for high-quality depth recovery across diverse scenes.

3.2. Adaptive Bandwidth Analysis

The first part of this analysis examines the role and impact of the proposed distance map, illustrating how its design influences patch similarity measurement. Figure 2 shows the visual results of the proposed distance map, which captures both patch-wise and pixel-wise similarities, emphasizing depth discontinuities and structural details. The histogram analysis included in Figure 2 provides a quantitative interpretation of the distance map, offering deeper insight into how well the distance map differentiates between similar and dissimilar regions and how the proposed method captures structural details at both the patch and pixel levels. Since the distance map guides the adaptive bandwidth estimation, understanding its distribution helps validate the method's ability to balance detail preservation and smoothness.

The analysis compares patch-level and pixel-level similarity. The patch-level distance map in Figure 2c effectively summarizes the similarity of larger areas, which is useful for coarse matching. The pixel-level distance map shown in Figure 2e provides detailed insight into local variations, which is essential for high-precision tasks such as edge detection or depth discontinuity preservation. While the overall trend of smaller distances remains, the histogram in Figure 2f shows slightly more variation at the pixel level, capturing finer structural details that are averaged out in the patch-level computation, as indicated in its histogram in Figure 2d. In Figure 2d, the histogram shows the relationship between the patch-wise distance and the number of reference patches with a given patch-wise distance to their corresponding most similar patches. The histogram in Figure 2f indicates that the distance map value of each pixel under this method differs from that of the patch-wise method; we observe that over $1.6 \times 10^4$ reference patches contain closely matched pixels. The range of distance values (on the x-axis) is consistent across both histograms, but the pixel-level histogram shows more granular variation due to its finer computation. Patch-level distances aggregate distances over patches (groups of pixels), resulting in fewer unique distance values and smoother distributions; pixel-level distances show a broader distribution, as every individual pixel contributes to the histogram, capturing nuances that patches cannot. These results validate the proposed method's capability to capture both global patterns and local details effectively.
To analyze the effectiveness of the proposed adaptive bandwidth formulation introduced in Equation (8), we conducted an ablation study to evaluate the contributions of each of its components (local variance, patch similarity, and color gradient information) to the overall depth reconstruction performance. Table 1 summarizes the average MAE and RMSE values obtained on five images (Art, Moebius, Books, Laundry, and Reindeer) selected from the Middlebury 2005 dataset [23,24,25]. The table presents three configurations of the adaptive bandwidth computation: Configuration 1 uses only the local variance; Configuration 2 combines the local variance with patch similarity based on the pixel-wise distance map; and Configuration 3 (the proposed configuration) incorporates all three terms, local variance, patch similarity, and color gradient, for more refined and context-aware guidance.

The results in Table 1 clearly demonstrate that incorporating additional information significantly improves depth reconstruction accuracy. The MAE decreases from 3.15 in Configuration 1 to 0.63 in Configuration 3, while the RMSE decreases from 4.57 to 0.93. These reductions reflect the progressively stronger guidance provided by the adaptive bandwidth as each component is added, validating the design of the full proposed configuration.
Figure 3 visually illustrates the depth reconstruction performance on two representative images, Art and Moebius, indicated in the first and second row, respectively. For each image, Figure 3a shows the original color image, while Figure 3b presents the depth map. Figure 3c–e depict the output depth maps generated using different adaptive bandwidth configurations. As more components are added, the reconstructed depth maps become increasingly accurate, with sharper object boundaries, fewer artifacts, and better preservation of depth discontinuities. This trend is consistent with the quantitative results in Table 1 and highlights the visual impact of each component within the adaptive bandwidth computation. Therefore, both the quantitative improvements in Table 1 and the qualitative enhancements in Figure 3 confirm the importance of combining local variance, patch similarity, and color gradient in the adaptive bandwidth model. Together, they contribute to the improvement in accuracy and visual quality of the final high-resolution depth maps.

3.3. Quantitative Results

In this section, we present the quantitative evaluation of the proposed method to demonstrate its effectiveness in depth-disparity reconstruction. The Middlebury 2005 [23,24,25] and Middlebury 2014 [23,26] datasets are used for the experiments, providing a diverse set of depth images for a comprehensive assessment. The mean absolute error (MAE) and root mean square error (RMSE) are employed as the comparison metrics, as they provide reliable measures of depth reconstruction accuracy. The proposed method is compared against several state-of-the-art techniques: AR [12], LN [6], LLFM [7], RCG [8], EIEF [9], DBR [10], UDBD [11], MSG [13], SDF [14], EG [15], and DTSR [16]. The experimental results are analyzed for multiple sampling rates (×2, ×4, and ×8) to demonstrate the robustness of the proposed method across varying levels of upsampling. To validate the improvements in performance over existing depth upsampling methods, we conduct a statistical significance analysis using the Wilcoxon signed-rank test [28]. This non-parametric test is particularly well suited for evaluating paired data, making it an appropriate choice for comparing the MAE and RMSE results of our proposed method against those of the baseline techniques on each of the two datasets, Middlebury 2005 and Middlebury 2014, individually. In this analysis, each method is evaluated on a set of images selected from one dataset at a time. For a given dataset and a given error metric, we calculate the Wilcoxon statistic ($W$); the sample size used for the test corresponds to the number of images used for evaluation. We then compare the computed $W$ values against the critical value $W_\alpha$ [29] for the same sample size as that of the selected dataset. If $W \le W_\alpha$, the performance improvement of the proposed scheme over the method used for comparison is considered statistically significant. In the presented tables, the Wilcoxon statistic is denoted as $W$, allowing a clear indication of whether the improvement is significant.
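As an illustration of this test, the sketch below applies scipy.stats.wilcoxon to hypothetical per-image MAE values; the numbers are invented for the example and are not the paper's results.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-image MAE values for the proposed method and one
# baseline on the same five images (illustrative numbers only).
mae_proposed = np.array([0.61, 0.70, 0.55, 0.66, 0.63])
mae_baseline = np.array([0.78, 0.85, 0.74, 0.80, 0.79])

# SciPy reports W as the smaller of the signed-rank sums.  With a sample
# size of five, the critical value W_alpha is 0 [29], so the improvement
# is judged significant when W <= 0.
W, p_value = wilcoxon(mae_proposed, mae_baseline)
print(f"W = {W}, p = {p_value:.3f}")
```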
Table 2 provides a quantitative evaluation, in terms of the MAE of the high-resolution depth map, of the various depth upsampling methods when applied to the guidance color images and low-resolution depth maps of the Art, Moebius, Books, Laundry, and Reindeer images from the Middlebury 2005 dataset [23,24,25]. In the last column of this table, which contains the MAE values of the proposed scheme, the quantities in parentheses represent the reduction in MAE relative to the second-best performing scheme. It can be seen from this table that the proposed scheme generates high-resolution depth maps with the lowest MAE among all the methods chosen for comparison at the different sampling rates (in Table 2, Table 3, Table 4 and Table 5, the lowest, second lowest, and third lowest error values are indicated in bold, bold-italic, and italic fonts, respectively). On average, the proposed method reduces the error by 19.4%, 18.4%, and 15% at the 2×, 4×, and 8× sampling rates, respectively, compared to the second-best performing method in each case. The results in this table also show how the other schemes, AR [12], RCG [8], and LLFM [7], can benefit from the adaptive bandwidth (AB) mechanism proposed in this paper: all three methods reduce their MAEs when AB is incorporated. Moreover, since a sample size of five images is used in this table, the critical value $W_\alpha$ is 0 [29]. As seen from the table, for each of the methods compared, the obtained $W$ satisfies the condition $W \le W_\alpha$, confirming that the performance improvements of the proposed scheme over the compared methods are significant.
Table 3 presents the average computational time (in seconds) for processing the set of sample images using the methods listed in Table 2. As observed from this table, the average computational time of the proposed scheme is the second lowest among all the schemes used for comparison. The average processing time of LLFM [7] is approximately 14% lower than that of the proposed scheme; however, the MAE of the latter is approximately 18% lower than that of the former. Table 2 shows that the three other methods can reduce their errors by adopting our proposed adaptive bandwidth, but, as seen from Table 3, this benefit comes at the expense of increased processing time.
Table 4 presents a quantitative evaluation of the various depth upsampling methods based on the RMSE of the obtained high-resolution depth maps. The evaluation is conducted using guidance color images and low-resolution depth maps from the Middlebury 2005 dataset [23,24,25], specifically the Art, Dolls, Laundry, Moebius, Reindeer, and Books images. The results show that the proposed method achieves the lowest average RMSE among all compared methods, indicating its superior performance in generating high-resolution depth maps. More specifically, the proposed method obtains the lowest RMSE on all images except the Dolls image, for which its error is the second lowest. Furthermore, given that the sample size in this table is six images, $W_\alpha$ is 2 [29]. As seen from the table, the computed $W$ for each of the compared methods satisfies the condition $W \le W_\alpha$, indicating that the performance improvements of the proposed scheme over the compared methods are statistically significant.
Table 5 and Table 6 provide a quantitative evaluation of the various depth upsampling methods on the Middlebury 2014 dataset [23,26]. These tables present the performance results in terms of the MAE and RMSE metrics, respectively, across the Couch, Motorcycle, Pipes, Recycle, Sticks, and Sword1 images. It can be seen from these tables that the proposed scheme provides high-resolution depth maps with the lowest errors in terms of both metrics for all the images except the Recycle image. Moreover, since the sample size consists of six images, $W_\alpha$ is 2 [29]. As can be seen from the tables, the calculated $W$ for each of the compared methods meets the condition $W \le W_\alpha$, indicating that the performance improvements of the proposed scheme over the compared methods are statistically significant.

3.4. Visual Results

In this section, we present the visual results to evaluate the performance of the proposed method in depth map upsampling. The Middlebury 2005 dataset [23,24,25] and NYU [27] are used for this evaluation, offering a diverse set of images with ground truth depth data to facilitate accurate visual comparisons. The proposed method is compared against AR [12], LN [6], LLFM [7], and RCG [8]. For fair comparison, we used the publicly available source code provided by the authors of these methods to conduct our experiments.
Figure 4 demonstrates the visual results of the proposed algorithm applied to three images from the Middlebury 2005 dataset [23,24,25] (Moebius, Laundry, and Reindeer) under varying sample rates. Figure 4a shows the high-resolution color images used as guidance, while Figure 4b presents the corresponding depth maps. Figure 4c, Figure 4d, and Figure 4e display the upsampled depth results with 2×, 4×, and 8× sample rates, respectively. The results illustrate that the proposed method effectively preserves depth discontinuities and structural details even at higher upsampling rates. The edges and sharp transitions in the depth maps remain well-defined without introducing texture-copying artifacts from the color images. Additionally, the algorithm demonstrates robustness across diverse scenes and sampling scenarios, consistently maintaining depth accuracy and visual quality. These findings validate the method’s ability to handle complex depth upsampling tasks with high fidelity.
Figure 5 presents the experimental results of 4× upsampling on the Art image from the Middlebury 2005 dataset [23,24,25], highlighting the effectiveness of integrating adaptive bandwidth (AB). The second row displays the results from AR [12], RCG [8], and LLFM [7], while the third row shows the outcomes of these methods when enhanced with AB. The integration of AB consistently improves the preservation of depth discontinuities, as evident in the zoomed-in region, demonstrating the effectiveness of the proposed approach in enhancing the visual quality of depth upsampling.
Figure 6 presents the experimental results, illustrating the contributions of individual components within the proposed depth upsampling method, including the local and nonlocal models. Figure 6f displays the final results obtained from combining the local and nonlocal models and shows enhanced visual quality and more accurate depth representation. The local model, as depicted in Figure 6d, captures finer structural details but introduces slight variations, leading to texture copy artifacts in some regions. In contrast, the nonlocal model, shown in Figure 6e, produces a smoother output but results in blurred depth edges and less defined depth discontinuities. The combined approach, presented in Figure 6f, integrates both local and nonlocal constraints, effectively suppressing artifacts and restoring sharper edges.
Figure 7 presents the experimental results of 4× upsampling on two images, Art (left column) and Moebius (right column), from the Middlebury 2005 dataset [23,24,25]. This figure compares the different depth upsampling methods (shown in different rows) against the ground truth in Figure 7b. For the Art image, the result of the proposed method, shown in Figure 7f, effectively preserves depth discontinuities around object boundaries, achieving clearer and more distinct edges than the results of the other approaches shown in Figure 7c–e, which tend toward excessive smoothing or blurring. For the Moebius image, the depth maps in Figure 7c–e produced by the other methods struggle with texture-copying artifacts, where high-frequency details from the guidance color image are transferred to the depth maps, degrading depth quality. In contrast, the depth map in Figure 7f shows that the proposed method significantly reduces unwanted artifacts while maintaining depth consistency, ensuring a more accurate and visually coherent upsampling result. Overall, Figure 7 demonstrates that the proposed scheme better preserves depth discontinuities (Art image) and more effectively suppresses texture-copying artifacts (Moebius image), maintaining structural integrity while minimizing unwanted artifacts for a more accurate and visually consistent depth reconstruction.
To provide a more comprehensive evaluation of our proposed method, we include additional visual results in Figure 8 using depth images from the NYU dataset [27]. The NYU dataset presents real-world scenes acquired using a structured-light sensor (Kinect v1). This introduces a different set of challenges, such as significant noise, depth discontinuities, low spatial resolution, and large regions with missing depth information due to sensor limitations, as can be seen in Figure 8b. These characteristics make the NYU dataset useful for testing how well depth upsampling and completion methods work in real-world scenes. As shown in Figure 8, the results are arranged in two columns, each corresponding to a different image from the dataset. For each image, Figure 8a shows the guidance color image along with a zoomed-in view of a selected region, while Figure 8b presents the registered raw depth map and its corresponding zoomed portion. Figure 8c–e display the results obtained using AR [12], RCG [8], and LLFM [7], respectively, and Figure 8f shows the result produced by the proposed method. Notably, the texture copy artifacts observed in the results of RCG [8] and LLFM [7] indicated in Figure 8d,e are more pronounced compared to AR [12] shown in Figure 8c. However, AR [12] tends to suffer from more depth blurring and discontinuity, especially around object boundaries. From Figure 8f, it can be seen that the proposed method effectively addresses both challenges, offering a better balance by reducing texture copy artifacts while preserving depth discontinuities, as clearly demonstrated in both examples. In addition, it is also evident, particularly from the selected regions of the test images, that noise and missing pixels are present throughout these real images. Our method significantly improves these degraded areas by effectively denoising the depth map and filling in missing values, further enhancing the visual quality and structural consistency of the results.

4. Conclusions

This paper has presented a novel depth upsampling method that integrates local and nonlocal models to achieve high-quality depth reconstruction. The proposed scheme first computes a distance map based on a low-resolution depth map and a high-resolution guidance color image and uses this map to construct an adaptive bandwidth. The main novelty of the proposed scheme is the simultaneous use of both the distance map and the adaptive bandwidth so obtained in guiding the functioning of both the local and nonlocal models. In contrast to existing schemes, which use fixed weighting parameters in their objective functions, the proposed scheme, by using the adaptive bandwidth, is able to provide a better balance between preserving depth discontinuities and suppressing texture-copying artifacts. Extensive experiments have been performed by applying the proposed depth upsampling scheme on different datasets to demonstrate its effectiveness. It has been shown that the performance of the proposed scheme, in terms of the MAE and RMSE metrics, is significantly superior to that of the second-best performing schemes used for comparison. It has also been demonstrated that other schemes can benefit from incorporating the idea of the distance map and adaptive bandwidth into their methods by reducing their errors; however, the lowest errors are still achieved by the proposed scheme. The depth maps resulting from the proposed scheme also have superior subjective quality, with better preservation of depth discontinuities and a greater reduction in texture-copying artifacts in comparison to the depth maps obtained from the other schemes. In future work, we plan to explore the integration of our depth upsampling framework into real-world applications such as smart assembly line monitoring, where accurate depth information plays a crucial role in recognizing fine-grained worker activities. The approach presented in [30] demonstrates the effectiveness of using multi-visual modalities for fine-grained activity classification, and our method may provide valuable depth enhancement in such contexts.

Author Contributions

Conceptualization, N.S.D. and M.O.A.; methodology, N.S.D. and M.O.A.; software, N.S.D.; validation, N.S.D.; formal analysis, N.S.D.; investigation, N.S.D. and M.O.A.; resources, N.S.D. and M.O.A.; data curation, N.S.D. and M.O.A.; writing—original draft preparation, N.S.D.; writing—review and editing, N.S.D. and M.O.A.; visualization, N.S.D.; supervision, M.O.A.; project administration, M.O.A.; funding acquisition, M.O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and in part by the Regroupement Stratégique en Microsystèmes du Québec (ReSMiQ).

Data Availability Statement

The datasets used in this study are publicly available at sites mentioned in references [23,24,25,26,27] of the paper.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Abbreviations

ToF	Time of Flight
HR	High-resolution
LR	Low-resolution
GDSR	Guided depth map super-resolution
SD	Static/Dynamic
MAE	Mean absolute error
RMSE	Root mean square error

References

  1. Du, L.; Ye, X.; Tan, X.; Johns, E.; Chen, B.; Ding, E.; Xue, X.; Feng, J. AGO-Net: Association-Guided 3D Point Cloud Object Detection Network. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8097–8109. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, L.; Zhang, L.; Zhu, Y.; Zhang, Z.; He, T.; Li, M.; Xue, X. Progressive coordinate transforms for monocular 3d object detection. Adv. Neural Inf. Process. Syst. 2021, 34, 13364–13377. [Google Scholar]
  3. Han, X.F.; Laga, H.; Bennamoun, M. Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1578–1604. [Google Scholar] [CrossRef] [PubMed]
  4. Peng, S.; Jiang, C.; Liao, Y.; Niemeyer, M.; Pollefeys, M.; Geiger, A. Shape as points: A differentiable poisson solver. Adv. Neural Inf. Process. Syst. 2021, 34, 13032–13044. [Google Scholar]
  5. Liu, M.-Y.; Tuzel, O.; Taguchi, Y. Joint Geodesic Upsampling of Depth Images. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 169–176. [Google Scholar]
  6. Dong, W.; Shi, G.; Li, X.; Peng, K.; Wu, J.; Guo, Z. Color-guided depth recovery via joint local structural and nonlocal low-rank regularization. IEEE Trans. Multimed. 2017, 19, 293–301. [Google Scholar] [CrossRef]
  7. Zhang, Y.; Ding, L.; Sharma, G. Local-linear-fitting-based matting for joint hole filling and depth upsampling of RGB-D images. J. Electron. Imaging 2019, 28, 033019. [Google Scholar] [CrossRef]
  8. Liu, W.; Chen, X.; Yang, J.; Wu, Q. Robust color guided depth map restoration. IEEE Trans. Image Process. 2016, 26, 315–327. [Google Scholar] [CrossRef] [PubMed]
  9. Zuo, Y.; Wu, Q.; Zhang, J.; An, P. Explicit edge inconsistency evaluation model for color-guided depth map enhancement. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 439–453. [Google Scholar] [CrossRef]
  10. Yang, M.; Cheng, Y.; Guang, Y.; Wang, J.; Zheng, N. Boundary recovery of depth map for synthesis view optimization in 3D video. In Proceedings of the IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2019; pp. 1–4. [Google Scholar]
  11. Wang, H.; Yang, M.; Lan, X.; Zhu, C.; Zheng, N. Depth map recovery based on a unified depth boundary distortion model. IEEE Trans. Image Process. 2022, 31, 7020–7035. [Google Scholar] [CrossRef] [PubMed]
  12. Yang, J.; Ye, X.; Li, K.; Hou, C.; Wang, Y. Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. Image Process. 2014, 23, 3443–3458. [Google Scholar] [CrossRef] [PubMed]
  13. Hui, T.-W.; Loy, C.C.; Tang, X. Depth map super-resolution by deep multi-scale guidance. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 353–369. [Google Scholar]
  14. Ham, B.; Cho, M.; Ponce, J. Robust image filtering using joint static and dynamic guidance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4823–4831. [Google Scholar]
  15. Xie, J.; Feris, R.S.; Sun, M.-T. Edge-guided single depth image super resolution. IEEE Trans. Image Process. 2016, 25, 428–438. [Google Scholar] [CrossRef] [PubMed]
  16. Jiang, Z.; Hou, Y.; Yue, H.; Yang, J.; Hou, C. Depth super-resolution from RGB-D pairs with transform and spatial domain regularization. IEEE Trans. Image Process. 2018, 27, 2587–2602. [Google Scholar] [CrossRef] [PubMed]
  17. Diebel, J.; Thrun, S. An application of Markov random fields to range sensing. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 291–298. [Google Scholar]
  18. Park, J.; Kim, H.; Tai, Y.-W.; Brown, M.S.; Kweon, I. High quality depth map upsampling for 3D-TOF cameras. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1623–1630. [Google Scholar]
  19. Liu, W.; Chen, X.; Yang, J.; Wu, Q. Variable bandwidth weighting for texture copy artifact suppression in guided depth upsampling. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2072–2085. [Google Scholar] [CrossRef]
  20. Zhong, Z.; Liu, X.; Jiang, J.; Zhao, D.; Ji, X. Guided depth map super-resolution: A survey. ACM Comput. Surv. 2023, 55, 1–36. [Google Scholar] [CrossRef]
  21. Hou, Y.; Zhang, H.; Yuan, X.; Su, Z.; Huang, F. NLH: A blind pixel-level non-local method for real-world image denoising. IEEE Trans. Image Process. 2020, 29, 5121–5135. [Google Scholar] [CrossRef]
  22. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  23. Middlebury Datasets. Available online: http://vision.middlebury.edu/stereo/data/ (accessed on 8 November 2024).
  24. Scharstein, D.; Pal, C. Learning conditional random fields for stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar]
  25. Hirschmüller, H.; Scharstein, D. Evaluation of cost functions for stereo matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar]
  26. Scharstein, D.; Hirschmüller, H.; Kitajima, Y.; Krathwohl, G.; Nešić, N.; Wang, X.; Westling, P. High-resolution stereo datasets with subpixel-accurate ground truth. In Proceedings of the German Conference on Pattern Recognition (GCPR), Münster, Germany, 2–5 September 2014; pp. 31–42. [Google Scholar]
  27. Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGB-D images. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; pp. 746–760. Available online: https://cs.nyu.edu/fergus/datasets/ (accessed on 12 March 2025).
  28. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 196–202. [Google Scholar]
  29. Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers, 6th ed.; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  30. Chen, H.; Zendehdel, N.; Leu, M.C.; Yin, Z. Fine-grained activity classification in assembly based on multi-visual modalities. J. Intell. Manuf. 2023, 35, 2215–2233. [Google Scholar] [CrossRef]
Figure 1. Method framework.
Figure 2. Visual results of the proposed distance map on a selected part of Art. (a) Ground truth depth map. (b) Guidance color image. (c) Patch-wise distance map. (d) Patch-wise distance histogram. (e) Pixel-wise (row-wise) distance map. (f) Pixel-wise (row-wise) distance histogram.
Figure 3. Examples of an ablation study on selected regions of two images, Art and Moebius, in the first and second rows, from the Middlebury 2005 dataset [23,24,25]. (a) Color image, (b) depth map, (c) result from Configuration 1, (d) result from Configuration 2, (e) result from Configuration 3.
Figure 4. Visual results of the proposed algorithm on three images (top: Moebius, middle: Laundry, bottom: Reindeer) from the Middlebury 2005 dataset [23,24,25] with different sample rates. (a) High-resolution color image, (b) corresponding depth map, (c) results with 2× sample rate, (d) results with 4× sample rate, (e) results with 8× sample rate.
Figure 5. Experimental results of 4× upsampling on Art from the Middlebury 2005 dataset [23,24,25]. (a) Guidance color image and zoomed portion of selected red box. (b) Ground truth depth map and zoomed portion of selected box. (c) Adaptive bandwidth. (d1) Result of AR [12]. (d2) Result of AB-AR. (e1) Result of RCG [8]. (e2) Result of AB-RCG. (f1) Result of LLFM [7]. (f2) Result of AB-LLFM.
Figure 6. Experimental results of 4× upsampling on Art from the Middlebury 2005 dataset [23,24,25]. (a) Guidance color image and zoomed portion of selected red box. (b) Ground truth depth map and zoomed portion of selected box. (c) Adaptive bandwidth. (d) Result of local model. (e) Result of nonlocal model. (f) Result of proposed method (local + nonlocal).
Figure 7. Experimental results of 4× upsampling on Art (left column) and Moebius (right column) from the Middlebury 2005 dataset [23,24,25]. (a) Guidance color image and zoomed portion of selected red box. (b) Ground truth depth map and zoomed portion of selected box. (c) Result of AR [12]. (d) Result of RCG [8]. (e) Result of LLFM [7]. (f) Result of proposed method.
Figure 8. Experimental results of real Kinect data from the NYU dataset [27] in two columns. (a) Guidance color image and zoomed portion of selected red box. (b) Registered raw depth map and zoomed portion of selected box. (c) Result of AR [12]. (d) Result of RCG [8]. (e) Result of LLFM [7]. (f) Result of proposed method.
Table 1. Average MAE and RMSE results for each component in the adaptive bandwidth formulation.

Adaptive Bandwidth | MAE | RMSE
Configuration 1: Local variance | 3.15 | 4.57
Configuration 2: Local variance + Patch Similarity | 1.64 | 2.44
Configuration 3: Local variance + Patch Similarity + Color Gradient (Proposed) | 0.63 | 0.93
Table 2. MAE results for 2×, 4×, and 8× upsampling on images from the Middlebury 2005 dataset [23,24,25]. For AR [12], RCG [8], and LLFM [7], results are reported without and with the adaptive bandwidth (AB).

Image | AR [12] w/o AB | AR [12] w/ AB | RCG [8] w/o AB | RCG [8] w/ AB | LLFM [7] w/o AB | LLFM [7] w/ AB | Proposed

2× sampling rate
Art | 1.17 | 1.03 | 0.71 | 0.65 | 0.69 | 0.59 | 0.55 (−20%)
Moebius | 0.95 | 0.87 | 0.55 | 0.45 | 0.57 | 0.55 | 0.43 (−21%)
Books | 0.98 | 0.90 | 0.57 | 0.51 | 0.54 | 0.48 | 0.42 (−22%)
Laundry | 1.00 | 0.88 | 0.54 | 0.49 | 0.61 | 0.60 | 0.41 (−24%)
Reindeer | 1.07 | 0.93 | 0.57 | 0.55 | 0.55 | 0.52 | 0.49 (−10%)
W | 0 | - | 0 | - | 0 | - | -

4× sampling rate
Art | 1.70 | 1.58 | 1.06 | 0.97 | 0.98 | 0.89 | 0.78 (−20%)
Moebius | 1.20 | 1.07 | 0.76 | 0.67 | 0.75 | 0.70 | 0.63 (−16%)
Books | 1.22 | 1.11 | 0.78 | 0.70 | 0.71 | 0.68 | 0.58 (−18%)
Laundry | 1.31 | 1.15 | 0.77 | 0.69 | 0.80 | 0.73 | 0.61 (−20%)
Reindeer | 1.30 | 1.13 | 0.80 | 0.71 | 0.72 | 0.63 | 0.59 (−18%)
W | 0 | - | 0 | - | 0 | - | -

8× sampling rate
Art | 2.93 | 2.75 | 1.72 | 1.66 | 1.68 | 1.49 | 1.37 (−18%)
Moebius | 1.79 | 1.58 | 1.15 | 1.08 | 1.25 | 1.18 | 0.99 (−13%)
Books | 1.74 | 1.64 | 2.18 | 2.11 | 2.09 | 1.97 | 1.73 (−17%)
Laundry | 1.97 | 1.73 | 1.12 | 1.05 | 1.15 | 1.03 | 0.96 (−14%)
Reindeer | 2.03 | 1.89 | 1.14 | 1.05 | 1.08 | 0.99 | 0.93 (−13%)
W | 0 | - | 0 | - | 0 | - | -
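
The W entries in Tables 2 and 4–6 appear to summarize statistical comparisons based on the Wilcoxon signed-rank test [28]. As an illustration of how such a paired comparison might be run, the sketch below applies SciPy's implementation to the per-image 2× MAE values of AR [12] without AB and of the proposed method from Table 2; the pairing and two-sided alternative are assumptions for the example, not necessarily the exact protocol used in this work.

```python
from scipy.stats import wilcoxon

# Per-image 2x MAE values from Table 2 (Art, Moebius, Books, Laundry, Reindeer).
ar_without_ab = [1.17, 0.95, 0.98, 1.00, 1.07]
proposed      = [0.55, 0.43, 0.42, 0.41, 0.49]

# Paired, two-sided Wilcoxon signed-rank test on the per-image differences.
stat, p_value = wilcoxon(ar_without_ab, proposed)
print(f"W statistic = {stat}, p-value = {p_value:.4f}")
```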
Table 3. Average computational time (in seconds) for sample images using methods listed in Table 2.

AR [12] w/o AB | AR [12] w/ AB | RCG [8] w/o AB | RCG [8] w/ AB | LLFM [7] w/o AB | LLFM [7] w/ AB | Proposed
231.34 | 268.21 | 205.09 | 246.81 | 169.40 | 201.54 | 197.63
Table 4. RMSE results from the Middlebury 2005 dataset [23,24,25].

Method | Art | Dolls | Laundry | Moebius | Reindeer | Books | Avg | W
MSG [13] | 5.84 | 2.08 | 3.89 | 2.27 | 4.62 | 2.94 | 3.61 | 0
SDF [14] | 4.14 | 1.52 | 2.53 | 1.72 | 3.05 | 1.98 | 2.49 | 0
LLFM [7] | 3.31 | 1.48 | 2.75 | 1.34 | 3.51 | 1.78 | 2.36 | 0
RCG [8] | 3.56 | 1.42 | 2.91 | 2.05 | 3.65 | 1.65 | 2.76 | 0
AR [12] | 4.07 | 1.52 | 2.70 | 1.64 | 2.86 | 2.18 | 2.50 | 0
EG [15] | 4.16 | 1.53 | 2.68 | 1.56 | 3.30 | 2.15 | 2.56 | 0
LN [6] | 2.62 | 1.03 | 1.66 | 1.02 | 2.14 | 1.47 | 1.66 | 0
DTSR [16] | 1.57 | 0.87 | 0.98 | 0.62 | 1.19 | 1.05 | 1.04 | 2
Proposed | 1.32 | 0.93 | 0.76 | 0.58 | 1.02 | 0.98 | 0.93 | -
Table 5. MAE results from the Middlebury 2014 dataset [23,26].

Method | Couch | Motorcycle | Pipes | Recycle | Sticks | Sword1 | Avg | W
LN [6] | 3.25 | 2.77 | 4.20 | 1.40 | 1.77 | 3.10 | 2.75 | 0
LLFM [7] | 3.10 | 2.63 | 4.08 | 1.15 | 1.78 | 3.06 | 2.63 | 0
RCG [8] | 3.33 | 2.76 | 4.18 | 1.17 | 1.98 | 3.18 | 2.77 | 0
EIEF [9] | 2.99 | 2.40 | 3.60 | 1.12 | 1.56 | 2.84 | 2.41 | 0
DBR [10] | 2.71 | 2.45 | 3.54 | 1.29 | 1.30 | 2.51 | 2.30 | 0
UDBD [11] | 2.51 | 2.42 | 3.59 | 0.80 | 1.90 | 2.45 | 2.28 | 2
Proposed | 2.30 | 2.36 | 3.28 | 0.95 | 1.15 | 2.23 | 2.04 | -
Table 6. RMSE results from the Middlebury 2014 dataset [23,26].

Method | Couch | Motorcycle | Pipes | Recycle | Sticks | Sword1 | Avg | W
LN [6] | 9.86 | 7.13 | 9.62 | 4.47 | 3.18 | 8.40 | 7.11 | 0
LLFM [7] | 11.59 | 7.45 | 10.14 | 4.60 | 3.98 | 9.33 | 7.84 | 0
RCG [8] | 11.66 | 7.78 | 11.76 | 3.98 | 4.02 | 9.69 | 8.14 | 1
EIEF [9] | 11.32 | 7.30 | 10.16 | 4.77 | 3.84 | 10.27 | 7.94 | 0
DBR [10] | 9.68 | 7.40 | 9.39 | 4.74 | 3.06 | 8.92 | 7.20 | 0
UDBD [11] | 11.22 | 7.24 | 9.43 | 3.98 | 4.71 | 8.29 | 7.50 | 1
Proposed | 9.56 | 7.07 | 9.23 | 4.00 | 2.94 | 8.01 | 6.80 | -
