Uncertainty Quantification of Neural Reflectance Fields for Underwater Scenes

Abstract: Neural radiance fields and neural reflectance fields are novel deep learning methods for generating novel views of 3D scenes from 2D images. To extend neural scene representation techniques to complex underwater environments, beyond neural reflectance fields underwater (BNU) was proposed; it considers the relighting conditions of on-board light sources by using neural reflectance fields, and approximates the attenuation and backscatter effects of water with an additional constant. Because the quality of the neural representation of underwater scenes is critical to downstream tasks such as marine surveying and mapping, model reliability should be considered and evaluated. However, current neural reflectance models lack the ability to quantify the uncertainty of underwater scenes that are not directly observed during training, which hinders their widespread use in underwater unmanned autonomous navigation. To address this issue, we introduce an ensemble strategy to BNU that quantifies cognitive uncertainty in color space and in unobserved regions through the expectation and variance of RGB values and of the termination probabilities along each ray. We also employ a regularization method to smooth the density of the underwater neural reflectance model. The effectiveness of the present method is demonstrated in numerical experiments.


Introduction
A 3D reconstruction of underwater scenes is central to marine environmental studies. Neural radiance fields (NeRFs) [1] are a recently emerging method for synthesizing novel views from 2D images based on volume rendering and deep learning. Since their inception, NeRFs have drawn tremendous attention in the 3D reconstruction area due to their flexibility in processing complex scenes. Nonetheless, the original version of NeRFs was proposed for natural, ambient-light, clear-air conditions. For underwater environments, where images are acquired by autonomous underwater vehicles (AUVs), two intractable problems arise when using the conventional NeRF method: (i) the scattering media in the ocean lead to attenuation and backscatter effects on the light and distort the true color; (ii) the illumination of on-board light sources results in relighting of the scene's appearance. To tackle these challenges, beyond neural reflectance fields underwater (BNU) [2] was proposed to model the effects of water as a combination of attenuation and backscatter. Additionally, it employs neural reflectance fields [3] in place of neural radiance fields to learn the scene's reflection properties and simulate the impact of lighting variations on the scene's appearance. BNU has made a great contribution to neural volume rendering and has paved the way for its application in novel view synthesis from underwater imagery.
Due to the complexity of the underwater environment, as well as the lack of cognitive knowledge about unobserved parts of the scene, there is inherent uncertainty when modeling underwater optical effects. Such cognitive uncertainty in underwater neural scene representation directly influences the downstream tasks of AUV navigation, such as the automatic identification of damage in underwater infrastructure, target detection and tracking, mapping and motion planning, etc. [4][5][6]. By quantifying uncertainty, we can assess the reliability of the model and thus reduce the risk of failures in the aforementioned downstream tasks under real-world underwater exploration conditions that are subject to probabilistic constraints [7].
Uncertainty quantification is a highly active research area in machine learning [8]. Numerous techniques have been proposed, such as Bayesian neural networks [9,10], Gaussian processes, neural processes, Monte-Carlo dropout [11], and ensemble networks [12]. Inspired by the aforementioned research, some scholars have begun to analyze the uncertainties of neural radiance fields [13]. However, to the authors' best knowledge, quantifying the uncertainty associated with neural reflectance fields, particularly for underwater scenes, has not been reported in the existing literature.
To fill this research gap, the present study extends the uncertainty quantification of neural radiance fields [13] to neural reflectance fields [2] based on BNU. We take an ensemble learning approach that not only averages rendered RGB images, but also introduces an additional cognitive uncertainty term built from the termination probability of each ray, in order to infer the cognitive uncertainty arising from the absence of knowledge about the unobserved parts of the scene during training [13]. In addition, we employ a regularization method that smooths the density amplitude to test the model's ability to infer uncertainty [14].
In this work, we demonstrate that our model can explicitly infer uncertainty in both synthetic and real-world underwater scenes. Moreover, the model exhibits superior performance in key metrics related to image quality and reconstruction error. In summary, our research makes the following contributions:

• For the first time, uncertainty quantification is introduced to neural reflectance fields of underwater scenes, enabling us to analyse the reliability and enhance the robustness of the model.

• The regularization proposed in Ref. [14] is incorporated into BNU.

• Our uncertainty quantification framework strictly follows the volume rendering procedure and does not require changes to the underlying code architecture.

Underwater Neural Scene Representation
Neural scene representation is a method for encoding and reconstructing 3D scenes based on neural networks. Among such methods, NeRFs have gained increasing popularity due to their high-quality and realistic image rendering results. Numerous works have followed in the steps of NeRFs and have expanded the original framework from different perspectives. However, NeRFs and their variants overlook the strong influence of the medium on object appearance in underwater conditions. WaterNeRF [15] takes this as a starting point and utilizes mip-NeRF 360 [16] to simulate underwater scenes, where absorption and backscatter coefficients are learnt by optimizing the Sinkhorn loss between rendered images and histogram-equalized images. Unlike WaterNeRF, SeaThru-NeRF [17] introduces a scattering image formation model to capture the impact of the underwater medium on imaging by assigning separate color and density parameters to objects and the medium, modeling the effects of natural ambient light in shallow water. Additionally, a typical physics-based reconstruction of the optical scene in shallow underwater environments is discussed in [18,19], where the impact of optical effects on underwater images is predicted by feeding the images into an underwater wave propagation model. It is noteworthy that underwater scenes depend strongly on the water depth. Shallow-water scenes are influenced by sunlight, while deeper water layers prevent the penetration of sunlight. In deep-sea environments, illumination relies primarily on searchlight beams attached to the underwater vehicle, and the influence of natural ambient light is negligible. Therefore, as the underwater vehicle moves, the scene's appearance changes with the lighting conditions. To adapt to these changes, it is necessary to model the reflection properties of the scene using neural reflectance fields [3]. For example, BNU has utilized a neural reflectance field model to learn the albedo, surface normal, and volume density of underwater environments [2]. By jointly learning the medium effects and the neural scene representation, it recovers the true colors of underwater images and achieves high-quality rendering under new lighting conditions.

Uncertainty Estimation in Deep Learning
Uncertainty estimation is a research topic of theoretical and practical importance across many domains [20,21]. Characterizing the uncertainty of neural networks not only enhances the interpretability of model outputs, but also reduces the risk of severe failures. Early research proposed Bayesian neural networks (BNNs) [22], which introduce probability distributions and estimate the uncertainty of network weights and outputs through probabilistic modeling. However, training BNNs typically requires significant modifications to the network architecture and training process, which is computationally expensive. As such, several lower-complexity and more practical strategies have been proposed to incorporate uncertainty estimation into deep neural networks, such as Monte-Carlo dropout and deep ensembles. MC-dropout-based methods [23][24][25] introduce randomness on the intermediate neurons of the network. Such methods perform multiple forward propagations of the model during testing, randomly discarding some neurons in each forward propagation to obtain different predictions, which are then aggregated into an uncertainty estimate for a given input. However, the need to perform multiple forward propagations increases the computational cost of inference. Deep ensemble strategies [26] train a finite ensemble of independent deep-learning models with deliberate differences, such as different initializations, different training data sampling, or different model structures; this kind of method can provide more comprehensive and reliable uncertainty estimation, with better robustness than other methods, especially on complex noisy data. Refs. [27][28][29] adopted model order reduction methods combined with deep learning to accelerate the sampling procedure in uncertainty quantification.

Uncertainty Estimation in Neural Radiance Fields
Recently, several works have explored the possibility of applying uncertainty estimation to NeRFs. The pioneering work NeRF-W [30] used an auxiliary volumetric radiance field and a data-dependent uncertainty field to model transient elements and reduce the impact of transient objects on static scene representations. S-NeRF [31] uses a Bayesian approach to model the probability distribution of all possible radiance fields in the scene and quantifies the information uncertainty provided by the model through this distribution. CF-NeRF [32] combines latent variable modeling and conditional normalizing flows to flexibly learn the distribution of radiance fields in a data-driven manner, thereby obtaining reliable uncertainty estimation while maintaining the expressive power of the model. ActiveNeRF [33] proposes an active learning scheme, which selects samples that yield the maximum information gain by evaluating the reduction in uncertainty over the entire scene. In this way, the quality of novel view synthesis can be improved with minimal additional resource investment. However, the aforementioned NeRF-based methods cannot make inferences explicitly in the absence of knowledge about unknown scene regions, which leads to a high degree of uncertainty [7]. To address this issue, we introduce an additional measure of uncertainty in unknown space, i.e., we quantify the epistemic uncertainty of unknown scene regions by learning the probability of ray termination in the geometric domain.

Neural Reflectance Fields
Neural reflectance fields (Figure 1) are a method for representing the reflection properties of object surfaces. They use an MLP to approximate the albedo α = (α_r, α_g, α_b), surface normal n = (n_x, n_y, n_z), and volume density σ at a given 3D position x = (x, y, z), which is mapped with the hash encoding γ. The fields implicitly represent the interaction of rays with the object's surface. Combined with a physically based differentiable ray marching framework, they can accurately model the appearance of real scenes with complex geometry and reflection characteristics. The neural reflectance field is oriented toward arbitrary illumination conditions, improving the MLP's ability to learn high-frequency information and compensating for the shortcoming of the NeRF, which only synthesizes novel views under fixed illumination and does not allow for relighting tasks.
The modeling process of the neural reflectance field is expressed in Equation (1). In the following, we use the function f_θ to denote a deep neural network (MLP) with parameters θ:

(α, n, σ) = f_θ(γ(x)). (1)
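To make the mapping in Equation (1) concrete, the NumPy sketch below implements a toy f_θ. The layer sizes, activations (sigmoid albedo, unit-normalized normal, softplus density), and the simple sinusoidal stand-in for the hash encoding γ are our own illustrative assumptions, not the architecture used by BNU.

```python
import numpy as np

def hash_encode(x, n_freqs=4):
    # Illustrative sinusoidal encoding standing in for the hash encoding gamma;
    # a real multiresolution hash grid is considerably more involved.
    freqs = 2.0 ** np.arange(n_freqs)
    parts = [fn(f * np.pi * x) for f in freqs for fn in (np.sin, np.cos)]
    return np.concatenate(parts, axis=-1)

class ReflectanceFieldMLP:
    """Toy f_theta mapping an encoded 3D point to (albedo, normal, density)."""

    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, 7))  # 3 albedo + 3 normal + 1 density

    def __call__(self, x):
        h = np.maximum(hash_encode(x) @ self.W1, 0.0)         # ReLU hidden layer
        out = h @ self.W2
        albedo = 1.0 / (1.0 + np.exp(-out[..., 0:3]))         # sigmoid -> (0, 1)
        normal = out[..., 3:6]
        normal = normal / np.linalg.norm(normal, axis=-1, keepdims=True)
        sigma = np.log1p(np.exp(out[..., 6]))                 # softplus -> >= 0
        return albedo, normal, sigma
```

With n_freqs = 4 the encoded input has 3 × 2 × 4 = 24 dimensions, so in_dim = 24.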

Beyond NeRFs Underwater (BNU)
As illustrated in Figure 2, BNU considers the effects of optical properties such as attenuation and backscatter on underwater imaging. The radiance L_λ captured along the camera ray x = o − tω, which starts at the position o and travels in the direction ω, can be expressed as follows:

L_λ = T^n_λ ∫_{d_n}^{d_f} T_λ(x) σ(x) l_λ(x) dt + S_λ, (3)

where the integration along the ray is restricted to the near bound d_n and the far bound d_f; λ denotes the wavelength; S_λ is the backscatter; σ(x) is the volume density predicted by the MLP at the point x; and T^n_λ is the transmittance between the camera plane and d_n, which is written as

T^n_λ = exp(−β_λ d_n), (4)

where β_λ and S_λ stand for the attenuation coefficient and backscatter (both independent of spatial position throughout the scene reconstruction process), respectively. The scattered radiance l_λ along the ray from x to o, which appears in the integrand, is expressed as follows:

l_λ(x) = α_λ(x) ∫_{S²} I_λ(x, ω_i) (n(x) · ω_i) dω_i, (5)

where S² represents the spherical domain around the point x and I_λ is the radiance incident at x from the direction ω_i. The albedo α_λ(x) and normal n(x) are the reflection properties at x obtained from the neural network prediction.
T_λ(x) represents the cumulative transmittance of the ray from x to d_n:

T_λ(x) = exp(−∫_{d_n}^{t} σ_λ(x(s)) ds), (6)

where σ_λ(x) = σ(x) + β_λ denotes the attenuation coefficient. We used numerical methods to estimate the continuous integral of Equation (3). First, [d_n, d_f] was divided into N uniformly distributed intervals through stratified sampling, and one sample was then drawn at random from each interval:

t_i ∼ U[d_n + ((i − 1)/N)(d_f − d_n), d_n + (i/N)(d_f − d_n)], i = 1, …, N. (7)

Although we use a set of discrete samples to estimate the integral, stratified sampling ensures a continuous scene representation: it divides the input space into multiple subregions so as to comprehensively capture the characteristics of the input space, enabling the MLP to be evaluated at continuous positions.
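The stratified sampling step can be sketched in a few lines of NumPy: one uniform sample is drawn from each of the N equal-width subintervals of [d_n, d_f]. The RNG seeding is illustrative.

```python
import numpy as np

def stratified_samples(d_n, d_f, N, rng=None):
    """Draw one random depth t_i from each of N uniform intervals of [d_n, d_f]."""
    rng = rng or np.random.default_rng(0)
    edges = np.linspace(d_n, d_f, N + 1)          # interval boundaries
    u = rng.uniform(size=N)                       # one draw per interval
    return edges[:-1] + u * (edges[1:] - edges[:-1])
```

Because each draw stays inside its own interval, the returned depths are strictly increasing along the ray.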
We utilized this set of discrete samples to estimate L_λ by using the quadrature rule discussed in [34] for volume rendering:

L̂_λ = T^n_λ Σ_{i=1}^{N} T_i (1 − exp(−σ_λ(x_i) δ_i)) l_λ(x_i) + S_λ, (8)

where δ_i = t_{i+1} − t_i is the distance between adjacent samples and

T_i = exp(−Σ_{j=1}^{i−1} σ_λ(x_j) δ_j). (9)

Upon substituting Equation (9) into Equation (8), we obtain a discrete estimate of Equation (3) that allows us to compute L_λ for all rays x passing through the camera center and the image plane in order to render an image. In the following, we use the symbol L_{θ_k} to denote the estimated radiance of the MLP network f_{θ_k} along the ray x.
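A hedged NumPy sketch of this quadrature follows, assuming BNU's additive-constant backscatter approximation and a constant attenuation coefficient β_λ. The parameter names (beta, S) and the padding of the last sample spacing δ are our own choices.

```python
import numpy as np

def render_ray(sigma, radiance, t, beta=0.05, S=0.02, d_n=0.0):
    """Quadrature estimate of the radiance along one ray.

    sigma: (N,) volume densities; radiance: (N, 3) scattered radiance l(x_i);
    t: (N,) sample depths; beta: constant water attenuation coefficient
    (assumed); S: constant backscatter term (assumed, per the BNU
    approximation); d_n: near bound of the integration.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    radiance = np.asarray(radiance, dtype=np.float64)
    sigma_att = sigma + beta                              # attenuated density
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))    # sample spacings
    alpha = 1.0 - np.exp(-sigma_att * delta)              # per-sample opacity
    T = np.exp(-np.cumsum(np.concatenate([[0.0], sigma_att[:-1] * delta[:-1]])))
    T_n = np.exp(-beta * d_n)                             # camera-to-d_n transmittance
    w = T_n * T * alpha                                   # ray termination weights
    return (w[:, None] * radiance).sum(axis=0) + S, w
```

The termination weights w sum to at most one; their accumulated sum is the quantity reused later as the ray termination probability.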
Also, in a similar manner to neural radiance fields, the rendering above is divided into coarse rendering and refined rendering. The final output comes from the refined rendering, while the coarse rendering is mainly used for loss calculation. The main difference between them is that coarse rendering uses the coarse volume density σ(x) predicted directly by the neural network, while refined rendering uses the exact volume density m_o(x)σ(x), where m_o(x) = sigmoid(3(σ(x) − 3)).
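The exact volume density used for refined rendering follows directly from the coarse prediction:

```python
import numpy as np

def refined_density(sigma):
    """Exact volume density m_o(x) * sigma(x), with m_o(x) = sigmoid(3 * (sigma(x) - 3))."""
    sigma = np.asarray(sigma, dtype=np.float64)
    m_o = 1.0 / (1.0 + np.exp(-3.0 * (sigma - 3.0)))
    return m_o * sigma
```

Since m_o(x) ∈ (0, 1), the refined density never exceeds the coarse one; small coarse densities are suppressed almost entirely, while densities well above 3 pass through nearly unchanged.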

Ensembles for Predictive RGB Uncertainty
We fed the same training data into an ensemble of neural network models {f_{θ_k}}_{k=1,…,M}, each of which predicts the radiance L_{θ_k}(x) along a ray x through the rendering procedure above. As can be seen from Equation (2), each ensemble member is divided into three main parts: the albedo field, the surface normal field, and the volume density field.
where k = 1, …, M indexes the ensemble members. By averaging the color of each pixel during the rendering process, the expected radiance of the camera ray x is

μ_RGB(x) = (1/M) Σ_{k=1}^{M} L_{θ_k}(x).

We denote the prediction uncertainty in RGB space as the variance of the individual network predictions:

σ²_RGB(x) = (1/M) Σ_{k=1}^{M} (L_{θ_k}(x) − μ_RGB(x))².

μ_RGB and σ²_RGB each have three components over the RGB color channels. To simplify the calculations, we assumed that the three color channels R, G, and B are independent of each other, disregarding their inter-channel relationships and retaining only the primary information. In addition, for the convenience of subsequently combining the color variance with other uncertainty terms such as σ²_epi(x) into a unified uncertainty metric ψ²(x), we merged the variances of the three color channels and used a single value to represent the overall RGB uncertainty.
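The ensemble mean and the merged single-value RGB variance can be sketched as follows; averaging the three per-channel variances is one simple way to realize the single-value reduction described above.

```python
import numpy as np

def rgb_uncertainty(preds):
    """Ensemble color statistics.

    preds: (M, H, W, 3) rendered RGB images from M ensemble members.
    Returns the per-pixel mean color mu_RGB and one scalar variance per
    pixel, obtained by averaging the per-channel variances (channels
    treated as independent).
    """
    mu = preds.mean(axis=0)            # (H, W, 3) expected radiance
    var_c = preds.var(axis=0)          # (H, W, 3) per-channel variance
    sigma2_rgb = var_c.mean(axis=-1)   # (H, W) merged RGB uncertainty
    return mu, sigma2_rgb
```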

Ensembles for Epistemic Uncertainty in Unseen Areas
Using the expectation and variance of RGB values in color space to quantify prediction uncertainty is a simple and partially effective approach.However, these values only reflect the uncertainty of color predictions.They cannot measure cognitive uncertainty about unobserved scenes during training.
From our observations, since the prediction model lacks relevant data to learn features of unobserved regions during training, it does not know the exact shape and color of those regions and thus cannot provide meaningful termination probability predictions. As a result, the model assigns very low termination probabilities to each sample point on ray x, and the accumulated termination probability q_{θ_k}(x) along the ray approaches zero [13]. Hence, we can regard the termination probabilities along the ray x as a way for the model to express cognitive uncertainty about the unknown regions of the scene.
We averaged the accumulated termination probabilities along ray x across the entire ensemble as follows:

q(x) = (1/M) Σ_{k=1}^{M} q_{θ_k}(x),

where q(x) ≈ 1 indicates that the ray intersects scene structure observed during training, and q(x) ≈ 0 indicates that it does not.
In order to capture the prediction uncertainty along ray x as comprehensively as possible, we expressed the total uncertainty as a combination of the RGB variance σ²_RGB and a cognitive uncertainty term σ²_epi as follows:

ψ²(x) = σ²_RGB(x) + σ²_epi(x),

where

σ²_epi(x) = (1 − q(x))².

Moreover, the predicted colors of the ensemble along ray x were modeled as Gaussian distributions with diagonal covariance matrices:

ĉ(x) ∼ N(μ_RGB(x), ψ²(x) I).

Simply using the expectation and variance of RGB values in color space would overlook the cognitive knowledge about scenes unobserved during training, whereas the termination probability reflects precisely this uncertainty. The uncertainty terms σ²_RGB(x) and σ²_epi(x) are therefore complementary, capturing the aleatoric and cognitive uncertainty of different aspects of the model.
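A sketch of the combined uncertainty follows. The epistemic form (1 − q(x))² is our assumption: it is consistent with q(x) ≈ 0 in unobserved regions driving the uncertainty up, but the exact functional form should be checked against the original formulation.

```python
import numpy as np

def total_uncertainty(sigma2_rgb, q_members):
    """Combine RGB variance with the epistemic term from termination probabilities.

    sigma2_rgb: per-ray merged RGB variance.
    q_members: (M, ...) accumulated termination probabilities q_{theta_k}
    of each ensemble member along every ray.
    """
    q_mean = q_members.mean(axis=0)          # ensemble-averaged q(x)
    sigma2_epi = (1.0 - q_mean) ** 2         # large where rays saw nothing (assumed form)
    return sigma2_rgb + sigma2_epi, q_mean, sigma2_epi
```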
Additionally, volume density was identified as a significant factor influencing uncertainty in our research. Rendering with the coarse volume density σ(x) and the exact volume density m_o(x)σ(x) yields different σ²_RGB(x) and σ²_epi(x) results, respectively. Therefore, it is necessary to conduct separate experiments for both coarse rendering and refined rendering in a systematic study. We performed experiments on the scenes introduced in [2], which include a synthetic dataset and a real dataset. The synthetic images with underwater optical effects were simulated based on the Jaffe-McGlamery model [35]. Ref. [2] used a Sony ILCE-7M3 camera with a 40 mm prime lens and LED lights to collect the real images in a tank at a water depth of 1.3 m. The camera poses were acquired via COLMAP [36], and the JPEG images were post-processed to ensure high feature quality.

Framework
Our ensemble model primarily consists of three components: the normal field, the density field, and the albedo field (Figure 3). For the choice of M, we empirically selected M = 3. The ensemble model sampled 100 rays from an image and 100 points on each ray at each training iteration. The model was trained for 5000 epochs for each scene.

Metrics
Herein, we use two main types of metrics: quality metrics and uncertainty quantification metrics. For the quality metrics, we computed the mean absolute error (MAE), mean-square error (MSE), and root-mean-square error (RMSE) to evaluate the reconstruction error, and the peak signal-to-noise ratio (PSNR) was used to characterize the quality of the rendered views.

• The MAE directly calculates the average absolute error between the predicted value of the model and the ground truth. The smaller the MAE value, the more accurate the prediction.

• The MSE is calculated by squaring the differences between the ground truth and predicted values, summing them up, and then taking the average.
• The RMSE measures the deviation between the predicted values and the ground truth, and it is sensitive to outliers in the data.
• The PSNR is a metric used to measure image quality: PSNR = 10 log_10(MAX_x² / MSE), where MAX_x represents the maximum pixel value of image x.
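The four quality metrics can be computed in a few lines; MAX_x is passed as max_val, and images are assumed to lie in [0, max_val].

```python
import numpy as np

def quality_metrics(pred, gt, max_val=1.0):
    """MAE, MSE, RMSE, and PSNR between a rendered image and its ground truth."""
    err = pred.astype(np.float64) - gt.astype(np.float64)
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    rmse = np.sqrt(mse)
    psnr = 10.0 * np.log10(max_val ** 2 / mse) if mse > 0 else np.inf
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "PSNR": psnr}
```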
We used the predicted values obtained through experiments to evaluate the quality of reconstruction.It was evident that the evaluation results of the quality metrics were correlated with the accuracy of the predicted values.
For uncertainty quantification, to assess the reliability of the model's prediction uncertainty, we report two widely used metrics. First, we use the negative log likelihood (NLL) as an evaluation criterion [37]. The NLL was chosen because our ensemble model provides a Gaussian distribution for the predicted rendered colors along rays, where the expectation is the estimated color and the variance is a combination of the RGB-variance-based and cognitive uncertainty measures. In addition, to assess the correlation between prediction error and uncertainty estimates, we report the area under the sparsification error (AUSE) curve [38][39][40].

• The log likelihood function is defined as follows:

L(θ|x) = P(x|θ),

where θ denotes the unknown parameters, x is the observed sample, and P(x|θ) is the probability distribution of x given θ. The negative log likelihood (NLL) is defined as the negative of the log likelihood function: NLL(θ|x) = − ln L(θ|x) = − ln P(x|θ).
From Section 4.2, the observed sample x follows a Gaussian distribution, x ∼ N(μ, σ²), with the probability density function

P(x|μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)).

The NLL of this Gaussian distribution is then

NLL = (1/2) ln(2πσ²) + (x − μ)²/(2σ²).

• For the AUSE, the prediction error (e.g., RMSE) of each pixel was first computed from the predicted values and the ground truth. The uncertainty values obtained from training and the prediction errors of all pixels were then merged and sorted. The top 1% of the sorted data were removed, and the average error and average uncertainty of the remaining data were taken as the point at 1%. Likewise, the top 2% were removed to obtain the point at 2%, and so on until 100% was reached. This process generates two curves: the prediction error curve and the uncertainty curve. The area enclosed by the two curves is the AUSE value. A lower AUSE value indicates a higher correlation between the uncertainty estimate and the true error, implying a more reliable uncertainty estimation.
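Both evaluation metrics can be sketched compactly: the Gaussian NLL above, and an AUSE computed as the mean gap between a sparsification curve (pixels removed in order of decreasing uncertainty) and an oracle curve (pixels removed in order of decreasing true error). The mean-gap normalization and the 100-step grid are simplifying choices on our part; published AUSE variants differ in these details.

```python
import numpy as np

def gaussian_nll(x, mu, var):
    """Negative log likelihood of x under N(mu, var)."""
    return 0.5 * np.log(2.0 * np.pi * var) + (x - mu) ** 2 / (2.0 * var)

def ause(errors, uncertainties, steps=100):
    """Mean gap between the sparsification curve and the oracle curve."""
    errors = np.asarray(errors, dtype=np.float64).ravel()
    n = errors.size
    order_u = np.argsort(-np.asarray(uncertainties).ravel())  # most uncertain first
    order_e = np.argsort(-errors)                             # largest error first
    gaps = []
    for f in np.linspace(0.0, 0.99, steps):
        keep = n - int(f * n)                                 # pixels remaining
        spars = errors[order_u][-keep:].mean()                # removed by uncertainty
        oracle = errors[order_e][-keep:].mean()               # removed by true error
        gaps.append(spars - oracle)
    return float(np.mean(gaps))
```

A perfectly error-correlated uncertainty yields an AUSE of zero, while an anti-correlated one yields a strictly positive value.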
We quantified the uncertainty by means of the experimentally obtained predicted values, together with σ²_RGB(x) and σ²_epi(x). Based on the above discussion, the uncertainty quantification results of the numerical experiments are closely related to the accuracy of these values.

Results
We show the main results of our experiments in Table 1. Our ensemble strategy achieved excellent NLL and AUSE on both the synthetic and real datasets. Moreover, it exhibited outstanding performance in terms of image reconstruction error and rendering quality. An individually trained network can only learn the information contained in a limited number of training samples, whereas the ensemble model aggregates the features learned by multiple networks with different initializations and can therefore express richer information.
On the other hand, the ensemble model obtains its final prediction by averaging over the multiple networks (Figure 4), which is equivalent to sampling the dataset several times and then averaging. Statistically, the average of multiple quantification errors is no larger than the worst single quantification error, which means that this averaging can, to a certain extent, offset the effect of larger prediction errors from individual networks.
In short, by using multiple networks to predict the results, the overall prediction error can be reduced to a certain degree, which improves the quality of image reconstruction so that the PSNR and other metrics achieve better results than a single model. Individual Ensemble Component: As shown in Table 3, we experimented with implementing the ensemble strategy for individual ensemble components (Figure 5). Firstly, normals are crucial for capturing the surface details and geometric structure of objects. By ensembling multiple normal fields, it is possible to better reproduce the details and shapes of objects in scenes. This allows the model to outperform the BNU model on both the synthetic and real datasets, as it can more accurately capture the appearance characteristics of objects.
Furthermore, by ensembling multiple albedo fields, it is possible to enhance the model's ability to capture lighting and material variations.However, due to the characteristics of the synthetic data, the contribution of the albedo may be relatively small, thus resulting in a less significant impact from the ensemble of the albedo field, as well as a slightly lower performance when compared to BNU.On the other hand, the contribution of albedo was found to be greater on the real dataset, which captured the appearance features of the object more accurately and therefore outperformed BNU on the real dataset.
Then, the density field plays a key role in the propagation of light within a scene.By ensembling multiple density fields, the effect of light propagation in a scene can be better simulated, thereby providing more accurate rendering results.Since synthetic datasets usually have more idealized lighting conditions, geometric structures, and material properties than real scenes, they can lead to the overfitting of the specific features of the synthetic dataset during the training process, which do not generalize well to other synthetic scenes and result in a poorer performance on synthetic datasets.However, in the real dataset, the ensemble strategy can better simulate the propagation of light in different media, thus making it able to provide better rendering results and performance than BNU.
Finally, Table 3 lists the quality metrics for refined rendering and coarse rendering in the different models. The values were found to be very close. We therefore chose to retain more decimal places in order to investigate the effect of the coarse volume density σ(x) and the exact volume density m_o(x)σ(x) on the results, since the more accurate volume density m_o(x)σ(x) is expected to always provide better results.

Influence of Uncertainty Terms
Our uncertainty measure consists of two terms.To better understand their respective impacts, we reported the results when using only one term, as well as those obtained by combining the two terms on both synthetic and real datasets.
Additionally, from Tables 4 and 5, we can observe that, aside from the NLL corresponding to σ²_RGB(x), the metrics for coarse and refined rendering were very close. To make the differences between them more apparent, we chose to retain more decimal places. For the NLL corresponding to σ²_RGB(x), the differences were already obvious enough, so we did not expand the number of decimal places for that particular metric. All of the displayed decimal places hold physical significance, as they represent a higher precision in quantifying uncertainty and help us observe the differences in uncertainty quantification between coarse and refined rendering. Entire Ensemble Model: From Table 4, we can observe that using only σ²_RGB(x) as an uncertainty measure results in a significantly higher NLL, and that metrics such as the AUSE are inferior to the other cases. This is because the color variance in high-frequency regions is strongly influenced by individual pixel outliers. A high deviation in a single pixel can lead to high variance across the entire image, especially in scenes unobserved during training. In contrast, the cognitive uncertainty term σ²_epi(x) achieves excellent results in most scenarios. This is attributed to the fluctuations in the individual q_{θ_k}(x), whereby the ensemble model assigns slightly lower q_{θ_k}(x) to rays with higher uncertainty.
Using σ²_epi(x) allows us to differentiate the model's level of understanding between known and unknown regions, effectively capturing cognitive uncertainty in unobserved areas. Compared with the color variance, the termination probability directly reflects the model's judgment on an entire ray entering the scene. Hence, it is less affected by individual outliers and more stably reflects the model's knowledge level.
The reason for combining both components, rather than relying on a single one, is that the color variance σ²_RGB(x) can effectively capture hard-to-model noise uncertainty caused by factors such as sampling and occlusion in known regions. Additionally, in simple scenes with homogeneous color distributions, the color variance can better express uncertainty in known regions and can be computed directly in the image pixel space. On the other hand, for unknown regions, the uncertainty term σ²_epi(x) supplements the cognitive uncertainty information that the color variance cannot express well. By combining these two uncertainty terms, a more comprehensive and accurate uncertainty judgment is obtained.
Individual Ensemble Component Frameworks: We experimented with individual ensemble component frameworks (Figure 6). According to Table 5, the inference ability of the uncertainty measure ψ²(x) performs differently on the synthetic and real datasets when we focus on a specific ensemble component. Intuitively, performance degradation was observed on the synthetic dataset, while performance enhancement was observed on the real dataset. This difference stems from the complexity of the data itself. Synthetic scenes are relatively simple with few lighting variations; the multi-component ensemble uses multiple sources of information for greater expressiveness, leading to more pronounced advantages. In contrast, on the real dataset, due to the complexity of the data, such as large lighting variations, the inference of uncertainty depends more on individual components. To provide further illustration, we present the impact of lighting variations on the appearance at the same location in both the synthetic and real data in Figure 7.
From Figure 7, we can observe that lighting variations have a significant impact on appearance in the real dataset, while the effect is relatively small in the synthetic dataset. A multi-component ensemble approach spreads learning resources evenly across multiple tasks, making it difficult to optimize a particular component in more depth. Therefore, focusing on a specific ensemble component allows for better adaptation to the complex distribution of real data. In addition, we can notice that, when ensembling multiple density fields, the uncertainty measure ψ²(x) exhibits superior performance compared to ensembling multiple normal fields or multiple albedo fields. This is because the termination probability determines whether the propagation of light in the scene terminates or not, and the termination probability is closely related to the density, as the density reflects the extent of the presence of objects in the scene. By considering the relationship between termination probability and density, the model can better estimate the light propagation within the scene and capture the degree of object presence, thereby effectively handling and quantifying the uncertainty of the unobserved parts of the scene.

Discussion
From the above study, we can see that the fluctuation caused by a single q_{θ_k}(x) allows the cognitive uncertainty term σ²_epi(x) to obtain very good results in most scenes, and this fluctuation can be traced back to the fluctuation of a single density field. In other words, σ²_epi(x) depends on the individual density predictions along each ray x. However, during training, the density predictions tend to exhibit large oscillations or jumps for various reasons. We also found that our uncertainty estimation framework is highly sensitive to density, making it more likely to capture uncertainty in regions with strong oscillations. Since our goal was to assess the reliability of the uncertainty quantification model, we experimented with artificially reducing the density amplitude and fluctuations in the ray sampling region by smoothing the density curve near the camera (Table 6).
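To make the dependence of the cognitive uncertainty on the individual density predictions concrete, the following sketch (our illustrative reading, not the paper's exact implementation) computes the variance across M ensemble members of the termination probabilities along one ray; a single member with an oscillating density is enough to inflate it:

```python
import numpy as np

def ensemble_termination_variance(sigmas, deltas):
    """Variance across M ensemble members of the per-sample termination
    probabilities along one ray. sigmas has shape (M, K)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                    # (M, K)
    shifted = np.concatenate(
        [np.ones((sigmas.shape[0], 1)), 1.0 - alphas[:, :-1]], axis=1)
    trans = np.cumprod(shifted, axis=1)                        # transmittance
    weights = trans * alphas                                   # termination probs
    return weights.var(axis=0)                                 # per-sample variance

deltas = np.full(4, 0.25)
# Members that agree produce zero epistemic variance:
agree = np.tile([0.1, 1.0, 2.0, 0.5], (3, 1))
# One member with a large density jump drives the variance up:
disagree = np.stack([np.array([0.1, 1.0, 2.0, 0.5]),
                     np.array([0.1, 50.0, 2.0, 0.5])])
```

With identical members the variance is exactly zero, while the disagreeing pair yields strictly positive variance at the jump, which is why smoothing the density curve is a meaningful stress test for the framework.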
We used a regularization method [14] that penalizes the density field in the vicinity of the camera:

L_reg = λ Σ_{k=1}^{K} n_k σ_k,

where σ_k denotes the density value at the k-th of the K points sampled along the ray x from its origin, and n_k is a binary mask vector that determines whether a sampled point is penalized. We set n_k = 1 for the sampled points before the regularization range N and 0 for the rest, and we used a weight of λ = 0.01 in all experiments. We conducted this experiment for 100 rays on the same image (Figure 8); it can be seen that our uncertainty estimation framework performed well even when the density amplitude and fluctuations were artificially reduced, which demonstrates its ability to infer uncertainty.
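A minimal sketch of such a mask-based near-camera density penalty (our reading of the description above; the exact form in [14] may differ) is:

```python
import numpy as np

def near_camera_density_penalty(sigma, N, weight=0.01):
    """Penalize the densities of the first N of K samples along a ray.
    The binary mask n_k is 1 before the regularization range N and 0
    afterwards, so only density near the camera contributes."""
    n = np.zeros_like(sigma)
    n[:N] = 1.0                                   # binary mask vector
    return weight * float(np.sum(n * sigma))

# Only the first two (near-camera) densities are penalized here:
sigma = np.array([10.0, 5.0, 0.0, 2.0])
loss = near_camera_density_penalty(sigma, N=2)    # 0.01 * (10 + 5) = 0.15
```

Added to the rendering loss, this term pushes the optimizer to keep the density curve near the camera flat, which is what produces the reduced amplitudes seen in Figure 8.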

Conclusions
In this paper, we introduced an ensemble approach to quantifying uncertainty in underwater neural scene representation models with reflection properties. It quantifies prediction uncertainty in RGB space by modeling the ensemble rendering errors in the scene, and it identifies the cognitive uncertainty caused by the lack of knowledge about unknown scenes through an additional cognitive uncertainty term based on the ray termination probability. Furthermore, we artificially reduced the density amplitude to test the model's ability to capture subtle uncertainties and to validate its reliability. The numerical experiments demonstrated that our ensemble model can explicitly infer model uncertainty on both synthetic and real scenes, outperforming previous works in key metrics of reconstruction error and rendering quality. The proposed algorithm will benefit downstream tasks in ocean exploration and navigation, such as the automatic identification of damage in underwater infrastructure [18,19], target detection and tracking, mapping, and motion planning. The present work is not without limitations. For example, our model assumes that the light source is the onboard light, which is only valid in deep-sea conditions. In shallow seas, however, natural light comes into play and should be considered in the model. We will investigate this problem in future work.

Figure 4 .
Figure 4. Ensemble prediction framework. The physical interpretation represented by each color is the same as in Figure 2.

Figure 5 .
Figure 5. Ensemble models of the individual ensemble components: (a) ensemble albedo field; (b) ensemble normal field; (c) ensemble density field. Arrows point to different initialization parameters, and the physical interpretation of each color is the same as in Figure 2.

Figure 6 .
Figure 6. Uncertainty estimation framework of the individual ensemble components: (a) ensemble albedo field; (b) ensemble normal field; (c) ensemble density field.

Figure 7 .
Figure 7. Real and synthetic images of lighting variations, with the same locations marked by red boxes: (a) real; (b) synthetic.

Figure 8 .
Figure 8. We experimented with 100 rays, shown as curves of different colors; the vertical axis indicates the density values at the sampling points, and the horizontal axis indicates the sampling points along the rays. Without the regularization term, the maximum density exceeded 200; with the regularization term, it did not exceed 16.

Table 1 .
The results of our ensemble model.
5.3. Ablation Study
5.3.1. Influence of the Ensemble
Entire Ensemble Model: As can be seen from Table 2, we outperformed previous works in terms of image reconstruction error and rendering quality.

Table 2 .
Influence of the entire ensemble model.

Table 3 .
Influence of individual ensemble components.

Table 4 .
The uncertainty terms' influence on the entire ensemble framework.

Table 5 .
The uncertainty terms' influence on the individual ensemble components.