Hierarchical Bayesian Data Analysis in Radiometric SAR System Calibration : A Case Study on Transponder Calibration with RADARSAT-2 Data

A synthetic aperture radar (SAR) system requires external absolute calibration so that radiometric measurements can be exploited in numerous scientific and commercial applications. Besides estimating a calibration factor, metrological standards also demand the derivation of a respective calibration uncertainty. This uncertainty is currently not systematically determined. Here for the first time it is proposed to use hierarchical modeling and Bayesian statistics as a consistent method for handling and analyzing the hierarchical data typically acquired during external calibration campaigns. Through the use of Markov chain Monte Carlo simulations, a joint posterior probability can be conveniently derived from measurement data despite the necessary grouping of data samples. The applicability of the method is demonstrated through a case study: The radar reflectivity of DLR’s new C-band Kalibri transponder is derived through a series of RADARSAT-2 acquisitions and a comparison with reference point targets (corner reflectors). The systematic derivation of calibration uncertainties is seen as an important step toward traceable radiometric calibration of synthetic aperture radars.


Introduction
A synthetic aperture radar (SAR) system is a measurement system that acquires measurement data for earth-observation applications.Many applications like soil moisture [1] or forest biomass estimation [2] require absolutely radiometrically calibrated data products.The absolute radiometric calibration ensures that systematic effects have been characterized and are compensated so that the derived radar ground reflectivity appears to be independent of the measurement system.Not until then can the data be used for physical parameter inversion modeling.
The necessary radiometric calibration is a two-step process and is divided into relative and absolute calibration [3,4].During relative calibration, the system transfer function, antenna pattern, instrument drift (due to temperature and time), antenna mis-pointing and mis-polarization, SAR mode (Stripmap, ScanSAR, etc.), atmosphere, and processor effects are characterized and compensated.After relative calibration, the mapped reflectivities are still only given as system indications, i.e., as digital numbers derived during processing.The digital numbers are then related to the definition of a physical unit through absolute calibration.Absolute calibration is, for practical reasons and an expected higher calibration quality, executed as an external calibration for space-borne SAR systems [3].Reference point targets (typically corner reflectors or active radar calibrators [3][4][5][6]) with known radar reflectivities are distributed within an imaged scene, and their backscatters are related to the indications from the SAR system through a proportionality constant, often called calibration factor K. Usually the external calibration is repeated for a number of acquisitions, each using a number of reference point targets, in order to reduce random effects through averaging.

Problem Statement
The result of a measurement is only meaningful if, besides the estimate of the value of the measurand, a statement about the uncertainty of the estimate is given.Uncertainty is defined here as "the parameter which characterizes the dispersion of the values that could reasonably be attributed to the measurand" [7].Stating uncertainties is the prerequisite for measurement results which are traceable to national standards.Metrological traceability is necessary for the comparability of measurement results, the foundation for quantitative scientific and commercial applications.
The final, combined measurement uncertainty results from several uncertainty contributions, and the calibration uncertainty is one of them.If a SAR system is absolutely calibrated, the uncertainty with which the radar reflectivity of the reference point target is known has a direct impact on the combined uncertainty.Another source of uncertainty is the estimation of the calibration factor from external calibration measurements.Specifically, this second source of uncertainty is addressed in this paper.
Currently, the derivation of measurement uncertainties in radiometric SAR system calibration is not systematically conducted.To the best of our knowledge, no operator of a previous or current space-borne SAR system claims to provide traceable radiometric calibration, and the quoted measurement uncertainties in the respective data product specifications or other publications [5,6,[8][9][10][11] remain questionable.At the moment, a comprehensive uncertainty analysis, preferably in accordance with the ISO Guide to the expression of uncertainty in measurement (GUM) [7], is lacking.This diminishes the scientific integrity of the data products and hinders further adoption of radiometric measurement data for advanced quantitative analysis and physical parameter inversion modeling, within and across SAR missions.

Objective, Approach, and Paper Structure
One piece of the puzzle of achieving traceable radiometric SAR system calibration is the derivation of the calibration factor from external calibration acquisitions.We propose for the first time to apply Bayesian statistical data analysis and hierarchical modeling to estimate parameters like the absolute calibration factor including uncertainties and confidence intervals from SAR images.The proposed method is also applicable for similar data analysis tasks in radiometric calibration.
Bayesian data analysis has been proven advantageous in many parameter estimation problems [12,13], but to the best of our knowledge so far it has not yet been applied for absolute radiometric SAR calibration.Bayesian analysis lends itself well to hierarchical data modeling [14], and is therefore a very good fit for the estimation problems that one faces during an external SAR calibration campaign.A general but short review of the methodology and an outline of its application in the domain of absolute radiometric calibration is given in Section 2.
The suitability of the proposed method is shown through a case study in Sections 3 (campaign setup) and 4 (data analysis).The objective of the case study is conceptually identical to the derivation of the absolute calibration factor, although here the reflectivity of a point target shall be accurately determined.The presented case study is based on an external calibration campaign which was executed in April 2013.15 corner reflectors were deployed as reference targets, and the reflectivity of DLR's new C-band Kalibri transponder prototype was derived from a series of eight data acquisitions from the Canadian RADARSAT-2 SAR system.
Finally, the proposed approach is further discussed in Section 5 and conclusions are given in Section 6.

Note on Point Target RCS versus ERCS
This paper adopts the reasoning laid out in [15].According to it, a reference point target's brightness in a SAR image is not generally proportional to its radar cross section (RCS), which is a body property depending on frequency and incidence angle.Rather, a target's brightness in a SAR image depends on the measured complex amplitudes and the weighted averaging (over the chirp pulse bandwidth and the azimuth angles) executed by the SAR processor.This other quantity is called equivalent radar cross section, or ERCS.
Targets whose ERCS is accurately known can be taken as a reference to calibrate SAR systems.The aim of the presented case study is to derive the ERCS of DLR's next-generation C-band Kalibri transponder prototype so that in principle it can be taken as a calibration normal for subsequent SAR calibration campaigns.
In order to avoid confusion between the symbols for (E)RCS and standard deviation, the (E)RCS will not be denoted with the customary letter σ but with its alternative form ς .

Methodology for Parameter Estimation from SAR Data
Bayesian statistics is, like classical (frequentist) statistics, a well established field with applications in many scientific areas.It is extensively covered in the literature (see for example [12,13,16]), and the following aims to only sketch basic principles and to highlight its usefulness in the frame of absolute radiometric SAR system calibration.

Introduction to Bayesian Statistics and Numerical Methods
In Bayesian statistics, all unknown quantities are handled as random variables and are described by probability distributions, which in turn describe a measure of the state of knowledge of the parameter's value.In Bayesian analysis, the probability function of a parameter (prior) is updated by incorporating new information, i.e., data, to derive a posterior probability function.
If a population parameter (e.g., the case study transponder ERCS) is called θ , then it can be described by a probability distribution function p(θ ).Even before a measurement is done, a certain, subjective, (prior) probability density can be attributed to the parameter.In case of the transponder's ERCS, one might claim that the parameter needs to be positive and that it certainly will not be above 100 dBm 2 .If now some new data y (e.g., from the case study RADARSAT-2 campaign) becomes available, one wants to update one's believe on θ given data y.This can be written as p(θ |y) where p(•|•) describes a conditional probability.The notation follows [12].Now, Bayes' rule allows to compute the posterior probability p(θ |y) from a prior probability p(θ ) and a likelihood function p(y|θ ) [12]: The likelihood function p(y|θ ) describes how likely the data is, given the parameter θ .Describing parameters by probability distributions is a natural fit when not only the best estimate of a parameter needs to be stated, but also a confidence interval needs to be quantified, as is the case for calibration.Deriving a confidence interval from a distribution is straight forward for any kind of distribution, i.e., the analysis is not limited to Gaussian distributions.
Deriving posterior distributions is in practice mostly achieved through numerical methods, which allow to consider more complex problems and arbitrary distributions.The simulation method used in the case study is the Markov chain Monte Carlo (MCMC) approach [12].The posterior distribution is approximated by sequentially drawing samples from approximate iterative intermediate distributions until the simulation converges to the target distribution.Several software packages exist which simplify computations; here the PyMC library is used [17].

Hierarchical Models
An external radiometric calibration campaign results in a pool of data samples (see [5] for an example), one per reference point target ERCS per SAR image.Depending on the research question, the data often needs to be grouped to estimate group-level parameters (for compensation) or to select group-level models (for validation).Typical questions are: • What is the best estimate of the calibration factor (and its respective confidence interval) if several types of reference point targets (i.e., transponders and corners of different sizes) with different ERCS' and stabilities are deployed?Solving this problem with classical (frequentist) statistics would require to estimate the population mean of each group, and deriving the calibration factor after ERCS compensation between groups.
The information on the variance within each group is lost, and a reliable statement of the final uncertainty or confidence interval on the estimated calibration factor is difficult to achieve.With hierarchical Bayesian modeling though, the variance within each group (target type) and the variance across all target types can be derived simultaneously because group and total dispersion are handled within a joint probability model.• Is there a significant systematic dependence on the chosen antenna beam (or near/far range, left/right looking geometries, or ascending/descending orbits) for radiometric measurements?Once again the same set of data samples as before should be grouped, but this time by antenna beam (or near/far range, left/right looking acquisitions, or ascending/descending orbits).For each group, a posterior distribution for the respective calibration factor can now be derived.Comparing the different posterior distributions allows to conclude if a significant radiometric inter-beam offset exists.• For a check on plausibility: Is the ERCS of one of the reference point targets systematically different from the others?(Here repeated overpasses over the same set of targets is assumed.) In order to answer this question, the overpass-dependent effect of the SAR system and the atmosphere should be modeled out of the analysis.This can be done by grouping the samples according to overpass and target ID.All target samples of one overpass can be used to compensate for SAR system and atmospheric effects, and in a second step the group ERCS of each target can be determined.
In fact, all questions previously raised can be conveniently answered by setting up a single joint hierarchical model.Partial results contribute to the particular derivation of posterior distributions without loss of information.On the other hand, a classical (frequentist) approach would require independent analyses, and carrying over results from one model to the next would result in loss of information (due to the summarizing nature of the classical (frequentist) approach).
Further details on hierarchical models can be found in [12].

Case Study: Measurement Campaign Goal and Setup
The following case study applies hierarchical Bayesian data analysis to a practical problem.This Section 3 describes the campaign goal and setup, whereas the following Section 4 applies the proposed method of hierarchical Bayesian data analysis for the estimation of a parameter in absolute radiometric calibration.

Introduction and Goal
The DLR currently develops and manufactures a set of next-generation active radar calibrators (transponders) [18], see Figure 1.These Kalibri transponders will be used in the upcoming C-band Sentinel-1 SAR mission.The most critical transponder parameter is its radar reflectivity expressed in ERCS because the transponder will be used as a calibration standard for absolute radiometric calibration.Any error in the calibration of the calibration standards has a direct impact on the derived absolute calibration factor for Sentinel-1.First ERCS measurements of the Kalibri prototype were performed in DLR's compact antenna test range.The goal of the campaign was to derive the estimated transponder ERCS including its confidence interval.The data analysis, see Section 4, uses hierarchical Bayesian modeling as described above.The research question answered in the case study, derivation of the transponder ERCS ς t , is conceptually identical to the derivation of the absolute calibration factor K (linking SAR system indications P to the measurement quantity, (E)RCS) because only the roles of knowns and unknowns are reversed: Therefore, the approach used in the case study is equally applicable to the radiometric calibration of SAR systems.

RADARSAT-2 Products
The RADARSAT-2 SAR system operates at a center frequency of 5.405 GHz.The mode-dependent bandwidth goes up to 100 MHz.Center frequency and maximal bandwidth are therefore identical to those of the Sentinel-1 mission for which the Kalibri transponder was designed.
All eight RADARSAT-2 products were acquired in the Wide Ultra-Fine mode in VV polarization and delivered as single-look complex (SLC) images.The products offer a high nominal single-look geometric resolution of 1.6 m × 2.8 m (range × azimuth resolution) [19].A high-resolution mode was chosen as to increase the target-to-clutter ratio for the used point targets.Special care had to be taken in choosing the dynamic range of the processed products so that the peak of the transponder pulse response is still within the processed image's dynamic range.
An overview of all overpass times and beams is shown in Table 1.

Reference Point Targets
Within this study, the RADARSAT-2 system was considered radiometrically uncalibrated because of the mismatch between the system's specified relative radiometric calibration (<1 dB) [19] and the target uncertainty of about 0.2 dB.Therefore, reference point targets have been placed in the imaged scenes for manual, day-to-day calibration.
In total 15 trihedral corner reflectors were used as the reference point targets to derive the transponder ERCS.The comparatively large number was deemed necessary in order to profit from averaging during data analysis.Reflectors with two different inner leg lengths were deployed, resulting in two distinct values for their ERCS.
The RADARSAT-2 system operates at a center frequency of 5.405 GHz with a small relative bandwidth of 2%.Under this precondition, a corner's ERCS is, with sufficient accuracy, equal to a corner's RCS at the center frequency [15].The peak RCS of an ideal corner reflector can be estimated with the physical optics approximation where a is the corner's inner leg length and λ is the wavelength [20].The resulting values are listed in Table 2.

Target Alignment
The corner reflectors were realigned together with the transponder for each upcoming overpass, so that all corners could be used for all overpasses during analysis.The alignment angles were computed based on the predicted RADARSAT-2 orbit for the respective overpass and the point target location (latitude, longitude).
An accurate corner alignment is necessary because the peak RCS (i.e., at perfect alignment) is taken as the reference value for the computation of the transponder ERCS.The corner reflector RCS is comparably insusceptible to misalignments.On the other hand, the transponder's antenna pattern falls of more rapidly if a deviation from the main beam direction occurs, so more care needed to be taken when a transponder was aligned.
The alignment of the corner reflectors (made out of aluminum) was performed manually with an inclinometer and a compass.A local magnetic declination of 2.5°was accounted for when using a compass during azimuth alignment.The alignment standard uncertainty for both axes was estimated to be not above 0.5°.
The transponder was mounted on a positioner unit, which allows a motorized two-axis alignment in azimuth and elevation with high mechanical repeatability (better 0.1°).The alignment was zeroed in elevation with a water-level.In azimuth, a compass could not be used like for the corners because the positioner's ferromagnetic components result in a misreading.Instead, the true North direction was determined by measuring a reference azimuth direction with a GPS device.For this, two reference points where placed several dozens of meters away from the transponder, one in front, one behind, and both on the line of the current alignment.By measuring the accurate GPS positions, the azimuth orientation could be computed by determining the angle between the connecting line and true North.The standard uncertainty of this method was estimated to be 0.1°through cross-validation.

Imaged Area
All acquired RADARSAT-2 scenes imaged an area around the DLR site at Oberpfaffenhofen, Germany.The transponder was built up right on the DLR premises, allowing easy wiring and monitoring.The 1.5 m corners were installed in the immediate vicinity on the protected grasslands surrounding the airstrip at Oberpfaffenhofen.The site names for sites on which the 1.5 m corners are installed begin with D26.The locations are shown in Figure 2. The 3.0 m corners were located within a larger rectangular area of about 14 km × 8 km.
The site names for locations with a 3.0 m corner are D24, D25, and D27 to D30.An overview of the 3.0 m corner sites is shown in Figure 3.

Overview
The RADARSAT-2 datatakes were processed by MDA and the final single-look complex (SLC) images were the starting point for the data analysis.The primary analysis goal is to derive the transponder ERCS and, equally importantly, an associated uncertainty statement.In principal, the transponder ERCS ς t can be derived if the ERCS of a reference target (placed in the same scene) ς r is known according to the proportionality Here, P t and P r are the point target intensities of the transponder and the reference target, derived from 224 the processed SAR images and possibly expressed as digital numbers.

225
The analysis is split into two parts:  The primary analysis goal is to derive the transponder ERCS and, equally importantly, an associated uncertainty statement.In principal, the transponder ERCS ς t can be derived if the ERCS of a reference target (placed in the same scene) ς r is known according to the proportionality Here, P t and P r are the point target intensities of the transponder and the reference target, derived from the processed SAR images and possibly expressed as digital numbers.
The analysis is split into two parts: 1. Point target analysis: Extract the relative point target impulse response powers for all point targets in all scenes (see Section 4.2). 2. Parameter estimation: Set up a statistical model to derive the estimated transponder ERCS and corresponding uncertainty from all datatakes (see Section 4.3).

Power Estimation for Point Targets from SAR Images
The imaged point targets appear as bright crosses in the processed image.Two methods are distinguished when deriving the point target power: the peak and the integral method.With the peak method, the target power is described by the pixel with the peak power (complex pixel amplitude squared) whereas with the integral method, the target power results as the sum of pixel powers over a relevant region around the target.It was shown that the integral method is advantageous as it leads to accurate results even if the image is not perfectly focused [21].Thus the integral method was chosen for this analysis.
The goal of this analysis step is to derive a table of relative point target intensities for all point targets in all images.Although an absolute scale is not necessary (and the RADARSAT-2 absolute calibration factor is ignored here), it is crucial that the point target intensities derived from different images are all calibrated with respect to each other.This is achieved by applying the (range-independent) beta naught look-up table delivered together with the images [22].It was confirmed that no obvious incidence-angle dependence remained in the data.
The following steps were performed to derive a targets' integrated impulse response intensity: 1. Define a search window around the point target in the georeferenced, processed image.
2. Find and record the brightest pixel location.
3. Define an analysis window, centered on the brightest pixel of the previous step.4. Estimate the clutter power from four non-overlapping areas surrounding the peak. 5. Integrate the target power over a cross area (see Figure 4 and Table 3), capturing all pixels with significant point target power.6. Subtract the estimated clutter power from the integrated target power to get a clutter-compensated target power.
This procedure is in line with the one described in [3].An exemplary transponder impulse response is shown in Figure 5.The peak power lies more than 60 dB above the clutter level.This large separation is already an indication that clutter power compensation, in practice, is not necessary.Nevertheless, it was performed for all 126 analyzed point targets.Exemplary impulse responses for 3.0 m and 1.5 m corners are show in Figures 6 and 7, respectively.Although the impulse response is clearly visible in the range and azimuth cuts, both corners are comparatively hard to spot in the image patches themselves.This is due to the way the intensities in the image patches have been clipped for visualization.All digital values above a threshold (2.7 times the mean absolute amplitude within the patch) are set to the threshold value.

Bayesian Statistics and Hierarchical Model Fitting
The output of the previous processing step is a table which reports a relative point target power per target and scene.A graphical representation is shown in Figure 8.This is the input data to derive the transponder ERCS and its confidence interval.Original Data In order to estimate the target powers from all available data, daily RADARSAT-2 and transponder drifts need to be estimated and compensated, as described in the following Section 4.3.1.Then, the transponder ERCS can be computed with the measurement model Equation (4).The final, combined uncertainty shall include the uncertainties which are incorporated in each step.This problem is solved with Bayesian statistics, which directly allows to estimate the uncertainty for all derived parameters.

Daily RADARSAT-2 and Transponder Drifts
A plot of the original data from the first processing step was show in Figure 8.On first impression, the data non-uniformity is dominated by a daily systematic drift.For instance, independent of the radar target group (transponder, 3.0 m, or 1.5 m corners), a radiometric drift of more than 0.5 dB can be observed between the last and the previous overpass.Therefore, a daily RADARSAT-2 radiometric drift compensation had to be included in the analysis.
Also, there is one immediately apparent data outlier: The corner at site D26g on the third day was clearly not aligned and was masked out prior to further analysis.
Besides the RADARSAT-2 drift, the (mostly temperature-dependent) transponder drift can be estimated and compensated.The compensation is based on internal calibration data recorded by the transponder itself.Exemplary transponder loop gain and temperature drift data are shown in Figure 9.The dashed red line in the upper plot defines the point in time when the transponder RCS was corrected to its nominal RCS.The blue markers, on the other hand, indicate times at which the transponder RCS was merely estimated.In the case of Figure 9, the drift at the overpass time can be estimated to be nonexistent (0.0 dB).An upper bound on the uncertainty of this estimated drift can be stated from the plot as 0.02 dB, i.e., the true drift should lie, with high probability, in the range [−0.02, 0.02] dB.A similar assessment has been performed for all other days.The estimated transponder drifts and their estimated error bounds are listed in Table 4.A higher drift and/or error is stated for overpasses when a transponder software problem prohibited a nominal, temperature controlled operation of the prototype.

Hierarchical Bayesian Model
The Bayesian model requires the definition of priors, i.e., probability distributions for all model parameters.These shall be defined in the following.For the eight overpass days d, eight different daily RADARSAT-2 drifts r d need to be determined.It is estimated from Figure 8 that the daily drift, expressed as a scaling factor, will certainly be in the range 0.4 to 1.6 (i.e., −4 dB to 2 dB).The prior can thus be written as r d ∼ U(0.4,1.6), where U(a, b) describes a uniform distribution with lower and upper bounds a and b, respectively.
Besides the RADARSAT-2 drift, the transponder drift needs to be modeled.The values in column three of Table 4 define maximum error bounds, i.e., a lower bound a and an upper bound b.According to [7], this information can be converted to a standard uncertainty and therefore a normal distribution, where the standard deviation σ is given as Hence, the daily transponder drift s d is modeled as a normal distribution N(µ, σ ), where µ describes its mean and σ its standard deviation: The best estimate µ s,d is taken from Table 4, where also the σ s,d (resulting from Equation ( 5)) are listed.
It is assumed that all measured point target powers of all targets within one target group g (transponder t, 3.0 "30", and 1.5 corners "15") belong to the same population, which can be described by a normal distribution N(µ g , σ g ), having the group location (mean) µ g and group scale (standard deviation) σ g > 0. A normal distribution was chosen because it is symmetric (no plausible reason can be found for an asymmetric distribution), and because the distribution of the measured values results from several physical effects (satellite thermal drift, satellite and target alignment errors, target stability, clutter, etc.) so that the central limit theorem suggests a normal distribution as well.Each µ g and σ g are modeled to originate from uniform distributions: µ g ∼ U(10 1.5 , 10 7 ) (i.e., U(15 dB, 70 dB)) and σ g ∼ U(0, 10 6 ).In a way, the σ g are nuisance parameters which are not required to derive the transponder ERCS, but they are nevertheless necessary in order to describe the normal distributions N g .Now, for every overpass d, the data y d,g is fitted depending on its group g ∈ [t, 15, 30]: This way, the daily drift parameter r d is estimated from all available data, exploiting the fact that the drift should be equal for data across all three groups.
Estimating parameters with the model Equation ( 6) results in estimates for the (relative) point target powers for each group (µ t , µ 30 , and µ 15 ) after drift compensation.
The next step is to relate the point target powers to ERCS.For this, a reference ERCS is necessary.It was chosen to be the group ERCS of the 1.5 corners.(The not more than 7 years old 1.5 corner reflectors were, mechanically speaking, in a better shape than the more than 20 years old 3.0 corners, which show apparent deformations due to damages and their continuous exposition on a field.Mechanical deformations result in a reduction of a corners monostatic ERCS because some of the incident power is reflected away from the incident beam direction.The visual observation was confirmed by looking at the ERCS dispersion within each group: σ 15 = 0.15 dBm 2 and σ 30 = 0.41 dBm 2 .These observations lead to the conclusion that the ERCS of the utilized 3.0 corners should not be used as an absolute reference, and that the 1.5 corners provide a better link to an absolute ERCS.Nevertheless, the 3.0 corners and their large ERCS helped in determining more accurately the daily RADARSAT-2 drift.) The value of the reference ERCS is modeled as ς 15 ∼ N(10 3.838 m 2 , 10 0.02 m 2 ) (i.e., N(38.38 dBm 2 , 0.2 dBm 2 )).Its location is defined by Equation ( 3).The standard deviation, or standard uncertainty according to [7], characterizes the state of knowledge about the reference ERCS.The statement that the ERCS of 1.5 corners can be determined with Equation (3) up to a standard uncertainty of 0.2 dBm 2 is certainly the weakest point in the argument.It is based on previous experience gained from numerical field simulations on corners of the same size at X-band, and on plausibility.(A standard uncertainty of 0.2 dBm 2 is plausible because with it the theoretical RCS difference of 3.0 and 1.5 corners can be (well) explained.The theoretical difference, according to Table 2, is 12.05 dBm 2 .The difference between the estimated mean target powers (in MCMC parameters: µ 30 − µ 15 ) is 11.92 dB, i.e., 0.13 dB away from the predicted value despite the already discussed deformation of the 3.0 corners.It is still possible though that the RCS of all corners is, due to deformation and the approximate nature of Equation (3), lower than assumed.Nevertheless, Equation (3) seems to characterize the absolute RCS of trihedral corners despite some mechanical deformations with a standard uncertainty of not more than 0.2 dBm 2 .)Nevertheless, it cannot be proofed and further work should be conducted in determining the absolute knowledge of a trihedral corner reflector's ERCS.Now, the final transponder ERCS is deterministically related to the already derived model parameters through ς t = µ t µ 15 ς 15 (7) according to the measurement model Equation ( 4).The complete Bayesian model described above is visualized in Figure 10.

Posterior Simulation
The hierarchical model developed in the previous section is now solved iteratively with the numerical Markov chain Monte Carlo method (MCMC) [12,13].The goal is to find the most probable set of parameters (e.g., r d , µ 30 , etc.) which is most likely in describing the given data.If the model is well posed, then the iterative MCMC will converge to the true distribution of the parameters.This also means that the first draws/simulation runs need to be discarded, and only values after this burn in period should be considered.To improve the (required) independence between successive draws, only every nth simulation draw is considered, i.e., thinning is applied.
In order to compute the parameters of the hierarchical model, which was presented in the previous chapter, 2 × 10 5 simulation runs were conducted, allowing for a burn in of 1 × 10 4 and a thinning of 20.These parameters were determined empirically by observing the traces and autocorrelations of the model parameter draws.

MCMC Results
The solution of the hierarchical model describes all parameters at the same time.Nevertheless, the results can be visualized step by step.
The first result is the estimated RADARSAT-2 drift.The estimated drift is shown in Figure 11.The error bars indicate the range of values which defines the 95% probability intervals.From this it can be seen that the drift between the first and the second overpass is statistically not significant, but the drift between the last two overpasses, for instance, is.The drift was estimated with a standard uncertainty (according to [7]) of always less than 0.1 dB.
This estimated RADARSAT-2 drift can now be applied to the measured data.The original data in Figure 8 appears now much more uniform, see Figure 12.No apparent systematic drift remains.After RADARSAT-2 drift compensation, the estimated transponder drift can be taken out from the upper plot in Figure 12.The plot in Figure 13 results.Already visually it becomes clear that the transponder drift is small in comparison to the remaining dispersion of the measurement data.This can also explain why at times the drift compensation results in updated values which lie further away from the population mean.

Before After Transponder Drift Compensation
Now the most important result of the MCMC simulation is the derivation of the transponder ERCS ς t and its standard uncertainty.The transponder ERCS was estimated to be 60.80 dBm 2 with a standard uncertainty according to [7] of 0.206 dBm 2 .The 95% highest probability density interval is given as [60.38, 61.17] dBm 2 .Note that the standard uncertainty is clearly dominated by the assumed ERCS knowledge uncertainty of the 1.5 corners of 0.2 dBm 2 .

Posterior Predictive Checks: Model Verification
In Section 4.3.2, a normal distribution was chosen in order to model the observed integrated pixel intensities.Here it shall be demonstrated that this model is indeed plausible and adequate.
Focusing on the transponder data y t,d , test statistics T (y t,d ) of the observed data can be compared to the test statistics of replicated data samples T (y rep t,d ), i.e., samples which were generated numerically through the model [12].If this analysis is conducted for the four test statistics mean, standard deviation, minimum, and maximum value across all eight overpasses, Figure 14 results.A good model fit is found if the test statistic of the observed data (vertical line) lies close to the center of mass of the histogram.As a criterion, the p value (relative number of samples above observed test statistic) can be used.At a confidence level of 95%, the p value should therefore be within the range of 2.5% to 97.5%.This is observed for all four test statistics, and especially the most important aspect of the model, its mean, is well reproduced by the model with a p value close to ½.As a means of verification, the result of the previous section can be reproduced approximately with classical (frequentist) statistics.One way of handling the hierarchical data structure is to derive one transponder ERCS per overpass, and then to combine the resulting eight ERCS values through averaging into an overall transponder ERCS.The disadvantage of this simplified approach (in comparison to approach using hierarchical Bayesian data analysis as shown before) is that information about the uncertainty of each of the eight transponder ERCS values is lost and does not contribute to the combined uncertainty.
After averaging, an estimated standard deviation of the mean can be derived from the eight estimated ERCS values, resulting in a standard uncertainty for the estimated transponder ERCS [7].The third step is then to derive a combined standard uncertainty by incorporating the ERCS knowledge uncertainty of the 1.5 corners of 0.2 dB through the method of "root-sum-square".Lastly, the expanded uncertainty [7] with a coverage factor of k = 2 is derived.
This approach results in an estimated transponder ERCS of 60.80 dBm 2 with a standard uncertainty of 0.20 dB or an expanded standard uncertainty of 0.41 dB at a confidence level of 95% (k = 2).The result confirms the findings of the previous section.
Note that once again the combined uncertainty is dominated by the assumed ERCS knowledge uncertainty of the 1.5 corner reflectors.The transponder's combined backscatter uncertainty is sufficiently low to permit the calibration of SAR instruments like Sentinel-1 to an absolute radiometric uncertainty of 1.00 dB at a confidence level of 99%, provided that the SAR instrument is otherwise sufficiently precise [23].

Discussion of Hierarchical Bayesian Data Analysis for Radiometric Calibration
The advantages of using hierarchical Bayesian data anlysis for radiometric calibration were laid out before: The approach can jointly answer typical analysis questions in radiometric calibration while fully exploiting the hierarchical nature of external calibration data and fulfilling the requirement on reporting measurement uncertainties and confidence intervals.This section shall add a critical discussion of the proposed method.
First, the most recognized approach in metrology for deriving measurement uncertainties is the ISO Guide to the expression of uncertainty in measurements (GUM) [7].The GUM, which was introduced in 1993, fundamentally applies classical (frequentist) statistics, and is not directly compatible with a Bayesian approach (which was proposed in this paper).Nevertheless, the current GUM approach has been repeatedly criticized, and Bayesian methods have been proposed as a consistent and universally applicable replacement [24][25][26].In practice, uncertainties derived by Bayesian statistics are often equal or approximately equal to uncertainties derived by classical statistics [25] so that both approaches lead to desired results.Yet thanks to the availability of numerical tools, Bayesian computations are now often simpler than classical approaches if hierarchical data is analyzed [14].Therefore, taking a Bayesian approach for deriving the measurement uncertainty of the absolute calibration factor seems justified.
From the outset, using Markov chain Monte Carlo (MCMC) simulations to infer model parameters appears to be more complicated than employing classical statistics.The added initial work is offset though by a joint probability model, which allows to derive model parameters on arbitrary hierarchical levels without loss of information.Hence the improved analysis justifies the initial extra work.
If numerical methods like MCMC are used, problems of non-convergence can occur and must be addressed during analysis.Care must be taken in the assessment of simulation results, and plausibility checks should be added.

Conclusions
The presented work proposed for the first time in the field of radiometric SAR system calibration to exploit hierarchical Bayesian data analysis.It was claimed that Bayesian statistics is well suited to the analysis of calibration data because of the following key factors: • Within Bayesian statistics, probability distributions are used in describing model parameters.The distributions convey a meaning of uncertainty.Bayesian statistics is therefore an appropriate choice for calibration, where an estimated parameter is meaningless without a statement of its uncertainty.• Hierarchical joint probability models are well suited to describe data that is typically acquired during an external radiometric SAR calibration campaign.During data analysis, depending on the research question, parameters often need to be estimated on different levels or for different groups.Hierarchical Bayesian modeling is well suited to derive model parameters for different interdependent parameters, especially when numerical methods like Markov chain Monte Carlo simulations are used.
The applicability of the method for radiometric calibration was demonstrated through a case study.The data of an external calibration campaign was analyzed.The campaign goal was to derive the ERCS of DLR's C-band Kalibri transponder prototype.Due to hierarchical Bayesian data analysis, it was possible to estimate and compensate the overpass-dependent drift of the RADARSAT-2 system and to derive the transponder ERCS with a remaining standard uncertainty (according to GUM) of only 0.21 dBm 2 .
In order to convert the case study approach to an operational uncertainty analysis procedure for SAR missions, a database of point-target measurements should be set up.Filling the database incrementally with measurements of permanently installed reference point targets over the mission lifetime would allow one to continually derive radiometric uncertainty estimates based on Bayesian statistics.
The authors are convinced that Bayesian statistics and hierarchical modeling are an important step toward traceable radiometric SAR system calibration.Uncertainties and confidence intervals can now be derived with the same diligence that is presently applied to the estimation of calibration parameters.

Figure 1 .
Figure 1.Artist's rendering of DLR's new C-band Kalibri transponder, mounted on a two-axis positioner.

Figure 2 .Figure 2 .
Figure 2. Locations of the transponder on the DLR premises and of the 1.5 m corners on the adjacent airport in Oberpfaffenhofen, Germany.(Map tiles in this and Figure 3 reproduced with permission from Stamen Design, based on data from OpenStreetMap.) Version November 28, 2013 submitted to Remote Sens.9 of ??

219 4 .
Case Study: Data Analysis and Results

220 4 .
1. Overview 221The RADARSAT-2 datatakes were processed by MDA and the final single-look complex (SLC) 222 images were the starting point for the data analysis. 223

226 1
Map tiles in Fig.2 and 3reproduced with permission from Stamen Design, based on data from OpenStreetMap.

Figure 3 .Figure 3 .
Figure 3. Map of imaged area, showing the locations of the 3.0 m corners.

Figure 5 .
Figure 5. Transponder impulse response for the first overpass on 7 April 2013.A large target-to-clutter ratio is apparent.The four red squares indicate the areas from which the clutter power was estimated.

Figure 6 .
Figure 6.3.0 corner impulse response at site D24 for the first overpass on 7 April 2013.Also see caption of Figure 5.

Figure 7 .
Figure 7. 1.5 corner impulse response at site D26 for the first overpass on 7 April 2013.Also see caption of Figure 5.

Figure 8 .
Figure 8. Uncompensated and unmasked data, the immediate result of the processing described in Section 4.2.The data points lie on a common ordinate; for better visibility, one region per target group is plotted.

Figure 9 .
Figure 9.Estimated transponder drift temperature stability (main influence for gain drift) for the overpass on 21 April 2013.The transponder operates with its nominal loop gain if a relative drift of 0 dB is detected.The loop gain was adjusted at 16:30 (red dotted line), after which the drift was monitored (blue circles) until the overpass (dashed green line).After drift correction, the drift is estimated to be with high probability within the range [−0.02, 0.02] dB at 17:03 UTC.

Figure 10 .
Figure 10.Diagram of the Bayesian model.The ellipsis symbol . . .indicates a family of probability distributions (per group g or overpass d), whereas ∼ means that a parameter is drawn from the respective distribution.

Figure 11 .Figure 12 .
Figure 11.Estimated daily drifts r d of the RADARSAT-2 system.The error bars indicate 95% highest-probability density intervals.

Figure 13 .
Figure 13.Visualization of the Kalibri transponder drift compensation with data from Table4.For visual guidance, the gray line marks the sample mean of the data before transponder drift compensation.

Figure
Figure 14.Posterior predictive checking for predicted (modeled) transponder data y

Table 1 .
Overview of approximate acquisition times (in UTC).

Table 2 .
Corner reflectors with two different inner-leg lengths were used during the campaign.

Table 3 .
Cross parameters according to Figure 4 used during analysis.

Table 4 .
(5)ly transponder RCS drift, estimated maximal error bounds on the estimated drift, and resulting standard uncertainties according to Equation(5).Date Estimated Drift µ s (dB) Maximal Error (dB) Resulting σ s (dB) VersionNovember 20, 2013submitted to Remote Sens. 18 of 26 Figure 10.Diagram of the Bayesian model.The ellipsis symbol . . .indicates a family of probability distributions (per group g or overpass d), whereas ∼ means that a parameter is drawn from the respective distribution.
14. Posterior predictive checking for predicted (modeled) transponder data y rep t,d and four different test statistics.Especially the most important aspect of the predicted data, its mean, is well modeled.
T (y t,d ) = max(y t,d ) [dB] the 1.5 m corners of 0.2 dB through the method of "root-sum-square".Lastly, the expanded uncertainty [?382] with a coverage factor of k = 2 is derived.383Thisapproach results in an estimated transponder ERCS of 60.80 dBm 2 with a standard uncertainty 384 of 0.20 dB or an expanded standard uncertainty of 0.41 dB at a confidence level of 95 % (k = 2).The 385 result confirms the findings of the previous section.386 Note that once again the combined uncertainty is dominated by the assumed ERCS knowledge 387 uncertainty of the 1.5 m corner reflectors.The transponder's combined backscatter uncertainty is 388 sufficiently low to permit the calibration of SAR instruments like Sentinel-1 to an absolute radiometric 389 uncertainty of 1.00 dB at a confidence level of 99 %, provided that the SAR instrument is otherwise 390 sufficiently precise [? ]. 391 5. Discussion of Hierarchical Bayesian Data Analysis for Radiometric Calibration 392 The advantages of using hierarchical Bayesian data anlysis for radiometric calibration were laid out 402 applicable replacement [? ??].In practice, uncertainties derived by Bayesian statistics are often equal 403 4.5.Plausibility Check with Classical Statistics